

In document Space time modelling of health data (Page 89-91)

5.3 Two-way and Multi-way Spatial Survival Models

5.3.2 Simulation Study

A small simulation study is conducted in order to illustrate the inferential properties of this proposed two-way spatial model. The aim is to understand under which circumstances our model would perform better than a spatial model.

Firstly, 300 observations were simulated from a standard spatial parametric proportional hazards model using the simsurv function in the spatsurv package. A Weibull model was used for the baseline hazard, with the shape and scale parameters both set to 1. An exponential model was assumed for the spatial correlation in Z, with the marginal variance and spatial decay parameters both set to 0.1. The design matrix includes a randomly sampled ‘age’, a ‘sex’ indicator and a ‘cancer’ indicator. Ages were sampled from a uniform distribution on the interval [5,50]; the probability of a study subject being male is 0.5 and the probability of each ‘individual’ having cancer is 0.2. All covariate coefficients were set to 0.01. Below, the resulting simulated times are referred to as the ‘survival times’, as opposed to the ‘calendar’ times (referred to as ‘entry’ times), which were introduced artificially as described below.
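As a rough illustration of the covariate and survival-time part of this set-up, the sketch below draws the three covariates and then survival times from a Weibull proportional hazards model. It is written in Python purely for illustration and is not the simsurv call used in the study. It also deliberately omits the spatially correlated term Z, so it corresponds to a non-spatial simplification. The function name simulate_subjects and the dictionary keys are hypothetical.

```python
import math
import random


def simulate_subjects(n=300, beta=0.01, seed=1):
    """Draw covariates and Weibull PH survival times (spatial effect omitted)."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        age = rng.uniform(5, 50)                   # uniform on [5, 50]
        sex = 1 if rng.random() < 0.5 else 0       # 1 = male, probability 0.5
        cancer = 1 if rng.random() < 0.2 else 0    # 1 = has cancer, probability 0.2
        # Linear predictor with every coefficient equal to beta (0.01 in the text).
        eta = beta * (age + sex + cancer)
        # With shape = scale = 1 the Weibull baseline hazard is constant and equal
        # to 1, so S(t) = exp(-t * exp(eta)); invert it at a uniform draw u.
        u = 1.0 - rng.random()                     # lies in (0, 1], so log(u) is finite
        time = -math.log(u) / math.exp(eta)
        subjects.append({"age": age, "sex": sex, "cancer": cancer, "time": time})
    return subjects


subjects = simulate_subjects()
survival_times = [s["time"] for s in subjects]
```

Adding the spatial term would amount to adding a draw from a Gaussian process with exponential correlation to eta; that step is left out here because its exact parameterisation in spatsurv is not stated in the text.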

The following four scenarios are considered in this simulation study:

• Scenario 1. Uniform entry times and survival times: artificial entry times were simulated from a uniform distribution on the interval bounded by the range of the survival times. This scenario represents the case where incidence and trends in survival remain constant over time.

• Scenario 2. Uniform entry times and (artificially) longer survival times compared to Scenario 1. Survival times generated from the process described above were extended in a linear fashion using the relation

$$t_{\text{new}} = t_{\text{original}} \times \left(1 + 2 \times \frac{\tau_1 - \min(\tau_1)}{\max(\tau_1) - \min(\tau_1)}\right),$$

i.e. unchanged at the ‘start’ of the study and extended to three times their original length at the ‘end’ of the study. This scenario represents a situation in which disease incidence remains constant, but there are exogenous effects that modify survival times, e.g. ‘treatment’ for the disease gradually improves over time.

• Scenario 3. Non-uniform entry times and the same survival times as Scenario 1. Non-uniform entry times were simulated from a triangular distribution over the range of the survival times. This scenario represents a situation in which disease incidence increases over time, but treatment for the ‘disease’ does not improve over time.

• Scenario 4. Non-uniform entry times and the same survival times as Scenario 2. This scenario represents a situation in which disease incidence increases over time and treatment for the disease improves over time.
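The construction of the four scenarios can be sketched as follows, in Python for illustration only. Several details are assumptions rather than statements from the text: that τ1 in the extension formula denotes the entry time, that the triangular distribution has its mode at the upper end of the range (so that incidence increases over time), and that Scenarios 1 and 2 (and likewise 3 and 4) share the same entry times. All function names are hypothetical.

```python
import random


def uniform_entry(surv, rng):
    """Entry times uniform on the range of the survival times."""
    lo, hi = min(surv), max(surv)
    return [rng.uniform(lo, hi) for _ in surv]


def triangular_entry(surv, rng):
    """Entry times from a triangular distribution on the same range.

    Assumption: the mode sits at the upper end, so the density rises linearly.
    """
    lo, hi = min(surv), max(surv)
    return [rng.triangular(lo, hi, hi) for _ in surv]


def extend_survival(surv, entry):
    """Scale each survival time by a factor rising linearly from 1 to 3 in entry time."""
    lo, hi = min(entry), max(entry)
    return [t * (1 + 2 * (e - lo) / (hi - lo)) for t, e in zip(surv, entry)]


def make_scenarios(surv, seed=1):
    """Return {scenario: (entry_times, survival_times)} for the four scenarios."""
    rng = random.Random(seed)
    uni = uniform_entry(surv, rng)
    tri = triangular_entry(surv, rng)
    return {
        1: (uni, list(surv)),
        2: (uni, extend_survival(surv, uni)),
        3: (tri, list(surv)),
        4: (tri, extend_survival(surv, tri)),
    }
```

The subject with the earliest entry time keeps its original survival time, and the subject with the latest entry time has its survival time tripled, matching the description of Scenario 2.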

Two models were fitted to the simulated data from each of the above four scenarios: (i) a spatial survival model and (ii) a two-way spatial survival model. All covariates (i.e. age, sex and cancer presence/absence) are included in both the spatial and two-way models.

Arguably the simplest way to include information on entry time in spatial survival models is to add a ‘cohort’ factor variable as a predictor (simply as a covariate in a Cox PH model). This would partition calendar time into segments, and each individual whose entry time fell into a given segment would receive the same modification to their hazard. Although this approach is computationally simple, its main disadvantage is that we would expect the long-term effects of exogenous variables that may modify survival times (such as improved treatment) to act in a smooth manner, i.e. the effects within segments should be temporally correlated.
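A minimal sketch of such a ‘cohort’ factor is given below, in Python for illustration only. It assumes equal-width segments and treatment (dummy) coding with the first segment as the reference level; neither choice is specified in the text, and the function names are hypothetical.

```python
def cohort_factor(entry_times, n_segments=4):
    """Assign each entry time to one of n_segments equal-width calendar segments."""
    lo, hi = min(entry_times), max(entry_times)
    width = (hi - lo) / n_segments
    labels = []
    for e in entry_times:
        k = int((e - lo) / width) if width > 0 else 0
        labels.append(min(k, n_segments - 1))  # the latest entry time joins the last segment
    return labels


def dummy_columns(labels, n_segments=4):
    """Treatment coding: one 0/1 column per segment, with segment 0 as the reference."""
    return [[1 if lab == k else 0 for k in range(1, n_segments)] for lab in labels]
```

Every individual in the same segment receives the same row of dummy columns, and hence the same multiplicative change to their hazard, which is exactly the step-function behaviour that the smooth alternative is intended to avoid.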

An extension to this simple method is to allow entry time to enter as a smooth effect, which can be achieved by using a spline function to represent changes in hazard over calendar time. In order to compare our two-way model with this ‘simpler’ alternative, the effect of calendar time is assumed to follow a B-spline representation, whose basis functions enter the model as additional covariates. This is achieved in R using the bs function from the package splines, which creates the additional columns of the design matrix required. Both models assume a Weibull hazard for survival times, and we used a B-spline hazard for calendar times in the two-way model.
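To make concrete what ‘additional columns of the design matrix’ means, the sketch below evaluates a B-spline basis at each entry time using the Cox-de Boor recursion. It is a Python illustration rather than the bs call used in the study. The choices of a cubic basis, no interior knots, boundary knots at the range of the entry times, and dropping the first basis function (as bs is understood to do when no intercept is requested) are assumptions. The function names are hypothetical.

```python
def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function at x by the Cox-de Boor recursion."""
    n_spans = len(knots) - 1
    # Degree 0: indicator of the half-open knot span that contains x.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_spans)]
    if x == knots[-1]:
        # Close the last non-empty span on the right so the upper boundary is covered.
        last = max(i for i in range(n_spans) if knots[i] < knots[i + 1])
        b[last] = 1.0
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero denominator (repeated knots) are taken to be zero.
            left = (x - knots[i]) / left_den * b[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * b[i + 1] if right_den > 0 else 0.0
            nxt.append(left + right)
        b = nxt
    return b  # length len(knots) - degree - 1


def calendar_time_columns(entry_times, degree=3, interior_knots=()):
    """Design-matrix columns representing a smooth effect of entry time."""
    lo, hi = min(entry_times), max(entry_times)
    knots = [lo] * (degree + 1) + sorted(interior_knots) + [hi] * (degree + 1)
    # Drop the first basis function so the columns are not collinear with an intercept.
    return [bspline_basis(e, knots, degree)[1:] for e in entry_times]
```

With a cubic basis and no interior knots, each individual contributes three extra covariate values; adding interior knots increases the number of columns and therefore the flexibility of the fitted calendar-time effect.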

CHAPTER 5. SPACE-TIME SURVIVAL ANALYSIS

Model      Scenario 1   Scenario 2   Scenario 3   Scenario 4

Spatial    369.0566     1310.61      327.1442     925.2546
Two-way    779.6715     1778.072     942.3362     346.682

Table 5.1: WAIC values from our simulation study.

The simulation study used the MCMC algorithm implemented in the package spatsurv with 500,000 iterations in total; after a 10,000-iteration burn-in, retaining every 490th sample left us with a sample of size 1000. We assessed convergence by examining trace plots of the model parameters. Appendix 8.2.2 gives R code for the simulated data.
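The retained sample size follows directly from these settings, since (500,000 − 10,000) / 490 = 1000. The short Python check below makes the arithmetic explicit; which exact iterations are kept is an assumed convention (the first iteration after burn-in and every 490th thereafter), and the function name is hypothetical.

```python
def retained_samples(n_iter, burn_in, thin):
    """Number of draws kept after discarding burn_in iterations and thinning by thin."""
    # Assumed convention: keep iterations burn_in, burn_in + thin, ... below n_iter.
    return len(range(burn_in, n_iter, thin))


n_kept = retained_samples(n_iter=500_000, burn_in=10_000, thin=490)
```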

The WAIC values for Scenarios 1 to 4 are given in Table 5.1 (smaller values indicate a better model fit). The results show that when entry times are uniformly distributed (Scenarios 1 and 2), a simple spatial survival model fits the data better. When entry times are non-uniformly distributed but there are no exogenous changes in survival over time (Scenario 3), the spatial model also performs better. However, in Scenario 4, where incidence increases but so do survival rates, the two-way survival model performs much better. Before running this simulation study, we expected the two-way model to be the best performing in Scenarios 2 and 4; our expectations were confirmed in the latter case, but not in the former, though it is worth noting that in Scenario 2 the relative performance of the two-way model was much improved over Scenarios 1 and 3.
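The comparison can be made explicit by tabulating, for each scenario, the difference and the ratio of the two WAIC values. The Python sketch below uses only the values reported in Table 5.1; using the ratio as a summary of ‘relative performance’ is an illustrative choice rather than a measure defined in the text.

```python
# WAIC values copied from Table 5.1, ordered by Scenario 1 to 4.
waic = {
    "Spatial": [369.0566, 1310.61, 327.1442, 925.2546],
    "Two-way": [779.6715, 1778.072, 942.3362, 346.682],
}


def compare_waic(waic):
    """Per-scenario difference and ratio (two-way relative to spatial) and preferred model."""
    rows = []
    for k, (s, t) in enumerate(zip(waic["Spatial"], waic["Two-way"]), start=1):
        rows.append({
            "scenario": k,
            "difference": t - s,   # negative when the two-way model has the smaller WAIC
            "ratio": t / s,        # below 1 when the two-way model has the smaller WAIC
            "preferred": "Two-way" if t < s else "Spatial",
        })
    return rows


for row in compare_waic(waic):
    print(row)
```

On this summary the two-way model is preferred only in Scenario 4, while its ratio in Scenario 2 is closer to 1 than in Scenarios 1 and 3, in line with the remark about improved relative performance.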

Based on this small simulation study (see the WAIC values in Table 5.1), we would expect our two-way spatial survival model to perform better in scenarios where incidence is increasing over time (perhaps because of better methods of disease detection) and where survival from the disease, once it has been detected, is also increasing due to exogenous, unmeasured factors (e.g. improvements in drug development and treatment regimens). A real-life example of such a scenario would be the analysis of long-term cancer registry data, see below.
