Chapter 4 – Spatial microsimulation modelling

4.8. Results of the spatial microsimulation model

4.8.2. External validation of the model

While the results of the internal validation were a positive, the point of spatial

has occurred should not be too much of a surprise. Having internally validated the model, the data was integerised in order to create ‘whole’ individuals within the data. This in turn allowed for the external validation to take place.
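The integerisation method is not specified in this section, so the following is only a minimal sketch of one common option, largest-remainder rounding, and not necessarily the method used in the model. The function name `integerise` and the example weights are hypothetical.

```python
import math

def integerise(weights):
    """Largest-remainder rounding (an assumed, illustrative method).

    Floors every fractional weight, then hands the leftover whole units to
    the weights with the largest fractional parts, so the rounded total
    matches the rounded sum of the original weights.
    """
    floors = [math.floor(w) for w in weights]
    target = round(sum(weights))
    shortfall = target - sum(floors)
    # Indices ordered from largest to smallest fractional part.
    by_remainder = sorted(
        range(len(weights)),
        key=lambda i: weights[i] - floors[i],
        reverse=True,
    )
    for i in by_remainder[:shortfall]:
        floors[i] += 1
    return floors

# Hypothetical fractional weights for four survey individuals in one zone.
example_weights = [0.4, 1.7, 2.2, 0.7]
whole_individuals = integerise(example_weights)
```

Each returned integer is the number of times the corresponding survey individual would be replicated in the zone, which is what makes zone-level counts for the external validation possible.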

Following the integerisation of the data, a key error in the modelling was discovered during the external validation process. As mentioned in Section 4.4.2, data transformation was undertaken on the NS-SEC data to account for the ‘not-classified’ data. As also mentioned in that section, this process left some LSOAs (n=175) one or two out (in either direction) from the actual Census numbers. An additional section of code was created that split these 175 zones into four data frames: zones out by negative one; zones out by positive one; zones out by negative two; and zones out by positive two. Once these zones had been corrected they were combined into a further data frame, then joined with the LSOAs without error to form one data frame with all 345 LSOAs, complete with correct NS-SEC data. However, due to the way the zones were re-combined, the data no longer matched the order of the LSOAs in the Census data. This led to spurious results during the external validation. While it would have been possible to re-order the Census variables, a more practical approach was to fix the code to leave the LSOAs in the desired order. This was done by adding a key to the original Census data, which acted as a reference point for re-ordering the LSOAs later in the code. This allowed for more reliable external validation to take place, but also reinforces the importance of model checking throughout. The internal validation would not have picked this up, as that stage is concerned with checking whether the survey data had aggregated correctly to the Census constraints. Thus the order of the LSOAs was not as important as the data associated with each zone at this step.
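The ordering problem and its fix can be illustrated with a small sketch. It is not the code used in the model: the toy zones, the function names and the correction rule (absorbing the discrepancy in the largest category) are all hypothetical, chosen only to show how a key added before the split lets the original LSOA order be restored afterwards.

```python
# Hypothetical toy data: NS-SEC category counts per zone after the
# transformation, alongside the Census total they should sum to.
census_zones = [
    {"lsoa": "A", "nssec": [10, 5, 3], "census_total": 18},  # matches
    {"lsoa": "B", "nssec": [7, 4, 2], "census_total": 12},   # out by +1
    {"lsoa": "C", "nssec": [6, 6, 1], "census_total": 15},   # out by -2
    {"lsoa": "D", "nssec": [9, 2, 2], "census_total": 13},   # matches
    {"lsoa": "E", "nssec": [4, 4, 4], "census_total": 13},   # out by -1
]

def add_key(rows):
    """Record each zone's original position before any splitting."""
    return [dict(row, key=i) for i, row in enumerate(rows)]

def discrepancy(row):
    """Difference between the NS-SEC total and the Census total."""
    return sum(row["nssec"]) - row["census_total"]

def correct(row):
    """Assumed correction rule: adjust the largest category by the discrepancy."""
    counts = list(row["nssec"])
    counts[counts.index(max(counts))] -= discrepancy(row)
    return dict(row, nssec=counts)

def split_correct_recombine(rows):
    """Split zones by discrepancy (-2, -1, +1, +2), correct, then recombine."""
    groups = {-2: [], -1: [], 1: [], 2: []}  # only these values occur here
    without_error = []
    for row in rows:
        d = discrepancy(row)
        if d == 0:
            without_error.append(row)
        else:
            groups[d].append(correct(row))
    corrected = [row for d in (-1, 1, -2, 2) for row in groups[d]]
    return corrected + without_error  # zone order is now scrambled

recombined = split_correct_recombine(add_key(census_zones))
restored = sorted(recombined, key=lambda row: row["key"])

scrambled_order = [row["lsoa"] for row in recombined]
restored_order = [row["lsoa"] for row in restored]
```

In this toy case the recombined zones come back in a different order from the Census rows, so any row-by-row comparison would pair the wrong zones; sorting on the key restores the alignment.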

As alluded to in Section 4.7, external validation of spatial microsimulation models is harder than internal validation. This is because there is rarely appropriate data available to test the model (hence the need for the modelling in the first place), and often the model outputs will need to be adjusted to match data at other spatial scales. With this in mind, the UK Data Archive, the Office for National Statistics (ONS) and the Health and Social Care Information Centre (HSCIC) were contacted regarding possible data sources for external validation. The UK Data Archive was contacted specifically regarding differing spatial scales for the ADHS (which is only available publicly at the Strategic Health Authority (SHA) level); however, such data were not available. Following a recommendation from data analysts at the UK Data Archive to check with the ONS, it was found that data were available under special agreements at the old Primary Care Trust level. However, this equated to only 77 cases for Sheffield (out of 1,021 for Yorkshire, and 11,380 overall), of which only 42 had undertaken a dental examination and included the ‘numdu98’ variable. Given that the sample for the original spatial microsimulation model contained 4,840 individuals, 42 was not seen as an adequate number for validating the model. Finally, the HSCIC was contacted with regard to whether they had any alternative datasets on tooth decay (not related to the ADHS) at a spatial scale lower than SHA (or similarly sized regions). However, no such data were available.

Thus it became impossible to test how well the spatial microsimulation model had performed regarding the tooth decay variable. While this was unfortunate, it is a common problem within the field. The performance of the model could still be assessed, though, by evaluating it against ‘non-constrained’ variables (Campbell, 2011). This involved taking data from variables that were present in both the Census and ADHS, and that were not included as constraint variables in the model. The idea behind this was that if the model was able to predict these target variables accurately, this indicates reliability on the model’s part when creating new data, and gives confidence that the other target variables have been predicted accurately as well. Variables for household tenure and limiting long-term illness were excluded based on their incompatibility with the Census data, while data on household size were also excluded as they counted the whole population, rather than those aged 16 and over.
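The comparison against a non-constrained variable described above can be sketched as follows. The zone codes, individuals and Census counts are hypothetical toy values, and `paired_counts` is an illustrative helper and not part of the original model code.

```python
from collections import Counter

# Hypothetical integerised output: one (LSOA, marital status) pair per
# simulated individual.
simulated_individuals = [
    ("LSOA_1", "single"), ("LSOA_1", "married"), ("LSOA_1", "single"),
    ("LSOA_2", "married"), ("LSOA_2", "married"), ("LSOA_2", "widowed"),
]

# Hypothetical Census counts for the same zones and categories.
census_counts = {
    ("LSOA_1", "single"): 2, ("LSOA_1", "married"): 1,
    ("LSOA_2", "married"): 3, ("LSOA_2", "widowed"): 1,
}

# Aggregate the simulated individuals to zone-by-category counts.
simulated_counts = Counter(simulated_individuals)

def paired_counts(category, zones):
    """Return (simulated, Census) count pairs for one category, zone by zone."""
    return [
        (simulated_counts[(zone, category)], census_counts.get((zone, category), 0))
        for zone in zones
    ]

zones = ["LSOA_1", "LSOA_2"]
married_pairs = paired_counts("married", zones)
single_pairs = paired_counts("single", zones)
```

Each pair gives one point on a scatter plot of simulated against Census counts, which is the form in which fit statistics can then be calculated for a category.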

This left ethnicity and marital status as potential variables to use. The data for marital status were collected in a very similar manner in the ADHS and the Census, making them comparable without any data manipulation. The data for ethnicity were found to be less comparable, however. When using data for the usual resident population of each LSOA, the ethnicity data from the Census can be manipulated to match the categories from the ADHS exactly. However, this usual resident population included residents under the age of 16, making it incomparable with the microdata, which contained only those aged 16 or over. Ethnicity data on individuals aged 16 and over were available from NOMIS (NOMIS, 2011); however, these data were not available in the same format, meaning that when cross-tabulated with age (so as to exclude those aged under 16) the only comparable categories were ‘white’, ‘mixed race’ and ‘other ethnic group’. It is unusual and unfortunate that the data were not available by age in a format that allowed for a greater disaggregation of different ethnicities; however, as the other, more comparable datasets did not measure the same population as that created in the spatial microsimulation modelling, there was little choice but to use these data. The data would still allow external validation to be conducted, although a more extensive validation using ethnicity was not possible.

Two statistical measures were used to judge the success of the external validation. The first of these was the R² value, as a general indicator of the fit of the data. This measures how closely the data fit a regression line, but not necessarily the line of equality between the simulated and Census values. Therefore, the second measure used was the standard error around the identity (SEI), which is a measure of how well the data from the model fall around the 45-degree line (Tanton et al., 2011). Ideally, data that fit perfectly should fall along this line, in the same way that they did during the internal validation (see Figure 15). The SEI score is calculated using the formula given in Equation 2.

Equation 2 – SEI calculation formula (Tanton et al., 2011)
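A form of Equation 2 that is consistent with the definitions used in this section is sketched here. It assumes the usual statement of the SEI, with $i$ indexing the LSOAs and $\bar{Y}_{ABS}$ (a symbol introduced here for illustration) denoting the mean of the Census values across zones:

```latex
\[
\mathrm{SEI} \;=\; 1 \;-\;
\frac{\sum_{i} \left( Y_{est,i} - Y_{ABS,i} \right)^{2}}
     {\sum_{i} \left( Y_{ABS,i} - \bar{Y}_{ABS} \right)^{2}}
\]
```

Written this way, the SEI cannot exceed 1 and becomes negative when the estimates lie further from the 45-degree line than the Census values lie from their own mean, which would be consistent with a negative score such as that reported for Figure 22.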

In Equation 2, Y_est are the spatial microsimulation estimates (of marital status or ethnicity in this case) and Y_ABS are the equivalent data from the Census. Tanton et al. (2011) state that the SEI is interpreted in a similar way to the R² value, in that a higher figure indicates a better fit, only this time the fit is to the 45-degree line. Presented below are graphs and statistics for each of the variables used for the external validation (Figures 20-28). The plain black lines represent the 45-degree line around which the SEI is measured, while the blue lines represent the fit of the data through the R² statistic. The blue shading around this line represents the 95% confidence interval.
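The difference between the two measures can be seen in a short sketch. It rests on two assumptions: R² is taken as the squared Pearson correlation (the R² of a simple linear regression with an intercept), and the SEI is taken as one minus the ratio of the squared deviations from the identity line to the total squared deviations of the Census values about their mean. The function names and the toy counts are hypothetical.

```python
from statistics import mean

def r_squared(estimates, census):
    """Squared Pearson correlation between estimates and Census values."""
    m_est, m_cen = mean(estimates), mean(census)
    sxy = sum((e - m_est) * (c - m_cen) for e, c in zip(estimates, census))
    sxx = sum((e - m_est) ** 2 for e in estimates)
    syy = sum((c - m_cen) ** 2 for c in census)
    return sxy ** 2 / (sxx * syy)

def sei(estimates, census):
    """Assumed SEI form: 1 - sum((est - census)^2) / sum((census - mean)^2)."""
    m_cen = mean(census)
    around_identity = sum((e - c) ** 2 for e, c in zip(estimates, census))
    around_mean = sum((c - m_cen) ** 2 for c in census)
    return 1 - around_identity / around_mean

# Hypothetical Census counts for five zones.
census = [100, 150, 200, 250, 300]

# Estimates that are perfectly linear in the Census counts but sit 40 above
# the 45-degree line in every zone.
offset_estimates = [c + 40 for c in census]

r2_offset = r_squared(offset_estimates, census)  # perfect linear fit
sei_offset = sei(offset_estimates, census)       # approximately 0.68
```

In this toy case the estimates lie exactly on a straight line, so R² is 1, but every zone is overestimated by the same amount and the SEI falls to roughly 0.68. This is one way in which the two statistics can disagree for the same variable.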

Figure 20 – External validation using the marital status ‘single’ variable. R² = 0.973, SEI = 0.9481061

Figure 21 – External validation using the marital status ‘married’ variable. R² = 0.813,

Figure 22 – External validation using the marital status ‘civil partnership’ variable. R² = -0.0000744, SEI = -0.303626

Figure 23 – External validation using the marital status ‘separated’ variable. R² =

Figure 24 – External validation using the marital status ‘divorced’ variable. R² = 0.605, SEI = 0.5857174

Figure 25 – External validation using the marital status ‘widowed’ variable. R² =

Figure 26 – External validation using the ethnicity ‘White British/other white’ variable. R² = 0.5314, SEI = 0.4424715

Figure 27 – External validation using the ethnicity ‘Mixed/multiple ethnic group’

Figure 28 – External validation using the ethnicity ‘Other ethnic group’ variable. R² = 0.6123, SEI = 0.2682731

As can be seen from Figures 20 to 28, the results of the external validation were mixed, and varied far more than those of the internal validation. Many of the tests obtained R² values over 0.5, and SEI scores near to or above this figure. However, the scores for the ‘civil partnership’ variable (Figure 22) were very low, and the ‘separated’ and ‘mixed/multiple ethnic group’ variables also had very low SEI scores (Figures 23 and 27). The latter two variables also showed the greatest discordance between the R² value and the SEI value, an important reminder that the two statistics do not measure the data in the same way.

The general trend in the data seemed to be that the larger the counts associated with a given level of a variable, the better the validation scores, and that the variables that did not perform as well in the external validation had the smallest counts. This pattern was seen consistently throughout the external validation. Any errors in the simulated variables with lower counts would therefore make more of a difference proportionally than they would for a variable with larger counts. For example, an error of 6 is not that large out of a population of 1,000 (0.6%), whereas it would be out of a population of 15-20 (30-40%). Burden and Smith (2015) have commented on a similar issue regarding validation, stating that ‘the categories with small counts and those with high within-area homogeneity were the most highly variable’ (p.575). This is not an attempt to ‘blame the data’ or to excuse the performance of the model, however. It is an interesting discussion point, though, that external validation variables with smaller counts may make a difference to the results, and are perhaps not best suited to such tasks. The analysis also showed that the model both over- and underestimated the populations for all of the target variables, indicating that the counts of the validation variables are not the only potential issue. What counts as a ‘big’ or ‘small’ number is of course highly subjective, and may depend on the counts of the other variables in the model. Thus finding a suitable cut-off point for externally validating a model could be difficult if this approach were taken.

Part of the issue with the external validation may be due to the sample size. As already mentioned, Ryan et al. (2009) have commented that increased sample sizes lead to more accurate models. The sample size in this research was 5,388 when an initial model containing just tooth decay was completed; however, this number fell to 4,840 upon the addition of the extra variables associated with the theoretical pathways. This was due to the presence of missing data: any individual with missing data points had to be removed from the analysis, as spatial microsimulation cannot work with missing values. It may be that a larger sample would allow for a more accurate external validation. Despite these concerns, most of the variables that had higher counts (i.e. approaching 100 or above) still scored over 0.6 for both the R² and SEI, indicating a good, if not great, fit. Given these scores and those of the internal validation, the model was considered accurate enough to proceed with. Ideally more accurate values would have been obtained, and future work should investigate how the fit of such models could be improved; however, such analysis was beyond the scope of this research.

In document Neighbourhood effects: spatial inequalities in tooth decay (Page 147-156)