Developing measures of diseases of comfort
6 Geodemographic and health outcome analysis - diseases of comfort
6.3 Predicting smoking behaviours
6.3.4 Mapping smoking risk using kernel density estimation
To investigate the spatial distribution of the risk scores, neighbourhood scores associated with smoking behaviours were then mapped using point density estimation in a GIS. This method was chosen because it is commonly used within public sector organisations to create hotspot maps, although it is not strictly a methodology for defining spatial clusters (de Smith et al, 2008, online). It was also chosen because the technique facilitates the development of spatial representations of population health at small levels, through the production of density surfaces. As acknowledged by Lloyd (2005), there has been a move towards the use of surfaces to re-represent the original points in a more useful and visually understandable format (Martin, 1996 and Martin et al., 2000). The purpose of doing so in this research is to provide a generalised pattern of smoking behaviour and to indicate likely spatial clusters.
The process used for creating surfaces in this research is known as kernel density estimation (KDE). KDE is most commonly used to estimate population density and diseases or the incidence of crime (Chainey and Radcliffe, 2005). In this case study the process was used to create a density surface of smoking behaviour, based upon the point locations of postcodes and their attributed health behaviour, to explore the presence of any inherent spatial pattern. It is a useful process for this data because, as noted by Bailey and Gatrell (1995), it produces a smooth estimate of probability density from an observed set of inputs, in this case the centroids of unit postcodes ascribed with the smoking risks calculated in the previous section.
The KDE process assumes a pattern exists across the study area, not just at the point locations (points associated with code-point polygons, identifying unit postcodes). The technique produces output in the form of a grid, with each cell representing the estimated spatial density of events (Fotheringham et al, 2002; Martin, 1996; Martin et al., 2000). In this case study each cell represents the density of postcodes and their associated attribute values. A detailed description of the process and the formula are documented by Fotheringham and colleagues (2002, pages 146 to 149).
To create the surfaces a kernel was placed over each observation (the spatial coordinates of the postcode centroid) and was then used to spread the attribute values for smoking risk across the study area, according to a given radius. The kernel function was placed over each point in the study region (Greater London) to return a smoothed continuous distribution reflecting the density of postcodes which exhibit certain health behaviours. The resultant surface reflects the intensity of smoking risk at a given point.
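The kernel placement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GIS routine used in this research: the quartic kernel is assumed (the kernel function is not named in the text), and the function names, grid extent, coordinates and weights are all hypothetical. The toy weights are kept positive for simplicity.

```python
import math

def quartic_kernel(u):
    """2D quartic (biweight) kernel; zero at or beyond the bandwidth (u >= 1).
    The choice of a quartic kernel is an assumption for illustration."""
    if u >= 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - u * u) ** 2

def kde_surface(points, weights, bandwidth, cell, xmin, ymin, ncols, nrows):
    """Return a grid (list of rows) of weighted kernel density estimates.

    points    -- (x, y) postcode centroids, in metres
    weights   -- attribute value attached to each point (e.g. a risk score)
    bandwidth -- kernel radius, in metres
    cell      -- grid cell size, in metres
    """
    grid = []
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell  # y coordinate of the cell centre
        row = []
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell  # x coordinate of the cell centre
            total = 0.0
            for (px, py), w in zip(points, weights):
                d = math.hypot(cx - px, cy - py)
                # Each point spreads its weight over the cells within one bandwidth.
                total += w * quartic_kernel(d / bandwidth) / bandwidth ** 2
            row.append(total)
        grid.append(row)
    return grid

# Hypothetical toy data: three centroids with made-up attribute values.
points = [(100.0, 100.0), (130.0, 110.0), (400.0, 400.0)]
weights = [1.5, 1.0, 0.5]
surface = kde_surface(points, weights, bandwidth=150.0, cell=75.0,
                      xmin=0.0, ymin=0.0, ncols=7, nrows=7)
```

Cells close to several heavily weighted centroids receive the largest values, while cells further than one bandwidth from every centroid receive zero, which is how gaps in the residential pattern are preserved in the surface.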
To produce the density surfaces, the risk scores of smoking behaviour were transformed into z-scores in order to standardise the distribution (see Harris et al, 2005, page 152) by converting the scores into a standard form. This was done in order to ensure all the survey response variables analysed were brought onto the same scale. It is useful because the index scores are not normally distributed. The minimum index score is constrained by the 0 value, but the maximum value has no constraint; it can continue ad infinitum. An alternative to using z-scores would be to transform the scores using logarithms. The associated z-scores for neighbourhood smoking behaviours were assigned to the points of each code-point polygon (representative of postcode units).
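The transformation can be illustrated as follows. This is a sketch only: the index scores are hypothetical, the function name is illustrative, and the use of the population standard deviation is an assumption (the sample standard deviation is an equally common choice).

```python
import math
import statistics

def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: an assumption
    return [(v - mean) / sd for v in values]

# Hypothetical index scores, bounded below by 0 and unbounded above.
index_scores = [40.0, 64.5, 95.5, 100.0, 131.75, 205.0]
z = z_scores(index_scores)

# The logarithmic alternative mentioned in the text; requires values > 0.
log_scores = [math.log(v) for v in index_scores]
```

A z-score is a linear rescaling, so it brings different variables onto a common scale without changing the shape of the distribution; the logarithm, by contrast, compresses the long upper tail.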
The neighbourhood Types were used as the unique identifier to assign the scores to a spatial reference. A density surface of all smoking behaviours for Greater London was created (Figure 28) using kernel density estimation to highlight patterns and local concentrations of the behaviours. De Smith et al (2007, Chapter 4, online) note 'The choice of grid resolution does not affect the resulting surface, but should be meaningful within the context of the dataset being analysed'. To this end, a grid cell spacing of 75m was chosen because it approximates to the centroid radius of an urban postcode unit, the spatial resolution of the underlying data points. The next step was to choose the bandwidth. The bandwidth controls the amount of smoothing of the density surface, and "larger bandwidths will tend to highlight regional patterns, and smaller bandwidths will emphasise local patterns" (Fotheringham et al, 2002). The purpose of this exercise was to highlight local variations in health behaviour, so a number of bandwidths were tested, the results of which are seen in Figure 27.
Here, a bandwidth of 150m was chosen, which appeared to most effectively reflect residential populations and account for holes that represent urban structures such as parks and rivers.
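The effect of the bandwidth on such holes can be illustrated with a toy example. The sketch below is not taken from the analysis itself: it assumes a quartic kernel, unweighted points, and two hypothetical postcode centroids lying 420m apart on either side of an open space, and it evaluates the density at the midpoint of the gap for the three bandwidths compared in Figure 27.

```python
import math

def quartic_kernel(u):
    """2D quartic kernel (assumed for illustration); zero when u >= 1."""
    return (3.0 / math.pi) * (1.0 - u * u) ** 2 if u < 1.0 else 0.0

def density_at(x, y, points, bandwidth):
    """Unweighted kernel density estimate at location (x, y)."""
    total = sum(quartic_kernel(math.hypot(x - px, y - py) / bandwidth)
                for px, py in points)
    return total / bandwidth ** 2

# Two hypothetical centroids either side of a 420m wide open space.
points = [(0.0, 0.0), (420.0, 0.0)]

# Density at the middle of the gap (210m from each centroid).
gap_density = {h: density_at(210.0, 0.0, points, h)
               for h in (150.0, 200.0, 250.0)}
```

With bandwidths of 150m and 200m the midpoint lies outside both kernels and the hole is preserved; at 250m the two kernels overlap and the gap begins to fill in, which is the loss of local detail that a larger bandwidth produces.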
[Figure panels: bandwidth = 150m; bandwidth = 200m; bandwidth = 250m]
Figure 27: Surface of heavy smokers, made with varying kernel bandwidths
The advantages of using this technique will be discussed in the subsequent case study, but essentially it is used because it allows multiple grids to be combined into composite surfaces of health behaviours.
Using this method it was possible to identify neighbouring geographical locations with high concentrations of the predicted lifestyle behaviour across London. Its purpose was to indicate the relative incidence of neighbourhoods across Greater London most likely to be comprised of cigarette smokers, a useful visual tool for social marketers.
Percentile   Z-score   Index value      Category
25%          -0.905    Less than 64.5   Likelihood of smoking: considerably below average
50%          -0.182    64.5 to 95.5     Likelihood of smoking: below average
75%           0.663    95.5 to 131.75   Likelihood of smoking: above average
100%          2.371    131.75 to 205    Likelihood of smoking: considerably above average
Table 15: Summarisation of categories used to classify the all smokers map
Once the grid surface was created, the map was classified using inter-quartile ranges, summarised in Table 15, for all smokers. Figure 28 highlights the spatial distribution of z-scores for all types of smoking behaviours across Greater London. The visualisation of the map used a hot-cold cartographic technique. Neighbourhoods coloured in dark red represent locations within the top 25% of smoking z-scores. This corresponds to neighbourhoods where the likelihood of smoking related behaviour is considerably greater than the average for England. Neighbourhoods shaded in light yellow represent locations where the likelihood of residents smoking is considerably below the average for England. Indeed, neighbourhoods coloured in light yellow signify the 25% of London neighbourhoods least likely to smoke.
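The quartile classification can be expressed as a simple lookup against the z-score breaks listed in Table 15. This is an illustrative sketch: the function and variable names are hypothetical, and assigning a value that falls exactly on a break to the lower class is an assumption, since the text does not state how boundary values were handled.

```python
from bisect import bisect_left

# Upper z-score bound of the first three quartile classes, from Table 15.
BREAKS = [-0.905, -0.182, 0.663]
LABELS = [
    "considerably below average",
    "below average",
    "above average",
    "considerably above average",
]

def classify(z):
    """Return the Table 15 category for a z-score.
    Values equal to a break fall in the lower class (an assumption)."""
    return LABELS[bisect_left(BREAKS, z)]

# Hypothetical z-scores for four grid cells.
categories = [classify(z) for z in (-1.2, -0.5, 0.0, 2.0)]
```

In a hot-cold scheme of the kind described, the last category would be drawn in dark red and the first in light yellow.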
Figure 28: Map of index z-scores for all smokers
[Legend: risk likelihood of being a smoker (light/moderate/heavy), in six classes: high, moderately high, medium, medium low, low, very low. Scale bar: 0 to 10 kilometres.]
The results of the density surface indicated that neighbourhood propensity to smoke appears to be 'clustered' or concentrated. The spatial distribution highlighted neighbourhoods in the London suburbs as being the least likely to smoke, and the largest neighbourhood concentrations of smoking related behaviour are predominantly located in east London. Neighbourhoods in north east, east and south east London are most likely to comprise smokers.
One limitation is that these surfaces do not provide a measure of the concentration of population.
As should be expected, mapping of the non-smoking variable exhibited the inverse spatial distribution to the all smokers map. Once again z-scores were used to standardise the distribution, and these were mapped using the same technique outlined in the above paragraphs. The output was categorised using quartiles, but the z-score ranges were slightly different, as highlighted in Table 16.
Percentile   Z-score   Index value      Category
25%          -0.727    Less than 88.5   Likelihood of smoking: considerably below average
50%           0.166    88.5 to 102      Likelihood of smoking: below average
75%           0.927    102 to 113.5     Likelihood of smoking: above average
100%          1.622    113.5 to 124     Likelihood of smoking: considerably above average
Table 16: Summary of categories used to classify the map of non-smokers
The visual output of this process is shown in Figure 29. In this example, non-smoking behaviour is represented by high index scores and their corresponding z-scores. Again a hot-cold visualisation technique was used to categorise the quartile ranges. Neighbourhoods most likely to be non-smokers were categorised in dark red. Neighbourhoods in light yellow correspond to the 25% of neighbourhoods least likely to be non-smoking.
Figure 29 is the inverse of Figure 28, implying spatial concentrations of smoking neighbourhoods across London, as predicted by the Gini coefficients in section 6.3.2. Smoking patterns appear to be spatially and demographically segregated. Further analysis using spatial regression techniques would provide statistical validity to these results, but moves into the realm of explanatory analysis, which is beyond the scope of this thesis. The degree to which these neighbourhoods are clustered could be assessed by measuring the level of spatial autocorrelation (section 3.1.1.2), which measures the tendency of similar values to cluster in space. A local Moran's statistic (LISA) would identify and measure the extent of clustering in smoking neighbourhoods.
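For illustration, the global and local Moran's statistics mentioned above can be computed as follows. This analysis was not carried out in this research; the sketch assumes row-standardised binary contiguity weights, the function name and toy values are hypothetical, and the permutation test normally used to judge significance is omitted.

```python
def morans_i(values, neighbours):
    """Global and local Moran's I with row-standardised binary weights.

    values     -- one value (e.g. a smoking z-score) per area
    neighbours -- neighbours[i] lists the indices of areas adjacent to area i
    Returns (global_i, list_of_local_i).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # variance of the values
    local = []
    for i in range(n):
        # Spatial lag: mean deviation among the neighbours of area i.
        lag = sum(dev[j] for j in neighbours[i]) / len(neighbours[i])
        local.append(dev[i] * lag / m2)
    # With row-standardised weights the global statistic is the mean local one.
    return sum(local) / n, local

# Hypothetical example: six areas along a line, each adjacent to the next.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
i_clustered, local_clustered = morans_i([2.0, 1.8, 1.5, -1.4, -1.9, -2.0], chain)
i_alternating, _ = morans_i([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], chain)
```

Positive values indicate that similar scores sit next to one another, as in the first toy series, while the alternating series gives a strongly negative value. The local values show which individual areas contribute to the overall pattern.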
Figure 29: Map of index z-scores for non-smokers
[Legend: risk likelihood of not being a smoker, in six classes: high, moderately high, medium, medium low, low, very low. Scale bar: 0, 5 and 10 kilometres.]