Extending the Multilevel Model - Searching for Spatial Processes

3.5. Searching for Spatial Processes

3.5.2 Extending the Multilevel Model

Standard MLMs require at least two levels of data: an individual level and a group level. For the decennial Census of the United Kingdom, full individual records are not available because of confidentiality requirements. It is possible to access a 2% individual sample (the SAR) at a coarse geographical level, but this is of limited use for traditional multilevel modelling because it contains no identifiers for an individual’s location below the coarse SAR district level. It is not possible, for instance, to assign individuals to the ED within the SAR district in which they live. Consequently, it is not practical to use standard MLM techniques to analyse Census data for areal units below the SAR level. However, Tranmer and Steel (2001) have shown that it is possible to estimate these structures without full individual level data, and without losing significant efficiency, by making use of additional ED level data.

It is possible to express the traditional multilevel model in the following manner:

$$ y_{ig} = \mu + u_g + \varepsilon_{ig} \qquad (10) $$

Where: $y_{ig}$ is the value of the variable of interest for the $i$th individual in the $g$th area (ED in the case taken here);

$\mu$ is the overall population mean, in the SAR;

$u_g$ is the area level component, and;

$\varepsilon_{ig}$ is the individual level component.

In terms of understanding the spatial processes that occur within geographical data, the $u_g$ term is the most useful as it reflects these processes. As the area effects will represent the interactions between people living in an area, it is likely that they would not be fully identifiable if an analysis were conducted purely at the individual level. Within this model, there are a number of important assumptions that must be taken into account. One assumption is that the processes that occur within the data occur solely at the levels available for analysis. When using real world data, such as the Census, it is unlikely that this assumption will remain valid. Thus, it would be useful to be able to provide an estimate of the areal level variance component that is free from such constraints. This can, through further analysis, enable the estimation of the higher level processes within the data.
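The structure of the model in Equation 10 can be illustrated with a small simulation. The following is a minimal sketch only and is not part of the original analysis: the function name `simulate_two_level`, the parameter values and the assumption of normally distributed area and individual components are all illustrative choices. It shows how the variation between area means mixes the area effect with individual level random variation.

```python
import random
import statistics

def simulate_two_level(n_areas, n_per_area, mu, sd_area, sd_indiv, seed=0):
    """Return one list of simulated y values per area (hypothetical helper)."""
    rng = random.Random(seed)
    areas = []
    for _ in range(n_areas):
        u_g = rng.gauss(0.0, sd_area)  # area effect, shared by everyone in the area
        # Each individual adds their own random component to mu + u_g.
        areas.append([mu + u_g + rng.gauss(0.0, sd_indiv) for _ in range(n_per_area)])
    return areas

# Illustrative settings only: 200 areas of 50 individuals each.
areas = simulate_two_level(n_areas=200, n_per_area=50, mu=10.0, sd_area=1.0, sd_indiv=3.0)
area_means = [statistics.fmean(ys) for ys in areas]

# The variance of the area means mixes both components. In expectation it is
# sd_area**2 + sd_indiv**2 / n_per_area = 1.0 + 9.0 / 50 = 1.18 for these settings.
print(round(statistics.pvariance(area_means), 2))
```

With these illustrative settings the variance of the area means is expected to exceed the variance of the area effects alone, which is why raw area means tend to overstate area level variation.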

3.5.2.1 Local Multilevel analysis

We will consider an example, which uses the SAR districts as the regions in which our analysis will be contained. The individual level data necessary will be taken from the 2% SAR, while the areal units are EDs.

The estimator of $u_g$ will be denoted as $\hat{u}_g$, and is an estimate of ED level effects. Mathematically, it can be defined as:

$$ \hat{u}_g = w_g (\bar{y}_g - \bar{y}) \qquad (11) $$

Where: $w_g$ is a weighting term;

$\bar{y}_g$ is the observed mean of the variable in the ED in question, and;

$\bar{y}$ is the overall observed mean of the variable for the whole (SAR) district.

The weight ($w_g$) can be calculated by the following equation:

$$ w_g = \frac{n_g \delta^{(2)}}{1 + (n_g - 1)\,\delta^{(2)}} \qquad (12) $$

Where: $n_g$ is the number of observations in the $g$th ED, and;

$\delta^{(2)}$ is the intra-area correlation of the ED for the variable, as defined in the previous section (Pers. Com. Steel 2002).

These estimated group effects allow for the variation between group means that could arise from purely individual level random variation. Applying the weights $w_g$ shrinks the deviations of the areal means from the overall mean to allow for the likely impact of individual level variation, thus controlling for potential outlier values.
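The calculation in Equations 11 and 12 can be sketched in a few lines of code. The example is illustrative only: the ED names, means, counts and the intra-area correlation of 0.02 are invented values, and the function names are hypothetical.

```python
def shrinkage_weight(n_g, iac):
    """Equation 12: weight for an area with n_g observations.

    iac is the intra-area correlation (an assumed input here).
    """
    return n_g * iac / (1.0 + (n_g - 1) * iac)

def estimate_area_effects(area_means, area_sizes, overall_mean, iac):
    """Equation 11: shrink each area's deviation from the overall mean."""
    effects = {}
    for area, mean_g in area_means.items():
        w_g = shrinkage_weight(area_sizes[area], iac)
        effects[area] = w_g * (mean_g - overall_mean)
    return effects

# Invented values for three EDs within one district.
ed_means = {"ED_A": 0.30, "ED_B": 0.10, "ED_C": 0.22}
ed_sizes = {"ED_A": 400, "ED_B": 25, "ED_C": 150}

effects = estimate_area_effects(ed_means, ed_sizes, overall_mean=0.20, iac=0.02)
for ed, effect in effects.items():
    # Print the ED, its weight and its shrunken effect.
    print(ed, round(shrinkage_weight(ed_sizes[ed], 0.02), 3), round(effect, 4))
```

In this invented example the smallest unit, ED_B, receives a weight of about 0.34 and is pulled most strongly towards the overall mean, while ED_A, with 400 observations, receives a weight of about 0.89 and retains most of its deviation.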

3.5.2.2 Identifying and Using Spatial Autocorrelation

Analysis of the $\hat{u}_g$ values can be used to determine the processes occurring in the data between the areal units (in this case EDs), as each $\hat{u}_g$ value is an indication of the group level effect within that unit. Therefore, $\hat{u}_g$ values that are similar can be said to be the result of similar processes operating at the areal unit level. Measures of spatial autocorrelation of the group level effects can be used to determine the geography of the processes. Consequently, these analyses will be able to show whether or not the spatial processes operate at the same scales as the standard Census units. Where they do not, this can be identified as clustering at a different level to that used in the analysis, through the setting of limits on the range of potential Local Moran’s I values observed. This is explored in greater detail below.

In the discussion that follows, the basic units used will be at the ED level of aggregation. Instances of spatial autocorrelation of $\hat{u}_g$ will point to the existence of larger scale processes. If, as the research of Tranmer and Steel (2001) suggests, greater spatial autocorrelation brings greater effects of the MAUP (scale) on statistical analysis, then this will be identifiable from this analysis. Moreover, this technique could identify processes that operate between the standard Census levels, such as at a level of aggregation halfway between the ED and Ward levels. If it were possible to recognise this, then it would be possible to better inform Census users as to how to perform their analysis. Once the scale processes have been defined, the $\hat{u}_g$ values can be interpreted and used to suggest a definition of a higher aggregation level for the data. If the analysis were carried out on British Census data, at the individual (SAR) and ED levels, then the subsequent analysis could suggest a more appropriate construction for the higher level of aggregation for the Census data, given the data structure of the variable under investigation. Furthermore, it would also be able to demonstrate how well the current Ward structure matched the autocorrelation (which indicates the extent of any spatial processes) apparent within the data. This would enable users to develop their expectations of the level of MAUP effect that occurs within their analysis of a given data structure.

The pattern of the processes within the data can be explored using the concepts of spatial autocorrelation. There are a number of measures of spatial autocorrelation, the most common of which are Geary’s C statistic and Moran’s I. These measures are similar, and the analysis below uses a version of Moran’s I. This measure is “analogous to a covariance between the values of a pair of objects” (Goodchild, 1986, p.17), measuring the differences between the values for attributes that, in this case, exist within a given spatial proximity. Figure 3.2 presents the three extreme cases of spatial autocorrelation, against which the results in Chapter 7 can be compared.

Figure 3.2: The three types of spatial autocorrelation: a) positive spatial autocorrelation; b) no spatial autocorrelation, and; c) negative spatial autocorrelation (from O’Sullivan and Unwin, 2003).
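To make the extreme cases in Figure 3.2 concrete, the sketch below computes the global Moran’s I for two artificial patterns on a 6 by 6 grid. It assumes binary rook contiguity weights (cells that share an edge are neighbours), which is a simplification chosen for illustration and is not the distance weighting used later in this section. The function names and the patterns are hypothetical.

```python
def rook_neighbours(rows, cols):
    """Map each cell index to the indices of the cells sharing an edge with it."""
    neighbours = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacent.append(rr * cols + cc)
            neighbours[r * cols + c] = adjacent
    return neighbours

def global_morans_i(values, neighbours):
    """Global Moran's I with binary weights (1 for neighbours, 0 otherwise)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of cross-products of deviations over all neighbouring pairs.
    cross = sum(dev[i] * dev[j] for i, adj in neighbours.items() for j in adj)
    s0 = sum(len(adj) for adj in neighbours.values())  # sum of all weights
    return (n / s0) * cross / sum(d * d for d in dev)

rows = cols = 6
nb = rook_neighbours(rows, cols)

# Left half high, right half low: like values sit next to like values.
clustered = [1.0 if c < cols // 2 else 0.0 for r in range(rows) for c in range(cols)]
# Checkerboard: every neighbour has the opposite value.
checkerboard = [float((r + c) % 2) for r in range(rows) for c in range(cols)]

print(round(global_morans_i(clustered, nb), 3))
print(round(global_morans_i(checkerboard, nb), 3))
```

Under these assumptions the split pattern gives I = 0.8 (strong positive autocorrelation) and the checkerboard gives I = -1 (perfect negative autocorrelation); a pattern with no spatial structure would give a value close to zero.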

However, to determine a spatial pattern, and therefore process, within a given dataset the standard (global) measures of spatial autocorrelation are inadequate: some of the SAR districts could contain as many as 5000 EDs, and there is no guarantee that the extent of the spatial autocorrelation will be constant within the study area. Consequently, a measure from the suite of tools known as Local Indicators of Spatial Association (or LISA) is required (Anselin, 1995). One of these tools is the Local Moran’s I, which is a variant of the Global Moran’s I. In the Local Moran’s I, individual values are determined for all of the units in an analysis area. The form of the Local Moran’s I is as follows:



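$$ I_g = \frac{\hat{u}_g - \bar{\hat{u}}}{S_Z^2} \sum_h \left[ W_{gh} \left( \hat{u}_h - \bar{\hat{u}} \right) \right] \qquad (13) $$

Where: $I_g$ is the Local Moran’s value;

$\bar{\hat{u}}$ is the mean value of all observations;

$\hat{u}_h$ is the estimated area-level effect for unit $h$;

$\hat{u}_g$ is the area estimate of the variable for unit $g$;

$S_Z^2$ is the variance over all observations, and;

$W_{gh}$ is a distance weight that can be defined by

$$ W_{gh} = \frac{1}{d_{gh}} \qquad (14) $$

where $d_{gh}$ is the distance between units $g$ and $h$;

from CrimeStat (2003, p.288) and Levine (1996)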
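The calculation of the Local Moran’s I with inverse distance weights (Equations 13 and 14) can be sketched as follows. This is a minimal illustration under stated assumptions and not the CrimeStat implementation: distances are Euclidean distances between hypothetical ED centroids, the variance uses a divisor of n, a unit has no weight with itself, and the coordinates and effect values are invented.

```python
import math

def inverse_distance_weights(coords):
    """Equation 14: weight 1 / distance between two units; zero on the diagonal.

    Assumes no two centroids coincide (a zero distance would divide by zero).
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for g in range(n):
        for h in range(n):
            if g != h:
                w[g][h] = 1.0 / math.dist(coords[g], coords[h])
    return w

def local_morans_i(effects, weights):
    """Equation 13, evaluated for every unit g."""
    n = len(effects)
    mean = sum(effects) / n
    variance = sum((u - mean) ** 2 for u in effects) / n  # divisor n is an assumption
    result = []
    for g in range(n):
        # Distance-weighted sum of the other units' deviations from the mean.
        lag = sum(weights[g][h] * (effects[h] - mean) for h in range(n))
        result.append((effects[g] - mean) / variance * lag)
    return result

# Invented example: three nearby EDs with high effects, two distant EDs with low effects.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
area_effects = [0.9, 1.1, 1.0, -1.4, -1.6]

weights = inverse_distance_weights(centroids)
local_i = local_morans_i(area_effects, weights)
for g, value in enumerate(local_i):
    print(g, round(value, 3))
```

In this invented example every unit is surrounded mainly by units with similar effects, so every local value is positive.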

Hence a value for the Local Moran’s I can be computed for each areal unit in the region. To enable comparison, it is possible to calculate a ‘standardised’ version of the Local Moran’s I that takes into account its sampling error, and it is this that is referred to in the following analysis. The standardisation is carried out using the following function:

$$ Z(I_i) = \frac{I_i - E(I_i)}{S(I_i)} \qquad (15) $$

Where: $Z(I_i)$ is the value of the standardised Local Moran’s I;

$I_i$ is the Local Moran’s I value for the variable under analysis of areal unit $i$ at the ED level;

$E(I_i)$ is the mean (expected) Local Moran’s I value for areal unit $i$, and;

$S(I_i)$ is the standard deviation of the Local Moran’s I for areal unit $i$.
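The standardisation in Equation 15 requires a mean and a standard deviation of the Local Moran’s I for each unit. The sketch below does not reproduce the values computed by CrimeStat. As a plainly different but commonly used alternative, it estimates both quantities by conditional permutation, in the spirit of Anselin (1995): the value at unit i is held fixed while the remaining values are shuffled across the other units. All data and function names are hypothetical.

```python
import random
import statistics

def local_i(i, values, weights):
    """Local Moran's I for unit i (variance with divisor n is an assumption)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    lag = sum(weights[i][j] * (values[j] - mean) for j in range(n) if j != i)
    return (values[i] - mean) / variance * lag

def standardised_local_i(i, values, weights, n_perm=999, seed=0):
    """Standardised value for unit i, with mean and spread taken from permutations."""
    rng = random.Random(seed)
    observed = local_i(i, values, weights)
    others = [v for j, v in enumerate(values) if j != i]
    simulated = []
    for _ in range(n_perm):
        rng.shuffle(others)                               # reshuffle every unit except i
        permuted = others[:i] + [values[i]] + others[i:]  # unit i keeps its own value
        simulated.append(local_i(i, permuted, weights))
    expected = statistics.fmean(simulated)
    spread = statistics.pstdev(simulated)
    return (observed - expected) / spread

# Twelve hypothetical EDs on a line, one unit apart, with inverse distance weights.
n_units = 12
weights = [[0.0 if i == j else 1.0 / abs(i - j) for j in range(n_units)]
           for i in range(n_units)]
values = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, -2.0, -1.9, -2.1, -2.2, -1.8, -2.0]

z_2 = standardised_local_i(2, values, weights)
print(round(z_2, 2))
```

Unit 2 sits inside a run of similar high values, so its standardised value is positive. The number of permutations (999) and the random seed are arbitrary choices for the sketch.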

These standardised results are presented in map form, along with the estimates of area effects, in the following section. The range of values for the Local Moran’s I is far greater than for the global Moran’s I measure, which typically falls between -1 and 1. After standardisation, however, the Local Moran’s I is expressed in units of its standard deviation. A negative value indicates negative spatial autocorrelation, where geographically ‘close’ values are less similar than would be expected if there were no spatial autocorrelation, while a value of zero indicates complete spatial independence. Strong positive spatial autocorrelation is denoted by high positive values. In practice it is unlikely that very high positive or negative values will be observed. Because the Local Moran’s I is standardised, the results for different districts can be compared. Moreover, we take any values of the standardised Local Moran’s I that are either below -3 or above +3 to indicate significant spatial processes between the areal units. These bounds were chosen because they correspond approximately to a 99% confidence interval, expressed in standard deviations, so it is likely that all significant clustering would be identified using these limits.
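The decision rule described above can be sketched as follows. The standardised values are invented, the function names are hypothetical, and `normal_coverage` assumes that the standardised values are approximately normally distributed, which is an assumption of this sketch and not a result from the analysis.

```python
import math

def classify(z_scores, threshold=3.0):
    """Label each unit according to the +/- threshold rule."""
    labels = {}
    for unit, z in z_scores.items():
        if z > threshold:
            labels[unit] = "positive"
        elif z < -threshold:
            labels[unit] = "negative"
        else:
            labels[unit] = "not significant"
    return labels

def normal_coverage(threshold):
    """Probability that a standard normal value lies within +/- threshold."""
    return math.erf(threshold / math.sqrt(2.0))

# Made-up standardised Local Moran's I values for four EDs.
z_scores = {"ED_1": 4.2, "ED_2": -0.6, "ED_3": -3.5, "ED_4": 2.9}

print(classify(z_scores))
print(round(normal_coverage(3.0), 4))  # 0.9973
```

Under the normality assumption, limits of three standard deviations cover about 99.7% of values, so the rule is slightly more conservative than a nominal 99% interval.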

In document The modifiable areal unit phenomenon: an investigation into the scale effect using UK census data (Page 93-98)