Spatial autocorrelation - Spatial temporal analysis of the distribution of pediatric tuberculosis patterns in Kenya

Autocorrelation statistics for aggregated data provide an estimate of the degree of spatial similarity observed among neighboring values of an attribute over a study area.

A global measure is a single value (effectively an average for the entire study area) that applies to the whole data set, and it therefore assumes that the same pattern holds over the entire geographical area. Global spatial autocorrelation is used to test whether a variable shows spatial structure across the whole study area. If features that are close in location also tend to be similar in their attribute values, the pattern as a whole is said to show positive spatial autocorrelation. Conversely, negative spatial autocorrelation exists when features that are close together in space tend to be more dissimilar in their attributes than features that are farther apart. Finally, zero autocorrelation occurs when attribute values are independent of location.

A local measure, by contrast, is a value calculated for each observational unit (a unique number for each location), so it can reveal different patterns in different parts of the region.

3.4.1 Moran’s I

Moran's I coefficient of autocorrelation is used to quantify the similarity of an outcome variable among areas that are defined as spatially related.

In our study, the global Moran's I statistic, which measures the correlation among spatial observations, was used to characterize the global spatial pattern as clustered, dispersed or random (Waller, 2006). It was also used to evaluate autocorrelation in the spatial distribution of pediatric TB, that is, to test whether counties are clustered or dispersed in space. It is defined as:

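$$ I = \frac{N}{\sum_i \sum_j W_{ij}} \cdot \frac{\sum_i \sum_j W_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} $$

where $N$ is the number of counties, $x_i$ is the variable of interest (in our case, the incidence rate for county $i$), $\bar{x}$ is the mean of $x$, and $W_{ij}$ is an element of the matrix of spatial weights.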
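As an illustration only, the formula above can be sketched in plain Python. The function name `morans_i`, the list-of-lists weight matrix with zeros on the diagonal, and the four-area toy data are assumptions made for this example; they are not the study's data or the ArcGIS implementation.

```python
def morans_i(x, w):
    """Global Moran's I for values x and an N x N spatial weights matrix w.

    w is a list of lists with w[i][j] = W_ij and zeros on the diagonal.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]                      # x_i - x_bar
    s0 = sum(sum(row) for row in w)                  # sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical example: four areas in a row, each a neighbour of the next (binary weights).
W_LINE = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], W_LINE))   # smooth gradient: positive I (about 1/3)
print(morans_i([1, 4, 1, 4], W_LINE))   # alternating values: negative I (about -1)
```

The gradient places similar values next to each other (positive autocorrelation), while the alternating series places dissimilar values next to each other (negative autocorrelation).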

Fundamental to all autocorrelation statistics is the spatial weights matrix. It defines the spatial relationships among areas so that regions close in space, defined as neighbours, are given greater weight when calculating the statistic than those that are distant (Wikipedia, 2012, Feb 9). In this study, the inverse distance method was used to construct the spatial weights matrix, and the Global Moran's I tool in ArcGIS was used to calculate the Moran's I statistic.
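A minimal sketch of inverse-distance weighting, assuming each county is represented by a centroid in projected (planar) coordinates and that $W_{ij} = 1/d_{ij}$ with no distance cutoff. The function name and the `power` parameter are illustrative; GIS tools such as ArcGIS also offer variants (for example inverse distance squared, or a distance threshold) that this sketch does not reproduce.

```python
import math


def inverse_distance_weights(centroids, power=1.0):
    """Build an N x N matrix with W_ij = 1 / d_ij**power and W_ii = 0.

    centroids: list of (x, y) points in planar coordinates. Coincident
    centroids (d_ij == 0) would cause a division by zero and are not handled.
    """
    n = len(centroids)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(centroids[i], centroids[j])   # Euclidean distance
                w[i][j] = 1.0 / d ** power
    return w


# Hypothetical centroids for three areas; nearer pairs receive larger weights.
W = inverse_distance_weights([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
for row in W:
    print(row)
```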

Under the null hypothesis of no spatial autocorrelation, Moran's I has an expected value of −1/(N − 1) and is approximately normally distributed for large N. The expected value therefore approaches zero as N increases.
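A short worked example of how the expected value shrinks with the number of areas; the values of $N$ are illustrative, not the number of counties analysed in the study.

```latex
E[I] = -\frac{1}{N-1}, \qquad
N = 10:\; E[I] = -\tfrac{1}{9} \approx -0.111, \qquad
N = 100:\; E[I] = -\tfrac{1}{99} \approx -0.0101
```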

Negative (positive) values indicate negative (positive) spatial autocorrelation. Values typically range from about −1 (perfect dispersion, i.e. neighboring areas tend to have dissimilar attribute values) to about +1 (perfect correlation, i.e. clustering of areas with similar attribute values); the exact bounds depend on the spatial weights. A value close to the expected value −1/(N − 1), which is near zero for large N, indicates a random spatial pattern. For statistical hypothesis testing, Moran's I can be transformed to a Z-score, z = (I − E[I]) / √Var(I); values greater than 1.96 or smaller than −1.96 indicate spatial autocorrelation that is significant at the 5% level.
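The Z-score transformation can be sketched as below under the normality assumption, using the standard null-hypothesis moments of Moran's I: $E[I] = -1/(N-1)$ and $E[I^2] = (N^2 S_1 - N S_2 + 3 S_0^2) / ((N^2 - 1) S_0^2)$, where $S_0 = \sum_i\sum_j W_{ij}$, $S_1 = \tfrac12 \sum_i\sum_j (W_{ij} + W_{ji})^2$ and $S_2 = \sum_i (\sum_j W_{ij} + \sum_j W_{ji})^2$. This is an illustrative sketch, not necessarily the variance ArcGIS uses; software may instead use the randomization assumption or a permutation test, which give somewhat different Z-scores. The function name and toy data are hypothetical.

```python
import math


def morans_i_z_normal(x, w):
    """Return (I, E[I], z) for Moran's I, with z under the normality assumption.

    w is an N x N list of lists (W_ij) with zeros on the diagonal.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    i_obs = (n / s0) * cross / sum(d * d for d in dev)

    # Null-hypothesis moments of I (normality assumption).
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    e_i = -1.0 / (n - 1)
    e_i2 = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0)
    z = (i_obs - e_i) / math.sqrt(e_i2 - e_i ** 2)
    return i_obs, e_i, z


# Hypothetical four-area example with binary "next-door" weights.
W_LINE = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
i_obs, e_i, z = morans_i_z_normal([1, 2, 3, 4], W_LINE)
print(round(i_obs, 4), round(e_i, 4), round(z, 4), "significant at 5%:", abs(z) > 1.96)
```

With only four areas, even a perfectly smooth gradient gives z = √3 ≈ 1.73, which is not significant at the 5% level; small numbers of areas leave little statistical power.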

Moran's I can be calculated using various software packages, including ClusterSeer, R, GeoDa and ArcGIS. For our study, we used ArcGIS to compute the statistic.

3.4.2 Local indicators of Spatial Association (LISA)

LISA is the local version of Moran's I. It detects local spatial autocorrelation in aggregated data by decomposing the global Moran's I statistic into contributions from each area within the study region. The statistic is used to identify where clustering occurs and where spatial outliers are located. It also allows us to map the polygons that have a statistically significant relationship with their neighbors and to show the type of that relationship. It is calculated as:

$$ I_i = Z_i \sum_{j \ne i} W_{ij} Z_j $$

where $Z_i$ and $Z_j$ are the observed values in standardized form and $W_{ij}$ is an element of the spatial weights matrix, with

$$ Z_i = \frac{x_i - \bar{x}}{SD_x} $$
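The local statistic can be sketched in a few lines of Python. This assumes $SD_x$ is the population standard deviation (dividing by $N$), which is one common convention, and uses a hypothetical four-area example with binary weights. With these choices the local values sum to $S_0 = \sum_i\sum_j W_{ij}$ times the global Moran's I, which is the decomposition described above.

```python
import statistics


def local_morans_i(x, w):
    """Local Moran's I_i = Z_i * sum_{j != i} W_ij * Z_j for every area i."""
    mean = statistics.fmean(x)
    sd = statistics.pstdev(x)                     # population SD (assumed convention)
    z = [(v - mean) / sd for v in x]              # standardized values Z_i
    n = len(x)
    return [z[i] * sum(w[i][j] * z[j] for j in range(n) if j != i) for i in range(n)]


# Hypothetical four-area example with binary "next-door" weights.
W_LINE = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
local = local_morans_i([1, 2, 3, 4], W_LINE)
print([round(v, 3) for v in local])   # each area resembles its neighbours: all positive
```

Here the weights sum to 6 and the local values sum to 2, so their ratio, 1/3, equals the global Moran's I of the same data.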

ArcGIS was used to construct the spatial weights matrix using the inverse distance method. In this method, a centroid was determined for each county, and the other counties were then assigned weights based on their distance from it, so that neighbouring (nearer) counties received greater weight.
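The centroid step can be illustrated with the standard area-weighted (shoelace) centroid of a simple polygon. This is a sketch under assumptions: the county boundary is a single ring of planar (projected) vertices without holes, and ArcGIS's own centroid computation may differ in detail. The function name and the test polygons are hypothetical.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    vertices: ordered (x, y) points, clockwise or counter-clockwise,
    without repeating the first point at the end.
    """
    area2 = cx = cy = 0.0            # area2 accumulates twice the signed area
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        cross = x0 * y1 - x1 * y0    # shoelace term for edge k
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * A) with A = area2 / 2, i.e. sum / (3 * area2).
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Hypothetical L-shaped "county" made of a 2x1 block and a 1x1 block.
print(polygon_centroid([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]))
```

The resulting centroids are the points between which distances, and hence inverse-distance weights, would be measured.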

Using the resulting standardized spatial weights matrix, we calculated standardized z-scores and classified the polygons into five categories, each marked with a different colour in the map legend according to the type of relationship.
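Assuming "standardized" here refers to row standardization (each row of the weights matrix divided by its row sum, an option offered by ArcGIS and similar tools), the step can be sketched as follows. The matrix is a hypothetical set of inverse-distance weights for three areas.

```python
def row_standardize(w):
    """Divide each row of a spatial weights matrix by its row sum.

    Rows with no neighbours (sum 0) are left as zeros to avoid dividing by zero.
    """
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


# Hypothetical inverse-distance weights (W_ij = 1 / d_ij) for three areas.
W = [[0.0, 1.0, 1 / 3], [1.0, 0.0, 0.5], [1 / 3, 0.5, 0.0]]
W_STD = row_standardize(W)
for row in W_STD:
    print([round(v, 3) for v in row])   # every row now sums to 1
```

Row standardization makes each area's weights comparable regardless of how many neighbours it has, at the cost of making the matrix generally asymmetric.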

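High–high: a county with a high number of pediatric TB cases surrounded by counties with high numbers of cases.
High–low: a county with a high number of pediatric TB cases surrounded by counties with low numbers of cases.
Low–high: a county with a low number of pediatric TB cases surrounded by counties with high numbers of cases.
Low–low: a county with a low number of pediatric TB cases surrounded by counties with low numbers of cases.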
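The four relationship types, plus a fifth "not significant" class (an assumption about what the fifth legend category contains), can be sketched as a simple rule: compare the sign of a county's standardized value $Z_i$ with the sign of its spatial lag $\sum_j W_{ij} Z_j$, but only when the local statistic is significant. The 1.96 cut-off (5% level), the function name and the inputs are assumptions for illustration; the significance test itself is taken as given.

```python
def lisa_category(z_i, lag_i, z_score, cutoff=1.96):
    """Classify one area from its standardized value Z_i, its spatial lag
    sum_j W_ij * Z_j, and the z-score of its local Moran's I."""
    if abs(z_score) < cutoff:
        return "Not significant"
    if z_i >= 0:
        return "High-High" if lag_i >= 0 else "High-Low"
    return "Low-High" if lag_i >= 0 else "Low-Low"


# Hypothetical inputs: (Z_i, spatial lag, z-score of I_i).
for args in [(1.2, 0.8, 2.5), (1.2, -0.5, -2.3), (-1.0, 0.7, -2.1),
             (-1.0, -0.9, 3.0), (0.3, 0.2, 1.0)]:
    print(args, lisa_category(*args))
```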

3.4.3 Hot spot detection (Getis and Ord’s local statistic)

The local $G_i(d)$ statistic (Dirk et al., 2008) is an indicator of local clustering that measures the 'concentration' of a spatially distributed attribute variable. The statistic helped us identify 'hot spots' in the spatial data. It is defined as:

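$$ G_i(d) = \frac{\sum_{j \ne i} W_{ij}(d)\,(x_j - \bar{x}_i)}{S_i \sqrt{\dfrac{W_i\,(n - 1 - W_i)}{n - 2}}} $$

where $n$ is the number of areas within the region of interest, $x_i$ is the observed value for area $i$, $W_{ij}(d)$ is an element of a symmetric spatial weights matrix (1 when area $j$ lies within distance $d$ of area $i$, 0 otherwise), $S_i$ is the standard deviation of the $x_j$ with $j \ne i$, and

$$ \bar{x}_i = \frac{\sum_{j \ne i} x_j}{n - 1}, \qquad W_i = \sum_{j \ne i} W_{ij}(d) $$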
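A minimal sketch of the $G_i(d)$ formula above, assuming binary distance-band weights ($W_{ij}(d) = 1$ when the centroids of areas $i$ and $j$ are at most $d$ apart) and planar coordinates. The function name, the distance band and the five-area toy data are hypothetical. GIS hot spot tools such as ArcGIS's Hot Spot Analysis report the closely related $G_i^*$ variant, which also includes area $i$ itself.

```python
import math


def getis_ord_gi(x, coords, d):
    """Standardized G_i(d) for each area (sums over j != i, binary weights).

    Returns NaN where the statistic is undefined (e.g. W_i = 0 or W_i = n - 1).
    """
    n = len(x)
    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # W_ij(d) = 1 if area j lies within distance d of area i, else 0.
        w = [1.0 if math.dist(coords[i], coords[j]) <= d else 0.0 for j in others]
        xs = [x[j] for j in others]
        w_i = sum(w)                                         # W_i
        mean_i = sum(xs) / (n - 1)                           # x-bar_i
        var_i = max(0.0, sum(v * v for v in xs) / (n - 1) - mean_i ** 2)
        s_i = math.sqrt(var_i)                               # S_i
        num = sum(wj * (xj - mean_i) for wj, xj in zip(w, xs))
        den = s_i * math.sqrt(w_i * (n - 1 - w_i) / (n - 2))
        result.append(num / den if den > 0 else float("nan"))
    return result


# Hypothetical example: five areas on a line, high values concentrated on the left.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
gi = getis_ord_gi([10, 9, 1, 1, 1], coords, d=1.0)
print([round(g, 3) for g in gi])   # positive scores on the left suggest a hot spot
```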

The polygons were classified into seven categories, each marked with a different colour in the map legend according to its z-score range.
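One common way to obtain seven classes, resembling the confidence-level bins reported by ArcGIS's Hot Spot Analysis, is to bin the z-scores into hot or cold spots at 99%, 95% and 90% confidence plus a "not significant" class. The sketch below assumes the two-sided normal cut-offs 2.576, 1.960 and 1.645; whether the study's legend used exactly these ranges is an assumption, and the names are hypothetical.

```python
# Two-sided normal cut-offs for 99%, 95% and 90% confidence (assumed thresholds).
CUTOFFS = ((3, 2.576), (2, 1.960), (1, 1.645))
LABELS = {
    3: "Hot spot, 99% confidence", 2: "Hot spot, 95% confidence",
    1: "Hot spot, 90% confidence", 0: "Not significant",
    -1: "Cold spot, 90% confidence", -2: "Cold spot, 95% confidence",
    -3: "Cold spot, 99% confidence",
}


def hot_spot_bin(z):
    """Map a G_i z-score to one of seven classes, -3 (coldest) to 3 (hottest)."""
    for level, cutoff in CUTOFFS:
        if abs(z) >= cutoff:
            return level if z > 0 else -level
    return 0


for z in (3.1, 2.0, 1.7, 0.4, -1.8, -2.2, -2.9):
    print(z, LABELS[hot_spot_bin(z)])
```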

In document Spatial temporal analysis of the distribution of pediatric tuberculosis patterns in Kenya (Page 32-36)