Introduction to spatial data analysis

Scuola di Dottorato in Economia, La Sapienza, 2015/2016

Instructors: Filippo Celata, Federico Martellozzo and Luca Salvati http://www.memotef.uniroma1.it/node/6524

Spatial statistics:

- f(location, distance, ...)

- to identify invisible geographical properties of data (e.g. distribution patterns)

- spatial association: to verify the degree of similarity of spatial events as a function of their distance

John Snow’s map of cholera in London, 1854

Types of spatial association:

1. Due to spatial dependence between geographical features (e.g. similar plants require similar soils)

2. Due to spatial autocorrelation: the presence of a certain event increases the probability of finding similar events nearby, because of a reciprocal influence or «real contagion» (e.g. similar plants cluster because they are generated by other similar plants)

Methods:

A. Analysis of the spatial distribution of a pre-selected set of similar events (point patterns or point processes) (e.g. firms owned by foreign-born entrepreneurs)

B. Autocorrelation analysis: the degree to which nearby features are more similar than distant ones (to identify relations between proximity and intensity; polygons)

1. (Simple) spatial distribution measures

- Spatial distribution

Case field: to identify different centres for different categories of features (marked point pattern)

Weight: absolute vs. relative centrality

MEDIAN CENTER / MEAN CENTER

Do: the distribution of firms owned by foreign-born entrepreneurs. Identify and render the mean center (spatial statistics / measuring geog. distr. / …) for firms owned by Bangladeshis, Egyptians, Romanians, Chinese and Libyans (input: lez3/rm_immigDT.shp; weight field: “ADD08”; case field: “ORIGINE”)
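As a rough illustration of what a weighted mean center with a case field computes, here is a minimal Python sketch. The function name `mean_centers`, the tuple layout and the sample firms are hypothetical; this is not the code behind the ArcGIS tool, only the underlying arithmetic (the weighted average of x and y, computed separately for each category).

```python
from collections import defaultdict

def mean_centers(points):
    """points: iterable of (x, y, weight, case) tuples.
    Returns {case: (weighted mean x, weighted mean y)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])  # sum(w*x), sum(w*y), sum(w)
    for x, y, w, case in points:
        acc = sums[case]
        acc[0] += w * x
        acc[1] += w * y
        acc[2] += w
    return {case: (sx / sw, sy / sw)
            for case, (sx, sy, sw) in sums.items() if sw > 0}

# Hypothetical firms: (x, y, weight, category)
firms = [
    (0.0, 0.0, 1, "A"),
    (10.0, 0.0, 3, "A"),
    (4.0, 4.0, 2, "B"),
]
centers = mean_centers(firms)
print(centers)
```

With all weights set to 1 the result is the unweighted mean center, which corresponds to the "absolute" centrality of each group.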

Do a kernel density map of firms owned by foreign-born entrepreneurs: spatial analyst / density / kernel density (input: rm_immigDT.shp; population field: CNT; cell size: 10 m (or the average distance between all points); search radius: 2,000 m; in “environments”: extent and raster analysis / mask = zoneurbanistiche.shp).
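The roles of the cell size, search radius and population field can be sketched in a few lines of Python. The quartic kernel used here is an assumption (it is one common choice for this kind of tool), and the function names and sample point are hypothetical; the sketch ignores masks and extents.

```python
import math

def kernel_density(points, x, y, radius):
    """Quartic-kernel density at (x, y).
    points: iterable of (px, py, population); radius: search radius."""
    total = 0.0
    for px, py, pop in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 < radius ** 2:  # only points inside the search radius contribute
            total += pop * (1.0 - d2 / radius ** 2) ** 2
    return 3.0 / (math.pi * radius ** 2) * total

def density_grid(points, xmin, ymin, ncols, nrows, cell, radius):
    """Evaluate the density at the centre of every raster cell (list of rows)."""
    return [[kernel_density(points,
                            xmin + (c + 0.5) * cell,
                            ymin + (r + 0.5) * cell,
                            radius)
             for c in range(ncols)]
            for r in range(nrows)]

# Hypothetical example: one firm at (5, 5), a 2 x 2 raster with 5 m cells
grid = density_grid([(5.0, 5.0, 1)], 0.0, 0.0, 2, 2, 5.0, 10.0)
print(grid)
```

A larger search radius gives a smoother surface; a smaller cell size only refines the raster on which that surface is sampled.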

Mapping: modify the symbology of both output layers, and go to view / layout view to export the map (.tif, 300 dpi)

Discrete vs. surface statistical analysis. E.g. surface-based indicators (-> map algebra)

-> measures of spatial segregation

S (surface-based) segregation index (numerator) (O’Sullivan-Wong 2007): the ‘local’ contribution to global spatial segregation = difference between the max and min values at any point of the kernel density surfaces (e.g. Italians/Chinese: max(pCi, pIi) - min(pCi, pIi))

(Discrete) segregation index: relation between two normalized or standardized density coefficients (e.g. normalized density of firms owned by Chinese / normalized density of all firms) (from -1 to +1).
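The map-algebra step behind the surface-based index can be sketched as follows. This assumes two density rasters of equal shape, each rescaled to sum to 1, and it assumes the global index is the sum of the local contributions divided by the sum of the cell-wise maxima (my reading of the surface-based approach; the function names and toy rasters are hypothetical).

```python
def normalize(surface):
    """Rescale a raster (list of rows) so that its cells sum to 1."""
    total = sum(sum(row) for row in surface)
    return [[v / total for v in row] for row in surface]

def local_contribution(p_a, p_b):
    """Cell-wise max(p_a, p_b) - min(p_a, p_b)."""
    return [[max(a, b) - min(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(p_a, p_b)]

def global_index(p_a, p_b):
    """Assumed global form: sum(max - min) / sum(max); 0 = identical, 1 = disjoint."""
    num = sum(sum(row) for row in local_contribution(p_a, p_b))
    den = sum(max(a, b) for row_a, row_b in zip(p_a, p_b)
              for a, b in zip(row_a, row_b))
    return num / den

# Toy rasters: the two groups occupy different cells
chinese = normalize([[4.0, 0.0], [0.0, 0.0]])
italians = normalize([[0.0, 0.0], [0.0, 2.0]])
print(local_contribution(chinese, italians), global_index(chinese, italians))
```

Mapping `local_contribution` shows where the two surfaces differ most; the global value summarizes that map in one number.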

Degree of segregation between areas with a prevalence of Chinese entrepreneurs and areas with a prevalence of Italian entrepreneurs

Local contribution to segregation between areas with a prevalent presence of units run by Chinese or Italian entrepreneurs

2. POINT PROCESSES: spatial distribution of events in a point pattern (or scheme)

-> Cluster analysis

- Spatial cluster: the spatial distribution of (similar) events (points) is more ‘clustered’ than a completely random spatial distribution and/or than the general/global distribution of the process (e.g. diseases due to ‘local’ causes).

- Clustering: a general tendency of (similar) events to co-locate

- Hot-spot: areas with an anomalous concentration of similar events

E.g. business clusters

Firms’ clustering and external economies of scale: empirical evidence

Point processes and cluster analysis: to verify whether the spatial distribution of (similar) events is clustered (concentrated), random, or dispersed (uniform or inhibitory), compared with the “complete spatial randomness” hypothesis

(Geographical concentration measures and problems)

[Problems with standard (discrete, regional, a-spatial) concentration measures (e.g. Gini index)]

1) MAUP (modifiable areal unit problem): the measured degree of concentration is influenced by the spatial partition and spatial resolution of the data

2) The degree of concentration is not a function of the degree of ‘polarization’ of the densest regions (Arbia 2001)
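The MAUP can be made concrete with a small Python sketch: the same events, counted on a fine grid and then on a coarser one, give different Gini values. The function names and the toy 4 x 4 grid are hypothetical, and `aggregate` assumes the grid size is divisible by the aggregation factor.

```python
def gini(values):
    """Gini index of non-negative counts (0 = even, towards 1 = concentrated)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    ranked = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * ranked / (n * total) - (n + 1) / n

def aggregate(grid, factor):
    """Merge factor x factor blocks of cells by summing their counts."""
    n = len(grid)
    return [[sum(grid[r + dr][c + dc]
                 for dr in range(factor) for dc in range(factor))
             for c in range(0, n, factor)]
            for r in range(0, n, factor)]

# Toy data: 16 events split between two cells of a 4 x 4 grid
fine = [[8, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 8, 0],
        [0, 0, 0, 0]]
coarse = aggregate(fine, 2)  # the same events on a 2 x 2 grid

flatten = lambda g: [v for row in g for v in row]
print(gini(flatten(fine)), gini(flatten(coarse)))
```

Nothing about the events changes between the two calls; only the partition does, which is exactly the problem with discrete concentration measures.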

Concentration vs. polarization

[Figure axes: degree of spatial autocorrelation; degree of concentration]

Concentration vs. co-agglomeration

- Ellison and Glaeser concentration index (1997): a measure of co-agglomeration which takes into account the average degree of industrial concentration (Herfindahl index) and is not influenced by the degree of spatial resolution of data (MAUP)
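To show how the Herfindahl index enters the calculation, here is a minimal Python sketch assuming the usual single-industry form of the index: gamma = (G - (1 - sum(x_i^2)) * H) / ((1 - sum(x_i^2)) * (1 - H)), where s_i is each region's share of the industry's employment, x_i its share of total employment, G = sum((s_i - x_i)^2) and H the Herfindahl index of plant sizes. The function names and the numbers in the example are hypothetical.

```python
def herfindahl(plant_sizes):
    """Herfindahl index of the industry's plant-size distribution."""
    total = sum(plant_sizes)
    return sum((z / total) ** 2 for z in plant_sizes)

def ellison_glaeser(industry_by_region, total_by_region, plant_sizes):
    """Concentration index gamma for one industry (assumed standard form)."""
    s_total = sum(industry_by_region)
    x_total = sum(total_by_region)
    s = [v / s_total for v in industry_by_region]  # industry shares by region
    x = [v / x_total for v in total_by_region]     # overall shares by region
    g = sum((si - xi) ** 2 for si, xi in zip(s, x))  # raw concentration
    h = herfindahl(plant_sizes)
    scale = 1.0 - sum(xi ** 2 for xi in x)
    return (g - scale * h) / (scale * (1.0 - h))

# Hypothetical industry: 4 equal plants, spread exactly like total employment
gamma = ellison_glaeser([50, 50], [500, 500], [25, 25, 25, 25])
print(herfindahl([25, 25, 25, 25]), gamma)
```

In this example the raw concentration G is zero, so gamma is negative: with only four plants, some apparent concentration would be expected by chance, and the industry shows less than that.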

Point processes: clustering of events

Complete spatial randomness (CSR; Diggle, 1983): every event has the same probability of locating anywhere in the study area. This implies that:

- The number of events in any subregion follows a Poisson distribution

- The location of an event does not depend on the location of similar events (independence)

- The numbers of events in two non-overlapping regions are independent

- The average number of events per unit area (intensity) is homogeneous throughout the area (spatial stationarity)

A random distribution still implies a certain degree of concentration and/or clustering. A distribution is clustered whenever its degree of concentration is higher than what we would expect under complete spatial randomness.
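The point that a random pattern still shows some concentration can be checked by simulation. The sketch below places a fixed number of points uniformly in a rectangle (a simplification of CSR, conditional on the number of events), counts them in a k x k grid of quadrats, and computes the variance-to-mean ratio of the counts, which should be close to 1 for a random pattern and well above 1 for a clustered one. All names and parameter values are hypothetical.

```python
import random

def simulate_csr(n, width, height, seed=0):
    """n points placed independently and uniformly in a width x height rectangle."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def quadrat_counts(points, width, height, k):
    """Number of points in each cell of a k x k grid of quadrats."""
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x / width * k), k - 1)   # clamp points on the far edge
        row = min(int(y / height * k), k - 1)
        counts[row * k + col] += 1
    return counts

def variance_mean_ratio(counts):
    """Sample variance of the quadrat counts divided by their mean."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean

random_pts = simulate_csr(1000, 100.0, 100.0, seed=1)
random_counts = quadrat_counts(random_pts, 100.0, 100.0, 10)
clustered_pts = [(1.0, 1.0)] * 1000          # every event in one quadrat
clustered_counts = quadrat_counts(clustered_pts, 100.0, 100.0, 10)
print(variance_mean_ratio(random_counts), variance_mean_ratio(clustered_counts))
```

Even in the random case the quadrat counts are not all equal, which is the "certain degree of concentration" a clustering test has to discount.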

Different techniques imply different CSR hypotheses

Problems with the analysis of spatial data #1:

- Study area extent (if too small, the analysis may not include elements which are important for an exhaustive explanation. If too big, the spatial distribution pattern may be due to a diversity of processes which have nothing to do with what we want to explain. Example: suburban, scattered and low-density urban areas).

-> reduce the size of the area. Create a mask of the area within the GRA (ring road) by manually selecting the zone urbanistiche within the GRA and exporting the selection as mask_area.shp

Clustering: “global” indexes (to measure the ‘global’ degree of clustering for the whole set of events) -> methods based on quadrats (joint count) vs. on distances

AVERAGE NEAREST NEIGHBOUR: is the distance between events less (clustering) or more (inhibitory pattern) than the expected distance under complete spatial randomness? (Clark-Evans, ’50s)

Nearest neighbour ratio = observed mean distance / expected mean distance (CSR) -> a ratio below 1 indicates clustering, above 1 dispersion

Input:

Points: unweighted (= 1) / Projected coordinate system! (Polygons and lines: convert into points with x, y = centroids)

Output:

- Observed Mean Distance

- Expected Mean Distance

- Nearest Neighbor Index

- Graphic report

- Test variables:

p-value: probability of observing a pattern at least this extreme if the spatial distribution were random

z-score: number of standard deviations by which the observed value departs from the expected value
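The outputs listed above can be reproduced with a short Python sketch. It assumes the usual Clark-Evans formulas: expected mean distance = 0.5 / sqrt(n / A), standard error = 0.26136 / sqrt(n^2 / A), and a two-sided p-value from the normal distribution. The function name, the returned keys and the sample points are hypothetical; the neighbour search is brute force and no edge correction is applied.

```python
import math

def average_nearest_neighbour(points, area):
    """points: list of (x, y) in a projected coordinate system; area: study area."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # distance from point i to its closest other point
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    std_error = 0.26136 / math.sqrt(n * n / area)
    z = (observed - expected) / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {"observed": observed, "expected": expected,
            "ratio": observed / expected, "z": z, "p": p}

# Hypothetical patterns: 4 evenly spaced points vs. 4 points packed together
dispersed = average_nearest_neighbour(
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], area=4.0)
clustered = average_nearest_neighbour(
    [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)], area=100.0)
print(dispersed["ratio"], clustered["ratio"])
```

Because the expected distance depends on the area, the same points can look clustered or dispersed depending on the study area chosen, which is why the extent of the mask matters.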

- Measure the ANN for firms within the GRA (selection of rm_immig.shp) -> Toolbox / Spatial statistics / Analyzing patterns