
Introduction to spatial data analysis


Scuola di Dottorato in Economia, La Sapienza, 2015/2016

Instructors: Filippo Celata, Federico Martellozzo and Luca Salvati http://www.memotef.uniroma1.it/node/6524


Spatial statistics:
- statistics computed as a function of location, distance, etc.
- used to identify "invisible" geographical properties of data (e.g. distribution patterns)
- spatial association: verifying the degree of similarity of spatial events as a function of their distance

John Snow’s map of cholera in London, 1854

Types of spatial association:

1. Associations due to spatial dependence between geographical features (e.g. similar plants require similar soils)

2. Associations due to spatial autocorrelation: the presence of a certain event increases the probability of finding similar events nearby, because of a reciprocal influence or «real contagion» (e.g. similar plants cluster because they are generated by other similar plants)

Methods:

A. Analysis of the spatial distribution of a pre-selected set of similar events (point patterns or point processes) (e.g. firms owned by foreign-born entrepreneurs)

B. Autocorrelation analysis: the degree to which nearby features are more similar than distant ones (to identify relations between proximity and intensity; data: polygons)
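Autocorrelation analysis (method B) is often summarized with a global statistic. The slides do not name one; as an illustrative assumption, the sketch below computes global Moran's I for areal units (polygons) from a user-supplied spatial weights matrix. The function name and the 4-zone weights matrix are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I (illustrative sketch).

    values  -- attribute value for each of n areal units (e.g. firms per zone)
    weights -- n x n spatial weights (list of lists); weights[i][j] > 0 if
               unit j is a neighbour of unit i, 0 on the diagonal
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    # weighted cross-products of deviations between neighbouring units
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Hypothetical example: 4 zones in a row, each adjacent to the next
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 1, 5, 5], W))  # similar values side by side -> positive I
print(morans_i([1, 5, 1, 5], W))  # alternating values          -> negative I
```

Without spatial autocorrelation the expected value of I is -1/(n - 1); a significance test (e.g. by random permutation of the values) is still needed before interpreting it.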

(2)

1. (Simple) spatial distribution measures

- Spatial distribution

Case field: used to identify different centers for different categories of features (marked point pattern). Weight: absolute vs. relative centrality.

MEDIAN CENTER / MEAN CENTER

Do: for the distribution of firms owned by foreign-born entrepreneurs, identify and render the mean center (Spatial Statistics / Measuring Geographic Distributions / …) of firms owned by Bangladeshi, Egyptian, Romanian, Chinese and Libyan entrepreneurs (input: lez3/rm_immigDT.shp; weight field: “ADD08”; case field: “ORIGINE”).
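The weighted mean center is the weight-averaged x and y; the median center is the point that minimizes the sum of weighted Euclidean distances to all features and is usually found iteratively. The sketch below assumes the features have already been read into (x, y, weight, case) tuples in a projected coordinate system (e.g. weight = ADD08, case = ORIGINE); reading the shapefile is not shown and all function names are hypothetical. The median center uses Weiszfeld's iteration, which may differ in detail from the GIS tool's algorithm.

```python
import math
from collections import defaultdict


def mean_center(points, weights):
    """Weighted mean center: (sum(w*x)/sum(w), sum(w*y)/sum(w))."""
    sw = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / sw
    y = sum(w * py for (_, py), w in zip(points, weights)) / sw
    return x, y


def median_center(points, weights, tol=1e-6, max_iter=1000):
    """Weighted median center via Weiszfeld's iteration (sketch)."""
    x, y = mean_center(points, weights)  # start from the mean center
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        for (px, py), w in zip(points, weights):
            d = math.hypot(px - x, py - y)
            if d < 1e-12:  # simplification: skip a feature the estimate sits on
                continue
            num_x += w * px / d
            num_y += w * py / d
            den += w / d
        if den == 0.0:  # every feature coincides with the current estimate
            break
        nx, ny = num_x / den, num_y / den
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y


def centers_by_case(records):
    """records: iterable of (x, y, weight, case) -> {case: (mean, median)}."""
    groups = defaultdict(lambda: ([], []))
    for x, y, w, case in records:
        pts, ws = groups[case]
        pts.append((x, y))
        ws.append(w)
    return {case: (mean_center(p, w), median_center(p, w))
            for case, (p, w) in groups.items()}
```

Grouping by the case field gives one pair of centers per nationality, which is what makes the centers of different groups comparable on the same map.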

Do a kernel density map of firms owned by foreign-born entrepreneurs: Spatial Analyst / Density / Kernel Density (input: rm_immigDT.shp; population field: CNT; cell size: 10 m (or the average distance between all points); search radius: 2,000 m; in “Environments”: extent and Raster Analysis / mask = zoneurbanistiche.shp).
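A kernel density surface spreads each point's population value (here CNT) over a circle of the search radius. The sketch below assumes a quartic (biweight) kernel, the kernel described by Silverman (1986), and evaluates the density at the centre of each cell of a regular grid. Function and parameter names are hypothetical, and the brute-force loop is only practical for small grids, not for a 10 m grid over a whole city.

```python
import math


def quartic_kde(points, pops, xmin, ymin, ncols, nrows, cell, radius):
    """Kernel density (events per unit area) at the centre of each grid cell.

    Quartic kernel: each point contributes
        pop * 3 / (pi * r^2) * (1 - (d / r)^2)^2   for d < r, else 0,
    so each point's contribution integrates to its pop value.
    Row 0 is the bottom row (y = ymin); raster files usually store the top row first.
    """
    r2 = radius * radius
    grid = [[0.0] * ncols for _ in range(nrows)]
    for row in range(nrows):
        cy = ymin + (row + 0.5) * cell
        for col in range(ncols):
            cx = xmin + (col + 0.5) * cell
            total = 0.0
            for (px, py), pop in zip(points, pops):
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < r2:  # only points within the search radius count
                    total += pop * (1.0 - d2 / r2) ** 2
            grid[row][col] = 3.0 / (math.pi * r2) * total
    return grid
```

Because each point's contribution integrates to its population value, summing the grid times the cell area recovers (approximately) the total population, which is a quick sanity check on the cell size and radius.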

Mapping: modify the symbology of both output layers, then go to View / Layout View to export the map (.tif, 300 dpi).

Discrete vs. surface statistical analysis (e.g. surface-based indicators -> map algebra)

-> measures of spatial segregation

S (surface-based) segregation index (O’Sullivan and Wong 2007), numerator: the ‘local’ contribution to global spatial segregation is the difference between the maximum and minimum kernel density values at each point i (e.g. for Chinese and Italian entrepreneurs: $\max(p_{Ci}, p_{Ii}) - \min(p_{Ci}, p_{Ii})$, where $p_{Ci}$ and $p_{Ii}$ are the densities of the two groups at point i).
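In map-algebra terms the local contribution is a cell-by-cell operation on two aligned density rasters (same extent and cell size). The sketch assumes the two kernel density grids are stored as nested lists and, as an assumption not stated in the slides, rescales each to sum to 1 so that the two groups are comparable. Function names are hypothetical.

```python
def normalise(grid):
    """Divide a density grid by its total so the group's surface sums to 1."""
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]


def local_segregation(p_c, p_i):
    """Cell-by-cell max(p_C, p_I) - min(p_C, p_I) for two aligned grids."""
    return [[max(a, b) - min(a, b) for a, b in zip(row_c, row_i)]
            for row_c, row_i in zip(p_c, p_i)]
```

For two groups, max - min is simply the absolute difference |p_C - p_I|: cells where one group clearly dominates get high values, mixed cells get values near 0.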

(Discrete) segregation index: relation between two normalized or standardized density coefficients (e.g. normalized density of firms owned by Chinese entrepreneurs / normalized density of all firms), ranging from -1 to +1.

Map: degree of segregation between areas with a prevalence of Chinese entrepreneurs and areas with a prevalence of Italian entrepreneurs

Map: local contribution to segregation between areas where firms run by Chinese or by Italian entrepreneurs prevail

2. POINT PROCESSES: spatial distribution of events in a point pattern (or scheme)

-> Cluster analysis

- Spatial cluster: the spatial distribution of (similar) events (points) is more ‘clustered’ than a completely spatially random distribution and/or than the general (global) distribution of the process (e.g. diseases due to ‘local’ causes).

- Clustering: a general tendency of (similar) events to co-locate.

- Hot spot: an area with an anomalous concentration of similar events.

E.g. business clusters

Firms’ clustering and external economies of scale: empirical evidence

Point processes and cluster analysis: to verify whether the spatial distribution of (similar) events is clustered, dispersed (uniform or inhibitory) or random (concentrated*), compared with the “complete spatial randomness” (CSR) hypothesis.

(Geographical concentration measures and problems)

[Problems with standard (discrete, regional, a-spatial) concentration measures (e.g. the Gini index)]

1) MAUP (modifiable areal unit problem): the measured degree of concentration depends on the spatial partition and spatial resolution of the data.

2) The degree of concentration is not a function of the degree of ‘polarization’ of the densest regions, i.e. of whether they are adjacent to one another (Arbia 2001).
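The resolution side of the MAUP shows up when the same points are aggregated at two grid resolutions and an a-spatial concentration index is computed on each. The sketch below uses a plain Gini coefficient of counts per cell (one common formulation; the slides do not say which Gini variant is meant) on a hypothetical square study area with four hypothetical firms.

```python
def counts_on_grid(points, size, k):
    """Count points in a k x k grid covering the square [0, size) x [0, size)."""
    cell = size / k
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x // cell), k - 1)
        row = min(int(y // cell), k - 1)
        counts[row * k + col] += 1
    return counts


def gini(values):
    """Gini coefficient: 0 = evenly spread, (n-1)/n = all in one unit."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_(i) / (n * sum x), i = 1..n over sorted values
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


pts = [(0.5, 0.5), (0.6, 0.6), (2.5, 2.5), (3.5, 3.5)]  # hypothetical firms
for k in (2, 4):
    print(k, gini(counts_on_grid(pts, 4.0, k)))
# 2 0.5
# 4 0.84375
```

The points do not move, yet the index rises from 0.5 to about 0.84 when the grid is refined; a different zoning at the same resolution could change it again.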

Concentration vs. polarization (figure axes: degree of spatial autocorrelation; degree of concentration)

Concentration vs. co-agglomeration

- Ellison and Glaeser concentration index (1997): a measure of co-agglomeration which takes into account the average degree of industrial concentration (Herfindahl index) and is not influenced by the spatial resolution of the data (MAUP)
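The slides do not give the formula; the sketch below follows the form in which the Ellison and Glaeser (1997) index is commonly stated, gamma = [G - (1 - sum x_i^2) H] / [(1 - sum x_i^2)(1 - H)], where s_i is area i's share of the industry's employment, x_i its share of total employment, G = sum (s_i - x_i)^2 the raw geographic concentration, and H the Herfindahl index of the industry's plant sizes. The input layout and function name are hypothetical.

```python
def ellison_glaeser(industry_emp, total_emp, plant_emp):
    """Ellison-Glaeser (1997) index, as commonly stated (sketch).

    industry_emp -- employment of the industry in each area i
    total_emp    -- total employment (all industries) in each area i
    plant_emp    -- employment of each plant of the industry
    """
    ind_tot, all_tot, plant_tot = sum(industry_emp), sum(total_emp), sum(plant_emp)
    s = [e / ind_tot for e in industry_emp]          # area shares of the industry
    x = [e / all_tot for e in total_emp]             # area shares of all employment
    G = sum((si - xi) ** 2 for si, xi in zip(s, x))  # raw geographic concentration
    sum_x2 = sum(xi ** 2 for xi in x)
    H = sum((e / plant_tot) ** 2 for e in plant_emp)  # plant-size Herfindahl
    return (G - (1 - sum_x2) * H) / ((1 - sum_x2) * (1 - H))
```

Subtracting the H term is what corrects for "lumpy" industries with a few large plants: values near 0 are what random location of plants would produce, positive values indicate agglomeration beyond that, negative values dispersion.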


Point processes: clustering of events


Complete spatial randomness (CSR; Diggle, 1983): every event has the same probability of being located anywhere in the study area, i.e.:

- the number of events in any subregion follows a Poisson distribution;

- the location of an event does not depend on the location of other similar events (independence);

- the numbers of events in two non-overlapping subregions are independent;

- the average number of events per unit area (intensity) is homogeneous throughout the area (spatial stationarity).

A random distribution still implies a certain degree of concentration and/or clustering. A distribution is clustered whenever its degree of concentration is higher than would be expected under complete spatial randomness.
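The CSR hypothesis corresponds to a homogeneous Poisson process: the total count is Poisson with mean intensity times area, and, given the count, locations are independent and uniform. The sketch below simulates it on a rectangle using only the standard library; function names are hypothetical.

```python
import random


def poisson_count(mu, rng):
    """Poisson(mu) draw: arrivals of a rate-mu Poisson process in [0, 1)."""
    n, t = 0, rng.expovariate(mu)  # exponential inter-arrival times, rate mu
    while t < 1.0:
        n += 1
        t += rng.expovariate(mu)
    return n


def simulate_csr(intensity, width, height, rng=None):
    """Homogeneous Poisson process (CSR) on [0, width] x [0, height]."""
    rng = rng or random.Random()
    n = poisson_count(intensity * width * height, rng)  # count ~ Poisson(lambda * A)
    # given the count, locations are independent and uniform over the area
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
```

Comparing an observed statistic with its values over many simulated CSR patterns (Monte Carlo testing) is one way to judge whether the observed pattern departs from randomness.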

Different techniques imply different CSR hypotheses.

Problems with the analysis of spatial data #1:

- Study area extent: if too small, the analysis may not include elements that are important for an exhaustive explanation; if too large, the spatial distribution pattern may be due to a diversity of processes which have nothing to do with what we want to explain (e.g. suburban, scattered and low-density urban areas).

-> Reduce the size of the area: create a mask of the area within the GRA (ring road) by (manually) selecting the zone urbanistiche (urban planning zones) within the GRA and exporting the selection as mask_area.shp.
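Restricting the points to the GRA mask amounts to a point-in-polygon test. The sketch below uses the even-odd (ray casting) rule on polygons given as lists of (x, y) vertices in the same projected coordinate system as the points. It ignores holes and does not handle points lying exactly on a boundary consistently, so it illustrates the idea rather than replacing the GIS selection; function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def clip_to_mask(points, mask_polygons):
    """Keep the points that fall inside at least one mask polygon."""
    return [(x, y) for x, y in points
            if any(point_in_polygon(x, y, poly) for poly in mask_polygons)]
```

Passing all selected zones as separate polygons avoids having to dissolve them into a single mask first.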

(6)

Clustering: “global” indexes (to measure the ‘global’ degree of clustering for the whole set of events) -> methods based on quadrats (joint count) vs. on distances
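A simple quadrat-based check compares the variance and the mean of the counts per quadrat: under CSR the counts are Poisson, so the variance-to-mean ratio (VMR) is about 1; values above 1 point to clustering, values below 1 to dispersion. This sketch shows the quadrat-count index of dispersion, not the join-count statistic; it assumes a square window containing every point, at least one point, and k of at least 2. Names are hypothetical.

```python
def quadrat_vmr(points, xmin, ymin, size, k):
    """Variance-to-mean ratio of counts in a k x k grid of square quadrats.

    Returns (vmr, chi2, df) where chi2 = (m - 1) * vmr is compared with a
    chi-square distribution with df = m - 1 degrees of freedom (m = k * k).
    """
    cell = size / k
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int((x - xmin) // cell), k - 1)
        row = min(int((y - ymin) // cell), k - 1)
        counts[row * k + col] += 1
    m = len(counts)
    mean = sum(counts) / m
    var = sum((c - mean) ** 2 for c in counts) / (m - 1)  # sample variance
    vmr = var / mean
    return vmr, (m - 1) * vmr, m - 1
```

Like any quadrat method, the result depends on the quadrat size, which is one reason distance-based indices are often preferred.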

AVERAGE NEAREST NEIGHBOUR: is the distance between events smaller (clustering) or larger (inhibitory pattern) than the expected distance under complete spatial randomness? (Clark and Evans, 1950s)

Nearest neighbour ratio = observed mean distance / expected mean distance (under CSR)

Input:

- Points, unweighted (= 1), in a projected coordinate system! (Polygons and lines: convert into points whose x, y are their centroids.)

Output:
- Observed Mean Distance
- Expected Mean Distance
- Nearest Neighbor Index
- Graphic report
- Test variables:
  - p-value: the probability of observing a pattern at least this far from the expected one if the events were in fact randomly distributed (CSR)
  - z-score: the number of standard deviations by which the observed mean distance differs from the expected mean distance

- Measure the ANN for firms within the GRA (selection of rm_immig.shp) -> Toolbox / Spatial Statistics / Analyzing Patterns
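The ANN outputs listed above can be reproduced with the Clark-Evans formulas commonly used for this test: expected mean distance 0.5 / sqrt(n / A) and standard error 0.26136 / sqrt(n^2 / A), where n is the number of points and A the study-area size. The sketch assumes projected coordinates and a user-supplied A (for instance the area of the GRA mask), ignores edge effects, and uses an O(n^2) nearest-neighbour search; the function name is hypothetical.

```python
import math


def average_nearest_neighbour(points, area):
    """Clark-Evans average nearest neighbour test (sketch, no edge correction).

    points -- list of (x, y) in a projected coordinate system (e.g. metres)
    area   -- size of the study area in the same squared units
    Returns (observed mean distance, expected mean distance, NN ratio,
             z-score, two-sided p-value).
    """
    n = len(points)
    # observed: mean distance from each point to its nearest neighbour
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    d_obs = total / n
    d_exp = 0.5 / math.sqrt(n / area)       # expected mean distance under CSR
    se = 0.26136 / math.sqrt(n * n / area)  # standard error under CSR
    z = (d_obs - d_exp) / se
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value, standard normal
    return d_obs, d_exp, d_obs / d_exp, z, p
```

A ratio below 1 (negative z) indicates clustering and a ratio above 1 (positive z) dispersion; because the expected distance depends on A, the choice of study area (and of the mask) directly affects the result.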
