Spatial Data Analysis Using GeoDa
9 Jan 2014
Frank Witmer
Computing and Research Services, Institute of Behavioral Science

Workshop Goals
• Enable participants to find and retrieve geographic data pertinent to their study and conduct spatial analysis using GeoDa
  – Geographic data sources and formats
  – Data joins in ArcGIS
  – Exploratory spatial data analysis (ESDA) in GeoDa
• Provide experience using ArcGIS and GeoDa software
• Provide the opportunity for you to work with

Types of Geographic Data
1) Spatial data
• helpful to conceptualize as maps
• necessary for answering “Where…” questions
  – used to establish spatial relations (e.g., distance, connectivity, containment)
  – used to support spatial analysis
2) Attribute data
• helpful to conceptualize as tables
• necessary for answering “What…” questions
• (and metadata too, typically in .xml format)

Geographic Data
• Spatial data and the attribute table are ‘linked’ together, e.g.:

StateName     Population    Governor
New Jersey    7,730,188     C. Whitman
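The spatial relations listed under spatial data (distance, containment) are computed from the coordinates stored in the geodata. As a minimal sketch, assuming planar (projected) coordinates and a simple polygon given as a list of vertices, distance is the Euclidean formula and containment can be tested with the standard ray-casting (even-odd) rule. The function names `distance` and `contains` are hypothetical illustrations, not ArcGIS or GeoDa functions.

```python
import math

def distance(p, q):
    """Euclidean distance between two points in planar (projected) coordinates."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def contains(polygon, point):
    """Ray-casting (even-odd) test: is point inside the simple polygon?

    A horizontal ray is cast to the right of the point; every polygon edge
    it crosses flips the inside/outside state. Points lying exactly on an
    edge are not handled specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(contains(square, (2, 2)))   # True
print(contains(square, (5, 1)))   # False
print(distance((0, 0), (3, 4)))   # 5.0
```

For latitude/longitude data the planar formula distorts distances; a projected coordinate system or a great-circle formula would be needed instead.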

IBS Data Links Page
• http://www.colorado.edu/ibs/crs/geographic_data_sources.html
• Some highlights:
  – WONDER from CDC
    • http://wonder.cdc.gov/
  – ESRI Data Products
    • http://www.esri.com/data/find-data
• Census data
  – American FactFinder
    • http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml
  – Census Explorer (no direct data download)
    • http://www.census.gov/censusexplorer/
  – http://blogs.census.gov/

Census Maps & Data
• http://www.census.gov/geo/maps-data/
• TIGER Products
  – Cartographic Boundary Files | County
    • 500k, 5m, 20m reflect the scale of the data (1:500,000, 1:5,000,000, 1:20,000,000)
    • scale is the ratio of map distance to earth distance, so 1:500,000 has more detail than 1:20,000,000
  – TIGER/Line Shapefiles | 2010 | Download | Web
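The scale ratio can be checked with simple arithmetic: 1 cm on a 1:500,000 map stands for 500,000 cm = 5 km on the ground, while 1 cm at 1:20,000,000 stands for 200 km, so each map unit covers far more ground (and less detail) as the denominator grows. A small sketch, using a hypothetical helper name:

```python
CM_PER_KM = 100_000  # 100 cm/m * 1,000 m/km

def ground_distance_km(map_cm, scale_denominator):
    """Ground distance (km) represented by map_cm on a 1:scale_denominator map."""
    return map_cm * scale_denominator / CM_PER_KM

# The three Census cartographic boundary resolutions
for label, denom in [("500k", 500_000), ("5m", 5_000_000), ("20m", 20_000_000)]:
    print(label, ground_distance_km(1, denom), "km per map cm")
# 500k 5.0 km per map cm
# 5m 50.0 km per map cm
# 20m 200.0 km per map cm
```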

International Borders
• For individual countries, you can sometimes find a government agency that provides geographic data
• ESRI borders
  – online or from the Data & Maps DVD
• Global Administrative Areas (GADM)
  – http://www.gadm.org/

Joining Attribute Data to Geodata
• You will often find attribute data in tabular form
• So you may need to obtain the geodata separately and join the attribute data to it
• Challenge: construct a common field that

Joining Tables
In ArcMap, right-click the destination layer (the one receiving the attributes) to begin a join.
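The join logic itself is easy to see outside ArcMap. The sketch below is a hypothetical illustration, not ArcMap's implementation: it builds a common field by zero-padding numeric FIPS-style codes (a spreadsheet often drops the leading zero, turning "08013" into 8013), then attaches attribute rows to every geodata feature, similar in spirit to keeping all target records in a join. The county codes are Colorado FIPS codes; the `rate` values are made up for illustration, and `make_key`/`left_join` are hypothetical names.

```python
def make_key(value, width=5):
    """Normalize a numeric FIPS-style code to a zero-padded string: 8013 -> '08013'."""
    return str(int(value)).zfill(width)

def left_join(features, rows, feature_field, row_field):
    """Attach attribute rows to every geodata feature via a common key.

    Unmatched features get None for the joined columns, and their keys are
    returned so the mismatch can be investigated.
    """
    lookup = {}
    for row in rows:
        key = make_key(row[row_field])
        if key in lookup:
            raise ValueError(f"duplicate join key {key}")
        lookup[key] = row
    columns = sorted({c for row in rows for c in row if c != row_field})
    joined, unmatched = [], []
    for feature in features:
        key = make_key(feature[feature_field])
        row = lookup.get(key)
        if row is None:
            unmatched.append(key)
        merged = dict(feature)
        for c in columns:
            merged[c] = row[c] if row is not None else None
        joined.append(merged)
    return joined, unmatched

# Hypothetical inputs: boundaries keyed by a text GEOID, and a table whose
# FIPS column lost its leading zero when read as a number.
counties = [{"GEOID": "08013", "NAME": "Boulder"},
            {"GEOID": "08031", "NAME": "Denver"},
            {"GEOID": "08059", "NAME": "Jefferson"}]
table = [{"fips": 8013, "rate": 1.5}, {"fips": 8031, "rate": 2.5}]

joined, unmatched = left_join(counties, table, "GEOID", "fips")
print(joined[0])   # {'GEOID': '08013', 'NAME': 'Boulder', 'rate': 1.5}
print(unmatched)   # ['08059']
```

`make_key` only works for numeric codes; name-based keys would need their own normalization (case, spacing, spelling).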

Some GIS File Types
• ESRI Shapefiles
  – Very common since the file format is open
  – Multiple files with different extensions (.shp, .shx, .dbf)
  – Display quickly and are editable
    • But careful: adjacent polygons do not share boundary lines (each polygon stores its own copy), so editing one boundary can leave gaps or overlaps
• ESRI SDC
  – Smart Data Compression; files are compressed for efficient storage
• ESRI Interchange files
  – Extension .e00
• ESRI GRID
  – Attribute table stores the number of occurrences (cell count) for each value
• ESRI Geodatabases
  – Integrated approach for storing & managing all types of geographic data and their relationships
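Because a shapefile is several files, copying only the .shp leaves an unusable dataset. A minimal sketch of a check for the three mandatory components listed above (.shp geometry, .shx index, .dbf attributes); the function name is hypothetical, and optional files such as .prj are not checked.

```python
import os
import tempfile

REQUIRED = (".shp", ".shx", ".dbf")  # geometry, index, attribute table

def missing_components(shp_path):
    """Return the mandatory shapefile components missing next to shp_path.

    Note: on case-sensitive file systems '.SHX' is not treated as '.shx'.
    """
    base, _ = os.path.splitext(shp_path)
    return [ext for ext in REQUIRED if not os.path.exists(base + ext)]

# Demo: a 'counties' shapefile copied without its .shx index
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".dbf"):
        open(os.path.join(folder, "counties" + ext), "wb").close()
    result = missing_components(os.path.join(folder, "counties.shp"))
print(result)  # ['.shx']
```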

Exploratory Spatial Data Analysis (ESDA)
• Actively find interesting patterns in the data
• Facilitated by dynamically linked views
• Use statistical measures of spatial association, such as global & local Moran’s I, to explore spatial dependence
  – Global: one statistic to summarize the pattern
  – Local: location-specific statistics
• Moran’s I is frequently used to test for spatial autocorrelation in regression residuals, but it is also of interest when exploring the spatial distribution of variables

Moran’s I Statistic
• Standardized cov/var
• Significance tests
  – Normal distribution
  – Randomization/permutation
• Spatial correlation (roughly)
  – -1 -> neg. correlation (regularity)
  – 0 -> no correlation
  – 1 -> pos. correlation (clustering)

$$ I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij} z_i z_j}{\sum_i z_i^2} $$

where $z_i = x_i - \bar{x}$ is the deviation of region i’s value from the mean and $w_{ij}$ is the weights matrix; for a contiguity matrix, $w_{ij} = 1$ if i and j are adjacent (0 otherwise).

H0: spatial independence
Normal distribution:
  – Assume the X’s are identically normally distributed (the value for each region has the same distribution)
  – Use E[I] = -1/(n-1) and VAR[I] to calculate a Z-statistic
  – If the Z-statistic lies beyond the critical value, reject the null
Randomization/permutation:
  – Randomly rearrange the data on the map many times, computing I each time, and build a histogram of the resulting distribution of I
  – Calculate the mean and variance of this distribution, and from them a Z-statistic
  – If the Z-statistic lies beyond the critical value, reject the null
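Both significance tests can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not GeoDa's code: the map is a hypothetical row of areas with binary contiguity weights (`chain_weights`), the normal-approximation variance is the standard Cliff-Ord formula under normality, and the permutation test reports both the z-statistic described above and a pseudo p-value (M+1)/(R+1), a common convention. All function names are hypothetical.

```python
import math
import random

def chain_weights(n):
    """Binary contiguity for n areas in a row: w[i][j] = 1 if i and j are adjacent."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1.0
    return w

def morans_i(x, w):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    s0 = sum(map(sum, w))
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)

def normal_z(x, w):
    """Z-statistic assuming normality: (I - E[I]) / sqrt(Var[I]), E[I] = -1/(n-1)."""
    n = len(x)
    s0 = sum(map(sum, w))
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    e_i = -1.0 / (n - 1)
    var_i = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - e_i ** 2
    return (morans_i(x, w) - e_i) / math.sqrt(var_i)

def permutation_test(x, w, permutations=999, seed=1):
    """Shuffle values over the map, recompute I, and compare with the observed I."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    values = list(x)
    sims = []
    for _ in range(permutations):
        rng.shuffle(values)
        sims.append(morans_i(values, w))
    mean = sum(sims) / permutations
    sd = math.sqrt(sum((s - mean) ** 2 for s in sims) / (permutations - 1))
    if observed >= mean:  # one-sided, in the direction of the observed value
        extreme = sum(s >= observed for s in sims)
    else:
        extreme = sum(s <= observed for s in sims)
    return observed, (observed - mean) / sd, (extreme + 1) / (permutations + 1)

x = list(range(10))          # values increase smoothly along the row: clustering
w = chain_weights(10)
print(round(morans_i(x, w), 3))   # 0.778
print(round(normal_z(x, w), 3))   # 3.0
print(permutation_test(x, w))
```

For this smooth gradient I = 7/9 and the normal-approximation Z is 3; an alternating pattern such as [1, 4, 1, 4] gives I = -1 (regularity).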
[Figure: example patterns labeled High-value clustering, Low-value clustering, and Anti-correlated (two panels)]
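One common way to sort areas into these categories is the Moran scatterplot, which plots each area's standardized value z against its spatial lag (the row-standardized average of its neighbours' z). High-high falls under high-value clustering, low-low under low-value clustering, and the two mixed quadrants are anti-correlated; with row-standardized weights the slope of the fitted line equals Moran's I. A minimal sketch with a hypothetical function name and a 4-area row as the map:

```python
def moran_quadrants(x, w):
    """Label each area by its Moran scatterplot quadrant (z vs. spatial lag)."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    z = [(v - mean) / sd for v in x]
    labels = []
    for i in range(n):
        total = sum(w[i])
        # spatial lag: row-standardized average of the neighbours' z values
        lag = sum(w[i][j] * z[j] for j in range(n)) / total if total else 0.0
        if z[i] >= 0 and lag >= 0:
            labels.append("High-value clustering (HH)")
        elif z[i] < 0 and lag < 0:
            labels.append("Low-value clustering (LL)")
        else:
            labels.append("Anti-correlated (HL/LH)")
    return labels

chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # 4 areas in a row
print(moran_quadrants([1, 2, 3, 4], chain))  # LL, LL, HH, HH
print(moran_quadrants([1, 4, 1, 4], chain))  # all anti-correlated
```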

Local Moran’s I
• Provides a measure of spatial autocorrelation, $I_i$, for every areal unit i; the local statistics sum to a multiple of the global statistic:

$$ \sum_{i=1}^{n} I_i = c\,I $$

  where c is a constant of proportionality
• If $I_i$ is assumed to be normally distributed, it can be transformed into a Z-statistic to test for significance.
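The slide does not give the formula for $I_i$; one common form (Anselin's LISA) is $I_i = (z_i / m_2) \sum_j w_{ij} z_j$ with $m_2 = \sum_i z_i^2 / n$. With that scaling, summing over i gives $\sum_i I_i = S_0 I$, so c is the sum of all weights $S_0$, and with row-standardized weights c = n (the average $I_i$ equals the global I). A sketch under these assumptions, with hypothetical function names:

```python
def row_standardize(w):
    """Divide each row of w by its sum so each area's weights add to 1."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def global_morans_i(x, w):
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(map(sum, w))
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(zi * zi for zi in z)

def local_morans_i(x, w):
    """I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum_i z_i^2 / n."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]

chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # 4 areas in a row
ws = row_standardize(chain)
x = [1, 2, 3, 4]
local = local_morans_i(x, ws)
s0 = sum(map(sum, ws))  # equals n = 4 after row standardization
print([round(v, 3) for v in local])                                  # [0.6, 0.2, 0.2, 0.6]
print(round(sum(local), 3), round(s0 * global_morans_i(x, ws), 3))  # 1.6 1.6
```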

GeoDa Software
• Open source, available for Windows, Mac & Linux
  – http://geodacenter.asu.edu/
  – project is led by Luc Anselin at ASU
  – only supports shapefiles, so you must use ArcGIS (or other GIS software) to convert data to shapefiles
• Linking
  – selection in one view results in selection in all views (e.g., maps, tables, scatterplots)
• Brushing
  – dynamic version of linking
  – click & drag a rectangle over a map or scatterplot
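Linking can be pictured as several views observing one shared selection; brushing is the same mechanism re-run each time the rectangle moves. The sketch below is a hypothetical observer-pattern illustration of that idea, not GeoDa's internal design; the class names and the (z, lag) coordinates are made up.

```python
class SelectionModel:
    """Shared selection of feature ids; every registered view is notified."""

    def __init__(self):
        self._views = []
        self.selected = set()

    def register(self, view):
        self._views.append(view)

    def select(self, ids):
        self.selected = set(ids)
        for view in self._views:  # linking: one selection, all views updated
            view.on_selection(self.selected)

class View:
    """Stand-in for a map, table, or scatterplot window."""

    def __init__(self, name, model):
        self.name = name
        self.highlighted = set()
        model.register(self)

    def on_selection(self, ids):
        self.highlighted = set(ids)

def brush(model, points, rect):
    """Select every point whose (x, y) lies in rect = (xmin, ymin, xmax, ymax).

    Calling this repeatedly as the rectangle is dragged gives brushing.
    """
    xmin, ymin, xmax, ymax = rect
    model.select(fid for fid, (x, y) in points.items()
                 if xmin <= x <= xmax and ymin <= y <= ymax)

model = SelectionModel()
map_view = View("map", model)
table_view = View("table", model)
scatter = {1: (0.5, 0.2), 2: (1.5, 1.1), 3: (-0.3, 0.8)}  # hypothetical (z, lag) pairs
brush(model, scatter, (0, 0, 2, 2))
print(map_view.highlighted, table_view.highlighted)  # {1, 2} {1, 2}
```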