EXPLORING SPATIAL PATTERNS IN YOUR DATA

(1)

EXPLORING SPATIAL PATTERNS IN YOUR DATA

(2)

OBJECTIVES

 Learn how to examine your data using the Geostatistical Analysis tools in ArcMap.

 Learn how to use descriptive statistics in ArcMap and GeoDa to analyze data.

 Be able to identify Geostatistical Analysis tools that can be used for further analysis.

(3)

WHY EXPLORE YOUR DATA?

 It allows you to better select an appropriate tool to analyze your data.

 If you skip exploring your data, you may miss key information, which can lead to incorrect conclusions and decisions.

(4)

GEODA VS. ARCMAP

 GeoDa – free, open-source, simple software designed specifically for statistical analysis

 ArcMap – proprietary GIS software that can perform statistical analysis along with hundreds of other analyses

(5)

GEODA VS. ARCMAP

 With ArcMap you can view several data layers at once. In GeoDa, you can view only one data layer at a time.

 Some tools are found in both programs, while some are found in only one.

(6)

EXPLORE THE LOCATION OF YOUR DATA

(7)

EXPLORE THE LOCATION OF YOUR DATA

 Explore:

 size of the study area

 mean

 median

 direction data are oriented

 You will see where data are clustered relative to the rest of the data.

(8)

MEAN CENTER

 The geographic center for a set of features.

 Constructed from the average x and y values for the input feature centroids (middle points, if input features are polygons).
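
The averaging described above can be sketched in a few lines. This is a minimal illustration, not ArcMap's implementation; the function name `mean_center` and the sample centroids are hypothetical.

```python
def mean_center(points):
    """Return the mean center (average x, average y) of a list of (x, y) points."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y


# Hypothetical feature centroids (for polygons, use each polygon's centroid).
centroids = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(mean_center(centroids))  # (2.0, 1.5)
```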

(9)

MEDIAN CENTER

 Median Center is robust to outliers.

 Uses an algorithm to find the point that minimizes travel from it to all other features in the dataset.

 At each step (t) in the algorithm, a candidate Median Center (X^t, Y^t) is found and refined until it represents the location that minimizes the Euclidean distance d to all features (i) in the dataset.
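
The slides do not name the algorithm, so the sketch below assumes a Weiszfeld-style iteration, one common way to refine a candidate point until the summed Euclidean distance stops improving. The function name, tolerance, and iteration cap are hypothetical choices.

```python
import math


def median_center(points, tol=1e-9, max_iter=1000):
    """Approximate the point minimizing total Euclidean distance to all points.

    Assumption: Weiszfeld-style updates, starting from the mean center.
    """
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                continue  # simplification: skip a feature sitting on the candidate
            w = 1.0 / d  # nearer features get more weight
            num_x += w * px
            num_y += w * py
            denom += w
        if denom == 0.0:
            break
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return x, y


# Hypothetical data: four clustered points and one distant outlier.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]
print(median_center(pts))
```

With this data the mean center is pulled out to (20.4, 20.4), while the median center stays inside the cluster, which illustrates the robustness to outliers.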

(10)

DIRECTIONAL DISTRIBUTION (STANDARD DEVIATIONAL ELLIPSE)

 Standard deviational ellipses summarize the spatial characteristics of geographic features: central tendency, dispersion, and directional trends.

 The ellipse allows you to see if the distribution of features is elongated and hence has a particular orientation.

 When the underlying spatial pattern of features is concentrated in the center with fewer features toward the periphery (a spatial normal distribution):

 a one standard deviation ellipse polygon will cover approximately 68 percent of the features

 two standard deviations will contain approximately 95 percent of the features

 three standard deviations will cover approximately 99 percent of the features
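
A one standard deviation ellipse can be approximated from the covariance of the coordinates. The sketch below is an assumption about the general approach, not ArcMap's exact formula (the tool may apply different scaling factors); the function name is hypothetical.

```python
import math


def standard_deviational_ellipse(points):
    """Return (center, major_axis, minor_axis, angle_degrees) for (x, y) points.

    The axes are the square roots of the eigenvalues of the coordinate
    covariance matrix; the angle is the major axis direction, measured
    counterclockwise from the x-axis.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major = math.sqrt(half_trace + root)
    minor = math.sqrt(max(half_trace - root, 0.0))
    return (cx, cy), major, minor, math.degrees(theta)


# Hypothetical points stretched along a northeast-trending line.
center, major, minor, angle = standard_deviational_ellipse(
    [(0, 0), (1, 1), (2, 2), (3, 3)]
)
print(center, major, minor, angle)
```

A large ratio of major to minor axis indicates an elongated distribution with a clear orientation.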

(11)

(12)

EXPLORE THE VALUES OF YOUR DATA

(13)

NORMAL DISTRIBUTION

 Some analysis tools assume a normal distribution:

 Mean and median are similar

 Data are symmetrical
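
A quick numeric check of these two properties is to compare the mean with the median and compute a skewness value. This is a rough screening sketch, not a formal normality test; the function name and sample values are hypothetical.

```python
import statistics


def symmetry_summary(values):
    """Return mean, median, and (population) skewness for a list of numbers."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        skew = 0.0  # all values identical
    else:
        skew = sum(((v - mean) / sd) ** 3 for v in values) / len(values)
    return {"mean": mean, "median": median, "skewness": skew}


# Symmetric data: mean equals median and skewness is near zero.
print(symmetry_summary([1, 2, 3, 4, 5]))
# Skewed data: the mean is pulled well above the median.
print(symmetry_summary([1, 1, 2, 2, 3, 50]))
```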

(14)

DATA FREQUENCY USING HISTOGRAMS

(15)

DATA DISTRIBUTION USING A QQ PLOT

Figure panels: a normally distributed dataset; many characteristics of a normal dataset; not normal.

A normal QQ plot shows the relationship of your data to a normal distribution line.
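
The points of a normal QQ plot can be computed by pairing each sorted observation with the quantile a normal distribution would give at the same rank. The sketch below assumes the plotting position (i - 0.5) / n, which is one common convention and may differ from what ArcMap or GeoDa use; the function name is hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def normal_qq_points(values):
    """Return (theoretical_quantile, observed_value) pairs for a normal QQ plot.

    Requires at least two values that are not all identical.
    """
    data = sorted(values)
    n = len(data)
    reference = NormalDist(fmean(data), stdev(data))
    points = []
    for i, observed in enumerate(data, start=1):
        p = (i - 0.5) / n  # assumed plotting position
        points.append((reference.inv_cdf(p), observed))
    return points


for theoretical, observed in normal_qq_points([2, 4, 4, 5, 6, 7, 9]):
    print(round(theoretical, 2), observed)
```

If the data are close to normal, the pairs fall near the line where the theoretical quantile equals the observed value.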

(16)

BOX PLOT

 Displays the median and the interquartile range (IQR), which spans the 25th to 75th percentiles

 Hinge = multiple of the interquartile range (1.5 or 3)

(17)

MAPS

 For examining data values and frequencies:

 Quantile Map

 Natural breaks

 Equal intervals

 For finding outliers:

 Percentile Map

 Box Map

 Standard Deviation Map

(18)

QUANTILE MAP

 Displays the distribution of values in categories with an equal number of observations in each category.
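
A quantile classification can be sketched by ranking the observations and cutting the ranks into equal-sized groups. This is an illustrative assumption about the approach; the function name is hypothetical, and tied values may be split across classes in this simple version.

```python
def quantile_classes(values, k):
    """Assign each value a class 0..k-1 so that classes hold (nearly) equal counts."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, idx in enumerate(order):
        classes[idx] = rank * k // n
    return classes


# Eight hypothetical observations in four classes: two observations per class.
print(quantile_classes([10, 3, 7, 1, 9, 4, 8, 2], 4))  # [3, 1, 2, 0, 3, 1, 2, 0]
```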

(19)

EQUAL INTERVAL MAP

 Sets the value ranges in each category equal in size.

 The entire range of data values is divided equally into however many categories have been chosen.
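
The class breaks follow directly from the minimum, the maximum, and the number of categories. A minimal sketch (function names are hypothetical; values equal to a break are placed in the lower class here):

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Return the k-1 interior break values that split the data range evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def classify(value, breaks):
    """Return the class index of a value given sorted interior breaks."""
    return bisect_left(breaks, value)


data = [0, 5, 20, 100]
breaks = equal_interval_breaks(data, 4)
print(breaks)                               # [25.0, 50.0, 75.0]
print([classify(v, breaks) for v in data])  # [0, 0, 0, 3]
```

Unlike a quantile map, the classes can hold very different numbers of observations, as the example shows.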

(20)

NATURAL BREAKS MAP

 Seeks to minimize the variance within classes and maximize the variance between classes.
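
For a small dataset, the idea can be illustrated by brute force: try every way of cutting the sorted values into k groups and keep the grouping with the smallest total within-class squared deviation. This is only a sketch of the objective; GIS software uses an optimized algorithm (Natural Breaks is often labeled Jenks), and the function names here are hypothetical.

```python
from itertools import combinations


def within_class_ss(group):
    """Sum of squared deviations from the group mean."""
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)


def natural_breaks(values, k):
    """Return the k groups of sorted values with minimal within-class variation.

    Brute force: only practical for small datasets (requires k <= len(values)).
    """
    data = sorted(values)
    n = len(data)
    best_groups, best_cost = None, float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        groups = [data[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(within_class_ss(g) for g in groups)
        if cost < best_cost:
            best_groups, best_cost = groups, cost
    return best_groups


print(natural_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3))
# [[1, 2, 3], [10, 11, 12], [50, 51]]
```

Because the total variation is fixed, minimizing the within-class part is equivalent to maximizing the between-class part.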

(21)

OTHER EXPLORATORY METHODS

 Scatter Plot (2 variables)

 Parallel coordinate plot (A pattern of lines is drawn that connects the coordinates of each observation across the variables on parallel x-axes.)

(22)

DETECT OUTLIERS

(23)

OUTLIERS

 Outliers can reveal mistakes, unusual occurrences, and shift points in data patterns (a valley in a mountain range).

 You should use more than one method to find outliers because some techniques will only highlight data values near the two ends of your range.

(24)

PERCENTILE MAP

 Groups ranked data into 6 categories

 Lowest and highest 1% are potential outliers
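
The six-category grouping can be sketched as follows. The slide states only that there are six categories with the lowest and highest 1% flagged; the intermediate breaks at 10, 50, and 90 percent are an assumption, as are the midpoint percentile rank and the function name.

```python
from bisect import bisect_right

# Assumed percentile breaks; only the 1% and 99% cutoffs come from the slide.
PERCENT_BREAKS = [1, 10, 50, 90, 99]
LABELS = ["<1%", "1-10%", "10-50%", "50-90%", "90-99%", ">99%"]


def percentile_classes(values):
    """Label each value with its percentile category based on its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, idx in enumerate(order):
        pct = 100.0 * (rank + 0.5) / n  # midpoint percentile rank (assumed)
        labels[idx] = LABELS[bisect_right(PERCENT_BREAKS, pct)]
    return labels


labels = percentile_classes(list(range(200)))
print(labels.count("<1%"), labels.count(">99%"))  # 2 2
```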

(25)

BOX MAP

 Groups data into 4 categories, plus 2 outlier categories (one at each end)

 Data are flagged as outliers if they lie more than 1.5 (or 3) times the interquartile range (IQR) below the first quartile or above the third quartile.

 Detects outliers with more certainty than a percentile map
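
A box map classification can be sketched with the quartiles and a hinge multiplier. The quartile method (`inclusive`), the category labels, and the function name are assumptions; software packages may compute quartiles slightly differently.

```python
import statistics


def box_map_classes(values, hinge=1.5):
    """Label each value with one of four quartile classes or an outlier class."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - hinge * iqr
    upper_fence = q3 + hinge * iqr
    labels = []
    for v in values:
        if v < lower_fence:
            labels.append("lower outlier")
        elif v < q1:
            labels.append("<25%")
        elif v < q2:
            labels.append("25-50%")
        elif v < q3:
            labels.append("50-75%")
        elif v <= upper_fence:
            labels.append(">75%")
        else:
            labels.append("upper outlier")
    return labels


# Hypothetical data with one extreme value.
print(box_map_classes([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Setting `hinge=3` flags only the more extreme values.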

(26)

STANDARD DEVIATION MAP

 Displays data in classes of up to 3 standard deviations above and below the mean.

 As a parametric map, it is sensitive to outliers.
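
The classification rests on z-scores, the number of standard deviations each value lies from the mean. The sketch below (hypothetical function names and data) also shows why the map is sensitive to outliers: a single extreme value shifts the mean and inflates the standard deviation for every other observation.

```python
import statistics


def z_scores(values):
    """Return each value's distance from the mean in standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mean) / sd for v in values]


def beyond(values, limit=3.0):
    """Return the values lying more than `limit` standard deviations from the mean."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > limit]


base = [10, 11, 12, 13, 14]
print(z_scores(base)[4])          # 14 is above the mean
print(z_scores(base + [100])[4])  # with an outlier added, 14 falls below the mean
print(beyond([10] * 20 + [1000]))  # [1000]
```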

(27)

SEMIVARIOGRAM CLOUD

 When points that are close together have large differences in their values, this may indicate an outlier in the data.

 The selected points may be outliers.
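
A semivariogram cloud plots, for every pair of sample points, the distance between them against half the squared difference of their values. A minimal sketch, assuming samples are given as hypothetical (x, y, value) tuples:

```python
import math
from itertools import combinations


def semivariogram_cloud(samples):
    """Return (distance, semivariance, i, j) for every pair of (x, y, value) samples."""
    cloud = []
    for (i, (x1, y1, z1)), (j, (x2, y2, z2)) in combinations(enumerate(samples), 2):
        distance = math.hypot(x2 - x1, y2 - y1)
        semivariance = 0.5 * (z1 - z2) ** 2
        cloud.append((distance, semivariance, i, j))
    return cloud


samples = [(0, 0, 1.0), (3, 4, 3.0), (0, 1, 1.5)]
for distance, semivariance, i, j in semivariogram_cloud(samples):
    print(i, j, distance, semivariance)
```

Pairs with a short distance but a high semivariance are the ones worth inspecting; a sample index that appears in many such pairs is a candidate outlier.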

(28)

VORONOI MAP

 Cluster Voronoi maps show spatial outliers in your data; simple Voronoi maps can pinpoint data values that are many class breaks removed from surrounding polygons.

The gray polygons may be outliers.

(29)

HISTOGRAM

 Values in the last bars to the left or right, if far removed from the adjacent values, may indicate outliers.

(30)

NORMAL QQ PLOT

 Values at the tails of a normal QQ plot can also be outliers. This can happen when the tail values do not fall along the reference line.

(31)

BOXPLOT

 Points outside the hinges (represented by the black, horizontal lines) may be outliers.

(32)

EXPLORE SPATIAL RELATIONSHIPS IN YOUR DATA

(33)

SPATIAL AUTOCORRELATION



 Everything is related, but objects closer together are more related than objects farther apart.

 Explore using a semivariogram graph or cloud

 Can also be explored using Moran’s I and Getis-Ord G statistics
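
Global Moran's I can be computed directly from the values and a spatial weights matrix. The sketch below uses the standard formula I = (n / S0) * Σ_i Σ_j w_ij (x_i - mean)(x_j - mean) / Σ_i (x_i - mean)^2, where S0 is the sum of all weights. The binary neighbor weights and the function name are hypothetical, and no significance test is included.

```python
def morans_i(values, weights):
    """Global Moran's I for values and a square weights matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


# Four hypothetical locations along a line; adjacent locations are neighbors.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], w))  # positive: similar values are neighbors
print(morans_i([1, 4, 1, 4], w))  # negative: dissimilar values are neighbors
```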

(34)

Height (sill) = variation between data values.

Range = distance between points at which the semivariogram flattens out.

As the distance between points increases, the height should increase, since points farther away from each other are less related and so show more variation.

If a semivariogram is a horizontal line, there is no spatial autocorrelation.

(35)

VARIATION IN YOUR DATA

 Many spatial statistics analysis techniques assume your data are stationary, meaning the relationship between two points and their values depends on the distance between them, not their exact location.

 Explore variation using a Voronoi map.

 A Voronoi map is created by defining Thiessen polygons around each point in your dataset.

 Any location inside a polygon is closer to that data point than to any other data point.

 This allows you to explore the variation of each sample point based on its relationship to surrounding sample points.

(36)

A SIMPLE VORONOI MAP

 A simple Voronoi map shows the data value at each location. The map is symbolized using a geometrical interval classification. This will show the variation in data values across your entire dataset.

Green = little local variation

Orange and Red = greater local variation

(37)

TYPES OF VORONOI MAPS

 Simple: The value assigned to a polygon is the value recorded at the sample point within that polygon.

 Mean: The value assigned to a polygon is the mean value that is calculated from the polygon and its neighbors.

 Mode: All polygons are categorized using five class intervals. The value assigned to a polygon is the mode (most frequently occurring class) of the polygon and its neighbors.

 Cluster: All polygons are categorized using five class intervals. If the class interval of a polygon is different from each of its neighbors, the polygon is colored gray and put into a sixth class to distinguish it from its neighbors.

 Entropy: All polygons are categorized using five classes based on a natural grouping of data values (smart quantiles). The value assigned to a polygon is the entropy that is calculated from the polygon and its neighbors.

Entropy = -Σ (p_i * log p_i), where p_i is the proportion of polygons assigned to class i.
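
The entropy value for one polygon can be sketched from the class labels of the polygon and its neighbors, taking p_i as the share of those polygons that fall in class i. The base-2 logarithm and the function name are assumptions.

```python
import math
from collections import Counter


def voronoi_entropy(classes):
    """Entropy of the class labels of a polygon and its neighbors.

    Assumption: base-2 logarithm; p_i is the share of polygons in class i.
    """
    n = len(classes)
    counts = Counter(classes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical neighborhoods of five polygons (the polygon plus four neighbors).
uniform = voronoi_entropy([2, 2, 2, 2, 2])  # all in one class: no local variation
mixed = voronoi_entropy([1, 2, 3, 4, 5])    # every polygon in a different class
print(uniform, mixed)
```

Low entropy means the neighborhood is homogeneous; high entropy means the values around the polygon vary widely.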

(38)

EXPLORE TRENDS IN YOUR DATA

(39)

TREND ANALYSIS

 You can use the Trend Analysis tool in ArcMap to visually compare the trend lines with any patterns in your data.

 When exploring trends, your data locations are mapped along the x- and y-axes. The values of each data location are mapped as height (z-axis).

 Trends are analyzed based on direction and on the order of the line that fits the trend. The trend line is a mathematical function, or polynomial, that describes the variation in the data.

(40)

These polynomials show a clear curve, indicating a second-order trend in the data.

You can determine whether the order of the polynomial fits your data based on the shape created by the line.

A second-order polynomial will appear as an upward or a downward curve (known as a parabola).
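
Fitting a polynomial trend line can be sketched with ordinary least squares. The example below assumes the data values have been projected onto a single axis (hypothetical lists `t` for position and `z` for value) and solves the normal equations directly; it is not the Trend Analysis tool's implementation.

```python
def fit_polynomial(t, z, order=2):
    """Least-squares polynomial coefficients [c0, c1, ..., c_order] for z(t)."""
    m = order + 1
    # Normal equations: a[r][c] = sum(t^(r+c)), b[r] = sum(z * t^r)
    a = [[sum(ti ** (r + c) for ti in t) for c in range(m)] for r in range(m)]
    b = [sum(zi * ti ** r for ti, zi in zip(t, z)) for r in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - s) / a[r][r]
    return coeffs


# Hypothetical values following an upward curve: z = 5 - 2t + 0.5t^2
t = [0, 1, 2, 3, 4]
z = [5 - 2 * ti + 0.5 * ti ** 2 for ti in t]
print(fit_polynomial(t, z, order=2))
```

A second-order coefficient (the last one) that is clearly nonzero corresponds to the curved, parabola-shaped trend described above; a value near zero suggests a first-order (straight line) trend is enough.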

(41)

SELECTING AN ANALYSIS TECHNIQUE

(42)

 Each of the following techniques is a type of interpolation. Interpolation creates surfaces based on spatially continuous data.

 Each surface uses the values and locations of your points to create (or interpolate) the values for the remaining points in the surface.

(43)

GEOSTATISTICAL INTERPOLATION

 Creates surfaces using the relationships between your data locations and their values.

 Predicts values based on your existing data.

 Assumptions:

 Data is not clustered.

(Simple kriging technique has a declustering option.)

 Data is normally distributed.

(Transformation options are available.)

 Data is stationary (no local variation).

 Data is autocorrelated.

 Data has no local trends.

(You can remove trends from data as part of the interpolation process.)

(44)

GLOBAL DETERMINISTIC INTERPOLATION

 Creates surfaces using the existing values at each location.

 Uses your entire dataset to create your surface.

 Assumptions:

 Outliers have been removed from the data.

 Global trends exist in the data.

(45)

LOCAL DETERMINISTIC INTERPOLATION

 Uses several subsets, or neighborhoods, within an entire dataset to create the different components of the surface.

 Assumption:

 Data is normally distributed.

(46)

INVERSE DISTANCE WEIGHTED INTERPOLATION (IDW)

 A type of local deterministic interpolation.

 Assumptions:

 Data is not clustered.

 Data is autocorrelated.
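
IDW predicts a value at an unsampled location as a weighted average of nearby samples, with weights that shrink as distance grows. The sketch below is a minimal illustration; the `power` and `k` parameters, the function name, and the sample tuples are hypothetical, and ArcMap's tool offers more neighborhood options than shown here.

```python
import math


def idw(samples, x, y, power=2, k=None):
    """Inverse distance weighted estimate at (x, y) from (x, y, value) samples.

    k limits the estimate to the k nearest samples (a local neighborhood);
    k=None uses every sample.
    """
    dists = []
    for sx, sy, value in samples:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return value  # at a sample location, return the measured value
        dists.append((d, value))
    dists.sort(key=lambda pair: pair[0])
    if k is not None:
        dists = dists[:k]
    weights = [1.0 / d ** power for d, _ in dists]
    return sum(w * v for w, (_, v) in zip(weights, dists)) / sum(weights)


samples = [(0, 0, 10.0), (2, 0, 20.0)]
print(idw(samples, 1, 0))         # 15.0, halfway between two equally distant samples
print(idw(samples, 0.5, 0, k=1))  # 10.0, only the nearest sample is used
```

A higher `power` gives nearby samples more influence, which is why autocorrelated, evenly spread data suit this method.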

(47)

OTHER SPATIAL STATISTICAL TESTS

 Tests for spatial autocorrelation

 Getis-Ord General G and Global Moran’s I (to determine overall clustering and dispersion of values)

 Hot Spot Analysis (Getis-Ord Gi*) and Anselin’s Local Moran’s I (to determine specific clusters of high and low values)

 Regression

 Used to evaluate relationships between two or more feature attributes. Are location, crime rates, racial makeup, and income related to housing values in a census tract?