Monte Carlo tests for spatial patterns and their change
Juha Heikkinen
Finnish Forest Research Institute Unioninkatu 40 A, 00170 Helsinki
Workshop on Spatial Statistics and Ecology Perämeri Research Station, Hailuoto 16.–18.10.2003
Motivation
All available observations on long-horned beetles (sarvijäärät) Leptura
quadrifasciata (nelivyöjäärä) and Monochamus sutor (suutari) in Finland,
aggregated to 50km × 50km squares and two time intervals.
• Filled dot: observations both before and after 1960
• Empty dot: observations only before 1960
• + : observations only after 1960
Problems
The usual problem with such data: observational effort spatially and temporally variable and unknown.
Also, species tend to be more abundant in the ‘core’ of their range than at its limits.
Question: Can spatial pattern of observations still reveal real changes? E.g., when no changes, random pattern of empty dots and +’s expected along the limits. Significant clustering of empty dots indicates decline of species?
Monte Carlo significance test
Let H0 be a null hypothesis about the distribution of (multidimensional) random variable X, such that
• H0 is simple (no unknown parameters involved) or
• H0 is composite, but sufficient statistics exist for all nuisance parameters.
Example (Besag & Diggle, 1977)
• X are locations of events (e.g., trees) in study region R
• H0: pattern X is completely random
Null distribution is the homogeneous Poisson process on R:
• number of events in X, n ∼ Poisson(λ|R|), where |R| is the size of R
• the n events are uniformly distributed over R, locations mutually independent
General test procedure (Barnard, 1963)
• Select any test statistic u, sensitive to suspected kind of departure from H0. Suppose large values indicate departure.
• Compute u1 = u(x) from the observed data.
• Simulate m−1 random samples x2,x3, . . . ,xm from the null distribution of X.
• Compute simulated u-values ui = u(xi), i = 2, . . . ,m.
• Order the complete set {u1,u2, . . . ,um}
• If u1 is the k’th largest, then the exact significance level of the test is k/m.
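The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the examples in this talk: the function names (monte_carlo_p_value, simulate_csr, clustering_statistic) are hypothetical, the study region is taken to be the unit square, the statistic u is minus the mean nearest-neighbour distance (so that large values suggest clustering), and ties are counted against the observed value. The reported level is the rank-based k/m.

```python
import math
import random


def monte_carlo_p_value(u_obs, simulated_u):
    """Return k/m, where u_obs is the k'th largest of the m values
    {u_obs} + simulated_u. Ties count against the observed value
    (an assumed, conservative convention)."""
    m = len(simulated_u) + 1
    k = 1 + sum(1 for u in simulated_u if u >= u_obs)
    return k / m


def simulate_csr(n, rng):
    """n independent uniform points on the unit square (conditioning on n)."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def clustering_statistic(points):
    """Hypothetical statistic u: minus the mean nearest-neighbour distance,
    so that large values indicate clustering."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return -total / len(points)


rng = random.Random(1)
# Artificial 'observed' pattern: 20 points packed into a tiny square.
observed = [(0.5 + 0.01 * rng.random(), 0.5 + 0.01 * rng.random())
            for _ in range(20)]
u1 = clustering_statistic(observed)

m = 100
simulated = [clustering_statistic(simulate_csr(len(observed), rng))
             for _ in range(m - 1)]
p_value = monte_carlo_p_value(u1, simulated)
print(p_value)
```

Because the result is always a multiple of 1/m, the smallest attainable level is 1/m; m should be chosen with the intended significance level in mind.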
Citing Hope (1968)
“preferable to use known test of good efficiency instead of a Monte Carlo test procedure assuming that the alternative statistical hypothesis can be completely satisfied.”
Monte Carlo useful (at least in early stages of statistical analysis) when
• conditions for applying test based on (asymptotic) distribution assumptions not satisfied (e.g., small non-normal sample); note exact tests can be approximated by Monte Carlo
• distribution of test statistic under H0 unknown (this is often the case in spatial statistics)
Example (ctd.)
In the point pattern example, each simulated xi is a pattern of n independent uniform points on R (conditioning on n).
A commonly recommended graphical test is obtained by choosing a vector u of estimated values of the so-called K-function (Bartlett, 1964; Ripley, 1977) at a number of distances h > 0.
For a stationary point process with intensity λ (expected number of events per unit area)
λK(h) = E(number of further events within distance h from a random event).
For a random pattern K(h) = πh², for regular patterns K(h) tends to be smaller and for clustered patterns greater (at least for small h).
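To make the definition concrete, here is a minimal Python sketch of a naive K-function estimate. It assumes the estimator |R| × (number of ordered pairs within distance h) / (n(n − 1)), which follows from replacing λ by n/|R| in the definition above, and it applies no edge correction, so it is biased downwards for larger h; packaged estimators typically correct for edge effects. The name k_hat is hypothetical.

```python
import math


def k_hat(points, h, area):
    """Naive estimate of K(h): area * (ordered pairs within distance h)
    / (n * (n - 1)). No edge correction is applied (an assumption of
    this sketch), so values are biased downwards for larger h."""
    n = len(points)
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= h
    )
    return area * close_pairs / (n * (n - 1))


# Four points at the corners of a unit square: side pairs are at distance 1,
# diagonal pairs at distance sqrt(2).
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for h in (0.5, 1.0, 1.5):
    # Print the estimate next to the Poisson reference value pi * h^2.
    print(h, k_hat(corners, h, area=1.0), math.pi * h ** 2)
```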
Tree map from a 50m × 50m sample plot in Lapland
[Figure: tree map and plot of K(h) against h]
L-function
L(h) = √(K(h)/π) − h
motivated by variance stabilising square-root transformation (Besag, 1977; Silverman, 1977) and comparison to the Poisson process (for which L(h) ≡ 0)
Footnote: or L(h) = √K(h) (Silverman, 1977) or . . . ; L-function does not seem to be a very well-defined concept.
L-function for Lapland trees
[Figure: tree map and plot of L(h) against h]
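As a small worked check of the transformation, the Python sketch below applies L(h) = √(K(h)/π) − h to the Poisson value K(h) = πh², which should give zero up to rounding, and to inflated and deflated K values standing in for clustered and regular patterns. The function name l_function and the factors 1.5 and 0.5 are illustrative assumptions.

```python
import math


def l_function(k_value, h):
    """L(h) = sqrt(K(h) / pi) - h."""
    return math.sqrt(k_value / math.pi) - h


for h in (0.5, 1.0, 2.0):
    k_poisson = math.pi * h ** 2  # K(h) for a completely random pattern
    print(
        h,
        l_function(k_poisson, h),        # approximately 0
        l_function(1.5 * k_poisson, h),  # positive: more neighbours than random
        l_function(0.5 * k_poisson, h),  # negative: fewer neighbours than random
    )
```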
Dependence between different types of events
The cross K function can be defined as
λj Kij(h) = E(number of type j events within distance h
from a randomly chosen type i event),
where λj is the intensity of type j events.
To test for association between events of different types, a bivariate Poisson process is not a particularly useful null hypothesis, because patterns of each type should be allowed to have a structure among themselves. The whole marginal models for each type are nuisance parameters!
Conditioning on marginal patterns (Lotwick & Silverman, 1982)
Generate replications under
H0 : no association between events of different types
by randomly shifting the whole pattern of one type relative to the other; events moved outside of R by a shift ‘reappear’ in R from the opposite side or corner (the toroidal idea, works for rectangular R)
Amacrine cells in a rabbit’s eye (Diggle, 1986)
Are the two types of cells formed initially in two separate layers, or does differentiation occur at a later stage of development?
Are patterns of black and white dots independent?
L̂12(h) and simulation envelopes from 29 random toroidal shifts.
[Figure: cross-L(h) against h]
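The random toroidal shift used to generate replications under the hypothesis of no association can be sketched in Python as below. This is an illustration under stated assumptions, not the method behind the figure: the region is the rectangle [0, width) × [0, height), the statistic is a simple count of cross-type pairs within distance h rather than an estimated cross-L function, both patterns are artificial uniform points, and all function names are hypothetical.

```python
import math
import random


def toroidal_shift(points, dx, dy, width, height):
    """Shift all points by (dx, dy); points leaving the rectangle
    [0, width) x [0, height) re-enter from the opposite side."""
    return [((x + dx) % width, (y + dy) % height) for x, y in points]


def cross_pair_count(type1, type2, h):
    """Hypothetical statistic u: number of (type 1, type 2) pairs
    within distance h of each other."""
    return sum(1 for p in type1 for q in type2 if math.dist(p, q) <= h)


def toroidal_shift_test(type1, type2, h, width, height, n_shifts, rng):
    """Monte Carlo p-value k/m with m = n_shifts + 1; large u suggests
    positive association. Ties count against the observed value."""
    u_obs = cross_pair_count(type1, type2, h)
    k = 1
    for _ in range(n_shifts):
        # Shift the whole type 2 pattern; type 1 stays where it is.
        shifted = toroidal_shift(type2, rng.uniform(0, width),
                                 rng.uniform(0, height), width, height)
        if cross_pair_count(type1, shifted, h) >= u_obs:
            k += 1
    return k / (n_shifts + 1)


rng = random.Random(2)
# Artificial data: two independent uniform patterns on the unit square.
type1 = [(rng.random(), rng.random()) for _ in range(30)]
type2 = [(rng.random(), rng.random()) for _ in range(30)]
print(toroidal_shift_test(type1, type2, h=0.1, width=1.0, height=1.0,
                          n_shifts=29, rng=rng))
```

The shift keeps the internal structure of each marginal pattern intact, which is the point of conditioning on the marginal patterns.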
Random labelling of events
H0 : black dots are a random subset of all dots
is not the same hypothesis as independence between patterns.
Random labelling is often considered in epidemiological case-control studies, where apparent clustering of cases may result from inhomogeneous population density.
‘Controls’ (type 2 events): a random sample from the population at risk. If no clustering then ‘cases’ (type 1 events) are a random sample of the pattern of cases and controls.
Monte Carlo test for random labelling
H0 suggests the obvious simulation method: choose the observed number of ‘cases’ randomly from the combined pattern.
In other words, fix spatial locations and permute the type labels, so this leads back to Fisher (1935).
Under random labelling marginal patterns are random thinnings of combined patterns, which implies
K11(h) = K22(h)
This suggests choosing u from differences K̂11(h) − K̂22(h) to study clustering of cases over the natural environmental spatial clustering of controls.
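A minimal Python sketch of the random labelling test follows, assuming a single distance h, a naive K estimate without edge correction, and artificial data in the unit square (a tight cluster of cases and a spread-out row of controls). The names k_hat and random_labelling_test are hypothetical; this is not the code behind the Oklahoma City example.

```python
import math
import random


def k_hat(points, h, area):
    """Naive K estimate without edge correction (assumption of this sketch)."""
    n = len(points)
    close = sum(1 for i, p in enumerate(points) for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= h)
    return area * close / (n * (n - 1))


def random_labelling_test(cases, controls, h, area, m, rng):
    """p-value k/m for u = K11_hat(h) - K22_hat(h); large u suggests
    clustering of cases beyond that of controls."""
    n_cases = len(cases)
    combined = list(cases) + list(controls)
    u_obs = k_hat(cases, h, area) - k_hat(controls, h, area)
    k = 1
    for _ in range(m - 1):
        rng.shuffle(combined)  # permute labels, locations stay fixed
        u = (k_hat(combined[:n_cases], h, area)
             - k_hat(combined[n_cases:], h, area))
        if u >= u_obs:
            k += 1
    return k / m


# Artificial data: tightly clustered cases, evenly spaced controls.
cases = [(0.5 + 0.001 * i, 0.5) for i in range(10)]
controls = [(0.05 + 0.1 * i, 0.1) for i in range(10)]
p_value = random_labelling_test(cases, controls, h=0.05, area=1.0,
                                m=100, rng=random.Random(4))
print(p_value)
```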
Thefts by blacks and whites in Oklahoma City (Bailey & Gatrell, 1995)
[Figure: map of locations]
Simulation envelopes, random labelling
[Figure: K̂1 − K̂2 against distance]
Space-time clustering
Space-time K-function (Diggle et al., 1993) can be defined by
λK(h,t) = E(number of events within distance h and time interval t
from a randomly chosen event).
Here λ is the intensity in space-time: expected number of events in a space-time box of size one area unit by one time unit.
Monte Carlo test for space-time interaction
Simulation under
H0 : no space-time interaction
by random permutations of time labels keeping spatial locations fixed.
If the processes operating in time and space are independent (no space-time interaction) then
K(h,t) = KS(h)KT(t),
where KS is the usual (spatial) K-function and KT is the similarly defined function in time domain.
This suggests choosing u from differences D̂(h,t) = K̂(h,t) − K̂S(h)K̂T(t) to study whether events clustered in space are also close together in time.
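The permutation test for space-time interaction can be sketched in Python as follows. The sketch assumes naive estimates of K, KS and KT without edge correction, a single pair (h, t), and artificial data made of two groups of events that are close in both space and time. The names pair_fractions, d_hat and space_time_test are hypothetical. Note that permuting the time labels leaves the spatial and temporal estimates unchanged, so only the joint term varies between replications.

```python
import math
import random


def pair_fractions(locs, times, h, t):
    """Fractions of ordered pairs that are close in space and time,
    close in space, and close in time."""
    n = len(locs)
    both = near_space = near_time = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = math.dist(locs[i], locs[j]) <= h
            c = abs(times[i] - times[j]) <= t
            near_space += s
            near_time += c
            both += s and c
    pairs = n * (n - 1)
    return both / pairs, near_space / pairs, near_time / pairs


def d_hat(locs, times, h, t, area, duration):
    """Naive D(h,t) = K(h,t) - KS(h) * KT(t), no edge correction (assumed)."""
    both, near_space, near_time = pair_fractions(locs, times, h, t)
    return area * duration * both - (area * near_space) * (duration * near_time)


def space_time_test(locs, times, h, t, area, duration, m, rng):
    """p-value k/m; time labels are permuted, locations stay fixed."""
    u_obs = d_hat(locs, times, h, t, area, duration)
    permuted = list(times)
    k = 1
    for _ in range(m - 1):
        rng.shuffle(permuted)
        if d_hat(locs, permuted, h, t, area, duration) >= u_obs:
            k += 1
    return k / m


# Artificial data: two groups, each compact in space and in time.
locs = ([(0.2 + 0.005 * i, 0.2) for i in range(10)]
        + [(0.8 + 0.005 * i, 0.8) for i in range(10)])
times = ([1.0 + 0.05 * i for i in range(10)]
         + [9.0 + 0.05 * i for i in range(10)])
p_value = space_time_test(locs, times, h=0.1, t=1.0, area=1.0,
                          duration=10.0, m=100, rng=random.Random(5))
print(p_value)
```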
Burkitt’s lymphoma in Uganda (Bailey & Gatrell, 1995)
[Figure: panels 1961, 1962, 1963, 1964, 1965]
Plot of D̂(h,t) and a Monte Carlo test
[Figure: D plot (axes: Distance, Time, D) and histogram of MC results (axes: Test statistic, Frequency; legend: Data Statistic, MC results)]
Tools
All these methods are discussed in Bailey & Gatrell (1995) in a more general context of point pattern analysis.
Manly (1997) is a gentle introduction to Monte Carlo methods in general.
All the analyses are easily accessible to anyone in the public domain spatial point pattern analysis package splancs, running under commercial Splus and free R
http://cran.r-project.org
splancs-functions for the examples
The splancs-functions which did the essential work for the examples were
Kenv.csr for the simple point pattern example
Kenv.tor for random toroidal shifts of one pattern w.r.t. another
Kenv.label for random labelling of two types of events
stdiagn for space-time clustering analysis
Data sets for the examples (all but one)
The amacrine cells data set is available as data set amacrine in Splus/R-package spatstat
http://www.maths.uwa.edu.au/~adrian/spatstat.html
Locations of Oklahoma City offences (okblack, okwhite) and Burkitt’s lymphoma cases (burkitt) are available in splancs. Type
Back to beetle problem
The actual data are a space-time point pattern.
Random permutations of time labels, keeping the spatial locations fixed, take care of conditioning on marginal variation of observational effort and abundance both in space and time.
Monte Carlo test and interpretation
Rejection of
H0 : no space-time interaction
indicates that
• either spatial distribution of species has changed
• or observational effort has changed in some parts of the country differently from other parts
Discrimination between these explanations left to the ecologist (unless info on observational effort somehow extracted)
Test statistic?
Could perhaps be more focussed than the general space-time
K-function.
For example, K-functions of empty dots (to test for decline) and +’s (to test for expansion). In line with the original idea.
Further ideas warmly welcome!
Bailey, T. C. & Gatrell, A. C. (1995). Interactive spatial data analysis. Longman Scientific & Technical, Harlow.
Barnard, G. A. (1963). Discussion on paper by M. S. Bartlett. J. R. Stat. Soc. Ser. B 25: 294.
Bartlett, M. S. (1964). The spectral analysis of two-dimensional point processes. Biometrika 51: 299–311.
Besag, J. & Diggle, P. J. (1977). Simple Monte Carlo tests for spatial pattern. Appl. Statist. 26: 327–333.
Besag, J. E. (1977). Discussion on paper by B. D. Ripley. J. R. Stat. Soc. Ser. B 39: 193–195.
Diggle, P. J., Chetwynd, A. G., Haggkvist, R. & Morris, S. (1993). Second-order analysis of space-time clustering. Statistical Methods in Medical Research 4: 124–136.
Fisher, R. A. (1935). The design of experiments. Oliver and Boyd, Edinburgh.
Hope, A. C. A. (1968). A simplified Monte Carlo significance test procedure. J. R. Stat. Soc. Ser. B 30: 582–598.
Lotwick, H. W. & Silverman, B. W. (1982). Methods for analysing spatial processes of several types of points. J. R. Stat. Soc. Ser. B 44: 406–413.
Manly, B. F. J. (1997). Randomization, bootstrap and Monte Carlo methods in biology. Chapman & Hall/CRC, Boca Raton, 2nd edn.
Ripley, B. D. (1977). Modelling spatial patterns (with discussion). J. R. Stat.