Monte Carlo tests for spatial patterns and their change

(1)


Juha Heikkinen

Finnish Forest Research Institute, Unioninkatu 40 A, 00170 Helsinki


Workshop on Spatial Statistics and Ecology, Perämeri Research Station, Hailuoto, 16.–18.10.2003

(2)

Motivation

All available observations on long-horned beetles (sarvijäärät) Leptura quadrifasciata (nelivyöjäärä) and Monochamus sutor (suutari) in Finland, aggregated to 50 km × 50 km squares and two time intervals.

• Filled dot: observations both before and after 1960

• Empty dot: observations only before 1960

• + : observations only after 1960

(3)

(4)

Problems

The usual problem with such data: observational effort spatially and temporally variable and unknown.

Also, species tend to be more abundant in the ‘core’ of their range than at its limits.

Question: Can the spatial pattern of observations still reveal real changes? E.g., if there are no changes, a random pattern of empty dots and +’s is expected along the limits. Does significant clustering of empty dots indicate a decline of the species?

(5)

Monte Carlo significance test

Let H₀ be a null hypothesis about the distribution of a (multidimensional) random variable X, such that

• H₀ is simple (no unknown parameters involved) or

• H₀ is composite, but sufficient statistics exist for all nuisance parameters.

(6)

Example (Besag & Diggle, 1977)

• X are locations of events (e.g., trees) in study region R

• H₀: pattern X is completely random

Null distribution is the homogeneous Poisson process on R:

• number of events in X, n ∼ Poisson(λ|R|), where |R| is the size of R

• the n events are uniformly distributed over R, locations mutually independent

(7)

General test procedure (Barnard, 1963)

• Select any test statistic u, sensitive to suspected kind of departure from H₀. Suppose large values indicate departure.

• Compute u₁ = u(x) from the observed data.

• Simulate m−1 random samples x₂,x₃, . . . ,x_m from the null distribution of X.

• Compute simulated u-values u_i = u(x_i), i = 2, . . . ,m.

• Order the complete set {u₁,u₂, . . . ,u_m}

• If u₁ is the k’th largest, then the exact significance level of the test is k/m.
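The steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation of any package: the function names (`monte_carlo_p_value`, `monte_carlo_test`, `mean_nn_distance`), the unit-square region and the choice of the negated mean nearest-neighbour distance as the statistic u are all assumptions made for the example. The significance level is taken as k/m, with ties counted against the data, which is a conservative convention.

```python
import math
import random


def monte_carlo_p_value(u_obs, u_sim):
    """Significance level k/m when u_obs is the k-th largest of all m values."""
    m = len(u_sim) + 1
    # Simulated values that tie with the observed one are ranked above it.
    k = 1 + sum(1 for u in u_sim if u >= u_obs)
    return k / m


def monte_carlo_test(x_obs, statistic, simulate_null, m=100, rng=None):
    """Compare u(x_obs) with m - 1 values simulated under the null hypothesis."""
    rng = rng or random.Random()
    u_obs = statistic(x_obs)
    u_sim = [statistic(simulate_null(rng)) for _ in range(m - 1)]
    return monte_carlo_p_value(u_obs, u_sim)


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: 30 points squeezed into a small corner of the unit square.
    observed = [(0.1 * rng.random(), 0.1 * rng.random()) for _ in range(30)]
    n = len(observed)
    p = monte_carlo_test(
        observed,
        statistic=lambda pts: -mean_nn_distance(pts),  # large u = short distances
        simulate_null=lambda r: [(r.random(), r.random()) for _ in range(n)],
        m=100,
        rng=rng,
    )
    print("Monte Carlo significance level:", p)
```

With m = 100 the smallest attainable level is 1/100, so m should be chosen with the intended significance level in mind.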

(8)

Citing Hope (1968)

“preferable to use known test of good efficiency instead of a Monte Carlo test procedure assuming that the alternative statistical hypothesis can be completely satisfied.”

Monte Carlo useful (at least in early stages of statistical analysis) when

• conditions for applying test based on (asymptotic) distribution assumptions not satisfied (e.g., small non-normal sample); note exact tests can be approximated by Monte Carlo

• distribution of test statistic under H₀ unknown (this is often the case in spatial statistics)

(9)

Example (ctd.)

In the point pattern example, each simulated x_i is a pattern of n independent uniform points on R (conditioning on n).

A commonly recommended graphical test is obtained by choosing a vector u of estimated values of the so-called K-function (Bartlett, 1964; Ripley, 1977) at a number of distances h > 0.

For a stationary point process with intensity λ (expected number of events per unit area)

λK(h) = E(number of further events within distance h from a random event).

For a random pattern K(h) = πh²; for regular patterns K(h) tends to be smaller and for clustered patterns greater (at least for small h).
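As a rough illustration of the definition above, the following Python sketch estimates K(h) by counting ordered pairs of events within distance h of each other. It assumes the simplest possible estimator, with no edge correction, so it is biased downwards near the boundary of the region. The function name `k_estimate` and the simulated unit-square patterns are hypothetical, and this is not the estimator used by any particular package.

```python
import math
import random


def k_estimate(points, h, area):
    """Naive estimate of K(h): area * (ordered pairs within h) / (n * (n - 1))."""
    n = len(points)
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= h
    )
    return area * close_pairs / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(2003)
    h = 0.05
    random_pattern = [(rng.random(), rng.random()) for _ in range(300)]
    # Hypothetical clustered pattern: 30 parents, 10 offspring scattered near each.
    # Offspring may fall marginally outside the unit square; this is ignored here.
    parents = [(rng.random(), rng.random()) for _ in range(30)]
    clustered = [
        (px + rng.gauss(0, 0.01), py + rng.gauss(0, 0.01))
        for px, py in parents
        for _ in range(10)
    ]
    print("pi * h^2         :", math.pi * h ** 2)
    print("random pattern   :", k_estimate(random_pattern, h, area=1.0))
    print("clustered pattern:", k_estimate(clustered, h, area=1.0))
```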

(10)

Tree map from a 50m × 50m sample plot in Lapland

[Figure: tree map of the plot, and K(h) plotted against h.]

(11)

L-function

L(h) = √(K(h)/π) − h [a], motivated by the variance-stabilising square-root transformation (Besag, 1977; Silverman, 1977) and comparison to the Poisson process (for which L(h) ≡ 0)

[a] or L(h) = √K(h) (Silverman, 1977) or . . . ; the L-function does not seem to be a very well-defined concept.
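A small Python sketch of the first definition, L(h) = √(K(h)/π) − h, makes the Poisson benchmark explicit: substituting K(h) = πh² gives zero at every distance. The function name `l_from_k` is a hypothetical choice for this illustration.

```python
import math


def l_from_k(k_value, h):
    """Transform a K-function value into L(h) = sqrt(K(h) / pi) - h."""
    return math.sqrt(k_value / math.pi) - h


if __name__ == "__main__":
    for h in (0.05, 0.10, 0.20):
        poisson_k = math.pi * h ** 2          # K(h) for a completely random pattern
        print(h, l_from_k(poisson_k, h))      # zero up to rounding error
        print(h, l_from_k(2 * poisson_k, h))  # positive: more neighbours than expected
```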

(12)

L-function for Lapland trees

[Figure: tree map of the plot, and L(h) plotted against h.]

(13)

Dependence between different types of events

The cross K-function can be defined by

λ_j K_ij(h) = E(number of type j events within distance h from a randomly chosen type i event),

where λ_j is the intensity of type j events.

To test for association between events of different types, the bivariate Poisson process is not a particularly useful null hypothesis, because the patterns of each type should be allowed to have a structure among themselves. The whole marginal models for each type are nuisance parameters!

(14)

Conditioning on marginal patterns (Lotwick & Silverman, 1982)

Generate replications under

H₀ : no association between events of different types

by randomly shifting the whole pattern of one type relative to the other; events moved outside of R by a shift ‘reappear’ in R from the opposite side or corner (the toroidal idea, works for rectangular R)
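The toroidal shift can be sketched in Python as below. The sketch assumes a rectangular region with its lower-left corner at the origin, and it uses a naive cross K estimate without edge correction as the statistic. All function names (`toroidal_shift`, `cross_k_estimate`, `toroidal_shift_simulations`) and the simulated data are hypothetical, chosen only for this illustration.

```python
import math
import random


def toroidal_shift(points, width, height, rng):
    """Shift every point by one common random vector, wrapping at the edges."""
    dx = rng.uniform(0.0, width)
    dy = rng.uniform(0.0, height)
    return [((x + dx) % width, (y + dy) % height) for x, y in points]


def cross_k_estimate(points_i, points_j, h, area):
    """Naive cross K estimate: area * close (i, j) pairs / (n_i * n_j)."""
    close_pairs = sum(
        1 for p in points_i for q in points_j if math.dist(p, q) <= h
    )
    return area * close_pairs / (len(points_i) * len(points_j))


def toroidal_shift_simulations(points_1, points_2, h, width, height, nsim, rng):
    """Cross K values for nsim random toroidal shifts of pattern 2 against pattern 1."""
    area = width * height
    return [
        cross_k_estimate(
            points_1, toroidal_shift(points_2, width, height, rng), h, area
        )
        for _ in range(nsim)
    ]


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical data: two independent random patterns on the unit square.
    pattern_1 = [(rng.random(), rng.random()) for _ in range(50)]
    pattern_2 = [(rng.random(), rng.random()) for _ in range(50)]
    observed = cross_k_estimate(pattern_1, pattern_2, 0.1, 1.0)
    simulated = toroidal_shift_simulations(pattern_1, pattern_2, 0.1, 1.0, 1.0, 29, rng)
    print("observed:", observed)
    print("envelope:", min(simulated), max(simulated))
```

Because every point of the shifted pattern moves by the same vector, the internal structure of each marginal pattern is preserved (up to the wrap-around), which is the point of conditioning on the marginal patterns.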

(15)

Amacrine cells in a rabbit’s eye (Diggle, 1986)

Are the two types of cells formed initially in two separate layers, or does differentiation occur in a later stage of development?

(16)

(17)

Are patterns of black and white dots independent?

L̂₁₂(h) and simulation envelopes from 29 random toroidal shifts.

[Figure: cross-L(h) plotted against h, with simulation envelopes.]

(18)

Random labelling of events

H₀ : black dots are a random subset of all dots

is not the same hypothesis as independence between patterns.

Random labelling is often considered in epidemiological case-control studies, where apparent clustering of cases may result from inhomogeneous population density.

‘Controls’ (type 2 events): a random sample from the population at risk. If there is no clustering, then the ‘cases’ (type 1 events) are a random sample from the combined pattern of cases and controls.

(19)

Monte Carlo test for random labelling

H₀ suggests the obvious simulation method: choose the observed number of ‘cases’ randomly from the combined pattern.

In other words, fix spatial locations and permute the type labels, so this leads back to Fisher (1935).

Under random labelling marginal patterns are random thinnings of combined patterns, which implies

K₁₁(h) = K₂₂(h)

This suggests choosing u from differences K̂₁₁(h) − K̂₂₂(h) to study clustering of cases over the natural environmental spatial clustering of controls.
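A random labelling test along these lines might look as follows in Python. The sketch assumes a naive K estimate without edge correction and a single distance h rather than a range of distances. The function names (`k_estimate`, `k_difference`, `random_relabel`, `random_labelling_test`) and the simulated data are hypothetical.

```python
import math
import random


def k_estimate(points, h, area):
    """Naive K(h) estimate without edge correction."""
    n = len(points)
    close_pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= h
    )
    return area * close_pairs / (n * (n - 1))


def k_difference(cases, controls, h, area):
    """Test statistic: K estimate of cases minus K estimate of controls."""
    return k_estimate(cases, h, area) - k_estimate(controls, h, area)


def random_relabel(cases, controls, rng):
    """Keep all locations fixed and reassign the case/control labels at random."""
    combined = list(cases) + list(controls)
    rng.shuffle(combined)
    return combined[: len(cases)], combined[len(cases):]


def random_labelling_test(cases, controls, h, area, m=100, rng=None):
    """Monte Carlo significance level k/m for large values of the K difference."""
    rng = rng or random.Random()
    u_obs = k_difference(cases, controls, h, area)
    u_sim = []
    for _ in range(m - 1):
        sim_cases, sim_controls = random_relabel(cases, controls, rng)
        u_sim.append(k_difference(sim_cases, sim_controls, h, area))
    k = 1 + sum(1 for u in u_sim if u >= u_obs)
    return k / m


if __name__ == "__main__":
    rng = random.Random(11)
    # Hypothetical data: controls spread over the unit square, cases concentrated
    # around one location.
    controls = [(rng.random(), rng.random()) for _ in range(60)]
    cases = [(0.5 + rng.gauss(0, 0.03), 0.5 + rng.gauss(0, 0.03)) for _ in range(20)]
    print(random_labelling_test(cases, controls, h=0.1, area=1.0, m=100, rng=rng))
```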

(20)

Thefts by blacks and whites in Oklahoma City (Bailey & Gatrell, 1995)

[Figure: map of offence locations, and simulation envelopes under random labelling for K̂₁ − K̂₂ plotted against distance.]

(21)

Space-time clustering

Space-time K-function (Diggle et al., 1993) can be defined by

λK(h,t) = E(number of events within distance h and time interval t from a randomly chosen event).

Here λ is the intensity in space-time: the expected number of events in a space-time box of size one area unit by one time unit.

(22)

Monte Carlo test for space-time interaction

Simulation under

H₀ : no space-time interaction

by random permutations of time labels, keeping spatial locations fixed.

If the processes operating in time and space are independent (no space-time interaction) then

K(h,t) = K_S(h)K_T(t),

where K_S is the usual (spatial) K-function and K_T is the similarly defined function in time domain.

This suggests choosing u from differences D̂(h,t) = K̂(h,t) − K̂_S(h)K̂_T(t) to study whether events clustered in space are also close together in time.
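The permutation test based on D̂(h,t) = K̂(h,t) − K̂_S(h)K̂_T(t) can be sketched in Python as below. Several simplifications are assumed: no edge correction in space or time, neighbours counted in both time directions for K̂_T, and a single (h, t) pair as the statistic instead of a summary over a grid of distances and times. The function names (`pair_fractions`, `d_estimate`, `space_time_test`) and the simulated data are hypothetical.

```python
import math
import random


def pair_fractions(locations, times, h, t):
    """Fractions of ordered pairs that are close in space, in time, and in both."""
    n = len(locations)
    space = time = both = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            near_s = math.dist(locations[i], locations[j]) <= h
            near_t = abs(times[i] - times[j]) <= t
            space += near_s
            time += near_t
            both += near_s and near_t
    total = n * (n - 1)
    return space / total, time / total, both / total


def d_estimate(locations, times, h, t, area, duration):
    """Naive D(h, t) = K(h, t) - K_S(h) * K_T(t), without edge correction."""
    f_space, f_time, f_both = pair_fractions(locations, times, h, t)
    k_st = area * duration * f_both
    k_s = area * f_space
    k_t = duration * f_time
    return k_st - k_s * k_t


def space_time_test(locations, times, h, t, area, duration, m=100, rng=None):
    """Monte Carlo significance level k/m for large values of D(h, t)."""
    rng = rng or random.Random()
    u_obs = d_estimate(locations, times, h, t, area, duration)
    u_sim = []
    for _ in range(m - 1):
        permuted = list(times)
        rng.shuffle(permuted)  # locations stay fixed, time labels are permuted
        u_sim.append(d_estimate(locations, permuted, h, t, area, duration))
    k = 1 + sum(1 for u in u_sim if u >= u_obs)
    return k / m


if __name__ == "__main__":
    rng = random.Random(5)
    # Hypothetical data: locations and times generated independently of each other.
    locations = [(rng.random(), rng.random()) for _ in range(60)]
    times = [rng.uniform(0.0, 100.0) for _ in range(60)]
    print(space_time_test(locations, times, 0.1, 10.0, 1.0, 100.0, m=100, rng=rng))
```

Permuting the time labels leaves K̂_S and K̂_T unchanged, so only the joint count of pairs close in both space and time varies between simulations.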

(23)

Burkitt’s lymphoma in Uganda (Bailey & Gatrell, 1995)

[Figure: one panel per year, 1961 to 1965.]

(24)

Plot of D̂(h,t) and a Monte Carlo test

[Figure: D plot over Distance and Time, and a frequency histogram of the test statistic showing the data statistic and the MC results.]

(25)

Tools

All these methods are discussed in Bailey & Gatrell (1995) in a more general context of point pattern analysis.

Manly (1997) is a gentle introduction to Monte Carlo methods in general.

All the analyses are easily accessible to anyone in the public-domain spatial point pattern analysis package splancs, which runs under commercial Splus and free R

http://cran.r-project.org

(26)

splancs-functions for the examples

The splancs functions which did the essential work for the examples were

• Kenv.csr for the simple point pattern example

• Kenv.tor for random toroidal shifts of one pattern w.r.t. another

• Kenv.label for random labelling of two types of events

• stdiagn for space-time clustering analysis

(27)

Data sets for the examples (all but one)

The amacrine cells data set is available as data set amacrine in Splus/R-package spatstat

http://www.maths.uwa.edu.au/~adrian/spatstat.html

locations of Oklahoma City offences (okblack, okwhite) and Burkitt’s lymphoma cases (burkitt) are available in splancs. Type

(28)

Back to beetle problem

The actual data are a space-time point pattern.

Random permutations of time labels, keeping the spatial locations fixed, take care of conditioning on marginal variation of observational effort and abundance both in space and time.

(29)

Monte Carlo test and interpretation

Rejection of

H₀ : no space-time interaction indicates that

• either spatial distribution of species has changed

• or observational effort has changed in some parts of the country differently from other parts

Discrimination between these explanations is left to the ecologist (unless information on observational effort can somehow be extracted).

(30)

Test statistic?

Could perhaps be more focussed than the general space-time K-function.

For example, K-functions of empty dots (to test for decline) and +’s (to test for expansion). In line with the original idea.

Further ideas warmly welcome!

(31)

Bailey, T. C. & Gatrell, A. C. (1995). Interactive spatial data analysis. Longman Scientific & Technical, Harlow.

Barnard, G. A. (1963). Discussion on paper by M. S. Bartlett. J. R. Stat. Soc. Ser. B 25: 294.

Bartlett, M. S. (1964). The spectral analysis of two-dimensional point processes. Biometrika 51: 299–311.

Besag, J. & Diggle, P. J. (1977). Simple Monte Carlo tests for spatial pattern. Appl. Statist. 26: 327–333.

Besag, J. E. (1977). Discussion on paper by B. D. Ripley. J. R. Stat. Soc. Ser. B 39: 193–195.

(32)

Diggle, P. J., Chetwynd, A. G., Haggkvist, R. & Morris, S. (1993). Second-order analysis of space-time clustering. Statistical Methods in Medical Research 4: 124–136.

Fisher, R. A. (1935). The design of experiments. Oliver and Boyd, Edinburgh.

Hope, A. C. A. (1968). A simplified Monte Carlo significance test procedure. J. R. Stat. Soc. Ser. B 30: 582–598.

Lotwick, H. W. & Silverman, B. W. (1982). Methods for analysing spatial processes of several types of points. J. R. Stat. Soc. Ser. B 44: 406–413.

Manly, B. F. J. (1997). Randomization, bootstrap and Monte Carlo methods in biology. Chapman & Hall/CRC, Boca Raton, 2nd edn.

Ripley, B. D. (1977). Modelling spatial patterns (with discussion). J. R. Stat.

(33)