Practical maximum pseudolikelihood for spatial point patterns

Adrian Baddeley and Rolf Turner

Abstract

We describe a technique for computing approximate maximum pseudolikelihood estimates of the parameters of a spatial point process. The method is an extension of Berman and Turner's [7] device for maximising the likelihoods of inhomogeneous spatial Poisson processes. For a very wide class of spatial point process models the likelihood is intractable, while the pseudolikelihood [8] is known explicitly, except for the computation of an integral over the sampling region. Approximating this integral by a finite sum in a special way yields an approximate pseudolikelihood which is formally equivalent to the (weighted) likelihood of a loglinear model with Poisson responses. This can be maximised using standard statistical software for generalised linear or additive models, provided the conditional intensity of the process takes an 'exponential family' form. Using this approach we are able to fit rapidly a wide variety of spatial point process models of Gibbs type, incorporating spatial trends, interaction between points, dependence on spatial covariates, and mark information.

Keywords: Area-interaction process; Berman-Turner device; Dirichlet tessellation; Edge effects; Generalised additive models; Generalised linear models; Gibbs point processes; GLIM; Hard core process; Inhomogeneous point process; Marked point processes; Markov spatial point processes; Ord's process; Pairwise interaction; Profile pseudolikelihood; Spatial clustering; Soft core process; Spatial trend; S-PLUS; Strauss process; Widom-Rowlinson model.


Introduction

This paper describes a computational device for rapidly fitting statistical models to spatial point patterns. Applications are shown in Section 9. Datasets may consist of points in two or three dimensions or in space-time; the points may be classified into different types or carry auxiliary observations ("marks"). Additionally there may be spatial covariates, such as topography or another spatial pattern observed in the same region.

Realistic models for such data should incorporate both spatial inhomogeneity ('trend') and dependence between points ('interaction' such as clustering or regularity). Ogata and Tanemura [44, 45, 46, 47] and Penttinen [49] developed methods for maximum likelihood estimation for such models, and applied them to real data. Recent advances have been made by Geyer, Møller and others [20, 21]. However, maximum likelihood is computationally intensive, and employs simulation algorithms which are specific to the chosen model. It is even more costly for inhomogeneous spatial patterns because of increased parameter dimensionality and complexity of simulation. This militates against the modern statistical practice of fitting several alternative models to the same dataset and introducing smooth functions as model terms. Few writers apart from Ogata and Tanemura [47] have fitted inhomogeneous point process models, other than the inhomogeneous Poisson process, to real spatial data.

Berman and Turner [7] introduced a technique for maximising the likelihoods of (a) general point processes in time, and (b) inhomogeneous Poisson processes in $d$-dimensional space. The intensity or conditional intensity of the process is assumed to be loglinear in the parameters. They approximated the log likelihood by a finite sum which has the same analytical form as the (weighted) log likelihood of a generalised linear model with Poisson responses. The approximate likelihood can then be maximised using existing software for generalised linear models. Related ideas have been explored by Lindsey [36, 37, 38, 39].

In this paper we extend the Berman-Turner device to a much larger class of spatial point process models, namely Gibbs point processes with exponential family likelihoods. We obtain an approximation to the pseudolikelihood [8, 9, 30] rather than to the likelihood. The maximum pseudolikelihood estimator is a practical alternative to the MLE, satisfies unbiased estimating equations and is consistent and asymptotically normal under suitable conditions. The MLE is not necessarily optimal here since the usual asymptotic theory is not applicable. Under reasonable assumptions [16] the maximum pseudolikelihood normal equations are a special case of the Takacs-Fiksel estimating equations, an application of the method of moments [17, 61, 18].


Using the extended Berman-Turner device, and standard statistical software, we are able rapidly to fit quite complex spatial stochastic models involving spatial trends and spatial covariates as well as interactions between points.

The plan of the paper is as follows. Definitions and background are given in Sections 1 and 2. Our extension of the Berman-Turner computational device is presented in Section 3. Section 4 treats a simple example. Application of the method to specific models is developed in Section 5 for models of spatial interaction, Section 6 for spatial inhomogeneity, and Section 7 for marked point patterns. Section 8 treats some issues in estimation and inference. The method is applied to real datasets in Section 9.

1 Background and definitions

1.1 Likelihoods

The data consist of a spatial point pattern $\mathbf{x}$ observed in a bounded region $W$ of space. Thus

$$\mathbf{x} = \{x_1, \ldots, x_n\} \tag{1}$$

where the number of points $n \ge 0$ is not fixed, and each $x_i$ is a point in $W$. The region $W$ is a known, bounded subset of $d$-dimensional space $\mathbb{R}^d$, where $d \ge 1$. Extensions of this basic setup to incorporate spatial covariates and marked points are discussed in sections 6.1–6.2 and 7 respectively.

The data $\mathbf{x}$ are assumed to be a realisation of a random point process $X$ in $W$. Typically the null model (or the null hypothesis) will be the homogeneous Poisson point process [12, 34]. Other models will be specified by their likelihood with respect to the Poisson process. Thus we assume $X$ has a probability density $f(\mathbf{x};\theta)$ with respect to the distribution of the Poisson process with intensity 1 on $W$. Additionally we assume $f(\mathbf{x};\theta) > 0$ implies $f(\mathbf{y};\theta) > 0$ for all subsets $\mathbf{y} \subset \mathbf{x}$. This is the class of Gibbs processes on $W$, see [50, 54, 59]. The distribution is governed by a vector parameter $\theta$ ranging over a set $\Theta \subseteq \mathbb{R}^p$. See [12, 21].

1.2 Basic models

Specific models are detailed in sections 5–7, but it is instructive to list briefly four important examples. Firstly the homogeneous Poisson process with intensity $\lambda > 0$ has density

$$f(\mathbf{x};\lambda) = e^{-(\lambda - 1)|W|}\,\lambda^{n(\mathbf{x})},$$

where $n(\mathbf{x})$ denotes the number of points in $\mathbf{x}$ and $|W|$ is the volume of $W$. This yields the maximum likelihood estimate $\hat\lambda = n(\mathbf{x})/|W|$.

Secondly consider the inhomogeneous Poisson process on $W$ with rate or intensity function $\lambda_\theta : W \to \mathbb{R}_+$, see [12, 34]. In statistical models, the intensity $\lambda_\theta(u)$ will depend on $\theta$ to reflect 'spatial trend' (a change in intensity across the region of observation) or dependence on a covariate. The density is

$$f(\mathbf{x};\theta) = \left[\prod_{i=1}^{n(\mathbf{x})} \lambda_\theta(x_i)\right] \exp\left\{-\int_W [\lambda_\theta(u) - 1]\,du\right\}. \tag{2}$$

Maximisation of (2) generally requires iterative optimization methods.

Thirdly the pairwise interaction process on $W$ with trend or activity function $b : W \to \mathbb{R}_+$ and interaction function $h : W \times W \to \mathbb{R}_+$ has density

$$f(\mathbf{x};\theta) = \alpha(\theta)\left[\prod_{i=1}^{n(\mathbf{x})} b(x_i)\right]\left[\prod_{i<j} h(x_i, x_j)\right] \tag{3}$$

where $\alpha(\theta) > 0$ is the normalising constant. Conditions must be imposed on $b$ and $h$ to ensure the density is well-defined and integrable: in particular $h(u,v) = h(v,u)$. Examples are given in section 5. See the excellent surveys by Ripley [53, 54]. Pairwise interaction models are suitable for the data in Figures 8 and 14, as shown in [44, 45, 46, 47] and [56, 62] respectively. The terms $b(x_i)$ in (3) influence the intensity of points and introduce a spatial trend if $b(\cdot)$ is not constant. The terms $h(x_i, x_j)$ introduce dependence ('interaction') between different points of the process $X$. If $h \equiv 1$ the model reduces to an inhomogeneous Poisson process with intensity function $b(u)$.

The normalising constant $\alpha(\theta)$ in (3) is generally an intractable function of $\theta$. Methods for approximating $\alpha(\theta)$ and maximising likelihood include functional expansions of $\alpha(\theta)$, Monte Carlo integration, and analogues of E-M and stochastic approximation [21, 41, 44, 45, 46, 47, 49].

Most models considered in this paper are pairwise interaction processes, but we also discuss the Widom-Rowlinson ('area-interaction') model (section 5.2) and Ord's model (section 5.3).

2 Pseudolikelihood

It is generally difficult to evaluate and maximise the likelihoods of point processes other than the inhomogeneous Poisson (2). Even simple exponential family models such as the pairwise interaction processes (3) include a normalising constant which is an intractable function of $\theta$. An alternative to the likelihood function is the pseudolikelihood [8, 9, 10, 30] which we describe here. See [16, 17, 18, 53, 54, 56, 61] for other applications.

Originally Besag [8, 9] defined the pseudolikelihood of a finite set of random variables $X_1, \ldots, X_n$ as the product of the conditional likelihoods of each individual $X_i$ given the other variables $\{X_j : j \ne i\}$. This was extended [9, 10] to point processes, for which it can be viewed as an infinite product of infinitesimal conditional probabilities.

2.1 Conditional intensity

To construct the pseudolikelihood we require the (Papangelou) conditional intensity $\lambda(u;\mathbf{x})$ of $X$ at a location $u \in W$. This may be loosely interpreted as giving the conditional probability that $X$ has a point at $u$ given that the rest of the process coincides with $\mathbf{x}$. See [32] for an informal introduction, or [22, 23, 31, 35] for details.

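For any Gibbs process on $W$ (see section 1) with density $f$, the conditional intensity at a point $u \in W$ is

$$\lambda(u;\mathbf{x}) = \frac{f(\mathbf{x} \cup \{u\})}{f(\mathbf{x})} \tag{4}$$

if $u \notin \mathbf{x}$, while for $x_i \in \mathbf{x}$

$$\lambda(x_i;\mathbf{x}) = \frac{f(\mathbf{x})}{f(\mathbf{x} \setminus \{x_i\})}. \tag{5}$$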

For example, the inhomogeneous Poisson process with intensity function $\lambda(\cdot)$ has conditional intensity $\lambda(u;\mathbf{x}) = \lambda(u)$ at all points $u$. The fact that this does not depend on $\mathbf{x}$ is a consequence of the independence properties of the Poisson process. For a general Gibbs point process $\lambda(u;\mathbf{x})$ does depend on $\mathbf{x}$. The general pairwise interaction process (3) has conditional intensity

$$\lambda(u;\mathbf{x}) = b(u) \prod_{\substack{i=1 \\ x_i \ne u}}^{n(\mathbf{x})} h(u, x_i). \tag{6}$$

Note $\lambda(\cdot;\mathbf{x})$ is discontinuous at the data points $x_i$, and that the intractable normalising constant in (3) has been eliminated in the conditional intensity.
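The cancellation of the normalising constant can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates (6) directly and compares it with the ratio (4) computed from the density (3) with the normalising constant left out. The Strauss-type choices of `b` and `h`, the parameter values and the point coordinates are all hypothetical.

```python
import math

def unnormalised_density(points, b, h):
    """Density (3) with the normalising constant omitted."""
    value = 1.0
    for i, xi in enumerate(points):
        value *= b(xi)
        for xj in points[i + 1:]:
            value *= h(xi, xj)
    return value

def conditional_intensity(u, points, b, h):
    """Equation (6): b(u) times the product of h(u, x_i) over x_i != u."""
    value = b(u)
    for xi in points:
        if xi != u:
            value *= h(u, xi)
    return value

# Hypothetical ingredients: constant trend, Strauss-type interaction.
BETA, GAMMA, R = 2.0, 0.5, 0.3
b = lambda u: BETA
h = lambda u, v: GAMMA if math.dist(u, v) <= R else 1.0

x = [(0.1, 0.1), (0.3, 0.2), (0.8, 0.7)]
u = (0.2, 0.2)

# Ratio (4): the unknown constant would cancel, so it is never needed.
ratio = unnormalised_density(x + [u], b, h) / unnormalised_density(x, b, h)
lam = conditional_intensity(u, x, b, h)
```

Both `ratio` and `lam` equal $\beta\gamma^2 = 0.5$ here, since two of the three points lie within distance 0.3 of `u`.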


2.2 Definition of pseudolikelihood

Besag [9] defined the pseudolikelihood of a point process with conditional intensity $\lambda_\theta(u;\mathbf{x})$ over a subset $A \subseteq W$ to be

$$PL_A(\theta;\mathbf{x}) = \left[\prod_{x_i \in A} \lambda_\theta(x_i;\mathbf{x})\right] \exp\left\{-\int_A \lambda_\theta(u;\mathbf{x})\,du\right\} \tag{7}$$

and gave examples of the utility of maximum pseudolikelihood estimates. Further theory was developed in [10, 29, 30].

If the process is Poisson the pseudolikelihood coincides with the likelihood (2) up to the factor $\exp(|W|)$. For a pairwise interaction process (3), the pseudolikelihood is

$$PL(\theta;\mathbf{x}) = \left[\prod_{i=1}^{n(\mathbf{x})} b(x_i)\right]\left[\prod_{i \ne j} h(x_i, x_j)\right] \exp\left\{-\int_W b(u) \prod_{i=1}^{n(\mathbf{x})} h(u, x_i)\,du\right\}; \tag{8}$$

the intractable normalising constant $\alpha(\theta)$ appearing in the likelihood (3) has been replaced by an exponential integral in (8) as if the process were Poisson. We give other examples in sections 5–6 below.

For processes with 'weak interaction' in the sense that $\lambda_\theta(u;\mathbf{x})$ can be approximated well by a function of $u$ only, the process is approximately Poisson and the pseudolikelihood is an approximation to the likelihood. Hence the maximum pseudolikelihood estimator should be efficient if interaction is weak. Folklore holds that it is inefficient for strong interactions.

2.3 Loglinear case

In this paper we focus on Gibbs point process models for which the conditional intensity is loglinear:

$$\lambda_\theta(u;\mathbf{x}) = \exp\{\theta^T S(u;\mathbf{x})\} \tag{9}$$

where $S(u;\mathbf{x})$ is a vector of spatial covariates (possibly depending on $\mathbf{x}$) defined at each point $u$ in $W$. This holds in particular if the model is of the exponential family with canonical parameter $\theta$.

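Assume $\|S(u;\mathbf{x})\| \exp\{\theta^T S(u;\mathbf{x})\}$ is uniformly bounded in $u \in W$ and $\theta \in \Theta$, for each fixed $\mathbf{x}$. Then the maximum pseudolikelihood normal equations

$$\frac{\partial}{\partial\theta} \log PL_A(\theta;\mathbf{x}) = 0$$

become

$$\sum_{x_i \in A} S(x_i;\mathbf{x}) = \int_A S(u;\mathbf{x}) \exp\{\theta^T S(u;\mathbf{x})\}\,du. \tag{10}$$

Numerical solution of (10) usually requires iterative algorithms.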
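For the reader's convenience, the step from (7) and (9) to (10) can be written out as follows. This is a routine calculation added here as a sketch; the boundedness assumption above is what permits differentiation under the integral sign.

```latex
\begin{align*}
\log PL_A(\theta;\mathbf{x})
  &= \sum_{x_i \in A} \theta^T S(x_i;\mathbf{x})
     - \int_A \exp\{\theta^T S(u;\mathbf{x})\}\,du, \\
\frac{\partial}{\partial\theta} \log PL_A(\theta;\mathbf{x})
  &= \sum_{x_i \in A} S(x_i;\mathbf{x})
     - \int_A S(u;\mathbf{x}) \exp\{\theta^T S(u;\mathbf{x})\}\,du .
\end{align*}
```

Setting the second line equal to zero gives (10).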

It can easily be shown that (10) is an unbiased estimating equation, i.e. the expectations of the left and right sides of (10) under $\theta$ are equal. The proof is an application of a nonstationary form of the Nguyen-Zessin formula, viz.

$$E\left[\sum_{x_i \in X \cap A} h(x_i; X)\right] = \int_A E[\lambda(u; X)\, h(u; X)]\,du \tag{11}$$

holding for all nonnegative bounded measurable functions $h(u;\mathbf{x})$. This extends a result of Diggle et al. [16] that under reasonable conditions, the normal equations in the stationary case are a special case of the Takacs-Fiksel estimating equations, itself an application of the method of moments [17, 18, 61].

Jensen and Møller [30] proved that for Gibbs point processes with exponential family likelihoods, the pseudolikelihood is log-concave and the maximum pseudolikelihood estimator is consistent as $W \nearrow \mathbb{R}^d$, under suitable conditions. Jensen and Künsch [29] proved the MPLE is asymptotically normal for stationary pairwise interaction processes, under suitable conditions (see (C1) and (C2) of [29]).

The parameter $\theta$ may be constrained to lie in a convex set $\Theta \subseteq \mathbb{R}^p$. In the loglinear case (9), the pseudolikelihood is log-concave so the maximum exists and occurs either at an interior point of $\Theta$ where the normal equations are satisfied, or on the boundary $\partial\Theta$ of $\Theta$.

3 Berman-Turner device for maximum pseudolikelihood

This section describes the computational device which we propose for computing approximate maximum pseudolikelihood estimates. The method is an adaptation of an earlier technique of Berman and Turner [7] for approximate maximum likelihood estimation for the inhomogeneous Poisson point process. Ideas related to [7] have been explored by Lindsey [36, 37, 38, 39].

3.1 Derivation

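Let $X$ be a Gibbs point process with conditional intensity $\lambda_\theta(u;\mathbf{x})$ and consider the pseudolikelihood (7) for $X$, taking $A = W$ for simplicity. Approximate the integral in (7) by a finite sum using any quadrature rule,

$$\int_W \lambda_\theta(u;\mathbf{x})\,du \approx \sum_{j=1}^{m} \lambda_\theta(u_j;\mathbf{x})\, w_j \tag{12}$$

where $u_j$, $j = 1, \ldots, m$ are points in $W$ and $w_j > 0$ are quadrature weights. This yields an approximation to the log pseudolikelihood,

$$\log PL(\theta;\mathbf{x}) \approx \sum_{i=1}^{n(\mathbf{x})} \log \lambda_\theta(x_i;\mathbf{x}) - \sum_{j=1}^{m} \lambda_\theta(u_j;\mathbf{x})\, w_j. \tag{13}$$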

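Extending an observation of Berman and Turner, we note that if the list of points $\{u_j, j = 1, \ldots, m\}$ includes all the data points $\{x_i, i = 1, \ldots, n\}$, then we can rewrite (13) as

$$\log PL(\theta;\mathbf{x}) \approx \sum_{j=1}^{m} (y_j \log \lambda_j - \lambda_j)\, w_j \tag{14}$$

where $\lambda_j = \lambda_\theta(u_j;\mathbf{x})$ and $y_j = z_j / w_j$, where

$$z_j = \begin{cases} 1 & \text{if } u_j \text{ is a data point, } u_j \in \{x_1, \ldots, x_n\} \\ 0 & \text{if } u_j \text{ is a dummy point, } u_j \notin \{x_1, \ldots, x_n\}. \end{cases} \tag{15}$$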
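The equivalence of (13) and (14) rests only on the identity $y_j w_j = z_j$. A minimal numerical sketch of this, with hypothetical weights and intensity values (two data points followed by three dummy points), is:

```python
import math

def log_pl_13(lam_data, lam_quad, w):
    """Right side of (13): log-intensities at data points minus the quadrature sum."""
    return sum(math.log(l) for l in lam_data) - sum(l * wj for l, wj in zip(lam_quad, w))

def log_pl_14(lam_quad, w, z):
    """Right side of (14), using responses y_j = z_j / w_j."""
    total = 0.0
    for l, wj, zj in zip(lam_quad, w, z):
        yj = zj / wj
        total += (yj * math.log(l) - l) * wj
    return total

# Hypothetical quadrature scheme: indicators, weights, conditional intensities.
z = [1, 1, 0, 0, 0]
w = [0.10, 0.15, 0.25, 0.25, 0.25]
lam = [3.0, 1.5, 2.0, 2.5, 0.7]

lam_data = [l for l, zj in zip(lam, z) if zj == 1]
lhs = log_pl_13(lam_data, lam, w)
rhs = log_pl_14(lam, w, z)
```

The two values agree up to floating-point rounding, whatever weights are chosen, provided every data point appears among the quadrature points.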

The right side of (14), for fixed $\mathbf{x}$, is formally equivalent to the log likelihood of independent Poisson variables $Y_k \sim \text{Poisson}(\lambda_k)$ taken with weights $w_k$. The expression (14) can therefore be maximised using standard software for fitting Generalised Linear Models [40], provided that (a) the software handles weighted likelihoods; (b) the software accepts noninteger values of the responses $y_j$ in Poisson loglinear regression and correctly maximises the loglikelihood expression; (c) the conditional intensity function $\lambda_\theta(\cdot;\mathbf{x})$, for fixed $\mathbf{x}$, is related to any explanatory variables by

$$g(\lambda_\theta(u;\mathbf{x})) = \theta^T S(u;\mathbf{x}) \tag{16}$$

where $g$ is a link function implemented in the software, and $S(u;\mathbf{x})$ is a vector of spatial covariates (possibly depending on $\mathbf{x}$) defined at each point $u$ in $W$. Software packages satisfying these criteria include GLIM [1] and S-PLUS [5, 11, 64]. The only choice of $g$ in (16) which we shall consider is the log link, giving rise to the 'loglinear model' (9).

The key reason for adopting this approach is that the use of standard statistical packages rather than ad hoc software confers great advantages in applications. Modern statistical packages have a convenient notation for statistical models [1, 11, 64] which makes it very easy to specify and fit a wide variety of models of the type (9). Algorithms in the package may allow one to fit very flexible model terms such as the smooth functions in a generalised additive model [27]. Interactive software allows great freedom to reanalyse the data. The fitting algorithms are typically more reliable and stable than in home-grown software.

3.2 Procedure

In summary, the procedure is as follows.

1. Generate a set of dummy points, and combine it with the data points $x_i$ to form the set of quadrature points $u_j$;

2. Compute the quadrature weights $w_j$;

3. Form the indicators $z_j$ as in (15) and calculate $y_j = z_j / w_j$;

4. Compute the (possibly vector) values $v_j = S(u_j;\mathbf{x})$ of the sufficient statistic at each quadrature point;

5. Invoke the model-fitting software, specifying that the model is a loglinear Poisson regression

$$\log \lambda_j = \theta^T v_j \tag{17}$$

to be fitted to the responses $y_j$ and covariate values $v_j$, with weights $w_j$.
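The five steps can be sketched end to end in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the data points are invented, the weights are tile area divided by the number of quadrature points in the tile (our scaling assumption, chosen so the weights sum to $|W|$), and the hypothetical helper `fit_loglinear` uses plain Newton iteration in place of a GLM package.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_loglinear(y, v, w, iterations=50):
    """Maximise sum_j w_j (y_j log lam_j - lam_j), log lam_j = theta . v_j, by Newton's method."""
    p = len(v[0])
    theta = [0.0] * p
    for _ in range(iterations):
        lam = [math.exp(sum(t * c for t, c in zip(theta, vj))) for vj in v]
        score = [sum(wj * (yj - lj) * vj[k] for wj, yj, lj, vj in zip(w, y, lam, v))
                 for k in range(p)]
        info = [[sum(wj * lj * vj[k] * vj[l] for wj, lj, vj in zip(w, lam, v))
                 for l in range(p)] for k in range(p)]
        step = solve(info, score)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < 1e-12:
            break
    return theta

# Step 1: hypothetical data in the unit square plus a k x k grid of dummy points.
data = [(0.12, 0.40), (0.35, 0.81), (0.57, 0.22), (0.73, 0.65), (0.91, 0.30)]
k = 4
dummy = [((i + 0.5) / k, (j + 0.5) / k) for i in range(k) for j in range(k)]
quad = data + dummy

# Step 2: weights = tile area / number of quadrature points in the tile.
def tile(u):
    return (min(int(u[0] * k), k - 1), min(int(u[1] * k), k - 1))

counts = {}
for u in quad:
    counts[tile(u)] = counts.get(tile(u), 0) + 1
w = [(1.0 / k**2) / counts[tile(u)] for u in quad]

# Step 3: indicators and responses.
z = [1] * len(data) + [0] * len(dummy)
y = [zj / wj for zj, wj in zip(z, w)]

# Step 4: sufficient statistic; here the null model S(u; x) = 1.
v = [(1.0,) for _ in quad]

# Step 5: fit the loglinear Poisson regression (17).
theta = fit_loglinear(y, v, w)
intensity = math.exp(theta[0])
```

For the null model the fitted intensity equals $n(\mathbf{x}) / \sum_j w_j = 5$, which gives a simple check of the machinery; richer models only change step 4.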

The coefficient estimates returned by the software give the (approximate) MPLE $\hat\theta$ of $\theta$. The estimates of standard errors are not applicable, since they assume independent Poisson observations. The software also typically returns the deviance $D$ of the fitted model; this is related to the log pseudolikelihood of the fitted model by

$$-\log PL(\hat\theta;\mathbf{x}) = \frac{D}{2} + \sum_{i=1}^{n(\mathbf{x})} \log w_i + n(\mathbf{x}); \tag{18}$$

note the sum is over data points only.
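Relation (18) can be verified directly. The sketch below assumes that the software reports the usual weighted Poisson deviance with the convention $0 \log 0 = 0$; that form of $D$ is our assumption about the package, not a statement from the text.

```latex
\begin{align*}
\frac{D}{2}
  &= \sum_{j=1}^{m} w_j \left[ y_j \log\frac{y_j}{\hat\lambda_j}
     - (y_j - \hat\lambda_j) \right] \\
  &= \sum_{i=1}^{n(\mathbf{x})} \left[ -\log w_i - \log\hat\lambda_i \right]
     - n(\mathbf{x}) + \sum_{j=1}^{m} w_j \hat\lambda_j
     \qquad (\text{since } w_j y_j = z_j) \\
  &= -\sum_{i=1}^{n(\mathbf{x})} \log w_i - n(\mathbf{x})
     - \log PL(\hat\theta;\mathbf{x}),
\end{align*}
```

where the last line uses the approximation (13) for $\log PL$. Rearranging gives (18).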

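Conveniently, the null model $\lambda_j \equiv \lambda$ in the loglinear Poisson regression corresponds to the uniform Poisson point process with intensity $\lambda$. The MPLE is $\hat\lambda = n(\mathbf{x}) / \sum_j w_j = n(\mathbf{x}) / |W|$ with corresponding log pseudolikelihood

$$\log PL(\hat\lambda;\mathbf{x}) = n(\mathbf{x}) \left[\log n(\mathbf{x}) - \log |W| - 1\right].$$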


Note that this formulation assumes $\lambda_\theta(u;\mathbf{x})$ is positive everywhere. Zero values are also permissible, provided the set of zeroes does not depend on $\theta$. Thus we formally allow negative infinite values for $S(u;\mathbf{x})$. In the approximation (14) all points $u_j$ with $\lambda_\theta(u_j;\mathbf{x}) = 0$ will be dummy points, otherwise the pseudolikelihood and likelihood will be identically zero. Dummy points contribute only to the integral part of the log pseudolikelihood, hence their contribution is zero. Hence we can simply omit any points $u_j$ with $\lambda_\theta(u_j;\mathbf{x}) = 0$ from the sum (14). In the fitting algorithm, such points should be omitted in all contexts.

3.3 Quadrature schemes and their accuracy

Figure 1: Quadrature using the Dirichlet tessellation [7]. Left: Illustrative example of a point pattern dataset in the unit square $W$. Right: The Dirichlet tessellation of $W$ based on the data points together with a $5 \times 5$ grid of dummy points. Data points are marked by filled dots. The quadrature weight $w_j$ is the area of the Dirichlet tile.

Berman and Turner [7] used the Dirichlet tessellation or Voronoi diagram [48] to generate quadrature weights for the analogue of (12). The data points are augmented by a list of dummy points, then the Dirichlet tessellation of the combined set of points is computed as sketched in Figure 1. The quadrature weight $w_j$ associated with a (data or dummy) point $u_j$ is the area of the corresponding Dirichlet tile.

A computationally cheaper scheme is to partition $W$ into tiles $T_k$ of equal area, and in each tile place exactly one dummy point, either systematically or randomly. Ascribe to each dummy or data point $u_j$ a weight $w_j$ where $w_j^{-1}$ is the number of (dummy or data) points in the same tile as $u_j$. We call these the counting weights.
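A minimal sketch of the counting weights for a rectangular window divided into an `nx` by `ny` grid of tiles follows. One point is an assumption on our part: the weight is taken as the tile area divided by the count, so that the weights sum to $|W|$ as a quadrature rule requires; the text itself states only that $w_j^{-1}$ is the count. The function name and the example points are hypothetical.

```python
def counting_weights(points, nx, ny, width=1.0, height=1.0):
    """Return tile_area / (number of quadrature points sharing the tile) for each point."""
    def tile(u):
        col = min(int(u[0] / width * nx), nx - 1)
        row = min(int(u[1] / height * ny), ny - 1)
        return col, row

    counts = {}
    for u in points:
        counts[tile(u)] = counts.get(tile(u), 0) + 1
    area = width * height / (nx * ny)
    return [area / counts[tile(u)] for u in points]

# Hypothetical pattern: three data points, then one dummy point per tile of a 2 x 2 partition.
data = [(0.10, 0.10), (0.20, 0.15), (0.80, 0.60)]
dummy = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
weights = counting_weights(data + dummy, nx=2, ny=2)
```

In this example the lower-left tile holds three quadrature points, each receiving weight $0.25/3$, while the two tiles containing only a dummy point give weight $0.25$.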


Note that the conditional intensity $\lambda_\theta(u;\mathbf{x})$ is typically a discontinuous function of $u$ at the data points $x_i$, while generically the limit as $u \to x_i$ exists. Thus the approximation (12) involves a 'discontinuity error' of size

$$\sum_{i=1}^{n(\mathbf{x})} \left[\lambda_\theta(x_i;\mathbf{x}) - \lim_{u \to x_i} \lambda_\theta(u;\mathbf{x})\right] w_i \tag{19}$$

(a sum of contributions from data points only) in addition to the 'quadrature error' associated with the finite approximation to the integral. The discontinuity error is controlled by reducing $\sum_i w_i$, the total quadrature weight of the contributions from the data points, usually by increasing the number $m - n$ of dummy points. See further comments at the end of section 4.

4 Example: Strauss process

Next we illustrate the method as it applies to the simple Strauss process model [60, 33]. This is a pairwise interaction process (3) in which $b(u) \equiv \beta$ is constant and $h(u,v) = \gamma$ if $\|u - v\| \le r$, and $h(u,v) = 1$ otherwise. Here $\beta > 0$ and $0 \le \gamma \le 1$ are parameters and $r > 0$ is a fixed 'interaction distance'. Thus each pair of points closer than $r$ units apart contributes a penalty of $\gamma$ to the likelihood,

$$\text{lik}(\beta, \gamma; \mathbf{x}) = \alpha\, \beta^{n(\mathbf{x})}\, \gamma^{s(\mathbf{x})} \tag{20}$$

(taking $0^0 = 1$) where $\alpha = \alpha(\beta, \gamma)$ is the normalising constant, and

$$s(\mathbf{x}) = \#\{(i,j) : i < j,\ \|x_i - x_j\| \le r\}$$

is the number of unordered pairs of points which lie closer than $r$ units apart. The Strauss process is well-defined for all $\gamma \in [0,1]$. If $\gamma = 1$, it reduces to the homogeneous Poisson process with intensity $\beta$. For $\gamma = 0$ it is a hard core process in which no two points ever lie closer than $r$ units apart. For $0 < \gamma < 1$ there is inhibition between close pairs of points. The conditional intensity is

$$\lambda_{\beta,\gamma}(u;\mathbf{x}) = \beta\, \gamma^{t(u;\mathbf{x})} \tag{21}$$

where

$$t(u;\mathbf{x}) = \#\{x_i \in \mathbf{x} : 0 < \|x_i - u\| \le r\} \tag{22}$$

is the number of points $x_i \in \mathbf{x}$ which are close to $u$, other than $u$ itself. See Figure 2.

Figure 2: The conditional intensity of the Strauss process at a point $u$ depends on the number of existing points (+) of the configuration $\mathbf{x}$ which are closer than $r$ units distant from $u$. In this illustration $t(u;\mathbf{x}) = 2$.

The pseudolikelihood is

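$$PL(\beta, \gamma; \mathbf{x}) = \beta^{n(\mathbf{x})}\, \gamma^{2 s(\mathbf{x})} \exp\left\{-\beta \int_W \gamma^{t(u;\mathbf{x})}\,du\right\} \tag{23}$$

which is in the required loglinear form (9) with $\theta = (\ln\beta, \ln\gamma)^T$ and $S(u;\mathbf{x}) = (1, t(u;\mathbf{x}))^T$. The MPLE normal equations (10) are

$$n(\mathbf{x}) = \beta \int_W \gamma^{t(u;\mathbf{x})}\,du \tag{24}$$

$$2 s(\mathbf{x}) \int_W \gamma^{t(u;\mathbf{x})}\,du = n(\mathbf{x}) \int_W t(u;\mathbf{x})\, \gamma^{t(u;\mathbf{x})}\,du. \tag{25}$$

The maximum of the pseudolikelihood may occur either at a solution of these equations or at $\gamma = 0, 1$. If $r$ is less than the minimum interpoint distance, then $s(\mathbf{x}) = 0$ and the pseudolikelihood is maximised when $\gamma = 0$. Otherwise $\hat\gamma > 0$. If the solution of (24)–(25) occurs where $\gamma > 1$, then since the log pseudolikelihood is concave, $\hat\gamma = 1$. The maximised log pseudolikelihood is

$$\log PL(\hat\beta, \hat\gamma; \mathbf{x}) = n(\mathbf{x}) \log\hat\beta + 2 s(\mathbf{x}) \log\hat\gamma - \hat\beta\, p(\hat\gamma), \tag{26}$$

where $p(\gamma) = \int_W \gamma^{t(u;\mathbf{x})}\,du$ is the integral appearing in (23).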

Consistency and asymptotic normality of the MPLE follow from [29, 30]. To compute the approximate MPLE using the Berman-Turner device we would follow the procedure in section 3.2, fitting the loglinear model

$$\log \lambda_j = \theta_1 + \theta_2 v_j$$

where $v_j = t(u_j;\mathbf{x})$, with $t(u;\mathbf{x})$ as defined in (22). A suitable S-PLUS invocation would be

glm(y ~ v, family = poisson(link = log), weights = w)


where y, v, w are S-PLUS vectors of equal length containing the responses $y_j$, the 'explanatory variable' values $v_j$, and the weights $w_j$ respectively, for each quadrature point $u_j$. If glm() yields a solution $\hat\theta_2 > 0$, i.e. $\hat\gamma > 1$, then the MPLE is $\hat\gamma = 1$, $\hat\beta = n(\mathbf{x}) / |W|$.
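The explanatory variable $v_j = t(u_j;\mathbf{x})$ passed to the fitting software can be computed as in the following sketch. The point coordinates, the dummy points and the value of `r` are hypothetical; only the definitions (22) and $s(\mathbf{x})$ come from the text.

```python
import math

def t_stat(u, points, r):
    """Equation (22): number of points x_i with 0 < ||x_i - u|| <= r."""
    return sum(1 for xi in points if 0.0 < math.dist(xi, u) <= r)

def s_stat(points, r):
    """Number of unordered pairs of points lying at most r apart."""
    return sum(1 for i, xi in enumerate(points)
               for xj in points[i + 1:] if math.dist(xi, xj) <= r)

# Hypothetical data and dummy points in the unit square.
x = [(0.20, 0.20), (0.25, 0.30), (0.70, 0.70), (0.75, 0.72)]
dummy = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
r = 0.15

quad = x + dummy                      # data points first, then dummy points
v = [t_stat(u, x, r) for u in quad]   # one value per quadrature point
s = s_stat(x, r)
```

Summing $t$ over the data points counts every close pair twice, which is where the exponent $2 s(\mathbf{x})$ in (23) comes from.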

Note the integrals in (23)–(25) are polynomials in $\gamma$:

$$p(\gamma) = \int_W \gamma^{t(u;\mathbf{x})}\,du \tag{27}$$

$$\phantom{p(\gamma)} = a_0 + a_1 \gamma + \cdots + a_K \gamma^K \tag{28}$$

say, where $a_k = |A_k|$ is the area of the region $A_k = \{u \in W : t(u;\mathbf{x}) = k\}$. Thus (24)–(25) can be rewritten

$$\beta\, p(\gamma) = n(\mathbf{x}) \tag{29}$$

$$\frac{\gamma\, p'(\gamma)}{p(\gamma)} = \frac{2 s(\mathbf{x})}{n(\mathbf{x})}. \tag{30}$$

In this simple case, the MPLE can be computed by solving (29)–(30) directly, although this still requires evaluation of the coefficients $a_k$, which calls for numerical integration or computational geometry. We shall use this "polynomial" approach to check the accuracy of our method in section 9.

The quadrature approximation (12) consists in replacing $p(\gamma)$ by

$$q(\gamma) = \sum_{j=1}^{m} \gamma^{t(u_j;\mathbf{x})}\, w_j = \sum_{k=0}^{K} b_k \gamma^k, \tag{31}$$

where $b_k = \sum_{u_j \in A_k} w_j$ are approximations to the areas of the sets $A_k$. The approximation includes discontinuity error (19) arising because the weight $w_i$ for a data point $x_i$ with $t(x_i;\mathbf{x}) = k$ but $\lim_{u \to x_i} t(u;\mathbf{x}) = k+1$ is ascribed to $b_k$ rather than to $b_{k+1}$.

The total error in approximating $p(\gamma)$ by $q(\gamma)$ is bounded by $E_1 = \sum_{k=0}^{K} |a_k - b_k|$, the sum of the errors in approximating the area $a_k$ by $b_k$. The error in approximating $p'(\gamma)$ by $q'(\gamma)$ is bounded by $E_2 = \sum_{k=1}^{K} k\, |a_k - b_k|$. To control both $E_1$ and $E_2$, dummy points must be sufficiently dense throughout $W$ and sufficiently dense where $t(u;\mathbf{x})$ is high, that is, near the data points.
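The coefficients $b_k$ and the solution of the polynomial equations with $q$ in place of $p$ can be sketched as follows. The values of $t(u_j;\mathbf{x})$ and the weights are hypothetical (four data points followed by four dummy points), and the bisection solver is our own illustrative choice; it relies on $\gamma q'(\gamma)/q(\gamma)$ being increasing in $\gamma$.

```python
def area_coefficients(t_values, weights):
    """b_k = total weight of quadrature points with t(u_j; x) = k, for k = 0..K."""
    b = [0.0] * (max(t_values) + 1)
    for tj, wj in zip(t_values, weights):
        b[tj] += wj
    return b

def q(gamma, b):
    """Quadrature polynomial (31)."""
    return sum(bk * gamma ** k for k, bk in enumerate(b))

def q_prime(gamma, b):
    """Derivative of the quadrature polynomial."""
    return sum(k * bk * gamma ** (k - 1) for k, bk in enumerate(b) if k > 0)

def solve_gamma(b, s, n, tol=1e-12):
    """Solve gamma q'(gamma) / q(gamma) = 2 s / n on [0, 1] by bisection."""
    target = 2.0 * s / n

    def f(g):
        return g * q_prime(g, b) / q(g, b) - target

    if s == 0:
        return 0.0          # no close pairs: maximum at gamma = 0
    if f(1.0) <= 0.0:
        return 1.0          # root at or beyond 1: estimate truncated to 1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical quadrature scheme with equal weights; the data points have t = 1, 1, 0, 0.
t_values = [1, 1, 0, 0, 2, 0, 0, 1]
weights = [0.125] * 8
b = area_coefficients(t_values, weights)
gamma_hat = solve_gamma(b, s=1, n=4)
beta_hat = 4 / q(gamma_hat, b)      # from (29) with q in place of p
```

Here `b` is `[0.5, 0.375, 0.125]` and the equation for $\gamma$ reduces to $\gamma^2 + \gamma - 4/3 = 0$, so the estimate can be checked by hand.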


5 Spatial interaction terms

Sections 5–7 present further examples of point processes, and examine the computational requirements for applying our method. The present section concerns point processes with various kinds of interpoint interaction (pairwise interaction and other). Inhomogeneous models are discussed in section 6 and marked point processes in section 7.

5.1 Pairwise interaction models

5.1.1 General loglinear form

Consider first the general pairwise interaction process (3) and assume

$$b(u) = \exp\{\theta^T B(u)\} \tag{32}$$

$$h(u,v) = \exp\{\theta^T H(u,v)\} \tag{33}$$

where $B(u)$ and $H(u,v)$ are vectors defined for every $u, v \in W$. Note $H(u,v)$ should be a symmetric function of $u$ and $v$. The conditional intensity (6) becomes

$$\lambda_\theta(u;\mathbf{x}) = \exp\left\{\theta^T B(u) + \theta^T \sum_{\substack{i=1 \\ x_i \ne u}}^{n(\mathbf{x})} H(u, x_i)\right\}. \tag{34}$$

This is of the loglinear form (9) required for our approximation, with

$$S(u;\mathbf{x}) = B(u) + \sum_{\substack{i=1 \\ x_i \ne u}}^{n(\mathbf{x})} H(u, x_i) \tag{35}$$

and the procedure of section 3.2 may be applied. The log pseudolikelihood is concave in $\theta$ so the MPLE values form a nonempty convex set. Consistency of the MPLE is not guaranteed in this generality.

In the rest of this section we assume $B(u)$ is constant; models for spatial inhomogeneity are discussed in section 6. Here it is important to note that the general form (32) assumed for $b$ embraces not only parametric models but also Generalised Additive Models [27] where $B(u)$ would be a vector of spline basis functions. However, this apparently does not extend to GAM type models for $h$, since the sufficient statistic (35) is a sum of a variable number of terms, which is beyond the scope of current GAM fitting algorithms. Hence we are currently forced to consider only parametric models for interpoint interaction, such as the Strauss process.


5.1.2 Soft core process

The 'soft core' model discussed by Ogata and Tanemura [44] is a pairwise interaction process (3) with $b(u) \equiv \beta$ and

$$h(u,v) = \exp\left\{-\left(\frac{\sigma}{\|u - v\|}\right)^{2/\kappa}\right\}, \qquad u \ne v,$$

where $\beta > 0$ and $0 \le \sigma < \infty$ are parameters and $0 < \kappa < 1$ is a nuisance parameter which we assume known for the moment. The limit as $\kappa \to 0$ is the hard core process (Strauss with $\gamma = 0$, $r = \sigma$); the density is not integrable for $\kappa \ge 1$.

Thus $\lambda_\theta(u;\mathbf{x}) = \exp\{\theta^T S(u;\mathbf{x})\}$ where $S(u;\mathbf{x}) = (1, V(u;\mathbf{x}))^T$ and $\theta = (\log\beta, \sigma^{2/\kappa})^T$ with

$$V(u;\mathbf{x}) = -\sum_{\substack{i=1 \\ x_i \ne u}}^{n} \|u - x_i\|^{-2/\kappa}. \tag{36}$$

The log pseudolikelihood is concave in $\theta$ and the MPLE is well defined, consistent and asymptotically normal, by [29].

The conditional intensity is loglinear in $\theta$. To estimate $\beta$ and $\sigma$ (given a value of $\kappa$) one would execute the S-PLUS command

glm(y ~ v, family=poisson, weights=w)

where y, v, w are S-PLUS vectors containing the responses $y_j = z_j / w_j$, explanatory variable $v_j = V(u_j;\mathbf{x})$, and weights $w_j$ respectively. Then $\hat\beta = \exp\hat\theta_1$ and $\hat\sigma = \hat\theta_2^{\kappa/2}$ where $\hat\theta_1$ and $\hat\theta_2$ are the estimates of the linear coefficients returned by the glm() function of S-PLUS.
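The covariate (36) and the back-transformation of the fitted coefficients can be sketched as below. The function names, point coordinates and coefficient values are hypothetical, and the back-transformation presumes the fitted $\hat\theta_2$ is nonnegative.

```python
import math

def soft_core_V(u, points, kappa):
    """Equation (36): minus the sum of ||u - x_i||^(-2/kappa) over x_i != u."""
    return -sum(math.dist(u, xi) ** (-2.0 / kappa) for xi in points if xi != u)

def back_transform(theta1, theta2, kappa):
    """Recover (beta, sigma) from theta = (log beta, sigma^(2/kappa)); needs theta2 >= 0."""
    return math.exp(theta1), theta2 ** (kappa / 2.0)

# Hypothetical two-point pattern; at the data point (0, 0) only (3, 4) contributes.
x = [(0.0, 0.0), (3.0, 4.0)]
kappa = 0.5
V = soft_core_V((0.0, 0.0), x, kappa)        # -(5 ** -4)

# Hypothetical fitted coefficients corresponding to beta = 2, sigma = 0.1.
beta, sigma = back_transform(math.log(2.0), 0.1 ** (2.0 / kappa), kappa)
```

Because $\kappa$ enters $V$ nonlinearly, it has to be held fixed for each fit; the code makes that dependence explicit by taking `kappa` as an argument.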

5.1.3 Step function interaction

In the absence of nonparametric estimators of $h$, there is much interest [17, 18, 49, 61] in fitting a stationary pairwise interaction process with a piecewise constant interaction function $h$. Thus $b(u) \equiv \beta$ and $h(u,v)$ is a step function of $\|u - v\|$, say $\log h(u,v) = \theta_\ell$ if $r_{\ell-1} < \|u - v\| \le r_\ell$, and $\log h(u,v) = 0$ if $\|u - v\| > r_k$, where $0 = r_0 < r_1 < r_2 < \cdots < r_k$ are parameters. This is a special case of (32)–(33) with $\theta = (\log\beta, \theta_1, \theta_2, \ldots, \theta_k)^T$ say, and

$$B(u) = (1, 0, 0, \ldots, 0) \tag{37}$$

$$H(u,v) = (0, I_1(\|u - v\|), \ldots, I_k(\|u - v\|)) \tag{38}$$

where $I_\ell(d) = \mathbf{1}\{r_{\ell-1} < d \le r_\ell\}$ for $\ell = 1, 2, \ldots, k$. Thus

$$S(u;\mathbf{x}) = (1, t_1(u;\mathbf{x}), \ldots, t_k(u;\mathbf{x}))^T$$

where for $\ell = 1, 2, \ldots, k$

$$t_\ell(u;\mathbf{x}) = \#\{x_i \in \mathbf{x} : r_{\ell-1} < \|x_i - u\| \le r_\ell\}$$

is the number of points $x_i \in \mathbf{x}$ whose distance from $u$ lies in the interval $(r_{\ell-1}, r_\ell]$. The MPLE is consistent by [30, Theorem 3.1] if either $\theta_\ell \le 0$ for all $\ell = 1, \ldots, k$, or the $\theta_\ell$ are uniformly bounded from above and $\theta_1 = -\infty$ (the process has a hard core).

In our approach it is easy to fit this model, analogously to the Strauss process. The associated loglinear model is

$$\log \lambda_j = \log\beta + \theta_1 v_{1j} + \cdots + \theta_k v_{kj}$$

where $v_{\ell j} = t_\ell(u_j;\mathbf{x})$.

5.2 Area interaction process

The Widom-Rowlinson 'penetrable sphere model' of liquid-vapour equilibrium [25, 55, 65], also known as the 'area interaction process' [3], has probability density (in the simplest case)

$$p(\mathbf{x}) = \alpha\, \beta^{n(\mathbf{x})}\, \gamma^{-A(\mathbf{x})} \tag{39}$$

where $A(\mathbf{x})$ is the area of the union of discs of radius $r$ centred at the points $x_i$. Here $\beta, \gamma, r > 0$ are parameters and $\alpha = \alpha(\beta, \gamma, r)$ is the normalising constant. Generalisations are given in [3]. The process is well defined, i.e. (39) is integrable, for all values of $\gamma > 0$ and for all compact $W \subset \mathbb{R}^2$. It reduces to a Poisson process when $\gamma = 1$, exhibits ordered patterns for $0 < \gamma < 1$ and produces clustering when $\gamma > 1$. Other properties and maximum likelihood estimation are investigated in [57].

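The conditional intensity is of the desired form $\log \lambda_\theta(u;\mathbf{x}) = \theta^T S(u;\mathbf{x})$ putting $\theta = (\log\beta, \log\gamma)^T$ and $S(u;\mathbf{x}) = (1, -[A(\mathbf{x} \cup \{u\}) - A(\mathbf{x})])^T$; the minus sign follows from the exponent $-A(\mathbf{x})$ in (39). Results in [30] imply that if $r$ is known, the MPLE of $(\beta, \gamma)$ is consistent. The authors do not know whether a central limit theorem is available; the results of [29] do not apply.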

Another advantage of the Berman-Turner approach here is the reduction in the computational cost since the values of $D(u;\mathbf{x}) = A(\mathbf{x} \cup \{u\}) - A(\mathbf{x})$ are only required for a small number of points $u$.
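One simple way to obtain $D(u;\mathbf{x})$ at a quadrature point is to count fine grid cells inside the disc around $u$ that are not already covered by discs around the other points. This is only a sketch under our own assumptions: it is a crude grid approximation rather than exact computational geometry, it ignores any clipping of discs by the window $W$, and for a data point $u = x_i$ the caller is expected to pass the pattern with $x_i$ removed.

```python
import math

def area_increment(u, points, r, n_grid=200):
    """Approximate A(x U {u}) - A(x): area of the disc at u not covered by discs at the x_i."""
    cell = 2.0 * r / n_grid
    uncovered = 0
    for i in range(n_grid):
        for j in range(n_grid):
            # Centre of a grid cell in the bounding square of the disc around u.
            p = (u[0] - r + (i + 0.5) * cell, u[1] - r + (j + 0.5) * cell)
            if math.dist(p, u) <= r and all(math.dist(p, xi) > r for xi in points):
                uncovered += 1
    return uncovered * cell * cell

r = 0.1
isolated = area_increment((0.0, 0.0), [], r)              # close to pi * r^2
covered = area_increment((0.0, 0.0), [(0.0, 0.0)], r)     # disc already covered
partial = area_increment((0.0, 0.0), [(0.1, 0.0)], r)     # partly overlapping disc
```

The cost is `n_grid` squared distance checks per quadrature point, which is affordable precisely because only the $m$ quadrature points need to be visited.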


5.3 Ord's process

Ord [51, discussion] suggested a model for regular patterns of points representing entities which compete for resources, such as trees or towns. The Dirichlet tile associated with a point can be interpreted as the "territory" from which it draws resources. Ord suggested densities of the form

$$f(\mathbf{x};\theta) \propto \prod_{i=1}^{n} g(A(x_i;\mathbf{x})) \tag{40}$$

where $A(x_i;\mathbf{x})$ is the area of the Dirichlet tile associated with $x_i$ in the pattern $\mathbf{x}$, and $g : \mathbb{R}_+ \to [0, \infty)$ is a function combining the roles of the spatial interaction and intensity terms in other models. The special case $g(v) \equiv \lambda$ is the uniform Poisson process with intensity $\lambda$. Typically $g(\cdot)$ would be an increasing function, so that small tiles are penalised.

Ripley [52, p. 175] concludes his analysis of the Swedish pines data (section 9.1) with a comment that fitting Ord's process would be an interesting alternative analysis. To our knowledge, this has not been attempted and Ord's model has not been investigated or mentioned further, except in [4].

The process (40) exists (i.e. $f$ is integrable) under reasonable conditions, for example, whenever $g(\cdot)$ is uniformly bounded. The conditional intensity is

$$\lambda(u;\mathbf{x}) = g(A(u;\mathbf{x} \cup \{u\})) \prod_{x_i \sim u} \frac{g(A(x_i;\mathbf{x} \cup \{u\}))}{g(A(x_i;\mathbf{x}))} \tag{41}$$

where the product is over all points $x_i$ that are Dirichlet neighbours of $u$ in the pattern $\mathbf{x} \cup \{u\}$, and $A(u;\mathbf{x} \cup \{u\})$ is the area of the Dirichlet tile with centre $u$ in this pattern. Explicit analytic expressions for $A(u;\mathbf{x} \cup \{u\})$, the pseudolikelihood, or the MPLE are not available. Geometric computation of $A(u;\mathbf{x})$ is time-consuming so that a discrete approximation to the pseudolikelihood becomes a necessity.

The Berman-Turner device (section 3) can be applied if the kernel is modelled in loglinear form $g(v) = \exp\{\theta^T G(v)\}$. Then $\log \lambda(u;\mathbf{x}) = \theta^T V(u;\mathbf{x})$ where

$$V(u;\mathbf{x}) = G(A(u;\mathbf{x} \cup \{u\})) + \sum_{x_i \sim u} \left[G(A(x_i;\mathbf{x} \cup \{u\})) - G(A(x_i;\mathbf{x}))\right]$$

is the regression variable. Evaluating $v_j = V(u_j;\mathbf{x})$ for all $j$ requires computation of $m + 1$ different Dirichlet tessellations.


6 Inhomogeneous models

Few writers to date, apart from Ogata and Tanemura [47], have fitted explicit models to point pattern data that incorporate both spatial inhomogeneity and interpoint interactions. In the context of our method, it is easy to introduce a spatial trend or dependence on spatial covariates. This is simply a matter of adding more terms to the linear predictor $\theta^T S(u;\mathbf{x})$ in the associated Poisson loglinear regression model.

6.1 Spatial trend

A straightforward model of spatial trend in a pairwise interaction process (3) is [44, 45]

$$b(u) = \exp\{\alpha^T B(u)\} \tag{42}$$

$$h(u,v) = h(u-v) = \exp\{\theta^T H(u-v)\} \tag{43}$$

so that the spatial trend is expressed by the dependence of $b(u)$ on location $u$, while the interpoint interaction does not exhibit trend. Typically $H$ would be one of the pairwise interaction functions considered in sections 4–5, while $B(u) = (B_1(u),\ldots,B_k(u))^T$ would be a vector of convenient scalar functions of location, such as polynomials or orthonormal functions of the coordinates. It is also possible to use the GAM approach [27] to model each $B_\ell(u)$ by a smooth function of one coordinate. Assuming (42)–(43) we have

$$\log\lambda(u;\mathbf{x}) = \alpha^T B(u) + \theta^T \sum_{\substack{i=1 \\ x_i \neq u}}^{n(\mathbf{x})} H(u - x_i);$$

this is of the form (9) so we may easily fit it using the method of section 3.2, indeed simply by adding the term $\alpha^T B(u)$ to the linear predictor in one of the models discussed in previous subsections.
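To make the "adding terms to the linear predictor" step concrete, here is a minimal Python sketch (not the authors' S-PLUS code) that assembles one row of regression covariates at a location $u$: an intercept, polynomial trend terms $B(u)$, and an interaction statistic. The function names, the quadratic trend basis and the use of a Strauss-type neighbour count in place of the general sum of $H(u - x_i)$ are all assumptions made for illustration.

```python
import math

def trend_terms(u, degree=2):
    """Polynomial trend basis B(u): monomials x^a * y^b with 1 <= a + b <= degree."""
    x, y = u
    return [x ** a * y ** (total - a)
            for total in range(1, degree + 1)
            for a in range(total, -1, -1)]

def strauss_count(u, pattern, r):
    """Interaction statistic: number of data points x_i != u within distance r of u."""
    return sum(1 for p in pattern if p != u and math.dist(u, p) <= r)

def design_row(u, pattern, r, degree=2):
    """Covariates at location u: [intercept, trend terms B(u)..., interaction count]."""
    return [1.0] + trend_terms(u, degree) + [float(strauss_count(u, pattern, r))]

# Hypothetical three-point pattern, evaluated at one of its own data points.
pattern = [(0.2, 0.2), (0.25, 0.2), (0.8, 0.9)]
row = design_row((0.2, 0.2), pattern, r=0.1)
```

Stacking such rows over all quadrature points gives the design matrix of the loglinear Poisson regression; switching the trend from linear to cubic changes only `degree`, which is the sense in which the trend is "simply added" to the predictor.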

Ogata & Tanemura [44] developed maximum likelihood estimation techniques for models of this form, in particular combining a spatial trend with the soft core interaction of Section 5.1.2. The trend term $\alpha^T B(u)$ was a polynomial in the Cartesian coordinates. Details are given in Section 9.3.

More generally, the spatial interaction could also depend on location. For example a spatially-varying scale could be introduced by replacing $H(u-v)$ in (43) by $H(r(u)r(v)(u-v))$ or similar expressions that are symmetric in $u, v$. The main difficulty is that loglinearity is usually lost, and we cannot apply the method of section 3 directly, unless all the components of $\theta$ influencing $r(u)$ are treated as nuisance parameters.

An effective alternative way to fit models with spatially-varying interaction range is proposed by Nielsen and Vedel Jensen [42].

6.2 Spatial covariates

The data may include spatial covariates such as topographic elevation, soil pH, or another observed spatial pattern. Covariates might be introduced into the analysis in order to regress intensity on explanatory variables, eliminate spurious trend, or make inferences conditional on another spatial pattern.

For our purposes the spatial covariate must be incorporated as a `spatial variable', i.e. a function $Z(u)$ defined at each location $u \in W$ and observed at each of the quadrature points $u_j$ (thus, including the data points). Dependence on the covariate can then be modelled by adding terms in $Z(u_j)$ to the linear predictor.

The covariate value $Z(u)$ might be simply the observed value of a variable such as pH or elevation, but often the covariate observations will be transformed to yield $Z(u)$. For example in spatial epidemiology $Z(u)$ could be a kernel smoothed estimate of the density of the population at risk, derived from point pattern or regional count data [14].

Another observed spatial pattern, such as a pattern of points or line segments or a digital image, can be included as a spatial covariate by computing a suitable function $Z(u)$ associated with the pattern. Berman [6] proposed modelling the dependence of a point process on a line segment process by conditioning on the line segment process and testing whether the point process is inhomogeneous Poisson with intensity $\lambda(u)$ depending on the minimum distance $Z(u)$ from location $u$ to the nearest line segment. More generally the association between the point process of interest $X$ and another spatial process $Y$ can be investigated conditionally on $Y = \mathbf{y}$ by constructing a Gibbs model for the joint distribution of $X$ and $Y$, forming the conditional intensity of the conditional distribution of $X$ given $Y$, say

$$\lambda_{X|Y,\theta}(u;\mathbf{x} \mid \mathbf{y}) = \exp\{\theta^T S(u;\mathbf{x} \mid \mathbf{y})\}$$

and forming the associated pseudolikelihood, which depends on $Z(u) = S(u;\mathbf{x} \mid \mathbf{y})$. Särkkä and Högmander [58] study hierarchical Gibbs models for the association between two or more point processes.
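As an illustration of a covariate derived from another spatial pattern, the sketch below evaluates a distance covariate of the kind described above for Berman's approach: $Z(u)$ is the distance from $u$ to the nearest segment of an observed line pattern, computed at each quadrature point. The data, the function names and the representation of segments as endpoint pairs are hypothetical choices for this example.

```python
import math

def dist_to_segment(u, a, b):
    """Euclidean distance from point u to the line segment with endpoints a and b."""
    (ux, uy), (ax, ay), (bx, by) = u, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:                      # degenerate segment: a single point
        return math.hypot(ux - ax, uy - ay)
    # Project u onto the line through a and b, then clamp to the segment.
    t = max(0.0, min(1.0, ((ux - ax) * dx + (uy - ay) * dy) / seg_len2))
    return math.hypot(ux - (ax + t * dx), uy - (ay + t * dy))

def covariate_Z(u, segments):
    """Z(u): distance from u to the nearest segment of the line pattern."""
    return min(dist_to_segment(u, a, b) for a, b in segments)

# Hypothetical line pattern and quadrature points.
segments = [((0.0, 0.0), (1.0, 0.0)), ((2.0, 0.0), (2.0, 3.0))]
quad_points = [(0.5, 0.5), (1.5, 1.0), (3.0, 4.0)]
z_values = [covariate_Z(u, segments) for u in quad_points]
```

The resulting values would enter the regression as one more column, evaluated at data and dummy points alike.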


7 Marked point patterns

7.1 General

The observed points may also carry `marks', i.e. observations $m_i$ associated with each point $x_i$ of the pattern. The full dataset is a list

$$\mathbf{v} = \{(x_1,m_1),\ldots,(x_n,m_n)\} \tag{44}$$

with $x_i \in W$ and $m_i \in \mathcal{M}$, where $\mathcal{M}$ is the space of possible marks. The marks may be observations of any kind; commonly $\mathcal{M}$ is either a discrete set of `labels' $\mathcal{M} = \{1,2,\ldots,c\}$ or the positive real line $\mathcal{M} = [0,\infty)$. In the discrete case, the data points are effectively classified into $c$ different types or colours, and the mark attached to each point indicates its type. In the continuous case $m_i$ is usually a physical measurement such as the height or diameter of a tree whose location is $x_i$. See [15, chap. 6, 7], [13, §8.6–8.7], [4, 19, 59]. Jensen and Møller [30] formally treat the pseudolikelihood of a marked point process and prove consistency of the MPLE; Goulard et al. [24] investigate further statistical properties. See [58].

The reference process for likelihoods is the Poisson marked point process constructed by attaching i.i.d. random marks to the points of a Poisson point process on $W$ with unit intensity [34]. The distribution of the marks in this reference process is an arbitrary probability distribution $Q$ on $\mathcal{M}$.

The inhomogeneous Poisson marked point process with intensity function $b : W \times \mathcal{M} \to \mathbb{R}_+$ is the analogue of (2), with density

$$f(\mathbf{v};\theta) = \alpha \prod_{i=1}^{n(\mathbf{x})} b(x_i,m_i) \tag{45}$$

where $\alpha = \alpha(\theta) = \exp\left\{-\int_{W \times \mathcal{M}} (b(u,m) - 1)\, dQ(m)\, du\right\}$ is the normalising constant.

The pairwise interaction marked point process is the analogue of (3), with density

$$f(\mathbf{v};\theta) = \alpha \left[ \prod_{i=1}^{n(\mathbf{x})} b(x_i,m_i) \right] \left[ \prod_{i<j} h((x_i,m_i),(x_j,m_j)) \right] \tag{46}$$

where $\alpha = \alpha(\theta)$ is the normalising constant, $b : W \times \mathcal{M} \to \mathbb{R}_+$ is the activity/trend function and $h : (W \times \mathcal{M})^2 \to \mathbb{R}_+$ the interaction function. The function $h$ is symmetric in the sense that $h((u,m),(u',m')) = h((u',m'),(u,m))$ for $u,u' \in W$ and $m,m' \in \mathcal{M}$. Conditions must be imposed on $b, h$ to ensure the density is integrable.


7.2 Pseudolikelihood

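The conditional intensity of a Gibbs marked point process, analogously to (4)–(5), is a function $\lambda((u,m);\mathbf{v})$ of the marked pattern $\mathbf{v}$ and of a marked point $(u,m)$ with $u \in W$, $m \in \mathcal{M}$. For example the pairwise interaction marked point process (46) has

$$\lambda((u,m);\mathbf{v}) = b(u,m) \prod_{\substack{i=1 \\ (u,m) \neq (x_i,m_i)}}^{n(\mathbf{x})} h((u,m),(x_i,m_i)). \tag{47}$$

The pseudolikelihood of a Gibbs marked point process is [24, 30]

$$PL(\theta;\mathbf{v}) = \left[ \prod_{i=1}^{n(\mathbf{x})} \lambda((x_i,m_i);\mathbf{v}) \right] \exp\left\{ -\int_W \int_{\mathcal{M}} \lambda((u,m);\mathbf{v})\, dQ(m)\, du \right\}. \tag{48}$$

In the case of a multitype point process with $c$ different types, we have $\mathcal{M} = \{1,2,\ldots,c\}$ and the pseudolikelihood is usually defined by

$$PL(\theta;\mathbf{v}) = \left[ \prod_{i=1}^{n(\mathbf{x})} \lambda((x_i,m_i);\mathbf{v}) \right] \exp\left\{ -\sum_{m=1}^{c} \int_W \lambda((u,m);\mathbf{v})\, du \right\}. \tag{49}$$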

7.3 Berman-Turner device

To apply our approximation method to (48) we create a set of marked points $(u_j,k_j)$, $j = 1,\ldots,M$ which include the data $(x_i,m_i)$, $i = 1,\ldots,n$ and form a good quadrature rule for $W \times \mathcal{M}$. It would usually be convenient to take the Cartesian product of a set of quadrature points in $W$ and a set of elements of $\mathcal{M}$. We shall assume this and write the marked points as $(u_j,k_\ell)$ for $j = 1,\ldots,J$ and $\ell = 1,\ldots,L$ where $u_j \in W$, $k_\ell \in \mathcal{M}$. Then define the indicator $z_{j\ell}$ to equal 1 if $(u_j,k_\ell)$ is a data point and 0 if it is a dummy point. Let $w_{j\ell}$ be the corresponding weights for a linear quadrature rule in $W \times \mathcal{M}$. Then the pseudolikelihood is approximated by

$$\log PL(\theta;\mathbf{v}) \approx \sum_{\ell=1}^{L} \sum_{j=1}^{J} (y_{j\ell} \log\lambda_{j\ell} - \lambda_{j\ell})\, w_{j\ell} \tag{50}$$

where $\lambda_{j\ell} = \lambda((u_j,k_\ell);\mathbf{v})$ and $y_{j\ell} = z_{j\ell}/w_{j\ell}$. For discrete marks as in (49), the weights may simply be those for a quadrature rule in $W$ corresponding to the points $u_j$. This is illustrated in Figure 3.
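The product quadrature scheme and the sum (50) can be sketched as follows for discrete marks. This is a minimal illustration, not the authors' implementation: the function names and data are hypothetical, and the weights are taken to be equal (window area divided by the number of locations), which is a much cruder rule than the Dirichlet or counting weights of section 3.3.

```python
import math

def marked_quadrature(data, dummy, marks, window_area):
    """Product scheme (u_j, k_l) with indicator z, weight w and response y = z / w.

    data  : list of ((x, y), mark) observed marked points
    dummy : list of (x, y) dummy locations
    marks : list of possible marks (the discrete mark space)
    """
    locations = [u for u, _ in data] + list(dummy)
    w = window_area / len(locations)      # assumption: equal weights in W
    observed = set(data)
    scheme = []
    for k in marks:
        for u in locations:
            z = 1 if (u, k) in observed else 0   # data point or dummy point
            scheme.append({"u": u, "k": k, "z": z, "w": w, "y": z / w})
    return scheme

def approx_log_pl(scheme, cif):
    """Approximate log pseudolikelihood (50) for a conditional intensity cif(u, k)."""
    total = 0.0
    for q in scheme:
        lam = cif(q["u"], q["k"])
        total += (q["y"] * math.log(lam) - lam) * q["w"]
    return total

# Hypothetical two-type pattern in the unit square with a 2 x 2 grid of dummy points.
data = [((0.1, 0.2), 1), ((0.7, 0.4), 2)]
dummy = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
scheme = marked_quadrature(data, dummy, marks=[1, 2], window_area=1.0)
value = approx_log_pl(scheme, lambda u, k: 1.0)
```

Note that each data location appears once per mark: with its own mark it is a data point ($z = 1$), and with every other mark it serves as a dummy point ($z = 0$).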


Figure 3: Quadrature of marked point pattern using the Dirichlet tessellation. Left: Illustrative example of a marked point pattern with two types, 1 (triangles) and 2 (boxes) in the unit square $W$. Middle and Right: The Dirichlet tessellation of $W \times \{1,2\}$ based on the data and a $5 \times 5$ grid of dummy points duplicated for the two types. Middle: the plane $W \times \{1\}$ of type 1 points; Right: the plane $W \times \{2\}$ of type 2 points.

7.4 Example: 2-type Strauss process

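This is the special case of the pairwise interaction marked point process (46) in which $\mathcal{M} = \{1,2\}$, i.e. points belong to one of two types, and

$$b(u,m) = \beta_m$$

$$h((u,m),(u',m')) = \begin{cases} \gamma_{m,m'} & \text{if } 0 < \|u - u'\| < r_{m,m'} \\ 1 & \text{otherwise} \end{cases}$$

where $\beta_1, \beta_2 > 0$ are intensity parameters, $\gamma_{11}, \gamma_{22}, \gamma_{12}, \gamma_{21} \in [0,1]$ are interaction parameters, and $r_{11}, r_{22}, r_{12}, r_{21} > 0$ are interaction distances, with $\gamma_{12} = \gamma_{21}$ and $r_{12} = r_{21}$. The density may be expressed as

$$f(\mathbf{v};\theta) = \alpha\, \beta_1^{n_1(\mathbf{v})} \beta_2^{n_2(\mathbf{v})} \gamma_{11}^{s_{11}(\mathbf{v})} \gamma_{12}^{s_{12}(\mathbf{v})} \gamma_{22}^{s_{22}(\mathbf{v})}$$

analogously to (20), where $n_1(\mathbf{v}), n_2(\mathbf{v})$ are the numbers of points of type 1 and 2 respectively, and $s_{m,m'}(\mathbf{v})$ is the number of pairs of distinct marked points of types $m$ and $m'$ respectively within a distance $r_{m,m'}$ of each other. The conditional intensity is

$$\lambda((u,m);\mathbf{v}) = \beta_m\, \gamma_{m1}^{t_1((u,m);\mathbf{v})}\, \gamma_{m2}^{t_2((u,m);\mathbf{v})}$$

for $u \in W$, $m \in \mathcal{M}$, where

$$t_{m'}((u,m);\mathbf{v}) = \#\{ i : 0 < \|u - x_i\| \le r_{m,m'},\ m_i = m' \}$$

is the number of type $m'$ points within the required distance $r_{m,m'}$ of a point $u$ with type $m$.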


This model may be cast in the loglinear form (9) with parameter vector $\theta = (\log\beta_1, \log\beta_2, \log\gamma_{11}, \log\gamma_{12}, \log\gamma_{22})$ and five ``explanatory variables'', $I_1(m)$, $I_2(m)$, $I_1(m)\,t_1((u,m);\mathbf{v})$, $I_1(m)\,t_2((u,m);\mathbf{v}) + I_2(m)\,t_1((u,m);\mathbf{v})$ and $I_2(m)\,t_2((u,m);\mathbf{v})$ respectively, where $I_k(m) = \mathbf{1}\{m = k\}$. Equivalently it may be described as a nested model with one factor and two covariates, one of which is nested within the factor.
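The five explanatory variables can be computed directly from their definitions. The following Python sketch does so for a single marked location $(u,m)$; the function names, the dictionary of radii keyed by unordered type pairs, and the toy pattern are hypothetical and serve only to illustrate the bookkeeping.

```python
import math

def t_count(u, m, target, pattern, radii):
    """Number of type-`target` points within r_{m,target} of u, excluding u itself."""
    r = radii[(min(m, target), max(m, target))]   # r_12 = r_21 by symmetry
    return sum(1 for (x, mark) in pattern
               if mark == target and 0.0 < math.dist(u, x) <= r)

def explanatory_variables(u, m, pattern, radii):
    """Covariates paired with (log beta_1, log beta_2, log gamma_11, log gamma_12, log gamma_22)."""
    i1, i2 = int(m == 1), int(m == 2)             # indicators I_1(m), I_2(m)
    t1 = t_count(u, m, 1, pattern, radii)
    t2 = t_count(u, m, 2, pattern, radii)
    return [i1, i2, i1 * t1, i1 * t2 + i2 * t1, i2 * t2]

# Hypothetical two-type pattern with a common interaction distance of 0.1.
pattern = [((0.0, 0.0), 1), ((0.05, 0.0), 1), ((0.0, 0.08), 2), ((1.0, 1.0), 2)]
radii = {(1, 1): 0.1, (1, 2): 0.1, (2, 2): 0.1}
row_type1 = explanatory_variables((0.0, 0.0), 1, pattern, radii)
row_type2 = explanatory_variables((0.0, 0.08), 2, pattern, radii)
```

The fourth covariate collects cross-type neighbours from either side, which is what allows a single coefficient $\log\gamma_{12}$ to serve both types.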

The pseudolikelihood estimate of $\theta$ is consistent [30]. The central limit theorem of Jensen and Künsch [29] does not strictly apply here since there are more than two parameters, but these authors conjecture (p. 477) that a generalisation does hold.

8 Estimation and inference issues

8.1 Edge effects

For inferential purposes it matters whether we assume the data $\mathbf{x}$ are a realisation of a finite point process $X$ defined only inside the domain $W$ (`bounded case') or a partially observed realisation $\mathbf{x} = \mathbf{y} \cap W$ of a point process $Y$ extending throughout $\mathbb{R}^d$ (`unbounded case'). In the unbounded case we have an `edge effect' problem: the conditional intensity $\lambda(u;\mathbf{y})$ of $Y$ may not be observable from the data $\mathbf{x} = \mathbf{y} \cap W$, since the required information may involve points lying outside the observation window $W$.

One can simply ignore edge effects and apply the pseudolikelihood to $\mathbf{x}$ as if it were $\mathbf{y}$. This corresponds to treating an observation of the `unbounded' model as if it were drawn from the `bounded' model. This causes bias in the estimates of the interaction and intensity parameters. Remedies for edge effects are surveyed in [2, 53, 59]. Some possible strategies follow.

8.1.1 Periodic boundary conditions

If the window $W$ is rectangular one may apply ``periodic boundary conditions'' [51] by identifying opposite sides of $W$ so that e.g. points near the right hand edge have neighbours near the left edge. This typically reduces bias but inflates variance, and is only applicable to certain shapes of $W$. It seems inadvisable for inhomogeneous patterns.
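In computational terms, periodic boundary conditions amount to replacing the Euclidean distance by a toroidal distance whenever interpoint distances are evaluated. A minimal Python sketch for a rectangular window follows; the function name and arguments are hypothetical.

```python
import math

def periodic_distance(u, v, width, height):
    """Distance between u and v when opposite sides of a width x height rectangle are identified."""
    dx = abs(u[0] - v[0])
    dy = abs(u[1] - v[1])
    dx = min(dx, width - dx)     # wrap around horizontally if that is shorter
    dy = min(dy, height - dy)    # wrap around vertically if that is shorter
    return math.hypot(dx, dy)

# Two points near opposite vertical edges of the unit square become close neighbours.
d_torus = periodic_distance((0.05, 0.5), (0.95, 0.5), 1.0, 1.0)
d_plain = math.dist((0.05, 0.5), (0.95, 0.5))
```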

8.1.2 Border method

This applies [53] to any process with finite interaction range $r$, in the sense that $\lambda(u;\mathbf{x})$ depends only on data points $x_i$ lying within a distance $r$ of $u$. An example is the Strauss process with fixed $r$. Form the pseudolikelihood over the subregion

$$W_{\ominus r} = \{ u \in W : b(u,r) \subseteq W \}$$

of all points of $W$ lying at least $r$ units from the boundary. For $u \in W_{\ominus r}$ the conditional intensity is observable, $\lambda(u;\mathbf{y}) = \lambda(u;\mathbf{y} \cap W)$, so the pseudolikelihood over $W_{\ominus r}$ can be calculated from the data.
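For a rectangular window the border method reduces to a simple filter: quadrature points contribute to the pseudolikelihood only if they lie at least $r$ from every side, while all data points in $W$ are still used when evaluating the conditional intensity. A small Python sketch, with hypothetical names and data:

```python
def in_eroded_window(u, window, r):
    """True if the disc of radius r centred at u lies inside window = (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = window
    return x0 + r <= u[0] <= x1 - r and y0 + r <= u[1] <= y1 - r

def eroded_area(window, r):
    """Area of the eroded rectangle, i.e. the domain of integration under the border method."""
    x0, x1, y0, y1 = window
    return max(0.0, x1 - x0 - 2 * r) * max(0.0, y1 - y0 - 2 * r)

window = (0.0, 10.0, 0.0, 10.0)
points = [(0.5, 5.0), (5.0, 5.0), (9.0, 9.0), (9.5, 2.0)]
retained = [u for u in points if in_eroded_window(u, window, r=0.7)]
```

With $r = 0.7$ in a 10 x 10 window the eroded region covers roughly 74% of the original area, which gives a sense of how much data the method sets aside.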

If $r$ is unknown, one must be wary of comparing pseudolikelihoods based on different subsets $W_{\ominus r}$. One strategy is to compute all pseudolikelihoods over the same domain $W_{\ominus R}$ where $R$ is the maximum $r$ value contemplated.

The border method applies to any model with finite interaction range, including nonstationary processes, and the associated MPLE satisfies an unbiased estimating equation. However, it discards appreciable amounts of data.

8.1.3 Ripley's hybrid method

In estimation problems, an improvement on the border method is to use edge correction weights [2, 53, 59]. Ripley [53, p. 67], [64, p. 396] extended this to maximum pseudolikelihood estimation for the Strauss process. He proposed that the right side of (30), which cannot be observed due to edge effects, be estimated by $n(\mathbf{x})\hat{K}(r)/|W|$, a quantity which has approximately the same expectation. Here $\hat{K}(r)$ is the estimate of the value $K(r)$ of the second moment function $K$ of the process [53]

$$\hat{K}(r) = \frac{|W|}{n(\mathbf{x})^2} \sum_{i \neq i'} \mathbf{1}\{\|x_i - x_{i'}\| \le r\}\, e(x_i, x_{i'}, W) \tag{51}$$

where $e(u,v,W)$ is an edge effect correction factor ensuring unbiased estimation of $\lambda^2 K(r)$ if the point process is stationary and isotropic. The left side of (30) is also subject to edge effects, and is modified by using the eroded domain $W_{\ominus r}$ instead of $W$ as the domain of integration in (27). This is a `hybrid' of the border method and the edge correction weights strategies.
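The estimator (51) is straightforward to code once an edge correction factor has been chosen. The sketch below assumes a rectangular window and uses the translation correction, $e(u,v,W) = |W| / |W \cap (W + u - v)|$, as one possible choice of $e$; the function names and the toy pattern are hypothetical.

```python
import math

def translation_weight(u, v, width, height):
    """Translation edge correction for a rectangle: |W| / |W intersect (W + u - v)|."""
    dx, dy = abs(u[0] - v[0]), abs(u[1] - v[1])
    return (width * height) / ((width - dx) * (height - dy))

def k_hat(points, r, width, height):
    """Edge-corrected estimate of K(r) as in (51), summing over ordered pairs i != i'."""
    n = len(points)
    total = 0.0
    for i, u in enumerate(points):
        for j, v in enumerate(points):
            if i != j and math.dist(u, v) <= r:
                total += translation_weight(u, v, width, height)
    return (width * height) / n ** 2 * total

# Hypothetical pattern in the unit square.
points = [(0.5, 0.5), (0.6, 0.5), (0.1, 0.9)]
k = k_hat(points, r=0.2, width=1.0, height=1.0)
# Ripley's hybrid estimate of the right side of (30): n * K-hat(r) / |W|.
hybrid_rhs = len(points) * k / (1.0 * 1.0)
```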

8.1.4 Edge corrected pseudolikelihood

An alternative, which we believe is new, is to introduce edge correction weights into the pseudolikelihood itself. Consider a stationary pairwise interaction process with $b(u) \equiv \beta$ and $h(u,v) = \exp\{\theta^T H(u-v)\}$. Suppose that we now modify the model, replacing the pairwise interaction function by

$$h_E(u,v) = \exp\{ e(u,v,W)\, \theta^T H(u-v) \}$$

where $e(u,v,W)$ is an edge correction weight as in [2] which must be symmetric in $u$ and $v$. This modified model has conditional intensity $\lambda_E(u;\mathbf{x}) = \beta \exp\{\theta^T S_E(u;\mathbf{x})\}$ where

$$S_E(u;\mathbf{x}) = \sum_{i=1}^{n(\mathbf{x})} e(u,x_i,W)\, H(u - x_i)$$

is a ``plug-in'' estimator of the unobservable potential $S(u;\mathbf{y})$ for the original model. Forming the pseudolikelihood for the modified model and deriving the normal equations, we obtain (10) with $S(u;\mathbf{x})$ replaced by $S_E(u;\mathbf{x})$ throughout. By the nonstationary Nguyen-Zessin formula (11), this is an unbiased estimating equation for the modified model. It is approximately unbiased for the original model, when $\|\theta\|$ is small. This model can be fitted by the Berman-Turner method.

8.1.5 Data augmentation

The unobserved points of $\mathbf{y}$ outside the window which affect the value of $\lambda(u;\mathbf{y})$ for $u \in W$ can alternatively be regarded as missing data. One approach is data augmentation [63, chap. 5] which has been applied to maximum likelihood inference for point processes by Geyer [21]. This can also be applied in our context.

8.2 Nuisance Parameters and Profile Pseudolikelihood

The point process models considered above all contain `nuisance' parameters which do not enter in the loglinear form (9) required for our method. A possible approach to estimation is by analogy with profile likelihood. Write $\theta = (\varphi, \psi)$ where $\psi$ are the nuisance parameters, so that we assume

$$\lambda(u;\mathbf{x}) = \exp\{ \varphi^T S(u;\mathbf{x};\psi) \} \tag{52}$$

instead of (9). For each fixed value of $\psi$ the model is loglinear in $\varphi$, so that we can apply our approximation method to maximise the pseudolikelihood over $\varphi$, yielding an MPLE $\hat{\varphi}(\psi)$ for fixed $\psi$. Computing the maximised pseudolikelihood as a function of $\psi$ yields the profile pseudolikelihood

$$PL((\hat{\varphi}(\psi),\psi);\mathbf{x}) = \max_{\varphi} PL((\varphi,\psi);\mathbf{x}). \tag{53}$$

The global MPLE of $\theta$ is then obtained by maximising this profile pseudolikelihood over $\psi$. We examine this approach for the Strauss and soft core processes in Section 9.
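In practice the profile pseudolikelihood is evaluated on a grid of nuisance values, refitting the loglinear model at each one. The following Python sketch shows only this outer loop. The inner fit is passed in as a function and is replaced here by a toy stand-in, so `toy_fit`, its return values and the grid are hypothetical; a real inner fit would be the Poisson loglinear regression of the Berman-Turner device.

```python
def profile_mple(psi_grid, fit_for_fixed_psi):
    """Maximise the profile log pseudolikelihood over a grid of nuisance values.

    fit_for_fixed_psi(psi) must return (phi_hat, max_log_pl) for that psi.
    Returns the maximising psi, its phi_hat and log PL, and the whole profile.
    """
    profile = [(psi,) + tuple(fit_for_fixed_psi(psi)) for psi in psi_grid]
    psi_hat, phi_hat, log_pl = max(profile, key=lambda row: row[2])
    return psi_hat, phi_hat, log_pl, profile

def toy_fit(psi):
    """Toy stand-in for the inner fit (hypothetical): a profile peaked at psi = 0.7."""
    return [1.0 + psi], -(psi - 0.7) ** 2

grid = [0.1 * k for k in range(1, 16)]
psi_hat, phi_hat, log_pl, profile = profile_mple(grid, toy_fit)
```

Keeping the whole profile, rather than only its maximiser, makes it possible to plot the profile against the nuisance parameter, as is done in the examples of Section 9.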


8.3 Parametric inference and model choice

In order to draw inferences about the parameters of a Gibbs process fitted by maximum pseudolikelihood, and to choose models, we use the parametric bootstrap. We shall generally assume in practice that the MPLE $\hat{\theta}$ of the canonical parameter $\theta$ is approximately unbiased and approximately normal, although this has only been established in certain cases [29, 30].

To obtain confidence intervals for $\theta$, we generate a large number $M$ of simulated realisations from the fitted model $\theta = \hat{\theta}$, yielding $M$ simulated values $\hat{\theta}^{(1)},\ldots,\hat{\theta}^{(M)}$ from the distribution of the MPLE under the fitted model. We estimate the mean vector and covariance matrix of this distribution from the simulated values, then construct confidence intervals using location models based on the multivariate normal or the bootstrap distribution. Similarly for model choice we use the bootstrap distribution of the deviance between two (nested) models.
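Given the bootstrap replicates of one parameter, the two kinds of interval mentioned above can be computed as in the following Python sketch. The replicate values here are an artificial, evenly spaced list rather than simulation output, and the function names and the simple order-statistic rule for the percentile interval are assumptions of this example.

```python
import math
import statistics

def normal_interval(estimate, replicates, z=1.96):
    """Normal-based interval: estimate +/- z * bootstrap standard deviation."""
    sd = statistics.stdev(replicates)
    return estimate - z * sd, estimate + z * sd

def percentile_interval(replicates, level=0.95):
    """Interval from the empirical quantiles of the bootstrap replicates."""
    ordered = sorted(replicates)
    m = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(math.floor(tail * (m - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (m - 1)))]
    return lower, upper

# Artificial replicates standing in for the simulated MPLE values of one parameter.
replicates = [k / 100 for k in range(101)]
estimate = 0.5
ci_normal = normal_interval(estimate, replicates)
ci_percentile = percentile_interval(replicates)
```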


9 Examples of applications

The analyses in this paper were performed in S-PLUS [5, 11, 64] using the Generalised Linear Model fitting function glm() and occasionally the Generalised Additive Model function gam(). Some analyses were repeated using GLIM [1] as a cross-check.

9.1 Swedish pines data


Figure 4: The Swedish pines data: locations of 71 pine saplings in a 10 x 10 metre square. Extracted by Ripley [52] from Strand [60]. Data obtained

from the MASS library accompanying [64].

Figure 4 depicts the Swedish pines data of Strand [60] which give the locations of 71 pine saplings in a 10 x 10 metre square. Ripley's pioneering analysis [52, §8.6, pp. 172–175] plotted $L(t) = \sqrt{K(t)/\pi}$ and rejected the hypothesis of a homogeneous Poisson process at the 1% level by a Monte Carlo test based on $D = \sup_t |L(t) - t|$. Ripley then fitted a Strauss process manually, obtaining $r = 0.7$ metres and $\gamma = 0.20$. In the latest analysis, again by Ripley [64, p. 396], $\gamma$ was estimated to be 0.15 using maximum pseudolikelihood with Ripley's hybrid edge correction (§8.1.3).

We fitted a Strauss process to these data by maximum pseudolikelihood, using both the Berman-Turner device and the polynomial approach via (29)–(30). We estimated $\beta$ and $\gamma$, but initially held $r$ fixed at 0.7. For the Berman-Turner method, varying densities of dummy points were tried, and quadrature weights were computed using both the Dirichlet and counting methods (section 3.3). Estimates obtained for $\gamma$ ranged from 0.29 down to 0.20, and for $\beta$ from 1.49 up to 2.12. A finer quadrature scheme always led to a smaller value of $\gamma$ and a larger value of $\beta$. Both the Berman-Turner and polynomial methods gave $\gamma = 0.21$ using a $50 \times 50$ grid of dummy points. This is close to the value obtained by Ripley in [52]. The corresponding value of $\beta$ was 1.98 by the Berman-Turner method and 2.01 by the polynomial method.

Various edge corrections (section 8.1) were tried, all using a $50 \times 50$ grid of dummy points. Using the border method, eroding the window by a distance $r = 0.7$ metres, we obtained $\hat{\gamma} = 0.13$ and values of $\hat{\beta}$ of 3.24 (Berman-Turner) and 3.29 (polynomial). Periodic edge correction yielded $\hat{\beta} = 2.09$, $\hat{\gamma} = 0.24$ (Berman-Turner) and $\hat{\beta} = 2.24$, $\hat{\gamma} = 0.22$ (polynomial). Our proposed edge corrected pseudolikelihood method was also applied, using the translation correction [2, 53] as the edge correction factor $e(u,v,W)$. The parameter estimates were $\hat{\beta} = 1.97$, $\hat{\gamma} = 0.25$. The latter two edge corrections inflated the estimate of $\gamma$ while the border correction deflated it.

A plot of the profile log pseudolikelihood of the interaction distance $r$ is shown in Figure 5. The plot yields $\hat{r} = 0.7$, which agrees with [64, p. 396]. The jaggedness of the plot is due to the discontinuity of the interpoint interaction: $\mathbf{1}\{\|u - x_i\| \le r\}$ and hence $s(\mathbf{x})$ are discontinuous functions of $r$, while the left sides of (29)–(30) are differentiable with respect to $r$. There seems little prospect of a convenient limit theory for $\hat{r}$.

Next we estimate the covariance matrix of the parameter estimates using the parametric bootstrap (section 8.3). To reduce the amount of computation we did not apply edge-correction and looked at only one set of quadrature weights (based on a $50 \times 50$ regular grid). However $r$ was estimated by profile pseudolikelihood. This version of the estimation algorithm was first applied to the data yielding $(\hat{\beta}, \hat{\gamma}, \hat{r}) = (1.9781, 0.2131, 0.7)$. A Metropolis-Hastings birth-death-shift algorithm [20] was used to generate 500 simulated realisations from the Strauss process with the same parameter values. The bootstrap covariance matrix, based on 500 parametric bootstrap replicates, was

$$\hat{C} = \begin{bmatrix} 0.1938 & -0.0155 & 0.0036 \\ -0.0155 & 0.0063 & 0.0008 \\ 0.0036 & 0.0008 & 0.0014 \end{bmatrix}$$

yielding corresponding (normal based) 95% confidence intervals of $[1.1153, 2.8410]$, $[0.0575, 0.3686]$, and $[0.6267, 0.7733]$ for $\beta$, $\gamma$, and $r$ respectively.
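As a quick check on how the normal-based intervals arise, the short Python sketch below recomputes them as estimate $\pm 1.96 \sqrt{\hat{C}_{kk}}$ from the point estimates and the diagonal of $\hat{C}$ quoted above. The multiplier 1.96 is an assumption about how the intervals were formed. The results agree with the quoted intervals to about three decimal places; the remaining differences in the fourth decimal are consistent with the covariance entries having been rounded for printing.

```python
import math

# Point estimates and bootstrap variances (diagonal of C-hat) as quoted in the text.
estimates = {"beta": 1.9781, "gamma": 0.2131, "r": 0.7}
variances = {"beta": 0.1938, "gamma": 0.0063, "r": 0.0014}

def normal_ci(est, var, z=1.96):
    """Normal-based interval: est +/- z * sqrt(var)."""
    half_width = z * math.sqrt(var)
    return est - half_width, est + half_width

intervals = {name: normal_ci(estimates[name], variances[name]) for name in estimates}

# Intervals as reported in the text, for comparison.
reported = {"beta": (1.1153, 2.8410), "gamma": (0.0575, 0.3686), "r": (0.6267, 0.7733)}
max_discrepancy = max(abs(intervals[k][i] - reported[k][i])
                      for k in reported for i in (0, 1))
```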

Normality of the estimates is suspect. Chi-squared tests for normality on the sequences of bootstrap replicates of estimates gave p-values of 0.02, 0.22, and 0 respectively for the normality of $\hat{\beta}$, $\hat{\gamma}$, and $\hat{r}$, so that $\hat{\gamma}$ is the only estimate which may legitimately be assumed normal. However, rough 95% confidence intervals based on the empirical quantiles of the bootstrap replicates were calculated as $[1.29, 2.73]$, $[0.09, 0.39]$, and $[0.62, 0.81]$ respectively. These are in broad agreement with the normal-based intervals.

[Figure 5 plot: horizontal axis, interaction radius (0.0 to 1.5); vertical axis, log pseudolikelihood (-90 to -60).]

Figure 5: Profile log pseudolikelihood of the Strauss process nuisance parameter $r$ for the Swedish pines data. Solid line: polynomial method and (26); dotted line: Berman-Turner device and (18). For comparison, the homogeneous Poisson process achieves a maximum log pseudolikelihood of $-92.4$ (see §3.2).

The confidence interval for $\gamma$ easily captures Ripley's (edge-corrected) value of 0.15. However the interval for $\beta$ does not embrace the corresponding $\beta$-value of 3.11. Thus $\beta$ appears to be more sensitive to the estimation methodology than does $\gamma$.

9.2 Swedish pines data: Ord's model

Following remarks of Ord [51, discussion] and Ripley [52, p. 175] we attempted to fit Ord's model (section 5.3) to the Swedish pines data. For simplicity we took a ``Strauss-type'' kernel

$$g(v) = \begin{cases} \beta\gamma & \text{if } v \le v_0 \\ \beta & \text{if } v > v_0 \end{cases} \tag{54}$$

where $\beta, \gamma > 0$ are parameters to be estimated and the threshold $v_0 > 0$ is a nuisance parameter.

[Figure 6 plot: horizontal axis, threshold (0.5 to 3.0); vertical axis, profile log PL (-90 to -75).]

Figure 6: Profile log pseudolikelihood for the threshold parameter $v_0$ of the Strauss-type kernel (54) in the Ord model for the Swedish pines data. The peak is at $v_0 = 1.10$ square metres. Dashed line shows log pseudolikelihood for the homogeneous Poisson model.

Figure 6 shows the profile log pseudolikelihood of $v_0$ for the Swedish pines data. Jaggedness of the plot may again be explained by discontinuity of the kernel. There is a sharp peak at $v_0 = 1.10$ square metres. Adopting this value as the threshold, the parameter estimates for the Swedish pines data are $\hat{\beta} = 1.70$ and $\hat{\gamma} = 0.43$. Figure 7 shows the result of fitting the same model (with $v_0$ fixed at 1.10) to 100 simulations of a binomial process, i.e. 71 independent uniformly distributed points in the same region. It indicates very strong dependence between $\hat{\beta}$ and $\hat{\gamma}$ for the binomial process. The plot confirms that the Swedish pines data appear to be strongly ordered.

[Figure 7 plot: scatterplot with axes log beta and log gamma.]

Figure 7: Estimated parameters $\log\hat{\beta}$ and $\log\hat{\gamma}$ for the Strauss-type Ord kernel (54) for the Swedish pines data ($\Box$) and for 100 simulations of a binomial process with the same number of points (+). Dotted lines indicate $\beta = \hat{\lambda}$ (the estimated intensity) and $\gamma = 1$ (corresponding to a Poisson process).

9.3 Japanese black pines data

Figure 8 depicts the Japanese black pines data of Numata [43] giving the locations of 204 seedlings in a $10 \times 10$ metre square. Ogata & Tanemura [47] used approximate maximum likelihood estimation to fit a soft core model with log-polynomial trend (i.e. where $B(u)$ in (32) is a polynomial in the Cartesian coordinates), choosing a cubic polynomial as giving the optimal fit.

In our analysis we first fitted a soft core model with log-cubic trend. The homogeneous soft core model has already been discussed in section 5.1.2. Adding the polynomial trend to the model is trivial using the Berman-Turner device; it is simply a matter of adding polynomial terms in the Cartesian coordinates to the linear predictor in the associated loglinear model. The estimation of the nuisance parameter $\kappa$ is problematic, so we initially set $\kappa = 0.5$ arbitrarily. The fitted trend surface is shown in Figure 9. Its contours are similar to those obtained by Ogata and Tanemura in [47]. Edge corrections had little effect on the fit, suggesting that edge effects are negligible.

[Figure 8 plot: map of point locations.]

Figure 8: Map of a natural stand of seedlings and saplings of Japanese black pine (Pinus Thunbergii), 204 seedlings and saplings in a sampling rectangle 10 m $\times$ 10 m. Source: Numata (1964). Data kindly supplied by Professors Y. Ogata and M. Tanemura.

We also find it helpful to plot the fitted conditional intensity function $\hat{\lambda}(\cdot;\mathbf{x})$ as shown in Figure 10. This is not a substitute for plotting the trend surface, since the conditional intensity depends on the realised pattern $\mathbf{x}$. Its usefulness lies in visualising the effect of the fitted interaction model on the underlying trend, the relative magnitudes and ranges of the trend and interaction terms, and the tradeoff between these two (when comparing different models). The plot also helps in checking discretisation effects.

Other interaction terms and trend terms can be fitted at little extra cost using the Berman-Turner device, in contrast to the extra effort required for maximum likelihood or simulation-based approaches. It is of interest to compare the foregoing fit with that obtained using a Strauss model for the interaction (along with a cubic polynomial spatial trend). In obtaining the Strauss fit we estimated the interaction radius $r$ by maximizing the profile pseudolikelihood, as well as estimating the trend and interaction parameters.

As noted in §2.3, the model parameters may be subject to constraints. The Strauss parameter $\gamma$ must satisfy $0 \le \gamma \le 1$. In this analysis we had to impose the constraint explicitly, i.e. for some values of $r$ an estimate $\hat{\gamma} > 1$ was obtained, whereupon we set $\hat{\gamma} = 1$, and adjusted the other parameter estimates and the pseudolikelihood accordingly.

[Figure 9 plots: perspective plot with axes X, Y (0 to 10) and trend (0 to 6); contour plot with the data points overlaid.]

Figure 9: Fitted log-cubic trend surface $\exp\{\hat{\alpha}^T B(u)\}$ for the Japanese black pines data with soft core interaction model. Top: perspective plot; bottom: contour plot.

A plot of the resulting profile log pseudolikelihood is shown in Figure 11 and yields $\hat{r} = 0.14$. This value is just less than the minimum interpoint distance for the Japanese black pines data set. That is, when a spatial trend is allowed for, the optimal Strauss model for the interaction is the hard core model.

Using $r = 0.14$ we fitted the inhomogeneous Strauss model. The fitted trend was visually identical to that obtained for the inhomogeneous soft core model, and is not shown. The fitted conditional intensity function is shown in Figure 12; this is essentially the trend surface ``with holes of radius 0.14 punched in it'' at each data point.

[Figure 10 plot: perspective plot with axes X, Y (0 to 10) and c.i.f. (0 to 6).]

Figure 10: Fitted conditional intensity function $\hat{\lambda}(\cdot;\mathbf{x})$ for the Japanese black pines data with soft core interaction model.

Although the trend surfaces are visually identical, one might ask for a more objective assessment of the difference between the two trends. To this end, we examined the differences between the corresponding polynomial coefficients. These appeared to be relatively small; the maximum percentage difference

$$\frac{|\mathrm{est}_1 - \mathrm{est}_2|}{(|\mathrm{est}_1| + |\mathrm{est}_2|)/2} \times 100\%$$

was about 7%.
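The percentage difference used here is symmetric in the two estimates, being scaled by the mean of their absolute values. A small Python sketch of the calculation follows; the coefficient values are made up for illustration and are not the fitted coefficients from the paper.

```python
def percentage_difference(est1, est2):
    """Absolute difference relative to the mean absolute value, as a percentage."""
    return abs(est1 - est2) / ((abs(est1) + abs(est2)) / 2.0) * 100.0

# Hypothetical pairs of trend coefficients from two fitted models.
coefficient_pairs = [(1.00, 1.07), (-0.50, -0.51), (0.20, 0.20)]
differences = [percentage_difference(a, b) for a, b in coefficient_pairs]
largest = max(differences)
```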

Yet it is not clear how to assess the magnitude of these differences. A rough idea might be given by dividing the differences by an estimate of the standard deviations of, say, the Strauss fits, obtained by bootstrapping. When this was done, the maximum absolute value of the resulting ratios was 0.0956 (corresponding to the $x^3$ coefficient). Intuitively this confirms the visual impression that there is no evidence of a difference between the two fitted trends.

A trend was also fitted to the Japanese black pines data in the form of a general non-parametric smooth function, the possibility of which was mentioned in section 6.1. The smooth function was provided by the S-PLUS function lo(), and the fit was accomplished using the function gam() in place of glm(). Both Strauss and soft core models were used for the interaction. When the Strauss model was used, profile pseudolikelihood indicated a value of 0.14 for the $r$ parameter (i.e. a hard core model), the same as for the cubic polynomial trend.

For the Strauss (hard core) model the intensity function and the trend surfaces were visually indistinguishable from those obtained using the cubic polynomial trend. For the soft core model the interaction seemed much more ``subdued'' when the trend was modelled using lo(); the plot of the conditional intensity showed not much more than dimples in the trend surface. This finding reinforces the principle that the more freedom we allow for the trend, the closer the trend fits the actual data, so the less interaction is needed to explain the data. This effect is not noticeable with the hard core interaction which cannot adapt itself to the smooth trend surface. The lo() trend itself, with soft core interaction, was visually very similar to the cubic polynomial trend, but slightly lower.

[Figure 11 plot: horizontal axis, interaction radius r (0.0 to 0.5); vertical axis, profile log pseudolikelihood (-40 to -30).]

Figure 11: Profile log pseudolikelihood of the Strauss interaction range $r$ for the Japanese black pines data; Strauss process, with log-cubic polynomial trend, fitted using the Berman-Turner device. Maximum occurs at $r = 0.14$.

[Figure 12 plot: perspective plot with axes X, Y (0 to 10) and c.i.f. (0 to 6).]

Figure 12: Conditional intensity function for the Japanese black pines data with the interaction modelled as a Strauss process.

Next we attempted to estimate the soft core nuisance parameter $\kappa$. The approximate profile log pseudolikelihood can be calculated from the output of the GLM fitting algorithm, via (18). However plots of this quantity and of the parameter estimates suggested that small values of $\kappa$ lead to numerical instability. This persisted when different starting values and different statistical packages (S-PLUS, GLIM) were used. Note that the interaction potential (36) is unbounded, with infinities at the data points, and the approximate pseudolikelihood is not uniformly continuous in $\kappa$, even for fixed data and dummy points. Hence the quadrature schemes advocated in section 3.3 appear to be inadequate for the profile pseudolikelihood.

An alternative numerical integration procedure was then implemented using the midpoint rule and a fine array of integration points. Figure 13 shows the resulting approximate profile log pseudolikelihood. It suggests that the maximum occurs very close to $\kappa = 0$. The pseudolikelihood will in fact have an infinite maximum at a point where $\kappa = 0$ in certain circumstances.

[Figure 13 plot: horizontal axis, parameter kappa (0.0 to 0.4); vertical axis, log pseudolikelihood (-37 to -33).]

Figure 13: Profile log pseudolikelihood of the soft core process nuisance parameter $\kappa$ for the Japanese black pines data. Calculated by the ``exact'' method.

Let $d = \min_{i \neq i'} \|x_i - x_{i'}\|$ be the minimum inter-point distance and $b = \sup_{u \in W} \min_{i=1,\ldots,n(\mathbf{x})} \|u - x_i\|$ the maximum distance from a location in $W$ to the nearest data point. If $b \le d$ then the soft core pseudolikelihood has an infinite maximum at $\sigma = d$ and $\kappa = 0$.

To prove this, write the log pseudolikelihood

$$n \ln\beta - \sum_{i \neq j} \left( \frac{\sigma}{\|x_i - x_j\|} \right)^{2/\kappa} - \beta \int_W \exp\left\{ -\sum_{i=1}^{n} \left( \frac{\sigma}{\|x_i - u\|} \right)^{2/\kappa} \right\} du$$

in the form $n \ln\beta - S - \beta I$ where $S$ represents the finite sum and $I$ the integral. Maximising with respect to $\beta$ (the maximum is attained at $\beta = n/I$) makes the profile log pseudolikelihood of $(\sigma,\kappa)$ equal to $n \ln n - n - S - n \ln I$.

Let $E_r = \{ u \in W : \|u - x_i\| \ge r \text{ for all } i \}$. Observe that for $u \in E_\sigma$ the integrand of $I$ tends to 1 as $\kappa \to 0$ and for $u \in W \setminus E_\sigma$ the integrand tends to 0 so that $I$ tends to $|E_\sigma|$ as $\kappa \to 0$.

If $b \le d$, $|E_\sigma|$ can be made arbitrarily small by taking $\sigma$ less than but sufficiently close to $d$. Now if $\sigma < d$ then $S \to 0$ as $\kappa \to 0$. Thus, for such $\sigma$,
