Practical maximum pseudolikelihood
for spatial point patterns
Adrian Baddeley and Rolf Turner
Abstract
We describe a technique for computing approximate maximum pseudolikelihood estimates of the parameters of a spatial point process. The method is an extension of Berman and Turner's [7] device for maximising the likelihoods of inhomogeneous spatial Poisson processes. For a very wide class of spatial point process models the likelihood is intractable, while the pseudolikelihood [8] is known explicitly, except for the computation of an integral over the sampling region. Approximating this integral by a finite sum in a special way yields an approximate pseudolikelihood which is formally equivalent to the (weighted) likelihood of a loglinear model with Poisson responses. This can be maximised using standard statistical software for generalised linear or additive models, provided the conditional intensity of the process takes an `exponential family' form. Using this approach we are able to fit rapidly a wide variety of spatial point process models of Gibbs type, incorporating spatial trends, interaction between points, dependence on spatial covariates, and mark information.
Keywords:
Area-interaction process; Berman-Turner device; Dirichlet tessellation; Edge effects; Generalised additive models; Generalised linear models; Gibbs point processes; GLIM; Hard core process; Inhomogeneous point process; Marked point processes; Markov spatial point processes; Ord's process; Pairwise interaction; Profile pseudolikelihood; Spatial clustering; Soft core process; Spatial trend; S-PLUS; Strauss process; Widom-Rowlinson model.
Introduction
This paper describes a computational device for rapidly fitting statistical models to spatial point patterns. Applications are shown in Section 9. Datasets may consist of points in two or three dimensions or in space-time; the points may be classified into different types or carry auxiliary observations ("marks"). Additionally there may be spatial covariates, such as topography or another spatial pattern observed in the same region.
Realistic models for such data should incorporate both spatial inhomogeneity (`trend') and dependence between points (`interaction' such as clustering or regularity). Ogata and Tanemura [44, 45, 46, 47] and Penttinen [49] developed methods for maximum likelihood estimation for such models, and applied them to real data. Recent advances have been made by Geyer, Møller and others [20, 21]. However, maximum likelihood is computationally intensive, and employs simulation algorithms which are specific to the chosen model. It is even more costly for inhomogeneous spatial patterns because of increased parameter dimensionality and complexity of simulation. This militates against the modern statistical practice of fitting several alternative models to the same dataset and introducing smooth functions as model terms. Few writers apart from Ogata and Tanemura [47] have fitted inhomogeneous point process models, other than the inhomogeneous Poisson process, to real spatial data.
Berman and Turner [7] introduced a technique for maximising the likelihoods of (a) general point processes in time, and (b) inhomogeneous Poisson processes in $d$-dimensional space. The intensity or conditional intensity of the process is assumed to be loglinear in the parameters. They approximated the log likelihood by a finite sum which has the same analytical form as the (weighted) log likelihood of a generalised linear model with Poisson responses. The approximate likelihood can then be maximised using existing software for generalised linear models. Related ideas have been explored by Lindsey [36, 37, 38, 39].
In this paper we extend the Berman-Turner device to a much larger class of spatial point process models, namely Gibbs point processes with exponential family likelihoods. We obtain an approximation to the pseudolikelihood [8, 9, 30] rather than to the likelihood. The maximum pseudolikelihood estimator is a practical alternative to the MLE, satisfies unbiased estimating equations and is consistent and asymptotically normal under suitable conditions. The MLE is not necessarily optimal here since the usual asymptotic theory is not applicable. Under reasonable assumptions [16] the maximum pseudolikelihood normal equations are a special case of the Takacs-Fiksel estimating equations, an application of the method of moments [17, 61, 18].
Using the extended Berman-Turner device, and standard statistical software, we are able rapidly to fit quite complex spatial stochastic models involving spatial trends and spatial covariates as well as interactions between points.
The plan of the paper is as follows. Definitions and background are given in Sections 1 and 2. Our extension of the Berman-Turner computational device is presented in Section 3. Section 4 treats a simple example. Application of the method to specific models is developed in Section 5 for models of spatial interaction, Section 6 for spatial inhomogeneity, and Section 7 for marked point patterns. Section 8 treats some issues in estimation and inference. The method is applied to real datasets in Section 9.
1 Background and definitions
1.1 Likelihoods
The data consist of a spatial point pattern $\mathbf{x}$ observed in a bounded region $W$ of space. Thus
$$\mathbf{x} = \{x_1, \ldots, x_n\} \qquad (1)$$
where the number of points $n \ge 0$ is not fixed, and each $x_i$ is a point in $W$. The region $W$ is a known, bounded subset of $d$-dimensional space $\mathbb{R}^d$, where $d \ge 1$. Extensions of this basic setup to incorporate spatial covariates and marked points are discussed in sections 6.1–6.2 and 7 respectively.
The data $\mathbf{x}$ are assumed to be a realisation of a random point process $X$ in $W$. Typically the null model (or the null hypothesis) will be the homogeneous Poisson point process [12, 34]. Other models will be specified by their likelihood with respect to the Poisson process. Thus we assume $X$ has a probability density $f(\mathbf{x}; \theta)$ with respect to the distribution of the Poisson process with intensity 1 on $W$. Additionally we assume $f(\mathbf{x}; \theta) > 0$ implies $f(\mathbf{y}; \theta) > 0$ for all subsets $\mathbf{y} \subset \mathbf{x}$. This is the class of Gibbs processes on $W$, see [50, 54, 59]. The distribution is governed by a vector parameter $\theta$ ranging over a set $\Theta \subseteq \mathbb{R}^p$. See [12, 21].
1.2 Basic models
Specific models are detailed in sections 5–7, but it is instructive to list briefly four important examples. Firstly the homogeneous Poisson process with intensity $\lambda > 0$ has density
$$f(\mathbf{x}; \lambda) = e^{-(\lambda - 1)|W|} \lambda^{n(\mathbf{x})},$$
where $n(\mathbf{x})$ denotes the number of points in $\mathbf{x}$ and $|W|$ is the volume of $W$. This yields the maximum likelihood estimate $\hat\lambda = n(\mathbf{x})/|W|$.
Secondly consider the inhomogeneous Poisson process on $W$ with rate or intensity function $\lambda : W \to \mathbb{R}$, see [12, 34]. In statistical models, the intensity $\lambda_\theta(u)$ will depend on $\theta$ to reflect `spatial trend' (a change in intensity across the region of observation) or dependence on a covariate. The density is
$$f(\mathbf{x}; \theta) = \left[ \prod_{i=1}^{n(\mathbf{x})} \lambda_\theta(x_i) \right] \exp\left( -\int_W [\lambda_\theta(u) - 1] \, du \right). \qquad (2)$$
Maximisation of (2) generally requires iterative optimization methods.
Thirdly the pairwise interaction process on $W$ with trend or activity function $b_\theta : W \to \mathbb{R}_+$ and interaction function $h_\theta : W \times W \to \mathbb{R}_+$ has density
$$f(\mathbf{x}; \theta) = \alpha(\theta) \left[ \prod_{i=1}^{n(\mathbf{x})} b_\theta(x_i) \right] \left[ \prod_{i<j} h_\theta(x_i, x_j) \right] \qquad (3)$$
where $\alpha(\theta) > 0$ is the normalising constant. Conditions must be imposed on $b_\theta$ and $h_\theta$ to ensure the density is well-defined and integrable: in particular $h_\theta(u, v) = h_\theta(v, u)$. Examples are given in section 5. See the excellent surveys by Ripley [53, 54]. Pairwise interaction models are suitable for the data in Figures 8 and 14, as shown in [44, 45, 46, 47] and [56, 62] respectively. The terms $b_\theta(x_i)$ in (3) influence the intensity of points and introduce a spatial trend if $b_\theta(\cdot)$ is not constant. The terms $h_\theta(x_i, x_j)$ introduce dependence (`interaction') between different points of the process $X$. If $h_\theta \equiv 1$ the model reduces to an inhomogeneous Poisson process with intensity function $b_\theta(u)$.
The normalising constant $\alpha(\theta)$ in (3) is generally an intractable function of $\theta$. Methods for approximating $\alpha(\theta)$ and maximising likelihood include functional expansions of $\alpha(\theta)$, Monte Carlo integration, and analogues of E-M and stochastic approximation [21, 41, 44, 45, 46, 47, 49].
Most models considered in this paper are pairwise interaction processes, but we also discuss the Widom-Rowlinson (`area-interaction') model (section 5.2) and Ord's model (section 5.3).
2 Pseudolikelihood
It is generally difficult to evaluate and maximise the likelihoods of point processes other than the inhomogeneous Poisson (2). Even simple exponential family models such as the pairwise interaction processes (3) include a normalising constant which is an intractable function of $\theta$. An alternative to the likelihood function is the pseudolikelihood [8, 9, 10, 30] which we describe here. See [16, 17, 18, 53, 54, 56, 61] for other applications.
Originally Besag [8, 9] defined the pseudolikelihood of a finite set of random variables $X_1, \ldots, X_n$ as the product of the conditional likelihoods of each individual $X_i$ given the other variables $\{X_j : j \ne i\}$. This was extended [9, 10] to point processes, for which it can be viewed as an infinite product of infinitesimal conditional probabilities.
2.1 Conditional intensity
To construct the pseudolikelihood we require the (Papangelou) conditional intensity $\lambda(u; \mathbf{x})$ of $X$ at a location $u \in W$. This may be loosely interpreted as giving the conditional probability that $X$ has a point at $u$ given that the rest of the process coincides with $\mathbf{x}$. See [32] for an informal introduction, or [22, 23, 31, 35] for details.
For any Gibbs process on $W$ (see section 1) with density $f$, the conditional intensity at a point $u \in W$ is
$$\lambda(u; \mathbf{x}) = \frac{f(\mathbf{x} \cup \{u\})}{f(\mathbf{x})} \qquad (4)$$
if $u \notin \mathbf{x}$, while for $x_i \in \mathbf{x}$
$$\lambda(x_i; \mathbf{x}) = \frac{f(\mathbf{x})}{f(\mathbf{x} \setminus \{x_i\})}. \qquad (5)$$
For example, the inhomogeneous Poisson process with intensity function $\lambda(\cdot)$ has conditional intensity $\lambda(u; \mathbf{x}) = \lambda(u)$ at all points $u$. The fact that this does not depend on $\mathbf{x}$ is a consequence of the independence properties of the Poisson process. For a general Gibbs point process $\lambda(u; \mathbf{x})$ does depend on $\mathbf{x}$. The general pairwise interaction process (3) has conditional intensity
$$\lambda(u; \mathbf{x}) = b(u) \prod_{\substack{i=1 \\ x_i \ne u}}^{n(\mathbf{x})} h(u, x_i). \qquad (6)$$
Note $\lambda(\cdot\,; \mathbf{x})$ is discontinuous at the data points $x_i$, and that the intractable normalising constant in (3) has been eliminated in the conditional intensity.
2.2 Definition of pseudolikelihood
Besag [9] defined the pseudolikelihood of a point process with conditional intensity $\lambda_\theta(u; \mathbf{x})$ over a subset $A \subseteq W$ to be
$$\mathrm{PL}_A(\theta; \mathbf{x}) = \left[ \prod_{x_i \in A} \lambda_\theta(x_i; \mathbf{x}) \right] \exp\left( -\int_A \lambda_\theta(u; \mathbf{x}) \, du \right) \qquad (7)$$
and gave examples of the utility of maximum pseudolikelihood estimates. Further theory was developed in [10, 29, 30].
If the process is Poisson the pseudolikelihood coincides with the likelihood (2) up to the factor $\exp(|W|)$. For a pairwise interaction process (3), the pseudolikelihood is
$$\mathrm{PL}(\theta; \mathbf{x}) = \left[ \prod_{i=1}^{n(\mathbf{x})} b_\theta(x_i) \right] \left[ \prod_{i \ne j} h_\theta(x_i, x_j) \right] \exp\left\{ -\int_W b_\theta(u) \prod_{i=1}^{n(\mathbf{x})} h_\theta(u, x_i) \, du \right\}; \qquad (8)$$
the intractable normalising constant $\alpha(\theta)$ appearing in the likelihood (3) has been replaced by an exponential integral in (8) as if the process were Poisson. We give other examples in sections 5–6 below.
For processes with `weak interaction' in the sense that $\lambda_\theta(u; \mathbf{x})$ can be approximated well by a function of $u$ only, the process is approximately Poisson and the pseudolikelihood is an approximation to the likelihood. Hence the maximum pseudolikelihood estimator should be efficient if interaction is weak. Folklore holds that it is inefficient for strong interactions.
2.3 Loglinear case
In this paper we focus on Gibbs point process models for which the conditional intensity is loglinear:
$$\lambda_\theta(u; \mathbf{x}) = \exp\{\theta^T S(u; \mathbf{x})\} \qquad (9)$$
where $S(u; \mathbf{x})$ is a vector of spatial covariates (possibly depending on $\mathbf{x}$) defined at each point $u$ in $W$. This holds in particular if the model is of the exponential family with canonical parameter $\theta$.
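Assume $\|S(u; \mathbf{x})\| \exp(\theta^T S(u; \mathbf{x}))$ is uniformly bounded in $u \in W$ and $\theta \in \Theta$, for each fixed $\mathbf{x}$. Then the maximum pseudolikelihood normal equations
$$\frac{\partial}{\partial \theta} \log \mathrm{PL}_A(\theta; \mathbf{x}) = 0$$
become
$$\sum_{x_i \in A} S(x_i; \mathbf{x}) = \int_A S(u; \mathbf{x}) \exp(\theta^T S(u; \mathbf{x})) \, du. \qquad (10)$$
Numerical solution of (10) usually requires iterative algorithms.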
It can easily be shown that (10) is an unbiased estimating equation, i.e. the expectations of the left and right sides of (10) under $\theta$ are equal. The proof is an application of a nonstationary form of the Nguyen-Zessin formula, viz.
$$\mathrm{E}\left[ \sum_{x_i \in X \cap A} h(x_i; X) \right] = \int_A \mathrm{E}[\lambda(u; X) \, h(u; X)] \, du \qquad (11)$$
holding for all nonnegative bounded measurable functions $h(u; \mathbf{x})$. This extends a result of Diggle et al [16] that under reasonable conditions, the normal equations in the stationary case are a special case of the Takacs-Fiksel estimating equations, itself an application of the method of moments [17, 18, 61].
Jensen and Møller [30] proved that for Gibbs point processes with exponential family likelihoods, the pseudolikelihood is log-concave and the maximum pseudolikelihood estimator is consistent as $W \nearrow \mathbb{R}^d$, under suitable conditions. Jensen and Künsch [29] proved the MPLE is asymptotically normal for stationary pairwise interaction processes, under suitable conditions (see (C1) and (C2) of [29]).
The parameter $\theta$ may be constrained to lie in a convex set $\Theta \subseteq \mathbb{R}^p$. In the loglinear case (9), the pseudolikelihood is log-concave so the maximum exists and occurs either at an interior point of $\Theta$ where the normal equations are satisfied, or on the boundary $\partial\Theta$ of the convex set $\Theta$.
3 Berman-Turner device for maximum pseudolikelihood
This section describes the computational device which we propose for computing approximate maximum pseudolikelihood estimates. The method is an adaptation of an earlier technique of Berman and Turner [7] for approximate maximum likelihood estimation for the inhomogeneous Poisson point process. Ideas related to [7] have been explored by Lindsey [36, 37, 38, 39].
3.1 Derivation
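Let $X$ be a Gibbs point process with conditional intensity $\lambda_\theta(u; \mathbf{x})$ and consider the pseudolikelihood (7) for $X$, taking $A = W$ for simplicity. Approximate the integral in (7) by a finite sum using any quadrature rule,
$$\int_W \lambda_\theta(u; \mathbf{x}) \, du \approx \sum_{j=1}^{m} \lambda_\theta(u_j; \mathbf{x}) \, w_j \qquad (12)$$
where $u_j$, $j = 1, \ldots, m$ are points in $W$ and $w_j > 0$ are quadrature weights. This yields an approximation to the log pseudolikelihood,
$$\log \mathrm{PL}(\theta; \mathbf{x}) \approx \sum_{i=1}^{n(\mathbf{x})} \log \lambda_\theta(x_i; \mathbf{x}) - \sum_{j=1}^{m} \lambda_\theta(u_j; \mathbf{x}) \, w_j. \qquad (13)$$
Extending an observation of Berman and Turner, we note that if the list of points $\{u_j, j = 1, \ldots, m\}$ includes all the data points $\{x_i, i = 1, \ldots, n\}$, then we can rewrite (13) as
$$\log \mathrm{PL}(\theta; \mathbf{x}) \approx \sum_{j=1}^{m} (y_j \log \lambda_j - \lambda_j) \, w_j \qquad (14)$$
where $\lambda_j = \lambda_\theta(u_j; \mathbf{x})$ and $y_j = z_j / w_j$, where
$$z_j = \begin{cases} 1 & \text{if } u_j \text{ is a data point, } u_j \in \{x_1, \ldots, x_n\} \\ 0 & \text{if } u_j \text{ is a dummy point, } u_j \notin \{x_1, \ldots, x_n\}. \end{cases} \qquad (15)$$
The right side of (14), for fixed $\mathbf{x}$, is formally equivalent to the log likelihood of independent Poisson variables $Y_k \sim \mathrm{Poisson}(\lambda_k)$ taken with weights $w_k$.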
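The step from (13) to (14) can be checked numerically. The sketch below, in Python with made-up values for the intensities, indicators and weights (all names and numbers are hypothetical), evaluates both right-hand sides and shows that they agree when the quadrature points include the data points.

```python
import math

def log_pl_direct(lam, z, w):
    """Right side of (13): sum of log lambda over data points minus the quadrature sum."""
    data_term = sum(math.log(lj) for lj, zj in zip(lam, z) if zj == 1)
    integral_term = sum(lj * wj for lj, wj in zip(lam, w))
    return data_term - integral_term

def log_pl_poisson_form(lam, z, w):
    """Right side of (14): weighted Poisson log likelihood with y_j = z_j / w_j."""
    total = 0.0
    for lj, zj, wj in zip(lam, z, w):
        yj = zj / wj
        total += (yj * math.log(lj) - lj) * wj
    return total

# Hypothetical quadrature scheme: two data points (z = 1), two dummy points (z = 0).
lam = [3.0, 1.5, 2.0, 0.7]
z = [1, 0, 1, 0]
w = [0.2, 0.3, 0.1, 0.4]
difference = log_pl_direct(lam, z, w) - log_pl_poisson_form(lam, z, w)
```

The agreement comes from $y_j w_j = z_j$: the weight cancels in the logarithmic term and survives only in the integral term.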
The expression (14) can therefore be maximised using standard software for fitting Generalised Linear Models [40], provided that (a) the software handles weighted likelihoods; (b) the software accepts noninteger values of the responses $y_j$ in Poisson loglinear regression and correctly maximises the loglikelihood expression; (c) the conditional intensity function $\lambda_\theta(\cdot\,; \mathbf{x})$, for fixed $\mathbf{x}$, is related to any explanatory variables by
$$g(\lambda_\theta(u; \mathbf{x})) = \theta^T S(u; \mathbf{x}) \qquad (16)$$
where $g$ is a link function implemented in the software, and $S(u; \mathbf{x})$ is a vector of spatial covariates (possibly depending on $\mathbf{x}$) defined at each point $u$ in $W$. Software packages satisfying these criteria include GLIM [1] and S-PLUS [5, 11, 64]. The only choice of $g$ in (16) which we shall consider is the log link, giving rise to the `loglinear model' (9).
The key reason for adopting this approach is that the use of standard statistical packages rather than ad hoc software confers great advantages in applications. Modern statistical packages have a convenient notation for statistical models [1, 11, 64] which makes it very easy to specify and fit a wide variety of models of the type (9). Algorithms in the package may allow one to fit very flexible model terms such as the smooth functions in a generalised additive model [27]. Interactive software allows great freedom to reanalyse the data. The fitting algorithms are typically more reliable and stable than in home-grown software.
3.2 Procedure
In summary, the procedure is as follows.
1. Generate a set of dummy points, and combine it with the data points $x_i$ to form the set of quadrature points $u_j$;
2. Compute the quadrature weights $w_j$;
3. Form the indicators $z_j$ as in (15) and calculate $y_j = z_j / w_j$;
4. Compute the (possibly vector) values $v_j = S(u_j; \mathbf{x})$ of the sufficient statistic at each quadrature point;
5. Invoke the model-fitting software, specifying that the model is a loglinear Poisson regression
$$\log \lambda_j = \theta^T v_j \qquad (17)$$
to be fitted to the responses $y_j$ and covariate values $v_j$, with weights $w_j$.
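To make step 5 concrete without relying on any particular package, here is a minimal Python sketch of a weighted Poisson loglinear fit with an intercept and one covariate. It uses Newton's method with step halving, which is a stand-in chosen for this illustration and not the algorithm used by GLIM or S-PLUS; the function name and the toy quadrature values are hypothetical.

```python
import math

def fit_poisson_loglinear(y, v, w, max_iter=100, tol=1e-9):
    """Maximise sum_j w_j * (y_j * log(lam_j) - lam_j), the right side of (14),
    with log(lam_j) = theta1 + theta2 * v_j, by Newton's method with step halving."""
    def objective(t1, t2):
        return sum(wj * (yj * (t1 + t2 * vj) - math.exp(t1 + t2 * vj))
                   for yj, vj, wj in zip(y, v, w))

    # Start from the null model lam_j = sum(w * y) / sum(w).
    t1 = math.log(sum(wj * yj for yj, wj in zip(y, w)) / sum(w))
    t2 = 0.0
    for _ in range(max_iter):
        lam = [math.exp(t1 + t2 * vj) for vj in v]
        # Score vector (gradient of the objective).
        g1 = sum(wj * (yj - lj) for yj, lj, wj in zip(y, lam, w))
        g2 = sum(wj * vj * (yj - lj) for yj, vj, lj, wj in zip(y, v, lam, w))
        if abs(g1) + abs(g2) < tol:
            break
        # 2 x 2 information matrix (negative Hessian).
        h11 = sum(wj * lj for lj, wj in zip(lam, w))
        h12 = sum(wj * vj * lj for vj, lj, wj in zip(v, lam, w))
        h22 = sum(wj * vj * vj * lj for vj, lj, wj in zip(v, lam, w))
        det = h11 * h22 - h12 * h12
        d1 = (h22 * g1 - h12 * g2) / det
        d2 = (h11 * g2 - h12 * g1) / det
        # Halve the step while the objective would decrease noticeably.
        step, current = 1.0, objective(t1, t2)
        while (objective(t1 + step * d1, t2 + step * d2) < current - 1e-12
               and step > 1e-8):
            step /= 2.0
        t1, t2 = t1 + step * d1, t2 + step * d2
    return t1, t2

# Toy quadrature scheme: three data points (z = 1) and five dummy points (z = 0).
z = [1, 1, 1, 0, 0, 0, 0, 0]
v = [1, 0, 0, 2, 1, 1, 0, 0]                      # covariate values v_j
w = [0.1, 0.1, 0.1, 0.1, 0.15, 0.15, 0.15, 0.15]  # quadrature weights
y = [zj / wj for zj, wj in zip(z, w)]             # responses y_j = z_j / w_j
theta1, theta2 = fit_poisson_loglinear(y, v, w)
```

In practice one would hand `y`, `v` and `w` to an established GLM routine, as the text recommends; the sketch only shows that nothing beyond a weighted Poisson regression is involved.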
The coefficient estimates returned by the software give the (approximate) MPLE $\hat\theta$ of $\theta$. The estimates of standard errors are not applicable, since they assume i.i.d. Poisson observations. The software also typically returns the deviance $D$ of the fitted model; this is related to the log pseudolikelihood of the fitted model by
$$-\log \mathrm{PL}(\hat\theta; \mathbf{x}) = \frac{D}{2} + \sum_{i=1}^{n(\mathbf{x})} \log w_i + n(\mathbf{x}); \qquad (18)$$
note the sum is over data points only.
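Relation (18) can be verified numerically. The Python sketch below assumes the usual definition of the weighted Poisson deviance, $D = 2 \sum_j w_j [y_j \log(y_j / \lambda_j) - (y_j - \lambda_j)]$ with $0 \log 0 = 0$; that formula, the function names and the sample values are assumptions of this illustration.

```python
import math

def weighted_poisson_deviance(y, lam, w):
    """Weighted Poisson deviance, taking y * log(y / lam) = 0 when y = 0."""
    total = 0.0
    for yj, lj, wj in zip(y, lam, w):
        log_term = yj * math.log(yj / lj) if yj > 0 else 0.0
        total += wj * (log_term - (yj - lj))
    return 2.0 * total

def log_pl_from_deviance(deviance, z, w):
    """Equation (18) solved for log PL; the log-weight sum runs over data points only."""
    n = sum(z)
    sum_log_w = sum(math.log(wj) for zj, wj in zip(z, w) if zj == 1)
    return -(deviance / 2.0 + sum_log_w + n)

# Hypothetical fitted intensities at four quadrature points.
lam = [3.0, 1.5, 2.0, 0.7]
z = [1, 0, 1, 0]
w = [0.2, 0.3, 0.1, 0.4]
y = [zj / wj for zj, wj in zip(z, w)]
log_pl = log_pl_from_deviance(weighted_poisson_deviance(y, lam, w), z, w)
```

The value of `log_pl` matches the direct evaluation of (13) for the same `lam`, `z` and `w`, so the identity holds for any fitted values, not only at the maximum.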
Conveniently, the null model $\lambda_j \equiv \lambda$ in the loglinear Poisson regression corresponds to the uniform Poisson point process with intensity $\lambda$. The MPLE is $\hat\lambda = n(\mathbf{x}) / \sum_j w_j = n(\mathbf{x}) / |W|$ with corresponding log pseudolikelihood
$$\log \mathrm{PL}(\hat\lambda; \mathbf{x}) = n(\mathbf{x}) \, [\log n(\mathbf{x}) - \log |W| - 1].$$
Note that this formulation assumes $\lambda_\theta(u; \mathbf{x})$ is positive everywhere. Zero values are also permissible, provided the set of zeroes does not depend on $\theta$. Thus we formally allow negative infinite values for $S(u; \mathbf{x})$. In the approximation (14) all points $u_j$ with $\lambda_\theta(u_j; \mathbf{x}) = 0$ will be dummy points, otherwise the pseudolikelihood and likelihood will be identically zero. Dummy points contribute only to the integral part of the log pseudolikelihood, hence their contribution is zero. Hence we can simply omit any points $u_j$ with $\lambda_\theta(u_j; \mathbf{x}) = 0$ from the sum (14). In the fitting algorithm, such points should be omitted in all contexts.
3.3 Quadrature schemes and their accuracy
Figure 1: Quadrature using the Dirichlet tessellation [7]. Left: Illustrative example of a point pattern dataset in the unit square $W$. Right: The Dirichlet tessellation of $W$ based on the data points together with a $5 \times 5$ grid of dummy points. Data points are marked by filled dots. The quadrature weight $w_j$ is the area of the Dirichlet tile.
Berman and Turner [7] used the Dirichlet tessellation or Voronoi diagram [48] to generate quadrature weights for the analogue of (12). The data points are augmented by a list of dummy points, then the Dirichlet tessellation of the combined set of points is computed as sketched in Figure 1. The quadrature weight $w_j$ associated with a (data or dummy) point $u_j$ is the area of the corresponding Dirichlet tile.
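A computationally cheaper scheme is to partition $W$ into tiles $T_k$ of equal area, and in each tile place exactly one dummy point, either systematically or randomly. Ascribe to each dummy or data point $u_j$ a weight $w_j$ where $w_j^{-1}$ is the number of (dummy or data) points in the same tile as $u_j$, taking the common tile area as the unit of area (in other units, $w_j$ is the tile area divided by this number, so that the weights sum to $|W|$). We call these the counting weights.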
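A minimal Python sketch of the counting weights follows, assuming a rectangular window divided into an `nx` by `ny` grid of equal tiles with one dummy point at each tile centre. Each weight is the tile area divided by the number of points in the tile, which is an assumption about units made here so that the weights sum to $|W|$. The function name and the example pattern are hypothetical.

```python
def counting_weights(points, nx, ny, width=1.0, height=1.0):
    """Counting weights on the rectangle [0, width] x [0, height]:
    tile area divided by the number of (data or dummy) points in the tile."""
    tile_area = (width / nx) * (height / ny)

    def tile_of(p):
        # Clamp so that points on the upper boundary fall in the last tile.
        col = min(int(p[0] / width * nx), nx - 1)
        row = min(int(p[1] / height * ny), ny - 1)
        return (col, row)

    counts = {}
    for p in points:
        counts[tile_of(p)] = counts.get(tile_of(p), 0) + 1
    return [tile_area / counts[tile_of(p)] for p in points]

# Three data points plus one dummy point at the centre of each of 4 x 4 tiles.
data = [(0.10, 0.20), (0.15, 0.22), (0.70, 0.60)]
dummy = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
weights = counting_weights(data + dummy, 4, 4)
```

Here the first two data points share a tile with one dummy point, so each of those three points receives one third of the tile area.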
Note that the conditional intensity $\lambda_\theta(u; \mathbf{x})$ is typically a discontinuous function of $u$ at the data points $x_i$, while generically the limit as $u \to x_i$ exists. Thus the approximation (12) involves a `discontinuity error' of size
$$\sum_{i=1}^{n(\mathbf{x})} \left[ \lambda_\theta(x_i; \mathbf{x}) - \lim_{u \to x_i} \lambda_\theta(u; \mathbf{x}) \right] w_i \qquad (19)$$
(a sum of contributions from data points only) in addition to the `quadrature error' associated with the finite approximation to the integral. The discontinuity error is controlled by reducing $\sum_i w_i$, the total quadrature weight of the contributions from the data points, usually by increasing the number $m - n$ of dummy points. See further comments at the end of section 4.
4 Example: Strauss process
Next we illustrate the method as it applies to the simple Strauss process model [60, 33]. This is a pairwise interaction process (3) in which $b(u) \equiv \beta$ is constant and $h(u, v) = \gamma$ if $\|u - v\| \le r$, and $h(u, v) = 1$ otherwise. Here $\beta > 0$ and $0 \le \gamma \le 1$ are parameters and $r > 0$ is a fixed `interaction distance'. Thus each pair of points closer than $r$ units apart contributes a penalty of $\gamma$ to the likelihood,
$$\mathrm{lik}(\beta, \gamma; \mathbf{x}) = \alpha \, \beta^{n(\mathbf{x})} \gamma^{s(\mathbf{x})} \qquad (20)$$
(taking $0^0 = 1$) where $\alpha = \alpha(\beta, \gamma)$ is the normalising constant, and
$$s(\mathbf{x}) = \#\{(i, j) : i < j, \ \|x_i - x_j\| \le r\}$$
is the number of unordered pairs of points which lie closer than $r$ units apart. The Strauss process is well-defined for all $\gamma \in [0, 1]$. If $\gamma = 1$, it reduces to the homogeneous Poisson process with intensity $\beta$. For $\gamma = 0$ it is a hard core process in which no two points ever lie closer than $r$ units apart. For $0 < \gamma < 1$ there is inhibition between close pairs of points. The conditional intensity is
$$\lambda_{\beta,\gamma}(u; \mathbf{x}) = \beta \, \gamma^{t(u; \mathbf{x})} \qquad (21)$$
where
$$t(u; \mathbf{x}) = \#\{x_i \in \mathbf{x} : 0 < \|x_i - u\| \le r\} \qquad (22)$$
is the number of points $x_i \in \mathbf{x}$ which are close to $u$, other than $u$ itself. See Figure 2.
Figure 2: The conditional intensity of the Strauss process at a point $u$ depends on the number of existing points (+) of the configuration $\mathbf{x}$ which are closer than $r$ units distant from $u$. In this illustration $t(u; \mathbf{x}) = 2$.
The pseudolikelihood is
$$\mathrm{PL}(\beta, \gamma; \mathbf{x}) = \beta^{n(\mathbf{x})} \gamma^{2 s(\mathbf{x})} \exp\left( -\beta \int_W \gamma^{t(u; \mathbf{x})} \, du \right) \qquad (23)$$
which is in the required loglinear form (9) with $\theta = (\ln \beta, \ln \gamma)^T$ and $S(u; \mathbf{x}) = (1, t(u; \mathbf{x}))^T$. The exponent $2 s(\mathbf{x})$ arises because $\sum_i t(x_i; \mathbf{x}) = 2 s(\mathbf{x})$: each close pair is counted once from each of its two points. The MPLE normal equations (10) are
$$n(\mathbf{x}) = \beta \int_W \gamma^{t(u; \mathbf{x})} \, du \qquad (24)$$
$$2 s(\mathbf{x}) \int_W \gamma^{t(u; \mathbf{x})} \, du = n(\mathbf{x}) \int_W t(u; \mathbf{x}) \, \gamma^{t(u; \mathbf{x})} \, du. \qquad (25)$$
The maximum of the pseudolikelihood may occur either at a solution of these equations or at $\gamma = 0, 1$. If $r$ is less than the minimum interpoint distance, then $s(\mathbf{x}) = 0$ and the pseudolikelihood is maximised when $\hat\gamma = 0$. Otherwise $\hat\gamma > 0$. If the solution of (24)–(25) occurs where $\gamma > 1$, then since the log pseudolikelihood is concave, $\hat\gamma = 1$. The maximised log pseudolikelihood is
$$\log \mathrm{PL}(\hat\beta, \hat\gamma; \mathbf{x}) = n(\mathbf{x}) \log \hat\beta + 2 s(\mathbf{x}) \log \hat\gamma - \hat\beta \, p(\hat\gamma), \qquad (26)$$
with $p$ as defined in (27) below. Consistency and asymptotic normality of the MPLE follow from [29, 30].
To compute the approximate MPLE using the Berman-Turner device we would follow the procedure in section 3.2, fitting the loglinear model
$$\log \lambda_j = \theta_1 + \theta_2 v_j$$
where $v_j = t(u_j; \mathbf{x})$, with $t(u; \mathbf{x})$ as defined in (22). A suitable S-PLUS invocation would be
glm(y ~ v, family = poisson(link = log), weights = w)
where y, v, w are S-PLUS vectors of equal length containing the responses $y_j$, the `explanatory variable' values $v_j$, and the weights $w_j$ respectively, for each quadrature point $u_j$. If glm() yields a solution $\hat\theta_2 > 0$, i.e. $\hat\gamma > 1$, then the MPLE is $\hat\gamma = 1$, $\hat\beta = n(\mathbf{x}) / |W|$.
Note the integrals in (23)–(25) are polynomials in $\gamma$:
$$p(\gamma) = \int_W \gamma^{t(u; \mathbf{x})} \, du \qquad (27)$$
$$= a_0 + a_1 \gamma + \ldots + a_K \gamma^K \qquad (28)$$
say, where $a_k = |A_k|$ is the area of the region $A_k = \{u \in W : t(u; \mathbf{x}) = k\}$. Thus (24)–(25) can be rewritten
$$\beta \, p(\gamma) = n(\mathbf{x}) \qquad (29)$$
$$\frac{\gamma \, p'(\gamma)}{p(\gamma)} = \frac{2 s(\mathbf{x})}{n(\mathbf{x})}. \qquad (30)$$
In this simple case, the MPLE can be computed by solving (29)–(30) directly, although this still requires evaluation of the coefficients $a_k$, which calls for numerical integration or computational geometry. We shall use this "polynomial" approach to check the accuracy of our method in section 9.
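The statistics $t(u; \mathbf{x})$ of (22) and $s(\mathbf{x})$ are simple to compute by brute force. The Python sketch below uses hypothetical function names and a made-up four-point pattern; it also illustrates why the exponent in (23) is $2 s(\mathbf{x})$, since summing $t(x_i; \mathbf{x})$ over the data points counts every close pair twice.

```python
import math

def strauss_t(u, points, r):
    """Equation (22): number of x_i with 0 < ||x_i - u|| <= r."""
    return sum(1 for xi in points if 0.0 < math.dist(xi, u) <= r)

def strauss_s(points, r):
    """Number of unordered pairs of points at most r apart."""
    n = len(points)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if math.dist(points[i], points[j]) <= r)

def strauss_conditional_intensity(u, points, beta, gamma, r):
    """Equation (21): beta * gamma ** t(u; x); Python evaluates 0.0 ** 0 as 1.0,
    matching the convention 0^0 = 1 used in the text."""
    return beta * gamma ** strauss_t(u, points, r)

# Hypothetical pattern: three mutually close points and one isolated point.
pattern = [(0.10, 0.10), (0.15, 0.10), (0.12, 0.16), (0.80, 0.80)]
r = 0.1
s = strauss_s(pattern, r)
t_sum = sum(strauss_t(xi, pattern, r) for xi in pattern)
lam_far = strauss_conditional_intensity((0.5, 0.5), pattern, 2.0, 0.0, r)
```

The values `strauss_t(u_j, pattern, r)` at all quadrature points $u_j$ are exactly the covariate vector `v` passed to the regression.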
The quadrature approximation (12) consists in replacing $p(\gamma)$ by
$$q(\gamma) = \sum_{j=1}^{m} \gamma^{t(u_j; \mathbf{x})} w_j = \sum_{k=0}^{K} b_k \gamma^k, \qquad (31)$$
where $b_k = \sum_{u_j \in A_k} w_j$ are approximations to the areas of the sets $A_k$. The approximation includes discontinuity error (19) arising because the weight $w_i$ for a data point $x_i$ with $t(x_i; \mathbf{x}) = k$ but $\lim_{u \to x_i} t(u; \mathbf{x}) = k + 1$ is ascribed to $b_k$ rather than to $b_{k+1}$.
The total error in approximating $p(\gamma)$ by $q(\gamma)$ is bounded by $E_1 = \sum_{k=1}^{K} |a_k - b_k|$, the sum of the errors in approximating the area $a_k$ by $b_k$. The error in approximating $p'(\gamma)$ by $q'(\gamma)$ is bounded by $E_2 = \sum_{k=1}^{K} k \, |a_k - b_k|$. To control both $E_1$ and $E_2$, dummy points must be sufficiently dense throughout $W$ and sufficiently dense where $t(u; \mathbf{x})$ is high, that is, near the data points.
5 Spatial interaction terms
Sections 5–7 present further examples of point processes, and examine the computational requirements for applying our method. The present section concerns point processes with various kinds of interpoint interaction (pairwise interaction and other). Inhomogeneous models are discussed in §6 and marked point processes in §7.
5.1 Pairwise interaction models
5.1.1 General loglinear form
Consider first the general pairwise interaction process (3) and assume
$$b_\theta(u) = \exp\{\theta^T B(u)\} \qquad (32)$$
$$h_\theta(u, v) = \exp\{\theta^T H(u, v)\} \qquad (33)$$
where $B(u)$ and $H(u, v)$ are vectors defined for every $u, v \in W$. Note $H(u, v)$ should be a symmetric function of $u$ and $v$. The conditional intensity (6) becomes
$$\lambda_\theta(u; \mathbf{x}) = \exp\left\{ \theta^T B(u) + \theta^T \sum_{\substack{i=1 \\ x_i \ne u}}^{n(\mathbf{x})} H(u, x_i) \right\}. \qquad (34)$$
This is of the loglinear form (9) required for our approximation, with
$$S(u; \mathbf{x}) = B(u) + \sum_{\substack{i=1 \\ x_i \ne u}}^{n(\mathbf{x})} H(u, x_i) \qquad (35)$$
and the procedure of section 3.2 may be applied. The log pseudolikelihood is concave in $\theta$ so the MPLE values form a nonempty convex set. Consistency of the MPLE is not guaranteed in this generality.
In the rest of this section we assume $B(u)$ is constant; models for spatial inhomogeneity are discussed in section 6. Here it is important to note that the general form (32) assumed for $b$ embraces not only parametric models but also Generalised Additive Models [27] where $B(u)$ would be a vector of spline basis functions. However, this apparently does not extend to GAM type models for $h$, since the sufficient statistic (35) is a sum of a variable number of terms, which is beyond the scope of current GAM fitting algorithms. Hence we are currently forced to consider only parametric models for interpoint interaction, such as the Strauss process.
5.1.2 Soft core process
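The `soft core' model discussed by Ogata and Tanemura [44] is a pairwise interaction process (3) with $b(u) \equiv \beta$ and
$$h(u, v) = \exp\left\{ -\left( \frac{\sigma}{\|u - v\|} \right)^{2/\kappa} \right\}, \quad u \ne v,$$
where $\beta > 0$ and $0 \le \sigma < \infty$ are parameters and $0 < \kappa < 1$ is a nuisance parameter which we assume known for the moment. The limit as $\kappa \to 0$ is the hard core process (Strauss with $\gamma = 0$, $r = \sigma$); the density is not integrable for $\kappa \ge 1$.
Thus $\lambda_\theta(u; \mathbf{x}) = \exp\{\theta^T S(u; \mathbf{x})\}$ where $S(u; \mathbf{x}) = (1, V(u; \mathbf{x}))^T$ and $\theta = (\log \beta, \sigma^{2/\kappa})^T$ with
$$V(u; \mathbf{x}) = -\sum_{\substack{i=1 \\ x_i \ne u}}^{n} \|u - x_i\|^{-2/\kappa}. \qquad (36)$$
The log pseudolikelihood is concave in $\theta$ and the MPLE is well defined, consistent and asymptotically normal, by [29].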
The conditional intensity is loglinear in $\theta$. To estimate $\beta$ and $\sigma$ (given a value of $\kappa$) one would execute the S-PLUS command
glm(y ~ v, family=poisson, weights=w)
where y, v, w are S-PLUS vectors containing the responses $y_j = z_j / w_j$, explanatory variable $v_j = V(u_j; \mathbf{x})$, and weights $w_j$ respectively. Then $\hat\beta = \exp(\hat\theta_1)$ and $\hat\sigma = \hat\theta_2^{\kappa/2}$ where $\hat\theta_1$ and $\hat\theta_2$ are the estimates of the linear coefficients returned by the glm() function of S-PLUS.
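For illustration, the regression covariate of (36) and the back-transformation of the fitted coefficients can be sketched in Python as follows. The function names and the example values are hypothetical, and the sketch assumes the fitted second coefficient is positive so that the fractional power is defined.

```python
import math

def soft_core_covariate(u, points, kappa):
    """Equation (36): V(u; x) = -sum over x_i != u of ||u - x_i|| ** (-2 / kappa)."""
    return -sum(math.dist(u, xi) ** (-2.0 / kappa) for xi in points if xi != u)

def back_transform(theta1, theta2, kappa):
    """Recover (beta, sigma) from the linear coefficients:
    beta = exp(theta1), sigma = theta2 ** (kappa / 2). Assumes theta2 > 0."""
    return math.exp(theta1), theta2 ** (kappa / 2.0)

# Hypothetical two-point pattern with kappa = 0.5, so the exponent is -4.
pattern = [(0.0, 0.0), (0.5, 0.0)]
v_data = soft_core_covariate(pattern[0], pattern, 0.5)     # only the other point counts
v_dummy = soft_core_covariate((0.25, 0.0), pattern, 0.5)   # both points count
beta_hat, sigma_hat = back_transform(0.0, 0.0016, 0.5)
```

Because $V$ grows very quickly near data points, dummy points placed extremely close to a data point can dominate the fit; this is a practical consideration of the sketch, not a statement from the paper.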
5.1.3 Step function interaction
In the absence of nonparametric estimators of $h$, there is much interest [17, 18, 49, 61] in fitting a stationary pairwise interaction process with a piecewise constant interaction function $h$. Thus $b(u) \equiv \beta$ and $h(u, v)$ is a step function of $\|u - v\|$, say $\log h(u, v) = \theta_\ell$ if $r_{\ell-1} < \|u - v\| \le r_\ell$, and $\log h(u, v) = 0$ if $\|u - v\| > r_k$, where $0 = r_0 < r_1 < r_2 < \ldots < r_k$ are parameters. This is a special case of (32)–(33) with $\theta = (\log \beta, \theta_1, \theta_2, \ldots, \theta_k)^T$ say, and
$$B(u) = (1, 0, 0, \ldots, 0) \qquad (37)$$
$$H(u, v) = (0, I_1(\|u - v\|), \ldots, I_k(\|u - v\|)) \qquad (38)$$
where $I_\ell(d) = \mathbf{1}\{r_{\ell-1} < d \le r_\ell\}$ for $\ell = 1, 2, \ldots, k$. Thus
$$S(u; \mathbf{x}) = (1, t_1(u; \mathbf{x}), \ldots, t_k(u; \mathbf{x}))^T$$
where for $\ell = 1, 2, \ldots, k$
$$t_\ell(u; \mathbf{x}) = \#\{x_i \in \mathbf{x} : r_{\ell-1} < \|x_i - u\| \le r_\ell\}$$
is the number of points $x_i \in \mathbf{x}$ whose distance from $u$ lies in the interval $(r_{\ell-1}, r_\ell]$. The MPLE is consistent by [30, Theorem 3.1] if either $\theta_\ell \le 0$ for all $\ell = 1, \ldots, k$, or the $\theta_\ell$ are uniformly bounded from above and $\theta_1 = -\infty$ (the process has a hard core).
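The counts $t_\ell(u; \mathbf{x})$ can be computed in one pass over the pattern. In the Python sketch below, the function name, the list `breaks` holding $r_1 < \ldots < r_k$, and the example pattern are hypothetical.

```python
import math

def step_statistics(u, points, breaks):
    """Return [t_1, ..., t_k], where t_l counts the x_i with
    r_{l-1} < ||x_i - u|| <= r_l, breaks = [r_1, ..., r_k] and r_0 = 0."""
    counts = [0] * len(breaks)
    for xi in points:
        d = math.dist(xi, u)
        lower = 0.0
        for l, upper in enumerate(breaks):
            if lower < d <= upper:
                counts[l] += 1
                break
            lower = upper
    return counts

# Hypothetical pattern; the last point coincides with u and is not counted.
pattern = [(0.1, 0.0), (0.25, 0.0), (0.9, 0.0), (0.0, 0.0)]
stats = step_statistics((0.0, 0.0), pattern, [0.15, 0.3])
```

Each entry of `stats` supplies one column of the design matrix; with a single break $r_1 = r$ the statistic reduces to the Strauss count $t(u; \mathbf{x})$.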
In our approach it is easy to fit this model, analogously to the Strauss process. The associated loglinear model is
$$\log \lambda_j = \log \beta + \theta_1 v_{1j} + \ldots + \theta_k v_{kj}$$
where $v_{\ell j} = t_\ell(u_j; \mathbf{x})$.
5.2 Area interaction process
The Widom-Rowlinson `penetrable sphere model' of liquid-vapour equilibrium [25, 55, 65], also known as the `area interaction process' [3], has probability density (in the simplest case)
$$p(\mathbf{x}) = \alpha \, \beta^{n(\mathbf{x})} \gamma^{-A(\mathbf{x})} \qquad (39)$$
where $A(\mathbf{x})$ is the area of the union of discs of radius $r$ centred at the points $x_i$. Here $\beta, \gamma, r > 0$ are parameters and $\alpha = \alpha(\beta, \gamma, r)$ is the normalising constant. Generalisations are given in [3]. The process is well defined, i.e. (39) is integrable, for all values of $\gamma > 0$ and for all compact $W \subset \mathbb{R}^2$. It reduces to a Poisson process when $\gamma = 1$, exhibits ordered patterns for $0 < \gamma < 1$ and produces clustering when $\gamma > 1$. Other properties and maximum likelihood estimation are investigated in [57].
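The conditional intensity is of the desired form $\log \lambda_\theta(u; \mathbf{x}) = \theta^T S(u; \mathbf{x})$ putting $\theta = (\log \beta, -\log \gamma)^T$ and $S(u; \mathbf{x}) = (1, A(\mathbf{x} \cup \{u\}) - A(\mathbf{x}))^T$; the minus sign follows from the exponent $-A(\mathbf{x})$ in (39). Results in [30] imply that if $r$ is known, the MPLE of $(\beta, \gamma)$ is consistent. The authors do not know whether a central limit theorem is available; the results of [29] do not apply.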
Another advantage of the Berman-Turner approach here is the reduction in the computational cost since the values of $D(u; \mathbf{x}) = A(\mathbf{x} \cup \{u\}) - A(\mathbf{x})$ are only required for a small number of points $u$.
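One simple way to obtain $D(u; \mathbf{x})$ at a quadrature point is a fine grid approximation: $D$ is the area of the part of the disc around $u$ that is not already covered by discs around the existing points. The Python sketch below is an assumption-laden illustration (hypothetical function name, pixel-counting in place of exact geometry, and no clipping of discs to the window $W$), not the computation used in the paper.

```python
import math

def added_area(u, points, r, cells=200):
    """Grid approximation of D(u; x) = A(x U {u}) - A(x): the area of the disc
    of radius r around u that is not covered by discs of radius r around points.
    Discs are not clipped to the window W in this sketch."""
    step = 2.0 * r / cells
    uncovered = 0
    for i in range(cells):
        for j in range(cells):
            # Centre of a small square cell in the bounding box of the disc.
            p = (u[0] - r + (i + 0.5) * step, u[1] - r + (j + 0.5) * step)
            if math.dist(p, u) <= r and all(math.dist(p, xi) > r for xi in points):
                uncovered += 1
    return uncovered * step * step

r = 0.1
d_empty = added_area((0.5, 0.5), [], r)              # close to pi * r**2
d_partial = added_area((0.5, 0.5), [(0.6, 0.5)], r)  # overlapping neighbour
d_same = added_area((0.5, 0.5), [(0.5, 0.5)], r)     # disc already covered
```

Only the points of $\mathbf{x}$ within distance $2r$ of $u$ can affect the result, so in larger patterns one would restrict `points` to those neighbours before calling the function.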
5.3 Ord's process
Ord [51, discussion] suggested a model for regular patterns of points representing entities which compete for resources, such as trees or towns. The Dirichlet tile associated with a point can be interpreted as the "territory" from which it draws resources. Ord suggested densities of the form
$$f(\mathbf{x}; \theta) \propto \prod_{i=1}^{n} g(A(x_i; \mathbf{x})) \qquad (40)$$
where $A(x_i; \mathbf{x})$ is the area of the Dirichlet tile associated with $x_i$ in the pattern $\mathbf{x}$, and $g : \mathbb{R} \to [0, \infty)$ is a function combining the roles of the spatial interaction and intensity terms in other models. The special case $g(v) \equiv \beta$ is the uniform Poisson process with intensity $\beta$. Typically $g(\cdot)$ would be an increasing function, so that small tiles are penalised.
Ripley [52, p. 175] concludes his analysis of the Swedish pines data (section 9.1) with a comment that fitting Ord's process would be an interesting alternative analysis. To our knowledge, this has not been attempted and Ord's model has not been investigated or mentioned further, except in [4].
The process (40) exists (i.e. $f$ is integrable) under reasonable conditions, for example, whenever $g(\cdot)$ is uniformly bounded. The conditional intensity is
$$\lambda(u; \mathbf{x}) = g(A(u; \mathbf{x} \cup \{u\})) \prod_{x_i \sim u} \frac{g(A(x_i; \mathbf{x} \cup \{u\}))}{g(A(x_i; \mathbf{x}))} \qquad (41)$$
where the product is over all points $x_i$ that are Dirichlet neighbours of $u$ in the pattern $\mathbf{x} \cup \{u\}$, and $A(u; \mathbf{x} \cup \{u\})$ is the area of the Dirichlet tile with centre $u$ in this pattern. Explicit analytic expressions for $A(u; \mathbf{x} \cup \{u\})$, the pseudolikelihood, or the MPLE are not available. Geometric computation of $A(u; \mathbf{x})$ is time-consuming so that a discrete approximation to the pseudolikelihood becomes a necessity.
The Berman-Turner device (§3) can be applied if the kernel is modelled in loglinear form $g(v) = \exp\{\theta^T G(v)\}$. Then $\log \lambda_\theta(u; \mathbf{x}) = \theta^T V(u; \mathbf{x})$ where
$$V(u; \mathbf{x}) = G(A(u; \mathbf{x} \cup \{u\})) + \sum_{x_i \sim u} \left[ G(A(x_i; \mathbf{x} \cup \{u\})) - G(A(x_i; \mathbf{x})) \right]$$
is the regression variable. Evaluating $v_j = V(u_j; \mathbf{x})$ for all $j$ requires computation of $m + 1$ different Dirichlet tessellations.
6 Inhomogeneous models
Few writers to date, apart from Ogata and Tanemura [47], have fitted explicit models to point pattern data that incorporate both spatial inhomogeneity and interpoint interactions. In the context of our method, it is easy to introduce a spatial trend or dependence on spatial covariates. This is simply a matter of adding more terms to the linear predictor $\theta^T S(u; \mathbf{x})$ in the associated Poisson loglinear regression model.
6.1 Spatial trend
A straightforward model of spatial trend in a pairwise interaction process (3) is [44, 45]
$$b_\theta(u) = \exp\{\theta^T B(u)\} \qquad (42)$$
$$h_\theta(u, v) = h_\theta(u - v) = \exp\{\theta^T H(u - v)\} \qquad (43)$$
so that the spatial trend is expressed by the dependence of $b_\theta(u)$ on location $u$, while the interpoint interaction does not exhibit trend. Typically $H$ would be one of the pairwise interaction functions considered in sections 4–5, while $B(u) = (B_1(u), \ldots, B_k(u))^T$ would be a vector of convenient scalar functions of location, such as polynomials or orthonormal functions of the coordinates. It is also possible to use the GAM approach [27] to model each $B_\ell(u)$ by a smooth function of one coordinate. Assuming (42)–(43) we have
$$\log \lambda_\theta(u; \mathbf{x}) = \theta^T B(u) + \theta^T \sum_{\substack{i=1 \\ x_i \ne u}}^{n(\mathbf{x})} H(u - x_i);$$
this is of the form (9) so we may easily fit it using the method of section 3.2, indeed simply by adding the term $\theta^T B(u)$ to the linear predictor in one of the models discussed in previous subsections.
Ogata & Tanemura [44] developed maximum likelihood estimation techniques for models of this form, in particular combining a spatial trend with the soft core interaction of Section 5.1.2. The trend term $\theta^T B(u)$ was a polynomial in the Cartesian coordinates. Details are given in Section 9.3.
More generally, the spatial interaction could also depend on location. For example a spatially-varying scale could be introduced by replacing $H(u - v)$ in (43) by $H(r(u) \, r(v) \, (u - v))$ or similar expressions that are symmetric in $u, v$. The main difficulty is that loglinearity is usually lost, and we cannot apply the method of section 3 directly, unless all the components of $\theta$ influencing $r(u)$ are treated as nuisance parameters.
An eective alternative way to t models with spatially-varying interac-tion range is proposed by Nielsen and Vedel Jensen [42].
6.2 Spatial covariates
The data may include spatial covariates such as topographic elevation, soil pH, or another observed spatial pattern. Covariates might be introduced into the analysis in order to regress intensity on explanatory variables, eliminate spurious trend, or make inferences conditional on another spatial pattern.
For our purposes the spatial covariate must be incorporated as a `spatial variable', i.e. a function $Z(u)$ defined at each location $u \in W$ and observed at each of the quadrature points $u_j$ (thus, including the data points). Dependence on the covariate can then be modelled by adding terms in $Z(u_j)$ to the linear predictor.
The covariate value $Z(u)$ might be simply the observed value of a variable such as pH or elevation, but often the covariate observations will be transformed to yield $Z(u)$. For example in spatial epidemiology $Z(u)$ could be a kernel smoothed estimate of the density of the population at risk, derived from point pattern or regional count data [14].
Another observed spatial pattern, such as a pattern of points or line segments or a digital image, can be included as a spatial covariate by computing a suitable function $Z(u)$ associated with the pattern. Berman [6] proposed modelling the dependence of a point process on a line segment process by conditioning on the line segment process and testing whether the point process is inhomogeneous Poisson with intensity $\lambda(u)$ depending on the minimum distance $Z(u)$ from location $u$ to the nearest line segment. More generally the association between the point process of interest $X$ and another spatial process $Y$ can be investigated conditionally on $Y = \mathbf{y}$ by constructing a Gibbs model for the joint distribution of $X$ and $Y$, forming the conditional intensity of the conditional distribution of $X$ given $Y$, say
$$\lambda_{X \mid Y;\theta}(u;\mathbf{x} \mid \mathbf{y}) = \exp\{\theta^{T} S(u;\mathbf{x} \mid \mathbf{y})\},$$
and forming the associated pseudolikelihood, which depends on $Z(u) = S(u;\mathbf{x} \mid \mathbf{y})$. Särkkä and Högmander [58] study hierarchical Gibbs models for the association between two or more point processes.
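As an illustration of a covariate derived from another spatial pattern, the sketch below computes the distance $Z(u)$ from a location to the nearest of a set of line segments, the covariate used in Berman's proposal. The function names and the representation of a segment as a pair of endpoints are assumptions made for this example only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def dist_to_segment(u: Point, seg: Segment) -> float:
    """Euclidean distance from u to the closed line segment seg."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:  # degenerate segment: a single point
        return math.hypot(u[0] - ax, u[1] - ay)
    # Position of the orthogonal projection along the segment, clamped to [0, 1].
    t = ((u[0] - ax) * dx + (u[1] - ay) * dy) / len2
    t = max(0.0, min(1.0, t))
    return math.hypot(u[0] - (ax + t * dx), u[1] - (ay + t * dy))


def nearest_segment_distance(u: Point, segments: Sequence[Segment]) -> float:
    """Covariate Z(u): distance from u to the nearest segment."""
    return min(dist_to_segment(u, s) for s in segments)


segments = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 2.0), (0.0, 3.0))]
quadrature_points = [(0.5, 1.0), (2.0, 0.0), (0.0, 1.5)]
z = [nearest_segment_distance(u, segments) for u in quadrature_points]
```

Evaluating this function at every quadrature point $u_j$ gives the column $Z(u_j)$ that would be added to the linear predictor.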
7 Marked point patterns
7.1 General
The observed points may also carry `marks', i.e. observations $m_i$ associated with each point $x_i$ of the pattern. The full dataset is a list
$$\mathbf{v} = \{(x_1,m_1),\ldots,(x_n,m_n)\} \qquad (44)$$
with $x_i \in W$ and $m_i \in \mathcal{M}$, where $\mathcal{M}$ is the space of possible marks. The marks may be observations of any kind; commonly $\mathcal{M}$ is either a discrete set of `labels' $\mathcal{M} = \{1,2,\ldots,c\}$ or the positive real line $\mathcal{M} = [0,\infty)$. In the discrete case, the data points are effectively classified into $c$ different types or colours, and the mark attached to each point indicates its type. In the continuous case $m_i$ is usually a physical measurement such as the height or diameter of a tree whose location is $x_i$. See [15, chap. 6,7], [13, §8.6–8.7], [4, 19, 59]. Jensen and Møller [30] formally treat the pseudolikelihood of a marked point process and prove consistency of the MPLE; Goulard et al. [24] investigate further statistical properties. See also [58].
The reference process for likelihoods is the Poisson marked point process constructed by attaching i.i.d. random marks to the points of a Poisson point process on $W$ with unit intensity [34]. The distribution of the marks in this reference process is an arbitrary probability distribution $Q$ on $\mathcal{M}$.
The inhomogeneous Poisson marked point process with intensity function $b : W \times \mathcal{M} \to \mathbb{R}_{+}$ is the analogue of (2), with density
$$f(\mathbf{v};\theta) = \alpha \prod_{i=1}^{n(\mathbf{x})} b(x_i,m_i) \qquad (45)$$
where $\alpha = \alpha(\theta) = \exp\{-\int_{W \times \mathcal{M}} (b(u,m) - 1)\, dQ(m)\, du\}$ is the normalising constant.
The pairwise interaction marked point process is the analogue of (3), with density
$$f(\mathbf{v};\theta) = \alpha \left[ \prod_{i=1}^{n(\mathbf{x})} b(x_i,m_i) \right] \left[ \prod_{i<j} h((x_i,m_i),(x_j,m_j)) \right] \qquad (46)$$
where $\alpha = \alpha(\theta)$ is the normalising constant, $b : W \times \mathcal{M} \to \mathbb{R}_{+}$ is the activity/trend function and $h : (W \times \mathcal{M})^{2} \to \mathbb{R}_{+}$ the interaction function. The function $h$ is symmetric in the sense that $h((u,m),(u',m')) = h((u',m'),(u,m))$ for $u,u' \in W$ and $m,m' \in \mathcal{M}$. Conditions must be imposed on $b, h$ to ensure the density is integrable.
7.2 Pseudolikelihood
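The conditional intensity of a Gibbs marked point process, analogously to (4)–(5), is a function $\lambda_\theta((u,m);\mathbf{v})$ of the marked pattern $\mathbf{v}$ and of a marked point $(u,m)$ with $u \in W$, $m \in \mathcal{M}$. For example the pairwise interaction marked point process (46) has
$$\lambda_\theta((u,m);\mathbf{v}) = b(u,m) \prod_{\substack{i=1 \\ (u,m) \neq (x_i,m_i)}}^{n(\mathbf{x})} h((u,m),(x_i,m_i)). \qquad (47)$$
The pseudolikelihood of a Gibbs marked point process is [24, 30]
$$PL(\theta;\mathbf{v}) = \left[ \prod_{i=1}^{n(\mathbf{x})} \lambda_\theta((x_i,m_i);\mathbf{v}) \right] \exp\left\{ -\int_{W} \int_{\mathcal{M}} \lambda_\theta((u,m);\mathbf{v})\, dQ(m)\, du \right\}. \qquad (48)$$
In the case of a multitype point process with $c$ different types, we have $\mathcal{M} = \{1,2,\ldots,c\}$ and the pseudolikelihood is usually defined by
$$PL(\theta;\mathbf{v}) = \left[ \prod_{i=1}^{n(\mathbf{x})} \lambda_\theta((x_i,m_i);\mathbf{v}) \right] \exp\left\{ -\sum_{m=1}^{c} \int_{W} \lambda_\theta((u,m);\mathbf{v})\, du \right\}. \qquad (49)$$
7.3 Berman-Turner device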
To apply our approximation method to (48) we create a set of marked points $(u_j,k_j)$, $j = 1,\ldots,M$ which include the data $(x_i,m_i)$, $i = 1,\ldots,n$ and form a good quadrature rule for $W \times \mathcal{M}$. It would usually be convenient to take the Cartesian product of a set of quadrature points in $W$ and a set of elements of $\mathcal{M}$. We shall assume this and write the marked points as $(u_j,k_\ell)$ for $j = 1,\ldots,J$ and $\ell = 1,\ldots,L$ where $u_j \in W$, $k_\ell \in \mathcal{M}$. Then define the indicator $z_{j\ell}$ to equal 1 if $(u_j,k_\ell)$ is a data point and 0 if it is a dummy point. Let $w_{j\ell}$ be the corresponding weights for a linear quadrature rule in $W \times \mathcal{M}$. Then the pseudolikelihood is approximated by
$$\log PL(\theta;\mathbf{v}) \approx \sum_{\ell=1}^{L} \sum_{j=1}^{J} (y_{j\ell} \log \lambda_{j\ell} - \lambda_{j\ell})\, w_{j\ell} \qquad (50)$$
where $\lambda_{j\ell} = \lambda_\theta((u_j,k_\ell);\mathbf{v})$ and $y_{j\ell} = z_{j\ell}/w_{j\ell}$. For discrete marks as in (49), the weights may simply be those for a quadrature rule in $W$ corresponding to the points $u_j$. This is illustrated in Figure 3.
Figure 3: Quadrature of marked point pattern using the Dirichlet tessellation. Left: Illustrative example of a marked point pattern with two types, 1 (triangles) and 2 (boxes) in the unit square $W$. Middle and Right: The Dirichlet tessellation of $W \times \{1,2\}$ based on the data and a $5 \times 5$ grid of dummy points duplicated for the two types. Middle: the plane $W \times \{1\}$ of type 1 points; Right: the plane $W \times \{2\}$ of type 2 points.
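The construction of Section 7.3 for discrete marks can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the analyses: the names `marked_quadrature` and `approx_log_pl` are hypothetical, the quadrature weights are supplied by the caller (they are not computed from a Dirichlet tessellation here), and every data location is assumed to appear among the quadrature points.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def marked_quadrature(points: Sequence[Point], weights: Sequence[float],
                      marks: Sequence[int],
                      data: Sequence[Tuple[Point, int]]) -> List[Dict]:
    """Cartesian product of quadrature points and marks, with the indicator z,
    weight w and response y = z / w for each marked point."""
    data_set = set(data)
    rows = []
    for mark in marks:
        for u, w in zip(points, weights):
            z = 1 if (u, mark) in data_set else 0
            rows.append({"u": u, "mark": mark, "z": z, "w": w, "y": z / w})
    return rows


def approx_log_pl(rows: Sequence[Dict],
                  cond_intensity: Callable[[Point, int], float]) -> float:
    """Approximate log pseudolikelihood: sum of (y log(lambda) - lambda) * w."""
    total = 0.0
    for row in rows:
        lam = cond_intensity(row["u"], row["mark"])
        if row["z"] == 1:  # dummy points have y = 0 and contribute no log term
            total += row["y"] * math.log(lam) * row["w"]
        total -= lam * row["w"]
    return total


# Two data points of different types plus one dummy location (weights are made up).
data = [((0.2, 0.3), 1), ((0.7, 0.8), 2)]
points = [(0.2, 0.3), (0.7, 0.8), (0.5, 0.5)]
weights = [0.3, 0.3, 0.4]
rows = marked_quadrature(points, weights, [1, 2], data)
log_pl_unit = approx_log_pl(rows, lambda u, m: 1.0)
```

Each location is duplicated once per mark, so a data point of type 1 also appears as a dummy point of type 2 at the same location, as in Figure 3.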
7.4 Example: 2-type Strauss process
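This is the special case of the pairwise interaction marked point process (46) in which $\mathcal{M} = \{1,2\}$, i.e. points belong to one of two types, and
$$b(u,m) = \beta_m, \qquad h((u,m),(u',m')) = \begin{cases} \gamma_{m,m'} & \text{if } 0 < \|u-u'\| \le r_{m,m'} \\ 1 & \text{otherwise} \end{cases}$$
where $\beta_1, \beta_2 > 0$ are intensity parameters, $\gamma_{11}, \gamma_{22}, \gamma_{12}, \gamma_{21} \in [0,1]$ are interaction parameters, and $r_{11}, r_{22}, r_{12}, r_{21} > 0$ are interaction distances, with $\gamma_{12} = \gamma_{21}$ and $r_{12} = r_{21}$. The density may be expressed as
$$f(\mathbf{v};\theta) = \alpha\, \beta_1^{n_1(\mathbf{v})} \beta_2^{n_2(\mathbf{v})} \gamma_{11}^{s_{11}(\mathbf{v})} \gamma_{12}^{s_{12}(\mathbf{v})} \gamma_{22}^{s_{22}(\mathbf{v})}$$
analogously to (20), where $n_1(\mathbf{v}), n_2(\mathbf{v})$ are the numbers of points of type 1 and 2 respectively, and $s_{m,m'}(\mathbf{v})$ is the number of pairs of distinct marked points of types $m$ and $m'$ respectively within a distance $r_{m,m'}$ of each other.
The conditional intensity is
$$\lambda_\theta((u,m);\mathbf{v}) = \beta_m\, \gamma_{m1}^{t_1((u,m);\mathbf{v})}\, \gamma_{m2}^{t_2((u,m);\mathbf{v})}$$
for $u \in W$, $m \in \mathcal{M}$, where $t_{m'}((u,m);\mathbf{v}) = \#\{i : 0 < \|u-x_i\| \le r_{m,m'},\ m_i = m'\}$ is the number of type $m'$ points within the required distance $r_{m,m'}$ of a point $u$ with type $m$.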
This model may be cast in the loglinear form (9) with parameter vector $\theta = (\log \beta_1, \log \beta_2, \log \gamma_{11}, \log \gamma_{12}, \log \gamma_{22})$ and five ``explanatory variables'', $I_1(m)$, $I_2(m)$, $I_1(m)\,t_1((u,m);\mathbf{v})$, $I_1(m)\,t_2((u,m);\mathbf{v}) + I_2(m)\,t_1((u,m);\mathbf{v})$ and $I_2(m)\,t_2((u,m);\mathbf{v})$ respectively, where $I_k(m) = 1\{m = k\}$. Equivalently it may be described as a nested model with one factor and two covariates, one of which is nested within the factor.
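The five explanatory variables of the 2-type Strauss model can be computed directly from their definitions. The following Python sketch is illustrative only: the function names and the dictionary `radii`, keyed by the unordered pair of types, are assumptions introduced for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
MarkedPoint = Tuple[Point, int]


def t_count(u: Point, m: int, target: int, pattern: Sequence[MarkedPoint],
            radii: Dict[Tuple[int, int], float]) -> int:
    """Number of data points of type `target` within distance r_{m,target} of u,
    excluding any data point at distance exactly 0."""
    r = radii[(min(m, target), max(m, target))]  # r_12 = r_21 by symmetry
    return sum(
        1 for (x, mark) in pattern
        if mark == target and 0.0 < math.hypot(u[0] - x[0], u[1] - x[1]) <= r
    )


def strauss2_covariates(u: Point, m: int, pattern: Sequence[MarkedPoint],
                        radii: Dict[Tuple[int, int], float]) -> List[int]:
    """Covariates matching (log b1, log b2, log g11, log g12, log g22)."""
    i1, i2 = int(m == 1), int(m == 2)
    t1 = t_count(u, m, 1, pattern, radii)
    t2 = t_count(u, m, 2, pattern, radii)
    return [i1, i2, i1 * t1, i1 * t2 + i2 * t1, i2 * t2]


pattern = [((0.0, 0.0), 1), ((1.0, 0.0), 1), ((0.0, 1.0), 2)]
radii = {(1, 1): 1.5, (1, 2): 0.5, (2, 2): 1.0}
row_a = strauss2_covariates((0.5, 0.0), 1, pattern, radii)
row_b = strauss2_covariates((0.0, 0.8), 2, pattern, radii)
```

One such row would be produced for each marked quadrature point $(u_j, k_\ell)$ and passed to the loglinear fitting routine together with the responses and weights.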
The pseudolikelihood estimate of $\theta$ is consistent [30]. The central limit theorem of Jensen and Künsch [29] does not strictly apply here since there are more than two parameters, but these authors conjecture (p. 477) that a generalisation does hold.
8 Estimation and inference issues
8.1 Edge effects
For inferential purposes it matters whether we assume the data $\mathbf{x}$ are a realisation of a finite point process $X$ defined only inside the domain $W$ (`bounded case') or a partially observed realisation $\mathbf{x} = \mathbf{y} \cap W$ of a point process $Y$ extending throughout $\mathbb{R}^d$ (`unbounded case'). In the unbounded case we have an `edge effect' problem: the conditional intensity $\lambda_\theta(u;\mathbf{y})$ of $Y$ may not be observable from the data $\mathbf{x} = \mathbf{y} \cap W$, since the required information may involve points lying outside the observation window $W$.
One can simply ignore edge effects and apply the pseudolikelihood to $\mathbf{x}$ as if it were $\mathbf{y}$. This corresponds to treating an observation of the `unbounded' model as if it were drawn from the `bounded' model. This causes bias in the estimates of the interaction and intensity parameters. Remedies for edge effects are surveyed in [2, 53, 59]. Following are some possible strategies.
8.1.1 Periodic boundary conditions
If the window $W$ is rectangular one may apply ``periodic boundary conditions'' [51] by identifying opposite sides of $W$ so that e.g. points near the right hand edge have neighbours near the left edge. This typically reduces bias but inflates variance, and is only applicable to certain shapes of $W$. It seems inadvisable for inhomogeneous patterns.
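In computational terms, periodic boundary conditions on a rectangle amount to replacing the Euclidean distance by the distance on the torus. A minimal Python sketch follows; the function name `torus_distance` is an assumption for this example.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def torus_distance(u: Point, v: Point, width: float, height: float) -> float:
    """Distance between u and v when opposite sides of a width x height
    rectangle are identified."""
    dx = abs(u[0] - v[0])
    dy = abs(u[1] - v[1])
    # Along each axis, take the shorter of the direct and wrapped separations.
    dx = min(dx, width - dx)
    dy = min(dy, height - dy)
    return math.hypot(dx, dy)


# Points near the left and right edges of a 10 x 10 window become neighbours.
d = torus_distance((0.5, 5.0), (9.5, 5.0), 10.0, 10.0)
```

Substituting this distance wherever interpoint distances enter the sufficient statistics gives the periodic edge correction for any of the pairwise interaction models above.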
8.1.2 Border method
This applies [53] to any process with finite interaction range $r$, in the sense that $\lambda_\theta(u;\mathbf{x})$ depends only on data points $x_i$ lying within a distance $r$ of $u$. An example is the Strauss process with fixed $r$. Form the pseudolikelihood over the subregion
$$W_{\ominus r} = \{u \in W : b(u,r) \subseteq W\}$$
of all points of $W$ lying at least $r$ units from the boundary. For $u \in W_{\ominus r}$ the conditional intensity is observable, $\lambda_\theta(u;\mathbf{y}) = \lambda_\theta(u;\mathbf{y} \cap W)$, so the pseudolikelihood over $W_{\ominus r}$ can be calculated from the data.
If $r$ is unknown, one must be wary of comparing pseudolikelihoods based on different subsets $W_{\ominus r}$. One strategy is to compute all pseudolikelihoods over the same domain $W_{\ominus R}$ where $R$ is the maximum $r$ value contemplated.
The border method applies to any model with finite interaction range, including nonstationary processes, and the associated MPLE satisfies an unbiased estimating equation. However, it discards appreciable amounts of data.
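For a rectangular window the eroded region is again a rectangle, so the border method reduces to a simple membership test on the quadrature points. The sketch below is illustrative; the function names and the `(xmin, xmax, ymin, ymax)` representation of the window are assumptions made for this example.

```python
from typing import Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, xmax, ymin, ymax)


def in_eroded_window(u: Point, window: Window, r: float) -> bool:
    """True if u lies at least r units from the boundary of the rectangle."""
    xmin, xmax, ymin, ymax = window
    return (xmin + r <= u[0] <= xmax - r) and (ymin + r <= u[1] <= ymax - r)


def retained_fraction(window: Window, r: float) -> float:
    """Fraction of the window area that survives erosion by distance r."""
    xmin, xmax, ymin, ymax = window
    width = max(0.0, (xmax - xmin) - 2.0 * r)
    height = max(0.0, (ymax - ymin) - 2.0 * r)
    return (width * height) / ((xmax - xmin) * (ymax - ymin))


# A 10 x 10 window eroded by r = 0.7 retains about 74% of its area.
window = (0.0, 10.0, 0.0, 10.0)
frac = retained_fraction(window, 0.7)
```

Only quadrature points passing `in_eroded_window` would contribute terms to the approximate pseudolikelihood, although all data points in $W$ are still used when evaluating the conditional intensity.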
8.1.3 Ripley's hybrid method
In estimation problems, an improvement on the border method is to use edge correction weights [2, 53, 59]. Ripley [53, p. 67], [64, p. 396] extended this to maximum pseudolikelihood estimation for the Strauss process. He proposed that the right side of (30), which cannot be observed due to edge effects, be estimated by $n(\mathbf{x})\hat{K}(r)/|W|$, a quantity which has approximately the same expectation. Here $\hat{K}(r)$ is the estimate of the value $K(r)$ of the second moment function $K$ of the process [53]
$$\hat{K}(r) = \frac{|W|}{n(\mathbf{x})^{2}} \sum_{i \neq i'} 1\{\|x_i - x_{i'}\| \le r\}\, e(x_i,x_{i'},W) \qquad (51)$$
where $e(u,v,W)$ is an edge effect correction factor ensuring unbiased estimation of $\lambda^{2} K(r)$ if the point process is stationary and isotropic. The left side of (30) is also subject to edge effects, and is modified by using the eroded domain $W_{\ominus r}$ instead of $W$ as the domain of integration in (27). This is a `hybrid' of the border method and the edge correction weights strategies.
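The estimator (51) is a weighted count of ordered pairs of points. A minimal Python sketch is given below; the function name `k_hat` is an assumption, and the default edge weight $e \equiv 1$ (no correction) is used only to keep the example self-contained, so a genuine edge correction factor would have to be supplied by the caller.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def k_hat(points: Sequence[Point], r: float, area: float,
          edge_weight: Callable[[Point, Point], float] = lambda u, v: 1.0) -> float:
    """Estimate K(r) as |W| / n^2 times the edge-weighted number of ordered
    pairs of distinct points no more than r apart."""
    n = len(points)
    total = 0.0
    for i, xi in enumerate(points):
        for j, xj in enumerate(points):
            if i != j and math.hypot(xi[0] - xj[0], xi[1] - xj[1]) <= r:
                total += edge_weight(xi, xj)
    return area * total / n ** 2


# Three points in a 10 x 10 window; only the first two are within 1.5 of each other.
k = k_hat([(1.0, 1.0), (2.0, 1.0), (8.0, 8.0)], r=1.5, area=100.0)
```

The double loop costs $O(n^2)$ operations, which is adequate for patterns of the size analysed in Section 9.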
8.1.4 Edge corrected pseudolikelihood
An alternative, which we believe is new, is to introduce edge correction weights into the pseudolikelihood itself. Consider a stationary pairwise interaction process with constant $b(u)$ and $h(u,v) = \exp\{\theta^{T} H(u-v)\}$. Suppose that we now modify the model, replacing the pairwise interaction function by
$$h_E(u,v) = \exp\{e(u,v,W)\, \theta^{T} H(u-v)\}$$
where $e(u,v,W)$ is an edge correction weight as in [2] which must be symmetric in $u$ and $v$. This modified model has conditional intensity $\lambda_E(u;\mathbf{x}) = \exp\{\theta^{T} S_E(u;\mathbf{x})\}$ where
$$S_E(u;\mathbf{x}) = \sum_{i=1}^{n(\mathbf{x})} e(u,x_i,W)\, H(u-x_i)$$
is a ``plug-in'' estimator of the unobservable potential $S(u;\mathbf{y})$ for the original model. Forming the pseudolikelihood for the modified model and deriving the normal equations, we obtain (10) with $S(u;\mathbf{x})$ replaced by $S_E(u;\mathbf{x})$ throughout. By the nonstationary Nguyen-Zessin formula (11), this is an unbiased estimating equation for the modified model. It is approximately unbiased for the original model, when $\|\theta\|$ is small. This model can be fitted by the Berman-Turner method.
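For a rectangular window, the plug-in statistic $S_E$ is easy to compute once an edge correction weight has been chosen. The Python sketch below uses the translation correction $e(u,v,W) = |W| / |W \cap (W + u - v)|$, which for an $a \times b$ rectangle equals $ab / ((a - |\Delta x|)(b - |\Delta y|))$, together with a Strauss-type term $H(u - x_i) = 1\{0 < \|u - x_i\| \le r\}$. Both choices, and all function names, are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def translation_weight(u: Point, v: Point, width: float, height: float) -> float:
    """Translation edge correction weight for a width x height rectangle."""
    dx = abs(u[0] - v[0])
    dy = abs(u[1] - v[1])
    if dx >= width or dy >= height:
        raise ValueError("the window and its translate do not overlap")
    return (width * height) / ((width - dx) * (height - dy))


def edge_corrected_strauss_statistic(u: Point, pattern: Sequence[Point], r: float,
                                     width: float, height: float) -> float:
    """S_E(u; x): edge-weighted count of data points within distance r of u."""
    total = 0.0
    for x in pattern:
        d = math.hypot(u[0] - x[0], u[1] - x[1])
        if 0.0 < d <= r:
            total += translation_weight(u, x, width, height)
    return total


pattern = [(1.0, 1.0), (1.5, 1.0), (9.0, 9.0)]
s_e = edge_corrected_strauss_statistic((1.2, 1.0), pattern, 0.7, 10.0, 10.0)
```

The weight depends only on $|u - v|$ componentwise and is therefore symmetric in $u$ and $v$, as the construction requires. In the fit, `s_e` would simply replace the uncorrected neighbour count as the covariate.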
8.1.5 Data augmentation
The unobserved points of $\mathbf{y}$ outside the window which affect the value of $\lambda_\theta(u;\mathbf{y})$ for $u \in W$ can alternatively be regarded as missing data. One approach is data augmentation [63, chap. 5] which has been applied to maximum likelihood inference for point processes by Geyer [21]. This can also be applied in our context.
8.2 Nuisance Parameters and Profile Pseudolikelihood
The point process models considered above all contain `nuisance' parameters which do not enter in the loglinear form (9) required for our method. A possible approach to estimation is by analogy with profile likelihood. Write $\theta = (\varphi, \psi)$ where $\psi$ are the nuisance parameters, so that we assume
$$\lambda_\theta(u;\mathbf{x}) = \exp\{\varphi^{T} S(u;\mathbf{x};\psi)\} \qquad (52)$$
instead of (9). For each fixed value of $\psi$ the model is loglinear in $\varphi$, so that we can apply our approximation method to maximise the pseudolikelihood over $\varphi$, yielding an MPLE $\hat{\varphi}(\psi)$ for fixed $\psi$. Computing the maximised pseudolikelihood as a function of $\psi$ yields the profile pseudolikelihood
$$PL((\hat{\varphi}(\psi),\psi);\mathbf{x}) = \max_{\varphi} PL((\varphi,\psi);\mathbf{x}). \qquad (53)$$
The global MPLE of $\theta$ is then obtained by maximising this profile pseudolikelihood over $\psi$. We examine this approach for the Strauss and soft core processes in Section 9.
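The profile approach can be organised as an outer search over the nuisance parameter wrapped around an inner loglinear fit. The Python sketch below shows only this outer structure, using a grid search; `profile_maximise` is a hypothetical name, and the inner function `toy_fit` is a stand-in with a known closed-form maximiser, not a pseudolikelihood fit. In practice the inner step would be the GLM fit of Section 3, returning the fitted regular parameters and the maximised log pseudolikelihood.

```python
from typing import Callable, List, Sequence, Tuple

# The inner fit maps a nuisance value psi to (phi_hat, maximised log PL).
InnerFit = Callable[[float], Tuple[float, float]]


def profile_maximise(psi_grid: Sequence[float], fit_for_fixed_psi: InnerFit):
    """Evaluate the profile log pseudolikelihood on a grid and return the
    maximising (psi, phi_hat, log_pl) together with the whole profile."""
    best = None
    profile: List[Tuple[float, float]] = []
    for psi in psi_grid:
        phi_hat, log_pl = fit_for_fixed_psi(psi)
        profile.append((psi, log_pl))
        if best is None or log_pl > best[2]:
            best = (psi, phi_hat, log_pl)
    return best, profile


def toy_fit(psi: float) -> Tuple[float, float]:
    """Stand-in objective -(phi - 2 psi)^2 - (psi - 0.7)^2, maximised over phi
    at phi_hat = 2 psi with value -(psi - 0.7)^2."""
    return 2.0 * psi, -(psi - 0.7) ** 2


grid = [round(0.1 * k, 1) for k in range(1, 16)]
best, profile = profile_maximise(grid, toy_fit)
```

A grid search is used rather than a derivative-based optimiser because the profile may be jagged in the nuisance parameter, as happens when the interaction is discontinuous in that parameter.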
8.3 Parametric inference and model choice
In order to draw inferences about the parameters of a Gibbs process fitted by maximum pseudolikelihood, and to choose models, we use the parametric bootstrap. We shall generally assume in practice that the MPLE $\hat{\theta}$ of the canonical parameter is approximately unbiased and approximately normal, although this has only been established in certain cases [29, 30].
To obtain confidence intervals for $\theta$, we generate a large number $M$ of simulated realisations from the fitted model $\theta = \hat{\theta}$, yielding $M$ simulated values $\hat{\theta}^{(1)},\ldots,\hat{\theta}^{(M)}$ from the distribution of the MPLE under the fitted model. We estimate the mean vector and covariance matrix of this distribution from the simulated values, then construct confidence intervals using location models based on the multivariate normal or the bootstrap distribution. Similarly for model choice we use the bootstrap distribution of the deviance between two (nested) models.
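Given the bootstrap replicates of a single parameter estimate, summary statistics and a quantile-based interval can be computed as in the following Python sketch. The function names are assumptions, and the linear-interpolation rule for empirical quantiles is one common convention among several; the text does not specify which was used.

```python
import math
import statistics
from typing import Sequence, Tuple


def empirical_quantile(sorted_values: Sequence[float], p: float) -> float:
    """Empirical quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def bootstrap_summary(replicates: Sequence[float],
                      level: float = 0.95) -> Tuple[float, float, Tuple[float, float]]:
    """Return the bootstrap mean, standard deviation and quantile interval."""
    s = sorted(replicates)
    tail = (1.0 - level) / 2.0
    interval = (empirical_quantile(s, tail), empirical_quantile(s, 1.0 - tail))
    return statistics.fmean(s), statistics.stdev(s), interval


# Artificial replicates 0, 1, ..., 100, chosen so the quantiles are easy to check.
mean, sd, interval = bootstrap_summary([float(k) for k in range(101)])
```

For a vector parameter the same summary would be applied to each component, with the covariance matrix estimated from the replicates in the usual way.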
9 Examples of applications
The analyses in this paper were performed using S-PLUS [5, 11, 64] using the Generalised Linear Model fitting function glm() and occasionally the Generalised Additive Model function gam(). Some analyses were repeated using GLIM [1] as a cross-check.
9.1 Swedish pines data
Figure 4: The Swedish pines data: locations of 71 pine saplings in a 10 × 10 metre square. Extracted by Ripley [52] from Strand [60]. Data obtained from the MASS library accompanying [64].
Figure 4 depicts the Swedish pines data of Strand [60] which give the locations of 71 pine saplings in a 10 × 10 metre square. Ripley's pioneering analysis [52, §8.6, pp. 172–175] plotted $L(t) = \sqrt{K(t)/\pi}$ and rejected the hypothesis of a homogeneous Poisson process at the 1% level by a Monte Carlo test based on $D = \sup_t |L(t) - t|$. Ripley then fitted a Strauss process manually, obtaining $r = 0.7$ metres and $\gamma = 0.20$. In the latest analysis, again by Ripley [64, p. 396], $\gamma$ was estimated to be 0.15 using maximum pseudolikelihood with Ripley's hybrid edge correction (§8.1.3).
We fitted a Strauss process to these data by maximum pseudolikelihood, using both the Berman-Turner device and the polynomial approach via (29)–(30). We estimated $\beta$ and $\gamma$, but initially held $r$ fixed at 0.7. For the Berman-Turner method, varying densities of dummy points were tried, and quadrature weights were computed using both the Dirichlet and counting methods (section 3.3). Estimates obtained for $\gamma$ ranged from 0.29 down to 0.20, and for $\beta$ from 1.49 up to 2.12. A finer quadrature scheme always led to a smaller value of $\gamma$ and a larger value of $\beta$. Both the Berman-Turner and polynomial methods gave $\gamma = 0.21$ using a 50 × 50 grid of dummy points. This is close to the value obtained by Ripley in [52]. The corresponding value of $\beta$ was 1.98 by the Berman-Turner method and 2.01 by the polynomial method.
Various edge corrections (section 8.1) were tried, all using a 50 × 50 grid of dummy points. Using the border method, eroding the window by a distance $r = 0.7$ metres, we obtained $\hat{\gamma} = 0.13$ and $\beta$ values of 3.24 (Berman-Turner) and 3.29 (polynomial). Periodic edge correction yielded $\hat{\beta} = 2.09$, $\hat{\gamma} = 0.24$ (Berman-Turner) and $\hat{\beta} = 2.24$, $\hat{\gamma} = 0.22$ (polynomial). Our proposed edge corrected pseudolikelihood method was also applied, using the translation correction [2, 53] as the edge correction factor $e(u,v,W)$. The parameter estimates were $\hat{\beta} = 1.97$, $\hat{\gamma} = 0.25$. The latter two edge corrections inflated the estimate of $\gamma$ while the border correction deflated it.
A plot of the profile log pseudolikelihood of the interaction distance $r$ is shown in Figure 5. The plot yields $\hat{r} = 0.7$, which agrees with [64, p. 396]. The jaggedness of the plot is due to the discontinuity of the interpoint interaction: $1\{\|u - x_i\| \le r\}$ and hence $s(\mathbf{x})$ are discontinuous functions of $r$, while the left sides of (29)–(30) are differentiable with respect to $r$. There seems little prospect of a convenient limit theory for $\hat{r}$.
Next we estimate the covariance matrix of the parameter estimates using the parametric bootstrap (section 8.3). To reduce the amount of computation we did not apply edge-correction and looked at only one set of quadrature weights (based on a 50 × 50 regular grid). However $r$ was estimated by profile pseudolikelihood. This version of the estimation algorithm was first applied to the data yielding $(\hat{\beta}, \hat{\gamma}, \hat{r}) = (1.9781, 0.2131, 0.7)$. A Metropolis-Hastings birth-death-shift algorithm [20] was used to generate 500 simulated realisations from the Strauss process with the same parameter values. The bootstrap covariance matrix, based on 500 parametric bootstrap replicates, was
$$\hat{C} = \begin{bmatrix} 0.1938 & -0.0155 & 0.0036 \\ -0.0155 & 0.0063 & 0.0008 \\ 0.0036 & 0.0008 & 0.0014 \end{bmatrix}$$
yielding corresponding (normal based) 95% confidence intervals of [1.1153, 2.8410], [0.0575, 0.3686], and [0.6267, 0.7733] for $\beta$, $\gamma$, and $r$ respectively.
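The normal-based intervals can be checked from the point estimates and the diagonal of the bootstrap covariance matrix, as estimate $\pm\, 1.96 \times$ standard deviation. The short Python sketch below does this; the use of the rounded multiplier 1.96 and the variable names are assumptions of the example.

```python
import math
from typing import List, Sequence, Tuple

Z_95 = 1.96  # rounded standard normal quantile for a two-sided 95% interval

estimates = [1.9781, 0.2131, 0.7]  # (beta, gamma, r) as reported above
cov = [
    [0.1938, -0.0155, 0.0036],
    [-0.0155, 0.0063, 0.0008],
    [0.0036, 0.0008, 0.0014],
]


def normal_intervals(est: Sequence[float], cov: Sequence[Sequence[float]],
                     z: float = Z_95) -> List[Tuple[float, float]]:
    """Marginal normal-based intervals from the covariance diagonal."""
    out = []
    for k, value in enumerate(est):
        half_width = z * math.sqrt(cov[k][k])
        out.append((value - half_width, value + half_width))
    return out


intervals = normal_intervals(estimates, cov)
```

The results agree with the quoted intervals to about three decimal places; the small remaining discrepancies are presumably due to rounding of the printed covariance entries.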
Normality of the estimates is suspect. Chi-squared tests for normality on the sequences of bootstrap replicates of estimates gave p-values of 0.02, 0.22, and 0 respectively for the normality of $\hat{\beta}$, $\hat{\gamma}$, and $\hat{r}$, so that $\hat{\gamma}$ is the only estimate which may legitimately be assumed normal. However, rough 95% confidence intervals based on the empirical quantiles of the bootstrap replicates were calculated as [1.29, 2.73], [0.09, 0.39], and [0.62, 0.81] respectively. These are in broad agreement with the normal-based intervals.
Figure 5: Profile log pseudolikelihood of the Strauss process nuisance parameter $r$ for the Swedish pines data (horizontal axis: interaction radius; vertical axis: log pseudolikelihood). Solid line: polynomial method and (26); dotted line: Berman-Turner device and (18). For comparison, the homogeneous Poisson process achieves a maximum log pseudolikelihood of $-92.4$ (see §3.2).
The confidence interval for $\gamma$ easily captures Ripley's (edge-corrected) value of 0.15. However the $\beta$ interval does not embrace the corresponding $\beta$-value of 3.11. Thus $\beta$ appears to be more sensitive to the estimation methodology than does $\gamma$.
9.2 Swedish pines data: Ord's model
Following remarks of Ord [51, discussion] and Ripley [52, p. 175] we attempted to fit Ord's model (section 5.3) to the Swedish pines data. For simplicity we took a ``Strauss-type'' kernel
$$g(v) = \begin{cases} \beta\gamma & \text{if } v \le v_0 \\ \beta & \text{if } v > v_0 \end{cases} \qquad (54)$$
where $\beta, \gamma > 0$ are parameters to be estimated and the threshold $v_0 > 0$ is a nuisance parameter.
Figure 6: Profile log pseudolikelihood for the threshold parameter $v_0$ of the Strauss-type kernel (54) in the Ord model for the Swedish pines data (horizontal axis: threshold; vertical axis: profile log PL). The peak is at $v_0 = 1.10$ square metres. Dashed line shows log pseudolikelihood for the homogeneous Poisson model.
Figure 6 shows the profile log pseudolikelihood of $v_0$ for the Swedish pines data. Jaggedness of the plot may again be explained by discontinuity of the kernel. There is a sharp peak at $v_0 = 1.10$ square metres. Adopting this value as the threshold, the parameter estimates for the Swedish pines data are $\hat{\beta} = 1.70$ and $\hat{\gamma} = 0.43$. Figure 7 shows the result of fitting the same model (with $v_0$ fixed at 1.10) to 100 simulations of a binomial process, i.e. 71 independent uniformly distributed points in the same region. It indicates very strong dependence between $\hat{\beta}$ and $\hat{\gamma}$ for the binomial process. The plot confirms that the Swedish pines data appear to be strongly ordered.
Figure 7: Estimated parameters $\log \beta$ and $\log \gamma$ for the Strauss-type Ord kernel (54) for the Swedish pines data ($\Box$) and for 100 simulations of a binomial process with the same number of points (+). Horizontal axis: log beta; vertical axis: log gamma. Dotted lines indicate $\beta = \hat{\lambda}$ (the estimated intensity) and $\gamma = 1$ (corresponding to a Poisson process).
9.3 Japanese black pines data
Figure 8 depicts the Japanese black pines data of Numata [43] giving the locations of 204 seedlings in a 10 × 10 metre square. Ogata & Tanemura [47] used approximate maximum likelihood estimation to fit a soft core model with log-polynomial trend (i.e. where $B(u)$ in (42) is a polynomial in the Cartesian coordinates), choosing a cubic polynomial as giving the optimal fit.
In our analysis we first fitted a soft core model with log-cubic trend. The homogeneous soft core model has already been discussed in section 5.1.2. Adding the polynomial trend to the model is trivial using the Berman-Turner device; it is simply a matter of adding polynomial terms in the Cartesian coordinates to the linear predictor in the associated loglinear model. The estimation of the nuisance parameter $\kappa$ is problematic, so we initially set $\kappa = 0.5$ arbitrarily. The fitted trend surface is shown in Figure 9. Its contours are similar to those obtained by Ogata and Tanemura in [47]. Edge corrections had little effect on the fit, suggesting that edge effects are negligible.
Figure 8: Map of a natural stand of seedlings and saplings of Japanese black pine (Pinus Thunbergii), 204 seedlings and saplings in a sampling rectangle 10 m × 10 m. Source: Numata (1964). Data kindly supplied by Professors Y. Ogata and M. Tanemura.
We also find it helpful to plot the fitted conditional intensity function $\hat{\lambda}(\cdot;\mathbf{x})$ as shown in Figure 10. This is not a substitute for plotting the trend surface, since the conditional intensity depends on the realised pattern $\mathbf{x}$. Its usefulness lies in visualising the effect of the fitted interaction model on the underlying trend, the relative magnitudes and ranges of the trend and interaction terms, and the tradeoff between these two (when comparing different models). The plot also helps in checking discretisation effects.
Other interaction terms and trend terms can be fitted at little extra cost using the Berman-Turner device, in contrast to the extra effort required for maximum likelihood or simulation-based approaches. It is of interest to compare the foregoing fit with that obtained using a Strauss model for the interaction (along with a cubic polynomial spatial trend). In obtaining the Strauss fit we estimated the interaction radius $r$ by maximizing the profile pseudolikelihood, as well as estimating the trend coefficients and $\gamma$.
As noted in §2.3, the model parameters may be subject to constraints. The Strauss parameter $\gamma$ must satisfy $0 \le \gamma \le 1$. In this analysis we had to impose the constraint explicitly, i.e. for some values of $r$ an estimate $\hat{\gamma} > 1$ was obtained, whereupon we set $\hat{\gamma} = 1$, and adjusted the trend coefficients and the pseudolikelihood accordingly.
Figure 9: Fitted log-cubic trend surface $\exp\{\hat{\theta}_1^{T} B(u)\}$ for the Japanese black pines data with soft core interaction model. Top: perspective plot; bottom: contour plot.
A plot of the resulting profile log pseudolikelihood is shown in Figure 11 and yields $\hat{r} = 0.14$. This value is just less than the minimum interpoint distance for the Japanese black pines data set. That is, when a spatial trend is allowed for, the optimal Strauss model for the interaction is the hard core model.
Using $r = 0.14$ we fitted the inhomogeneous Strauss model. The fitted trend was visually identical to that obtained for the inhomogeneous soft core model, and is not shown. The fitted conditional intensity function is shown in Figure 12; this is essentially the trend surface ``with holes of radius 0.14 punched in it'' at each data point.
Although the trend surfaces are visually identical, one might ask for a more objective assessment of the difference between the two trends. To this end, we examined the differences between the corresponding polynomial coefficients. These appeared to be relatively small; the maximum percentage difference
$$\frac{|\mathrm{est}_1 - \mathrm{est}_2|}{(|\mathrm{est}_1| + |\mathrm{est}_2|)/2} \times 100\%$$
was about 7%.
Figure 10: Fitted conditional intensity function $\hat{\lambda}(\cdot;\mathbf{x})$ for the Japanese black pines data with soft core interaction model.
Yet it is not clear how to assess the magnitude of these differences. A rough idea might be given by dividing the differences by an estimate of the standard deviations of, say, the Strauss fits, obtained by bootstrapping. When this was done, the maximum absolute value of the resulting ratios was 0.0956 (corresponding to the $x^3$ coefficient). Intuitively this confirms the visual impression that there is no evidence of a difference between the two fitted trends.
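The two comparisons of fitted coefficients used above, the symmetric percentage difference and the difference scaled by a bootstrap standard deviation, are simple enough to state as code. In this Python sketch the function names and the numerical inputs are made up for illustration; they are not the fitted coefficients from the analysis.

```python
def percentage_difference(est1: float, est2: float) -> float:
    """Absolute difference relative to the mean absolute value, in percent."""
    return abs(est1 - est2) / ((abs(est1) + abs(est2)) / 2.0) * 100.0


def standardised_difference(est1: float, est2: float, sd: float) -> float:
    """Absolute difference divided by an estimated standard deviation."""
    return abs(est1 - est2) / sd


# Hypothetical coefficient values, chosen only to exercise the formulas.
pct = percentage_difference(1.00, 1.07)
ratio = standardised_difference(1.00, 1.07, sd=0.5)
```

The percentage difference is symmetric in its two arguments, so neither fit is treated as the reference.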
A trend was also fitted to the Japanese black pines data in the form of a general non-parametric smooth function, the possibility of which was mentioned in section 6.1. The smooth function was provided by the S-PLUS function lo(), and the fit was accomplished using the function gam() in place of glm(). Both Strauss and soft core models were used for the interaction. When the Strauss model was used, profile pseudolikelihood indicated a value of 0.14 for the $r$ parameter (i.e. a hard core model), the same as for the cubic polynomial trend.
Figure 11: Profile log pseudolikelihood of the Strauss interaction range $r$ for the Japanese black pines data; Strauss process, with log-cubic polynomial trend, fitted using the Berman-Turner device (horizontal axis: interaction radius $r$; vertical axis: profile log pseudolikelihood). Maximum occurs at $r = 0.14$.
For the Strauss (hard core) model the intensity function and the trend surfaces were visually indistinguishable from those obtained using the cubic polynomial trend. For the soft core model the interaction seemed much more ``subdued'' when the trend was modelled using lo(); the plot of the conditional intensity showed not much more than dimples in the trend surface. This finding reinforces the principle that the more freedom we allow for the trend, the closer the trend fits the actual data, so the less interaction is needed to explain the data. This effect is not noticeable with the hard core interaction which cannot adapt itself to the smooth trend surface. The lo() trend itself, with soft core interaction, was visually very similar to the cubic polynomial trend, but slightly lower.
Figure 12: Conditional intensity function for the Japanese black pines data with the interaction modelled as a Strauss process.
Next we attempted to estimate the soft core nuisance parameter $\kappa$. The approximate profile log pseudolikelihood can be calculated from the output of the GLM fitting algorithm, via (18). However plots of this quantity and of the parameter estimates suggested that small values of $\kappa$ lead to numerical instability. This persisted when different starting values and different statistical packages (S-PLUS, GLIM) were used. Note that the interaction potential (36) is unbounded, with infinities at the data points, and the approximate pseudolikelihood is not uniformly continuous in $\kappa$, even for fixed data and dummy points. Hence the quadrature schemes advocated in section 3.3 appear to be inadequate for the profile pseudolikelihood.
An alternative numerical integration procedure was then implemented using the midpoint rule and a fine array of integration points. Figure 13 shows the resulting approximate profile log pseudolikelihood. It suggests that the maximum occurs very close to $\kappa = 0$. The pseudolikelihood will in fact have an infinite maximum at a point where $\kappa = 0$ in certain circumstances.
Figure 13: Profile log pseudolikelihood of the soft core process nuisance parameter $\kappa$ for the Japanese black pines data (horizontal axis: parameter kappa; vertical axis: log pseudolikelihood). Calculated by the ``exact'' method.
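Let $d = \min_{i \neq i'} \|x_i - x_{i'}\|$ be the minimum inter-point distance and $b = \sup_{u \in W} \min_{i=1,\ldots,n(\mathbf{x})} \|u - x_i\|$ the maximum distance from a location in $W$ to the nearest data point. If $b \le d$ then the soft core pseudolikelihood of S has an infinite maximum at $\sigma = d$ and $\kappa = 0$.
To prove this, write the log pseudolikelihood
$$n \ln \beta - \sum_{i \neq j} \left( \frac{\sigma}{\|x_i - x_j\|} \right)^{2/\kappa} - \beta \int_{W} \exp\left\{ -\sum_{i=1}^{n} \left( \frac{\sigma}{\|x_i - u\|} \right)^{2/\kappa} \right\} du$$
in the form $n \ln \beta - S - \beta I$ where $S$ represents the finite sum and $I$ the integral. Maximising with respect to $\beta$ makes the profile log pseudolikelihood of $(\sigma, \kappa)$ equal, up to the additive constant $-n$, to
$$n \ln n - S - n \ln I.$$
Let $E_r = \{u \in W : \|u - x_i\| \ge r \text{ for all } i\}$. Observe that for $u \in E_\sigma$ the integrand of $I$ tends to 1 as $\kappa \to 0$ and for $u \in W \setminus E_\sigma$ the integrand tends to 0 so that $I$ tends to $|E_\sigma|$ as $\kappa \to 0$.
If $b \le d$, $|E_\sigma|$ can be made arbitrarily small by taking $\sigma$ less than but sufficiently close to $d$. Now if $\sigma < d$ then $S \to 0$ as $\kappa \to 0$. Thus, for such $\sigma$,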