Smoothing estimation of intensity function: Kernel estimation

EXPLORATORY DATA ANALYSIS

6.5 Smoothing estimation of intensity function

6.5.1 Kernel estimation

If the point process has an intensity function λ(u), this function can be estimated nonparametrically by kernel estimation.

Our favorite analogy is to imagine placing one square of chocolate on each data point. Using a hair dryer we apply heat to the chocolate so that it melts slightly. The result is an undulating surface of chocolate; the height of the surface represents the estimated intensity function of the point process. The total mass of chocolate is unchanged.

6.5.1.1 Kernel estimators

The usual kernel estimators of the intensity function [222, 106] are:

uncorrected:

$$\tilde\lambda^{(0)}(u) = \sum_{i=1}^{n} \kappa(u - x_i), \qquad (6.7)$$

uniformly corrected:

$$\tilde\lambda^{(U)}(u) = \frac{1}{e(u)} \sum_{i=1}^{n} \kappa(u - x_i), \qquad (6.8)$$

Diggle’s [222] correction:

$$\tilde\lambda^{(D)}(u) = \sum_{i=1}^{n} \frac{1}{e(x_i)} \, \kappa(u - x_i), \qquad (6.9)$$

for any spatial location u inside the window W, where κ(u) is the kernel function (the shape of one melted square of chocolate) and

$$e(u) = \int_W \kappa(u - v) \, dv \qquad (6.10)$$

is a correction for bias due to edge effects. Outside the window W, the estimated intensity is zero.

For a data point at location $x_i$, the function $f(u) = \kappa(u - x_i)$ represents the melted square of chocolate that was originally placed at $x_i$. The kernel κ must be a probability density, that is, $\kappa(u) \ge 0$ for all locations u, and $\int_{\mathbb{R}^2} \kappa(u) \, du = 1$. A common choice is the isotropic Gaussian (Normal distribution) probability density. The standard deviation of the kernel is the smoothing bandwidth: a larger bandwidth gives more smoothing. The choice of bandwidth involves a tradeoff between bias and variance: as bandwidth increases, typically the bias increases and variance decreases.

Figure 6.9 shows the different kernel estimates (6.7)–(6.9) for the Swedish Pines data using the same kernel. Note that the ‘raw’ or ‘uncorrected’ estimate (6.7) decreases close to the boundary of the observation window. This is an edge effect. Trees lying just outside the window were not observed, and did not contribute to the estimate (6.7), so that locations u closer to the boundary of the window receive fewer contributions in the sum (6.7). The raw estimate therefore has a strong negative bias at locations close to the boundary, due to edge effects. The raw estimate should be used only in those rare situations where there are no edge effects — for example, in mapping the density of an isolated stand of trees, where every tree in the stand is represented in the data.

Figure 6.9. Kernel estimates of intensity for Swedish Pines using different edge corrections. Left: raw estimate. Middle: uniformly corrected estimate. Right: Diggle’s correction estimate. Isotropic Gaussian kernel with standard deviation 1 metre. Intensity values are counts per square metre.

The uniform correction (6.8) and Diggle’s correction (6.9) are designed to compensate for the edge effect arising when a point process is observed inside a window. The uniformly corrected estimator (6.8) is unbiased when the true intensity is homogeneous. Diggle’s corrected estimator (6.9) has better performance overall (smaller mean square error) and is normalised so that the integral of $\tilde\lambda^{(D)}(u)$ over the window is exactly equal to the observed number of points.
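As an illustration only, the estimators (6.7)–(6.9) can be evaluated directly in base R for a small made-up pattern. This sketch does not use spatstat: the point coordinates, the 10 × 10 window, and the function names (kappa, edge, lambda_raw, lambda_unif, lambda_diggle) are assumptions chosen for the example. It relies on the fact that, for an isotropic Gaussian kernel on a rectangle, e(u) in (6.10) factorises into two normal probabilities.

```r
# Hypothetical toy data: five points in the window W = [0, a] x [0, b]
x <- c(1.0, 2.5, 5.0, 8.0, 9.5)
y <- c(9.0, 4.0, 5.0, 1.5, 9.5)
a <- 10; b <- 10
sigma <- 1   # bandwidth (standard deviation of the Gaussian kernel)

# Isotropic Gaussian kernel kappa, written in terms of the offsets (dx, dy)
kappa <- function(dx, dy) dnorm(dx, sd = sigma) * dnorm(dy, sd = sigma)

# Edge correction e(u) of (6.10): the kernel mass centred at u that lies in W
edge <- function(ux, uy) {
  (pnorm(a, ux, sigma) - pnorm(0, ux, sigma)) *
    (pnorm(b, uy, sigma) - pnorm(0, uy, sigma))
}

lambda_raw    <- function(ux, uy) sum(kappa(ux - x, uy - y))              # (6.7)
lambda_unif   <- function(ux, uy) lambda_raw(ux, uy) / edge(ux, uy)       # (6.8)
lambda_diggle <- function(ux, uy) sum(kappa(ux - x, uy - y) / edge(x, y)) # (6.9)

# Midpoint-rule integration over W on an n x n grid of cell centres
n <- 200
gx <- (seq_len(n) - 0.5) * a / n
gy <- (seq_len(n) - 0.5) * b / n
grid <- expand.grid(ux = gx, uy = gy)
cell <- (a / n) * (b / n)
total_raw    <- sum(mapply(lambda_raw,    grid$ux, grid$uy)) * cell
total_diggle <- sum(mapply(lambda_diggle, grid$ux, grid$uy)) * cell

# total_diggle should be close to the number of points; total_raw is smaller
c(raw = total_raw, diggle = total_diggle, points = length(x))
```

Under these assumptions the integral of the Diggle-corrected estimate should agree with the number of points up to numerical integration error, while the raw estimate loses the kernel mass that spills outside the window.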

Kernel estimators of the intensity function are slightly biased in general, because they smooth out details in the intensity function. To understand their statistical properties we can use Campbell’s formula. Suppose f (u) is a function of spatial location u, and consider the random sum

$$T = \sum_i f(x_i)$$

of the values of f at each of the points $x_i$ in a point process X. Campbell’s formula states that the expected value E[T] of the random sum T is

$$E[T] = E\Big[\sum_i f(x_i)\Big] = \int_{\mathbb{R}^2} f(u) \, \lambda(u) \, du \qquad (6.11)$$

where λ(u) is the intensity function of X. This can be justified by dividing space into pixels and considering the contribution to T from each pixel.

To apply Campbell’s formula to kernel estimation, suppose we fix a spatial location v, and let $f(u) = \kappa(v - u)$ if u is inside the window W, and $f(u) = 0$ if it is outside. Then

$$\sum_i f(x_i) = \sum_i \kappa(v - x_i) = \tilde\lambda^{(0)}(v)$$

where the sum is over all points $x_i$ in the window W. By Campbell’s formula (6.11),

$$E[\tilde\lambda^{(0)}(v)] = E\Big[\sum_i f(x_i)\Big] = \int_{\mathbb{R}^2} f(u) \, \lambda(u) \, du = \int_W \kappa(v - u) \, \lambda(u) \, du.$$

The expected value of the estimate $\tilde\lambda^{(0)}(v)$ is not equal to the true intensity value λ(v). Even if the true intensity is constant, say $\lambda(v) \equiv \beta$ for all locations v, we get

$$E[\tilde\lambda^{(0)}(v)] = \int_W \kappa(v - u) \, \beta \, du = \beta \int_W \kappa(v - u) \, du = \beta \, e(v)$$

where e(v) was defined in (6.10). This motivates the definition of the corrected estimator $\tilde\lambda^{(U)}(u)$, which is then unbiased at least when the intensity is constant: if $\lambda(v) \equiv \beta$, then $E[\tilde\lambda^{(U)}(v)] = \beta$.
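The bias calculation above can be checked by simulation. The following base R sketch (not spatstat code) simulates a homogeneous Poisson process in a rectangle and averages the raw estimate (6.7) at a location near the edge. The intensity value, window, evaluation location, and number of replicates are all assumptions made for the illustration.

```r
set.seed(1)                       # for reproducibility of this illustration
beta  <- 2                        # true constant intensity (points per unit area)
sigma <- 1                        # kernel bandwidth
a <- 10; b <- 10                  # window W = [0, a] x [0, b]
v <- c(0.5, 5)                    # evaluation location, close to the left edge

# One realisation: simulate the pattern and return the raw estimate (6.7) at v
raw_at_v <- function() {
  n  <- rpois(1, beta * a * b)    # Poisson number of points
  px <- runif(n, 0, a)            # independent uniform locations
  py <- runif(n, 0, b)
  sum(dnorm(v[1] - px, sd = sigma) * dnorm(v[2] - py, sd = sigma))
}
sims <- replicate(2000, raw_at_v())

# e(v) from (6.10) for a Gaussian kernel on a rectangle
e_v <- (pnorm(a, v[1], sigma) - pnorm(0, v[1], sigma)) *
  (pnorm(b, v[2], sigma) - pnorm(0, v[2], sigma))

# Average raw estimate, its theoretical mean beta * e(v), and the
# average of the uniformly corrected estimate (6.8)
c(mean_raw = mean(sims), theory_raw = beta * e_v, mean_uniform = mean(sims) / e_v)
```

The average of the raw estimates should be close to β e(v), which is well below β at this location, while dividing by e(v) should bring the average back close to β.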

Kernel estimation is implemented in spatstat by the function density.ppp, a method for the generic command density.

> den <- density(swp, sigma=1)

The smoothing bandwidth is specified by the argument sigma. This may be a single numerical value (specifying the bandwidth of the kernel in the same units as the point pattern), or a pair of numerical values (specifying different standard deviations in the x and y directions), or a function which performs automatic bandwidth selection (see below). Currently the smoothing kernel is an isotropic Gaussian density; other options will be added soon.

By default, the uniformly corrected kernel estimator (6.8) is calculated. For Diggle’s corrected estimator (6.9), specify diggle=TRUE: as discussed above, this has better statistical performance, but is slower to compute. For the uncorrected estimator (6.7), set edge=FALSE.

The value returned by density.ppp is a pixel image (object of class "im"). This class has methods for print, summary, plot, contour (contour plots), persp (perspective plots), and so on. The method for plot was used to produce Figure 6.9. The methods for persp and contour produced Figure 6.10.


Figure 6.10. Perspective plot (Left) and contour plot (Right) for a kernel estimate of intensity in the Swedish Pines data. Plots were generated using persp.im and contour.im, respectively.

6.5.1.2 Bandwidth selection

The kernel bandwidth sigma controls the degree of smoothing (the amount of ‘melting’ in the chocolate analogy). As shown in Figure 6.11, a small value of sigma produces an irregular intensity surface, while a large value of sigma appears to oversmooth the intensity.

Figure 6.11. Density estimates with different smoothing bandwidths. Panels, left to right: sigma = 0.5, sigma = 1, sigma = 1.5.

If the bandwidth sigma is not specified in the call to density.ppp, the default is to take sigma equal to one-eighth of the shortest side length of the enclosing rectangle. This is a very rough rule of thumb which may be unsatisfactory in many cases.

Several algorithms are available for automatically selecting the bandwidth sigma by minimising a measure of error. They include bw.diggle for Diggle and Berman’s [222, 89] mean square error cross-validation method and bw.ppl for the likelihood cross-validation method [428, Sect. 5.3].

> b <- bw.ppl(swp)

> b
sigma
4.036

These commands return a numerical value, the optimised bandwidth, which also belongs to the special class "bw.optim". The plot method for this class shows the objective function for the optimisation. Figure 6.12 shows the results of plotting the likelihood cross-validation value using plot(b) and zooming in using plot(b, xlim=c(3,6)). The first plot suggests that any smoothing bandwidth greater than about 2 metres would be adequate. This is what might be expected for a homogeneous point pattern.

Figure 6.12. Likelihood cross-validation criterion for smoothing bandwidth plotted against bandwidth σ in metres. Right panel is zoomed in to the range 3 ≤ σ ≤ 6.

Bandwidth selection may also be based on a fast rule of thumb. Examples include bw.scott for Scott’s rule of thumb for bandwidth selection in multidimensional smoothing [608, p. 152], and bw.frac for a fast bandwidth selection rule based on the window geometry (explained in the spatstat help file).

Bandwidth selection commands can be invoked in either of the following ways:

> D <- density(swp, sigma=bw.diggle(swp))

> D <- density(swp, sigma=bw.diggle)

Different bandwidth selection methods can disagree substantially. For the Swedish Pines data, bw.diggle gives 0.571 metres, while bw.ppl gives 4.38 and bw.scott gives (1.41, 1.32).

Any bandwidth selection rule gives unsatisfactory results in some cases, because it is based on assumptions about the dependence between points, which may be inappropriate. Likelihood cross-validation bw.ppl assumes an inhomogeneous Poisson process; bw.diggle assumes a Cox process, which is more clustered (positively correlated) than a Poisson process; both these assumptions are probably inappropriate for the Swedish Pines data, which are somewhat more regular (negatively correlated) than a Poisson process. It is often convenient to be able to adjust the automatically selected bandwidth by specifying the argument adjust, a numeric value which multiplies the selected bandwidth:

> D <- density(swp, sigma=bw.diggle, adjust=2)

This is equivalent to selecting the bandwidth by b <- bw.diggle(swp) then computing D <- density(swp, sigma=2*b).

6.5.1.3 Estimation of intensity at the data points

It is sometimes required to estimate the intensity values $\lambda(x_i)$ at the data points $x_i$ themselves. For example, the estimated intensity values at the data points can be used as weights in some analysis procedures. However, the estimates $\tilde\lambda^{(U)}(x_i)$ and $\tilde\lambda^{(D)}(x_i)$ have a large positive bias, because of the term $\kappa(u - x_i) = \kappa(x_i - x_i) = \kappa(0)$ appearing in the sum in (6.8)–(6.9). To deal with this problem it is advisable to use a leave-one-out estimator in which the value of $\lambda(x_i)$ is estimated using all of the data points except $x_i$:

$$\tilde\lambda_{-i}^{(U)}(x_i) = \frac{1}{e(x_i)} \sum_{j \neq i} \kappa(x_i - x_j) \qquad (6.12)$$

$$\tilde\lambda_{-i}^{(D)}(x_i) = \sum_{j \neq i} \frac{1}{e(x_j)} \, \kappa(x_i - x_j). \qquad (6.13)$$

Typically the leave-one-out estimates have a slight negative bias.

To compute intensity estimates at the data points, invoke density.ppp with the argument at="points". The result is a numeric vector of density values for each data point. The default is to compute the leave-one-out estimates; this can be suppressed by setting leaveoneout=FALSE.

> dX <- density(swp, sigma=1, at="points")

> dX[1:5]

[1] 0.3750 0.7880 0.6397 0.6144 0.3938
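To make the effect of leaving one point out concrete, here is a minimal base R sketch of (6.8), (6.12) and (6.13) evaluated at the data points. It is not the spatstat implementation: the toy coordinates, the rectangular window, and the helper name edge are assumptions for the illustration.

```r
# Hypothetical toy data in the window W = [0, a] x [0, b]
x <- c(1.0, 2.5, 5.0, 8.0, 9.5)
y <- c(9.0, 4.0, 5.0, 1.5, 9.5)
a <- 10; b <- 10
sigma <- 1

# Edge correction e(u) of (6.10) for a Gaussian kernel on a rectangle
edge <- function(ux, uy) {
  (pnorm(a, ux, sigma) - pnorm(0, ux, sigma)) *
    (pnorm(b, uy, sigma) - pnorm(0, uy, sigma))
}

# Matrix of kernel values K[i, j] = kappa(x_i - x_j)
K <- outer(x, x, function(p, q) dnorm(p - q, sd = sigma)) *
  outer(y, y, function(p, q) dnorm(p - q, sd = sigma))
e <- edge(x, y)

full_U <- rowSums(K) / e            # (6.8) at each x_i, includes kappa(0)
K0 <- K
diag(K0) <- 0                       # drop the j = i terms
loo_U <- rowSums(K0) / e            # leave-one-out, uniform correction (6.12)
loo_D <- as.vector(K0 %*% (1 / e))  # leave-one-out, Diggle correction (6.13)

# The extra contribution removed by leaving one out is kappa(0) / e(x_i)
kappa0 <- 1 / (2 * pi * sigma^2)
cbind(full_U, loo_U, loo_D, removed = kappa0 / e)
```

The column labelled removed shows the self-contribution κ(0)/e(x_i), which is the source of the positive bias described above.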

6.5.1.4 Computation

The spatstat package uses different algorithms to compute the intensity estimates at data points and on a pixel grid. The intensity estimates at the data points are computed to high precision using the formulae (6.8)–(6.9) or (6.12)–(6.13) in double precision arithmetic. For the intensity estimates on a pixel grid, exact calculation would be too slow, so the pixel values are computed by spatially discretising the point pattern and convolving using the Fast Fourier Transform [171]. Thus, the following are approximately but not exactly equal:

> den <- density(swp, sigma=1)

> denXpixel <- den[swp]

> denXpixel[1:5]

[1] 0.9177 1.1003 0.9126 1.0252 0.6044

> denXexact <- density(swp, sigma=1, at="points", leaveoneout=FALSE)

> denXexact[1:5]

[1] 0.9211 1.0836 0.9145 0.9051 0.6038

6.5.1.5 Standard errors

To compute standard errors and confidence intervals for the intensity function, additional assumptions are required. For example, assume a Poisson point process with intensity function λ(u), and estimate the intensity by a kernel estimator of the general form

$$\hat\lambda(u) = a(u) \sum_i b(x_i) \, \kappa(x_i - u) \qquad (6.14)$$

where a(u) and $b(x_i)$ are edge correction weights, embracing the three edge corrected estimators (6.7)–(6.9). Then the variance of $\hat\lambda(u)$ is, for a Poisson process only [197, p. 188],

$$V(u) = \operatorname{var} \hat\lambda(u) = a(u)^2 \int_W b(v)^2 \, \kappa(u - v)^2 \, \lambda(v) \, dv. \qquad (6.15)$$

An unbiased, consistent estimator of V(u) is

$$\hat V(u) = a(u)^2 \sum_i b(x_i)^2 \, \kappa(u - x_i)^2. \qquad (6.16)$$

This takes the form of a weighted kernel estimate of intensity. If $\kappa(x) = \kappa_\sigma(x)$ is the isotropic Gaussian kernel with standard deviation σ, then a little algebra shows that $\kappa_\sigma(x)^2 = c \, \kappa_\tau(x)$ where $\tau = \sigma/\sqrt{2}$ and $c = 1/(8\pi\tau^2) = 1/(4\pi\sigma^2)$. That is, the variance V(u) can effectively be estimated by smoothing the data with bandwidth $\tau = \sigma/\sqrt{2}$ and multiplying the result by c. Taking the square root gives the standard error for the intensity estimate. This calculation is performed by density.ppp when se=TRUE:

> dse <- density(swp, 1, se=TRUE)$SE

The result is shown in Figure 6.13: note the standard error increases near the boundary, because intensity estimates nearer the boundary are based on fewer data points. Similar calculations can be made when at="points".
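The "little algebra" behind the Gaussian shortcut can be checked numerically. The base R sketch below compares the squared kernel with the rescaled narrower kernel at a few arbitrary offsets. The bandwidth value, the offsets, and the helper name kappa2d are assumptions for the illustration.

```r
sigma <- 1.3                      # any positive bandwidth
tau <- sigma / sqrt(2)
cc  <- 1 / (4 * pi * sigma^2)     # the constant c; equals 1 / (8 * pi * tau^2)

# Isotropic Gaussian density on the plane with standard deviation s
kappa2d <- function(dx, dy, s) dnorm(dx, sd = s) * dnorm(dy, sd = s)

# Compare kappa_sigma(x)^2 with c * kappa_tau(x) at some arbitrary offsets
dx <- c(0, 0.3, -1.2, 2.0)
dy <- c(0, 1.1,  0.4, -0.7)
lhs <- kappa2d(dx, dy, sigma)^2
rhs <- cc * kappa2d(dx, dy, tau)
max(abs(lhs - rhs))               # differs only by rounding error
```

This is why the variance estimate (6.16) costs no more than one additional kernel smoothing with the smaller bandwidth τ.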

Figure 6.13. Estimate of standard error for the kernel estimate of intensity for Swedish Pines. Uniform edge correction, bandwidth 1 metre.

Be warned that, although the standard error provides an indication of accuracy, and is justified by asymptotic theory, confidence intervals based on the standard error are notoriously unreliable [315], essentially because the estimates $\hat\lambda(u)$ and $\hat V(u)$ are strongly correlated.

6.5.1.6 Weighted kernel estimators

If the data points $x_i$ have numerical weights $w_i$, we can use weighted versions of the kernel estimators described above. The contribution from a point $x_i$ to the estimator is simply multiplied by the weight $w_i$, so that (for example) the raw intensity estimator $\hat\lambda^{(0)}(u) = \sum_i \kappa(u - x_i)$ becomes $\hat\lambda^{(0,w)}(u) = \sum_i w_i \, \kappa(u - x_i)$. Using the chocolate analogy (page 168) the data point $x_i$ is represented by $w_i$ units of chocolate rather than one unit.

Weighted kernel estimators are natural if the weight of a point represents its multiplicity (e.g., number of disease cases at the same residence) or physical mass (e.g., mass of galaxy) or economic value (e.g., total endowment of a mineral deposit).

For example, in a forest inventory we could take the ‘weight’ of each tree to be its estimated volume. The volume-weighted intensity is the average standing volume of wood per unit area of forest. Figure 6.14 shows this quantity for the Finnish Pines data. The scale is in metres (cubic metres of wood per square metre of forest).

Figure 6.14. Volume-weighted intensity for Finnish Pines data.

The argument weights to density.ppp specifies weights for the density calculation. Figure 6.14 was generated by

> vols <- with(marks(finpines),

(pi/12) * height * (diameter/100)^2)

> Dvol <- density(finpines, weights=vols, sigma=bw.ppl)

The average standing volume of wood per square metre was calculated at the end of Section 6.2: it is

> intensity(finpines, weights=vols)
[1] 0.001274

6.5.2 Spatially adaptive smoothing

The kernel estimators described above are fixed-bandwidth smoothers: they use the same kernel and the same bandwidth to compute estimates at different spatial locations. This approach has several weaknesses. A fixed smoothing bandwidth is unsatisfactory if the true intensity varies greatly across the spatial domain, because it is likely to cause oversmoothing in the high-intensity areas and undersmoothing in the low-intensity areas. Kernel estimation is unsatisfactory when there is a sharp boundary between areas of high and low intensity, because this boundary will be smoothed out. These problems militate against the use of kernel estimation in seismology, for example.

Strategies for avoiding this problem include variable-bandwidth smoothing, where the smoothing bandwidth is spatially varying and data-dependent [617, 203], [190, p. 654], and more generally adaptive smoothing. The contributed R package sparr [202] provides a suite of adaptive kernel spatial smoothing techniques and related tools.

Adaptive estimators of intensity can be based on Dirichlet-Voronoï tessellations (Section 8.2.3).

The Dirichlet-Voronoï estimator [81] of intensity λ(u) at a location u is $\tilde\lambda(u) = 1/|C(u; \mathbf{x})|$, the reciprocal of the area of the tile $C(u; \mathbf{x})$ containing u in the Dirichlet-Voronoï tessellation defined by the data point pattern x. Estimators of this type have been used in statistical seismology [509] and perform well when there is an abrupt change in intensity. The Dirichlet-Voronoï estimator is computed in spatstat by the function adaptive.density with argument f=1.

> vden <- adaptive.density(swp, f=1)

The value returned by adaptive.density is another pixel image (object of class "im").

The algorithm in adaptive.density is more general. A specified fraction f of the points in the point pattern are selected at random, and used to construct a Dirichlet tessellation. A quadrat counting estimator of the intensity is based on this tessellation. This process is repeated nrep times and the results are averaged. The left panel of Figure 6.15 shows the result of

> aden <- adaptive.density(swp, f=0.1, nrep=30)

Another strategy is to measure the distance $R = d(u, \mathbf{x})$ from a fixed point u to the nearest data point $x_i$, and to compute the area $A = \pi R^2$ of the corresponding disc. For a homogeneous Poisson process with intensity λ, the random area A has a negative exponential distribution with rate λ, and the maximum likelihood estimate of λ based on R is $\hat\lambda = 1/(\pi R^2)$. Similarly we could use the distance $R_k$ to the k-th nearest data point (for k ≥ 1) and set $\hat\lambda_k = k/(\pi R_k^2)$. See [617, p. 96], [190, p. 654]. This intensity estimator can be calculated rapidly for all points u in a pixel grid: the spatstat function nndensity computes it. The right panel of Figure 6.15 shows the result of nndensity(swp, k=10).
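The k-th nearest-neighbour estimator is simple enough to write out by hand. The following base R sketch is not the nndensity implementation: the toy coordinates and the function name nn_intensity are assumptions made for the illustration, and the evaluation location must not coincide with a data point when k = 1 (otherwise R = 0).

```r
# Hypothetical toy pattern (coordinates in arbitrary units)
x <- c(1.0, 2.5, 5.0, 8.0, 9.5, 4.0, 6.5)
y <- c(9.0, 4.0, 5.0, 1.5, 9.5, 7.0, 3.0)

# k-th nearest-neighbour estimate k / (pi * R_k^2) at location (ux, uy)
nn_intensity <- function(ux, uy, k = 1) {
  d <- sort(sqrt((x - ux)^2 + (y - uy)^2))  # sorted distances to all data points
  k / (pi * d[k]^2)
}

# At u = (5, 4) the nearest point is at distance 1, the third nearest at 2.5
nn_intensity(5, 4, k = 1)   # 1 / pi
nn_intensity(5, 4, k = 3)   # 3 / (pi * 2.5^2)
```

Larger values of k average over a wider disc, so the estimate is smoother but less local.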

Figure 6.15. Dirichlet-type adaptive density estimate (Left) and 10th nearest-neighbour density estimate (Right) for the Swedish Pines data. Density values multiplied by 1000.

Intensity can also be estimated using nearest-neighbour distances [543, 216, 573, 575, 137] and similar principles [613]. These are often used as the plug-in estimates of λ in other statistics related to interpoint distances, such as Ripley’s K-function (Chapter 7), or nearest-neighbour distances, and the G-function (Chapter 8). Bayesian estimation of the intensity is described in [332, 77].

6.5.3 Projections, transformations, change of coordinates

Figure 6.16. Illustration of change of coordinates for intensity.

In Sections 6.2 and 6.3 we saw that the intensity depends on the unit of length. In fact, changing the spatial coordinate system in any way (through a change of units, a change of scale, a geometric transformation, or a geographic projection) affects the intensity.

Intensity is the expected number of points per unit area, so any geometric transformation which changes the value of area also changes the value and the very meaning of the intensity.

The intensity function of a point process after a spatial transformation has been applied is related to the intensity function of the original point process, through the general principle of ‘change of coordinates’.

We motivate this with an example. Imagine a hillside, covered in trees: see Figure 6.16. Walking straight up the hill involves a walking distance of 50 metres and an increase in altitude of 30 metres, which corresponds to a horizontal distance of 40 metres on the map. The hillside is 100 metres wide. The total surface area of the hillside is 50 × 100 = 5000 square metres or half a hectare. If there are 800 trees on the hillside, then the estimated intensity is 800/0.5 = 1600 trees per hectare of hillside surface. However, if we photograph the hillside from directly above in a survey aircraft, or map the tree locations using a GPS device, the hillside is represented by a rectangle only 40 × 100 = 4000 square metres or 0.4 hectare in area on the map, and the estimated intensity is 800/0.4 = 2000 trees per hectare of map area.

Which of these calculations is ‘right’? Actually, both are correct. The ‘forest density’ could be defined either as a density per hectare of soil (which might be more useful for understanding soil ecology) or as a density per hectare of map area (perhaps more useful for understanding competition in the forest canopy).

The key is that the two measures of intensity are inter-related. Map area is equal to hillside surface area multiplied by the cosine of the slope angle, which is 40/50 = 0.8 in the example above.

Therefore the intensity per unit area of map is equal to the intensity per unit area of hillside, divided by the cosine of the slope angle.
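The hillside arithmetic can be reproduced in a few lines of base R. The variable names are chosen for this illustration only, and the numbers are those of the example above.

```r
walk  <- 50     # walking distance up the slope (metres)
rise  <- 30     # increase in altitude (metres)
width <- 100    # width of the hillside (metres)
trees <- 800    # number of trees on the hillside

run <- sqrt(walk^2 - rise^2)    # horizontal distance on the map: 40 metres
cos_slope <- run / walk         # cosine of the slope angle: 0.8

per_ha <- 10000                 # square metres in one hectare
hill_intensity <- trees / (walk * width / per_ha)  # trees per hectare of hillside
map_intensity  <- trees / (run  * width / per_ha)  # trees per hectare of map

# Map intensity equals hillside intensity divided by the cosine of the slope
c(hillside = hill_intensity, map = map_intensity,
  check = hill_intensity / cos_slope)
```

The last entry recovers the map intensity from the hillside intensity, as stated in the text.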

The general principle is the following. Suppose that a point process X has intensity function $\lambda_X(u)$. We now apply a geographic projection, a geometrical transformation, or a change of the coordinate system, so that the points $x_i$ are mapped to new coordinate positions $y_i = T(x_i)$. The transformed point process $Y = T(X)$ has intensity function

$$\lambda_Y(u) = J(u) \, \lambda_X(T^{-1}(u)) \qquad (6.17)$$

where $T^{-1}$ is the inverse mapping (that is, $T^{-1}(u)$ is the point mapped onto u), and J(u) is the Jacobian (determinant of the derivative matrix) of the inverse mapping.
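As a simple worked instance of (6.17), consider a pure change of units. The scaling constant c below is an assumption introduced for the illustration, for example c = 1000 when converting coordinates from kilometres to metres.

```latex
% Change of units: T(x) = c x with c > 0, so T^{-1}(u) = u / c.
% The derivative matrix of T^{-1} is (1/c) I, with determinant 1/c^2.
\[
  J(u) = \frac{1}{c^2},
  \qquad
  \lambda_Y(u) = \frac{1}{c^2} \, \lambda_X\!\left(\frac{u}{c}\right).
\]
% Example: c = 1000 (kilometres to metres) divides the intensity by 10^6.
```

This agrees with the earlier observation that the numerical value of the intensity depends on the unit of length: expressing lengths in a unit that is c times smaller divides the intensity per unit area by c².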

Invoking this general principle, we can apply the same simple trigonometry to real terrain data where the slope is spatially varying. The tropical rainforest point pattern dataset bei comes with an extra set of covariate data bei.extra, which contains a pixel image of terrain elevation bei.extra$elev and a pixel image of terrain slope bei.extra$grad (for ‘gradient’), at a coarse spatial resolution of 5 metres. The command density(bei) computes the estimated intensity of trees relative to map area. To convert this to an estimate of the intensity relative to terrain surface area, we need the cosine of the slope angle. The covariate grad is given as the number of metres of elevation increase for every metre on the map, so that a grad value of 1 corresponds to a 45 degree slope. That is, grad is the tangent of the slope angle. Recalling our high school trigonometry, cos²(x) = 1/(1 + tan²(x)), so the conversion is:

> grad <- bei.extra$grad

> dens.map <- density(bei, W=grad)

> dens.ter <- dens.map / sqrt(1+grad^2)

Here 1/sqrt(1+grad^2) is the cosine of the slope angle, so that the intensity per unit terrain area is the intensity per unit map area multiplied by the cosine, in agreement with the hillside example. The two estimates are not very different in this case, because the maximum value of grad is only 0.33, so that the two intensities differ by a factor of at most √(1 + 0.33²) = 1.05.

Figure 6.17 shows a perspective view of the rainforest terrain, shaded according to the estimated density of Beilschmiedia trees, using

> persp(bei.extra$elev, colin=dens.ter)

An alternative to the calculation above is to introduce the Jacobian weight into the kernel smoother. That is, we smooth the point pattern in the projected space, but weight each data point by the Jacobian term at that point:

> dens.ter2 <- density(bei, weights=1/sqrt(1+grad[bei]^2))

This is justified by Campbell’s formula (page 169). An advantage of this approach is that we only need to know the Jacobian values at the data points.

Figure 6.17. Perspective view of rainforest terrain, shaded according to the estimated density of Beilschmiedia trees per unit area of soil. Lighter shades represent higher predicted densities. Scale on vertical axis is 6 times the scale on horizontal axes.

6.6 Investigating dependence of intensity on a covariate

In document Baddeley, Adrian; Rubak, Ege; Turner, Rolf Spatial Point Patterns Methodology and Applications With r (Page 187-196)