Spatial Prediction 1: Deterministic Methods, Curve Fitting, and Smoothing
6.6 Local models and local data
In the remainder of this chapter, the focus is on areal interpolation methods that are either local in approach (based, in this case, on moving windows) or make use of additional data in recognition of the fact that the variable of interest (for example, human population count) varies from place to place.
6.6.1 Generating surface models from areal data
This class of methods relates to the point-based interpolation procedures discussed in Section 6.1. With nonvolume-preserving methods, in which the output count over a given area is not necessarily the same as the input count over the same area, the usual approach is to select a point location to represent each area (for example, its centroid) and to interpolate to a regular grid. The predictions may then be averaged within the target area.
Martin (258) outlines a method for mapping population from zone centroids.
With this approach, each zone centroid is visited in turn and the mean intercentroid distance is calculated within a predefined search radius. This measure indicates the unknown areal extent of zones in the region, and it is used to calibrate a distance-decay function that assigns weights to cells in the output grid. The cells that are closest to the zone centroid receive the largest weights, while those cells estimated to be located in the maximum areal extent of the zone receive the smallest weights. The population is then redistributed in the surrounding region using these weights. A given cell in the output grid may receive population values from one or more centroids, or may remain unpopulated as it is beyond the area of influence of any of the centroid locations. Assignment of values to each cell in a grid may be conducted with:
$$
\hat{z}_i = \sum_{j=1}^{n} z_j w_{ij} \tag{6.29}
$$
where $\hat{z}_i$ is the estimated population of cell $i$, $z_j$ is the population recorded at point $j$, $n$ is the total number of data points (locally), and $w_{ij}$ is the weight of cell $i$ with respect to point $j$ (262). An appropriate distance function (based on work by Cressman (97)) is given by:
$$
w_{ij} =
\begin{cases}
\left( \dfrac{\tau_j^2 - d_{ij}^2}{\tau_j^2 + d_{ij}^2} \right)^{r} & \text{if } d_{ij} < \tau_j \\
0 & \text{otherwise}
\end{cases}
\tag{6.30}
$$
where $w_{ij}$ is the weight for distance $d_{ij}$, $\tau_j$ is the adjusted width of the window centred on point $j$, and $r$ is an exponent. A modified version of this method is outlined below. To preserve the total population, it is necessary that the weights are constrained to sum to one:
$$
\sum_{i=1}^{n} w_{ij} = 1.0 \tag{6.31}
$$
Population is preserved globally (the sum of populations in the zones is the same as the sum of populations in the population surface), but the population recorded for a given zone does not necessarily equal the total of the surface cells that fall within that zone. Gotway Crawford and Young (158), (159) classify this approach as a spatial smoothing method.
Bracken and Martin (56) apply the method for linking 1981 and 1991 censuses of Britain. An illustrated example of the method is given in Section 6.6.1.1 and a real world case study in Section 6.6.2. The method of Bracken and Martin is a form of adaptive kernel estimation (32), as discussed in Section 8.10.2. Approaches such as this are considered problematic by some researchers on the grounds that population centroids used as the basis of the reallocation of population counts are usually not objectively defined.
6.6.1.1 Illustrating generation of a population surface
The application of the approach to redistributing population from zone centroids to grid cells described in Martin (258) is demonstrated here in a simple example. Table 6.7 gives population counts at four zone centroids for an artificial case. In this example, the population counts are redistributed to a grid of five-by-five cells whose centres have a minimum x and y coordinate of zero and a maximum x and y coordinate of four. The mean intercentroid distance is calculated within a predefined search radius, which in this case was 2.5 units. This gives the mean intercentroid distances specified in Table 6.8.
TABLE 6.8
Mean intercentroid distances.

No    X      Y      Mean distance
1     2      1      1.851
2     1.5    2.5    1.791
3     3.5    2.5    1.901
4     4      4      1.581
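The mean intercentroid distances can be checked with a few lines of Python. This is a minimal sketch rather than Martin's implementation: the function name `mean_intercentroid_distance` is hypothetical, and centroids 2 and 3 are assumed to lie at y = 2.5, the value that reproduces both the tabulated mean distances and the cell distances of 0.71 and 1.58 quoted in the text.

```python
import math

# Zone centroids of the artificial example. The y value of 2.5 for
# centroids 2 and 3 is an assumption chosen to match the tabulated distances.
centroids = [(2.0, 1.0), (1.5, 2.5), (3.5, 2.5), (4.0, 4.0)]
SEARCH_RADIUS = 2.5


def mean_intercentroid_distance(index, points, radius):
    """Mean distance from points[index] to the other points within radius."""
    x0, y0 = points[index]
    nearby = []
    for k, (x, y) in enumerate(points):
        if k == index:
            continue
        d = math.hypot(x - x0, y - y0)
        if d <= radius:
            nearby.append(d)
    # With no neighbour inside the search radius, fall back to the radius
    # itself (a choice made for this sketch only).
    return sum(nearby) / len(nearby) if nearby else radius


mean_distances = [
    mean_intercentroid_distance(k, centroids, SEARCH_RADIUS)
    for k in range(len(centroids))
]
for number, distance in enumerate(mean_distances, start=1):
    print(f"centroid {number}: {distance:.3f}")
```

Centroid 4, for instance, has only centroid 3 within 2.5 units, so its mean distance is simply the distance between the two.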
Next, the distance between each centroid and each cell is calculated, and the weights follow from Equation 6.30. The weights for each centroid are scaled to sum to one and then multiplied by the population value at that centroid. For example, the cell located at (2, 2) receives population from zone centroids 1 (1 unit in distance from the cell), 2 (0.71 units from the cell), and 3 (1.58 units from the cell).
Following this example, and with $r = 1$, the weight for zone centroid 1 is calculated:
$$
\frac{1.851^2 - 1^2}{1.851^2 + 1^2} = 0.548
$$
the weight for zone centroid 2:
$$
\frac{1.791^2 - 0.71^2}{1.791^2 + 0.71^2} \approx 0.73
$$
and the weight for zone centroid 3:
$$
\frac{1.901^2 - 1.58^2}{1.901^2 + 1.58^2} \approx 0.18
$$
The weights referring to each centroid are then scaled to sum to one. For example, the weights for centroid 1 sum to 4.245, with population assigned to nine cells, and each weight is then expressed as a proportion of 4.245. These proportions are used to reassign population, thereby ensuring that the total output population is the same as the total input population (in this example, 98 people in total). The first of the three weights above becomes 0.129 (0.548 is 12.9% of 4.245), the second weight is 0.187, and the third weight is 0.044.
The population assigned to cell 2,2 is then calculated using Equation 6.29, giving:
$$
(22 \times 0.129) + (24 \times 0.187) + (27 \times 0.044) = 2.838 + 4.488 + 1.188
$$
which is 8.514 people with the rounded weights shown, or 8.516 people when the weights are carried at full precision.
The output grid is shown in Figure 6.13. Note that the sum of values in the population surface is the same as the sum of input data values. In the following section, a short real world case study is presented.
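The whole worked example can be traced in code. The sketch below is an illustration under stated assumptions, not the published implementation: the names `window_width` and `cressman_weight` are hypothetical, centroids 2 and 3 are placed at y = 2.5 (the value consistent with the distances quoted above), the exponent is one, and the population of centroid 4 is taken as 25 because the four counts are said to total 98.

```python
import math

# Inputs of the artificial example. The y value of 2.5 for centroids 2 and 3
# and the count of 25 for centroid 4 are assumptions (see the lead-in).
centroids = [(2.0, 1.0), (1.5, 2.5), (3.5, 2.5), (4.0, 4.0)]
populations = [22.0, 24.0, 27.0, 25.0]
search_radius = 2.5
exponent = 1.0
cells = [(x, y) for x in range(5) for y in range(5)]  # cell centres 0..4


def window_width(j):
    """Mean distance from centroid j to the centroids within the search radius."""
    xj, yj = centroids[j]
    near = []
    for k, (x, y) in enumerate(centroids):
        d = math.hypot(x - xj, y - yj)
        if k != j and d <= search_radius:
            near.append(d)
    # Every centroid in this example has at least one neighbour in range.
    return sum(near) / len(near)


def cressman_weight(distance, width, r=exponent):
    """Equation 6.30: weight of 1 at the centroid, 0 at and beyond the window edge."""
    if distance >= width:
        return 0.0
    return ((width**2 - distance**2) / (width**2 + distance**2)) ** r


surface = {cell: 0.0 for cell in cells}
for j, (xj, yj) in enumerate(centroids):
    width = window_width(j)
    raw = {c: cressman_weight(math.hypot(c[0] - xj, c[1] - yj), width) for c in cells}
    total = sum(raw.values())  # about 4.245 for centroid 1
    for c, w in raw.items():
        # Scale weights to sum to one (Equation 6.31), then allocate (Equation 6.29).
        surface[c] += populations[j] * w / total

print(round(surface[(2, 2)], 3))
print(round(sum(surface.values()), 3))
```

Under these assumptions the value printed for cell (2, 2) agrees with the 8.516 people obtained above, and the grid total equals the input total of 98, which is the global preservation property described earlier.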
FIGURE 6.13: Population distributed to grid cells.
6.6.2 Population surface case study
In this case study, population data from the 2001 Census of Northern Ireland are used to illustrate areal interpolation from centroids of ‘Output Areas’ (one of the sets of zones for which population counts are provided) to a regular grid. The Output Areas and the population counts were given in Figure 1.8. A population surface was derived from these data using the method of Martin (258), and it is shown in Figure 6.14. The population values were redistributed to a grid with a 200 m cell spacing, and the Cressman decay function (defined in Equation 6.30) was used with a value of one for the decay parameter. The urban area around Belfast (in the east of the province) and the other urban areas stand out clearly, as does the fact that population is sparse across most of the region. Approaches which distribute values from irregularly-shaped zones to surfaces are useful in allowing comparison of counts from different time periods for which the original zonal systems are incompatible. So, the procedure applied here to 2001 population counts could be applied to counts from earlier censuses, and the surfaces compared to allow assessment of spatial variation in population change.
FIGURE 6.14: Population distributed to grid cells in Northern Ireland. (Legend: population value, from 0 (low) to 1298 (high); scale bar: 0 to 50 kilometres.)
6.6.3 Local volume preservation
The term ‘pycnophylactic’ refers to the property of areal interpolators whereby the population of the target zones should be the same as the population of the constituent source zones — counts from a source zone should be allocated to target zones with which they intersect. Thus, volume is preserved locally.
Tobler’s pycnophylactic method was designed to generate continuous surfaces from data represented as areas (for example, census district boundaries) (362). The method reallocates data: it is mass-preserving in that the volume of the attribute in a given area is the same in the original data (that is, in discrete polygons) and in the continuous surface derived from it. Summaries of the method are provided by Burrough and McDonnell (70) and Waller and Gotway (375).
The key condition for preserving mass (for example, maintaining the population count in the input data) is the invertibility condition:
$$
\int_{A_i} \lambda(s) \, ds = z(A_i) \quad \text{for all } i \tag{6.32}
$$
where $\lambda(s)$ is a nonnegative density function, $z(A_i)$ is the value (e.g., population) in region $A_i$, and there are $N$ regions. This equation indicates that the total volume of values (e.g., population counts) in each zone does not vary whether the population count is modelled as a smooth surface (accounting for population in neighbouring areas) or as a set of discrete areas. The surface is assumed to vary smoothly such that neighbouring values are similar. As such, a simple approach is to fit a joint smooth surface to contiguous regions by minimising:
$$
\iint \left[ \left( \frac{\partial \lambda}{\partial x} \right)^2 + \left( \frac{\partial \lambda}{\partial y} \right)^2 \right] dx \, dy \tag{6.33}
$$
This is Dirichlet’s integral (see Equation 6.25) and the minimum, without the pycnophylactic and nonnegativity constraints, is given by Laplace’s equation:
$$
\frac{\partial^2 \lambda}{\partial x^2} + \frac{\partial^2 \lambda}{\partial y^2} = 0 \tag{6.34}
$$
Laplace’s equation can be used as the smoothness criterion, but requires modification to include the pycnophylactic constraint (362). Tobler’s approach applies a finite difference approximation to Laplace’s equation, as illustrated below.
Tobler’s method (362) works as follows: (i) overlay a fine grid of points on the source zones, (ii) assign the average density value to each point falling within a given zone, and (iii) then adjust these values iteratively with a smoothing function and a volume-preserving constraint until there is no significant change in the grid values between iterations. The approach is conceptually similar to the method detailed in Section 6.3.11.
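Steps (i) to (iii) can be sketched with NumPy. The code below is a simplified illustration and not Tobler's program: the function name `pycnophylactic` and its parameters are hypothetical, smoothing is assumed to be a five-cell mean (each cell and its four neighbours, with edge values replicated), and volume is restored by multiplicative rescaling within each zone.

```python
import numpy as np


def pycnophylactic(zones, totals, n_iter=200, tol=1e-6):
    """Smooth zone totals over a grid while preserving each zone's total.

    zones  : 2D integer array giving the zone id of every grid cell
    totals : dict mapping zone id to the count recorded for that zone
    """
    zones = np.asarray(zones)
    surface = np.zeros(zones.shape, dtype=float)

    # Step (ii): start from the average density of each zone.
    for zone_id, total in totals.items():
        mask = zones == zone_id
        surface[mask] = total / mask.sum()

    # Step (iii): alternate smoothing and volume correction.
    for _ in range(n_iter):
        previous = surface.copy()

        # Smoothing: mean of each cell and its four neighbours.
        padded = np.pad(surface, 1, mode="edge")
        surface = (
            padded[1:-1, 1:-1]
            + padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:]
        ) / 5.0

        # Nonnegativity, then rescale every zone back to its recorded total.
        surface = np.clip(surface, 0.0, None)
        for zone_id, total in totals.items():
            mask = zones == zone_id
            zone_sum = surface[mask].sum()
            if zone_sum > 0:
                surface[mask] *= total / zone_sum

        # Stop when the grid values no longer change appreciably.
        if np.abs(surface - previous).max() < tol:
            break
    return surface


zones = np.array([[1, 1, 1],
                  [1, 2, 2],
                  [2, 2, 2]])
totals = {1: 8.0, 2: 5.0}
surface = pycnophylactic(zones, totals)
print(surface.round(3))
```

Because the last operation in each pass is the rescaling, the zone totals of the returned surface match the input counts whichever smoothing rule is used; only the shape of the surface within each zone depends on that choice.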
Tobler (362) gives a small example of the application of the principles of his pycnophylactic method and part of that material is illustrated here.
In Tobler’s illustration, he uses the example of a histogram and calculating differences between adjacent bars as a measure of smoothness. Following Tobler (362), a 2D example can be illustrated with a grid representing cells belonging to two different regions:
1 1 1
1 2 2
2 2 2
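Subtraction and squaring of adjacent cell values (that is, heights using the histogram parallel) can be defined with:
$$
T_0 = \sum_{i=1}^{I} \sum_{j=1}^{J-1} \left( \lambda_{i,j+1} - \lambda_{i,j} \right)^2 + \sum_{i=1}^{I-1} \sum_{j=1}^{J} \left( \lambda_{i+1,j} - \lambda_{i,j} \right)^2
$$
where $\lambda_{i,j}$ is the value of the cell in row $i$ and column $j$, $I$ is the number of rows in the array, and $J$ is the number of columns.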
Adding the pycnophylactic constraint leads to:
$$
T = T_0 + \sum_{k=1}^{N} \psi_k \left( \sum_{(i,j) \in A_k} \lambda_{i,j} - z(A_k) \right)
$$
where $\psi_k$ is the Lagrangian multiplier for region $A_k$. Setting the partial derivatives of T with respect to each λ (cell location) and each ψ to zero gives a system of n + N linear equations, where N is the number of regions and n is the number of lattice points. For a three-by-three matrix the system is Cλ = z, with the region totals, 8 and 5, on the right-hand side (that is, if the counts represent population, there are 8 individuals in one region and 5 in the other). This gives:
2.240 2.060 1.956
1.740 1.306 1.170
0.996 0.816 0.711
with T = 2.734. The module surrounding –4 is the finite difference approximation to the two-dimensional Laplacian. Taking the first (top) row with respect to the λ value positions we have:
−2 1 0 1 0 0 0 0 0
The fifth row corresponds to the case with a full set of neighbours:
0 1 0 1 −4 1 0 1 0
In the computer program written by Tobler a nonnegativity constraint has been added (362).
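The linear system for the three-by-three example can be assembled and solved directly. The sketch below is an assumption-laden illustration rather than a reproduction of Tobler's matrix: the variable names are hypothetical, the multipliers are rescaled so that the cell rows hold the finite difference Laplacian unchanged, no nonnegativity constraint is applied, and the solved surface is not guaranteed to match the quoted values to three decimal places.

```python
import numpy as np

zones = np.array([[1, 1, 1],
                  [1, 2, 2],
                  [2, 2, 2]])
totals = {1: 8.0, 2: 5.0}
rows, cols = zones.shape
n = rows * cols            # number of lattice points
zone_ids = sorted(totals)
N = len(zone_ids)          # number of regions

# Finite difference Laplacian with free edges: 1 for every neighbour and
# minus the neighbour count on the diagonal (-2 corner, -3 edge, -4 interior).
laplacian = np.zeros((n, n))
for i in range(rows):
    for j in range(cols):
        p = i * cols + j
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            a, b = i + di, j + dj
            if 0 <= a < rows and 0 <= b < cols:
                laplacian[p, a * cols + b] = 1.0
                laplacian[p, p] -= 1.0

# Region membership: one row per region, one column per cell.
membership = np.array([(zones.ravel() == k).astype(float) for k in zone_ids])

# Bordered system: n cell equations followed by N volume constraints.
C = np.zeros((n + N, n + N))
C[:n, :n] = laplacian
C[:n, n:] = membership.T
C[n:, :n] = membership
z = np.concatenate([np.zeros(n), [totals[k] for k in zone_ids]])

solution = np.linalg.solve(C, z)
surface = solution[:n].reshape(rows, cols)


def roughness(grid):
    """Sum of squared differences between vertically and horizontally adjacent cells."""
    return float((np.diff(grid, axis=0) ** 2).sum() + (np.diff(grid, axis=1) ** 2).sum())


quoted = np.array([[2.240, 2.060, 1.956],
                   [1.740, 1.306, 1.170],
                   [0.996, 0.816, 0.711]])
print(surface.round(3))
print(round(roughness(quoted), 3))
```

The first and fifth rows of `laplacian` are the two rows written out above, and evaluating `roughness` on the quoted surface returns the stated value of T = 2.734.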