Chapter 3 — Spatial data disaggregation
3.2 Data-driven spatial disaggregation approach
3.2.1 Data-driven approach without ancillary data
Pycnophylactic area interpolation
Tobler’s (1979) pycnophylactic area interpolation is a widely cited spatial disaggregation method. The underlying assumption is that the values of a spatial variable in neighbouring target regions tend to be similar and that the variable is distributed continuously over space. The method defines a continuous density surface over the study area, which is estimated from the source zone population figures. A volume-preserving requirement, the ‘pycnophylactic property’, is imposed on the density surface: the integral of the surface over each source zone is constrained to equal the known value for that zone. A smoothing function that takes the adjacent zones into account is then applied, with the aim of minimising the curvature of the estimated surface. The pycnophylactic constraint and the smoothing function are applied iteratively until the change between iterations falls below a pre-specified tolerance (see Figure 3.1). The final result is a set of values for the target areas (grid cells), spatially disaggregated from the source zones.
Figure 3.1: Pycnophylactic interpolation on a raster grid (source: Deichmann, 1996, p. 33)
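The iteration described above can be sketched in Python as follows. This is a minimal illustration, not Tobler's original implementation: it assumes a raster grid held as nested lists, a four-neighbour mean as the smoothing function, and a multiplicative rescaling step to restore each source zone total. The function name `pycnophylactic` and its parameters are hypothetical.

```python
def pycnophylactic(zone_ids, zone_totals, max_iter=500, tol=1e-6):
    """Smooth a raster surface while preserving each source zone total.

    zone_ids    : 2-D list giving the source zone label of every grid cell
    zone_totals : dict mapping zone label -> known total for that zone
    Assumptions (not from the source text): 4-neighbour mean smoothing and
    multiplicative rescaling to enforce the pycnophylactic property.
    """
    rows, cols = len(zone_ids), len(zone_ids[0])

    # Collect the cells that belong to each source zone.
    cells = {}
    for r in range(rows):
        for c in range(cols):
            cells.setdefault(zone_ids[r][c], []).append((r, c))

    # Start from a uniform density inside each zone.
    grid = [[zone_totals[zone_ids[r][c]] / len(cells[zone_ids[r][c]])
             for c in range(cols)] for r in range(rows)]

    for _ in range(max_iter):
        # Smoothing step: mean of the cell and its available 4-neighbours.
        smooth = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                values = [grid[r][c]]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        values.append(grid[rr][cc])
                smooth[r][c] = sum(values) / len(values)

        # Volume-preserving step: rescale each zone back to its known total.
        for zone, members in cells.items():
            zone_sum = sum(smooth[r][c] for r, c in members)
            if zone_sum > 0:
                factor = zone_totals[zone] / zone_sum
                for r, c in members:
                    smooth[r][c] *= factor

        # Stop once the largest cell change is within the tolerance.
        change = max(abs(smooth[r][c] - grid[r][c])
                     for r in range(rows) for c in range(cols))
        grid = smooth
        if change < tol:
            break
    return grid


# Two source zones on a 4 x 4 grid: a dense zone "A" next to a sparse zone "B".
zones = [["A", "A", "B", "B"] for _ in range(4)]
surface = pycnophylactic(zones, {"A": 80.0, "B": 8.0})
```

In this sketch the cells of zone "A" that border zone "B" end up with lower values than those farther away, while each zone still sums to its known total.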
The contribution of pycnophylactic interpolation is that it establishes two properties essential for accurate spatial disaggregation: the pycnophylactic property and spatial autocorrelation. The pycnophylactic property gives greater fidelity in approximating the target zone values within each source zone, so that the subsequent estimate for each target zone is less subject to error (Lam, 1983). Many newer methods inherit these two requirements when they are applied to spatial disaggregation problems.
scale, when smaller target zones are applied, the underlying assumption does not seem reasonable. Nordhaus (2002) comments that the pycnophylactic method over-smooths the result and tends to miss the fine gradations in the underlying data, so it might not provide an accurate disaggregation. One modification is to combine the method with the growing body of available ancillary information. For example, green areas and water bodies can be introduced into the source zones by setting their initial density to zero; interpolating the density over the remaining non-zero areas may then generate a more accurate disaggregation result.
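The initialisation step of this modification might look like the sketch below. It assumes the ancillary information is available as a Boolean raster mask (here called `excluded`, a hypothetical name) marking cells such as water bodies or green areas. Only the initial density is shown; any later smoothing would also need to skip the excluded cells so that they stay at zero.

```python
def initial_density(zone_ids, zone_totals, excluded):
    """Spread each zone total uniformly over its non-excluded cells.

    zone_ids    : 2-D list of source zone labels per grid cell
    zone_totals : dict mapping zone label -> known total
    excluded    : 2-D list of booleans, True where density is set to zero
                  (hypothetical mask derived from ancillary data)
    """
    rows, cols = len(zone_ids), len(zone_ids[0])

    # Count the cells in each zone that may receive population.
    counts = {}
    for r in range(rows):
        for c in range(cols):
            if not excluded[r][c]:
                zone = zone_ids[r][c]
                counts[zone] = counts.get(zone, 0) + 1

    # Excluded cells stay at zero; the rest share the zone total equally.
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not excluded[r][c]:
                zone = zone_ids[r][c]
                grid[r][c] = zone_totals[zone] / counts[zone]
    return grid
```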
Kernel interpolation
Kernel interpolation (Bracken 1993) is derived from an areal interpolation method that uses point interpolation procedures (Lam, 1983; Oliver and Webster, 1990). Like the pycnophylactic method, kernel interpolation assumes a continuous density over the study area. Essentially, the method assigns the attribute value of each source zone to the polygon centroid and assumes that population density declines symmetrically with distance from the centroid, following an exponential distance decay function (within a finite extent). The exponential model of population density was first proposed by Clark (1951) and is given as:
$$D_d = p_0 \exp(-B d) \qquad (3.1)$$
where $D_d$ is the estimated density at a location at distance $d$ from the centroid; $p_0$ is the central density; and $B$ is a constant.
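As a small worked illustration of the exponential decay, the sketch below evaluates the density at a few distances. It assumes the model takes the form D_d = p_0 exp(-B d); the function name `clark_density` and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def clark_density(d, p0, B):
    """Density at distance d from the centroid, assuming D_d = p0 * exp(-B * d)."""
    return p0 * math.exp(-B * d)


# Illustrative values only: central density 10000, decay constant 0.5 per unit distance.
profile = [clark_density(d, 10000.0, 0.5) for d in (0.0, 1.0, 2.0, 4.0)]
```

Under this form, each additional unit of distance multiplies the density by the same factor, exp(-B), so a larger B gives a steeper fall away from the centroid.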
Given a polygon centroid for each source zone, a moving window filter (Silverman 1986) is applied to the study area. The window is centred on each centroid in turn to estimate the population probability (weight) over a fine grid within that window. Each grid cell then receives a share of the current centroid’s population according to its distance from the centroid. The method thus provides a continuous surface of population estimates across the study area. The grid cell values may then be aggregated to the target zones. The mathematical equations of the kernel interpolation are:
$$\hat{P}_i = \sum_{j=1}^{c} P_j W_{ij}, \qquad W_{ij} = f\left(\frac{d_{ij}}{w_j}\right)$$
where $\hat{P}_i$ is the estimated population of cell $i$; $P_j$ is the empirical population at point $j$; $c$ is the total number of data points; $n$ is the total number of cells in the window; $W_{ij}$ is the weighting of cell $i$ with respect to point $j$; $w_j$ is the initial radius of the kernel window; and $d_{ij}$ is the distance from cell $i$ to point $j$. $W_{ij}$ must be normalised to sum to 1 over all cells in the window, and cell $i$ does not receive population from every point location but only from those points in whose kernel it falls.
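The allocation described above can be sketched as follows. This is a simplified illustration under stated assumptions: a fixed window radius shared by all centroids, and a truncated exponential decay as the weighting function. The weighting function actually used in the cited kernel method is not reproduced here, and the names `kernel_interpolate`, `radius` and `decay` are hypothetical.

```python
import math


def kernel_interpolate(points, cell_centres, radius, decay=1.0):
    """Distribute each centroid's population over the grid cells in its window.

    points       : list of (x, y, population) centroids
    cell_centres : list of (x, y) grid cell centres
    radius       : kernel window radius (assumed equal for all centroids)
    decay        : constant of the assumed exponential distance decay
    """
    estimates = [0.0] * len(cell_centres)
    for px, py, population in points:
        # Raw weight of every cell with respect to this centroid;
        # cells outside the window receive nothing from it.
        raw = []
        for cx, cy in cell_centres:
            d = math.hypot(cx - px, cy - py)
            raw.append(math.exp(-decay * d) if d <= radius else 0.0)

        total = sum(raw)
        if total == 0:
            continue  # no cell falls inside this centroid's window

        # Normalise the weights to sum to 1, then share out the population.
        for i, weight in enumerate(raw):
            estimates[i] += population * weight / total
    return estimates


# A 5 x 5 grid of unit cells and a single centroid in the middle cell.
cells = [(x + 0.5, y + 0.5) for y in range(5) for x in range(5)]
surface = kernel_interpolate([(2.5, 2.5, 100.0)], cells, radius=1.5)
```

Because the weights are normalised per centroid, the estimates sum to the total population of the centroids, but nothing in this sketch ties the estimates back to the boundaries of the original source zones.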
Kernel interpolation has a number of drawbacks. An important problem is that it does not conserve the total value within each source zone. The method uses the kernel window to estimate the size of the areal unit represented by the current data point, so target zones receive their population share from the kernel rather than from the original source zone. Martin (1996) modified the original kernel-based interpolation to ensure that the data reported for the target zones are constrained to match the overall sum of the source zones. Nevertheless, the choice of a control point (centroid) to represent the zone may introduce errors, because the distribution of a phenomenon is rarely symmetrical and the patterns of most socio-economic distributions are not uniform. Tobler (1999) reviewed Martin’s 1996 approach and commented that the exponential model of population density can only be considered a relevant approximation for an urban area as a whole; farther out from the urban centre, the density gradient is more nearly linear. The continued use of a centred exponential decay for every single source zone results in unrealistic density peaks, most apparent in the large zones.
Simple overlay method (Simple area weighting)
known data at the source zones. The target zone data are estimated, based on proportioning the source attribute by the area, given the geometric intersection of the source zones and the target zones. The underlying assumption is that the spatial distribution of the objects is homogeneous within each source zone. The data for each target zone can be estimated as:
$$\hat{y}_t = \sum_{s} \frac{A_{st}}{A_s} \times y_s$$
where $\hat{y}_t$ is the estimated value for target zone $t$; $y_s$ is the known value for source zone $s$; $A_s$ is the area of source zone $s$; and $A_{st}$ is the area of the intersection of the source and target zones.
The Simple Area Weighting technique is recognised as the simplest areal interpolation and disaggregation technique in use, in terms of both ease of implementation and data requirements. However, its assumption is rather restrictive for real geographical phenomena. The general critique is that it incorrectly assumes that the population density within each source zone is uniform (Fisher and Langford 1995; Langford and Fisher, 1996). Many studies have shown the low overall accuracy of the simple overlay method (see Gregory and Paul, 2005; Langford 2006; Reibel and Aditya, 2006).
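A minimal sketch of simple area weighting is given below. To keep it self-contained, zones are assumed to be axis-aligned rectangles written as (xmin, ymin, xmax, ymax); real zones would be arbitrary polygons handled by a GIS overlay. The function names and the example values are hypothetical.

```python
def overlap_area(a, b):
    """Area of the intersection of two rectangles (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def area_weighted(sources, targets):
    """Estimate a value for each target zone by simple area weighting.

    sources : list of (rectangle, known value) pairs
    targets : list of rectangles
    Each source value is split in proportion to the share of the source
    zone's area that falls inside the target zone.
    """
    estimates = []
    for target in targets:
        estimate = 0.0
        for rect, value in sources:
            source_area = (rect[2] - rect[0]) * (rect[3] - rect[1])
            estimate += value * overlap_area(rect, target) / source_area
        estimates.append(estimate)
    return estimates


# Two adjacent 2 x 2 source zones and three target zones that tile the same extent.
sources = [((0, 0, 2, 2), 100.0), ((2, 0, 4, 2), 20.0)]
targets = [(0, 0, 1, 2), (1, 0, 3, 2), (3, 0, 4, 2)]
print(area_weighted(sources, targets))  # [50.0, 60.0, 10.0]
```

The middle target zone straddles both source zones and receives half of each (50 + 10), which shows how the uniform-density assumption places an abrupt step at the source zone boundary.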
Lam (1983) provides a dated review of these earlier statistical and area overlay techniques. She concluded that the limitation of the area weighting method is that it treats each source zone as independent and does not consider the smoothness (or continuity) of the changes in values between zones, while assuming homogeneity within each source zone. Compared with the area weighting method, the pycnophylactic method represents a conceptual improvement because the effects of neighbouring source zones are taken into account and homogeneity within the zones is not required.
Accordingly, Lam (1983) suggested that simple areal weighting yields better estimates if the data distribution is discontinuous, whereas pycnophylactic interpolation provides better results when smoothness is a real property of the data distribution. The choice of a spatial disaggregation technique must therefore consider whether its underlying assumptions about spatial structure are appropriate.
3.2.2 Data-driven approach with ancillary data