Threshold, Identification of Data Outliers and Element Sources
7.3 Including the spatial distribution in the definition of background
7.3.2 The concentration-area plot
It has been demonstrated above that it is useful to consider the spatial scale of the distribution of the data in selecting break points or thresholds in plots of the distribution function. The concentration-area (CA-) plot achieves this by looking at the fractal structure of the data (Cheng et al., 1994; Cheng and Agterberg, 1995). Many patterns in nature are replicating, or self-similar (Mandelbrot, 1982; Barnsley and Rising, 1993). Examples are the shape of fern leaves, a snowflake, or the dendritic patterns of manganese hydroxide coatings on the parting faces of rocks in desert environments. Geochemical patterns can also be investigated to determine if they are “fractal”. Normally geochemical maps are thought of as two-dimensional; however, if they are spatially non-random they will have a fractal dimension between two and three. It is this partial dimension that contains spatial information on the structure of the data displayed in the two normal map dimensions.
To use a CA-plot the data need to be relatively evenly spaced across the survey area. There should also be no major gaps in the data set, e.g., large areas where the sample medium is absent. The Kola data meet this requirement, and Figures 7.3 and 7.4 demonstrate the CA-plot procedure for the O-horizon soil data. First the data are spatially interpolated (see Section 5.6) to a fine (100 × 100) regular grid, increasing the number of data points so they can be used as a surrogate for measuring areas. Then the interpolated values outside the survey area boundary are trimmed away, so there is no interpolation to points outside the survey area. The original and interpolated data values together define the x-axis of the CP-plots displayed in the upper two plots of Figure 7.3. The number of values plotted is of course greatly increased by the spatial interpolation procedure.
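The gridding and trimming steps can be illustrated with a minimal Python sketch. It is not the procedure used for the Kola data: the function names, the use of cell centres, the ray-casting boundary test and the triangular "survey area" are all assumptions made for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def regular_grid(xmin, xmax, ymin, ymax, nx=100, ny=100):
    """Cell-centre coordinates of an nx-by-ny grid over a bounding box."""
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    return [(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
            for j in range(ny) for i in range(nx)]


def trim_to_boundary(grid_points, boundary):
    """Keep only the grid points that fall inside the survey boundary."""
    return [(x, y) for x, y in grid_points
            if point_in_polygon(x, y, boundary)]


# Hypothetical triangular survey area inside a 10 x 10 bounding box.
boundary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
grid = regular_grid(0.0, 10.0, 0.0, 10.0)
kept = trim_to_boundary(grid, boundary)
```

Because all grid cells have the same size, the number of retained points is proportional to area: here roughly half of the 10 000 cells remain, matching the triangle covering half of the bounding box.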
In the CA-plot (Figure 7.3, lower right) the y-axis shows the percentage of the interpolated values, plotted on a logarithmic scale, that are larger (or smaller) than each value plotted on a logarithmic scale on the x-axis.
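The coordinates of such a plot can be computed directly from the list of interpolated grid values. The following Python sketch is an illustration only; the function name and the treatment of tied values (counted inclusively, so the percentage refers to cells at least as large, or at least as small, as each value) are assumptions.

```python
def ca_plot_points(grid_values, downwards=True):
    """Return (value, percent_area) pairs for a CA-plot.

    downwards=True : percent of grid cells with values >= each value
    downwards=False: percent of grid cells with values <= each value
    Both coordinates are meant to be drawn on logarithmic axes.
    """
    n = len(grid_values)
    ordered = sorted(grid_values, reverse=downwards)
    points = []
    for rank, value in enumerate(ordered, start=1):
        # Emit one point per distinct value, at the last cell of a tie.
        is_last_of_tie = rank == n or ordered[rank] != value
        if is_last_of_tie:
            points.append((value, 100.0 * rank / n))
    return points


# Five made-up grid values (not Kola data).
values = [2, 5, 5, 20, 400]
down = ca_plot_points(values, downwards=True)
# -> [(400, 20.0), (20, 40.0), (5, 80.0), (2, 100.0)]
up = ca_plot_points(values, downwards=False)
# -> [(2, 20.0), (5, 60.0), (20, 80.0), (400, 100.0)]
```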
As the interpolated points are on a fine regular grid the number of points is a surrogate for the area of the interpolated map that is larger (or smaller) than the corresponding value on the x-axis. Thus the CA-plot displays the relationship between the percentage of the survey area that has a particular value and the actual value. Background levels will occur frequently and will represent the majority of the survey area, and low or high extreme value-containing areas will represent small percentages of the survey area. A single straight line indicates a single fractal relationship controlling the data distribution; multiple straight lines indicate multiple fractal processes controlling the data distribution.
The choice of interpolation algorithm and data transformation will influence the CA-plot. Firstly, the data distribution should be symmetrical so that extreme values do not over-influence the resulting interpolation. This is often achieved in applied geochemical data with a logarithmic transform. Secondly, in interpolation the data should not be excessively smoothed as that will “smear out” the very features being sought. To avoid this, simple triangulation is employed for the interpolation.
Figure 7.3 Concentration-area (CA-) plot (lower right), cumulated downwards, for Cu (mg/kg) in Kola O-horizon soil samples. The upper left plot is the CP-plot of the original data and the upper right plot is the CP-plot for the interpolated data. The lower left image is a grey-scale map of the interpolated data
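To illustrate why triangulation does little smoothing, the Python sketch below interpolates linearly inside a single triangle of sample sites, working on log10-transformed concentrations. It assumes that "simple triangulation" means linear (barycentric) interpolation within each triangle; the function names and the example values are hypothetical, and building the triangulation itself is not shown.

```python
import math


def barycentric_weights(p, a, b, c):
    """Weights of triangle vertices a, b, c at point p (they sum to 1)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def interpolate_in_triangle(p, vertices, concentrations, tol=1e-12):
    """Interpolate linearly on the log10 scale, then back-transform.

    Returns None when p lies outside the triangle, so nothing is
    extrapolated beyond the sample sites.
    """
    weights = barycentric_weights(p, *vertices)
    if min(weights) < -tol:
        return None
    log_value = sum(w * math.log10(c)
                    for w, c in zip(weights, concentrations))
    return 10 ** log_value


# Three hypothetical sample sites with Cu values of 10, 100 and 1000 mg/kg.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cu = [10.0, 100.0, 1000.0]
at_centroid = interpolate_in_triangle((1 / 3, 1 / 3), sites, cu)
outside = interpolate_in_triangle((2.0, 2.0), sites, cu)
```

Each interpolated value depends only on the three surrounding samples, and the measured values are reproduced exactly at the sample sites, so local features are not averaged away over a wider neighbourhood.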
The CA-plot in Figure 7.3 (lower right) has been prepared by ordering the interpolated values from the maximum value downwards. As the areas are plotted on a logarithmic scale, this results in scale expansion and structural (fractal) detail being easily observable for high values, in this case related to the smelter facilities in the survey area. It is apparent that most of the data follow a single fractal relationship, with another relationship present at low levels and a more complex relationship above 500 mg/kg Cu near the smelters. Thus the distribution of the O-horizon Cu-data appears to be dominated by smelter-related processes. This does not help determine the lower limit of smelter influence; for this the CA-plot needs to focus on the lower end of the fractal distribution. This is achieved by ordering the interpolated values from lowest to highest (see Figure 7.4, left). As distinct from the downwards cumulated display (Figure 7.4, right), it is immediately apparent that there are two major fractal processes present, one below about 10 mg/kg Cu, and one above.
Figure 7.4 Concentration-area (CA-) plot, cumulated upwards (left) and downwards (right – see Figure 7.3), for Cu (mg/kg) in Kola O-horizon soil samples. Both panels (n = 8627) show Cu in O-horizon [mg/kg] on a logarithmic x-axis (2 to 5000) and % cumulative area < (left) or > (right) the values on the x-axis on a logarithmic y-axis (0.01 to 100)
The lower process is that controlling the regional background distribution of Cu in the O-horizon soil, and the upper process is related to the atmospheric dispersion of Cu from the smelter facilities. The fine detail present at the highest levels (>500 mg/kg) in Figure 7.4 (right) relates to very local processes close to the point sources, possibly fugitive releases of coarser-grained material incapable of long-range transport, or other local smelter-related activities.
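In the text the break between the two straight-line segments is read from the plot by eye. As a rough numerical cross-check, one could search for the concentration at which splitting the log-log CA-plot into two least-squares lines gives the smallest total squared error. The Python sketch below is such an assumed automation, not a method described in this chapter; the function names, the `min_points` parameter and the synthetic data are hypothetical.

```python
import math


def line_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys))


def best_break(values, percents, min_points=3):
    """Concentration that starts the second of two best-fitting segments."""
    if len(values) < 2 * min_points:
        raise ValueError("too few points for a two-segment fit")
    pairs = sorted(zip(values, percents))
    lx = [math.log10(v) for v, _ in pairs]
    ly = [math.log10(p) for _, p in pairs]
    best = None
    for k in range(min_points, len(lx) - min_points + 1):
        sse = line_sse(lx[:k], ly[:k]) + line_sse(lx[k:], ly[k:])
        if best is None or sse < best[0]:
            best = (sse, pairs[k][0])
    return best[1]


# Synthetic upwards-cumulated curve with a change of slope at 10 mg/kg.
demo_values = [10 ** (0.2 * i) for i in range(15)]
demo_percents = [
    10 ** (-2 + 2 * math.log10(v)) if i <= 5
    else 10 ** (0.5 * (math.log10(v) - 1))
    for i, v in enumerate(demo_values)
]
demo_break = best_break(demo_values, demo_percents)
```

For the synthetic curve the reported break falls at or just above the 10 mg/kg change of slope. With real data, such a search should only supplement visual inspection of the plot, since more than two segments may be present.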
It is interesting to note that the use of a simple triangulation interpolation procedure does not materially change the shape of the data distribution (compare the two upper plots in Figure 7.3).
A comparison of the CA-plot (lower right) with the CP-plot (upper left) of the data is most informative. In Figure 7.3 not a lot of new information is learnt: the gaps in both plots are similar; however, Figure 7.4 (left) provides real new insight into the importance of the flexure in the CP-plot at around 13 mg/kg Cu (see previous discussion in Section 7.3.1).
Previously in this chapter, threshold estimates for Cu (mg/kg) in Finnish O-horizon soils of 12 (“MEAN ± 2 · SD”, original scale), 13 (“MEAN ± 2 · SD”, log-scale), 12 (boxplot upper fence, original scale), 13 (boxplot upper fence, log-scale), 10 (“MEDIAN ± 2 · MAD”, original scale), 11 (“MEDIAN ± 2 · MAD”, log-scale), and 13 (98th percentile) have all been made (see Tables 7.1 to 7.4). The upwards cumulated CA-plot confirms in a most convincing way that a useful threshold for Cu in O-horizon soils is in the 10 to 13 mg/kg range. This information would be most useful for developing a map of Cu in O-horizon soils where a symbol change or isoline at somewhere between 10 and 13 mg/kg would indicate the limit of measurable influence (in the O-horizon Cu data) of the smelter facilities (compare Figure 7.2, right). What is apparent is that in the context of the Kola Project the data from Finland provide a good estimate for the natural background range of the whole survey area. This also demonstrates how far one might have to go from a major point source of anthropogenic contamination to find a suitable area in which to establish a natural background range. Fortunately, northern Finland is generally geologically, pedologically and ecologically similar to the Kola Peninsula; if this were not the case, as might be in other studies, finding a suitable “background” area could pose real challenges.
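For readers who want to reproduce the kind of upper threshold estimates quoted above, the following Python sketch computes "MEAN + 2 · SD", "MEDIAN + 2 · MAD" and the boxplot upper fence on the original or the log scale. The function names are hypothetical, and the details are assumptions rather than the exact definitions behind Tables 7.1 to 7.4: the MAD is scaled by 1.4826, quartiles use the "inclusive" method of Python's statistics module, and the fence is taken as Q3 + 1.5 · IQR. The example values are made up and are not Kola data.

```python
import math
import statistics

MAD_SCALE = 1.4826  # assumed scaling: makes the MAD comparable to the SD


def mean_2sd(x):
    """Upper threshold as mean plus two standard deviations."""
    return statistics.mean(x) + 2 * statistics.stdev(x)


def median_2mad(x):
    """Upper threshold as median plus two scaled MADs."""
    med = statistics.median(x)
    mad = MAD_SCALE * statistics.median([abs(v - med) for v in x])
    return med + 2 * mad


def upper_fence(x):
    """Boxplot upper fence, here taken as Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def on_log_scale(estimator, x):
    """Apply an estimator to log10 values and back-transform the result."""
    return 10 ** estimator([math.log10(v) for v in x])


# Made-up concentrations with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
classical = mean_2sd(sample)
robust = median_2mad(sample)
fence = upper_fence(sample)
robust_log = on_log_scale(median_2mad, sample)
```

In this made-up sample the single extreme value inflates the mean-based estimate far above the two robust estimates, which is why several estimators are usually compared side by side.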