Class selection - Some common mistakes in geochemical mapping

Mapping Spatial Data

5.9 Some common mistakes in geochemical mapping

5.9.5 Class selection

Figure 5.14 Four maps showing the distribution of As in C-horizon soils of the Kola Project area. Percentile classes and the EDA symbol set are used for mapping. To get a clear visual impression of the data distribution, the size of the symbols needs to be adjusted to the scale of the map.

Often geochemists use “tidy” but arbitrarily chosen class intervals, e.g., 0.5, 1, 1.5, . . . ; or 5, 10, 20, 30, . . . ; or 10, 20, 50, 100, 200, . . . mg/kg. This approach looks orderly in the legend, but the legend itself is not the most important part of a geochemical map. Such arbitrarily picked classes bear no relation to the spatial or statistical structure of the mapped data. They often result in uninformative, “dull” geochemical maps that fail to reveal the underlying geochemical processes causing regional geochemical differences. Arbitrary classes should therefore be avoided. As a historical note, arbitrary classes were extensively used when optical spectroscopy was employed as a geochemical analysis tool and the plates or filmstrips were read by eye with a comparator. This tended to lead to clustering of the reported values around the standards used for comparison. Thus the class boundaries were set at the mid-points between the standards, often on a logarithmic scale, e.g., with standards at 1, 2, 5 and 10 units, class boundaries were set at 1.5, 3 and 7.
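To illustrate why tidy class limits can hide the data structure, the following minimal Python sketch counts the samples that fall into each class, once for arbitrary limits and once for quartile-based limits. The concentrations, the function name `class_counts` and the chosen limits are hypothetical and serve only as an example.

```python
import bisect
import statistics
from collections import Counter


def class_counts(values, breaks):
    """Number of samples per class for a sorted list of class boundaries."""
    counts = Counter(bisect.bisect_right(breaks, v) for v in values)
    return [counts.get(i, 0) for i in range(len(breaks) + 1)]


# Hypothetical, right-skewed concentrations in mg/kg (illustrative only).
values = [0.1, 0.2, 0.3, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8,
          0.9, 1.0, 1.2, 1.5, 1.9, 2.4, 3.1, 4.5, 8.0, 30.0]

tidy_breaks = [5, 10, 20, 30]  # "tidy" but arbitrary class limits
quartile_breaks = statistics.quantiles(values, n=4, method="inclusive")

tidy = class_counts(values, tidy_breaks)
data_driven = class_counts(values, quartile_breaks)
print("tidy classes:    ", tidy)
print("quartile classes:", data_driven)
```

With these illustrative values, 18 of the 20 samples fall into the lowest tidy class, so a map based on those limits would show almost no variation, whereas each quartile class holds five samples.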

Figure 5.15 Maps showing the distribution of As in C-horizon soils of the Kola Project area. Proportional dots according to the exponential dot size function (left) in direct comparison to growing dots scaled according to percentile classes (right).

Another approach that often fails to reveal meaningful patterns is to base the classes on the mean plus or minus multiples of the standard deviation. This approach is based on the assumption that the data follow a (log)normal distribution. In this case, and only in this case, the approach provides an easy method of identifying the uppermost 2.5 per cent of the data (values above the mean plus two standard deviations) as “extreme values”. However, as has long been recognised, most applied geochemical survey measurements do not follow a lognormal distribution. Some practitioners choose to assume that geochemical data follow a lognormal distribution because this facilitates the use of classical statistics by taking the logarithms of the univariate values (Reimann and Filzmoser, 2000d). Regrettably, modern (often “robust”) statistical procedures that are better suited to geochemical data are often omitted from basic statistics courses, and this may explain why the “mean plus standard deviation” approach is still so popular in geochemistry (Reimann et al., 2005c).

The problem is that for data (or their logarithms) that are not normally distributed, but reflect a whole range of different processes and thus are multimodal, the mean, and especially the standard deviation, are not good measures of central tendency and spread (variability). For example, all class boundaries defined by this approach are strongly influenced by the outliers. Yet identifying statistical outliers is often one of the reasons why the samples were collected in the first place (Reimann et al., 2005c). Geochemists should recognise the important distinction between the extreme values of a normal (or lognormal) distribution (detected satisfactorily by the mean ± standard deviation approach) and outliers due to multimodal distributions (e.g., several “overlapping” geochemical processes), where this approach is unfounded.
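The influence of outliers on the class boundaries can be checked numerically. The minimal Python sketch below compares how far the upper boundary (centre plus two spread units) moves when a few samples from a second, much higher population are added to a background population. The simulated log-concentrations, the function names and the use of the median and the MAD (median absolute deviation) as the robust counterpart are assumptions made for this illustration only.

```python
import random
import statistics


def mean_sd_thresholds(values, k=2.0):
    """Classical class boundaries: mean - k*SD and mean + k*SD."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return m - k * s, m + k * s


def median_mad_thresholds(values, k=2.0):
    """Robust counterpart: median -/+ k*MAD (MAD scaled by 1.4826)."""
    med = statistics.median(values)
    mad = 1.4826 * statistics.median(abs(v - med) for v in values)
    return med - k * mad, med + k * mad


random.seed(1)
# Hypothetical log10 concentrations: a background population plus a few
# samples from a second, much higher process (a bimodal mixture).
background = [random.gauss(0.0, 0.3) for _ in range(200)]
anomalous = [random.gauss(2.0, 0.2) for _ in range(10)]
mixed = background + anomalous

# Shift of the upper boundary caused by only ten high samples.
classical_shift = mean_sd_thresholds(mixed)[1] - mean_sd_thresholds(background)[1]
robust_shift = median_mad_thresholds(mixed)[1] - median_mad_thresholds(background)[1]
print(f"classical upper boundary moves by {classical_shift:.2f} log units")
print(f"robust upper boundary moves by {robust_shift:.2f} log units")
```

In this simulation the classical boundary is pulled towards the second population, the very samples it is meant to flag, while the median-based boundary stays close to the background.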

Even when using percentile-based classes, many geochemists accentuate differences among the high values at the expense of variations at lower concentrations (Figure 5.12, lower right). This results in a map that focusses completely on the high values and neglects the lower end of the distribution. In some studies, e.g., of trace element deficiencies, this is completely inappropriate. Focussing on the uppermost end of the distribution again neglects (as with arbitrarily chosen classes) the fact that there exists something called the “data structure”.

5.10 Summary

Environmental data are characterised by their spatial nature. Mapping should thus be an integral and early part of any data analysis. Once the statistical data distribution is documented (Chapters 3 and 4) the task is to provide a graphical impression of the “spatial data structure”. There are many different procedures for the production of maps, but only a few provide a graphical impression of the spatial data structure. As a first step the focus should not be on the high values alone but rather on an objective map of the complete data distribution of all studied variables. Classes need to be carefully spread over the whole data range; the log-boxplot or percentiles are well suited for class selection. They need to be combined with a suitable symbol set (e.g., EDA symbols).
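A minimal Python sketch of data-driven class selection is given below. It derives class boundaries once from percentiles and once from a boxplot computed on the log-transformed values, then assigns each sample to a class. The percentiles 5, 25, 75 and 95, the use of the boxplot fences (hinges -/+ 1.5 times the hinge spread) rather than the whisker ends, the sample values and all function names are assumptions made for this illustration.

```python
import bisect
import math
import statistics


def percentile_breaks(values, percents=(5, 25, 75, 95)):
    """Class boundaries at the given percentiles of the data."""
    # With n=100, statistics.quantiles returns the 1st..99th percentiles.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return [cuts[p - 1] for p in percents]


def log_boxplot_breaks(values):
    """Boundaries from a Tukey boxplot computed on log10 values:
    lower fence, lower hinge, upper hinge, upper fence, back-transformed."""
    logs = [math.log10(v) for v in values]
    q1, _, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    spread = q3 - q1
    fences = (q1 - 1.5 * spread, q1, q3, q3 + 1.5 * spread)
    return [10 ** f for f in fences]


def assign_class(value, breaks):
    """Index of the class a value falls into (0 = lowest class)."""
    return bisect.bisect_right(breaks, value)


# Hypothetical As concentrations in mg/kg (illustrative values only).
arsenic = [0.1, 0.2, 0.3, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8,
           0.9, 1.0, 1.2, 1.5, 1.9, 2.4, 3.1, 4.5, 8.0, 30.0]

pct = percentile_breaks(arsenic)
box = log_boxplot_breaks(arsenic)
classes = [assign_class(v, pct) for v in arsenic]
print("percentile breaks: ", [round(b, 2) for b in pct])
print("log-boxplot breaks:", [round(b, 2) for b in box])
print("class of each sample:", classes)
```

Each class index can then be mapped to one symbol of the chosen symbol set, so that both ends of the distribution remain visible on the map.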

If only “highs” and “lows” are of interest, proportional dot maps provide nice, and apparently “easy”, maps. Just as the statistical data distribution should be documented in a series of summary plots, the spatial data distribution needs to be documented in a series of maps of all variables before more advanced data analysis techniques are applied. Black and white maps are best suited for this purpose. Map scale and a north arrow should always accompany a map. An easily legible legend is a further requirement.

More advanced mapping techniques will be used at a later stage, when surface, contour or isopleth maps are designed for “presentation”. For this purpose point data need to be interpolated to surface data. Surface maps can be highly manipulative; the choice of parameters and the choice of colour are important considerations. When kriging is used for constructing surface maps a solid understanding of the method is required. Maps prepared by smoothing are not based on statistical assumptions in the same way as kriged maps. If the additional information from kriging (semivariogram, kriging variance) is not required, smoothing maps may be the better choice.
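How strongly the choice of parameters shapes an interpolated surface can be seen in a small example. The Python sketch below uses inverse distance weighting, chosen here only as a simple stand-in for a smoothing interpolator; it is not the smoothing procedure or the kriging referred to in the text. The function `idw_estimate`, its `power` and `radius` parameters and the sample sites are hypothetical.

```python
import math


def idw_estimate(x, y, samples, power=2.0, radius=None):
    """Inverse-distance-weighted estimate at (x, y).

    samples: iterable of (xs, ys, value). power and radius are tuning
    parameters chosen by the map maker, not derived from the data.
    """
    num = den = 0.0
    for xs, ys, value in samples:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value  # exactly on a sample site
        if radius is not None and d > radius:
            continue  # ignore samples outside the search radius
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den if den > 0.0 else float("nan")


# Hypothetical sample sites: (easting, northing, concentration).
sites = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0), (0.0, 10.0, 5.0), (10.0, 10.0, 7.0)]

centre = idw_estimate(5.0, 5.0, sites)            # equidistant from all sites
near_first = idw_estimate(1.0, 1.0, sites)        # close to the site with value 1
sharp = idw_estimate(1.0, 1.0, sites, power=4.0)  # same point, higher power
print(centre, near_first, sharp)
```

The same four samples give different surface values at the same location once the power is changed, which is one reason why the parameters behind a surface map should be chosen and reported with care.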


In document Statistical Data Analysis Explained (Page 110-114)