THE USE OF THE VARIOGRAM CLOUD IN
GEOSTATISTICAL MODELLING
ALEXANDER PLONER*
Institute of Mathematics and Applied Statistics, University of Agricultural Sciences, Gregor Mendel Straße 33, A-1180 Wien, Austria
SUMMARY
This paper gives an overview of some of the possible applications of the variogram cloud in geostatistical modelling, mainly in exploratory data analysis (EDA), but also in preliminary parameter quantification and model validation. Copyright © 1999 John Wiley & Sons, Ltd.
KEY WORDS spatial data analysis; geostatistics; variogram cloud; exploratory data analysis
1. INTRODUCTION
The variogram cloud in its traditional sense, i.e. as a scatter plot of the variogram estimates versus the distances, is well established in geostatistical practice (e.g. Isaaks and Srivastava 1989, p. 181). In our work with geochemical datasets we have found a number of generalizations of this concept, which are useful, but less well known, and which we will present in some detail in the following sections.
1.1. Definitions
The basic tool for describing the autocorrelation structure of a spatial random process $Z(x)$ ranging over some domain $D \subset \mathbb{R}^n$ is the variogram given by

$$2\gamma(x, \tilde{x}) = \mathrm{Var}\bigl(Z(x) - Z(\tilde{x})\bigr), \qquad (1)$$

where $x$, $\tilde{x}$ are locations in $D$. As we will generally have only one observation $z_i$ per sampling location $x_i$, $i = 1, \dots, n$, we need the intrinsic hypothesis as an additional assumption, which basically amounts to

$$2\gamma(x, \tilde{x}) = E\bigl[(Z(x) - Z(\tilde{x}))^2\bigr] = 2\gamma(\tilde{x} - x), \qquad (2)$$

implying constant mean of the process $Z(x)$ and invariance to translation of its covariance structure.
CCC 1180–4009/99/040413–25$17.50. Received 11 May 1998; accepted 10 December 1998.
Environmetrics, 10, 413–437 (1999).
* Correspondence to: A. Ploner, Institute of Mathematics and Applied Statistics, University of Agricultural Sciences, Gregor Mendel Straße 33, A-1180 Wien, Austria.
The variogram is typically estimated by a method-of-moments estimator,

$$2\hat{\gamma}(h) = \frac{1}{|N(h)|} \sum_{N(h)} \bigl(Z(x) - Z(\tilde{x})\bigr)^2, \qquad (3)$$

where summation is over the index set

$$N(h) = \{(x, \tilde{x}) \mid x, \tilde{x} \in D;\ \tilde{x} - x = h\}. \qquad (4)$$

In practice, when dealing with non-gridded data, we are hardly ever able to use (4), because there will be few if any pairs of observations whose sampling locations differ exactly by some vector $h$. Therefore we have to define classes of distance vectors in order to get tolerably stable estimates, using the index set

$$N(h) = \{(x, \tilde{x}) \mid x, \tilde{x} \in D;\ \tilde{x} - x \simeq h\}, \qquad (5)$$

where the meaning of $\simeq$ depends on the definition of the tolerance regions. Typically, the estimates $2\hat{\gamma}(l_1 h_j), \dots, 2\hat{\gamma}(l_k h_j)$ calculated for some distances $l_1, \dots, l_k$ in direction $h_j$ are called the empirical variogram function in this direction. If the empirical variogram functions are the same for all directions $h_1, \dots, h_l$, then an isotropic variogram model is fitted, where the value of $2\gamma(h)$ depends only on the distance $\|h\|$. If the empirical variograms vary strongly in different directions, we will assume that we can find a linear transformation $A$ of $D$ so that the appropriate anisotropic model $2\gamma_a$ can be expressed as some isotropic model $2\gamma_i$ applied to the transformed distance vector:

$$2\gamma_a(h) = 2\gamma_i(\|Ah\|).$$
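The estimator (3) with the tolerance classes of (5) can be sketched in a few lines. The following is only an illustration under simplifying assumptions, not the S code used for this paper: it treats the isotropic case, uses equal-width distance classes as the tolerance regions, and the function name `empirical_variogram` and its parameters `bin_width` and `max_dist` are hypothetical.

```python
import math
from itertools import combinations


def empirical_variogram(coords, values, bin_width, max_dist):
    """Method-of-moments estimate of 2*gamma per distance class.

    Isotropic sketch: pairs are grouped by the length of the separation
    vector only. Class k covers distances in [k*bin_width, (k+1)*bin_width).
    """
    n_bins = math.ceil(max_dist / bin_width)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(values)), 2):
        d = math.dist(coords[i], coords[j])
        if d >= max_dist:
            continue  # pair lies beyond the largest distance class
        k = min(int(d // bin_width), n_bins - 1)
        sums[k] += (values[i] - values[j]) ** 2
        counts[k] += 1
    # (class midpoint, estimate of 2*gamma, number of pairs) per non-empty class
    return [((k + 0.5) * bin_width, sums[k] / counts[k], counts[k])
            for k in range(n_bins) if counts[k] > 0]


# Tiny made-up example: three points on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [0.0, 1.0, 3.0]
for midpoint, two_gamma, n_pairs in empirical_variogram(coords, values, 1.0, 3.0):
    print(midpoint, two_gamma, n_pairs)
```

Each unordered pair is counted once; counting both orderings, as the index set (4) formally does, would leave the class averages unchanged in the isotropic case.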
If the model is not continuous at the origin, the height of the jump is called the nugget constant; if the model takes on a constant value for distances beyond a certain limit, this value is called the sill and the limit the range of the model; the existence of a sill signifies that observations are not correlated for distances larger than the range, and equivalently that the variance of the process $Z(x)$ exists.
If we assume that $Z(x_i)$ follows a normal distribution, we can make use of the fact that the squared differences divided by the variogram are $\chi^2_1$-distributed:

$$\frac{\bigl(Z(x_i) - Z(x_j)\bigr)^2}{2\gamma(x_j - x_i)} \sim \chi^2_1. \qquad (6)$$
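The reasoning behind (6) can be spelled out in three steps. This is a sketch that assumes, slightly more strongly than marginal normality of each $Z(x_i)$, that the increments of $Z$ are themselves normally distributed:

```latex
\begin{aligned}
Z(x_i) - Z(x_j) &\sim N\bigl(0,\ 2\gamma(x_j - x_i)\bigr)
  && \text{constant mean and (2)},\\
\frac{Z(x_i) - Z(x_j)}{\sqrt{2\gamma(x_j - x_i)}} &\sim N(0, 1)
  && \text{standardization},\\
\frac{\bigl(Z(x_i) - Z(x_j)\bigr)^2}{2\gamma(x_j - x_i)} &\sim \chi^2_1
  && \text{square of a standard normal variable}.
\end{aligned}
```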
1.2. Data
The examples presented in the following sections are based on an anonymized subset of 145 observations from the large geochemical dataset that was obtained on the Kola Peninsula in northern Europe and is described in Reimann et al. (1996): the soil content of more than 30 elements at different depths was measured at more than 600 locations in an effort to assess the impact of major industrial activity in this region and to map the degradation of the particularly vulnerable arctic ecosystem. Background information and a very detailed list of references are available on an excellent webpage the Norwegian Geological Survey has created at http://dec01.ngu/no/Kola/.
The geostatistical analysis in Reimann et al. (1996) is focused on the efficient and parsimonious modelling of the huge amount of data, with the foremost intention of obtaining correct graphical representations that can be easily interpreted. The ideas and tools presented in this paper are aimed at facilitating an intuitive approach to further in-depth analysis based on more sophisticated models.
2. DEFINITION OF THE VARIOGRAM CLOUD
As mentioned above, the term `variogram cloud' usually refers to a special kind of plot (Cressie 1991, p. 41). In this article, we will call the set of all pair-wise distance vectors combined with the squared differences of the observations $z(x_1), \dots, z(x_n)$ a variogram cloud:

$$\bigl\{\bigl(x_j - x_i,\ (z(x_j) - z(x_i))^2\bigr) \mid i \neq j;\ i, j = 1, \dots, n\bigr\}.$$
In many cases this basic definition is too limited; therefore we will also consider in this article variations of this set where either the distances or the squared differences or both are suitably transformed by applying some useful function $t_l$ and/or $t_m$:

$$\bigl\{\bigl(t_l(x_j - x_i),\ t_m\bigl((z(x_j) - z(x_i))^2\bigr)\bigr) \mid i \neq j;\ i, j = 1, \dots, n\bigr\}. \qquad (7)$$

This set can then be plotted in numerous ways, depending on the focus of the analysis. Our motivation for working with this rather large dataset is mainly that we feel that we sacrifice a lot of detail visible in the set of pairs by using empirical variogram functions right from the start.
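The set (7) is straightforward to compute. The following sketch is an illustration only; the function name `variogram_cloud` and the convention of passing `t_l` and `t_m` as optional callables are assumptions made here, and each unordered pair is listed once ($i < j$) instead of in both orderings.

```python
import math
from itertools import combinations


def variogram_cloud(coords, values, t_l=None, t_m=None):
    """Return a list of (i, j, t_l(x_j - x_i), t_m((z_j - z_i)^2)).

    With t_l and t_m left as None, the separation vector and the squared
    difference are returned untransformed.
    """
    cloud = []
    for i, j in combinations(range(len(values)), 2):
        h = tuple(b - a for a, b in zip(coords[i], coords[j]))
        sq = (values[j] - values[i]) ** 2
        cloud.append((i, j,
                      t_l(h) if t_l else h,
                      t_m(sq) if t_m else sq))
    return cloud


def norm(h):
    """Euclidean length of a separation vector."""
    return math.hypot(*h)


# Made-up data: squared differences against distances (the traditional plot).
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
values = [1.0, 3.0, 2.0]
for i, j, dist, sq in variogram_cloud(coords, values, t_l=norm):
    print(i, j, dist, sq)
```

Keeping the indices `i` and `j` with every element is what makes linking a point of the cloud back to the two sampling locations possible.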
3. APPLICATIONS OF THE VARIOGRAM CLOUD
3.1. Exploratory data analysis (EDA)
3.1.1. The standard plot. What is usually referred to as the variogram cloud is just a plot of the squared differences versus the distances, i.e. a scatterplot of the set of pairs

$$\bigl\{\bigl(\|x_j - x_i\|,\ (z(x_j) - z(x_i))^2\bigr) \mid i \neq j;\ i, j = 1, \dots, n\bigr\}.$$
This graphical representation of (7) is most useful when linked interactively to a map of the sampling locations as described in Haslett et al. (1991), so that pairs of observations that are highlighted in the cloud are labelled and connected in the map, and vice versa (1:1-link); we also used 1:n-links, where clicking on a point of the map highlights all pairs of observations including this point. These techniques worked well for the detection of the following phenomena:
(i) Global outliers are measurements that are distinctly separate from the main part of the data, and which can also be spotted by classical univariate EDA, e.g. a boxplot; in a standard plot of the variogram cloud they will stand out because for every distance, the squared differences of pairs that were formed with such an outlier will be significantly larger than the rest of the cloud.
(ii) Local outliers are hidden in the main bulk of the observed data, but differ markedly from the neighbouring values. Local outliers are harder to detect than global ones, as they will result in high squared differences only for small distances close to the origin (thereby contributing to a high estimate of the nugget constant), but will behave normally for medium to large distances.
(iii) Pockets of non-stationarity are small areas within $D$ where one of the two assumptions of stationarity (2) appears to be incorrect: for non-stationarity in the mean, the same reasoning as for local outliers holds; non-stationarity of the dependency structure may be detected if a cluster of points exhibits a larger variability than the surrounding points, but areas where the variation is markedly smaller than usual will be lost in the crowded lower part of this plot.
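The idea of the 1:n-link can also be imitated numerically as a rough screening step before plotting. The sketch below is an assumption of this illustration and not part of the interactive tools described above: for every observation it collects the squared differences of all pairs that include it and reports their median, either over all partners (a large value hints at a global outlier) or only over partners within a hypothetical distance limit `max_dist` (a large value that shows up only here hints at a local outlier).

```python
import math
from statistics import median


def pair_summary(coords, values, max_dist=None):
    """Median squared difference between each observation and its partners.

    With max_dist set, only partners within that distance are considered;
    observations without any such partner get None.
    """
    summary = []
    for i in range(len(values)):
        squared = [(values[i] - values[j]) ** 2
                   for j in range(len(values))
                   if j != i and (max_dist is None
                                  or math.dist(coords[i], coords[j]) <= max_dist)]
        summary.append(median(squared) if squared else None)
    return summary


# Made-up transect with one aberrant value at index 2.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
values = [1.0, 1.0, 10.0, 1.0, 1.0]
print(pair_summary(coords, values))                # all partners
print(pair_summary(coords, values, max_dist=1.0))  # nearest neighbours only
```

Such a summary cannot replace the linked plots, since it collapses each observation's pairs into a single number, but it may help in deciding which observations to highlight first.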
Another common phenomenon in geostatistical data, which may indicate the absence of a normal distribution, is a relative paucity of values, especially if the observed variable is discrete (ppm, percentages of grade, etc.); this will result in characteristic horizontal stripes in the cloud, where there are not enough distinct values to cover the whole range of possible differences.
Examples: Figure 1 shows the variogram cloud for lead (Pb), where all pairs including observation 139 are marked with a filled circle; the impression that this is a global outlier could be confirmed by a conventional stem-and-leaf display. Figure 2 shows a variogram cloud for nickel (Ni), where pairs including observation 118 are highlighted, hinting at another global outlier; univariate EDA is inconclusive, as observation 118 is the maximum of the dataset and lies almost exactly at the usual rule-of-thumb limit for outliers (i.e. quartiles + 1.5 × (interquartile range)). We therefore decided to have a closer look at the neighbourhood via a linked map, as shown in Figure 3; it appears that the only remarkable difference is between observations 118 and 136, marked with an empty circle in both subplots, whereas there is a striking discrepancy between 136 and the surrounding observations, as can be seen in Figure 4. We therefore concluded that 118 is really an acceptable value at the high end of the range, whereas the value of Ni at location 136 is questionable in this context, although this value is well within the quartiles and not remarkable if compared with observations farther away, as illustrated by Figure 5.
Figure 6 shows an example of a `striped' variogram cloud for thorium (Th), which is due to only 15 distinct values for this variable.
3.1.2. The square-root-differences cloud. This is almost the same as above, only with the square root of the absolute differences instead of the squared differences, i.e. we consider a plot of the set

$$\bigl\{\bigl(\|x_j - x_i\|,\ |z(x_j) - z(x_i)|^{1/2}\bigr) \mid i \neq j;\ i, j = 1, \dots, n\bigr\}.$$

The motivation for taking the root is to stabilize the spread of the squared differences, and to make them less skewed. The square root of the absolute differences is the fourth root of the squared differences; the choice of the fourth root is prompted by the assumption of normality of the underlying process, because a power transformation with exponent 0.25 will make a $\chi^2$-distributed variable reasonably symmetric and platykurtic (Cressie 1991, p. 40).
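The effect of the fourth-root transformation on skewness can be checked on any dataset with a few lines of code. The sketch below is illustrative only; the names `sqrt_difference_cloud` and `skewness` are hypothetical, and the moment-based skewness coefficient is just one of several possible measures of asymmetry.

```python
import math
from itertools import combinations


def sqrt_difference_cloud(coords, values):
    """Pairs (distance, |z_j - z_i|^(1/2)) for all i < j."""
    return [(math.dist(coords[i], coords[j]),
             math.sqrt(abs(values[j] - values[i])))
            for i, j in combinations(range(len(values)), 2)]


def skewness(xs):
    """Moment coefficient of skewness (third central moment / sd^3)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    third = sum((x - mean) ** 3 for x in xs) / len(xs)
    return third / var ** 1.5


# Made-up data: compare the skewness of the squared differences with
# that of the square-root differences.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
values = [0.3, 1.9, -0.4, 0.8, 2.5]
roots = [r for _, r in sqrt_difference_cloud(coords, values)]
squares = [r ** 4 for r in roots]  # (|d|^(1/2))^4 = d^2
print(skewness(squares), skewness(roots))
```

For data that really come from a normal process, the second value should be much closer to zero than the first.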
The square-root-differences cloud is useful in much the same way as the standard cloud plot. It has the advantage of not only pulling in large values, but also of thinning out the clutter of points for small values. A comparison between the standard plot and the square-root-differences cloud may be informative for cross-checking potential outliers, but beyond that the square-root-differences cloud may help in getting a visual impression of the appropriateness of the assumption of normality.
Figure 1. Variogram cloud for variable Pb, observation 139 marked
Figure 2. Variogram cloud for variable Ni, observation 118 marked
Figure 3. Linked variogram cloud and map for variable Ni, neighbourhood of observation 118 marked
Figure 4. Linked variogram cloud and map for variable Ni, neighbourhood of observation 136 marked
Figure 5. Linked variogram cloud and map for variable Ni, two transects through observation 136 marked
Figure 6. Variogram cloud for variable Th
Examples: In Figure 7 we see the square-root cloud for the variable Pb, where all pairs with observation 139 are marked; obviously the transformation of the differences has pulled this outlying value closer in, so that it appears more palatable than in Figure 1. The differences are distributed more evenly, so that we can now see that there seems to be a problem with small differences: there are disproportionately many equal values throughout the data, and the corresponding zero differences are distinctly apart from the main bulk of the cloud; it appears that there are too many equal values to uphold the assumption of normality.
3.2. Exploratory parameter assessment (EPA)
We use the term `exploratory parameter assessment' in analogy to `exploratory data analysis' as a euphemism for initial visual parameter estimation.
3.2.1. Assessing anisotropy. In the presence of anisotropy, outlier detection or model fitting requires the identification of the anisotropy parameters beforehand. In the case of two-dimensional locations, these are the directions and lengths of the principal axes of the ellipsoid characterizing the linear transformation $A$. (Actually, the ratio between the lengths of the axes is sufficient.) We have found different representations of the untransformed variogram cloud to be of varying interest:
(i) Maybe the most obvious idea is to consider directional variogram clouds, in direct analogy to modelling anisotropy with empirical variogram functions, i.e. to consider the distances between points along a transect through $D$, and to compare these sub-clouds for different directions; this turns out to be not very satisfying for two reasons: on the one hand, we have to define some kind of tolerance region around any given angle in order to get a reasonable number of observations per plot, which is basically what we wanted to avoid, and on the other hand, it is very difficult to get a visual range estimate from a variogram cloud, which would be necessary in order to find the long and the short axis among the given directional plots.
(ii) The next obvious idea is probably a conventional 2D-symbol-plot, with the coordinates of the separation vectors on the axes, and different levels of squared differences coded with different symbols: if the underlying process $Z(x)$ is isotropic, then the values should increase at least approximately in concentric circles centered on the origin; if the rate of increase is markedly different in some directions, we might be dealing with realizations from an anisotropic process. In practice, such a plot is not easy to read; even for a modest number of observations, and using a set of symbols designed to minimize overlap (e.g. Cleveland 1994, p. 146), the plot is very crowded; and even if there are clear axes of anisotropy, it is hard to read off their directions from this kind of plot.
(iii) The shortcomings described above can be overcome by using polar coordinates for the symbol-plot. We have done so by marking the actual distances between points on the horizontal axis of the plot, and the angle on the vertical axis; a circle in the plane is then a vertical line, therefore the squared differences should be approximately vertically constant if $Z(x)$ is isotropic; a horizontal line with markedly lower values corresponds to a direction in which the squared differences grow more slowly, which may indicate that this is the direction of the shorter axis of the anisotropy ellipsoid; it is now easy to read off the actual angle of the axis, and to verify that the direction orthogonal to a potential minor axis shows fast-growing squared differences and is therefore the corresponding major axis. Besides, we can reduce the clutter by only considering angles in the range $[0, \pi)$, eliminating the redundant symmetry induced by considering both $x_i - x_j$ and $x_j - x_i$, which takes up half the plotting area in a simple symbol-plot. Another advantage of this kind of plot is that it shows clearly beyond which distance there are only pairs of observations for a subset of all possible directions. This can lead to spurious correlations in variogram estimation due to odd border and corner effects, which is why usually only pairs of observations up to half the maximum distance are included during estimation; by using a polar symbol-plot of the kind described, this limit can be easily verified and possibly also increased.
(iv) Once the orientation of the anisotropy ellipsoid has been assessed, we can use simple projections onto the planes defined by the axes of the ellipsoid and perpendicular to the plane, in order to verify the correctness of our initial estimate: a projection on the plane defined by the short axis should show a pronounced central valley in the cloud of projected differences, whereas a projection on the plane of the long axis should show a uniform wall of values forming this valley.
Figure 7. Square-root-differences cloud for variable Pb, observation 139 marked
After estimating the orientation of the ellipsoid, these plots can be used to assess the anisotropy factor by experimenting with different values until the transformed data appear to be sufficiently isotropic.
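The polar representation and the trial transformation can be sketched as follows for two-dimensional locations. This is an illustration under assumptions made here: the function names are hypothetical, and the transformation $A$ is parametrized as a rotation that aligns a chosen direction `theta` with the first axis, followed by a stretch of the orthogonal component by `factor`; other parametrizations of $A$ are equally possible.

```python
import math
from itertools import combinations


def polar_cloud(coords, values):
    """Triples (distance, angle in [0, pi), squared difference) for i < j."""
    out = []
    for i, j in combinations(range(len(values)), 2):
        dx = coords[j][0] - coords[i][0]
        dy = coords[j][1] - coords[i][1]
        # Fold h and -h onto the same angle; guard against rounding up to pi.
        angle = math.atan2(dy, dx) % math.pi
        if angle >= math.pi:
            angle = 0.0
        out.append((math.hypot(dx, dy), angle, (values[j] - values[i]) ** 2))
    return out


def anisotropic_distance(dx, dy, theta, factor):
    """Length of A*h for a trial direction theta and anisotropy factor.

    The component of h along theta is kept, the component orthogonal to
    theta is multiplied by factor (hypothetical parametrization of A).
    """
    along = dx * math.cos(theta) + dy * math.sin(theta)
    across = -dx * math.sin(theta) + dy * math.cos(theta)
    return math.hypot(along, factor * across)


# Made-up example: the same pair in either order yields the same angle.
print(polar_cloud([(0.0, 0.0), (1.0, 1.0)], [0.0, 2.0]))
print(polar_cloud([(1.0, 1.0), (0.0, 0.0)], [2.0, 0.0]))
print(anisotropic_distance(0.0, 1.0, theta=0.0, factor=2.0))
```

Replacing the plain distances by `anisotropic_distance` for several trial values of `factor` and re-plotting corresponds to the experimentation described above.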
Examples: Figure 8 shows a standard symbol-plot for variable Na; even though it gives the impression that differences are increasing more slowly along a transect angled at approximately $0.75\pi$, the situation is not very clear when compared with Figure 9, which is the same plot in polar coordinates: we can see that the angle for the short axis of the anisotropy ellipsoid is actually closer to $0.8\pi$, and that the increase of the differences is markedly stronger in the orthogonal direction of $0.3\pi$; besides, we can see that it will be a good idea to consider only distances up to approximately 175,000 m for fitting a variogram model. Figures 10 and 11 show confirmatory projection plots for $0.3\pi$/$0.8\pi$ as angles of the anisotropy axes.
Figure 8. Symbol-plot of the variogram cloud for variable Na
Figure 9. Symbol-plot of the variogram cloud for variable Na, polar coordinates
Figure 10. Projection of the variogram cloud for variable Na on a vertical plane passing through the origin at $0.3\pi$
Figure 11. Projection of the variogram cloud for variable Na on a vertical plane passing through the origin at $0.8\pi$
3.2.2. Assessing structure. Obviously, scaling the squared differences by some power $\lambda$ of the distance, i.e. plotting

$$\left\{\left(\|x_j - x_i\|,\ \frac{(z(x_j) - z(x_i))^2}{\|x_j - x_i\|^{\lambda}}\right) \,\middle|\, i \neq j;\ i, j = 1, \dots, n\right\}, \qquad (8)$$

will be useful when fitting a power model of the form $\gamma(h) = c\|h\|^{\lambda}$, but it can be useful in exploring and summarizing the dependency structure even in situations where a global plot of (8) does not make sense, because the scale will be determined by only a few observations lying closely together, so that the rest of the plot is compressed into too little space to show any detail. This can be resolved by defining some kind of cut-off limit for the scaled distances, but it is more rewarding to divide the whole range of distances into subsets that can then be plotted and scaled individually. For a simple dependency structure, we may typically find the following subdivisions:
(i) a small set of pairs markedly closer to each other than the rest of the data, i.e. those that will compress the scale on a global plot into uselessness; these pairs will be primarily responsible for the presence and size of a nugget constant in our model, so while finding a proper power relation between squared differences and distances via $\lambda$ may not be our main concern here, the identification of this set may be helpful in assessing the nugget effect.
(ii) the main part of the data, where the actual structure, i.e. the decrease in correlation with increase of distance, is displayed. Experimentation with different values may yield a $\lambda$ that describes this relationship adequately.
(iii) the set of pairs with the largest distances (in practice usually the pairs with separation larger than half the maximum distance) may show the same pattern as the main part of the data for similar $\lambda$, which generally indicates that a power model can be fitted. More often, the scaled differences will decrease with distances increasing beyond a certain limit, even for a $\lambda$ that produced a stable pattern with the rest of the data. A possible explanation for this is the presence of a sill, so that the squared differences do not increase beyond the corresponding range; another possibility is a border effect, e.g. when for a rectangular domain $D$ the observations that are separated by larger distances tend to cluster in the corners of the rectangle, thereby giving the impression of higher correlations, as mentioned in 3.2.1. In the latter case, we are more interested in the rough range estimate we get by this subdivision than in finding a reasonable $\lambda$.
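The scaled cloud (8), restricted to one of the distance subsets just described, can be computed as in the following sketch. The function name `scaled_cloud` and the parameters `d_min` and `d_max` delimiting a subset are hypothetical; the limits themselves have to be chosen by inspection of the data.

```python
import math
from itertools import combinations


def scaled_cloud(coords, values, lam, d_min=0.0, d_max=math.inf):
    """Pairs (distance, squared difference / distance**lam).

    Only pairs with d_min < distance <= d_max are returned, so that each
    subset of distances can be plotted on its own vertical scale.
    Coincident locations (distance 0) are always excluded.
    """
    out = []
    for i, j in combinations(range(len(values)), 2):
        d = math.dist(coords[i], coords[j])
        if d_min < d <= d_max:
            out.append((d, (values[j] - values[i]) ** 2 / d ** lam))
    return out


# Made-up data: small, medium and large distances treated separately.
coords = [(0.0, 0.0), (0.1, 0.0), (4.0, 0.0), (9.0, 0.0)]
values = [1.0, 1.5, 3.0, 2.0]
for lo, hi in [(0.0, 1.0), (1.0, 5.0), (5.0, math.inf)]:
    print((lo, hi), scaled_cloud(coords, values, lam=0.5, d_min=lo, d_max=hi))
```

Repeating the call for several values of `lam` and comparing the spread within each subset mirrors the experimentation described in (ii) and (iii).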
Examples: Figure 12 shows the standard plot of the variogram cloud for chromium (Cr); Figure 13 shows that scaling the cloud by the distances to the power of 0.4 stabilizes the spread uniformly over all distances, suggesting the fit of such a model.
Figure 14 shows the standard plot of the variogram cloud for lanthanum (La); overall scaling does not work here, the reason being the single pair of observations in the upper left corner of Figure 15, where we can see the scaled differences in the neighbourhood of the origin. Figure 16 shows that the scaled distances exhibit a constant spread for the power of 0.6 over a fairly wide range; note the difference in the vertical scales of Figures 15 and 16. Figure 17 finally shows that a sill seems to be reached at a distance of about 140,000 m, as the scaled differences decrease strongly for the exponent 0.6, which produced stable spread in Figure 16.
3.3. Model validation
Once a model $\gamma(h)$ has been fitted to the data, the quality of the fit should be judged, comparing it for different areas in the domain $D$ of interest. Barry (1996) uses the assumption of normality to highlight pairs of observations for which (6) is below or above specified critical quantiles of the $\chi^2_1$-distribution, in order to spot places where the fit of $\gamma(h)$ is uncomfortable. Similarly, we can plot

$$\left\{\left(\|x_j - x_i\|,\ \frac{(z(x_j) - z(x_i))^2}{2\gamma(x_j - x_i)}\right) \,\middle|\, i \neq j;\ i, j = 1, \dots, n\right\}$$

linked to a map of observations, as described in 3.1.1; extreme pairs of observations will stand out more clearly and individually.
Figure 12. Variogram cloud for variable Cr, unscaled
Figure 13. Variogram cloud for variable Cr, scaled by the distances to the power of 0.4
Figure 14. Variogram cloud for variable La, unscaled
Figure 15. Variogram cloud for variable La, scaled by the distances to the power of 0.6, small distances
Figure 16. Variogram cloud for variable La, scaled by the distances to the power of 0.6, medium distances
Figure 17. Variogram cloud for variable La, scaled by the distances to the power of 0.6, large distances
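The flagging of pairs by means of (6) can be sketched as follows. This is a minimal illustration and not the procedure of Barry (1996) in detail: the function names, the default quantile levels, and the isotropic model passed in as a callable `gamma` are assumptions made here. The $\chi^2_1$ quantiles are obtained from the standard normal distribution, since a $\chi^2_1$ variable is the square of a standard normal one.

```python
import math
from itertools import combinations
from statistics import NormalDist


def chi2_1_quantile(p):
    """p-quantile of the chi-square distribution with one degree of freedom."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2


def flag_pairs(coords, values, gamma, lower=0.025, upper=0.975):
    """Pairs whose ratio (z_j - z_i)^2 / (2*gamma(d)) is outside the quantiles.

    gamma is a fitted isotropic model taking a distance d; it must be
    positive for every distance that occurs in the data.
    """
    q_lo, q_hi = chi2_1_quantile(lower), chi2_1_quantile(upper)
    flagged = []
    for i, j in combinations(range(len(values)), 2):
        d = math.dist(coords[i], coords[j])
        ratio = (values[j] - values[i]) ** 2 / (2.0 * gamma(d))
        if ratio < q_lo or ratio > q_hi:
            flagged.append((i, j, d, ratio))
    return flagged


def linear_model(d):
    """Hypothetical fitted model gamma(h) = 0.5 * ||h|| (power model, lambda = 1)."""
    return 0.5 * d


# Made-up data: the third observation does not agree with the model.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [0.0, 0.5, 10.0]
for i, j, d, ratio in flag_pairs(coords, values, linear_model):
    print(i, j, d, ratio)
```

Because the pairs are not independent, the quantiles serve as a visual yardstick for highlighting and not as a formal test.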
4. SUMMARY
We have found the generalized concept of the variogram cloud as the set of all pairwise differences in location and observation and its graphical representations to be a promising tool in the initial stages of geostatistical modelling.
As a technical note we would like to add that we were pleasantly surprised by the speed of computation and display of the plots: even on our elderly workstation, the response time was quite good for datasets up to approximately 400 observations. For larger datasets, a faster machine, a high-end monitor, and the use of colour in our routines seem to be advisable.
The S functions used to create the figures in this article will be made available via StatLib.
REFERENCES
Barry, R. P. (1996). `A diagnostic to assess the fit of a variogram model to spatial data'. Journal of Statistical Software 1.
Cleveland, W. S. (1994). The Elements of Graphing Data. Summit, New York: Hobart Press.
Cressie, N. A. C. (1991). Statistics for Spatial Data. New York: Wiley & Sons.
Haslett, J., Bradley, R., Craig, P. S., Wills, G. and Unwin, A. R. (1991). `Dynamic graphics for exploring spatial data, with application to locating global and local anomalies'. The American Statistician 45, 234–242.
Isaaks, E. H. and Srivastava, R. M. (1989). An Introduction to Applied Geostatistics. New York: Oxford University Press.
Reimann, C., Äyräs, M., Chekushin, V., Bogatyrev, I., Boyd, R., de Caritat, P., Dutter, R., Finne, T. E., Halleraker, J. H., Jæger, Ø., Kashulina, G., Niskavaara, H., Pavlov, V., Räisänen, M. L., Strand, T., Volden, T. (1996). `A geochemical atlas of the central parts of the Barents region'. In The 6th Seminar on Hydrogeology and Environmental Geochemistry 1996, no. 96.128, 46–47.