Deriving models of spatial variation for individual indicators to estimate mean values for land cover classes

Indicators other than soil organic carbon. For the indicators other than carbon we assume that:

• The indicator has a normal distribution or a log-normal distribution.

• The mean value of the indicator varies according to land class.

• The spatial covariance may be represented by an exponential model.

To fit such a model to the available data:

• If both CS and NSI or NSIS data are available, add a constant to the CS data such that both data sets have the same median.

• Decide whether a log transform is required for the assumption of normality to be valid. A log transform is made if the skew of the indicator is larger than one.

• Make a first approximation to the mean value of the indicator over each Land Cover class by simple averaging.

• Subtract the appropriate mean from each indicator value and calculate a point estimate of the variogram for each data set by the method of moments.

• If both data sets are available then scale the CS data such that the average of the point estimates over the five largest lag distances is the same for each data set. Then combine the data sets and calculate a single point estimate of the variogram. Note that when making these point estimates, pair comparisons between observations from different data sets are not included.

• Fit an exponential variogram to the point estimate by weighted least squares.

• Use this estimated variogram to re-estimate the mean over each Land Cover class by generalised least squares.
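The first steps of the list above can be sketched in code. The following is a minimal illustration using only the Python standard library, not the authors' implementation: the function names, the population form of the skewness statistic, the use of the natural log, and the equal-width lag bins are all assumptions made for this example.

```python
import math
from collections import defaultdict


def skewness(values):
    """Skewness as the third standardised moment (population form, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def maybe_log_transform(values, threshold=1.0):
    """Apply a natural log if the skew exceeds the threshold (values must be > 0)."""
    if skewness(values) > threshold:
        return [math.log(v) for v in values], True
    return list(values), False


def class_residuals(values, classes):
    """Subtract the simple average of each land cover class from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, c in zip(values, classes):
        sums[c] += v
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [v - means[c] for v, c in zip(values, classes)], means


def empirical_variogram(coords, residuals, bin_width, n_bins):
    """Method-of-moments estimate: half the mean squared difference per lag bin.

    Returns a list of (lag centre, semivariance or None, number of pairs).
    """
    sq_sums = [0.0] * n_bins
    pairs = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            b = int(h // bin_width)
            if b < n_bins:
                sq_sums[b] += (residuals[i] - residuals[j]) ** 2
                pairs[b] += 1
    return [
        ((b + 0.5) * bin_width,
         sq_sums[b] / (2 * pairs[b]) if pairs[b] else None,
         pairs[b])
        for b in range(n_bins)
    ]
```

The double loop over pairs is quadratic in the number of observations, so a real application on large data sets would need a spatial index or a maximum lag cut-off; that refinement is omitted here.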

We are aware that the estimation of variograms by method of moments from ordinary least squares residuals is not unbiased. However, alternative (likelihood) methods would be computationally prohibitive on these large data sets. Another consequence of the size of the data sets is that any bias should be small.
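For reference, the generalised least squares step in the list above can be written in its standard textbook form. The notation is an assumption introduced here rather than taken from the report: z is the vector of (possibly log-transformed) observations, X is the indicator matrix assigning each observation to its Land Cover class, and V is the covariance matrix implied by an exponential variogram with nugget c_0, partial sill c_1 and distance parameter a, assuming second-order stationarity of the residuals.

```latex
\hat{\boldsymbol{\beta}}
  = \left( \mathbf{X}^{\mathsf{T}} \mathbf{V}^{-1} \mathbf{X} \right)^{-1}
    \mathbf{X}^{\mathsf{T}} \mathbf{V}^{-1} \mathbf{z},
\qquad
V_{ij} =
\begin{cases}
  c_0 + c_1, & i = j, \\
  c_1 \, e^{-h_{ij}/a}, & i \neq j,
\end{cases}
```

Here h_{ij} is the distance between observations i and j, and each element of the estimated vector is the mean for one Land Cover class.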

In Figure 3.3 we illustrate this process for soil cadmium over England and Wales. The distribution of cadmium observations from the NSI E&W is positively skewed (skew = 2.18). Therefore a log transform was applied, which reduced the skew to -0.47 and removed the long tail in the histogram. The point estimates from the NSI data (O’s in Figure 3.3) were smooth but there were no estimates for lag distances less than 5 km. The point estimates from the CS data (Xs in Figure 3.3) were noisier but provide information from lag distances of <1 km. Point estimates over the shortest lags are known, in general, to be more accurate than point estimates over longer lags. The CS data were transformed as described above to ensure that their median and sill variance matched the NSI E&W data. A point estimate of the variogram was then made for the combined data sets (+’s in Figure 3.3) and an exponential model (continuous line in Figure 3.3) fitted to this point estimate by weighted least squares.
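The matching of the CS data to the NSI data, and the pooled point estimate that excludes cross-data-set pairs, might look like the sketch below. It is an illustration under stated assumptions, not the report's code: the function names are hypothetical, the semivariance lists are assumed to be ordered by increasing lag with at least five entries, and the scale factor is applied to the residuals (so the semivariance changes by the square of the factor).

```python
import math
from statistics import median


def shift_to_common_median(cs_values, reference_values):
    """Add a constant to the CS values so both data sets share the same median."""
    offset = median(reference_values) - median(cs_values)
    return [v + offset for v in cs_values]


def sill_scale_factor(cs_gamma, reference_gamma, n_lags=5):
    """Multiplier for CS residuals that equalises the mean semivariance
    over the n_lags largest lag distances (lists ordered by increasing lag)."""
    cs_sill = sum(cs_gamma[-n_lags:]) / n_lags
    ref_sill = sum(reference_gamma[-n_lags:]) / n_lags
    return math.sqrt(ref_sill / cs_sill)


def pooled_variogram(points, bin_width, n_bins):
    """Method-of-moments semivariance per lag bin for combined data.

    points: list of (x, y, residual, dataset_label).
    Pairs of observations from different data sets are skipped.
    """
    sq_sums = [0.0] * n_bins
    pairs = [0] * n_bins
    for i, (xi, yi, ri, di) in enumerate(points):
        for xj, yj, rj, dj in points[i + 1:]:
            if di != dj:
                continue  # no cross-data-set pair comparisons
            b = int(math.hypot(xi - xj, yi - yj) // bin_width)
            if b < n_bins:
                sq_sums[b] += (ri - rj) ** 2
                pairs[b] += 1
    return [s / (2 * n) if n else None for s, n in zip(sq_sums, pairs)]
```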

Note: The two graphs at the top are histograms showing the frequency of observations in intervals of (left) cadmium concentration and (right) log of cadmium concentration. The lower graphs are variograms showing how the semi-variance depends on lag distance (lag distance: distance class interval; semi-variance: a measure of variability between data points at each distance (lag)).

Figure 3.3 Fitting a spatial model for soil cadmium concentration (mg kg⁻¹) over England and Wales.

The variogram is one way of representing the spatial variance model. It shows how the variance of the difference between two observations of the soil (coordinate on the Y-axis of the graph) depends on the distance between them (the lag distance on the X-axis of the graph). The variogram typically approaches a 'sill' value at some lag distance, and at this and longer distances two observations are uncorrelated with each other. The resultant graph indicates that model-based estimation should only be applied if the sampling points are located within 50 km of each other; beyond this there is no spatial correlation between observations to exploit in estimating land-cover class means.
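To make the sill behaviour concrete, the sketch below defines an exponential variogram and fits it to point estimates by weighted least squares. It is a hedged illustration rather than the fitting routine used for Figure 3.3: the parameterisation (nugget, partial sill, distance parameter a), the grid search over a, and the choice of weights (for example the number of pairs per lag) are assumptions, and no non-negativity constraints are imposed on the fitted parameters.

```python
import math


def exp_variogram(h, nugget, partial_sill, a):
    """Exponential variogram: rises from the nugget towards nugget + partial_sill."""
    return nugget + partial_sill * (1.0 - math.exp(-h / a))


def fit_exponential_wls(lags, gammas, weights, a_grid):
    """Weighted least squares fit.

    For each candidate distance parameter a, the nugget and partial sill
    enter linearly and are solved from the 2x2 weighted normal equations;
    the candidate with the smallest weighted residual sum of squares wins.
    Returns (nugget, partial_sill, a).
    """
    best = None
    for a in a_grid:
        f = [1.0 - math.exp(-h / a) for h in lags]
        sw = sum(weights)
        sf = sum(w * x for w, x in zip(weights, f))
        sff = sum(w * x * x for w, x in zip(weights, f))
        sg = sum(w * g for w, g in zip(weights, gammas))
        sfg = sum(w * x * g for w, x, g in zip(weights, f, gammas))
        det = sw * sff - sf * sf
        if abs(det) < 1e-12:
            continue  # degenerate candidate, skip it
        nugget = (sg * sff - sf * sfg) / det
        partial_sill = (sw * sfg - sf * sg) / det
        rss = sum(w * (g - nugget - partial_sill * x) ** 2
                  for w, x, g in zip(weights, f, gammas))
        if best is None or rss < best[0]:
            best = (rss, nugget, partial_sill, a)
    if best is None:
        raise ValueError("no usable candidate in a_grid")
    return best[1:]
```

An exponential model only approaches its sill asymptotically; it reaches about 95 per cent of the partial sill at a lag of 3a, which is commonly quoted as the practical range.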

Soil organic carbon. We made a number of changes to the above method when modelling the spatial variation of SOC. A more complicated model of spatial variation was assumed for the status and change in SOC than for the status of the other indicators. This was due to the greater importance attached to monitoring soil organic carbon and to its more complicated spatial variation compared with the other indicators. For reference, the technical detail was that rather than assuming that the spatial covariance model of the random component of variation is the same over each land class (as previously), we assumed that the spatial correlation of the random component of variation is the same over each Land Cover class but that the variance of the random component varies according to land class.
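One way to write the difference between the two models is given below. The notation is an assumption made for illustration: ε is the random component, ρ(h) is the shared spatial correlation function, k(i) is the land class of observation i, and the product form for pairs in different classes is one common choice that the text above does not spell out.

```latex
\begin{align}
\text{Other indicators:}\quad
  \operatorname{Cov}\!\left[\varepsilon(\mathbf{x}_i), \varepsilon(\mathbf{x}_j)\right]
  &= \sigma^{2} \, \rho(h_{ij}), \\
\text{SOC:}\quad
  \operatorname{Cov}\!\left[\varepsilon(\mathbf{x}_i), \varepsilon(\mathbf{x}_j)\right]
  &= \sigma_{k(i)} \, \sigma_{k(j)} \, \rho(h_{ij}).
\end{align}
```

When both observations fall in the same class k, the second expression reduces to a class-specific variance multiplied by the common correlation function.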

The distribution of SOC is often bimodal (has two most frequently observed values). Figure 3.4 shows the histogram of SOC in Scotland from NSIS, with peaks at around 5 per cent for mineral soils and around 50 per cent, the latter due to highly organic/peaty soils. Therefore we used soil maps to divide the UK into areas of peaty and non-peaty soils and fitted separate models for each area. The peaty areas (shaded green in the map in Figure 3.4) cover most of the North West of Scotland. The separate histograms of peaty and non-peaty observations of SOC are not bimodal (lower two graphs in Figure 3.4).

Due to the extra complexities in this model, no attempt was made to include the CS data, and the model was fitted by restricted maximum likelihood (REML) methods. Change in SOC was estimated purely from the NSI data, and the same approach as for SOC was applied.

Note: The distinction between peat and non-peat was based on the soil classification for the site, not on the measured SOC, which is why the ranges overlap substantially.

Figure 3.4 The distribution of SOC (%) over peaty (highly organic) and non-peaty soils in Scotland.

In document Design and operation of a UK soil monitoring network (Page 52-55)