were on a square grid. The spacing between points on this grid was x km, where x is the smallest integer such that the number of sampled locations is less than 90 per cent of the total number of observations specified in Table 3.3. Two per cent of the total sampling locations were positioned 2 km from randomly selected sampling locations on the grid. These observations aided the fitting of spatial models of variation in soil variables by providing information about spatial variation over short distances. Sampling locations of any Land Cover class that was over-sampled (relative to the number of observations allocated using the relative areas specified in Table 3.3) were removed at random. The remaining sampling locations were then selected to ensure that each Land Cover class was sampled at the specified rate. Where possible, these were locations of the required Land Cover class at the centres of the original grid cells; where this was not possible, locations were selected at random.
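The grid-construction steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's code: the region is a hypothetical predicate `in_region(x, y)` on km coordinates, the per-class quotas and the `land_cover` function are assumed inputs, and the bearing of each close-pair partner is drawn at random because the text specifies only the 2 km separation. Topping up under-sampled classes from grid-cell centres is not shown.

```python
import math
import random


def grid_points(in_region, width_km, height_km, spacing):
    """Nodes of a square grid (spacing in km) that fall inside the region."""
    return [(x, y)
            for x in range(0, width_km, spacing)
            for y in range(0, height_km, spacing)
            if in_region(x, y)]


def smallest_spacing(in_region, width_km, height_km, n_total, frac=0.9):
    """Smallest integer spacing giving fewer than frac * n_total grid nodes."""
    x = 1
    while len(grid_points(in_region, width_km, height_km, x)) >= frac * n_total:
        x += 1
    return x


def add_close_pairs(points, rng, frac=0.02, dist_km=2.0):
    """Add a partner dist_km from each of a random subset of grid points.

    The bearing of each partner is drawn at random (an assumption).
    """
    partners = []
    for x, y in rng.sample(points, round(frac * len(points))):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        partners.append((x + dist_km * math.cos(theta),
                         y + dist_km * math.sin(theta)))
    return points + partners


def thin_oversampled(points, land_cover, quota, rng):
    """Randomly drop points of any class that has more points than its quota."""
    by_class = {}
    for p in points:
        by_class.setdefault(land_cover(p), []).append(p)
    kept = []
    for cls, pts in by_class.items():
        limit = quota.get(cls, len(pts))  # classes without a quota are kept whole
        kept.extend(rng.sample(pts, limit) if len(pts) > limit else pts)
    return kept
```

Partner points can fall outside the region near a coast; a fuller implementation would reject or reposition them.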
The S2 systematic sample scheme for Wales is shown in Figure 3.1. Most points are spread evenly, but there is some clustering because of (i) the inclusion of some close pairs to learn about variation over short distances and (ii) the limited local extent of some land cover types, which means that the sample scheme must be clustered to sample them adequately.
Model-based, optimised sampling: If the mathematical model of spatial variation is known, then for a particular sampling scheme it is possible to calculate the estimation variances of the means for each variable, arising both from the spatial variation of the property and from uncertainty in the model that will subsequently be estimated from the data. These estimation variances differ according to the sampling locations. The challenge is to find the distribution of sample points that minimises the expected value of the estimation variance.
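As a rough illustration of how the estimation variance depends on where the points are, the sketch below computes the variance of the generalised least squares (GLS) estimate of a constant mean, 1/(1ᵀC⁻¹1), where C is the covariance matrix of the observations. It assumes an exponential covariance with illustrative sill, range and nugget values (hypothetical, not the study's SOC model) and ignores the model-uncertainty term described above.

```python
import math


def exp_cov_matrix(points, sill=1.0, range_km=20.0, nugget=0.01):
    """Covariance matrix under an assumed exponential model with a nugget."""
    return [[sill * math.exp(-math.dist(p, q) / range_km) + (nugget if i == j else 0.0)
             for j, q in enumerate(points)]
            for i, p in enumerate(points)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gls_mean_variance(points, **cov_params):
    """Variance of the GLS estimate of a constant mean: 1 / (1' C^-1 1)."""
    c = exp_cov_matrix(points, **cov_params)
    w = solve(c, [1.0] * len(points))  # w = C^-1 1
    return 1.0 / sum(w)
```

Clustered points share much of their information, so their variance comes out larger than that of the same number of points spread across the region; this is the kind of quantity an optimiser would try to reduce.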
We used an optimisation algorithm known as spatial simulated annealing to find the set of sampling locations that minimised the mean value of each of the estimation variances, within the constraints listed in Table 3.3. This algorithm has been widely used for such problems (see, for example, Marchant and Lark, 2006), and we do not consider its details here. We also included a constraint in the optimisation procedure to ensure that coastal sites were not over-represented.
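Spatial simulated annealing repeatedly perturbs one sampling location, always accepts a move that improves the criterion, and accepts a worse move with probability exp(-delta / T), which shrinks as the temperature T is lowered. The sketch below is a generic version, not the study's implementation: to keep it short it minimises the mean squared shortest distance (MSSD) from a set of evaluation points to the nearest sample, a spatial-coverage criterion swapped in for the estimation variances, and the step size, cooling schedule and region test are assumed values. The coastal constraint is not implemented.

```python
import math
import random


def mssd(samples, eval_points):
    """Mean squared shortest distance from evaluation points to their nearest sample."""
    return sum(min(math.dist(e, s) ** 2 for s in samples)
               for e in eval_points) / len(eval_points)


def spatial_annealing(samples, eval_points, in_region, rng,
                      n_iter=2000, t0=10.0, cooling=0.995, step_km=10.0):
    """Minimise MSSD by spatial simulated annealing; returns the best design found."""
    current = list(samples)
    cur_val = mssd(current, eval_points)
    best, best_val = list(current), cur_val
    temp = t0
    for _ in range(n_iter):
        i = rng.randrange(len(current))
        x, y = current[i]
        cand = (x + rng.uniform(-step_km, step_km),
                y + rng.uniform(-step_km, step_km))
        if in_region(*cand):  # moves that leave the region are rejected outright
            trial = current[:i] + [cand] + current[i + 1:]
            val = mssd(trial, eval_points)
            delta = val - cur_val
            # Always accept improvements; accept worse designs with prob exp(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, cur_val = trial, val
                if cur_val < best_val:
                    best, best_val = list(current), cur_val
        temp *= cooling  # geometric cooling schedule
    return best, best_val
```

Replacing `mssd` with an estimation-variance criterion, and adding penalties for the Table 3.3 constraints, would bring the sketch closer to the procedure described here, at a much higher cost per iteration.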
In a real survey, the model of spatial variation would be unknown before sampling and would differ for each property. In this study, we based our sampling scheme on a simple model of the variability of SOC at UK scale, whereas in the assessment of the scheme (below) we generated data from a rather more realistic model. This mismatch builds a constraint into the sampling scheme, but it is one that reflects the real-world situation. Because of this, we do not automatically expect the optimised scheme to outperform the simpler systematic grid, which could turn out to be more representative. Figure 3.1 includes the optimised sample scheme for the S2 survey in Wales. It includes a number of close pairs of points. The two points in each pair are generally of different land cover types, so that differences in means between land cover types can be estimated accurately.
Figure 3.1 Sample designs for Wales at intensity S2 and Land Cover Class map. Wales was used to illustrate the designs since the sample points can be distinguished at this scale.
Design-based, stratified random sampling: Random sampling can be used to give an unbiased, design-based estimate of the mean, provided the probability of including each unit (such as a 1 km square) in the sample is known and none of these probabilities is zero. Just as the design-based estimate of the mean depends on the probability of including each unit in the sample, so the design-based estimate of its variance depends on the probabilities of pairs of units appearing in the sample. Hence an estimate of the variance of the estimated mean can also be obtained, provided the inclusion probabilities of all pairs are known, again with the proviso that none of these is zero. The variance of the mean can be greatly reduced if, instead of sampling from the whole population at random, samples are drawn from sub-populations (strata) established by stratifying the sampling locations. Estimates of means and variances can be obtained straightforwardly for each stratum and for any combination of strata. The practical implication is that the classes into which the UK is divided to draw the sample (the design strata) can be subsets of the classes for which summary information is required (the reporting classes).
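The estimators referred to above can be sketched as follows. `horvitz_thompson_mean` weights each observation by the inverse of its inclusion probability; `stratified_estimate` assumes simple random sampling without replacement within each stratum and combines the stratum means with weights proportional to stratum size N_h. Passing only the strata that make up one reporting class gives the estimate for that class. The function names and input layout are hypothetical.

```python
from statistics import mean, variance


def horvitz_thompson_mean(values, incl_probs, pop_size):
    """Design-based estimate of a population mean from inclusion probabilities."""
    return sum(y / p for y, p in zip(values, incl_probs)) / pop_size


def stratified_estimate(strata):
    """Stratified mean and its estimated variance.

    strata: list of (N_h, sample) pairs, where N_h is the number of units in
    stratum h and sample holds at least two values drawn by simple random
    sampling without replacement.
    """
    pop_total = sum(size for size, _ in strata)
    est, var = 0.0, 0.0
    for size, ys in strata:
        w = size / pop_total  # stratum weight W_h
        n = len(ys)
        est += w * mean(ys)
        # Variance contribution, including the finite-population correction
        var += w ** 2 * (1 - n / size) * variance(ys) / n
    return est, var
```

Under simple random sampling every inclusion probability is n/N, so the Horvitz-Thompson estimate reduces to the ordinary sample mean.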
Stratified random sampling is greatly simplified if the assignment of units to strata is known at the time the sample is drawn, although this is not essential. Broadly, stratified random sampling gives more precise estimates of the mean than simple random sampling when the average variance between units in different strata is greater than the average variance between units within the same stratum. Provided this broad condition holds, there is no need to assume that the within-stratum variances are equal across strata.
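The condition above can be checked numerically on a toy population. Assuming proportional allocation and ignoring the finite-population correction (both assumptions of this sketch), the stratified variance replaces the total population variance with the size-weighted average of the within-stratum variances. It can therefore only be smaller, and it equals the simple-random-sampling value when all stratum means are the same.

```python
from statistics import pvariance


def srs_variance(population, n):
    """Variance of the mean of a simple random sample of size n (no fpc)."""
    return pvariance(population) / n


def stratified_variance(strata, n):
    """Variance of the stratified mean with proportional allocation (no fpc)."""
    pop_size = sum(len(s) for s in strata)
    # Size-weighted average of within-stratum variances
    return sum(len(s) / pop_size * pvariance(s) for s in strata) / n
```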
Assumptions about the distribution of observations are required to construct confidence intervals about estimates of the mean for a variable such as an indicator.
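For example, if the estimated mean is assumed to be approximately normally distributed (an assumption that should be checked for each indicator), a confidence interval follows directly from the estimate and its design-based variance. The function below is a hypothetical helper built on Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def normal_ci(estimate, variance, level=0.95):
    """Two-sided interval assuming the estimator is approximately normal."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * variance ** 0.5
    return estimate - half_width, estimate + half_width
```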
We defined strata principally by Land Cover class and country. However, because we might expect additional variation between widely separated locations within a single Land Cover class, the 1 km squares in England and Scotland were split into seven and four geographically defined blocks, respectively (Figure 3.2). Wales and Northern Ireland were each retained as a single block, giving a total of 13 blocks. The proposed sample size for each Land Cover class in each country was divided among that country's blocks in proportion to the area of the Land Cover class in each block. The combinations of Land Cover class and geographical block formed the strata from which samples were drawn at random. For some Land Cover classes, blocks had to be combined into larger strata to ensure a sample size of at least two in every stratum, the minimum needed to estimate a within-stratum variance. The resulting stratified random sample can therefore provide estimates for the reporting classes, which in this instance were individual land uses (with Land Cover classes used as a surrogate for NLUD) in each country.
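The allocation and merging steps can be sketched as follows. `allocate_proportional` splits one Land Cover class's sample size among blocks in proportion to the area of that class in each block, using largest-remainder rounding (an assumed rounding rule). The text does not say which blocks were combined, so the hypothetical `merge_small_strata` simply merges the two strata with the smallest allocations until every stratum has at least `min_n` points; a real design would more likely combine neighbouring blocks. Block names are placeholders.

```python
import math


def allocate_proportional(n, areas):
    """Split n samples among blocks in proportion to area (largest-remainder rounding)."""
    total = sum(areas.values())
    quotas = {k: n * a / total for k, a in areas.items()}
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    leftover = n - sum(alloc.values())
    # Hand the leftover samples to the blocks with the largest fractional parts
    for k in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc


def merge_small_strata(alloc, areas, min_n=2):
    """Merge the two smallest strata until every stratum has at least min_n samples."""
    alloc, areas = dict(alloc), dict(areas)
    while len(alloc) > 1:
        a, b = sorted(alloc, key=alloc.get)[:2]
        if alloc[a] >= min_n:
            break
        merged = f"{a}+{b}"
        alloc[merged] = alloc.pop(a) + alloc.pop(b)
        areas[merged] = areas.pop(a) + areas.pop(b)  # keep areas for stratum weights
    return alloc, areas
```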
Figure 3.2 Locations of geographic blocks within countries used to increase the efficiency of the design-based sampling schemes.
Design-based, cluster sampling: The costs of implementing a sampling scheme are reduced if less time is spent travelling between sample locations, so costs will be lower if the samples are clustered. However, the data may then contain less information than a random sample with the same number of sampling points, because observations within a cluster will be spatially (positively) correlated. We investigated the properties of cluster sampling for samples of size 4000 using a two-stage process: first, a sample of 1000 1 km squares was selected by the same method used to draw a stratified random sample of size 1000; four points were then located at random within each of the chosen 1 km squares.
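The two-stage design can be sketched as follows, with 1 km squares identified by the km coordinates of their south-west corners (an assumed convention) and a given per-stratum allocation of squares. The Kish design effect, 1 + (m - 1)ρ for clusters of m points with intra-cluster correlation ρ, gives a rough measure of the information lost to clustering; it is a textbook approximation added here for illustration, not a result from this study.

```python
import random


def two_stage_cluster_sample(strata_squares, alloc, points_per_square, rng):
    """Stage 1: stratified random sample of 1 km squares; stage 2: random points in each."""
    points = []
    for stratum, squares in strata_squares.items():
        for x0, y0 in rng.sample(squares, alloc[stratum]):
            for _ in range(points_per_square):
                # Uniform location within the 1 km square
                points.append((x0 + rng.random(), y0 + rng.random()))
    return points


def design_effect(points_per_square, intra_cluster_corr):
    """Kish design effect: variance inflation from within-cluster correlation."""
    return 1 + (points_per_square - 1) * intra_cluster_corr


def effective_sample_size(n_points, points_per_square, intra_cluster_corr):
    """Number of independent observations carrying the same information."""
    return n_points / design_effect(points_per_square, intra_cluster_corr)
```

With four points per square, the design effect ranges from 1 (uncorrelated points) to 4 (perfectly correlated points); in the latter case 4000 clustered observations carry no more information about the mean than 1000 independent ones.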