8 Discussion
8.2 Spatial Uncertainty and Soil Map Refinement
The present thesis includes the development of a method to derive a practicable spatial uncertainty measure in the context of a DSM approach using RF. For geostatistical soil property mapping, the kriging variance provides a spatially distributed error estimate [Knotters et al., 1995; Carré & Girard, 2002; Diodato & Ceccarelli, 2006; Bourennane et al., 2007; Qu et al., 2013; Sun et al., 2013]. In geostatistics, the spatial dependence of a target variable is generally modeled by the variogram function, and local predictions are derived as weighted averages of neighboring observations [Goovaerts, 1999]. The weights are determined by minimizing the variance of each local prediction, and this minimized variance represents the kriging error [Burgess & Webster, 1980]. In addition, Malone et al. [2011] proposed a method to
quantify spatial uncertainties based on prediction intervals (PI-uncertainty). The intervals are derived from the residuals between predicted and observed data. First, the covariate space is clustered according to similar residuals. Then, a prediction interval is generated for each cluster based on the empirical distribution of its residuals. Finally, a prediction interval is assigned to each prediction location in the covariate space according to its grade of membership in each cluster.
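The last two steps of the PI-uncertainty procedure can be illustrated with a minimal sketch. It is not the implementation of Malone et al. [2011]: the clustering itself is assumed to have been done already, residuals are assumed to be defined as observed minus predicted, and the function names, the 5% and 95% limits, and the membership-weighted averaging of cluster limits are illustrative assumptions.

```python
def empirical_quantile(values, q):
    """Linearly interpolated empirical quantile for q in [0, 1]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def cluster_limits(residuals_by_cluster, lower_q=0.05, upper_q=0.95):
    """Lower and upper residual limits per cluster (hypothetical helper).

    residuals_by_cluster maps a cluster id to the residuals
    (assumed here: observed minus predicted) of its members.
    """
    return {
        cluster: (empirical_quantile(res, lower_q),
                  empirical_quantile(res, upper_q))
        for cluster, res in residuals_by_cluster.items()
    }


def prediction_interval(prediction, memberships, limits):
    """Membership-weighted interval around one prediction.

    memberships maps a cluster id to the grade of membership of the
    prediction location; the weights are normalized to sum to one.
    """
    total = sum(memberships.values())
    low = sum(m * limits[c][0] for c, m in memberships.items()) / total
    high = sum(m * limits[c][1] for c, m in memberships.items()) / total
    return prediction + low, prediction + high
```

A location that belongs equally to two clusters receives the average of both cluster intervals, whereas a location with full membership in one cluster inherits that cluster's interval unchanged.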
The proposed spatial uncertainty method is based on multiple decision tree realizations within an RF regression approach (cf., Section 6.3). The uncertainty measure is likewise expressed by the variability of prediction intervals. However, the intervals are derived directly for each prediction location from the results of the multiple randomized RF decision tree models (cf., Section 6.3). Thus, compared to the PI-uncertainty, the applied spatial uncertainty does not require an additional regionalization of prediction errors, a step that limits practicability due to its statistical complexity and the usually scarce time resources. Nevertheless, the PI-uncertainty accounts for all sources of uncertainty, since it depends only on the residuals between the model output and the observed data. In contrast, the kriging error depends on the model assumption for the variogram, the observed soil data, and their spatial configuration [Brus et al., 2011; Lark & Lapworth, 2012]. Furthermore, the kriging error is subject to the requirements and limitations of geostatistical methods, such as a relatively high sample density and the smoothing of local details in the predictions [Goovaerts, 1999]. The application of the proposed spatial uncertainty measure also implies dependencies, such as the prerequisite of using an RF prediction model. Moreover, the RF model is often criticized for its limited interpretability, since the relation between predictor and prediction cannot be assessed for each tree. However, RF is increasingly applied in DSM [Grimm et al., 2008; Wiesmeier et al., 2011; Ließ et al., 2012; Heung et al., 2014; Schmidt et al., 2014].
This can be ascribed to the combined merits of modeling non-linear relationships, handling categorical and continuous covariates, resistance to overfitting, robustness to noise in the feature space, built-in unbiased measures of error and variable importance, only a few user-defined model parameters, and a reduced computational load [Svetnik et al., 2003; Díaz-Uriarte & de Andrés, 2006; Peters et al., 2007].
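The idea of deriving an interval directly at each prediction location from the individual tree predictions can be sketched as follows. This is a minimal illustration, not the definition used in Section 6.3: it assumes that the per-tree predictions are already available for every location, and the function names and the 5% and 95% percentile limits are assumptions made for this example.

```python
def empirical_quantile(values, q):
    """Linearly interpolated empirical quantile for q in [0, 1]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def interval_width(tree_predictions, lower_q=0.05, upper_q=0.95):
    """Width of the percentile interval spanned by the tree predictions
    at a single location (assumed uncertainty measure)."""
    return (empirical_quantile(tree_predictions, upper_q)
            - empirical_quantile(tree_predictions, lower_q))


def uncertainty_map(per_location_tree_predictions, lower_q=0.05, upper_q=0.95):
    """Interval width for every prediction location.

    per_location_tree_predictions holds one list of tree predictions
    per location, in the order of the locations.
    """
    return [interval_width(p, lower_q, upper_q)
            for p in per_location_tree_predictions]
```

Locations where all trees agree receive a width of zero, while strongly diverging tree predictions yield a wide interval, so no separate regionalization of residuals is needed.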
While the kriging error presents a well-established spatial error estimate [Knotters et al., 1995; Carré & Girard, 2002; Diodato & Ceccarelli, 2006; Bourennane et al., 2007; Qu et al., 2013; Sun et al., 2013], the PI-uncertainty is less common. Malone et al. [2011] applied it
using a DSM case study predicting organic carbon and available water capacity. The method proposed in this thesis was validated by comparing three RF prediction approaches with respect to conventional accuracy measures and the proposed uncertainty measure (cf., Section 6.5). The calibration of the model approaches was based on legacy samples (LD), LD augmented by uncertainty-guided sampling (LDUnc), and LD augmented by simple random sampling (LDRandom), respectively. For both target soil properties, topsoil silt and topsoil clay, all quality estimation methods show consistent results: the LDUnc approach outperforms the LDRandom approach, while both outperform the LD approach in terms of decreased spatial uncertainty and increased prediction accuracy (Table 7; Figure 10). The agreement among all quality estimates supports the validity of (i) the conventional accuracy measures and (ii) the proposed spatial uncertainty measure.
A further aim of this thesis was to improve the initial DSM approaches for silt and clay prediction, which were based solely on legacy samples. To this end, the initial legacy calibration set was augmented by uncertainty-guided sampling. Clifford et al. [2014] selected additional samples that, in combination with available legacy samples, cover the covariate space and validated the method in a simulation study. Carré et al. [2007] proposed a method to identify locations for additional samples by first analyzing the distribution of legacy samples in the covariate space. Although the approach was validated on two different data sets, it refers only to the covariate space and thus disregards geographical information. In this study, the study area was stratified according to the quartiles of the previously determined spatial uncertainty. Subsequently, additional samples were obtained in those strata with the lowest conformity between the covariate distributions in the strata and those of the available legacy samples (Figure 7).
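The stratification and the selection of poorly covered strata can be sketched as follows. The conformity measure used in this thesis is not specified in this section, so the histogram overlap below is a hypothetical stand-in for a single covariate, and all function names and the number of bins are assumptions made for illustration.

```python
def empirical_quantile(values, q):
    """Linearly interpolated empirical quantile for q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def quartile_strata(uncertainty):
    """Assign each location to stratum 0..3 by uncertainty quartile."""
    cuts = [empirical_quantile(uncertainty, q) for q in (0.25, 0.5, 0.75)]
    return [sum(u > c for c in cuts) for u in uncertainty]


def histogram_overlap(a, b, n_bins=10):
    """Overlap of two relative-frequency histograms on shared bins:
    1.0 for identical distributions, 0.0 for disjoint ones.
    Hypothetical conformity measure, not the one used in the thesis."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    if hi == lo:
        return 1.0
    width = (hi - lo) / n_bins

    def rel_freq(values):
        counts = [0] * n_bins
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        return [c / len(values) for c in counts]

    return sum(min(x, y) for x, y in zip(rel_freq(a), rel_freq(b)))


def rank_strata_by_conformity(covariate, strata, legacy_idx, n_bins=10):
    """Order strata from lowest to highest conformity between the
    covariate values of all locations in a stratum and those of the
    legacy samples falling into it (no legacy samples: conformity 0)."""
    scores = {}
    for s in sorted(set(strata)):
        cells = [covariate[i] for i, st in enumerate(strata) if st == s]
        legacy = [covariate[i] for i in legacy_idx if strata[i] == s]
        scores[s] = histogram_overlap(cells, legacy, n_bins) if legacy else 0.0
    return sorted(scores, key=scores.get), scores
```

The strata at the front of the returned order are the candidates for additional sampling, because the legacy samples represent their covariate distribution least well.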
The spatial uncertainty values of both approaches were combined (cf., Section 6.5). This procedure favors the soil property that generally shows the higher prediction uncertainty. Furthermore, it implies a harmonization of the quality of both initial soil property predictions. The results confirmed these implications: silt was favored, with an uncertainty decrease of 31% compared with a decrease of 27% for clay in the LDUnc approaches (Table 7; Figure 10). The similar proportions of decreasing uncertainty in both predictions support the method of combining the uncertainty maps in our case study.
Collard et al. [2014] sampled a legacy soil map for calibrating a regression model and improved the class purity by 10%. Other studies showed accuracy improvements of 6% to 19% when using DSM approaches to upgrade legacy soil maps [Kempen et al., 2009; Yang et al., 2011; Rad et al., 2014]. The results of this study show increases in accuracy of 12% and 14% for the predictions of clay and silt when comparing the LD approach with the LDUnc approach. Generally in DSM, R² values above 0.70 are unusual, while values below 0.50 are common [Malone et al., 2009]. The best-performing RF approach, which was calibrated with LDUnc, shows explained variances of R² = 0.59 for silt and R² = 0.56 for clay. The successful application of the spatial uncertainty measure, which improved the quality of the initial DSM products through uncertainty-guided sampling, supports the practicability and validity of this method.