
Estudio climatológico de la erosividad de la lluvia en la cuenca del Ebro

2.7. Mapping rainfall erosivity at the regional scale: a comparison of interpolation methods in the Ebro Basin (NE

2.7.1. Spatial modelling

In many studies the calculation of rainfall erosivity is restricted to at-site analysis. For landscape management and conservation planning aimed at reducing erosion risk, an improvement is to obtain continuous maps over large areas as a preliminary step in evaluating the hazard. A common procedure for this purpose is to map at-site estimates of the rainfall erosivity index by means of interpolation techniques (e.g., Prudhomme and Reed, 1999; Weisse and Bois, 2002).

In this article several interpolation methods, including global, local and mixed approaches, are compared in order to determine which one best describes the spatial distribution of the average EI30 index and the R factor. A leave-one-out cross-validation technique was used to assess the goodness of fit (Efron and Tibshirani, 1997).
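The leave-one-out idea can be illustrated with a minimal sketch: each station is withheld in turn, predicted from the remaining stations, and the errors are summarized as a root mean square error. The function names, the `predict` callback signature and the station values below are hypothetical and serve only as illustration; they are not the procedure or data used in the study.

```python
import math


def loo_cv_rmse(coords, values, predict):
    """Leave-one-out cross-validation RMSE for a spatial interpolator.

    `predict(train_coords, train_values, target)` is a hypothetical callback
    that returns the interpolated value at `target`.
    """
    squared_errors = []
    for i, (target, observed) in enumerate(zip(coords, values)):
        # Withhold station i and predict it from the remaining stations.
        train_coords = coords[:i] + coords[i + 1:]
        train_values = values[:i] + values[i + 1:]
        estimate = predict(train_coords, train_values, target)
        squared_errors.append((estimate - observed) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_predictor(train_coords, train_values, target):
    # Trivial baseline: ignore location and return the training mean.
    return sum(train_values) / len(train_values)


# Made-up station coordinates and erosivity values, for illustration only.
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
erosivity = [10.0, 12.0, 14.0, 16.0]
rmse = loo_cv_rmse(stations, erosivity, mean_predictor)
```

Any of the interpolators compared in this section could be passed as `predict`, so that all methods are scored on the same withheld stations.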

For the regression-based models, a digital elevation model (DEM) and a digital coverage of the Iberian Peninsula coastline were used. Both were obtained from the Ebro Hydrographical Confederation (http://www.chebro.es/).

Global methods

The global method used was generalized least squares (GLS) multiple regression.

Regression is a global approach to spatial interpolation, and it is based on finding empirical relationships between the variable of interest and other spatial variables.

Regression-based techniques adapt to almost any space and usually generate adequate maps (Goodale et al. 1998; Vogt et al. 1997; Ninyerola et al. 2000). The relationships between climatic data and topographic and geographic variables have been extensively analyzed throughout the scientific literature, and regression-based models allow exploiting this relationship to produce maps of the climatic parameters. Some authors have shown the advantages of incorporating the information provided by ancillary data on mapping extreme rainfall probabilities (Beguería and Vicente-Serrano, 2006; Casas et al., 2007).

Regression methods can be especially adequate in large regions with complex atmospheric influences, such as the Ebro Valley (Daly et al., 2002; Weisse and Bois, 2002; Vicente-Serrano et al., 2003), or if the sample network is not dense enough for local interpolation methods (Dirks et al., 1998).

GLS is an extension of the more common ordinary least squares (OLS) regression which allows for autocorrelation in the dependent variable (Cressie, 1993). When dealing with spatial variables, it is a common assumption that the observations are autocorrelated; this property forms, in fact, the basis of all geostatistical and mixed methods. The existence of autocorrelation in the residuals violates one of the main assumptions of OLS, making this technique unsuitable for climatic variables with a geographical imprint. This problem can be solved by using alternative regression techniques that account explicitly for spatial autocorrelation, such as GLS (Beguería and Pueyo, 2009). The differences between the two methods are best explained through their mathematical background. Both start from the common OLS formula:

$$y = X\beta + \varepsilon, \qquad (2.23)$$

where y is the dependent variable, X is a matrix of p independent variables (model matrix), β is a vector of p+1 model coefficients to estimate, including a constant β0, and ε is a vector of random errors. In OLS it is assumed that the errors are independent and normally distributed with mean 0 and constant variance σ²: ε ~ N(0, σ²I). In GLS, by contrast, it is generally assumed that ε ~ N(0, Σ), where the error variance-covariance matrix Σ is symmetric and positive-definite. Unequal diagonal entries in Σ correspond to non-constant error variances, while nonzero off-diagonal entries correspond to correlated errors. Since the error variance-covariance matrix Σ is not known, it must be estimated from the data along with the regression coefficients β. Because of the large number of elements in Σ, it needs to be approximated by a parametric model. In the case of spatial regression, Σ can be adequately parameterized by a semivariogram model, which explains the covariance between the errors as a function of the distance between pairs of observations. Since the semivariogram constitutes the basis of geostatistical interpolation methods, it is explained in depth in a later section (see section Geostatistical interpolation methods).
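For reference, the estimators associated with the two error models can be written in their standard textbook form. The hat notation is introduced here for illustration and does not appear in the original text:

```latex
\hat{\beta}_{\mathrm{OLS}} = (X^{\top} X)^{-1} X^{\top} y,
\qquad
\hat{\beta}_{\mathrm{GLS}} = (X^{\top} \Sigma^{-1} X)^{-1} X^{\top} \Sigma^{-1} y .
```

Setting Σ = σ²I in the second expression recovers the first, which makes explicit that OLS is the special case of GLS with uncorrelated errors of constant variance.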

We used a set of independent variables at a spatial resolution of 100 m. Elevation is usually the main determinant of the spatial distribution of climatic variables. Nevertheless, other variables such as latitude and longitude, or the incoming solar radiation, may also influence the distribution of erosive rains. All variables were derived from a DEM (UTM-30N coordinates). The incoming solar radiation is a spatially continuous variable that depends on the terrain aspect (northern and southern slopes have low and high incoming solar radiation values, respectively). The annual mean incoming solar radiation was calculated following the algorithm of Pons and Ninyerola (2008). All these variables were processed in the MiraMon GIS package (Pons, 2006). Low-pass filters with radii of 5, 10 and 25 km were applied to elevation, slope and incoming solar radiation in order to capture the influence of these variables over wider areas.

We used a Gaussian semivariogram model to parameterize the spatial autocorrelation between regression errors. As independent variables we used the spatial coordinates (longitude and latitude) in km and their squares (km²), the elevation (m a.s.l.), and the incoming solar radiation (J d^-1). The R statistical analysis package (R Development Core Team, 2008) was used for the regression analysis.
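A minimal sketch of a GLS fit with a Gaussian error covariance is given below. It is not the R code used in the study: the covariance form C(h) = sill · exp(−(h/range)²), the parameter values, the single elevation predictor and the station data are all assumptions made for illustration, and the covariance parameters are taken as known rather than estimated jointly with β.

```python
import math


def solve(a, b):
    """Solve a @ x = b for square `a` and matrix `b` (lists of rows)
    by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [[v / m[i][i] for v in m[i][n:]] for i in range(n)]


def gaussian_cov(coords, sill, range_km, nugget):
    """Error covariance matrix from an assumed Gaussian model:
    C(h) = sill * exp(-(h / range_km)**2), plus the nugget on the diagonal."""
    n = len(coords)
    return [[sill * math.exp(-(math.dist(coords[i], coords[j]) / range_km) ** 2)
             + (nugget if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def gls_fit(x_rows, y, sigma):
    """GLS coefficients: (X' S^-1 X)^-1 X' S^-1 y."""
    p = len(x_rows[0])
    # Solve S @ W = [X | y] once, giving S^-1 X and S^-1 y side by side.
    w = solve(sigma, [list(row) + [yi] for row, yi in zip(x_rows, y)])
    xt_w = [[sum(x_rows[i][a] * w[i][b] for i in range(len(y)))
             for b in range(p + 1)] for a in range(p)]
    lhs = [row[:p] for row in xt_w]   # X' S^-1 X
    rhs = [[row[p]] for row in xt_w]  # X' S^-1 y
    return [r[0] for r in solve(lhs, rhs)]


# Made-up stations (km) and elevations (m); y is built as an exact linear
# function of elevation so the fit should recover the coefficients.
coords = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
elevation = [200.0, 450.0, 800.0, 1200.0, 600.0]
X = [[1.0, e] for e in elevation]          # intercept + elevation
y = [100.0 + 0.5 * e for e in elevation]
sigma = gaussian_cov(coords, sill=1.0, range_km=8.0, nugget=0.1)
beta = gls_fit(X, y, sigma)
```

In practice the sill, range and nugget would be estimated from the data, and the predictors would include the coordinates, their squares and the solar radiation as described above.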

Local methods

In global methods, local variations are dismissed as random, unstructured noise, and the climatic map is created on the basis of the general structure of the variable at all available points (Burrough and McDonnell, 1998). Local methods, by contrast, use only the data from the nearest sampling points for climatic mapping.

Since interpolated values at ungauged locations depend on the observed values, local methods strongly depend on a sufficiently dense and evenly spaced sampling network.

Two local methods were used: inverse distance weighting (IDW) and splines. IDW interpolation is based on the assumption that the climatic value at an unsampled point, z(x), is a distance-weighted average of the climatic values at nearby sampling points z(x1), z(x2), …, z(xn). Climatic values are more similar at closer distances, so the inverse of the distance di between z(xi) and z(x), raised to a power r, is used as the weighting factor:

$$z(x) = \frac{\sum_{i=1}^{n} z(x_i)\, d_i^{-r}}{\sum_{i=1}^{n} d_i^{-r}}$$

where z(x) is the predicted value, z(xi) is the climatic value at a neighbouring weather station, di is the distance between z(x) and z(xi), and r is an empirical parameter. Models with r = 1, r = 2 and r = 3 were tested.
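The IDW estimate can be sketched in a few lines. The function name and the two-station example are hypothetical; the handling of a target that coincides with a station (returning the observed value) is a common convention assumed here, not something stated in the text.

```python
import math


def idw(coords, values, target, r=2):
    """Inverse distance weighting estimate at `target` with exponent `r`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in zip(coords, values):
        d = math.dist(point, target)
        if d == 0.0:
            return value  # target coincides with a station
        w = d ** -r       # weight decays with distance to the power r
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Two made-up stations on a line; the target lies closer to the first one.
stations = [(0.0, 0.0), (2.0, 0.0)]
values = [10.0, 20.0]
estimate = idw(stations, values, (0.5, 0.0), r=2)
midpoint = idw(stations, values, (1.0, 0.0), r=2)
```

Larger values of r concentrate the weight on the nearest stations, which is why several exponents are usually compared.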

The splines method is based on a family of continuous, regular and differentiable functions. Splines are similar to the equations obtained from trend surfaces or regression-based methods, but they are fitted locally from the neighbouring points around the candidate location x. A new function is created for each location x, without loss of continuity among the curves. Smoothing or tension parameters can be specified, resulting in more or less smoothed maps.

The predicted value z(x) is determined by two terms:

$$z(x) = T(x) + \sum_{j=1}^{n} \lambda_j\, \psi_j(r_i)$$

where T(x) is a polynomial smoothing term, and the second term groups a series of radial functions where ψj(r_i) is a known group of functions, and λj represents the parameters (Mitasova et al., 1995):

 the exponential integral function, and ri is:

$$r_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}, \qquad (2.27)$$

The algorithms for fitting splines are quite complex but are currently standard in GIS packages. In this paper several spline interpolations were used as implemented in the ArcGIS 9.3 software. Tension and smoothing parameters were φ = 400, φ = 5000, T(x) = 0 and T(x) = 400.

Geostatistical interpolation methods

Kriging methods assume that the spatial variation of a continuous climatic variable is too irregular to be modelled by a continuous mathematical function, and its spatial variation could be better predicted by a probabilistic surface. This continuous variable is called a regionalized variable, which consists of a drift component and a random, spatially correlated component (Burrough and McDonnell, 1998). Hence, the spatially located climatic variable z(x) is expressed by:

$$z(x) = m(x) + \varepsilon'(x) + \varepsilon''$$

where m(x) is the drift component, i.e. the structural variation of the climatic variable, ε’(x) are the spatially correlated residuals, i.e. the difference between the drift component and the sampling data values, and ε’’ are spatially independent residuals. The predictions of kriging-based methods are, in practice, a weighted average of the data available at neighbouring sampling points (weather stations).

The weighting is chosen so that the calculation is not biased and the variance is minimal. A function that relates the spatial variance of the variable is determined using a semi-variogram model which indicates the semivariance (γ) between the climatic values at different spatial distances.

The semivariogram describes the way in which similar observation values are clustered in space, in accordance with Tobler’s first law of geography (Tobler, 1970). The semivariogram is therefore a measure of the dissimilarity of data pairs as the spatial separation between them increases (Deutsch and Journel, 1998).

The semivariance is calculated for lagged sets of separation vectors hu as half the mean squared difference between the N pairs of observed values within the spatial lag u:

$$\gamma(h_u) = \frac{1}{2N} \sum_{i=1}^{N} \left[ z(x_i) - z(x_i + h_u) \right]^2$$
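As an illustration of the calculation just described, the following sketch bins all station pairs by separation distance and returns half the mean squared difference per bin. It assumes isotropic, equal-width distance bins; the function name, the binning rule and the transect data are hypothetical.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Return (mean distance, semivariance, pair count) for each non-empty lag."""
    sums = [0.0] * n_lags
    dists = [0.0] * n_lags
    counts = [0] * n_lags
    for (pi, zi), (pj, zj) in combinations(zip(coords, values), 2):
        h = math.dist(pi, pj)
        k = int(h // lag_width)  # index of the distance bin [k*w, (k+1)*w)
        if k < n_lags:
            sums[k] += (zi - zj) ** 2
            dists[k] += h
            counts[k] += 1
    # Semivariance is half the mean squared pairwise difference in each bin.
    return [(dists[k] / counts[k], sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]


# Made-up values along a transect with unit spacing.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
sample = empirical_semivariogram(coords, values, lag_width=1.0, n_lags=3)
```

The resulting points are what a semivariogram model is subsequently fitted to.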

To summarize the autocorrelation in space, a product-sum covariance model was automatically fitted to the semivariogram. First, only the sample semivariograms, γs,t(hs, 0), were considered. Valid semivariogram models were fitted to them, estimating automatically the partial range (Øu) and sill (sillu) and adding a nugget discontinuity (τu) at the origin to reflect spatial uncertainty if required.

Semivariogram models must be selected from a set of allowable functions that are conditionally negative definite (McBratney and Webster, 1986), such as spherical, exponential or Gaussian models (Deutsch and Journel, 1998). There is some debate over the correct way to proceed in semivariogram model fitting (Diggle et al., 2002; Goovaerts, 1997); we favoured automatic fitting by the OLS method, followed by adjustment by eye to reduce the effect of outliers. The Gaussian function gave the best fit. Predictions may improve depending on the number of neighbours included in the interpolation, although our data were not very sensitive to this number. A combination of 9 neighbours, including at least 3, fitted best.
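The three model families mentioned above can be written compactly as follows. The parameterization (nugget, partial sill and a range parameter `a`, without the scaling factor of 3 that some packages apply to the exponential and Gaussian models) is an assumption made for this sketch; software packages differ on this convention.

```python
import math


def spherical(h, nugget, partial_sill, a):
    """Spherical model: reaches the sill exactly at h = a."""
    if h == 0.0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    return nugget + partial_sill * (1.5 * h / a - 0.5 * (h / a) ** 3)


def exponential(h, nugget, partial_sill, a):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-h / a))


def gaussian(h, nugget, partial_sill, a):
    """Gaussian model: parabolic behaviour near the origin."""
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-(h / a) ** 2))


# Semivariance at a separation equal to the range parameter (made-up values).
g_sph = spherical(10.0, 0.1, 1.0, 10.0)
g_exp = exponential(10.0, 0.0, 1.0, 10.0)
g_gau = gaussian(10.0, 0.0, 1.0, 10.0)
```

The nugget appears as a discontinuity at the origin: the semivariance is zero at h = 0 and jumps to the nugget value for any positive separation.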

Several types of kriging methods have been proposed, depending on how the drift component m(x) is modelled (see, e.g., the reviews by Isaaks and Srivastava, 1989; Goovaerts, 1997; Burrough and McDonnell, 1998). Simple kriging (SK) assumes a known constant trend (expected value), m(x) = 0, and relies on a covariance function. However, neither the expectation nor the covariance function is usually known, so simple kriging is seldom used. In ordinary kriging (OK), the most common type of kriging, an unknown constant trend is assumed, m(x) = E(z(x)), and the estimation relies on a semivariogram model which is estimated from the sample.
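A minimal sketch of an ordinary kriging prediction is shown below, assuming a semivariogram function is already available. The weights are obtained from the usual kriging system, in which the semivariances between stations are bordered by the constraint that the weights sum to one. This is not the ArcGIS implementation used in the study; the function names, the Gaussian model parameters and the four-station example are assumptions for illustration.

```python
import math


def solve(a, b):
    """Solve a @ x = b (b is a vector) by Gauss-Jordan with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ordinary_kriging(coords, values, target, gamma):
    """Return (prediction, weights) at `target` for semivariogram `gamma(h)`."""
    n = len(coords)
    # Semivariances between stations, plus a column of ones for the
    # Lagrange multiplier that enforces unbiasedness.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each station and the target, plus the constraint.
    b = [gamma(math.dist(p, target)) for p in coords] + [1.0]
    solution = solve(a, b)
    weights = solution[:n]
    return sum(w * z for w, z in zip(weights, values)), weights


def gamma(h):
    # Assumed Gaussian semivariogram, no nugget, unit sill, range parameter 1.5.
    return 1.0 - math.exp(-(h / 1.5) ** 2)


# Made-up stations at the corners of a unit square.
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [10.0, 12.0, 14.0, 16.0]
centre_pred, centre_weights = ordinary_kriging(stations, values, (0.5, 0.5), gamma)
corner_pred, _ = ordinary_kriging(stations, values, (0.0, 0.0), gamma)
```

With no nugget the predictor is exact at the stations, and at the centre of the square the symmetry of the configuration gives equal weights to the four stations.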

SK and OK both assume stationarity of the spatial field, i.e. that the expected value of the variable does not change in space. This is often not the case with climatic variables, which tend to show spatial trends due to differences in the exposure to the atmospheric factors. Universal kriging (UK) allows incorporating non-stationarity by assuming a general linear trend model,

$$m(x) = \sum_{k=0}^{p} \beta_k f_k(x)$$

where β_k are the trend coefficients and p defines the order of the polynomial model on the spatial coordinates of the point, f_k(x). This process is often called trend removal, and it is of interest because it can capture a real spatial structure present in the data. However, it increases the complexity of the kriging model by adding more parameters to be estimated. A two-dimensional quadratic surface, for example, adds five parameters beyond the intercept. As is well known, the more parameters there are to estimate, the more uncertain the model becomes.
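The parameter count for a polynomial trend can be checked with a short sketch that enumerates the monomials of the two spatial coordinates up to a given order. The function name and the ordering of the terms are choices made here for illustration.

```python
def polynomial_trend_terms(x, y, order):
    """All monomials x**a * y**b with a + b <= order, intercept first."""
    terms = []
    for degree in range(order + 1):
        for a in range(degree, -1, -1):
            terms.append((x ** a) * (y ** (degree - a)))
    return terms


# Quadratic surface evaluated at an arbitrary point: 1, x, y, x^2, xy, y^2.
quadratic = polynomial_trend_terms(2.0, 3.0, order=2)
extra_parameters = len(quadratic) - 1  # terms beyond the intercept
```

For order 2 the terms are 1, x, y, x², xy and y², i.e. five coefficients in addition to the intercept, as stated above.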

Spatial structure can also arise in climatic data due to co-variation with other geographical factors such as the elevation or the solar incoming radiation. Co-kriging (CK) allows considering the influence of external variables (co-variates) by analysing the cross-correlation between the errors of the different variables, ε1’(x), ε2’(x), etc.

Spatial correlation may occur at different distances when different directions are considered; this characteristic is called anisotropy. Since the Ebro basin has a marked NW-SE structure, the effect of including anisotropy in the model was also evaluated.
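One common way to represent anisotropy (geometric anisotropy) is to rescale the separation vector before evaluating an isotropic semivariogram. The sketch below assumes that convention, with the major axis given as an azimuth measured clockwise from north and a ratio of major to minor range; the function name, the 135° azimuth and the ratio of 2 are illustrative assumptions, not values from the study.

```python
import math


def anisotropic_distance(p, q, azimuth_deg, ratio):
    """Effective separation between p and q under geometric anisotropy.

    azimuth_deg: direction of the major axis (longest range), clockwise
                 from north (the +y axis).
    ratio:       major range divided by minor range (>= 1).
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    theta = math.radians(azimuth_deg)
    # Project the separation onto the major axis and its perpendicular.
    along = dx * math.sin(theta) + dy * math.cos(theta)
    across = dx * math.cos(theta) - dy * math.sin(theta)
    # Stretch the across-axis component so one isotropic model can be used.
    return math.hypot(along, across * ratio)


s = math.sqrt(50.0)  # components of a 10-unit separation at 45 degrees
d_along = anisotropic_distance((0.0, 0.0), (s, -s), azimuth_deg=135.0, ratio=2.0)
d_across = anisotropic_distance((0.0, 0.0), (s, s), azimuth_deg=135.0, ratio=2.0)
```

Two stations 10 units apart along the NW-SE axis keep an effective distance of 10, while two stations 10 units apart across it are treated as if they were 20 units apart, so correlation decays faster in that direction.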

In our study we compared the OK, UK and CK methods. The order of the trend removal component in UK was chosen as the one giving the lowest root mean square error, computed by a leave-one-out bootstrap process. In the case of CK we used the elevation, as given by a digital terrain model (DTM), as the spatially distributed co-variate; the kriging method used was the better of the two previous methods, OK or UK. All geostatistical analyses were done with the ArcGIS 9.3 software.

Mixed methods

Mixed methods, also called “hybrid” (Hengl et al. 2004), are based on a combination of regression and local interpolation techniques or kriging. They exploit both the ability of regression to relate the target variable to other spatially distributed variables and the spatial autocorrelation that acts at the local scale on most spatial variables. Alternative forms of mixed methods have been proposed in recent years for mapping environmental variables (Odeh et al., 1994, 1995; Brown and Comrie, 2002; McBratney et al., 2003; Hengl et al., 2004; Ninyerola et al., 2007; Vicente-Serrano et al., 2007). These and other studies have demonstrated that mixed methods usually allow for more precise and detailed representations of the target variables.

There are several types of mixed interpolation methods, which differ in their procedure. When regression residuals (ε) are interpolated by means of kriging, two methods can be used: i) in kriging with external drift (KED), the drift component is defined by regression upon some auxiliary variables and fitted together with the spatial distribution of the residuals (Wackernagel, 1998; Chiles and Delfiner, 1999); ii) in regression-kriging (RK), the drift and the residuals are fitted separately and then summed (Ahmed and Marsily, 1987; Odeh et al., 1994, 1995). Other kinds of mixed methods interpolate the residuals using local methods such as inverse distance weighting or splines (Vicente-Serrano et al., 2003; Ninyerola et al., 2007).

In this study we used RK. To avoid misconceptions or sub-optimal solutions (Hengl et al. 2004), regression predictions were calculated by means of GLS (see section 2.4.1.), and then residual surfaces were fitted by OK and added to the GLS predictions. The R statistical analysis package (R Development Core Team, 2008) was used for RK.
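The two-step RK prediction described above can be summarized in one expression. The notation is introduced here for illustration and is an assumption, not the authors' own: q_k are the auxiliary predictors (with q_0 = 1), the β̂_k are the GLS coefficients, e(x_i) are the regression residuals at the n stations, and w_i are the OK weights for the target location x_0.

```latex
\hat{z}(x_0) = \hat{m}(x_0) + \hat{\varepsilon}(x_0)
             = \sum_{k=0}^{p} \hat{\beta}_k \, q_k(x_0)
             + \sum_{i=1}^{n} w_i(x_0) \, e(x_i),
\qquad e(x_i) = z(x_i) - \hat{m}(x_i) .
```

The first sum is the regression (drift) surface and the second is the kriged residual surface; the final map is their sum.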

In document El factor climático en la erosión del suelo: Erosividad de la lluvia en la cuenca del Ebro (Page 112-119)