
Appendix: synthMix-SVR

2 Background

Support vector machines (SVMs) emanate from the field of machine learning and provide flexible, non-parametric and nonlinear models that are well suited for exploiting remote sensing image data (e.g., Foody and Mathur 2004; Melgani and Bruzzone 2004; Camps-Valls et al. 2006; Durbha et al. 2007). Detailed introductions to support vector machines and their underlying concepts can be found in Schölkopf and Smola (2002), Smola and Schölkopf (2004) or Burges (1998). While the support vector classifier (SVC) has been established as a powerful technique for the per-pixel mapping of discrete land cover classes, little attention has been paid to the use of support vector regression (SVR) for estimating sub-pixel fractions of land cover. This can be partially explained by the difficulty of finding reliable quantitative training information, i.e., pairs of spectra and associated cover fractions, needed for regression modeling. In contrast to the training data for per-pixel classifiers, such training signatures can hardly be labeled in the image data itself or mapped in the field. An alternative solution, combining image spectra with spatially aggregated land cover information from a high-resolution reference map, often fails because the data sets are inaccurately co-registered. In this context, the combination of SVR with synthetically mixed training data has been demonstrated to overcome this drawback and was therefore recommended as a suitable approach for sub-pixel mapping purposes (Okujeni et al. 2013).

Generation of synthetically mixed training data

The general idea behind generating synthetically mixed training data is to produce a set of multiple mixed spectra along with related mixing fractions, which can be used as training input for regression modeling of a single target land cover category (Figure A-2). A library of pure material spectra, each assigned to its land cover category, forms the basis. Following the description in Okujeni et al. (2013), further processing steps include:

(1) The partitioning of the library spectra into a target category and a background category (includes all remaining categories).

(2) The calculation of synthetic mixtures between each pure spectrum of the target category (with 100% mixing fraction) and each pure spectrum of the background category (with 0% mixing fraction). For simplicity, linear mixing is assumed. Further, the user needs to define the mixing parameters, including the mixing complexity (number of material spectra that may be mixed) and the mixing interval (number of intermediate mixtures within the fraction range between 0 and 100%).

(3) The combination of all pure original and mixed spectra in a single spectral library. The mixing fraction of the respective target category is assigned to each spectrum.
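
The three steps above can be illustrated with a minimal Python sketch for a single target category. It is not the synthMix-SVR implementation: the function name `synth_mix`, the parameter `n_intermediate` (standing in for the mixing interval) and the evenly spaced fractions are assumptions made for illustration, and only binary linear mixtures are generated.

```python
def synth_mix(target_spectra, background_spectra, n_intermediate):
    """Build a training library of pure and binary mixed spectra.

    Hypothetical helper: returns two parallel lists, the spectra and the
    fraction of the target category assigned to each spectrum (0.0 to 1.0).
    """
    spectra, fractions = [], []

    # Pure library spectra keep their labels: 100% target, 0% background.
    for s in target_spectra:
        spectra.append(list(s))
        fractions.append(1.0)
    for s in background_spectra:
        spectra.append(list(s))
        fractions.append(0.0)

    # Linear binary mixtures between every target/background pair.
    # Assumption: intermediate fractions are evenly spaced in (0, 1).
    for t in target_spectra:
        for b in background_spectra:
            for k in range(1, n_intermediate + 1):
                f = k / (n_intermediate + 1)
                mixed = [f * tv + (1.0 - f) * bv for tv, bv in zip(t, b)]
                spectra.append(mixed)
                fractions.append(f)

    return spectra, fractions


# Toy two-band example: one target spectrum, two background spectra.
spectra, fractions = synth_mix([[0.4, 0.8]], [[0.0, 0.2], [0.2, 0.0]], 3)
```

With one target spectrum, two background spectra and three intermediate mixtures per pair, the resulting library holds three pure and six mixed spectra.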

synthMix-SVR was developed to generate synthetically mixed training data for multiple target categories through iterative processing. The user is requested to set the mixing interval. The current version of synthMix-SVR only supports the generation of binary mixtures between each pure spectrum of the target category and each pure spectrum of the background category. To account for environmental or instrumental errors, the user may optionally add noise to the spectral data.
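
As a rough illustration of this iterative processing, the following Python sketch loops over all categories of a library, treats each in turn as the target, and optionally perturbs the spectra. The function names, the dictionary layout of the library and the choice of additive Gaussian noise with standard deviation `noise_sigma` are assumptions; the noise model actually used by synthMix-SVR is not specified here.

```python
import random


def mix_pair(target, background, fraction):
    """Linear binary mixture with the given target fraction (0.0 to 1.0)."""
    return [fraction * t + (1.0 - fraction) * b
            for t, b in zip(target, background)]


def build_training_sets(library, n_intermediate, noise_sigma=0.0, seed=0):
    """Return {category: [(spectrum, target_fraction), ...]}.

    `library` maps a category name to its list of pure spectra
    (hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    training_sets = {}
    for target_name, target_spectra in library.items():
        # All remaining categories form the background.
        background_spectra = [s for name, spectra in library.items()
                              if name != target_name for s in spectra]
        samples = [(list(s), 1.0) for s in target_spectra]
        samples += [(list(s), 0.0) for s in background_spectra]
        for t in target_spectra:
            for b in background_spectra:
                for k in range(1, n_intermediate + 1):
                    fraction = k / (n_intermediate + 1)
                    samples.append((mix_pair(t, b, fraction), fraction))
        # Optional noise (assumed additive Gaussian); fractions stay unchanged.
        if noise_sigma > 0.0:
            samples = [([v + rng.gauss(0.0, noise_sigma) for v in s], f)
                       for s, f in samples]
        training_sets[target_name] = samples
    return training_sets
```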

Support vector regression modeling

SVR has been widely used as a powerful nonlinear technique, mainly for quantifying biophysical and biochemical plant properties (Camps-Valls et al. 2006; Durbha et al. 2007; Tuia et al. 2011). In general, SVR estimates a linear dependency between pairs of n-dimensional input vectors (i.e., the values of the spectral bands) and a 1-dimensional target variable (i.e., the land cover fraction of a target category) by fitting an optimal approximating hyperplane to the training data. For nonlinear problems, the training data are implicitly mapped by a kernel function into a higher-dimensional space, in which the new data distribution allows a linear hyperplane to be fitted more closely. The parameterization of an SVR requires the user to select the kernel parameter(s) g as well as the regularization parameter C and the parameter ε of the loss function. Once these parameters have been selected, the optimal approximating hyperplane is found by quadratic optimization.
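
To make the role of the three parameters more concrete, the sketch below writes out the standard building blocks of an ε-insensitive SVR in Python. It assumes that g denotes the width parameter of a Gaussian kernel in the form exp(-g * ||x - y||^2); the function names and argument layout are hypothetical, and the quadratic optimization that determines the coefficients (and in which C enters) is not shown.

```python
import math


def gaussian_kernel(x, y, g):
    """Gaussian (RBF) kernel, assuming the form exp(-g * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Deviations smaller than epsilon are not penalized at all."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x, support_vectors, coefficients, bias, g):
    """Kernel expansion of a trained SVR: sum_i coef_i * K(sv_i, x) + bias.

    The coefficients and bias are the result of the quadratic optimization.
    """
    return bias + sum(c * gaussian_kernel(sv, x, g)
                      for sv, c in zip(support_vectors, coefficients))
```

A larger g makes the kernel more local, a larger ε widens the tube within which training errors are ignored, and C controls how strongly errors outside that tube are penalized during training.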

synthMix-SVR integrates the SVR algorithm provided by imageSVM 3.0 (available from: www.imagesvm.net). imageSVM is an IDL-based tool for the SVM classification and regression analysis of remote sensing image data. imageSVM uses LIBSVM (Chang and Lin 2011) and a Gaussian kernel function during the training of the SVM. synthMix-SVR makes use of the imageSVM graphical user interface, which enables (i) automated or user-defined SVR model parameterization via grid search and internal validation, and (ii) the subsequent application of the model to derive a prediction. Once the synthetically mixed data for the selected target categories have been generated, synthMix-SVR iteratively trains SVR models and derives fraction maps by applying the models to the image data.
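
The exact search strategy of imageSVM is not described here. As a generic illustration of parameter selection by grid search with internal validation, the following Python sketch evaluates every combination of g, C and ε by k-fold cross-validation and keeps the one with the lowest mean squared error. The function names are hypothetical, and `fit` is a placeholder for any SVR training routine that accepts training data and the three parameters and returns a prediction function.

```python
import itertools


def k_fold_indices(n_samples, k):
    """Split sample indices into k interleaved folds."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def grid_search(X, y, fit, g_values, c_values, eps_values, k=3):
    """Return (best_mse, best_params) over the full parameter grid.

    `fit(train_X, train_y, g, c, eps)` must return a callable that maps
    one input vector to a predicted value (placeholder for SVR training).
    """
    folds = k_fold_indices(len(X), k)
    best = None
    for g, c, eps in itertools.product(g_values, c_values, eps_values):
        sq_err, n = 0.0, 0
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in range(len(X)) if i not in held]
            predict = fit([X[i] for i in train_idx],
                          [y[i] for i in train_idx], g, c, eps)
            # Internal validation on the fold left out of training.
            for i in held_out:
                sq_err += (predict(X[i]) - y[i]) ** 2
                n += 1
        mse = sq_err / n
        if best is None or mse < best[0]:
            best = (mse, {"g": g, "C": c, "epsilon": eps})
    return best
```

In practice the candidate values for g and C are often spaced exponentially (for example powers of 2 or 10), since suitable values can differ by orders of magnitude between data sets.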

Post-processing of fraction maps

Land cover fractions predicted by SVR are continuous and, where the model interpolates within the training data interval, take physically meaningful values between 0 and 100%. However, improper extrapolations may also result in unrealistic fractions, i.e., negative values (below 0%) or super-positive values (greater than 100%). Moreover, because single land cover categories are mapped independently of each other, it cannot be guaranteed that all fraction maps together sum to unity (100%). A comprehensive analysis and discussion is provided in Okujeni et al. (2013).

The synthMix-SVR post-processing module was designed to optionally account for unrealistic fraction values and to produce meaningful stacks of fraction maps that sum to unity.
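
How the module corrects the values is not detailed here. One simple way to achieve both goals for a single pixel, shown below as a Python sketch under that caveat, is to clip each predicted fraction to the range 0 to 1 and then rescale the clipped fractions so that they sum to 1. The function name and the clip-then-normalize order are assumptions, not the documented synthMix-SVR procedure.

```python
def postprocess_fractions(fractions):
    """Clip per-category fractions of one pixel to [0, 1], then rescale.

    Assumed procedure for illustration; fractions are given as 0.0 to 1.0.
    """
    clipped = [min(1.0, max(0.0, f)) for f in fractions]
    total = sum(clipped)
    if total == 0.0:
        # Nothing to rescale: all categories were predicted at or below 0.
        return clipped
    return [f / total for f in clipped]


# Three independent predictions for one pixel: -10%, 60% and 60%.
corrected = postprocess_fractions([-0.1, 0.6, 0.6])
```

In this example the negative value is set to 0 and the two remaining fractions are rescaled to 50% each.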

In document Quantifying urban land cover by means of machine learning and imaging spectrometer data at multiple spatial scales (Page 146-149)