Pixel-wise classification by SVM - Spectral-spatial classification of n-dimensional images in r

Given ann-dimensional image, the idea of pixel-wise classification is to assign a name (class of interest) to each pixel based on the spectral similarities among the pixel vectors. The assignment of a class to each pixel requires a priori knowledge of the image (supervised classification) in the form of labelled pixels. Success in classification depends not only in the ability to categorize and/or discriminate objects, but also in the set of training samples used for the same. The supervised classification begins with a training phase where a set of training samples (usually obtained by manually labelling) are used for defining a model of the classes in the spectral space. A second classification phase assigns a class to each pixel based on the model created in the previous phase producing a classification or thematic map [146].

There are a variety of classifiers, such as linear discriminant analysis, maximum likeli- hood, random forests, nearest-neighbors, artificial neural networks and support vector ma- chines [146, 60]. An exhaustive evaluation of classifiers belonging to a wide collection of families over the whole UCI machine learning classification database2, was presented in [60]. The classifiers most likely to be the best were the random forest and the SVM.

In remote sensing, the supervised classifier SVM has been generally recognized as the one that offers good results in terms of accuracy even, when the number of training samples is small [72, 55]. This is an important property in remote sensing classification as the number of training samples available in hyperspectral images is usually small. In this work, the SVM is used to classify hyperspectral images. In the following, we briefly describe the general mathematical formulation of SVM for binary classification problems.

The standard two-class SVM method consists in finding the optimal hyperplane which separates two training samples belonging to different classes, maximizing the distance between the closest points of each class. Figure 2.17(a) shows an example of two linearly separable classes[+1,−1]and the maximum margin between them.

Let us consider a set ofNtraining points{xi,yi}Ni=1with the training vectorsxi∈Rnand

the corresponding targetyi∈ {±1}. And let the setNbe linearly separable into two classes.

The hyperplane can be estimated as:

yi(w·xi+b)>1 with i=1,2, . . . ,N, (2.20)

2.7. Pixel-wise classification by SVM 35

(a) (b)

Figure 2.17:SVM classification: (a) optimal hyperplane separating two linearly separable classes[+1,−1], (b) a non-linearly separable case by SVM.

where the hyperplane is defined by the vectorw∈Rn_{and the bias}_b_∈_R_{. The training samples}

that maximize the distance are called support vectors (SVs), illustrated in Figure 2.17(a) with double-circle.

Maximizing the distance of samples to the optimal hyperplane is equivalent to minimizing the norm ofw. Therefore, the optimal hyperplane can be obtained by minimizing:

Q(w) =min " ||w||2 2 # , subject to (2.20). (2.21)

The assumption of linear separability means that there existwandbthat satisfy (2.20). When the data are linearly inseparable, nonnegative slack variablesξi are introduced in (2.20) to

deal with misclassified samples [1]:

yi(w·xi+b)>1−ξi, with ξi≥0,i=1,2, . . . ,N. (2.22)

Figure 2.17(b) shows an example of a non-linearly separable case by SVM. Now, the optimization problem can be described as:

Q(w,b,ξi) =min " ||w||2 2 +C N

∑

i=1 ξi # , subject to (2.22), (2.23)

where the constantCis a regularization parameter that controls the amount of penalty (maxi- mizes the margin and minimizes the error). This optimization is usually solved by a quadratic programming problem [1] in the training phase.

The classification phase is performed by computing the sign of the following decision function:

D(x) =

_∑

i∈S

αiyi(xi·x) +b, (2.24)

whereαiare the nonzero Lagrange multipliers,Sis the subset of training samplesxi, andxis

the unknown sample. The parameters (αi,S,b)are found during the training phase. For each

x, the corresponding class is+1 ifD(x)>0 and−1 otherwise.

Although the solution may be determined optimally by the SVM when the training samples are not linearly separable, the SVM approach uses kernel tricks to map the non-linearly separable data into a higher dimensional space, in order to enhance linear separability between the two classes. Several types of kernels, such as linear, polynomial, splines or Radial Basis Function (RBF) kernels can be used in SVM classifiers. We refer the reader to [1, 72, 27] for further details on SVM and kernel tricks.

For hyperspectral image classification, the Gaussian RBF has been widely used. The discriminant function (2.24) can be expressed using the RBF kernel as:

D(x) =

_∑

i∈S αiyiexp −γ||xi−x||2 +b, (2.25)

where exp(−γ||xi−x||2)is the RBF kernel andγis a positive parameter controlling the width

of the kernel.

2.7.1 5-fold cross validation

The SVM is mainly a nonparametric method, although some parameters need to be tuned before the optimization. In the RBF kernel case, there are two parameters:C, which controls the penalty term, andγ, which is the width of the kernel. The best parameters are usually found

byk–fold cross-validation, where several parameters are tested, for example by a search in a range of values (grid search).

Given a set of training samples, thek-fold cross validation divides the total number of samples intokpartitions. One partition is used to validate the accuracy of the classification, and the remaining k−1 partitions are used as training data. This validation is repeatedk

times, using each partition once as a test. The final classification accuracy is determined by the average with thekclassification results. In this work we have used 5–fold cross validation for tuning the SVM parameters.

2.7. Pixel-wise classification by SVM 37

2.7.2 Multi-class SVM classification

The standard SVM classifier is designed for two-class problems. When the data haveK>2 classes a method for solving the multiclass problem must be defined. These methods may be of the type one-against-all (OAA), one-against-one (OAO) and all-at-once, where a set of binary classifiers is combined. In the OAA approach, theKclass problem is converted intoK

binary classifiers where theithclassifier separates the classifrom the remaining classes. The OAO multiclass approach createsT =K(K−1)/2 binary classifiers on each pair of classes, whereK is the number of classes. The final classification for each point is given by the highest number of votes obtained for a class in theT classifiers. Hsu and Lin [80] found that the OAO approach is more suitable for practical use than the other methods, mainly because the training time is shorter. This is the approach used in all the SVM classifications carried out in this work.

2.7.3 LIBSVM: the facto library for SVM

In this section we present the facto library for SVM (LIBSVM) and the principal interfaces and extensions that are based on this library. LIBSVM [35] is a library for Support Vector Machine (SVM) written in C, widely used in machine learning. The library implements the One-Against-One approach for multiclass classification. It is currently one of the most widely used SVM software and it has been extended to third-party software such as Weka3, R4, Octave and Matlab5.

Owing to its popularity, it has been also implemented in other languages such as Java5, Python5and C#6, as well as in other architectures, such as Cell processors and GPUs.

While efficient algorithms for training SVM are available, dealing with large datasets makes training and classification a computationally challenging problem. In [109] the training phase implemented in the LIBSVM is speeded up by parallel computation in Cell processor architectures. LIBSVM has also been accelerated with GPUs using CUDA [8]. This GPU- accelerated LIBSVM implementation is a modification of the original LIBSVM that exploits CUDA with the same functionality and interface of LIBSVM. A comprehensive list of other SVM implementations and wrappers can be found in [60]. In this thesis we have implemented

3_{https://weka.wikispaces.com/LibSVM/}

4_{http://cran.r-project.org/web/packages/e1071/}

5_{Matlab, Octave, Java and Python implementations are included in the LIBSVM package.}

a new GPU proposal (GPUSVM) for the classification stage, and the LIBSVM [35] has been used as a basis for evaluating the proposed schemes in CPU.

In document Spectral-spatial classification of n-dimensional images in real-time based on segmentation and mathematical morphology on GPUs (Page 62-66)