
CHAPTER 2: LITERATURE REVIEW

2.5 Support Vector Machine

2.5.1 Introduction to Support Vector Machine

SVM is a state-of-the-art machine learning technique, often grouped with neural networks, that is based on the statistical learning theory developed by Vapnik from the 1970s onward, and it was introduced by Vapnik as a classification tool. SVM addresses learning problems with either linear or nonlinear functions and generally offers good performance. It can be viewed as a network-type model whose structure (the number of hidden units, which correspond to the support vectors) is determined automatically during training. SVM has been applied successfully in various classification and clustering applications and, more recently, has been extended to regression and prediction problems (Khader & McKee, 2014; Li et al., 2013; Lou et al., 2013; Shi & Xu, 2012; Wei, 2015; Wieland et al., 2010; Zakaria & Shabri, 2012; Zhou et al., 2013).

In SVM, a linear model is used to represent nonlinear class boundaries by first mapping the input data nonlinearly into a high-dimensional feature space. A linear model in this new space corresponds to a nonlinear decision boundary in the original input space (Solomatine et al., 2008; Wang et al., 2009). In other words, SVM constructs an optimal separating hyperplane in the feature space. Depending on the dimension of the space, this hyperplane may be a line, a plane, or a higher-dimensional surface that divides the data into two classes. When the data are linearly separable, linear machines are trained to find an optimal hyperplane that separates the data with minimum error (Chen & Yu, 2007; Solomatine et al., 2008).
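As a small illustration of this idea, the sketch below uses the four XOR points, which no straight line can separate in two dimensions. The feature map φ(x₁, x₂) = (x₁, x₂, x₁x₂) and the hyperplane weights are hypothetical choices made for illustration, not values from a trained SVM. With them, a flat hyperplane in the three-dimensional feature space classifies all four points correctly, while the corresponding boundary in the original space (x₁x₂ = 0) is the pair of coordinate axes, which is nonlinear.

```python
# Hypothetical explicit feature map: phi(x1, x2) = (x1, x2, x1*x2)
def phi(x):
    x1, x2 = x
    return (x1, x2, x1 * x2)

# XOR data with labels +1 / -1; not linearly separable in the original 2-D space
DATA = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]

# Hand-picked (not trained) hyperplane W . phi(x) + B = 0 in the feature space
W = (0.0, 0.0, 1.0)
B = 0.0

def decision(x):
    """Linear decision function evaluated in the feature space."""
    return sum(wi * zi for wi, zi in zip(W, phi(x))) + B

def classify_all():
    """Return (point, true label, predicted label) for every data point."""
    return [(x, label, 1 if decision(x) > 0 else -1) for x, label in DATA]

for point, label, pred in classify_all():
    print(point, label, pred)
```

An actual SVM would select the hyperplane that maximizes the margin; here it is fixed by hand to keep the example short.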

SVM is advantageous because it follows the structural risk minimization principle, which minimizes an upper bound on the generalization error by balancing the error on the training data against the complexity of the model. As a result, SVM can generalize effectively even when only a limited number of input patterns are available (T. Asefa et al., 2006; Ding et al., 2014).

University

2.5.2 History of Support Vector Machine

SVM is a comparatively new AI modelling technique based on the statistical learning theory developed by Vapnik in the 1970s. It was developed as a classification tool and has been applied successfully in a wide range of classification and clustering applications. More recently, SVM has been successfully extended to regression and prediction applications (Solomatine et al., 2008; Wu et al., 2008; Yu et al., 2006).

In recent years, owing to its high performance, SVM has become a commonly used modelling technique and a genuine competitor to ANNs in regression and prediction applications. Consequently, SVM has been applied increasingly across a wide range of fields, such as civil engineering, water resources, and other engineering applications (Tirusew Asefa et al., 2006; Behzad et al., 2009; Han et al., 2007; Misra et al., 2009; Zakaria & Shabri, 2012).

2.5.3 Training Process of Support Vector Machines

Consider a set of training data $\{(x_i, d_i)\}_{i=1}^{N}$, where $x_i$ is the input vector, $d_i$ is the corresponding output, and $N$ is the number of data patterns. The SVM regression function, which is linear in the feature space, can be expressed as follows:

$$f(x, \omega) = \omega \cdot \phi(x) + b \tag{2.11}$$

where $\omega$ is the weight vector, $b$ is the bias, and $\phi$ is the nonlinear mapping function.
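To make Equation (2.11) concrete, the sketch below evaluates f(x, ω) = ω · φ(x) + b for a scalar input, using a hypothetical explicit feature map φ(x) = (x, x²) and hand-picked values of ω and b. The function is linear in φ(x) but quadratic in x, which is how the nonlinear mapping lets a linear model capture a nonlinear relationship. In practice φ is usually not computed explicitly; it is handled implicitly through a kernel function.

```python
def phi(x):
    """Hypothetical explicit feature map for a scalar input: phi(x) = (x, x**2)."""
    return (x, x * x)

def f(x, omega, b):
    """Equation (2.11): f(x, omega) = omega . phi(x) + b."""
    return sum(w * z for w, z in zip(omega, phi(x))) + b

omega = (1.0, 0.5)  # hypothetical weight vector
b = -1.0            # hypothetical bias
# Linear in phi(x), but quadratic in the original input x
print([f(x, omega, b) for x in (0.0, 1.0, 2.0)])
```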


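The parameters $\omega$ and $b$ are estimated by minimizing a regularized risk based on Vapnik's ε-insensitive loss function:

$$L_\varepsilon\left(y, f(x, \omega)\right) = \left|y - f(x, \omega)\right|_\varepsilon = \begin{cases} \left|y - \left(\omega \cdot \phi(x) + b\right)\right| - \varepsilon, & \text{if } \left|\left(\omega \cdot \phi(x) + b\right) - y\right| > \varepsilon \\ 0, & \text{otherwise} \end{cases} \tag{2.12}$$

where $y$ is the observed value and ε is the tube size, which corresponds to the approximation accuracy required for the training data points. Errors within the ε-tube are tolerated and incur no loss, whereas data points lying outside the tube are penalized by the loss $L_\varepsilon$.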
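A minimal Python sketch of the ε-insensitive loss in Equation (2.12) follows; the function name and the numbers in the usage lines are illustrative assumptions. It shows that residuals smaller than ε cost nothing, while larger residuals are penalized linearly by the amount by which they exceed ε.

```python
def eps_insensitive_loss(y, f_x, eps):
    """Vapnik's epsilon-insensitive loss (Equation 2.12) for one data point."""
    residual = abs(y - f_x)
    # Inside the epsilon-tube the error is tolerated; outside it grows linearly
    return residual - eps if residual > eps else 0.0

# Illustrative values with tube size eps = 0.1
print(eps_insensitive_loss(1.0, 1.05, 0.1))  # inside the tube
print(eps_insensitive_loss(1.0, 1.50, 0.1))  # outside the tube
```

Unlike the squared error, this loss ignores small deviations, which is what allows the fitted function to be described by only a subset of the training points (the support vectors).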

The nonlinear SVR problem can be expressed as the following optimization problem:

$$R(\omega, \xi, \xi^*) = C\sum_{i=1}^{N}\left(\xi_i + \xi_i^*\right) + \frac{1}{2}\|\omega\|^2 \tag{2.13}$$

The first term in Equation (2.13), $C\sum_{i=1}^{N}(\xi_i + \xi_i^*)$, represents the training error (empirical risk), where $\xi_i$ and $\xi_i^*$ are slack variables representing the upper and lower training errors, respectively, subject to the error tolerance ε. These variables describe the distance between the observed data and the corresponding boundary of the ε-tube, and they are zero when the data points lie within the ε-tube, as shown in Figure 2.10. The second term, $\frac{1}{2}\|\omega\|^2$, is the regularization (generalization) term and measures the flatness of the function. C is a positive regularization constant that controls the trade-off between the empirical risk and the regularization term; increasing C gives the empirical risk more weight relative to the regularization term.
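The sketch below evaluates the objective of Equation (2.13) for a given weight and bias, assuming, for illustration only, scalar inputs and the identity mapping φ(x) = x. Each slack variable is set to the smallest value allowed by the constraints in Equation (2.15), so it is zero for points inside the ε-tube. The data are hypothetical; evaluating the same fit with two values of C shows how a larger C increases the weight of the training error relative to the flatness term.

```python
def svr_objective(omega, b, xs, ds, C, eps):
    """Equation (2.13), assuming phi(x) = x and scalar inputs."""
    risk = 0.0
    for x, d in zip(xs, ds):
        f_x = omega * x + b
        xi = max(0.0, f_x - d - eps)       # slack xi_i: prediction above the tube bound
        xi_star = max(0.0, d - f_x - eps)  # slack xi_i*: prediction below the tube bound
        risk += xi + xi_star
    # First term: C * training error; second term: flatness 0.5 * ||omega||^2
    return C * risk + 0.5 * omega ** 2

# Hypothetical toy data; the last point lies 0.5 above the line d = x
xs = [0.0, 1.0, 2.0]
ds = [0.0, 1.0, 2.5]
for C in (1.0, 10.0):
    print(C, svr_objective(1.0, 0.0, xs, ds, C, eps=0.1))
```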


Minimizing Equation (2.13) is equivalent to solving the following constrained convex optimization problem:

$$\text{Minimize:} \quad \frac{1}{2}\|\omega\|^2 + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^*\right) \tag{2.14}$$

$$\text{Subject to:} \quad \begin{cases} \omega \cdot \phi(x_i) + b - d_i \le \varepsilon + \xi_i \\ d_i - \left(\omega \cdot \phi(x_i) + b\right) \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0, \quad i = 1, 2, \ldots, N \end{cases} \tag{2.15}$$

Figure 2.10 illustrates the main concept of SVM based on Equations (2.13) to (2.15). In this regression problem, most data patterns are expected to lie within the ε-tube. If a data pattern $(x_i, d_i)$ lies outside the ε-tube, the deviation is absorbed by the slack variable $\xi_i$ or $\xi_i^*$, which is penalized in the objective function. By minimizing both the regularization term $\frac{1}{2}\|\omega\|^2$ and the training error $C\sum_{i=1}^{N}(\xi_i + \xi_i^*)$, SVR alleviates both under-fitting and over-fitting (Chen & Yu, 2007; Noori et al., 2011; Wu et al., 2008).
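Standard SVR software usually solves this problem through its dual quadratic program. As a simpler, self-contained illustration, the sketch below instead minimizes the primal objective of Equation (2.14) directly by subgradient descent, substituting the smallest slacks that satisfy Equation (2.15). This is a swapped-in technique, not the method of the cited studies. It assumes a linear model with φ(x) = x, scalar inputs, and hypothetical toy data lying on d = 2x + 1; the step-size schedule and iteration count are arbitrary choices.

```python
import math

def primal_objective(w, b, xs, ds, C, eps):
    """Equation (2.14) with the smallest slacks allowed by (2.15), phi(x) = x."""
    slack = sum(max(0.0, abs(d - (w * x + b)) - eps) for x, d in zip(xs, ds))
    return 0.5 * w * w + C * slack

def fit_linear_svr(xs, ds, C=10.0, eps=0.1, steps=5000, eta0=0.01):
    """Subgradient descent on the primal; returns (objective, w, b) of the best iterate."""
    w, b = 0.0, 0.0
    best = (primal_objective(w, b, xs, ds, C, eps), w, b)
    for t in range(steps):
        gw, gb = w, 0.0  # gradient of 0.5 * w^2 (the bias is not regularized)
        for x, d in zip(xs, ds):
            r = d - (w * x + b)
            if r > eps:      # observation above the tube: xi_i* > 0
                gw -= C * x
                gb -= C
            elif r < -eps:   # observation below the tube: xi_i > 0
                gw += C * x
                gb += C
        eta = eta0 / math.sqrt(t + 1)  # diminishing step size
        w -= eta * gw
        b -= eta * gb
        J = primal_objective(w, b, xs, ds, C, eps)
        if J < best[0]:
            best = (J, w, b)
    return best

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ds = [1.0, 3.0, 5.0, 7.0, 9.0]  # hypothetical toy data on d = 2x + 1
J, w, b = fit_linear_svr(xs, ds)
print(J, w, b)
```

Because the subgradient method is not monotone, the function keeps the best iterate seen so far. The fitted line should lie close to d = 2x + 1, with ω slightly below 2, since the regularization term favours flatter functions as long as the ε-tube still covers the data.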

Figure 2.10: Nonlinear SVM with Vapnik’s ε-insensitive loss function (Chen & Yu, 2007)


Figure 2.11 presents the schematic diagram of the SVM architecture, in which $K(x_i, x)$ is the output of the $i$th hidden node for the input vector $x$; it maps the input $x$ and the support vector $x_i$ through the selected kernel function (Chen & Yu, 2007).

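Commonly used kernel functions in SVM are as follows:

- Linear: $K(x_i, x) = x_i \cdot x$

- Polynomial: $K(x_i, x) = \left[\gamma (x_i \cdot x) + c\right]^{p}$

- Sigmoid: $K(x_i, x) = \tanh\left[\gamma (x_i \cdot x) + c\right]$

- Radial basis function (RBF): $K(x_i, x) = \exp\left(-\gamma \|x_i - x\|^2\right)$

where γ, c, and p (the polynomial degree) are kernel parameters.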
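The four kernels listed above can be written as plain Python functions, as in the sketch below. The default parameter values (γ, c, and the polynomial degree p) are hypothetical placeholders; in practice they are tuned for each application.

```python
import math

def dot(u, v):
    """Inner product x_i . x of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(xi, x):
    return dot(xi, x)

def polynomial_kernel(xi, x, gamma=1.0, c=1.0, p=2):
    # [gamma * (x_i . x) + c]^p; default values are placeholders
    return (gamma * dot(xi, x) + c) ** p

def sigmoid_kernel(xi, x, gamma=1.0, c=0.0):
    return math.tanh(gamma * dot(xi, x) + c)

def rbf_kernel(xi, x, gamma=1.0):
    # exp(-gamma * ||x_i - x||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)

xi, x = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(xi, x), polynomial_kernel(xi, x), sigmoid_kernel(xi, x), rbf_kernel(xi, x))
```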

Many applications in hydrological modelling have demonstrated the effectiveness of the radial basis function kernel in SVM. The output of the SVM model can be expressed as:

$$f(x) = \sum_{i=1}^{m} \alpha_i K(x_i, x) + b \tag{2.16}$$

where $x_i$ represents the $i$th support vector and $m$ is the number of support vectors. The SVM model employed herein has three interdependent parameters (C, ε, γ) that must be determined; their near-optimal values are obtained by trial and error. The Lagrange coefficients $\alpha_i$ and the bias term $b$ can then be determined by solving the optimization problem, which yields the final model structure.
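Equation (2.16) can be sketched as follows with the RBF kernel. The support vectors, coefficients αᵢ, bias b, and γ below are hypothetical numbers chosen only to exercise the formula; in a real model the coefficients and bias come from training, and C, ε, and γ from the trial-and-error tuning described above.

```python
import math

def rbf_kernel(xi, x, gamma):
    """RBF kernel K(x_i, x) = exp(-gamma * ||x_i - x||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, x)))

def svm_predict(x, support_vectors, alphas, bias, gamma):
    """Equation (2.16): sum of alpha_i * K(x_i, x) over the m support vectors, plus b."""
    return sum(a * rbf_kernel(sv, x, gamma) for sv, a in zip(support_vectors, alphas)) + bias

# Hypothetical values standing in for a trained model
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
alphas = [2.0, -1.0]
bias = 0.5
gamma = 0.5
print(svm_predict((0.0, 0.0), support_vectors, alphas, bias, gamma))
```

Each term of the sum corresponds to one hidden node in the architecture of Figure 2.11, and the output node adds the bias.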


Figure 2.11: Schematic diagram of SVM architecture (Chen & Yu, 2007)