
2017 2nd International Conference on Computational Modeling, Simulation and Applied Mathematics (CMSAM 2017) ISBN: 978-1-60595-499-8

Robust Multi-Weight Vector Projection Support Vector Machine

Heng-hao ZHAO and Qiao-lin YE*

College of Information Science and Technology, Nanjing Forestry University, Nanjing, Jiangsu 210037, P. R. China

Jiangsu Key Laboratory of Image and Video Understanding for Social Safety, Nanjing University of Science and Technology, 210094, P. R. China

*Corresponding author

Keywords: MVSVM, L1-norm, Iterative algorithm, Robustness to outliers.

Abstract. The recently proposed multi-weight vector projection support vector machine (MVSVM) is an outstanding algorithm for binary classification. However, it measures distance in the objective function by the squared L2-norm, which exaggerates the impact of outliers. To alleviate this, we propose an effective algorithm, termed robust MVSVM based on the L1-norm distance (L1-MVSVM), in which the distance in the objective is measured by the L1-norm. In addition, we design an iterative algorithm to solve the resulting L1-norm optimization problem, whose convergence is theoretically ensured. Finally, the effectiveness of L1-MVSVM is verified through experiments on UCI datasets.

Introduction

In the last two decades, the support vector machine (SVM) has gained a great deal of attention due to its strong generalization ability, and it has become a powerful classification method in machine learning [1]. For binary classification, Mangasarian and Wild proposed a new approach named the generalized eigenvalue proximal support vector machine (GEPSVM) [2], in which each of the two data sets is proximal to one of two distinct planes that are not parallel to each other. This characteristic leads to lower computational complexity and good classification performance; on the XOR problem in particular, GEPSVM has a large advantage over SVM. Following GEPSVM, many researchers have improved it in various aspects. Guarracino et al. proposed the regularized general eigenvalue classifier (ReGEC) [3]. Ye et al. proposed the multi-weight vector projection support vector machine (MVSVM) [4-5].

GEPSVM and its improved variants are sensitive to outliers and noise, because these models adopt the squared L2-norm as the distance criterion. In recent years, many papers have shown that the L1-norm distance is robust to outliers and noise [6-9]. In [9], Kwak first introduced the L1-norm distance into PCA. The L1-norm distance was then applied to the discriminant criterion of LDA for feature extraction. Based on the L1-norm distance, Li et al. proposed the robust L1-NPSVM [6], which replaces the squared L2-norm distance criterion in GEPSVM with the L1-norm distance and thereby makes GEPSVM robust to outliers and noise.
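As a small illustration of why the squared L2-norm exaggerates outliers, the sketch below compares how much of the total distance a single outlying sample contributes under each criterion. The residual values and the helper `outlier_share` are made up for illustration; they do not come from the experiments in this paper.

```python
# Hypothetical projected distances of four samples to their class mean;
# the last one plays the role of an outlier (values are illustrative only).
residuals = [0.5, -1.0, 0.8, 10.0]


def outlier_share(values, power):
    """Fraction of sum(|r|**power) contributed by the largest residual."""
    costs = [abs(r) ** power for r in values]
    return max(costs) / sum(costs)


share_l2 = outlier_share(residuals, 2)  # squared L2-norm criterion
share_l1 = outlier_share(residuals, 1)  # L1-norm criterion

print(f"outlier share, squared L2: {share_l2:.3f}")
print(f"outlier share, L1:         {share_l1:.3f}")
```

Under the squared L2-norm the outlier accounts for almost the whole cost, while under the L1-norm its share is noticeably smaller, which is the effect the L1-norm criterion is meant to exploit.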

We consider a binary classification problem in an n-dimensional space. We suppose that we have a binary training set of m samples, denoted by

$$\{(x_j^{(i)}, y_i) \mid i = 1, 2;\ j = 1, 2, \ldots, m_i\}.$$

In the training set, $x_j^{(i)}$ denotes the j-th sample of the i-th class, and $y_i \in \{+1, -1\}$ is the class label of the sample, which indicates the positive or the negative class. In the following, the matrix $A$ of size $m_1 \times n$ and the matrix $B$ of size $m_2 \times n$ hold the samples of the positive and the negative class, respectively. We also define a pair of column vectors of ones, $e_1 \in R^{m_1}$ and $e_2 \in R^{m_2}$.

In this paper, we address the non-robustness of MVSVM to outliers. The convergence of the proposed algorithm is proved theoretically, and its effectiveness is verified by experimental results.

MVSVM

Like GEPSVM, MVSVM is formulated as two eigenvalue problems, but it differs from GEPSVM in spirit. Instead of finding specific planes, MVSVM aims to find the weight-vector projections w1 and w2, one for each class. MVSVM is fast to compute and also handles complex Exclusive Or (XOR) problems well.

The optimization criteria of MVSVM are given by:

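$$\max_{w_1}\ \left(\frac{1}{m_1}\sum_{j=1}^{m_1} w_1^T x_j^{(1)} - \frac{1}{m_2}\sum_{j=1}^{m_2} w_1^T x_j^{(2)}\right)^2 - \beta \sum_{i=1}^{m_1}\left(w_1^T x_i^{(1)} - \frac{1}{m_1}\sum_{j=1}^{m_1} w_1^T x_j^{(1)}\right)^2 \tag{1}$$

$$\max_{w_2}\ \left(\frac{1}{m_2}\sum_{j=1}^{m_2} w_2^T x_j^{(2)} - \frac{1}{m_1}\sum_{j=1}^{m_1} w_2^T x_j^{(1)}\right)^2 - \beta \sum_{i=1}^{m_2}\left(w_2^T x_i^{(2)} - \frac{1}{m_2}\sum_{j=1}^{m_2} w_2^T x_j^{(2)}\right)^2 \tag{2}$$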

Let $\mu_1 = \frac{1}{m_1}\sum_{j=1}^{m_1} x_j^{(1)}$ be the mean vector of the positive samples and $\mu_2 = \frac{1}{m_2}\sum_{j=1}^{m_2} x_j^{(2)}$ be the mean vector of the negative samples. $S_1 = (A - e_1\mu_1^T)^T (A - e_1\mu_1^T)$ is the divergence matrix of the positive samples, $S_2 = (B - e_2\mu_2^T)^T (B - e_2\mu_2^T)$ is the divergence matrix of the negative samples, and $S_3 = (\mu_2 - \mu_1)(\mu_2 - \mu_1)^T$ is the inter-class divergence matrix. We can rewrite problems (1) and (2) as:

$$\max_{w_1}\ w_1^T S_3 w_1 - \beta\, w_1^T S_1 w_1 \tag{3}$$

$$\max_{w_2}\ w_2^T S_3 w_2 - \beta\, w_2^T S_2 w_2 \tag{4}$$

where $\beta$ is a free trade-off parameter. According to the above criteria, MVSVM finds two optimal weight-vector projections (one for each class), such that the projected points of each class are as close as possible to their own class mean, while the points with different labels are separated as far as possible.
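The equivalence between the sample-sum form (1) and the matrix form (3) can be checked numerically. The following is a minimal sketch in plain Python; the function names and the toy data are hypothetical and chosen only so that the numbers are easy to follow by hand.

```python
def mean_vector(X):
    """Column-wise mean of a list of samples."""
    return [sum(col) / len(X) for col in zip(*X)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def criterion_sums(w, A, B, beta):
    """Objective of problem (1), written with explicit sums over samples."""
    proj_a = [dot(w, x) for x in A]
    proj_b = [dot(w, x) for x in B]
    mean_a = sum(proj_a) / len(proj_a)
    mean_b = sum(proj_b) / len(proj_b)
    between = (mean_a - mean_b) ** 2
    within = sum((p - mean_a) ** 2 for p in proj_a)
    return between - beta * within


def criterion_matrix(w, A, B, beta):
    """Objective of problem (3): w^T S3 w - beta * w^T S1 w."""
    mu1, mu2 = mean_vector(A), mean_vector(B)
    n = len(w)
    H = [[x - m for x, m in zip(row, mu1)] for row in A]   # A - e1 mu1^T
    S1 = [[sum(h[r] * h[c] for h in H) for c in range(n)] for r in range(n)]
    d = [b - a for a, b in zip(mu1, mu2)]                  # mu2 - mu1
    S3 = [[d[r] * d[c] for c in range(n)] for r in range(n)]

    def quad(S):
        return sum(w[r] * S[r][c] * w[c] for r in range(n) for c in range(n))

    return quad(S3) - beta * quad(S1)


# Toy data (illustrative only): the two classes differ along the second axis.
A = [[0.0, 0.0], [2.0, 0.0]]
B = [[0.0, 3.0], [2.0, 3.0]]
for w in ([0.0, 1.0], [1.0, 0.0]):
    print(w, criterion_sums(w, A, B, beta=1.0), criterion_matrix(w, A, B, beta=1.0))
```

On this toy data the direction along the second axis scores higher than the direction along the first, since it separates the class means without spreading the positive samples.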

Related Works

MVSVM Based on the L1-norm Distance

Traditional MVSVM model:

$$\min_{w_1}\ \frac{w_1^T (A - e_1\mu_1^T)^T (A - e_1\mu_1^T)\, w_1}{w_1^T (\mu_2 - \mu_1)(\mu_2 - \mu_1)^T w_1} \tag{5}$$

$$\min_{w_2}\ \frac{w_2^T (B - e_2\mu_2^T)^T (B - e_2\mu_2^T)\, w_2}{w_2^T (\mu_2 - \mu_1)(\mu_2 - \mu_1)^T w_2} \tag{6}$$

Problems (5) and (6) have the same form, so we take problem (5) as the example in the derivation.

Setting $H = A - e_1\mu_1^T$ and $G = (\mu_2 - \mu_1)^T$, we can rewrite problem (5) as:

$$\min_{w_1}\ \frac{\|Hw_1\|_2^2}{\|Gw_1\|_2^2} \tag{7}$$

Replacing the squared L2-norm with the L1-norm gives:

$$\min_{w_1}\ \frac{\|Hw_1\|_1}{\|Gw_1\|_1} \tag{8}$$

To facilitate the following solution, we rewrite problem (8) in maximization form:

$$\max_{w_1}\ \frac{\sum_{i=1}^{m_1} |g_i w_1|}{\sum_{i=1}^{m_2} |h_i w_1|} \tag{9}$$

where $g_i$ and $h_i$ denote the i-th rows of $G$ and $H$, respectively. Note that rescaling $w_1$ does not change the objective value of the original problem.
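The scale invariance of the ratio objective (9) is what allows the problem to be recast with an equality constraint. It can be checked with a short sketch; the function name `l1_ratio` and the matrices below are illustrative assumptions, and the sums simply run over whatever rows the two matrices have.

```python
def l1_ratio(G, H, w):
    """Objective of problem (9): sum_i |g_i w| / sum_i |h_i w|."""
    num = sum(abs(sum(g * x for g, x in zip(row, w))) for row in G)
    den = sum(abs(sum(h * x for h, x in zip(row, w))) for row in H)
    return num / den


# Illustrative matrices (not from the paper); rows play the roles of g_i and h_i.
G = [[1.0, 3.0]]
H = [[-2.0, 0.5], [1.0, -1.0], [1.0, 0.5]]
w = [0.25, 0.75]

base = l1_ratio(G, H, w)
scaled = l1_ratio(G, H, [5.0 * x for x in w])   # same direction, 5x the length
print(base, scaled)
```

Both calls return the same value because numerator and denominator scale by the same factor, so only the direction of the weight vector matters.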

We convert problem (9) into a maximization problem with an equality constraint:

$$\max_{w_1}\ \sum_{i=1}^{m_1} |g_i w_1|, \quad \text{s.t.}\ \sum_{i=1}^{m_2} |h_i w_1| = 1 \tag{10}$$

From:

$$\sum_{i=1}^{m_1} |g_i w_1| = \sum_{i=1}^{m_1} w_1^T \frac{g_i^T g_i}{|g_i w_1|} w_1 = \sum_{i=1}^{m_1} \mathrm{sign}(g_i w_1)(g_i w_1) \tag{11}$$

$$\sum_{i=1}^{m_2} |h_i w_1| = \sum_{i=1}^{m_2} w_1^T \frac{h_i^T h_i}{|h_i w_1|} w_1 = \sum_{i=1}^{m_2} \mathrm{sign}(h_i w_1)(h_i w_1) \tag{12}$$

Setting $f_{ii} = 1/|h_i w_1|$ and $k_i = \mathrm{sign}(g_i w_1)$, we can rewrite problem (10) as:

$$\max_{w_1}\ \sum_{i=1}^{m_1} k_i (g_i w_1), \quad \text{s.t.}\ \sum_{i=1}^{m_2} w_1^T (h_i^T f_{ii} h_i) w_1 = 1 \tag{13}$$

We denote by $w_1^{(p)}$ the p-th iterate of $w_1$. Then $w_1^{(p+1)}$ is obtained from:

$$w_1^{(p+1)} = \arg\max_{w_1}\ \sum_{i=1}^{m_1} k_i^{(p)} (g_i w_1), \quad \text{s.t.}\ \sum_{i=1}^{m_2} w_1^T (h_i^T f_{ii}^{(p)} h_i) w_1 = 1 \tag{14}$$

where $f_{ii}^{(p)} = 1/|h_i w_1^{(p)}|$ and $k_i^{(p)} = \mathrm{sign}(g_i w_1^{(p)})$. It is easy to verify that $k_i^{(p)} (g_i w_1)$ is the first-order Taylor expansion of $|g_i w_1|$ at the point $w_1^{(p)}$. Problem (14) can be rewritten in matrix form:

$$w_1^{(p+1)} = \arg\max_{w_1}\ K^{(p)} G w_1, \quad \text{s.t.}\ w_1^T H^T F^{(p)} H w_1 = 1 \tag{15}$$

where $F^{(p)} = \mathrm{diag}(f_{11}^{(p)}, f_{22}^{(p)}, \ldots, f_{m_2 m_2}^{(p)})$ and $K^{(p)} = \mathrm{sign}(w_1^{(p)T} G^T)$ is a row vector (written $k^{(p)}$ from (19) onward).

We now give the general form of the solution of problem (15). Construct the Lagrangian function of problem (15):

$$L(w_1, \gamma) = K^{(p)} G w_1 - \frac{1}{2}\gamma\,(w_1^T H^T F^{(p)} H w_1 - 1) \tag{16}$$

where $\gamma$ is the Lagrange multiplier. Taking the derivative of $L(w_1, \gamma)$ with respect to $w_1$ and setting it to zero, we get the solution of problem (15):

$$w_1^{(p+1)} = \frac{1}{\gamma}\,(H^T F^{(p)} H)^{-1} G^T K^{(p)T} \tag{17}$$
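For readers who want the intermediate step, the stationarity condition behind (17) can be written out as follows. This is a routine worked step, assuming that $H^T F^{(p)} H$ is symmetric (it is, since $F^{(p)}$ is diagonal) and invertible.

```latex
\frac{\partial L}{\partial w_1}
  = G^T K^{(p)T} - \gamma\, H^T F^{(p)} H\, w_1 = 0
\quad\Longrightarrow\quad
w_1 = \frac{1}{\gamma}\,\bigl(H^T F^{(p)} H\bigr)^{-1} G^T K^{(p)T}.
```

The multiplier $\gamma$ only sets the length of $w_1$; the direction is fixed by $(H^T F^{(p)} H)^{-1} G^T K^{(p)T}$.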

Substituting the solution (17) into the equality constraint $w_1^T H^T F^{(p)} H w_1 = 1$ gives:

$$(K^{(p)} G)(H^T F^{(p)} H)^{-1}(G^T K^{(p)T}) = \gamma^2 \tag{18}$$

Finally, we get:

$$w_1^{(p+1)} = \frac{(H^T F^{(p)} H)^{-1} G^T k^{(p)T}}{\sqrt{(k^{(p)} G)(H^T F^{(p)} H)^{-1}(G^T k^{(p)T})}} \tag{19}$$

Similarly, applying the same procedure to problem (6) gives:

$$w_2^{(p+1)} = \frac{(E^T F^{(p)} E)^{-1} N^T k^{(p)T}}{\sqrt{(k^{(p)} N)(E^T F^{(p)} E)^{-1}(N^T k^{(p)T})}} \tag{20}$$

where $E = B - e_2\mu_2^T$ and $N = (\mu_2 - \mu_1)^T$.

Algorithm 1: An efficient iterative algorithm to solve the problem (8).

Data: Input the data matrix $X = [A; B]$

Result: $w_1$

Set $H = A - e_1\mu_1^T$ and $G = (\mu_2 - \mu_1)^T$. Initialize $w_1^{(p)}$ and set p = 1.

Repeat:

1) Compute $F^{(p)} = \mathrm{diag}(f_{11}^{(p)}, f_{22}^{(p)}, \ldots, f_{m_2 m_2}^{(p)})$ and $k^{(p)} = \mathrm{sign}(w_1^{(p)T} G^T)$.

2) Compute $w_1^{(p+1)}$ by the problem (15).

3) Normalize $w_1^{(p+1)}$ by the formula $w_1^{(p+1)} = w_1^{(p+1)} / \|w_1^{(p+1)}\|_1$ and set p = p + 1.

Until convergence
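The steps of Algorithm 1 can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the authors' MATLAB implementation: the function names, the toy data, the starting vector, the `eps` guard against division by zero, and the choice to treat sign(0) as +1 are all assumptions. The scalar denominator of (19) is omitted because step 3 renormalizes the vector anyway, and `delta` corresponds to the regularization term $\delta I$ (set to 0 here).

```python
def matvec(M, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def l1_ratio(G, H, w):
    """Objective (9): sum_i |g_i w| / sum_i |h_i w|."""
    return sum(abs(v) for v in matvec(G, w)) / sum(abs(v) for v in matvec(H, w))


def l1_mvsvm_w1(A, B, delta=0.0, max_iter=20, tol=1e-3, eps=1e-12):
    """Iterate steps 1-3 of Algorithm 1; returns w1 and the objective history."""
    n = len(A[0])
    mu1 = [sum(col) / len(A) for col in zip(*A)]
    mu2 = [sum(col) / len(B) for col in zip(*B)]
    H = [[x - m for x, m in zip(row, mu1)] for row in A]    # A - e1 mu1^T
    G = [[b - a for a, b in zip(mu1, mu2)]]                 # (mu2 - mu1)^T
    w = [1.0 / n] * n                                       # assumed start, ||w||_1 = 1
    history = [l1_ratio(G, H, w)]
    for _ in range(max_iter):
        # Step 1: diagonal of F^(p) and the sign vector k^(p).
        f = [1.0 / max(abs(v), eps) for v in matvec(H, w)]
        k = [1.0 if v >= 0 else -1.0 for v in matvec(G, w)]
        # Step 2: direction of (19), i.e. (H^T F H + delta I)^{-1} G^T k^T.
        M = [[sum(f[i] * H[i][r] * H[i][c] for i in range(len(H)))
              + (delta if r == c else 0.0) for c in range(n)] for r in range(n)]
        rhs = [sum(k[i] * G[i][c] for i in range(len(G))) for c in range(n)]
        w = solve(M, rhs)
        # Step 3: L1 normalization.
        norm1 = sum(abs(v) for v in w)
        w = [v / norm1 for v in w]
        history.append(l1_ratio(G, H, w))
        if abs(history[-1] - history[-2]) < tol:
            break
    return w, history


# Toy data (illustrative only); the last positive sample acts as an outlier.
A = [[1.0, 2.0], [2.0, 1.5], [1.5, 3.0], [8.0, 0.5]]
B = [[4.0, 5.0], [5.0, 4.5], [4.5, 6.0]]
w1, history = l1_mvsvm_w1(A, B)
print(w1)
print(history)
```

Printing `history` lets one watch the value of objective (9) from one iteration to the next, which is the quantity that Theorem 1 is concerned with.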

Theorem 1: In each iteration, Algorithm 1 monotonically increases the objective function (9).

Proof: We solve problem (15) iteratively and denote its objective function by $J(w_1)$. Since $w_1^{(p+1)}$ is the maximizer of problem (15), we know that $J(w_1^{(p+1)}) \ge J(w_1^{(p)})$. Therefore:

$$k^{(p)} G w_1^{(p+1)} \ge k^{(p)} G w_1^{(p)} \tag{21}$$

$$\sum_{i=1}^{m_1} \mathrm{sign}(g_i w_1^{(p)})(g_i w_1^{(p+1)}) \ge \sum_{i=1}^{m_1} \mathrm{sign}(g_i w_1^{(p)})(g_i w_1^{(p)}) = \sum_{i=1}^{m_1} |g_i w_1^{(p)}| \tag{22}$$

Since $f(w_1) = |g_i w_1|$ is a convex function, we get the inequality:

$$f(w_1^{(p+1)}) \ge f(w_1^{(p)}) + f'(w_1)\big|_{w_1 = w_1^{(p)}}\,(w_1^{(p+1)} - w_1^{(p)}) \tag{23}$$

Inequality (23) can be written as

$$\sum_{i=1}^{m_1} |g_i w_1^{(p+1)}| \ge \sum_{i=1}^{m_1} |g_i w_1^{(p)}| + \sum_{i=1}^{m_1} \mathrm{sign}(g_i w_1^{(p)})\, g_i (w_1^{(p+1)} - w_1^{(p)}) \tag{24}$$

So we get

$$\sum_{i=1}^{m_1} |g_i w_1^{(p+1)}| \ge \sum_{i=1}^{m_1} \mathrm{sign}(g_i w_1^{(p)})\, g_i w_1^{(p+1)} \tag{25}$$

Combining (22) and (25):

$$\sum_{i=1}^{m_1} |g_i w_1^{(p+1)}| \ge \sum_{i=1}^{m_1} |g_i w_1^{(p)}| \tag{26}$$

For any two nonzero variables v and u with u > 0, we have:

$$(v - u)^2 \ge 0 \;\Rightarrow\; v^2 + u^2 - 2vu \ge 0 \;\Rightarrow\; v - \frac{v^2}{2u} \le \frac{u}{2} \;\Rightarrow\; v - \frac{v^2}{2u} \le u - \frac{u^2}{2u} \tag{27}$$
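Inequality (27) is elementary, but a quick numerical check makes its role in the proof concrete. The sketch below evaluates both sides of the last inequality on a grid of positive values; the helper name `gap` and the grid are arbitrary choices made for illustration.

```python
def gap(v, u):
    """Right side minus left side of the last inequality in (27); needs u > 0."""
    left = v - v * v / (2.0 * u)
    right = u - u * u / (2.0 * u)
    return right - left


values = [0.1 * i for i in range(1, 51)]     # 0.1, 0.2, ..., 5.0
smallest_gap = min(gap(v, u) for v in values for u in values)
print(smallest_gap)
```

The gap equals $(v - u)^2 / (2u)$, so it is never negative and vanishes only when v = u.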

Setting $v = |h_i w_1^{(p+1)}|$ and $u = |h_i w_1^{(p)}|$, we have

$$|h_i w_1^{(p+1)}| - \frac{(h_i w_1^{(p+1)})^2}{2|h_i w_1^{(p)}|} \le |h_i w_1^{(p)}| - \frac{(h_i w_1^{(p)})^2}{2|h_i w_1^{(p)}|} \tag{28}$$

Therefore

$$\sum_{i=1}^{m_2} |h_i w_1^{(p+1)}| - \sum_{i=1}^{m_2} \frac{(h_i w_1^{(p+1)})^2}{2|h_i w_1^{(p)}|} \le \sum_{i=1}^{m_2} |h_i w_1^{(p)}| - \sum_{i=1}^{m_2} \frac{(h_i w_1^{(p)})^2}{2|h_i w_1^{(p)}|} \tag{29}$$

Due to

$$w_1^{(p)T} H^T F^{(p)} H w_1^{(p)} = \sum_{i=1}^{m_2} |h_i w_1^{(p)}| = 1 \tag{30}$$

$$w_1^{(p+1)T} H^T F^{(p)} H w_1^{(p+1)} = \sum_{i=1}^{m_2} \frac{(h_i w_1^{(p+1)})^2}{|h_i w_1^{(p)}|} = 1 \tag{31}$$

substituting (30) and (31) into (29), we get $\sum_{i=1}^{m_2} |h_i w_1^{(p+1)}| \le 1$, so we have

$$\sum_{i=1}^{m_2} |h_i w_1^{(p+1)}| \le w_1^{(p+1)T} H^T F^{(p)} H w_1^{(p+1)} \tag{32}$$

Further

$$\sum_{i=1}^{m_1} |g_i w_1^{(p+1)}| = \frac{\sum_{i=1}^{m_1} |g_i w_1^{(p+1)}|}{w_1^{(p+1)T} H^T F^{(p)} H w_1^{(p+1)}} \le \frac{\sum_{i=1}^{m_1} |g_i w_1^{(p+1)}|}{\sum_{i=1}^{m_2} |h_i w_1^{(p+1)}|} \tag{33}$$

Combining (33), (26) and the equation $\sum_{i=1}^{m_2} |h_i w_1^{(p)}| = 1$, we get:

$$\frac{\sum_{i=1}^{m_1} |g_i w_1^{(p+1)}|}{\sum_{i=1}^{m_2} |h_i w_1^{(p+1)}|} \ge \frac{\sum_{i=1}^{m_1} |g_i w_1^{(p)}|}{\sum_{i=1}^{m_2} |h_i w_1^{(p)}|} \tag{34}$$

Inequality (34) shows that the objective function (9) does not decrease from one iteration to the next, which means that L1-MVSVM can find a local maximum point and that the algorithm converges. In practice, to guarantee termination, the conventional approach is to stop once the difference between the objective values of two successive iterations is less than a small value, while also capping the number of iterations at a given value. In the iterative process, we can see from (19) and (20) that the matrices $G^T D^{(p)} G$ and $H^T F^{(p)} H$ are only guaranteed to be positive semi-definite, so the solution may be inexact or unstable. We address this by regularization: the two matrices are replaced with $G^T D^{(p)} G + \delta I$ and $H^T F^{(p)} H + \delta I$.
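The effect of the regularization term can be seen on a deliberately rank-deficient example. The sketch below is illustrative only: the helper names and the 2-column matrix are assumptions, and the value 0.5 is simply one candidate for δ.

```python
def weighted_gram(H, w, delta=0.0):
    """H^T F H + delta*I for a 2-column H, with F = diag(1/|h_i w|)."""
    M = [[delta, 0.0], [0.0, delta]]
    for h in H:
        f = 1.0 / abs(h[0] * w[0] + h[1] * w[1])
        for r in range(2):
            for c in range(2):
                M[r][c] += f * h[r] * h[c]
    return M


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Illustrative rank-deficient H: both rows point in the same direction.
H = [[1.0, 1.0], [2.0, 2.0]]
w = [1.0, 0.0]

det_plain = det2(weighted_gram(H, w))                   # no regularization
det_regularized = det2(weighted_gram(H, w, delta=0.5))  # with delta*I added
print(det_plain, det_regularized)
```

Without regularization the determinant is zero, so the inverse required in (19) does not exist; adding $\delta I$ makes the matrix positive definite and the update well defined.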

Experiment

To test the classification performance of L1-MVSVM, we compare four algorithms (GEPSVM, L1-NPSVM, MVSVM and L1-MVSVM) on UCI datasets. Each dataset contains only two classes, and all sample data are normalized to the interval [-1, 1] to reduce the differences between the scales of different features. To obtain the best generalization performance, the parameter δ is selected from the set {2^i | i = -12, -11, ..., 12} by ten-fold cross-validation. The iterative algorithms terminate when the change in the objective value between two successive iterations is less than 0.001 or when the number of iterations reaches the maximum of 20.
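The experimental settings above can be expressed compactly in code. The sketch below is hedged: the text does not state how the normalization is done, so per-feature min-max scaling is an assumption, and the names `scale_to_unit_interval`, `delta_grid` and `should_stop` are hypothetical. The cross-validation loop itself is not shown.

```python
def scale_to_unit_interval(column):
    """Min-max scale one feature column to [-1, 1] (assumed normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant feature: map everything to 0
        return [0.0 for _ in column]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in column]


# Candidate values for delta: 2^i for i = -12, ..., 12.
delta_grid = [2.0 ** i for i in range(-12, 13)]

# Stopping rule quoted in the text.
TOL, MAX_ITER = 1e-3, 20


def should_stop(prev_obj, curr_obj, iteration):
    """True when the objective change is below TOL or the iteration cap is hit."""
    return abs(curr_obj - prev_obj) < TOL or iteration >= MAX_ITER


feature = [3.0, 7.0, 5.0, 11.0]        # illustrative raw feature values
scaled_feature = scale_to_unit_interval(feature)
print(scaled_feature)
print(len(delta_grid), delta_grid[0], delta_grid[-1])
```

In a ten-fold cross-validation, each value in `delta_grid` would be tried in turn and the one with the best average validation accuracy kept.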

The experiments were run on a PC with an Intel(R) Core(TM) i7-3632QM CPU @ 2.20 GHz and 8.00 GB of RAM, under the Windows 10 operating system with MATLAB 2014b.


Table 1. Test accuracy on the UCI datasets.

| Dataset | GEPSVM | L1-NPSVM | MVSVM | L1-MVSVM |
|---|---|---|---|---|
| Brightdata | 95.73% | 95.94% | 97.28% | 95.53% |
| Checkdata | 52.60% | 52.60% | 49.70% | 54.30% |
| Germ | 68.90% | 68.70% | 70.00% | 70.30% |
| Haberm | 74.43% | 74.76% | 71.91% | 76.09% |
| Housingdata | 71.53% | 72.53% | 71.92% | 74.93% |
| Ionodata | 78.33% | 79.19% | 84.90% | 86.89% |
| Monk1 | 78.05% | 78.94% | 66.50% | 66.48% |
| Monk2 | 67.89% | 67.72% | 71.05% | 70.53% |
| Monk3 | 78.50% | 78.50% | 80.15% | 77.61% |
| Musk | 78.93% | 79.54% | 78.30% | 76.67% |
| Spect | 78.29% | 78.29% | 73.87% | 77.51% |
| Votes | 95.63% | 95.63% | 95.62% | 95.17% |
| Vowel | 88.44% | 88.44% | 89.95% | 90.72% |
| Wpbc | 74.68% | 74.68% | 74.32% | 76.18% |

Conclusions

In this paper, we have proposed MVSVM based on the L1-norm distance for binary classification, termed L1-MVSVM. In contrast with MVSVM, the use of the L1-norm distance makes L1-MVSVM more robust to outliers and improves the flexibility of the model. Further, we design an iterative algorithm to solve the L1-norm optimization problem, which is easy to implement and whose convergence to a local optimum is theoretically ensured. To sum up, L1-MVSVM attains the highest test accuracy among MVSVM, GEPSVM, L1-NPSVM and L1-MVSVM on 7 of the 14 UCI datasets in Table 1, which supports its effectiveness.

References

[1] C. Cortes, V. Vapnik. Support vector networks. Machine Learning, 1995, 20: 273-297.

[2] O. L. Mangasarian, E. W. Wild. Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans. Pattern Anal. Mach. Intell., 2006, 28: 69-74.

[3] M. R. Guarracino, C. Cifarelli, O. Seref, P. M. Pardalos. A classification method based on generalized eigenvalue problems. Optim. Method Softw., 2007, 22: 73-81.

[4] Q. L. Ye, C. X. Zhao, N. Ye, Y. N. Chen. Multi-weight vector projection support vector machines. Pattern Recognition Letters, 2010, 31: 2006-2011.

[5] Q. L. Ye, N. Ye, T. M. Yin. Enhanced multi-weight vector projection support vector machine. Pattern Recognition Letters, 2014, 42: 91-100.

[6] C. N. Li, Y. H. Shao, N. Y. Deng. Robust L1-norm non-parallel proximal support vector machine. Optimization, 2016, 65(1): 1-15.

[7] H. X. Wang, X. S. Lu, Z. L. Hu, W. M. Zheng. Fisher discriminant analysis with L1-norm. IEEE Trans. Cybern., 2014, 6(44): 828-842.

[8] W. M. Zheng, Z. C. Lin, H. X. Wang. L1-norm distance kernel discriminant analysis via Bayes error bound optimization for robust feature extraction. IEEE Trans. Neural Netw., 2014, 4(24): 793-805.
