Volume 2, Issue 3, March 2013 Page 459


ABSTRACT

Although the twin support vector machine (TSVM) trains faster than the traditional support vector machine on classification problems, it does not take into account the differing importance of individual training samples when learning the decision hyper-planes. In this paper, a fuzzy twin support vector machine (FTSVM) is proposed in which a fuzzy membership value is assigned to each training sample. Samples are classified by assigning them to the nearer of two nonparallel planes, each of which lies close to its own class. Moreover, the method only requires solving two smaller SVM-type problems, whereas a standard SVM obtains its classifier by solving one large quadratic programming problem. Experiments on several UCI benchmark datasets show that FTSVM is effective and feasible compared with the twin support vector machine (TSVM), the fuzzy support vector machine (FSVM) and the support vector machine (SVM).

Keywords: Twin Support Vector Machine, Fuzzy weighting, Classification

1. INTRODUCTION

The support vector machine (SVM) is a powerful tool for pattern classification and regression and has attracted growing attention from researchers because of its generalization performance. Its theoretical basis is the structural risk minimization (SRM) principle. An SVM first maps the input points into a high-dimensional feature space and then finds an optimal hyper-plane that maximizes the margin between the two classes in this feature space. Maximizing the margin amounts to solving a quadratic programming problem (QPP). In addition, the SVM finds the optimal hyper-plane by means of a kernel function, without any explicit knowledge of the mapping. The solution can be written as a combination of a few input points called support vectors. The SVM has been applied successfully in many fields, such as text categorization and financial applications.

Although SVM generalizes better than many other machine learning algorithms, it has a high computational cost, because it must solve a quadratic programming problem, and it is sensitive to noise. To address these problems, a number of SVM variants have been proposed, such as the least squares support vector machine (LS-SVM) and the proximal support vector machine (PSVM). Moreover, Lin and Wang reformulated SVM as the fuzzy support vector machine (FSVM), which assigns a fuzzy membership to each sample so that different samples make different contributions to the decision surface [1]. SVM and PSVM seek a single separating plane, which makes it difficult to deal efficiently with complex cases (e.g. XOR problems). Therefore, Fung and Mangasarian proposed the multi-surface proximal support vector machine via generalized eigenvalues (GEPSVM) [2]. The idea of GEPSVM is to find two nonparallel hyper-planes, each as close as possible to the samples of its own class and as far as possible from the samples of the other class. Its computational cost is smaller than that of SVM. Subsequently, Jayadeva et al. proposed the twin support vector machine (TSVM) in the spirit of GEPSVM; it solves two smaller SVM-type problems to obtain the hyper-planes [3]. Compared with the traditional SVM, TSVM reduces the time complexity. Kumar et al. then proposed a least squares version of TSVM (LS-TSVM) [4]. In this paper, in order to further improve the performance of TSVM, we propose a fuzzy version of TSVM that incorporates a membership value for each sample, called the fuzzy twin support vector machine (FTSVM), which combines the ideas of FSVM and TSVM. The primal problems of TSVM are sensitive to noise. In FTSVM, the additional fuzzy membership values are intended to reduce the effect of noisy samples and improve generalization ability. In the experiments, FTSVM achieves higher classification accuracy than FSVM, TSVM and SVM on most of the benchmark datasets considered, and trains faster than FSVM on most of them.

The paper is organized as follows. Section 2 briefly introduces the support vector machine and the twin support vector machine. Section 3 describes the proposed FTSVM algorithm in detail. Section 4 presents the experimental results. Section 5 contains concluding remarks.

A Fuzzy Twin Support Vector Machine Algorithm

Kai Li 1, Hongyan Ma 2

1 School of Mathematics and Computer Science, Hebei University, Baoding 071002, China

2

2. SUPPORT VECTOR MACHINE

Consider a binary classification problem with data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$, where $(x_i, y_i) \in R^n \times \{+1, -1\}$. Denote by $I^{\pm}$ the set of indices $i$ such that $y_i = \pm 1$ and by $I$ the set of all indices, i.e. $I = I^{+} \cup I^{-}$. Let the matrices $A \in R^{l_1 \times n}$ and $B \in R^{l_2 \times n}$ represent the positive and negative training samples, respectively, where $l_1$ and $l_2$ are the numbers of samples in class +1 and class -1.

2.1 Traditional Support Vector Machine

Traditional support vector machine finds an optimal separating hyper-plane between two classes of samples in some feature space in order to generate a classifier with maximal margin. The optimal separating hyper-plane is written as follows:

$$w^T x + b = 0. \tag{1}$$

In fact, the hyper-plane (1) lies midway between the two bounding hyper-planes

$$w^T x + b = 1, \qquad w^T x + b = -1.$$

To obtain the hyper-plane, SVM solves the following optimization problem, which maximizes the margin between class +1 and class -1:

$$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(\langle w, x_i \rangle + b) \ge 1, \ i \in I,$$

or

$$\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T \phi(x_i) + b) \ge 1, \ i \in I,$$

where $\phi$ is a transformation from the input space to the feature space. The final classifier is given by $f(x) = \mathrm{sign}(w^T x + b)$ or $f(x) = \mathrm{sign}(w^T \phi(x) + b)$. In most cases, however, the samples are not linearly separable, that is, no separating hyper-plane exists. To allow samples to violate the constraint $y_i(w^T \phi(x_i) + b) \ge 1$, $i \in I$, nonnegative slack variables $\xi_i \ge 0$ are introduced. The optimal hyper-plane problem can then be expressed as the following quadratic programming problem with inequality constraints:

$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i(\langle w, \phi(x_i) \rangle + b) \ge 1 - \xi_i, \ \xi_i \ge 0, \ i \in I,$$

where C is a constant which is a cost trade-off between maximizing the margin and minimizing the classification error of the training samples.
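To make the role of the slack variables and of C concrete, the following minimal sketch evaluates the soft-margin objective at a given (w, b) for the linear case, where the map into feature space is the identity. The function name `soft_margin_objective` and the toy data are illustrative assumptions, not part of the original method description.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) of the soft-margin primal at a given (w, b)."""
    slacks = []
    for x_i, y_i in zip(X, y):
        score = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        # Smallest nonnegative xi_i that satisfies y_i (w.x_i + b) >= 1 - xi_i.
        slacks.append(max(0.0, 1.0 - y_i * score))
    margin_term = 0.5 * sum(w_k * w_k for w_k in w)
    return margin_term + C * sum(slacks), slacks


# Toy data (hypothetical): the second point lies inside the margin.
X = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0)]
y = [1, 1, -1]
objective, slacks = soft_margin_objective(w=(1.0, 0.0), b=0.0, X=X, y=y, C=10.0)
print(objective, slacks)
```

A larger C makes each unit of slack more expensive, so the optimizer tolerates fewer margin violations at the price of a smaller margin.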

2.2 Twin Support Vector Machine

The twin support vector machine (TSVM) is a nonparallel plane classifier for binary classification. It generates two nonparallel planes by solving two smaller quadratic programming problems, such that each plane is close to one of the two classes and as far as possible from the other. Whether a new sample is assigned to class +1 or class -1 depends on its proximity to the two nonparallel hyper-planes. The linear TSVM seeks the following two nonparallel planes:

$$w_+^T x + b_+ = 0, \qquad w_-^T x + b_- = 0.$$

This leads to the following pair of quadratic optimization problems:

$$\min_{w_+, b_+, \xi_-} \ \frac{1}{2}\|A w_+ + e_+ b_+\|^2 + c_1 e_-^T \xi_- \quad \text{s.t.} \quad -(B w_+ + e_- b_+) + \xi_- \ge e_-, \ \xi_- \ge 0, \tag{2}$$

$$\min_{w_-, b_-, \xi_+} \ \frac{1}{2}\|B w_- + e_- b_-\|^2 + c_2 e_+^T \xi_+ \quad \text{s.t.} \quad (A w_- + e_+ b_-) + \xi_+ \ge e_+, \ \xi_+ \ge 0, \tag{3}$$


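where $c_1, c_2 > 0$ are penalty parameters and $e_+$, $e_-$ are vectors of ones of appropriate dimensions. A new sample is assigned to the class of the hyper-plane that the given sample is closest to. For (2), the objective function keeps class +1 close to the hyper-plane $w_+^T x + b_+ = 0$, while the constraints require the samples of class -1 to satisfy $-(w_+^T x + b_+) \ge 1$ up to the slack $\xi_-$, i.e. to stay away from that hyper-plane. By introducing Lagrange multipliers, the dual problems of (2) and (3) are written as follows:

$$\max_{\alpha} \ e_-^T \alpha - \frac{1}{2} \alpha^T G (H^T H)^{-1} G^T \alpha \quad \text{s.t.} \quad 0 \le \alpha \le c_1, \tag{3}$$

$$\max_{\gamma} \ e_+^T \gamma - \frac{1}{2} \gamma^T Q (P^T P)^{-1} Q^T \gamma \quad \text{s.t.} \quad 0 \le \gamma \le c_2. \tag{4}$$

Here $H = [A \ e_+]$ and $G = [B \ e_-]$, and, in this subsection, $P = [B \ e_-]$ and $Q = [A \ e_+]$. The two nonparallel hyper-planes can be obtained from the solutions of (3) and (4):

$$(w_+^T, b_+)^T = -(H^T H + \epsilon I)^{-1} G^T \alpha, \qquad (w_-^T, b_-)^T = (P^T P + \epsilon I)^{-1} Q^T \gamma. \tag{5}$$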

To handle the case in which $H^T H$ or $P^T P$ is singular, and to avoid possible ill-conditioning of these matrices, formula (5) introduces a regularization term $\epsilon I$ ($\epsilon > 0$), where $I$ is an identity matrix of appropriate dimension.

It can be seen from the above discussion that the linear TSVM only requires inverting matrices of size (n+1)×(n+1), where n is typically much smaller than the number of patterns in class +1 and class -1.
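As a minimal sketch of formula (5) and of the nearest-plane rule, the code below recovers one plane from a dual solution and assigns a sample to the closer plane. It assumes NumPy is available; the helper names `recover_plane`, `plane_distance` and `assign`, the toy matrices, and the value of `alpha` (a made-up placeholder, not an actual dual solution) are all illustrative assumptions.

```python
import numpy as np


def recover_plane(H, G, dual, eps=1e-6, sign=-1.0):
    """Solve (H^T H + eps I) u = sign * G^T dual and split u into (w, b)."""
    lhs = H.T @ H + eps * np.eye(H.shape[1])  # (n+1) x (n+1), whatever the sample count
    u = np.linalg.solve(lhs, sign * (G.T @ dual))
    return u[:-1], u[-1]


def plane_distance(x, w, b):
    """Perpendicular distance from x to the plane w^T x + b = 0."""
    return abs(x @ w + b) / np.linalg.norm(w)


def assign(x, plane_pos, plane_neg):
    """Return +1 if x is closer to the positive plane, otherwise -1."""
    return 1 if plane_distance(x, *plane_pos) < plane_distance(x, *plane_neg) else -1


A = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])   # class +1 (toy data)
B = np.array([[-1.0, -1.0], [-2.0, 0.0]])            # class -1 (toy data)
H = np.hstack([A, np.ones((A.shape[0], 1))])         # [A  e_+]
G = np.hstack([B, np.ones((B.shape[0], 1))])         # [B  e_-]
alpha = np.array([0.5, 0.5])                         # placeholder dual values
w_pos, b_pos = recover_plane(H, G, alpha)            # first formula of (5)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a design choice of this sketch; the linear system it solves is the one that appears in (5).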

For the nonlinear case, the separating nonparallel planes are replaced by kernel-generated surfaces by introducing a nonlinear kernel $K$, namely

$$K(x^T, C^T) u_1 + b_1 = 0, \qquad K(x^T, C^T) u_2 + b_2 = 0,$$

where $C^T = [A \ B]^T$ and $K$ is an appropriate kernel. The primal problems of the nonlinear TSVM are given as follows:

$$\min_{u_1, b_1, \xi_2} \ \frac{1}{2}\|K(A, C^T) u_1 + e_1 b_1\|^2 + c_1 e_2^T \xi_2 \quad \text{s.t.} \quad -(K(B, C^T) u_1 + e_2 b_1) + \xi_2 \ge e_2, \ \xi_2 \ge 0,$$

$$\min_{u_2, b_2, \xi_1} \ \frac{1}{2}\|K(B, C^T) u_2 + e_2 b_2\|^2 + c_2 e_1^T \xi_1 \quad \text{s.t.} \quad (K(A, C^T) u_2 + e_1 b_2) + \xi_1 \ge e_1, \ \xi_1 \ge 0.$$

2.3 v-Twin Support Vector Machine

Similar to v-SVM, Peng introduced two new parameters $v_1$ and $v_2$ in place of the trade-off factors $c_1$ and $c_2$, proposed the v-twin support vector machine (v-TSVM), and rewrote the primal optimization problems as follows [5]:

$$\min_{w_+, b_+, \rho_+, \xi_j} \ \frac{1}{2} \sum_{i \in I^+} (w_+^T x_i + b_+)^2 - v_1 \rho_+ + \frac{1}{l_2} \sum_{j \in I^-} \xi_j \quad \text{s.t.} \quad -(w_+^T x_j + b_+) \ge \rho_+ - \xi_j, \ \xi_j \ge 0, \ \rho_+ \ge 0, \ j \in I^-,$$

$$\min_{w_-, b_-, \rho_-, \xi_i} \ \frac{1}{2} \sum_{j \in I^-} (w_-^T x_j + b_-)^2 - v_2 \rho_- + \frac{1}{l_1} \sum_{i \in I^+} \xi_i \quad \text{s.t.} \quad w_-^T x_i + b_- \ge \rho_- - \xi_i, \ \xi_i \ge 0, \ \rho_- \ge 0, \ i \in I^+.$$

To understand the role of $\rho_\pm$, note that when $\xi_j = 0$ for all $j \in I^-$ (or $\xi_i = 0$ for all $i \in I^+$), the negative (or positive) samples are separated from the positive (or negative) hyper-plane by the margin $\rho_+ / \sqrt{w_+^T w_+}$ (or $\rho_- / \sqrt{w_-^T w_-}$). At the same time, this adaptivity effectively overcomes the shortcomings of TSVM noted above. By introducing Lagrange multipliers, the two dual QPPs are obtained as follows:

$$\min_{\alpha} \ \frac{1}{2} \sum_{j_1, j_2 \in I^-} \alpha_{j_1} \alpha_{j_2} z_{j_1}^T \Big( \sum_{i \in I^+} z_i z_i^T \Big)^{-1} z_{j_2} \quad \text{s.t.} \quad 0 \le \alpha_j \le \frac{1}{l_2}, \ \sum_{j \in I^-} \alpha_j \ge v_1, \ j \in I^-,$$

$$\min_{\alpha} \ \frac{1}{2} \sum_{i_1, i_2 \in I^+} \alpha_{i_1} \alpha_{i_2} z_{i_1}^T \Big( \sum_{j \in I^-} z_j z_j^T \Big)^{-1} z_{i_2} \quad \text{s.t.} \quad 0 \le \alpha_i \le \frac{1}{l_1}, \ \sum_{i \in I^+} \alpha_i \ge v_2, \ i \in I^+,$$

where $z_i = (x_i^T, 1)^T$.

To compute $\rho_\pm$, the samples $x_i$, $i \in I^+$ (or $x_j$, $j \in I^-$) with $0 < \alpha_i < 1/l_1$ (or $0 < \alpha_j < 1/l_2$) are chosen, which means $\xi_i = 0$ (or $\xi_j = 0$) and $w_-^T x_i + b_- = \rho_-$ (or $-(w_+^T x_j + b_+) = \rho_+$).

3. FUZZY TWIN SUPPORT VECTOR MACHINE

3.1 Linear Fuzzy Twin Support Vector Machine


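The linear FTSVM solves the following pair of primal problems:

$$\min_{w^{(1)}, b^{(1)}, \xi^{(2)}, \rho_1} \ \frac{1}{2}\|A w^{(1)} + e_1 b^{(1)}\|^2 - v_1 \rho_1 + \frac{1}{l_2} s_2^T \xi^{(2)} \quad \text{s.t.} \quad -(B w^{(1)} + e_2 b^{(1)}) \ge \rho_1 e_2 - \xi^{(2)}, \ \xi^{(2)} \ge 0, \ \rho_1 \ge 0, \tag{6}$$

$$\min_{w^{(2)}, b^{(2)}, \xi^{(1)}, \rho_2} \ \frac{1}{2}\|B w^{(2)} + e_2 b^{(2)}\|^2 - v_2 \rho_2 + \frac{1}{l_1} s_1^T \xi^{(1)} \quad \text{s.t.} \quad (A w^{(2)} + e_1 b^{(2)}) \ge \rho_2 e_1 - \xi^{(1)}, \ \xi^{(1)} \ge 0, \ \rho_2 \ge 0, \tag{7}$$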

where $v_1, v_2 \in (0, 1]$ are the regularization parameters for the positive and negative samples, respectively, and $s_1$, $s_2$ are the vectors of fuzzy memberships of the positive and negative samples, with entries in $(0, 1]$. FTSVM seeks two nonparallel hyper-planes, a positive hyper-plane and a negative hyper-plane. For brevity, we only derive the dual of problem (6). To solve the optimization problem, we construct the Lagrange function of (6):

$$L = \frac{1}{2}\|A w^{(1)} + e_1 b^{(1)}\|^2 - v_1 \rho_1 + \frac{1}{l_2} s_2^T \xi^{(2)} - \alpha^T \big( -(B w^{(1)} + e_2 b^{(1)}) - \rho_1 e_2 + \xi^{(2)} \big) - \beta^T \xi^{(2)} - \lambda \rho_1, \tag{8}$$

where $\alpha \ge 0$, $\beta \ge 0$ and $\lambda \ge 0$ are the Lagrange multipliers. According to the Karush-Kuhn-Tucker (KKT) conditions,

$$\frac{\partial L}{\partial w^{(1)}} = A^T (A w^{(1)} + e_1 b^{(1)}) + B^T \alpha = 0, \tag{9}$$

$$\frac{\partial L}{\partial b^{(1)}} = e_1^T (A w^{(1)} + e_1 b^{(1)}) + e_2^T \alpha = 0, \tag{10}$$

$$\frac{\partial L}{\partial \rho_1} = -v_1 + e_2^T \alpha - \lambda = 0, \tag{11}$$

$$\frac{\partial L}{\partial \xi^{(2)}} = \frac{s_2}{l_2} - \alpha - \beta = 0. \tag{12}$$

Combining (9) and (10), we obtain equation (13):

$$[A \ e_1]^T [A \ e_1] [w^{(1)} \ b^{(1)}]^T + [B \ e_2]^T \alpha = 0. \tag{13}$$

Let $H = [A \ e_1]$, $U = [w^{(1)} \ b^{(1)}]^T$ and $G = [B \ e_2]$, and rewrite (13) as follows:

$$H^T H U + G^T \alpha = 0, \qquad U = -(H^T H)^{-1} G^T \alpha. \tag{14}$$

The matrix $H^T H$ is always positive semi-definite, but it may be ill-conditioned in some situations. Following the ridge regression approach, we therefore add a regularization term $\epsilon I$ to deal with possible ill-conditioning of $H^T H$, where $I$ is an identity matrix of suitable order. Now (14) becomes

$$U = -(H^T H + \epsilon I)^{-1} G^T \alpha.$$

Substituting equations (9) to (12) into the Lagrange function, the primal problem (6) can be transformed into the following dual problem:

$$\min_{\alpha} \ \frac{1}{2} \alpha^T G (H^T H)^{-1} G^T \alpha \quad \text{s.t.} \quad 0 \le \alpha \le \frac{s_2}{l_2}, \ e_2^T \alpha \ge v_1. \tag{15}$$
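The step from the Lagrange function to the dual can be made explicit. The sketch below writes $\alpha$, $\beta$ and $\lambda$ for the multipliers of the margin constraint, of $\xi^{(2)} \ge 0$ and of $\rho_1 \ge 0$ (these symbol names are an assumption made here for illustration), and uses $H = [A \ e_1]$, $G = [B \ e_2]$ and $U = [w^{(1)} \ b^{(1)}]^T$.

```latex
\begin{aligned}
L &= \tfrac{1}{2} U^T H^T H U - v_1 \rho_1 + \tfrac{1}{l_2} s_2^T \xi^{(2)}
     + \alpha^T \big( G U + \rho_1 e_2 - \xi^{(2)} \big) - \beta^T \xi^{(2)} - \lambda \rho_1 \\
  &= \tfrac{1}{2} U^T H^T H U + \alpha^T G U
     + \rho_1 \big( e_2^T \alpha - v_1 - \lambda \big)
     + \xi^{(2)T} \big( \tfrac{s_2}{l_2} - \alpha - \beta \big) \\
  &= \tfrac{1}{2} U^T H^T H U + \alpha^T G U
     && \text{(stationarity in } \rho_1 \text{ and } \xi^{(2)}) \\
  &= -\tfrac{1}{2} \alpha^T G U + \alpha^T G U = \tfrac{1}{2} \alpha^T G U
     && \text{(since } H^T H U = -G^T \alpha) \\
  &= -\tfrac{1}{2} \alpha^T G (H^T H)^{-1} G^T \alpha .
\end{aligned}
```

Maximizing this expression over $\alpha$ is the same as minimizing $\frac{1}{2} \alpha^T G (H^T H)^{-1} G^T \alpha$. The constraints follow from the sign conditions on the eliminated multipliers: $\beta = s_2 / l_2 - \alpha \ge 0$ gives $\alpha \le s_2 / l_2$, and $\lambda = e_2^T \alpha - v_1 \ge 0$ gives $e_2^T \alpha \ge v_1$.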

From the KKT conditions, we also obtain

$$\alpha^T \big( -(B w^{(1)} + e_2 b^{(1)}) - \rho_1 e_2 + \xi^{(2)} \big) = 0, \qquad \beta^T \xi^{(2)} = 0, \qquad \lambda \rho_1 = 0.$$

Similarly, we obtain the parameters of the other hyper-plane, $R = (w^{(2)}, b^{(2)})^T = (Q^T Q)^{-1} P^T \gamma$. The dual problem of the primal problem (7) is given by

$$\min_{\gamma} \ \frac{1}{2} \gamma^T P (Q^T Q)^{-1} P^T \gamma \quad \text{s.t.} \quad 0 \le \gamma \le \frac{s_1}{l_1}, \ e_1^T \gamma \ge v_2, \tag{16}$$

where $P = [A \ e_1]$ and $Q = [B \ e_2]$.
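The dual (15) is a box-constrained quadratic program with one additional linear inequality, so any QP solver can be used. As a hedged illustration only, the sketch below swaps in a simple projected-gradient iteration (not the solver used in the paper, which is not specified) and then recovers the plane through the regularized form of (14). It assumes NumPy; the names `project` and `solve_ftsvm_dual`, the parameters `eps` and `iters`, and the toy data are hypothetical.

```python
import numpy as np


def project(z, upper, v):
    """Euclidean projection of z onto {a : 0 <= a <= upper, sum(a) >= v}."""
    a = np.clip(z, 0.0, upper)
    if a.sum() >= v:
        return a
    # Otherwise the projection lies on sum(a) = v: shift z by t >= 0, then clip.
    lo, hi = 0.0, float(np.max(upper - z))  # at t = hi every entry hits its bound
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if np.clip(z + mid, 0.0, upper).sum() < v:
            lo = mid
        else:
            hi = mid
    return np.clip(z + hi, 0.0, upper)


def solve_ftsvm_dual(H, G, s, v, eps=1e-6, iters=2000):
    """Minimize 0.5 a^T M a, M = G (H^T H + eps I)^{-1} G^T, under the constraints of (15)."""
    upper = s / G.shape[0]                  # fuzzy upper bounds s_2 / l_2
    if upper.sum() < v:
        raise ValueError("infeasible: sum(s) / l2 must be at least v")
    reg = H.T @ H + eps * np.eye(H.shape[1])
    M = G @ np.linalg.solve(reg, G.T)
    M = 0.5 * (M + M.T)                     # remove rounding asymmetry
    step = 1.0 / max(np.linalg.eigvalsh(M)[-1], 1e-12)
    alpha = project(np.zeros(len(s)), upper, v)
    for _ in range(iters):
        alpha = project(alpha - step * (M @ alpha), upper, v)
    U = -np.linalg.solve(reg, G.T @ alpha)  # regularized form of (14)
    return alpha, U[:-1], U[-1]


A = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0]])     # class +1 (toy data)
B = np.array([[0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])   # class -1 (toy data)
H = np.hstack([A, np.ones((3, 1))])
G = np.hstack([B, np.ones((3, 1))])
s2 = np.array([1.0, 0.8, 0.5])                         # memberships of class -1
alpha, w1, b1 = solve_ftsvm_dual(H, G, s2, v=0.5)
```

Because each upper bound is s_i / l_2, a sample with a small membership can contribute only a small multiplier, which is how the fuzzy weights limit the influence of suspected outliers. The problem is feasible only when the mean membership of the class is at least v_1.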

3.2 The Nonlinear Fuzzy Twin Support Vector Machine

In this section, we extend the method presented above to the nonlinear case using the kernel trick, and consider kernel-generated surfaces rather than planes in the input space, namely

$$K(x^T, C^T) w^{(1)} + b^{(1)} = 0, \qquad K(x^T, C^T) w^{(2)} + b^{(2)} = 0.$$


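The primal problems of the nonlinear FTSVM are

$$\min_{w^{(1)}, b^{(1)}, \xi^{(2)}, \rho_1} \ \frac{1}{2}\|K(A, C^T) w^{(1)} + e_1 b^{(1)}\|^2 - v_1 \rho_1 + \frac{1}{l_2} s_2^T \xi^{(2)} \quad \text{s.t.} \quad -(K(B, C^T) w^{(1)} + e_2 b^{(1)}) \ge \rho_1 e_2 - \xi^{(2)}, \ \xi^{(2)} \ge 0, \ \rho_1 \ge 0, \tag{17}$$

$$\min_{w^{(2)}, b^{(2)}, \xi^{(1)}, \rho_2} \ \frac{1}{2}\|K(B, C^T) w^{(2)} + e_2 b^{(2)}\|^2 - v_2 \rho_2 + \frac{1}{l_1} s_1^T \xi^{(1)} \quad \text{s.t.} \quad (K(A, C^T) w^{(2)} + e_1 b^{(2)}) \ge \rho_2 e_1 - \xi^{(1)}, \ \xi^{(1)} \ge 0, \ \rho_2 \ge 0. \tag{18}$$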

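We construct the Lagrange function of the primal problem (17) as

$$L = \frac{1}{2}\|K(A, C^T) w^{(1)} + e_1 b^{(1)}\|^2 - v_1 \rho_1 + \frac{1}{l_2} s_2^T \xi^{(2)} - \alpha^T \big( -(K(B, C^T) w^{(1)} + e_2 b^{(1)}) - \rho_1 e_2 + \xi^{(2)} \big) - \beta^T \xi^{(2)} - \lambda \rho_1,$$

where $\alpha \ge 0$, $\beta \ge 0$ and $\lambda \ge 0$ are Lagrange multipliers. According to the KKT conditions,

$$\frac{\partial L}{\partial w^{(1)}} = K(A, C^T)^T \big( K(A, C^T) w^{(1)} + e_1 b^{(1)} \big) + K(B, C^T)^T \alpha = 0,$$

$$\frac{\partial L}{\partial b^{(1)}} = e_1^T \big( K(A, C^T) w^{(1)} + e_1 b^{(1)} \big) + e_2^T \alpha = 0,$$

$$\frac{\partial L}{\partial \rho_1} = -v_1 + e_2^T \alpha - \lambda = 0,$$

$$\frac{\partial L}{\partial \xi^{(2)}} = \frac{s_2}{l_2} - \alpha - \beta = 0.$$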

Proceeding as in the linear case, we obtain equation (19):

$$[K(A, C^T) \ e_1]^T [K(A, C^T) \ e_1] [w^{(1)} \ b^{(1)}]^T + [K(B, C^T) \ e_2]^T \alpha = 0. \tag{19}$$

Let $H = [K(A, C^T) \ e_1]$, $U = [w^{(1)} \ b^{(1)}]^T$ and $G = [K(B, C^T) \ e_2]$. Then equation (19) becomes

$$H^T H U + G^T \alpha = 0, \qquad U = -(H^T H)^{-1} G^T \alpha.$$

So the dual problem of the nonlinear FTSVM is given by

$$\min_{\alpha} \ \frac{1}{2} \alpha^T G (H^T H)^{-1} G^T \alpha \quad \text{s.t.} \quad 0 \le \alpha \le \frac{s_2}{l_2}, \ e_2^T \alpha \ge v_1. \tag{20}$$

Similarly, we obtain the following dual problem for the optimization problem (18), where $P = [K(A, C^T) \ e_1]$ and $Q = [K(B, C^T) \ e_2]$:

$$\min_{\gamma} \ \frac{1}{2} \gamma^T P (Q^T Q)^{-1} P^T \gamma \quad \text{s.t.} \quad 0 \le \gamma \le \frac{s_1}{l_1}, \ e_1^T \gamma \ge v_2. \tag{21}$$

Based on the above derivation, the fuzzy twin support vector machine algorithm FTSVM, which covers both the linear and the nonlinear case, is given as follows.

Step 1 Choose a kernel function and compute the membership of each sample of class +1 and class -1 to construct the vectors s1 and s2.

Step 2 Compute H and G.

Step 3 Set the values of the parameters v1, v2 ∈ (0, 1).

Step 4 Solve the quadratic programming problems (15) and (16) in the linear case, or (20) and (21) in the nonlinear case, to obtain U and R for the two nonparallel hyper-planes.

Step 5 For a sample x ∈ R^n, compute the distance dist+1 between x and the hyper-plane x^T w^(1) + b^(1) = 0 and the distance dist-1 between x and the hyper-plane x^T w^(2) + b^(2) = 0.

Step 6 Compare dist+1 with dist-1: if dist+1 < dist-1, then x is assigned to class +1, otherwise to class -1, so that each sample goes to the nearer hyper-plane.

3.3 Fuzzy Membership Function

The design of the fuzzy membership function is the key to any algorithm based on fuzzy techniques. In this paper, we use the class center method to generate the fuzzy memberships. First, we denote the mean of class +1 as the class center $x_+$ and the mean of class -1 as the class center $x_-$. The radii $r_+$ and $r_-$ of the two classes are the largest distances between the training points of each class and its class center, namely

$$r_+ = \max_{\{x_i : y_i = +1\}} \|x_+ - x_i\| \quad \text{and} \quad r_- = \max_{\{x_i : y_i = -1\}} \|x_- - x_i\|.$$

The fuzzy membership $s_i$ is a function of the mean and radius of each class:

$$s_i = \begin{cases} 1 - \|x_+ - x_i\| / (r_+ + \delta), & \text{if } y_i = +1, \\ 1 - \|x_- - x_i\| / (r_- + \delta), & \text{if } y_i = -1, \end{cases}$$

where $\delta > 0$ is a small constant that prevents $s_i$ from becoming zero.
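The class center method can be sketched in a few lines. The function name `class_center_memberships`, the default value of `delta` and the toy data are illustrative assumptions; the sketch also assumes that both classes contain at least one sample.

```python
import math


def class_center_memberships(X, y, delta=1e-3):
    """Return s_i = 1 - ||center - x_i|| / (radius + delta), per class."""
    memberships = [0.0] * len(X)
    for label in (+1, -1):
        idx = [i for i, y_i in enumerate(y) if y_i == label]
        dim = len(X[idx[0]])
        # Class center: coordinate-wise mean of the samples with this label.
        center = [sum(X[i][k] for i in idx) / len(idx) for k in range(dim)]
        dists = {i: math.dist(X[i], center) for i in idx}
        radius = max(dists.values())  # farthest sample of the class
        for i in idx:
            memberships[i] = 1.0 - dists[i] / (radius + delta)
    return memberships


# Toy data: the fourth positive sample is far from the rest of its class.
X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (9.0, 0.0), (0.0, 5.0), (0.0, 7.0)]
y = [1, 1, 1, 1, -1, -1]
s = class_center_memberships(X, y)
```

The sample farthest from its class center always receives the smallest membership in its class, close to zero, so a single distant point both defines the radius and is strongly down-weighted.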

                


4. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, to evaluate the performance of the proposed algorithm FTSVM, we investigate its classification accuracy and computational efficiency on 7 real-world UCI benchmark datasets [6]. The experiments compare FTSVM with TSVM, FSVM and SVM. All the classification methods are implemented in Matlab 7.0 on a PC with an Intel P4 processor and 1 GB of RAM. The fuzzy membership of each point is computed as a function of the distance between the point and its class center.

Table 1 gives the classification accuracy of linear FTSVM, TSVM, FSVM and SVM under 5-fold cross-validation. The accuracy of linear FTSVM is at least as high as that of linear TSVM on all 7 UCI datasets and strictly higher on 6 of them; the two methods tie on fourclass, where FSVM and SVM are more accurate than both. Table 2 reports the training time of FTSVM and FSVM. FTSVM is faster than FSVM on 6 of the 7 datasets, because it solves two smaller problems instead of one large problem over all samples. On the bupa dataset the training times of the two methods are essentially the same (2.14 s versus 2.13 s), while the accuracy of FTSVM is marginally higher (74.82 versus 74.80). Table 3 compares FTSVM with TSVM, FSVM and SVM for the Gaussian (RBF) kernel. The results are similar to those in Table 1: FTSVM has higher classification accuracy than TSVM on all datasets.
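For readers who wish to reproduce this kind of evaluation, the sketch below shows one way to form 5-fold splits and to summarize per-fold accuracies as mean ± standard deviation. The helper names `k_fold_indices` and `summarize`, the random seed, and the use of the sample standard deviation are assumptions; the paper does not state how the folds were drawn or which deviation is reported.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


folds = list(k_fold_indices(23, k=5))
mean_acc, std_acc = summarize([80.0, 85.0, 90.0])  # made-up fold accuracies
```

Each sample appears in exactly one test fold, so every accuracy in the summary is measured on data that was held out from training in that round.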

Table 1: Classification accuracy (%) using 5-fold cross-validation

Data Set         FTSVM         TSVM          FSVM          SVM
australian       85.97±5.16    85.79±5.09    85.89±4.79    85.51±4.58
breast-cancer    65.00±4.13    62.83±3.16    64.86±2.48    64.86±4.73
bupa             74.82±3.18    68.40±6.38    74.80±2.52    69.28±3.02
fourclass        64.39±7.28    64.39±5.70    68.54±8.84    73.66±6.32
german           78.14±8.15    71.20±6.35    70.80±8.20    76.90±7.63
heart            85.56±4.45    82.22±6.60    82.59±6.08    81.48±8.58
pima             79.08±5.92    73.02±6.05    76.95±2.45    76.55±2.40

Table 2: Training time (in seconds)

Data Set         FTSVM     FSVM
australian       12.50     133.53
breast-cancer    16.27     150.17
bupa             2.14      2.13
fourclass        7.89      29.44
german           14.98     50.09
heart            0.23      0.45
pima             8.58      42.65

Table 3: Classification accuracy (%) using 5-fold cross-validation with RBF kernel

Data Set         FTSVM         TSVM          FSVM          SVM
australian       86.08±1.43    84.81±2.15    85.56±2.30    85.51±2.16
breast-cancer    65.60±4.32    64.42±3.87    65.01±2.48    65.42±4.53
bupa             77.80±3.87    71.45±5.49    76.67±2.21    72.78±3.97
fourclass        64.53±5.51    64.45±5.49    64.38±6.18    64.35±6.48
german           78.20±8.15    72.45±6.35    71.68±8.20    73.50±7.63
heart            84.44±4.53    81.89±4.31    83.33±5.00    82.22±6.67
pima             79.51±5.92    73.70±6.05    77.42±2.45    76.55±2.40


5. CONCLUSIONS

In this paper, we study the fuzzy twin support vector machine (FTSVM), which applies fuzzy memberships to the training samples. Samples are classified by assigning them to the nearer of two nonparallel planes. Experiments on several UCI benchmark datasets show that the presented algorithm FTSVM is effective and feasible relative to the twin support vector machine, the fuzzy support vector machine and the support vector machine. Moreover, the fuzzy weighting is designed to give FTSVM a degree of robustness to noise. In future work, we will study the fuzzy twin support vector machine further and extend it to multi-class classification problems.

Acknowledgements

This work is supported by the Natural Science Foundation of China (No. 61073121) and the Natural Science Foundation of Hebei Province (No. F2012201014).

References

[1] Lin Chun-Fu, Wang Sheng-De, “Fuzzy support vector machines,” IEEE Transactions on Neural Networks, 13(2), pp. 464-471, 2002.

[2] Fung G, Mangasarian O L, “Proximal support vector machine classifiers,” In: Proc 7th ACM SIGKDD Intl Conf on Knowledge Discovery and Data Mining, pp. 77-86, 2001.

[3] Jayadeva, Khemchandani Reshma, Suresh Chandra, “Twin support vector machines for pattern classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), pp. 905-910, 2007.

[4] Kumar M. Arun, Gopal M, “Least squares twin support vector machines for pattern classification,” Expert Systems with Applications, 36(4), pp. 7535-7543, 2009.

[5] Peng Xinjun, “A v-twin support vector machine (v-TSVM) classifier and its geometric algorithms,” Information Sciences, 180, pp. 3863-3875, 2010.

[6] Blake C. L., Merz C. J, “UCI Repository of Machine Learning Databases,” Irvine, CA: University of California, Department of Information and Computer Sciences, http://www.ics.uci.edu/mlearn/MLRepository.html, 1998.

AUTHOR

Kai Li received the B.S. and M.S. degrees from the Mathematics Department and the Electrical Engineering Department of Hebei University, Baoding, China, in 1982 and 1992, respectively. He received the Ph.D. degree from Beijing Jiaotong University, Beijing, China, in 2001. He is currently a Professor in the School of Mathematics and Computer Science, Hebei University. His current research interests include machine learning, data mining, computational intelligence, and pattern recognition.