An adaptive fuzzy clustering algorithm with generalized entropy based on weighted sample




Volume 3, Issue 5, May 2014


Abstract

To improve fuzzy clustering with generalized entropy, this paper presents an adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples. First, sample weights are introduced into the objective function of fuzzy clustering with generalized entropy, which yields an optimization problem for fuzzy clustering with generalized entropy based on weighted samples. Then, the Lagrange multiplier method is used to solve this problem and obtain the degree of membership of each sample in each cluster, the cluster centers, and the sample weights. Finally, experiments are conducted on representative datasets from the UCI repository. The experimental results show the effectiveness of the presented algorithm.

Keywords: fuzzy clustering, generalized entropy, weighted sample, adaptive method

1. INTRODUCTION

Clustering is an important data analysis method and has been applied to pattern recognition, data mining, and other fields. Researchers have proposed many different clustering algorithms. Among them, partition-based cluster analysis (also called objective-function-based cluster analysis), such as K-means and Fuzzy C-means (FCM), is one of the most commonly used. However, these algorithms treat all data points and all data attributes as equally important. To address this problem, many improved algorithms have been proposed. Huang et al. [1] introduced variable weights into the k-means clustering process and presented a k-means type clustering algorithm that calculates the variable weights automatically. Jing et al. [2] included the weight entropy in the objective function to extend the k-means clustering process; they calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different clusters. To reduce the dependence of the FCM algorithm on the initial cluster centers and data sets, Su et al. [3] introduced a weighting parameter to adjust the location of the cluster centers and to handle noise. To account for the particular contributions of different features, Li et al. [4] presented a new feature weighted fuzzy clustering algorithm. In addition, Karayiannis [5] introduced entropy into fuzzy clustering and proposed a fuzzy clustering algorithm based on maximum entropy. Following that, Li et al. [6] and Tran and Wagner [7] used the loss function between data samples and cluster centers to propose maximum entropy clustering algorithms. Wei et al. [8] presented a bidirectional association fuzzy clustering network to solve the fuzzy clustering problem. In this paper, an adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples is studied. During clustering, the sample weights are updated as the degrees of membership of the samples and the cluster centers change.

This paper is organized as follows. Section 2 gives an objective function with weighted samples for fuzzy clustering with generalized entropy and uses the Lagrange multiplier method to obtain the sample memberships and the cluster centers. Section 3 presents an adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples. Section 4 tests the performance of the presented algorithms on commonly used datasets from UCI. The final section gives the conclusion.

2. FUZZY CLUSTERING WITH GENERALIZED ENTROPY BASED ON WEIGHTED SAMPLE

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a data set, where $x_j \in R^s$, $c$ is a positive integer greater than one, and $m > 1$ is the fuzzy index; $\mu_{ij} = \mu_i(x_j) \ge 0$ is the degree of membership of $x_j$ in the $i$th cluster with center $v_i$, and $\sum_{i=1}^{c} \mu_{ij} = 1$. $U$ is the membership matrix composed of all $\mu_{ij}$ ($i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, n$), and $V$ is the vector whose components are the cluster centers $v_i$ ($i = 1, 2, \ldots, c$). The objective function of fuzzy clustering with generalized entropy is

$$J_G(U,V) = \sum_{j=1}^{n}\sum_{i=1}^{c} \mu_{ij}^{m}\,\|x_j - v_i\|^2 + \delta \sum_{j=1}^{n} (2^{1-m}-1)^{-1}\Bigl(\sum_{i=1}^{c}\mu_{ij}^{m} - 1\Bigr). \tag{1}$$

Applying the Lagrange multiplier method to (1) yields the GEFCM algorithm [9]. Because the samples of a data set differ in importance during clustering, sample weights are introduced into objective function (1), which gives the following objective function

Kai Li1, Lijuan Cui2 and Xiuchen Ye3

1School of Mathematics and Computer, Hebei University, Baoding 071002, China

2Library, Hebei University, Baoding 071002, China

$$J_{WG}(U,V) = \sum_{j=1}^{n}\sum_{i=1}^{c} w_j\,\mu_{ij}^{m}\,\|x_j - v_i\|^2 + \delta \sum_{j=1}^{n} (2^{1-m}-1)^{-1}\Bigl(\sum_{i=1}^{c}\mu_{ij}^{m} - 1\Bigr), \tag{2}$$

where $\delta$ is an adjustable parameter and $w_j$ is the weight of the $j$-th data sample.
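As a rough illustration, objective function (2) can be evaluated numerically, for example to monitor its value across iterations. The sketch below is plain Python; the function names and the list-of-lists layout (`U[i][j]` holding the membership of sample j in cluster i) are assumptions made for this example and are not taken from the paper.

```python
def sq_dist(x, v):
    """Squared Euclidean distance ||x - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def objective_wg(X, U, V, w, m, delta):
    """Weighted distortion plus delta times the generalized entropy term.

    X: list of n samples, V: list of c centers, w: list of n sample weights,
    U[i][j]: membership of sample j in cluster i (hypothetical layout).
    """
    c, n = len(V), len(X)
    distortion = sum(
        w[j] * U[i][j] ** m * sq_dist(X[j], V[i])
        for j in range(n) for i in range(c)
    )
    coef = delta / (2.0 ** (1.0 - m) - 1.0)  # delta * (2^(1-m) - 1)^(-1)
    entropy = coef * sum(
        sum(U[i][j] ** m for i in range(c)) - 1.0 for j in range(n)
    )
    return distortion + entropy
```

For crisp memberships the entropy term vanishes, so the value reduces to the weighted sum of squared distances to the assigned centers.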

Based on objective function (2) and the constraint on the memberships, we obtain the following optimization problem for fuzzy clustering with generalized entropy based on weighted samples

$$\min J_{WG}(U,V) \quad \text{s.t.} \quad \sum_{i=1}^{c}\mu_{ij} = 1, \quad j = 1, 2, \ldots, n. \tag{3}$$

In the following, we use Lagrange multipliers to solve optimization problem (3). So, Lagrange function L corresponding to (3) is written as

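$$L(U,V;\lambda_1,\ldots,\lambda_n) = \sum_{j=1}^{n}\sum_{i=1}^{c} w_j\,\mu_{ij}^{m}\,\|x_j - v_i\|^2 + \delta \sum_{j=1}^{n} (2^{1-m}-1)^{-1}\Bigl(\sum_{i=1}^{c}\mu_{ij}^{m} - 1\Bigr) + \lambda_1\Bigl(\sum_{i=1}^{c}\mu_{i1} - 1\Bigr) + \lambda_2\Bigl(\sum_{i=1}^{c}\mu_{i2} - 1\Bigr) + \cdots + \lambda_n\Bigl(\sum_{i=1}^{c}\mu_{in} - 1\Bigr).$$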

Here, we take the derivatives of the function $L$ with respect to $\lambda_j$, $\mu_{ij}$ and $v_i$ and set them equal to zero, namely

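$$\frac{\partial L}{\partial \lambda_j} = \sum_{i=1}^{c}\mu_{ij} - 1 = 0, \tag{4}$$

$$\frac{\partial L}{\partial \mu_{ij}} = m\,w_j\,\mu_{ij}^{m-1}\,\|x_j - v_i\|^2 + \delta\,m\,(2^{1-m}-1)^{-1}\mu_{ij}^{m-1} + \lambda_j = 0, \tag{5}$$

$$\frac{\partial L}{\partial v_i} = -2\sum_{j=1}^{n} w_j\,\mu_{ij}^{m}\,(x_j - v_i) = 0. \tag{6}$$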

From (4), (5) and (6), simple algebra gives the following degree of membership of each sample and the cluster centers.

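$$\mu_{ij} = \frac{1}{\displaystyle\sum_{k=1}^{c}\left[\frac{w_j\|x_j - v_i\|^2 + \delta(2^{1-m}-1)^{-1}}{w_j\|x_j - v_k\|^2 + \delta(2^{1-m}-1)^{-1}}\right]^{\frac{1}{m-1}}}, \quad i = 1, 2, \ldots, c;\ j = 1, 2, \ldots, n, \tag{7}$$

$$v_i = \frac{\sum_{j=1}^{n} w_j\,\mu_{ij}^{m}\,x_j}{\sum_{j=1}^{n} w_j\,\mu_{ij}^{m}}, \quad i = 1, 2, \ldots, c. \tag{8}$$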

Note that the weight $w_j$ of each sample is determined beforehand by some fixed method. Iterating (7) and (8) gives the fuzzy clustering algorithm with generalized entropy based on weighted samples, which we call the WGEFCM algorithm.
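The alternation between (7) and (8) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names, the `U[i][j]` layout, the stopping tolerance `tol` and the iteration cap `max_iter` are all hypothetical, and the code assumes that every term $w_j\|x_j - v_i\|^2 + \delta(2^{1-m}-1)^{-1}$ is strictly positive, a case the text does not discuss, so it raises an error otherwise.

```python
def sq_dist(x, v):
    """Squared Euclidean distance ||x - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def update_memberships(X, V, w, m, delta):
    """Eq. (7); U[i][j] is the membership of sample j in cluster i."""
    shift = delta / (2.0 ** (1.0 - m) - 1.0)  # delta * (2^(1-m) - 1)^(-1)
    U = [[0.0] * len(X) for _ in V]
    for j, x in enumerate(X):
        a = [w[j] * sq_dist(x, v) + shift for v in V]
        if min(a) <= 0.0:
            raise ValueError("non-positive term in Eq. (7); outside the assumed regime")
        inv = [t ** (-1.0 / (m - 1.0)) for t in a]
        total = sum(inv)
        for i in range(len(V)):
            # Algebraically equal to 1 / sum_k (a_i / a_k)^(1/(m-1)).
            U[i][j] = inv[i] / total
    return U


def update_centers(X, U, w, m):
    """Eq. (8): each center is a weighted mean of all samples."""
    V = []
    for row in U:
        coef = [w[j] * row[j] ** m for j in range(len(X))]
        total = sum(coef)
        V.append([sum(cf * x[d] for cf, x in zip(coef, X)) / total
                  for d in range(len(X[0]))])
    return V


def wgefcm(X, V, w, m=2.0, delta=0.01, tol=1e-10, max_iter=100):
    """Alternate Eq. (7) and Eq. (8) until the centers stop moving."""
    U = update_memberships(X, V, w, m, delta)
    for _ in range(max_iter):
        V_new = update_centers(X, U, w, m)
        moved = max(sq_dist(a, b) for a, b in zip(V, V_new))
        V = V_new
        U = update_memberships(X, V, w, m, delta)
        if moved < tol:
            break
    return U, V
```

A call such as `wgefcm(X, initial_centers, weights)` returns the final memberships and centers; the sample weights stay fixed throughout, as described above.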

3. ADAPTIVE FUZZY CLUSTERING WITH GENERALIZED ENTROPY BASED ON WEIGHTED SAMPLE

To further improve performance of clustering, we modify (2) to obtain the following objective function

$$J_{AWG}(U,V) = \sum_{j=1}^{n}\sum_{i=1}^{c} w_j^{\beta}\,\mu_{ij}^{m}\,\|x_j - v_i\|^2 + \delta \sum_{j=1}^{n} (2^{1-m}-1)^{-1}\Bigl(\sum_{i=1}^{c}\mu_{ij}^{m} - 1\Bigr), \tag{9}$$

where $\beta$ is a parameter.

Based on objective function (9), we obtain the following optimization problem for fuzzy clustering with generalized entropy based on weighted sample

$$\min J_{AWG}(U,V) \quad \text{s.t.} \quad \sum_{i=1}^{c}\mu_{ij} = 1, \quad j = 1, 2, \ldots, n; \qquad \sum_{j=1}^{n} w_j = 1. \tag{10}$$


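The Lagrange function corresponding to (10) is

$$L(U,V;\lambda_1,\ldots,\lambda_n,\gamma) = \sum_{j=1}^{n}\sum_{i=1}^{c} w_j^{\beta}\,\mu_{ij}^{m}\,\|x_j - v_i\|^2 + \delta \sum_{j=1}^{n} (2^{1-m}-1)^{-1}\Bigl(\sum_{i=1}^{c}\mu_{ij}^{m} - 1\Bigr) + \lambda_1\Bigl(\sum_{i=1}^{c}\mu_{i1} - 1\Bigr) + \lambda_2\Bigl(\sum_{i=1}^{c}\mu_{i2} - 1\Bigr) + \cdots + \lambda_n\Bigl(\sum_{i=1}^{c}\mu_{in} - 1\Bigr) + \gamma\Bigl(\sum_{j=1}^{n} w_j - 1\Bigr).$$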

Here, we take the derivatives of the function $L$ with respect to $\lambda_j$, $\mu_{ij}$, $v_i$, $w_j$ and $\gamma$, and set them equal to zero, namely

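$$\frac{\partial L}{\partial \lambda_j} = \sum_{i=1}^{c}\mu_{ij} - 1 = 0, \tag{11}$$

$$\frac{\partial L}{\partial \mu_{ij}} = m\,w_j^{\beta}\,\mu_{ij}^{m-1}\,\|x_j - v_i\|^2 + \delta\,m\,(2^{1-m}-1)^{-1}\mu_{ij}^{m-1} + \lambda_j = 0, \tag{12}$$

$$\frac{\partial L}{\partial v_i} = -2\sum_{j=1}^{n} w_j^{\beta}\,\mu_{ij}^{m}\,(x_j - v_i) = 0, \tag{13}$$

$$\frac{\partial L}{\partial w_j} = \beta\,w_j^{\beta-1}\sum_{i=1}^{c}\mu_{ij}^{m}\,\|x_j - v_i\|^2 + \gamma = 0, \tag{14}$$

$$\frac{\partial L}{\partial \gamma} = \sum_{j=1}^{n} w_j - 1 = 0. \tag{15}$$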

From (11)-(15) and using simple algebra operation, we obtain the following degree of membership for each sample, centers of clusters and weights of samples.

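$$\mu_{ij} = \frac{1}{\displaystyle\sum_{k=1}^{c}\left[\frac{w_j^{\beta}\|x_j - v_i\|^2 + \delta(2^{1-m}-1)^{-1}}{w_j^{\beta}\|x_j - v_k\|^2 + \delta(2^{1-m}-1)^{-1}}\right]^{\frac{1}{m-1}}}, \quad i = 1, 2, \ldots, c;\ j = 1, 2, \ldots, n, \tag{16}$$

$$v_i = \frac{\sum_{j=1}^{n} w_j^{\beta}\,\mu_{ij}^{m}\,x_j}{\sum_{j=1}^{n} w_j^{\beta}\,\mu_{ij}^{m}}, \quad i = 1, 2, \ldots, c, \tag{17}$$

$$w_j = \frac{\Bigl(\sum_{i=1}^{c}\mu_{ij}^{m}\,\|x_j - v_i\|^2\Bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{n}\Bigl(\sum_{i=1}^{c}\mu_{il}^{m}\,\|x_l - v_i\|^2\Bigr)^{\frac{1}{1-\beta}}}, \quad j = 1, 2, \ldots, n. \tag{18}$$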
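The step from (14) and (15) to (18) is summarized only as simple algebra. One way to spell it out is given below, writing $D_j$ for the membership-weighted distortion of sample $x_j$ (a shorthand introduced here, not used in the paper) and assuming $\beta \neq 1$, $D_j > 0$, and a multiplier term of the form $\gamma\bigl(\sum_j w_j - 1\bigr)$ in the Lagrange function:

$$
\begin{aligned}
D_j &= \sum_{i=1}^{c}\mu_{ij}^{m}\,\|x_j - v_i\|^2, \\
\beta\,w_j^{\beta-1} D_j + \gamma = 0 \;&\Rightarrow\; w_j = \Bigl(-\frac{\gamma}{\beta}\Bigr)^{\frac{1}{\beta-1}} D_j^{-\frac{1}{\beta-1}}, \\
\sum_{l=1}^{n} w_l = 1 \;&\Rightarrow\; \Bigl(-\frac{\gamma}{\beta}\Bigr)^{\frac{1}{\beta-1}} = \Bigl(\sum_{l=1}^{n} D_l^{-\frac{1}{\beta-1}}\Bigr)^{-1}, \\
w_j &= \frac{D_j^{\frac{1}{1-\beta}}}{\sum_{l=1}^{n} D_l^{\frac{1}{1-\beta}}}.
\end{aligned}
$$

For $\beta > 1$ the exponent $1/(1-\beta)$ is negative, so a sample with a larger distortion $D_j$ receives a smaller weight.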

In the following, we give the adaptive fuzzy clustering algorithm with generalized entropy based on weighted samples and name it the AWGEFCM algorithm.

Step 1 Initialize the c cluster centers and the sample weights, and assign m, β and δ.

Step 2 Compute the degree of membership μij of each sample according to (16).

Step 3 Compute the cluster center vi of each cluster according to (17).

Step 4 Calculate the sample weights wj according to (18).

Step 5 Repeat Step 2 to Step 4 until the cluster centers vi no longer change.
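Steps 1 to 5 can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: the function names, the uniform initial weights, the tolerance `tol` and the cap `max_iter` are assumptions made for the example. The code also assumes that every term $w_j^{\beta}\|x_j - v_i\|^2 + \delta(2^{1-m}-1)^{-1}$ in (16) and every inner sum in (18) is strictly positive, and raises an error otherwise.

```python
def sq_dist(x, v):
    """Squared Euclidean distance ||x - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def update_memberships(X, V, w, m, beta, delta):
    """Step 2, Eq. (16); U[i][j] is the membership of sample j in cluster i."""
    shift = delta / (2.0 ** (1.0 - m) - 1.0)  # delta * (2^(1-m) - 1)^(-1)
    U = [[0.0] * len(X) for _ in V]
    for j, x in enumerate(X):
        a = [w[j] ** beta * sq_dist(x, v) + shift for v in V]
        if min(a) <= 0.0:
            raise ValueError("non-positive term in Eq. (16); outside the assumed regime")
        inv = [t ** (-1.0 / (m - 1.0)) for t in a]
        total = sum(inv)
        for i in range(len(V)):
            U[i][j] = inv[i] / total  # equals 1 / sum_k (a_i / a_k)^(1/(m-1))
    return U


def update_centers(X, U, w, m, beta):
    """Step 3, Eq. (17)."""
    V = []
    for row in U:
        coef = [w[j] ** beta * row[j] ** m for j in range(len(X))]
        total = sum(coef)
        V.append([sum(cf * x[d] for cf, x in zip(coef, X)) / total
                  for d in range(len(X[0]))])
    return V


def update_weights(X, U, V, m, beta):
    """Step 4, Eq. (18); the returned weights sum to one."""
    D = [sum(U[i][j] ** m * sq_dist(x, V[i]) for i in range(len(V)))
         for j, x in enumerate(X)]
    if min(D) <= 0.0:
        raise ValueError("zero distortion in Eq. (18); outside the assumed regime")
    p = [d ** (1.0 / (1.0 - beta)) for d in D]
    total = sum(p)
    return [t / total for t in p]


def awgefcm(X, V, m, beta, delta, tol=1e-8, max_iter=100):
    """Steps 1-5 with uniform initial weights (an assumption of this sketch)."""
    w = [1.0 / len(X)] * len(X)                              # Step 1
    U = []
    for _ in range(max_iter):
        U = update_memberships(X, V, w, m, beta, delta)      # Step 2
        V_new = update_centers(X, U, w, m, beta)             # Step 3
        w = update_weights(X, U, V_new, m, beta)             # Step 4
        moved = max(sq_dist(a, b) for a, b in zip(V, V_new))
        V = V_new
        if moved < tol:                                      # Step 5
            break
    return U, V, w
```

The returned memberships are those computed in the last pass through Step 2, so they correspond to the centers and weights from the start of that pass.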

4. EXPERIMENTS

To verify the effectiveness of the proposed AWGEFCM algorithm, we select five datasets from the UCI data repository [10]. Two indexes are used to evaluate the clustering results: accuracy (ACC) and mutual information (MI). ACC is defined as

$$ACC = \frac{n - err}{n} \times 100\%,$$

where $n$ is the number of samples in dataset $X$ and $err$ is the number of misclassified samples. In the following experiments, the initial cluster centers are chosen randomly from the dataset.
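The two indexes could be computed along the following lines. The text does not say how clusters are matched to classes when counting misclassified samples, nor which form of MI is used, so both choices here are assumptions: clusters are matched to classes by trying every one-to-one assignment (practical only for a small number of clusters), and MI is the plain mutual information in nats. The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import log


def accuracy(true_labels, cluster_labels):
    """ACC = (n - err) / n * 100, using the best cluster-to-class matching."""
    n = len(true_labels)
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    pairs = Counter(zip(cluster_labels, true_labels))
    # Pad with None so that surplus clusters can stay unmatched.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best_hits = 0
    for assignment in permutations(padded, len(clusters)):
        hits = sum(pairs[(k, cls)] for k, cls in zip(clusters, assignment))
        best_hits = max(best_hits, hits)
    err = n - best_hits  # samples not covered by the best matching
    return 100.0 * (n - err) / n


def mutual_information(true_labels, cluster_labels):
    """I(T; C) = sum p(t, c) * log(p(t, c) / (p(t) * p(c))), in nats."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    p_true = Counter(true_labels)
    p_clus = Counter(cluster_labels)
    return sum(
        (n_tc / n) * log(n * n_tc / (p_true[t] * p_clus[c]))
        for (t, c), n_tc in joint.items()
    )
```

Hard cluster labels for these functions can be obtained by assigning each sample to the cluster in which its membership is largest.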

[Figure 1: ten plots omitted. Panels (a), (b): β=5; (c), (d): β=10; (e), (f): β=20; (g): β=50; (h): β=20 as printed (the pairing of the other panels suggests β=50); (i), (j): β=100. Horizontal axis: fuzzy index m (1.1, 1.5, 2, 2.5, 3, 5, 7, 9, 11); vertical axis: accuracy or MI.]

Figure 1 Clustering performance of algorithm AWGEFCM, where (a), (c), (e), (g) and (i) show the relation between fuzzy index m and accuracy, whereas (b), (d), (f), (h) and (j) show the relation between fuzzy index m and mutual information (MI).

When β=5, the clustering performance of AWGEFCM is approximately stable as the fuzzy index m changes. However, when β is 10, 20, 50 or 100, the clustering performance varies more with m. In particular, better clustering results are obtained when m is 7, 9 or 11.

[Figure 2: four plots omitted. Panels: (a) Australian, (b) Breast-w, (c) Heart, (d) Ionosphere. Horizontal axis (labelled "Fuzzy index m" in the source): 5, 10, 20, 50, 100, 1000; vertical axis: accuracy.]

Figure 2 Accuracy for different datasets using algorithm AWGEFCM

Figure 2 shows that the clustering performance on each dataset changes with the value of β. We also compare the clustering performance of different algorithms on the datasets Australian, Breast-w, Heart, Ionosphere and Iris. The algorithms compared are FCM, GEFCM [9], WGEFCM and AWGEFCM. The experimental results are given in Table 1. AWGEFCM gives the best result on Australian and Breast-w, and ties the best of the other algorithms on Heart, Ionosphere and Iris.

Table 1: Comparison results of different algorithms

Dataset      FCM      GEFCM    WGEFCM   AWGEFCM
Australian   56.09%   56.09%   56.67%   60.14%
Breast-w     95.59%   95.61%   95.88%   97.5%
Heart        59.25%   59.26%   60.74%   60.74%
Ionosphere   70.94%   71.23%   71.23%   71.23%
Iris         89.33%   92.67%   91.33%   92.67%

5. CONCLUSIONS

In this paper, we study fuzzy clustering with generalized entropy based on weighted samples. We first obtain the objective function for fuzzy clustering with generalized entropy based on weighted samples, and then present an adaptive fuzzy clustering algorithm in which the sample weights are updated during clustering. Experiments on representative datasets show that the presented algorithm is effective.

Acknowledgment

This work is supported by the Natural Science Foundation of China (No. 61375075) and the Natural Science Foundation of Hebei Province (No. F2012201014).

REFERENCES

[1] J.Z. Huang, K.N. Michael, H. Rong, and Z. Li, “Automated Variable Weighting in k-Means Type Clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27(5), pp. 657-668, 2005.

[2] L. Jing, K.N. Michael, and J.Z. Huang, “An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 19(8), pp. 1026-1041, 2007.

[3] X. Su, X. Wang, Z. Wang, Y. Xiao, “A New Fuzzy Clustering Algorithm Based on Entropy Weighting,” Journal of Computational Information Systems, vol. 6(10), pp. 3319-3326, 2010.

[4] J. Li, X. Gao, X. Jiao, “A new feature weighted fuzzy clustering algorithm,” Acta Electronica Sinica, vol. 34(1), pp. 89-92, 2006.

[6] R.P. Li, M. Mukaidono, “A maximum entropy approach to fuzzy clustering,” In Proceedings of the Fourth IEEE International Conference on Fuzzy Systems, Yokohama, Japan, pp. 2227-2232, 1995.

[7] D. Tran and M. Wagner, “Fuzzy entropy clustering,” In Proceedings of the Ninth IEEE International Conference on Fuzzy Systems, vol. 1, pp. 152-157, 2000.

[8] C. Wei, C. Fahn, “The multisynapse neural network and its application to fuzzy clustering,” IEEE Transactions on Neural Networks, vol. 13(3), pp. 600-618, 2002.

[9] K. Li, H.Y. Ma and Y. Wang, “Unified model of fuzzy clustering algorithm based on entropy and its application to image segmentation,” Journal of Computational Information Systems, vol. 7(15), pp. 5476-5483, 2011.

[10] C.L. Blake, C.J. Merz, “UCI Repository for Machine Learning databases,” Irvine, CA: University of California, Department of Information and Computer Sciences, http://www.ics.uci.edu/mlearn/MLRepository.html, 1998.

AUTHOR

Kai Li received the B.S. and M.S. degrees from the Mathematics Department and the Electrical Engineering Department of Hebei University, Baoding, China, in 1982 and 1992, respectively. He received the Ph.D. degree from Beijing Jiaotong University, Beijing, China, in 2001.

He is currently a Professor in the School of Mathematics and Computer Science, Hebei University. His current research interests include machine learning, data mining, computational intelligence, and pattern recognition.

Lijuan Cui received the B.S. degree from the Education Department of Hebei University, Baoding, China, in 2007. She is currently a vice professor at the library of Hebei University. Her current research interests include data mining and information retrieval.
