
Procedia Engineering 15 (2011) 1993 – 1997 1877-7058 © 2011 Published by Elsevier Ltd. doi:10.1016/j.proeng.2011.08.372


Advanced in Control Engineering and Information Science

Fuzzy Entropy Clustering Using Possibilistic Approach

FU Hai-Jun (a, b), WU Xiao-Hong (a, *), MAO Han-Ping (b), WU Bin (c)

a School of Electrical and Information Engineering, Jiangsu University, Jiangsu Zhenjiang 212013, China

b Jiangsu Provincial Key Laboratory of Modern Agricultural Equipment and Technology, Jiangsu University, Jiangsu Zhenjiang 212013, China

c School of Information and Computer Science, Anhui Agricultural University, Anhui Hefei 230036, China

Abstract

Like fuzzy c-means (FCM) clustering, fuzzy entropy clustering (FEC) is sensitive to noise because of the probabilistic constraint on its memberships. To solve the noise sensitivity of FCM, Krishnapuram and Keller presented possibilistic c-means (PCM) clustering, which abandons this constraint. Based on FEC and PCM, a possibilistic type of fuzzy entropy clustering (PTFEC) is proposed. The proposed algorithm deals with noisy data better than FEC. Furthermore, the parameters of PCM are optimized using a technique borrowed from possibilistic clustering. Experiments show that FEC is sensitive to noise, whereas the proposed algorithm is insensitive to noise and achieves higher clustering accuracy than FEC.

Keywords: Fuzzy entropy clustering; fuzzy c-means; possibilistic c-means; noisy data

1. Introduction

Since Zadeh introduced fuzzy sets [1], fuzzy set theory has advanced in many disciplines in which information is incomplete or imprecise, such as control theory, optimization, pattern recognition, image processing and data mining. Fuzzy clustering, which is based on fuzzy set theory, is used to deal with ill-defined boundaries between clusters. The well-known fuzzy c-means (FCM) clustering algorithm was conceived by Dunn and generalized by Bezdek [2]. FCM is based on a least-squared-error clustering criterion and imposes a probabilistic constraint that makes the memberships of each data point across all classes sum to one. This constraint avoids the trivial solution in which all memberships equal 0, and it makes it appropriate to interpret memberships as degrees of sharing. However, these memberships do not always correspond to the intuitive concept of degree of belonging or compatibility. Furthermore, FCM is sensitive to noise. To overcome these disadvantages, Krishnapuram and Keller presented possibilistic c-means (PCM) clustering [3], which abandons the constraint of FCM and uses a new objective function; PCM deals with noisy data better than FCM. Inspired by Shannon's statistical entropy theory, fuzzy entropy clustering (FEC) models [4,5,6] have also been proposed. Based on maximum-entropy inference in fuzzy clustering, FEC imposes the same probabilistic constraint as FCM, so the memberships of each data point across classes also sum to one. To overcome the resulting noise sensitivity of FEC, this paper proposes a possibilistic type of fuzzy entropy clustering (PTFEC) based on FEC and PCM. The proposed algorithm deals with noisy data better than FEC. Furthermore, the parameters of PCM are optimized using a technique borrowed from possibilistic clustering.

* Corresponding author. Tel.: +86 511 88791245; fax: +86 511 88780088. E-mail address: wxh_www@163.com.

Open access under CC BY-NC-ND license.

Selection and/or peer-review under responsibility of [CEIS 2011]

2. Possibilistic c-means clustering with optimised parameters

Possibilistic c-means clustering (PCM) [3] is a mode-seeking algorithm that abandons the probabilistic constraint used by FCM. The objective function of PCM is:

$$J(\mathbf{T},\mathbf{V})=\sum_{i=1}^{c}\sum_{k=1}^{n}t_{ik}D_{ik}^{2}+\sum_{i=1}^{c}\eta_{i}\sum_{k=1}^{n}\left(t_{ik}\log t_{ik}-t_{ik}\right) \tag{1}$$

Here $0 \le t_{ik} \le 1$, $m > 1$ and $D_{ik}=\lVert x_k-\nu_i\rVert$. $c$ is the number of clusters, $n$ is the number of data points, and $t_{ik}$ is the typicality of $x_k$ in class $i$; it depends on all the data. Krishnapuram and Keller suggest choosing the positive constants $\eta_i$ by computing [3]

$$\eta_{i}=K\,\frac{\sum_{k=1}^{n}u_{ik}^{m}D_{ik}^{2}}{\sum_{k=1}^{n}u_{ik}^{m}},\qquad K>0 \tag{2}$$

where $u_{ik}$ are the memberships obtained from an initial FCM run. In this section, we use a technique from the possibilistic clustering algorithm (PCA) [7] to compute the parameters $\eta_i$. Inspired by PCA, we define the objective function of the new PCM as follows:

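$$J(\mathbf{T},\mathbf{V})=\sum_{i=1}^{c}\sum_{k=1}^{n}t_{ik}D_{ik}^{2}+\frac{\sigma^{2}}{m^{2}c}\sum_{i=1}^{c}\sum_{k=1}^{n}\left(t_{ik}\log t_{ik}-t_{ik}\right) \tag{3}$$

Compared with (1), every $\eta_i$ is replaced by the common value $\sigma^{2}/(m^{2}c)$. The parameter $\sigma^2$ is a normalization term that measures the degree of separation of the data set, and it is reasonable to define $\sigma^2$ as the sample variance, that is:

$$\sigma^{2}=\frac{1}{n}\sum_{k=1}^{n}\lVert x_{k}-\bar{x}\rVert^{2},\qquad\text{with}\quad\bar{x}=\frac{1}{n}\sum_{j=1}^{n}x_{j} \tag{4}$$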
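To illustrate Eq. (4), and the common weight that takes the place of every $\eta_i$ in (3), the following Python sketch computes $\sigma^2$ for a small point set. It is a minimal, dependency-free sketch under our own assumptions: the function names (`sq_dist`, `sample_variance`, `pca_scale`) and the toy data are hypothetical and do not come from the paper.

```python
from typing import Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sample_variance(X: Sequence[Point]) -> float:
    """Eq. (4): sigma^2 = (1/n) * sum_k ||x_k - x_bar||^2."""
    n = len(X)
    dim = len(X[0])
    x_bar = [sum(x[d] for x in X) / n for d in range(dim)]  # sample mean of the data
    return sum(sq_dist(x, x_bar) for x in X) / n


def pca_scale(X: Sequence[Point], m: float, c: int) -> float:
    """Common weight sigma^2 / (m^2 c) that replaces every eta_i in Eq. (3)."""
    return sample_variance(X) / (m ** 2 * c)


# Hypothetical toy data: the four corners of a 2 x 2 square.
square = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
print(sample_variance(square))        # 2.0
print(pca_scale(square, m=2.0, c=2))  # 0.25
```

Every corner lies at squared distance 2 from the mean (1, 1), so $\sigma^2 = 2$. With $m = 2$ and $c = 2$, the weight is $2/(4 \cdot 2) = 0.25$.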

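Setting the derivatives of (3) with respect to $t_{ik}$ and $\nu_i$ to zero, subject to $0 \le t_{ik} \le 1$, gives the following update equations:

$$t_{ik}=\exp\left(-\frac{m^{2}c\,D_{ik}^{2}}{\sigma^{2}}\right),\qquad\forall\,i,k \tag{5}$$

$$\nu_{i}=\frac{\sum_{k=1}^{n}t_{ik}x_{k}}{\sum_{k=1}^{n}t_{ik}},\qquad\forall\,i \tag{6}$$

Because the exponent in (5) is never positive, every $t_{ik}$ automatically lies in $(0,1]$.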
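The alternating updates (5) and (6) can be sketched in Python as follows. This is a minimal sketch, not the authors' code. It makes three assumptions of its own: the caller supplies the initial centers `V0`; the stopping rule (center shift below `eps`, or `rmax` iterations) mirrors the one used for PTFEC-AO later in this paper; and the function name `pcm_pca` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Matrix = List[List[float]]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def pcm_pca(X: Sequence[Point], V0: Sequence[Point], m: float = 2.0,
            eps: float = 1e-5, rmax: int = 100) -> Tuple[Matrix, Matrix]:
    """Alternate Eq. (5) (typicalities) and Eq. (6) (centers) until the centers settle."""
    n, c, dim = len(X), len(V0), len(X[0])
    x_bar = [sum(x[d] for x in X) / n for d in range(dim)]
    sigma2 = sum(sq_dist(x, x_bar) for x in X) / n  # Eq. (4)
    V = [list(v) for v in V0]
    T: Matrix = []
    for _ in range(rmax):
        # Eq. (5): t_ik = exp(-m^2 c D_ik^2 / sigma^2), always in (0, 1]
        T = [[math.exp(-(m ** 2) * c * sq_dist(x, v) / sigma2) for x in X] for v in V]
        # Eq. (6): each center is the typicality-weighted mean of all points
        V_new = []
        for i in range(c):
            w = sum(T[i])
            V_new.append([sum(T[i][k] * X[k][d] for k in range(n)) / w for d in range(dim)])
        shift = math.sqrt(sum(sq_dist(a, b) for a, b in zip(V_new, V)))
        V = V_new
        if shift < eps:
            break
    return T, V


# Hypothetical toy data: two unit squares far apart.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
T, V = pcm_pca(X, V0=[(0, 0), (11, 11)])
print([[round(v, 3) for v in center] for center in V])
```

Each row of `T` is computed without reference to the other clusters, so no normalization across clusters takes place. This is the possibilistic property that the paper contrasts with FCM.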

3. Possibilistic type of FEC

Because fuzzy entropy clustering is sensitive to noise, we combine possibilistic c-means clustering with optimized parameters and fuzzy entropy clustering to propose a possibilistic type of FEC (PTFEC). PTFEC has no probabilistic constraint, so it is insensitive to noise. Its objective function is:

$$J(\mathbf{T},\mathbf{V})=\sum_{i=1}^{c}\sum_{k=1}^{n}t_{ik}D_{ik}^{2}+\frac{\sigma^{2}}{m^{2}c}\sum_{i=1}^{c}\sum_{k=1}^{n}\left(t_{ik}\log t_{ik}-t_{ik}\right)+\lambda\sum_{i=1}^{c}\sum_{k=1}^{n}t_{ik}\log t_{ik} \tag{7}$$
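Objective (7) is a sum of independent per-entry terms, so minimizing it over $\mathbf{T}$ reduces to minimizing each $(i,k)$ term separately. The Python sketch below shows this. It evaluates (7) and checks numerically that the closed-form typicality given by the paper as Eq. (8) agrees with a brute-force grid minimizer of one entry. The helper names and the test values ($D^2 = 1$, $\sigma^2 = 2$, $m = 2$, $c = 2$, $\lambda = 1$) are hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def entry_cost(t: float, d2: float, a: float, lam: float) -> float:
    """One (i, k) term of Eq. (7), with a = sigma^2 / (m^2 c)."""
    tlogt = t * math.log(t) if t > 0.0 else 0.0  # t log t -> 0 as t -> 0+
    return t * d2 + a * (tlogt - t) + lam * tlogt


def ptfec_objective(T: Sequence[Sequence[float]], D2: Sequence[Sequence[float]],
                    sigma2: float, m: float, c: int, lam: float) -> float:
    """Eq. (7): sum of entry_cost over all clusters i and points k."""
    a = sigma2 / (m ** 2 * c)
    return sum(entry_cost(t, d2, a, lam)
               for t_row, d_row in zip(T, D2) for t, d2 in zip(t_row, d_row))


def closed_form_t(d2: float, sigma2: float, m: float, c: int, lam: float) -> float:
    """Eq. (8): the stationary point of entry_cost in t."""
    return math.exp(-(m ** 2) * c * (d2 + lam) / (sigma2 + (m ** 2) * c * lam))


def grid_argmin_t(d2: float, sigma2: float, m: float, c: int, lam: float,
                  steps: int = 100_000) -> float:
    """Brute-force minimizer of entry_cost over the grid t = 1/steps, ..., 1."""
    a = sigma2 / (m ** 2 * c)
    return min((j / steps for j in range(1, steps + 1)),
               key=lambda t: entry_cost(t, d2, a, lam))


t_star = closed_form_t(d2=1.0, sigma2=2.0, m=2.0, c=2, lam=1.0)
t_grid = grid_argmin_t(d2=1.0, sigma2=2.0, m=2.0, c=2, lam=1.0)
print(round(t_star, 4), round(t_grid, 4))  # both 0.2019, i.e. exp(-1.6)
```

The per-entry term is strictly convex in $t$: its second derivative is $(\sigma^2/(m^2c)+\lambda)/t > 0$. The stationary point is therefore the unique minimizer, which is why the grid search and the closed form agree.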

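Minimizing (7) subject to the constraints $0 \le t_{ik} \le 1$, we obtain the following equations:

$$t_{ik}=\exp\left(-\frac{m^{2}c\,\left(D_{ik}^{2}+\lambda\right)}{\sigma^{2}+m^{2}c\,\lambda}\right),\qquad\forall\,i,k \tag{8}$$

$$\nu_{i}=\frac{\sum_{k=1}^{n}t_{ik}x_{k}}{\sum_{k=1}^{n}t_{ik}},\qquad\forall\,i \tag{9}$$

Equation (8) follows from setting $\partial J/\partial t_{ik}=D_{ik}^{2}+\frac{\sigma^{2}}{m^{2}c}\log t_{ik}+\lambda\left(\log t_{ik}+1\right)=0$. Equation (9) follows from setting the gradient of (7) with respect to $\nu_i$ to zero.

If $D_{ik}=\lVert x_k-\nu_i\rVert>0$ for all $i$ and $k$, and $X$ contains at least $c<n$ distinct data points, then the algorithm described below is called the PTFEC-AO algorithm: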

Initialization

(1) Run FCM until termination to obtain the class centers V, which PTFEC uses as V(0), and use Eq. (4) to calculate the parameter $\sigma^2$;

(2) Fix c, 1 < c < n;

(3) Set the iteration counter r = 1 and the maximum number of iterations $r_{\max}$;

Repeat

Step 1 Update the typicalities T(r) by Eq. (8);

Step 2 Update V(r) by Eq. (9);

Step 3 Increment r;

Until ($\lVert V^{(r)}-V^{(r-1)}\rVert<\varepsilon$) or ($r>r_{\max}$)
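The PTFEC-AO steps above can be put together in a short Python sketch. This is a hedged illustration and not the authors' implementation. It makes several assumptions of its own. Initialization step (1) is realized with a plain textbook FCM that starts from caller-supplied centers `V_init`. A small floor of 1e-12 on squared distances guards the FCM step against $D_{ik}=0$. The stopping norm is the Frobenius norm over all centers. All function names and the toy data set (two five-point diamonds plus one equidistant outlier) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Matrix = List[List[float]]


def sq_dist(a: Point, b: Point) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def weighted_mean(X: Sequence[Point], w: Sequence[float]) -> List[float]:
    total = sum(w)
    return [sum(wk * x[d] for wk, x in zip(w, X)) / total for d in range(len(X[0]))]


def center_shift(V_new: Sequence[Point], V_old: Sequence[Point]) -> float:
    """||V(r) - V(r-1)|| as a Frobenius norm over all centers (stopping test)."""
    return math.sqrt(sum(sq_dist(a, b) for a, b in zip(V_new, V_old)))


def fcm(X: Sequence[Point], V: Matrix, m: float = 2.0,
        eps: float = 1e-5, rmax: int = 100) -> Matrix:
    """Plain FCM (Bezdek), used only for Initialization step (1) to obtain V(0)."""
    c, n = len(V), len(X)
    for _ in range(rmax):
        D2 = [[max(sq_dist(x, v), 1e-12) for x in X] for v in V]  # guard against D = 0
        U = [[1.0 / sum((D2[i][k] / D2[j][k]) ** (1.0 / (m - 1)) for j in range(c))
              for k in range(n)] for i in range(c)]
        V_new = [weighted_mean(X, [u ** m for u in U[i]]) for i in range(c)]
        shift = center_shift(V_new, V)
        V = V_new
        if shift < eps:
            break
    return V


def ptfec_ao(X: Sequence[Point], V_init: Sequence[Point], m: float = 2.0,
             lam: float = 1.0, eps: float = 1e-5, rmax: int = 100) -> Tuple[Matrix, Matrix]:
    n, dim = len(X), len(X[0])
    V = fcm(X, [list(v) for v in V_init], m, eps, rmax)   # Initialization (1): V(0)
    c = len(V)
    x_bar = [sum(x[d] for x in X) / n for d in range(dim)]
    sigma2 = sum(sq_dist(x, x_bar) for x in X) / n        # Eq. (4)
    scale = (m ** 2) * c / (sigma2 + (m ** 2) * c * lam)
    T: Matrix = []
    for _ in range(rmax):                                  # r = 1, ..., rmax
        T = [[math.exp(-scale * (sq_dist(x, v) + lam)) for x in X] for v in V]  # Step 1, Eq. (8)
        V_new = [weighted_mean(X, T[i]) for i in range(c)]                      # Step 2, Eq. (9)
        shift = center_shift(V_new, V)
        V = V_new
        if shift < eps:                                    # stopping test
            break
    return T, V


# Hypothetical toy data: two 5-point diamonds and one outlier equidistant from both.
X = [(-3, 0), (-2, 1), (-2, -1), (-1, 0), (-2, 0),
     (3, 0), (2, 1), (2, -1), (1, 0), (2, 0),
     (0, 8)]
T, V = ptfec_ao(X, V_init=[(-1, 0), (1, 0)], m=2.0, lam=1.0)
for k, x in enumerate(X):
    print(x, [round(T[i][k], 3) for i in range(len(V))])
```

In this toy run, the outlier (0, 8) receives near-zero typicality in both clusters, while every diamond point keeps a typicality of roughly 0.4 or more in its own cluster.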


We conduct numerical experiments on the data set X12 [8,9] by running FCM-AO, FEC-AO and PTFEC-AO, respectively. X12 = {x1, x2, …, x12} contains 12 data points whose coordinates are given in [8,9]. The ten points other than x6 and x12 form two diamond-shaped clusters of five points each, on the left and right sides of the y axis. The points x6 and x12 are noise points (outliers) that are equidistant from all corresponding pairs of points in the two clusters. Computational conditions: $\varepsilon=0.00001$, $r_{\max}=100$, $m=2.0$, $\lambda=1.0$. From Table 1, the membership values that FEC assigns to x6 and x12 are $u_{1,6}=0.89$, $u_{2,6}=0.11$ and $u_{1,12}=0.00$, $u_{2,12}=1.00$. According to these values, x6 would belong to class 1 and x12 to class 2, yet both are in fact noise points; FEC is therefore also sensitive to noise. PTFEC abandons the probabilistic constraint of FEC, and the typicality values it assigns to x6 and x12 are small: 0.03 in both classes for x6 and 0.00 in both classes for x12. Because x12 is farther from the cluster centers than x6, PTFEC assigns x12 smaller typicality values than x6. This matches the actual situation: x6 and x12 are more atypical than the other ten data points. PTFEC can therefore distinguish the noise points in the data set.
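The contrast described above can be illustrated in an idealized case. Take a point exactly equidistant from two centers. Any membership that is normalized across clusters must give it 1/2 in each cluster, no matter how far away it lies, whereas the typicality of Eq. (8) keeps shrinking with distance. The Python sketch below is not a reproduction of FEC. It *assumes* the common maximum-entropy (softmax) form $u_i \propto \exp(-D_i^2/\lambda)$ as a stand-in for normalized memberships. The centers, test points and $\sigma^2 = 10$ are hypothetical (the X12 coordinates are given only in [8,9]).

```python
import math
from typing import List, Sequence


def entropy_memberships(d2: Sequence[float], lam: float) -> List[float]:
    """ASSUMED maximum-entropy form: u_i = exp(-D_i^2/lam) / sum_j exp(-D_j^2/lam)."""
    w = [math.exp(-d / lam) for d in d2]
    total = sum(w)
    return [wi / total for wi in w]


def ptfec_typicalities(d2: Sequence[float], sigma2: float, m: float, c: int,
                       lam: float) -> List[float]:
    """Eq. (8) applied to each cluster separately, with no normalization across clusters."""
    scale = (m ** 2) * c / (sigma2 + (m ** 2) * c * lam)
    return [math.exp(-scale * (d + lam)) for d in d2]


centers = [(-2.0, 0.0), (2.0, 0.0)]                   # hypothetical cluster centers
for outlier in [(0.0, 1.0), (0.0, 3.0), (0.0, 8.0)]:  # each equidistant from both centers
    d2 = [sum((p - q) ** 2 for p, q in zip(outlier, v)) for v in centers]
    u = entropy_memberships(d2, lam=1.0)
    t = ptfec_typicalities(d2, sigma2=10.0, m=2.0, c=2, lam=1.0)
    print(outlier, [round(x, 3) for x in u], [round(x, 4) for x in t])
```

The normalized memberships stay at [0.5, 0.5] for all three points, while the typicalities fall toward zero as the point moves away. This is the behaviour that lets PTFEC flag x6 and x12 as atypical.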

Table 1. Terminal U and T from FEC and PTFEC

          FEC              PTFEC
No.    u1k     u2k      t1k     t2k
1      1.00    0.06     0.29    0.00
2      1.00    0.03     0.32    0.00
3      1.00    0.01     0.74    0.00
4      1.00    0.10     0.32    0.00
5      1.00    0.09     0.36    0.00
6      0.89    0.11     0.03    0.03
7      0.00    1.00     0.00    0.36
8      0.00    1.00     0.00    0.32
9      0.00    1.00     0.00    0.74
10     0.00    1.00     0.00    0.32
11     0.00    1.00     0.00    0.29
12     0.00    1.00     0.00    0.00

Table 2. Clustering Accuracy of FEC and PTFEC on the IRIS Data Set

λ      FEC      PTFEC
0.1    89.3%    92.7%
0.2    88.7%    92.0%
0.3    88.7%    92.0%
0.4    89.3%    91.3%
0.5    89.3%    91.3%
0.6    89.3%    92.0%
0.7    90.0%    92.0%
0.8    90.0%    92.0%
0.9    89.3%    91.3%
1.0    89.3%    91.3%

We also run FEC-AO and PTFEC-AO on the widely used IRIS data set [10]. It is a four-dimensional data set with three classes, Setosa, Versicolor and Virginica, each containing 50 data points. The computational conditions are $\varepsilon=0.00001$ and a maximum number of iterations $r_{\max}=100$, and the cluster centers are initialized as

$$V^{(0)}=\begin{bmatrix}-0.4326 & 0.2877 & 1.1892 & 0.1746\\ -1.6656 & -1.1465 & -0.0376 & -0.1867\\ 0.1253 & 1.1909 & 0.3273 & 0.7258\end{bmatrix}$$

We vary $\lambda$ from 0.1 to 1.0. The resulting clustering accuracies of FEC and PTFEC on the IRIS data set are listed in Table 2. For every value of $\lambda$ tested, PTFEC achieves higher clustering accuracy than FEC.
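The paper does not state how clustering accuracy is computed, or how typicalities are turned into hard labels. The Python sketch below therefore uses two assumptions of our own. First, each point is assigned to the cluster with the largest $t_{ik}$. Second, accuracy is the largest fraction of correctly grouped points over all one-to-one mappings of cluster labels to classes, which is a common definition. The function names are hypothetical.

```python
from itertools import permutations
from typing import Hashable, List, Sequence


def argmax_labels(T: Sequence[Sequence[float]]) -> List[int]:
    """Hard label per point: the cluster i with the largest t_ik (assumed rule)."""
    c, n = len(T), len(T[0])
    return [max(range(c), key=lambda i: T[i][k]) for k in range(n)]


def clustering_accuracy(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Best fraction of matches over all one-to-one relabelings of the clusters.

    Brute force over permutations, which is cheap for c = 3 as on IRIS.
    """
    labels = list(set(pred) | set(truth))
    best = 0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        hits = sum(1 for p, t in zip(pred, truth) if mapping[p] == t)
        best = max(best, hits)
    return best / len(truth)


pred = argmax_labels([[0.9, 0.8, 0.1, 0.2], [0.1, 0.3, 0.7, 0.6]])  # hypothetical typicalities
print(pred)                                          # [0, 0, 1, 1]
print(clustering_accuracy(pred, truth=[1, 1, 0, 0]))  # 1.0
```

On the 150 IRIS points, each misassigned point changes the accuracy by 1/150, or about 0.67 percentage points. For example, 134 correctly grouped points give 89.3%.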


5. The results

In this paper, we propose a new possibilistic clustering model, the possibilistic type of fuzzy entropy clustering (PTFEC), which extends fuzzy entropy clustering (FEC) to its possibilistic type. PTFEC combines the advantages of FEC and PCM and overcomes their shortcomings. Experiments on the data set X12 show that PTFEC is less sensitive to noise than FEC, and experiments on the IRIS data set show that PTFEC achieves higher clustering accuracy than FEC.

Acknowledgements

The authors thank the China Postdoctoral Science Foundation (project No. 20090460078) for financially supporting this research.

References

[1] Zadeh L A. Fuzzy sets. Inf. Control 1965, 8: 338-353.

[2] Bezdek J C. Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.

[3] Krishnapuram R, Keller J. The Possibilistic c-Means Algorithm: Insights and Recommendations. IEEE Trans. Fuzzy Systems 1996, 4(3): 385-393.

[4] Tran D, Wagner M. Fuzzy Entropy Clustering. The Ninth IEEE International Conference on Fuzzy Systems, 2000, vol. 1, no.7-10, pp. 152-157.

[5] Li R P, Mukaidono M. Gaussian Clustering Method Based on Maximum-Fuzzy-Entropy Interpretation. Fuzzy Sets and Systems 1999, 102(2): 253-258.

[6] Li R P, Mukaidono M. A Maximum-Entropy Approach to Fuzzy Clustering. International Joint Conference of the Fourth IEEE International Conference on Fuzzy Systems and the Second International Fuzzy Engineering Symposium, Proceedings of 1995 IEEE International Conference on Fuzzy Systems, 1995, vol. 4, pp. 2227-2232.

[7] Yang M S, Wu K L. Unsupervised Possibilistic Clustering. Pattern Recognition 2006, 39(1): 5-21.

[8] Wu X H, Zhou J J. A novel possibilistic fuzzy c-means clustering. Acta Electronica Sinica 2008, 36(10): 1996-2000.

[9] Pal N R, Pal K, Bezdek J C. A Possibilistic Fuzzy c-Means Clustering Algorithm. IEEE Trans. Fuzzy Systems 2005, 13(4): 517-530.

[10] Bezdek J C, Keller J M, Krishnapuram R et al. Will the Real Iris Data Stand Up? IEEE Trans. Fuzzy Systems 1999, 7(3): 368-369.
