Supervised feature subset selection using extended fuzzy absolute information measure for handling different discretized datasets


Procedia Computer Science 00 (2009) 000–000

www.elsevier.com/locate/procedia

ICEBT 2010


K. Sarojini a,*, Dr. K. Thangavel b

a IEEE student member, Department of Computer Applications, S.N.R. Sons College, Coimbatore-641 015, India.

b Department of Computer Science, Periyar University, Salem-636 011, India.

Abstract

Feature subset selection is applied in a wide range of domains such as biometrics, chemistry, text processing, pattern recognition, speech processing and vision. Most prior work in biometrics and other applications has emphasized feature extraction and classification, while the critical issue of examining the usefulness of the extracted features has been largely ignored. Feature subset selection helps to identify and remove many of the irrelevant and redundant features. It reduces the dimensionality of the datasets and so avoids problems faced in feature extraction. The proposed Extended Fuzzy Absolute Information Measure (EFAIM) is applied to select feature subsets by focusing on boundary samples. Experimental results on UCI datasets show that the proposed method selects subsets with as few or fewer features than the fuzzy entropy measure. The experiments also show that only a small, relevant subset of the features in each dataset is selected, and such a subset is what is needed in practice to obtain accurate results.

Keywords: Discretization; Feature selection; Fuzzy Entropy; Fuzzy Absolute Information Measure

1. Introduction

In most applications, a dataset has rows and columns, where the rows represent instances (records) and the columns represent features of that dataset. Some of these features may be irrelevant or redundant, and they should be removed to make the dataset efficient to use. Feature subset selection is therefore an essential pre-processing task. Feature selection refers to choosing a subset [1] of attributes from the set of original attributes. Its aim is to reduce the number of features and to increase classification accuracy. If the relevant features are selected properly, it is possible to increase the classification accuracy rates [2, 3].

This paper presents supervised feature subset selection based on EFAIM. First, numeric features are discretized using K-Means, Fuzzy C Means (FCM) and Median as initial centroid of K-Means, in order to construct the membership function of each fuzzy set of a feature. Then the feature subset is selected with the proposed EFAIM, focusing on boundary samples. The experiments are carried out on bio-metric datasets taken from the UCI Repository of Machine Learning Databases. Compared with the fuzzy entropy based feature subset selection method, the proposed method selects feature subsets with the minimum number of features, and the selected subset can then be used in applications that need accurate results. The rest of this paper is organized as follows. Section 2 briefly reviews related work on feature subset selection. Section 3 presents the proposed EFAIM for feature subset selection focusing on boundary samples. An experimental analysis is given in Section 4. Section 5 concludes the paper.

* Corresponding author. Tel.: +0-422-259-2529;

E-mail address: [email protected].

© 2010 Published by Elsevier Ltd

Procedia Computer Science 2 (2010) 256–264


1877-0509 © 2010 Published by Elsevier Ltd

doi:10.1016/j.procs.2010.11.033

Open access under CC BY-NC-ND license.


2. Literature Review

2.1. Fuzzy Entropy Measure

Fuzzy sets and fuzzy logic are powerful mathematical tools for controlling uncertain systems; they facilitate approximate reasoning in decision making in the absence of complete and precise information. Their role is significant when applied to complex phenomena that are not easily described by traditional mathematical tools. Entropy measures the impurity of a collection. Some of the existing entropy measures are found in [4, 5, 6, 7, 8, 9]. In [8], Lee et al. presented a fuzzy entropy measure of an interval, based on Shannon's entropy measure and De Luca and Termini's axioms [7].

Assume that a set $X$ of samples is divided into a set $C$ of classes. The class degree $CD_c(\tilde{A})$ of the samples of class $c$, where $c \in C$, belonging to the fuzzy set $\tilde{A}$ is defined by:

$$CD_c(\tilde{A}) = \frac{\sum_{x \in X_c} \mu_{\tilde{A}}(x)}{\sum_{x \in X} \mu_{\tilde{A}}(x)} \tag{1}$$

where $X_c$ denotes the samples of class $c$, $c \in C$, $\mu_{\tilde{A}}$ denotes the membership function of the fuzzy set $\tilde{A}$, $\mu_{\tilde{A}}(x)$ denotes the membership grade of $x$ belonging to the fuzzy set $\tilde{A}$, and $\mu_{\tilde{A}}(x) \in [0, 1]$. The fuzzy entropy $FE_c(\tilde{A})$ [9] of the samples of class $c$, where $c \in C$, belonging to the fuzzy set $\tilde{A}$ is defined as follows:

$$FE_c(\tilde{A}) = -CD_c(\tilde{A}) \log_2 CD_c(\tilde{A}) \tag{2}$$

The fuzzy entropy $FE(\tilde{A})$ [9] of a fuzzy set $\tilde{A}$ is defined by:

$$FE(\tilde{A}) = \sum_{c \in C} FE_c(\tilde{A}) \tag{3}$$
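As an illustration of Eqs. (1) to (3), the following minimal Python sketch computes the class degrees and the fuzzy entropy of one fuzzy set from a list of membership grades and class labels. The function names and the toy data are assumptions made for this example, and the convention that a zero class degree contributes zero entropy is also an assumption.

```python
import math

def class_degrees(memberships, labels):
    """Class degree CD_c of every class for one fuzzy set (Eq. 1)."""
    total = sum(memberships)
    sums = {}
    for mu, c in zip(memberships, labels):
        sums[c] = sums.get(c, 0.0) + mu
    if total == 0:
        # Assumption: an empty fuzzy set gives class degree 0 for every class.
        return {c: 0.0 for c in sums}
    return {c: s / total for c, s in sums.items()}

def fuzzy_entropy(memberships, labels):
    """FE = sum over classes of -CD_c * log2(CD_c) (Eqs. 2 and 3)."""
    degrees = class_degrees(memberships, labels)
    # Assumption: terms with CD_c = 0 are skipped (0 * log2(0) taken as 0).
    return sum(-cd * math.log2(cd) for cd in degrees.values() if cd > 0)

# Toy data (hypothetical): four samples, two classes.
mu = [1.0, 0.5, 0.5, 0.0]
y = ["a", "a", "b", "b"]
cd = class_degrees(mu, y)   # class "a": 1.5 / 2.0, class "b": 0.5 / 2.0
fe = fuzzy_entropy(mu, y)
```

In this toy case the fuzzy set is dominated by class "a" (class degree 0.75), so its fuzzy entropy is below the maximum of 1 bit that a perfectly mixed two-class set would have.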

2.2. Information Gain Measure (IG)

The Information Gain Measure [10, 11] is a quantitative measure of an attribute. Entropy measures the impurity of a collection, whereas information gain measures the purity of a collection with respect to the class attribute. The amount by which the entropy of X decreases once Y is known reflects the additional information about X provided by Y, and is called the information gain.
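For crisp (non-fuzzy) data, the decrease in entropy described above is commonly written as IG(X; Y) = H(X) − H(X | Y) with Shannon entropy. The sketch below assumes that standard form, which the paper does not spell out; the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(labels, attribute):
    """Entropy of the labels minus their entropy conditioned on the attribute."""
    n = len(labels)
    groups = {}
    for x, a in zip(labels, attribute):
        groups.setdefault(a, []).append(x)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

classes = ["a", "a", "b", "b"]
informative = information_gain(classes, [0, 0, 1, 1])    # attribute separates the classes
uninformative = information_gain(classes, [0, 1, 0, 1])  # attribute says nothing about the class
```

An attribute that splits the samples into pure groups recovers the full entropy of the class variable, while one whose groups are as mixed as the whole collection has zero gain.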

2.3. Fuzzy Absolute Information Measure (FAIM)

The FAIM [12] can be used to measure the degree of similarity between two fuzzy sets A and B, and so it is useful for further development of the theory of similarity measures. Suppose X is a discrete universe of discourse and A, B ∈ ξ(X), where ξ(X) is the set of all fuzzy subsets of X. Then the absolute difference value, denoted by H(A, B), is called the fuzzy absolute information measure of B to A, and is defined as

$$H(A, B) = H(A \cap B) = H(A) - H(A/B) \tag{4}$$

It is also called the fuzzy absolute mutual information of fuzzy set A to fuzzy set B. Here H(A) and H(B) denote the fuzzy entropy measures of the attribute sets, and H(A/B) is the information gain of the attribute set. H(A, B) shows the degree of influence of the fuzzy set A on the fuzzy set B in fuzzy information processing. It is a generalized absolute fuzzy entropy measure, and is called the fuzzy absolute information measure (FAIM).

2.4. Fuzzy Entropy Based Feature Selection

In [9], Shie and Chen presented a feature subset selection method based on a fuzzy entropy measure for handling classification problems. It considers boundary samples while selecting features, and the fuzzy entropy measures the impurity of a feature. According to [9], the features selected by this method give higher average classification accuracy rates than those selected by the MIFS method [13], the FQI method [14], Dong-and-Kothari's method [15] and the OFFSS method [16].


However, this method is efficient only for smaller datasets; feature selection on large datasets takes more time. It also does not measure the purity of datasets.

2.5. Discretization Process

In data mining, discretization is known to be one of the most important data preprocessing tasks. Most existing machine learning algorithms are capable of extracting knowledge from databases that store discrete attributes. If the attributes are continuous, the algorithms can be integrated with discretization algorithms that transform them into discrete features. Discretization methods reduce the number of values of a given continuous attribute by dividing the range of the attribute into intervals. Discretization makes learning more accurate and faster [17, 18, 19, 20].

A normal discretization process consists of the following four steps:

• Sort all the continuous values of the feature to be discretized.

• Choose a cut point to split the continuous values into intervals.

• Split or merge the intervals of continuous values.

• Choose the stopping criteria of the discretization process [20].
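The four steps above can be sketched with a simple equal-frequency discretizer. This is a stand-in technique chosen only to make the steps concrete; it is not the clustering-based discretization used later in this paper, and the function names are hypothetical.

```python
from bisect import bisect_right

def equal_frequency_cut_points(values, k):
    """Return at most k - 1 cut points that split the values into k intervals."""
    ordered = sorted(values)                          # step 1: sort the values
    n = len(ordered)
    cuts = []
    for i in range(1, k):                             # step 4: stop after k - 1 cuts
        idx = i * n // k
        cut = (ordered[idx - 1] + ordered[idx]) / 2   # step 2: choose a cut point
        if not cuts or cut > cuts[-1]:                # step 3: merge intervals that would be empty
            cuts.append(cut)
    return cuts

def discretize(values, cuts):
    """Map each continuous value to the index of the interval it falls in."""
    return [bisect_right(cuts, v) for v in values]

values = [1, 9, 3, 7, 5, 11]
cuts = equal_frequency_cut_points(values, 3)   # [4.0, 8.0]
codes = discretize(values, cuts)               # [0, 2, 0, 1, 1, 2]
```

Here the stopping criterion is simply a fixed number of intervals; other discretizers differ mainly in how the cut points and the stopping rule are chosen.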

3. Proposed Work

3.1 Methodology

The feature subset selection problem can be regarded as a dimension reduction problem. The proposed method uses “boundary samples” instead of a full set of samples to select the feature subset. However, using the boundary samples to calculate the EFAIM of a feature subset directly is not possible. So, an indirect method is used to simplify the feature subset selection process described as follows.

FAIM with Shannon's entropy can assign the same entropy to two intervals I1 and I2, so it cannot distinguish between them. EFAIM with Lee's fuzzy entropy measure, in contrast, can indicate that the interval I2 is more ambiguous than the interval I1.

The proposed EFAIM is therefore set up by using Lee's fuzzy entropy measure and the information gain measure in FAIM, and it is used as a measure of proximity between two fuzzy sets. In this process the Extension Matrix (EM), the Combined Extension Matrix (CEM) and the boundary samples are generated as in [9]. Applying the proposed EFAIM, an extension matrix EMf is generated for each feature f. Then, based on the class degree, the Fuzzy Feature Absolute Information Measure FFAIM(f) is calculated for each feature f. Next, the Combined-Extension-Matrix (CEM) function is used to construct the extension matrix of the membership grades of the values of a feature subset. Finally, the EFAIM of each feature is calculated based on the Boundary Sample Fuzzy Feature Absolute Information Measure BSFFAIM(f1, f2) of a feature subset {f1, f2}.

EFAIM(A, B) is called the Extended Fuzzy Absolute Information Measure of B to A. It is defined as:

$$H(A, B) = H(A \cap B) = H(A) - H(A/B) = FE(A) - FE(A/B) \tag{5}$$

The EFAIM of a feature increases as the number of clusters increases. However, too many clusters can cause overfitting and reduce the classification accuracy rates on new instances. This paper uses a threshold value Tc, where Tc ∈ [0, 1], to avoid the overfitting problem. In addition, with Lee's fuzzy entropy method, fuzzy sets having lower fuzzy entropies can be omitted. A threshold value Tr, where Tr ∈ [0, 1], is given by the user for feature subset selection: the fuzzy sets of a feature whose maximum class degree is larger than or equal to Tr are omitted.
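The role of the threshold Tr can be sketched as a simple filter over the fuzzy sets of one feature. The function name and the input layout (one dictionary of class degrees per fuzzy set) are assumptions made for this example.

```python
def keep_ambiguous_fuzzy_sets(class_degrees_per_set, t_r):
    """Indices of the fuzzy sets whose maximum class degree is below t_r.

    Sets whose maximum class degree is >= t_r are dominated by a single
    class and are omitted, as described in the text.
    """
    return [
        i for i, degrees in enumerate(class_degrees_per_set)
        if max(degrees.values()) < t_r
    ]

# Hypothetical class degrees of three fuzzy sets of one feature.
fuzzy_sets = [
    {"a": 0.95, "b": 0.05},
    {"a": 0.60, "b": 0.40},
    {"a": 0.10, "b": 0.90},
]
kept = keep_ambiguous_fuzzy_sets(fuzzy_sets, t_r=0.9)   # [1]
```

Only the middle fuzzy set survives, because the other two have a maximum class degree of at least 0.9 and therefore contain few boundary samples.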

3.2 Proposed Algorithm

The proposed work involves two phases, and its overall structure is given in the following algorithm.

Phase – I

The first phase uses the K-Means, Fuzzy C Means and Median as initial centroid of K-Means clustering algorithms to discretize numerical attributes, and constructs triangular membership functions to fuzzify all numeric features. The steps are given below.


Step1: Initially, set the number K of clusters to 2.

Step2: Use the (K-Means or Fuzzy C Means or Median as initial centroid of K-Means) clustering algorithm to generate K cluster centers based on the values of a feature, where K ≥ 2.

Step3: Construct the membership functions of the fuzzy sets using triangular membership functions based on these K cluster centers, to create the class degrees.

Step4: Calculate the fuzzy entropy of feature f using the class degree.

Step5: Calculate the information gain of feature f using the class degree.

Step6: Calculate EFAIM for feature f using the fuzzy entropy measure and the information gain measure.

Step7: If the increasing rate of EFAIM of feature f is larger than the threshold value Tc given by the user, where Tc ∈ [0, 1], then let K = K+1 and go to Step 2. Otherwise, let K = K−1 and stop.
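The loop of Phase I can be sketched as follows, using plain K-Means as the clustering step. Several points are assumptions made for this sketch: the triangular membership functions peak at each cluster center and fall to zero at the neighbouring centers, with flat shoulders outside the outermost centers; the "increasing rate" is taken to be the relative increase over the previous K; and the EFAIM of a feature is supplied by the caller as a hypothetical `score` function, since its exact per-feature formula is not reproduced here.

```python
import numpy as np

def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's K-Means on one numeric feature; returns sorted centers."""
    x = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centers over the quantiles.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        nearest = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        updated = np.array([
            x[nearest == j].mean() if np.any(nearest == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return np.sort(centers)

def triangular_memberships(values, centers):
    """Membership matrix (samples x fuzzy sets) from triangular functions."""
    centers = np.unique(centers)          # sorted, duplicates removed
    x = np.asarray(values, dtype=float)
    peaks = np.eye(len(centers))
    # Linear interpolation between one-hot peaks gives triangles with
    # shoulders of height 1 outside the first and last centers.
    return np.column_stack(
        [np.interp(x, centers, peaks[j]) for j in range(len(centers))]
    )

def choose_k(values, labels, score, t_c, k_max=10):
    """Grow K while the relative increase of score(memberships, labels) exceeds t_c."""
    def evaluate(k):
        return score(triangular_memberships(values, kmeans_1d(values, k)), labels)

    k, previous = 2, evaluate(2)
    while k < k_max:
        current = evaluate(k + 1)
        gain = current - previous
        rate = gain / abs(previous) if previous != 0 else gain
        if rate <= t_c:
            break                          # keep the last accepted K
        k, previous = k + 1, current
    return k

values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
centers = kmeans_1d(values, 2)
mu = triangular_memberships(values, centers)   # each row sums to 1
```

Because neighbouring triangles overlap only between adjacent centers, every sample has non-zero membership in at most two fuzzy sets, which is what makes samples near the middle of two centers act as boundary samples.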

Phase – II

Assume that a set R of samples is divided into a set C of classes, where R = {r1, r2, . . . , rc}, F denotes the set of candidate features and FS denotes the selected feature subset. The algorithm for feature subset selection is as follows:

Step1: Construct the extension matrix EMf for each feature f. The membership grades of the values of f belonging to its fuzzy sets are used to calculate the Extended Fuzzy Absolute Information Measure EFAIM(f).

Step2: Place the feature with the maximum EFAIM into the selected feature subset FS and remove it from the set F of candidate features.

Step3: Repeatedly place into FS the feature that improves the EFAIM of the feature subset, until no such feature exists. Here the Boundary Samples based Fuzzy Absolute Information Measure BSFFAIM(FS, f) of the feature subset FS ∪ {f} is calculated focusing on boundary samples. Finally, FS is the selected feature subset.
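Phase II has the shape of a greedy forward selection, which the sketch below illustrates. The subset measure is passed in as a hypothetical `subset_score` callable standing in for EFAIM and BSFFAIM, and choosing the best-improving candidate in each round is an assumption of this sketch; the toy scoring function exists only to make the example runnable.

```python
def greedy_forward_selection(features, subset_score):
    """Start from the best single feature, then add features while the score improves."""
    candidates = list(features)
    first = max(candidates, key=lambda f: subset_score((f,)))   # Step 2
    selected = [first]
    candidates.remove(first)
    current = subset_score(tuple(selected))
    while candidates:                                            # Step 3
        best = max(candidates, key=lambda f: subset_score(tuple(selected) + (f,)))
        improved = subset_score(tuple(selected) + (best,))
        if improved <= current:
            break                                                # no feature improves the score
        selected.append(best)
        candidates.remove(best)
        current = improved
    return selected

# Toy score (hypothetical): per-feature gains with a small penalty per extra feature.
gains = {"f1": 0.5, "f2": 0.3, "f3": 0.0}

def toy_score(subset):
    return sum(gains[f] for f in subset) - 0.1 * (len(subset) - 1)

chosen = greedy_forward_selection(["f1", "f2", "f3"], toy_score)   # ["f1", "f2"]
```

The uninformative feature f3 is never added, because including it would lower the subset score, which mirrors how the algorithm stops once no remaining feature improves the measure.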

4. Experimental Analysis And Discussion

The proposed method is compared with the raw data and with the fuzzy entropy based feature selection method in terms of the number of selected features. It is observed that the proposed algorithm improves the results on the UCI datasets, which are described in Table 1.

Table 1: Data Sets

Data set                                      Abbreviation   Samples   Features   Classes
Wisconsin prognostic Breast Cancer            WPBC           198       33         2
Iris                                          Iris           150       4          3
Optical Recognition of Handwritten digits     optdigits      5620      64         10
Ecoli                                         Ecoli          336       7          7
Pen based recognition of Handwritten digits   PENDIGITS      -         16         -

The datasets used here have numerical attributes, so they are discretized to transform the numerical data into categorical data. The K-Means, FCM and Median as initial centroid of K-Means clustering algorithms are used to discretize the data. The proposed algorithm is then applied to the discretized dataset to find the subset of features.

Similarly, the fuzzy entropy based feature selection algorithm is used to find a feature subset, with the same discretization algorithms (K-Means, FCM and Median as initial centroid of K-Means) used to discretize the datasets. The proposed algorithm is compared with the fuzzy entropy based algorithm in terms of the number of selected features, and the results are presented in Table 2.

From Table 2, one can see that the number of features selected by the proposed method is never larger than the number selected by the fuzzy entropy based method, and in most cases it is smaller. This holds for all datasets and for all three discretization methods.

Table 2: A comparative analysis of fuzzy entropy measure with proposed EFAIM method for feature selection (number of selected features; Median = Median as initial centroid of K-Means)

            Raw     Fuzzy Entropy Based Reduction    Proposed EFAIM Based Reduction
Data set    Data    K-Means   FCM   Median           K-Means   FCM   Median
WPBC        33      3         3     33               2         2     2
IRIS        4       2         2     4                2         2     2
OPTDIGITS   64      39        36    58               22        19    29
ECOLI       7       3         3     7                3         2     3
PENDIGITS   16      7         7     12               5         4     8

The analysis indicates that the proposed algorithm is efficient for all of the specified datasets: it gives the minimum number of selected features. In this algorithm, the threshold value Tc is used to construct the membership functions of the fuzzy sets of a numeric feature, and the threshold value Tr is used for feature subset selection. The threshold values Tc and Tr used for the different datasets are shown in Table 3.

Table 3: The threshold values Tc and Tr for the datasets

Data set    Threshold value (Tc)   Threshold value (Tr)
WPBC        0.5                    0.8
IRIS        0.2                    0.9
OPTDIGITS   0.2                    0.5
ECOLI       0.2                    0.3

The numbers of features selected by the proposed method and by the fuzzy entropy method are compared in Figures 1, 2, 3, 4 and 5, one figure for each dataset. In all the figures, the fuzzy absolute information measure compares favourably with the fuzzy entropy method.

Figure 1. Features selected for WPBC Dataset (bar chart "Wisconsin Prognostic Breast Cancer Datasets"; vertical axis: number of selected features; bars: Raw Data, Fuzzy Entropy Based Reduction and Proposed EFAIM Based Reduction, each with K-Means, FCM and Median as initial centroid of K-Means)

Figure 2. Features selected for IRIS Dataset (bar chart "IRIS"; vertical axis: number of selected features; bars: Raw Data, Fuzzy Entropy Based Reduction and Proposed EFAIM Based Reduction, each with K-Means, FCM and Median as initial centroid of K-Means)

Figure 3. Features selected for OPTDIGITS Dataset (bar chart "Optical Recognition of Handwritten Digits"; vertical axis: number of selected features; bars: Raw Data, Fuzzy Entropy Based Reduction and Proposed EFAIM Based Reduction, each with K-Means, FCM and Median as initial centroid of K-Means)

Figure 4. Features selected for ECOLI Dataset (bar chart "ECOLI"; vertical axis: number of selected features; bars: Raw Data, Fuzzy Entropy Based Reduction and Proposed EFAIM Based Reduction, each with K-Means, FCM and Median as initial centroid of K-Means)

Figure 5. Features selected for PENDIGITS Dataset

5. Conclusion

This paper presents a new method for feature subset selection based on the proposed EFAIM. The proposed method can deal with numeric features. First, the numerical attributes are discretized using benchmark clustering algorithms, namely K-Means, Fuzzy C Means and Median as initial centroid of K-Means. The EFAIM is computed through triangular membership functions, and the feature subset is then obtained by focusing on boundary samples. The experimental results show that, compared with fuzzy entropy based feature selection and the raw datasets, the proposed method selects feature subsets with the minimum number of features. The proposed method thus helps to identify and remove many of the irrelevant and redundant features in the datasets. The reduced subset of attributes obtained for the bio-metric datasets can be carried over to other applications, so feature subset selection is applicable in a wide range of domains such as biometrics, chemistry, text processing, pattern recognition, speech processing and vision.

References

[1] S.M. Chen, "A new approach to handling fuzzy decision making problems", IEEE Trans Syst Man Cybern 18(6):1012–1016, 1988.

[2] S.M. Chen, C.H. Kao, C.H. Yu, "Generating fuzzy rules from training data containing noise for handling classification problems", Cybern Syst 33(7):723–748, 2002.

[3] S.M. Chen, J.D. Shie, "A new method for feature subset selection for handling classification problems", In: Proceedings of IEEE international conference on fuzzy systems, Reno, NV, pp 183–188, 2005.

[4] E.C.C. Tsang, D.S. Yeung, X.Z. Wang, "OFFSS: optimal fuzzy valued feature subset selection", IEEE Trans Fuzzy Syst 11(2):202–213, 2003.

[5] Nicolaie Popescu-Bodorin, "Fast K-Means Image Quantization Algorithm and Its Application Iris Segmentation", Bulletin Scientific, 2007.

[6] S.M. Chen, C.H. Chang, "A new method to construct membership functions and generate weighted fuzzy rules from training instances", Cybern Syst 36(4):397–414, 2005.

[7] A. De Luca, S. Termini, "A definition of non-probabilistic entropy in the setting of fuzzy set theory", Inf Control 20(4):301–312, 1972.

[8] H.M. Lee, C.M. Chen, J.M. Chen, Y.L. Jou, "An efficient fuzzy classifier with feature selection based on fuzzy entropy", IEEE Trans Syst Man Cybern Part B Cybern 31(3):426–432, 2001.

[9] Jen-Da Shie, Shyi-Ming Chen, "Feature subset selection based on fuzzy entropy measures for handling classification problems", Appl Intell 28:69–82, 2008.

[10] D. Shie, S.M. Chen, "A new approach for handling classification problems based on fuzzy information gain measures", In: Proceedings of the 2006 IEEE international conference on fuzzy systems, Vancouver, BC, Canada, pp 5427–5434, 2006.

[11] Lei Yu, Huan Liu, "Redundancy Based Feature Selection for Microarray Data", Proceedings of the 2004 ACM SIGKDD, pp 737–742, 2004.

[12] Shi-Fei Ding, Shi-Xiong Xia, Feng-Xiang Jin, Zhong-Zhi Shi, "Novel fuzzy information proximity measures", Journal of Information Science 33(6):678–685, 2007. ISSN: 0165-5515.

[13] R. Battiti, "Using Mutual Information for selecting features in supervised neural net learning", IEEE Trans Neural Netw 5(4):537–550, 1994.

[14] R.K. De, J. Basak, S.K. Pal, "Neuro-fuzzy feature evaluation with theoretical analysis", Neural Netw 12(10):1429–1455, 1999.


[15] M. Dong, R. Kothari, "Feature subset selection using a new definition of classifiability", Pattern Recognit Lett 24(9):1215–1225, 2003.

[16] E.C.C. Tsang, D.S. Yeung, X.Z. Wang, "OFFSS: optimal fuzzy-valued feature subset selection", IEEE Trans Fuzzy Syst 11(2):202–213, 2003.

[17] Marzuki, F. Ahmad, "Data mining discretization methods and Performance", Proceedings of International conference on Electrical engineering and Informatics, Institute Technology Bandung, Indonesia, June 2007.

[18] H. Liu et al., "Discretization: An Enabling Technique", Data Mining and Knowledge Discovery 6:393–423, 2002.

[19] J.A. Hartigan, M.A. Wong, "A k-means clustering algorithm", J Roy Stat Soc Ser C 28(1):100–108, 1979.

[20] S.M. Chen, Y.C. Chen, "Automatically constructing membership functions and generating fuzzy rules using genetic algorithms", Cybern Syst 33(8):841–862, 2002.