3.3 Granulation using FCM
3.3.1 The proposed FCM clustering algorithm
To use the FCM algorithm for generating data granules for binary data classification, it is desirable to differentiate data points belonging to different classes. In general, there are two approaches to achieve this differentiation: clustering the data belonging to each class separately, or including the class label as an additional input, which increases the distance between any two data points belonging to different classes.
While the second approach increases the dimensionality of the input data (and hence the computational time of the clustering process), it has the advantage of reducing the effect of outlier points on the size of the clusters, especially in regions where classes overlap in the input space of high-dimensional data at lower granularity levels. This is illustrated in Figure 3.2, where two-dimensional two-class data (Figure 3.2-a) are clustered by the two approaches. Figure 3.2-c shows the clusters resulting from clustering the data points of each class separately, while Figure 3.2-d shows the clusters resulting from clustering the data with class labels included as an additional input, as shown in Figure 3.2-b. The outlier point at (5,2) (blue circle) has a greater influence on clusters of the same class in the first approach than in the second. This influence is manifested by the wider clusters in the first approach (Figure 3.2-c), which produce more overlap between clusters of different classes and therefore suggest less efficient classification. FCMGr uses the second approach. The data points and cluster centres combined with their corresponding class labels are denoted by $\hat{X}$ and $\hat{V}$ respectively, while the data points and cluster centres without class labels are denoted by $X$ and $V$ respectively.
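The effect of including the class label as an additional input can be illustrated with a minimal sketch. The 0/1 label encoding, the `weight` parameter and the function names are assumptions made for this illustration and are not taken from the text.

```python
import math

def augment_with_labels(X, labels, weight=1.0):
    """Append each point's class label (scaled by a hypothetical `weight`)
    as an extra coordinate, giving the labelled data X_hat."""
    return [list(x) + [weight * c] for x, c in zip(X, labels)]

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Toy two-class data (assumed 0/1 label encoding).
X = [[1.0, 1.0], [1.0, 2.0], [1.5, 1.0]]
labels = [0, 0, 1]
X_hat = augment_with_labels(X, labels)

# Same-class distance is unchanged by the extra coordinate.
d_same = euclidean(X_hat[0], X_hat[1])
# Different-class distance grows because the label coordinates differ.
d_diff_before = euclidean(X[0], X[2])
d_diff_after = euclidean(X_hat[0], X_hat[2])
```

In this sketch, points of the same class keep their original distance, while points of different classes are pushed apart along the label axis.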
To use FCM in the proposed granulation method, some modifications to the algorithm are made. Firstly, the order of computing $\hat{U}$ and $\hat{V}$ is reversed. That is, the modified FCM starts by choosing initial cluster centres $\hat{V}_0$, then computes the new fuzzy partition matrix $\hat{U}_k$ using equation 3.10 and the updated cluster centres $\hat{V}_{k+1}$ using equation 3.9. For the first granulation level (the level with the highest granularity), the initial cluster centres $\hat{V}_0$ are chosen to be the input data points with some small random noise ($\delta$) added, to avoid the singularity that arises when $d(\hat{x}_i, \hat{v}_k) = 0$ in equation 3.10.
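The reversed update order can be sketched as follows. Equations 3.9 and 3.10 are not reproduced in this section, so the sketch assumes they take the standard FCM form for the centre and membership updates. The uniform noise model, the value of `delta` and all function names are likewise assumptions.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def init_centres(X_hat, delta=1e-3, seed=0):
    """Initial centres: the data points plus small random noise, so that no
    point-to-centre distance is exactly zero (assumed uniform noise)."""
    rng = random.Random(seed)
    return [[x + rng.uniform(-delta, delta) for x in point] for point in X_hat]

def update_memberships(X_hat, V_hat, m):
    """Assumed standard FCM membership update; U[j][i] is the membership
    of point i in cluster j."""
    power = 2.0 / (m - 1.0)
    c, n = len(V_hat), len(X_hat)
    U = [[0.0] * n for _ in range(c)]
    for i, x in enumerate(X_hat):
        d = [dist(x, v) for v in V_hat]
        for j in range(c):
            U[j][i] = 1.0 / sum((d[j] / d[k]) ** power for k in range(c))
    return U

def update_centres(X_hat, U, m):
    """Assumed standard FCM centre update: membership-weighted means."""
    dim = len(X_hat[0])
    V = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        V.append([sum(wi * x[t] for wi, x in zip(w, X_hat)) / total
                  for t in range(dim)])
    return V

# Labelled toy data; the last coordinate is the class label.
X_hat = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 1.0]]
V0 = init_centres(X_hat)                    # centres chosen first
U0 = update_memberships(X_hat, V0, m=2.0)   # then the partition matrix
V1 = update_centres(X_hat, U0, m=2.0)       # then the updated centres
```

With one centre per data point, each point initially belongs almost entirely to its own cluster, which corresponds to the highest granularity level described above.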
CHAPTER3. CLUSTERING-BASEDGRANULATION 3.3. GRANULATION USINGFCM
FIGURE 3.2. Illustrative example of labelled data fuzzy clustering. (a) Labelled data. (b) Labelled data with label as input. (c) Clustering each class separately. (d) Clustering labelled data with label as input.
The second modification is that the fuzziness factor $m$ is chosen dynamically for each granularity level rather than being fixed before the algorithm starts. To determine $m$, an objective function is proposed. For each granularity level, $m$ is varied and the value that minimises the objective function is used. The proposed objective function is given by:
$$F_g(m) = A \sum_{j=1}^{c} \sum_{\substack{i=1 \\ C_i = L_j}}^{n} d(x_i, v_j)\, u_{ji}^{m} \;-\; B \sum_{j=1}^{c} \sum_{\substack{i=1 \\ C_i \neq L_j}}^{n} d(x_i, v_j)\, u_{ji}^{m} \tag{3.11}$$
where $c$ is the number of clusters, $C_i$ and $L_j$ are the class labels of $x_i$ and $v_j$ respectively, and $A$ and $B$ are scaling factors. Initially, the number of clusters $c$ is the number of training samples. Then, for each granularity level, the merging process determines $c$.
The first term of the objective function $F_g$ is the sum of weighted distances of each data point to clusters of the same class label ($D_s$), while the second term is the sum of weighted distances of each data point to clusters of the opposite class label ($D_o$). Minimising the objective function $F_g$ with respect to $m$ results in the best trade-off between minimising $D_s$ and maximising $D_o$, i.e. between maximising the inclusion of data points in clusters of the same class label and maximising their separation from clusters of the opposite class label.
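A minimal sketch of evaluating $F_g(m)$ and selecting $m$ is given below. As a simplification, it assumes fixed cluster centres, memberships obtained from the standard FCM membership formula, and a finite list of candidate values for $m$; the text does not specify how $m$ is varied. Distances and memberships use the data without class labels, and all function names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def memberships(X, V, m):
    """Assumed standard FCM memberships; U[j][i] for point i, cluster j.
    Requires that no point coincides exactly with a centre."""
    power = 2.0 / (m - 1.0)
    U = [[0.0] * len(X) for _ in V]
    for i, x in enumerate(X):
        d = [dist(x, v) for v in V]
        for j in range(len(V)):
            U[j][i] = 1.0 / sum((d[j] / dk) ** power for dk in d)
    return U

def objective(X, V, point_labels, centre_labels, m, A=1.0, B=1.0):
    """F_g(m) = A * D_s - B * D_o, following equation 3.11."""
    U = memberships(X, V, m)
    D_s = D_o = 0.0
    for j, v in enumerate(V):
        for i, x in enumerate(X):
            term = dist(x, v) * U[j][i] ** m
            if point_labels[i] == centre_labels[j]:
                D_s += term   # same class label: inclusion term
            else:
                D_o += term   # opposite class label: separation term
    return A * D_s - B * D_o

def select_m(X, V, point_labels, centre_labels, candidates, A=1.0, B=1.0):
    """Return the candidate m with the smallest objective value."""
    return min(candidates,
               key=lambda m: objective(X, V, point_labels, centre_labels, m, A, B))

# Toy one-dimensional layout embedded in 2-D, one cluster per class.
X = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]]
point_labels = [0, 0, 1, 1]
V = [[0.5, 0.0], [4.5, 0.0]]
centre_labels = [0, 1]
candidates = [1.5, 2.0, 2.5, 3.0]
best_m = select_m(X, V, point_labels, centre_labels, candidates)
```

In the full method the partition matrix would presumably come from running the modified FCM for each candidate $m$; the single membership evaluation here only keeps the example short.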
For imbalanced data (where one class has significantly more data samples than the other), the minimisation of $F_g$ is usually dominated by minimising the first term at the expense of maximising the second term. This results in wide clusters with high inclusion values (high values of $u_{ji}$ for $x_i$ and $v_j$ of the same class label) but low separation (high values of $u_{ji}$ for $x_i$ and $v_j$ of opposite class labels). To overcome this issue, the scaling factors $A$ and $B$ are introduced. Let $P_C$ be the ratio of the number of data points in the class with fewer samples to the number of data points in the class with more samples; then $A$ is chosen to equal $1 - P_C$ and $B$ to equal $P_C$.
Note that both $d(x_i, v_j)$ and $u_{ji}^{m}$ are computed for the input data and cluster centres without class labels ($X$ and $V$ respectively).