
3.3 Third Data Set Description

3.3.4 Clustering Algorithms

We have created a data set of 300 spectra × 10 PCs for experiments with clustering algorithms. As we have a different number of cases for each grade, we have used the following approach to create a balanced data set.

• G-I: 100 spectra × 10 PCs (2 cases, 50 spectra from each case)

• G-II: 104 spectra × 10 PCs (26 cases, 4 spectra from each case)

• G-III: 96 spectra × 10 PCs (6 cases, 16 spectra from each case)
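The balancing scheme in the list above can be sketched as follows. This is only an illustration under stated assumptions: the function name `balanced_sample`, the label arrays, and the random choice of spectra within a case are hypothetical, since the text does not say how spectra were picked within each case. The demonstration pool is synthetic.

```python
import numpy as np

# Spectra taken per case for each grade, as listed above.
PER_CASE = {"G-I": 50, "G-II": 4, "G-III": 16}

def balanced_sample(grades, cases, per_case=PER_CASE, seed=0):
    """Return row indices holding a fixed number of spectra from every case.

    grades : (n_spectra,) array of grade labels
    cases  : (n_spectra,) array of case identifiers
    Random selection within a case is an assumption, not the thesis's method.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for grade, quota in per_case.items():
        for case in np.unique(cases[grades == grade]):
            idx = np.flatnonzero((grades == grade) & (cases == case))
            if idx.size < quota:
                raise ValueError(f"case {case!r} has fewer than {quota} spectra")
            chosen.append(rng.choice(idx, size=quota, replace=False))
    return np.sort(np.concatenate(chosen))

def make_demo_pool(n_per_case=60, n_pcs=10, seed=1):
    """Synthetic pool with the case counts given above (hypothetical data)."""
    n_cases = {"G-I": 2, "G-II": 26, "G-III": 6}
    grades, cases = [], []
    for grade, n in n_cases.items():
        for c in range(n):
            grades += [grade] * n_per_case
            cases += [f"{grade}-case{c}"] * n_per_case
    scores = np.random.default_rng(seed).normal(size=(len(grades), n_pcs))
    return scores, np.array(grades), np.array(cases)

scores, grades, cases = make_demo_pool()
idx = balanced_sample(grades, cases)
balanced = scores[idx]
counts = {g: int(np.sum(grades[idx] == g)) for g in PER_CASE}
print(balanced.shape, counts)
```

With 2, 26 and 6 cases the quotas give 100, 104 and 96 spectra, which is how the three grades end up roughly equal in size despite very different case counts.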

For this complex data set, we have again used the two clustering algorithms, one a hard clustering algorithm (k-means) and the other a fuzzy clustering algorithm (FCM). Both algorithms were used with 3 clusters as an input in order to classify the three grades. The parameters of the algorithms were the same as described in subsection 3.2.6. The aim of the research is to create a decision support system with the help of advanced fuzzy logic for breast cancer grading. Therefore, we have used both k-means and FCM clustering algorithms only to explore the complexity of the data set.
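The difference between the hard and the fuzzy algorithm can be made concrete with a minimal NumPy sketch of both, run with 3 clusters. The parameters of subsection 3.2.6 are not reproduced here, so the fuzzifier `m = 2`, the random initialisation, the iteration limits and the synthetic input are all assumptions made for illustration.

```python
import numpy as np

def pairwise_dist(X, centres):
    """Euclidean distance from every row of X to every centre, shape (n, k)."""
    return np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)

def kmeans(X, k=3, n_iter=100, seed=0):
    """Hard clustering: every spectrum belongs to exactly one cluster."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = pairwise_dist(X, centres).argmin(axis=1)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return pairwise_dist(X, centres).argmin(axis=1), centres

def fcm(X, c=3, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy c-means: every spectrum has a membership degree in each cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)        # memberships of a point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.fmax(pairwise_dist(X, centres), 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centres

# Synthetic stand-in for the 300 x 10 matrix of PC scores (hypothetical data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=mu, size=(100, 10)) for mu in (0.0, 5.0, 10.0)])

labels_km, _ = kmeans(X, k=3)
U, _ = fcm(X, c=3)
labels_fcm = U.argmax(axis=1)   # harden memberships to compare with k-means
print(np.bincount(labels_km, minlength=3), np.bincount(labels_fcm, minlength=3))
```

Hardening the FCM memberships with `argmax` is what allows a cluster-by-grade table of the same form to be produced for both algorithms.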

3.3.5 Results

Table 3.5 shows the results of the k-means clustering algorithm with 3 clusters. It can be seen that cluster 1 mainly consists of members of G-II and G-III, with no clear distinction between them. Cluster 2 mainly has G-I and G-III members, and cluster 3 mainly consists of members of G-I and G-II. The results indicate that, because of the complexity of the data, no cluster was able to differentiate between the three grades.

Table 3.5: Results with k-means clustering algorithm with data set 3

Cluster (members)   G-I   G-II   G-III
1 (112)              13     51      48
2 (91)               40     17      34
3 (97)               47     36      14

Table 3.6 shows the results of the FCM clustering algorithm with three clusters. It can be seen that cluster 1 includes a small number of members from G-I, with the majority coming from G-II and G-III. Cluster 2 includes members from all grades. Cluster 3 captured 50 of the 100 G-I members, but also contains 26 G-II and 6 G-III members.

Table 3.6: Results with FCM clustering algorithm with data set 3

Cluster (members)   G-I   G-II   G-III
1 (83)                5     41      37
2 (135)              45     37      53
3 (82)               50     26       6
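One way to put a single number on Tables 3.5 and 3.6 is cluster purity, the fraction of spectra that belong to the majority grade of their cluster. Purity is not used in the text; it is added here only as an illustrative summary of the counts in the two tables.

```python
import numpy as np

# Rows are clusters 1-3, columns are G-I, G-II, G-III (Tables 3.5 and 3.6).
kmeans_counts = np.array([[13, 51, 48], [40, 17, 34], [47, 36, 14]])
fcm_counts = np.array([[5, 41, 37], [45, 37, 53], [50, 26, 6]])

def purity(counts):
    """Fraction of spectra that fall in the majority grade of their cluster."""
    return counts.max(axis=1).sum() / counts.sum()

def majority_share(counts):
    """Share of each cluster taken up by its largest grade."""
    return counts.max(axis=1) / counts.sum(axis=1)

print(f"k-means purity: {purity(kmeans_counts):.2f}")  # 138 / 300 = 0.46
print(f"FCM purity:     {purity(fcm_counts):.2f}")      # 144 / 300 = 0.48
print("k-means majority share per cluster:", majority_share(kmeans_counts).round(2))
print("FCM majority share per cluster:    ", majority_share(fcm_counts).round(2))
```

For comparison, placing all 300 spectra in a single cluster would already give a purity of 104/300, about 0.35, so both algorithms are only modestly better than no clustering at all.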

The results with clustering algorithms indicate that neither of the two algorithms is able to differentiate between the three grades. Because of the high level of variability in the data set, unsupervised learning with standard clustering algorithms is not able to find a clear distinction between cancer grades. These uncertainties may exist between spectra of the same case of a grade (intra-case) as well as between multiple cases of the same grade (inter-case). The other type of uncertainty may exist between cases of the same grade (intra-grade) and between cases of different grades (inter-grade).
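The intra-case and inter-case distinction can be illustrated by splitting the variance of one grade into a within-case part and a between-case part. The sketch below is an assumption-laden illustration, not an analysis from the text: the function `case_variability` is hypothetical and the data are synthetic, shaped like G-III (6 cases, 16 spectra each, 10 PCs).

```python
import numpy as np

def case_variability(scores, cases):
    """Split the variance of one grade into intra-case and inter-case parts.

    scores : (n_spectra, n_pcs) PC scores of spectra from a single grade
    cases  : (n_spectra,) case identifier for every spectrum
    Returns (intra, inter): sums of squares divided by n_spectra.
    """
    grand_mean = scores.mean(axis=0)
    intra = inter = 0.0
    for case in np.unique(cases):
        block = scores[cases == case]
        case_mean = block.mean(axis=0)
        intra += ((block - case_mean) ** 2).sum()                    # spread inside a case
        inter += len(block) * ((case_mean - grand_mean) ** 2).sum()  # spread of case means
    n = len(scores)
    return intra / n, inter / n

# Synthetic data: each case gets its own offset plus spectrum-level noise.
rng = np.random.default_rng(0)
n_cases, per_case, n_pcs = 6, 16, 10
cases = np.repeat(np.arange(n_cases), per_case)
case_offsets = rng.normal(scale=2.0, size=(n_cases, n_pcs))
scores = case_offsets[cases] + rng.normal(scale=1.0, size=(n_cases * per_case, n_pcs))

intra, inter = case_variability(scores, cases)
total = ((scores - scores.mean(axis=0)) ** 2).sum() / len(scores)
print(f"intra-case: {intra:.2f}  inter-case: {inter:.2f}  total: {total:.2f}")
```

The two parts always add up to the total variance, so a large inter-case share would mean that cases of the same grade differ from each other more than spectra within a case do.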

The results indicate that both k-means and FCM clustering algorithms expose the complicated nature of the data, as both performed poorly on the data set. In the next stage of the research, we take this complicated data set and move towards a decision support system with fuzzy logic.

3.4 Summary

In this chapter, we have used three different data sets with PCA and the standard k-means and FCM clustering algorithms to differentiate between classes. Data set 1 was used to distinguish between tumour and stroma cells with PCA and FCM. The results indicated that the method was able to classify the two cell types well. Data set 2 was a real breast cancer spectral data set used to differentiate between three cancer grades with PCA and the k-means and FCM clustering algorithms. Results indicate that both methods are able to differentiate between the three grades. Data set 3 was a real, complex spectral data set involving a variety of cases from all grades. The same methods of PCA with k-means and FCM clustering were applied to differentiate between breast cancer grades. Results indicate that, because of variability between cases of the same grade and between grades, the clustering algorithms perform poorly and are not able to distinguish between grades, emphasising the high level of uncertainty in the data set. In the next chapter, we take this complex data set and move towards a decision support system with a supervised learning approach by developing a fuzzy inferencing system that can classify

Chapter 4 Experiments with Fuzzy Inferencing System

In this chapter, we have used data set 3 to develop a Mamdani type fuzzy inferencing system (FIS) for classification, using 300 spectra taken from different cases of each grade and three PCs. The system uses hill climbing (HC) and simulated annealing (SA) algorithms to train membership functions and rules. The developed system has also been tested with unseen data. The results are compared with the standard k-means clustering algorithm and the performance of the system is discussed.
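Reducing the spectra to three PCs can be sketched with an SVD-based PCA. The preprocessing used in the thesis is not stated in this section, so mean-centring without further scaling is an assumption, the function name `first_pcs` is hypothetical, and the input matrix is synthetic.

```python
import numpy as np

def first_pcs(spectra, n_components=3):
    """Project spectra onto their leading principal components.

    spectra : (n_spectra, n_wavenumbers) array
    Returns the PC scores and the variance ratio explained by each kept PC.
    """
    centred = spectra - spectra.mean(axis=0)           # assumption: centring only
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T             # (n_spectra, n_components)
    explained = (s ** 2) / (s ** 2).sum()
    return scores, explained[:n_components]

# Hypothetical stand-in for 300 spectra measured at 50 wavenumbers.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(300, 50))
scores, ratio = first_pcs(spectra)
print(scores.shape, ratio.round(3))
```

Each row of `scores` then provides the three numeric inputs that a classifier such as the FIS would receive for one spectrum.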

4.1 System Structure

Figure 4.1 shows a block diagram of the main structure of the FIS used. A data set is created either by selecting data from a single case per grade or from all cases of all grades. The created data set goes through PCA and the first three PCs are selected as inputs to the system. Each input has three membership functions. These membership functions are trained with three training methods, namely Hill Climbing with Membership Function Tuning (HCMT), Simulated Annealing with Membership Function Tuning (SAMT) and Simulated Annealing with Membership Function and Rule Tuning (SAMRT). The HC algorithm is selected because it has previously been used in a spectral problem to find the correct target spectral peak and performed well [95]. We have selected SA in order to avoid the limitations of HC, as in complex scenarios HC tends to get trapped in local minima. With SA, the chances of getting a better solution increase. In the literature, SA has been found to perform well with complex FTIR spectral data in order to find the optimal cut-off threshold for detecting the quality of glycerol monolaurate (GML) [19]. This shows that SA can be useful in problems where the complexity of the spectral data is high. The best trained FIS is found by comparing the results on training data. The selected FIS is then tested on unseen data. The next sections describe the processes involved
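The contrast between HC and SA described above can be sketched on a toy one-dimensional objective. This is not the HCMT, SAMT or SAMRT procedure: the objective function, step sizes, starting point and cooling schedule are all hypothetical and stand in for the FIS training error as a function of one membership function parameter.

```python
import math
import random

def objective(x):
    """Toy error surface with several local minima (hypothetical)."""
    return 0.1 * x * x + math.sin(3.0 * x)

def hill_climb(f, x0, step=0.1, n_iter=2000, seed=0):
    """Accept a random neighbour only when it lowers the objective."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
    return x, fx

def simulated_annealing(f, x0, step=0.5, t0=2.0, cooling=0.995,
                        n_iter=2000, seed=0):
    """Also accept worse neighbours, with a probability that shrinks as t cools."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    t = t0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)
        fc = f(cand)
        # Improvements are always taken; worse moves with probability exp(-delta / t).
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        t *= cooling
    return best_x, best_fx

x0 = 4.0
x_hc, f_hc = hill_climb(objective, x0)
x_sa, f_sa = simulated_annealing(objective, x0)
print(f"start: f({x0}) = {objective(x0):.3f}")
print(f"HC:    x = {x_hc:.3f}, f = {f_hc:.3f}")
print(f"SA:    x = {x_sa:.3f}, f = {f_sa:.3f}")
```

On this surface HC cannot leave the basin it starts in, because every route out crosses a region of higher error that is wider than its step. SA can accept uphill moves while the temperature is high, which gives it a chance of reaching the deeper basins closer to zero, although the outcome depends on the random seed and the schedule.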

In document Modelling FTIR spectral data with Type-I and Type-II fuzzy sets for breast cancer grading (Page 97-102)