3.3 Third Data Set Description
3.3.4 Clustering Algorithms
We have created a data set of 300 spectra × 10 PCs for experiments with clustering algorithms. As the number of available cases differs between grades, we have used the following approach to create an approximately balanced data set.
• G-I: 100 spectra × 10 PCs (2 cases, 50 spectra from each case)
• G-II: 104 spectra × 10 PCs (26 cases, 4 spectra from each case)
• G-III: 96 spectra × 10 PCs (6 cases, 16 spectra from each case)
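The selection scheme in the list above can be sketched as follows. This is a minimal illustration, assuming the spectra are held in a nested dictionary keyed by grade and case; the function name `balanced_sample`, the random choice of cases and spectra, and the fixed seed are hypothetical, not necessarily how the data set was actually assembled.

```python
import random

# (number of cases, spectra per case) for each grade, taken from the list above
SCHEME = {"G-I": (2, 50), "G-II": (26, 4), "G-III": (6, 16)}


def balanced_sample(spectra_by_case, scheme, seed=0):
    """Pick `per_case` spectra from each of `n_cases` cases per grade.

    spectra_by_case: {grade: {case_id: [spectrum, ...]}}  (hypothetical layout)
    returns:         {grade: [spectrum, ...]}
    """
    rng = random.Random(seed)
    selected = {}
    for grade, (n_cases, per_case) in scheme.items():
        cases = sorted(spectra_by_case[grade])
        if len(cases) < n_cases:
            raise ValueError(f"{grade}: need {n_cases} cases, found {len(cases)}")
        picks = []
        # choose the cases first, then sample spectra without replacement within each case
        for case in rng.sample(cases, n_cases):
            picks.extend(rng.sample(spectra_by_case[grade][case], per_case))
        selected[grade] = picks
    return selected


# Toy input: every case has 60 placeholder "spectra" (tuples standing in for PC vectors)
demo = {
    grade: {f"{grade}-case{i}": [(grade, i, j) for j in range(60)] for i in range(n)}
    for grade, (n, _) in SCHEME.items()
}
subset = balanced_sample(demo, SCHEME)
print({grade: len(s) for grade, s in subset.items()})
# {'G-I': 100, 'G-II': 104, 'G-III': 96}
```

Taking more spectra per case from the grades with fewer cases is what keeps the three grade totals close to 100 each.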
For this complex data set, we have again used the two clustering algorithms, one a hard clustering algorithm (k-means) and the other a soft clustering algorithm (FCM). Both algorithms were used with 3 clusters as input in order to classify the three grades. The parameters of the algorithms were the same as described in subsection 3.2.6. Since the aim of the research is to create a decision support system for breast cancer grading with the help of advanced fuzzy logic, we have used the k-means and FCM clustering algorithms here only to explore the complexity of the data set.
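To make the difference between the hard and the soft algorithm concrete, the sketch below implements plain Lloyd's k-means and the standard fuzzy c-means update in NumPy. It is illustrative only: the fuzzifier m = 2, the tolerance, the iteration limits and the random initialisation are assumptions (the actual settings are those of subsection 3.2.6), and random numbers stand in for the 300 × 10 PC scores.

```python
import numpy as np


def kmeans(X, k, iters=100, seed=0):
    """Lloyd's k-means: each spectrum is assigned to exactly one cluster (hard)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (n, k)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def fuzzy_cmeans(X, c, m=2.0, iters=300, tol=1e-5, seed=0):
    """Fuzzy c-means: soft memberships U (c x n); every column sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)                   # (c, d)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=-1)  # (c, n)
        dist = np.fmax(dist, 1e-12)                        # guard against division by zero
        ratio = dist[:, None, :] / dist[None, :, :]        # ratio[i, k, j] = d_ij / d_kj
        U_new = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


X = np.random.default_rng(1).normal(size=(300, 10))  # placeholder for the PC scores
km_labels, _ = kmeans(X, 3)
U, _ = fuzzy_cmeans(X, 3)
fcm_labels = U.argmax(axis=0)  # assign each spectrum to its highest-membership cluster
```

The hard labels from either algorithm are what a cluster-versus-grade count table is built from; FCM additionally keeps the graded memberships in `U`.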
3.3.5 Results
Table 3.5 shows the results of the k-means clustering algorithm with 3 clusters. Cluster 1 mainly consists of members of G-II and G-III, with no clear distinction between them. Cluster 2 has members of G-I and G-III, and cluster 3 mainly consists of members of G-I and G-II. The results indicate that, because of the complexity of the data, no cluster was able to differentiate between the three grades.
Table 3.5: Results with k-means clustering algorithm with data set 3
Cluster (members)   G-I   G-II   G-III
1 (112)              13     51      48
2 (91)               40     17      34
3 (97)               47     36      14
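The counts in Table 3.5 can be cross-checked, and the lack of separation summarised in a single number, with cluster purity. Purity is not reported in this chapter; it is used here only as an illustrative summary computed from the published counts, and the names below are hypothetical.

```python
# Table 3.5 (k-means, data set 3): one row per cluster, columns G-I, G-II, G-III
KMEANS_TABLE = [
    [13, 51, 48],  # cluster 1
    [40, 17, 34],  # cluster 2
    [47, 36, 14],  # cluster 3
]


def purity(table):
    """Fraction of spectra that belong to the majority grade of their cluster."""
    total = sum(sum(row) for row in table)
    return sum(max(row) for row in table) / total


cluster_sizes = [sum(row) for row in KMEANS_TABLE]
grade_sizes = [sum(col) for col in zip(*KMEANS_TABLE)]
print(cluster_sizes, grade_sizes, round(purity(KMEANS_TABLE), 2))
# [112, 91, 97] [100, 104, 96] 0.46
```

The row and column sums reproduce the cluster sizes and the 100/104/96 grade split. A purity of 0.46 is only modestly above the 0.35 (104/300) obtained by placing every spectrum in the largest grade, G-II, which is consistent with no cluster separating the grades.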
Table 3.6 shows the results of the FCM clustering algorithm with three clusters. Cluster 1 includes a small number of members from G-I, while the majority are from G-II and G-III. Cluster 2 includes members from all grades. Cluster 3 captures 50 of the 100 G-I spectra, but also contains 26 G-II and 6 G-III spectra.
Table 3.6: Results with FCM clustering algorithm with data set 3
Cluster (members)   G-I   G-II   G-III
1 (83)                5     41      37
2 (135)              45     37      53
3 (82)               50     26       6
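The observation about cluster 3 can be quantified per cluster. The sketch below, an illustrative calculation on the counts of Table 3.6 rather than an analysis from this thesis, labels each cluster with its majority grade and reports precision (share of the cluster that is that grade) and recall (share of that grade captured by the cluster); the helper name is hypothetical.

```python
# Table 3.6 (FCM, data set 3): one row per cluster, columns G-I, G-II, G-III
FCM_TABLE = [
    [5, 41, 37],   # cluster 1
    [45, 37, 53],  # cluster 2
    [50, 26, 6],   # cluster 3
]
GRADES = ["G-I", "G-II", "G-III"]


def majority_grade_scores(table):
    """For each cluster: its majority grade, precision and recall for that grade."""
    grade_totals = [sum(col) for col in zip(*table)]
    scores = []
    for cluster, row in enumerate(table, start=1):
        g = max(range(len(row)), key=lambda i: row[i])  # index of the majority grade
        precision = row[g] / sum(row)       # share of the cluster that is grade g
        recall = row[g] / grade_totals[g]   # share of grade g captured by the cluster
        scores.append((cluster, GRADES[g], round(precision, 2), round(recall, 2)))
    return scores


for entry in majority_grade_scores(FCM_TABLE):
    print(entry)
# (1, 'G-II', 0.49, 0.39)
# (2, 'G-III', 0.39, 0.55)
# (3, 'G-I', 0.61, 0.5)
```

Even the best-separated cluster, cluster 3, recovers only half of G-I, and about 39% of its members (32 of 82) come from other grades.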
The results with the clustering algorithms indicate that neither of the two algorithms is able to differentiate between the three grades. Because of the high level of variability in the data set, unsupervised learning with standard clustering algorithms is not able to find a clear distinction between cancer grades.
These uncertainties may exist between spectra of the same case (intra-case) as well as between multiple cases of the same grade (inter-case). Other types of uncertainty may exist between cases of the same grade (intra-grade) and between cases of different grades (inter-grade).
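These sources of variability can be made measurable. The sketch below is a hypothetical helper rather than an analysis performed in this thesis: from PC scores labelled by case and grade, it computes the mean within-case scatter, the scatter of case centroids around their grade centroid, and the mean distance between grade centroids. Euclidean distance in PC space and equal weighting of cases are assumptions. Because inter-case and intra-grade variability, as defined above, both concern cases of the same grade, one measure stands in for both here.

```python
import numpy as np


def variability_summary(X, case_ids, grades):
    """Mean distances describing intra-case, inter-case and inter-grade spread in PC space."""
    X = np.asarray(X, dtype=float)
    case_ids = np.asarray(case_ids)
    grades = np.asarray(grades)
    intra_case, inter_case, grade_centroids = [], [], []
    for g in np.unique(grades):
        in_grade = grades == g
        case_centroids = []
        for c in np.unique(case_ids[in_grade]):
            spectra = X[in_grade & (case_ids == c)]
            centroid = spectra.mean(axis=0)
            case_centroids.append(centroid)
            # intra-case: spread of one case's spectra around that case's centroid
            intra_case.extend(np.linalg.norm(spectra - centroid, axis=1))
        case_centroids = np.array(case_centroids)
        grade_centroid = case_centroids.mean(axis=0)  # each case weighted equally
        grade_centroids.append(grade_centroid)
        # inter-case: spread of case centroids around their grade centroid
        inter_case.extend(np.linalg.norm(case_centroids - grade_centroid, axis=1))
    G = np.array(grade_centroids)
    pairwise = np.linalg.norm(G[:, None, :] - G[None, :, :], axis=-1)
    # inter-grade: mean distance over every pair of grade centroids
    inter_grade = pairwise[np.triu_indices(len(G), k=1)].mean()
    return {
        "intra_case": float(np.mean(intra_case)),
        "inter_case": float(np.mean(inter_case)),
        "inter_grade": float(inter_grade),
    }


# Toy example: two grades, two cases each, three identical "spectra" per case
centres = {("A", "a1"): [0, 0], ("A", "a2"): [2, 0], ("B", "b1"): [10, 0], ("B", "b2"): [12, 0]}
X, cases, grades = [], [], []
for (grade, case), mu in centres.items():
    for _ in range(3):
        X.append(mu)
        cases.append(case)
        grades.append(grade)
print(variability_summary(X, cases, grades))
# {'intra_case': 0.0, 'inter_case': 1.0, 'inter_grade': 10.0}
```

Clustering can only separate grades cleanly when the inter-grade distance is large relative to the other two quantities; the results in Tables 3.5 and 3.6 suggest that this is not the case for data set 3.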
The poor performance of both the k-means and FCM clustering algorithms on this data set demonstrates the complicated nature of the data.
In the next stage of the research, we take this complicated data set and move towards a
decision support system with fuzzy logic.
3.4 Summary
In this chapter, we have used three different data sets with standard k-means and FCM
clustering algorithms to differentiate between different classes with the help of PCA. Data
set 1 was used to distinguish between tumour and stroma cells with PCA and FCM. The results indicated that the method was able to produce a good classification. Data set 2 was a
real breast cancer spectral data set used to differentiate between three cancer grades with
PCA and k-means and FCM clustering algorithms. Results indicate that both methods
are able to differentiate between three grades. Data set 3 was a real complex spectral
data set involving a variety of cases from all grades. The same methods of PCA with k-
means and FCM clustering algorithms were applied to differentiate between breast cancer
grades. Results indicate that, because of the variability between cases of the same grade and between grades, the clustering algorithms perform poorly and are not able to distinguish between grades, which emphasises the high level of uncertainty involved in the data set. In the next chapter, we take this complex data set and move towards a decision support system with a supervised learning approach by developing a fuzzy inferencing system that can classify breast cancer grades.
Chapter 4
Experiments with Fuzzy Inferencing System
In this chapter, a Mamdani-type fuzzy inferencing system (FIS) is developed for classification using data set 3: 300 spectra taken from different cases of each grade, with three PCs as inputs. The system uses Hill Climbing (HC) and Simulated Annealing (SA) algorithms to train the membership functions and rules. The developed system has also been tested on unseen data. The results are compared with those of the standard k-means clustering algorithm, and the performance of the system is discussed.
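For readers unfamiliar with Mamdani inference, the sketch below walks through its steps for three PC inputs with three membership functions each: min for AND, clipping of each rule's consequent, max aggregation and centroid defuzzification. The triangular breakpoints, the output "grade score" universe and the four rules are hypothetical placeholders; in this chapter these are precisely what HC and SA tune, so none of the values should be read as the trained system.

```python
import numpy as np


def trimf(x, a, b, c):
    """Triangular membership function rising from a to a peak at b and falling to c."""
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)


# Hypothetical input MFs, shared by PC1..PC3 (PC scores assumed roughly in [-2, 2])
INPUT_MFS = {"low": (-2.0, -1.0, 0.0), "medium": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)}

# Output universe: a crisp "grade score" with one triangular MF per grade
Y = np.linspace(0.0, 4.0, 401)
OUTPUT_MFS = {"G-I": (0.0, 1.0, 2.0), "G-II": (1.0, 2.0, 3.0), "G-III": (2.0, 3.0, 4.0)}

# Hypothetical rule base: (MF label for PC1, PC2, PC3; None = input not used) -> grade
RULES = [
    (("low", "low", None), "G-I"),
    (("medium", None, "medium"), "G-II"),
    (("medium", "low", None), "G-II"),
    (("high", None, "high"), "G-III"),
]


def mamdani_classify(pcs):
    """Classify one spectrum from its first three PC scores."""
    aggregated = np.zeros_like(Y)
    for antecedent, grade in RULES:
        # firing strength: AND of the used antecedents with the min operator
        strength = min(
            float(trimf(value, *INPUT_MFS[label]))
            for value, label in zip(pcs, antecedent)
            if label is not None
        )
        # Mamdani implication (clip the consequent MF), then max aggregation
        aggregated = np.maximum(aggregated, np.minimum(strength, trimf(Y, *OUTPUT_MFS[grade])))
    if aggregated.sum() == 0.0:
        return None, float("nan")  # no rule fired for this input
    crisp = float((Y * aggregated).sum() / aggregated.sum())  # centroid defuzzification
    grade = list(OUTPUT_MFS)[int(np.clip(np.rint(crisp), 1, 3)) - 1]
    return grade, crisp


print(mamdani_classify([0.0, 0.0, 0.0]))    # only rule 2 fires -> G-II
print(mamdani_classify([-1.0, -1.0, 0.0]))  # only rule 1 fires -> G-I
```

An input outside every membership function fires no rule and is returned as unclassified, which is one reason the placement of the membership functions matters during training.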
4.1 System Structure
Figure 4.1 shows a block diagram of the main structure of the FIS used. A data set has been created either by selecting data taken from a single case per grade or from all cases of all grades. The created data set goes through PCA, and the first three PCs are selected as inputs to the system. Each input has three membership functions. These membership functions are trained with three training methods, namely Hill Climbing with Membership Function Tuning (HCMT), Simulated Annealing with Membership Function Tuning (SAMT) and Simulated Annealing with Membership Function and Rule Tuning (SAMRT). The HC algorithm was selected because it has previously been used in a spectral problem to find the correct target spectral peak, where it performed well [95]. SA was selected to avoid a limitation of HC: in complex scenarios, HC tends to get trapped in local minima. Because SA occasionally accepts worse solutions during the search, it has a better chance of escaping local minima and reaching a better solution.
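The difference between the two search strategies can be sketched on a generic cost function. Both methods below perturb a real-valued parameter vector; in the FIS this vector would hold membership-function breakpoints and the cost would be the training error, but here the Gaussian step size, starting temperature, geometric cooling factor and the Rastrigin test function are assumptions chosen only for illustration, not the settings used in this thesis. Rule tuning (as in SAMRT) would need additional discrete moves and is not shown.

```python
import math
import random


def rastrigin(x):
    """Multimodal test cost with many local minima; global minimum 0 at the origin."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def hill_climb(cost, x0, step=0.1, iters=2000, seed=0):
    rng = random.Random(seed)
    x, fx = list(x0), cost(x0)
    for _ in range(iters):
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = cost(cand)
        if fc < fx:  # HC: accept strict improvements only
            x, fx = cand, fc
    return x, fx


def simulated_annealing(cost, x0, step=0.1, iters=2000, t0=5.0, alpha=0.995, seed=0):
    rng = random.Random(seed)
    x, fx = list(x0), cost(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(iters):
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = cost(cand)
        delta = fc - fx
        # Metropolis criterion: always accept improvements, accept worse moves
        # with probability exp(-delta / t), which shrinks as t cools
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        t *= alpha  # geometric cooling schedule
    return best, fbest


x0 = [3.0, -2.5]
hc_x, hc_cost = hill_climb(rastrigin, x0)
sa_x, sa_cost = simulated_annealing(rastrigin, x0)
print(f"start {rastrigin(x0):.2f}  HC {hc_cost:.2f}  SA {sa_cost:.2f}")
```

HC stops moving once every nearby perturbation is worse, even when a much lower cost lies further away; the exponential acceptance rule lets SA climb out of such basins while the temperature is still high. On any single run SA is not guaranteed to do better.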
In the literature, SA has been found to perform well on complex FTIR spectral data when searching for the optimal cut-off threshold for detecting the quality of glycerol monolaurate (GML) [19]. This suggests that SA can be useful in problems where the complexity of the spectral data is high. The best trained FIS is found by comparing the results on the training data. The
selected FIS is tested on unseen data. The next sections describe the processes involved