5.3 Experimental Evaluation
5.3.3 Determining Consensus Clusters
As Homogeneity Analysis creates this representation by examining both the similarity between clusters and the similarity between objects, a Hierarchical Clustering on this representation can uncover the consensus clusters. It is essential, however, to determine the optimal cut of the cluster tree (dendrogram) in order to obtain clusters that maximise, respectively, the Silhouette criterion in the original representation and the Silhouette criterion in the agreement space. The second criterion is based on our intuition that well defined, compact and distinct clusters in the agreement space will maximise the agreement between the consensus partition and the initial cluster ensemble. For that reason we also examine which cut of the cluster tree maximises this agreement, so that an appropriate comparison can be made. Finally, since a ground truth is provided in these datasets, we also identify the cut that maximises the agreement with it, despite our view that such a criterion should not be used to evaluate clustering results.
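The procedure of cutting a single cluster tree at several levels can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the implementation used here: average linkage with Euclidean distance is our assumption (the section does not say which linkage is used), and the helper names `average_linkage_merges` and `cut_tree` are hypothetical. The tree is built once; the partition at the k-cluster level is obtained by replaying only the first n - k merges.

```python
import math
from itertools import combinations


def average_linkage_merges(points):
    """Build the cluster tree once and return its merge history.

    Assumption: average linkage with Euclidean distance. Ids 0..n-1 are the
    singletons and the t-th merge creates id n + t (the numbering convention
    of scipy's linkage matrix)."""
    n = len(points)
    clusters = {i: [i] for i in range(n)}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # average linkage: mean distance over all cross-cluster pairs
            d = sum(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        new_id = n + len(merges)
        clusters[new_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b))
    return merges


def cut_tree(n, merges, k):
    """Partition at the k-cluster level: replay only the first n - k merges."""
    members = {i: [i] for i in range(n)}
    for t, (a, b) in enumerate(merges[: n - k]):
        members[n + t] = members.pop(a) + members.pop(b)
    # relabel the k clusters 0..k-1 in order of their smallest member
    labels = [0] * n
    for label, ids in enumerate(sorted(members.values(), key=min)):
        for i in ids:
            labels[i] = label
    return labels
```

For a representation `coords` (for example, the object scores produced by Homogeneity Analysis), the 19 candidate partitions would be `[cut_tree(len(coords), merges, k) for k in range(2, 21)]` with `merges = average_linkage_merges(coords)`. The pure-Python loop is cubic in the number of objects; for real data, `scipy.cluster.hierarchy.linkage` followed by `fcluster(..., criterion="maxclust")` is a more efficient route to the same kind of cuts.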
In particular, we cut the cluster tree at every level from two up to 20 clusters and compute the score of each of the four criteria mentioned above for each of the 19 resulting partitions. The Silhouette criterion is calculated as specified in Equation 3.6 in Section 3.3.1; the agreement with the cluster ensemble is measured by the mean Rand Index between the consensus partition and each clustering of the cluster ensemble; and the agreement with the ground truth is measured by the Rand Index between the consensus partition and the classification provided with the dataset, after the two are aligned.
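The three kinds of score can be sketched as follows, assuming that Equation 3.6 is the standard silhouette width s(i) = (b(i) - a(i)) / max(a(i), b(i)) averaged over all objects, with Euclidean distance; that correspondence, and the function names, are our assumptions. The Rand Index here is the usual pair-counting definition: the fraction of object pairs on which two partitions agree (both together or both apart). Because it only compares pairs, it does not depend on how the cluster labels are named.

```python
import math
from itertools import combinations


def silhouette(points, labels):
    """Mean silhouette width with Euclidean distance.

    Assumes the standard definition, with s(i) = 0 for objects in singleton
    clusters. `labels` must be a list and contain at least two clusters."""
    n = len(points)
    total = 0.0
    for i in range(n):
        # a: mean distance to the other members of i's own cluster
        same = [math.dist(points[i], points[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster contributes s(i) = 0
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) - {labels[i]}
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def rand_index(u, v):
    """Fraction of object pairs on which partitions u and v agree."""
    pairs = list(combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_rand_to_ensemble(consensus, ensemble):
    """Agreement with the cluster ensemble: mean Rand Index over its members."""
    return sum(rand_index(consensus, c) for c in ensemble) / len(ensemble)
```

Under these assumptions, the Silhouette criterion would be `silhouette(original_coords, labels)`, the Silhouette of agreement `silhouette(agreement_coords, labels)`, and the agreement with the ground truth `rand_index(labels, classes)`, where all argument names are hypothetical.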
In Fig. 5.8 we can see the level of disagreement between the four criteria for the User Knowledge Modelling dataset. The Silhouette criterion is maximised at three clusters, which is also the number of clusters that appear to form naturally when inspecting the agreement space in Fig. 5.5. The Silhouette of agreement reaches its optimum at 15 clusters, whereas the agreement to the cluster ensemble identifies eight as the optimal number of clusters. Although the Silhouette of agreement does not coincide with the criterion of maximising the similarity with the cluster ensemble, its optimal point achieves a score of agreement to the cluster ensemble that is very close to the optimum. The 17 clusters identified by the comparison to the ground truth show how poorly clustering can recover the original classification, especially when the classes are not well defined, as in the case of UNS. However, the similarity to the ground truth obtained at the point indicated by the Silhouette of agreement is again very close to the optimum.
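The comparison made in this paragraph reduces to two small computations: the number of clusters k that maximises each criterion, and how far a reference criterion (here, the agreement to the cluster ensemble) falls below its own maximum at that k. The sketch below uses hypothetical helper names and represents each criterion as a dictionary mapping k to its score; it reproduces none of the values plotted in Fig. 5.8.

```python
def best_k(scores):
    """k with the highest score; ties go to the first k inserted (e.g. the smallest)."""
    return max(scores, key=scores.get)


def shortfall(reference, k):
    """How far the reference criterion at k falls below its own maximum."""
    return max(reference.values()) - reference[k]


def compare(curves, reference_name):
    """For each criterion: its optimal k and the reference criterion's shortfall there."""
    reference = curves[reference_name]
    return {name: (best_k(scores), shortfall(reference, best_k(scores)))
            for name, scores in curves.items()}
```

A shortfall close to zero for the Silhouette of agreement is what the text describes as a point "very close to the optimum" of the agreement to the cluster ensemble.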
Moving to the results for the Seeds dataset in Fig. 5.9, the Silhouette criterion identifies two clusters as optimal, whereas the agreement to the cluster ensemble identifies three, which is the number of wheat varieties specified in the dataset. Again, the Silhouette of agreement does not coincide with the agreement to the cluster ensemble, but its optimum still lies very close to the maximum agreement between the consensus partition and the cluster ensemble. It does, however, coincide with the agreement to the ground truth: both identify four clusters as optimal.
In Fig. 5.10 and Fig. 5.11 we can see that, for the Wholesale Customers dataset, the Silhouette criterion identifies two as the optimal number of clusters. The agreement to the ground truth is difficult to interpret in the case of the Region class, a result consistent with the visual representation of that class in the biplot of Fig. 5.3. However, it reaches its maximum at 16 clusters in the case of the Channel class. Again the Silhouette of agreement finds 15 clusters as optimal, a clustering whose scores of agreement with the cluster ensemble and with the original classification are very close to optimal.
Figure 5.8: Plot of the four criteria against the number of consensus clusters obtained by hierarchical clustering for the User Knowledge Modelling dataset
Figure 5.9: Plot of the four criteria against the number of consensus clusters obtained by hierarchical clustering for the Seeds dataset
Figure 5.10: Plot of the four criteria against the number of consensus clusters obtained by hierarchical clustering for the Wholesale Customers dataset for the class of Region
In all cases the Silhouette criterion finds a number of well defined clusters that differs from the number of classes specified by the original classifications. In two of these cases this is justified, as the biplots in an earlier section showed that these classes are neither compact nor well separated. In the case of the Seeds dataset, where the clusters appear to be well defined, the few instances that lie beyond the cluster boundaries cause a small deterioration in the Silhouette of the three-cluster partition, and a two-cluster partition appears to overcome this issue.
The Silhouette of agreement did not maximise the agreement of the consensus partition with the initial cluster ensemble, but in all cases it gave values very close to the optimum. This small difference from the optimal value is caused by the small size of the cluster ensemble. We can observe that, in every dataset, the number of clusters that maximises the Silhouette criterion in the agreement space is exactly equal to the number of non-zero cells in the cross tabulation of the clusterings of the ensemble. This means that in the agreement space Homogeneity Analysis places together all the members of each of these non-zero cells. This can be seen, for example, in the case of the Seeds dataset in the joint plot of Fig. 5.12, where the five members of the first cluster of K-means fall in the second cluster of PAM and are therefore placed midway between these two clusters. This causes the Silhouette criterion in the agreement space to be maximised at four clusters rather than at the three indicated by the agreement to the cluster ensemble. If we used a bigger cluster ensemble this phenomenon would be less pronounced, which could lead to a more meaningful organisation of the disagreements among the clusterings of the ensemble. Desirable as a bigger ensemble might be, this simple modelling of the disagreement remains particularly valuable, as it reveals the patterns or occasions where the two clustering algorithms fail to agree. The proposed two-tier structure, specified by two different cuts of the hierarchical cluster tree based on the Silhouette values in the original representation and in the agreement space, can therefore still present significant and meaningful information.
Figure 5.11: Plot of the four criteria against the number of consensus clusters obtained by hierarchical clustering for the Wholesale Customers dataset for the class of Channel
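The link between the non-zero cells of the cross tabulation and the optimal cut in the agreement space can be illustrated with a small sketch. In Homogeneity Analysis an object's score is, up to a normalisation, the average of the quantifications of the categories (here, clusters) it belongs to, so objects with the same joint cluster membership land on the same point, and an object split between two clusters lands midway between them. The category coordinates, label vectors and function names below are hypothetical toy values chosen for illustration only, not the quantifications of Fig. 5.12.

```python
from collections import Counter


def cross_tab(u, v):
    """Contingency table of two partitions as a Counter over (u_label, v_label)."""
    return Counter(zip(u, v))


def nonzero_cells(ensemble):
    """Number of distinct joint label patterns across all clusterings.

    With two clusterings this equals the number of non-zero cells of their
    cross tabulation."""
    return len(set(zip(*ensemble)))


def centroid_scores(ensemble, category_points):
    """Place each object at the mean of the points of the clusters it belongs to.

    category_points[j][c] is a hypothetical 2-D point for cluster c of
    clustering j (standing in for HA category quantifications)."""
    m = len(ensemble)
    scores = []
    for labels in zip(*ensemble):
        pts = [category_points[j][c] for j, c in enumerate(labels)]
        scores.append(tuple(sum(p[d] for p in pts) / m for d in range(len(pts[0]))))
    return scores


# toy example: object 1 is in the first K-means cluster but the second PAM cluster
kmeans = [0, 0, 1, 1, 1]
pam = [0, 1, 1, 1, 1]
points = [{0: (0.0, 0.0), 1: (4.0, 0.0)}, {0: (0.0, 2.0), 1: (4.0, 2.0)}]
scores = centroid_scores([kmeans, pam], points)
```

In this toy case the three non-zero cells of the cross tabulation give exactly three distinct points in the agreement space, with object 1 at the midpoint (2.0, 1.0), which mirrors the mechanism described above for the Seeds dataset.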