

4.2.3. Step 3 Unsupervised Clustering

Four clustering techniques were considered in this study for segmenting the MRI exams: K-means, agglomerative hierarchical, BIRCH and DBSCAN clustering. With these algorithms, the main tissue groups expected to be clustered were bone, muscle and fat. Other clusters without an anatomical match were also expected; these were to be grouped into one large cluster of body tissue, separate from the identifiable main groups. This large cluster will be referred to as mixed tissue, by analogy with the tissue of the numerical phantom that represented a mixture of variable amounts of adipose tissue, loose connective tissue, platysma (muscle) and small bones. Hence, the four main tissue groups expected after MRI clustering are bone, muscle, fat and mixed tissue.

The specifications of the unsupervised clustering algorithms and the clustering quality metrics used in this work are detailed below. Each clustering algorithm returns a column vector containing the cluster index of every point of the input data. Besides the other inputs specified below, the input data of these algorithms must be given as a column vector. The inputs of the metric functions are two column vectors: the pixel intensities before clustering and the cluster index of each data point.
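The column-vector convention described above can be illustrated with a minimal Python sketch. The image here is synthetic, and the thresholding step is only a placeholder standing in for a clustering algorithm; neither is taken from this work.

```python
import numpy as np

# Hypothetical 2D slice standing in for one MRI image (synthetic values).
rng = np.random.default_rng(0)
image = rng.random((64, 48))

# Flatten to a column vector: one row per pixel, one feature (the intensity).
X = image.reshape(-1, 1)

# Placeholder for a clustering step: any algorithm returns one index per pixel.
labels = (X[:, 0] > 0.5).astype(int)

# Map the index vector back onto the image grid to obtain a segmentation map.
label_map = labels.reshape(image.shape)
```

Because reshape preserves the pixel order, the cluster index vector can be folded back into the image shape without any bookkeeping.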

Note that all the clustering quality metrics described in section 2.3.3 and tested in this dissertation indicate good clustering quality when the clusters are dense and well separated. In other words, the objects inside the same cluster should be very close in value and, simultaneously, distant in value from the objects of other clusters. However, given the type of data used in this work (i.e. MRI), it is very likely that the clusters formed in the data are not well separated, as voxel intensities vary within a continuous range of values. Hence, I suspect that the results of the studied metrics do not reflect the quality of the clustering in the MR images studied.

K-means clustering specifications

The K-means algorithm was tested in MATLAB® with the function [IDX, C] = kmeans(X, K), which partitions the input data X into K clusters and returns the vector IDX containing the cluster index of each point and the K cluster centroid locations C. By default, the kmeans function uses the squared Euclidean distance metric.

This algorithm was also tested in Python™, with KMeans from the sklearn.cluster module. The two parameters required for the implementation of this algorithm were the number of clusters k and the input data.

In both cases, the algorithms required the input data and the number of clusters; the latter was varied from 4 to 7. The clustering quality metrics described in section 2.3.3 were used to optimize the k parameter.
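A minimal sketch of this k sweep in Python is given below. The data are synthetic intensities, not the MRI exams, and the choice of the silhouette coefficient as the selection metric, as well as n_init and random_state, are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic intensities: four groups with different means (illustrative only).
rng = np.random.default_rng(0)
X = np.concatenate(
    [rng.normal(m, 0.03, 150) for m in (0.1, 0.4, 0.6, 0.9)]
).reshape(-1, 1)

scores = {}
for k in range(4, 8):  # k = 4, 5, 6, 7
    # n_init and random_state are assumed values, set for reproducibility.
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette is better, so keep the k with the largest mean value.
best_k = max(scores, key=scores.get)
```

For a metric where lower is better, such as the Davies-Bouldin index, the selection would use min instead of max.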

Agglomerative hierarchical clustering specifications

The agglomerative hierarchical algorithm was implemented in Python™, using AgglomerativeClustering imported from the sklearn.cluster module. The two parameters required for the implementation of this algorithm were the number of clusters k and the type of linkage. As with K-means, several values of k, from 4 to 7, were tested in order to obtain the optimal value of this parameter.
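The two parameters can be sketched as follows. The linkage type used in this work is not stated here, so "ward" (the scikit-learn default) is an assumption, and the data are synthetic.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic intensities standing in for the pixel column vector.
rng = np.random.default_rng(0)
X = np.concatenate(
    [rng.normal(m, 0.03, 100) for m in (0.1, 0.4, 0.6, 0.9)]
).reshape(-1, 1)

labels_by_k = {}
for k in range(4, 8):  # k = 4, 5, 6, 7
    # linkage="ward" is an assumed choice; "complete", "average" and
    # "single" are the other options accepted by scikit-learn.
    model = AgglomerativeClustering(n_clusters=k, linkage="ward")
    labels_by_k[k] = model.fit_predict(X)
```

Each entry of labels_by_k can then be passed, together with X, to one of the quality metrics to compare the values of k.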

BIRCH specifications

The BIRCH algorithm was tested in Python™ using Birch imported from the sklearn.cluster module. The three parameters required for the implementation of this algorithm were the threshold, which limits the distance between the entering sample and the existing subcluster; the number of clusters; and the branching factor, which limits the number of subclusters in a node. The number of clusters k was set to 7 and the threshold was kept at its default value of 0.5.

Additionally, several values of the branching factor, from 2 to 30, were tested to determine which one provided the best visual separation of the anatomical structures under study. A clustering evaluation metric was used to select the optimal branching factor.
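A possible form of the branching factor sweep is sketched below. The evaluation metric is not named in the text, so the Calinski-Harabasz index is used here purely as an assumption; the step of 4 between tested values and the synthetic data (intensities on an assumed 0 to 255 scale) are also illustrative.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.metrics import calinski_harabasz_score

# Synthetic intensities on an assumed 0-255 scale (illustrative only).
rng = np.random.default_rng(0)
X = np.concatenate(
    [rng.normal(m, 5.0, 100) for m in (20, 80, 140, 200)]
).reshape(-1, 1)

scores = {}
for bf in range(2, 31, 4):  # 2, 6, 10, ..., 30 (assumed step)
    # threshold and n_clusters follow the values stated in the text.
    model = Birch(threshold=0.5, branching_factor=bf, n_clusters=7)
    labels = model.fit_predict(X)
    # Assumed metric: higher Calinski-Harabasz values are better.
    scores[bf] = calinski_harabasz_score(X, labels)

best_bf = max(scores, key=scores.get)
```

Since the text also relies on visual separation of the anatomical structures, a metric-based ranking like this one would complement, not replace, inspection of the resulting label maps.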

DBSCAN specifications

This algorithm was tested in Python™ using DBSCAN imported from the sklearn.cluster module. The two parameters required for the implementation of this algorithm were Eps and MinPts (eps and min_samples in scikit-learn), which were varied with a coarse grid search from 0.01 to 10 and from 10 to 500, respectively. A clustering evaluation metric was used to select the optimal combination of parameters.
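The coarse grid search can be sketched as follows. The grid points, the use of the silhouette coefficient as the evaluation metric, and the exclusion of noise points (label -1) before scoring are all assumptions for illustration; the data are synthetic.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Synthetic intensities on an assumed 0-255 scale (illustrative only).
rng = np.random.default_rng(0)
X = np.concatenate(
    [rng.normal(m, 2.0, 200) for m in (20, 80, 140, 200)]
).reshape(-1, 1)

results = {}
for eps in (0.01, 0.1, 1.0, 10.0):          # assumed coarse grid for Eps
    for min_pts in (10, 50, 100, 500):      # assumed coarse grid for MinPts
        labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
        mask = labels != -1                 # DBSCAN marks noise with -1
        n_clusters = len(set(labels[mask]))
        # The metric is only defined when at least two clusters exist.
        if n_clusters >= 2:
            results[(eps, min_pts)] = silhouette_score(X[mask], labels[mask])

best_eps, best_min_pts = max(results, key=results.get)
```

Combinations that yield fewer than two clusters are skipped, which is why results may contain fewer entries than the grid has points.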

Silhouette coefficient specifications

The silhouette coefficient was implemented both in MATLAB®, using the function silhouette of the Statistics and Machine Learning Toolbox, and in Python™, using silhouette_score imported from the sklearn.metrics module. The Python™ function returns the mean silhouette coefficient over all data points, whereas the MATLAB® function returns the silhouette value of each data point. In order to compare the coefficients obtained from MATLAB® and Python™, the average of the MATLAB® output was used.
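The relation between per-point values and their mean can be checked in Python, where silhouette_samples plays a role analogous to the per-point MATLAB® output. The data and labels below are synthetic.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic intensities with known group membership (illustrative only).
rng = np.random.default_rng(0)
X = np.concatenate(
    [rng.normal(m, 0.03, 100) for m in (0.1, 0.4, 0.6, 0.9)]
).reshape(-1, 1)
labels = np.repeat(np.arange(4), 100)

per_point = silhouette_samples(X, labels)  # one silhouette value per point
mean_of_points = per_point.mean()          # average of the per-point values
overall = silhouette_score(X, labels)      # mean coefficient in one call
```

When comparing across the two environments, the distance metric is worth setting explicitly: to my understanding, MATLAB® silhouette defaults to squared Euclidean distance while silhouette_score defaults to Euclidean, which should be verified against the documentation of the versions in use.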

Davies-Bouldin index specifications

The DBI was implemented in Python™, using davies_bouldin_score imported from the sklearn.metrics module. The output of this metric is the mean of the per-cluster DBI values. The computation of the DBI is simpler than that of the silhouette coefficient; however, the distance metric is limited to Euclidean space. Zero is the lowest possible value of the DBI; values closer to zero indicate a better partition of the data [97].
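To make the "mean of per-cluster values" concrete, the sketch below recomputes the index by hand and compares it with davies_bouldin_score. The data and labels are synthetic, and the manual computation is an illustration of the usual definition, not the code used in this work.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Synthetic intensities with known group membership (illustrative only).
rng = np.random.default_rng(0)
k = 4
X = np.concatenate(
    [rng.normal(m, 0.03, 100) for m in (0.1, 0.4, 0.6, 0.9)]
).reshape(-1, 1)
labels = np.repeat(np.arange(k), 100)

# Cluster centroids and mean Euclidean distance of each cluster to its centroid.
centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
scatter = np.array(
    [np.linalg.norm(X[labels == i] - centroids[i], axis=1).mean() for i in range(k)]
)

# Pairwise centroid distances; the diagonal is set to inf to exclude i == j.
sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
np.fill_diagonal(sep, np.inf)

# Per-cluster value: worst-case similarity ratio to any other cluster.
per_cluster = ((scatter[:, None] + scatter[None, :]) / sep).max(axis=1)
dbi_manual = per_cluster.mean()
dbi_sklearn = davies_bouldin_score(X, labels)
```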

Calinski-Harabasz index specifications

The CHI was implemented in Python™, using calinski_harabasz_score imported from the sklearn.metrics module.
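As an illustration, the index can be recomputed by hand as the ratio of between-cluster to within-cluster dispersion, each normalized by its degrees of freedom, and compared with calinski_harabasz_score. The data and labels are synthetic, and the manual computation is a sketch of the usual definition rather than the code used in this work.

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score

# Synthetic intensities with known group membership (illustrative only).
rng = np.random.default_rng(0)
k = 4
X = np.concatenate(
    [rng.normal(m, 0.03, 100) for m in (0.1, 0.4, 0.6, 0.9)]
).reshape(-1, 1)
labels = np.repeat(np.arange(k), 100)
n = len(X)

overall_mean = X.mean(axis=0)
centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])

# Between-cluster dispersion: size-weighted squared centroid offsets.
between = sum(
    np.sum(labels == i) * np.sum((centroids[i] - overall_mean) ** 2)
    for i in range(k)
)
# Within-cluster dispersion: squared distances of points to their centroid.
within = sum(np.sum((X[labels == i] - centroids[i]) ** 2) for i in range(k))

chi_manual = (between / (k - 1)) / (within / (n - k))
chi_sklearn = calinski_harabasz_score(X, labels)
```

Unlike the DBI, higher values of this index correspond to denser and better separated clusters.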
