Mean-shift clustering - Unsupervised learning algorithms

1.3.3.1.2 Mean-shift clustering

Mean-shift clustering is a general non-parametric clustering procedure (Comaniciu and Meer, 2002). Unlike the k-means method, it neither requires prior knowledge of the number of centroids present in the dataset nor assumes a shape for the clusters. A search window, whose size is set by the bandwidth and which isolates a specific volume within the n-dimensional feature space, is positioned on the dataset. With each data point given an equal unit weighting, the centre of mass (mean location) of the points inside the search window is calculated, and this becomes the location of the next centroid. The search window is then shifted so that the newly calculated centroid is at its centre (Fukunaga and Narendra, 1975). The displacement vector is therefore dependent on the density gradient itself. This has two effects on the procedure. First, the vector always points in the direction of the maximum increase in density until convergence is achieved. Second, the algorithm automatically adjusts its convergence speed, taking smaller steps as the window nears a maximum.
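The window-shifting step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `shift_to_mode`, the stopping rules `tol` and `max_iter`, and the toy data are all hypothetical, and the window is taken to be a flat (uniform) ball of radius `bandwidth`.

```python
import math


def shift_to_mode(start, points, bandwidth, tol=1e-6, max_iter=100):
    """Follow one flat-window mean-shift trajectory starting at `start`.

    Returns the final centroid and the length of every shift taken.
    `tol` and `max_iter` are illustrative stopping rules, not from the text.
    """
    centroid = tuple(start)
    step_lengths = []
    for _ in range(max_iter):
        # Every point inside the search window gets an equal unit weight.
        inside = [p for p in points if math.dist(p, centroid) <= bandwidth]
        if not inside:
            break  # empty window: nothing to shift towards
        # The centre of mass of the window becomes the next centroid.
        new_centroid = tuple(sum(axis) / len(inside) for axis in zip(*inside))
        step = math.dist(new_centroid, centroid)
        step_lengths.append(step)
        centroid = new_centroid
        if step < tol:
            break  # the window has stopped moving: a mode has been reached
    return centroid, step_lengths


# Hypothetical toy data: a tight group near (0.1, 0.1) and another near (5, 5).
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
mode, step_lengths = shift_to_mode((0.0, 0.0), points, bandwidth=1.0)
print(mode, step_lengths)
```

Here the trajectory starts on a data point, moves to the centre of mass of its group, and stops once the window no longer shifts. The returned `step_lengths` can be inspected to see how the displacement changes from one iteration to the next.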

In this way, the dataset of points in an n-dimensional feature space is treated as a sample from a probability density function, in which dense regions correspond to the local maxima (modes) of the underlying distribution. Each data point in the dataset is processed in turn using the algorithm, and points that converge to the same (at least approximately) maximum are considered members of the same cluster. The only parameter required before execution of the algorithm is the bandwidth. This is a significant benefit over k-means clustering, since if a suitable bandwidth could be selected prior to execution it would facilitate a fully automated segmentation routine. However, its selection is not a trivial matter. Larger bandwidths permit larger displacement vectors, enabling the algorithm to identify maxima more rapidly, but at the expense of resolution.
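The grouping of points by their shared maxima might look as follows. This is a sketch under assumptions rather than a definitive implementation: `find_mode` and `mean_shift_cluster` are hypothetical names, the window is flat, and the rule that two maxima count as "approximately the same" when they lie within half a bandwidth of each other (`merge_tol`) is an illustrative choice that the text does not specify.

```python
import math


def find_mode(start, points, bandwidth, tol=1e-6, max_iter=200):
    """Shift a flat search window from `start` until it stops moving."""
    centroid = tuple(start)
    for _ in range(max_iter):
        inside = [p for p in points if math.dist(p, centroid) <= bandwidth]
        if not inside:
            break
        new_centroid = tuple(sum(axis) / len(inside) for axis in zip(*inside))
        moved = math.dist(new_centroid, centroid)
        centroid = new_centroid
        if moved < tol:
            break
    return centroid


def mean_shift_cluster(points, bandwidth, merge_tol=None):
    """Label each point by the mode its trajectory converges to."""
    if merge_tol is None:
        merge_tol = bandwidth / 2  # assumption: "approximately equal" modes
    modes, labels = [], []
    for p in points:
        m = find_mode(p, points, bandwidth)
        for index, known in enumerate(modes):
            if math.dist(m, known) <= merge_tol:
                labels.append(index)  # same maximum: same cluster
                break
        else:
            modes.append(m)  # a maximum not seen before: new cluster
            labels.append(len(modes) - 1)
    return labels, modes


# Hypothetical toy data: two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
labels, modes = mean_shift_cluster(points, bandwidth=1.0)
print(labels)  # [0, 0, 0, 0, 1, 1, 1]
```

Note that the number of clusters is never passed in. It emerges from the number of distinct maxima found, which is why the bandwidth is the only parameter that has to be chosen beforehand.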

Figure 1.2 - Mean shift clustering.

A dataset of 5 clusters is distributed in a 2-dimensional feature space. Each datum is selected in turn as the starting centroid. A bandwidth is specified prior to running the algorithm and can be visualised as a circle, centred on the centroid, with all data points lying within its border used to calculate the next centroid location (b). The starting bandwidth is represented by the dashed circle, the course of the centroid by the red dotted line, and its final location (maximum) by the red dot. Data points that have the same final centroid are clustered together (c). Large bandwidths traverse the feature space swiftly but risk losing spatial resolution (d). Small bandwidths risk over-fitting (e). The optimal bandwidth is able to separate all the clusters (f).

In Figure 1.2 we have a set of data points distributed in a 2-dimensional feature space. There are clearly 5 discrete clusters in this space, whose constituent data points are represented as a cross (+), x, circle, square or triangle, and which we would like the algorithm to identify. If a large bandwidth is selected, the algorithm will quickly identify the 2 centroids (represented as filled circles) for the cross and x clusters. However, it will fail to separate the 3 smaller clusters in the bottom right-hand corner because the bandwidth is too large, and will group these data points together with a common centroid. Conversely, if a small bandwidth is used, the algorithm will focus on local features and ignore the gross structure, resulting in numerous inappropriate centroids being identified.

As the bandwidth is decreased, the maximum possible shift per iteration is also reduced. This increases the computational load, as more iterations are required before the algorithm identifies its maxima. Thus the challenge with mean-shift clustering lies in preparing the data so that the contrast between clusters is as great as possible, allowing classification with the largest possible bandwidth.

The situation is fairly simple if the various clusters of interest (lesioned and unlesioned) are well separated and of a large magnitude. This is, however, rarely the case, since it is difficult to differentiate between lesion and signal artefact (for example) using signal intensity alone. Consequently, there may be a significant number of data points that traverse or reside in saddle regions of the probability density function and that will therefore be particularly sensitive to bandwidth selection. One solution to this problem would be to select a range of bandwidths, observe the various outcomes, and determine the optimal cluster arrangement using the various derived solutions.
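A bandwidth sweep of this kind can be sketched as below. The text does not say how the optimal arrangement should be derived from the individual solutions, so the rule used here (prefer the cluster count that persists across the most bandwidths) is purely an illustrative assumption, as are the helper names, the half-bandwidth merging tolerance, and the toy data of three groups, two of which lie close together.

```python
import math
from collections import Counter


def find_mode(start, points, bandwidth, tol=1e-6, max_iter=200):
    """Shift a flat search window from `start` until it stops moving."""
    centroid = tuple(start)
    for _ in range(max_iter):
        inside = [p for p in points if math.dist(p, centroid) <= bandwidth]
        if not inside:
            break
        new_centroid = tuple(sum(axis) / len(inside) for axis in zip(*inside))
        moved = math.dist(new_centroid, centroid)
        centroid = new_centroid
        if moved < tol:
            break
    return centroid


def count_clusters(points, bandwidth):
    """Count distinct maxima; maxima within bandwidth / 2 are merged."""
    modes = []
    for p in points:
        m = find_mode(p, points, bandwidth)
        if all(math.dist(m, known) > bandwidth / 2 for known in modes):
            modes.append(m)
    return len(modes)


# Hypothetical toy data: groups near x = 0.4, 3.4 and 10.4.
points = [(x, 0.0) for x in (0.0, 0.4, 0.8, 3.0, 3.4, 3.8, 10.0, 10.4, 10.8)]
bandwidths = [0.3, 1.0, 1.5, 2.0, 4.0]

counts = {h: count_clusters(points, h) for h in bandwidths}
# Assumed selection rule: the count that appears for the most bandwidths.
most_stable_count = Counter(counts.values()).most_common(1)[0][0]
print(counts)             # {0.3: 9, 1.0: 3, 1.5: 3, 2.0: 3, 4.0: 2}
print(most_stable_count)  # 3
```

On this toy data the smallest bandwidth leaves every point as its own cluster, the largest merges the two neighbouring groups, and the intermediate values agree on three clusters. Real image data with points in saddle regions would not be expected to produce such a clean plateau.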

Although referred to as a benefit, the ability of mean-shift clustering to determine the number of clusters automatically presents another potential problem. Normal tissue is not homogeneous, being coarsely divided into grey matter, white matter and cerebrospinal fluid (CSF). Consequently, it is likely that normal tissue will not cluster into one single group, but will instead comprise a number of “splinter” groups. Post-processing of the clustered image will therefore be needed to differentiate lesioned from normal tissue clusters. Indeed, some algorithms (using a voxel-based framework) use probabilistic map priors of the 3 aforementioned regions to determine whether a voxel belongs to any of these groups or to a separate “lesioned” group (Crinion et al., 2007).

In both of the clustering methods described, the mathematical calculations employed are not particularly complex. Despite this, the amount of data that needs to be processed in one brain volume means that both methods are still time-consuming. Moreover, if a range of k values or bandwidths is to be investigated, the amount of processing required per volume increases further.

In document The foundations of lesion-function inference in the human brain (Page 40-43)