Chapter 6. Assessment of the performance and variability of ICA algorithms applied
7.1 Cluster Analysis
Cluster analysis divides a collection of inputs or objects into a smaller number of clusters. A cluster is a collection of objects that are similar or related to one another and dissimilar or unrelated to the objects belonging to other clusters; the aim of cluster analysis is to determine the intrinsic grouping in a set of unlabeled objects (data). The term clustering refers to the entire set of clusters; ideally all the clusters are well separated from each other, in other words the distance between any two different clusters is larger than the distance between any two objects within a cluster.
One of the most important applications of clustering is in biology, specifically in taxonomy and hierarchical classification, where objects are classified according to their characteristics into species, classes or families; the term hierarchical refers to organising the objects into a “tree”. Clustering has been used in biology, for example, to group genes which have similar functions [46;86].
There are different similarity criteria for merging the collection of objects; one such criterion is the distance between the objects [46]. Data to be clustered can be presented as a data matrix or as a dissimilarity matrix D with elements dij, where dij is the dissimilarity between the i-th and j-th objects. The dissimilarities must satisfy a minimum of three conditions:
1. The dissimilarity between objects i and j is non-negative, dij≥0.
2. The dissimilarity between an object and itself is zero, dii=0.
3. The dissimilarity is symmetric, dij=dji.
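The three conditions above can be checked mechanically. The following is a minimal sketch in plain Python, assuming the dissimilarity matrix is stored as a list of lists; the function name `is_dissimilarity_matrix`, the tolerance `tol` and the example matrix are illustrative assumptions and do not come from the cited references.

```python
def is_dissimilarity_matrix(d, tol=1e-12):
    """Return True if the square matrix d satisfies the three conditions.

    d is assumed to be a list of lists of numbers (hypothetical layout).
    """
    n = len(d)
    if any(len(row) != n for row in d):
        return False                      # not a square matrix
    for i in range(n):
        if abs(d[i][i]) > tol:            # condition 2: d_ii = 0
            return False
        for j in range(n):
            if d[i][j] < 0:               # condition 1: d_ij >= 0
                return False
            if abs(d[i][j] - d[j][i]) > tol:   # condition 3: d_ij = d_ji
                return False
    return True


# Illustrative 3 x 3 matrix (made-up values, not data from this research)
D_example = [[0.0, 2.0, 5.0],
             [2.0, 0.0, 3.0],
             [5.0, 3.0, 0.0]]
valid = is_dissimilarity_matrix(D_example)
```

A matrix that fails any of the checks, for instance an asymmetric one, would have to be symmetrised or recomputed before being passed to a clustering routine.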
In some applications it is more convenient to consider the similarity between the i-th and j-th objects instead of the dissimilarity; the dissimilarity must still satisfy the conditions listed above.
In hierarchical clustering, the data are fused or partitioned in a series of steps.
Hierarchical clustering using agglomerative methods consists of fusing n objects into successively larger groups until the last group contains all the objects; at each step, the most similar pair of objects or clusters is merged. A divisive method consists of separating the objects into successively smaller groups until every group contains only one individual; at the beginning of the divisive clustering there is one cluster containing all the data, and at each step an existing cluster is divided into two [83].
The two-dimensional diagram that illustrates the fusions or divisions made during the hierarchical clustering is called a dendrogram (see Figure 7.1). The dendrogram, or rooted tree diagram, is a mathematical and pictorial representation of the complete clustering procedure. The height, h, in this tree represents the distance at which each fusion is made, and the nodes (labelled from A to E) in the diagram represent clusters; for each pair of objects (i, j), the smaller the value of hij, the more similar objects i and j are. This diagram displays the order in which the clusters were fused. Each of the terminal nodes represents one of the objects clustered (numbered from 1 to 6); the arrangement of nodes and heights is the topology of the tree. The node E is called the root of the tree and is the cluster which includes all the objects.
The dendrogram in Figure 7.1 is a binary dendrogram: it has n-1 internal nodes, and each internal node has two nodes lying below it in the tree; all the dendrograms included in this chapter are binary. Since the left-right ordering of the edges leading down from each internal node can be interchanged, there are $2^{n-1}$ different ways of representing each binary dendrogram.
Figure 7.1 A dendrogram or rooted tree diagram. The objects clustered are numbered from 1 to 6 and the nodes are labelled from A to E; the height is the distance at which each fusion is made.
Different measures have been proposed to calculate the proximities between the data, typically measured by dissimilarities or inter-object distances [46]. Given an m × n data matrix X whose m rows are the 1 × n row vectors x1, x2,..., xm, the commonly used distance measures between the vectors xi and xj are defined as follows:
Euclidean distance,
$$d_{ij} = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^{2}}$$
City Block metric,
$$d_{ij} = \sum_{k=1}^{n} |x_{ik} - x_{jk}|$$
Minkowski metric,
$$d_{ij} = \left( \sum_{k=1}^{n} |x_{ik} - x_{jk}|^{p} \right)^{1/p}$$
For the special case of p = 1, the Minkowski metric gives the City Block metric, and for the special case of p = 2, it gives the Euclidean distance.
The most commonly used distance measure is the Euclidean distance; this can be interpreted as physical distance between two points in the Euclidean space.
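The relationship between the three metrics can be illustrated with a short sketch in plain Python. It assumes each object is a list of n numbers; the function names and the example data are illustrative assumptions, not part of the cited work. The City Block and Euclidean distances are obtained as the p = 1 and p = 2 cases of the Minkowski metric.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def city_block(x, y):
    """City Block metric: the Minkowski metric with p = 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Euclidean distance: the Minkowski metric with p = 2."""
    return minkowski(x, y, 2)


def dissimilarity_matrix(X, p=2):
    """m x m matrix of pairwise distances between the m rows of X."""
    return [[minkowski(xi, xj, p) for xj in X] for xi in X]


# Illustrative data matrix with m = 3 objects and n = 2 variables
X_example = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D_example = dissimilarity_matrix(X_example)
```

By construction the resulting matrix is symmetric with a zero diagonal, so it can serve as the proximity matrix taken as input by a hierarchical clustering method.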
There are three basic agglomerative methods used in hierarchical clustering to measure the inter-cluster similarity [65]; all of them generally take a proximity matrix as input:
1) Single linkage clustering, also known as the nearest neighbour technique, defines the distance between groups as that of the closest pair of individuals.
2) Complete linkage clustering, or furthest neighbour, is the opposite of single linkage and defines the distance between groups as that of the most distant pair of individuals.
3) Group average clustering defines the distance between groups as the average of the distances between all pairs of individuals.
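The three linkage rules differ only in how the distance between two groups is computed from the pairwise dissimilarities. The sketch below is a deliberately naive agglomerative procedure in plain Python, written for readability rather than efficiency; the function name `agglomerate`, the return format and the example points are assumptions made for illustration, and ties are broken by taking the first closest pair found.

```python
def agglomerate(D, method="single"):
    """Naive agglomerative clustering on a dissimilarity matrix D.

    Returns a list of (members_a, members_b, height) tuples, one per fusion,
    in the order in which the fusions are made.
    """
    def group_distance(a, b):
        pairs = [D[i][j] for i in a for j in b]
        if method == "single":        # closest pair of individuals
            return min(pairs)
        if method == "complete":      # most distant pair of individuals
            return max(pairs)
        if method == "average":       # average over all pairs
            return sum(pairs) / len(pairs)
        raise ValueError("unknown method: " + method)

    clusters = [[i] for i in range(len(D))]   # start with one object per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for u in range(len(clusters)):
            for v in range(u + 1, len(clusters)):
                h = group_distance(clusters[u], clusters[v])
                if best is None or h < best[0]:
                    best = (h, u, v)
        h, u, v = best
        merges.append((clusters[u], clusters[v], h))
        merged = clusters[u] + clusters[v]
        clusters = [c for k, c in enumerate(clusters) if k not in (u, v)]
        clusters.append(merged)
    return merges


# Four illustrative points on a line; dissimilarity is the absolute difference
points = [0.0, 1.0, 3.0, 7.0]
D = [[abs(a - b) for b in points] for a in points]
single_heights = [h for _, _, h in agglomerate(D, "single")]
complete_heights = [h for _, _, h in agglomerate(D, "complete")]
average_heights = [h for _, _, h in agglomerate(D, "average")]
```

For these points all three methods fuse the objects in the same order, but the fusion heights differ: single linkage gives the lowest heights and complete linkage the highest, with group average in between.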
Once the clustering procedure has been completed, the number of clusters must be decided by dividing the dendrogram appropriately. There are two principal criteria for dividing this hierarchical tree: finding the natural divisions in the original data, or specifying an arbitrary number of clusters. In agglomerative clustering the number of clusters is determined by cutting the dendrogram at a particular height. The inconsistency coefficient can be used to identify the cutoff, or height of comparison, in the dendrogram [120]; each link between nodes in the hierarchical clustering is compared with adjacent links two levels below it. Another criterion is to determine the number of elements in each cluster according to the number of objects grouped [83].
The criterion used in this research to cut off the dendrogram, in order to find the number of clusters in each “tree”, was 70% of the maximum height between clusters [83]. This criterion was considered more convenient than the other criteria mentioned, since those involve making a subjective decision about the number of clusters or the number of elements in each cluster.
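As an illustration of the 70% criterion, the sketch below counts the clusters obtained when a dendrogram is cut at a given fraction of its maximum fusion height. It is a simplified reading of the criterion, not the implementation used in this research: it assumes the fusion heights increase monotonically along the tree (as they do for single linkage), and it treats a fusion made exactly at the cutoff as already completed. The function name and the example heights are hypothetical.

```python
def count_clusters_at_fraction(merge_heights, fraction=0.7):
    """Number of clusters left when cutting at fraction * max(merge_heights).

    merge_heights holds the n - 1 fusion heights of a dendrogram over n objects.
    Returns (number_of_clusters, cutoff_height).
    """
    n_objects = len(merge_heights) + 1
    cutoff = fraction * max(merge_heights)
    # Each fusion at or below the cutoff reduces the number of clusters by one
    fused = sum(1 for h in merge_heights if h <= cutoff)
    return n_objects - fused, cutoff


# Illustrative fusion heights for a dendrogram over four objects
n_clusters, cutoff = count_clusters_at_fraction([1.0, 2.0, 4.0])
```

With these example heights the cutoff lies between the second and third fusions, so the four objects are split into two clusters.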
Clustering is applied to ICA, for example, when the reliability of an ICA algorithm is assessed by repeating the estimation several times in order to identify robust ICs. Himberg and Hyvärinen [64;65] propose using clustering to identify common components between estimates calculated by running FastICA many times. After performing ICA, it may also be important to identify equivalent components across subjects; this is another application of clustering of ICs [40]. Stögbauer proposes using clustering of mutually independent components to identify one- or multi-dimensional components [121].
The agglomerative method used in this research was single linkage clustering, and the similarity criterion was the Euclidean distance of the MI between the ICs calculated by TDSEP-ICA. When the Euclidean distance is used to measure similarities between values with different scales, it is convenient to normalize them (mean zero and standard deviation one). Authors in the literature who have used hierarchical clustering to group the ICs calculated by ICA include Himberg et al. 2003 [65] and Kraskov et al. 2005 [86].
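The normalization step can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions rather than the procedure used in this research: the values are assumed to be arranged as rows of observations with one variable per column, each column is scaled independently, and the sample standard deviation (`statistics.stdev`) is used because the text does not say which estimator was applied. The function name and example values are hypothetical.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Scale every column of rows to mean zero and standard deviation one.

    Assumes each column has at least two values and is not constant;
    a constant column would cause a division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]       # sample standard deviation (assumption)
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Two illustrative variables measured on very different scales
raw = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
normalised = zscore_columns(raw)
```

After scaling, both columns contribute equally to a Euclidean distance, whereas in the raw values the second column would dominate it.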