In this chapter, we reviewed several popular clustering algorithms, including k-means, k-medoids and HC. Five families of clustering algorithms are also summarised and discussed in one of our conference papers. Any of these clustering methods can be used for clustering-based initialisation of NMF. In Chapters 4, 5 and 6, we selected k-means and FCM as examples for NMF initialisation and chose RAND as the measure of clustering performance. Future work could use other clustering methods and clustering validation measures. In Chapter 7, k-means, k-medoids and HC were used to analyse the clustering performance on the EEG dataset. The ensemble clustering described in Section 2.2.7 was also applied to obtain tight clusters.
Chapter 3
Nonnegative Matrix Factorization
This chapter describes nonnegative matrix factorisation (NMF), with emphasis on NMF optimisation strategies and NMF initialisation methods. Section 3.1 provides a brief overview of NMF and its uses. Section 3.2 reviews some NMF optimisation strategies. Section 3.3 introduces three types of NMF initialisation methods, namely randomisation-based, cluster-based and dimensionality reduction-based initialisation.
3.1 Introduction
NMF, proposed by [24], is an algorithm based on decomposition by parts that can reduce the dimensionality of a dataset while retaining most of the information in it. It differs from principal component analysis (PCA) and independent component analysis (ICA) in its added nonnegativity constraints. Researchers have proposed several variants that improve on the traditional NMF, such as Least squares-NMF [105], Weighted-NMF [41], Local-NMF [56], and so on. Here we briefly review the idea of NMF. Given a nonnegative matrix $X = [x_{ij}]$ with $m$ rows and $n$ columns, the NMF algorithm seeks nonnegative factors $W = [w_{ij}]$ and $H = [h_{ij}]$ such that
X ≈ WH (3.1)
where $W$ is an $m \times k$ matrix and $H$ is a $k \times n$ matrix. Each column of $W$ is regarded as a basis vector, while each column of $H$ contains the encoding coefficients for the corresponding column of $X$. Here $k$ is the rank of the factorisation and is normally smaller than $m$ and $n$ so as to achieve dimensionality reduction. All the elements of $W$ and $H$ are nonnegative. NMF has been found to be a useful tool in both data compression and data clustering.
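As a rough illustration of equation (3.1), the following sketch factorises a small nonnegative matrix in plain Python. It assumes the multiplicative update rules for the squared Frobenius error, which are one common choice and not necessarily the strategy adopted later in this thesis. The helper names (matmul, transpose, frobenius_error, nmf), the toy matrix X, the random initialisation and the values of n_iter, seed and eps are all illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Return ||X - WH||_F, the reconstruction error."""
    wh = matmul(w, h)
    return sum((xij - whij) ** 2
               for x_row, wh_row in zip(x, wh)
               for xij, whij in zip(x_row, wh_row)) ** 0.5


def nmf(x, k, n_iter=200, seed=0, eps=1e-9):
    """Factorise nonnegative X (m x n) into W (m x k) and H (k x n).

    Assumption: multiplicative updates for the squared Frobenius error,
    started from a random nonnegative initialisation.
    """
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return w, h


# Toy 4 x 3 nonnegative matrix (hypothetical data), factorised with k = 2.
X = [[1.0, 0.0, 2.0],
     [2.0, 0.0, 4.0],
     [0.0, 3.0, 0.0],
     [0.0, 1.0, 0.0]]
W, H = nmf(X, k=2)
print(len(W), len(W[0]), len(H), len(H[0]))  # shapes of W and H: 4 2 2 3
print(frobenius_error(X, W, H))              # reconstruction error
```

Because the updates only multiply nonnegative quantities, W and H stay nonnegative throughout, which is the constraint that separates NMF from PCA and ICA. Plain Python lists are used only to keep the sketch dependency-free.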
Data compression
NMF is distinguished from other dimensionality reduction methods by its nonnegativity constraints. These constraints lead to a parts-based rather than a whole-based representation, because they allow only additive, not subtractive, combinations. This property is useful in applications such as image compression and text compression. For example, in image compression, Figure 3.1, taken from [23], compares NMF basis images with singular value decomposition (SVD) basis images on a face dataset.
Figure 3.1: Description of NMF and SVD basis vectors on face dataset, from [23]
Both $W$ and its corresponding weights in $H$ are sparse, whereas the SVD factors give a nearly whole-based representation that shows whole faces and therefore needs more computation. The NMF basis images in $W$ can be visualised as parts of faces and have a clear interpretation, showing the individual components of the faces (e.g. ears, noses, mouths and so on). To assess compression performance, properties including sparsity, orthogonality and error are evaluated.
NMF can also be used for text mining applications. In this process, a document-term matrix is constructed with the weights of various terms (typically weighted word frequency information) from a set of documents. This matrix is factored into a term-feature and a feature-document matrix.
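The construction of such a matrix can be sketched as follows. This minimal example assumes raw term counts as the weights (a weighted scheme would normally be used) and simple whitespace tokenisation. The three documents and the names docs, vocab and doc_term are hypothetical.

```python
from collections import Counter

# Three toy documents (hypothetical example data).
docs = ["nmf factor matrix", "matrix factor factor", "gene cluster"]

# Vocabulary: sorted list of the distinct terms across all documents.
tokenised = [doc.split() for doc in docs]
vocab = sorted({term for tokens in tokenised for term in tokens})

# Document-term matrix: row d, column t holds the count of term t
# in document d. Raw counts are an assumption made for simplicity.
doc_term = []
for tokens in tokenised:
    counts = Counter(tokens)
    doc_term.append([counts[term] for term in vocab])

print(vocab)  # ['cluster', 'factor', 'gene', 'matrix', 'nmf']
for row in doc_term:
    print(row)
# [0, 1, 0, 1, 1]
# [0, 2, 0, 1, 0]
# [1, 0, 1, 0, 0]
```

All entries are counts and therefore nonnegative, so a matrix built this way satisfies the input requirement of NMF.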
Data clustering
NMF is similar to traditional vector quantisation and k-means clustering [23]. More recently, the equivalence of NMF and spectral clustering has been proved in [14]. NMF can therefore be used to facilitate cluster exploration. Brunet et al. compared its clustering performance with that of the self-organizing map (SOM) and hierarchical clustering (HC) [49] and concluded that NMF is an efficient method for identifying patterns in gene expression datasets. Since then, a large body of research has been published during the last decade on the analysis, extension and application of the clustering performance of NMF in image processing, signal processing and data mining.
Assume that each column of $X$ represents a data point to be analysed, and that $k \le \min(m, n)$ is taken as the number of clusters. Each element of $H$ indicates the confidence with which a data point belongs to a cluster. The $i$th data point is assigned to the $j$th cluster when $j = \arg\max_{1 \le l \le k} h_{li}$. In the dual view, an element $w_{ia}$ of $W$ ($1 \le a \le k$) describes the degree to which point $i$ (a row of $X$) belongs to cluster $a$, and point $i$ is assigned to the cluster $a^{*} = \arg\max_{1 \le a \le k} w_{ia}$ [49]. According to this property of NMF, we can obtain the cluster labels for a given dataset from either $W$ or $H$.
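The two assignment rules above can be sketched in a few lines of Python. The factor values below are made up by hand purely for illustration, and the function names labels_from_h and labels_from_w are hypothetical. Cluster indices are 0-based here, whereas the text counts clusters from 1.

```python
def labels_from_h(h):
    """Assign data point i (column i of X) to the row index l maximising h[l][i]."""
    k, n = len(h), len(h[0])
    return [max(range(k), key=lambda l: h[l][i]) for i in range(n)]


def labels_from_w(w):
    """Dual view: assign point i (row i of X) to the column index a maximising w[i][a]."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in w]


# Hypothetical factors for k = 2 clusters (values chosen by hand).
H = [[0.9, 0.1, 0.7],
     [0.2, 0.8, 0.3]]
W = [[0.6, 0.1],
     [0.0, 1.2],
     [0.4, 0.5]]

print(labels_from_h(H))  # [0, 1, 0]
print(labels_from_w(W))  # [0, 1, 1]
```

If two clusters tie for the largest value, this sketch picks the one with the lower index. The text does not specify a tie-breaking rule, so that choice is an assumption.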