Nonnegative Matrix Factorization with Integrated Graph and Feature Learning



CHONG PENG and ZHAO KANG, Southern Illinois University at Carbondale

YUNHONG HU, Yuncheng University

JIE CHENG, University of Hawaii at Hilo

QIANG CHENG, Southern Illinois University at Carbondale

Matrix factorization is a useful technique for data representation in many data mining and machine learning tasks. Particularly, for data sets with all nonnegative entries, matrix factorization often requires that factor matrices be nonnegative, leading to nonnegative matrix factorization (NMF). One important application of NMF is for clustering with reduced dimensions of the data represented in the new feature space. In this paper, we propose a new graph regularized NMF method capable of feature learning and apply it to clustering. Unlike existing NMF methods that treat all features in the original feature space equally, our method distinguishes features by incorporating a feature-wise sparse approximation error matrix in the formulation. It enables important features to be more closely approximated by the factor matrices. Meanwhile, the graph of the data is constructed using cleaner features in the feature learning process, which integrates feature learning and manifold learning procedures into a unified NMF model. This distinctly differs from applying the existing graph-based NMF models after feature selection in that, when these two procedures are independently used, they often fail to align themselves toward obtaining a compact and most expressive data representation. Comprehensive experimental results demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art algorithms when applied to clustering.

CCS Concepts: • Information systems → Clustering; • Theory of computation → Unsupervised learning and clustering; • Computing methodologies → Cluster analysis; Dimensionality reduction and manifold learning; Non-negative matrix factorization;

Additional Key Words and Phrases: Non-negative matrix factorization, manifold learning, feature learning, clustering

ACM Reference Format:

Chong Peng, Zhao Kang, Yunhong Hu, Jie Cheng, and Qiang Cheng. 2017. Nonnegative matrix factorization with integrated graph and feature learning. ACM Trans. Intell. Syst. Technol. 8, 3, Article 42 (January 2017), 29 pages.

DOI: http://dx.doi.org/10.1145/2987378

This work was supported by the National Science Foundation, under grant IIS-1218712; the National Natural Science Foundation of China, under grant 11241005; and the Shanxi Scholarship Council of China 2015-093. Authors' addresses: C. Peng, Z. Kang, and Q. Cheng (corresponding author), Department of Computer Science, Southern Illinois University, Carbondale, IL, 62901, USA; emails: {pchong, zhao.kang, qcheng}@siu.edu; Y. Hu, Department of Applied Mathematics, Yuncheng University, Yuncheng, Shanxi Province 044000, China; email: [email protected]; J. Cheng, Department of Computer Science and Engineering, University of Hawaii, Hilo, HI 96720, USA; email: [email protected].

© 2017 ACM 2157-6904/2017/01-ART42 $15.00 DOI: http://dx.doi.org/10.1145/2987378

1. INTRODUCTION

High-dimensional data often arise in data-mining and computer vision problems; their dimensionality makes it challenging to learn from examples and thus may limit applications. A data representation that reveals the latent structures of high-dimensional data is usually helpful for further processing. How to find a suitable representation of the data [Cai et al. 2008; Lee and Seung 1999; Li and Ding 2006; Tao et al. 2007] is thus a critical problem in many data-mining and -learning tasks. To this end, a number of methods for finding proper representations have been developed, among which matrix factorization-based methods have been popular [Jolliffe 2002; Liu et al. 2013; Peng et al. 2015; Lee and Seung 1999; Kang et al. 2015]. Matrix factorization finds two or more matrices whose product approximates the original data matrix well. Some widely used techniques include principal component analysis (PCA) [Jolliffe 2002], singular value decomposition (SVD) [Duda et al. 2012], low-rank representation [Liu et al. 2013], and nonnegative matrix factorization (NMF) [Lee and Seung 1999]. Among them, SVD has been one of the most frequently used factorization techniques. Given a data matrix $X \in \mathbb{R}^{d\times n}$ of rank $r$ with $r \le \min(d, n)$, SVD can be expressed as $X = U\Sigma V^T$, where $\Sigma \in \mathbb{R}^{r\times r}$ is a diagonal matrix with $\Sigma_{ii} = \sigma_i$ sorted in non-increasing order, and $U$ and $V$ are column-orthonormal matrices whose $i$th columns $u_i$ and $v_i$ correspond to $\sigma_i$. Here $\sigma_i$, $u_i \in \mathbb{R}^d$, and $v_i \in \mathbb{R}^n$ are called singular values, left singular vectors, and right singular vectors, respectively. SVD can also be expressed as $X = \sum_{i=1}^{r} \sigma_i u_i v_i^T$. By keeping only the largest $k$ singular values, Latent Semantic Indexing (LSI) [Deerwester et al. 1990] approximates $X$ by $X_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. The LSI approximation thus has low rank, because the variations that correspond to the small singular values are eliminated.
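As a concrete illustration of the rank-$k$ construction used by LSI, the following minimal NumPy sketch (our own illustration, not code from Deerwester et al. [1990]) keeps the $k$ largest singular triplets. It assumes a small dense matrix that fits in memory, and the helper name `lsi_approximation` is hypothetical. The final line checks the Eckart–Young property: the Frobenius error of $X_k$ equals the square root of the sum of the discarded squared singular values.

```python
import numpy as np

def lsi_approximation(X: np.ndarray, k: int) -> np.ndarray:
    """Return X_k = sum_{i<=k} sigma_i u_i v_i^T, the rank-k truncated SVD of X."""
    # Compact SVD; NumPy returns the singular values s in non-increasing order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the first k left singular vectors by sigma_i and combine them with v_i^T.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
X = rng.random((8, 6))
X2 = lsi_approximation(X, 2)
s = np.linalg.svd(X, compute_uv=False)
# Eckart-Young: the approximation error is carried entirely by the discarded sigma_i.
print(np.isclose(np.linalg.norm(X - X2), np.sqrt(np.sum(s[2:] ** 2))))  # True
```

Increasing `k` toward the rank of `X` drives the error to zero, which is the sense in which the small singular values carry the eliminated variations.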

For datasets with only nonnegative values, such as images and documents, NMF has been used to represent them with nonnegative bases and coefficients, which naturally leads to parts-based representations [Lee and Seung 1999]. This matches the parts-based representation in the human brain suggested by psychological and physiological evidence in previous studies [Palmer 1977; Wachsmuth et al. 1994; Logothetis and Sheinberg 1996]. NMF was originally proposed to learn parts of objects and to overcome a drawback of LSI, whose basis vectors are difficult to interpret because of their mixed signs. Given a nonnegative $X$ with instances stored as columns, NMF [Lee and Seung 1999, 2001; Paatero and Tapper 1994] finds two nonnegative matrices whose product approximates the original matrix well, denoted as $X \approx UV^T$. The columns of $U$ are the basis vectors, and the rows of $V$ are the coefficient vectors of the data points with respect to the new basis. When the number of basis vectors, that is, the number of columns of $U$, is large, NMF has been proven to be NP-hard [Vavasis 2009]. Arora et al. [2012] have recently given some conditions under which NMF is solvable. More recently, various variants of the original NMF have been developed [Gillis and Vavasis 2014; Guan et al. 2012a; Arora et al. 2012; Guan et al. 2012b; Kuang et al. 2012; Esser et al. 2012; Ozerov and Févotte 2010; Févotte and Idier 2011; Kim et al. 2014]; see Wang and Zhang [2013] for a review. For example, Esser et al. [2012] propose to factorize a matrix into a product of nonnegative matrices with a collaborative convex model, which makes the dictionary coincide with certain observations of the data; Févotte and Idier [2011] develop algorithms for $\beta$-divergence-based models, including the three special cases $\beta = 2$, 1, and 0, which correspond to the Euclidean distance, the Kullback-Leibler divergence, and the Itakura-Saito divergence, respectively. Many algorithms have been developed for optimizing NMF, including multiplicative update rules [Lee and Seung 2001; Lin 2007b], projected gradient descent [Lin 2007a], and alternating nonnegative least squares [Berry et al. 2007]. However, because the objective function of NMF is not jointly convex in $U$ and $V$, finding a global optimum for NMF is generally unrealistic.
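The three special cases mentioned above can be checked numerically. The sketch below uses the standard $\beta$-divergence $d_\beta(x \mid y)$, summed over entries; the function name `beta_divergence` is our own, and we assume strictly positive inputs so that the logarithms and ratios are defined. For $\beta = 2$ the formula gives half the squared Euclidean distance, and for $\beta = 1$ it gives the generalized Kullback-Leibler objective that appears in Equation (2) of Section 2.

```python
import numpy as np

def beta_divergence(x, y, beta: float) -> float:
    """Sum over entries of d_beta(x | y); x and y must be strictly positive."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 0:  # Itakura-Saito divergence
        ratio = x / y
        return float(np.sum(ratio - np.log(ratio) - 1.0))
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(x * np.log(x / y) - x + y))
    # General case; beta = 2 reduces to 0.5 * (x - y)**2 entrywise.
    num = x**beta + (beta - 1.0) * y**beta - beta * x * y**(beta - 1.0)
    return float(np.sum(num / (beta * (beta - 1.0))))

x, y = [1.0, 2.0, 3.0], [2.0, 2.0, 1.0]
print(beta_divergence(x, y, 2))                            # 2.5, i.e., 0.5 * ||x - y||^2
print(beta_divergence(x, y, 1), beta_divergence(y, x, 1))  # the two values differ: not symmetric
```

The general formula is continuous in $\beta$, so the $\beta = 0$ and $\beta = 1$ branches are its limits rather than separate definitions.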

The classic NMF only considers the linear structures of the data, by finding new data points with respect to the new basis, and ignores the nonlinear structures of the data, which are usually important for many applications such as clustering. To learn the latent nonlinear structures of the data, graph-regularized nonnegative matrix factorization (GNMF) considers the intrinsic geometrical structures of the data on a manifold by incorporating a Laplacian regularization [Cai et al. 2011]. By modeling the data space as a manifold embedded in an ambient space and performing NMF on this manifold, GNMF considers both linear and nonlinear relationships of the data points in the original instance space, and thus it is more discriminating than ordinary NMF, which only considers the Euclidean structure of the data [Cai et al. 2011]. This renders GNMF more suitable for clustering than the original NMF. Based on GNMF, robust manifold nonnegative matrix factorization (RMNMF) constructs a robust formulation based on a structured sparsity-inducing norm [Huang et al. 2014]. With an $\ell_{2,1}$ norm on the residual, RMNMF is less sensitive to outlying samples and improves the robustness of NMF [Huang et al. 2014].

NMF algorithms often need to be applied to high-dimensional datasets such as images and documents [Cai et al. 2011; Huang et al. 2014]. Because high-dimensional data usually have many irrelevant or less important features, feature selection can be used to find the important ones [Song et al. 2013; Cheng et al. 2011; Xing et al. 2001; Liu and Motoda 2012; Duda et al. 2012; Nie et al. 2010; Cai et al. 2010]. Although feature selection is a fundamental technique in machine learning and data mining, most feature selection methods are simply designed as a preprocessing step performed before the data are analyzed. Existing NMF formulations usually use a Frobenius norm-based loss function, which treats all features as equally important. Since in reality not all features are equally important, incorporating the idea of feature selection into NMF can potentially boost its performance. However, in unsupervised learning, feature selection is often performed in a supervised fashion that requires a preliminary clustering step, as in multi-class feature selection (MCFS) [Cai et al. 2010]. Moreover, the two-step approach of feature selection followed by further data processing has been found inappropriate for some problems [Chang 1983; Arabie 1994], whereas embedding feature selection into a subspace clustering framework has been shown to be effective for subspace clustering [Peng et al. 2016]. Therefore, it is desirable to design a method that performs feature learning and NMF in a single model. A recent NMF work uses a column-sparse $\ell_{2,1}$-norm-based loss function to account for example-wise noise or outliers [Huang et al. 2014]. Similarly, a row-sparse "norm" can be used to account for feature-wise noise or outliers, so that the loss corresponding to irrelevant or noisy features can be disregarded or relaxed in the approximation.

Based on the idea of marrying feature selection and NMF discussed above, we propose a new NMF method that uses a row-sparse norm so that important features are approximated closely, while incorporating manifold structural information. In this way, we can exploit the intrinsic geometrical structures of the data in the new feature space. Unlike existing methods, which perform factorization on a manifold defined with noisy or irrelevant features in the original space, our method performs NMF on a manifold in the subspace spanned by clean or essential features, and thus approximates important features and captures intrinsic geometrical information more closely. The graph is updated with the selected features in each step of the optimization, so that feature learning and manifold learning are integrated into a unified procedure.

We summarize the main contributions of this article as follows:

• A new NMF method is proposed that seamlessly incorporates feature learning to alleviate the adverse effect of irrelevant or less important features in the NMF approach to unsupervised learning;

• Manifold learning is incorporated into the new NMF method, which enhances its capability of exploiting nonlinear structures of the data. Unlike existing graph-based NMF models, which construct the graph in a preprocessing step based on all features in the original space, the new method constructs the graph based on the essential and most relevant features obtained during the learning process, and thus the graph is less afflicted by irrelevant and grossly corrupted features. Therefore, the proposed NMF method has a powerful data representation ability that integrates feature learning and manifold learning jointly.

• Extensive experimental results verify the effectiveness of the proposed model and algorithm.

The rest of the article is organized as follows. We review related work in Section 2. The proposed model and algorithms are presented in Section 3. Extensive experimental results are discussed in Section 4. Finally, we conclude this article in Section 5.

2. RELATED WORK

Given a nonnegative matrix $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d\times n}$, with each of its columns being an example, NMF finds two nonnegative matrices $U = [u_1, u_2, \ldots, u_r] \in \mathbb{R}^{d\times r}$ and $V = [v_1, v_2, \ldots, v_n]^T \in \mathbb{R}^{n\times r}$ such that $X \approx UV^T$, where $r \ll \min\{d, n\}$. Here, $U$ is the basis matrix, and $V$ is the coefficient matrix. For a dense matrix, the approximation $UV^T$ can thus greatly reduce the rank. Under this new basis, each data point $x_i \in \mathbb{R}^d$ can be represented as $v_i \in \mathbb{R}^r$ with significantly reduced dimension, for $i = 1, 2, \ldots, n$. The Frobenius norm and the divergence are two commonly used loss functions for the fitting error [Paatero and Tapper 1994; Lee and Seung 2001]. Their corresponding NMF models solve the following optimization problems:

$$\min_{U,V} \|X - UV^T\|_F^2, \quad \text{s.t. } U \ge 0,\ V \ge 0 \tag{1}$$

and

$$\min_{U,V} \sum_{i,j} \left( X_{ij}\log\frac{X_{ij}}{Y_{ij}} - X_{ij} + Y_{ij} \right), \quad \text{s.t. } Y_{ij} = [UV^T]_{ij},\ U \ge 0,\ V \ge 0, \tag{2}$$

where $\|\cdot\|_F$ is the Frobenius norm. Equation (2) minimizes the divergence of $X$ from $Y$ instead of the distance between $X$ and $Y$; unlike a distance, this divergence is not symmetric in $X$ and $Y$. When $\sum_{i,j} X_{ij} = \sum_{i,j} Y_{ij} = 1$, the objective of Equation (2) reduces to the Kullback-Leibler divergence, or relative entropy. Equation (1) is more widely used in NMF formulations because it is easier to optimize; in this article, we focus on Frobenius norm-based formulations. The objective function of Equation (1) is not jointly convex in $U$ and $V$, although it is convex in $U$ or $V$ individually. Lee and Seung [2001] proposed the following multiplicative updating strategy, under which the objective of Equation (1) is nonincreasing:

$$U_{ij} \leftarrow U_{ij}\frac{(XV)_{ij}}{(UV^TV)_{ij}} \tag{3}$$

and

$$V_{ij} \leftarrow V_{ij}\frac{(X^TU)_{ij}}{(VU^TU)_{ij}}. \tag{4}$$

Equation (1) learns a parts-based representation in the Euclidean space. The intrinsic geometrical structure of the data is not considered, and thus information important for real-world applications is not preserved. To overcome this limitation, a natural assumption is that if two data points $x_i$ and $x_j$ are close in the intrinsic geometry of the data distribution, then their representations $v_i$ and $v_j$ with respect to the new basis $U$ are also close [Cai et al. 2011]. Based on this assumption, the following quantity can be defined to measure the smoothness of the low-dimensional representation:

$$\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\|v_i - v_j\|_2^2\, W_{ij} = \sum_{j=1}^{n} D_{jj}\, v_j^T v_j - \sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}\, v_i^T v_j = \mathrm{Tr}(V^TDV) - \mathrm{Tr}(V^TWV) = \mathrm{Tr}(V^TLV), \tag{5}$$

where $W$ is the (symmetric) weight matrix that measures the pairwise similarities of the original data points, $D$ is a diagonal matrix with $D_{ii} = \sum_j W_{ij}$, and $L = D - W$ is the graph Laplacian [Chung 1997]. In spectral graph theory [Chung 1997] and manifold-learning theory [Belkin and Niyogi 2001], it has been demonstrated that a nearest-neighbor graph on a scatter of data points can effectively model the local geometric structure [Cai et al. 2011]. For a data point $x_i$, there is an edge connecting it with $x_j$ if $x_j$ is among the $p$ nearest neighbors of $x_i$. Three commonly used methods to define the weight matrix $W$ in Equation (5) are as follows:

• 0-1 Weighting. $W_{ij} = 1$ if nodes $i$ and $j$ are connected; otherwise $W_{ij} = 0$. It is the simplest weighting method.

• Heat Kernel Weighting. $W_{ij} = e^{-\|x_i - x_j\|_2^2/\sigma}$ if nodes $i$ and $j$ are connected; otherwise $W_{ij} = 0$. The heat kernel has an intrinsic connection to the Laplace-Beltrami operator on differentiable functions on a manifold [Belkin and Niyogi 2001].

• Dot-Product Weighting. $W_{ij} = x_i^T x_j$ if nodes $i$ and $j$ are connected; otherwise $W_{ij} = 0$. For normalized $x_i$ and $x_j$, dot-product weighting is equivalent to cosine similarity.
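The graph construction and the identity in Equation (5) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the helper names `knn_weights` and `laplacian` are hypothetical, the data points are the columns of `X` as above, and we assume the $p$-nearest-neighbor relation is symmetrized (an edge exists if either point is among the other's $p$ nearest neighbors), since Equation (5) relies on a symmetric $W$.

```python
import numpy as np

def knn_weights(X: np.ndarray, p: int = 5, weighting: str = "binary",
                sigma: float = 1.0) -> np.ndarray:
    """Symmetric weight matrix of a p-nearest-neighbor graph over the columns of X."""
    n = X.shape[1]
    G = X.T @ X                                   # pairwise dot products x_i^T x_j
    sq = np.diag(G)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * G, 0.0)
    np.fill_diagonal(dist2, np.inf)               # a point is not its own neighbor
    nbrs = np.argsort(dist2, axis=1)[:, :p]       # indices of the p nearest neighbors
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), p), nbrs.ravel()] = True
    A = A | A.T                                   # symmetrize the neighbor relation
    if weighting == "binary":                     # 0-1 weighting
        return A.astype(float)
    if weighting == "heat":                       # heat kernel weighting
        return np.where(A, np.exp(-dist2 / sigma), 0.0)
    if weighting == "dot":                        # dot-product weighting
        return np.where(A, G, 0.0)
    raise ValueError(f"unknown weighting: {weighting}")

def laplacian(W: np.ndarray):
    """Degree matrix D (D_ii = sum_j W_ij) and graph Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))
    return D, D - W

rng = np.random.default_rng(0)
X = rng.random((4, 12))                          # 12 data points in R^4
W = knn_weights(X, p=3)
D, L = laplacian(W)
V = rng.random((12, 2))                          # row j is the representation v_j^T
smoothness = 0.5 * sum(W[i, j] * np.sum((V[i] - V[j]) ** 2)
                       for i in range(12) for j in range(12))
print(np.isclose(smoothness, np.trace(V.T @ L @ V)))  # True
```

Without the symmetrization step, row sums and column sums of $W$ could differ and the middle equality in Equation (5) would not hold in general.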

Incorporating manifold information of Equation (5) into the NMF model, GNMF aims at solving the following model:

$$\min_{U,V} \|X - UV^T\|_F^2 + \beta\,\mathrm{Tr}(V^TLV), \quad \text{s.t. } U \ge 0,\ V \ge 0, \tag{6}$$

where $\beta > 0$ is a tradeoff parameter. In Equation (6), the Frobenius norm is adopted to measure the fitting residual, which is known to be sensitive to noise and outliers. To enhance robustness, RMNMF measures sample-wise fitting residuals with the $\ell_{2,1}$ norm as follows [Huang et al. 2014]:

$$\min_{U,V} \|X - UV^T\|_{2,1} + \beta\,\mathrm{Tr}(V^TLV), \quad \text{s.t. } V \ge 0,\ V^TV = I, \tag{7}$$

where $\|\cdot\|_{2,1}$ sums the $\ell_2$ norms of all columns of a matrix, and $I \in \mathbb{R}^{r\times r}$ is an identity matrix.

3. NMF WITH INTEGRATED GRAPH AND FEATURE LEARNING

3.1. Formulation

For real data in most applications, especially high-dimensional data, not all features are important, and feature selection is often applied before further analysis. Inspired by feature selection, we consider building an unsupervised feature-learning ability into an NMF framework. Existing unsupervised feature selection methods usually use a projection matrix; by optimizing an objective function based on the projection matrix, important features can be selected. However, NMF approximates the data matrix in the original instance space, and no projection matrix is readily available; hence projection matrix-based feature selection methods may not be directly applicable to NMF. This article shows that a projection matrix is not needed to achieve unsupervised feature selection. When some features of $X$ are redundant or less important, the approximation to $X$ does not need to fit those features well. To relax the requirement of approximating these features, we simply require that the approximation residuals be row-sparse. Hence, unlike existing feature selection methods that use a projection matrix, this article exploits the approximation residuals to select features in an unsupervised NMF setting. Also, because the intrinsic geometric structures of the data are important and must be preserved in the new feature space, we follow Cai et al. [2011] and use a graph Laplacian to model the local manifold, as in Equation (6). Putting these together, we propose the following model:

$$\min_{U,V} \mathrm{Tr}(V^TLV), \quad \text{s.t. } \|H\|_0 \le K,\ H_i = \big\|([X - UV^T]_{(i)})^T\big\|_2,\ U \ge 0,\ V \ge 0, \tag{8}$$

where $\|\cdot\|_0$ denotes the vectorial $\ell_0$ norm, $\|\cdot\|_2$ is the $\ell_2$ norm, $A_{(i)}$ represents the $i$th row of a matrix $A$, $H = [H_1, H_2, \ldots, H_d]^T \in \mathbb{R}^d$ is a vector containing the $\ell_2$ norms of the rows of $X - UV^T$, and $K > 0$ controls the sparsity of $H$. For a matrix $M$, let $\|M\|_{(2,0)} = \sum_i \mathbb{1}\{\|M_{(i)}\|_2 \neq 0\}$, where $\mathbb{1}\{\cdot\}$ is an indicator function. Hence, $\|M\|_{(2,0)}$ counts the number of nonzero $\ell_2$ norms of the row vectors of $M$, measuring the row sparsity of the matrix in the same way that the $\ell_0$ norm measures the sparsity of a vector. Because Equation (8) is hard to optimize, we use the following regularized formulation of Equation (8):

$$\min_{U,V} \|X - UV^T\|_{(2,0)} + \beta\,\mathrm{Tr}(V^TLV), \quad \text{s.t. } U \ge 0,\ V \ge 0, \tag{9}$$

where $\beta$ is a tradeoff parameter. Different weighting schemes can be adopted to define the graph Laplacian term in Equation (9); however, exploring them is beyond the scope of this article. Following Cai et al. [2011], we use the 0-1 weighting scheme for simplicity, but our model can be readily extended to other weighting schemes. As described in Huang et al. [2014], RMNMF captures the local geometric structure of the data distribution by incorporating a $p$-nearest-neighbor graph Laplacian. Its underlying assumption, that two neighboring points share the same label, may fail to hold when $p$ increases. Therefore, similarly to Cai et al. [2011] and Huang et al. [2014], we use $p = 5$ by default in this article, and in Subsection 4.4 we empirically investigate the effect of $p$.
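The row-sparsity measure $\|M\|_{(2,0)}$ defined above, and the vector $H$ of row norms in Equation (8), are straightforward to compute. The following sketch is illustrative only; the helper names are hypothetical, and `l21_rows` implements the row-wise $\ell_{2,1}$ norm that replaces $\|\cdot\|_{(2,0)}$ in the relaxation of Equation (11), introduced later in this section. Since the residual $X - UV^T$ is $d \times n$, each row corresponds to one feature.

```python
import numpy as np

def row_norms(M: np.ndarray) -> np.ndarray:
    """Vector H with H_i = ||M_(i)||_2, the l2 norm of the i-th row of M."""
    return np.linalg.norm(M, axis=1)

def l20_rows(M: np.ndarray) -> int:
    """||M||_(2,0): the number of rows of M with a nonzero l2 norm (equals ||H||_0)."""
    return int(np.count_nonzero(row_norms(M)))

def l21_rows(M: np.ndarray) -> float:
    """||M||_(2,1): the sum of the row-wise l2 norms of M."""
    return float(np.sum(row_norms(M)))

R = np.array([[3.0, 4.0, 0.0],    # residual rows: features 1 and 3 are not fit exactly,
              [0.0, 0.0, 0.0],    # feature 2 is fit exactly
              [0.0, 1.0, 0.0]])
print(row_norms(R))   # [5. 0. 1.]
print(l20_rows(R))    # 2
print(l21_rows(R))    # 6.0
```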

The first term in Equation (9) requires that important features be exactly recovered by $UV^T$, while less important ones are ignored. Equation (9) is similar to the formulation of Huang et al. [2014], so a similar optimization can be employed; for example, augmented Lagrange multiplier (ALM)-based optimization can also be used to optimize Equation (9). However, as pointed out by Huang et al. [2014], ALM-based optimization needs to solve SVDs at each iteration, which is inefficient. To overcome this computational drawback, we introduce a new variable $E$, in a similar vein to the slack variables used in support vector machines, and further relax Equation (9) as follows:

$$\min_{U,V,E} \|X - UV^T - E\|_F^2 + \alpha\|E\|_{(2,0)} + \beta\,\mathrm{Tr}(V^TLV), \quad \text{s.t. } U \ge 0,\ V \ge 0. \tag{10}$$

Here $\alpha$ and $\beta$ are two tradeoff parameters, and $E \in \mathbb{R}^{d\times n}$ represents the irrelevant or inessential features that the factorization does not need to approximate closely. By introducing $E$ into the objective function, feature-wise noise or noisy features are captured, so that the data restricted to clean or useful features are better approximated. Moreover, the term $\|E\|_{(2,0)}$ in Equation (10) is simpler than $\|X - UV^T\|_{(2,0)}$ in Equation (9) and thus easier to optimize. As is commonly done in the literature to relax the $\ell_0$ norm with the $\ell_1$ norm, we also relax Equation (9) into another formulation:

$$\min_{U,V,E} \|X - UV^T - E\|_F^2 + \alpha\|E\|_{(2,1)} + \beta\,\mathrm{Tr}(V^TLV), \quad \text{s.t. } U \ge 0,\ V \ge 0, \tag{11}$$

where $\|M\|_{(2,1)} = \sum_i \|M_{(i)}\|_2$ is the sum of the row-wise $\ell_2$ norms of a matrix $M$. By tuning $\alpha$ in Equations (10) and (11), we can control the number of important features that are well approximated. However, a drawback of Equations (10) and (11) is that the graph Laplacian is still constructed using all features of the original data, which may include many irrelevant features or noise, making the graph Laplacian potentially highly noisy. In other words, Equations (10) and (11) fail to incorporate feature learning into the construction of the manifold. To address this limitation, we further require that the factorization respect the intrinsic manifold structure in the subspace of clean or essential features. Inspired by Wang et al. [2014], we construct the manifold structure based on $X - E$ and adjust it along with the updates of $E$. Then, using the weighting scheme mentioned previously, the graph Laplacian is updated accordingly. In this article, $\tilde{W}$ (respectively, $\tilde{D}$ and $\tilde{L}$) is constructed from $X - E$, whereas $W$ (respectively, $D$ and $L$) is constructed from $X$. Therefore, Equations (10) and (11) are further developed into

$$\min_{U,V,E} \|X - UV^T - E\|_F^2 + \alpha\|E\|_{(2,0)} + \beta\,\mathrm{Tr}(V^T\tilde{L}V), \quad \text{s.t. } U \ge 0,\ V \ge 0, \tag{12}$$

and

$$\min_{U,V,E} \|X - UV^T - E\|_F^2 + \alpha\|E\|_{(2,1)} + \beta\,\mathrm{Tr}(V^T\tilde{L}V), \quad \text{s.t. } U \ge 0,\ V \ge 0, \tag{13}$$

respectively. The solutions to Equations (12) and (13) are not unique. To see this, suppose we have obtained a solution $\{U^*, V^*, E^*\}$ to either model. For a scalar $a > 1$, $\{aU^*, \frac{1}{a}V^*, E^*\}$ leaves $UV^T$ unchanged but scales $\mathrm{Tr}(V^T\tilde{L}V)$ by $1/a^2$, so the objective value does not increase as $a$ grows, and strictly decreases whenever the graph term is positive. Therefore, a trivial solution is obtained as $a \to \infty$. Two approaches are commonly used to resolve this problem. The first approach, as in Huang et al. [2014], is to add the constraint $V^TV = I$, as in Equation (7). The second approach, as in Cai et al. [2011], is to apply an ad hoc step that normalizes the columns of $V^*$ and scales $U^*$ correspondingly. In this article, we adopt the second approach, which is more convenient for optimization. We refer to the proposed method as NMF with integrated graph and feature learning (NMF2L); in particular, Equations (12) and (13) are referred to as NMF2L$_{(2,0)}$ and NMF2L$_{(2,1)}$, respectively.
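The scaling ambiguity and the ad hoc normalization can be checked numerically. In the sketch below, `normalize_columns_of_V` is a hypothetical helper, and the Laplacian is only a stand-in built from a random symmetric 0-1 weight matrix rather than from data. It shows that $(aU, V/a)$ leaves $UV^T$ unchanged while dividing the graph term by $a^2$, and that normalizing the columns of $V$ and rescaling $U$ removes this freedom without changing the product.

```python
import numpy as np

def normalize_columns_of_V(U: np.ndarray, V: np.ndarray):
    """Give each column of V unit l2 norm and rescale U so that U @ V.T is unchanged."""
    norms = np.linalg.norm(V, axis=0)
    norms = np.where(norms > 0, norms, 1.0)    # leave all-zero columns untouched
    return U * norms, V / norms

rng = np.random.default_rng(0)
U, V = rng.random((5, 3)), rng.random((8, 3))
A = np.triu(rng.random((8, 8)) < 0.4, 1).astype(float)
W = A + A.T                                    # random symmetric 0-1 weights (stand-in)
L = np.diag(W.sum(axis=1)) - W

a = 10.0
graph_term = np.trace(V.T @ L @ V)
print(np.allclose((a * U) @ (V / a).T, U @ V.T))                          # True
print(np.isclose(np.trace((V / a).T @ L @ (V / a)), graph_term / a**2))   # True

Un, Vn = normalize_columns_of_V(U, V)
print(np.allclose(Un @ Vn.T, U @ V.T), np.allclose(np.linalg.norm(Vn, axis=0), 1.0))  # True True
```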

3.2. Optimization with Multiplicative Updates

For the optimization, we first optimize Equation (12) and Equation (13) with respect to U and V while keeping E fixed. We rewrite Equations (12) and (13) as

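$$\mathrm{Tr}\big((X-E)(X-E)^T\big) - 2\,\mathrm{Tr}\big((X-E)VU^T\big) + \mathrm{Tr}(UV^TVU^T) + \alpha\|E\|_{(*)} + \beta\,\mathrm{Tr}(V^T\tilde{L}V), \quad \text{s.t. } U \ge 0,\ V \ge 0, \tag{14}$$

where $\|\cdot\|_{(*)}$ denotes $\|\cdot\|_{(2,0)}$ for Equation (12) and $\|\cdot\|_{(2,1)}$ for Equation (13).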

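For the constrained optimization with respect to $U$ and $V$, we use the Lagrange multiplier method and obtain the Lagrangian

$$\mathcal{L} = \mathrm{Tr}\big((X-E)(X-E)^T\big) - 2\,\mathrm{Tr}\big((X-E)VU^T\big) + \mathrm{Tr}(UV^TVU^T) + \alpha\|E\|_{(*)} + \beta\,\mathrm{Tr}(V^T\tilde{L}V) + \mathrm{Tr}(\Psi U^T) + \mathrm{Tr}(\Phi V^T), \tag{15}$$

where $\Psi = [\psi_{ij}]$ and $\Phi = [\phi_{ij}]$ are the Lagrange multipliers for $U \ge 0$ and $V \ge 0$, respectively. By the first-order optimality conditions, we have

$$\nabla_U \mathcal{L} = -2(X-E)V + 2UV^TV + \Psi = 0 \tag{16}$$

and

$$\nabla_V \mathcal{L} = -2(X-E)^TU + 2VU^TU + 2\beta\tilde{L}V + \Phi = 0. \tag{17}$$

By the Karush–Kuhn–Tucker (KKT) conditions, $\psi_{ij}U_{ij} = 0$ and $\phi_{ij}V_{ij} = 0$; using $\tilde{L} = \tilde{D} - \tilde{W}$, we get the following element-wise updating rules for $U$ and $V$:

$$U_{ij} \leftarrow U_{ij}\frac{[(X-E)V]_{ij}}{(UV^TV)_{ij}} \tag{18}$$

and

$$V_{ij} \leftarrow V_{ij}\frac{[(X-E)^TU + \beta\tilde{W}V]_{ij}}{(VU^TU + \beta\tilde{D}V)_{ij}}. \tag{19}$$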
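As a worked expansion of the KKT step above (our own restatement; Equation (18) follows in the same way from Equation (16)), multiplying the $(i,j)$ entry of Equation (17) by $V_{ij}$, using $\phi_{ij}V_{ij} = 0$, and substituting $\tilde{L} = \tilde{D} - \tilde{W}$ gives

$$
\begin{aligned}
&\big[-2(X-E)^TU + 2VU^TU + 2\beta(\tilde{D} - \tilde{W})V\big]_{ij}\,V_{ij} = 0\\
\Longleftrightarrow\;& \big[VU^TU + \beta\tilde{D}V\big]_{ij}\,V_{ij} = \big[(X-E)^TU + \beta\tilde{W}V\big]_{ij}\,V_{ij}.
\end{aligned}
$$

Equation (19) is the fixed-point iteration of the second line. Because every term in its numerator and denominator is nonnegative whenever $U$, $V$, $\tilde{W}$, $\tilde{D}$, and $X - E$ are nonnegative, the update cannot make an entry of $V$ negative.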

After updating $U$ and $V$, we keep them fixed and update $E$. Equations (12) and (13) then reduce to the following subproblem:

$$\min_{E} \|X - UV^T - E\|_F^2 + \alpha\|E\|_{(*)}. \tag{20}$$

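To solve Equation (20), we need the following two lemmas.

LEMMA 3.1. Given a matrix $W = [w_1, w_2, \ldots, w_m]^T \in \mathbb{R}^{m\times n}$ (here a generic matrix, not the graph weight matrix), whose $i$th row is $w_i^T$, and a positive scalar $\lambda$, the optimal solution of

$$\min_{X} \|W - X\|_F^2 + \lambda\|X\|_{(2,0)} \tag{21}$$

is $X^*$ with the $i$th row being

$$X^*_{(i)} = \begin{cases} w_i^T, & \text{if } \|w_i\|_2^2 > \lambda,\\ 0, & \text{otherwise}. \end{cases} \tag{22}$$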

PROOF. Equation (21) is equivalent to the following optimization problem:

$$\min_{X} \sum_{i=1}^{m} \Big( \|W_{(i)} - X_{(i)}\|_2^2 + \lambda\,\mathbb{1}\{\|X_{(i)}\|_2 \neq 0\} \Big), \tag{23}$$

which can be solved for each $i$ in a decoupled manner:

$$\min_{X_{(i)}} \|W_{(i)} - X_{(i)}\|_2^2 + \lambda\,\mathbb{1}\{\|X_{(i)}\|_2 \neq 0\}. \tag{24}$$

Regarding the value of the objective in Equation (24), we have the following four cases:

• $W_{(i)} = 0$ and $X_{(i)} = 0$: the value is $0$;

• $W_{(i)} = 0$ and $X_{(i)} \neq 0$: the value is $\|X_{(i)}\|_2^2 + \lambda > \lambda$;

• $W_{(i)} \neq 0$ and $X_{(i)} = 0$: the value is $\|W_{(i)}\|_2^2$;

• $W_{(i)} \neq 0$ and $X_{(i)} \neq 0$: the value is $\|W_{(i)} - X_{(i)}\|_2^2 + \lambda \ge \lambda$, with equality when $X_{(i)} = W_{(i)}$.

It is now easy to check that Equation (22) attains the minimal value of Equation (24), namely $\min\{\|W_{(i)}\|_2^2, \lambda\}$. Hence, $X^*$ is the optimal solution to Equation (21).

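LEMMA 3.2 [HUANG ET AL. 2014]. Given a matrix $W = [w_1, w_2, \ldots, w_n] \in \mathbb{R}^{m\times n}$ and a positive scalar $\lambda$, the optimal solution $X^*$ of

$$\min_{X} \frac{1}{2}\|W - X\|_F^2 + \lambda\|X\|_{2,1} \tag{25}$$

has $i$th column

$$X^*_{i} = \begin{cases} \dfrac{\|w_i\|_2 - \lambda}{\|w_i\|_2}\, w_i, & \text{if } \|w_i\|_2 > \lambda,\\ 0, & \text{otherwise}. \end{cases} \tag{26}$$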

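According to Lemmas 3.1 and 3.2, we have the following two row-wise updating rules, which correspond to Equations (12) and (13), respectively (Lemma 3.1 is applied with $\lambda = \alpha$; Lemma 3.2 is applied to $Q^T$ with $\lambda = \alpha/2$, after dividing the objective of Equation (20) by 2):

$$E_{(i)} = \begin{cases} Q_{(i)}, & \text{if } \|(Q_{(i)})^T\|_2^2 > \alpha,\\ 0, & \text{otherwise}, \end{cases} \tag{27}$$

and

$$E_{(i)} = \begin{cases} \dfrac{\|Q_{(i)}\|_2 - \alpha/2}{\|Q_{(i)}\|_2}\, Q_{(i)}, & \text{if } \|(Q_{(i)})^T\|_2 > \alpha/2,\\ 0, & \text{otherwise}, \end{cases} \tag{28}$$

where $Q = X - UV^T$.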
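Both closed-form $E$ updates act independently on the rows of $Q = X - UV^T$: Equation (27) is a row-wise hard threshold, and Equation (28) is a row-wise shrinkage (a group soft threshold). A minimal NumPy sketch follows; `update_E`, `objective`, and the `mode` switch are hypothetical names of our own. The usage check compares the $\ell_{2,1}$ solution against random perturbations, which cannot do better because Equation (20) with $\|\cdot\|_{(2,1)}$ is strongly convex.

```python
import numpy as np

def update_E(Q: np.ndarray, alpha: float, mode: str = "2,1") -> np.ndarray:
    """Row-wise minimizer of ||Q - E||_F^2 + alpha * ||E||_(mode), with Q = X - U V^T."""
    norms = np.linalg.norm(Q, axis=1, keepdims=True)          # ||Q_(i)||_2 for each row
    if mode == "2,0":
        # Equation (27): keep row i unchanged iff ||Q_(i)||_2^2 > alpha, else zero it.
        return np.where(norms**2 > alpha, Q, 0.0)
    if mode == "2,1":
        # Equation (28): shrink the row norm by alpha/2; rows with norm <= alpha/2 vanish.
        scale = np.maximum(norms - alpha / 2.0, 0.0) / np.maximum(norms, np.finfo(float).tiny)
        return scale * Q
    raise ValueError(f"unknown mode: {mode}")

def objective(Q, E, alpha, mode):
    """Value of Equation (20) for a given E."""
    penalty = np.linalg.norm(E, axis=1)
    reg = np.count_nonzero(penalty) if mode == "2,0" else penalty.sum()
    return np.linalg.norm(Q - E) ** 2 + alpha * reg

Q = np.array([[3.0, 4.0], [0.1, 0.1], [-1.0, 0.0]])
print(update_E(Q, 1.0, "2,0"))   # keeps only the first row
print(update_E(Q, 1.0, "2,1"))   # rows scaled by 0.9, 0, and 0.5

rng = np.random.default_rng(0)
E = update_E(Q, 1.0, "2,1")
f = objective(Q, E, 1.0, "2,1")
print(all(objective(Q, E + 1e-3 * rng.standard_normal(Q.shape), 1.0, "2,1") >= f
          for _ in range(100)))  # True
```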

After obtaining $E$, we update $\tilde{W}$, $\tilde{D}$, and $\tilde{L}$ accordingly based on $X - E$. In summary, we outline the procedure for solving Equations (12) and (13) in Algorithm 1.

ALGORITHM 1: Solving Equation (12) or Equation (13) by Multiplicative Updates
1: Input: Parameters $\alpha$, $\beta$, maximum iteration number $t_{\max}$, and data matrix $X$.
2: Initialize: $U^{(0)}$, $V^{(0)}$, $E^{(0)}$, $\tilde{W}$, $\tilde{D}$, and $k = 0$.
3: Repeat
4: $[U^{(k+1)}]_{ij} = [U^{(k)}]_{ij}\dfrac{[(X - E^{(k)})V^{(k)}]_{ij}}{[U^{(k)}(V^{(k)})^TV^{(k)}]_{ij}}$;
5: $[V^{(k+1)}]_{ij} = [V^{(k)}]_{ij}\dfrac{[(X - E^{(k)})^TU^{(k+1)} + \beta\tilde{W}V^{(k)}]_{ij}}{[V^{(k)}(U^{(k+1)})^TU^{(k+1)} + \beta\tilde{D}V^{(k)}]_{ij}}$;
6: with $Q = X - U^{(k+1)}(V^{(k+1)})^T$: for (12), $[E^{(k+1)}]_{(i)} = Q_{(i)}\,\mathbb{1}\{\|Q_{(i)}\|_2^2 > \alpha\}$; for (13), $[E^{(k+1)}]_{(i)} = \dfrac{\|Q_{(i)}\|_2 - \alpha/2}{\|Q_{(i)}\|_2}\,Q_{(i)}\,\mathbb{1}\{\|Q_{(i)}\|_2 > \alpha/2\}$;
7: Update $\tilde{W}$ and $\tilde{D}$ according to $X - E^{(k+1)}$;
8: $k = k + 1$;
9: Until $k \ge t_{\max}$ or convergence.

3.3. Parts-Based Representations

For nonnegative matrix factorization, it is essential to constrain the basis and coefficient matrices to be nonnegative in order to obtain parts-based representations. The updating rules of Equations (18), (19), (27), and (28) guarantee that Algorithm 1 learns parts-based representations for the proposed methods, as shown by the following Theorems 3.1 and 3.2.

THEOREM 3.1. Given nonnegative initializations of $U$ and $V$ and the zero initialization of $E$, the matrix sequence $\{U^{(k)}, V^{(k)}, (X - E)^{(k)}\}$ is nonnegative under the updating rules of Equations (18), (19), and (27), and thus Algorithm 1 can learn parts-based representations for NMF2L$_{(2,0)}$.

PROOF. At iteration 0, the nonnegativity of $U^{(0)}$ and $V^{(0)}$ is guaranteed by the initializations, and $(X - E)^{(0)} = X \ge 0$ because $E^{(0)} = 0$.

Assume that $U^{(k)} \ge 0$, $V^{(k)} \ge 0$, and $(X - E)^{(k)} \ge 0$; we show that $U^{(k+1)} \ge 0$, $V^{(k+1)} \ge 0$, and $(X - E)^{(k+1)} \ge 0$ hold. The nonnegativity of $U^{(k+1)}$ and $V^{(k+1)}$ follows directly from Equations (18) and (19), because every factor on their right-hand sides is nonnegative. From Equation (27), $[E^{(k+1)}]_{(i)}$ is either $0$ or $[(X - UV^T)^{(k+1)}]_{(i)}$; hence $[(X - E)^{(k+1)}]_{(i)}$ is either $X_{(i)}$ or $[(UV^T)^{(k+1)}]_{(i)}$, both of which are nonnegative. Therefore, by mathematical induction, the sequence $\{U^{(k)}, V^{(k)}, (X - E)^{(k)}\}$ is nonnegative.

THEOREM 3.2. Given nonnegative initializations of $U$ and $V$ and the zero initialization of $E$, the matrix sequence $\{U^{(k)}, V^{(k)}, (X - E)^{(k)}\}$ is nonnegative under the updating rules of Equations (18), (19), and (28), and thus Algorithm 1 can learn parts-based representations for NMF2L$_{(2,1)}$.

PROOF. The nonnegativity of $U^{(0)}$, $V^{(0)}$, and $(X - E)^{(0)} = X$ is guaranteed by the initializations, so the claim holds for $k = 0$.

Next, we show that $U^{(k+1)} \ge 0$, $V^{(k+1)} \ge 0$, and $(X - E)^{(k+1)} \ge 0$ whenever $U^{(k)} \ge 0$, $V^{(k)} \ge 0$, and $(X - E)^{(k)} \ge 0$, for any $k \ge 0$. If all elements of $U^{(k)}$, $V^{(k)}$, and $(X - E)^{(k)}$ are nonnegative at iteration $k$, then $U^{(k+1)} \ge 0$ and $V^{(k+1)} \ge 0$ by simple algebra using Equations (18) and (19). Denote $Q^{(k+1)} = (X - UV^T)^{(k+1)}$. From Equation (28), $[E^{(k+1)}]_{(i)}$ is either $0$ or $\frac{\|[Q^{(k+1)}]_{(i)}\|_2 - \alpha/2}{\|[Q^{(k+1)}]_{(i)}\|_2}[Q^{(k+1)}]_{(i)}$. If $[E^{(k+1)}]_{(i)} = 0$, then $[X - E^{(k+1)}]_{(i)} = X_{(i)} \ge 0$. Otherwise, we have

$$
\begin{aligned}
[X - E^{(k+1)}]_{(i)} &= X_{(i)} - [Q^{(k+1)}]_{(i)} + \frac{\alpha}{2}\frac{[Q^{(k+1)}]_{(i)}}{\|[Q^{(k+1)}]_{(i)}\|_2}\\
&= [(UV^T)^{(k+1)}]_{(i)} + \frac{\alpha}{2}\frac{X_{(i)} - [(UV^T)^{(k+1)}]_{(i)}}{\|[Q^{(k+1)}]_{(i)}\|_2}\\
&= \frac{\alpha}{2}\frac{X_{(i)}}{\|[Q^{(k+1)}]_{(i)}\|_2} + [(UV^T)^{(k+1)}]_{(i)}\frac{\|[Q^{(k+1)}]_{(i)}\|_2 - \alpha/2}{\|[Q^{(k+1)}]_{(i)}\|_2}.
\end{aligned} \tag{29}
$$

Since $\|[Q^{(k+1)}]_{(i)}\|_2 > \alpha/2$ in this case, the right-hand side of Equation (29) is nonnegative. Therefore, by mathematical induction, the sequence $\{U^{(k)}, V^{(k)}, (X - E)^{(k)}\}$ is nonnegative.


Proving the convergence of either the objective value sequence or the point sequence is difficult. In the special case where $E$ is fixed (and hence $\tilde{W}$, $\tilde{D}$, and $\tilde{L}$ are fixed), the updating rules guarantee that the objective value sequence is nonincreasing; the proof essentially follows Cai et al. [2011]. Moreover, if $E$ is fixed to the zero matrix, the proposed models reduce to GNMF.
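Putting Equations (18), (19), (27), and (28) together, Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical; the graph uses 0-1 weights on a symmetrized $p$-nearest-neighbor relation, rebuilt from $X - E$ at every iteration (step 7); a small `eps` guards the denominators; a fixed iteration count replaces a convergence test; and the column normalization of $V$ described in Section 3.1 is applied once at the end. The checks mirror Theorems 3.1 and 3.2 and the reduction to GNMF when $E$ stays zero.

```python
import numpy as np

def binary_knn_graph(Z: np.ndarray, p: int) -> np.ndarray:
    """0-1 weights of a symmetrized p-nearest-neighbor graph over the columns of Z."""
    n = Z.shape[1]
    sq = np.sum(Z**2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (Z.T @ Z)
    np.fill_diagonal(dist2, np.inf)                  # exclude self-loops
    nbrs = np.argsort(dist2, axis=1)[:, :p]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), p), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)

def shrink_rows(Q: np.ndarray, alpha: float, mode: str) -> np.ndarray:
    """E-update: Equation (27) for mode '2,0', Equation (28) for mode '2,1'."""
    norms = np.linalg.norm(Q, axis=1, keepdims=True)
    if mode == "2,0":
        return np.where(norms**2 > alpha, Q, 0.0)
    scale = np.maximum(norms - alpha / 2.0, 0.0) / np.maximum(norms, np.finfo(float).tiny)
    return scale * Q

def nmf2l(X, r, alpha, beta, p=5, mode="2,1", t_max=100, eps=1e-10, seed=0):
    """Sketch of Algorithm 1 for NMF2L_(2,0) or NMF2L_(2,1)."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U, V = rng.random((d, r)), rng.random((n, r))    # nonnegative initialization
    E = np.zeros_like(X)                             # zero initialization of E
    W = binary_knn_graph(X - E, p)                   # W~ built from X - E
    for _ in range(t_max):
        Y = X - E
        U *= (Y @ V) / (U @ (V.T @ V) + eps)                                      # step 4, Eq. (18)
        deg = W.sum(axis=1)[:, None]                                              # diagonal of D~
        V *= (Y.T @ U + beta * (W @ V)) / (V @ (U.T @ U) + beta * deg * V + eps)  # step 5, Eq. (19)
        E = shrink_rows(X - U @ V.T, alpha, mode)                                 # step 6
        W = binary_knn_graph(X - E, p)                                            # step 7
    norms = np.linalg.norm(V, axis=0)
    norms = np.where(norms > 0, norms, 1.0)
    return U * norms, V / norms, E

rng = np.random.default_rng(1)
X = rng.random((20, 30))
X[:5] += 3.0 * rng.random((5, 30))                   # inflate five feature rows
for mode in ("2,0", "2,1"):
    U, V, E = nmf2l(X, r=4, alpha=2.0, beta=0.1, mode=mode, t_max=50)
    print(mode, U.min() >= 0, V.min() >= 0, (X - E).min() >= -1e-12)
```

Rebuilding the graph inside the loop is what costs $O(dn^2)$ per iteration in the analysis below; caching the graph when $E$ changes little would be one possible shortcut, but it is not part of the method as described.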

3.4. Computational Complexity

In this subsection, we analyze the computational cost of our algorithms in terms of multiplications, as they dominate the total cost. We begin with the cost per iteration. For $U$, we need $dnr + nr^2 + dr^2 + 2dr = O(dnr + nr^2 + dr^2)$ operations per iteration. For $V$, we need $ndr + n^2r + nr + dr^2 + nr^2 + n^2r + nr + 2nr = O(dnr + n^2r + dr^2 + nr^2)$ operations. For the update of $E$, we consider the two cases separately. For NMF2L$_{(2,0)}$, we need $d(n+1) = O(dn)$ operations; for NMF2L$_{(2,1)}$, we need $d(n+1) + (n+1)d = O(dn)$ operations; that is, updating $E$ takes $O(dn)$ operations. Hence, the total number of operations needed to update $U$, $V$, and $E$ per iteration is $O(dnr + nr^2 + dr^2) + O(dnr + n^2r + dr^2 + nr^2) + O(dn) = O(dnr + dr^2 + n^2r + nr^2)$. Besides, updating $\tilde{W}$ and $\tilde{D}$ requires $O(dn^2)$ multiplications per iteration. Supposing that the algorithm terminates in $k$ iterations, the total cost is $O((dnr + dr^2 + n^2r + nr^2)k) + O(kdn^2) = O(dnrk + dr^2k + n^2rk + nr^2k + dn^2k)$.

4. EXPERIMENTS

In this section, we compare the proposed NMF2L with several widely used methods to show its effectiveness. To evaluate the performance of these methods, we use three metrics: clustering accuracy, normalized mutual information (NMI), and purity. Clustering accuracy, ranging from 0 to 1, measures the extent to which each cluster contains data points from the same class, and is given as

$$\text{Accuracy} = \frac{\sum_{i=1}^{n} \delta(\mathrm{map}(s_i), r_i)}{n}, \tag{30}$$

where $n$ is the total number of data points, $s_i$ and $r_i$ represent the predicted and true labels of the data point $x_i$, respectively, $\mathrm{map}(\cdot)$ maps each cluster label $s_i$ to the equivalent label from the dataset by the permutation that maximizes Equation (30), and $\delta(x, y)$ is the delta function that returns 1 when $x = y$ and 0 otherwise. The second measure, NMI, is defined as

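$$\text{NMI} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} n_{i,j}\log\frac{n\,n_{i,j}}{n_i\,\hat{n}_j}}{\sqrt{\left(\sum_{i=1}^{N} n_i\log\frac{n_i}{n}\right)\left(\sum_{j=1}^{N}\hat{n}_j\log\frac{\hat{n}_j}{n}\right)}}, \tag{31}$$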

where $n_i$ and $\hat{n}_j$ denote the sizes of the $i$th cluster and the $j$th class, respectively, $n_{i,j}$ denotes the number of data points in their intersection, and $N$ is the number of clusters. NMI measures the agreement between the obtained clusters and the ground-truth classes. The third measure is defined as

$$\text{Purity} = \frac{1}{n}\sum_{i=1}^{N}\max_{j}\, n_{ij}, \tag{32}$$

where $n_{ij}$ is the number of data points in the $i$th cluster that belong to the $j$th class. Purity measures the extent to which each cluster contains data points from primarily one class. More details about these measures can be found in Nie et al. [2012]. The experiments are described in detail in the following subsections.
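The three measures can be computed directly from predicted and true label vectors. The following is a minimal sketch, not the authors' evaluation code, and the function names are hypothetical. For Equation (30) it searches all cluster-to-class assignments exhaustively, which is only practical for a handful of clusters; the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment`) is the usual choice for larger $N$. It uses natural logarithms in Equation (31), and the base cancels in the ratio.

```python
from itertools import permutations

import numpy as np


def contingency(pred, true):
    """counts[i, j] = number of points in the i-th cluster and the j-th class."""
    _, p = np.unique(np.asarray(pred), return_inverse=True)
    _, t = np.unique(np.asarray(true), return_inverse=True)
    counts = np.zeros((p.max() + 1, t.max() + 1))
    np.add.at(counts, (p, t), 1)
    return counts


def accuracy(pred, true):
    """Eq. (30): best one-to-one relabeling of clusters, found by brute force."""
    counts = contingency(pred, true)
    n_clusters, n_classes = counts.shape
    if n_clusters > n_classes:
        raise ValueError("this sketch assumes no more clusters than classes")
    best = max(
        sum(counts[i, j] for i, j in enumerate(perm))
        for perm in permutations(range(n_classes), n_clusters)
    )
    return best / counts.sum()


def nmi(pred, true):
    """Eq. (31) with natural logs; undefined if there is only one cluster or class."""
    counts = contingency(pred, true)
    n = counts.sum()
    n_i = counts.sum(axis=1)        # cluster sizes
    n_hat = counts.sum(axis=0)      # class sizes
    nz = counts > 0                 # terms with n_ij = 0 contribute nothing
    size_products = np.outer(n_i, n_hat)
    mutual = np.sum(counts[nz] * np.log(n * counts[nz] / size_products[nz]))
    h_clusters = np.sum(n_i * np.log(n_i / n))
    h_classes = np.sum(n_hat * np.log(n_hat / n))
    return mutual / np.sqrt(h_clusters * h_classes)


def purity(pred, true):
    """Eq. (32): each cluster is credited with its most frequent class."""
    counts = contingency(pred, true)
    return counts.max(axis=1).sum() / counts.sum()


pred = [0, 0, 1, 1, 1, 2]
true = [0, 0, 1, 1, 2, 2]
print(accuracy(pred, true), nmi(pred, true), purity(pred, true))
```

In this small example one point of the third class is absorbed by the second cluster, so both accuracy and purity equal 5/6. Purity never falls below accuracy, because it lets several clusters claim the same class.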


Table I. Data Description

Data | Pen Digits | COIL20 | EYaleB | ORL | Jaffe | Semeion
Size (n × d) | 10,992×16 | 1,440×1,024 | 2,414×1,024 | 400×1,024 | 213×676 | 1,593×256
N (number of classes) | 10 | 20 | 38 | 40 | 10 | 10

4.1. Data Sets

We use six datasets, Pen Digits¹ [Lichman 2013], Semeion² [Lichman 2013], COIL20³ [Nene et al. 1996], ORL⁴ [Samaria and Harter 1994], Extended Yale B (EYaleB)⁵ [Georghiades et al. 2001], and Jaffe⁶ [Lyons et al. 1998], to evaluate the performance of NMF2L. Among them, the first two are handwritten digit datasets, the third is an object image dataset, and the rest are face image datasets. We summarize some statistics of these datasets in Table I and briefly describe them as follows:

• Pen Digit dataset contains 10,992 handwritten pen digits of size 4×4 from 44 writers.

• Semeion dataset contains 1,593 handwritten digits from about 80 persons. The images were scanned and stretched to a size of 16 × 16.

• COIL20 contains 1,440 gray scale images of 32 × 32 pixels from 20 objects viewed from different angles.

• ORL dataset contains 400 face images belonging to 40 individuals with 10 images per person. The images were taken at different times, with varying lighting conditions, facial expressions and facial details.

• EYaleB dataset consists of 2,414 face images of 38 individuals. Each image is resized to 32×32 from the original size 192×168.

• Jaffe contains 213 images of seven facial expressions posed by 10 Japanese models with each image rated on six emotion adjectives by 60 Japanese subjects.

4.2. Algorithms in Comparison

We compare NMF2L with several state-of-the-art algorithms, including k-means [Cai 2011], PCA [Jolliffe 2002], robust principal component analysis (RPCA) [Candès et al. 2011], spectral clustering (SC) [Ng et al. 2002], NMF [Lee and Seung 1999], GNMF [Cai et al. 2011], RMNMF [Huang et al. 2014], and MCFS [Cai et al. 2010], to evaluate the effectiveness of NMF2L. For a fair comparison, the number of clusters $N$ is given to all methods. The compared methods are briefly described as follows:

• k-means is one of the most widely used clustering methods owing to its simplicity and efficiency. We use a fast implementation of k-means,⁷ which is also applied as the final clustering step for the other methods.

• PCA is one of the most widely used dimension reduction techniques. We keep the leading principal components that account for 85% of the total variance and then apply k-means to the data projected onto these components.

• RPCA is more robust to noise and outliers than PCA. We use the inexact augmented Lagrange multiplier (IALM) method as the solver, with the theoretically optimal choice of the regularization parameter. Then k-means is applied to the recovered low-rank part.

¹ https://archive.ics.uci.edu/ml/datasets/Pen-Based+Recognition+of+Handwritten+Digits.
² https://archive.ics.uci.edu/ml/datasets/Semeion+Handwritten+Digit.
³ http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html.
⁴ http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html.
⁵ http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html.
⁶ http://www.kasrl.org/jaffe.html.
⁷ http://www.cad.zju.edu.cn/home/dengcai/Data/Clustering.html.


• For SC, we use an RBF kernel to construct the graph matrix, with the radius parameter chosen from the set $S = \{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$.

• NMF learns parts-based representations. We apply k-means clustering to the learned coefficient matrix $V$, that is, the representations of the data points with respect to the new basis.

• GNMF considers the nonlinear manifold structure of the data as well as its linear structure in the original instance space. We then apply k-means clustering to the learned coefficient matrix. The balancing parameter ranges over the set $S$.

• RMNMF adopts the $\ell_{2,1}$-norm as the loss function instead of the Frobenius norm used in GNMF. In our experiment, the regularization parameter ranges over the set $S$.

• MCFS is a feature selection method that selects the features that best preserve the multi-cluster structure of the data. After applying MCFS to the data, we apply k-means, NMF, and GNMF to the selected features, denoted MCFS + k-means, MCFS + NMF, and MCFS + GNMF, respectively. The number of selected features, $F$, depends on the dimension of each dataset: we consider $F$ from the sets {2, 4, 6, 8, 10, 12}, {10, 20, 30, 40, 50}, and {50, 100, 150, 200, 250, 300} for Pen Digits, Semeion, and the remaining datasets, respectively. For MCFS + GNMF, all combinations of $F$ and $\beta$ values are used. MCFS constructs a graph Laplacian with $p = 5$ neighbors using a 0-1 weighting scheme.

• NMF2L denotes the method proposed in this article, with two variants, NMF2L(2,0) and NMF2L(2,1). We construct the graph Laplacian for GNMF, RMNMF, and NMF2L in the same way as MCFS, except that NMF2L updates the graph Laplacian at every iteration. All combinations of values from $S \times S$ are used for the two balancing parameters. NMF-based methods require the rank $r$ as an input; for a dataset with $N$ clusters, we set $r = N$, as in Cai et al. [2011] and Huang et al. [2014].
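The 0-1 weighted $p$-nearest-neighbor graph shared by MCFS, GNMF, RMNMF, and NMF2L can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes the data points are the columns of a $d \times n$ matrix, links two points when either is among the other's $p$ nearest neighbors (the convention used by GNMF), and returns the Laplacian $L = D - W$. The function name `knn_graph_laplacian` is hypothetical. For NMF2L, which rebuilds the graph at every iteration from cleaner features, one would pass $X - E$ instead of $X$; the pairwise-distance step costs $O(dn^2)$, in line with the complexity analysis.

```python
import numpy as np


def knn_graph_laplacian(X, p=5):
    """0-1 weighted p-nearest-neighbor graph over the columns of X (d x n).

    Returns the weight matrix W, the degree matrix D, and the Laplacian L = D - W.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns; this step costs O(d n^2).
    sq_norms = np.sum(X ** 2, axis=0)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :p]   # indices of the p nearest neighbors
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), p), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                   # edge if either point is a neighbor of the other
    D = np.diag(W.sum(axis=1))
    return W, D, D - W


# Two well-separated groups of three points on a line (d = 1, n = 6).
X = np.array([[0.0, 0.1, 0.2, 10.0, 10.1, 10.2]])
W, D, L = knn_graph_laplacian(X, p=2)
print(W.astype(int))
```

With $p = 2$, each point is linked only to the other two points in its group, so $W$ is block diagonal and every row of $L$ sums to zero.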

4.3. Comparison Results

For each dataset, the total number of clusters is denoted by $\bar{N}$. For a more detailed comparison, we conduct experiments on subsets of each dataset with different numbers of clusters $N$. For each value of $N$, there are $\binom{\bar{N}}{N}$ possible subsets, from which we randomly choose 10 and run all methods on them. For each method, the average performance over these 10 subsets is recorded for every combination of parameters, and the best average performance is reported. When $N = \bar{N}$, only one subset exists, so no standard deviation is reported. This strategy is repeated for all values of $N$ and all datasets, with results shown in Tables II to VII. These tables report clustering accuracy, normalized mutual information, and purity on each dataset. For all datasets, the best performance for each $N$ is shown in bold. NMF2L achieves the best performance in almost all cases on four of the six datasets: Pen Digits (27 of 30 cases), ORL (24 of 27 cases), Jaffe (24 of 30 cases), and Semeion (24 of 30 cases). On these datasets, the improvements are substantial. For example, on the Pen Digits data, NMF2L improves on the second-best method by around 5 to 10 percentage points in all three measures. We also observe that combining feature selection with GNMF improves its performance on some datasets, which suggests that combining feature selection with an NMF model is reasonable. However, the combination performs worse in some cases. A possible explanation is that performing feature selection and factorization independently may lose important connections between the two steps, even though both aim at a more compact representation that preserves the important information in the data. In contrast, NMF2L integrates feature learning and manifold learning in a unified NMF model, which shows more competitive


Table II. Clustering Performance on Pen Digits

Accuracy (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
2 | 88.71 ± 09.39 | 88.64 ± 09.42 | 88.71 ± 09.39 | 83.64 ± 13.05 | 86.21 ± 11.76 | 86.78 ± 11.89 | 84.49 ± 15.02 | 88.15 ± 11.43 | 85.27 ± 10.58 | 92.28 ± 06.68 | 84.99 ± 14.32 | 87.90 ± 09.21
3 | 89.86 ± 07.26 | 89.85 ± 07.27 | 89.86 ± 07.26 | 88.03 ± 07.56 | 86.41 ± 14.60 | 88.90 ± 06.51 | 85.30 ± 07.89 | 92.18 ± 04.65 | 90.14 ± 06.31 | 91.33 ± 08.27 | 94.26 ± 03.12 | 94.26 ± 03.12
4 | 83.45 ± 10.25 | 83.30 ± 09.96 | 83.45 ± 10.25 | 80.90 ± 10.52 | 68.99 ± 07.53 | 81.48 ± 09.00 | 68.49 ± 07.40 | 83.16 ± 08.58 | 80.98 ± 08.37 | 83.30 ± 08.59 | 88.07 ± 08.93 | 86.87 ± 10.40
5 | 78.60 ± 10.03 | 78.50 ± 09.75 | 78.60 ± 10.03 | 71.74 ± 07.96 | 62.68 ± 09.94 | 75.69 ± 11.07 | 59.70 ± 08.82 | 79.56 ± 09.29 | 75.63 ± 10.76 | 79.90 ± 11.19 | 84.97 ± 10.23 | 85.15 ± 07.32
6 | 72.97 ± 10.02 | 73.80 ± 09.62 | 72.97 ± 10.02 | 76.92 ± 06.68 | 58.22 ± 09.01 | 78.32 ± 05.52 | 54.61 ± 09.06 | 76.68 ± 07.25 | 78.12 ± 06.81 | 80.58 ± 07.10 | 86.91 ± 04.59 | 87.11 ± 04.33
7 | 73.80 ± 08.77 | 76.56 ± 07.10 | 73.80 ± 08.77 | 73.71 ± 08.80 | 56.88 ± 07.36 | 74.10 ± 08.11 | 51.14 ± 05.53 | 77.72 ± 04.57 | 76.24 ± 06.61 | 79.49 ± 04.33 | 84.85 ± 05.69 | 85.60 ± 04.39
8 | 71.94 ± 05.51 | 76.17 ± 02.47 | 71.94 ± 05.51 | 72.87 ± 04.46 | 48.35 ± 06.08 | 73.90 ± 04.36 | 45.18 ± 03.99 | 72.48 ± 04.77 | 71.28 ± 05.37 | 73.79 ± 06.09 | 83.13 ± 05.87 | 83.13 ± 05.87
9 | 76.06 ± 03.56 | 76.49 ± 01.93 | 76.06 ± 03.56 | 73.06 ± 04.78 | ————— | 74.49 ± 05.58 | 45.77 ± 05.09 | 70.55 ± 06.26 | 66.59 ± 04.93 | 71.91 ± 04.01 | 79.44 ± 05.48 | 79.44 ± 05.48
10 | 77.06 | 73.92 | 77.06 | 77.97 | ————— | 79.20 | 49.55 | 76.11 | 75.14 | 76.90 | 88.01 | 88.26
Average | 79.16 | 79.69 | 79.16 | 77.65 | ————— | 79.21 | 60.47 | 79.62 | 77.71 | 81.05 | 86.07 | 86.41

Normalized Mutual Information (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
2 | 60.37 ± 27.46 | 59.98 ± 27.59 | 60.37 ± 27.46 | 50.82 ± 35.17 | 55.29 ± 31.76 | 56.75 ± 31.72 | 49.26 ± 31.17 | 58.30 ± 29.23 | 48.33 ± 29.53 | 69.59 ± 20.25 | 56.35 ± 31.68 | 57.03 ± 27.50
3 | 74.92 ± 13.42 | 74.86 ± 13.47 | 74.92 ± 13.42 | 71.68 ± 12.31 | 66.21 ± 14.43 | 72.50 ± 12.01 | 64.04 ± 14.03 | 77.73 ± 10.69 | 72.80 ± 14.31 | 78.91 ± 16.26 | 82.79 ± 06.06 | 82.79 ± 06.06
4 | 68.95 ± 13.10 | 68.55 ± 12.81 | 68.95 ± 13.10 | 64.98 ± 14.14 | 57.04 ± 09.20 | 66.71 ± 11.94 | 50.04 ± 07.50 | 67.65 ± 11.54 | 65.34 ± 11.02 | 70.93 ± 12.40 | 78.67 ± 10.63 | 77.00 ± 14.09
5 | 66.41 ± 10.82 | 65.89 ± 10.87 | 66.41 ± 10.82 | 60.05 ± 08.32 | 47.98 ± 10.31 | 63.89 ± 12.11 | 45.07 ± 08.77 | 66.34 ± 12.18 | 62.77 ± 12.08 | 72.05 ± 11.45 | 76.82 ± 11.18 | 75.23 ± 09.99
6 | 66.58 ± 06.64 | 66.03 ± 06.59 | 66.58 ± 06.64 | 65.69 ± 05.33 | 51.26 ± 08.66 | 68.11 ± 04.82 | 42.84 ± 08.42 | 65.64 ± 07.76 | 67.34 ± 06.62 | 71.13 ± 05.77 | 79.80 ± 04.46 | 80.03 ± 04.58
7 | 67.84 ± 05.06 | 68.53 ± 04.21 | 67.84 ± 05.06 | 67.41 ± 05.53 | 51.43 ± 06.63 | 68.73 ± 04.84 | 41.05 ± 05.05 | 68.76 ± 04.11 | 67.60 ± 05.49 | 72.51 ± 03.49 | 78.27 ± 04.83 | 78.49 ± 05.18
8 | 66.45 ± 02.62 | 66.80 ± 02.86 | 66.45 ± 02.62 | 66.63 ± 02.35 | 45.56 ± 05.56 | 68.11 ± 02.92 | 38.85 ± 03.04 | 65.60 ± 04.14 | 64.27 ± 04.88 | 71.49 ± 04.50 | 79.95 ± 03.97 | 79.95 ± 03.97
9 | 68.31 ± 02.18 | 67.76 ± 02.03 | 68.31 ± 02.18 | 66.54 ± 02.11 | ————— | 68.41 ± 02.14 | 40.86 ± 04.77 | 66.23 ± 03.22 | 63.63 ± 03.55 | 71.42 ± 03.30 | 77.90 ± 04.13 | 77.90 ± 04.13
10 | 69.20 | 67.58 | 69.20 | 71.09 | ————— | 73.02 | 40.48 | 69.09 | 67.95 | 71.77 | 84.60 | 85.07
Average | 67.67 | 67.33 | 67.67 | 64.99 | ————— | 67.36 | 45.83 | 67.26 | 64.45 | 72.20 | 77.24 | 77.06

Purity (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
2 | 88.71 ± 09.39 | 88.64 ± 09.42 | 88.71 ± 09.39 | 83.64 ± 35.17 | 86.21 ± 11.76 | 86.78 ± 11.89 | 84.59 ± 14.76 | 88.15 ± 11.43 | 85.27 ± 10.58 | 92.28 ± 06.68 | 84.99 ± 14.32 | 87.90 ± 09.21
3 | 89.86 ± 07.26 | 89.85 ± 07.27 | 89.86 ± 07.26 | 88.03 ± 07.56 | 86.41 ± 12.53 | 88.90 ± 06.51 | 85.30 ± 07.89 | 92.18 ± 04.65 | 90.14 ± 06.31 | 91.33 ± 08.27 | 94.26 ± 03.12 | 94.26 ± 03.12
4 | 83.61 ± 09.89 | 83.43 ± 09.67 | 83.61 ± 09.89 | 80.92 ± 10.50 | 71.33 ± 05.38 | 81.50 ± 08.97 | 69.27 ± 06.85 | 83.16 ± 08.58 | 80.98 ± 08.37 | 83.30 ± 08.59 | 88.21 ± 08.64 | 87.06 ± 10.09
5 | 79.27 ± 09.11 | 79.17 ± 08.80 | 79.27 ± 09.11 | 73.52 ± 06.71 | 63.45 ± 09.50 | 76.70 ± 09.99 | 62.30 ± 07.79 | 79.56 ± 09.29 | 76.38 ± 09.84 | 81.23 ± 09.56 | 85.61 ± 08.84 | 85.15 ± 07.32
6 | 76.17 ± 07.13 | 76.28 ± 06.95 | 76.17 ± 07.13 | 77.02 ± 06.46 | 62.62 ± 07.89 | 78.53 ± 05.38 | 57.01 ± 07.71 | 77.20 ± 06.71 | 79.12 ± 05.84 | 81.24 ± 05.38 | 87.31 ± 03.83 | 87.37 ± 03.89
7 | 76.05 ± 06.03 | 77.63 ± 05.15 | 76.05 ± 06.03 | 75.43 ± 06.85 | 60.62 ± 06.22 | 75.92 ± 05.96 | 53.28 ± 05.35 | 77.91 ± 04.20 | 76.92 ± 05.34 | 79.83 ± 03.95 | 85.30 ± 04.53 | 85.60 ± 04.39
8 | 73.93 ± 03.28 | 76.17 ± 02.47 | 73.93 ± 03.28 | 74.05 ± 02.31 | 52.30 ± 06.44 | 75.38 ± 02.79 | 47.48 ± 03.95 | 72.64 ± 04.50 | 71.70 ± 04.82 | 74.92 ± 05.66 | 83.97 ± 04.82 | 83.97 ± 04.82
9 | 76.50 ± 02.47 | 76.49 ± 01.93 | 76.50 ± 02.47 | 74.08 ± 03.30 | ————— | 75.13 ± 03.96 | 47.32 ± 04.84 | 71.42 ± 05.10 | 69.30 ± 03.92 | 73.52 ± 04.04 | 80.08 ± 04.61 | 80.08 ± 04.61
10 | 77.06 | 73.92 | 77.06 | 77.97 | ————— | 79.20 | 49.57 | 76.11 | 75.14 | 77.00 | 88.01 | 88.26
Average | 80.14 | 80.18 | 80.14 | 78.30 | ————— | 79.78 | 61.79 | 79.81 | 78.33 | 81.63 | 86.42 | 86.63


Table III. Clustering Performance on COIL20

Accuracy (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
4 | 74.38 ± 12.43 | 74.38 ± 12.43 | 75.76 ± 13.32 | 74.37 ± 08.38 | 66.39 ± 13.70 | 90.03 ± 12.37 | 75.17 ± 10.38 | 82.15 ± 09.79 | 77.95 ± 08.15 | 90.76 ± 10.40 | 89.58 ± 10.35 | 90.75 ± 10.67
6 | 77.66 ± 09.27 | 78.68 ± 08.87 | 80.19 ± 10.08 | 78.15 ± 09.36 | 63.84 ± 11.21 | 95.02 ± 05.41 | 77.45 ± 10.43 | 87.80 ± 08.94 | 85.74 ± 08.72 | 95.21 ± 05.28 | 91.64 ± 07.10 | 92.15 ± 07.43
8 | 72.15 ± 10.51 | 74.86 ± 10.19 | 77.24 ± 11.05 | 63.35 ± 09.37 | 55.66 ± 07.02 | 90.61 ± 09.09 | 65.23 ± 08.93 | 78.14 ± 08.62 | 78.04 ± 07.29 | 89.81 ± 06.85 | 90.76 ± 08.61 | 92.50 ± 07.06
10 | 67.06 ± 04.45 | 67.67 ± 04.71 | 68.97 ± 05.84 | 67.89 ± 04.88 | 51.93 ± 06.00 | 84.21 ± 03.67 | 65.40 ± 05.64 | 73.44 ± 04.03 | 71.71 ± 04.54 | 84.15 ± 05.42 | 84.44 ± 05.60 | 88.94 ± 05.93
12 | 67.80 ± 08.21 | 67.18 ± 07.51 | 69.95 ± 06.07 | 67.07 ± 06.73 | 51.13 ± 05.41 | 82.75 ± 08.52 | 61.71 ± 07.43 | 70.75 ± 06.79 | 69.26 ± 06.68 | 81.57 ± 08.06 | 82.64 ± 09.45 | 83.92 ± 08.37
14 | 66.15 ± 07.38 | 66.22 ± 06.33 | 67.85 ± 07.22 | 67.25 ± 06.52 | 47.10 ± 05.28 | 83.61 ± 06.10 | 62.01 ± 05.50 | 67.53 ± 07.10 | 67.11 ± 07.04 | 82.29 ± 08.42 | 82.54 ± 04.03 | 82.74 ± 05.58
16 | 66.05 ± 04.89 | 66.84 ± 04.22 | 66.03 ± 03.67 | 65.50 ± 03.25 | 49.38 ± 04.83 | 82.27 ± 03.90 | 58.13 ± 05.18 | 65.38 ± 05.90 | 66.10 ± 05.61 | 79.75 ± 04.84 | 81.45 ± 02.87 | 81.62 ± 02.20
18 | 62.42 ± 02.65 | 63.87 ± 04.02 | 66.54 ± 04.02 | 65.52 ± 03.49 | 47.27 ± 03.06 | 79.30 ± 02.57 | 56.33 ± 04.21 | 64.81 ± 04.26 | 65.31 ± 04.74 | 76.40 ± 04.03 | 78.17 ± 01.80 | 78.17 ± 01.80
20 | 60.49 | 61.32 | 61.46 | 67.43 | 47.36 | 82.71 | 56.39 | 69.44 | 64.72 | 75.83 | 82.99 | 83.96
Average | 68.24 | 69.00 | 70.44 | 68.50 | 53.34 | 85.61 | 64.20 | 73.27 | 71.77 | 83.98 | 84.91 | 86.08

Normalized Mutual Information (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
4 | 65.53 ± 15.34 | 65.56 ± 15.33 | 64.59 ± 11.87 | 64.13 ± 10.09 | 56.26 ± 14.49 | 87.63 ± 10.37 | 63.54 ± 12.51 | 72.27 ± 13.47 | 67.72 ± 11.68 | 88.36 ± 09.92 | 86.41 ± 10.98 | 88.75 ± 10.54
6 | 76.68 ± 08.89 | 78.05 ± 07.88 | 78.15 ± 09.18 | 76.09 ± 09.21 | 59.63 ± 11.85 | 94.28 ± 04.85 | 73.74 ± 10.99 | 86.32 ± 08.33 | 81.80 ± 10.14 | 94.52 ± 04.58 | 91.26 ± 05.96 | 92.15 ± 06.07
8 | 73.43 ± 08.83 | 75.30 ± 08.85 | 76.13 ± 09.99 | 67.08 ± 08.06 | 55.95 ± 07.88 | 91.73 ± 06.89 | 64.41 ± 08.36 | 76.68 ± 08.92 | 75.68 ± 07.82 | 91.02 ± 05.84 | 91.61 ± 06.84 | 93.35 ± 05.89
10 | 71.14 ± 02.99 | 72.58 ± 03.70 | 72.44 ± 04.30 | 70.22 ± 04.72 | 54.34 ± 03.68 | 88.12 ± 02.61 | 67.55 ± 04.81 | 74.91 ± 03.38 | 72.37 ± 03.58 | 87.73 ± 03.59 | 88.05 ± 02.88 | 91.83 ± 03.12
12 | 73.06 ± 05.92 | 73.10 ± 05.70 | 75.02 ± 04.97 | 71.43 ± 05.44 | 57.76 ± 04.14 | 89.40 ± 04.13 | 66.18 ± 06.51 | 74.09 ± 05.11 | 71.84 ± 05.46 | 87.85 ± 04.79 | 88.63 ± 04.29 | 90.62 ± 05.07
14 | 74.93 ± 05.21 | 74.37 ± 05.67 | 75.43 ± 4.96 | 72.46 ± 04.79 | 55.57 ± 03.43 | 90.15 ± 03.37 | 67.42 ± 04.06 | 74.20 ± 05.54 | 72.59 ± 05.29 | 88.86 ± 05.07 | 88.79 ± 02.86 | 89.23 ± 03.21
16 | 74.17 ± 03.28 | 74.80 ± 02.59 | 74.65 ± 03.22 | 73.53 ± 02.54 | 58.14 ± 02.99 | 89.28 ± 02.21 | 66.29 ± 03.73 | 72.92 ± 04.55 | 73.11 ± 03.99 | 87.91 ± 05.52 | 88.20 ± 02.01 | 89.13 ± 01.92
18 | 73.95 ± 01.63 | 74.49 ± 02.03 | 75.60 ± 02.39 | 73.68 ± 02.05 | 56.25 ± 01.49 | 88.08 ± 01.46 | 65.71 ± 02.06 | 73.77 ± 02.96 | 72.45 ± 03.37 | 85.81 ± 02.18 | 86.46 ± 01.06 | 86.46 ± 01.06
20 | 73.86 | 73.74 | 73.55 | 76.00 | 55.49 | 90.59 | 66.65 | 75.13 | 73.01 | 83.66 | 88.04 | 90.63
Average | 72.97 | 73.55 | 73.95 | 71.62 | 56.60 | 89.92 | 66.83 | 75.59 | 73.40 | 88.41 | 88.61 | 90.24

Purity (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
4 | 75.66 ± 10.99 | 75.66 ± 10.99 | 77.15 ± 10.72 | 74.72 ± 07.84 | 68.44 ± 11.37 | 90.03 ± 10.54 | 75.73 ± 09.58 | 82.15 ± 09.79 | 77.95 ± 08.15 | 90.76 ± 10.40 | 89.58 ± 10.35 | 90.73 ± 10.67
6 | 79.24 ± 07.79 | 80.23 ± 07.23 | 81.44 ± 8.42 | 79.54 ± 07.86 | 66.44 ± 10.17 | 95.02 ± 05.41 | 78.10 ± 10.14 | 88.24 ± 08.09 | 85.88 ± 08.49 | 95.21 ± 05.28 | 91.64 ± 07.10 | 92.15 ± 07.43
8 | 74.01 ± 09.04 | 75.99 ± 09.05 | 77.95 ± 09.94 | 66.93 ± 08.49 | 58.91 ± 06.17 | 90.64 ± 09.03 | 67.34 ± 07.94 | 79.01 ± 08.04 | 78.54 ± 07.15 | 90.30 ± 06.45 | 91.51 ± 07.42 | 92.53 ± 07.02
10 | 69.31 ± 02.90 | 70.18 ± 03.68 | 70.86 ± 04.86 | 69.19 ± 04.25 | 54.88 ± 05.30 | 85.97 ± 03.05 | 66.99 ± 04.15 | 74.29 ± 03.65 | 72.60 ± 04.31 | 84.92 ± 04.60 | 85.62 ± 04.71 | 90.18 ± 05.22
12 | 70.07 ± 06.86 | 69.98 ± 06.62 | 72.08 ± 05.18 | 68.68 ± 06.49 | 54.72 ± 05.62 | 85.51 ± 07.03 | 63.23 ± 07.20 | 71.94 ± 06.04 | 70.90 ± 06.22 | 83.96 ± 07.16 | 85.52 ± 07.40 | 86.74 ± 07.46
14 | 68.65 ± 07.20 | 68.66 ± 06.72 | 70.02 ± 06.60 | 68.75 ± 06.07 | 51.07 ± 04.59 | 85.83 ± 05.26 | 63.23 ± 04.80 | 70.55 ± 06.75 | 69.46 ± 06.51 | 84.69 ± 07.22 | 84.56 ± 03.50 | 84.72 ± 04.82
16 | 68.32 ± 03.94 | 68.90 ± 04.07 | 68.86 ± 03.40 | 67.47 ± 02.79 | 53.77 ± 04.55 | 85.09 ± 03.15 | 60.28 ± 04.54 | 67.80 ± 05.21 | 69.38 ± 05.00 | 82.75 ± 03.69 | 83.92 ± 02.64 | 84.31 ± 01.98
18 | 65.45 ± 02.41 | 66.48 ± 02.89 | 68.51 ± 03.38 | 67.17 ± 03.08 | 50.91 ± 02.49 | 82.28 ± 02.70 | 58.42 ± 03.63 | 67.52 ± 03.65 | 67.72 ± 03.59 | 79.53 ± 03.85 | 80.94 ± 01.23 | 80.94 ± 01.23
20 | 64.65 | 64.17 | 64.44 | 69.24 | 49.58 | 84.44 | 58.13 | 69.44 | 67.64 | 78.54 | 84.58 | 86.81
Average | 70.60 | 71.14 | 72.37 | 70.19 | 56.52 | 87.20 | 65.72 | 74.55 | 73.34 | 85.63 | 86.43 | 87.68


Table IV. Clustering Performance on EYaleB

Accuracy (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
5 | 23.01 ± 00.80 | 23.42 ± 00.79 | 23.01 ± 00.99 | 24.01 ± 00.99 | ————— | 41.97 ± 09.86 | 28.82 ± 03.98 | 36.91 ± 07.57 | 35.01 ± 05.71 | 55.68 ± 08.78 | 63.30 ± 07.91 | 44.28 ± 10.95
10 | 13.84 ± 01.19 | 13.59 ± 00.92 | 13.93 ± 01.03 | 18.28 ± 01.49 | ————— | 30.09 ± 05.00 | 22.67 ± 02.02 | 27.07 ± 03.57 | 31.23 ± 05.26 | 46.41 ± 04.59 | 34.17 ± 04.59 | 34.17 ± 04.59
15 | 11.46 ± 01.01 | 11.57 ± 00.92 | 10.97 ± 00.77 | 17.90 ± 02.44 | ————— | 23.80 ± 06.49 | 20.98 ± 01.98 | 21.95 ± 03.55 | 25.14 ± 03.19 | 37.49 ± 06.80 | 27.44 ± 07.65 | 32.59 ± 04.42
20 | 10.69 ± 01.08 | 10.61 ± 01.15 | 09.85 ± 00.80 | 15.04 ± 01.18 | ————— | 21.17 ± 01.96 | 19.64 ± 01.19 | 17.97 ± 01.69 | 23.44 ± 02.38 | 33.01 ± 03.14 | 28.59 ± 03.20 | 32.79 ± 02.10
25 | 09.35 ± 01.05 | 09.14 ± 00.57 | 08.37 ± 00.38 | 12.89 ± 01.00 | ————— | 15.96 ± 01.99 | 17.75 ± 01.30 | 15.60 ± 01.42 | 19.26 ± 02.96 | 28.52 ± 03.91 | 26.91 ± 03.12 | 30.54 ± 02.49
30 | 08.48 ± 00.71 | 08.78 ± 00.67 | 08.14 ± 00.66 | 13.10 ± 01.39 | ————— | 16.47 ± 00.98 | 17.21 ± 01.10 | 15.67 ± 01.34 | 20.15 ± 01.37 | 28.90 ± 02.87 | 29.01 ± 01.72 | 31.84 ± 02.34
35 | 08.85 ± 00.75 | 08.24 ± 00.55 | 08.67 ± 00.80 | 10.67 ± 00.79 | ————— | 14.64 ± 00.76 | 16.48 ± 01.23 | 14.12 ± 00.98 | 19.11 ± 01.82 | 28.27 ± 01.50 | 28.77 ± 01.24 | 30.26 ± 01.81
38 | 08.53 | 07.75 | 08.99 | 12.34 | ————— | 16.16 | 16.86 | 14.71 | 17.90 | 27.96 | 29.12 | 30.12
Average | 11.78 | 11.64 | 11.49 | 15.53 | ————— | 22.53 | 20.05 | 20.50 | 23.90 | 35.78 | 33.41 | 33.32

Normalized Mutual Information (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
5 | 00.94 ± 00.73 | 01.02 ± 00.78 | 00.74 ± 00.35 | 01.79 ± 01.22 | ————— | 30.22 ± 15.58 | 06.49 ± 04.14 | 21.98 ± 10.50 | 17.12 ± 06.95 | 46.10 ± 13.44 | 53.16 ± 07.53 | 31.92 ± 16.34
10 | 02.39 ± 02.21 | 01.81 ± 00.68 | 02.17 ± 01.15 | 09.98 ± 03.02 | ————— | 29.67 ± 06.65 | 15.34 ± 01.93 | 23.36 ± 04.73 | 27.72 ± 06.91 | 49.16 ± 04.54 | 34.03 ± 06.00 | 34.03 ± 06.00
15 | 03.96 ± 01.48 | 04.48 ± 01.86 | 03.31 ± 01.35 | 16.59 ± 03.83 | ————— | 28.01 ± 11.77 | 19.89 ± 03.74 | 22.46 ± 05.58 | 27.86 ± 05.53 | 43.26 ± 07.50 | 31.96 ± 11.88 | 33.19 ± 04.81
20 | 06.91 ± 01.59 | 06.82 ± 01.90 | 05.50 ± 01.99 | 16.47 ± 02.16 | ————— | 27.62 ± 03.08 | 23.11 ± 01.25 | 22.01 ± 02.76 | 30.21 ± 02.37 | 41.90 ± 03.47 | 37.97 ± 04.15 | 37.27 ± 02.55
25 | 07.00 ± 01.42 | 06.36 ± 01.11 | 05.41 ± 00.88 | 16.52 ± 01.68 | ————— | 21.60 ± 03.34 | 24.04 ± 01.15 | 20.94 ± 02.21 | 27.46 ± 03.23 | 38.23 ± 03.31 | 36.76 ± 03.02 | 37.22 ± 02.55
30 | 07.79 ± 01.19 | 08.05 ± 01.10 | 07.49 ± 01.02 | 20.21 ± 02.65 | ————— | 23.79 ± 00.89 | 26.28 ± 01.09 | 23.82 ± 01.85 | 30.93 ± 01.51 | 40.67 ± 03.38 | 39.57 ± 01.46 | 39.97 ± 01.76
35 | 10.04 ± 01.44 | 01.14 ± 08.96 | 09.41 ± 01.48 | 16.70 ± 01.32 | ————— | 25.36 ± 01.27 | 27.67 ± 01.58 | 22.80 ± 01.50 | 31.48 ± 01.94 | 38.57 ± 02.14 | 39.88 ± 01.22 | 39.67 ± 01.09
38 | 10.51 | 08.49 | 10.42 | 21.59 | ————— | 25.86 | 28.46 | 22.08 | 29.77 | 35.94 | 41.15 | 41.94
Average | 06.19 | 04.77 | 05.56 | 14.98 | ————— | 26.52 | 21.41 | 22.43 | 27.82 | 41.73 | 39.31 | 36.90

Purity (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
5 | 23.73 ± 01.31 | 24.20 ± 01.36 | 23.80 ± 01.22 | 24.96 ± 01.62 | ————— | 45.75 ± 11.05 | 29.45 ± 03.99 | 37.13 ± 07.61 | 36.33 ± 05.72 | 57.79 ± 09.10 | 63.68 ± 07.68 | 47.16 ± 12.04
10 | 14.58 ± 01.31 | 14.25 ± 01.00 | 14.64 ± 01.31 | 19.24 ± 01.65 | ————— | 31.67 ± 05.20 | 23.73 ± 01.88 | 27.97 ± 03.66 | 32.55 ± 05.19 | 49.66 ± 04.11 | 36.24 ± 05.22 | 36.24 ± 05.22
15 | 12.26 ± 01.02 | 12.32 ± 01.08 | 11.73 ± 00.90 | 18.61 ± 02.58 | ————— | 26.76 ± 07.08 | 22.16 ± 02.24 | 22.71 ± 03.65 | 26.17 ± 03.38 | 39.92 ± 06.74 | 29.94 ± 08.11 | 34.57 ± 04.73
20 | 11.27 ± 01.10 | 11.26 ± 01.13 | 10.39 ± 00.82 | 15.85 ± 01.18 | ————— | 23.85 ± 02.50 | 20.48 ± 01.25 | 18.97 ± 01.71 | 24.33 ± 02.49 | 36.10 ± 03.11 | 31.87 ± 03.41 | 34.73 ± 01.84
25 | 10.00 ± 01.00 | 09.82 ± 00.58 | 08.95 ± 00.37 | 13.75 ± 01.03 | ————— | 17.71 ± 02.19 | 18.77 ± 01.29 | 16.76 ± 01.41 | 20.25 ± 02.99 | 31.20 ± 03.40 | 29.32 ± 02.99 | 32.79 ± 02.35
30 | 09.23 ± 00.60 | 09.55 ± 00.65 | 08.79 ± 00.68 | 13.91 ± 01.42 | ————— | 18.12 ± 01.02 | 18.13 ± 01.04 | 16.66 ± 01.42 | 21.19 ± 01.20 | 31.39 ± 02.97 | 30.81 ± 01.55 | 33.92 ± 01.97
35 | 09.66 ± 00.71 | 08.96 ± 00.62 | 09.36 ± 00.86 | 11.46 ± 00.73 | ————— | 15.44 ± 00.80 | 17.42 ± 01.18 | 14.89 ± 00.97 | 19.93 ± 01.76 | 29.95 ± 01.38 | 30.24 ± 01.20 | 32.65 ± 01.39
38 | 08.86 | 08.49 | 09.57 | 13.01 | ————— | 18.39 | 17.94 | 15.12 | 18.64 | 29.25 | 30.86 | 32.81
Average | 12.46 | 12.36 | 12.15 | 16.35 | ————— | 24.71 | 20.01 | 21.28 | 24.92 | 38.16 | 35.37 | 35.61


Table V. Clustering Performance on ORL

Accuracy (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
5 | 79.00 ± 13.54 | 80.20 ± 12.73 | 81.00 ± 14.06 | 79.60 ± 12.85 | 59.60 ± 08.37 | 81.00 ± 14.43 | 76.80 ± 14.79 | 81.80 ± 12.73 | 81.80 ± 12.70 | 82.20 ± 13.31 | 80.60 ± 14.49 | 80.80 ± 14.88
10 | 62.10 ± 08.67 | 61.70 ± 08.78 | 68.20 ± 09.72 | 66.60 ± 07.18 | 37.00 ± 08.75 | 71.20 ± 08.18 | 66.70 ± 05.81 | 67.50 ± 08.68 | 72.40 ± 09.35 | 72.90 ± 07.84 | 74.30 ± 08.73 | 73.70 ± 08.74
15 | 62.53 ± 05.99 | 64.20 ± 06.58 | 63.53 ± 06.80 | 69.73 ± 07.08 | 28.67 ± 03.50 | 67.80 ± 05.99 | 66.60 ± 03.01 | 65.20 ± 06.78 | 69.60 ± 05.00 | 68.13 ± 06.84 | 71.80 ± 07.46 | 71.40 ± 06.13
20 | 57.80 ± 06.21 | 58.15 ± 05.97 | 60.90 ± 05.78 | 62.90 ± 06.81 | 25.80 ± 01.70 | 65.50 ± 07.86 | 61.10 ± 03.41 | 62.80 ± 03.94 | 61.70 ± 03.54 | 64.20 ± 05.06 | 67.05 ± 05.91 | 67.05 ± 05.91
25 | 57.24 ± 03.21 | 57.08 ± 02.69 | 59.16 ± 03.08 | 58.20 ± 02.59 | 23.60 ± 01.74 | 63.96 ± 05.28 | 62.72 ± 03.66 | 59.04 ± 03.76 | 59.36 ± 03.46 | 64.04 ± 03.60 | 66.20 ± 05.12 | 66.44 ± 04.55
30 | 55.63 ± 03.17 | 57.57 ± 04.17 | 57.97 ± 02.71 | 52.50 ± 02.45 | 22.17 ± 01.22 | 62.23 ± 03.03 | 58.57 ± 03.81 | 56.67 ± 03.56 | 57.50 ± 04.26 | 60.80 ± 03.22 | 63.57 ± 02.92 | 63.80 ± 03.68
35 | 52.83 ± 02.54 | 55.09 ± 03.30 | 55.11 ± 03.29 | 48.86 ± 02.97 | 20.71 ± 00.79 | 59.60 ± 04.22 | 56.60 ± 02.79 | 54.00 ± 03.03 | 57.69 ± 03.10 | 58.89 ± 03.09 | 60.69 ± 01.66 | 60.69 ± 01.66
40 | 53.50 | 57.25 | 63.00 | 53.50 | 20.25 | 55.75 | 56.25 | 57.00 | 56.75 | 60.50 | 60.25 | 65.50
Average | 60.09 | 61.41 | 63.61 | 61.49 | 29.73 | 65.88 | 63.17 | 63.00 | 64.60 | 66.46 | 68.06 | 68.67

Normalized Mutual Information (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
5 | 73.72 ± 15.91 | 74.17 ± 15.99 | 77.51 ± 13.57 | 74.74 ± 14.22 | 53.17 ± 09.32 | 77.63 ± 14.69 | 71.64 ± 15.32 | 76.10 ± 15.39 | 76.32 ± 13.85 | 77.66 ± 12.95 | 76.76 ± 14.38 | 74.81 ± 18.83
10 | 67.81 ± 08.47 | 68.51 ± 08.10 | 74.16 ± 07.61 | 74.05 ± 05.74 | 38.64 ± 09.37 | 77.19 ± 06.29 | 70.56 ± 03.50 | 74.88 ± 05.75 | 76.36 ± 07.25 | 78.08 ± 05.74 | 78.81 ± 06.68 | 76.56 ± 08.16
15 | 72.91 ± 05.39 | 74.78 ± 05.93 | 73.69 ± 04.91 | 78.62 ± 04.98 | 35.32 ± 04.15 | 77.28 ± 04.46 | 73.67 ± 02.22 | 74.01 ± 05.98 | 78.37 ± 04.19 | 77.25 ± 04.28 | 80.26 ± 04.67 | 78.71 ± 04.99
20 | 71.60 ± 04.32 | 71.97 ± 04.64 | 73.40 ± 03.73 | 76.35 ± 04.78 | 38.02 ± 01.01 | 76.93 ± 05.26 | 72.62 ± 02.62 | 75.22 ± 02.39 | 74.20 ± 02.04 | 76.83 ± 02.76 | 78.59 ± 03.84 | 78.59 ± 03.84
25 | 72.45 ± 01.91 | 71.70 ± 02.50 | 73.38 ± 01.64 | 72.59 ± 01.76 | 39.22 ± 01.46 | 77.73 ± 02.67 | 75.15 ± 02.24 | 73.03 ± 02.82 | 73.63 ± 02.56 | 76.75 ± 02.66 | 78.25 ± 03.59 | 77.65 ± 04.49
30 | 72.53 ± 01.90 | 73.48 ± 02.46 | 73.90 ± 02.14 | 70.11 ± 01.91 | 40.53 ± 00.89 | 77.02 ± 02.35 | 73.34 ± 02.33 | 72.30 ± 02.50 | 72.77 ± 02.84 | 76.16 ± 01.59 | 77.65 ± 01.51 | 76.90 ± 02.79
35 | 70.71 ± 01.40 | 72.13 ± 02.23 | 72.12 ± 01.75 | 67.58 ± 02.05 | 41.48 ± 00.66 | 75.51 ± 02.27 | 72.43 ± 01.70 | 71.38 ± 01.90 | 74.40 ± 01.29 | 74.99 ± 01.62 | 75.54 ± 01.32 | 75.54 ± 01.32
40 | 71.82 | 74.75 | 72.35 | 74.51 | 42.64 | 74.72 | 73.03 | 74.07 | 74.23 | 77.16 | 76.28 | 78.37
Average | 71.69 | 72.69 | 73.81 | 73.57 | 41.13 | 76.75 | 72.81 | 73.87 | 75.04 | 76.86 | 77.77 | 77.14

Purity (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
5 | 79.20 ± 13.27 | 84.80 ± 12.43 | 82.40 ± 11.31 | 80.40 ± 11.62 | 63.40 ± 07.89 | 82.40 ± 11.84 | 77.80 ± 14.00 | 82.00 ± 12.36 | 81.80 ± 12.70 | 83.40 ± 11.35 | 81.80 ± 12.49 | 81.20 ± 14.34
10 | 66.60 ± 08.38 | 65.90 ± 07.95 | 72.50 ± 08.25 | 71.30 ± 06.40 | 39.90 ± 08.49 | 75.40 ± 06.92 | 69.70 ± 04.42 | 72.10 ± 05.99 | 75.10 ± 08.08 | 76.90 ± 06.15 | 77.90 ± 08.02 | 74.80 ± 08.12
15 | 67.80 ± 05.13 | 69.80 ± 05.88 | 67.67 ± 04.93 | 73.27 ± 05.90 | 30.27 ± 04.53 | 72.47 ± 05.48 | 69.73 ± 02.71 | 69.07 ± 06.33 | 73.67 ± 04.27 | 71.60 ± 06.01 | 76.27 ± 05.89 | 74.00 ± 05.51
20 | 63.40 ± 05.35 | 63.60 ± 04.83 | 65.35 ± 05.08 | 67.50 ± 05.88 | 27.35 ± 01.97 | 69.65 ± 06.02 | 65.45 ± 03.03 | 67.40 ± 02.99 | 66.85 ± 03.20 | 68.75 ± 04.17 | 71.45 ± 04.23 | 71.45 ± 04.23
25 | 62.16 ± 01.87 | 62.32 ± 02.58 | 63.92 ± 02.10 | 63.08 ± 02.50 | 25.08 ± 01.67 | 69.32 ± 03.79 | 66.36 ± 03.95 | 63.84 ± 03.87 | 64.04 ± 02.56 | 68.16 ± 02.97 | 70.64 ± 04.49 | 69.32 ± 04.23
30 | 60.43 ± 02.43 | 63.10 ± 02.69 | 62.93 ± 02.06 | 57.50 ± 02.13 | 23.07 ± 01.32 | 66.63 ± 02.39 | 62.43 ± 03.63 | 61.53 ± 02.86 | 62.13 ± 03.65 | 65.57 ± 02.39 | 68.53 ± 02.52 | 67.00 ± 03.17
35 | 57.86 ± 02.15 | 60.20 ± 02.44 | 59.74 ± 02.69 | 53.31 ± 02.83 | 22.31 ± 00.74 | 64.31 ± 03.22 | 60.43 ± 02.67 | 59.09 ± 02.84 | 62.20 ± 02.34 | 63.66 ± 02.07 | 65.23 ± 01.53 | 65.23 ± 01.53
40 | 57.00 | 59.75 | 58.75 | 60.50 | 21.75 | 62.25 | 61.00 | 60.25 | 61.25 | 65.25 | 65.50 | 68.25
Average | 64.31 | 66.18 | 40.41 | 65.86 | 31.64 | 70.30 | 66.61 | 66.91 | 68.38 | 70.41 | 72.16 | 71.41


Table VI. Clustering Performance on Jaffe

Accuracy (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
2 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 | 99.75 ± 00.79 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 | 99.75 ± 00.79 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00
3 | 98.40 ± 01.86 | 98.40 ± 01.86 | 100.0 ± 00.00 | 97.32 ± 03.40 | 84.97 ± 19.34 | 99.84 ± 00.51 | 97.62 ± 01.86 | 99.53 ± 01.06 | 95.28 ± 08.02 | 100.0 ± 00.00 | 100.0 ± 00.00 | 99.84 ± 01.26
4 | 99.30 ± 01.83 | 99.07 ± 02.04 | 99.19 ± 02.57 | 96.03 ± 04.14 | 72.55 ± 11.74 | 99.42 ± 01.26 | 98.83 ± 01.73 | 99.19 ± 02.20 | 97.68 ± 02.84 | 99.53 ± 01.12 | 99.88 ± 00.37 | 99.42 ± 01.26
5 | 98.68 ± 02.00 | 98.77 ± 02.04 | 98.87 ± 02.42 | 94.55 ± 03.96 | 74.08 ± 10.70 | 99.15 ± 01.37 | 97.46 ± 03.09 | 98.86 ± 01.47 | 97.83 ± 01.98 | 99.24 ± 00.73 | 99.15 ± 01.37 | 99.15 ± 01.37
6 | 95.97 ± 04.13 | 96.61 ± 03.45 | 99.38 ± 01.95 | 89.60 ± 05.53 | 63.27 ± 10.08 | 97.25 ± 06.55 | 95.14 ± 04.07 | 98.67 ± 01.46 | 97.97 ± 02.07 | 99.22 ± 00.73 | 99.45 ± 00.73 | 99.45 ± 00.73
7 | 95.65 ± 06.03 | 97.06 ± 02.17 | 97.33 ± 02.63 | 93.98 ± 03.89 | 59.00 ± 09.51 | 96.48 ± 06.69 | 90.24 ± 06.90 | 97.20 ± 02.50 | 94.73 ± 03.78 | 98.60 ± 01.19 | 98.73 ± 01.35 | 98.73 ± 01.35
8 | 91.97 ± 06.28 | 95.05 ± 03.19 | 97.05 ± 02.19 | 89.36 ± 05.84 | 61.64 ± 05.38 | 93.68 ± 08.43 | 91.63 ± 05.58 | 96.46 ± 02.52 | 94.76 ± 02.93 | 97.52 ± 01.86 | 96.59 ± 01.77 | 96.59 ± 01.77
9 | 91.87 ± 04.43 | 95.05 ± 03.78 | 94.63 ± 01.25 | 88.94 ± 06.35 | 61.37 ± 11.03 | 95.30 ± 07.35 | 90.73 ± 07.06 | 95.46 ± 01.02 | 93.79 ± 03.29 | 96.18 ± 04.35 | 96.71 ± 02.70 | 96.71 ± 02.70
10 | 84.04 | 81.69 | 95.77 | 85.45 | 57.75 | 97.65 | 95.77 | 96.71 | 92.96 | 97.18 | 98.12 | 98.12
Average | 95.10 | 95.74 | 98.02 | 92.78 | 70.51 | 97.64 | 95.27 | 98.01 | 96.08 | 98.61 | 98.74 | 98.67

Normalized Mutual Information (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
2 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 | 98.55 ± 04.59 | 100.00 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 | 98.55 ± 04.59 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00
3 | 94.88 ± 05.88 | 94.88 ± 05.88 | 100.0 ± 00.00 | 92.51 ± 09.12 | 70.23 ± 19.02 | 99.41 ± 01.87 | 92.02 ± 05.91 | 98.44 ± 03.40 | 88.74 ± 17.01 | 100.0 ± 00.00 | 100.0 ± 00.00 | 99.41 ± 01.87
4 | 98.37 ± 04.11 | 97.80 ± 04.73 | 98.49 ± 04.79 | 91.57 ± 07.40 | 65.14 ± 10.88 | 98.56 ± 03.05 | 96.98 ± 03.88 | 98.44 ± 03.89 | 94.70 ± 06.09 | 98.89 ± 02.55 | 99.66 ± 01.08 | 98.56 ± 03.05
5 | 97.32 ± 03.90 | 97.56 ± 04.00 | 98.05 ± 04.13 | 90.87 ± 05.33 | 67.91 ± 08.45 | 98.30 ± 02.75 | 95.01 ± 05.29 | 97.66 ± 03.04 | 95.83 ± 03.46 | 98.27 ± 02.08 | 98.30 ± 02.75 | 98.30 ± 02.75
6 | 94.15 ± 04.51 | 94.70 ± 03.39 | 99.13 ± 02.74 | 85.76 ± 06.80 | 64.58 ± 09.68 | 97.53 ± 03.78 | 91.76 ± 05.45 | 97.58 ± 02.24 | 96.20 ± 03.72 | 98.36 ± 01.38 | 98.90 ± 01.33 | 98.90 ± 01.33
7 | 94.87 ± 04.19 | 95.15 ± 03.30 | 96.29 ± 03.44 | 91.21 ± 05.22 | 59.13 ± 03.78 | 96.60 ± 04.28 | 87.12 ± 05.60 | 95.73 ± 03.63 | 92.40 ± 05.20 | 97.61 ± 02.01 | 97.92 ± 02.23 | 97.92 ± 02.23
8 | 90.91 ± 05.29 | 92.96 ± 03.56 | 95.96 ± 02.93 | 88.18 ± 03.88 | 65.20 ± 04.20 | 91.20 ± 03.97 | 89.09 ± 05.20 | 95.19 ± 03.38 | 93.19 ± 03.77 | 96.68 ± 02.34 | 95.40 ± 02.34 | 95.40 ± 02.34
9 | 90.86 ± 03.11 | 93.55 ± 02.63 | 93.53 ± 01.75 | 88.21 ± 05.59 | 64.05 ± 09.52 | 94.06 ± 03.36 | 89.34 ± 05.09 | 94.45 ± 01.47 | 92.21 ± 03.88 | 95.96 ± 02.25 | 95.81 ± 03.20 | 95.81 ± 03.20
10 | 82.68 | 82.56 | 94.16 | 85.40 | 66.82 | 96.50 | 93.54 | 96.21 | 92.15 | 96.09 | 96.99 | 96.99
Average | 93.78 | 94.35 | 97.29 | 90.25 | 69.23 | 96.91 | 92.76 | 97.08 | 93.77 | 97.98 | 98.11 | 97.92

Purity (%)
N | K-Means | PCA | RPCA | NMF | SC | GNMF | RMNMF | MCFS + K-means | MCFS + NMF | MCFS + GNMF | NMF2L(2,0) | NMF2L(2,1)
2 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 | 99.75 ± 00.79 | 100.00 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00 | 99.75 ± 00.79 | 100.0 ± 00.00 | 100.0 ± 00.00 | 100.0 ± 00.00
3 | 98.40 ± 01.86 | 98.40 ± 01.86 | 100.0 ± 00.00 | 97.32 ± 03.40 | 84.97 ± 14.09 | 99.84 ± 00.51 | 97.62 ± 01.86 | 99.53 ± 01.06 | 95.28 ± 08.02 | 100.0 ± 00.00 | 100.0 ± 00.00 | 99.84 ± 00.51
4 | 99.30 ± 01.83 | 99.07 ± 02.04 | 99.19 ± 02.57 | 96.03 ± 04.14 | 75.98 ± 09.45 | 99.42 ± 01.26 | 98.83 ± 01.73 | 99.19 ± 02.20 | 97.68 ± 02.84 | 99.53 ± 01.12 | 99.88 ± 00.37 | 99.42 ± 01.26
5 | 98.68 ± 02.00 | 98.77 ± 02.04 | 98.87 ± 02.42 | 94.55 ± 03.96 | 74.83 ± 08.73 | 99.15 ± 01.37 | 97.46 ± 03.09 | 98.86 ± 01.47 | 97.83 ± 01.98 | 99.24 ± 00.98 | 99.15 ± 01.37 | 99.15 ± 01.37
6 | 95.97 ± 04.13 | 96.61 ± 03.45 | 99.38 ± 01.95 | 89.60 ± 05.53 | 66.79 ± 09.08 | 97.80 ± 04.82 | 95.14 ± 04.07 | 98.67 ± 01.46 | 97.97 ± 02.07 | 99.22 ± 00.73 | 99.45 ± 00.73 | 99.45 ± 00.73
7 | 96.12 ± 04.63 | 97.06 ± 02.17 | 97.33 ± 02.63 | 93.98 ± 03.89 | 62.20 ± 08.66 | 97.07 ± 04.90 | 90.84 ± 05.66 | 97.20 ± 02.50 | 94.73 ± 03.78 | 98.60 ± 01.19 | 98.73 ± 01.35 | 98.73 ± 01.35
8 | 92.62 ± 05.06 | 95.05 ± 03.19 | 97.05 ± 02.19 | 90.06 ± 04.59 | 65.82 ± 04.32 | 93.65 ± 05.86 | 91.87 ± 05.22 | 96.46 ± 02.52 | 94.76 ± 02.93 | 97.52 ± 01.86 | 96.59 ± 01.77 | 96.59 ± 01.77
9 | 92.18 ± 04.01 | 95.10 ± 03.62 | 94.63 ± 01.25 | 89.36 ± 05.90 | 63.41 ± 09.80 | 95.30 ± 05.04 | 91.15 ± 06.22 | 95.46 ± 01.02 | 93.79 ± 03.29 | 96.55 ± 03.23 | 96.71 ± 02.70 | 96.71 ± 02.70
10 | 84.04 | 81.69 | 95.77 | 85.45 | 61.03 | 97.65 | 95.77 | 96.71 | 92.96 | 97.18 | 98.12 | 98.12
Average | 95.22 | 95.75 | 98.02 | 92.90 | 72.78 | 97.76 | 95.41 | 98.01 | 96.08 | 98.65 | 98.74 | 98.67