In this chapter, non-negative matrix factorization (NMF) was introduced as a technique to decompose non-negative data into potentially meaningful components.
The extracted basis and weight matrices are both constrained to be non-negative, too.
We briefly reviewed the existing literature on NMF and discussed the basic challenges in the design of NMF algorithms.
Concluding this chapter, we note that there are two important aspects of NMF which remain partly unsolved questions and need further investigation:
1. Uniqueness
Is there a principled way to determine an optimal solution of a given NMF problem?
2. Model order
Is there a way to automatically determine an optimal number of components in a given NMF problem?
While a solution to the first question will be the topic of the next chapter, Bayesian approaches to answer both questions will be discussed in chapters (6) and (8).
Chapter 3
Uniqueness of NMF and the Determinant Criterion
In this chapter, we take a closer look at the problem of the uniqueness of NMF.
We propose a determinant criterion to constrain the solutions of non-negative matrix factorization problems and achieve unique and optimal solutions in a general setting, provided an exact solution exists. We demonstrate in an illustrative example how optimal solutions are obtained by a heuristic named detNMF, and discuss how the criterion differs from sparsity constraints.
3.1 Uniqueness of NMF
NMF has seen numerous applications in recent years (see paragraph 2.2), where it has primarily been applied in an unsupervised setting, for example in image and natural language processing, sparse coding, and a variety of applications in computational biology. Usually, NMF is used to uncover hidden information in data, such as (at least some of) the underlying components from which the whole dataset can be generated by non-negative superposition.
3.1.1 Why is uniqueness important?
To make NMF a reliable data analysis tool, it is essential to understand its intrinsic lack of a unique solution. Figure (3.1) illustrates two different problems concerning the optimal solutions of an optimization problem. In case A (left hand side), several points of the search space (or parameter space) lead to an equally low value of the cost function. These minima are equivalent, since they cannot be distinguished by evaluating the cost function. The related uniqueness problem (i.e. singling out one best solution among several equivalent ones) is an intrinsic problem of the system.
In the second case B, displayed on the right hand side of figure (3.1), there is one global minimum and at least one local minimum. A downhill algorithm such as gradient descent cannot traverse even a minor bump of the cost function and becomes stuck in a local minimum, although there is another minimum with a lower cost. This problem can be circumvented by several runs using different starting positions, or by stochastic search algorithms.
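Case B can be made concrete with a small numerical sketch. The one-dimensional cost function below is a hypothetical example chosen only because it has one global and one local minimum; it is not taken from the NMF setting, and the names `cost`, `grad`, `gradient_descent` and `multi_start` are illustrative assumptions. The sketch shows plain gradient descent ending in different minima depending on the starting position, and a simple multi-start strategy that keeps the run with the lowest cost.

```python
def cost(x):
    """Hypothetical 1-D cost with a global minimum near x = -1.30
    and a local minimum near x = 1.13."""
    return x**4 - 3.0 * x**2 + x


def grad(x):
    """Analytical derivative of the cost function above."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def gradient_descent(x0, step=0.01, n_iter=5000):
    """Plain downhill search with a fixed step size."""
    x = x0
    for _ in range(n_iter):
        x -= step * grad(x)
    return x


def multi_start(starts):
    """Run gradient descent from every start and keep the lowest-cost result."""
    candidates = [gradient_descent(x0) for x0 in starts]
    return min(candidates, key=cost)


x_right = gradient_descent(2.0)    # ends in the local minimum
x_left = gradient_descent(-2.0)    # ends in the global minimum
x_best = multi_start([-2.0, -0.5, 0.5, 2.0])

print(x_right, cost(x_right))
print(x_left, cost(x_left))
print(x_best, cost(x_best))
```

Multiple restarts help here because the two minima have different cost values, so the better one can be recognized after the fact.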
As we will see, the uniqueness problems of NMF are of kind A: without additional constraints, there are several equivalent solutions which cannot be distinguished by a cost function that measures the reconstruction error only.
The existence of several solutions can lead to multiple interpretations. Running an identical analysis twice should not yield different solutions unless their origin is understood and their quality can be judged.
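A minimal sketch of a kind-A ambiguity, under assumptions of our own: we take a small exact factorization X = WH (W holding the weights, H the basis vectors) and an invertible matrix A, chosen by hand so that both WA and A^{-1}H stay non-negative. The particular matrices and the helper names `matmul`, `squared_error` and `is_nonnegative` are illustrative and not part of the text. Both factor pairs reproduce X exactly, so a cost function based on the reconstruction error alone cannot tell them apart.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def squared_error(X, W, H):
    """Sum of squared differences between X and the product WH."""
    WH = matmul(W, H)
    return sum((x - r) ** 2
               for row_x, row_r in zip(X, WH)
               for x, r in zip(row_x, row_r))


def is_nonnegative(M):
    """True if every entry of M is >= 0."""
    return all(v >= 0 for row in M for v in row)


# Hypothetical exact NMF problem: X is built from known non-negative factors.
W = [[2.0, 1.0], [1.0, 2.0], [1.0, 1.0]]   # weights (3 x 2)
H = [[1.0, 2.0, 1.0], [2.0, 1.0, 1.0]]     # basis vectors (2 x 3)
X = matmul(W, H)

# Hand-picked invertible mixing matrix and its inverse (an assumption).
A = [[1.0, -0.25], [0.0, 1.0]]
A_inv = [[1.0, 0.25], [0.0, 1.0]]

W2 = matmul(W, A)        # alternative weights
H2 = matmul(A_inv, H)    # alternative basis vectors

print(is_nonnegative(W2), is_nonnegative(H2))
print(squared_error(X, W, H), squared_error(X, W2, H2))
```

In contrast to case B, restarting the optimization does not help here: both solutions have the same cost, so an additional criterion is needed to prefer one over the other.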
Figure 3.1: The x-axes denote the space of possible parameter settings, while the y-axes show the value of a cost function. In case A (left) there are several different optimal solutions which have an equal cost function. In contrast, case B has one global minimum and one or more local optima.
Established analysis tools such as PCA have a fixed hierarchy of the extracted principal components, which are sorted according to their explained variance and are orthogonal to each other. NMF does not have such an ordering of the extracted components.
In contrast, there can be several equivalent solutions at an equal level of the cost function (type A). In addition, NMF algorithms are greedy algorithms and can, in principle, only converge (if at all) to local minima of the cost function.
There are some recent attempts to deal with the non-uniqueness of NMF solutions. [DS04] poses the two fundamental questions:
1. Under what assumptions is NMF well-defined, e.g. in some sense unique?
2. Under what assumptions is the factorization correct, recovering the right answer?
The idea of a simplicial cone in the non-negative orthant, spanned by the basis vectors and containing the data cloud, is discussed, and situations are given (e.g. an ice-cream cone) where there is only one possible solution.
Other approaches add penalty terms to the plain cost function, such as the sparsity constraint on the weights used in non-negative sparse coding by Hoyer [Hoy02]
\[ D_{\mathrm{nnsc}} = \sum_i \sum_j \left( X_{ij} - [WH]_{ij} \right)^2 + \lambda \sum_{ik} W_{ik} \tag{3.1} \]
or the constraints used in the local NMF algorithm (LNMF) by Li et al. [LHZC03] encouraging spatially localized representations
\[ D_{\mathrm{lnmf}} = \sum_i \sum_j \left( X_{ij} \ln \frac{X_{ij}}{[WH]_{ij}} - X_{ij} + [WH]_{ij} \right) + \alpha \sum_{ia} [W^T W]_{ia} - \beta \sum_k [H H^T]_{kk} \tag{3.2} \]
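To make the structure of the two penalized cost functions explicit, the following sketch evaluates (3.1) and (3.2) for given matrices. It is an illustration under assumptions rather than an implementation of the cited algorithms: the function names `d_nnsc` and `d_lnmf` are our own, matrices are plain lists of rows to keep the example dependency-free, all entries of WH are assumed to be strictly positive, and the convention 0 ln 0 = 0 is used for zero entries of X.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(M):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def d_nnsc(X, W, H, lam):
    """Eq. (3.1): squared reconstruction error plus lam times the sum of W."""
    WH = matmul(W, H)
    fit = sum((x - r) ** 2
              for row_x, row_r in zip(X, WH)
              for x, r in zip(row_x, row_r))
    penalty = lam * sum(v for row in W for v in row)
    return fit + penalty


def d_lnmf(X, W, H, alpha, beta):
    """Eq. (3.2): divergence term plus alpha * sum of all entries of W^T W
    minus beta * trace of H H^T."""
    WH = matmul(W, H)
    div = 0.0
    for row_x, row_r in zip(X, WH):
        for x, r in zip(row_x, row_r):
            # Assumed convention: x * ln(x / r) is taken as 0 for x == 0.
            log_term = x * math.log(x / r) if x > 0 else 0.0
            div += log_term - x + r
    WtW = matmul(transpose(W), W)
    HHt = matmul(H, transpose(H))
    term_w = alpha * sum(v for row in WtW for v in row)
    term_h = beta * sum(HHt[k][k] for k in range(len(HHt)))
    return div + term_w - term_h


# Hypothetical example in which X is reproduced exactly by W and H.
W = [[2.0, 1.0], [1.0, 2.0], [1.0, 1.0]]
H = [[1.0, 2.0, 1.0], [2.0, 1.0, 1.0]]
X = matmul(W, H)

print(d_nnsc(X, W, H, lam=0.5))
print(d_lnmf(X, W, H, alpha=1.0, beta=1.0))
```

For an exact factorization the reconstruction terms vanish, so the printed values consist of the penalty terms alone. This makes visible that the penalties, not the reconstruction error, distinguish between otherwise equivalent solutions.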
Various regularizing constraints have also been added to enforce certain characteristics of the solutions or to impose prior knowledge about the application considered [Hoy04]. Further, [TST05] analyzed the uniqueness properties of sparse NMF and proved that Hoyer’s algorithm [Hoy04] indeed finds the closest points of fixed sparseness.