Nuclear Norm Minimization - Message Passing


6.1.2 Nuclear Norm Minimization

In the previous section, we compared the guarantees of OptSpace and Alternating Least Squares with some fundamental limits of matrix completion. In this section, we compare them with the guarantees achieved by a parallel approach to matrix completion, namely nuclear norm minimization, in the noiseless scenario. In this context, a natural optimization problem for matrix completion is the following:

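\[
\begin{aligned}
\text{minimize} \quad & \operatorname{rank}(X) \\
\text{subject to} \quad & P_E(X) = P_E(M)
\end{aligned}
\tag{6.3}
\]

where the optimization is over matrices $X \in \mathbb{R}^{m \times n}$ and $P_E(\cdot) : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ was defined in Section 3.1 as

\[
P_E(A)_{ij} =
\begin{cases}
A_{ij} & \text{if } (i, j) \in E, \\
0 & \text{otherwise.}
\end{cases}
\]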
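The projection onto the revealed set can be sketched in a few lines of NumPy. This is only an illustration: the function name `project_onto_revealed`, the representation of E as a list of index pairs, and the example matrix are assumptions made here, not notation from the text.

```python
import numpy as np

def project_onto_revealed(A, revealed):
    """Apply P_E: keep the entries of A indexed by E, set all others to zero.

    `revealed` is a list of (i, j) index pairs standing in for the set E
    (a hypothetical representation chosen for this sketch).
    """
    mask = np.zeros(A.shape, dtype=bool)
    for i, j in revealed:
        mask[i, j] = True
    return np.where(mask, A, 0.0)

# Small rank-1 example matrix with three revealed entries.
M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
E = [(0, 0), (0, 2), (1, 1)]
P_E_M = project_onto_revealed(M, E)
print(P_E_M)
```

Any candidate X is feasible for (6.3) exactly when `project_onto_revealed(X, E)` equals `P_E_M`.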

That is, consider all matrices that agree with M on the revealed set E and return the one with the smallest rank. When there is a unique rank-r matrix that agrees with the revealed entries, this problem recovers M exactly. However, this problem is NP-hard, and known algorithms for solving it exactly have a complexity that is doubly exponential in the problem dimension [27].

To overcome this difficulty, it is common to use the nuclear norm minimization heuristic, introduced by Fazel [40, 41]:

\[
\begin{aligned}
\text{minimize} \quad & \|X\|_* \\
\text{subject to} \quad & P_E(X) = P_E(M)
\end{aligned}
\tag{6.4}
\]

where $\|X\|_*$ denotes the nuclear norm of $X$, i.e., the sum of the singular values of $X$.

The nuclear norm is a convex function of X and is commonly used as a proxy for rank(X). It has been shown that this problem can be formulated as a semidefinite program (SDP) [41] and can be solved by off-the-shelf solvers with a computational complexity of O(n⁴). Note that this is a significant improvement over solving (6.3). However, an O(n⁴) complexity is still prohibitive for large datasets (n ≥ 1000). In Section 6.3.1, we present a few low-complexity algorithms for solving (6.4).
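For orientation, one standard way of writing problem (6.4) as an SDP is sketched below. It is taken from the general literature on nuclear norm minimization rather than from this section, and the auxiliary symmetric matrices $W_1 \in \mathbb{R}^{m \times m}$ and $W_2 \in \mathbb{R}^{n \times n}$ are names introduced here for illustration only.

```latex
\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2}\bigl(\operatorname{tr}(W_1) + \operatorname{tr}(W_2)\bigr) \\
\text{subject to} \quad &
\begin{bmatrix} W_1 & X \\ X^{T} & W_2 \end{bmatrix} \succeq 0, \\
& P_E(X) = P_E(M)
\end{aligned}
```

The optimization is over $X$, $W_1$ and $W_2$ jointly. For a fixed $X$, the smallest attainable value of the objective is $\|X\|_*$, which is why this program has the same minimizers in $X$ as the nuclear norm problem.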

Nuclear norm minimization is closely related to compressed sensing [32, 22]. In compressed sensing, we wish to find the sparsest vector satisfying a set of affine constraints:

\[
\begin{aligned}
\text{minimize} \quad & \|x\|_0 \\
\text{subject to} \quad & Ax = b
\end{aligned}
\tag{6.5}
\]

where $\|x\|_0$ denotes the number of non-zero entries of $x$. This problem is again NP-hard. A common heuristic is to replace it with the convex problem:

\[
\begin{aligned}
\text{minimize} \quad & \|x\|_1 \\
\text{subject to} \quad & Ax = b
\end{aligned}
\tag{6.6}
\]

where $\|\cdot\|_1$ denotes the ℓ₁ norm of a vector, i.e., the sum of the absolute values of its entries. It was shown [32, 22] that the solution of the convex optimization problem (6.6) coincides with the solution of (6.5) under appropriate conditions on the measurement matrix $A$.

The similarities between the matrix completion problems (6.3), (6.4) and the compressed sensing problems (6.5), (6.6) are striking. In (6.3), the rank counts the number of non-zero singular values of a matrix, which is analogous to the ℓ₀ function. In turn, the nuclear norm measures the sum of the singular values, analogous to the ℓ₁ function. These analogies lead to interesting connections between the two problems, and a number of matrix completion algorithms are derived from their compressed sensing counterparts. Some of these algorithms are described in Section 6.3.1. Further connections between compressed sensing and nuclear norm minimization are explored in [88].
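The analogy can be checked numerically on a small example. The sketch below, which assumes NumPy and uses an arbitrary rank-1 matrix and a tolerance `tol` chosen only for illustration, computes the rank as the ℓ₀ count of the singular value vector and the nuclear norm as its ℓ₁ norm.

```python
import numpy as np

# Rank-1 example matrix (an outer product), chosen only for illustration.
X = np.outer([1.0, 2.0], [3.0, 4.0, 0.0])

# Singular values of X, returned in descending order.
sigma = np.linalg.svd(X, compute_uv=False)

# Threshold below which a singular value is treated as zero (an assumption).
tol = 1e-10

# Rank = number of non-zero singular values (l0 count of sigma).
rank_as_l0 = int(np.sum(sigma > tol))

# Nuclear norm = sum of singular values (l1 norm of sigma).
nuclear_as_l1 = float(np.sum(np.abs(sigma)))

# Cross-check against NumPy's built-in routines.
assert rank_as_l0 == np.linalg.matrix_rank(X)
assert np.isclose(nuclear_as_l1, np.linalg.norm(X, ord="nuc"))
```

Since X is an outer product of two vectors with Euclidean norms √5 and 5, its only non-zero singular value is 5√5, so the rank is 1 and the nuclear norm equals 5√5.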

The relaxed optimization problem (6.4) has a significant reduction in computational complexity compared to (6.3). But when is it a good estimator of M? This question was addressed by Candès and Recht in [21]. A significant challenge in analyzing nuclear norm minimization, as compared to compressed sensing, is that the restricted isometry property, which proved to be quite instrumental in analyzing compressed sensing, is not satisfied in the matrix completion problem. To address this challenge, Candès and Recht introduced the incoherence property, which is a less restrictive condition and has proved to be very useful. Under the simple noiseless setting and assuming the incoherence property with parameter µ and uniform sampling, they proved that there exists a numerical constant C such that if $|E| \ge C n^{6/5} r \log n$, then solving (6.4) recovers M correctly with high probability.

The guarantees were further tightened using ideas from [49, 87, 24]. A major breakthrough was the introduction of powerful matrix concentration inequalities by Gross [50] in the context of quantum state tomography. The most recent analysis shows that (6.4) recovers the matrix M exactly if

|E| ≥ Cnµr(log n)² (6.7)

This bound coincides with the guarantees obtained for Alternating Least Squares in Theorem 5.2.1 when the condition number κ is bounded. In addition, Theorem 5.2.1 also provides the rate of convergence which is of independent interest.

The results obtained for OptSpace in Theorem 4.2.2 guarantee exact matrix recovery if

|E| ≥ Cnµrκ²max{log n, µrκ⁴}

This improves over (6.7) when µr = O((log n)²) and the matrix M has bounded condition number. Further, when the rank and the incoherence parameter are bounded, the guarantee achieved by OptSpace, namely exact reconstruction for

|E| ≥ Cn log n


coincides with the lower bounds derived in Section 6.1.1 up to constants and is therefore order-optimal. Note that in many practical applications, such as positioning [97] and structure-from-motion [26], the rank is known to be small. Indeed, in these applications the rank is comparable to the ambient dimension of 3. However, the bounds achieved for OptSpace are sub-optimal in the following cases:

(i) Large rank: The number of samples required for reconstruction should scale linearly in r rather than quadratically as suggested by Theorem 4.2.2. However, as should be clear from the numerical experiments of Section 6.3.2, this appears to be a drawback of our analysis rather than the algorithm.

(ii) Large condition number: This appears to be a limitation of the singular value decomposition step. However, [61] introduces a simple modification of OptSpace and shows empirically that it overcomes this problem.

Finally, note that the second drawback is shared by the Alternating Least Squares algorithm which also uses the singular value decomposition for initialization.
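To get a feel for how the sample-size bound (6.7) and the OptSpace bound from Theorem 4.2.2 compare, the following sketch evaluates both right-hand sides for a few ranks. It is only a scaling comparison under loud assumptions: the numerical constants C in the two bounds are unknown and need not be equal, so both are set to 1 here, and the function names and the parameter values (n, µ, κ) are hypothetical choices made for illustration.

```python
import math

def nuclear_norm_bound(n, mu, r, C=1.0):
    """Right-hand side of (6.7): C * n * mu * r * (log n)^2.

    C = 1.0 is a placeholder; the true constant is not specified.
    """
    return C * n * mu * r * math.log(n) ** 2

def optspace_bound(n, mu, r, kappa, C=1.0):
    """OptSpace bound: C * n * mu * r * kappa^2 * max(log n, mu * r * kappa^4).

    C = 1.0 is a placeholder; the true constant is not specified.
    """
    return C * n * mu * r * kappa ** 2 * max(math.log(n), mu * r * kappa ** 4)

# Hypothetical setting: bounded incoherence and condition number.
n, mu, kappa = 10 ** 6, 1.0, 1.0

for r in (1, 10, 100, 1000):
    nn = nuclear_norm_bound(n, mu, r)
    osb = optspace_bound(n, mu, r, kappa)
    smaller = "OptSpace" if osb < nn else "nuclear norm"
    print(f"r={r:5d}  (6.7): {nn:.3e}  OptSpace: {osb:.3e}  smaller: {smaller}")
```

With these placeholder values, (log n)² is about 191, so the OptSpace expression is the smaller one while µr stays below that threshold and becomes the larger one once µr exceeds it, which matches the quadratic dependence on r discussed in item (i).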

In document EFFICIENT ALGORITHMS FOR COLLABORATIVE FILTERING (Page 116-119)