Fundamental Limits - Message Passing - EFFICIENT ALGORITHMS FOR COLLABORATIVE FILTERING

5.3 Message Passing

6.1.1 Fundamental Limits

Let us begin the investigation on fundamental limits of matrix completion by consid-ering the simple but instructive case of rank r = 1. In this case, graph-theoretical interpretations provide the limits we seek. Assume that we know 3 entries of the matrix M that belong to the same 2 × 2 minor. Explicitly, for two row indices i, j ∈ [m] and two column indices a, b ∈ [n], the entries M^ia, Mja and Mib are known.

Unless Mia = 0, the fourth entry of the same minor is then uniquely determined M_jb = M_jaM_ib/M_ia. The case M_ia = 0 can be treated separately but, for the sake of simplicity we shall assume that M_ia 6= 0.

This observation suggests a simple matrix completion algorithm: Recursively look for a 2× 2 minor with a unique unknown entry and complete it according to the rule Mjb = MjaMib/Mia. As anticipated above, this algorithm has a nice graph-theoretic interpretation. Consider the bipartite graph G = (R, C, E) with vertices R corresponding to the rows and vertices C corresponding to the columns of M and edges E for the observed entries. Essentially, R is isomorphic to [m] and C to [n]. If a 2× 2 minor has a unique unknown entry, it means that the corresponding vertices j ∈ R, b ∈ C are connected by a length-3 path in G. Hence the algorithm recursively adds edges to G connecting distance-3 vertices.

After at most O(n²) operations the process described halts on a graph that is a disjoint union of cliques, corresponding to the connected components in G. Each edge corresponds to a correctly predicted matrix entry. If the graph is connected then there is a unique rank-1 matrix matching the sampled entries, and the above recursion recovers M exactly. If it is disconnected then multiple rank-1 matrices match the observations and no algorithm can distinguish the correct M.

Let us use a parametrization of ǫ = √

α(log m + w). The number of isolated nodes

in G is a Poisson random variable with mean e^−w. Hence

P(∃ an isolated node in G) → 1 − e^−e⁻^w (6.1) Thus, if w → −∞, then with high probability, there exists isolated nodes in G and matrix reconstruction is impossible by any algorithm. Further, it is known [18] that for G(n, p) graphs, the probability that a random graph is connected is about the same as the probability that there are no isolated nodes. Together with 6.1 the analysis could be generalized to bipartite graphs. Thus, if w→ ∞, then the graph is connected, and the above recursive algorithm reconstructs M. Observe that the order of|E| predicted by the above analysis coincides (within log n factors) with the guarantees achieved for OptSpace and Alternating Least Squares for incoherent matrices.

The above analysis is sharpened by the following consideration from [58]. In the large n-limit only the components with O(n) vertices matter (as they have O(n²)edges). It is a fundamental result in random graph theory [39] that there is no such component for ǫ≤ 1/√

α. For ǫ≥ 1/√

α there is one such component involv-ing approximately nξ vertices in R and mζ vertices in C, where (ξ, ζ) is the largest solution of

ξ = 1− exp^−ǫ^√^αζ ζ = 1− exp

− ǫ

√αξ

(6.2)

This analysis implies the following result as proved in [58].

Proposition 6.1.1. Let M be a random rank 1 matrix, and denote by ξ(ǫ), ζ(ǫ) the largest solution of (6.2). Then there exists an algorithm with O(n²) complexity achieving with high probability, a room mean squared error

RMSE≤p

1− ξ(ǫ)ζ(ǫ)M^max+ O(p

log n/n) where Mmax is the largest entry of the matrix M.

This result implies that arbitrarily small root mean squared error can be achieved with a large enough constant ǫ. A very simple extension of this algorithm to general

6.1. NOISELESS SCENARIO 105

rank-r matrices is introduced and analyzed in [77] under the assumption that all the r× r minors of U and V are full-rank (here U and V are the left and right factors resulting from the SVD of M = USV^T).

Another argument to understand the log n factor in the necessary condition is via the coupon collector’s problem. Assume that there are n coupons with labels in [n] that we wish to collect. But the only way of obtaining a coupon is by picking one uniformly at random from the set of all coupons [n]. Then, it is well known that we need to pick O(n log n) coupons to be sure of having at least one of each type. Replacing coupons with rows of the matrix M, it follows that we need to pick O(n log n) samples to be sure of sampling at least one entry from each row.

Clearly, if a particular row has no samples, then no algorithm can reconstruct M exactly. This essentially leads to the lower bound of O(n log n) samples for exact matrix reconstruction.

For general rank r matrices, a stronger necessary condition was proved assuming incoherence by Cand`es and Tao in [24]. They showed that if

|E| ≤ (3/4)nrµ log n

then there exist multiple µ-incoherent matrices of rank r whose entries, with prob-ability larger than 1/2, coincide on the revealed set E. This implies that assuming only the rank r and incoherence parameter µ, no algorithm can guarantee success with less than (3/4)nrµ log n random samples.

The above analyses provide probabilistic limits on completion for the uniform sampling model. It is possible to obtain bounds using a complete different approach -based on rigidity theory. In [98], Singer and Cucuringu introduced an algorithm that checks if there exists multiple rank r matrices that are identical given the sampled set E. The basic idea is the following. Factorize M as M = XY^T and think of the rows of X (xi) and Y (yj) as points in R^r. Then the revealed matrix entries can be thought of as defining the distance between two points and the edge set E as the graph imposed on these points. Now, if this “structure” is not rigid, it means that there are multiple solutions to the positions of the points (x_i and y_j) that satisfy the given constraints.

This implies that no algorithm can reconstruct M exactly. This bound is presented for comparison with our numerical results for exact matrix reconstruction in Section 6.3.2.

In document EFFICIENT ALGORITHMS FOR COLLABORATIVE FILTERING (Page 113-116)