
SIM Robustified: Missing Data

Our previous solution to handling data corruption relies on the computation of the row space $V$ of the data $X$. When the data contains missing entries, the row space can no longer be computed by a simple SVD. Here, we exploit the idea that our true goal is to estimate the subspace on which the data lies (of which $V$ is an orthogonal basis). Linear subspaces of a fixed dimension form a Riemannian manifold known as the Grassmannian. Therefore, we propose to make use of an optimization technique on the Grassmann manifold to obtain an estimate of $V$ in the presence of missing data.

More formally, let $\mathcal{G}(N,r)$ denote the Grassmann manifold of $r$-dimensional linear subspaces of $\mathbb{R}^N$ (Chikuse [2003]). A point $\mathcal{Y} \in \mathcal{G}(N,r)$, i.e., an $r$-dimensional subspace of $\mathbb{R}^N$, can be represented by any orthogonal matrix $V \in \mathbb{R}^{N \times r}$ whose columns span the $r$-dimensional subspace $\mathcal{Y}$. Estimating the row space $V$ (an orthogonal matrix) of the data matrix can then be thought of as finding the corresponding linear subspace on $\mathcal{G}(N,r)$.
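To illustrate that a point on the Grassmannian is a subspace rather than a particular basis, the following minimal sketch builds two different orthonormal bases of the same subspace and checks that they induce the same orthogonal projector. The sizes `N` and `r`, the random seed, and the variable names are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 6, 2  # hypothetical sizes, chosen only for illustration

# An orthonormal basis V of a random r-dimensional subspace of R^N.
V, _ = np.linalg.qr(rng.standard_normal((N, r)))

# Rotating the basis by any r x r orthogonal matrix Q gives another
# representative of the same point on the Grassmannian.
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
V_rot = V @ Q

# Both bases induce the same orthogonal projector V V^T, which identifies
# the subspace independently of the chosen basis.
same_subspace = np.allclose(V @ V.T, V_rot @ V_rot.T)
print(same_subspace)
```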

To estimate $V$, we utilize the GROUSE (Grassmannian Rank-One Update Subspace Estimation) algorithm (Balzano et al. [2010]). GROUSE is an efficient online algorithm that recovers the column space of a highly incomplete observation matrix. To this end, it utilizes a gradient descent method on the Grassmannian to incrementally update the subspace by considering one column of the observation matrix at a time.

More specifically, in our context, at each iteration $t$, we take as input a vector $\mathbf{x}_{\Omega_t} \in \mathbb{R}^{N_t}$, which corresponds to the partial observation of a single vector $\mathbf{x}_t \in \mathbb{R}^N$ in the data matrix $X$,² with observed indices defined by $\Omega_t \subset \{1, \dots, N\}$ and $N_t = |\Omega_t|$. Let $V_{\Omega_t}$ be the submatrix of $V$ consisting of the rows indexed by $\Omega_t$. Following the GROUSE formalism, which relies on the least-squares reconstruction of the data, we can formulate the update at iteration $t$ as the solution to the optimization problem

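\begin{equation}
\min_{V \in \mathcal{G}(N,r),\, \mathbf{a}} \; \frac{1}{2} \left\| V_{\Omega_t} \mathbf{a} - \mathbf{x}_{\Omega_t} \right\|_2^2 , \tag{4.12}
\end{equation}

where $\mathbf{a}$ corresponds to the representation (or weights) of the data $\mathbf{x}_t$ in the current estimate of the subspace.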

Since (4.12) is not jointly convex in $\mathbf{a}$ and $V$, the two variables are obtained in a sequential manner: first, the optimal weights $\mathbf{w} = \mathbf{a}^*$ are computed for the current subspace, and then the subspace is updated given those weights. Due to the least-squares form of the objective function, the solution for the weights can be obtained in closed form as $\mathbf{w} = V_{\Omega_t}^{\dagger} \mathbf{x}_{\Omega_t}$, where $V_{\Omega_t}^{\dagger}$ is the pseudoinverse of $V_{\Omega_t}$. To update the subspace, i.e., the orthogonal basis matrix $V$, GROUSE exploits an incremental gradient descent method on the Grassmann manifold, which we describe below.
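The closed-form weight computation can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `solve_weights`, the toy sizes, and the observed index set are hypothetical, and `np.linalg.lstsq` is used in place of an explicit pseudoinverse because it returns the same minimum-norm least-squares solution.

```python
import numpy as np

def solve_weights(V, x_obs, omega):
    """Return w = pinv(V[omega]) @ x_obs, the least-squares weights."""
    V_omega = V[omega, :]  # rows of V indexed by the observed entries
    # lstsq returns the minimum-norm least-squares solution, which
    # coincides with applying the pseudoinverse of V_omega to x_obs.
    w, *_ = np.linalg.lstsq(V_omega, x_obs, rcond=None)
    return w

# Toy check: a vector lying exactly in span(V), observed on 5 of 8 entries.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((8, 3)))
a_true = rng.standard_normal(3)
x_full = V @ a_true
omega = np.array([0, 2, 3, 5, 7])  # hypothetical observed indices
w = solve_weights(V, x_full[omega], omega)
print(np.allclose(w, a_true))
```

When the observed rows of the basis have full column rank, as in this toy case, the weights of a vector that lies in the subspace are recovered exactly from its observed entries alone.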

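Let $I_{\Omega_t} \in \mathbb{R}^{N \times N_t}$ be the $N_t$ columns of the $N \times N$ identity matrix indexed by $\Omega_t$. Then, the objective function of (4.12) can be rewritten as

\begin{equation}
E_t = \frac{1}{2} \left\| I_{\Omega_t} \left( V_{\Omega_t} \mathbf{w} - \mathbf{x}_{\Omega_t} \right) \right\|_2^2 . \tag{4.13}
\end{equation}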

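The update of the subspace is achieved by taking a step in the direction of the negative gradient of this objective function on the Grassmannian, i.e., moving along the geodesic defined by the negative Grassmannian gradient. To this end, we first need to compute the regular gradient of the objective function with respect to $V$. This gradient can be written as

\begin{align}
\frac{\partial E_t}{\partial V} &= -\left( I_{\Omega_t} \left( \mathbf{x}_{\Omega_t} - V_{\Omega_t} \mathbf{w} \right) \right) \mathbf{w}^T \tag{4.14} \\
&= -\mathbf{r} \mathbf{w}^T , \tag{4.15}
\end{align}

where $\mathbf{r} = I_{\Omega_t} (\mathbf{x}_{\Omega_t} - V_{\Omega_t} \mathbf{w})$ denotes the (zero-padded) vector of residuals.

² Note that even though we consider $\mathbf{x}_t$ to be a column vector, it really corresponds to one row of the data matrix $X$.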

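The gradient on the Grassmannian can then be obtained by projecting the regular gradient onto the tangent space of the Grassmannian at the current point. Following Edelman et al. [1998], this can be written as

\begin{align}
\nabla E_t &= (I - V V^T) \frac{\partial E_t}{\partial V} \tag{4.16} \\
&= -(I - V V^T)\, \mathbf{r} \mathbf{w}^T \tag{4.17} \\
&= -\mathbf{r} \mathbf{w}^T , \tag{4.18}
\end{align}

where the last equality holds because the least-squares residual is orthogonal to the columns of $V_{\Omega_t}$, so that $V^T \mathbf{r} = V_{\Omega_t}^T (\mathbf{x}_{\Omega_t} - V_{\Omega_t} \mathbf{w}) = 0$.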
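As a numerical illustration of (4.14) to (4.18), the sketch below computes the zero-padded residual, the regular gradient, and its projection onto the tangent space, and checks that the projection leaves the gradient unchanged. The function name `grouse_gradients`, the toy sizes, and the index set are assumptions introduced only for this example.

```python
import numpy as np

def grouse_gradients(V, x_obs, omega):
    """Residual, weights, regular gradient (4.15), projected gradient (4.16)."""
    N = V.shape[0]
    w, *_ = np.linalg.lstsq(V[omega, :], x_obs, rcond=None)
    res = np.zeros(N)
    res[omega] = x_obs - V[omega, :] @ w   # zero-padded residual r
    euclid_grad = -np.outer(res, w)        # dE_t/dV = -r w^T
    proj = np.eye(N) - V @ V.T             # projector onto the tangent space
    riem_grad = proj @ euclid_grad         # (I - V V^T) dE_t/dV
    return res, w, euclid_grad, riem_grad

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((10, 3)))
omega = np.array([0, 1, 3, 4, 6, 8, 9])    # hypothetical observed indices
x_obs = rng.standard_normal(omega.size)
res, w, euclid_grad, riem_grad = grouse_gradients(V, x_obs, omega)

# V^T r vanishes, so the projection leaves the gradient unchanged.
print(np.allclose(V.T @ res, 0.0), np.allclose(euclid_grad, riem_grad))
```

Forming the $N \times N$ projector explicitly is done here only to mirror (4.16); since the projection has no effect, an implementation can use $-\mathbf{r}\mathbf{w}^T$ directly.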

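As shown in Edelman et al. [1998] (or Theorem 2.2 in Chapter 2), a gradient step along the geodesic with tangent vector $-\nabla E_t$ is defined as a function of the singular values and vectors of $\nabla E_t$. Since $\nabla E_t$ has rank one, its SVD is trivial to compute. The compact SVD of $-\nabla E_t$ can be written as

\begin{align}
-\nabla E_t &= \frac{\mathbf{r}}{\|\mathbf{r}\|} \times \|\mathbf{r}\| \|\mathbf{w}\| \times \left( \frac{\mathbf{w}}{\|\mathbf{w}\|} \right)^T \tag{4.19} \\
&= \mathbf{p}_1 \sigma \mathbf{q}_1^T , \tag{4.20}
\end{align}

with $\mathbf{p}_1 = \frac{\mathbf{r}}{\|\mathbf{r}\|}$, $\sigma = \|\mathbf{r}\| \|\mathbf{w}\|$, and $\mathbf{q}_1 = \frac{\mathbf{w}}{\|\mathbf{w}\|}$.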

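Let $\mathbf{p}_2, \dots, \mathbf{p}_r$ be an orthonormal set orthogonal to $\mathbf{p}_1$, and $\mathbf{q}_2, \dots, \mathbf{q}_r$ be an orthonormal set orthogonal to $\mathbf{q}_1$. Then the SVD of $-\nabla E_t$, including its zero singular values, can be written as

\begin{align}
-\nabla E_t &= P \Sigma Q^T \tag{4.21} \\
&= [\mathbf{p}_1 \; \mathbf{p}_2 \; \cdots \; \mathbf{p}_r] \times \mathrm{diag}(\sigma, 0, \dots, 0) \times [\mathbf{q}_1 \; \mathbf{q}_2 \; \cdots \; \mathbf{q}_r]^T , \tag{4.22}
\end{align}

with $P = [\mathbf{p}_1 \; \mathbf{p}_2 \; \cdots \; \mathbf{p}_r] \in \mathbb{R}^{N \times r}$, $\Sigma = \mathrm{diag}(\sigma, 0, \dots, 0) \in \mathbb{R}^{r \times r}$, and $Q = [\mathbf{q}_1 \; \mathbf{q}_2 \; \cdots \; \mathbf{q}_r] \in \mathbb{R}^{r \times r}$. Clearly, $P^T P = I_{r \times r}$ and $Q^T Q = Q Q^T = I_{r \times r}$.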

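Following Eq. (2.5), we update $V$ with a step size $\eta$ as

\begin{align*}
V(\eta) &= \begin{bmatrix} VQ & P \end{bmatrix} \begin{bmatrix} \cos(\Sigma\eta) \\ \sin(\Sigma\eta) \end{bmatrix} Q^T \\
&= VQ\cos(\Sigma\eta)Q^T + P\sin(\Sigma\eta)Q^T \\
&= VQ\cos(\mathrm{diag}(\sigma\eta, 0, \dots, 0))Q^T + P\sin(\mathrm{diag}(\sigma\eta, 0, \dots, 0))Q^T \\
&= VQ\,\mathrm{diag}(1, 1, \dots, 1)\,Q^T + VQ\,\mathrm{diag}(\cos(\sigma\eta) - 1, 0, \dots, 0)\,Q^T + P\,\mathrm{diag}(\sin(\sigma\eta), 0, \dots, 0)\,Q^T \\
&= VQIQ^T + (\cos(\sigma\eta) - 1)\, V \mathbf{q}_1 \mathbf{q}_1^T + \sin(\sigma\eta)\, \mathbf{p}_1 \mathbf{q}_1^T \\
&= V + (\cos(\sigma\eta) - 1)\, V \mathbf{q}_1 \mathbf{q}_1^T + \sin(\sigma\eta)\, \mathbf{p}_1 \mathbf{q}_1^T \\
&= V + \frac{\cos(\sigma\eta) - 1}{\|\mathbf{w}\|^2}\, V \mathbf{w} \mathbf{w}^T + \sin(\sigma\eta)\, \frac{\mathbf{r}}{\|\mathbf{r}\|} \frac{\mathbf{w}^T}{\|\mathbf{w}\|} .
\end{align*}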

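In short, the update of $V$ at time $t$ is given by

\begin{equation}
V_{t+1} = V_t + \frac{\cos(\sigma\eta) - 1}{\|\mathbf{w}\|^2}\, V_t \mathbf{w} \mathbf{w}^T + \sin(\sigma\eta)\, \frac{\mathbf{r}}{\|\mathbf{r}\|} \frac{\mathbf{w}^T}{\|\mathbf{w}\|} . \tag{4.23}
\end{equation}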
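The rank-one update (4.23) can be sketched as a single NumPy function. This is a minimal illustration, not the reference GROUSE implementation: the function name `grouse_step`, the tolerance used to detect a vanishing gradient, the toy sizes, and the step size `eta = 0.1` are all assumptions.

```python
import numpy as np

def grouse_step(V, x_obs, omega, eta):
    """One geodesic step (4.23) from a partially observed vector."""
    w, *_ = np.linalg.lstsq(V[omega, :], x_obs, rcond=None)
    res = np.zeros(V.shape[0])
    res[omega] = x_obs - V[omega, :] @ w        # zero-padded residual r
    norm_r, norm_w = np.linalg.norm(res), np.linalg.norm(w)
    if norm_r < 1e-12 or norm_w < 1e-12:        # assumed tolerance
        return V.copy()                         # zero gradient, no update
    sigma = norm_r * norm_w                     # only nonzero singular value
    p1, q1 = res / norm_r, w / norm_w           # singular vectors of -grad
    return (V
            + (np.cos(sigma * eta) - 1.0) * np.outer(V @ q1, q1)
            + np.sin(sigma * eta) * np.outer(p1, q1))

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((10, 3)))
omega = np.array([0, 1, 3, 4, 6, 8, 9])         # hypothetical observed indices
x_obs = rng.standard_normal(omega.size)
eta = 0.1                                       # hypothetical step size
V_next = grouse_step(V, x_obs, omega, eta)

# The update moves along a geodesic, so the columns stay orthonormal.
print(np.allclose(V_next.T @ V_next, np.eye(3)))
```

Each step costs one small least-squares solve and two outer products, and no re-orthonormalization is needed after the update because the geodesic stays on the manifold.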

The Grassmannian update is very efficient since each subspace update only involves linear operations. Furthermore, for a specific diminishing step size $\eta$, it is guaranteed to converge to a locally optimal estimate of $V$ (Balzano et al. [2010]). After getting an estimate of $V$ using this method, we can directly apply RSIM to perform subspace clustering.

The pseudocode of our robust SIM with missing data (RSIM-M) algorithm is given in Algorithm 4.4. Note that:

1. Stochastic gradient descent may require a relatively large number of steps to be stable. With small amounts of data, we run multiple passes over the data. For example, in our experiments on motion segmentation with incomplete trajectories, we iterated over all the frames 100 times. Thanks to the high efficiency of the rank-one Grassmannian update, RSIM-M remains very efficient.

2. Due to the non-convexity of this problem, initialization is important for convergence speed and optimality. In practice, we start with the subspace spanned by the $r$ most complete rows of $X$.
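The two notes above can be combined into a rough sketch of the subspace estimation loop. It is not Algorithm 4.4: the function names, the zero-filling of missing entries during initialization, the step-size schedule `eta0 / step`, and the toy data are all assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

def grouse_step(V, x_obs, omega, eta):
    """One rank-one geodesic update from a partially observed vector."""
    w, *_ = np.linalg.lstsq(V[omega, :], x_obs, rcond=None)
    res = np.zeros(V.shape[0])
    res[omega] = x_obs - V[omega, :] @ w
    norm_r, norm_w = np.linalg.norm(res), np.linalg.norm(w)
    if norm_r < 1e-12 or norm_w < 1e-12:
        return V                                   # zero gradient, no update
    sigma = norm_r * norm_w
    p1, q1 = res / norm_r, w / norm_w
    return (V + (np.cos(sigma * eta) - 1.0) * np.outer(V @ q1, q1)
              + np.sin(sigma * eta) * np.outer(p1, q1))

def init_from_most_complete_rows(X, mask, r):
    """Orthonormal basis spanned by the r rows with the most observed entries.

    Assumption: missing entries of those rows are zero-filled.
    """
    order = np.argsort(-mask.sum(axis=1), kind="stable")[:r]
    rows = np.where(mask[order], X[order], 0.0)
    V0, _ = np.linalg.qr(rows.T)                   # N x r, orthonormal columns
    return V0

def estimate_row_space(X, mask, r, n_passes=100, eta0=1.0):
    """Several passes over the rows of X with step size eta0 / step (assumed)."""
    V = init_from_most_complete_rows(X, mask, r)
    step = 0
    for _ in range(n_passes):
        for t in range(X.shape[0]):
            omega = np.flatnonzero(mask[t])
            if omega.size == 0:
                continue                           # nothing observed in this row
            step += 1
            V = grouse_step(V, X[t, omega], omega, eta0 / step)
    return V

# Hypothetical toy data: a rank-2 matrix with roughly 30% of entries missing.
rng = np.random.default_rng(0)
D, N, r = 30, 12, 2
X = rng.standard_normal((D, r)) @ rng.standard_normal((r, N))
mask = rng.random((D, N)) < 0.7                    # True where an entry is observed
V_hat = estimate_row_space(X, mask, r, n_passes=20)
print(V_hat.shape)
```

The resulting `V_hat` would then play the role of the row-space estimate passed on to the clustering stage; how well it matches the true subspace depends on the step-size schedule and the number of passes, which would need tuning in practice.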