In this section, we address the following subspace clustering problem:

Problem: Given a data matrix $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{D \times N}$, with each column a data point, drawn from a union of $k$ subspaces $\{S_i\}_{i=1}^{k}$ of unknown dimensions, cluster the columns of $X$ into their respective subspaces.

Assumption: The subspaces $\{S_i\}_{i=1}^{k}$ in which the data lie are independent. In other words, $\dim(\oplus_{i=1}^{k} S_i) = \sum_{i=1}^{k} \dim(S_i)$, where $\oplus$ is the direct sum operator.
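The independence condition can be checked numerically by comparing ranks. The following is a minimal sketch, assuming each subspace is given by a matrix whose columns span it; the helper name `are_independent`, the tolerance, and the example lines are illustrative choices and not part of the original text.

```python
import numpy as np

def are_independent(bases, tol=1e-10):
    """Return True if dim(S_1 + ... + S_k) equals sum_i dim(S_i).

    `bases` is a list of D x d_i arrays whose columns span each subspace
    (hypothetical input format; columns need not be orthonormal).
    """
    dims = [np.linalg.matrix_rank(B, tol=tol) for B in bases]
    joint = np.linalg.matrix_rank(np.hstack(bases), tol=tol)
    return bool(joint == sum(dims))

# Three lines in R^3 along the coordinate axes: independent.
e1, e2, e3 = np.eye(3)[:, [0]], np.eye(3)[:, [1]], np.eye(3)[:, [2]]
print(are_independent([e1, e2, e3]))

# Three distinct lines inside the plane z = 0: any two of them meet only
# at the origin, yet together they span a 2-dimensional space (1+1+1 > 2),
# so they are not independent.
l3 = (e1 + e2) / np.sqrt(2)
print(are_independent([e1, e2, l3]))
```

The second case illustrates that independence is stronger than requiring the subspaces to intersect pairwise only at the origin.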

With this assumption, subspace clustering can be formulated by first solving, for each data point $x_i$, the optimization problem

$$\min_{c_i} \|c_i\|_p \quad \text{s.t.} \quad x_i = X_{-i} c_i , \tag{3.1}$$

where $\|\cdot\|_p$ is any $\ell_p$ vector norm with $p \ge 0$, and $X_{-i}$ is the data matrix from which the point $x_i$ has been removed. Given the coefficients $\{c_i\}$ for each data point, we can form the coefficient matrix $C = [c_1 \cdots c_N]$, from which the affinity matrix $W = |C| + |C^T|$ can be built. The clusters corresponding to the subspaces can then be obtained by applying spectral clustering.
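The pipeline described here (solve (3.1) per point, build $C$, form $W$, cluster) can be sketched as follows. This is only an illustration under several assumptions that are not in the original text: $p = 2$ is chosen so that the minimum-norm solution is available from a least-squares routine, each $c_i$ is placed in column $i$ of an $N \times N$ matrix with a zero at position $i$, the data are synthetic, and `spectral_labels` is a deliberately small stand-in for a full spectral clustering step (it uses a greedy grouping instead of k-means).

```python
import numpy as np

def self_expressive_coefficients(X):
    """Solve (3.1) with p = 2 for every column of X.

    np.linalg.lstsq returns the minimum 2-norm solution when the system
    x_i = X_{-i} c_i is underdetermined. Column i of the result holds c_i,
    re-inserted at the indices of the other points (diagonal stays zero).
    """
    N = X.shape[1]
    C = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)          # indices of X_{-i}
        C[others, i] = np.linalg.lstsq(X[:, others], X[:, i], rcond=1e-10)[0]
    return C

def spectral_labels(W, k):
    """Small spectral clustering stand-in (assumption: no k-means step).

    Embeds the points with the k bottom eigenvectors of L = D - W,
    normalizes the rows, and greedily groups rows by cosine similarity.
    """
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)                      # eigenvalues ascending
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    labels = np.zeros(len(W), dtype=int)
    reps = []                                        # one representative row per group
    for i, u in enumerate(U):
        sims = [abs(u @ r) for r in reps]
        if sims and max(sims) > 0.5:
            labels[i] = int(np.argmax(sims))
        else:
            reps.append(u)
            labels[i] = len(reps) - 1
    return labels

# Synthetic data: 6 points from each of two random 2-D subspaces of R^5
# (independent with probability one).
rng = np.random.default_rng(0)
B1, B2 = rng.standard_normal((5, 2)), rng.standard_normal((5, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 6)),
               B2 @ rng.standard_normal((2, 6))])

C = self_expressive_coefficients(X)
W = np.abs(C) + np.abs(C.T)                          # affinity matrix
labels = spectral_labels(W, k=2)
print(labels)
```

Because the columns of `X` are already sorted by subspace in this example, the off-diagonal blocks of `W` can be inspected directly; in practice the ordering is unknown and only the labels are available.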

The idea behind (3.1) is that the data is self-expressive, and thus a data point can be represented as a linear combination of all the other data points. To make this representation tight, the current data point should only use the points in the same subspace, which is achieved by minimizing the $\ell_p$ norm of the coefficients $c_i$. The following theorem shows that, when the subspaces are independent, the solution to (3.1) is block-diagonal, with non-zero blocks corresponding to points in the same subspace.

Theorem 3.1 Let $X \in \mathbb{R}^{D \times N}$ be the data matrix whose columns are drawn from a union of $k$ independent linear subspaces. Let us assume that $X$ has been sorted according to the subspaces, i.e., $X = [x_1 \cdots x_N]\Gamma$, where $\Gamma \in \mathbb{R}^{N \times N}$ is an unknown permutation matrix that specifies the clusters of the data. Then the solution to (3.1), $C^* = [c_1^* \cdots c_N^*]\Gamma$, is block diagonal, i.e., there are only connections within clusters and no connections between clusters.

Proof. Let $c_i^*$ be a solution of the $\ell_p$ problem in (3.1), i.e., a vector of minimum $\ell_p$ norm satisfying $x_i = X_{-i} c_i^*$. Let us decompose $c_i^*$ as $c_i^* = c_i + h_i$, where $c_i$ holds the coefficients of the points in the true subspace $S_i$ of $x_i$ and $h_i$ holds the coefficients of the points in the other subspaces. We only need to show that $h_i = 0$. Since $c_i^* = c_i + h_i$, we have $x_i = X_{-i} c_i^* = X_{-i}(c_i + h_i) = X_{-i} c_i + X_{-i} h_i$. Furthermore, $x_i \in S_i$ and $X_{-i} c_i \in S_i$, so $X_{-i} h_i = x_i - X_{-i} c_i \in S_i$. On the other hand, $X_{-i} h_i$ lies in the sum of the other subspaces, which, by the independence assumption, intersects $S_i$ only at the origin. Hence $X_{-i} h_i = 0$, and $c_i$ alone is feasible for (3.1). If we assume that $h_i \neq 0$, then, because $c_i$ and $h_i$ have support on disjoint subsets of indices, $\|c_i\|_p < \|c_i + h_i\|_p = \|c_i^*\|_p$, which contradicts the optimality of $c_i^*$. Therefore $h_i = 0$, which concludes the proof.
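The strict inequality used in the last step of the proof can be spelled out. The short derivation below is a sketch that assumes $0 < p < \infty$ (with the counting convention for $p = 0$ noted separately); the index sets $I$ and $J$ are notation introduced here for the disjoint supports of $c_i$ and $h_i$.

```latex
% I = supp(c_i), J = supp(h_i), with I \cap J = \emptyset and h_i \neq 0.
\begin{align*}
\|c_i + h_i\|_p^p
  &= \sum_{j \in I} |(c_i)_j|^p + \sum_{j \in J} |(h_i)_j|^p
   = \|c_i\|_p^p + \|h_i\|_p^p
   > \|c_i\|_p^p , \qquad 0 < p < \infty , \\
\|c_i + h_i\|_0
  &= |I| + |J| = \|c_i\|_0 + \|h_i\|_0 > \|c_i\|_0 , \qquad p = 0 .
\end{align*}
```

For $p = \infty$ the norm of the sum is the maximum of the two norms, so the inequality is not necessarily strict; the argument as sketched here therefore relies on $p$ being finite.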

Intuitively, consider the toy example in Figure 3.1, where we have two lines, each of which forms one linear subspace. The two subspaces are independent because the dimension of the space spanned by the lines $L_1$ and $L_2$ equals the dimension of $L_1$ plus the dimension of $L_2$. Suppose we want to represent the point $x_p$ on $L_1$ with other points, and let $X_{\mathrm{in}}$ be the matrix containing the points on $L_1$ (excluding point $p$) as columns and $X_{\mathrm{out}}$ be the matrix containing the points on $L_2$. We have

$$x_p = X_{\mathrm{in}} c_{\mathrm{in}} + X_{\mathrm{out}} c_{\mathrm{out}} , \tag{3.2}$$

where $c_{\mathrm{in}}$ and $c_{\mathrm{out}}$ are the corresponding representation coefficients.

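Note that any linear combination of points on line $L_2$ still lies on $L_2$, so $X_{\mathrm{out}}$ should make no contribution in representing $x_p$, i.e., $X_{\mathrm{out}} c_{\mathrm{out}} = 0$. There are two cases for $c_{\mathrm{out}}$: first, $c_{\mathrm{out}} = 0$; second, $c_{\mathrm{out}} \neq 0$ and the points on $L_2$ cancel each other out under certain nonzero coefficients $c_{\mathrm{out}}$. By Theorem 3.1, we are guaranteed to have the first case, i.e., $c_{\mathrm{out}} = 0$, if we minimize $\|c\|_p = \|[c_{\mathrm{in}}^T \; c_{\mathrm{out}}^T]^T\|_p$. This makes sense because we always have $\|[c_{\mathrm{in}}^T \; 0^T]^T\|_p \le \|[c_{\mathrm{in}}^T \; c_{\mathrm{out}}^T]^T\|_p$.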
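The two cases for $c_{\mathrm{out}}$ can be made concrete with a small numerical instance of (3.2). The sketch below is illustrative only: the line directions, the sample points, and the particular coefficient vectors are assumptions chosen by hand, and the minimum-norm solve uses $p = 2$ via a least-squares routine.

```python
import numpy as np

# Two lines through the origin in R^2 (directions are illustrative choices).
d1 = np.array([1.0, 0.0])                        # direction of L1
d2 = np.array([1.0, 1.0])                        # direction of L2

x_p = 2.0 * d1                                   # the point p on L1
X_in = np.column_stack([1.0 * d1, 3.0 * d1])     # other points on L1
X_out = np.column_stack([1.0 * d2, -1.0 * d2])   # points on L2
X_all = np.hstack([X_in, X_out])

c_in = np.array([0.2, 0.6])                      # 0.2 * 1 + 0.6 * 3 = 2
# Case 1: c_out = 0.
c_case1 = np.concatenate([c_in, [0.0, 0.0]])
# Case 2: c_out != 0, but the L2 points cancel: 1 * d2 + 1 * (-d2) = 0.
c_case2 = np.concatenate([c_in, [1.0, 1.0]])

# Both coefficient vectors reproduce x_p exactly.
print(np.allclose(X_all @ c_case1, x_p), np.allclose(X_all @ c_case2, x_p))

# Case 1 never has the larger norm (shown here for p = 1 and p = 2).
for p in (1, 2):
    print(p, np.linalg.norm(c_case1, ord=p), np.linalg.norm(c_case2, ord=p))

# The minimum 2-norm solution of x_p = X_all c puts no weight on L2.
c_min = np.linalg.lstsq(X_all, x_p, rcond=1e-10)[0]
print(np.round(c_min, 6))
```

In this instance the minimum-norm solution coincides with the first case, which is what Theorem 3.1 predicts for independent subspaces.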

Figure 3.1: A toy example for subspace clustering: two lines $L_1$ (with a point $p$) and $L_2$ passing through the origin $o$ form two independent linear subspaces.

Instead of solving an individual optimization problem for each data point, a global minimization problem over the matrix $C$ can be formulated. For instance, SSC (Elhamifar and Vidal [2013]) and LRR (Liu et al. [2013]) are special cases of this formulation. For the sake of clustering, we want the intra-cluster connectivity to be as dense as possible. The Frobenius (or $\ell_2$) norm provides a good regularizer to encourage this property. This motivates the formulation of this chapter, which is discussed in the next section.