Notation and Assumptions - Shen_unc_0153D


3.1.1 Notation and Assumptions

All quantities in this chapter are indexed by the dimension $d$. When no confusion arises, however, the subscript $d$ is omitted for convenience. As before, let the population covariance matrix be $\Sigma$. The eigen-decomposition of $\Sigma$ is

$$\Sigma = U \Lambda U^T,$$

where $\Lambda$ is the diagonal matrix of the population eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ and $U = [u_1, \ldots, u_d]$ is the matrix of the corresponding population eigenvectors.

Recall from Assumption 2.1.1 that $X_1, \ldots, X_n$ are random samples from a $d$-dimensional normal distribution $N(0, \Sigma_d)$. Denote the data matrix by $X = [X_1, \ldots, X_n]_{d \times n}$ and the sample covariance matrix by $\hat{\Sigma} = n^{-1} X X^T$. The sample covariance matrix $\hat{\Sigma}$ can be decomposed similarly as

$$\hat{\Sigma} = \hat{U} \hat{\Lambda} \hat{U}^T,$$

where $\hat{\Lambda}$ is the diagonal matrix of the sample eigenvalues $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_d$ and $\hat{U} = [\hat{u}_1, \ldots, \hat{u}_d]$ is the matrix of the corresponding sample eigenvectors.
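The sample decomposition above can be sketched numerically. The following is a minimal illustration, assuming NumPy; the function name `sample_eigen` and the toy dimensions are hypothetical choices made for this example, not part of the source.

```python
import numpy as np

def sample_eigen(X):
    """Eigen-decomposition of the sample covariance n^{-1} X X^T.

    X is a d x n data matrix whose columns are the observations.
    Returns the sample eigenvalues in decreasing order and the
    corresponding sample eigenvectors as columns.
    """
    d, n = X.shape
    sigma_hat = X @ X.T / n                    # no centering: the model has mean zero
    evals, evecs = np.linalg.eigh(sigma_hat)   # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]            # reorder to decreasing
    return evals[order], evecs[:, order]

# Toy usage with d = 5 and n = 20 (illustrative values only).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
lam_hat, U_hat = sample_eigen(X)
```

The eigenvalues are reordered because `np.linalg.eigh` returns them in ascending order, whereas the text orders them from largest to smallest.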

Let $\bar{u}_j$ be any sample-based estimator of $u_j$, e.g. $\bar{u}_j = \hat{u}_j$, for $j = 1, \ldots, d$. Recall three consistency concepts from Section 2.2.1:

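• Consistency: The direction $\bar{u}_j$ is consistent with its population counterpart $u_j$ if

$$\mathrm{Angle}(\bar{u}_j, u_j) \equiv \arccos(|\langle \bar{u}_j, u_j \rangle|) \xrightarrow{p} 0, \quad \text{as } d \to \infty, \tag{3.1}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product between two vectors.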

• Strong Inconsistency: The direction $\bar{u}_j$ is strongly inconsistent with its population counterpart $u_j$ if $\mathrm{Angle}(\bar{u}_j, u_j) \xrightarrow{p} \pi/2$ as $d \to \infty$.

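• Consistency with convergence rate $d^{\iota}$: The direction $\bar{u}_j$ is consistent with its population counterpart $u_j$ with convergence rate $d^{\iota}$ if $|\langle \bar{u}_j, u_j \rangle| = 1 + o_p(d^{-\iota})$, where the notation $G_d \equiv o_p(d^{-\iota})$ means that $d^{\iota} G_d \xrightarrow{p} 0$ as $d \to \infty$.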

In addition, we consider another important concept in the current chapter:

• Marginal Inconsistency: The direction $\bar{u}_j$ is marginally inconsistent with $u_j$ if $\mathrm{Angle}(\bar{u}_j, u_j)$ converges to a (possibly random) quantity in $(0, \pi/2)$ as $d \to \infty$.
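All four concepts above are stated in terms of the same angle, which is small enough to write out directly. The sketch below uses only the Python standard library; the function name `angle` is hypothetical, and normalizing the inputs is a convenience assumption so that non-unit vectors can be passed in.

```python
import math

def angle(u_bar, u):
    """Angle(u_bar, u) = arccos(|<u_bar, u>|), computed on normalized inputs."""
    dot = sum(a * b for a, b in zip(u_bar, u))
    norm = math.sqrt(sum(a * a for a in u_bar)) * math.sqrt(sum(b * b for b in u))
    # Clip at 1 to guard against rounding slightly above 1 before arccos.
    cos = min(1.0, abs(dot) / norm)
    return math.acos(cos)

same = angle([1.0, 0.0], [1.0, 0.0])          # identical directions
flipped = angle([1.0, 0.0], [-1.0, 0.0])      # same direction up to sign
orthogonal = angle([1.0, 0.0], [0.0, 1.0])    # perpendicular directions
```

Because of the absolute value, the angle always lies in $[0, \pi/2]$ and does not depend on the sign of either vector, which matches the fact that an eigenvector is only determined up to sign.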

We now use two illustrative examples to highlight our key theoretical results. The examples are chosen mainly for intuitive illustration. Our theorems cover more general single component spike models (Sections 3.2 to 3.4).

Example 3.1.1. Assume that $X_1, \ldots, X_n$ are random sample vectors from a $d$-dimensional normal distribution $N(0, \Sigma_d)$, where the covariance matrix $\Sigma_d$ has the eigenvalues

$$\lambda_1 = d^{\alpha}, \quad \lambda_2 = \cdots = \lambda_d = 1, \quad \alpha \ge 0. \tag{3.2}$$

This is a special case of the single component spike covariance Gaussian model considered before by, for example, Johnstone (2001); Paul (2007); Johnstone and Lu (2009); Amini and Wainwright (2009).

Without loss of generality (WLOG), we further assume that the first eigenvector $u_1$ is proportional to the $d$-dimensional vector

$$\dot{u}_1 = (\overbrace{1, \ldots, 1}^{\lfloor d^{\beta} \rfloor}, 0, \ldots, 0)^T,$$

where $0 \le \beta \le 1$ and $\lfloor d^{\beta} \rfloor$ denotes the integer part of $d^{\beta}$. For example, if $\beta = 0$, the first population eigenvector becomes $u_1 = (1, 0, \ldots, 0)^T$. (Note that, in general, the non-zero entries do not have to be the first $\lfloor d^{\beta} \rfloor$ elements, nor do they need to have equal values.)
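As a concrete check of this construction, the sketch below builds the sparse first eigenvector and a covariance matrix with spectrum (3.2). It assumes NumPy; the function name `spike_model` is hypothetical, and writing the covariance as the identity plus a rank-one term is one convenient way to obtain a first eigenvalue of d^alpha with all remaining eigenvalues equal to one.

```python
import math
import numpy as np

def spike_model(d, alpha, beta):
    """Return (u1, Sigma_d) for the single-spike model of Example 3.1.1."""
    k = math.floor(d ** beta)            # number of non-zero entries of u1
    u1 = np.zeros(d)
    u1[:k] = 1.0 / math.sqrt(k)          # equal non-zero entries, unit length
    lam1 = d ** alpha                    # spike eigenvalue
    # I + (lam1 - 1) u1 u1^T has eigenvalue lam1 along u1 and 1 elsewhere.
    sigma = np.eye(d) + (lam1 - 1.0) * np.outer(u1, u1)
    return u1, sigma

# Illustrative values: d = 100, alpha = 0.5, beta = 0.5.
u1, sigma = spike_model(d=100, alpha=0.5, beta=0.5)
evals = np.linalg.eigvalsh(sigma)        # ascending order
```

Note that `d ** beta` is computed in floating point, so the integer part can be off by one when the exact power is an integer; this matters only for such boundary cases.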

We formally define $\alpha$ as the spike index, which measures the strength of the spike, and $\beta$ as the sparsity index, which quantifies the sparsity of the maximal eigenvector $u_1$, where $\lfloor d^{\beta} \rfloor$ is the number of its non-zero elements. Under Model (3.2), Jung and Marron (2009) showed that the first empirical eigenvector (the PC direction) $\hat{u}_1$ is consistent with $u_1$ when $\alpha > 1$; for $\alpha < 1$, however, it is strongly inconsistent. Jung et al. (2012) then showed that $\hat{u}_1$ is in [...] point of the thesis is an exploration of conditions under which sparse methods can lead to consistency even when the spike index $\alpha \le 1$, by exploiting sparsity.
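The contrast between a strong and a weak spike can be illustrated with a small simulation. This is only a finite-dimension sketch under assumed settings (d = 2000, n = 20, beta = 0.5, and spike indices 1.5 and 0.2), not a reproduction of any result in the cited papers; the function name `first_pc_angle` is hypothetical and NumPy is assumed.

```python
import math
import numpy as np

def first_pc_angle(d, n, alpha, beta, rng):
    """Angle between the first sample PC direction and u1 under Model (3.2)."""
    k = math.floor(d ** beta)
    u1 = np.zeros(d)
    u1[:k] = 1.0 / math.sqrt(k)
    lam1 = d ** alpha
    Z = rng.standard_normal((d, n))
    # Sigma^{1/2} = I + (sqrt(lam1) - 1) u1 u1^T, so X = Sigma^{1/2} Z has
    # columns distributed as N(0, Sigma) with Sigma = I + (lam1 - 1) u1 u1^T.
    X = Z + (math.sqrt(lam1) - 1.0) * np.outer(u1, u1 @ Z)
    # The left singular vectors of X are the eigenvectors of n^{-1} X X^T,
    # and the singular values are returned in decreasing order.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    u1_hat = U[:, 0]
    return math.acos(min(1.0, abs(float(u1_hat @ u1))))

rng = np.random.default_rng(0)
angle_strong = first_pc_angle(d=2000, n=20, alpha=1.5, beta=0.5, rng=rng)
angle_weak = first_pc_angle(d=2000, n=20, alpha=0.2, beta=0.5, rng=rng)
```

Under these assumed settings the angle for the strong spike should be close to zero and the angle for the weak spike should be close to $\pi/2$. A single finite $d$ is only suggestive, since the cited results are asymptotic in $d$.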

Example 3.1.2. Our theorems are also applicable to the sparse single-component spike model considered by Amini and Wainwright (2009), although our focus differs from theirs, as illustrated below. The covariance matrix $\Sigma_d$ can be expressed in our notation as

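$$\Sigma_d = (\lambda_1 - 1) z^* z^{*T} + \begin{pmatrix} I_{\lfloor d^{\beta} \rfloor} & 0 \\ 0 & \Gamma_{d - \lfloor d^{\beta} \rfloor} \end{pmatrix},$$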

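where the first eigenvalue $\lambda_1 > 1$, the first $\lfloor d^{\beta} \rfloor$ entries of the maximal eigenvector $z^*$ are non-zero with values of $\pm 1/\sqrt{\lfloor d^{\beta} \rfloor}$, and $\Gamma_{d - \lfloor d^{\beta} \rfloor}$ is a symmetric positive semi-definite matrix with maximal eigenvalue $\lambda_{\max}(\Gamma_{d - \lfloor d^{\beta} \rfloor}) \le 1$. For this example, consider the case where all eigenvalues of $\Gamma_{d - \lfloor d^{\beta} \rfloor}$ are one. Hence, the eigenvalues of $\Sigma_d$ are $\lambda_1 > 1 = \lambda_2 = \cdots = \lambda_d$.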
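The eigenvalue claim for this example can be checked numerically. The sketch below assumes NumPy, takes the lower-right block to be the identity as in the special case just described, and draws the signs of the non-zero entries at random; the function name `aw_covariance` and the argument `k` (standing in for the number of non-zero entries) are hypothetical.

```python
import numpy as np

def aw_covariance(d, k, lam1, rng):
    """Return (z_star, Sigma_d) for Example 3.1.2 with the Gamma block set to I."""
    signs = rng.choice([-1.0, 1.0], size=k)
    z_star = np.zeros(d)
    z_star[:k] = signs / np.sqrt(k)      # entries +-1/sqrt(k) on the support
    block = np.eye(d)                    # blockdiag(I_k, Gamma) with Gamma = I_{d-k}
    sigma = (lam1 - 1.0) * np.outer(z_star, z_star) + block
    return z_star, sigma

# Illustrative values: d = 50, k = 7, lambda_1 = 4.
rng = np.random.default_rng(1)
z_star, sigma_d = aw_covariance(d=50, k=7, lam1=4.0, rng=rng)
evals = np.linalg.eigvalsh(sigma_d)[::-1]   # decreasing order
```

The leading eigenvalue equals the chosen `lam1` with eigenvector `z_star`, and the remaining eigenvalues are all one, regardless of the sign pattern.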

Amini and Wainwright (2009) focused on the consistent recovery of the support set of the maximal eigenvector $z^*$. We, however, are interested in the consistency of the actual direction vector. We note that the two types of consistency are not equivalent. For the above model, the two results are summarized below.

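• Assuming fixed $\lambda_1$ and $n, d \to \infty$, Amini and Wainwright (2009) showed that the support set can be recovered if $n > c_u (\lfloor d^{\beta} \rfloor)^2 \log(d - \lfloor d^{\beta} \rfloor)$, while it cannot be recovered if $n < c_l (\lfloor d^{\beta} \rfloor)^2 \log(d - \lfloor d^{\beta} \rfloor)$, where $c_u$ and $c_l$ are two constants.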

• For fixed $n$, $\lambda_1 = d^{\alpha}$ and $d \to \infty$, our Theorems 3.2.2 and 3.3.4 show that sparse PCA is consistent when $\alpha > \beta$, although the support set may not be consistently recovered (Theorem 3 of AW); and our Theorem 3.4.1 indicates that sparse PCA is strongly inconsistent when $\alpha < \beta$, even when one knows the exact support set.

In document Shen_unc_0153D_12982.pdf (Page 58-60)