• No results found

Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA

N/A
N/A
Protected

Academic year: 2021

Share "Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA"

Copied!
14
0
0

Loading.... (view fulltext now)

Full text

(1)

Contents lists available atSciVerse ScienceDirect

Journal of Multivariate Analysis

journal homepage:www.elsevier.com/locate/jmva

Boundary behavior in High Dimension, Low Sample Size asymptotics

of PCA

Sungkyu Jung

a,∗

, Arusharka Sen

b

, J.S. Marron

c

aDepartment of Statistics, University of Pittsburgh, Pittsburgh, PA 15260, USA

bDepartment of Mathematics and Statistics, Concordia University, Montreal, Quebec H3G 1M8, Canada

cDepartment of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

a r t i c l e i n f o

Article history: Received 1 June 2011 Available online 23 March 2012 AMS 2000 subject classifications: primary 62H25

34L20 secondary 62F12 Keywords:

Principal component analysis High Dimension Low Sample Size Geometric representation ρ-mixing

Consistency and strong inconsistency Spiked covariance model

a b s t r a c t

In High Dimension, Low Sample Size (HDLSS) data situations, where the dimensiondis much larger than the sample sizen, principal component analysis (PCA) plays an important role in statistical analysis. Under which conditions does the sample PCA well reflect the population covariance structure? We answer this question in a relevant asymptotic context wheredgrows andnis fixed, under a generalized spiked covariance model. Specifically, we assume the largest population eigenvalues to be of the orderdα, whereα <,=, or>1. Earlier results show the conditions for consistency and strong inconsistency of eigenvectors of the sample covariance matrix. In the boundary case,α = 1, where the sample PC directions are neither consistent nor strongly inconsistent, we show that eigenvalues and eigenvectors do not degenerate but have limiting distributions. The result smoothly bridges the phase transition represented by the other two cases, and thus gives a spectrum of limits for the sample PCA in the HDLSS asymptotics. While the results hold under a general situation, the limiting distributions under Gaussian assumption are illustrated in greater detail. In addition, the geometric representation of HDLSS data is extended to give three different representations, that depend on the magnitude of variances in the first few principal components.

©2012 Elsevier Inc. All rights reserved.

1. Introduction

The study of the covariance matrix and its usual estimator, the sample covariance matrix, is an important issue in multivariate statistics. In particular, the sample covariance matrix provides the conventional estimator of principal component analysis (PCA) through the eigenvalue–eigenvector decomposition. PCA plays an important role in dimension reduction and visualization of important data structure. The High Dimension, Low Sample Size (HDLSS) data situation, where the dimensiondof the sample space is much larger than the sample sizen, occurs in many areas of modern science, and thus the dimension reduction through PCA is becoming more important for analysis of such data. The sample PCA (through the eigen-decomposition of the sample covariance matrix) is still well-defined whend

>

n, and thus frequently used in practice. Even when the dimension is much higher than the sample size, the PCA has shown to be successful such as in microarray studies [5]. What is the underlying mechanism which leads to the success of PCA in the HDLSS situation? This is the question we answer in this paper.

A central question is whether the sample principal components reflect true underlying distributional structure in the HDLSS context. This has been investigated by comparing the sample eigenvalues and eigenvectors with their population

Corresponding author.

E-mail addresses:[email protected](S. Jung),[email protected](A. Sen),[email protected](J.S. Marron). 0047-259X/$ – see front matter©2012 Elsevier Inc. All rights reserved.

(2)

counterparts, in a relevant asymptotic context where the dimensiondgrows while the sample sizenis fixed [2,14,25,24]. The asymptotic direction ofdgrowing andnfixed is also studied in different contexts; see for instance, [7,20] and Chapter 4.5 of [21]. While we focus on this asymptotic context in this paper, we also point out that there has been a different approach for the problem where the limits are taken along the direction wheredandngrow at the same rate, i.e.d

/

n

c

(

0

,

)

asd

→ ∞

. For the result of this type, we refer to [13,4,19,18,16] and references therein.

In both investigations, the majority of results are well described in a spiked covariance model, proposed by [13]. An exception we point out is a work by [8], which proposes to estimate the spectral distribution of eigenvalues without assuming a spike model.

A spiked covariance model assumes that the first few eigenvalues are distinctively larger than the others. We use a generalized version of the spike model, as described in Section 3, which is different from that of [13,19]. Let Σ(d) denote the population covariance matrix andS(d)denote the sample covariance matrix. The eigen-decomposition ofΣ(d) isΣ(d)

=

UdΛdUd′, whereΛdis a diagonal matrix of eigenvalues

λ

1,d

λ

2,d

≥ · · · ≥

λ

d,din non-increasing order,Udis a matrix of corresponding eigenvectors so thatUd

= [

u1,d

, . . . ,

ud,d

]

, and′denotes the transpose of the preceding matrix. The eigen-decomposition ofS(d)is similarly defined asS(d)

= ˆ

UdΛ

ˆ

dU

ˆ

d′. As a simple example of the spiked model, consider

λ

1,d

=

σ

2dα,

λ

2,d

= · · · =

λ

d,d

=

τ

2, for

α, σ

2

, τ

2

>

0 fixed. The first eigenvector ofS(d)corresponding to the largest eigenvalue is of interest, as it contains the most important variation of the data. The first sample eigenvectoru

ˆ

1,dis assessed with the angle formed by itself and its population counterpartu1,d. The directionu

ˆ

1,dis said to beconsistentwithu1,dif Angle

(

u

ˆ

1,d

,

u1,d

)

0 asd

→ ∞

. However in the HDLSS context, a perhaps counter-intuitive phenomenon frequently occurs, where the two directions tend to be as far away as possible. We say the directionu

ˆ

1,disstrongly inconsistentwith

u1,dif Angle

(

u

ˆ

1,d

,

u1,d

)

π2 asd

→ ∞

. In the one spike model above, the order of magnitude

α

of the first eigenvalue is the key condition for these two limiting phenomena. [14] have shown that

Angle

(

u

ˆ

1,d

,

u1,d

)

0

,

α >

1

;

π

2

, α <

1

,

(1) in probability under some conditions. Although the gap between consistency and strong inconsistency is relatively thin, the case

α

=

1 has not been investigated, and is a main focus of this paper.

It is natural to conjecture from(1)that when

α

=

1, the angle does not degenerate but converges to a random quantity in

(

0

, π/

2

)

. This claim is established in the simple one spike model in the next section, where we describe a range of limits for the eigenvalues and eigenvectors, depending on the order of magnitude

α

of

λ

1,d. In Section3, the claim is generalized for multiple spike cases, and is proved in a much more general distributional setting.

The parameter

α

gives a sharp mathematical boundary for the set of HDLSS situations where the estimated PC direction converges to the population direction. In the boundary,

α

=

1, it will be shown that the estimated PC direction is affected by the true PC directions, but not as strong as the

α >

1 case. The success of PCA in the HDLSS situation is understood as an example of the

α

1 cases, as otherwise the estimated principal components are meaningless as shown in(1).

In a multiple spike model withm

>

1 spikes, where the firstmprincipal components contain the importantsignalof the distribution, the sample PCA can be assessed by simultaneously comparing the firstmprincipal components. In particular, we investigate the limits ofdistancebetween two subspaces: the subspace generated by the firstmsample PC directions

ˆ

u1,d

, . . . ,

u

ˆ

m,dand the subspace by the firstmpopulation PC directions. The distance is usefully measured by canonical angles and metrics between subspaces, the limiting distributions of which will be investigated for the

α

=

1 case, as well as the cases

α

̸=

1, in Section3.3. The probability density functions of the limiting distributions for

α

=

1 are also derived and illustrated under a Gaussian assumption, to show the effect of parameters in the distributions.

The HDLSS data set has an interesting geometric representation in the limitd

→ ∞

, as shown in [11]. In Section4, we extend the result and show that there are three different geometric representations, which coincide with the range of limits depending on

α

.

2. Range of limits in the single spike model

Suppose we have a data matrixX(d)

= [

X1,d

, . . . ,

Xn,d

]

, withd

>

n, where theddimensional random vectorsXi,dare independent and identically distributed. We assume for now thatXi,dis normally distributed with mean zero and covariance matrix6(d), but the Gaussian assumption will be relaxed in the next section. The population covariance matrix6(d)is assumed to have one spike, that is, the eigenvalues of6(d)are

λ

1,d

=

σ

2dα

, λ

2,d

= · · · =

λ

d,d

=

τ

2. The corresponding eigenvectors of6(d)are denoted byui,d. The sample covariance matrix is defined asS(d)

=

1nX(d)X(d)with itsith eigenvalue

and eigenvector denoted by

λ

ˆ

i,dandu

ˆ

i,d, respectively.

The following theorem summarizes the spectrum of the limiting distributions of the eigenvalues and eigenvectors of

S(d), depending on the different order

α

of

λ

1,d. Note that the angle between the two vectorsu

,

u

ˆ

is represented by the inner product through Angle

(

u

,

u

ˆ

)

=

cos−1

(

uu

ˆ

)

. For the eigenvectors with common eigenvalues, there are of course an infinite number of choices. The argument in the following theorem assumes that wechoosea set of population eigenvectorsuj,d before obtainingu

ˆ

j,d.

(3)

Theorem 1. Under the Gaussian assumption and the one spike case above,

(

i

)

the limit of the first eigenvalue depends on

α

:

ˆ

λ

1,d max

(

dα

,

d

)

H⇒

σ

2

χ

n2 n

,

α >

1

;

σ

2

χ

2 n n

+

τ

2 n

, α

=

1

;

τ

2 n

,

α <

1

,

as d

→ ∞

, where

H⇒

denotes the convergence in distribution, and

χ

2

n denotes a random variable with the

χ

2distribution

with degrees of freedom n. The rest of the eigenvalues converge to the same quantity when scaled, that is for any

α

∈ [

0

,

)

, j

=

2

, . . . ,

n,

ˆ

λ

j,d d

τ

2 n

,

as d

→ ∞

,

in probability.

(

ii

)

The limit of the first eigenvector depends on

α

: u1,du

ˆ

1,d

H⇒

1

α >

1

;

1

+

τ

2

σ

2

χ

2 n

−12

α

=

1

;

0

,

α <

1

,

as d

→ ∞

. The rest of the eigenvectors are strongly inconsistent with their population counterpart, for any

α

∈ [

0

,

),

j

=

2

, . . . ,

n,

uj,du

ˆ

j,d

0

,

as d

→ ∞

,

in probability.

The case

α

=

1 bridges the other two cases. In particular, the ratio of the sample and population eigenvalues

λ

ˆ

1,d

1,dis asymptotically unbiased to 1 when

α >

1. It is asymptotically biased when

α

=

1, and becomes completely deterministic in the case

α <

1, where the effect of

σ

2on

λ

ˆ

1,dbecomes negligible. Moreover, the angle Angle

(

u1,d

,

u

ˆ

1,d

)

to the optimal direction converges to a random quantity which is defined on

(

0

, π/

2

)

and depends on

σ

2

, τ

2, andn. The effect of those

parameters on the limiting distribution of Angle

(

u1,d

,

u

ˆ

1,d

)

is illustrated inFig. 1. The ratio

σ

2

2can be understood as a

signal to noiseratio. A high value of

σ

2

2means that the major variation along the first PC direction is strong. Therefore, for

larger values of

σ

2

2

,

Angle

(

u

1,d

,

u

ˆ

1,d

)

should be closer to zero than smaller values of the ratio, as depicted in the upper panel ofFig. 1. Moreover, the sample PCA with larger sample sizenshould perform better than with smaller sample size. The sample sizenbecomes the degrees of freedom of the

χ

2distribution in the limit, and the bottom panel ofFig. 1shows

that theu

ˆ

i,dis closer toui,dfor larger values ofn.

The Gaussian assumption in the previous theorem appears as a driver of the limiting

χ

2distributions. Under the general non-Gaussian assumption we state in the next section, the

χ

2will be replaced by a distribution that depends heavily on the

distribution of the population principal component scores, which may not be Gaussian.

Remark 1. The results inTheorem 1can be used to estimate the parameters

σ

2and

τ

2in the model with

α

=

1. As a

simple example, one can set

τ

ˆ

2

=

n n−1

n j=2 ˆ λj,d d and

σ

ˆ

2

= ˆ

λ

1,d

/

d

− ˆ

τ

2

/

n. Then

τ

ˆ

2

τ

2and

σ

ˆ

2

H⇒

σ

2χ 2 n n asd

→ ∞

by

Theorem 1and Slutsky’s theorem. The estimator

σ

ˆ

2is not consistent but can evidently be used to construct a confidence interval for

σ

2: P

n

σ

ˆ

2

χ

2 n,1−(a/2)

σ

2

n

σ

ˆ

2

χ

2 n,a/2

(

1

a

)

asd

→ ∞

.

This can be extended to provide asymptotic confidence intervals for principal component scores. A similar estimation scheme can be found in a related but different setting whered

,

n

→ ∞

together; see for example [19,16]. These papers do not discuss eigenvector estimation. The sample eigenvector,u

ˆ

1,dis difficult to improve upon mainly because the direction of deviationu

ˆ

1,dfromu1,dis quite random unless a more restrictive assumption (e.g. sparsity) is made.

3. Limits under the generalized spiked covariance model

The results in the previous section will be generalized to much broader situations, including a generalized spiked covariance model and a relaxation of the Gaussian assumption. We focus on the

α

=

1 case, and describe the limiting distributions for eigenvalues and eigenvectors.

(4)

Fig. 1. Angle densities for the one spike case. The top panel shows an overlay of the densities with differentσ2, with other parameters fixed. The bottom

panel shows an overlay of the densities with different degrees of freedomnof theχ2distribution. For a larger signal to noise ratioσ2

τ2, and for a largern, the angle to optimal is smaller.

3.1. Eigenvalues and eigenvectors

In the following, all the quantities depend on the dimensiond, but the subscriptdis omitted when it does not cause any confusion. We first describe some elementary facts from matrix algebra, that are useful throughout the paper. The dimension of the sample covariance matrixSincreases asdgrows, so it is challenging to deal withSdirectly. A useful approach is to use the dual ofS, defined as then

×

nsymmetric matrix

SD

=

1

nX

X

,

by switching the role of rows and columns ofX. The

(

i

,

j

)

th element ofSDis1nXiXj. An advantage of working withSDis that for larged, the finite dimensional matrixSDis positive definite with probability one, and itsneigenvalues are the same as the non-zero eigenvalues ofS. Moreover, the sample eigenvectorsu

ˆ

iare related to the eigen-decomposition ofSD, as shown next. LetSD

= ˆ

VnΛ

ˆ

nV

ˆ

n, whereΛ

ˆ

n

=

diag

(

λ

ˆ

1

, . . . ,

λ

ˆ

n

)

andV

ˆ

nis then

×

northogonal matrix of eigenvectors

v

ˆ

icorresponding to

λ

ˆ

i. RecallS

= ˆ

UΛ

ˆ

U

ˆ

′. SinceSis at most rankn, we can writeS

= ˆ

UnΛ

ˆ

nU

ˆ

n′, whereU

ˆ

n

= [ ˆ

u1

, . . . ,

u

ˆ

n

]

consists of the firstn columns ofU

ˆ

. The singular value decomposition ofXis given by

X

= ˆ

UnΛ

ˆ

nV

ˆ

n

=

n

i=1

(

n

λ

ˆ

i

)

− 1 2u

ˆ

i

v

ˆ

i

.

Then thekth sample principal component directionu

ˆ

kfork

nis proportional toX

v

ˆ

k,

ˆ

uk

=

(

n

λ

ˆ

k

)

− 1

2X

v

ˆ

k

.

(2)

Therefore the asymptotic properties of the eigen-decomposition ofS, asd

→ ∞

, can be studied via those of the finite dimensional matrixSD.

It is also useful to representSDin terms of the population principal components. LetZ(d)be the standardized principal

components ofX, defined by Z(d)

=

Z1

...

Zd

=

Λ −1/2 d UdX

,

whereZi

=

(

Zi1

, . . . ,

Zin

)

is theith row ofZ(d), so that

Zi

=

λ

− 1 2 i uiX

.

(3)

(5)

Under the Gaussian assumption of the previous section, each element ofZ(d)is independently distributed as the standard normal distribution. ByX

=

UdΛ1/2Z(d), SD

=

1 nXX

=

1 nZ ′ (dZ(d)

=

1 n d

i=1

λ

iZiZi

.

The Gaussian assumption onXis relaxed as follows. We assume that each column Xi ofX follows addimensional multivariate distribution with mean zero and covariance matrixΣ. Each entry of the standardized principal components, or the sphered variablesZ(d)is assumed to have finite fourth moments, and is uncorrelated but in general dependent with each other. We regulate the dependency of the principal components by a

ρ

-mixing condition (see [15,6]). We briefly describe a version of

ρ

-mixing for our purpose. For

−∞ ≤

J

L

≤ ∞

, letFJLdenote the

σ

-field of events generated by the random variablesZi

,

J

i

L. For any

σ

-fieldA, letL2

(

A

)

denote the space of square-integrable,Ameasurable real-valued random

variables. For eachm

1, define the maximal correlation coefficient

ρ(

m

)

:=

sup

|

corr

(

f

,

g

)

|

,

f

L2

(

F−∞j

),

g

L2

(

Fj∞+m

),

where sup is over allf

,

gandj

Z. The sequence

{

Zi

}

is said to be

ρ

-mixing if

ρ(

m

)

0 asm

→ ∞

.

While the concept of

ρ

-mixing is useful as a mild condition for the development of laws of large numbers, its formulation is critically dependent on the ordering of variables. Therefore we assume that there is some permutation of the data which is

ρ

-mixing. In particular, let

{

Zij,(d)

}

di=1be the components of thejth column vector ofZ(d). We assume that for eachd, there exists a permutation

π

d

: {

1

, . . . ,

d

} −→ {

1

, . . . ,

d

}

so that the sequence

{

Zπd(i)j,(d)

:

i

=

1

, . . . ,

d

}

is

ρ

-mixing. This

assumption makes the results invariant under a permutation of the variables. We denote these distributional assumptions as

(

c1

)

.

We then define a generalized spiked covariance model. Recall that a simple one spike model was defined on the eigenvalues of the population covariance matrixΣ, for example,

λ

1

=

σ

2dα

, λ

2

= · · · =

λ

d

=

τ

2. This is generalized by allowing multiple spikes, and by relaxing the uniform eigenvalue assumption in the tail to a decreasing sequence. The tail eigenvalues are regulated by a measure of sphericity

ϵ

kin the limitd

→ ∞

. The measure of sphericity

ϵ

k

,

k

=

1

,

2

, . . .

, is defined for

{

λ

k

, . . . , λ

d

}

as

ϵ

k

(

d

)

d

i=k

λ

i

2

d d

i=k

λ

2 i

,

which is away from 0 and close to 1 when

{

λ

k

, . . . , λ

d

}

are close to each other. Then we shall assume that the tail eigenvalues do not decrease too fast. In particular, we say the

ϵ

k-conditionis satisfied when

ϵ

k

(

d

)

decreases at a rate slower thand−1, i.e.

(

d

ϵ

k

)

−1

=

d

i=k

λ

2 i

d

i=k

λ

i

2

0 asd

→ ∞

.

The

ϵ

kcondition holds for quite general settings [14, Sec. 2c]. As an example, a polynomially decreasing sequencei− 1 2 of eigenvalues satisfies the condition

ϵ

kwithk

=

1. In the generalized spiked model, the eigenvalues are assumed to be of the form

λ

1

=

σ

12dα

, . . . , λ

m

=

σ

m2dα, for

σ

12

≥ · · · ≥

σ

m2

>

0 for somem

1, and the

ϵ

m+1condition holds for

λ

m+1

, . . . , λ

d. Also assume that1d

d

i=m+1

λ

i

τ

2asd

→ ∞

. These conditions for spike models are denoted by

(

c2

)

.

The following theorem gives the limits of the sample eigenvalues and eigenvectors under the general assumptions in this section. We use the following notations. Let

ϕ(

A

)

be a vector of eigenvalues of a real symmetric matrixAarranged in non-increasing order and let

ϕ

i

(

A

)

be theith largest eigenvalue ofA. Let

v

i

(

A

)

denote theith eigenvector of the matrix

Acorresponding to the eigenvalue

ϕ

i

(

A

)

and

v

ij

(

A

)

be thejth loading of

v

i

(

A

)

. Also note that there are many choices of eigenvectors ofS including the sign changes. We use the convention that the sign ofu

ˆ

iwill be chosen so thatu

ˆ

iui

0. Recall that the vector of theith standardized principal component scores isZi

=

(

Z1i

, . . . ,

Zni

)

′. Denote then

×

mmatrix of the firstmprincipal component scores asW

= [

σ

1Z1

, . . . , σ

mZm

]

. The limiting distributions heavily depend on the finite dimensional random matrixW.

Theorem 2. Under the assumptions

(

c1

)

and

(

c2

)

with fixed n

m

1, if

α

=

1, then

(

i

)

the sample eigenvalues d−1n

λ

ˆ

i,d

H⇒

ϕ

i

(

WW

)

+

τ

2

,

i

=

1

, . . . ,

m

;

τ

2

,

i

=

m

+

1

, . . . ,

n

,

(6)

(

ii

)

The inner products between the sample and population eigenvectors have limiting distributions:

ˆ

ui,duj,d

H⇒

v

ij

(

WW

)

1

+

τ

2

i

(

WW

)

as d

→ ∞

jointly for i

,

j

=

1

, . . . ,

m

.

The rest of eigenvectors are strongly inconsistent with their population counterpart, i.e.

ˆ

ui,dui,d

0 as d

→ ∞

for i

=

m

+

1

, . . . ,

n

,

in probability.

The theorem shows that the firstmeigenvectors are neither consistent nor strongly inconsistent to the population counterparts. The limiting distributions of angles Angle

(

u

ˆ

i,d

,

ui,d

)

to optimal directions are supported on

(

0

, π/

2

)

and depend on the magnitude of the noise

τ

2 and the distribution ofWW. Note that them

×

msymmetric matrixWWis the scaled covariance matrix of the principal component scores in the firstmdirections. When the underlying distribution ofXis assumed to be Gaussian, thenWW

is the Wishart matrixWm

(

n

,

Λm

)

, whereΛm

=

diag

12

, . . . , σ

m2

)

. Ifm

=

1, the matrix becomes a scalar random variableWW

=

ϕ

1

(

WW

)

, and is

χ

n2under the Gaussian assumption, which leads to

Theorem 1.

In general, the distribution of

ϕ

i

(

WW

)

is not simply described. We refer to [17, Ch. 9] for the Gaussian case, and [3] for the casem

→ ∞

.

The limiting distributions of the cases

α

̸=

1 can be found in a similar manner, which we only state the result for the case

α >

1 and for the firstmcomponents. We refer to [14] for more general results. Fori

,

j

=

1

, . . . ,

m, the eigenvalues

d−αn

λ

ˆ

i,d

H⇒

φ

i

(

WW

)

and the inner productsu

ˆ

i,duj,d

H⇒

v

ij

(

WW

)

asd

→ ∞

. In comparison to the

α

=

1 case, if we set

τ

2to be zero, the result becomes identical for all

α

1.

Remark 2. When the sample sizenalso grows, consistency of sample eigenvalues and eigenvectors can be achieved. In particular, fori

=

1

, . . . ,

m, we have asdgrows

ˆ

λ

i,d

λ

i,d

=

d n d−1n

λ

ˆ

i,d

σ

2 i d

H⇒

ϕ

i

(

WW

/

n

)

σ

2 i

+

τ

2 n

σ

i2

byTheorem 2. SinceWW

/

n

diag

2

1

, . . . , σ

m2

)

by a law of large numbers, we get the consistency of eigenvalues, i.e.

ˆ

λ

i,d

i,d

1 asd

,

n

→ ∞

, where the limits are applied successively. For the sample eigenvectors, fromTheorem 2(ii) and because

v

ij

(

WTW

)

1 ifi

=

j, 0 otherwise asn

→ ∞

and

ϕ

i

(

WW

)

=

O

(

n

)

, we getu

ˆ

i,duj,d

1 ifi

=

j, 0 otherwise asd

,

n

→ ∞

. Therefore, the sample PC directions are consistent to the corresponding population PC directions, i.e. Angle

(

u

ˆ

i,d

,

ui,d

)

0 asd

,

n

→ ∞

(applied successively), fori

m. Therefore it is conjectured that whend

,

ngrow together withd

n, a similar conclusion toTheorem 2can be drawn.

Proof of Theorem 2. The following lemma (Theorem 1 of [14]) shows a version of the law of large numbers for matrices, that is useful in the proof. Recall thatZ

i

(

Z1i

, . . . ,

Zni

)

is theith row ofZ(d). Lemma 1. If the assumption

(

c1

)

and the

ϵ

k-condition hold, then

cd−1 d

i=k

λ

i,dZiZi

In

,

as d

→ ∞

in probability, where cd

=

n−1

d

i=1

λ

i,dand Indenotes the n

×

n identity matrix. In particular, if d−1

d i=k

λ

i,d

τ

2, then 1 d d

i=k

λ

i,dZiZi

τ

2I n

,

as d

→ ∞

in probability.

This lemma is used to show that the spectral decomposition ofd−1nS

D, d−1nSD

=

m

i=1

σ

2 iZiZi

+

d −1 d

i=m+1

λ

iZiZi

,

can be divided into two parts, and the latter converges to a deterministic part. ApplyingLemma 1, we haved−1nS

D

H⇒

S0

asd

→ ∞

, where

(7)

Then since the eigenvalues of a symmetric matrixAare a continuous function of elements ofA, we have

ϕ(

d−1nSD

)

H⇒

ϕ(

S0

),

asd

→ ∞

. Noticing that fori

=

1

, . . . ,

m,

ϕ

i

(

S0

)

=

ϕ

i

(

WW

)

+

τ

2

=

ϕ

i

(

WW

)

+

τ

2

,

and fori

=

m

+

1

, . . . ,

n

, ϕ

i

(

S0

)

=

τ

2gives the result.

For the eigenvectors, note that the eigenvectors

v

ˆ

iofd−1nS

Dcan be chosen so that they are continuous [1]. Therefore, we also have that

v

ˆ

i

=

v

i

(

d−1nS

D

)

H⇒

v

i

(

S0

)

asd

→ ∞

, for alli. Also note that

v

i

(

S0

)

=

v

i

(

WW

)

fori

m.

Similar to the dual approach for covariance matrices, the eigenvectors of then

×

nmatrixWWcan be evaluated from the dual of the matrix. In particular, letW

=

UwΛwVw

=

m

i=1

λ

iwuiw

v

i′w, where

λ

2 iw

=

ϕ

i

(

WW

)

and

v

iw

=

v

i

(

WW

)

. Then

v

1

(

S0

)

=

uiw

=

W

v

iw

λ

iw

=

W

v

i

(

WW

)

ϕ

i

(

WW

)

.

Now from(2),(3)and the previous equation, for 1

i

,

j

m,

uju

ˆ

i

=

uj X

v

ˆ

i

n

λ

ˆ

i

=

ujX

v

ˆ

i

n

λ

ˆ

i

=

σ

2 jdZj

v

ˆ

i

n

λ

ˆ

i

=

σ

jZj

v

ˆ

i

d−1n

λ

ˆ

i

H⇒

σ

jZjW

v

i

(

WW

)

ϕ

i

(

WW

)

+

τ

2

ϕ

i

(

WW

)

.

(4)

Note that

σ

jZjW

= [

σ

j

σ

1ZjZ1

· · ·

σ

j

σ

mZjZm

]

is thejth row ofWWandWW

v

i

(

WW

)

=

ϕ

i

(

WW

)v

i

(

WW

)

. Therefore, the limiting form(4)becomes

ϕ

i

(

WW

)v

ij

(

WW

)

ϕ

i

(

WW

)

+

τ

2

ϕ

i

(

WW

)

.

Fori

=

m

+

1

, . . . ,

n, again from(2)and(3), we get

uiu

ˆ

i

=

λ

iZi

v

ˆ

i

n

λ

ˆ

i

=

d−12

τ

Zi

v

ˆ

i

n

λ

ˆ

i

/

d

=

Op

d−12

.

3.2. Asymptotic results for the centered data matrix

In practice, the sample covariance matrix is usually derived from the centered data matrix. In such a case, we obtain a weaker result thanTheorem 2. LetS

˜

(d)

=

n−1

(

Xd

− ¯

X

)(

Xd

− ¯

X

)

′, whereX

¯

=

n−1

n

i=1Xi,d1′nis ad

×

nmatrix consisting of

ncolumns of the sample mean vector.

For a general mean vector

µ

, the representation ofXin terms of standardized principal componentsZ(d)is X

=

µ

1′n

+

UdΛ 1 2Z(d) and thus X

− ¯

X

=

UdΛ 1 2

(

Z(d)

− ¯

Z

),

where theith row ofZ

¯

isz

¯

i

=

n

−1

n

j=1Zij1′n

=

n −1Z

iJn. The symbolJnrepresents then

×

nmatrix consisting entirely of ones.

The dual covariance matrix ofS

˜

(d)is thenS

˜

D

=

n−1

d

i=1

λ

i

(

Zi

− ¯

zi

)(

Zi

− ¯

zi

)

=

n−1

(

In

n−1Jn

)

d

i=1

λ

iZiZi

(

In

n−1Jn

)

′. Then we have the following result.

Proposition 1. Let

λ

˜

i,dbe the ith largest eigenvalue of S

˜

(d). Under the assumptions

(

c1

)

and

(

c2

)

with fixed n

>

m

1, if

α

=

1, then for any

ϵ >

0,

lim d→∞P

m

i=1

n d

˜

λ

i,d

∈ [

ϕ

i

(

W

˜

W

˜

), ϕ

i

(

W

˜

W

˜

)

+

τ

2

]

n

−1 i=m+1

n d

˜

λ

i,d

2

ϵ, τ

2

+

ϵ)

=

1

,

(8)

Proof of Proposition 1.Note that

λ

˜

i,d

=

ϕ

i

(

S

˜

(d)

)

=

ϕ

i

(

S

˜

D

)

fori

=

1

, . . . ,

n

1. Similar to the proof ofTheorem 2, the spectral decomposition ofd−1nS

˜

Dis divided into two parts,

d−1nS

˜

D

=

m

i=1

σ

2 i

(

Zi

− ¯

zi

)(

Zi

− ¯

zi

)

+

(

In

n−1Jn

)

1 d d

i=m+1

λ

iZiZi

(

In

n−1Jn

)

,

and usingLemma 1,d−1nS

˜

Dconverges in distribution toS

˜

0

= ˜

WW

˜

+

τ

2

(

In

n−1Jn

)

. Then the eigenvalues ofd−1nS

˜

Djointly converge to the eigenvalues ofS

˜

0.

Fori

=

1

, . . . ,

m, Weyl’s inequality [14, p. 4121], [10] yields that

ϕ

i

(

W

˜

W

˜

)

+

ϕ

n

{

τ

2

(

In

n−1Jn

)

} ≤

ϕ

i

(

S

˜

0

)

ϕ

i

(

W

˜

W

˜

)

+

ϕ

1

{

τ

2

(

In

n−1Jn

)

}

,

where

ϕ

j

{

τ

2

(

In

n−1Jn

)

} =

τ

2forj

=

1

, . . . ,

n

1 and 0 forj

=

n, and

ϕ

i

(

W

˜

W

˜

)

=

ϕ

i

(

W

˜

W

˜

)

. Fori

=

m

+

1

, . . . ,

n

1, also applying Weyl’s inequality gives

ϕ

n

(

W

˜

W

˜

)

+

ϕ

i

{

τ

2

(

In

n−1Jn

)

} ≤

ϕ

i

(

S

˜

0

)

ϕ

i

(

W

˜

W

˜

)

+

ϕ

1

{

τ

2

(

In

n−1Jn

)

}

,

and because the rank ofW

˜

W

˜

is at mostm

, ϕ

i

(

W

˜

W

˜

)

=

0 fori

>

m. Thus,

ϕ

i

(

S

˜

0

)

=

τ

2, which leads tod−1n

λ

˜

i,d

τ

2in probability asd

→ ∞

. The result is derived by combining the casesi

=

1

, . . . ,

n

1.

When the centered data matrixX

− ¯

X is used, the scaled eigenvalue estimate no longer converges in distribution to

ϕ

i

(

W

˜

W

˜

)

+

τ

2. However, the difference becomes smaller for largern, since the centering matrixIn

n−1Jnis close toInfor largen. For the rest of the paper, we assume that the mean is known and zero for the sake of clear presentation.

3.3. Angles between principal component spaces

Under the generalized spiked covariance model withm

>

1, the firstmpopulation principal directions provide a basis of the most important variation. Therefore, it would be more informative to investigate the deviation of eachu

ˆ

ifrom the subspaceLm

1

(

d

)

spanned by

{

u1,d

, . . . ,

um,d

}

. Also, denote the subspace spanned by the firstmsample principal directions as

ˆ

Lm

1

(

d

)

span

{ ˆ

u1,d

, . . . ,

u

ˆ

m,d

}

. When performing dimension reduction, it is critical for the sample PC spaceL

ˆ

m1 to be close

to the population PC spaceLm1. The closeness of two subspaces can be measured in terms ofcanonical angles.

We briefly introduce the notion of canonical angles and metrics between subspaces, detailed discussions of which can be found in [23,9]. As a simple case, the canonical angle between a 1-dimensional subspace and anm-dimensional subspace is defined as follows. LetL

ˆ

ibe the 1-dimensional linear space with basisu

ˆ

i. Infinitely many angles can be formed between

ˆ

LiandLm1 withm

>

1. The canonical angle, denoted by Angle

(

L

ˆ

i

,

Lm1

)

, is defined by the smallest angle formed, that is the

angle betweenu

ˆ

iand its projectionu

ˆ

Pi ontoLm1. This angle is represented in terms of an inner product as

Angle

(

L

ˆ

i

,

Lm 1

)

=

cos −1

ˆ

uiu

ˆ

Pi

u

ˆ

Pi

u

ˆ

i

=

min y∈Lm1 Angle

(

u

ˆ

i

,

y

)

for

y

>

0

.

(5)

When two multi-dimensional subspaces are considered, multiple canonical angles are defined. Among angles betweenL

ˆ

m

1

andLm1, the first canonical angle is geometrically defined as

θ

1

(

L

ˆ

m1

,

L m 1

)

=

max x∈ ˆLm1 min y∈Lm 1 Angle

(

x

,

y

)

for

x

,

y

>

0

,

(6)

where Angle

(

x

,

y

)

is the angle formed by the two vectorsx

,

y. One can show that the second canonical angle is defined by the same geometric relation as above withL

ˆ

−xandL−yforx

,

yfrom(6), whereL

ˆ

−xis the orthogonal complement ofxinL

ˆ

m

1.

In practice, the canonical angles are found by the singular value decomposition of a matrix. LetU

ˆ

mandUmbe orthonormal bases forL

ˆ

m

1

,

Lm1 and

γ

i’s be the singular values ofU

ˆ

mUm. Then the canonical angles are

θ

i

(

L

ˆ

m1

,

L m 1

)

=

cos −1

i

)

in descending order.

Distances between two subspaces can be defined using the canonical angles. We point out two metrics from Chapter II.4 of [23]:

1. gap metric

ρ

g

(

L

ˆ

m1

,

Lm1

)

=

sin

1

)

,

2. Euclidean sine metric

ρ

s

(

L

ˆ

m1

,

Lm1

)

= {

m i=1sin 2

i

)

}

1 2.

The gap metric is simple and only involves the largest canonical angle. The Euclidean sine metric makes use of all canonical angles and thus gives a more comprehensive understanding of the closeness between the two subspaces. We will use both metrics in the following discussion.

(9)

At first, we examine the limiting distribution of the angle between the sample PC directionu

ˆ

iandLm1.

Theorem 3. Under the assumptions

(

c1

)

and

(

c2

)

with fixed n

m

1, if

α

=

1, then for i

=

1

, . . . ,

m, the canonical angle converges in distribution: cos

Angle

(

L

ˆ

i

,

Lm 1

)

H⇒

1

1

+

τ

2

i

(

WW

)

as d

→ ∞

.

Proof. Sinceu

ˆ

P i

=

m j=1

(

u

ˆ

iuj

)

uj,

ˆ

uiu

ˆ

Pi

u

ˆ

Pi

u

ˆ

i

=

u

ˆ

Pi

=

(

u

ˆ

iu1

)

2

+ · · · +

(

u

ˆ

ium

)

2

.

The result follows from(5),Theorem 2(ii) and the fact that

m

j=1

v

ij

(

WW

)

2

=

v

i

(

WW

)

2

=

1.

We then investigate the limiting behavior of the distances betweenL

ˆ

m1 andLm1, in terms of either the canonical angles or the distances. From the fact thatU

ˆ

m

= [ ˆ

u1

, . . . ,

u

ˆ

m

]

andUm

= [

u1

, . . . ,

um

]

are orthonormal bases ofL

ˆ

m1 andLm1

respectively, cosines of the canonical angles are the singular values ofU

ˆ

mUm. Since the

(

i

,

j

)

th element ofU

ˆ

mUmisu

ˆ

iuj,

Theorem 2leads to

ˆ

UmUm

H⇒

v

1

(

WW

)

· · ·

v

m

(

WW

)

1

+

τ

2

ϕ

1

(

WW

)

− 1 2 0

...

0

1

+

τ

2

ϕ

m

(

WW

)

−12

,

asd

→ ∞

. Therefore the canonical angles (

θ

1

, . . . , θ

m) betweenL

ˆ

m1 andL

m

1 converge to the arccosines of

(

1

+

τ

2

m

(

WW

)

−12

, . . . ,

1

+

τ

2

1

(

WW

)

−12

),

(7)

asd

→ ∞

. Notice that these canonical angles between subspaces converge to the same limit as inTheorem 3, except that the order is reversed. In particular, the limiting distribution of the largest canonical angle

θ

1is the same as that of the angle

betweenu

ˆ

mandLm1, and the smallest canonical angle

θ

mcorresponds to the angle between the first sample PC directionu

ˆ

1

and the population PC spaceLm1.

The limiting distributions of the distances between two subspaces are readily derived by the discussions so far. When using the gap metric,

ρ

g

(

L

ˆ

m1

,

Lm1

)

H⇒

1

+

ϕ

m

(

WW

)/τ

2

− 1

2 asd

→ ∞

.

And by using the Euclidean sine metric,

ρ

s

(

L

ˆ

m1

,

L m 1

)

H⇒

m

i=1 1 1

+

ϕ

i

(

WW

)/τ

2

12 asd

→ ∞

.

(8)

Remark 3. The convergence of the canonical angles for the case

α >

1 has been shown earlier. [14] introduced a notion of

subspace consistency, where the directionu

ˆ

imay not be consistent touibut will tend to lie inLm1, i.e. Angle

(

L

ˆ

i

,

Lm1

)

0 as

d

→ ∞

, fori

m. In this case, the canonical angles betweenL

ˆ

mi andLm1 and the distances will converge to 0 asdgrows. In that sense, the empirical PC spaceL

ˆ

m

i is consistent toLm1. On the other hand, when

α <

1, all directionsu

ˆ

itend to behave as if they were from the eigen-decomposition of the identity matrix. Therefore, all angles tend to be

π/

2 and the distances will converge to their largest possible values, leading to the strong inconsistency.

We now focus back on the

α

=

1 case, and illustrate the limiting distributions of the canonical angles and the Euclidean sine distance, to see the effect of parameters in the distribution. For simplicity and clear presentation, the results corresponding tom

=

2 are presented under the Gaussian assumption. Note that the limiting distributions depend on the marginal distributions of the first few principal component scores. Therefore no common distribution is evaluated in the limit.

Let

σ

2

(

=

σ

2

1

+

σ

22

)

denote the (scaled) total variance of the first two principal components. The ratioσ

2

τ2 is understood as a signal to noise ratio, similar to the single spike case. Since the ratio of

σ

12and

σ

22affects the limiting distributions, we use

1

, λ

2

)

with

λ

1

+

λ

2

=

1 so that

σ

2

1

, λ

2

)

=

12

, σ

22

)

. Note that forW

W

W

2

(

n

,

diag

12

, σ

22

))

,

ϕ(

W

W

)

has the same law as

σ

2

ϕ(

W2

(

n

,

diag

1

, λ

2

)))

.

(10)

Fig. 2. Overlay of contours of densities of canonical angles for them=2 case, corresponding to different(σ22, λ

1/λ2). Larger signal to noise ratios σ22lead to the diagonal shift of the density function, and both canonical angles will have smaller values. For a fixedσ22, the ratioλ

1/λ2between the

first and second PC variances is a driver for different distributions, depicted as the vertical shift of the density function.

The joint limiting distribution of the two canonical angles in (7), also in Theorem 3, is illustrated inFig. 2, with various values of (

σ

2

2

, λ

1

2) and fixedn. Note that for larged, the first canonical angle

θ

1

Angle

(

u

ˆ

2,d

,

Lm1

)

and

θ

2

Angle

(

u

ˆ

1,d

,

Lm1

)

, and that

θ

1

θ

2.

The diagonal shift of the joint densities inFig. 2is driven by different

σ

2s with other parameters fixed. Both

θ

1and

θ

2are

smaller for larger signal to noise ratios.

For fixed

σ

2

2, several values of the ratio between the first and second variances (

λ

1

2) are considered, and the overlay

of densities according to different

λ

1

2is illustrated as the vertical shift inFig. 2. When the variation along the first PC

direction is much stronger than that along the second, i.e. when

λ

1

2is large,

θ

2becomes smaller but

θ

1tends to be

much larger. In other words,u

ˆ

1is a reasonable estimate ofu1, butu

ˆ

2becomes a poor estimate ofu2.

See(A.4)in theAppendixfor the probability density function of the canonical angles.

The limiting distribution of the Euclidean sine distance betweenL

ˆ

m1 andLm1 is also depicted inFig. 3, again with various values of (

σ

2

, λ

1

, λ

2). It can be checked from the top panel ofFig. 3that the distance to the optimal subspace is smaller when

the signal to noise ratio is larger. The bottom panel illustrates the densities corresponding to different ratios of

λ

1

2. The

effect of

λ

1

2is relatively small compared to the effect of different

σ

2s, unless

λ

2is too small.

4. Geometric representation of the HDLSS data

[11] first showed that the HDLSS data has an interestinggeometric representationin the limitd

→ ∞

. In particular, for larged, the data tend to appear at vertices of a regular simplex and the variability is contained only in the random rotation of the simplex. In the spike model we consider, this geometric representation of the HDLSS data holds when

α <

1, as shown earlier in [14]. The representation in mathematical terms is

Xi

∥ =

τ

d

+

op

(

d

),

Xi

Xj

=

τ

2d

+

op

(

d

),

(9)

forddimensionalXi

,

i

=

1

, . . . ,

n. This simplified representation has been used to show some high dimensional limit theory for discriminant analysis; see [2,22,12].

Similar types of representation can be derived in the

α

1 case. When

α >

1, where consistency of PC directions happens, we have

Xi

/

dα/2

H⇒ ∥

Yi

,

Xi

Xj

/

dα/2

H⇒

Yi

Yj

(10)

whereYi

=

1Z1i

, . . . σ

mZmi

)

′s arem-dimensional independent random vectors with mean zero and covariance matrix diag

12

, . . . , σ

2

m

)

. To understandYi, letXiP be the projection ofXi onto the true PC space Lm1. It can be checked from

d−1/2XP

i

=

m

j=1

jZji

)

ujthatYiis the vector of loadings of the scaledXiPin the firstmprincipal component coordinates. When

α

=

1, a deterministic term is added:

Xi

2

/

d1

H⇒ ∥

Yi

2

+

τ

2

,

Xi

Xj

2

/

d

H⇒

Yi

Yj

2

+

2

τ

2

.

(11)

(11)

Fig. 3. Overlay of densities of the distance between the sample and population PC spaces, measured byρsfor them=2 case. The top panel shows a transition of the density function corresponding to different signal to noise ratios. The bottom panel illustrates the effect of the ratioλ1/λ2between the

first two eigenvalues. For a larger signal to noise ratioσ22, and for a smaller value ofλ

1/λ2, the Euclidean sine distance is smaller.

These results can be understood geometrically, as summarized and discussed in the following.

α >

1: The variability of samples is restricted to the true PC spaceLm

1 for larged, which coincides with the notion of

subspace consistency discussed inRemark 3. Thed-dimensional probability distribution degenerates to them -dimensional subspaceLm1.

α

=

1: (11)is understood with the help of the Pythagorean theorem, that is, the norm ofXiis asymptotically decomposed into orthogonal random and deterministic parts. Thus, data tend to be

τ

daway fromLm1, andXis projected on

Lm

1

=

span

{

um+1

, . . . ,

ud

}

, the orthogonal complement ofLm1, will follow the representation similar to the

α <

1

case.

α <

1: The geometric representation(9)holds. Note that the case

α

=

1 smoothly bridges the others.

An example elucidating these ideas is shown inFig. 4. Each panel shows 3d scatterplots of 10 different samples (shown as different symbols) ofn

=

3 Gaussian random vectors in dimensionsd

=

3

,

30

,

3000 (shown in respective columns ofFig. 4). In the spiked model, we take

σ

=

τ

=

1 andm

=

1 for simplicity and investigate three different orders of magnitude

α

=

1

2

,

1

,

2 of the first eigenvalue

λ

1

=

dα. For each pair of

(

d

, α)

, each sampleXiis projected onto the first true

PC directionu1, shown as the vertical axis. In the orthogonald

1 dimensional subspace, the 2-dimensional hyperplane

that is generated by the data is found, and the data are projected onto that. Within the hyperplane, variation due to rotation is removed by optimally rotating the data onto edges of a regular triangle by a Procrustes method, to give the horizontal axes in each part ofFig. 4. These axes are scaled by dividing by max

(

dα

,

d

)

and the 10 samples give an impression of various types of convergence as a function ofd, for each

α

.

The asymptotic geometric representations summarized above are confirmed by the figure. For

α

=

1

2, it is expected

from(9)that the data are close to the vertices of the regular triangle, with edge length

2d. The vertices of the triangle (in the horizontal plane) with vertical rays (representingu1, the first PC direction) are shown as the dashed lines in the

first two rows ofFig. 4. Note that ford

=

3, in the top row, the points appear to be quite random, but ford

=

30, there already is reasonable convergence to the vertices with notable variation alongu1. The cased

=

3000 shows more rigid

representation with much less variation alongu1. On the other hand, for the case

α

=

2 in the last row ofFig. 4, most of the

variations in the data are found alongu1, shown as the vertical dotted line, and the variation perpendicular tou1becomes

negligible asdgrows, which confirms the degeneracy toL1

1in(10). From these examples, conditions for consistency and

strong inconsistency can be checked heuristically. The sample eigenvectoru

ˆ

1is consistent withu1when

α >

1, since the

variation alongu1is so strong thatu

ˆ

1should be close to that.u

ˆ

1is inconsistent withu1when

α <

1, since the variation

alongu1becomes negligible so thatu

ˆ

1will not be nearu1.

For the

α

=

1 case, in the middle row ofFig. 4, it is expected from(11)that each data point will be asymptotically decomposed into a random and a deterministic part. This is confirmed by the scatterplots, where the order of variance along

(12)

Fig. 4. Gaussian toy example, showing the geometric representations of HDLSS data, withn=3, for three different choices ofα=1/2,1,2 of the spiked model and increasing dimensionsd=3,30,3000. Forα̸=1, data converge to vertices of a regular 3-simplex (caseα <1) or to the first PC direction (caseα >1). Whenα=1, data are decomposed into the deterministic part on the horizontal axes and the random part along the vertical axis.

u1remains comparable to that of horizontal components, asdgrows. The convergence to the vertices is noticeable even

ford

=

30, which becomes stronger for largerd, while the randomness alongu1remains for larged. Also observe that the

distance from eachXito the space spanned byu1becomes deterministic for larged, supporting the first part of(11).

Appendix. Derivation of the density functions

The probability density functions of the limiting distributions in(7)and(8)will be derived for the casem

=

2, under the normal assumption. The argument is readily generalized to allm, but when non-normal distribution is assumed, such derivation is much challenging.

We first recall some necessary notions for treating the Wishart matrixWWand eigen-decompositions. Most of the results are adopted from [17]. Let A

Wm

(

n

,

Σm

)

and denote its eigen-decomposition as A

=

HLH′ with L

=

diag

(

l1

, . . . ,

lm

)

. AssumeΣmis positive definite andn

mso thatl1

>

l2

>

· · ·

>

lm

>

0 with probability 1. Denote

O

(

m

)

= {

Hm×m

:

HH

=

Im

}

for the set of orthonormalm

×

mmatrices and

(

dH

)

forH

O

(

m

)

as the differential form representing the uniform probability measure onO

(

m

)

. The multivariate gamma function is defined as

Γm

(

a

)

=

π

m(m−1)/4 m

i=1 Γ

a

1 2

(

i

1

)

,

whereΓ

(

·

)

is the usual gamma function. ForH

≡ [

h1

, . . . ,

hm

] ∈

O

(

m

)

,

(

dH

)

1 Vol

(

O

(

m

))

(

HdH

)

=

Γm

m 2

2m

π

m2/2

(

HdH

),

(A.1) where

(

HdH

)

m i>jhidhj.

We are now ready to state the density function of

ϕ(

WW

)

form

=

2. Note that under the Gaussian assumptionWW is the 2

×

2 Wishart matrix with degree of freedomnand covariance matrixΣW

=

diag

12

, σ

22

)

. For simplicity, write

(

L1

,

L2

)

=

ϕ(

WW

)

, andL

=

diag

(

L1

,

L2

)

. Then the joint density function ofL1andL2is given by e.g. Theorem 3.2.18 of [17]

withm

=

2, and fL

(

l1

,

l2

)

=

π

2−n

2 1

σ

22

)

n2 Γ2

n 2

(

l1l2

)

n−3 2

(

l1

l2

)

O(2) exp

trace

1 2Σ −1 W HLH



(

dH

).

(A.2)

References

Related documents

Eligibility to participate in the survey was based on two criteria: (1) self-identification as a person with disability, or who is deaf or hard of hearing, with experience in

1 Students who successfully complete the programmes taught in English in Paris and London will be awarded with a BA (Hons) degree by MMU (Manchester Metropolitan University - UK);

Vacant dwellings, vacancy homes; 5,1% Owner-occupied houses; 33,4% Owner-occupied dwellings; 9,2% Rental dwellings - private landlords; 33,4% Rental dwellings - public sector;

When house prices fall, collateral values plunge, but out- standing debt does not fall by much, resulting in the spike in the debt-to-collateral ratio observed in the data.. The

The proposed system is applied to Saudi Arabia cellular service networks, which use the 1.8 - 2.1 GHz frequency band of the RF radio spectrum.. The system is modeled and

Thus just as the working class struggles to get higher wages for less work, it often struggles for the direct appropriation of wealth by stealing the C’ it has produced, via

CDPR = Cash dividends payout ratio, calculated as cash dividend per share paid in a year divided by earnings per share; SDPR = Stock dividend rate, calculated as stock dividend

We completely characterize both, the acquisition decisions and the resulting product market prices and profits, conditional upon the initial ownership structures of the two firms,