
Approximation Accuracy Estimation

In document Jiang_unc_0153D_17953.pdf (Page 57-63)


3.3.2 Step 1: Signal Space Initial Extraction

3.3.2.2 Approximation Accuracy Estimation

A major challenge is segmentation of the joint and individual variation in the presence of noise, which individually perturbs each signal. A first step towards addressing this is a careful study of how well $A_k$ is approximated by $\tilde{A}_k$, using the Generalized $\sin\Theta$ Theorem (Wedin, 1972).

Pseudometric Between Subspaces

To apply the Generalized $\sin\Theta$ Theorem, we use the following pseudometric as a notion of distance between theoretical and perturbed subspaces. Recall that $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$ are, respectively, the $r_k$- and $\tilde{r}_k$-dimensional score subspaces of $\mathbb{R}^n$ for the matrix $A_k$ and its approximation $\tilde{A}_k$. The corresponding projection matrices are $P_{A_k}$ and $P_{\tilde{A}_k}$, respectively. A pseudometric between the two subspaces can be defined as the difference of the projection matrices under the operator $L^2$ norm, i.e., $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) = \|P_{A_k} - P_{\tilde{A}_k}\|$ (Stewart and Sun, 1990). When $r_k = \tilde{r}_k$, this pseudometric is also a distance between the two subspaces.

An insightful understanding of this pseudometric $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k))$ comes from a principal angle analysis (Jordan, 1875; Hotelling, 1936) of the subspaces $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$. Denote the principal angles between $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$ as

$$\Theta(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) = \{\theta_{k,1}, \ldots, \theta_{k, r_k \wedge \tilde{r}_k}\} \tag{3.4}$$

with $\theta_{k,1} \le \theta_{k,2} \le \ldots \le \theta_{k, r_k \wedge \tilde{r}_k}$. The pseudometric $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k))$ is equal to the sine of the maximal principal angle, i.e., $\sin\theta_{k, r_k \wedge \tilde{r}_k}$. Thus the largest principal angle between two subspaces measures their closeness, i.e., distance.

The pseudometric $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k))$ can also be written as

$$\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) = \|(I - P_{A_k}) P_{\tilde{A}_k}\| = \|(I - P_{\tilde{A}_k}) P_{A_k}\|,$$

which gives another useful understanding of this definition. It measures the relative deviation of the signal variation from the theoretical subspace. Accordingly, the similarity/closeness between the subspace and its perturbation can be written as $\|P_{A_k} P_{\tilde{A}_k}\|$ and is equal to the cosine of the maximal principal angle defined above, i.e., $\cos\theta_{k, r_k \wedge \tilde{r}_k}$. Hence, $\sin^2\theta_{k, r_k \wedge \tilde{r}_k}$ indicates the proportion of signal deviation and $\cos^2\theta_{k, r_k \wedge \tilde{r}_k}$ tells the proportion of remaining signal in the theoretical subspace.
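The relation between the projection-based pseudometric and the principal angles can be checked numerically. The following is a minimal NumPy sketch, not code from the source: the dimensions, the random seed, the perturbation size and the helper names `orthonormal_basis` and `principal_angles` are all illustrative assumptions, and the two subspaces are taken to have equal dimension.

```python
import numpy as np

def orthonormal_basis(M):
    """Orthonormal basis (as columns) of the column space of M, via reduced QR."""
    Q, _ = np.linalg.qr(M)
    return Q

def principal_angles(V, V_tilde):
    """Principal angles in radians, ascending, between span(V) and span(V_tilde).

    Both arguments are assumed to have orthonormal columns; the singular
    values of V^T V_tilde are the cosines of the principal angles.
    """
    cosines = np.linalg.svd(V.T @ V_tilde, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

rng = np.random.default_rng(0)
n, r = 8, 3  # hypothetical ambient dimension and common subspace dimension

V = orthonormal_basis(rng.standard_normal((n, r)))
V_tilde = orthonormal_basis(V + 0.3 * rng.standard_normal((n, r)))

P = V @ V.T                    # projection onto the "theoretical" subspace
P_tilde = V_tilde @ V_tilde.T  # projection onto the perturbed subspace

theta = principal_angles(V, V_tilde)
rho = np.linalg.norm(P - P_tilde, 2)                    # ||P - P_tilde||
rho_alt = np.linalg.norm((np.eye(n) - P) @ P_tilde, 2)  # ||(I - P) P_tilde||

print("sin of largest principal angle:", np.sin(theta[-1]))
print("rho:", rho, "alternative form:", rho_alt)
```

With equal dimensions, the three printed quantities should agree up to floating-point error.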

Wedin Bound

For a signal matrix $A_k$ and its perturbation $X_k = A_k + E_k$, the generalized $\sin\theta$ theorem provides a bound for the distance between the rank $\tilde{r}_k$ ($\le r_k$) singular subspaces of $A_k$ and $X_k$.

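Theorem 1 (Wedin, 1972). Let $A_k$ be a signal matrix with rank $r_k$. Letting $A_{k,1} = U_{k,1} \Sigma_{k,1} V_{k,1}^\top$ denote the rank $\tilde{r}_k$ SVD of $A_k$, where $\tilde{r}_k \le r_k$, write $A_k = A_{k,1} + A_{k,0}$. For the perturbation $X_k = A_k + E_k$, a corresponding decomposition can be made as $X_k = \tilde{A}_{k,1} + \tilde{E}_k$, where $\tilde{A}_{k,1} = \tilde{U}_{k,1} \tilde{\Sigma}_{k,1} \tilde{V}_{k,1}^\top$ is the rank $\tilde{r}_k$ SVD of $X_k$. Assume that there exist an $\alpha \ge 0$ and a $\delta > 0$ such that, for $\sigma_{\min}(\tilde{A}_{k,1})$ and $\sigma_{\max}(A_{k,0})$ denoting appropriate minimum and maximum singular values,

$$\sigma_{\min}(\tilde{A}_{k,1}) \ge \alpha + \delta \quad \text{and} \quad \sigma_{\max}(A_{k,0}) \le \alpha.$$

Then the distance between the row spaces of $\tilde{A}_{k,1}$ and $A_{k,1}$ is bounded by

$$\rho(\mathrm{row}(\tilde{A}_{k,1}), \mathrm{row}(A_{k,1})) \le \frac{\max(\|E_k \tilde{V}_{k,1}\|, \|E_k^\top \tilde{U}_{k,1}\|)}{\delta} \wedge 1.$$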

In practice we do not observe $A_{k,0}$, thus $\delta$ cannot be estimated in general. A special case of interest for AJIVE is $\tilde{r}_k = r_k$, in which case $A_{k,0} = 0$ and $A_k = A_{k,1}$. The following is an adaptation of the generalized $\sin\theta$ theorem to this case:

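Corollary 1 (bound for correctly specified rank). For each $k = 1, \ldots, K$, the signal matrix $A_k$ is perturbed by additive noise $E_k$. Let $\theta_{k,\tilde{r}_k}$ be the largest principal angle for the subspace of signal $A_k$ and its approximation $\tilde{A}_k$, where $\tilde{r}_k = r_k$. Denote the SVD of $\tilde{A}_k$ as $\tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^\top$. The distance between the subspaces of $A_k$ and $\tilde{A}_k$, $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k))$, i.e., the sine of $\theta_{k,\tilde{r}_k}$, is bounded above by

$$\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) = \sin\theta_{k,\tilde{r}_k} \le \frac{\max(\|E_k \tilde{V}_k\|, \|E_k^\top \tilde{U}_k\|)}{\sigma_{\min}(\tilde{A}_k)} \wedge 1. \tag{3.5}$$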

In this case the bound is driven by the maximal value of noise energy in the column and row

spaces and by the estimated smallest signal singular value. This is consistent with the intuition

that a deviation distance, i.e., a largest principal angle, is small when the signal is strong and

perturbations are weak.
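The bound for correctly specified rank can be illustrated on simulated data, where the noise matrix is known. The sketch below is an illustrative assumption throughout: the matrix sizes, singular values, noise level and variable names are hypothetical and are not the toy example of the source.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, r = 40, 30, 2  # hypothetical sizes: d features, n samples, signal rank r

# Rank-r signal A = U diag(10, 6) V^T plus isotropic Gaussian noise E.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
A = (U * np.array([10.0, 6.0])) @ V.T
E = 0.1 * rng.standard_normal((d, n))
X = A + E

# The rank-r SVD of the observed matrix gives the estimated signal A_tilde.
Ux, s, Vxt = np.linalg.svd(X, full_matrices=False)
U_t, V_t = Ux[:, :r], Vxt[:r].T
sigma_min = s[r - 1]  # smallest singular value of A_tilde

# Left-hand side: sine of the largest principal angle between the row spaces.
rho = np.linalg.norm(V @ V.T - V_t @ V_t.T, 2)

# Right-hand side: noise energy in the estimated row and column directions,
# divided by the smallest estimated signal singular value, capped at 1.
noise_energy = max(np.linalg.norm(E @ V_t, 2), np.linalg.norm(E.T @ U_t, 2))
bound = min(noise_energy / sigma_min, 1.0)

print("sin(theta_max):", rho, "Wedin bound:", bound)
```

Increasing the noise scale or shrinking the smallest signal singular value in this sketch should push the bound towards its cap of 1.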

In general, it can be very challenging to correctly estimate the true rank of $A_k$. If the true rank $r_k$ is not correctly specified, then different applications of the Wedin bound are useful. In particular, when $A_{k,0}$ is not 0, i.e., $\tilde{r}_k < r_k$, insights come from replacing $E_k$ by $E_k + A_{k,0}$ in the Wedin bound.

Corollary 2 (bound for under-specified rank). For each $k = 1, \ldots, K$, the signal matrix $A_k$ with rank $r_k$ is perturbed by additive noise $E_k$. Let $\tilde{A}_k = \tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^\top$ be the rank $\tilde{r}_k$ SVD approximation of $A_k$ from the perturbed matrix, where $\tilde{r}_k < r_k$. Denote $A_k = A_{k,1} + A_{k,0}$, where $A_{k,1}$ is the rank $\tilde{r}_k$ SVD of $A_k$. Then the distance between $\mathrm{row}(A_{k,1})$ and $\mathrm{row}(\tilde{A}_k)$ is bounded above by

$$\rho(\mathrm{row}(A_{k,1}), \mathrm{row}(\tilde{A}_k)) \le \frac{\max(\|(E_k + A_{k,0}) \tilde{V}_k\|, \|(E_k + A_{k,0})^\top \tilde{U}_k\|)}{\sigma_{\min}(\tilde{A}_k)} \wedge 1.$$
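A small simulation can illustrate the under-specified case, where the discarded signal component $A_{k,0}$ is treated as part of the perturbation. This is a hedged sketch with hypothetical sizes, singular values and noise level, not the setting used in the source.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 40, 30
r, r_t = 3, 2  # hypothetical true rank r and under-specified working rank r_t

# Signal with singular values 10 > 6 > 3, so its rank-r_t SVD keeps the first two.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
sv = np.array([10.0, 6.0, 3.0])
A = (U * sv) @ V.T
A1 = (U[:, :r_t] * sv[:r_t]) @ V[:, :r_t].T  # rank r_t SVD of A
A0 = A - A1                                  # discarded signal component
E = 0.1 * rng.standard_normal((d, n))
X = A + E

# Rank r_t SVD of the observed matrix.
Ux, s, Vxt = np.linalg.svd(X, full_matrices=False)
U_t, V_t = Ux[:, :r_t], Vxt[:r_t].T

# Distance between row(A1) and row(A_tilde).
V1 = V[:, :r_t]
rho = np.linalg.norm(V1 @ V1.T - V_t @ V_t.T, 2)

# Bound with E replaced by E + A0.
R = E + A0
energy = max(np.linalg.norm(R @ V_t, 2), np.linalg.norm(R.T @ U_t, 2))
bound = min(energy / s[r_t - 1], 1.0)

print("sin(theta_max):", rho, "bound:", bound)
```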

For the other type of initial rank misspecification, $\tilde{r}_k > r_k$, we augment $A_k$ with appropriate noise components to be able to use the Wedin bound.

Corollary 3 (bound for over-specified rank). For each $k = 1, \ldots, K$, the signal matrix $A_k = U_k \Sigma_k V_k^\top$ with rank $r_k$ is perturbed by additive noise $E_k$. Let $\tilde{A}_k = \tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^\top$ be the rank $\tilde{r}_k$ SVD of $X_k$, where $\tilde{r}_k > r_k$. Let $E_0$ be the rank $\tilde{r}_k - r_k$ SVD of $(I - U_k U_k^\top) E_k (I - V_k V_k^\top)$. Then the pseudometric between $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$ is bounded above by

$$\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) \le \frac{\max(\|(E_k - E_0) \tilde{V}_k\|, \|(E_k - E_0)^\top \tilde{U}_k\|)}{\sigma_{\min}(\tilde{A}_k)} \wedge 1.$$
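The construction of $E_0$ in the over-specified case can be sketched numerically. The example below is illustrative only, with hypothetical sizes and noise level. Because the two row spaces have different dimensions here, it is an assumption of this sketch that the pseudometric is evaluated as the sine of the largest of the $r_k \wedge \tilde{r}_k$ principal angles, computed as $\|(I - P_{\tilde{A}_k}) V_k\|$.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 40, 30
r, r_t = 2, 3  # hypothetical true rank r and over-specified working rank r_t

U, _ = np.linalg.qr(rng.standard_normal((d, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
A = (U * np.array([10.0, 6.0])) @ V.T
E = 0.1 * rng.standard_normal((d, n))
X = A + E

# Rank r_t SVD of the observed matrix.
Ux, s, Vxt = np.linalg.svd(X, full_matrices=False)
U_t, V_t = Ux[:, :r_t], Vxt[:r_t].T

# E0: rank (r_t - r) SVD of the noise projected off the signal column and row spaces.
N = (np.eye(d) - U @ U.T) @ E @ (np.eye(n) - V @ V.T)
Un, sn, Vnt = np.linalg.svd(N, full_matrices=False)
q = r_t - r
E0 = (Un[:, :q] * sn[:q]) @ Vnt[:q]

# Sine of the largest principal angle between row(A) (dim r) and row(A_tilde) (dim r_t).
rho = np.linalg.norm((np.eye(n) - V_t @ V_t.T) @ V, 2)

# Bound with E replaced by E - E0.
R = E - E0
energy = max(np.linalg.norm(R @ V_t, 2), np.linalg.norm(R.T @ U_t, 2))
bound = min(energy / s[r_t - 1], 1.0)

print("sin(theta_max):", rho, "bound:", bound)
```

In this sketch the denominator is a noise-level singular value of $X_k$, which is one way to see why the bound tends to be conservative when the rank is over-specified.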

The bounds in Corollaries 1, 2 and 3 provide many useful insights. However, these bounds still cannot be used directly since we do not observe the error matrices $E_1, \ldots, E_K$. A re-sampling based estimator of the Wedin bounds is provided in the next paragraph. As seen in Figure 3.6, this estimator appropriately adapts to each of the above three cases. Moreover, Figure 3.6 also indicates that the Wedin bound for over-specified rank is usually very conservative.

Estimation And Evaluation Of The Wedin Bound

As mentioned above, the perturbation bounds of each $\theta_{k, r_k \wedge \tilde{r}_k}$ require the estimation of the terms $\|E_k \tilde{V}_k\|$ and $\|E_k^\top \tilde{U}_k\|$ for $k = 1, 2$. These terms are measurements of energies of the noise matrices projected onto the signal column and row spaces. Since an isotropic error model is assumed, the distributions of energy of the noise matrices in arbitrary fixed directions are equal. Thus, if we sample random subspaces of dimension $\tilde{r}_k$ that are orthogonal to the estimated signal $\tilde{A}_k$, and use the observed residual $\tilde{E}_k = X_k - \tilde{A}_k$, this should provide a good estimator of the distribution of the unobserved terms $\|E_k \tilde{V}_k\|$ and $\|E_k^\top \tilde{U}_k\|$.

[Figure 3.6 panels: Rank 1, Rank 2 and Rank 3 SVD of X (upper row, x-axis 0 to 45); Rank 2, Rank 3 and Rank 4 SVD of Y (lower row, x-axis 0 to 80); y-axis 0 to 1 in every panel.]

Figure 3.6: Principal angle plots between each singular subspace of the signal matrix $A_{k,1}$ and its estimator $\tilde{A}_k$ for the toy dataset. Graphics for $X$ are on the upper row, with $Y$ on the lower row. The left, middle and right columns are the under-specified, correctly specified and over-specified signal matrix rank cases respectively. Each x-axis represents the angle. The y-axis shows the values of the survival function of the resampled distribution, which are shown as blue plus signs in the figure. The vertical blue solid line is the theoretical Wedin bound, showing this bound is well estimated. The vertical black solid line segments represent the principal angles $\theta_{k,1}, \ldots, \theta_{k, r_k \wedge \tilde{r}_k}$ between $\mathrm{row}(A_{k,1})$ and $\mathrm{row}(\tilde{A}_k)$. The distance between the black and blue lines reveals when the Wedin bound is tight.

In particular, consider the estimation of the term $\|E_k \tilde{V}_k\|$. We draw a random subspace of dimension $\tilde{r}_k$ that is orthogonal to $\tilde{V}_k$, denoted $V_k^\star$, and project the observed data matrix onto the subspace spanned by $V_k^\star$, written as $X_k V_k^\star$. The distribution (with respect to the $V_k^\star$ variation) of the operator $L^2$ norm $\|X_k V_k^\star\| = \|\tilde{E}_k V_k^\star\|$ approximates the distribution of the unknown $\|E_k \tilde{V}_k\|$ because both measure noise energy in essentially random directions. Similarly, $\|E_k^\top \tilde{U}_k\|$ is approximated by $\|X_k^\top U_k^\star\|$, where $U_k^\star$ is a random $\tilde{r}_k$-dimensional subspace orthogonal to $\tilde{U}_k$. These distributions are used to estimate the Wedin bound by generating 1000 replications of $\|X_k V_k^\star\|$ and $\|X_k^\top U_k^\star\|$, and plugging these into (3.5). The quantiles of the resulting distributions are used as prediction intervals for the unknown theoretical Wedin bound. Note that this random subspace sampling scheme provides a distribution with smaller variance than simply sampling from the remaining singular values of $X_k$, i.e., using 1000 subspaces each generated by a random sample of $\tilde{r}_k$ remaining singular vectors.
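The resampling scheme described above can be sketched as follows. This is a minimal illustration rather than the author's implementation: the function names, the simulated data, and the choice to draw each random subspace by projecting a Gaussian matrix onto the orthogonal complement and orthonormalizing it with QR are all assumptions of this sketch.

```python
import numpy as np

def random_orthogonal_subspace(basis, dim, rng):
    """Orthonormal basis of a random `dim`-dimensional subspace orthogonal to `basis`.

    `basis` must have orthonormal columns. A Gaussian matrix is projected onto
    the orthogonal complement of span(basis) and then orthonormalized.
    """
    G = rng.standard_normal((basis.shape[0], dim))
    G -= basis @ (basis.T @ G)  # remove the estimated signal directions
    Q, _ = np.linalg.qr(G)
    return Q

def resampled_wedin_bounds(X, rank, n_rep=1000, seed=0):
    """Resampled values of the right-hand side of (3.5) for a rank-`rank` SVD of X."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_t, V_t, sigma_min = U[:, :rank], Vt[:rank].T, s[rank - 1]
    bounds = np.empty(n_rep)
    for b in range(n_rep):
        V_star = random_orthogonal_subspace(V_t, rank, rng)
        U_star = random_orthogonal_subspace(U_t, rank, rng)
        energy = max(np.linalg.norm(X @ V_star, 2),
                     np.linalg.norm(X.T @ U_star, 2))
        bounds[b] = min(energy / sigma_min, 1.0)
    return bounds

# Hypothetical data: a rank-2 signal plus isotropic Gaussian noise.
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 30))
X = A + 0.1 * rng.standard_normal((40, 30))

bounds = resampled_wedin_bounds(X, rank=2, n_rep=200)
quantiles = np.quantile(bounds, [0.5, 0.9, 0.95, 0.99])
print("bound quantiles (sine scale):", quantiles)
print("bound quantiles (degrees):", np.degrees(np.arcsin(quantiles)))
```

The quantiles play the role of the prediction intervals described in the text; converting them with `arcsin` puts them on the angle scale.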

There are two criteria for evaluating the effectiveness of the estimator. The first is how well the resampled distributions approximate the underlying theoretical Wedin bounds. This is addressed in Figure 3.6, which is based on the toy example in Section 3.1.1. For each of the matrices $X$ and $Y$ (top and bottom rows), the under-, correctly, and over-specified signal rank cases (Corollaries 2, 1 and 3 respectively) are carefully investigated. In each case the theoretical Wedin bound (calculated using the true underlying quantities, which are only known in a simulation study) is shown as a vertical blue line. Our resampling approach provides an estimated distribution, the survival function (1 minus the c.d.f.) of which is shown using blue plus signs. This indicates remarkably effective estimation of the Wedin bound in all three cases.

The second, more important criterion is how well the prediction interval covers the actual principal angles between $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$. These angles are shown as vertical black line segments in Figure 3.6. For the square matrix $X$, in the under- and correctly specified cases (top left and top center), the Wedin bound seems relatively tight. In all other cases, the Wedin bound is conservative.

Figure 3.6 shows one realization of the noise in the toy example. A corresponding simulation study is summarized in Table 3.1. For this we generated 10,000 independent copies of the data sets $X$ ($100 \times 100$, true signal rank $r_1 = 2$) and $Y$ ($10000 \times 100$, true signal rank $r_2 = 3$). Then for several low rank approximations (columns of Table 3.1) we calculated the estimate of the angle between the true signal and the low rank approximation. Table 3.1 reports the percentage of the times the corresponding quantile of the resampled estimate is bigger than the true angle for the matrix $X$.

Table 3.1: Coverages of the prediction intervals of the true angle between the signal $\mathrm{row}(A_{k,1})$ and its estimator $\mathrm{row}(\tilde{A}_k)$ for the matrix $X$ in the toy example. Rows are nominal levels. Columns are ranks of approximation (where 2 is the correct rank). The simulation based on 10000 realizations of $X$ shows good performance for this square matrix.

Nominal level    Rank 1    Rank 2    Rank 3
50%              91.9%     63.6%     100.0%
90%              100.0%    89.6%     100.0%
95%              100.0%    93.7%     100.0%
99%              100.0%    98.0%     100.0%

When the rank is correctly specified, i.e., $\tilde{r}_1 = r_1 = 2$, we see that the performance for the square matrix $X$ is satisfactory, as the empirical percentages are close to the nominal values. When the rank is misspecified, the empirical upper bound is conservative. Corresponding empirical percentages for the high dimension low sample size data set $Y$ are all 100%, and thus are not shown. This is caused by the fact that the Wedin bound can be very conservative if the matrix is far from square. As seen in Figure 3.7, this can cause identification of spurious joint components. This motivates our development of a diagnostic plot in Section 3.3.3. Recent works of Cai et al. (2018) and O’Rourke et al. (2013) may provide potential approaches for improvement of the Wedin bound.
