3.3.2 Step 1: Signal Space Initial Extraction
3.3.2.2 Approximation Accuracy Estimation
A major challenge is segmentation of the joint and individual variation in the presence of noise
which individually perturbs each signal. A first step towards addressing this is a careful study of
how well $A_k$ is approximated by $\tilde{A}_k$, using the Generalized $\sin\theta$ Theorem (Wedin, 1972).
Pseudometric Between Subspaces
To apply the Generalized $\sin\theta$ Theorem, we use the following pseudometric as a notion of distance between theoretical and perturbed subspaces. Recall that $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$ are the $r_k$- and $\tilde{r}_k$-dimensional score subspaces of $\mathbb{R}^n$ for the matrix $A_k$ and its approximation $\tilde{A}_k$, respectively. The corresponding projection matrices are $P_{A_k}$ and $P_{\tilde{A}_k}$, respectively. A pseudometric between the two subspaces can be defined as the difference of the projection matrices under the operator $L^2$ norm, i.e., $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) = \|P_{A_k} - P_{\tilde{A}_k}\|$ (Stewart and Sun, 1990). When $r_k = \tilde{r}_k$, this pseudometric is also a distance between the two subspaces.
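An insightful understanding of this pseudometric $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k))$ comes from a principal angle analysis (Jordan, 1875; Hotelling, 1936) of the subspaces $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$. Denote the principal angles between $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$ as
$$\Theta(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) = \{\theta_{k,1}, \ldots, \theta_{k, r_k \wedge \tilde{r}_k}\} \qquad (3.4)$$
with $\theta_{k,1} \le \theta_{k,2} \le \ldots \le \theta_{k, r_k \wedge \tilde{r}_k}$. The pseudometric $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k))$ is equal to the sine of the maximal principal angle, i.e., $\sin\theta_{k, r_k \wedge \tilde{r}_k}$. Thus the largest principal angle between two subspaces measures their closeness, i.e., distance.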
The pseudometric $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k))$ can also be written as
$$\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) = \|(I - P_{A_k}) P_{\tilde{A}_k}\| = \|(I - P_{\tilde{A}_k}) P_{A_k}\|,$$
which gives another useful understanding of this definition. It measures the relative deviation of the signal variation from the theoretical subspace. Accordingly, the similarity/closeness between the subspace and its perturbation can be written as $\|P_{A_k} P_{\tilde{A}_k}\|$ and is equal to the cosine of the maximal principal angle defined above, i.e., $\cos\theta_{k, r_k \wedge \tilde{r}_k}$. Hence, $\sin^2\theta_{k, r_k \wedge \tilde{r}_k}$ indicates the proportion of signal deviation and $\cos^2\theta_{k, r_k \wedge \tilde{r}_k}$ tells the proportion of remaining signal in the theoretical subspace.
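As a numerical illustration of the sine identities above, the following is a minimal sketch in Python with NumPy. The dimensions, noise level, seed, and helper names (`orth_basis`, `principal_angles`) are illustrative assumptions and not part of the original analysis. The two subspaces are taken to have equal dimension ($r_k = \tilde{r}_k$), the case in which the pseudometric is a distance.

```python
import numpy as np

def orth_basis(M, rank):
    """Orthonormal basis (n x rank) of the row space of M: its top right singular vectors."""
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[:rank].T

def principal_angles(Q1, Q2):
    """Principal angles (radians, ascending) between span(Q1) and span(Q2).

    Q1 and Q2 must have orthonormal columns; the cosines of the angles are
    the singular values of Q1^T Q2.
    """
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)   # descending cosines
    return np.arccos(np.clip(cosines, 0.0, 1.0))           # ascending angles

# Hypothetical toy setting: a rank-3 signal plus small isotropic noise.
rng = np.random.default_rng(0)
d, n, r = 30, 20, 3
A = rng.standard_normal((d, r)) @ rng.standard_normal((r, n))
X = A + 0.1 * rng.standard_normal((d, n))

V = orth_basis(A, r)            # basis of row(A)
V_tilde = orth_basis(X, r)      # basis of row(A~), A~ = rank-r SVD of X
P = V @ V.T                     # projection onto row(A)
P_tilde = V_tilde @ V_tilde.T   # projection onto row(A~)
I = np.eye(n)

theta = principal_angles(V, V_tilde)
rho = np.linalg.norm(P - P_tilde, 2)              # operator-norm pseudometric
rho_alt1 = np.linalg.norm((I - P) @ P_tilde, 2)   # ||(I - P_A) P_A~||
rho_alt2 = np.linalg.norm((I - P_tilde) @ P, 2)   # ||(I - P_A~) P_A||

print("principal angles (degrees):", np.degrees(theta))
print("rho:", rho, " sin(largest angle):", np.sin(theta[-1]))
```

In this equal-dimension setting the three norms and the sine of the largest principal angle should agree up to floating-point error.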
Wedin Bound
For a signal matrix $A_k$ and its perturbation $X_k = A_k + E_k$, the generalized $\sin\theta$ theorem provides a bound for the distance between the rank $\tilde{r}_k$ ($\le r_k$) singular subspaces of $A_k$ and $X_k$.
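Theorem 1 (Wedin, 1972). Let $A_k$ be a signal matrix with rank $r_k$. Letting $A_{k,1} = U_{k,1} \Sigma_{k,1} V_{k,1}^\top$ denote the rank $\tilde{r}_k$ SVD of $A_k$, where $\tilde{r}_k \le r_k$, write $A_k = A_{k,1} + A_{k,0}$. For the perturbation $X_k = A_k + E_k$, a corresponding decomposition can be made as $X_k = \tilde{A}_{k,1} + \tilde{E}_k$, where $\tilde{A}_{k,1} = \tilde{U}_{k,1} \tilde{\Sigma}_{k,1} \tilde{V}_{k,1}^\top$ is the rank $\tilde{r}_k$ SVD of $X_k$. Assume that there exist an $\alpha \ge 0$ and a $\delta > 0$ such that, for $\sigma_{\min}(\tilde{A}_{k,1})$ and $\sigma_{\max}(A_{k,0})$ denoting appropriate minimum and maximum singular values,
$$\sigma_{\min}(\tilde{A}_{k,1}) \ge \alpha + \delta \quad \text{and} \quad \sigma_{\max}(A_{k,0}) \le \alpha.$$
Then the distance between the row spaces of $\tilde{A}_{k,1}$ and $A_{k,1}$ is bounded by
$$\rho(\mathrm{row}(\tilde{A}_{k,1}), \mathrm{row}(A_{k,1})) \le \frac{\max(\|E_k \tilde{V}_{k,1}\|, \|E_k^\top \tilde{U}_{k,1}\|)}{\delta} \wedge 1.$$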
In practice we do not observe $A_{k,0}$, thus $\delta$ cannot be estimated in general. A special case of interest for AJIVE is $\tilde{r}_k = r_k$, in which case $A_{k,0} = 0$ and $A_k = A_{k,1}$. The following is an adaptation of the generalized $\sin\theta$ theorem to this case:
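Corollary 1 (bound for correctly specified rank). For each $k = 1, \ldots, K$, the signal matrix $A_k$ is perturbed by additive noise $E_k$. Let $\theta_{k,\tilde{r}_k}$ be the largest principal angle for the subspace of signal $A_k$ and its approximation $\tilde{A}_k$, where $\tilde{r}_k = r_k$. Denote the SVD of $\tilde{A}_k$ as $\tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^\top$. The distance between the subspaces of $A_k$ and $\tilde{A}_k$, $\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k))$, i.e., the sine of $\theta_{k,\tilde{r}_k}$, is bounded above by
$$\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) = \sin\theta_{k,\tilde{r}_k} \le \frac{\max(\|E_k \tilde{V}_k\|, \|E_k^\top \tilde{U}_k\|)}{\sigma_{\min}(\tilde{A}_k)} \wedge 1. \qquad (3.5)$$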
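The bound (3.5) can be checked numerically in a simulation where the noise matrix is known. The sketch below is a hedged illustration in Python with NumPy, not the authors' implementation: the matrix sizes, noise levels, seed, and function names (`wedin_bound_correct_rank`, `sin_largest_angle`) are assumptions made for the example.

```python
import numpy as np

def wedin_bound_correct_rank(E, X, rank):
    """Right-hand side of (3.5): max(||E V~||, ||E^T U~||) / sigma_min(A~), capped at 1."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_t, V_t = U[:, :rank], Vt[:rank].T    # singular vectors of A~ (rank-`rank` SVD of X)
    sigma_min = s[rank - 1]                # smallest singular value of A~
    noise_energy = max(np.linalg.norm(E @ V_t, 2), np.linalg.norm(E.T @ U_t, 2))
    return min(noise_energy / sigma_min, 1.0)

def sin_largest_angle(A, X, rank):
    """Sine of the largest principal angle between row(A) and row(A~), equal ranks."""
    V = np.linalg.svd(A, full_matrices=False)[2][:rank].T
    V_t = np.linalg.svd(X, full_matrices=False)[2][:rank].T
    return np.linalg.norm(V @ V.T - V_t @ V_t.T, 2)

# Hypothetical setting: rank-2 signal, isotropic Gaussian noise at two levels.
rng = np.random.default_rng(1)
d, n, r = 50, 40, 2
A = rng.standard_normal((d, r)) @ rng.standard_normal((r, n))

rhos, bounds = [], []
for noise_sd in (0.1, 0.5):
    E = noise_sd * rng.standard_normal((d, n))
    X = A + E
    rhos.append(sin_largest_angle(A, X, r))
    bounds.append(wedin_bound_correct_rank(E, X, r))
    print(f"noise sd {noise_sd}: sin(theta_max) = {rhos[-1]:.4f}, bound = {bounds[-1]:.4f}")
```

Because the bound uses the true $E_k$, it is only computable in simulation, which is the motivation for the resampling estimator discussed later in this section.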
In this case the bound is driven by the maximal value of noise energy in the column and row
spaces and by the estimated smallest signal singular value. This is consistent with the intuition
that a deviation distance, i.e., a largest principal angle, is small when the signal is strong and
perturbations are weak.
In general, it can be very challenging to correctly estimate the true rank of $A_k$. If the true rank $r_k$ is not correctly specified, then different applications of the Wedin bound are useful. In particular, when $A_{k,0}$ is not 0, i.e., $\tilde{r}_k < r_k$, insights come from replacing $E_k$ by $E_k + A_{k,0}$ in the Wedin bound.
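Corollary 2 (bound for under-specified rank). For each $k = 1, \ldots, K$, the signal matrix $A_k$ with rank $r_k$ is perturbed by additive noise $E_k$. Let $\tilde{A}_k = \tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^\top$ be the rank $\tilde{r}_k$ SVD approximation of $A_k$ from the perturbed matrix, where $\tilde{r}_k < r_k$. Denote $A_k = A_{k,1} + A_{k,0}$, where $A_{k,1}$ is the rank $\tilde{r}_k$ SVD of $A_k$. Then the distance between $\mathrm{row}(A_{k,1})$ and $\mathrm{row}(\tilde{A}_k)$ is bounded above by
$$\rho(\mathrm{row}(A_{k,1}), \mathrm{row}(\tilde{A}_k)) \le \frac{\max(\|(E_k + A_{k,0}) \tilde{V}_k\|, \|(E_k + A_{k,0})^\top \tilde{U}_k\|)}{\sigma_{\min}(\tilde{A}_k)} \wedge 1.$$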
For the other type of initial rank misspecification, $\tilde{r}_k > r_k$, we augment $A_k$ with appropriate noise components to be able to use the Wedin bound.
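Corollary 3 (bound for over-specified rank). For each $k = 1, \ldots, K$, the signal matrix $A_k = U_k \Sigma_k V_k^\top$ with rank $r_k$ is perturbed by additive noise $E_k$. Let $\tilde{A}_k = \tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^\top$ be the rank $\tilde{r}_k$ SVD of $X_k$, where $\tilde{r}_k > r_k$. Let $E_0$ be the rank $\tilde{r}_k - r_k$ SVD of $(I - U_k U_k^\top) E_k (I - V_k V_k^\top)$. Then the pseudometric between $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$ is bounded above by
$$\rho(\mathrm{row}(A_k), \mathrm{row}(\tilde{A}_k)) \le \frac{\max(\|(E_k - E_0) \tilde{V}_k\|, \|(E_k - E_0)^\top \tilde{U}_k\|)}{\sigma_{\min}(\tilde{A}_k)} \wedge 1.$$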
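The two misspecified-rank bounds (Corollaries 2 and 3) can be illustrated in the same simulated setting. The following is a sketch under stated assumptions: the singular values, noise level, seed, and helper names (`truncated_svd`, `sin_largest_angle`, `wedin_rhs`) are hypothetical choices for the example. Here the pseudometric is computed as the sine of the largest of the $r_k \wedge \tilde{r}_k$ principal angles, following (3.4).

```python
import numpy as np

def truncated_svd(M, rank):
    """Rank-`rank` SVD factors (U, s, V) of M, with V holding right singular vectors as columns."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank].T

def sin_largest_angle(Q1, Q2):
    """Sine of the largest of the min(dim) principal angles between span(Q1) and span(Q2)."""
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    smallest_cos = min(cosines[-1], 1.0)
    return float(np.sqrt(1.0 - smallest_cos**2))

def wedin_rhs(noise, U_t, s_t, V_t):
    """max(||noise V~||, ||noise^T U~||) / sigma_min(A~), capped at 1."""
    energy = max(np.linalg.norm(noise @ V_t, 2), np.linalg.norm(noise.T @ U_t, 2))
    return min(energy / s_t[-1], 1.0)

# Hypothetical rank-3 signal A = U S V^T with known factors, plus Gaussian noise.
rng = np.random.default_rng(2)
d, n, r = 60, 40, 3
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
A = U @ np.diag([30.0, 20.0, 10.0]) @ V.T
E = 0.3 * rng.standard_normal((d, n))
X = A + E

# Corollary 2 (under-specified, r~ = 2): the discarded signal A0 acts as extra noise.
r_under = 2
U1, s1, V1 = truncated_svd(A, r_under)
A0 = A - U1 @ np.diag(s1) @ V1.T
Ut, st, Vt_ = truncated_svd(X, r_under)
rho_under = sin_largest_angle(V1, Vt_)
bound_under = wedin_rhs(E + A0, Ut, st, Vt_)

# Corollary 3 (over-specified, r~ = 4): move the leading noise component E0,
# taken from the part of E orthogonal to the signal, out of the noise term.
r_over = 4
resid = (np.eye(d) - U @ U.T) @ E @ (np.eye(n) - V @ V.T)
U0, s0, V0 = truncated_svd(resid, r_over - r)
E0 = U0 @ np.diag(s0) @ V0.T
Ut, st, Vt_ = truncated_svd(X, r_over)
rho_over = sin_largest_angle(V, Vt_)
bound_over = wedin_rhs(E - E0, Ut, st, Vt_)

print(f"under-specified: sin(theta_max) = {rho_under:.4f}, bound = {bound_under:.4f}")
print(f"over-specified:  sin(theta_max) = {rho_over:.4f}, bound = {bound_over:.4f}")
```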
The bounds in Corollaries 1, 2, 3 provide many useful insights. However, these bounds still cannot be used directly since we do not observe the error matrices $E_1, \ldots, E_K$. A re-sampling based estimator of the Wedin bounds is provided in the next paragraph. As seen in Figure 3.6, this estimator appropriately adapts to each of the above three cases. Moreover, Figure 3.6 also indicates that the Wedin bound for over-specified rank is usually very conservative.
Estimation And Evaluation Of The Wedin Bound
As mentioned above, the perturbation bounds of each $\theta_{k, r_k \wedge \tilde{r}_k}$ require the estimation of the terms $\|E_k \tilde{V}_k\|$, $\|E_k^\top \tilde{U}_k\|$ for $k = 1, 2$. These terms are measurements of energies of the noise matrices projected onto the signal column and row spaces. Since an isotropic error model is assumed, the distributions of energy of the noise matrices in arbitrary fixed directions are equal. Thus, if we sample random subspaces of dimension $\tilde{r}_k$ that are orthogonal to the estimated signal $\tilde{A}_k$, and use the observed residual $\tilde{E}_k = X_k - \tilde{A}_k$, this should provide a good estimator of the distribution of the unobserved terms $E_k \tilde{V}_k$, $E_k^\top \tilde{U}_k$.
[Figure 3.6 panels, top row: Rank 1, Rank 2 and Rank 3 SVD of X; bottom row: Rank 2, Rank 3 and Rank 4 SVD of Y.]
Figure 3.6: Principal angle plots between each singular subspace of the signal matrix $A_{k,1}$ and its estimator $\tilde{A}_k$ for the toy dataset. Graphics for $X$ are on the upper row, with $Y$ on the lower row. The left, middle and right columns are the under-specified, correctly specified and over-specified signal matrix rank cases respectively. Each x-axis represents the angle. The y-axis shows the values of the survival function of the resampled distribution, which are shown as blue plus signs in the figure. The vertical blue solid line is the theoretical Wedin bound, showing this bound is well estimated. The vertical black solid line segments represent the principal angles $\theta_{k,1}, \ldots, \theta_{k, r_k \wedge \tilde{r}_k}$ between $\mathrm{row}(A_{k,1})$ and $\mathrm{row}(\tilde{A}_k)$. The distance between the black and blue lines reveals when the Wedin bound is tight.
In particular, consider the estimation of the term $\|E_k \tilde{V}_k\|$. We draw a random subspace of dimension $\tilde{r}_k$ that is orthogonal to $\tilde{V}_k$, denoted $V_k^\star$, and project $X_k$ onto the subspace spanned by $V_k^\star$, written as $X_k V_k^\star$. The distribution (with respect to the $V_k^\star$ variation) of the operator $L^2$ norm $\|X_k V_k^\star\| = \|\tilde{E}_k V_k^\star\|$ approximates the distribution of the unknown $\|E_k \tilde{V}_k\|$ because both measure noise energy in essentially random directions. Similarly, $\|E_k^\top \tilde{U}_k\|$ is approximated by $\|X_k^\top U_k^\star\|$, where $U_k^\star$ is a random $\tilde{r}_k$ dimensional subspace orthogonal to $\tilde{U}_k$. These distributions are used to estimate the Wedin bound by generating 1000 replications of $\|X_k V_k^\star\|$ and $\|X_k^\top U_k^\star\|$, and plugging these into (3.5). The quantiles of the resulting distributions are used as prediction intervals for the unknown theoretical Wedin bound. Note this random subspace sampling scheme provides a distribution with smaller variance than simply sampling from the remaining singular values of $X_k$, i.e., using 1000 subspaces each generated by a random sample of $\tilde{r}_k$ remaining singular vectors.
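The resampling scheme described above can be sketched as follows in Python with NumPy. This is an illustration under stated assumptions, not the authors' code. The random subspaces are drawn by projecting a Gaussian matrix off the estimated singular vectors and orthonormalizing with QR. The two resampled norms are paired within each replication before being plugged into (3.5). The test matrix is a hypothetical $100 \times 100$ rank-2 example, not the toy data of Section 3.1.1. The function names and the default of 1000 replications mirror the text but are otherwise illustrative.

```python
import numpy as np

def random_orthogonal_subspace(Q, dim, rng):
    """Orthonormal basis of a random `dim`-dimensional subspace orthogonal to span(Q)."""
    G = rng.standard_normal((Q.shape[0], dim))
    G -= Q @ (Q.T @ G)            # remove the component inside span(Q)
    basis, _ = np.linalg.qr(G)    # orthonormalize the remainder
    return basis

def resampled_wedin_bounds(X, rank, n_rep=1000, seed=0):
    """Resampled versions of the right-hand side of (3.5), one value per replication."""
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_t, V_t, sigma_min = U[:, :rank], Vt[:rank].T, s[rank - 1]
    samples = np.empty(n_rep)
    for b in range(n_rep):
        V_star = random_orthogonal_subspace(V_t, rank, rng)   # orthogonal to V~
        U_star = random_orthogonal_subspace(U_t, rank, rng)   # orthogonal to U~
        # X V* equals E~ V* because A~ V* = 0, so this is residual (noise) energy.
        energy = max(np.linalg.norm(X @ V_star, 2), np.linalg.norm(X.T @ U_star, 2))
        samples[b] = min(energy / sigma_min, 1.0)
    return samples

# Hypothetical example: rank-2 signal with singular values 60 and 40, unit-variance noise.
rng = np.random.default_rng(3)
d, n, rank = 100, 100, 2
U0, _ = np.linalg.qr(rng.standard_normal((d, rank)))
V0, _ = np.linalg.qr(rng.standard_normal((n, rank)))
X = U0 @ np.diag([60.0, 40.0]) @ V0.T + rng.standard_normal((d, n))

samples = resampled_wedin_bounds(X, rank, n_rep=200, seed=4)
quantiles = np.quantile(samples, [0.05, 0.5, 0.95])
print("bound quantiles (sine scale):", quantiles)
print("bound quantiles (degrees):   ", np.degrees(np.arcsin(quantiles)))
```

The quantiles on the angle scale correspond to the kind of prediction interval described in the text. Pairing the two norms within a replication is one possible reading of "plugging these into (3.5)".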
There are two criteria for evaluating the effectiveness of the estimator. First is how well the
resampled distributions approximate the underlying theoretical Wedin bounds. This is addressed
in Figure 3.6, which is based on the toy example in Section 3.1.1. For each of the matrices $X$ and $Y$ (top and bottom rows), the under, correctly, and over specified signal rank cases (Corollaries 2, 1 and 3 respectively) are carefully investigated. In each case the theoretical Wedin bound (calculated using the true underlying quantities, which are known only in a simulation study) is shown as a vertical blue line. Our resampling approach provides an estimated distribution, the survival function (1 minus the c.d.f.) of which is shown using blue plus signs. This indicates remarkably effective estimation of the Wedin bound in all three cases.
The second more important criterion is how well the prediction interval covers the actual
principal angles between $\mathrm{row}(A_k)$ and $\mathrm{row}(\tilde{A}_k)$. These angles are shown as vertical black line segments in Figure 3.6. For the square matrix $X$, in the under and correctly specified cases (top left and center), the Wedin bound seems relatively tight. In all other cases, the Wedin bound is
conservative.
Figure 3.6 shows one realization of the noise in the toy example. A corresponding simulation
study is summarized in Table 3.1. For this we generated 10,000 independent copies of the data sets
$X$ ($100 \times 100$, true signal rank $r_1 = 2$) and $Y$ ($10000 \times 100$, true signal rank $r_2 = 3$). Then for several low rank approximations (columns of Table 3.1) we calculated the estimate of the angle between the true signal and the low rank approximation. Table 3.1 reports the percentage of the times the corresponding quantile of the resampled estimate is bigger than the true angle for the matrix $X$.
Table 3.1: Coverages of the prediction intervals of the true angle between the signal $\mathrm{row}(A_{k,1})$ and its estimator $\mathrm{row}(\tilde{A}_k)$ for the matrix $X$ in the toy example. Rows are nominal levels. Columns are ranks of approximation (where 2 is the correct rank). The simulation based on 10000 realizations of $X$ shows good performance for this square matrix.