or X and its CC or PC vector is described by a matrix, which is related to eigenvectors and the covariance matrix of the appropriate random vector. A difference between the two
1. The mean of T is
3.5.2 Transforms with Non-Singular Matrices
The preceding section demonstrates that the CCs of the original data can differ from those of the transformed data for linear transformations with singular matrices. In this section we focus on non-singular matrices. Theorem 3.11 remains unchanged, but we are now able to explicitly compare the CCs of the original and the transformed vectors. The key properties of the transformed CCs are presented in Theorem 3.12. The results are useful in their own right but also show some interesting features of CCs, the associated eigenvectors and canonical transforms. Because Theorem 3.12 is of particular interest for data, I summarise the data results and point out relevant changes from the population case.
3.5 Canonical Correlations and Transformed Data 91
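Theorem 3.12 For $\rho = 1, 2$, let $X^{[\rho]} \sim (\mu_\rho, \Sigma_\rho)$, and assume that the $\Sigma_\rho$ are non-singular with rank $d_\rho$. Let $C$ be the canonical correlation matrix of $X^{[1]}$ and $X^{[2]}$, and let $r$ be the rank of $C$ and $P \Upsilon Q^{\mathsf T}$ its singular value decomposition. Let $A_\rho$ be non-singular matrices of size $d_\rho \times d_\rho$, and let $a_\rho$ be $d_\rho$-dimensional vectors. Put
$$T^{[\rho]} = A_\rho X^{[\rho]} + a_\rho.$$
Let $C_T$ be the canonical correlation matrix of $T^{[1]}$ and $T^{[2]}$, and write $C_T = P_T \Upsilon_T Q_T^{\mathsf T}$ for its singular value decomposition. The following hold:
1. $C_T$ and $C$ have the same singular values, and hence $\Upsilon_T = \Upsilon$.
2. For $k, \ell \le r$, the $k$th left and the $\ell$th right eigenvectors $p_{T,k}$ and $q_{T,\ell}$ of $C_T$ and the corresponding canonical transforms $\varphi_{T,k}$ and $\psi_{T,\ell}$ of the $T^{[\rho]}$ are related to the analogous quantities of the $X^{[\rho]}$ by
$$p_{T,k} = \left(A_1 \Sigma_1 A_1^{\mathsf T}\right)^{1/2} (A_1^{\mathsf T})^{-1} \Sigma_1^{-1/2} p_k \quad\text{and}\quad q_{T,\ell} = \left(A_2 \Sigma_2 A_2^{\mathsf T}\right)^{1/2} (A_2^{\mathsf T})^{-1} \Sigma_2^{-1/2} q_\ell,$$
$$\varphi_{T,k} = (A_1^{\mathsf T})^{-1} \Sigma_1^{-1/2} p_k = (A_1^{\mathsf T})^{-1} \varphi_k \quad\text{and}\quad \psi_{T,\ell} = (A_2^{\mathsf T})^{-1} \Sigma_2^{-1/2} q_\ell = (A_2^{\mathsf T})^{-1} \psi_\ell.$$
3. The $k$th and $\ell$th canonical correlation scores of $T$ are $U_{T,k} = p_k^{\mathsf T} \Sigma_1^{-1/2} (X^{[1]} - \mu_1)$ and $V_{T,\ell} = q_\ell^{\mathsf T} \Sigma_2^{-1/2} (X^{[2]} - \mu_2)$, and their covariance matrix is
$$\operatorname{var}\begin{pmatrix} U_T^{(k)} \\ V_T^{(\ell)} \end{pmatrix} = \begin{pmatrix} I_{k \times k} & \Upsilon_{k \times \ell} \\ \Upsilon_{k \times \ell}^{\mathsf T} & I_{\ell \times \ell} \end{pmatrix}.$$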
The theorem states that the strength of the correlation is the same for the original and transformed data. The weights which combine the raw or transformed data may, however, differ. Thus the theorem establishes the invariance of canonical correlations under non-singular linear transformations, and it shows this invariance by comparing the singular values and CC scores of the original and transformed data. We find that
• the singular values of the canonical correlation matrices of the random vectors and the transformed vectors are the same,
• the canonical correlation scores of the random vectors and the transformed random vectors are identical (up to a sign), that is, $U_{T,k} = U_k$ and $V_{T,\ell} = V_\ell$ for $k, \ell = 1, \ldots, r$, and
• consequently, the covariance matrix of the CC scores remains the same, namely, $\operatorname{cov}(U_{T,k}, V_{T,\ell}) = \operatorname{cov}(U_k, V_\ell) = \upsilon_k \delta_{k\ell}$.
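The invariance of the singular values can be checked numerically at the population level. The following is a minimal sketch, assuming NumPy; the joint covariance matrix, the matrices `A1` and `A2`, the helper names and the random seed are hypothetical choices made for illustration and are not taken from the text.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def cc_matrix(S1, S2, S12):
    """Canonical correlation matrix C = S1^{-1/2} S12 S2^{-1/2}."""
    return inv_sqrt(S1) @ S12 @ inv_sqrt(S2)

rng = np.random.default_rng(0)
d1, d2 = 3, 4

# Hypothetical joint covariance matrix of (X1, X2), positive definite by construction.
M = rng.normal(size=(d1 + d2, d1 + d2))
S = M @ M.T + np.eye(d1 + d2)
S1, S2, S12 = S[:d1, :d1], S[d1:, d1:], S[:d1, d1:]

# Hypothetical transformation matrices; the diagonal shift keeps them away from singularity.
A1 = rng.normal(size=(d1, d1)) + 4 * np.eye(d1)
A2 = rng.normal(size=(d2, d2)) + 4 * np.eye(d2)

# Covariance matrices of T1 = A1 X1 + a1 and T2 = A2 X2 + a2; the shifts a1, a2 drop out.
S1_T = A1 @ S1 @ A1.T
S2_T = A2 @ S2 @ A2.T
S12_T = A1 @ S12 @ A2.T

sv = np.linalg.svd(cc_matrix(S1, S2, S12), compute_uv=False)
sv_T = np.linalg.svd(cc_matrix(S1_T, S2_T, S12_T), compute_uv=False)

print(np.round(sv, 4))      # singular values of C
print(np.round(sv_T, 4))    # singular values of C_T
print(np.allclose(sv, sv_T))
```

The two printed vectors should agree to numerical precision, whereas the singular vectors of the two matrices generally differ.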
Before we look at a proof of Theorem 3.12, we consider what changes occur when we deal with transformed data
$$T^{[\rho]} = A_\rho X^{[\rho]} + a_\rho \quad \text{for } \rho = 1, 2.$$
Going from the population to the sample, we replace the true parameters by their estimators. So the means are replaced by their sample means, the covariance matrices $\Sigma$ by the sample covariance matrices $S$ and the canonical correlation matrix $C_T$ by its sample counterpart $\widehat{C}_T$. The most noticeable difference is the change from pairs of scalar canonical correlation scores to pairs of vectors of length $n$ when we consider data.
I present the proof of Theorem 3.12, because it reveals important facts and relationships. To make the proof more transparent, I begin with some notation and then prove two lemmas.
For $X^{[\rho]}$, $T^{[\rho]}$, $C$ and $C_T$ as in Theorem 3.12, put
$$\begin{aligned} R^{[C]} &= C C^{\mathsf T} &&\text{and}& K &= [\operatorname{var}(X^{[1]})]^{-1/2} R^{[C]} [\operatorname{var}(X^{[1]})]^{1/2}; \\ R_T^{[C]} &= C_T C_T^{\mathsf T} &&\text{and}& K_T &= [\operatorname{var}(T^{[1]})]^{-1/2} R_T^{[C]} [\operatorname{var}(T^{[1]})]^{1/2}. \end{aligned} \tag{3.22}$$
A comparison with (3.4) shows that I have omitted the second superscript '1' in $R^{[C]}$. In the current proof we refer to $C C^{\mathsf T}$ and so only make the distinction when necessary. The sequence
$$C_T \longleftrightarrow R_T^{[C]} \longleftrightarrow K_T \longleftrightarrow K \longleftrightarrow R^{[C]} \longleftrightarrow C \tag{3.23}$$
will be useful in the proof of Theorem 3.12 because the theorem makes statements about the endpoints $C_T$ and $C$ in the sequence. As we shall see in the proofs, relationships about $K_T$ and $K$ are the starting points because we can show that they are similar matrices.
Lemma 1 Assume that the $X^{[\rho]}$ satisfy the assumptions of Theorem 3.12. Let $\upsilon_1 > \upsilon_2 > \cdots > \upsilon_r$ be the singular values of $C$. The following hold.
1. The matrices $R^{[C]}$, $K$, $R_T^{[C]}$ and $K_T$ as in (3.22) have the same eigenvalues $\upsilon_1^2 > \upsilon_2^2 > \cdots > \upsilon_r^2$.
2. The singular values of $C_T$ coincide with those of $C$.
Proof To prove the statements about the eigenvalues and singular values, we will make repeated use of the fact that similar matrices have the same eigenvalues; see Result 1.8 of Section 1.5.1. So our proof needs to establish similarity relationships between matrices.
The aim is to relate the singular values of $C_T$ and $C$. As it is not easy to do this directly, we travel along the path in (3.23) and exhibit relationships between the neighbours in (3.23).
By Proposition 3.1, $R^{[C]}$ has positive eigenvalues, and the singular values of $C$ are the positive square roots of the eigenvalues of $R^{[C]}$. A similar relationship holds for $R_T^{[C]}$ and $C_T$. These relationships deal with the two ends of the sequence (3.23).
The definition of $K$ implies that it is similar to $R^{[C]}$, so $K$ and $R^{[C]}$ have the same eigenvalues. An analogous result holds for $K_T$ and $R_T^{[C]}$. It remains to establish the similarity of $K$ and $K_T$. This last similarity will establish that the singular values of $C$ and $C_T$ are identical.
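We begin with $K$. We substitute the expression for $R^{[C]}$ and re-write $K$ as follows:
$$\begin{aligned} K &= \Sigma_1^{-1/2} R^{[C]} \Sigma_1^{1/2} \\ &= \Sigma_1^{-1/2} \, \Sigma_1^{-1/2} \Sigma_{12} \Sigma_2^{-1/2} \, \Sigma_2^{-1/2} \Sigma_{12}^{\mathsf T} \Sigma_1^{-1/2} \, \Sigma_1^{1/2} \\ &= \Sigma_1^{-1} \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^{\mathsf T}. \end{aligned} \tag{3.24}$$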
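A similar expression holds for $K_T$. It remains to show that $K$ and $K_T$ are similar. To do this, we go back to the definition of $K_T$ and use the fact that $A_\rho$ and $\operatorname{var}(T)$ are invertible. Now
$$\begin{aligned} K_T &= [\operatorname{var}(T^{[1]})]^{-1} \operatorname{cov}(T^{[1]}, T^{[2]}) [\operatorname{var}(T^{[2]})]^{-1} \operatorname{cov}(T^{[1]}, T^{[2]})^{\mathsf T} \\ &= (A_1 \Sigma_1 A_1^{\mathsf T})^{-1} (A_1 \Sigma_{12} A_2^{\mathsf T}) (A_2 \Sigma_2 A_2^{\mathsf T})^{-1} (A_2 \Sigma_{12}^{\mathsf T} A_1^{\mathsf T}) \\ &= (A_1^{\mathsf T})^{-1} \Sigma_1^{-1} A_1^{-1} A_1 \Sigma_{12} A_2^{\mathsf T} (A_2^{\mathsf T})^{-1} \Sigma_2^{-1} A_2^{-1} A_2 \Sigma_{12}^{\mathsf T} A_1^{\mathsf T} \\ &= (A_1^{\mathsf T})^{-1} \Sigma_1^{-1} \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^{\mathsf T} A_1^{\mathsf T} \\ &= (A_1^{\mathsf T})^{-1} K A_1^{\mathsf T}. \end{aligned} \tag{3.25}$$
The second equality in (3.25) uses the variance results of part 2 of Theorem 3.11. To show the last equality, use (3.24). The sequence of equalities establishes the similarity of the two matrices.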
So far we have shown that the four terms in the middle of the sequence (3.23) are similar matrices, so have the same eigenvalues. This proves part 1 of the lemma. Because the singular values of $C_T$ are the positive square roots of the eigenvalues of $R_T^{[C]}$, and those of $C$ are the positive square roots of the eigenvalues of $R^{[C]}$, $C_T$ and $C$ have the same singular values.
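The similarity in (3.25) can be verified numerically. The sketch below assumes NumPy and uses hypothetical covariance and transformation matrices (and a hypothetical seed); it computes $K$ from (3.24), computes $K_T$ from the covariance matrices of the transformed vectors, and compares $K_T$ with $(A_1^{\mathsf T})^{-1} K A_1^{\mathsf T}$.

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 2, 3
inv = np.linalg.inv

# Hypothetical positive definite joint covariance matrix of (X1, X2).
M = rng.normal(size=(d1 + d2, d1 + d2))
S = M @ M.T + np.eye(d1 + d2)
S1, S2, S12 = S[:d1, :d1], S[d1:, d1:], S[:d1, d1:]

# Hypothetical non-singular transformation matrices.
A1 = rng.normal(size=(d1, d1)) + 4 * np.eye(d1)
A2 = rng.normal(size=(d2, d2)) + 4 * np.eye(d2)

# K as in the last line of (3.24).
K = inv(S1) @ S12 @ inv(S2) @ S12.T

# K_T from the covariance matrices of the transformed vectors, first line of (3.25).
S1_T, S2_T, S12_T = A1 @ S1 @ A1.T, A2 @ S2 @ A2.T, A1 @ S12 @ A2.T
K_T = inv(S1_T) @ S12_T @ inv(S2_T) @ S12_T.T

# Last line of (3.25): K_T written as a similarity transform of K.
K_T_similar = inv(A1.T) @ K @ A1.T

# K and K_T are not symmetric, but their eigenvalues are real because both
# matrices are similar to symmetric matrices; tiny imaginary parts are discarded.
eig_K = np.sort(np.linalg.eigvals(K).real)[::-1]
eig_K_T = np.sort(np.linalg.eigvals(K_T).real)[::-1]

print(np.allclose(K_T, K_T_similar))
print(np.round(eig_K, 6), np.round(eig_K_T, 6))
```

The eigenvalues printed for $K$ and $K_T$ should coincide, which is the content of part 1 of the lemma for these two matrices.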
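Lemma 2 Assume that the $X^{[\rho]}$ satisfy the assumptions of Theorem 3.12. Let $\upsilon > 0$ be a singular value of $C$ with corresponding left eigenvector $p$. Define $R^{[C]}$, $K$ and $K_T$ as in (3.22).
1. If $r$ is the eigenvector of $R^{[C]}$ corresponding to $\upsilon$, then $r = p$.
2. If $s$ is the eigenvector of $K$ corresponding to $\upsilon$, then
$$s = \frac{\Sigma_1^{-1/2} p}{\left\| \Sigma_1^{-1/2} p \right\|} \quad\text{and}\quad p = \frac{\Sigma_1^{1/2} s}{\left\| \Sigma_1^{1/2} s \right\|}.$$
3. If $s_T$ is the eigenvector of $K_T$ corresponding to $\upsilon$, then
$$s_T = \frac{(A_1^{\mathsf T})^{-1} s}{\left\| (A_1^{\mathsf T})^{-1} s \right\|} \quad\text{and}\quad s = \frac{A_1^{\mathsf T} s_T}{\left\| A_1^{\mathsf T} s_T \right\|}.$$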
Proof Part 1 follows directly from Proposition 3.1 because the left eigenvectors of $C$ are the eigenvectors of $R^{[C]}$. To show part 2, we establish relationships between appropriate eigenvectors of objects in the sequence (3.23).
We first exhibit relationships between eigenvectors of similar matrices. For this purpose, let $B$ and $D$ be similar matrices which satisfy $B = E D E^{-1}$ for some matrix $E$. Let $\lambda$ be an eigenvalue of $D$ and hence also of $B$. Let $e$ be the eigenvector of $B$ which corresponds to $\lambda$. We have
$$B e = \lambda e = E D E^{-1} e.$$
Pre-multiplying by the matrix $E^{-1}$ leads to
$$E^{-1} B e = \lambda E^{-1} e = D E^{-1} e.$$
Let $\eta$ be the eigenvector of $D$ which corresponds to $\lambda$. The uniqueness of the eigenvalue-eigenvector decomposition implies that $E^{-1} e$ is a scalar multiple of the eigenvector $\eta$ of $D$. This last fact leads to the relationships
$$\eta = \frac{1}{c_1} E^{-1} e \quad\text{or equivalently,}\quad e = c_1 E \eta \quad\text{for some real } c_1, \tag{3.26}$$
and $E$ therefore is the link between the eigenvectors. Unless $E$ is an isometry, $c_1$ is required because eigenvectors in this book are vectors of norm 1.
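Relationship (3.26) is easy to illustrate numerically. The following sketch assumes NumPy; the matrices `D` and `E` and the seed are hypothetical. It maps each unit eigenvector $\eta$ of $D$ to $E\eta$, rescales to norm 1, and checks that the result is an eigenvector of $B = E D E^{-1}$ for the same eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3

# Hypothetical symmetric matrix D with distinct eigenvalues 3, 2, 1.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
D = Q @ np.diag([3.0, 2.0, 1.0]) @ Q.T

# Hypothetical invertible matrix E (not an isometry) and the similar matrix B.
E = rng.normal(size=(d, d)) + 4 * np.eye(d)
B = E @ D @ np.linalg.inv(E)

lam, eta = np.linalg.eigh(D)        # columns of eta are unit eigenvectors of D

residuals = []
for j in range(d):
    v = E @ eta[:, j]
    c1 = 1.0 / np.linalg.norm(v)    # normalising constant c1 of (3.26), up to sign
    e = c1 * v                      # candidate unit eigenvector of B
    residuals.append(np.linalg.norm(B @ e - lam[j] * e))
    print(f"lambda = {lam[j]:.1f}, c1 = {c1:.4f}, residual = {residuals[-1]:.2e}")
```

The residuals $\|Be - \lambda e\|$ should be at the level of rounding error, while the constants $c_1$ generally differ from 1 because $E$ does not preserve norms.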
We return to the matrices $R^{[C]}$ and $K$. Fix $k \le r$, the rank of $C$. Let $\upsilon$ be the $k$th eigenvalue of $R^{[C]}$ and hence also of $K$, and consider the eigenvector $p$ of $R^{[C]}$ and $s$ of $K$ which correspond to $\upsilon$. Because $K = [\operatorname{var}(X^{[1]})]^{-1/2} R^{[C]} [\operatorname{var}(X^{[1]})]^{1/2}$, (3.26) implies that
$$p = c_2 [\operatorname{var}(X^{[1]})]^{1/2} s = c_2 \Sigma_1^{1/2} s,$$
for some real $c_2$. Now $p$ has unit norm, so $c_2^{-1} = \left\| \Sigma_1^{1/2} s \right\|$, and the result follows. A similar calculation leads to the results in part 3.
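We return to Theorem 3.12 and prove it with the help of the two lemmas.
Proof of Theorem 3.12 Part 1 follows from Lemma 1. For part 2, we need to find relationships between the eigenvectors of $C$ and $C_T$. We obtain this relationship via the sequence (3.23) and with the help of Lemma 2. By part 1 of Lemma 2 it suffices to consider the sequence
$$R_T^{[C]} \longleftrightarrow K_T \longleftrightarrow K \longleftrightarrow R^{[C]}.$$
We start with the eigenvectors of $R_T^{[C]}$. Fix $k \le r$. Let $\upsilon^2$ be the $k$th eigenvalue of $R_T^{[C]}$ and hence also of $K_T$, $K$ and $R^{[C]}$. Let $p_T$ and $p$ be the corresponding eigenvectors of $R_T^{[C]}$ and $R^{[C]}$, and $s_T$ and $s$ those of $K_T$ and $K$, respectively. We start with the pair $(p_T, s_T)$. From the definitions (3.22), we obtain
$$\begin{aligned} p_T &= c_1 [\operatorname{var}(T^{[1]})]^{1/2} s_T \\ &= c_1 c_2 [\operatorname{var}(T^{[1]})]^{1/2} (A_1^{\mathsf T})^{-1} s \\ &= c_1 c_2 c_3 [\operatorname{var}(T^{[1]})]^{1/2} (A_1^{\mathsf T})^{-1} \Sigma_1^{-1/2} p \end{aligned}$$
by parts 2 and 3 of Lemma 2, where the constants $c_i$ are appropriately chosen. Put $c = c_1 c_2 c_3$. We find the value of $c$ by calculating the norm of $\tilde{p} = [\operatorname{var}(T^{[1]})]^{1/2} (A_1^{\mathsf T})^{-1} \Sigma_1^{-1/2} p$. In the next calculation, I omit the subscript and superscript 1 in $T$, $A$ and $\Sigma$. Now,
$$\begin{aligned} \|\tilde{p}\|^2 &= p^{\mathsf T} \Sigma^{-1/2} A^{-1} (A \Sigma A^{\mathsf T})^{1/2} (A \Sigma A^{\mathsf T})^{1/2} (A^{\mathsf T})^{-1} \Sigma^{-1/2} p \\ &= p^{\mathsf T} \Sigma^{-1/2} A^{-1} (A \Sigma A^{\mathsf T}) (A^{\mathsf T})^{-1} \Sigma^{-1/2} p \\ &= p^{\mathsf T} \Sigma^{-1/2} \Sigma \Sigma^{-1/2} p = \|p\|^2 = 1 \end{aligned}$$
follows from the definition of $\operatorname{var}(T)$ and the fact that $(A \Sigma A^{\mathsf T})^{1/2} (A \Sigma A^{\mathsf T})^{1/2} = A \Sigma A^{\mathsf T}$. The calculations show that $c = \pm 1$, thus giving the desired result.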
For the eigenvectors $q_T$ and $q$, we base the calculations on $R^{[C,2]} = C^{\mathsf T} C$ and recall that the eigenvectors of $C^{\mathsf T} C$ are the right eigenvectors of $C$. This establishes the relationship between $q_T$ and $q$. The results for canonical transforms follow from the preceding calculations and the definition of the canonical transforms in (3.8).
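As a worked step, the relation for the canonical transforms can be written out explicitly. The derivation below assumes that (3.8) defines the canonical transforms as $\varphi_k = \Sigma_1^{-1/2} p_k$ and $\psi_\ell = \Sigma_2^{-1/2} q_\ell$; this definition is not restated in the present section, so it should be read as an assumption. Equalities hold up to the sign of the eigenvector.

```latex
\begin{align*}
\varphi_{T,k}
  &= [\operatorname{var}(T^{[1]})]^{-1/2}\, p_{T,k} \\
  &= (A_1 \Sigma_1 A_1^{\mathsf T})^{-1/2}\,
     (A_1 \Sigma_1 A_1^{\mathsf T})^{1/2}\,
     (A_1^{\mathsf T})^{-1} \Sigma_1^{-1/2} p_k \\
  &= (A_1^{\mathsf T})^{-1} \Sigma_1^{-1/2} p_k
   = (A_1^{\mathsf T})^{-1} \varphi_k .
\end{align*}
```

The same steps with the index 2 and $q_\ell$ in place of $p_k$ give $\psi_{T,\ell} = (A_2^{\mathsf T})^{-1} \psi_\ell$.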
Part 3 is a consequence of the definitions and the results established in parts 1 and 2. I now derive the results for $T^{[2]}$. Fix $k \le r$. I omit the indices $k$ for the eigenvector and the superscript 2 in $T^{[2]}$, $X^{[2]}$ and the matrices $A$ and $\Sigma$. From (3.6), we find that
$$V_T = q_T^{\mathsf T} [\operatorname{var}(T)]^{-1/2} (T - \mathrm{E}T). \tag{3.27}$$
We substitute the expressions for the mean and covariance matrix, established in Theorem 3.11, and the expression for $q_T$ from part 2 of the current theorem, into (3.27). It follows that
$$\begin{aligned} V_T &= \left[ q^{\mathsf T} \Sigma^{-1/2} A^{-1} (A \Sigma A^{\mathsf T})^{1/2} \right] (A \Sigma A^{\mathsf T})^{-1/2} (A X + a - A \mu - a) \\ &= q^{\mathsf T} \Sigma^{-1/2} A^{-1} A (X - \mu) \\ &= q^{\mathsf T} \Sigma^{-1/2} (X - \mu) = V, \end{aligned}$$
where $V$ is the corresponding CC score of $X$. Of course, $V_T = -q^{\mathsf T} \Sigma^{-1/2} (X - \mu)$ is also a solution because eigenvectors are unique only up to a sign. The remainder follows from Theorem 3.6 because the CC scores of the raw and transformed vectors are the same.
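For data, the equality of the CC scores up to a sign can be illustrated with simulated observations. This is a minimal sketch, assuming NumPy and the convention that rows are observations (the transpose of a $d \times n$ data layout); the simulated data, the matrices `A1`, `A2`, the shifts `a1`, `a2` and the function names are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def cc_scores(X1, X2):
    """Sample singular values and CC scores; rows of X1 and X2 are observations."""
    d1 = X1.shape[1]
    S = np.cov(np.hstack([X1, X2]), rowvar=False)
    S1, S2, S12 = S[:d1, :d1], S[d1:, d1:], S[:d1, d1:]
    W1, W2 = inv_sqrt(S1), inv_sqrt(S2)
    P, sv, Qt = np.linalg.svd(W1 @ S12 @ W2)
    r = len(sv)
    U = (X1 - X1.mean(axis=0)) @ W1 @ P[:, :r]      # column k holds the kth U scores
    V = (X2 - X2.mean(axis=0)) @ W2 @ Qt.T[:, :r]   # column k holds the kth V scores
    return sv, U, V

rng = np.random.default_rng(3)
n, d1, d2 = 500, 3, 4

# Hypothetical correlated data: Gaussian noise mixed by a random matrix.
Z = rng.normal(size=(n, d1 + d2)) @ rng.normal(size=(d1 + d2, d1 + d2))
X1, X2 = Z[:, :d1], Z[:, d1:]

# Hypothetical non-singular transformations T = A X + a, applied row by row.
A1 = rng.normal(size=(d1, d1)) + 4 * np.eye(d1)
A2 = rng.normal(size=(d2, d2)) + 4 * np.eye(d2)
a1, a2 = rng.normal(size=d1), rng.normal(size=d2)
T1 = X1 @ A1.T + a1
T2 = X2 @ A2.T + a2

sv, U, V = cc_scores(X1, X2)
sv_T, U_T, V_T = cc_scores(T1, T2)

# Eigenvectors are unique only up to a sign, so align the signs before comparing.
signs = np.sign(np.sum(U * U_T, axis=0))
print(np.allclose(sv, sv_T))
print(np.allclose(U_T, U * signs, atol=1e-6), np.allclose(V_T, V * signs, atol=1e-6))
```

The same sign vector is used for the $U$ and $V$ scores because the left and right eigenvectors belonging to a positive singular value change sign together.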
In the proof of Theorem 3.12 we explicitly use the fact that the transformations $A_\rho$ are non-singular. If this assumption is violated, then the results may no longer hold. I illustrate Theorem 3.12 with an example.
Example 3.7 The income data are an extract from a survey in the San Francisco Bay Area based on more than 9,000 questionnaires. The aim of the survey is to derive a prediction of the annual household income from the other demographic attributes. The income data are also used in Hastie, Tibshirani, and Friedman (2001).
Some of the fourteen variables are not suitable for our purpose. We consider the nine variables listed in Table 3.5 and the first 1,000 records, excluding records with missing data. Some of these nine variables are categorical, but in the analysis I will not distinguish between the different types of variables. The purpose of this analysis is to illustrate the effect of transformations of the data, and we are not concerned here with interpretations or effect of individual variables. I have split the variables into two groups: $X^{[1]}$ are the personal attributes, other than income, and $X^{[2]}$ are the household attributes, with income as the first variable. The raw data are shown in the top panel of Figure 3.5, with the variables shown on the x-axis, starting with the variables of $X^{[1]}$, and followed by those of $X^{[2]}$ in the order they are listed in Table 3.5.
Table 3.5 Variables of the income data from Example 3.7
Personal X[1]          Household X[2]
Marital status         Income
Age                    No. in household
Level of education     No. under 18
Occupation             Householder status
                       Type of home
Figure 3.5 Income data from Example 3.7: (top) raw data; (bottom) transformed data.
It is not easy to understand or interpret the parallel coordinate plot of the raw data. The lack of clarity is a result of the way the data are coded: large values for income represent a large income, whereas the variable occupation has a 'one' for 'professional', and its largest positive integer refers to 'unemployed'; hence occupation is negatively correlated with income. A consequence is the criss-crossing of the lines in the top panel.
We transform the data in order to disentangle this crossing over. Put $a = 0$ and
$$A = \operatorname{diag}(2.0,\ 1.4,\ 1.6,\ -1.2,\ 1.1,\ 1.1,\ 1.1,\ -5.0,\ -2.5).$$
The transformation $X \to AX$ scales the variables and changes the sign of variables such as occupation. The transformed data are displayed in the bottom panel of Figure 3.5. Variables 4, 8, and 9 have smaller values than the others, a consequence of the particular transformation I have chosen.
The matrix of canonical correlations has singular values 0.7762, 0.4526, 0.3312, and 0.1082, and these coincide with the singular values of the transformed canonical correlation matrix. The entries of the first normalised canonical transforms for both raw and transformed data are given in Table 3.6. The variable age has the highest weight for both the raw and transformed data, followed by education. Occupation has the smallest weight and opposite signs for the raw and transformed data. The change in sign is a consequence of the negative entry in $A$ for occupation. Householder status has the highest weight among the $X^{[2]}$ variables and so is most correlated with the $X^{[1]}$ data. This is followed by the income variable. Again, we see that the signs of the weights change for negative entries of $A$, here for the variables householder status and type of home.
Table 3.6 First raw and transformed normalised canonical transforms from Example 3.7
X[1]              φ raw     φ trans    X[2]                  ψ raw     ψ trans
Marital status     0.4522    0.3461    Income               −0.1242   −0.4565
Age               −0.6862   −0.7502    No. in household      0.1035    0.3802
Education         −0.5441   −0.5205    No. under 18          0.0284    0.1045
Occupation         0.1690   −0.2155    Householder status    0.9864   −0.7974
                                       Type of home         −0.0105    0.0170
Figure 3.6 Contributions of CC scores along first canonical transforms from Example 3.7: (top row) raw data; (bottom row) transformed data. The $X^{[1]}$ plots are shown in the left panels and the $X^{[2]}$ plots on the right.
Figure 3.6 shows the information given in Table 3.6, and in particular highlights the change in sign of the weights of the canonical transforms. The figure shows the contributions of the first CC scores in the direction of the first canonical transforms, that is, parallel coordinate plots of $\varphi_1 U_{\bullet 1}$ for $X^{[1]}$ and $\psi_1 V_{\bullet 1}$ for $X^{[2]}$ with the variable numbers on the x-axis. The $X^{[1]}$ plots are displayed in the left panels and the corresponding $X^{[2]}$ plots in the right panels. The top row shows the raw data, and the bottom row shows the transformed data.
The plots show clearly where a change in sign occurs in the entries of the canonical transforms: the lines cross over. The sign change between variables 3 and 4 of $\varphi$ is apparent in the raw data but no longer exists in the transformed data. Similar sign changes exist for the $X^{[2]}$ plots. Further, because of the larger weights of the first two variables of the $X^{[2]}$ transformed data, these two variables have much more variability for the transformed data.
It is worth noting that the CC scores of the raw and transformed data agree because the matrices $S_\rho$ and $A$ are invertible. Hence, as stated in part 3 of Theorem 3.12, the CC scores are invariant under this transformation.
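The entries of Table 3.6 can be checked against the transformation of Example 3.7. For a diagonal matrix $A$, applying $(A^{\mathsf T})^{-1}$ to a canonical transform divides each entry by the corresponding diagonal entry of $A$; the result is then rescaled to norm 1 because the table lists normalised transforms. The sketch below assumes NumPy and that the first four diagonal entries of $A$ belong to $X^{[1]}$ and the last five to $X^{[2]}$, in line with the variable ordering described in the example; the function name is hypothetical.

```python
import numpy as np

# Diagonal of A from Example 3.7, split into the X[1] block (4 variables)
# and the X[2] block (5 variables); the split follows the variable ordering.
a1 = np.array([2.0, 1.4, 1.6, -1.2])
a2 = np.array([1.1, 1.1, 1.1, -5.0, -2.5])

# Raw and transformed normalised canonical transforms as listed in Table 3.6.
phi_raw = np.array([0.4522, -0.6862, -0.5441, 0.1690])
phi_trans = np.array([0.3461, -0.7502, -0.5205, -0.2155])
psi_raw = np.array([-0.1242, 0.1035, 0.0284, 0.9864, -0.0105])
psi_trans = np.array([-0.4565, 0.3802, 0.1045, -0.7974, 0.0170])

def transform_and_normalise(v, a):
    """Apply (A^T)^{-1} for diagonal A = diag(a), then rescale to unit norm."""
    w = v / a
    return w / np.linalg.norm(w)

phi_check = transform_and_normalise(phi_raw, a1)
psi_check = transform_and_normalise(psi_raw, a2)

print(np.round(phi_check, 4))
print(np.round(psi_check, 4))
# Agreement is expected only up to the four-decimal rounding of the table.
print(np.allclose(phi_check, phi_trans, atol=1e-3),
      np.allclose(psi_check, psi_trans, atol=1e-3))
```

The negative diagonal entries of $A$ flip the sign of the corresponding weights, which reproduces the sign changes for occupation, householder status and type of home.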
For the income data, we applied a transformation to the data, but in other cases the data may only be available in transformed form. Example 3.7 shows the differences between the analysis of the raw and transformed data. If the desired result is the strength of the correlation between combinations of variables, then the transformation is not required. If a more detailed analysis is appropriate, then the raw and transformed data allow different insights into the data. The correlation analysis only shows the strength of the relationship and not the sign, and the decrease rather than an increase of a particular variable could be important.
The transformation of Example 3.6 is based on a singular matrix, and as we have seen there, the CCs are not invariant under the transformation. In Example 3.7, $A$ is non-singular, and the CCs remain the same. Thus the simple univariate case does not carry across to the multivariate scenario in general, and care needs to be taken when working with transformed data.