Transforms with Non-Singular Matrices


3.5.2 Transforms with Non-Singular Matrices

The preceding section demonstrates that the CCs of the original data can differ from those of the transformed data for linear transformations with singular matrices. In this section we focus on non-singular matrices. Theorem 3.11 remains unchanged, but we are now able to explicitly compare the CCs of the original and the transformed vectors. The key properties of the transformed CCs are presented in Theorem 3.12. The results are useful in their own right but also show some interesting features of CCs, the associated eigenvectors and canonical transforms. Because Theorem 3.12 is of particular interest for data, I summarise the data results and point out relevant changes from the population case.

Theorem 3.12 For $\rho = 1, 2$, let $X^{[\rho]} \sim (\mu_\rho, \Sigma_\rho)$, and assume that the $\Sigma_\rho$ are non-singular with rank $d_\rho$. Let $C$ be the canonical correlation matrix of $X^{[1]}$ and $X^{[2]}$, and let $r$ be the rank of $C$ and $P \Upsilon Q^T$ its singular value decomposition. Let $A_\rho$ be non-singular matrices of size $d_\rho \times d_\rho$, and let $a_\rho$ be $d_\rho$-dimensional vectors. Put

$$T^{[\rho]} = A_\rho X^{[\rho]} + a_\rho.$$

Let $C_T$ be the canonical correlation matrix of $T^{[1]}$ and $T^{[2]}$, and write $C_T = P_T \Upsilon_T Q_T^T$ for its singular value decomposition. The following hold:

1. $C_T$ and $C$ have the same singular values, and hence $\Upsilon_T = \Upsilon$.

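2. For $k, \ell \le r$, the $k$th left and the $\ell$th right eigenvectors $p_{T,k}$ and $q_{T,\ell}$ of $C_T$ and the corresponding canonical transforms $\phi_{T,k}$ and $\psi_{T,\ell}$ of the $T^{[\rho]}$ are related to the analogous quantities of the $X^{[\rho]}$ by

$$p_{T,k} = (A_1 \Sigma_1 A_1^T)^{1/2} (A_1^T)^{-1} \Sigma_1^{-1/2} p_k \quad\text{and}\quad q_{T,\ell} = (A_2 \Sigma_2 A_2^T)^{1/2} (A_2^T)^{-1} \Sigma_2^{-1/2} q_\ell,$$

$$\phi_{T,k} = (A_1^T)^{-1} \Sigma_1^{-1/2} p_k = (A_1^T)^{-1} \phi_k \quad\text{and}\quad \psi_{T,\ell} = (A_2^T)^{-1} \Sigma_2^{-1/2} q_\ell = (A_2^T)^{-1} \psi_\ell.$$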

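3. The $k$th and $\ell$th canonical correlation scores of $T$ are

$$U_{T,k} = p_k^T \Sigma_1^{-1/2} (X^{[1]} - \mu_1) \quad\text{and}\quad V_{T,\ell} = q_\ell^T \Sigma_2^{-1/2} (X^{[2]} - \mu_2),$$

and their covariance matrix is

$$\operatorname{var}\begin{pmatrix} U_T^{(k)} \\ V_T^{(\ell)} \end{pmatrix} = \begin{pmatrix} I_{k \times k} & \Upsilon_{k \times \ell} \\ \Upsilon_{k \times \ell}^T & I_{\ell \times \ell} \end{pmatrix}.$$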

The theorem states that the strength of the correlation is the same for the original and transformed data. The weights which combine the raw or transformed data may, however, differ. Thus the theorem establishes the invariance of canonical correlations under non-singular linear transformations, and it shows this invariance by comparing the singular values and CC scores of the original and transformed data. We find that

• the singular values of the canonical correlation matrices of the random vectors and the transformed vectors are the same,

• the canonical correlation scores of the random vectors and the transformed random vectors are identical (up to a sign), that is,

$$U_{T,k} = U_k \quad\text{and}\quad V_{T,\ell} = V_\ell, \quad\text{for } k, \ell = 1, \ldots, r, \text{ and}$$

• consequently, the covariance matrix of the CC scores remains the same, namely, $\operatorname{cov}(U_{T,k}, V_{T,\ell}) = \operatorname{cov}(U_k, V_\ell) = \upsilon_k \delta_{k\ell}$.
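The invariance of the singular values can be checked numerically. The following is a minimal sketch in Python with NumPy and is not part of the original text: the covariance matrix `Sigma`, the matrices `A1` and `A2`, and the helper names `sym_power` and `cc_matrix` are hypothetical choices made for illustration. The shifts a_ρ do not appear because they do not affect covariances.

```python
import numpy as np

def sym_power(S, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**power) @ vecs.T

def cc_matrix(S1, S12, S2):
    """Canonical correlation matrix C = S1^{-1/2} S12 S2^{-1/2}."""
    return sym_power(S1, -0.5) @ S12 @ sym_power(S2, -0.5)

rng = np.random.default_rng(0)
d1, d2 = 3, 4

# Illustrative joint covariance of (X1, X2): B B^T + I is positive definite.
B = rng.standard_normal((d1 + d2, d1 + d2))
Sigma = B @ B.T + np.eye(d1 + d2)
S1, S12, S2 = Sigma[:d1, :d1], Sigma[:d1, d1:], Sigma[d1:, d1:]

# Triangular matrices with non-zero diagonal entries are non-singular.
A1 = np.diag([2.0, -1.0, 0.5]) + np.tril(np.ones((d1, d1)), k=-1)
A2 = np.diag([1.5, -2.0, 0.5, 3.0]) + np.triu(np.ones((d2, d2)), k=1)

C = cc_matrix(S1, S12, S2)
# var(T1) = A1 S1 A1^T, cov(T1, T2) = A1 S12 A2^T, var(T2) = A2 S2 A2^T
C_T = cc_matrix(A1 @ S1 @ A1.T, A1 @ S12 @ A2.T, A2 @ S2 @ A2.T)

ups = np.linalg.svd(C, compute_uv=False)
ups_T = np.linalg.svd(C_T, compute_uv=False)
print(np.allclose(ups, ups_T))
```

The matrices C and C_T themselves generally differ; only their singular values are compared here.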

Before we look at a proof of Theorem 3.12, we consider what changes occur when we deal with transformed data

$$T^{[\rho]} = A_\rho X^{[\rho]} + a_\rho \quad\text{for } \rho = 1, 2.$$

Going from the population to the sample, we replace the true parameters by their estimators. So the means are replaced by their sample means, the covariance matrices $\Sigma$ by the sample covariance matrices $S$ and the canonical correlation matrix $C_T$ by its sample version $\widehat{C}_T$. The most noticeable difference is the change from the pairs of scalar canonical correlation scores to pairs of vectors of length $n$ when we consider data.
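For data, the same invariance can be illustrated with simulated observations. The sketch below is an illustration under stated assumptions, not the author's code: the data are simulated, rows are observations (so a transformation acts as `X @ A.T + a`), and `sym_power`, `cc_scores`, `A1`, `A2`, `a1` and `a2` are hypothetical names and values. Each column of `U` and `V` is a score vector of length n.

```python
import numpy as np

def sym_power(S, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**power) @ vecs.T

def cc_scores(X1, X2):
    """Sample singular values and CC scores; rows of X1 and X2 are observations."""
    n = X1.shape[0]
    X1c, X2c = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
    S12 = X1c.T @ X2c / (n - 1)
    W1, W2 = sym_power(S1, -0.5), sym_power(S2, -0.5)
    P, ups, Qt = np.linalg.svd(W1 @ S12 @ W2, full_matrices=False)
    U = X1c @ W1 @ P      # column k holds the n values of the kth score
    V = X2c @ W2 @ Qt.T
    return ups, U, V

rng = np.random.default_rng(1)
n = 500
X1 = rng.standard_normal((n, 3))
X2 = rng.standard_normal((n, 4))
X2[:, :3] += X1 * np.array([1.0, 0.5, 0.2])   # induce well-separated correlations

# Non-singular (triangular) matrices and shifts, chosen only for illustration.
A1 = np.diag([2.0, -1.0, 0.5]) + np.tril(np.ones((3, 3)), k=-1)
A2 = np.diag([1.5, -2.0, 0.5, 3.0]) + np.triu(np.ones((4, 4)), k=1)
a1, a2 = np.array([1.0, -2.0, 0.0]), np.zeros(4)
T1, T2 = X1 @ A1.T + a1, X2 @ A2.T + a2

ups, U, V = cc_scores(X1, X2)
ups_T, U_T, V_T = cc_scores(T1, T2)

# Singular vectors are unique only up to sign, so align signs column by column.
sU = np.sign(np.sum(U * U_T, axis=0))
sV = np.sign(np.sum(V * V_T, axis=0))
print(np.allclose(ups, ups_T), np.allclose(U, U_T * sU), np.allclose(V, V_T * sV))
```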

I present the proof of Theorem 3.12, because it reveals important facts and relationships. To make the proof more transparent, I begin with some notation and then prove two lemmas.

For $X^{[\rho]}$, $T^{[\rho]}$, $C$ and $C_T$ as in Theorem 3.12, put

$$R^{[C]} = C C^T \quad\text{and}\quad K = [\operatorname{var}(X^{[1]})]^{-1/2} R^{[C]} [\operatorname{var}(X^{[1]})]^{1/2};$$

$$R_T^{[C]} = C_T C_T^T \quad\text{and}\quad K_T = [\operatorname{var}(T^{[1]})]^{-1/2} R_T^{[C]} [\operatorname{var}(T^{[1]})]^{1/2}. \qquad (3.22)$$

A comparison with (3.4) shows that I have omitted the second superscript ‘1’ in $R^{[C]}$. In the current proof we refer to $C C^T$ and so only make the distinction when necessary. The sequence

$$C_T \longleftrightarrow R_T^{[C]} \longleftrightarrow K_T \longleftrightarrow K \longleftrightarrow R^{[C]} \longleftrightarrow C \qquad (3.23)$$

will be useful in the proof of Theorem 3.12 because the theorem makes statements about the endpoints $C_T$ and $C$ in the sequence. As we shall see in the proofs, relationships about $K_T$ and $K$ are the starting points because we can show that they are similar matrices.

Lemma 1 Assume that the $X^{[\rho]}$ satisfy the assumptions of Theorem 3.12. Let $\upsilon_1 > \upsilon_2 > \cdots > \upsilon_r$ be the singular values of $C$. The following hold.

1. The matrices $R^{[C]}$, $K$, $R_T^{[C]}$ and $K_T$ as in (3.22) have the same eigenvalues $\upsilon_1^2 > \upsilon_2^2 > \cdots > \upsilon_r^2$.

2. The singular values of $C_T$ coincide with those of $C$.

Proof To prove the statements about the eigenvalues and singular values, we will make repeated use of the fact that similar matrices have the same eigenvalues; see Result 1.8 of Section 1.5.1. So our proof needs to establish similarity relationships between matrices.

The aim is to relate the singular values of $C_T$ and $C$. As it is not easy to do this directly, we travel along the path in (3.23) and exhibit relationships between the neighbours in (3.23).

By Proposition 3.1, $R^{[C]}$ has positive eigenvalues, and the singular values of $C$ are the positive square roots of the eigenvalues of $R^{[C]}$. A similar relationship holds for $R_T^{[C]}$ and $C_T$. These relationships deal with the two ends of the sequence (3.23).

The definition of $K$ implies that it is similar to $R^{[C]}$, so $K$ and $R^{[C]}$ have the same eigenvalues. An analogous result holds for $K_T$ and $R_T^{[C]}$. It remains to establish the similarity of $K$ and $K_T$. This last similarity will establish that the singular values of $C$ and $C_T$ are identical.

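We begin with $K$. We substitute the expression for $R^{[C]}$ and re-write $K$ as follows:

$$\begin{aligned}
K &= \Sigma_1^{-1/2} R^{[C]} \Sigma_1^{1/2} \\
&= \Sigma_1^{-1/2} \, \Sigma_1^{-1/2} \Sigma_{12} \Sigma_2^{-1/2} \, \Sigma_2^{-1/2} \Sigma_{12}^T \Sigma_1^{-1/2} \, \Sigma_1^{1/2} \\
&= \Sigma_1^{-1} \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^T. \qquad (3.24)
\end{aligned}$$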

A similar expression holds for $K_T$. It remains to show that $K$ and $K_T$ are similar. To do this, we go back to the definition of $K_T$ and use the fact that $A_\rho$ and $\operatorname{var}(T)$ are invertible. Now

$$\begin{aligned}
K_T &= [\operatorname{var}(T^{[1]})]^{-1} \operatorname{cov}(T^{[1]}, T^{[2]}) [\operatorname{var}(T^{[2]})]^{-1} \operatorname{cov}(T^{[1]}, T^{[2]})^T \\
&= (A_1 \Sigma_1 A_1^T)^{-1} (A_1 \Sigma_{12} A_2^T) (A_2 \Sigma_2 A_2^T)^{-1} (A_2 \Sigma_{12}^T A_1^T) \\
&= (A_1^T)^{-1} \Sigma_1^{-1} A_1^{-1} A_1 \Sigma_{12} A_2^T (A_2^T)^{-1} \Sigma_2^{-1} A_2^{-1} A_2 \Sigma_{12}^T A_1^T \\
&= (A_1^T)^{-1} \Sigma_1^{-1} \Sigma_{12} \Sigma_2^{-1} \Sigma_{12}^T A_1^T \\
&= (A_1^T)^{-1} K A_1^T. \qquad (3.25)
\end{aligned}$$

The second equality in (3.25) uses the variance results of part 2 of Theorem 3.11. To show the last equality, use (3.24). The sequence of equalities establishes the similarity of the two matrices.

So far we have shown that the four terms in the middle of the sequence (3.23) are similar matrices, so have the same eigenvalues. This proves part 1 of the lemma. Because the singular values of $C_T$ are the square roots of the eigenvalues of $R_T^{[C]}$, $C_T$ and $C$ have the same singular values.
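The similarity of K and K_T, and the resulting equality of eigenvalues, can be verified on a small numerical instance. This is a sketch under assumptions: the covariance blocks `S1`, `S12`, `S2` and the matrices `A1`, `A2` are arbitrary illustrative choices, and K and K_T are computed from their product forms, Σ₁⁻¹Σ₁₂Σ₂⁻¹Σ₁₂ᵀ and its analogue for T.

```python
import numpy as np

def sym_power(S, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**power) @ vecs.T

rng = np.random.default_rng(2)
d1, d2 = 3, 4
B = rng.standard_normal((d1 + d2, d1 + d2))
Sigma = B @ B.T + np.eye(d1 + d2)          # illustrative positive definite covariance
S1, S12, S2 = Sigma[:d1, :d1], Sigma[:d1, d1:], Sigma[d1:, d1:]

# Triangular with non-zero diagonal, hence non-singular (illustrative values).
A1 = np.diag([2.0, -1.0, 0.5]) + np.tril(np.ones((d1, d1)), k=-1)
A2 = np.diag([1.5, -2.0, 0.5, 3.0]) + np.triu(np.ones((d2, d2)), k=1)

inv = np.linalg.inv
K = inv(S1) @ S12 @ inv(S2) @ S12.T                       # product form of K

V1, V12, V2 = A1 @ S1 @ A1.T, A1 @ S12 @ A2.T, A2 @ S2 @ A2.T
K_T = inv(V1) @ V12 @ inv(V2) @ V12.T                     # same form for T

# K_T = (A1^T)^{-1} K A1^T, so K and K_T are similar matrices.
is_similar = np.allclose(K_T, inv(A1.T) @ K @ A1.T)

# Eigenvalues of K and K_T equal the squared singular values of C.
C = sym_power(S1, -0.5) @ S12 @ sym_power(S2, -0.5)
ups_sq = np.sort(np.linalg.svd(C, compute_uv=False) ** 2)
eig_K = np.sort(np.linalg.eigvals(K).real)
eig_K_T = np.sort(np.linalg.eigvals(K_T).real)
print(is_similar, np.allclose(eig_K, ups_sq), np.allclose(eig_K_T, ups_sq))
```

K and K_T are not symmetric in general, which is why `np.linalg.eigvals` is used and only the real parts are kept.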

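Lemma 2 Assume that the $X^{[\rho]}$ satisfy the assumptions of Theorem 3.12. Let $\upsilon > 0$ be a singular value of $C$ with corresponding left eigenvector $p$. Define $R^{[C]}$, $K$ and $K_T$ as in (3.22).

1. If $r$ is the eigenvector of $R^{[C]}$ corresponding to the eigenvalue $\upsilon^2$, then $r = p$.

2. If $s$ is the eigenvector of $K$ corresponding to $\upsilon^2$, then

$$s = \frac{\Sigma_1^{-1/2} p}{\| \Sigma_1^{-1/2} p \|} \quad\text{and}\quad p = \frac{\Sigma_1^{1/2} s}{\| \Sigma_1^{1/2} s \|}.$$

3. If $s_T$ is the eigenvector of $K_T$ corresponding to $\upsilon^2$, then

$$s_T = \frac{(A_1^T)^{-1} s}{\| (A_1^T)^{-1} s \|} \quad\text{and}\quad s = \frac{A_1^T s_T}{\| A_1^T s_T \|}.$$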

Proof Part 1 follows directly from Proposition 3.1 because the left eigenvectors of $C$ are the eigenvectors of $R^{[C]}$. To show part 2, we establish relationships between appropriate eigenvectors of objects in the sequence (3.23).

We first exhibit relationships between eigenvectors of similar matrices. For this purpose, let $B$ and $D$ be similar matrices which satisfy $B = E D E^{-1}$ for some matrix $E$. Let $\lambda$ be an eigenvalue of $D$ and hence also of $B$. Let $e$ be the eigenvector of $B$ which corresponds to $\lambda$. We have

$$B e = \lambda e = E D E^{-1} e.$$

Pre-multiplying by the matrix $E^{-1}$ leads to

$$E^{-1} B e = \lambda E^{-1} e = D E^{-1} e.$$

Let $\eta$ be the eigenvector of $D$ which corresponds to $\lambda$. The uniqueness of the eigenvalue-eigenvector decomposition implies that $E^{-1} e$ is a scalar multiple of the eigenvector $\eta$ of $D$. This last fact leads to the relationships

$$\eta = \frac{1}{c_1} E^{-1} e \quad\text{or equivalently,}\quad e = c_1 E \eta \quad\text{for some real } c_1, \qquad (3.26)$$

and $E$ therefore is the link between the eigenvectors. Unless $E$ is an isometry, $c_1$ is required because eigenvectors in this book are vectors of norm 1.
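The link (3.26) between eigenvectors of similar matrices is easy to illustrate with a small example. In the sketch below, `D` and `E` are hypothetical matrices chosen for illustration: `D` is symmetric tridiagonal with non-zero off-diagonal entries, so its eigenvalues are distinct, and `E` is triangular with non-zero diagonal, so it is invertible.

```python
import numpy as np

D = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
E = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, -1.0, 3.0]])
B = E @ D @ np.linalg.inv(E)          # B and D are similar

lam, etas = np.linalg.eigh(D)         # unit-norm eigenvectors of D as columns
eta, lam_max = etas[:, -1], lam[-1]   # pair for the largest eigenvalue

# e = c1 * E eta, with c1 chosen so that e has norm 1.
c1 = 1.0 / np.linalg.norm(E @ eta)
e = c1 * (E @ eta)

print(np.allclose(B @ e, lam_max * e), np.isclose(np.linalg.norm(e), 1.0))
```

Replacing `c1` by `-c1` gives an equally valid unit eigenvector, which is the sign ambiguity that reappears in the proof of Theorem 3.12.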

We return to the matrices $R^{[C]}$ and $K$. Fix $k \le r$, the rank of $C$. Let $\upsilon^2$ be the $k$th eigenvalue of $R^{[C]}$ and hence also of $K$, and consider the eigenvector $p$ of $R^{[C]}$ and $s$ of $K$ which correspond to $\upsilon^2$. Because $K = [\operatorname{var}(X^{[1]})]^{-1/2} R^{[C]} [\operatorname{var}(X^{[1]})]^{1/2}$, (3.26) implies that

$$p = c_2 [\operatorname{var}(X^{[1]})]^{1/2} s = c_2 \Sigma_1^{1/2} s,$$

for some real $c_2$. Now $p$ has unit norm, so $c_2^{-1} = \| \Sigma_1^{1/2} s \|$, and the result follows. A similar calculation leads to the results in part 3.

We return to Theorem 3.12 and prove it with the help of the two lemmas.

Proof of Theorem 3.12 Part 1 follows from Lemma 1. For part 2, we need to find relationships between the eigenvectors of $C$ and $C_T$. We obtain this relationship via the sequence (3.23) and with the help of Lemma 2. By part 1 of Lemma 2 it suffices to consider the sequence

$$R_T^{[C]} \longleftrightarrow K_T \longleftrightarrow K \longleftrightarrow R^{[C]}.$$

We start with the eigenvectors of $R_T^{[C]}$. Fix $k \le r$. Let $\upsilon^2$ be the $k$th eigenvalue of $R_T^{[C]}$ and hence also of $K_T$, $K$ and $R^{[C]}$. Let $p_T$ and $p$ be the corresponding eigenvectors of $R_T^{[C]}$ and $R^{[C]}$, and $s_T$ and $s$ those of $K_T$ and $K$, respectively. We start with the pair $(p_T, s_T)$. From the definitions (3.22), we obtain

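$$\begin{aligned}
p_T &= c_1 [\operatorname{var}(T^{[1]})]^{1/2} s_T \\
&= c_1 c_2 [\operatorname{var}(T^{[1]})]^{1/2} (A_1^T)^{-1} s \\
&= c_1 c_2 c_3 [\operatorname{var}(T^{[1]})]^{1/2} (A_1^T)^{-1} \Sigma_1^{-1/2} p
\end{aligned}$$

by parts 2 and 3 of Lemma 2, where the constants $c_i$ are appropriately chosen. Put $c = c_1 c_2 c_3$. We find the value of $c$ by calculating the norm of $\tilde{p} = [\operatorname{var}(T^{[1]})]^{1/2} (A_1^T)^{-1} \Sigma_1^{-1/2} p$. In the next calculation, I omit the subscript and superscript 1 in $T$, $A$ and $\Sigma$. Now,

$$\begin{aligned}
\| \tilde{p} \|^2 &= p^T \Sigma^{-1/2} A^{-1} (A \Sigma A^T)^{1/2} (A \Sigma A^T)^{1/2} (A^T)^{-1} \Sigma^{-1/2} p \\
&= p^T \Sigma^{-1/2} A^{-1} (A \Sigma A^T) (A^T)^{-1} \Sigma^{-1/2} p \\
&= p^T \Sigma^{-1/2} \Sigma \Sigma^{-1/2} p = \| p \|^2 = 1
\end{aligned}$$

follows from the definition of $\operatorname{var}(T)$ and the fact that $(A \Sigma A^T)^{1/2} (A \Sigma A^T)^{1/2} = A \Sigma A^T$. The calculations show that $c = \pm 1$, thus giving the desired result.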

For the eigenvectors $q_T$ and $q$, we base the calculations on $R^{[C,2]} = C^T C$ and recall that the eigenvectors of $C^T C$ are the right eigenvectors of $C$. This establishes the relationship between $q_T$ and $q$. The results for canonical transforms follow from the preceding calculations and the definition of the canonical transforms in (3.8).
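The eigenvector relationship just derived can be checked numerically. The sketch below makes several assumptions that are not in the text: the covariance blocks are synthetic, the canonical correlation matrix is built to have the distinct singular values 0.8, 0.5 and 0.2 so that eigenvectors are well defined up to sign, and the canonical transforms are taken to be Σ₁^{-1/2} p, which is how I read the definition referred to as (3.8). All variable names are hypothetical.

```python
import numpy as np

def sym_power(S, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * vals**power) @ vecs.T

rng = np.random.default_rng(3)
d1, d2 = 3, 4

def random_spd(d):
    M = rng.standard_normal((d, d))
    return M @ M.T + np.eye(d)

S1, S2 = random_spd(d1), random_spd(d2)

# Build C with known, distinct singular values, then Sigma_12 = S1^{1/2} C S2^{1/2}.
P0, _ = np.linalg.qr(rng.standard_normal((d1, d1)))
Q0, _ = np.linalg.qr(rng.standard_normal((d2, d2)))
C = P0 @ np.diag([0.8, 0.5, 0.2]) @ Q0[:, :d1].T
S12 = sym_power(S1, 0.5) @ C @ sym_power(S2, 0.5)

A1 = np.diag([2.0, -1.0, 0.5]) + np.tril(np.ones((d1, d1)), k=-1)
A2 = np.diag([1.5, -2.0, 0.5, 3.0]) + np.triu(np.ones((d2, d2)), k=1)
V1, V12, V2 = A1 @ S1 @ A1.T, A1 @ S12 @ A2.T, A2 @ S2 @ A2.T
C_T = sym_power(V1, -0.5) @ V12 @ sym_power(V2, -0.5)

P, _, _ = np.linalg.svd(C)        # columns: left eigenvectors p_k of C
P_T, _, _ = np.linalg.svd(C_T)    # columns: left eigenvectors p_{T,k} of C_T

# Predicted p_{T,k} = var(T1)^{1/2} (A1^T)^{-1} Sigma_1^{-1/2} p_k, up to sign.
P_pred = sym_power(V1, 0.5) @ np.linalg.inv(A1.T) @ sym_power(S1, -0.5) @ P
signs = np.sign(np.sum(P_pred * P_T, axis=0))

# Canonical transforms: Sigma_1^{-1/2} p_k and var(T1)^{-1/2} p_{T,k}.
Phi = sym_power(S1, -0.5) @ P
Phi_T = sym_power(V1, -0.5) @ P_T
Phi_pred = np.linalg.inv(A1.T) @ Phi

print(np.allclose(P_pred * signs, P_T), np.allclose(Phi_pred * signs, Phi_T))
```

The predicted eigenvectors already have unit norm, in line with the calculation showing c = ±1.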

Part 3 is a consequence of the definitions and the results established in parts 1 and 2. I now derive the results for $T^{[2]}$. Fix $k \le r$. I omit the indices $k$ for the eigenvector and the superscript 2 in $T^{[2]}$, $X^{[2]}$ and the matrices $A$ and $\Sigma$. From (3.6), we find that

$$V_T = q_T^T [\operatorname{var}(T)]^{-1/2} (T - \mathrm{E}\,T). \qquad (3.27)$$

We substitute the expressions for the mean and covariance matrix, established in Theorem 3.11, and the expression for $q_T$ from part 2 of the current theorem, into (3.27). It follows that

$$\begin{aligned}
V_T &= q^T \Sigma^{-1/2} A^{-1} (A \Sigma A^T)^{1/2} (A \Sigma A^T)^{-1/2} (A X + a - A \mu - a) \\
&= q^T \Sigma^{-1/2} A^{-1} A (X - \mu) \\
&= q^T \Sigma^{-1/2} (X - \mu) = V,
\end{aligned}$$

where $V$ is the corresponding CC score of $X$. Of course, $V_T = -q^T \Sigma^{-1/2} (X - \mu)$ is also a solution because eigenvectors are unique only up to a sign. The remainder follows from Theorem 3.6 because the CC scores of the raw and transformed vectors are the same.

In the proof of Theorem 3.12 we explicitly use the fact that the transformations $A_\rho$ are non-singular. If this assumption is violated, then the results may no longer hold. I illustrate Theorem 3.12 with an example.

Example 3.7 The income data are an extract from a survey in the San Francisco Bay Area based on more than 9,000 questionnaires. The aim of the survey is to derive a prediction of the annual household income from the other demographic attributes. The income data are also used in Hastie, Tibshirani, and Friedman (2001).

Some of the fourteen variables are not suitable for our purpose. We consider the nine variables listed in Table 3.5 and the first 1,000 records, excluding records with missing data. Some of these nine variables are categorical, but in the analysis I will not distinguish between the different types of variables. The purpose of this analysis is to illustrate the effect of transformations of the data, and we are not concerned here with interpretations or effect of individual variables. I have split the variables into two groups: X^[1] are the personal attributes, other than income, and X^[2] are the household attributes, with income as the first variable. The raw data are shown in the top panel of Figure 3.5, with the variables shown on the x-axis, starting with the variables of X^[1], and followed by those of X^[2] in the order they are listed in Table 3.5.

Table 3.5 Variables of the income data from Example 3.7

Personal X^[1]        Household X^[2]
Marital status        Income
Age                   No. in household
Level of education    No. under 18
Occupation            Householder status
                      Type of home

Figure 3.5 Income data from Example 3.7: (top): raw data; (bottom): transformed data.

It is not easy to understand or interpret the parallel coordinate plot of the raw data. The lack of clarity is a result of the way the data are coded: large values for income represent a large income, whereas the variable occupation has a ‘one’ for ‘professional’, and its largest positive integer refers to ‘unemployed’; hence occupation is negatively correlated with income. A consequence is the criss-crossing of the lines in the top panel.

We transform the data in order to disentangle this crossing over. Put $a = 0$ and

$$A = \operatorname{diag}(2.0,\ 1.4,\ 1.6,\ -1.2,\ 1.1,\ 1.1,\ 1.1,\ -5.0,\ -2.5).$$

The transformation $X \to AX$ scales the variables and changes the sign of variables such as occupation. The transformed data are displayed in the bottom panel of Figure 3.5. Variables 4, 8, and 9 have smaller values than the others, a consequence of the particular transformation I have chosen.

The matrix of canonical correlations has singular values 0.7762, 0.4526, 0.3312, and 0.1082, and these coincide with the singular values of the transformed canonical correlation matrix. The entries of the first normalised canonical transforms for both raw and transformed data are given in Table 3.6. The variable age has the highest weight for both the raw and transformed data, followed by education. Occupation has the smallest weight and opposite signs for the raw and transformed data. The change in sign is a consequence of the negative entry in A for occupation. Householder status has the highest weight among the X^[2] variables and so is most correlated with the X^[1] data. This is followed by the income variable.

Table 3.6 First raw and transformed normalised canonical transforms from Example 3.7

X^[1]             ϕ raw      ϕ trans     X^[2]                ψ raw      ψ trans
Marital status    0.4522     0.3461      Income               −0.1242    −0.4565
Age               −0.6862    −0.7502     No. in household     0.1035     0.3802
Education         −0.5441    −0.5205     No. under 18         0.0284     0.1045
Occupation        0.1690     −0.2155     Householder status   0.9864     −0.7974
                                         Type of home         −0.0105    0.0170

Figure 3.6 Contributions of CC scores along first canonical transforms from Example 3.7: (top row) raw data; (bottom row) transformed data. The X^[1] plots are shown in the left panels and the X^[2] plots on the right.

Again, we see that the signs of the weights change for negative entries of A, here for the variables householder status and type of home.
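The transformed columns of Table 3.6 can be recovered from the raw columns and the diagonal entries of A. The sketch below assumes that the canonical transform of the transformed data is proportional to (Aᵀ)⁻¹ times the raw canonical transform, which for a diagonal A is an entrywise division, and that the tabulated transforms are normalised to unit Euclidean norm. The function name `transformed` is hypothetical, and the input values are copied from Table 3.6 and from the definition of A.

```python
import numpy as np

# Raw normalised canonical transforms from Table 3.6.
phi_raw = np.array([0.4522, -0.6862, -0.5441, 0.1690])
psi_raw = np.array([-0.1242, 0.1035, 0.0284, 0.9864, -0.0105])

# Diagonal entries of A for the X^[1] and X^[2] variables.
a1 = np.array([2.0, 1.4, 1.6, -1.2])
a2 = np.array([1.1, 1.1, 1.1, -5.0, -2.5])

def transformed(weights, diag_a):
    """Apply (A^T)^{-1} for diagonal A, then rescale to unit norm."""
    v = weights / diag_a
    return v / np.linalg.norm(v)

phi_trans = transformed(phi_raw, a1)
psi_trans = transformed(psi_raw, a2)
print(np.round(phi_trans, 3))
print(np.round(psi_trans, 3))
```

The results agree with the tabulated transformed columns to about three decimal places; small discrepancies in the fourth decimal are to be expected because the raw entries are themselves rounded. The sign flips for occupation, householder status and type of home come directly from the negative entries of A.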

Figure 3.6 shows the information given in Table 3.6, and in particular highlights the change in sign of the weights of the canonical transforms. The figure shows the contributions of the first CC scores in the direction of the first canonical transforms, that is, parallel coordinate plots of $\phi_1 U_{\bullet 1}$ for X^[1] and $\psi_1 V_{\bullet 1}$ for X^[2] with the variable numbers on the x-axis. The X^[1] plots are displayed in the left panels and the corresponding X^[2] plots in the right panels. The top row shows the raw data, and the bottom row shows the transformed data.

The plots show clearly where a change in sign occurs in the entries of the canonical transforms: the lines cross over. The sign change between variables 3 and 4 of ϕ is apparent in the raw data but no longer exists in the transformed data. Similar sign changes exist for the X^[2] plots. Further, because of the larger weights of the first two variables of the X^[2] transformed data, these two variables have much more variability for the transformed data.

It is worth noting that the CC scores of the raw and transformed data agree because the matrices S_ρ and A are invertible. Hence, as stated in part 3 of Theorem 3.12, the CC scores are invariant under this transformation.

For the income data, we applied a transformation to the data, but in other cases the data may only be available in transformed form. Example 3.7 shows the differences between the analysis of the raw and transformed data. If the desired result is the strength of the correlation between combinations of variables, then the transformation is not required. If a more detailed analysis is appropriate, then the raw and transformed data allow different insights into the data. The correlation analysis only shows the strength of the relationship and not the sign, and the decrease rather than an increase of a particular variable could be important.

The transformation of Example 3.6 is based on a singular matrix, and as we have seen there, the CCs are not invariant under the transformation. In Example 3.7, A is non-singular, and the CCs remain the same. Thus the simple univariate case does not carry across to the multivariate scenario in general, and care needs to be taken when working with transformed data.

In document Koch I. Analysis of Multivariate and High-Dimensional Data 2013 (Page 118-126)