Canonical Correlation Analysis
The eigenvectors $p_k$ and $q_k$ satisfy
\[ C q_k = \upsilon_k p_k \quad\text{and}\quad C^T p_k = \upsilon_k q_k, \tag{3.5} \]
and because of this relationship, we call them the left and right eigenvectors of $C$. See Definition 1.12 in Section 1.5.3.
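The relationship (3.5) can be checked numerically. The following is a minimal sketch, assuming NumPy and a small hypothetical matrix standing in for $C$; the names `C`, `P`, `Q` and `upsilon` are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3 x 2 matrix standing in for C (not taken from any data set).
C = rng.uniform(-0.5, 0.5, size=(3, 2))

# Thin singular value decomposition C = P diag(upsilon) Q^T.
P, upsilon, Qt = np.linalg.svd(C, full_matrices=False)
Q = Qt.T

for k in range(len(upsilon)):
    p_k, q_k = P[:, k], Q[:, k]
    # Left and right eigenvector relations of (3.5).
    assert np.allclose(C @ q_k, upsilon[k] * p_k)
    assert np.allclose(C.T @ p_k, upsilon[k] * q_k)
```

The columns returned by `np.linalg.svd` are unit vectors, which matches the role of the $p_k$ and $q_k$ as directions.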
Throughout this chapter I use the submatrix notation (1.21) of Section 1.5.2 and so will write $Q_m$ for the $m \times r$ submatrix of $Q$, where $m \le r$, and similarly for other submatrices.
We are now equipped to define the canonical correlations.
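Definition 3.3 Let $X^{[1]} \sim (\mu_1, \Sigma_1)$ and $X^{[2]} \sim (\mu_2, \Sigma_2)$. For $\rho = 1, 2$, let $X_\Sigma^{[\rho]}$ be the sphered vector
\[ X_\Sigma^{[\rho]} = \Sigma_\rho^{-1/2} \left( X^{[\rho]} - \mu_\rho \right). \]
Let $\Sigma_{12}$ be the between covariance matrix of $X^{[1]}$ and $X^{[2]}$. Let $C$ be the matrix of canonical correlations of $X^{[1]}$ and $X^{[2]}$, and write $C = P \Upsilon Q^T$ for its singular value decomposition.
Consider $k = 1, \ldots, r$.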
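1. The $k$th pair of canonical correlation scores or canonical variates is
\[ U_k = p_k^T X_\Sigma^{[1]} \quad\text{and}\quad V_k = q_k^T X_\Sigma^{[2]}; \tag{3.6} \]
2. the $k$-dimensional pair of vectors of canonical correlations or vectors of canonical variates is
\[ U^{(k)} = (U_1, \ldots, U_k)^T \quad\text{and}\quad V^{(k)} = (V_1, \ldots, V_k)^T; \tag{3.7} \]
3. and the $k$th pair of canonical (correlation) transforms is
\[ \varphi_k = \Sigma_1^{-1/2} p_k \quad\text{and}\quad \psi_k = \Sigma_2^{-1/2} q_k. \tag{3.8} \]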
For brevity, I sometimes refer to the pair of canonical correlation scores or vectors as CC scores, or vectors of CC scores, or simply as canonical correlations.
I remind the reader that our definitions of the canonical correlation scores and vectors use the centred data, unlike other treatments of this topic, which use uncentred vectors.
It is worth noting that the canonical correlation transforms of (3.8) are, in general, not unit vectors because they are linear transforms of unit vectors, namely, the eigenvectors pk and qk.
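To illustrate the remark about unit vectors, here is a minimal sketch that computes the transforms of (3.8) from a hypothetical joint covariance matrix. It assumes NumPy, assumes that the population $C$ is formed as $\Sigma_1^{-1/2} \Sigma_{12} \Sigma_2^{-1/2}$ in analogy with the sample formula of Section 3.3, and uses the symmetric inverse square root; the helper `inv_sqrt` and all variable names are illustrative choices.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

rng = np.random.default_rng(1)
d1, d2 = 3, 2
A = rng.standard_normal((d1 + d2, 8))
Sigma = A @ A.T / 8                      # hypothetical joint covariance matrix
Sigma1, Sigma2 = Sigma[:d1, :d1], Sigma[d1:, d1:]
Sigma12 = Sigma[:d1, d1:]                # between covariance matrix

C = inv_sqrt(Sigma1) @ Sigma12 @ inv_sqrt(Sigma2)
P, upsilon, Qt = np.linalg.svd(C, full_matrices=False)
Q = Qt.T

Phi = inv_sqrt(Sigma1) @ P               # columns are the transforms phi_k
Psi = inv_sqrt(Sigma2) @ Q               # columns are the transforms psi_k

print("norms of p_k:  ", np.linalg.norm(P, axis=0))    # all equal to 1
print("norms of phi_k:", np.linalg.norm(Phi, axis=0))  # generally not 1

# The scores have unit variance, and the covariance of U_k and V_k is upsilon_k.
var_U = Phi.T @ Sigma1 @ Phi
cov_UV = Phi.T @ Sigma12 @ Psi
```

Although the $\varphi_k$ are not unit vectors, the scores they produce have unit variance, which is what `var_U` shows.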
The canonical correlation scores are also called the canonical (correlation) variables.
Mardia, Kent, and Bibby (1992) use the term canonical correlation vectors for the $\varphi_k$ and $\psi_k$ of (3.8). To distinguish the vectors of canonical correlations $U^{(k)}$ and $V^{(k)}$ of part 2 of the definition from the $\varphi_k$ and $\psi_k$, I prefer the term transforms for the $\varphi_k$ and $\psi_k$ because these vectors are transformations of the directions $p_k$ and $q_k$ and result in the pair of scores
\[ U_k = \varphi_k^T \left( X^{[1]} - \mu_1 \right) \quad\text{and}\quad V_k = \psi_k^T \left( X^{[2]} - \mu_2 \right) \quad\text{for } k = 1, \ldots, r. \tag{3.9} \]
At times, I refer to a specific pair of transforms, but typically we are interested in the first $p$ pairs with $p \le r$. We write
\[ \Phi = \begin{bmatrix} \varphi_1 & \varphi_2 & \cdots & \varphi_r \end{bmatrix} \quad\text{and}\quad \Psi = \begin{bmatrix} \psi_1 & \psi_2 & \cdots & \psi_r \end{bmatrix} \]
for the matrices of canonical correlation transforms. The entries of the vectors $\varphi_k$ and $\psi_k$ are the weights of the variables of $X^{[1]}$ and $X^{[2]}$ and so show which variables contribute strongly to correlation and which might be negligible. Some authors, including Mardia, Kent, and Bibby (1992), define the CC scores as in (3.9). Naively, one might think of the vectors $\varphi_k$ and $\psi_k$ as sphered versions of the eigenvectors $p_k$ and $q_k$, but this is incorrect; $\Sigma_1$ is the covariance matrix of $X^{[1]}$, and $p_k$ is the $k$th eigenvector of the non-random $C C^T$.
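The agreement between the two expressions for the scores can be written out in one line. The sketch below assumes that $\Sigma_1^{-1/2}$ denotes the symmetric inverse square root of the covariance matrix of $X^{[1]}$ and that $X_\Sigma^{[1]}$ denotes the sphered vector; the argument for $V_k$ is analogous.

```latex
\begin{align*}
U_k &= p_k^T X_\Sigma^{[1]}
     = p_k^T \Sigma_1^{-1/2} \bigl( X^{[1]} - \mu_1 \bigr) \\
    &= \bigl( \Sigma_1^{-1/2} p_k \bigr)^T \bigl( X^{[1]} - \mu_1 \bigr)
     = \varphi_k^T \bigl( X^{[1]} - \mu_1 \bigr),
\qquad
\lVert \varphi_k \rVert^2 = p_k^T \Sigma_1^{-1} p_k .
\end{align*}
```

The last identity shows why the transforms are in general not unit vectors: the norm equals 1 only when $p_k^T \Sigma_1^{-1} p_k = 1$.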
I prefer the definition (3.6) to (3.9) for reasons which are primarily concerned with the interpretation of the results, namely,
1. the vectors pk and qk are unit vectors, and their entries are therefore easy to interpret, and
2. the scores are given as linear combinations of uncorrelated random vectors, the sphered vectors $X_\Sigma^{[1]}$ and $X_\Sigma^{[2]}$. Uncorrelated variables are more amenable to an interpretation of the contribution of each variable to the correlation between the pairs $U_k$ and $V_k$.
Being eigenvectors, the $p_k$ and $q_k$ play a natural role as directions, and they are some of the key quantities when dealing with correlation for transformed random vectors in Section 3.5.2 as well as in the variable ranking based on the correlation matrix which I describe in Section 13.3.
The canonical correlation scores (3.6) and vectors (3.7) play a role similar to the PC scores and vectors in Principal Component Analysis, and the vectors $p_k$ and $q_k$ remind us of the eigenvector $\eta_k$. However, there is a difference: Principal Component Analysis is based on the raw or scaled data, whereas the vectors $p_k$ and $q_k$ relate to the sphered data. This difference is exhibited in (3.6) but is less apparent in (3.9). The explicit nature of (3.6) is one of the reasons why I prefer (3.6) as the definition of the scores.
Before we leave the population case, we compare the between covariance matrix $\Sigma_{12}$ and the canonical correlation matrix in a specific case.
Example 3.1 The car data is a subset of the 1983 ASA Data Exposition of Ramos and Donoho (1983). We use their five continuous variables: displacement, horsepower, weight, acceleration and miles per gallon (mpg). The first three variables correspond to physical properties of the cars, whereas the remaining two are performance-related. We combine the first three variables into one part and the remaining two into the second part.
The random vectors $X_i^{[1]}$ have the variables displacement, horsepower, and weight, and the random vectors $X_i^{[2]}$ have the variables acceleration and mpg. We consider the sample variance and covariance matrix of the $X^{[\rho]}$ in lieu of the respective population quantities. We obtain the $3 \times 2$ matrices
\[
\Sigma_{12} =
\begin{pmatrix}
-157.0 & -657.6 \\
-73.2 & -233.9 \\
-976.8 & -5517.4
\end{pmatrix}
\quad\text{and}\quad
C =
\begin{pmatrix}
-0.3598 & -0.1131 \\
-0.5992 & -0.0657 \\
-0.1036 & -0.8095
\end{pmatrix}.
\tag{3.10}
\]
Both matrices have negative entries, but the entries are very different: the between covariance matrix $\Sigma_{12}$ has entries of arbitrary size. In contrast, the entries of $C$ are correlation coefficients and are in the interval $[-1, 0]$ in this case and, more generally, in $[-1, 1]$. As a consequence, $C$ explicitly reports the strength of the relationship between the variables.
Weight and mpg are most strongly correlated with an entry of $-5{,}517.4$ in the covariance matrix, and $-0.8095$ in the correlation matrix. Although $-5{,}517.4$ is a large negative value, it does not lead to a natural interpretation of the strength of the relationship between the variables.
An inspection of $C$ shows that the strongest absolute correlation exists between the variables weight and mpg. In Section 3.4 we examine whether a combination of variables will lead to a stronger correlation.
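The contrast between the two matrices can be illustrated with synthetic data, since the car data are not reproduced here. The following is a minimal sketch, assuming NumPy, a divisor of $n-1$ in the sample covariances, and hypothetical Gaussian data; the function name `between_cov_and_C` is an illustrative choice. Rescaling the variables of the first part changes the between covariance matrix by the same factors, while the entries of $C$ stay within $[-1, 1]$ and its singular values do not change.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def between_cov_and_C(X1, X2):
    """Return (S12, C) for d1 x n and d2 x n data arrays (columns are observations)."""
    n = X1.shape[1]
    X1c = X1 - X1.mean(axis=1, keepdims=True)
    X2c = X2 - X2.mean(axis=1, keepdims=True)
    S1 = X1c @ X1c.T / (n - 1)
    S2 = X2c @ X2c.T / (n - 1)
    S12 = X1c @ X2c.T / (n - 1)
    return S12, inv_sqrt(S1) @ S12 @ inv_sqrt(S2)

rng = np.random.default_rng(2)
n = 200
mix = np.eye(5) + 0.3 * rng.standard_normal((5, 5))   # hypothetical mixing matrix
X = mix @ rng.standard_normal((5, n))                 # synthetic 5 x n data
X1, X2 = X[:3], X[3:]

S12, C = between_cov_and_C(X1, X2)

# Change the units of the three variables in the first part.
scale = np.array([[1.0], [10.0], [100.0]])
S12_scaled, C_scaled = between_cov_and_C(scale * X1, X2)

print("largest |entry| of S12 before and after rescaling:",
      np.abs(S12).max(), np.abs(S12_scaled).max())
print("largest |entry| of C before and after rescaling:  ",
      np.abs(C).max(), np.abs(C_scaled).max())
```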
3.3 Sample Canonical Correlations
In Example 3.1, I calculate the covariance matrix from data because we do not know the true population covariance structure. In this section, I define canonical correlation concepts for data. At the end of this section, we return to Example 3.1 and calculate the CC scores.
The sample definitions are similar to those of the preceding section, but because we are dealing with a sample and do not know the true means and covariances, there are important differences. Table 3.1 summarises the key quantities for both the population and the sample.
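We begin with some notation for the sample. For $\rho = 1, 2$, let
\[ \mathbb{X}^{[\rho]} = \begin{bmatrix} X_1^{[\rho]} & X_2^{[\rho]} & \cdots & X_n^{[\rho]} \end{bmatrix} \]
be $d_\rho \times n$ data which consist of $n$ independent $d_\rho$-dimensional random vectors $X_i^{[\rho]}$. The data $\mathbb{X}^{[1]}$ and $\mathbb{X}^{[2]}$ usually have a different number of variables, but measurements on the same $n$ objects are carried out for $\mathbb{X}^{[1]}$ and $\mathbb{X}^{[2]}$. This fact is essential for the type of comparison we want to make.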
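We assume that the $X_i^{[\rho]}$ have sample mean $\overline{X}_\rho$ and sample covariance matrix $S_\rho$. Sometimes it will be convenient to consider the combined data. We write
\[ \mathbb{X} = \begin{bmatrix} \mathbb{X}^{[1]} \\ \mathbb{X}^{[2]} \end{bmatrix} \sim \mathrm{Sam}(\overline{X}, S), \]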
Here $S_{12}$ is the $d_1 \times d_2$ (sample) between covariance matrix of $\mathbb{X}^{[1]}$ and $\mathbb{X}^{[2]}$ defined by $S_{12} = 1 \ldots$ between covariance matrix. Assume that $S_1$ and $S_2$ are non-singular. The matrix of sample canonical correlations or the sample canonical correlation matrix is
\[ C = S_1^{-1/2} S_{12} S_2^{-1/2}, \]
and the pair of matrices of sample multivariate coefficients of determination are
\[ R^{[C,1]} = C C^T \quad\text{and}\quad R^{[C,2]} = C^T C. \tag{3.12} \]
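The sample quantities can be computed directly from centred data. The following is a minimal sketch, assuming NumPy, hypothetical synthetic data with observations in columns, and a divisor of $n-1$ for the sample covariance matrices (the divisor is an assumption made here); the variable names are illustrative.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

rng = np.random.default_rng(3)
n, d1, d2 = 100, 3, 2
X1 = rng.standard_normal((d1, n))                     # hypothetical d1 x n data
X2 = 0.6 * X1[:d2] + rng.standard_normal((d2, n))     # d2 x n data related to X1

# Centre each part, then form the sample covariance matrices.
X1c = X1 - X1.mean(axis=1, keepdims=True)
X2c = X2 - X2.mean(axis=1, keepdims=True)
S1 = X1c @ X1c.T / (n - 1)
S2 = X2c @ X2c.T / (n - 1)
S12 = X1c @ X2c.T / (n - 1)                           # d1 x d2 between covariance

C = inv_sqrt(S1) @ S12 @ inv_sqrt(S2)                 # sample canonical correlation matrix
R1 = C @ C.T                                          # d1 x d1, as in (3.12)
R2 = C.T @ C                                          # d2 x d2, as in (3.12)

upsilon = np.linalg.svd(C, compute_uv=False)
print("singular values of C:", upsilon)
print("eigenvalues of R2:   ", np.linalg.eigvalsh(R2)[::-1])
```

The non-zero eigenvalues of both matrices in (3.12) are the squared singular values of $C$, so either matrix can be used to obtain them.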
As in (1.9) of Section 1.3.2, we use the subscript cent to refer to the centred data. With this notation, the $d_1 \times d_2$ matrix
and $C$ has the singular value decomposition
\[ C = P \Upsilon Q^T, \]
and $r$ is the rank of $S_{12}$ and hence also of $C$. In the population case I mentioned that we may want to replace a singular $\Sigma_1^{-1/2}$, with rank $r < d_1$, by its $\Gamma_r \Lambda_r^{-1/2} \Gamma_r^T$. We may want to make an analogous replacement in the sample case.
In the population case, we define the canonical correlation scores $U_k$ and $V_k$ of the vectors $X^{[1]}$ and $X^{[2]}$. The sample CC scores will be vectors of size $n$, similar to the PC sample scores, with one value for each observation.
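To make the shape of the sample scores concrete, here is a minimal sketch, assuming NumPy, hypothetical synthetic data with observations in columns, and a divisor of $n-1$ in the sample covariances. The scores are obtained by projecting the sphered data onto the left and right eigenvectors of the sample canonical correlation matrix; all names are illustrative.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

rng = np.random.default_rng(4)
n, d1, d2 = 150, 3, 2
X1 = rng.standard_normal((d1, n))                     # hypothetical d1 x n data
X2 = 0.8 * X1[:d2] + rng.standard_normal((d2, n))     # d2 x n data related to X1

X1c = X1 - X1.mean(axis=1, keepdims=True)
X2c = X2 - X2.mean(axis=1, keepdims=True)
S1 = X1c @ X1c.T / (n - 1)
S2 = X2c @ X2c.T / (n - 1)
S12 = X1c @ X2c.T / (n - 1)

# Sphered data: centred data multiplied by the inverse square root covariance.
X1S = inv_sqrt(S1) @ X1c
X2S = inv_sqrt(S2) @ X2c

C = inv_sqrt(S1) @ S12 @ inv_sqrt(S2)
P, upsilon, Qt = np.linalg.svd(C, full_matrices=False)

U = P.T @ X1S    # row k holds the n values of the k-th score for the first part
V = Qt @ X2S     # row k holds the n values of the k-th score for the second part

# Sample correlation of each pair of score vectors.
corr = np.array([np.corrcoef(U[k], V[k])[0, 1] for k in range(len(upsilon))])
print("score array shapes:", U.shape, V.shape)
print("singular values:   ", upsilon)
print("corr(U_k, V_k):    ", corr)
```

In this sketch the sample correlation of each pair of score vectors agrees with the corresponding singular value of $C$, which is why the singular values are interpreted as correlations.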
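Definition 3.5 Let $\mathbb{X}^{[1]} \sim \mathrm{Sam}(\overline{X}_1, S_1)$ and $\mathbb{X}^{[2]} \sim \mathrm{Sam}(\overline{X}_2, S_2)$. For $\rho = 1, 2$, let $\mathbb{X}_S^{[\rho]}$ be the sphered data. Let $S_{12}$ be the sample between covariance matrix and $C$ the sample canonical correlation matrix of $\mathbb{X}^{[1]}$ and $\mathbb{X}^{[2]}$. Write $P \Upsilon Q^T$ for the singular value decomposition of $C$.
Consider $k = 1, \ldots, r$, with $r$ the rank of $S_{12}$.
1. The $k$th pair of canonical correlation scores or canonical variates is
\[ U_{\bullet k} = p_k^T \mathbb{X}_S^{[1]} \quad\text{and}\quad V_{\bullet k} = q_k^T \mathbb{X}_S^{[2]}; \]
2. the $k$-dimensional canonical correlation data or data of canonical variates consist of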