
Canonical Correlation Analysis


The eigenvectors $p_k$ and $q_k$ satisfy

$$C q_k = \upsilon_k p_k \quad\text{and}\quad C^T p_k = \upsilon_k q_k, \tag{3.5}$$

and because of this relationship, we call them the left and right eigenvectors of $C$. See Definition 1.12 in Section 1.5.3.
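The relationship (3.5) can be checked numerically. The following is a minimal sketch, assuming NumPy and a small made-up matrix `C` (a hypothetical stand-in, not a matrix from the text); the names `P`, `upsilon` and `Q` are chosen here to mirror the notation.

```python
import numpy as np

# Hypothetical 3 x 2 stand-in for C; any real matrix works for this check.
C = np.array([[0.5, 0.1],
              [0.2, 0.3],
              [0.0, 0.4]])

# Reduced SVD: C = P diag(upsilon) Q^T, singular values in decreasing order.
P, upsilon, Qt = np.linalg.svd(C, full_matrices=False)
Q = Qt.T

for k in range(len(upsilon)):
    p_k, q_k = P[:, k], Q[:, k]
    # Right eigenvector q_k is mapped to upsilon_k times the left eigenvector p_k ...
    assert np.allclose(C @ q_k, upsilon[k] * p_k)
    # ... and C^T maps p_k back to upsilon_k times q_k.
    assert np.allclose(C.T @ p_k, upsilon[k] * q_k)
```

Both assertions hold for every k, which is the sense in which the columns of `P` and `Q` act as left and right eigenvectors of C.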

Throughout this chapter I use the submatrix notation (1.21) of Section 1.5.2 and so will write $Q_m$ for the $m \times r$ submatrix of $Q$, where $m \le r$, and similarly for other submatrices.

We are now equipped to define the canonical correlations.

Definition 3.3 Let $X^{[1]} \sim (\mu_1, \Sigma_1)$ and $X^{[2]} \sim (\mu_2, \Sigma_2)$. For $\rho = 1, 2$, let $X_\Sigma^{[\rho]}$ be the sphered vector

$$X_\Sigma^{[\rho]} = \Sigma_\rho^{-1/2}\left(X^{[\rho]} - \mu_\rho\right).$$

Let $\Sigma_{12}$ be the between covariance matrix of $X^{[1]}$ and $X^{[2]}$. Let $C$ be the matrix of canonical correlations of $X^{[1]}$ and $X^{[2]}$, and write $C = P \Upsilon Q^T$ for its singular value decomposition.

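Consider $k = 1, \ldots, r$.

1. The kth pair of canonical correlation scores or canonical variates is

$$U_k = p_k^T X_\Sigma^{[1]} \quad\text{and}\quad V_k = q_k^T X_\Sigma^{[2]}; \tag{3.6}$$

2. the k-dimensional pair of vectors of canonical correlations or vectors of canonical variates is

$$U^{(k)} = [U_1, \ldots, U_k]^T \quad\text{and}\quad V^{(k)} = [V_1, \ldots, V_k]^T; \tag{3.7}$$

3. and the kth pair of canonical (correlation) transforms is

$$\phi_k = \Sigma_1^{-1/2} p_k \quad\text{and}\quad \psi_k = \Sigma_2^{-1/2} q_k. \tag{3.8}$$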
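The quantities of the definition can be computed step by step. The sketch below assumes NumPy and a made-up joint covariance matrix `Sigma` with d1 = 3 and d2 = 2; the matrix, the vectors `x1` and `mu1`, and the helper `inv_sqrt` are all hypothetical and serve only as an illustration.

```python
import numpy as np

def inv_sqrt(sigma):
    """Symmetric inverse square root of a positive definite matrix."""
    lam, gamma = np.linalg.eigh(sigma)
    return gamma @ np.diag(lam ** -0.5) @ gamma.T

# Hypothetical joint covariance of (X1, X2); symmetric and strictly
# diagonally dominant with positive diagonal, hence positive definite.
Sigma = np.array([[4.0, 1.0, 0.5, 1.0, 0.2],
                  [1.0, 3.0, 0.3, 0.5, 0.4],
                  [0.5, 0.3, 2.0, 0.1, 0.6],
                  [1.0, 0.5, 0.1, 2.0, 0.3],
                  [0.2, 0.4, 0.6, 0.3, 2.0]])
Sigma1, Sigma2, Sigma12 = Sigma[:3, :3], Sigma[3:, 3:], Sigma[:3, 3:]

# Matrix of canonical correlations and its singular value decomposition.
C = inv_sqrt(Sigma1) @ Sigma12 @ inv_sqrt(Sigma2)
P, upsilon, Qt = np.linalg.svd(C, full_matrices=False)
Q = Qt.T

# Canonical transforms: columns are phi_k and psi_k.
Phi = inv_sqrt(Sigma1) @ P
Psi = inv_sqrt(Sigma2) @ Q

# Scores for one hypothetical realisation x1 with hypothetical mean mu1.
x1 = np.array([1.0, -2.0, 0.5])
mu1 = np.array([0.5, 0.0, -1.0])
U_from_sphered = P.T @ (inv_sqrt(Sigma1) @ (x1 - mu1))  # p_k^T applied to sphered x1
U_from_transforms = Phi.T @ (x1 - mu1)                  # phi_k^T applied to centred x1

print("norms of p_k:  ", np.linalg.norm(P, axis=0))
print("norms of phi_k:", np.linalg.norm(Phi, axis=0))
```

The two ways of computing the scores agree, and printing the column norms shows that the `p_k` are unit vectors whereas the `phi_k` generally are not.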

For brevity, I sometimes refer to the pair of canonical correlation scores or vectors as CC scores, or vectors of CC scores or simply as canonical correlations.

I remind the reader that our definitions of the canonical correlation scores and vectors use the centred data, unlike other treatments of this topic, which use uncentred vectors.

It is worth noting that the canonical correlation transforms of (3.8) are, in general, not unit vectors because they are linear transforms of unit vectors, namely, the eigenvectors p_k and q_k.

The canonical correlation scores are also called the canonical (correlation) variables.

Mardia, Kent, and Bibby (1992) use the term canonical correlation vectors for the $\phi_k$ and $\psi_k$ of (3.8). To distinguish the vectors of canonical correlations $U^{(k)}$ and $V^{(k)}$ of part 2 of the definition from the $\phi_k$ and $\psi_k$, I prefer the term transforms for the $\phi_k$ and $\psi_k$ because these vectors are transformations of the directions $p_k$ and $q_k$ and result in the pair of scores

$$U_k = \phi_k^T\left(X^{[1]} - \mu_1\right) \quad\text{and}\quad V_k = \psi_k^T\left(X^{[2]} - \mu_2\right) \quad\text{for } k = 1, \ldots, r. \tag{3.9}$$

At times, I refer to a specific pair of transforms, but typically we are interested in the first p pairs with $p \le r$. We write

$$\Phi = \begin{bmatrix} \phi_1 & \phi_2 & \cdots & \phi_r \end{bmatrix} \quad\text{and}\quad \Psi = \begin{bmatrix} \psi_1 & \psi_2 & \cdots & \psi_r \end{bmatrix}$$

for the matrices of canonical correlation transforms. The entries of the vectors $\phi_k$ and $\psi_k$ are the weights of the variables of $X^{[1]}$ and $X^{[2]}$ and so show which variables contribute strongly to correlation and which might be negligible. Some authors, including Mardia, Kent, and Bibby (1992), define the CC scores as in (3.9). Naively, one might think of the vectors $\phi_k$ and $\psi_k$ as sphered versions of the eigenvectors $p_k$ and $q_k$, but this is incorrect; $\Sigma_1$ is the covariance matrix of $X^{[1]}$, and $p_k$ is the kth eigenvector of the non-random $C C^T$.
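That (3.6) and (3.9) give the same scores follows in one line. The derivation below is a sketch which writes $X_\Sigma^{[1]}$ for the sphered vector and assumes that $\Sigma_1^{-1/2}$ denotes the symmetric inverse square root of $\Sigma_1$:

```latex
\begin{align*}
U_k &= p_k^T X_\Sigma^{[1]}
     = p_k^T \Sigma_1^{-1/2}\bigl(X^{[1]} - \mu_1\bigr) \\
    &= \bigl(\Sigma_1^{-1/2} p_k\bigr)^T \bigl(X^{[1]} - \mu_1\bigr)
     = \phi_k^T \bigl(X^{[1]} - \mu_1\bigr).
\end{align*}
```

The same argument with $q_k$, $\Sigma_2$ and $\psi_k$ gives the two expressions for $V_k$.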

I prefer the definition (3.6) to (3.9) for reasons which are primarily concerned with the interpretation of the results, namely,

1. the vectors p_k and q_k are unit vectors, and their entries are therefore easy to interpret, and

2. the scores are given as linear combinations of uncorrelated random vectors, the sphered vectors $X_\Sigma^{[1]}$ and $X_\Sigma^{[2]}$. Uncorrelated variables are more amenable to an interpretation of the contribution of each variable to the correlation between the pairs $U_k$ and $V_k$.

Being eigenvectors, the $p_k$ and $q_k$ values play a natural role as directions, and they are some of the key quantities when dealing with correlation for transformed random vectors in Section 3.5.2 as well as in the variable ranking based on the correlation matrix which I describe in Section 13.3.

The canonical correlation scores (3.6) and vectors (3.7) play a role similar to the PC scores and vectors in Principal Component Analysis, and the vectors $p_k$ and $q_k$ remind us of the eigenvector $\eta_k$. However, there is a difference: Principal Component Analysis is based on the raw or scaled data, whereas the vectors $p_k$ and $q_k$ relate to the sphered data. This difference is exhibited in (3.6) but is less apparent in (3.9). The explicit nature of (3.6) is one of the reasons why I prefer (3.6) as the definition of the scores.

Before we leave the population case, we compare the between covariance matrix $\Sigma_{12}$ and the canonical correlation matrix in a specific case.

Example 3.1 The car data is a subset of the 1983 ASA Data Exposition of Ramos and Donoho (1983). We use their five continuous variables: displacement, horsepower, weight, acceleration and miles per gallon (mpg). The first three variables correspond to physical properties of the cars, whereas the remaining two are performance-related. We combine the first three variables into one part and the remaining two into the second part.

The random vectors $X_i^{[1]}$ have the variables displacement, horsepower, and weight, and the random vectors $X_i^{[2]}$ have the variables acceleration and mpg. We consider the sample variance and covariance matrix of the $X^{[\rho]}$ in lieu of the respective population quantities. We obtain the $3 \times 2$ matrices

$$\Sigma_{12} = \begin{pmatrix} -157.0 & -657.6 \\ -73.2 & -233.9 \\ -976.8 & -5517.4 \end{pmatrix} \quad\text{and}\quad C = \begin{pmatrix} -0.3598 & -0.1131 \\ -0.5992 & -0.0657 \\ -0.1036 & -0.8095 \end{pmatrix}. \tag{3.10}$$

Both matrices have negative entries, but the entries are very different: the between covariance matrix $\Sigma_{12}$ has entries of arbitrary size. In contrast, the entries of $C$ are correlation coefficients and are in the interval $[-1, 0]$ in this case and, more generally, in $[-1, 1]$. As a consequence, $C$ explicitly reports the strength of the relationship between the variables.

Weight and mpg are most strongly correlated with an entry of −5,517.4 in the covariance matrix, and −0.8095 in the correlation matrix. Although −5,517.4 is a large negative value, it does not lead to a natural interpretation of the strength of the relationship between the variables.

An inspection of $C$ shows that the strongest absolute correlation exists between the variables weight and mpg. In Section 3.4 we examine whether a combination of variables will lead to a stronger correlation.
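One way to preview this question is to compare the largest absolute entry of C with the singular values of C, which appear on the diagonal of ϒ in the decomposition C = Pϒ Q^T. The sketch below assumes NumPy and uses the rounded entries of C from (3.10), so the results are only as accurate as those four-decimal values.

```python
import numpy as np

# Matrix C of (3.10); rows: displacement, horsepower, weight,
# columns: acceleration, mpg (entries rounded as printed).
C = np.array([[-0.3598, -0.1131],
              [-0.5992, -0.0657],
              [-0.1036, -0.8095]])

# Singular values only, in decreasing order.
singular_values = np.linalg.svd(C, compute_uv=False)

# Strongest correlation between a single pair of variables.
largest_entry = np.abs(C).max()

print("singular values:       ", singular_values)
print("largest absolute entry:", largest_entry)
```

The largest singular value of any matrix is at least as large as its largest absolute entry, so the comparison shows how much is gained by moving from single variables to linear combinations.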

3.3 Sample Canonical Correlations

In Example 3.1, I calculate the covariance matrix from data because we do not know the true population covariance structure. In this section, I define canonical correlation concepts for data. At the end of this section, we return to Example 3.1 and calculate the CC scores.

The sample definitions are similar to those of the preceding section, but because we are dealing with a sample and do not know the true means and covariances, there are important differences. Table 3.1 summarises the key quantities for both the population and the sample.

We begin with some notation for the sample. For $\rho = 1, 2$, let

$$X^{[\rho]} = \begin{bmatrix} X_1^{[\rho]} & X_2^{[\rho]} & \cdots & X_n^{[\rho]} \end{bmatrix}$$

be $d_\rho \times n$ data which consist of n independent $d_\rho$-dimensional random vectors $X_i^{[\rho]}$. The data $X^{[1]}$ and $X^{[2]}$ usually have a different number of variables, but measurements on the same n objects are carried out for $X^{[1]}$ and $X^{[2]}$. This fact is essential for the type of comparison we want to make.

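We assume that the $X_i^{[\rho]}$ have sample mean $\overline{X}_\rho$ and sample covariance matrix $S_\rho$. Sometimes it will be convenient to consider the combined data. We write

$$X = \begin{bmatrix} X^{[1]} \\ X^{[2]} \end{bmatrix} \sim \mathrm{Sam}\left(\overline{X}, S\right), \quad\text{where}\quad S = \begin{pmatrix} S_1 & S_{12} \\ S_{12}^T & S_2 \end{pmatrix}.$$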


Here $S_{12}$ is the $d_1 \times d_2$ (sample) between covariance matrix of $X^{[1]}$ and $X^{[2]}$ defined by

$$S_{12} = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i^{[1]} - \overline{X}_1\right)\left(X_i^{[2]} - \overline{X}_2\right)^T.$$

Assume that $S_1$ and $S_2$ are non-singular. The matrix of sample canonical correlations or the sample canonical correlation matrix is

$$C = S_1^{-1/2} S_{12} S_2^{-1/2},$$

and the pair of matrices of sample multivariate coefficients of determination are

$$R^{[C,1]} = C C^T \quad\text{and}\quad R^{[C,2]} = C^T C. \tag{3.12}$$
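The sample quantities can be computed directly from d_ρ × n data. The following is a minimal sketch assuming NumPy, simulated data in place of real measurements, and the divisor n − 1 in the sample covariances (an assumption of this sketch); the helper `inv_sqrt` and all variable names are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a non-singular covariance matrix."""
    lam, gamma = np.linalg.eigh(S)
    return gamma @ np.diag(lam ** -0.5) @ gamma.T

# Simulated data: variables in rows, the same n observations in columns.
rng = np.random.default_rng(1)
n = 200
X1 = rng.normal(size=(3, n))                   # d1 x n
X2 = 0.5 * X1[:2] + rng.normal(size=(2, n))    # d2 x n, correlated with X1

# Centre each part with its own sample mean.
X1c = X1 - X1.mean(axis=1, keepdims=True)
X2c = X2 - X2.mean(axis=1, keepdims=True)

# Sample covariance matrices and between covariance matrix (divisor n - 1 assumed).
S1 = X1c @ X1c.T / (n - 1)
S2 = X2c @ X2c.T / (n - 1)
S12 = X1c @ X2c.T / (n - 1)

# Sample canonical correlation matrix and the two matrices of (3.12).
C_hat = inv_sqrt(S1) @ S12 @ inv_sqrt(S2)
R1 = C_hat @ C_hat.T    # d1 x d1
R2 = C_hat.T @ C_hat    # d2 x d2
```

As a consistency check, `S12` coincides with the off-diagonal block of `np.cov` applied to the stacked data, and `R1` and `R2` share their non-zero eigenvalues.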

As in (1.9) of Section 1.3.2, we use the subscript cent to refer to the centred data. With this notation, the $d_1 \times d_2$ matrix $S_{12}$ is

$$S_{12} = \frac{1}{n-1} X_{\mathrm{cent}}^{[1]} \left(X_{\mathrm{cent}}^{[2]}\right)^T,$$

and $C$ has the singular value decomposition

$$C = P \Upsilon Q^T,$$

where r is the rank of $S_{12}$ and hence also of $C$. In the population case I mentioned that we may want to replace a singular $\Sigma_1^{-1/2}$, with rank $r < d_1$, by its $\Gamma_r \Lambda_r^{-1/2} \Gamma_r^T$. We may want to make an analogous replacement in the sample case.
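A replacement of this kind can be sketched as follows, assuming that the substitute is built from the eigenvectors and eigenvalues that belong to the non-zero eigenvalues of the covariance matrix. The function name `reduced_inv_sqrt`, the tolerance `tol` and the example matrix are hypothetical choices for illustration.

```python
import numpy as np

def reduced_inv_sqrt(S, tol=1e-10):
    """Rank-r substitute for the inverse square root of a singular
    covariance matrix: keep only eigenpairs with non-negligible eigenvalue."""
    lam, gamma = np.linalg.eigh(S)
    keep = lam > tol * lam.max()          # drop (numerically) zero eigenvalues
    gamma_r, lam_r = gamma[:, keep], lam[keep]
    return gamma_r @ np.diag(lam_r ** -0.5) @ gamma_r.T

# Hypothetical singular 3 x 3 covariance of rank 2:
# the third variable is the sum of the first two.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
S = B @ np.diag([2.0, 1.0]) @ B.T

G = reduced_inv_sqrt(S)

# "Sphering" with G yields the orthogonal projection onto the column space
# of S instead of the identity matrix.
proj = G @ S @ G
print(np.linalg.matrix_rank(G), np.round(proj, 6))
```

With a non-singular covariance matrix the function returns the usual symmetric inverse square root, so the same code path serves both cases.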

In the population case, we defined the canonical correlation scores $U_k$ and $V_k$ of the vectors $X^{[1]}$ and $X^{[2]}$. The sample CC scores will be vectors of size n, similar to the PC sample scores, with one value for each observation.
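To illustrate that each sample score is a vector with one value per observation, the sketch below computes scores from simulated data. It assumes NumPy, the divisor n − 1 in the sample covariances, and sphering of the data with the symmetric inverse square root of the sample covariance matrix; all names are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a non-singular covariance matrix."""
    lam, gamma = np.linalg.eigh(S)
    return gamma @ np.diag(lam ** -0.5) @ gamma.T

rng = np.random.default_rng(2)
n = 500
X1 = rng.normal(size=(3, n))                                        # d1 x n
X2 = np.vstack([X1[0] + X1[1], X1[2]]) + rng.normal(size=(2, n))    # d2 x n

X1c = X1 - X1.mean(axis=1, keepdims=True)
X2c = X2 - X2.mean(axis=1, keepdims=True)
S1 = X1c @ X1c.T / (n - 1)
S2 = X2c @ X2c.T / (n - 1)
S12 = X1c @ X2c.T / (n - 1)

# Sphered data and sample canonical correlation matrix.
X1S = inv_sqrt(S1) @ X1c
X2S = inv_sqrt(S2) @ X2c
C_hat = inv_sqrt(S1) @ S12 @ inv_sqrt(S2)
P, upsilon, Qt = np.linalg.svd(C_hat, full_matrices=False)

# Row k of U (of V) holds the kth score for all n observations.
U = P.T @ X1S
V = Qt @ X2S

# Sample correlation within each pair of scores.
pair_corr = np.array([np.corrcoef(U[k], V[k])[0, 1] for k in range(len(upsilon))])
print(upsilon, pair_corr)
```

In this construction the sample correlation of the kth pair of score vectors reproduces the kth singular value of the sample canonical correlation matrix.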

Definition 3.5 Let $X^{[1]} \sim \mathrm{Sam}\left(\overline{X}_1, S_1\right)$ and $X^{[2]} \sim \mathrm{Sam}\left(\overline{X}_2, S_2\right)$. For $\rho = 1, 2$, let $X_S^{[\rho]}$ be the sphered data. Let $S_{12}$ be the sample between covariance matrix and $C$ the sample canonical correlation matrix of $X^{[1]}$ and $X^{[2]}$. Write $P \Upsilon Q^T$ for the singular value decomposition of $C$.

Consider $k = 1, \ldots, r$, with r the rank of $S_{12}$.

1. The kth pair of canonical correlation scores or canonical variates is

$$U_{\bullet k} = p_k^T X_S^{[1]} \quad\text{and}\quad V_{\bullet k} = q_k^T X_S^{[2]};$$

2. the k-dimensional canonical correlation data or data of canonical variates consist of

In document Koch I. Analysis of Multivariate and High-Dimensional Data 2013 (Page 102-106)