The goal of principal component analysis is to represent the majority of variability in a data set in a lower-order linear subspace (Hotelling 1933). This is done by computing orthogonal linear combinations of variables, with the first linear combination explaining the maximum amount of variability. This can be thought of as fitting an ellipsoid to the data set with the axes of the ellipsoid corresponding to the vectors used when creating the linear combinations.
Generally, the first k of these linear combinations, called principal component scores, are kept as composites of the original variables. These composite scores retain some portion of the total variability in the data. Principal component analysis is often the first step in an analysis, as in ours, before another statistical method such as linear regression is applied. Principal components can be interpreted through the way the linear combinations were computed: variables with higher coefficient loadings are seen as contributing more to that set of scores. There are multiple ways of computing principal component scores, including the eigenvalue decomposition and the singular value decomposition (SVD). For our method, we use the SVD, which is detailed below.
Let $X$ be an $m \times k$ matrix of real numbers. Then there exist an $m \times m$ orthogonal matrix $U$ and a $k \times k$ orthogonal matrix $W$ such that $X$ can be decomposed as in Equation 3.1,

\[ X = U \Sigma W^{T} \tag{3.1} \]

where $\Sigma$ is an $m \times k$ matrix whose diagonal entries are $\sigma_i \geq 0$ and whose other entries are zero. The constants $\sigma_i$ are called singular values. In R, the singular value decomposition is computed using the dgesvd routine in LAPACK, which uses a QR algorithm (Anderson et al. 1999).
The singular values $\sigma_i$, which are equal to the square roots of the eigenvalues of $X^{T}X$ when $X^{T}X$ is positive definite, can be interpreted as the standard deviations of the principal components (up to a constant factor of $1/\sqrt{m-1}$ when the columns of $X$ are centered). They are useful in determining how many principal components to keep. Equation 3.2 below gives a measure of the amount of variability retained in the first $k$ of $K$ possible components. One can then choose the number of components so as to keep a certain proportion of the total variability.
\[ \text{Variance retained in first } k \text{ of } K \text{ PCs} = \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sum_{i=1}^{K} \sigma_i^{2}} \tag{3.2} \]

Once the SVD has been performed, principal component scores $T$ are obtained by taking linear combinations of the original data $X$ using either the $W$ or $U$ matrices, as given in Equation 3.3.
\[ T = XW = (U \Sigma W^{T})W = U \Sigma (W^{T}W) = U \Sigma \tag{3.3} \]
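The steps in Equations 3.1 to 3.3 can be sketched in a few lines of NumPy. This is only an illustration, not the R implementation used in this work: the helper name `pca_svd`, the 90% threshold `var_target`, and the simulated data are all assumptions made for the example, and the columns of the data are centered before the decomposition, which the text above does not state explicitly.

```python
import numpy as np

def pca_svd(X, var_target=0.90):
    """PCA via the SVD of the column-centered data (illustrative helper)."""
    Xc = X - X.mean(axis=0)                    # assumption: columns are centered
    # Thin SVD: U has as many columns as X, which is enough to form U @ Sigma.
    U, s, Wt = np.linalg.svd(Xc, full_matrices=False)
    W = Wt.T
    scores = U * s                             # T = U Sigma (Equation 3.3)
    cum = np.cumsum(s**2)
    retained = cum / cum[-1]                   # Equation 3.2 for k = 1, ..., K
    # Smallest k whose retained variance reaches the target, capped at K.
    k = min(int(np.searchsorted(retained, var_target)) + 1, len(s))
    return scores, s, W, retained, k

# Simulated data: 200 observations of 5 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

scores, s, W, retained, k = pca_svd(X)
reduced = scores[:, :k]                        # first k principal component scores
```

Here `scores` equals both `Xc @ W` and `U * s`, mirroring the two routes to $T$ in Equation 3.3, and `retained[k - 1]` is the proportion of variability kept by the first `k` components.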
3.2.2 Canonical correlation analysis
Canonical correlation analysis is a method analogous to principal component analysis that finds linear combinations of data explaining the maximum amount of correlation between two sets of variables (Hotelling 1936). It can also be seen as a dimension reduction technique. Canonical correlation is a general method, and many well-known statistical methods, such as multiple linear regression, can be considered special cases of it.
The aim is to create scores for each of the two data sets that are maximally correlated. These scores can then be interpreted to find which variables are responsible for covariance between the two data sets by interpreting the communalities for each. As in principal component analysis, the covariance matrices can be decomposed using the SVD or the spectral decomposition when computing the canonical covariates. For our purposes, we use the classical formulation of canonical correlation analysis using the spectral decomposition, as detailed below.
Let $X$ be a random vector of $p$ variables and $Y$ be a random vector of $q$ variables. We can then define their cross-covariance as $\Sigma_{XY} = \mathrm{cov}(X, Y)$, which is a $p \times q$ matrix whose $(i, j)$ entry is $\mathrm{cov}(x_i, y_j)$. $\Sigma_{XY}$ can also be thought of as the off-diagonal block of the variance-covariance matrix obtained when combining $X$ and $Y$ into a single random vector $Z$, as in Equation 3.4.
\[ \underset{(p+q) \times 1}{Z} = \begin{pmatrix} X \\ Y \end{pmatrix} \tag{3.4} \]
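$\Sigma_{XY}$ is then the off-diagonal block of $\Sigma_Z$ given in Equation 3.5.

\[ \Sigma_Z = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix} \tag{3.5} \]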
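As a small illustration of the block structure in Equations 3.4 and 3.5, the sketch below stacks two simulated data sets and slices the sample covariance matrix into its four blocks. The sample covariance stands in for the population matrix $\Sigma_Z$, and the data and variable names (`S_Z`, `S_XY`, and so on) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 500, 3, 2

# Simulated data: rows are observations, Y depends on the first q columns of X.
X = rng.normal(size=(n, p))
Y = X[:, :q] + 0.5 * rng.normal(size=(n, q))

# Stack the variables side by side so each row is one observation of Z = (X, Y).
Z = np.hstack([X, Y])
S_Z = np.cov(Z, rowvar=False)      # (p+q) x (p+q) sample covariance of Z

# Slice out the four blocks of Equation 3.5.
S_XX = S_Z[:p, :p]
S_XY = S_Z[:p, p:]                 # p x q cross-covariance
S_YX = S_Z[p:, :p]                 # q x p, the transpose of S_XY
S_YY = S_Z[p:, p:]
```

The block `S_XY` matches the cross-covariance computed directly from the centered data, `Xc.T @ Yc / (n - 1)`.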
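The goal of canonical correlation analysis is then to find linear combinations $a^{T}X$ and $b^{T}Y$ that maximize $\rho = \mathrm{cor}(a^{T}X, b^{T}Y)$. The linear combinations $U = a^{T}X$ and $V = b^{T}Y$ can be rewritten as linear combinations of the standardized variables. For $i \in \{1, \ldots, \min(p, q)\}$, the canonical covariates $U_i$ and $V_i$ can be written as $U_i = e_i^{T}\Sigma_{XX}^{-1/2}X$ and $V_i = f_i^{T}\Sigma_{YY}^{-1/2}Y$. Here $\rho_i^{2}$ are the first $\min(p, q)$ eigenvalues of $\Sigma_{XX}^{-1/2}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1/2}$ and $\Sigma_{YY}^{-1/2}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1/2}$. Lastly, $e_i$ and $f_i$ are the corresponding eigenvectors of these two matrices, respectively. The canonical covariates have the following properties, given in Equation 3.6:

\[
\begin{aligned}
\mathrm{Var}(U_i) &= \mathrm{Var}(V_i) = 1 \\
\mathrm{Cov}(U_i, U_j) &= \mathrm{Cor}(U_i, U_j) = 0, \quad i \neq j \\
\mathrm{Cov}(V_i, V_j) &= \mathrm{Cor}(V_i, V_j) = 0, \quad i \neq j \\
\mathrm{Cov}(U_i, V_j) &= \mathrm{Cor}(U_i, V_j) = 0, \quad i \neq j
\end{aligned}
\tag{3.6}
\]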
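A minimal NumPy sketch of this spectral-decomposition formulation follows, with sample covariance matrices standing in for the population ones. The function names `inv_sqrt` and `cca_spectral` and the simulated data are hypothetical. One simplification is made: instead of decomposing the second matrix separately, each $f_i$ is obtained from $e_i$ as $\Sigma_{YY}^{-1/2}\Sigma_{YX}\Sigma_{XX}^{-1/2}e_i / \rho_i$, which is a unit eigenvector of the second matrix with its sign chosen so that $\mathrm{Cor}(U_i, V_i) = \rho_i > 0$. The sketch assumes both covariance matrices are positive definite and all $\rho_i$ are nonzero.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def cca_spectral(X, Y):
    """Canonical correlations and covariates via the spectral decomposition."""
    p, q = X.shape[1], Y.shape[1]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    S = np.cov(np.hstack([Xc, Yc]), rowvar=False)
    S_XX, S_YX, S_YY = S[:p, :p], S[p:, :p], S[p:, p:]
    Sxx_ih, Syy_ih = inv_sqrt(S_XX), inv_sqrt(S_YY)

    # K.T @ K is the first matrix in the text; K @ K.T is the second.
    K = Syy_ih @ S_YX @ Sxx_ih
    vals, E = np.linalg.eigh(K.T @ K)
    order = np.argsort(vals)[::-1][: min(p, q)]    # largest eigenvalues first
    rho = np.sqrt(np.clip(vals[order], 0.0, None)) # canonical correlations
    E = E[:, order]                                # columns are e_i
    F = K @ E / rho                                # columns are f_i (needs rho > 0)

    U = Xc @ Sxx_ih @ E                            # U_i = e_i^T S_XX^{-1/2} X
    V = Yc @ Syy_ih @ F                            # V_i = f_i^T S_YY^{-1/2} Y
    return rho, U, V

# Simulated data: 4 X-variables, 3 Y-variables that depend on X.
rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 4))
Y = X[:, :3] @ rng.normal(size=(3, 3)) + rng.normal(size=(n, 3))

rho, U, V = cca_spectral(X, Y)
```

With these outputs, the sample versions of the properties in Equation 3.6 can be checked directly: the columns of `U` and of `V` have unit variance and are mutually uncorrelated, and the correlation between column `i` of `U` and column `j` of `V` is `rho[i]` when `i == j` and zero otherwise.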
These sets of canonical covariates can then be interpreted by examining their loadings $L_{U_i}$ and $L_{V_i}$, which are a measure of how much each original variable is related to canonical covariate $i$. The vector of loadings for canonical covariate $U_i$ is given in Equation 3.7.

\[ L_{U_i} = \mathrm{Cor}(U_i, X) = \begin{pmatrix} \mathrm{Cor}(U_i, X_1) \\ \vdots \\ \mathrm{Cor}(U_i, X_p) \end{pmatrix} \tag{3.7} \]
Aside from being able to interpret canonical covariates using their loadings, we want to be able to quantify how much variance each canonical covariate explains in the original data. This is analogous to an $R^{2}$ measure, except that it is asymmetric. The correlation between $U_i$ and $V_i$ is given by $\rho_i$, but $U_i$ and $V_i$ are different linear combinations of the original data. Therefore, we compute what is called a redundancy coefficient (RC) separately for $X$ and $Y$. The amounts of variability explained in $X$ and $Y$ by canonical covariates $V_i$ and $U_i$ are given by $R^{*2}_{X_i}$ and $R^{*2}_{Y_i}$, respectively, in Equation 3.8, where $L_{U_i j}$ denotes the $j$-th element of $L_{U_i}$ and $L_{V_i k}$ the $k$-th element of $L_{V_i}$.

\[ R^{*2}_{X_i} = \rho_i^{2} \sum_{j=1}^{p} L_{U_i j}^{2}, \qquad R^{*2}_{Y_i} = \rho_i^{2} \sum_{k=1}^{q} L_{V_i k}^{2} \tag{3.8} \]
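The loadings of Equation 3.7 and the redundancy coefficients of Equation 3.8 can be sketched as below. The helper names (`inv_sqrt`, `canonical_covariates`, `loadings`, `redundancy`) and the simulated data are hypothetical, and the compact canonical correlation routine is included only so the example runs on its own; it uses sample covariance matrices and assumes nonzero canonical correlations. The redundancy function follows Equation 3.8 as written, with an unnormalized sum of squared loadings; dividing by the number of variables ($p$ or $q$) would express the result as a proportion.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def canonical_covariates(X, Y):
    """Compact spectral-decomposition CCA, used only to feed the example."""
    p = X.shape[1]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    S = np.cov(np.hstack([Xc, Yc]), rowvar=False)
    Sxx_ih, Syy_ih = inv_sqrt(S[:p, :p]), inv_sqrt(S[p:, p:])
    K = Syy_ih @ S[p:, :p] @ Sxx_ih
    vals, E = np.linalg.eigh(K.T @ K)
    order = np.argsort(vals)[::-1][: min(K.shape)]
    rho = np.sqrt(np.clip(vals[order], 0.0, None))
    E = E[:, order]
    F = K @ E / rho                       # assumes every rho_i is nonzero
    return rho, Xc @ Sxx_ih @ E, Yc @ Syy_ih @ F

def loadings(scores, data):
    """Equation 3.7: column i holds Cor(covariate i, each original variable)."""
    r = scores.shape[1]
    C = np.corrcoef(np.hstack([scores, data]), rowvar=False)
    return C[:r, r:].T                    # shape: (variables, covariates)

def redundancy(rho, L):
    """Equation 3.8 as written: rho_i^2 times the sum of squared loadings."""
    return rho**2 * np.sum(L**2, axis=0)

# Simulated data: 4 X-variables, 3 Y-variables that depend on X.
rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 4))
Y = X[:, :3] @ rng.normal(size=(3, 3)) + rng.normal(size=(n, 3))

rho, U, V = canonical_covariates(X, Y)
L_U = loadings(U, X)                      # loadings of each U_i on X
L_V = loadings(V, Y)                      # loadings of each V_i on Y
R_X = redundancy(rho, L_U)                # one value per canonical covariate
R_Y = redundancy(rho, L_V)
```

Because the two sets of loadings generally differ, `R_X` and `R_Y` differ as well, which is the asymmetry described above.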