Canonical Correlation Analysis
Lecture 11 August 4, 2011
Advanced Multivariate Statistical Methods ICPSR Summer Session #2
Overview ●Today’s Lecture Canonical Correlations Computation Interpretation Another Example Other Analyses Wrapping Up
Today’s Lecture
■ Canonical Correlation Analysis
◆ What it is
◆ How it works
◆ How to do such an analysis
Purpose
■ In general, there are times when we would like to measure the linear relationship between variables
◆ The simplest case is when we have 2 variables and all we
are interested in is measuring their linear relationship. Here we would just use bivariate correlation
◆ Another case is in multiple regression when we have several independent variables and one dependent variable. In this case we would use the multiple correlation coefficient R (usually reported squared, as R²)
■ So, it would be nice if we could expand the idea used in
these to a situation where we have several y variables and several x variables
Concept
■ From Webster’s Dictionary: canonical: reduced to the
simplest or clearest schema possible.
■ What do we mean by basic ideas?
■ In describing canonical correlation, we will start with the
basic cases where we only have two variables and build on it until we get to canonical correlations
1. First we will look at the bivariate correlation
2. Then we will see what was done to generalize bivariate correlation to the multiple correlation coefficient
3. Finally, these discussions will lead us right to what happens in canonical correlation analysis
Bivariate Correlation
■ Begin by thinking of just two variables y and x
■ In this case the correlation describes the extent to which one variable relates to (can linearly predict) the other
■ That is...the stronger the correlation the more we will know
about y by just knowing x
Multiple Correlation
■ On the other hand, if we have one y and multiple x variables
we can no longer look at a simple relationship between the two variables
■ But, we can look at how well the set of x variables can
predict the y by just computing the regression line
■ Using the regression line we can compute our predicted $\hat{y}$ and compare it to the y variable.
◆ Specifically, we now have only two variables, y and $\hat{y} = \mathbf{x}'\mathbf{b}$, so we can compute a simple correlation
■ Note: we started with something more complicated (many x variables) and changed it into something for which we could compute a simple correlation (between y and $\hat{y}$)
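The idea on this slide can be sketched in a few lines of Python (standard library only, rather than the SAS used in the lecture). The data below are hypothetical toy values, not the house data; the helper names `cov` and `corr` are illustrative. The sketch fits y on two predictors, forms the predicted values, and then computes an ordinary correlation between y and the predictions.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    return cov(u, v) / sqrt(cov(u, u) * cov(v, v))

# Hypothetical toy data (an assumption, not from the lecture).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

# Slopes from the 2x2 normal equations Sxx b = Sxy (Cramer's rule).
s11, s12, s22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
s1y, s2y = cov(x1, y), cov(x2, y)
det = s11 * s22 - s12 * s12
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

# Many x variables collapse into one new variable, yhat.
yhat = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]
multiple_r = corr(y, yhat)  # the multiple correlation
print(corr(y, x1), corr(y, x2), multiple_r)
```

The printed multiple correlation should be at least as large as either simple correlation, since the regression weights are chosen to make the agreement between y and the predictions as strong as possible.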
Multiple Correlation Example
From Weisberg (1985, p. 240).
“Property taxes on a house are supposedly dependent on the current market value of the house. Since houses actually sell only rarely, the sale price of each house must be estimated every year when property taxes are set. Regression methods are sometimes used to make up a prediction function.”
We have data for 27 houses sold in the mid 1970’s in Erie, Pennsylvania:
■ x1: Current taxes (local, school, and county) ÷ 100 (dollars)
■ x2: Number of bathrooms
■ x3: Living space ÷ 1000 (square feet)
■ x4: Age of house (years)
Multiple Correlation Example
To compute the multiple correlation of x1, x2, x3, and x4 with y, first compute the multiple regression for all x variables and y:
proc reg data=house;
  model y=x1-x4;
  output out=newdata p=yhat;
run;
Then, take the predicted values given by the model, $\hat{y}$, and correlate them with y:
proc corr data=newdata;
  var yhat y;
run;
Canonical Correlation
■ Canonical correlation seeks to find the correlation between multiple x variables and multiple y variables
■ Now we have several y variables and several x variables so
neither of our previous two examples can directly apply, BUT we can take the points from the previous cases and use
them for this new case
■ So we could look at how well the set of x variables can
predict the set of y variables, but in doing this we still will not be able to compute a simple correlation
■ On the other hand, in the multiple regression we found a
linear combination of the variables b′x to get a single variable
◆ In our case we have two sets of variables so it makes
sense that we can define two linear combinations...one for the x variables (b1) and one for the y variables (a1)
Canonical Correlation
■ In the simple case where we only have a single linear
combination for each set of variables we can compute the simple correlation between these two linear combinations
■ The first canonical correlation describes the correlation between these two new variables ($\mathbf{b}_1'\mathbf{x}$ and $\mathbf{a}_1'\mathbf{y}$)
■ So how do we pick the linear transformations?
◆ These linear transformations (b1 and a1) are picked such
that the correlation between these two new variables is maximized
◆ Notice that this idea is really no different from what we did in multiple regression
◆ This also sounds similar to something we have done in PCA
Canonical Correlation
■ ONE LAST THING
■ Think back to PCA when we said that a single linear
combination did not account for all of the information present in a data set...
◆ Then we could determine how many linear combinations
were needed to capture more information (where the linear combinations were all uncorrelated)
■ We can do the same thing here...
◆ We can define more sets of linear combinations (bi and
ai, i = 1, . . . , s where s = min (p, q), p is the number of
variables in the group of x and q is the number of variables in y)
◆ Each linear combination maximizes the correlation between the new variables under the constraint that they are uncorrelated with all previous linear combinations
Computation
■ To show how to compute canonical correlations, first
consider our original covariance matrix from our example:
           x1         x2         x3         x4          y
x1     8.3100     1.0700     1.3400   −15.0300    37.7400
x2     1.0700     0.1800     0.2100    −1.2500     5.6000
x3     1.3400     0.2100     0.3100    −1.4000     7.4200
x4   −15.0300    −1.2500    −1.4000   197.4900   −62.3900
y     37.7400     5.6000     7.4200   −62.3900   204.7000
Computation
■ From this matrix, we will define four new sub-matrices, from
which we will calculate our correlations:
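$$\mathbf{S} = \begin{pmatrix} \mathbf{S}_{xx} & \mathbf{S}_{xy} \\ \mathbf{S}_{xy}' & \mathbf{S}_{yy} \end{pmatrix}$$
Here $\mathbf{S}_{xx}$ is the block for rows and columns x1 to x4, $\mathbf{S}_{xy}$ is the block for rows x1 to x4 and column y, $\mathbf{S}_{xy}'$ is its transpose, and $\mathbf{S}_{yy}$ is the block for row and column y.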
Computation
■ So how do we compute the canonical correlations?
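■ To begin, note that we could define the Squared Multiple Correlation $R^2_M$ as
$$R^2_M = \frac{|\mathbf{S}_{xy}'\mathbf{S}_{xx}^{-1}\mathbf{S}_{xy}|}{|\mathbf{S}_{yy}|}$$
which can be rewritten (with $\mathbf{S}_{yx} = \mathbf{S}_{xy}'$) as:
$$R^2_M = |\mathbf{S}_{yy}^{-1}\mathbf{S}_{yx}\mathbf{S}_{xx}^{-1}\mathbf{S}_{xy}|$$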
■ For canonical correlations, however, we will focus on the
matrix formed by the part of the equation within the | · | (note this was just a scalar when y only has one variable)
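As a rough check of the formula, the following Python sketch (standard library only, not the lecture's SAS) evaluates the squared multiple correlation for the house example directly from the covariance matrix shown earlier. With a single y the determinants are scalars, so the formula reduces to $\mathbf{S}_{xy}'\mathbf{S}_{xx}^{-1}\mathbf{S}_{xy} / S_{yy}$. The helper `solve` is an illustrative name, and the result is only as accurate as the rounded covariances on the slide.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

# Covariance blocks copied from the lecture's house example.
Sxx = [[8.31, 1.07, 1.34, -15.03],
       [1.07, 0.18, 0.21, -1.25],
       [1.34, 0.21, 0.31, -1.40],
       [-15.03, -1.25, -1.40, 197.49]]
Sxy = [37.74, 5.60, 7.42, -62.39]
Syy = 204.70

b = solve(Sxx, Sxy)                            # b = Sxx^{-1} Sxy
r2 = sum(s * w for s, w in zip(Sxy, b)) / Syy  # Sxy' Sxx^{-1} Sxy / Syy
print(r2, r2 ** 0.5)                           # squared and unsquared multiple correlation
```

The square root of the printed value should agree, up to rounding in the slide's covariances, with the correlation between y and yhat from the `proc reg` and `proc corr` steps.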
Computation
■ We first compute the square roots of the eigenvalues ($r_1, r_2, \ldots, r_s$) and the eigenvectors ($\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_s$) of:
$$\mathbf{S}_{yy}^{-1}\mathbf{S}_{yx}\mathbf{S}_{xx}^{-1}\mathbf{S}_{xy}$$
■ Then we compute the square roots of the eigenvalues ($r_1, r_2, \ldots, r_s$) and the eigenvectors ($\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_s$) of:
$$\mathbf{S}_{xx}^{-1}\mathbf{S}_{xy}\mathbf{S}_{yy}^{-1}\mathbf{S}_{yx}$$
■ Conveniently, the nonzero eigenvalues of both matrices are equal (and are between zero and one)!
◆ The square root of each eigenvalue is a canonical correlation: the correlation between the corresponding pair of linear combinations
■ From the eigenvectors we have determined the linear combinations (the weights $\mathbf{a}_i$ and $\mathbf{b}_i$)
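The eigenvalue step can be sketched in Python (standard library only, not the lecture's SAS) for the smallest interesting case, p = q = 2, where the eigenvalues of a 2x2 matrix come from the quadratic formula. The covariance blocks below are hypothetical values chosen for illustration, and the helper names (`inv2`, `mul2`, `transpose2`, `eig2`) are assumptions of this sketch.

```python
from math import sqrt

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

def eig2(m):
    """Eigenvalues of a 2x2 matrix with real eigenvalues, largest first."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = sqrt(tr * tr - 4 * det)
    return [(tr + disc) / 2, (tr - disc) / 2]

# Hypothetical covariance blocks (p = q = 2); not data from the lecture.
Sxx = [[1.0, 0.5], [0.5, 1.0]]
Syy = [[1.0, 0.4], [0.4, 1.0]]
Sxy = [[0.5, 0.3], [0.2, 0.4]]  # rows: x variables, columns: y variables
Syx = transpose2(Sxy)

# Matrix whose eigenvectors give the y-side weights a_i.
My = mul2(mul2(inv2(Syy), Syx), mul2(inv2(Sxx), Sxy))
# Matrix whose eigenvectors give the x-side weights b_i.
Mx = mul2(mul2(inv2(Sxx), Sxy), mul2(inv2(Syy), Syx))

eig_y = eig2(My)
eig_x = eig2(Mx)
canonical_r = [sqrt(lam) for lam in eig_y]  # canonical correlations
print(eig_y, eig_x, canonical_r)
```

Both matrices should print the same pair of eigenvalues, and the first canonical correlation should be no smaller than the largest simple correlation in `Sxy` (0.5 here, since the hypothetical variables have unit variance).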
Example #2
■ To illustrate canonical correlations, consider the following
analysis:
Three physiological and three exercise variables are measured on 27 middle-aged men in a fitness club
■ The variables collected are:
◆ Weight (in pounds - x1)
◆ Waist size (in inches - x2)
◆ Pulse rate (in beats-per-minute - x3)
◆ Number of chin-ups performed (y1)
◆ Number of sit-ups performed (y2)
◆ Number of jumps performed (y3)
Example #2
■ To run a canonical correlation analysis, use the following code:
proc cancorr data=Fit all
     vprefix=Physiological vname='Physiological Measurements'
     wprefix=Exercises wname='Exercises';
  var Weight Waist Pulse;
  with Chins Situps Jumps;
run;
Standardized Weights
■ Just like in PCA and Factor Analysis, we are interested in
interpreting the weights of the linear combination
■ However, if our variables are in different scales they are
difficult to interpret
■ So, we can standardize them, which is the same as computing the canonical correlations and linear combinations from the correlation matrix instead of the variance/covariance matrix
■ We can also compute the standardized coefficients (c and d) directly:
$$\mathbf{c} = \operatorname{diag}(\mathbf{S}_{yy})^{1/2}\,\mathbf{a}, \qquad \mathbf{d} = \operatorname{diag}(\mathbf{S}_{xx})^{1/2}\,\mathbf{b}$$
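A minimal Python sketch of this rescaling follows (standard library only, not the lecture's SAS): each raw weight is multiplied by the standard deviation of its variable, which is the square root of the matching diagonal entry of the covariance matrix. The weights and variances are hypothetical, and `standardize` is an illustrative helper name.

```python
from math import sqrt

def standardize(weights, variances):
    """Scale raw canonical weights by each variable's standard deviation."""
    return [w * sqrt(v) for w, v in zip(weights, variances)]

# Hypothetical raw weights and variances (diagonal of Syy); not lecture output.
a = [0.5, -0.2]
var_y = [4.0, 25.0]

c = standardize(a, var_y)
print(c)
```

In this made-up case the two raw weights differ in size only because the variables are on different scales; after standardizing they have equal magnitude.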
Canonical Corr. Properties
1. Canonical correlations are invariant to changes of scale.
■ This means that, like any correlation, scale changes (such as standardizing) will not change the correlation.
■ However, it will change the eigenvectors...
2. The first canonical correlation is the best we can do with associations.
■ Which means it is at least as large as the absolute value of any simple correlation between an x and a y, and at least as large as any multiple correlation of one variable with the other set
Hypothesis Test for Corr.
■ We begin by testing that at least the first (the largest) correlation is significantly different from zero
■ If we cannot get a significant relationship out of the optimal linear combination of variables, no other linear combination will give one, so this is the same as testing
H0 : Σxy = 0 or B1 = 0
◆ This is tested using Wilks' Lambda:
$$\Lambda_1 = \frac{|\mathbf{S}|}{|\mathbf{S}_{yy}|\,|\mathbf{S}_{xx}|}$$
■ Or, equivalently (where $r_i^2$ is the i-th eigenvalue of the matrix product formed from the submatrices of the covariance matrix):
$$\Lambda_1 = \prod_{i=1}^{s} (1 - r_i^2)$$
The Rest
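■ In this case $\Lambda_1 = \prod_{i=1}^{s} (1 - r_i^2)$, which can be compared to $\Lambda_{\alpha,p,q,n-1-q}$ (or to $\Lambda_{\alpha,q,p,n-1-p}$)
■ In general we can compute
$$\Lambda_k = \prod_{i=k}^{s} (1 - r_i^2)$$
which can be compared to $\Lambda_{\alpha,p-k+1,q-k+1,n-k-q}$ (or to the analogous critical value with the roles of p and q exchanged)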
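The sequence of test statistics is simple to compute once the canonical correlations are known. The Python sketch below (standard library only, not the lecture's SAS) uses hypothetical canonical correlations; `wilks_sequence` is an illustrative name, and the comparison against critical values is left out.

```python
from math import prod

def wilks_sequence(canonical_r):
    """Return [Lambda_1, ..., Lambda_s], Lambda_k = prod over i >= k of (1 - r_i^2)."""
    return [prod(1 - r * r for r in canonical_r[k:])
            for k in range(len(canonical_r))]

# Hypothetical canonical correlations, largest first; not lecture output.
r = [0.8, 0.5]

lambdas = wilks_sequence(r)
print(lambdas)  # Lambda_1 uses every r_i; Lambda_2 drops the first
```

Each later statistic drops the largest remaining correlation, so the values move toward 1 as less association is left to test.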
Interpretation
■ Because in many ways a canonical correlation analysis is
similar to what we discussed in PCA, the interpretation methods are also similar
■ Specifically, we will discuss four methods that are used to
interpret the results:
1. Standardized Coefficients
2. Correlation between Canonical Variates (the linear combination) and each variable
3. Rotation
4. Redundancy Analysis
Standardized
■ Because the standardized coefficients are on the same scale they can be directly compared
■ Those variables that are most important to the association are the ones whose coefficients have the largest absolute values (i.e., the magnitudes indicate importance)
■ To interpret what the linear combination is capturing we will
Correlation of Linear Combination with Variables
■ This was mentioned in PCA and EFA...
■ That is, we compute our linear combinations and then compute the correlation between each linear combination (canonical variate) and each of the actual variables
◆ The correlations are typically called the loadings or
structure coefficients
■ As was the case in PCA this ignores the overall multidimensional structure and so it is not a recommended basis for interpretation
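For reference, the structure coefficients can be computed from a covariance matrix and a weight vector without forming the variate for each observation: the covariance of variable i with the combination w'x is the i-th entry of Sw, and the variance of the combination is w'Sw. The Python sketch below (standard library only, not the lecture's SAS) uses hypothetical values; `loadings` is an illustrative name.

```python
from math import sqrt

def loadings(S, w):
    """Correlation of each variable with the linear combination w'x."""
    n = len(w)
    # cov(x_i, w'x) for each variable i
    Sw = [sum(S[i][j] * w[j] for j in range(n)) for i in range(n)]
    # var(w'x)
    var_comb = sum(w[i] * Sw[i] for i in range(n))
    return [Sw[i] / sqrt(S[i][i] * var_comb) for i in range(n)]

# Hypothetical covariance matrix and weights; not lecture output.
Sxx = [[1.0, 0.5], [0.5, 1.0]]
b = [0.8, 0.3]

structure = loadings(Sxx, b)
print(structure)
```

In this made-up case the second variable has a small weight but still correlates strongly with the variate because it is correlated with the first variable, which is one reason loadings and weights can tell different stories.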
Rotation
■ We could try rotating the weights of the analysis to provide
an interpretable result...
■ For this we begin to rely on the spatial representation of what is going on with the data
■ Every linear combination is projecting our observations on to a different dimension
◆ Sometimes these dimensions are difficult to interpret (i.e., based on the sign and magnitude of the weights)
■ Sometimes we can rotate these dimensions so that the
weights are easier to interpret
◆ Some are large and some are small
■ Rotations in CCA are not recommended, because we lose
Redundancy
■ Another method for interpretation is a redundancy analysis (this, again, is
often not liked by statisticians because it only summarizes univariate relationships)
Another Example
■ In a study of social support and mental health, measures of
the following seven variables were taken on 405 subjects:
◆ Total Social Support
◆ Family Social Support
◆ Friend Social Support
◆ Significant Other Social Support
◆ Depression
◆ Loneliness
◆ Stress
■ The researchers were interested in determining the
relationship between social support and mental health...how about using a canonical correlation analysis?
*SAS Example #3;
data depress (type=corr);
  _type_='corr';
  input _name_ $ v1-v7;
  label v1='total social support'
        v2='family social support'
        v3='friend social support'
        v4='significant other social support'
        v5='depression'
        v6='loneliness'
        v7='stress';
  datalines;
v1  1.00    .       .       .       .      .      .
v2  0.8280  1.0000  .       .       .      .      .
v3  0.8136  0.5192  1.0000  .       .      .      .
v4  0.8569  0.5972  0.6109  1.0000  .      .      .
v5 -0.3691 -0.3218 -0.3150 -0.3044  1.0000 .      .
v6 -0.6282 -0.4945 -0.5774 -0.5266  0.5368 1.0000 .
v7 -0.1849 -0.2049 -0.1132 -0.1291  0.4872 0.2846 1.000
;
/* var = social support (v1-v4); with = mental health (v5-v7).
   vprefix/vname label the var set, wprefix/wname label the with set. */
proc cancorr data=depress all corr edf=404
     vprefix=Social_Support vname='Social Support'
     wprefix=Mental_Health wname='Mental Health';
  var v1-v4;
  with v5-v7;
run;
Other Analyses
■ In general, the results from a canonical correlations routine
are related to:
1. Regression
2. Discriminant Analysis (we will learn this next week)
3. MANOVA
■ However, the goals of canonical correlation overlap with the
information provided by a confirmatory factor analysis or structural equation model...
Final Thought
■ The midterm was accomplished using MANOVA and MANCOVA.
■ Canonical correlation analysis is a complicated analysis that provides many
results of interest to researchers.
■ Perhaps because of its complicated nature, canonical correlation analysis is
not often used.
■ Last week: Nebraska...This week: Texas...After that: The world.
■ Tomorrow: Lab Day! Meet in Helen Newberry’s Michigan Lab