Canonical Correlation Analysis


Lecture 11 August 4, 2011

Advanced Multivariate Statistical Methods ICPSR Summer Session #2


Overview: Today’s Lecture, Canonical Correlations, Computation, Interpretation, Another Example, Other Analyses, Wrapping Up

Today’s Lecture

■ Canonical Correlation Analysis
◆ What it is
◆ How it works
◆ How to do such an analysis


Purpose

■ In general, when we have univariate data there are times when we would like to measure the linear relationship between variables
◆ The simplest case is when we have two variables and all we are interested in is measuring their linear relationship. Here we would just use the bivariate correlation
◆ Another case is multiple regression, where we have several independent variables and one dependent variable. In this case we would use the multiple correlation coefficient ($R$, usually reported as $R^2$)
■ So, it would be nice if we could extend the idea used in these cases to a situation where we have several $y$ variables and several $x$ variables


Concept

■ From Webster’s Dictionary: canonical: reduced to the simplest or clearest schema possible.
■ What do we mean by basic ideas?
■ In describing canonical correlation, we will start with the basic case where we only have two variables and build on it until we get to canonical correlations

1. First we will look at the bivariate correlation
2. Then we will see what was done to generalize the bivariate correlation to the multiple correlation coefficient
3. Finally, these discussions will lead us right to what happens in canonical correlation analysis


Bivariate Correlation

■ Begin by thinking of just two variables, $y$ and $x$
■ In this case the correlation describes the extent to which one variable relates to (can predict) the other
■ That is, the stronger the correlation, the more we will know about $y$ by just knowing $x$


Multiple Correlation

■ On the other hand, if we have one $y$ and multiple $x$ variables, we can no longer look at a simple relationship between two variables
■ But we can look at how well the set of $x$ variables predicts $y$ by computing the regression line
■ Using the regression line we can compute the predicted value $\hat{y}$ and compare it to the $y$ variable
◆ Specifically, we now have only two variables, $y$ and $\hat{y} = x'b$, so we can compute a simple correlation
■ Note: we started with something more complicated (many $x$ variables) and changed it into something for which we could compute a simple correlation (between $y$ and $\hat{y}$)
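The idea on this slide can be checked numerically. The following is a minimal sketch in Python/NumPy (not the SAS used in this lecture) on synthetic data; the sample size, the weights, and the noise level are all made-up assumptions. It fits the regression, forms $\hat{y}$, and compares the squared correlation between $y$ and $\hat{y}$ with the usual $R^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical, not the housing data): 50 cases, 3 predictors.
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.8, size=n)

# Least-squares fit with an intercept gives the weights b.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coef

# Multiple correlation: the simple correlation between y and y_hat.
R = np.corrcoef(y, y_hat)[0, 1]

# The usual R^2 from the regression sums of squares, for comparison.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(R ** 2, r_squared)  # the two values agree up to rounding error
```

Because the model includes an intercept, the two quantities coincide, which is why a many-variable problem can be summarized by one simple correlation.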


Multiple Correlation Example

From Weisberg (1985, p. 240).

“Property taxes on a house are supposedly dependent on the current market value of the house. Since houses actually sell only rarely, the sale price of each house must be estimated every year when property taxes are set. Regression methods are sometimes used to make up a prediction function.”

We have data for 27 houses sold in the mid-1970s in Erie, Pennsylvania:

■ $x_1$: Current taxes (local, school, and county) ÷ 100 (dollars)
■ $x_2$: Number of bathrooms
■ $x_3$: Living space ÷ 1000 (square feet)
■ $x_4$: Age of house (years)


Multiple Correlation Example

To compute the multiple correlation of x1, x2, x3, and x4 with y, first compute the multiple regression for all x variables and y:

proc reg data=house;
  model y=x1-x4;               * regress y on x1 through x4;
  output out=newdata p=yhat;   * save the predicted values as yhat;
run;

Then, take the predicted values given by the model, $\hat{y}$, and correlate them with $y$:

proc corr data=newdata;
  var yhat y;
run;


Canonical Correlation

■ Canonical correlation seeks to find the correlation between multiple $x$ variables and multiple $y$ variables
■ Now we have several $y$ variables and several $x$ variables, so neither of our previous two cases applies directly, but we can take the ideas from those cases and use them here
■ We could look at how well the set of $x$ variables predicts the set of $y$ variables, but in doing this we still would not be able to compute a simple correlation
■ On the other hand, in multiple regression we found a linear combination of the variables, $b'x$, to get a single variable
◆ In our case we have two sets of variables, so it makes sense to define two linear combinations: one for the $x$ variables ($b_1$) and one for the $y$ variables ($a_1$)


Canonical Correlation

■ In the simple case where we have a single linear combination for each set of variables, we can compute the simple correlation between these two linear combinations
■ The first canonical correlation is the correlation between these two new variables ($b_1'x$ and $a_1'y$)
■ So how do we pick the linear transformations?
◆ The weights $b_1$ and $a_1$ are picked such that the correlation between the two new variables is maximized
◆ Notice that this idea is really no different from what we did in multiple regression
◆ This also sounds similar to something we have done in PCA


Canonical Correlation

■ ONE LAST THING
■ Think back to PCA, when we said that a single linear combination did not account for all of the information present in a data set...
◆ Then we could determine how many linear combinations were needed to capture more information (where the linear combinations were all uncorrelated)
■ We can do the same thing here...
◆ We can define more pairs of linear combinations ($b_i$ and $a_i$, $i = 1, \ldots, s$, where $s = \min(p, q)$, $p$ is the number of $x$ variables, and $q$ is the number of $y$ variables)
◆ Each pair of linear combinations maximizes the correlation between the new variables under the constraint that they are uncorrelated with all previous linear combinations


Computation

■ To show how to compute canonical correlations, first consider the covariance matrix from our housing example:

            x1         x2         x3          x4           y
x1      8.3100     1.0700     1.3400    −15.0300     37.7400
x2      1.0700     0.1800     0.2100     −1.2500      5.6000
x3      1.3400     0.2100     0.3100     −1.4000      7.4200
x4    −15.0300    −1.2500    −1.4000    197.4900    −62.3900
y      37.7400     5.6000     7.4200    −62.3900    204.7000


Computation

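■ From this matrix, we will define four sub-matrices, from which we will calculate our correlations:

$$S = \begin{pmatrix} S_{xx} & S_{xy} \\ S_{xy}' & S_{yy} \end{pmatrix}$$

where the first block of rows and columns corresponds to $x_1, \ldots, x_4$ and the last to $y$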


Computation

■ So how do we compute the canonical correlations?
■ To begin, note that we could define the squared multiple correlation $R^2_M$ as

$$R^2_M = \frac{|S_{xy}' S_{xx}^{-1} S_{xy}|}{|S_{yy}|}$$

which can be rewritten (with $S_{yx} = S_{xy}'$) as:

$$R^2_M = |S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}|$$

■ For canonical correlations, however, we will focus on the matrix inside the $|\cdot|$ (note this is just a scalar when there is only one $y$ variable)
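As an illustration, the squared multiple correlation can be computed directly from the rounded covariance matrix of the housing example. This is a sketch in Python/NumPy rather than SAS; the variable names are my own, and because the matrix entries are rounded the result is only approximate.

```python
import numpy as np

# Covariance matrix from the housing example, ordered x1, x2, x3, x4, y.
S = np.array([
    [  8.31,   1.07,   1.34, -15.03,  37.74],
    [  1.07,   0.18,   0.21,  -1.25,   5.60],
    [  1.34,   0.21,   0.31,  -1.40,   7.42],
    [-15.03,  -1.25,  -1.40, 197.49, -62.39],
    [ 37.74,   5.60,   7.42, -62.39, 204.70],
])

p = 4                      # number of x variables
S_xx = S[:p, :p]           # 4 x 4 block for the x variables
S_xy = S[:p, p:]           # 4 x 1 block of covariances between x and y
S_yy = S[p:, p:]           # 1 x 1 block: the variance of y

# S_xx^{-1} S_xy, computed with a linear solve instead of an explicit inverse.
b = np.linalg.solve(S_xx, S_xy)

# With a single y, both determinants reduce to scalars.
r2_m = (S_xy.T @ b).item() / S_yy.item()
print(r2_m)
```

With one $y$ variable the determinants are unnecessary; the same code pattern carries over to several $y$ variables once the scalar division is replaced by the matrix product $S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}$.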


Computation

■ We first compute the square roots of the eigenvalues ($r_1, r_2, \ldots, r_s$) and the eigenvectors ($a_1, a_2, \ldots, a_s$) of:

$$S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}$$

■ Then we compute the square roots of the eigenvalues ($r_1, r_2, \ldots, r_s$) and the eigenvectors ($b_1, b_2, \ldots, b_s$) of:

$$S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}$$

■ Conveniently, the nonzero eigenvalues of both matrices are equal (and are between zero and one)!
◆ The square roots of the eigenvalues are the successive canonical correlations between the successive pairs of linear combinations
■ From the eigenvectors we have determined the linear combinations
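The eigenvalue recipe above can be sketched as follows in Python/NumPy (not SAS). The data are synthetic and every name and constant is an assumption made for illustration. The sketch checks that both matrices give the same canonical correlations and that the first one equals the simple correlation between the first pair of linear combinations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (hypothetical): p = 3 x variables, q = 2 y variables.
n, p, q = 200, 3, 2
X = rng.normal(size=(n, p))
Y = 0.6 * X[:, :q] + rng.normal(size=(n, q))  # y depends on the first two x's

S = np.cov(np.column_stack([X, Y]), rowvar=False)
S_xx, S_xy = S[:p, :p], S[:p, p:]
S_yx, S_yy = S[p:, :p], S[p:, p:]

# q x q matrix whose eigenvectors are the y-side weights a_i.
M_y = np.linalg.solve(S_yy, S_yx) @ np.linalg.solve(S_xx, S_xy)
# p x p matrix whose eigenvectors are the x-side weights b_i.
M_x = np.linalg.solve(S_xx, S_xy) @ np.linalg.solve(S_yy, S_yx)

eval_y, A = np.linalg.eig(M_y)
eval_x, B = np.linalg.eig(M_x)

# Sort eigenvalues from largest to smallest and keep the first s = min(p, q).
s = min(p, q)
order_y = np.argsort(eval_y.real)[::-1]
order_x = np.argsort(eval_x.real)[::-1]
r_y = np.sqrt(np.clip(eval_y.real[order_y][:s], 0.0, None))
r_x = np.sqrt(np.clip(eval_x.real[order_x][:s], 0.0, None))

# First pair of weights and the corresponding canonical variates.
a1 = A[:, order_y[0]].real
b1 = B[:, order_x[0]].real
u1, v1 = X @ b1, Y @ a1

# Eigenvector signs are arbitrary, so compare in absolute value.
r1_check = abs(np.corrcoef(u1, v1)[0, 1])
print(r_y, r_x, r1_check)
```

A general eigen-solver is used because the two product matrices are not symmetric; statistical software typically works with a symmetric reformulation instead.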


Example #2

■ To illustrate canonical correlations, consider the following analysis:

Three physiological and three exercise variables are measured on 27 middle-aged men in a fitness club

■ The variables collected are:
◆ Weight (in pounds, $x_1$)
◆ Waist size (in inches, $x_2$)
◆ Pulse rate (in beats per minute, $x_3$)
◆ Number of chin-ups performed ($y_1$)
◆ Number of sit-ups performed ($y_2$)
◆ Number of jumps performed ($y_3$)


Example #2

■ To run a canonical correlation analysis, use the following code:

proc cancorr data=Fit all
     vprefix=Physiological vname='Physiological Measurements'
     wprefix=Exercises wname='Exercises';
  var Weight Waist Pulse;     * first set: physiological variables;
  with Chins Situps Jumps;    * second set: exercise variables;
run;


Standardized Weights

■ Just like in PCA and factor analysis, we are interested in interpreting the weights of the linear combinations
■ However, if our variables are on different scales, the weights are difficult to interpret
■ So we can standardize the variables, which is the same as computing the canonical correlations and linear combinations from the correlation matrix instead of the variance/covariance matrix
■ We can also compute the standardized coefficients ($c$ and $d$) directly:

$$c = \mathrm{diag}(S_{yy})^{1/2} a$$

$$d = \mathrm{diag}(S_{xx})^{1/2} b$$
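A small numerical check of the rescaling, sketched in Python/NumPy (not SAS). The two $y$ variables and the raw weights `a` are made-up values chosen only to show the effect of very different scales. The sketch confirms that standardized weights applied to standardized variables reproduce the same (centered) linear combination as the raw weights applied to the raw variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Two hypothetical y variables on very different scales.
Y = np.column_stack([
    rng.normal(scale=1.0, size=n),
    rng.normal(scale=50.0, size=n),
])
a = np.array([0.8, -0.01])          # hypothetical raw weights

S_yy = np.cov(Y, rowvar=False)
sd = np.sqrt(np.diag(S_yy))         # diag(S_yy)^{1/2}, stored as a vector
c = sd * a                          # standardized weights

Y_centered = Y - Y.mean(axis=0)
Z = Y_centered / sd                 # standardized variables

raw_variate = Y_centered @ a        # raw weights on raw (centered) variables
std_variate = Z @ c                 # standardized weights on standardized variables

print(a, c)
```

The raw weight on the second variable looks negligible only because that variable has a large standard deviation; after rescaling, the two weights are on a comparable footing.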


Canonical Corr. Properties

1. Canonical correlations are invariant to changes of scale.
■ This means that, like any correlation, scale changes (such as standardizing) will not change the canonical correlations
■ However, they will change the eigenvectors...

2. The first canonical correlation is the best we can do with associations.
■ This means it is at least as large as the absolute value of any simple correlation between an $x$ variable and a $y$ variable, and as any multiple correlation of a single variable with the other set


Hypothesis Test for Corr.

■ We begin by testing whether at least the first (the largest) correlation is significantly different from zero
■ Asking whether we can get a significant relationship out of the optimal linear combinations of the variables is the same as testing

$$H_0: \Sigma_{xy} = 0 \quad \text{or} \quad B_1 = 0$$

◆ This is tested using Wilks’ Lambda:

$$\Lambda_1 = \frac{|S|}{|S_{yy}||S_{xx}|}$$

■ Or, equivalently (where $r_i^2$ is the $i$th eigenvalue of the matrix formed from the submatrices of the covariance matrix):

$$\Lambda_1 = \prod_{i=1}^{s} (1 - r_i^2)$$
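The equivalence of the two expressions for $\Lambda_1$ can be verified numerically. Below is a sketch in Python/NumPy (not SAS) on synthetic data; all names and constants are assumptions for illustration. It computes only the statistic, not the critical value it would be compared against.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data (hypothetical): p = 3 x variables, q = 2 y variables.
n, p, q = 150, 3, 2
X = rng.normal(size=(n, p))
Y = 0.5 * X[:, :q] + rng.normal(size=(n, q))

S = np.cov(np.column_stack([X, Y]), rowvar=False)
S_xx, S_xy = S[:p, :p], S[:p, p:]
S_yx, S_yy = S[p:, :p], S[p:, p:]

# Determinant form: |S| / (|S_yy| |S_xx|).
lambda_det = np.linalg.det(S) / (np.linalg.det(S_yy) * np.linalg.det(S_xx))

# Product form: the eigenvalues of S_yy^{-1} S_yx S_xx^{-1} S_xy are the
# squared canonical correlations r_i^2 (here q = min(p, q), so all q are used).
M = np.linalg.solve(S_yy, S_yx) @ np.linalg.solve(S_xx, S_xy)
r_squared = np.linalg.eigvals(M).real
lambda_prod = np.prod(1.0 - r_squared)

print(lambda_det, lambda_prod)  # the two forms agree up to rounding error
```

Small values of $\Lambda_1$ correspond to large canonical correlations, so the null hypothesis is rejected when the statistic is small.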


The Rest

■ In this case we compute $\Lambda_1$ as

$$\Lambda_1 = \prod_{i=1}^{s} (1 - r_i^2)$$

which can be compared to $\Lambda_{\alpha,p,q,n-1-q}$ (or to $\Lambda_{\alpha,q,p,n-1-p}$)

■ In general we can compute

$$\Lambda_k = \prod_{i=k}^{s} (1 - r_i^2)$$

which can be compared to $\Lambda_{\alpha,p-k+1,q-k+1,n-k-q}$ (or to a


Interpretation

■ Because in many ways a canonical correlation analysis is similar to what we discussed in PCA, the interpretation methods are also similar
■ Specifically, we will discuss four methods that are used to interpret the results:

1. Standardized coefficients
2. Correlation between the canonical variates (the linear combinations) and each variable
3. Rotation
4. Redundancy analysis


Standardized

■ Because the standardized variables are on the same scale, their coefficients can be directly compared
■ The variables that are most important to the association are the ones whose coefficients have the largest absolute values (i.e., the coefficients determine importance)
■ To interpret what the linear combination is capturing we will


Correlation of Linear Combination with Variables

■ This was mentioned in PCA and EFA...
■ That is, we compute our linear combinations (the canonical variates) and then compute the correlation between each linear combination and each of the actual variables
◆ These correlations are typically called the loadings or structure coefficients
■ As was the case in PCA, this ignores the overall multidimensional structure, so it is not a recommended analysis to make interpretations from
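For concreteness, the loadings of the first $x$-side canonical variate can be computed as below. This is a sketch in Python/NumPy (not SAS) on synthetic data, with all names and constants assumed for illustration. The loadings are obtained both by brute-force correlation and from the covariance blocks.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data (hypothetical): p = 3 x variables, q = 2 y variables.
n, p, q = 200, 3, 2
X = rng.normal(size=(n, p))
Y = 0.6 * X[:, :q] + rng.normal(size=(n, q))

S = np.cov(np.column_stack([X, Y]), rowvar=False)
S_xx, S_xy = S[:p, :p], S[:p, p:]
S_yx, S_yy = S[p:, :p], S[p:, p:]

# Weights of the first x-side canonical variate: the leading eigenvector
# of S_xx^{-1} S_xy S_yy^{-1} S_yx.
M_x = np.linalg.solve(S_xx, S_xy) @ np.linalg.solve(S_yy, S_yx)
eigvals, eigvecs = np.linalg.eig(M_x)
b1 = eigvecs[:, np.argmax(eigvals.real)].real

# Loadings: correlation of the canonical variate with each original x variable.
u1 = X @ b1
loadings_x = np.array([np.corrcoef(u1, X[:, j])[0, 1] for j in range(p)])

# The same loadings from the covariance blocks:
# (S_xx b1)_j / (sd(x_j) * sd(u1)).
loadings_formula = (S_xx @ b1) / (np.sqrt(np.diag(S_xx)) * np.sqrt(b1 @ S_xx @ b1))

print(loadings_x)
```

Each loading is a one-variable-at-a-time summary, which is the reason for the caution on this slide.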


Rotation

■ We could try rotating the weights of the analysis to provide an interpretable result...
■ For this we begin to rely on the spatial representation of what is going on with the data
■ Every linear combination projects our observations onto a different dimension
◆ Sometimes these dimensions are difficult to interpret (i.e., based on the sign and magnitude)
■ Sometimes we can rotate these dimensions so that the weights are easier to interpret
◆ Some are large and some are small
■ Rotations in CCA are not recommended, because we lose


Redundancy

■ Another method for interpretation is a redundancy analysis (this, again, is often not liked by statisticians because it only summarizes univariate relationships)


Another Example

■ In a study of social support and mental health, measures of the following seven variables were taken on 405 subjects:
◆ Total Social Support
◆ Family Social Support
◆ Friend Social Support
◆ Significant Other Social Support
◆ Depression
◆ Loneliness
◆ Stress
■ The researchers were interested in determining the relationship between social support and mental health... how about using a canonical correlation analysis?


*SAS Example #3;

data depress (type=corr);
  _type_='corr';
  input _name_ $ v1-v7;
  label v1='total social support'
        v2='family social support'
        v3='friend social support'
        v4='significant other social support'
        v5='depression'
        v6='loneliness'
        v7='stress';
  datalines;
v1  1.00    .       .       .       .       .       .
v2  0.8280  1.0000  .       .       .       .       .
v3  0.8136  0.5192  1.0000  .       .       .       .
v4  0.8569  0.5972  0.6109  1.0000  .       .       .
v5 -0.3691 -0.3218 -0.3150 -0.3044  1.0000  .       .
v6 -0.6282 -0.4945 -0.5774 -0.5266  0.5368  1.0000  .
v7 -0.1849 -0.2049 -0.1132 -0.1291  0.4872  0.2846  1.000
;

* v1-v4 are the social support measures and v5-v7 the mental health measures;
* so the prefixes are assigned to match the var and with lists;
proc cancorr data=depress all corr edf=404
     vprefix=Social_Support vname='Social Support'
     wprefix=Mental_Health wname='Mental Health';
  var v1-v4;
  with v5-v7;
run;


Other Analyses

■ In general, the results from a canonical correlation routine are related to:

1. Regression
2. Discriminant Analysis (we will learn this next week)
3. MANOVA

■ However, the goals of canonical correlation overlap with the information provided by a confirmatory factor analysis or structural equation model...


Final Thought

■ The midterm was accomplished using MANOVA and MANCOVA.
■ Canonical correlation analysis is a complicated analysis that provides many results of interest to researchers.
■ Perhaps because of its complicated nature, canonical correlation analysis is not often used.
■ Last week: Nebraska... This week: Texas... After that: The world.
■ Tomorrow: Lab Day! Meet in Helen Newberry’s Michigan Lab