4 Correlation analysis
Definition 4.1. A correlation coefficient that measures how well a
(1) linear function,
(2) conjugate linear function,
(3) widely linear function
models the relationship between two complex random variables is respectively called a
(1) rotational correlation coefficient,
(2) reflectional correlation coefficient,
(3) total correlation coefficient.
4.1.2 Principle of multivariate correlation analysis
We would now like to define a scalar-valued correlation coefficient that gives an overall measure of the association between two zero-mean random vectors. The following definition sets out the minimum requirements for such a correlation coefficient.
Definition 4.2. A correlation coefficient $\rho_{xy}$ between two random vectors $\mathbf{x}$ and $\mathbf{y}$ must satisfy the following conditions for all nonzero scalars $\alpha$ and $\beta$, provided that $\mathbf{x}$ and $\mathbf{y}$ are not both zero.

$$0 \le \rho_{xy} \le 1, \tag{4.17}$$
$$\rho_{xy} = \rho_{x'y} = \rho_{xy'} \quad \text{for } \mathbf{x}' = \alpha\mathbf{x},\ \mathbf{y}' = \beta\mathbf{y}, \tag{4.18}$$
$$\rho_{xy} = 1 \quad \text{if } \mathbf{y} = \beta\mathbf{x}, \tag{4.19}$$
$$\rho_{xy} = 0 \quad \text{if } \mathbf{x} \text{ and } \mathbf{y} \text{ are uncorrelated.} \tag{4.20}$$

Note that we do not require the symmetry $\rho_{xy} = \rho_{yx}$.³ If a correlation coefficient is allowed to be negative or complex-valued (as we have seen in the previous section), these conditions apply to its absolute value. However, the correlation coefficients considered hereafter are all real and nonnegative. For simplicity, we consider only rotational correlations and strictly linear transforms in this section. Reflectional and total correlations will be discussed in the following section.

Figure 4.4 The principles of multivariate correlation analysis: (a) rotational correlations, (b) reflectional correlations, and (c) total correlations.
A correlation coefficient that only satisfies (4.17)–(4.20) will probably not be very useful. Usually, further cases are required to yield a unit correlation coefficient $\rho_{xy} = 1$, such as when $\mathbf{y} = \mathbf{M}\mathbf{x}$, where $\mathbf{M}$ is any nonsingular matrix, or $\mathbf{y} = \mathbf{U}\mathbf{x}$, where $\mathbf{U}$ is any unitary matrix. There are further desirable properties. Chief among them are invariance under specified classes of transformations on $\mathbf{x}$ and $\mathbf{y}$, and the ability to assess correlation in a lower-dimensional subspace. What exactly this means will become clearer as we move along in our development.
The cross-correlation properties between $\mathbf{x}$ and $\mathbf{y}$ are described by the cross-correlation matrix $\mathbf{R}_{xy} = E[\mathbf{x}\mathbf{y}^H]$, but this matrix is generally difficult to interpret. In order to illuminate the underlying cross-correlation structure, we shall transform $n$-dimensional $\mathbf{x}$ and $m$-dimensional $\mathbf{y}$ into $p$-dimensional internal (latent) representations $\boldsymbol{\xi} = \mathbf{A}\mathbf{x}$ and $\boldsymbol{\omega} = \mathbf{B}\mathbf{y}$, with $p = \min(m, n)$, as shown in Fig. 4.4(a). The way in which the full-rank matrices $\mathbf{A} \in \mathbb{C}^{p \times n}$ and $\mathbf{B} \in \mathbb{C}^{p \times m}$ are chosen will determine the type of correlation analysis. In the statistical literature, the latent vectors $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are usually called score vectors, and the matrices $\mathbf{A}$ and $\mathbf{B}$ are called the matrices of loadings.
Our goal is to define different correlation coefficients as different functions of the correlations $k_i = E[\xi_i\omega_i^*]$, $i = 1, \ldots, p$, which are the diagonal elements of the cross-correlation matrix $\mathbf{K} = E[\boldsymbol{\xi}\boldsymbol{\omega}^H]$ in the internal coordinate system of $(\boldsymbol{\xi}, \boldsymbol{\omega})$. We would like as much correlation as possible concentrated in the first $r$ coefficients $\{k_1, k_2, \ldots, k_r\}$, for any $r \le p$, because this will allow us to assess correlation in a lower-dimensional subspace of dimension $r$. Hence, our aim is to choose $\mathbf{A}$ and $\mathbf{B}$ such that all partial sums
4.1 Measuring multivariate association 93
over the absolute values of the diagonal cross-correlations $k_i$ are maximized:

$$\max_{\mathbf{A},\mathbf{B}} \sum_{i=1}^{r} |k_i|, \quad r = 1, \ldots, p. \tag{4.21}$$
In order to make this a well-defined maximization problem, we need to impose some constraints on A and B. The following three choices are most compelling.
• Require that the internal representations $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ each have identity correlation matrix (we avoid using the term “white” because $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ may have non-identity complementary correlation matrices): $\mathbf{R}_{\xi\xi} = \mathbf{A}\mathbf{R}_{xx}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{R}_{\omega\omega} = \mathbf{B}\mathbf{R}_{yy}\mathbf{B}^H = \mathbf{I}$. This choice leads to canonical correlation analysis (CCA). The corresponding diagonal cross-correlations $k_i$ are called the canonical correlations, and the latent vectors $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are given in canonical coordinates.
• Require that $\mathbf{A}$ have unitary rows (which we will simply call row-unitary) and $\boldsymbol{\omega}$ have identity correlation matrix: $\mathbf{A}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{R}_{\omega\omega} = \mathbf{B}\mathbf{R}_{yy}\mathbf{B}^H = \mathbf{I}$. This choice leads to multivariate linear regression (MLR), also known as half-canonical correlation analysis. The corresponding diagonal cross-correlations $k_i$ are called the half-canonical correlations, and the latent vectors $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are given in half-canonical coordinates.
• Require that $\mathbf{A}$ and $\mathbf{B}$ be row-unitary: $\mathbf{A}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{B}\mathbf{B}^H = \mathbf{I}$. This choice leads to partial least-squares (PLS) analysis. The corresponding diagonal cross-correlations $k_i$ are called the PLS correlations, and the latent vectors $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are given in PLS coordinates.
Sometimes, when there is a risk of confusion, we will use the subscript C, M, or P to emphasize that quantities were derived using CCA, MLR, or PLS.
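The partial-sum criterion (4.21) can be probed numerically for the PLS constraints. The sketch below is only an illustration under stated assumptions: the matrix `R_xy` is a hypothetical random cross-correlation matrix, the helper names are invented here, and the candidate optimum built from the singular vectors of `R_xy` is assumed rather than derived at this point in the text. It compares that candidate against many random row-unitary pairs.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 4
p = min(m, n)

def crandn(*shape):
    """Complex Gaussian array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def random_row_unitary(rows, cols):
    """Random rows x cols matrix with orthonormal rows, via a reduced QR factorization."""
    Q, _ = np.linalg.qr(crandn(cols, rows))  # Q is cols x rows with orthonormal columns
    return Q.conj().T

R_xy = crandn(n, m)  # hypothetical n x m cross-correlation matrix

# Candidate optimum (an assumption here): A and B from the singular vectors of R_xy.
F, sigma, Gh = np.linalg.svd(R_xy, full_matrices=False)
A_svd, B_svd = F.conj().T, Gh
k_svd = np.diag(A_svd @ R_xy @ B_svd.conj().T)
best = np.cumsum(np.abs(k_svd))  # partial sums for r = 1, ..., p

# Compare against many random feasible pairs (A, B) with A A^H = I and B B^H = I.
smallest_margin = np.inf
for _ in range(500):
    A = random_row_unitary(p, n)
    B = random_row_unitary(p, m)
    k = np.diag(A @ R_xy @ B.conj().T)
    partial = np.cumsum(np.abs(k))
    smallest_margin = min(smallest_margin, np.min(best - partial))

print("partial sums with SVD-based A, B:", np.round(best, 3))
print("smallest margin over random feasible pairs:", smallest_margin)
```

A nonnegative margin means that no random feasible pair exceeded the SVD-based choice in any partial sum. A random search of this kind is a sanity check, not a proof.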
Example 4.3. If $x$ and $y$ are scalars, then the CCA constraints are $|A|^2 = R_{xx}^{-1}$ and $|B|^2 = R_{yy}^{-1}$. Thus, the latent variables are $\xi = R_{xx}^{-1/2} e^{j\phi_1} x$ and $\omega = R_{yy}^{-1/2} e^{j\phi_2} y$ for arbitrary $\phi_1$ and $\phi_2$. We then obtain

$$|k| = |E[\xi\omega^*]| = \frac{|R_{xy}|}{\sqrt{R_{xx}R_{yy}}} = |\rho_{xy}|,$$

with $\rho_{xy}$ defined in (4.5).
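Example 4.3 can be checked with sample statistics. The following is a minimal sketch in pure Python, assuming hypothetical zero-mean data in which `y` is a scaled and rotated copy of `x` plus noise. The data model, the phases `phi1` and `phi2`, and the helper names are illustrative choices, not part of the example itself.

```python
import cmath
import random

random.seed(0)

def cgauss():
    """One complex sample with independent standard-normal real and imaginary parts."""
    return complex(random.gauss(0, 1), random.gauss(0, 1))

def corr(u, v):
    """Sample estimate of E[u v*]."""
    return sum(a * b.conjugate() for a, b in zip(u, v)) / len(u)

# Hypothetical data: y depends linearly on x, plus independent noise.
num_samples = 2000
x = [cgauss() for _ in range(num_samples)]
y = [(0.8 - 0.3j) * s + 0.5 * cgauss() for s in x]

R_xx = corr(x, x).real
R_yy = corr(y, y).real
R_xy = corr(x, y)

# CCA scaling |A|^2 = 1/R_xx, |B|^2 = 1/R_yy, with arbitrary phases.
phi1, phi2 = 0.7, -2.1
A = R_xx ** -0.5 * cmath.exp(1j * phi1)
B = R_yy ** -0.5 * cmath.exp(1j * phi2)
xi = [A * s for s in x]
omega = [B * s for s in y]

k_abs = abs(corr(xi, omega))
rho_abs = abs(R_xy) / (R_xx * R_yy) ** 0.5
print("|k| =", k_abs, " |rho_xy| =", rho_abs)
```

The two printed values agree up to rounding, whatever phases are chosen, because the phases cancel in the absolute value.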
We will find in Section 4.1.4 that the solution to the maximization problem (4.21) for CCA, MLR, or PLS results in a diagonal cross-correlation matrix

$$\mathbf{K} = E[\boldsymbol{\xi}\boldsymbol{\omega}^H] = \mathrm{Diag}(k_1, \ldots, k_p), \tag{4.22}$$

with $k_1 \ge k_2 \ge \cdots \ge k_p \ge 0$. Of course, the $k_i$ depend on which of the principles CCA, MLR, and PLS is employed, so they are canonical correlations, half-canonical correlations, or PLS correlations. Furthermore, we will show that CCA, MLR, and PLS each produce a set $\{k_i\}$ that has maximum spread in the sense of majorization (see Appendix 3). Therefore, any correlation coefficient that is an increasing, Schur-convex function of $\{k_i\}$ is maximized, for arbitrary rank $r$.
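The diagonal structure in (4.22) can be illustrated numerically. The sketch below assumes the standard SVD-based construction: whiten as each constraint set demands, then take the singular value decomposition of the transformed cross-correlation matrix. The text defers the actual derivation to Section 4.1.4, so the construction, the helper names `inv_sqrt` and `latent_transforms`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 4, 3, 5000
p = min(m, n)

def crandn(*shape):
    """Complex Gaussian array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def inv_sqrt(R):
    """Hermitian inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(R)
    return (V / np.sqrt(w)) @ V.conj().T

# Hypothetical zero-mean data; columns are samples.
x = crandn(n, N)
y = crandn(m, n) @ x + crandn(m, N)

R_xx = x @ x.conj().T / N
R_yy = y @ y.conj().T / N
R_xy = x @ y.conj().T / N

def latent_transforms(W_x, W_y):
    """From the SVD  W_x R_xy W_y^H = F Diag(k) G^H, return A = F^H W_x, B = G^H W_y, k."""
    F, k, Gh = np.linalg.svd(W_x @ R_xy @ W_y.conj().T, full_matrices=False)
    return F.conj().T @ W_x, Gh @ W_y, k

results = {
    "CCA": latent_transforms(inv_sqrt(R_xx), inv_sqrt(R_yy)),
    "MLR": latent_transforms(np.eye(n), inv_sqrt(R_yy)),
    "PLS": latent_transforms(np.eye(n), np.eye(m)),
}

for name, (A, B, k) in results.items():
    K = A @ R_xy @ B.conj().T  # cross-correlation matrix of the latent vectors
    print(name, "correlations:", np.round(k, 3),
          "| K diagonal:", np.allclose(K, np.diag(k)))
```

Only the whitening differs among the three cases: both sides for CCA, only the `y` side for MLR, and neither side for PLS.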
The key difference among CCA, MLR, and PLS lies in their invariance properties. We will see that CCA is invariant under nonsingular linear transformation of both $\mathbf{x}$ and $\mathbf{y}$, MLR is invariant under nonsingular linear transformation of $\mathbf{y}$ but only unitary transformation of $\mathbf{x}$, and PLS is invariant under unitary transformation of both $\mathbf{x}$ and $\mathbf{y}$. Therefore, CCA and PLS provide a symmetric assessment of correlation since the roles of $\mathbf{x}$ and $\mathbf{y}$ are interchangeable. MLR, on the other hand, distinguishes between the message (or predictor/explanatory variables) $\mathbf{x}$ and the measurement (or criterion/response variables) $\mathbf{y}$. The correlation analysis technique must be chosen to match the invariance properties of the problem at hand.
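The invariance properties just described can be checked on synthetic data. This sketch again assumes that the three sets of correlations are computed as singular values of suitably whitened cross-correlation matrices, which is an assumption at this point in the text. The data, the random transformations, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 4, 3, 2000

def crandn(*shape):
    """Complex Gaussian array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def inv_sqrt(R):
    """Hermitian inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(R)
    return (V / np.sqrt(w)) @ V.conj().T

def correlations(x, y):
    """Sample CCA, MLR, and PLS correlations, computed as singular values."""
    num = x.shape[1]
    R_xx = x @ x.conj().T / num
    R_yy = y @ y.conj().T / num
    R_xy = x @ y.conj().T / num
    W_x, W_y = inv_sqrt(R_xx), inv_sqrt(R_yy)
    sv = lambda M: np.linalg.svd(M, compute_uv=False)
    return {"CCA": sv(W_x @ R_xy @ W_y), "MLR": sv(R_xy @ W_y), "PLS": sv(R_xy)}

x = crandn(n, N)
y = crandn(m, n) @ x + crandn(m, N)

M_x, M_y = crandn(n, n), crandn(m, m)  # nonsingular with probability one
U_x, _ = np.linalg.qr(crandn(n, n))    # unitary
U_y, _ = np.linalg.qr(crandn(m, m))    # unitary

base = correlations(x, y)
both_nonsingular = correlations(M_x @ x, M_y @ y)
unitary_x = correlations(U_x @ x, M_y @ y)
both_unitary = correlations(U_x @ x, U_y @ y)

print("CCA, nonsingular on x and y:", np.allclose(base["CCA"], both_nonsingular["CCA"]))
print("MLR, unitary on x, nonsingular on y:", np.allclose(base["MLR"], unitary_x["MLR"]))
print("MLR, nonsingular on x and y:", np.allclose(base["MLR"], both_nonsingular["MLR"]))
print("PLS, unitary on x and y:", np.allclose(base["PLS"], both_unitary["PLS"]))
print("PLS, nonsingular on x and y:", np.allclose(base["PLS"], both_nonsingular["PLS"]))
```

The first, second, and fourth checks should report agreement. The third and fifth apply a transformation outside the respective invariance class, so the correlations generally change.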
4.1.3 Rotational, reflectional, and total correlations for complex vectors
It is easy to see how to apply these principles to reflectional and total correlations, as shown in Figs. 4.4(b) and (c). For reflectional correlations, the internal representation is $\boldsymbol{\xi} = \mathbf{A}\mathbf{x}$ and $\boldsymbol{\omega}^* = \mathbf{B}\mathbf{y}^*$, whose complementary cross-correlation matrix is $\tilde{\mathbf{K}} = E[\boldsymbol{\xi}\boldsymbol{\omega}^T]$, and $\tilde{k}_i = E[\xi_i\omega_i]$. The maximization problem is then to maximize all partial sums over the absolute values of the complementary cross-correlations

$$\max_{\mathbf{A},\mathbf{B}} \sum_{i=1}^{r} |\tilde{k}_i|, \quad r = 1, \ldots, p, \tag{4.23}$$
with the following constraints on A and B.
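• For CCA, $\mathbf{A}\mathbf{R}_{xx}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{B}\mathbf{R}_{yy}^*\mathbf{B}^H = \mathbf{I}$.
• For MLR, $\mathbf{A}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{B}\mathbf{R}_{yy}^*\mathbf{B}^H = \mathbf{I}$.
• For PLS, $\mathbf{A}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{B}\mathbf{B}^H = \mathbf{I}$.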
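To see what the reflectional constraints capture, consider data in which `y` is a conjugate-linear function of a proper `x`. The sketch below is illustrative only: the mixing matrix `G`, the noise level, and the SVD-based computation of the canonical correlations are assumptions. Under the CCA constraints above, the reflectional case amounts to replacing $\mathbf{R}_{xy}$ by the complementary cross-correlation $E[\mathbf{x}\mathbf{y}^T]$ and $\mathbf{R}_{yy}$ by $\mathbf{R}_{yy}^*$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, N = 4, 3, 5000

def crandn(*shape):
    """Proper (circular) complex Gaussian samples."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def inv_sqrt(R):
    """Hermitian inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(R)
    return (V / np.sqrt(w)) @ V.conj().T

# Hypothetical model: y is conjugate-linear in x, plus a little noise.
G = np.array([[1, 1j, 0, 0],
              [0, 1, -1j, 0],
              [0, 0, 1, 1j]])
x = crandn(n, N)
y = G @ x.conj() + 0.1 * crandn(m, N)

R_xx = x @ x.conj().T / N
R_yy = y @ y.conj().T / N
R_xy = x @ y.conj().T / N   # cross-correlation E[x y^H]
Rc_xy = x @ y.T / N         # complementary cross-correlation E[x y^T]

W_x = inv_sqrt(R_xx)
W_y = inv_sqrt(R_yy)             # rotational case: B R_yy B^H = I
W_y_conj = inv_sqrt(R_yy.conj()) # reflectional case: B R_yy^* B^H = I

sv = lambda M: np.linalg.svd(M, compute_uv=False)
rotational_cca = sv(W_x @ R_xy @ W_y)
reflectional_cca = sv(W_x @ Rc_xy @ W_y_conj)

print("rotational canonical correlations:  ", np.round(rotational_cca, 3))
print("reflectional canonical correlations:", np.round(reflectional_cca, 3))
```

For this model the rotational correlations are small, which reflects only estimation noise, while the reflectional ones are close to one. A purely rotational analysis would miss the dependence entirely.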
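For total correlations, the internal representation is computed as a widely linear function: $\underline{\boldsymbol{\xi}} = \underline{\mathbf{A}}\,\underline{\mathbf{x}}$ (i.e., $\boldsymbol{\xi} = \mathbf{A}_1\mathbf{x} + \mathbf{A}_2\mathbf{x}^*$) and $\underline{\boldsymbol{\omega}} = \underline{\mathbf{B}}\,\underline{\mathbf{y}}$ (i.e., $\boldsymbol{\omega} = \mathbf{B}_1\mathbf{y} + \mathbf{B}_2\mathbf{y}^*$). Here, our goal is to maximize the diagonal cross-correlations between the vectors of real and imaginary parts of $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$. That is, for $i = 1, \ldots, p$, we let

$$\bar{k}_{2i-1} = 2E(\mathrm{Re}\,\xi_i\,\mathrm{Re}\,\omega_i) = \mathrm{Re}(E\,\xi_i\omega_i^* + E\,\xi_i\omega_i), \tag{4.24}$$
$$\bar{k}_{2i} = 2E(\mathrm{Im}\,\xi_i\,\mathrm{Im}\,\omega_i) = \mathrm{Re}(E\,\xi_i\omega_i^* - E\,\xi_i\omega_i), \tag{4.25}$$

and maximize

$$\max_{\underline{\mathbf{A}},\underline{\mathbf{B}}} \sum_{i=1}^{r} |\bar{k}_i|, \quad r = 1, \ldots, 2p, \tag{4.26}$$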
with the following constraints placed on A and B.
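• For CCA, $\underline{\mathbf{A}}\,\underline{\mathbf{R}}_{xx}\underline{\mathbf{A}}^H = \mathbf{I}$ and $\underline{\mathbf{B}}\,\underline{\mathbf{R}}_{yy}\underline{\mathbf{B}}^H = \mathbf{I}$. Hence, $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are each white and proper.
• For MLR, $\underline{\mathbf{A}}\,\underline{\mathbf{A}}^H = \mathbf{I}$ and $\underline{\mathbf{B}}\,\underline{\mathbf{R}}_{yy}\underline{\mathbf{B}}^H = \mathbf{I}$. While $\boldsymbol{\omega}$ is white and proper, $\boldsymbol{\xi}$ is generally improper.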
• For PLS, $\underline{\mathbf{A}}\,\underline{\mathbf{A}}^H = \mathbf{I}$ and $\underline{\mathbf{B}}\,\underline{\mathbf{B}}^H = \mathbf{I}$.
Hence, we have a total of nine possible combinations between the three correlation analysis techniques (CCA, MLR, PLS) and the three different correlation types (rotational, reflectional, total). Each of these nine cases leads to different latent vectors $(\boldsymbol{\xi}, \boldsymbol{\omega})$, and different diagonal cross-correlations $k_i$, $\tilde{k}_i$, or $\bar{k}_i$. We thus speak of rotational canonical correlations, reflectional half-canonical correlations, total canonical correlations, and so on.
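The second equalities in (4.24) and (4.25) follow from writing real and imaginary parts in terms of a variable and its conjugate. A short derivation, using only $\mathrm{Re}\,z = (z + z^*)/2$ and $\mathrm{Im}\,z = (z - z^*)/(2j)$, might read:

```latex
\begin{align*}
\operatorname{Re}\xi_i \,\operatorname{Re}\omega_i
  &= \tfrac{1}{4}(\xi_i + \xi_i^*)(\omega_i + \omega_i^*)
   = \tfrac{1}{4}\bigl[2\operatorname{Re}(\xi_i\omega_i) + 2\operatorname{Re}(\xi_i\omega_i^*)\bigr]
   = \tfrac{1}{2}\operatorname{Re}(\xi_i\omega_i^* + \xi_i\omega_i), \\
\operatorname{Im}\xi_i \,\operatorname{Im}\omega_i
  &= -\tfrac{1}{4}(\xi_i - \xi_i^*)(\omega_i - \omega_i^*)
   = -\tfrac{1}{4}\bigl[2\operatorname{Re}(\xi_i\omega_i) - 2\operatorname{Re}(\xi_i\omega_i^*)\bigr]
   = \tfrac{1}{2}\operatorname{Re}(\xi_i\omega_i^* - \xi_i\omega_i).
\end{align*}
```

Taking expectations and multiplying by 2 gives (4.24) and (4.25). The total correlations therefore combine the rotational term $E\,\xi_i\omega_i^*$ and the reflectional term $E\,\xi_i\omega_i$, with a plus sign for the real parts and a minus sign for the imaginary parts.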
4.1.4 Transformations into latent variables
We will now derive the transformations that solve the maximization problems for rotational, reflectional, and total correlations using CCA, MLR, or PLS. In doing so, we determine the internal (latent) coordinate system for $(\boldsymbol{\xi}, \boldsymbol{\omega})$. The approach is the same in all cases, and is based on majorization theory. The background on majorization necessary to understand this material can be found in Appendix 3.