
4 Correlation analysis

Definition 4.1. A correlation coefficient that measures how well a

(1) linear function,
(2) conjugate linear function, or
(3) widely linear function

models the relationship between two complex random variables is respectively called a

(1) rotational correlation coefficient,
(2) reflectional correlation coefficient, or
(3) total correlation coefficient.
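As a small numerical sketch of the first two cases of Definition 4.1, the snippet below forms sample-based rotational and reflectional coefficients for scalar data. The normalization by $\sqrt{R_{xx}R_{yy}}$ follows the form used later in Example 4.3; applying the same normalization to the complementary correlation $E\,xy$ is an assumption made here for illustration, and the function names are hypothetical. The total correlation coefficient is not covered.

```python
import numpy as np

def rotational_coeff(x, y):
    """Sample version of R_xy / sqrt(R_xx R_yy), with R_xy = E[x y*]."""
    r_xy = np.mean(x * np.conj(y))
    return r_xy / np.sqrt(np.mean(np.abs(x) ** 2) * np.mean(np.abs(y) ** 2))

def reflectional_coeff(x, y):
    """Assumed analogue using the complementary correlation E[x y]."""
    r_xy_c = np.mean(x * y)
    return r_xy_c / np.sqrt(np.mean(np.abs(x) ** 2) * np.mean(np.abs(y) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)

y_lin = (2 - 1j) * x             # linear function of x
y_conj = (2 - 1j) * np.conj(x)   # conjugate linear function of x

rho_rot = abs(rotational_coeff(x, y_lin))     # magnitude for the linear model
rho_ref = abs(reflectional_coeff(x, y_conj))  # magnitude for the conjugate linear model
```

Both magnitudes equal one up to rounding, since each model fits its data exactly.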

4.1.2 Principle of multivariate correlation analysis

We would now like to define a scalar-valued correlation coefficient that gives an overall measure of the association between two zero-mean random vectors. The following definition sets out the minimum requirements for such a correlation coefficient.

Definition 4.2. A correlation coefficient $\rho_{xy}$ between two random vectors $\mathbf{x}$ and $\mathbf{y}$ must satisfy the following conditions for all nonzero scalars $\alpha$ and $\beta$, provided that $\mathbf{x}$ and $\mathbf{y}$ are not both zero.

$$0 \le \rho_{xy} \le 1, \tag{4.17}$$

$$\rho_{xy} = \rho_{x'y} = \rho_{xy'} \quad \text{for } \mathbf{x}' = \alpha\mathbf{x},\ \mathbf{y}' = \beta\mathbf{y}, \tag{4.18}$$

$$\rho_{xy} = 1 \quad \text{if } \mathbf{y} = \beta\mathbf{x}, \tag{4.19}$$

$$\rho_{xy} = 0 \quad \text{if } \mathbf{x} \text{ and } \mathbf{y} \text{ are uncorrelated.} \tag{4.20}$$

Note that we do not require the symmetry $\rho_{xy} = \rho_{yx}$.³ If a correlation coefficient is allowed to be negative or complex-valued (as we have seen in the previous section), these conditions apply to its absolute value. However, the correlation coefficients considered hereafter are all real and nonnegative. For simplicity, we consider only rotational correlations and strictly linear transforms in this section. Reflectional and total correlations will be discussed in the following section.

Figure 4.4 The principles of multivariate correlation analysis: (a) rotational correlations, (b) reflectional correlations, and (c) total correlations.

A correlation coefficient that only satisfies (4.17)–(4.20) will probably not be very useful. It is usually required to have further cases that result in a unit correlation coefficient $\rho_{xy} = 1$, such as when $\mathbf{y} = \mathbf{M}\mathbf{x}$, where $\mathbf{M}$ is any nonsingular matrix, or $\mathbf{y} = \mathbf{U}\mathbf{x}$, where $\mathbf{U}$ is any unitary matrix. There are further desirable properties. Chief among them are invariance under specified classes of transformations on $\mathbf{x}$ and $\mathbf{y}$, and the ability to assess correlation in a lower-dimensional subspace. What exactly this means will become clearer as we move along in our development.

The cross-correlation properties between $\mathbf{x}$ and $\mathbf{y}$ are described by the cross-correlation matrix $\mathbf{R}_{xy} = E\,\mathbf{x}\mathbf{y}^H$, but this matrix is generally difficult to interpret. In order to illuminate the underlying cross-correlation structure, we shall transform $n$-dimensional $\mathbf{x}$ and $m$-dimensional $\mathbf{y}$ into $p$-dimensional internal (latent) representations $\boldsymbol{\xi} = \mathbf{A}\mathbf{x}$ and $\boldsymbol{\omega} = \mathbf{B}\mathbf{y}$, with $p = \min(m, n)$, as shown in Fig. 4.4(a). The way in which the full-rank matrices $\mathbf{A} \in \mathbb{C}^{p \times n}$ and $\mathbf{B} \in \mathbb{C}^{p \times m}$ are chosen will determine the type of correlation analysis. In the statistical literature, the latent vectors $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are usually called score vectors, and the matrices $\mathbf{A}$ and $\mathbf{B}$ are called the matrices of loadings.
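To make the dimensions concrete, the following sketch builds a sample cross-correlation matrix and maps the data into latent coordinates with arbitrary full-rank loadings. The sample sizes, the random data, and the variable names are assumptions for illustration only; the loadings here are not optimized in any way.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 2, 5000        # dimensions of x and y, number of samples (assumed)
p = min(m, n)

def cgauss(shape):
    """Complex Gaussian samples, used only to generate test data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

X = cgauss((n, N))                       # columns are realizations of x
Y = cgauss((m, n)) @ X + cgauss((m, N))  # y depends linearly on x, plus noise

R_xy = X @ Y.conj().T / N                # sample estimate of E[x y^H], size n x m

A = cgauss((p, n))                       # arbitrary full-rank loadings, p x n
B = cgauss((p, m))                       # arbitrary full-rank loadings, p x m

Xi = A @ X                               # latent scores xi = A x
Om = B @ Y                               # latent scores omega = B y
K = Xi @ Om.conj().T / N                 # cross-correlation in latent coordinates
```

In this sketch `K` coincides with `A @ R_xy @ B.conj().T`, so choosing the loadings amounts to choosing how $\mathbf{R}_{xy}$ is viewed in the internal coordinate system.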

Our goal is to define different correlation coefficients as different functions of the correlations $k_i = E\,\xi_i\omega_i^*$, $i = 1, \ldots, p$, which are the diagonal elements of the cross-correlation matrix $\mathbf{K} = E\,\boldsymbol{\xi}\boldsymbol{\omega}^H$ in the internal coordinate system of $(\boldsymbol{\xi}, \boldsymbol{\omega})$. We would like as much correlation as possible concentrated in the first $r$ coefficients $\{k_1, k_2, \ldots, k_r\}$, for any $r \le p$, because this will allow us to assess correlation in a lower-dimensional subspace of dimension $r$. Hence, our aim is to choose $\mathbf{A}$ and $\mathbf{B}$ such that all partial sums over the absolute values of the diagonal cross-correlations $k_i$ are maximized:

$$\max_{\mathbf{A},\mathbf{B}} \sum_{i=1}^{r} |k_i|, \quad r = 1, \ldots, p. \tag{4.21}$$

In order to make this a well-defined maximization problem, we need to impose some constraints on A and B. The following three choices are most compelling.

• Require that the internal representations $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ each have identity correlation matrix (we avoid using the term “white” because $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ may have non-identity complementary correlation matrices): $\mathbf{R}_{\xi\xi} = \mathbf{A}\mathbf{R}_{xx}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{R}_{\omega\omega} = \mathbf{B}\mathbf{R}_{yy}\mathbf{B}^H = \mathbf{I}$. This choice leads to canonical correlation analysis (CCA). The corresponding diagonal cross-correlations $k_i$ are called the canonical correlations, and the latent vectors $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are given in canonical coordinates.

• Require that $\mathbf{A}$ have unitary rows (which we will simply call row-unitary) and $\boldsymbol{\omega}$ have identity correlation matrix: $\mathbf{A}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{R}_{\omega\omega} = \mathbf{B}\mathbf{R}_{yy}\mathbf{B}^H = \mathbf{I}$. This choice leads to multivariate linear regression (MLR), also known as half-canonical correlation analysis. The corresponding diagonal cross-correlations $k_i$ are called the half-canonical correlations, and the latent vectors $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are given in half-canonical coordinates.

• Require that $\mathbf{A}$ and $\mathbf{B}$ be row-unitary: $\mathbf{A}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{B}\mathbf{B}^H = \mathbf{I}$. This choice leads to partial least-squares (PLS) analysis. The corresponding diagonal cross-correlations $k_i$ are called the PLS correlations, and the latent vectors $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are given in PLS coordinates.

Sometimes, when there is a risk of confusion, we will use the subscript C, M, or P to emphasize that quantities were derived using CCA, MLR, or PLS.
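The three sets of constraints can be illustrated numerically. The sketch below assumes the standard singular-value-decomposition construction (the text defers the actual derivation to Section 4.1.4): whiten $\mathbf{x}$ and/or $\mathbf{y}$ as the constraints require, then take the SVD of the resulting matrix. The helper names `inv_sqrt` and `latent_maps`, the use of Hermitian square roots, and the random test data are assumptions made for this example.

```python
import numpy as np

def inv_sqrt(R):
    """Hermitian inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(R)
    return (V / np.sqrt(w)) @ V.conj().T

def latent_maps(R_xx, R_xy, R_yy, method):
    """Return loadings A, B and correlations k for 'cca', 'mlr', or 'pls'."""
    n, m = R_xy.shape
    # CCA whitens both sides, MLR whitens only y, PLS whitens neither.
    Wx = inv_sqrt(R_xx) if method == "cca" else np.eye(n)
    Wy = inv_sqrt(R_yy) if method in ("cca", "mlr") else np.eye(m)
    F, k, Gh = np.linalg.svd(Wx @ R_xy @ Wy, full_matrices=False)
    A = F.conj().T @ Wx          # p x n
    B = Gh @ Wy                  # p x m
    return A, B, k

rng = np.random.default_rng(2)
n, m, N = 3, 2, 2000
X = rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))
Y = (rng.standard_normal((m, n)) @ X
     + rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N)))

R_xx = X @ X.conj().T / N
R_xy = X @ Y.conj().T / N
R_yy = Y @ Y.conj().T / N

results = {name: latent_maps(R_xx, R_xy, R_yy, name)
           for name in ("cca", "mlr", "pls")}
```

For each method, `A @ R_xy @ B.conj().T` comes out diagonal with nonincreasing nonnegative entries, and the loadings satisfy the constraints listed above for that method.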

Example 4.3. If $x$ and $y$ are scalars, then the CCA constraints are $|A|^2 = R_{xx}^{-1}$ and $|B|^2 = R_{yy}^{-1}$. Thus, the latent variables are $\xi = R_{xx}^{-1/2} e^{j\phi_1} x$ and $\omega = R_{yy}^{-1/2} e^{j\phi_2} y$ for arbitrary $\phi_1$ and $\phi_2$. We then obtain

$$|k| = |E\,\xi\omega^*| = \frac{|R_{xy}|}{\sqrt{R_{xx}R_{yy}}} = |\rho_{xy}|,$$

with $\rho_{xy}$ defined in (4.5).
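Example 4.3 can be checked with sample averages. In the sketch below, the data model and the particular phase pairs are arbitrary assumptions; the point is only that $|k|$ does not depend on $\phi_1$ and $\phi_2$.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 4000
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = (0.5 + 0.5j) * x + rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Sample second-order moments of the scalar pair (x, y).
R_xx = np.mean(np.abs(x) ** 2)
R_yy = np.mean(np.abs(y) ** 2)
R_xy = np.mean(x * np.conj(y))
rho_abs = abs(R_xy) / np.sqrt(R_xx * R_yy)

def k_abs(phi1, phi2):
    """|E xi omega*| for the latent variables of Example 4.3."""
    xi = R_xx ** -0.5 * np.exp(1j * phi1) * x
    om = R_yy ** -0.5 * np.exp(1j * phi2) * y
    return abs(np.mean(xi * np.conj(om)))

values = [k_abs(p1, p2) for p1, p2 in [(0.0, 0.0), (0.7, -1.3), (np.pi, 2.0)]]
```

Every entry of `values` agrees with `rho_abs` up to rounding, because the phase factors only rotate $k$ in the complex plane.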

We will find in Section 4.1.4 that the solution to the maximization problem (4.21) for CCA, MLR, or PLS results in a diagonal cross-correlation matrix

$$\mathbf{K} = E\,\boldsymbol{\xi}\boldsymbol{\omega}^H = \mathrm{Diag}(k_1, \ldots, k_p), \tag{4.22}$$

with $k_1 \ge k_2 \ge \cdots \ge k_p \ge 0$. Of course, the $k_i$s depend on which of the principles CCA, MLR, and PLS is employed, so they are canonical correlations, half-canonical correlations, or PLS correlations. Furthermore, we will show that CCA, MLR, and PLS each produce a set $\{k_i\}$ that has maximum spread in the sense of majorization (see Appendix 3). Therefore, any correlation coefficient that is an increasing, Schur-convex function of $\{k_i\}$ is maximized, for arbitrary rank $r$.
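The partial-sum property can be probed numerically for CCA. The sketch below assumes the SVD-based canonical loadings (an assumption here, since the derivation belongs to Section 4.1.4) and compares them with other loadings that also satisfy the CCA constraints, obtained by applying random unitary rotations. All names and the test data are hypothetical.

```python
import numpy as np

def inv_sqrt(R):
    """Hermitian inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(R)
    return (V / np.sqrt(w)) @ V.conj().T

def cgauss(shape, rng):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def random_unitary(p, rng):
    Q, _ = np.linalg.qr(cgauss((p, p), rng))
    return Q

rng = np.random.default_rng(4)
n = m = 4
N = 3000
X = cgauss((n, N), rng)
Y = cgauss((m, n), rng) @ X + 2 * cgauss((m, N), rng)
R_xx, R_xy, R_yy = X @ X.conj().T / N, X @ Y.conj().T / N, Y @ Y.conj().T / N

Wx, Wy = inv_sqrt(R_xx), inv_sqrt(R_yy)
F, k, Gh = np.linalg.svd(Wx @ R_xy @ Wy)
A, B = F.conj().T @ Wx, Gh @ Wy          # canonical loadings (assumed solution)

best = np.cumsum(k)                      # partial sums of canonical correlations
worst_gap = np.inf
for _ in range(200):
    # Rotated loadings still satisfy A R_xx A^H = I and B R_yy B^H = I.
    A2 = random_unitary(n, rng) @ A
    B2 = random_unitary(m, rng) @ B
    d = np.abs(np.diag(A2 @ R_xy @ B2.conj().T))
    partial = np.cumsum(np.sort(d)[::-1])
    worst_gap = min(worst_gap, np.min(best - partial))
```

`worst_gap` stays nonnegative up to rounding: none of the rotated coordinate systems beats the canonical one in any partial sum, which is the weak-majorization statement behind (4.21).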


The key difference among CCA, MLR, and PLS lies in their invariance properties. We will see that CCA is invariant under nonsingular linear transformation of both $\mathbf{x}$ and $\mathbf{y}$, MLR is invariant under nonsingular linear transformation of $\mathbf{y}$ but only unitary transformation of $\mathbf{x}$, and PLS is invariant under unitary transformation of both $\mathbf{x}$ and $\mathbf{y}$. Therefore, CCA and PLS provide a symmetric assessment of correlation since the roles of $\mathbf{x}$ and $\mathbf{y}$ are interchangeable. MLR, on the other hand, distinguishes between the message (or predictor/explanatory variables) $\mathbf{x}$ and the measurement (or criterion/response variables) $\mathbf{y}$. The correlation analysis technique must be chosen to match the invariance properties of the problem at hand.
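These invariance claims are easy to test on sample moments. The following sketch assumes that the three kinds of correlations are the singular values of the appropriately whitened cross-correlation matrix (the SVD construction is an assumption here, pending Section 4.1.4), and applies random nonsingular and unitary transformations. Function and variable names are hypothetical.

```python
import numpy as np

def inv_sqrt(R):
    """Hermitian inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(R)
    return (V / np.sqrt(w)) @ V.conj().T

def correlations(R_xx, R_xy, R_yy, method):
    """Singular values of the whitened cross-correlation matrix."""
    n, m = R_xy.shape
    Wx = inv_sqrt(R_xx) if method == "cca" else np.eye(n)
    Wy = inv_sqrt(R_yy) if method in ("cca", "mlr") else np.eye(m)
    return np.linalg.svd(Wx @ R_xy @ Wy, compute_uv=False)

def transform(R_xx, R_xy, R_yy, Tx, Ty):
    """Second-order moments of the transformed pair (Tx x, Ty y)."""
    return (Tx @ R_xx @ Tx.conj().T,
            Tx @ R_xy @ Ty.conj().T,
            Ty @ R_yy @ Ty.conj().T)

def cgauss(shape, rng):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(5)
n, m, N = 3, 2, 2000
X = cgauss((n, N), rng)
Y = cgauss((m, n), rng) @ X + cgauss((m, N), rng)
R = (X @ X.conj().T / N, X @ Y.conj().T / N, Y @ Y.conj().T / N)

Mx, My = cgauss((n, n), rng), cgauss((m, m), rng)   # nonsingular with probability one
Ux = np.linalg.qr(cgauss((n, n), rng))[0]           # unitary
Uy = np.linalg.qr(cgauss((m, m), rng))[0]           # unitary

cca_same = np.allclose(correlations(*R, "cca"),
                       correlations(*transform(*R, Mx, My), "cca"))
mlr_same = np.allclose(correlations(*R, "mlr"),
                       correlations(*transform(*R, Ux, My), "mlr"))
pls_same = np.allclose(correlations(*R, "pls"),
                       correlations(*transform(*R, Ux, Uy), "pls"))
pls_breaks = not np.allclose(correlations(*R, "pls"),
                             correlations(*transform(*R, Mx, My), "pls"))
```

The first three flags come out true, matching the invariance classes stated above, while `pls_breaks` shows that PLS correlations generally change under a non-unitary transformation.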

4.1.3 Rotational, reflectional, and total correlations for complex vectors

It is easy to see how to apply these principles to reflectional and total correlations, as shown in Figs. 4.4(b) and (c). For reflectional correlations, the internal representation is $\boldsymbol{\xi} = \mathbf{A}\mathbf{x}$ and $\boldsymbol{\omega}^* = \mathbf{B}\mathbf{y}^*$, whose complementary cross-correlation matrix is $\tilde{\mathbf{K}} = E\,\boldsymbol{\xi}\boldsymbol{\omega}^T$, and $\tilde{k}_i = E\,\xi_i\omega_i$. The maximization problem is then to maximize all partial sums over the absolute value of the complementary cross-correlations

$$\max_{\mathbf{A},\mathbf{B}} \sum_{i=1}^{r} |\tilde{k}_i|, \quad r = 1, \ldots, p, \tag{4.23}$$

with the following constraints on A and B.

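• For CCA, $\mathbf{A}\mathbf{R}_{xx}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{B}\mathbf{R}_{yy}^*\mathbf{B}^H = \mathbf{I}$.

• For MLR, $\mathbf{A}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{B}\mathbf{R}_{yy}^*\mathbf{B}^H = \mathbf{I}$.

• For PLS, $\mathbf{A}\mathbf{A}^H = \mathbf{I}$ and $\mathbf{B}\mathbf{B}^H = \mathbf{I}$.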
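As an illustration of the reflectional CCA constraints, the sketch below uses data in which $\mathbf{y}$ is an exact conjugate linear function of $\mathbf{x}$. It assumes that the loadings can be obtained from the SVD of $\mathbf{R}_{xx}^{-1/2}\tilde{\mathbf{R}}_{xy}(\mathbf{R}_{yy}^*)^{-1/2}$, with $\tilde{\mathbf{R}}_{xy} = E\,\mathbf{x}\mathbf{y}^T$; this construction and all names are assumptions for the example, not taken from the text.

```python
import numpy as np

def inv_sqrt(R):
    """Hermitian inverse square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(R)
    return (V / np.sqrt(w)) @ V.conj().T

def cgauss(shape, rng):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(6)
n = m = 3
N = 2000
X = cgauss((n, N), rng)
M = cgauss((m, n), rng)
Y = M @ X.conj()                 # y = M x*, a conjugate linear function of x

R_xx = X @ X.conj().T / N
R_yy = Y @ Y.conj().T / N
Rt_xy = X @ Y.T / N              # complementary cross-correlation E[x y^T]

Wx = inv_sqrt(R_xx)
Wy = inv_sqrt(R_yy.conj())       # whitening with respect to R_yy^*
F, k_tilde, Gh = np.linalg.svd(Wx @ Rt_xy @ Wy)
A = F.conj().T @ Wx
B = Gh @ Wy

Xi = A @ X                       # xi = A x
Om = (B @ Y.conj()).conj()       # omega* = B y*
K_tilde = Xi @ Om.T / N          # sample estimate of E[xi omega^T]
```

Here `K_tilde` is diagonal and all reflectional canonical correlations `k_tilde` equal one up to rounding, as expected when a conjugate linear model explains $\mathbf{y}$ exactly.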

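For total correlations, the internal representation is computed as a widely linear function: $\underline{\boldsymbol{\xi}} = \underline{\mathbf{A}}\,\underline{\mathbf{x}}$ (i.e., $\boldsymbol{\xi} = \mathbf{A}_1\mathbf{x} + \mathbf{A}_2\mathbf{x}^*$) and $\underline{\boldsymbol{\omega}} = \underline{\mathbf{B}}\,\underline{\mathbf{y}}$ (i.e., $\boldsymbol{\omega} = \mathbf{B}_1\mathbf{y} + \mathbf{B}_2\mathbf{y}^*$). Here, our goal is to maximize the diagonal cross-correlations between the vectors of real and imaginary parts of $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$. That is, for $i = 1, \ldots, p$, we let

$$\bar{k}_{2i-1} = 2E(\operatorname{Re}\xi_i \operatorname{Re}\omega_i) = \operatorname{Re}(E\,\xi_i\omega_i^* + E\,\xi_i\omega_i), \tag{4.24}$$

$$\bar{k}_{2i} = 2E(\operatorname{Im}\xi_i \operatorname{Im}\omega_i) = \operatorname{Re}(E\,\xi_i\omega_i^* - E\,\xi_i\omega_i), \tag{4.25}$$

and maximize

$$\max_{\underline{\mathbf{A}},\underline{\mathbf{B}}} \sum_{i=1}^{r} |\bar{k}_i|, \quad r = 1, \ldots, 2p, \tag{4.26}$$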

with the following constraints placed on A and B.

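• For CCA, $\underline{\mathbf{A}}\,\underline{\mathbf{R}}_{xx}\underline{\mathbf{A}}^H = \mathbf{I}$ and $\underline{\mathbf{B}}\,\underline{\mathbf{R}}_{yy}\underline{\mathbf{B}}^H = \mathbf{I}$. Hence, $\boldsymbol{\xi}$ and $\boldsymbol{\omega}$ are each white and proper.

• For MLR, $\underline{\mathbf{A}}\,\underline{\mathbf{A}}^H = \mathbf{I}$ and $\underline{\mathbf{B}}\,\underline{\mathbf{R}}_{yy}\underline{\mathbf{B}}^H = \mathbf{I}$. While $\boldsymbol{\omega}$ is white and proper, $\boldsymbol{\xi}$ is generally improper.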
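The identities in (4.24) and (4.25), which express the correlations between real parts and between imaginary parts through the rotational and reflectional correlations, can be verified on samples. The scalar latent variables below are arbitrary test data, an assumption made only for this check.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1000
xi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
om = (0.3 - 0.8j) * xi + 0.5 * np.conj(xi) + rng.standard_normal(N)

k_rot = np.mean(xi * np.conj(om))   # sample version of E[xi omega*]
k_ref = np.mean(xi * om)            # sample version of E[xi omega]

# Left-hand sides of (4.24) and (4.25).
lhs_re = 2 * np.mean(xi.real * om.real)
lhs_im = 2 * np.mean(xi.imag * om.imag)

# Right-hand sides of (4.24) and (4.25).
rhs_re = (k_rot + k_ref).real
rhs_im = (k_rot - k_ref).real
```

Both pairs agree to machine precision because the identities hold sample by sample, not just in expectation.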


Hence, we have a total of nine possible combinations between the three correlation analysis techniques (CCA, MLR, PLS) and the three different correlation types (rotational, reflectional, total). Each of these nine cases leads to different latent vectors $(\boldsymbol{\xi}, \boldsymbol{\omega})$, and different diagonal cross-correlations $k_i$, $\tilde{k}_i$, or $\bar{k}_i$. We thus speak of rotational canonical correlations, reflectional half-canonical correlations, total canonical correlations, and so on.

4.1.4 Transformations into latent variables

We will now derive the transformations that solve the maximization problems for rotational, reflectional, and total correlations using CCA, MLR, or PLS. In doing so, we determine the internal (latent) coordinate system for $(\boldsymbol{\xi}, \boldsymbol{\omega})$. The approach is the same in all cases, and is based on majorization theory. The background on majorization necessary to understand this material can be found in Appendix 3.

In document Statistical Signal Processing of Complex-Valued Data.pdf (Page 113-117)