For a fixed rank $r$, the matrices $A$ and $B$ are obtained by minimising the weighted least squares criterion

\[
M = \operatorname{Tr}\left\{ (Y - XBA)\, \Gamma\, (Y - XBA)' \right\} \tag{3.4}
\]

for a given $q \times q$ positive definite matrix $\Gamma$. Most commonly, the weight matrix $\Gamma$ is set to be either the inverse of the estimated covariance matrix of the responses or the identity matrix. As detailed in Section 3.1.2, these choices of $\Gamma$ reveal connections to other multivariate models. The estimates $\hat{A}$ and $\hat{B}$ that minimise (3.4) are obtained as

\[
\hat{A} = H_{(r)}' \Gamma^{-1/2}, \qquad \hat{B} = (X'X)^{-1} X'Y \Gamma^{1/2} H_{(r)} \tag{3.5}
\]

where $H_{(r)}$ is the $q \times r$ matrix whose columns are the normalised eigenvectors associated with the $r$ largest eigenvalues of the $q \times q$ matrix

\[
R = \Gamma^{1/2} Y'X (X'X)^{-1} X'Y \Gamma^{1/2}. \tag{3.6}
\]

As the solutions $\hat{A}$ and $\hat{B}$ depend on normalised eigenvectors, they must satisfy

\[
\hat{A} \Gamma \hat{A}' = I_r, \qquad \hat{B}' X'X \hat{B} = \Theta^2_{(r)} \tag{3.7}
\]

where $\Theta^2_{(r)}$ is the $r \times r$ diagonal matrix whose diagonal entries are the eigenvalues corresponding to the $r$ eigenvectors in $H_{(r)}$.
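The constraints in (3.7) can be checked by substituting the expressions in (3.5). The short verification below uses only the definitions in (3.5) and (3.6), the symmetry of $\Gamma^{1/2}$ and the orthonormality of the columns of $H_{(r)}$:

\[
\begin{aligned}
\hat{A} \Gamma \hat{A}' &= H_{(r)}' \Gamma^{-1/2}\, \Gamma\, \Gamma^{-1/2} H_{(r)} = H_{(r)}' H_{(r)} = I_r, \\
\hat{B}' X'X \hat{B} &= H_{(r)}' \Gamma^{1/2} Y'X (X'X)^{-1} X'X (X'X)^{-1} X'Y \Gamma^{1/2} H_{(r)} = H_{(r)}' R H_{(r)} = \Theta^2_{(r)}.
\end{aligned}
\]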

Moreover, $\hat{B}$ can be rewritten in terms of the least squares solution of Equation (3.2),

\[
\hat{B} = \hat{C}^{(R)} \Gamma^{1/2} H_{(r)}.
\]

Thus, the rank-$r$ estimate of the RRR coefficient matrix $C$ is

\[
\hat{C}^{(r)} = \hat{B}\hat{A} = \hat{C}^{(R)} \Gamma^{1/2} H_{(r)} H_{(r)}' \Gamma^{-1/2}.
\]

3.1 The reduced-rank regression model 50

If $C$ is of full rank, i.e. $r = \min(p, q)$, then $H_{(r)} H_{(r)}' = I_r$ and the estimate $\hat{C}^{(r)}$ reduces to the unconstrained MMLR coefficient estimate, $\hat{C}^{(R)}$. This relation illustrates the fact that the multivariate nature of the RRR model arises from the rank constraint imposed on the regression coefficient matrix. Moreover, the estimated values of the response variables are formed as linear combinations of the unconstrained MMLR estimated responses:

\[
\hat{Y}^{(r)} = X \hat{C}^{(r)} = \hat{Y}^{(R)} G \tag{3.8}
\]

where $G = \Gamma^{1/2} H_{(r)} H_{(r)}' \Gamma^{-1/2}$.
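The estimation steps in Equations (3.5) to (3.8) can be sketched numerically. The following is a minimal NumPy illustration, not the author's implementation: the function names `sym_power` and `rrr_fit`, the synthetic data and the default choice of the identity for $\Gamma$ are all assumptions made for the example.

```python
import numpy as np


def sym_power(M, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals**power) @ vecs.T


def rrr_fit(X, Y, r, Gamma=None):
    """Rank-r RRR estimates A (r x q), B (p x r), C = BA, and the r eigenvalues."""
    q = Y.shape[1]
    if Gamma is None:
        Gamma = np.eye(q)                       # assumed default: identity weights
    G_half = sym_power(Gamma, 0.5)
    G_mhalf = sym_power(Gamma, -0.5)
    C_full = np.linalg.solve(X.T @ X, X.T @ Y)  # unconstrained least squares estimate
    R = G_half @ Y.T @ X @ C_full @ G_half      # Equation (3.6)
    vals, vecs = np.linalg.eigh((R + R.T) / 2)  # symmetrise against round-off
    order = np.argsort(vals)[::-1][:r]          # indices of the r largest eigenvalues
    H = vecs[:, order]
    A = H.T @ G_mhalf                           # Equation (3.5)
    B = C_full @ G_half @ H                     # Equation (3.5), via the least squares form
    return A, B, B @ A, vals[order]


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n, p, q = 200, 5, 4
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal((p, q)) + rng.standard_normal((n, q))
Gamma = np.linalg.inv(Y.T @ Y)                  # inverse response covariance, up to scale

A, B, C_r, theta2 = rrr_fit(X, Y, r=2, Gamma=Gamma)
# Numerical check of the constraints in (3.7).
print(np.allclose(A @ Gamma @ A.T, np.eye(2)))
print(np.allclose(B.T @ X.T @ X @ B, np.diag(theta2)))
```

Calling `rrr_fit` with `r = q` (when $p \geq q$) should return the unconstrained least squares coefficients, which gives a quick sanity check of the full rank case described above.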

3.1.1 Choosing the rank - The rank trace plot

The search for an ‘optimal’ reduced rank $R^*$ can be aided by the rank trace plot (Izenman, 2008). The principle behind this graphical procedure is that when an adequate rank $r$ has been selected, the estimated RRR coefficient matrix, $\hat{C}^{(r)}$, should be close to the full rank coefficient matrix $\hat{C}^{(R)}$, and the estimated residual covariance matrix of the sRRR model

\[
\hat{S}^{(r)} = (Y - X\hat{C}^{(r)})' (Y - X\hat{C}^{(r)})
\]

should be close to the corresponding full rank residual covariance $\hat{S}^{(R)}$. The rank trace is obtained by plotting, for all values of $r$ in a range from 0 to $R$, the following two quantities:

\[
\Delta\hat{C}^{(r)} = \frac{\| \hat{C}^{(R)} - \hat{C}^{(r)} \|_F}{\| \hat{C}^{(R)} - \hat{C}^{(0)} \|_F}
\quad \text{and} \quad
\Delta\hat{S}^{(r)} = \frac{\| \hat{S}^{(R)} - \hat{S}^{(r)} \|_F}{\| \hat{S}^{(R)} - \hat{S}^{(0)} \|_F}
\]

where $r = 0$ denotes the random model with $\hat{C}^{(0)} = 0$ and $\hat{S}^{(0)} = \hat{S}_{yy}$, and $\| \cdot \|_F$ denotes the Frobenius norm. The coefficient $\Delta\hat{C}^{(r)}$ quantifies the distance between the rank-$r$ and full rank regression coefficients, relative to the distance between the random model ($r = 0$) and the full rank model ($r = R$), which is held as reference. Similarly, the coefficient $\Delta\hat{S}^{(r)}$ represents the proportional difference in the corresponding residual covariance matrices. As $r$ varies from 0 to $R$, both coefficients, plotted on the x and y axes, take values in $[0, 1]$. The two opposite points in the plot, those having coordinates $(0, 0)$ and $(1, 1)$, indicate the two extreme models: a full rank model ($r = R$) and a random model ($r = 0$), respectively. As more ranks are added, starting at the top-right corner with $r = 0$, the curve moves towards the origin of the plot. When a further rank addition does not produce a significant reduction in $\Delta\hat{C}^{(r)}$ and $\Delta\hat{S}^{(r)}$, the plot indicates that an ‘optimal’ rank $R^*$ has been found. In our experience, the rank corresponding to the point which maximises the curvature yields satisfactory results; this can be found by fitting a polynomial smoothing spline to the $(\Delta\hat{C}^{(r)}, \Delta\hat{S}^{(r)})$ points, for which second derivatives can be easily evaluated. An illustration of this procedure is shown in Figure 4.5.
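The two rank trace coordinates can be computed directly from their definitions. The sketch below is illustrative only: the function name `rank_trace` is hypothetical, $\Gamma$ is assumed to be the identity for brevity, the residual covariances are left unnormalised as in the definition of $\hat{S}^{(r)}$, and the spline-based curvature step is not included.

```python
import numpy as np


def rank_trace(X, Y):
    """Return (dC, dS), the rank trace coordinates for r = 0, ..., min(p, q).

    Assumes Gamma = I, so that C^(r) = C^(R) H_(r) H_(r)'.
    """
    p, q = X.shape[1], Y.shape[1]
    C_full = np.linalg.lstsq(X, Y, rcond=None)[0]   # full rank estimate C^(R)
    R = Y.T @ X @ C_full                            # Equation (3.6) with Gamma = I
    vals, vecs = np.linalg.eigh((R + R.T) / 2)
    H = vecs[:, ::-1]                               # eigenvectors, largest eigenvalue first

    def resid_cov(C):
        E = Y - X @ C
        return E.T @ E

    S_full = resid_cov(C_full)
    C_null = np.zeros_like(C_full)                  # random model, r = 0
    S_null = resid_cov(C_null)                      # equals Y'Y
    dC, dS = [], []
    for r in range(min(p, q) + 1):
        Hr = H[:, :r]
        C_r = C_full @ Hr @ Hr.T
        dC.append(np.linalg.norm(C_full - C_r, "fro")
                  / np.linalg.norm(C_full - C_null, "fro"))
        dS.append(np.linalg.norm(S_full - resid_cov(C_r), "fro")
                  / np.linalg.norm(S_full - S_null, "fro"))
    return np.array(dC), np.array(dS)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((150, 6))
Y = X @ rng.standard_normal((6, 4)) + rng.standard_normal((150, 4))
dC, dS = rank_trace(X, Y)
for r, (c, s) in enumerate(zip(dC, dS)):
    print(r, round(float(c), 3), round(float(s), 3))
```

Plotting `dS` against `dC` gives the curve described above, starting at $(1, 1)$ for $r = 0$ and ending at the origin for the full rank model.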

3.1.2 Connection to latent variable models

The RRR model is closely related to two well known multivariate dimensionality reduction methods: canonical correlation analysis (CCA) and partial least squares (PLS). These belong to a larger class of LVMs that perform dimensionality reduction in meaningful, albeit different, ways. Both models can be shown to be special cases of RRR for different choices of the matrix $\Gamma$. In this section we briefly describe these models and clarify their connection with RRR.

CCA is a well known multivariate technique that reduces the dimensionality of the original sets of variables by extracting $r \leq \min(p, q)$ mutually orthogonal pairs of latent variables. These are formed as $T = XU$ and $S = YV$, where $U$ and $V$ are the $(p \times r)$ and $(q \times r)$ matrices of weights. Each pair of weight vectors $(u, v)$, forming the columns of $U$ and $V$, is obtained so as to produce pairs of maximally correlated latent variables $t = Xu$ and $s = Yv$ that are orthogonal to the previously extracted latent variable pairs. The solutions $u$ and $v$ are extracted by maximising the correlation between $t$ and $s$, the so-called canonical correlation, given by

\[
\operatorname{Corr}(t, s) = \frac{u'X'Yv}{\sqrt{u'X'Xu}\, \sqrt{v'Y'Yv}}.
\]

Unique solutions are thus found by solving

\[
\max_{u, v}\; u'X'Yv \quad \text{such that} \quad u'X'Xu = v'Y'Yv = 1.
\]

The weights for the first $r$ CCA latent variables are given by

\[
\hat{U} = (X'X)^{-1} X'Y (Y'Y)^{-1/2} H^*_{(r)} \Xi^{-1}_{(r)}, \qquad \hat{V} = (Y'Y)^{-1/2} H^*_{(r)}
\]

where $H^*_{(r)}$ is the $(q \times r)$ matrix whose columns are the first $r$ normalised eigenvectors of $R^*$, where

\[
R^* = (Y'Y)^{-1/2} Y'X (X'X)^{-1} X'Y (Y'Y)^{-1/2} \tag{3.9}
\]

and $\Xi_{(r)}$ is a diagonal matrix composed of the square roots of the corresponding $r$ eigenvalues; these coefficients are also equal to the canonical correlations of the $r$ latent variable pairs. There is a close connection between the solutions of RRR and CCA. When $\Gamma$ is set to be proportional to the inverse of the covariance of the responses, estimated as $(Y'Y)^{-1}$, the $(q \times q)$ matrix $R$ in Equation (3.6) becomes identical to $R^*$ in Equation (3.9). Consequently, the matrix of weights $\hat{U}$ forms a scaled version of $\hat{B}$, defined for RRR in Equation (3.5). The scaling on each column of $\hat{B}$ is a result of the different normalisation constraints imposed on each optimisation problem. Moreover, the matrix of weights $\hat{V}$ can be seen as a generalised inverse of $\hat{A}$ defined for RRR in Equation (3.5).
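The closed-form CCA weights and their relation to the RRR solution can be illustrated numerically. This is a sketch under stated assumptions rather than a reference implementation: the names `sym_power` and `cca_weights` are hypothetical, the data are synthetic, and both data matrices are column-centred so that cross-products behave as covariances up to scale.

```python
import numpy as np


def sym_power(M, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals**power) @ vecs.T


def cca_weights(X, Y, r):
    """CCA weights U (p x r), V (q x r) and the r canonical correlations."""
    Syy_mhalf = sym_power(Y.T @ Y, -0.5)
    C_full = np.linalg.solve(X.T @ X, X.T @ Y)           # (X'X)^{-1} X'Y
    R_star = Syy_mhalf @ Y.T @ X @ C_full @ Syy_mhalf    # Equation (3.9)
    vals, vecs = np.linalg.eigh((R_star + R_star.T) / 2)
    order = np.argsort(vals)[::-1][:r]
    H = vecs[:, order]
    xi = np.sqrt(vals[order])                            # canonical correlations
    B_rrr = C_full @ Syy_mhalf @ H                       # RRR B-hat with Gamma = (Y'Y)^{-1}
    U = B_rrr / xi                                       # column j of B-hat scaled by 1 / xi_j
    V = Syy_mhalf @ H
    return U, V, xi


# Synthetic, column-centred data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Y = X @ rng.standard_normal((5, 3)) + rng.standard_normal((200, 3))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

U, V, xi = cca_weights(X, Y, r=2)
T, S = X @ U, Y @ V
print(np.allclose(T.T @ T, np.eye(2)), np.allclose(S.T @ S, np.eye(2)))
print(np.allclose(np.diag(T.T @ S), xi))
```

The two printed checks correspond to the normalisation constraints $u'X'Xu = v'Y'Yv = 1$ and to the statement that the entries of $\Xi_{(r)}$ equal the canonical correlations.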

PLS is another widely used multivariate dimensionality reduction technique that finds pairs of latent variables (t, s) having maximum covariance. In particular, u and v are extracted by maximising

\[
\operatorname{Cov}(t, s) = u'X'Yv \quad \text{such that} \quad u'u = v'v = 1.
\]

It can be noted that, due to the following covariance decomposition

\[
\operatorname{Cov}(t, s) = \operatorname{Corr}(t, s)\, \sqrt{\operatorname{Var}(t)}\, \sqrt{\operatorname{Var}(s)},
\]

the maximisation of the sample covariance explained by the latent factors implies maximising the sample correlation between factors while also maximising the variance explained by each individual component. The PLS solution for the first $r$ latent variables is given by

\[
\hat{U} = X'Y H^+_{(r)} M^{-1}_{(r)}, \qquad \hat{V} = H^+_{(r)}
\]

where $H^+_{(r)}$ is the $(q \times r)$ matrix whose columns are the first $r$ normalised eigenvectors of $R^+$, with

\[
R^+ = Y'XX'Y.
\]

The diagonal matrix $M_{(r)}$ has entries given by the square roots of the $r$ largest eigenvalues of $R^+$, which are also equal to the covariances of the $r$ latent variable pairs. Notably, CCA solutions also solve the PLS problem when the estimated covariance matrices of $X$ and $Y$ are diagonal matrices. The same connection holds between RRR and PLS when the covariance of $X$ and the matrix $\Gamma$ are set to be the identity matrices $I_p$ and $I_q$. In fact, the choice $\Gamma = I_q$ recovers another LVM, known as redundancy analysis (RA). In RA, reduced sets of latent variables in the predictor space are extracted such that they maximise a redundancy index, defined as the variance explained in the response space (Van Den Wollenberg, 1977).
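The PLS weights above can be sketched in the same way. The example is a minimal NumPy illustration with a hypothetical function name `pls_weights` and synthetic, column-centred data; it also compares the entries of $M_{(r)}$ with the leading singular values of $X'Y$, which is an equivalent way of obtaining them.

```python
import numpy as np


def pls_weights(X, Y, r):
    """PLS weights U (p x r), V (q x r) and the r latent covariances (unnormalised)."""
    XtY = X.T @ Y
    R_plus = XtY.T @ XtY                      # R+ = Y'XX'Y
    vals, vecs = np.linalg.eigh(R_plus)
    order = np.argsort(vals)[::-1][:r]        # indices of the r largest eigenvalues
    V = vecs[:, order]                        # H+_(r)
    m = np.sqrt(vals[order])                  # diagonal of M_(r)
    U = XtY @ V / m                           # X'Y H+_(r) M_(r)^{-1}
    return U, V, m


# Synthetic, column-centred data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Y = X @ rng.standard_normal((5, 3)) + rng.standard_normal((200, 3))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

U, V, m = pls_weights(X, Y, r=2)
# Unit-norm constraints u'u = v'v = 1 and the attained values of u'X'Yv.
print(np.allclose(U.T @ U, np.eye(2)), np.allclose(V.T @ V, np.eye(2)))
print(np.allclose(np.diag(U.T @ X.T @ Y @ V), m))
# The same values are the leading singular values of X'Y.
print(np.allclose(m, np.linalg.svd(X.T @ Y, compute_uv=False)[:2]))
```

Here the covariances are left unnormalised (no division by the sample size), matching the criterion $u'X'Yv$ used in the text.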

Another closely related multivariate regression model was developed by Breiman and Friedman (1997). The authors proposed combining multiple univariate-response regression estimates to improve prediction accuracy in a multivariate response setting. They suggested estimating a matrix to combine the univariate response estimates such that the mean squared error of each response estimator was minimised individually. They consequently showed that the optimal solution is a smoothed version of RRR with $\Gamma = (Y'Y)^{-1}$ and the matrix $G$ (defined in Equation (3.8)) replaced by $\Gamma^{1/2} H_{(R)} D H_{(R)}' \Gamma^{-1/2}$. Here, $H_{(R)}$ is composed of the full set of the normalised eigenvectors of $R$, defined in Equation (3.6). $D$ is a diagonal matrix that performs shrinkage on the eigenvector directions, where a greater shrinkage is applied to those eigenvectors corresponding to smaller eigenvalues. The authors illustrated, through simulations, that this approach significantly improves the