
3.8 Connection with other models

3.8.1 Connections with latent variable models

Latent variable models (LVMs) are a wide class of models that describe one set of variables, or the association between two sets of variables, via variables that are not directly observed (“latent variables”). The number of latent variables is much smaller than the number of original variables, and hence LVMs include some of the most commonly applied dimensionality reduction methods.

Figure 3.4: A forest plot of the distributions of base-2 log fold-changes of measured expression in cancer samples compared with normal tissue, for a large number of independent studies spanning 9 different cancer types, reveals that the selected gene GATM consistently down-regulates gene expression in many cancer types

Partial least squares (PLS) models the association between predictors and responses via pairs of latent variables. The principle of PLS is that, given data matrices $X$ and $Y$, we seek a $p$-dimensional vector $u$ and a $q$-dimensional vector $v$ such that the covariance between the $X$-score $t = Xu$ and the $Y$-score $s = Yv$ is maximised. Many variations of PLS have been proposed [Höskuldsson, 1988, Sampson et al., 1989, Wold, 1975, Worsley et al., 1997], which differ in how subsequent scores are computed once the first-layer scores have been obtained. It has been shown that the differences between versions of PLS have little effect in practical applications, in terms of the percentage of variance explained and the relative magnitude of the loadings [Burnham et al., 1996, Braak and de Jong, 1998]. Here, we focus on a particular variant, SIMPLS, which simultaneously extracts the loadings $u$ and $v$ and the scores $t$ and $s$ associated with multiple latent variables [Jong, 1993]. The estimated SIMPLS loadings $\hat{U}$ and $\hat{V}$ maximise:

$$\mathrm{Tr}\{U^T X^T Y V\} \tag{3.24}$$

subject to:

$$\begin{aligned}
U_j^T U_j &= 1, & j &= 1, 2, \ldots, R \\
U_i^T X^T X U_j &= 0, & i &\neq j \\
V_k^T V_k &= 1, & k &= 1, 2, \ldots, R
\end{aligned} \tag{3.25}$$

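The solutions for the $r$th loadings are:

$$\hat{U}_r = X^T Y H^{*}_{(r)} \theta_{(r)}^{-1}, \qquad \hat{V}_r = H^{*}_{(r)} \tag{3.26}$$

where $H^{*}_{(r)}$ is the normalised eigenvector associated with the $r$th largest eigenvalue of $Y^T X X^T Y$, and $\theta_{(r)}$ is the square root of that eigenvalue.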

Note that minimising the RRR objective function (3.2) is equivalent to maximising:

$$\mathrm{Tr}\{2 \cdot B^T X^T Y \Gamma A^T + A \Gamma A^T B^T X^T X B\} \tag{3.27}$$

subject to constraints (3.9) and (3.10). If we take $\Gamma = I$ in RRR and $X^T X = I$ in both the RRR and SIMPLS models, the RRR objective function (3.27) becomes:

$$\mathrm{Tr}\{2 \cdot B^T X^T Y A^T + A A^T B^T B\},$$

from which we can see that RRR and SIMPLS correspond to the same optimisation problem, except for different normalising constants in their constraints. The RRR solutions (3.6) and (3.7) and the SIMPLS solutions (3.26) are indeed equal once adjusted by the factor $\theta_{(r)}$, the square root of the $r$th largest eigenvalue of $Y^T X X^T Y$.
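As an illustration of the closed form (3.26), the following minimal sketch evaluates the loadings with NumPy on simulated data. It is not a full SIMPLS implementation: the function name `simpls_closed_form`, the simulated matrices, and the QR step used to enforce $X^T X = I$ are all assumptions made for this example.

```python
import numpy as np

def simpls_closed_form(X, Y, R):
    """Evaluate (3.26): U_r = X^T Y H_(r) / theta_(r), V_r = H_(r)."""
    M = Y.T @ X @ X.T @ Y                    # q x q symmetric matrix Y^T X X^T Y
    eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:R]    # indices of the R largest eigenvalues
    H = eigvecs[:, order]                    # normalised eigenvectors H_(r)
    theta = np.sqrt(eigvals[order])          # theta_(r): square roots of those eigenvalues
    U = X.T @ Y @ H / theta                  # column r is divided by theta_(r)
    V = H
    return U, V, theta

rng = np.random.default_rng(0)
n, p, q, R = 50, 6, 4, 2
# Assumption for this sketch: orthonormalise the columns of X so that X^T X = I.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y = rng.standard_normal((n, q))

U, V, theta = simpls_closed_form(X, Y, R)
# With X^T X = I, the constraints (3.25) hold and the objective (3.24) equals sum(theta).
print(np.allclose(U.T @ U, np.eye(R)), np.allclose(V.T @ V, np.eye(R)))
```

Under these assumptions the value of the objective (3.24) at the solution is the sum of the $\theta_{(r)}$, which makes the role of $\theta_{(r)}$ as a rescaling factor between the two sets of loadings easy to inspect numerically.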

In a sparse setting, penalising the $\ell_1$-norm of the PLS loadings $U$ and $V$ sets some entries of the estimated loadings exactly to zero; we refer to this approach as sparse PLS (sPLS) [Chun and Keleş, 2010]. Let the first-layer loadings be $u$ and $v$; sPLS seeks to maximise:

$$u^T X^T Y v + P_x(u) + P_y(v) \tag{3.28}$$

subject to:

$$u^T u = v^T v = 1 \tag{3.29}$$

where $P_x(u) = \lambda_u \|u\|_1$ and $P_y(v) = \lambda_v \|v\|_1$. Note that if we replace $P_x(u)$ and $P_y(v)$ by the NsRRR penalty in (3.14), the optimisation problems of first-layer sPLS and rank-one NsRRR differ only in the normalisation constraints, namely $u^T u = 1$ for the former and $b^T b = \theta^2$ for the latter.

Once the first-layer loadings are obtained for sPLS, the second-layer loadings can be estimated by maximising an objective function similar to (3.28), subject to the same constraints as in (3.29), while replacing $Y$ by $\tilde{Y}_{sPLS} = Y - \hat{Y}_{sPLS}$, where $\hat{Y}_{sPLS} = X\hat{u}(\hat{u}^T X^T X \hat{u})^{-1}\hat{u}^T X^T Y$ [Chun and Keleş, 2010]. On the other hand, given the rank-one estimates of NsRRR, the second-rank estimates can be computed by maximising $b^T X^T \tilde{Y}_{NsRRR}\, a^T$ plus the penalty terms, where $\tilde{Y}_{NsRRR} = Y - \hat{Y}_{NsRRR}$. Clearly, $\tilde{Y}_{sPLS}$ depends on $\hat{u}$ only, whereas $\tilde{Y}_{NsRRR}$ depends on both $\hat{a}$ and $\hat{b}$, so the second-layer sPLS estimates and the second-rank NsRRR estimates will in general not be proportional. It is also worth noting that the orthogonality constraints $b_i^T X^T X b_j = 0$ in RRR and $u_i^T X^T X u_j = 0$ in PLS, for $i \neq j$, are both relaxed when sparse estimates are derived in NsRRR and sPLS.
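The sPLS deflation step described above, $\tilde{Y}_{sPLS} = Y - X\hat{u}(\hat{u}^T X^T X \hat{u})^{-1}\hat{u}^T X^T Y$, can be sketched in a few lines of NumPy. The function name `spls_deflate`, the simulated data, and the particular sparse loading vector are hypothetical and serve only to illustrate the formula.

```python
import numpy as np

def spls_deflate(X, Y, u_hat):
    """Return Y minus its projection onto the X-score t = X u_hat."""
    t = X @ u_hat                            # first-layer X-score, length n
    Y_fit = np.outer(t, t @ Y) / (t @ t)     # X u (u^T X^T X u)^{-1} u^T X^T Y
    return Y - Y_fit

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 5))
Y = rng.standard_normal((30, 3))
# Hypothetical sparse, unit-norm first-layer loading (two non-zero entries).
u_hat = np.array([0.6, 0.0, -0.8, 0.0, 0.0])

Y_tilde = spls_deflate(X, Y, u_hat)
# The deflated responses are orthogonal to the first-layer score.
print(np.allclose((X @ u_hat) @ Y_tilde, 0.0))
```

Because the deflated matrix is computed from $\hat{u}$ alone, nothing about the $Y$-loading enters this step, which is the contrast with the NsRRR residual drawn in the text.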

Another widely applied LVM is canonical correlation analysis (CCA) [Hotelling, 1936]. Similar to PLS, CCA requires a $p$-dimensional vector $u$ and a $q$-dimensional vector $v$ in order to compute the $X$-score $t = Xu$ and the $Y$-score $s = Yv$. However, unlike PLS, CCA seeks $u$ and $v$ such that the correlation between $t$ and $s$ is maximised [Johnson, 1998]. Let $U$ and $V$ be $p \times R$ and $q \times R$ matrices whose columns consist of the CCA loadings. The estimates $\hat{U}$ and $\hat{V}$ are obtained by maximising (3.24) subject to the constraints:

$$\begin{aligned}
U_j^T X^T X U_j &= 1, & j &= 1, 2, \ldots, R \\
V_k^T Y^T Y V_k &= 1, & k &= 1, 2, \ldots, R
\end{aligned} \tag{3.30}$$

The solutions for the $r$th loadings are:

$$\hat{U}_r = (X^T X)^{-1} X^T Y (Y^T Y)^{-1/2} H^{\star}_{(r)} (\theta^{\star}_{(r)})^{-1}, \qquad \hat{V}_r = (Y^T Y)^{-1/2} H^{\star}_{(r)} \tag{3.31}$$

where $H^{\star}_{(r)}$ is the normalised eigenvector associated with the $r$th largest eigenvalue of

$$(Y^T Y)^{-1/2}\, Y^T X (X^T X)^{-1} X^T Y\, (Y^T Y)^{-1/2} \tag{3.32}$$

and $\theta^{\star}_{(r)}$ is the square root of the $r$th largest eigenvalue of (3.32).
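The CCA solution (3.31) and (3.32) can be checked numerically. The sketch below is a direct transcription of those formulas under stated assumptions: the helper names `inv_sqrt` and `cca_closed_form` and the simulated, column-centred data are introduced here for illustration, and explicit matrix inverses are used for readability rather than numerical robustness.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a symmetric positive-definite matrix."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(w ** -0.5) @ Q.T

def cca_closed_form(X, Y, R):
    """Evaluate the CCA loadings (3.31) from the eigen-decomposition of (3.32)."""
    Sxx_inv = np.linalg.inv(X.T @ X)
    Syy_ih = inv_sqrt(Y.T @ Y)                           # (Y^T Y)^{-1/2}
    M = Syy_ih @ Y.T @ X @ Sxx_inv @ X.T @ Y @ Syy_ih    # matrix in (3.32)
    M = (M + M.T) / 2                                    # symmetrise against rounding error
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1][:R]                # R largest eigenvalues
    H = eigvecs[:, order]
    theta = np.sqrt(eigvals[order])
    U = Sxx_inv @ X.T @ Y @ Syy_ih @ H / theta
    V = Syy_ih @ H
    return U, V, theta

rng = np.random.default_rng(2)
n, p, q, R = 100, 5, 3, 2
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal((p, q)) + rng.standard_normal((n, q))
X -= X.mean(axis=0)                                      # centre columns so that score
Y -= Y.mean(axis=0)                                      # correlations match theta

U, V, theta = cca_closed_form(X, Y, R)
# Constraints (3.30): both Gram matrices of the scores should be the identity.
print(np.allclose(U.T @ X.T @ X @ U, np.eye(R)), np.allclose(V.T @ Y.T @ Y @ V, np.eye(R)))
```

With centred data, the sample correlation between the first pair of scores, `X @ U[:, 0]` and `Y @ V[:, 0]`, coincides with `theta[0]`, which is the usual interpretation of $\theta^{\star}_{(r)}$ as a canonical correlation.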

There is an intimate connection between CCA and RRR if we take $\Gamma = (Y^T Y)^{-1}$ in RRR. As such, the RRR solutions (3.7) and (3.6) give, for the loadings $\hat{b}_{(r)}$:

$$\hat{b}_{(r)} = (X^T X)^{-1} X^T Y (Y^T Y)^{-1/2} H_{(r)}$$

where $H_{(r)}$ is the normalised eigenvector associated with the $r$th largest eigenvalue of

$$(Y^T Y)^{-1/2}\, Y^T X (X^T X)^{-1} X^T Y\, (Y^T Y)^{-1/2} \tag{3.34}$$

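Note that (3.34) is identical to (3.32), and thus $H_{(r)} = H^{\star}_{(r)}$. We therefore have the following connection between the RRR estimates $\hat{A}$, $\hat{B}$ and the CCA estimates $\hat{U}$, $\hat{V}$:

$$\hat{U} = \hat{B}\,(\Theta_{(R)})^{-1}, \qquad \hat{V} = (Y^T Y)^{-1} \hat{A}^T \tag{3.35}$$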

where $\Theta_{(R)}$ is the $R \times R$ diagonal matrix whose diagonal contains the square roots of the $R$ largest eigenvalues of (3.34). Equation (3.35) shows that there exist linear transformations that map the CCA estimates to the RRR estimates, and vice versa.
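The first identity in (3.35) can be read off one column at a time. The short derivation below is a sketch that uses only (3.31), the RRR solution for the $r$th loading under $\Gamma = (Y^T Y)^{-1}$, and the equality of the eigenvectors of (3.32) and (3.34); it assumes that $\hat{b}_{(r)}$ denotes the $r$th column of $\hat{B}$.

```latex
\begin{align*}
\hat{U}_r
  &= (X^T X)^{-1} X^T Y (Y^T Y)^{-1/2} H^{\star}_{(r)} \,(\theta^{\star}_{(r)})^{-1}
     && \text{by (3.31)} \\
  &= (X^T X)^{-1} X^T Y (Y^T Y)^{-1/2} H_{(r)} \,(\theta^{\star}_{(r)})^{-1}
     && \text{since } H_{(r)} = H^{\star}_{(r)} \\
  &= \hat{b}_{(r)} \,(\theta^{\star}_{(r)})^{-1}.
\end{align*}
```

Collecting the columns for $r = 1, \ldots, R$ and placing the $\theta^{\star}_{(r)}$ on the diagonal of $\Theta_{(R)}$ gives $\hat{U} = \hat{B}\,(\Theta_{(R)})^{-1}$.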

In summary, CCA and PLS both seek linear transformations that project predictors and responses into low-dimensional vector spaces, such that the correlation or covariance between the low-dimensional scores is maximised. We illustrate the model architecture in Figure 3.5. This differs from the RRR model architecture illustrated in Figure 3.2: in CCA and PLS, the associations between predictors and responses are established between the low-dimensional projections, whereas RRR regresses the response variables on a set of latent variables formed from linear combinations of the predictors.

In document Sparse multivariate models for pattern detection in high-dimensional biological data (Page 120-125)