The sparse reduced-rank regression model - Sparse reduced-rank regression for imaging genetics

estimation of the individual univariate-response regressions. The choice of $\Gamma = \left[(Y - X\hat{C}^{(R)})'(Y - X\hat{C}^{(R)})\right]^{-1}$ yields the maximum likelihood estimates (MLE) of the RRR model, under the assumption that the error terms in Equation (3.3) are independent and identically distributed with each $e_{i\cdot} \sim MVN(0, \Sigma)$, where $e_{i\cdot}$ is the $i$th row of the error matrix $E$ and $\Sigma$ is the error covariance matrix. Moreover, it can be shown that the MLEs of RRR are scaled versions of the corresponding estimates $\hat{A}$ and $\hat{B}$ obtained by setting $\Gamma = (Y'Y)^{-1}$, and that the product $\hat{C} = \hat{B}\hat{A}$ is the same for both choices of $\Gamma$ (Reinsel and Velu, 1998).

3.2 The sparse reduced-rank regression model

In this section we apply penalisation techniques, in particular Lasso penalties, to the RRR model to induce the required sparsity in the model, allowing for the identification of the important variables driving the association. The factorisation of the regression coefficient $C = BA$, as illustrated in Equation (3.3), enables us to apply separate sparsity constraints on each of $A$ and $B$, related to phenotype and genotype variable selection, respectively. For instance, in CP-GWA studies only sparsity in $B$ might be required, whereas in BW-GWA studies both $A$ and $B$ are required to be sparse.

In high dimensional problems, when the number of variables in both domains greatly exceeds the number of observations, it is common to assume that the covariance matrices of $X$ and $Y$ are diagonal. As discussed in Chapter 2, diagonalising the covariances can also be considered as extreme penalisation (see Section 2.1). In fact, this has been successfully done in studies involving genomic and gene expression data that also pose complex correlational structures (Witten et al., 2009; Parkhomenko et al., 2009; Waaijenborg et al., 2008). Taking this strategy, i.e. approximating $X'X$ by $I_p$ and also setting $\Gamma$ equal to $I_q$, Equation (3.4) can be rewritten as

$$M = \mathrm{Tr}\{YY'\} - 2\,\mathrm{Tr}\{AY'XB\} + \mathrm{Tr}\{AA'B'B\}.$$
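The expansion above can be checked numerically. The following is a minimal sketch, assuming that with $\Gamma = I_q$ the criterion in Equation (3.4) reduces to the residual sum of squares $\mathrm{Tr}\{(Y - XBA)(Y - XBA)'\}$; the dimensions and the random data are purely illustrative, and $X$ is built with orthonormal columns so that $X'X = I_p$ holds exactly rather than approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q, r = 50, 8, 5, 2  # illustrative sizes, not taken from the text

# Predictors with orthonormal columns, so that X'X = I_p holds exactly.
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y = rng.standard_normal((n, q))
B = rng.standard_normal((p, r))  # genotype-side coefficients (p x r)
A = rng.standard_normal((r, q))  # phenotype-side coefficients (r x q)

# Direct evaluation (assumed form of the criterion with Gamma = I_q).
residual = Y - X @ B @ A
m_direct = np.trace(residual @ residual.T)

# Expanded form as written in the text.
m_expanded = (np.trace(Y @ Y.T)
              - 2.0 * np.trace(A @ Y.T @ X @ B)
              + np.trace(A @ A.T @ B.T @ B))

print(np.isclose(m_direct, m_expanded))
```

When $X'X$ is only approximately the identity, the two quantities differ by $\mathrm{Tr}\{AA'B'(X'X - I_p)B\}$, which is the term the diagonalisation strategy discards.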

obtained by solving the corresponding penalised least squares problem,

$$\arg\min_{a,b}\ \left\{ -2aY'Xb + aa'b'b + 2\lambda_a\|a\|_1 + 2\lambda_b\|b\|_1 \right\}$$

where an $\ell_1$ penalty has been added to each of the coefficient vectors $a$ and $b$. As with Lasso regression, constraining the $\ell_1$ norms of the coefficients results in estimates that are shrunk towards zero. The penalisation parameters $\lambda_b$ and $\lambda_a$ control the sparsity, and hence the number of predictors and responses that are included in the model, respectively. When $\lambda_a$ is zero, no phenotype selection is performed, whereas when $\lambda_b$ is zero, no genotype selection is performed.

As mentioned earlier, the estimated coefficients should also satisfy some normalisation constraints as defined in Equation (3.7). Incorporating these constraints in the optimisation problem then amounts to optimising the Lagrangian,

$$\arg\min_{a,b}\ \left\{ -2aY'Xb + aa'b'b + 2\lambda_a\|a\|_1 + 2\lambda_b\|b\|_1 + \delta_a aa' + \delta_b b'b \right\}$$

where $\delta_a$ and $\delta_b$ are Lagrangian multipliers. The problem is biconvex in $a$ and $b$: it is convex in each vector when the other is held fixed, so it can be solved iteratively as a sequence of convex optimisation problems. For fixed $a$ satisfying the normalisation conditions (3.7) (i.e. $aa' = 1$) and fixed penalisation parameter $\lambda_b$, the optimal $\hat{b}$ should minimise

$$h(b) = -2aY'Xb + b'b + 2\lambda_b\|b\|_1 + \delta_b b'b.$$

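Differentiating with respect to $b_s$ and setting the result to zero gives

$$-2x_s'Ya' + 2\hat{b}_s + 2\lambda_b m_s + 2\delta_b\hat{b}_s = 0$$

where $x_s$ is the $s$th column of $X$, $m_s = \mathrm{sign}(\hat{b}_s)$ if $\hat{b}_s \neq 0$ and $m_s \in [-1, 1]$ otherwise. Therefore, for $\hat{b}_s \neq 0$

$$\hat{b}_s = \frac{1}{1 + \delta_b}\left( x_s'Ya' - \lambda_b\,\mathrm{sign}(\hat{b}_s) \right) \tag{3.10}$$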

and for $\hat{b}_s = 0$

$$|x_s'Ya'| \le \lambda_b. \tag{3.11}$$

Equations (3.10) and (3.11) taken together give

$$\hat{b}_s = \frac{1}{1 + \delta_b}\, S_{\lambda_b}(x_s'Ya')$$

where the Lagrangian multiplier $\delta_b$ should be chosen such that $b$ satisfies the normalisation constraint $b'b = \theta^2$, where $\theta^2$ is the largest eigenvalue of $\Theta^2_{(r)}$, defined in Equation (3.7). Thus, the optimal $\hat{b}$ is given by

$$\hat{b} = \theta\, \frac{S_{\lambda_b}(X'Ya')}{\|S_{\lambda_b}(X'Ya')\|_2}. \tag{3.12}$$
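As an illustration of update (3.12), the sketch below assumes that $S_\lambda$ is the usual elementwise soft-thresholding operator, $S_\lambda(z) = \mathrm{sign}(z)\,(|z| - \lambda)_+$, which is the form implied by (3.10) and (3.11). The function names `soft_threshold` and `update_b`, the toy matrices and the value of `theta` are hypothetical and chosen only so the result is easy to verify by hand.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise S_lam(z) = sign(z) * max(|z| - lam, 0)."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def update_b(X, Y, a, lam_b, theta):
    """Update (3.12): threshold X'Ya', then rescale to Euclidean norm theta."""
    s = soft_threshold(X.T @ Y @ a, lam_b)
    norm = np.linalg.norm(s)
    if norm == 0.0:
        raise ValueError("lam_b thresholds every coefficient to zero")
    return theta * s / norm

# Toy data chosen so that X'Ya' = (3, -2, 0.5, 0).
X = np.eye(4)
Y = np.array([[3.0, 0.0],
              [-2.0, 0.0],
              [0.5, 0.0],
              [0.0, 1.0]])
a = np.array([1.0, 0.0])  # satisfies aa' = 1

b_hat = update_b(X, Y, a, lam_b=1.0, theta=2.0)
```

With $\lambda_b = 1$ the thresholded vector is $(2, -1, 0, 0)$, so the two smallest entries are removed from the model and the remaining ones are rescaled so that $\hat{b}'\hat{b} = \theta^2 = 4$.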

For fixed $b$ satisfying the normalisation constraints (i.e. $b'b = \theta^2$) and fixed parameter $\lambda_a$, the optimal $\hat{a}$ should minimise

$$h(a) = -2aY'Xb + \theta^2 aa' + 2\lambda_a\|a\|_1 + \delta_a aa'.$$

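Similarly, taking the partial derivative with respect to $a_s$ and setting it to zero gives

$$\hat{a}_s = \frac{1}{\theta^2 + \delta_a}\, S_{\lambda_a}(b'X'y_s)$$

where $y_s$ is the $s$th column of $Y$ and the Lagrangian multiplier $\delta_a$ should be chosen to satisfy the normalisation condition $aa' = 1$. Thus the optimal $\hat{a}$ is given by

$$\hat{a} = \frac{S_{\lambda_a}(b'X'Y)}{\|S_{\lambda_a}(b'X'Y)\|_2}. \tag{3.13}$$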

Starting with initial arbitrary coefficient vectors ˆa and ˆb, the solutions are found by using the updates (3.12) and (3.13) iteratively until convergence. The corresponding rank-one

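Algorithm sRRR

Input: Data: $X$, $Y$; Parameters: $\lambda_a$, $\lambda_b$

Output: Vectors: $\hat{b}$, $\hat{a}$

1. Find starting vector $a_0$ such that $a_0 a_0' = 1$
2. Find starting vector $b_0$ such that $b_0' b_0 = \theta^2$
3. repeat
4.     $\hat{a} \leftarrow S_{\lambda_a}(b_0'X'Y) \,/\, \|S_{\lambda_a}(b_0'X'Y)\|_2$
5.     $\hat{b} \leftarrow \theta\, S_{\lambda_b}(X'Y\hat{a}') \,/\, \|S_{\lambda_b}(X'Y\hat{a}')\|_2$
6.     $a_0 \leftarrow \hat{a}$
7.     $b_0 \leftarrow \hat{b}$
8. until $\hat{b}$ and $\hat{a}$ converge.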
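The iteration in Algorithm sRRR can be sketched in NumPy as follows. This is an illustrative sketch rather than a reference implementation: the function name `srrr_rank_one`, the uniform starting vectors, the stopping rule (`tol`, `max_iter`) and the default `theta=1.0` are all assumptions, since the text only requires arbitrary normalised starting vectors and leaves the value of $\theta$ to Equation (3.7). The operator $S_\lambda$ is again assumed to be elementwise soft-thresholding.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise S_lam(z) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def rescale(v, target_norm):
    """Rescale v to the requested Euclidean norm."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("penalty thresholds every coefficient to zero")
    return target_norm * v / norm

def srrr_rank_one(X, Y, lam_a, lam_b, theta=1.0, max_iter=500, tol=1e-8):
    """Alternate updates (3.13) and (3.12) until a and b stop changing."""
    M = X.T @ Y                                # p x q cross-product X'Y
    a = rescale(np.ones(Y.shape[1]), 1.0)      # step 1: a0 a0' = 1
    b = rescale(np.ones(X.shape[1]), theta)    # step 2: b0' b0 = theta^2
    for _ in range(max_iter):
        a_new = rescale(soft_threshold(b @ M, lam_a), 1.0)        # step 4
        b_new = rescale(soft_threshold(M @ a_new, lam_b), theta)  # step 5
        change = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new                    # steps 6 and 7
        if change < tol:
            break
    return a, b

# Toy example: 3 of 30 predictors drive 2 of 10 responses.
rng = np.random.default_rng(1)
n, p, q = 200, 30, 10
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # X'X = I_p exactly
Y = 0.01 * rng.standard_normal((n, q))
Y[:, :2] += X[:, :3].sum(axis=1)[:, None]

a_hat, b_hat = srrr_rank_one(X, Y, lam_a=0.1, lam_b=0.1)
print(np.flatnonzero(a_hat), np.flatnonzero(b_hat))
```

In this low-noise toy setting the non-zero entries of `a_hat` and `b_hat` should coincide with the responses and predictors that carry the simulated signal. Computing `X.T @ Y` once outside the loop is a design choice that keeps each iteration at the cost of two matrix-vector products.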

After the rank-one sparse solution has been found, further ranks can be obtained from the residuals of the data matrices, $X$ and $Y$. In particular, the $r$th pair of regression coefficients, denoted as $\hat{b}_{(r)}$ and $\hat{a}_{(r)}$, are obtained using the data matrices $X_{(r)}$ and $Y_{(r)}$, and the residual matrices are formed as

$$\begin{aligned} X_{(r+1)} &= X_{(r)} - \hat{\gamma}\, X_{(r)}\hat{b}_{(r)} \\ Y_{(r+1)} &= Y_{(r)} - \hat{\delta}\, Y_{(r)}\hat{a}'_{(r)} \end{aligned} \tag{3.14}$$

where $\hat{\gamma}$ and $\hat{\delta}$ are obtained from regressing $X_{(r)}$ on $X_{(r)}\hat{b}_{(r)}$ and $Y_{(r)}$ on $Y_{(r)}\hat{a}'_{(r)}$, respectively. These are subsequently used to obtain the $(r + 1)$th regression coefficients. A schematic illustration of both MMLR and sRRR models is given in Figure 3.1.
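The deflation step (3.14) might be sketched as below. The sketch assumes that "regressing $X_{(r)}$ on $X_{(r)}\hat{b}_{(r)}$" means a column-wise least-squares fit without intercept, so that $\hat{\gamma}$ and $\hat{\delta}$ are vectors of per-column coefficients; the function name `deflate` and the random inputs are hypothetical.

```python
import numpy as np

def deflate(X, Y, a_hat, b_hat):
    """Remove the fitted rank-one component from X and Y, as in (3.14)."""
    t = X @ b_hat                 # latent predictor score X b  (length n)
    u = Y @ a_hat                 # latent response score  Y a' (length n)
    gamma = (t @ X) / (t @ t)     # LS coefficient of each column of X on t
    delta = (u @ Y) / (u @ u)     # LS coefficient of each column of Y on u
    X_next = X - np.outer(t, gamma)
    Y_next = Y - np.outer(u, delta)
    return X_next, Y_next

# Illustrative inputs: any coefficient vectors of matching length work here.
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 5))
Y = rng.standard_normal((30, 4))
b_hat = rng.standard_normal(5)
a_hat = rng.standard_normal(4)

X_next, Y_next = deflate(X, Y, a_hat, b_hat)
```

Under this reading, the residual matrices are orthogonal to the latent scores they were regressed on, so the next pair of coefficients cannot simply reproduce the component that has already been extracted.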

Algorithms similar to the sRRR algorithm presented here have been developed for obtaining sparse CCA estimates under the assumption of covariance diagonalisation (Witten et al., 2009; Parkhomenko et al., 2009; Waaijenborg et al., 2008). Related algorithms for obtaining sparse PLS estimates have also been developed by Le Cao et al. (2008) and Chun and Keleş (2010). The similarity of these algorithms with sRRR comes from the assumption that the predictor covariance matrix is diagonal, and also from setting the weight matrix $\Gamma = I_q$ since, as discussed in Section 3.1.2, both CCA and PLS are special cases of

In document Sparse reduced-rank regression for imaging genetics studies: models and applications (Page 54-58)