estimation of the individual univariate-response regressions. The choice of

$$\Gamma = \left[(Y - X\hat{C}_{(R)})'(Y - X\hat{C}_{(R)})\right]^{-1}$$

uncovers the maximum likelihood estimates (MLE) of the RRR model, under the assumption that the error terms in Equation (3.3) are independently and identically distributed with each $\varepsilon_{i\cdot} \sim \mathrm{MVN}(0, \Sigma)$, where $\varepsilon_{i\cdot}$ is the $i$th row of the error matrix $E$ and $\Sigma$ is the error covariance matrix. Moreover, it can be shown that the MLEs of RRR are scaled versions of the corresponding estimates $\hat{A}$ and $\hat{B}$ obtained by setting $\Gamma = (Y'Y)^{-1}$, and that the product $\hat{C} = \hat{B}\hat{A}$ is the same for both choices of $\Gamma$ (Reinsel and Velu, 1998).
3.2 The sparse reduced-rank regression model
In this section we apply penalisation techniques, in particular Lasso penalties, to the RRR model to induce the required sparsity, allowing the identification of the important variables driving the association. The factorisation of the regression coefficient matrix $C = BA$, as illustrated in Equation (3.3), enables us to apply separate sparsity constraints on $A$ and $B$, corresponding to phenotype and genotype variable selection, respectively. For instance, in CP-GWA studies only sparsity in $B$ might be required, whereas in BW-GWA studies both $A$ and $B$ are required to be sparse.
In high-dimensional problems, where the number of variables in both domains greatly exceeds the number of observations, it is common to assume that the covariance matrices of $X$ and $Y$ are diagonal. As discussed in Chapter 2, diagonalising the covariances can be considered an extreme form of penalisation (see Section 2.1). This approach has been used successfully in studies involving genomic and gene expression data, which also exhibit complex correlation structures (Witten et al., 2009; Parkhomenko et al., 2009; Waaijenborg et al., 2008). Taking this strategy, i.e. approximating $X'X$ by $I_p$ and also setting $\Gamma$ equal to $I_q$, Equation (3.4) can be rewritten as

$$M = \mathrm{Tr}\{YY'\} - 2\,\mathrm{Tr}\{AY'XB\} + \mathrm{Tr}\{AA'B'B\}.$$
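As a quick numerical sanity check of this expansion, the sketch below compares $\mathrm{Tr}\{(Y - XBA)(Y - XBA)'\}$ with the three-term expression on a toy example. It assumes a hypothetical $X$ with orthonormal columns, so that $X'X = I_p$ holds exactly; with real data the three-term form is only an approximation. Matrices are plain Python lists of rows, and the helper names are illustrative.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

# Toy data (assumed): X is n x p with orthonormal columns, Y is n x q,
# B is p x r and A is r x q with r = 1.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
Y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0], [2.0]]
A = [[0.5, -1.0]]

# Left-hand side: Tr{(Y - XBA)(Y - XBA)'}.
XBA = matmul(matmul(X, B), A)
R = [[Y[i][j] - XBA[i][j] for j in range(len(Y[0]))] for i in range(len(Y))]
M_direct = trace(matmul(R, transpose(R)))

# Right-hand side: Tr{YY'} - 2 Tr{AY'XB} + Tr{AA'B'B}.
M_expanded = (trace(matmul(Y, transpose(Y)))
              - 2 * trace(matmul(matmul(matmul(A, transpose(Y)), X), B))
              + trace(matmul(matmul(A, transpose(A)), matmul(transpose(B), B))))

print(M_direct, M_expanded)  # both equal 110.25 for this toy data
```

Replacing the orthonormal $X$ with a general matrix breaks the equality, which is exactly the approximation introduced by setting $X'X \approx I_p$.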
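For the first rank, let $b$ ($p \times 1$) denote the first column of $B$ and $a$ ($1 \times q$) the first row of $A$. Sparse estimates of $a$ and $b$ can then be obtained by solving the corresponding penalised least squares problem,

$$\arg\min_{a,b}\left\{-2aY'Xb + aa'b'b + 2\lambda_a\|a\|_1 + 2\lambda_b\|b\|_1\right\},$$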
where an $\ell_1$ penalty has been added to penalise both coefficient vectors $a$ and $b$. As with Lasso regression, constraining the $\ell_1$ norms of the coefficients results in estimates that are shrunk towards zero. The penalisation parameters $\lambda_b$ and $\lambda_a$ control the sparsity, and hence the number of predictors and responses that are included in the model, respectively. When $\lambda_a$ is zero no phenotype selection is performed, whereas when $\lambda_b$ is zero no genotype selection is performed.
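The criterion can be evaluated directly for a candidate pair $(a, b)$. The sketch below is illustrative only: the function name `penalised_criterion` is hypothetical, $a$ is stored as a length-$q$ list (the row vector) and $b$ as a length-$p$ list, and the constant $\mathrm{Tr}\{YY'\}$ is omitted exactly as in the criterion above.

```python
def penalised_criterion(X, Y, a, b, lam_a, lam_b):
    """-2 aY'Xb + aa' b'b + 2 lam_a ||a||_1 + 2 lam_b ||b||_1 for a rank-one pair.

    X is n x p and Y is n x q (lists of rows); a has length q, b has length p.
    """
    n = len(X)
    Ya = [sum(Y[i][k] * a[k] for k in range(len(a))) for i in range(n)]  # Y a'
    Xb = [sum(X[i][j] * b[j] for j in range(len(b))) for i in range(n)]  # X b
    cross = sum(u * v for u, v in zip(Ya, Xb))                           # a Y' X b
    aa = sum(v * v for v in a)
    bb = sum(v * v for v in b)
    l1_a = sum(abs(v) for v in a)
    l1_b = sum(abs(v) for v in b)
    return -2 * cross + aa * bb + 2 * lam_a * l1_a + 2 * lam_b * l1_b

# Illustrative toy data, not taken from the study.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
Y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
a, b = [0.5, -1.0], [1.0, 2.0]
print(penalised_criterion(X, Y, a, b, lam_a=0.0, lam_b=0.0))  # 19.25 (fit term only)
print(penalised_criterion(X, Y, a, b, lam_a=0.1, lam_b=0.2))  # fit term plus l1 costs
```

Increasing $\lambda_a$ or $\lambda_b$ raises the cost of every non-zero entry, which is what pushes small coefficients to exactly zero at the optimum.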
As mentioned earlier, the estimated coefficients should also satisfy the normalisation constraints defined in Equation (3.7). Incorporating these constraints into the optimisation problem amounts to optimising the Lagrangian,
$$\arg\min_{a,b}\left\{-2aY'Xb + aa'b'b + 2\lambda_a\|a\|_1 + 2\lambda_b\|b\|_1 + \delta_a aa' + \delta_b b'b\right\},$$
where $\delta_a$ and $\delta_b$ are Lagrangian multipliers. The problem is biconvex in $a$ and $b$: it is convex in each vector when the other is held fixed, so it can be solved iteratively as a sequence of convex optimisation problems. For fixed $a$ satisfying the normalisation condition (3.7) (i.e. $aa' = 1$) and a fixed penalisation parameter $\lambda_b$, the optimal $\hat{b}$ minimises
$$h(b) = -2aY'Xb + b'b + 2\lambda_b\|b\|_1 + \delta_b b'b.$$
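Differentiating with respect to $b_s$, where $x_s$ denotes the $s$th column of $X$, and setting the subgradient to zero gives

$$-2x_s'Ya' + 2\hat{b}_s + 2\lambda_b m_s + 2\delta_b\hat{b}_s = 0,$$

where $m_s = \mathrm{sign}(\hat{b}_s)$ if $\hat{b}_s \neq 0$ and $m_s \in [-1, 1]$ otherwise. Therefore, for $\hat{b}_s \neq 0$,

$$\hat{b}_s = \frac{1}{1+\delta_b}\left(x_s'Ya' - \lambda_b\,\mathrm{sign}(\hat{b}_s)\right) \tag{3.10}$$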
and for $\hat{b}_s = 0$,

$$|x_s'Ya'| \le \lambda_b. \tag{3.11}$$
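The two conditions can be checked numerically on a single coordinate. Writing $c$ for $x_s'Ya'$, the part of $h(b)$ that depends on $b_s$ is $-2cb_s + (1+\delta_b)b_s^2 + 2\lambda_b|b_s|$. The sketch below uses illustrative values of $c$, $\lambda_b$ and $\delta_b$ (assuming $1 + \delta_b > 0$) and compares a brute-force grid minimiser with the value implied by (3.10) and (3.11): $(c - \lambda_b\,\mathrm{sign}(c))/(1+\delta_b)$ when $|c| > \lambda_b$, and zero otherwise.

```python
import math

def coordinate_objective(b_s, c, lam, delta):
    """Terms of h(b) that involve b_s, with c standing for x_s' Y a'."""
    return -2 * c * b_s + (1 + delta) * b_s * b_s + 2 * lam * abs(b_s)

def stationary_point(c, lam, delta):
    """Solution implied by (3.10) when |c| > lam and by (3.11) otherwise."""
    if abs(c) <= lam:
        return 0.0
    return (c - lam * math.copysign(1.0, c)) / (1 + delta)

lam, delta = 0.5, 0.25                          # illustrative lambda_b and delta_b
grid = [i / 1000 for i in range(-3000, 3001)]   # candidate values in [-3, 3]
for c in (2.0, -2.0, 0.3):
    brute = min(grid, key=lambda v: coordinate_objective(v, c, lam, delta))
    print(c, brute, stationary_point(c, lam, delta))
```

For $c = 0.3 < \lambda_b$ both approaches return zero, which is the mechanism by which the penalty removes predictors.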
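Equations (3.10) and (3.11) taken together give

$$\hat{b}_s = \frac{1}{1+\delta_b}\,S_{\lambda_b}(x_s'Ya'),$$

where $S_\lambda(z) = \mathrm{sign}(z)\,(|z| - \lambda)_+$ is the soft-thresholding operator, and the Lagrangian multiplier $\delta_b$ should be chosen such that $\hat{b}$ satisfies the normalisation constraint $b'b = \theta^2$, where $\theta^2$ is the largest eigenvalue of $\Theta^2_{(r)}$, defined in Equation (3.7). Thus, with $S_{\lambda_b}(\cdot)$ applied elementwise, the optimal $\hat{b}$ is given by

$$\hat{b} = \theta\,\frac{S_{\lambda_b}(X'Ya')}{\|S_{\lambda_b}(X'Ya')\|_2}. \tag{3.12}$$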
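Update (3.12) translates almost line for line into code. The sketch below is a minimal, dependency-free version: `soft_threshold` and `update_b` are illustrative names, matrices are lists of rows, $a$ is a length-$q$ list, and $\theta$ is passed in as a number rather than computed from Equation (3.7). If $\lambda_b$ is large enough to zero every entry, the normalisation is undefined; the sketch then returns a zero vector, which is a choice made here rather than part of the method.

```python
import math

def soft_threshold(z, lam):
    """Elementwise S_lam(z) = sign(z) * max(|z| - lam, 0)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]

def update_b(X, Y, a, lam_b, theta):
    """b = theta * S(X'Ya') / ||S(X'Ya')||_2, as in (3.12)."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    Ya = [sum(Y[i][k] * a[k] for k in range(q)) for i in range(n)]  # Y a'
    z = [sum(X[i][j] * Ya[i] for i in range(n)) for j in range(p)]   # X' Y a'
    s = soft_threshold(z, lam_b)
    norm = math.sqrt(sum(v * v for v in s))
    if norm == 0.0:
        return [0.0] * p  # assumption: every predictor removed by the penalty
    return [theta * v / norm for v in s]

# Illustrative toy data, not taken from the study.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.2]]
a = [0.6, 0.8]  # satisfies aa' = 1
b = update_b(X, Y, a, lam_b=1.0, theta=2.0)
print(b)  # only the first predictor survives the threshold
```

Whatever entries survive, the returned vector has squared norm $\theta^2$, so the normalisation constraint holds by construction.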
For fixed $b$ satisfying the normalisation constraint (i.e. $b'b = \theta^2$) and a fixed parameter $\lambda_a$, the optimal $\hat{a}$ minimises

$$h(a) = -2aY'Xb + \theta^2 aa' + 2\lambda_a\|a\|_1 + \delta_a aa'.$$
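Similarly, taking the partial derivative with respect to $a_s$, where $y_s$ denotes the $s$th column of $Y$, and setting it to zero gives

$$\hat{a}_s = \frac{1}{\theta^2 + \delta_a}\,S_{\lambda_a}(b'X'y_s),$$

where the Lagrangian multiplier $\delta_a$ should be chosen to satisfy the normalisation condition $aa' = 1$. Thus the optimal $\hat{a}$ is given by

$$\hat{a} = \frac{S_{\lambda_a}(b'X'Y)}{\|S_{\lambda_a}(b'X'Y)\|_2}. \tag{3.13}$$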
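Update (3.13) has the same shape as (3.12), with the roles of $X$ and $Y$ swapped and unit norm instead of $\theta$. In the sketch below, `update_a` is an illustrative name, $b$ is a length-$p$ list, and the zero-vector fallback for an all-zero thresholded vector is again an assumption made here.

```python
import math

def soft_threshold(z, lam):
    """Elementwise S_lam(z) = sign(z) * max(|z| - lam, 0)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]

def update_a(X, Y, b, lam_a):
    """a = S(b'X'Y) / ||S(b'X'Y)||_2, as in (3.13)."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    Xb = [sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]  # X b
    z = [sum(Xb[i] * Y[i][k] for i in range(n)) for k in range(q)]   # b' X' Y
    s = soft_threshold(z, lam_a)
    norm = math.sqrt(sum(v * v for v in s))
    if norm == 0.0:
        return [0.0] * q  # assumption: every response removed by the penalty
    return [v / norm for v in s]

# Illustrative toy data, not taken from the study.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.2]]
b = [2.0, 0.0, 0.0]
print(update_a(X, Y, b, lam_a=0.0))  # no phenotype selection
print(update_a(X, Y, b, lam_a=7.0))  # first response thresholded to zero
```

With $\lambda_a = 0$ the update reduces to a plain normalisation of $b'X'Y$, matching the remark that no phenotype selection is performed in that case.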
Starting with arbitrary initial coefficient vectors $\hat{a}$ and $\hat{b}$, the solutions are found by applying the updates (3.12) and (3.13) iteratively until convergence. The corresponding rank-one procedure is summarised in Algorithm sRRR.
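Algorithm sRRR
Input: Data: $X$, $Y$; Parameters: $\lambda_a$, $\lambda_b$
Output: Vectors: $\hat{b}$, $\hat{a}$
1. Find a starting vector $a_0$ such that $a_0a_0' = 1$
2. Find a starting vector $b_0$ such that $b_0'b_0 = \theta^2$
3. repeat
4. $\hat{a} \leftarrow S_{\lambda_a}(b_0'X'Y) \,/\, \|S_{\lambda_a}(b_0'X'Y)\|_2$
5. $\hat{b} \leftarrow \theta\, S_{\lambda_b}(X'Y\hat{a}') \,/\, \|S_{\lambda_b}(X'Y\hat{a}')\|_2$
6. $a_0 \leftarrow \hat{a}$
7. $b_0 \leftarrow \hat{b}$
8. until $\hat{b}$ and $\hat{a}$ converge.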
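A compact sketch of Algorithm sRRR follows, again in dependency-free Python. Several details are assumptions made here for illustration: the starting vectors are the normalised all-ones vectors (any vectors meeting steps 1 and 2 would do), $\theta$ is supplied by the caller instead of being taken from Equation (3.7), and convergence is declared when no entry of $\hat{a}$ or $\hat{b}$ changes by more than a tolerance `tol` between iterations, with `max_iter` as a safeguard. All function names are hypothetical.

```python
import math

def soft_threshold(z, lam):
    """Elementwise S_lam(z) = sign(z) * max(|z| - lam, 0)."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]

def _normalise(s, scale):
    """Rescale s to Euclidean norm `scale`; an all-zero s gives a zero vector."""
    norm = math.sqrt(sum(v * v for v in s))
    return [scale * v / norm for v in s] if norm > 0.0 else [0.0] * len(s)

def update_a(X, Y, b, lam_a):
    """Step 4: a <- S(b'X'Y) / ||S(b'X'Y)||_2."""
    Xb = [sum(xij * bj for xij, bj in zip(row, b)) for row in X]
    z = [sum(Xb[i] * Y[i][k] for i in range(len(Y))) for k in range(len(Y[0]))]
    return _normalise(soft_threshold(z, lam_a), 1.0)

def update_b(X, Y, a, lam_b, theta):
    """Step 5: b <- theta * S(X'Ya') / ||S(X'Ya')||_2."""
    Ya = [sum(yik * ak for yik, ak in zip(row, a)) for row in Y]
    z = [sum(X[i][j] * Ya[i] for i in range(len(X))) for j in range(len(X[0]))]
    return _normalise(soft_threshold(z, lam_b), theta)

def srrr_rank_one(X, Y, lam_a, lam_b, theta, tol=1e-10, max_iter=100):
    """Rank-one sparse RRR by alternating the updates (3.13) and (3.12)."""
    p, q = len(X[0]), len(Y[0])
    a0 = [1.0 / math.sqrt(q)] * q        # step 1: a0 a0' = 1
    b0 = [theta / math.sqrt(p)] * p      # step 2: b0' b0 = theta^2
    for _ in range(max_iter):            # step 3: repeat
        a_hat = update_a(X, Y, b0, lam_a)             # step 4
        b_hat = update_b(X, Y, a_hat, lam_b, theta)   # step 5
        change = max(abs(u - v) for u, v in zip(a0 + b0, a_hat + b_hat))
        a0, b0 = a_hat, b_hat            # steps 6 and 7
        if change < tol:                 # step 8: until convergence
            break
    return b0, a0

# Illustrative toy data, not taken from the study.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [[3.0, 4.0], [1.0, 0.0], [0.0, 0.2]]
b_hat, a_hat = srrr_rank_one(X, Y, lam_a=0.0, lam_b=1.0, theta=2.0)
print(b_hat, a_hat)
```

Because each update only rescales a thresholded cross-product, the cost per iteration is dominated by the products with $X$ and $Y$; no $p \times p$ or $q \times q$ covariance matrix is ever formed, which is the practical payoff of the diagonal-covariance assumption.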
After the rank-one sparse solution has been found, further ranks can be obtained from the residuals of the data matrices $X$ and $Y$. In particular, the $r$th pair of regression coefficients, denoted $\hat{b}_{(r)}$ and $\hat{a}_{(r)}$, is obtained using the data matrices $X_{(r)}$ and $Y_{(r)}$, and the residual matrices are formed as
$$X_{(r+1)} = X_{(r)} - X_{(r)}\hat{b}_{(r)}\hat{\gamma}, \qquad Y_{(r+1)} = Y_{(r)} - Y_{(r)}\hat{a}_{(r)}'\hat{\delta}, \tag{3.14}$$
where $\hat{\gamma}$ and $\hat{\delta}$ are the coefficients obtained from regressing $X_{(r)}$ on $X_{(r)}\hat{b}_{(r)}$ and $Y_{(r)}$ on $Y_{(r)}\hat{a}_{(r)}'$, respectively. The residual matrices are subsequently used to obtain the $(r+1)$th regression coefficients. A schematic illustration of both the MMLR and sRRR models is given in Figure 3.1.
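The deflation step (3.14) can be sketched as follows. Here "regressing $X_{(r)}$ on $X_{(r)}\hat{b}_{(r)}$" is read as an ordinary least-squares regression, without intercept, of every column of $X_{(r)}$ on the score vector $t = X_{(r)}\hat{b}_{(r)}$, so that the coefficients are $(t't)^{-1}t'X_{(r)}$ and the fitted values $t$ times those coefficients are subtracted; this reading and the helper name `deflate` are assumptions for illustration. The same function deflates $Y_{(r)}$ when given $\hat{a}_{(r)}$ as a length-$q$ list.

```python
def deflate(M, w):
    """Residual of M (n x m, list of rows) after regressing each column on t = M w.

    Computes M - t * g with g = (t't)^{-1} t'M, i.e. least squares without
    intercept. Returns a copy of M if t is the zero vector.
    """
    n, m = len(M), len(M[0])
    t = [sum(M[i][j] * w[j] for j in range(m)) for i in range(n)]  # score vector M w
    tt = sum(v * v for v in t)
    if tt == 0.0:
        return [row[:] for row in M]
    g = [sum(t[i] * M[i][j] for i in range(n)) / tt for j in range(m)]  # coefficients
    return [[M[i][j] - t[i] * g[j] for j in range(m)] for i in range(n)]

# Illustrative toy data, not taken from the study.
X_r = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b_r = [1.0, 0.0]
X_next = deflate(X_r, b_r)
# The deflated matrix maps b_r to (numerically) zero.
print([sum(row[j] * b_r[j] for j in range(2)) for row in X_next])
```

Since every column of the residual is orthogonal to $t$, the next rank cannot simply rediscover the direction already extracted.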
Similar algorithms to the sRRR algorithm presented here have been developed for obtaining sparse CCA estimates under the assumption of covariance diagonalisation (Witten et al., 2009; Parkhomenko et al., 2009; Waaijenborg et al., 2008). Related algorithms for obtaining sparse PLS estimates have also been developed by Le Cao et al. (2008) and Chun and Keleş (2010). The similarity of these algorithms to sRRR stems from the assumption that the predictor covariance matrix is diagonal, and also from setting the weight matrix $\Gamma = I_q$ since, as discussed in Section 3.1.2, both CCA and PLS are special cases of