Simultaneous variable selection for joint models of longitudinal and survival outcomes


Web-based Supplemental Materials for

Simultaneous Variable Selection for Joint Models of Longitudinal and Survival Outcomes

by Zangdong He, Wanzhu Tu, Sijian Wang, Haoda Fu and Zhangsheng Yu

1 Expectation conditional maximization procedures to optimize the penalized likelihood

Let $\Theta = (\theta, \zeta_{1m}, \eta_{2l})$, where $\theta = (\beta_1, \beta_2, \Gamma_1, \Gamma_2, \phi)$ is defined in Section 2.2.

The expectation conditional maximization procedure proposed to optimize the penalized likelihood is as follows:

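1. Initialize $(\beta_1^{(0)}, \beta_2^{(0)}, \gamma_{1m}^{(0)}, \zeta_{1m}^{(0)}, \gamma_{2l}^{(0)}, \eta_{2l}^{(0)}, \phi^{(0)})$ with some plausible values.

2. For iteration $s$, update $\beta_1, \beta_2$ by adaptive LASSO:

$$(\hat\beta_1^{(s)}, \hat\beta_2^{(s)}) = \arg\max_{\beta_1, \beta_2} \; \tilde{Q}\big(\beta_1, \beta_2, \hat\Gamma_1^{(s-1)}, \hat\Gamma_2^{(s-1)}, \hat\phi^{(s-1)} \mid \hat\beta_1^{(s-1)}, \hat\beta_2^{(s-1)}, \hat\Gamma_1^{(s-1)}, \hat\Gamma_2^{(s-1)}, \hat\phi^{(s-1)}\big) - \lambda_1 \sum_{j=1}^{p} \omega_{\beta_{1j}} |\beta_{1j}| - \lambda_2 \sum_{k=1}^{p} \omega_{\beta_{2k}} |\beta_{2k}|.$$

3. Update $\gamma_{1m}, \gamma_{2l}$:

$$(\gamma_{1m}^{(s)}, \gamma_{2l}^{(s)}) = \arg\max_{\gamma_{1m}, \gamma_{2l}} \; \tilde{Q}\big(\hat\beta_1^{(s)}, \hat\beta_2^{(s)}, \Gamma_1, \Gamma_2, \hat\phi^{(s-1)} \mid \hat\beta_1^{(s)}, \hat\beta_2^{(s)}, \hat\Gamma_1^{(s-1)}, \hat\Gamma_2^{(s-1)}, \hat\phi^{(s-1)}\big) - \frac{1}{4} \sum_{m=2}^{q} \frac{(\lambda_3 \omega_{\gamma_{1m}})^2}{(\zeta_{1m}^{(s-1)})^2} \|\gamma_{1m}\|^2 - \frac{1}{4} \sum_{l=2}^{q} \frac{(\lambda_4 \omega_{\gamma_{2l}})^2}{(\eta_{2l}^{(s-1)})^2} \|\gamma_{2l}\|^2.$$

4. Update $\zeta_{1m}, \eta_{2l}$:

$$\zeta_{1m}^{(s)} = \sqrt{\frac{\lambda_{\gamma_1} \omega_{\gamma_{1m}}}{2} \|\gamma_{1m}^{(s)}\|}, \qquad \eta_{2l}^{(s)} = \sqrt{\frac{\lambda_{\gamma_2} \omega_{\gamma_{2l}}}{2} \|\gamma_{2l}^{(s)}\|}.$$

5. Update $\phi$:

$$\hat\phi^{(s)} = \arg\max_{\phi} \; \tilde{Q}\big(\hat\beta_1^{(s)}, \hat\beta_2^{(s)}, \hat\Gamma_1^{(s)}, \hat\Gamma_2^{(s)}, \phi \mid \hat\beta_1^{(s)}, \hat\beta_2^{(s)}, \hat\Gamma_1^{(s)}, \hat\Gamma_2^{(s)}, \hat\phi^{(s-1)}\big).$$

6. Terminate the iteration when $\max |\Theta^{(s)} - \Theta^{(s-1)}|$ is small enough. Otherwise, let $s = s + 1$ and go back to step 2.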
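One way to read steps 3 and 4 together is through a standard quadratic bound: for any $\zeta > 0$, $\lambda\omega\|\gamma\| \le \zeta^2 + (\lambda\omega)^2\|\gamma\|^2 / (4\zeta^2)$, with equality at $\zeta = \sqrt{\lambda\omega\|\gamma\|/2}$. The sketch below checks this numerically for a single row. It assumes that $\lambda_{\gamma_1}$ in step 4 denotes the same tuning parameter as $\lambda_3$ in step 3; the function names and numeric values are hypothetical and are not taken from the paper.

```python
import math

def group_penalty(lam, omega, gamma):
    """Unsquared penalty lam * omega * ||gamma||_2 on one row gamma."""
    return lam * omega * math.sqrt(sum(g * g for g in gamma))

def quadratic_surrogate(lam, omega, gamma, zeta):
    """zeta^2 + (lam * omega)^2 * ||gamma||^2 / (4 * zeta^2); its second term is the step-3 penalty."""
    sq_norm = sum(g * g for g in gamma)
    return zeta ** 2 + (lam * omega) ** 2 * sq_norm / (4.0 * zeta ** 2)

def zeta_update(lam, omega, gamma):
    """Step-4 style update: sqrt(lam * omega * ||gamma|| / 2)."""
    norm = math.sqrt(sum(g * g for g in gamma))
    return math.sqrt(lam * omega * norm / 2.0)

lam, omega = 0.8, 1.5        # hypothetical tuning parameter and adaptive weight
gamma = [0.6, -0.3, 0.2]     # hypothetical row of Gamma_1

zeta_star = zeta_update(lam, omega, gamma)
exact = group_penalty(lam, omega, gamma)
at_star = quadratic_surrogate(lam, omega, gamma, zeta_star)
elsewhere = [quadratic_surrogate(lam, omega, gamma, z) for z in (0.2, 0.5, 1.0, 2.0)]
```

At `zeta_star` the surrogate matches the unsquared penalty, and at every other `zeta` it lies above it, which is why alternating steps 3 and 4 can stand in for penalizing $\|\gamma\|$ directly.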

Before the parameters are updated in each step, the corresponding $\tilde{Q}$ function is approximated by Gaussian quadrature in the E-step. To improve computational stability, smaller subsets of $(\beta_1, \beta_2, \Gamma_1, \Gamma_2, \phi)$ can be updated iteratively: update $\beta_1$ with $(\beta_2, \Gamma_1, \Gamma_2, \phi)$ fixed, then update $\beta_2$ with $(\beta_1, \Gamma_1, \Gamma_2, \phi)$ fixed, and then update $\Gamma_1$, $\Gamma_2$, and $\phi$ in turn with the other parameters fixed. This comes at the price of more iterations.
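As a rough illustration of the quadrature idea only, the sketch below uses a three-point Gauss-Hermite rule to approximate $E[g(b)]$ for a scalar $b \sim N(0, 1)$. The source does not state which rule or how many nodes were used, and the actual E-step integrates over the multivariate random effects given the observed data, which this sketch does not attempt. The function name and test integrands are hypothetical.

```python
import math

# Three-point Gauss-Hermite rule for the weight exp(-x^2):
# the nodes are the roots of H_3(x) = 8x^3 - 12x.
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
)

def normal_expectation(g, mean=0.0, sd=1.0):
    """Approximate E[g(b)] for b ~ N(mean, sd^2) via the substitution b = mean + sqrt(2) * sd * x."""
    total = sum(w * g(mean + math.sqrt(2.0) * sd * x) for x, w in zip(NODES, WEIGHTS))
    return total / math.sqrt(math.pi)

second_moment = normal_expectation(lambda b: b ** 2)            # exact value is 1
fourth_moment = normal_expectation(lambda b: b ** 4)            # exact value is 3
mgf_at_half = normal_expectation(lambda b: math.exp(0.5 * b))   # exact value is exp(0.125)
```

A three-point rule is exact for polynomials up to degree five, so the two moments are reproduced up to rounding, while the exponential integrand is only approximated. More nodes tighten the approximation at a higher cost per E-step.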

2 Data generation for simulation study: Scenario 5

In Scenario 5, we generate the longitudinal outcome $Y_{ij}$ from the following model:

$$Y_{ij} = 1 + 1.5X_{1ij,1} + 2X_{1ij,2} + 0X_{1ij,3} + 0X_{1ij,4} + b_{li,0} + b_{li,1}Z_{1ij,1} + b_{li,2}Z_{1ij,2} + b_{li,3}Z_{1ij,3} + b_{li,4}Z_{1ij,4} + \epsilon_{ij},$$

and the failure time from a Weibull distribution with the hazard function:

$$\lambda_i(t) = \lambda_0(t) \exp\big(1.5x_{2i,1} + 2x_{2i,2} + 0x_{2i,3} + 0x_{2i,4} + b_{si,0} + b_{si,1}z_{2i,1} + b_{si,2}z_{2i,2} + b_{si,3}z_{2i,3} + b_{si,4}z_{2i,4}\big),$$

for $i = 1, \dots, 800$, $j = 1, \dots, 5$, where $\lambda_0(t) = \alpha\lambda t^{\alpha-1}$ with $\alpha = 2$ and $\lambda = \exp(1) = 2.718$.

The random effect $b_i$ is independently generated from $N(0, I_5)$. $b_{li} = (b_{li,0}, b_{li,1}, b_{li,2}, b_{li,3}, b_{li,4})$ is obtained by $b_{li} = \Gamma_1 b_i$, and $b_{si} = (b_{si,0}, b_{si,1}, b_{si,2}, b_{si,3}, b_{si,4})$ is obtained by $b_{si} = \Gamma_2 b_i$, where

$$\Gamma_1 = \Gamma_2 = \sigma_D \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}^{1/2}$$

and $\sigma_D = \sqrt{0.5}$. The covariates $X_{1ij,k} = Z_{1ij,k}$ and $x_{2i,k} = z_{2i,k}$, $k = 1, \dots, 4$, are generated as independent $N(0, 1)$ variables. The measurement errors $\epsilon_{ij}$ are i.i.d. $N(0, 1)$. The censoring time is independently generated from an exponential distribution to achieve a 60% censoring percentage.
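The survival part of the Scenario 5 design can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: it assumes the exponent 1/2 on the matrix is an elementwise square root (which gives $\sqrt{D_{kk}} = 0.707$ for the three non-zero random effects, the true value listed in Web Table 3), it inverts the cumulative hazard $\lambda t^{\alpha} \exp(\cdot)$ to draw failure times, and it omits censoring. The seed and function names are hypothetical.

```python
import math
import random

SIGMA_D = math.sqrt(0.5)
ALPHA, LAM = 2.0, math.exp(1.0)     # Weibull baseline hazard: alpha * lam * t^(alpha - 1)
BETA2 = [1.5, 2.0, 0.0, 0.0]        # survival fixed effects from the hazard model

# Matrix inside the brackets; rows 4 and 5 are zero, so z_3 and z_4 carry no random effect.
M = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [1 / 2, 1 / 2, 0.0, 0.0, 0.0],
    [1 / 3, 1 / 3, 1 / 3, 0.0, 0.0],
    [0.0] * 5,
    [0.0] * 5,
]
# Assumption: the exponent 1/2 is applied elementwise.
GAMMA = [[SIGMA_D * math.sqrt(v) for v in row] for row in M]

def random_effect_sd(gamma):
    """sqrt(D_kk), where D = Gamma Gamma^T is the covariance of Gamma b_i."""
    return [math.sqrt(sum(v * v for v in row)) for row in gamma]

def draw_failure_time(rng):
    """One subject: covariates, random effects, then T by inverting the cumulative hazard."""
    x = [rng.gauss(0.0, 1.0) for _ in range(4)]                    # x_{2i,k} = z_{2i,k}
    b = [rng.gauss(0.0, 1.0) for _ in range(5)]                    # b_i ~ N(0, I_5)
    bs = [sum(g * bk for g, bk in zip(row, b)) for row in GAMMA]   # b_si = Gamma_2 b_i
    eta = sum(coef * xk for coef, xk in zip(BETA2, x))
    eta += bs[0] + sum(bsk * zk for bsk, zk in zip(bs[1:], x))
    u = 1.0 - rng.random()                                         # uniform on (0, 1]
    # S(t) = exp(-lam * t^alpha * exp(eta))  =>  t = (-log(u) / (lam * exp(eta)))^(1 / alpha)
    return (-math.log(u) / (LAM * math.exp(eta))) ** (1.0 / ALPHA)

rng = random.Random(2021)   # hypothetical seed
times = [draw_failure_time(rng) for _ in range(800)]
sds = random_effect_sd(GAMMA)
```

Each non-zero row of the bracketed matrix sums to one, so under the elementwise reading every selected random effect has the same variance $\sigma_D^2 = 0.5$.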

3 Data generation for simulation study: Scenario 6

In Scenario 6, we generate the longitudinal outcome $Y_{ij}$ from the following model:

$$Y_{ij} = 1 + 1.5X_{1ij,1} + 2X_{1ij,2} + 2.5X_{1ij,3} + 0X_{1ij,4} + 0X_{1ij,5} + 0X_{1ij,6} + 0X_{1ij,7} + b_{li,0} + b_{li,1}Z_{1ij,1} + b_{li,2}Z_{1ij,2} + b_{li,3}Z_{1ij,3} + b_{li,4}Z_{1ij,4} + b_{li,5}Z_{1ij,5} + b_{li,6}Z_{1ij,6} + b_{li,7}Z_{1ij,7} + \epsilon_{ij},$$

and the failure time from a Weibull distribution with the hazard function:

$$\lambda_i(t) = \lambda_0(t) \exp\big(1.5x_{2i,1} + 2x_{2i,2} + 2.5x_{2i,3} + 0x_{2i,4} + 0x_{2i,5} + 0x_{2i,6} + 0x_{2i,7} + b_{si,0} + b_{si,1}z_{2i,1} + b_{si,2}z_{2i,2} + b_{si,3}z_{2i,3} + b_{si,4}z_{2i,4} + b_{si,5}z_{2i,5} + b_{si,6}z_{2i,6} + b_{si,7}z_{2i,7}\big),$$

for $i = 1, \dots, 250$, $j = 1, \dots, 5$, where $\lambda_0(t) = \alpha\lambda t^{\alpha-1}$ with $\alpha = 2$ and $\lambda = \exp(1) = 2.718$.

The random effect $b_i$ is independently generated from $N(0, I_8)$. $b_{li} = (b_{li,0}, b_{li,1}, \dots, b_{li,7})$ is obtained by $b_{li} = \Gamma_1 b_i$, and $b_{si} = (b_{si,0}, b_{si,1}, \dots, b_{si,7})$ is obtained by $b_{si} = \Gamma_2 b_i$, where

$$\Gamma_1 = \Gamma_2 = \sigma_D \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}^{1/2}$$

and $\sigma_D = \sqrt{0.5}$. The covariates $X_{1ij,k} = Z_{1ij,k}$ and $x_{2i,k} = z_{2i,k}$, $k = 1, \dots, 7$, are generated as independent $N(0, 1)$ variables. The measurement errors $\epsilon_{ij}$ are i.i.d. $N(0, 1)$. The censoring time is independently generated from an exponential distribution to achieve a 30% censoring percentage.
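The source does not say how the rate of the exponential censoring distribution was chosen. One common approach, sketched here purely as an assumption, is to tune the rate by bisection on simulated failure times until the target censoring percentage is reached. The failure times below are placeholder Weibull draws without covariates or random effects, not the Scenario 6 model, and all names and seeds are hypothetical.

```python
import math
import random

def censoring_fraction(failure_times, rate, rng):
    """Share of subjects whose exponential censoring time precedes the failure time."""
    censored = sum(1 for t in failure_times if rng.expovariate(rate) < t)
    return censored / len(failure_times)

def calibrate_rate(failure_times, target, seed=0, lo=1e-4, hi=1e3, iters=60):
    """Bisection on the log scale; reusing one seed keeps the fraction monotone in the rate."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        frac = censoring_fraction(failure_times, mid, random.Random(seed))
        if frac < target:
            lo = mid     # too little censoring: a larger rate censors earlier
        else:
            hi = mid
    return math.sqrt(lo * hi)

ALPHA, LAM = 2.0, math.exp(1.0)
rng = random.Random(6)   # hypothetical seed
# Placeholder failure times with survival exp(-LAM * t^ALPHA), i.e. scale LAM^(-1/ALPHA).
times = [rng.weibullvariate(LAM ** (-1.0 / ALPHA), ALPHA) for _ in range(250)]

rate = calibrate_rate(times, target=0.30)
achieved = censoring_fraction(times, rate, random.Random(0))
```

With 250 subjects the achievable fractions move in steps of 0.4%, so the calibrated rate reaches the target only up to that granularity.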

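Web Table 1: Selection frequency of mixed effects in longitudinal and survival components for Scenario 5

Fixed effect selection

| | $X_{1,1}$ | $X_{1,2}$ | $X_{1,3}$ | $X_{1,4}$ | $X_{2,1}$ | $X_{2,2}$ | $X_{2,3}$ | $X_{2,4}$ |
|---|---|---|---|---|---|---|---|---|
| Component | Longitudinal | Longitudinal | Longitudinal | Longitudinal | Survival | Survival | Survival | Survival |
| True effect | Non-Zero | Non-Zero | Zero | Zero | Non-Zero | Non-Zero | Zero | Zero |
| Sel. Freq. (%) | 100 | 100 | 0 | 0 | 100 | 100 | 0 | 0 |

Random effect selection

| | $Z_{1,1}$ | $Z_{1,2}$ | $Z_{1,3}$ | $Z_{1,4}$ | $Z_{2,1}$ | $Z_{2,2}$ | $Z_{2,3}$ | $Z_{2,4}$ |
|---|---|---|---|---|---|---|---|---|
| Component | Longitudinal | Longitudinal | Longitudinal | Longitudinal | Survival | Survival | Survival | Survival |
| True effect | Non-Zero | Non-Zero | Zero | Zero | Non-Zero | Non-Zero | Zero | Zero |
| Sel. Freq. (%) | 100 | 100 | 0 | 0 | 99 | 99 | 1 | 0 |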

Web Table 2: Estimation of fixed effects $\beta_{1,j}$ and $\beta_{2,j}$ in longitudinal and survival components for Scenario 5

$\hat\beta_{1,j}$ ± SE (coverage probability) for the longitudinal component (a)

| | Intercept | $X_{1,1}$ | $X_{1,2}$ | $X_{1,3}$ | $X_{1,4}$ |
|---|---|---|---|---|---|
| True value $\beta$ | 1 | 1.5 | 2 | 0 | 0 |
| W/O selection $\hat\beta$ | 0.995 ± 0.033 (94%) | 1.500 ± 0.036 (95%) | 1.998 ± 0.039 (94%) | 0.002 | 0.001 |
| 1st stage $\hat\beta$ | 0.993 ± 0.033 (92%) | 1.487 ± 0.036 (90%) | 1.986 ± 0.039 (87%) | 0.000 | 0.000 |
| 2nd stage $\hat\beta$ | 0.999 ± 0.029 (95%) | 1.505 ± 0.034 (95%) | 2.001 ± 0.036 (91%) | 0.000 | 0.000 |

$\hat\beta_{2,j}$ ± SE (coverage probability) for the survival component (a)

| | Intercept | $X_{2,1}$ | $X_{2,2}$ | $X_{2,3}$ | $X_{2,4}$ |
|---|---|---|---|---|---|
| True value $\beta$ | - | 1.5 | 2 | 0 | 0 |
| W/O selection $\hat\beta$ | | 1.355 ± 0.126 (74%) | 1.844 ± 0.149 (75%) | 0.008 | 0.019 |
| 1st stage $\hat\beta$ | | 0.989 ± 0.125 (0%) | 1.381 ± 0.145 (1%) | 0.000 | 0.000 |
| 2nd stage $\hat\beta$ | | 1.348 ± 0.130 (67%) | 1.823 ± 0.152 (73%) | 0.000 | 0.000 |

(a) The $\hat\beta$s are the averages of the estimates over the 100 data sets; SE is the empirical standard error of the 100 $\hat\beta$s. For each data set, the 95% confidence interval based on the parameter and standard error estimates is calculated, and the corresponding coverage probabilities for the true value over the 100 data sets are given in parentheses. SE and coverage probability are reported only for non-zero variables.
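The summaries described in the footnote to Web Table 2 can be computed along the following lines. This is a sketch under assumptions: it takes the 95% confidence interval to be the Wald interval (estimate ± 1.96 × estimated standard error) and the empirical SE to be the sample standard deviation of the estimates, neither of which the source spells out. The function name and the five replicate values are hypothetical, not results from the paper.

```python
import statistics

def summarize(estimates, std_errors, true_value, z=1.959964):
    """Average estimate, empirical SE across replicates, and Wald 95% interval coverage."""
    mean_est = statistics.fmean(estimates)
    empirical_se = statistics.stdev(estimates)   # assumption: sample (n - 1) standard deviation
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return mean_est, empirical_se, covered / len(estimates)

# Hypothetical replicate estimates and their estimated standard errors.
estimates = [1.48, 1.52, 1.50, 1.55, 1.41]
std_errors = [0.03, 0.03, 0.03, 0.03, 0.03]

mean_est, empirical_se, coverage = summarize(estimates, std_errors, true_value=1.5)
```

In this toy input four of the five intervals contain the true value, so the coverage is 0.8. Coverage well below the nominal 95% indicates a biased estimate, an understated standard error, or both.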

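Web Table 3: Estimation of random effects $\sqrt{D_{1kk}}$ and $\sqrt{D_{2kk}}$ in longitudinal and survival components for Scenario 5

$\sqrt{\hat D_{1kk}}$ for the longitudinal component (a)

| | Intercept 1 | $Z_{1,1}$ | $Z_{1,2}$ | $Z_{1,3}$ | $Z_{1,4}$ |
|---|---|---|---|---|---|
| True value $\sqrt{D_{kk}}$ | 0.707 | 0.707 | 0.707 | 0 | 0 |
| W/O selection $\sqrt{\hat D_{kk}}$ | 0.791 | 0.817 | 0.817 | 0.052 | 0.050 |
| 1st stage $\sqrt{\hat D_{kk}}$ | 0.787 | 0.773 | 0.763 | 0.000 | 0.000 |
| 2nd stage $\sqrt{\hat D_{kk}}$ | 0.682 | 0.692 | 0.696 | 0.000 | 0.000 |

$\sqrt{\hat D_{2kk}}$ for the survival component (a)

| | Intercept 2 | $Z_{2,1}$ | $Z_{2,2}$ | $Z_{2,3}$ | $Z_{2,4}$ |
|---|---|---|---|---|---|
| True value $\sqrt{D_{kk}}$ | 0.707 | 0.707 | 0.707 | 0 | 0 |
| W/O selection $\sqrt{\hat D_{kk}}$ | 0.776 | 0.820 | 0.825 | 0.205 | 0.202 |
| 1st stage $\sqrt{\hat D_{kk}}$ | 0.407 | 0.368 | 0.319 | 0.000 | 0.000 |
| 2nd stage $\sqrt{\hat D_{kk}}$ | 0.638 | 0.665 | 0.674 | 0.004 | 0.000 |

(a) $\sqrt{\hat D_{1kk}}$ and $\sqrt{\hat D_{2kk}}$ are the averages of the estimates over the 100 data sets.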

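Web Table 4: Selection frequency of mixed effects in longitudinal and survival components for Scenario 6

Fixed effect selection

| | $X_{1,1}$ | $X_{1,2}$ | $X_{1,3}$ | $X_{1,4}$ | $X_{1,5}$ | $X_{1,6}$ | $X_{1,7}$ | $X_{2,1}$ | $X_{2,2}$ | $X_{2,3}$ | $X_{2,4}$ | $X_{2,5}$ | $X_{2,6}$ | $X_{2,7}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Component | Long. | Long. | Long. | Long. | Long. | Long. | Long. | Surv. | Surv. | Surv. | Surv. | Surv. | Surv. | Surv. |
| True effect | Non-Zero | Non-Zero | Non-Zero | Zero | Zero | Zero | Zero | Non-Zero | Non-Zero | Non-Zero | Zero | Zero | Zero | Zero |
| Sel. Freq. (%) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 100 | 100 | 100 | 0 | 0 | 0 | 0 |

Random effect selection

| | $Z_{1,1}$ | $Z_{1,2}$ | $Z_{1,3}$ | $Z_{1,4}$ | $Z_{1,5}$ | $Z_{1,6}$ | $Z_{1,7}$ | $Z_{2,1}$ | $Z_{2,2}$ | $Z_{2,3}$ | $Z_{2,4}$ | $Z_{2,5}$ | $Z_{2,6}$ | $Z_{2,7}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Component | Long. | Long. | Long. | Long. | Long. | Long. | Long. | Surv. | Surv. | Surv. | Surv. | Surv. | Surv. | Surv. |
| True effect | Non-Zero | Non-Zero | Non-Zero | Zero | Zero | Zero | Zero | Non-Zero | Non-Zero | Non-Zero | Zero | Zero | Zero | Zero |
| Sel. Freq. (%) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 97 | 93 | 94 | 6 | 4 | 1 | 9 |

Long. = longitudinal component; Surv. = survival component.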

Web Table 5: Estimation of fixed effects $\beta_{1,j}$ and $\beta_{2,j}$ in longitudinal and survival components for Scenario 6

$\hat\beta_{1,j}$ ± SE (coverage probability) for the longitudinal component (a)

| | Intercept | $X_{1,1}$ | $X_{1,2}$ | $X_{1,3}$ | $X_{1,4}$ | $X_{1,5}$ | $X_{1,6}$ | $X_{1,7}$ |
|---|---|---|---|---|---|---|---|---|
| True value $\beta$ | 1 | 1.5 | 2 | 2.5 | 0 | 0 | 0 | 0 |
| W/O selection $\hat\beta$ | 0.994 ± 0.068 (85%) | 1.498 ± 0.081 (75%) | 1.999 ± 0.072 (79%) | 2.496 ± 0.072 (81%) | 0.001 | -0.004 | 0.000 | -0.003 |
| 1st stage $\hat\beta$ | 0.987 ± 0.068 (89%) | 1.454 ± 0.079 (82%) | 1.960 ± 0.072 (87%) | 2.462 ± 0.072 (87%) | 0.000 | 0.000 | 0.000 | 0.000 |
| 2nd stage $\hat\beta$ | 0.994 ± 0.064 (87%) | 1.497 ± 0.076 (82%) | 1.995 ± 0.074 (85%) | 2.496 ± 0.073 (86%) | 0.000 | 0.000 | 0.000 | 0.000 |

$\hat\beta_{2,j}$ ± SE (coverage probability) for the survival component (a)

| | $X_{2,1}$ | $X_{2,2}$ | $X_{2,3}$ | $X_{2,4}$ | $X_{2,5}$ | $X_{2,6}$ | $X_{2,7}$ |
|---|---|---|---|---|---|---|---|
| True value $\beta$ | 1.5 | 2 | 2.5 | 0 | 0 | 0 | 0 |
| W/O selection $\hat\beta$ | 1.966 ± 0.286 (63%) | 2.667 ± 0.377 (49%) | 3.313 ± 0.429 (49%) | 0.014 | -0.025 | 0.011 | 0.035 |
| 1st stage $\hat\beta$ | 1.039 ± 0.249 (20%) | 1.495 ± 0.331 (28%) | 1.897 ± 0.370 (30%) | 0.000 | 0.000 | 0.000 | 0.000 |
| 2nd stage $\hat\beta$ | 1.549 ± 0.358 (86%) | 2.112 ± 0.593 (82%) | 2.625 ± 0.712 (84%) | 0.000 | 0.000 | 0.000 | 0.000 |

(a) The $\hat\beta$s are the averages of the estimates over the 100 data sets; SE is the empirical standard error of the 100 $\hat\beta$s. For each data set, the 95% confidence interval based on the parameter and standard error estimates is calculated, and the corresponding coverage probabilities for the true value over the 100 data sets are given in parentheses. SE and coverage probability are reported only for non-zero variables.

Web Table 6: Estimation of random effects $\sqrt{D_{1kk}}$ and $\sqrt{D_{2kk}}$ in longitudinal and survival components for Scenario 6

$\sqrt{\hat D_{1kk}}$ for the longitudinal component (a)

| | Intercept | $Z_{1,1}$ | $Z_{1,2}$ | $Z_{1,3}$ | $Z_{1,4}$ | $Z_{1,5}$ | $Z_{1,6}$ | $Z_{1,7}$ |
|---|---|---|---|---|---|---|---|---|
| True value $\sqrt{D_{kk}}$ | 0.707 | 0.707 | 0.707 | 0.707 | 0 | 0 | 0 | 0 |
| W/O selection $\sqrt{\hat D_{kk}}$ | 0.785 | 0.821 | 0.831 | 0.829 | 0.165 | 0.174 | 0.159 | 0.158 |
| 1st stage $\sqrt{\hat D_{kk}}$ | 0.768 | 0.677 | 0.657 | 0.633 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2nd stage $\sqrt{\hat D_{kk}}$ | 0.628 | 0.669 | 0.693 | 0.699 | 0.000 | 0.000 | 0.000 | 0.000 |

$\sqrt{\hat D_{2kk}}$ for the survival component (a)

| | Intercept | $Z_{2,1}$ | $Z_{2,2}$ | $Z_{2,3}$ | $Z_{2,4}$ | $Z_{2,5}$ | $Z_{2,6}$ | $Z_{2,7}$ |
|---|---|---|---|---|---|---|---|---|
| True value $\sqrt{D_{kk}}$ | 0.707 | 0.707 | 0.707 | 0.707 | 0 | 0 | 0 | 0 |
| W/O selection $\sqrt{\hat D_{kk}}$ | 1.037 | 1.074 | 1.155 | 1.167 | 0.574 | 0.552 | 0.535 | 0.685 |
| 1st stage $\sqrt{\hat D_{kk}}$ | 0.511 | 0.431 | 0.412 | 0.382 | 0.002 | 0.004 | 0.000 | 0.025 |
| 2nd stage $\sqrt{\hat D_{kk}}$ | 0.652 | 0.698 | 0.725 | 0.782 | 0.047 | 0.018 | 0.005 | 0.091 |

(a) $\sqrt{\hat D_{1kk}}$ and $\sqrt{\hat D_{2kk}}$ are the averages of the estimates over the 100 data sets.


Web Figure 1: Residual plots for data application diagnostics. The circles are the standardized residuals. The black lines are the LOESS estimates.
