Algorithmic variants - MCMC inference for the extended AFT model

Part I. Bayesian regularization in the AFT model

6. MCMC inference for the extended AFT model

6.2. Algorithmic variants

6.2.1. Standardization of the baseline error distribution

As outlined in Subsection 2.3 we require constraints on the transformed weights α0 to achieve a standardized baseline error distribution to enforce the interpretation of the predictor ηi as the mean

i Y

(Y | )θ = µ

E _{and the scale parameter} _σ2_{as the variance}

i 2

i Y

ar(Y | )θ = σ

V _{of the conditional}

distribution Y |i θ, compare (2.12) and (2.14). In this case the trace plots of the samples of the global intercept parameter γ0, the scale parameter σ2 and the samples of the mixture weights w( )α0 of the corresponding baseline error distribution Y | ,0 γ σ0 indicate the desired convergence. With an

unconstrained estimation of the transformed mixture weights α0 the mean 0 g k 0 k k 1w ( )m ε = µ =

∑

α and the variance 2 g0 2 2 2 k 0 k k k 1w ( )(m s ) ε = ε

σ =

∑

α + − µ , of the error distribution ε|α0, (2.8), can take any values

µ ∈ℝ and 2 ₀

σ > . The corresponding mean and variance of the baseline error distribution Y | ,0 γ σ0 are given by µY0 = γ + σµ0 ε and 0

2 2 2

Y ε

σ = σ σ , compare (2.13), and the interpretation of the parameters 0

γ and _σ2_{as global intercept and variance of the conditional distribution} i

Y |θ is not feasible. As a further consequence of the unconstrained update of α0, the parameters γ0 and µε of the mean

Y 0 ε

µ = γ + σµ as well as the parameters _σ2_and 2

σ of the variance 2₀ 2 2

Y ε

σ = σ σ of the conditional distribution Y | ,0 γ σ0 are not identifiable. This is due to the fact, that for any scalar c∈ℝ,

0 0 c

γ = γ + σ

ɶ _{in combination with}_{µ = µ −}ɶ_ε _ε _c, and _{σ =}_ɶ2 _c2_σ2_{in combination with} 2 _c−2 2

ε ε

σ =ɶ σ _{, does not}

change the mean µY0 and the variance 0 2 Y

σ . It follows that only the trace plots of the mean µY0 and variance 2₀

σ of the baseline error distribution show the desired stationarity, but stationarity is not indicated by the trace plots of the single components of these expressions and the mixture weights

0 ( ) w α .

The unconstrained estimation requires also an adjustment of the hyperparameters of the intercept γ0 and the scale parameter σ in the algorithm, if informative instead of diffuse priors are used for these parameters. Consider as an example the scale parameter _σ2_{. In the case of a standardized error} distribution ~ (0,1)ε , our prior knowledge of the baseline variance Var(Y |0 γ σ0, ) of the conditional distribution Y | ,0 γ σ0 is reflected by the hyperparameters h1,σ and h2,σ of an inverse gamma

distribution, compare (5.18). For an unstandardized baseline error distribution our knowledge concerns the product of the scale parameter and the variance of the baseline error distribution, i. e.

2 2

0 0 1, 2,

ar(Y |γ σ = σ σ, ) ε ~ InvGamma(h , h )σ σ

V _{. Due to the identification problem of the variance, the}

single components 2

σ and _σ2_{can take any positive value in each iteration and we have to adjust the} hyperparameters of the prior for the scale parameter _σ2_{with respect to the values of} 2

σ in the

iterations according to 2 2

1, 2, ~ InvGamma(h , hσ σ ε)

σ σ . The same argumentation holds for the intercept

and is leading to the adjustment 0 0

2 0~ N(h1,γ ε, h2,γ )

γ − σµ in every iteration of the sampler.

We advocate the following strategy to obtain a standardized baseline error distribution, which avoids the direct implementation of constraints in the update of the (transformed) mixture weights. Let

0 g k 0 k k 1w ( )m ε = µ =

∑

α and 2 g0 2 2 2 k 0 k k k 1w ( )(m s ) ε = ε

σ =

∑

α + − µ denote the mean and the variance of the

unstandardized error distribution with density g0 2

0 k 1 k 0 k k

f ( |ε ⋅ α )=

∑

= w ( ) ( | m ,s )α ϕ ⋅ from (2.7). The density of the standardized baseline error distribution εɶ0|α0 is obtained via the transformation

0 ( ε) ε

ε = ε − µɶ σ _as 0

g ₂

0 _{k 1} k 0 0,k 0,k

f ( |εɶ ⋅ α )=

∑

= w ( ) ( | m ,s )α ϕ ⋅ ɶ ɶ with basis knots mɶ0,k and basis variances 2 0,k sɶ _{given by} 2 k ₂ k 0,k 0,k ₂ 0 m s m ε, s , k 1,...,g ε ε − µ = = = σ σ ɶ ɶ _. _(6.31)

This transformation shifts and scales only the basis knots and variances to match the zero mean and unit variance condition, but leaves the mixture weights unchanged. To avoid a change of the posterior we have to ensure, that the mean µY0 = γ + σµ0 ε and the variance 0

2 2 2

Y ε

σ = σ σ of the baseline error distribution Y | ,0 γ σ0 , now expressed in terms of the standardized error distribution εɶ0|αɶ0, does not change. With the relationship Y0= γ + σε = γ + σµ + σσ εɶ0 ɶɶ0 0 ε εɶ0 follows that we have to adjust the intercept and the scale parameter according to

2 2 2

0 0 ε, ε

γ = γ + σµ σ = σ σ

To reformulate the standardized baseline error density in terms of the initial basis knots mk and basis variances 2

s , we have to recompute the weights at the basis knots mk by solving the linear equation system 0 0 g ₂ k k k 0 0 k 1= w (m | m ,s ) f (m |ϕ = ε ), =1,...,g

∑

ɶ ℓ ɶ ℓ α ℓ , (6.33)

with respect to the constraints g0 k k 1= w =1

∑

ɶ and wɶk>0. The standardized baseline error density is then

given by 0

g 2

0 k 1 k 0 k k

f ( |εɶ ⋅ αɶ )=

∑

₌ w ( ) ( | m ,s )ɶ αɶ ϕ ⋅ , where αɶ0 denotes the corresponding transformed mixture weights.

In the simulations we try two alternatives. We run the MCMC iterations without the standardization of the baseline error density within the algorithm described in Section 5.2.3 and standardize the error density in post-processing steps with the listed corrections (6.32) applied to the samples of the involved parameters. Alternatively, we standardize the baseline error density according to (6.32) within the sampling algorithm, after every update of the mixture weights, but we leave the weights unchanged to avoid in the next iteration sampling from possibly unfavorable conditional posterior regions arising probably from the recomputation of the mixture weights according to (6.33). In this version the knot positions and basis variances change in each iteration of the sampler and need to be stored in addition to the samples of the parameters. The option for a within-standardization is selected in the function baftpgm() with the argument scalebasis=TRUE. The weights are optionally recomputed after the simulation to show the convergence also in terms of the mixture weights.

6.2.2. Varying the update order of the transformed mixture weights

If we specify the methods method.alpha=“mcondstep” or method.alpha=“slice” in the function baftpgm(), we have the following options to vary the order of the update of the transformed mixture weight in every loop of the sampler (we assume e. g. αg0: 0= for identifiability):

• order.alpha=”fix1”. The order of the indices is fixed to (1, 2,3,....,g0−1).

• order.alpha=”fix2”. The fixed order of the coefficients is determined in the way, that the coefficient j, which is just updated, does not depend on the coefficients used for the update of the previous coefficient j 1− . If d denotes the used difference order, the update order is 0

j→j (d+ 0+1)→j 2(d+ 0+1)→,…,j 1,...,d= 0+1. This is the default setting.

• order.alpha=”random1”. In each update step a random permutation of the indices 0

(1, 2,3,....,g −1) is used.

• order.alpha =”random2”. In the order of the option order.alpha=”fix2” in each step one random cut is used to exchange the update order.

6.2.3. Varying the update of the component labels

We have several alternatives to classify the observations to the mixture components.

• method.rlabel=”gibbs”: Random assignment by sampling the labels with p from (6.29), ij which is the default option.

• method.rlabel=”fix-maxprob”: Hard assignment to the class with maximum probability

i,max i 0

• method.rlabel=”fix-interval”: Hard assignment to intervals Ij around the knots that build a partition of the domain of the error distribution. E. g. for equidistant knots and homoscedastic basis variances these intervals are Ij=(0.5(mj 1− +m ),0.5(mj j+m )]j 1+ ,

j 1,...,g= , and m :−1 = −∞, mg 10+ := ∞.

6.2.4. Scale dependent implementation

In addition a variant with scale-dependent covariance matrices in the priors of the regularized predictor components (5.6) and (5.16) is implemented, i. e. , 2

β β = σ τ Σ D and j j 2 2 j − − α = σ τα Σ K . With this

parametrization the values of the scale parameter strengthen or relax the regularization. The derivation of the associated full conditionals is straightforward. We can select this option with the argument scaledpri=TRUE.

In document Konrath, Susanne (2013): Bayesian regularization in regression models for survival data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 86-89)