
Chapter 3 Statistical modelling

3.5 Prior and proposal specifications

In order to carry out our inference, we employ Markov chain Monte Carlo (MCMC) methods; in particular, we use a Metropolis-within-Gibbs algorithm (Metropolis and Ulam, 1949; Metropolis et al., 1953; Hastings, 1970).

We use a conjugate normal-gamma hyperprior for each $\Theta_j$, $j = 1, \dots, p$: $\mu_j \mid \tau_j \sim N(\mu_{j0}, \lambda_{j0}/\tau_j)$ and $\tau_j \sim G(\alpha_{j0}, \beta_{j0})$, where the former denotes the normal distribution with mean $\mu_{j0}$ and variance $\lambda_{j0}/\tau_j$, and the latter indicates the gamma random variable (r.v.) with shape and rate parameters $\alpha_{j0}$ and $\beta_{j0}$, respectively, i.e. with mean $\alpha_{j0}/\beta_{j0}$ and variance $\alpha_{j0}/\beta_{j0}^2$. The choice of a conjugate hyperdistribution means that the hypermean and hyperprecision, conditional on the hierarchical parameters, are still normal and gamma distributed, respectively, and hence can be sampled via a Gibbs step, thus decreasing the computational burden.
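The Gibbs step made possible by conjugacy can be sketched as follows. This is a minimal illustration rather than the implementation used here: it assumes that, on the scale at which they are modelled, the hierarchical parameters $x_1, \dots, x_n$ of a given type are normally distributed with mean $\mu_j$ and precision $\tau_j$, and the function name `sample_hyperparameters` and the example data are hypothetical.

```python
import random
from statistics import fmean


def sample_hyperparameters(x, mu0, lam0, alpha0, beta0, rng=random):
    """Draw (mu, tau) from the normal-gamma full conditional.

    Assumed model (an illustration, not taken from the text):
        x_i | mu, tau ~ N(mu, 1 / tau)
        mu | tau      ~ N(mu0, lam0 / tau)
        tau           ~ Gamma(shape=alpha0, rate=beta0)
    """
    n = len(x)
    xbar = fmean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)   # sum of squares about the mean

    kappa0 = 1.0 / lam0                      # prior weight implied by lam0
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)

    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    tau = rng.gammavariate(alpha_n, 1.0 / beta_n)
    mu = rng.gauss(mu_n, (1.0 / (kappa_n * tau)) ** 0.5)
    return mu, tau


# Usage with made-up values for one parameter type across 50 cells.
rng = random.Random(0)
values = [rng.gauss(-6.0, 0.3) for _ in range(50)]
mu_j, tau_j = sample_hyperparameters(values, 0.0, 1e4, 0.001, 0.001, rng)
```

Because both draws come from standard distributions, no accept/reject step is needed for the hyperparameters.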

With the exception of $\delta^{(i)}$, $c^{(i)}$, $\sigma_N^{(i)}$ and $\sigma_C^{(i)}$, on which prior information is available, the hyperpriors were set to be non-informative for all the remaining parameters, with $\mu_{j0} = 0$, $\lambda_{j0} = 10^4$, $\alpha_{j0} = 0.001$ and $\beta_{j0} = 0.001$. These standard choices correspond to a vague normal prior for the hypermean, $\mu_j \mid \tau_j \sim N(0, 10^4/\tau_j)$, and to a gamma prior for the hyperprecision, $\tau_j \sim G(0.001, 0.001)$, with mean 1 and variance $10^3$. The latter is a usual vague prior for the precision parameter, or, analogously, its inverse for the variance (Gelman, 2006). Such a non-informative prior choice for the precision is used for all parameters: even when prior information is available, we only formulate an informative prior for the hypermean parameter.
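As a quick check of the moments implied by the vague gamma hyperprior, the shape/rate formulas given earlier (mean $\alpha_{j0}/\beta_{j0}$, variance $\alpha_{j0}/\beta_{j0}^2$) can be evaluated directly. The helper name `gamma_mean_variance` is hypothetical.

```python
def gamma_mean_variance(shape, rate):
    """Mean and variance of a gamma r.v. in the shape/rate parameterisation."""
    return shape / rate, shape / rate ** 2


# Vague hyperprior used for the precision: alpha_j0 = beta_j0 = 0.001.
prior_mean, prior_variance = gamma_mean_variance(0.001, 0.001)
print(prior_mean, prior_variance)
```

Up to floating-point rounding, this returns a mean of 1 and a variance of $10^3$, which shows how diffuse the prior on the precision is.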

Informative priors and simplifications

We formulate informative hyperpriors for the hypermean parameters of $\delta^{(i)}$, $c^{(i)}$, $\sigma_N^{(i)}$ and $\sigma_C^{(i)}$.

For $\sigma_N^{(i)}$ and $\sigma_C^{(i)}$, the prior information is obtained from an exploratory study on repeated measurements on three cells, which will be described in Section 4.3. In particular, $\mu_{j0} = 4.41$ and $\lambda_{j0} = 0.1$ for the nuclear standard deviation, and $\mu_{j0} = 4.52$ and $\lambda_{j0} = 0.1$ for the cytoplasmic one.

Prior information on the degradation rate was taken from Boisvert et al. (2012). The authors estimate the 50% turnover of Nrf2 proteins to be 5.09 hours, where the 50% turnover is the time until 50% of the original population, present at time 0, has changed. Under steady-state conditions, the 50% turnover is an accurate approximation of the protein half-life, which we call $t_{1/2}$: the time until half of the initial population is degraded, assuming no synthesis (Claydon and Beynon, 2012). Under exponential decay, the half-life can easily be converted into the degradation rate. Assuming no synthesis and a constant degradation rate $\delta$ per element of the population, which at time $t$ we call $W_t$, we obtain the following differential equation (DE) for the evolution of $W_t$:

\[ \frac{dW_t}{dt} = -\delta W_t. \]

This DE has solution $W_t = W_0 e^{-\delta t}$. Substituting $W_{t_{1/2}} = W_0/2$ into this solution gives $e^{-\delta t_{1/2}} = 1/2$, and hence $\delta = \ln(2)/t_{1/2}$. Replacing $t_{1/2}$ with its estimate of 305.4 minutes, corresponding to 5.09 hours, we obtain a per-minute degradation rate of 0.002269. Therefore we set the degradation hypermean parameters to $\mu_{j0} = \log(0.002269) = -6.088$ and $\lambda_{j0} = 1$; the choice of $\lambda_{j0}$, less informative than for the measurement error standard deviations, reflects a higher degree of uncertainty in this piece of prior information.
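The conversion from half-life to degradation rate can be reproduced with a few lines of code. This is only an illustrative check of the arithmetic above; the function name `degradation_rate` is hypothetical.

```python
import math


def degradation_rate(half_life):
    """Rate delta such that W_t = W_0 * exp(-delta * t) halves at t = half_life."""
    return math.log(2.0) / half_life


half_life_min = 5.09 * 60.0                 # 5.09 hours expressed in minutes
delta = degradation_rate(half_life_min)     # per-minute degradation rate
log_delta = math.log(delta)                 # value used as the hypermean

print(f"delta = {delta:.7f} per minute, log(delta) = {log_delta:.3f}")
```

The computed rate agrees with the value of 0.002269 quoted above up to rounding in the last reported digit, and its natural logarithm rounds to the hypermean of -6.088.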

An exploratory study on the ratio of cytoplasmic and nuclear areas, which will be illustrated in Section 4.1, allows us to formulate two informative hyperpriors, one for each condition, for the hypermean of $c^{(i)}$; in particular, we set $\mu_{j0} = 2.64$ and $\lambda_{j0} = 0.1$ for the basal condition, and $\mu_{j0} = 2.47$ and $\lambda_{j0} = 0.1$ for the stimulated one.

The standard deviation of the delay distribution, $\sigma_\tau^{(i)}$, although structurally identifiable, suffers from a lack of practical identifiability, due to the complexity of the model and the limited data available. In order to circumvent this problem, we keep the distributed structure of the delay, which is a more realistic assumption, but with a fixed standard deviation throughout. Therefore, to decrease the model complexity, $\sigma_\tau$ is chosen not to be hierarchical: $\sigma_\tau^{(i)} = \sigma_\tau, \forall i$. This implies that the delay distribution has a different mean in each cell but the same variance. After analysing the behaviour of the distribution of $\tau$ for several values of $\sigma_\tau$, we set the standard deviation of the delay to $\sigma_\tau = 3$ for all cells.

Therefore we redefine the hierarchical parameter vector we want to infer as

\[ \theta^{(i)} = \left(k_d^{(i)}, k_a^{(i)}, K_a^{(i)}, \mu_\tau^{(i)}, \gamma^{(i)}, \delta^{(i)}, c^{(i)}, \kappa_N^{(i)}, \sigma_N^{(i)}, \sigma_C^{(i)}\right)^T. \]

Adaptive random walk proposal

The sampling of the hierarchical parameters in $\theta^{(i)}$ follows a Metropolis-within-Gibbs scheme, where moves for each $\theta^{(i)}$ are proposed and accepted in five blocks, which we define as $\theta^{(i)}_{(b_1)} = (k_d^{(i)}, \mu_\tau^{(i)})$, $\theta^{(i)}_{(b_2)} = (k_a^{(i)}, K_a^{(i)})$, $\theta^{(i)}_{(b_3)} = (\delta^{(i)}, \gamma^{(i)})$, $\theta^{(i)}_{(b_4)} = (c^{(i)}, \kappa_N^{(i)})$ and $\theta^{(i)}_{(b_5)} = (\sigma_N^{(i)}, \sigma_C^{(i)})$. The blocks are chosen after an initial analysis in which each hierarchical parameter is proposed independently from a simple random walk (RW), by merging into the same block the parameters with the most correlated posterior chains. We also define $b_1 = \{1, 4\}$, $b_2 = \{2, 3\}$, $b_3 = \{5, 6\}$, $b_4 = \{7, 8\}$ and $b_5 = \{9, 10\}$ as the vectors indicating the elements of $\theta^{(i)}$ belonging to each of the five blocks.
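The correspondence between the index sets and the entries of the parameter vector can be made explicit with a small lookup. The string labels in `PARAM_NAMES` and the helper `block_names` are hypothetical conveniences introduced for this sketch; the ordering and the 1-based index sets follow the definitions above.

```python
# Labels for the ten entries of theta^(i), in the order given in the text.
PARAM_NAMES = [
    "k_d", "k_a", "K_a", "mu_tau", "gamma",
    "delta", "c", "kappa_N", "sigma_N", "sigma_C",
]

# 1-based index sets b_1, ..., b_5.
BLOCKS = {1: (1, 4), 2: (2, 3), 3: (5, 6), 4: (7, 8), 5: (9, 10)}


def block_names(block_id):
    """Return the parameter labels belonging to block b_{block_id}."""
    return [PARAM_NAMES[j - 1] for j in BLOCKS[block_id]]


# The five blocks should cover each of the ten parameters exactly once.
all_indices = sorted(j for indices in BLOCKS.values() for j in indices)
assert all_indices == list(range(1, 11))

for block_id in BLOCKS:
    print(block_id, block_names(block_id))
```

Each block contains two parameters, so every block-wise proposal is a bivariate move.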

For each $i$, proposals in each block are sampled in log space according to the adaptive random walk (ARW) scheme (Haario et al., 2001), from a normal distribution centred on the values from the previous iteration, with covariance proportional to the covariance matrix estimated from the parameter chains of the respective block.
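A single block proposal of this kind can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation used here: the function names are hypothetical, the scaling $s_d = 2.4^2/d$ and the small regularisation constant `eps` follow the defaults suggested by Haario et al. (2001) rather than tuned values, and the accept/reject step is omitted.

```python
import math
import random


def empirical_covariance(samples):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(d)]
    return [
        [sum((s[a] - means[a]) * (s[b] - means[b]) for s in samples) / (n - 1)
         for b in range(d)]
        for a in range(d)
    ]


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    d = len(matrix)
    L = [[0.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(a + 1):
            s = matrix[a][b] - sum(L[a][k] * L[b][k] for k in range(b))
            L[a][b] = math.sqrt(s) if a == b else s / L[b][b]
    return L


def arw_propose(current, log_chain, s_d=None, eps=1e-8, rng=random):
    """Propose positive values for one block via a normal random walk in log space.

    current   : current (positive) parameter values of the block
    log_chain : past chain values of the block, already log-transformed
    """
    d = len(current)
    if s_d is None:
        s_d = 2.4 ** 2 / d                  # assumed default scaling
    cov = empirical_covariance(log_chain)
    prop_cov = [
        [s_d * (cov[a][b] + (eps if a == b else 0.0)) for b in range(d)]
        for a in range(d)
    ]
    L = cholesky(prop_cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    log_new = [
        math.log(current[a]) + sum(L[a][k] * z[k] for k in range(a + 1))
        for a in range(d)
    ]
    return [math.exp(v) for v in log_new]


# Usage with a made-up log-scale chain for a two-parameter block.
rng = random.Random(0)
log_chain = [[rng.gauss(0.0, 0.1), rng.gauss(1.0, 0.2)] for _ in range(1000)]
proposal = arw_propose([1.0, 2.7], log_chain, rng=rng)
```

Proposing in log space keeps every proposed value positive. If the target density is expressed on the original scale, the log transform contributes a Jacobian term to the Metropolis acceptance ratio.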

The adaptation is analogous to the one implemented by Haario et al. (2001), where the constants $\epsilon$ and $s_d$ are chosen in order to optimize each block’s acceptance rate. The MCMC is first run for 2,000 iterations without adaptation, as a standard random walk (RW); only then are the covariance matrices computed from the chains, excluding the first 1,000 values, and used to tune the proposal variance. Because the covariance is computed on all values of the chain from a fixed starting point onwards, the diminishing adaptation requirement (Roberts and Rosenthal, 2009) is respected. In other words, the proposal distribution stabilises as the chains grow;

i.e. the influence, on the proposal distribution, of the r-th iteration of the MCMC,

In document Bayesian hierarchical stochastic inference on multiple, single cell, latent states from both longitudinal and stationary data (Page 51-54)