Chapter 3 Statistical modelling
3.5 Prior and proposal specifications
To carry out inference, we employ Markov chain Monte Carlo (MCMC) methods; in particular, we use a Metropolis-within-Gibbs algorithm (Metropolis and Ulam, 1949; Metropolis et al., 1953; Hastings, 1970).
We use a conjugate normal-gamma hyperprior for each $\Theta_j$, $j = 1, \ldots, p$: $\mu_j \mid \tau_j \sim N(\mu_{j0}, \lambda_{j0}/\tau_j)$ and $\tau_j \sim G(\alpha_{j0}, \beta_{j0})$, where the former denotes the normal distribution with mean $\mu_{j0}$ and variance $\lambda_{j0}/\tau_j$, and the latter denotes the gamma random variable (r.v.) with shape and rate parameters $\alpha_{j0}$ and $\beta_{j0}$, respectively, i.e. with mean $\alpha_{j0}/\beta_{j0}$ and variance $\alpha_{j0}/\beta_{j0}^2$. The choice of a conjugate hyperdistribution means that the hypermean and hyperprecision, conditional on the hierarchical parameters, are still normal and gamma distributed, respectively, and hence can be sampled via a Gibbs step, which reduces the computational burden.
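As an illustration of the Gibbs step mentioned above, the following Python sketch draws $(\mu_j, \tau_j)$ from their normal-gamma full conditional. It assumes, since this section does not restate it, that the (possibly log-transformed) hierarchical parameters of the cells, here called `x`, are conditionally i.i.d. $N(\mu_j, 1/\tau_j)$. The function names are hypothetical, and this is a sketch under that assumption rather than the implementation used for the results.

```python
import random


def normal_gamma_posterior(x, mu0, lam0, alpha0, beta0):
    """Update the hyperparameters of the prior
    mu | tau ~ N(mu0, lam0 / tau),  tau ~ G(alpha0, beta0) (shape, rate),
    given data x assumed i.i.d. N(mu, 1 / tau)."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((v - xbar) ** 2 for v in x)   # sum of squares about the sample mean
    kappa0 = 1.0 / lam0                    # prior weight implied by variance lam0 / tau
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, 1.0 / kappa_n, alpha_n, beta_n


def gibbs_draw(x, mu0, lam0, alpha0, beta0, rng):
    """One Gibbs draw of (mu, tau) from the normal-gamma full conditional."""
    mu_n, lam_n, alpha_n, beta_n = normal_gamma_posterior(x, mu0, lam0, alpha0, beta0)
    # random.gammavariate is parameterised by (shape, scale), so scale = 1 / rate.
    tau = rng.gammavariate(alpha_n, 1.0 / beta_n)
    mu = rng.gauss(mu_n, (lam_n / tau) ** 0.5)
    return mu, tau


# Example with made-up values and the vague hyperparameters of this section.
rng = random.Random(1)
mu_draw, tau_draw = gibbs_draw([1.0, 2.0, 3.0], 0.0, 1e4, 0.001, 0.001, rng)
```

Because both updates are available in closed form, no accept/reject step is needed for the hyperparameters.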
With the exception of $\delta^{(i)}$, $c^{(i)}$, $\sigma_N^{(i)}$ and $\sigma_C^{(i)}$, on which prior information is available, the hyperpriors were set to be non-informative for all the remaining parameters, with $\mu_{j0} = 0$, $\lambda_{j0} = 10^4$, $\alpha_{j0} = 0.001$ and $\beta_{j0} = 0.001$. These standard choices correspond to a vague normal prior for the hypermean, $\mu_j \mid \tau_j \sim N(0, 10^4/\tau_j)$, and to a gamma prior for the hyperprecision, $\tau_j \sim G(0.001, 0.001)$, with mean 1 and variance $10^3$. The latter is a usual vague prior for the precision parameter or, equivalently, the corresponding inverse-gamma prior for the variance (Gelman, 2006). Such a non-informative prior choice for the precision is used for all parameters: even when prior information is available, we only formulate an informative prior for the hypermean parameter.
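The moments implied by these vague settings can be checked with a few lines of Python. This is a small arithmetic sketch; the helper name `gamma_mean_var` is hypothetical, and the value $\tau_j = 1$ used for the hypermean is only an illustrative choice.

```python
def gamma_mean_var(shape, rate):
    """Mean and variance of a gamma distribution in the shape/rate parameterisation."""
    return shape / rate, shape / rate ** 2


# Vague hyperprecision prior: alpha_j0 = beta_j0 = 0.001.
tau_mean, tau_var = gamma_mean_var(0.001, 0.001)   # approximately 1 and 1000

# Standard deviation of the hypermean prior N(0, 1e4 / tau), evaluated at tau = 1.
hypermean_sd = (1e4 / 1.0) ** 0.5                  # approximately 100
```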
Informative priors and simplifications
We formulate informative hyperpriors for the hypermean parameters of $\delta^{(i)}$, $c^{(i)}$, $\sigma_N^{(i)}$ and $\sigma_C^{(i)}$.
For $\sigma_N^{(i)}$ and $\sigma_C^{(i)}$, the prior information is obtained from an exploratory study of repeated measurements on three cells, which will be described in Section 4.3. In particular, $\mu_{j0} = 4.41$ and $\lambda_{j0} = 0.1$ for the nuclear standard deviation, and $\mu_{j0} = 4.52$ and $\lambda_{j0} = 0.1$ for the cytoplasmic one.
Prior information on the degradation rate was taken from Boisvert et al. (2012). The authors estimate the 50% turnover of Nrf2 proteins to be 5.09 hours, where the 50% turnover is the time until 50% of the original population, which was present at time 0, has changed. Under steady-state conditions, the 50% turnover
represents an accurate approximation of the protein half-life, which we denote by $t_{1/2}$ and which is the time until half of the initial population is degraded, assuming no synthesis (Claydon and Beynon, 2012). Under exponential decay, the half-life can easily be converted into the degradation rate. Assuming no synthesis and a constant degradation rate $\delta$ per element of the population, which at time $t$ we call $W_t$, we obtain the following differential equation (DE) for the evolution of $W_t$:
\[
\frac{dW_t}{dt} = -\delta W_t.
\]
This DE has solution $W_t = W_0 e^{-\delta t}$. Substituting $W_{t_{1/2}} = \frac{1}{2} W_0$ into this solution expresses the degradation rate in terms of the half-life: $\delta = \ln(2)/t_{1/2}$. Hence, by replacing $t_{1/2}$ with its estimate of 305.4 minutes, corresponding to 5.09 hours, we obtain a per-minute degradation rate of 0.002269. Therefore we set the degradation hypermean parameters to $\mu_{j0} = \log(0.002269) = -6.088$ and $\lambda_{j0} = 1$; this choice of $\lambda_{j0}$, less informative than that for the measurement error standard deviations, reflects a higher degree of uncertainty in this piece of prior information.
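The conversion from half-life to degradation rate described above can be reproduced with a short Python sketch. The function and variable names are hypothetical; the only inputs taken from the text are the 5.09-hour turnover and the exponential-decay assumption.

```python
import math


def degradation_rate(half_life):
    """Rate delta of the decay W_t = W_0 * exp(-delta * t) with the given half-life."""
    return math.log(2.0) / half_life


half_life_min = 5.09 * 60.0               # 5.09 hours expressed in minutes (305.4)
delta = degradation_rate(half_life_min)   # about 0.00227 per minute
log_delta = math.log(delta)               # about -6.088, used as the hypermean mu_j0

# Sanity check: after one half-life, half of the initial population remains.
remaining_fraction = math.exp(-delta * half_life_min)
```

The hypermean is set on the log scale because the value reported in the text is the logarithm of the rate.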
An exploratory study on the ratio of cytoplasmic and nuclear areas, which will be illustrated in Section 4.1, allows us to formulate two informative hyperpriors, one for each condition, for the hypermean of $c^{(i)}$; in particular, we set $\mu_{j0} = 2.64$ and $\lambda_{j0} = 0.1$ for the basal condition, and $\mu_{j0} = 2.47$ and $\lambda_{j0} = 0.1$ for the stimulated one.
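For reference, the hyperprior settings stated in this section can be collected in a small lookup, sketched below in Python. The dictionary keys, the parameter labels and the `hyperprior` helper are hypothetical naming choices; the numerical values are those given in the text.

```python
# Vague default for the hypermean and the gamma hyperprecision prior used throughout.
VAGUE_HYPERMEAN = {"mu0": 0.0, "lam0": 1e4}
HYPERPRECISION = {"alpha0": 0.001, "beta0": 0.001}

# Informative hypermean priors (mu_j0, lambda_j0) stated in the text.
INFORMATIVE_HYPERMEAN = {
    "sigma_N": {"mu0": 4.41, "lam0": 0.1},
    "sigma_C": {"mu0": 4.52, "lam0": 0.1},
    "delta": {"mu0": -6.088, "lam0": 1.0},
    ("c", "basal"): {"mu0": 2.64, "lam0": 0.1},
    ("c", "stimulated"): {"mu0": 2.47, "lam0": 0.1},
}


def hyperprior(name, condition=None):
    """Return the full hyperprior settings for one hierarchical parameter."""
    if name == "c":
        if (name, condition) not in INFORMATIVE_HYPERMEAN:
            raise ValueError("c needs condition 'basal' or 'stimulated'")
        hypermean = INFORMATIVE_HYPERMEAN[(name, condition)]
    else:
        hypermean = INFORMATIVE_HYPERMEAN.get(name, VAGUE_HYPERMEAN)
    return {**hypermean, **HYPERPRECISION}
```

For example, `hyperprior("k_d")` falls back to the vague hypermean, while `hyperprior("c", "basal")` returns the condition-specific one.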
The standard deviation of the delay distribution, $\sigma_\tau^{(i)}$, although structurally identifiable, suffers from a lack of practical identifiability, due to the complexity of the model and the limited data available. To circumvent this problem, we keep the distributed structure of the delay, which is the more realistic assumption, but fix its standard deviation throughout. Therefore, to decrease the model complexity, $\sigma_\tau$ is chosen not to be hierarchical: $\sigma_\tau^{(i)} = \sigma_\tau$, $\forall i$. This implies that the delay distribution has a different mean in each cell but the same variance. After analysing the behaviour of the distribution of $\tau$ for several values of $\sigma_\tau$, we set the standard deviation of the delay to $\sigma_\tau = 3$ for all cells.
Therefore we redefine the hierarchical parameter vector we want to infer as $\theta^{(i)} = (k_d^{(i)}, k_a^{(i)}, K_a^{(i)}, \mu_\tau^{(i)}, \gamma^{(i)}, \delta^{(i)}, c^{(i)}, \kappa_N^{(i)}, \sigma_N^{(i)}, \sigma_C^{(i)})^T$.
Adaptive random walk proposal
The sampling of the hierarchical parameters in $\theta^{(i)}$ follows a Metropolis-within-Gibbs scheme, where moves for each $\theta^{(i)}$ are proposed and accepted in five blocks, which we define as $\theta_{b_1}^{(i)} = (k_d^{(i)}, \mu_\tau^{(i)})$, $\theta_{b_2}^{(i)} = (k_a^{(i)}, K_a^{(i)})$, $\theta_{b_3}^{(i)} = (\delta^{(i)}, \gamma^{(i)})$, $\theta_{b_4}^{(i)} = (c^{(i)}, \kappa_N^{(i)})$ and $\theta_{b_5}^{(i)} = (\sigma_N^{(i)}, \sigma_C^{(i)})$. The blocks are chosen after an initial analysis, in which each hierarchical parameter is proposed independently from a simple random walk (RW), by merging into the same block the parameters with the most correlated posterior chains. We also define $b_1 = \{1, 4\}$, $b_2 = \{2, 3\}$, $b_3 = \{5, 6\}$, $b_4 = \{7, 8\}$ and $b_5 = \{9, 10\}$ as the vectors indicating the elements of $\theta^{(i)}$ belonging to each of the five blocks.
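The block structure above can be sketched in Python as one Metropolis-within-Gibbs sweep over the five blocks. This is a minimal illustration, not the sampler used for the results: it uses a plain (non-adaptive) random walk with a common step size, the state is taken to be a generic vector of ten real values, and `toy_log_target` is a hypothetical stand-in for the log posterior of one cell.

```python
import math
import random

PARAM_NAMES = ["k_d", "k_a", "K_a", "mu_tau", "gamma",
               "delta", "c", "kappa_N", "sigma_N", "sigma_C"]

# Block index sets b_1, ..., b_5, using the 1-based indexing of the text.
BLOCKS = {"b1": (1, 4), "b2": (2, 3), "b3": (5, 6), "b4": (7, 8), "b5": (9, 10)}


def block_names(block):
    """Names of the parameters that belong to a block."""
    return tuple(PARAM_NAMES[k - 1] for k in BLOCKS[block])


def toy_log_target(state):
    """Hypothetical log target: independent standard normals (illustration only)."""
    return -0.5 * sum(v * v for v in state)


def sweep(state, log_target, step_sd, rng):
    """Update each block in turn with a symmetric random-walk Metropolis step."""
    state = list(state)
    accepted = {}
    for name, idx in BLOCKS.items():
        proposal = list(state)
        for k in idx:                       # perturb only the current block
            proposal[k - 1] = state[k - 1] + rng.gauss(0.0, step_sd)
        log_ratio = log_target(proposal) - log_target(state)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            state = proposal                # accept the whole block jointly
            accepted[name] = True
        else:
            accepted[name] = False          # reject: the block keeps its old values
    return state, accepted


rng = random.Random(0)
new_state, accepted = sweep([0.0] * 10, toy_log_target, 0.5, rng)
```

Proposing the two parameters of a block jointly means they are accepted or rejected together, which is what makes grouping strongly correlated parameters worthwhile.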
For each $i$, proposals in each block are sampled, in the log space, according to the adaptive random walk (ARW) scheme (Haario et al., 2001), from a normal distribution centred on the values of the previous iteration, with variance proportional to the covariance matrix estimated from the parameter chains of the respective block. The adaptation is analogous to the one implemented by Haario et al. (2001), where the constants $\epsilon$ and $s_d$ are chosen in order to optimize each block's acceptance rate. The MCMC is first run for 2,000 iterations without adaptation, as a standard random walk (RW); only then are the covariance matrices computed from the chains, excluding the first 1,000 values, and used to tune the proposal variance. Because the covariance is computed on all values of the chain from a fixed starting point onwards, the diminishing adaptation requirement (Roberts and Rosenthal, 2009) is respected. In other words, the proposal distribution stabilises as the chains grow;
i.e. the influence, on the proposal distribution, of the r-th iteration of the MCMC,