1.2 Statistical concepts and methods
1.2.2 Bayesian analysis
Within a Bayesian framework, the parameters of the model, $\theta$, are treated as random variables (with some defined probability distribution). Before any data are observed, a prior distribution is used to express prior beliefs about the parameters. The Bayesian approach is subjective, as it incorporates personal belief about the distribution of the parameters. However, there are non-informative priors: in the absence of any prior information, one can adopt a flat prior across the range of possible values of $\theta$. A flat prior reflects ignorance about the parameter values. Less informative priors are often preferred because they have minimal influence on the posterior distribution [Gamerman and Lopes, 2006].
The data $Y$ can be modelled conditional on the parameters $\theta$. The $\theta$'s are random quantities with prior probability distribution $P(\theta)$. According to Bayes' theorem, the posterior distribution of the parameters $\theta$ given the data $Y$ is
$$P(\theta \mid Y) = \frac{f(Y \mid \theta)\,P(\theta)}{\int_{\Theta} f(Y \mid \theta)\,P(\theta)\,d\theta}. \tag{1.1}$$
The denominator of equation 1.1 does not depend on $\theta$ and is often unavailable in closed form, so the posterior is usually written up to a normalising constant as
$$P(\theta \mid Y) \propto f(Y \mid \theta)\,P(\theta). \tag{1.2}$$

Often, calculating the posterior distribution $P(\theta \mid Y)$ requires evaluating high-dimensional integrals that are intractable. To deal with this complexity, we need approximation techniques, which can be implemented using Markov chain Monte Carlo (MCMC) methods. MCMC algorithms simulate a random variable $x$ such that the sequence $x_1, x_2, \ldots$ forms a Markov chain with a specified equilibrium distribution. In a Bayesian context, this equilibrium distribution is the posterior distribution. If each new point $x_{n+1}$ depends only on the previous point $x_n$, then the chain possesses the Markov property. The chain, i.e., the collection of simulated samples from the posterior distribution, is then used to draw conclusions about parameter estimation (or model prediction) based on statistical measures such as the mean and variance, or other quantities calculated from the samples [Carter and Kohn, 1996].
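For a one-dimensional parameter, the proportionality in equation 1.2 can be checked directly on a grid before resorting to MCMC. The following is a minimal sketch under assumptions that are not part of the text: a binomial likelihood with hypothetical data (7 successes in 10 trials), a flat prior on $(0, 1)$, and an illustrative function name `grid_posterior`. The normalising integral in equation 1.1 is replaced by a midpoint-rule sum.

```python
def grid_posterior(y, n, num_points=1000):
    """Approximate P(theta | Y) on a grid (binomial likelihood, flat prior)."""
    width = 1.0 / num_points
    # Midpoints of equal-width cells covering the interval (0, 1).
    grid = [(i + 0.5) * width for i in range(num_points)]
    # Flat prior P(theta): the same value everywhere on (0, 1).
    prior = [1.0 for _ in grid]
    # Likelihood f(Y | theta), up to a constant that cancels on normalising.
    likelihood = [t ** y * (1.0 - t) ** (n - y) for t in grid]
    # Right-hand side of equation 1.2: likelihood times prior.
    unnormalised = [l * p for l, p in zip(likelihood, prior)]
    # Midpoint-rule stand-in for the integral in the denominator of 1.1.
    evidence = sum(unnormalised) * width
    posterior = [u / evidence for u in unnormalised]
    return grid, posterior, width


# Hypothetical data: y = 7 successes out of n = 10 trials.
grid, posterior, width = grid_posterior(y=7, n=10)
posterior_mean = sum(t * p for t, p in zip(grid, posterior)) * width
```

Under these assumptions the exact posterior is a Beta distribution, so `posterior_mean` can be compared with the closed-form value $(y+1)/(n+2)$. A grid of this kind becomes impractical as the dimension of $\theta$ grows, which is the situation in which MCMC is used.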
Metropolis-Hastings
The insight behind the Metropolis-Hastings algorithm is the notion of a reversible chain. A Markov chain with transition probability $T(x' \mid x)$ is said to be reversible with respect to the state probabilities $\pi(x)$ if

$$T(x' \mid x)\,\pi(x) = T(x \mid x')\,\pi(x'). \tag{1.3}$$

A chain is said to satisfy detailed balance if and only if it is a reversible Markov chain. Equation 1.3 is 'balanced' because of the symmetric roles of the states $x$ and $x'$. It is called 'detailed' because it holds for every possible pair of states.
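Equation 1.3 can be verified numerically on a small discrete chain. The sketch below assumes a hypothetical three-state target `pi` and builds the transition probabilities from a uniform proposal over the other states combined with a Metropolis-type acceptance rule. Both choices are illustrative assumptions, not constructions taken from the text. It then measures how far the chain is from satisfying detailed balance and from leaving `pi` unchanged.

```python
# Hypothetical target probabilities pi(x) over three states.
pi = [0.2, 0.3, 0.5]
k = len(pi)

# T[x][x_new] holds T(x_new | x): propose one of the other states
# uniformly, then accept with probability min(pi(x_new) / pi(x), 1).
T = [[0.0] * k for _ in range(k)]
for x in range(k):
    for x_new in range(k):
        if x_new != x:
            T[x][x_new] = (1.0 / (k - 1)) * min(pi[x_new] / pi[x], 1.0)
    # Rejected proposals leave the chain at x (diagonal is still 0 here).
    T[x][x] = 1.0 - sum(T[x])


def detailed_balance_gap(T, pi):
    """Largest violation of T(x'|x) pi(x) = T(x|x') pi(x') over all pairs."""
    n = len(pi)
    return max(abs(T[x][y] * pi[x] - T[y][x] * pi[y])
               for x in range(n) for y in range(n))


def stationarity_gap(T, pi):
    """Largest violation of sum_x pi(x) T(x'|x) = pi(x') over all x'."""
    n = len(pi)
    return max(abs(sum(pi[x] * T[x][y] for x in range(n)) - pi[y])
               for y in range(n))


balance_gap = detailed_balance_gap(T, pi)
stationary_gap = stationarity_gap(T, pi)
```

Both gaps should be zero up to floating-point error. Summing equation 1.3 over $x$ shows why: detailed balance implies that $\pi$ is an equilibrium distribution of the chain.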
Assume a sequence of random variables $X_1, X_2, \ldots, X_t$ generating a sample $x_1, x_2, \ldots, x_t$ from the target density $f$. The basic idea of Metropolis-Hastings sampling is to generate a Markov chain that has the target density $f$ as its equilibrium density. To do so, the Metropolis-Hastings algorithm proceeds as follows:
Step 1 Sample a candidate value $x^*$ for $X_{t+1}$ from the proposal density $Q(x_{t+1} \mid x_t)$.
Step 2 Given the candidate value $x^*$, calculate the acceptance probability $\alpha(x^* \mid x_t)$ as

$$\alpha(x^* \mid x_t) = \min\left\{ \frac{Q(x_t \mid x^*)\,f(x^*)}{Q(x^* \mid x_t)\,f(x_t)},\; 1 \right\}.$$

Step 3 If $\alpha(x^* \mid x_t) = 1$, then the candidate $x^*$ is accepted and $x_{t+1}$ is set to $x^*$. If $\alpha(x^* \mid x_t) < 1$, then the candidate $x^*$ is accepted with probability $\alpha(x^* \mid x_t)$. This acceptance decision is implemented as follows:
• Sample a value $u$ from the uniform distribution $U(0,1)$;
• If $u \le \alpha(x^* \mid x_t)$, then the candidate value $x^*$ is accepted and $x_{t+1} = x^*$; otherwise reject $x^*$ and set $x_{t+1} = x_t$.
Repeat steps 1-3 until a full set of samples $x_1, x_2, \ldots, x_N$ has been obtained.
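The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation. The target $f$ is taken to be an unnormalised standard normal density and the proposal $Q$ a Gaussian random walk. The function names, the step size, the seed and the burn-in length are all hypothetical choices. Because this proposal is symmetric, the $Q$ terms cancel in the ratio; they are kept explicit so that the code mirrors the acceptance formula.

```python
import math
import random


def target_density(x):
    """Unnormalised target f: a standard normal, assumed for illustration."""
    return math.exp(-0.5 * x * x)


def proposal_density(x_to, x_from, step):
    """Q(x_to | x_from): Gaussian random walk with standard deviation step."""
    z = (x_to - x_from) / step
    return math.exp(-0.5 * z * z) / (step * math.sqrt(2.0 * math.pi))


def metropolis_hastings(num_samples, x0=0.0, step=2.0, seed=0):
    rng = random.Random(seed)
    x_t = x0
    samples = []
    for _ in range(num_samples):
        # Step 1: sample a candidate x* from the proposal centred at x_t.
        x_star = rng.gauss(x_t, step)
        # Step 2: acceptance probability alpha(x* | x_t).
        ratio = (proposal_density(x_t, x_star, step) * target_density(x_star)) / (
            proposal_density(x_star, x_t, step) * target_density(x_t))
        alpha = min(ratio, 1.0)
        # Step 3: accept x* if u <= alpha, otherwise keep the current value.
        u = rng.random()
        if u <= alpha:
            x_t = x_star
        samples.append(x_t)
    return samples


samples = metropolis_hastings(20000)
kept = samples[2000:]  # discard an arbitrarily chosen burn-in period
sample_mean = sum(kept) / len(kept)
sample_var = sum((x - sample_mean) ** 2 for x in kept) / (len(kept) - 1)
```

For this target, `sample_mean` and `sample_var` should be close to 0 and 1 respectively. Only ratios of $f$ enter the acceptance probability, which is why the normalising constant of the target is never needed.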
Gibbs Sampling
Gibbs sampling is an MCMC method, and a special case of the Metropolis-Hastings algorithm, in which proposals are always accepted. It is designed for multivariate target densities and simulates from a multivariate density using univariate conditional distributions, known as the full conditional distributions. Here, we discuss the simplest Gibbs sampling approach to Bayesian inference [Carter and Kohn, 1996], [Kim and Nelson, 2001]. Suppose that, for some $k \ge 1$, the $k$-dimensional random vector $\theta$ can be written as $\theta = (\theta_1, \ldots, \theta_k)$, and that the corresponding univariate conditional densities are $f_1, \ldots, f_k$. We assume that we know how to sample from the full conditionals

$$\theta_i \mid \theta_1, \theta_2, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_k \sim f_i(\theta_i \mid \theta_1, \theta_2, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_k)$$

for $i = 1, 2, \ldots, k$.
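The associated Gibbs sampling algorithm can be given as a transition from $\theta^{(t)}$ to $\theta^{(t+1)}$, where $t$ is the iteration number:
1. Given starting values $\theta^{(0)} = (\theta_1^{(0)}, \ldots, \theta_k^{(0)})$, set $t = 0$;
2. Sample each component in turn:

$$\begin{aligned}
\theta_1^{(t+1)} &\sim f_1(\theta_1 \mid \theta_2^{(t)}, \ldots, \theta_k^{(t)}); \\
\theta_2^{(t+1)} &\sim f_2(\theta_2 \mid \theta_1^{(t+1)}, \theta_3^{(t)}, \ldots, \theta_k^{(t)}); \\
&\;\;\vdots \\
\theta_k^{(t+1)} &\sim f_k(\theta_k \mid \theta_1^{(t+1)}, \ldots, \theta_{k-1}^{(t+1)}).
\end{aligned}$$

3. Set $t = t + 1$ and repeat from step 2 until $N$ samples have been drawn.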
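As an illustration of this scheme for $k = 2$, the sketch below assumes a target that is not discussed in the text: a bivariate normal with zero means, unit variances and correlation `rho`. Its full conditionals are the univariate normals $\theta_1 \mid \theta_2 \sim N(\rho\,\theta_2,\, 1 - \rho^2)$ and $\theta_2 \mid \theta_1 \sim N(\rho\,\theta_1,\, 1 - \rho^2)$. The function name, starting values, seed and burn-in length are hypothetical choices.

```python
import math
import random


def gibbs_bivariate_normal(num_samples, rho=0.6, start=(0.0, 0.0), seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # std. dev. of each full conditional
    theta1, theta2 = start                # step 1: starting values, t = 0
    samples = []
    for _ in range(num_samples):
        # Step 2: draw theta1 given the current theta2 ...
        theta1 = rng.gauss(rho * theta2, cond_sd)
        # ... then theta2 given the freshly updated theta1.
        theta2 = rng.gauss(rho * theta1, cond_sd)
        # Step 3: store the new state and move to the next iteration.
        samples.append((theta1, theta2))
    return samples


samples = gibbs_bivariate_normal(20000)
kept = samples[1000:]  # discard an arbitrarily chosen burn-in period
n = len(kept)
mean1 = sum(s[0] for s in kept) / n
mean2 = sum(s[1] for s in kept) / n
var1 = sum((s[0] - mean1) ** 2 for s in kept) / (n - 1)
var2 = sum((s[1] - mean2) ** 2 for s in kept) / (n - 1)
cov12 = sum((s[0] - mean1) * (s[1] - mean2) for s in kept) / (n - 1)
sample_corr = cov12 / math.sqrt(var1 * var2)
```

Every draw is kept, since each component is sampled exactly from its full conditional. The summary statistics computed at the end should be close to the assumed means (0) and correlation (`rho`).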
The advantage of the Gibbs sampler is that it simulates only from the full conditional densities. Consequently, even in high-dimensional problems, each draw can be made from a univariate distribution. Using the samples drawn from the full conditional distributions, we can estimate the parameters.
In spite of their popularity and wide application, several issues arise in implementing MCMC methods, such as blocking, the updating order in Gibbs sampling, choosing the number of chains, choosing starting values, determining the burn-in period, determining the stopping time and analysing the output [Cowles and Carlin, 1996]. Implementing these methods therefore requires careful programming and thorough diagnostic checks to obtain reliable results.