Modelling the Data - Exploring a Bayesian hierarchical structure within the behavioural perspec

5.5.1 The Gibbs Sampler

As discussed previously, the historic obstacle to Bayesian inference has been the prohibitive difficulty of evaluating the posterior integral for any but trivial functional forms. To surmount this, the simulation technique of Markov Chain Monte Carlo, or MCMC (Robert and Casella, 1999), is employed. A sufficiently large number of draws is generated from a Markov chain until the chain has converged to the posterior distribution p(θ|y) (Rossi and Allenby, 2003). These draws are not independent: each depends on the one before it, so successive draws are autocorrelated. A suitable algorithm is required to achieve this convergence, and the Metropolis-Hastings method has been shown to converge at a geometric rate under suitable conditions (Tierney, 1994). The Gibbs sampler (depicted in Equation 33) is a special case of the Metropolis-Hastings algorithm in which every proposed draw is accepted. Consider a posterior distribution with k parameters θ₁, ..., θₖ. The Gibbs sampler draws from the conditional distribution of each parameter given the data and the current values of all the other parameters. It cycles through the parameters one at a time while holding the others fixed, as follows.

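$$
\begin{aligned}
\theta_1^{(r+1)} &\sim p\left(\theta_1 \mid \theta_2^{(r)}, \theta_3^{(r)}, \ldots, \theta_k^{(r)}, y\right) \\
\theta_2^{(r+1)} &\sim p\left(\theta_2 \mid \theta_1^{(r+1)}, \theta_3^{(r)}, \ldots, \theta_k^{(r)}, y\right) \\
&\;\;\vdots \\
\theta_k^{(r+1)} &\sim p\left(\theta_k \mid \theta_1^{(r+1)}, \theta_2^{(r+1)}, \ldots, \theta_{k-1}^{(r+1)}, y\right)
\end{aligned}
\tag{33}
$$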

This cycle is repeated until the chain converges to the joint posterior distribution. Inference for each of the parameters θ₁, ..., θₖ can then be derived from the draws of the converged chain, for example by averaging them to obtain a point estimate.
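The following minimal Python sketch makes the cycling in Equation 33 concrete for a toy two-parameter case (k = 2). It assumes the posterior is a standard bivariate normal with correlation ρ, whose full conditionals are themselves normal, θ₁ | θ₂ ~ N(ρθ₂, 1 − ρ²) and vice versa. The target, the function name `gibbs_bivariate_normal` and all settings are illustrative assumptions. This is not the study's model; JAGS performs the equivalent updates internally.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, init=(0.0, 0.0), seed=None):
    """Toy Gibbs sampler (k = 2) for a standard bivariate normal target.

    Assumption: the "posterior" is a bivariate normal with zero means, unit
    variances and correlation rho, so each full conditional is
    N(rho * other, 1 - rho**2). Returns a list of (theta1, theta2) draws.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)  # sd of each full conditional
    theta1, theta2 = init
    draws = []
    for _ in range(n_iter):
        # theta1^(r+1) ~ p(theta1 | theta2^(r))
        theta1 = rng.gauss(rho * theta2, cond_sd)
        # theta2^(r+1) ~ p(theta2 | theta1^(r+1)): uses the value just drawn
        theta2 = rng.gauss(rho * theta1, cond_sd)
        draws.append((theta1, theta2))
    return draws


if __name__ == "__main__":
    # Start deliberately far from the target's centre (0, 0)
    chain = gibbs_bivariate_normal(rho=0.8, n_iter=6000, init=(5.0, -5.0), seed=1)
    kept = chain[4000:]  # discard the early draws before summarising
    mean1 = sum(d[0] for d in kept) / len(kept)
    mean2 = sum(d[1] for d in kept) / len(kept)
    print(f"estimated means: ({mean1:.2f}, {mean2:.2f}); target means: (0, 0)")
```

The chain starts far from the target at (5, −5). The first few draws therefore reflect the starting point rather than the posterior, which is why only later draws are summarised.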

The modelling is conducted in the R software system through the Rjags package. Rjags calls the JAGS (Just Another Gibbs Sampler) software and brings its functionality into the R environment (see Plummer (2003) for details of JAGS). JAGS uses the Gibbs sampler to generate the model's Markov chains. The CODA package within R offers a suitable means of summarising the resulting draws to obtain Bayesian inference for the parameters (see Finley (2013)).

5.5.2 Convergence Criteria

There is no mechanism by which the Gibbs sampler “knows” it has converged, so the researcher must confirm convergence before drawing inference. For each parameter, MCMC produces a chain of draws that starts at an arbitrary initial value and, through repeated Gibbs updates, settles into draws from the parameter's posterior distribution. Successive draws are autocorrelated. Once the chain has converged, however, the law of large numbers ensures that averages over the draws still converge to the corresponding posterior quantities, so the draws can be used for inference (Rossi and Allenby, 1999). A converged chain is said to resemble a “hairy caterpillar”: random noise around a stable value of the estimate, with no trend or drift. This provides a visual means of assessing whether the model has been run for a sufficient number of draws to arrive at the estimate.

As well as visual inspection of the chains, Gelman and Rubin (1992) offer a diagnostic that helps determine whether convergence has been achieved. In essence, the statistic (the potential scale reduction factor) compares the variance between chains with the variance within chains. A value close to 1 indicates convergence. A common rule of thumb treats a value below 1.1 as sufficient to indicate that the parameter has converged. The statistic can be calculated with the CODA Bayesian diagnostic package, which can be called from the R environment (Finley, 2013).
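The Gelman and Rubin statistic can be sketched in Python as below, in its basic form R̂ = sqrt(V̂/W). Here W is the mean within-chain variance, B is n times the variance of the chain means, and V̂ = (n − 1)/n · W + B/n. CODA's `gelman.diag` applies an additional small-sample correction, so its reported values may differ slightly. The function name and the simulated chains are illustrative assumptions.

```python
import math
import random
from statistics import fmean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for one parameter.

    chains: m >= 2 equal-length lists of post-burn-in draws.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [fmean(c) for c in chains]
    grand_mean = fmean(chain_means)
    # B: between-chain variance, scaled by the chain length n
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    # W: average of the within-chain sample variances
    w = fmean(variance(c) for c in chains)
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


if __name__ == "__main__":
    rng = random.Random(42)
    # Two chains sampling the same distribution: R-hat should be close to 1
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]
    # Two chains stuck in different places: R-hat well above 1.1
    stuck = [[rng.gauss(0.0, 1.0) for _ in range(2000)],
             [rng.gauss(3.0, 1.0) for _ in range(2000)]]
    print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

When the chains disagree, B dominates V̂ and R̂ rises well above 1. When they intermingle, V̂ and W are nearly equal and R̂ is close to 1.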

More than one chain can be run to estimate the coefficients. Rossi and Allenby (2003) state that this is often beneficial because convergence can be seen in the intermingling of the chains, and Gelman and Rubin (1992) also recommend running multiple chains for MCMC estimation. The chains are independent of one another and, given a sufficient number of draws, will converge to the same estimate of the parameter. This agreement offers further reassurance that the estimate has indeed converged. An example of a converged MCMC “hairy caterpillar” plot with two chains is shown in Fig 61, where the red and blue colours represent the two independent chains.

Figure 61: Converged MCMC plot with two chains

Therefore, when the model is constructed, the Gibbs sampler is run with two independent chains. The initial values for each parameter in the two chains are drawn at random from the parameter's prior distribution. Each chain therefore starts from a different initial value, offering a further degree of reassurance that the parameter estimates have converged.

5.5.4 Estimate of the Parameter

The parameter is estimated by averaging the draws within each chain and then across both chains. The initial value may lie far from the value to which the chain eventually converges, so the estimate should be based on the average of the converged draws rather than the average of the entire chains. A “burn-in” period is therefore discarded, and inference is based only on the draws that follow it. The burn-in is set at 4,000 iterations per chain, and a further 2,000 iterations per chain are retained as the basis of the parameter estimate. There is no fixed rule for the number of burn-in draws, so it is important to check the convergence criteria for every parameter estimate.
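The burn-in and pooling step can be sketched as below, using the 4,000 burn-in and 2,000 retained iterations per chain stated above. Because both chains contribute the same number of draws, pooling the retained draws gives the same mean as averaging within each chain and then across chains. The function `pooled_estimate` and the artificial chains in the usage example are hypothetical.

```python
from statistics import fmean, stdev

BURN_IN = 4000  # iterations discarded per chain (as in the text)
N_KEEP = 2000   # iterations retained per chain (as in the text)


def pooled_estimate(chains, burn_in=BURN_IN, n_keep=N_KEEP):
    """Discard the burn-in from each chain, pool the retained draws across
    chains, and return (posterior mean, posterior standard deviation)."""
    retained = []
    for chain in chains:
        if len(chain) < burn_in + n_keep:
            raise ValueError("chain shorter than burn-in + retained draws")
        retained.extend(chain[burn_in:burn_in + n_keep])
    return fmean(retained), stdev(retained)


if __name__ == "__main__":
    # Hypothetical chains: stuck far from the target during burn-in,
    # then alternating between 1.0 and 3.0 (mean 2.0) afterwards.
    chain_a = [100.0] * BURN_IN + [1.0, 3.0] * (N_KEEP // 2)
    chain_b = [-50.0] * BURN_IN + [1.0, 3.0] * (N_KEEP // 2)
    mean, sd = pooled_estimate([chain_a, chain_b])
    print(mean)            # 2.0: the burn-in values are excluded
    print(fmean(chain_a))  # naive average of a whole chain, far from 2.0
```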

5.5.5 Initial Values

Before the model can be initialised, the Gibbs sampler must be given a starting value for each parameter in each chain, from which the sampling algorithm begins. A starting value can be supplied to the model if an appropriate estimate is known; otherwise the Rjags package will randomly select a value from the assigned prior distribution (Plummer, 2003). This study takes the latter option, sampling the initial values from the prior distribution. This gives different starting values for each chain of each parameter and so helps to ascertain whether convergence has been reached (Rossi and Allenby, 2003). Because inference is based only on the draws after the burn-in, the choice of initial value should not affect the parameter estimate. It can, however, change the number of draws required to reach convergence.
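The effect of drawing initial values from the prior can be illustrated with a second toy example. It is a Gibbs sampler for the mean μ and variance σ² of normally distributed data, with a normal prior on μ and an inverse-gamma prior on σ², so that both full conditionals have closed forms. The priors, the simulated data and the function names are hypothetical. They are chosen only to show that chains started from different initial values give the same post-burn-in estimate.

```python
import math
import random
from statistics import fmean

# Hypothetical priors for a toy model y_i ~ N(mu, sigma2):
#   mu ~ N(M0, S0**2),  sigma2 ~ Inverse-Gamma(A0, B0)
M0, S0 = 0.0, 10.0
A0, B0 = 2.0, 2.0


def draw_from_prior(rng):
    """Draw (mu, sigma2) from the priors, for use as initial values."""
    mu = rng.gauss(M0, S0)
    sigma2 = 1.0 / rng.gammavariate(A0, 1.0 / B0)  # 1 / Gamma(shape, scale=1/rate)
    return mu, sigma2


def run_chain(y, init, n_iter, rng):
    """Gibbs sampler cycling through the two full conditionals.
    Note: mu is updated first, so only the initial sigma2 enters sweep one."""
    n, ybar = len(y), fmean(y)
    mu, sigma2 = init
    draws = []
    for _ in range(n_iter):
        # mu | sigma2, y is normal: precision-weighted prior mean and data mean
        prec = 1.0 / S0 ** 2 + n / sigma2
        mean = (M0 / S0 ** 2 + n * ybar / sigma2) / prec
        mu = rng.gauss(mean, math.sqrt(1.0 / prec))
        # sigma2 | mu, y is inverse-gamma
        ss = sum((yi - mu) ** 2 for yi in y)
        sigma2 = 1.0 / rng.gammavariate(A0 + n / 2, 1.0 / (B0 + ss / 2))
        draws.append((mu, sigma2))
    return draws


if __name__ == "__main__":
    rng = random.Random(7)
    y = [rng.gauss(2.0, 1.5) for _ in range(200)]  # simulated data
    for chain_id in (1, 2):
        init = draw_from_prior(rng)  # a different start for each chain
        kept = run_chain(y, init, 6000, rng)[4000:]  # post-burn-in draws
        print(chain_id, [round(v, 2) for v in init],
              round(fmean(d[0] for d in kept), 3))
```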

5.5.6 Model File

The Rjags package reads an external model file containing the model's functional form, including the prior distribution specification, written in the BUGS-style JAGS modelling language. This file is stored as plain text and is read by the main body of the R code through Rjags.

5.5.7 Generated Statistics

The post-burn-in draws from both chains are pooled to give the estimated posterior distribution of each parameter, from which the inference is derived. The CODA package within R is a popular means of calculating this inference (Finley, 2013). The posterior distribution of each variable, which here is approximately normal, is charted using the ggplot2 package within R. Because the MCMC output is a set of draws from the posterior, a 95% posterior confidence interval can be read directly from it. A boxplot is also produced with ggplot2 to help visualise differences or similarities between comparable parameter estimates across the various functional forms.

The inference measures are also displayed for each parameter as a point estimate and its standard error (the standard deviation of the posterior draws). Unlike the frequentist approach, Bayesian inference involves no hypothesis test of the statistical significance of the point estimate. Instead, it exploits the fact that the posterior distribution gives the probability of the parameter given the data, P(θ|y). A 95% posterior confidence interval can therefore be calculated around the posterior mean in the usual manner (a normal approximation), i.e.

$$\bar{\theta} \pm 1.96\,\hat{\sigma}_{\theta}$$

where $\bar{\theta}$ is the mean and $\hat{\sigma}_{\theta}$ the standard deviation of the retained draws. If the 2.5% and 97.5% limits of the interval do not straddle zero, there is at least a 95% posterior probability that the parameter lies on the same side of zero as its point estimate, as illustrated in Fig 62.

Figure 62: Bayesian posterior confidence interval chart

This measure is used to judge whether the parameter is contributing to the model (the posterior confidence interval does not straddle zero) or may be redundant within it (the interval does straddle zero). Bayesian inference also offers transparency. The interval shows not only whether zero is excluded but by how much: whether zero lies “just” inside or outside the interval, or “some distance” from its upper or lower limit. This indicates how much confidence the researcher may place in the conclusion.
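The two ways of reading the 95% interval described above can be sketched in Python as below, together with the check for whether the interval straddles zero. The first uses the 2.5% and 97.5% percentiles of the retained draws; the second uses the normal approximation mean ± 1.96 standard deviations. The helper names and the simulated draws are illustrative assumptions, and CODA offers equivalent summaries in R.

```python
import math
import random
from statistics import fmean, stdev


def percentile(sorted_draws, p):
    """Linear-interpolation percentile (0 <= p <= 1) of already sorted draws."""
    pos = p * (len(sorted_draws) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (pos - lo) * (sorted_draws[hi] - sorted_draws[lo])


def posterior_summary(draws, z=1.96):
    """Point estimate, standard error and two 95% intervals for one parameter."""
    s = sorted(draws)
    mean, sd = fmean(draws), stdev(draws)
    pct = (percentile(s, 0.025), percentile(s, 0.975))  # read from the draws
    normal = (mean - z * sd, mean + z * sd)             # normal approximation
    # The interval "straddles zero" if zero lies between its limits
    straddles_zero = pct[0] <= 0.0 <= pct[1]
    return {"mean": mean, "se": sd, "pct_interval": pct,
            "normal_interval": normal, "straddles_zero": straddles_zero}


if __name__ == "__main__":
    rng = random.Random(3)
    clear = [rng.gauss(0.5, 0.1) for _ in range(4000)]   # well away from zero
    vague = [rng.gauss(0.05, 0.1) for _ in range(4000)]  # zero inside interval
    for name, draws in (("clear", clear), ("vague", vague)):
        summary = posterior_summary(draws)
        print(name, round(summary["mean"], 3), summary["straddles_zero"])
```

For a roughly normal posterior the two intervals nearly coincide. The percentile version does not rely on normality, which matters if a posterior is skewed.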

For this study, a combination of Bayesian and frequentist measures is used to interpret the parameters, in light of the discussion in the literature review. Fig 63 illustrates the structure of the parameter inference and how these statistics can be interpreted, and indicates whether each measure is Bayesian or frequentist.

Figure 63: Parameter interpretation

The estimates and diagnostics of the model parameters are calculated and displayed in tables with headings similar to the one shown in Fig 63.

Each of the metrics in Fig 63 is outlined as follows:

(1) Point estimate of the parameter (and its standard error) calculated from the posterior distribution of the MCMC.

(2) The 95% Bayesian posterior confidence interval of the parameter.

(3) The symbol ^ denotes that the interval does not straddle zero. The parameter therefore has at least a 95% posterior probability of lying on the same side of zero as its point estimate, i.e. of contributing to the model fit. The absence of ^ denotes that the interval straddles zero.

(4) The frequentist t-statistic: the ratio of the parameter estimate to its standard error.

(5) The frequentist two-tailed significance level (p-value) associated with the computed t-statistic.

(6) An indication of statistical significance, with * denoting significance at the 10% level and ** at the 5% level (two-tailed). The absence of stars indicates that the significance level exceeds 10%.
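The six columns listed above can be assembled for one parameter as in the following sketch. The t-statistic is the ratio in (4). The text does not state which reference distribution is used for the p-value in (5), so the sketch assumes a standard normal, for which the two-tailed p-value is erfc(|t|/√2). A t distribution with few degrees of freedom would give slightly larger p-values. The function name `table_row` and the example values are hypothetical.

```python
import math


def table_row(estimate, se, lower, upper):
    """Build the six table columns for one parameter.

    Assumption: the p-value uses a standard normal reference distribution,
    since the degrees of freedom for the t-statistic are not stated.
    """
    t = estimate / se                                # (4) t-statistic
    p = math.erfc(abs(t) / math.sqrt(2))             # (5) two-tailed p-value
    if p < 0.05:
        stars = "**"                                 # (6) significant at 5%
    elif p < 0.10:
        stars = "*"                                  #     significant at 10%
    else:
        stars = ""
    caret = "^" if (lower > 0 or upper < 0) else ""  # (3) interval excludes zero
    return {"estimate": estimate, "se": se,            # (1)
            "interval": (lower, upper), "caret": caret,  # (2), (3)
            "t": t, "p": p, "stars": stars}            # (4) to (6)


if __name__ == "__main__":
    print(table_row(0.40, 0.10, 0.20, 0.60))   # ^ and ** expected
    print(table_row(0.18, 0.10, -0.01, 0.37))  # no ^, * expected
```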

In document Exploring a Bayesian hierarchical structure within the behavioural perspective model (Page 154-159)