The Semi-Bayes Model - Appendix 2: Winbugs Code for Semi-Bayes and Fully-Bayes Models


5.1.1 The Semi-Bayes Model

The semi-Bayes model was introduced over 10 years ago and has seen periodic use. Researchers have used semi-Bayes models in occupational, genetic, nutritional and cancer epidemiology.(De Roos et al., 2001; Greenland, 1992; Hung et al., 2004; Witte et al., 1994) By placing a prior distribution on model coefficients, the semi-Bayes model not only allows the researcher to incorporate prior knowledge but also shrinks coefficients toward the prior mean. The amount of shrinkage in the semi-Bayes model depends on the prior variance. Smaller prior variances (indicating more prior knowledge) cause greater shrinkage toward the prior mean, while larger prior variances (indicating less prior knowledge) cause less shrinkage. In datasets of moderate size, the impact of the prior distribution is likely to be minimal. Previous studies that have used semi-Bayes models frequently specify relatively large prior variances. For instance, Kirrane et al. specify a prior variance equivalent to 95% of possible ORs falling in a ten-fold range. There are two problems with such large prior variances. First, they are almost guaranteed to cause little shrinkage and to be dominated by the observed data. Second, they are frequently incommensurate with prior knowledge. In the study by Kirrane et al., the authors indicate that prior research showed a small increased risk of macular degeneration among users of pesticides (with an OR of 2.0 observed in a previous study). It is unlikely that the investigators would truly assign any prior probability to an OR=5, let alone an OR=10. Users of semi-Bayes models should consider specifying more substantively realistic prior variances (ORs of 10 could be ruled out a priori in most studies) to reap more benefit from the Bayesian model.
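The link between a stated OR range, the implied prior variance, and the resulting shrinkage can be sketched numerically. The sketch below assumes a normal prior on the log-OR scale and a normal approximation to the estimate, so that the posterior mean is an inverse-variance weighted average; the function names and the example estimate (OR = 3 with variance 0.25, prior mean 0) are hypothetical and are not taken from any of the cited studies.

```python
import math

def prior_variance_from_or_range(fold_range, z=1.96):
    """Prior variance on the log-OR scale such that the central 95% of the
    prior mass spans `fold_range` (e.g. 10 for a ten-fold range of ORs)."""
    # The 95% interval on the log scale has width 2 * z * sd = ln(fold_range).
    sd = math.log(fold_range) / (2.0 * z)
    return sd ** 2

def shrink(estimate, est_variance, prior_mean, prior_variance):
    """Approximate posterior mean and variance of a log OR under a normal
    prior and a normal approximation to the estimate (an assumption)."""
    # Weight given to the observed estimate; approaches 1 as the prior variance grows.
    weight_data = prior_variance / (prior_variance + est_variance)
    post_mean = weight_data * estimate + (1.0 - weight_data) * prior_mean
    post_var = weight_data * est_variance
    return post_mean, post_var

# Hypothetical estimate: OR = 3 with variance 0.25 on the log scale, prior mean 0.
log_or, var_hat = math.log(3.0), 0.25
for fold in (10, 4):
    tau2 = prior_variance_from_or_range(fold)
    mean, _ = shrink(log_or, var_hat, 0.0, tau2)
    print(fold, round(tau2, 3), round(math.exp(mean), 2))
```

Under these assumptions, a ten-fold range corresponds to a prior variance of roughly 0.35 on the log scale, and narrowing the range pulls the posterior OR further toward the prior mean.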

A further troubling aspect of the use of semi-Bayes models is their role in reducing the type-I error rate in hypothesis testing.(Hung et al., 2004; Steenland et al., 2000) As we have demonstrated, there are two problems with this approach. First, semi-Bayes credible intervals only have increased frequentist coverage (i.e., they cover the true parameter value at least 100(1−α)% of the time and so are less likely to incorrectly reject the null) when the prior mean is zero. While such a prior mean may sometimes be justifiable, it will frequently be incommensurate with existing research. Second, even if setting the prior mean to zero is reasonable, the increase in coverage probability will generally be minimal. Because this approach requires assumptions that will frequently be untenable and, even when they are tenable, produces little gain in coverage, we advise against using semi-Bayes methods to reduce type-I error rates.
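The dependence of coverage on the prior mean can be illustrated with a closed-form calculation. This is a minimal sketch under assumptions that are not part of the simulations reported here: a single coefficient, a normal approximation to its estimate, a true log OR of zero (the null), and hypothetical values for the variances. The function names are illustrative only.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def semi_bayes_coverage(true_beta, prior_mean, est_variance, prior_variance, z=1.96):
    """Frequentist coverage of the approximate semi-Bayes interval
    (posterior mean +/- z * posterior sd) when the estimate is
    distributed N(true_beta, est_variance)."""
    w = prior_variance / (prior_variance + est_variance)  # weight on the data
    half_width = z * math.sqrt(w * est_variance)          # z * posterior sd
    bias = (1.0 - w) * (prior_mean - true_beta)           # pull of the centre toward the prior
    scale = w * math.sqrt(est_variance)                   # sd of the interval centre
    upper = (half_width - bias) / scale
    lower = (-half_width - bias) / scale
    return normal_cdf(upper) - normal_cdf(lower)

tau2 = (math.log(10.0) / 3.92) ** 2  # ten-fold prior range on the OR scale
# True log OR of zero; prior centred on the null versus on a hypothetical OR of 2.
print(semi_bayes_coverage(0.0, 0.0, 0.05, tau2))
print(semi_bayes_coverage(0.0, math.log(2.0), 0.25, tau2))
```

In this simplified setting the interval is conservative only when the prior mean coincides with the true value; with a large prior variance and a precise estimate the gain over 95% is small, and a prior centred away from the truth can push coverage below the nominal level.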

Our simulation results generally demonstrate that the semi-Bayes model has somewhat worse properties than the other three Bayesian hierarchical models that we examine. This is not a surprising result. The semi-Bayes model suffers, to paraphrase Jimmie Savage, from breaking the Bayesian egg without making a Bayesian omelet.(Savage, 1954) That is, the researcher who uses semi-Bayes models allows some amount of Bayesian learning by updating the prior distribution of the effects with the observed data, but does not allow the prior variance to be updated with the observed data. It stands to reason that methods that do allow the prior variance to be updated will outperform the semi-Bayes method simply because they make use of more of the available data.
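The distinction between fixing the prior variance and updating it can be sketched with a small Gibbs sampler. The sketch below is not the code used for the analyses described here. It assumes a normal first stage for each log-OR estimate (a simplification made only to keep the example short), a fixed prior mean, and, for the fully-Bayes case, a hypothetical inverse-gamma prior on the prior variance with illustrative hyperparameters `a` and `b`. The data are invented.

```python
import random

def gibbs_hierarchical(estimates, variances, prior_mean, tau2_init,
                       update_tau2=True, a=2.0, b=0.1,
                       n_iter=4000, burn_in=1000, seed=1):
    """Gibbs sampler for b_j ~ N(beta_j, v_j), beta_j ~ N(prior_mean, tau2).
    update_tau2=False keeps tau2 fixed (semi-Bayes); update_tau2=True draws
    tau2 from its inverse-gamma full conditional (fully Bayes)."""
    rng = random.Random(seed)
    tau2 = tau2_init
    n = len(estimates)
    beta_sums = [0.0] * n
    tau2_sum = 0.0
    kept = 0
    for it in range(n_iter):
        # Draw each coefficient from its normal full conditional.
        betas = []
        for est, var in zip(estimates, variances):
            precision = 1.0 / var + 1.0 / tau2
            mean = (est / var + prior_mean / tau2) / precision
            betas.append(rng.gauss(mean, precision ** -0.5))
        if update_tau2:
            # Inverse-gamma full conditional: draw a gamma variate and invert it.
            shape = a + n / 2.0
            rate = b + 0.5 * sum((bj - prior_mean) ** 2 for bj in betas)
            tau2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if it >= burn_in:
            kept += 1
            tau2_sum += tau2
            for j, bj in enumerate(betas):
                beta_sums[j] += bj
    return [s / kept for s in beta_sums], tau2_sum / kept

# Invented log-OR estimates that vary little around a prior mean of 0.1.
b_hat = [0.10, 0.12, 0.08, 0.11, 0.09]
v_hat = [0.25] * 5
semi_means, semi_tau2 = gibbs_hierarchical(b_hat, v_hat, 0.1, 0.345, update_tau2=False)
full_means, full_tau2 = gibbs_hierarchical(b_hat, v_hat, 0.1, 0.345, update_tau2=True)
print(semi_tau2, full_tau2)
```

With `update_tau2=False` the sampler never revisits the prior variance, whatever the data suggest; with `update_tau2=True` the sampled prior variance is free to move away from its starting value in response to the spread of the coefficients.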

This result is also somewhat misleading: it is possible to generate scenarios in which the semi-Bayes model outperforms (in terms of mean squared error) the fully-Bayes model. The scenarios in which the fully-Bayes model will most radically outperform the semi-Bayes model will be ones in which the semi-Bayes model has specified a prior variance that is completely incompatible with the data. On the other hand, were we to generate a dataset and fit a semi-Bayes model with a prior variance equal to the variance observed in the dataset, the semi-Bayes model could perform somewhat better than the fully-Bayes model (since the prior variance was correctly specified to begin with). However, as we are never likely to know what the true variance is, we view the fully-Bayes approach as superior to the semi-Bayes approach. Indeed, in the applied example on disinfection by-products and spontaneous abortion, we found that our entirely plausible prior variance in the semi-Bayes model was completely incompatible with the small amount of variability between estimates in the observed data.
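The sensitivity of the semi-Bayes estimator to the assumed prior variance can be made concrete with a short calculation. The sketch assumes a normal approximation, a correctly specified prior mean, and true effects that vary around that mean with variance `true_effect_var`; all names and numbers are hypothetical. It covers only fixed-variance shrinkage estimators and does not model the fully-Bayes estimator.

```python
def semi_bayes_mse(assumed_prior_var, true_effect_var, est_variance):
    """Mean squared error of the shrinkage estimator w*b + (1-w)*prior_mean,
    averaged over true effects with variance `true_effect_var` around the
    prior mean, when the estimate b has sampling variance `est_variance`."""
    w = assumed_prior_var / (assumed_prior_var + est_variance)
    # Sampling-variance term plus squared-bias term from over-shrinkage.
    return w ** 2 * est_variance + (1.0 - w) ** 2 * true_effect_var

true_var, var_hat = 0.05, 0.25        # little true variability between effects
for assumed in (0.05, 0.345, 5.0):    # matched, ten-fold OR range, nearly flat
    print(assumed, semi_bayes_mse(assumed, true_var, var_hat))
```

Under these assumptions the mean squared error is smallest when the assumed prior variance equals the true variance of the effects, and it approaches the sampling variance of the unshrunk estimate as the assumed prior variance grows.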

Semi-Bayes models were presumably introduced because they were easier to fit than fully-Bayes models given the limitations of the software available at that time. However, in presenting methods to fit semi-Bayes models, these authors relied on asymptotic properties.(Witte et al., 1998) It is important for researchers to recognize that it is precisely in those situations where asymptotics will hold (i.e., with large datasets) that semi-Bayes methods will be least useful. It is in those datasets where asymptotic assumptions are most tenuous that Bayesian methods will be most useful. Many recent applications of semi-Bayes techniques are in studies where asymptotic assumptions may be tenuous, at best (for instance, Kirrane et al. observe cell sizes of zero and De Roos et al. observe cell sizes of two).(De Roos et al., 2001; Kirrane et al., 2005) We have given templates of semi-Bayes and fully-Bayes code in Winbugs to alleviate the need to rely on asymptotic normality in fitting Bayesian models. We have also presented the basics of the Gibbs sampling routines we programmed in Matlab.
