Finally, we compare our Bayesian model selection procedure to that of Lanne and Saikkonen (2008). They strongly recommend using diagnostic checks to confirm the adequacy of the model suggested by the maximized likelihood criterion, but we omit this step as it is difficult to incorporate into the simulation experiment. For simplicity, we consider the case where the order of the autoregressive polynomial operators is assumed known. In particular, we set r + s = 2 and calculate the marginal likelihoods and the maximum values of the approximate log-likelihood function for the causal, purely noncausal and mixed models. We assume the same three parameter combinations (φ₁, ϕ₁) ∈ {(0.1, 0.7), (0.7, 0.1), (0.7, 0.7)} as in Section 3. Again, the results (not reported in detail) are based on 1000 realizations of a series of 150 observations where the error terms εₜ are assumed to follow the standardized Student's t-distribution with 3 degrees of freedom and


Previous authors have also attempted model selection experiments for the GIG cycle, but with various limitations compared to our approach. Roe and Allen (1999) compared deterministic models plus autoregressive process noise using an F-test and found no support for any one model over any other. Feng and Bailer-Jones (2015) used Bayesian model selection to select between competing forcing functions over the Pleistocene, concluding that obliquity influences the termination times over the entire Pleistocene, and that precession also has explanatory power following the mid-Pleistocene transition. Their approach requires a tractable likelihood function, which heavily restricts the class of models that can be compared, in particular ruling out the use of SDE models. As in the previously mentioned hypothesis tests, they also begin by discarding most of the data and using a summary consisting of just the termination times (∼12 over the past 1 Myr), which is necessary as the low-order deterministic models used do not fit well to the complete dataset. They also only sample parameter values from the prior, leading to poor numerical efficiency. Finally, Kwasniok (2013) compares conceptual models over the last glacial period using the Bayesian information criterion. The likelihood of each model is estimated using an unscented Kalman filter (UKF) (Wan et al., 2000). Whilst this approach focussed on a smaller time horizon than our application, it can be applied using the data and models in this paper. However, the Gaussian approximation used by the UKF, whilst working well for filtering, is unproven for parameter estimation and model selection, and the particle filter offers a more natural approach for non-linear dynamical systems.



It is quite common in statistical modeling to select a model and then make inferences as if the model had been known in advance, i.e. ignoring model selection uncertainty. The resulting estimator is called the post-model selection estimator (PMSE), whose properties are hard to derive. Conditioning on the data at hand (as is usually the case), Bayesian model selection is free of this phenomenon. This paper is concerned with the properties of the Bayesian estimator obtained after model selection when the frequentist (long-run) performance of the resulting Bayesian estimator is of interest. The proposed method, based on Bayesian decision theory, uses the well-known Bayesian model averaging (BMA) machinery and outperforms both PMSE and BMA. It is shown that if the unconditional model selection probability equals the model prior, then the proposed approach reduces to BMA. The method is illustrated using Bernoulli trials.
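As a concrete illustration of the Bernoulli setting, the following sketch (our own construction, not the paper's exact setup: model M0 fixes p = 0.5 while M1 places a uniform prior on p, with equal prior model probabilities) contrasts the post-model selection estimator with the model-averaged one:

```python
from math import comb

def estimates(s, n):
    """PMSE vs BMA for s successes in n Bernoulli trials.
    M0: p = 0.5 exactly.  M1: p ~ Uniform(0, 1).
    Equal prior model probabilities are assumed."""
    m0 = comb(n, s) * 0.5 ** n   # marginal likelihood under M0
    m1 = 1.0 / (n + 1)           # under M1: integral of C(n,s) p^s (1-p)^(n-s) dp
    w1 = m1 / (m0 + m1)          # posterior probability of M1
    est_m0 = 0.5                 # M0's estimate of p
    est_m1 = (s + 1) / (n + 2)   # posterior mean of p under M1 (Beta(1,1) prior)
    pmse = est_m1 if w1 > 0.5 else est_m0   # commit fully to the selected model
    bma = (1 - w1) * est_m0 + w1 * est_m1   # average over both models
    return pmse, bma, w1
```

For 9 successes in 10 trials, M1 receives roughly 90% posterior probability; PMSE then commits entirely to M1's estimate, while BMA still shrinks slightly toward 0.5.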


This dissertation explores Bayesian model selection and estimation in settings where the model space is too vast to rely on Markov Chain Monte Carlo for posterior calculation. First, we consider the problem of sparse multivariate linear regression, in which several correlated outcomes are simultaneously regressed onto a large set of covariates, where the goal is to estimate a sparse matrix of covariate effects and the sparse inverse covariance matrix of the residuals. We propose an Expectation-Conditional Maximization algorithm to target a single posterior mode. In simulation studies, we find that our algorithm outperforms other regularization competitors thanks to its adaptive Bayesian penalty mixing. In order to better quantify the posterior model uncertainty, we then describe a particle optimization procedure that targets several high-posterior probability models simultaneously. This procedure can be thought of as running several "mutually aware" mode-hunting trajectories that repel one another whenever they approach the same model. We demonstrate the utility of this method for fitting Gaussian mixture models and for identifying several promising partitions of spatially-referenced data. Using these identified partitions, we construct an approximation for posterior functionals that averages out the uncertainty about the underlying partition. We find that our approximation has favorable estimation risk properties, which we study in greater detail in the context of partially exchangeable normal means. We conclude with several proposed refinements of our particle optimization strategy that encourage a wider exploration of the model space while still targeting high-posterior probability models.


Such an approach not only avoids problems associated with improper priors when calculating the Bayes factor, but also has the potential to allow more general loss functions (for example, replacing the posterior predictive density with a more general scoring rule; see Section 2.5) to be incorporated in the model assessment. However, it leaves open two important questions: first, the extent to which overlapping subsets used for model training and validation introduce bias into the assessment, and second, the extent to which the power of the assessment is reduced by assessing performance on models conditioned on an incomplete sample of data. This approach and its associated issues are closely linked to the cross-validatory approaches we now consider.


Parameter estimation for complex models using Bayesian inference is usually a very costly process as it requires a large number of solves of the forward problem. We show here how the construction of adaptive surrogate models using a posteriori error estimates for quantities of interest can significantly reduce the computational cost in problems of statistical inference. As surrogate models provide only approximations of the true solutions of the forward problem, it is nevertheless necessary to control these errors in order to construct an accurate reduced model with respect to the observables utilized in the identification of the model parameters. Effectiveness of the proposed approach is demonstrated on a numerical example dealing with the Spalart–Allmaras model for the simulation of turbulent channel flows. In particular, we illustrate how Bayesian model selection using the adapted surrogate model in place of solving the coupled nonlinear equations leads to the same quality of results while requiring fewer nonlinear PDE solves.


In this paper we suppose that we are in a context similar to that of Example 1, where, for any possible model, the sample space of the problem must be consistent with a single event tree, but where on the basis of a sample of students' records we want to select one of a number of different possible CEG models, i.e. we want to find the "best" partitioning of the situations into stages. We take a Bayesian approach to this problem and choose the model with the highest posterior probability — the Maximum A Posteriori (MAP) model. This is the simplest and possibly most common Bayesian model selection method, advocated by, for example, Dennison et al [6], Castelo [7], and Heckerman [8], the latter two specifically for Bayesian network selection.
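In its simplest form, MAP model selection scores each candidate by log marginal likelihood plus log model prior and keeps the argmax; the shared normalizing constant never needs to be computed. A generic sketch (the CEG-specific scoring itself is in the paper):

```python
import math

def map_model(log_marginal_liks, priors):
    """Index of the MAP model: p(M_i | data) is proportional to
    p(data | M_i) * p(M_i), so the shared normalizer can be ignored."""
    scores = [ll + math.log(p) for ll, p in zip(log_marginal_liks, priors)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal priors this reduces to picking the model with the largest marginal likelihood; a sufficiently informative model prior can overturn that ranking.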


Previous attempts to control for model quality in GLMs for fMRI include statistical tests for goodness of fit (Razavi et al., 2003) and the application of Akaike's or the Bayesian information criterion for activation detection (Seghouane and Ong, 2010) or theory selection (Gläscher and O'Doherty, 2010). Additionally, voxel-wise Bayesian model assessment (Penny et al., 2003, 2005, 2007) and random-effects Bayesian model selection (Rosa et al., 2010) have been included in the popular software package Statistical Parametric Mapping (SPM), but are only rarely used due to low visibility, high analytical complexity and interpretational difficulty. Finally, a toolbox for frequentist model diagnosis and exploratory data analysis called SPMd ("d" for "diagnostic") has been released for SPM (Luo and Nichols, 2003), but was discontinued several years ago (Nichols, 2013).


A number of Bayesian formulations of PCA have followed from the probabilistic formulation of Tipping and Bishop (1999a), with the necessary marginalization being approximated through both Laplace approximations (Bishop, 1999a; Minka, 2000, 2001a) and variational bounds (Bishop, 1999b). More recently, work within the statistics research community has used a Bayesian variational approach to derive an explicit conditional probability distribution for the signal dimension given the data (Šmídl and Quinn, 2007). However, these results have only been tested on low-dimensional data with relatively large sample sizes. A somewhat more tractable expression for the signal dimension posterior was also obtained by Minka (2000, 2001a), and it is that Bayesian formulation of PCA that we draw upon. By performing a Laplace approximation (Wong, 1989), that is, expanding about the maximum posterior solution, Minka derived an elegant approximation to the probability, the model evidence p(D | k), of observing a data set D given the number of principal components k (Minka, 2000, 2001a). The signal dimensionality of the given data set is then estimated by the value of k that maximizes p(D | k). As with any Bayesian model selection procedure, if the data has truly been generated by a model of the form proposed, then one is guaranteed to select the correct model dimensionality as the sample increases to an infinite size. Minka's dimensionality selection method performs well when tested on data sets of moderate size and dimensionality. Indeed, the Laplace approximation incorporates the leading-order term in an asymptotic expansion of the Bayesian evidence, with the sample size N playing the role of the 'large' parameter, and so we would expect the Laplace approximation to be increasingly accurate as N → ∞.
In real-world data sets, such as those emanating from molecular biology experiments, the number of variables d is often very much greater than the sample size N, with d ∼ 10⁴ yet N ∼ 10 or N ∼ 10² not uncommon.
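Minka's evidence-based dimensionality estimate is available in common software; for instance, scikit-learn's PCA exposes it via `n_components='mle'`. A small sketch on our own toy data (a clear three-dimensional signal buried in unit-variance noise, with N ≫ d as the method assumes):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
N, d, k_true = 500, 10, 3

# Data with a strong 3-dimensional signal plus isotropic unit-variance noise.
W = rng.standard_normal((d, k_true)) * np.array([10.0, 8.0, 6.0])
X = rng.standard_normal((N, k_true)) @ W.T + rng.standard_normal((N, d))

# n_components='mle' selects k by maximizing Minka's evidence approximation.
pca = PCA(n_components="mle").fit(X)
print(pca.n_components_)
```

In the d ≫ N regime discussed next, this estimator is no longer reliable, which is precisely the motivation for what follows.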


to the ones made by Schwarz (1978) and Haughton (1988). In this sense, our paper generalizes the mentioned works, providing valid asymptotic formulas for a new type of marginal likelihood integrals. The resulting asymptotic approximations, presented in Theorem 4, deviate from the standard BIC score. Hence the standard BIC score is not justified for Bayesian model selection among Bayesian networks with hidden variables. Moreover, no uniform score formula exists for such models; our adjusted BIC score changes depending on the different types of singularities of the sufficient statistics, namely, the coefficient of the ln N term (Eq. 2) is no longer −d/2 but rather a function of the sufficient statistics. An additional result presented in Theorem 5 describes the asymptotic marginal likelihood given a degenerate (missing links) naive Bayesian model; it complements the main result presented by Theorem 4.
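For reference, the contrast can be written out explicitly. The regular-case expansion below is the classical one; the singular-case form is the generic shape from singular learning theory, with the paper's Theorem 4 supplying the model-specific coefficient and multiplicity, which we do not reproduce here:

```latex
% Regular (non-singular) models: the classical BIC-type expansion
\log p(D \mid M) = \ell_N(\hat{\theta}) - \frac{d}{2}\,\ln N + O(1).

% Singular models: the \ln N coefficient changes and a \ln\ln N term can enter
\log p(D \mid M) = \ell_N(\hat{\theta}) - \lambda\,\ln N + (m - 1)\,\ln\ln N + O(1),
```

where ℓ_N(θ̂) is the maximized log-likelihood, d the number of parameters, and λ (with multiplicity m) depends on the type of singularity, here a function of the sufficient statistics.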


applications (i.e., models with many parameters). If prior knowledge about the parameters is not available or vague, a further simplification leads to the Bayesian information criterion or Schwarz' information criterion (BIC) [Schwarz, 1978; Raftery, 1995]. The Akaike information criterion (AIC) [Akaike, 1973] originates from information theory and is frequently applied in the context of BMA in social research [Burnham and Anderson, 2003] for its ease of implementation. Previous studies have revealed that these information criteria (IC) differ in the resulting posterior model weights or even in the ranking of the models [Poeter and Anderson, 2005; Ye et al., 2008, 2010a, 2010b; Tsai and Li, 2010; Singh et al., 2010; Morales-Casique et al., 2010; Foglia et al., 2013]. This implies that they do not reflect the true Bayesian trade-off between performance and complexity, but might produce an arbitrary trade-off which is not supported by Bayesian theory and cannot provide a reliable basis for Bayesian model selection. Burnham and Anderson [2004] conclude that ". . . many reported studies are not appropriate as a basis for inference about which criterion should be used for model selection with real data." The work of Lu et al. [2011] has been a first step toward clarifying the so far contradictory results by comparing the KIC and the BIC against a Markov chain Monte Carlo (MCMC) reference solution for a synthetic geostatistical application.
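The IC-based model weights compared in the studies above can be computed in a few lines. A sketch with generic formulas (not any particular study's settings), which also shows how AIC and BIC can rank the same two models differently:

```python
import math

def ic_weights(log_liks, ks, n, criterion="BIC"):
    """Model weights implied by an information criterion:
    IC_i = -2*logL_i + penalty_i, and w_i is proportional to
    exp(-delta_i / 2) with delta_i = IC_i - min_j IC_j."""
    if criterion == "AIC":
        ic = [-2.0 * ll + 2.0 * k for ll, k in zip(log_liks, ks)]
    else:  # BIC penalizes each parameter by ln(n) instead of 2
        ic = [-2.0 * ll + k * math.log(n) for ll, k in zip(log_liks, ks)]
    best = min(ic)
    raw = [math.exp(-(v - best) / 2.0) for v in ic]
    total = sum(raw)
    return [r / total for r in raw]

# Two hypothetical models: the larger one fits better but has more parameters.
w_aic = ic_weights([-100.0, -95.0], ks=[2, 6], n=1000, criterion="AIC")
w_bic = ic_weights([-100.0, -95.0], ks=[2, 6], n=1000, criterion="BIC")
```

Here AIC favours the larger model while BIC favours the smaller one, reproducing in miniature the criterion-dependent weights and rankings the studies above report.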


Chapter 3 introduces an individual-based SIS model for the spread dynamics of an infectious disease among a population of individuals partitioned into households. The proposed hidden Markov model, which naturally accounts for partially observed data and imperfect test sensitivity, is used as the basic model for the methods developed throughout the thesis. Special attention is given to the data-augmentation MCMC algorithm that is used to facilitate inference for this model. In Chapter 4 we consider the problem of Bayesian model selection in the presence of high-dimensional missing data, focusing on epidemiological applications where observations are gathered longitudinally and the population under investigation is organised in small groups. In particular, we outline an algorithm that combines ideas from MCMC, importance sampling and filtering to provide estimates of the marginal likelihood, and is well suited to small-scale epidemics. Even though several alternative approaches exist, there are currently only a few studies assessing the performance of model selection methods in such settings. Hence, one of the main contributions of this chapter is the comparison of the proposed method with existing approaches, achieved through an extended simulation study on synthetic data generated to resemble real-life epidemiological problems. The importance of model selection procedures is further demonstrated in Chapter 5, where we successfully apply these methods to uncover new insights into the transmission dynamics of E. coli O157:H7 in cattle.


Horizontal gene transfer (HGT) plays a critical role in evolution across all domains of life with important biological and medical implications. I propose a simple class of stochastic models to examine HGT using multiple orthologous gene alignments. The models function in a hierarchical phylogenetic framework. The top level of the hierarchy is based on a random walk process in "tree space" that allows for the development of a joint probabilistic distribution over multiple gene trees and an unknown, but estimable, species tree. I consider two general forms of random walks. The first form is derived from the subtree prune and regraft (SPR) operator that mirrors the observed effects that HGT has on inferred trees. The second form is based on walks over complete graphs and offers numerically tractable solutions for an increasing number of taxa. The bottom level of the hierarchy utilizes standard phylogenetic models to reconstruct gene trees given multiple gene alignments conditional on the random walk process. I develop a well-mixing Markov chain Monte Carlo algorithm to fit the models in a Bayesian framework. I demonstrate the flexibility of these stochastic models to test competing ideas about HGT by examining the complexity hypothesis. Using 144 orthologous gene alignments from six prokaryotes previously collected and analyzed, Bayesian model selection finds support for (1) the SPR model over the alternative form, (2) the 16S rRNA reconstruction as the most likely species tree, and (3) increased HGT of operational genes compared to informational genes.


The Naive Bayes method is based on the work of Thomas Bayes (1702–1761). In Bayesian classification, we have a hypothesis that the given data belong to a particular class, and we calculate the probability of that hypothesis being true. This is among the most practical approaches for certain types of problems. The approach requires only one scan of the whole data set, and if additional training data become available at some later stage, each training example can incrementally increase or decrease the probability that a hypothesis is correct. A Bayesian network can thus be used to model a domain containing uncertainty [12, 13]. We combine this with evolutionary optimization of RBF network architectures (feature and model selection), applicable to a wide range of data mining problems (in particular, classification problems). To make this feasible, the overall runtime of the EA had to be reduced substantially: we decided to optimize only the most important architecture parameters and to use standard techniques for representation, selection, and reproduction.
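The one-scan, update-per-example property described above can be made concrete with a minimal incremental implementation (our own sketch; the categorical features and Laplace smoothing are illustrative choices, not those of the cited works):

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Categorical naive Bayes with Laplace smoothing. Each training example
    only increments counts, so the model updates incrementally as data arrive."""

    def __init__(self):
        self.n = 0
        self.class_counts = defaultdict(int)
        self.feat_counts = defaultdict(int)   # (class, position, value) -> count
        self.values = defaultdict(set)        # position -> set of seen values

    def update(self, features, label):
        self.n += 1
        self.class_counts[label] += 1
        for i, v in enumerate(features):
            self.feat_counts[(label, i, v)] += 1
            self.values[i].add(v)

    def predict(self, features):
        best, best_score = None, float("-inf")
        for c, nc in self.class_counts.items():
            score = math.log(nc / self.n)     # log prior for class c
            for i, v in enumerate(features):  # add per-feature log likelihoods
                num = self.feat_counts[(c, i, v)] + 1       # Laplace smoothing
                den = nc + len(self.values[i])
                score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best
```

Each `update` call touches only a handful of counters, which is what makes the single-scan and incremental-learning claims above hold.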

Since the network-based penalised-likelihood approach [21] does not incorporate interaction terms, we performed a third simulation to investigate its performance under a data-generating model without interaction terms. In particular, we used the same true underlying predictor subset as in Simulation 2 (i.e. γ₃* = γ₂*), which contains predictors that are neighbours in the network, but generated data using a linear model without interaction terms: Y = A + 2B + 3C + ε, where A, B, C are the three influential variables. We note that each predictor in the data-generating model has a different magnitude of influence on the response (i.e. different regression coefficients). Average ROC curves are shown in Figure 3c. Comparisons are made to other approaches as described above, but all methods now use linear models without interaction terms. As in Simulations 1 and 2, the Bayesian variable selection approach with empirical Bayes and pathway-based priors outperforms a flat prior and an incorrect prior, with empirical Bayes selecting the correct prior in 99% of iterations (the correct and incorrect priors are the same as for Simulation 2). The Bayesian approach with a Markov random field prior showed a similar performance to the proposed pathway-based priors (a correct value of λ > 0 was selected in 90% of iterations). However, the approach of Li and Li [21], whilst now more competitive compared with Simulation 2, is still outperformed by the empirical Bayes approach with pathway-based priors. Moreover, it does not display a clear improvement over Lasso regression.
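The data-generating step of this simulation is easy to reproduce in outline. The sketch below (with hypothetical sample sizes; the paper's exact settings may differ) generates Y = A + 2B + 3C + ε and checks that ordinary least squares recovers the three unequal coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 10   # hypothetical sizes; the paper's settings may differ

X = rng.standard_normal((n, p))
A, B, C = X[:, 0], X[:, 1], X[:, 2]              # the three influential predictors
y = A + 2 * B + 3 * C + rng.standard_normal(n)   # Y = A + 2B + 3C + eps

# Least squares should recover coefficients close to (1, 2, 3, 0, ..., 0),
# reflecting the different magnitudes of influence noted in the text.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```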


The promise of augmenting accurate predictions provided by modern neural networks with well-calibrated predictive uncertainties has reinvigorated interest in Bayesian neural networks. However, model selection — even choosing the number of nodes — remains an open question. Poor choices can severely affect the quality of the produced uncertainties. In this paper, we explore continuous shrinkage priors, the horseshoe, and the regularized horseshoe distributions, for model selection in Bayesian neural networks. When placed over node pre-activations and coupled with appropriate variational approximations, we find that the strong shrinkage provided by the horseshoe is effective at turning off nodes that do not help explain the data. We demonstrate that our approach finds compact network structures even when the number of nodes required is grossly over-estimated. Moreover, the model selection over the number of nodes does not come at the expense of predictive or computational performance; in fact, we learn smaller networks with comparable predictive performance to current approaches. These effects are particularly apparent in sample-limited settings, such as small data sets and reinforcement learning.
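The shrinkage behaviour described here is visible even in prior samples. The sketch below (with an assumed global scale `tau`, not a value from the paper) draws weights from a horseshoe prior and compares them with a Gaussian prior of the same scale:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100_000
tau = 0.1   # assumed global scale, not a value from the paper

# Horseshoe prior: w | lam ~ N(0, (lam * tau)^2) with lam ~ HalfCauchy(0, 1).
lam = np.abs(rng.standard_cauchy(m))
w_horseshoe = rng.standard_normal(m) * lam * tau
w_gauss = rng.standard_normal(m) * tau   # Gaussian prior with the same scale

# The horseshoe puts more mass near zero *and* has heavier tails: it can
# switch nodes off aggressively while still allowing large weights through.
print(np.mean(np.abs(w_horseshoe) < 0.01), np.mean(np.abs(w_gauss) < 0.01))
print(np.mean(np.abs(w_horseshoe) > 1.0), np.mean(np.abs(w_gauss) > 1.0))
```

This zero-or-signal shape is what makes the horseshoe suitable for turning off uninformative nodes without over-penalizing the useful ones.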


The departure from normality can also be seen from the Bayesian residual test. We first fit the MSGM model with one regime and one state, which is effectively a model of multivariate normal returns. We then conduct a series of residual tests by normalizing the historical returns using the posterior draws of the mean and covariance matrix. If the returns were indeed normally distributed, the classical Kolmogorov–Smirnov test should accept the null. Histograms of the test statistics are reported in Figure 1; the six panels correspond to the six assets in sequence. Since we have a fairly large sample of more than 2000 observations, the 1% critical value of the test statistic can be approximated by 1.63/√T, which is about 0.03. Figure 1 shows that the test statistics exceed the critical value in every case, so normality is decisively rejected.
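The test described above is straightforward to reproduce in outline. The sketch below (on synthetic returns, not the paper's data) computes the one-sample KS statistic against the standard normal and applies the 1.63/√T critical value; a fat-tailed Student-t(3) series, scaled to unit variance, is rejected while a truly Gaussian one is not:

```python
import numpy as np
from math import erf, sqrt

def ks_statistic(x):
    """One-sample Kolmogorov-Smirnov statistic against the standard normal."""
    x = np.sort(np.asarray(x))
    n = len(x)
    cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in x])
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(0)
T = 2000
crit = 1.63 / np.sqrt(T)   # approximate 1% critical value (about 0.036 here)

normal_returns = rng.standard_normal(T)
heavy_returns = rng.standard_t(df=3, size=T) / np.sqrt(3.0)  # unit-variance t(3)

print(ks_statistic(normal_returns), ks_statistic(heavy_returns), crit)
```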


Inflation's volatility has attracted a good deal of attention recently; the interest has been sparked by the debate on the Great Moderation that has been documented for real economic aggregates. Inflation stabilization is indeed a possible source of the reduction in the volatility of macroeconomic aggregates. The issue is also closely bound up with inflation persistence and predictability. In an influential paper, Stock and Watson (2007), using a local level model with stochastic volatility, document that inflation is less volatile now than it was in the 1970s and early 1980s; moreover, persistence, which measures the long-run effect of a shock, has declined, and predictability has increased.
