Although a two-component Bayesian mixture model was fit in the current study, a model with more components can be used (if necessary) to identify implausible gestational ages. On the other hand, each component of the model has a normal distribution, and this restriction may be questionable, especially at lower gestational ages. As a solution, one may use distribution-free models, which allow a flexible distribution for erroneously reported gestational ages. Wilcox and Russell suggest a two-component mixture distribution (36): one component is a "predominant distribution", which is normal, and the second is a "residual distribution" with no specified form. Ultimately, the objective of this study was
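As an illustration of how a fitted two-component mixture can flag implausible values, the sketch below scores each reported gestational age by the posterior probability of the error component. All parameter values here are illustrative placeholders, not estimates from the study:

```python
from scipy.stats import norm

def error_posterior(x, w=0.95, mu1=39.0, sd1=2.0, mu2=30.0, sd2=8.0):
    """Posterior probability that a reported gestational age x (weeks)
    comes from the 'erroneous report' component of a two-component
    normal mixture. All parameters are illustrative, not fitted."""
    p1 = w * norm.pdf(x, mu1, sd1)        # plausible-report component
    p2 = (1 - w) * norm.pdf(x, mu2, sd2)  # wide error component
    return p2 / (p1 + p2)
```

A report of 39 weeks gets a near-zero error probability, while a report of 20 weeks is flagged as almost certainly implausible under these toy parameters.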
To extract structured representations of newsworthy events from Twitter, unsupervised models typically assume that tweets involving the same named entities and expressed using similar words are likely to belong to the same event. Hence, they group tweets into clusters based on the co-occurrence patterns of named entities and topical keywords. However, there are two main limitations. First, they require the number of events to be known beforehand, which is not realistic in practical applications. Second, they do not recognise that the same named entity might be referred to by multiple mentions, so tweets using different mentions would be wrongly assigned to different events. To overcome these limitations, we propose a nonparametric Bayesian mixture model with word embeddings for event extraction, in which the number of events can be inferred automatically and the issue of lexical variations for the same named entity can be dealt with properly. Our model has been evaluated on three datasets with sizes ranging between 2,499 and over 60 million tweets. Experimental results show that our model outperforms the baseline approach on all datasets by 5-8% in F-measure.

1 Introduction
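The "number of events inferred automatically" property rests on nonparametric priors such as the Chinese restaurant process; below is a minimal sketch (not the authors' full model, which also uses word embeddings) of how such a prior lets the cluster count grow with the data rather than being fixed in advance:

```python
import random

def crp_partition(n, alpha=1.0, seed=0):
    """Sample a partition of n items from a Chinese restaurant process.
    The number of clusters is not fixed beforehand -- the property the
    nonparametric event model relies on. alpha controls dispersion."""
    rng = random.Random(seed)
    tables = []   # tables[k] = number of items in cluster k
    labels = []
    for i in range(n):
        # join existing cluster k w.p. proportional to its size,
        # or open a new cluster w.p. proportional to alpha
        weights = tables + [alpha]
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if k == len(tables):
            tables.append(1)   # new cluster
        else:
            tables[k] += 1
        labels.append(k)
    return labels, tables

labels, tables = crp_partition(1000, alpha=2.0)
```

With n = 1000 items the sampled number of clusters is data-driven (on the order of alpha * log n), not supplied by the user.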
This paper proposes a model for term re-occurrence in a text collection based on the gaps between successive occurrences of a term. These gaps are modeled using a mixture of exponential distributions. Parameter estimation is based on a Bayesian framework that allows us to fit a flexible model. The model provides measures of a term's re-occurrence rate and within-document burstiness. The model works for all kinds of terms, be it a rare content word, a medium-frequency term, or a frequent function word. A measure is proposed to account for a term's importance based on its distribution pattern in the corpus.
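A maximum-likelihood sketch of such a gap model (the paper itself uses a Bayesian estimation framework) can be written as a two-component exponential-mixture EM, with one component capturing short within-burst gaps and the other long between-burst gaps:

```python
import numpy as np

def em_two_exponentials(gaps, iters=300):
    """Fit a two-component exponential mixture to inter-occurrence gaps
    by EM -- a sketch of the kind of model described above (the authors'
    estimation is Bayesian, not maximum likelihood)."""
    gaps = np.asarray(gaps, dtype=float)
    m = gaps.mean()
    w, lam1, lam2 = 0.5, 2.0 / m, 0.5 / m   # spread the initial rates
    for _ in range(iters):
        p1 = w * lam1 * np.exp(-lam1 * gaps)
        p2 = (1 - w) * lam2 * np.exp(-lam2 * gaps)
        r = p1 / (p1 + p2)                  # responsibility of component 1
        w = r.mean()
        lam1 = r.sum() / (r * gaps).sum()
        lam2 = (1 - r).sum() / ((1 - r) * gaps).sum()
    return w, lam1, lam2

# synthetic gaps: short within-burst gaps mixed with long between-burst gaps
rng = np.random.default_rng(0)
gaps = np.concatenate([rng.exponential(0.2, 500), rng.exponential(5.0, 500)])
w, lam1, lam2 = em_two_exponentials(gaps)
```

On this synthetic data the fitted rates should recover roughly 5 (bursty component) and 0.2 (background component).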
development results, adding morphology to the basic model is generally useful. The alignment results are mixed: on the one hand, choosing the best possible language to align with yields improvements, which can be improved further by adding morphological features, resulting in the best scores of all models for most languages. On the other hand, without knowing which language to choose, alignment features do not help on average. We note, however, that three out of the seven languages have English as their best-aligned pair (perhaps due to its better overall scores), which suggests that in the absence of other knowledge, aligning with English may be a good choice.
is a sequence of un-modeled predictors and constants (e.g., sizes, hyperparameters). A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods (Metropolis et al., 1953), in particular an adaptive form of Hamiltonian Monte Carlo sampling (Duane et al., 1987). Stan can be called from R using the rstan package, and through Python using the pystan package. All interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, parameter transforms, and specialized plotting. Stan programs consist of variable type declarations and statements. Variable types include constrained and unconstrained integer, scalar, vector, and matrix types. Variables are declared in blocks corresponding to their use: data, transformed data, parameters, transformed parameters, or generated quantities (Stan Development Team, 2017).
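A minimal Stan program illustrating this block structure might look as follows (an illustrative toy model, not taken from the cited text):

```stan
// Illustrative Stan program showing the block structure described above.
data {
  int<lower=0> N;          // number of observations (constant)
  vector[N] y;             // observed data
}
parameters {
  real mu;                 // location, unconstrained
  real<lower=0> sigma;     // scale, constrained positive
}
model {
  mu ~ normal(0, 10);      // prior
  sigma ~ cauchy(0, 5);    // prior
  y ~ normal(mu, sigma);   // likelihood; increments the log density
}
```

From R this would be compiled and sampled with rstan's `stan()` function, passing the model code and a named data list.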
The inferences about the number of modes are shown in figure 8. The degree of posterior uncertainty for most of the data sets (with the exception of sodium lithium) is substantial and is obscured in the posterior predictive distributions. In all cases the results are shown for the Be(1, 1) prior; as with a, the results are unchanged under the second prior for the galaxy and enzyme data. The galaxy data supports a range of values between 3 and 9, with the values 5 and 6 receiving almost equal posterior support. The acidity data shows strongest support for 2 modes, with some uncertainty about an extra 1 or 2 modes. The enzyme data also shows a large amount of posterior uncertainty about the number of modes: it shows most support for 3 modes, with good support for up to 7 modes. These results are rather surprising given the shape of the posterior predictive distribution. It seems reasonable to conjecture that the form of the model may lead to these results. The data can be roughly divided into two groups, and the skewness of the second group can only be captured by a number of normal distributions, which may lead to rather unrealistic estimates of the number of modes. The sodium lithium data set results are shown for the Be(1, 1) prior. The posterior distribution strongly supports a single mode, with a posterior probability of about 0.8.
The derivation of the loss distribution from insurance data is a very interesting research topic but at the same time not an easy task. Seeking an analytic form for the loss distribution may be misleading, although this approach is frequently adopted in the actuarial literature. Moreover, it is well recognized that the loss distribution is strongly skewed with heavy tails and presents small, medium and large size claims, which can hardly be fitted by a single analytic and parametric distribution. Here we propose a finite mixture of Skew Normal distributions that provides a better characterization of insurance data. We adopt a Bayesian approach to estimate the model, providing the likelihood and the priors for all the unknown parameters, and we implement an adaptive Markov chain Monte Carlo algorithm to approximate the posterior distribution. We apply our approach to the well-known Danish fire loss data, and relevant risk measures, such as Value-at-Risk and Expected Shortfall, are evaluated as well.
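Once such a mixture is estimated, VaR and Expected Shortfall can be approximated by simulation; the sketch below uses illustrative (not fitted) skew-normal parameters rather than the Danish fire loss estimates:

```python
import numpy as np
from scipy.stats import skewnorm

def var_es(losses, level=0.99):
    """Empirical Value-at-Risk and Expected Shortfall at a given level."""
    q = np.quantile(losses, level)
    return q, losses[losses >= q].mean()

# Simulate from a two-component skew-normal mixture; parameters are
# illustrative placeholders, not the fitted Danish-fire values.
rng = np.random.default_rng(1)
n = 100_000
comp = rng.random(n) < 0.9   # 90% "attritional" claims, 10% large claims
small = skewnorm.rvs(a=4, loc=1.0, scale=1.0, size=n, random_state=rng)
large = skewnorm.rvs(a=8, loc=5.0, scale=4.0, size=n, random_state=rng)
losses = np.where(comp, small, large)
var99, es99 = var_es(losses, 0.99)
```

By construction Expected Shortfall (the mean loss beyond the quantile) is never below the Value-at-Risk at the same level.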
In spite of the success of these models, none of the previous models has formally discussed the issue of multiple relation semantics: a relation may have multiple meanings revealed by the entity pairs associated with the corresponding triples. As can be seen from Fig. 1, visualization results on embedding vectors obtained from TransE (Bordes et al., 2013) show that there are different clusters for a specific relation, and different clusters indicate different latent semantics. For example, the relation HasPart has at least two latent semantics: composition-related as in (Table, HasPart, Leg) and location-related as in (Atlantics, HasPart, NewYorkBay). As one more example, in Freebase, (Jon Snow, birth place, Winterfell) and (George R. R. Martin, birth place, U.S.) are mapped to the schemas /fictional universe/fictional character/place of birth and /people/person/place of birth respectively, indicating that birth place has different meanings. This phenomenon is quite common in knowledge bases, for two reasons: artificial simplification and the nature of knowledge. On the one hand, knowledge base curators cannot introduce too many similar relations, so abstracting multiple similar relations into one specific relation is a common trick. On the other hand, both language and knowledge representations often involve ambiguous information. The ambiguity of knowledge means a semantic mixture. For example, when we mention "Expert", we may refer to a scientist, businessman or writer, so the concept "Expert" may be ambiguous in a specific situation, or generally a semantic mixture of these cases.
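One simple way to surface such latent semantics is to cluster the head-to-tail offset vectors t − h of a relation's triples; distinct clusters then suggest distinct senses. Below is a self-contained sketch with a plain k-means on synthetic offsets (not the visualization pipeline behind Fig. 1):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means, used here to cluster head-to-tail offset vectors
    t - h for one relation; distinct clusters suggest distinct senses."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # keep a center in place if its cluster is momentarily empty
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# toy offsets: two well-separated "senses" of one relation in 5-d
rng = np.random.default_rng(0)
offsets = np.vstack([rng.normal(0.0, 0.1, (50, 5)),
                     rng.normal(3.0, 0.1, (50, 5))])
labels, centers = kmeans(offsets, 2)
```

On this toy data the two synthetic senses end up in two different clusters.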
Bayesian Learning of Gaussian Mixture Densities for Hidden Markov Models
Bayesian model averaging (BMA) is a popular and powerful statistical method for taking account of uncertainty about model form or assumptions. Usually the long-run (frequentist) properties of the resulting estimator are hard to derive. This paper proposes a mixture of priors and sampling distributions as the basis of a Bayes estimator. The frequentist properties of the new Bayes estimator are automatically derived from Bayesian decision theory. It is shown that if all competing models have the same parametric form, the new Bayes estimator reduces to the BMA estimator. The method is applied to the daily Euro to US Dollar exchange rate.
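For contrast with the proposed estimator, the standard BMA point estimate can be sketched as a posterior-probability-weighted average, with weights approximated from BIC values (a standard approximation under equal prior model probabilities, not the paper's mixture-of-priors construction; the numbers are illustrative):

```python
import numpy as np

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values,
    assuming equal prior model probabilities."""
    b = np.asarray(bics, dtype=float)
    w = np.exp(-0.5 * (b - b.min()))   # subtract min for stability
    return w / w.sum()

def bma_estimate(estimates, bics):
    """BMA point estimate: posterior-probability-weighted average."""
    return float(np.dot(bma_weights(bics), estimates))

w = bma_weights([100.0, 104.0])
est = bma_estimate([1.2, 1.5], [100.0, 104.0])
```

The model with the smaller BIC dominates the average, and the combined estimate lies between the individual model estimates.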
segments (e.g. BACs). Spatial correlation was induced through the weights of the mixture via Markov random fields. In our approach, instead of considering three states, we allow for an unknown number of mixture components and achieve inference using a reversible jump Markov chain Monte Carlo method. Following earlier work, we use Markov random fields to account for correlated neighboring SNPs. In contrast to models that incorporate HMMs to infer integer copy numbers, our modeling approach uses information (neighboring SNPs) on both sides of a SNP. In addition, we account for cell contamination by shrinking the theoretical copy number log-ratios towards zero. The implementation only requires ordered (normalized) log-ratios and, therefore, may be applied to data from any platform suitable for copy number estimation. In Section 2 we present the model and method of inference. Section 3 reports on a simulation study and an application to real data. The real data study includes cases where cytogenetics has shown large regions of gain or loss, and we also show novel smaller regions detected by our algorithm. The new aberrations are validated by CGH and/or PCR. A discussion is given in Section 4, and an Appendix provides details on the MCMC algorithm.
Given this variety of possible patterns, a nonparametric approach seems appropriate both to smooth the random noise affecting the curves and to account for different patterns. Skewness and multimodality can be modelled via mixture models. It is known, for example, that a mixture of Gaussian kernels can consistently estimate the shape of almost any continuous distribution. As discussed by many authors (Chandola et al., 1999; Ortega Osona and Kohler, 2000; Peristera and Kostaki, 2007; Schmertmann, 2003), mixture models are clearly appropriate when two or more populations with different age-specific fertility rates are present. However, the problem of choosing the number of mixture components remains elusive in most applications.
However, recent years have witnessed some downturn of mixture models in Bayesian work, largely due to the controversy over identification issues. Obviously, finite Gaussian mixtures suffer from invariance under relabeling, i.e. permuting the parameter vector across regimes will not change the likelihood function. In that case, interpretation of the posterior is difficult and the Gibbs sampler exhibits unusual properties. Celeux et al. (2000) argue that virtually all MCMC samplers fail to converge in this setting. Jasra et al. (2005) pessimistically conclude that the Gibbs sampler is not always appropriate for mixture models.
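The relabeling invariance is easy to verify numerically: permuting the component labels leaves the mixture likelihood unchanged, as in this small check.

```python
import numpy as np

def mixture_loglik(x, w, mu, sd):
    """Log-likelihood of a finite Gaussian mixture with weights w,
    means mu and standard deviations sd."""
    w, mu, sd = (np.asarray(v, dtype=float) for v in (w, mu, sd))
    dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) \
             / (sd * np.sqrt(2 * np.pi))
    return float(np.log(dens.sum(axis=1)).sum())

x = np.random.default_rng(0).normal(size=100)
a = mixture_loglik(x, [0.3, 0.7], [0.0, 2.0], [1.0, 0.5])
b = mixture_loglik(x, [0.7, 0.3], [2.0, 0.0], [0.5, 1.0])  # labels swapped
```

The two log-likelihoods coincide, which is exactly why the posterior has symmetric modes and naive Gibbs output is hard to interpret.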
Several open issues remain for future research. First, it would be interesting to combine the current approach with recent methods based on local Rademacher complexities (e.g., Bartlett et al., 2002a), which are sometimes able to attain faster convergence rates. Second, a particularly interesting question relates to using the data itself to learn an appropriate constraint function, or perhaps several constraint functions. Finally, it is clearly important to conduct careful numerical studies of the bounds. Related work by Seeger (2002) demonstrated the tightness of similar bounds in the context of Gaussian processes, and their relevance to real-world problems. Preliminary studies indicate similar behavior for our bounds, but a systematic numerical investigation still needs to be done. In this paper we have been concerned solely with mixture-based Bayesian solutions. As pointed out in Section 1, general optimal Bayesian solutions are not always of a mixture form. In this context, it would be particularly interesting to establish finite sample bounds for optimal Bayesian procedures, which, under appropriate conditions, would provide tight upper bounds on the performance of any learning algorithm, and not only those based on selecting hypotheses from some class of hypotheses. Given the connections established in this work between the frequentist and Bayesian approaches, we would like to conclude with the following quote from Lehmann and Casella (1998).
In experiments with life testing, it has been found that the lifetime may often be reasonably described by an exponential distribution. The exponential distribution has been used as a model, at least as a first approximation, in areas ranging from studies on the lifetimes of manufactured items and the time interval between failures of software systems to research involving survival or remission times in chronic diseases. See Balakrishnan and Basu (1995) and Meeker and Escobar (1998) for more applications of the exponential distribution. However, in some applications we may need to extend the exponential distributional assumption to a more general class of distributions that meets the specific needs of the possibly complicated structure of the data. For instance, consider two populations with exponential distributions that have been mixed in unknown proportions. In this article we develop a model which is a mixture of only two hazards, where the mixing proportion varies over time. In our subsequent development we restrict our models to constant hazards only. Similar methods can be extended to mixtures with more than two hazards and possibly to other lifetime distributions such as the Weibull, Gamma etc. However, when considering other probability distributions, one must be careful about the identifiability of the parameters when allowing for a time-varying mixing function. In Section 2 we find that even in the case of mixing two exponential hazards with a time-dependent mixing function,
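Under a time-varying mixing of two constant hazards, h(t) = p(t)*lam1 + (1 - p(t))*lam2, the survival function follows from the cumulative hazard, S(t) = exp(-H(t)). A numerical sketch, using an illustrative logistic mixing function rather than the paper's specification:

```python
import numpy as np

def survival(t_grid, lam1, lam2, mix):
    """Survival function for a hazard that mixes two constant hazards
    with a time-varying proportion: h(t) = mix(t)*lam1 + (1-mix(t))*lam2.
    S(t) = exp(-H(t)), with H integrated on the grid (trapezoid rule)."""
    p = mix(t_grid)
    h = p * lam1 + (1 - p) * lam2
    H = np.concatenate([[0.0],
                        np.cumsum((h[1:] + h[:-1]) / 2 * np.diff(t_grid))])
    return np.exp(-H)

t = np.linspace(0.0, 10.0, 1001)
# illustrative logistic mixing, drifting from component 1 to component 2
S = survival(t, lam1=0.2, lam2=1.0,
             mix=lambda u: 1.0 / (1.0 + np.exp(u - 5.0)))
```

Since the mixed hazard always lies between lam1 and lam2, the resulting survival curve is bracketed by the two pure exponential survival curves.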
Monte Carlo algorithms were advanced by the data augmentation scheme of Tanner and Wong (1987), and developed further by Gelfand and Smith (1990), who popularized the Gibbs sampler, and by Rubin (1987), who introduced the sampling importance resampling (SIR) algorithm. Applications of the Gibbs sampler to important statistical problems involving mixtures were discussed by many researchers, including Gelfand et al. (1990), Gelfand and Smith (1991), Carlin and Polson (1991), Carlin et al. (1992), and Gelfand et al. (1992). The Metropolis-Hastings algorithm was developed by Metropolis et al. (1953) and subsequently generalized by Hastings (1970). A broad theoretical description was given by Tierney (1994), and Chib and Greenberg (1995) provide an outstanding discussion.
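The Metropolis et al. (1953) scheme with a symmetric random-walk proposal (the special case in which Hastings' correction term cancels) can be sketched in a few lines; the standard-normal target here is purely illustrative:

```python
import numpy as np

def metropolis(logpdf, x0, n, scale=1.0, seed=0):
    """Random-walk Metropolis sampler: symmetric Gaussian proposal, so
    the acceptance ratio needs only the target density."""
    rng = np.random.default_rng(seed)
    x, lp = x0, logpdf(x0)
    out = np.empty(n)
    for i in range(n):
        prop = x + rng.normal(0.0, scale)
        lp_prop = logpdf(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept
            x, lp = prop, lp_prop
        out[i] = x                                 # else keep current state
    return out

# sample a standard normal (log density up to a constant), drop burn-in
samples = metropolis(lambda z: -0.5 * z * z, 0.0, 20000)[5000:]
```

After burn-in the chain's sample mean and standard deviation approximate the target's 0 and 1.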
The purpose of model updating is to adjust the computer model such that its outputs agree with experimental data, typically obtained from a modal test. This is an essential step of model validation and verification. Today, different stochastic model updating procedures have been developed and applied in different fields (see e.g. [1–4]). The two most promising approaches to the stochastic model updating problem are the sensitivity (covariance updating) method and Bayesian updating [6,7]. Recently, these two techniques have been combined in order to overcome the limitations of the individual procedures. More specifically, the results of the sensitivity approach are used to define prior distributions for the Bayesian model updating. By doing so, the computational effort associated with the Bayesian model updating procedure is largely reduced, and the final distributions of the model parameters can evolve and relax the initial assumption of Gaussian parameter variability used in the sensitivity approach.
In this case, standard split-merge methods such as SAMS are less helpful, since only merging can be performed once the maximum number of clusters has been allocated. The PGSM sampler does not have this restriction and naturally allows simultaneous splitting and merging while preserving the total number of clusters. Furthermore, the PGSM sampler can use more than two anchors, potentially allowing for large changes in configuration without altering the number of clusters. We compared the PGSM with two (|s| = 2) and three (|s| = 3) anchors to the Gibbs sampler. The PGSM method outperformed the Gibbs sampler, though increasing the number of anchors did not improve performance (Figure 5a). We plot the predictive densities (Figure 5b, d, f) and cluster allocations (Figure 5c, e, g) after running each sampler for 1000 seconds. At this point the PGSM sampler used a single cluster to model the points in the middle, while the Gibbs sampler used two clusters to model the central cluster.
The definition of the model space is intricately linked with the model uncertainty that is being addressed. For example, if the researcher is unsure about the functional form of the models and about covariate inclusion, both aspects should be considered in building M. Clearly, models that are not entertained in M will not contribute to the model-averaged inference, and the researcher will thus be blind to any insights provided by these models. Common sense should be used in choosing the model space: if one wants to shed light on the competing claims of various papers that use different functional forms and/or different covariates, it would make sense to construct a model space that combines all functional forms considered (and perhaps more variations if they are reasonable) with a wide set of possibly relevant and available covariates. The fact that such spaces can be quite large should not be an impediment. In practice, not all relevant model spaces used in model averaging analyses are large. For example, to investigate the effect of capital punishment on the murder rate (see the discussion earlier in this section), Durlauf et al. (2012) build a bespoke model space by considering the following four model features: the probability model (linear or logistic regression), the specification of the covariates (relating to the probabilities of sentencing and execution), the presence of state-level heterogeneity, and the treatment of zero observations for the murder rate. In all, the model space they specify contains only 20 models, yet leads to a large range of deterrence effects. Another example of BMA with a small model space is the analysis of impulse response functions in Koop et al. (1997), who use two different popular types of univariate time series models with varying lag lengths, leading to averaging over only 32 models (see Subsection 5.2). Here the model space only reflects specification uncertainty.
An example of theory uncertainty leading to a model space with a limited number of models can be found in Liu (2015), who compares WALS and various FMA methods on cross-country growth regressions. Following Magnus et al. (2010), Liu (2015) always includes a number of core regressors and allows for a relatively small number of auxiliary regressors. Models differ in the inclusion of the auxiliary regressors, leading to model spaces with sizes of 16 and 256.
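Such a model space is easy to enumerate: with a fixed set of core regressors and k auxiliary regressors there are 2^k models, so k = 4 and k = 8 give the sizes 16 and 256 quoted above. A sketch with hypothetical variable names:

```python
from itertools import combinations

def model_space(core, auxiliary):
    """All models that always include the core regressors plus any
    subset of the auxiliary ones: 2^k models for k auxiliary regressors.
    Variable names here are hypothetical, mirroring the Magnus et al.
    (2010) core/auxiliary setup."""
    models = []
    for r in range(len(auxiliary) + 1):
        for subset in combinations(auxiliary, r):
            models.append(tuple(core) + subset)
    return models

space = model_space(["const", "investment"],
                    ["school", "life_exp", "pop_growth", "openness"])
```

With four auxiliary regressors this enumerates exactly 16 candidate models, the smaller of the two spaces mentioned in the text.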
The structure of this article is as follows. The Frechet mixture model along with its likelihood function is formulated in section 2. The expressions for the posterior distributions using the non-informative and informative priors are derived in section 3. In section 4, the Bayes estimators and posterior risks using the uniform, the Jeffreys', the exponential and the inverse Levy priors under the squared error loss function (SELF), the precautionary loss function (PLF) and the DeGroot loss function (DLF) are presented. The elicitation of hyperparameters is given in section 5. In section 6, the limiting expressions of the Bayes estimators and their posterior risks are derived. The simulation study and the real data applications are presented in sections 7 and 8, respectively. The article concludes with a brief discussion in section 9.
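For reference, the Bayes estimators under the three loss functions just listed take standard closed forms in terms of posterior moments (the usual textbook results, not the paper's derivations):

```latex
% Squared error loss (SELF): L(\theta,\hat\theta) = (\theta-\hat\theta)^2
\hat\theta_{\mathrm{SELF}} = E(\theta \mid \mathbf{x})
% Precautionary loss (PLF): L(\theta,\hat\theta) = (\theta-\hat\theta)^2/\hat\theta
\hat\theta_{\mathrm{PLF}} = \sqrt{E(\theta^2 \mid \mathbf{x})}
% DeGroot loss (DLF): L(\theta,\hat\theta) = (\theta-\hat\theta)^2/\hat\theta^2
\hat\theta_{\mathrm{DLF}} = \frac{E(\theta^2 \mid \mathbf{x})}{E(\theta \mid \mathbf{x})}
```

Each form follows by minimizing the posterior expected loss in the estimator; for SELF this gives the posterior mean, and the PLF and DLF forms follow from setting the derivative of the posterior risk to zero.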