Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible at large. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data.

We apply the proposed method to the data used in Longnecker et al. (2001) and Dunson and Park (2008). DDT has been widely used and shown to be effective against malaria-transmitting mosquitoes, but several health-threatening effects of DDT have also been reported. Longnecker et al. (2001) used data from the US Collaborative Perinatal Project to investigate the association between DDT and preterm birth, defined as delivery before 37 weeks of completed gestation. By fitting a logistic regression model with categorized DDE levels, the authors showed that, adjusted for other covariates, increasing concentrations of maternal serum DDE, a persistent metabolite of DDT, led to a higher rate of preterm birth. Dunson and Park (2008) applied a kernel stick-breaking process mixture of linear regression models to the same data with a focus on the predictive density of gestational age at delivery (GAD), concluding that there is strong evidence of a steadily increasing left tail with DDE dose. For more information on the study design and data structure, refer to Longnecker et al. (2001).


Despite providing insight into the assumptions behind models of categorization, existing rational analyses of category learning leave a number of questions open. One particularly important question is whether rational learners should use an exemplar or prototype representation. The greater flexibility of nonparametric density estimation has motivated the claim that exemplar models are to be preferred as rational models of category learning (Nosofsky, 1998). However, nonparametric and parametric methods have different advantages and disadvantages: the greater flexibility of nonparametric methods comes at the cost of requiring more data to estimate a distribution. The choice of representation scheme should ultimately be determined by the stimuli presented to the learner, and existing rational analyses do not indicate how this decision should be made (although see Briscoe & Feldman, 2006). This question is complicated by the fact that prototype and exemplar models are not the only options. A number of models have recently explored possibilities between these extremes, representing categories using clusters of several exemplars (Anderson, 1990; Kruschke, 1990; Love, Medin, & Gureckis, 2004; Rosseel, 2002; Vanpaemel, Storms, & Ons, 2005). The range of representations possible in these models emphasizes the significance of being able to identify an appropriate category representation from the stimuli themselves: with many


Prediction of future observations is an important practical issue for statisticians. When the data can be viewed as exchangeable, de Finetti's theorem concludes that, conditionally, the data can be modeled as independent and identically distributed (i.i.d.). The predictive distribution of the future observations given the present data is then given by the posterior expectation of the underlying density function given the observations. The Dirichlet process mixture of normal densities has been successfully used as a prior in the Bayesian density estimation problem. However, when the data arise over time, exchangeability, and therefore the conditional i.i.d. structure in the data, is questionable. A conditional Markov model may be thought of as more general, yet having sufficiently rich structure, suitable for handling such data. The predictive density of the future observation is then given by the posterior expectation of the transition density given the observations. We propose a Dirichlet process mixture prior for the problem of Bayesian estimation of the transition density. An appropriate Markov chain Monte Carlo (MCMC) algorithm for the computation of the posterior expectation will be discussed. Because of an inherent non-conjugacy in the model, the usual Gibbs sampling procedure used for the density estimation problem is hard to implement. We propose using the recently proposed "no-gaps algorithm" to overcome the difficulty. When the Markov model holds, we show the consistency of the Bayes procedures in appropriate topologies by constructing appropriate uniformly exponentially consistent tests and extending the idea of Schwartz (1965) to Markov processes. Numerical examples show excellent agreement between the asymptotic theory and the finite sample behavior of the posterior distribution.
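In practice, the posterior expectation defining this predictive density is approximated by averaging the transition density over MCMC draws of the parameters. A minimal sketch, assuming an illustrative Gaussian AR(1)-style transition kernel and hypothetical posterior draws (neither comes from the paper itself):

```python
import math

def normal_pdf(x, mean, sd):
    """Gaussian density, used as an illustrative transition kernel."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def predictive_density(x_next, x_current, mcmc_draws):
    """Posterior predictive density of the next observation: the posterior
    expectation of the transition density, approximated by a Monte Carlo
    average over MCMC draws of the parameters (rho, sd)."""
    total = sum(normal_pdf(x_next, rho * x_current, sd) for rho, sd in mcmc_draws)
    return total / len(mcmc_draws)

# Hypothetical posterior draws of the transition parameters.
draws = [(0.5, 1.0), (0.6, 0.9), (0.4, 1.1)]
p = predictive_density(0.0, 1.0, draws)
```

With a single parameter draw the average reduces to the transition density itself, which makes the construction easy to check.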


Thus, rather than expressing prior knowledge through the prior parameters, all of them are random. Of the density estimators presented above, those that work with fully Bayesian techniques require a substantial amount of knowledge either about the underlying density or about how a fair number of parameters affects the results. For the Gaussian process in the work of Leonard (1973), it is necessary to specify a mean function, an underlying distribution, or individual values for the mean function. This was shown in some cases to heavily influence the results and therefore should be chosen carefully. When working with the Dirichlet process mixture of normals, there is a range of parameters to specify, starting with at least four. The path to inference is also not clear for some of the results from these density estimators. Although both of these estimators have the advantage of being smooth, and the DPMN is known to be flexible, the method in Section 1.2 presents a Bayesian density estimate that is simpler to provide prior information for, while at the same time laying a clear path to inference.


In this thesis, we use mixtures of Dirichlet processes (MDP) and mixtures of Polya trees (MPT) priors to perform Bayesian density estimation based on simulated data of different sizes. The data are simulated from a mixture of normal distributions. Moreover, to compare the performance of Bayesian and frequentist methods, we also use the Gaussian kernel method.
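For reference, the Gaussian kernel method used as the frequentist baseline can be sketched in a few lines; the bandwidth h and the toy data below are assumptions for illustration (in practice h would be chosen by a rule such as Silverman's):

```python
import math

def gaussian_kde(x, data, h):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return total / (len(data) * h * math.sqrt(2 * math.pi))

# Toy sample standing in for data simulated from a two-component normal mixture.
data = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]
density_at_zero = gaussian_kde(0.0, data, h=0.5)
```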


Abstract: This paper considers estimating the ratio of two distributions with different parameters and common support. We consider a Bayesian approach based on the Log-Huber loss function, which is resistant to outliers and useful for finding robust M-estimators. We propose two different types of Bayesian density ratio estimators and compare their performance, in terms of the Bayes risk, with each other as well as with the usual plug-in density ratio estimators. Some applications such as classification and divergence function estimation are addressed.
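As a point of comparison, a plug-in density ratio estimator of the kind mentioned can be sketched as the ratio of two separately fitted densities; the normal models and MLE plug-ins below are illustrative assumptions, not the paper's estimators:

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mle_normal(data):
    """Maximum-likelihood plug-in estimates (mean, sd) under a normal model."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(var)

def plugin_density_ratio(x, sample_f, sample_g):
    """Plug-in estimate of f(x)/g(x): fit each sample, then take the ratio."""
    mu_f, sd_f = mle_normal(sample_f)
    mu_g, sd_g = mle_normal(sample_g)
    return normal_pdf(x, mu_f, sd_f) / normal_pdf(x, mu_g, sd_g)

sample_f = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]   # roughly centered at 0
sample_g = [1.1, 0.8, 1.3, 1.0, 0.9, 1.2]     # roughly centered at 1
ratio = plugin_density_ratio(0.0, sample_f, sample_g)
```

Because the plug-in simply divides two estimated densities, it can be unstable where the denominator is small, which is part of the motivation for the robust Bayesian alternatives discussed in the paper.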


In this paper we present a new methodology that offers a state space representation in a situation where data are collected at only one time point, and the unknown state space parameter in this treatment is replaced by the discretised version of the multivariate probability density function (pdf) of the state space variable. The focus is on learning the static unknown model parameter vector rather than on predicting the state space parameter at a time point different from when the observations are made. In fact, the sought model parameter vector is treated as embedded within the definition of the pdf of the state space variable. In particular, the method that we present here pertains to a partially observed state space, i.e. the observations comprise measurements on only some—but not all—of the components of the state space vector. Thus, in this paradigm, the probability of the observations conditional on the state space parameters reduces to the probability that the observed state space data have been sampled from the pdf of the full state space variable vector, marginalised over the unobserved components. Here this pdf includes the sought static model parameter vector in its definition. In addition to addressing missing data, the presented methodology is developed to acknowledge measurement errors that may be non-Gaussian.


Adopting a Gamma prior distribution with parameters a and b for λ, Karunamuni and Quinn (1995) show that the posterior distribution of λ is also a Gamma distribution, with parameters a + n/2 and (b⁻¹ + T/2)⁻¹. Although classical Monte Carlo simulations could be used to simulate observations from the posterior distribution of λ, we use WinBUGS to draw random samples using MCMC techniques. This is motivated by generalizations to other probability density functions for the detection distances, as well as spatial modeling, for which explicit posterior distributions are difficult to obtain. We use the so-called "zeros trick" to implement the half-normal likelihood because it is not included in the list of standard WinBUGS sampling distributions. This method consists of considering an observed data set made of 0's distributed as a Poisson distribution with parameter φ, so that the associated likelihood is exp(−φ). Now, if we set phi[i] to −log(L(i)), where the likelihood term L(i) is the contribution of observed perpendicular distance y[i], then the likelihood contribution is exactly L(i). See the WinBUGS manual for further details. The WinBUGS code is as follows:
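The WinBUGS listing itself is not reproduced here, but the mechanics of the zeros trick can be verified directly: a Poisson observation fixed at zero with mean phi = −log L contributes exp(−phi) = L to the likelihood. A small sketch, assuming a half-normal form for the distance likelihood (in WinBUGS one would also add a large constant to phi to keep it positive):

```python
import math

def half_normal_contribution(y, lam):
    """Assumed half-normal likelihood contribution for a distance y."""
    return math.sqrt(2.0 * lam / math.pi) * math.exp(-lam * y * y / 2.0)

def poisson_pmf_at_zero(phi):
    """Poisson pmf evaluated at 0, which is exp(-phi)."""
    return math.exp(-phi)

L_i = half_normal_contribution(1.2, 0.8)
phi_i = -math.log(L_i)              # the zeros-trick transformation
recovered = poisson_pmf_at_zero(phi_i)
# recovered equals L_i up to floating-point error, so a zero "observed"
# under Poisson(phi_i) injects exactly the desired likelihood term.
```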


term implies that the shape of the forecasted density is ignored for observations outside the region of interest. To evaluate the CSL of competing models, a procedure similar to the evaluation procedure based on KLIC scores is employed. Note that a proper threshold value r needs to be specified, determining which observations pertain to the left tail. For a fair evaluation, the number of pdf-terms and CDF-terms included in the CSL should be the same across all competing models, so a model-independent threshold is required. Though the threshold is allowed to be time-varying, it is fixed at -2.5% in this study, which is the 7.7% to 8.8% quantile of the unconditional return distribution during the out-of-sample period for the S&P 500 and Nikkei 225.
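A sketch of the CSL computation under these conventions, using a standard-normal forecast density purely as a stand-in for a competing model's predictive density:

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def censored_likelihood_score(returns, r=-0.025):
    """Censored likelihood (CSL) score with a fixed left-tail threshold r:
    observations at or below r contribute the full log predictive density
    (pdf-term); the rest contribute only log P(Y > r) (CDF-term)."""
    score = 0.0
    for y in returns:
        if y <= r:
            score += math.log(norm_pdf(y))        # pdf-term (in the tail)
        else:
            score += math.log(1.0 - norm_cdf(r))  # CDF-term (outside the tail)
    return score

returns = [-0.04, 0.01, -0.002, -0.03, 0.015]      # toy daily returns
csl = censored_likelihood_score(returns)
```

Because r is fixed rather than model-dependent, every competing model is scored on the same partition of observations into pdf-terms and CDF-terms.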

A goal of this paper is to review, from both Bayesian and frequentist (classical) perspectives, several nonparametric techniques that have been employed in the economics literature, to illustrate how these methods are applied, and to describe the value of their use. In the first part of our review we focus on density estimation. When discussing the issue of density estimation we begin by reviewing frequentist approaches to the problem, as commonly seen in economics, then illustrate those methods in an example. Once this has been completed, we repeat that same process - first reviewing methods and then focusing on their application - although this time we do so from a Bayesian perspective. We follow the same general pattern as we cover nonparametric estimation of regression functions. For both density and regression estimation, we pay particular attention to what are perceived as key implementation issues: the selection of the smoothing parameters and kernel functions in the frequentist case, and the treatment of smoothing parameters and the number of mixture components in the Bayesian paradigm.


come only by the raw numbers of unbound sequences considered, along with the bound sequences, but the weights in themselves do not emerge as an important factor because of the way the scores are built. The construction of a prior for nucleosomal positional estimates, though a useful technique, can suffer from one major drawback: predictions used in motif search would propagate the statistical uncertainties associated with the prediction strategies directly into the motif algorithms. This often yields results with low predictive power. The ideal would be to work with raw nucleosomal intensity data and incorporate its likelihood into the broad Bayesian framework of a motif model. Our approach in this paper is to use nucleosomal information in a way such that these prediction biases do not arise. We have also addressed the goal of statistically quantifying the biological connection between motifs and signal strength. In order to achieve these aims, we have formulated a unique joint model that predicts nucleosome positions and motifs simultaneously, based on gene expression and sequence data.


Recently, the focus in finance has shifted more towards continuous-time models, and continuous-time versions of stochastic volatility models have been proposed. In particular, Barndorff-Nielsen and Shephard (2001) introduce a class of models where the volatility behaves according to an Ornstein-Uhlenbeck process, driven by a positive Lévy process without Gaussian component (a pure jump process). These models introduce discontinuities (jumps) into the volatility process. The latter paper also considers superpositions of such processes. Bayesian inference in such models through MCMC methods is complicated by the fact that the model parameters and the latent volatility process are often highly correlated in the posterior, leading to the problem of overconditioning. Griffin and Steel (2006b) propose MCMC methods based on a series representation of Lévy processes, and avoid overconditioning by dependent thinning methods. In addition, they extend the model by including a jump component in the returns, leverage effects and separate risk pricing for the various volatility components in the superposition. An application to stock price data shows substantial empirical support for a superposition of processes with different risk premiums and a leverage effect. A different approach to inference in such models is proposed in Roberts et al. (2004), who suggest a reparameterisation to reduce the correlation between the data and the process. The reparameterised process is then proposed only in accordance with the parameters.


In this work we propose an approach, summarized in Figure 1, for inverting geodetic data — in particular those derived from InSAR measurements — using a Bayesian probabilistic inversion algorithm capable of including multiple independent data sets (e.g., González et al., 2015; Hooper et al., 2013; Sigmundsson et al., 2014). To efficiently sample the posterior PDFs, we implement a Markov chain Monte Carlo (MCMC) method, incorporating the Metropolis-Hastings algorithm (e.g., Hastings, 1970; Mosegaard & Tarantola, 1995), with automatic step size selection. We then review and discuss existing methodologies to characterize errors in InSAR data and to subsample large data sets, both of which are necessary steps to be performed prior to an inversion. The proposed method is applied to the inversion of synthetic InSAR and GNSS data to demonstrate the ability of the algorithm to retrieve known source parameters. Finally, as a test case, we invert InSAR data spanning the 2015 Mw 6.4 Pishan (China) earthquake and determine the fault model parameters.
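The sampling scheme described above can be sketched as a random-walk Metropolis-Hastings loop with a simple automatic step-size rule; the one-dimensional Gaussian target below is a stand-in for the actual posterior over source parameters, and the adaptation rule is one common choice, not necessarily the paper's:

```python
import math
import random

def metropolis_hastings(log_post, x0, n_steps, target_accept=0.3, seed=1):
    """Random-walk Metropolis-Hastings with automatic step-size selection:
    the proposal scale is nudged toward a target acceptance rate, with
    adjustments that shrink as 1/sqrt(i)."""
    rng = random.Random(seed)
    x, step, accepted = x0, 1.0, 0
    samples = []
    for i in range(1, n_steps + 1):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < log_post(proposal) - log_post(x):
            x, accepted = proposal, accepted + 1
        samples.append(x)
        rate = accepted / i
        step *= math.exp((rate - target_accept) / math.sqrt(i))
    return samples

# Stand-in posterior: a standard normal log-density.
draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=5.0, n_steps=5000)
```

Shrinking the adaptation over time keeps the step-size tuning from interfering with the chain's long-run behavior.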


The interest of this paper is to obtain a closed form for the Bayesian predictive density of the kth ordered future observation from the proposed two-component general class of distributions under random censoring. Random censoring is one in which each individual is assumed to have a lifetime T and a censoring time C, with T and C independent continuous random variables with reliability functions R(t) and G(t), respectively. All lifetimes and censoring times are assumed to be mutually independent, and it is assumed that G(t) does not depend on any of the parameters of R(t). Random censoring occurs frequently in practice, especially in clinical and medical trials.


In the following we refer to those models with highest probabilities as the preferred models. For the Bayesian approach, the preferred detection function model included the covariates year, type and state in the model for the scale parameter of the hazard-rate key function (probability = 1.00 to two decimal places, Table 3). Two other models were visited within the RJMCMC algorithm with probabilities of <0.001 that included two covariates (type and state) or one covariate only (type). The same model with all three covariates was the preferred model for the two-stage approach, having been selected by AIC in 81% of bootstrap resamples. Three other models were selected: one with covariates year and state (16%), one with type and state (2%) and one with state alone (1%). For the count model, two models dominated the RJMCMC algorithm: the model with covariates type, Julian day and state as the preferred model (0.89 probability) and the full model (year + type + Julian day + state, 0.11 probability, Table 3). For the bootstrap the latter was the preferred model, selected in 89% of resamples, while the former was the second most frequently chosen model (10%). Two other models were chosen during the bootstrap, including the covariates year, type and state (1%) and the model including covariates type and state (<1%). Hence, the largest discrepancy in model probabilities between the two analysis methods was with regard to covariate year, for which the total probability of being included in any model was 0.11 for the RJMCMC algorithm and 0.90 for the bootstrap (Table 3). However, 95% confidence intervals obtained from the bootstrap overlapped zero for both year coefficients (Table 4), indicating that this covariate might have less importance than suggested by model probabilities for the bootstrap.


Abstract: Multivariate kernel regression is an important tool for investigating the relationship between a response and a set of explanatory variables. It is generally accepted that the performance of a kernel regression estimator largely depends on the choice of bandwidth rather than the kernel function. This nonparametric technique has been employed in a number of empirical studies including the state-price density estimation pioneered by Aït-Sahalia and Lo (1998). However, the widespread usefulness of multivariate kernel regression has been limited by the difficulty in computing a data-driven bandwidth. In this paper, we present a Bayesian approach to bandwidth selection for multivariate kernel regression. A Markov chain Monte Carlo algorithm is presented to sample the bandwidth vector and other parameters in a multivariate kernel regression model. A Monte Carlo study shows that the proposed bandwidth selector is more accurate than the rule-of-thumb bandwidth selector known as the normal reference rule according to Scott (1992) and Bowman and Azzalini (1997). The proposed bandwidth selection algorithm is applied to a multivariate kernel regression model that is often used to estimate the state-price density of Arrow-Debreu securities. When applying the proposed method to the S&P 500 index options and the DAX index options, we find that for short-maturity options, the proposed Bayesian bandwidth selector produces an obviously different state-price density from the one produced by using a subjective bandwidth selector discussed in Aït-Sahalia and Lo (1998).
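One common way to set up such a sampler (a sketch, not the paper's algorithm, and shown for univariate density estimation rather than multivariate regression for brevity) is to treat the bandwidth as a parameter whose likelihood is the leave-one-out kernel likelihood of the data, and to update it by random-walk Metropolis-Hastings on the log scale; the flat prior on log h and the toy data are assumptions:

```python
import math
import random

def loo_log_likelihood(data, h):
    """Leave-one-out Gaussian kernel log-likelihood of the data given h."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        dens = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                   for j, xj in enumerate(data) if j != i)
        # Guard against underflow for very small bandwidths.
        total += math.log(max(dens, 1e-300) / ((n - 1) * h * math.sqrt(2 * math.pi)))
    return total

def sample_bandwidth(data, n_steps=500, seed=0):
    """Random-walk Metropolis-Hastings on log(h), flat prior on log h."""
    rng = random.Random(seed)
    log_h = 0.0
    current = loo_log_likelihood(data, math.exp(log_h))
    draws = []
    for _ in range(n_steps):
        prop = log_h + rng.gauss(0.0, 0.3)
        cand = loo_log_likelihood(data, math.exp(prop))
        if math.log(rng.random()) < cand - current:
            log_h, current = prop, cand
        draws.append(math.exp(log_h))
    return draws

data = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8]   # toy sample
h_draws = sample_bandwidth(data)
```

Sampling on the log scale keeps the bandwidth positive without needing boundary handling in the proposal.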


identical to regular bagging except that the weights are continuous-valued on (0, 1), instead of being restricted to the discrete set {0, 1/n, 2/n, . . . , 1}. In both cases, the weights must sum to 1. In both cases, the expected value of a particular weight is 1/n for all weights, and the expected correlation between weights is the same (Rubin, 1981). Thus Bayesian bagging will generally have the same expected point estimates as ordinary bagging. The variability of the estimate is slightly smaller under Bayesian bagging, as the variability of the weights is n/(n + 1) times that of ordinary bagging. As the sample size grows large, this factor becomes arbitrarily close to one, but we do note that it is strictly less than one, so the Bayesian approach does give a further reduction in variance compared to the standard approach. In practice, for smaller data sets, we often find a significant reduction in variance, possibly because the use of continuous-valued weights leads to fewer extreme cases than discrete-valued weights.
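The continuous weights in question are Dirichlet(1, ..., 1) draws, which can be generated by normalizing independent exponential variables (Rubin, 1981). A minimal sketch:

```python
import random

def bayesian_bootstrap_weights(n, rng):
    """One draw of Bayesian bagging weights: Dirichlet(1, ..., 1),
    generated by normalizing independent Exp(1) variables. Each weight
    lies in (0, 1), the weights sum to 1, and each has mean 1/n."""
    gaps = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(gaps)
    return [g / total for g in gaps]

rng = random.Random(42)
w = bayesian_bootstrap_weights(10, rng)
```

Each bagged fit would then reweight the n observations by one such draw, in place of the multinomial counts used by ordinary bagging.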

I use the default settings described by the authors. Finally, for the SSVS of George, Sun and Ni (2008) I set the two prior variances to 0.0001 and 4 and the prior inclusion probability to 0.5; see the Technical Appendix for more details. I also simplify estimation by plugging in the OLS estimate of the PVAR covariance matrix, which allows one to reduce uncertainty regarding covariance matrix estimates. This is a typical thing to do in Bayesian analysis of large systems, and has been used extensively since the first Bayesian VAR applications of the Minnesota prior; see Kadiyala and Karlsson (1997) for more details and references. In this Monte Carlo exercise interest lies in the large-dimensional vector of coefficients, so I use the OLS estimate of the covariance matrix in order to control for uncertainty regarding its (MCMC) sampling.


constraints into the formulation of foreground classification. In the second phase of their approach, pixel values that could be explained away by distributions of neighboring pixels are reclassified as background, allowing for greater resilience against dynamic backgrounds. In [8], the background and foreground models are first constructed separately via the KDE technique, and are then used competitively in a MAP-MRF decision framework. Mittal and Paragios [9] propose the use of variable bandwidths for KDE to enable modeling of arbitrary shapes of the underlying density in a more natural way. Parag and Elgammal [10] use a boosting method (RealBoost) to choose the best feature to distinguish the foreground for each of the areas in the scene. However, one key problem with kernel density estimation techniques is their high computational requirement due to the large number of samples needed to model the background. A Bayesian framework that incorporates spectral, spatial, and temporal features to characterize the background appearance is proposed in [11]. Under this framework, the background is represented by the most significant and frequent features, that is, the principal features, at each pixel.
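The per-pixel KDE classification these methods build on can be sketched as follows; the bandwidth, threshold, and toy intensity history are illustrative assumptions:

```python
import math

def background_likelihood(value, samples, h):
    """Gaussian KDE of a pixel's background intensity from recent samples."""
    total = sum(math.exp(-0.5 * ((value - s) / h) ** 2) for s in samples)
    return total / (len(samples) * h * math.sqrt(2.0 * math.pi))

def is_foreground(value, samples, h=5.0, threshold=1e-3):
    """Flag the pixel as foreground when its intensity is unlikely
    under the kernel estimate of the background density."""
    return background_likelihood(value, samples, h) < threshold

history = [100, 102, 99, 101, 100, 103]   # recent intensities at one pixel
flag_bg = is_foreground(101, history)     # near the history: background
flag_fg = is_foreground(200, history)     # far from the history: foreground
```

The cost concern noted above is visible here: every classification touches the whole sample history for that pixel, so the work grows with the number of stored samples per pixel.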
