This paper contributes to this research thread. Our focus is on enhancing linear algorithms to obtain the complex decision functions traditionally given by kernels, through the use of locally linear decision functions. We propose a new multiclass learning algorithm based on a latent SVM formulation. For each sample and class, during training as well as during testing, our algorithm selects a different weighted combination of linear models (as in Yu et al. (2009) and Ladicky and Torr (2011)). The sample- and class-specific weights are treated as latent variables of the scoring function (Felzenszwalb et al., 2010) and are obtained by locally maximizing the confidence of the class model on the sample. As opposed to previous methods, we do not require a two-stage formulation, i.e. our approach does not first learn the manifold using a reconstruction (or soft-assignment) technique and then learn a linear SVM in the manifold, nor does it require any nearest-neighbor search. Our algorithm is trained in a winner-take-all competitive multiclass fashion, so that each class tries to maximize its score on each sample by using an optimal combination of models, competing with the others in the training process. Moreover, compared to standard latent SVM implementations, our formulation allows the use of soft combinations of models, where the sparsity of the combinations, and thus the smoothness of the solution, is tuned using a p-norm constraint. The solution of the p-norm constrained score maximization problem is shown to be efficiently computable in closed form, and using this analytic solution we also obtain a prediction rule in which the local weights do not need to be explicitly computed. We call our method Multiclass Latent Locally Linear Support Vector Machine (ML3).
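As an aside on the closed form, a generic p-norm constrained score maximizer follows from the equality case of Hölder's inequality. The sketch below illustrates that generic result only (it is not necessarily the paper's exact ML3 solution, which may impose additional constraints such as non-negativity of the weights):

```python
import numpy as np

def pnorm_max_weights(v, p):
    """Maximizer of w . v subject to ||w||_p <= 1 (Hölder equality case).

    The optimum value is the dual norm ||v||_q with q = p / (p - 1),
    attained where |w_i|^p is proportional to |v_i|^q.
    """
    q = p / (p - 1.0)                               # dual exponent
    vq = np.linalg.norm(v, ord=q)
    return np.sign(v) * np.abs(v) ** (q / p) / vq ** (q / p)

v = np.array([3.0, -1.0, 2.0])                      # example score vector
p = 1.5
w = pnorm_max_weights(v, p)                         # w @ v equals ||v||_3
```

Smaller p pushes the weights toward a sparse (near winner-take-all) combination, while p close to 2 yields smoother, denser combinations, matching the smoothness tuning described above.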
Building high accuracy speech recognition systems with limited language resources is a highly challenging task. Although the use of multi-language data for acoustic models yields improvements, performance is often unsatisfactory with highly limited acoustic training data. In these situations, it is possible to consider using multiple well trained acoustic models and combine the system outputs together. Unfortunately, the computational cost associated with these approaches is high, as multiple decoding runs are required. To address this problem, this paper examines schemes based on log-linear score combination. This has a number of advantages over standard combination schemes. Even with limited acoustic training data, it is possible to train, for example, phone-specific combination weights, allowing detailed relationships between the available well trained models to be obtained. To ensure robust parameter estimation, this paper casts log-linear score combination into a structured support vector machine (SSVM) learning task. This yields a method to train model parameters with good generalisation properties. Here the SSVM feature space is a set of scores from well-trained individual systems. The SSVM approach is compared to lattice rescoring and confusion network combination using language packs released within the IARPA Babel program.
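At the score level, a log-linear combination of this kind reduces to a weighted sum of per-system log scores. The sketch below is a hypothetical illustration (the scores, weights, and two-system setup are invented for the example; the paper's actual features and SSVM training are not reproduced here):

```python
import numpy as np

# Hypothetical log-likelihood scores for 4 phone hypotheses from 2 systems.
log_scores = np.array([
    [-2.0, -1.0, -3.0, -0.5],   # system 1 (e.g. hybrid)
    [-1.5, -2.5, -0.8, -1.2],   # system 2 (e.g. tandem)
])

# Phone-specific combination weights: one weight per system per phone.
weights = np.array([
    [0.7, 0.7, 0.3, 0.5],
    [0.3, 0.3, 0.7, 0.5],
])

# Log-linear combination: weighted sum of log scores, then argmax decode.
combined = (weights * log_scores).sum(axis=0)
best = int(np.argmax(combined))
```

Training phone-specific weights rather than a single global weight per system is what allows the detailed inter-system relationships mentioned above to be captured.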
In , structured discriminative models are trained using the feature space based on phone log-likelihoods with the same context but different central phone generated by tandem and hybrid systems. Small gains were observed from using additional log-likelihoods extracted from the same models.  examines combination of hybrid and tandem systems with log-linear models, and applies learnt phone-specific combination weights to frame-level joint decoding, achieving a small performance gain.  discusses model combination at sentence level, using system-specific combination weights. A more general framework was introduced by , where systems are combined at word level, and the word-specific combination weights are trained with the minimum Bayes risk (MBR) criterion. Another approach was investigated in . However, these approaches are still limited, as some words in decoding may not appear in training, especially in low resource language tasks.
This paper presents a case study of two NPP sites - Tarapur and Kakrapar in India. Based on a combination of linear and areal earthquake source models, the observed earthquake data over a long period of time and an attenuation relation for spectral acceleration, PGAs are evaluated for various specified values of the mean recurrence interval (i.e. return period). The return period of PGA for OBE is taken, in many countries, to be of the order of 10^2 years. From this study, it is seen that the ratio of the values of PGA for OBE and SSE is highly site-specific. The paper presents the distribution of the calculated values of PGA at a site during past earthquakes, the computed ratio of the values of PGA for OBE and SSE, and the sensitivity of this ratio to the parameters in the earthquake occurrence model. The response of structures to OBE and SSE, considering the same shape of the ground motion time-history, will depend on the applicable values of damping. Similarly, the earthquake response of various equipment or piping supported on various floors of the structure will depend on damping. The paper presents a simple case study to show the response of a structure and equipment and computes the ratio of the peak responses to OBE and SSE. This ratio is significantly different from the ratio of the values of PGA under SSE and OBE.
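For reference, under the usual Poisson occurrence model a return period T translates into an exceedance probability over an exposure time t via P = 1 - exp(-t/T). A minimal sketch (the 100-year return period and 40-year plant life are illustrative values, not the paper's):

```python
import math

def exceedance_prob(return_period_years, exposure_years):
    """P(at least one exceedance during the exposure time) under a
    Poisson occurrence model with mean rate 1 / return_period."""
    rate = 1.0 / return_period_years
    return 1.0 - math.exp(-rate * exposure_years)

# Hypothetical OBE-level shaking (10^2-year return period) over a
# 40-year plant life: roughly a one-in-three chance of exceedance.
p_obe = exceedance_prob(100.0, 40.0)
```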
By now, inference in the basic regression problem is well-understood from both frequentist and Bayesian perspectives. However, for the variable selection problem, a fully satisfactory theory/method has yet to emerge. It is not our goal to review the extensive literature on variable selection, but it can be insightful to see where the fundamental difficulty arises. The most popular strategies are stepwise selection procedures and the lasso (Tibshirani 1996) and its many variants; see Hastie et al. (2009) for a thorough review of these strategies. These methods have a common drawback, which is that they cannot assign any meaningful measures of uncertainty---probabilistic or otherwise---to the set of variables selected. From a Bayesian perspective, probabilistic summaries of various models can be obtained by introducing a prior probability over the model space and a conditional prior on the model parameters, and performing a Markov chain Monte Carlo scan of the model space. For relatively small p this scheme is feasible (e.g., Clyde and George 2004), but it typically requires a convenient choice of prior for parameters given the model, which may overly influence the posterior calculations. Furthermore, as p increases, estimates of posterior model probabilities become less reliable (Heaton and Scott 2009), making it questionable whether the ``most likely'' model has been identified. Since there seems to be no fully satisfactory approach among the existing methods, it makes sense to consider something new and different.
Multilayer feedforward networks (perceptrons and radial basis function networks) have emerged as the most attractive neural network architecture for various spatial analysis tasks (Fischer and Gopal 1994a; Gopal and Fischer 1996, 1997; Leung 1997). Analytical results show that two-layer (one hidden layer) feedforward networks are very capable of approximating arbitrary mappings in the presence of noise. However, they do not provide more than very general guidance on how this can be achieved, and what guidance they do offer suggests that network training will be difficult. Consequently, there is an urgent need to develop application domain-specific methodologies which provide more specific guidelines for judicious use of neural network approaches in SDA.
2. The problem of how to deal with cross-level data: In educational research, it is often the case that a researcher is interested in investigating the relationships between environmental factors (e.g., teaching styles, teacher behaviors, class sizes, educational policies, etc.) and individual outcomes (e.g., performance, attitudes, behaviors, etc.). But given that outcomes are collected at the student level, and other variables at group levels (e.g., classrooms, schools, school districts), the question arises as to how to deal with cross-level interaction. One approach is to bring group-level variables down to the student level (i.e., assign classroom, teacher, or school characteristics to all students). This results in two problems. The first is that the resulting statistical inferences (i.e., significance tests) are biased and typically over-optimistic . The second is that failure to incorporate schools in the statistical model means that the influence of school is ignored . The other way to deal with cross-level data is to aggregate student-level data up to the group level, that is, to perform regression on the group-level means. This aggregated analysis has several problems. The first is that much of the individual variability in the outcome variable is lost, which can lead to under- or over-estimation of observed relationships between variables . The second is that the outcome variable changes significantly and substantively, from individual achievement to average group-level achievement. Furthermore, disregarding within-school variance will yield a large increase in multiple correlation coefficients . Neither disaggregated nor aggregated analyses can adequately describe the nature of hierarchical data. What are required are models which can simultaneously model student-level relationships and take grouping into account. 3. The problem of the units of analysis: A classic and
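The loss of information under aggregation can be seen with simulated two-level data: regressing group means on group means discards all within-group variability and leaves far fewer observations. A minimal sketch with invented numbers (20 schools of 30 students each):

```python
import numpy as np

rng = np.random.default_rng(0)

n_schools, n_students = 20, 30
school_effect = rng.normal(0.0, 2.0, n_schools)            # group-level variation
x = rng.normal(0.0, 1.0, (n_schools, n_students))           # student-level predictor
y = (1.5 * x                                                # true student-level slope
     + school_effect[:, None]                               # shared school intercept
     + rng.normal(0.0, 1.0, (n_schools, n_students)))       # student-level noise

# Disaggregated analysis: pool all 600 students, ignore grouping.
slope_student = np.polyfit(x.ravel(), y.ravel(), 1)[0]

# Aggregated analysis: regress 20 school means on 20 school means,
# discarding all within-school variability.
slope_school = np.polyfit(x.mean(axis=1), y.mean(axis=1), 1)[0]
```

The aggregated fit rests on only 20 points and a different (group-level) outcome, which is precisely why neither analysis alone describes hierarchical data well.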
On the basis of the ARMA (1, 2) model, the tentative GARCH (p, q) models were built around AR (1) and MA (2). Thus AR (1) implies ARCH (1) and MA (2) means GARCH (2). Table 5 gives the GARCH (p, q) models. GARCH (1, 3) turned out to be the best among the GARCH models; this gives a moving average of order 3. Hence LULU smoothing with a length of 3 (i.e., order 3) was performed on the data. Thus we fitted GARCH (1, 0) on the LULU-smoothed data and named it the ARCHLU (1, 3) model, since the LULU is of order 3, as seen in Table 6. The GARCH (1, 3) under the moving average smoothing and the ARCHLU (1, 3) under the LULU smoothing methods are compared to find the most appropriate model and the more accurate smoothing method. This is shown in Table 7. The ARCHLU (1, 3) model has the smallest AIC and BIC values compared with the GARCH (1, 3) model. This means that the LULU smoothing method is once more shown to be more accurate, since it produced a better model than the moving average smoothing method. Based on the model output of the ARMA (1, 2), ARLU (1, 2), GARCH (1, 3) and ARCHLU (1, 3) models, ARMA (1, 2), ARLU (1, 2) and ARCHLU (1, 3) were selected. Based on the model diagnostics and adequacy checks performed on the ARMA (1, 2), ARLU (1, 2) and ARCHLU (1, 3) models, it was found that all three models represent the data adequately. Thus, to select the most appropriate model among these, we compared their AIC and BIC values. Table 8 gives the AIC and BIC values of the aforementioned models. The ARLU (1, 2) model stood out as the most appropriate. The forecasting evaluation and accuracy criteria were also used for the selection of the most appropriate model. The models were evaluated
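For readers unfamiliar with LULU smoothing: the L_n and U_n operators (Rohwer) are compositions of running minima and maxima, where U_n removes downward spikes of width up to n and L_n removes upward ones. A minimal sketch of one L_nU_n pass, assuming boundary indices are simply clipped (the order-3 smoother used above corresponds to n = 3):

```python
import numpy as np

def _opening(x, n):
    # L_n: max over j in [i-n, i] of (min over k in [j, j+n]);
    # removes upward spikes of width <= n (indices clipped at edges).
    N = len(x)
    out = np.empty(N)
    for i in range(N):
        out[i] = max(
            x[max(j, 0): min(j + n, N - 1) + 1].min()
            for j in range(i - n, i + 1)
        )
    return out

def _closing(x, n):
    # U_n: min over j in [i-n, i] of (max over k in [j, j+n]);
    # removes downward spikes of width <= n.
    N = len(x)
    out = np.empty(N)
    for i in range(N):
        out[i] = min(
            x[max(j, 0): min(j + n, N - 1) + 1].max()
            for j in range(i - n, i + 1)
        )
    return out

def lulu_smooth(x, n):
    """One pass of the L_n U_n smoother: remove narrow spikes of
    either sign while preserving monotone trends."""
    x = np.asarray(x, dtype=float)
    return _opening(_closing(x, n), n)

smoothed = lulu_smooth([0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0], 1)
```

Unlike a moving average, this smoother removes an isolated spike completely instead of smearing it over neighbouring observations, which is one motivation for comparing the two smoothing methods.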
Based on these assumptions, the first contribution of this paper is a simple and effective two-step estimation procedure for the multivariate Lévy processes model of Ballotta and Bonfiglioli (2014). Step one consists of the univariate estimation of the common process parameters on the time series of index returns; the estimation of the loadings, i.e. the common factor's weight in each margin, and of the idiosyncratic components' parameters is performed in step two. To assess this estimation procedure, we also implement a standard one-step maximum likelihood approach in which all parameters of the multivariate Lévy process are estimated in a single step. The second contribution of this paper is the computation of the intra-horizon Value at Risk (VaR) for a portfolio of assets following the considered model, thereby extending the work of Bakshi and Panayotov (2010) to a multivariate setting that also takes into account the impact of dependence between the components of the portfolio. Traditional risk measures, such as Value at Risk or Expected Shortfall, focus on possible losses at the end of a predetermined time horizon; nevertheless, investors are also interested in the exposure to loss throughout the horizon, as they often have thresholds that cannot be breached for the investment to survive. Intra-horizon risk was first emphasized by Stulz (1996); Kritzman and Rich (2002) and Boudoukh et al. (2004) deal with intra-horizon risk assuming Gaussian distributed returns and considering a multi-year investment horizon, while Bakshi and Panayotov (2010) focus on the 10-day horizon relevant for regulatory purposes and consider univariate Lévy pure jump models for asset or portfolio returns. We note that intra-horizon risk measures are defined on the distribution of the minimum return; whilst under the arithmetic Brownian motion assumption this distribution is analytically known, in general it must be recovered numerically.
To this purpose, we adopt the Fourier Space Time-stepping (FST) algorithm introduced by Jackson et al. (2008).
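The role of the running-minimum distribution can also be seen in a plain Monte Carlo sketch: intra-horizon VaR is computed from the pathwise minimum and is never smaller than end-of-horizon VaR. The example below uses an arithmetic Brownian motion with invented parameters as a simple stand-in for the Lévy model (it is an illustration, not the FST algorithm):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate cumulative log-return paths over a 10-day horizon.
n_paths, n_steps = 20_000, 10
horizon = 10.0 / 252.0
mu, sigma = 0.05, 0.20                      # hypothetical annualized drift/vol
dt = horizon / n_steps

increments = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
paths = increments.cumsum(axis=1)

terminal = paths[:, -1]                     # end-of-horizon return
running_min = paths.min(axis=1)             # worst return within the horizon

alpha = 0.01
var_end = -np.quantile(terminal, alpha)     # end-of-horizon VaR
var_intra = -np.quantile(running_min, alpha)  # intra-horizon VaR
```

Since the running minimum is pathwise no larger than the terminal return, the intra-horizon VaR always dominates the end-of-horizon VaR, which is why the within-horizon exposure matters to investors with breach thresholds.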
Given that the literature on time series forecasting remains ambiguous on the choice of combination strategy, the core objective of this study is to introduce an effective combination methodology and elucidate how individual models can be combined to improve financial time series forecasting. Accordingly, this study presents a comprehensive discussion of series and parallel combination methods and then constructs a model using both techniques to combine MLP as a nonlinear model and ARIMA as a linear model. Then, using two combination strategies, ARIMA-MLP and MLP-ARIMA, the series and parallel hybrid models, comprising simple average (SA), linear regression (LR) and genetic algorithm (GA) combinations, are compared with their individual components. To evaluate the effectiveness of the hybrid models and introduce a more accurate and reliable hybrid method, two benchmark datasets, the closing prices of the Shenzhen Integrated Index (SZII) and of Standard and Poor's 500 (S&P 500), are selected for forecasting and modeling.
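As a minimal illustration of parallel combination, two individual forecasts can be merged with a simple average (SA) or with linear regression (LR) weights fitted by least squares; the data below are invented for the example:

```python
import numpy as np

y  = np.array([10.0, 12.0, 11.0, 13.0, 14.0])   # actual values (hypothetical)
f1 = np.array([ 9.5, 12.5, 10.5, 13.5, 13.0])   # e.g. linear (ARIMA-style) forecasts
f2 = np.array([10.5, 11.0, 11.5, 12.0, 15.0])   # e.g. nonlinear (MLP-style) forecasts

# Simple average (SA) combination: equal weights.
combined_sa = 0.5 * (f1 + f2)

# Linear regression (LR) combination: weights fitted by least squares.
X = np.column_stack([f1, f2])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
combined_lr = X @ w
```

Because the SA combination is itself one choice of weights (0.5, 0.5), the fitted LR combination can never have a larger in-sample squared error; out of sample, of course, the simpler SA rule can still win.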
In this paper, a finite-state approach to Moses, which is a state-of-the-art PB-SMT system, is presented. A monotone framework is adopted, where 7 models in log-linear combination are considered: a direct and an inverse PB translation probability model, a direct and an inverse PB lexical weighting model, PB and word penalties, and a target language model. Five of these models are based on PB scores which are organized under a PB translation table.
In the sequel, we propose and study adaptive composite M-estimation (ACME) based on (3); it simultaneously shrinks toward the true overlapping model structure while estimating the shared coefficients in that structure. For the models from Figure 1, ACME automatically chooses risk factors strongly associated with high blood glucose levels and estimates their common effects across different levels. Our procedure yields estimators with improved efficiency through information combination across the models. It correctly selects both the true overlap structure and the true non-zero parameters with probability tending to 1 in large samples. The resulting parameter estimators are oracle in the sense that they have the same distribution as the oracle estimator based on knowing the true model structure a priori.
The HRV signal is a non-stationary signal, and its changes can indicate a current or upcoming disease. Several methods for automatic detection and classification of cardiac arrhythmias have been proposed in the literature, including: artificial immune recognition systems with fuzzy weighting, threshold-crossing intervals, neural networks, fuzzy neural networks, fuzzy equivalence relations, Bayesian classifiers, support vector machines, wavelet transforms, combined wavelet transforms and radial basis neural networks, fuzzy logic combined with Markov models, and rule-based algorithms. Some papers used techniques based on ECG segments, in which various features of the ECG signal, including morphological features, are extracted and used for classification of cardiac arrhythmias. This is a time-consuming procedure, and the results are very sensitive to the amount of noise. An alternative approach is to first extract the HRV signal from the ECG signal by recording the R-R time intervals, and then process the HRV signal instead. This is a more robust method, since the R-R time intervals are less affected by noise. One drawback of the proposed HRV-based algorithm is that some arrhythmia types, such as left bundle branch block and right bundle branch block beats, cannot be detected using only the heart rate variability (HRV) features. In this paper, a new arrhythmia classification algorithm is proposed which is able to effectively classify seven types of arrhythmia: the normal beat (NB), left bundle branch block beat (LBBB), right bundle branch block beat (RBBB), premature ventricular contraction (PVC), fusion of ventricular and normal beat (FUSION), atrial premature contraction (APC) and paced beat (PACE). Various features from both the ECG and HRV signals are extracted and given to genetic programming to produce suitable solution trees to distinguish between the different types of arrhythmia.
From the various identified features, the proposed method selects the effective ones and categorizes the seven classes of heart arrhythmia with high precision. The main objective of the proposed work is to apply genetic programming to the classification of heart arrhythmias using both HRV and ECG features: genetic programming selects the effective features and then finds the most suitable trees to distinguish between the different types of arrhythmia.
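As background on the HRV side of the feature set, the HRV (R-R) signal and two standard time-domain features (SDNN and RMSSD) can be computed directly from R-peak times. A minimal sketch with hypothetical peak times (the full ECG/HRV feature set of the proposed method is not reproduced here):

```python
import numpy as np

def hrv_features(r_peak_times_s):
    """Time-domain HRV features from R-peak times in seconds.

    SDNN: standard deviation of the R-R intervals.
    RMSSD: root mean square of successive R-R differences.
    """
    rr = np.diff(r_peak_times_s)                 # the HRV (R-R interval) signal
    sdnn = rr.std()
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))
    return rr, sdnn, rmssd

# Hypothetical R-peak times for five beats.
peaks = np.array([0.0, 0.80, 1.62, 2.40, 3.25])
rr, sdnn, rmssd = hrv_features(peaks)
```

Because these features depend only on beat timing, they are robust to waveform noise, which is the robustness argument made above for HRV-based processing.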
In particular, risk assessment of external hazards is required and utilized as an integrated part of PRA for operating and new reactor units. In the light of the Fukushima accident, correlated events are of special interest; their modelling is proposed in the present study in the form of theoretical concepts that lay the foundations for implementation in the PSA framework. An example is presented for illustrative purposes only, since the analysis is carried out on the basis of generic numerical values assigned to an oversimplified model, and the results are obtained without any baseline comparison. Clearly, the first step toward endorsement of the process is the analysis of all available information, in order to determine how far the observed plant-site-specific events apply to the envisaged model, together with the statistical correlation analysis of event occurrence data that can be used as part of this process.
We require solutions in exactly this format; no other format will do. Thus one must use a full-rank generalized inverse. In experimental design theory, such a full-rank generalized inverse is always obtained by placing, if necessary, a linear restriction on one or more of a set of parameters, as in (27).
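A minimal numerical sketch of this device, for a rank-deficient one-way classification model with the usual sum-to-zero restriction appended as an extra row (the data are invented; equation (27) itself is not reproduced here):

```python
import numpy as np

# One-way model y_ij = mu + alpha_i + e_ij: the design matrix has a
# column dependency (mu column = alpha_1 column + alpha_2 column),
# so the normal equations have no unique solution.
y = np.array([3.0, 5.0, 7.0, 9.0])
X = np.array([
    [1.0, 1.0, 0.0],   # group 1 observations
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],   # group 2 observations
    [1.0, 0.0, 1.0],
])                      # columns: mu, alpha_1, alpha_2 (rank 2, not 3)

# Linear restriction alpha_1 + alpha_2 = 0 restores full column rank.
R = np.array([[0.0, 1.0, 1.0]])
X_aug = np.vstack([X, R])
y_aug = np.append(y, 0.0)

beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
mu_hat, a1_hat, a2_hat = beta    # unique solution: 6, -2, 2
```

With group means 4 and 8, the restricted solution is the grand mean plus sum-to-zero group effects; a different restriction would give a different (but equally valid) solution in its own format, which is why the format of the restriction matters.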
Cross-sectional studies are an attractive option for surveillance because of their feasibility and cost-effectiveness in populations, but this approach to growth monitoring has several inherent limitations. For example, such studies can be confounded by secular trends, such as selective mortality that leads to perceived improved growth at older ages due to the better health of the survivor population . Additionally, cross-sectional growth data may display large skewness and kurtosis and may exhibit substantial heteroskedasticity, and marginal analyses to describe population trajectories require transformations to normality, weighting, or both to achieve an adequate fit . While longitudinal data may also suffer from the same problems, linear mixed effects models naturally take skewness, kurtosis, and even heteroskedasticity into account, making transformations unnecessary. The utility of transformation techniques remains controversial. Indeed, while transformations may lead to a better fit of Gaussian models, they require a priori knowledge of the data structure. The flexible Box-Cox transformation family of distributions [14, 15] can be used, but it may fail when data are clustered. Moreover, interpretation of transformed data is problematic, and producing predictions at the subject and population level is not straightforward.
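For reference, the Box-Cox transformation maps positive y to (y^λ - 1)/λ for λ ≠ 0, with the log transform as the λ = 0 limit. A minimal sketch (the choice of λ, normally made by maximum likelihood, is omitted here):

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transformation for positive data: (y^lam - 1)/lam,
    with the natural-log limit at lam = 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0.0:
        return np.log(y)
    return (y ** lam - 1.0) / lam
```

The interpretation problem noted above is visible here: after transforming, model coefficients and predictions live on the (y^λ - 1)/λ scale, and back-transforming means and intervals to the original growth scale is not straightforward.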
The trend equations were fitted using different linear and non-linear models, exponential smoothing models, and time series models for identifying the trend. Growth models are simply models that describe the behaviour of a variable over time. They are quick to estimate and inexpensive, although less efficient. In many situations they describe well the growth pattern and future movement of a time series. Growth models are widely used to estimate the growth rate of time series data.
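As a concrete instance, an exponential growth model fitted as a log-linear trend, ln y_t = a + b t, gives a per-period compound growth rate of exp(b) - 1. A minimal sketch on an invented series growing at 5% per period:

```python
import numpy as np

# Hypothetical series with exact 5% compound growth per period.
t = np.arange(6, dtype=float)
y = 100.0 * 1.05 ** t

# Fit ln(y) = a + b*t; polyfit returns [slope, intercept].
b, a = np.polyfit(t, np.log(y), 1)
growth_rate = np.exp(b) - 1.0          # recovers 0.05
```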
In contrast to methods predicated on a parametric (normality) assumption for the random effects, the method yields valid inferences under departures from this assumption and is competitive when the assumption holds. While the refined regression calibration estimator of Wang et al. (2000), which performed better than the naive and regression calibration estimators in our simulations, is only available for the logistic and probit models, our approach applies to any generalized linear model formulation. Although the conditional estimation methods are fast to implement and performed best among existing approaches, they cannot provide insight into the random effects density, whereas our approach shows potential efficiency gains when the smoothness assumption holds and provides a reliable estimator of the underlying random effects distribution. The additional flexibility afforded by the SNP representation is sufficient to capture the underlying features of the random effects.
According to the literature, the performance of various MLTs was found to be sensitive to the application of approaches for imbalanced data [11, 26]. For example, SVM with different kernels (linear, radial, polynomial, and sigmoid) was analysed on a genomics biomedical text corpus using resampling techniques, and it was reported that the normalized linear and sigmoid kernels with the RUS technique outperformed the other approaches tested . SVM and k-NN were also found to be sensitive to class imbalance in supervised sentiment classification . The addition of cost-sensitive learning and threshold control has been reported to intensify the training process for models such as SVM and artificial neural networks; it might provide some gains in validation performance, though these were not confirmed in the test results .
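For reference, random undersampling (RUS) simply discards majority-class samples until the classes are balanced. A minimal sketch on invented data (a real pipeline would resample inside cross-validation folds, not on the full dataset):

```python
import numpy as np

def random_undersample(X, y, rng):
    """Random undersampling (RUS): keep a random subset of each class
    of size equal to the minority-class count."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)                # 10 samples, 2 features
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])    # 7-to-3 class imbalance
Xb, yb = random_undersample(X, y, rng)          # balanced 3-and-3 subset
```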
The key components of our variable selection approach are first, exploiting the inherent selection of variables done by the sparse linear SVM LP(3), second, aggregation of many sparse models to overcome the unreliability of any single model, and third, visualization or analysis of bagged models to discover trends. In this research, we only investigated a sparse SVM regression algorithm, but any sparse modeling process can serve a similar function. We focused on starplots for model visualization, but other visualization methods can also be applied and may be more informative. For instance, we found parallel coordinate plots of variable weights versus bootstraps to be valuable.
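The aggregation step can be sketched as collecting the weight vectors of the bootstrap models and summarizing each variable's selection frequency and average weight, which is the kind of summary a star plot or parallel coordinate plot would display. A minimal sketch on invented weight vectors (no sparse SVM is actually fitted here):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weight vectors from 100 sparse models fit on bootstrap
# resamples; a zero weight means the model did not select that variable.
n_boot, n_vars = 100, 6
W = rng.normal(size=(n_boot, n_vars))
W[rng.random((n_boot, n_vars)) < 0.4] = 0.0     # enforce sparsity

selection_freq = (W != 0).mean(axis=0)          # how often each variable survives
mean_weight = W.mean(axis=0)                    # average (signed) influence

stable = np.flatnonzero(selection_freq > 0.5)   # variables selected by most models
```

Aggregating over bootstraps in this way smooths out the instability of any single sparse fit, which is the second key component described above.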