Prediction Interval

A Tight Prediction Interval for False Discovery Proportion under Dependence

The false discovery proportion (FDP) is a useful measure of the abundance of false positives when a large number of hypotheses are being tested simultaneously. Methods for controlling the expected value of the FDP, namely the false discovery rate (FDR), have become widely used. It is highly desirable to have an accurate prediction interval for the FDP in such applications. Some degree of dependence among test statistics exists in almost all applications involving multiple testing. Methods for constructing tight prediction intervals for the FDP that take account of dependence among test statistics are of great practical importance. This paper derives a formula for the variance of the FDP and uses it to obtain an upper prediction interval for the FDP, under some semi-parametric assumptions on dependence among test statistics. Simulation studies indicate that the proposed formula-based prediction interval has good coverage probability under commonly assumed weak dependence. The prediction interval is generally more accurate than those obtained from existing methods. In addition, a permutation-based upper prediction interval for the FDP is provided, which can be useful when dependence is strong and the number of tests is not too large. The proposed prediction intervals are illustrated using a prostate cancer dataset.

The variability of p-values
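To make the quantity in the FDP abstract concrete, here is a minimal Python sketch. It is not the paper's variance formula or its prediction interval. It assumes the Benjamini-Hochberg step-up procedure as the FDR-controlling method (the abstract does not name one), and the function names and toy p-values are hypothetical. In a simulation where the true nulls are known, the FDP is the number of rejected true nulls divided by the number of rejections.

```python
def benjamini_hochberg(pvalues, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank whose p-value is under q * rank / m.
        if pvalues[idx] <= q * rank / m:
            largest_rank = rank
    return sorted(order[:largest_rank])


def false_discovery_proportion(rejected, is_null):
    """Share of rejections that are true nulls; defined as 0 when nothing is rejected."""
    if not rejected:
        return 0.0
    false_rejections = sum(1 for i in rejected if is_null[i])
    return false_rejections / len(rejected)


# Hypothetical toy data: ten p-values, with the truth known as in a simulation.
toy_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
toy_is_null = [False, True, False, True, True, True, True, True, True, True]

toy_rejected = benjamini_hochberg(toy_pvalues, q=0.05)
toy_fdp = false_discovery_proportion(toy_rejected, toy_is_null)
print(toy_rejected, toy_fdp)
```

Repeating such a simulation many times gives an empirical distribution of the FDP around the FDR, which is the variability a prediction interval for the FDP is meant to capture.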

Table 1 gives p-value prediction intervals when π_b = 1/2 and n = 20, 50, and 100 using the bootstrap prediction interval method described in Section 3. For example, an 80% prediction interval when n = 50 is (0.000031, 0.29). Thus, a replication of the experiment will result in a p-value lying in this interval 80% of the time. The related 90% lower and upper bounds are 0.000031 and 0.29, respectively. The central 50% interval is (0.00082, 0.076), encompassing values much farther from the exact p-value of 0.011 than the normal-approximation p-value of 0.006.

Fuzzy logic based approach for short term traffic flow prediction

In one referenced study, traffic flow forecasting based on type-2 fuzzy logic is proposed. The periods from 5 am to 10 am and from 2 pm to 8 pm are considered for analysis. In the first case only historical information is used, and in the second case both historical and real-time information are used for rule construction. The upper and lower membership functions of an interval type-2 fuzzy set can be used to generate a prediction interval. In another referenced study, traffic flow prediction for a single intersection is reported using fuzzy logic. The minimum and maximum numbers of vehicles within 5 minutes are defined as 60 and 200. In a third referenced study, a traffic volume prediction model based on fuzzy logic is presented. The input variables of the fuzzy logic system are the ‘day’ of the week and the ‘time’ of day. Fuzzy rules are defined using hourly traffic volume data. If the MAPE is to be minimized using extensive data sets, a future extension would be to use a fuzzy neural network.

Bayesian forecasting of mortality rates by using latent Gaussian models

To assess the quality of the prediction intervals obtained for the future probabilities of death, we calculated the empirical coverage probabilities of the prediction intervals, the mean width of the prediction intervals and the mean interval score. The quality of the mean forecasts was assessed by using the root-mean-squared error of the predicted means. For a fixed prediction horizon k and age z, the empirical coverage probability of the prediction interval obtained from a given model was computed as the proportion of the 25 − k intervals that include the observed probability of death at age z in year T + k, for T = 1989, …, 2013 − k. The mean width of the prediction interval is the sample mean of the 25 − k widths of the prediction intervals obtained, and the mean interval score is the sample mean of the scoring rule called the interval score; see equation (43) in Gneiting and Raftery (2007). As explained in Gneiting and Raftery (2007), the interval score is a scoring rule which rewards the forecaster who obtains narrow prediction intervals and imposes a penalty, whose size depends on the significance level of the interval, if the observation misses the prediction interval. This means that we would like to obtain prediction intervals with a low mean interval score. See also the on-line supplementary material of the present paper for a more detailed presentation of the interval score.

Bayesian and Frequentist Prediction Using Progressive Type II Censored with Binomial Removals
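The three interval measures named in the mortality-forecasting excerpt (empirical coverage, mean width, mean interval score) can be sketched as follows. This is an illustrative Python sketch, not the authors' code. It uses the interval score of Gneiting and Raftery (2007) for a central (1 − α) interval [l, u] and observation x, namely (u − l) + (2/α)(l − x) if x < l and (u − l) + (2/α)(x − u) if x > u. The function names and the toy numbers in the usage note are hypothetical.

```python
def interval_score(lower, upper, observed, alpha):
    """Interval score for a central (1 - alpha) prediction interval [lower, upper]."""
    width = upper - lower
    miss_below = max(lower - observed, 0.0)  # positive only if observed < lower
    miss_above = max(observed - upper, 0.0)  # positive only if observed > upper
    return width + (2.0 / alpha) * (miss_below + miss_above)


def evaluate_intervals(intervals, observations, alpha):
    """Return (empirical coverage, mean width, mean interval score)."""
    n = len(observations)
    pairs = list(zip(intervals, observations))
    covered = sum(1 for (lo, hi), x in pairs if lo <= x <= hi)
    mean_width = sum(hi - lo for lo, hi in intervals) / n
    mean_score = sum(interval_score(lo, hi, x, alpha) for (lo, hi), x in pairs) / n
    return covered / n, mean_width, mean_score
```

For example, `evaluate_intervals([(0.0, 1.0), (0.0, 1.0)], [0.5, 1.5], alpha=0.5)` scores one hit and one miss: the miss adds a penalty of (2/0.5) × 0.5 on top of the interval width, so wider misses and smaller α both raise the score.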

In many practical problems of statistics, one wishes to use the results of previous data (past samples) to predict a future observation (a future sample) from the same population. One way to do this is to construct an interval which will contain the future observation with a specified probability. This interval is called a prediction interval. Prediction has been applied in medicine, engineering, business, and other areas as well. Hahn and Meeker have recently discussed the usefulness of constructing prediction intervals. Bayesian prediction bounds for a future observation based on certain distributions have been discussed by several authors. Bayesian prediction bounds for future observations from the exponential distribution are considered by Dunsmore, Lingappaiah, Evans and Nigm, and Al-Hussaini and Jaheen. Bayesian prediction bounds for future lifetime under the Weibull model have been derived by Evans and Nigm [6,7], and Bayesian prediction bounds for observables having the Burr type-XII distribution were obtained by Nigm, Al-Hussaini and Jaheen [9,10], and Ali Mousa and Jaheen [11,12]. Prediction was reviewed by Patel, Nagaraja, Kaminsky and Nelson, and Al-Hussaini, and for details on the history of statistical ...

Comprehensive Bayesian structural identification using temperature variation

Figure 9: Strain at G aluminium bridge. Prediction interval (95% confidence) for the numerical model (9a), discrepancy function (9b) and experimental response (9c) ... Conclusions: This work applies ...

Bayesian Prediction of order statistics based on finite mixture of general class of distributions under Random Censoring

... component mixture of general class of distributions. Samples under consideration are subject to random censoring. A closed form of Bayesian predictive density is obtained under a two-sample scheme. Applications to Weibull and Burr XII components are presented and comparisons with previous results are made. A numerical example is presented for special cases of the exponential and Lomax components to obtain interval prediction of first and last order statistics. A simulation study has been conducted to assess the effect of sample size, hyperparameters, and level of censoring on prediction interval and point prediction of a future observation coming from the two-component exponential model.

Empirical evaluation of prediction intervals for cancer incidence

A relatively large sample size was associated with a narrow prediction interval, as seen for lung cancer among women in Denmark (Figure 3). The prediction intervals are based on asymptotic theory, which can result in underestimates of variance. Bootstrapping is a suitable method for investigating this problem. We constructed a 95% prediction interval by bootstrapping the data for lung cancer among women in Denmark, assuming that each cell followed a binomial distribution in which the incidence rate and the number of person-years at risk were used as probability of success and number of trials, respectively. We resampled the data 1000 times, calculating the predicted world-standardized incidence rate each time. The 95% bootstrap interval, calculated by selecting the 2.5 and 97.5 percentiles of the 1000 predictions, was 32.7–36.7, which is fairly close to the asymptotic interval of 32.6–36.8. This indicates that the asymptotic intervals calculated in this study describe the uncertainty in the predicted number of cases well, given a correctly specified model.

A note on the graphical presentation of prediction intervals in random-effects meta-analyses
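The bootstrap described in the cancer-incidence excerpt can be sketched roughly as below. This is a simplified Python illustration under stated assumptions, not the study's procedure: each cell is redrawn from a binomial distribution with person-years as the number of trials and the incidence rate as the success probability, but the prediction model is replaced by a direct weighted (standardized) rate of the resampled cells. The cell values, the weights and all function names are hypothetical, and the person-year counts are kept small so the pure-Python binomial draw stays fast.

```python
import random
import statistics


def draw_binomial(rng, trials, prob):
    """Binomial draw as a sum of Bernoulli trials (adequate for small trial counts)."""
    return sum(1 for _ in range(trials) if rng.random() < prob)


def bootstrap_rate_interval(cells, weights, replicates=1000, seed=0):
    """95% percentile interval for a weighted rate under binomial resampling.

    cells   -- list of (person_years, incidence_rate) pairs, one per age group
    weights -- standard-population weights, one per age group
    """
    rng = random.Random(seed)
    total_weight = sum(weights)
    rates = []
    for _ in range(replicates):
        weighted = 0.0
        for (person_years, incidence), weight in zip(cells, weights):
            cases = draw_binomial(rng, person_years, incidence)
            weighted += weight * cases / person_years
        rates.append(weighted / total_weight)
    # n=40 yields cut points every 2.5%; the first and last are the 2.5 and 97.5 percentiles.
    cuts = statistics.quantiles(rates, n=40)
    return cuts[0], cuts[-1]


# Hypothetical age groups: (person-years at risk, incidence rate per person-year).
example_cells = [(400, 0.01), (300, 0.03), (200, 0.05)]
example_weights = [0.5, 0.3, 0.2]
lower, upper = bootstrap_rate_interval(example_cells, example_weights)
print(lower, upper)
```

Comparing such a percentile interval with an asymptotic interval, as the excerpt does, is a quick check on whether the asymptotic variance is too small.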

Findings: In addition to the point estimate of the between-study variation, a prediction interval (PI) can be used to determine the degree of heterogeneity, as it provides a region in which about 95% of the true study effects are expected to be found. To distinguish between the confidence interval (CI) for the average effect and the PI, it may also be helpful to include the latter interval in forest plots. We propose a new graphical presentation of the PI; in our method, the summary statistics in forest plots of RE meta-analyses include an additional row, ‘95% prediction interval’, and the PI itself is presented in the form of a rectangle below the usual diamond illustrating the estimated average effect and its CI. We then compare this new graphical presentation of PIs with previous proposals by other authors. The way the PI is presented in forest plots is crucial. In previous proposals, the distinction between the CI and the PI has not been made clear, as both intervals have been illustrated either by a diamond or by extra lines added to the diamond, which may result in misinterpretation.

Hybrid learning for interval type 2 intuitionistic fuzzy logic systems as applied to identification and prediction problems

Abstract: This paper presents a novel application of a hybrid learning approach to the optimisation of membership and non-membership functions of a newly developed interval type-2 intuitionistic fuzzy logic system (IT2 IFLS) of a Takagi-Sugeno-Kang (TSK) fuzzy inference system with neural network learning capability. The hybrid algorithms consisting of decoupled extended Kalman filter (DEKF) and gradient descent (GD) are used to tune the parameters of the IT2 IFLS for the first time. The DEKF is used to tune the consequent parameters in the forward pass while the GD method is used to tune the antecedent parts during the backward pass of the hybrid learning. The hybrid algorithm is described and evaluated; prediction and identification results, together with the runtime, are compared with similar existing studies in the literature. Performance comparison is made between the proposed hybrid learning model of IT2 IFLS, a TSK-type-1 intuitionistic fuzzy logic system (IFLS-TSK) and a TSK-type interval type-2 fuzzy logic system (IT2 FLS-TSK) on two instances of the datasets under investigation. The empirical comparison is made on the designed systems using three artificially generated datasets and three real world datasets. Analysis of results reveals that IT2 IFLS outperforms its type-1 variants, IT2 FLS and most of the existing models in the literature. Moreover, the minimal run time of the proposed hybrid learning model for IT2 IFLS also puts this model forward as a good candidate for application in real time systems.

Estimation of Ready Queue Processing Time using Efficient Factor Type Estimator (E F T) in Multiprocessor Environment

It is a common and well-known idea that more input information often provides better prediction, provided that the information is related. Based on this idea, the efficient factor-type estimation technique has been introduced in order to obtain more precise confidence intervals than those due to Shukla et al.

Prediction Based on Generalized Order Statistics from a Mixture of Rayleigh Distributions Using MCMC Algorithm

This article considers the problem of obtaining the maximum likelihood prediction (point and interval) and Bayesian prediction (point and interval) for a future observation from a mixture of two Rayleigh (MTR) distributions based on generalized order statistics (GOS). We consider one-sample and two-sample prediction schemes using the Markov chain Monte Carlo (MCMC) algorithm. The conjugate prior is used to carry out the Bayesian analysis. The results are specialized to upper record values. A numerical example is presented to illustrate the methods proposed in this paper.

The value of joint ultrasonography in predicting arthritis in seropositive patients with arthralgia: a prospective cohort study

... arthritis development are needed. However, our study and a previous one have shown that it may be useful to exclude the MTP joints. This would be convenient since it reduces the time taken to perform the US examination. The second reason for different results is technical differences between US machines, which mainly seem to be important for detection of the PD signal. It is possible that in future studies, discrimination between effusion and synovial hypertrophy of the MTP joints may enhance prediction of those developing arthritis; however, based on the available literature before the start of the study, effusion and synovial hypertrophy were believed to be part of the same pathophysiological process and were often seen combined, which was the reason for scoring them as one in the current study. Differences between groups may be overcome in the future as the availability of good-quality US machines increases. The final reason is the fact that the use of US in prediction depends highly on the a priori chance of developing arthritis in the population that is investigated. In patients with an already high risk, for instance those who are ACPA-positive (for example, 42% developed arthritis in the Leeds cohort, 46% in the present study) or those with a high probability based on clinical prediction rules, having US abnormalities was almost always associated with arthritis development (6 patients in the present study, with a 100% chance of developing arthritis) [9, 17, 19]. However, US might be of even more value in those subpopulations of at-risk patients in which there is ...

Table 3: Added value of ultrasound over clinical parameters according to a clinical prediction rule

Quantifying Uncertainty in Online Regression Forests

We now briefly explain the core concepts of the KLL sketch, and refer the reader to Karnin et al. (2016) for a detailed description. Estimating streaming quantiles is an established problem in data management. The task is to estimate quantile values from an incoming stream of potentially infinite real values, using minimal memory and computational cost. See the work by Greenwald and Khanna (2016) and Wang et al. (2013) for surveys on the subject. The problem can be formulated as finding the approximate rank of a given value x in a sorted set of n items. An approximate quantile sketch returns the approximate rank up to an additive error of εn. As is customary, the KLL algorithm gives a probabilistic guarantee that the resulting sketch is an ε-approximation with probability 1 − δ (over its executions). Another important property is the mergeability of a sketch; this allows us to sketch different parts of a stream in parallel, and merge the partial sketches into a final sketch that has the same accuracy as a single sketch over the complete stream. This scenario is common in distributed settings (De Francisci Morales and Bifet, 2015), and we make use of this property to merge the quantile sketches of each tree in the ensemble into a single quantile prediction.

Predicting Zones of Overpressure in Coastal Swamp Depobelt of Niger Delta Nigeria, Using Well-Log and Seismic Data
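The rank formulation in the quantile-sketch excerpt (an approximate rank that is within an additive εn of the true rank) can be illustrated with a small Python sketch. This is explicitly not the KLL algorithm: it is an offline, deterministic summary that sorts all values and keeps every s-th one, chosen only because its error bound is easy to verify. It needs the full data up front, so it has none of the streaming, memory or mergeability properties discussed above. All function names are hypothetical.

```python
import bisect


def build_offline_summary(values, epsilon):
    """Keep every step-th item of the sorted data, with step = max(1, floor(epsilon * n))."""
    n = len(values)
    step = max(1, int(epsilon * n))
    ordered = sorted(values)
    kept = ordered[step - 1::step]  # items at sorted positions step, 2*step, ...
    return kept, step


def approximate_rank(summary, x):
    """Estimated number of items <= x; each kept item stands for `step` originals."""
    kept, step = summary
    return step * bisect.bisect_right(kept, x)


def exact_rank(values, x):
    """True number of items <= x."""
    return sum(1 for v in values if v <= x)
```

If r is the true rank of x, the estimate equals step × floor(r / step), so it undershoots by at most step − 1, which is below εn whenever εn ≥ 1. A streaming sketch such as KLL aims for the same kind of additive bound, with high probability, without storing or sorting the whole stream.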

The methods adopted in this study include identification of key lithologic and reservoir units from gamma ray (GR), resistivity (ILD) and spontaneous potential (SP) logs, and well-to-seismic tie using check shot, sonic (DT) and density (RHOB) logs. Faults were mapped on seismic as breaks in the continuity of reflections, while the horizons mapped were used in generating time maps. Checkshots were input to the velocity equations to enable the conversion of time structural maps to depth. Petrophysical/reservoir quality analysis was carried out with well log data. Overpressured zones were predicted in wells by generating compaction curves obtained from the Eaton sonic transit time model, while interval velocities and acoustic impedance were the basis of the seismic prediction. Overpressure mechanisms were deduced from the results of the study, which enabled inferences about prospective localities.

Block-Matching Translational and Rotational Motion Compensated Prediction Using Interpolated Reference Frame

It is found that up to 37% of the blocks can be better predicted with rotational MCP. The proposed method has the merits of easy implementation and low overhead. The interpolated frame used by rotational MCP is the same as that used by fractional-pixel accuracy MCP, which exists in most recent video coding standards. Experimental results show that higher fractional-pixel accuracies, for example, 1/16-pixel, cannot improve the prediction accuracy in translational MCP much further. Moreover, they require the additional computation overhead of extra interpolation calculation. With regard to the side information overhead, MCP with higher fractional-pixel accuracy needs more bits to transmit the higher fractional-pixel accuracy MV. For example, the number of candidate search positions of 1/16-pixel accuracy MCP is around four times that of 1/8-pixel accuracy MCP. Our proposed method only needs to transmit one rotational angle parameter. For example, four rotational angles can be represented by 2 bits, and so on. The increase in side information overhead is negligible.

FINkNN: A Fuzzy Interval Number k-Nearest Neighbor Classifier for Prediction of Sugar Production from Populations of Samples
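The two overhead figures in the motion-compensation excerpt (about four times as many candidate positions at 1/16-pixel as at 1/8-pixel accuracy, and 2 bits for four rotational angles) can be checked with a short Python calculation. The model here is an assumption for illustration only: an exhaustive search over a square window of ±R integer pixels sampled at a step of 1/d pixel, and a fixed-length code for the angle index. The excerpt does not state the search strategy or the entropy coding actually used, and the function names are hypothetical.

```python
import math


def candidate_positions(search_range, denominator):
    """Grid points in a square window of +/- search_range pixels at 1/denominator-pixel steps."""
    points_per_axis = 2 * search_range * denominator + 1
    return points_per_axis ** 2


def angle_bits(num_angles):
    """Bits needed for a fixed-length code that distinguishes num_angles rotation angles."""
    return math.ceil(math.log2(num_angles)) if num_angles > 1 else 0


# Hypothetical search range of +/- 8 pixels.
ratio = candidate_positions(8, 16) / candidate_positions(8, 8)
print(ratio, angle_bits(4))
```

Halving the step doubles the number of grid points along each axis, so the count of two-dimensional positions grows by close to a factor of four, while the angle index grows only logarithmically with the number of angles.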

The first line in Table 5 shows the average prediction accuracy over all testing years for Larisa, Platy and Serres, respectively, using algorithm FINkNN with expert-selected input variables; line 2 shows the results using L1-distances kNN (with expert input variable selection). Line 3 shows the results using FINkNN (with GA local search input variable selection); line 4 in Table 5 shows the best results obtained using an L1-distances kNN (with a GA local search input variable selection). Line 5 reports the results obtained by FINkNN (with GA input variable selection); line 6 in Table 5 shows the results using L1-distances kNN (with GA input variable selection). The last three lines in Table 5 were meant to demonstrate that prediction-by-classification is well posed in the sense that a small prediction error is expected from the outset. In particular, selecting “medium” each year resulted in error rates of 5.22%, 3.44%, and 5.54% for the Larisa, Platy, and Serres factories, respectively (line 7). Line 8 shows the average errors when a year was assigned randomly (uniformly) among the three choices “good”, “medium”, “poor”. Line 9 in Table 5 shows the minimum prediction error which would be obtained should each testing year be classified correctly in its corresponding class “good”, “medium” or “poor”. The error nearest to the latter minimum prediction error was clearly obtained by classifier FINkNN with an expert input variable selection.

A Study on Prevalence of Cardiac Autonomic Neuropathy in Type 2 Diabetes Mellitus and use of QTc interval in its prediction

QT interval prolongation occurs due to loss of balance between sympathetic and parasympathetic innervation of the heart, hypertrophy of the left ventricle, changes in the myocardium caused by electrolyte and metabolic abnormalities, and diseases affecting the coronary arteries. Hyperglycemia and acute hypoglycemia can induce reversible QTc prolongation in healthy and diabetic patients. All of these provide a basis for the arrhythmias that cause “Dead in Bed” syndrome.

Innovation of prediction equations for milk composition estimation in milk recording at alternative sampling and half a day milking interval

Animals and their breed, milk samples, conditions of rearing and milking

Cow herds with twice-a-day milking at a regular interval (12/12 hours) were included in the observation. There were 2 herds of the Czech Fleckvieh (CF) cattle breed in the region Plzeň – south (449 dairy cows) and 1 herd with both the CF and Holstein (H) breeds in the region Ústí nad Orlicí (187 dairy cows). Milk recording results of the last control year in herds 1, 2 and 3 were: 1) 7056 kg of milk per lactation (305 days), 4.05% of fat, 286 kg of fat, 3.65% of protein and 258 kg of protein; 2) 7519, 3.93, 296, 3.45 and 259; 3) 8310 kg, 3.79%, 315 kg, 3.40% and 283 kg. Milk samples (MSs) were taken stepwise according to localities in the period from November 2011 to August 2012. Both tie and free stabling were used, in stables with pipeline and parlour milking of different types from various producers. Dairy cow nutrition was typical for the conditions of the Czech Republic in the relevant season. Roughage feeding rations were supplemented by concentrates in accordance with feeding standards. The nutrition was characterized by total mixed ration.