False discovery proportion

Top PDF results for "False discovery proportion":

A Tight Prediction Interval for False Discovery Proportion under Dependence

The false discovery proportion (FDP) is a useful measure of the abundance of false positives when a large number of hypotheses are tested simultaneously. Methods for controlling the expected value of the FDP, namely the false discovery rate (FDR), have become widely used, and an accurate prediction interval for the FDP is highly desirable in such applications. Some degree of dependence among test statistics exists in almost all applications involving multiple testing, so methods for constructing tight prediction intervals for the FDP that take account of this dependence are of great practical importance. This paper derives a formula for the variance of the FDP and uses it to obtain an upper prediction interval for the FDP, under some semi-parametric assumptions on the dependence among test statistics. Simulation studies indicate that the proposed formula-based prediction interval has good coverage probability under commonly assumed weak dependence and is generally more accurate than intervals obtained from existing methods. In addition, a permutation-based upper prediction interval for the FDP is provided, which can be useful when dependence is strong and the number of tests is not too large. The proposed prediction intervals are illustrated using a prostate cancer dataset.
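As a rough sketch of the permutation-based interval idea (not the authors' derivation; the two-sample setting, function name, and parameters below are illustrative assumptions), an upper prediction bound for the FDP at a fixed rejection threshold can be obtained by regenerating the count of false rejections under label permutations:

```python
import numpy as np

def fdp_upper_bound_perm(X, labels, t, n_perm=1000, alpha=0.05, seed=0):
    """Permutation-based (1 - alpha) upper prediction bound for the FDP
    at a fixed two-sided rejection threshold t.  Rows of X are features,
    `labels` is a 0/1 group indicator per column (sample).  Hypothetical
    sketch, not the paper's implementation."""
    rng = np.random.default_rng(seed)

    def tstats(y):
        # Welch-type two-sample statistics, one per feature (row of X)
        a, b = X[:, y == 1], X[:, y == 0]
        se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                     + b.var(axis=1, ddof=1) / b.shape[1])
        return (a.mean(axis=1) - b.mean(axis=1)) / se

    R = int((np.abs(tstats(labels)) > t).sum())  # observed rejections
    if R == 0:
        return 0.0                               # FDP is defined as zero
    # null realisations of V, the number of false rejections
    V = np.array([(np.abs(tstats(rng.permutation(labels))) > t).sum()
                  for _ in range(n_perm)])
    return min(np.quantile(V, 1 - alpha) / R, 1.0)
```

Permuting labels preserves the correlation across features, which is what makes this approach attractive under strong dependence despite its cost when the number of tests is large.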

False discovery proportion estimation by permutations: confidence for significance analysis of microarrays

When multiple hypotheses are tested, interest is often in estimating the false discovery proportion (FDP), the number of false positives divided by the total number of rejections. When there are no rejections, the FDP is defined to be zero. When there is unknown dependence in the data, a challenge is to find methods that are powerful but also require few assumptions on the dependence structure (van der Laan et al., 2004; Meinshausen, 2006; Genovese and Wasserman, 2006). A highly popular method for estimation of the FDP is Significance Analysis of Microarrays (SAM) (Tusher et al., 2001). SAM is a very general method that is not at all limited to microarray data. It requires no parametric assumptions and almost no assumptions on the dependence structure in the data. Instead, it adapts to the dependence structure by using permutations. Consequently, Tusher et al. (2001) have been cited more than 10,000 times.
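The definition above translates directly into a small plug-in estimator. Below is a minimal SAM-style sketch (the names and the symmetric cutoff are assumptions, not Tusher et al.'s actual code): the number of false positives is estimated by the median exceedance count over permuted datasets.

```python
import numpy as np

def sam_style_fdp_estimate(stats_obs, stats_perm, cutoff):
    """Permutation plug-in estimate of the FDP at a symmetric cutoff.
    stats_obs: observed statistics, shape (m,).
    stats_perm: statistics recomputed under B label permutations,
    shape (B, m).  Illustrative sketch only."""
    R = int((np.abs(stats_obs) > cutoff).sum())   # observed rejections
    if R == 0:
        return 0.0                                # FDP defined to be zero
    # median exceedance count across permutations estimates the
    # number of false positives at this cutoff
    V_hat = np.median((np.abs(stats_perm) > cutoff).sum(axis=1))
    return min(V_hat / R, 1.0)
```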

Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators

The False Discovery Rate (FDR) is a commonly used type I error rate in multiple testing problems. It is defined as the expected False Discovery Proportion (FDP), that is, the expected fraction of false positives among rejected hypotheses. When the hypotheses are independent, the Benjamini-Hochberg procedure achieves FDR control at any pre-specified level. By construction, FDR control offers no guarantee in terms of power, or type II error. A number of alternative procedures have been developed, including plug-in procedures that aim to gain power by incorporating an estimate of the proportion of true null hypotheses.
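For concreteness, here is a minimal sketch of the Benjamini-Hochberg step-up rule together with the plug-in idea mentioned above; the estimate of the proportion of true nulls is simply passed in as pi0 (which, in the paper's setting, a kernel estimator would supply). Names and signatures are illustrative.

```python
import numpy as np

def bh_rejections(pvals, q=0.05, pi0=1.0):
    """Benjamini-Hochberg step-up at level q.  Setting pi0 < 1 to an
    estimate of the true-null proportion gives the adaptive (plug-in)
    variant.  Returns a boolean rejection mask."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    # largest k with p_(k) <= k * q / (pi0 * m)
    below = p[order] <= np.arange(1, m + 1) * q / (pi0 * m)
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

With pi0 = 1.0 this is the standard BH procedure; plug-in procedures gain power precisely by dividing q by an estimate pi0 < 1.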

Controlling false discovery rates in factorial experiments with between subjects and within subjects tests

Background: The False Discovery Rate (FDR) controls the expected proportion of false positives among the positive test results. It is not straightforward to conduct an FDR-controlling procedure in experiments with a factorial structure that includes both between-subjects and within-subjects factors. This is because there are P-values for different tests on one and the same response, alongside P-values for the same test across different responses.


Coordinate based meta analysis of functional neuroimaging data: false discovery control and diagnostics

Control of type I statistical error in neuroimaging is of huge importance, and the adaptive FDR control (BH-FDR) introduced by Benjamini and Hochberg [14] is often employed. The scheme estimates the number of falsely rejected hypotheses from N independent tests at level α as αN. It then attempts to find the test level where this estimate is at most some small percentage (say 5%) of the number of rejections. There are, however, problems when this is applied to voxel-wise analysis of fMRI. Firstly, the tests in neighbouring voxels are not independent. Secondly, it is not the voxels themselves, but rather clusters of voxels, that should ideally be controlled [15,16]. Here we detail an FDR method where the numbers of false rejections are estimated directly from many realisations of the experiment generated under the null hypothesis, resolving the independence issue. We then go on to generalise the method to the control of false clusters.
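The estimation scheme described here can be sketched directly (a schematic of the idea, not the paper's implementation; function and variable names are made up): scan candidate test levels and keep the most lenient one at which the false rejections, estimated from null realisations, are at most a fraction q of the observed rejections.

```python
import numpy as np

def fdr_threshold_from_null(pvals_obs, pvals_null, q=0.05):
    """pvals_obs: observed p-values, shape (N,).
    pvals_null: p-values from B null realisations of the experiment
    (e.g. permutations), shape (B, N).  Returns the chosen test level."""
    for a in np.sort(pvals_obs)[::-1]:           # most lenient level first
        R = (pvals_obs <= a).sum()               # observed rejections (>= 1)
        # mean count of null p-values below the level estimates the
        # number of false rejections, replacing the independent-test a*N
        V_hat = (pvals_null <= a).sum(axis=1).mean()
        if V_hat / R <= q:
            return a
    return 0.0                                   # no level achieves q
```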

Application of the False Discovery Rate to Quantitative Trait Loci Interval Mapping With Multiple Traits

LK attempt to extend this principle to multiple testing in a QTL scan by taking the null hypothesis of no QTL as valid for all tests conducted across the genome. This null hypothesis is, however, by definition false for traits that have been shown by prior biometrical analyses to have nonzero heritabilities. Instead, the statistical problem is to identify regions that harbor QTL vs. those that do not. The FDR approach deals directly and quantitatively with this challenge by controlling the proportion of false positives among all significant results. The GWER approaches deal with this only qualitatively, by controlling the probability that significant results include no more than one false positive.

…significant tests for cholesterol and marbling increased the stringency of thresholds for the other three traits when considered in a multiple-trait scenario. As a result, in the multiple-trait test, the number of QTL detected for carcass weight and last rib back fat at the 10% level was reduced from 14 to 8 (Table 5). Paradoxically, as pointed out by Spelman (1998), when grouped with traits having many detectable QTL, tests for the traits having few or no QTL will be pushed down to a high rank number. This will tend to produce less stringent CWER thresholds for a given FDR level and hence more QTL detected for these traits than when analyzed alone. This is seen in Table 5 for marbling at the FDR 0.10 …

The effect of photofit type faces on recognition memory : a thesis presented in partial fulfillment of the requirement for the degree of Master of Arts in psychology at Massey University

The three measures most frequently used are hits, the proportion of "target" responses for faces previously seen in the study phase, and false alarms, the proportion of "target" responses for [r]


Detecting multiple associations in genome-wide studies

…declared significant. This will occur when a low error rate is set, or when there are few true associations, or when the power is low. In genome-wide association scans, the number of true associations is expected to be small by comparison with the number of tests, so that the false discovery variance is relatively high in relation to the target rate, and the FDR approach may not be reliable for controlling the error rate within studies. In gene expression experiments, however, the number of true associations is somewhat higher and FDR methods are more appropriate for those studies.

A biplot correlation range for group-wise metabolite selection in mass spectrometry

The discovery of discriminating metabolites related to sphingosine was unanticipated but reasonable in terms of what is known about ceramide metabolism. Ceramide is an endogenous mediator of apoptotic cell death. For example, when the intracellular concentration of ceramide is elevated under oxidative stress, cellular proliferation is inhibited, and cellular apoptosis is induced [37]. Ceramide is synthesized at the endoplasmic reticulum from palmitoyl-CoA and serine, resulting in 3-ketosphinganine. The enzyme 3-ketosphinganine reductase generates sphinganine from 3-ketosphinganine. Sphinganine is acylated to dihydroceramide by sphinganine N-acyl-transferase. Finally, dihydroceramide is converted to ceramide by the activity of the dihydroceramide desaturase [38–40]. In this study, we observed a reduced amount of 3-ketosphinganine (300.28 m/z) in Trx2-overexpressing TG mice, suggesting that Trx2 decreases levels of 3-ketosphinganine, thereby conferring protection against apoptosis (Fig. 7 (b)). Thus, the discrimination of WT and TG mitochondria by 3-ketosphinganine is consistent with available data on mitochondria, ceramide metabolism, and Trx2 protection against apoptosis signaling.

Discrete False-Discovery Rate Improves Identification of Differentially Abundant Microbes

ABSTRACT Differential abundance testing is a critical task in microbiome studies that is complicated by the sparsity of data matrices. Here we adapt for microbiome studies a solution from the field of gene expression analysis to produce a new method, discrete false-discovery rate (DS-FDR), that greatly improves the power to detect differential taxa by exploiting the discreteness of the data. Additionally, DS-FDR is relatively robust to the number of noninformative features, and thus removes the problem of filtering taxonomy tables by an arbitrary abundance threshold. We show by using a combination of simulations and reanalysis of nine real-world microbiome data sets that this new method outperforms existing methods at the differential abundance testing task, producing a false-discovery rate that is up to threefold more accurate, and halves the number of samples required to find a given difference (thus increasing the efficiency of microbiome experiments considerably). We therefore expect DS-FDR to be widely applied in microbiome studies.

Preventing false discovery of heterogeneous treatment effect subgroups in randomized trials

…an effect estimate within each node as the proportion experiencing the outcome in the treated group minus the proportion experiencing the outcome in the control subset of the trial. Using logic similar to the RF method, the predicted treatment effects are averaged across the 2000 trees to yield an estimated causal effect for each individual in the trial. Gradient RF was carried out using the R package 'grf' [20]. After building the risk model, estimates of δ were again partitioned into subgroups using a classification tree.
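A schematic of this averaging step (not the internals of the 'grf' package; all names below are illustrative): each tree contributes a node-level difference in outcome proportions, and the per-individual estimate is the average over trees.

```python
import numpy as np

def node_effect(outcome, treated):
    """Within-node effect: proportion experiencing the outcome among
    treated minus the proportion among controls (binary 0/1 arrays)."""
    return outcome[treated == 1].mean() - outcome[treated == 0].mean()

def forest_effect(outcome, treated, leaf_ids):
    """Average node-level estimates over trees for each subject.
    leaf_ids has shape (n_trees, n_subjects): the leaf each subject
    falls into in each tree.  Hypothetical sketch."""
    n_trees, n = leaf_ids.shape
    delta = np.zeros(n)
    for tree in range(n_trees):
        for leaf in np.unique(leaf_ids[tree]):
            in_leaf = leaf_ids[tree] == leaf
            arm = treated[in_leaf]
            if arm.all() or not arm.any():
                continue        # leaf lacks one arm; skip (a simplification)
            delta[in_leaf] += node_effect(outcome[in_leaf], arm)
    return delta / n_trees      # estimated causal effect per individual
```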


False Discovery Rate in Linkage and Association Genome Screens for Complex Disorders

We explore the implications of the false discovery rate (FDR) controlling procedure in disease gene mapping. With the aid of simulations, we show how, under models commonly used, the simple step-down procedure introduced by Benjamini and Hochberg controls the FDR for the dependent tests on which linkage and association genome screens are based. This adaptive multiple comparison procedure may offer an important tool for mapping susceptibility genes for complex diseases.


Usage of Machine Learning for Intrusion Detection in a Network

It is evident from Table 1 above that RandomForest offers the highest detection rate and the lowest false-alarm rate in comparison to the other 14 algorithms. RandomForest takes significant time to build its model because it constructs multiple classifiers. The best-performing algorithm with a low model-building time is RandomTree. This model can play a significant role for organizations looking to deploy a real-time network IDS. It can also be beneficial for researchers working on the development of lightweight data mining algorithms.

A New Approach to the Problem of Multiple Comparisons in the Genetic Dissection of Complex Traits

Saturated genetic marker maps are being used to map individual genes affecting quantitative traits. Controlling the "experimentwise" type-I error severely lowers power to detect segregating loci. For preliminary genome scans, we propose controlling the "false discovery rate," that is, the expected proportion of true null hypotheses within the class of rejected null hypotheses. Examples are given based on a granddaughter design analysis of dairy cattle and simulated backcross populations. By controlling the false discovery rate, power to detect true effects is not dependent on the number of tests performed. If no detectable genes are segregating, controlling the false discovery rate is equivalent to controlling the experimentwise error rate. If quantitative trait loci are segregating in the population, statistical power is increased as compared to control of the experimentwise type-I error. The difference between the two criteria increases with the increase in the number of false null hypotheses. The false discovery rate can be controlled at the same level whether the complete genome or only part of it has been analyzed. Additional levels of contrasts, such as multiple traits or pedigrees, can be handled without the necessity of a proportional decrease in the critical test probability.

What p-value must be used as the Statistical Significance Threshold?

Finally, the very impressive Ioannidis paper "Most Published Research Findings Are False" [4] was commented on by Jager & Leek [17], who reported a substantial reduction of the "false discovery rate" to 14%, leading to the conclusion that "the medical literature remains a reliable record of scientific progress". However, the Jager and Leek paper was further criticized by six companion commentaries [18-23] in the same issue of Biostatistics with, not surprisingly, very different judgements and considerations. Indeed, it is instructive to see how many issues can be raised by a statistical method together with its practical realization. However, as a general conclusion, it seems that Ioannidis's drastic and dramatically alarming statement [4] has to be mitigated to some extent. Coming back to the meaning and interpretation of p-values, it is important to stress that Ioannidis reported [3], in line with Wasserstein & Lazar [24], that the most common misinterpretation of p-values, among the multiple ones present in the scientific literature, is that they represent the "probability that the studied hypothesis is true".

Comparative Proteomic Analysis of Candida albicans and Candida glabrata

…as determined by the Pro Group algorithm were used for further analysis. We used false discovery rate analysis with the PSPEP software that is built into ProteinPilot 3.0. The data generated by LC-MS/MS analysis of 30 SCX fractions were searched against the custom protein database, which includes protein sequences belonging to C. albicans and C. glabrata filtered from RefSeq, the Candida Genome Database and the ORF database of Genolevures, using ProteinPilot 3.0 software. Peptides identified in this study are catalogued in Supplementary Table 1. A detected protein threshold of 1.3, which corresponds to a confidence of 95%, was used in the identification and quantitation of proteins.

TCC-GUI: a Shiny-based application for differential expression analysis of RNA-Seq count data

Similar to the original TCC, TCC-GUI can generate simulation data under various conditions in Step 0. The generated data can, of course, be used as input for DE analysis within TCC-GUI, as well as in other tools. The "hypoData" provided as the sample dataset in Step 1 is essentially the same as that generated in Step 0 with almost default settings (except for the proportion of DEGs assigned to individual groups); the total number of genes was 10,000 (N_gene = 10,000), and 20% of the genes were DEGs …
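As a rough analogue of the Step 0 simulation described here (the function name, negative binomial parameterisation, and defaults below are assumptions, not TCC's exact settings), a two-group count matrix with 20% of genes assigned as DEGs can be generated like this:

```python
import numpy as np

def simulate_counts(n_genes=10_000, n_per_group=3, frac_deg=0.20,
                    fold=4.0, mean=50.0, disp=0.1, seed=1):
    """hypoData-like two-group RNA-Seq count matrix: negative binomial
    counts with frac_deg of genes up-regulated in group 1."""
    rng = np.random.default_rng(seed)
    mu = np.full((n_genes, 2 * n_per_group), mean)
    is_deg = rng.random(n_genes) < frac_deg
    mu[is_deg, :n_per_group] *= fold     # DEGs: higher mean in group 1
    # NB with mean mu and dispersion disp: variance = mu + disp * mu^2
    size = 1.0 / disp
    counts = rng.negative_binomial(size, size / (size + mu))
    return counts, is_deg
```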


c83b84d6e5ab7821f60a15c55f422d70c7e272fb.pdf

Figure 1: The Poisson-multivariate normal hierarchical model outperforms SparCC and glasso in a synthetic experiment. a) Frobenius norm of the difference between the partial-correlation-transformed true precision matrix and the estimated precision matrix for each method. The graphical lasso was run jointly over all response variables and covariates, and is therefore suffixed with "w.c." (with covariates). Shaded blue bands represent 2× standard deviation and shaded red bands represent 2× standard error. b) False discovery rate of each method as a function of the number of magnitude-ordered edges called significant. The solid thick line illustrates the average FDR curve across all replicates. The shaded bands illustrate the 5th and 95th percentile FDR curves across all replicates. Network representations of the c) true partial …

Optimal likelihood-ratio multiple testing with application to Alzheimer's disease and questionable dementia

Results: Our findings show that the likelihood-ratio-based false discovery rate method can control the false discovery rate, giving the smallest false non-discovery rate (for a one-sided test) or the smallest expected number of false assignments (for a two-sided test). Even though we assumed independence among voxels, the likelihood-ratio-based false discovery rate method detected more extensive hypometabolic regions in the 22 patients with Alzheimer's disease, as compared to the 44 normal controls, than did the Benjamini-Hochberg false discovery rate method. The contingency and distribution patterns were consistent with those of previous studies. In 24 questionable dementia patients, the proposed likelihood-ratio-based false discovery rate method was able to detect hypometabolism in the medial temporal region.

A non-parametric maximum for number of selected features: objective optima for FDR and significance threshold with application to ordinal survey analysis

High dimensionality is one of the attributes of big data in many fields. As stated by Dutheil and Hobolth [7], the shift from genetics to genomics brings new challenges in data analysis; for example, when many tests are performed, the global false discovery rate (FDR) has to be properly controlled (p. 310). According to Kim and Halabi [12], a vital step in model building is dimension reduction. For example, in clinical studies, it is assumed that there are several variables that are associated with the outcome in the large-dimensional data. The main purpose of variable selection is to identify only …
