Multiple Testing - Statistical Learning and Behrens Fisher Distribution Methods for Heterosceda

To select the differentially expressed (DE) genes from among thousands of genes, one hypothesis is set up per gene, so thousands of hypotheses must be tested simultaneously. Running so many tests inflates the number of false positives. This is known as the multiple testing problem. The simultaneous hypotheses are given in (2.3.1).

Number of genes    Declared non-DE    Declared DE    Total
True non-DE        U                  V              m0
True DE            T                  S              p − m0
Total              p − R              R              p

Table 2.1: Multiple testing procedure in simultaneous hypothesis testing

$$
H_0: \begin{cases}
\text{Gene 1 is not differentially expressed} \\
\text{Gene 2 is not differentially expressed} \\
\quad\vdots \\
\text{Gene } p \text{ is not differentially expressed}
\end{cases}
\tag{2.3.1}
$$

$H_1$: At least one gene is differentially expressed.

When the simultaneous hypotheses are tested, the outcomes can be counted as in Table 2.1. As in testing a single hypothesis, the idea is to control the number of false positives, $V$. This number is a random variable whose value differs from one experiment to another. Let $\alpha$ be the type-I error made when testing gene $i$. Then $\alpha$ is the probability of rejecting $H_0$ when $H_0$ is in fact true. In terms of the hypotheses above, $\alpha$ is the probability of declaring a non-DE gene to be differentially expressed; such a gene is called a false positive. The expected number of false positives among $p$ genes is therefore at most $\alpha p$. Since $p$ is a very large integer, the number of false positives in the experiment can be large even when $\alpha$ is chosen to be small.

Equivalently, the probability of not declaring a non-DE gene to be differentially expressed, that is, of making the right decision for that gene, is $1 - \alpha$. Assuming the tests are independent, the probability of making the correct decision for all $p$ genes is the product of the gene-wise probabilities, $(1 - \alpha)^p$. As $p$ increases, this probability decreases. So the probability of at least one false positive somewhere is:

$$P(V \geq 1) = 1 - (1 - \alpha)^p \tag{2.3.2}$$

This type-I error is also called the family-wise error rate (FWER). There are a few approaches to controlling the FWER. If the false positive (FP) rate is the error measure used, then a simple p-value threshold of $\alpha$ guarantees that the expected number of false positives, $V$, when testing all $p$ hypotheses (genes) satisfies $E(V) \leq \alpha p$.
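To get a feel for how quickly these quantities grow with the number of genes, the following is a minimal sketch in Python. It assumes independent tests with every null hypothesis true, so that the FWER is the complement of the probability $(1 - \alpha)^p$ of making no error. The function names are hypothetical and chosen only for this illustration.

```python
def expected_false_positives(alpha: float, p: int) -> float:
    """Upper bound alpha * p on the expected number of false positives."""
    return alpha * p


def fwer_independent(alpha: float, p: int) -> float:
    """Probability of at least one false positive among p tests.

    Assumption: the p tests are independent and all null hypotheses
    are true, so P(no false positive) = (1 - alpha) ** p.
    """
    return 1.0 - (1.0 - alpha) ** p


if __name__ == "__main__":
    alpha = 0.05
    for p in (1, 10, 100, 1000, 10000):
        print(
            f"p={p:>6}  E(V) <= {expected_false_positives(alpha, p):8.1f}"
            f"  FWER = {fwer_independent(alpha, p):.4f}"
        )
```

Under these assumptions the FWER is already above 0.99 for 100 genes at $\alpha = 0.05$, which is why an uncorrected gene-wise threshold is not usable at microarray scale.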

1. Sidak and Bonferroni Correction

This multiple testing correction is one of the earliest methods, introduced by Bonferroni [54]. Let $p$ be the number of tests performed, one for each gene. Consider the problem of achieving a global significance level $\alpha$. What gene-wise significance level $\alpha_g$ should be specified to achieve this goal? From (2.3.2), this means that

$$\alpha = 1 - (1 - \alpha_g)^p$$

or,

$$\alpha_g = 1 - (1 - \alpha)^{1/p} \tag{2.3.3}$$

Equation (2.3.3) is called the Sidak correction for multiple testing. It means that to achieve the global significance level $\alpha$, we have to set the significance level for each gene to $1 - (1 - \alpha)^{1/p}$. Expanding $(1 - \alpha_g)^p$ by the binomial theorem and keeping the first two terms gives the Bonferroni correction:

$$\alpha = 1 - (1 - \alpha_g)^p = 1 - (1 - p\alpha_g + \cdots) \approx p\alpha_g \tag{2.3.4}$$

or,

$$\alpha_g = \frac{\alpha}{p} \tag{2.3.5}$$
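The two gene-wise thresholds can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, and the Sidak level is computed with `math.expm1` and `math.log1p`, which is algebraically the same as $1 - (1 - \alpha)^{1/p}$ but loses less precision when $p$ is large.

```python
import math


def sidak_threshold(alpha: float, p: int) -> float:
    """Gene-wise level 1 - (1 - alpha)^(1/p), as in (2.3.3).

    Written as -expm1(log1p(-alpha) / p) for numerical stability;
    this is the same expression rearranged, not a different method.
    """
    return -math.expm1(math.log1p(-alpha) / p)


def bonferroni_threshold(alpha: float, p: int) -> float:
    """Gene-wise level alpha / p, as in (2.3.5)."""
    return alpha / p


if __name__ == "__main__":
    alpha = 0.05
    for p in (10, 1000, 10000):
        s = sidak_threshold(alpha, p)
        b = bonferroni_threshold(alpha, p)
        print(f"p={p:>6}  Sidak={s:.3e}  Bonferroni={b:.3e}")
```

The Bonferroni level is always slightly smaller than the Sidak level, so it is the marginally more conservative of the two, and the gap between them is small for the values of $\alpha$ typically used.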

Equation (2.3.5) means that the significance level for each gene is set to $\alpha$ divided by the number of genes (tests to be performed). In other words, if the error measure is the FWER, then the probability that at least one false positive gene is selected by the rule does not exceed $\alpha$ when we set $\alpha_g = \alpha/p$. Since this gene-wise significance level becomes very small when $p$ is large, the following method was proposed.

2. Holm's Step-wise Correction

While the Sidak and Bonferroni approaches [54] are effective in avoiding too many false positives, in the worst case none of the genes is declared significantly expressed, because the same significance level is used for all tests (genes) and the significance level $\alpha_g = \alpha/p$ is too small for large $p$. These methods are therefore too conservative, in the sense that they select only the most strongly DE genes. Because the large number of hypotheses, $p$, produces many false positives by chance, it is more appropriate to choose the significance level for a gene according to the rank of its p-value. Holm's method applies the strictest threshold to the smallest p-value and progressively more lenient thresholds to larger p-values.

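Algorithm:

1. Choose the global significance level $\alpha$.

2. Order the genes according to their p-values in ascending order.

3. Compare the p-value $p_i$ of the $i$-th gene in the ordered list with the threshold $\tau_i = \frac{\alpha}{p - i + 1}$.

4. Report gene $i$ in the ordered list as significantly expressed if $p_i < \tau_i$. In Holm's step-down procedure, stop at the first gene for which this fails; that gene and all later genes in the list are not reported.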

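Here the threshold for each gene is chosen according to the rank of its p-value. The smaller the p-value of a gene, the smaller its rank $i$, and hence the smaller the threshold $\tau_i$. The thresholds range from $\alpha/p$ for the smallest p-value up to $\alpha$ for the largest, so no threshold is stricter than the Bonferroni level. This makes more sense than a uniform significance level.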
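The algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names are hypothetical, the strict comparison $p_i < \tau_i$ follows the text, and stopping at the first non-significant gene is taken from the standard step-down formulation of Holm's procedure. A Bonferroni rule is included only for comparison.

```python
from typing import List, Sequence


def holm_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each gene in its original position, whether Holm's
    step-down rule reports it as significant at global level alpha."""
    m = len(p_values)  # number of genes (called p in the text)
    # Gene indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda k: p_values[k])
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        threshold = alpha / (m - rank + 1)  # tau_i = alpha / (p - i + 1)
        if p_values[idx] < threshold:
            reject[idx] = True
        else:
            break  # step-down: stop at the first failure
    return reject


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Uniform threshold alpha / p for every gene, for comparison."""
    m = len(p_values)
    return [pv < alpha / m for pv in p_values]


if __name__ == "__main__":
    # Made-up p-values for four genes, purely for illustration.
    pvals = [0.04, 0.001, 0.02, 0.013]
    print("Holm:      ", holm_reject(pvals))
    print("Bonferroni:", bonferroni_reject(pvals))
```

With these made-up p-values, Holm's thresholds are 0.0125, 0.0167, 0.025 and 0.05 in rank order, so all four genes are reported, whereas the uniform Bonferroni threshold of 0.0125 reports only the gene with p-value 0.001.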

In document Statistical Learning and Behrens Fisher Distribution Methods for Heteroscedastic Data in Microarray Analysis (Page 32-35)