In this section we give an overview of three nonparametric tests: the one sample sign test, the one sample signed rank test and the two sample rank sum test. These tests are used in this chapter and in Chapter 4 to explore the NPI and NPI-B methods for reproducibility probability.
3.2.1 One Sample Sign Test
Perhaps the most basic nonparametric test is the sign test [49, 52, 67]. Suppose we have $n$ real valued random quantities $X_1, X_2, \ldots, X_n$, which are traditionally assumed to be mutually independent and identically distributed with median $\theta$. If $\theta = m_0$, then $P(X_i < m_0) = P(X_i > m_0) = 1/2$ for $i = 1, \ldots, n$. Generally, we test the hypotheses
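\[ H_0 : \theta = m_0 \quad \text{versus} \quad H_1 : \theta \neq m_0, \ \theta > m_0 \ \text{or} \ \theta < m_0 \tag{3.1} \]
This test assumes that the data are iid from a continuous distribution with a positive density. The test statistic $K$ is the number of observations $X_i$ that exceed $m_0$, that is, the number of positive differences $X_i - m_0$, so
\[ K = \sum_{i=1}^{n} I\{X_i > m_0\} \tag{3.2} \]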
with indicator function $I\{A\} = 1$ if $A$ is true and $I\{A\} = 0$ if $A$ is not true, and ignoring any observations which are equal to $m_0$. For the one sided upper tail test $H_1 : \theta > m_0$ at significance level $\alpha$, we reject $H_0$ if $K \geq b_{\alpha,1/2}$, where $b_{\alpha,1/2}$ is the upper $\alpha$ percentile point of the Binomial distribution with sample size $n$ and success probability $p = 1/2$. For the one sided lower tail test $H_1 : \theta < m_0$, we reject $H_0$ if $K \leq n - b_{\alpha,1/2}$, and for the two sided test $H_1 : \theta \neq m_0$ we reject $H_0$ if $K \geq b_{\alpha/2,1/2}$ or $K \leq n - b_{\alpha/2,1/2}$. The values $b_{\alpha,1/2}$ and $b_{\alpha/2,1/2}$ are tabulated in the literature. Because $K$ is discrete, the type I error probability can in general only be made approximately equal to $\alpha$. For large $n$ we use the standard normal distribution as an approximation, with $\mu_K = n/2$ and $\sigma_K^2 = n/4$. The standardized version of $K$, $K^* = (K - \mu_K)/\sigma_K$, is computed with a continuity correction of 0.5 as
\[ K^* = \frac{(K + 0.5) - 0.5\,n}{\sqrt{n}/2} \tag{3.3} \]
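We reject $H_0$ if $K^* \geq z_\alpha$ for the one sided upper tail test, if $K^* \leq -z_\alpha$ for the lower tail test, and if $|K^*| \geq z_{\alpha/2}$ for the two sided test, where $z_\alpha$ denotes the upper $\alpha$ percentile point of the standard normal distribution.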
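The exact version of the sign test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sign_test` and the data are hypothetical and not part of the original text, and the p-value is used instead of a tabulated critical value (rejecting when the p-value is at most $\alpha$ corresponds to comparing $K$ with the Binomial percentile point).

```python
from math import comb


def sign_test(xs, m0, alternative="greater"):
    """Exact one sample sign test of H0: theta = m0 (illustrative sketch).

    Returns (K, n, p_value), where n counts the observations not equal to m0.
    """
    # Observations equal to m0 are ignored, as described in the text.
    diffs = [x - m0 for x in xs if x != m0]
    n = len(diffs)
    k = sum(1 for d in diffs if d > 0)  # the statistic K of (3.2)

    def binom_cdf(j):
        # P(K <= j) for K ~ Binomial(n, 1/2); an empty sum gives 0 for j < 0.
        return sum(comb(n, i) for i in range(0, j + 1)) / 2 ** n

    p_upper = 1.0 - binom_cdf(k - 1)  # P(K >= k)
    p_lower = binom_cdf(k)            # P(K <= k)
    if alternative == "greater":
        p_value = p_upper
    elif alternative == "less":
        p_value = p_lower
    else:  # two sided: double the smaller tail, capped at 1
        p_value = min(1.0, 2.0 * min(p_upper, p_lower))
    return k, n, p_value


# Hypothetical data, chosen only to illustrate the computation.
data = [1.2, 3.4, 2.2, 5.1, 0.7, 4.4, 2.9, 3.8]
k, n, p = sign_test(data, m0=2.0, alternative="greater")
print(k, n, p)
```

For these made-up data, six of the eight observations exceed $m_0 = 2$, so $K = 6$ and the exact upper tail p-value is $37/256 \approx 0.145$.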
3.2.2 One Sample Signed Rank Test
The one sample Wilcoxon signed rank test (WRS) [49, 52, 67, 69] is an improvement on the sign test if the population is symmetric about the median $m_0$. It is a popular nonparametric location test which takes more information from the sample into account than the sign test. Details about the history of the signed rank test and the corresponding standard frequentist theory, together with tables of critical values for the test statistic and approximations for large samples, can be found in many statistics textbooks, e.g. [41, 49]. Let $X_1, X_2, \ldots, X_n$ be an independent sample from an absolutely continuous, symmetric distribution. The test statistic used in this test is
\[ W = \sum_{X_i > m_0} \operatorname{rank}(|X_i - m_0|) \tag{3.4} \]
where $\operatorname{rank}(|X_i - m_0|)$ is the rank of $|X_i - m_0|$ among the $n$ absolute differences, so the test statistic is the sum of the ranks of the absolute differences for those observations that are greater than $m_0$. The assumption of an absolutely continuous underlying distribution is for convenience, as it reduces the requirement for dealing with ties. If there are ties in the absolute differences in the data, these can be dealt with [41]. For the one sided upper tail test $H_0 : \theta = m_0$ versus $H_1 : \theta > m_0$, we reject $H_0$ if $W \geq W_\alpha$, where $W_\alpha$ is the critical value of the test statistic for significance level $\alpha$. For the one sided lower tail test with $H_1 : \theta < m_0$, we reject $H_0$ if $W \leq \frac{n(n+1)}{2} - W_\alpha$. For the two sided test with $H_1 : \theta \neq m_0$, we reject $H_0$ if $W \geq W_{\alpha/2}$ or $W \leq \frac{n(n+1)}{2} - W_{\alpha/2}$. For large $n$, we use the standardized statistic $W^* = (W - \mu_w)/\sigma_w$, which is approximately $N(0, 1)$ distributed under $H_0$, where $\mu_w = \frac{n(n+1)}{4}$ and $\sigma_w = \sqrt{\frac{n(n+1)(2n+1)}{24}}$. We reject $H_0$ if $W^* \geq z_\alpha$ for the upper tail test, if $W^* \leq -z_\alpha$ for the lower tail test and if $|W^*| \geq z_{\alpha/2}$ for the two sided test.
For ease of presentation we will assume in this thesis that there are no ties in the data. Adapting the NPI-RP approach, which is shown in the next section, for such possible ties is relatively easy, e.g. by breaking the ties in all possible ways and taking the most conservative corresponding lower and upper probabilities. However, this is not of major practical relevance and is not discussed further here.
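As an illustration of the signed rank test described in this subsection, the following Python sketch computes $W$, its exact upper tail probability under $H_0$, and the large sample standardized value. The function names and data are hypothetical, and the code assumes, as in the text, that there are no ties and no observations equal to $m_0$. The exact probability relies on the standard fact that under $H_0$ each of the $2^n$ assignments of signs to the ranks $1, \ldots, n$ is equally likely; full enumeration is only feasible for small $n$.

```python
from itertools import product
from math import sqrt


def signed_rank_statistic(xs, m0):
    """W of (3.4): sum of the ranks of |x - m0| over observations above m0.

    Assumes no ties among the |x - m0| and no observation equal to m0.
    """
    diffs = [x - m0 for x in xs]
    # Indices ordered by increasing absolute difference.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


def signed_rank_upper_p(w, n):
    """Exact P(W >= w) under H0 by enumerating all 2^n sign patterns."""
    count = 0
    for signs in product((0, 1), repeat=n):
        total = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if total >= w:
            count += 1
    return count / 2 ** n


def signed_rank_standardized(w, n):
    """Large sample standardized statistic W* = (W - mu_w) / sigma_w."""
    mu_w = n * (n + 1) / 4
    sigma_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mu_w) / sigma_w


# Hypothetical data, chosen only to illustrate the computation.
data = [5.3, 1.1, 4.0, 2.4, 6.2, 3.5]
w = signed_rank_statistic(data, m0=3.0)
print(w, signed_rank_upper_p(w, len(data)), signed_rank_standardized(w, len(data)))
```

For these made-up data $W = 15$ with $n = 6$, and the exact upper tail probability is $14/64 \approx 0.219$.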
3.2.3 Two Sample Rank Sum Test
The comparison of two samples is one of the most common applications of statistical methods, with the two sample rank sum test, also known as the Wilcoxon Mann Whitney test (WMT), the most popular nonparametric test for such scenarios [41, 49]. For this test, data $X_1, X_2, \ldots, X_{n_1}$ are assumed to be an independent and identically distributed sample from a population with cumulative probability distribution $F$, and data $Y_1, Y_2, \ldots, Y_{n_2}$ an independent and identically distributed sample from a population with cumulative probability distribution $G$, where the $X$ and $Y$ observations are also mutually independent. The two sample rank sum test considers the null hypothesis
\[ H_0 : F(t) = G(t) \quad \text{for all real valued } t \tag{3.5} \]
The null hypothesis asserts that the $X$ variable and the $Y$ variable have the same probability distribution, but the common distribution is not specified. The alternative hypothesis specifies that $Y$ tends to be larger (or smaller) than $X$. The model which describes the alternative is called the location shift model,
\[ H_1 : G(t) = F(t - \delta) \quad \text{for all } t \tag{3.6} \]
This means that population 2 is the same as population 1 except that it is shifted by the amount $\delta$. It can be written as
\[ Y \overset{d}{=} X + \delta \tag{3.7} \]
and the null hypothesis can be written as
\[ H_0 : \delta = 0 \tag{3.8} \]
and the usual alternative hypotheses are either two sided, that is $\delta \neq 0$, or one sided, so either $\delta > 0$ or $\delta < 0$. In this thesis we restrict attention to the one sided upper tail alternative $H_1 : \delta > 0$. The two sided test involves more complicated combinatorics and is left as a possible topic for future research. As throughout this thesis, we assume that there are no ties in the data set in order to avoid making the presentation more complicated than needed to get the main point of this work across, namely the possibility of using NPI for inference on reproducibility of tests.
If the mean of population 1 is $E(X)$ and the mean of population 2 is $E(Y)$, then
\[ \delta = E(Y) - E(X) \tag{3.9} \]
To compute the Wilcoxon two sample rank sum test statistic $Z$, we order the combined sample of $n_1 + n_2$ observations of $X$ and $Y$ from small to large values. Let $V_j$ be the rank of $Y_j$ in this combined ordering, for $j = 1, \ldots, n_2$, and define the rank sum
\[ Z = \sum_{j=1}^{n_2} V_j \tag{3.10} \]
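The rank sum in (3.10) can be computed as in the following minimal Python sketch. The function name `rank_sum` and the data are hypothetical, and the code assumes that there are no ties in the combined sample.

```python
def rank_sum(xs, ys):
    """Z of (3.10): sum of the ranks of the ys in the combined ordered sample.

    Assumes that all values in xs and ys are distinct (no ties).
    """
    combined = sorted(list(xs) + list(ys))
    # Rank 1 is given to the smallest value in the combined sample.
    rank_of = {value: r for r, value in enumerate(combined, start=1)}
    return sum(rank_of[y] for y in ys)


# Hypothetical data with n1 = 4 and n2 = 3.
x_sample = [1.1, 2.5, 3.0, 4.8]
y_sample = [2.0, 3.6, 5.2]
print(rank_sum(x_sample, y_sample))
```

Here the $Y$ observations have ranks 2, 5 and 7 in the combined sample, so $Z = 14$.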
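The one sided upper tail test is
\[ H_0 : \delta = 0 \quad \text{versus} \quad H_1 : \delta > 0 \tag{3.11} \]
and we reject $H_0$ if $Z \geq Z_\alpha$, with the critical value $Z_\alpha$ such that, under the null hypothesis, $P(Z \geq Z_\alpha) = \alpha$ for the chosen level of significance $\alpha$. It is customary to use the sum of the ranks of the smaller sample. The values of $Z_\alpha$ are typically provided in tables [41, 49]. The one sided lower tail test is
\[ H_0 : \delta = 0 \quad \text{versus} \quad H_1 : \delta < 0 \tag{3.12} \]
and we reject $H_0$ if $Z \leq n_2(n_1 + n_2 + 1) - Z_\alpha$.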
The two sided test is
\[ H_0 : \delta = 0 \quad \text{versus} \quad H_1 : \delta \neq 0 \tag{3.13} \]
and we reject $H_0$ if $Z \geq Z_{\alpha/2}$ or $Z \leq n_2(n_1 + n_2 + 1) - Z_{\alpha/2}$. Note that the R code gives, instead of the $Z$ statistic, the $U$ statistic, which is called the Mann Whitney $U$ statistic,
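\[ U = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \phi(X_i, Y_j) \tag{3.14} \]
where $\phi(X_i, Y_j) = 1$ if $X_i < Y_j$ and 0 otherwise. The two statistics are related by
\[ Z = U + \frac{n_2(n_2 + 1)}{2} \tag{3.15} \]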
This means that tests based on $U$ are equivalent to tests based on $Z$. If the sample sizes $n_1$ and $n_2$ are large, we use the standard normal distribution as an approximation for the distribution of the standardized statistic
\[ Z^* = \frac{Z - \mu_z}{\sigma_z} \tag{3.16} \]
where $\mu_z = \frac{n_2(n_1 + n_2 + 1)}{2}$ and $\sigma_z = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$.
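The relation between $U$ and $Z$ and the two ways of assessing significance can be illustrated with the following Python sketch. The function names and data are hypothetical. The exact upper tail probability uses the standard fact that, under $H_0$ and without ties, every set of $n_2$ ranks out of $1, \ldots, n_1 + n_2$ is equally likely for the $Y$ sample, and the normal approximation uses the null mean $n_2(n_1 + n_2 + 1)/2$ of the sum of $n_2$ ranks. Enumeration is only practical for small samples.

```python
from itertools import combinations
from math import comb, sqrt


def mann_whitney_u(xs, ys):
    """U of (3.14): number of pairs (x, y) with x < y. Assumes no ties."""
    return sum(1 for x in xs for y in ys if x < y)


def rank_sum_upper_p(z, n1, n2):
    """Exact P(Z >= z) under H0 by enumerating all rank sets of size n2."""
    total = comb(n1 + n2, n2)
    hits = sum(
        1
        for ranks in combinations(range(1, n1 + n2 + 1), n2)
        if sum(ranks) >= z
    )
    return hits / total


def rank_sum_standardized(z, n1, n2):
    """Large sample standardized statistic Z* = (Z - mu_z) / sigma_z."""
    mu_z = n2 * (n1 + n2 + 1) / 2
    sigma_z = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (z - mu_z) / sigma_z


# Hypothetical data with n1 = 4 and n2 = 3.
x_sample = [1.1, 2.5, 3.0, 4.8]
y_sample = [2.0, 3.6, 5.2]
n1, n2 = len(x_sample), len(y_sample)

u = mann_whitney_u(x_sample, y_sample)
z = u + n2 * (n2 + 1) // 2  # relation (3.15)
print(u, z, rank_sum_upper_p(z, n1, n2), rank_sum_standardized(z, n1, n2))
```

For these made-up data $U = 8$ and $Z = 8 + 6 = 14$, with exact upper tail probability $11/35 \approx 0.314$. With samples this small the normal approximation is shown only to illustrate the formula.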