Overview of Some Basic Tests - Nonparametric Predictive Methods for Bootstrap and Test Reproduc

In this section we give an overview of some nonparametric tests: the one sample sign test, the one sample signed rank test and the two sample rank sum test. These tests will be used in this chapter and in Chapter 4 to explore the NPI and NPI-B methods for reproducibility probability.

3.2.1 One Sample Sign Test

Perhaps the most basic nonparametric test is the sign test [49, 52, 67]. Suppose we have $n$ real valued random quantities $X_1, X_2, \ldots, X_n$, which are traditionally assumed to be mutually independent and identically distributed with median $\theta$. If $\theta = m_0$, then $P(X_i < m_0) = P(X_i > m_0) = 1/2$ for $i = 1, \ldots, n$. Generally, we test the hypotheses

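\[ H_0 : \theta = m_0 \quad \text{versus} \quad H_1 : \theta \neq m_0, \ \theta > m_0 \ \text{or} \ \theta < m_0 \tag{3.1} \]

This test assumes that the data are iid from a continuous distribution with a positive density. The test statistic $K$ is the number of the $X_i$ that are greater than $m_0$, that is, the number of positive differences $X_i - m_0$, so

\[ K = \sum_{i=1}^{n} I\{X_i > m_0\} \tag{3.2} \]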

with indicator function $I\{A\} = 1$ if $A$ is true and $I\{A\} = 0$ if $A$ is not true, and ignoring the observations which are equal to $m_0$. For the one sided upper tail test $H_1 : \theta > m_0$ with level of significance $\alpha$, we reject $H_0$ if $K \geq b_{\alpha,1/2}$, where $b_{\alpha,1/2}$ is the upper $\alpha$ percentile point of the Binomial distribution with sample size $n$ and success probability $p = 1/2$. For the one sided lower tail test $H_1 : \theta < m_0$ we reject $H_0$ if $K \leq n - b_{\alpha,1/2}$, and for the two sided test $H_1 : \theta \neq m_0$ we reject $H_0$ if $K \geq b_{\alpha/2,1/2}$ or $K \leq n - b_{\alpha/2,1/2}$. The values $b_{\alpha,1/2}$ and $b_{\alpha/2,1/2}$ are tabulated in the literature such that the type I error probability is equal to $\alpha$. For large $n$ we use the standard normal distribution as an approximation, with $\mu_K = n/2$ and $\sigma_K^2 = n/4$. The standardized version of $K$ is $K^* = (K - \mu_K)/\sigma_K$, which is computed here with a continuity correction of $0.5$ as

\[ K^* = \frac{(K + 0.5) - 0.5\,n}{\sqrt{n}/2} \tag{3.3} \]

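We reject $H_0$ if $K^* \geq z_\alpha$ for a one sided upper tail test, if $K^* \leq -z_\alpha$ for a lower tail test, and if $|K^*| \geq z_{\alpha/2}$ for a two sided test, where $z_\alpha$ denotes the upper $\alpha$ percentile point of the standard normal distribution.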
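As a minimal sketch of the sign test described above, the following Python code computes $K$, the exact upper tail probability under the Binomial distribution with $p = 1/2$, and the standardized statistic. The function names and the data are hypothetical and only serve as illustration. Two assumptions are made here: observations equal to $m_0$ are dropped and $n$ is reduced accordingly, and the standardized statistic is written without a continuity correction.

```python
from math import comb, sqrt


def sign_test_statistic(xs, m0):
    """Return (K, n): K counts observations above m0; n excludes values equal to m0."""
    kept = [x for x in xs if x != m0]      # assumption: drop observations equal to m0
    k = sum(1 for x in kept if x > m0)     # sum of indicators I{X_i > m0}
    return k, len(kept)


def binomial_upper_tail(k, n):
    """Exact P(K >= k) when K ~ Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def standardized_sign_statistic(k, n):
    """(K - n/2) / (sqrt(n)/2), without a continuity correction."""
    return (k - n / 2) / (sqrt(n) / 2)


# Made-up data, for illustration only.
data = [5.2, 6.1, 4.8, 7.3, 6.6, 5.9, 6.4, 7.0, 3.9, 6.8]
m0 = 5.0

k, n = sign_test_statistic(data, m0)
p_upper = binomial_upper_tail(k, n)        # exact upper tail p-value
k_star = standardized_sign_statistic(k, n)

print(k, n)      # 8 10
print(p_upper)   # 0.0546875
print(k_star)
```

For the upper tail test, rejecting $H_0$ when this exact tail probability is at most $\alpha$ corresponds to comparing $K$ with a tabulated critical value; for a sample as small as this one, the normal approximation is only a rough guide.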

3.2.2 One Sample Signed Rank Test

The one sample Wilcoxon Signed Rank test (WRS) [49, 52, 67, 69] is an improvement on the sign test if the population is symmetric about the median $m_0$. It is a popular nonparametric location test which takes more information from the sample into account than the sign test. Details about the history of the signed rank test and the corresponding standard frequentist theory, together with tables of critical values for the test statistic and approximations for large samples, can be found in many statistics textbooks, e.g. [41, 49]. Let $X_1, X_2, \ldots, X_n$ be an independent sample from an absolutely continuous, symmetric distribution. The test statistic used in this test is

\[ W = \sum_{X_i > m_0} \operatorname{rank}(|X_i - m_0|) \tag{3.4} \]

where $\operatorname{rank}(|X_i - m_0|)$ is the rank of $|X_i - m_0|$ among the $n$ absolute differences, so the test statistic is the sum of the ranks of the absolute differences for those observations that are greater than $m_0$. The assumption of an absolutely continuous underlying distribution is for convenience, as it avoids the need to deal with ties. If there are ties in the absolute differences in the data, these can be dealt with [41]. For the one sided upper tail test $H_0 : \theta = m_0$ versus $H_1 : \theta > m_0$, we reject $H_0$ if $W \geq W_\alpha$, where $W_\alpha$ is the critical value of the test statistic for significance level $\alpha$. For the one sided lower tail test with $H_1 : \theta < m_0$, we reject $H_0$ if $W \leq \frac{n(n+1)}{2} - W_\alpha$. For the two sided test with $H_1 : \theta \neq m_0$, we reject $H_0$ if $W \geq W_{\alpha/2}$ or $W \leq \frac{n(n+1)}{2} - W_{\alpha/2}$. For large $n$, we use

\[ W^* = \frac{W - \mu_w}{\sigma_w}, \]

which is approximately $N(0, 1)$ distributed under $H_0$, where

\[ \mu_w = \frac{n(n+1)}{4} \quad \text{and} \quad \sigma_w = \sqrt{\frac{n(n+1)(2n+1)}{24}}. \]

We reject $H_0$ if $W^* \geq z_\alpha$ for the upper tail test, if $W^* \leq -z_\alpha$ for the lower tail test, and if $|W^*| \geq z_{\alpha/2}$ for the two sided test.

For ease of presentation we will assume in this thesis that there are no ties in the data. Adapting the NPI-RP approach, which is shown in the next section, for such possible ties is relatively easy, e.g. by breaking the ties in all possible ways and taking the most conservative corresponding lower and upper probabilities. However, this is not of major practical relevance and is not discussed further here.
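The signed rank statistic and its large sample standardization can be sketched in a few lines of Python. The function names and the data below are hypothetical; the code assumes, as in the text, that there are no ties among the absolute differences and that no observation is equal to $m_0$.

```python
from math import sqrt


def signed_rank_statistic(xs, m0):
    """W: sum of the ranks of |x - m0| over the observations with x > m0.

    Assumes no ties among the absolute differences and no x equal to m0.
    """
    diffs = [x - m0 for x in xs]
    # Indices ordered by increasing absolute difference; position + 1 is the rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)


def standardized_signed_rank(w, n):
    """(W - mu_w) / sigma_w with mu_w = n(n+1)/4 and sigma_w^2 = n(n+1)(2n+1)/24."""
    mu_w = n * (n + 1) / 4
    sigma_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w - mu_w) / sigma_w


# Made-up data, for illustration only.
data = [53, 61, 48, 74, 66, 41, 57, 70]
m0 = 50

w = signed_rank_statistic(data, m0)
w_star = standardized_signed_rank(w, len(data))

print(w)        # 31
print(w_star)
```

With $n = 8$ the sum of all ranks is $36$ and the two observations below $m_0$ carry ranks $1$ and $4$, which gives $W = 31$. For such a small sample the tabulated critical values would normally be preferred over the normal approximation.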

3.2.3 Two Sample Rank Sum Test

The comparison of two samples is one of the most common applications of statistical methods, with the two sample rank sum test, also known as the Wilcoxon Mann Whitney test (WMT), the most popular nonparametric test for such scenarios [41, 49]. For this test, data $X_1, X_2, \ldots, X_{n_1}$ are assumed to be an independent and identically distributed sample from a population with cumulative probability distribution $F$, and data $Y_1, Y_2, \ldots, Y_{n_2}$ an independent and identically distributed sample from a population with cumulative probability distribution $G$, where the $X$ and $Y$ observations are also mutually independent. The two sample rank sum test considers the null hypothesis

\[ H_0 : F(t) = G(t) \quad \text{for all real valued } t \tag{3.5} \]

The null hypothesis asserts that the $X$ variable and the $Y$ variable have the same probability distribution, but the common distribution is not specified. The alternative hypothesis specifies that $Y$ tends to be larger (or smaller) than $X$. The model which describes the alternative is called the location shift model

\[ H_1 : G(t) = F(t - \delta) \quad \text{for all } t \tag{3.6} \]

This means that population 2 is the same as population 1 except that it is shifted by the amount $\delta$. It can be written as

\[ Y \overset{d}{=} X + \delta \tag{3.7} \]

and the null hypothesis can be written as

\[ H_0 : \delta = 0 \tag{3.8} \]

and the usual alternative hypotheses are either two sided, that is $\delta \neq 0$, or one sided, so either $\delta > 0$ or $\delta < 0$. In this thesis we restrict attention to the one sided upper tail alternative $H_1 : \delta > 0$. The two sided test involves more complicated combinatorics and is left as a possible topic for future research. As throughout this thesis, we assume that there are no ties in the data set in order to avoid making the presentation more complicated than needed to get the main point of this work across, namely the possibility of using NPI for inference on reproducibility of tests.

If the mean of population 1 is $E(X)$ and the mean of population 2 is $E(Y)$, then under the location shift model

\[ \delta = E(Y) - E(X) \tag{3.9} \]

To compute the Wilcoxon two sample rank sum test statistic $Z$, we order the combined sample of $X$ and $Y$ observations from small to large values. Let $V_j$ be the rank assigned to $Y_j$ in this combined ordering, for $j = 1, \ldots, n_2$, and define the rank sum

\[ Z = \sum_{j=1}^{n_2} V_j \tag{3.10} \]

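The one sided upper tail test is

\[ H_0 : \delta = 0 \quad \text{versus} \quad H_1 : \delta > 0 \tag{3.11} \]

We reject $H_0$ if $Z \geq Z_\alpha$, with the critical value $Z_\alpha$ such that, under the null hypothesis, $P(Z \geq Z_\alpha) = \alpha$ for the chosen level of significance $\alpha$. It is usual to use the sum of the ranks of the smaller sample. The values of $Z_\alpha$ are typically provided in tables [41, 49]. The one sided lower tail test is

\[ H_0 : \delta = 0 \quad \text{versus} \quad H_1 : \delta < 0 \tag{3.12} \]

and we reject $H_0$ if $Z \leq n_2(n_1 + n_2 + 1) - Z_\alpha$.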
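For small samples, the tail probabilities behind the tabulated critical values can be obtained by direct enumeration: under $H_0$ and without ties, every set of $n_2$ ranks out of $1, \ldots, n_1 + n_2$ is equally likely for the $Y$ observations. The following Python sketch illustrates this; the function names and the data are hypothetical, and the enumeration is only feasible for small $n_1$ and $n_2$.

```python
from itertools import combinations
from math import comb


def rank_sum(xs, ys):
    """Z: sum of the ranks of the ys in the combined ordered sample (no ties assumed)."""
    combined = sorted(list(xs) + list(ys))
    rank = {value: i + 1 for i, value in enumerate(combined)}
    return sum(rank[y] for y in ys)


def exact_tail_probabilities(z_obs, n1, n2):
    """Return (P(Z >= z_obs), P(Z <= z_obs)) under H0 by full enumeration."""
    sums = [sum(c) for c in combinations(range(1, n1 + n2 + 1), n2)]
    total = comb(n1 + n2, n2)
    upper = sum(1 for s in sums if s >= z_obs) / total
    lower = sum(1 for s in sums if s <= z_obs) / total
    return upper, lower


# Made-up data, for illustration only.
xs = [1.1, 2.3, 0.7, 1.9]
ys = [2.8, 3.5, 2.0]
n1, n2 = len(xs), len(ys)

z = rank_sum(xs, ys)                                  # ranks 6, 7 and 4
p_upper, _ = exact_tail_probabilities(z, n1, n2)

# Mirror image of z about the centre of the null distribution of Z.
mirror = n2 * (n1 + n2 + 1) - z
_, p_lower_mirror = exact_tail_probabilities(mirror, n1, n2)

print(z)                         # 17
print(p_upper, p_lower_mirror)   # both equal 2/35
```

The equality of the two printed probabilities reflects the symmetry of the null distribution of $Z$ about $n_2(n_1 + n_2 + 1)/2$, which is why the lower tail critical value can be written as $n_2(n_1 + n_2 + 1) - Z_\alpha$.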

The two sided test is

\[ H_0 : \delta = 0 \quad \text{versus} \quad H_1 : \delta \neq 0 \tag{3.13} \]

and we reject $H_0$ if $Z \geq Z_{\alpha/2}$ or $Z \leq n_2(n_1 + n_2 + 1) - Z_{\alpha/2}$. Note that the R code gives, instead of the $Z$ statistic, the Mann Whitney $U$ statistic

\[ U = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \phi(X_i, Y_j) \tag{3.14} \]

where $\phi(X_i, Y_j) = 1$ if $X_i < Y_j$ and $0$ otherwise. The two statistics are related by

\[ Z = U + \frac{n_2(n_2 + 1)}{2} \tag{3.15} \]

which means that tests based on $U$ are equivalent to tests based on $Z$. If the sample sizes are large, we use the standard normal distribution as an approximation for

\[ Z^* = \frac{Z - \mu_z}{\sigma_z} \tag{3.16} \]

where $\mu_z = \frac{n_2(n_1 + n_2 + 1)}{2}$ and $\sigma_z = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$.
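The relation between the rank sum $Z$ and the Mann Whitney $U$ statistic, and the standardization of $Z$, can be checked numerically with a short Python sketch. The function names and the data are hypothetical, no ties are assumed, and the null mean used is that of the rank sum of the $n_2$ observations $Y_1, \ldots, Y_{n_2}$, namely $n_2(n_1 + n_2 + 1)/2$.

```python
from math import sqrt


def rank_sum(xs, ys):
    """Z: sum of the ranks of the ys in the combined ordered sample (no ties assumed)."""
    combined = sorted(list(xs) + list(ys))
    rank = {value: i + 1 for i, value in enumerate(combined)}
    return sum(rank[y] for y in ys)


def mann_whitney_u(xs, ys):
    """U: number of pairs (x, y) with x < y."""
    return sum(1 for x in xs for y in ys if x < y)


def standardized_rank_sum(z, n1, n2):
    """(Z - mu_z) / sigma_z for the rank sum of the n2 Y observations."""
    mu_z = n2 * (n1 + n2 + 1) / 2
    sigma_z = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (z - mu_z) / sigma_z


# Made-up data, for illustration only.
xs = [1.1, 2.3, 0.7, 1.9]
ys = [2.8, 3.5, 2.0]
n1, n2 = len(xs), len(ys)

z = rank_sum(xs, ys)
u = mann_whitney_u(xs, ys)
z_star = standardized_rank_sum(z, n1, n2)

print(z, u)                            # 17 11
print(z == u + n2 * (n2 + 1) // 2)     # True
print(z_star)
```

Because $Z$ and $U$ differ only by the constant $n_2(n_2 + 1)/2$, a rejection rule stated for one statistic translates directly into a rule for the other. The sample sizes here are far too small for the normal approximation to be reliable; the standardized value is shown only to illustrate the formula.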

In document Nonparametric Predictive Methods for Bootstrap and Test Reproducibility (Page 73-77)