Parametric and non-parametric statistical methods for the life sciences - Session I


Outline: Why nonparametric methods? What test to use? Rank tests


Liesbeth Bruckers Geert Molenberghs

Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-Biostat)

Universiteit Hasselt June 7, 2011



Why nonparametric methods ?


Introductory Example

The paper "Hypertension in Terminal Renal Failure, Observations Pre and Post Bilateral Nephrectomy" (J. Chronic Diseases, 1973: 471-501) gave blood pressure readings for five terminal renal patients before and 2 months after surgery (removal of kidney).

Patient           1    2    3    4    5
Before surgery  107  102   95  106  112
After surgery    87   97  101  113   80

Question: Does the mean blood pressure before surgery exceed the mean blood pressure two months after surgery ?


Classical Approach

Paired t-test:

Patient           1    2    3    4    5
Before surgery  107  102   95  106  112
After surgery    87   97  101  113   80
Difference D_i   20    5   -6   -7   32

Hypotheses: H₀: µ_d = 0 versus H₁: µ_d > 0, where µ_d is the mean difference in blood pressure

Test statistic:

$t = \dfrac{\bar{D}}{\sqrt{\dfrac{1}{n(n-1)} \sum_{i=1}^{n} (D_i - \bar{D})^2}}$

follows a t distribution with n − 1 d.f.
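As a minimal sketch, the paired t statistic can be computed directly from the blood pressure readings above using only the Python standard library. The variable names are illustrative and not part of the original slides.

```python
import math

before = [107, 102, 95, 106, 112]
after = [87, 97, 101, 113, 80]

# Differences D_i = before - after for each patient
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_d = sum(diffs) / n

# Standard error of the mean difference: sqrt( sum (D_i - D_bar)^2 / (n (n - 1)) )
se_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n * (n - 1)))

t_stat = mean_d / se_d  # compare with a t distribution with n - 1 d.f.
print(diffs, mean_d, round(t_stat, 3))
```

The one-sided p-value would then be read from a t distribution with n − 1 = 4 d.f.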


Assumptions

The statistic follows a t-distribution if the differences are normally distributed ⇒ t-test = parametric method

Observations are independent: selection of a patient does not influence the chance of any other patient being included

(Two-sample t-test): populations must have the same variance

Variables must be measured on an interval scale for the results to be interpretable

These assumptions are often not tested, but accepted.


Normal probability plot

Normality is questionable !


Nonparametric Test of Hypotheses

Follow same general procedure as parametric tests:

State null and alternative hypothesis

Calculate the value of the appropriate test statistic (choice based on the design of the study)

Decision rule: reject or do not reject H₀, depending on the magnitude of the statistic

P_H₀(T ≥ c) = ??

Exact distribution

Approximation for the exact distribution


When to use what test


What test to use ?

Choice of appropriate test statistic depends on the design of the study:

number of groups ?

independent or dependent samples ?

ordered alternative hypothesis ?


Two Independent Samples

Permeability constants of the human chorioamnion (a placental membrane) for at term (x) and between 12 to 26 weeks gestational age (y) pregnancies are given in the table below. Investigate the alternative of interest that the permeability of the human chorioamnion for a term pregnancy is greater than for a 12 to 26 weeks of gestational age pregnancy.

X (at term)      0.83 1.89 1.04 1.45 1.38 1.91 1.64 1.46
Y (12-26 weeks)  1.15 0.88 0.90 0.74 1.21

Statistical Methods:

t-test

Wilcoxon Rank Sum Test


More Than Two Independent Samples

Protoporphyrin levels were determined for three groups of people: a control group of normal workers, a group of alcoholics with sideroblasts in their bone marrow, and a group of alcoholics without sideroblasts. The data are shown below. Do the data suggest that normal workers and alcoholics with and without sideroblasts differ with respect to protoporphyrin level ?

Group Protoporphyrin level (mg)

Normal 22 27 47 30 38 78 28 58 72 56

Alcoholics with sideroblasts 78 172 286 82 453 513 174 915 84 153

Alcoholics without sideroblasts 37 28 38 45 47 29 34 20 68 12

ANOVA

Kruskal-Wallis Test


Two Dependent Samples

Twelve adult males were put on a liquid diet in a weight-reducing plan. Weights were recorded before and after the diet. The data are shown in the table below.

Subject 1 2 3 4 5 6 7 8 9 10 11 12

Before 186 171 177 168 191 172 177 191 170 171 188 187

After 188 177 176 169 196 172 165 190 165 180 181 172

Paired t-test

Sign test; Signed-rank test


Randomized Blocked Design

Effect of Hypnosis:

Emotions of fear, happiness, depression and calmness were requested (in random order) from 8 subjects during hypnosis

Response: skin potential (in millivolts)

Subject        1     2     3     4     5     6     7     8
Fear        23.1  57.6  10.5  23.6  11.9  54.6  21.0  20.3
Happiness   22.7  53.2   9.7  19.6  13.8  47.1  13.6  23.6
Depression  22.5  53.7  10.8  21.1  13.7  39.2  13.7  16.3
Calmness    22.6  53.1   8.3  21.6  13.3  37.0  14.8  14.8

Statistical Methods:

Mixed Models

Friedman test


Ordered Treatments

Patients were treated with a drug at four dose levels (100mg, 200mg, 300mg and 400mg) and then monitored for toxicity.

Drug Toxicity

Dose Mild Moderate Severe Drug Death

100mg 100 1 0 0

200mg 18 1 1 0

300mg 50 1 1 0

400mg 50 1 1 1

Regression

Jonckheere-Terpstra Test


Wilcoxon Rank Sum Test


Wilcoxon Rank Sum Test

Detailed Example:

Data : GAF scores

Control    25 10 35
Treatment  36 26 40

Does treatment improve the functioning ?


Parametric Approach: t-test

$t = \dfrac{\bar{X}_1 - \bar{X}_0}{S_{\bar{X}_1 - \bar{X}_0}}$, where $S_{\bar{X}_1 - \bar{X}_0} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}}$

t-test: tests whether the means of two normally distributed populations are equal

H₀: µ₁ = µ₀

H₁: µ₁ ≠ µ₀ (one-sided test: H₁: µ₁ > µ₀)

equal sample sizes

two distributions have the same variance

$\bar{X}_1 = 34.00$, $\bar{X}_0 = 23.33$, $S_{X_1} = 7.21$, $S_{X_0} = 12.58$

t = 1.27

P_H₀(t ≥ 1.27) = 0.1358
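The t statistic reported on this slide can be reproduced from the GAF scores with a short standard-library sketch. It uses the standard error formula given above; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

control = [25, 10, 35]
treatment = [36, 26, 40]

# Sample standard deviations (denominator n - 1), as reported on the slide
s1, s0 = stdev(treatment), stdev(control)
n1, n0 = len(treatment), len(control)

# Standard error of the difference in means: sqrt(s1^2 / n1 + s0^2 / n0)
se_diff = math.sqrt(s1 ** 2 / n1 + s0 ** 2 / n0)
t_stat = (mean(treatment) - mean(control)) / se_diff

print(round(s1, 2), round(s0, 2), round(t_stat, 2))
```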


Wilcoxon Rank Sum Test

Detailed Example:

Control    25 10 35
Treatment  36 26 40

Order data: Position of patients on treatment as compared with position of patients in control arm ?

Ranks


Treatment is effective if treated patients rank sufficiently high in the combined ranking of all patients

Test statistic such that:

treatment ranks are high ⇔ value of the test statistic is high

treatment ranks are low ⇔ value of the test statistic is low

W_S = S_1 + S_2 + ... + S_n (n = 3, the number of patients in the treatment arm)

Ranks (observed scores in parentheses)

Control    2 (25)  1 (10)  4 (35)
Treatment  5 (36)  3 (26)  6 (40)

W_S = 5 + 3 + 6 = 14
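The ranking step can be sketched in a few lines of Python. This version assumes there are no tied scores, which holds for the six GAF values in this example; with ties, midranks would be needed instead.

```python
control = [25, 10, 35]
treatment = [36, 26, 40]

# Rank all observations in the combined sample (1 = smallest); assumes no ties
combined = sorted(control + treatment)
rank = {value: position + 1 for position, value in enumerate(combined)}

control_ranks = [rank[v] for v in control]
treatment_ranks = [rank[v] for v in treatment]

# Wilcoxon rank sum statistic: sum of the ranks in the treatment arm
w_s = sum(treatment_ranks)
print(control_ranks, treatment_ranks, w_s)
```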


Reject the null hypothesis when W_S is sufficiently large: W_S ≥ c, with P_H₀(W_S ≥ c) = α (α = 0.05)

Distribution of W_S under H₀ ?

Suppose no treatment effect (H₀):

rank is solely determined by the patient's health status

rank is independent of receiving treatment or placebo

“rank is assigned to patient before randomisation”

Random selection of patients for treatment ⇒ random selection of 3 ranks out of 6

Randomisation divides the ranks (1, 2, ..., 6) into two groups !

Number of possible combinations: $\binom{N}{n} = \dfrac{N!}{n!\,(N-n)!}$; here $\binom{6}{3} = 20$


All possibilities (each has a probability of 1/20 under H₀):

treatment ranks  (4,5,6) (3,5,6) (3,4,6) (3,4,5) (2,5,6)
w_s                 15      14      13      12      13

treatment ranks  (2,4,6) (2,4,5) (2,3,6) (2,3,5) (2,3,4)
w_s                 12      11      11      10       9

treatment ranks  (1,5,6) (1,4,6) (1,4,5) (1,3,6) (1,3,5)
w_s                 12      11      10      10       9

treatment ranks  (1,3,4) (1,2,6) (1,2,5) (1,2,4) (1,2,3)
w_s                  8       9       8       7       6


Distribution of W_S under the null hypothesis:

w              6     7     8     9     10    11    12    13    14    15
P_H₀(W_S = w)  1/20  1/20  2/20  3/20  3/20  3/20  3/20  2/20  1/20  1/20


P_H₀(W_S ≥ 14) = 2/20 = 0.1

Do not reject H₀.
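The exact null distribution of W_S for this small example can be reproduced by brute-force enumeration. The sketch below lists every way of choosing 3 treatment ranks out of 6 and tabulates the rank sums; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

N, n = 6, 3  # total number of patients, number in the treatment arm

# Under H0 every subset of n ranks out of 1..N is equally likely
subsets = list(combinations(range(1, N + 1), n))
counts = Counter(sum(subset) for subset in subsets)

# Exact null distribution P(W_S = w)
null_dist = {w: Fraction(c, len(subsets)) for w, c in sorted(counts.items())}

# One-sided exact p-value for the observed W_S = 14
p_value = sum(p for w, p in null_dist.items() if w >= 14)
print(len(subsets), dict(sorted(counts.items())), p_value)
```

With only 20 equally likely configurations, the smallest attainable one-sided p-value is 1/20 = 0.05, reached only when the treatment arm holds ranks (4,5,6).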

Conclusion: Treatment does not increase the GAF scores.

Power of this study ???


Large Sample Size-case

$\binom{N}{n}$ increases rapidly with N and n:

$\binom{20}{10} = 184756$

$\binom{12}{6} = 924$

Asymptotic Null Distribution: Central Limit Theorem

The sum T of a large number of independent random variables is approximately normally distributed.

$P\left( \dfrac{T - E(T)}{\sqrt{\mathrm{Var}(T)}} \le a \right) \approx \Phi(a)$

where Φ(a) is the area to the left of a under a standard normal curve


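If both n and m are sufficiently large (m = N − n is the number of patients in the control arm):

$W_S \approx N\left(E(W_S);\ \sqrt{\mathrm{Var}(W_S)}\right)$

$E(W_S) = \tfrac{1}{2}\, n (N + 1)$

$\mathrm{Var}(W_S) = \tfrac{1}{12}\, n m (N + 1)$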
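To illustrate the normal approximation, the sketch below applies the mean and variance formulas above to the small GAF example (n = 3 treated, m = 3 controls, observed W_S = 14). It assumes m = N − n is the control group size and applies no continuity correction. The example is far too small for the approximation to be accurate, which is the point of the comparison with the exact value of 0.1.

```python
import math

def normal_cdf(a):
    """Area to the left of a under the standard normal curve."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

n, m = 3, 3          # treatment and control group sizes (m = N - n, assumed)
N = n + m
w_obs = 14           # observed rank sum in the treatment arm

mean_w = n * (N + 1) / 2
var_w = n * m * (N + 1) / 12
z = (w_obs - mean_w) / math.sqrt(var_w)
p_approx = 1.0 - normal_cdf(z)   # approximate P(W_S >= 14), no continuity correction

# Number of configurations an exact test would have to enumerate
print(math.comb(12, 6), math.comb(20, 10))
print(mean_w, var_w, round(z, 3), round(p_approx, 4))
```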


Kruskal-Wallis Test


Kruskal-Wallis test

Example:

The following data represent corn yields per acre from three different fields where different farming methods were used.

Method 1  Method 2  Method 3
   92        94       101
   91        90       100
   84        81        93
   89                 102

Question: do the yields differ among the 3 methods ?


Parametric Approach One-way ANOVA

Statistical test of whether or not the means of several groups are all equal

Assumptions:

Independence of cases

The residuals are normally distributed: $\varepsilon_i \sim N(0, \sigma^2)$.

Homoscedasticity

$F = \dfrac{\text{variance between groups}}{\text{variance within groups}} = \dfrac{MSTR}{MSE}$

The statistic follows an F distribution with s − 1 and n − s d.f.


[Figures: illustration of a small F and of a large F]


One-Way ANOVA results

$\bar{X}_1 = 89$, $\bar{X}_2 = 88.33$, $\bar{X}_3 = 99$

σ1 = 3.56, σ2 = 6.65, σ3 = 4.08

MSTR = 135.03, MSE = 22.08

F = 6.11

P_H₀(F ≥ 6.11) = 0.0245
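The ANOVA quantities on this slide can be reproduced from the corn yield data with a short standard-library sketch. MSTR is the between-group sum of squares divided by s − 1, and MSE is the within-group sum of squares divided by n − s; the variable names are illustrative.

```python
from statistics import mean

groups = {
    "Method 1": [92, 91, 84, 89],
    "Method 2": [94, 90, 81],
    "Method 3": [101, 100, 93, 102],
}

all_values = [y for values in groups.values() for y in values]
grand_mean = mean(all_values)
s = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Between-group and within-group sums of squares
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum((y - mean(v)) ** 2 for v in groups.values() for y in v)

mstr = ss_between / (s - 1)
mse = ss_within / (n_total - s)
f_stat = mstr / mse
print(round(mstr, 2), round(mse, 2), round(f_stat, 2))
```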


Ranks:

Method 1  Method 2  Method 3
    6         8        10
    5         4         9
    1         2         7
    3                  11

Mean ranks R_{i.}:  3.75  4.67  9.25


Hypothesis :

H₀: No difference between the treatments

H₁: Any difference between the treatments

If treatments do not differ widely (H₀):

the R_{i.} are close to each other

the R_{i.} are close to R_{..}

If treatments do differ (H₁):

the R_{i.} differ substantially

the R_{i.} are not close to R_{..}


Evaluate the null hypothesis by investigating:

$K = \dfrac{12}{N(N+1)} \sum_{i=1}^{s} n_i \left(R_{i.} - R_{..}\right)^2$

P_H₀(K ≥ c) = ?

Exact distribution of K under H0 :

ranks are determined before assignment to treatment

random assignment → all possibilities have the same chance of being observed

Number of possible combinations: multinomial coefficient:

$\binom{11}{4,3,4} = \binom{11}{4}\binom{7}{3}\binom{4}{4} = 11550$

$\binom{N}{n_1, n_2, \ldots, n_s} = \binom{N}{n_1}\binom{N-n_1}{n_2}\cdots\binom{N-n_1-\ldots-n_{s-1}}{n_s}$


A few possible configurations:

Method 1   Method 2  Method 3      K
(1,2,3,4)  (5,6,7)   (8,9,10,11)  8.91
(1,2,3,5)  (4,6,7)   (8,9,10,11)  8.32
(1,2,3,6)  (4,5,7)   (8,9,10,11)  7.84
(1,2,3,7)  (4,5,6)   (8,9,10,11)  7.48
. . .
(1,3,5,6)  (2,4,8)   (7,9,10,11)  6.16
. . .

Each configuration has a probability of 1/11550 under H₀.


Exact Distribution of K :

P_H₀(K ≥ 6.16) = 0.0306

Conclusion: Reject H₀: there is a difference between the farming methods

Large sample size approximation: χ² distribution with s − 1 d.f.
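The exact Kruskal-Wallis test can be sketched by enumerating all 11550 ways of splitting the ranks 1 to 11 into groups of sizes 4, 3 and 4. The code below takes as its observed configuration the ranks exactly as they appear on the slides, (1,3,5,6), (2,4,8), (7,9,10,11). The function name and the small numerical tolerance are illustrative choices, not part of the original material.

```python
from itertools import combinations

def kruskal_wallis(groups):
    """K = 12 / (N (N + 1)) * sum_i n_i * (mean rank of group i - (N + 1) / 2)^2."""
    n_total = sum(len(g) for g in groups)
    overall_mean_rank = (n_total + 1) / 2
    spread = sum(len(g) * (sum(g) / len(g) - overall_mean_rank) ** 2 for g in groups)
    return 12 / (n_total * (n_total + 1)) * spread

# Rank configuration as shown on the slides
observed = [(1, 3, 5, 6), (2, 4, 8), (7, 9, 10, 11)]
k_obs = kruskal_wallis(observed)

ranks = range(1, 12)
n_configs = 0
n_extreme = 0
for g1 in combinations(ranks, 4):
    rest = [r for r in ranks if r not in g1]
    for g2 in combinations(rest, 3):
        g3 = [r for r in rest if r not in g2]
        n_configs += 1
        # Small tolerance so that configurations tied with k_obs are counted
        if kruskal_wallis([g1, g2, g3]) >= k_obs - 1e-9:
            n_extreme += 1

p_exact = n_extreme / n_configs  # exact P(K >= k_obs) under H0
print(n_configs, round(k_obs, 2), round(p_exact, 4))
```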


Friedman Test


Friedman Statistic

Setting 1: complete randomization:

Kruskal-Wallis test p-value = 0.8611

Treatment effect is blurred by the variability between subjects

Setting 2: randomisation within age groups:

p-value = 0.0411

Conclusion: reject H₀


Procedure

Divide subjects in homogeneous subgroups (BLOCKS)

Compare subjects within the blocks w.r.t. treatment effects

(Generalisation of the paired comparison design)


Example

Data

Age-group

treatment 20-30 y 30-40 y 40-50 y 50-60 y

A 19 21 43 46

B 17 20 37 44

C 23 22 39 42

Rank subjects within a block:

Age-group

treatment 20-30 y 30-40 y 40-50 y 50-60 y

A 2 2 3 3

B 1 1 1 2

C 3 3 2 1


Mean of ranks for:

treatment A: R_{A.} = 10/4 = 2.5

treatment B: R_{B.} = 5/4 = 1.25

treatment C: R_{C.} = 9/4 = 2.25

If these mean ranks are different → reject H₀

If these mean ranks are close → do not reject H₀


Measure for closeness of the mean ranks:

if the R_{i.} are all close to each other

↓

then they are close to the overall mean R_{..}

and

$(R_{i.} - R_{..})^2$ will be close to zero

Friedman Statistic

$Q = \dfrac{12N}{s(s+1)} \sum_{i=1}^{s} \left(R_{i.} - R_{..}\right)^2$


P_H₀(Q ≥ c) = ?

Exact distribution of Q under H₀:

A few possible configurations:

Age-group Q

Treatment 20-30 y 30-40 y 40-50 y 50-60 y

A 1 1 1 1 8

B 2 2 2 2

C 3 3 3 3

A 3 3 3 3 8

B 2 2 2 2

C 1 1 1 1

A 1 3 1 3 0

B 2 2 2 2

C 3 1 3 1

. . .

A 2 2 3 3 3.5

B 1 1 1 2

C 3 3 2 1


Exact Distribution of Q:

Q     Pr
0.0   0.069444
0.5   0.277778
1.5   0.222222
2.0   0.157407
3.5   0.148148
4.5   0.055556
6.0   0.027778
6.5   0.037037
8.0   0.004630


Number of possibilities for the rank combinations:

age-group 20-30 year: 3! = 6

age-groups are independent

↓

total number of possible combinations: (3!)⁴ = 1296

Under the null these are all equally likely: each has probability 1/1296

In general: (s!)^N combinations, s = # treatment groups, N = # of blocks

P_H₀(Q ≥ 3.5) = 0.2731

Do not reject H₀
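The exact Friedman test for this example can be sketched by enumerating all (3!)⁴ = 1296 within-block rankings. Each block is written as a tuple of ranks in the order (A, B, C), following the rank table of the example; the function name is illustrative.

```python
from itertools import permutations, product

def friedman_q(blocks):
    """Q = 12 N / (s (s + 1)) * sum_i (mean rank of treatment i - (s + 1) / 2)^2."""
    n_blocks = len(blocks)
    s = len(blocks[0])
    overall_mean_rank = (s + 1) / 2
    mean_ranks = [sum(block[i] for block in blocks) / n_blocks for i in range(s)]
    return 12 * n_blocks / (s * (s + 1)) * sum(
        (r - overall_mean_rank) ** 2 for r in mean_ranks
    )

# Within-block ranks (A, B, C) for the age groups 20-30, 30-40, 40-50, 50-60
observed = [(2, 1, 3), (2, 1, 3), (3, 1, 2), (3, 2, 1)]
q_obs = friedman_q(observed)

# Under H0 each block independently takes any of the 3! rankings
block_rankings = list(permutations((1, 2, 3)))
configs = list(product(block_rankings, repeat=len(observed)))
n_extreme = sum(friedman_q(c) >= q_obs - 1e-9 for c in configs)
p_exact = n_extreme / len(configs)
print(q_obs, len(configs), round(p_exact, 4))
```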


Sign Test


Sign Test

Special case of the Friedman test: blocks of size 2

subjects matched on e.g. age, gender, ...

twins

two eyes (hands) of a person

subject serves as own control: e.g. blood pressure before and after treatment

Example: Pain scores for lower back pain, before and after having acupuncture

Patient  Pain score before  Pain score after  Sign
   1             5                 6            -
   2             6                 7            -
   3             7                 6            +
   4             9                 4            +
   5             6                 7            -
   6             5                 4            +
   7             4                 8            -
   8             7                 6            +
   9             6                 5            +
  10             5                 7            -
  11             8                 6            +
  12             8                 4            +
  13             7                 3            +
  14             8                 5            +
  15             6                 7            -


9 pairs out of 15 where treatment comes out ahead (reduction in pain scores)

Sign Test: S_N = 9

P_H₀(S_N ≥ 9) = ???

The exact distribution of S_N under H₀ is binomial with N trials, N = number of ‘pairs’

Success probability: 1/2

$P_{H_0}(S_N = a) = \binom{N}{a} \dfrac{1}{2^N}$

$P_{H_0}(S_N \ge 9) = \left[ \binom{15}{9} + \binom{15}{10} + \ldots + \binom{15}{15} \right] \dfrac{1}{2^{15}} = 0.30$
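The binomial tail probability for the sign test can be checked with a few lines of Python. The pain scores are those of the acupuncture example; a "+" is counted whenever the score after treatment is lower than before. There are no tied pairs in these data, so no pairs need to be dropped.

```python
from math import comb

before = [5, 6, 7, 9, 6, 5, 4, 7, 6, 5, 8, 8, 7, 8, 6]
after = [6, 7, 6, 4, 7, 4, 8, 6, 5, 7, 6, 4, 3, 5, 7]

n_pairs = len(before)
# Number of pairs in which the pain score decreased ("+" signs)
s_n = sum(b > a for b, a in zip(before, after))

# Exact one-sided p-value: upper tail of a Binomial(N, 1/2) distribution
p_value = sum(comb(n_pairs, k) for k in range(s_n, n_pairs + 1)) / 2 ** n_pairs
print(s_n, round(p_value, 4))
```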


Jonckheere-Terpstra Test


Jonckheere-Terpstra Test

To be used when the H₁ is ordered.

Ordinal data for the responses and an ordering in the treatment/groups.

Example:

Data:

Three diets for rats

Response: growth

H₁: Growth rate decreases from A to C: A ≥ B ≥ C

A  133 139 149 160 184
B  111 125 143 148 157
C   99 114 116 127 146


Parametric Approach : Regression

Models the relationship between a dependent and independent variable

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

Assumptions

$\varepsilon_i \sim N(0, \sigma^2)$

the $\varepsilon_i$ are independent

homoscedasticity

x_i is measured without error


β₀ = 169, p-value < 0.0001

β₁ = −16, p-value = 0.0133

R-square = 0.3866
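The slides do not state how the diets were coded as the regressor. As an assumption for illustration, the sketch below codes A, B, C as x = 1, 2, 3, which reproduces the reported estimates up to rounding. It fits the least-squares line with plain Python.

```python
# Assumed coding (not stated in the slides): diet A = 1, B = 2, C = 3
growth = {
    1: [133, 139, 149, 160, 184],  # diet A
    2: [111, 125, 143, 148, 157],  # diet B
    3: [99, 114, 116, 127, 146],   # diet C
}

xs = [x for x, values in growth.items() for _ in values]
ys = [y for values in growth.values() for y in values]
n = len(ys)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx                 # least-squares slope
b0 = y_bar - b1 * x_bar        # least-squares intercept
r_square = b1 * sxy / syy      # proportion of variance explained
print(round(b0, 2), round(b1, 2), round(r_square, 4))
```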


Jonckheere-Terpstra Test

Based on Mann-Whitney statistics for two treatments

Comparing the treatment groups two by two:

if W_BA is large: growth A > growth B (W_BA = 18)

if W_CB is large: growth B > growth C (W_CB = 18)

if W_CA is large: growth A > growth C (W_CA = 23)

JT Statistic: $W = \sum_{i<j} W_{ij}$

Reject H₀ when W is sufficiently large

W = 18 + 18 + 23 = 59

P_H₀(W ≥ 59) = 0.0120

Compare with the result of a Kruskal-Wallis Test: p-value = 0.072

For large sample sizes, W is approximately normally distributed
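The Jonckheere-Terpstra statistic for the rat diet data can be computed by counting, for each pair of groups, the number of observation pairs that agree with the hypothesised ordering A ≥ B ≥ C. The sketch below orders the groups from lowest to highest expected growth (C, B, A) and assumes no tied observations, which holds for these data. The helper name is illustrative.

```python
from itertools import combinations

# Groups ordered from lowest to highest expected growth under H1
groups_ordered = [
    [99, 114, 116, 127, 146],    # diet C
    [111, 125, 143, 148, 157],   # diet B
    [133, 139, 149, 160, 184],   # diet A
]

def mann_whitney_count(lower, higher):
    """Number of pairs in which the 'higher' group observation is the larger one."""
    return sum(h > l for l in lower for h in higher)

# Pairwise counts in the order (C, B), (C, A), (B, A)
pair_counts = [
    mann_whitney_count(lower, higher)
    for lower, higher in combinations(groups_ordered, 2)
]
w_jt = sum(pair_counts)
print(pair_counts, w_jt)
```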


Parametric versus nonparametric tests

Parametric tests:

Assumptions about the distribution in the population

Conditions are often not tested

Test depends on the validity of the assumptions

Most powerful test if all assumptions are met

Nonparametric tests:

Fewer assumptions about the distribution in the population

In case of small sample sizes often the only alternative (unless the nature of the population distribution is known exactly)

Less sensitive to measurement error (uses ranks)

Can be used for data which are inherently in ranks, even for data measured on a nominal scale

Easier to learn
