
Erik Parner 14 September Basic Biostatistics - Day 2-21 September,


Academic year: 2021


Basic Biostatistics - Day 2 1

PhD course in Basic Biostatistics – Day 2

Erik Parner, Department of Biostatistics, Aarhus University©

Log-transformation of continuous data
• Exercise 1.2+1.4+Standard1-1 (Triglyceride)
• Logarithms and exponentials

Two independent samples from normal distributions
• The model, check of the model, estimation
• Comparing the two means
• Approximate confidence interval and test
• Exact confidence interval and test using the t-distribution

Comparing two populations using a non-parametric test
• The Wilcoxon-Mann-Whitney test

Two independent samples from normal distributions
• Type 1 and type 2 errors
• Statistical power
• Sample size calculations

Overview

Data to analyse | Type of analysis               | Unpaired/Paired | Type            | Day
----------------+--------------------------------+-----------------+-----------------+------
Continuous      | One sample mean                | Irrelevant      | Parametric      | Day 1
                |                                |                 | Nonparametric   | Day 3
                | Two sample mean                | Non-paired      | Parametric      | Day 2
                |                                |                 | Nonparametric   | Day 2
                |                                | Paired          | Parametric      | Day 3
                |                                |                 | Nonparametric   | Day 3
                | Regression                     | Non-paired      | Parametric      | Day 5
                | Several means                  | Non-paired      | Parametric      | Day 6
                |                                |                 | Nonparametric   | Day 6
Binary          | One sample mean                | Irrelevant      | Parametric      | Day 4
                | Two sample mean                | Non-paired      | Parametric      | Day 4
                |                                | Paired          | Parametric      | Day 4
                | Regression                     | Non-paired      | Parametric      | Day 7
Time to event   | One sample: Cumulative risk    | Irrelevant      | Nonparametric   | Day 8
                | Regression: Rate/hazard ratio  | Non-paired      | Semi-parametric | Day 8

Log-transformation of continuous data

Continuous data with a long tail to the right are often log-transformed to obtain an approximately normal distribution. Recall the triglyceride measurements. Applying a normal-based prediction interval (PI) to the original data gives invalid results: e.g. the PI will not have 2.5% of the data below the lower limit and 2.5% above the upper limit.

[Figure: histogram of triglyceride (density scale) with the normal-based 95% PI. The annotations "4.2% of data" and "0% of data" give the share of observations outside the two limits.]

The logarithm of the triglyceride measurements follows (approximately) a normal distribution:

[Figure: histogram of log-triglyceride (density scale) and normal Q-Q plot of log-triglyceride.]

We then need to transform the results back to the original scale to obtain useful results on the triglyceride measurements.

The method presented here relies on the fact that percentiles are preserved when an increasing transformation is applied to the data.

Logarithmic and exponential functions

Both the logarithm and the exponential function are increasing functions.

[Figure: graphs of the logarithm (left) and the exponential function (right).]

Thus

\[ X < A \iff \log(X) < \log(A) \qquad (X, A > 0) \]

\[ X < A \iff \exp(X) < \exp(A) \]

Logarithmic and exponential transformations

Medians and percentiles are preserved when the data are transformed by an increasing function:

[Figure: a density on the original scale and on the log scale, linked by log and exp. The areas "16% to the right" and "50% to the right" are the same on both scales.]

Prediction intervals are given by the 2.5 and 97.5 percentiles. For a normal distribution the mean is equal to the median (the 50th percentile).

Transforming the results

         Log-triglyceride                  exp    Triglyceride
PI       (-1.54; -0.01)                    ->     (0.21; 0.99)
CI       mean -0.77 (-0.81; -0.74)         ->     median 0.46 (0.44; 0.48)

[Figure: histograms of log-triglyceride and of triglyceride (density scale).]
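The back-transformation of the triglyceride results can be checked numerically. The following is a minimal sketch in Python (used here only for illustration; the course itself uses Stata). The log-scale numbers are the ones quoted on the slide, while the variable and function names are illustrative assumptions.

```python
import math

# Log-scale results quoted on the slide (X = log-triglyceride).
pi_log = (-1.54, -0.01)    # 95% prediction interval for X
mean_log = -0.77           # estimated mean (= median) of X
ci_log = (-0.81, -0.74)    # 95% CI for the mean of X


def back_transform(interval):
    """Apply exp to both limits; exp is increasing, so the order is kept."""
    lower, upper = interval
    return (math.exp(lower), math.exp(upper))


pi_orig = back_transform(pi_log)      # 95% PI for Y = exp(X)
median_orig = math.exp(mean_log)      # estimated median of Y
ci_median = back_transform(ci_log)    # 95% CI for the median of Y

print("PI for Y: (%.2f; %.2f)" % pi_orig)
print("median of Y: %.2f (%.2f; %.2f)" % ((median_orig,) + ci_median))
```

Rounded to two decimals this reproduces the original-scale PI and the CI for the median shown on the slide.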

Summary

Let $Y$ denote the original observation.

If $X = \log(Y)$ has a normal distribution with mean = median = $\mu$ and standard deviation = $\sigma$, then

• a valid 95% CI for $\mu$ will transform into a valid 95% CI for the median of $Y = \exp(X)$
• a valid 95% PI for $X$ will transform into a valid 95% PI for $Y = \exp(X)$

The relation between the mean and the median of $Y$ is

\[ \operatorname{median}(Y) = \exp(\mu), \qquad \operatorname{mean}(Y) = \exp(\mu + 0.5\,\sigma^2) \]

It can be shown that

\[ \operatorname{sd}(Y) = \operatorname{mean}(Y) \cdot \sqrt{\exp(\sigma^2) - 1} \]

\[ \operatorname{cv}(Y) = \frac{\operatorname{sd}(Y)}{\operatorname{mean}(Y)} = \sqrt{\exp(\sigma^2) - 1} \]

Hence the standard deviation of $Y$ depends on the mean of $Y$.

For this reason the standard deviation is rarely used as a measure of the spread of the distribution of the original data in this setting.

In this setting the coefficient of variation (cv) is often used as a measure of the spread of the data.
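The relations between $\mu$, $\sigma$ and the mean, sd and cv of $Y$ can be evaluated directly. The sketch below is in Python and rests on two assumptions: $\mu = -0.77$ is the log-triglyceride mean quoted earlier, and $\sigma$ is not given in the slides, so it is backed out of the width of the log-scale prediction interval (-1.54; -0.01).

```python
import math

mu = -0.77                                # mean of X = log(Y), from the slides
# Assumption: sigma recovered from the PI width, width = 2 * 1.96 * sigma.
sigma = (-0.01 - (-1.54)) / (2 * 1.96)

median_y = math.exp(mu)                       # median(Y) = exp(mu)
mean_y = math.exp(mu + 0.5 * sigma ** 2)      # mean(Y) = exp(mu + 0.5 sigma^2)
cv_y = math.sqrt(math.exp(sigma ** 2) - 1)    # cv(Y) depends on sigma only
sd_y = mean_y * cv_y                          # sd(Y) scales with mean(Y)

print("median %.3f, mean %.3f, sd %.3f, cv %.3f" % (median_y, mean_y, sd_y, cv_y))
```

The mean of $Y$ comes out larger than the median, as expected for a distribution with a long right tail, and the cv does not change if $\mu$ is changed.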

Properties of the logarithm and exponential function

The basic properties of the logarithms and exponentials that we will use throughout the course:

\[ \log(a \cdot b) = \log(a) + \log(b), \qquad \log(a / b) = \log(a) - \log(b) \]

\[ \exp(a + b) = \exp(a) \cdot \exp(b), \qquad \exp(a - b) = \exp(a) / \exp(b) \]

The logarithm turns a product into a sum, and the exponential function turns a sum into a product.

\[ \log(a^b) = b \cdot \log(a), \qquad \exp(a \cdot b) = \exp(a)^b = \exp(b)^a \]

Continuous data – two sample mean

Body temperature versus gender

Scientific question: Do the two genders have different normal body temperature?

Design: 130 participants were randomly sampled, 65 males and 65 females.

Data: Measured temperature, gender.

Summary of the data (the units are degrees Celsius):

--------------------------------------------------------------
  Gender |   N(tempC)   mean(tempC)   sd(tempC)   med(tempC)
---------+----------------------------------------------------
    Male |         65      36.72615    .3882158         36.7
  Female |         65      36.88923    .4127359         36.9
--------------------------------------------------------------

Body temperature: Plotting the data

The data look “fine”. A few outliers among females?

[Figure 2.1: body temperature (C) by gender (Male, Female), shown in two panels.]

Body temperature: Checking the normality in each group

[Figure 2.2: histograms of body temperature by gender (density scale) and normal Q-Q plots for males and for females.]

Normality looks ok!

Body temperature: The model

A statistical model: Two independent samples from normal distributions, i.e.

• the two samples are independent, and
• each is assumed to be a random sample from a normal distribution:

1. The observations are independent (knowing one observation will not alter the distribution of the others).
2. The observations come from the same distribution, e.g. they all have the same mean and variance.
3. This distribution is a normal distribution with unknown mean, $\mu_i$, and standard deviation, $\sigma_i$: $N(\mu_i, \sigma_i^2)$.

Body temperature: Checking the assumptions

The first two: think about how the data were collected!

1. Independence between groups: information on different individuals.
   Independence within groups: data are from different individuals, so the assumption is probably ok.
2. In each group: the observations come from the same distribution. Here we can only speculate. Does the body temperature depend on known factors of interest, for example heart rate, time of day, etc.?

Body temperature: The estimates

The estimates are found like we did on day 1:

\[ \hat\mu_M = 36.73\ (36.63; 36.82), \qquad \hat\sigma_M = 0.388, \qquad \operatorname{sem}(\hat\mu_M) = 0.048 \]

\[ \hat\mu_F = 36.89\ (36.79; 36.99), \qquad \hat\sigma_F = 0.413, \qquad \operatorname{sem}(\hat\mu_F) = 0.051 \]

Observe that the width of the prediction interval is approximately

2 * 1.96 * 0.4 C = 1.6 C,

so there is a large variation in body temperature between individuals within each of the two groups.

We see that the average body temperature is higher among females.

Body temperature: Estimating the difference

Remember that the focus is on the difference between the two groups, meaning that we are interested in

\[ \delta = \mu_F - \mu_M, \]

the unknown difference in mean body temperature. This is of course estimated by

\[ \hat\delta = \hat\mu_F - \hat\mu_M = 36.89 - 36.73 = 0.16 \]

What about the precision of this estimate? What is the standard error of a difference?

The standard error of a difference

If we have two independent estimates and, like here, calculate the difference, then the standard error of the difference is given as

\[ \operatorname{se}(\hat\delta) = \operatorname{se}(\hat\mu_F - \hat\mu_M) = \sqrt{\operatorname{se}(\hat\mu_F)^2 + \operatorname{se}(\hat\mu_M)^2} \]

We note that the standard error of a difference between two independent estimates is larger than both of the two standard errors.

In the body temperature data we get:

\[ \operatorname{se}(\hat\delta) = \sqrt{0.048^2 + 0.051^2} = 0.070 \]

and an approx. 95% CI:

\[ \hat\delta \pm 1.96 \cdot \operatorname{se}(\hat\delta) = 0.163 \pm 1.96 \cdot 0.070 = (0.025; 0.301) \]
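The standard error of the difference and the approximate 95% CI can be recomputed from the summary statistics. This is a minimal Python sketch; the means and standard deviations are taken from the summary table of the body temperature data, and the variable names are illustrative assumptions.

```python
import math

# Summary statistics from the body temperature table.
mean_m, sd_m, n_m = 36.72615, 0.3882158, 65
mean_f, sd_f, n_f = 36.88923, 0.4127359, 65

# Standard error of each mean.
se_m = sd_m / math.sqrt(n_m)
se_f = sd_f / math.sqrt(n_f)

# Difference (female minus male) and its standard error for
# two independent estimates: the squared standard errors add up.
delta_hat = mean_f - mean_m
se_delta = math.sqrt(se_f ** 2 + se_m ** 2)

# Approximate 95% CI based on the normal distribution.
ci = (delta_hat - 1.96 * se_delta, delta_hat + 1.96 * se_delta)

print("difference %.3f, se %.3f, CI (%.3f; %.3f)" % (delta_hat, se_delta, ci[0], ci[1]))
```

The standard error of the difference is larger than either of the two individual standard errors, as noted above.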

Testing no difference in means

Here we are especially interested in the hypothesis that body temperature is the same for the two genders:

Hypothesis: $\delta = \delta_0 = 0$

We can make an approx. test similar to day 1. With

\[ \hat\delta: 0.163\ (0.025; 0.301), \qquad \operatorname{se}(\hat\delta) = 0.070 \]

we calculate

\[ z_{obs} = \frac{\hat\delta - \delta_0}{\operatorname{se}(\hat\delta)} = \frac{0.163 - 0}{0.070} = 2.32 \]

and find the p-value as

\[ 2 \cdot \Pr(\text{standard normal} \ge |z_{obs}|) \]

We get p=2.03%
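The approximate test can be reproduced with the standard normal distribution from the Python standard library. In this sketch the difference and its standard error are the unrounded values from the Stata output shown later (0.1630766 and 0.070281); the variable names are illustrative assumptions.

```python
from statistics import NormalDist

delta_hat = 0.1630766   # estimated difference, female minus male
se_delta = 0.070281     # standard error of the difference
delta_0 = 0.0           # value of the difference under the hypothesis

# Observed z statistic.
z_obs = (delta_hat - delta_0) / se_delta

# Two-sided p-value: twice the upper tail beyond |z_obs|.
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print("z = %.2f, p = %.2f%%" % (z_obs, 100 * p_value))
```

This is the approximate (normal-based) p-value; the exact test based on the t-distribution gives a slightly larger p-value.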

Exact inference for two independent normal samples

Just like in the one sample setting, it is possible to make exact inference based on the t-distribution. And again these are easily made by a computer.

Remember the model: Two independent samples from normal distributions with means and standard deviations $\mu_M, \sigma_M$ and $\mu_F, \sigma_F$.

Note, both the means and the standard deviations might be different in the two populations.

If one wants to make exact inference, then one has to make the additional assumption:

4. The standard deviations are the same in the two populations: $\sigma_M = \sigma_F$.

Exact inference for two independent normal samples

Testing the hypothesis: $\sigma_M = \sigma_F$

This is done by considering the ratio between the two estimated standard deviations:

\[ F_{obs} = \left( \frac{\text{Largest observed standard deviation}}{\text{Smallest observed standard deviation}} \right)^2 \]

A large value of this F-ratio is critical for the hypothesis.

The p-value = the probability of observing an F-ratio at least as large as we have observed, given the hypothesis is true!

The p-value is here found by using an F-distribution with ($n_{largest} - 1$) and ($n_{smallest} - 1$) degrees of freedom:

\[ p\text{-value} = 2 \cdot \Pr\left( F(n_{largest} - 1;\ n_{smallest} - 1) \ge F_{obs} \right) \]

Exact inference for two independent normal samples

Testing the hypothesis: $\sigma_M = \sigma_F$

Here we have

\[ n_F = 65,\ \hat\sigma_F = 0.413, \qquad n_M = 65,\ \hat\sigma_M = 0.388 \]

so

\[ F_{obs} = \frac{0.413^2}{0.388^2} = 1.063^2 = 1.13 \]

The observed variance (sd²) is 13% higher among women. But could this be explained by sampling variation? What is the p-value?

To find the p-value we consult an F-distribution with 64=(65-1) and 64=(65-1) degrees of freedom. We get p-value = 63%

The difference in the observed standard deviations can be explained by sampling variation.

We accept that $\sigma_M = \sigma_F$! The fourth assumption is ok!
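The F-ratio itself is simple arithmetic. The Python sketch below computes it from the two standard deviations in the summary table and also shows why Stata's sdtest output reports f = 0.8847: Stata puts the first group (Male) in the numerator, which here is the reciprocal of the largest-over-smallest ratio. The p-value is not computed, since the Python standard library has no F-distribution; variable names are illustrative assumptions.

```python
# Standard deviations and sample sizes from the summary table.
sd_m, n_m = 0.3882158, 65
sd_f, n_f = 0.4127359, 65

# Put the group with the largest standard deviation first.
(sd_large, n_large), (sd_small, n_small) = sorted(
    [(sd_m, n_m), (sd_f, n_f)], reverse=True
)

f_obs = (sd_large / sd_small) ** 2       # largest over smallest, squared
df = (n_large - 1, n_small - 1)          # degrees of freedom for the F-distribution

# The ratio in the order Stata reports it: sd(Male) / sd(Female), squared.
f_stata = (sd_m / sd_f) ** 2

print("F_obs = %.2f with %s d.f.; Stata's f = %.4f" % (f_obs, df, f_stata))
```

Both ratios lead to the same two-sided p-value, because the two-sided test doubles the tail probability on whichever side the observed ratio falls.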

Exact inference for two independent normal samples

We now have a common standard deviation: $\sigma = \sigma_F = \sigma_M$

This is estimated as a “weighted” average:

\[ \hat\sigma = \sqrt{\frac{\hat\sigma_F^2 (n_F - 1) + \hat\sigma_M^2 (n_M - 1)}{(n_F - 1) + (n_M - 1)}} = \sqrt{\frac{0.413^2 (65 - 1) + 0.388^2 (65 - 1)}{(65 - 1) + (65 - 1)}} = 0.401 \]

This is not found in the Stata output.

Based on this we can calculate a revised/updated standard error of the difference:

\[ \operatorname{se}(\hat\delta) = \hat\sigma \cdot \sqrt{\frac{1}{n_F} + \frac{1}{n_M}} = 0.401 \cdot \sqrt{\frac{1}{65} + \frac{1}{65}} = 0.070 \]

Exact inference for two independent normal samples

Exact confidence intervals and p-values are found by using a t-distribution with $n_M + n_F - 2 = 65 + 65 - 2 = 128$ d.f.

\[ \hat\delta: 0.163, \qquad \operatorname{se}(\hat\delta) = 0.070 \]

The exact 95% CI:

\[ \hat\delta \pm t_{0.975}^{128} \cdot \operatorname{se}(\hat\delta) = 0.163 \pm 1.98 \cdot 0.070 = (0.024; 0.302) \]

And the exact test of $H: \delta = 0$:

\[ t_{obs} = \frac{\hat\delta - \delta_0}{\operatorname{se}(\hat\delta)} = \frac{0.163 - 0}{0.070} = 2.32 \]

and find the p-value as

\[ 2 \cdot \Pr(t\text{-distribution} \ge |t_{obs}|) \]

We get p=2.2% (either from a table of the t-distribution, or from Stata).
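The pooled standard deviation, the revised standard error and the exact confidence interval can be recomputed from the summary statistics. This Python sketch uses the standard deviations from the summary table. The 97.5% point of the t-distribution with 128 d.f. is entered as a looked-up constant (an assumption, about 1.9787), because the Python standard library has no t-distribution; for the same reason the exact p-value is not computed.

```python
import math

sd_f, n_f = 0.4127359, 65
sd_m, n_m = 0.3882158, 65
delta_hat = 36.88923 - 36.72615      # female minus male

# Pooled ("weighted average") variance and standard deviation.
pooled_var = (sd_f ** 2 * (n_f - 1) + sd_m ** 2 * (n_m - 1)) / ((n_f - 1) + (n_m - 1))
sd_pooled = math.sqrt(pooled_var)

# Standard error of the difference under a common standard deviation.
se_delta = sd_pooled * math.sqrt(1 / n_f + 1 / n_m)
df = n_f + n_m - 2

# Observed t statistic for the hypothesis of no difference.
t_obs = delta_hat / se_delta

# Assumption: t quantile taken from a table, not computed here.
t_quantile = 1.9787
ci = (delta_hat - t_quantile * se_delta, delta_hat + t_quantile * se_delta)

print("pooled sd %.3f, se %.4f, t = %.2f on %d d.f." % (sd_pooled, se_delta, t_obs, df))
print("exact 95%% CI: (%.3f; %.3f)" % ci)
```

Apart from the sign (Stata reports male minus female), the standard error, the t statistic and the interval agree with the ttest output.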

Stata: two-sample normal analysis

The F-test and t-test are easily done in Stata (more details can be found in the file day2.do).

. cd "D:\Teaching\BasalBiostat\Lectures\Day2"
D:\Teaching\BasalBiostat\Lectures\Day2
. use normtemp.dta, clear

. * Checking the normality.
. qnorm tempC if sex==1, title("Male") name(plot2, replace)
. qnorm tempC if sex==2, title("Female") name(plot3, replace)
. graph combine plot2 plot3, name(plotright, replace) col(1)

. sdtest tempC, by(sex)

Variance ratio test
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std.Err.   Std.Dev.   [95% Conf.Interval]
---------+--------------------------------------------------------------------
    Male |      65    36.72615    .0481522   .3882158    36.62996   36.82235
  Female |      65    36.88923    .0511936   .4127359    36.78696    36.9915
---------+--------------------------------------------------------------------
combined |     130    36.80769    .0357326   .4074148    36.73699   36.87839
------------------------------------------------------------------------------
    ratio = sd(Male) / sd(Female)                                 f =   0.8847
Ho: ratio = 1                                    degrees of freedom =   64, 64

    Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
  Pr(F < f) = 0.3128         2*Pr(F < f) = 0.6256           Pr(F > f) = 0.6872

. ttest tempC, by(sex)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std.Err.   Std.Dev.   [95% Conf.Interval]
---------+--------------------------------------------------------------------
    Male |      65    36.72615    .0481522   .3882158    36.62996   36.82235
  Female |      65    36.88923    .0511936   .4127359    36.78696    36.9915
---------+--------------------------------------------------------------------
combined |     130    36.80769    .0357326   .4074148    36.73699   36.87839
---------+--------------------------------------------------------------------
    diff |           -.1630766     .070281              -.3021396  -.0240136
------------------------------------------------------------------------------
    diff = mean(Male) - mean(Female)                              t =  -2.3204
Ho: diff = 0                                     degrees of freedom =      128

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0110         Pr(|T| > |t|) = 0.0219          Pr(T > t) = 0.9890

Exact inference for two independent normal samples

What if you reject the hypothesis of the same sd in the two groups?

1. This indicates that the variation in the two groups differs! Think about why!!!
2. Often it is due to the fact that the assumption of normality is not satisfied. Maybe you would do better by making the statistical analysis on another scale, e.g. log.
3. If you still want to compare the means on the original scale you can make approximate inference based on the t-distribution (e.g. ttest tempC, by(sex) unequal).
4. If you only want to test the hypothesis that the two distributions are located at the same place, then you can use the non-parametric Wilcoxon-Mann-Whitney test, see later.

Body temperature example - formulations

Methods:

Data were analyzed as two independent samples from normal distributions, with inference based on Student's t-distribution. The assumption of normality was checked by a Q-Q plot. Estimates are given with 95% confidence intervals.

Results:

The mean body temperature was 36.9 (36.8; 37.0) C among women compared to 36.7 (36.6; 36.8) C among men. The mean was 0.16 (0.02; 0.30) C higher for females, and this was statistically significant (p=2.2%).

Conclusion:

Based on this study we conclude that women have a small, but statistically significantly higher mean body temperature than men.

Example 7.2 Birth weight and heavy smoking

Scientific question: Do the smoking habits of the mother influence the birth weight of the child?

Design and data: (observational) The birth weights (kg) of children born by 14 heavy smokers and 15 non-smokers were recorded.

Summary of the data (the unit is kg):

--------------------------------------------------------------------------
    Group |  Obs    Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
----------+---------------------------------------------------------------
 Non-smok |   15   3.627       .0925       .3584      3.428       3.825
 Heavy sm |   14   3.174       .1238       .4631      2.907       3.442
--------------------------------------------------------------------------

Already here we observe that the average birth weight is smallest among heavy-smokers: difference=452 g

Example 7.2 Birth weight and heavy smoking

Plot the data!

[Figure: birth weight by smoking habits (Non-smoker, Heavy smoker), shown in two panels.]

Example 7.2 Birth weight and heavy smoking

[Figure: histograms of birth weight by smoking habits (density scale) and normal Q-Q plots for non-smokers and for heavy smokers.]

Example 7.2 Birth weight and heavy smoking exact inference

Compare the standard deviations (using the computer):

\[ F_{obs} = \frac{0.4631^2}{0.3584^2} = 1.67, \qquad p = 35\% \text{ from } F(13, 14) \]

We accept that the two standard deviations are identical.

And again by computer we get:

Difference in mean birth weight: 0.452 (0.138; 0.767) kg
Hypothesis: no difference in mean birth weight. p=0.6%

Conclusion of the test:

If there was no difference between the two groups, then it would be very unlikely to observe such a large difference as we have seen. Hence we reject the hypothesis!

The birth weight example - formulations

Methods - like the body temperature example:

Data ……intervals.

Results:

The mean birth weight was 3.627 (3.428; 3.825) kg among non-smokers compared to 3.174 (2.907; 3.442) kg among heavy smokers. The difference, 452 (138; 767) g, was statistically significant (p=0.6%).

Conclusion:

Children born by heavy-smokers have a birth weight that is statistically significantly smaller than that of children born by non-smokers. The study has only limited information on the precise size of the association.

Furthermore we have not studied the implications of the difference in birth weight or whether the difference could be explained by other factors, like eating habits……

Non-Parametric test: Wilcoxon-Mann-Whitney test

Until now we have only made statistical inference based on a parametric model.

E.g. we have focused on estimating the difference between two groups and supplying the estimate with a confidence interval.

We have also performed a statistical test of no difference based on the estimate and the standard error: a parametric test.

There are other types of tests, non-parametric tests, that are not based on a parametric model.

These tests are also based on models, but they are not parametric models.

We will here look at the Wilcoxon-Mann-Whitney test, which is the non-parametric analogue of the two sample t-test.

Non-Parametric test: Wilcoxon-Mann-Whitney test

The key feature of all non-parametric tests is that they are based on the ranks of the data and not the actual values.

   Heavy smokers           Non-smokers
Birth weight   Rank     Birth weight   Rank
   2.340        1          2.710        3
   2.380        2          3.310       10
   2.740        4          3.360       11
   2.860        5          3.410       12
   2.900        6          3.510       14
   3.180        7          3.540       16
   3.230        8          3.600       17.5
   3.270        9          3.610       19
   3.420       13          3.700       23
   3.530       15          3.730       24
   3.600       17.5        3.830       25
   3.650       20.5        3.890       26
   3.650       20.5        3.990       27
   3.690       22          4.080       28
                           4.130       29

Rank 1 is the smallest birth weight. Tied observations share the average of their ranks.

Non-Parametric test: Wilcoxon-Mann-Whitney test

We can now add the ranks in one of the groups, here the heavy smokers:

Heavy-smokers observed rank sum=150.5

Hypothesis: The distribution of birth weight is the same among heavy-smokers and non-smokers.

Assuming the hypothesis is true one can calculate the expected rank sum among the heavy-smokers and the standard error of the observed rank sum, and calculate a test statistic:

\[ z_{obs} = \frac{\text{Observed ranksum} - \text{Expected ranksum}}{\operatorname{se}(\text{Observed ranksum})} = \frac{150.5 - 210}{22.91} = -2.597 \]

The p-value is found as

\[ 2 \cdot \Pr(\text{standard normal} \ge |z_{obs}|) \]

P-value = 0.9%
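The rank sum, its expected value, the tie-adjusted variance and the z statistic can all be recomputed from the 29 birth weights in the table. The following Python sketch does this with the usual large-sample formulas for the rank-sum test; the data are copied from the table, while the helper function and variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

heavy = [2.34, 2.38, 2.74, 2.86, 2.90, 3.18, 3.23, 3.27,
         3.42, 3.53, 3.60, 3.65, 3.65, 3.69]
non = [2.71, 3.31, 3.36, 3.41, 3.51, 3.54, 3.60, 3.61,
       3.70, 3.73, 3.83, 3.89, 3.99, 4.08, 4.13]


def average_ranks(values):
    """Map each value to its rank; tied values get the average of their ranks."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 (0-based) hold ranks i+1..j; use their average.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


pooled = heavy + non
rank_of = average_ranks(pooled)
n1, n2 = len(heavy), len(non)
n = n1 + n2

rank_sum = sum(rank_of[v] for v in heavy)    # observed rank sum, heavy smokers
expected = n1 * (n + 1) / 2                  # expected rank sum under the hypothesis

# Variance of the rank sum, reduced slightly because of ties.
tie_sizes = [pooled.count(v) for v in set(pooled)]
tie_term = sum(t ** 3 - t for t in tie_sizes)
variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

z_obs = (rank_sum - expected) / math.sqrt(variance)
p_value = 2 * NormalDist().cdf(-abs(z_obs))  # two-sided p-value

print("rank sum %.1f, expected %.0f, variance %.2f" % (rank_sum, expected, variance))
print("z = %.3f, p = %.1f%%" % (z_obs, 100 * p_value))
```

Stata reports z = 2.597 with the opposite sign because it uses the first group (non-smokers) as reference; the two-sided p-value is the same.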

Non-Parametric test: Wilcoxon-Mann-Whitney test

We saw that the rank sum among heavy smokers was smaller than expected if there was no true difference between the two groups.

So small that we would only observe such a discrepancy in about one out of 100 (p-value=0.9%) studies like this.

We reject the hypothesis!

Conclusion

Children born by heavy-smokers have a statistically significantly lower birth weight than children born by non-smokers.

Remember this depends on the sample size, the design, the statistical analysis...

Non-Parametric test: Wilcoxon-Mann-Whitney test

Some comments:

• There are two assumptions behind the test:
  1. Independence between and within the groups.
  2. Within each group: The observations come from the same distribution, e.g. they all have the same mean and variance.
• The test is designed to detect a shift in location in the two populations and not, for example, a difference in the variation in the two populations.
• You will only get a p-value: the possible difference in location is not quantified by an estimate with a confidence interval.
• As a test it is just as valid as the t-test!

Stata: Wilcoxon-Mann-Whitney test

. use bwsmoking.dta, clear
(Birth weight (kg) of 29 babies born to 14 heavy smokers and 15 non-smokers)

. ranksum bw, by(group)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

       group |      obs    rank sum    expected
-------------+---------------------------------
  Non-smoker |       15       284.5         225
Heavy smoker |       14       150.5         210
-------------+---------------------------------
    combined |       29         435         435

unadjusted variance      525.00
adjustment for ties       -0.26
                     ----------
adjusted variance        524.74

Ho: bw(group==Non-smoker) = bw(group==Heavy smoker)
             z =   2.597

Type 1 and type 2 errors

We will here return to the simple interpretation of a statistical test:

We test a hypothesis: $\delta = \delta_0$

We will make a

• Type 1 error if we reject the hypothesis when it is true.
• Type 2 error if we accept the hypothesis when it is false.

If we use a specific significance level, $\alpha$ (typically 5%), then we know:

\[ \Pr(\text{reject } \delta = \delta_0 \text{ given it is true}) = \Pr(\text{reject } \delta = \delta_0 \text{ given } \delta = \delta_0) = \alpha \]

The risk of a Type 1 error = $\alpha$

Type 1 and type 2 errors

What about the risk of Type 2 error:

\[ \beta = \Pr(\text{accept } \delta = \delta_0 \text{ given it is not true}) = \Pr(\text{accept } \delta = \delta_0 \text{ given } \delta \ne \delta_0) = \ ? \]

This will depend on several things:

1. The statistical model and test we will be using.
2. What is the true value of $\delta$?
3. The precision of the estimate. What is the sample size and standard deviation?

That is, the risk of Type 2 error, $\beta$, is not constant. Often we consider the statistical power:

\[ \Pr(\text{reject } \delta = \delta_0 \text{ given } \delta \ne \delta_0) = 1 - \beta \]

Statistical power – planning a study - testing for no difference

Suppose we are planning a new study of fish oil and its possible effect on diastolic blood pressure (DBP).

Assume we want to make a randomized trial with two groups of equal size and we will test the hypothesis of no difference. We believe that the true difference between groups in DBP is 5 mmHg.

Furthermore we believe that the standard deviation in the increase in DBP is 9 mmHg.

We plan to include 40 women in each group and analyze using a t-test.

What is the chance that this study will lead to a statistically significant difference between the two groups, given the true difference is 5 mmHg?

[Figure: power in % against the number of observations in each group (0 to 100), one curve for each of sd = 7, 8, 9 and 10. True difference = 5, test for no difference.]

Statistical power, when the true difference is 5 and sd = 7, 8, 9 or 10 and we test the hypothesis of no difference.

Statistical power – planning a study

We plan to include 40 women in each group and analyze using a t-test, and the true difference is 5 mmHg and sd = 9 mmHg.

Power = 69%

That is, there is only a 69% chance that such a study will lead to a statistically significant result, given the assumptions are true.

How many women should we include in each group if we want to have a power of 90%?

Based on the plot we see that approx. 69 or more women in each group will lead to a power of 90%.
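The power can be approximated with the normal distribution. The Python sketch below is an assumption-laden illustration, not the calculation behind the plot: it treats the standard deviation as known and uses the normal distribution in place of the t-distribution, so it gives a value close to, but not identical to, the t-test based power. The function name approx_power is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided test of no difference
    between two groups of equal size with a common standard deviation."""
    std_normal = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)         # se of the difference in means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 5%
    shift = abs(delta) / se                      # true difference in se units
    # Chance that the estimate ends up more than z_crit standard errors from 0.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


power_40 = approx_power(delta=5, sd=9, n_per_group=40)
power_69 = approx_power(delta=5, sd=9, n_per_group=69)

print("n = 40: power about %.0f%%" % (100 * power_40))
print("n = 69: power about %.0f%%" % (100 * power_69))
```

With 40 women per group the approximation lands within about one percentage point of the 69% quoted above, and with 69 per group it just exceeds 90%.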

[Figure: the same power curves (true difference = 5, test for no difference), with power=90% and n=69 marked.]

Statistical power, when the true difference is 5 and sd = 7, 8, 9 or 10 and we test the hypothesis of no difference.

[Figure: power in % against the number of observations in each group (0 to 100), one curve for each of sd = 7, 8, 9 and 10. True difference = 10, test for no difference.]

The power increases as a function of the expected difference between the groups and decreases as a function of the variation, i.e. the standard deviation, within the groups.

Power two unpaired normal samples

In general we have five quantities in play:

$\delta = \mu_1 - \mu_2$ = The true difference between groups
$\sigma$ = The standard deviation within each group
$\alpha$ = The significance level (typically 5%)
$\beta$ = The risk of type 2 error = 1 - the power
$n$ = The sample size in each group

If we know four of these, then we can determine the last. Typically, we know the first four and want to know the sample size, or we know $\delta$, $\sigma$, $\alpha$ and $n$ and then we want to know the power.

Stata: Power and sample size for two independent normal samples

Power calculations are done using the power command:

. power twomeans 0 5 , sd1(9) sd2(9) alpha(0.05) power(0.90)

Performing iteration ...

Estimated sample sizes for a two-sample means test
Satterthwaite's t test assuming unequal variances
Ho: m2 = m1 versus Ha: m2 != m1

Study parameters:

        alpha =    0.0500
        power =    0.9000
        delta =    3.2867
           m1 =    0.0000
           m2 =    5.0000
          sd1 =    9.0000
          sd2 =    9.0000

Estimated sample sizes:

            N =       140
  N per group =        70

* Prior to Stata 13:
* sampsi 0 5, sd1(9) sd2(9) alpha(0.05) power(0.90)
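For comparison, the sample size can be approximated by hand with the normal distribution. This Python sketch is an illustration under stated assumptions: it uses the textbook normal-approximation formula for two groups of equal size with a common, known standard deviation, not the t-based iteration Stata performs. The function name approx_n_per_group is hypothetical.

```python
import math
from statistics import NormalDist


def approx_n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for a two-sided test
    of no difference between two means, assuming a common sd."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 5%
    z_beta = std_normal.inv_cdf(power)            # 1.28 for power = 90%
    n = 2 * (sd * (z_alpha + z_beta) / delta) ** 2
    return math.ceil(n)                           # round up to whole persons


n_needed = approx_n_per_group(delta=5, sd=9)
print("approx. n per group:", n_needed)
```

The approximation gives one person fewer per group than the 70 reported by Stata, which is what one would expect when the t-distribution is replaced by the normal distribution.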

Comments on sample size calculations

• Most often done by computer (in Stata: power).
• There are many different formulas, see Kirkwood & Stern Table 35.1. We will only look at a few in this course.
• It is in general more relevant to test that the difference is larger than a specified value: a so-called superiority or non-inferiority study.
• Or to plan the study so that it is expected to yield a confidence interval with a certain width.
• You need to specify the true difference and you must have an idea of the variation within the groups. The latter you might find based on hospital records or in the literature.
• Sample size calculations after the study has been carried out (post-hoc) are nonsense!! The confidence interval will show how much information you have in the study.
