Tests of Significance


Outline:

• General Procedure for Hypothesis Testing
  – Null and Alternative Hypotheses
  – Test Statistics
  – p-values
• Interpretation of the Significance Level
• Tests for a Population Mean
• Interpretation of p-values
• Statistical vs. Practical Significance
• Confidence Intervals and Hypothesis Tests
• Potential Abuses of Tests


A confidence interval is a very useful statistical inference tool when the goal is to estimate a population parameter.

When the goal is to assess the evidence provided by the data in favor of some claim about the population, tests of significance are used.

Example: Filling Coke Bottles

A machine at a Coke production plant is designed to fill bottles with 16 oz of Coke. The actual amount varies slightly from bottle to bottle. From past experience, it is known that the SD is 0.2 oz. An SRS of 100 bottles filled by the machine has a mean of 15.94 oz per bottle. Is this evidence that the machine needs to be recalibrated, or could this difference be a result of random variation?


Testing Hypotheses

A hypothesis test is an assessment of the evidence provided by the data in favor of (or against) some claim about the population.

For example, suppose we perform a randomized experiment or take a random sample and calculate some sample statistic, say the sample mean. We want to decide if the observed value of the sample statistic is consistent with some hypothesized value of the corresponding population parameter.

If the observed and hypothesized values differ (as they almost certainly will), is the difference due to an incorrect hypothesis or merely due to chance variation?

General Procedure for Hypothesis Testing

1. Formulate the null hypothesis and the alternative hypothesis.

• The null hypothesis $H_0$ is the statement being tested. Usually it states that the difference between the observed value and the hypothesized value is only due to chance variation. For example, $\mu = 16$ oz.

• The alternative hypothesis $H_a$ is the statement we will favor if we find evidence that the null hypothesis is false. It usually states that there is a real difference between the observed and hypothesized values. For example, $\mu \neq 16$, $\mu > 16$, or $\mu < 16$.

A test is called

• two-sided if $H_a$ is of the form $\mu \neq 16$;

• one-sided if $H_a$ is of the form $\mu > 16$ or $\mu < 16$.


Example: GRE Scores

The mean score of all examinees on the Verbal and Quantitative sections of the GRE is about 1040. Suppose 50 randomly sampled UC Berkeley graduate students have a mean GRE V+Q score of 1310. We are interested in determining if a mean GRE V+Q score of 1310 gives evidence that, as a whole, Berkeley graduate students have a higher mean GRE score than the national average.

What is $H_0$? What is $H_a$?


General Procedure for Hypothesis Testing (cont.)

2. Calculate the test statistic on which the test will be based.

The test statistic measures the difference between the observed data and what would be expected if the null hypothesis were true. When $H_0$ is true, we expect the estimate based on the sample to take a value near the parameter value specified by $H_0$.

Our goal is to answer the question, "How far is the value calculated from the sample from what we would expect under the null hypothesis?" In many common situations the test statistic has the form

$$\frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}}$$


For the Coke example, the mean of the sample is 15.94 oz. The population mean specified by the null hypothesis is 16 oz. A test statistic is

$$z = \frac{15.94 - 16}{0.2/\sqrt{100}} = -3$$

(We'll have more to say about this in a moment.)
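The arithmetic behind this test statistic can be checked with a short Python sketch. The function name `z_statistic` and its parameter names are illustrative choices, not part of the original notes.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """(estimate - hypothesized value) / (standard deviation of the estimate)."""
    standard_error = sigma / math.sqrt(n)  # SD of the sample mean
    return (sample_mean - mu0) / standard_error

# Coke example: sample mean 15.94 oz, null value 16 oz, SD 0.2 oz, n = 100
z = z_statistic(15.94, 16.0, 0.2, 100)
print(round(z, 2))  # -3.0
```

The sample mean sits three standard errors below the hypothesized 16 oz.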

3. Find the p-value of the observed result.

• The p-value is the probability of observing a test statistic as extreme or more extreme than actually observed, assuming the null hypothesis $H_0$ is true.

• The smaller the p-value, the stronger the evidence against the null hypothesis.

• If the p-value is as small or smaller than some number $\alpha$ (e.g. 0.01, 0.05), we say that the result is statistically significant at level $\alpha$.

• $\alpha$ is called the significance level of the test.

In the case of the Coke example, $p = 0.0013$ for a one-sided test or $p = 0.0026$ for a two-sided test. (Once again, we'll have more to say about this in a moment.)
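As a rough check of the two p-values quoted for the Coke example, the sketch below evaluates standard normal tail areas with Python's `statistics.NormalDist`. It assumes the test statistic $z = -3$ and uses the lower tail for the one-sided test; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

z = -3.0  # test statistic from the Coke example

p_one_sided = Z.cdf(z)                  # P(Z <= -3), for Ha: mu < 16
p_two_sided = 2 * (1 - Z.cdf(abs(z)))   # 2 P(Z >= 3), for Ha: mu != 16

print(f"{p_one_sided:.5f}")  # 0.00135
print(f"{p_two_sided:.5f}")  # 0.00270

alpha = 0.01
print(p_two_sided <= alpha)  # True: significant at level 0.01
```

These agree with the quoted 0.0013 and 0.0026 up to rounding in the last digit.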


Interpretation of the Significance Level

To perform a test of significance level $\alpha$, we perform the previous three steps and then reject $H_0$ if the p-value is less than $\alpha$.

The following outcomes are possible when conducting a test:

                          Reality
Our decision       $H_0$ true         $H_a$ true
$H_0$              correct            Type II error
$H_a$              Type I error       correct

Suppose $H_0$ is actually true. If we draw many samples and perform a test for each one, a proportion $\alpha$ of these tests will (incorrectly) reject $H_0$. In other words, $\alpha$ is the probability that we will make a Type I error.

Type II error is related to the notion of the power of a test, which we will discuss later.
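The claim that $\alpha$ is the Type I error probability can be illustrated by simulation. The sketch below is a minimal illustration, not part of the original notes. It assumes the Coke setting with $H_0$ true ($\mu = 16$, $\sigma = 0.2$, $n = 100$). As a shortcut, it draws each sample mean directly from its normal sampling distribution instead of simulating 100 bottles. The function name, seed and number of trials are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=100, mu0=16.0, sigma=0.2,
                      trials=20_000, seed=0):
    """Fraction of two-sided z tests that reject H0 when H0 is true."""
    rng = random.Random(seed)
    Z = NormalDist()
    se = sigma / math.sqrt(n)  # SD of the sample mean
    rejections = 0
    for _ in range(trials):
        # Shortcut (assumption): under H0 the sample mean is N(mu0, se),
        # so draw it directly rather than averaging n simulated bottles.
        xbar = rng.gauss(mu0, se)
        z = (xbar - mu0) / se
        p = 2 * (1 - Z.cdf(abs(z)))
        if p < alpha:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # should land close to alpha = 0.05
```

Changing `alpha` should move the observed rejection rate with it.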


Example: An Exact Binomial Test

In the last 51 World Series (through 2003) there have been 24 seven-game series. Suppose we wish to test the hypothesis

$H_0$: Games within a World Series are independent, with each team having probability $1/2$ of winning.

For the alternative hypothesis, let's use the generic

$H_a$: The model in $H_0$ is incorrect.

Let $X$ denote the number of games in the World Series. Under $H_0$, $X$ has the following distribution:

k             4       5       6       7
P(X = k)      1/8     1/4     5/16    5/16

For our test statistic, let's just use

M = # seven-game series

What is the p-value?

We need to find $m$ such that $P_{H_0}(M > m) \approx 0.05$. Assuming different years' World Series are independent (i.e. that the last 51 World Series are an SRS from the population of World Series), the number of seven-game series in 51 "trials" is $B(51, 5/16)$ under $H_0$.

$P(M > 20) = 0.086$
$P(M > 21) = 0.049$

We want to have a significance level of no more than 5%, so the critical value will be 21.

Do we reject $H_0$ at significance level $\alpha = 0.05$? This is just a matter of checking whether our observed value of $M$ (24) exceeds the critical value (21). It does, so we reject $H_0$.
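The numbers in this example can be reproduced with exact binomial sums. The sketch below is a rough check under the stated model (independent fair games, independent series). The helper names `series_length_pmf` and `binom_upper_tail` are illustrative. Because $M$ is integer-valued, $P(M \geq 21)$ is the same as $P(M > 20)$.

```python
from math import comb

def series_length_pmf(k):
    """P(X = k) for a best-of-seven series with independent fair games."""
    # The series winner takes game k and exactly 3 of the first k - 1 games;
    # the factor 2 covers either team being the winner.
    return 2 * comb(k - 1, 3) * 0.5 ** k

def binom_upper_tail(m, n, p):
    """P(M >= m) for M ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(m, n + 1))

p7 = series_length_pmf(7)  # probability that a series goes seven games
print([series_length_pmf(k) for k in (4, 5, 6, 7)])  # [0.125, 0.25, 0.3125, 0.3125]

n = 51
for m in (21, 22, 24):
    print(m, round(binom_upper_tail(m, n, p7), 3))
```

The last line printed is $P(M \geq 24)$, the p-value for the observed 24 seven-game series, which comes out near 0.013 under these assumptions.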


Tests for a Population Mean

In the preceding example, we were able to perform an exact Binomial test. Frequently, an exact test is impractical, but we can use the approximate normality of means to conduct an approximate test. Suppose we want to test the hypothesis that $\mu$ has a specific value:

$H_0: \mu = \mu_0$

Since $\bar{x}$ estimates $\mu$, the test is based on $\bar{x}$, which has a (perhaps approximately) Normal distribution. Thus,

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

is a standard normal random variable, under the null hypothesis.

p-values for different alternative hypotheses:

• $H_a: \mu > \mu_0$: p-value is $P(Z \geq z)$ (area of right-hand tail)

• $H_a: \mu < \mu_0$: p-value is $P(Z \leq z)$ (area of left-hand tail)

• $H_a: \mu \neq \mu_0$: p-value is $2P(Z \geq |z|)$ (area of both tails)
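The three cases above can be collected into one small helper. This is a minimal sketch using the standard normal CDF from Python's `statistics` module. The function name `z_test_p_value` and the labels `"greater"`, `"less"` and `"two-sided"` are illustrative choices.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative):
    """p-value of a z test for the given form of the alternative hypothesis."""
    Z = NormalDist()  # standard normal
    if alternative == "greater":    # Ha: mu > mu0, right-hand tail
        return 1 - Z.cdf(z)
    if alternative == "less":       # Ha: mu < mu0, left-hand tail
        return Z.cdf(z)
    if alternative == "two-sided":  # Ha: mu != mu0, both tails
        return 2 * (1 - Z.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_test_p_value(1.5, alt), 4))
```

When $z$ falls on the side named by a one-sided alternative, the two-sided p-value is twice the one-sided one.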

Example: Filling Coke Bottles (cont.)

We are interested in assessing whether or not the machine needs to be recalibrated, which will be the case if it is systematically over- or under-filling bottles. Thus, we will use the hypotheses

$H_0: \mu = 16$
$H_a: \mu \neq 16$

Recall that $\bar{x} = 15.94$, $\sigma = 0.2$, and $n = 100$. Thus,

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = -3$$

The p-value for a two-sided test is $p = 2P(Z \geq 3) = 0.0026$.

If $\alpha = 0.01$, we reject $H_0$.


Example: TV Tubes

TV tubes are taken at random and the lifetime measured. $n = 100$, $\sigma = 300$ and $\bar{x} = 1265$ (days). Test whether the population mean is 1200, or greater than 1200.

$H_0: \mu = 1200$
$H_a: \mu > 1200$

Under $H_0$, $\bar{x} \sim N(1200, 30)$, where $30 = \sigma/\sqrt{n}$ is the standard deviation of $\bar{x}$. Therefore

$$z = \frac{\bar{x} - 1200}{30} \sim N(0, 1) \text{ under } H_0$$

The test statistic is

$$z = \frac{1265 - 1200}{30} = 2.17$$

and the p-value is $P(Z \geq 2.17 \mid H_0) = 0.015$.

This is evidence against $H_0$ at significance level 0.05, so we reject $H_0$. That is, we conclude that the average lifetime of TV tubes is greater than 1200 days.
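The TV tube calculation can be verified in a few lines of Python. This sketch assumes the one-sided alternative $\mu > 1200$ used in the example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, sigma, xbar, mu0 = 100, 300.0, 1265.0, 1200.0

se = sigma / math.sqrt(n)      # standard deviation of the sample mean: 30 days
z = (xbar - mu0) / se          # test statistic
p = 1 - NormalDist().cdf(z)    # right-hand tail area, since Ha is mu > 1200

print(round(z, 2), round(p, 3))  # 2.17 0.015
print(p < 0.05)                  # True: reject H0 at level 0.05
```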


A Rough Interpretation of p-values

p-value                 Interpretation
p > 0.10                no evidence against $H_0$
0.05 < p ≤ 0.10         weak evidence against $H_0$
0.01 < p ≤ 0.05         evidence against $H_0$
p ≤ 0.01                strong evidence against $H_0$

Statistical vs. Practical Significance

Saying that a result is statistically significant does not signify that it is large or necessarily important. That decision depends on the particulars of the problem. A statistically significant result only says that there is substantial evidence that $H_0$ is false.

Failure to reject $H_0$ does not imply that $H_0$ is correct. It only implies that we have insufficient evidence to conclude that $H_0$ is incorrect.


Confidence Intervals and Hypothesis Tests

A level $\alpha$ two-sided test rejects a hypothesis $H_0: \mu = \mu_0$ exactly when the value of $\mu_0$ falls outside a $(1 - \alpha)$ confidence interval for $\mu$.

For example, consider a two-sided test of the following hypotheses

$H_0: \mu = \mu_0$
$H_a: \mu \neq \mu_0$

at the significance level $\alpha = .05$.

• If $\mu_0$ is a value inside the 95% confidence interval for $\mu$, then this test will have a p-value greater than .05, and therefore will not reject $H_0$.

• If $\mu_0$ is a value outside the 95% confidence interval for $\mu$, then this test will have a p-value smaller than .05, and therefore will reject $H_0$.
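The correspondence between the two-sided test and the confidence interval can be checked numerically. The sketch below is an illustration only. It assumes a z interval with known $\sigma$ and reuses the Coke numbers ($\bar{x} = 15.94$, $\sigma = 0.2$, $n = 100$); the three candidate values of $\mu_0$ are arbitrary choices.

```python
import math
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return the two-sided p-value and the (1 - alpha) z interval for mu."""
    Z = NormalDist()
    se = sigma / math.sqrt(n)
    z = (xbar - mu0) / se
    p = 2 * (1 - Z.cdf(abs(z)))
    z_star = Z.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    ci = (xbar - z_star * se, xbar + z_star * se)
    return p, ci

results = []
for mu0 in (15.90, 15.95, 16.00):
    p, (lo, hi) = two_sided_z_test(15.94, mu0, 0.2, 100)
    rejects = p < 0.05                  # decision of the level-0.05 test
    outside = not (lo <= mu0 <= hi)     # is mu0 outside the 95% interval?
    results.append((mu0, rejects, outside))
    print(mu0, rejects, outside)
```

For each candidate $\mu_0$ the two flags agree: the test rejects exactly when $\mu_0$ lies outside the interval.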

Example

A particular area contains 8000 condominium units. In a survey of the occupants, a simple random sample of size 100 yields the information that there are 160 motor vehicles in the sample, giving an average number of motor vehicles per unit of 1.6, with a sample standard deviation of 0.8.

Construct a confidence interval for the total number of vehicles in the area.

The city claims that there are only 11,000 vehicles in the area, so there is no need for a new garage. What do you think?
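One possible way to work the numbers is sketched below. It rests on assumptions the example does not state: a 95% confidence level, a normal approximation for the sample mean, and no finite-population correction (the sample is only 1.25% of the 8000 units). The variable names are illustrative.

```python
import math
from statistics import NormalDist

N = 8000             # condominium units in the area
n = 100              # units in the simple random sample
mean_per_unit = 1.6  # sample mean vehicles per unit
sd_per_unit = 0.8    # sample standard deviation

z_star = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence (assumed level)
se_mean = sd_per_unit / math.sqrt(n)   # SE of the per-unit mean, no finite-population correction
total_hat = N * mean_per_unit          # estimated total number of vehicles
se_total = N * se_mean                 # SE of the estimated total

ci = (total_hat - z_star * se_total, total_hat + z_star * se_total)
print(round(ci[0]), round(ci[1]))  # 11546 14054
print(11_000 < ci[0])              # True: the claimed total is below the interval
```

Under these assumptions the claimed 11,000 vehicles lies outside the interval, so a two-sided test of that total at level 0.05 would reject it.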


More on Constructing Hypothesis Tests

Hypotheses always refer to some population or model, not to a particular outcome. As a result, $H_0$ and $H_a$ must be expressed in terms of some population parameter or parameters.

$H_a$ typically expresses the effect that we hope to find evidence for. So $H_a$ is usually carefully thought out first. We then set up $H_0$ to be the case when the hoped-for effect is not present.

It is not always clear whether $H_a$ should be one-sided or two-sided, i.e., does the parameter differ from its null hypothesis value in a specified direction.

Note: You are not allowed to look at the data first and then frame $H_a$ to fit what the data show.


Potential Abuses of Tests

In many applications, a researcher constructs a null hypothesis with the intent of discrediting it. For example:

• $H_0$: new drug has the same effect as placebo

• $H_0$: men and women are paid equally

A small p-value can help a drug company get a drug approved by the FDA. Similarly, a researcher may have an easier time publishing his results if the p-value is smaller than 0.05.

Because of that we have to be aware of the following potential abuses:

• Using one-sided tests to make the p-value one-half as big

• Conducting repeated sampling and testing and reporting only the lowest p-value

• Testing many hypotheses or testing the same hypothesis on many different subgroups

In the last two, even if there is actually no effect, you will probably get at least one small p-value.
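The last point can be made quantitative under a simplifying assumption that is not in the original notes: the $k$ tests are independent and every null hypothesis is true. Each test then has probability $\alpha$ of giving $p \leq \alpha$, so the chance of at least one small p-value is $1 - (1 - \alpha)^k$. The function name below is illustrative.

```python
def prob_at_least_one_small_p(k, alpha=0.05):
    """P(at least one of k independent tests has p <= alpha) when every H0 is true."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 50):
    print(k, round(prob_at_least_one_small_p(k), 3))
# 1 0.05
# 5 0.226
# 20 0.642
# 50 0.923
```

With 20 such tests the chance of at least one "significant" result is already about 64%, even though no effect exists.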