Tests of Significance
Outline:
• General Procedure for Hypothesis Testing
  – Null and Alternative Hypotheses
  – Test Statistics
  – p-values
• Interpretation of the Significance Level
• Tests for a Population Mean
• Interpretation of p-values
• Statistical vs. Practical Significance
• Confidence Intervals and Hypothesis Tests
• Potential Abuses of Tests
A confidence interval is a very useful statistical inference tool when the goal is to estimate a population parameter.
When the goal is to assess the evidence provided by the data in favor of some claim about the population, tests of significance are used.
Example: Filling Coke Bottles
A machine at a Coke production plant is designed to fill bottles with 16 oz of Coke. The actual amount varies slightly from bottle to bottle. From past experience, it is known that the SD is 0.2 oz. An SRS of 100 bottles filled by the machine has a mean of 15.94 oz per bottle. Is this evidence that the machine needs to be recalibrated, or could this difference be a result of random variation?
Testing Hypotheses
A hypothesis test is an assessment of the evidence provided by the data in favor of (or against) some claim about the population.
For example, suppose we perform a randomized experiment or take a random sample and calculate some sample statistic, say the sample mean. We want to decide if the observed value of the sample statistic is consistent with some hypothesized value of the corresponding population parameter.
If the observed and hypothesized values differ (as they almost certainly will), is the difference due to an incorrect hypothesis or merely due to chance variation?
General Procedure for Hypotheses Testing
1. Formulate the null hypothesis and the alternative hypothesis
• The null hypothesis $H_0$ is the statement being tested. Usually it states that the difference between the observed value and the hypothesized value is only due to chance variation. For example, $\mu = 16$ oz.
• The alternative hypothesis $H_a$ is the statement we will favor if we find evidence that the null hypothesis is false. It usually states that there is a real difference between the observed and hypothesized values. For example, $\mu \neq 16$, $\mu > 16$, or $\mu < 16$.
A test is called
• two-sided if $H_a$ is of the form $\mu \neq 16$;
• one-sided if $H_a$ is of the form $\mu > 16$ or $\mu < 16$.
Example: GRE Scores
The mean score of all examinees on the Verbal and Quantitative sections of the GRE is about 1040. Suppose 50 randomly sampled UC Berkeley graduate students have a mean GRE V+Q score of 1310. We are interested in determining if a mean GRE V+Q score of 1310 gives evidence that, as a whole, Berkeley graduate students have a higher mean GRE score than the national average.
What is $H_0$? What is $H_a$?
General Procedure for Hypotheses Testing cont...
2. Calculate the test statistic on which the test will be based.
The test statistic measures the difference between the observed data and what would be expected if the null hypothesis were true. When $H_0$ is true, we expect the estimate based on the sample to take a value near the parameter value specified by $H_0$.
Our goal is to answer the question, "How far is the value calculated from the sample from what we would expect under the null hypothesis?" In many common situations the test statistic has the form
$$\frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}}$$
For the Coke example, the mean of the sample is 15.94 oz. The population mean specified by the null hypothesis is 16 oz. A test statistic is
$$z = \frac{15.94 - 16}{0.2/\sqrt{100}} = -3$$
(We'll have more to say about this in a moment.)
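As a small illustration, the general form "estimate minus hypothesized value, divided by the standard deviation of the estimate" can be sketched in Python for the Coke numbers. The function name `z_statistic` is a hypothetical helper, and the sketch assumes the population SD is known, as in the example.

```python
import math

def z_statistic(estimate, hypothesized, sd, n):
    """(estimate - hypothesized value) / (standard deviation of the estimate)."""
    standard_error = sd / math.sqrt(n)  # SD of the sample mean of n observations
    return (estimate - hypothesized) / standard_error

# Coke example: x-bar = 15.94 oz, mu_0 = 16 oz, sigma = 0.2 oz, n = 100
z = z_statistic(15.94, 16.0, 0.2, 100)
print(round(z, 2))
```

The sample mean sits three standard errors below the hypothesized 16 oz, which is what the value of z expresses.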
3. Find the p-value of the observed result
• The p-value is the probability of observing a test statistic as extreme or more extreme than actually observed, assuming the null hypothesis $H_0$ is true.
• The smaller the p-value, the stronger the evidence against the null hypothesis.
• If the p-value is as small or smaller than some number $\alpha$ (e.g. 0.01, 0.05), we say that the result is statistically significant at level $\alpha$.
• $\alpha$ is called the significance level of the test.
In the case of the Coke example, $p = 0.0013$ for a one-sided test or $p = 0.0026$ for a two-sided test. (Once again, we'll have more to say about this in a moment.)
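The Coke p-values can be reproduced with a short sketch, assuming the test statistic z = −3 is compared with a standard normal distribution (the helper `upper_tail` is a hypothetical name). The two-sided value comes out near 0.0027 unrounded; doubling the rounded one-sided value 0.0013 gives the 0.0026 quoted in the text.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = -3.0                                # Coke example test statistic
p_one_sided = upper_tail(abs(z))        # Ha: mu < 16; P(Z <= -3) equals P(Z >= 3)
p_two_sided = 2 * upper_tail(abs(z))    # Ha: mu != 16; both tails

print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")

# A result is statistically significant at level alpha when p <= alpha
for alpha in (0.05, 0.01, 0.001):
    print(alpha, p_two_sided <= alpha)
```

The same p-value can be significant at one level and not at another, so the level should be stated alongside the conclusion.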
Interpretation of the Significance Level
To perform a test of significance level $\alpha$, we perform the previous three steps and then reject $H_0$ if the p-value is less than or equal to $\alpha$.
The following outcomes are possible when conducting a test:
                 Our Decision
Reality          H0               Ha
H0               √                Type I Error
Ha               Type II Error    √
Suppose $H_0$ is actually true. If we draw many samples, and perform a test for each one, a proportion $\alpha$ of these tests will (incorrectly) reject $H_0$. In other words, $\alpha$ is the probability that we will make a Type I error.
Type II error is related to the notion of the power of a test, which we will discuss later.
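The long-run reading of α can be checked by simulation. The sketch below assumes the Coke setting with H0 true (µ = 16, σ = 0.2, n = 100) and draws each sample mean directly from its sampling distribution. The function names, trial count and seed are illustrative choices.

```python
import math
import random

def two_sided_p(z):
    """2 * P(Z >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

def type_one_error_rate(alpha, mu0=16.0, sigma=0.2, n=100, trials=20000, seed=1):
    """Fraction of simulated tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        xbar = rng.gauss(mu0, se)   # sample mean drawn under H0
        z = (xbar - mu0) / se
        if two_sided_p(z) <= alpha:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate(0.05)
print(rate)
```

The printed fraction should land close to 0.05, up to simulation noise.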
Example: An Exact Binomial Test
In the last 51 World Series (through 2003) there have been 24 seven-game series. Suppose we wish to test the hypothesis
$H_0$: Games within a World Series are independent, with each team having probability $1/2$ of winning.
For the alternative hypothesis, let's use the generic
$H_a$: The model in $H_0$ is incorrect.
Let $X$ denote the number of games in the World Series. Under $H_0$, $X$ has the following distribution:
k           4      5      6      7
P(X = k)    1/8    1/4    5/16   5/16
For our test statistic, let's just use
$M$ = # seven-game series
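The distribution of X under H0 can be checked by counting. For the series to end in exactly k games, the eventual winner must win game k and exactly three of the first k − 1 games. A minimal sketch with exact fractions (the function name is hypothetical):

```python
from fractions import Fraction
from math import comb

def series_length_pmf(k, wins_needed=4):
    """P(X = k) for a best-of-seven series of independent, evenly matched games."""
    # The winner takes game k and wins_needed - 1 of the first k - 1 games;
    # the factor 2 covers either team being the winner.
    return 2 * comb(k - 1, wins_needed - 1) * Fraction(1, 2) ** k

pmf = {k: series_length_pmf(k) for k in range(4, 8)}
for k, prob in pmf.items():
    print(k, prob)
```

In particular, the chance of a seven-game series under H0 is 5/16, which is the success probability used for M.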
What is the p-value?
We need to find $m$ such that $P_{H_0}(M > m) \approx 0.05$. Assuming different years' World Series are independent (i.e. that the last 51 World Series are an SRS from the population of World Series), the number of seven-game series in 51 "trials" is $B(51, 5/16)$.
$P(M > 20) = 0.086$, $P(M > 21) = 0.049$
We want to have a significance level of no more than 5%, so the critical value will be 21.
Do we reject $H_0$ at significance level $\alpha = 0.05$? This is just a matter of checking whether our observed value of $M$ (24) exceeds the critical value (21). It does, so we reject $H_0$.
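The binomial tail probabilities can be computed exactly with the standard library. The sketch below assumes the convention that the test rejects when the observed count strictly exceeds the critical value, and it also evaluates the p-value P(M ≥ 24), which comes out at about 0.013. The function name is hypothetical.

```python
from math import comb

def prob_exceeds(m, n, p):
    """P(M > m) for M ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1, n + 1))

n, p, alpha = 51, 5 / 16, 0.05

# Critical value (assumed convention): smallest m with P(M > m) <= alpha,
# so the test rejects H0 when the observed count exceeds m.
critical = next(m for m in range(n + 1) if prob_exceeds(m, n, p) <= alpha)

observed = 24
p_value = prob_exceeds(observed - 1, n, p)   # P(M >= 24)

print("critical value:", critical)
print("P(M > critical):", round(prob_exceeds(critical, n, p), 3))
print("p-value:", round(p_value, 3))
print("reject H0:", observed > critical)
```

Comparing the p-value with α and comparing the observed count with the critical value lead to the same decision.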
Tests for a Population Mean
In the preceding example, we were able to perform an exact Binomial test. Frequently, an exact test is impractical, but we can use the approximate normality of means to conduct an approximate test.
Suppose we want to test the hypothesis that $\mu$ has a specific value:
$$H_0: \mu = \mu_0$$
Since $\bar{x}$ estimates $\mu$, the test is based on $\bar{x}$, which has a (perhaps approximately) Normal distribution. Thus,
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
is a standard normal random variable, under the null hypothesis.
p-values for different alternative hypotheses:
• $H_a: \mu > \mu_0$ – p-value is $P(Z \geq z)$ (area of right-hand tail)
• $H_a: \mu < \mu_0$ – p-value is $P(Z \leq z)$ (area of left-hand tail)
• $H_a: \mu \neq \mu_0$ – p-value is $2P(Z \geq |z|)$ (area of both tails)
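The three cases can be collected into one helper. This is a sketch only: the function names and the string labels for the alternatives are illustrative choices, and z = 1.5 is a made-up value used to show how the three p-values relate.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def z_test_p_value(z, alternative):
    """p-value of a z test; `alternative` is 'greater', 'less', or 'two-sided'."""
    if alternative == "greater":      # Ha: mu > mu_0, right-hand tail
        return 1.0 - normal_cdf(z)
    if alternative == "less":         # Ha: mu < mu_0, left-hand tail
        return normal_cdf(z)
    if alternative == "two-sided":    # Ha: mu != mu_0, both tails
        return 2.0 * (1.0 - normal_cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_test_p_value(1.5, alt), 4))
```

The two one-sided p-values always sum to 1, and the two-sided p-value is twice the smaller of them.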
Example: Filling Coke Bottles (cont.)
We are interested in assessing whether or not the machine needs to be recalibrated, which will be the case if it is systematically over- or under-filling bottles. Thus, we will use the hypotheses
$$H_0: \mu = 16 \qquad H_a: \mu \neq 16$$
Recall that $\bar{x} = 15.94$, $\sigma = 0.2$, and $n = 100$. Thus,
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = -3$$
The p-value for a two-sided test is
$$p = 2P(Z \geq 3) = 0.0026.$$
If $\alpha = 0.01$, we reject $H_0$.
Example: TV Tubes
TV tubes are taken at random and the lifetime measured. $n = 100$, $\sigma = 300$ and $\bar{x} = 1265$ (days). Test whether the population mean is 1200, or greater than 1200.
$$H_0: \mu = 1200 \qquad H_a: \mu > 1200$$
Under $H_0$, $\bar{x} \sim N(1200, 30)$, where 30 is the standard deviation $\sigma/\sqrt{n} = 300/\sqrt{100}$. Therefore
$$z = \frac{\bar{x} - 1200}{30} \sim N(0, 1) \text{ under } H_0.$$
The test statistic is
$$z = \frac{1265 - 1200}{30} = 2.17,$$
and the p-value is $P(Z \geq 2.17 \mid H_0) = 0.015$.
This is evidence against $H_0$ at significance level 0.05, so we reject $H_0$. That is, we conclude that the average lifetime of TV tubes is greater than 1200 days.
A Rough Interpretation of p-values
p-value              Interpretation
p > 0.10             no evidence against H0
0.05 < p ≤ 0.10      weak evidence against H0
0.01 < p ≤ 0.05      evidence against H0
p ≤ 0.01             strong evidence against H0
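For convenience, the rough scale in the table can be written as a lookup. This is only a sketch of the table's cut-offs (the function name is hypothetical), and the verbal labels remain rules of thumb rather than formal decisions.

```python
def rough_interpretation(p):
    """Map a p-value to the rough verbal scale in the table."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > 0.10:
        return "no evidence against H0"
    if p > 0.05:
        return "weak evidence against H0"
    if p > 0.01:
        return "evidence against H0"
    return "strong evidence against H0"

# p-values from the examples in these notes
print(rough_interpretation(0.0026))  # Coke bottles, two-sided test
print(rough_interpretation(0.015))   # TV tubes, one-sided test
```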
Statistical vs. Practical Significance
Saying that a result is statistically significant does not signify that it is large or necessarily important. That decision depends on the particulars of the problem. A statistically significant result only says that there is substantial evidence that $H_0$ is false.
Failure to reject $H_0$ does not imply that $H_0$ is correct. It only implies that we have insufficient evidence to conclude that $H_0$ is incorrect.
Confidence Intervals and Hypothesis Tests
A level $\alpha$ two-sided test rejects a hypothesis $H_0: \mu = \mu_0$ exactly when the value of $\mu_0$ falls outside a $(1 - \alpha)$ confidence interval for $\mu$.
For example, consider a two-sided test of the following hypotheses
$$H_0: \mu = \mu_0 \qquad H_a: \mu \neq \mu_0$$
at the significance level $\alpha = .05$.
• If $\mu_0$ is a value inside the 95% confidence interval for $\mu$, then this test will have a p-value greater than .05, and therefore will not reject $H_0$.
• If $\mu_0$ is a value outside the 95% confidence interval for $\mu$, then this test will have a p-value smaller than .05, and therefore will not fail to reject $H_0$; that is, it will reject $H_0$.
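The correspondence can be checked numerically. The sketch below reuses the Coke numbers (x̄ = 15.94, σ = 0.2, n = 100) and assumes a known-σ z interval. The candidate values 15.90 and 15.95 for µ0 are made up for illustration, and the helper names are hypothetical.

```python
import math

Z_CRIT_95 = 1.959964  # standard normal 0.975 quantile, for alpha = 0.05

def two_sided_p(xbar, mu0, sigma, n):
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))      # 2 * P(Z >= |z|)

def confidence_interval_95(xbar, sigma, n):
    margin = Z_CRIT_95 * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

xbar, sigma, n = 15.94, 0.2, 100
low, high = confidence_interval_95(xbar, sigma, n)

results = []
for mu0 in (15.90, 15.95, 16.00):
    inside = low <= mu0 <= high
    rejected = two_sided_p(xbar, mu0, sigma, n) < 0.05
    results.append((mu0, inside, rejected))
    print(mu0, "inside CI:", inside, "rejected:", rejected)
```

For each candidate µ0, exactly one of "inside the interval" and "rejected" holds.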
Example
A particular area contains 8000 condominium units. In a survey of the occupants, a simple random sample of size 100 yields the information that there are 160 motor vehicles in the sample, giving an average number of motor vehicles per unit of 1.6, with a sample standard deviation of 0.8.
Construct a confidence interval for the total number of vehicles in the area.
The city claims that there are only 11,000 vehicles in the area, so there is no need for a new garage. What do you think?
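One possible way to work this example is sketched below. It rests on assumptions the slide does not state: a 95% confidence level, a normal approximation using the sample SD in place of σ, and no finite population correction (the sample is a small fraction of the 8000 units).

```python
import math

N_UNITS = 8000                 # condominium units in the area
n, mean, sd = 100, 1.6, 0.8    # sample size, sample mean, sample SD
Z_CRIT_95 = 1.96               # assumed 95% confidence level

se_mean = sd / math.sqrt(n)    # standard error of the mean vehicles per unit
low_mean = mean - Z_CRIT_95 * se_mean
high_mean = mean + Z_CRIT_95 * se_mean

# Scale the interval for the per-unit mean up to the total over all units
low_total, high_total = N_UNITS * low_mean, N_UNITS * high_mean
print(round(low_total), round(high_total))

claim = 11_000
print("claim inside interval:", low_total <= claim <= high_total)
```

Under these assumptions the city's figure of 11,000 falls below the lower end of the interval, so by the correspondence between intervals and two-sided tests the claim would be rejected at the 5% level.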
More on Constructing Hypothesis Tests
Hypotheses always refer to some population or model, not to a particular outcome. As a result, $H_0$ and $H_a$ must be expressed in terms of some population parameter or parameters.
$H_a$ typically expresses the effect that we hope to find evidence for. So $H_a$ is usually carefully thought out first. We then set up $H_0$ to be the case when the hoped-for effect is not present.
It is not always clear whether $H_a$ should be one-sided or two-sided, i.e., does the parameter differ from its null hypothesis value in a specified direction.
Note: You are not allowed to look at the data first and then frame $H_a$ to fit what the data show.
Potential Abuses of Tests
In many applications, a researcher constructs a null hypothesis with the intent of discrediting it. For example:
• $H_0$: new drug has the same effect as placebo
• $H_0$: men and women are paid equally
A small p-value can help a drug company get a drug approved by the FDA. Similarly, a researcher may have an easier time publishing his results if the p-value is smaller than 0.05.
Because of that we have to be aware of the following potential abuses:
• Using one-sided tests to make the p-value one-half as big
• Conducting repeated sampling and testing and reporting only the lowest p-value
• Testing many hypotheses or testing the same hypothesis on many different subgroups.
In the last two, even if there is actually no effect, you will probably get at least one small p-value.
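The last point can be quantified under a simplifying assumption: if k tests are independent and every null hypothesis is true, each test has probability α of producing a p-value of at most α, so the chance of at least one such result is 1 − (1 − α)^k. A minimal sketch (the function name is hypothetical):

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """P(at least one p-value <= alpha) when all nulls are true and tests are independent."""
    return 1.0 - (1.0 - alpha) ** num_tests

for k in (1, 5, 20, 50):
    print(k, round(prob_at_least_one_false_positive(k), 3))
```

With 20 independent tests at α = 0.05, the chance of at least one "significant" result is already about 64%, even though no effect exists.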