
Tests of Significance


Academic year: 2021



Outline:

• General Procedure for Hypothesis Testing
  – Null and Alternative Hypotheses
  – Test Statistics
  – p-values
• Interpretation of the Significance Level
• Tests for a Population Mean
• Interpretation of p-values
• Statistical vs. Practical Significance
• Confidence Intervals and Hypothesis Tests
• Potential Abuses of Tests


A confidence interval is a very useful statistical inference tool when the goal is to estimate a population parameter.

When the goal is to assess the evidence provided by the data in favor of some claim about the population, tests of significance are used.

Example: Filling Coke Bottles

A machine at a Coke production plant is designed to fill bottles with 16 oz of Coke. The actual amount varies slightly from bottle to bottle. From past experience, it is known that the SD is 0.2 oz. An SRS of 100 bottles filled by the machine has a mean of 15.94 oz per bottle. Is this evidence that the machine needs to be recalibrated, or could this difference be a result of random variation?


Testing Hypotheses

A hypothesis test is an assessment of the evidence provided by the data in favor of (or against) some claim about the population. For example, suppose we perform a randomized experiment or take a random sample and calculate some sample statistic, say the sample mean. We want to decide if the observed value of the sample statistic is consistent with some hypothesized value of the corresponding population parameter.

If the observed and hypothesized values differ (as they almost certainly will), is the difference due to an incorrect hypothesis or merely due to chance variation?

General Procedure for Hypothesis Testing

1. Formulate the null hypothesis and the alternative hypothesis.

• The null hypothesis H0 is the statement being tested. Usually it states that the difference between the observed value and the hypothesized value is due only to chance variation. For example, µ = 16 oz.

• The alternative hypothesis Ha is the statement we will favor if we find evidence that the null hypothesis is false. It usually states that there is a real difference between the observed and hypothesized values. For example, µ ≠ 16, µ > 16, or µ < 16.

A test is called

• one-sided if Ha is of the form µ > 16 or µ < 16;
• two-sided if Ha is of the form µ ≠ 16.


Example: GRE Scores

The mean score of all examinees on the Verbal and Quantitative sections of the GRE is about 1040. Suppose 50 randomly sampled UC Berkeley graduate students have a mean GRE V+Q score of 1310. We are interested in determining whether a mean GRE V+Q score of 1310 gives evidence that, as a whole, Berkeley graduate students have a higher mean GRE score than the national average.

What is H0? What is Ha?


General Procedure for Hypothesis Testing (cont.)

2. Calculate the test statistic on which the test will be based.

The test statistic measures the difference between the observed data and what would be expected if the null hypothesis were true. When H0 is true, we expect the estimate based on the sample to take a value near the parameter value specified by H0.

Our goal is to answer the question, "How extreme is the value calculated from the sample compared with what we would expect under the null hypothesis?" In many common situations the test statistic has the form

$$\text{test statistic} = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard deviation of the estimate}}$$


For the Coke example, the sample mean is 15.94 oz and the population mean specified by the null hypothesis is 16 oz. A test statistic is

$$z = \frac{15.94 - 16}{0.2/\sqrt{100}} = -3$$

(We'll have more to say about this in a moment.)

3. Find the p-value of the observed result.

• The p-value is the probability of observing a test statistic as extreme as or more extreme than the one actually observed, assuming the null hypothesis H0 is true.

• The smaller the p-value, the stronger the evidence against the null hypothesis.

• If the p-value is as small as or smaller than some number α (e.g. 0.01, 0.05), we say that the result is statistically significant at level α.

• α is called the significance level of the test.

In the case of the Coke example, p = 0.0013 for a one-sided test or p = 0.0027 for a two-sided test. (Once again, we'll have more to say about this in a moment.)
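The Coke p-values can be reproduced with the standard normal CDF. This minimal sketch uses Python's standard-library `statistics.NormalDist`; it assumes the one-sided alternative is Ha: µ < 16 (the observed mean is below 16 oz), so the one-sided p-value is the left-hand tail.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

z = -3.0                                # Coke example test statistic
p_one_sided = Z.cdf(z)                  # P(Z <= z), for Ha: mu < 16
p_two_sided = 2 * Z.cdf(-abs(z))        # 2 P(Z >= |z|), for Ha: mu != 16
print(f"{p_one_sided:.4f} {p_two_sided:.4f}")  # 0.0013 0.0027

alpha = 0.01
print("significant at level", alpha, ":", p_two_sided <= alpha)
```

The two-sided value is exactly twice the one-sided value here because the normal distribution is symmetric.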


Interpretation of the Significance Level

To perform a test at significance level α, we perform the previous three steps and then reject H0 if the p-value is less than α.

The following outcomes are possible when conducting a test:

                   Our Decision
Reality       H0                Ha
H0            √ (correct)       Type I Error
Ha            Type II Error     √ (correct)

Suppose H0 is actually true. If we draw many samples and perform a test on each one, then in the long run a fraction α of these tests will (incorrectly) reject H0. In other words, α is the probability that we will make a Type I error.
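The long-run reading of α can be checked by simulation. The sketch below is hypothetical: it assumes H0 is true with the Coke settings (µ = 16, σ = 0.2, n = 100), draws many samples, runs a two-sided z test on each, and reports the fraction that reject. That fraction should land close to α.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(alpha, mu0=16.0, sigma=0.2, n=100, reps=10_000, seed=1):
    """Fraction of two-sided z tests that reject H0: mu = mu0
    when the data really come from N(mu0, sigma)."""
    rng = random.Random(seed)
    Z = NormalDist()
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]  # H0 is true
        xbar = sum(sample) / n
        z = (xbar - mu0) / (sigma / math.sqrt(n))
        p = 2 * (1 - Z.cdf(abs(z)))
        if p <= alpha:  # any rejection here is a Type I error
            rejections += 1
    return rejections / reps

for alpha in (0.01, 0.05):
    print(alpha, rejection_rate(alpha))
```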

A Type II error is related to the notion of the power of a test, which we will discuss later.


Example: An Exact Binomial Test

In the last 51 World Series (through 2003) there have been 24 seven-game series. Suppose we wish to test the hypothesis

H0: Games within a World Series are independent, with each team having probability 1/2 of winning.

For the alternative hypothesis, let's use the generic Ha: The model in H0 is incorrect.

Let X denote the number of games in the World Series. Under H0, X has the following distribution:

k           4      5      6       7
P(X = k)    1/8    1/4    5/16    5/16

For our test statistic, let's just use

M = number of seven-game series
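The distribution of X under H0 follows from a counting argument: a best-of-seven series ends at game k when one team wins game k and exactly three of the previous k − 1 games. A minimal sketch using exact fractions (the function name `series_length_pmf` is hypothetical):

```python
from fractions import Fraction
from math import comb

def series_length_pmf(p=Fraction(1, 2)):
    """P(X = k) for a best-of-seven series, assuming independent games
    in which one team wins each game with probability p."""
    q = 1 - p
    pmf = {}
    for k in range(4, 8):
        # Either team can be the one that wins game k plus 3 of the first k - 1.
        pmf[k] = comb(k - 1, 3) * (p**4 * q ** (k - 4) + q**4 * p ** (k - 4))
    return pmf

pmf = series_length_pmf()
print({k: str(v) for k, v in pmf.items()})  # {4: '1/8', 5: '1/4', 6: '5/16', 7: '5/16'}
```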

What is the p-valu e?

We need to find m such that P(M > m | H0) ≈ 0.05. Assuming different years' World Series are independent (i.e. that the last 51 World Series are an SRS from the population of World Series), the number of seven-game series in 51 "trials" is B(51, 5/16).

P(M > 20) = P(M ≥ 21) = 0.086        P(M > 21) = P(M ≥ 22) = 0.049

We want a significance level of no more than 5%, so the critical value will be 21: we reject H0 when M exceeds 21.

Do we reject H0 at significance level α = 0.05? This is just a matter of checking whether our observed value of M (24) exceeds the critical value (21). It does, so we reject H0.
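The binomial tail probabilities, the critical value, and the p-value asked for above can all be computed directly. A minimal sketch, assuming M ∼ B(51, 5/16) as stated; `binom_sf` is a hypothetical helper that sums P(M = k) for k ≥ m. It should reproduce the 0.086 and 0.049 quoted above.

```python
from math import comb

def binom_sf(m, n, p):
    """P(M >= m) for M ~ B(n, p), by summing the binomial pmf."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))

n, p = 51, 5 / 16   # 51 series; P(seven games) = 5/16 under H0
observed = 24       # seven-game series actually seen

for m in (20, 21):
    print(f"P(M > {m}) = {binom_sf(m + 1, n, p):.3f}")

# Smallest critical value c with P(M > c) <= 0.05; reject H0 when M > c.
c = next(m for m in range(n + 1) if binom_sf(m + 1, n, p) <= 0.05)
p_value = binom_sf(observed, n, p)  # P(M >= 24) under H0
print(c, observed > c, round(p_value, 4))
```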


Tests for a Population Mean

In the preceding example, we were able to perform an exact Binomial test. Frequently, an exact test is impractical, but we can use the approximate normality of means to conduct an approximate test. Suppose we want to test the hypothesis that µ has a specific value:

H0: µ = µ0

Since x̄ estimates µ, the test is based on x̄, which has a (perhaps approximately) Normal distribution. Thus,

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

is a standard normal random variable under the null hypothesis.

p-values for different alternative hypotheses:

• Ha: µ > µ0: the p-value is P(Z ≥ z) (area of the right-hand tail)
• Ha: µ < µ0: the p-value is P(Z ≤ z) (area of the left-hand tail)
• Ha: µ ≠ µ0: the p-value is 2P(Z ≥ |z|) (area of both tails)
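The three cases above can be collected into one function. This is a hypothetical helper (`z_test_mean` and its `alternative` argument are illustrative names, not a particular library's API) that assumes σ is known.

```python
import math
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alternative="two-sided"):
    """One-sample z test of H0: mu = mu0 with known sigma.

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0). Returns (z, p_value).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    Z = NormalDist()
    if alternative == "greater":
        p = 1 - Z.cdf(z)              # right-hand tail
    elif alternative == "less":
        p = Z.cdf(z)                  # left-hand tail
    elif alternative == "two-sided":
        p = 2 * (1 - Z.cdf(abs(z)))   # both tails
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p
```

For the Coke example, `z_test_mean(15.94, 16, 0.2, 100)` gives z ≈ −3 and a two-sided p-value of about 0.0027.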

Example: Filling Coke Bottles (cont.)

We are interested in assessing whether or not the machine needs to be recalibrated, which will be the case if it is systematically over- or under-filling bottles. Thus, we will use the hypotheses

H0: µ = 16        Ha: µ ≠ 16

Recall that x̄ = 15.94, σ = 0.2, and n = 100. Thus,

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = -3$$

The p-value for a two-sided test is p = 2P(Z ≥ 3) = 0.0027.

If α = 0.01, we reject H0.


Example: TV Tubes

TV tubes are taken at random and their lifetimes measured: n = 100, σ = 300 and x̄ = 1265 (days). Test whether the population mean is 1200 or greater than 1200.

H0: µ = 1200        Ha: µ > 1200

Under H0, x̄ ∼ N(1200, 30), where 30 = σ/√n is the standard deviation of x̄. Therefore

$$z = \frac{\bar{x} - 1200}{30} \sim N(0, 1) \text{ under } H_0.$$

The test statistic is

$$z = \frac{1265 - 1200}{30} = 2.17,$$

and the p-value is P(Z ≥ 2.17 | H0) = 0.015.

This is evidence against H0 at significance level 0.05, so we reject H0. That is, we conclude that the average lifetime of TV tubes is greater than 1200 days.
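The TV tube calculation as a short sketch. Note that it carries the unrounded z = 65/30 ≈ 2.167 into the normal tail, so its p-value agrees with 0.015 only after rounding.

```python
import math
from statistics import NormalDist

n, sigma, xbar, mu0 = 100, 300, 1265, 1200
se = sigma / math.sqrt(n)               # SD of x-bar: 30 days
z = (xbar - mu0) / se                   # about 2.17
p_value = 1 - NormalDist().cdf(z)       # Ha: mu > 1200, right-hand tail
alpha = 0.05
print(f"z = {z:.2f}, p = {p_value:.3f}, reject H0: {p_value <= alpha}")
```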


A Rough Interpretation of p-values

p-value               Interpretation
p > 0.10              no evidence against H0
0.05 < p ≤ 0.10       weak evidence against H0
0.01 < p ≤ 0.05       evidence against H0
p ≤ 0.01              strong evidence against H0
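The rough scale above can be written as a small lookup. This is only a sketch of the table's cutoffs; `describe_p_value` is a hypothetical name, and the wording is the table's, not a universal standard.

```python
def describe_p_value(p):
    """Rough verbal reading of a p-value, using the cutoffs in the table."""
    if p > 0.10:
        return "no evidence against H0"
    if p > 0.05:
        return "weak evidence against H0"
    if p > 0.01:
        return "evidence against H0"
    return "strong evidence against H0"

print(describe_p_value(0.0027))  # Coke example, two-sided
print(describe_p_value(0.015))   # TV tube example
```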

Statistical vs. Practical Significance

Saying that a result is statistically significant does not signify that it is large or necessarily important. That decision depends on the particulars of the problem. A statistically significant result only says that there is substantial evidence that H0 is false.

Failure to reject H0 does not imply that H0 is correct. It only implies that we have insufficient evidence to conclude that H0 is incorrect.


Confidence Intervals and Hypothesis Tests

A level α two-sided test rejects a hypothesis H0: µ = µ0 exactly when the value of µ0 falls outside a (1 − α) confidence interval for µ.

For example, consider a two-sided test of the following hypotheses

H0: µ = µ0
Ha: µ ≠ µ0

at the significance level α = 0.05.

• If µ0 is a value inside the 95% confidence interval for µ, then this test will have a p-value greater than 0.05, and therefore will not reject H0.

• If µ0 is a value outside the 95% confidence interval for µ, then this test will have a p-value smaller than 0.05, and therefore will reject H0.
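The duality between intervals and tests can be checked numerically. The sketch below assumes a known σ and a z-based interval, and reuses the Coke numbers (x̄ = 15.94, σ = 0.2, n = 100) purely for illustration; for each candidate µ0 it compares "inside the 95% interval" with "rejected at α = 0.05". The helper names are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Known-sigma confidence interval xbar +/- z* sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z_star * sigma / math.sqrt(n)
    return xbar - half, xbar + half

def two_sided_p(xbar, mu0, sigma, n):
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

xbar, sigma, n, alpha = 15.94, 0.2, 100, 0.05
lo, hi = z_interval(xbar, sigma, n, conf=1 - alpha)
for mu0 in (15.90, 15.95, 16.00):
    inside = lo <= mu0 <= hi
    rejected = two_sided_p(xbar, mu0, sigma, n) <= alpha
    print(mu0, inside, rejected)  # rejected is always the opposite of inside
```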

Example

A particular area contains 8000 condominium units. In a survey of the occupants, a simple random sample of size 100 yields the information that there are 160 motor vehicles in the sample, giving an average number of motor vehicles per unit of 1.6, with a sample standard deviation of 0.8.

Construct a confidence interval for the total number of vehicles in the area.

The city claims that there are only 11,000 vehicles in the area, so there is no need for a new garage. What do you think?
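One way to approach the condominium example is sketched below. It assumes a large-sample normal interval for the mean, scaled up by N = 8000 to the total, and applies a finite population correction because 100 of the 8000 units are sampled without replacement. Both choices are assumptions; a t multiplier or dropping the correction would change the limits slightly.

```python
import math
from statistics import NormalDist

N, n = 8000, 100        # units in the area, units sampled
xbar, s = 1.6, 0.8      # sample mean and SD of vehicles per unit

total_hat = N * xbar                     # estimated total vehicles
se_mean = s / math.sqrt(n)               # standard error of the sample mean
fpc = math.sqrt((N - n) / (N - 1))       # finite population correction (assumed)
se_total = N * se_mean * fpc             # standard error of the estimated total
z_star = NormalDist().inv_cdf(0.975)     # about 1.96 for 95% confidence
lo, hi = total_hat - z_star * se_total, total_hat + z_star * se_total
print(f"95% CI for total vehicles: ({lo:.0f}, {hi:.0f})")
print("11,000 inside interval:", lo <= 11_000 <= hi)
```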


More on Constructing Hypothesis Tests

Hypotheses always refer to some population or model, not to a particular outcome. As a result, H0 and Ha must be expressed in terms of some population parameter or parameters.

Ha typically expresses the effect that we hope to find evidence for, so Ha is usually carefully thought out first. We then set up H0 to be the case in which the hoped-for effect is not present. It is not always clear whether Ha should be one-sided or two-sided, i.e., whether the parameter differs from its null hypothesis value in a specified direction.

Note: You are not allowed to look at the data first and then frame Ha to fit what the data show.


Potential Abuses of Tests

In many applications, a researcher constructs a null hypothesis with the intent of discrediting it. For example:

• H0: new drug has the same effect as placebo
• H0: men and women are paid equally

A small p-value can help a drug company get a drug approved by the FDA. Similarly, a researcher may have an easier time publishing their results if the p-value is smaller than 0.05.

Because of that, we have to be aware of the following potential abuses:

• Using one-sided tests to make the p-value half as large
• Conducting repeated sampling and testing and reporting only the lowest p-value
• Testing many hypotheses, or testing the same hypothesis on many different subgroups

In the last two cases, even if there is actually no effect, you will probably get at least one small p-value.
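Why the last two abuses work: if k independent tests are all run with H0 true, the chance that at least one p-value falls at or below α is 1 − (1 − α)^k. The hypothetical sketch below computes this and checks it by simulation, using the fact that under H0 each z statistic is standard normal. The independence of the k tests is an assumption of the formula.

```python
import random
from statistics import NormalDist

def prob_at_least_one_small(k, alpha=0.05):
    """Exact P(at least one p <= alpha) among k independent tests, all with H0 true."""
    return 1 - (1 - alpha) ** k

def simulate(k, alpha=0.05, reps=20_000, seed=2):
    """Same probability by simulation: under H0 each z statistic is N(0, 1)."""
    rng = random.Random(seed)
    Z = NormalDist()
    hits = 0
    for _ in range(reps):
        pvals = [2 * (1 - Z.cdf(abs(rng.gauss(0, 1)))) for _ in range(k)]
        if min(pvals) <= alpha:
            hits += 1
    return hits / reps

print(round(prob_at_least_one_small(20), 3), simulate(20))
```

With 20 tests at α = 0.05, the chance of at least one "significant" result is roughly 64% even though every null hypothesis is true.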
