• No results found

A Decision-Making Approach

N/A
N/A
Protected

Academic year: 2021

Share "A Decision-Making Approach"

Copied!
53
0
0

Loading.... (view fulltext now)

Full text

(1)

Business Statistics:

A Decision-Making Approach

6 th Edition

Analysis of Variance

(2)

Section Goals

After completing this section, you should be able to:

Recognize situations in which to use analysis of variance

Understand different analysis of variance designs

Perform a single-factor hypothesis test and interpret results

Conduct and interpret post-analysis of variance pairwise comparisons procedures

Analyze two-factor analysis of variance test with replications

results

(3)

Chapter Overview

Analysis of Variance (ANOVA)

F-test

F-test Tukey-

Kramer

test Fisher’s Least

Significant Difference test One-Way

ANOVA

Randomized Complete Block ANOVA

Two-factor ANOVA

with replication

(4)

Analysis of Variance

(5)

General ANOVA Setting

 Investigator controls one or more independent variables

Called factors (or treatment variables)

Each factor contains two or more levels (or categories/classifications)

 Observe effects on dependent variable

Response to levels of independent variable

 Experimental design: the plan used to test

hypothesis

(6)

One-Way Analysis of Variance

 Evaluate the difference among the means of three or more populations

Examples: Accident rates for 1 st , 2 nd , and 3 rd shift Expected mileage for five brands of tires

 Assumptions

Populations are normally distributed

Populations have equal variances

Samples are randomly and independently drawn

(7)

Completely Randomized Design

 Experimental units (subjects) are assigned randomly to treatments

 Only one factor or independent variable

With two or more treatment levels

 Analyzed by

One-factor analysis of variance (one-way ANOVA)

 Called a Balanced Design if all factor levels

have equal sample size

(8)

Hypotheses of One-Way ANOVA

All population means are equal

i.e., no treatment effect (no variation in means among groups)

At least one population mean is different

i.e., there is a treatment effect

Does not mean that all population means are different (some pairs may be the same)

k 3

2 1

0 : μ μ μ μ

H     

same the

are means

population the

of all Not

:

H A

(9)

One-Factor ANOVA

All Means are the same:

The Null Hypothesis is True (No Treatment Effect)

k 3

2 1

0 : μ μ μ μ

H     

same the

μ are all

Not :

H A i

3 2

1 μ μ

μ  

(10)

One-Factor ANOVA

At least one mean is different:

The Null Hypothesis is NOT true (Treatment Effect is present)

k 3

2 1

0 : μ μ μ μ

H     

same the

μ are all

Not :

H A i

3 2

1 μ μ

μ   μ 1  μ 2  μ 3

or

(continued)

(11)

Partitioning the Variation

 Total variation can be split into two parts:

SST = Total Sum of Squares

SSB = Sum of Squares Between SSW = Sum of Squares Within

SST = SSB + SSW

(12)

Partitioning the Variation

Total Variation = the aggregate dispersion of the individual data values across the various factor levels (SST)

Within-Sample Variation = dispersion that exists among the data values within a particular factor level (SSW)

Between-Sample Variation = dispersion among the factor sample means (SSB)

SST = SSB + SSW

(continued)

(13)

Partition of Total Variation

Variation Due to Factor (SSB)

Variation Due to Random Sampling (SSW)

Total Variation (SST)

Commonly referred to as:

 Sum of Squares Within

 Sum of Squares Error

 Sum of Squares Unexplained

 Within Groups Variation Commonly referred to as:

 Sum of Squares Between

 Sum of Squares Among

 Sum of Squares Explained

 Among Groups Variation

= +

(14)

Total Sum of Squares

  

k

i

n

j

ij

i

) x x

( SST

1 1

2

Where:

SST = Total sum of squares

k = number of populations (levels or treatments) n i = sample size from population i

x ij = j th measurement from population i

x = grand mean (mean of all data values)

SST = SSB + SSW

(15)

Total Variation

(continued)

Group 1 Group 2 Group 3 Response, X

X

2 2

12 2

11 x ) ( x x ) ... ( x x )

x (

SST       kn

k

(16)

Sum of Squares Between

Where:

SSB = Sum of squares between k = number of populations

n i = sample size from population i x i = sample mean from population i

x = grand mean (mean of all data values)

2 1

) x x

( n

SSB i

k

i

i 

 

SST = SSB + SSW

(17)

Between-Group Variation

Variation Due to

Differences Among Groups

ij

2 1

) x x

( n

SSB i

k

i

i 

 

 1

 k MSB SSB

Mean Square Between =

SSB/degrees of freedom

(18)

Between-Group Variation

(continued)

Group 1 Group 2 Group 3 Response, X

X

X 1 X 2

X 3

2 2

2 2

2 1

1 ( x x ) n ( x x ) ... n ( x x )

n

SSB       k k

(19)

Sum of Squares Within

Where:

SSW = Sum of squares within k = number of populations

n i = sample size from population i x i = sample mean from population i

2 1

1

) x x

(

SSW ij i

n

j k

i

j

  

SST = SSB + SSW

(20)

Within-Group Variation

Summing the variation

within each group and then adding over all groups

i

k N

MSW SSW

 

Mean Square Within = SSW/degrees of freedom 2

1 1

) x x

(

SSW ij i

n

j k

i

j

  

(21)

Within-Group Variation

(continued)

Response, X

X 1 X 2

X 3

2 2

1 12

2 1

11 ) ( ) ... ( )

( x x x x x kn x k

SSW      

k

(22)

One-Way ANOVA Table

Source of

Variation SS df MS

Between

Samples SSB MSB =

Within

Samples SSW N - k MSW =

Total SST = N - 1 SSB+SSW

k - 1 MSB

MSW F ratio

k = number of populations

N = sum of the sample sizes from all populations

SSB k - 1

SSW N - k

F =

(23)

One-Factor ANOVA F Test Statistic

Test statistic

MSB is mean squares between variances MSW is mean squares within variances

Degrees of freedom

df

1

= k – 1 (k = number of populations)

df

2

= N – k (N = sum of sample sizes from all populations)

MSW F  MSB

H 0 : μ 1 = μ 2 = … = μ k

H A : At least two population means are different

(24)

Interpreting One-Factor ANOVA F Statistic

 The F statistic is the ratio of the between

estimate of variance and the within estimate of variance

The ratio must always be positive

df

1

= k -1 will typically be small

df

2

= N - k will typically be large

The ratio should be close to 1 if

H 0 : μ 1 = μ 2 = … = μ k is true The ratio will be larger than 1 if

H 0 : μ 1 = μ 2 = … = μ k is false

(25)

One-Factor ANOVA F Test Example

You want to see if three different golf clubs yield different distances. You randomly select five

measurements from trials on an automated driving

machine for each club. At the .05 significance level, is there a difference in mean

distance?

Club 1 Club 2 Club 3

254 234 200

263 218 222

241 235 197

237 227 206

251 216 204

(26)

••

••

One-Factor ANOVA Example:

Scatter Diagram

270 260 250 240 230 220 210 200 190

••

••

••

Distance

X 1

X 2

X 3

X

227.0 x

205.8 x

226.0 x

249.2

x

1 2 3

Club 1 Club 2 Club 3

254 234 200

263 218 222

241 235 197

237 227 206

251 216 204

1 2 3

(27)

One-Factor ANOVA Example Computations

Club 1 Club 2 Club 3

254 234 200

263 218 222

241 235 197

237 227 206

251 216 204

x

1

= 249.2 x

2

= 226.0 x

3

= 205.8 x = 227.0

n

1

= 5 n

2

= 5 n

3

= 5 N = 15 k = 3

SSB = 5 [ (249.2 – 227)

2

+ (226 – 227)

2

+ (205.8 – 227)

2

] = 4716.4 SSW = (254 – 249.2)

2

+ (263 – 249.2)

2

+…+ (204 – 205.8)

2

= 1119.6 MSB = 4716.4 / (3-1) = 2358.2

MSW = 1119.6 / (15-3) = 93.3 25.275

93.3 2358.2

F  

(28)

F = 25.275

One-Factor ANOVA Example Solution

H 0 : μ 1 = μ 2 = μ 3 H A : μ i not all equal

 = .05

df 1 = 2 df 2 = 12

Test Statistic:

Decision:

Conclusion:

Reject H 0 at  = 0.05

There is evidence that at least one μ i differs from the rest

0

 = .05

F = 3.885

Reject H0 Do not

reject H

25.275 93.3

2358.2 MSW

F  MSB  

Critical Value:

F

= 3.885

(29)

SUMMARY

Groups Count Sum Average Variance

Club 1 5 1246 249.2 108.2

Club 2 5 1130 226 77.5

Club 3 5 1029 205.8 94.2

ANOVA Source of

Variation SS df MS F P-value F crit

Between

Groups 4716.4 2 2358.2 25.275 4.99E-05 3.885 Within

Groups 1119.6 12 93.3

Total 5836.0 14

ANOVA -- Single Factor:

Excel Output

EXCEL: tools | data analysis | ANOVA: single factor

(30)

The Tukey-Kramer Procedure

 Tells which population means are significantly different

e.g.: μ 1 = μ 2  μ 3

Done after rejection of equal means in ANOVA

 Allows pair-wise comparisons

Compare absolute mean differences with critical range

μ x

1 = μ 2 μ

3

(31)

Tukey-Kramer Critical Range

where:

q

= Value from standardized range table

with k and N - k degrees of freedom for the desired level of 

MSW = Mean Square Within

n

i

and n

j

= Sample sizes from populations (levels) i and j

 

 

 

j

i n

1 n

1 2

q MSW Range

Critical

(32)

Tukey-Kramer Critical Range

(33)

The Tukey-Kramer Procedure:

Example

1. Compute absolute mean differences:

Club 1 Club 2 Club 3

254 234 200

263 218 222

241 235 197

237 227 206

251 216 204 x x 226.0 205.8 20.2

43.4 205.8

249.2 x

x

23.2 226.0

249.2 x

x

3 2

3 1

2 1

2. Find the q value from the table in appendix J with k and N - k degrees of freedom for

the desired level of 

3.77

q α

(34)

The Tukey-Kramer Procedure:

Example

5. All of the absolute mean differences are greater than critical range.

Therefore there is a significant difference between each pair of means at 5% level of significance.

16.285 5

1 5

1 2

3.77 93.3 n

1 n

1 2

q MSW Range

Critical

j i

α

 

 

  

 

 

 

3. Compute Critical Range:

20.2 x

x

43.4 x

x

23.2 x

x

3 2

3 1

2 1

4. Compare:

(35)

Fisher’s LSD test

 Tells which population means are significantly different

e.g.: μ 1 = μ 2  μ 3

Done after rejection of equal means in ANOVA

 Allows pair-wise comparisons

Compare absolute mean differences with critical range

μ = μ μ x

(36)

where:

t

/2

= Value from t-table at the desired level of  with N-k degrees of freedom

MSW = Mean Square Within

n

i

and n

j

= Sample sizes from populations (levels) i and j

 

 

 

j i

2

/ n

1 n

M 1

L SD t SW

Fisher’s LSD test

(37)

1. Compute absolute mean differences:

Club 1 Club 2 Club 3

254 234 200

263 218 222

241 235 197

237 227 206

251 216 204 x x 226.0 205.8 20.2

43.4 205.8

249.2 x

x

23.2 226.0

249.2 x

x

3 2

3 1

2 1

2.179 t α/2, 12

Fisher’s LSD test

(38)

5. All of the absolute mean differences are greater than critical range.

Therefore there is a significant difference between each pair of means at 5% level of significance.

31 . 3 5 1

1 5 3 1 . 3 9 179 . n 2

1 n

M SW 1 t

Range Critical

j i

α/2

 

 

  

 

 

 

3. Compute Critical Range:

20.2 x

x

43.4 x

x

23.2 x

x

3 2

3 1

2 1

4. Compare:

Fisher’s LSD test

(39)

Randomized Complete Block ANOVA

Like One-Way ANOVA, we test for equal population means (for different factor levels, for example)...

...but we want to control for possible variation from a second factor (with two or more levels)

Used when more than one factor may influence the value of the dependent variable, but only one is of key interest

Levels of the secondary factor are called blocks

(40)

Partitioning the Variation

 Total variation can now be split into three parts:

SST = Total sum of squares

SSB = Sum of squares between factor levels SSBL = Sum of squares between blocks

SSW = Sum of squares within levels

SST = SSB + SSBL + SSW

(41)

Two-Way ANOVA

 Examines the effect of

Two or more factors of interest on the dependent variable

e.g.: Percent carbonation and line speed on soft drink bottling process

Interaction between the different levels of these two factors

e.g.: Does the effect of one particular

percentage of carbonation depend on which

level the line speed is set?

(42)

Two-Way ANOVA

 Assumptions

 Populations are normally distributed

 Populations have equal variances

 Independent random samples are drawn

(continued)

(43)

Two-Way ANOVA Sources of Variation

Two Factors of interest: A and B

a = number of levels of factor A b = number of levels of factor B

N = total number of observations in all cells

(44)

Two-Way ANOVA Sources of Variation

SST

Total Variation

SS A

Variation due to factor A

SS B

Variation due to factor B

SS AB

Variation due to interaction between A and B

SSE

Inherent variation (Error)

Degrees of Freedom:

a – 1

b – 1

(a – 1)(b – 1)

N – ab N - 1

SST = SS A + SS B + SS AB + SSE

(continued)

(45)

Two Factor ANOVA Equations

  

a

i

b

j

n

k

ijk x ) x

( SST

1 1 1

2

2 1

) x x

( n

b SS

a

i

i

A    

2 1

) x x

( n

a SS

b

j

j

B    

Total Sum of Squares:

Sum of Squares Factor A:

Sum of Squares Factor B:

(46)

Two Factor ANOVA Equations

2

1 1

) x x

x x

( n

SS

a

i

b

j

j i

ij

AB      

 

  

a

i

b

j

n

k

ij

ijk x )

x ( SSE

1 1 1

2

Sum of Squares

Interaction Between A and B:

Sum of Squares Error:

(continued)

(47)

Two Factor ANOVA Equations

where:

Mean Grand

n ab

x x

a

i

b

j

n

k

ijk

 

 

 

1 1 1

A factor of

level each

of n Mean

b x x

b

j

n

k

ijk

i

  

1 1

B factor of

level each

of n Mean

a x x

a

i

n

k

ijk

j

  

 1 1

cell each

of n Mean

x x

n ijk

ij

 

 a = number of levels of factor A

b = number of levels of factor B

(continued)

(48)

Mean Square Calculations

 1

 a

A SS factor

square Mean

MS A A

 1

 b

B SS factor

square Mean

MS B B

) b

)(

a ( n SS interactio

square Mean

MS AB AB

1

1 

 

ab N

error SSE square

Mean

MSE   

(49)

Two-Way ANOVA:

The F Test Statistic

F Test for Factor B Main Effect

F Test for Interaction Effect H

0

: μ

A1

= μ

A2

= μ

A3

= • • •

H

A

: Not all μ

Ai

are equal

H

0

: factors A and B do not interact to affect the mean response H

A

: factors A and B do interact

F Test for Factor A Main Effect

H

0

: μ

B1

= μ

B2

= μ

B3

= • • • H

A

: Not all μ

Bi

are equal

Reject H

0

if F > F

MSE F  MS

A

MSE F  MS

B

MSE F  MS

AB

Reject H

0

if F > F

Reject H

0

if F > F

(50)

Two-Way ANOVA Summary Table

Source of Variation

Sum of Squares

Degrees of Freedom

Mean Squares

F Statistic

Factor A SS

A

a – 1 MS

A

= SS

A

/(a – 1)

MS

A

MSE

Factor B SS

B

b – 1 MS

B

= SS

B

/(b – 1)

MS

B

MSE AB

(Interaction) SS

AB

(a – 1)(b – 1) MS

AB

= SS

AB

/ [(a – 1)(b – 1)]

MS

AB

MSE

Error SSE N – ab MSE =

SSE/(N – ab)

Total SST N – 1

(51)

Features of Two-Way ANOVA F Test

 Degrees of freedom always add up

 N-1 = (N-ab) + (a-1) + (b-1) + (a-1)(b-1)

 Total = error + factor A + factor B + interaction

 The denominator of the F Test is always the same but the numerator is different

 The sums of squares always add up

SST = SSE + SS A + SS B + SS AB

 Total = error + factor A + factor B + interaction

(52)

Examples:

Interaction vs. No Interaction

 No interaction:

1 2

Factor B Level 1 Factor B Level 3 Factor B Level 2

Factor A Levels 1 2

Factor B Level 1

Factor B Level 3 Factor B Level 2

Factor A Levels

Mean Res pons e Mean Res pons e

 Interaction is

present:

(53)

Section Summary

 Described one-way analysis of variance

The logic of ANOVA

ANOVA assumptions

F test for difference in k means

The Tukey-Kramer procedure for multiple comparisons

 Described two-way analysis of variance

Examined effects of multiple factors and interaction

References

Related documents

17 The results of the homoscedastic tobit models can be found in table 8 (appendix).. The results of the heteroscedastic tobit regressions are shown in tables 3 and 4. Our main

working through the headquarters emergency control centre and the District Officers. The District Officers will coordinate disaster relief efforts at the local level, in

These data show that a failed cough reflex test is not indicative of a history of chest infection / aspiration pneumonia in a general population of rest home residents.. This

For those generations who work, the higher official tax rates means lower net wage rates, but the effect of the higher effective interest rate on the savings rate turns out to be

palpačný nález indurácie končatinového svalstva (Khan, 2009). Rhabdomyolýza sa môže prejavovať aj celkovými príznakmi – horúčka, tachykardia, Rhabdomyolýza môže spôsobovať celý

We give a reduction from provenance verification problem to coverability in labeled Petri nets, where tokens carry (potentially unbounded) provenance data.. As a result, we obtain

As shown in the “Infrastructure” section above, despite regulatory liberalization, the incumbent Türk Telekom’s domination in the fixed-line telephony and broadband segments

Table 5.7: Mean and standard deviation of blood pressure measurements in three days test and re-test measurements in healthy subjects. P≤0.05 vs visit one using Tukey post hoc test,