Business Statistics:
A Decision-Making Approach
6th Edition
Analysis of Variance
Section Goals
After completing this section, you should be able to:
Recognize situations in which to use analysis of variance
Understand different analysis of variance designs
Perform a single-factor hypothesis test and interpret results
Conduct and interpret post-analysis of variance pairwise comparisons procedures
Analyze the results of a two-factor analysis of variance test with replications
Chapter Overview
Analysis of Variance (ANOVA)
One-Way ANOVA
Randomized Complete Block ANOVA
Two-factor ANOVA with replication
Tests covered: F-test, Tukey-Kramer test, Fisher’s Least Significant Difference test
Analysis of Variance
General ANOVA Setting
Investigator controls one or more independent variables
Called factors (or treatment variables)
Each factor contains two or more levels (or categories/classifications)
Observe effects on dependent variable
Response to levels of independent variable
Experimental design: the plan used to test the hypothesis
One-Way Analysis of Variance
Evaluate the difference among the means of three or more populations
Examples:
Accident rates for 1st, 2nd, and 3rd shift
Expected mileage for five brands of tires
Assumptions
Populations are normally distributed
Populations have equal variances
Samples are randomly and independently drawn
Completely Randomized Design
Experimental units (subjects) are assigned randomly to treatments
Only one factor or independent variable
With two or more treatment levels
Analyzed by
One-factor analysis of variance (one-way ANOVA)
Called a Balanced Design if all factor levels have equal sample size
Hypotheses of One-Way ANOVA
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
All population means are equal
i.e., no treatment effect (no variation in means among groups)
$H_A$: Not all of the population means are the same
At least one population mean is different
i.e., there is a treatment effect
Does not mean that all population means are different (some pairs may be the same)
One-Factor ANOVA
All Means are the same:
The Null Hypothesis is True (No Treatment Effect)
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_A$: Not all $\mu_i$ are the same
[Figure: three coinciding population distributions, $\mu_1 = \mu_2 = \mu_3$]
One-Factor ANOVA
At least one mean is different:
The Null Hypothesis is NOT true (Treatment Effect is present)
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_A$: Not all $\mu_i$ are the same
[Figure: population distributions with $\mu_1 = \mu_2 \neq \mu_3$, or with $\mu_1 \neq \mu_2 \neq \mu_3$]
Partitioning the Variation
Total variation can be split into two parts:
SST = Total Sum of Squares
SSB = Sum of Squares Between
SSW = Sum of Squares Within
SST = SSB + SSW
Partitioning the Variation
Total Variation = the aggregate dispersion of the individual data values across the various factor levels (SST)
Within-Sample Variation = dispersion that exists among the data values within a particular factor level (SSW)
Between-Sample Variation = dispersion among the factor sample means (SSB)
SST = SSB + SSW
Partition of Total Variation
Total Variation (SST) = Variation Due to Factor (SSB) + Variation Due to Random Sampling (SSW)
SSB is commonly referred to as:
Sum of Squares Between
Sum of Squares Among
Sum of Squares Explained
Among Groups Variation
SSW is commonly referred to as:
Sum of Squares Within
Sum of Squares Error
Sum of Squares Unexplained
Within Groups Variation
Total Sum of Squares
$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2$
Where:
SST = Total sum of squares
k = number of populations (levels or treatments)
$n_i$ = sample size from population i
$x_{ij}$ = j-th measurement from population i
$\bar{\bar{x}}$ = grand mean (mean of all data values)
SST = SSB + SSW
Total Variation
$SST = (x_{11} - \bar{\bar{x}})^2 + (x_{12} - \bar{\bar{x}})^2 + \cdots + (x_{k n_k} - \bar{\bar{x}})^2$
[Figure: Response, X plotted for Group 1, Group 2, and Group 3, with the grand mean marked]
Sum of Squares Between
$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2$
Where:
SSB = Sum of squares between
k = number of populations
$n_i$ = sample size from population i
$\bar{x}_i$ = sample mean from population i
$\bar{\bar{x}}$ = grand mean (mean of all data values)
SST = SSB + SSW
Between-Group Variation
Variation Due to Differences Among Groups
$SSB = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2$
Mean Square Between = SSB/degrees of freedom
$MSB = \frac{SSB}{k - 1}$
Between-Group Variation
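$SSB = n_1 (\bar{x}_1 - \bar{\bar{x}})^2 + n_2 (\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k (\bar{x}_k - \bar{\bar{x}})^2$
[Figure: Response, X plotted for Group 1, Group 2, and Group 3, with the group means $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$ and the grand mean marked]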
Sum of Squares Within
$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
Where:
SSW = Sum of squares within
k = number of populations
$n_i$ = sample size from population i
$\bar{x}_i$ = sample mean from population i
$x_{ij}$ = j-th measurement from population i
SST = SSB + SSW
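The identity SST = SSB + SSW can be checked with a short derivation. The sketch below uses $\bar{x}_i$ for the sample mean of population i and $\bar{\bar{x}}$ for the grand mean; it is a standard algebraic argument, included here as a supporting step rather than something taken from the slides.

```latex
\begin{align*}
x_{ij} - \bar{\bar{x}} &= (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{\bar{x}}) \\
\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
   + \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2
   + 2\sum_{i=1}^{k} (\bar{x}_i - \bar{\bar{x}}) \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) \\
  &= SSW + SSB + 0
\end{align*}
```

The cross term vanishes because the deviations from a sample mean sum to zero within each group: $\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) = 0$.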
Within-Group Variation
Summing the variation within each group and then adding over all groups
$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
Mean Square Within = SSW/degrees of freedom
$MSW = \frac{SSW}{N - k}$
Within-Group Variation
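$SSW = (x_{11} - \bar{x}_1)^2 + (x_{12} - \bar{x}_1)^2 + \cdots + (x_{k n_k} - \bar{x}_k)^2$
[Figure: Response, X plotted for three groups, with the group means $\bar{x}_1$, $\bar{x}_2$, $\bar{x}_3$ marked]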
One-Way ANOVA Table
Source of Variation   SS                df      MS                    F ratio
Between Samples       SSB               k - 1   MSB = SSB/(k - 1)     F = MSB/MSW
Within Samples        SSW               N - k   MSW = SSW/(N - k)
Total                 SST = SSB + SSW   N - 1
k = number of populations
N = sum of the sample sizes from all populations
One-Factor ANOVA F Test Statistic
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_A$: At least two population means are different
Test statistic
$F = \frac{MSB}{MSW}$
MSB is the mean square between (the between-groups estimate of variance)
MSW is the mean square within (the within-groups estimate of variance)
Degrees of freedom
$df_1 = k - 1$ (k = number of populations)
$df_2 = N - k$ (N = sum of sample sizes from all populations)
Interpreting One-Factor ANOVA F Statistic
The F statistic is the ratio of the between estimate of variance and the within estimate of variance
The ratio can never be negative
$df_1 = k - 1$ will typically be small
$df_2 = N - k$ will typically be large
The ratio should be close to 1 if $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ is true
The ratio will tend to be larger than 1 if $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ is false
One-Factor ANOVA F Test Example
You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206
251 216 204
One-Factor ANOVA Example: Scatter Diagram
[Figure: scatter diagram of Distance (axis from 190 to 270) for Club 1, Club 2, and Club 3, with the sample means and the grand mean marked]
$\bar{x}_1 = 249.2$
$\bar{x}_2 = 226.0$
$\bar{x}_3 = 205.8$
$\bar{\bar{x}} = 227.0$
One-Factor ANOVA Example Computations
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206
251 216 204
$\bar{x}_1 = 249.2$, $\bar{x}_2 = 226.0$, $\bar{x}_3 = 205.8$, $\bar{\bar{x}} = 227.0$
$n_1 = 5$, $n_2 = 5$, $n_3 = 5$, N = 15, k = 3
$SSB = 5[(249.2 - 227)^2 + (226 - 227)^2 + (205.8 - 227)^2] = 4716.4$
$SSW = (254 - 249.2)^2 + (263 - 249.2)^2 + \cdots + (204 - 205.8)^2 = 1119.6$
$MSB = 4716.4 / (3 - 1) = 2358.2$
$MSW = 1119.6 / (15 - 3) = 93.3$
$F = \frac{2358.2}{93.3} = 25.275$
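The computations above can be reproduced with a few lines of plain Python. This is a minimal sketch using only the standard library; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Golf-club distances from the example (five trials per club).
clubs = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}


def one_way_anova(groups):
    """Return (SSB, SSW, MSB, MSW, F) for a list of samples."""
    k = len(groups)                               # number of populations
    n_total = sum(len(g) for g in groups)         # N
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]     # sample means
    # Between-group variation: n_i * (group mean - grand mean)^2
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: squared deviations from each group's own mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return ssb, ssw, msb, msw, msb / msw


ssb, ssw, msb, msw, f_stat = one_way_anova(list(clubs.values()))
print(f"SSB={ssb:.1f} SSW={ssw:.1f} MSB={msb:.1f} MSW={msw:.1f} F={f_stat:.3f}")
```

Working from the raw data rather than the rounded means avoids carrying rounding error into the F ratio.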
One-Factor ANOVA Example Solution
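$H_0: \mu_1 = \mu_2 = \mu_3$
$H_A$: $\mu_i$ not all equal
$\alpha = .05$
$df_1 = 2$, $df_2 = 12$
Critical Value:
$F_\alpha = 3.885$
Test Statistic:
$F = \frac{MSB}{MSW} = \frac{2358.2}{93.3} = 25.275$
Decision:
Reject $H_0$ at $\alpha = 0.05$, since F = 25.275 > $F_{.05}$ = 3.885
Conclusion:
There is evidence that at least one $\mu_i$ differs from the rest
[Figure: F distribution with the "Do not reject $H_0$" region to the left of $F_{.05} = 3.885$ and the "Reject $H_0$" region ($\alpha = .05$) to the right]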
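The critical value and p-value for this example can be checked without a table. The sketch below relies on a special case: when $df_1 = 2$, the upper-tail probability of the F distribution has the closed form $P(F > f) = (1 + 2f/df_2)^{-df_2/2}$. The function names are hypothetical, and the formulas do not apply to other values of $df_1$.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with df1 = 2 (special case only)."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha, df2):
    """Upper-tail critical value F_alpha for df1 = 2 (inverse of the above)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


alpha, df2 = 0.05, 12
f_stat = 2358.2 / 93.3                    # MSB / MSW from the example
f_crit = f_critical_df1_2(alpha, df2)
p_value = f_upper_tail_df1_2(f_stat, df2)
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.3f}, F crit = {f_crit:.3f}, p-value = {p_value:.2e}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

For general degrees of freedom a statistics table or library would be needed; the closed form is used here only because the example has three groups.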
ANOVA -- Single Factor: Excel Output
EXCEL: tools | data analysis | ANOVA: single factor
SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2
ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.885
Within Groups         1119.6   12   93.3
Total                 5836.0   14
The Tukey-Kramer Procedure
Tells which population means are significantly different
e.g.: $\mu_1 = \mu_2 \neq \mu_3$
Done after rejection of equal means in ANOVA
Allows pair-wise comparisons
Compare absolute mean differences with critical range
[Figure: population distributions along the x axis with $\mu_1 = \mu_2 \neq \mu_3$]
Tukey-Kramer Critical Range
$\text{Critical Range} = q_\alpha \sqrt{\frac{MSW}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$
where:
$q_\alpha$ = Value from standardized range table with k and N - k degrees of freedom for the desired level of $\alpha$
MSW = Mean Square Within
$n_i$ and $n_j$ = Sample sizes from populations (levels) i and j
The Tukey-Kramer Procedure:
Example
1. Compute absolute mean differences:
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206
251 216 204
$|\bar{x}_1 - \bar{x}_2| = |249.2 - 226.0| = 23.2$
$|\bar{x}_1 - \bar{x}_3| = |249.2 - 205.8| = 43.4$
$|\bar{x}_2 - \bar{x}_3| = |226.0 - 205.8| = 20.2$
2. Find the q value from the table in appendix J with k and N - k degrees of freedom for the desired level of $\alpha$:
$q_\alpha = 3.77$
3. Compute Critical Range:
$\text{Critical Range} = q_\alpha \sqrt{\frac{MSW}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} = 3.77 \sqrt{\frac{93.3}{2} \left( \frac{1}{5} + \frac{1}{5} \right)} = 16.285$
4. Compare:
$|\bar{x}_1 - \bar{x}_2| = 23.2$
$|\bar{x}_1 - \bar{x}_3| = 43.4$
$|\bar{x}_2 - \bar{x}_3| = 20.2$
5. All of the absolute mean differences are greater than the critical range. Therefore there is a significant difference between each pair of means at the 5% level of significance.
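The Tukey-Kramer steps for the golf-club example can be sketched in Python as follows. The value $q_\alpha = 3.77$ is taken from the example rather than computed, and the function name `tukey_kramer_range` is an illustrative choice.

```python
from itertools import combinations
from math import sqrt

means = {"Club 1": 249.2, "Club 2": 226.0, "Club 3": 205.8}
sizes = {"Club 1": 5, "Club 2": 5, "Club 3": 5}
msw = 93.3       # Mean Square Within from the ANOVA table
q_alpha = 3.77   # studentized range table value quoted in the example


def tukey_kramer_range(q, msw, n_i, n_j):
    """Critical range = q * sqrt((MSW / 2) * (1/n_i + 1/n_j))."""
    return q * sqrt((msw / 2) * (1 / n_i + 1 / n_j))


tukey_results = {}
for a, b in combinations(means, 2):
    critical_range = tukey_kramer_range(q_alpha, msw, sizes[a], sizes[b])
    diff = abs(means[a] - means[b])
    tukey_results[(a, b)] = (diff, critical_range, diff > critical_range)
    verdict = "significant" if diff > critical_range else "not significant"
    print(f"{a} vs {b}: |diff| = {diff:.1f}, range = {critical_range:.3f}, {verdict}")
```

Because the critical range is computed per pair from $n_i$ and $n_j$, the same loop would also handle unequal sample sizes.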
Fisher’s LSD test
Tells which population means are significantly different
e.g.: $\mu_1 = \mu_2 \neq \mu_3$
Done after rejection of equal means in ANOVA
Allows pair-wise comparisons
Compare absolute mean differences with critical range
$LSD = t_{\alpha/2} \sqrt{MSW \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$
where:
$t_{\alpha/2}$ = Value from t-table at the desired level of $\alpha$ with N - k degrees of freedom
MSW = Mean Square Within
$n_i$ and $n_j$ = Sample sizes from populations (levels) i and j
Fisher’s LSD test
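1. Compute absolute mean differences:
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206
251 216 204
$|\bar{x}_1 - \bar{x}_2| = |249.2 - 226.0| = 23.2$
$|\bar{x}_1 - \bar{x}_3| = |249.2 - 205.8| = 43.4$
$|\bar{x}_2 - \bar{x}_3| = |226.0 - 205.8| = 20.2$
2. Find the t value from the t-table with N - k = 12 degrees of freedom:
$t_{\alpha/2,\,12} = 2.179$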
3. Compute Critical Range:
$\text{Critical Range} = t_{\alpha/2} \sqrt{MSW \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} = 2.179 \sqrt{93.3 \left( \frac{1}{5} + \frac{1}{5} \right)} = 13.31$
4. Compare:
$|\bar{x}_1 - \bar{x}_2| = 23.2$
$|\bar{x}_1 - \bar{x}_3| = 43.4$
$|\bar{x}_2 - \bar{x}_3| = 20.2$
5. All of the absolute mean differences are greater than the critical range. Therefore there is a significant difference between each pair of means at the 5% level of significance.
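The LSD computation for the same data can be sketched in Python. The t value 2.179 is the table value quoted in the example, and the function name `fisher_lsd` is hypothetical.

```python
from math import sqrt

msw = 93.3           # Mean Square Within from the ANOVA table
t_half_alpha = 2.179 # t-table value for alpha/2 with N - k = 12 df, from the example


def fisher_lsd(t, msw, n_i, n_j):
    """LSD = t * sqrt(MSW * (1/n_i + 1/n_j))."""
    return t * sqrt(msw * (1 / n_i + 1 / n_j))


lsd = fisher_lsd(t_half_alpha, msw, 5, 5)

# Absolute mean differences from step 1 of the example.
abs_diffs = {"1 vs 2": 23.2, "1 vs 3": 43.4, "2 vs 3": 20.2}
lsd_significant = {pair: d > lsd for pair, d in abs_diffs.items()}

print(f"LSD = {lsd:.2f}")
for pair, flag in lsd_significant.items():
    print(f"Clubs {pair}: {'significant' if flag else 'not significant'}")
```

For these data the LSD critical range is smaller than the Tukey-Kramer range (13.31 versus 16.285), so any pair flagged by Tukey-Kramer is also flagged here.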
Randomized Complete Block ANOVA
Like One-Way ANOVA, we test for equal population means (for different factor levels, for example)...
...but we want to control for possible variation from a second factor (with two or more levels)
Used when more than one factor may influence the value of the dependent variable, but only one is of key interest
Levels of the secondary factor are called blocks
Partitioning the Variation
Total variation can now be split into three parts:
SST = Total sum of squares
SSB = Sum of squares between factor levels
SSBL = Sum of squares between blocks
SSW = Sum of squares within levels
SST = SSB + SSBL + SSW
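The three-part partition can be illustrated numerically. The sketch below uses made-up data with one observation per block and factor level, and it assumes the usual randomized complete block formulas (which the slides do not spell out): SSB from the factor-level means, SSBL from the block means, and SSW as the remaining variation.

```python
# Hypothetical data: each row is a block, each column a factor level
# (one observation per cell). Values are invented for illustration.
data = [
    [10.0, 12.0, 15.0],
    [11.0, 14.0, 18.0],
    [9.0, 10.0, 13.0],
    [12.0, 15.0, 17.0],
]

b = len(data)        # number of blocks
k = len(data[0])     # number of factor levels

grand = sum(sum(row) for row in data) / (b * k)
level_means = [sum(row[j] for row in data) / b for j in range(k)]
block_means = [sum(row) / k for row in data]

sst = sum((x - grand) ** 2 for row in data for x in row)
ssb = b * sum((m - grand) ** 2 for m in level_means)     # between factor levels
ssbl = k * sum((m - grand) ** 2 for m in block_means)    # between blocks
# Variation left after removing level and block effects (assumed formula).
ssw = sum(
    (data[i][j] - level_means[j] - block_means[i] + grand) ** 2
    for i in range(b)
    for j in range(k)
)

print(f"SST={sst:.3f}  SSB={ssb:.3f}  SSBL={ssbl:.3f}  SSW={ssw:.3f}")
print(f"SSB + SSBL + SSW = {ssb + ssbl + ssw:.3f}")
```

Removing the block-to-block variation (SSBL) from the error term is what lets the design control for the second factor.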
Two-Way ANOVA
Examines the effect of
Two or more factors of interest on the dependent variable
e.g.: Percent carbonation and line speed on soft drink bottling process
Interaction between the different levels of these two factors
e.g.: Does the effect of one particular percentage of carbonation depend on which level the line speed is set?
Two-Way ANOVA
Assumptions
Populations are normally distributed
Populations have equal variances
Independent random samples are drawn
Two-Way ANOVA Sources of Variation
Two Factors of interest: A and B
a = number of levels of factor A
b = number of levels of factor B
N = total number of observations in all cells
Two-Way ANOVA Sources of Variation
$SST = SS_A + SS_B + SS_{AB} + SSE$
Source                                                      Degrees of Freedom
$SS_A$: Variation due to factor A                           a – 1
$SS_B$: Variation due to factor B                           b – 1
$SS_{AB}$: Variation due to interaction between A and B     (a – 1)(b – 1)
SSE: Inherent variation (Error)                             N – ab
SST: Total Variation                                        N – 1
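Two Factor ANOVA Equations
Total Sum of Squares:
$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n'} (x_{ijk} - \bar{\bar{x}})^2$
Sum of Squares Factor A:
$SS_A = b\,n' \sum_{i=1}^{a} (\bar{x}_i - \bar{\bar{x}})^2$
Sum of Squares Factor B:
$SS_B = a\,n' \sum_{j=1}^{b} (\bar{x}_j - \bar{\bar{x}})^2$
Sum of Squares Interaction Between A and B:
$SS_{AB} = n' \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{ij} - \bar{x}_i - \bar{x}_j + \bar{\bar{x}})^2$
Sum of Squares Error:
$SSE = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n'} (x_{ijk} - \bar{x}_{ij})^2$
where:
$\bar{\bar{x}} = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n'} x_{ijk}}{a\,b\,n'}$ = Grand Mean
$\bar{x}_i = \frac{\sum_{j=1}^{b} \sum_{k=1}^{n'} x_{ijk}}{b\,n'}$ = Mean of each level of factor A
$\bar{x}_j = \frac{\sum_{i=1}^{a} \sum_{k=1}^{n'} x_{ijk}}{a\,n'}$ = Mean of each level of factor B
$\bar{x}_{ij} = \frac{\sum_{k=1}^{n'} x_{ijk}}{n'}$ = Mean of each cell
a = number of levels of factor A
b = number of levels of factor B
n' = number of replications in each cell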
Mean Square Calculations
$MS_A = \text{Mean square factor A} = \frac{SS_A}{a - 1}$
$MS_B = \text{Mean square factor B} = \frac{SS_B}{b - 1}$
$MS_{AB} = \text{Mean square interaction} = \frac{SS_{AB}}{(a - 1)(b - 1)}$
$MSE = \text{Mean square error} = \frac{SSE}{N - ab}$
Two-Way ANOVA:
The F Test Statistic
F Test for Factor A Main Effect
$H_0: \mu_{A1} = \mu_{A2} = \mu_{A3} = \cdots$
$H_A$: Not all $\mu_{Ai}$ are equal
$F = \frac{MS_A}{MSE}$
Reject $H_0$ if $F > F_\alpha$
F Test for Factor B Main Effect
$H_0: \mu_{B1} = \mu_{B2} = \mu_{B3} = \cdots$
$H_A$: Not all $\mu_{Bi}$ are equal
$F = \frac{MS_B}{MSE}$
Reject $H_0$ if $F > F_\alpha$
F Test for Interaction Effect
$H_0$: factors A and B do not interact to affect the mean response
$H_A$: factors A and B do interact
$F = \frac{MS_{AB}}{MSE}$
Reject $H_0$ if $F > F_\alpha$
Two-Way ANOVA Summary Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Squares                              F Statistic
Factor A              $SS_A$           a – 1                $MS_A = SS_A/(a – 1)$                     $MS_A/MSE$
Factor B              $SS_B$           b – 1                $MS_B = SS_B/(b – 1)$                     $MS_B/MSE$
AB (Interaction)      $SS_{AB}$        (a – 1)(b – 1)       $MS_{AB} = SS_{AB}/[(a – 1)(b – 1)]$      $MS_{AB}/MSE$
Error                 SSE              N – ab               MSE = SSE/(N – ab)
Total                 SST              N – 1
Features of Two-Way ANOVA F Test
Degrees of freedom always add up
N-1 = (N-ab) + (a-1) + (b-1) + (a-1)(b-1)
Total = error + factor A + factor B + interaction
The denominator of the F Test is always the same but the numerator is different
The sums of squares always add up
$SST = SSE + SS_A + SS_B + SS_{AB}$
Total = error + factor A + factor B + interaction
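Both additivity properties can be checked numerically. The sketch below uses invented data with a = 2 levels of factor A, b = 3 levels of factor B, and 2 replications per cell; the variable names and data are hypothetical, and the formulas are the two-factor sums of squares with replication.

```python
# Hypothetical data: cells[i][j] holds the replications for
# level i of factor A and level j of factor B.
cells = [
    [[20.0, 22.0], [25.0, 27.0], [30.0, 29.0]],
    [[18.0, 21.0], [28.0, 30.0], [40.0, 38.0]],
]
a, b, n_rep = len(cells), len(cells[0]), len(cells[0][0])
N = a * b * n_rep


def mean(values):
    values = list(values)
    return sum(values) / len(values)


grand = mean(x for row in cells for cell in row for x in cell)
mean_a = [mean(x for cell in cells[i] for x in cell) for i in range(a)]
mean_b = [mean(x for i in range(a) for x in cells[i][j]) for j in range(b)]
mean_ab = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]

sst = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)
ss_a = b * n_rep * sum((m - grand) ** 2 for m in mean_a)
ss_b = a * n_rep * sum((m - grand) ** 2 for m in mean_b)
ss_ab = n_rep * sum(
    (mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
    for i in range(a) for j in range(b)
)
sse = sum(
    (x - mean_ab[i][j]) ** 2
    for i in range(a) for j in range(b) for x in cells[i][j]
)

# Every F ratio shares the same denominator, MSE.
mse = sse / (N - a * b)
f_a = (ss_a / (a - 1)) / mse
f_b = (ss_b / (b - 1)) / mse
f_ab = (ss_ab / ((a - 1) * (b - 1))) / mse

print(f"SST = {sst:.3f}, sum of parts = {ss_a + ss_b + ss_ab + sse:.3f}")
print(f"F_A = {f_a:.3f}, F_B = {f_b:.3f}, F_AB = {f_ab:.3f}")
```

Each F ratio would then be compared with the critical value $F_\alpha$ for its own numerator degrees of freedom and N – ab denominator degrees of freedom.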
Examples:
Interaction vs. No Interaction
[Figure, two panels: "No interaction" and "Interaction is present". Each panel plots Mean Response against Factor A Levels (1, 2), with one line each for Factor B Level 1, Factor B Level 2, and Factor B Level 3]
Section Summary
Described one-way analysis of variance
The logic of ANOVA
ANOVA assumptions
F test for difference in k means
The Tukey-Kramer procedure for multiple comparisons
Described two-way analysis of variance