Analysis of Variance
2.6.1 One-Way Layout
In the analysis of a one-way layout, we group observations according to their corresponding treatment. For instance, we group repeated measurements of the packet loss rate for a given buffer size: say, five buffers. The key idea in ANOVA is that if none of the treatments (such as the buffer size) affects the observed variable (such as the loss rate), all the observations can be assumed to be drawn from the same population. Therefore, the sample mean computed for observations corresponding to each treatment should not be too far from the sample mean computed across all the observations. Moreover, the estimate of population variance computed from each group separately should not differ too much from the variance estimated from the entire sample. If we do find a significant difference between statistics computed from each group separately and the sample as a whole, we reject the null hypothesis. That is, we conclude that, with high probability, the treatments affect the observed outcomes. By itself, that is all that basic ANOVA can tell us. Further testing is necessary to determine which treatments affect the outcome and which do not.
We now make this more precise. Suppose that we can divide the observations into I groups of J samples each. (We assume that all groups have the same number of samples, which is usually not a problem, because the treatments are under the control of the experimenter.) We denote the jth observation of the ith treatment by the random variable $Y_{ij}$. We model this observation as the sum of an underlying population mean $\mu$, the true effect of the ith treatment $\alpha_i$, and a random fluctuation $\epsilon_{ij}$:
$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ (EQ 2.40)
These errors $\epsilon_{ij}$ are assumed to be independent and normally distributed with zero mean and a variance of $\sigma^2$. For convenience, we normalize the $\alpha_i$s so that $\sum_{i=1}^{I} \alpha_i = 0$. Note that the expected outcome for the ith treatment is $E(Y_{ij}) = \mu + \alpha_i$.
The null hypothesis is that the treatments have no effect on the outcome. If the null hypothesis holds, the expected value of each group of observations would be the population mean $\mu$, so that the treatment effect $\alpha_i = 0$ for every treatment i. Moreover, the population variance would be $\sigma^2$.
Let the mean of the ith group of observations be denoted $\bar{Y}_{i\cdot}$ and the mean of all the observations be denoted $\bar{Y}_{\cdot\cdot}$. We denote the sum of squared deviations from the mean within each sample by
$SSW = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i\cdot})^2$ (EQ 2.41)
where $SSW/(I(J-1))$ is an unbiased estimator of the population variance $\sigma^2$ because it averages I unbiased estimators, each given by
$\frac{1}{J-1} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i\cdot})^2$.
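Similarly, we denote the sum of squared deviations from the mean between samples, that is, of the group means $\bar{Y}_{i\cdot}$ about the overall mean $\bar{Y}_{\cdot\cdot}$, by
$SSB = J \sum_{i=1}^{I} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$ (EQ 2.42)
where SSB/(I – 1) is also an unbiased estimator of the population variance $\sigma^2$ when the null hypothesis holds, because $\frac{1}{I-1} \sum_{i=1}^{I} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$ is an unbiased estimator of $\sigma^2/J$, the variance of a group mean. So, the ratio $\frac{SSB/(I-1)}{SSW/(I(J-1))}$ should be close to 1 if the null hypothesis holds.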
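The two sums of squares and their ratio can be computed directly from grouped data. The following is a minimal Python sketch, assuming equal-sized groups and the standard within-group and between-group definitions behind (EQ 2.41) and (EQ 2.42). The function name `one_way_f` and the sample data are hypothetical, chosen only for illustration.

```python
from statistics import mean

def one_way_f(groups):
    """Return (SSW, SSB, ratio) for a balanced one-way layout.

    groups: list of I lists, each holding J observations (J assumed equal).
    """
    n_groups = len(groups)          # I in the text
    n_per_group = len(groups[0])    # J in the text
    if any(len(g) != n_per_group for g in groups):
        raise ValueError("all groups must have the same number of samples")
    group_means = [mean(g) for g in groups]
    # With equal-sized groups, the mean of the group means is the overall mean.
    grand_mean = mean(group_means)
    # Within-group sum of squared deviations (EQ 2.41).
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # Between-group sum of squared deviations (EQ 2.42).
    ssb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    ratio = (ssb / (n_groups - 1)) / (ssw / (n_groups * (n_per_group - 1)))
    return ssw, ssb, ratio

# Hypothetical data: three treatments, four observations each.
data = [[1.0, 2.0, 3.0, 2.0], [2.0, 3.0, 4.0, 3.0], [5.0, 6.0, 7.0, 6.0]]
ssw, ssb, ratio = one_way_f(data)
print(ssw, ssb, ratio)
```

In this made-up data set the third group sits well above the other two, so the between-group term dominates and the ratio comes out far above 1.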
It can be shown that, under the null hypothesis, $SSB/\sigma^2$ is a $\chi^2$ variable with I – 1 degrees of freedom and that $SSW/\sigma^2$ is a $\chi^2$ variable with I(J – 1) degrees of freedom. The ratio of two independent $\chi^2$ variables with m and n degrees of freedom, each divided by its degrees of freedom, follows a distribution called the F distribution with (m,n) degrees of freedom. Therefore, the variable $F = \frac{SSB/(I-1)}{SSW/(I(J-1))}$ follows the F distribution with (I – 1, I(J – 1)) degrees of freedom, and has an expected value close to 1 if the null hypothesis is true.
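The behavior of this ratio under the null hypothesis can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: every observation is drawn from the same standard normal population (so no treatment has any effect), the layout is I = 5 groups of J = 10 samples, and the helper names `f_ratio` and `simulate_null` are hypothetical. The threshold 3.83 is the 1% critical value for (4, 40) degrees of freedom quoted in Example 2.18.

```python
import random
from statistics import mean

def f_ratio(groups):
    """Ratio (SSB/(I-1)) / (SSW/(I(J-1))) for equal-sized groups."""
    n_groups, n_per_group = len(groups), len(groups[0])
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ssb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    return (ssb / (n_groups - 1)) / (ssw / (n_groups * (n_per_group - 1)))

def simulate_null(n_groups=5, n_per_group=10, trials=4000, seed=1):
    """Draw every group from the same N(0, 1) population and collect the ratios."""
    rng = random.Random(seed)
    return [
        f_ratio([[rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
                 for _ in range(n_groups)])
        for _ in range(trials)
    ]

ratios = simulate_null()
average_ratio = mean(ratios)
# Share of simulated ratios that would (wrongly) reject the null at 3.83.
share_above = sum(r > 3.83 for r in ratios) / len(ratios)
print(f"average ratio under the null: {average_ratio:.3f}")
print(f"share of runs above 3.83: {share_above:.4f}")
```

The average should land near 1, and only a small share of runs, on the order of 1%, should exceed the threshold. That small share is the false-rejection rate the critical value is designed to bound.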
To test the null hypothesis, we compute the value of $\frac{SSB/(I-1)}{SSW/(I(J-1))}$ and compare it with the critical value, at the chosen significance level, of an F variable with (I – 1, I(J – 1)) degrees of freedom. If the computed value exceeds the critical value, the null hypothesis is rejected. Intuitively, this would happen if SSB is “too large”; that is, there is significant variation in the sums of squares between treatments, which is what we expect when the treatment does have an effect on the observed outcome.
EXAMPLE 2.18: SINGLE-FACTOR ANOVA
Continuing with Example 2.9, assume that we have additional data for larger buffer sizes, as follows. Can we still claim that the buffer size plays a role in determining the loss rate?
Loss rate with 5 buffers:    1.20%  1.30%  0.90%  1.40%  1.00%  1.80%  1.10%  1.20%  1.50%  1.20%
Loss rate with 100 buffers:  0.10%  0.60%  1.10%  0.80%  1.20%  0.30%  0.70%  1.90%  0.20%  1.20%
Here, I = 5 and J = 10. We compute the group means $\bar{Y}_{1\cdot}$ = 1.26%, $\bar{Y}_{2\cdot}$ = 0.81%, $\bar{Y}_{3\cdot}$ = 0.44%, $\bar{Y}_{4\cdot}$ = 0.07%, and $\bar{Y}_{5\cdot}$ = 0.01%. This allows us to compute SSW = $5.13 \times 10^{-5}$ and SSB = $1.11 \times 10^{-3}$. The F statistic is therefore ($1.11 \times 10^{-3}$/4)/($5.13 \times 10^{-5}$/45) = 242.36. Looking up the F table, we find that even with only (4, 40) degrees of freedom, the critical F value at the 1% significance level is 3.83. The computed statistic far exceeds this value. Therefore, the null hypothesis is rejected.
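Some of the figures in this example can be checked from the values quoted in the text. The sketch below recomputes the two group means whose raw data are listed (5 and 100 buffers) and then the between-group sum of squares from the five reported group means. It assumes SSB is J times the sum of squared deviations of the group means from their overall mean. The variable names are illustrative only.

```python
from statistics import mean

# Loss rates in percent for the two treatments whose raw data are listed.
loss_5 = [1.20, 1.30, 0.90, 1.40, 1.00, 1.80, 1.10, 1.20, 1.50, 1.20]
loss_100 = [0.10, 0.60, 1.10, 0.80, 1.20, 0.30, 0.70, 1.90, 0.20, 1.20]

mean_5 = mean(loss_5)      # about 1.26 (percent)
mean_100 = mean(loss_100)  # about 0.81 (percent)

# The five group means as reported in the text, converted from percent to fractions.
reported_means = [0.0126, 0.0081, 0.0044, 0.0007, 0.0001]
samples_per_group = 10  # J in the text

grand_mean = mean(reported_means)
ssb = samples_per_group * sum((m - grand_mean) ** 2 for m in reported_means)

print(f"mean loss, 5 buffers:   {mean_5:.2f}%")
print(f"mean loss, 100 buffers: {mean_100:.2f}%")
print(f"SSB from reported means: {ssb:.3e}")
```

The rounded means give an SSB of about $1.10 \times 10^{-3}$, close to the reported $1.11 \times 10^{-3}$; the small gap presumably comes from the means having been rounded to two decimal places.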
The F test is somewhat anticlimactic: It indicates only that a treatment has an effect on the outcome, but it does not quantify the degree of effect. Nor does it identify whether any one treatment is responsible for the failure of the test. These questions can be resolved by post hoc analysis. For example, to quantify the degree of effect, we can compute the regression of the observed effect as a function of the treatment. To identify the treatment that is responsible for the failure of the test, we can rerun the F test, eliminating one treatment at a time. If the F test does not reject the null hypothesis with a particular treatment removed, we can hypothesize that this treatment has a significant effect on the outcome, testing this hypothesis with a two-variable test.
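The leave-one-out procedure can be sketched as follows. This is a minimal illustration, not a complete post hoc analysis: the data are hypothetical, the helper names `f_statistic` and `leave_one_out_f` are assumptions, and the comparison of each reduced statistic against a critical value from an F table, with (I – 2, (I – 1)(J – 1)) degrees of freedom once one treatment is removed, is left to the reader.

```python
from statistics import mean

def f_statistic(groups):
    """F ratio (SSB/(I-1)) / (SSW/(I(J-1))) for equal-sized groups."""
    n_groups, n_per_group = len(groups), len(groups[0])
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ssb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    return (ssb / (n_groups - 1)) / (ssw / (n_groups * (n_per_group - 1)))

def leave_one_out_f(groups):
    """Map each treatment index to the F statistic computed without that treatment."""
    return {k: f_statistic(groups[:k] + groups[k + 1:]) for k in range(len(groups))}

# Hypothetical data: treatment 2 is shifted upward; treatments 0, 1, and 3 share a mean.
data = [
    [1.0, 1.5, 0.5, 1.0],
    [1.25, 0.75, 1.0, 1.0],
    [3.0, 3.5, 2.5, 3.0],
    [0.75, 1.25, 1.0, 1.0],
]

full_f = f_statistic(data)
reduced = leave_one_out_f(data)
print(f"all treatments: F = {full_f:.2f}")
for k, f in reduced.items():
    print(f"without treatment {k}: F = {f:.2f}")
```

In this made-up data set, removing treatment 2 leaves three groups with identical means, so the reduced statistic collapses to 0 and the null hypothesis would no longer be rejected. Removing any other treatment leaves the statistic large. That pattern points to treatment 2 as the one to examine with a follow-up two-variable test.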
If these approaches do not work, two more advanced techniques to perform multiway comparisons are Tukey’s method and the Bonferroni method (see Section 2.10 for texts that discuss these methods).