6.4 Fixing Problems
6.4.2 Accommodating nonconstant variance
The usual way to fix nonconstant error variances is by transformation of the response. For some distributions, there are standard transformations that equalize or stabilize the variance. In other distributions, we use a more ad hoc approach. We can also use some alternative methods instead of the usual ANOVA.
Transformations of the response
[Margin note: Variance-stabilizing transformations]
There is a general theory of variance-stabilizing transformations that applies to distributions where the variance depends on the mean. For example, Binomial(1, p) data have a mean of $p$ and a variance of $p(1-p)$. This method uses the relationship between the mean and the variance to construct a transformation such that the variance of the data after transformation is constant and no longer depends on the mean. (See Bishop, Fienberg, and Holland 1975.) These transformations generally work better when the sample size is large (or the mean is large relative to the standard deviation); modifications may be needed otherwise.

Table 6.3: Variance-stabilizing transformations.
Distribution | Transformation | New variance
Binomial proportions: $X \sim \mathrm{Bin}(n, p)$, $\hat{p} = X/n$, $\mathrm{Var}(\hat{p}) = p(1-p)/n$ | $\arcsin(\sqrt{\hat{p}})$ | $1/(4n)$
Poisson: $X \sim \mathrm{Poisson}(\lambda)$, $\mathrm{Var}(X) = E(X) = \lambda$ | $\sqrt{X}$ | $1/4$
Correlation coefficient: $(u_i, v_i)$, $i = 1, \ldots, n$ are independent, bivariate normal pairs with correlation $\rho$ and sample correlation $\hat{\rho}$ | $\frac{1}{2}\log\left(\frac{1+\hat{\rho}}{1-\hat{\rho}}\right)$ | $1/(n-3)$
Table 6.3 lists a few distributions with their variance-stabilizing transformations. Binomial proportions model the fraction of success in some number of trials. If all proportions are between about .2 and .8, then the variance is fairly constant and the transformation gives little improvement. The Poisson distribution is often used to model counts; for example, the number of bacteria in a volume of solution or the number of asbestos particles in a volume of air.
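A small numerical check can make the binomial row of Table 6.3 concrete. The sketch below computes, from the exact binomial probabilities, the variance of the raw proportion and of its arcsine square root. The function name and the choice of n = 100 are illustrative assumptions, not part of the text.

```python
import math

def proportion_variances(n, p):
    """Exact variances of p_hat = X/n and arcsin(sqrt(p_hat)) for X ~ Bin(n, p)."""
    probs = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    raw = [k / n for k in range(n + 1)]
    transformed = [math.asin(math.sqrt(r)) for r in raw]

    def variance(values):
        mean = sum(pr * v for pr, v in zip(probs, values))
        return sum(pr * (v - mean) ** 2 for pr, v in zip(probs, values))

    return variance(raw), variance(transformed)

n = 100  # illustrative number of trials
for p in (0.05, 0.2, 0.5, 0.8):
    raw_var, trans_var = proportion_variances(n, p)
    print(f"p={p:4.2f}  Var(p_hat)={raw_var:.5f}  "
          f"Var(arcsin sqrt)={trans_var:.5f}  1/(4n)={1 / (4 * n):.5f}")
```

The raw variance changes with p, while the transformed variance should stay near 1/(4n); the approximation is expected to be weakest for proportions near 0 or 1.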
Example 6.6 Artificial insemination in chickens
Tajima (1987) describes an experiment examining the effect of a freeze-thaw cycle on the potency of semen used for artificial insemination in chickens. Four semen mixtures are prepared. Each mixture consists of equal volumes of semen from Rhode Island Red and White Leghorn roosters. Mixture 1 has both varieties fresh, mixture 4 has both varieties frozen, and mixtures 2 and 3 each have one variety fresh and the other frozen. Sixteen batches of Rhode Island Red hens are inseminated with the mixtures, using a balanced completely randomized design. The response is the fraction of chicks from each batch that have white feathers (white feathers indicate a White Leghorn father).
It is natural to model these fractions as binomial proportions. Each chick in a given treatment group has the same probability of having a White Leghorn father, though this probability may vary between groups due to the freeze-thaw treatments. Thus the total number of chicks with white feathers in a given batch should have a binomial distribution, and the fraction of chicks is a binomial proportion. The observed proportions ranged from .19 to .95, so the arcsine square root transformation is a good bet to stabilize the variability.
[Margin note: Power family transformations]
When we don’t have a distribution with a known variance-stabilizing transformation (and we generally don’t), then we usually try a power family transformation. The power family of transformations includes
$$ y \rightarrow \mathrm{sign}(\lambda)\, y^{\lambda} $$
and
$$ y \rightarrow \log(y) , $$
where $\mathrm{sign}(\lambda)$ is +1 for positive $\lambda$ and −1 for negative $\lambda$. The log function corresponds to $\lambda$ equal to zero. We multiply by the sign of $\lambda$ so that the order of the responses is preserved when $\lambda$ is negative.
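The role of the sign factor is easy to see in a few lines of code. This is a minimal sketch of the power family as defined above; the function name `power_family` and the sample responses are illustrative assumptions.

```python
import math

def power_family(y, lam):
    """Power family transformation of a positive response y."""
    if y <= 0:
        raise ValueError("power family transformations need positive data")
    if lam == 0:
        return math.log(y)  # the log plays the role of power 0
    # Multiplying by sign(lam) keeps larger responses larger after transforming.
    return math.copysign(1.0, lam) * y ** lam

responses = [2.0, 4.0, 8.0]  # made-up positive responses
for lam in (-1, 0, 0.5):
    print(lam, [round(power_family(y, lam), 3) for y in responses])
```

Without the sign factor, the power −1 would map 2, 4, 8 to .5, .25, .125 and reverse their order.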
[Margin note: Need positive data with max/min fairly large]
Power family transformations are not likely to have much effect unless the ratio of the largest to smallest value is bigger than 4 or so. Furthermore, power family transformations only make sense when the data are all positive. When we have data with both signs, we can add a constant to all the data to make them positive before transforming. Different constants added lead to different transformations.
[Margin note: Regression method for choosing $\lambda$]
Here is a simple method for finding an approximate variance-stabilizing transformation power $\lambda$. Compute the mean and standard deviation for the data in each treatment group. Regress the logarithms of the standard deviations on the logarithms of the group means; let $\hat{\beta}$ be the estimated regression slope. Then the estimated variance-stabilizing power transformation is $\lambda = 1 - \hat{\beta}$. If there is no relationship between mean and standard deviation ($\hat{\beta} = 0$), then the estimated transformation is the power 1, which doesn’t change the data. If the standard deviation increases proportionally to the mean ($\hat{\beta} = 1$), then the log transformation (power 0) is appropriate for variance stabilization.
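The regression method amounts to a simple least squares slope. The sketch below applies it to the resin lifetime treatment means and variances tabulated in Example 6.7 of this section; the function name `estimate_power` is an illustrative assumption.

```python
import math

def estimate_power(means, sds):
    """Slope of log(sd) on log(mean), and the implied power 1 - slope."""
    x = [math.log(m) for m in means]
    y = [math.log(s) for s in sds]
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    return beta_hat, 1 - beta_hat

# Resin lifetime summary statistics from Example 6.7.
means = [86.42, 43.56, 24.52, 15.72, 11.87]
variances = [169.75, 91.45, 41.07, 3.00, 13.69]
sds = [math.sqrt(v) for v in variances]

beta_hat, lam_hat = estimate_power(means, sds)
print(f"slope = {beta_hat:.2f}, estimated power = {lam_hat:.2f}")
```

The base of the logarithm does not matter here, since changing it rescales both axes by the same constant and leaves the slope unchanged.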
[Margin note: Box-Cox transformations]
The Box-Cox method for determining a transformation power is somewhat more complicated than the simple regression-based estimate, but it tends to find a better power and also yields a confidence interval for $\lambda$. Furthermore, Box-Cox can be used on more complicated designs where the simple method is difficult to adapt. Box-Cox transformations rescale the power family transformation to make the different powers easier to compare. Let $\dot{y}$ denote the geometric mean of all the responses, where the geometric mean is the product of all the responses raised to the $1/N$ power:
$$ \dot{y} = \left( \prod_{i=1}^{g} \prod_{j=1}^{n_i} y_{ij} \right)^{1/N} . $$
The Box-Cox transformations are then
$$ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\, \dot{y}^{\lambda - 1}} & \lambda \neq 0 \\[1ex] \dot{y} \log(y) & \lambda = 0 . \end{cases} $$
[Margin note: Use best convenient power]
In the Box-Cox technique, we transform the data using a range of $\lambda$ values from, say, −2 to 3, and do the ANOVA for each of these transformations. From these we can get $SSE(\lambda)$, the sum of squared errors as a function of the transformation power $\lambda$. The best transformation power $\lambda^\star$ is the power that minimizes $SSE(\lambda)$. We generally use a convenient transformation power $\lambda$ close to $\lambda^\star$, where by convenient I mean a “pretty” power, like .5 or 0, rather than the actual minimizing power which might be something like .427.
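The Box-Cox search can be sketched for a one-way layout as follows. The code uses the geometric-mean rescaling defined above and scans a grid of powers from −2 to 3. The data are made up for illustration, and the helper names (`geometric_mean`, `boxcox`, `sse_oneway`, `boxcox_sse`) are assumptions rather than anything from a particular package.

```python
import math

def geometric_mean(values):
    """Product of the values raised to the 1/N power, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def boxcox(y, lam, gm):
    """Box-Cox transformation of y, rescaled by the geometric mean gm."""
    if lam == 0:
        return gm * math.log(y)
    return (y ** lam - 1) / (lam * gm ** (lam - 1))

def sse_oneway(groups):
    """Error sum of squares for a one-way ANOVA (deviations from group means)."""
    sse = 0.0
    for ys in groups:
        mean = sum(ys) / len(ys)
        sse += sum((y - mean) ** 2 for y in ys)
    return sse

def boxcox_sse(groups, lam):
    """SSE(lambda): transform every response, then compute the one-way SSE."""
    gm = geometric_mean([y for ys in groups for y in ys])
    return sse_oneway([[boxcox(y, lam, gm) for y in ys] for ys in groups])

# Hypothetical data whose spread grows with the group mean.
groups = [
    [90.0, 70.0, 110.0, 85.0],
    [40.0, 52.0, 35.0, 45.0],
    [20.0, 25.0, 17.0, 22.0],
    [10.0, 12.0, 8.5, 11.0],
]

grid = [k / 4 for k in range(-8, 13)]  # powers -2, -1.75, ..., 3
profile = {lam: boxcox_sse(groups, lam) for lam in grid}
best = min(profile, key=profile.get)
print("minimizing power on the grid:", best)
```

Because of the rescaling, the SSE values for different powers are on a comparable scale, and the value at power 1 equals the SSE of the untransformed data.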
[Margin note: Confidence interval for $\lambda$]
The Box-Cox minimizing power $\lambda^\star$ will rarely be exactly 1; when should you actually use a transformation? A graphical answer is obtained by making the suggested transformation and seeing if the residual plot looks better. If there was little change in the variances or the group variances were not that different to start with, then there is little to be gained by making the transformation. A more formal answer can be obtained by computing an approximate $1 - \mathcal{E}$ confidence interval for the transformation power $\lambda$. This confidence interval consists of all powers $\lambda$ such that
$$ SSE(\lambda) \le SSE(\lambda^\star) \left( 1 + \frac{F_{\mathcal{E},1,\nu}}{\nu} \right) , $$
where $\nu$ is the degrees of freedom for error. Very crudely, if the transformation doesn’t shrink the error sum of squares to below $\nu/(\nu + 4)$ times its untransformed value (4 being roughly $F_{.05,1,\nu}$ for a 95% interval), then $\lambda = 1$ is in the confidence interval, and a transformation may not be needed. When I decide whether a transformation is indicated, I tend to rely mostly on a visual judgement of whether the residuals improve after transformation, and secondarily on the confidence interval.
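Given SSE values on a grid of powers, the confidence interval is a one-line cutoff. In the sketch below the F critical value is supplied by the caller (for example, from an F table) so that only the standard library is needed. The SSE profile is hypothetical apart from its minimum of 1270, and the value 4.15 for $F_{.05,1,32}$ is an approximate table value used as an assumption.

```python
def boxcox_interval(profile, nu, f_crit):
    """Cutoff and grid powers inside the approximate Box-Cox confidence interval.

    profile maps each power to its Box-Cox SSE, nu is the error degrees of
    freedom, and f_crit is the upper-E critical value of F with 1 and nu df.
    """
    sse_min = min(profile.values())
    cutoff = sse_min * (1 + f_crit / nu)
    inside = sorted(lam for lam, sse in profile.items() if sse <= cutoff)
    return cutoff, inside

# Hypothetical SSE profile; only the minimum of 1270 echoes the resin example.
profile = {-0.5: 2600.0, 0.0: 1300.0, 0.25: 1270.0, 0.5: 1420.0, 1.0: 2200.0}

cutoff, inside = boxcox_interval(profile, nu=32, f_crit=4.15)
print(f"cutoff = {cutoff:.1f}, powers in interval: {inside}")
```

If power 1 is not among the powers returned, the interval suggests that a transformation is worth making; the visual check of residuals still comes first.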
Figure 6.8: Box-Cox error SS versus transformation power for resin lifetime data. [Axes: power (horizontal), error SS (vertical).]
Example 6.7 Resin lifetimes, continued
The resin lifetime data on the original scale show considerable nonconstant variance. The treatment means and variances are
Treatment        1       2       3       4       5
Mean         86.42   43.56   24.52   15.72   11.87
Variance    169.75   91.45   41.07    3.00   13.69
If we regress the log standard deviations on the log means, we get a slope of .86 for an estimated transformation power of .14; we would probably use a log (power 0) or quarter power since they are near the estimated power.
We can use Box-Cox to suggest an appropriate transformation. Figure 6.8 shows $SSE(\lambda)$ plotted against transformation power for powers between −1 and 1.5; the minimum appears to be about 1270 near a power of .25. The logarithm does nearly as well as the quarter power ($SSE(0)$ is nearly as small as $SSE(.25)$), and the log is easier to work with, so we will use the log transformation. As a check, the 95% confidence interval for the transformation power includes all powers with Box-Cox error SS less than $1270(1 + F_{.05,1,32}/32)$, or about 1435. With this cutoff drawn as a line on the plot, the log has an $SSE$ well below the line, and the original scale has an $SSE$ well above the line, suggesting that the logarithm is the way to go. Figure 6.9 shows the improvement in residuals versus fitted values after transformation. There is no longer as strong a tendency for the residuals to be larger when the mean is larger.

Figure 6.9: Residuals versus predicted plot for resin log lifetime data, using Minitab. [Axes: fitted value (horizontal), residual (vertical); response is log life.]
Alternative methods
[Margin note: Weighted ANOVA when ratio of variances is known]
Dealing with nonconstant variance has provided gainful employment to statisticians for many years, so there are a number of alternative methods to consider. The simplest situation may be when the ratio of the variances in the different groups is known. For example, suppose that the response for each unit in treatments 1 and 2 is the average from five measurement units, and the response for each unit in treatments 3 and 4 is the average from seven measurement units. If the variance among measurement units is the same, then the variance between experimental units in treatments 3 and 4 would be 5/7 the size of the variance between experimental units in treatments 1 and 2 (assuming no other sources of variation), simply due to different numbers of values in each average. Situations such as this can be handled using weighted ANOVA, where each unit receives a weight proportional to the number of measurement units used in its average. Most statistical packages can handle weighted ANOVA.
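For a one-way layout, weighted ANOVA can be sketched as weighted least squares: group means become weighted means, and each squared deviation is multiplied by its weight. The code below is one such sketch under that assumption; the function name and the responses are hypothetical, with weights 5 and 7 mirroring the averages-of-five and averages-of-seven setting just described.

```python
def weighted_oneway_f(groups, weights):
    """F statistic for a one-way layout fit by weighted least squares.

    groups[i][j] is the response of unit j in treatment i; weights[i][j] is
    its weight, taken proportional to 1 / Var(groups[i][j]).
    """
    g = len(groups)
    n_total = sum(len(ys) for ys in groups)
    total_weight = sum(sum(ws) for ws in weights)
    grand_mean = sum(
        w * y for ys, ws in zip(groups, weights) for y, w in zip(ys, ws)
    ) / total_weight

    ss_trt = 0.0
    ss_err = 0.0
    for ys, ws in zip(groups, weights):
        group_weight = sum(ws)
        group_mean = sum(w * y for y, w in zip(ys, ws)) / group_weight
        ss_trt += group_weight * (group_mean - grand_mean) ** 2
        ss_err += sum(w * (y - group_mean) ** 2 for y, w in zip(ys, ws))

    return (ss_trt / (g - 1)) / (ss_err / (n_total - g))

# Hypothetical unit responses: treatments 1-2 average 5 measurements each,
# treatments 3-4 average 7 measurements each.
groups = [[10.1, 9.4, 10.8], [11.9, 12.6, 11.2], [9.8, 10.3, 10.0], [13.1, 12.7, 13.4]]
weights = [[5] * 3, [5] * 3, [7] * 3, [7] * 3]
print(f"weighted F = {weighted_oneway_f(groups, weights):.2f}")
```

Only the ratios of the weights matter: multiplying every weight by the same constant leaves the F statistic unchanged, and equal weights give back the ordinary ANOVA F.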
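[Margin note: Welch’s t for pairwise comparisons with unequal variance]
For pairwise comparisons, the Welch procedure is quite attractive. This procedure is sometimes called the “unpooled” t-test. Let $s_i^2$ denote the sample variance in treatment $i$. Then the Welch test statistic for testing $\mu_i = \mu_j$ is
$$ t_{ij} = \frac{\bar{y}_{i\bullet} - \bar{y}_{j\bullet}}{\sqrt{s_i^2/n_i + s_j^2/n_j}} . $$
This test statistic is compared to a Student’s t distribution with
$$ \nu = \left( s_i^2/n_i + s_j^2/n_j \right)^2 \Big/ \left( \frac{1}{n_i - 1} \frac{s_i^4}{n_i^2} + \frac{1}{n_j - 1} \frac{s_j^4}{n_j^2} \right) $$
degrees of freedom. For a confidence interval, we compute
$$ \bar{y}_{i\bullet} - \bar{y}_{j\bullet} \pm t_{\mathcal{E}/2,\nu} \sqrt{s_i^2/n_i + s_j^2/n_j} , $$
with $\nu$ computed in the same way. More generally, for a contrast with coefficients $w_i$ we use
$$ t = \frac{\sum_{i=1}^{g} w_i \bar{y}_{i\bullet}}{\sqrt{\sum_{i=1}^{g} w_i^2 s_i^2/n_i}} $$
with approximate degrees of freedom
$$ \nu = \left( \sum_{i=1}^{g} w_i^2 s_i^2/n_i \right)^2 \Big/ \left( \sum_{i=1}^{g} \frac{1}{n_i - 1} \frac{w_i^4 s_i^4}{n_i^2} \right) . $$
Confidence intervals are computed in an analogous way.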
[Margin note: Welch’s t works well]
The Welch procedure generally gives observed error rates close to the nominal error rates. Furthermore, the accuracy improves quickly as the sample sizes increase, something that cannot be said for the t- and F-tests under nonconstant variance. Better still, there is almost no loss in power for using the Welch procedure, even when the variances are equal. For simple comparisons, the Welch procedure can be used routinely. The problem arises in generalizing it to more complicated situations.
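The Welch formulas for a contrast translate directly into code, with a pairwise comparison as the special case of coefficients +1 and −1. The sketch below uses the resin lifetime means and variances from Example 6.7. The group sizes 8, 8, 8, 7, 6 are an assumption: they are not stated in this section, but they are consistent with the 32 error degrees of freedom quoted for the ANOVA. The function name `welch_contrast` is also illustrative.

```python
import math

def welch_contrast(w, means, variances, ns):
    """Welch estimate, standard error, approximate df, and t for a contrast."""
    estimate = sum(wi * m for wi, m in zip(w, means))
    # Each term is w_i^2 s_i^2 / n_i; its square is w_i^4 s_i^4 / n_i^2.
    terms = [wi ** 2 * v / n for wi, v, n in zip(w, variances, ns)]
    se = math.sqrt(sum(terms))
    df = sum(terms) ** 2 / sum(t ** 2 / (n - 1) for t, n in zip(terms, ns))
    return estimate, se, df, estimate / se

means = [86.42, 43.56, 24.52, 15.72, 11.87]      # Example 6.7
variances = [169.75, 91.45, 41.07, 3.00, 13.69]  # Example 6.7
ns = [8, 8, 8, 7, 6]                             # assumed group sizes

for label, w in (("1 vs 2", [1, -1, 0, 0, 0]), ("4 vs 5", [0, 0, 0, 1, -1])):
    est, se, df, t = welch_contrast(w, means, variances, ns)
    print(f"{label}: diff={est:.2f}  se={se:.2f}  df={df:.1f}  t={t:.2f}")
```

Treatments with zero coefficient drop out of both sums, so the pairwise degrees of freedom depend only on the two groups being compared.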
[Margin note: Brown-Forsythe modified F]
The next most complicated procedure is an ANOVA alternative for nonconstant variance. The Brown-Forsythe method is much less sensitive to nonconstant variance than is the usual ANOVA F test. Again let $s_i^2$ denote the sample variance in treatment $i$, and let $d_i = s_i^2 (1 - n_i/N)$. The Brown-Forsythe modified F-test is
$$ BF = \frac{\sum_{i=1}^{g} n_i (\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2}{\sum_{i=1}^{g} s_i^2 (1 - n_i/N)} . $$
Under the null hypothesis of equal treatment means, BF is approximately distributed as F with $g - 1$ and $\nu$ degrees of freedom, where
$$ \nu = \frac{\left( \sum_i d_i \right)^2}{\sum_i d_i^2/(n_i - 1)} . $$
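The Brown-Forsythe statistic needs only the group means, variances, and sizes. The sketch below evaluates it for the resin lifetime summary statistics from Example 6.7. As an assumption, the group sizes are taken to be 8, 8, 8, 7, 6, which this section does not state but which match the 32 error degrees of freedom quoted for the ANOVA; the function name is illustrative.

```python
def brown_forsythe(means, variances, ns):
    """Brown-Forsythe modified F with its numerator and denominator df."""
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    numerator = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    d = [v * (1 - n / n_total) for v, n in zip(variances, ns)]  # d_i
    bf = numerator / sum(d)
    df_denominator = sum(d) ** 2 / sum(di ** 2 / (n - 1) for di, n in zip(d, ns))
    return bf, len(means) - 1, df_denominator

means = [86.42, 43.56, 24.52, 15.72, 11.87]      # Example 6.7
variances = [169.75, 91.45, 41.07, 3.00, 13.69]  # Example 6.7
ns = [8, 8, 8, 7, 6]                             # assumed group sizes

bf, df1, df2 = brown_forsythe(means, variances, ns)
print(f"BF = {bf:.1f} with {df1} and {df2:.1f} degrees of freedom")
```

The grand mean here is rebuilt from rounded group means, so the result can differ slightly in the last digit from a calculation on the raw data.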
Example 6.8 Resin lifetimes, continued
Suppose that we needed confidence intervals for the difference in means between the pairs of temperatures on the original scale for the resin lifetime data. If we use the usual method and ignore the nonconstant variance, then pairwise differences have an estimated standard deviation of
$$ \sqrt{68.82\,(1/n_i + 1/n_j)} ; $$
these range from 4.14 to 4.61, depending on sample sizes, and all would use 32 degrees of freedom. Using the Welch procedure, we get standard deviations for pairwise differences ranging from 5.71 (treatments 1 and 2) to 1.65 (treatments 4 and 5), with degrees of freedom ranging from 6.8 to 12.8. Thus the intervals from the usual method are much too short for pairs such as 1 and 2, and much too long for pairs such as 4 and 5.
Consider now testing the null hypothesis that all groups have the same mean on the original scale. The F ratio from ANOVA is 101.8, with 4 and 32 degrees of freedom. The Brown-Forsythe F is 111.7, with 4 and 18.3 degrees of freedom. Both clearly reject the null hypothesis.