
6.4 Fixing Problems

6.4.2 Accommodating nonconstant variance

The usual way to fix nonconstant error variances is by transformation of the response. For some distributions, there are standard transformations that equalize or stabilize the variance. For other distributions, we use a more ad hoc approach. We can also use alternative methods in place of the usual ANOVA.

Transformations of the response

Variance-stabilizing transformations

There is a general theory of variance-stabilizing transformations that applies to distributions where the variance depends on the mean. For example, Binomial(1, p) data have a mean of p and a variance of p(1 − p). This method uses the relationship between the mean and the variance to construct a transformation such that the variance of the data after transformation is constant and no longer depends on the mean. (See Bishop, Fienberg, and Holland 1975.) These transformations generally work better when the sample size is large (or the mean is large relative to the standard deviation); modifications may be needed otherwise.

Table 6.3: Variance-stabilizing transformations.

Distribution | Transformation | New variance
Binomial proportions: X ∼ Bin(n, p), p̂ = X/n, Var(p̂) = p(1 − p)/n | arcsin(√p̂) | 1/(4n)
Poisson: X ∼ Poisson(λ), Var(X) = E(X) = λ | √X | 1/4
Correlation coefficient: (u_i, v_i), i = 1, ..., n, are independent bivariate normal pairs with correlation ρ and sample correlation ρ̂ | (1/2) log((1 + ρ̂)/(1 − ρ̂)) | 1/(n − 3)

Table 6.3 lists a few distributions with their variance-stabilizing transformations. Binomial proportions model the fraction of successes in some number of trials. If all proportions are between about .2 and .8, then the variance is fairly constant (p(1 − p) only ranges from .16 to .25 there) and the transformation gives little improvement. The Poisson distribution is often used to model counts; for example, the number of bacteria in a volume of solution or the number of asbestos particles in a volume of air.
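The claims in Table 6.3 can be checked numerically. The sketch below is a minimal illustration, not part of the original text: it computes exact variances by summing over the binomial and Poisson probability functions (no simulation), before and after transformation. The helper names and the values of n, p, and λ are hypothetical choices, and math.comb assumes Python 3.8 or later.

```python
import math


def variance(values, probs):
    """Variance of a discrete distribution given its support values and probabilities."""
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))


def binomial_check(n, p):
    """Exact (Var(p-hat), Var(arcsin(sqrt(p-hat)))) for X ~ Bin(n, p), p-hat = X/n."""
    ks = range(n + 1)
    probs = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in ks]
    raw = variance([k / n for k in ks], probs)
    arcsine = variance([math.asin(math.sqrt(k / n)) for k in ks], probs)
    return raw, arcsine


def poisson_check(lam):
    """(Var(X), Var(sqrt(X))) for X ~ Poisson(lam), truncating a negligible upper tail."""
    upper = int(lam + 20 * math.sqrt(lam)) + 50
    ks = range(upper + 1)
    probs = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in ks]
    raw = variance(list(ks), probs)
    root = variance([math.sqrt(k) for k in ks], probs)
    return raw, root


n = 100
for p in (0.2, 0.5, 0.8):
    raw, arcsine = binomial_check(n, p)
    print(f"p={p}: Var(p-hat)={raw:.5f}  Var(arcsin sqrt)={arcsine:.5f}  1/(4n)={1 / (4 * n):.5f}")
for lam in (10, 50):
    raw, root = poisson_check(lam)
    print(f"lambda={lam}: Var(X)={raw:.2f}  Var(sqrt X)={root:.4f}")
```

The variance of p̂ changes by more than 50% between p = .2 and p = .5, while the arcsine square root variance stays close to 1/(4n); similarly, Var(X) = λ grows with λ while Var(√X) stays near 1/4.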

Example 6.6 Artificial insemination in chickens

Tajima (1987) describes an experiment examining the effect of a freeze-thaw cycle on the potency of semen used for artificial insemination in chickens. Four semen mixtures are prepared. Each mixture consists of equal volumes of semen from Rhode Island Red and White Leghorn roosters. Mixture 1 has both varieties fresh, mixture 4 has both varieties frozen, and mixtures 2 and 3 each have one variety fresh and the other frozen. Sixteen batches of Rhode Island Red hens are inseminated with the mixtures, using a balanced completely randomized design. The response is the fraction of chicks from each batch that have white feathers (white feathers indicate a White Leghorn father).

It is natural to model these fractions as binomial proportions. Each chick in a given treatment group has the same probability of having a White Leghorn father, though this probability may vary between groups due to the freeze-thaw treatments. Thus the total number of chicks with white feathers in a given batch should have a binomial distribution, and the fraction of chicks with white feathers is a binomial proportion. The observed proportions ranged from .19 to .95, extending well beyond the .2 to .8 range where the binomial variance is nearly constant, so the arcsine square root transformation is a good bet to stabilize the variability.

Power family transformations

When we don’t have a distribution with a known variance-stabilizing transformation (and we generally don’t), then we usually try a power family transformation. The power family of transformations includes

$$y \to \operatorname{sign}(\lambda)\, y^{\lambda}$$

and

$$y \to \log(y),$$

where sign(λ) is +1 for positive λ and −1 for negative λ. The log function corresponds to λ equal to zero. We multiply by the sign of λ so that the order of the responses is preserved when λ is negative.
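A minimal sketch of the power family in Python, assuming strictly positive responses; power_transform is a hypothetical helper name. It shows why the sign factor matters: with it, the transformed values keep the original ordering even for negative λ.

```python
import math


def power_transform(y, lam):
    """Power family: sign(lam) * y**lam for lam != 0, and log(y) for lam == 0."""
    if y <= 0:
        # power family transformations only make sense for positive data
        raise ValueError("power family transformations need positive data")
    if lam == 0:
        return math.log(y)
    return math.copysign(1.0, lam) * y ** lam


data = [2.0, 5.0, 11.0, 40.0]  # hypothetical positive responses, already sorted
for lam in (1, 0.5, 0, -0.5, -1):
    # every transformed list stays increasing, because of the sign(lam) factor
    print(lam, [round(power_transform(v, lam), 3) for v in data])
```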

Need positive data with max/min fairly large

Power family transformations are not likely to have much effect unless the ratio of the largest to smallest value is bigger than 4 or so. Furthermore, power family transformations only make sense when the data are all positive. When we have data with both signs, we can add a constant to all the data to make them positive before transforming. Adding different constants leads to different transformations.

Regression method for choosing λ

Here is a simple method for finding an approximate variance-stabilizing transformation power λ. Compute the mean and standard deviation for the data in each treatment group. Regress the logarithms of the standard deviations on the logarithms of the group means; let β̂ be the estimated regression slope. Then the estimated variance-stabilizing power transformation is λ = 1 − β̂. (The reasoning: if the standard deviation is proportional to µ^β, a first-order Taylor expansion gives y^λ a standard deviation roughly proportional to µ^(λ−1+β), which is constant when λ = 1 − β.) If there is no relationship between mean and standard deviation (β̂ = 0), then the estimated transformation is the power 1, which doesn’t change the data. If the standard deviation increases proportionally to the mean (β̂ = 1), then the log transformation (power 0) is appropriate for variance stabilization.

Box-Cox transformations

The Box-Cox method for determining a transformation power is somewhat more complicated than the simple regression-based estimate, but it tends to find a better power and also yields a confidence interval for λ. Furthermore, Box-Cox can be used on more complicated designs where the simple method is difficult to adapt. Box-Cox transformations rescale the power family transformation to make the different powers easier to compare. Let ẏ denote the geometric mean of all the responses, where the geometric mean is the product of all the responses raised to the 1/N power:

$$\dot{y} = \left( \prod_{i=1}^{g} \prod_{j=1}^{n_i} y_{ij} \right)^{1/N}.$$

The Box-Cox transformations are then

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda \, \dot{y}^{\lambda - 1}} & \lambda \neq 0 \\[1ex] \dot{y} \log(y) & \lambda = 0 \end{cases}$$

Use best convenient power

In the Box-Cox technique, we transform the data using a range of λ values from, say, −2 to 3, and do the ANOVA for each of these transformations. From these we can get SSE(λ), the sum of squared errors as a function of the transformation power λ. The best transformation power λ⋆ is the power that minimizes SSE(λ). We generally use a convenient transformation power λ close to λ⋆, where by convenient I mean a “pretty” power, like .5 or 0, rather than the actual minimizing power, which might be something like .427.
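The grid search just described can be sketched for a one-way layout. This is an illustration under stated assumptions rather than a full implementation: the helper names are hypothetical, SSE(λ) is taken to be the within-group (error) sum of squares of a one-way ANOVA, and the data are constructed rather than real. Each constructed group is its mean times the same multipliers (0.8, 1.0, 1.25), so the standard deviation is proportional to the mean and the log should be the best power.

```python
import math


def geometric_mean(values):
    """Product of the values raised to the 1/N power (all values must be positive)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def boxcox(y, lam, gm):
    """Box-Cox transform of one response y, rescaled by the geometric mean gm."""
    if lam == 0:
        return gm * math.log(y)
    return (y ** lam - 1) / (lam * gm ** (lam - 1))


def sse_oneway(groups):
    """Error sum of squares: squared deviations from each group's own mean."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((v - mean) ** 2 for v in g)
    return total


def boxcox_profile(groups, lams):
    """List of (lambda, SSE(lambda)) pairs over a grid of powers."""
    gm = geometric_mean([v for g in groups for v in g])
    return [(lam, sse_oneway([[boxcox(v, lam, gm) for v in g] for g in groups]))
            for lam in lams]


def best_power(profile):
    """Grid power with the smallest SSE."""
    return min(profile, key=lambda pair: pair[1])[0]


# Constructed (not real) data: each group is its mean times the same multipliers,
# so the spread within a group is proportional to the group mean.
multipliers = [0.8, 1.0, 1.25]
group_means = [10, 20, 40, 80]
groups = [[m * b for b in multipliers] for m in group_means]

lams = [round(-2 + 0.05 * i, 2) for i in range(101)]  # -2.00, -1.95, ..., 3.00
profile = boxcox_profile(groups, lams)
print(best_power(profile))  # 0.0, i.e., the log transformation
```

Because y^(1) = y − 1, the SSE at λ = 1 equals the ordinary error sum of squares, so the profile can be compared directly with the untransformed ANOVA.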

The Box-Cox minimizing power λ⋆ will rarely be exactly 1; when should you actually use a transformation? A graphical answer is obtained by making the suggested transformation and seeing if the residual plot looks better. If there was little change in the variances, or the group variances were not that different to start with, then there is little to be gained by making the transformation.

Confidence interval for λ

A more formal answer can be obtained by computing an approximate 1 − E confidence interval for the transformation power λ. This confidence interval consists of all powers λ such that

$$SSE(\lambda) \le SSE(\lambda^{\star}) \left( 1 + \frac{F_{E,1,\nu}}{\nu} \right),$$

where ν is the degrees of freedom for error. Very crudely (taking F_{E,1,ν} ≈ 4, about its value for a 95% interval), if the transformation doesn’t decrease the error sum of squares by a factor of at least ν/(ν + 4), then λ = 1 is in the confidence interval, and a transformation may not be needed. When I decide whether a transformation is indicated, I tend to rely mostly on a visual judgement of whether the residuals improve after transformation, and secondarily on the confidence interval.
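The interval rule and its crude version are simple to compute. In this sketch the function names are hypothetical, and the F critical value must come from an F table or a statistical library. The numbers anticipate the resin lifetime data of Example 6.7: a minimum SSE of about 1270 read from Figure 6.8, ν = 32 error degrees of freedom, SSE(1) = 68.82 × 32 (the pooled variance times its degrees of freedom, from Example 6.8), and an assumed table value F_{.05,1,32} ≈ 4.149.

```python
def boxcox_ci_cutoff(sse_min, nu, f_crit):
    """Powers whose Box-Cox SSE is at or below this cutoff lie in the interval."""
    return sse_min * (1 + f_crit / nu)


def crude_one_in_interval(sse_min, sse_at_one, nu):
    """Crude rule with F ~ 4: lambda = 1 stays in the interval unless the best
    power reduces the error SS below the fraction nu / (nu + 4) of SSE(1)."""
    return sse_min / sse_at_one >= nu / (nu + 4)


sse_min = 1270.0      # approximate minimum read from Figure 6.8
nu = 32               # error degrees of freedom for the resin data
f_crit = 4.149        # assumed table value of F_{.05,1,32}
sse_one = 68.82 * nu  # SSE(1): y^(1) = y - 1, so this is the ordinary error SS

cutoff = boxcox_ci_cutoff(sse_min, nu, f_crit)
print(round(cutoff, 1), sse_one > cutoff)  # 1434.7 True
print(crude_one_in_interval(sse_min, sse_one, nu))  # False: a transformation is indicated
```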

Figure 6.8: Box-Cox error SS versus transformation power (−1 to 1.5) for resin lifetime data.

Example 6.7 Resin lifetimes, continued

The resin lifetime data on the original scale show considerable nonconstant variance. The treatment means and variances are

Treatment        1       2       3       4       5
Mean         86.42   43.56   24.52   15.72   11.87
Variance    169.75   91.45   41.07    3.00   13.69

If we regress the log standard deviations on the log means, we get a slope of .86 for an estimated transformation power of .14; we would probably use a log (power 0) or quarter power since they are near the estimated power.
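The regression estimate can be reproduced from the summary statistics above. A minimal sketch in plain Python; regression_power is a hypothetical helper, the log standard deviations are computed as half the log variances, and the slope is the ordinary least-squares slope with one point per treatment.

```python
import math


def regression_power(means, variances):
    """Return (slope, 1 - slope) from regressing log(sd) on log(mean), one point per group."""
    x = [math.log(m) for m in means]
    y = [0.5 * math.log(v) for v in variances]  # log sd = (1/2) log variance
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, 1 - slope


# Treatment means and variances for the resin lifetime data (Example 6.7)
means = [86.42, 43.56, 24.52, 15.72, 11.87]
variances = [169.75, 91.45, 41.07, 3.00, 13.69]
slope, power = regression_power(means, variances)
print(f"slope = {slope:.2f}, estimated power = {power:.2f}")
# slope = 0.86, estimated power = 0.14
```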

We can use Box-Cox to suggest an appropriate transformation. Figure 6.8 shows SSE(λ) plotted against transformation power for powers between −1 and 1.5; the minimum appears to be about 1270 near a power of .25. The logarithm does nearly as well as the quarter power (SSE(0) is nearly as small as SSE(.25)), and the log is easier to work with, so we will use the log transformation. As a check, the 95% confidence interval for the transformation power includes all powers with Box-Cox error SS less than 1270(1 + F_{.05,1,32}/32) ≈ 1435; the log has an SSE well below this cutoff, and the original scale has an SSE well above it, suggesting that the logarithm is the way to go. Figure 6.9 shows the improvement in residuals versus fitted values after transformation. There is no longer as strong a tendency for the residuals to be larger when the mean is larger.

Figure 6.9: Residuals versus predicted plot for resin log lifetime data, using Minitab (residuals versus the fitted values; response is log life).

Alternative methods

Dealing with nonconstant variance has provided gainful employment to statisticians for many years, so there are a number of alternative methods to consider.

Weighted ANOVA when ratio of variances is known

The simplest situation may be when the ratio of the variances in the different groups is known. For example, suppose that the response for each unit in treatments 1 and 2 is the average from five measurement units, and the response for each unit in treatments 3 and 4 is the average from seven measurement units. If the variance among measurement units is the same, then the variance between experimental units in treatments 3 and 4 would be 5/7 the size of the variance between experimental units in treatments 1 and 2 (assuming no other sources of variation), simply due to different numbers of values in each average. Situations such as this can be handled using weighted ANOVA, where each unit receives a weight proportional to the number of measurement units used in its average (that is, inversely proportional to its variance). Most statistical packages can handle weighted ANOVA.
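A minimal sketch of weighted one-way ANOVA by weighted least squares, assuming each response's weight is proportional to the number of measurement units in its average. The function name and the four-observation data set are hypothetical, chosen so the arithmetic can be checked by hand; the weights 5 and 7 mirror the five- and seven-unit averages above.

```python
def weighted_oneway_anova(groups):
    """Weighted one-way ANOVA.

    groups: list of groups, each a list of (y, w) pairs, where w is proportional
    to the number of measurement units averaged into y (i.e., to 1/variance).
    Returns (SS_treatments, SS_error, F).
    """
    pairs = [p for g in groups for p in g]
    grand = sum(y * w for y, w in pairs) / sum(w for _, w in pairs)  # weighted grand mean
    ss_trt = 0.0
    ss_err = 0.0
    for g in groups:
        w_g = sum(w for _, w in g)
        mean_g = sum(y * w for y, w in g) / w_g  # weighted group mean
        ss_trt += w_g * (mean_g - grand) ** 2
        ss_err += sum(w * (y - mean_g) ** 2 for y, w in g)
    n_groups, n_obs = len(groups), len(pairs)
    f_ratio = (ss_trt / (n_groups - 1)) / (ss_err / (n_obs - n_groups))
    return ss_trt, ss_err, f_ratio


# Hypothetical data: group 1 responses average 5 measurement units, group 2 average 7.
groups = [[(1.0, 5), (3.0, 5)], [(4.0, 7), (6.0, 7)]]
print(weighted_oneway_anova(groups))  # (52.5, 24.0, 4.375)
```

Only the ratios of the weights matter: multiplying every weight by the same constant scales both sums of squares by that constant and leaves the F ratio unchanged.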

Welch’s t for pairwise comparisons with unequal variance

For pairwise comparisons, the Welch procedure is quite attractive. This procedure is sometimes called the “unpooled” t-test. Let s²_i denote the sample variance in treatment i. Then the Welch test statistic for testing µi = µj is

$$t_{ij} = \frac{\bar{y}_{i\bullet} - \bar{y}_{j\bullet}}{\sqrt{s_i^2/n_i + s_j^2/n_j}}.$$

This test statistic is compared to a Student’s t distribution with

$$\nu = \frac{\left( s_i^2/n_i + s_j^2/n_j \right)^2}{\dfrac{1}{n_i - 1}\dfrac{s_i^4}{n_i^2} + \dfrac{1}{n_j - 1}\dfrac{s_j^4}{n_j^2}}$$

degrees of freedom. For a confidence interval, we compute

$$\bar{y}_{i\bullet} - \bar{y}_{j\bullet} \pm t_{E/2,\nu} \sqrt{s_i^2/n_i + s_j^2/n_j},$$

with ν computed in the same way. More generally, for a contrast we use

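$$t = \frac{\sum_{i=1}^{g} w_i \bar{y}_{i\bullet}}{\sqrt{\sum_{i=1}^{g} w_i^2 s_i^2 / n_i}}$$

with approximate degrees of freedom

$$\nu = \frac{\left( \sum_{i=1}^{g} w_i^2 s_i^2 / n_i \right)^2}{\sum_{i=1}^{g} \dfrac{1}{n_i - 1} \dfrac{w_i^4 s_i^4}{n_i^2}}.$$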

Confidence intervals are computed in an analogous way.
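The contrast formulas translate directly into code, and a pairwise comparison is the contrast with weights +1 and −1. This sketch uses the resin summaries from Example 6.7 and assumes group sizes of 8, 8, 8, 7, and 6; those sizes are not listed in this section, but they reproduce the pooled variance of 68.82 and the standard errors quoted in Example 6.8. welch_contrast is a hypothetical helper name.

```python
import math
from itertools import combinations


def welch_contrast(w, means, variances, sizes):
    """Welch-type t, standard error, and approximate df for the contrast sum(w_i * mean_i)."""
    terms = [wi ** 2 * s2 / n for wi, s2, n in zip(w, variances, sizes)]  # w_i^2 s_i^2 / n_i
    se = math.sqrt(sum(terms))
    estimate = sum(wi * m for wi, m in zip(w, means))
    df = sum(terms) ** 2 / sum(t ** 2 / (n - 1) for t, n in zip(terms, sizes))
    return estimate / se, se, df


# Resin summaries from Example 6.7; the group sizes are an assumption (see above).
means = [86.42, 43.56, 24.52, 15.72, 11.87]
variances = [169.75, 91.45, 41.07, 3.00, 13.69]
sizes = [8, 8, 8, 7, 6]

for i, j in combinations(range(5), 2):
    w = [0] * 5
    w[i], w[j] = 1, -1  # pairwise difference written as a contrast
    t, se, df = welch_contrast(w, means, variances, sizes)
    print(f"{i + 1} vs {j + 1}: t = {t:6.2f}, se = {se:.2f}, df = {df:.1f}")
```

The largest Welch standard error is for treatments 1 and 2 and the smallest for treatments 4 and 5, matching Example 6.8 up to rounding.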

Welch’s t works well

The Welch procedure generally gives observed error rates close to the nominal error rates. Furthermore, the accuracy improves quickly as the sample sizes increase, something that cannot be said for the t and F-tests under nonconstant variance. Better still, there is almost no loss in power for using the Welch procedure, even when the variances are equal. For simple comparisons, the Welch procedure can be used routinely. The problem arises in generalizing it to more complicated situations.

Brown-Forsythe modified F

The next most complicated procedure is an ANOVA alternative for nonconstant variance. The Brown-Forsythe method is much less sensitive to nonconstant variance than is the usual ANOVA F test. Again let s²_i denote the sample variance in treatment i, and let d_i = s²_i(1 − n_i/N). The Brown-Forsythe modified F-test is

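$$BF = \frac{\sum_{i=1}^{g} n_i \left( \bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet} \right)^2}{\sum_{i=1}^{g} s_i^2 \left( 1 - n_i/N \right)}.$$

Under the null hypothesis of equal treatment means, BF is approximately distributed as F with g − 1 and ν degrees of freedom, where

$$\nu = \frac{\left( \sum_i d_i \right)^2}{\sum_i d_i^2/(n_i - 1)}.$$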
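Both the Brown-Forsythe statistic and its approximate degrees of freedom need only the group means, variances, and sizes. The sketch below also rebuilds the usual ANOVA F from the same summaries for comparison. It uses the resin summaries from Example 6.7 with assumed group sizes of 8, 8, 8, 7, and 6 (not listed in this section); the function names are hypothetical.

```python
def brown_forsythe(means, variances, sizes):
    """Brown-Forsythe modified F with its numerator and approximate denominator df."""
    N = sum(sizes)
    grand = sum(n * m for n, m in zip(sizes, means)) / N  # overall mean of all responses
    numerator = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    d = [s2 * (1 - n / N) for s2, n in zip(variances, sizes)]  # d_i = s_i^2 (1 - n_i/N)
    bf = numerator / sum(d)
    nu = sum(d) ** 2 / sum(di ** 2 / (n - 1) for di, n in zip(d, sizes))
    return bf, len(means) - 1, nu


def usual_anova_f(means, variances, sizes):
    """Ordinary one-way ANOVA F ratio rebuilt from group summaries (pooled variance)."""
    N, g = sum(sizes), len(means)
    grand = sum(n * m for n, m in zip(sizes, means)) / N
    ss_trt = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_err = sum((n - 1) * s2 for s2, n in zip(variances, sizes))
    return (ss_trt / (g - 1)) / (ss_err / (N - g))


means = [86.42, 43.56, 24.52, 15.72, 11.87]
variances = [169.75, 91.45, 41.07, 3.00, 13.69]
sizes = [8, 8, 8, 7, 6]  # assumed group sizes

bf, df1, df2 = brown_forsythe(means, variances, sizes)
print(f"Brown-Forsythe F = {bf:.1f} on {df1} and {df2:.1f} df")  # 111.7 on 4 and 18.3 df
print(f"usual ANOVA F = {usual_anova_f(means, variances, sizes):.1f}")  # 101.8
```

Up to rounding of the summary statistics, these agree with the values reported in Example 6.8.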

Example 6.8 Resin lifetimes, continued

Suppose that we needed confidence intervals for the difference in means between the pairs of temperatures on the original scale for the resin lifetime data. If we use the usual method and ignore the nonconstant variance, then pairwise differences have an estimated standard deviation of

$$\sqrt{68.82\,(1/n_i + 1/n_j)}\;;$$

these range from 4.14 to 4.61, depending on sample sizes, and all would use 32 degrees of freedom. Using the Welch procedure, we get standard deviations for pairwise differences ranging from 5.71 (treatments 1 and 2) to 1.65 (treatments 4 and 5), with degrees of freedom ranging from 6.8 to 12.8. Thus the confidence intervals from the usual method are much too short for pairs such as 1 and 2, and much too long for pairs such as 4 and 5.

Consider now testing the null hypothesis that all groups have the same mean on the original scale. The F ratio from ANOVA is 101.8, with 4 and 32 degrees of freedom. The Brown-Forsythe F is 111.7, with 4 and 18.3 degrees of freedom. Both clearly reject the null hypothesis.
