ANOVA and ANCOVA
5.1 Introduction
In this chapter, the R functions in the packages Rfit and npsm for the computation of fits and inference for standard rank-based analysis of variance (ANOVA)¹ and analysis of covariance (ANCOVA) type designs are discussed.
These include one-way, two-way, and k-way crossed designs that are covered in Sections 5.2–5.4. Both tests of general linear hypotheses and estimation of effects with standard errors and confidence intervals are emphasized. We also briefly present multiple comparison procedures (MCP), in particular a robust Tukey–Kramer procedure, illustrating their computation via the package Rfit. We also consider the R computation of several traditional nonparametric methods for these designs, including the Kruskal–Wallis test (Section 5.2.2) and the Jonckheere–Terpstra test for ordered alternatives (Section 5.6). In the last section, the generalization of the Fligner–Killeen procedure of Chapter 3 to the k-sample scale problem is presented. The rank-based analyses covered in this chapter are for fixed effect models. Rank-based methods and their computation for mixed (fixed and random) models form the topic of Chapter 8.
As a cursory reading, we suggest Section 5.2, the two-way design material of Section 5.3, and the ordered alternative methods of Section 5.6. As usual, our emphasis is on how to easily compute these rank-based procedures using Rfit. Details of the robust rank-based inference for these fixed effect models are discussed in Chapter 4 of Hettmansperger and McKean (2011).
5.2 One-Way ANOVA
Suppose we want to determine the effect that a single factor A has on a response of interest over a specified population. Assume that A consists of k levels or treatments. In a completely randomized design (CRD), n subjects are randomly selected from the reference population and ni of them are randomly assigned to level i, i = 1, . . . , k. Let the jth response in the ith level be denoted by Yij, j = 1, . . . , ni, i = 1, . . . , k. We assume that the responses are independent of one another and that the distributions among levels differ by at most shifts in location.
¹Though it could be named ANODI for rank-based analysis.
Under these assumptions, the full model can be written as
\[ Y_{ij} = \mu_i + e_{ij}, \quad j = 1, \ldots, n_i, \; i = 1, \ldots, k, \tag{5.1} \]
where the eij's are iid random variables with density f(x) and distribution function F(x), and the parameter µi is a convenient location parameter for the ith level (for example, the mean or median of the ith level). This model is often referred to as a one-way design and its analysis as a one-way analysis of variance (ANOVA). Generally, the parameters of interest are the effects (pairwise contrasts),
\[ \Delta_{ii'} = \mu_i - \mu_{i'}, \quad i \neq i', \; i, i' = 1, \ldots, k. \tag{5.2} \]
We can express the model in terms of these simple contrasts. As in the R lm command, we reference the first level. Then Model (5.1) can be expressed as
\[ Y_{ij} = \begin{cases} \mu_1 + e_{1j}, & j = 1, \ldots, n_1, \\ \mu_1 + \Delta_{i1} + e_{ij}, & j = 1, \ldots, n_i, \; i = 2, \ldots, k. \end{cases} \tag{5.3} \]
Let ∆ = (∆21, ∆31, . . . , ∆k1)′. Upon fitting the model, a residual analysis should be conducted to check these model assumptions. As the full model fit is based on a linear model, the diagnostic procedures discussed in Chapter 4 are implemented for ANOVA and ANCOVA models as well.
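The reference-level form of Model (5.3) corresponds to R's default treatment contrasts. As a minimal sketch, using a small made-up factor g (not data from the text), the design matrix below has an intercept column that carries µ1 and one indicator column per remaining level that carries the contrast ∆i1.

```r
# Hypothetical group labels for k = 3 levels with n1 = 2, n2 = 3, n3 = 2.
g <- factor(rep(1:3, times = c(2, 3, 2)))

# Default treatment contrasts: level 1 is the reference level.
X <- model.matrix(~ g)

# Column 1 is all ones (mu_1); column "gi" flags membership in level i (Delta_i1).
print(X)
colnames(X)   # "(Intercept)" "g2" "g3"
```

This assumes the session uses R's default contrast options; a different setting of options(contrasts = ...) would change the coding.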
Observational studies can also be modeled this way. Suppose k independent samples are drawn, one from each of k populations. If we assume further that the distributions of the populations differ by at most a shift in locations then Model (5.1) is appropriate. Usually, in the case of observational studies, it is necessary to adjust for covariates. These analyses are referred to as the analysis of covariance and are discussed in Section 5.4.
The analysis for the one-way design is usually a test of the hypothesis that all the effects ∆ii′ are 0, followed by individual comparisons of levels. The hypothesis can be written as
\[ H_0: \mu_1 = \cdots = \mu_k \quad \text{versus} \quad H_A: \mu_i \neq \mu_{i'} \text{ for some } i \neq i'. \tag{5.4} \]
Confidence intervals for the simple contrasts ∆ii′ can be used for the pairwise comparisons. We next briefly describe the general analysis for the one-way model and discuss its computation by Rfit.
A test of the overall hypothesis (5.4) is based on a reduction in dispersion test, first introduced in (4.4.3). For Rfit, assume that a score function ϕ has been selected; otherwise, Rfit uses the default Wilcoxon score function. As discussed in Section 5.3, let $\widehat{\Delta}_\varphi$ be the rank-based estimate of ∆ when the full model, (5.1), is fit. Let $D_\varphi(\mathrm{FULL}) = D(\widehat{\Delta}_\varphi)$ denote the full model dispersion, i.e., the minimum value of the dispersion function when this full model is fit.
The reduced model is the location model
\[ Y_{ij} = \mu + e_{ij}, \quad j = 1, \ldots, n_i, \; i = 1, \ldots, k. \tag{5.5} \]
Because the dispersion function is invariant to location, the minimum dispersion at the reduced model is the dispersion of the observations; i.e., D(0), which we call Dϕ(RED). The reduction in dispersion is then RDϕ = Dϕ(RED) − Dϕ(FULL) and, hence, the drop in dispersion test statistic is given by
\[ F_\varphi = \frac{RD_\varphi/(k-1)}{\widehat{\tau}_\varphi/2}, \tag{5.6} \]
where $\widehat{\tau}_\varphi$ is the estimate of scale discussed in Section 3.1. The approximate level α test rejects H0 if $F_\varphi \geq F_{\alpha,k-1,n-k}$. The traditional LS test is based on a reduction of sums of squares. Replacing this by a reduction in dispersion, the test based on Fϕ can be summarized in an ANOVA table much like that for the traditional F-test; see page 298 of Hettmansperger and McKean (2011).
When the linear Wilcoxon scores are used, we often replace the subscript ϕ by the subscript W; that is, we write the Wilcoxon rank-based F-test statistic as
\[ F_W = \frac{RD_W/(k-1)}{\widehat{\tau}_W/2}. \tag{5.7} \]
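To make the location-invariance argument concrete, here is a minimal base-R sketch of Jaeckel's dispersion function with Wilcoxon scores a(i) = √12 (i/(n + 1) − 1/2). The function name wilcoxon_dispersion and the example vector are illustrative assumptions; Rfit has its own internal implementation, which may differ in details such as the handling of ties.

```r
# Jaeckel's dispersion D(e) = sum_i a(R(e_i)) * e_i with Wilcoxon scores.
wilcoxon_dispersion <- function(e) {
  n <- length(e)
  a <- sqrt(12) * (rank(e) / (n + 1) - 0.5)  # these scores sum to zero
  sum(a * e)
}

# Illustrative responses (made up for this sketch, no ties).
y <- c(2.9, 3.0, 2.5, 2.6, 3.2, 3.8, 2.7, 4.0, 2.4)

d0 <- wilcoxon_dispersion(y)         # dispersion of the observations
d1 <- wilcoxon_dispersion(y - 3.1)   # after removing an arbitrary location
c(d0, d1)
```

Because the scores sum to zero and ranks are unchanged by a shift, subtracting any constant from the responses leaves the dispersion unchanged, which is why the reduced-model dispersion is simply that of the observations.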
The Rfit function oneway.rfit computes the robust rank-based one-way analysis. Its arguments are the vector of responses and the corresponding vector of levels. It returns the value of the test statistic and the associated p-value. We illustrate its computation with an example.
Example 5.2.1 (LDL Cholesterol of Quail). Hettmansperger and McKean (2011), page 295, discuss a study which investigated the effect that four drug compounds had on the reduction of low density lipid (LDL) cholesterol in quail. The drug compounds are labeled as I, II, III, and IV. The sample size for each of the first three levels is 10 while 9 quail received compound IV.
The boxplots shown in Figure 5.1 attest to a difference in the LDL levels over treatments.
Using Wilcoxon scores, the results of oneway.rfit are:
> robfit = with(quail,oneway.rfit(ldl,treat))
> robfit
Call:
oneway.rfit(y = ldl, g = treat)

Overall Test of All Locations Equal

Drop in Dispersion Test
F-Statistic     p-value
   3.916371    0.016404

 Pairwise comparisons using Rfit

data:  ldl and treat

  2    3    4
1 -    -    -
2 1.00 -
3 0.68 0.99 -
4 0.72 0.99 0.55

P value adjustment method: none

FIGURE 5.1
Plots for LDL cholesterol of quail example. Left panel: boxplots of LDL level by drug (1–4). Right panel: normal q−q plot of the Wilcoxon Studentized residuals against normal quantiles.
The Wilcoxon test statistic has the value FW = 3.92 with p-value 0.0164. Thus the Wilcoxon test indicates that the drugs differ in their cholesterol-lowering effect. In contrast to the significant Wilcoxon test of the hypothesis (5.4), the LS-based F-test statistic has value 1.14 with p-value 0.3451. In practice, using the LS results, one would not proceed with comparisons of the drugs with such a large p-value. Thus, for this dataset, the robust and LS analyses would have different practical interpretations. Also, the coefficient of precision, (3.46), for this data between the Wilcoxon and LS analyses is $\widehat{\sigma}^2/\widehat{\tau}^2 = 2.72$. Hence, the Wilcoxon analysis is much more precise.
The resulting q−q plot (see right panel of Figure 5.1) of the Studentized Wilcoxon residuals indicates that the random errors eij have a skewed distribution. R fits based on scores more appropriate than the Wilcoxon for skewed errors are discussed later.
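As a quick check, the p-value reported for the Wilcoxon test can be recovered from the F-distribution with k − 1 and n − k degrees of freedom. The sketch below takes the test statistic from the printed output and uses n = 10 + 10 + 10 + 9 = 39 and k = 4 from the example; the variable names are ours.

```r
k   <- 4                   # number of drug compounds
n   <- 10 + 10 + 10 + 9    # total sample size
f_w <- 3.916371            # Wilcoxon drop in dispersion statistic (from the output)

# Upper-tail probability of F with (k - 1, n - k) degrees of freedom.
p_value <- pf(f_w, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
p_value
```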
5.2.1 Multiple Comparisons
The second stage of an analysis of a one-way design consists of pairwise comparisons of the treatments. The robust (1 − α)100% confidence interval to compare the ith and i′th treatments is given by
\[ \widehat{\Delta}_{ii'} \pm t_{\alpha/2,n-1}\, \widehat{\tau}_\varphi \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}}. \tag{5.8} \]
Often there are many comparisons of interest. For example, in the case of all pairwise comparisons, $\binom{k}{2}$ confidence intervals are required. Hence, the overall family error rate is usually of concern. Multiple comparison procedures (MCP) try to control the overall error rate to some degree. There are many robust versions of MCPs from which to choose. The summary function associated with oneway.rfit computes three of the most popular of such procedures.
Assuming that the fit of the full model is in robfit, the syntax of the command is summary(robfit,method="none"). The argument of method produces the following MCPs:
method="none"         No adjustment made
method="tukey"        Tukey–Kramer
method="bonferroni"   Bonferroni
We give a brief description of these procedures, followed by an example using Rfit.
A protected least significant difference procedure (PLSD) consists of testing the hypothesis (5.4) at a specified level α. If H0 is rejected, then the comparisons are based on the confidence intervals (5.8) with confidence coefficient 1 − α. On the other hand, if H0 is not rejected, then the analysis stops. Although this procedure does not control the overall family rate of error, the initial F-test offers protection, which has been confirmed in simulation studies (see, for example, McKean et al. (1989)).
For the Tukey–Kramer procedure, Studentized range critical values replace the t-critical values in the intervals (5.8). When the traditional procedure is used, the random errors have a normal distribution, and the design is balanced, the Tukey–Kramer procedure has family error rate α. When either of these assumptions fails, the Tukey–Kramer procedure has a family error rate approximately equal to α.
The Bonferroni procedure depends on the number of comparisons made. Suppose there are l comparisons of interest; then the Bonferroni procedure uses the intervals (5.8) with the critical value $t_{\alpha/(2l),n-k}$. The Bonferroni procedure has overall family error rate ≤ α. If all pairwise comparisons are desired, then $l = \binom{k}{2}$.
Example 5.2.2 (LDL Cholesterol of Quail, Continued). For the quail data, we selected the Tukey–Kramer procedure for all six pairwise comparisons. The Rfit computation is:
> summary(robfit,method="tukey")

Multiple Comparisons
Method Used  tukey

  I J  Estimate  St Err Lower Bound CI Upper Bound CI
1 1 2 -25.00720 8.26820      -47.30572       -2.70868
2 1 3  -3.99983 8.26820      -26.29835       18.29869
3 1 4  -5.00027 8.49476      -27.90982       17.90928
4 2 3  21.00737 8.26820       -1.29115       43.30589
5 2 4  20.00693 8.49476       -2.90262       42.91648
6 3 4  -1.00044 8.49476      -23.91000       21.90911
The Tukey–Kramer procedure declares that only Drug Compounds I and II are statistically significantly different; theirs is the only interval that excludes 0.
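The table can be partly checked by hand. The sketch below assumes that the standard error of a pairwise contrast has the usual form τ̂ √(1/ni + 1/nj) and that the error degrees of freedom are n − k = 35; both are assumptions on our part that appear consistent with the printed output. It backs out τ̂ from a balanced pair, predicts the standard error for a pair involving compound IV, and compares the critical multipliers used by the three methods.

```r
n_i   <- c(10, 10, 10, 9)    # sample sizes for compounds I-IV
k     <- length(n_i)
df    <- sum(n_i) - k        # assumed error degrees of freedom
alpha <- 0.05

# Back out tau-hat from the reported standard error for compounds I and II.
tau_hat <- 8.26820 / sqrt(1 / n_i[1] + 1 / n_i[2])

# Predicted standard error for a comparison with compound IV (n = 9).
se_14 <- tau_hat * sqrt(1 / n_i[1] + 1 / n_i[4])

# Critical multipliers of the standard error for each method.
l          <- choose(k, 2)                                   # all pairwise comparisons
crit_none  <- qt(1 - alpha / 2, df)                          # unadjusted
crit_tukey <- qtukey(1 - alpha, nmeans = k, df = df) / sqrt(2)
crit_bonf  <- qt(1 - alpha / (2 * l), df)

# Tukey-Kramer interval for compounds I and II from the reported estimate.
ci_12 <- -25.00720 + c(-1, 1) * crit_tukey * 8.26820

round(c(se_14 = se_14, none = crit_none, tukey = crit_tukey, bonferroni = crit_bonf), 4)
round(ci_12, 2)
```

Under these assumptions the Tukey–Kramer multiplier falls between the unadjusted and Bonferroni multipliers, so its intervals are wider than the unadjusted ones but narrower than the Bonferroni intervals for all six comparisons.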
5.2.2 Kruskal–Wallis Test
Note that Model (5.1) generalizes the two-sample problem of Chapter 2 to k samples. In this section, we discuss the Kruskal–Wallis test of the hypotheses (5.4), which is a generalization of the two-sample MWW test.
Assume then that Model (5.1) holds for the responses Yij, j = 1, . . . , ni and i = 1, . . . , k. As before, let $n = \sum_{i=1}^{k} n_i$ denote the total sample size. Let Rij denote the rank of the response Yij among all n observations; i.e., the ranking is done without knowledge of treatment. Let $\bar{R}_i$ denote the average of the ranks for sample i. The test statistic H is a standardized weighted average of the squared deviations of the $\bar{R}_i$ from the average of all ranks, (n + 1)/2. The test statistic is
\[ H = \frac{12}{n(n+1)} \sum_{i=1}^{k} n_i \left( \bar{R}_i - \frac{n+1}{2} \right)^2. \]
The statistic H is the Kruskal–Wallis test statistic; see Kruskal and Wallis (1952). Under H0, H is distribution-free and there are some tables available for its exact distribution.² It also has an asymptotic χ²-distribution with k − 1 degrees of freedom under H0. The R command is kruskal.test. Assume the responses are in one vector x and the group or treatment assignments are in the vector g; then the call is kruskal.test(x,g). In addition, a formula can be used, as in kruskal.test(x~g). We illustrate the computation with the following example.
²See Chapter 6 of Hollander and Wolfe (1999).
Example 5.2.3 (Mucociliary Efficiency). Hollander and Wolfe (1999), page 192, discuss a small study which assessed mucociliary efficiency from the rate of removal of dust in three groups: normal subjects, subjects with obstructive airway disease, and subjects with asbestosis. The responses are the mucociliary clearance half-times of the subjects. The sample sizes are small: n1 = n3 = 5 and n2 = 4. Hence, n = 14. The data are given in the R vectors normal, obstruct, and asbestosis in the following code segment, which computes the Kruskal–Wallis test.
> normal <- c(2.9,3.0,2.5,2.6,3.2)
> obstruct <- c(3.8,2.7,4.0,2.4)
> asbestosis <- c(2.8,3.4,3.7,2.2,2.0)
> x <- c(normal,obstruct,asbestosis)
> g <- c(rep(1,5),rep(2,4),rep(3,5))
> kruskal.test(x,g)
        Kruskal-Wallis rank sum test

data:  x and g
Kruskal-Wallis chi-squared = 0.7714, df = 2, p-value = 0.68

Based on this p-value, there do not appear to be differences among the groups for mucociliary efficiency.
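The value reported by kruskal.test can be reproduced directly from the description of H as a standardized weighted sum of squared deviations of the average ranks from (n + 1)/2. The following sketch re-enters the data of this example and relies on there being no ties (true here), so no tie correction is needed; the variable names other than x and g are ours.

```r
normal     <- c(2.9, 3.0, 2.5, 2.6, 3.2)
obstruct   <- c(3.8, 2.7, 4.0, 2.4)
asbestosis <- c(2.8, 3.4, 3.7, 2.2, 2.0)
x <- c(normal, obstruct, asbestosis)
g <- rep(1:3, times = c(5, 4, 5))

n    <- length(x)
r    <- rank(x)                  # joint ranking, ignoring group labels
n_i  <- tapply(r, g, length)     # group sizes
rbar <- tapply(r, g, mean)       # average rank within each group

# H = 12 / (n (n + 1)) * sum_i n_i * (rbar_i - (n + 1) / 2)^2
H <- 12 / (n * (n + 1)) * sum(n_i * (rbar - (n + 1) / 2)^2)

# Asymptotic p-value from the chi-squared distribution with k - 1 df.
p_value <- pchisq(H, df = length(n_i) - 1, lower.tail = FALSE)
round(c(H = H, p = p_value), 4)
```

The average ranks are 7.2, 9.0, and 6.6 against an overall average of 7.5, which gives H = (12/210)(13.5) ≈ 0.7714, in agreement with the output above.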
Corrections for ties for the Kruskal–Wallis test are discussed in Hollander and Wolfe (1999), and kruskal.test does make such adjustments in its calculation. As discussed in Hettmansperger and McKean (2011), the Kruskal–Wallis test is asymptotically equivalent to the drop in dispersion test (5.6) using Wilcoxon scores.³