
Besides differences in location, we are often interested in the difference between scales for populations. Let X1, . . . , Xn1 be a random sample with the common pdf f [(x − θ1)/σ1] and Y1, . . . , Yn2 be a random sample with the common pdf f [(y − θ2)/σ2], where f (x) is a pdf and σ1, σ2 > 0. In this section our hypotheses of interest are

H0: η = 1 versus HA: η ≠ 1, (3.24)

where η = σ2/σ1. Besides discussing rank-based tests for these hypotheses, we also consider the associated estimation of η, along with a confidence interval for η. Here the location parameters θ1 and θ2 are nuisance parameters.

As discussed in Section 2.10 of Hettmansperger and McKean (2011), there are asymptotically distribution-free rank-based procedures for this problem.

We discuss the Fligner–Killeen procedure based on folded, aligned samples.

The aligned samples are defined by

$$X_i^* = X_i - \mathrm{med}\{X_l\}, \quad i = 1, \ldots, n_1$$

$$Y_j^* = Y_j - \mathrm{med}\{Y_l\}, \quad j = 1, \ldots, n_2. \qquad (3.25)$$

Next, the folded samples are $|X_1^*|, \ldots, |X_{n_1}^*|, |Y_1^*|, \ldots, |Y_{n_2}^*|$. The folded samples consist of positive items, and their logs differ, essentially, by a location parameter, namely ∆ = log(η). This suggests the following log-linear model. Define $Z_i$ by

$$Z_i = \begin{cases} \log|X_i^*| & i = 1, \ldots, n_1 \\ \log|Y_{i-n_1}^*| & i = n_1 + 1, \ldots, n_1 + n_2. \end{cases}$$

Let c be the indicator vector with its first n1 entries set at 0 and its last n2 entries set at 1. Then the log-linear model for the aligned, folded sample is

$$Z_i = \Delta c_i + e_i, \quad i = 1, 2, \ldots, n, \qquad (3.26)$$

where n = n1 + n2. Our rank-based procedure is then clear. Select an appropriate score function ϕ(u) and generate the scores $a_\varphi(i) = \varphi[i/(n + 1)]$. Obtain the rank-based fit of Model (3.26) and, hence, the estimator $\hat{\Delta}_\varphi$ of ∆. The estimator of η is then $\hat{\eta}_\varphi = \exp\{\hat{\Delta}_\varphi\}$. For specified 0 < α < 1, denote by $(L_\varphi, U_\varphi)$ the confidence interval for ∆ based on the fit; i.e.,

$$L_\varphi = \hat{\Delta}_\varphi - t_{\alpha/2}\,\hat{\tau}_\varphi \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad \text{and} \quad U_\varphi = \hat{\Delta}_\varphi + t_{\alpha/2}\,\hat{\tau}_\varphi \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

An approximate (1 − α)100% confidence interval for η is $(\exp\{L_\varphi\}, \exp\{U_\varphi\})$.

As with the estimator $\hat{\eta}_\varphi$, an attractive property of this confidence interval is that its endpoints are always positive.
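As a minimal sketch of the construction above, the following base R code builds the aligned, folded, logged samples and estimates ∆ in the special case of Wilcoxon scores, where the rank-based estimate of a two-sample shift is the Hodges–Lehmann estimate (the median of all pairwise differences). Using Wilcoxon scores is an assumption made to keep the sketch free of package dependencies; it is not the Fligner–Killeen score fit. The helper name `log_folded`, the simulated data, and the choice to drop zero folded values are also illustrative assumptions.

```r
# Aligned, folded, logged sample: log|x_i - med(x)|.
# Assumption: folded values equal to zero are dropped, since log(0) = -Inf.
log_folded <- function(x) {
  folded <- abs(x - median(x))
  log(folded[folded > 0])
}

set.seed(1)                        # simulated data, for illustration only
x <- rnorm(40, mean = 10, sd = 1)  # sigma1 = 1
y <- rnorm(40, mean = 25, sd = 3)  # sigma2 = 3, so eta = sigma2 / sigma1 = 3

zx <- log_folded(x)                # Z_i for the first sample  (c_i = 0)
zy <- log_folded(y)                # Z_i for the second sample (c_i = 1)

# Wilcoxon-score estimate of Delta: median of all pairwise differences zy_j - zx_i
delta_hat <- median(outer(zy, zx, "-"))

# Back-transform to the scale ratio; exp() guarantees a positive estimate
eta_hat <- exp(delta_hat)
```

Because the samples are aligned by their own medians before folding, the very different means of `x` and `y` play no role in `eta_hat`; only the spreads do.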

This confidence interval for η can be used to test the hypotheses (3.24); however, the gradient test is often used in practice. Because the log function is strictly increasing, the gradient test statistic is given by

$$S_\varphi = \sum_{j=1}^{n_2} a_\varphi[R(\log|Y_j^*|)] = \sum_{j=1}^{n_2} a_\varphi[R|Y_j^*|], \qquad (3.27)$$

where the ranks are over the combined folded, aligned samples. The standardized test statistic is $z = (S_\varphi - \mu_\varphi)/\sigma_\varphi$, where

$$\mu_\varphi = \frac{n_2}{n}\sum_{i=1}^{n} a(i) = n_2\bar{a} \quad \text{and} \quad \sigma_\varphi^2 = \frac{n_1 n_2}{n(n-1)}\sum_{i=1}^{n} (a(i) - \bar{a})^2. \qquad (3.28)$$

What scores are appropriate? The case of most interest in applications is when the underlying distribution of the random errors in Model (3.26) is normal. In this case the optimal score function (see Section 2.10 of Hettmansperger and McKean, 2011) is given by

$$\varphi_{FK}(u) = \left[\Phi^{-1}\left(\frac{u + 1}{2}\right)\right]^2, \qquad (3.29)$$

where $\Phi^{-1}$ is the standard normal quantile function. Hence, the scores are squared-normal scores. Note that this is a light-tailed score function, which is not surprising because the scores are optimal for random variables distributed as log(|W|), where W has a normal distribution. Usually the test statistic is written as

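$$S_{FK} = \sum_{j=1}^{n_2} \left[\Phi^{-1}\left(\frac{R|Y_j^*|}{2(n + 1)} + \frac{1}{2}\right)\right]^2. \qquad (3.30)$$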

This test statistic is discussed in Fligner and Killeen (1976) and Section 2.10 of Hettmansperger and McKean (2011). The scores generated by the score function (3.29) are in npsm under fkscores. Using these scores, straightforward code leads to the computation of the Fligner–Killeen procedure. We have assembled the code in the R function fk.test, whose arguments are similar to those of other standard two-sample procedures.

> args(fk.test)

function (x, y, alternative = c("two.sided", "less", "greater"), conf.level = 0.95)

NULL

In the call, x and y are vectors containing the original samples (not the folded, aligned samples); the argument alternative sets the hypothesis (default is two-sided); and conf.level sets the confidence coefficient of the confidence interval. The following example illustrates the Fligner–Killeen procedure and its computation.
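Before turning to the example, the standardized gradient test with squared-normal scores can be sketched directly in base R. This is only an illustration under stated assumptions, not the npsm implementation of fk.test: the function name `fk_sketch` is hypothetical, the data are simulated, the p-value uses the standard normal approximation for a two-sided test, and the null mean is taken as the mean of a sum of n2 scores, that is, n2 times the average score.

```r
# Standardized gradient test with squared-normal scores for two samples.
# Hypothetical helper for illustration; not the npsm function fk.test.
fk_sketch <- function(x, y) {
  n1 <- length(x); n2 <- length(y); n <- n1 + n2
  # Folded, aligned samples: absolute deviations from each sample's median
  folded <- c(abs(x - median(x)), abs(y - median(y)))
  r <- rank(folded)                     # ranks over the combined sample
  # Squared-normal scores a(i) = [qnorm(i / (2(n + 1)) + 1/2)]^2
  score <- function(i) qnorm(i / (2 * (n + 1)) + 0.5)^2
  a_all <- score(seq_len(n))            # a(1), ..., a(n)
  S <- sum(score(r[(n1 + 1):n]))        # sum of the scores of the second sample
  abar <- mean(a_all)
  mu <- n2 * abar                       # null mean of a sum of n2 scores
  sigma2 <- n1 * n2 / (n * (n - 1)) * sum((a_all - abar)^2)
  z <- (S - mu) / sqrt(sigma2)
  c(statistic = z, p.value = 2 * pnorm(-abs(z)))
}

set.seed(2)                             # simulated data, for illustration only
x <- rnorm(30, sd = 1)
y <- rnorm(30, sd = 3)
res <- fk_sketch(x, y)
```

A positive statistic points toward a larger scale in the second sample, since larger folded values receive larger scores. Swapping the roles of `x` and `y` changes only the sign of the statistic.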

Example 3.3.1 (Effect of Ozone on Weight of Rats). Draper and Smith (1966) present an experiment on the effect of ozone on the weight gain of rats. Two groups of rats were selected. The control group (n1 = 21) lived in an ozone-free environment for 7 days, while the experimental group (n2 = 22) lived in an ozone environment for 7 days. At the end of the 7 days, the gain in weight of each rat was taken to be the response. The comparison boxplots of the data, Figure 3.5, show a disparity in scale between the two groups.

For this example, the following code segment computes the Fligner–Killeen procedure for a two-sided alternative. The data are in the dataset sievers. We first split the groups into two vectors x and y as input to the function fk.test.

> data(sievers)

> x <- with(sievers, weight.gain[group == 'Control'])

> y <- with(sievers, weight.gain[group == 'Ozone'])

> fk.test(x,y)

statistic = 2.095976 , p-value = 0.03608434

95 percent confidence interval:

1.002458 5.636436

Estimate: 2.377034

FIGURE 3.5

Comparison boxplots of weight gain in n1 = 21 controls and n2 = 22 ozone-treated rats. (Boxplots for the Control and Ozone groups; vertical axis: weight gain.)

Hence, the rank-based estimate of η is 2.377 with the 95% confidence interval of (1.002, 5.636). The standardized Fligner–Killeen test statistic has value 2.10 with the p-value 0.0361 for a two-sided test. Thus there is evidence that rats exposed to ozone have larger variability in their weight gains than nonexposed rats.

The score function (3.29) is optimal for scale if the original samples are from normal populations. Several other score functions are discussed in Hettmansperger and McKean (2011). For example, the Wilcoxon scores are optimal for scale if |X| follows an F(2, 2) distribution.

It is well known that the traditional F-test based on the ratio of sample variances is generally invalid for nonnormal populations. This can be shown theoretically as in Section 2.10.2 of Hettmansperger and McKean (2011). On the other hand, the Fligner–Killeen test is asymptotically distribution-free over all symmetric error pdfs f(x), and, as the next remark discusses, appears to be valid for skewed-contaminated normal distributions. We discuss several pertinent simulation studies next.

Remark 3.3.1 (Simulation Studies Concerning the FK-Test). Conover et al. (1983) discuss the results of a large simulation study of tests for scale in the k-sample problem over many distributions for the random errors. The traditional Bartlett’s test (the usual F-test in the two-sample problem) is well known to be invalid for nonnormal distributions, and this is dramatically shown in this study. Other methods investigated included several folded, aligned tests. One that performed very well uses a test statistic similar to (3.30) except that the exponent is 1 instead of 2. Over symmetric error distributions that test was valid and showed high empirical power, but it had some trouble with validity for asymmetric distributions. However, in a simulation study conducted by Hettmansperger and McKean (2011), the test based on the test statistic (3.30) (i.e., with the correct exponent 2) was empirically valid over a family of contaminated skewed distributions as well as symmetric error distributions.

The base R function fligner.test is based on Conover et al. (1983), which used exponent 1, so its results will differ from those of fk.test. Also, fk.test provides the associated estimate of effect and a confidence interval, which are not available in fligner.test.
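A small Monte Carlo in the spirit of Remark 3.3.1 can be run with base R alone. The sketch below estimates the empirical level of the F-test (var.test) and of fligner.test when both samples come from the same heavy-tailed t distribution with 3 degrees of freedom, so that the null hypothesis of equal scales is true. The sample sizes, number of replications, seed, and error distribution are arbitrary assumptions chosen for illustration; fligner.test is the exponent-1 version noted above, not fk.test.

```r
set.seed(123)                      # arbitrary seed, for reproducibility
nsim  <- 2000                      # number of simulated data sets (assumption)
n1    <- 30
n2    <- 30
alpha <- 0.05                      # nominal level

reject_F       <- logical(nsim)
reject_fligner <- logical(nsim)

for (s in seq_len(nsim)) {
  # Equal scales, heavy tails: H0 is true in every replication
  x <- rt(n1, df = 3)
  y <- rt(n2, df = 3)
  reject_F[s]       <- var.test(x, y)$p.value < alpha
  reject_fligner[s] <- fligner.test(list(x, y))$p.value < alpha
}

# Empirical levels: proportion of rejections under H0
level_F       <- mean(reject_F)
level_fligner <- mean(reject_fligner)
```

Under these assumptions one would expect `level_F` to be well above the nominal 0.05, while `level_fligner` should stay much closer to it, which is the pattern the simulation studies describe for symmetric error distributions.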

In the presence of different scales, one would probably not want to perform the usual analysis for a difference in locations. In the next section, we discuss a rank-based analysis using placements, which is appropriate.