An alternative that makes some hand work simpler assumes that $\mu^\star$ is the weighted average of the treatment means, with the sample sizes $n_i$ used as weights:
$$\mu^\star = \sum_{i=1}^{g} n_i \mu_i / N .$$
For this choice, the weighted sum of the treatment effects is zero:
$$\sum_{i=1}^{g} n_i \alpha_i = 0 .$$
When the sample sizes are equal, these two choices coincide. The computational formulae we give in this book will use the restriction that the weighted sum of the $\alpha_i$'s is zero, because it leads to somewhat simpler hand computations. Some of the formulae in later chapters are only valid when the sample sizes are equal.
Our restriction that the treatment effects $\alpha_i$ add to zero (either weighted or not) implies that the treatment effects are not completely free to vary. We can set $g - 1$ of them however we wish, but the remaining treatment effect is then determined because it must be whatever value makes the zero sum true. We express this by saying that the treatment effects have $g - 1$ degrees of freedom.
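The degrees-of-freedom argument can be checked numerically. The following is a minimal sketch assuming the weighted restriction $\sum_i n_i \alpha_i = 0$; the function name `last_effect`, the sample sizes, and the chosen effects are all hypothetical illustrations and do not come from the text.

```python
def last_effect(free_effects, sizes):
    """Return the one treatment effect that is not free to vary.

    free_effects: the g-1 effects chosen freely (groups 1..g-1).
    sizes: all g sample sizes n_1..n_g.
    The weighted restriction sum(n_i * alpha_i) = 0 fixes alpha_g.
    """
    if len(sizes) != len(free_effects) + 1:
        raise ValueError("need exactly g-1 free effects for g sample sizes")
    partial = sum(n * a for n, a in zip(sizes, free_effects))
    return -partial / sizes[-1]


# Hypothetical example: g = 3 groups with sample sizes 4, 5, 6.
sizes = [4, 5, 6]
free = [1.5, -0.6]                      # chosen arbitrarily
alpha = free + [last_effect(free, sizes)]
weighted_sum = sum(n * a for n, a in zip(sizes, alpha))
print(alpha, weighted_sum)
```

Whatever values are placed in `free`, the last effect is forced, which is the sense in which only $g - 1$ effects are free.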
3.4 Estimating Parameters
Most data analysis these days is done using a computer. Few of us sit down and crunch through the necessary calculations by hand. Nonetheless, knowing the basic formulae and ideas behind our analysis helps us understand and interpret the quantities that come out of the software black box. If we don't understand the quantities printed by the software, we cannot possibly use them to understand the data and answer our questions.
The parameters of our group means model are the treatment means $\mu_i$ and the variance $\sigma^2$, plus the derived parameters $\mu^\star$ and the $\alpha_i$'s. We will be computing "unbiased" estimates of these parameters. Unbiased means that when you average the values of the estimates across all potential random errors $\epsilon_{ij}$, you get the true parameter values.
It is convenient to introduce a notation to indicate the estimator of a parameter. The usual notation in statistics is to put a "hat" over the parameter to indicate the estimator; thus $\hat{\mu}$ is an estimator of $\mu$. Because we have parame-
40 Completely Randomized Designs
Let's establish some notation for sample averages and the like. The sum of the observations in the $i$th treatment group is
$$y_{i\bullet} = \sum_{j=1}^{n_i} y_{ij} .$$
The mean of the observations in the $i$th treatment group is
$$\bar{y}_{i\bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} = y_{i\bullet}/n_i .$$
The overbar indicates averaging, and the dot ($\bullet$) indicates that we have averaged (or summed) over the indicated subscript. The sum of all observations is
$$y_{\bullet\bullet} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij} = \sum_{i=1}^{g} y_{i\bullet} ,$$
and the grand mean of all observations is
$$\bar{y}_{\bullet\bullet} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij} = y_{\bullet\bullet}/N .$$
The sum of squared deviations of the data from the group means is
$$SS_E = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2 .$$
The $SS_E$ measures total variability in the data around the group means.
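The dot notation maps directly onto a few lines of code. The sketch below is only an illustration: the function name `group_summaries`, the dictionary keys, and the data are hypothetical, with `groups[i][j]` standing in for $y_{ij}$.

```python
def group_summaries(groups):
    """Compute the dot-notation quantities for a list of treatment groups.

    groups: list of lists; groups[i][j] plays the role of y_ij.
    """
    sums = [sum(grp) for grp in groups]               # y_i.   (group sums)
    sizes = [len(grp) for grp in groups]              # n_i
    means = [s / n for s, n in zip(sums, sizes)]      # ybar_i. (group means)
    total = sum(sums)                                 # y_..   (sum of all data)
    N = sum(sizes)
    grand_mean = total / N                            # ybar_..
    # SSE: squared deviations of each observation from its own group mean.
    sse = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)
    return {"sums": sums, "means": means, "total": total,
            "N": N, "grand_mean": grand_mean, "sse": sse}


# Hypothetical data: two groups of unequal size.
summary = group_summaries([[2.0, 4.0], [1.0, 3.0, 5.0, 7.0]])
print(summary)
```

Note that the grand mean is the average of all observations, not the unweighted average of the group means; the two differ whenever the sample sizes are unequal.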
Consider first the separate means model, with each treatment group having its own mean $\mu_i$. The natural estimator of $\mu_i$ is $\bar{y}_{i\bullet}$, the average of the observations in that treatment group. We estimate the expected (or average) response in the $i$th treatment group by the observed average of the responses in the $i$th treatment group. Thus we have
$$\hat{\mu}_i = \bar{y}_{i\bullet} .$$
The sample average is an unbiased estimator of the population average, so $\hat{\mu}_i$ is an unbiased estimator of $\mu_i$.
In the single mean model, the only parameter in the model for the means is $\mu$. The natural estimator of $\mu$ is $\bar{y}_{\bullet\bullet}$, the grand mean of all the responses.
That is, if we felt that all the data were responses from the same population, we would estimate the mean of that single population by the grand mean of the data. Thus we have
$$\hat{\mu} = \bar{y}_{\bullet\bullet} .$$
The grand mean is an unbiased estimate of $\mu$ when the data all come from a single population.
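We use the restriction that $\mu^\star = \sum_i n_i \mu_i / N$; an unbiased estimate of $\mu^\star$ is
$$\hat{\mu}^\star = \frac{\sum_{i=1}^{g} n_i \hat{\mu}_i}{N} = \frac{\sum_{i=1}^{g} n_i \bar{y}_{i\bullet}}{N} = \frac{y_{\bullet\bullet}}{N} = \bar{y}_{\bullet\bullet} .$$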
This is the same as the estimator we use for $\mu$ in the single mean model. Because $\mu$ and $\mu^\star$ are both estimated by the same value, we will drop the notation $\mu^\star$ and just use the single notation $\mu$ for both roles.
The treatment effects $\alpha_i$ are
$$\alpha_i = \mu_i - \mu ;$$
these can be estimated by
$$\hat{\alpha}_i = \hat{\mu}_i - \hat{\mu} = \bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet} .$$
These treatment effects and estimates satisfy the restriction
$$\sum_{i=1}^{g} n_i \alpha_i = \sum_{i=1}^{g} n_i \hat{\alpha}_i = 0 .$$
The only parameter remaining to estimate is $\sigma^2$. Our estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = MS_E = \frac{SS_E}{N - g} = \frac{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2}{N - g} .$$
We sometimes use the notation $s$ in place of $\hat{\sigma}$ in analogy with the sample standard deviation $s$. This estimator $\hat{\sigma}^2$ is unbiased for $\sigma^2$ in both the separate means and single means models. (Note that $\hat{\sigma}$ is not unbiased for $\sigma$.)
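The point estimators of this section can be collected into one small routine. This is a sketch under stated assumptions, not a reference implementation: the function name `crd_point_estimates` and the data are hypothetical, and the weighted-sum restriction is the one adopted in the text.

```python
def crd_point_estimates(groups):
    """Point estimates for a completely randomized design.

    groups: list of lists of responses, one inner list per treatment.
    Returns (mu_hat, mu_i_hat, alpha_hat, sigma2_hat).
    """
    sizes = [len(grp) for grp in groups]
    N = sum(sizes)
    g = len(groups)
    if N <= g:
        raise ValueError("need N > g to estimate the error variance")
    mu_i_hat = [sum(grp) / n for grp, n in zip(groups, sizes)]   # group means
    mu_hat = sum(sum(grp) for grp in groups) / N                 # grand mean
    alpha_hat = [m - mu_hat for m in mu_i_hat]                   # effects
    sse = sum((y - m) ** 2 for grp, m in zip(groups, mu_i_hat) for y in grp)
    sigma2_hat = sse / (N - g)                                   # MSE
    return mu_hat, mu_i_hat, alpha_hat, sigma2_hat


# Hypothetical data: g = 2 groups, N = 5 observations.
data = [[1.0, 2.0, 3.0], [4.0, 6.0]]
mu_hat, mu_i_hat, alpha_hat, sigma2_hat = crd_point_estimates(data)
# The estimated effects satisfy the weighted-sum restriction (up to roundoff).
weighted_sum = sum(len(grp) * a for grp, a in zip(data, alpha_hat))
print(mu_hat, mu_i_hat, alpha_hat, sigma2_hat, weighted_sum)
```

The divisor is $N - g$ rather than $N$, which is what makes the variance estimate unbiased.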
The deviations from the group mean $y_{ij} - \bar{y}_{i\bullet}$ add to zero in any treatment group, so that any $n_i - 1$ of them determine the remaining one. Put another way, there are $n_i - 1$ degrees of freedom for error in each group, or $N - g = \sum_i (n_i - 1)$ degrees of freedom for error for the experiment. There are thus
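Model            Parameter     Estimator
Single mean      $\mu$         $\bar{y}_{\bullet\bullet}$
                 $\sigma^2$    $\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2 / (N - g)$
Separate means   $\mu$         $\bar{y}_{\bullet\bullet}$
                 $\mu_i$       $\bar{y}_{i\bullet}$
                 $\alpha_i$    $\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}$
                 $\sigma^2$    $\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\bullet})^2 / (N - g)$

Display 3.1: Point estimators in the CRD.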
formula $n_1 + n_2 - 2$ for the degrees of freedom in a two-sample $t$-test. Another way to think of $N - g$ is the number of data values minus the number of mean parameters estimated.
The formulae for these estimators are collected in Display 3.1. The next example illustrates their use.
Example 3.5 Resin lifetimes, continued
Most of the work for computing point estimates is done once we get the average responses overall and in each treatment group. Using the resin lifetime data from Table 3.1, we get the following means and counts:
Treatment (°C)   175     194     213     231     250     All data
Average          1.933   1.629   1.378   1.194   1.057   1.465
Count            8       8       8       7       6       37
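The estimates $\hat{\mu}_i$ and $\hat{\mu}$ can be read from the table:
$$\hat{\mu}_1 = 1.933 \qquad \hat{\mu}_2 = 1.629 \qquad \hat{\mu}_3 = 1.378$$
$$\hat{\mu}_4 = 1.194 \qquad \hat{\mu}_5 = 1.057 \qquad \hat{\mu} = 1.465$$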
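Get the $\hat{\alpha}_i$ values by subtracting the grand mean from the group means:
$$\begin{aligned}
\hat{\alpha}_1 &= 1.932 - 1.465 = .467 & \hat{\alpha}_2 &= 1.629 - 1.465 = .164 \\
\hat{\alpha}_3 &= 1.378 - 1.465 = -.088 & \hat{\alpha}_4 &= 1.194 - 1.465 = -.271 \\
\hat{\alpha}_5 &= 1.057 - 1.465 = -.408
\end{aligned}$$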
Notice that $\sum_{i=1}^{g} n_i \hat{\alpha}_i = 0$ (except for roundoff error).
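The effect estimates and the roundoff remark can be reproduced from the table of means and counts. The sketch below uses only the three-decimal averages printed in the example, so its results may differ from the printed $\hat{\alpha}_i$ in the last digit; the variable names are hypothetical.

```python
# Group averages and counts as printed in the example (rounded to 3 decimals).
means = [1.933, 1.629, 1.378, 1.194, 1.057]
counts = [8, 8, 8, 7, 6]
grand_mean = 1.465                    # "All data" average from the table

N = sum(counts)

# The grand mean is the count-weighted average of the group means.
weighted_mean = sum(n * m for n, m in zip(counts, means)) / N

# Estimated treatment effects: group mean minus grand mean.
alpha_hat = [m - grand_mean for m in means]

# Weighted sum of effects: zero in exact arithmetic, small but nonzero here
# because the inputs were rounded before subtracting.
weighted_sum = sum(n * a for n, a in zip(counts, alpha_hat))

print(N, round(weighted_mean, 3))
print([round(a, 3) for a in alpha_hat])
print(weighted_sum)
```

The weighted sum is not exactly zero because the tabled means carry rounding error, which each $n_i$ then multiplies.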
The computation for $\hat{\sigma}^2$ is a bit more work, because we need to compute the $SS_E$. For the resin data, $SS_E$ is
$$\begin{aligned}
SS_E ={} & (2.04 - 1.933)^2 + (1.91 - 1.933)^2 + \cdots + (1.90 - 1.933)^2 \\
& + (1.66 - 1.629)^2 + (1.71 - 1.629)^2 + \cdots + (1.66 - 1.629)^2 \\
& + (1.53 - 1.378)^2 + (1.54 - 1.378)^2 + \cdots + (1.38 - 1.378)^2 \\
& + (1.15 - 1.194)^2 + (1.22 - 1.194)^2 + \cdots + (1.17 - 1.194)^2 \\
& + (1.26 - 1.057)^2 + (.83 - 1.057)^2 + \cdots + (1.06 - 1.057)^2 \\
={} & .29369
\end{aligned}$$
Thus we have
$$\hat{\sigma}^2 = SS_E/(N - g) = .29369/(37 - 5) = .009178 .$$
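The last step of this arithmetic is easy to verify. The sketch below takes the $SS_E$ value, $N$, and $g$ from the example as given (it does not recompute $SS_E$ from the raw data, which are not reproduced here) and also reports $s = \sqrt{MS_E}$; the variable names are hypothetical.

```python
import math

sse = 0.29369      # SSE for the resin data, as reported in the example
N = 37             # total number of observations
g = 5              # number of treatment groups

df_error = N - g               # error degrees of freedom
mse = sse / df_error           # sigma^2 hat = SSE / (N - g)
s = math.sqrt(mse)             # s = sigma hat

print(df_error, round(mse, 6), round(s, 4))
```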
A point estimate gives our best guess as to the value of a parameter. A confidence interval gives a plausible range for the parameter, that is, a set of parameter values that are consistent with the data. Confidence intervals for $\mu$ and the $\mu_i$'s are useful and straightforward to compute. Confidence intervals for the $\alpha_i$'s are only slightly more trouble to compute, but are perhaps less useful because there are several potential ways to define the $\alpha$'s. Differences between $\mu_i$'s, or equivalently, differences between $\alpha_i$'s, are extremely useful; these will be considered in depth in Chapter 4. Confidence intervals for the error variance $\sigma^2$ will be considered in Chapter 11.
Confidence intervals for parameters in the mean structure have the general form:

unbiased estimate ± multiplier × (estimated) standard error of estimate.
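The standard errors for the averages $\bar{y}_{\bullet\bullet}$ and $\bar{y}_{i\bullet}$ are $\sigma/\sqrt{N}$ and $\sigma/\sqrt{n_i}$ respectively. We do not know $\sigma$, so we use $\hat{\sigma} = s = \sqrt{MS_E}$ as an estimate and obtain $s/\sqrt{N}$ and $s/\sqrt{n_i}$ as estimated standard errors for $\bar{y}_{\bullet\bullet}$ and $\bar{y}_{i\bullet}$.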
For an interval with coverage $1 - \mathcal{E}$, we use the upper $\mathcal{E}/2$ percent point of the $t$-distribution with $N - g$ degrees of freedom as the multiplier. This is denoted $t_{\mathcal{E}/2, N-g}$. We use the $\mathcal{E}/2$ percent point because we are constructing a two-sided confidence interval, and we are allowing error rates of $\mathcal{E}/2$ on both the low and high ends. For example, we use the upper 2.5% point (or 97.5% cumulative point) of $t$ for 95% coverage. The degrees of freedom for the $t$-distribution come from $\hat{\sigma}^2$, our estimate of the error variance. For the CRD, the degrees of freedom are $N - g$, the number of data points minus the number of treatment groups.
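The generic interval can be sketched in a few lines. The function name `mean_confidence_interval` is hypothetical, and the multiplier is passed in rather than computed: the value 2.037 used below is the approximate upper 2.5% point of $t$ with 32 degrees of freedom taken from a standard $t$ table, which is an assumption of this sketch rather than a number given in the text. The estimate, count, and $MS_E$ are the resin values from Example 3.5.

```python
import math


def mean_confidence_interval(estimate, s, n, t_multiplier):
    """Two-sided interval: estimate +/- multiplier * (s / sqrt(n)).

    estimate: a group mean (or the grand mean, with n = N).
    s: sqrt(MSE), the pooled estimate of sigma.
    n: number of observations averaged to form the estimate.
    t_multiplier: upper E/2 point of t with N - g degrees of freedom.
    """
    std_error = s / math.sqrt(n)
    half_width = t_multiplier * std_error
    return estimate - half_width, estimate + half_width


# Resin example: 95% interval for the first treatment mean.
s = math.sqrt(0.009178)       # sqrt(MSE) from the example
t_mult = 2.037                # assumed table value for t_{.025, 32}
lower, upper = mean_confidence_interval(1.933, s, 8, t_mult)
print(round(lower, 3), round(upper, 3))
```

The degrees of freedom for the multiplier are $N - g = 32$, not $n_1 - 1 = 7$, because $s$ pools the error information from all five groups.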
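Parameter     Estimator                                          Standard Error
$\mu$         $\bar{y}_{\bullet\bullet}$                         $s/\sqrt{N}$
$\mu_i$       $\bar{y}_{i\bullet}$                               $s/\sqrt{n_i}$
$\alpha_i$    $\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet}$    $s\sqrt{1/n_i - 1/N}$

Display 3.2: Standard errors of point estimators in the CRD.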
The standard error of an estimated treatment effect $\hat{\alpha}_i$ is $\sigma\sqrt{1/n_i - 1/N}$. Again, we must use an estimate of $\sigma$, yielding $s\sqrt{1/n_i - 1/N}$ for the estimated standard error. Keep in mind that the treatment effects $\hat{\alpha}_i$ are negatively correlated, because they must add to zero.
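The standard error formula and the negative correlation both follow from a short variance calculation. The derivation below is a sketch that assumes independent errors with common variance $\sigma^2$ and uses $\bar{y}_{\bullet\bullet} = \sum_k n_k \bar{y}_{k\bullet}/N$; it is offered as a check on the stated results, not as a quotation of the text's argument.

```latex
\begin{aligned}
\operatorname{Cov}(\bar{y}_{i\bullet}, \bar{y}_{\bullet\bullet})
  &= \frac{n_i}{N}\operatorname{Var}(\bar{y}_{i\bullet})
   = \frac{n_i}{N}\cdot\frac{\sigma^2}{n_i}
   = \frac{\sigma^2}{N}, \\[4pt]
\operatorname{Var}(\hat{\alpha}_i)
  &= \operatorname{Var}(\bar{y}_{i\bullet})
   + \operatorname{Var}(\bar{y}_{\bullet\bullet})
   - 2\operatorname{Cov}(\bar{y}_{i\bullet}, \bar{y}_{\bullet\bullet})
   = \frac{\sigma^2}{n_i} + \frac{\sigma^2}{N} - \frac{2\sigma^2}{N}
   = \sigma^2\left(\frac{1}{n_i} - \frac{1}{N}\right), \\[4pt]
\operatorname{Cov}(\hat{\alpha}_i, \hat{\alpha}_k)
  &= 0 - \frac{\sigma^2}{N} - \frac{\sigma^2}{N} + \frac{\sigma^2}{N}
   = -\frac{\sigma^2}{N} \qquad (i \neq k).
\end{aligned}
```

Taking the square root of the second line gives the standard error $\sigma\sqrt{1/n_i - 1/N}$, and the third line shows that any two estimated effects have negative covariance.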