Estimating Parameters - Experimentos

An alternative that makes some hand work simpler assumes that $\mu^\star$ is the weighted average of the treatment means, with the sample sizes $n_i$ used as weights:

$$\mu^\star = \sum_{i=1}^{g} n_i \mu_i / N .$$

For this choice, the weighted sum of the treatment effects is zero:

$$\sum_{i=1}^{g} n_i \alpha_i = 0 .$$

When the sample sizes are equal, these two choices coincide. The computational formulae we give in this book will use the restriction that the weighted sum of the $\alpha_i$'s is zero, because it leads to somewhat simpler hand computations. Some of the formulae in later chapters are only valid when the sample sizes are equal.
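A minimal sketch comparing the two choices of $\mu^\star$. The treatment means, sample sizes and the function name `overall_means` are hypothetical, chosen only for illustration. It shows that the weighted choice makes $\sum_i n_i \alpha_i$ vanish, and that the two choices agree when all $n_i$ are equal.

```python
# Hypothetical true treatment means mu_i and sample sizes n_i (illustration only).
mu = [10.0, 12.0, 17.0]
n = [2, 3, 5]


def overall_means(mu, n):
    """Return (unweighted, weighted) candidates for mu_star."""
    g, N = len(mu), sum(n)
    unweighted = sum(mu) / g                               # sum_i mu_i / g
    weighted = sum(ni * mi for ni, mi in zip(n, mu)) / N   # sum_i n_i mu_i / N
    return unweighted, weighted


unw, w = overall_means(mu, n)
alpha_w = [mi - w for mi in mu]          # effects under the weighted choice
weighted_sum = sum(ni * a for ni, a in zip(n, alpha_w))
print(unw, w, weighted_sum)              # weighted_sum is zero up to float error

# With equal sample sizes the two candidates coincide.
print(overall_means(mu, [4, 4, 4]))
```

With unequal $n_i$ the unweighted effects sum to zero without weights, but their weighted sum does not; that asymmetry is why the text commits to one restriction.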

Our restriction that the treatment effects $\alpha_i$ add to zero (either weighted or not) implies that the treatment effects are not completely free to vary. We can set $g - 1$ of them however we wish, but the remaining treatment effect is then determined because it must be whatever value makes the zero sum true. We express this by saying that the treatment effects have $g - 1$ degrees of freedom.
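To make the degrees-of-freedom argument concrete, here is a small sketch (the helper `last_effect` is hypothetical) that takes $g - 1$ freely chosen effects and returns the one value of the last effect that satisfies the restriction. Passing all sample sizes equal to 1 gives the unweighted restriction.

```python
def last_effect(free_effects, n):
    """Return alpha_g so that sum_i n_i * alpha_i == 0.

    free_effects holds the g - 1 effects we are free to choose; n holds all g
    sample sizes. Pass n = [1] * g for the unweighted restriction sum alpha_i = 0.
    """
    if len(n) != len(free_effects) + 1:
        raise ValueError("expected g - 1 free effects for g sample sizes")
    partial = sum(ni * a for ni, a in zip(n, free_effects))
    return -partial / n[-1]


print(last_effect([-4.1, -2.1], [2, 3, 5]))   # weighted restriction
print(last_effect([1.0, 2.0], [1, 1, 1]))     # unweighted restriction
```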

3.4 Estimating Parameters

Most data analysis these days is done using a computer. Few of us sit down and crunch through the necessary calculations by hand. Nonetheless, knowing the basic formulae and ideas behind our analysis helps us understand and interpret the quantities that come out of the software black box. If we don't understand the quantities printed by the software, we cannot possibly use them to understand the data and answer our questions.

The parameters of our group means model are the treatment means $\mu_i$ and the variance $\sigma^2$, plus the derived parameters $\mu^\star$ and the $\alpha_i$'s. We will be computing “unbiased” estimates of these parameters. Unbiased means that when you average the values of the estimates across all potential random errors $\epsilon_{ij}$, you get the true parameter values.

It is convenient to introduce a notation to indicate the estimator of a parameter. The usual notation in statistics is to put a “hat” over the parameter to indicate the estimator; thus $\hat\mu$ is an estimator of $\mu$. Because we have parame-

40 Completely Randomized Designs

Let's establish some notation for sample averages and the like. The sum of the observations in the $i$th treatment group is

$$y_{i\bullet} = \sum_{j=1}^{n_i} y_{ij} .$$

The mean of the observations in the $i$th treatment group is

$$\bar y_{i\bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} = y_{i\bullet}/n_i .$$

The overbar indicates averaging, and the dot ($\bullet$) indicates that we have averaged (or summed) over the indicated subscript. The sum of all observations is

$$y_{\bullet\bullet} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij} = \sum_{i=1}^{g} y_{i\bullet} ,$$

and the grand mean of all observations is

$$\bar y_{\bullet\bullet} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij} = y_{\bullet\bullet}/N .$$
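The notation maps directly onto a few sums. The sketch below uses a small made-up data set (not the resin data) and a hypothetical helper `summarize` to compute $y_{i\bullet}$, $\bar y_{i\bullet}$, $y_{\bullet\bullet}$ and $\bar y_{\bullet\bullet}$.

```python
# Hypothetical data: responses y_ij keyed by treatment (not the resin data).
data = {
    "A": [2.0, 2.2, 2.1],
    "B": [1.5, 1.7],
    "C": [1.0, 1.2, 1.1, 1.3],
}


def summarize(data):
    """Group sums, group means, grand sum, grand mean and N."""
    sums = {k: sum(ys) for k, ys in data.items()}             # y_{i.}
    means = {k: sums[k] / len(ys) for k, ys in data.items()}  # ybar_{i.} = y_{i.}/n_i
    grand_sum = sum(sums.values())                            # y_{..} = sum_i y_{i.}
    N = sum(len(ys) for ys in data.values())
    return sums, means, grand_sum, grand_sum / N, N           # ybar_{..} = y_{..}/N


sums, means, grand_sum, grand_mean, N = summarize(data)
print(means, grand_mean)
```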

The sum of squared deviations of the data from the group means is

$$SSE = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar y_{i\bullet})^2 .$$

The $SSE$ measures total variability in the data around the group means.
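A direct transcription of the $SSE$ formula, as a sketch on the same kind of hypothetical toy data: each observation is compared with its own group mean, never with the grand mean.

```python
# Hypothetical data (illustration only).
data = {
    "A": [2.0, 2.2, 2.1],
    "B": [1.5, 1.7],
    "C": [1.0, 1.2, 1.1, 1.3],
}


def sse(data):
    """Sum of squared deviations of each y_ij from its own group mean ybar_{i.}."""
    total = 0.0
    for ys in data.values():
        ybar = sum(ys) / len(ys)
        total += sum((y - ybar) ** 2 for y in ys)
    return total


print(sse(data))
```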

Consider first the separate means model, with each treatment group having its own mean $\mu_i$. The natural estimator of $\mu_i$ is $\bar y_{i\bullet}$, the average of the observations in that treatment group. We estimate the expected (or average) response in the $i$th treatment group by the observed average in the $i$th treatment group responses. Thus we have

$$\hat\mu_i = \bar y_{i\bullet} .$$

The sample average is an unbiased estimator of the population average, so $\hat\mu_i$ is an unbiased estimator of $\mu_i$.

In the single mean model, the only parameter in the model for the means is $\mu$. The natural estimator of $\mu$ is $\bar y_{\bullet\bullet}$, the grand mean of all the responses.


That is, if we felt that all the data were responses from the same population, we would estimate the mean of that single population by the grand mean of the data. Thus we have

$$\hat\mu = \bar y_{\bullet\bullet} .$$

The grand mean is an unbiased estimate of $\mu$ when the data all come from a single population.

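We use the restriction that $\mu^\star = \sum_i n_i \mu_i / N$; an unbiased estimate of $\mu^\star$ is

$$\hat\mu^\star = \frac{\sum_{i=1}^{g} n_i \hat\mu_i}{N} = \frac{\sum_{i=1}^{g} n_i \bar y_{i\bullet}}{N} = \frac{y_{\bullet\bullet}}{N} = \bar y_{\bullet\bullet} .$$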

This is the same as the estimator we use for $\mu$ in the single mean model. Because $\mu$ and $\mu^\star$ are both estimated by the same value, we will drop the notation $\mu^\star$ and just use the single notation $\mu$ for both roles.
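As a quick numerical check of the identity $\sum_i n_i \bar y_{i\bullet} / N = \bar y_{\bullet\bullet}$, this sketch weights the group averages and counts reported for the resin data in Example 3.5. Those averages are already rounded to three decimals, so agreement with the reported grand mean is only expected to three decimals.

```python
# Group averages and counts reported for the resin data in Example 3.5
# (averages already rounded to three decimals).
means = [1.933, 1.629, 1.378, 1.194, 1.057]
counts = [8, 8, 8, 7, 6]

N = sum(counts)
mu_hat = sum(n * m for n, m in zip(counts, means)) / N   # sum_i n_i ybar_{i.} / N
print(N, round(mu_hat, 3))
```

The result matches the “All data” average of 1.465 reported in that example.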

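The treatment effects $\alpha_i$ are

$$\alpha_i = \mu_i - \mu ;$$

these can be estimated by

$$\hat\alpha_i = \hat\mu_i - \hat\mu = \bar y_{i\bullet} - \bar y_{\bullet\bullet} .$$

These treatment effects and estimates satisfy the restriction

$$\sum_{i=1}^{g} n_i \alpha_i = \sum_{i=1}^{g} n_i \hat\alpha_i = 0 .$$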
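A sketch of the effect estimates $\hat\alpha_i = \bar y_{i\bullet} - \bar y_{\bullet\bullet}$ computed from raw responses. The toy data and the helper name `effect_estimates` are hypothetical. The weighted sum of the estimates comes out as zero (up to floating-point error) regardless of the data, because $\bar y_{\bullet\bullet}$ is itself the $n_i$-weighted average of the group means.

```python
# Hypothetical data (illustration only).
data = {
    "A": [2.0, 2.2, 2.1],
    "B": [1.5, 1.7],
    "C": [1.0, 1.2, 1.1, 1.3],
}


def effect_estimates(data):
    """Return alpha_hat_i = ybar_{i.} - ybar_{..} and the group sizes n_i."""
    N = sum(len(ys) for ys in data.values())
    grand_mean = sum(sum(ys) for ys in data.values()) / N
    alpha = {k: sum(ys) / len(ys) - grand_mean for k, ys in data.items()}
    sizes = {k: len(ys) for k, ys in data.items()}
    return alpha, sizes


alpha, sizes = effect_estimates(data)
weighted_sum = sum(sizes[k] * alpha[k] for k in alpha)
print(alpha, weighted_sum)   # weighted_sum is zero up to floating-point error
```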

The only parameter remaining to estimate is $\sigma^2$. Our estimator of $\sigma^2$ is

$$\hat\sigma^2 = MSE = \frac{SSE}{N - g} = \frac{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar y_{i\bullet})^2}{N - g} .$$

We sometimes use the notation $s$ in place of $\hat\sigma$ in analogy with the sample standard deviation. This estimator $\hat\sigma^2$ is unbiased for $\sigma^2$ in both the separate means and single mean models. (Note that $\hat\sigma$ is not unbiased for $\sigma$.)
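A minimal sketch of $\hat\sigma^2 = MSE = SSE/(N - g)$ and $s = \sqrt{MSE}$ on hypothetical toy data (the function name `mse` is illustrative). It also returns the error degrees of freedom $N - g$ used in the divisor.

```python
import math

# Hypothetical data (illustration only).
data = {
    "A": [2.0, 2.2, 2.1],
    "B": [1.5, 1.7],
    "C": [1.0, 1.2, 1.1, 1.3],
}


def mse(data):
    """Return (MSE, error df) with MSE = SSE / (N - g)."""
    N, g = sum(len(ys) for ys in data.values()), len(data)
    sse = 0.0
    for ys in data.values():
        ybar = sum(ys) / len(ys)
        sse += sum((y - ybar) ** 2 for y in ys)
    return sse / (N - g), N - g


sigma2_hat, df = mse(data)
s = math.sqrt(sigma2_hat)   # s = sigma_hat; unbiased for sigma^2 only after squaring
print(sigma2_hat, df, s)
```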

The deviations from the group mean $y_{ij} - \bar y_{i\bullet}$ add to zero in any treatment group, so that any $n_i - 1$ of them determine the remaining one. Put another way, there are $n_i - 1$ degrees of freedom for error in each group, or $N - g = \sum_i (n_i - 1)$ degrees of freedom for error for the experiment. There are thus $N - g$ degrees of freedom for $\hat\sigma^2$, which generalizes the formula $n_1 + n_2 - 2$ for the degrees of freedom in a two-sample t-test. Another way to think of $N - g$ is as the number of data values minus the number of mean parameters estimated.

Model            Parameter    Estimator
Single mean      $\mu$        $\bar y_{\bullet\bullet}$
                 $\sigma^2$   $\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar y_{i\bullet})^2 / (N - g)$
Separate means   $\mu$        $\bar y_{\bullet\bullet}$
                 $\mu_i$      $\bar y_{i\bullet}$
                 $\alpha_i$   $\bar y_{i\bullet} - \bar y_{\bullet\bullet}$
                 $\sigma^2$   $\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar y_{i\bullet})^2 / (N - g)$

Display 3.1: Point estimators in the CRD.

The formulae for these estimators are collected in Display 3.1. The next example illustrates their use.

Example 3.5 Resin lifetimes, continued

Most of the work for computing point estimates is done once we get the average responses overall and in each treatment group. Using the resin lifetime data from Table 3.1, we get the following means and counts:

Treatment (°C)   175     194     213     231     250     All data
Average          1.933   1.629   1.378   1.194   1.057   1.465
Count            8       8       8       7       6       37

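The estimates $\hat\mu_i$ and $\hat\mu$ can be read from the table:

$$\hat\mu_1 = 1.933 \qquad \hat\mu_2 = 1.629 \qquad \hat\mu_3 = 1.378$$
$$\hat\mu_4 = 1.194 \qquad \hat\mu_5 = 1.057 \qquad \hat\mu = 1.465$$

Get the $\hat\alpha_i$ values by subtracting the grand mean from the group means (the differences come from the unrounded averages, so the last digit can differ by one from a subtraction of the rounded values shown):

$$\hat\alpha_1 = 1.933 - 1.465 = .467 \qquad \hat\alpha_2 = 1.629 - 1.465 = .164$$
$$\hat\alpha_3 = 1.378 - 1.465 = -.088 \qquad \hat\alpha_4 = 1.194 - 1.465 = -.271$$
$$\hat\alpha_5 = 1.057 - 1.465 = -.408$$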


Notice that $\sum_{i=1}^{g} n_i \hat\alpha_i = 0$ (except for roundoff error).
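The roundoff remark can be checked with the tabled values. This sketch recomputes $\hat\alpha_i$ from the rounded group averages and counts. Since the grand mean is recomputed from those same averages, the weighted sum is zero up to floating-point error; the recomputed $\hat\alpha_1$ and $\hat\alpha_3$ may differ in the last digit from the values above, which were computed from unrounded data. Using the rounded grand mean 1.465 instead leaves a small residue.

```python
# Rounded group averages and counts from the resin table above.
means = [1.933, 1.629, 1.378, 1.194, 1.057]
counts = [8, 8, 8, 7, 6]

N = sum(counts)
grand = sum(n * m for n, m in zip(counts, means)) / N
alpha_hat = [m - grand for m in means]             # alpha_hat_i = ybar_{i.} - ybar_{..}
print([round(a, 3) for a in alpha_hat])

weighted = sum(n * a for n, a in zip(counts, alpha_hat))
print(weighted)                                    # zero up to float error

# Subtracting the rounded grand mean 1.465 leaves a roundoff residue.
residue = sum(n * (m - 1.465) for n, m in zip(counts, means))
print(residue)
```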

The computation for $\hat\sigma^2$ is a bit more work, because we need to compute the $SSE$. For the resin data, $SSE$ is

$$\begin{aligned}
SSE ={}& (2.04 - 1.933)^2 + (1.91 - 1.933)^2 + \cdots + (1.90 - 1.933)^2 \\
&+ (1.66 - 1.629)^2 + (1.71 - 1.629)^2 + \cdots + (1.66 - 1.629)^2 \\
&+ (1.53 - 1.378)^2 + (1.54 - 1.378)^2 + \cdots + (1.38 - 1.378)^2 \\
&+ (1.15 - 1.194)^2 + (1.22 - 1.194)^2 + \cdots + (1.17 - 1.194)^2 \\
&+ (1.26 - 1.057)^2 + (.83 - 1.057)^2 + \cdots + (1.06 - 1.057)^2 \\
={}& .29369 .
\end{aligned}$$

Thus we have

$$\hat\sigma^2 = SSE/(N - g) = .29369/(37 - 5) = .009178 .$$
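The final arithmetic step, as a short sketch: it divides the resin $SSE$ reported above by $N - g = 32$ and also takes the square root to get $s$, which the confidence intervals below need.

```python
import math

SSE = 0.29369          # resin SSE reported in the text
N, g = 37, 5
sigma2_hat = SSE / (N - g)   # MSE with N - g = 32 error degrees of freedom
s = math.sqrt(sigma2_hat)    # s = sigma_hat
print(round(sigma2_hat, 6), round(s, 4))
```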

A point estimate gives our best guess as to the value of a parameter. A confidence interval gives a plausible range for the parameter, that is, a set of parameter values that are consistent with the data. Confidence intervals for $\mu$ and the $\mu_i$'s are useful and straightforward to compute. Confidence intervals for the $\alpha_i$'s are only slightly more trouble to compute, but are perhaps less useful because there are several potential ways to define the $\alpha$'s. Differences between $\mu_i$'s, or equivalently, differences between $\alpha_i$'s, are extremely useful; these will be considered in depth in Chapter 4. Confidence intervals for the error variance $\sigma^2$ will be considered in Chapter 11.

Confidence intervals for parameters in the mean structure have the general form:

unbiased estimate ± multiplier × (estimated) standard error of estimate.

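The standard errors for the averages $\bar y_{\bullet\bullet}$ and $\bar y_{i\bullet}$ are $\sigma/\sqrt{N}$ and $\sigma/\sqrt{n_i}$ respectively. We do not know $\sigma$, so we use $\hat\sigma = s = \sqrt{MSE}$ as an estimate and obtain $s/\sqrt{N}$ and $s/\sqrt{n_i}$ as estimated standard errors for $\bar y_{\bullet\bullet}$ and $\bar y_{i\bullet}$.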

For an interval with coverage $1 - E$, we use the upper $E/2$ percent point of the t-distribution with $N - g$ degrees of freedom as the multiplier. This is denoted $t_{E/2, N-g}$. We use the $E/2$ percent point because we are constructing a two-sided confidence interval, and we are allowing error rates of $E/2$ on both the low and high ends. For example, we use the upper 2.5% point (or 97.5% cumulative point) of t for 95% coverage. The degrees of freedom for the t-distribution come from $\hat\sigma^2$, our estimate of the error variance. For the CRD, the degrees of freedom are $N - g$, the number of data points minus the number of treatment groups.
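The recipe “estimate ± $t_{E/2,N-g}$ × estimated standard error” can be sketched with the standard library alone. Python's standard library has no t quantile function, so this sketch substitutes a numerical method of our own choosing: it integrates the t density with Simpson's rule and finds the upper $E/2$ point by bisection. The helper names (`t_pdf`, `t_cdf`, `t_upper`, `mean_ci`) are hypothetical; if SciPy is available, `scipy.stats.t.ppf(1 - E/2, df)` gives the same multiplier. The usage line applies it to $\mu_1$ for the resin data ($\bar y_{1\bullet} = 1.933$, $n_1 = 8$, $\hat\sigma^2 = .009178$, $N - g = 32$).

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density over [0, |x|] with Simpson's rule."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


def t_upper(tail, df):
    """Upper-tail point t_{tail, df}, i.e. P(T > t) = tail, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - t_cdf(hi, df) > tail:  # widen until the point is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1.0 - t_cdf(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(estimate, s, n, df, coverage=0.95):
    """estimate +/- t_{E/2, df} * s / sqrt(n), with E = 1 - coverage."""
    E = 1.0 - coverage
    half_width = t_upper(E / 2, df) * s / math.sqrt(n)
    return estimate - half_width, estimate + half_width


# Resin data: 95% interval for mu_1 with n_1 = 8, sigma_hat^2 = .009178, N - g = 32.
lo, hi = mean_ci(1.933, math.sqrt(0.009178), 8, 32)
print(round(lo, 3), round(hi, 3))
```

Bisection is slow compared with a library quantile function but is easy to verify; for hand-sized problems the cost is negligible.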


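Parameter    Estimator                                       Standard Error
$\mu$        $\bar y_{\bullet\bullet}$                       $s/\sqrt{N}$
$\mu_i$      $\bar y_{i\bullet}$                             $s/\sqrt{n_i}$
$\alpha_i$   $\bar y_{i\bullet} - \bar y_{\bullet\bullet}$   $s\sqrt{1/n_i - 1/N}$

Display 3.2: Standard errors of point estimators in the CRD.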

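The standard error of an estimated treatment effect $\hat\alpha_i$ is $\sigma\sqrt{1/n_i - 1/N}$. (This follows because $\mathrm{Cov}(\bar y_{i\bullet}, \bar y_{\bullet\bullet}) = \sigma^2/N$, so $\mathrm{Var}(\bar y_{i\bullet} - \bar y_{\bullet\bullet}) = \sigma^2/n_i + \sigma^2/N - 2\sigma^2/N$.) Again, we must use an estimate of $\sigma$, yielding $s\sqrt{1/n_i - 1/N}$ for the estimated standard error. Keep in mind that the estimated treatment effects $\hat\alpha_i$ are negatively correlated, because their weighted sum must be zero.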
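A sketch that fills in the standard errors of Display 3.2 for the resin data, using $\hat\sigma^2 = .009178$ and the group counts from Example 3.5. Because $1/n_i - 1/N < 1/n_i$, the standard error of $\hat\alpha_i$ is always a little smaller than that of $\hat\mu_i$: the grand mean shares the group's own observations.

```python
import math

s = math.sqrt(0.009178)          # sigma_hat for the resin data
counts = [8, 8, 8, 7, 6]
N = sum(counts)

se_grand = s / math.sqrt(N)                                  # SE of ybar_{..}
se_means = [s / math.sqrt(n) for n in counts]                # SE of ybar_{i.}
se_effects = [s * math.sqrt(1 / n - 1 / N) for n in counts]  # SE of alpha_hat_i

for n, a, b in zip(counts, se_means, se_effects):
    print(n, round(a, 4), round(b, 4))
print(round(se_grand, 4))
```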

In document Experimentos (Page 60-65)