4. ESTIMATION
4.2 Estimators and Sampling Distributions
Suppose that some attribute of interest for a population or process can be represented by a parameter $\theta$ in a statistical model. We assume that $\theta$ can be estimated using a random sample drawn from the population or process in question. Recall from Chapter 2 that a point estimate of $\theta$, denoted $\hat{\theta}$, was defined as a function of the observed sample $y_1, \ldots, y_n$,
$$\hat{\theta} = g(y_1, \ldots, y_n). \tag{4.1}$$
For example,
$$\hat{\theta} = \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
is a point estimate of $\theta$ if $y_1, \ldots, y_n$ is an observed random sample from a Poisson distribution with mean $\theta$.
The method of maximum likelihood provides a general method for obtaining estimates, but other methods exist. For example, if $\theta = E(Y) = \mu$ is the average (mean) value of $y$ in the population, then the sample mean $\hat{\theta} = \bar{y}$ is an intuitively sensible estimate; it is the maximum likelihood estimate of $\theta$ if $Y$ has a $G(\mu, \sigma)$ distribution, but because of the Central Limit Theorem it is a good estimate of $\theta$ more generally. Thus, while we will use maximum likelihood estimation a great deal, you should remember that the discussion below applies to estimates of any type.
The problem facing us in this chapter is how to determine or quantify the uncertainty in an estimate. We do this using sampling distributions$^{21}$, which are based on the following idea. If we select random samples on repeated occasions, then the estimates $\hat{\theta}$ obtained from the different samples will vary. For example, five separate random samples of $n = 50$ persons from the same male population described in Example 1.3.1 gave five different estimates $\hat{\mu} = \bar{y}$ of $E(Y)$ as:
1.723   1.743   1.734   1.752   1.736
Estimates vary as we take repeated samples and therefore we associate a random variable and a distribution with these estimates.
More precisely, we define this idea as follows. Let the random variables $Y_1, \ldots, Y_n$ represent the observations in a random sample, and associate with the estimate $\hat{\theta}$ given by (4.1) a random variable
$$\tilde{\theta} = g(Y_1, \ldots, Y_n).$$
The random variable $\tilde{\theta} = g(Y_1, \ldots, Y_n)$ is simply a rule that tells us how to process the data to obtain a numerical value $\hat{\theta} = g(y_1, \ldots, y_n)$, which is an estimate of the unknown parameter $\theta$ for a given data set $y_1, \ldots, y_n$. For example,
$$\tilde{\theta} = \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
is a random variable and $\hat{\theta} = \bar{y}$ is a numerical value. We call $\tilde{\theta}$ the estimator of $\theta$ corresponding to $\hat{\theta}$. (We will always use $\hat{\theta}$ to denote an estimate, that is, a numerical value, and $\tilde{\theta}$ to denote the corresponding estimator, the random variable.)
Definition 22 A (point) estimator $\tilde{\theta}$ is a random variable which is a function $\tilde{\theta} = g(Y_1, Y_2, \ldots, Y_n)$ of the random variables $Y_1, Y_2, \ldots, Y_n$. The distribution of $\tilde{\theta}$ is called the sampling distribution of the estimator.
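The distinction between an estimator (a rule) and an estimate (a number) can be sketched in R. This is only an illustration: the Gaussian model, the mean `mu <- 1.73`, the standard deviation `sigma <- 0.07`, the seed, and the function name `g` are all assumptions made for the sketch and are not the data behind the five estimates quoted earlier.

```r
# Hypothetical population values, chosen only for illustration.
mu <- 1.73     # assumed population mean height (meters)
sigma <- 0.07  # assumed population standard deviation (meters)
n <- 50        # size of each random sample

set.seed(1)    # arbitrary seed so the run is reproducible

# The estimator: a rule g(Y1, ..., Yn) applied to whatever sample is drawn.
g <- function(y) mean(y)

# Five independent samples give five (generally different) estimates.
estimates <- replicate(5, g(rnorm(n, mean = mu, sd = sigma)))
print(round(estimates, 3))
```

Each call to `g` returns one estimate; the variation among the five printed values is what the sampling distribution describes.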
Since $\tilde{\theta}$ is a function of the random variables $Y_1, \ldots, Y_n$, we can find its distribution, at least in principle. Two ways to do this are (i) using mathematics and (ii) by computer simulation. Once we know the sampling distribution of an estimator $\tilde{\theta}$, we are in a position to express the uncertainty in an estimate. The following example illustrates how we examine the probability that the estimator $\tilde{\theta}$ is "close" to $\theta$.
Example 4.2.1
Suppose we want to estimate the mean $\mu = E(Y)$ of a random variable, and that a Gaussian distribution $Y \sim G(\mu, \sigma)$ describes variation in $Y$ in the population. Let $Y_1, \ldots, Y_n$ represent a random sample from the population, and consider the estimator
$$\tilde{\mu} = \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
for $\mu$. Recall that if the distribution of $Y_i$ is $G(\mu, \sigma)$ then the distribution of $\bar{Y}$ is Gaussian, $G(\mu, \sigma/\sqrt{n})$. Consider the probability that the random variable $|\tilde{\mu} - \mu|$ is less than or equal to some specified value $\Delta$. We have
$$P(|\tilde{\mu} - \mu| \le \Delta) = P(\mu - \Delta \le \bar{Y} \le \mu + \Delta) = P\left(-\Delta\sqrt{n}/\sigma \le Z \le \Delta\sqrt{n}/\sigma\right) \tag{4.2}$$
where $Z = (\bar{Y} - \mu)/(\sigma/\sqrt{n}) \sim G(0, 1)$. Clearly, as $n$ increases, the probability (4.2) approaches one. Furthermore, if we know $\sigma$ (even approximately) then we can find the probability for any given $\Delta$ and $n$. For example, suppose $Y$ represents the height of a male (in meters) in the population of Example 1.3.1, and that we take $\Delta = 0.01$. That is, we want to find the probability that $|\tilde{\mu} - \mu|$ is no more than 0.01 meters. Assuming $\sigma = s = 0.07$ (meters), (4.2) gives the following results for sample sizes $n = 50$ and $n = 100$:
$$n = 50: \quad P(|\tilde{\mu} - \mu| \le 0.01) = P(-1.01 \le Z \le 1.01) = 0.688$$
$$n = 100: \quad P(|\tilde{\mu} - \mu| \le 0.01) = P(-1.43 \le Z \le 1.43) = 0.847$$
This indicates that a larger sample is "better" in the sense that the probability is higher that $\tilde{\mu}$ will be within 0.01 m of the true (and unknown) average height $\mu$ in the population. It also allows us to express the uncertainty in an estimate $\hat{\mu} = \bar{y}$ from an observed sample $y_1, \ldots, y_n$ by indicating the probability that any single random sample will give an estimate within a certain distance of $\mu$.
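The calculation in (4.2) can be sketched in R using the standard Gaussian cumulative distribution function `pnorm`. The function name `prob_within` is hypothetical and introduced only for this illustration; the inputs are the values used in the example.

```r
# Probability that the sample mean is within delta of mu when
# Y ~ G(mu, sigma) and the sample has size n, following (4.2).
prob_within <- function(delta, sigma, n) {
  z <- delta * sqrt(n) / sigma   # upper limit for Z in (4.2)
  pnorm(z) - pnorm(-z)           # P(-z <= Z <= z) for Z ~ G(0, 1)
}

delta <- 0.01   # required closeness (meters)
sigma <- 0.07   # assumed standard deviation (meters)

p50  <- prob_within(delta, sigma, 50)
p100 <- prob_within(delta, sigma, 100)
print(round(c(n50 = p50, n100 = p100), 3))
```

Because the code does not round the limits for $Z$ to two decimal places, its results may differ very slightly from table-based values, but they should agree with the probabilities quoted above to about three decimal places.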
Example 4.2.2
In Example 4.2.1 we were able to determine the distribution of the estimator exactly, using properties of Gaussian random variables. Often we are not able to do this, and in this case we could use simulation to study the distribution$^{22}$. For example, suppose we have a random sample $y_1, \ldots, y_n$ which we have assumed comes from an Exponential($\theta$) distribution. The maximum likelihood estimate of $\theta$ is $\hat{\theta} = \bar{y}$. What is the sampling distribution for $\tilde{\theta} = \bar{Y}$? We can examine the sampling distribution by using simulation. This involves taking repeated samples, $y_1, \ldots, y_n$, giving (possibly different) values of $\bar{y}$ for each sample as follows:
1. Generate a sample of size $n$. In R this is done using the statement y <- rexp(n, 1/theta). (Note that in R the parameter is specified as the rate $1/\theta$, not the mean $\theta$.)
2. Compute $\hat{\theta} = \bar{y}$ from the sample. In R this is done using the statement ybar <- mean(y).
Repeat these two steps $k$ times. The $k$ values $\bar{y}_1, \ldots, \bar{y}_k$ can then be considered as a sample from the distribution of $\tilde{\theta}$, and we can study the distribution by plotting a histogram of the values.
The histogram in Figure 4.1 was obtained by drawing $k = 10000$ samples of size $n = 15$ from an Exponential(10) distribution, calculating the values $\bar{y}_1, \ldots, \bar{y}_{10000}$, and then plotting the relative frequency histogram. What do you notice about the distribution, particularly with respect to symmetry? Does the distribution look like a Gaussian distribution?
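The two simulation steps and the histogram can be sketched in R as follows. This is one possible way to carry out the procedure, not necessarily the code used to produce Figure 4.1; the seed, the number of histogram bins, and the use of `replicate` are assumptions made for the sketch.

```r
theta <- 10   # mean of the Exponential distribution
n <- 15       # size of each sample
k <- 10000    # number of repeated samples
set.seed(2)   # arbitrary seed so the run is reproducible

# Each replicate generates a sample (step 1) and computes its mean (step 2).
# rexp() takes the rate 1/theta, not the mean theta.
ybar <- replicate(k, mean(rexp(n, rate = 1 / theta)))

# Build the histogram, then rescale the counts to relative frequencies.
h <- hist(ybar, breaks = 25, plot = FALSE)
h$counts <- h$counts / k
plot(h, freq = TRUE, xlab = "Sample mean",
     ylab = "Relative Frequency", main = "")

# Numerical summaries of the simulated sampling distribution.
print(c(mean = mean(ybar), median = median(ybar), sd = sd(ybar)))
```

Comparing the printed mean and median, along with the shape of the plot, is one way to approach the questions posed above.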
The approach illustrated in the preceding example can be used more generally. The main idea is that, for a given estimator $\tilde{\theta}$, we need to determine its sampling distribution in order to be able to compute probabilities of the form $P(|\tilde{\theta} - \theta| \le \Delta)$ so that we can quantify the uncertainty of the estimate.
$^{22}$ This approach can also be used to study sampling from a finite population of $N$ values, $\{y_1, \ldots, y_N\}$, where we might not want to use a continuous probability distribution for $Y$.
Figure 4.1: Relative frequency histogram of means from 10000 samples of size 15 from an Exponential(10) distribution
The estimates and estimators we have discussed so far are often referred to as point estimates and point estimators. This is because they consist of a single value or "point". The discussion of sampling distributions shows how to address the uncertainty in an estimate. We also usually prefer to indicate explicitly the uncertainty in the estimate. This leads to the concept of an interval estimate$^{23}$, which takes the form
$$[L(\mathbf{y}), U(\mathbf{y})]$$
where $L(\mathbf{y})$ and $U(\mathbf{y})$ are functions of the observed data $\mathbf{y}$. Notice that this provides an interval with endpoints $L$ and $U$, both of which depend on the data. If we let $L(\mathbf{Y})$ and $U(\mathbf{Y})$ represent the associated random variables, then $[L(\mathbf{Y}), U(\mathbf{Y})]$ is a random interval. If we were to draw many random samples from the same population and each time we constructed the interval $[L(\mathbf{y}), U(\mathbf{y})]$, how often would the statement $\theta \in [L(\mathbf{y}), U(\mathbf{y})]$ be true? The probability that the parameter falls in this random interval is $P[L(\mathbf{Y}) \le \theta \le U(\mathbf{Y})]$, and hopefully this probability is large. This probability gives an indication of how good the rule is by which the interval estimate was obtained. For example, if $P[L(\mathbf{Y}) \le \theta \le U(\mathbf{Y})] = 0.95$ then 95% of the time (that is, for 95% of the different samples we might draw), the true value of the parameter falls in the interval $[L(\mathbf{y}), U(\mathbf{y})]$ constructed from the data set $\mathbf{y}$. This means we can be reasonably safe in assuming that, on this occasion and for this data set, it does so. In general, uncertainty in an estimate is explicitly stated by giving the interval estimate along with the probability $P(\theta \in [L(\mathbf{Y}), U(\mathbf{Y})])$.
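The repeated-sampling interpretation of $P[L(\mathbf{Y}) \le \theta \le U(\mathbf{Y})]$ can be checked by simulation. The sketch below assumes a particular setting that is not specified in the text: Gaussian data with known standard deviation and the interval rule $\bar{y} \pm 1.96\,\sigma/\sqrt{n}$. The values of `mu`, `sigma`, `n`, `k`, and the seed are likewise hypothetical.

```r
# Assumed setting (for illustration only): Y ~ G(mu, sigma), sigma known,
# with interval rule [ybar - 1.96*sigma/sqrt(n), ybar + 1.96*sigma/sqrt(n)].
mu <- 1.73     # hypothetical true parameter value
sigma <- 0.07  # assumed known standard deviation
n <- 50        # size of each sample
k <- 10000     # number of repeated samples
set.seed(3)    # arbitrary seed so the run is reproducible

covers <- replicate(k, {
  y <- rnorm(n, mean = mu, sd = sigma)
  half <- 1.96 * sigma / sqrt(n)   # half-width of the interval
  L <- mean(y) - half              # lower endpoint L(y)
  U <- mean(y) + half              # upper endpoint U(y)
  L <= mu && mu <= U               # does this interval contain mu?
})

# Proportion of the k intervals that contain the true value.
coverage <- mean(covers)
print(coverage)
```

Under these assumptions the proportion should be close to 0.95, which is the sense in which the probability describes the rule rather than any single interval.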