
2.4 Testing Hypotheses about Outcomes of Experiments

2.4.7 Fitting a Distribution

When testing a hypothesis using a chi-square test, we need to compute the expected distribution of sample values. These expected values may come from prior studies, as in the preceding example, or from physical considerations. In many cases, however, the expected values can be derived by assuming that the observations arise from a standard distribution, such as the Poisson, exponential, or normal distributions, and then choosing the parameters of the distribution to best match the observed values. This is called fitting a distribution to the observations. A general technique for fitting a distribution is called the method of maximum likelihood, discussed next.

Suppose that random variables $X_1, X_2, \ldots, X_n$ have a known joint density function $f_\theta(x_1, x_2, \ldots, x_n)$, where $\theta$ denotes the unknown parameters of the distribution, such as its mean and variance. Given the observations $X_i = x_i$, where $i = 1, 2, \ldots, n$, we would like to compute the maximum likelihood estimate (MLE) of $\theta$, that is, the value of $\theta$ that makes the observed data the “most likely.” Intuitively, conditional on the observations being what they are, we would like to work backward to find the value of $\theta$ that made these observations likely: we then assume that we observed what we did because the parameters were what they were.

Assuming that the $X_i$ values are independent and identically distributed according to $f_\theta(\cdot)$, the joint probability that the observation is $(x_1, x_2, \ldots, x_n)$ is simply the product of the individual probabilities, $\prod_{i=1}^{n} f_\theta(x_i)$. Note that the distribution function is parametrized by $\theta$. We make this explicit by defining $\text{likelihood}(\theta)$ as

$$\text{likelihood}(\theta) = f_\theta(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_\theta(x_i) \qquad \text{(EQ 2.27)}$$

We find the MLE by maximizing $\text{likelihood}(\theta)$ with respect to $\theta$. In practice, it is more convenient to maximize the natural logarithm of $\text{likelihood}(\cdot)$, denoted $l(\cdot)$ and defined by the equation below. Because the logarithm is monotonically increasing, both functions are maximized by the same value of $\theta$.

$$l(\theta) = \ln \text{likelihood}(\theta) = \sum_{i=1}^{n} \ln f_\theta(x_i) \qquad \text{(EQ 2.28)}$$
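To make EQ 2.27 and EQ 2.28 concrete, here is a minimal Python sketch, assuming i.i.d. integer-valued observations and using the Poisson distribution as the example probability mass function. The function names `likelihood`, `log_likelihood`, and `poisson_pmf` and the sample values are hypothetical, chosen only for illustration. The sketch also shows a practical benefit of logarithms: with many observations, the product of small probabilities underflows to zero in floating point, while the sum of logarithms stays finite.

```python
import math
from typing import Callable, Sequence

def likelihood(pmf: Callable[[int, float], float], theta: float, xs: Sequence[int]) -> float:
    """EQ 2.27: product of the individual probabilities f_theta(x_i)."""
    result = 1.0
    for x in xs:
        result *= pmf(x, theta)
    return result

def log_likelihood(pmf: Callable[[int, float], float], theta: float, xs: Sequence[int]) -> float:
    """EQ 2.28: sum of the natural logarithms of f_theta(x_i)."""
    return sum(math.log(pmf(x, theta)) for x in xs)

def poisson_pmf(x: int, lam: float) -> float:
    # Example pmf (hypothetical choice): P(X = x) = e^{-lam} lam^x / x!
    return math.exp(-lam) * lam ** x / math.factorial(x)

small = [3, 5, 4, 6, 2]  # hypothetical observations
# For a small sample, the log of the product equals the sum of the logs.
print(math.log(likelihood(poisson_pmf, 4.0, small)), log_likelihood(poisson_pmf, 4.0, small))

big = small * 200  # 1,000 hypothetical observations
print(likelihood(poisson_pmf, 4.0, big))      # the product underflows to 0.0
print(log_likelihood(poisson_pmf, 4.0, big))  # the sum of logs remains finite
```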

For example, suppose that we want to fit a Poisson distribution with parameter $\lambda$ to an observation $(x_1, x_2, \ldots, x_n)$. Recall that for a Poisson distribution,

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}.$$


If the Xi are independent and identically distributed (i.i.d.) Poisson variables, their joint probability is the product of their individual distributions, so that

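$$l(\lambda) = \ln \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \sum_{i=1}^{n} \left( x_i \ln \lambda - \lambda - \ln x_i! \right) \qquad \text{(EQ 2.29)}$$

We maximize $l(\cdot)$ by differentiating it with respect to $\lambda$ and setting the derivative to 0:

$$\frac{dl}{d\lambda} = \sum_{i=1}^{n} \left( \frac{x_i}{\lambda} - 1 \right) = 0 \qquad \text{(EQ 2.30)}$$

which yields the satisfying result

$$\lambda^{*} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad \text{(EQ 2.31)}$$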

Thus, we have found that the mean of a set of observations is the value that maximizes the probability that we obtain that particular set of observations, conditional on the observations being independent and identically distributed Poisson variables.
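As a numerical sanity check of EQ 2.31, the following sketch evaluates the Poisson log-likelihood of EQ 2.29 over a grid of candidate values of $\lambda$ for a hypothetical sample and confirms that the maximum falls at the sample mean. The grid search is purely illustrative; the closed form in EQ 2.31 makes it unnecessary in practice.

```python
import math

def poisson_log_likelihood(lam, xs):
    # EQ 2.29: l(lam) = sum_i (x_i ln lam - lam - ln x_i!)
    # math.lgamma(x + 1) equals ln(x!) without computing large factorials.
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)

xs = [3, 7, 4, 6, 5, 8, 2, 5]   # hypothetical arrival counts
mle = sum(xs) / len(xs)          # EQ 2.31: the sample mean

# Grid search over lam in [0.01, 15.00] with a step of 0.01.
grid = [i / 100 for i in range(1, 1501)]
best = max(grid, key=lambda lam: poisson_log_likelihood(lam, xs))

print(mle, best)  # 5.0 5.0
```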

Proceeding along similar lines, it is possible to show that the maximum likelihood estimators for a set of i.i.d. normal variables are

$$\mu^{*} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \sigma^{*} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_i - \mu^{*} \right)^2} \qquad \text{(EQ 2.32)}$$

Note that the MLE for the variance, $(\sigma^{*})^2$, is biased (although it is consistent); to obtain an unbiased estimator, we need to divide by n – 1 rather than n, as discussed in Section 2.2.5. Maximum likelihood estimators for other distributions can be found in standard texts on mathematical statistics.
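A minimal sketch of EQ 2.32 on a hypothetical sample, using Python's standard `statistics` module for comparison: `statistics.pstdev` divides by n and so matches the maximum likelihood estimate $\sigma^{*}$, whereas `statistics.stdev` divides by n – 1.

```python
import math
import statistics

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
n = len(xs)

mu_hat = sum(xs) / n                                            # MLE of the mean
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)   # MLE of sigma: divide by n

print(mu_hat, sigma_hat)        # 5.0 2.0
print(statistics.pstdev(xs))    # divides by n, so it agrees with sigma_hat
print(statistics.stdev(xs))     # divides by n - 1, so it is slightly larger (about 2.138)
```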

It is possible to obtain confidence intervals for maximum likelihood estimators by considering the sampling distribution of the estimated parameters. This is discussed in greater depth in more advanced texts.

Note that if we use the sample itself to estimate p parameter values of the population, we reduce the number of degrees of freedom in the sample by p. Recall that a sample that has n counts (ordinal types) has n – 1 degrees of freedom. If, in addition, p parameters are estimated to compute the expected counts, the number of degrees of freedom when conducting a chi-squared test is n – 1 – p.



EXAMPLE 2.13: FITTING A POISSON DISTRIBUTION

In an experiment, a researcher counted the number of packets arriving at a switch in each 1 ms time period. The following table shows the number of time periods with a given number of packet arrivals. For instance, there were 146 time periods that had six arrivals. The researcher expects the packet arrival process to be a Poisson process. Find the best Poisson fit for the sample, and use it to compute the expected count for each number of arrivals.

Number of packet arrivals    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
Count                       18   28   56  105  126  146  164  165  120  103   73   54   23   16    9    5

What is the value of the chi-squared variable for this data set? Determine whether the Poisson distribution adequately describes the data.

Solution:

The total number of time periods is 18 + 28 + ... + 5 = 1,211. The total number of arrivals is (18 * 1) + (28 * 2) + ... + (5 * 16) = 8,935. Therefore, the mean number of packets arriving in 1 ms is 8,935/1,211 = 7.38. By EQ 2.31, this is the best estimate for the mean of a fitted Poisson distribution. We use this to generate the probability of a given number of arrivals in each 1 ms time period. This probability multiplied by the total number of time periods is the expected count for that number of arrivals, as shown in the following table. For instance, we compute P(1) = 0.0046 and 0.0046 * 1,211 ≈ 6.

Number of packet arrivals    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
Count                       18   28   56  105  126  146  164  165  120  103   73   54   23   16    9    5
Expected count               6   21   51   93  138  170  179  165  135  100   67   41   23   12    6    3

Although at first glance the fit appears to be good, it is best to compute the chi-squared value. Using the unrounded expected counts (the table shows them rounded), we obtain

$$\frac{(18 - 6)^2}{6} + \frac{(28 - 21)^2}{21} + \cdots + \frac{(5 - 3)^2}{3} = 48.5$$

Since we estimated one parameter from the sample, the number of degrees of freedom is 16 – 1 – 1 = 14. From the chi-squared table, with 14 degrees of freedom, at the 95% confidence level, the critical value is 23.68. Therefore, we reject the hypothesis that the sample is well described by a Poisson distribution at this confidence level. That is, we have 95% confidence that this sample was not drawn from a Poisson population. The critical value at the 99.9% level for 14 degrees of freedom is 36.12. So, we can be even more confident and state that with 99.9% confidence, the sample is not drawn from a Poisson population.


At first glance, this is a surprising result, because the fit appears quite good. The reason the test fails becomes clear when we examine the individual $(n_i - e_i)^2 / e_i$ values, where $n_i$ is the observed count and $e_i$ the expected count for $i$ arrivals.

The largest value is 27.6, which is for one packet arrival, where we expected a count of 6 but observed 18. Because the denominator here is small (6), the contribution of this sample value to the chi-squared variable is disproportionate. If we were to ignore this value as an outlier and compute the fit only for 2–16 packet arrivals, the revised estimate of the distribution mean would be 7.47, and the revised chi-squared variable would be 19.98 (see Exercise 2.12). This does meet the goodness-of-fit criterion with 13 (that is, 15 – 1 – 1) degrees of freedom, even at the 95% confidence level. In cases like these, it is worthwhile to look into why there was a deviation from the Poisson process: a systematic error in the experiment or perhaps a heretofore unknown phenomenon.
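The arithmetic in Example 2.13 can be reproduced with a short Python sketch that uses only the observed counts from the table. The helper names `poisson_pmf` and `fit_and_test` are hypothetical. Following the example, expected counts are computed only over the observed range of arrivals, so the small Poisson probability mass outside that range is ignored. The critical values 23.68 and 22.36 (95% level for 14 and 13 degrees of freedom) are standard chi-squared table values, hard-coded here rather than computed.

```python
import math

# Observed data from Example 2.13: number of arrivals per 1 ms period -> number of periods.
arrivals = list(range(1, 17))
counts = [18, 28, 56, 105, 126, 146, 164, 165, 120, 103, 73, 54, 23, 16, 9, 5]

def poisson_pmf(k, lam):
    # P(X = k) = e^{-lam} lam^k / k!, evaluated in log space for numerical stability.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def fit_and_test(ks, ns):
    """Fit a Poisson mean by MLE (EQ 2.31); return (lam, expected counts, chi2, df)."""
    total = sum(ns)
    lam = sum(k * n for k, n in zip(ks, ns)) / total         # sample mean
    expected = [poisson_pmf(k, lam) * total for k in ks]     # expected counts
    chi2 = sum((n - e) ** 2 / e for n, e in zip(ns, expected))
    df = len(ks) - 1 - 1  # n - 1 - p with p = 1 estimated parameter
    return lam, expected, chi2, df

# Full data set: 16 categories.
lam, expected, chi2, df = fit_and_test(arrivals, counts)
contributions = [(n - e) ** 2 / e for n, e in zip(counts, expected)]
print(f"lambda = {lam:.2f}, chi2 = {chi2:.1f}, df = {df}")
print(f"largest term = {max(contributions):.1f}")
print("reject at 95%:", chi2 > 23.68)    # table value for 14 df

# Revised fit that treats the 1-arrival category as an outlier.
lam2, _, chi2_2, df2 = fit_and_test(arrivals[1:], counts[1:])
print(f"lambda = {lam2:.2f}, chi2 = {chi2_2:.2f}, df = {df2}")
print("reject at 95%:", chi2_2 > 22.36)  # table value for 13 df
```

Running it should reproduce, up to rounding, the figures quoted in the example: a mean of about 7.38 and a chi-squared value of about 48.5 for the full data, and a mean of about 7.47 and a chi-squared value of about 20 for the revised fit.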

2.4.8 Power

Recall that when we test a hypothesis, we determine the probability of obtaining an observed outcome conditional on the null hypothesis being true. If the outcome is less probable than the significance level, such as 5% or 1%, we reject the null hypothesis. Of course, the hypothesis could still be true. Nevertheless, this procedure limits the probability of a Type I error, that of rejecting a hypothesis when it is in fact true, to at most the significance level.
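The claim that the significance level bounds the rate of Type I errors can be checked by simulation. The sketch below assumes a hypothetical setting that does not come from the text: a one-sided z-test of the null hypothesis that normally distributed data with known standard deviation 1 have mean 0. It repeatedly draws samples for which the null hypothesis is true and counts how often the test rejects it at the 5% level; the fraction should come out close to 0.05.

```python
import math
import random
from statistics import NormalDist

def z_test_rejects(sample, mu0, sigma, alpha):
    """Hypothetical one-sided z-test of H0: mean == mu0 vs H1: mean > mu0, sigma known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return z > critical

random.seed(1)  # fixed seed so that runs are reproducible
trials, n, alpha = 20_000, 25, 0.05
# Every sample is drawn from N(0, 1), so the null hypothesis is true by construction.
rejections = sum(
    z_test_rejects([random.gauss(0.0, 1.0) for _ in range(n)], 0.0, 1.0, alpha)
    for _ in range(trials)
)
print(f"observed Type I error rate: {rejections / trials:.3f}")
```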

We now discuss a related concept: the power of a test. The power of a statistical test is the probability that the test will reject a null hypothesis when it is in fact false. If the power is low, we may fail to reject a null hypothesis even when it is false, a Type II error. Thus, the greater the power, the lower the chance of making a Type II error. For a given sample size, usually the only way to increase the power of a test is to increase its significance level, which makes a Type I error more likely.

The practical difficulty in computing the power of a test is that we don’t know the ground truth. So, it becomes impossible to compute the probability that we will reject the null hypothesis conditional on the ground truth being different from the null hypothesis. For instance, suppose that the ground truth differs infinitesimally from the null hypothesis. Then, the probability that we reject the null hypothesis, which is false, is essentially the same as the significance level. In contrast, suppose that the ground truth is far from the null hypothesis. Then, the sample mean is likely to be near the ground truth, and we are likely to reject the null hypothesis, increasing the power of the test. But we have no way of knowing which of these situations holds. Therefore, we can precisely compute the power of a test only in the context of an alternative hypothesis about the state of the world. Unfortunately, in many cases, this is impossible to determine. Therefore, despite its intuitive merit, the power of a test is rarely computed.
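Once a specific alternative hypothesis is assumed, power does have a closed form. For a hypothetical one-sided z-test of $H_0: \mu = \mu_0$ against $\mu > \mu_0$ with known $\sigma$ and sample size n, the power when the true mean is $\mu_1$ is $1 - \Phi\left(z_{1-\alpha} - (\mu_1 - \mu_0)\sqrt{n}/\sigma\right)$, where $\Phi$ is the standard normal CDF. This test and the parameter values below are illustrative assumptions, not taken from the text. The sketch shows both cases discussed above (power approaches the significance level as $\mu_1$ approaches $\mu_0$, and approaches 1 as $\mu_1$ moves far away) and also that a larger significance level yields more power.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of a hypothetical one-sided z-test (H0: mean == mu0, H1: mean > mu0),
    evaluated at a specific alternative true mean mu1, with sigma known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) * math.sqrt(n) / sigma  # mean of the z statistic when the true mean is mu1
    return 1 - std_normal.cdf(z_crit - shift)

# Alternatives ranging from "infinitesimally different" to "far from" the null mean 0.
for mu1 in (0.0, 0.001, 0.2, 0.5, 1.0):
    print(f"mu1 = {mu1:<5}  power = {power_one_sided_z(0.0, mu1, 1.0, 25, 0.05):.3f}")

# Same alternative and sample size, two significance levels.
print(power_one_sided_z(0.0, 0.3, 1.0, 25, 0.01), power_one_sided_z(0.0, 0.3, 1.0, 25, 0.05))
```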


2.5 Independence and Dependence: Regression