Evaluate Performance of the Sampling Design

Step 5: Draw Conclusions from the Data

5.1.3 Evaluate Performance of the Sampling Design

If the sampling design is to be used again, either in a later phase of the current study or in a similar study, the analyst will be interested in evaluating the overall performance of the design. To evaluate the sampling design, the analyst performs a statistical power analysis that describes the estimated power of the statistical test over the range of possible parameter values. The power of a statistical test is the probability of rejecting the null hypothesis when the null hypothesis is false. The estimated power is computed for all parameter values under the alternative hypothesis to create a power curve. A power analysis helps the analyst evaluate the adequacy of the sampling design when the true parameter value lies in the vicinity of the action level (which may not have been the outcome of the current study). In this manner, the analyst may determine how well a statistical test performed and compare this performance with that of other tests.

The calculations required to perform a power analysis can be relatively complicated, depending on the complexity of the sampling design and statistical test selected. Box 5-2 illustrates power calculations for a test of a single proportion, which is one of the simpler cases. A further discussion of power curves (performance curves) is contained in the Guidance for Data Quality Objectives (QA/G-4) (EPA 1994).

Box 5-2: Example of Power Calculations for the One-Sample Test of a Single Proportion

This box illustrates power calculations for the test of H0: P ≥ .20 vs. HA: P < .20, with a false rejection error rate of 5% when P = .20, presented in Boxes 3-10 and 3-11. The power of the test will be calculated assuming P1 = .15 and before any data are available. Since nP1 and n(1-P1) both exceed 4, the sample size is large enough for the normal approximation, and the test can be carried out as in steps 3 and 4 of Box 3-10.

STEP 1: Determine the general conditions for rejection of the null hypothesis. In this case, the null hypothesis is rejected if the sample proportion is sufficiently smaller than P0. (Clearly, a sample proportion above P0 cannot cast doubt on H0.) By steps 3 and 4 of Box 3-10 and 3-3, H0 is rejected if

$$\frac{p + 0.5/n - P_0}{\sqrt{P_0 Q_0 / n}} < -z_{1-\alpha}.$$

Here p is the sample proportion, Q0 = 1 - P0, n is the sample size, and $z_{1-\alpha}$ is the critical value such that 100(1-α)% of the standard normal distribution is below $z_{1-\alpha}$. This inequality is true if

$$p + 0.5/n < P_0 - z_{1-\alpha}\sqrt{P_0 Q_0 / n}.$$

STEP 2: Determine the specific conditions for rejection of the null hypothesis if P1 (= 1 - Q1) is the true value of the proportion P. The same operations as are used in step 3 of Box 3-10 are performed on both sides of the above inequality. However, P0 is replaced by P1 since it is assumed that P1 is the true proportion. These operations make the normal approximation applicable. Hence, rejection occurs if

$$\frac{p + 0.5/n - P_1}{\sqrt{P_1 Q_1 / n}} < \frac{P_0 - P_1 - z_{1-\alpha}\sqrt{P_0 Q_0 / n}}{\sqrt{P_1 Q_1 / n}} = \frac{.20 - .15 - 1.645\sqrt{(.2)(.8)/85}}{\sqrt{(.15)(.85)/85}} = -0.55.$$

STEP 3: Find the probability of rejection if P1 is the true proportion. By the same reasoning that led to the test in steps 3 and 4 of Boxes 3-10 and 3-11, the quantity on the left-hand side of the above inequality is a standard normal variable. Hence the power at P1 = .15 (i.e., the probability of rejection of H0 when .15 is the true proportion) is the probability that a standard normal variable is less than -0.55. In this case, the probability is approximately 0.3 (using the last line from Table A-1 of Appendix A), which is fairly small.
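The three steps of Box 5-2 can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the guidance: the function name `power_one_sample_proportion` is hypothetical, the defaults (P0 = .20, n = 85, α = .05) are taken from the box, and the standard normal distribution from the Python standard library stands in for Table A-1. Evaluating the function over several values of P1 traces out points on the power curve described in Section 5.1.3; the values of P1 other than .15 are chosen here purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sample_proportion(p1, p0=0.20, n=85, alpha=0.05):
    """Approximate power of the test of H0: P >= p0 vs. HA: P < p0
    when the true proportion is p1 (normal approximation, Box 5-2)."""
    std_normal = NormalDist()                 # standard normal, mean 0, sd 1
    z = std_normal.inv_cdf(1 - alpha)         # critical value z_{1-alpha}
    se0 = sqrt(p0 * (1 - p0) / n)             # sqrt(P0*Q0/n)
    se1 = sqrt(p1 * (1 - p1) / n)             # sqrt(P1*Q1/n)
    bound = (p0 - p1 - z * se0) / se1         # right-hand side in Step 2
    return std_normal.cdf(bound)              # Step 3: P(Z < bound)


# Power at the value used in Box 5-2 (about 0.29, i.e., roughly 0.3).
print(round(power_one_sample_proportion(0.15), 2))

# A few points on the power curve (illustrative P1 values).
for p1 in (0.05, 0.10, 0.15, 0.19):
    print(p1, round(power_one_sample_proportion(p1), 3))
```

As expected for a lower-tailed test, the computed power rises as P1 moves further below P0 and falls toward the false rejection error rate as P1 approaches P0.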

5.2 INTERPRETING AND COMMUNICATING THE TEST RESULTS

Sometimes difficulties may arise in interpreting or explaining the results of a statistical test. One reason for such difficulties may stem from inconsistencies in terminology; another may be due to a lack of understanding of some of the basic notions underlying hypothesis tests. As an example, in explaining the results to a data user, an analyst may use different terminology than that appearing in this guidance. For instance, rather than saying that the null hypothesis was or was not rejected, analysts may report the result of a test by saying that their computer output shows a p-value of 0.12. What does this mean? Similar problems of interpretation may occur when the data user attempts to understand the practical significance of the test results or to explain the test results to others. The following paragraphs touch on some of the philosophical issues related to hypothesis testing which may help in understanding and communicating the test results.

EPA QA/G-9 Final

QA00 Version 5 - 7 July 2000

5.2.1 Interpretation of p-Values

The classical approach for hypothesis tests is to prespecify the significance level of the test, i.e., the Type I decision error rate α. This rate is used to define the decision rule associated with the hypothesis test. For instance, in testing whether the population mean µ exceeds a threshold level (e.g., 100 ppm), the test statistic may depend on X̄, an estimate of µ. Obtaining an estimate X̄ that is greater than 100 ppm may occur simply by chance even if the true mean µ is less than or equal to 100; however, if X̄ is "much larger" than 100 ppm, then there is only a small chance that the null hypothesis H₀ (µ ≤ 100 ppm) is true. Hence the decision rule might take the form "reject H₀ if X̄ exceeds 100 + C", where C is a positive quantity that depends on α (and on the variability of X̄). If this condition is met, then the result of the statistical test is reported as "reject H₀"; otherwise, the result is reported as "do not reject H₀."

An alternative way of reporting the result of a statistical test is to report its p-value, which is defined as the probability, assuming the null hypothesis to be true, of observing a test result at least as extreme as that found in the sample. Many statistical software packages report p-values, rather than adopting the classical approach of using a prespecified false rejection error rate. In the above example, for instance, the p-value would be the probability of observing a sample mean as large as X̄ (or larger) if in fact the true mean was equal to 100 ppm. Obviously, in making a decision based on the p-value, one should reject H₀ when p is small and not reject it if p is large. Thus the relationship between p-values and the classical hypothesis testing approach is that one rejects H₀ if the p-value associated with the test result is less than α. If the data user had chosen the false rejection error rate as 0.05 a priori and the analyst reported a p-value of 0.12, then the data user would report the result as "do not reject the null hypothesis"; if the p-value had been reported as 0.03, then that person would report the result as "reject the null hypothesis." An advantage of reporting p-values is that they provide a measure of the strength of evidence for or against the null hypothesis, which allows data users to establish their own false rejection error rates. The significance level can be interpreted as that p-value (α) that divides "do not reject H₀" from "reject H₀."
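The link between a reported p-value and the classical decision rule can be illustrated with a short Python sketch. Everything numerical here is hypothetical and chosen only so that the resulting p-values land near the 0.12 and 0.03 mentioned above: the sample means, the standard deviation of 20 ppm, and the sample size of 25 are not from the guidance. The sketch also assumes a one-sided z-test with a known standard deviation, which is a simplification; the function names `upper_tail_p_value` and `decision` are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(sample_mean, mu0, sigma, n):
    """P-value for H0: mu <= mu0 vs. HA: mu > mu0, i.e., the probability
    of a sample mean at least this large if the true mean equals mu0.
    Assumes a known standard deviation sigma (simple z-test)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)


def decision(p_value, alpha=0.05):
    """Classical decision rule: reject H0 if the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Hypothetical data: threshold 100 ppm, sigma = 20 ppm, n = 25 samples.
for sample_mean in (104.7, 107.5):
    p = upper_tail_p_value(sample_mean, mu0=100.0, sigma=20.0, n=25)
    print(sample_mean, round(p, 2), decision(p, alpha=0.05))
```

A data user who prefers a different false rejection error rate can pass another value of `alpha` to `decision` without recomputing the p-value, which is the practical advantage of reporting p-values noted above.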

5.2.2 "Accepting" vs. "Failing to Reject" the Null Hypothesis

As noted in the paragraphs above, the classical approach to hypothesis testing results in one of two conclusions: "reject H₀" (called a significant result) or "do not reject H₀" (a nonsignificant result). In the latter case one might be tempted to equate "do not reject H₀" with "accept H₀." This terminology is not recommended, however, because of the philosophy underlying the classical testing procedure. This philosophy places the burden of proof on the alternative hypothesis, that is, the null hypothesis is rejected only if the evidence furnished by the data convinces us that the alternative hypothesis is the more likely state of nature. If a nonsignificant result is obtained, it provides evidence that the null hypothesis could sufficiently account for the observed data, but it does not imply that the hypothesis is the only hypothesis that could be supported by the data. In other words, a highly nonsignificant result (e.g., a p-value of 0.80) may indicate that the null hypothesis provides a reasonable model for explaining the data, but it does not necessarily imply that the null hypothesis is true. It may, for example, simply indicate that the sample size was not large enough to establish convincingly that the alternative hypothesis was more likely. When the phrase "accept H₀" is encountered, it must be considered as "accepted with the preceding caveats."
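The role of sample size in a nonsignificant result can be made concrete with the one-sample proportion test of Box 5-2 (H0: P ≥ .20, true proportion .15, α = .05). The following Python sketch, offered only as an illustration under the normal approximation, computes the probability of a "do not reject H₀" outcome even though the null hypothesis is false. The sample size of 85 comes from Box 5-2; the larger sample sizes and the function name `prob_do_not_reject` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_do_not_reject(n, p_true=0.15, p0=0.20, alpha=0.05):
    """Probability of failing to reject H0: P >= p0 when the true
    proportion is p_true (one minus the approximate power)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha)                 # critical value
    se0 = sqrt(p0 * (1 - p0) / n)                     # spread under H0
    se1 = sqrt(p_true * (1 - p_true) / n)             # spread under p_true
    power = std_normal.cdf((p0 - p_true - z * se0) / se1)
    return 1 - power


# n = 85 is the Box 5-2 sample size; the others are illustrative.
for n in (85, 200, 500, 1000):
    print(n, round(prob_do_not_reject(n), 2))
```

With n = 85, a nonsignificant result is the more likely outcome (roughly 7 times in 10) although H₀ is false in this scenario, which is why "do not reject H₀" should not be read as "H₀ is true."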

In document Quality. Guidance for Data Quality Assessment. Practical Methods for Data Analysis EPA QA/G-9 QA00 UPDATE. EPA/600/R-96/084 July, 2000 (Page 171-174)