Step 5: Draw Conclusions from the Data
5.2.5 Quantity vs. Quality of Data
The above conclusions imply that, if compensation for bias cannot be made and if statistically based decisions are to be made, then there will be situations in which serious consideration should be given to using an imprecise (and perhaps relatively inexpensive) chemical method having negligible bias rather than a very precise method that has even a moderate degree of bias. The tradeoff favoring the imprecise method is especially relevant when the inherent variability in the population is very large relative to the random measurement error.
For example, suppose a mean concentration for a given spatial area (site) is of interest and that the coefficient of variation (CV) characterizing the site's variability is 100%. Let Method A denote an imprecise method, with measurement-error CV of 40%, and let Method B denote a highly precise method, with measurement-error CV of 5%. The overall variability, or total variability, can essentially be regarded as the sum of the spatial variability and the measurement variability. These are obtained from the individual CVs in the form of variances. As CV equals standard deviation divided by mean, it follows that the site standard deviation is then the CV times the mean. Thus, for the site, the variance is $1.00^2 \times \text{mean}^2$; for Method A, the variance is $0.40^2 \times \text{mean}^2$; and for Method B, the variance is $0.05^2 \times \text{mean}^2$. The overall variance when using Method A is then $(1.00^2 \times \text{mean}^2) + (0.40^2 \times \text{mean}^2) = 1.16 \times \text{mean}^2$, and when using Method B it is $(1.00^2 \times \text{mean}^2) + (0.05^2 \times \text{mean}^2) = 1.0025 \times \text{mean}^2$. It follows that the overall CV when using each method is then $(1.077 \times \text{mean}) / \text{mean} = 107.7\%$ for Method A, and $(1.001 \times \text{mean}) / \text{mean} = 100.1\%$ for Method B.
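The variance addition above can be checked with a short calculation. The following is a minimal sketch, assuming (as in the text) that site and measurement variability are independent and that both CVs are expressed relative to the same mean; the function name `overall_cv` is illustrative only.

```python
import math


def overall_cv(site_cv: float, measurement_cv: float) -> float:
    """Combine two CVs by adding their variances (both relative to the same mean).

    Assumes the two sources of variability are independent, so the
    variances add and the mean cancels out of the ratio.
    """
    return math.sqrt(site_cv**2 + measurement_cv**2)


# CVs from the example: site 100%, Method A 40%, Method B 5%.
cv_a = overall_cv(1.00, 0.40)
cv_b = overall_cv(1.00, 0.05)

print(f"Method A overall CV: {cv_a:.1%}")  # 107.7%
print(f"Method B overall CV: {cv_b:.1%}")  # 100.1%
```

Because the site CV dominates, cutting the measurement-error CV from 40% to 5% reduces the overall CV by only about 7.6 percentage points.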
Now consider a sample of 25 specimens from the site. The precision of the sample mean can then be characterized by the relative standard error (RSE) of the mean (which for the simple random sample situation is simply the overall CV divided by the square root of the sample size). For Method A, RSE = 21.54%; for Method B, RSE = 20.02%. Now suppose that the imprecise method (Method A) is unbiased, while the precise method (Method B) has a 10% bias (e.g., an analyte percent recovery of 90%). An overall measure of error that reflects how well the sample mean estimates the site mean is the relative root mean squared error (RRMSE):
$$\text{RRMSE} = \sqrt{(\text{RB})^2 + (\text{RSE})^2}$$
where RB denotes the relative bias (RB = 0 for Method A since it is unbiased and RB = ±10% for Method B since it is biased) and RSE is as defined above. The overall error in the estimation of the population mean (the RRMSE) would then be 21.54% for Method A and 22.38% for Method B. If the relative bias for Method B were 15% rather than 10%, then the RRMSE for Method A would remain 21.54% and the RRMSE for Method B would be 25.02%, so the method difference is even more pronounced. While the above illustration is portrayed in terms of estimation of a mean based on a simple random sample, the basic concepts apply more generally.
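The RSE and RRMSE figures above can be reproduced as follows. This is a minimal sketch under the same assumptions as the text (simple random sample of 25 specimens, independent site and measurement variability); the function names are illustrative. The last step, which solves for the bias at which the two methods would tie, is an added calculation not given in the original.

```python
import math


def rse(site_cv: float, measurement_cv: float, n: int) -> float:
    """Relative standard error of the mean: overall CV / sqrt(n)."""
    return math.sqrt(site_cv**2 + measurement_cv**2) / math.sqrt(n)


def rrmse(relative_bias: float, rse_value: float) -> float:
    """Relative root mean squared error: sqrt(RB^2 + RSE^2)."""
    return math.sqrt(relative_bias**2 + rse_value**2)


n = 25
rse_a = rse(1.00, 0.40, n)  # Method A: imprecise, unbiased
rse_b = rse(1.00, 0.05, n)  # Method B: precise, biased

rrmse_a = rrmse(0.00, rse_a)
rrmse_b_10 = rrmse(0.10, rse_b)
rrmse_b_15 = rrmse(0.15, rse_b)

print(f"RSE    A: {rse_a:.2%}   B: {rse_b:.2%}")
print(f"RRMSE  A: {rrmse_a:.2%}   B (10% bias): {rrmse_b_10:.2%}   "
      f"B (15% bias): {rrmse_b_15:.2%}")

# Added illustration: the bias in Method B at which both methods have equal RRMSE.
break_even_bias = math.sqrt(rse_a**2 - rse_b**2)
print(f"Break-even relative bias for Method B: {break_even_bias:.1%}")
```

Under these assumptions the break-even point is a relative bias of roughly 8%: below that, the precise Method B gives the smaller overall error, and above it the unbiased Method A does.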
This example serves to illustrate that a method that may be considered preferable from a chemical point of view [e.g., 85 or 90% recovery, 5% relative standard deviation (RSD)] may not perform as well in a statistical application as a method with less bias and greater imprecision (e.g., zero bias, 40% RSD), especially when the inherent site variability is large relative to the
measurement-error RSD.
5.2.6 "Proof of Safety" vs. "Proof of Hazard"
Because of the basic hypothesis testing philosophy, the null hypothesis is generally specified in terms of the status quo (e.g., no change or action will take place if the null hypothesis is not rejected). Also, since the classical approach exercises direct control over the false rejection error rate, this rate is generally associated with the error of most concern (for further discussion of this point, see Section 1.2). One difficulty, therefore, may be obtaining a consensus on which error should be of most concern. It is quite possible that the Agency's viewpoint in this regard will differ from the viewpoint of the regulated party. In using this philosophy, the Agency's ideal approach is not only to set up the direction of the hypothesis in such a way that controlling the
EPA QA/G-9 Final
QA00 Version 5 - 13 July 2000
false rejection error protects the health and environment but also to set it up in a way that encourages quality (high precision and accuracy) and minimizes expenditure of resources in situations where decisions are relatively "easy" (e.g., all observations are far from the threshold level of interest).
In some cases, how one formulates the hypothesis testing problem can lead to very different sampling requirements. For instance, following remediation activities at a hazardous waste site, one may seek to answer "Is the site clean?" Suppose one attempts to address this question by comparing a mean level from samples taken after the remediation with a threshold level (chosen to reflect "safety"). If the threshold level is near background levels that might have existed in the absence of the contamination, then it may be very difficult (i.e., require enormous sample sizes) to "prove" that the site is "safe." This is because the concentrations resulting from even a highly efficient remediation under such circumstances would not be expected to deviate greatly from such a threshold. A better approach for dealing with this problem may be to compare the remediated site with a reference ("uncontaminated") site, assuming that such a site can be determined.
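To see why a threshold close to the expected post-remediation level can demand enormous sample sizes, consider the following minimal sketch. It uses the textbook normal-approximation sample size for a one-sided, one-sample test of a mean, n = ((z_{1-α} + z_{1-β}) σ / Δ)^2, which is an assumption introduced here and not a formula from this section. The standard deviation, the error rates, and the differences Δ between the threshold and the true mean are all hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size(delta: float, sigma: float = 1.0,
                alpha: float = 0.05, beta: float = 0.20) -> int:
    """Approximate n for a one-sided, one-sample z-test of a mean.

    delta: difference between the threshold and the true mean that
           the test should detect (same units as sigma).
    sigma: total standard deviation of a single measurement.
    alpha: false rejection rate; beta: false acceptance rate.
    """
    z = NormalDist()  # standard normal distribution
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)
    return math.ceil((z_sum * sigma / delta) ** 2)


# Hypothetical differences, in units of sigma, shrinking toward the threshold.
for delta in (0.5, 0.2, 0.05):
    print(f"delta = {delta:>4}: n is about {sample_size(delta)}")
```

Because n scales with 1/Δ^2, shrinking the detectable difference by a factor of ten multiplies the required sample size by roughly one hundred.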
To avoid excessive expense in collecting and analyzing samples for a contaminant,
compromises will sometimes be necessary. For instance, suppose that a significance level of 0.05 is to be used; however, the affordable sample size may be expected to yield a test with power of only 0.40 at some specified parameter value chosen to have practical significance (see Section 5.2.3). One possible way that compromise may be made in such a situation is to relax the
significance level, for instance, using α = 0.10, 0.15, or 0.20. By relaxing this false rejection rate, a higher power (i.e., a lower false acceptance rate β) can be achieved. An argument can be made, for example, that one should develop sampling plans and determine sample sizes in such a way that both the false rejection and false acceptance errors are treated simultaneously and in a balanced manner (for example, designing to achieve α = β = 0.15) instead of using the traditional approach of fixing the false rejection error rate at 0.05 or 0.01 and letting β be determined by the sample size. This approach of treating the false rejection and false acceptance errors simultaneously is taken in the DQO Process, and it is recommended that several different scenarios of α and β be investigated before specific values for α and β are selected.
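The effect of relaxing the significance level at a fixed sample size can be sketched numerically. The example below assumes a one-sided test whose statistic is approximately normal, which is an assumption made here for illustration and not a requirement stated in this section. It takes the scenario from the text (significance level 0.05 giving power 0.40), backs out the implied standardized shift, and then recomputes the power for larger false rejection rates.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def shift_from_power(alpha: float, power: float) -> float:
    """Standardized shift (delta * sqrt(n) / sigma) implied by alpha and power."""
    return STD_NORMAL.inv_cdf(1 - alpha) + STD_NORMAL.inv_cdf(power)


def power_at(alpha: float, shift: float) -> float:
    """Power of a one-sided z-test at the given alpha and standardized shift."""
    return STD_NORMAL.cdf(shift - STD_NORMAL.inv_cdf(1 - alpha))


# Scenario from the text: alpha = 0.05 yields power 0.40 at the affordable n.
shift = shift_from_power(0.05, 0.40)

for alpha in (0.05, 0.10, 0.15, 0.20):
    beta = 1 - power_at(alpha, shift)
    print(f"alpha = {alpha:.2f}  power = {1 - beta:.2f}  beta = {beta:.2f}")

# Balanced design at this sample size: alpha = beta when the critical
# value sits halfway along the standardized shift.
balanced_alpha = 1 - STD_NORMAL.cdf(shift / 2)
print(f"balanced alpha = beta = {balanced_alpha:.3f}")
```

Under these assumptions, power rises from 0.40 to roughly 0.7 as the false rejection rate is relaxed from 0.05 to 0.20, and the two error rates balance at about 0.24. Achieving a balanced design at 0.15 would therefore require a larger sample than the one assumed here.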
APPENDIX A