4. STATISTICAL METHODS
4.3 Test for Normal Distribution of Data:
Chi-Square Goodness of Fit Test
A statistical procedure to test for normal distribution of data is the Chi-Square Goodness of Fit Test. This test compares the observed sample distribution with a normal distribution. The normal distribution is described by the well-known bell-shaped curve. The parameters "u" (the mean) and "s" (the standard deviation) characterize the center and the spread of the
distribution, respectively. An important property of any normal distribution is its symmetry, which is helpful when using tables to determine probabilities or percentiles of the normal distribution. A description of the Chi-Square Goodness of Fit Test from Appendix A of EPA's December 1985 report Short-Term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms (Horning and Weber,
1985) is given below.
An example of this test method is provided at the end of this chapter (Table 4.2, OWASA #1 Statistical Spreadsheet Example Calculations). The first step of the Chi-Square Goodness of Fit
Test is to standardize the observations (i.e., the number of neonates
produced) by subtracting the mean number of neonates produced
from each observation and dividing the difference by the standard deviation. In this example, the number of neonates produced by
control replicate #1 is 20, the mean number of control neonates
produced is 14.67, and the standard deviation is 4.44.
Therefore, the standardized observation for control replicate #1 is 1.20. Likewise, the standardized observations for control
replicates #2, #3, and #4 are -0.60, 0.75, and 0.53,
respectively. In a similar manner, the observations must be
standardized for the test replicates (i.e., 97.6% effluent). For
example, the number of neonates produced by test replicate #1 is
11, the mean number of test neonates produced is 11.75, and the standard deviation is 3.28. Therefore, the standardized
observation for test replicate #1 is -0.23.
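The standardization step can be illustrated with a minimal Python sketch. The function name `standardize` is hypothetical and is not part of the EPA procedure. The means and standard deviations are taken as given from the text rather than recomputed, because only some of the replicate counts are quoted here.

```python
def standardize(observation, mean, std_dev):
    """Return the standardized observation: (observation - mean) / std_dev."""
    return (observation - mean) / std_dev


# Values quoted in the text for control replicate #1 and test replicate #1.
control_z1 = standardize(20, 14.67, 4.44)
test_z1 = standardize(11, 11.75, 3.28)

print(f"{control_z1:.2f}")  # 1.20
print(f"{test_z1:.2f}")     # -0.23
```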
Once the control and the test replicates have been
standardized, a table is constructed consisting of five cells as follows: < -1.5; -1.5 to < -0.5; -0.5 to 0.5; > 0.5 to 1.5; and
> 1.5. The number of standardized observations which fall into each of the five cells is tabulated. These are the observed
frequencies, "fi". The expected frequency, "Fi", is found by multiplying the area under the standard normal curve over the
"ith" cell limits by the total number of standardized
observations, "N". For this example, N=24 (12 replicates for the
control and 12 replicates for the test sample). The areas for each cell, the observed frequencies, and the expected frequencies
are shown in the table. The Chi-Square Goodness of Fit Test
statistic, "X2", is calculated as follows:
$$ X^2 = \sum_{i=1}^{5} \frac{(f_i - F_i)^2}{F_i} $$
For the data in this example, the calculated X2 value is:
$$ X^2 = \frac{(1 - 1.608)^2}{1.608} + \frac{(8 - 5.808)^2}{5.808} + \frac{(5 - 9.168)^2}{9.168} + \frac{(9 - 5.808)^2}{5.808} + \frac{(1 - 1.608)^2}{1.608} = 4.94 $$
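The tabulation and the statistic can be sketched in Python as follows. The function names are hypothetical, and the treatment of values that fall exactly on a cell boundary is an assumption based on the cell limits listed above. The cell areas are computed here from the standard normal cumulative distribution, so they may differ in the third decimal place from rounded table areas; to reproduce the worked example, the observed and expected frequencies quoted in the text are used directly.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cell_index(z):
    """Map a standardized observation to one of the five cells (0 to 4)."""
    if z < -1.5:
        return 0
    if z < -0.5:
        return 1
    if z <= 0.5:
        return 2
    if z <= 1.5:
        return 3
    return 4


def observed_frequencies(z_values):
    """Count how many standardized observations fall into each cell."""
    counts = [0] * 5
    for z in z_values:
        counts[cell_index(z)] += 1
    return counts


def expected_frequencies(n):
    """Area under the standard normal curve over each cell, times n."""
    limits = [-math.inf, -1.5, -0.5, 0.5, 1.5, math.inf]
    return [n * (normal_cdf(hi) - normal_cdf(lo))
            for lo, hi in zip(limits, limits[1:])]


def chi_square(observed, expected):
    """Sum of (f_i - F_i)^2 / F_i over all cells."""
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected))


# Observed and expected frequencies quoted in the worked example (N = 24).
observed = [1, 8, 5, 9, 1]
expected = [1.608, 5.808, 9.168, 5.808, 1.608]
x2 = chi_square(observed, expected)
print(f"{x2:.2f}")  # 4.94
```

Calling `expected_frequencies(24)` gives values close to, but not identical with, the expected frequencies in the text, because the text works from areas rounded to three decimal places.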
The decision rule for this test is to compare the critical X2 value, with four degrees of freedom (number of cells - 1) at a significance level of 0.01 (99% confidence level), to the
calculated X2 value. If the calculated value exceeds the critical value, conclude that the data are not normally distributed. For this example, the critical value is 13.28
(Appendix C, Table C.1). The calculated value, 4.94, does not exceed the critical value. Thus, the conclusion of the test is that the data are normally distributed.
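The decision rule reduces to a single comparison, sketched below in Python. The function name is hypothetical, and the critical value is taken from the text rather than computed.

```python
def chi_square_rejects_normality(calculated_x2, critical_x2):
    """Return True when the calculated value exceeds the critical value."""
    return calculated_x2 > critical_x2


number_of_cells = 5
degrees_of_freedom = number_of_cells - 1  # four, as used in the text
critical_x2 = 13.28                       # 0.01 level, value quoted in the text

not_normal = chi_square_rejects_normality(4.94, critical_x2)
print(not_normal)  # False
```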
EPA suggests that if the data fail the test for normality, a transformation, such as to log values, may normalize the data.
After transforming the data, the Chi-Square Goodness of Fit Test should be repeated to check for normality. However, based on discussions with Ken Eagleson, Larry Ausley, and Steve Mistele of the North Carolina Division of Environmental Management (NCDEM), data that fail the Chi-Square Goodness of Fit Test need not be transformed if the non-parametric Wilcoxon Rank Sum Test is used to determine significant difference in
reproduction (Mistele, 1989).
4.4 Test for Normal Distribution of Data:
Shapiro-Wilk's Test
In March 1989, the United States Environmental Protection Agency (USEPA) revised their December 1985 guidance document
entitled Short-Term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms. EPA
still recommends Bartlett's Test for testing homogeneity of variance, Dunnett's Test for testing significant difference in
reproduction when the normality and homogeneity of variance assumptions are met, Wilcoxon Rank Sum Test for testing
significant difference in reproduction when the normality and/or homogeneity of variance assumptions are violated, and the
Fisher's Exact Test for testing significant difference in
mortality. However, the USEPA now recommends that the Shapiro-Wilk's
Test be used instead of the Chi-Square Goodness of Fit Test for testing normal distribution of data. The March 1989 guidance document reports that the Shapiro-Wilk's Test is a more robust test when the sample size (i.e., the number of
observations) is fifty or less. A description of the Shapiro-Wilk's
Test from Appendix B of EPA's March 1989 report Short-Term
Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms (Weber et al., 1989) is
given below.
An example of this test method is provided at the end of
this chapter (Table 4.3, OWASA #19 Statistical Spreadsheet Example
Calculations). The first step of the Shapiro-Wilk's Test is to
center the observations by subtracting the mean of all
observations within a sample from each observation in that sample. In this example, the number of neonates produced by control replicate #1 is 26 and the mean number of control
neonates is 21.58. Therefore, the centered observation for
control replicate #1 is 4.42. Likewise, the centered
observations for control replicates #2, #3, and #4 are 3.42,
4.42, and -6.58, respectively. In a similar manner, the
observations are also centered for the test replicates (i.e., 93% effluent). In this example, the number of neonates produced by test replicate #1 is 34 and the mean number of neonates produced
is 33.92. Therefore, the centered observation for test replicate
#1 is 0.08.
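Centering can be sketched in Python as shown below. The function name `center` is hypothetical. The sample means are taken as given from the text, and the counts for control replicates #2 to #4 are back-calculated from the centered values quoted above, since each full sample has 12 replicates that are not all listed here. Rounding to two decimals is only for display.

```python
def center(observations, sample_mean):
    """Subtract the sample mean from each observation (rounded for display)."""
    return [round(x - sample_mean, 2) for x in observations]


# Only the replicates quoted in the text; each full sample has 12 replicates.
control_centered = center([26, 25, 26, 15], 21.58)
test_centered = center([34], 33.92)
print(control_centered)  # [4.42, 3.42, 4.42, -6.58]
print(test_centered)     # [0.08]

# Pool both samples and order from smallest to largest for tabulation.
ordered = sorted(control_centered + test_centered)
print(ordered)  # [-6.58, 0.08, 3.42, 4.42, 4.42]
```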
Once the control and test replicates have been centered, the centered observations are tabulated from smallest to largest. The constructed table is shown on the second page of Table 4.3, OWASA #19 Statistical Spreadsheet Example Calculations, where
"X(i)" denotes the ith centered observation.
Continuing on page two of Table 4.3, a second table is constructed in which the Shapiro-Wilk's coefficients (i.e.,
"ai"), the centered observation differences (i.e.,
"X(n-i+1) - X(i)"), and the product (i.e., "Product") of the Shapiro-Wilk's coefficient multiplied by its respective centered
observation difference are tabulated. Shapiro-Wilk's
coefficients (a1, a2, a3, ..., ak, where k is approximately n/2) are obtained from Table C.2 in Appendix C by knowing the number of observations, n. For the data in this example, n=24 and k=12. Therefore, the coefficients a1, a2, and a3 are 0.4493,
0.3098, and 0.2554. The first centered observation difference,
X(24) - X(1), corresponds to 7.08 - (-11.58) which is equal to
18.66. Likewise, the second, third, and fourth centered
observation differences are 15.00, 14.00, and 11.00,
respectively. Therefore, the "Product" of the first value is
simply "a1" multiplied by [X(24) - X(1)] which corresponds to
0.4493 * 18.66 which is equal to 8.38. Likewise, the second,
third, and fourth values are 4.65, 3.58, and 2.36, respectively.
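The differences and products can be sketched in Python as follows. The function names are hypothetical. Only the first three coefficients and differences are used, since those are the ones quoted together in the text; a full calculation would use all k coefficients from the coefficient table.

```python
def paired_differences(ordered_x):
    """Return X(n-i+1) - X(i) for i = 1..k, where k = n // 2."""
    n = len(ordered_x)
    return [ordered_x[n - 1 - i] - ordered_x[i] for i in range(n // 2)]


def products(coefficients, differences):
    """Multiply each coefficient a_i by its centered observation difference."""
    return [a * d for a, d in zip(coefficients, differences)]


# First three coefficients and differences quoted in the text (n = 24).
a = [0.4493, 0.3098, 0.2554]
diffs = [18.66, 15.00, 14.00]
print([round(p, 2) for p in products(a, diffs)])  # [8.38, 4.65, 3.58]
```

`paired_differences` expects the centered observations already sorted from smallest to largest, so that the first difference pairs the largest value with the smallest.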
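The calculated test statistic, Calculated W, can now be
computed as follows:
$$ W = \frac{1}{D} \left[ \sum_{i=1}^{k} a_i \left( X_{(n-i+1)} - X_{(i)} \right) \right]^2 $$
where
$$ D = \sum_{i=1}^{n} \left( X_{(i)} - \bar{X} \right)^2 $$
and the bracketed sum in W is the summation of the "Product" values,
X_{(i)} = X(i), the ith centered observation
\bar{X} = Xbar, the overall mean of the centered observations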
For the data in this example, Xbar turns out to be zero, thereby
resulting in a D value of 641.83. Consequently, the calculated W
value is $\frac{1}{641.83} \times (24.09)^2$, which is equal to 0.904.
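The W calculation can be sketched in Python as shown below. The function name is hypothetical. Because the full list of 24 centered observations is not reproduced in this section, the worked figure uses the totals quoted in the text (a sum of products of 24.09 and the D value of 641.83).

```python
def shapiro_wilk_w(product_values, centered_observations):
    """Return W = (sum of products)^2 / D, with D the sum of squared deviations."""
    x_bar = sum(centered_observations) / len(centered_observations)
    d = sum((x - x_bar) ** 2 for x in centered_observations)
    return sum(product_values) ** 2 / d


# Totals quoted in the worked example: sum of products = 24.09, D = 641.83.
w = 24.09 ** 2 / 641.83
print(f"{w:.3f}")  # 0.904
```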
The decision rule for this test is to compare the calculated
W value with the critical W value, obtained from Table C.3 in
Appendix C of this report. If the calculated value is less than
the critical value, it is concluded that the data are not
normally distributed. For this example, the critical W value at
a 99% confidence level (0.01 quantile) and 24 observations is
0.884. Because the calculated W value (0.904) is greater than
the critical W value (0.884), it is concluded that the data are
normally distributed.
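The Shapiro-Wilk's decision rule can be sketched in Python as below. The function name is hypothetical, and the critical value is the one quoted in the text. Note that the direction of the comparison is the reverse of the Chi-Square Goodness of Fit Test in Section 4.3: here a calculated value below the critical value indicates non-normal data.

```python
def shapiro_wilk_rejects_normality(calculated_w, critical_w):
    """Return True when the calculated W is less than the critical W."""
    return calculated_w < critical_w


# Calculated W and critical W (0.01 quantile, 24 observations) from the text.
not_normal = shapiro_wilk_rejects_normality(0.904, 0.884)
print(not_normal)  # False
```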
Again, the USEPA recommends that if the data fail the test
for normality, a transformation to log values may normalize the
data. However, from discussions with NCDEM regulators,
transformation of data is not necessary if the non-parametric
Wilcoxon Rank Sum Test is utilized to test for significant
difference in reproduction (Mistele, 1989).