4. STATISTICAL METHODS

4.3 Test for Normal Distribution of Data:

Chi-Square Goodness of Fit Test

One statistical procedure for testing whether data are normally distributed is the Chi-Square Goodness of Fit Test. This test compares the observed sample distribution with a normal distribution. The normal distribution is described by the well-known bell-shaped curve. The parameters "u" (the mean) and "s" (the standard deviation) characterize the center and the spread of the distribution, respectively. An important property of any normal distribution is its symmetry, which is quite helpful when using tables to determine probabilities or percentiles of the normal distribution. A description of the Chi-Square Goodness of Fit Test from Appendix A of EPA's December 1985 report Short-Term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms (Horning and Weber, 1985) is given below.

An example of this test method is provided at the end of this chapter (Table 4.2, OWASA #1 Statistical Spreadsheet Example Calculations). The first step of the Chi-Square Goodness of Fit Test is to standardize the observations (i.e., the number of neonates produced) by subtracting the mean number of neonates produced from each observation and dividing the difference by the standard deviation. In this example, the number of neonates produced by control replicate #1 is 20, the mean number of control neonates produced is 14.67, and the standard deviation is 4.44. Therefore, the standardized observation for control replicate #1 is (20 - 14.67) / 4.44 = 1.20. Likewise, the standardized observations for control replicates #2, #3, and #4 are -0.60, 0.75, and 0.53, respectively. In a similar manner, the observations must be standardized for the test replicates (i.e., 97.6% effluent). For example, the number of neonates produced by test replicate #1 is 11, the mean number of test neonates produced is 11.75, and the standard deviation is 3.28. Therefore, the standardized observation for test replicate #1 is (11 - 11.75) / 3.28 = -0.23.
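
The standardization step can be sketched in a few lines of Python. This is only an illustration, not part of the EPA method description; the function name `standardize` is hypothetical, and the inputs are the values quoted in the example above.

```python
def standardize(observation, mean, std_dev):
    """Return (observation - mean) / std_dev for a single replicate."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / std_dev


# Values quoted in the worked example (Table 4.2).
control_z = standardize(20, 14.67, 4.44)  # control replicate #1
test_z = standardize(11, 11.75, 3.28)     # test replicate #1
print(round(control_z, 2), round(test_z, 2))
```

Rounded to two decimals, the two results agree with the 1.20 and -0.23 reported in the text.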

Once the control and the test replicates have been standardized, a table is constructed consisting of five cells as follows: < -1.5; -1.5 to < -0.5; -0.5 to 0.5; > 0.5 to 1.5; and > 1.5. The number of standardized observations which fall into each of the five cells is tabulated. These are the observed frequencies, "fi". The expected frequency, "Fi", is found by multiplying the area under the standard normal curve over the "ith" cell limits by the total number of standardized observations, "N". For this example, N=24 (12 replicates for the control and 12 replicates for the test sample). The areas for each cell, the observed frequencies, and the expected frequencies are shown in the table. The Chi-Square Goodness of Fit Test statistic, "X2", is calculated as follows:

\[ X^2 = \sum_{i=1}^{5} \frac{(f_i - F_i)^2}{F_i} \]

For the data in this example, the calculated X2 value is:

\[ X^2 = \frac{(1 - 1.608)^2}{1.608} + \frac{(8 - 5.808)^2}{5.808} + \frac{(5 - 9.168)^2}{9.168} + \frac{(9 - 5.808)^2}{5.808} + \frac{(1 - 1.608)^2}{1.608} = 4.94 \]
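
The cell counting, expected frequencies, and test statistic can be sketched as follows. This is a minimal illustration under stated assumptions: all function names are hypothetical, the cell areas are computed from the standard normal curve with `math.erf` rather than read from a table, and the observed and expected frequencies are the ones that appear in the calculation above.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cell_areas():
    """Area under the standard normal curve for each of the five cells."""
    bounds = [0.0] + [normal_cdf(z) for z in (-1.5, -0.5, 0.5, 1.5)] + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


def count_cells(z_values):
    """Observed frequency per cell, using the cell limits given in the text:
    < -1.5; -1.5 to < -0.5; -0.5 to 0.5; > 0.5 to 1.5; > 1.5."""
    counts = [0] * 5
    for z in z_values:
        index = (z >= -1.5) + (z >= -0.5) + (z > 0.5) + (z > 1.5)
        counts[index] += 1
    return counts


def chi_square_statistic(observed, expected):
    """Sum of (f - F)^2 / F over all cells."""
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected))


# Frequencies taken from the worked example (N = 24).
observed = [1, 8, 5, 9, 1]
expected = [1.608, 5.808, 9.168, 5.808, 1.608]
x2 = chi_square_statistic(observed, expected)
print(round(x2, 2))
```

Expected frequencies computed as `24 * area` from `cell_areas()` differ slightly in the third decimal from the values in the example, which appear to be based on areas rounded to three places.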

The decision rule for this test is to compare the calculated X2 value with the critical X2 value for four degrees of freedom (number of cells - 1) at a significance level of 0.01 (99% confidence level). If the calculated value exceeds the critical value, conclude that the data are not normally distributed. For this example, the critical value is 13.28 (Appendix C, Table C.1). The calculated value, 4.94, does not exceed the critical value. Thus, the conclusion of the test is that the data are normally distributed.
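
As a cross-check on the tabulated critical value, the upper-tail probability of a chi-square distribution with four degrees of freedom has a simple closed form, exp(-x/2) * (1 + x/2). The sketch below is not taken from the EPA report, and its function names are hypothetical. It confirms that 13.28 corresponds to a tail probability of about 0.01, and applies the decision rule to the calculated value.

```python
import math


def chi_square_upper_tail_4df(x2):
    """P(chi-square with 4 degrees of freedom > x2).

    Uses the closed form exp(-x/2) * (1 + x/2), valid only for 4 df.
    """
    if x2 < 0:
        raise ValueError("the chi-square statistic cannot be negative")
    half = x2 / 2.0
    return math.exp(-half) * (1.0 + half)


def fails_normality(calculated_x2, alpha=0.01):
    """True when the tail probability is below the significance level."""
    return chi_square_upper_tail_4df(calculated_x2) < alpha


print(round(chi_square_upper_tail_4df(13.28), 3))  # tail area at the critical value
print(fails_normality(4.94))                       # decision for the example
```

Comparing the tail probability with 0.01 is equivalent to comparing the calculated X2 value with the critical value of 13.28.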

EPA suggests that if the data fail the test for normality, a transformation, such as to log values, may normalize the data. After transforming the data, the Chi-Square Goodness of Fit Test should be repeated to test for normality. However, based on discussions with Ken Eagleson, Larry Ausley, and Steve Mistele of the North Carolina Division of Environmental Management (NCDEM), data that fail the Chi-Square Goodness of Fit Test need not be transformed if the non-parametric Wilcoxon Rank Sum Test is utilized to determine significant difference in reproduction (Mistele, 1989).

4.4 Test for Normal Distribution of Data:

Shapiro-Wilk's Test

In March 1989, the United States Environmental Protection Agency (USEPA) revised its December 1985 guidance document entitled Short-Term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms. EPA still recommends Bartlett's Test for testing homogeneity of variance, Dunnett's Test for testing significant difference in reproduction when the normality and homogeneity of variance assumptions are met, the Wilcoxon Rank Sum Test for testing significant difference in reproduction when the normality assumption, the homogeneity of variance assumption, or both are violated, and Fisher's Exact Test for testing significant difference in mortality. However, the USEPA now recommends that the Shapiro-Wilk's Test be used instead of the Chi-Square Goodness of Fit Test for testing normal distribution of data. The March 1989 guidance document reports that the Shapiro-Wilk's Test is a more robust test when the sample size (i.e., the number of observations) is fifty or less. A description of the Shapiro-Wilk's Test from Appendix B of EPA's March 1989 report Short-Term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms (Weber et al., 1989) is given below.
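
The test selection summarized above can be written as a small lookup. This is a sketch of the logic as described in this section only; the function name `choose_test` and its arguments are hypothetical.

```python
def choose_test(endpoint, is_normal=None, is_homogeneous=None):
    """Return the name of the comparison test recommended for an endpoint.

    endpoint: "mortality" or "reproduction".
    is_normal / is_homogeneous: results of the normality test and of
    Bartlett's Test; required only for the reproduction endpoint.
    """
    if endpoint == "mortality":
        return "Fisher's Exact Test"
    if endpoint == "reproduction":
        if is_normal is None or is_homogeneous is None:
            raise ValueError("both assumption checks are needed for reproduction")
        if is_normal and is_homogeneous:
            return "Dunnett's Test"
        # Either assumption (or both) violated.
        return "Wilcoxon Rank Sum Test"
    raise ValueError("unknown endpoint: " + repr(endpoint))


print(choose_test("reproduction", is_normal=True, is_homogeneous=False))
```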

An example of this test method is provided at the end of this chapter (Table 4.3, OWASA #19 Statistical Spreadsheet Example Calculations). The first step of the Shapiro-Wilk's Test is to center the observations by subtracting the mean of the observations within a sample from each observation in that sample. In this example, the number of neonates produced by control replicate #1 is 26 and the mean number of control neonates is 21.58. Therefore, the centered observation for control replicate #1 is 26 - 21.58 = 4.42. Likewise, the centered observations for control replicates #2, #3, and #4 are 3.42, 4.42, and -6.58, respectively. In a similar manner, the observations are also centered for the test replicates (i.e., 93% effluent). In this example, the number of neonates produced by test replicate #1 is 34 and the mean number of neonates produced is 33.92. Therefore, the centered observation for test replicate #1 is 34 - 33.92 = 0.08.
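
Centering can be sketched as below. The function name `center` is hypothetical. Only control replicate #1 (26 neonates) is stated directly in the text; the counts 25, 26, and 15 for replicates #2 to #4 are back-calculated here from the centered values and the quoted mean of 21.58, so they are an assumption of this illustration.

```python
def center(observations, mean=None):
    """Subtract the sample mean from every observation in the sample.

    If mean is None, it is computed from the observations supplied.
    """
    if mean is None:
        mean = sum(observations) / len(observations)
    return [x - mean for x in observations]


# First four control replicates, centered on the 12-replicate mean of 21.58.
centered = center([26, 25, 26, 15], mean=21.58)
print([round(value, 2) for value in centered])
```

When the full sample is supplied and `mean` is left unset, the centered values sum to zero.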

Once the control and test replicates have been centered, the centered observations are tabulated from smallest to largest. The constructed table is shown on the second page of Table 4.3, OWASA #19 Statistical Spreadsheet Example Calculations, where "X(i)" denotes the ith centered observation.

Continuing on page two of Table 4.3, a second table is constructed in which the Shapiro-Wilk's coefficients (i.e., "ai"), the centered observation differences (i.e., "X(n-i+1) - X(i)"), and the product (i.e., "Product") of each Shapiro-Wilk's coefficient multiplied by its respective centered observation difference are tabulated. Shapiro-Wilk's coefficients (a1, a2, a3, ..., ak, where k is approximately n/2) are obtained from Table C.2 in Appendix C by knowing the number of observations, n. For the data in this example, n=24 and k=12. Therefore, the coefficients a1, a2, and a3 are 0.4493, 0.3098, and 0.2554, respectively. The first centered observation difference, X(24) - X(1), corresponds to 7.08 - (-11.58), which is equal to 18.66. Likewise, the second, third, and fourth centered observation differences are 15.00, 14.00, and 11.00, respectively. Therefore, the "Product" of the first value is simply "a1" multiplied by [X(24) - X(1)], which corresponds to 0.4493 * 18.66, which is equal to 8.38. Likewise, the second, third, and fourth values are 4.65, 3.58, and 2.36, respectively.
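
The difference and product columns can be sketched as follows. The function names are hypothetical, the five-value list passed to `paired_differences` is made-up illustration data, and the coefficients and differences passed to `products` are the first three values quoted in the text.

```python
def paired_differences(centered):
    """Return X(n-i+1) - X(i) for i = 1..k, where k = n // 2.

    The centered observations are sorted from smallest to largest first.
    """
    ordered = sorted(centered)
    n = len(ordered)
    return [ordered[n - i] - ordered[i - 1] for i in range(1, n // 2 + 1)]


def products(coefficients, differences):
    """Multiply each coefficient a_i by its centered observation difference."""
    return [a * d for a, d in zip(coefficients, differences)]


# Made-up data, only to show how the pairs are formed.
print(paired_differences([0, 3, -1, 1, -3]))

# First three coefficients (n = 24) and differences from the example.
example = products([0.4493, 0.3098, 0.2554], [18.66, 15.00, 14.00])
print([round(p, 2) for p in example])
```

Rounded to two decimals, the three products agree with the 8.38, 4.65, and 3.58 given above.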

The calculated test statistic, Calculated W, can now be computed as follows:

\[ W = \frac{1}{D} \left[ \sum_{i=1}^{k} a_i \left( X(n-i+1) - X(i) \right) \right]^2 \]

\[ D = \sum_{i=1}^{n} \left[ X(i) - \bar{X} \right]^2 \]

where the sum in brackets is the summation of the product values, and

X(i) = the ith centered observation

Xbar = the overall mean of the centered observations

For the data in this example, Xbar turns out to be zero, thereby resulting in a D value of 641.83. Consequently, the calculated W value is [1/641.83] * [24.09]^2, which is equal to 0.904.
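
The final arithmetic can be checked with a short sketch. The function names are hypothetical. Note that the sum of the product values is squared before it is divided by D; with the totals quoted in the example (a product sum of 24.09 and D = 641.83), that is the calculation that reproduces the reported W of 0.904.

```python
def sum_of_squares(centered):
    """D: sum of squared deviations of the centered observations from their mean."""
    mean = sum(centered) / len(centered)
    return sum((x - mean) ** 2 for x in centered)


def shapiro_wilk_w(product_sum, d):
    """W = (sum of product values)^2 / D."""
    if d <= 0:
        raise ValueError("D must be positive")
    return product_sum ** 2 / d


# Totals quoted in the worked example (Table 4.3).
w = shapiro_wilk_w(24.09, 641.83)
print(round(w, 3))
```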

The decision rule for this test is to compare the calculated W value with the critical W value, obtained from Table C.3 in Appendix C of this report. If the calculated value is less than the critical value, it is concluded that the data are not normally distributed. For this example, the critical W value at a 99% confidence level (0.01 quantile) and 24 observations is 0.884. Because the calculated W value (0.904) is greater than the critical W value (0.884), it is concluded that the data are normally distributed.
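
The direction of this comparison is easy to reverse by mistake, since a small W, not a large one, indicates departure from normality. A minimal sketch, with a hypothetical function name and the values from the example:

```python
def rejects_normality_w(calculated_w, critical_w):
    """True when the calculated W falls below the critical W."""
    return calculated_w < critical_w


# Calculated and critical values from the example (n = 24, 0.01 quantile).
print(rejects_normality_w(0.904, 0.884))
```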

Again, the USEPA recommends that if the data fail the test for normality, a transformation to log values may normalize the data. However, from discussions with NCDEM regulators, transformation of data is not necessary if the non-parametric Wilcoxon Rank Sum Test is utilized to test for significant difference in reproduction (Mistele, 1989).
