Statistical Analysis - Materials and Methods

Section 2 Materials and Methods

2.3 Statistical Analysis

Descriptive statistics, including the mean, standard deviation, minimum, maximum, median, and quartiles (the first quartile, Q1, is the 25th percentile and the third quartile, Q3, is the 75th percentile), were calculated for each sampling period and holding condition. The mean, standard deviation, and coefficient of variation (CV, equal to the standard deviation divided by the mean) were also calculated for all data collected for a given holding condition.
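
The summary statistics listed above can be sketched in a few lines of Python. This is only an illustration: the function name `describe`, the replicate values, and the choice of the "inclusive" quartile rule are assumptions, since the report does not state which quartile convention was used.

```python
import statistics

def describe(values):
    """Summary statistics for one sampling period and holding condition."""
    # method="inclusive" is an assumption; the report names no quartile rule.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return {
        "mean": mean, "sd": sd, "min": min(values), "max": max(values),
        "median": median, "q1": q1, "q3": q3,
        "cv": sd / mean,  # CV = standard deviation divided by the mean
    }

# Hypothetical replicate concentrations (mg/kg), for illustration only.
replicates = [48.0, 52.0, 50.0, 55.0, 45.0]
summary = describe(replicates)
print(f"mean={summary['mean']:.1f}, CV={summary['cv']:.1%}")
```

A CV computed this way can be compared directly against the 20% and 25% replicate precision criteria.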

Overall CVs of less than 20% for CrO4 and metals, and less than 25% for SVOCs, suggest that the data have remained within the data quality acceptance criteria for replicate precision (Table 2-5). Even if a trend is statistically detected in such data, it would be suspect because of the normal analytical error that can be observed in chemical analysis. For analytes measured within this level of noise, this suggests that the concentration did not decrease more than would be allowed between replicate measurements over the entire time period tested. For these analytes, estimating a holding time from this experiment could be considered impractical.

A one-sample, one-sided t-test of the null hypothesis H0: the data are samples from a population with a mean equal to or greater than the mean of the Day 0 concentration, versus the alternative H1: the samples are from a population with a mean less than the mean of the Day 0 concentration, was conducted to provide a measure of relevance (chemical importance) of the potential change. Simple linear regression of concentration against time was used to test 1) the null hypothesis that the slope of the natural logarithm of the concentration of each contaminant was equal to zero and 2) the null hypothesis that slopes associated with curvature (the lack-of-fit to the simple linear model) were equal to zero. Plots were used to compare the observed concentration to the nonparametric upper and lower boundaries of the Day 0 concentration, calculated as Q3 + 1.5(Q3 - Q1) and Q1 - 1.5(Q3 - Q1), respectively. The fitted regression line was included in data plots if the slope was negative and statistically different from zero.
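
As a rough sketch of two of these calculations, the code below computes the one-sided t statistic and the Day 0 nonparametric bounds. The function names, the sample values, and the "inclusive" quartile rule are assumptions made for illustration. The Python standard library has no t distribution, so the statistic would still need to be compared against a one-sided critical value from a t table or a statistics package.

```python
import math
import statistics

def one_sided_t_statistic(values, day0_mean):
    """t statistic for H0: mean >= day0_mean versus H1: mean < day0_mean."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.mean(values) - day0_mean) / standard_error
    return t_stat, n - 1  # statistic and degrees of freedom

def day0_bounds(day0_values):
    """Lower and upper bounds: Q1 - 1.5*(Q3 - Q1) and Q3 + 1.5*(Q3 - Q1)."""
    # The quartile rule is an assumption; the report does not specify one.
    q1, _, q3 = statistics.quantiles(day0_values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical ln-scale concentrations, for illustration only.
day0 = [3.9, 4.0, 4.1, 4.0, 4.2]
later = [3.8, 3.9, 4.0, 3.7, 3.6]

t_stat, df = one_sided_t_statistic(later, day0_mean=4.0)
lower, upper = day0_bounds(day0)
# Large negative t values favor H1 (a decrease from the Day 0 mean).
print(f"t = {t_stat:.2f} on {df} df; Day 0 bounds = ({lower:.2f}, {upper:.2f})")
```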

The holding time (HT), defined as the number of days until a 5%, 10%, and 20% decrease in the Day 0 concentration (C0), was estimated based on the lower 95% confidence limit of the slope estimate (βLCL). Thus, the holding time for a given percentage change (Δ%) was defined as HT = -Δ%*C0/βLCL. Using the lower 95% confidence limit of the linear slope provides some conservatism to the holding time estimation because, as the data increasingly deviate from a simple linear response, the confidence interval widens and the slope used for estimation becomes steeper. When the estimated βLCL was greater than 0, the concentration was estimated to be increasing instead of decreasing and the number of days until a given percentage was lost was not applicable (NA). All of the analytes tested were assumed to only decrease over time since none of the analytes measured was considered a degradation product.
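
The holding time formula can be sketched as follows, under several stated assumptions. C0 is taken on the natural log scale, which the report's later worked example appears to imply. The confidence limit is computed from a two-sided 95% interval, with the t critical value supplied by hand. The function names and data are hypothetical, and `None` stands in for NA.

```python
import math

def slope_with_se(days, ln_conc):
    """Least-squares slope of ln(concentration) on time, with standard error."""
    n = len(days)
    x_bar = sum(days) / n
    y_bar = sum(ln_conc) / n
    sxx = sum((x - x_bar) ** 2 for x in days)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, ln_conc))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(days, ln_conc))
    return slope, math.sqrt(sse / (n - 2) / sxx)

def holding_time(delta, c0, beta_lcl):
    """HT = -delta * C0 / beta_LCL; returns None (NA) for a non-negative limit."""
    if beta_lcl >= 0:
        return None
    return -delta * c0 / beta_lcl

# Hypothetical sampling days and ln-scale concentrations.
days = [0, 7, 14, 28, 56]
ln_conc = [6.68, 6.66, 6.67, 6.62, 6.58]

slope, se = slope_with_se(days, ln_conc)
t_crit = 3.182  # two-sided 95% t value for n - 2 = 3 df (assumed interval type)
beta_lcl = slope - t_crit * se  # steeper (more negative) than the fitted slope
for delta in (0.05, 0.10, 0.20):
    print(delta, holding_time(delta, ln_conc[0], beta_lcl))
```

Because βLCL is more negative than the fitted slope, the resulting holding times are shorter than those the fitted slope alone would give, which is the conservatism the text describes.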

Chemical concentrations tend to be lognormally distributed. Thus, analysis was conducted on the natural logarithm of the concentration to satisfy the assumptions of the statistical analyses. Further, the natural logarithm of the concentration was used instead of the raw data scale because sediments have varying amounts (e.g., 50 to 800 mg/kg) of a given analyte. The natural log transformation rescales the observations to minimize the magnitude of the difference between the Day 0 value and the value obtained at a given percentage change. A percentage change (e.g., 10%) from a large number (say 800) has a larger magnitude difference (800 - 720 = 80) than the same percentage change from a small number (50 - 45 = 5). Thus, on the raw data scale, analytes with the same slope associated with the decrease in concentration over time but different intercepts would produce quite different estimates of the holding time. The natural log transformation reduces this effect (e.g., ln[800] - 0.9*ln[800] = 0.67 compared to ln[50] - 0.9*ln[50] = 0.39). The resulting holding times would still be different; however, the magnitude of the difference would be much less than that achieved on the raw scale.
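
The arithmetic in this comparison is easy to check. The short sketch below (the function name `ten_percent_change` is only illustrative) reproduces the raw-scale and ln-scale differences for the two example concentrations.

```python
import math

def ten_percent_change(c0):
    """Size of a 10% change from c0 on the raw scale and on the ln scale."""
    raw_change = c0 - 0.9 * c0
    log_change = math.log(c0) - 0.9 * math.log(c0)
    return raw_change, log_change

for c0 in (800, 50):
    raw_change, log_change = ten_percent_change(c0)
    print(f"C0={c0}: raw change={raw_change:.0f}, ln-scale change={log_change:.2f}")
# C0=800: raw change=80, ln-scale change=0.67
# C0=50: raw change=5, ln-scale change=0.39
```

On the raw scale the two changes differ by a factor of 16 (80 versus 5); on the ln scale the factor is about 1.7 (0.67 versus 0.39).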

Observations were removed from data analysis, including plots, if they were potentially contaminated (flagged with a B) or rejected for QC/QA reasons (flagged with an r). Analysis was conducted with and without observations considered extreme outliers, defined as values greater than Q3 + 3*(Q3 - Q1), where the quartiles are derived from the Day 0 data. Extremely high outliers have a large influence on the estimated slope and, thus, the resulting holding time. Outliers in the Day 0 data were removed from analysis and plots if the within-replicate CV was greater than 25%.

Observations were removed from the Day 0 data only if a single value was extreme and if that value was the furthest absolute distance from the median value. Extreme outliers observed during all other sampling times were removed from the analysis only if they made up less than 10% of the total number of observations.
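
A minimal sketch of the extreme-outlier screen for sampling times after Day 0 is given below. The function names, the data, and the "inclusive" quartile rule are assumptions; the fence Q3 + 3*(Q3 - Q1) and the 10% rule follow the description above.

```python
import statistics

def extreme_upper_fence(day0_values):
    """Fence Q3 + 3*(Q3 - Q1), with quartiles taken from the Day 0 data."""
    # The quartile rule is an assumption; the report does not specify one.
    q1, _, q3 = statistics.quantiles(day0_values, n=4, method="inclusive")
    return q3 + 3 * (q3 - q1)

def drop_extreme_outliers(values, fence, max_fraction=0.10):
    """Remove values above the fence only if they are under 10% of the data."""
    extreme = [v for v in values if v > fence]
    if extreme and len(extreme) / len(values) < max_fraction:
        return [v for v in values if v <= fence]
    return list(values)

# Hypothetical concentrations (mg/kg), for illustration only.
day0 = [48, 50, 52, 51, 49]
fence = extreme_upper_fence(day0)

later = [50, 49, 51, 48, 52, 50, 49, 51, 50, 48, 90]  # 1 of 11 is extreme
kept = drop_extreme_outliers(later, fence)
print(f"fence={fence}, kept {len(kept)} of {len(later)} observations")
```

Since the analysis was run both with and without extreme outliers, the unfiltered list would be retained alongside the filtered one.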

Residual plots were used to assess the lack of homogeneity of variance, lack-of-fit to the linear model, potential outliers, and observations that could have exaggerated influence on the estimated slopes. Residuals, defined as the observed minus the fitted result, are assumed to have a mean of zero and a constant variance. Residuals plotted against time should display a random pattern about zero with a constant variance across the x-axis. A consistent U-shaped pattern of the residuals would indicate the need for curvature in the model if a significant (p < 0.05) lack-of-fit was detected. The need for a spline or nonlinear model could also be indicated by a significant lack-of-fit. However, if the concentration decreased and then increased over time, the lack-of-fit was considered analytical noise. In these cases, the simple linear model was used for the holding time estimation. A single observation that has a large amount of influence on the estimated slopes would be indicated by a residual close to zero and a position on the x-axis far away from all other observations. Observations with too great an influence on the estimation of the slope were removed from the analysis.
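
To illustrate the last point, the sketch below computes the residuals of a simple linear fit together with each observation's leverage. The leverage formula, 1/n + (x - mean(x))^2 / Sxx, is a standard numeric companion to the visual check described above; it is not stated in the report, and the function name and data are hypothetical.

```python
def fit_diagnostics(days, ln_conc):
    """Residuals (observed minus fitted) and leverages for a straight-line fit."""
    n = len(days)
    x_bar = sum(days) / n
    y_bar = sum(ln_conc) / n
    sxx = sum((x - x_bar) ** 2 for x in days)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(days, ln_conc))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(days, ln_conc)]
    # Leverage grows with distance from the mean sampling day.
    leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in days]
    return resid, leverage

# Hypothetical data with one sampling day far from the others.
days = [0, 7, 14, 21, 180]
ln_conc = [6.68, 6.66, 6.67, 6.63, 6.20]

resid, leverage = fit_diagnostics(days, ln_conc)
for d, r, h in zip(days, resid, leverage):
    print(f"day {d:>3}: residual={r:+.3f}, leverage={h:.2f}")
```

In this made-up example the Day 180 point has a leverage close to 1, so the fitted line passes nearly through it and its residual is near zero, which is the pattern the text describes.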

The intent of this analysis was to provide a progression of four distinct ways to evaluate a possible decrease in the concentration of an analyte (lines of evidence). First, the magnitude and variability of the concentration were used to suggest a level of the signal-to-noise ratio; second, a t-test of H0: overall mean equal to or greater than the Day 0 mean was used to provide a level of relevance to the observed change; third, a regression against time was used to estimate a rate of change; and fourth, a nonparametric upper and lower bound based on the Day 0 quartiles was used to provide an alternative measure of relevance to the potential change through time. When the t-test is not significant, the slope from the regression is not significant, and the data remain within the nonparametric bounds, all three of these methods suggest a lack of degradation over the period tested. However, when at least one of the methods differs in its result, the sensitivity and assumptions of each method must be evaluated to sort out the most likely outcome.

The t-test uses the variability observed in all of the data to determine significance and does not evaluate whether or not a trend over time is present. Thus, if the data are highly variable, the probability of rejecting the null hypothesis is low even though a significant trend of decreasing concentration may exist. The regression analysis is sensitive to linear degradation. Thus, if the pattern of degradation is not linear, the regression analysis may not detect it. A threshold point associated with different rates of degradation over time could cause a simple linear regression to be nonsignificant. However, the lack-of-fit to a simple linear regression would be detected. Finally, the nonparametric bounds are based on the variability observed in the Day 0 data. When the Day 0 variability is high, the boundaries will be wider than when the Day 0 variability is low.

Statistical versus Chemical Significance: The determination of statistical significance in hypothesis testing means that a value of a specific statistic greater than or equal to that observed has a small probability of occurring by chance alone. The definition of small is used to define the level of significance and is usually set at α = 5%. The probability of occurrence of the achieved value of the statistic, or one greater (the p-value), is calculated based on specific assumptions about the distribution of the statistic, assuming the null hypothesis is true. The decision to reject a null hypothesis (reject if the achieved p-value is less than α) for a defined alternative is based on the level of significance chosen before analysis. However, presenting the achieved p-value allows one to assess the potential biological or chemical significance of the result as well.

The chemical significance of a result is a function of the minimum detection limit (MDL) in comparison to the observed concentration and the analytical variability associated with the type of analyte being measured (metals versus organics, for example). Concentrations close to the MDL (defined as up to 2 times the MDL, i.e., 2*MDL) may exhibit a greater analytical variability than concentrations further away from the MDL. A chemically significant result requires that the observed variability be greater than the analytical variability alone. For metals and organic analytes, replicate results must have a CV of less than 20% and 25%, respectively, to meet quality control criteria for extraction variability. If all observed concentrations for a given analyte across time display a CV of less than or equal to the analytical criteria for replicates, then any detected statistical significance is within the noise of analytical variability and could be spurious. If the achieved p-value for the test of a significant slope is close to 0.05, then the estimated slope may be considered a curious trend. However, if the achieved p-value is very small (< 0.01), then the observed trend should be considered chemically relevant even though all observations were within the allowable analytical noise.
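
The interpretation rules in this paragraph can be summarized as a small decision function. This is only a sketch: the function name and return labels are hypothetical, and treating every p-value from 0.01 up to 0.05 as "close to 0.05" is an assumption, since the text does not give an exact range.

```python
def interpret_slope(p_value, overall_cv, cv_limit, alpha=0.05):
    """Label a regression slope test using the p-value and the overall CV.

    cv_limit is the replicate precision criterion, e.g. 0.20 for metals or
    0.25 for organic analytes.
    """
    if p_value >= alpha:
        return "not statistically significant"
    if overall_cv > cv_limit:
        return "significant; variability exceeds the replicate criterion"
    if p_value < 0.01:
        return "chemically relevant despite being within analytical noise"
    # Assumption: 0.01 <= p < alpha is treated as "close to 0.05".
    return "curious trend; could be spurious"

# Hypothetical results for a metal (20% criterion).
print(interpret_slope(p_value=0.04, overall_cv=0.12, cv_limit=0.20))
print(interpret_slope(p_value=0.004, overall_cv=0.12, cv_limit=0.20))
```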

In document Sample Holding Time Reevaluation (Page 35-38)