5.4 Database
To determine the distribution of contaminants at the Kansas City landfill site, groundwater and soil samples were obtained at various locations throughout the landfill. Groundwater sampling was performed by drilling and installing four monitoring wells at the site. Well-related information (such as locations and depths) is given in Table 5.1 and Figure 5.2. The wells were installed to detect the presence and migration of contaminants from the landfill site. Monitoring Well (MW) 1 was originally designated as the background well, while MW-2, MW-3, and MW-4 were installed in the presumed down-gradient direction relative to the dump site (KDHE, 1996). Once the monitoring wells were in place, groundwater samples were collected and sent to the laboratory for analysis.
As with the groundwater sampling described above, soil sampling involved collecting seven soil samples at depths ranging from 0 to 12 inches below the ground surface. Information on the seven samples used in the ANN modeling task (such as number, location, and chemical analysis results) is given in Tables 5.3 and 5.4 and in Figure 5.3.
5.4.1 Model Development
To use any ANN-based model, the network must first be trained on the process it is supposed to model. Training uses a known set of input data together with the desired outcomes. The BPANN methodology/program [Mryyan & Najjar (2007); Itani & Najjar (2000); Najjar & Basheer (1996); Itani (1996)], which follows the supervised training approach, is used to train the desired ANN models so that their outputs are as close to the real values as possible; this is done by repeatedly modifying the network’s connection weights. This process continues until the error at the output layer is minimized. Once training is complete, the developed model can be used for prediction tasks. Note that the accuracy of the predicted values depends on the quality of the data used in the training phase: the better the quality of the training sets, the more accurate the predictions. For this reason, the training sets (i.e., groundwater and soil data) used to build the network models were of the utmost importance in this study.
When developing any ANN model, it is important to determine what input and output values will be used (Dowla & Rogers, 1995). For the Kansas City landfill case, the x and y coordinates were the only input values to the model. The concentration value (V) of lead or copper was used as the output of the associated network model. The x and y coordinates are the x and y distances of the observation point, measured from a reference point (i.e., x = 0, y = 0). The network model for lead in soil was developed using five data sets for training and the remaining two data sets for testing. The best network and its number of hidden nodes were determined by training with online testing so as to achieve the least error on the testing data sets. The accuracy measures used in this study are:
ASE = Averaged Squared Error
MARE = Mean Absolute Relative Error
R² = Coefficient of Determination
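The three measures are named here without formulas. The following is a minimal Python sketch of their common textbook forms, which is an assumption: the BPANN program may define them differently (for example, on normalized outputs). The function names and the sample numbers are illustrative only and are not site data.

```python
def ase(observed, predicted):
    """Averaged squared error: mean of the squared residuals."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def mare(observed, predicted):
    """Mean absolute relative error in percent (observed values must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, taken here as 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative numbers only; these are NOT measurements from the landfill site.
observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
print(ase(observed, predicted), mare(observed, predicted), r_squared(observed, predicted))
```

Under these definitions, a lower ASE and MARE and an R² closer to 1 all indicate a closer fit between predicted and observed values.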
The best net is identified as the one with the lowest ASE and MARE and the highest R². In this case, the number of hidden nodes needed to achieve this objective was found by the adaptive training approach. The final (best-performing) net contained two input nodes representing the x and y coordinates, two hidden nodes, and one output node (i.e., the value of lead). Similar modeling processes (i.e., training, testing, and evaluation) were carried out to select the best-performing network model for:
Value of lead in groundwater table (GWT) at depth (Z) = 2 feet
Value of lead in GWT at Z = 4 feet
Value of lead in soil
Value of copper in soil
For all four networks developed herein, two hidden nodes were found to be adequate to produce the best-performing net.
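To make the 2-2-1 architecture concrete, the sketch below trains a network with two inputs (x, y), two hidden nodes, and one output using plain backpropagation in Python. It is not the BPANN program: the sigmoid activation, learning rate, epoch count, weight initialization, and the seven data points are all assumptions made for illustration. The coordinates and values are synthetic and pre-scaled to the range 0 to 1. The split of five training sets and two testing sets mirrors the text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in=2, n_hidden=2, seed=0):
    """Random small weights; the last entry of each weight list is the bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(net, x):
    """Return hidden activations and the single output for input x."""
    hidden, output = net
    h = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1])
         for ws in hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(output[:-1], h)) + output[-1])
    return h, y


def train_epoch(net, samples, lr=0.5):
    """One pass of online backpropagation (squared-error loss)."""
    hidden, output = net
    for x, target in samples:
        h, y = forward(net, x)
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas are computed before the output weights are updated.
        delta_hid = [delta_out * output[j] * h[j] * (1.0 - h[j])
                     for j in range(len(h))]
        for j in range(len(h)):
            output[j] -= lr * delta_out * h[j]
        output[-1] -= lr * delta_out
        for j, ws in enumerate(hidden):
            for i in range(len(x)):
                ws[i] -= lr * delta_hid[j] * x[i]
            ws[-1] -= lr * delta_hid[j]


def ase(net, samples):
    """Averaged squared error of the network on a sample set."""
    return sum((forward(net, x)[1] - t) ** 2 for x, t in samples) / len(samples)


# Synthetic, normalized (x, y) -> V samples; NOT the site data from Tables 5.3/5.4.
data = [((0.1, 0.2), 0.30), ((0.4, 0.1), 0.45), ((0.8, 0.3), 0.70),
        ((0.2, 0.7), 0.35), ((0.6, 0.9), 0.60), ((0.9, 0.8), 0.75),
        ((0.5, 0.5), 0.50)]
train, test = data[:5], data[5:]  # five for training, two for testing

net = init_network()
ase_before = ase(net, train)
for _ in range(5000):
    train_epoch(net, train)
ase_after = ase(net, train)
print(ase_before, ase_after, ase(net, test))
```

Repeating the run with different values of `n_hidden` and keeping the network with the lowest testing error would be one simple way to imitate the selection of the hidden-node count. The adaptive training approach used in the study may work differently.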
5.4.2 Databank Generation
For the purposes of this study, a contaminated location is defined as any x and y coordinate location with a lead or copper value higher than the Maximum Allowed Contamination Level (MACL). Table 5.5 lists the contaminants and their associated MACL values. A sampling location observed to have a concentration value higher than the MACL is designated as a contaminated area, or hot area, and therefore requires remediation. Conversely, any location with a concentration value less than the MACL is considered an uncontaminated zone, or safe area, and requires no remediation.
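The hot/safe rule can be written as a one-line comparison. In the Python sketch below, the function name and the threshold value are hypothetical; the actual MACL values are those listed in Table 5.5. The text does not say how a value exactly equal to the MACL is treated, so classifying it as safe is an assumption.

```python
def classify(value, macl):
    """Return 'hot' if the concentration exceeds the MACL, otherwise 'safe'.

    A value exactly equal to the MACL is treated as 'safe' (an assumption;
    the source text does not cover this case).
    """
    return "hot" if value > macl else "safe"


macl = 15.0  # hypothetical threshold; the real values are in Table 5.5
for v in (10.0, 15.0, 20.0):
    print(v, classify(v, macl))
```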
Four databases containing x, y, and V values were generated with the developed ANN models, one for each case, at various locations across the site. To achieve this, the landfill site was divided into a grid system, with grid lines set at 10-foot intervals in both the x (east) and y (north) directions. A total of 481 grid points were used for each case (see Figure 5.4). The x and y coordinates were used as input values for each of the four developed ANN profiling models, which were then used to predict the corresponding contamination value at the 481 designated x and y coordinates representing the site. This produced four data banks, each containing 481 sets of x, y coordinates and their predicted V value. The resulting data banks were processed to construct various contamination distribution maps of the landfill site.
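The databank generation step can be sketched in Python as building a 10-foot grid and evaluating a model at every grid point. The site extents are not stated in this section, so the 120 ft by 360 ft rectangle below is a placeholder, picked only because a 13 by 37 grid gives 481 points; the actual grid is shown in Figure 5.4 and need not be rectangular. The function `placeholder_model` stands in for a trained ANN and is not one of the study's models.

```python
def make_grid(x_max, y_max, spacing=10.0):
    """Grid points at the given spacing from (0, 0) to (x_max, y_max), inclusive."""
    nx = int(round(x_max / spacing)) + 1
    ny = int(round(y_max / spacing)) + 1
    return [(i * spacing, j * spacing) for j in range(ny) for i in range(nx)]


def build_databank(model, grid):
    """Evaluate the model at each grid point, giving (x, y, V) records."""
    return [(x, y, model(x, y)) for x, y in grid]


def placeholder_model(x, y):
    # Stand-in for a trained ANN profiling model; NOT the study's model.
    return 0.01 * x + 0.02 * y


# Hypothetical extents: 13 x 37 grid points = 481, matching the count in the text.
grid = make_grid(x_max=120.0, y_max=360.0)
databank = build_databank(placeholder_model, grid)
print(len(databank))
```

Running the same loop once per trained model would produce the four data banks, each of which can then be contoured or thresholded against the MACL to draw the contamination maps.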