Artificial Neural Network–Based Modeling

2.6  Development of a Particular ANN Model

2.6.1  Variable Selection

A very basic overview of the types of variables used for developing an ANN model is presented here. In accordance with systems theory, variables for systems may be classified into three general types: state variables, input variables, and output variables. State variables represent some fundamental inherent measure of the system’s state or condition; examples include water levels and water quality concentrations. A state variable typically evolves over time and also often exhibits spatial variations.

State variables are often used as both input and output variables in an ANN model, with the inputs representing the state(s) at one or more previous time steps and the outputs often representing the state(s) at a future time step. Another typical group of input variables for ANN models is control variables, often called decision variables, over which humans have control; examples include the extraction rate of a production well, the chemical dosing rate in a water treatment plant, and the storage release rate from a dam. A third group of input variables is random variables, which, as the name implies, exhibit statistical randomness and over which there is no control; weather variables such as precipitation and temperature are classic examples. While the future values of these variables can be estimated using statistical methods, their outcomes cannot be controlled. The ANN outputs are the variables of most concern to the modeler, for example, computed future water levels, salinity concentrations, algae counts, or even objective function values such as economic costs, which may evolve over time in response to some combination of prior state(s), human controls, and random variables.

A critical first step in developing a robust and accurate ANN model is identifying the input or predictor variables necessary for predicting the system states of interest. One frequent criticism of ANN models is that they are “black boxes” that do not explicitly account for the physics of the system of interest. While this is true, developing a robust model capable of accurately predicting the system behavior of interest still requires a strong conceptual, if not theoretical, understanding of the system. Without this understanding, the modeler will have difficulty identifying the important input variables, addressing the related temporal and spatial issues that matter for proper data characterization and preprocessing, and characterizing the conditions under which the model will perform well versus those under which it may not achieve the desired performance.

FIGURE 2.2 Graph of the sigmoid and hyperbolic tangent activation functions.

Often, there is a temptation to “throw” as many variables into the model as possible, in the belief that the ANN model will identify the critical variables and minimize the relative predictive importance of the irrelevant variables accordingly. While this may be partially true, with limited data sets, more input variables result in a more complex, higher-dimensional error surface, which can compromise learning. The “principle of parsimony” is a general modeling edict: the complexity of the model should be reduced to the extent possible without compromising its ability to represent the fundamental properties of the system of interest. The goal of the modeler should be to develop an ANN that utilizes the critical input variables and can generalize system behavior, thereby consistently providing sufficiently accurate predictions. There is often a temptation for modelers to strive for the lowest possible prediction error during validation or testing. Although it is not intuitively obvious, as discussed later in this chapter, achieving the lowest validation error does not necessarily ensure that the ANN (or any competing) model will be best at providing accurate predictions over a range of conditions.

For very complicated systems like algal blooms, where there are many possible input variables, a number of techniques may be used to identify an appropriate set of input variables. Principal component analysis is often used to identify strongly correlated variables to reduce the number of inputs for the ANN model. Another common modeling approach is the use of a special type of ANN called self-organizing maps (SOM) or Kohonen networks, which can be used to classify systems into different classes and identify the relevant variables for each. For example, Bae et al. [2] used SOM to classify 720 sampling sites on the basis of 27 environmental variables into seven clusters, with significant differences of environmental conditions among these clusters.
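As a rough illustration of the principal component idea, the following sketch applies PCA to a small synthetic data set using NumPy's singular value decomposition. The data, the number of candidate variables, and the 95% variance threshold are all assumptions chosen for illustration and do not come from the studies cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic candidate inputs (assumed): 200 samples of 4 variables, where
# the last two are nearly linear combinations of the first two.
n_samples = 200
base = rng.normal(size=(n_samples, 2))
noise = 0.05 * rng.normal(size=(n_samples, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + base[:, 1] + noise[:, 0],
    base[:, 0] - base[:, 1] + noise[:, 1],
])

# Standardize each variable so that PCA is not dominated by scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: the rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Keep the smallest number of components explaining at least 95% of the
# variance (the 95% threshold is an assumed, illustrative choice).
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
scores = Z @ Vt[:n_keep].T   # reduced set of inputs for the ANN

print("explained variance ratios:", np.round(explained, 3))
print("components kept:", n_keep)
```

Because two of the four synthetic variables carry almost no independent information, the component scores offer a lower-dimensional input set than the raw variables.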

Another method is to use the ANN model to help identify relevant variables through trial and error and a sensitivity analysis. The ANN model can be used to generate sensitivity ratios that quantify how the training and validation errors change with and without inclusion of each of the candidate input variables. A more detailed overview of this may be found in Coppola et al. [5].
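The with-and-without comparison can be sketched as follows. This is only a minimal illustration under stated assumptions: a linear least-squares fit stands in for ANN training so that the example stays short and deterministic, the data are synthetic, and the ratio used here (validation error without a variable divided by validation error with all variables) is one plausible convention rather than the definition given in Coppola et al. [5].

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumed): the output depends on x0 and x1 but not on x2.
n = 300
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=n)

# Split into training and validation sets.
X_train, X_val = X[:200], X[200:]
y_train, y_val = y[:200], y[200:]

def fit_and_score(cols):
    """Fit the stand-in model on the chosen columns; return validation RMSE.

    A linear least-squares fit is used here only as a placeholder for
    training an ANN on the same subset of inputs.
    """
    A_train = np.column_stack([X_train[:, cols], np.ones(len(X_train))])
    A_val = np.column_stack([X_val[:, cols], np.ones(len(X_val))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return float(np.sqrt(np.mean((A_val @ coef - y_val) ** 2)))

all_cols = [0, 1, 2]
rmse_full = fit_and_score(all_cols)

# Sensitivity ratio (assumed definition): validation error without the
# variable divided by validation error with all variables. Ratios near 1
# suggest the variable adds little predictive value.
ratios = {}
for j in all_cols:
    reduced = [c for c in all_cols if c != j]
    ratios[j] = fit_and_score(reduced) / rmse_full

for j, r in ratios.items():
    print(f"input x{j}: sensitivity ratio = {r:.2f}")
```

In practice each call to `fit_and_score` would retrain the ANN, and the same comparison could be repeated on the training error.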

Yet another way to reduce the number of input variables, and therefore the dimensionality of the modeling problem, is to eliminate or combine highly correlated variables. For example, evapotranspiration is highly dependent on temperature. However, the spatial and temporal variability of evapotranspiration across a study area is a function of differences in land use, type of vegetation, surface slopes, etc., and may produce significant variations in the correlation between evapotranspiration and temperature. In cases where there is little variation in the correlation between the two variables, a single variable (e.g., temperature) may be used in lieu of the two. In areas where there is significant variation between the two variables, a single lumped value that is the sum or average of the two variables may suffice.
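A simple way to screen for highly correlated candidate inputs is to compute their pairwise Pearson correlations. The sketch below does this for synthetic temperature, evapotranspiration, and precipitation series; the series, their parameters, and the 0.9 threshold are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic daily series (assumed): evapotranspiration tracks temperature
# closely, while precipitation is independent of both.
n = 365
temperature = 15 + 10 * np.sin(2 * np.pi * np.arange(n) / 365) + rng.normal(0, 1, n)
evapotranspiration = 0.2 * temperature + rng.normal(0, 0.3, n)
precipitation = rng.gamma(shape=0.5, scale=4.0, size=n)

names = ["temperature", "evapotranspiration", "precipitation"]
X = np.column_stack([temperature, evapotranspiration, precipitation])

# Pearson correlation matrix between candidate inputs (columns of X).
corr = np.corrcoef(X, rowvar=False)

# Flag pairs whose absolute correlation exceeds an assumed threshold.
threshold = 0.9
redundant_pairs = [
    (names[i], names[j], float(corr[i, j]))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]

for a, b, r in redundant_pairs:
    print(f"{a} and {b} are highly correlated (r = {r:.2f})")
```

A flagged pair is a candidate for elimination or lumping; whether to drop one variable or combine the two remains a judgment based on understanding of the system.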

For cases where there is a significant difference in the magnitude of the values, normalization should be used to offset these differences. Lastly, time lags for select input or predictor variables may significantly improve ANN forecasting accuracy, particularly where a “memory” in the system affects and/or is correlated with future system outcomes that the ANN model is predicting.
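Normalization and time lags can be sketched together as follows. Min-max scaling is only one of several possible normalization schemes and is assumed here for simplicity; the helper names `min_max_normalize` and `make_lagged_patterns`, the data values, and the choice of two lags are hypothetical.

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1-D series to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def make_lagged_patterns(series, n_lags):
    """Build (inputs, targets): each input row holds the previous n_lags
    values and the target is the value at the next time step."""
    series = np.asarray(series, dtype=float)
    inputs = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
    targets = series[n_lags:]
    return inputs, targets

# Hypothetical water-level record (m) and pumping rate (m^3/day): the two
# differ greatly in magnitude, so both are normalized before use.
water_level = np.array([101.2, 101.0, 100.7, 100.9, 101.4, 101.1])
pumping_rate = np.array([2500.0, 2600.0, 2700.0, 2400.0, 2300.0, 2550.0])

level_scaled = min_max_normalize(water_level)
pumping_scaled = min_max_normalize(pumping_rate)

# Use the two previous water levels as lagged inputs.
X_lags, y = make_lagged_patterns(level_scaled, n_lags=2)

# Append the pumping rate from the time step preceding each target.
X = np.column_stack([X_lags, pumping_scaled[1:-1]])

print(X.shape, y.shape)
```

Each row of `X` pairs two lagged state values with one control variable, and `y` holds the state at the following time step.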

2.6.2  Determining the Number of Hidden Nodes

Identifying the “optimal” number of hidden nodes is problem dependent, and a certain amount of trial and error is necessary. From Kolmogorov’s theorem, Hecht-Nielsen [9] derived that the upper bound of the required number of hidden nodes is one greater than twice the number of input nodes. The number of hidden nodes must satisfy two simultaneous objectives: it must be large enough to represent the task sufficiently, yet small enough to achieve generalization and avoid over-fitting. If the data do not contain much information, or contain a high degree of noise, fewer hidden nodes than the theoretical limit are advisable in order to prevent over-fitting. In some cases, a “fan-in” approach may be desirable, where fewer hidden nodes are used relative to the number of input nodes. This “fan-in” structure reduces the dimensionality of the data set, promoting generalization. Therefore, in many cases, the optimum number of hidden nodes may be significantly less than the theoretical limit.
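The upper bound quoted above, and the way the number of connection weights grows with the number of hidden nodes, can be sketched in a few lines. The weight count assumes a fully connected network with a single hidden layer and one bias per hidden and output node; the function names and the problem size are hypothetical.

```python
def hidden_node_upper_bound(n_inputs):
    """Upper bound on hidden nodes cited in the text: 2 * inputs + 1."""
    return 2 * n_inputs + 1

def count_weights(n_inputs, n_hidden, n_outputs, bias=True):
    """Connection weights in a fully connected network with one hidden layer.

    Assumes one bias term per hidden and output node when bias=True.
    """
    extra = 1 if bias else 0
    return (n_inputs + extra) * n_hidden + (n_hidden + extra) * n_outputs

n_inputs, n_outputs = 6, 1          # hypothetical problem size
upper = hidden_node_upper_bound(n_inputs)

# 3 hidden nodes is a "fan-in" choice; `upper` is the theoretical limit.
for n_hidden in (3, 6, upper):
    n_weights = count_weights(n_inputs, n_hidden, n_outputs)
    print(n_hidden, "hidden nodes ->", n_weights, "weights")
```

For six inputs and one output, moving from 3 hidden nodes to the bound of 13 raises the number of weights from 25 to 105, which suggests why a smaller hidden layer can help generalization when data are limited.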

2.6.3  Training Patterns for ANN Learning

Because ANNs are “data-driven” models, robust ANN development depends fundamentally on the quantity and quality of the data used to train them. As discussed by Coppola et al. [4], “appropriate training set size for an ANN depends upon a number of factors, including its dimension (i.e., number of connection weights), the required ANN accuracy, the probability distribution of behavior, the level of noise in the system, and the complexity of the system.” Complexity within the context of ANN modeling refers to a system where small changes in model input values produce large and even contradictory changes in model output values. A system that does not exhibit this type of complexity may then be referred to as a “well-behaved” system.

There is no theoretical derivation for determining the number of necessary training patterns for a given ANN model development problem. However, some researchers suggest that the minimum number of training data required for robust ANN model development is

Minimum number of required training samples = [(1.5 × m + 55)(1. × n)] × c    (2.6)

where

m is the number of input nodes,
n is the number of output nodes, and
c is some constant, typically ranging between 4 and 10.

Note that the previous equation does not account for the number of hidden nodes in the ANN. It can be stated that, in general, more connection weights, partly a function of hidden nodes, necessitate more training samples. Therefore, c can be expected to increase with a higher number of hidden nodes for a particular modeling problem. Similarly, c will increase with more complex and/or nonlinear behavior.

Ideally, the training samples should span the range of measured or expected behavior. Therefore, it is not simply a matter of using a sufficient number of training samples but the degree to which they statistically represent the problem behavior of interest over its full range.
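One simple, partial check on whether the training samples span the conditions of interest is to flag new input patterns that fall outside the per-variable range seen during training. The sketch below assumes this min/max test is an adequate first screen; it does not detect gaps inside the range or unusual combinations of inputs. The function name and data are hypothetical.

```python
import numpy as np

def outside_training_range(X_train, X_new):
    """Return a boolean mask marking rows of X_new in which at least one
    input falls outside the [min, max] range seen during training."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    return np.any((X_new < lo) | (X_new > hi), axis=1)

# Hypothetical training inputs: [pumping rate, precipitation]
X_train = np.array([[2300.0, 0.0], [2500.0, 12.0], [2700.0, 30.0]])
X_new = np.array([[2400.0, 10.0],   # inside the training range
                  [3100.0, 5.0]])   # pumping rate exceeds training maximum

mask = outside_training_range(X_train, X_new)
print(mask)
```

Predictions for flagged patterns require extrapolation beyond the training data and may warrant less confidence.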

2.6.4  Over-fitting Data

As discussed earlier, over-fitting of data should be avoided in ANN modeling and, fortunately, can be avoided by following basic protocol. Often, ANN modelers are excessively intent on reducing or minimizing training error to the maximum extent possible. Typically, a researcher will compare two competing models, and even when the relative error difference is almost insignificant, the researcher or modeler will select the model with the lowest error as the de facto superior one.

While low training and validation errors are obviously desirable, they do not ensure that one has developed a robust ANN (or other) model that is capable of generalizing system behavior over a wide range of conditions, or that a particular model is necessarily superior to a competing or comparison model with a larger error. To help demonstrate this important concept, Figure 2.3 illustrates an over-fitting example, where the parabola represents the exact function for the system behavior, and the measurements have some random errors.

A function perfectly fit to the measurements produces an error of 0 between the measured and fitted values; however, the fitted function does not show the basic properties of the true function, monotonicity and convexity, and instead produces a wavy, oscillating curve.

The knowledge of the behavior of the data as well as their accuracy can be used to determine the needed accuracy of the ANN fit. One does not want to fit a function with a smaller error than the errors of the data; when this occurs, one is fitting random errors, rather than the tendency of the function, which is the objective.
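The over-fitting behavior described in this section can be reproduced with a small numerical experiment. In this sketch, polynomials stand in for the fitted model (an assumption made to keep the example short; no ANN is trained), the true function is taken to be y = x^2 on [0, 2], and the measurement noise level of 0.2 is an assumed value.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)

# True system behavior (assumed): y = x^2 on [0, 2], monotonic and convex.
def true_function(x):
    return x ** 2

noise_sd = 0.2                        # assumed measurement error
x_meas = np.linspace(0.0, 2.0, 10)
y_meas = true_function(x_meas) + rng.normal(0.0, noise_sd, x_meas.size)

# Degree 9 passes through all 10 measurements; degree 2 matches the true form.
exact_fit = Polynomial.fit(x_meas, y_meas, deg=9)
smooth_fit = Polynomial.fit(x_meas, y_meas, deg=2)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

x_dense = np.linspace(0.0, 2.0, 401)
results = {}
for name, model in (("degree 9", exact_fit), ("degree 2", smooth_fit)):
    results[name] = {
        # error at the measurements (what the fit minimizes)
        "fit_rmse": rmse(model(x_meas), y_meas),
        # error against the true curve (what the modeler actually wants)
        "true_rmse": rmse(model(x_dense), true_function(x_dense)),
    }
    print(name, results[name])

# A fit error well below the measurement error signals that noise is being fitted.
print("degree 9 fits below the noise level:",
      results["degree 9"]["fit_rmse"] < noise_sd)
```

The interpolating polynomial drives the fit error to essentially zero, which is far below the assumed measurement error, yet it tracks the true parabola less closely than the smoother fit whose residuals are comparable to the noise.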