
Chapter 2 Hybrid Intelligent System Data Mining Techniques and

3.2 Background

An important application of environmental hydraulics is the prediction of the fate and transport of pollutants released into watercourses, either as a result of accidents or as regulated discharges. Such predictions depend primarily on the water velocity, longitudinal mixing, and chemical and physical reactions; among these, the longitudinal dispersion coefficient is the key variable describing longitudinal spreading in a river. Since the concept was first introduced by Taylor (1954), extensive studies based on experimental and field data have been carried out to predict the dispersion coefficient (Jobson 1997; Seo and Cheong 1998; Deng, Singh et al. 2001; Wallis and Manson 2004; Boxall and Guymer 2007). The majority of this work has used the Advection-Dispersion Equation approach, because its strong physical basis makes it more amenable to predicting conditions in rivers and streams for which no model has previously been calibrated (Wallis and Manson 2004).

The concept of the longitudinal dispersion coefficient was first introduced by Taylor (1954). Based on this work, the following integral expression was developed (Fischer, List et al. 1979; Seo and Cheong 1998) and is generally accepted:

$$K = -\frac{1}{A}\int_{0}^{B} h u' \int_{0}^{y} \frac{1}{\varepsilon_t h} \int_{0}^{y} h u' \, dy \, dy \, dy \tag{3.1}$$

where K = longitudinal dispersion coefficient; A = cross-sectional area; B = channel width; h = local flow depth; u' = deviation of local depth-mean flow velocity from cross-sectional mean; y = coordinate in the lateral direction; and εt = local (depth-averaged) transverse mixing coefficient. An alternative approach utilises field tracer measurements and applies the method of moments. It is also well documented in the literature (Rutherford 1994; Guymer 1999; Rowinski, Piotrowski et al. 2005) and defines K as

$$K = \frac{U_c^{2}}{2}\,\frac{\sigma_t^{2}(x_2) - \sigma_t^{2}(x_1)}{\bar{t}_2 - \bar{t}_1} \tag{3.2}$$

where Uc = mean velocity; x1 and x2 denote the upstream and downstream measurement sites; t̄ = centroid travel time; and σt²(x) = temporal variance.
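The method of moments in Eq.(3.2) can be illustrated with a minimal Python sketch. It assumes concentration-time curves sampled at the same instants at both sites, and it assumes that Uc is estimated as the reach length divided by the difference in centroid travel times (the text only describes Uc as the mean velocity). The function names and the synthetic Gaussian tracer curves are hypothetical and are not taken from the cited studies.

```python
import math


def temporal_moments(times, conc):
    """Return (centroid, variance) of a concentration-time curve (trapezoidal rule)."""
    def trapz(values):
        return sum(
            0.5 * (values[i] + values[i + 1]) * (times[i + 1] - times[i])
            for i in range(len(times) - 1)
        )

    m0 = trapz(conc)                                        # zeroth moment (area)
    m1 = trapz([c * t for c, t in zip(conc, times)])        # first moment
    m2 = trapz([c * t * t for c, t in zip(conc, times)])    # second moment
    centroid = m1 / m0
    variance = m2 / m0 - centroid ** 2
    return centroid, variance


def dispersion_from_moments(times, conc_up, conc_down, distance):
    """Estimate K from Eq.(3.2) for two sites separated by `distance`."""
    t1, var1 = temporal_moments(times, conc_up)
    t2, var2 = temporal_moments(times, conc_down)
    travel_time = t2 - t1
    u_c = distance / travel_time  # ASSUMPTION: Uc taken as reach length / travel time
    return 0.5 * u_c ** 2 * (var2 - var1) / travel_time


# Synthetic (illustrative, not measured) Gaussian tracer curves, time in seconds.
times = [float(t) for t in range(0, 601)]
conc_up = [math.exp(-0.5 * (t - 100.0) ** 2 / 100.0) for t in times]    # variance 100 s^2
conc_down = [math.exp(-0.5 * (t - 300.0) ** 2 / 500.0) for t in times]  # variance 500 s^2
k_est = dispersion_from_moments(times, conc_up, conc_down, distance=200.0)
```

Because the synthetic curves have clean, fully resolved tails, the variances are recovered accurately here; with field data, long or truncated tails inflate the second moment and can distort the estimate.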

However, owing to the requirement for detailed transverse profiles of both velocity and cross-sectional geometry, Eq.(3.1) is rather difficult to use. Furthermore, Eq.(3.2), called the method of moments (Wallis and Manson 2004), requires measurements of concentration distributions and can be subject to serious errors due to the difficulty of evaluating the variances of the distributions caused by elongated and/or poorly defined tails. As a result, extensive studies have been made based on experimental and field data for predicting the dispersion coefficient (Jobson 1997; Seo and Cheong 1998; Deng, Singh et al. 2001; Wallis and Manson 2004).
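To make the data requirement of Eq.(3.1) concrete, the following sketch evaluates the triple integral numerically with the trapezoidal rule. It is an illustration under stated assumptions, not the procedure used in the cited works: the transverse profiles of depth, depth-mean velocity and transverse mixing coefficient are assumed to be known at every node, the depth is assumed to be strictly positive at all nodes (including the banks), and all function names and the synthetic cosine velocity profile are hypothetical.

```python
import math


def cumulative_trapezoid(y, f):
    """Running integral of f from y[0] to each y[i] (trapezoidal rule)."""
    total = [0.0]
    for i in range(1, len(y)):
        total.append(total[-1] + 0.5 * (f[i] + f[i - 1]) * (y[i] - y[i - 1]))
    return total


def dispersion_from_profiles(y, h, u, eps_t):
    """Evaluate Eq.(3.1) from transverse profiles h(y), u(y) and eps_t(y).

    ASSUMPTION: h > 0 and eps_t > 0 at every node, so 1/(eps_t * h) is finite.
    """
    area = cumulative_trapezoid(y, h)[-1]
    discharge = cumulative_trapezoid(y, [hi * ui for hi, ui in zip(h, u)])[-1]
    u_mean = discharge / area                                   # cross-sectional mean velocity
    hu_dev = [hi * (ui - u_mean) for hi, ui in zip(h, u)]       # h * u'
    inner = cumulative_trapezoid(y, hu_dev)                     # innermost integral
    middle = cumulative_trapezoid(
        y, [ii / (ei * hi) for ii, ei, hi in zip(inner, eps_t, h)]
    )
    outer = cumulative_trapezoid(y, [a * b for a, b in zip(hu_dev, middle)])[-1]
    return -outer / area


# Synthetic rectangular channel (illustrative values only, SI units).
n = 201
width, depth, eps, amp = 10.0, 1.0, 0.01, 0.1
y = [width * i / (n - 1) for i in range(n)]
u = [0.5 + amp * math.cos(math.pi * yi / width) for yi in y]
k_profile = dispersion_from_profiles(y, [depth] * n, u, [eps] * n)
```

For this idealised case (constant depth and mixing coefficient, cosine velocity deviation) the integral has the closed form amp² · width² / (2π² · eps), which gives a quick check on the discretisation. In a real river all three profiles must be measured across the section, which is the practical difficulty noted above.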

For example, employing 59 hydraulic and geometric datasets measured in 26 rivers in the United States, Seo and Cheong (1998) used dimensional analysis and applied the one-step Huber method, a nonlinear multi-regression method, to derive the following equation:

$$K = 5.915\,(H u_*)\left(\frac{B}{H}\right)^{0.62}\left(\frac{U}{u_*}\right)^{1.428} \tag{3.3}$$

in which u* = shear velocity. This technique uses the easily measurable hydraulic variables B, H and U, together with the shear velocity u*, a frequently used parameter that is extremely difficult to quantify accurately in field applications, to estimate the dispersion coefficient K from Eq.(3.3). Another empirical equation, developed by Deng et al. (2001), is a more theoretically based approximation of Eq.(3.1); it includes not only the conventional parameters (B/H) and (U/u*) but also the effect of the transverse mixing coefficient εt0, as follows:

$$K = \frac{0.15\,H u_*}{8\,\varepsilon_{t0}}\left(\frac{B}{H}\right)^{5/3}\left(\frac{U}{u_*}\right)^{2} \tag{3.4}$$

where

$$\varepsilon_{t0} = 0.145 + \frac{1}{3520}\left(\frac{B}{H}\right)^{1.38}\left(\frac{U}{u_*}\right) \tag{3.5}$$
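The two empirical predictors, Eq.(3.3) and Eq.(3.4) with Eq.(3.5), can be written as short Python functions for side-by-side comparison. This is a sketch only: H and U are taken here to be the mean flow depth and mean flow velocity (an assumption, since this section does not define them explicitly), SI units are assumed, and the function names and the input values in the usage lines are hypothetical rather than measured data.

```python
def k_seo_cheong(B, H, U, u_star):
    """Dispersion coefficient from Eq.(3.3) (Seo and Cheong 1998)."""
    return 5.915 * (H * u_star) * (B / H) ** 0.62 * (U / u_star) ** 1.428


def transverse_mixing_deng(B, H, U, u_star):
    """Transverse mixing term eps_t0 from Eq.(3.5) (Deng et al. 2001)."""
    return 0.145 + (1.0 / 3520.0) * (B / H) ** 1.38 * (U / u_star)


def k_deng(B, H, U, u_star):
    """Dispersion coefficient from Eq.(3.4) (Deng et al. 2001)."""
    eps_t0 = transverse_mixing_deng(B, H, U, u_star)
    return (0.15 * H * u_star / (8.0 * eps_t0)
            * (B / H) ** (5.0 / 3.0) * (U / u_star) ** 2)


# Hypothetical reach (illustrative values, not field data):
# width 20 m, depth 1 m, mean velocity 0.5 m/s, shear velocity 0.05 m/s.
B, H, U, u_star = 20.0, 1.0, 0.5, 0.05
k_sc = k_seo_cheong(B, H, U, u_star)
k_dg = k_deng(B, H, U, u_star)
```

Evaluating both functions for the same reach shows how far two formulae built on the same bulk parameters can differ, and how sensitive each is to the shear velocity through the ratio U/u*.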

These equations are easy to use, assuming measurements or estimates of the bulk flow parameters are available. However, they may be unable to capture the complexity of the interactions of the fundamental transport and mixing mechanisms, particularly those created by non-uniformities across the wide range of channels encountered in nature. In addition, the advantage of one expression over another is often just a matter of the selection of data and the manner of their presentation. Regardless of the expression applied, one may easily find an outlier in the data, which definitely does not support the applicability of a particular formula. An expectation that, in spite of the complexity of the river reach, the dispersion coefficient may be represented by one of the empirical formulae seems exaggerated (Rowinski, Piotrowski et al. 2005).

Furthermore, most of these studies were carried out under specific assumptions and channel conditions, so the performance of the equations varies widely even for the same stream and flow conditions. For instance, Seo and Cheong (1998) used 35 of the 59 measured datasets to establish Eq.(3.3) and the remaining 24 to verify their model, while the model of Deng et al. (2001) (Eq.(3.4) and Eq.(3.5)) is limited to straight and uniform rivers and assumes a width-to-depth ratio greater than 10. Therefore, a model with greater general applicability is desirable.

Recently, ANN modelling approaches have been embraced enthusiastically by practitioners in water resources, as they are perceived to overcome some of the difficulties associated with traditional statistical approaches, e.g. making assumptions with regard to stream geometry or flow dynamics (Maier and Dandy 1998). They offer an effective approach for handling large amounts of dynamic, non-linear and noisy data, especially when the underlying physical relationships are not fully understood (Haykin 1994; Hagan, Demuth et al. 1996; Cannas, Fanni et al. 2006).

In specific terms, several authors (Kashefipour, Falconer et al. 2002; Rowinski, Piotrowski et al. 2005; Tayfur and Singh 2005; Piotrowski, Rowinski et al. 2006; Tayfur 2006) have reported successful applications of ANNs to the prediction of the dispersion coefficient. For example, Tayfur and Singh (2005) trained and tested an ANN using 71 data samples of hydraulic and geometric variables and dispersion coefficients measured on 29 streams and rivers in the United States, with the result that 90% of the dispersion coefficient was explained. Rowinski, Piotrowski et al. (2005) applied a multilayer perceptron (MLP) trained with the Levenberg-Marquardt (LM) algorithm to three different datasets that have been explored in the literature. The lowest percentage of training data mean error was found to be 7.02%. However, these applications lack a suitable methodology for determining the inputs to the ANN models. Moreover, without further interpretation of the trained network, their results are not easily transferable.

In document Intelligent data mining using artificial neural networks and genetic algorithms : techniques and applications (Page 101-106)