• No results found

Runoff forecasting by artificial neural network and conventional model

N/A
N/A
Protected

Academic year: 2021

Share "Runoff forecasting by artificial neural network and conventional model"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

ORIGINAL ARTICLE

Runoff forecasting by artificial neural network

and conventional model

A.R. Ghumman

a,

*

, Yousry M. Ghazaw

a

, A.R. Sohail

b

, K. Watanabe

c

a

Department of Civil Engineering, Al Qassim University, Saudi Arabia

b

Hydrology Modeler, Water Resources Murray–Darling Basin Authority, Australia

c

Saitama Package-D, Technical Development Center, Saitama University, 525, 21st Century Building, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Japan

Received 11 October 2011; accepted 24 November 2011 Available online 14 February 2012

KEYWORDS

Hub River; ANN models; Mathematical models; Low quality data; Runoff analysis

Abstract Rainfall runoff models are highly useful for water resources planning and development. In the present study rainfall–runoff model based on Artificial Neural Networks (ANNs) was devel-oped and applied on a watershed in Pakistan. The model was develdevel-oped to suite the conditions in which the collected dataset is short and the quality of dataset is questionable. The results of ANN models were compared with a mathematical conceptual model. The cross validation approach was adopted for the generalization of ANN models. The precipitation used data was collected from Meteorological Department Karachi Pakistan. The results confirmed that ANN model is an impor-tant alternative to conceptual models and it can be used when the range of collected dataset is short and data is of low standard.

ª2012 Faculty of Engineering, Alexandria University. Production and hosting by Elsevier B.V. All rights reserved.

1. Introduction

Accurate predictions of floods had been challenging for scien-tists and engineers since centuries. Different techniques had been employed, with various improvements, to get accurate flood estimates[3–6]. Rainfall–runoff (R–R) analysis is quite difficult due to presence of complex nonlinear relationships in the transformation of rainfall into runoff. However runoff analyses are very important for the prediction of natural calamities like floods and droughts. It also plays a vital role in the design and operation of various components of water re-sources projects like barrages, dams, water supply schemes, etc. Runoff analyses are also needed in water resources plan-ning, development and flood mitigations. A flood in 2010 re-sulted in a loss of several lives and damage of property and * Corresponding author.

E-mail addresses: abdulrazzaq@uettaxila.edu.pk (A.R. Ghumman),

ghazaw@yahoo.co (Y.M. Ghazaw), sohailrai@yahoo.com

(A.R. Sohail),kunio@mail.saitama-u.ac.jp(K. Watanabe).

1110-0168 ª 2012 Faculty of Engineering, Alexandria University. Production and hosting by Elsevier B.V. All rights reserved. Peer review under responsibility of Faculty of Engineering, Alexandria University.

doi:10.1016/j.aej.2012.01.005

Production and hosting by Elsevier

Alexandria University

Alexandria Engineering Journal

www.elsevier.com/locate/aej www.sciencedirect.com

(2)

crops of millions of dollars in Pakistan only because there was no assessment of flood magnitude. Various types of modeling tools had been used to estimate runoff. These techniques con-sist of lumped conceptual models, distributed physically based models, stochastic models and black box (time series) models [23,3,4].

Conceptual and physically based models although try to ac-count for all the physical processes involved in the R–R pro-cess, their successful use is limited mainly because of the need of catchment specific parameters and simplifications in-volved in the governing equations[1]. On the other hand the use of time series stochastic models is complicated due to non-stationary behavior and nonlinearity in the data. These models often require experience and expertise of the modeler[10].

Approximately, from the last two decades Artificial Neural Networks (ANNs) has emerged as a powerful computing sys-tem for highly complex and nonlinear syssys-tems. ANN belongs to the black box time series models and offers a relatively flex-ible and quick means of modeling. These models can treat the nonlinearity of system to some extent due to their parallel architecture. A few studies reported poor performances of ANN models in comparison with the conventional ones. For example Gaume and Gosset[9]compared feed forward ANNs with a linear model and a conceptual model. They concluded that their conceptual model outclassed the linear and ANN models. Some other studies show the successful applications of ANN models in the simulation of future runoffs with high degree of accuracy[2]. Maria et al.[12]compared ANN and Box & Jenkins techniques and concluded that ANN is an improvement on Box & Jenkins model. Sohail et al.[22] com-pared ANN with MARMA (multivariate auto regressive mov-ing average models[7]) models in a small watershed of Tono area in Japan during wet and dry seasons. They concluded that ANN models show better results during wet seasons when the nonlinearity of R–R process is high.

From the above discussion it is evident to state that ANN models provide better predictions as compared to the conven-tional models, however their application is as yet restricted with the research environment [11]. The noise contained in the data is a recurrent problem in hydrological time series which adversely affect the performance of a numerical model including ANN. Flow measurements obtained by measuring stage (water surface elevation) are often subject to many er-rors. Approximations in determining stage–discharge relation-ships or changes in the cross section of the stream after a flood, for example, may affect the accuracy of measurement [9]. Noise level in the hydrological dataset usually ranges between 5% and 15%[20]. However, it is a case–subjective and depends upon the quantity and quality of instruments and infrastruc-ture being used to collect the data. It also depends upon the availability of trained manpower. For the countries actively in-volved in the research the importance of the data collection is widely understood and the quality of data being collected is quite reliable. On the other hand there are many watersheds and rivers around the globe which either lack any data collec-tion system or the quantity of data is scarce and the quality of data being collected is substandard.

In the literature concerning the applications of ANN mod-els to simulate R–R processes, much of the published work de-scribes the applications of ANN models in the watersheds where the quality of dataset is good and is highly reliable. In addition sufficient quantity of data is available for the proper

generalization of an ANN model. On the contrary, applica-tions of ANN models have very little been reported in water-sheds where the quality of collected dataset is poor and quantity-wise the data is scarce[17]. The present study deals with the second case in which the hydrological dataset is quan-tity-wise scarce. Moreover the quality of data is unreliable and is of low standard with suspicion of high percentage of noise and errors. The main aim of this paper was thus to determine the confidence levels and degree of effectiveness of ANN mod-els under conditions when the collected data is scarce and its authority is questionable.

In the present work the ANN model was developed and ap-plied to catchement of Hub River in Pakistan. The datasets were available on monthly basis for rainfall and discharge but as already stated the length of collected dataset is short and the data is not of good standard. Sometimes abnormal measurements of discharge and rainfall were observed in odd months. Therefore in order to establish the guidelines for the construction of ANN model for R–R analyses in case of short length and low quality data, the results by ANN model for the catchement of this river were compared with those of concep-tual R–R model which was already developed for this water-sheds by Abulohom et al.[1].

2. Description of the study area

Abulohom et al.[1]described, in detail, the watershed of Hub River considered in this study. However for ready reference some of the important parameters of the watershed which are of significance in ANN applications are briefly discussed over here. The approximate location of the watershed is shown inFig. 1. The watershed was selected on the basis that there is no snow melt contribution to total runoff and a continuous re-cord of observed runoff is available. The Hub River watershed is located in the Sindh Province of Pakistan near the Arabian Sea. The elevation ranges between 300 and 1824 m above mean sea level. The drainage area is about 9391 km2. The major part

A ArraabbiiaannSSeeaa H HuubbRRiivveerr c caattcchhmmeenntt

Figure 1 The approximate location of the studied watershed in Pakistan.

(3)

of study area consists of narrow mountainous region. The cli-mate is arid. Mean annual precipitation is 194 mm, two third of which is received during summer season. Due to the arid cli-mate major part of the area is barren with sparse natural veg-etation. Cultivation is confined to only river plain where subsoil water is available for pumping. There are three types of land use: grazing, tube well cultivation and hill torrent water cultivation. The rocks mainly consist of sandstones, lime-stone’s, shale’s and conglomerates of various ages. All the soils are strongly calcareous and nonalkali.

3. Brief theory of models used in the study 3.1. Artificial Neural Network models (ANN)

The detailed methodology, architecture and working of ANN is given in many standard text (see for example[8]). Most of the ANN models used in hydrological applications are feed forward back propagation neural networks which are trained by back propagation algorithm. However, many different forms of ANN models have emerged since the last few years which have different architectures and algorithms according to the nature of the problem. For example, Nagesh et al.[13] used recurrent neural networks. Sohail et al.[21]proposed ge-netic algorithm to optimize an ANN model. In this study the ANN models based on back propagation have been used.

Processing of information in such ANN models is based on the functions of human brain. Majority of these models use three layers, architecture with input, hidden and output layers as shown inFig. 2. Each layer is further composed of artificial neurons known as nodes. The information of all the known parameters is input to the input nodes (i) which forward this information to all the nodes of hidden layer. At any hidden node (h), the information received from all the input nodes and the bias node of the input layer is summed up (i.e.,i1,i2,

i3,. . .,in,bi) as:

xH¼

Xj¼n j¼1

whjijþwhbbi; ð1Þ

where whj is the synaptic weight between hidden and input

layer,whbis the synaptic weight between hidden and bias node

of input layer, ‘i’ is the input node and ‘bi’ is the bias node of

input layer.

The summed up input (xH) is then mathematically treated by

using the activation function. Different types of activation func-tions have been used by various researchers. Shamseldin et al. [18]used five transfer functions i.e. logistic, hyperbolic tangent, bipolar, arctan and scaled arctan functions. However in most studies the original sigmoid logistic nonlinear activation func-tion as suggested by Rumelhart and McClelland[16]has been used and was also used in the present study. The sigmoid logistic activation function yields the output ofXHas:

XH¼

1

1þexH; ð2Þ

The processed values (XH) from hidden layer nodes are sent

to the output layer nodes in a similar fashion as from input nodes. At any output node (o) the signals received from all the hidden nodes and the bias node of hidden layer are again summed up (i.e.h1,h2,h3,. . .,hn,bh) in a similar manner as

Eq.(1), and acted upon by the activation function similarly as Eq.(2), and then compared with the target values. Mean square error (MSE) is computed at the output layer and if MSE is within acceptable limits the training of back propaga-tion ANN model is terminated otherwise the feed backward pass is carried out for updating of synaptic weights between hidden and output layers by using the back propagation equa-tion. The synaptic weights between input and hidden layers are also updated in an analogous manner.Fig. 2shows one hidden layer only, additional hidden layers may also be used, but most researchers recommended the use of single hidden layer[18]. They argued that the additional hidden layers may consume more resources and take considerable time to train the network with very little improvement in the generalization capability of the network and sometimes may result in over parameteriza-tion. In order to avoid the over fitting problem during the applications of back propagation ANN model to R–R analy-ses, the cross-validation approach has been adopted in this study. In cross-validation approach the available dataset is usually divided into three portions training, validation and test sets. Training set is used to train the model, validation set is used to avoid over or under training of model and test set is used to check the efficiency of trained model in actual field conditions.

4. Conceptual mathematical model

The mathematical model used in the present study has been de-scribed by Abulohom et al.[1]. The comprehensive details of this model will not be reproduced for brevity reasons and interested readers may consult the said publication. However the short introduction of the model is presented for the pur-pose of present study. The model was developed to simulate runoff from a watershed, considering both the loss and routing functions but with a few parameters to be optimized. The mod-el is based on the equation of continuity as following.

R¼PL; ð3Þ

whereRis total runoff,Pdenotes precipitation andL repre-sents losses. All quantities in this equation are in mm. The methodology adopted for modeling runoff from rainfall is essentially based on regression analysis[25]. The total runoff (R) is calculated as direct runoff and baseflow [25]. The monthly actual evapotranspiration is calculated as the

Input layer Hidden layer Output layer

Where, ‘i’ is input layer node, ‘h’ is hi dden layer node, ‘o’ is output layer node, bi’ is bias node of input layer and ‘bh’ is bias node of hidden layer.

i i11 i i22 b bii h h11 h h22 b bhh O O11 I Innppuutt O Ouuttppuutt

Figure 2 Architecture of back propagation ANN model used in the study.

(4)

able water during month and mean monthly potential evapo-transpiration (measured pan evaporation) [24]. The Muskin-gum method was used for routing the estimated runoff by the linear reservoir method[15]. The input to the model is pre-cipitation (Pi) and mean monthly potential evapotranspiration

(Ep)i, where time (i) is in months. Some functions convert

pre-cipitation into its various components and finally the total run-off is estimated. By suitably lagging the above runrun-off components, total runoff (Rs)iat the watershed outlet can be

computed. Observed runoff record is used for calibrating and testing the model.

5. Methodology, experiments and data used 5.1. Identification and selection of ANN models

One of the aims of the present paper was to train the ANN model to forecast the monthly future discharge for short length and low standard datasets. The considerable variation in the runoff coefficients, given inTable 1, shows the high percentage of errors and noises present in the collected datasets. In order to identify the most suitable inputs a cross correlation analysis was carried out. The results of cross correlation analysis sug-gested the following functional relationship between the vari-ous inputs and the monthly future discharge for Hub watershed.

Qðtþ1Þ ¼f

Z

½Qðt1Þ;QðtÞ;Pðt1Þ;PðtÞ; ð4Þ whereQis runoff,Pis precipitation andtis time in months.

From Eq.(4), four input nodes were selected. The hidden nodes were selected by the hit and trial method as recom-mended by most researchers [19]. After changing the hidden nodes from 2 to 15 and observing the MSE (mean square er-ror) on the trail data set and keeping in consideration the rule of parameter parsimony two hidden nodes were found to be most suitable for all the three watersheds. The output node

is only one i.e. 1 month future discharge. The finally selected architectures of back propagation ANN models were (4, 2, 1). The learning rate (lr) and momentum factor (mf) also

signif-icantly affect the generalization ability of back propagation ANN model. After careful consideration and by using hit and trial methodlrvalue was selected as 0.3 andmf value as

0.5. The starting values of synaptic weights also considerably affect the training process of back propagation ANN model. The synaptic weights initialized in the range of ±3.0 was found to be most suitable.

Table 1 Annual water budget and runoff coefficients for Hub River.

Water yeara Rainfall (mm) Runoff (mm) Runoff coefficient

1963–1964 129.01 44.45 0.34 1964–1965 1663.73 160.02 0.10 1965–1966 501.23 77.98 0.16 1966–1967 326.91 85.60 0.26 1967–1968 3056.59 499.10 0.16 1968–1969 56.20 24.39 0.43 1969–1970 102.15 42.92 0.42 1970–1971 1231.64 305.05 0.25 1971–1972 306.27 128.53 0.42 1972–1973 215.75 37.08 0.17 1973–1974 373.97 76.70 0.21 1974–1975 150.37 11.69 0.08 1975–1976 2912.86 351.54 0.12 1976–1977 4495.30 263.13 0.06 1977–1978 1659.20 451.11 0.27 1978–1979 3514.86 663.19 0.19 a

It should be noticed that the water year is calculated from October to September as found in the source from which the data was obtained. Training phase 0 10 20 30 40 50 60 12 24 36 48 60 72 84 96 108 Time in mounths Runoff in m m Obsd ANN Conc

Figure 3 Comparison of models results in training phase (data of 120 months (1963–1973)). Validation phase 0 2 4 6 8 10 12 14 16 18 20 122 124 126 128 130 132 134 136 138 140 142 144 Time in mounths Runoff in m m Obsd ANN Conc

Figure 4 Comparison of models results in validation phase (data of 24 months (October, 1973–September, 1975)).

Testing phase 0 10 20 30 40 50 60 150 156 162 168 174 180 186 192 Time in mounths Runoff in m m Obsd ANN Conc

Figure 5 Comparison of models results in testing phase (data of 48 months (October, 1975–September, 1979)).

(5)

Continuously measured monthly rainfall and runoff data was available from October 1963 to September1979. Ten years data (120 months data) was used for the training of ANN models, i.e. 1963–1973. For validation, 24 months (October, 73–September, 75) data was used. For the testing of ANN models 48 months (October, 75–September, 79) data was used. In this study sigmoid logistic nonlinear activation function has been used for the application of ANN model, therefore the data is required to be normalized in the range of 0–1. However some researchers recommended the limits of 0.1–0.9 to simu-late the floods exceeding the range of training sets [11]. For the reason the same limit was used in this study to normalize the data.

The performances of two models were compared with two different types of statistics parameters i.e. coefficient of effi-ciency (CE) and root mean square error (RMSE)[14]. 6. Results and discussion

The graphical comparison of results of the two models with the observed data is shown inFigs. 3–5. The results of three phases i.e. training, validation and test phases have been shown in these figures. To discuss the results of these figures and to assess the performance of the ANN and conceptual models the calculated statistical parameters have been given inTable 2. It can be ob-served that the ANN model performed better or at least equiva-lent to conceptual model during training and validation phases. The CE is 88.95% for ANN whereas it is 84.37% for conceptual model in training phase. Similarly the RMSE is 3.27 for ANN and 3.89 mm for conceptual model in training phase. In test phase conceptual model seemed to be better. The values of the CE and RMSE are 63.83% and 2.28 mm respectively for ANN and 54.33% and 2.56 mm respectively for conceptual model in validation phase. The values of the CE and RMSE are 63.22% and 11.54 mm respectively for ANN and 68.52% and 10.67 mm respectively for conceptual model in test phase. However, in terms of Shamseldin[19], the models with CE values above 90% are very satisfactory, in between 80% and 90% are fairly good and below 80% are unsatisfactory. In this context, it is observed that none of the two models showed good perfor-mance in statistical tests. It is well known that the R–R processes of the river are extremely nonlinear. The models might not be able to catch this nonlinear functional relationship. Another rea-son for this may be the high percentage of noise and errors con-tained in the dataset due to which the models efficiency is not very high to simulate the R–R processes. However for water re-sources planning and management the overall efficiency is found to be fairly acceptable.

7. Summary and conclusions

In the present study, back propagation artificial neural net-work model was developed for Hub River in Pakistan for

conditions when the collected hydrological data was short and containing high percentage of error and noise. The model was constructed to forecast monthly runoff. The performances of the ANN model have been compared with conceptual mod-el. It was concluded that ANN model provided improved monthly forecasts during the training phase. In validation and test phases, the performances of ANN model and concep-tual models were quite similar. However both the models could not produce very high efficiency results in validation and test phases. Based on the results it is concluded that ANN model is an important alternative to the conceptual model for the rainfall runoff analyses. It is relatively easy to develop and do not require the detailed investigation of the catchment hydrological and geological parameters as are essential for the application of conceptual, deterministic and other kinds of physically based models. The major advantage of ANN model is that it is independent of the hydrological and geolog-ical parameters of the catchment, which is a major difficulty in the application of conceptual models in a watershed. The re-sults further confirmed that ANN model is quite comparable to conceptual R–R models.

References

[1] M.S. Abulohom, S.M.S. Shah, A.R. Ghumman, Development of a rainfall–runoff model, its calibration and validation, Journal of Water Resources Management 15 (2001) 149–163. [2] A. Agarwal, R.D. Singh, Runoff modeling through back

propagation artificial neural network with variable rainfall– runoff data, Water Resources Management 18 (2004) 285–300. [3] A. Bahremand, F.De. Smedt, Distributed hydrological modeling

and sensitivity analysis in Torysa Watershed, Slovakia, Water Resources Management 22 (3) (2010) 393–408.

[4] A. Bahremand, F.De. Smedt, Predictive analysis and simulation uncertainty of a distributed hydrological model, Water Resources Management 24 (13) (2010) 2869–2880.

[5] E.G. Bekele, H.V. Knapp, Watershed modeling to assessing impacts of potential climate change on water supply availability, Water Resources Management 24 (13) (2010) 3299–3320. [6] A. Bhadra, A. Bandyopadhyay, R. Singh, N.S. Raghuwanshi,

Rainfall–runoff modeling: comparison of two approaches with different data requirements, Journal of Water Resource Management (24) (2010) 37–62.

[7] G.E. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control, Prentice Hall, San Francisco, California, 1976. [8] R.C. Eberhart, R.W. Dobbins, Neural Network PC Tools a

Practical Guide, Academic Press, New York, 1990, pp. 1–414. [9] E. Gaume, R. Gosset, Over-parameterization, a major obstacle

to the use of artificial neural networks in hydrology, Hydrology and Earth System Sciences 7 (5) (2003) 693–706.

[10] V. Gu¨ldal, H. Tongal, Comparison of recurrent neural network, adaptive neuro-fuzzy inference system and stochastic models in Egirdir Lake level forecasting, Journal of Water Resource Management 24 (2010) 105–128.

[11] C.E. Imrie, S. Durucan, A. Korre, River flow prediction using artificial neural networks: generalization beyond the calibration range, Journal of Hydrology (2000) 138–153.

Table 2 Comparative performances of ANN and conceptual models.

Model name Model structure Training set Validation set Test set

CE (%) RMSE (mm) CE (%) RMSE (mm) CE (%) RMSE (mm)

ANN 4, 2, 1 88.95 3.27 63.83 2.28 63.22 11.54

(6)

[12] C.M. Maria, G.M. Wenceslao, F.B. Manuel, M.P.S. Jose, L.C. Roman, Modelling of the monthly and daily behaviour of the runoff of the Xallas River using Box-Jenkins and neural networks methods, Journal of Hydrology 296 (2004) 38–58. [13] D.K. Nagesh, K.S. Raju, T. Sathish, River flow forecasting using

recurrent neural networks, Water Resources Management 18 (2004) 143–161.

[14] J.E. Nash, J.V. Sutcliffe, River flow forecasting through conceptual models, part1 – a discussion of principles, Journal of Hydrology 10 (1970) 282–290.

[15] V.M. Ponce, Catchment routing, Engineering Hydrology (1989) 306–331.

[16] D.E. Rumelhart, J.L. McClelland (Eds.), Parallel distributed processing: explorations in the microstructures of cognition, vol. 1, MIT, Foundations, Cambridge, MA, 1986.

[17] N. Sajikumar, B.S. Thandaveswara, A non-linear rainfall– runoff model using an artificial neural network, Journal of Hydrology 216 (1999) 32–55.

[18] A.Y. Shamseldin, A.E. Nasr, K.M. O’Connor, Comparison of different forms of the multi-layer feed forward neural network method used for river flow forecasting, Hydrological Earth System Sciences 6 (4) (2002) 671–684.

[19] A.Y. Shamseldin, Application of a neural network technique to rainfall–runoff modeling, Journal of Hydrology 199 (1997) 272–294. [20] B. Sivakumar, K.K. Phoon, S.-Y. Liong, C.-Y. Liaw, A systematic approach to noise reduction in chaotic hydrological time series, Journal of Hydrology 219 (1999) 103–135. [21] A. Sohail, K. Watanabe, S. Takeuchi, Stream flow forecasting

by artificial neural network (ANN) model trained by real coded genetic algorithm (GA), Journal of Groundwater Hydrology Japan 48 (4) (2006) 233–262.

[22] A. Sohail, K. Watanabe, S. Takeuchi, Runoff analysis for a small watershed of Tono area Japan by back propagation artificial neural network with seasonal data, Journal of Water Resources Management Springer (2007), doi:10.1007/s11269-006-9141-0. [23] T. Tingsanchali, M.R. Gautam, Application of tank, NAM,

ARMA and neural network models to flood forecasting, Hydrological Processes 14 (2000) 2473–2487.

[24] G.L. Vandewiele, N.-L. Win, Monthly water balance models for 55 basins in 10 countries, Hydrological Sciences Journal 43 (5) (1998) 687–699.

[25] G.L. Vandewiele, C.-Y. Xu, N.-L. Win, Methodology and comparative study of monthly water balance models in Belgium, China and Burma, Journal of Hydrology 134 (1992) 315–347.

References

Related documents

3.1, the trains of ISWs generated in E2 are separated by distances which are selected by the wavelength of the forcing beam, whereas in the ocean these trains are separated by

Teratological implications of soft tissue abnormalities found in human lower limbs with bony defects. Levinsohn EM, Crider

Solitary pulmonary metastasis from prostate cancer with neuroendocrine differentiation a case report and review of relevant cases from the literature WORLD JOURNAL OF SURGICAL

Role of cerebrospinal fluid, creatine kinase and lactate dehydrogenase enzyme levels in diagnostic and prognostic evaluation of tubercular and

This thesis approaches theories and illustrations that engulf important information about the scope, concept, challenges and benefits to the processes of data mining by

In this paper, we have considered a finite parallel-plate waveguide with four-layer material loading as a 2-D geometry that can form cavities, and analyzed the H -polarized plane

The triumvirate of beta cell regeneration solutions and bottlenecks to curing diabetes SUMEET P SINGH1 and NIKOLAY NINOV*,1,2,3 1DFG Center for Regenerative Therapies Dresden, Cluster

International Journal of Scientific Research in Computer Science, Engineering and Information Technology CSEIT1722172 | Received 22 March 2017 | Accepted 31 March 2017 | March April