Worldwide, water resources organizations are engaged in hydrological and hydro-meteorological management, observing water level, rainfall, discharge, sediment, evaporation, temperature and water quality data. These data are very useful in research, historical trend analysis and future forecasting. With the development of database technology, various data analysis techniques and knowledge extraction tools are being applied to collected time series data in scientific as well as commercial organizations. Data mining, also referred to as Knowledge Discovery in Databases (KDD), is defined as the "discovery of comprehensible, important and previously unknown rules or anything that is useful and non-trivial or unexpected from our collected data". Today, data mining is widely applied in research and business fields. Finding association rules, sequential patterns, classification and clustering of data are the tasks typically involved in the process of data mining. Data mining is mainly an iterative process in which data have to be critically selected and cleaned, and the parameters of the mining algorithms tuned. The nature and quality of collected data in hydrology are extremely important, and all characteristics of such data should be analysed as thoroughly as possible. Data mining in hydrology depends on hydro-meteorological data, which generally take the form of time series. Hydrological time series are sets of recorded values of hydrological variables that vary with time.
Copulas can be seen as an alternative method to analyze hydrological time series data by focusing on the dependence structure, but further exploratory applications and theoretical developments are expected. The copula-based measures introduced in this study can be related to the potential model uncertainty, that is, how much the natural system is varying. Empirical autocopula analysis is a more data-driven approach which retains more information than copulas estimated with parametric methods, but it is also numerically demanding. Effective ways to analyze time series and build time series models based on copulas can be further explored.
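As a concrete illustration of the data-driven approach, the empirical copula can be evaluated directly from the ranks of paired observations, with no parametric assumption. The sketch below is a minimal numpy implementation; the series and evaluation points are illustrative, not taken from the study.

```python
import numpy as np

def empirical_copula(x, y, u, v):
    """Empirical copula C_n(u, v) from paired observations x, y.

    Ranks are converted to pseudo-observations in (0, 1]; the copula value
    is the fraction of pairs whose pseudo-observations fall in [0,u] x [0,v].
    """
    n = len(x)
    rx = np.argsort(np.argsort(x)) + 1      # ranks 1..n of x
    ry = np.argsort(np.argsort(y)) + 1      # ranks 1..n of y
    return np.mean((rx / n <= u) & (ry / n <= v))

# Illustration: comonotone data (y a monotone transform of x) has the
# same dependence structure as x itself, so C(u, v) = min(u, v).
x = np.arange(1, 101)
y = x ** 2
print(empirical_copula(x, y, 0.5, 0.7))     # → 0.5 = min(0.5, 0.7)
```

Because the estimate depends only on ranks, the marginal distributions of the two series drop out entirely, which is exactly the property that makes copulas attractive for dependence analysis.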
The analysis shown in chapters 4 and 5 highlighted the advantages and disadvantages of investigating snow- and discharge-related variables with wavelet and copula analysis. On one hand, wavelet analysis highlights changes in the frequency content of a signal at different time steps and also makes it possible to investigate the correlation between two variables in the time-frequency domain. On the other hand, copula analysis makes it possible to isolate the dependence structure between two variables, independently of their marginals, and to describe the simultaneous correlation of several variables. Copula analysis has also proven to be a valuable tool for modeling hydrological variables. As a drawback, we pointed out the necessity of having long, reliable time series, which are rare in the case of hydrological variables. We think that wavelet and copula analysis are two valuable tools whose possible contribution to hydrological time series analysis has not been fully explored yet. It would be very interesting, for example, to couple wavelet coherence and copula analysis in order to examine the dependence structure of the frequency content of two signals in more depth. This would also make it possible to investigate how the dependence structure changes in time. In order to take advantage of both copula and wavelet analysis, one could therefore analyze with copulas the dependence structure of the scale components obtained after decomposing the signals of two time series with the wavelet transform. Another possible way to couple both analyses would be to identify the non-stationary correlation between two variables using the wavelet transform; separate copula analyses could then be performed for the different parts of the time series to identify how the dependence structure between the two variables changes across them.
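The proposed coupling can be sketched in a few lines: decompose two series with a wavelet transform (a hand-rolled one-level Haar transform here, to stay self-contained) and measure the dependence of the resulting detail coefficients with a rank-based statistic. This is a toy example on synthetic series; a full analysis would use a proper multi-level wavelet toolbox and fitted copula families rather than Kendall's tau alone.

```python
import numpy as np
from scipy.stats import kendalltau

def haar_level1(x):
    """One-level Haar transform: approximation and detail coefficients."""
    x = np.asarray(x, dtype=float)
    x = x[: len(x) // 2 * 2]                  # truncate to even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

# two synthetic series sharing a common high-frequency driver
rng = np.random.default_rng(0)
common = rng.normal(size=512)
p = common + 0.3 * rng.normal(size=512)       # "precipitation-like" series
q = common + 0.3 * rng.normal(size=512)       # "discharge-like" series

# dependence structure of the detail (high-frequency) scale components,
# summarized here by the rank-based Kendall's tau
_, dp = haar_level1(p)
_, dq = haar_level1(q)
tau, _ = kendalltau(dp, dq)
print(round(tau, 2))                          # strong positive dependence
```

Repeating this for each decomposition level would show how the dependence between the two signals is distributed across scales, which is the idea behind coupling the two analyses.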
Both proposed approaches could find wide application in hydrological studies, for example in the generation of non-stationary time series and in the development of novel bivariate analysis methods.
Methods are applied to identify the dominant frequencies of the seasonal component. This implicitly assumes that the seasonal component, if present in a simulated or real hydrologic series, is dominant enough to be identifiable through the periodogram estimate. Figure 7 shows a chaotic signal mixed with seasonal signals of different frequencies and amplitudes, and the corresponding periodogram estimates. As shown in Fig. 7, the dominant frequency is observed for each simulation. The amplitudes selected here represent the minimum values of the amplitudes, corresponding to the given frequencies, above which a peak can be distinguished in the periodogram. Thus, seasonal components of higher frequencies need higher amplitudes to make them identifiable. Once the frequencies are obtained, curve fitting techniques can be utilized to obtain the amplitudes of the seasonal component. The remaining component is the non-seasonal portion, which in this case would be the chaotic component. It is noted again that when extending this observation to real data, the non-seasonal component needs to be further decomposed into random and deterministic components. The deterministic component is next analyzed for chaotic signals. The separation of random and deterministic (possibly chaotic) signals is discussed in Sect. 4.2.2. Table 2 provides an example where the white noise component has been isolated from a series which is a mixture of chaotic signal, seasonality, and white noise. The original and recovered standard deviations of the white noise are shown along with the corresponding errors. Once the deterministic component, i.e. chaotic + seasonality, is isolated from the mixture, the seasonal component is separated by fitting a periodic curve with the dominant frequency found using frequency domain analysis. The remaining chaotic component is compared with the original Lorenz series used for the simulation.
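The identify-then-fit-then-subtract sequence described above can be sketched as follows. The sampling rate and the f = 10, A = 12 seasonal component are illustrative choices, and white noise stands in for the chaotic part; only the mechanics of periodogram peak detection and sinusoidal curve fitting are demonstrated.

```python
import numpy as np
from scipy.signal import periodogram

fs = 100.0                                   # sampling frequency (assumed)
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(1)
noise = rng.normal(scale=0.5, size=t.size)   # stand-in for the chaotic part
x = noise + 12 * np.sin(2 * np.pi * 10 * t)  # seasonal signal: f = 10, A = 12

# 1. dominant frequency = location of the periodogram peak
freqs, pxx = periodogram(x, fs=fs)
f_dom = freqs[np.argmax(pxx)]

# 2. least-squares curve fit of a sinusoid at the dominant frequency
design = np.column_stack([np.sin(2 * np.pi * f_dom * t),
                          np.cos(2 * np.pi * f_dom * t)])
coef, *_ = np.linalg.lstsq(design, x, rcond=None)
amplitude = np.hypot(coef[0], coef[1])       # recovered seasonal amplitude

# 3. subtracting the fitted seasonal component leaves the
#    non-seasonal (here: noise) portion
residual = x - design @ coef
print(f_dom, round(amplitude, 1))            # → 10.0 12.0
```

Fitting sine and cosine terms jointly, rather than a single sinusoid with unknown phase, keeps the fit linear in the coefficients.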
It is seen that as the white noise component in the mixture increases, the prediction skills represented by the statistics CC and MSE decrease. The PSR, with an embedding dimension of 10, is used for multi-step ahead forecasting of a mixture of chaotic series and seasonality (f = 10, A = 12; and f = 10, A = 40). Figure 8 shows the CC vs. forecast lead time and MSE vs. forecast lead time plots. Good prediction skills with CC = 1 and MSE = 0 for a mixture of chaotic and seasonal series are observed.
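A minimal sketch of multi-step forecasting in a reconstructed phase space is given below: iterated one-step nearest-neighbour prediction, evaluated with CC and MSE. The embedding dimension and the noise-free periodic test signal are illustrative (standing in for the deterministic mixture of the study), so near-perfect skill at short lead times is expected.

```python
import numpy as np

def embed(x, m, tau=1):
    """Delay-coordinate embedding: rows are [x_t, x_{t+tau}, ..., x_{t+(m-1)tau}]."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

def nn_forecast(train, m, steps):
    """Iterated one-step nearest-neighbour forecast in the reconstructed space."""
    vecs = embed(train, m)
    state = vecs[-1].copy()
    out = []
    for _ in range(steps):
        # nearest predecessor (exclude the last vector: it has no successor)
        d = np.linalg.norm(vecs[:-1] - state, axis=1)
        j = np.argmin(d)
        nxt = vecs[j + 1][-1]                # successor's newest coordinate
        out.append(nxt)
        state = np.append(state[1:], nxt)    # slide the window forward
    return np.array(out)

# deterministic test signal: short-lead forecasts should be near-perfect
x = np.sin(0.3 * np.arange(600))
pred = nn_forecast(x[:500], m=10, steps=20)
truth = x[500:520]
cc = np.corrcoef(pred, truth)[0, 1]
mse = np.mean((pred - truth) ** 2)
print(round(cc, 3), round(mse, 4))
```

Adding white noise to `x` and repeating the experiment reproduces the qualitative finding above: CC falls and MSE grows with the noise level and with the lead time.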
However, the inherent problem in answering this question is the subjectivity of what we call pattern and what we call scatter. The remaining uncertainty in discharge given rainfall, H(Q | P) (the aleatoric uncertainty), can be made arbitrarily small by choosing an extremely fine quantization and calculating H based on a corresponding joint histogram. It is important to realize that such a histogram is a model, and any smoothing or dimensionality reduction method used is also a model, so in principle no assessment of mutual information is model-free. Although model complexity control methods can give guidelines on how much pattern can be reasonably inferred from a data set, they usually do not account for prior knowledge. This prior knowledge may affect to a large degree what is considered a pattern – for example, by constraining the model class that is used to search for patterns or by introducing knowledge of underlying physics. In the algorithmic information theory sense, the prior knowledge can be expressed in the use of a specific computer language that offers a shorter program description for that specific pattern. Prior knowledge is then contained in the code, data and libraries available to the compiler for the language. An analogy in hydrology would be to have, e.g., a digital elevation model available, or the principle of mass balance, which a hydrological model can use but is considered as a truth not inferred from the current data set, and hence should not be considered part of the complexity of the explanation of those data.
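The quantization dependence of H(Q | P) is easy to demonstrate numerically: estimated from a joint histogram of a finite sample, the conditional entropy collapses as the binning becomes extremely fine, precisely because the histogram is itself a model choice. The rainfall-discharge data below are synthetic and the bin counts illustrative; entropies are in bits.

```python
import numpy as np

def conditional_entropy(p, q, bins):
    """H(Q | P) in bits, estimated from a joint histogram with `bins` bins per axis."""
    joint, _, _ = np.histogram2d(p, q, bins=bins)
    pj = joint / joint.sum()
    marg = pj.sum(axis=1)                       # marginal of P
    nz = pj > 0
    h_joint = -np.sum(pj[nz] * np.log2(pj[nz]))
    h_p = -np.sum(marg[marg > 0] * np.log2(marg[marg > 0]))
    return h_joint - h_p                        # H(P, Q) - H(P)

rng = np.random.default_rng(2)
rain = rng.gamma(2.0, 1.0, size=100)
discharge = 0.6 * rain + 0.4 * rng.gamma(2.0, 1.0, size=100)  # noisy response

# With a finite sample, an extremely fine quantization drives the estimate
# of the "aleatoric" term toward zero: most rainfall bins end up holding a
# single observation, which looks deterministic.
for b in (4, 2048):
    print(b, round(conditional_entropy(rain, discharge, b), 3))
```

The same estimator, applied to the same 100 points, thus reports very different amounts of "irreducible" scatter depending only on the quantization, which is the point being made in the text.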
Abstract. Flood-prone areas are associated with hydrological time series data such as rainfall, water level and river flow. Flood prediction becomes possible by relating these three types of data. However, in order to develop a multivariable prediction model based on a chaos approach, chaotic dynamics must first be identified in each dataset. The Sungai Galas at Dabong in Kelantan, Malaysia, a flood disaster area, was therefore selected for the analysis. Rainfall, water level and river flow data in this area were collected and analysed using the Cao method to identify the presence of chaotic dynamics. The hydrological data are uncertain and difficult to predict because they originate from a flood disaster area. Despite this uncertainty, the Cao method analysis showed the presence of chaotic dynamics in the rainfall, water level and river flow data of the Sungai Galas. Therefore, a multivariable flood prediction model can be implemented using a chaos approach.
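The Cao method referenced in the abstract can be sketched as follows: for each embedding dimension d, the statistic E1(d) = E(d+1)/E(d) is computed from nearest-neighbour distance ratios, and it saturates near 1 once d reaches the minimum embedding dimension of a deterministic series. This simplified illustration uses a clean periodic signal, not the hydrological data of the study.

```python
import numpy as np

def cao_e1(x, d_max, tau=1):
    """E1(d) = E(d+1)/E(d) of Cao's method; saturates near 1 once d exceeds
    the minimum embedding dimension of a deterministic series."""
    x = np.asarray(x, dtype=float)
    e = []
    for d in range(1, d_max + 2):
        n = len(x) - d * tau                  # vectors extendable to dimension d+1
        y = np.column_stack([x[i * tau: i * tau + n] for i in range(d)])
        y_ext = np.column_stack([y, x[d * tau: d * tau + n]])
        a = np.empty(n)
        for i in range(n):
            dist = np.max(np.abs(y - y[i]), axis=1)   # maximum norm, as in Cao (1997)
            dist[i] = np.inf                          # exclude the point itself
            j = np.argmin(dist)
            a[i] = np.max(np.abs(y_ext[i] - y_ext[j])) / dist[j]
        e.append(a.mean())
    e = np.array(e)
    return e[1:] / e[:-1]                     # E1(1), ..., E1(d_max)

x = np.sin(0.3 * np.arange(1000))             # clean low-dimensional signal
e1 = cao_e1(x, d_max=6)
print(np.round(e1, 2))                        # approaches 1 beyond small d
```

For a stochastic series E1(d) keeps drifting instead of saturating, which is what lets the method distinguish chaotic dynamics from noise in data such as the rainfall, water level and river flow records above.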
Although section 4.4 outlines that the forest inventory for the North Maroondah study sites is highly developed and adequate for forest growth modelling, it is evident that the hydrological time series has limitations, as four catchments were decommissioned in the early to mid-1990s. The study also raises the challenge of identifying trends in streamflow data under circumstances where two disturbances affect the streamflow yield simultaneously, as is the case for treated catchments regenerating after the 1939 fire. The effects of thinning, patch cuts, or the removal of understorey result in retained trees having increased water availability, which potentially increases water use per tree and counteracts streamflow gains due to reduced stocking densities (Jarvis, 1975). These vegetation dynamics raise challenges in identifying changes in streamflow trends due to treatment effects. Keeping these challenges in mind, the dissertation largely focuses attention on developing high resolution forest growth models for capturing spatiotemporal changes in hydrologically significant forest characteristics. Catchment level forest growth models of this kind may then provide insight into anticipated future
Data mining refers to extracting or mining knowledge from large amounts of data. The Time Series Data Mining (TSDM) methodology follows the time-delayed embedding process to predict future occurrences of important events. The TSDM framework combines the methods of phase space reconstruction and data mining to reveal hidden patterns predictive of future events in nonlinear, nonstationary time series. Time series data mining is dedicated to the development and application of novel computational techniques and patterns for the analysis of large temporal databases. Time series are an important class of temporal data objects and can easily be obtained from scientific and financial applications. Time series data are large in size, high dimensional, and need to be updated continuously. The rapid development of data mining provides a new method for water resources management, hydrology and hydroinformatics research. In hydrology, data mining depends on hydro-meteorological data, which generally take the structure of time series. Hydrological time series are sets of recorded values of hydrological variables that vary with time. Research based on data mining theory and hydrological techniques is needed to analyse the daily gauge, rainfall, sediment, evaporation, temperature and discharge time series of a particular river at a particular station for various types of study.
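The TSDM idea, reconstructing a phase space by time-delayed embedding and then searching it for temporal patterns that precede events, can be illustrated with a toy series in which a planted two-step pattern precedes each event. The series, the pattern, and the event threshold are all hypothetical; the point is only the embed-then-mine mechanics.

```python
import numpy as np

def embed(x, m):
    """Delay embedding: rows [x_t, ..., x_{t+m-1}] paired with the next value."""
    n = len(x) - m
    return np.column_stack([x[i:i + n] for i in range(m)]), x[m:]

# toy series: a spike ("event") is planted after every [low, high] pattern
rng = np.random.default_rng(3)
x = rng.normal(size=2000)
for i in range(2, 2000):
    if x[i - 2] < -1 and x[i - 1] > 1:
        x[i] = 5.0                           # planted event

vecs, nxt = embed(x, m=2)                    # phase space reconstruction
events = nxt > 3                             # "important event" = large next value
centroid = vecs[events].mean(axis=0)         # temporal pattern preceding events
print(np.round(centroid, 1))                 # recovers the [low, high] shape
```

Real TSDM implementations search for such event-predictive regions of the reconstructed space with an optimizer rather than a simple centroid, but the embedding-plus-labelling structure is the same.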
The Nearest-Neighbour Method (NNM) is data driven and non-parametric, which gives it a potential advantage: it needs no assumption about the form of the dependence or probability distribution, nor the estimation of many parameters. The use of NNM to model hydrologic processes and dynamics in rivers and streams has been well documented (e.g. Lall and Sharma, 1996; Yuan et al., 2000; Wang et al., 2001; Mehrotra and Sharma, 2006; Lee et al., 2011; Liu et al., 2012) since Karlsson and Yakowitz (1987) used it for rainfall–runoff forecasting. One of the important parts of NNM is choosing a proper distance measure, as different distance measures may behave quite differently (Qian et al., 2004). Euclidean distance (EUD) is a commonly used distance measure, which represents the absolute distance of a spatial point and is directly related to the coordinates of the point. The cosine angle distance (CAD) is another popular distance measure, which is sensitive to the direction of the feature vector, but it has not been used in hydrological time series analysis.
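The difference between the two distance measures is easy to see on a small example: EUD favours neighbours close in absolute position, while CAD favours neighbours pointing in the same direction regardless of magnitude. The feature vectors below are illustrative.

```python
import numpy as np

def euclidean(a, b):
    """EUD: absolute distance between two points."""
    return np.linalg.norm(a - b)

def cosine_angle(a, b):
    """CAD expressed as a dissimilarity: 1 - cosine of the angle between vectors."""
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([1.0, 2.0])
near_scaled = np.array([2.0, 4.0])   # same direction, twice the magnitude
near_abs = np.array([1.5, 1.5])      # close in absolute position

print(euclidean(query, near_scaled), euclidean(query, near_abs))
print(cosine_angle(query, near_scaled), cosine_angle(query, near_abs))
```

Under EUD the query's nearest neighbour is `near_abs`; under CAD it is `near_scaled` (whose CAD is exactly 0), so an NNM resampler built on CAD would pick a different analogue than one built on EUD.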
Abstract. Droughts are serious natural hazards, especially in semi-arid regions. They are also difficult to characterize. Various summary metrics representing the dryness level, denoted drought indices, have been developed to quantify droughts. They typically lump meteorological variables and can thus directly be computed from the outputs of regional climate models in climate-change assessments. While it is generally accepted that drought risks in semi-arid climates will increase in the future, quantifying this increase using climate model outputs is a complex process that depends on the choice and the accuracy of the drought indices, among other factors. In this study, we compare seven meteorological drought indices that are commonly used to predict future droughts. Our goal is to assess the reliability of these indices to predict hydrological impacts of droughts under changing climatic conditions at the annual timescale. We simulate the hydrological responses of a small catchment in northern Spain to droughts in present and future climate, using an integrated hydrological model calibrated for different irrigation scenarios. We compute the correlation of meteorological drought indices with the simulated hydrological time series (discharge, groundwater levels, and water deficit) and compare changes in the relationships between hydrological variables and drought indices. While correlation coefficients linked with a specific drought index are similar for all tested land uses and climates, the relationship between drought indices and hydrological variables often differs between present and future climate. Drought indices based solely on precipitation often underestimate the hydrological impacts of future droughts, while drought indices that addi-
In most empirical modeling of hydrological time series, the focus has been on modeling and predicting the mean behavior of the time series through conventional Autoregressive Moving Average (ARMA) models, as proposed by the Box-Jenkins methodology. The conventional models operate under the assumption that the series is stationary, that is, it has zero mean and either constant or season-dependent variance; they do not, however, take into account the second-order moment, or conditional variance. In the field of time series there are many linear processes that can be modeled by ARMA models. However, many time series do not exhibit linear behavior, and these processes cannot be well fitted by the common ARMA models. To adequately fit such non-linear time series, other, more complicated models, which have the ability to capture the dynamics of the series more precisely, have to be considered.
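As a minimal illustration of the autoregressive half of ARMA modeling, an AR(p) model can be fitted by ordinary least squares on lagged values. The sketch below recovers the coefficients of a simulated AR(2) process; a dedicated time series package would normally be used for full ARMA (and Box-Jenkins identification) work, so this is only the core idea.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p} + e_t."""
    # column k holds lag k+1, aligned with the targets x[p:]
    rows = np.column_stack([x[p - k - 1: len(x) - k - 1] for k in range(p)])
    design = np.column_stack([np.ones(len(rows)), rows])
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef                              # [c, a_1, ..., a_p]

# simulate a stationary AR(2) process and recover its coefficients
rng = np.random.default_rng(4)
a1, a2 = 0.6, -0.3
x = np.zeros(5000)
for t in range(2, 5000):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.normal()
print(np.round(fit_ar(x, p=2), 2))           # ≈ [0, 0.6, -0.3]
```

A fit like this captures only the conditional mean; modeling the conditional variance, the limitation raised above, requires the more complicated model families (e.g. ARCH/GARCH-type extensions) alluded to in the text.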
In this article, we present a data set comprising the majority of the recorded and modelled hydrometeorological and gravity time series at AGGO. The hydrological data set includes soil moisture and groundwater variations. Meteorological time series comprise air temperature, humidity, pressure, wind speed, solar net radiation, and precipitation. Additional modelled variables and parameters like soil properties, reference evapotranspiration, and local- and large-scale gravity time series are made available for further use. In this way, the gravity recordings at AGGO can conveniently be reduced for large-scale hydrology, atmosphere, and non-tidal ocean loading effects. The data set is divided into three levels comprising observed, processed, and modelled time series. Level 1 consists of unmodified recorded data. This type of data is suitable for all users interested in uncorrected observations that are not affected by any processing steps or other data manipulation applied by the provider. Users interested in filtered data corrected for known instrumental issues are advised to use Level 2 products. Level 2 data consist of Level 1 data corrected for artefacts and gaps. Level 3 products utilize the Level 2 outputs to model time series such as evapotranspiration or water storage in the vadose zone. The data set covers approximately 2.5 years between April 2016 and November 2018.
Abstract. The Rollesbroich headwater catchment located in western Germany is a densely instrumented hydrological observatory and part of the TERENO (Terrestrial Environmental Observatories) initiative. The measurements acquired in this observatory present a comprehensive data set that contains key hydrological fluxes in addition to important hydrological states and properties. Meteorological data (i.e., precipitation, air temperature, air humidity, radiation components, and wind speed) are continuously recorded, and actual evapotranspiration is measured using the eddy covariance technique. Runoff is measured at the catchment outlet with a gauging station. In addition, spatiotemporal variations in soil water content and temperature are measured at high resolution with a wireless sensor network (SoilNet). Soil physical properties were determined using standard laboratory procedures from samples taken at a large number of locations in the catchment. This comprehensive data set can be used to validate remote sensing retrievals and hydrological models, to improve the understanding of spatiotemporal dynamics of soil water content, to optimize data assimilation and inverse techniques for hydrological models, and to develop upscaling and downscaling procedures of soil water content information. The complete data set is freely available online (http://www.tereno.net, doi:10.5880/TERENO.2016.001, doi:10.5880/TERENO.2016.004, doi:10.5880/TERENO.2016.003) and additionally referenced by three persistent identifiers securing the long-term data and metadata availability.
The Fmask algorithm (Zhu and Woodcock, 2012) was used here to quantify the presence of clouds and shadows over each lake on each Landsat image (details in the Appendix). Level 1 Quality Assessment bands provided by USGS now employ Cmask, a C version of this cloud detection algorithm, thereby providing comparable data (Foga et al., 2017). Errors from clouds, shadows, and SLC-off pixels were then assessed and minimized in post-processing by removing images for each lake with interferences above a determined threshold. This optimal threshold was determined by minimizing the root mean squared error (RMSE) on surface area aggregated over time (15 years) in order to give importance to both the number and quality of the Landsat observations over time, and therefore compromise between maintaining high temporal resolution and minimizing spatial errors from clouds, shadows, and SLC-off interferences. Thresholds were varied in 5 % increments and the quality of fit in terms of R² and RMSE between remotely sensed surface area and field surface area was calculated for decreasing subsets of images. Gouazine lake (Nasri et al., 2004; Al Ali et al., 2008), which possessed both the longest time series (over 15 years) and the most accurate data and rating curves (updated six times), was used to optimize the thresholds. The approach was repeated on four other lakes (Morra, Dekikira, Guettar, Fidh Ali) to confirm the suitability of the thresholds and explore the resulting availability of suitable Landsat imagery over all lakes.
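The trade-off behind that threshold optimization can be sketched with a toy error model in which cloudier images yield noisier surface-area estimates. The error model, image count, and reference area below are hypothetical, not the Landsat data of the study; the sketch only shows why stricter thresholds lower the RMSE at the cost of discarding observations, which is the compromise the optimization balances.

```python
import numpy as np

def retained_rmse(areas, cloud_frac, field_area, thr):
    """Images kept at a cloud-fraction threshold, and the RMSE of their estimates."""
    keep = cloud_frac <= thr
    return keep.sum(), np.sqrt(np.mean((areas[keep] - field_area) ** 2))

rng = np.random.default_rng(6)
n = 400
cloud_frac = rng.uniform(0, 1, size=n)             # interference fraction per image
field_area = 5.0                                   # reference surface area (km^2)
# assumed error model: cloudier images give noisier surface-area estimates
areas = field_area + rng.normal(size=n) * (0.05 + 1.5 * cloud_frac ** 2)

# sweeping the threshold (5 % increments in the study) exposes the trade-off:
# a stricter threshold lowers the RMSE but sacrifices temporal resolution
for thr in (0.1, 0.5, 0.9):
    count, err = retained_rmse(areas, cloud_frac, field_area, thr)
    print(thr, count, round(err, 3))
```

In the study the objective was the RMSE of the area *aggregated over time*, so the optimum lands at an intermediate threshold rather than at the strictest one.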
For reservoir impact assessment, time series for either the pristine or the human-impacted period are usually too short to be used for calibration. For the first time, an integrated modelling framework was applied to a data-scarce tropical mountainous mesoscale catchment to assess hydrological drought risk by using naturalized and human-impacted reconstructed streamflow and two observed discharge time series. Comparing observed, simulated, reconstructed, and naturalized discharge time series is a widely used method to assess and quantify anthropogenic impacts on streamflow (Zhang et al., 2012; Deitch et al., 2013; López-Moreno et al., 2014; Chang et al., 2015; Räsänen et al., 2017). Our softly linked model set-up shows good results in terms of statistical efficiency performances and provides reliable simulations for both reconstructed and naturalized streamflow. This applies also to the low-flow simulations and hydrological drought periods, which usually pose the greatest challenges to hydrological modelling (Pilgrim et al., 1988; Nicolle et al., 2014). This method presents several advantages compared to statistics-based approaches such as Budyko curves or double mass curves. The key advantages of this approach are (1) the possibility of comparing long-term pristine and modified streamflow without relying on long-term hydropower release time series, (2) larger flexibility to account for reservoir influences at the local level, thus accurately allowing prediction of long-term influences of reservoirs on streamflow, and (3) the ability to simulate and analyse scenarios dealing with changes (land use, climate, etc.) in the catchment.
Rather than draw our own conclusions, we intended to illuminate the story of this fascinating area of science, and in particular the role played by Benoit Mandelbrot, who died in 2010. The facet of Mandelbrot's genius on show here was to use his strongly geometrical mathematical imagination to link some very arcane aspects of the theory of stochastic processes to the needs of operational environmetric statistics. Quite how remarkable this was can only be fully appreciated when one reminds oneself of the available data and computational resources of the early 1960s, even at IBM. The wider story [6,7] in which this paper's theme is embedded, of how he developed and applied in sequence, first the α-stable model in economics, followed by the fractional renewal model in 1/f noise, and then FBM, and a fractional hyperbolic precursor to the linear fractional stable models, and finally a multifractal model, all in the space of about 10 years, shows both mathematical creativity and a real willingness to listen to what the data was telling him. The fact that he (and his critics) were perhaps less willing to listen to each other is a human trait whose effects on this story—we trust—will become less significant over time.
a sediment transport equation together with the so-called Exner equation to account for sediment transport and storage effects in the riverbed. Examples are the one-dimensional 3ST1D model (Papanicolaou et al., 2004), the 1.5-dimensional FLORIS-2000 model (Reichel et al., 2000) or the semi two-dimensional stream tube SDAR model (Bahadori et al., 2006). Most two-dimensional bedload transport models have been developed for large riverine or estuarine environments. An example of a two-dimensional model applicable for steep slopes is the Flumen model (Beffa, 2005). The SETRAC model (Rickenmann et al., 2006; Chiari et al., 2010b) has specifically been developed for simulations of steep alpine torrents. These very specialized models allow for simulations of sediment transport in a detailed way. The drawback of these models is, however, that important feedback mechanisms as well as the seriality of processes are hard to study due to the separate treatment of the streamflow modelling and the sediment transport. The answer to this limitation can come from integrated models, which account for both basin hydrology and processes driven by hydrological response, like soil slips and sediment transport in channels. Thus, hydrological models that account for sediment transport were developed, which consider sediment transfer processes at the catchment scale within the framework of a classical rainfall-runoff model. Examples are the ETC rainfall-runoff-erosion model (Mathys et al., 2003), the SHESED model (Wicks and Bathurst, 1996), the DHSVM model (Doten et al., 2006) or the PROMAB-GIS model (Rinderer et al., 2009).
To address the inter-related, complex, and dynamic socio-hydrological challenges in floodplains, we propose a transdisciplinary approach. This will enable an understanding of how social relations influence flood hydrology, while the hydrology of floods influences the unfolding of social relations. For this purpose, we suggest historical, empirical, transdisciplinary studies of structural interventions within several floodplains characterised by diverse hydrological conditions and socio-political contexts. Investigating differences in the structural interventions and the dominant flood mitigation approaches sets a rich research agenda: how these different socio-technical approaches in floodplains are formed, adapted and reformed through social, political, technical and economic processes; how they require and/or entail a reordering of social relations leading to shifts in governance and creating new institutions, organisations and knowledge; and how these societal shifts then impact floodplain hydrology. It is expected that the interpretation of the reciprocal effects and interactions between floods and societies taking place in diverse floodplains will ultimately facilitate the development of theories explaining the dynamics of floodplains as coupled human-water systems. In this context, conceptualising human-flood interactions can be a fruitful way to formalise knowledge from different disciplines, formulate hypotheses, and explore long-term dynamics (Di Baldassarre et al., 2013).
the hydraulic characteristics of river channels to determine the optimal discharge for a river reach based on specified rules; Time Series Analysis (TSA), which calculates summary statistics of time series data, including hydrological metrics; and Time Series Manager, which manipulates and manages time series data. The TSA module has been designed to calculate summary metrics of daily discharge data; however, it can handle other forms of time series data, such as time series hydraulic data output from the HA module. The range of statistics calculated by the TSA module has been informed by a review of the literature, focusing on hydrological statistics used in ecological studies. The TSA module can present summary statistics based on the entire period of record, annually, seasonally, or monthly, depending on the specific issue being investigated. The TSA module includes spell analysis, rates of hydrograph rise and fall, prediction of flood return interval (partial and annual series), baseflow, and seasonality. In addition to the numeric output, the TSA module has some neat visualization tools for plotting flow duration curves, flood frequency curves, and baseflow vs flood-flow. Beyond the above, many more software packages are used for hydrologic studies. The hard part is deciding which one to use, and the choice depends on the dataset, i.e. Digital Terrain Model (DTM), Digital Elevation Model (DEM), rugosity, grid spatial resolution, computer equipment, etc.
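A flow duration curve, one of the core summary products listed above, can be sketched in a few lines. The synthetic lognormal discharge series and the Weibull plotting position are illustrative choices; a production TSA module would add the spell, baseflow, and frequency analyses on top of this.

```python
import numpy as np

def flow_duration_curve(q):
    """Exceedance probability vs. discharge (Weibull plotting positions)."""
    q_sorted = np.sort(q)[::-1]                       # discharges, descending
    exceed = np.arange(1, len(q) + 1) / (len(q) + 1)  # P(flow >= q_sorted[i])
    return exceed, q_sorted

def q_exceeded(q, pct):
    """Discharge exceeded pct % of the time (e.g. Q95, a low-flow statistic)."""
    exceed, q_sorted = flow_duration_curve(q)
    return float(np.interp(pct / 100, exceed, q_sorted))

rng = np.random.default_rng(5)
q = rng.lognormal(mean=1.0, sigma=0.8, size=3650)     # ~10 years of synthetic daily flow
q50, q95 = q_exceeded(q, 50), q_exceeded(q, 95)
print(round(q50, 2), round(q95, 2))                   # median flow exceeds the low flow
```

Exceedance percentiles read off the curve this way (Q50, Q95, etc.) are exactly the kind of period-of-record summary statistics that such a module reports, and `flow_duration_curve` returns the two arrays a plotting tool would draw.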