
Hybrid modelling based on support vector regression with genetic algorithms in forecasting the cyanotoxins presence in the Trasona reservoir (Northern Spain)

P.J. García Nieto a,*, J.R. Alonso Fernández b, F.J. de Cos Juez c, F. Sánchez Lasheras d, C. Díaz Muñiz b

a Department of Mathematics, Faculty of Sciences, University of Oviedo, 33007 Oviedo, Spain
b Cantabrian Basin Authority, Ministry of Agriculture, Food and Environment, 33071 Oviedo, Spain
c Mining Exploitation and Prospecting Department, University of Oviedo, 33004 Oviedo, Spain
d Department of Construction and Manufacturing Engineering, University of Oviedo, 33204 Gijón, Spain

Article info

Article history: Received 21 May 2012; received in revised form 29 October 2012; accepted 2 January 2013; available online 29 January 2013.

Keywords: Statistical machine learning techniques; Cyanobacteria; Cyanotoxins; Genetic algorithms (GAs); Support vector regression (SVR)

Abstract

Cyanotoxins, poisonous substances produced by cyanobacteria, are responsible for health risks in drinking and recreational waters. Anticipating their presence is therefore important for preventing those risks. The aim of this study is to use a hybrid approach based on support vector regression (SVR) in combination with genetic algorithms (GAs), known as a genetic algorithm support vector regression (GA–SVR) model, to forecast the presence of cyanotoxins in the Trasona reservoir (Northern Spain). The GA–SVR approach is aimed at highly nonlinear biological problems with sharp peaks, and the tests carried out confirmed its high performance. Some physical–chemical parameters were considered along with the biological ones. The results obtained are two-fold. First, the significance of each biological and physical–chemical variable for the presence of cyanotoxins in the reservoir was determined. Second, a predictive model able to forecast the possible presence of cyanotoxins in the short term was obtained.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Cyanobacteria are photosynthetic prokaryotes lacking the typical membrane-bound organelles, such as nuclei and chloroplasts, found in true algae. Consequently, they are now classified as bacteria and are most correctly known as cyanobacteria, although the term "blue-green algae" is still used frequently. Cyanobacteria can be found in almost every conceivable environment: in oceans, lakes and rivers as well as on land. They even flourish in Arctic and Antarctic lakes (Quesada et al., 2006; Reynolds, 2006), hot springs and wastewater treatment plants. Under favourable conditions, certain cyanobacteria can dominate the phytoplankton within a water body and form nuisance blooms. Cyanobacteria have come to the attention of public health workers because many freshwater and brackish species can produce a range of potent toxins called cyanotoxins (Spoof et al., 2006; Reynolds, 2006), and, in freshwater ecosystems, they are the most common cause of eutrophication. The blooms are not always green (Smith et al., 2008; Huisman et al., 2010). They can be blue, and some cyanobacteria species are even coloured brownish-red. Furthermore, the water can become malodorous when the cyanobacteria in the bloom die.

Therefore, cyanotoxins are an important environmental problem in reservoirs (Vasconcelos, 2006; Stewart et al., 2006). Water is never perfectly clean, and polluted water is a continuing threat to human health and welfare (Dasí et al., 1998; de Hoyos et al., 2004). The toxins include neurotoxins, hepatotoxins, cytotoxins, and endotoxins (Dixit et al., 2005; Willame et al., 2005; Seckbach, 2007; David et al., 2009; Peschek et al., 2011). Most reported incidents of poisoning by microalgal toxins have occurred in freshwater environments, and they are becoming more common and widespread (Negro et al., 2000).

Cyanotoxins are often implicated in what are commonly called red tides or harmful algal blooms (HABs) (Fogg et al., 1973). Lakes and oceans contain many single-celled organisms called phytoplankton. Under certain conditions, particularly when nutrient concentrations are high, these organisms reproduce exponentially. The resulting dense swarm of phytoplankton is called an algal bloom. These can cover hundreds of square kilometres and can be easily seen in satellite images. Individual phytoplankton rarely live more than a few days, but blooms can last weeks (de Hoyos et al., 2004).

On the one hand, a genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution (Goldberg, 1989; Davis, 1991; Sivanandam and Deepa, 2010). In this study, this heuristic is used to carry out a dimensional reduction by identifying patterns in the experimental data set. This technique permits the selection of six main variables from a total of 24 predicting variables in this complex problem, with minimal loss of information. GAs belong to the larger class of evolutionary algorithms (EAs) (Haupt and Haupt, 2004; Engelbrecht, 2007), which use techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. On the other hand, support vector regression (SVR) is a novel learning technique based on statistical learning theory and the structural risk minimization principle, which has been successfully used for nonlinear system modelling (Vapnik et al., 1997; Taboada et al., 2007; de Cos Juez et al., 2010; Sánchez Lasheras et al., 2010; Suárez Sánchez et al., 2011). The SVR parameters must be determined carefully in order to obtain the most efficient SVR model (Vapnik, 1998; Keerthi, 2002; Schölkopf and Smola, 2002; Shawe-Taylor and Cristianini, 2004). In other words, an inappropriate choice of the SVR hyperparameters will result in over-fitting or under-fitting, and different hyperparameter settings may also give rise to significant differences in performance (Cristianini and Shawe-Taylor, 2000; Steinwart and Christmann, 2008). Therefore, the optimal selection of SVR hyperparameters is an important step in an SVR fit.

Environmental Research. Contents lists available at SciVerse ScienceDirect. Journal homepage: www.elsevier.com/locate/envres
0013-9351/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.envres.2013.01.001
* Corresponding author. Fax: +34 985 103354.

The aim of this research work is to construct a hybrid GA–SVR model to identify cyanotoxins in the waters of the Trasona reservoir (Principality of Asturias, Northern Spain) (see Fig. 1). The GA–SVR technique is aimed at highly nonlinear biological problems with sharp peaks, and the tests carried out in this research work confirmed its high performance. It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models non-linearities and interactions such as those analysed in this research work.

The Trasona reservoir, which was initially intended for industrial supply, is at present also used for recreation as a high-performance training centre for canoeing. It is a eutrophic ecosystem that has been characterized by cyanobacteria blooms in certain periods, which have sometimes produced variable concentrations of cyanotoxins, mainly microcystins.

This research work is structured as follows. First, the materials and methods needed to carry out this study are described. Next, the results obtained are shown and discussed. Finally, the main conclusions drawn from the results are presented.

2. Materials and methods

2.1. Experimental data set

The data set used for the hybrid GA–SVR model developed here was collected over 6 years (from 2006 to 2011) from samples taken in the Trasona reservoir, and the total number of data processed was about 151 values. The supplementary site-specific experimental data set associated with this article can be found at the following online link: http://dl.dropbox.com/u/36679320/Trasona_reservoir_data_sc.xls. The biological parameters are expressed as biovolume (cubic millimetres per liter) of phytoplankton species. Specifically, this reservoir was sampled several times a month from January 1, 2006 to December 31, 2011, following the sampling protocols for lakes and reservoirs of the Spanish Ministry of Agriculture, Food and Environment, which are consistent with the guidelines established by the European Union and international agencies dealing with these issues (Quesada et al., 2004). In practice, a single sampling point is used, located at the deepest part of the reservoir. The samples were taken with a Niskin hydrographic bottle at different depths in the euphotic zone (Dasí et al., 1998). The values of phytoplankton and the concentrations of cyanotoxins, chlorophyll and other physicochemical parameters were determined from a composite sample made up of five homogeneous subsamples obtained with the hydrographic bottle at equidistant depths in the euphotic zone (Quesada et al., 2004; Reynolds, 2006). In this research work, we have taken into account the two dominant species of the cyanobacteria community: Microcystis aeruginosa (see Fig. 2, left) and Woronichinia naegeliana (see Fig. 2, right).

The main goal of this research work was to obtain the dependence relationship of cyanotoxins (output variable), expressed in micrograms per liter, as a function of the following two groups of input variables (Reynolds, 2006):

a. Biological parameters

M. aeruginosa (mm³/l) is a type of harmful blue-green algae, also referred to as colonial cyanobacteria (see Fig. 2, left).

W. naegeliana (mm³/l) is a kind of cyanobacteria present in waters of a lower trophic status (see Fig. 2, right).

Other cyanobacteria (mm³/l): All cyanobacteria excluding the two previous ones. These may include some potentially toxic species such as Microcystis flos-aquae, Microcystis novacekii, Anabaena flos-aquae and Anabaena crassa.

Diatoms (mm³/l) are a major group of algae and one of the most common types of phytoplankton.

Chrysophytes (mm³/l) are small flagellates of a yellowish-brown colour. They can be found singly or in colonies.

Chlorophytes (mm³/l) refer to a highly paraphyletic group of all green algae within the green plants group.

Other phytoplankton species (mm³/l): The rest of the phytoplankton species, excluding the previous ones.

Chlorophyll concentration (µg/l): Chlorophyll is an extremely important biomolecule, critical in photosynthesis, which allows plants to obtain energy from light.

b. Physical–chemical parameters

Water temperature (°C): This is a measurement of the intensity (not amount) of heat stored in a volume of water. Temperature affects the solubility of many chemical compounds and can therefore influence the effect of pollutants on aquatic life.

Ambient temperature (°C): The temperature of the Trasona reservoir’s surroundings, which affects the water temperature.

Fig. 1. (a) Aerial photograph of the city of Avilés (Northern Spain) (2) and the Trasona reservoir (1); and (b) a detailed aerial photograph of the Trasona reservoir.

Secchi disk depth (m): The depth at which the pattern on the Secchi disk (a circular disk with alternating black and white quadrants, mounted on a pole or line) is no longer visible from the surface when it is lowered down in the water. It is a measure of water transparency, directly related to phytoplankton growth and eutrophication processes.

Turbidity (NTU): This is a measurement of the suspended particulate matter in a water body which interferes with the passage of a beam of light through the water. Materials that contribute to turbidity are silt, clay, organic material, or micro-organisms. High levels of turbidity increase the total available surface area of solids in suspension upon which bacteria can grow. High turbidity reduces light penetration; therefore, it impairs photosynthesis of submerged vegetation and algae.

Total phosphorus (mg P/l): This is a measure of both inorganic and organic forms of phosphorus. Phosphorus can be present as dissolved or particulate matter. It is an essential plant nutrient and is often the most limiting nutrient to plant growth in freshwater. It is rarely found in significant concentrations in surface waters.

Phosphates concentration (mg PO₄³⁻/l): This is a measure of the inorganic oxidized form of soluble phosphorus. This form of phosphorus is the most readily available for uptake during photosynthesis. High concentrations of orthophosphate generally occur in conjunction with algal blooms. Phosphate is often a limiting nutrient in ecological environments, and its availability may govern the growth rate of aquatic organisms. High phosphate levels give rise to eutrophication processes that increase cyanobacterial biomass, with the subsequent production of cyanotoxins.

Total nitrogen concentration (mg N/l): This is a measure of that portion of nitrogen that is organically bound. Organic nitrogen includes all organic compounds such as proteins, polypeptides, amino acids, and urea. Nitrogen is essential to life on Earth.

Nitrate concentration (mg NO₃⁻/l): This is the measurement of the most oxidized and stable form of nitrogen in a water body. Nitrate is the principal form of combined nitrogen found in natural waters. It results from the complete oxidation of nitrogen compounds. Excessive amounts of nitrogen may result in phytoplankton or macrophyte proliferations.

Nitrite concentration (mg NO₂⁻/l): This is a measure of a form of nitrogen that occurs as an intermediate in the nitrogen cycle. It is an unstable form that is either rapidly oxidized to nitrate (nitrification) or reduced to nitrogen gas (denitrification). This form of nitrogen can also be used as a source of nutrients for plants. Nitrite is toxic to aquatic life at relatively low concentrations.

Ammonium ion concentration (mg/l): This is a measure of the most reduced inorganic form of nitrogen in water. Excess ammonia contributes to eutrophication of water bodies. This results in prolific algal growths that have deleterious impacts on other aquatic life, drinking water supplies, and recreation. Ammonia at high concentrations is toxic to aquatic life. It can be easily oxidized to nitrate in oxidizing environments.

Dissolved oxygen concentration (mg O2/l): This is a measure of the amount of oxygen dissolved in water. The dissolved oxygen concentration is subject to diurnal and seasonal fluctuations that are due, in part, to variations in temperature and photosynthetic activity. Dissolved oxygen is essential to the respiratory metabolism of most aquatic organisms. It affects the solubility and availability of nutrients, and therefore the productivity of aquatic ecosystems.

Conductivity (mS/cm): This is the measurement of the ability of water to conduct an electric current, that is to say, the greater the content of ions in the water, the more current the water can carry. Ions are dissolved metals and other dissolved materials. Conductivity may be used to estimate the total ion concentration of the water, and is often used as an alternative measure of dissolved solids.

Alkalinity (mg CaCO₃/l): This is the measurement of the water’s ability to neutralize acids. It usually indicates the presence of carbonates, bicarbonates, or hydroxides. Alkalinity results are expressed in terms of an equivalent amount of calcium carbonate. Waters that have high alkalinity values are considered undesirable because of excessive hardness and high concentrations of sodium salts. Waters with low alkalinity have little capacity to buffer acidic inputs and are susceptible to acidification (low pH).

Calcium concentration (mg/l): Calcium is essential for living organisms, in particular in cell physiology. The hardness of water is generally due to the presence of calcium and magnesium in the water. Harder water has the effect of reducing the toxicity of some metals (e.g., copper, lead and zinc).

pH: Measures the acidity or basicity of an aqueous solution, that is, the hydrogen-ion concentration in the water. High pH values tend to facilitate the solubilization of ammonia, heavy metals and salts. The precipitation of carbonate salts (marl) is encouraged when pH levels are high. Low pH levels tend to increase carbon dioxide and carbonic acid concentrations. Lethal effects of pH on aquatic life occur below pH 4.5 and above pH 9.5.

We also have quantitative information on the abundance of phytoplankton species, measured in number of cells per millilitre. Fig. 3(a) shows the evolution of chlorophyll concentration and cyanobacteria cell number per millilitre in the Trasona reservoir from January 2006 to December 2011. Higher levels of both variables are observed in certain periods of 2006, 2007 and 2008, and are significantly greater than the values obtained in 2009, 2010 and 2011. The peaks in Fig. 3(a) correspond to the cyanobacteria blooms in the summer and fall of those years. However, there are no cyanobacteria blooms in 2009, 2010 and 2011. Fig. 3(b) shows the evolution of cyanotoxins concentration and cyanobacteria cell number per millilitre in the Trasona reservoir over the same period. Similarly, the peaks in Fig. 3(b) correspond to the cyanobacteria blooms and large concentrations of cyanotoxins.

Fig. 4 shows a photograph of the Trasona reservoir with a dense bloom of cyanobacteria in 2007.

Specifically, cyanobacteria cell number per millilitre was less than 50,000 and cyanotoxins concentration was always zero in 2009, 2010 and 2011. In this sense, Fig. 5 shows a photograph of the Trasona reservoir in summer of 2009 without a bloom of cyanobacteria.

In fact, the Trasona reservoir is a eutrophic ecosystem (Pérez-Martínez and Sánchez-Castillo, 2004; Álvarez Cobelas and Arauzo, 2006) that has been characterized by the presence of cyanobacteria. These have sometimes produced variable concentrations of cyanotoxins, mainly microcystins (Chorus and Bartram, 1999; Quesada et al., 2004). Once the problem had been identified, civil works were carried out in order to diminish the nutrient inputs to the reservoir. The guideline values for safe recreational water quality set alert level 2 (World Health Organization, 1998) at values greater than 100,000 cells per millilitre and a microcystin concentration greater than 20.0 µg/l (see Fig. 3(a) and (b)).
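As a minimal illustration of how the guideline thresholds quoted above could be applied to a single sample, the following Python sketch flags alert level 2. The function and constant names are hypothetical, and requiring both thresholds to be exceeded at once is an assumption made for illustration; only the two threshold values come from the text.

```python
# Thresholds quoted in the text for alert level 2 (recreational waters).
CELLS_PER_ML_LIMIT = 100_000        # cyanobacteria cells per millilitre
MICROCYSTIN_UG_PER_L_LIMIT = 20.0   # microcystin, micrograms per liter


def exceeds_alert_level_2(cells_per_ml, microcystin_ug_per_l):
    """Return True when both quoted thresholds are exceeded.

    Assumption: both conditions must hold at once; a monitoring
    programme may treat either one alone as sufficient.
    """
    return (cells_per_ml > CELLS_PER_ML_LIMIT
            and microcystin_ug_per_l > MICROCYSTIN_UG_PER_L_LIMIT)
```

With this rule, a sample below 50,000 cells per millilitre and with zero cyanotoxins, as reported for 2009 to 2011, is not flagged.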

Cell counts were performed with an inverted microscope on settled samples. The cyanotoxins were analysed by means of the high-performance liquid chromatography (HPLC) technique (American Public Health Association, 1998). The Trasona reservoir is located near the industrial city of Avilés (Asturias, Northern Spain). Practically adjoining the Trasona reservoir there is an artificially created wetland intended to shelter a varied aquatic avifauna. This lagoon can store approximately 50,000 m³ of water, and its almost constant water level allows different species of birds to build nests.

2.2. Genetic algorithms

Mathematical modelling has been an integral part of behavioural ecology since its inception (Ruxton and Beauchamp, 2008). Mathematical modelling provides an opportunity to formulate hypotheses about ecological behaviour in a rigorous way, and the solutions that emerge illuminate the relationships between variables thought to be important in driving behaviour. The complexity of models inevitably increases as relationships among independent variables are refined. While the predictions of more complex models are often subtler, the practical task of solving the equations of the model becomes more difficult. This arises for two reasons. First, finding analytic solutions to complex models is challenging and often, remarkably even for relatively simple equations, beyond current capacity (Vrugt and Robinson, 2007; Ruxton and Beauchamp, 2008). Second, tractable solutions are often beyond the reach of researchers who are not very familiar with mathematical techniques. This is especially the case in an empirically strong field like cyanobacterial ecology (Hense and Burchard, 2010; Wang et al., 2010). Heuristic search algorithms provide a means to locate solutions in less tractable models. These algorithms involve the use of computer programs that search for the solution systematically in a predefined search space. Genetic algorithms (GAs), one particular class of search algorithms, have been used widely in fields as varied as biology, chemistry and economics (Goldberg, 1989; Davis, 1991; Haupt and Haupt, 2004; Sivanandam and Deepa, 2010).

Fig. 3. (a) Evolution of chlorophyll concentration and cyanobacteria cell number per millilitre as a function of time in the Trasona reservoir from January of 2006 to December of 2011; and (b) evolution of cyanotoxins concentration and cyanobacteria cell number per millilitre as a function of time in the Trasona reservoir from January of 2006 to December of 2011.

Genetic algorithms (GAs) are based upon Darwin’s theory of evolution (Goldberg, 1989; Davis, 1991; Haupt and Haupt, 2004; Sivanandam and Deepa, 2010). They are modelled on a relatively simple interpretation of the evolutionary process, yet they have proven to be a reliable and powerful optimization technique in a wide variety of applications. Holland was the first to propose the use of genetic algorithms for problem-solving (Holland, 1975; Goldberg, 1989; Davis, 1991). The GA uses the current population of strings to create a new population such that the strings in the new generation are on average better than those in the current population. Selection depends on their fitness value: the selection process determines which strings in the current population will be used to create the next generation, and the crossover process determines the actual form of the strings in the next generation (Engelbrecht, 2007; Ordóñez Galán et al., 2011). Weak individuals are discarded and only the strongest survive. A GA proceeds through the following steps:

Initialization: Initially many individual solutions are randomly generated to form an initial population. The population size depends on the nature of the problem, but typically contains hundreds or even thousands of possible solutions. Traditionally, the population is generated randomly, covering the entire range of possible solutions (the search space). Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.

Evaluation: An evaluation function is applied in order to know the goodness of each of the solutions of the population.

Stop criterion: The GA will stop when the optimum solution is found or after a certain number of iterations/generations. If the stop criterion is not met, a new iterative loop is carried out.

Selection: During each successive generation, a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as this process may be very time-consuming. The fitness function, $f$, maps a chromosome representation into a scalar value, where $\Gamma$ represents the data type of the elements of an $n_x$-dimensional chromosome (Haupt and Haupt, 2004; Engelbrecht, 2007; Ordóñez Galán et al., 2011):

$$f:\Gamma^{n_x}\rightarrow\mathbb{R} \qquad (1)$$

Crossover: In genetic algorithms, crossover is a genetic operator used to vary the programming of a chromosome or chromosomes from one generation to the next. It is analogous to reproduction and biological crossover, upon which genetic algorithms are based. Crossover operators can be divided into three main categories based on the arity (i.e. the number of parents used) of the operator. This gives rise to three main classes of crossover operators (Engelbrecht, 2007; Ordo´n˜ ez Gala´n et al., 2011): (1) asexual, where an offspring is generated from one parent; (2) sexual, where two parents are used to produce one or two offspring (the operator employed in the present research) and (3) multi-recombination, where more than two parents are used to produce one or more offspring.

Mutation: A genetic operator used to maintain genetic diversity from one generation of a population of chromosomes to the next. It is analogous to biological mutation. Mutation is used in support of crossover to ensure that the full range of alleles is accessible for each gene. Mutation is applied with a certain probability, $p_m$, to each gene of the offspring, $\tilde{x}_i(t)$, to produce the mutated offspring $x_i(t)$. The mutation probability, also referred to as the mutation rate, is usually a small value, $p_m \in [0,1]$, to ensure that good solutions are not distorted too much. Given that each gene is mutated with probability $p_m$, the probability that an individual will be mutated, taking into account that the individual contains $n_x$ genes, is given by (Haupt and Haupt, 2004; Ordóñez Galán et al., 2011)

$$\mathrm{Prob}\left(\tilde{x}_i(t)\ \text{is mutated}\right) = 1-(1-p_m)^{n_x} \qquad (2)$$

Replacement: The least-fit individuals of the population are replaced with new individuals.
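The steps listed above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the authors' implementation: binary chromosomes, tournament selection, one-point two-parent crossover, the default parameter values and the names `run_ga` and `individual_mutation_probability` are all choices made here for the example, and the fitness function is supplied by the caller.

```python
import random


def individual_mutation_probability(pm, nx):
    """Eq. (2): probability that at least one of nx genes mutates."""
    return 1.0 - (1.0 - pm) ** nx


def run_ga(fitness, nx, pop_size=30, generations=40, pm=0.02, seed=0):
    """Toy GA over binary chromosomes of length nx (nx >= 2)."""
    rng = random.Random(seed)
    # Initialization: random population covering the search space.
    population = [[rng.randint(0, 1) for _ in range(nx)]
                  for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two randomly drawn individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    # Stop criterion: a fixed number of generations.
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size // 2:
            p1, p2 = tournament(), tournament()
            # Crossover: two parents, one cut point, two children.
            cut = rng.randint(1, nx - 1)
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                # Mutation: flip each gene independently with probability pm.
                offspring.append(
                    [g ^ 1 if rng.random() < pm else g for g in child])
        # Replacement: the least-fit individuals give way to the offspring.
        population.sort(key=fitness, reverse=True)
        population = population[:pop_size - len(offspring)] + offspring
    return max(population, key=fitness)
```

For feature selection, a chromosome could mark each of the 24 candidate variables as kept (1) or dropped (0), with the fitness measuring the quality of the resulting subset. As a worked instance of Eq. (2), a hypothetical mutation rate of 0.01 with 24 genes gives an individual mutation probability of roughly 0.21.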

2.3. Support vector machines for regression

SVMs are a set of related supervised learning methods used for classification and regression that can approximate any multivariate function to any level of accuracy (Cortes and Vapnik, 1995; Vapnik, 1998). SVMs were originally developed to solve classification problems (Taboada et al., 2007). They were later generalized to solve regression problems (Vapnik et al., 1997; de Cos Juez et al., 2010; Sánchez Lasheras et al., 2010; Suárez Sánchez et al., 2011) in a method called support vector regression (SVR). The model produced by support vector classification only depends on a subset of the training data, because the cost function for building the model ignores training points that lie beyond the margin. Analogously, the model produced by SVR only depends on a subset of the training data, because the cost function for building the model ignores any training data that are close (within a threshold $\varepsilon$) to the model prediction.

The basic idea of SVR is briefly described here. Rather than classifying new unseen variables $\hat{\vec{x}}$ into one of two categories $\hat{y}=\pm 1$, we want to predict a real-valued output for $y'$. Hence, the training data is of the form $\{\vec{x}_i, t_i\}$, where $i = 1, 2, \ldots, L$, $y \in \mathbb{R}$, $\vec{x} \in \mathbb{R}^D$ (Steinwart and Christmann, 2008; Fletcher, 2009):

$$y_i = \vec{w}\cdot\vec{x}_i + b \qquad (3)$$

The SVR uses a more sophisticated penalty function: a penalty is not imposed if the predicted value $y_i$ is less than a distance $\varepsilon$ away from the actual value $t_i$, i.e., if $|t_i - y_i| < \varepsilon$. Referring to Fig. 6, the region bounded by $y_i \pm \varepsilon\ \forall i$ is called an $\varepsilon$-insensitive tube. Another modification to the penalty function is that output variables outside the tube are allocated one of two slack variable penalties, depending on whether they lie above ($\xi^+$) or below ($\xi^-$) the tube, where $\xi^+ > 0$, $\xi^- > 0\ \forall i$ (Fletcher, 2009):

$$t_i \le y_i + \varepsilon + \xi^+ \qquad (4)$$

$$t_i \ge y_i - \varepsilon - \xi^- \qquad (5)$$
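The penalty just described can be made concrete with a short Python sketch. The function names are hypothetical; the code only restates the $\varepsilon$-insensitive tube and the smallest non-negative slack values compatible with Eqs. (4) and (5) for one target $t$ and one prediction $y$.

```python
def epsilon_insensitive_loss(t, y, eps):
    """Penalty for target t and prediction y: zero inside the tube."""
    return max(0.0, abs(t - y) - eps)


def slack_variables(t, y, eps):
    """Smallest non-negative (xi_plus, xi_minus) satisfying Eqs. (4)-(5)."""
    xi_plus = max(0.0, t - y - eps)    # target lies above the tube
    xi_minus = max(0.0, y - t - eps)   # target lies below the tube
    return xi_plus, xi_minus
```

At most one of the two slack values is non-zero for a given point, and their sum equals the $\varepsilon$-insensitive loss.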

Fig. 4. Dense bloom of cyanobacteria on the Trasona reservoir in 2007.

Fig. 5. A photograph of the Trasona reservoir in summer of 2009 without a bloom of cyanobacteria.

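The SVM problem can be formulated as follows (Fletcher, 2009; Suárez Sánchez et al., 2011):

$$\min_{\vec{w},\,b,\,\xi}\ \left\{ \frac{1}{2}\|\vec{w}\|^2 + C\sum_{i=1}^{L}\left(\xi_i^+ + \xi_i^-\right) \right\}$$

$$\text{subject to}\quad \begin{cases} \left(\langle \vec{w},\psi(\vec{x}_i)\rangle + b\right) - y_i \le \varepsilon + \xi_i^+ \\ y_i - \left(\langle \vec{w},\psi(\vec{x}_i)\rangle + b\right) \le \varepsilon + \xi_i^- \\ \xi_i^+,\ \xi_i^- \ge 0 \end{cases} \quad i = 1,\ldots,L \qquad (6)$$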

where $y_i$ now denotes the observed output (written $t_i$ in Eqs. (4) and (5)) and $\psi: X \rightarrow Z$ is a transformation of the input space into a new space $Z$, usually of larger dimension, where we define an inner product by means of a positive definite function $k$ (kernel trick) (Cristianini and Shawe-Taylor, 2000; Shawe-Taylor and Cristianini, 2004; Steinwart and Christmann, 2008):

$$\langle \psi(\vec{x}),\psi(\vec{x}')\rangle = \sum_i \psi_i(\vec{x})\,\psi_i(\vec{x}') = k(\vec{x},\vec{x}') \qquad (7)$$
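To make the objective of Eq. (6) tangible, the sketch below evaluates it for a fixed weight vector and bias. It is an illustration only: the function name is hypothetical, the feature map is assumed to be the identity (so the inner product reduces to an ordinary dot product), and the slack variables are set to the smallest values that satisfy the constraints rather than being found by an optimizer.

```python
def svr_primal_objective(w, b, X, y, C, eps):
    """Objective of Eq. (6) for fixed (w, b) with minimal feasible slacks.

    Assumption: identity feature map, so <w, psi(x)> is the dot product w . x.
    """
    norm_sq = sum(wj * wj for wj in w)
    total_slack = 0.0
    for x_row, y_obs in zip(X, y):
        f = sum(wj * xj for wj, xj in zip(w, x_row)) + b   # model output
        xi_plus = max(0.0, f - y_obs - eps)    # first constraint of Eq. (6)
        xi_minus = max(0.0, y_obs - f - eps)   # second constraint of Eq. (6)
        total_slack += xi_plus + xi_minus
    return 0.5 * norm_sq + C * total_slack
```

A larger $C$ weights the slack term more heavily relative to the norm of the weight vector, which is one reason the choice of hyperparameters affects over-fitting and under-fitting.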

The above problem is quadratic with linear constraints, and so the Kuhn–Tucker optimality conditions are necessary and sufficient. The solution, which can be obtained from the dual problem, is a linear combination of a subset of sample points called support vectors (s.v.), as follows (Steinwart and Christmann, 2008; Fletcher, 2009):

$$\vec{w} = \sum_{s.v.}\beta_i\,\psi(\vec{x}_i)\ \Rightarrow\ f_{w,b}(\vec{x}) = \sum_{s.v.}\beta_i\,\langle\psi(\vec{x}_i),\psi(\vec{x})\rangle + b = \sum_{s.v.}\beta_i\,k(\vec{x}_i,\vec{x}) + b \qquad (8)$$

The kernel trick is useful because many regression problems cannot be linearly regressed in the space of the inputs $\vec{x}$, but might be in a higher-dimensional feature space given a suitable mapping. Different kernel functions are described in the literature, for example:

Radial basis function (RBF) (Shawe-Taylor and Cristianini, 2004; Fletcher, 2009):

$$k(\vec{x}_i,\vec{x}_j) = e^{-\|\vec{x}_i-\vec{x}_j\|^2/(2\sigma^2)} \qquad (9)$$

Polynomial kernel (Shawe-Taylor and Cristianini, 2004; Fletcher, 2009):

$$k(\vec{x}_i,\vec{x}_j) = (\vec{x}_i\cdot\vec{x}_j + a)^b \qquad (10)$$

where $a$ and $b$ are parameters defining the kernel’s behaviour.
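The two kernels and the support-vector expansion of Eq. (8) translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names; the coefficients, bias and support vectors passed to `svr_predict` are assumed to come from an already trained model, which this snippet does not provide.

```python
import math


def rbf_kernel(xi, xj, sigma):
    """Eq. (9): exp(-||xi - xj||^2 / (2 sigma^2))."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(xi, xj, a, b):
    """Eq. (10): (xi . xj + a)^b."""
    return (sum(p * q for p, q in zip(xi, xj)) + a) ** b


def svr_predict(x, support_vectors, betas, bias, kernel):
    """Eq. (8): sum over support vectors of beta_i * k(x_i, x), plus b."""
    return sum(beta * kernel(sv, x)
               for sv, beta in zip(support_vectors, betas)) + bias
```

Only the support vectors enter the prediction, so the cost of evaluating the model grows with their number rather than with the size of the full training set.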

To sum up, to use an SVM to solve a regression problem for data that cannot be regressed linearly, we first need to choose a kernel and relevant parameters that can be expected to map the data into a feature space where a linear regression is possible.

3. Analysis of results and discussion

The biological and physical–chemical input variables considered in this research work are shown in Tables 1 and 2, respectively (Whitton and Potts, 2000; Reynolds, 2006; Gault and Marler, 2009; Huisman et al., 2010). Note that one of the variables is equal to the product of the variable M. aeruginosa and the variable W. naegeliana, reflecting the coexistence of these two species of cyanobacteria, in order to reproduce their dynamics without the intervention of external factors. This mathematical formulation adds a multiplicative term to take into account the interaction of both species, in line with more realistic modelling in biology (Allman and Rhodes, 2003; Barnes and Chu, 2010). Furthermore, the biological parameters are expressed here as biovolume (cubic millimetres per liter) of phytoplankton species, while cyanotoxins (output variable) and chlorophyll concentrations are expressed in micrograms per liter. Therefore, the total number of predicting variables used in this study was 24.
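The interaction variable described above is simply the product of two input columns. The following Python sketch shows one way to append it to a sample; the function name is hypothetical, the dictionary keys follow the variable names listed in Table 1, and any numeric values passed in are up to the caller.

```python
def add_interaction_feature(sample):
    """Return a copy of the sample with the product term appended.

    The two inputs are biovolumes in mm^3/l, so the product
    has units of mm^6/l^2.
    """
    enriched = dict(sample)
    enriched["Microcys__Worochinia"] = (
        sample["Microcystis_aeruginosa"] * sample["Woronichinia_naegeliana"]
    )
    return enriched
```

The product is zero whenever either species is absent, so the term only contributes when both species coexist in the sample.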

It is important to select a model that best fits the experimental data. In this research work, the fitted hybrid GA–SVM model has a coefficient of determination R² equal to 0.95 and a correlation coefficient equal to 0.98. These results indicate a high goodness of fit, that is to say, good agreement between our model and the observed data.
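For readers who want to reproduce this kind of goodness-of-fit summary on their own predictions, the sketch below computes both statistics in plain Python. The text does not state which formulas were used, so the definitions here (R² as one minus the ratio of residual to total sum of squares, and the Pearson correlation between observed and predicted values) are assumptions, as are the function names.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

Both functions assume that the observed values are not all identical; otherwise the denominators vanish.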

In attempting to model real-world problems or concepts using computational methods, the selection of an appropriate representation is of considerable importance (Vapnik, 1998; de Cos Juez et al., 2009). The selection of features can have a considerable impact on the effectiveness of the resulting regression algorithm, here the hybrid GA–SVM model. The main purpose of feature selection is to reduce the number of features used in regression while maintaining an acceptable accuracy. This task is carried out using an appropriate genetic algorithm in the first step of the analysis. To fix ideas, a genetic algorithm (GA) is typically defined by the following types of parameters (Haupt and Haupt, 2004; Engelbrecht, 2007; Sivanandam and Deepa, 2010; Ordóñez Galán et al., 2011): size of the population, number of generations, mutation probability, whether clones are allowed or not, criterion used to judge the quality of the subsets, and cardinality of the subset. In this research work, the basic GA parameters and their values are shown in Table 3.
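A GA for fixed-cardinality subset selection can be sketched as follows. This is a generic illustration under stated assumptions, not the implementation used in this work. The defaults mirror the parameter types listed above (population size, generations, mutation probability, no clones, cardinality). The function `ga_select`, the crossover and mutation operators, and the toy fitness based on invented weights are all hypothetical; in the study the criterion is the coefficient of determination R² of each subset.

```python
import random


def ga_select(n_features, k, fitness, pop_size=150, generations=100,
              p_mut=0.01, seed=0):
    """Search for a subset of k feature indices that maximizes `fitness`."""
    rng = random.Random(seed)
    all_idx = range(n_features)

    def random_subset():
        return tuple(sorted(rng.sample(all_idx, k)))

    def crossover(a, b):
        # The child draws its k features from the union of both parents.
        return rng.sample(sorted(set(a) | set(b)), k)

    def mutate(genes):
        genes = list(genes)
        for i in range(k):
            if rng.random() < p_mut:
                unused = [j for j in all_idx if j not in genes]
                genes[i] = rng.choice(unused)
        return tuple(sorted(genes))

    # A set of sorted tuples keeps the population free of clones.
    population = set()
    while len(population) < pop_size:
        population.add(random_subset())

    for _ in range(generations):
        ranked = sorted(population, key=lambda s: (fitness(s), s), reverse=True)
        survivors = ranked[: pop_size // 2]      # elitist truncation selection
        population = set(survivors)
        while len(population) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = mutate(crossover(a, b))
            if child in population:              # clone: replace by a fresh subset
                child = random_subset()
            population.add(child)
    return max(population, key=lambda s: (fitness(s), s))


# Toy stand-in for the quality criterion: invented, distinct feature weights.
weights = [((7 * i) % 24 + 1) / 24 for i in range(24)]


def toy_fitness(subset):
    return sum(weights[i] for i in subset)


best_subset = ga_select(n_features=24, k=6, fitness=toy_fitness)
```

In a real application the toy fitness would be replaced by a function that fits a model on the candidate subset and returns its R².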

According to the results shown in Table 4, the 24 original variables of this complex nonlinear problem are reduced to six main variables with minimal loss of information. In this sense, the most significant variable in cyanotoxins prediction (output variable) is W. naegeliana. The second most significant variable is the product of the concentration of M. aeruginosa and the concentration of W. naegeliana (Microcys__Worochinia), the third is water temperature, the fourth is turbidity, the fifth is total phosphorus and finally the sixth is alkalinity.

Table 1
Set of biological input variables used in this study.

| Biological input variables | Name of the variable |
|---|---|
| Microcystis aeruginosa (mm³/l) | Microcystis_aeruginosa |
| Woronichinia naegeliana (mm³/l) | Woronichinia_naegeliana |
| Other cyanobacteria (mm³/l) | Other_species_Cyanobacteria |
| Diatoms (mm³/l) | Diatoms |
| Chrysophytes (mm³/l) | Chrysophytes |
| Chlorophytes (mm³/l) | Chlorophytes |
| Other species of the phytoplankton (mm³/l) | Other_phyto |
| Microcystis aeruginosa × Woronichinia naegeliana (synergistic interaction variable) (mm⁶/l²) | Microcys__Worochinia |
| Chlorophyll concentration (µg/l) | Chlorophyll |

Table 2
Set of physical–chemical input variables used in this study.

| Physical–chemical input variables | Name of the variable |
|---|---|
| Water temperature (°C) | Water_temperature |
| Ambient temperature (°C) | Ambient_temperature |
| Secchi disk depth (m) | Secchi_disk_depth |
| Turbidity (NTU) | Turbidity |
| Total phosphorus (mg P/l) | Total_phosphorus |
| Phosphates concentration (mg PO₄³⁻/l) | Phosphates_concentration |
| Total nitrogen concentration (mg N/l) | Total_nitrogen_concentration |
| Nitrate concentration (mg NO₃⁻/l) | Nitrate_concentration |
| Nitrite concentration (mg NO₂⁻/l) | Nitrite_concentration |
| Ammonium concentration (mg/l) | Ammonium_concentration |
| Dissolved oxygen concentration (mg O₂/l) | Dissolved_oxygen_concentration |
| Conductivity (mS/cm) | Conductivity |
| Alkalinity (mg CaCO₃/l) | Alkalinity |
| Calcium concentration (mg/l) | Calcium_concentration |
| pH values | pH_values |

The 24 variables are reduced to six variables with minimal loss of information, and these six are sufficient to predict the blooms of cyanobacteria with production of cyanotoxins in the Trasona reservoir. The cyanobacteria community in this reservoir is mainly composed of M. aeruginosa and W. naegeliana. If W. naegeliana increases its presence significantly, this is a clear warning that a bloom of cyanobacteria with risk of cyanotoxins may be near. If there is also a significant increase in the presence of M. aeruginosa, the two cyanobacteria species (M. aeruginosa and W. naegeliana) produce a result greater than the sum of their individual effects. Thus, cyanotoxin production seems to be increased in a nonlinear way by the combined presence of both species. The physical–chemical parameters (water temperature, turbidity, total phosphorus and alkalinity) are also important in cyanotoxin forecasting since the cyanobacterial composition of the reservoir depends on them. These last four variables are directly related to most of the other physical–chemical parameters considered in this study, so the variable reduction carried out is a logical result. Thus, water temperature is a consequence of ambient temperature: they are directly related if no thermal discharge takes place. Water temperature is the most influential parameter in cyanobacterial growth, and this variable is kept after the mathematical process as a main variable. Dissolved oxygen is also related to water temperature since, as water temperature increases, dissolved oxygen decreases. The same inverse behaviour is observed for turbidity and Secchi disk depth: the higher the turbidity, the lower the Secchi disk depth. Total phosphorus is another of the selected variables, while phosphates are not. However, phosphates were implicitly considered since total phosphorus includes all kinds of phosphorus compounds. The remaining parameters removed, such as conductivity and the nitrogen compounds (total nitrogen, nitrates, nitrites and ammonium), have very little influence on cyanobacterial growth. Indeed, it is well known that cyanobacteria are able to fix nitrogen from the atmosphere, so that nitrogen is not a limiting nutrient in the way phosphorus is.

At the same time, 15 reservoirs were studied from 2006 to 2011 to assess their levels of eutrophication (Ortiz-Casas and Peña Martínez, 1984). These reservoirs are located in the Cantabrian basin (Northern Spain). Twelve of these reservoirs have less than 1% of cyanobacterial biovolume with respect to the overall biovolume of the samples. Only two of them, the San Andrés reservoir and the La Barca reservoir, have more than 30% of cyanobacterial biovolume with respect to the overall biovolume of phytoplankton. Therefore, these two reservoirs are similar to the Trasona reservoir. However, the Trasona reservoir is singular, because the cyanobacterial biovolume during blooms of cyanobacteria was equal to 100% of the overall biovolume of the samples (Sabater and Nolla, 1991).

The cyanobacteria community of the San Andrés reservoir is mainly composed of M. aeruginosa (75%) and W. naegeliana (18%). The values of W. naegeliana and the high values of the synergistic variable (Microcys__Worochinia) warn of a high risk of cyanotoxins. The water temperature (its high values indicate a shallow reservoir), the turbidity (its high values are those of a eutrophicated reservoir, as ratified by the high values of total phosphorus) and the alkalinity indicate the high risk of cyanotoxins, along with the other physical–chemical variables discussed above (Peretyatko et al., 2010).

The cyanobacteria community of the La Barca reservoir is mainly composed of M. aeruginosa (57%) and W. naegeliana (25%). In a similar way, the values of W. naegeliana and the high values of the synergistic variable warn of a high risk of cyanotoxins. The high values of water temperature are characteristic of a reservoir used to cool a coal power plant. This reservoir is eutrophicated, as shown by the high values of turbidity and ratified by the high values of total phosphorus. Furthermore, the values of turbidity are high because it is a shallow reservoir. Similarly, the physical–chemical variables discussed above, along with the alkalinity, indicate the high risk of cyanotoxins presence (Peretyatko et al., 2010).

As a consequence, W. naegeliana is the most important variable in the generation of cyanotoxins. Specifically, the cyanobacteria community of the Trasona reservoir is mainly composed of M. aeruginosa and W. naegeliana. It is well known that M. aeruginosa is potentially toxic. Up to now, there is only partial evidence of the toxicity of W. naegeliana (Willame et al., 2005). The majority of the samples that contained cyanotoxins were dominated by M. aeruginosa (47%), followed by W. naegeliana (38%). These data do not necessarily indicate that the dominant cyanobacterium is the largest producer of cyanotoxins (Willame et al., 2005).

In order to take into account the interaction between the input variables M. aeruginosa and W. naegeliana, not considered in other works (Chorus and Bartram, 1999; Willame et al., 2005; Seckbach, 2007), it was necessary to add a new input variable equal to the product of the concentrations of these two input variables, in addition to the other variables empirically measured in the Trasona reservoir. This kind of interaction is known as synergy or synergistic behaviour. Therefore, the production of cyanotoxins from M. aeruginosa or from W. naegeliana increases due to the combined presence of both species.

The term synergy comes from the Greek word ‘synergos’, meaning working together (Corning, 2012). Among biologists, the use of the term synergy has been limited until recently mainly to certain specialized areas, such as neurochemistry, cell biology and endocrinology. Moreover, most biologists recognize the subset of synergy known as ‘emergent effects’, as well as the synergies associated with coevolution. Synergistic response is a complicating factor in environmental modelling. Synergy has been advanced as a hypothesis on how complex systems operate. Environmental systems may react in a nonlinear way to perturbations, so that the outcome may be greater than the sum of the individual component alterations. Synergy is a room without walls in terms of which kinds of cooperative relationships are applicable, and it is relevant at every level of living systems, from enzymes to ecosystems. The synergistic phenomenon has been observed in the two cyanobacteria species (M. aeruginosa and W. naegeliana), and it produces a result greater than the sum of their individual effects. Therefore, cyanotoxin production is increased in a nonlinear way due to the combined presence of both species (Reynolds, 2006; Corning, 2012).

Table 3
Training basic parameters and their values for the genetic algorithm.

| Parameters | Value |
|---|---|
| Size of the population | 150 |
| Number of generations | 100 |
| Mutation probability | 1% |
| Clones allowed | No |
| Criterion (indicates which criterion is to be used in judging the quality of the subsets) | Standard coefficient of determination R² |
| Cardinality of the subset that is wanted | 6 |

Table 4
Evaluation of the importance of the variables that form the model: best variable-subset selected.

| Order of relevance | Variable |
|---|---|
| 1 | Woronichinia_naegeliana |
| 2 | Microcystis_aeruginosa × Woronichinia_naegeliana |
| 3 | Water_temperature |
| 4 | Turbidity |
| 5 | Total_phosphorus |
| 6 | Alkalinity |

Water temperature affects the solubility of many chemical compounds and can therefore influence the effect of pollutants on aquatic life. On the one hand, the metabolic oxygen demand grows as water temperature increases, which, in conjunction with reduced oxygen solubility, affects many species in a negative way (Arp and Yin, 1992; Blais et al., 1998). On the other hand, the synthesis of cyanotoxins is more frequent in warm waters than in cold waters. Temperature affects algal growth directly, but this growth is also affected indirectly by water temperature through its influence on the solubility of many chemical compounds. At the same time, ambient temperature affects the temperature of the Trasona reservoir’s surroundings and thus also affects the water temperature and the growth of aquatic plants.

Turbidity is a measurement of the suspended particulate matter in a water body and is usually produced by anthropogenic sources such as forest harvesting, road building, agriculture, urban developments, sewage treatment plant effluents, mining and industrial effluents (France and Peters, 1995). The growth of phytoplankton also contributes to turbidity. High levels of turbidity increase the total available surface area of solids in suspension upon which bacteria can grow. High turbidity reduces light penetration (Nicholls et al., 2003) and therefore impairs the photosynthesis of submerged vegetation and algae. In turn, the reduced plant growth may suppress fish productivity. This situation favours the dominance of some cyanobacteria such as M. aeruginosa (the main producer of cyanotoxins), because of their ability to move up or down in the water column according to their need for light (Deng et al., 2007).

Total phosphorus is an essential plant nutrient and is often the most limiting nutrient to plant growth in fresh water. Anthropogenic sources of total phosphorus are sewage treatment plant effluent, agriculture, urban developments (particularly from detergents), and industrial effluents. Since phosphorus is generally the most limiting nutrient, its input to fresh water systems can cause extreme proliferation of algal growth. Inputs of phosphorus are the prime contributing factors to eutrophication in most fresh water systems (Smol et al., 1983; Likens, 1985; Prepas et al., 2001). Phosphorus can be present as dissolved or particulate matter. The phosphates concentration (mg PO₄³⁻/l) is a measurement of the oxidized form of the soluble inorganic phosphorus. High concentrations of orthophosphate generally occur in conjunction with algal blooms. Phosphate is a limiting nutrient in ecological environments, and its availability may govern the growth rate of aquatic organisms. High phosphate concentrations give rise to eutrophication processes that increase cyanobacterial biomass, with the subsequent production of cyanotoxins.

Alkalinity is the measurement of the water’s ability to neutralize acids. It usually indicates the presence of carbonates, bicarbonates or hydroxides. Waters that have high alkalinity values are considered undesirable because of excessive hardness and high concentrations of sodium salts. Waters with low alkalinity have little capacity to buffer acidic inputs and are susceptible to acidification (low pH). Acidic precipitation, mining and industrial effluents are anthropogenic sources that lower alkalinity (Noges, 1992; Keenan and Kimmins, 1993). Carbonates and bicarbonates are part of the carbonate system, which has three soluble components in equilibrium: carbonate (CO₃²⁻), bicarbonate (HCO₃⁻) and carbon dioxide (CO₂). When the CO₂ concentration is increased, both carbonate and bicarbonate concentrations are decreased (low alkalinity) because of this equilibrium. In these conditions, green algae (non-cyanobacterial biomass) are favoured over cyanobacteria (Shapiro, 1984; Reynolds, 2006). Conversely, if the alkalinity is high (CO₂-limiting conditions), cyanobacteria are predominant because they possess an environmental adaptation known as a CO₂ concentrating mechanism (Price, 2011). Alkalinity results are expressed in terms of an equivalent amount of calcium carbonate.

In the final step of this analysis, once the six main variables had been selected by using an appropriate GA, a regression model based on support vector machines (SVR model) was fitted successfully in order to determine the cyanotoxins concentration in the Trasona reservoir. Cross-validation was the standard technique used here for finding a suitable set of hyperparameters of the SVR model. The data set is randomly divided into l disjoint subsets of equal size, and each subset is used once as a validation set, whereas the other l − 1 subsets are put together to form a training set. In the simplest case, the average accuracy over the l validation sets is used as an estimator for the accuracy of the method. The combination of hyperparameters with the best performance is chosen (Schölkopf and Smola, 2002; Shawe-Taylor and Cristianini, 2004; Steinwart and Christmann, 2008). In this way, 10-fold cross-validation was used. Table 5 shows the optimal hyperparameters of the fitted SVM model.
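The l-fold splitting scheme described above can be sketched as follows. This is a minimal illustration, not the software used in this work: the names `k_fold_indices` and `cross_validate` are hypothetical, and `score_fn` stands for any user-supplied routine that trains on the training indices and returns a score on the validation indices. When the number of samples is not a multiple of l, the fold sizes in this sketch differ by at most one.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (training, validation) index lists for each of the folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)                 # random division of the data
    folds = [idx[i::n_folds] for i in range(n_folds)]  # disjoint subsets
    for i, validation in enumerate(folds):
        # The remaining folds are merged into one training set.
        training = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield training, validation


def cross_validate(n_samples, score_fn, n_folds=10):
    """Average the validation scores returned by score_fn over all folds."""
    scores = [score_fn(train, val)
              for train, val in k_fold_indices(n_samples, n_folds)]
    return sum(scores) / len(scores)
```

Hyperparameter selection then amounts to calling `cross_validate` once per candidate combination and keeping the combination with the best average score.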

Table 5
Optimal hyperparameters of the fitted SVM model.

| Parameter | Value |
|---|---|
| SVM-type | ν-regression |
| SVM-kernel | Radial basis function |
| γ | 0.1666667 |
| ν | 0.26 |
| Number of support vectors | 59 |
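As an illustration of the kernel reported in Table 5, the sketch below evaluates a radial basis function with the tabulated γ. It assumes the common parameterization K(x, z) = exp(−γ‖x − z‖²), which may differ from the form used elsewhere in this paper. Note that 0.1666667 equals 1/6 to seven decimals, that is, one over the number of selected variables; whether this value came from a software default rather than from the search is an assumption.

```python
import math

GAMMA = 0.1666667  # value reported in Table 5


def rbf_kernel(x, z, gamma=GAMMA):
    """Radial basis function kernel, assuming K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Two hypothetical, already scaled six-variable samples.
sample_a = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
sample_b = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
similarity = rbf_kernel(sample_a, sample_b)
```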

Fig. 7. Comparison between the three blooms of cyanobacteria observed and predicted by the model in the Trasona reservoir from 2006 to 2011 (x-axis: observation number; y-axis: cyanotoxin concentration in µg/l; series: real values and predicted values).

Finally, this research work was able to estimate the presence of cyanobacteria blooms from 2006 to 2011 with great accuracy, in agreement with the actual cyanobacteria blooms observed (see Fig. 7).

4. Conclusions

To summarize, cyanotoxins are a very common and serious problem for recreational reservoirs throughout the world. The commonly used diagnostic techniques, like limnological studies, are costly to implement from both the material and human points of view. In this sense, there is a clear need to develop alternative diagnostic techniques such as the hybrid GA–SVR approach used in this study. The main findings of this analysis can be summarized as follows:

In the first place, the main purpose of this research work was to build a cyanotoxin diagnostic model by using a hybrid GA–SVR approach in the Trasona reservoir with site-specific experimental data, and this goal was achieved successfully. We used the biological input variables (phytoplankton species expressed in biovolume and the chlorophyll concentration) in combination with the most important physical–chemical parameters.

Secondly, a correlation coefficient equal to 0.98 was obtained when the hybrid GA–SVR technique was applied to the experimental data set. The predictions of the model proved to be consistent with the history of actual cyanobacteria blooms observed from 2006 to 2011.

Thirdly, one of the main findings of this study was the order of significance of the variables involved in the prediction of the cyanotoxins presence. Specifically, W. naegeliana and the synergistic variable formed by M. aeruginosa multiplied by W. naegeliana are the two most influential variables in cyanotoxin production. The third variable is water temperature, the fourth is turbidity, the fifth is total phosphorus and finally the sixth is alkalinity.

Finally, the authors are confident that the results obtained will be useful for future studies in other similar reservoirs and lakes, by applying the methodology developed here to predict the presence of cyanotoxins.

Acknowledgments

The authors wish to acknowledge the computational support provided by the Departments of Mathematics, Construction and Mining Exploitation at the University of Oviedo, as well as the pollutant data for the Trasona reservoir of Avilés (Northern Spain) supplied by the Cantabrian Basin Authority (Ministry of Agriculture, Food and Environment of Spain). Furthermore, the authors would like to express their gratitude to the Department of Education and Science of the Principality of Asturias for its partial financial support (Grant reference FC-11-PC10-19). Finally, the English grammar and spelling of the manuscript have been revised by a native speaker.

References

Allman, E.S., Rhodes, J.A., 2003. Mathematical Models in Biology: An Introduction. Cambridge University Press, New York.

Álvarez Cobelas, M., Arauzo, M., 2006. Phytoplankton responses to varying time scales in a eutrophic reservoir. Arch. Hydrobiol. Ergeb. Limnol. 40, 69–80.

American Public Health Association, 1998. American Water Works Association, Water Environment Federation. Standard Methods for the Examination of Water and Wastewater, no. 20. APHA/AWWA/WEF, Washington.

Arp, P.A., Yin, X., 1992. Predicting water fluxes through forests from monthly precipitation and mean monthly air temperature records. Can. J. For. Res. 22, 864–877.

Barnes, D.J., Chu, D., 2010. Introduction to Modeling for Biosciences. Springer, New York.

Blais, J.M., France, R.L., Kimpe, L.E., Cornett, R.J., 1998. Climatic changes in northwestern Ontario have had a greater effect on erosion and sediment accumulation than logging and fire: evidence from ²¹⁰Pb chronology in lake sediments. Biogeochemistry 43, 235–252.

Chorus, I., Bartram, J., 1999. Toxic Cyanobacteria in Water: A Guide to their Public Health Consequences, Monitoring and Management. Spon Press, New York.

Corning, P., 2012. Nature’s Magic: Synergy in Evolution and the Fate of Humankind. Cambridge University Press, New York.

Cortes, C., Vapnik, V., 1995. Support vector networks. Mach. Learn. 20, 273–297.

Cristianini, N., Shawe-Taylor, J., 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge, UK.

Dasí, M.J., Miracle, M.R., Camacho, A., Soria, J.M., Vicente, E., 1998. Summer phytoplankton assemblages across trophic gradients in hard-water reservoirs. Hydrobiologia 369–370, 27–43.

David, P., Fewer, D.P., Köykkä, K., Halinen, K., Jokela, J., Lyra, C., Sivonen, K., 2009. Culture-independent evidence for the persistent presence and genetic diversity of microcystin-producing Anabaena (Cyanobacteria) in the Gulf of Finland. Environ. Microbiol. 11 (4), 855–866.

Davis, L., 1991. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York.

de Cos Juez, F.J., Sánchez Lasheras, F., García Nieto, P.J., Suárez Suárez, M.A., 2009. A new data mining methodology applied to the modelling of the influence of diet and lifestyle on the value of bone mineral density in post-menopausal women. Int. J. Comput. Math. 86 (10–11), 1878–1887.

de Cos Juez, F.J., García Nieto, P.J., Martínez Torres, J., Taboada Castro, J., 2010. Analysis of lead times of metallic components in the aerospace industry through a supported vector machine model. Math. Comput. Model. 52 (7–8), 1177–1184.

de Hoyos, C., Negro, A., Aldasoro, J.J., 2004. Cyanobacteria distribution and abundance in the Spanish water reservoirs during thermal stratification. Limnetica 23 (1–2), 119–132.

Deng, D.-G., Xie, P., Zhou, Q., Yang, H., Guo, L.-G., 2007. Studies on temporal and spatial variations of phytoplankton in lake Chaohu. J. Integr. Plant Biol. 49 (4), 409–418.

Dixit, A., Dhaked, R.K., Alam, S.I., Singh, L., 2005. Military potential of biological neurotoxins. Toxin Rev. 24 (2), 175–207.

Engelbrecht, A.P., 2007. Computational Intelligence: An Introduction. Wiley, New York.

Fletcher, T., 2009. Support Vector Machines Explained: Introductory Course. Internal Report. University College London (UCL), London.

Fogg, G.E., Stewart, W.D.P., Fay, P., Walsby, A.E., 1973. The Blue-green Algae. Academic Press, London.

France, R.L., Peters, R.H., 1995. Predictive model of the effects on lake metabolism of decreased airborne litterfall through riparian deforestation. Conserv. Biol. 9 (6), 1578–1586.

Gault, P.M., Marler, H.J., 2009. Handbook on Cyanobacteria: Biochemistry, Biotechnology and Applications. Nova Science Publishers, New York.

Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, New York.

Haupt, R.L., Haupt, S.E., 2004. Practical Genetic Algorithms. Wiley-Interscience, New York.

Hense, I., Burchard, H., 2010. Modelling cyanobacteria in shallow coastal seas. Ecol. Model. 221 (2), 238–244.

Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor.

Huisman, J., Matthijs, H.C.P., Visser, P.M., 2010. Harmful Cyanobacteria. Springer, New York.

Keenan, R.J., Kimmins, J.P., 1993. The ecological effects of clear-cutting. Environ. Rev. 1, 121–144.

Keerthi, S.S., 2002. Efficient tuning of SVM hyper-parameters using radius/margin bound and iterative algorithms. IEEE Trans. Neural Networks 13 (5), 1225–1229.

Likens, G.E., 1985. An Ecosystem Approach to Aquatic Ecology: Mirror Lake and its Environment. Springer-Verlag.

Negro, A.I., de Hoyos, C., Vega, J.C., 2000. Phytoplankton structure and dynamics in Lake Sanabria and Valparaíso reservoir (NW Spain). Hydrobiologia 424, 25–37.

Nicholls, K.H., Steedman, R.J., Carey, E.C., 2003. Changes in phytoplankton communities following logging in the drainage basins of three boreal forest lakes in north-western Ontario. Can. J. Fish. Aquat. Sci. 60, 43–54.

Noges, P., 1992. Changes in the ionic composition of Lake Võrtsjärv (Estonian Republic). Limnologia 22, 115–120.

Ortiz-Casas, J.L., Peña Martínez, R., 1984. Applicability of the OECD eutrophication models to Spanish reservoirs. Verh. Int. Ver. Limnol. 22 (3), 1521–1535.

Ordóñez Galán, C., Rodríguez-Pérez, J.R., Martínez Torres, J., García Nieto, P.J., 2011. Analysis of the influence of forest environments on the accuracy of GPS measurements by using genetic algorithms. Math. Comput. Model. 54 (7–8), 1829–1834.


Pérez-Martínez, C., Sánchez-Castillo, P., 2004. Temporal occurrence of Ceratium hirundinella in Spanish reservoirs. Hydrobiologia 452 (1–3), 101–107.

Peretyatko, A., Teissier, S., De Backer, S., Triest, L., 2010. Assessment of the risk of cyanobacterial bloom occurrence in urban ponds: probabilistic approach. Ann. Limnol.-Int. J. Limnol. 46 (2), 121–133.

Peschek, G.A., Obinger, C., Renger, G., 2011. Bioenergetic Processes of Cyanobacteria: From Evolutionary Singularity to Ecological Diversity. Springer, New York.

Prepas, E.E., Pinel-Alloul, B., Planas, D., Method, G., Paquet, S., Reedyk, 2001. Forest harvest impacts on water quality and aquatic biota on the boreal plain: introduction to the TROLS program. Can. J. Fish. Aquat. Sci. 58, 421–436.

Price, G.D., 2011. Inorganic carbon transporters of the cyanobacterial CO₂ concentrating mechanism. Photosynth. Res. 109, 47–57.

Quesada, A., Sanchis, D., Carrasco, D., 2004. Cyanobacteria in Spanish reservoirs. How frequently are they toxic? Limnetica 23 (1–2), 109–118.

Quesada, A., Moreno, E., Carrasco, D., Paniagua, T., Wormer, L., de Hoyos, C., Sukenik, A., 2006. Toxicity of Aphanizomenon ovalisporum (cyanobacteria) in a Spanish water reservoir. Eur. J. Phycol. 41 (1), 39–45.

Reynolds, C.S., 2006. Ecology of Phytoplankton. Cambridge University Press, New York.

Ruxton, G.D., Beauchamp, G., 2008. The application of genetic algorithms in behavioural ecology, illustrated with a model of anti-predator vigilance. J. Theor. Biol. 250, 435–448.

Sabater, S., Nolla, J., 1991. Distributional patterns of phytoplankton in Spanish reservoir: first results and comparison after fifteen years. Verh. Int. Ver. Limnol. 24 (2), 1371–1375.

Sánchez Lasheras, F., Vilán Vilán, J.A., García Nieto, P.J., del Coz Díaz, J.J., 2010. The use of design of experiments to improve a neural network model in order to predict the thickness of the chromium layer in a hard chromium plating process. Math. Comput. Model. 52 (7–8), 1169–1176.

Schölkopf, B., Smola, A.J., 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. The MIT Press, Cambridge, MA.

Seckbach, J., 2007. Algae and Cyanobacteria in Extreme Environments. Springer, New York.

Shapiro, J., 1984. Blue-green dominance in lakes: the role and management significance of pH and CO₂. Int. Rev. Gesamten Hydrobiol. 4 (69), 765–780.

Shawe-Taylor, J., Cristianini, N., 2004. Kernel Methods for Pattern Analysis. Cambridge University Press, New York.

Sivanandam, S.N., Deepa, S.N., 2010. Introduction to Genetic Algorithms. Springer, New York.

Smith, M.J., Shaw, G.R., Eaglesham, G.K., Ho, L., Brookes, J.D., 2008. Elucidating the factors influencing the biodegradation of cylindrospermopsin in drinking water sources. Environ. Toxicol. 23 (3), 421–423.

Smol, J.P., Brown, S.R., McNeely, R.N., 1983. Cultural disturbances and trophic history of a small meromictic lake from central Canada. Hydrobiologia 103, 125–130.

Spoof, L., Berg, K.A., Rapala, J., Lahti, K., Lepisto, L., Metcalf, J.S., Codd, G.A., Meriluoto, J., 2006. First observation of cylindrospermopsin in Anabaena lapponica isolated from the boreal environment (Finland). Environ. Toxicol. 21 (6), 552–560.

Steinwart, I., Christmann, A., 2008. Support Vector Machines. Springer, New York.

Stewart, I., Webb, P.M., Schluter, P.J., Shaw, G.R., 2006. Recreational and occupational field exposure to freshwater cyanobacteria: a review of anecdotal and case reports, epidemiological studies and the challenges for epidemiologic assessment. Environ. Health 5 (6), 1–13.

Suárez Sánchez, A., García Nieto, P.J., Riesgo Fernández, P., del Coz Díaz, J.J., Iglesias-Rodríguez, F.J., 2011. Application of an SVM-based regression model to the air quality study at local scale in the Avilés urban area (Spain). Math. Comput. Model. 54 (5–6), 1453–1466.

Taboada, J., Matías, J.M., Ordóñez Galán, C., García Nieto, P.J., 2007. Creating a quality map of a slate deposit using support vector machines. J. Comput. Appl. Math. 204 (1), 84–94.

Vapnik, V., Golowich, S.E., Smola, A., 1997. Support vector method for function approximation, regression estimation, and signal processing. Adv. Neur. In. 9, 281–287.

Vapnik, V., 1998. Statistical Learning Theory. Wiley-Interscience, New York.

Vasconcelos, V., 2006. Eutrophication, toxic cyanobacteria and cyanotoxins: when ecosystems cry for help. Limnetica 25 (1–2), 425–432.

Vrugt, J.A., Robinson, B.A., 2007. Improved evolutionary optimization from genetically multimethod search. Proc. Natl. Acad. Sci. 104, 708–711.

Wang, Z., Huang, K., Zhou, P., Guo, H., 2010. A hybrid neural network model for cyanobacteria bloom in Dianchi lake. Proc. Environ. Sci. 2, 67–75.

Whitton, B.A., Potts, M., 2000. The Ecology of Cyanobacteria: Their Diversity in Time and Space. Springer, New York.

Willame, R., Jurckzak, T., Iffly, J.F., Kull, T., Meriluoto, J., Hoffman, L., 2005. Distribution of hepatotoxic cyanobacterial blooms in Belgium and Luxembourg. Hydrobiologia 551, 99–117.

World Health Organization, 1998. Guidelines for Drinking-water Quality: Health Criteria and Other Supporting Information, vol. 2, Geneva.

