## SUMO - Supermodeling by combining imperfect models

### Year 3 Report - Workpackage 7

### Panče Panov, Nikola Simidjievski, Jovan Tanevski, Darko Aleksovski, Aljaž Osojnik, Ljupčo Todorovski, and Sašo Džeroski

### May 14, 2014

### WP7 year 3 report: Executive summary

The objective of WP7 is to develop methods for computational scientific discovery that can learn complete supermodels (ensembles of ODE models) of dynamical systems. Its sub-objectives include the development of techniques for (semi)automated generation of constituent models (Task 7.1), selection of an appropriate subset of models to include in the ensemble (Task 7.2), and selection of an appropriate method to combine the models and their predictions (Task 7.3). The focus for WP7 in Year 3 was Task 7.3 (combining the model predictions), but this required polishing and revisiting certain aspects of Tasks 7.1 and 7.2 (generating and selecting constituent models) and implementing all of the above within an integrated software environment.

The constituent models (ODEs) are learnt from observed data and domain knowledge by the process-based modeling tool ProBMoT. Note that both the structure and the parameters of the process-based models (and the corresponding ODEs) are learnt. Thus, the ensembles can contain models with different structure, which is a novelty as compared to ensembles that are based on (random) variation of the initial conditions or parameter values of the ODE models.

ProBMoT searches a space of possible model structures, defined by the given domain knowledge, and fits the parameter values to each structure considered, based on the input data. The model structures considered are ranked in terms of their error, either on the training data set or on a separate validation data set. The highest ranked model is the output of ProBMoT.

To generate (diverse) base-level models, we have extended ProBMoT to support the three steps outlined above. First, it allows for two types of sampling of the (time-course) data given as input: bootstrap sampling and error-weighted sampling. Second, it allows for two (independent) kinds of selection of constituent models for the ensemble, one based on the error on training/validation data and the other on simulation stability: the latter is used for dynamic prediction-time selection of models to be used for prediction within the ensemble. Finally, it is possible to use three different methods for combining the predictions of the constituent models, using three different types of aggregation (taking a simple average or a weighted average/median). We empirically evaluated the bagging and boosting approaches to learning ensembles of process-based (ODE) models on 15 different data sets, coming from three aquatic ecosystems. In the evaluation, we focus on the predictive power of the two types of ensembles and of individual models. Note that previous approaches to computational scientific discovery have focused on explanatory models of dynamics and their descriptive power and have not investigated the predictive power of such models.

The results of the empirical evaluation can be summarized as follows. The ensembles learned by bagging significantly outperform individual models. The ensembles learned by bagging perform better than those learned by boosting, but not significantly. Finally, the ensembles learned by boosting perform better than individual models, but not significantly. Overall, the ensembles learned by bagging perform best.

A longer summary of the above contributions can be found in Sections 1, 2 and 3 of this report. A detailed description can be found in the papers associated with this report. In particular, papers P1, P2, and P3 describe the above contributions in detail.

In addition to developing methods for learning ensembles of ODE models, we have applied these methods (as well as learning individual models) to practically relevant problems from environmental and life sciences. These include modeling phytoplankton dynamics in three lakes (Bled, Slovenia; Kasumigaura, Japan; and Zurich, Switzerland) and modeling hydrological and nutrient leaching processes at the watershed level (papers P3 and P4). They also include an application in systems biology, i.e., modeling the process of endocytosis, a crucial part of the immune response (papers P5, P6, and P7), as well as applications in the area of synthetic biology (talks IT1, IT2). More details can be found in Section 4 and the associated papers.

Besides learning ensembles of (continuous-time) ODE models, we have also addressed the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time. We have used the external dynamics approach, which formulates this task as a regression task of learning a difference/recurrence equation (model). This model predicts the state of the system at a given discrete time point as a (nonlinear) function of the states and inputs of the system at several previous time points.

In particular, we propose methods for learning fuzzy linear model trees and ensembles thereof. These are used as the function approximators within the external dynamics approach. Our methods can learn trees (and ensembles) for predicting either a single output (one state variable) or multiple outputs (the entire system state).

We evaluate the proposed methods on a number of problems from the area of control engineering. We show that they perform well and are resilient to noise. Among the different approaches proposed, ensembles of trees for predicting multiple outputs can be recommended, since they perform best and yield the most succinct overall models. The above contributions are described in detail in papers P8, P9, P10 and P11.

Finally, we also tackle the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time in a streaming context. The data stream paradigm matches the concept of modeling dynamic systems very well, as the continued observation of the state and inputs of a dynamic system over (discrete) time results in a (potentially infinite) stream of data. Within this paradigm, we can handle large quantities of data (both a large number of variables and a large/infinite number of observations), which makes it suitable for some novel application areas (such as global systems science).

We have developed novel methods for learning ensembles of regression and model trees on data streams, such as on-line bagging and on-line random forests. We have also proposed the approach of on-line learning of model trees with options, which presents an effective compromise between learning a single model and an ensemble. We have applied these approaches to several tasks of discrete-time modeling of dynamic systems. The above contributions are described in the papers P12 and P13.

Highlights

• We have developed equation discovery methods for learning ensemble models of dynamic systems, where both the structures of the constituent ODE models and their parameters are learnt from observed data. The ensembles can contain models with different structure, which is a novelty as compared to ensembles that are based on (random) variation of the initial conditions or parameter values of the ODE models. The ensembles consist of ODE models learnt by using the bagging and boosting approaches to sampling the observed data, and their predictions are combined by using simple averaging of the predicted trajectories.

We empirically evaluated the two approaches (bagging and boosting) on 15 different data sets, coming from three aquatic ecosystems, comparing them between themselves and to individual models. The ensembles learned by bagging perform best and significantly outperform individual models in terms of predictive power. Note that previous approaches to computational scientific discovery have focused on explanatory models of dynamics and their descriptive power and have not investigated the predictive power of such models.

• Besides learning ensembles of (continuous-time) ODE models, we have also addressed the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time. We have used the external dynamics approach, which formulates this task as a regression task of learning a difference/recurrence equation (model). We address this task both in the batch learning and the on-line/data stream context.

We have developed novel methods for learning ensembles of regression and model trees on data streams, such as on-line bagging and on-line random forests. We have also proposed the approach of on-line learning of model trees with options, which presents an effective compromise between learning a single model and an ensemble. We have successfully applied these approaches to several tasks of discrete-time modeling of dynamic systems.

### List of papers associated with this deliverable

Conference and journal papers

P1 Nikola Simidjievski, Ljupčo Todorovski, Sašo Džeroski. Learning bagged models of dynamic systems. In: Zbornik, 5. študentska konferenca Mednarodne podiplomske šole Jožefa Stefana = 5th Jožef Stefan International Postgraduate School Students Conference, 23 May 2013, Ljubljana, Slovenia, Nejc Trdin, ed., et al. Ljubljana: Mednarodna podiplomska šola Jožefa Stefana, 2013, pp. 177-188.

P2 Nikola Simidjievski, Ljupčo Todorovski, Sašo Džeroski. Learning ensembles of population dynamics and their application to modelling aquatic ecosystems. Submitted to the Ecological Modelling journal, 2014.

P3 Nikola Simidjievski, Ljupčo Todorovski, Sašo Džeroski. Bagging and boosting of process-based models. Submitted to the Machine Learning journal, 2014.

P4 Mateja Škerjanec, Nataša Atanasova, Darko Čerepnalkoski, Sašo Džeroski, Boris Kompare. Development of a knowledge library for automated watershed modeling. Environmental Modelling & Software, Vol. 54, pp. 60-72, 2014.

P5 Jovan Tanevski, Ljupčo Todorovski, Sašo Džeroski. Automated modeling of Rab5-Rab7 conversion in endocytosis. In: Zbornik, 5. študentska konferenca Mednarodne podiplomske šole Jožefa Stefana = 5th Jožef Stefan International Postgraduate School Students Conference, 23 May 2013, Ljubljana, Slovenia, Nejc Trdin, ed., et al. Ljubljana: Mednarodna podiplomska šola Jožefa Stefana, 2013, pp. 209-218.

P6 Jovan Tanevski, Ljupčo Todorovski, Yannis Kalaidzidis, Sašo Džeroski. Inductive process modeling of Rab5-Rab7 conversion in endocytosis. In: Discovery Science: 16th International Conference, DS 2013, Singapore, October 6-9, 2013, Proceedings (Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence, vol. 8140), Johannes Fürnkranz, ed., et al. Berlin: Springer, 2013, pp. 265-280.

P7 Jovan Tanevski, Ljupčo Todorovski, Yannis Kalaidzidis, Sašo Džeroski. Process-based modeling of endocytosis. Submitted to the Information Sciences journal, 2014.

P8 Darko Aleksovski, Juš Kocijan, Sašo Džeroski. Model tree ensembles for modeling dynamic systems. In: Discovery Science: 16th International Conference, DS 2013, Singapore, October 6-9, 2013, Proceedings (Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence, vol. 8140), Johannes Fürnkranz, ed., et al. Berlin: Springer, 2013, pp. 17-32.

P9 Darko Aleksovski, Juš Kocijan, Sašo Džeroski. Model-Tree Ensembles for Noise-Tolerant System Identification. Submitted to Advanced Engineering Informatics, 2014.

P10 Darko Aleksovski, Juš Kocijan, Sašo Džeroski. Model Tree Ensembles for the Identification of Multiple-Output Systems. Accepted for presentation at the European Control Conference, Strasbourg, 2014.

P11 Darko Aleksovski, Juš Kocijan, Sašo Džeroski. Ensembles of Linear Model Trees for the Identification of Multiple-Output Systems. Submitted to Information Sciences, 2014.

P12 Elena Ikonomovska, João Gama, Sašo Džeroski. Online Tree-based Ensembles and Option Trees for Regression on Evolving Data Streams. Accepted for publication in the Neurocomputing journal, 2014.

P13 Aljaž Osojnik, Sašo Džeroski. Modeling Dynamical Systems with Data Stream Mining. Submitted to the Neurocomputing journal, 2014.

Invited talks

IT1 Sašo Džeroski, Ljupčo Todorovski. Equation Discovery for Systems and Synthetic Biology. Invited talk at the CMUSV Symposium on Cognitive Systems and Discovery Informatics, Carnegie Mellon University, 2013. http://www.cogsys.org/app/webroot/symposium/2013/abstracts.html

IT2 Sašo Džeroski. Inductive Process Modeling for Learning the Dynamics of Biological Systems. Invited talk at the First International Conference on Formal Methods in Macro-Biology, Nouméa, New Caledonia, 2014. http://fmmb2014.sciencesconf.org/

Dissertations

D1 Elena Ikonomovska. Algorithms for learning regression trees and ensembles on evolving data streams. Doctoral dissertation, 2012.

D2 Darko Čerepnalkoski. Process-based models of dynamical systems: Representation and induction. Doctoral dissertation, 2013.

D3 Darko Aleksovski. Tree ensembles for discrete-time modeling of non-linear dynamic systems. Doctoral dissertation. Expected to be defended in 2014.

D4 Nikola Simidjievski. Learning ensembles of process-based models of dynamic systems. Doctoral dissertation. Expected to be defended in 2015.

D5 Jovan Tanevski. Deterministic and stochastic process-based modeling and design of dynamic systems in biology. Doctoral dissertation. Expected to be defended in 2015.

### Contents

1 Generating a diverse set of models
  1.1 Bootstrap sampling
  1.2 Error-weighted sampling
  1.3 Bagging and boosting of ODE models
      1.3.1 Bagging of PBMs
      1.3.2 Boosting of PBMs
  1.4 ProBMoT: A software tool for generating ensembles of process-based models
  1.5 Empirical evaluation of bagging and boosting

2 Selecting ensemble constituents
  2.1 Selection at the level of the base algorithm
  2.2 Dynamic ensemble pruning at run time

3 Combining model predictions
  3.1 Simulation of individual ensemble models
  3.2 Combining model simulations

4 Applications in real-world domains
  4.1 Applications in ecology
  4.2 Applications in systems & synthetic biology

5 Discrete-time modeling of dynamical systems with ensemble methods
  5.1 Model-tree ensembles for noise-tolerant system identification
  5.2 Ensembles of linear model trees for the identification of multiple-output systems

6 Data streams and dynamical systems
  6.1 Ensembles of data streams

### 1 Generating a diverse set of models

Ensemble methods construct a set of predictive models in the learning phase and then combine their predictions into a single one in the prediction phase. The first three sections of this deliverable present methods for learning ensembles of process-based models of dynamic systems. This first section presents methods for generating a diverse set of candidate ensemble constituents. The next section presents methods for selecting the constituents to be included in the ensemble. The third section presents methods for predicting with ensembles, i.e., combining the predictions of individual ensemble constituents into a single ensemble prediction.

Based on how the ensemble constituents are learned, ensembles can be homogeneous or heterogeneous. In homogeneous ensembles, the base models are learned using the same learning algorithm on different samples of the training data, where the sampling variants include: sampling of data instances, e.g., bagging [2] and boosting [6]; sampling of data features/attributes, e.g., random subspaces [8]; or both, e.g., random forests [1]. On the other hand, in heterogeneous ensembles, the candidate base models are learned using different learning algorithms, e.g., stacking [10].

Here, we consider homogeneous ensembles, where we employ ProBMoT, an environment for simulating and learning process-based models, to learn the individual ensemble constituents. Each of these is learned on a different sample of the training data. We use two sampling techniques. The first is simple bootstrap sampling, where the data samples for learning ensemble constituents are independent; this is the technique used in bagging ensemble methods. The second technique is error-weighted sampling, where the probability of sampling a data point from the training set is proportional to the model prediction error on that point. In this way, ensemble constituents focus on improving erroneous predictions; this technique is often used in boosting.

### 1.1 Bootstrap sampling

The notable difference from the usual bootstrap sampling setting in machine learning is that in our case the data instances have a temporal ordering that has to be retained in each data sample. To this end, we implement sampling by retaining the order of the instances and introducing instance weights. These correspond to the number of times an instance has been selected in the process of sampling with replacement. Instances that have not been selected remain in the sample with a weight of zero.
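As an illustration, the order-preserving bootstrap can be implemented by drawing time-point indices with replacement and recording the selection counts as weights. This is a minimal sketch in Python; the function name and interface are our own, not ProBMoT's:

```python
import numpy as np

def bootstrap_weights(n, rng=None):
    """Order-preserving bootstrap: instead of reordering the time series,
    draw n time-point indices with replacement and use the selection count
    of each point as its weight. Unselected points keep weight 0."""
    rng = rng or np.random.default_rng()
    picks = rng.integers(0, n, size=n)          # sample with replacement
    return np.bincount(picks, minlength=n)      # weight = number of selections
```

The weights always sum to n, so the sample has the same effective size as the original series while preserving its temporal order.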

To take into account the weights when learning a model from the sample, we use them to calculate the model prediction error, as follows:

$$\mathrm{WRMSE}(m) = \sqrt{\frac{\sum_{t=0}^{n} \omega_t \cdot (y_t - \hat{y}_t)^2}{\sum_{t=0}^{n} \omega_t}}, \quad (1)$$

where $y_t$ and $\hat{y}_t$ correspond to the measured and simulated values (simulating the model $m$) of the system variable $y$ at time point $t$, $n$ denotes the number of instances in the data sample, and $\omega_t$ denotes the weight of the data instance at time point $t$.
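Equation 1 translates directly into code. The following is a small sketch (a helper of our own, with names chosen to match the symbols in the equation):

```python
import numpy as np

def wrmse(y, y_hat, w):
    """Weighted root mean squared error (Equation 1):
    sqrt( sum_t w_t * (y_t - yhat_t)^2 / sum_t w_t )."""
    y, y_hat, w = (np.asarray(a, dtype=float) for a in (y, y_hat, w))
    return float(np.sqrt(np.sum(w * (y - y_hat) ** 2) / np.sum(w)))
```

Note that points with weight zero (those not selected by the bootstrap) drop out of the error entirely, as intended.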

### 1.2 Error-weighted sampling

While in bootstrap sampling the data samples are random and mutually independent, in error-weighted sampling we take into account the model prediction error (of an incomplete ensemble). The weights for a particular sample are distributed among data points proportionally to the model error on those points. More specifically, for each data sample, the weights of individual data points are updated using the formula

$$\omega_t \leftarrow \omega_t \cdot \left( \frac{\bar{L}}{1 - \bar{L}} \right)^{1 - L_t}, \quad (2)$$

where $L_t$ is the model error at time point $t$, while $\bar{L}$ is the average model error on the whole training set. Both are estimated on the model learned in the previous iteration. For the first learning iteration, all the weights are set to 1.
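The update of Equation 2 can be sketched as follows (a hypothetical helper, not ProBMoT code; we also renormalize the weights, mirroring the normalization used in the boosting procedure):

```python
import numpy as np

def update_weights(w, losses):
    """Error-weighted update from Equation 2:
    omega_t <- omega_t * (Lbar / (1 - Lbar))^(1 - L_t),
    where L_t in [0, 1) is the loss at time point t and Lbar is the
    weighted average loss. Small losses shrink a point's weight, so the
    next iteration focuses on poorly predicted points."""
    w, losses = np.asarray(w, dtype=float), np.asarray(losses, dtype=float)
    l_bar = np.sum(losses * w) / np.sum(w)      # weighted average loss
    beta = l_bar / (1.0 - l_bar)
    w = w * beta ** (1.0 - losses)
    return w * len(w) / np.sum(w)               # renormalize so weights sum to N
```

For example, with uniform starting weights and losses [0.2, 0.4], the point with the larger loss ends up with the larger weight.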

### 1.3 Bagging and boosting of ODE models

Bagging (bootstrap aggregation), developed by Breiman [2], is one of the first and simplest ensemble learning methods. This method uses bootstrap sampling with aggregation. First, bootstrap replicates are obtained by randomly sampling data instances from the training set with replacement. Next, each base model is learned from a different bootstrap replicate.

Boosting refers to a general approach for obtaining an accurate prediction by combining several rough ones, based on different weights of the examples from the training data set. The AdaBoost algorithm, proposed by [7], is an implementation of the boosting approach for the task of classification. AdaBoost works iteratively; it uses differently weighted versions of the training data for learning the base models at each iteration. Depending on the outcome of the past iteration, this method decreases (for correct classification) or increases (for incorrect classification) the weight of every instance for the subsequent iteration of training the model. This process ensures that the weak predictors focus on different instances, thus creating more robust ones. In addition, the implementation of [4] successfully tackles the problem of combining regressors using AdaBoost.

In the following two subsections, we introduce algorithms for bagging and boosting process-based models.

1.3.1 Bagging of PBMs

Algorithm 1 Bagging process-based models

```
 1: procedure Bagging(im, lib, DT, DV, k) returns Ensemble
 2:   Ensemble ← ∅                           ▷ set of base models
 3:   let N                                  ▷ number of measurements in DT
 4:   for i = 1 to k do
 5:     ω_t ← sample(DT), t = 0..N           ▷ assign random weights to the data points
 6:     ω ← normalize(ω, N)                  ▷ normalize to N
 7:     model_i ← probmot(im, lib, DT, DV, ω)
 8:     β_i ← confidence(model_i, DV)
 9:     Ensemble ← Ensemble ∪ {(model_i, β_i)}
10:   end for
11: end procedure
12:
13: function confidence(model, D) returns β
14:   let ŷ_t                                ▷ simulated system variable y at time point t
15:   let y_t                                ▷ measured system variable y at time point t
16:   let N                                  ▷ number of measurements in D
17:   ŷ_t ← simulate(model, D), t = 0..N
18:   L_t ← |y_t − ŷ_t|² / (sup_t |y_t − ŷ_t|)², t = 0..N   ▷ squared loss at each time point t
19:   L̄ ← Σ_{t=0}^{N} L_t                    ▷ calculate average loss
20:   β ← L̄ / (1 − L̄)                        ▷ calculate confidence
21: end function
```

Algorithm 1 provides an outline of our approach to bagging process-based models. The procedure Bagging() takes five inputs: a conceptual model im, a library of domain knowledge lib, a training data set DT, a validation data set DV, and an integer k that denotes the number of candidate ensemble constituents we are going to learn. The output is a set of process-based models Ensemble. Using a call to probmot() in line 7, we learn the candidate base models from different samples of the training data. The standard input to the probmot() procedure is a conceptual model, a library, and a training data set. Additionally, here it takes two more inputs: a validation data set DV and a set of weights ω. The first is used to estimate the performance of the learned model on unseen data, while the second influences the model error estimate according to Equation 1.

The output of a modeling task, when using ProBMoT, is an ordered list of models, ranked according to their decreasing error on the training data set, or on the validation data set if the latter is available. The highest ranked model is considered to be a candidate ensemble constituent and is therefore added to the output set Ensemble together with its confidence β, calculated on the validation data set (DV) by the call of confidence().
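The overall bagging loop of Algorithm 1 can be sketched as follows. Here `learn_model(weights)` stands in for the call to probmot() and `confidence(model)` for the validation-set confidence; both are placeholders, not ProBMoT's actual API:

```python
import numpy as np

def bag_pbms(learn_model, confidence, n_points, k, rng=None):
    """Sketch of Algorithm 1: learn k constituents, each from an
    order-preserving bootstrap sample encoded as a weight vector."""
    rng = rng or np.random.default_rng()
    ensemble = []
    for _ in range(k):
        picks = rng.integers(0, n_points, size=n_points)
        w = np.bincount(picks, minlength=n_points).astype(float)
        w *= n_points / w.sum()                 # normalize weights to N
        model = learn_model(w)                  # stand-in for probmot(im, lib, DT, DV, w)
        ensemble.append((model, confidence(model)))
    return ensemble
```

Each constituent sees the same time axis but a different weighting of it, which is what produces the diversity in the ensemble.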

1.3.2 Boosting of PBMs

Algorithm 2 provides an outline of our approach to boosting process-based models. As in the case of bagging, the Boosting() procedure takes the same five inputs: a conceptual model im, a library of domain knowledge lib, training and validation data sets (DT and DV), and an integer k denoting the number of boosting iterations to be performed. In contrast to bagging, here we start with the complete training data set. In addition, we introduce the same concept of weighting, but instead of choosing random time points at the beginning, we set all the weights to 1. After every boosting iteration, the weights are recalculated (line 7 in Algorithm 2) according to the error of the ensemble model selected in the previous iteration, using the update formula presented in Equation 2.

The reweight() function implements the update formula and takes 3 inputs: the ensemble constituent selected in the current boosting iteration (denoted model), a data set D, and the respective set of weights ω. First, the model is simulated on the data set D, resulting in a trajectory ŷ. Based on the model error at each time point in the trajectory and the set of weights ω, an average loss L̄ of the ensemble model is calculated (lines 18 and 19 in Algorithm 2). In turn, the loss is transformed into a confidence measure β, where low values of β denote high model confidence. Finally, the set of weights is updated: the smaller the loss, the more the weight is reduced, shifting the focus to more "difficult" parts of the data set in future iterations of the boosting algorithm.
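The per-point loss and the confidence measure described above can be sketched as follows. This is a minimal illustration under the assumption that the simulated trajectory is given and the average loss is the (unweighted) mean of the per-point losses:

```python
import numpy as np

def loss_and_confidence(y, y_hat):
    """Per-point squared loss normalized by its supremum (so each L_t is
    in [0, 1]), average loss Lbar, and confidence beta = Lbar / (1 - Lbar).
    Low beta corresponds to a confident model."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    abs_err = np.abs(y - y_hat)
    losses = abs_err ** 2 / np.max(abs_err) ** 2   # assumes a non-perfect fit
    l_bar = float(losses.mean())                   # average loss (unweighted here)
    beta = l_bar / (1.0 - l_bar)
    return losses, beta
```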

Algorithm 2 Boosting process-based models

```
 1: procedure Boosting(im, lib, DT, DV, k) returns Ensemble
 2:   Ensemble ← ∅                           ▷ set of base models
 3:   let N                                  ▷ number of measurements in DT
 4:   ω_t ← 1, t = 0..N                      ▷ ω_t is the weight of time point t
 5:   for i = 1 to k do
 6:     model_i ← probmot(im, lib, DT, DV, ω)
 7:     ω ← reweight(model_i, DT, ω)
 8:     β_i ← confidence(model_i, DV)
 9:     Ensemble ← Ensemble ∪ {(model_i, β_i)}
10:   end for
11: end procedure
12:
13: function reweight(model, D, ω) returns ω
14:   let ŷ_t                                ▷ simulated system variable y at time point t
15:   let y_t                                ▷ measured system variable y at time point t
16:   let N                                  ▷ number of measurements in D
17:   ŷ_t ← simulate(model, D), t = 0..N
18:   L_t ← |y_t − ŷ_t|² / (sup_t |y_t − ŷ_t|)², t = 0..N   ▷ squared loss at each time point t
19:   L̄ ← Σ_{t=0}^{N} L_t · ω_t / Σ_{t=0}^{N} ω_t            ▷ calculate average loss, according to the weights
20:   β ← L̄ / (1 − L̄)                        ▷ calculate confidence
21:   ω_t ← ω_t · β^(1 − L_t), t = 0..N      ▷ update weights
22:   ω ← normalize(ω, N)                    ▷ normalize to N
23: end function
```

As in bagging, the highest ranked model returned by ProBMoT is considered to be a candidate ensemble constituent and is therefore added to the output set Ensemble.

### 1.4 ProBMoT: A software tool for generating ensembles of process-based models

ProBMoT (which stands for Process-Based Modeling Tool, [9]) is a software tool for modeling, parameter estimation, and simulation of process-based models. ProBMoT follows the process-based modeling paradigm: it employs domain-specific modeling knowledge formulated in the form of a library of entity and process templates. These provide templates for modeling entities and processes (i.e., interactions between the entities) in the domain of interest.

Figure 1: An overview of the ProBMoT procedure for learning process-based models from knowledge and data.

Template entities provide general knowledge about the objects modeled, e.g., ranges of values for the variables and constants described by the entities. Template processes provide general knowledge about the interactions modeled, e.g., pieces of equations that describe the rates of processes and value ranges for rate constants.

A process-based model refers to these templates to specify the entities and processes for the observed system at hand. Where a human modeler cannot make specific modeling decisions about the particular entities and processes of the observed system, she can specify a conceptual model that defines a range of process-based model structures. ProBMoT can then search through the whole range of candidate model structures and find the one that fits the observed data best.

Thus, when learning process-based models, ProBMoT takes a conceptual model of the observed system as input, in addition to the library of domain-specific modeling knowledge (see Figure 1). The conceptual model specifies the expected logical structure of the model in terms of components (entities) and interactions (processes) between them, without specifying the particular modeling choices. On the other hand, the library of knowledge provides a set of candidate modeling choices at different levels of abstraction. ProBMoT combines the conceptual model with the library of modeling choices to obtain a list of candidate model structures. For each model structure, the parameter values are estimated to maximize the fit of the model simulation to the observed system behavior.

The parameter estimation process within ProBMoT is based on the meta-heuristic optimization framework jMetal 4.5 [5], which implements a number of global optimization algorithms. In particular, ProBMoT employs the Differential Evolution algorithm to optimize the fit between simulations and observations. To measure the fit, ProBMoT implements a variety of error functions, such as the sum of squared errors (SSE), mean squared error (MSE), root mean squared error (RMSE), root relative squared error (RRSE), and weighted root mean squared error (WRMSE).

To obtain model simulations, each process-based model is transformed into a system of ODEs. In turn, ProBMoT employs the CVODE solver for ordinary differential equations from the SUNDIALS suite [3]. Finally, the output of ProBMoT is a set of process-based models, sorted according to the estimated model error.

Two changes to the ProBMoT implementation were necessary to employ it in the context of learning ensembles of process-based models. First, we implemented a new measure of fit between observations and simulations that takes into account the weights of the data points assigned by the sampling procedures. To this end, we used the formula for the weighted root mean squared error from Equation 1. Second, we allowed for the estimation and ranking of models based on their performance on unseen data, i.e., on a separate validation set of observations.

### 1.5 Empirical evaluation of bagging and boosting

To test the utility of ensembles of process-based models, we performed an extensive empirical evaluation of the performance of ensembles and single models. The results of the comparative analysis of the empirical evaluation show the following:

• Ensembles of process-based models learned with the bagging algorithm significantly outperform single process-based models;

• Ensembles of process-based models learned with the bagging algorithm outperform the ones learned with boosting, but the difference in performance is not significant;

• Ensembles of process-based models learned with the boosting algorithm outperform the single process-based models, but the difference in performance is not significant.

### 2 Selecting ensemble constituents

### 2.1 Selection at the level of the base algorithm

At each iteration of building an ensemble, ProBMoT is used as the base-level algorithm for learning candidate ensemble constituents. In contrast with the usual machine learning setting for learning ensembles, where a base-level algorithm typically returns a single predictive model as the result of learning, ProBMoT returns a ranked list of models. This gives us the opportunity to consider different models from this ranked list as ensemble constituents.

Based on the results of a set of preliminary experiments, we decided to follow the traditional setting, where only the top-ranked model is selected to be an ensemble constituent. However, we use two different ways to rank the models within ProBMoT. The default ranking is based on the models' fit to the training data set, while an alternative one is based on the models' fit to a separate validation data set with observations not seen in the training phase. The empirical comparison of the two selection techniques shows that selection of the best model based on a separate validation data set outperforms the default one. Using the fit to the training data to rank the models leads to overfitting. For details, see the attached articles "Learning ensembles of population dynamics and their application to modelling aquatic ecosystems" and "Bagging and boosting of process-based models".

### 2.2 Dynamic ensemble pruning at run time

In contrast with the traditional machine learning setting, where a model gives predictions for each data instance separately and independently of the others, here a single model simulation provides predictions for all the observed data time points at once. Simulating a process-based model requires minimal input for obtaining a whole trajectory: an initial value (at time 0) for the endogenous (system) variables and complete values (at each time point) of the exogenous (forcing) variables. In a predictive scenario, where the model is being simulated on unseen data, this can often lead to divergent trajectories and therefore substantial predictive error. For this reason, we examine the simulated values and check whether they satisfy the upper- and lower-bound constraints specified in the library of background knowledge. If a predicted/simulated value is not within the specified bounds, the whole trajectory of that particular model is discarded from the resulting ensemble prediction. Therefore, we select the models to be included in the ensemble simulation (i.e., the base-level predictions to be combined into the ensemble prediction) at run time.

To illustrate how often we run into divergent simulation problems when simulating process-based models on unseen data, we performed a simple experiment with a set of fifty process-based models. These are applied to 15 different datasets. We compare the results of two techniques for simulating ensembles (see Table 1). In the first, we combine the individual predictions of all the models, disregarding the constraints given in the library of domain knowledge (labeled complete). In the second case, we combine only the predictions from the models that are valid in terms of the library (labeled pruned). Table 1 also reports the number of models discarded when pruning the ensemble.

Table 1: Performance errors of a complete and a pruned ensemble, and the number of base models pruned from the complete ensemble, for the fifteen test datasets.

| Dataset | Complete ensemble | Pruned ensemble | # models pruned (out of 50) |
|---------|-------------------|-----------------|-----------------------------|
| B1 | 3.515E+03 | 1.104 | 25 |
| B2 | 1.235 | 1.235 | 0 |
| B3 | 1.073 | 1.045 | 5 |
| B4 | 0.737 | 0.737 | 0 |
| B5 | 0.604 | 0.604 | 0 |
| K1 | 2.387 | 0.946 | 10 |
| K2 | 4.498 | 1.840 | 48 |
| K3 | 6.631 | 0.916 | 7 |
| K4 | 0.772 | 0.997 | 4 |
| K5 | 0.962 | 0.962 | 0 |
| Z1 | 1.047 | 0.989 | 1 |
| Z2 | 1.323 | 1.096 | 3 |
| Z3 | 1.289 | 1.226 | 3 |
| Z4 | 0.976 | 0.976 | 0 |
| Z5 | 5.383 | 1.375 | 12 |

From the table, we can see that in all but one experimental dataset (K4), the pruned ensemble outperforms, or performs equally well as, the complete ensemble. Note also that in several cases (B1, K1, K2, K3, Z5) the performance of the ensemble is significantly improved. By discarding models with divergent simulations from the ensemble, we can ensure valid ensemble predictions and stable simulations.

### 3 Combining model predictions

### 3.1 Simulation of individual ensemble models

Algorithm 3 outlines the procedure for simulating a given ensemble of process-based models. In order to simulate an ensemble, each base model needs to be simulated. As discussed in the previous section, we apply run-time pruning of the ensemble based on the constraints on the simulated values specified in the library of domain knowledge. The resulting ensemble simulation is a combination of the predictions of the individual base models at the respective time points.

Algorithm 3 Simulating ensembles

```
 1: function simulateEnsemble(ensemble, lib, D, scheme) returns ŷe
 2:   ŷv ← ∅                      ▷ ŷv denotes the valid model simulations
 3:   βv ← ∅                      ▷ βv denotes the valid model confidences
 4:   ŷe ← ∅                      ▷ ŷe denotes the resulting ensemble simulation
 5:   let N                       ▷ N is the length of the simulated trajectory
 6:   for all (model, β) ∈ ensemble do
 7:     ŷ ← simulate(model, D)
 8:     if inrange(x, lib) for all x ∈ ŷ then
 9:       {ŷv, βv} ← {ŷv, βv} ∪ {ŷ, β}
10:     else continue
11:     end if
12:   end for
13:   if scheme = average then
14:     ŷe ← average({ŷv})
15:   else if scheme = weightedAverage then
16:     ŷe ← sum({ŷv · βv})
17:   else
18:     ŷe ← weightedMedian({ŷv, βv})
19:   end if
20: end function
```

The simulateEnsemble() procedure takes four inputs: a set of process-based models denoted ensemble, the library of domain knowledge lib, a data set D, and a label scheme indicating which combination scheme should be used. The resulting prediction of the ensemble is a trajectory denoted ŷe. To obtain it, each model from the set is simulated: the result of simulating an individual model on the data set D is a trajectory ŷ. The procedure then checks each simulated trajectory against the constraints specified in the modeling knowledge library and performs the run-time ensemble pruning described in the previous section. Finally, the simulations of the individual models are combined as described below.

### 3.2 Combining model simulations

For combining the predictions of the constituent ensemble models into the prediction of the ensemble, we use three aggregation functions: average, weighted average, and weighted median, commonly used for regression ensembles in the traditional machine learning setting [4]. In the case of the simple average, all base-model predictions are equally weighted, while for the weighted average, we use the models' confidences β as weights. The same set of weights is used when calculating the weighted median value. The empirical comparison of the three combining schemes shows that the simple average outperforms the other two. Although the difference is not significant, the simplicity of the average scheme renders it preferable to the other two in the context of ensembles of process-based models.
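The three aggregation functions can be sketched as follows. This is an illustrative stand-alone implementation, not the ProBMoT code, applied per time point to the valid base-model predictions with their confidences β.

```python
def average(values):
    """Simple average: all base-model predictions weighted equally."""
    return sum(values) / len(values)

def weighted_average(values, betas):
    """Average of the predictions weighted by the models' confidences beta."""
    return sum(b * v for b, v in zip(betas, values)) / sum(betas)

def weighted_median(values, betas):
    """Smallest prediction whose cumulative confidence reaches half the total
    confidence; with equal betas this reduces to the ordinary median."""
    pairs = sorted(zip(values, betas))
    half, acc = sum(betas) / 2.0, 0.0
    for v, b in pairs:
        acc += b
        if acc >= half:
            return v
```

Unlike the two averages, the weighted median is robust to a single diverging base prediction, which is why it is a common alternative to averaging in regression ensembles [4].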

### 4 Applications in real world domains

In addition to developing methods for learning ensembles of ODE models, we have applied these methods (as well as learning individual models) to practically relevant problems from the environmental and life sciences. These include modeling phytoplankton dynamics in three lakes (Bled, Slovenia; Kasumigaura, Japan; and Zürich, Switzerland) and modeling hydrological and nutrient leaching processes at the watershed level. They also include an application in systems biology, i.e., modeling the process of endocytosis, a crucial part of the immune response, while previous applications of our methods addressed tasks from synthetic biology.

### 4.1 Applications in ecology

Modeling aquatic ecosystems. Ensemble methods are machine learning methods that construct a set of models and combine their outputs into a single prediction. The models within an ensemble can have different structures and parameters and make diverse predictions. Ensembles achieve high predictive performance, benefiting from the diversity of the individual models and outperforming them.

We developed approaches for learning ensemble models of dynamic systems. We build upon the existing equation discovery systems for learning process-based models of dynamic systems from observational data, which integrate the theoretical and empirical paradigms for modelling dynamic systems: besides observed data, they take into account domain knowledge.

We apply the proposed approach and evaluate its usefulness on a set of problems of modelling population dynamics in aquatic ecosystems. Data on three lake ecosystems are used, together with a library of process-based domain knowledge. The results clearly show that ensemble models perform significantly better than their single-model counterparts.

Watershed modeling of hydrology and nutrient leaching. In year 3 of the SUMO project, we also developed a library of components for building semi-distributed watershed models. The library incorporates basic modeling knowledge that allows us to adequately model different water fluxes and nutrient loadings at the watershed scale. It is written in a formalism compliant with the equation discovery tool ProBMoT, which can automatically construct watershed models from the components in the library, given a conceptual model specification and measured data.

We applied the proposed modeling methodology to the Ribeira da Foupana catchment to extract a set of viable hydrological models. By specifying the conceptual model and using the knowledge library, different hydrological models are generated. The models are automatically calibrated against measurements, and the model with the lowest root mean squared error (RMSE) is selected as an appropriate hydrological model for the selected study area.

### 4.2 Applications in systems & synthetic biology

Modeling endocytosis. Given its recent rapid development and the central role that modeling plays in the discipline, systems biology clearly needs methods for automated modeling of dynamical systems. Process-based modeling is concerned with explanatory models of dynamical systems: it constructs such models from measured time-course data and formalized modeling knowledge. In year 3 of the SUMO project, we applied PBM to the task of modeling the conversion between Rab5 and Rab7 domain proteins in the cellular process of endocytosis, an example of a particularly relevant modeling task in systems biology. The task is difficult due to the limited observability of the system variables (only sums thereof can be observed with existing measurement methods) and noisy observations. This poses serious challenges to the process of model selection. To address them, we proposed a task-specific model selection criterion that takes into account knowledge about the necessary properties of the simulated model behavior. In a series of modeling experiments, we empirically demonstrated the superiority of the proposed, task-specific criterion over the standard, general model selection approaches.

Automated design in synthetic biology. Equation discovery approaches include approaches for automated modelling of dynamic systems, which (re)construct models of the system dynamics from its observed behavior and available domain knowledge. Such approaches have recently been used in systems biology to reconstruct the structure and dynamics of biological networks/circuits.

In year 3 of the SUMO project, we also explored the use of equation discovery approaches to construct/design biological circuits from constraints that describe their desired behavior. We first modified the state-of-the-art automated modeling approach to suit the requirements of the new task and then applied the resulting approach to two synthetic biology tasks of biocircuit design.

### 5 Discrete-time modeling of dynamical systems with ensemble methods

Besides learning ensembles of (continuous-time) ODE models, we have also addressed the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time. We have used the external dynamics approach, which formulates this task as a regression task of learning a difference (recurrence) equation. This model predicts the state of the system at a given discrete time point as a (nonlinear) function of the states and inputs of the system at several previous time points.
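As a sketch of the external dynamics formulation, the following hypothetical helper turns an observed state sequence y and input sequence u into an ordinary regression dataset, where each target y[t] is paired with the n_lags previous states and inputs. The function name and the flat-list representation are illustrative, not taken from the project code.

```python
def make_lagged_dataset(y, u, n_lags):
    """Reformulate discrete-time system identification as regression:
    predict y[t] from the n_lags previous states y[t-n_lags:t] and
    the n_lags previous inputs u[t-n_lags:t]."""
    features, targets = [], []
    for t in range(n_lags, len(y)):
        features.append(y[t - n_lags:t] + u[t - n_lags:t])
        targets.append(y[t])
    return features, targets
```

Any nonlinear regressor, such as the model trees discussed below, can then be trained on the resulting (features, targets) pairs.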

In particular, we propose methods for learning fuzzy linear model trees and ensembles thereof. These are used as the function approximators within the external dynamics approach. Our methods can learn trees (and ensembles of trees) that predict either a single output (one state variable) or multiple outputs (the entire system state).

We evaluate the proposed methods on a number of problems from the area of control engineering. We show that they perform well and are resilient to noise. Among the different approaches proposed, ensembles of trees for predicting multiple outputs can be recommended, since they perform best and yield the most succinct overall models. The above contributions are described in more detail in the following subsections.

### 5.1 Model-tree ensembles for noise-tolerant system identification

In year 3 of the SUMO project, we addressed the task of identification of nonlinear dynamic systems from measured data. The discrete-time variant of this task is commonly reformulated as a regression problem. As tree ensembles have proven to be a successful predictive modeling approach, we investigated their use for solving this regression problem.

While different variants of tree ensembles have been proposed and used, they are mostly limited to using regression trees as base models. We introduced ensembles of fuzzified model trees with split attribute randomization and evaluated them for nonlinear dynamic system identification. Models of dynamic systems built for control purposes are usually evaluated by a more stringent evaluation procedure using the output, i.e., simulation, error. Taking this into account, we perform ensemble pruning to optimize the output error of the tree ensemble models.

The proposed Model-Tree Ensemble method was empirically evaluated using input-output data disturbed by noise. It was compared to representative state-of-the-art approaches on one synthetic dataset with artificially introduced noise and one real-world noisy dataset. The evaluation showed that the method is suitable for modeling dynamic systems and produces models with output-error performance comparable to the other approaches. The method is also resilient to noise, as its performance does not deteriorate even when up to 20% noise is added.

### 5.2 Ensembles of linear model trees for the identification of multiple-output systems

In year 3 of the SUMO project, we also addressed the task of discrete-time modeling of nonlinear dynamic systems with multiple outputs using measured data. In the area of control engineering, this task is typically converted into a set of classical regression problems, one for each output, which can then be solved with any nonlinear regression approach. Fuzzy linear model trees (known as Takagi-Sugeno models) are a popular approach in this context.

We proposed, implemented and empirically evaluated three extensions of fuzzy linear model trees. First, we considered multi-output Takagi-Sugeno models. Second, we proposed to use ensembles of such models. Third, we investigated the use of a search heuristic based on the simulation error (as opposed to the one-step-ahead prediction error), specific to the context of modeling dynamic systems.

We empirically evaluated and compared these approaches on three case studies, including the modeling of the inverse dynamics of a robot arm and of two process-industry systems. Multi-output model trees exhibit performance comparable to a set of single-output models, while providing a more compact model. Ensembles improve the performance of both single- and multi-output trees, while the heuristic specific to modeling dynamic systems improves performance only very slightly. Overall, we can recommend the use of multi-output trees and ensembles thereof, learnt using the prediction error as a search heuristic.

### 6 Data streams and dynamical systems

In year 3, we also tackled the task of learning ensembles of non-parametric regression models for modeling dynamic systems in discrete time in a streaming context. The data stream paradigm matches the concept of modeling dynamic systems very well, as the continued observation of the state and inputs of a dynamic system over (discrete) time results in a (potentially infinite) stream of data. Within this paradigm, we can handle large quantities of data (both a large number of variables and a large or infinite number of observations), which makes it suitable for some novel application areas (such as global systems science).

We have developed novel methods for learning ensembles of regression and model trees on data streams, such as on-line bagging and on-line random forests. We have also proposed the approach of on-line learning of model trees with options, which presents an effective compromise between learning a single model and an ensemble. We have applied these approaches to several tasks of discrete-time modeling of dynamic systems. The above contributions are described in more detail in the following subsections.

### 6.1 Ensembles of data streams

The emergence of ubiquitous sources of streaming data has given rise to the popularity of algorithms for online machine learning. In that context, Hoeffding trees represent the state-of-the-art algorithms for online classification. Their popularity stems in large part from their ability to process large quantities of data at a speed that few other streaming or batch learning algorithms can match.

Hoeffding trees have often been used as base models in ensemble learning algorithms for online classification. However, while many algorithms exist for online classification, ensemble learning algorithms for online regression have been lacking. In particular, the field of online any-time regression analysis seems to have received little attention.

In year 3 of the SUMO project, we addressed this issue through a study and an empirical evaluation of a set of online algorithms for regression, which included the baseline Hoeffding-based regression trees, online option trees, and an online least mean squares filter. We also designed, implemented and evaluated two novel ensemble learning methods for online regression: on-line bagging with Hoeffding-based model trees, and an on-line random forest method in which we used a randomized version of the online model tree learning algorithm as a basic building block. Finally, in this context, we evaluated the proposed algorithms along several dimensions: predictive accuracy and quality of models, time and memory requirements, bias-variance and bias-variance-covariance decomposition of the error, and responsiveness to concept drift.
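On-line bagging replaces the bootstrap resampling of batch bagging with per-example Poisson(1) weighting, in the style of Oza and Russell: each incoming example is shown to each base learner k times, with k drawn independently per learner. A minimal sketch, assuming base learners that expose a hypothetical update(x, y) method:

```python
import math
import random

def poisson1(rng):
    """Draw k ~ Poisson(1) by CDF inversion (mean-1 case only)."""
    p, k = rng.random(), 0
    term = math.exp(-1.0)          # P(K = 0)
    cdf = term
    while p > cdf:
        k += 1
        term /= k                  # P(K = k) = e^-1 / k!
        cdf += term
    return k

def online_bagging_update(learners, x, y, rng):
    """Show the example (x, y) to each learner k ~ Poisson(1) times,
    mimicking how often it would appear in a bootstrap replicate."""
    for learner in learners:
        for _ in range(poisson1(rng)):
            learner.update(x, y)
```

Over a long stream, each learner sees each example once on average, but the counts differ across learners, which yields the diversity that bagging relies on.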

### 6.2 Modeling dynamical systems with data streams

In year 3, we also addressed the task of modeling dynamical systems in discrete time using regression trees, model trees and option trees for on-line regression. Modeling dynamical systems poses several challenges to data mining approaches, which motivate the use of methods for mining data streams. We proposed the FIMT-DD algorithm for mining data streams with regression or model trees, as well as the FIMT-DD based algorithm ORTO, which learns option trees for regression. These methods were then compared on several case studies, i.e., tasks of learning models of dynamical systems from observed data. From the performed experiments, we can conclude that option trees for regression work best among the considered approaches for learning models of dynamical systems from streaming data.

### References

[1] Leo Breiman. Classification and Regression Trees. The Wadsworth and Brooks-Cole Statistics-Probability Series. Chapman & Hall, 1984.

[2] Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.

[3] Scott D. Cohen and Alan C. Hindmarsh. CVODE, a stiff/nonstiff ODE solver in C. Computers in Physics, 10(2):138–143, March 1996.

[4] Harris Drucker. Improving regressors using boosting techniques. In Proceedings of the Fourteenth International Conference on Machine Learning, ICML '97, pages 107–115, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc.

[5] Juan J. Durillo and Antonio J. Nebro. jMetal: A Java framework for multi-objective optimization. Advances in Engineering Software, 42:760–771, 2011.

[6] Yoav Freund. An adaptive version of the boost by majority algorithm. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, COLT '99, pages 102–113, New York, NY, USA, 1999. ACM.

[7] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

[8] Tin Kam Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, 1998.

[9] Darko Čerepnalkoski, Katerina Taškova, Ljupčo Todorovski, Nataša Atanasova, and Sašo Džeroski. The influence of parameter fitting methods on model structure selection in automated modeling of aquatic ecosystems. Ecological Modelling, 245:136–165, 2012. 7th European Conference on Ecological Modelling (ECEM).

[10] David H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.

### Appendix: Workpackage 7, its structure, and summary of work in year 2

WP7 Objectives. The objective of WP7 is to develop methods for computational scientific discovery that can learn complete supermodels (ensembles of ODE models) of dynamical systems. The sub-objectives of this WP include the development of techniques for (semi)automated generation of constituent models, selection of an appropriate subset of models, and learning the form and coefficients of the interconnections among the models.

WP7 Description of work. WP7 will develop methods for learning complete supermodels. The supermodels are expected to be built in three phases: first generate diverse models, then select a set of complementary models, and finally learn the interconnections between the constituent models of an ensemble. These three phases form the three tasks that constitute WP7:

• Task 7.1 Generate a diverse set of ODE models. To generate a diverse set of models, in this task, we adapt existing approaches from the area of ensemble learning. These include taking different subsamples of the data, taking projections of the data, and using different learning algorithms (or randomized algorithms). Different subsets of domain knowledge may also be considered.

• Task 7.2 Select a complementary set of ODE models. Given a set of models, in this task, we use a measure of similarity between models to select models that are complementary. Different measures of similarity (or model performance/quality) can be considered. Besides the sum of squared errors and correlation, these can include the weighted sum of squared errors or robust statistical estimators.

• Task 7.3 Learn to interconnect ODE models. In learning the interconnections between the constituent models of the ensemble, in this task, we consider searching through the space of possible structural forms of the interconnections, coupled with parameter fitting for a selected functional form of the possible connections. For parameter fitting, we will use global optimization methods based on meta-heuristic approaches. The use of such parameter estimation methods is of crucial importance in supporting the use of different quality criteria, as well as in avoiding local optima during search.

Deliverables in WP7. The progress of each task from WP7 is reported in the planned deliverables as follows:

• D7.1 Report on the generation of a diverse set of ODE models. This deliverable addresses Task 7.1 and presents the progress on the task of generating a diverse set of ODE models. In Annex 1 (Description of work), this deliverable was planned for month 18 of the project.

• D7.2 Report on the selection of a complementary set of ODE models. This deliverable addresses Task 7.2 and presents the progress on the task of selecting a complementary set of ODE models. In Annex 1 (Description of work), this deliverable was planned for month 27 of the project.

• D7.3 Report on learning to interconnect ODE models. This deliverable addresses Task 7.3 and presents the progress on the task of learning to interconnect ODE models. In Annex 1 (Description of work), this deliverable was planned for month 36 of the project.

Summary of activities from year 2. The objective of WP7 is to develop methods for computational scientific discovery that can learn complete supermodels (ensembles of ODE models) of dynamical systems. The supermodels are expected to be built in three phases: generate diverse models, select a set of complementary models, and learn the interconnections between the constituent models of an ensemble. While this decomposition allows for independent development of the different components of the approaches to learning ensembles of ODE models, we can evaluate their performance only when we integrate them into complete approaches to learning ensembles. Thus, the issues related to finding working combinations of the methods developed and proposed within D7.1 and D7.2 in year 2 are still open and are to be resolved within Task 7.3.

In Task 7.1 in year 2, we focused on generating a diverse set of ODE models. To generate a diverse set of models, we adapted existing sampling approaches from the area of ensemble learning. The specificities of the task of subsampling an observed behavior of a dynamic system were taken into account. After the different subsamples are generated, the base learner ProBMoT is applied to learn different ODE models from them.

The approaches adapted included subsampling the instance space and subsampling the feature space. Along the first dimension, we considered the selection of random sub-intervals of the observation period. We adapted bootstrap sampling and error-weighted sampling to the case of time-series data. Finally, we performed preliminary experiments (with some of these adaptations) on learning ODE models (same structure, different parameters) and included them in the preliminary deliverable targeting Task 7.3 in year 2.
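The two instance-space sampling variants can be illustrated with a small Python sketch. This is a toy illustration, not the project code: the interface and the interpretation of the weights (one per time point, e.g., derived from a model's per-point error) are assumptions made for the example.

```python
import random

def sample_time_points(n, weights=None, rng=random):
    """Draw n time-point indices with replacement. Uniform weights give
    plain bootstrap sampling; error-based weights give error-weighted
    sampling, where poorly fit time points are drawn more often.
    The indices are sorted so the subsample is still a time course."""
    if weights is None:
        idx = [rng.randrange(n) for _ in range(n)]
    else:
        idx = rng.choices(range(n), weights=weights, k=n)
    return sorted(idx)
```

Sorting the sampled indices preserves the temporal order of the observations, which plain bootstrap sampling of independent instances would not need.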

Along the second dimension, we considered random sub-sampling of the variable space, as in random subspaces. We considered combining it with bootstrap sampling of the instance space. Moreover, we proposed a generalized sub-sampling approach, which simultaneously subsamples both the instance and the feature space. Finally, we also suggested considering a procedure for sampling the template entities and processes from the library of background knowledge used by ProBMoT as a way to generate a set of diverse ODE models.

In Task 7.2 in year 2, we focused on selecting a set of complementary ODE models. To address this task, we first attempted to define the notion of diversity for ODE models. While model diversity has been extensively studied in the context of ensemble models, the bulk of the work has considered the task of classification. For regression, covariance is used as a measure of similarity/diversity of the constituent models of an ensemble. To the best of our knowledge, the notion of diversity has not been studied in the context of ensembles of ODE models.

We based our definition of diversity on similarity measures between dynamic system behaviors, i.e., multi-variate time series. These are based on similarity measures for single time series: the similarities of corresponding time-series pairs in two behaviors are aggregated into an overall similarity. We considered a range of similarity measures for time series, which we have collected and implemented. We explored their use in clustering ODE models and observed behaviours of dynamic systems. These clustering experiments are important in the context of further work within Task 7.3, where a set of appropriate models has to be chosen for inclusion in the ensemble. After clustering a larger set of models, representative models from each cluster could, for example, be chosen for combination.
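As an illustration of the idea, the sketch below measures the dissimilarity of two multi-variate behaviors by computing a per-variable RMSE between the corresponding time series and averaging the results. Both the choice of RMSE and averaging as the aggregation are only one of the combinations considered; the representation (a dict mapping variable name to series) is assumed for the example.

```python
import math

def rmse(a, b):
    """Root mean squared difference between two equally sampled time series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def behavior_distance(beh_a, beh_b):
    """Dissimilarity of two multi-variate behaviors: the per-variable
    time-series distances, aggregated (here by simple averaging)."""
    dists = [rmse(beh_a[var], beh_b[var]) for var in beh_a]
    return sum(dists) / len(dists)
```

A pairwise distance matrix built this way can be fed to any standard clustering algorithm to group similar models before choosing cluster representatives.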