GPU POWERED PARAMETER ESTIMATION OF A LARGE-SCALE KINETIC METABOLIC MODEL


Niccolò Totis⁽¹⁾∗, Andrea Tangherloni⁽²⁾∗, Marco Beccuti⁽¹⁾, Paolo Cazzaniga⁽³⁾, Marco S. Nobile⁽²⁾, Daniela Besozzi⁽²⁾, Marzio Pennisi⁽⁴⁾, Francesco Pappalardo⁽⁵⁾

(1) Department of Computer Science, University of Torino, Torino, Italy

(2) Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, Italy

(3) Department of Human and Social Sciences, University of Bergamo, Bergamo, Italy

(4) Department of Mathematics and Computer Science, University of Catania, Catania, Italy

(5) Department of Drug Sciences, University of Catania, Catania, Italy

Keywords: Stochastic Symmetric Nets, Parameter Estimation, FST-PSO, LASSIE, GPGPU Computing.

Kinetic modeling is a powerful tool to describe biochemical reaction systems. Considering the biological complexity of metabolic systems, it is often challenging to represent models and to efficiently analyze their dynamics, as in the case of models that explicitly take into account different isoforms of metabolic enzymes. In this case, computationally expensive Parameter Estimation (PE) procedures are typically required, since the kinetic characterization of the different isoforms is most of the time unavailable. In this work we tackle these issues with an approach that combines the descriptive power of Stochastic Symmetric Nets, a parametric and compact extension of the Petri Net formalism, with FST-PSO, an efficient optimization method suitable for the PE problem. To execute the large number of simulations required by the PE we exploit LASSIE, a GPU-powered deterministic simulator that offloads the calculations onto the GPU cores. LASSIE achieves a speed-up of around 30× with respect to the CPU when carrying out the PE of a large-scale kinetic model of intracellular human metabolism.

1 Introduction

Networks of intracellular metabolic reactions are inherently complex systems, characterized by thousands of reactions and metabolites. Kinetic models of metabolism try to exploit mechanistic biological information to give a representation of the dynamic behavior of the system. To this aim, both the structure and the kinetic parameters of the model need to be defined. Due to the high costs and complexity of experimental procedures, these data are scarce or incomplete and the resulting model is undetermined [Cazzaniga et al., 2014].

In metabolic systems, the flux of metabolites through the network depends on many interrelated processes like transcription, translation, post-translational modifications and allosteric control. Many sources of indetermination can affect models of metabolic systems, such as uncharacterized enzyme isoform mixtures [Smallbone et al., 2013]. Enzyme isoforms, or isozymes, are structurally similar but non-identical protein complexes that are able to catalyze the same biochemical reaction. Every cell type presents specific mixtures of isoforms of its metabolic enzymes; experimental techniques like proteomics analyses are used to quantify their abundance. The structural differences of these proteins result in different kinetic behaviors of the catalytic process. However, given the information currently available in the literature and in databases, we often cannot formulate a mathematical representation of these differences. The vast majority of published metabolic models do not take these kinetic differences explicitly into account, as they use one single enzymatic form to represent the average behavior of the whole isoform mixture. As a consequence, a single set of kinetic parameters is used, either taken from the literature or inferred through methods of Parameter Estimation (PE) [Nobile et al., 2018b]. A non-trivial limitation of this assumption emerges when, for the condition under study, the abundance of the isoforms of a given enzyme differs substantially from the condition in which the kinetic parameters of this enzyme were estimated. The impact of the kinetics of each isozyme should in fact be weighted by the isoform abundance in the new condition. Since the expression level of all intracellular metabolic enzymes can change due to genetic knock-downs, therapeutic interventions, and various environmental stimuli, the modeling of metabolic systems in perturbed conditions can benefit substantially from the explicit representation of enzyme isoforms.

In this paper we investigate a large-scale kinetic model of intracellular human metabolism (in which different enzyme isoforms are defined) described by means of the Stochastic Symmetric Net (SSN) formalism. In particular, the kinetic constants associated with the reactions involving the isoforms were unknown and were therefore estimated by exploiting a settings-free version of Particle Swarm Optimization (PSO). Since PSO requires the execution of many simulations during each iteration, we coupled it with LASSIE, a GPU-powered deterministic simulator capable of drastically reducing the computation time otherwise required.

∗These authors contributed equally.

2 Scientific background

In this section we introduce the SSN formalism used to describe large-scale kinetic models and the GPU-powered methodology used to tackle the PE problem.

2.1 The SSN formalism

Stochastic Petri Nets (SPNs) and their extensions are effective formalisms to model biological systems thanks to their capability of representing in a simple and intuitive manner many important features of these systems, as well as to compute their qualitative and quantitative properties. In detail, SPNs are bipartite directed graphs with two types of nodes: places and transitions. Places, graphically represented as circles, correspond to the state variables of the system, while transitions, graphically represented as boxes, correspond to the events that can induce a state change. The arcs connecting places to transitions and vice versa express the relation between states and event occurrences.

Places can contain tokens, drawn as black dots. The state of an SPN, called marking, is defined by the number of tokens in each place. The system evolution is given by the firing of an enabled transition, where a transition is enabled if and only if each input place contains a number of tokens greater than or equal to a given threshold defined by the cardinality of the corresponding input arc. The firing of an enabled transition removes a fixed number of tokens from its input places and adds a fixed number of tokens into its output places (according to the cardinality of its input/output arcs). In this work we focus on Stochastic Symmetric Nets (SSNs), a high-level formalism that extends PNs with colors and stochastic firing delays [Chiola et al., 1993]. Colors provide a more compact and readable representation of the system thanks to the possibility of having distinguished tokens, which can be graphically represented as dots of different colors.
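The enabling and firing rules described above can be illustrated with a minimal sketch for an uncolored net. The place names (`GLC`, `HK`, `HK_GLC`) and the single binding transition are hypothetical and chosen only for illustration; they are not taken from the model discussed in this paper, and colors and firing delays are deliberately left out.

```python
from typing import Dict

Marking = Dict[str, int]   # place name -> number of tokens
Arcs = Dict[str, int]      # place name -> arc cardinality


def is_enabled(marking: Marking, inputs: Arcs) -> bool:
    """A transition is enabled iff every input place holds at least
    as many tokens as the cardinality of its input arc."""
    return all(marking.get(place, 0) >= card for place, card in inputs.items())


def fire(marking: Marking, inputs: Arcs, outputs: Arcs) -> Marking:
    """Return the marking reached by firing an enabled transition."""
    if not is_enabled(marking, inputs):
        raise ValueError("transition is not enabled in this marking")
    new_marking = dict(marking)
    for place, card in inputs.items():      # remove tokens from input places
        new_marking[place] = new_marking.get(place, 0) - card
    for place, card in outputs.items():     # add tokens to output places
        new_marking[place] = new_marking.get(place, 0) + card
    return new_marking


# Hypothetical toy transition: one glucose token binds one free enzyme token.
m0 = {"GLC": 2, "HK": 1, "HK_GLC": 0}
bind_in = {"GLC": 1, "HK": 1}
bind_out = {"HK_GLC": 1}
m1 = fire(m0, bind_in, bind_out)
```

After one firing the enzyme place is empty, so the same transition is no longer enabled in `m1`.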

Stochastic firing delays, sampled from a negative exponential distribution, allow us to automatically derive the underlying Continuous Time Markov Chain (CTMC) that can be studied to quantitatively evaluate the system behaviour. In the literature, different techniques are proposed to solve the underlying CTMC; in particular, in case of very complex models, the so-called deterministic approach [Kurtz, 1970] can be efficiently exploited. Accordingly, in [Beccuti et al., 2015] we described how to derive a deterministic process, described through a system of Ordinary Differential Equations (ODEs), which well approximates the stochastic behavior of an SSN model.
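To make the deterministic approach more concrete, the following minimal sketch builds the ODE right-hand side of a small uncolored net under mass-action kinetics and integrates it numerically. It is an illustration under stated assumptions, not the derivation of [Beccuti et al., 2015]: the toy reaction `A -> B`, the rate constant and the fixed-step fourth-order Runge-Kutta scheme are all hypothetical stand-ins.

```python
import math
from typing import Dict, List, Tuple

# A transition is (rate constant, input arc cardinalities, output arc cardinalities).
Transition = Tuple[float, Dict[str, int], Dict[str, int]]
State = Dict[str, float]


def derivatives(x: State, transitions: List[Transition]) -> State:
    """Mass-action right-hand side: each transition contributes
    k * prod(x[p] ** cardinality) to the places it touches."""
    dx = {place: 0.0 for place in x}
    for k, inputs, outputs in transitions:
        rate = k
        for place, card in inputs.items():
            rate *= x[place] ** card
        for place, card in inputs.items():
            dx[place] -= card * rate
        for place, card in outputs.items():
            dx[place] += card * rate
    return dx


def rk4_step(x: State, transitions: List[Transition], dt: float) -> State:
    """One classical Runge-Kutta step (an assumed, simple integrator)."""
    def shifted(slope: State, h: float) -> State:
        return {p: x[p] + h * slope[p] for p in x}

    k1 = derivatives(x, transitions)
    k2 = derivatives(shifted(k1, dt / 2), transitions)
    k3 = derivatives(shifted(k2, dt / 2), transitions)
    k4 = derivatives(shifted(k3, dt), transitions)
    return {p: x[p] + dt / 6 * (k1[p] + 2 * k2[p] + 2 * k3[p] + k4[p]) for p in x}


def simulate(x0: State, transitions: List[Transition], dt: float, steps: int) -> State:
    x = dict(x0)
    for _ in range(steps):
        x = rk4_step(x, transitions, dt)
    return x


# Hypothetical toy net: A -> B with rate constant 1, simulated up to t = 1.
toy = [(1.0, {"A": 1}, {"B": 1})]
final = simulate({"A": 1.0, "B": 0.0}, toy, dt=0.01, steps=100)
```

For this toy net the exact solution is A(t) = exp(-t), which gives a simple check on the numerical result.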

2.2 GPU-powered Parameter Estimation

The dynamics of mathematical models of biological systems can be accurately simulated only when a precise parameterization is available. Unfortunately, kinetic parameters are difficult or impossible to measure by means of in vivo experiments, and the lack of these values limits the execution of computational investigations. This leads to the PE problem, which is a non-linear, non-convex and multi-modal optimization problem that can be tackled by means of Computational Intelligence techniques. One of the most effective techniques for the PE is PSO, a population-based meta-heuristic belonging to Swarm Intelligence techniques, designed to deal with real-valued optimization problems [Kennedy and Eberhart, 1995]. In PSO, a swarm of candidate solutions (called particles) moves inside a bounded search space, cooperating to identify the optimal solution to the given problem. During each iteration, the position of each particle changes following two attractors, that is, the best position found by the swarm so far and the best position found by the particle so far. The social c_soc ∈ R⁺ and the cognitive c_cog ∈ R⁺ parameters are used to balance these two attractors, respectively. Moreover, an inertia factor w ∈ R⁺ is exploited to weigh the velocity of the particles, avoiding chaotic behaviors in the swarm. In this work we employ FST-PSO [Nobile et al., 2018a], a settings-free version of PSO in which each particle automatically adjusts its own settings by means of fuzzy rules during each iteration.

In general, the most time-consuming task of PE is the fitness calculation that, in this work, consists in simulating the dynamics of the model using the kinetic parameters encoded by every particle in the swarm and comparing the outcome with a target dynamics. To speed up the calculation of the fitness function we rely on LASSIE, a GPU-powered deterministic simulator of large-scale biochemical models [Tangherloni et al., 2017]. LASSIE solves systems of ODEs by distributing all the required calculations on the available GPU cores. In this work, we used an improved version of LASSIE that exploits both fine- and coarse-grained parallelization strategies. This double level of parallelization allows LASSIE to (i) perform many simulations of the same ODE system (characterized by different parameterizations) in a parallel fashion, and (ii) accelerate the numerical integration of each instance of the ODE system by distributing the required calculations on the GPU cores.
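The PSO update described in this section can be sketched as follows. This is plain PSO with fixed settings, not FST-PSO: the values of `w`, `c_soc` and `c_cog`, the swarm size, the clamping at the bounds and the toy fitness function (squared distance between a one-parameter exponential decay and a synthetic target) are all assumptions made for illustration. In the workflow of this paper each fitness evaluation would instead require a simulation of the full model, and the evaluations within one iteration are mutually independent, which is the kind of workload that many parallel simulations can serve.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def pso(fitness: Callable[[List[float]], float],
        bounds: Sequence[Tuple[float, float]],
        n_particles: int = 20, n_iter: int = 60,
        w: float = 0.7, c_soc: float = 1.5, c_cog: float = 1.5,
        seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `fitness` inside `bounds` with a basic, fixed-settings PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position of each particle
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]                               # inertia
                             + c_soc * r1 * (gbest[d] - pos[i][d])       # social attractor
                             + c_cog * r2 * (pbest[i][d] - pos[i][d]))   # cognitive attractor
                # Keep the particle inside the bounded search space.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical target dynamics: A(t) = exp(-TRUE_K * t) sampled on a time grid.
TIMES = [0.5 * j for j in range(11)]
TRUE_K = 0.7
TARGET = [math.exp(-TRUE_K * t) for t in TIMES]


def fitness(params: List[float]) -> float:
    """Sum of squared differences between simulated and target dynamics."""
    k = params[0]
    return sum((math.exp(-k * t) - y) ** 2 for t, y in zip(TIMES, TARGET))


best, best_fit = pso(fitness, bounds=[(0.0, 5.0)])
```

In FST-PSO the three coefficients are not fixed as they are here; they are adjusted per particle by fuzzy rules at every iteration [Nobile et al., 2018a].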

3 Red blood cell metabolic network

In this paper we consider the model of the red blood cell metabolism presented in [Jamshidi and Palsson, 2010], which consists of fully parameterized reactions following the mass action law. The model contains 92 metabolites and 94 reactions describing the central pathways for carbohydrate metabolism, namely, the glycolytic and pentose phosphate pathways in a human red blood cell. For our analyses, we neglected the tissue specificity and we used it as a model for a generic human cell. The structure of the corresponding SPN representation was generated automatically starting from the stoichiometric matrices and the list of kinetic parameters. All the transitions related to uptake/release reactions were removed based on unreasonable behaviors of the baseline model. Introducing a color class, the SPN model was then converted into an SSN model. Potentially, all the places associated with a metabolic enzyme could have been colored. To ease the interpretation of our results, in this work we colored only the places specifically related to one enzyme, hexokinase (HK), which converts glucose into glucose-6-phosphate. With the colored places we thus intended to represent the three isoforms of HK (i.e., HK I, II and III) that are the most abundant across different cell types and are known to have different kinetic properties.

Figure 1: SSN of the glycolytic (red box) and pentose phosphate (yellow box) pathways. The blue box highlights the colored places and transitions associated with HK. HK is represented as an instance of the circular color class Isof, containing static color subclasses I1, I2, I3 that refer to HK type I, II and III isoforms, respectively. A detail of the net (left side of the blue box) shows how kinetic and structural information are encoded in the SSN. GLC: glucose; LAC: lactate.

The SSN of the expanded model, comprising 92 places and 174 transitions, is reported in Figure 1. Out of the 92 places, 11 are colored and are involved in 26 transitions that represent the elementary steps of the HK reaction. The compactness of the SSN representation can be appreciated considering that a non-colored SPN with an identical behavior would have needed to include two redundant replicas of the colored places and related transitions, resulting in an SPN with 114 places and 226 transitions in total.
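The place and transition counts quoted above can be checked with a few lines of arithmetic. The sketch assumes that unfolding the colored net creates one copy of each colored place and transition per color, so that three isoforms add two replicas; the helper name `unfolded_size` is hypothetical.

```python
def unfolded_size(total: int, colored: int, n_colors: int) -> int:
    """Size of the equivalent non-colored net: every colored element
    is replicated once per additional color."""
    return total + colored * (n_colors - 1)


N_ISOFORMS = 3  # HK I, II and III

places = unfolded_size(total=92, colored=11, n_colors=N_ISOFORMS)
transitions = unfolded_size(total=174, colored=26, n_colors=N_ISOFORMS)

# Number of isoform-specific transitions after unfolding (26 per isoform).
isoform_specific_transitions = 26 * N_ISOFORMS
```

This reproduces the 114 places and 226 transitions of the non-colored SPN. The product 26 × 3 = 78 also matches the number of transitions with unknown kinetic constants reported in Section 4, which is consistent with one constant per isoform-specific transition.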

4 Results

In this section we present the results of the PE carried out by integrating GreatSPN [Babar et al., 2010], a framework for the analysis of Discrete Event Dynamic Systems described through the PN formalisms, with FST-PSO [Nobile et al., 2018a] and LASSIE [Tangherloni et al., 2017]. The SSN model we created, reported in Figure 1, represents a variation of the model in [Jamshidi and Palsson, 2010] in which we introduced a higher level of detail by specifying the three main isoforms of the HK enzyme, namely HK I, II and III, whose relative abundances are assumed to be 0.6, 0.3 and 0.1, respectively. By so doing, we included new reactions, corresponding to transitions in the SSN model, whose kinetic constants are unknown. We show here how this indetermination can be overcome with a PE procedure that exploits the computational effectiveness of FST-PSO coupled with LASSIE.

Overall, we propose a workflow with the following steps: 1) the approach described in [Jamshidi and Palsson, 2010] is used to build a baseline large-scale kinetic metabolic model with mass action kinetics; 2) the model is represented with an SSN, where colored places represent enzyme isoforms of biological relevance; 3) new experimental data are gathered and then exploited to infer the isoform-specific kinetic parameters; 4) the model is used to predict the behavior of the system when isoform-specific modifications are introduced. The additional experimental data needed to achieve these finer predictions are i) time-course measurements of the main upstream and downstream metabolites, and ii) a quantification of the proportion of isozymes in the mixture. The former data can be produced by biochemical assays or Mass Spectrometry techniques, while the latter can be derived from proteomics experiments or estimated from gene-expression data.

To test the effectiveness of the approach presented in this paper, we relied on synthetic experimental data, generated by simulating the baseline model in which no isoforms are specified. The model extended with the different isoforms includes 78 new transitions whose kinetic constants are unknown; these were inferred with the aim of obtaining the best fit to the synthetic time-series of glucose (GLC) and lactate (LAC) concentrations in a 50-hour time window. Figure 2 (left) shows the comparison between the target data and the simulations obtained with the best parameterizations found by FST-PSO and by Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Differential Evolution (DE) and Genetic Algorithms (GAs) [Nobile et al., 2018b]. We observe that all meta-heuristics but CMA-ES achieved a perfect fit at the end of the optimization process. Figure 2 (right) reports the Average Best Fitness (ABF) calculated according to the results of 20 independent PE repetitions, showing that FST-PSO is capable of outperforming the other meta-heuristics for this specific task. From a computational point of view, LASSIE drastically reduced the running time required to execute the PE of the red blood cell metabolic network, achieving a 30× speed-up on an NVIDIA GeForce Titan X with respect to the same analysis executed on the CPU.

Figure 2: (Left) Dynamics of the GLC (top) and LAC (bottom) species of the red blood cell metabolic network, obtained with the best parameterization found by CMA-ES, DE, GA and FST-PSO, compared with the target data (DTTS). (Right) ABF calculated running CMA-ES, DE, GA and FST-PSO for 20 independent repetitions.

[Figure 2 panels. Left: molecular concentration of GLC and LAC versus time (0 to 50 hours). Right: ABF versus iterations (0 to 100).]

Figure 3: Dynamic profiles of metabolite concentration simulated with the baseline model, and with increasing (50, 75, 100%) knock-down interventions on the HK I isoform.

[Figure 3 panels: GLC, G6P, FDP, GAP, PYR, LAC, GL6P and R5P versus time (0 to 10 hours); curves: base, 50%, 75%, 100%.]

We then explored how the model could be used to reproduce the effect of an isoform-specific modification of the system. This scenario can represent either a gene knock-down experiment, the effect of a drug with an isoform-specific target, or a change in isozyme expression after the cell is subjected to an environmental stimulus. In Figure 3 we show how 50%, 75% and 100% reductions of the concentration of the HK isoform with the highest abundance (0.6) affect the dynamics of key metabolites in the network in a 10-hour time window. Notably, these results show that a complete knock-out of one isoform is not necessarily detrimental for intracellular energy-producing pathways, as other isoforms with efficient kinetics can provide alternative catalytic routes. Moreover, a comparison of the plots of GLC and LAC shows that altering the activity of enzymes like HK, which is known to exert high control over the whole pathway [Smallbone et al., 2013], indeed produces proximal and distal effects of the same scale.
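The knock-down scenarios discussed above amount to scaling the level of one isoform before running the simulation. The sketch below shows one way to set up such scenarios; the total HK level of 1.0 (arbitrary units), the isoform labels and the choice of leaving the other isoforms unchanged are assumptions made for illustration, not details stated in the paper.

```python
from typing import Dict

# Relative abundances of the three HK isoforms assumed in this work.
ABUNDANCE = {"HK_I": 0.6, "HK_II": 0.3, "HK_III": 0.1}


def knock_down(abundance: Dict[str, float], isoform: str,
               reduction: float, total: float = 1.0) -> Dict[str, float]:
    """Return the isoform levels after reducing one isoform by `reduction`
    (0.0 = baseline, 1.0 = complete knock-out). The remaining isoforms
    are left untouched, i.e. no compensation is modeled."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie in [0, 1]")
    levels = {name: total * fraction for name, fraction in abundance.items()}
    levels[isoform] *= 1.0 - reduction
    return levels


# Baseline plus the three interventions on the most abundant isoform.
scenarios = {r: knock_down(ABUNDANCE, "HK_I", r) for r in (0.0, 0.5, 0.75, 1.0)}
```

Each entry of `scenarios` could then serve as the initial isoform levels for a separate simulation of the model.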


5 Conclusion and future perspective

When computational models are used to investigate the effects produced by the different kinetic properties of metabolic isozymes, issues of model representation and efficient simulation of the dynamics often arise. In this work, we showed how the compact and parametric representation achieved with SSNs can be combined with FST-PSO coupled with LASSIE to effectively and efficiently solve the PE problem, thanks to the parallelization of the computations on the GPU. The approach presented in this paper is particularly well suited to deal with complex metabolic models that, for instance, include many reactions alternatively catalyzed by various isozymes with unknown kinetic parameters. As a matter of fact, we successfully estimated the 78 missing parameters related to HK isozymes. These allowed us to correctly reproduce the network behavior and to perform new in silico experiments. Thanks to the GPU-powered simulator exploited in this work, we achieved a 30× speed-up with respect to the CPU. As a future extension, we will apply our methodology to expand other portions of the intracellular metabolic network, so that the effects of combined modifications on multiple enzymes could be analyzed. In addition, we plan to employ multi-GPU systems to assess the computational performance of our approach.

Acknowledgments

This work was conducted in part using the resources of the Advanced Computing Center for Research and Education at Vanderbilt University, Nashville, TN, USA.

References

[Babar et al., 2010] Babar, J., Beccuti, M., Donatelli, S., and Miner, A. S. (2010). GreatSPN enhanced with decision diagram data structures. In Lilius, J. and Penczek, W., editors, Application and Theory of Petri Nets. PETRI NETS 2010, volume 6128 of LNCS, pages 308–317. Springer, Berlin, Heidelberg.

[Beccuti et al., 2015] Beccuti, M., Fornari, C., Franceschinis, G., Halawani, S. M., Ba-Rukab, O., Ahmad, A. R., and Balbo, G. (2015). From symmetric nets to differential equations exploiting model symmetries. Comput. J., 58(1):23–39.

[Cazzaniga et al., 2014] Cazzaniga, P., Damiani, C., Besozzi, D., Colombo, R., Nobile, M. S., Gaglio, D., Pescini, D., Molinari, S., Mauri, G., Alberghina, L., and Vanoni, M. (2014). Computational strategies for a system-level understanding of metabolism. Metabolites, 4:1034–1087.

[Chiola et al., 1993] Chiola, G., Dutheillet, C., Franceschinis, G., and Haddad, S. (1993). Stochastic well-formed coloured nets for symmetric modelling applications. IEEE Trans. on Computers, 42(11):1343–1360.

[Jamshidi and Palsson, 2010] Jamshidi, N. and Palsson, B. Ø. (2010). Mass action stoichiometric simulation models: incorporating kinetics and regulation into stoichiometric models. Biophysical Journal, 98(2):175–185.

[Kennedy and Eberhart, 1995] Kennedy, J. and Eberhart, R. C. (1995). Particle swarm optimization. In Proc. Int. Conf. Neural Networks, volume 4, pages 1942–1948. IEEE.

[Kurtz, 1970] Kurtz, T. G. (1970). Solutions of ordinary differential equations as limits of pure jump Markov processes. J. Appl. Probab., 7(1):49–58.

[Nobile et al., 2018a] Nobile, M. S., Cazzaniga, P., Besozzi, D., Colombo, R., Mauri, G., and Pasi, G. (2018a). Fuzzy self-tuning PSO: A settings-free algorithm for global optimization. Swarm Evol. Comput., 39:70–85.

[Nobile et al., 2018b] Nobile, M. S., Tangherloni, A., Rundo, L., Spolaor, S., Besozzi, D., Mauri, G., and Cazzaniga, P. (2018b). Computational intelligence for parameter estimation of biochemical systems. In Evolutionary Computation (CEC), 2018 IEEE Congress on. IEEE.

[Smallbone et al., 2013] Smallbone, K., Messiha, H. L., Carroll, K. M., Winder, C. L., Malys, N., Dunn, W. B., Murabito, E., Swainston, N., Dada, J. O., Khan, F., et al. (2013). A model of yeast glycolysis based on a consistent kinetic characterisation of all its enzymes. FEBS Letters, 587(17):2832–2841.

[Tangherloni et al., 2017] Tangherloni, A., Nobile, M. S., Besozzi, D., Mauri, G., and Cazzaniga, P. (2017). LASSIE: simulating large-scale models of biochemical systems on GPUs. BMC Bioinformatics, 18(1):246.