Programming biological models in Python using PySB

(1)

(Article begins on next page)

The Harvard community has made this article openly available.

Please share

how this access benefits you. Your story matters.

Citation

Lopez, Carlos F, Jeremy L Muhlich, John A Bachman, and Peter

K Sorger. 2013. Programming biological models in python using

PySB. Molecular Systems Biology 9: 646.

Published Version

doi:10.1038/msb.2013.1

Accessed

February 19, 2015 12:02:01 PM EST

Citable Link

http://nrs.harvard.edu/urn-3:HUL.InstRepos:10611794

Terms of Use

This article was downloaded from Harvard University's DASH

repository, and is made available under the terms and conditions

applicable to Other Posted Material, as set forth at

http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

(2)

Programming biological models in Python using PySB

Carlos F Lopez1,3, Jeremy L Muhlich2,3, John A Bachman2,3and Peter K Sorger2,*

1 _{Department of Cancer Biology, Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN, USA and}2 _{Department of Systems Biology,}

Center for Cell Decision Processes, Harvard Medical School, Boston, MA, USA

3 _{These authors contributed equally to this work}

* Corresponding author. Department of Systems Biology, Center for Cell Decision Processes, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA. Tel.:þ1 617 432 6901; Fax: þ1 617 432 5012; E-mail: [email protected]

Received 31.8.12; accepted 7.1.13

Mathematical equations are fundamental to modeling biological networks, but as networks get large and revisions frequent, it becomes difficult to manage equations directly or to combine previously developed models. Multiple simultaneous efforts to create graphical standards, rule-based languages, and integrated software workbenches aim to simplify biological modeling but none fully meets the need for transparent, extensible, and reusable models. In this paper we describe PySB, an approach in which models are not only created using programs, they are programs. PySB draws on programmatic modeling concepts fromlittle band ProMot, the rule-based languages BioNetGen and Kappa and the growing library of Python numerical tools. Central to PySB is a library of macros encoding familiar biochemical actions such as binding, catalysis, and polymerization, making it possible to use a high-level, action-oriented vocabulary to construct detailed models. As Python programs, PySB models leverage tools and practices from the open-source software community, substantially advancing our ability to distribute and manage the work of testing biochemical hypotheses. We illustrate these ideas using new and previously published models of apoptosis.

Molecular Systems Biology9: 646; published online 19 February 2013; doi:10.1038/msb.2013.1

Subject Categories:computational methods; simulation and data analysis

Keywords: apoptosis; modeling; rule-based; software engineering

Introduction

Mechanistic studies that couple experimentation and compu-tation typically rely on models optimized for specific questions and biological settings. Such ‘fit-to-purpose’ models can effectively describe and elucidate complex biological pro-cesses, but given the available data, they are usually restricted in scope, encompassing only a subset of cellular biochemistry (Batcheloret al, 2008; Xuet al, 2010; Huberet al, 2011; Schleich

et al, 2012). Even in disciplines in which modeling is more

mature, all-encompassing models are rare. However, it is common for fit-to-purpose models to require changes invol-ving the addition, modification or elimination of species and reactions based on new discoveries. Often a family of models is needed (Albecket al, 2008b; Muzzeyet al, 2009; Rehmet al, 2009; Xu et al, 2010; Kleiman et al, 2011), each of which represents a competing molecular hypothesis derived from the literature, a different way of encoding ambiguous ‘word models’ drawn from molecular biology, or postulated differ-ences in a network from one cell type to the next (Gnadet al, 2012). One of the promises of systems biology is that collaborative and iterative construction and testing of models can improve hypotheses by subjecting them to an ongoing process of testing and improvement. However, the current

proliferation of independently derived fit-to-purpose models frustrates this goal (Krakaueret al, 2011). We require ‘second generation’ tools that leverage existing approaches to biologi-cal model construction and documentation while adding new means for modifying, comparing and sharing models in a transparent manner.

Dynamic biological systems are generally modeled using stochastic or deterministic rate equations. The latter can be described using networks of ordinary differential equations (ODEs) that precisely represent mass action kinetics. However, when a network model has many species and variables, equations become a poor tool for model development and understanding. Even familiar biochemical processes are remarkably complex at the level of equations: for example, fully accounting for the binding and posttranslational mod-ifications underlying activation of growth factor receptors can require thousands of equations that are tedious to generate, hard to error-check and difficult to understand (Hlavaceket al, 2006). The need for frequent updates is also a challenge because even a simple modification of a biochemical cascade can require dozens of small changes in the corresponding equations. When operating at the level of equations, it is also difficult to reuse the work of others directly. For example, in the field of apoptosis, Howellset al(2010) described a conceptual

(3)

extension of a previously published model of Bcl-2 proteins (which control mitochondrial outer-membrane permeabiliza-tion (MOMP), Box 1) (Chenet al, 2007a) by adding the BH3-only Bcl-2 family member Bad. Interactions among the core set of Bcl-2 proteins were identical between the two models, but Howellset al(2010) rewrote the original set of ODEs simply to add a few new reactions. Manually rebuilding earlier models is not only time-consuming but also error-prone: as described in detail below, the practice has introduced errors and unin-tended changes in another pair of related apoptosis models. Moreover, the tendency to make numerous trivial changes in duplicated elements (e.g., by renaming species) makes it difficult to focus on key differences, frustrating later attempts at model comparison (Mallavarapuet al, 2008).

Several projects are underway to address the problems of model proliferation and complexity using formalisms that aim for abstraction, transparency and reusability. Chief among these is rule-based modeling (Hlavaceket al, 2006; Bachman and Sorger, 2011) in which models are created using specialized languages such as BioNetGen language (BNGL) (Faederet al, 2009) or Kappa (Danos et al, 2007b). These languages describe local interaction rules between specific domains or ‘sites’ on molecules (e.g., enzymes and their substrates) and are easier to understand and reuse than

equations (Danos, 2007). Rule-based approaches enable modeling of otherwise intractably complex systems in which posttranslational modification and the formation of multi-protein signaling complexes give rise to large numbers of distinct species (Blinovet al, 2006; Sneddonet al, 2010; Deeds

et al, 2012). Rules can also be subjected to formal analysis

(Danoset al, 2008; Feretet al, 2009) and used to generate both deterministic and agent-based simulations (Danoset al, 2007a; Sneddonet al, 2010).

Although powerful, rule-based languages such as BNGL and Kappa do not exploit ‘higher-order’ patterns in biochemical systems such as multi-step catalysis, scaffold assembly, polymerization, receptor internalization, etc. These patterns often recur several times in a single model and also are frequently encountered in models of different molecular networks. Currently, it is necessary to regenerate the rule sets for biochemical patterns each time the patterns arise. The creation of models having related but variant topologies presents an important special case of a higher-order pattern, particularly when a core process remains the same across all models and modifications focus on specific reactions. In both rule- and ODE-based models, it is necessary to implement the change for each model in the set, a laborious process when the number of models is large. Modeling tools that leverage

Box 1 TRAIL-mediated apoptosis and the Bcl-2 protein family

C8 Bid Bax Bax*mito Bcl-2 Pore DISC TRAIL TNF XIAP Ub_C3* C6 C3 cPARP Smac CyC C9 Apaf Apoptosome CyC DISC module PARP module MOMP module

TRAIL is a prototypical pro-death ligand that binds transmembrane DR4 and DR5 receptors and leads to formation of the intracellular, multi-component death-inducing signaling complex (DISC). Autocatalytic processing of initiator procaspases-8 and -10 at the DISC allows the enzymes to cleave pro3 but caspase-3 activity is held in check by XIAP, an Ecaspase-3 ubiquitin ligase that blocks the caspase-caspase-3 active site and targets the enzyme for ubiquitin-mediated degradation. In most cell types, activation of caspase-3 and consequent cell killing requires MOMP. MOMP allows translocation of cytochromecand Smac into the cytosol where Smac binds and inactivates XIAP and cytochromecactivates pro-caspase-9, two reactions that generate cleaved, active caspase-3.

MOMP is regulated by a family ofB20 Bcl-2 proteins (Youle and Strasser, 2008) having three functions: pro-apoptotic effectors Bax and Bak assemble into pores, anti-apoptotic Bcl-2, Bcl-XL, Mcl-1, Bcl-W, and A1 proteins block pore formation, and the ‘BH3-only proteins’ such as Bid, Bim, and Puma activate the effectors and inhibit the anti-apoptotics. Bid, the most important BH3-only protein for extrinsic cell death, is a direct substrate of caspases 8/10 and its active form (tBid) binds to and activates Bax and Bak via a recently elucidated structural transition (Kimet al, 2009; Gavathiotiset al, 2010). The overall pathway can be roughly divided into a ‘receptor to Bid’ module (yellow in the figure), a ‘pore to PARP’ module (blue), and a MOMP module (orange).

Structural and cellular studies of Bcl-2 proteins are consistent with a variety of different mechanisms for MOMP. A direct model posits that effectors form pores only when activated by proteins such as tBid (Letaiet al, 2002; Kimet al, 2006; Renet al, 2010). The indirect model proposes that Bax and Bak are constitutively able to form pores but are held in check by anti-apoptotic Bcl-2 proteins (Williset al, 2007). Recent studies support a combination of both direct and indirect mechanisms (Me´rinoet al, 2009; Leberet al, 2010; Llambiet al, 2011). The ‘embedded together’ model emphasizes the active role of membranes in determining the conformational states and activity of Bcl-2 proteins and that the anti-apoptotic Bcl-2 proteins possess all of the same functional interactions as the effectors except pore formation and therefore function as dominant-negatives (Billenet al, 2008; Leberet al, 2010). Controversy about MOMP mechanisms reflects the large number of Bcl-2 proteins involved, the role of protein compartmentalization and localization in activity, the diversity of apoptosis inducers, and the fact that different cell types express different Bcl-2 proteins at very different levels. Detailed mechanistic models of MOMP are nonetheless important for rationalizing the selectivity of anti-Bcl-2/ Bcl-XL drugs, such as ABT-263, understanding the oncogenic potential of proteins, such as Mcl-1 and Bcl-2, and elucidating the precise mechanisms of action of pro-apoptotic chemotherapeutics.

(4)

approaches from software engineering are one way to increase reusability and reduce duplication (Mallavarapuet al, 2008; Pedersen and Plotkin, 2008; Danoset al, 2009; Mirschelet al, 2009; Gnadet al, 2012). In particular, LISP frameworks such as

little b(Mallavarapuet al, 2008) and ProMot (Mirschel et al,

2009) have demonstrated the value of programmatic

approaches. However, ProMot does not use rules, limiting its effectiveness for combinatorially complex systems; little b,

while implementing rules internally, does not interoperate with tools and languages from the broader rule-based modeling community and is no longer in development (the similarities and differences between the little b and ProMot approaches have been described previously (Mallavarapu et al, 2008)). Combining the strengths of rule-based and programmatic approaches to modeling is a key goal of the work described here. A benefit of modeling biological systems using contemporary approaches from computer science and open-source software engineering is the ready availability of tools and best practices for managing and testing complex code. Good software engineering practice promotes abstraction, composition and modularity (Mallavarapu et al, 2008; Mirschel et al, 2009). Through abstraction, the core features of a concept or process are separated from the particulars: for example, a pattern of biochemical reactions (e.g., phosphorylation–dephosphorylation of a substrate) is described once in a generic form as a subroutine and then instantiated for specific models simply by specifying the arguments (e.g., species such as Raf, PP2A, and MEK). In programming, abstraction is achieved through the use of parameterizable functions or macros that are written once and then invoked as needed. Functions can be built up from other functions, a process known as composition. Abstraction and composition can occur at all levels of complexity: just as complex functions can be built from simple functions, large programs can be built up from smaller subsystems that are documented and tested individually. When these subsystems have well-defined input–output interfaces, they can be used as libraries that make it possible to write new programs using a simple vocabulary of well-tested concepts (e.g., a library of biochemical actions or core pathways such as the MAPK cascade) (Pedersen and Plotkin, 2008). The decomposition of complex biological models in this fashion facilitates extensibility and transparency, because well-developed mechanisms can be reused and changes can be localized to the subsystem that needs revision.

Contemporary software engineering has much to teach us about the difficult task of developing and documenting models in a distributed setting. Software engineers ‘publish’ their findings using robust programming tools that support code annotation, documentation, and verification, all significant challenges in biological modeling (Hlavacek, 2009). The open-source software community also provides a valuable socio-cultural framework for managing large, collaborative projects in the public domain. Version control tools such as Git and Subversion, along with ‘social coding’ websites such as GitHub, have facilitated the collaborative development of software as complex as the kernel of the Linux operating system (http://github.com). It would be highly desirable to exploit such social and technical innovation in solving the problems of incremental model development and reuse in biology.

In this paper, we describe PySB, an open-source program-ming framework written in Python that allows concepts and methodologies from contemporary software engineering to be applied to the construction of transparent, extensible and reusable biological models (http://python.org; Oliphant, 2007). A critical feature of modeling with PySB is that models are Python programs, and tools for documentation, testing, and version control (e.g., Git) can be used to help manage model development. Strictly speaking, a PySB ‘model’ is a Python program, that, when executed, produces another program (the underlying reaction rules) that can be analyzed or used to create equations for simulation. The PySB frame-work provides a high-level action-oriented vocabulary con-gruent with our intuitive understanding of biochemistry (A phosphorylates B, C translocates to the nucleus, etc.). PySB is closely integrated with Python numerical tools for simulation and parameter estimation and graphical tools that enable plotting of model trajectories and topologies. We demonstrate the use of PySB to re-instantiate 12 previously published ODE-based models of the Bcl-2 family proteins that regulate apoptosis (Chenet al, 2007a; 2007b; Cuiet al, 2008; Albecket al, 2008b; Howellset al, 2010). We show how PySB can be used to decompose models into reusable macros that can be independently tested, and we generate composite models that combine a prior model from our lab describing extrinsic apoptosis (Albeck et al, 2008b; 2008a) with alter-native hypotheses about Bcl-2 family interactions from other investigators. Finally, we develop and calibrate an expanded cell-death model that spans the diversity of the multi-protein Bcl-2 family and draws on findings from leading biochemists in the field, depicting a unified, ‘embedded together’ mechan-ism for mitochondrial membrane permeabilization (Leber

et al, 2010; Llambiet al, 2011). All models in this paper, along

with the PySB source code and documentation, are available for sharing and further development at GitHub and the PySB website (http://pysb.org; Materials and methods).

Results

We chose Python as the language for PySB because of its widespread use in the computational biology community, support for object-oriented and functional programming, and rich ecosystem of mathematical and scientific libraries. At the outset, we determined that PySB should interoperate seam-lessly with BioNetGen (Faederet al, 2009) and Kappa (Danos

et al, 2007b), and thereby build on substantial investments in

rule-based modeling. A PySB model consists of instances of a core set of classes: Model, Monomer, Parameter, Compartment, Rule, InitialandObservable, closely mirroring the form of BNGL and Kappa models. However, in PySB, the component declarations return software objects inside Python, allowing model elements to be manipulated programmatically. To simplify the process of writing rules and to maximize the syntactic match with BNGL, PySB redefines (overloads) some of the mathematical operators in Python to create a shorthand that resembles chemistry notation (Box 2). For example, in the context of a PySB rule definition, the ‘þ’ operator (which in other contexts represents mathematical addition) is used to enumerate a list of reactants or products. It

(5)

is not necessary to use overloaded PySB operators but it makes the models easier to write and understand (see ‘PySB syntax’ in the Materials and methods and the Supplementary Materials).

By way of illustration, consider a ‘Hello World’ program in PySB involving reversible binding of proteinsLandR, each of which contains a single binding sites. A PySB program for this simple reaction has high overhead relative to the equivalent set of ODEs but serves to introduce the basic PySB syntax and show how it interoperates with other software such as BNG, the VODE integrator (Brownet al, 1989), and the Matplotlib plotting library (Hunter, 2007). In the first block of the program (Figure 1A; block 1), a declaration of molecule types (‘monomers’ in PySB) is followed by the forward and reverse rate parameters,kfandkr, and initial conditions for unbound LandR. The syntax for the reversible binding rule (which will be familiar to users of rule-based languages) reads as follows: when the proteinsLandRboth have empty binding sitess (e.g.,L(s¼None)), they reversibly bind to form a complex that shares a single ‘bond’ (e.g.,L(s¼1) % R(s¼1)), at rateskfandkr. This approach to naming binding sites (and calling interactions ‘bonds,’ without implying covalent mod-ification) is drawn from rule-based languages and makes it possible to describe molecules having multiple sites of interaction with different specificities, modifications, and occupancy states (e.g., the distinct binding sites on the TRAIL receptor for ligand and death-inducing signaling complex (DISC) adaptor proteins). The ‘Hello World’ model concludes by designating an observable of interest, the complexLR. (For additional details on the syntax used in this model, see Box 2 and the ‘PySB syntax’ section of the Supplementary Materials; a unified modeling language (UML) class diagram of the core PySB classes is also provided in Supplementary Figure S3.)

The second block of code in Figure 1A defines a time range and calls the PySB functionodesolve, which performs determi-nistic model simulation by invoking BNG (to generate the reaction network) and the numerical integrator VODE (see also Figure 1B). Simulation results are returned as a matrix that is graphed using theplot command from Matplotlib (Figures 1A, block 3 and B).

Using macros to model recurrent biochemical actions

The benefits of PySB become apparent only with more complex and interesting models in which programming constructs such as conditionals, loops, functions, classes, and modules are used to define reusable elements. The complexity of these elements can vary from a few species to multi-component cascades.

Macros,reusable Python functions that are programmatically

expanded into rules and reactions, define low-level biochem-ical actions such as ‘catalyze,’ ‘bind,’ or ‘assemble,’ mirroring the way we describe biological processes verbally. PySB currently contains 13 general-purpose macros covering rever-sible binding, catalytic modification, synthesis, degradation, and pore assembly. Users can generate other model-specific macros to implement new or uncommon mechanisms, thereby creating libraries of biochemical actions for subsequent modeling projects (we are currently adding new macros ourselves); phosphorylation/dephosphorylation cascades and loops are obvious candidates for such a library.

As an example, thecatalyzemacro implements a mass action model of an enzyme-mediated biochemical transforma-tion (Figure 2A) based on six user-specified arguments: the enzyme and its binding site for substrate, the substrate and its binding site for enzyme, the product, and a list of forward, reverse, and catalytic rate constants. Figure 2A illustrates the application of catalyze to a reaction in which caspase-8 cleaves Bid to form truncated Bid (tBid; see Box 1 for a description of the relevant biology). Improved transparency is an important benefit of using macros in that they explicitly describe how elementary reactions are implemented. For example,catalyzeinvokes a two-step model of catalysis in which an enzyme–substrate complex is formed as an inter-mediate step:EþS2ES-EþP.Some published models of apoptosis (Chenet al, 2007a; 2007b; Cuiet al, 2008; Howells

et al, 2010) use a pseudo-first-order, one-step approximation

EþS-EþPthat has the merit of fewer parameters. However, depending on time scales and parameter values, there is a profound difference in the dynamics of one- and two-step catalysis: the one-step model, for example, makes the strict assumption that the enzyme always operates in its linear range and cannot be saturated. These differences are not apparent from the visual or existing verbal descriptions of the ODEs but instead require careful retrospective analysis. In contrast, PySB allows exploration of mechanistic differences simply by calling an alternative catalysis macro,catalyze_one_step (Supplementary Figure S1B). The benefit of macros is not only that the model is more concise (the catalyzemacro call replaces two rules and four ODEs) but also that the difference between one- and two-step catalytic schemes is clearly declared and need not be inferred retrospectively. Because macros are tested programmatically, they also ensure correct instantiation Box 2. PySB embeds a biological modeling language within Python.

Existing computer languages developed for biological modeling (e.g., BNGL or SBML) use a specialized syntax to concisely encode the detailed specifications of biological models. However, such domain-specific languages (DSLs) lack many features found in general-purpose programming languages (functions, classes, loops,etc) that can be used for organizing complex code and making it more human-readable. They also often lack ancillary tools commonly found in general-purpose programming languages, such as testing frameworks or documentation generation support. Models written using PySB are programs in the Python programming language—they are not specialized file formats interpreted by modeling programs. When executed, a PySB model programmatically ‘builds up’ the elements of a rule-based model (molecule types, rules, parameters, etc) until the specification is complete; the Python model object that results can then be subjected to further analysis. Relative to a DSL, a traditional, object-oriented approach to building up a model in this way would require many extra lines of code to create and track objects. PySB streamlines the process of programmatically building models by overloading several Python operators (þ, 44, o4, ( ), %) to allow biological rules to be expressed in a chemistry-like syntax based on BNGL. In addition, the SelfExporter helper class allows models to be built up declaratively in BNGL-like fashion, further minimizing the required code. The overall result is a specialized language for biological modeling ‘contained within’ Python and implemented using Python operators (the Python package SymPy for symbolic mathematics uses a similar approach). Although this ‘syntactic sugar’ streamlines the most common modeling use cases, in some advanced modeling scenarios, a more traditional programming syntax may be preferred; the streamlined syntax is thus entirely optional. The PySB syntax is described in detail in the Supplementary Materials.

(6)

of forward, reverse, and catalytic reactions, an important benefit because failure to implement these processes correctly is remarkably common (see below and Supplementary Note). The power of macros is most evident with complex biochemical actions. For example, the assemble_pore_ sequentialmacro (Figure 2B) implements sequential assem-bly of a membrane-associated pore from multiple identical subunits. It was written to explore how Bax and Bak associate in the mitochondrial outer membrane to form the pore that translocates cytochrome c and Smac into the cytosol during MOMP (Youle and Strasser, 2008). The arguments to assem-ble_pore_sequentialare the identity of the pore-forming protein and its two binding sites, the maximum number of proteins in a pore and a table containing rate constants for each assembly step.In vitroexperiments suggest that Bax and Bak assemble by sequential addition of subunits (Martinez-Caballero

et al, 2009), but the precise size of an active MOMP pore is

unknown. Exploring the effects of changing the numbernof subunits in a pore requires creation and modification ofn–1

BNGL rules or n ODEs but only one value in the PySB assemble_pore_sequentialmacro.

Many biochemical processes are controlled by families of proteins that have overlapping binding specificities. In apoptosis, MOMP is regulated byB20 pro- and anti-apoptotic Bcl-2 family proteins that have a range of affinities for each other and bind in various arrangements (Box 1). Modeling the binding of any two Bcl-2 proteins is simple, but managing equations or rules for all possible binding reactions is much

harder. The bind_table macro uses a simple tabular

representation to model interactions among members of

multi-protein families (Figure 2C). The first argument to the macro is a table (a list of lists in Python) in which row and column headers identify pairs of interacting proteins and each table entry contains binding constants or the valueNoneto indicate that there is no measurable interaction. The second and third arguments specify the binding site names for row and column species, respectively. In addition to being concise (a single bind_tablecall for the Bcl-2 proteins generates 28 rules and 41 ODEs), the tabular format highlights relationships between proteins by grouping them into functional classes; new family members can be introduced simply by adding rows and columns. Abind_table is therefore a simple compu-table representation of the ‘binding codes’ that have been created by others to summarize structural and biochemistry studies on Bcl-2 family members (Chenet al, 2005; Kuwana

et al, 2005; Certoet al, 2006). By changing the first argument in

the bind_table call, it is straightforward to explicitly substitute one set of binding data for another, a useful feature for exploring differences in published binding codes or for modeling the behavior of mutant proteins (Fire et al, 2010; Debartolo et al, 2012). Models that use macros such as bind_tablenaturally acquire a ‘self-documenting’ character that minimizes the need for additional explanatory description (see Figures 4B and 5A for examples ofbind_tablein the context of models of MOMP initiation).

Modules and model reuse

Macros abstract biochemical reactions at a fairly low level of detail involving a few proteins, but Python also supports a from pysb import *

from pysb.integrate import odesolve frompylabimport plot, linspace

Model()

# Declare the monomers

Monomer('L', ['s']) Monomer('R', ['s'])

# Declare the parameters

Parameter('kf', 1e-3) Parameter('kr', 1e-3)

# Declare the initial conditions

Initial(L(s=None), Parameter('L_0', 100)) Initial(R(s=None), Parameter('R_0', 200))

# Declare the binding rule

Rule('L_binds_R',

L(s=None) + R(s=None) <> L(s=1) % R(s=1), kf, kr)

# Observe the complex

Observable('LR', L(s=1) % R(s=1))

# Simulate the model

time = linspace(0, 40) x = odesolve(model, time) plot(time, x['LR']) PySB declarations Make BNGL Rule(’L_binds_R’, L(s=None) + R(s=None) <> L(s=1) % R(s=1), kf, kr) Build Rxn Net (BNG) User PySB External Tools

Integrate (VODE) Result 1 2 Make ODEs L(s) + R(s) <-> L(s!1).R(s!1) kf, kr dL/dt = -kf*L*R + kr*LR dR/dt = -kf*L*R + kr*LR dLR/dt = kf*L*R - kr*LR odesolve(...) plot(...) Plot (Matplotlib) 3 begin reactions 1 1,2 3 kf #L_binds_R 2 3 1,2 kr #L_binds_R(reverse) end reactions Data matrix 1 2 3

Figure 1 Creation and simulation of a ‘Hello World’ model in PySB. (A) Creation and deterministic simulation of a model using PySB. The call toModel()creates thepysb.core.Modelobject to which all subsequently declared components are added. The first block of code declares the molecule types, parameters, initial conditions, reversible reaction rule, and observable necessary for modeling and simulating the reversible binding of proteinsLandR. The second block of code calls the odesolvefunction from thepysb.integratemodule to generate and integrate the ODEs. The third block plots the simulated time course using the Matplotlib plotcommand. Numbers associated with the code blocks identify the correspondences between the code and the control flow shown in B. (B) Control flow for an ODE simulation of a PySB model. The columns ‘User,’ ‘PySB’, and ‘External Tools’ indicate the locus of control of each step in the process (boxes). The ‘Result’ column shows the result of each individual step: the gray box indicates results of steps that are internal to the call toodesolve, whereas the other results are visible to the user at the top level. After declaring the model elements as in A, the user callsodesolve, which generates the corresponding BNGL for the model, invokes BNG externally on the generated BNGL code to create the reaction network, parses the reaction network to generate the corresponding set of ODEs, and calls an external integrator (VODE) to generate the trajectories. The trajectories are returned to the user as a NumPy record array, where they are visualized with a call to theplotfunction from Matplotlib.

(7)

catalyze(C8, 'bf', Bid(state='U'), 'bf', Bid(state='T'), klist)

Example macro call

Implementation BNGL rules ODEs C8(bf) + Bid(bf,state~U) <-> C8(bf!1).Bid(bf!1,state~U) kf, kr C8(bf!1).Bid(bf!1,state~U) -> C8(bf) + Bid(bf,state~T) kc C8: ds0/dt = kc*s2 - kf*s0*s1 + kr*s2 Bid: ds1/dt = -kf*s0*s1 + kr*s2 C8_Bid: ds2/dt = -kc*s2 + kf*s0*s1 - kr*s2 tBid: ds3/dt = kc*s2 assemble_pore_sequential(Bax, 's1', 's2', 6, ktable) 5 Rules 6 ODEs Example macro call

BNGL rules ODEs 28 Rules 41 ODEs bind_table([[ Bcl2, BclXL, BclW, Mcl1, Bfl1], [Bid, 66, 12, 10, 10, 53], [Bim, 10, 10, 38, 10, 73], [Bad, 11, 10, 60, None, None], [Bik, 151, 10, 17, 109, None], [Noxa, None, None, None, 19, None], [Hrk, None, 92, None, None, None], [Puma, 18, 10, 25, 10, 59], [Bmf, 24, 10, 11, 23, None]], 'bf', 'bf', kf=1e-3]]

Example macro call

BNGL rules ODEs

Supplementary Figure S1A 1: Enzyme (Caspase-8) 2: Enzyme-binding site name 3: Substrate (full length, cytosolic Bid)

4: Substrate-binding site name 5: Product (T for truncated, cytosolic Bid) 6: List of rate parameters

Macro Arguments

1 2 3 4 5 6

1: Subunit name

2, 3: Sites for binding neighboring subunits

4: Maximum number of subunits in pore 5: Table of rate parameters

Macro Arguments

1 2 3 4 5

Macro Arguments

1: Bind table data (molecules and rates) 2: Binding site for row-group proteins

3: Binding site for column-group proteins 4: Default association rate

1

2 3 4

Figure 2 Three examples of mechanistic abstractions using macros. Full implementation of all macros can be found in themacros.pyfile in the PySB source code online (Materials and methods). (A)catalyze. The example call shows how thecatalyzemacro is called to add a catalytic reaction in which active caspase-8 (C8) binds to untruncatedBid (Bid(state¼‘U’)) to yield tBid(Bid(state¼‘T’)). Rate parameters (forward, reverse, and catalytic) are provided in the list‘klist’. The ‘Basic implementation’ in Supplementary Figure S1A shows the Python source code for a simplified version of thecatalyzemacro. Execution of thecatalyzemacro leads to the creation and addition of two rules to the model, which, when converted into BNGL, take the form shown below. Generation of the reaction network via BNG then results in the system of four ODEs shown at bottom. (B)assemble_pore_sequentialmodels the assembly of a pore in a sequential fashion in which monomers bind to form dimers, dimers bind monomers to form trimers, trimers bind monomers to form tetramers,etc. Pores of size three (trimers) and above have a closed, ring-shaped topology, reflecting the variable structure and stoichiometry for the Bax pore suggested by recentin vitro experiments (Martinez-Caballeroet al, 2009). The maximal size of the pore is given by the fourth argument to the macro (i.e., 6). With the parameters shown in the example, the execution of the macro results in five rules (for binding of monomers to monomers, monomers to dimers, monomers to trimers,etc) and six species and ODEs (monomers through hexamers). Pores with greater stoichiometry can be modeled simply by changing the pore size argument in the macro call. (C) bind_tableis used to concisely represent combinatorial binding among two related groups of molecules. In the example call, the species in the column headers are the five known anti-apoptotic Bcl-2 proteins, whereas the row headers are various pro-apoptotic BH3-only proteins. The table entries represent the dissociation constants for binding between each pair of proteins drawn fromin vitromeasurements, given in units of nanomolar (Williset al, 2005; Certoet al, 2006); the Python constantNone indicates that no binding occurs. (In place of a dissociation constant, a table cell may alternatively contain a pair of explicit association and dissociation rates.) The names of the binding sites for the row-group and column-group proteins (i.e.,‘bf’) are given as the second and third arguments. The final argument (i.e.,kf¼1e-3) specifies a default association rate to be applied to all reactions, given in units of nM1s1. The execution of thebind_tablecall results in the instantiation of 28 reversible binding rules, each with the given association rate and a dissociation rate calculated from the dissociation constant provided in the table entries; this further expands to 41 ODEs.

(8)

wide variety of methods for reusing more complex model elements. We have found three to be particularly useful:

(1)Instance Reuse,for making small changes to an existing

model; (2)Module Reuse, for programmatically composing a model from reusable pieces or modules; and (3)Class Reuse, for models involving combinatorial variation of several independent features. Of these, Instance Reuse is the simplest, entailing programmatic duplication of a previous model and explicit specification of new and modified elements. Instance Reuse proved to be an appropriate way to update the PySB version of EARM 1.0 (Albecket al, 2008a) to include synthesis and degradation reactions (Figure 3A). This approach replaces conventional ‘cut-and-paste’ copying and editing of ODEs or BNGL rules. Reuse is achieved by cloning the old model into a new model and then explicitly declaring the elements that are added or modified, making the changes easy to understand and track.

Module Reuse involves the separation of a model into independent elements (‘motifs’ or ‘modules’; see also Figure 7) that are written as callable subroutines in Python. It is not necessary for a biological process to be modular in a functional sense for modularization in PySB to be advanta-geous. Building a model from subroutines enables a ‘mix-and-match’ approach in which a subset of the interactions in a model is subjected to revision or re-examination, whereas the rest remain the same. For example, we divided EARM 1.0 into three modules each involving self-contained blocks of PySB code for: (1) reactions from ligand–death receptor association to binding of DISC components; (2) interactions among Bcl-2 family members controlling MOMP; and (3) the cascade of reactions involving initiator and effector caspases and their immediate regulators (Figure 3B; Box 1). A series of papers examining alternative models of MOMP have been published by multiple groups (Chen et al, 2007a, b; Cui et al, 2008; Howells et al, 2010), but MOMP has most commonly been studied in isolation from reactions occurring upstream and downstream. However, it has been shown that multi-protein cascades do not exhibit the same behavior in isolation as when they are part of larger networks (Del Vecchioet al, 2008; Chen

et al, 2009). One of the primary aims of modeling signal

transduction is to contextualize molecular mechanisms by embedding them in a network context. Thus, studies of extrinsic apoptosis would benefit from models in which alternative hypotheses for MOMP regulation are embedded in a more complete reaction pathway. Using conventional modeling tools, it is challenging to add MOMP ‘mini-models’ to upstream and downstream reactions (Albecket al, 2008b). In contrast, in PySB, this type of composition is simple: we have written a PySB program in which any of 15 models of MOMP are called up, along with commonrec_to_bidand pore_to_parpmodules to create 15 fully functioning hybrid models of extrinsic apoptosis (Figure 3B). The upstream and downstream reactions have also been modeled in several different ways by others (Rehmet al, 2006; O’Connoret al, 2008; Frickeret al, 2010; Neumannet al, 2010; Schleichet al, 2012), and it would be straightforward to use Module Reuse to combine different proposals for rec_to_bid and pore_ to_parpwith different MOMP modules.

Class Reuse is a third and more sophisticated approach that exploits the class inheritance mechanism in Python. Common

model features are coded in a base class, and model variants are written as ‘child’ classes of the base class, able to inherit code from the base class directly with or without program-matic modification. Code from multiple variants can then be combined by further inheritance from more than one of these classes. For example, we used Class Reuse to model the effects of reaction compartmentalization on interactions among pro-apoptotic Bcl-2 proteins at the mitochondrial membrane (Figure 3C). In one case, we assumed that reactions took place in two well-mixed reaction compartments corresponding to cytosol and membrane (Figure 3C, TwoCpt), and in the second, we assumed that each mitochondrion constituted a distinct reaction compartment (MultiCpt). In addition, we independently explored different reaction topologies involving the Bcl-2 proteins tBid and Bax. Both topologies (Topo1and Topo2) include the translocation of tBid and Bax to membranes but only the second topology (Topo2) incorpo-rates activation of Bax by tBid. We used inheritance to automate creation of four different models having different compartmentalization schemes and reaction topologies (e.g., TwoCpt_Topo1). The notable feature of Class Reuse is that model variants are created and combined over multiple independent ‘axes’—in this example, compartmentalization and protein-interaction topology—transparently and with no duplicated code.

Integration with the Python ecosystem and external modeling tools

The iterative process of model development is dramatically accelerated when tools for model creation, simulation, analysis, and visualization are integrated. Many commercial and academic software packages, including Mathematica (Wolfram Research, Inc., 2010) and MATLAB (Mathworks, 2012), provide integrated tools for equation-based models but are unwieldy to use with rule-based or programmatic approaches because models must be exported and imported using SBML (Hoopset al, 2006; Maiwald and Timmer, 2008). At the same time, rule-based model editors such as RuleBender for BNGL (Smith et al, 2012) and RuleStudio for Kappa (https://github.com/kappamodeler/rulestudio) facilitate development of rule-based models but do not incorporate tools for data analysis, parameter fitting, and symbolic math. Simply by virtue of being written in Python, PySB interacts natively with a large and growing library of open-source scientific software such as NumPy, SciPy, SymPy, and Matplotlib (Table I). Models written using PySB can also exploit Python tools for documentation generation (sphinx) and for unit testing (unittest, nose,anddoctest), both of which we used extensively in creating the models of extrinsic apoptosis described below.

To interface PySB with BNG and Kappa, which are not implemented in Python, we wrote Python ‘wrapper’ libraries, providing access to agent-based simulation, static analysis, and visualization. The wrappers also manage the syntactic differences between BNGL and Kappa, allowing either to be used for the same PySB model. Models can be trivially exported in BNGL format for use with established ‘all-in-one’ tools that support BNGL, such as V-Cell (Moraruet al, 2002).

(9)

earm_modules.py twocpt_topo1.py import earm_1_0 Model(base=earm_1_0.model) # Add/modify components synthesize_degrade_table([...]) import earm_modules Model() earm_modules.rec_to_bid()

# Implement variant MOMP model <Components, macros, etc.>

earm_modules.pore_to_parp()

Model()

<Components, macros, etc.> earm_1_0.py

def rec_to_bid(): # Upstream module <Components, macros, etc.> def pore_to_parp(): # Downstream module <Components, macros, etc.>

compartment_classes.py model_variant.py earm_1_3.py Reuse topology_classes.py Reuse

from compartment_classes import TwoCpt

from topology_classes import Topo1

TwoCpt_Topo1 = type('TwoCpt_Topo1', (TwoCpt, Topo1), {}) model_builder = TwoCpt_Topo1() model_builder.build_model() earm_1_0 earm_1_3 rec_to_bid pore_to_parp CptBase Topo2 Topo1 model_variant TwoCpt MultiCpt

Compartmentalization Interaction topology

Reuse

TwoCpt Topo1

class CptBase(object):

def tBid_activates_Bax(self): # Default implementation <Components, macros, etc.> class TwoCpt(CptBase):

def translocate_tBid_Bax(self): # Specific implementation

<Components, macros, etc.>

class MultiCpt(CptBase):

def translocate_tBid_Bax(self):

# Specific implementation

<Components, macros, etc.>

class Topo1(object): def build_model(self):

self.translocate_tBid_Bax()

class Topo2(object): def build_model(self):

self.translocate_tBid_Bax() self.tBid_activates_Bax()

B

A C

Figure 3 Three approaches to model reuse. Boxes with gray tabs represent Python modules/files. Statements marked ‘Reuse’ identify the point of reuse of previously created model code. Background highlighting indicates correspondence between elements in the reuse schematic and the PySB code. (A) Direct reuse and subsequent modification of pre-existing model code. A pre-existing model is declared in its own Python file (earm_1_0.py). The extending model in the fileearm_1_3.py (representing a later version) imports and duplicates the model object fromearm_1_0.pyand subsequently adds a list of synthesis and degradation reactions. (B) Reuse of modules using macros. Macros instantiating the components for the upstream (rec_to_bid) and downstream (pore_to_parp) portions of the extrinsic apoptosis pathway are placed in a Python file,earm_modules.py. Variant models differing only in the reaction topology for MOMP initiation (e.g., model_variant.py) are then created by invoking these macros for the shared upstream and downstream elements. (C) Reuse and recombination of model elements through class inheritance. A shared implementation of the compartmentalization-independent reactions for Bax activation (i.e.,tBid_activates_Bax) is contained within the base classCptBase. Alternative compartmentalization strategies are implemented in the child classesTwoCptandMultiCpt, which separately implement the compartmentalization-dependent reactions for tBid and Bax translocation (i.e.,translocate_tBid_Bax). Because these classes inherit fromCptBase, they acquire the implementation oftBid_activates_Bax, representing a point of reuse. Alternative protein-interaction topologies are implemented within thebuild_modelfunction in the two classesTopo1andTopo2. Models with either of the compartmentalization implementations (e.g., TwoCpt) and either of the interaction topologies (e.g.,Topo1) can then be created dynamically by inheriting from the appropriate classes, representing an additional point of reuse. Readers familiar with the concept of polymorphism from object-oriented programming will note that calling thebuild_modelmethod on any of the hybrid classes will polymorphically refer to the correct implementations in the parent classes.

(10)

def chen_febs_direct(do_pore_assembly=True):

# One-step "hit-and-run" activation of Bax by tBid catalyze_one_step_reversible(

Bid(state='T', bf=None),

Bax(bf=None, **inactive_monomer), Bax(bf=None, **active_monomer), [1e-3, 1e-3])

# Bcl2 binds tBid and Bad (a sensitizer) but not Bax bind_table([[ Bcl2], [Bid(state='T'), (1e-4, 1e-3)], [Bad(state='M'), (1e-4, 1e-3)]]) if do_pore_assembly: assemble_pore_spontaneous(Bax(state='A', bf=None), [4e-3, 1e-3]) Albeck 11b Albeck 11c Albeck 11d Albeck 11e Albeck 11f Chen Biophys. J. Howells

Chen FEBS “Indirect” Chen FEBS “Direct”

Cui “Direct” Cui “Direct 1” Cui “Direct 2”

Lopez “Indirect” Lopez “Direct” Lopez “Embedded” Lopez Group

Albeck Group Shen/Howells group

def cui_direct():

# Build on the direct model from Chen et al. (2007) # FEBS Lett. (excluding pore assembly) by:

chen_febs_direct(do_pore_assembly=False) # 1. Overriding some parameter values,

one_step_BidT_BaxC_to_BidT_BaxA_kf.value = 0.0005

bind_BidT_Bcl2_kf.value = 0.001 # 2. Adding a Bad-for-Bid displacement reaction,

displace_reversibly(Bad(state='M'), Bid(state='T'), Bcl2, [0.0001, 0.001])

# 3. Adding simplified MAC formation (Bax dimerization) assemble_pore_sequential(Bax(state='A', bf=None), 2, [[0.0002, 0.02]])

# 4. Adding synthesis and degradation reactions Bax2 = Bax(s1=1, s2=None) % Bax(s1=None, s2=1) synthesize_degrade_table(

[[Bax(bf=None, **inactive_monomer), 0.06, 0.001], [Bax(bf=None, **active_monomer), None, 0.001], [Bid(state='T', bf=None), 0.001, 0.001], [Bcl2(bf=None), 0.03, 0.001], [Bid(state='T', bf=1) % Bcl2(bf=1), None, 0.005], [Bad(state='M', bf=None, serine='U'), 0.001, 0.001], [Bad(bf=1) % Bcl2(bf=1), None, 0.005], [Bax2, None, 0.0005]])

A B

D C

Figure 4 Refactoring of published models into PySB. (A) Relationships between the models examined in this paper. The ‘Albeck Group’ incorporates a series of incrementally expanded models shown in Figure 11 of Albecket al(2008b); the ‘Shen/Howells Group’ incorporates models from three papers from the research group of Shen and colleagues (Chenet al, 2007a; 2007b; Cuiet al, 2008) and a derivative model from Howellset al(2010); the ‘Lopez group’ includes three expanded models introduced in this paper. The arrows indicate that one model has been derived or extended from a prior model and point in the direction Base Model-Derivative Model. (B) The ‘Direct’ model from Chenet al(2007b) in its original ODE-based representation. (C) Conversion of the Chen ‘Direct’ model to a PySB module. The execution of thechen_febs_directfunction results in rules that exactly reproduce the ODEs shown above (the molecule typeBadin the PySB function corresponds to the generic enabler speciesEnain the original equations;Bidcorresponds to the generic activatorAct). The macrocatalyze_one_step_reversible implements the two-reaction schemeEþS-EþP,P-S;assemble_pore_spontaneousimplements the order-4 reaction 4subunit$pore. The bind_tablemacro is illustrated in Figure 2C. (D) Model extension in PySB. Module Reuse (Figure 3B) was used to implement the ‘Direct’ model from Cuiet al (2008) as an extension of the prior ‘Direct’ model from Chenet al(2007b) shown in Figure 4C. Invocation of the PySB functionchen_febs_directincorporates the elements of the original Chenet al(2007b) model; subsequent statements specify the modifications and additions required to yield the derived model from Cuiet al(2008).

Table IIntegration with external modeling tools

Tool Reference Interface Description (relevance to PySB)

NumPy Oliphant, 2007 Python Efficient array and matrix operations

SciPy Oliphant, 2007 Python Scientific algorithms, e.g., ODE integration, statistics, and optimization

SymPy SymPy Development Team, 2012 Python Symbolic manipulation of mathematical expressions

Matplotlib Hunter, 2007 Python Plotting and other data visualizations

Graphviz Gansner and North, 2000 Python Layout and rendering of node-edge graphs

BNG Faederet al, 2009 Wrapper Translation of rules to a reaction network; stochastic simulation

Kappa Danoset al, 2007b Wrapper Stochastic simulation; visualization and analysis of rules models

SBML Hucka, 2003 Export Compatibility with SBML tools

Mathematica Wolfram Research, Inc., 2010 Export General-purpose scientific computing

(11)

PySB can export models as systems of ODEs in formats for SBML (Hucka, 2003), MATLAB, Mathematica, or Potters-Wheel (Maiwald and Timmer, 2008). Finally, to facilitate unique identification of model components, we added a light-weight annotation capability that allows any model element (including macros and modules) to be tagged with identifiers from external databases using subject–object–predicate triples compatible with MIRIAM (Le Nove`reet al, 2005). The net result is a software environment that combines the flexibility of a general-purpose scientific computing package with programmatic and rule-based modeling tools and an open-source code base.

EARM version 2.0: a family of models of extrinsic apoptosis and MOMP

To explore the ability of PySB to model the latest molecular data on apoptosis while also building on previous work, we used macros and Module Reuse to construct a family of cell-death models involving 15 different modules for MOMP regulation. Seven of the MOMP modules were previously published by the research group of Shen and colleagues (Chen

et al, 2007a, b; Cuiet al, 2008), one module extends a Shen

group model (Howellset al, 2010), five modules are from the work of Albecket al(2008b), and three modules are entirely new and incorporate more complete sets of interactions among Bcl-2 proteins (Figure 4A). The three new modules are derived from word models in recent studies from Green and Andrews that unify previously competing mechanisms of pore forma-tion (Billenet al, 2008; Leberet al, 2010; Llambiet al, 2011). Each of the 15 modules was instantiated in PySB as a distinct subroutine that can be called and analyzed in the context of a receptor-to-caspase pathway. The set of 15 MOMP modules is by no means complete, and several noteworthy models of extrinsic apoptosis and MOMP (Benteleet al, 2004; Bagciet al, 2006; Legewieet al, 2006; Rehmet al, 2009; Du¨ssmannet al, 2010) have not yet been coded in PySB. However, because our objective is to explore model reuse and composition using

PySB, we limited ourselves to the 15 MOMP-focused examples described above. We collectively denote the resulting set of 30 variant models (15 models only of MOMP plus 15 models of extrinsic apoptosis incorporating the MOMP modules) as Extrinsic Apoptosis Reaction Model version 2.0 (EARM 2.0); the models are summarized in Table II and are available as a Python package with source code downloadable from GitHub (Materials and methods).

While porting existing models into PySB, we observed that several published ODE networks contained one or more errors relative to their verbal or graphical descriptions in the original paper (Cui et al, 2008) (see Supplementary Note). We were unable to discern whether the errors in the published ODE networks represent genuine mathematical errors or merely transcription errors made in the process of converting computer models to text. Even when we used the ODE networks as published, we found cases in which we were unable to reproduce the results described in the figures. Our own previous work was not entirely free of this problem: we could not reproduce the simulation results in Figure 11 of Albecket al(2008b) without access to MATLAB source code that was inadvertently omitted from the original publication. Our aim is not to criticize these papers but instead to emphasize that the current practice of maintaining different forms of a model for the purpose of simulation, illustration, and publication is highly problematic. The lists of equations included as supplementary materials in most modeling papers are particularly troublesome because they exist independently of the simulation model and the two tend to deviate. These problems can be addressed by using electronic formats for model exchange with a single master from which all other versions are derived (Huckaet al, 2003; Waltemathet al, 2011). As an electronic format for models, PySB complements XML-based formats such as SBML in that macros, modules, and other high-level abstractions make model structure more intelligible than SBML alone. In addition, modeling biochem-ical processes by reusing previously validated macros elim-inates ‘bookkeeping’ errors such as those we identified in published MOMP models. To ensure that the reinstantiated Table IISummary of models in EARM 2.0

Model namea _{ID (full/MOMP)} _Reference _{MOMP-only (M}_n_b)b _{Full apoptosis (M}_n_a)c

Rules ODEs Parameters Rules ODEs Parameters

Lopez Embedded M1a/b This paper 39 40 78 66 76 133

Lopez Direct M2a/b This paper 27 32 58 54 68 113

Lopez Indirect M3a/b This paper 29 34 64 56 70 119

Albeck 11b M4a/b Albecket al, 2008b 7 13 17 34 48 71

Albeck 11c M5a/b Albecket al, 2008b 11 17 25 38 52 79

Albeck 11d M6a/b Albecket al, 2008b 12 18 27 39 53 81

Albeck 11e M7a/b Albecket al, 2008b 14 21 31 41 56 85

Albeck 11f M8a/b Albecket al, 2008b 14 21 31 41 56 85

Chen 2007 Biophys J M9a/b Chenet al, 2007a 6 7 12 37 49 75

Chen 2007 FEBS Direct M10a/b Chenet al, 2007b 5 8 12 36 48 74

Chen 2007 FEBS Indirect M11a/b Chenet al, 2007b 3 6 9 34 48 72

Cui Direct M12a/b Cuiet al, 2008 18 10 26 49 52 89

Cui Direct 1 M13a/b Cuiet al, 2008 22 11 33 53 53 96

Cui Direct 2 M14a/b Cuiet al, 2008 23 11 34 54 53 97

Howells M15a/b Howellset al, 2010 14 12 22 45 49 84

a

Model names are drawn from the first author of the paper in which the mathematical model was published.

b

MOMP-only variants are identified as Mnb, e.g., M1b for the MOMP-only variant of ‘Lopez Embedded’.

(12)

models reproduced the behavior of the originally published versions, we wrote a series of unit tests using the Python modulesunittestandnose. The tests guarantee that the reinstantiated models reproduce validated states, despite translation into PySB. Further details on our approach to unit testing can be found in the online documentation (Materials and methods).

The structure and origin of the MOMP models are easier to understand using PySB than the underlying sets of ODEs. This can most easily be seen by comparing the ODE and PySB versions of a model from Chenet al(2007b) (Figures 4B and 4C). The original model is relatively simple (only seven ODEs), but understanding the precise mechanism for MOMP requires careful inspection of each equation. By comparison, the PySB model exploits macros to make the mechanisms transparent: single-step catalysis, combinatorial binding, and pore assem-bly. Many of the 15 MOMP models in EARM 2.0 represent incremental extensions of earlier models (this is particularly true of the models from Howellset al(2010) and Chenet al

(2007a, b), as well as the five models from Albecket al(2008a, b); Figure 4A). The authors of these models proceeded by duplicating ODEs from previous models or papers and then adding new species or reactions as required: e.g., the three models of Cuiet al(2008) are derived directly from the ‘direct’ model of Chenet al(2007b), whereas the model of Howells

et al (2010) is based on an earlier model from Chen et al

(2007a). However, the process of renaming species and variables in the derived models makes it difficult to verify that each variant correctly recapitulates the structure of the original model as claimed. For example, in Cuiet al(2008), the authors stated simply that the ‘direct model’ was ‘mainly based’ on the earlier work of Chenet al(2007b), but we found that there were several important additions and modifications in the derived model, including addition of displacement, synthesis and degradation reactions, and a change in the MOMP pore from a Bax tetramer to a Bax dimer. Inspection of the PySB source code for the direct model of Cuiet al(2008) (Figure 4D) makes these differences explicit by calling a

subroutine for the earlier chen_febs_direct model

(Figure 4C) and adding only the new reactions.

PySB Module Reuse facilitated the process of embedding each of the 15 models of MOMP within the context of receptor– proximate reactions (ligand binding to Bid cleavage) and downstream reactions creating ‘Full Apoptosis’ and ‘MOMP-only’ versions (summarized in Table II; see also Materials and methods). We are currently developing additional apoptosis modules (e.g., alternative topologies for receptor activation and DISC formation) that will soon be part of the EARM repository; other researchers can also ‘fork’ the code on GitHub and contribute their own additions. This should allow a cumulative and distributed approach to model development and comparison.

Embedded together: an updated and expanded MOMP model

The EARM 2.0 extrinsic apoptosis model incorporating the ‘Lopez Embedded’ MOMP module variant, denoted EARM 2.0-M1a for short (Table II), implements a mathematical

interpretation of recent experimental findings from Andrews (Billenet al, 2008; Leberet al, 2010) and Green (Llambiet al, 2011) and differs significantly from previously published models of MOMP (Figure 5). Interactions among Bcl-2 family members occur at the mitochondrial membrane rather than in the cytosol (Lovellet al, 2008), and anti-apoptotic proteins are able to bind both the pore-forming proteins, such as Bax and Bak, and a larger family of BH3-only Bcl-2 family members, thus serving as dominant-negative effectors (Billenet al, 2008; Leberet al, 2010). This is also consistent with a recent ‘Unified Model’ by Green and coworkers demonstrating both ‘direct’ and ‘indirect’ modes of action by the anti-apoptotic Bcl-2 proteins (Llambiet al, 2011) (Box 1). The overlapping binding specificities implied by this model are summarized in a bind_tablecall that includes the key effector for extrinsic apoptosis (Bid), two BH3-only sensitizers (Bad and Noxa), two pore-forming effectors (Bax and Bak), and three anti-apoptotic proteins (Bcl-2, Bcl-XL, and Mcl-1), along with affinity data obtained fromin vitroexperiments (Williset al, 2005; Certo

et al, 2006) (Figure 5A). There is some doubt about whether

peptide-based affinity measurements are directly relevant to protein–protein interactions occurring on the membranes of living cells, and thebind_tablemacro makes it straightfor-ward to experiment with different values (Figure 2C). EARM 2.0-M1a also assumes auto-activation of Bax (and Bak), which has been demonstrated in multiple experimental contexts

(Tanet al, 2006; Gavathiotiset al, 2010).

Our previously published EARM 1.0–1.4 models (Albeck

et al, 2008a, b; Spenceret al, 2009; Aldridgeet al, 2011; Gaudet

et al, 2012) assumed that the all-or-none quality of MOMP

arose from the ability of Bcl-2 to bind Bax monomers, dimers, and tetramers (Albeck et al, 2008b). However, subsequent immunoprecipitation experiments failed to support the exis-tence of such higher-order hetero-oligomers (Kimet al, 2009). To determine whether the updated reaction topology in EARM 2.0-M1a can reproduce MOMP dynamics measured in single TRAIL-treated HeLa cells using Fo¨rster resonance energy transfer reporter proteins (Spenceret al, 2009), we fitted it to data using the simulated annealing algorithm in SciPy (Materials and methods). We found that EARM 2.0-M1a had as good a fit to data as previous models (Figure 6), and we therefore judge it to be superior to our earlier EARM 1.0 model based simply on better correspondence with prior knowledge. The fitting exercise also demonstrated that Python numerical tools can efficiently simulate and calibrate PySB models (parameter estimation functions are included in the EARM 2.0 Python package).

Discussion

In this paper, we describe the development and use of PySB, a framework for creating, managing, and analyzing biological models in which models are full-fledged Python programs. PySB modules and macros generate BNGL or Kappa rules, which can be converted into mathematical equations. This hierarchical process is analogous to the way in which programs are written in a high-level language such as Cþ þ and converted into microprocessor code by the compiler. This complexity is hidden from PySB users who work with macro

(13)

libraries encoding biochemical actions such as ‘catalyze,’ ‘bind,’ ‘assemble,’etc. The advantages of a high-level abstrac-tion layer include greater transparency and intelligibility, a reduction in implementation errors and a dramatic increase in the ability to compare and reuse previous work. Each level of representation remains accessible for analysis and there are no explicitly black-box steps: the Python model code reveals model ontogeny, structure, and approach to mechanism; the BNGL or Kappa rules generated by PySB support agent-based simulation and static analysis; and equations enable numer-ical integration, sensitivity analysis, steady-state analysis,etc. Use of familiar but powerful programming concepts in PySB models such as abstraction, composition, modularity, inheri-tance, and polymorphism make it possible to create variant models from pre-existing models across several axes of variation and build new models from previously tested elements. We expect these features of PySB to facilitate collaborative model development and evaluation.

PySB draws on well-established practices in the open-source programming community for model documentation and

sharing. Because PySB models are programs, they can be tracked and shared using the powerful tools developed for distributed, open-source software development (e.g., all the models in this paper are available, with documentation, at GitHub; see Materials and methods). It is simple to update models online, highlight differences with previous work and divide development among multiple individuals and research teams. Finally, PySB can be used as a general-purpose modeling tool because it interoperates with diverse scientific applications written in Python (e.g., NumPy, SciPy, SymPy, and Matplotlib). Unlike conventional all-in-one programs, PySB itself tackles only certain steps in the modeling process, relying on interoperability with programs developed and maintained by others to create a full-fledged solution. A benefit of this approach is that improvements in any of these programs accrue directly to users of PySB.

The power of PySB derives, in part, from its ability to encode recurrent biochemical patterns in reusable macros (Figure 2) and to divide complex networks into modules that are defined once and called when needed (Figures 3B and 4C). By

Bak Bax Bax Bax* Bak* Bcl-2 Mcl-1 Bcl-xL Bad Noxa tBid tBid PORE Bcl-xL BclxL bf state Noxa bf state Mcl1 bf state Bid bf state Bcl2 bf Bax bf s1 s2 state Bak bf s1 s2 state Bad bf state

tBid/Bax/BclxL translocation tBid activates effectors Bcl-2 family binding Bax/Bak autoactivation Pore assembly Bax_C r1 r15 Bax_M Bak_M r16 r6 Bcl2 r18 r7 r3 Bax_A:Bcl2 Bcl2:Bid_M Bad_M:Bcl2 BclxL_C r2 BclxL_M Mcl1 r21 r9 r4 Bak_A:Mcl1 Bid_M:Mcl1 Mcl1:Noxa Bid_T r0 Bid_M Noxa Bad_M r11 Bad_M:BclxL_M r5 r8 BclxL_M:Bid_M r20 r19 Bak_A:BclxL_M Bax_A:BclxL_M Bax_A Bak_A r27 r29 r22 Bax*3 Bax*4 Bax*2 r28 r30 r23 Bak*3 Bak*4 Bak*2

equilibrate(Bid(bf=None, state='T'), Bid(bf=None, state='M'), [1e-1, 1e-3]) equilibrate(free_Bax(state='C'), free_Bax(state='M'), transloc_rates)

equilibrate(BclxL(bf=None, state='C'), BclxL(bf=None, state='M'), transloc_rates) catalyze(Bid(state='M'), Bax(state='M'), Bax(state='A'), activation_rates) catalyze(Bid(state='M'), Bak(state='M'), Bak(state='A'), activation_rates) bind_table([[ Bcl2, BclxL(state='M'), Mcl1(state='M')], [Bid(state='M'), 66, 12, 10], [Bad(state='M'), 11, 10, None], [Noxa(state='M'), None, None, 19], [Bax(active_monomer), 10, 10, None], [Bak(active_monomer) None, 50, 10]],

kf=1e-3)

catalyze(Bax(active_monomer), Bax(state='M'), Bax(state='A'), activation_rates) catalyze(Bak(active_monomer), Bak(state='M'), Bak(state='A'), activation_rates) assemble_pore_sequential(Bax(bf=None, state='A'), 4, pore_rates)

assemble_pore_sequential(Bak(bf=None, state='A'), 4, pore_rates)

Figure 5 Representations of the Bcl-2 interaction topology in the EARM 2.0-M1b (‘Lopez Embedded’) model. The Bcl-2 interaction model consists of five basic mechanistic elements or ‘motifs’—tBid/Bax/Bcl-xL translocation, activation of Bax and Bak by tBid, Bcl-2 family binding, Bax/Bak auto-activation, and pore assembly— that are highlighted in A, B, and C according to the colors in the legend. (A) PySB source code for the model, edited for brevity. (B) Simplified, manually drawn representation. (C) The full reaction network, generated from the PySB model using the PySBrender_reactionstool. Rectangles represent species, circles represent reactions, lines represent reactions with the solid arrowhead representing the nominal forward direction, and the empty arrowhead (for reversible reactions only) representing the reverse direction. Catalytic reactions are depicted with a boxed arrow pointing from the catalyst to the reaction circle (species for enzyme– substrate complexes are omitted for clarity). (D) Kappa contact map, which shows the superset of all possible bonds between monomers calculated by static analysis (Danoset al, 2008). The contact map was computed using Kappa’scomplxtool accessed through the PySB Kappa wrapper library. Rectangles represent monomers, circles represent sites, and lines represent bonds.

(14)

eliminating re-implementation, macros and modules separate fundamental mechanistic concepts from implementation details, and thereby make clear the purpose and origins of specific model features (Mallavarapuet al, 2008; Pedersen and Plotkin, 2008; Mirschelet al, 2009; Gnad et al, 2012). The ability of real biological networks to be meaningfully decom-posed into functional modules is highly context dependent and a matter of controversy (Del Vecchioet al, 2008), but there is no requirement that modules in PySB correspond to modules in a biological or ‘black-box’ engineering sense: the full reaction network is always accessible without simplification. Instead, PySB modules are defined according to flexible and convenient organizational boundaries, keeping open the possibility for cross-talk and emergent interactions with other modules. This style of modularity follows the open-ended approach oflittle b(Mallavarapuet al, 2008) and differs from

ProMoT, in which modules interact only through previously designated molecular species (Mirschelet al, 2009). In general, choosing the right boundaries for a module, whether a software program or a biological model, is a matter of art and practical experience. In the models of extrinsic apoptosis analyzed in this paper, reactions governing MOMP are a good candidate for modularization because they largely take place in a discrete compartment (the mitochondrial membrane) and focus on reactions among Bcl-2 proteins.

We have found that PySB naturally supports a hierarchy of modeling concepts (Figure 7A). At the top of this hierarchy are the models themselves, which represent a specific hypothesis about the topology and activity of a biological system or network; at the bottom of the hierarchy are specific mathema-tical equations (e.g., ODEs). Typical approaches to modeling proceed by directly rendering the hypothesis in equations, making it difficult to discern the assumptions implicit in the process of mathematical translation (Figure 7B). Rule-based approaches represent an intermediate level of abstraction in that they enumerate local interactions between proteins in a way that is less explicit than equations (Figure 7C). PySB adds an additional layer of abstraction in that the user works with macros and functions (Figure 7A; see also Figure 2). Sets of macros are then grouped into reusable subroutines that implement small mechanistic ‘motifs’ corresponding roughly to a sentence in a word model, such as ‘tBid activates Bax and Bak.’ Such ‘motifs’ are then composed into modules, and modules into models. Constructed in this fashion, a set of variant models forms a ‘web’ of intertwined elements that is largely self-documenting.

PySB as a second-generation approach

PySB is not the first attempt to create a high-level language for modeling biochemistry and was inspired by ProMoT andlittle

b, both of which represent models as LISP programs

(Mallavarapuet al, 2008; Mirschelet al, 2009). However, as described above, these tools had limited or no support for rule-based modeling. PySB is rule-based on the much more familiar Python language and is interoperable with BNGL and Kappa. The rule-based modeling community is also developing tools for managing complex models. For example, MetaKappa targets redundancy in models having related molecular species and partially overlapping functional characteristics (e.g., a set of mutants or isoforms of a single protein) (Danos

et al, 2009). The Language for Biological Systems (LBS) is

another approach in which rules are combined with methods for constructing parameterized modules (Pedersen and Plotkin, 2008). MetaKappa and LBS are examples of domain-specific languages (DSL) for high-level biological modeling, whereas PySB supports high-level modeling through the structured programming features of Python (Box 2). Through Python, PySB provides substantial flex-ibility in organizing models. A potential drawback of this flexibility is that static analysis of models may be more challenging to implement than for DSLs such as MetaKappa or LBS. For the time being, available static analysis algorithms can be applied to the rules generated from PySB models (Danoset al, 2008; Feretet al, 2009). We expect the strengths and weaknesses of these different approaches to become

experimental data

simulation with nominal parameter values simulation with fitted parameter values

Time (s)

Fraction of

cleaved IC-RP / Bid

0 5000 10 000 15 000 20 000 Time (s) TD / Fraction of released Smac 0 5000 10 000 15 000 20 000 Time (s) Fraction of

cleaved EC-RP / PARP

0 5000 10 000 15 000 20 000 0.0 1.0 0.8 0.6 0.4 0.2 0.0 1.0 0.8 0.6 0.4 0.2 0.0 1.0 0.8 0.6 0.4 0.2

Figure 6 Simulated annealing fits for EARM 2.0-M1a for three experimentally measured protein signals IC-RP/tBid (A), IMS-RP/Smac release (B), and EC-RP/ PARP cleavage (C). Gray lines indicate the experimental data with error bars indicating the s.d. In the case of B, the gray line denotes the mean time of death TDExp, used to align the trajectories (Materials and methods). The purple curves

show the simulated trajectories using nominal parameter values; the orange curves show the simulated trajectories after model fitting. The objective function for fitting is described in the Materials and methods.