Metabolic Network Analysis
Algorithms in Pathway Tools
Peter D. Karp, Ph.D.
Bioinformatics Research Group SRI International
[email protected] BioCyc.org
Systems Biology
z Def 1: System-scale descriptions and analyses of
biological systems
Overview
z Pathway/Genome Databases
z BioCyc collection
z EcoCyc, MetaCyc
z Pathway Tools software
z Visualization, Editing, Analysis
z Inference tools
z Analyzing biological networks to identify gaps and
inconsistencies
z Prediction of growth media from metabolic
What to do When Theories Become
Larger than Minds can Grasp?
z Example: E. coli metabolic network
z 244 pathways involving 1,029 reactions and 895 substrates
z Example: E. coli genetic network
z Control by 97 transcription factors of 1174 genes in 630 transcription units
z Past solutions:
z Experts specialize
z Publish theories in textual form
z We cannot compute with theories in those forms
z Evaluate theories for consistency with new data: microarrays
z Refine theories with respect to new data
Databases of Metabolic Pathway Data
z Organize growing corpus of data on metabolic pathways
z Experimentally elucidated pathways in the biomedical literature
z Computationally predicted pathways derived from genome data
z Provide software tools for querying and comprehending this
complex information space
z Multiorganism view: MetaCyc
z Unique, experimentally elucidated pathways across all organisms
z Reference database for computational pathway prediction
z Organism-specific view:
z Organism-specific Pathway/Genome Databases
z Detailed qualitative models of metabolic networks
z Combine computational predictions with experimentally determined pathways
Pathway Tools Capabilities
z Create and maintain an organism database
integrating genome, pathway, regulatory information
z Computational inference tools
z Interactive editing tools
z Query and visualize that database
z Use the database to interpret omics data
z Metabolic network analysis tools
BioCyc Collection of 507
Pathway/Genome Databases
z Pathway/Genome Database (PGDB) –
combines information about
z Pathways, reactions, substrates
z Enzymes, transporters
z Genes, replicons
z Transcription factors/sites, promoters,
operons
z Tier 1: Literature-Derived PGDBs
z MetaCyc
z EcoCyc -- Escherichia coli K-12
z Tier 2: Computationally-derived DBs, Some Curation -- 24 PGDBs
z HumanCyc
z Mycobacterium tuberculosis
z Tier 3: Computationally-derived DBs, No Curation -- 481 DBs
Pathway Tools Software
z PathoLogic
z Predicts operons, metabolic network, pathway hole fillers, from genome
z Computational creation of new Pathway/Genome Databases
z Pathway/Genome Editors
z Distributed curation of PGDBs
z Distributed object database system, interactive editing tools
z Pathway/Genome Navigator
z WWW publishing of PGDBs
z Querying, visualization of pathways, chromosomes, operons
z Analysis operations
z Pathway visualization of gene-expression data
z Global comparisons of metabolic networks
EcoCyc Project – EcoCyc.org
z E. coli Encyclopedia
z Review-level Model-Organism Database for E. coli
z Tracks evolving annotation of the E. coli genome and cellular networks
z The two paradigms of EcoCyc
z “Multi-dimensional annotation of the E. coli K-12 genome”
z Positions of genes; functions of gene products – 76% / 66% exp
z Gene Ontology terms; MultiFun terms
z Gene product summaries and literature citations
z Evidence codes
z Multimeric complexes
z Metabolic pathways
z Cellular regulation
Nuc. Acids Res. 35:7577 2007; ASM News 70:25 2004; Science 293:2040
EcoCyc = E.coli Dataset +
Pathway/Genome Navigator
Genes: 4,478
Proteins: 4,479
Complexes: 880
RNAs: 285
Reactions:
  Metabolic: 975
  Transport: 272
Pathways: 237
Compounds: 1,373
URL: EcoCyc.org
Gene Regulation:
  Operons: 3,359
  Trans Factors: 196
  Promoters: 1,766
  TF Binding Sites: 2,105
EcoCyc v13.5
Citations: 19,000
Paradigm 1: EcoCyc as Textual Review Article
z All gene products for which experimental literature
exists are curated with a minireview summary
z Found on protein and RNA pages, not gene pages!
z 3257 gene products contain summaries
z Summaries cover function, interactions, mutant
phenotypes, crystal structures, regulation, and more
z Additional summaries found in pages for operons,
pathways
Paradigm 2: EcoCyc as
Computational Symbolic Theory
z Highly structured, high-fidelity knowledge
representation provides computable information
z Each molecular species defined as a DB object
z Genes, proteins, small molecules
z Each molecular interaction defined as a DB object
z Metabolic reactions
z Transport reactions
z Transcriptional regulation of gene expression
z 220 database fields capture extensive properties
EcoCyc Accelerates Science
z Experimentalists
z E. coli experimentalists
z Experimentalists working with other microbes
z Analysis of expression data
z Computational biologists
z Biological research using computational methods
z Genome annotation
z Study connectivity of E. coli metabolic network
z Study phylogenetic extent of metabolic pathways and enzymes in all domains of life
z Bioinformaticists
z Training and validation of new bioinformatics algorithms – predict operons, promoters, protein functional linkages, protein-protein interactions
z Metabolic engineers
z “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”
MetaCyc: Metabolic Encyclopedia
z Describe a representative sample of every experimentally
determined metabolic pathway
z Describe properties of metabolic enzymes
z Literature-based DB with extensive references and
commentary
z Pathways, reactions, enzymes, substrates
z Jointly developed by
z P. Karp, R. Caspi, C. Fulcher, SRI International
z L. Mueller, A. Pujar, Cornell Univ
z S. Rhee, P. Zhang, Carnegie Institution
Applications of MetaCyc
z Reference source on metabolic pathways
z Metabolic engineering
z Find enzymes with desired activities, regulatory properties
z Determine cofactor requirements
z Predict pathways from genomes
z Systematic studies of metabolism
z Computer-aided education
MetaCyc Data -- Version 13.5
Pathways 1,400
Reactions 8,100
Enzymes 5,900
Small Molecules 8,200
Organisms 1,800
Citations 20,800
Taxonomic Distribution of
MetaCyc Pathways – version 13.1
Bacteria 883
Green Plants 607
Fungi 199
Mammals 159
Pathway Tools Overviews and Omics Viewers
z Designed to avoid the hairball effect
z Generated automatically from PGDB
z Magnify, interrogate
z Omics viewers paint omics data onto
overview diagrams
z Different perspectives on same dataset
z Use animation for multiple time points or
conditions
z Paint any data that associates numbers with genes, proteins, reactions, or
metabolites
z Genome-scale visualizations of cellular networks
z Harness human visual system to interpret patterns in biological
Regulatory Overview and Omics Viewer
Dead End Metabolites
z Clues to extra/missing reactions
z A small molecule C is a dead-end if:
z (Def 1 easier to compute; Def 2 more accurate)
z Definition 1:
z C is a substrate in only one reaction of the set of SMM
reactions occurring in Compartment AND
z No reactions exist containing parent classes of C AND
z No transporter acts on C in Compartment, nor on parent
classes of C
z Definition 2:
z C is produced only by SMM reactions in Compartment, and
no transporter acts on C in Compartment OR
z C is consumed only by SMM reactions in Compartment, and
no transporter acts on C in Compartment
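Definition 2 can be sketched in a few lines of Python. This is a minimal illustration, not the Pathway Tools implementation: it assumes a single compartment, models each reaction as a hypothetical (reactants, products) pair that runs in one direction only, ignores parent classes, and uses placeholder compounds X and Y in the toy network.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Assumption of this sketch: a reaction is a one-directional
# (reactants, products) pair; a reversible reaction would be listed twice.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def dead_end_metabolites(reactions: List[Reaction],
                         transported: Iterable[str]) -> Dict[str, str]:
    """Return compounds that are only produced or only consumed
    and on which no transporter acts (Definition 2, one compartment)."""
    transported = set(transported)
    produced, consumed = set(), set()
    for reactants, products in reactions:
        consumed |= reactants
        produced |= products

    dead = {}
    for compound in sorted(produced | consumed):
        if compound in transported:
            continue  # a transporter can supply or remove the compound
        if compound not in consumed:
            dead[compound] = "produced only"
        elif compound not in produced:
            dead[compound] = "consumed only"
    return dead


# Toy network; X and Y are hypothetical placeholder intermediates.
toy = [
    (frozenset({"X"}), frozenset({"GDP-L-fucose"})),
    (frozenset({"D-glucarate"}), frozenset({"Y"})),
    (frozenset({"Y"}), frozenset({"X"})),
]
print(dead_end_metabolites(toy, transported=[]))
print(dead_end_metabolites(toy, transported=["D-glucarate"]))
```

In the toy network, adding a transporter for D-glucarate removes it from the dead-end list, which mirrors the kind of fix described for the E. coli analysis.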
Dead-End Metabolite Analysis of
E. coli
z 36 → 22 dead ends in metabolic pathways
z 174 dead ends in full metabolic network
z GDP-L-fucose
z Produced only
z Literature research supported addition of a reaction producing
colanic acid from GDP-L-fucose
z D-galactarate and D-glucarate
z Degraded only
z Literature indicates both can be used as C sources
z Hypothetical transport reactions added
Reachability Analysis of Metabolic
Networks
z Given:
z A PGDB for an organism
z A set of initial metabolites
z Infer:
z What set of products can be synthesized by the
small-molecule metabolism of the organism
z Motivations:
z Quality control for PGDBs
z Verify that a known E. coli growth medium yields known
essential compounds of E. coli
Algorithm: Forward Propagation
Through Production System
z Each reaction becomes a production rule
z Each of the 21 metabolites in the nutrient set becomes an axiom
[Figure: Nutrient set → Metabolite pool; “Fire” reactions from the PGDB reaction set (Reactants → Products); example reaction A + B → C]
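Forward propagation can be sketched as a fixed-point loop over the reaction set. This is a minimal illustration under stated assumptions, not the Pathway Tools code: reactions are hypothetical (reactants, products) pairs of compound names that fire in one direction, and stoichiometry is ignored.

```python
from typing import Iterable, List, Set, Tuple

Reaction = Tuple[Set[str], Set[str]]  # (reactants, products), an assumed encoding


def reachable(reactions: List[Reaction], nutrients: Iterable[str]) -> Set[str]:
    """Fire every reaction whose reactants are all in the pool,
    adding its products, until the pool stops growing."""
    pool = set(nutrients)  # the nutrient set supplies the axioms
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= pool and not products <= pool:
                pool |= products
                changed = True
    return pool


# Toy reaction set with placeholder compound names.
toy = [
    ({"A", "B"}, {"C"}),
    ({"C"}, {"D"}),
    ({"E"}, {"F"}),  # never fires: E is not reachable
]
pool = reachable(toy, nutrients={"A", "B"})
essentials = {"D", "F"}
print(sorted(pool))               # compounds that can be synthesized
print(sorted(essentials - pool))  # essential compounds that are not reached
```

Comparing the final pool against a list of essential compounds gives the quality-control check described above: anything essential but unreached points to a gap or bug in the database.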
Results from EcoCyc Reachability
Analysis in 2001
z Phase I: Forward propagation
z 21 initial compounds yielded only half of the 41 essential compounds for E.
coli
z Phase II: Manually identify
z Bugs in EcoCyc (e.g., two objects for tryptophan)
A → B    B’ → C
z Incomplete knowledge of E. coli metabolic network
A + B → C + D
z “Bootstrap compounds”
z Missing initial protein substrates (e.g., ACP)
Protein synthesis not represented
z Phase III: Forward propagation with 11 more initial
metabolites
Minimal Nutrient Sets
Carolyn Talcott, Markus Krummenacker, Steven Eker, and Peter Karp
Computer Science Laboratory and
Bioinformatics Research Group, SRI International
The Problem
z Given a model of metabolism for an organism,
determine minimal sets of nutrients that will support growth.
z Model -- network of metabolic reactions (R)
z Nutrients -- transportables (T), compounds that have transport
reactions
z Growth -- production of essential compounds (E)
z A subset N of T is a nutrient set if E is R-producible
from N
Mathematical Approach
z S = stoichiometric matrix for R; S_ij is the coefficient of C_i in R_j
z r = vector of reaction fluxes
z p = S x r -- p_i is the production rate of C_i
z p_i = S_i1 r_1 + ... + S_ik r_k
z Basic constraints
z r_i >= 0 -- reactions run forward
z p_i > 0 if C_i in E
z p_i >= 0 if C_i in neither E nor N
z If a compound C_j not in E or T is used, it must be produced
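The basic constraints can be checked for a given flux vector with a few lines of Python. This is a sketch only: the matrix layout (rows are compounds, columns are reactions), the index sets, and the three-compound toy network are assumptions made for illustration, and the code tests one candidate r rather than searching for one.

```python
from typing import List, Set


def production(S: List[List[float]], r: List[float]) -> List[float]:
    """p = S x r: p[i] = S[i][0]*r[0] + ... + S[i][k-1]*r[k-1]."""
    return [sum(s_ij * r_j for s_ij, r_j in zip(row, r)) for row in S]


def satisfies_constraints(S: List[List[float]], r: List[float],
                          essential: Set[int], nutrients: Set[int]) -> bool:
    """Check r_j >= 0, p_i > 0 for essentials, and p_i >= 0 for
    compounds that are neither essential nor supplied as nutrients."""
    if any(r_j < 0 for r_j in r):
        return False  # reactions must run forward
    for i, p_i in enumerate(production(S, r)):
        if i in essential:
            if p_i <= 0:
                return False  # essentials need net production
        elif i not in nutrients and p_i < 0:
            return False  # intermediates cannot be net consumed
    return True


# Toy network: R0 is C0 -> C1, R1 is C1 -> C2.
# C0 is the nutrient, C1 an intermediate, C2 the essential compound.
S = [
    [-1, 0],   # C0
    [1, -1],   # C1
    [0, 1],    # C2
]
print(production(S, [1, 1]))
print(satisfies_constraints(S, [1, 1], essential={2}, nutrients={0}))
print(satisfies_constraints(S, [0, 1], essential={2}, nutrients={0}))
```

The second flux vector fails because it consumes the intermediate C1 without producing it.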
Problem Simplification
z Impossibility elimination
z Drop reactions that have reactants that cannot be produced (or
transported)
z (Uses forward collection)
z Uselessness elimination
z Drop useless compounds and reactions whose products are all
useless
z The useful compounds are found by backwards propagation
from E
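Both elimination steps can be sketched together. The following is an illustrative Python version under assumptions not stated on the slide: reactions are hypothetical (reactants, products) pairs of compound names, forward collection starts from all transportables, and a reaction is kept only if at least one of its products is useful.

```python
from typing import List, Set, Tuple

Reaction = Tuple[Set[str], Set[str]]  # (reactants, products), an assumed encoding


def simplify(reactions: List[Reaction], transportables: Set[str],
             essentials: Set[str]) -> List[Reaction]:
    # Impossibility elimination: forward collection from all transportables.
    pool = set(transportables)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= pool and not products <= pool:
                pool |= products
                changed = True
    feasible = [(re, pr) for re, pr in reactions if re <= pool]

    # Uselessness elimination: backward propagation from the essentials.
    useful = set(essentials)
    changed = True
    while changed:
        changed = False
        for reactants, products in feasible:
            if products & useful and not reactants <= useful:
                useful |= reactants
                changed = True
    return [(re, pr) for re, pr in feasible if pr & useful]


# Toy network with placeholder names.
r1 = ({"T1"}, {"A"})
r2 = ({"A"}, {"E1"})
r3 = ({"Z"}, {"E1"})   # impossible: Z can never be produced or transported
r4 = ({"A"}, {"W"})    # useless: W does not lead to any essential compound
kept = simplify([r1, r2, r3, r4], transportables={"T1"}, essentials={"E1"})
print(kept)
```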
Searching for Minimal Nutrient Sets
z Define nutset(N) for N a subset of T by
z nutset(N) = true if the constraints for N are satisfiable, false otherwise
z Use a constraint solver (Yices) to determine if there is
a solution
z Find one minimal N: Start with N = T and eliminate
elements until no more can be eliminated.
z Finding all requires some cleverness to do it feasibly.
Our approach uses a representation of Boolean
functions called BDDs (binary decision diagrams) to search for extensions of a set of minimal solutions.
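The "find one minimal N" step can be sketched as a greedy elimination loop. The sketch below is an assumption-laden stand-in: instead of calling Yices on the flux constraints, it uses a simple qualitative reachability test as nutset(N), and the toy reactions and the intermediate names S_pool and E are hypothetical. Because adding nutrients can never break this test, one pass over T is enough to reach a minimal set.

```python
from typing import List, Optional, Set, Tuple

Reaction = Tuple[Set[str], Set[str]]  # (reactants, products), an assumed encoding


def nutset(reactions: List[Reaction], nutrients: Set[str],
           essentials: Set[str]) -> bool:
    """Stand-in for the constraint solver: True if every essential
    compound is reachable from the nutrients by forward propagation."""
    pool = set(nutrients)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= pool and not products <= pool:
                pool |= products
                changed = True
    return essentials <= pool


def one_minimal_nutrient_set(reactions: List[Reaction], transportables: Set[str],
                             essentials: Set[str]) -> Optional[Set[str]]:
    """Start with N = T and drop elements while nutset(N) stays true."""
    current = set(transportables)
    if not nutset(reactions, current, essentials):
        return None  # even the full set of transportables is not enough
    for t in sorted(transportables):
        trial = current - {t}
        if nutset(reactions, trial, essentials):
            current = trial
    return current


# Toy network: either sulfur source works, phosphate is always required.
toy = [
    ({"sulfate"}, {"S_pool"}),
    ({"taurine"}, {"S_pool"}),
    ({"phosphate", "S_pool"}, {"E"}),
]
T = {"sulfate", "taurine", "phosphate"}
print(one_minimal_nutrient_set(toy, T, essentials={"E"}))
```

Which minimal set is returned depends on the elimination order; enumerating all of them is the harder problem that the BDD-based search addresses.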
E. coli Case Study
z 160 Transportables
z 1378 Compounds
z 2251 Reactions
z 36 Essentials
z 1156 Solutions
z 9 Reduced solutions
Some Minimal Nutrient Sets
z Solution 5
z Taurine
z Phosphate
z L-alanine
z Solution 6
z Taurine
z Phosphate
z L-aspartate
Equivalence and Reduced Solutions
z Problem: Large number of minimal nutrient sets (1156)
is hard to understand and evaluate
z Solution: Nutrient equivalence classes
z Define two nutrients A,B to be equivalent if whenever A appears in
a minimal nutrient set, then replacing A by B yields another minimal nutrient set, and conversely
z Benefits:
z Small number of solutions
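The equivalence definition translates directly into a check over the list of minimal nutrient sets. The following Python sketch is illustrative only: the function names are hypothetical, the grouping compares each nutrient with one representative per class, and the four-set toy input is made up from compound names that appear in this talk rather than taken from the actual E. coli results.

```python
from typing import FrozenSet, List, Set


def equivalent(a: str, b: str, solutions: Set[FrozenSet[str]]) -> bool:
    """A and B are equivalent if replacing A by B in every minimal set
    containing A yields another minimal set, and conversely."""
    def swaps_ok(x: str, y: str) -> bool:
        return all((s - {x}) | {y} in solutions for s in solutions if x in s)
    return a == b or (swaps_ok(a, b) and swaps_ok(b, a))


def equivalence_classes(solutions: Set[FrozenSet[str]]) -> List[List[str]]:
    """Group nutrients by comparing each one with a class representative."""
    classes: List[List[str]] = []
    for nutrient in sorted(set().union(*solutions)):
        for cls in classes:
            if equivalent(nutrient, cls[0], solutions):
                cls.append(nutrient)
                break
        else:
            classes.append([nutrient])
    return classes


def reduced_solutions(solutions: Set[FrozenSet[str]],
                      classes: List[List[str]]) -> Set[FrozenSet[str]]:
    """Rewrite each solution in terms of class representatives."""
    rep = {n: cls[0] for cls in classes for n in cls}
    return {frozenset(rep[n] for n in s) for s in solutions}


# Hypothetical toy input: four minimal nutrient sets.
sols = {
    frozenset({"taurine", "phosphate", "L-alanine"}),
    frozenset({"taurine", "phosphate", "L-aspartate"}),
    frozenset({"sulfate", "phosphate", "L-alanine"}),
    frozenset({"sulfate", "phosphate", "L-aspartate"}),
}
classes = equivalence_classes(sols)
print(classes)
print(reduced_solutions(sols, classes))
```

In the toy input, four minimal sets collapse to a single reduced solution with one representative per class, which is the benefit listed above.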
One Reduced Solution and its
Equivalence Classes
z Reduced solution 5
z Cytidine
z Sulfate
z Phosphate
z Equivalence Classes:
z (CN): cytidine, 32 other compounds, L-alanine, L-aspartate
z (S): taurine, sulfate
Lessons Learned
z Analysis is a great way to debug a knowledge base
z Gaps in network
z Missing participants
Ten Equivalence Classes
z 2 Unitary:
z HPO4 (P)
z nicotinamide mononucleotide (CNP)
z 3 with two compounds:
z Sulfate / taurine (S)
z L-methionine / glutathione (CNS)
z beta-D-glucose-6-phosphate / sn-glycerol-3-phosphate (CP)
z 1 Medium (9 cpds)
z L-valine/NH4+/ … (N)
z 2 Very large
z fumarate/malate/ ... (C) -- 50 cpds
z cytidine/L-aspartate/ ... (CN) – 35 cpds
C Sources Equivalence Class
z fumarate
z malate
z deoxyuridine
z 3-(3-hydroxyphenyl)propionate
z D-fructuronate
z succinate
z lactose
z L-fucose
z 2-oxoglutarate
z 2-dehydro-3-deoxy-D-gluconate
z L-tartrate
z D-fructose
z trehalose
z D-mannose
z D-galactitol
z arbutin
z 3-phenylpropionate
z D-glucarate
z D-gluconate
z L-galactonate
z glyoxylate
z citrate
z mannosylglycerate
z L-idonate
z acetate
z L-ascorbate
z 2,3-diketo-L-gulonate (C)
z L-lyxose
z 5-ketogluconate
z D-galactarate
z beta-D-glucose
z acetoacetate
z psicoselysine
z glycerol
z beta-D-ribopyranose
z D-allose
z D-sorbitol
z salicin
z D-mannitol
z uridine
z D-galacturonate
z beta-D-galactose
z glycolate
z D-xylose
z L-rhamnose
z D-glucuronate
z thymidine
z D-galactonate
z melibiose
z L-lysine
N Sources Equivalence Class
z L-valine
z nitrite
z NH4+
z pyridoxamine
z L-phenylalanine
z L-tyrosine
z L-leucine
z L-isoleucine
z cytosine
CN Sources Equivalence Class
z cytidine
z deoxycytidine
z L-proline
z putrescine
z L-serine
z glycine
z 4-aminobutyrate
z cyanate
z xanthosine
z N-acetylmuramate
z glucosamine
z L-arginine
z phenylethylamine
z GlcNAc-1,6-anhMurNAc-L-Ala-gamma-D-Glu-DAP-D-Ala
z GlcNAc-1,6-anhMurNAc
z xanthine
z D-serine
z 1,6-anhydro-N-acetylmuramate
z L-ornithine
z L-glutamine
z N-acetyl-D-glucosamine
z chitobiose
z inosine
z D-alanine
z N-acetylneuraminate
z L-glutamate
z orotate
z L-asparagine
z L-threonine
z L-tryptophan
z deoxyinosine
z deoxyadenosine
z adenosine
z L-aspartate
z L-alanine
Summary
z Pathway/Genome Databases
z MetaCyc non-redundant DB of literature-derived pathways
z 400 organism-specific PGDBs available through SRI at
BioCyc.org
z Computational theories of biochemical machinery
z Pathway Tools software
z Extract pathways from genomes
z Morph annotated genome into structured ontology
z Distributed curation tools for MODs
Acknowledgements
z SRI
z Suzanne Paley, Ron Caspi, Ingrid Keseler, Carol Fulcher, Markus Krummenacker, Alex Shearer, Tomer Altman, Joe Dale, Fred Gilham, Pallavi Kaipa
z EcoCyc Collaborators
z Julio Collado-Vides, Robert Gunsalus, Ian Paulsen
z MetaCyc Collaborators
z Sue Rhee, Peifen Zhang, Kate Dreher
z Lukas Mueller, Anuradha Pujar
z Funding sources:
z NIH National Institute of General Medical Sciences
z NIH National Center for Research Resources