(1)

Metabolic Network Analysis Algorithms in Pathway Tools

Peter D. Karp, Ph.D.
Bioinformatics Research Group
SRI International
[email protected]
BioCyc.org

(2)

Systems Biology

• Def 1: System-scale descriptions and analyses of biological systems

(3)

Overview

• Pathway/Genome Databases
  • BioCyc collection
  • EcoCyc, MetaCyc
• Pathway Tools software
  • Visualization, Editing, Analysis
  • Inference tools
• Analyzing biological networks to identify gaps and inconsistencies
• Prediction of growth media from metabolic

(4)

What to do When Theories Become Larger than Minds can Grasp?

• Example: E. coli metabolic network
  • 244 pathways involving 1,029 reactions and 895 substrates
• Example: E. coli genetic network
  • Control by 97 transcription factors of 1,174 genes in 630 transcription units
• Past solutions:
  • Experts specialize
  • Publish theories in textual form
• We cannot compute with theories in those forms
  • Evaluate theories for consistency with new data: microarrays
  • Refine theories with respect to new data

(5)

Databases of Metabolic Pathway Data

• Organize growing corpus of data on metabolic pathways
  • Experimentally elucidated pathways in the biomedical literature
  • Computationally predicted pathways derived from genome data
• Provide software tools for querying and comprehending this complex information space
• Multiorganism view: MetaCyc
  • Unique, experimentally elucidated pathways across all organisms
  • Reference database for computational pathway prediction
• Organism-specific view:
  • Organism-specific Pathway/Genome Databases
  • Detailed qualitative models of metabolic networks
  • Combine computational predictions with experimentally determined pathways

(6)

Pathway Tools Capabilities

• Create and maintain an organism database integrating genome, pathway, regulatory information
  • Computational inference tools
  • Interactive editing tools
• Query and visualize that database
• Use the database to interpret omics data
• Metabolic network analysis tools

(7)

BioCyc Collection of 507 Pathway/Genome Databases

• Pathway/Genome Database (PGDB) – combines information about
  • Pathways, reactions, substrates
  • Enzymes, transporters
  • Genes, replicons
  • Transcription factors/sites, promoters, operons
• Tier 1: Literature-Derived PGDBs
  • MetaCyc
  • EcoCyc -- Escherichia coli K-12
• Tier 2: Computationally-derived DBs, Some Curation -- 24 PGDBs
  • HumanCyc
  • Mycobacterium tuberculosis
• Tier 3: Computationally-derived DBs, No Curation -- 481 DBs

(8)

Pathway Tools Software

• PathoLogic
  • Predicts operons, metabolic network, pathway hole fillers, from genome
  • Computational creation of new Pathway/Genome Databases
• Pathway/Genome Editors
  • Distributed curation of PGDBs
  • Distributed object database system, interactive editing tools
• Pathway/Genome Navigator
  • WWW publishing of PGDBs
  • Querying, visualization of pathways, chromosomes, operons
  • Analysis operations
    • Pathway visualization of gene-expression data
    • Global comparisons of metabolic networks

(9)

EcoCyc Project – EcoCyc.org

• E. coli Encyclopedia
  • Review-level Model-Organism Database for E. coli
  • Tracks evolving annotation of the E. coli genome and cellular networks
  • The two paradigms of EcoCyc
• “Multi-dimensional annotation of the E. coli K-12 genome”
  • Positions of genes; functions of gene products – 76% / 66% exp
  • Gene Ontology terms; MultiFun terms
  • Gene product summaries and literature citations
  • Evidence codes
  • Multimeric complexes
  • Metabolic pathways
  • Cellular regulation

Nuc. Acids Res. 35:7577 2007; ASM News 70:25 2004; Science 293:2040

(10)

EcoCyc = E. coli Dataset + Pathway/Genome Navigator

EcoCyc v13.5    URL: EcoCyc.org

Genes:        4,478
Proteins:     4,479
Complexes:      880
RNAs:           285
Reactions:
  Metabolic:    975
  Transport:    272
Pathways:       237
Compounds:    1,373

Gene Regulation:
  Operons:           3,359
  Trans Factors:       196
  Promoters:         1,766
  TF Binding Sites:  2,105

Citations:   19,000

(11)

Paradigm 1: EcoCyc as Textual Review Article

• All gene products for which experimental literature exists are curated with a minireview summary
  • Found on protein and RNA pages, not gene pages!
  • 3,257 gene products contain summaries
• Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
• Additional summaries found in pages for operons, pathways

(12)

Paradigm 2: EcoCyc as Computational Symbolic Theory

• Highly structured, high-fidelity knowledge representation provides computable information
• Each molecular species defined as a DB object
  • Genes, proteins, small molecules
• Each molecular interaction defined as a DB object
  • Metabolic reactions
  • Transport reactions
  • Transcriptional regulation of gene expression
• 220 database fields capture extensive properties

(13)

EcoCyc Accelerates Science

• Experimentalists
  • E. coli experimentalists
  • Experimentalists working with other microbes
  • Analysis of expression data
• Computational biologists
  • Biological research using computational methods
  • Genome annotation
  • Study connectivity of E. coli metabolic network
  • Study phylogenetic extent of metabolic pathways and enzymes in all domains of life
• Bioinformaticists
  • Training and validation of new bioinformatics algorithms – predict operons, promoters, protein functional linkages, protein-protein interactions,
• Metabolic engineers
  • “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents”

(14)

MetaCyc: Metabolic Encyclopedia

• Describe a representative sample of every experimentally determined metabolic pathway
• Describe properties of metabolic enzymes
• Literature-based DB with extensive references and commentary
• Pathways, reactions, enzymes, substrates
• Jointly developed by
  • P. Karp, R. Caspi, C. Fulcher, SRI International
  • L. Mueller, A. Pujar, Cornell Univ
  • S. Rhee, P. Zhang, Carnegie Institution

(15)

Applications of MetaCyc

• Reference source on metabolic pathways
• Metabolic engineering
  • Find enzymes with desired activities, regulatory properties
  • Determine cofactor requirements
• Predict pathways from genomes
• Systematic studies of metabolism
• Computer-aided education

(16)

MetaCyc Data -- Version 13.5

Pathways:          1,400
Reactions:         8,100
Enzymes:           5,900
Small Molecules:   8,200
Organisms:         1,800
Citations:        20,800

(17)

Taxonomic Distribution of MetaCyc Pathways – version 13.1

Bacteria:      883
Green Plants:  607
Fungi:         199
Mammals:       159

(18)

Pathway Tools Overviews and Omics Viewers

• Designed to avoid the hairball effect
• Generated automatically from PGDB
• Magnify, interrogate
• Omics viewers paint omics data onto overview diagrams
  • Different perspectives on same dataset
  • Use animation for multiple time points or conditions
  • Paint any data that associates numbers with genes, proteins, reactions, or metabolites
• Genome-scale visualizations of cellular networks
• Harness human visual system to interpret patterns in biological

(19)

Regulatory Overview and Omics Viewer

(20)
(21)
(22)

Dead End Metabolites

• Clues to extra/missing reactions
• A small molecule C is a dead-end if:
  • (Def 1 easier to compute; Def 2 more accurate)
• Definition 1:
  • C is a substrate in only one reaction of the set of SMM (small-molecule metabolism) reactions occurring in Compartment AND
  • No reactions exist containing parent classes of C AND
  • No transporter acts on C in Compartment, nor on parent classes of C
• Definition 2:
  • C is produced only by SMM reactions in Compartment, and no transporter acts on C in Compartment OR
  • C is consumed only by SMM reactions in Compartment, and
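As a rough illustration of Definition 2, the sketch below flags compounds that are only produced or only consumed and that no transporter acts on. It is a simplification under stated assumptions, not the Pathway Tools implementation: it ignores compartments, parent classes, and reaction reversibility, and because the second clause of Definition 2 is cut off above, it assumes that clause mirrors the first. The function name and the data layout (reactions as reactant/product set pairs) are hypothetical.

```python
def dead_end_metabolites(reactions, transported):
    """Return (produced_only, consumed_only) compound sets.

    reactions   -- iterable of (reactants, products) pairs of compound names
    transported -- set of compounds acted on by some transporter
    Assumption: one compartment, irreversible reactions, no class hierarchy.
    """
    produced, consumed = set(), set()
    for reactants, products in reactions:
        consumed.update(reactants)
        produced.update(products)
    # Produced but never consumed, and no transporter can export it.
    produced_only = produced - consumed - transported
    # Consumed but never produced, and no transporter can import it
    # (assumed to mirror the first clause).
    consumed_only = consumed - produced - transported
    return produced_only, consumed_only


# Toy network with made-up compounds A-D.
reactions = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"D"}, {"B"}),
]
transported = {"A"}
produced_only, consumed_only = dead_end_metabolites(reactions, transported)
print(produced_only, consumed_only)
```

In this toy network C is produced but never consumed, and D is consumed but has no source, so both are reported; A is spared because a transporter acts on it.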

(23)

Dead-End Metabolite Analysis of E. coli

• 36 → 22 dead ends in metabolic pathways
• 174 dead ends in full metabolic network
• GDP-L-fucose
  • Produced only
  • Literature research supported addition of a reaction producing colanic acid from GDP-L-fucose
• D-galactarate and D-glucarate
  • Degraded only
  • Literature indicates both can be used as C sources
  • Hypothetical transport reactions added

(24)

Reachability Analysis of Metabolic Networks

• Given:
  • A PGDB for an organism
  • A set of initial metabolites
• Infer:
  • What set of products can be synthesized by the small-molecule metabolism of the organism
• Motivations:
  • Quality control for PGDBs
  • Verify that a known E. coli growth medium yields known essential compounds of E. coli

(25)

Algorithm: Forward Propagation Through Production System

• Each reaction becomes a production rule
• Each of the 21 metabolites in the nutrient set becomes an axiom

[Diagram labels: Nutrient set; Metabolite pool; “Fire” reactions; Reactants; Products; PGDB reaction set; example rule A + B → C]
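The production-system idea can be sketched in a few lines: seed a metabolite pool with the nutrient set, then repeatedly "fire" any reaction whose reactants are all in the pool and add its products, until nothing new appears. This is a minimal sketch assuming irreversible reactions represented as (reactants, products) set pairs; the function name and the toy compounds are hypothetical, not taken from Pathway Tools.

```python
def forward_propagate(reactions, nutrients):
    """Return every compound reachable from `nutrients`.

    reactions -- list of (reactants, products) pairs of compound-name sets
    nutrients -- initial metabolite pool (the axioms)
    """
    pool = set(nutrients)
    fired = set()          # indices of reactions that have already fired
    changed = True
    while changed:
        changed = False
        for index, (reactants, products) in enumerate(reactions):
            # A rule fires once, when all of its reactants are in the pool.
            if index not in fired and set(reactants) <= pool:
                fired.add(index)
                pool |= set(products)
                changed = True
    return pool


# Toy rule set: A + B -> C, C -> D, E -> F.
reactions = [
    ({"A", "B"}, {"C"}),
    ({"C"}, {"D"}),
    ({"E"}, {"F"}),
]
reachable = forward_propagate(reactions, {"A", "B"})
print(sorted(reachable))
```

Comparing the returned pool against a list of essential compounds gives the kind of quality-control check described in the motivations: any essential compound missing from the pool points at a gap in the reaction set or the nutrient set.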

(26)
(27)

Results from EcoCyc Reachability Analysis in 2001

• Phase I: Forward propagation
  • 21 initial compounds yielded only half of the 41 essential compounds for E. coli
• Phase II: Manually identify
  • Bugs in EcoCyc (e.g., two objects for tryptophan)
    • A → B    B’ → C
  • Incomplete knowledge of E. coli metabolic network
    • A + B → C + D
  • “Bootstrap compounds”
  • Missing initial protein substrates (e.g., ACP)
    • Protein synthesis not represented
• Phase III: Forward propagation with 11 more initial metabolites

(28)

Minimal Nutrient Sets

Carolyn Talcott, Markus Krummenacker, Steven Eker, and Peter Karp

Computer Science Laboratory and Bioinformatics Research Group
SRI International

(29)

The Problem

• Given a model of metabolism for an organism, determine minimal sets of nutrients that will support growth.
  • Model -- network of metabolic reactions (R)
  • Nutrients -- transportables (T), compounds that have transport reactions
  • Growth -- production of essential compounds (E)
• A subset N of T is a nutrient set if E is R-producible from N

(30)

Mathematical Approach

• $S$ = stoichiometric matrix for $R$; $S_{ij}$ is the coefficient of $C_i$ in $R_j$
• $r$ = vector of reaction fluxes
• $p = S r$ -- production; $p_i$ is the production rate of $C_i$
  • $p_i = S_{i1} r_1 + \dots + S_{ik} r_k$
• Basic constraints
  • $r_i \ge 0$ -- reactions run forward
  • $p_i > 0$ if $C_i \in E$
  • $p_i \ge 0$ if $C_i \notin E \cup N$
• If a compound $C_j$ not in $E$ or $T$ is used, it must be
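To make the notation concrete, the sketch below computes the production vector p = S r for a tiny made-up network and checks the three basic constraints listed above for one given flux vector. It only verifies a candidate r; it does not search for one, which is the job of the constraint solver. The function names and the three-compound example are assumptions for illustration, and the truncated final constraint is not modelled.

```python
def production_rates(S, r):
    """p_i = S_i1 * r_1 + ... + S_ik * r_k for each compound row i."""
    return [sum(s_ij * r_j for s_ij, r_j in zip(row, r)) for row in S]


def satisfies_basic_constraints(S, r, essential, nutrients):
    """Check r_j >= 0, p_i > 0 for essentials, p_i >= 0 outside E and N.

    essential, nutrients -- sets of compound (row) indices
    """
    if any(r_j < 0 for r_j in r):          # reactions run forward
        return False
    for i, p_i in enumerate(production_rates(S, r)):
        if i in essential:
            if p_i <= 0:                   # essentials must be net produced
                return False
        elif i not in nutrients and p_i < 0:
            return False                   # intermediates may not be depleted
    return True


# Rows: 0 = nutrient, 1 = intermediate, 2 = essential.
# Columns: R1 (nutrient -> intermediate), R2 (intermediate -> essential).
S = [
    [-1, 0],
    [1, -1],
    [0, 1],
]
print(production_rates(S, [1, 1]))
print(satisfies_basic_constraints(S, [1, 1], essential={2}, nutrients={0}))
print(satisfies_basic_constraints(S, [0, 1], essential={2}, nutrients={0}))
```

With both fluxes at 1 the nutrient is consumed, the intermediate balances, and the essential compound is produced, so the constraints hold. Setting the first flux to 0 drives the intermediate negative, which the check rejects.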

(31)

Problem Simplification

• Impossibility elimination
  • Drop reactions that have reactants that cannot be produced (or transported)
  • (Uses forward collection)
• Uselessness elimination
  • Drop useless compounds and reactions whose products are all useless
  • The useful compounds are found by backwards propagation from E
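Uselessness elimination can be sketched as a backward closure: start from the essential compounds E, mark as useful the reactants of any reaction that makes something useful, and repeat until the set stops growing. The sketch below is one plausible reading of the slide, assuming reactions are (reactants, products) set pairs; the function names and toy compounds are hypothetical.

```python
def useful_compounds(reactions, essentials):
    """Backward propagation from E: compounds that can contribute to E."""
    useful = set(essentials)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            # A reaction with a useful product makes its reactants useful.
            if set(products) & useful and not set(reactants) <= useful:
                useful |= set(reactants)
                changed = True
    return useful


def drop_useless_reactions(reactions, essentials):
    """Keep only reactions with at least one useful product."""
    useful = useful_compounds(reactions, essentials)
    return [rxn for rxn in reactions if set(rxn[1]) & useful]


# Toy network: A -> B -> E is useful, A -> W leads nowhere.
reactions = [
    ({"A"}, {"B"}),
    ({"B"}, {"E"}),
    ({"A"}, {"W"}),
]
useful = useful_compounds(reactions, {"E"})
kept = drop_useless_reactions(reactions, {"E"})
print(sorted(useful), len(kept))
```

Impossibility elimination is the mirror image: run forward collection from the transportables and drop any reaction with a reactant outside the resulting pool.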

(32)

Searching for Minimal Nutrient Sets

• Define nutset(N) for N a subset of T by
  • nutset(N) = true if the constraints for N are satisfiable
  • nutset(N) = false otherwise
• Use a constraint solver (Yices) to determine if there is a solution
• Find one minimal N: Start with N = T and eliminate elements until no more can be eliminated.
• Finding all requires some cleverness to do it feasibly. Our approach uses a representation of Boolean functions called BDDs (binary decision diagrams) to search for extensions of a set of minimal solutions.
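The "find one minimal N" step can be sketched as a greedy elimination loop around a nutset oracle. In the sketch below the oracle is a simple forward-reachability test standing in for the Yices constraint check, so it is a substitute technique, not the authors' method; the function names and the toy compounds are likewise hypothetical. Because adding nutrients never hurts under this oracle, a single pass over T leaves a set from which no further element can be removed.

```python
def producible(reactions, nutrients):
    """Forward closure of `nutrients` under (reactants, products) rules."""
    pool = set(nutrients)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= pool and not set(products) <= pool:
                pool |= set(products)
                changed = True
    return pool


def nutset(reactions, essentials, candidate):
    """Stand-in oracle: true if every essential compound is reachable."""
    return set(essentials) <= producible(reactions, candidate)


def one_minimal_nutrient_set(reactions, essentials, transportables):
    """Start with N = T and drop elements while N stays a nutrient set."""
    current = set(transportables)
    if not nutset(reactions, essentials, current):
        return None                      # even all of T does not support growth
    for nutrient in sorted(transportables):   # fixed order for repeatability
        trial = current - {nutrient}
        if nutset(reactions, essentials, trial):
            current = trial
    return current


# Toy model: two alternative carbon sources, one nitrogen source.
reactions = [
    ({"glucose"}, {"pyruvate"}),
    ({"glycerol"}, {"pyruvate"}),
    ({"pyruvate", "ammonia"}, {"alanine"}),
]
minimal = one_minimal_nutrient_set(
    reactions, {"alanine"}, {"glucose", "glycerol", "ammonia"}
)
print(sorted(minimal))
```

Which minimal set comes out depends on the elimination order: visiting glycerol before glucose would have kept glucose instead. That order dependence is why enumerating all minimal sets needs a separate search strategy.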

(33)

E. coli Case Study

• 160 Transportables
• 1378 Compounds
• 2251 Reactions
• 36 Essentials
• 1156 Solutions
• 9 Reduced solutions

(34)

Some Minimal Nutrient Sets

• Solution 5
  • Taurine
  • Phosphate
  • L-alanine
• Solution 6
  • Taurine
  • Phosphate
  • L-aspartate

(35)

Equivalence and Reduced Solutions

• Problem: Large number of minimal nutrient sets (1156) is hard to understand and evaluate
• Solution: Nutrient equivalence classes
  • Define two nutrients A, B to be equivalent if whenever A appears in a minimal nutrient set, then replacing A by B yields another minimal nutrient set, and conversely
• Benefits:
  • Small number of solutions
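The equivalence definition translates directly into a check over the collection of minimal nutrient sets. The sketch below is a literal reading of that definition, assuming the full list of minimal sets is available as Python sets; the function name is hypothetical. For illustration it is run on only the two solutions shown on the "Some Minimal Nutrient Sets" slide, so its verdicts apply to that two-set sample, not to the full 1156-solution result.

```python
def equivalent(a, b, minimal_sets):
    """True if swapping a for b (and b for a) always maps a minimal
    nutrient set to another minimal nutrient set."""
    sets = {frozenset(s) for s in minimal_sets}

    def swap_always_minimal(x, y):
        # Every minimal set containing x must stay minimal with y in its place.
        return all((s - {x}) | {y} in sets for s in sets if x in s)

    return swap_always_minimal(a, b) and swap_always_minimal(b, a)


# Solutions 5 and 6 from the slide above.
minimal_sets = [
    {"Taurine", "Phosphate", "L-alanine"},
    {"Taurine", "Phosphate", "L-aspartate"},
]
print(equivalent("L-alanine", "L-aspartate", minimal_sets))
print(equivalent("Taurine", "Phosphate", minimal_sets))
```

On this sample L-alanine and L-aspartate are interchangeable, while taurine and phosphate are not, since replacing one with the other collapses the set to two compounds that do not form a listed solution.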

(36)

One Reduced Solution and its Equivalence Classes

• Reduced solution 5
  • Cytidine
  • Sulfate
  • Phosphate
• Equivalence Classes:
  • (CN): cytidine, 32 other compounds, L-alanine, L-aspartate
  • (S): taurine, sulfate

(37)

Lessons Learned

• Analysis is a great way to debug a knowledge base
  • Gaps in network
  • Missing participants

(38)

Ten Equivalence Classes

• 2 Unitary:
  • HPO4 (P)
  • nicotinamide mononucleotide (CNP)
• 3 with two compounds:
  • Sulfate / taurine (S)
  • L-methionine / glutathione (CNS)
  • beta-D-glucose-6-phosphate / sn-glycerol-3-phosphate (CP)
• 1 Medium (9 cpds)
  • L-valine/NH4+/ … (N)
• 2 Very large
  • fumarate/malate/ ... (C) -- 50 cpds
  • cytidine/L-aspartate/ ... (CN) – 35 cpds

(39)

C Sources Equivalence Class

• fumarate
• malate
• deoxyuridine
• 3-(3-hydroxyphenyl)propionate
• D-fructuronate
• succinate
• lactose
• L-fucose
• 2-oxoglutarate
• 2-dehydro-3-deoxy-D-gluconate
• L-tartrate
• D-fructose
• trehalose
• D-mannose
• D-galactitol
• arbutin
• 3-phenylpropionate
• D-glucarate
• D-gluconate
• L-galactonate
• glyoxylate
• citrate
• mannosylglycerate
• L-idonate
• acetate
• L-ascorbate
• 2,3-diketo-L-gulonate (C)
• L-lyxose
• 5-ketogluconate
• D-galactarate
• beta-D-glucose
• acetoacetate
• psicoselysine
• glycerol
• beta-D-ribopyranose
• D-allose
• D-sorbitol
• salicin
• D-mannitol
• uridine
• D-galacturonate
• beta-D-galactose
• glycolate
• D-xylose
• L-rhamnose
• D-glucuronate
• thymidine
• D-galactonate
• melibiose
• L-lysine

(40)

N Sources Equivalence Class

• L-valine
• nitrite
• NH4+
• pyridoxamine
• L-phenylalanine
• L-tyrosine
• L-leucine
• L-isoleucine
• cytosine

(41)

CN Sources Equivalence Class

• cytidine
• deoxycytidine
• L-proline
• putrescine
• L-serine
• glycine
• 4-aminobutyrate
• cyanate
• xanthosine
• N-acetylmuramate
• glucosamine
• L-arginine
• phenylethylamine
• GlcNAc-1,6-anhMurNAc-L-Ala-gamma-D-Glu-DAP-D-Ala
• GlcNAc-1,6-anhMurNAc
• xanthine
• D-serine
• 1,6-anhydro-N-acetylmuramate
• L-ornithine
• L-glutamine
• N-acetyl-D-glucosamine
• chitobiose
• inosine
• D-alanine
• N-acetylneuraminate
• L-glutamate
• orotate
• L-asparagine
• L-threonine
• L-tryptophan
• deoxyinosine
• deoxyadenosine
• adenosine
• L-aspartate
• L-alanine

(42)

Summary

• Pathway/Genome Databases
  • MetaCyc non-redundant DB of literature-derived pathways
  • 400 organism-specific PGDBs available through SRI at BioCyc.org
  • Computational theories of biochemical machinery
• Pathway Tools software
  • Extract pathways from genomes
  • Morph annotated genome into structured ontology
  • Distributed curation tools for MODs

(43)

Acknowledgements

• SRI
  • Suzanne Paley, Ron Caspi, Ingrid Keseler, Carol Fulcher, Markus Krummenacker, Alex Shearer, Tomer Altman, Joe Dale, Fred Gilham, Pallavi Kaipa
• EcoCyc Collaborators
  • Julio Collado-Vides, Robert Gunsalus, Ian Paulsen
• MetaCyc Collaborators
  • Sue Rhee, Peifen Zhang, Kate Dreher
  • Lukas Mueller, Anuradha Pujar
• Funding sources:
  • NIH National Institute of General Medical Sciences
  • NIH National Center for Research Resources

BioCyc.org
