• No results found

Systems Biology through Data Analysis and Simulation

N/A
N/A
Protected

Academic year: 2021

Share "Systems Biology through Data Analysis and Simulation"

Copied!
24
0
0

Loading.... (view fulltext now)

Full text

(1)

U.S. Department of Energy

Pacific Northwest National Laboratory

5/30/03 1

Biomolecular Networks Initiative

William Cannon

Computational Biosciences

Pacific Northwest National Laboratory

Systems Biology through Data Analysis

and Simulation

(2)

RAF P MEK P P P NARP PKA

Gene

Expression

Change in Protein

state of cell

Post-translational modifications

Cellular

Dynamics

Data Mining

NARX

Nitrate

P P ATP Binding Domain ADP ATP PO4 Transfer Domain NARQ

Nitrate

P P ATP Binding Domain ADP ATP PO4 Transfer Domain P P NARL

Microbial Cell

Dynamics

(3)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 3

Biomolecular Networks Initiative

Deciphering Cell Dynamics

Peptide State: analyze peptide MS/MS data de novo with capability to obtain

knowledge not captured in sequence databases.

Develop a reliable, statistically-based identification method.

Account for sequence variations relative to sequence database.

Protein State and State of the cell: High-throughput proteomics must paint a picture of

the state of the cell

MS data needs to be pieced together into a picture of the state of the protein.

All expressed proteins must be analyzed.

Microarrays have so far set a bar that must be surpassed if the expense of proteomics is to be justified.

Cellular networks: Determination from high-throughput technologies:

Account for hidden variables

Make use of existing pathway information
(4)

-80 -60 -40 -20 0 20 40 60 80 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 C7

NH3-HCα-CO- NH-HCα-CO- NH-HCα-CO- NH-HCα-CO- NH-HCα-COOH Learn the fragmentation patterns and frequency of occurrence from

MS/MS spectra of known peptides

(5)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 5 (a ) C a nd id a te fin g e rp r in t for P G ID F T N D P L L Q G R 2 0 0 4 0 0 6 0 0 8 0 0 1 0 0 0 1 2 0 0 1 4 0 0 0 1 2 3 4 M o le c u la r W e ig ht R elat iv e I nte ns it y

(b ) E x tra c tio n a n d c o m p a r is o n o f c a nd id a te fin g e rp r in t p e a ks fro m a s p e c tru m for P G ID F T N D P L L Q G R L o g -L ik e lih o o d R a tio = 2 1 .3 P rob a b ility S c ore = 0 .9 9 9

F ig u r e 3 . 2 0 0 4 0 0 6 0 0 8 0 0 1 0 0 0 1 2 0 0 1 4 0 0 0 0 .2 0 .4 0 .6 0 .8 1 F re que nc y o f Ap pe ar an ce C u t- o ff fre q u e nc y q0fo r s c or in g

For peptide in question, construct a MS/MS fingerprint: • peak location, lr,i

• variance in location, ,sr,i,

• frequency of appearance, pr,i

Peptide Fingerprint

MS/MS Spectrum:

• fingerprint peaks (black) • unassigned peaks (aqua)

(6)

Statistical hypothesis test for identifying a peptide: • HA: the spectrum resulted from a known peptide.

• H0: the association between the spectrum and the peptide is no better than random.

, , , , 0 1 , , , , 1 , , , ,

{outcome under

}

{outcome under

}

(1

)

(1

)

r i r i r i r i A x x r i r i r i r i x x r i r i r i r i

P

H

L

P

H

p

p

q

q

− −

=

=

∏ ∏

∏ ∏

1

{

| }

1

1

(

-1)

A cand

P H

x

N

L

=

+

(7)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 7 -30 -20 -10 0 10 20 30 40 50 60 70 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Log-Likelihood Ratio Relat iv e F re quenc y 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Probability Score R e la ti v e F requ enc y Mismatches Matches 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0 1 2 3 4 5 x 10-3 Probability Score R e la ti v e F requ enc y 6 Figure 6. Peptide LR P(H_a) P chg mass Protein

1 VVLEISPYDTSR 15.66 1.00 0.87 1 1378.56 DR2123 initiation factor 1 2 LSLLPEHLESDK 2.25 0.01 0.13 1 1380.55 DRB0143 McrB-related protein 3 VVHSMWTPLPGR -1.07 0.00 -0.06 1 1379.57 DR1838 GTP pyrophosphokinase 4 PELPWGYNGTPF -1.44 0.00 -0.08 1 1377.53 DR1795 hypothetical protein 5 VVPDFNCQDGGTK -2.05 0.00 -0.11 1 1379.50 DRC0030 hypothetical protein 6 PQLTVTFDDEGR -2.51 0.00 -0.14 1 1377.51 DR2194 ribosomal protein S6 7 TTEQTFNVEIAK -2.57 0.00 -0.14 1 1380.53 DR2429 hypothetical protein 8 PVDLVTLSEHLR -2.77 0.00 -0.15 1 1378.60 DR0549 DNA helicase (dnaB) 9 QFDSHIEVIHR -3.12 0.00 -0.17 1 1380.55 DRB0110 thioredoxin-related protein 10 LSLDLALGVGGIPR -3.52 0.00 -0.20 1 1380.65 DR2340 recA protein

(8)

What is a “Genetic Regulatory Network” ?

G1 G2

G3

G4

•What its not:

a metabolic or synthetic pathway What it is:

• A network inferred from transcriptional data (microarrays); hence the name “genetic”

• The network connections are supposed to show which genes control which other genes; hence the name “regulatory”

(9)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 9

Transcription is regulated at multiple levels

G1

G2

G3

G4 mRNA1

Protein1 mRNA2 Protein

2

The statistical inference on microarray data will ideally account for hidden variableshidden

• Methylation of DNA • Regulation by proteins

(transcription factors) • RNA interference • RNA loop structures • etc.

(10)

Approaches for network inference

Cluster analysis: coarse level analysis. Hypothesize function,

regulation, etc.

Relevance network

:

connect all associated genes – used by some

for drug target identification.

Graphical model:

identify ‘causal pathways’ based on conditional

covariation patterns (e.g., Bayesian network)

(11)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 11

Using Graphical Models for Analyzing Biological

Data

Resolution:

Inferred networks will be low resolution maps of the cellular networks

à

Fine structure will not be resolved.

Sample size problem:

Typically trying to infer relationships between 6,000+ genes from only

10-100 microarray experiments.

Solution: Invert the object/variable space

I

nstead, infer relations between 10-100 experiments from

(12)

One approach: Invert the object/variable space

Knockout data Observational data Gene 1 Gene 2 Gene 3 Gene i Gene j Gene p Gene 1 Gene 2 Gene 3 Gene i Gene j Gene p E xp 1 E xp 2 E xp n Gen e M1 ∆ Gen e M2 ∆ Gen e Mn ∆ Gen e Mi ∆ Gen e Mj ∆

Ideal intervention and Experimental Design:

Each experiment is a gene-knock-out (knock-in, RNAi, etc)

(13)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 13

Inferring the connections

G

1

G

2

G

3

G

4

G

5 True network: ÷ ÷ ÷ ÷ ÷ ÷ ø ö ç ç ç ç ç ç è æ = 1 95 . 1 95 . 82 . 1 94 . 91 . 90 . 1 84 . 81 . 80 . 89 . 1 R

Correlation matrix Partial Correlation matrix

G1 G2

G3

G4

G5

Resulting graph

(14)

Partial Correlations

■ Partial correlations are correlations between two variables when other variables are

are controlled for.

■ Partial correlations remove the effect of confounding variables

2

13

2

13

23

13

12

3

.

12

1

1

r

r

r

r

r

r

=

(15)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 15

Partial Correlations

Ø Pairwise partial correlations when the effect of all other genes are removed:

. 1 90 . 1 90 . 1 27 . 24 . 1 53 . 1 ~ . ÷ ÷ ÷ ÷ ÷ ÷ ø ö ç ç ç ç ç ç è æ − = .81 R12

Partial Correlation matrix

G1 G2

G3

G4

G5

(16)

Partial Correlations using subsets and patterns

ØPairwise partial correlations when the effect of subsets of other genes are removed:

ØSearch for specific correlation patterns when constructing the graphs

•Directionality G1 G2 G3 G4 G5 Inferred Pattern G1 G2 G3 G4 G5 True Network

(17)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 17

Using Graphical Models

Determining Causal influence

Ø

(1) Gene C causes A and B, or (2) A causes C which causes B:

C

A

B

A

C

B

A and B conditionally independent on C

Ø Inference algorithms work from conditional dependence/independence relationships backwards to causal influence.

repressor

lactose lac operon repressor

(18)

Results for 273 yeast knock-out mutants

(19)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 19 Cellular Networks:

(20)

Comparison of inferred Network with Synthetic

Pathway

Genetic Networks are not metabolic/synthetic pathways

Genes in a pathway can be ordered by their time of

appearance.

Causal networks provide information on order

Partial order between inferred networks and metabolic

pathways may be possible

Erg10 > Erg13 > HMG1, HMG2 > … … > Erg3 > Erg5 > Erg4.

(21)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 21

Partial Order Comparison between synthetic

pathway and inferred network

Agreement:

IDI1 > ERG4

HMG2 > ERG11 > yer044c (ERG28) > ERG2, ERG3

MGG2 > ERG11 > ERG2

ERG6 > ERG2

ERG11 > ERG2

Disagreement:

Erg2 > Erg3

(22)

Regulation of Erg 11 by Swi4

There has been speculation that Swi4 regulates erg11:

Swi4 binding motif:

CACGAAA

URS1ERG11 motif:

CACGAAAAACGAGACAAACGAA

Swi4 weakly binds to the ERG11 URS as determined by chromatin

immunoprecipitation and microarray analysis (Iyer, et al, Nature 409,

p. 533)

This appears to be the first evidence of functional correlation

between erg11 and swi4.

(23)

U.S. Department of Energy Pacific Northwest National Laboratory

5/30/03 23

Biomolecular Networks

Deciphering Cellular Networks:

Future

Use known relationships as constraints:

Genes in the same operon

Regulators and their regulated genes

Genes in known metabolic pathways

Use protein expression data including post-translational modifications

(24)

Acknowledgments

Current Funding DOE OBER ■Microbial Genome ■Microbial Cell ■Genomes-to-Life Macromolecular Structure and Dynamics Gordon Anderson Ken Auberry Dick Smith Statistics and Appl. Math

Kris Jarman Alan Willse Sharon Wunschel Alejandro Heredia-Langner Biogeosciences Jim Fredrickson Shewanella Federation

References

Related documents

© John Newton & Company Ltd, May 2011 John Newton & Co Ltd, Est 1848, is the uk’s leading independent supplier of structural waterproofing systems, water control,

Warner, EL, Norton, IT & Mills, TB 2019, 'Comparing the viscoelastic properties of gelatin and different concentrations of kappa-carrageenan mixtures for additive

We observe that when testing at the 5% significance level, data revisions in annual productivity growth are neither news nor noise for labour productiv- ity, output per hour in

Patrick Cann , Director of Leisure Services, provided a brief overview of the Department, which oversees Community Development Block Grant (CDBG) funding , capital projects ,

(Use the port on the left side of the MUT−III laptop.) Select ECU Reprogramming, then Memory Card Transfer.. A status bar will display while the data is loading into the

Proposed solution therefore satisfies 2 out of 9 requirements supporting process modelling; it provides support for modelling of an event-based com- munication type and improves

The evolution and design of service robots in health care: evaluating the role of speech and other modalities in human-robot interaction1. A critical analysis of

Kellianne Baranowsky on behalf of Creditor Committee Official Committee of Unsecured Creditors [email protected], [email protected];[email protected] Mia