• No results found

Transcription factor omics

N/A
N/A
Protected

Academic year: 2021

Share "Transcription factor omics"

Copied!
69
0
0

Loading.... (view fulltext now)

Full text

(1)

z

MBP Dierk Wanke [email protected] TATA-Box cis-Element cis-Element T AG A T T T A A A A A C G AG C T AG T AG A C G AG Pol II TBP TFx TFx TFx TFx Fac to r I I Fac to r I

Transcription factor ‘omics’

Comparative transcriptomics and DNA-protein interactomics

disclose novel insights into transcription factor ‘omics’.

(2)

Principle of functional genomics

„Central Dogma of Molecular Biology“

„Central Dogma of functional Genomics“

DNA RNA Protein Metabolism

Genome Transcriptome Proteome Metabolomics

Physiology

(3)

z

MBP

Principle of functional genomics

„Central Dogma of Molecular Biology“

„Central Dogma of functional Genomics“

DNA RNA Protein Metabolism

Genome Transcriptome Proteome Metabolomics

Physiology

Besides the four classical ‚omes‘, a number of derived omes exist:

with Interactome being probably the most important one!

(4)

 Previous stress

Individual history

Output

interpretation

 Tissue / organ specificity

 Diurnal changes and circadian rhythm

 Organism responses

 Species level responses

 ‘Omics analysis  Network analysis

Do

cu

men

tat

io

n

quantitative

‘OMICS’

laboratory / environmental setting Condition  Developmental stage

(5)

z

MBP

Quantitative vs qualitative analyses

Qualitative

 

Analysis

Quantitative

 

Analysis

empirical data

mathematical data

Yes / No / Maybe significance

(6)

Quantitative vs qualitative analyses

Acquiring of quantitative data of subcellular processes with high spatio-temporal resolution in living (plant) cells in their native tissue environment

Quantitative

 

data

   

Dynamics

(7)

z

MBP

Principle of functional genomics

„Central Dogma of Molecular Biology“

„Central Dogma of functional Genomics“

DNA RNA Protein Metabolism

Genome

Transcriptome Proteome Metabolomics

Physiology

qualitative data only!

• invariable given population (basic population) • e.g. number of genes, number of bp • easy to define a background model

• relatively easy to analyze

• significance can be analyzed by Hypergeometric p

observed P

expected P

(8)

Principle of functional genomics

„Central Dogma of Molecular Biology“

„Central Dogma of functional Genomics“

DNA RNA Protein Metabolism

Genome Transcriptome

Proteome

Metabolomics

Physiology

• first real ‘omics’

mostly qualitative information (e.g. different from control? – Yes/No) • data is highly variable depending on type experiment / researcher /

method of detection

(9)

z

MBP

Principle of functional genomics

„Central Dogma of Molecular Biology“

„Central Dogma of functional Genomics“

DNA RNA Protein Metabolism

Genome Transcriptome Proteome

Metabolomics

Physiology

• qualitative and quantitative readout

• data is highly variable depending on type experiment / researcher / method of extraction • analysis is driven by researcher’s objectives • due to high variability and data close to detection limit, metabolomes are very hard to analyze by statistics

(10)

Principle of functional genomics

„Central Dogma of Molecular Biology“

„Central Dogma of functional Genomics“

DNA RNA Protein Metabolism

Genome

Transcriptome

Proteome Metabolomics

Physiology

• qualitative and quantitative readout

• data is highly variable depending on type experiment / researcher / method of detection

• various background models can be defined, which all end up with different results! - Analysis is driven by researcher’s objectives

(11)

z

MBP

Principle of functional genomics

„Central Dogma of Molecular Biology“

„Central Dogma of functional Genomics“

DNA RNA Protein Metabolism

Genome

Transkriptome

Proteome

Metabolomics

Physiology

Interestingly, due to the quantitative nature of the data types, both methods cope with similar problems

e.g.:

• accession with low abundance, are sometimes below detection limit and have high variability

 large standard deviation / error bars

• accession with high abundance, are so frequent that they compete with others or they saturate detection

(12)

distal proximal

Promoter

cis cis TATA cis cis

transcription-start Exons Introns

*

ATG gene sequence mRNA Transcript DNA Protein

Transcriptome: all transcripts at a given time 2 current methods in the field:

Transcriptome

Transcriptome

(13)

z

MBP

13 more than 50 Mio. probe sets on 2 cm x 2 cm

Microarray

(14)

PM MM Signal Ratio PerfectMatch (PM) AGATGATAGACAGAGCAGATGCTTG MisMatch (MM) AGATGATAGACAGACCAGATGCTTGOligonucleotide-Array (Genechip-Arrays):

Microarray

Transcriptome

(15)

z

MBP

Oligonucleotide-Array (Genechip-Arrays):

Microarray

Transcriptome

(16)

FRK1 (At2g19190)

At2g19200 At2g19180

GeneChip Array

Exon Array

Tiling Array

• qualitative and quantitative readout

BUT only the GeneChip Arrays have internal controls (mismatch probes) • Exon- and Tiling Array are not assigned to ONE specific target

(17)

z

MBP

distal proximal

Promoter

cis cis TATA cis cis

transcription-start Exons Introns

*

ATG gene sequence mRNA Transcript DNA Protein

Transcriptome: all transcripts at a given time and tissue sample 2 current methods dominate the field:

- microarrays

- Next Generation Sequencing (NGS)

Transcriptome

Transcriptome

(18)

Next Generation Sequencing (NGS)

Transcriptome

(19)

z

MBP

Next Generation Sequencing (NGS)

no sequence  low coverage  no informative data At2g19200 At2g19180 Genome-Seq no sequence  low coverage  no informative data

very high sequence coverage 

over-sampling, satuartion

FRK1 (At2g19190)

(20)

Next Generation Sequencing (NGS)

At2g19200 At2g19180

Genome-Seq

RNA-Seq

expression signal, but there is no annotated gene  badly annotated, microRNA-gene, differential splicing FRK1 (At2g19190)

(21)

z

MBP

Next Generation Sequencing (NGS)

FRK1 (At2g19190)

At2g19200 At2g19180

Genome-Seq

RNA-Seq

low expression signal, (low coverage?) 

badly annotated, cryptic intron in gene model, differential splicing

(22)

Transcriptome:

What does ANALYSIS mean?

Challenge:

Find differentially expressed genes! • Mine for meaningful gene sets! - fold difference from control (2x; 3x, 4x...)

[simple data with control-treatments only]

- LIMMA (Linear Models for Microarray Data)

[all data, but linear modeling only]

- probabilistic dynamical model

(23)

z

MBP

Transcriptome:

What does ANALYSIS mean?

Challenge:

• Find differentially expressed genes!

Mine for meaningful gene sets! – Cluster analysis - Pearson correlation

- k-means clustering - hierarchical clustering

Problem: house-keeping genes!!! Transcriptome

(24)

Why searching for co-expression

Co-expressed genes encode for proteins that act in the same signaling cascades, are enrolled in the same enzymatic pathways or represent interacting partners that form higher order complexes.

Additionally, genes with similar expression profiles in microarray experiments might be regulated by the same cis-regulatory elements

(CREs).

(25)

z

MBP Grundlagen der funktionellen Genomanalyse

Related to photosynthesis

(26)

Co-expression reveals logic connections

Transcriptome

(27)

z

MBP

Transcriptome vs. metabolome

Mangelsen et al. 2010, Plant Phys.

Combined analysis of transcriptomics and metabolomics in barley research

simple Pearson correlation

(28)

sugars and genes exhibit diurnal fluctuations

+0.8 -0.8 Correlation coefficients M edi an n or m al iz ed (lo g sc al e) 103 138 Kestose 0.1 1 10 0.1 1 10 Sucrose 352 434 8 2 Fructose 10 27 Glucose 0.1 1 10 0.1 1 10 +0.8 -0.8

(29)

z

MBP • meaningful Transcriptome data

microarray

next

 

generation

 

sequencing

(RNA

seq)

+ relatively high comparability

+ identification of DE genes

+ co-expression analysis + gene regulatory network

analysis

- requires known genome

- weak on differential splicing events

- weak on ChIP data

Transcriptome:

What does ANALYSIS mean?

- low comparability of data

- difficult to identify DE genes

- co-expression analysis impossible

- no network analysis

+ applicable to any organism

+ good on differential splicing events

+ very good on ChIP data

qu an tita tiv e da ta ma in ly qua lit at iv e da ta

(30)

Chromatin-ImmunoPrecipitation-NGS (ChIP-seq)

Interactome

Immunoprecipitation of specific DNA-protein-complexes with antibodies

De-cross-link DNA-protein complexes

(31)

z

MBP

Histone modification regulates expression

Ac Ac Ac Ac Me Me Me MeMeMe Me Me Me Me Me Me Me Me Me PcGs Ac TrxG H3K27me3 Me Me Me tri-methylation H3K4me3 mono-acetylation Ac

transcriptional active [Euchromatin] ON

transcriptional inactive [Heterochromatin] OFF

• Histone-modifications are important for gene activation or repression • Position of Histones on the DNA can be identified by

Chromatin-Immunoprecipitation (ChIP) and subsequent NGS

(32)

Transcriptome

(mRNA-Seq) +

Epigenome

(ChIP-Seq)

ChIP-Seq (H3K27me3) OFF RNA-Seq (transcriptome) ChIP-Seq (H3K4me3) ON FRK1 (At2g19190) At2g19200 At2g19180 low OFF-signal, high ON-signal  high expression, good

(33)

z

MBP

ChIP-Seq (H3K27me3) OFF RNA-Seq (transcriptome) ChIP-Seq (H3K4me3) ON FRK1 (At2g19190) At2g19200 At2g19180 OFF-signal, but still mRNA

(34)
(35)

z

MBP distal

cis cis TATA

proximal Promoter cis cis transcription-start Exons Introns

*

ATG • laboratory experimentsbioinformaticssynthetic promotersbiotechnologyspectro-microscopy mRNA

Rapid gene expression response in plant

- How are rapid (stress) responses controlled? - How is specificity of this response mediated?

TF – Motif - Interaction

- How do TFs find their cis-element?

- What is the specificity of this interaction? TF

(36)

Transcription factors (TFs) act as molecular switches

http://www.pinkfloyd-forum.de bZIP WRKY BPC BAM GATA

(37)

z

MBP

Genome Transkriptome Proteome Metabolome „Central Dogma of Molecular Biology“

„Central Dogma of functional Genomics“

DNA RNA Protein Metabolism

Physiology promoter analysis microarray analysis Interactome Protein – DNA-interactome

(38)

Is there a cryptic code in non-coding regulatory sequence?

Genome

distal

cis cis TATA

proximal Promoter cis cis transcription-start Exons Introns

*

ATG 100 200 300 400 • frequencyposition N o. o f h its

(39)

z

MBP

Identification of putative cis-elements by genomic enrichment

Genome

distal

cis cis TATA

proximal Promoter cis cis transcription-start Exons Introns

*

ATG 0 200 400 600 800 1000 1200 1400 1600 0 50 100 150 200 250 300 350 400 450 500 N o. o f h its N o. o f h its

frequency distribution curves for the

TATATAmotif

(40)

Only ~10% of known DNA-binding proteins have

a known DNA-binding motif

Genome AGRIS Arabidopsis 2340 415 average ~2000 18 % <200 <10 %

(41)

z

MBP

Binding specificity of TFs to their DNA motif

Interactome epitope tagged protein of interest in crude E. coli extract 2 pmol bio dsDNA per well colorimetric detection

DNA-protein interaction ELISA

Brand et al., 2010

(42)

Development of a DPI-ELISA screen

Interactome

(43)

z

MBP

A DPI-ELISA screen with stress responsive WRKY-TFs

Interactome

Brand et al., 2013a Brand et al., 2013b

WRKY11 DBD

 Compare different WRKY binding affinities

(44)

Homology modeling of the WRKY protein at the binding motif

Interactome

WRKY WRKY

(45)

z

MBP

Interactome

Brand et al., 2013a Brand et al., 2013b

Homology modeling of the WRKY protein at the binding motif

WRKY-protein at the binding motif

linker region

variable library region

linker region

Homology modeling allows us to built hypotheses about binding of proteins to DNA, as soon as the sequences of both,

(46)

Interactome

Molecular Dynamics (MD) simulations with WRKY33-domains

strong binder

weak binder

• MD gives an impression about binding strength • requires usually complex

(47)

z

MBP

A DPI-ELISA screen with stress responsive WRKY-TFs

Interactome

Brand et al., 2013a Brand et al., 2013b

(48)

Homology modeling and interactome provide functional clues

Interactome

• WRKY proteins recognize (slightly) different binding consensi • Different binding consensi are bound with different affinities

 Quantitative binding information can be translated into a probability matrix to search genomic information for possible binding sites

WRKY11 DBD WRKY33 cDBD WRKY50 DBD ydGACY nnGACw hnGACh affinity high low

(49)

z

MBP

ATH1-Genechip:

22746 probe cells (79% of all A. thaliana genes)

>24000 genes (85%) can be detected due to the high redundanzcy in the genome

21685 unique probe sets

about 13600 (~60%) genes hybridize with probes to generate analyzable data

Oligonucleotide-Array (Genechip-Arrays):

Microarray

We made use of the 1295 AtGenExpress Expression Profiles + ~1500 additional datasets for co-expression analyses Transcriptome

(50)

Homology modeling and interactome provide functional clues

Which genes are co-expressed with WRKY genes AND

have the respective binding site in their promoter?

Transcriptome

(51)

z

MBP

~22 000 genes; analyzed in ~3000 microarray hybridizations

Wanke et al., QBIC 2010

Co-expression analysis in complex Microarray datasets

Transcriptome

(52)

Identify WRKY-co-expressed genes

Genome analysis

combined with

transcriptome analysis

Transcriptome

distal

cis cis TATA

proximal Promoter cis cis transcription-start Exons Introns

*

ATG mRNA WR K Y Genome

Identify binding sites in promoters

(53)

z

MBP WRKY11 DBD WRKY33 cDBD WRKY50 DBD ydGACY ~400 ~ 3000 ~40 nnGACw ~400 ~ 5000 ~60 hnGACh ~300 ~8000 ~250 affinity high low No. co-expressed genes:

No. binding sites:

No. put. targets

Genome analysis

combined with

transcriptome analysis

Genome - Transcriptome

(54)

Genome analysis

combined with

transcriptome analysis

Transcriptome 6 12 24 6 12 24 6 12 24 control 0 0.1 1 10 100 no rm aliz ed s ig na l in te ns ity ( lo g sc ale ) activator repressor mock stress AtWRKY33 stress

(55)

z

MBP

NGS coupled genomics: In vivo target gene identification by

chromatin immuno-precipitation

Genome Isolation of bound DNA Sequencing by NGS 6 12 24 0.1 1 10 100 no rm aliz ed s ig na l in te ns ity ( lo g sc ale ) activator repressor AtWRKY33

Integration of metadata from the Somssich Lab (Birkenbihl et al., 2012)

All ~60 putative target genes were independently

identified as true

(56)

WRKY11 DBD WRKY33 cDBD WRKY50 DBD ydGACY ~40 nnGACw ~60 hnGACh ~250 affinity high low

No. put. targets

What is the function of the putative target genes!

Genome - Transcriptome

cell cycle chromatin remodeling basal resistance pathogen response

(57)

z

MBP

What is the function of plant specific BBR/BPC factors?

99 93 100 95 97 76 100 73 99 Os10g0114500 Os10g0115500 HvBBR GmGAGA-BP AtBPC2 AtBPC1 AtBPC3 AtBPC7 AtBPC4 AtBPC5 AtBPC6 Os06g0130600 Ceratopteris richardii Physcomitrella patens 0.05 changes Grou p I G rou p I I

• BBR/BPCs arose evolutionary late in the green plant lineage • Bind to GAGA-motifs and recruit LHP1 to DNA

(58)

Microarray vs. NGS

Transcriptome

Control lhp1-4 bpc4 bpc6 lhp1-4

bpc4 bpc6

(59)

z

MBP

Microarray vs. NGS

Control lhp1-4 bpc4 bpc6 lhp1-4

bpc4 bpc6

Strong defects in flowers and seed production!

Transcriptome (microarray) analysis to understand what is wrong!

(60)

Microarray vs. NGS

microarrays provide the potential to compare the data with already 

(61)

z

MBP

Microarray vs. NGS

• more induced genes in the mutants  function as repressor proteins

(62)

Microarray vs. NGS

microarray NGS

Verify the data by Transcriptome (NGS) analysis!

• identical RNA used in both experiments

• two replicates libraries • paired end reads

• ~ 80 Mio analyzable reads from each library

30 6 17 10 412 221 27

(63)

z

MBP

Microarray vs. NGS

microarray NGS 123 12 28 12 729 323 65 30 6 17 10 412 221 27 28 17 0

(64)

Microarray vs. NGS

Genome - Interactome

Isolation of target DNA

Sequencing by NGS

Interactome (ChIP-seq) analysis to identify direct target genes!

• two replicate libraries from BPC6-expressing plants • single end reads

(65)

z

MBP

Microarray vs. NGS

microarray NGS 123 12 28 12 729 323 65 30 6 17 10 412 221 27 28 17 0

33 putative direct targets overlapped with microarray expression • 47 putative direct targets overlapped with NGS expression

• only a total of 8 were picked up with both techniques

6 2

(66)

GLOBAL vs. LOCAL

Next Generation Sequencing (NGS)

NGS is a quantitative approach,…

…but reveals mostly (high value) qualitative data

Quantitative analyses on a Global scale are informative, while specific local questions provide qualitative data only or remain hidden.

• Unfortunately, NGS for transcriptomics is often useless (to date) • local expression analysis is difficult

• co-expression analysis is almost impossible

• Chromatin-ImmunoPrecipitation-NGS (ChIP-seq) data is highly informative,

(67)

z

MBP • Quantitative ‘omics’ data can be normalized and, thus, is comparable.

• Quantitative data from different sources can be used directly in a

simultaneous combined analysis (e.g. transcriptomics and metabolomics)

• NGS data harbors great potential, but (at present) can not substitute quantitative gene expression data e.g. in co-expression analyses

• Novel methods, such as the DPI-ELISA, enable quantitative proteomics

• The combined analysis of quantitative transcriptome and interactome data provide unique insights into protein function

(68)

What does ANALYSIS mean - personally?

Challenge:

Mine for meaningful data

Problem: Conflicting interests

bioinformatics biologists • significant output • is not interested in a special gene or outcome (unbiased)  independent of funding

• biologically relevant output • is ONLY interested

in his special gene(s) of interest (biased)

(69)

z

MBP

Thank you!

Andreas Hecker Sachie Kimura Rebecca Dautel Joachim Kilian Luise Brand Nathalie Sebening Christine Zehren Angelika Anna Dagmar Pogomara Jan Hirsch Tübingen Saarland ETH Zürich Sam Zeeman Sebastian Soyk Uni Rostock Birgit Piechulla Katrin Wenke MPI Tübingen Markus Schmid Silvio Collani INRA Versailles Valerie Gaudin Uni Würzburg

Wolfgang Dröge‐Laser Christoph Weiste

References

Related documents

Keywords: Determinants, Risky sexual behavior, Synthetic drugs, Drug abuse, Alcohol consumption, Adolescents, Iran.. ©

Black hole attack is also one of the possible attack which harms the security of MANET because now a day mostly the people can use the ad-hoc network to transmission of data so it

The results of this study were 80 respondents, 56 respondents used the BCA, Bank Mandiri 47 respondents, 36 respondents using BNI46, 10 respondents using Paypal,

ous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: Revisiting the location of a quantitative trait locus with

LPS: Lipopolysaccharides; RANKL: Receptor activator of nuclear factor-kappaB ligand; OPG: Osteoprotegerin; TSBY: Trypticase Soy Broth supplemented with yeast extract; BAP: Blood

The location of traditional Red-footed Falcon roost sites (i.e. used at least in three years of the study period) where the maximum number of birds reached 400 birds at least once in

This was the result of the lack of standard material and strengths of goal posts in some university halls, the lack of proper surface on basketball and volleyball bases,

Conclusion: In patients with chronic ischemic mitral regurgitation during CABG, mitral valve replacement is associated with lower recurrence of regurgitation.. No differences were