z
MBP Dierk Wanke [email protected] TATA-Box cis-Element cis-Element T AG A T T T A A A A A C G AG C T AG T AG A C G AG Pol II TBP TFx TFx TFx TFx Fac to r I I Fac to r ITranscription factor ‘omics’
Comparative transcriptomics and DNA-protein interactomics
disclose novel insights into transcription factor ‘omics’.
Principle of functional genomics
„Central Dogma of Molecular Biology“
„Central Dogma of functional Genomics“
DNA RNA Protein Metabolism
Genome Transcriptome Proteome Metabolomics
Physiology
z
MBPPrinciple of functional genomics
„Central Dogma of Molecular Biology“
„Central Dogma of functional Genomics“
DNA RNA Protein Metabolism
Genome Transcriptome Proteome Metabolomics
Physiology
Besides the four classical ‚omes‘, a number of derived omes exist:
with Interactome being probably the most important one!
Previous stress
Individual history
Output
interpretation
Tissue / organ specificity
Diurnal changes and circadian rhythm
Organism responses
Species level responses
‘Omics analysis Network analysis
Do
cu
men
tat
io
n
quantitative‘OMICS’
laboratory / environmental setting Condition Developmental stagez
MBPQuantitative vs qualitative analyses
Qualitative
Analysis
Quantitative
Analysis
empirical data
mathematical data
Yes / No / Maybe significanceQuantitative vs qualitative analyses
Acquiring of quantitative data of subcellular processes with high spatio-temporal resolution in living (plant) cells in their native tissue environment
Quantitative
data
Dynamics
z
MBPPrinciple of functional genomics
„Central Dogma of Molecular Biology“
„Central Dogma of functional Genomics“
DNA RNA Protein Metabolism
Genome
Transcriptome Proteome MetabolomicsPhysiology
• qualitative data only!
• invariable given population (basic population) • e.g. number of genes, number of bp • easy to define a background model
• relatively easy to analyze
• significance can be analyzed by Hypergeometric p
observed P
expected P
Principle of functional genomics
„Central Dogma of Molecular Biology“
„Central Dogma of functional Genomics“
DNA RNA Protein Metabolism
Genome Transcriptome
Proteome
MetabolomicsPhysiology
• first real ‘omics’
• mostly qualitative information (e.g. different from control? – Yes/No) • data is highly variable depending on type experiment / researcher /
method of detection
z
MBPPrinciple of functional genomics
„Central Dogma of Molecular Biology“
„Central Dogma of functional Genomics“
DNA RNA Protein Metabolism
Genome Transcriptome Proteome
Metabolomics
Physiology
• qualitative and quantitative readout
• data is highly variable depending on type experiment / researcher / method of extraction • analysis is driven by researcher’s objectives • due to high variability and data close to detection limit, metabolomes are very hard to analyze by statistics
Principle of functional genomics
„Central Dogma of Molecular Biology“
„Central Dogma of functional Genomics“
DNA RNA Protein Metabolism
Genome
Transcriptome
Proteome MetabolomicsPhysiology
• qualitative and quantitative readout
• data is highly variable depending on type experiment / researcher / method of detection
• various background models can be defined, which all end up with different results! - Analysis is driven by researcher’s objectives
z
MBPPrinciple of functional genomics
„Central Dogma of Molecular Biology“
„Central Dogma of functional Genomics“
DNA RNA Protein Metabolism
Genome
Transkriptome
ProteomeMetabolomics
Physiology
Interestingly, due to the quantitative nature of the data types, both methods cope with similar problems
e.g.:
• accession with low abundance, are sometimes below detection limit and have high variability
large standard deviation / error bars
• accession with high abundance, are so frequent that they compete with others or they saturate detection
distal proximal
Promoter
cis cis TATA cis cis
transcription-start Exons Introns
*
ATG gene sequence mRNA Transcript DNA ProteinTranscriptome: all transcripts at a given time 2 current methods in the field:
Transcriptome
Transcriptomez
MBP13 more than 50 Mio. probe sets on 2 cm x 2 cm
Microarray
PM MM Signal Ratio PerfectMatch (PM) AGATGATAGACAGAGCAGATGCTTG MisMatch (MM) AGATGATAGACAGACCAGATGCTTG • Oligonucleotide-Array (Genechip-Arrays):
Microarray
Transcriptomez
MBP• Oligonucleotide-Array (Genechip-Arrays):
Microarray
TranscriptomeFRK1 (At2g19190)
At2g19200 At2g19180
GeneChip Array
Exon Array
Tiling Array
• qualitative and quantitative readout
• BUT only the GeneChip Arrays have internal controls (mismatch probes) • Exon- and Tiling Array are not assigned to ONE specific target
z
MBPdistal proximal
Promoter
cis cis TATA cis cis
transcription-start Exons Introns
*
ATG gene sequence mRNA Transcript DNA ProteinTranscriptome: all transcripts at a given time and tissue sample 2 current methods dominate the field:
- microarrays
- Next Generation Sequencing (NGS)
Transcriptome
TranscriptomeNext Generation Sequencing (NGS)
Transcriptomez
MBPNext Generation Sequencing (NGS)
no sequence low coverage no informative data At2g19200 At2g19180 Genome-Seq no sequence low coverage no informative data
very high sequence coverage
over-sampling, satuartion
FRK1 (At2g19190)
Next Generation Sequencing (NGS)
At2g19200 At2g19180
Genome-Seq
RNA-Seq
expression signal, but there is no annotated gene badly annotated, microRNA-gene, differential splicing FRK1 (At2g19190)
z
MBPNext Generation Sequencing (NGS)
FRK1 (At2g19190)
At2g19200 At2g19180
Genome-Seq
RNA-Seq
low expression signal, (low coverage?)
badly annotated, cryptic intron in gene model, differential splicing
Transcriptome:
What does ANALYSIS mean?
Challenge:
• Find differentially expressed genes! • Mine for meaningful gene sets! - fold difference from control (2x; 3x, 4x...)
[simple data with control-treatments only]
- LIMMA (Linear Models for Microarray Data)
[all data, but linear modeling only]
- probabilistic dynamical model
z
MBPTranscriptome:
What does ANALYSIS mean?
Challenge:
• Find differentially expressed genes!
• Mine for meaningful gene sets! – Cluster analysis - Pearson correlation
- k-means clustering - hierarchical clustering
Problem: house-keeping genes!!! Transcriptome
Why searching for co-expression
Co-expressed genes encode for proteins that act in the same signaling cascades, are enrolled in the same enzymatic pathways or represent interacting partners that form higher order complexes.
Additionally, genes with similar expression profiles in microarray experiments might be regulated by the same cis-regulatory elements
(CREs).
z
MBP Grundlagen der funktionellen GenomanalyseRelated to photosynthesis
Co-expression reveals logic connections
Transcriptomez
MBPTranscriptome vs. metabolome
Mangelsen et al. 2010, Plant Phys.
• Combined analysis of transcriptomics and metabolomics in barley research
simple Pearson correlation
sugars and genes exhibit diurnal fluctuations
+0.8 -0.8 Correlation coefficients M edi an n or m al iz ed (lo g sc al e) 103 138 Kestose 0.1 1 10 0.1 1 10 Sucrose 352 434 8 2 Fructose 10 27 Glucose 0.1 1 10 0.1 1 10 +0.8 -0.8z
MBP • meaningful Transcriptome datamicroarray
next
generation
sequencing
(RNA
‐
seq)
+ relatively high comparability
+ identification of DE genes
+ co-expression analysis + gene regulatory network
analysis
- requires known genome
- weak on differential splicing events
- weak on ChIP data
Transcriptome:
What does ANALYSIS mean?
- low comparability of data
- difficult to identify DE genes
- co-expression analysis impossible
- no network analysis
+ applicable to any organism
+ good on differential splicing events
+ very good on ChIP data
qu an tita tiv e da ta ma in ly qua lit at iv e da ta
Chromatin-ImmunoPrecipitation-NGS (ChIP-seq)
InteractomeImmunoprecipitation of specific DNA-protein-complexes with antibodies
De-cross-link DNA-protein complexes
z
MBPHistone modification regulates expression
Ac Ac Ac Ac Me Me Me MeMeMe Me Me Me Me Me Me Me Me Me PcGs Ac TrxG H3K27me3 Me Me Me tri-methylation H3K4me3 mono-acetylation Ac
transcriptional active [Euchromatin] ON
transcriptional inactive [Heterochromatin] OFF
• Histone-modifications are important for gene activation or repression • Position of Histones on the DNA can be identified by
Chromatin-Immunoprecipitation (ChIP) and subsequent NGS
Transcriptome
(mRNA-Seq) +
Epigenome
(ChIP-Seq)
ChIP-Seq (H3K27me3) OFF RNA-Seq (transcriptome) ChIP-Seq (H3K4me3) ON FRK1 (At2g19190) At2g19200 At2g19180 low OFF-signal, high ON-signal high expression, good
z
MBPChIP-Seq (H3K27me3) OFF RNA-Seq (transcriptome) ChIP-Seq (H3K4me3) ON FRK1 (At2g19190) At2g19200 At2g19180 OFF-signal, but still mRNA
z
MBP distalcis cis TATA
proximal Promoter cis cis transcription-start Exons Introns
*
ATG • laboratory experiments • bioinformatics • synthetic promoters • biotechnology • spectro-microscopy mRNARapid gene expression response in plant
- How are rapid (stress) responses controlled? - How is specificity of this response mediated?
TF – Motif - Interaction
- How do TFs find their cis-element?
- What is the specificity of this interaction? TF
Transcription factors (TFs) act as molecular switches
http://www.pinkfloyd-forum.de bZIP WRKY BPC BAM GATAz
MBPGenome Transkriptome Proteome Metabolome „Central Dogma of Molecular Biology“
„Central Dogma of functional Genomics“
DNA RNA Protein Metabolism
Physiology promoter analysis microarray analysis Interactome Protein – DNA-interactome
Is there a cryptic code in non-coding regulatory sequence?
Genome
distal
cis cis TATA
proximal Promoter cis cis transcription-start Exons Introns
*
ATG 100 200 300 400 • frequency • position N o. o f h itsz
MBPIdentification of putative cis-elements by genomic enrichment
Genome
distal
cis cis TATA
proximal Promoter cis cis transcription-start Exons Introns
*
ATG 0 200 400 600 800 1000 1200 1400 1600 0 50 100 150 200 250 300 350 400 450 500 N o. o f h its N o. o f h itsfrequency distribution curves for the
TATATAmotif
Only ~10% of known DNA-binding proteins have
a known DNA-binding motif
Genome AGRIS Arabidopsis 2340 415 average ~2000 18 % <200 <10 %
z
MBPBinding specificity of TFs to their DNA motif
Interactome epitope tagged protein of interest in crude E. coli extract 2 pmol bio dsDNA per well colorimetric detection
DNA-protein interaction ELISA
Brand et al., 2010
Development of a DPI-ELISA screen
Interactome
z
MBPA DPI-ELISA screen with stress responsive WRKY-TFs
Interactome
Brand et al., 2013a Brand et al., 2013b
WRKY11 DBD
Compare different WRKY binding affinities
Homology modeling of the WRKY protein at the binding motif
Interactome
WRKY WRKY
z
MBPInteractome
Brand et al., 2013a Brand et al., 2013b
Homology modeling of the WRKY protein at the binding motif
WRKY-protein at the binding motif
linker region
variable library region
linker region
Homology modeling allows us to built hypotheses about binding of proteins to DNA, as soon as the sequences of both,
Interactome
Molecular Dynamics (MD) simulations with WRKY33-domains
strong binder
weak binder
• MD gives an impression about binding strength • requires usually complex
z
MBPA DPI-ELISA screen with stress responsive WRKY-TFs
Interactome
Brand et al., 2013a Brand et al., 2013b
Homology modeling and interactome provide functional clues
Interactome
• WRKY proteins recognize (slightly) different binding consensi • Different binding consensi are bound with different affinities
Quantitative binding information can be translated into a probability matrix to search genomic information for possible binding sites
WRKY11 DBD WRKY33 cDBD WRKY50 DBD ydGACY nnGACw hnGACh affinity high low
z
MBPATH1-Genechip:
22746 probe cells (79% of all A. thaliana genes)
>24000 genes (85%) can be detected due to the high redundanzcy in the genome
21685 unique probe sets
about 13600 (~60%) genes hybridize with probes to generate analyzable data
• Oligonucleotide-Array (Genechip-Arrays):
Microarray
We made use of the 1295 AtGenExpress Expression Profiles + ~1500 additional datasets for co-expression analyses Transcriptome
Homology modeling and interactome provide functional clues
Which genes are co-expressed with WRKY genes AND
have the respective binding site in their promoter?
Transcriptomez
MBP~22 000 genes; analyzed in ~3000 microarray hybridizations
Wanke et al., QBIC 2010
Co-expression analysis in complex Microarray datasets
Transcriptome
Identify WRKY-co-expressed genes
Genome analysis
combined with
transcriptome analysis
Transcriptome
distal
cis cis TATA
proximal Promoter cis cis transcription-start Exons Introns
*
ATG mRNA WR K Y GenomeIdentify binding sites in promoters
z
MBP WRKY11 DBD WRKY33 cDBD WRKY50 DBD ydGACY ~400 ~ 3000 ~40 nnGACw ~400 ~ 5000 ~60 hnGACh ~300 ~8000 ~250 affinity high low No. co-expressed genes:No. binding sites:
No. put. targets
Genome analysis
combined with
transcriptome analysis
Genome - TranscriptomeGenome analysis
combined with
transcriptome analysis
Transcriptome 6 12 24 6 12 24 6 12 24 control 0 0.1 1 10 100 no rm aliz ed s ig na l in te ns ity ( lo g sc ale ) activator repressor mock stress AtWRKY33 stressz
MBPNGS coupled genomics: In vivo target gene identification by
chromatin immuno-precipitation
Genome Isolation of bound DNA Sequencing by NGS 6 12 24 0.1 1 10 100 no rm aliz ed s ig na l in te ns ity ( lo g sc ale ) activator repressor AtWRKY33Integration of metadata from the Somssich Lab (Birkenbihl et al., 2012)
All ~60 putative target genes were independently
identified as true
WRKY11 DBD WRKY33 cDBD WRKY50 DBD ydGACY ~40 nnGACw ~60 hnGACh ~250 affinity high low
No. put. targets
What is the function of the putative target genes!
Genome - Transcriptomecell cycle chromatin remodeling basal resistance pathogen response
z
MBPWhat is the function of plant specific BBR/BPC factors?
99 93 100 95 97 76 100 73 99 Os10g0114500 Os10g0115500 HvBBR GmGAGA-BP AtBPC2 AtBPC1 AtBPC3 AtBPC7 AtBPC4 AtBPC5 AtBPC6 Os06g0130600 Ceratopteris richardii Physcomitrella patens 0.05 changes Grou p I G rou p I I
• BBR/BPCs arose evolutionary late in the green plant lineage • Bind to GAGA-motifs and recruit LHP1 to DNA
Microarray vs. NGS
TranscriptomeControl lhp1-4 bpc4 bpc6 lhp1-4
bpc4 bpc6
z
MBPMicroarray vs. NGS
Control lhp1-4 bpc4 bpc6 lhp1-4
bpc4 bpc6
Strong defects in flowers and seed production!
Transcriptome (microarray) analysis to understand what is wrong!
Microarray vs. NGS
microarrays provide the potential to compare the data with already
z
MBPMicroarray vs. NGS
• more induced genes in the mutants function as repressor proteins
Microarray vs. NGS
microarray NGS
Verify the data by Transcriptome (NGS) analysis!
• identical RNA used in both experiments
• two replicates libraries • paired end reads
• ~ 80 Mio analyzable reads from each library
30 6 17 10 412 221 27
z
MBPMicroarray vs. NGS
microarray NGS 123 12 28 12 729 323 65 30 6 17 10 412 221 27 28 17 0Microarray vs. NGS
Genome - InteractomeIsolation of target DNA
Sequencing by NGS
Interactome (ChIP-seq) analysis to identify direct target genes!
• two replicate libraries from BPC6-expressing plants • single end reads
z
MBPMicroarray vs. NGS
microarray NGS 123 12 28 12 729 323 65 30 6 17 10 412 221 27 28 17 0• 33 putative direct targets overlapped with microarray expression • 47 putative direct targets overlapped with NGS expression
• only a total of 8 were picked up with both techniques
6 2
GLOBAL vs. LOCAL
Next Generation Sequencing (NGS)
• NGS is a quantitative approach,…
…but reveals mostly (high value) qualitative data
• Quantitative analyses on a Global scale are informative, while specific local questions provide qualitative data only or remain hidden.
• Unfortunately, NGS for transcriptomics is often useless (to date) • local expression analysis is difficult
• co-expression analysis is almost impossible
• Chromatin-ImmunoPrecipitation-NGS (ChIP-seq) data is highly informative,
z
MBP • Quantitative ‘omics’ data can be normalized and, thus, is comparable.• Quantitative data from different sources can be used directly in a
simultaneous combined analysis (e.g. transcriptomics and metabolomics)
• NGS data harbors great potential, but (at present) can not substitute quantitative gene expression data e.g. in co-expression analyses
• Novel methods, such as the DPI-ELISA, enable quantitative proteomics
• The combined analysis of quantitative transcriptome and interactome data provide unique insights into protein function
What does ANALYSIS mean - personally?
Challenge:
• Mine for meaningful data
Problem: Conflicting interests
bioinformatics biologists • significant output • is not interested in a special gene or outcome (unbiased) independent of funding
• biologically relevant output • is ONLY interested
in his special gene(s) of interest (biased)
z
MBPThank you!
Andreas Hecker Sachie Kimura Rebecca Dautel Joachim Kilian Luise Brand Nathalie Sebening Christine Zehren Angelika Anna Dagmar Pogomara Jan Hirsch Tübingen Saarland ETH Zürich Sam Zeeman Sebastian Soyk Uni Rostock Birgit Piechulla Katrin Wenke MPI Tübingen Markus Schmid Silvio Collani INRA Versailles Valerie Gaudin Uni WürzburgWolfgang Dröge‐Laser Christoph Weiste