Christopher Benner, PhD
Director, Integrative Genomics and
Bioinformatics Core (IGC)
iDASH Webinar, April 17th, 2015
Analysis and Integration of Big Data
from Next-Generation Genomics,
Epigenomics, and Transcriptomics
Overview for Webinar:
•
Quick introduction to the wider world of next-generation
sequencing (NGS)
•
Overview of HOMER, our software for NGS analysis
•
Using advanced NGS assays to understand B cell
development and the generation of antibody repertoires
•
Quick teaser on how innovative NGS assays and genetics
can enhance our understanding of transcriptional
mechanisms
Next-Generation Sequencing
•
Large Consortiums
–
1000 Genomes Project
–
TCGA (cancer)
–
many, many more…
Illumina sequencing can sequence any
NGS Innovation
RNA-Seq (i.e., gene expression)
GRO-Seq (i.e., transcription rates)
ChIP-Seq
HOMER
(Hypergeometric Optimization of Motif EnRichment)
http://homer.salk.edu
•
Next-generation Sequencing
Analysis for Quantitative Genomics
– Software suite for UNIX command-line
environment (works downstream of the manufacturer’s pipeline and mapping to a reference genome)
– Quality Control for Experiments
– Basic and advanced analysis, annotation,
and visualization capabilities
– General framework handles data from
different types of quantitative
sequencing (ChIP-Seq/RNA-Seq/GRO-Seq/DNase-Seq/etc.)
– Can work with any organism
•
Regulatory element analysis
– De novo Motif Discovery
– Sort out spatial relationships between
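The "Hypergeometric" in HOMER's name refers to the enrichment statistic. The sketch below is not HOMER's code; it only illustrates, under simple assumptions, how a hypergeometric test scores a motif that occurs more often in target sequences (e.g. peaks) than expected from the background. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, total_with_motif, targets, targets_with_motif):
    """Upper-tail hypergeometric p-value for motif enrichment.

    total              -- number of sequences (targets + background)
    total_with_motif   -- how many of those contain the motif
    targets            -- number of target sequences
    targets_with_motif -- how many target sequences contain the motif

    Returns P(X >= targets_with_motif) when `targets` sequences are drawn
    without replacement from the `total` sequences.
    """
    max_hits = min(targets, total_with_motif)
    denominator = comb(total, targets)
    numerator = sum(
        comb(total_with_motif, k) * comb(total - total_with_motif, targets - k)
        for k in range(targets_with_motif, max_hits + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts: motif in 20 of 50 targets, 100 of 1000 sequences overall.
    print(hypergeom_enrichment_p(1000, 100, 50, 20))
```

A smaller p-value means the motif is more over-represented in the targets than chance would predict; a real analysis also has to correct for testing many motifs.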
HOMER Functionality
Any organism with a FASTA file
can be analyzed with HOMER
•
Model organisms are
preconfigured with
annotation information:
– Human, mouse, rat,
zebrafish, Drosophila, C. elegans, yeast, S. pombe, Arabidopsis
•
Genomes annotated on
the UCSC Genome
Browser are easy to
incorporate, but any
custom genome can be
added with annotation
files (i.e., GTF files)
Best way to develop NGS Analysis
methods: Do it in the context of research!
Biology
NGS Methods Development
Interplay between epigenetics, spatial genome conformation,
and transcription in B-lymphocyte development
Why study the transition from pre-pro-B to pro-B cells?
•
Lineage commitment:
pro-B cells cannot
dedifferentiate back to hematopoietic stem cells.
–
i.e. pre-pro-B cells can be used to reconstitute the whole
immune system
•
Antibody Recombination:
Pro-B cells are paused at the
exact stage when VDJ recombination is set to occur
•
B cell marker expression:
Key cell-surface markers and
transcription factors are induced in pro-B cells,
including CD19, Ebf1 (Early B cell factor), Pax5, and
Foxo1.
Unbiased Discovery of Regulatory Features in pro-B cells
Relationship between Transcription
Factors and Epigenetic Modifications
Unbiased Discovery of Lineage
Determining Transcription Factors
Hi-C: Mapping 3D interactions in the genome
Hi-C method from Lieberman-Aiden et al., Science 2009
GRO-Seq
Most significant interactions in the genome
occur at epigenetically modified locations
Cell-type specific
interactions often
change their DNA
methylation status
Genome Organization into
topological domains
TAD definition by Dixon et al. 2012
pre-pro-B pro-B
CTCF binding site is directional
5’ boundary of TAD    3’ boundary of TAD
CTCF only makes interactions with other CTCF sites in a specific direction along the DNA, determined by the orientation of the motif
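The directional rule can be sketched in a few lines. The example below assumes the common convergent reading of that rule: a "+" oriented motif (at a 5’ boundary) looks downstream and pairs only with "-" oriented motifs (at a 3’ boundary). The `CtcfSite` type, the function name, and the `max_distance` limit are hypothetical and not taken from the analysis shown in the slides.

```python
from typing import List, NamedTuple, Tuple


class CtcfSite(NamedTuple):
    position: int  # genomic coordinate of the motif (bp)
    strand: str    # '+' = motif points downstream, '-' = motif points upstream


def convergent_pairs(
    sites: List[CtcfSite], max_distance: int
) -> List[Tuple[CtcfSite, CtcfSite]]:
    """List candidate loops between a '+' site and a downstream '-' site.

    Only pairs no further apart than `max_distance` bp are reported
    (an assumed cap, used here just to keep the candidate list small).
    """
    ordered = sorted(sites, key=lambda s: s.position)
    pairs = []
    for i, left in enumerate(ordered):
        if left.strand != "+":
            continue  # '-' motifs do not look downstream
        for right in ordered[i + 1:]:
            if right.position - left.position > max_distance:
                break  # sites are sorted, so every later site is too far away
            if right.strand == "-":
                pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    demo = [CtcfSite(100, "+"), CtcfSite(200, "-"),
            CtcfSite(300, "+"), CtcfSite(400, "-")]
    for left, right in convergent_pairs(demo, max_distance=250):
        print(left.position, "->", right.position)
```

Pairs in the other three orientation combinations are never reported, which is the point: motif orientation alone restricts which loops are considered possible.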
Clusters of CTCF sites form
‘Super Anchors’
pre-pro-B pro-B
Borrowing from Richard Young’s Super Enhancer concept, we can define over 2500 CTCF ‘Super Anchors’ in the data.
Only 25% of CTCF sites are found at boundaries. However, nearly 50% of ‘Super Anchors’ are found at the boundaries of topological domains.
Foxo1    Igh    Firre
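The slides do not spell out how ‘Super Anchors’ were called. As a rough sketch of the general super-enhancer-style idea, the example below stitches nearby CTCF sites into clusters, ranks clusters by summed signal, and keeps those above the point where the rescaled rank curve has a tangent of slope 1. The stitching distance, the cutoff rule, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def stitch_sites(sites: List[Tuple[int, float]], max_gap: int) -> List[Dict[str, float]]:
    """Merge (position, signal) sites whose neighbours lie within `max_gap` bp."""
    clusters: List[Dict[str, float]] = []
    for position, signal in sorted(sites):
        if clusters and position - clusters[-1]["end"] <= max_gap:
            clusters[-1]["end"] = position
            clusters[-1]["signal"] += signal
            clusters[-1]["n_sites"] += 1
        else:
            clusters.append(
                {"start": position, "end": position, "signal": signal, "n_sites": 1}
            )
    return clusters


def tangent_cutoff(signals: List[float]) -> float:
    """Signal at the knee of the ranked curve.

    Both axes are rescaled to [0, 1]; the knee is the point that lies
    furthest below the diagonal, i.e. where a line of slope 1 touches the curve.
    """
    ranked = sorted(signals)
    n = len(ranked)
    top = ranked[-1] if ranked else 0.0
    if n < 2 or top <= 0:
        return top
    knee = min(range(n), key=lambda i: ranked[i] / top - i / (n - 1))
    return ranked[knee]


def call_super_anchors(sites: List[Tuple[int, float]], max_gap: int) -> List[Dict[str, float]]:
    """Return the stitched clusters whose summed signal exceeds the knee cutoff."""
    clusters = stitch_sites(sites, max_gap)
    cutoff = tangent_cutoff([c["signal"] for c in clusters])
    return [c for c in clusters if c["signal"] > cutoff]
```

With this kind of cutoff, a few dense clusters of strong sites stand apart from the long tail of isolated sites, which is the property the ‘Super Anchor’ label relies on.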
Overview of Immunoglobulin Heavy
Chain Locus
Igh Locus in the Genome (~3 Mb)
Top Super Anchor
To generate full antibody repertoires, each V region needs to find a way to interact with the D regions to recombine
V regions in Igh locus are associated
with CTCF sites
In addition, each CTCF site associated with V regions is in a consistent orientation
Igh Locus Model
Top Super Anchor (looping backstop) VD recombination target
Summary
•
NGS is a lot more than genome sequencing
•
Integration of different data types empowers
discovery where any given data type alone
falls short
•
The DNA sequence (CTCF motifs and their
orientation) dictates the structure of the
genome to accomplish critical tasks such as
VDJ recombination