Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics

(1)

Christopher Benner, PhD

Director, Integrative Genomics and Bioinformatics Core (IGC)

iDASH Webinar, April 17th, 2015

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics

(2)

Overview for Webinar:

Quick introduction to the wider world of next-generation sequencing (NGS)

Overview of HOMER, our software for NGS analysis

Using advanced NGS assays to understand B cell development and the generation of antibody repertoires

Quick teaser on how innovative NGS assays and genetics can enhance our understanding of transcriptional mechanisms

(3)

Next-Generation Sequencing

Large Consortia

1000 Genomes Project

TCGA (cancer)

many, many more…

Illumina sequencing can sequence any

(4)

NGS Innovation

RNA-Seq (i.e. gene expression)

GRO-Seq (i.e. transcription rates)

ChIP-Seq

(5)
(6)

HOMER

(Hypergeometric Optimization of Motif EnRichment)

http://homer.salk.edu

Next-generation Sequencing Analysis for Quantitative Genomics

– Software suite for UNIX command-line environment (works downstream of manufacturer’s pipeline and mapping to reference genome)

– Quality Control for Experiments

– Basic and advanced analysis, annotation, and visualization capabilities

– General framework handles data from different types of quantitative sequencing (ChIP-Seq/RNA-Seq/GRO-Seq/DNase-Seq/etc.)

– Can work with any organism

Regulatory element analysis

De novo Motif Discovery

– Sort out spatial relationships between
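
The "Hypergeometric" in HOMER's name refers to scoring motif enrichment in target sequences relative to background sequences. The slides do not show how HOMER computes this internally, so the following is only a textbook sketch of a one-sided hypergeometric test, with a hypothetical function name and made-up counts, not HOMER's own code.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_target, n_target_with_motif,
                                n_background, n_background_with_motif):
    """One-sided hypergeometric p-value for motif enrichment.

    Treats the target set as a draw (without replacement) from the pooled
    target + background sequences and returns P(X >= observed), where X is
    the number of motif-containing sequences landing in the target set.
    The function name and argument layout are illustrative assumptions.
    """
    total = n_target + n_background
    with_motif = n_target_with_motif + n_background_with_motif
    upper = min(n_target, with_motif)
    denominator = comb(total, n_target)
    tail = sum(
        comb(with_motif, i) * comb(total - with_motif, n_target - i)
        for i in range(n_target_with_motif, upper + 1)
    )
    return tail / denominator


# Hypothetical counts: 10 target peaks (5 with the motif),
# 10 background sequences (none with the motif).
p_value = hypergeom_enrichment_pvalue(10, 5, 10, 0)
print(p_value)
```

A smaller p-value means the motif is more over-represented in the targets than expected by chance. In a de novo setting this score would be recomputed for each candidate motif.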

(7)
(8)

HOMER Functionality

Any organism with a FASTA file can be analyzed with HOMER

Model organisms are preconfigured with annotation information:

– Human, mouse, rat, zebrafish, Drosophila, C. elegans, yeast, S. pombe, Arabidopsis

Genomes annotated on the UCSC Genome Browser are easy to incorporate, but any custom genome can be added with annotation files (e.g., GTF files)

(9)
(10)

Best way to develop NGS Analysis methods: Do it in the context of research!

Biology

NGS Methods Development

(11)

Interplay between epigenetics, spatial genome conformation, and transcription in B-lymphocyte development

(12)

Interplay between epigenetics, spatial genome conformation, and transcription in B-lymphocyte development

(13)

Why study the transition from pre-pro-B to pro-B cells?

Lineage commitment: pro-B cells cannot dedifferentiate back to hematopoietic stem cells, whereas pre-pro-B cells can still be used to reconstitute the whole immune system.

Antibody Recombination: pro-B cells are paused at the exact stage when VDJ recombination is set to occur.

B cell marker expression: key cell-surface markers and transcription factors are induced in pro-B cells, including CD19, Ebf1 (Early B cell factor), Pax5, and Foxo1.

(14)
(15)

Unbiased Discovery of Regulatory Features in pro-B cells

(16)

Relationship between Transcription Factors and Epigenetic Modifications

(17)

Unbiased Discovery of Lineage Determining Transcription Factors

(18)

Hi-C: Mapping 3D interactions in the genome

Hi-C method from Lieberman-Aiden et al., Science 2009

GRO-Seq

(19)

Most significant interactions in the genome occur at epigenetically modified locations

(20)

Cell-type specific interactions often change their DNA methylation status

(21)

Genome Organization into topological domains

TAD definition by Dixon et al. 2012

[Figure panels: pre-pro-B, pro-B]
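
The slide cites Dixon et al. 2012 for the TAD definition. That approach is usually described in terms of a directionality index (DI), which measures whether a genomic bin interacts more with its upstream or its downstream neighbourhood. The sketch below assumes that formulation and computes only the DI track on a toy contact matrix. The hidden Markov model step that turns DI into domain calls is omitted, and the function names, matrix and window size are all hypothetical.

```python
def directionality_index(upstream, downstream):
    """Directionality index for one bin.

    upstream / downstream are the summed contact counts between the bin and
    its neighbours within a fixed window on either side. Positive values
    mean a downstream bias, negative values an upstream bias.
    """
    a, b = float(upstream), float(downstream)
    if a == b:
        return 0.0  # also covers the case where both counts are zero
    expected = (a + b) / 2.0
    sign = 1.0 if b > a else -1.0
    return sign * ((a - expected) ** 2 / expected
                   + (b - expected) ** 2 / expected)


def di_track(matrix, window):
    """Compute the DI for every bin of a square contact matrix."""
    track = []
    for i in range(len(matrix)):
        upstream = sum(matrix[i][max(0, i - window):i])
        downstream = sum(matrix[i][i + 1:i + 1 + window])
        track.append(directionality_index(upstream, downstream))
    return track


# Toy 4-bin matrix with two domains: bins 0-1 and bins 2-3.
toy_matrix = [
    [0, 5, 1, 0],
    [5, 0, 1, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 0],
]
print(di_track(toy_matrix, window=2))
```

In this toy example the DI switches from negative (bin 1) to positive (bin 2), which is the signature expected at a domain boundary.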

(22)

CTCF binding site is directional

[Figure labels: 5’ boundary of TAD, 3’ boundary of TAD]

CTCF only makes interactions with other CTCF sites in a specific direction along the DNA, determined by the orientation of the motif
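
The directional rule can be illustrated with a small sketch that lists which pairs of CTCF sites are compatible. It assumes a convergent arrangement, in which a forward-strand ("+") motif pairs only with reverse-strand ("-") motifs located downstream of it. The slide states that orientation determines direction but not which strand points which way, so that mapping, the distance cap and the function name are assumptions made for illustration.

```python
def convergent_pairs(sites, max_distance):
    """Return (upstream_pos, downstream_pos) pairs of compatible CTCF sites.

    sites: iterable of (position, strand) tuples, strand being "+" or "-".
    A pair is reported when a "+" motif lies upstream of a "-" motif and
    the two are no more than max_distance apart (assumed convergent rule).
    """
    ordered = sorted(sites)
    pairs = []
    for i, (pos_a, strand_a) in enumerate(ordered):
        if strand_a != "+":
            continue  # reverse motifs only look upstream in this sketch
        for pos_b, strand_b in ordered[i + 1:]:
            if pos_b - pos_a > max_distance:
                break  # sites are sorted, so later ones are even farther
            if strand_b == "-":
                pairs.append((pos_a, pos_b))
    return pairs


# Hypothetical motif positions and strands.
example_sites = [(100, "+"), (200, "-"), (300, "+"), (400, "-"), (5000, "-")]
print(convergent_pairs(example_sites, max_distance=1000))
```

Under this assumption, a run of sites that all share one orientation can pair with a single oppositely oriented site on one side of them, but not with one another.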

(23)

Clusters of CTCF sites form ‘Super Anchors’

[Figure panels: pre-pro-B, pro-B]

(24)

Clusters of CTCF sites form ‘Super Anchors’

Borrowing from Richard Young’s Super Enhancer concept, we can define over 2500 CTCF ‘super anchors’ in the data.

Only 25% of CTCF sites are found at boundaries. However, nearly 50% of ‘Super Anchors’ are found at the boundaries of topological domains.

[Figure labels: Foxo1, Igh, Firre]
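
The slides do not say how 'super anchors' were actually called. One way to borrow from the super-enhancer idea is to stitch nearby CTCF sites into clusters, rank the clusters by total signal, and cut at the point where the rescaled rank-versus-signal curve has slope 1. The sketch below follows that general idea. The stitching distance, the input signals and the function names are hypothetical.

```python
def cluster_sites(sites, stitch_distance):
    """Merge (position, signal) sites that lie within stitch_distance."""
    clusters = []
    for position, signal in sorted(sites):
        if clusters and position - clusters[-1]["end"] <= stitch_distance:
            clusters[-1]["end"] = position
            clusters[-1]["signal"] += signal
            clusters[-1]["n_sites"] += 1
        else:
            clusters.append({"start": position, "end": position,
                             "signal": signal, "n_sites": 1})
    return clusters


def super_cutoff(signals):
    """Signal value at the slope-1 tangent point of the ranked curve.

    Rank and signal are both rescaled to [0, 1]. On a convex curve, the
    point where the slope equals 1 is the one maximizing (rank - signal).
    Clusters with a signal strictly above the returned value are 'super'.
    """
    ranked = sorted(signals)
    n, top = len(ranked), ranked[-1]
    if n < 2 or top <= 0:
        return top
    best = max(range(n), key=lambda i: i / (n - 1) - ranked[i] / top)
    return ranked[best]


# Hypothetical per-cluster signals: many weak clusters, a few strong ones.
cluster_signals = [1, 1, 1, 2, 2, 3, 4, 50, 80, 100]
cutoff = super_cutoff(cluster_signals)
print(cutoff, [s for s in cluster_signals if s > cutoff])
```

On these made-up numbers, the three strongest clusters fall above the cutoff and would be labelled 'super'.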

(25)

Overview of Immunoglobulin Heavy Chain Locus

(26)

Igh Locus in the Genome (~3 Mb)

(27)

Igh Locus in the Genome (~3 Mb)

Top Super Anchor

To generate full repertoires of antibodies, each V region needs to find a way to interact with the D regions to recombine

(28)

V regions in Igh locus are associated with CTCF sites

In addition, each CTCF site associated with V regions is in a consistent orientation
(29)
(30)

Igh Locus Model

[Figure labels: Top Super Anchor (looping backstop); VD recombination target]

(31)

Summary

NGS is a lot more than genome sequencing

Integration of different data types empowers discovery where any given data type alone falls short

The DNA sequence (CTCF motifs and their orientation) dictates the structure of the genome to accomplish critical tasks such as VDJ recombination

(32)
(33)
(34)
