• No results found

Copy Number Variation: available tools

N/A
N/A
Protected

Academic year: 2021

Share "Copy Number Variation: available tools"

Copied!
29
0
0

Loading.... (view fulltext now)

Full text

(1)

Copy Number Variation: available tools

Jeroen F. J. Laros

Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics

(2)

Introduction

A literature review of available tools.

(3)

Introduction

A literature review of available tools.

Contents:

• Coverage based. • Breakpoint based. • Discordant pairs based. • De novo insertions. • Existing pipelines.

(4)

Coverage based methods

(5)

Coverage based methods

MoGUL: • Lee et al.

[email protected]

• Common insertions and deletions in a population. • Clustering / Bayesian network.

• Minor Allele Frequency (MAF) for each variant. • Especially suitable for MAF>0.06 and indels>20bp.

• Assigns confidence to variant calls.

• Proposes mrFast as initial aligner, others may work too?

(6)

Coverage based methods

FREEC:

• Boeva et al.

http://bioinfo-out.curie.fr/projects/freec • Window size automatically optimised.

• Works with either a control dataset. . . • or usesGCcontent.

(7)

Coverage based methods

FREEC:

• Boeva et al.

http://bioinfo-out.curie.fr/projects/freec • Window size automatically optimised.

• Works with either a control dataset. . . • or usesGCcontent.

In Boeva et al., there is a good comparison: • CNV-seq.

• SegSeq. • RDXplorer.

(8)

Coverage based methods

FREEC:

• Boeva et al.

http://bioinfo-out.curie.fr/projects/freec • Window size automatically optimised.

• Works with either a control dataset. . . • or usesGCcontent.

In Boeva et al., there is a good comparison: • CNV-seq.

• SegSeq. • RDXplorer.

(9)

Coverage based methods

Name FREEC CNV-seq SegSeq RDXplorer

Refs. (Xie and

Tammi, 2009) (Chiang et al., 2009) (Yoon et al., 2009) Compatibility Paired-ends/Mate-pairs + - - + Input format SAM, bowtie,

eland, BED, arachne, psl, SOAP BLAT psl, SOLiD arachne BAM

Genomes Any Any Any Human

Strategy

Fixed length moving window + statistical approach + + - + Automatic evaluation of window size + + +* -GC-content normaliza-tion + - - + Normalization with a control dataset + + + -Refinement of breakpoint boundaries - -+

-Table 1: Tools for CNA/CNV prediction. (Boeva et al. 2010) * Window size is selected separately for each region according to the number of reads in this region in the control sample.

(10)

Coverage based methods

Name FREEC CNV-seq SegSeq RDXplorer

Refs. (Xie and

Tammi, 2009) (Chiang et al., 2009) (Yoon et al., 2009) Compatibility

Minimal number of reads Any Any Any High (>1.2x) Segmentation of profiles + - + + Automatic annotation of

gain/loss regions +

- - +

Copy Number

predic-tion** +

- - +***

Output

CNA coordinates in one file

+ - - +

Graphical output +**** + - -Programming language C/C++ Perl/R Matlab Java/R Computational efficiency

Memory usage Low Low Low >2GB of RAM Running time***** 45s w/control

sample 105s w/o control

422s 250s N/A

Table 1: Tools for CNA/CNV prediction. (Boeva et al. 2010) ** Assignment of copy number changes to gain or loss.

*** Only for diploid genomes.

**** It is possible to visualize resulted copy number profiles and CNAs with a script R (makeGraph.R) included in the package.

***** Preprocessed HCC1143 data from (Chiang et al., 2009) on a 8-core 64-bit Linux machine.

(11)

Breakpoint based methods

(12)

Breakpoint based methods

GMAP / GSNAP: • Wu et al.

http://research-pub.gene.com/gmap • Index based (global search).

• Mapping RNA or ESTs.

• Can use a database, can do without. • SNP-tolerant alignment.

• Potentially suitable for detection of all breakpoints.

(13)

Breakpoint based methods

GMAP / GSNAP: • Wu et al.

http://research-pub.gene.com/gmap • Index based (global search).

• Mapping RNA or ESTs.

• Can use a database, can do without. • SNP-tolerant alignment.

• Potentially suitable for detection of all breakpoints. BWA-SW:

• Li et al.

http://bio-bwa.sourceforge.net • BWA for long-read alignment.

• Chimeric reads.

(14)

Discordant pairs methods

(15)

Discordant pairs methods

Breakway: • Clark et al.

http://breakway.sourceforge.net

• Clustering of discordantly mapped paired-end reads. • BAM input.

• Designed for the use in pipelines.

(16)

Discordant pairs methods

Breakway: • Clark et al.

http://breakway.sourceforge.net

• Clustering of discordantly mapped paired-end reads. • BAM input.

• Designed for the use in pipelines. GASV: • Sindi et al. • http://code.google.com/p/gasv • Deletions, Invertions. • Translocations. • No insertions. • BAM input. • Poor documentation.

(17)

Discordant pairs methods

HYDRA SV: • Quinlan et al.

http://code.google.com/p/hydra-sv • Clustering of discordant pairs.

• No classifications, but can be done with BEDTools. • Future plans:

• Denovo assembly of insertions. • Split reads.

• Automatic annotation.

• Toolkit includes useful tools likebamToFastq.

(18)

Discordant pairs methods

HYDRA SV: • Quinlan et al.

http://code.google.com/p/hydra-sv • Clustering of discordant pairs.

• No classifications, but can be done with BEDTools. • Future plans:

• Denovo assembly of insertions. • Split reads.

• Automatic annotation.

• Toolkit includes useful tools likebamToFastq.

• Possibly useful when we want to realign unmapped

reads.

(19)

De novo insertions

(20)

De novo insertions

TIGRA SV: • Chen et al.

http://tigrasv.sourceforge.net • Targeted local assembly.

• Iterative graph routing assembly. • Optional breakpoint library. • BAM input.

(21)

De novo insertions

TIGRA SV: • Chen et al.

http://tigrasv.sourceforge.net • Targeted local assembly.

• Iterative graph routing assembly. • Optional breakpoint library. • BAM input.

Next-generation VariationHunter: • Hormozdiari et al.

http://compbio.cs.sfu.ca/strvar.htm • Transposon insertion discovery.

• Discordant pairs / conflict resolution. • >85%discovery.

• >90%accuracy.

• DIVET input format.

(22)

De novo insertions

NovelSeq:

• Hajirasouliha et al.

• Long novel sequence insertions. • In essence a pipeline:

1. Alignment: mrFast.

2. Assembler: EULER-SR / ABySS. 3. Contamination filter: BLAST. 4. Clustering / anchoring: mrCAR.

5. Local assembly: mrSAAB on the output of 3 and 4. • SAM / DIVET input.

(23)

Miscellaneous

(24)

Miscellaneous

Dindel:

• Albers et al.

http://sites.google.com/site/keesalbers • Localized reassembly/realignment.

• Multiple BAM files.

• Estimate haplotype frequencies.

(25)

Miscellaneous

Mosaik:

• Lee & Str¨omberg.

https://github.com/wanpinglee/MOSAIK • Smith-Waterman gapped alignment.

• Discordantly mapped paired-end reads. • Reference-guided assemblies. • Toolkit includes: • MosaikBuild. • MosaikAligner. • MosaikSort. • MosaikAssembler.

• Input is raw sequence data.

(26)

Existing pipelines

(27)

Existing pipelines

SVMerge: • Wong et al.

http://svmerge.sourceforge.net • BreakDancer, Pindel, SE cluster, RDXplorer,

RetroSeq, . . . • Expandable. • BAM input.

(28)

Existing pipelines

Figure 1: SVMerge pipeline.

(29)

Questions?

Acknowledgements:

Martijn Vermaat Maarten van Iterson

References

Related documents

families in the comparison population of a large genetics study. Subjects were classified into four groups based on their lipid status: 1) isolated LDL-C disorder, defined by a

Information communications technologies (ICTs) have become an integral part of education the world over. ICTs is an umbrella term used to describe communication

The current review gives past development and the basis for selection of a significant combination of Aceclofenac, a COX-2 inhibitor and Drotaverine HCl, an antispasmodic agent

In this investigation, a mixture of sugarcane bagasse and rubber seed oil was subjected to pyrolysis for liquid fuel production, using aluminosilicates with different Si/Al ratios

On the other hand, MAP precipitation is known as the best approach to heavy metals and phosphate recovery, which is related to the nucleation and growth of struvite crystals in

The aim of this study was to determine the best polyurethane on the tensile strength, impact strength, elongation at break, water absorption, characterization of Fourier

Particularly the 2010 cohort was found to have significantly higher academic achievement than the 2011 cohort (p < 0.05) and a higher proportion of students progressing

ABA tests quantify the balance net acid producing potential (NAPP) between potentially acid- generating potential (MPA), particularly the oxidation of sulfide materials,