Next Generation Sequencing Data Visualization


(1)

Next Generation Sequencing Data Visualization

GBrowse2 from GMOD

Andreas Gisel

Institute for Biomedical Technologies

CNR

(2)

GMOD is the Generic Model Organism Database project

GMOD is a collection of interconnected applications and databases that biologists use as repositories and as tools.

That connectivity is really the key here.

There's no lack of tools, but many of these tools will be little used since the typical prospective user may not have the resources or expertise required to install the tool and connect it, in some way, to the data in hand.

http://www.gmod.org/

(3)

The tutorial will show:

- how to display a reference sequence with features in GBrowse2

- how to display Next Generation Sequencing (NGS) mapping data

We will use

Escherichia coli str. K-12 substr. DH10B, complete genome - NC_010473.1 (ref1.fa)

and

(4)

http://localhost/cgi-bin/gb2/gbrowse/yeast_advanced/

GBrowse2

(5)

http://localhost/cgi-bin/gb2/gbrowse/yeast_advanced/

(6)

/etc/gbrowse2/

GBrowse2

(7)

/var/lib/gbrowse2/databases

(8)

GBrowse2

Set up GBrowse for E. coli data

We have:

Reference sequence data in FASTA

Annotation for the reference sequence in GFF

(9)

GBrowse2

We need:

E. coli configuration file

Add E. coli information to the general GBrowse.conf file

(10)

Annotation Data Formats

EMBL annotation files

GenBank annotation files

GFF files

(11)

GenBank format

LOCUS       HQ336405     157790 bp    DNA     circular PLN 22-DEC-2010
DEFINITION  Prunus persica chloroplast, complete genome.
ACCESSION   HQ336405
VERSION     HQ336405.1  GI:309321413
KEYWORDS    .
SOURCE      chloroplast Prunus persica (peach)
  ORGANISM  Prunus persica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            rosids; fabids; Rosales; Rosaceae; Maloideae; Amygdaleae; Prunus.
REFERENCE   1  (bases 1 to 157790)
  AUTHORS   Jansen,R.K., Saski,C., Lee,S.B., Hansen,A.K. and Daniell,H.
  TITLE     Complete Plastid Genome Sequences of Three Rosids (Castanea,
            Prunus, Theobroma): Evidence for At Least Two Independent
            Transfers of rpl22 to the Nucleus
  JOURNAL   Mol. Biol. Evol. 28 (1), 835-847 (2011)
   PUBMED   20935065
REFERENCE   2  (bases 1 to 157790)
  AUTHORS   Jansen,R.K., Saski,C., Lee,S.-B., Hansen,A.K. and Daniell,H.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-SEP-2010) Integrative Biology, University of Texas
            at Austin, 1 University Station C0930, Austin, TX 78712, USA

(12)

GenBank format

FEATURES             Location/Qualifiers
     source          1..157790
                     /organism="Prunus persica"
                     /organelle="plastid:chloroplast"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:3760"
     gene            complement(join(99804..100597,71346..71459))
                     /gene="rps12"
                     /trans_splicing
     CDS             complement(join(99804..99829,100366..100597,71346..71459))
                     /gene="rps12"
                     /trans_splicing
                     /codon_start=1
                     /transl_table=11
                     /product="ribosomal protein S12"
                     /protein_id="ADO64999.1"
                     /db_xref="GI:309321458"
                     /translation="MPTIKQLIRNTRQPIRNVTKSPALGGCPQRRGTCTRVYTITPKK
                     PNSALRKVARVRLTSGFEITAYIPGIGHNLQEHSVVLVRGGRVKDLPGVRYHIVRGTL
                     DAVGVKDRQQGRSKYGVKKPK"

Saturday, March 17, 2012

(13)

GenBank format

ORIGIN
        1 tgggcgaacg acgggaattg aacccgcgca tggtggattc acaatccact gccttgatcc
       61 acttggctac atccgcccct tatactatta caaatattta caccatttat cattacttgt
      121 aagataaaat acaacataaa ataaactgaa acttttaata ttttaattaa attttgtagt
      181 aaattaacta aaaaaaaata tagaacaaaa caatatagta aagttaagta gtaaataaaa
      241 aaaatactaa atagtaaagg agcaataaca aacctcttga tataacaaga aatttattat
      301 tgctccttta ctttcaagaa ctcctatata ctaagaccaa agtcttatcc atttatagat
      361 ggaacttcaa cagcagctag atctagaggg aaattatggg cattacgttc atgcataact
      421 tccataccaa ggttagcgcg gttaataata tcagcccaag tattaattac acgaccctga
      481 ctatcaacta cagattgatt gaaattaaaa ccatttaagt tgaaagccat agtgctgata
      541 cctaaagcgg taaaccagat acctactaca ggccaagcag ctaggaagaa atgtaaagaa
      601 cgagaattgt tgaaactagc atattggaag atcaatcggc caaaataacc atgagcggct
      661 acgatattat aggtttcttc ctcttgaccg aatctgtaac cttcattagc agattcattt
      721 tctgtggttt ccctgatcaa actagaggtt accaaggacc catgcatagc actgaatagg
      781 gagccgccga atacaccagc tacgcctaac atgtgaaatg ggtgcataag gatgttgtgc
      841 tcggcttgga atacaatcat gaagttgaaa gtaccggaga ttcctagggg cataccgtca
      901 gaaaagcttc cttgaccaat tggatatatc aagaaaacag cagtagcagc tgcaacagga
      961 gctgaatatg caacagcaat ccaagggcgc atacccagac ggaaactaag ttcccactca
     1021 cgacccatgt agcaagctac accaagtaag aagtgtagaa caattagttc ataaggacca
     1081 ccgttgtata accattcatc aacggaagcc gcttcccata tcgggtaaaa gtgcaaacct
     1141 atagctgcag aggtaggaat aatggcacca gaaataatat tgtttccata aagtaaagat
     1201 ccagaaacag gttcacgaat accatcaata tctactggag gtgcagcaat gaaagcaata

(14)

GFF format

##gff-version 3

# sequence-region HQ336405 1 157790
# conversion-by bp_genbank2gff3.pl
# organism Prunus persica
# date 22-DEC-2010
# Note Prunus persica chloroplast, complete genome.
# working on region:HQ336405, Prunus persica, 22-DEC-2010, Prunus persica chloroplast, complete genome.
# Possible gene unflattening error with HQ336405: consult STDERR
HQ336405	GenBank	region	1	157790	.	+	.	ID=HQ336405;Dbxref=taxon:3760;Note=Prunus persica chloroplast complete genome;date=22-DEC-2010;mol_type=genomic DNA;organelle=plastid:chloroplast;organism=Prunus persica
HQ336405	GenBank	CDS	99804	99829	.	-	.	ID=rps12;Dbxref=GI:309321458;codon_start=1;gene=rps12;product=ribosomal protein S12;protein_id=ADO64999.1;trans_splicing=_no_value;transl_table=11;translation=length.123
HQ336405	GenBank	CDS	100366	100597	.	-	.	ID=rps12;Dbxref=GI:309321458;codon_start=1;gene=rps12;product=ribosomal protein S12;protein_id=ADO64999.1;trans_splicing=_no_value;transl_table=11;translation=length.123
HQ336405	GenBank	CDS	71346	71459	.	-	.	ID=rps12;Dbxref=GI:309321458;codon_start=1;gene=rps12;product=ribosomal protein S12;protein_id=ADO64999.1;trans_splicing=_no_value;transl_table=11;translation=length.123
HQ336405	GenBank	gene	99804	100597	.	-	.	ID=rps12;gene=rps12;trans_splicing=_no_value
HQ336405	GenBank	gene	71346	71459	.	-	.	ID=rps12;gene=rps12;trans_splicing=_no_value
HQ336405	GenBank	gene	3	77	.	-	.	ID=trnH-GUG;gene=trnH-GUG
HQ336405	GenBank	tRNA	3	77	.	-	.	ID=trnH-GUG.r01;Parent=trnH-GUG;Note=anticodon:GUG;gene=trnH-GUG;product=tRNA-His
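Because GFF3 is a nine-column, tab-separated format, a converted file can be checked quickly before it is loaded into GBrowse2. The following is a minimal sketch, not part of the original tutorial: it writes a shortened copy of the records above to a scratch file (the file name sample.gff3 is hypothetical) and tallies the feature types in column 3.

```shell
#!/bin/sh
# Sketch: tally the feature types (column 3) of a GFF3 file.
# sample.gff3 is a hypothetical scratch file; its rows are shortened
# copies of the records shown on the slide.
gff="${TMPDIR:-/tmp}/sample.gff3"
{
  printf '##gff-version 3\n'
  printf 'HQ336405\tGenBank\tregion\t1\t157790\t.\t+\t.\tID=HQ336405\n'
  printf 'HQ336405\tGenBank\tCDS\t99804\t99829\t.\t-\t.\tID=rps12\n'
  printf 'HQ336405\tGenBank\tCDS\t100366\t100597\t.\t-\t.\tID=rps12\n'
  printf 'HQ336405\tGenBank\tgene\t3\t77\t.\t-\t.\tID=trnH-GUG\n'
  printf 'HQ336405\tGenBank\ttRNA\t3\t77\t.\t-\t.\tID=trnH-GUG.r01;Parent=trnH-GUG\n'
} > "$gff"

# Skip comment and directive lines, keep only rows with exactly nine
# tab-separated columns, and count how often each feature type occurs.
counts=$(awk -F'\t' '!/^#/ && NF == 9 { n[$3]++ } END { for (t in n) print t, n[t] }' "$gff" | sort)
printf '%s\n' "$counts"
```

A row that does not show up in the tally most likely lost a tab during conversion or editing, which is worth fixing before the file is copied into the database directory.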

(15)

GBrowse2

Create and configure the tracks visualizing the data of E. coli

[GENERAL]
description      = Escherichia coli str. K-12 substr. DH10B
database         = annotations

initial landmark = NC_010473.1:1..10000

# bring in the special Submitter plugin for the rubber-band select menu
plugins          = FastaDumper RestrictionAnnotator SequenceDumper TrackDumper Submitter S

autocomplete     = 1

default tracks   = Genes ORFs tRNAs CDS Transp Centro:overview GC:region

# examples to show in the introduction

(16)

GBrowse2

Create and configure the tracks visualizing the data of E. coli

#################################
# database definitions
#################################

[scaffolds:database]
db_adaptor     = Bio::DB::SeqFeature::Store
db_args        = -adaptor memory
                 -dir /var/lib/gbrowse2/databases/ecoli_seq
search options = default +autocomplete

[annotations:database]
db_adaptor     = Bio::DB::SeqFeature::Store
db_args        = -adaptor memory
                 -dir /var/lib/gbrowse2/databases/ecoli_annotations
search options = default +autocomplete

(17)

GBrowse2

Create and configure the tracks visualizing the data of E. coli

# Default glyph settings
[TRACK DEFAULTS]
glyph         = generic
database      = annotations
height        = 8
bgcolor       = cyan
fgcolor       = black
label density = 25
bump density  = 100
show summary  = 99999 # go into summary mode when zoomed out to 100k
# default pop-up balloon
balloon hover = <b>$name</b> is a $type spanning $ref from $start to $end. Click for more details.

[CDS]
feature       = gene
glyph         = cds
description   = 0
height        = 26
sixframe      = 1
label         = sub {shift->name . " reading frame"}
key           = CDS
balloon click width = 500
balloon hover width = 350
balloon hover = <b>$name</b> is a $type spanning $ref from $start to $end. Click to search Google for $name.
balloon click = http://www.google.com/search?q=$name

(18)

GBrowse2

Insert E. coli into the general GBrowse config file

##############################################################################
#
# DATASOURCE DEFINITIONS
# One stanza for each configured data source
#
##############################################################################

[yeast]
description = Yeast chromosomes 1+2 (basic)
path        = yeast_simple.conf

[yeast_advanced]
description = Yeast chromosomes 1+2 (advanced)
path        = yeast_chr1+2.conf

[ecoli]
description = Escherichia coli str. K-12 substr. DH10B
path        = ecoli.conf
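The new stanza can be added in a text editor, or appended from the shell. The following is a minimal sketch under stated assumptions: CONF defaults to an empty scratch file so that the snippet can be tried safely, while on the tutorial machine the target would presumably be GBrowse.conf under /etc/gbrowse2/. The check before appending is an addition of this sketch and keeps the stanza from being written twice.

```shell
#!/bin/sh
# Sketch: append the [ecoli] data source stanza only if it is not there yet.
# Assumption: CONF defaults to a scratch file; point it at the real
# GBrowse.conf (presumably /etc/gbrowse2/GBrowse.conf) once the result looks right.
CONF="${CONF:-$(mktemp)}"

if ! grep -q '^\[ecoli\]' "$CONF"; then
  cat >> "$CONF" <<'EOF'

[ecoli]
description = Escherichia coli str. K-12 substr. DH10B
path        = ecoli.conf
EOF
fi

# Show the stanza as it now appears in the file.
grep -A 2 '^\[ecoli\]' "$CONF"
```

Running the snippet a second time against the same file leaves it unchanged, because the stanza header is already present.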

(19)

GBrowse2

Add data to “databases”

#################################
# database definitions
#################################

[scaffolds:database]
db_adaptor     = Bio::DB::SeqFeature::Store
db_args        = -adaptor memory
                 -dir /var/lib/gbrowse2/databases/ecoli_seq
search options = default +autocomplete

[annotations:database]
db_adaptor     = Bio::DB::SeqFeature::Store
db_args        = -adaptor memory
                 -dir /var/lib/gbrowse2/databases/ecoli_annotations
search options = default +autocomplete

Create:

/var/lib/gbrowse2/databases/ecoli_seq

(20)

GBrowse2

Add data to “databases”

Move to:

/var/lib/gbrowse2/databases/ecoli_seq
- ref1.fa
- chromosomes.gff3

/var/lib/gbrowse2/databases/ecoli_annotations
- NC_010473.gff
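The layout above can be sketched in shell as follows. This is an illustration under stated assumptions rather than the tutorial's own script: DB_ROOT defaults to a scratch directory so the snippet runs without root privileges (the slides use /var/lib/gbrowse2/databases), SRC is a hypothetical directory holding the three input files, and the files are copied instead of moved so that the originals stay in place.

```shell
#!/bin/sh
# Sketch: create both database directories and place the input files in them.
# Assumptions: DB_ROOT is a scratch directory by default (the slides use
# /var/lib/gbrowse2/databases); SRC is a hypothetical source directory.
DB_ROOT="${DB_ROOT:-$(mktemp -d)}"
SRC="${SRC:-.}"

mkdir -p "$DB_ROOT/ecoli_seq" "$DB_ROOT/ecoli_annotations"

copy_if_present() {
  # $1 = file name inside SRC, $2 = target directory
  if [ -f "$SRC/$1" ]; then
    cp "$SRC/$1" "$2/"
  else
    echo "skipping $1: not found in $SRC" >&2
  fi
}

copy_if_present ref1.fa          "$DB_ROOT/ecoli_seq"
copy_if_present chromosomes.gff3 "$DB_ROOT/ecoli_seq"
copy_if_present NC_010473.gff    "$DB_ROOT/ecoli_annotations"

# List what ended up where.
ls -R "$DB_ROOT"
```

The directory names must match the -dir arguments in the [scaffolds:database] and [annotations:database] stanzas of ecoli.conf.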

(21)

GBrowse2

(22)

Create mapping data

Map with bowtie:

index:

bowtie-build -f ref1.fa ref1

map:

bowtie -n 1 -l 30 -I 0 -X 400 --un unmapped -p 2 -S ref/ref1 -1 illumina/reads1.fq -2 illumina/reads2.fq > pair.sam

SAM to BAM:

index:

samtools faidx ref1.fa

SAM to BAM:

samtools import ref1.fa.fai pair.sam pair.bam

sort BAM:

samtools sort pair.bam pair_sorted

index BAM:

samtools index pair_sorted.bam
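The samtools calls above use the legacy 0.1.x command forms. Newer samtools 1.x releases expect the sort output to be named with -o and may not accept the old import form, so the following sketch shows what the same four steps could look like there. It is an assumption-laden illustration, not the tutorial's method: the file names follow the slide, the helper names run and sam_to_sorted_bam are hypothetical, and DRY_RUN=1 (the default here) only prints the commands so the snippet can be read and tried without samtools installed.

```shell
#!/bin/sh
# Sketch: SAM -> sorted, indexed BAM in samtools 1.x syntax.
# DRY_RUN=1 (default) prints each command; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

sam_to_sorted_bam() {
  # $1 = reference FASTA, $2 = SAM file, $3 = output prefix
  run samtools faidx "$1"                           # writes $1.fai
  run samtools view -b -o "${3}.bam" "$2"           # SAM -> BAM; relies on the header in the SAM file
  run samtools sort -o "${3}_sorted.bam" "${3}.bam" # coordinate-sorted BAM
  run samtools index "${3}_sorted.bam"              # writes ${3}_sorted.bam.bai
}

plan=$(sam_to_sorted_bam ref1.fa pair.sam pair)
printf '%s\n' "$plan"
```

The view step assumes that pair.sam carries a header with the reference sequence names, which bowtie writes when called with -S; the resulting pair_sorted.bam and pair_sorted.bam.bai are the files GBrowse2 needs in the next step.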

(23)

Modify the ecoli.conf

Add database:

[ecolisam:database]
db_adaptor     = Bio::DB::Sam
db_args        = -fasta /var/lib/gbrowse2/databases/ecolisam/ref1.fa
                 -bam /var/lib/gbrowse2/databases/ecolisam/pair_sorted.bam
search options = none

Add tracks:

[CoverageXyplot]
feature       = coverage
glyph         = wiggle_xyplot
database      = ecolisam
height        = 50
fgcolor       = black
bicolor_pivot = 20
pos_color     = blue
neg_color     = red

(24)

Create mapping database

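Create the directory ecolisam in /var/lib/gbrowse2/databases.

Copy pair_sorted.bam and pair_sorted.bam.bai to ecolisam, together with ref1.fa, which the -fasta argument in ecoli.conf expects in the same directory.

Set the permissions on ecolisam so that the web server running the browser can read the files.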
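These steps can be sketched in shell as follows. The snippet is illustrative only: DB_ROOT defaults to a scratch directory (the slides use /var/lib/gbrowse2/databases), SRC is a hypothetical directory holding the mapping output, and making the files world-readable is just one way to give the web server access.

```shell
#!/bin/sh
# Sketch: create ecolisam, copy the mapping files into it, make them readable.
# Assumptions: DB_ROOT is a scratch directory by default; SRC is a
# hypothetical directory containing the bowtie/samtools output.
DB_ROOT="${DB_ROOT:-$(mktemp -d)}"
SRC="${SRC:-.}"
TARGET="$DB_ROOT/ecolisam"

mkdir -p "$TARGET"

for f in ref1.fa pair_sorted.bam pair_sorted.bam.bai; do
  if [ -f "$SRC/$f" ]; then
    cp "$SRC/$f" "$TARGET/"
  else
    echo "warning: $SRC/$f not found" >&2
  fi
done

# a+rX: read permission for everyone, plus search (x) permission on directories.
chmod -R a+rX "$TARGET"
ls -ld "$TARGET"
```

On a shared machine a narrower choice, such as giving the directory to the web server's own account, may be preferable to world-readable files; which account that is depends on the installation.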

(25)
