Next Generation Sequencing
Data Visualization
GBrowse2 from GMOD
Andreas Gisel
Institute for Biomedical Technologies
CNR
GMOD is the Generic Model Organism Database project
• GMOD is a collection of interconnected applications and databases that biologists use as repositories and as tools.
• That connectivity is really the key here.
• There is no lack of tools, but many of them end up little used because the typical prospective user may not have the resources or expertise required to install a tool and connect it to the data at hand.
http://www.gmod.org/
The tutorial will show:
• how to display a reference sequence with its features in GBrowse2
• how to display Next Generation Sequencing (NGS) mapping data.
We will use
Escherichia coli str. K-12 substr. DH10B, complete genome - NC_010473.1 (ref1.fa)
and the yeast_advanced data source at
http://localhost/cgi-bin/gb2/gbrowse/yeast_advanced/
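GBrowse2 uses the sequence IDs in the FASTA file as landmark names. Before writing the configuration, it is therefore worth checking which ID ref1.fa actually carries. The helper below is a minimal sketch. It assumes only that the file is ordinary FASTA, and the name `fasta_ids` is illustrative.

```python
from typing import Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the sequence ID (first word after '>') of every FASTA header."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            # An empty header line yields an empty ID rather than an error.
            ids.append(header.split()[0] if header else "")
    return ids
```

To use it on the real file, run `with open("ref1.fa") as handle: print(fasta_ids(handle))`. Whatever ID it reports is the name the E. coli configuration must use for its landmarks.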
GBrowse2
Configuration files: /etc/gbrowse2/
Database files: /var/lib/gbrowse2/databases
GBrowse2
Set up GBrowse2 for the E. coli data
We have:
Reference sequence data in FASTA
Annotation for the reference sequence in GFF
We need:
an E. coli configuration file
to add the E. coli information to the general GBrowse.conf file
Annotation Data Formats
EMBL annotation files
GenBank annotation files
GFF files
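Each of the three formats can usually be recognised from its first line:
- GFF3 files open with `##gff-version`.
- GenBank flat files open with `LOCUS`.
- EMBL flat files open with an `ID` line.

A minimal Python sketch of that heuristic follows. The function names are hypothetical, and it is a quick check rather than a validator.

```python
def guess_annotation_format(first_line: str) -> str:
    """Guess EMBL, GenBank or GFF from a file's first line (heuristic only)."""
    line = first_line.lstrip("\ufeff").rstrip("\r\n")
    if line.startswith("##gff-version"):
        return "GFF"
    if line.startswith("LOCUS"):
        return "GenBank"
    if line.startswith("ID   "):  # EMBL flat files open with an "ID" line
        return "EMBL"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first line of a file and classify it."""
    with open(path) as handle:
        return guess_annotation_format(handle.readline())
```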
GenBank format
LOCUS       HQ336405              157790 bp    DNA     circular PLN 22-DEC-2010
DEFINITION  Prunus persica chloroplast, complete genome.
ACCESSION   HQ336405
VERSION     HQ336405.1  GI:309321413
KEYWORDS    .
SOURCE      chloroplast Prunus persica (peach)
  ORGANISM  Prunus persica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
            rosids; fabids; Rosales; Rosaceae; Maloideae; Amygdaleae; Prunus.
REFERENCE   1  (bases 1 to 157790)
  AUTHORS   Jansen,R.K., Saski,C., Lee,S.B., Hansen,A.K. and Daniell,H.
  TITLE     Complete Plastid Genome Sequences of Three Rosids (Castanea,
            Prunus, Theobroma): Evidence for At Least Two Independent
            Transfers of rpl22 to the Nucleus
  JOURNAL   Mol. Biol. Evol. 28 (1), 835-847 (2011)
   PUBMED   20935065
REFERENCE   2  (bases 1 to 157790)
  AUTHORS   Jansen,R.K., Saski,C., Lee,S.-B., Hansen,A.K. and Daniell,H.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-SEP-2010) Integrative Biology, University of Texas
            at Austin, 1 University Station C0930, Austin, TX 78712, USA
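The LOCUS line packs several fields into one line: the record's name, length, molecule type, topology, division and date. The Python sketch below splits it on whitespace to illustrate this. It is not a robust parser, because older records, or records without a topology field, have a different number of fields. For real work, a library parser such as Biopython's `SeqIO.read(handle, "genbank")` is the safer choice.

```python
from typing import Dict, Union


def parse_locus(line: str) -> Dict[str, Union[str, int]]:
    """Split a LOCUS line laid out like the HQ336405 record into named fields."""
    tokens = line.split()
    # Expected layout: LOCUS name length bp mol_type topology division date
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS layout: {line!r}")
    return {
        "name": tokens[1],
        "length": int(tokens[2]),
        "mol_type": tokens[4],
        "topology": tokens[5],
        "division": tokens[6],
        "date": tokens[7],
    }
```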
FEATURES             Location/Qualifiers
     source          1..157790
                     /organism="Prunus persica"
                     /organelle="plastid:chloroplast"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:3760"
     gene            complement(join(99804..100597,71346..71459))
                     /gene="rps12"
                     /trans_splicing
     CDS             complement(join(99804..99829,100366..100597,71346..71459))
                     /gene="rps12"
                     /trans_splicing
                     /codon_start=1
                     /transl_table=11
                     /product="ribosomal protein S12"
                     /protein_id="ADO64999.1"
                     /db_xref="GI:309321458"
                     /translation="MPTIKQLIRNTRQPIRNVTKSPALGGCPQRRGTCTRVYTITPKK
                     PNSALRKVARVRLTSGFEITAYIPGIGHNLQEHSVVLVRGGRVKDLPGVRYHIVRGTL
                     DAVGVKDRQQGRSKYGVKKPK"
Saturday, March 17, 2012
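A location such as `complement(join(99804..99829,100366..100597,71346..71459))` describes one feature made of several segments on the minus strand. When the record is converted to GFF, each segment becomes its own row.

The small recursive parser below is a minimal Python sketch of that idea. Deliberately, it handles only plain `N..M` ranges, `join(...)` and `complement(...)`. Fuzzy ends (`<`, `>`), `order(...)` and references to other entries are not supported.

```python
import re
from typing import List, Tuple

_RANGE = re.compile(r"(\d+)\.\.(\d+)")


def _split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_location(loc: str, strand: int = 1) -> List[Tuple[int, int, int]]:
    """Turn a simple GenBank location into (start, end, strand) segments, in listed order."""
    loc = loc.strip()
    if loc.startswith("complement(") and loc.endswith(")"):
        # complement() flips the strand of everything inside it.
        return parse_location(loc[len("complement("):-1], -strand)
    if loc.startswith("join(") and loc.endswith(")"):
        segments = []
        for part in _split_top_level(loc[len("join("):-1]):
            segments.extend(parse_location(part, strand))
        return segments
    match = _RANGE.fullmatch(loc)
    if match:
        return [(int(match.group(1)), int(match.group(2)), strand)]
    raise ValueError(f"unsupported location: {loc!r}")
```

For the rps12 CDS, this yields three minus-strand segments: 99804..99829, 100366..100597 and 71346..71459. These correspond to the three CDS rows in the GFF version of the record.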
ORIGIN
        1 tgggcgaacg acgggaattg aacccgcgca tggtggattc acaatccact gccttgatcc
       61 acttggctac atccgcccct tatactatta caaatattta caccatttat cattacttgt
      121 aagataaaat acaacataaa ataaactgaa acttttaata ttttaattaa attttgtagt
      181 aaattaacta aaaaaaaata tagaacaaaa caatatagta aagttaagta gtaaataaaa
      241 aaaatactaa atagtaaagg agcaataaca aacctcttga tataacaaga aatttattat
      301 tgctccttta ctttcaagaa ctcctatata ctaagaccaa agtcttatcc atttatagat
      361 ggaacttcaa cagcagctag atctagaggg aaattatggg cattacgttc atgcataact
      421 tccataccaa ggttagcgcg gttaataata tcagcccaag tattaattac acgaccctga
      481 ctatcaacta cagattgatt gaaattaaaa ccatttaagt tgaaagccat agtgctgata
      541 cctaaagcgg taaaccagat acctactaca ggccaagcag ctaggaagaa atgtaaagaa
      601 cgagaattgt tgaaactagc atattggaag atcaatcggc caaaataacc atgagcggct
      661 acgatattat aggtttcttc ctcttgaccg aatctgtaac cttcattagc agattcattt
      721 tctgtggttt ccctgatcaa actagaggtt accaaggacc catgcatagc actgaatagg
      781 gagccgccga atacaccagc tacgcctaac atgtgaaatg ggtgcataag gatgttgtgc
      841 tcggcttgga atacaatcat gaagttgaaa gtaccggaga ttcctagggg cataccgtca
      901 gaaaagcttc cttgaccaat tggatatatc aagaaaacag cagtagcagc tgcaacagga
      961 gctgaatatg caacagcaat ccaagggcgc atacccagac ggaaactaag ttcccactca
     1021 cgacccatgt agcaagctac accaagtaag aagtgtagaa caattagttc ataaggacca
     1081 ccgttgtata accattcatc aacggaagcc gcttcccata tcgggtaaaa gtgcaaacct
     1141 atagctgcag aggtaggaat aatggcacca gaaataatat tgtttccata aagtaaagat
     1201 ccagaaacag gttcacgaat accatcaata tctactggag gtgcagcaat gaaagcaata
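GBrowse2 takes the reference sequence as FASTA (like ref1.fa), while a GenBank record stores it in the ORIGIN block as numbered lines of 10-base groups. The Python sketch below converts one into the other. It assumes a single-record file, and the function names are illustrative.

```python
def origin_to_sequence(block: str) -> str:
    """Collect the bases of a GenBank ORIGIN block, dropping positions and spaces."""
    bases = []
    for line in block.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("ORIGIN", "//")):
            continue
        # Each data line is "<position> <blocks of 10 bases>"; keep the letters only.
        bases.append("".join(ch for ch in stripped if ch.isalpha()))
    return "".join(bases)


def to_fasta(seq_id: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as FASTA with lines of at most `width` bases."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + seq_id + "\n" + "\n".join(lines) + "\n"
```

The FASTA ID you pass to `to_fasta` becomes the landmark name in GBrowse2. It should therefore match the seqid column of the GFF (HQ336405 in this example).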
GFF format
##gff-version 3
# sequence-region HQ336405 1 157790
# conversion-by bp_genbank2gff3.pl
# organism Prunus persica
# date 22-DEC-2010
# Note Prunus persica chloroplast, complete genome.
# working on region:HQ336405, Prunus persica, 22-DEC-2010, Prunus persica chloroplast, complete genome.
# Possible gene unflattening error with HQ336405: consult STDERR
HQ336405	GenBank	region	1	157790	.	+	.	ID=HQ336405;Dbxref=taxon:3760;Note=Prunus persica chloroplast complete genome;date=22-DEC-2010;mol_type=genomic DNA;organelle=plastid:chloroplast;organism=Prunus persica
HQ336405	GenBank	CDS	99804	99829	.	-	.	ID=rps12;Dbxref=GI:309321458;codon_start=1;gene=rps12;product=ribosomal protein S12;protein_id=ADO64999.1;trans_splicing=_no_value;transl_table=11;translation=length.123
HQ336405	GenBank	CDS	100366	100597	.	-	.	ID=rps12;Dbxref=GI:309321458;codon_start=1;gene=rps12;product=ribosomal protein S12;protein_id=ADO64999.1;trans_splicing=_no_value;transl_table=11;translation=length.123
HQ336405	GenBank	CDS	71346	71459	.	-	.	ID=rps12;Dbxref=GI:309321458;codon_start=1;gene=rps12;product=ribosomal protein S12;protein_id=ADO64999.1;trans_splicing=_no_value;transl_table=11;translation=length.123
HQ336405	GenBank	gene	99804	100597	.	-	.	ID=rps12;gene=rps12;trans_splicing=_no_value
HQ336405	GenBank	gene	71346	71459	.	-	.	ID=rps12;gene=rps12;trans_splicing=_no_value
HQ336405	GenBank	gene	3	77	.	-	.	ID=trnH-GUG;gene=trnH-GUG
HQ336405	GenBank	tRNA	3	77	.	-	.	ID=trnH-GUG.r01;Parent=trnH-GUG;Note=anticodon:GUG;gene=trnH-GUG;product=tRNA-His
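Each GFF3 feature line has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase and attributes. Coordinates are 1-based and inclusive, and attributes are `key=value` pairs separated by semicolons.

The Python sketch below splits one such line. It assumes well-formed GFF3 and ignores directives, comments and any `##FASTA` section, all of which a real loader must handle.

```python
from typing import Dict
from urllib.parse import unquote


def parse_gff3_line(line: str) -> Dict[str, object]:
    """Split one GFF3 feature line into its nine columns plus an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in filter(None, attrs.split(";")):
        key, _, value = pair.partition("=")
        # Multiple values are comma-separated; percent-escapes are decoded.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }
```

The `Parent` attribute is what links the tRNA row to its gene row (ID=trnH-GUG). GBrowse relies on such links to draw multi-part features.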
GBrowse2
Create and configure the tracks visualizing the E. coli data
[GENERAL]
description = Escherichia coli str. K-12 substr. DH10B
database    = annotations

initial landmark = NC_010473.1:1..10000

# bring in the special Submitter plugin for the rubber-band select menu
plugins = FastaDumper RestrictionAnnotator SequenceDumper TrackDumper Submitter S

autocomplete = 1

default tracks = Genes ORFs tRNAs CDS Transp Centro:overview GC:region

# examples to show in the introduction

#################################
# database definitions
#################################
[scaffolds:database]
db_adaptor     = Bio::DB::SeqFeature::Store
db_args        = -adaptor memory
                 -dir /var/lib/gbrowse2/databases/ecoli_seq
search options = default +autocomplete

[annotations:database]
db_adaptor     = Bio::DB::SeqFeature::Store
db_args        = -adaptor memory
                 -dir /var/lib/gbrowse2/databases/ecoli_annotations
search options = default +autocomplete
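Each `[name:database]` stanza points the memory adaptor at a directory via `-dir`, and the data files must end up in exactly those directories. The GBrowse configuration syntax is only roughly INI-like. For stanzas like the ones above, however, Python's `configparser` is a workable approximation for listing those directories. This is a sketch, not a GBrowse tool, and `database_dirs` is a hypothetical helper.

```python
import configparser
from typing import Dict


def database_dirs(conf_text: str) -> Dict[str, str]:
    """Map each [name:database] stanza to the directory passed with -dir in db_args."""
    # interpolation=None disables %-interpolation; strict=False tolerates repeated keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    dirs = {}
    for section in parser.sections():
        if not section.endswith(":database"):
            continue
        # Indented continuation lines are joined into one multi-line value.
        args = parser.get(section, "db_args", fallback="").split()
        if "-dir" in args:
            i = args.index("-dir")
            if i + 1 < len(args):
                dirs[section.split(":", 1)[0]] = args[i + 1]
    return dirs
```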
# Default glyph settings
[TRACK DEFAULTS]
glyph         = generic
database      = annotations
height        = 8
bgcolor       = cyan
fgcolor       = black
label density = 25
bump density  = 100
show summary  = 99999  # go into summary mode when zoomed out to 100k
# default pop-up balloon
balloon hover = <b>$name</b> is a $type spanning $ref from $start to $end. Click for more details.

[CDS]
feature     = gene
glyph       = cds
description = 0
height      = 26
sixframe    = 1
label       = sub {shift->name . " reading frame"}
key         = CDS
balloon click width = 500
balloon hover width = 350
balloon hover = <b>$name</b> is a $type spanning $ref from $start to $end. Click to search Google for $name.
balloon click = http://www.google.com/search?q=$name
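In the balloon settings, `$name`, `$type`, `$ref`, `$start` and `$end` are placeholders that GBrowse fills in from the feature under the mouse. The Python sketch below only illustrates that substitution, using `string.Template`, which happens to share the `$name` syntax. It is not how GBrowse implements it internally, and the feature values are a hypothetical example taken from the rps12 record. `cds_label` mimics the Perl `label` callback.

```python
from string import Template


def render_balloon(template: str, feature: dict) -> str:
    """Fill $name, $type, $ref, $start and $end in a balloon text (illustration only)."""
    return Template(template).safe_substitute(feature)


def cds_label(feature: dict) -> str:
    """Python counterpart of the Perl label callback: name plus ' reading frame'."""
    return feature["name"] + " reading frame"
```

With `{"name": "rps12", "type": "CDS", "ref": "HQ336405", "start": 99804, "end": 99829}`, the CDS hover text reads "<b>rps12</b> is a CDS spanning HQ336405 from 99804 to 99829. ...". The click target becomes `http://www.google.com/search?q=rps12`.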
GBrowse2
Add E. coli to the general GBrowse.conf file
##############################################################################
#
# DATASOURCE DEFINITIONS
# One stanza for each configured data source
#
##############################################################################

[yeast]
description = Yeast chromosomes 1+2 (basic)
path        = yeast_simple.conf

[yeast_advanced]
description = Yeast chromosomes 1+2 (advanced)
path        = yeast_chr1+2.conf

[ecoli]
description = Escherichia coli str. K-12 substr. DH10B
path        = ecoli.conf
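Adding the data source comes down to appending one stanza, and it is easy to do twice by accident. The Python sketch below appends an `[ecoli]` stanza only if none exists yet. It assumes GBrowse.conf lives in the configuration directory /etc/gbrowse2/ (adjust to your installation), and both function names are hypothetical.

```python
from pathlib import Path


def add_datasource_text(conf_text: str, name: str, description: str, conf_file: str) -> str:
    """Return GBrowse.conf text with a [name] stanza appended, unless it already exists."""
    header = f"[{name}]"
    if any(line.strip() == header for line in conf_text.splitlines()):
        return conf_text  # already configured: leave the text untouched
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + f"\n{header}\ndescription = {description}\npath        = {conf_file}\n"


def add_datasource(conf_path: Path, name: str, description: str, conf_file: str) -> bool:
    """Apply add_datasource_text to a file on disk; return True if it changed."""
    old = conf_path.read_text()
    new = add_datasource_text(old, name, description, conf_file)
    if new != old:
        conf_path.write_text(new)
    return new != old
```

For example: `add_datasource(Path("/etc/gbrowse2/GBrowse.conf"), "ecoli", "Escherichia coli str. K-12 substr. DH10B", "ecoli.conf")`. Writing to that path normally requires root.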
GBrowse2
Add data to “databases”
#################################
# database definitions
#################################
[scaffolds:database]
db_adaptor     = Bio::DB::SeqFeature::Store
db_args        = -adaptor memory
                 -dir /var/lib/gbrowse2/databases/ecoli_seq
search options = default +autocomplete

[annotations:database]
db_adaptor     = Bio::DB::SeqFeature::Store
db_args        = -adaptor memory
                 -dir /var/lib/gbrowse2/databases/ecoli_annotations
search options = default +autocomplete
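With the memory adaptor, "adding data" amounts to placing the files in the `-dir` directories named in the database definitions. The assumption here is that the adaptor reads the FASTA and GFF3 files it finds there. Following the stanza names:
- the sequence (ref1.fa) goes to ecoli_seq;
- the annotation goes to ecoli_annotations.

The Python sketch below does the copying. The annotation file name `ecoli.gff3` is hypothetical, and `base` would be /var/lib/gbrowse2/databases on a real installation (write access required).

```python
import shutil
from pathlib import Path
from typing import List


def install_data(base: Path, fasta: Path, gff: Path) -> List[Path]:
    """Copy the sequence and annotation files into the directories named by -dir."""
    targets = [
        (fasta, base / "ecoli_seq"),          # [scaffolds:database]
        (gff, base / "ecoli_annotations"),    # [annotations:database]
    ]
    copied = []
    for src, dest_dir in targets:
        dest_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(src, dest_dir / src.name)))
    return copied
```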