
Module 10: Bioinformatics


Academic year: 2021


 

1.) Goal:      

To understand the general approaches for basic in silico (computer) analysis of DNA and protein sequences. We are going to discuss sequence formatting required prior to analysis, DNA restriction mapping, translation of DNA into protein-coding regions (i.e., finding “open reading frames”, ORFs), protein sequence analysis, sequence comparisons and database searching.

   

2.) Introduction  

DNA sequencing has recently become very easy, fast and cheap. The complete human genome sequence (a mere 3 × 10^9 base pairs) could only be elucidated because of these technical advances. Protein sequencing, on the other hand, is also possible but technically much harder and slower. Because a DNA sequence predicts the encoded protein sequence through the rules of the genetic code, we can sequence a piece of DNA and deduce its encoded protein sequence, instead of painstakingly purifying and then sequencing the corresponding protein. Given the large size of some of the genomes sequenced to date, powerful in silico approaches must go hand in hand with wet lab procedures.

   

3.) Background  information  

• Sequence input files can be generated in Word, or copied from other source documents, websites, etc.

• Formatting: for some applications, files first have to be converted into a specific format (“FASTA” being a popular choice); non-standard characters must be removed from the files (example: DNA sequence files may contain only the characters G, A, T and C).

• Many free sequence analysis programs are available online; simply copy and paste your sequence(s) into a browser window and run the analysis, but remember that in some cases the correct file format must be used.

• Tip: when working with sequence files in Word, use a monospaced font such as Courier, in which every letter/character takes up the same amount of space; this keeps sequences well aligned.
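The clean-up and formatting step can be sketched in Python. This is a minimal illustration, not one of the tools listed in this module: the function names clean_dna and to_fasta are hypothetical, and the cleaning rule assumes the sequence should contain only G, A, T and C, so any other letter (including ambiguity codes such as N) is silently dropped. Comparing the length before and after cleaning is a quick check that nothing important was lost.

```python
import re


def clean_dna(raw: str) -> str:
    """Upper-case the text and keep only G, A, T and C.

    Digits, spaces, line breaks and every other character are removed,
    including IUPAC ambiguity codes such as N (an assumption of this sketch).
    """
    return re.sub(r"[^GATC]", "", raw.upper())


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line, then the sequence wrapped at `width`."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# Hypothetical pasted text with position numbers and spaces, as often copied from a document
raw = "1 tggcgaatgg gacgcgccct\n21 gtagcggcgc attaagcgcg"
seq = clean_dna(raw)
print(len(seq))  # 40
print(to_fasta("plasmid_fragment", seq, width=20), end="")
```

A width of 60 characters per line is a common FASTA convention; most tools accept other widths as well.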

   

4.) Steps in an in silico exercise. Note: different exercises may use a different set of steps.

• Open the sequence file supplied in Word

• Check for the absence of non-standard characters; convert the file into the appropriate format

• Copy the sequence

• Open a browser window with the particular application

• Paste the sequence into the window

• Choose the specific analysis parameters

• Run the analysis

• Copy the results and files out of the browser window and paste/save them back into the original Word file

 

5.) Materials supplied: general plasmid map (Appendix A), related DHFR protein sequences for sequence alignment (Appendix B), complete plasmid sequence with the Bacillus thermophilus DHFR (Appendix C).

 

6.) Boyer book: chapter 2


7.) Basic  examples  of  things  that  can  be  done  with  in  silico  analyses    

With  DNA  sequence    

• sequence generation: type or copy-paste in Word format

• sequence formatting, sequence length: http://www.ebi.ac.uk/Tools/sfc/emboss_seqret/

• restriction digestion and mapping (linear vs circular maps): http://www.restrictionmapper.org/

• translation of open reading frames (ORFs): http://web.expasy.org/translate/
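Restriction mapping boils down to finding every occurrence of an enzyme's recognition site and turning the cut positions into fragment sizes. The sketch below assumes exact matching on the top strand, which is enough for palindromic sites such as NdeI (CA^TATG) and BamHI (G^GATCC); the helper names and the toy sequence are hypothetical. It also shows why linear and circular maps differ: a circular plasmid cut n times gives n fragments, a linear molecule gives n + 1, and only the circular search finds a site that spans the origin.

```python
# Recognition site and top-strand cut offset (CA^TATG -> 2, G^GATCC -> 1)
ENZYMES = {"NdeI": ("CATATG", 2), "BamHI": ("GGATCC", 1)}


def find_sites(seq, site, circular=False):
    """1-based start positions of every exact match of `site` on the top strand."""
    seq, site = seq.upper(), site.upper()
    n, k = len(seq), len(site)
    # For a circle, append the first k-1 bases so a site spanning the origin is found.
    search = seq + seq[:k - 1] if circular else seq
    last = n if circular else n - k + 1
    return [i + 1 for i in range(last) if search[i:i + k] == site]


def cut_positions(seq, enzymes, circular=False):
    """Cut coordinates, expressed as the number of bases to the left of each cut."""
    cuts = []
    for site, offset in enzymes.values():
        for start in find_sites(seq, site, circular):
            cuts.append((start - 1 + offset) % len(seq))
    return sorted(set(cuts))


def fragment_lengths(n, cuts, circular=False):
    """Fragment sizes (bp) for a molecule of length n cut at the given coordinates."""
    cuts = sorted(set(cuts))
    if not cuts:
        return [n]  # uncut molecule
    if circular:
        # n cuts in a circle give n fragments; the last one runs through the origin
        return [b - a for a, b in zip(cuts, cuts[1:])] + [n - cuts[-1] + cuts[0]]
    bounds = [0] + cuts + [n]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "AACATATGAAAAGGATCCAA"  # hypothetical 20 bp test sequence
for name, (site, _) in ENZYMES.items():
    print(name, find_sites(toy, site))  # NdeI [3], then BamHI [13]
cuts = cut_positions(toy, ENZYMES)
print(fragment_lengths(len(toy), cuts))                 # [4, 9, 7]
print(fragment_lengths(len(toy), cuts, circular=True))  # [9, 11]
```

For the toy sequence, the two cuts (after bases 4 and 13) give fragments of 4, 9 and 7 bp if the molecule is linear, but 9 and 11 bp if it is circular.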

 

With  Protein  sequence  

• determine AA sequence length, AA composition, molecular weight, pI and molar extinction coefficient: http://www.ebi.ac.uk/Tools/seqstats/emboss_pepstats/
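Several of these protein parameters follow from simple counting. The sketch below computes the length, composition, an approximate average molecular weight and the molar extinction coefficient at 280 nm using the widely used Pace et al. values (5500 per Trp, 1490 per Tyr, 125 per cystine). The residue masses are rounded average values and programs such as pepstats may differ slightly in the decimals; the function name is hypothetical, and pI is left out because it requires a table of pKa values and an iterative search.

```python
from collections import Counter

# Approximate average residue masses in Da (amino acid minus one water);
# different programs use slightly different values in the last decimals.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-terminus


def protein_stats(seq: str) -> dict:
    """Length, composition, average MW and molar extinction coefficient at 280 nm."""
    seq = "".join(seq.split()).upper()  # drop spaces and line breaks
    counts = Counter(seq)
    unknown = set(counts) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    mw = sum(RESIDUE_MASS[aa] * k for aa, k in counts.items()) + WATER
    # Pace et al. coefficients, in M^-1 cm^-1
    trp_tyr = 5500 * counts["W"] + 1490 * counts["Y"]
    return {
        "length": len(seq),
        "composition": dict(sorted(counts.items())),
        "mw_da": round(mw, 2),
        "eps280_all_cys_as_cystines": trp_tyr + 125 * (counts["C"] // 2),
        "eps280_all_cys_reduced": trp_tyr,
    }


human_dhfr = """mvgslnciva vsqnmgigkn gdlpwpplrn efryfqrmtt tssvegkqnl vimgkktwfs
ipeknrplkg rinlvlsrel keppqgahfl srslddalkl teqpelankv dmvwivggss
vykeamnhpg hlklfvtrim qdfesdtffp eidlekykll peypgvlsdv qeekgikykf
evyeknd"""  # human DHFR from Appendix B, position numbers removed
print(protein_stats(human_dhfr))
```

Two extinction coefficients are returned because the value depends on whether the cysteines form disulfide bonds (cystines) or are reduced.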

DNA or Protein Sequence Comparison

• these programs can be used to compare complete DNA or protein sequences to establish evolutionary relationships or to find single point mutations

• Pairwise DNA alignment: http://www.ebi.ac.uk/Tools/psa/emboss_needle/nucleotide.html

• Pairwise Protein alignment: http://www.ebi.ac.uk/Tools/services/web/toolform.ebi?tool=emboss_needle&context=protein

• Multiple sequence alignment: http://www.ebi.ac.uk/Tools/msa/clustalo/
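The pairwise tools above perform global alignment (EMBOSS needle implements the Needleman-Wunsch algorithm). The sketch below shows the core dynamic-programming idea with deliberately simple scoring (match +1, mismatch -1, gap -2) instead of the substitution matrix (EBLOSUM62 for proteins) and affine gap penalties that needle uses, so its alignments and scores will not match the web tool. % identity is computed the way needle reports it: identical positions divided by the full alignment length, gaps included. Function names and scores are assumptions of this sketch.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty; returns the two gapped strings."""
    n, m = len(a), len(b)

    def s(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top, bottom):
    """Identical positions divided by the full alignment length (gaps included), in %."""
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


# Short fragments taken from the Appendix B sequences (B. amyloliquefaciens, G. thermodenitrificans)
top, bottom = needleman_wunsch("MISFIFAMDENRLIGKDNDLPW", "MISHIVAMDENRVIGKDNQLPW")
print(top)
print(bottom)
print(f"identity: {percent_identity(top, bottom):.1f}%")
```

Because identity is divided by the alignment length, long gapped overhangs (for example, the extra N-terminal residues of one sequence) lower the reported % identity even when the aligned core is very similar.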

   

Sequence  Databases  

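• DNA and protein sequences @ NCBI (PubMed itself indexes the literature; sequences are held in NCBI's Nucleotide and Protein databases): http://www.ncbi.nlm.nih.gov/pubmed

• Also: www.Uniprot.org (well-curated protein database; can run BLAST searches and other alignments)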

 

 

8.) Protocol:      

Do the following:

1. Convert the complete plasmid sequence in Appendix C to GCG and EMBL format, and indicate the length of the plasmid DNA in bp. Include properly labeled copies of these in your report.

2. Perform restriction mapping for the complete plasmid sequence supplied in Appendix C, using the restriction enzymes NdeI and BamHI. Show the results table in your report.

3. In your report, show the translation of all 6 reading frames and indicate the frame containing the DHFR ORF (open reading frame). The Bacillus thermophilus ORF starts with MISHI. Show the amino acid sequence of the complete B. thermophilus ORF in your report.

4. Protein analysis: use the B. thermophilus DHFR protein sequence. In your report, include only the molecular weight, number of amino acids, pI and molar extinction coefficient.

5. Sequence comparison: use the B. thermophilus DHFR protein sequence from above and align it, one at a time, to the three DHFR protein sequences supplied in Appendix B. Show the sequence alignment and % identity for all three alignments (B. thermophilus with human; B. thermophilus with Bacillus amyloliquefaciens; B. thermophilus with Geobacillus thermodenitrificans) in your report.

6. Sequence comparison: align all four DHFR protein sequences. Show the sequence alignment in your report.
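Translating all six reading frames and picking out the ORF (step 3) can be sketched as follows, assuming the standard genetic code. Stop codons are shown as *, incomplete or unrecognized codons as X, and an ORF is taken to be a stretch that starts at M and runs to the next stop. Function names and the minimum ORF length are hypothetical; the frame labels +1 to -3 are one common convention, and web tools may number the reverse frames differently. ORFs that cross the origin of a circular plasmid are not detected by this simple version.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna):
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate codon by codon from the first base; '*' = stop, 'X' = unknown codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translations of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna[f:])
        frames[f"-{f + 1}"] = translate(rc[f:])
    return frames


def find_orfs(protein, min_len=50):
    """(start index, peptide) for each stretch from an M to the next stop, >= min_len residues.

    The last stretch may have no stop codon if the frame simply runs off the end.
    """
    orfs, pos = [], 0
    for chunk in protein.split("*"):
        m = chunk.find("M")
        if m != -1 and len(chunk) - m >= min_len:
            orfs.append((pos + m, chunk[m:]))
        pos += len(chunk) + 1  # +1 for the '*' removed by split
    return orfs


# Start of the DHFR ORF in Appendix C (ATG inside the NdeI site CATATG), with a TAA stop appended
demo = "ATGATTTCGCACATTGTGTAA"
print(six_frames(demo)["+1"])                         # MISHIV*
print(find_orfs(six_frames(demo)["+1"], min_len=5))  # [(0, 'MISHIV')]
```

To apply it to the plasmid, first reduce the Appendix C text to a plain upper-case string of G, A, T and C (no digits or spaces), then look for the frame whose ORF starts with MISHI.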

 


 

9.) Materials  

Appendix  A:    Plasmid  Map  

                                     

Appendix  B:  DHFR  sequences  to  be  used  for  sequence  alignment:    

This  is  the  sequence  for  human  DHFR:    

  1 mvgslnciva vsqnmgigkn gdlpwpplrn efryfqrmtt tssvegkqnl vimgkktwfs
 61 ipeknrplkg rinlvlsrel keppqgahfl srslddalkl teqpelankv dmvwivggss
121 vykeamnhpg hlklfvtrim qdfesdtffp eidlekykll peypgvlsdv qeekgikykf
181 evyeknd

This  is  the  DHFR  sequence  from  Bacillus  amyloliquefaciens:

  1 misfifamde nrligkdndl pwhlpddlay fkkvttghti vmgrktfesi grplpnrrni
 61 vvtsrdeslf pgcitadsae evlklippde ecfviggaql ysalfpyadr lymtkihhvf
121 egdrffpefn eaeweltsrk qgvkdeknpy dyeylvyekk n

 

This  is  the  DHFR  sequence  from  Geobacillus  thermodenitrificans:  

  1 mnmtilkssv mtlirrlkrq wrckgektmi shivamdenr vigkdnqlpw hlpadlayfk
 61 rvtmghaivm grktfeaigr plpgrdnvvv trnpqfrpeg clvlhsleev kqwiaargee
121 vfiiggaelf katmpiadrl yvtnifasfp gdtfyppise kewkvvsytp gvkdeknpye
181 hafliyerk


Appendix C: Complete Plasmid Sequence with the Bacillus thermophilus DHFR

 

DHFR  ORF  is  situated  between  pos  5205  (NdeI)and  pos  5699  (BamHI)     tggcgaatgggacgcgccctgtagcggcgcattaagcgcggcgggtgtgg tggttacgcgcagcgtgaccgctacacttgccagcgccctagcgcccgct cctttcgctttcttcccttcctttctcgccacgttcgccggctttccccg tcaagctctaaatcgggggctccctttagggttccgatttagtgctttac ggcacctcgaccccaaaaaacttgattagggtgatggttcacgtagtggg ccatcgccctgatagacggtttttcgccctttgacgttggagtccacgtt ctttaatagtggactcttgttccaaactggaacaacactcaaccctatct cggtctattcttttgatttataagggattttgccgatttcggcctattgg ttaaaaaatgagctgatttaacaaaaatttaacgcgaattttaacaaaat attaacgtttacaatttcaggtggcacttttcggggaaatgtgcgcggaa cccctatttgtttatttttctaaatacattcaaatatgtatccgctcatg agacaataaccctgataaatgcttcaataatattgaaaaaggaagagtat gagtattcaacatttccgtgtcgcccttattcccttttttgcggcatttt gccttcctgtttttgctcacccagaaacgctggtgaaagtaaaagatgct gaagatcagttgggtgcacgagtgggttacatcgaactggatctcaacag cggtaagatccttgagagttttcgccccgaagaacgttttccaatgatga gcacttttaaagttctgctatgtggcgcggtattatcccgtattgacgcc gggcaagagcaactcggtcgccgcatacactattctcagaatgacttggt tgagtactcaccagtcacagaaaagcatcttacggatggcatgacagtaa gagaattatgcagtgctgccataaccatgagtgataacactgcggccaac ttacttctgacaacgatcggaggaccgaaggagctaaccgcttttttgca caacatgggggatcatgtaactcgccttgatcgttgggaaccggagctga atgaagccataccaaacgacgagcgtgacaccacgatgcctgcagcaatg gcaacaacgttgcgcaaactattaactggcgaactacttactctagcttc ccggcaacaattaatagactggatggaggcggataaagttgcaggaccac ttctgcgctcggcccttccggctggctggtttattgctgataaatctgga gccggtgagcgtgggtctcgcggtatcattgcagcactggggccagatgg taagccctcccgtatcgtagttatctacacgacggggagtcaggcaacta tggatgaacgaaatagacagatcgctgagataggtgcctcactgattaag cattggtaactgtcagaccaagtttactcatatatactttagattgattt aaaacttcatttttaatttaaaaggatctaggtgaagatcctttttgata atctcatgaccaaaatcccttaacgtgagttttcgttccactgagcgtca gaccccgtagaaaagatcaaaggatcttcttgagatcctttttttctgcg cgtaatctgctgcttgcaaacaaaaaaaccaccgctaccagcggtggttt gtttgccggatcaagagctaccaactctttttccgaaggtaactggcttc agcagagcgcagataccaaatactgtccttctagtgtagccgtagttagg ccaccacttcaagaactctgtagcaccgcctacatacctcgctctgctaa 
tcctgttaccagtggctgctgccagtggcgataagtcgtgtcttaccggg ttggactcaagacgatagttaccggataaggcgcagcggtcgggctgaac ggggggttcgtgcacacagcccagcttggagcgaacgacctacaccgaac tgagatacctacagcgtgagctatgagaaagcgccacgcttcccgaaggg agaaaggcggacaggtatccggtaagcggcagggtcggaacaggagagcg cacgagggagcttccagggggaaacgcctggtatctttatagtcctgtcg ggtttcgccacctctgacttgagcgtcgatttttgtgatgctcgtcaggg gggcggagcctatggaaaaacgccagcaacgcggcctttttacggttcct ggccttttgctggccttttgctcacatgttctttcctgcgttatcccctg attctgtggataaccgtattaccgcctttgagtgagctgataccgctcgc cgcagccgaacgaccgagcgcagcgagtcagtgagcgaggaagcggaaga gcgcctgatgcggtattttctccttacgcatctgtgcggtatttcacacc gcatatatggtgcactctcagtacaatctgctctgatgccgcatagttaa gccagtatacactccgctatcgctacgtgactgggtcatggctgcgcccc gacacccgccaacacccgctgacgcgccctgacgggcttgtctgctcccg gcatccgcttacagacaagctgtgaccgtctccgggagctgcatgtgtca gaggttttcaccgtcatcaccgaaacgcgcgaggcagctgcggtaaagct catcagcgtggtcgtgaagcgattcacagatgtctgcctgttcatccgcg tccagctcgttgagtttctccagaagcgttaatgtctggcttctgataaa gcgggccatgttaagggcggttttttcctgtttggtcactgatgcctccg tgtaagggggatttctgttcatgggggtaatgataccgatgaaacgagag aggatgctcacgatacgggttactgatgatgaacatgcccggttactgga acgttgtgagggtaaacaactggcggtatggatgcggcgggaccagagaa aaatcactcagggtcaatgccagcgcttcgttaatacagatgtaggtgtt ccacagggtagccagcagcatcctgcgatgcagatccggaacataatggt gcagggcgctgacttccgcgtttccagactttacgaaacacggaaaccga agaccattcatgttgttgctcaggtcgcagacgttttgcagcagcagtcg cttcacgttcgctcgcgtatcggtgattcattctgctaaccagtaaggca accccgccagcctagccgggtcctcaacgacaggagcacgatcatgcgca cccgtggggccgccatgccggcgataatggcctgcttctcgccgaaacgt ttggtggcgggaccagtgacgaaggcttgagcgagggcgtgcaagattcc gaataccgcaagcgacaggccgatcatcgtcgcgctccagcgaaagcggt cctcgccgaaaatgacccagagcgctgccggcacctgtcctacgagttgc
atgataaagaagacagtcataagtgcggcgacgatagtcatgccccgcgc ccaccggaaggagctgactgggttgaaggctctcaagggcatcggtcgag atcccggtgcctaatgagtgagctaacttacattaattgcgttgcgctca ctgcccgctttccagtcgggaaacctgtcgtgccagctgcattaatgaat cggccaacgcgcggggagaggcggtttgcgtattgggcgccagggtggtt tttcttttcaccagtgagacgggcaacagctgattgcccttcaccgcctg gccctgagagagttgcagcaagcggtccacgctggtttgccccagcaggc gaaaatcctgtttgatggtggttaacggcgggatataacatgagctgtct tcggtatcgtcgtatcccactaccgagatatccgcaccaacgcgcagccc ggactcggtaatggcgcgcattgcgcccagcgccatctgatcgttggcaa ccagcatcgcagtgggaacgatgccctcattcagcatttgcatggtttgt tgaaaaccggacatggcactccagtcgccttcccgttccgctatcggctg aatttgattgcgagtgagatatttatgccagccagccagacgcagacgcg ccgagacagaacttaatgggcccgctaacagcgcgatttgctggtgaccc aatgcgaccagatgctccacgcccagtcgcgtaccgtcttcatgggagaa aataatactgttgatgggtgtctggtcagagacatcaagaaataacgccg gaacattagtgcaggcagcttccacagcaatggcatcctggtcatccagc ggatagttaatgatcagcccactgacgcgttgcgcgagaagattgtgcac cgccgctttacaggcttcgacgccgcttcgttctaccatcgacaccacca cgctggcacccagttgatcggcgcgagatttaatcgccgcgacaatttgc gacggcgcgtgcagggccagactggaggtggcaacgccaatcagcaacga ctgtttgcccgccagttgttgtgccacgcggttgggaatgtaattcagct ccgccatcgccgcttccactttttcccgcgttttcgcagaaacgtggctg gcctggttcaccacgcgggaaacggtctgataagagacaccggcatactc tgcgacatcgtataacgttactggtttcacattcaccaccctgaattgac tctcttccgggcgctatcatgccataccgcgaaaggttttgcgccattcg atggtgtccgggatctcgacgctctcccttatgcgactcctgcattagga agcagcccagtagtaggttgaggccgttgagcaccgccgccgcaaggaat ggtgcatgcaaggagatggcgcccaacagtcccccggccacggggcctgc caccatacccacgccgaaacaagcgctcatgagcccgaagtggcgagccc gatcttccccatcggtgatgtcggcgatataggcgccagcaaccgcacct gtggcgccggtgatgccggccacgatgcgtccggcgtagaggatcgagat ctcgatcccgcgaaattaatacgactcactataggggaattgtgagcgga taacaattcccctctagaaataattttgtttaactttaagaaggagatat aCATatgatttcgcacattgtggcaatggatgaaaaccgggtgatcggca aagacaaccgcttgccttggcatttgccggccgatttggcgtattttaaa cgggtgacaatgggccatgccatcgtgatggggcgcaagacgtttgaagc gatcggccggccgcttcccggccgcgataacgtcgttgtcacgcgcaacc gctcgtttcgtccggaaggctgccttgtgcttcattcgctcgaggaagtc 
aagcaatggatcgcatcgcgcgctgatgaagtgtttatcatcggcggggc cgaactgtttcgggcgacgatgccgattgtcgaccggctgtatgtgacaa aaatttttgcttccttccccggcgatacgttttatccgcccatttctgac gatgaatgggaaatcgtttcctatacgccaggagggaaagatgaaaagaa tccgtatgaacacgcctttatcatttatgagcggaaaaaggcgaaaTAAT GGATCCgaattcgagctccgtcgacaagcttgcggccgcactcgagcacc accaccaccaccactgagatccggctgctaacaaagcccgaaaggaagct gagttggctgctgccaccgctgagcaataactagcataaccccttggggc ctctaaacgggtcttgaggggttttttgctgaaaggaggaactatatccg gat
