In the example code above, the first thing we need to do inside the loop is to look up the value for the current key. This is a very common pattern when iterating over dictionaries; so common, in fact, that Python has a special shorthand for it.
Instead of doing this:

    for key in my_dict.keys():
        value = my_dict.get(key)
        # do something with key and value

we can use the items method to iterate over pairs of data, rather than just keys:

    for key, value in my_dict.items():
        # do something with key and value

The items method does something slightly different from all the other methods we've seen so far in this book: rather than returning a single value, or a list of values, it returns a list of pairs of values. That's why we have to give two variable names at the start of the loop. Here's how we can use the items method to process our dictionary of trinucleotide counts just like before:
Chapter 8: Dictionaries

    for trinucleotide, count in counts.items():
        if count == 2:
            print(trinucleotide)

This method is generally preferred for iterating over items in a dictionary, as it makes the intention of the code very clear.
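As a quick check that the two loops really do visit the same data, the sketch below collects the pairs produced by each approach and compares them. The dictionary contents are made-up illustration values, not data from the chapter.

```python
# Hypothetical example dictionary; the keys and values are illustrative only.
my_dict = {"AAA": 2, "ACG": 1, "TTG": 2}

# Approach 1: iterate over the keys and look up each value.
pairs_from_keys = []
for key in my_dict.keys():
    value = my_dict.get(key)
    pairs_from_keys.append((key, value))

# Approach 2: iterate over the key/value pairs directly.
pairs_from_items = []
for key, value in my_dict.items():
    pairs_from_items.append((key, value))

# Both loops visit exactly the same pairs.
print(pairs_from_keys == pairs_from_items)
```

This prints True: the items version does the same work with one line fewer inside the loop.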
Recap
We started this chapter by examining the problem of storing paired data in Python. After looking at a couple of unsatisfactory ways to do it using tools that we've already learned about, we introduced a new type of data structure, the dictionary, which offers a much nicer solution to the problem of storing paired data.
Later in the chapter, we saw that the real benefit of using dictionaries is the efficient lookup they provide. We saw how to create dictionaries and manipulate the items in them, and several different ways to look up values for known keys. We also saw how to iterate over all the items in a dictionary.
In the process, we uncovered a few restrictions on what dictionaries are capable of: we're only allowed to use a couple of different data types for keys, they must be unique, and we can't rely on their order. Just as a physical dictionary allows us to rapidly look up the definition for a word but not the other way round, Python dictionaries allow us to rapidly look up the value associated with a key, but not the reverse.
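The difference between the two directions of lookup can be sketched as follows. The counts dictionary here is a made-up example; the point is only that going from key to value is a single step, while going from value to key means examining every item.

```python
# Hypothetical counts; illustrative values only.
counts = {"AAA": 2, "ACG": 1, "TTG": 2}

# Forward lookup: one step, straight from key to value.
print(counts.get("ACG"))

# Reverse lookup: there is no shortcut, so we have to check every item.
keys_with_two = []
for trinucleotide, count in counts.items():
    if count == 2:
        keys_with_two.append(trinucleotide)

# Sort before printing so the result does not depend on dictionary order.
print(sorted(keys_with_two))
```

Note that the reverse lookup can return several keys, because values (unlike keys) do not have to be unique.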
Exercises
DNA translation
Write a program that will translate a DNA sequence into protein. Your program should use the standard genetic code which can be found at this URL.
Solutions
DNA translation
The description of this exercise is very short, but it hides quite a bit of complexity! To translate a DNA sequence we need to carry out a number of different steps. First, we have to split up the sequence into codons. Then, we need to go through each codon and translate it into the corresponding amino acid residue. Finally, we need to create a protein sequence by adding all the amino acid residues together.
We'll start off by figuring out how to split a DNA sequence into codons. Because this exercise is quite tricky, we'll pick a very short test DNA sequence to work on, just three codons:

    dna = "ATGTTCGGT"

How are we going to split up the DNA sequence into groups of three bases? It's tempting to try to use the split method, but remember that the split method only works if the things you want to split are separated by a delimiter. In our case, there's nothing separating the codons, so split will not help us.
Something that might be able to help us is substring notation. We know that this allows us to extract part of a string, so we can do something like this:

    dna = "ATGTTCGGT"
    codon1 = dna[0:3]
    codon2 = dna[3:6]
    codon3 = dna[6:9]
    print(codon1, codon2, codon3)

    ('ATG', 'TTC', 'GGT')

but it's not a great solution, as we have to fill in the numbers manually. Since the numbers follow a very predictable pattern, it should be possible to generate them automatically. The start position for each substring is initially zero, then goes up by three for each successive codon. The stop position is just the start position plus three.
Recall that the job of the range function is to generate sequences of numbers. In order to generate the sequence of substring start positions, we need to use the three-argument version of range, where the first argument is the number to start at, the second argument is the number to finish at, and the third argument is the step size. For our DNA sequence above, the number to start at is zero, and the step size is three. The number to finish at is not six but seven, because ranges are exclusive at the finish. This bit of code shows how we can use the range function to generate the list of start positions:

    for start in range(0,7,3):
        print(start)

    0
    3
    6
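To see why the finish has to be seven rather than six, it can help to compare the two side by side. This is a small sketch; the names starts_to_seven, starts_to_six and positions are introduced here purely for illustration.

```python
# Finishing at 7 includes the start position 6 ...
starts_to_seven = list(range(0, 7, 3))
print(starts_to_seven)

# ... while finishing at 6 leaves it out, because the finish is exclusive.
starts_to_six = list(range(0, 6, 3))
print(starts_to_six)

# Pair each start position with its stop position (start plus three).
positions = []
for start in range(0, 7, 3):
    positions.append((start, start + 3))
print(positions)
```

The first line of output is [0, 3, 6], the second is [0, 3], and the third is [(0, 3), (3, 6), (6, 9)], which are exactly the numbers we typed by hand in the substring example.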
To find the stop position for a given start position we just add three, so we can easily split our DNA into codons using a loop:

    dna = "ATGTTCGGT"
    for start in range(0,7,3):
        codon = dna[start:start+3]
        print("one codon is " + codon)

    one codon is ATG
    one codon is TTC
    one codon is GGT
This works fine for our test DNA sequence, but if we give it a shorter sequence we will get incomplete and empty codons:

    dna = "ATGTT"
    for start in range(0,7,3):
        codon = dna[start:start+3]
        print("one codon is " + codon)

    one codon is ATG
    one codon is TT
    one codon is

and if we give it a longer sequence, we will miss out the fourth and subsequent codons:

    dna = "ATGTTCGGTGAAGCGGGCTAGAT"
    for start in range(0,7,3):
        codon = dna[start:start+3]
        print("one codon is " + codon)

    one codon is ATG
    one codon is TTC
    one codon is GGT
Clearly we need to modify the second argument to range (the position to finish the sequence of numbers) in order to take into account the length of the DNA sequence. At this point, we have to confront the problem of what to do if we're given a DNA sequence whose length is not an exact multiple of three. Clearly, we cannot translate an incomplete codon, so we want the sequence of start positions to finish at the length of the DNA sequence minus two. Because ranges are exclusive at the finish, this guarantees that there will always be two more characters following the final codon start, i.e. enough for a complete codon.
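A short worked example may make the arithmetic clearer. The sketch below lists the start positions produced for a few hypothetical sequence lengths when range finishes at the length minus two.

```python
# For each hypothetical sequence length, list the codon start positions
# produced when range finishes at the length minus two.
for length in [9, 10, 11, 12]:
    starts = list(range(0, length - 2, 3))
    print(length, starts)
```

Lengths 9, 10 and 11 all give [0, 3, 6], so the one or two leftover bases are ignored; only at length 12 does a fourth start position (9) appear, because only then is there a fourth complete codon.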
Here's the modified code:

    dna = "ATGTTCGGT"
    # calculate the finish position for range; the final codon must start before it
    last_codon_start = len(dna) - 2
    # process the dna sequence in three base chunks
    for start in range(0,last_codon_start,3):
        codon = dna[start:start+3]
        print("one codon is " + codon)
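To check that this approach copes with the short and long sequences that caused trouble earlier, the same loop can be wrapped in a small helper. This is only a sketch: the function name split_into_codons is an invention for this example and does not appear elsewhere in the chapter.

```python
def split_into_codons(dna):
    """Return the complete codons in dna, ignoring any leftover bases."""
    codons = []
    # Same calculation as above: range finishes at the length minus two.
    last_codon_start = len(dna) - 2
    for start in range(0, last_codon_start, 3):
        codons.append(dna[start:start + 3])
    return codons

# The short, exact and long test sequences used earlier.
print(split_into_codons("ATGTT"))
print(split_into_codons("ATGTTCGGT"))
print(split_into_codons("ATGTTCGGTGAAGCGGGCTAGAT"))
```

The short sequence now yields a single complete codon, and the long one yields all seven, with the trailing two bases dropped in both cases.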
Now that we know how to split a DNA sequence up into codons, let's turn our attention to the problem of translating those codons. If we pull up the URL from the exercise description in a web browser, we can see the standard codon translation table in various formats. Storing this translation table seems like a perfect job for a dictionary: we have codons (keys) and amino acid residues (values) and we want to be able to look up the amino acid for a given codon.
Here's a bit of code (it's actually a single statement, spread out over multiple lines) which creates a dictionary to hold the translation table:

    gencode = {
        'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
        'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
        'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
        'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
        'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
        'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
        'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
        'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
        'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
        'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
        'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
        'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
        'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
        'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
        'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
        'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}
We can look up the amino acid for a given codon using either of the two methods that we learned about:

    print(gencode['CAT'])
    print(gencode.get('GTC'))
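The two lookup styles agree when the key exists, but behave differently when it does not. The sketch below uses a two-entry stand-in for the full table (the name mini_gencode and the made-up codon 'NNN' are assumptions for illustration) so that it can be run on its own.

```python
# A two-entry stand-in for the full table, to keep the sketch self-contained.
mini_gencode = {"CAT": "H", "GTC": "V"}

# For a key that exists, both approaches give the same answer.
print(mini_gencode["CAT"])
print(mini_gencode.get("GTC"))

# For a missing key, get returns None ...
missing = mini_gencode.get("NNN")
print(missing)

# ... while square brackets raise a KeyError.
try:
    print(mini_gencode["NNN"])
except KeyError:
    print("NNN is not in the dictionary")
```

This difference matters later on: a lookup with get never stops the program by itself, so a missing codon only shows up when we try to use the None that comes back.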
If we look up the amino acid for each codon inside the loop of our original code, we can print both the codon and the amino acid translation¹:

    dna = "ATGTTCGGT"
    last_codon_start = len(dna) - 2
    for start in range(0,last_codon_start,3):
        codon = dna[start:start+3]
        aa = gencode.get(codon)
        print("one codon is " + codon)
        print("the amino acid is " + aa)

    one codon is ATG
    the amino acid is M
    one codon is TTC
    the amino acid is F
    one codon is GGT
    the amino acid is G

¹ From now on, we won't include the statement which creates the dictionary in our code samples as it takes up too much room, so if you want to try running these yourself you need to add it back at the top.

This is starting to look promising. The final step is to actually do something with the amino acid residues rather than just printing them. A nice idea is to take our cue from the way that a ribosome behaves and add each new amino acid residue onto the end of a protein to create a gradually-growing string:

    dna = "ATGTTCGGT"
    last_codon_start = len(dna) - 2
    protein = ""
    for start in range(0,last_codon_start,3):
        codon = dna[start:start+3]
        aa = gencode.get(codon)
        protein = protein + aa
    print("protein sequence is " + protein)

In the above code, we create a new variable to hold the protein sequence immediately before we start the loop (line 3), then add a single character onto the end of that variable each time round the loop (line 7). By the time we exit the loop, we have built up the complete protein sequence and we can print it out (line 8):

    protein sequence is MFG
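Since the code samples here leave out the dictionary, a version that runs on its own may be handy for experimenting. The sketch below assumes a three-entry subset of the table (given the hypothetical name mini_gencode) that covers only the test sequence, and adds a print inside the loop to show the protein string growing.

```python
# Three-entry subset of the standard genetic code: just enough for the test
# sequence. Swap in the full gencode dictionary for real use.
mini_gencode = {"ATG": "M", "TTC": "F", "GGT": "G"}

dna = "ATGTTCGGT"
last_codon_start = len(dna) - 2
protein = ""
for start in range(0, last_codon_start, 3):
    codon = dna[start:start + 3]
    aa = mini_gencode.get(codon)
    protein = protein + aa
    # Show the protein string growing by one residue per iteration.
    print(codon + " -> " + aa + ", protein so far: " + protein)
```

After the loop, protein holds "MFG", matching the output above.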
This looks like a very useful bit of code, so let's turn it into a function. Our function will take one argument (the DNA sequence as a string) and will return a string containing the protein sequence²:

    def translate_dna(dna):
        last_codon_start = len(dna) - 2
        protein = ""
        for start in range(0,last_codon_start,3):
            codon = dna[start:start+3]
            aa = gencode.get(codon)
            protein = protein + aa
        return protein

² You'll notice that this function relies on the gencode variable which is defined outside the function, something that I told you not to do in an earlier chapter. This is an exception to the rule: defining the gencode variable inside the function means that it would have to be created anew each time we wanted to translate a DNA sequence.

We can now test our function by printing out the protein translation for a few more test sequences:

    print(translate_dna("ATGTTCGGT"))
    print(translate_dna("ATCGATCGATCGTTGCTTATCGATCAG"))
    print(translate_dna("actgatcgtagctagctgacgtatcgtat"))
    print(translate_dna("ACGATCGATCGTNACGTACGATCGTACTCG"))
The output from this code shows that we run into a problem with the third sequence:

    MFG
    IDRSLLIDQ
    Traceback (most recent call last):
      File "dna_translation.py", line 30, in <module>
        print(translate_dna("actgatcgtagctagctgacgtatcgtat"))
      File "dna_translation.py", line 2, in translate_dna
        protein = protein + aa
    TypeError: cannot concatenate 'str' and 'NoneType' objects
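The error can be reproduced in isolation, which makes it easier to see where it comes from. This is a minimal sketch using a one-entry stand-in dictionary (the name table is an assumption for illustration); the exact wording of the error message differs between Python versions, so the sketch prints whatever message it receives.

```python
# One-entry stand-in for the translation table (hypothetical; illustration only).
table = {"ACT": "T"}

# The lower-case key is not in the dictionary, so get returns None.
aa = table.get("act")
print(aa)

# Adding None to a string is what raises the TypeError.
try:
    protein = "" + aa
except TypeError as error:
    print("TypeError:", error)
```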
The problem occurs when we try to look up the amino acid for the first codon of the third sequence, 'act'. Because the third sequence is in lower case but the translation table dictionary is in upper case, the key isn't found, the get method returns None, and we get an error. Fixing it is straightforward: we just need to convert the codon to upper case before looking up the amino acid:

    def translate_dna(dna):
        last_codon_start = len(dna) - 2
        protein = ""
        for start in range(0,last_codon_start,3):
            codon = dna[start:start+3]
            aa = gencode.get(codon.upper())
            protein = protein + aa
        return protein

Now the output shows that the first three sequences are fine, but that our function has a problem translating the fourth sequence: