Use the data from the previous exercise, but instead of creating a single FASTA file, create three new FASTA files, one per sequence. The names of the FASTA files should be the same as the sequence header names, with the extension .fasta.
Chapter 3: Reading and writing files

Solutions

Splitting genomic DNA
We have a head start on this problem, because we have already tackled a similar problem in the previous chapter. Let's remind ourselves of the solution we ended up with for that exercise:
    my_dna = "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT"
    exon1 = my_dna[0:63]
    intron = my_dna[63:90]
    exon2 = my_dna[90:10000]
    print(exon1 + intron.lower() + exon2)
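The stop position of 10000 in the last slice is far larger than the length of the sequence. As a small sketch of why that is safe (toy_dna is a made-up sequence used only for illustration, not part of the exercise), Python clips a slice at the end of the string rather than raising an error:

```python
# toy_dna is a made-up 12-base sequence, used only for illustration
toy_dna = "ATCGATCGATCG"

# a stop position past the end of the string is silently clipped
tail = toy_dna[8:10000]
print(tail)

# leaving out the stop position gives the same result
same_tail = toy_dna[8:]
print(same_tail)
```

Both slices give the last four bases, so either form could be used to mean "everything from here to the end".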
What changes do we need to make? Firstly, we need to read the DNA sequence from a file instead of writing it in the code:
    dna_file = open("genomic_dna.txt")
    my_dna = dna_file.read()
Secondly, we need to create two new file objects to hold the output:
    coding_file = open("coding_dna.txt", "w")
    noncoding_file = open("noncoding_dna.txt", "w")
Finally, we need to concatenate the two exon sequences and write them to the coding DNA file, and write the intron sequence to the noncoding DNA file:
    coding_file.write(exon1 + exon2)
    noncoding_file.write(intron)
Let's put it all together, with some blank lines to separate out the different parts of the program:
    # open the file and read its contents
    dna_file = open("genomic_dna.txt")
    my_dna = dna_file.read()

    # extract the different bits of DNA sequence
    exon1 = my_dna[0:63]
    intron = my_dna[63:90]
    exon2 = my_dna[90:10000]

    # open the two output files
    coding_file = open("coding_dna.txt", "w")
    noncoding_file = open("noncoding_dna.txt", "w")

    # write the sequences to the output files
    coding_file.write(exon1 + exon2)
    noncoding_file.write(intron)
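The program above never closes its files. As a hedged sketch (an alternative, not the solution given here), the same steps can be written with Python's with statement, which closes each file automatically when its block ends. The temporary folder and the made-up 100-base input are assumptions added so that the example can run on its own:

```python
import os
import tempfile

# work in a temporary folder so the sketch does not touch real files
folder = tempfile.mkdtemp()
input_path = os.path.join(folder, "genomic_dna.txt")
coding_path = os.path.join(folder, "coding_dna.txt")
noncoding_path = os.path.join(folder, "noncoding_dna.txt")

# made-up input: 63 bases of exon, 27 of intron, 10 of exon
with open(input_path, "w") as dna_file:
    dna_file.write("A" * 63 + "c" * 27 + "G" * 10)

# each file is closed automatically at the end of its with block
with open(input_path) as dna_file:
    my_dna = dna_file.read()

# extract the different bits of DNA sequence
exon1 = my_dna[0:63]
intron = my_dna[63:90]
exon2 = my_dna[90:10000]

# write the coding and noncoding parts to separate files
with open(coding_path, "w") as coding_file:
    coding_file.write(exon1 + exon2)
with open(noncoding_path, "w") as noncoding_file:
    noncoding_file.write(intron)

# read the coding file back in to check what was written
with open(coding_path) as check_file:
    coding_dna = check_file.read()
print(len(coding_dna))
```

Because every file is closed as soon as we have finished with it, we can safely read the output back in later in the same program.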
Writing a FASTA file
Let's start this problem by thinking about the variables we're going to need. We have three DNA sequences in total, so we need three variables to hold the sequence headers, and three more to hold the sequences themselves:
    header_1 = "ABC123"
    header_2 = "DEF456"
    header_3 = "HIJ789"
    seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
    seq_2 = "actgatcgacgatcgatcgatcacgact"
    seq_3 = "ACTGAC-ACTGT--ACTGTA----CATGTG"
FASTA format has alternating lines of header and sequence, so before we try any sequence manipulation, let's try to write a program that produces the lines in the right order. Rather than writing to a file, we'll print the output to the screen for now: that will make it easier to see the output right away. Once we've got it working, we'll switch over to file output. Here are a few lines which will print data to the screen:
    print(header_1)
    print(seq_1)
    print(header_2)
    print(seq_2)
    print(header_3)
    print(seq_3)
and here's what the output looks like:
    ABC123
    ATCGTACGATCGATCGATCGCTAGACGTATCG
    DEF456
    actgatcgacgatcgatcgatcacgact
    HIJ789
    ACTGAC-ACTGT--ACTGTA----CATGTG
Not far off: the lines are in the right order, but we forgot to include the greater-than symbol at the start of the header. Also, we don't really need to print the header and the sequence separately for each sequence, because we can include a newline character in the print string in order to get them on separate lines. Here's an improved version of the code:
    print('>' + header_1 + '\n' + seq_1)
    print('>' + header_2 + '\n' + seq_2)
    print('>' + header_3 + '\n' + seq_3)
The output now looks like this:
    >ABC123
    ATCGTACGATCGATCGATCGCTAGACGTATCG
    >DEF456
    actgatcgacgatcgatcgatcacgact
    >HIJ789
    ACTGAC-ACTGT--ACTGTA----CATGTG
Next, let's tackle the problems with the sequences. The second sequence is in lower case, and it needs to be in upper case; we can fix that using the upper string method. The third sequence has a bunch of gaps that we need to remove. We haven't come across a remove method, but we do know how to replace one character with another. If we replace all the gap characters with an empty string, it will be the same as removing them. Here's a version that fixes both sequences:
    print('>' + header_1 + '\n' + seq_1)
    print('>' + header_2 + '\n' + seq_2.upper())
    print('>' + header_3 + '\n' + seq_3.replace('-', ''))
Now the printed output is perfect:
    >ABC123
    ATCGTACGATCGATCGATCGCTAGACGTATCG
    >DEF456
    ACTGATCGACGATCGATCGATCACGACT
    >HIJ789
    ACTGACACTGTACTGTACATGTG
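It's worth noting that upper and replace both return a new string and leave the original variable untouched. A short sketch to check this, using the same two sequences (the names upper_seq and ungapped_seq are introduced here only for illustration):

```python
seq_2 = "actgatcgacgatcgatcgatcacgact"
seq_3 = "ACTGAC-ACTGT--ACTGTA----CATGTG"

# each method returns a new string rather than changing the old one
upper_seq = seq_2.upper()
ungapped_seq = seq_3.replace("-", "")

# the original still contains its gaps
print(seq_3)
print(ungapped_seq)

# the difference in length is the number of gap characters removed
print(len(seq_3) - len(ungapped_seq))
```

This is why we can call the methods directly inside the print statement: we only need the cleaned-up copies for output, so there is no need to store them.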
The final step is to switch from printed output to writing to a file. We open a new file, and change the three print lines to write:
    output = open("sequences.fasta", "w")
    output.write('>' + header_1 + '\n' + seq_1)
    output.write('>' + header_2 + '\n' + seq_2.upper())
    output.write('>' + header_3 + '\n' + seq_3.replace('-', ''))
After making these changes the code doesn't produce any output on the screen, so to see what's happened we need to take a look at the sequences.fasta file:
    >ABC123
    ATCGTACGATCGATCGATCGCTAGACGTATCG>DEF456
    ACTGATCGACGATCGATCGATCACGACT>HIJ789
    ACTGACACTGTACTGTACATGTG
This doesn't look right: the second and third lines have been joined together, as have the fourth and fifth. What has happened?
It looks like we've uncovered a difference between the print function and the write method. print automatically puts a new line at the end of the string, whereas write doesn't. This means we've got to be careful when switching between them! The fix is quite simple: we just add a newline onto the end of each string that gets written to the file:
    output = open("sequences.fasta", "w")
    output.write('>' + header_1 + '\n' + seq_1 + '\n')
    output.write('>' + header_2 + '\n' + seq_2.upper() + '\n')
    output.write('>' + header_3 + '\n' + seq_3.replace('-', '') + '\n')
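To see the difference between print and write side by side without creating any files, here is a small sketch that uses io.StringIO from the standard library, an object that behaves like a file but keeps its contents in memory. Using StringIO is a choice made for this illustration only; the solution itself writes to a real file.

```python
import io

# StringIO behaves like a file but keeps its contents in memory
written = io.StringIO()
written.write("line one")
written.write("line two")

# print can send its output to a file-like object via the file argument
printed = io.StringIO()
print("line one", file=printed)
print("line two", file=printed)

# repr shows the newline characters explicitly
print(repr(written.getvalue()))
print(repr(printed.getvalue()))
```

The text sent through write runs together with no line breaks, whereas print has added a newline after each string.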
The arguments for the write statements are getting quite complicated, but they are all made up of simple building blocks. For example the last one, if we translated it into English, would read "a greater-than symbol, followed by the variable header_3, followed by a newline, followed by the variable seq_3 with all hyphens replaced with an empty string, followed by another newline".
Here's the final code, including the variable definitions at the beginning, with blank lines and comments:
    # set the values of all the header variables
    header_1 = "ABC123"
    header_2 = "DEF456"
    header_3 = "HIJ789"

    # set the values of all the sequence variables
    seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
    seq_2 = "actgatcgacgatcgatcgatcacgact"
    seq_3 = "ACTGAC-ACTGT--ACTGTA----CATGTG"

    # make a new file to hold the output
    output = open("sequences.fasta", "w")

    # write the header and sequence for seq1
    output.write('>' + header_1 + '\n' + seq_1 + '\n')

    # write the header and uppercase sequence for seq2
    output.write('>' + header_2 + '\n' + seq_2.upper() + '\n')

    # write the header and sequence for seq3 with hyphens removed
    output.write('>' + header_3 + '\n' + seq_3.replace('-', '') + '\n')
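One way to check a result like this from inside Python, rather than opening the file in a text editor, is to close the output file and read it back in. The following is only a sketch: it writes two of the three records to a temporary folder (an assumption made so that the example is self-contained), and the explicit call to close is an addition that the solution code does not include.

```python
import os
import tempfile

# write to a temporary folder so that no real files are overwritten
path = os.path.join(tempfile.mkdtemp(), "sequences.fasta")

output = open(path, "w")
output.write('>' + "ABC123" + '\n' + "ATCGTACGATCGATCGATCGCTAGACGTATCG" + '\n')
output.write('>' + "DEF456" + '\n' + "actgatcgacgatcgatcgatcacgact".upper() + '\n')

# closing the file makes sure everything has been written to disk
output.close()

# read the file back in and split it into separate lines
check = open(path)
lines = check.read().splitlines()
check.close()

print(len(lines))
print(lines[0])
```

With two records written, we expect four lines that alternate between header and sequence.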
Writing multiple FASTA files
We can solve this problem with a slight modification of our solution to the previous exercise. We need to create three new files to hold the output, and we'll construct the name of each file by using string concatenation:
    output_1 = open(header_1 + ".fasta", "w")
    output_2 = open(header_2 + ".fasta", "w")
    output_3 = open(header_3 + ".fasta", "w")
Remember, the first argument to open is a string, so it's fine to use a concatenation because we know that the result of concatenating two strings is also a string.
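If we want to convince ourselves of this, we can build one of the file names on its own and inspect it before passing it to open. A minimal sketch (the variable name filename is introduced here only for illustration):

```python
header_1 = "ABC123"

# concatenating two strings gives another string
filename = header_1 + ".fasta"

print(filename)
print(isinstance(filename, str))
```

Storing the name in a variable first can also make the call to open a little easier to read.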
We also change the write statements so that we have one for each of the output files. We need to be careful with the numbers here in order to make sure that we get the right sequence in each file. Here's the final code, with comments.
    # set the values of all the header variables
    header_1 = "ABC123"
    header_2 = "DEF456"
    header_3 = "HIJ789"

    # set the values of all the sequence variables
    seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
    seq_2 = "actgatcgacgatcgatcgatcacgact"
    seq_3 = "ACTGAC-ACTGT--ACTGTA----CATGTG"

    # make three files to hold the output
    output_1 = open(header_1 + ".fasta", "w")
    output_2 = open(header_2 + ".fasta", "w")
    output_3 = open(header_3 + ".fasta", "w")

    # write one sequence to each output file
    output_1.write('>' + header_1 + '\n' + seq_1 + '\n')
    output_2.write('>' + header_2 + '\n' + seq_2.upper() + '\n')
    output_3.write('>' + header_3 + '\n' + seq_3.replace('-', '') + '\n')
Looking at the code above, it seems like there's a lot of redundancy there. Each of the four sections of code (setting the header values, setting the sequence values, creating the output files, and writing data to the output files) consists of three nearly identical statements. Although the solution works, it seems to involve a lot of unnecessary typing! Also, having so much nearly identical code seems likely to cause errors if we need to change something. In the next chapter, we'll examine some tools which will allow us to start removing some of that redundancy.
Chapter 4: Lists and loops

Why do we need lists and loops?
Think back over the exercises that we've seen in the previous two chapters: they've all involved dealing with one bit of information at a time. In chapter 2, we used string manipulation tools to process single sequences, and in chapter 3, we practised reading and writing files one at a time. The closest we got to using multiple pieces of data was during the final exercise in chapter 3, where we were dealing with three DNA sequences.
If that's all that Python allowed us to do, it wouldn't be a very helpful tool for biology. In fact, there's a good chance that you're reading this book because you want to be able to write programs to help you deal with large datasets. A very common situation in biological research is to have a large collection of data (DNA sequences, SNP positions, gene expression measurements) that all need to be processed in the same way. In this chapter, we'll learn about the fundamental programming tools that will allow our programs to do this.
So far we have learned about several different data types (strings, numbers, and file objects), all of which store a single bit of information [1]. When we've needed to store multiple bits of information (for example, the three DNA sequences in the chapter 3 exercises) we have simply created more variables to hold them:
    # set the values of all the sequence variables
    seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
    seq_2 = "actgatcgacgatcgatcgatcacgact"
    seq_3 = "ACTGAC-ACTGT--ACTGTA----CATGTG"
[1] We know that files are slightly different to strings and numbers because they can store a lot of information, but each file object still only refers to a single file.
The limitations of this approach became clear quite quickly as we looked at the solution code: it only worked because the number of sequences was small, and we knew the number in advance. If we were to repeat the exercise with three hundred or three thousand sequences, the vast majority of the code would be given over to storing variables and it would become completely unmanageable. And if we were to try to write a program that could process an unknown number of input sequences (for instance, by reading them from a file), we wouldn't be able to do it. To make our programs able to process multiple pieces of data, we need an entirely new type of structure which can hold many pieces of information at the same time: a list.
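As a first taste, here is a minimal sketch of the three sequences stored in a single list rather than in three separate variables. The variable name sequences and the extra fourth sequence are made up for this illustration:

```python
# a list holds any number of values under a single name
sequences = [
    "ATCGTACGATCGATCGATCGCTAGACGTATCG",
    "actgatcgacgatcgatcgatcacgact",
    "ACTGAC-ACTGT--ACTGTA----CATGTG",
]

# we can ask how many items the list holds
print(len(sequences))

# individual items are retrieved by position, counting from zero
print(sequences[1])

# adding another (made-up) sequence doesn't need a new variable
sequences.append("ATCG")
print(len(sequences))
```

Notice that the code does not need to know in advance how many sequences there will be.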
We've also dealt exclusively with programs whose statements are executed from top to bottom in a very straightforward way. This has great advantages when first starting to think about programming: it makes it very easy to follow the flow of a program. The downside of this sequential style of programming, however, is that it leads to very redundant code like we saw at the end of the previous chapter:
    # make three files to hold the output
    output_1 = open(header_1 + ".fasta", "w")
    output_2 = open(header_2 + ".fasta", "w")
    output_3 = open(header_3 + ".fasta", "w")
Again it was only possible to solve the exercise in this manner because we knew in advance the number of output files we were going to need. Looking at the code, it's clear that these three lines consist of essentially the same statement being executed multiple times, with some slight variations. This idea of repetition-with-variation is incredibly common in programming problems, and Python has built in