As we start this chapter, we find ourselves once again doing things in a slightly different order to most programming books. The majority of introductory programming books won't consider working with external files until much further along, so why are we introducing it now?
The answer, as was the case in the last chapter, lies in the particular jobs that we want to use Python for. The data that we as biologists work with is stored in files, so if we're going to write useful programs we need a way to get the data out of files and into our programs (and vice versa). As you were going through the exercises in the previous chapter, it may have occurred to you that copying and pasting a DNA sequence directly into a program each time we want to use it is not a very good approach to take, and you'd be right. The sequences we were working with in the exercises were very short; obviously, real-life data will be much longer. Also, it seems inelegant to have the data we want to work on mixed up with the code that manipulates it. In this chapter we'll see a better way to do it.
We're lucky in biology that many of the types of data that we work with are stored in text¹ files, which are relatively simple to process using Python. Chief among these, of course, are DNA and protein sequence data, which can be stored in a variety of formats.² But there are many other types of data (sequencing reads, quality scores, SNPs, phylogenetic trees, read maps, geographical sample data, genetic distance matrices) which we can access from within our Python programs.
¹ i.e. files which you can open in a text editor and read, as opposed to binary files which cannot be read directly.
² In this book we'll mostly be talking about FASTA format as it's the simplest and most common format, but there are many more.
Chapter 3: Reading and writing files
Another reason for our interest in file input/output is the need for our Python programs to work as part of a pipeline or workflow involving other, existing tools. When it comes to using Python in the real world, we often want Python to either accept data from, or provide data to, another program. Often the easiest way to do this is to have Python read, or write, files in a format that the other program already understands.
Reading text from a file
Firstly, a quick note about what we mean by text. In programming, when we talk about text files, we are not necessarily talking about something that is human readable. Rather, we are talking about a file that contains characters and lines: something that you could open up and view in a text editor, regardless of whether you could actually make sense of the file or not. Examples of text files which you might have encountered include:
• FASTA files of DNA or protein sequences
• files containing output from command-line programs (e.g. BLAST)
• FASTQ files containing DNA sequencing reads
• HTML files
• word processing documents
• and of course, Python code
In contrast, most files that you encounter day-to-day will be binary files: ones which are not made up of characters and lines, but of bytes. Examples include:
• image files (JPEGs and PNGs)
• audio files
• compressed files (e.g. ZIP files)
If you're not sure whether a particular file is text or binary, there's a very simple way to tell: just open it up in a text editor. If the file displays without any problem, then it's text (regardless of whether you can make sense of it or not). If you get an error or a warning from your text editor, or the file displays as a collection of indecipherable characters, then it's binary.
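The text editor test can be roughly approximated in code. The sketch below is only a heuristic, and looks_like_text is a name invented here for illustration rather than anything built into Python. It reads the first chunk of a file as raw bytes and guesses that the file is binary if it finds a NUL byte or bytes that are not valid UTF-8. It assumes that the text files we care about are ASCII or UTF-8, which will not be true of every text file.

```python
import codecs


def looks_like_text(path, sample_size=1024):
    """Guess whether the file at `path` is text (heuristic only).

    Hypothetical helper: a NUL byte, or bytes that cannot be decoded
    as UTF-8, are taken as a sign that the file is binary.
    """
    # "rb" opens the file in binary mode, so we get raw bytes back.
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)

    # NUL bytes are common in binary formats and rare in text.
    if b"\x00" in sample:
        return False

    # An incremental decoder with final=False tolerates a multi-byte
    # character that happens to be cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

Calling looks_like_text on a FASTA file should return True, while calling it on a PNG image should return False. An empty file is reported as text, and a binary file that happens to contain only valid UTF-8 bytes in its first chunk would be misclassified, so treat the answer as a hint rather than a guarantee.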
The examples and exercises in this chapter are a little different from those in the previous one, because they rely on the existence of the files that we are going to manipulate. If you want to try running the examples in this chapter, you'll need to make sure that there is a file in your working directory called dna.txt which has a single line containing a DNA sequence. The easiest way to do this is to run the examples while in the chapter_3 folder inside the exercises download¹.
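If you don't have the exercises download to hand, one alternative is to create a suitable file yourself. The following is a minimal sketch under a couple of assumptions: the sequence is a made-up placeholder, not the contents of the book's own dna.txt, and running the code will overwrite any file called dna.txt that is already in your working directory.

```python
# Placeholder sequence, invented for illustration only.
placeholder_sequence = "ACTGTACGTGCACTGATC"

# Opening a file with "w" creates it, or replaces it if it already exists.
with open("dna.txt", "w") as dna_file:
    # Write the sequence as a single line, ending with a newline.
    dna_file.write(placeholder_sequence + "\n")
```

After running this, opening dna.txt in a text editor should show one line of DNA sequence.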