Basic Input and Output using Perl
Environmental Genomics Thematic Programme Data Centre http://envgen.nox.ac.uk
Bela Tiwari [email protected]
Basic Input and Output – I/O
Input refers to getting information into your program.
Output refers to getting information out of your program.
I/O is how computer programs talk to the rest of the “world”.
For example, you may want to:
• Open a file and read its contents
• Write your results to a file
• Ask the user of the program to supply information
Every Perl script starts with three connections to the outside world. One of these is for input to your program, and two are for output from your program.
Input – By default, Perl has a connection set up for taking information entered from the keyboard.
This connection is referred to as STDIN
Output – 1) By default, Perl has a connection set up for writing data out to your terminal (screen).
This connection is referred to as STDOUT
2) By default, Perl has a connection set up for writing diagnostic messages (warnings, etc.) to your terminal.
This connection is referred to as STDERR
You can change the default locations for STDIN, STDOUT and STDERR
Reading data from STDIN
aka: how to type in information and get your program to listen
To read a line of data into your program from the keyboard, use the angle bracket operator <> on STDIN:
$line = <STDIN>;
<STDIN> reads one line of input from the keyboard and this line is “fed into” the variable $line.
In computer speak: <STDIN> reads a line of input from standard input and returns it as the function result. This result is then assigned to the scalar variable $line.
Reading data from STDIN
Try this:
#!/usr/bin/perl
print "Please enter your name: ";
$name = <STDIN>;
print "Hello $name. Glad to meet you.";
When you run the script, does the greeting look like what you wanted?
Newlines and STDIN
When you entered your name, you pressed the Return key. Perl takes both your name and the Return-key value into the script.
Thus, in the first version of the script, $name contains your name followed by a return, also known as a newline.
So your greeting probably looked something like:
Hello Bob
. Glad to meet you.
Also, Perl didn’t know to add a newline at the end of your greeting, so your greeting ran straight into your cursor.
Sorting out the newlines
Now try:
#!/usr/bin/perl
print "Please enter your name: ";
$name = <STDIN>;
chomp ($name);
print "Hello $name. Glad to meet you.\n";
Save and run this script.
Chomp and \n
#!/usr/bin/perl
print "Please enter your name: ";
$name = <STDIN>;   # at this point $name still has the newline at the end
chomp ($name);     # chomp removes the final newline from the variable
print "Hello $name. Glad to meet you.\n";
# \n adds a newline to the end of your greeting
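To see what chomp actually changes, here is a minimal sketch that measures the length of the input before and after chomp. So that it runs without anyone typing, it reads from an in-memory filehandle that stands in for the keyboard; the typed text "Bob" and the name $fake_keyboard are assumptions made for illustration, not part of the course scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: this string stands in for what a user types, Return key included.
my $typed = "Bob\n";
open(my $fake_keyboard, '<', \$typed) or die "Can't open in-memory handle: $!";

my $name    = <$fake_keyboard>;   # with a real keyboard: $name = <STDIN>;
my $before  = length($name);      # the count includes the trailing newline
my $removed = chomp($name);       # chomp returns how many characters it removed
my $after   = length($name);
close($fake_keyboard);

print "Length before chomp: $before\n";   # 4
print "Characters removed: $removed\n";   # 1
print "Length after chomp: $after\n";     # 3
print "Hello $name. Glad to meet you.\n";
```

Because chomp only removes a trailing newline, it is safe to call it on a value that has already been chomped.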
Newlines and STDIN
\n – this means newline
There are other useful characters like this. For example:
\t – tab
\b – backspace
\a – alarm (bell)
print "Hello $name. Glad to meet you.\n";
print "Hello $name. Glad to meet you.\n\n";
print "Hello $name. Glad to meet you.\n\n\n";
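One detail worth seeing in action: characters such as \n and \t are only interpreted inside double quotes. The sketch below builds a tab-separated line and compares it with the same text in single quotes. The values "demo_seq" and 7 are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up values, for illustration only.
my $name  = "demo_seq";
my $bases = 7;

my $row     = "$name\t$bases\n";     # double quotes: variables, \t and \n are interpreted
my $literal = '$name\t$bases\n';     # single quotes: everything is kept exactly as typed

print $row;                          # demo_seq, a tab, 7, then a newline
print $literal, "\n";                # prints the text $name\t$bases\n unchanged
print "Row length: ", length($row), "\n";
print "Literal length: ", length($literal), "\n";
```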
STDOUT – getting stuff to the screen
This statement:
print "Hello $name. Glad to meet you.\n";
resulted in a greeting being printed to your screen.
But you never explicitly told Perl to send it there.
Recall: By default, Perl has a connection set up for writing data out to your terminal (screen) - STDOUT
The print command sends information to STDOUT by default.
And by default, STDOUT is connected to your terminal.
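You can also name the connection explicitly in a print statement. The following sketch sends results to one filehandle and diagnostics to another. The helper name report and the messages are hypothetical, chosen only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write a result to $out and, if needed, a diagnostic to $err.
sub report {
    my ($out, $err, $value) = @_;
    print {$out} "Result: $value\n";
    print {$err} "Warning: value is negative\n" if $value < 0;
}

print STDOUT "This goes to STDOUT, exactly like a plain print.\n";
print STDERR "This goes to STDERR.\n";

report(\*STDOUT, \*STDERR, 42);   # only a result line
report(\*STDOUT, \*STDERR, -1);   # a result line and a warning
```

Both streams normally appear on your terminal. If you run a script like this from a shell as `./script.pl > results.txt`, only the STDOUT lines should end up in results.txt, while the STDERR lines still appear on the screen.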
Reading from and writing to files
STDIN, STDOUT and STDERR are known as Filehandles.
Filehandles are connections.
You can set up connections to locations other than the keyboard or terminal by setting up your own Filehandles.
To do this, you need to tell Perl a few things:
• what you want a connection to, e.g. the name of a file (or directory)
• what you want to do, e.g. write to a file, read from a file, or both
• what you want to call the connection when you refer to it in your program, i.e. the name of the Filehandle
Reading from and writing to files
To open a connection to read from a file:
open(FILECON, "/home/btiwari/myfile.txt");
• what you want a connection to: /home/btiwari/myfile.txt
• what you want to do: read from the file
• what you want to call the connection when you refer to it in your program: FILECON
It is convention that Filehandle names be in capital letters.
Reading from and writing to files
To open a connection to read from a file:
open(FILECON, "/home/btiwari/myfile.txt");
To open a connection to create and write to a file (an existing file of that name is overwritten):
open(FILECON, ">/home/btiwari/anotherfile.txt");
To open a connection to append to a file:
open(FILECON, ">>/home/btiwari/anotherfile.txt");
To open a connection to an existing file you wish to read from and write to:
open(FILECON, "+</home/btiwari/myfile.txt");
Note that "+>" also opens a file for reading and writing, but it empties the file first.
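Perl also accepts a three-argument form of open, where the mode is given separately from the filename, and the filehandle can be stored in an ordinary scalar variable. This is a different style from the one used on these slides, shown here as a minimal sketch. The temporary directory and the variable names are assumptions for illustration; File::Temp is a standard module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory, so the example does not touch your own files.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/anotherfile.txt";

# '>' creates the file (or empties an existing one) and opens it for writing.
open(my $out, '>', $path) or die "I can't create $path: $!";
print $out "first line\n";
close($out);

# '>>' opens the file for appending, keeping what is already there.
open(my $app, '>>', $path) or die "I can't append to $path: $!";
print $app "second line\n";
close($app);

# '<' opens the file for reading.
open(my $in, '<', $path) or die "I can't read $path: $!";
my @lines = <$in>;   # in list context, <$in> returns all remaining lines
close($in);

print "Read ", scalar(@lines), " lines\n";   # 2
print @lines;
```

Keeping the mode separate means a filename that happens to begin with > or < cannot change how the file is opened.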
Reading from and writing to files
Overview of accessing files:
• Open the file to read from
• Open a file to write to
• Carry out processing
• Close the file you are writing to
• Close the file you are reading from
#!/usr/bin/perl
open(FROMFILE, "/home/btiwari/infile.txt");
open(TOFILE, ">/home/btiwari/outfile.txt");
#Process Process Process#
close(TOFILE);
close(FROMFILE);
Reading from and writing to files
An example to try (seqlength.pl):
#!/usr/bin/perl
use strict;
use warnings;
my $count = 0; #declare variable $count
open(SEQ, "x52524.tfa");
open(OUTFILE, ">outfile.txt");
while (<SEQ>) {
    if (/^>/) {
        print OUTFILE $_;
    }
    else {
        chomp $_; #get rid of the newline
        $count += length($_);
    }
}
print OUTFILE "The length of the above sequence is $count bases.\n";
close(OUTFILE);
close(SEQ);
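If you do not have x52524.tfa to hand, the counting logic can be tried on its own. The sketch below applies the same idea (skip title lines, chomp, add up lengths) to a made-up FASTA record held in a string. The helper name count_bases and the record itself are assumptions for illustration, not the contents of the course file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: add up the sequence characters read from an open filehandle.
sub count_bases {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        next if $line =~ /^>/;     # title lines start with > and are not counted
        chomp $line;               # do not count the newline
        $count += length($line);
    }
    return $count;
}

# Made-up record: one title line and two sequence lines (6 + 3 bases).
my $fasta = ">demo_seq example record\nACGTAC\nGGT\n";
open(my $seq, '<', \$fasta) or die "I can't open the in-memory record: $!";
my $total = count_bases($seq);
close($seq);

print "The length of the above sequence is $total bases.\n";   # 9
```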
Now the niceties – explaining seqlength.pl
#!/usr/bin/perl
use strict;
use warnings;
my $count = 0;
open(SEQ, "x52524.tfa") or die "I can't open your file: $!";
open(OUTFILE, ">outfile.txt") or die "I can't open your file: $!";
while (<SEQ>) {
    if (/^>/) {
        print OUTFILE; #where is $_? print uses it by default
    }
    else {
        chomp; #where is $_? chomp uses it by default
        $count += length(); #where is $_? length uses it by default
    }
}
print OUTFILE "The length of the above sequence is $count bases.\n";
close(OUTFILE);
close(SEQ);
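The special variable $! holds the reason the last system request failed, which is why it is useful in a die message. Here is a minimal sketch that shows the message without stopping the script. The path is hypothetical and assumed not to exist on your system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist.
my $missing = "/no/such/directory/x52524.tfa";

my $opened = open(my $fh, '<', $missing);   # open returns false on failure
my $reason = "$!";                          # copy $! straight away; later calls can overwrite it

if ($opened) {
    print "Unexpectedly opened $missing\n";
    close($fh);
}
else {
    # The exact wording of $reason comes from the operating system.
    print "I can't open $missing: $reason\n";
}
```

With "or die" the script would stop at the failed open and print the same kind of message to STDERR.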
Remember this?
Reading data from STDIN
#!/usr/bin/perl
print "Please enter your name: ";
$name = <STDIN>;
print "Hello $name. Glad to meet you.";
Why not ask the user for what files they want to process?
More niceties – ask the user for the filenames (seqlength2.pl)
#!/usr/bin/perl
use strict;
use warnings;
my $count = 0;
print "\nWhich sequence do you want to process: ";
chomp (my $infile = <STDIN>);
print "\nWhat file should I write the results to: ";
chomp (my $outfile = <STDIN>);
open(SEQ, $infile) or die "I can't open $infile: $!";
open(OUTFILE, ">$outfile") or die "I can't open $outfile: $!";
while (<SEQ>) {
    if (/^>/) {
        print OUTFILE;
    } else {
        chomp;
        $count += length();
    }
}
print OUTFILE "The length of the above sequence is $count bases.\n";
close(OUTFILE);
close(SEQ);
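The prompt, read, chomp pattern appears twice in the script above, so it can be wrapped in a small subroutine. The sketch below is one way to do that. The name ask is hypothetical, and an in-memory filehandle stands in for the keyboard so the example runs without any typing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: print a question, read one line from $in,
# and return the answer without its newline.
sub ask {
    my ($in, $question) = @_;
    print $question;
    my $answer = <$in>;
    die "No answer given\n" unless defined $answer;   # input ended before an answer arrived
    chomp $answer;
    return $answer;
}

# Assumption: this string stands in for two lines typed at the keyboard.
my $typed = "x52524.tfa\nresults.txt\n";
open(my $fake_keyboard, '<', \$typed) or die "I can't open the in-memory input: $!";

my $infile  = ask($fake_keyboard, "Which sequence do you want to process: ");
my $outfile = ask($fake_keyboard, "What file should I write the results to: ");
close($fake_keyboard);

print "\nReading from $infile, writing to $outfile\n";
```

To read from the real keyboard, pass \*STDIN as the first argument, for example ask(\*STDIN, "Which sequence do you want to process: ").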
Even more niceties – the magic of <>
Up until now, you have seen <FILEHANDLE>
But <> without an explicit filehandle is “magical”:
It reads from each file listed on the command line as if it were one single large file.
If no files are given on the command line, it reads from STDIN.
Try typing:
./seqlength3.pl x52524.tfa
#!/usr/bin/perl
use strict;
use warnings;
my $count = 0;
print "\nWhat file should I write the results to: ";
chomp (my $outfile = <STDIN>);
open(OUTFILE, ">$outfile") or die "I can't open $outfile: $!";
while (<>) {
    if (/^>/) {
        print OUTFILE;
    } else {
        chomp;
        $count += length();
    }
}
print OUTFILE "The length of the above sequence is $count bases.\n";
close(OUTFILE);
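The filenames given on the command line are stored in the array @ARGV, and <> works its way through that array. The sketch below imitates a command line by filling @ARGV by hand with two small temporary files, then reads them with a single loop. The file names and their contents are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Made-up sequence files in a throwaway directory.
my $dir = tempdir(CLEANUP => 1);
my %content = (
    "first.tfa"  => ">seq_one\nACGT\n",
    "second.tfa" => ">seq_two\nGG\nTTA\n",
);
for my $name (sort keys %content) {
    open(my $out, '>', "$dir/$name") or die "I can't create $dir/$name: $!";
    print $out $content{$name};
    close($out);
}

# Assumption: setting @ARGV by hand stands in for typing
#   ./script.pl first.tfa second.tfa
@ARGV = ("$dir/first.tfa", "$dir/second.tfa");

my ($titles, $bases) = (0, 0);
while (<>) {                 # reads first.tfa, then carries straight on into second.tfa
    if (/^>/) {
        $titles++;
    } else {
        chomp;
        $bases += length();
    }
}

print "Read $titles title lines and $bases bases in total\n";   # 2 and 9
```

Because the loop never notices where one file ends and the next begins, the count runs across both files.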
The magic of <> continued
<> reads from each file listed on the command line as if it were one single large file.
Try typing:
./seqlength4.pl x52524.tfa m83172.tfa
Look at the sequence files and the result file.
Open a new terminal and read the script seqlength4.pl. Can you see why it does what it does?
How did it store and print out only the sequence names without the rest of the title line?
Why does the script now count the total number of bases in both sequences?
Reading the files in a directory
To list files on your system in a particular directory, you need to:
1) open the directory
opendir (MYDIR, "/home/user1/mydir") or die "I can't open the directory: $!";
2) read the directory
$myfile = readdir MYDIR; #reads the next entry in the directory (the first entry, on the first call)
@myfiles = readdir MYDIR; #reads all the remaining entries in the directory
3) close the directory
closedir(MYDIR);
Reading the files in a directory cont’d
readdir_1.pl
#!/usr/bin/perl
use strict;
use warnings;
opendir(MYDIR, "/home/btiwari/bioperl-1.2.3") or die "Can't open dir: $!";
my $onedir = readdir(MYDIR);
print("\nThe one directory is $onedir\n");
my $seconddir = readdir(MYDIR);
print("\nThe second directory is $seconddir\n");
my $thirddir = readdir(MYDIR);
print("\nThe third directory is $thirddir\n");
closedir(MYDIR);
Notice that assigning the result of readdir to a scalar variable gives you one entry at a time: each call to readdir returns the next entry in the directory.
Reading the files in a directory cont’d
readdir_2.pl
#!/usr/bin/perl
use strict;
use warnings;
opendir(MYDIR, "/home/btiwari/bioperl-1.2.3") or die "I couldn't open the dir: $!";
my @allfiles = readdir(MYDIR);
print("\nThe first directory is $allfiles[0]\n");
print("\nThe second directory is $allfiles[1]\n");
print("\nThe third directory is $allfiles[2] and the fourth is $allfiles[3]\n");
closedir(MYDIR);
Notice that assigning the result of readdir to a list (array) causes all the entries in the directory to be added to the list.
Each file can then be called by asking for a specific number in the list (as here), or by looping through the array if you want to do something to each file in the list (as is common).
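Looping through the list might look like the following minimal sketch. It builds a small temporary directory so it can run anywhere; the names subdir and notes.txt are made up. Note that readdir returns names only, so the directory has to be put back in front of each name before testing whether it is a directory (-d) or a plain file (-f).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory holding one sub-directory and one file.
my $dir = tempdir(CLEANUP => 1);
mkdir("$dir/subdir") or die "I can't create $dir/subdir: $!";
open(my $fh, '>', "$dir/notes.txt") or die "I can't create $dir/notes.txt: $!";
close($fh);

opendir(my $dh, $dir) or die "I can't open $dir: $!";
my @allfiles = readdir($dh);
closedir($dh);

my (@dirs, @files);
foreach my $entry (sort @allfiles) {
    my $path = "$dir/$entry";      # readdir gives names only, so add the directory
    if (-d $path) {
        push @dirs, $entry;        # a directory
    }
    elsif (-f $path) {
        push @files, $entry;       # a plain file
    }
}

print "Directories: @dirs\n";
print "Plain files: @files\n";
```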
Reading the files in a directory cont’d
The problem:
If you are listing files on a Linux/Unix system, you will see that the listing includes the two special entries . and .. (usually, though not necessarily, as the first two entries).
Usually you don’t want to act on these. You need to add code to ignore them:
e.g.
my @allfiles = grep !/^\./, readdir MYDIR;
#this ignores ALL dot files in the directory
#see readdir_3.pl
my @allfiles = grep !/^\.\.?\z/, readdir MYDIR;
#this ignores . and ..
#see readdir_4.pl
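The difference between the two patterns is easiest to see side by side. This sketch applies both to the same temporary directory, which contains one hidden file and two ordinary files; the file names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory with one dot file and two ordinary files.
my $dir = tempdir(CLEANUP => 1);
for my $name ('.hidden', 'a.tfa', 'b.tfa') {
    open(my $fh, '>', "$dir/$name") or die "I can't create $dir/$name: $!";
    close($fh);
}

opendir(my $dh, $dir) or die "I can't open $dir: $!";
my @entries = readdir($dh);
closedir($dh);

# Drop every name that starts with a dot (this also drops . and ..).
my @visible = sort grep { !/^\./ } @entries;

# Drop only the two special entries . and .. and keep other dot files.
my @not_special = sort grep { !/^\.\.?\z/ } @entries;

print "Ignoring all dot files: @visible\n";            # a.tfa b.tfa
print "Ignoring only . and ..: @not_special\n";        # .hidden a.tfa b.tfa
```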
It is good to understand the above commands – but to code effectively, you can cut and paste code you know will do the job!
• STDIN: typing or sending information to a program
• STDOUT: information from a program sent to the terminal
• chomp
• Filehandles: opening and closing files, reading from and writing to files
• Opening, reading the contents of, and closing directories
• Using die and capturing error messages
• The magic of <>
• Useful characters: \n, $_
• Useful operators: &&, ||, and, or