Overview
• Perl has a large variety of builtin functions, which you can call any time you need them.
• You have already met some, e.g. 'print', 'push' and 'pop'.
• You can get a full list of the functions, and instructions for using them, on the 'perlfunc' manpage.
• The functions range from the very basic to some which are quite complex or obscure.
General points
• Functions are called by giving the function name, followed by arguments in parentheses.
function_name(...stuff...)
• A function may return a value or values, which we can capture.
$result = function(...stuff...)
or
@results = function(...stuff...)
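As a concrete illustration of the two calling patterns above, here is a minimal sketch. It uses length and split, both covered later in this section; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $sequence = "AGCTTAGC";

# A function that returns one value: capture it in a scalar.
my $size = length($sequence);

# A function that returns several values: capture them in an array.
# Splitting on an empty pattern gives one element per character.
my @bases = split(//, $sequence);

print "Sequence $sequence has $size bases\n";
print "First base: $bases[0], last base: $bases[-1]\n";
```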
SPLIT
• split can take a text string and chop it into bits, producing an array of the
individual pieces. The "recognition sequence" can be any regex but is not retained in the array (split.pl).
$dna_strand = "AGCTATCGATGCTTTAAACGGCTATCGAGTTTTTTTT";
print "My DNA strand is: $dna_strand\n";
print "If we split this using TTTAAA we get the following fragments:\n";
@dna_fragments = split(/TTTAAA/,$dna_strand);
foreach $fragment (@dna_fragments) {
    print "$fragment\n";
}
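Because the separator can be any regex, a single split can cut at more than one recognition sequence. The sketch below is only an illustration: the strand and the two cut sequences are invented for the example. Note that neither separator appears in the output.

```perl
use strict;
use warnings;

# Invented strand containing two different recognition sequences.
my $dna_strand = "AGCTGAATTCGGCTAGGATCCTTAGCGAATTCAAA";

# The | in the pattern means "either GAATTC or GGATCC".
my @dna_fragments = split(/GAATTC|GGATCC/, $dna_strand);

foreach my $fragment (@dna_fragments) {
    print "$fragment\n";
}
```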
JOIN
• join is the conceptual opposite of split. Let's think of it in terms of a DNA ligation with a linker sequence (join.pl):
my $ligated_fragments;
my @dna_fragments;
@dna_fragments=("AGGCTT", "AGCCCAAATT", "AGCCCCATTA");
$ligated_fragments = join ("aaattt", @dna_fragments);
print "The fragments have been ligated with an aaattt linker:\n";
print "$ligated_fragments\n";
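To see that join really does undo split, here is a small round-trip sketch. The "-" separator and the variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $original = "AGGCTT-AGCCCAAATT-AGCCCCATTA";

# Chop the string at every "-" ...
my @pieces = split(/-/, $original);

# ... and glue the pieces back together with the same separator.
my $rebuilt = join("-", @pieces);

if ($rebuilt eq $original) {
    print "split followed by join gave back the original string\n";
}
```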
LENGTH
• length finds the length of a string (or a bit of DNA!) (length.pl).
• It does NOT find the length of an array – use 'scalar' for that.
#!/usr/bin/perl -w
use strict;
my ($genome, $genome_length);
$genome =
"AGATCATCGATCGATCGATCAGCATTCAGCTACTAGCTAGCTGGGGGGATCATCTATC";
$genome_length = length($genome);
print "My genome sequence is:\n$genome\nand is $genome_length bases long\n";
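The difference between length (for strings) and scalar (for arrays) can be sketched as follows. The gene names are placeholders picked for the example.

```perl
use strict;
use warnings;

my @genes = ("BRCA1", "TP53", "MYC");

# scalar gives the number of elements in the array.
my $gene_count = scalar(@genes);

# length gives the number of characters in one string.
my $name_length = length($genes[0]);

print "There are $gene_count genes; the first name has $name_length characters\n";
```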
SUBSTR
• substr extracts a specified part of a string (substr.pl).
• substr($scalar, $start_position, $length), where positions are counted from 0.
#!/usr/bin/perl -w
use strict;
my ($dna_sequence, $substring);
$dna_sequence = "AGCTATACGACTAGTCTGATCGATCATCGATGCTGA";
$substring = substr ($dna_sequence, 0, 5);
print "The first 5 bases of $dna_sequence are:\n$substring\n";
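A few more substr calls may help to show how the start position and length interact. This is a sketch using the same sequence; the variable names are invented. A negative start position counts from the end of the string, and leaving out the length takes everything up to the end.

```perl
use strict;
use warnings;

my $dna_sequence = "AGCTATACGACTAGTCTGATCGATCATCGATGCTGA";

my $first_five = substr($dna_sequence, 0, 5);   # positions 0 to 4
my $middle     = substr($dna_sequence, 5, 4);   # positions 5 to 8
my $last_three = substr($dna_sequence, -3);     # the final 3 bases

print "First five: $first_five\n";
print "Middle:     $middle\n";
print "Last three: $last_three\n";
```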
UC/LC
• uc (uppercase) and lc (lowercase) simply change the case of a string (uclc.pl).
#!/usr/bin/perl -w
use strict;
my ($mixed_case, $uppercase, $lowercase);
$mixed_case = "AgCtAAGggGTCaCAcAAAAaCCCcATTTgcCC";
$uppercase = uc ($mixed_case);
$lowercase = lc ($mixed_case);
print "FrOm $mixed_case we get:\n";
print "UPPERCASE: $uppercase\n";
print "lowercase: $lowercase\n";
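One common use of lc (or uc) is comparing two sequences without regard to case. The sketch below assumes two invented sequences. Both functions return a new string and leave the original untouched.

```perl
use strict;
use warnings;

my $seq_a = "AgCtAA";
my $seq_b = "agctaa";

# Compare lowercase copies; $seq_a and $seq_b themselves are not changed.
my $same = (lc($seq_a) eq lc($seq_b)) ? 1 : 0;

if ($same) {
    print "$seq_a and $seq_b are the same sequence, ignoring case\n";
}
```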
POP/PUSH/SHIFT/UNSHIFT
• These four functions add or remove values from the right or left sides of arrays.
(pushpop.pl)
#!/usr/bin/perl
my @animals = ("dog", "cat", "badger", "snake");
my $last = pop(@animals); #snake
my $first = shift(@animals); #dog
print "Last is $last, first is $first\n";
#Put last animal first:
unshift(@animals, $last); #snake is now first animal
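The sketch below uses all four functions in turn, with comments tracking the contents of the array after each step.

```perl
use strict;
use warnings;

my @animals = ("dog", "cat", "badger", "snake");

my $last  = pop(@animals);     # takes "snake" off the right: (dog, cat, badger)
my $first = shift(@animals);   # takes "dog" off the left:    (cat, badger)

unshift(@animals, $last);      # adds "snake" on the left:    (snake, cat, badger)
push(@animals, $first);        # adds "dog" on the right:     (snake, cat, badger, dog)

print "@animals\n";
```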
REVERSE/SORT
• Both take a list as input and return a list.
• Beware: by default sort compares items as strings (alphabetical order). This means the list (1, 2, 10) will sort to (1, 10, 2)! (revsort.pl)
#!/usr/bin/perl
my @animals = ("dog", "cat", "badger", "snake");
my @numbers = (1, 10, 2, 130);
my @sorted_animals = sort(@animals);
my @reverse_sorted_animals = reverse(@sorted_animals);
#Using sort as follows will do a numeric sort. Don't worry about exactly how it works!
my @sorted_numbers = sort { $a <=> $b } @numbers;
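To see the difference between the default ordering and a numeric one, here is a small sketch that prints both. The variable names are invented. The block { $a <=> $b } tells sort to compare each pair of items as numbers.

```perl
use strict;
use warnings;

my @numbers = (1, 10, 2, 130);

# Default sort compares the items as strings.
my @string_sorted = sort(@numbers);

# Supplying a comparison block makes sort compare them as numbers.
my @numeric_sorted = sort { $a <=> $b } @numbers;

# reverse turns the ascending list into a descending one.
my @descending = reverse(@numeric_sorted);

print "String order:  @string_sorted\n";
print "Numeric order: @numeric_sorted\n";
print "Descending:    @descending\n";
```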
GREP
• grep filters an array by a criterion that you specify.
• The criterion can be anything you like, but the simplest form is to look for a pattern match. We will use grep later on. (grep.pl)
#!/usr/bin/perl
my @fragments = ("aagt", "gt", "cct", "agtcgat");
#Get all the fragments containing 'ag'
my @filtered = grep(/ag/, @fragments);
#Remember that 'scalar' gets the length of an array.
print scalar(@filtered);
print " fragments contain 'ag'.\n";
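The test does not have to be a pattern match. As a sketch, grep also accepts a block of code, inside which the special variable $_ holds the element currently being tested. The variable names below are invented for the example.

```perl
use strict;
use warnings;

my @fragments = ("aagt", "gt", "cct", "agtcgat");

# Keep only the fragments longer than 3 characters.
my @long_fragments = grep { length($_) > 3 } @fragments;

# Keep only the fragments that do NOT contain 'ag'.
my @without_ag = grep { $_ !~ /ag/ } @fragments;

print "Long fragments: @long_fragments\n";
print "Without 'ag':   @without_ag\n";
```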
Substitution: s///
• This is one of the most useful Perl constructs.
• The obvious difference between DNA and RNA is the replacement of T with U.
• Let's mimic the transcription of DNA to RNA with a quick Perl script.
• We can use the global substitution operator 's///g'.
• This replaces text in a scalar that matches a pattern with some other text.
• It takes the form s/[one thing]/[another thing]/g
• The addition of the 'g' modifier specifies a global substitution, so that all occurrences will be replaced in one go.
• Let's see it in action (transcription.pl).
#!/usr/bin/perl
use strict;
use warnings;
my ($dna_molecule, $rna_molecule);
$dna_molecule = "AGCTATCGATGCTTTCGATCACCGGCTATCGAGTTTTTTTT";
print "My DNA molecule is $dna_molecule\n";
$rna_molecule = $dna_molecule;
$rna_molecule =~ s/T/U/g;
print "My RNA molecule is $rna_molecule\n";
• What is that funny =~ sign?
• This is called the "pattern binding operator".
• It allows you to specify the target of a pattern matching or substitution operation.
• Patterns are signified in Perl by enclosing / / symbols.
• We have $rna_molecule =~ s/T/U/g; - which means perform the operation s/T/U/g on $rna_molecule.
• Note that the substitution does not hand back a modified copy; it changes the target variable itself.
This is why we copy the original $dna_molecule to the new scalar
$rna_molecule before applying the substitution.
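The sketch below illustrates the point about the target being modified in place. What s///g actually returns is the number of replacements it made. It also shows the /r modifier, which returns a changed copy and leaves the target alone; this part assumes Perl 5.14 or later. The variable names are invented for the example.

```perl
use strict;
use warnings;

my $dna_molecule = "AGCTTTCGA";

# In-place substitution on a copy: the return value is a count, not a string.
my $rna_molecule = $dna_molecule;
my $changes = ($rna_molecule =~ s/T/U/g);
print "Made $changes substitutions: $rna_molecule\n";

# With /r (assumes Perl 5.14+), the original is untouched and the
# modified string is returned, so no separate copy step is needed.
my $rna_copy = $dna_molecule =~ s/T/U/gr;
print "Original is still $dna_molecule, copy is $rna_copy\n";
```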
Transliteration
• Seeing that substitution allows you to change one thing into another, we might try to use the same technique to get the complement of a DNA strand.
• All we have to do is change all the A's to T's, all the G's to C's, all the T's to A's and all the C's to G's.
• Then if we reverse it we get the reverse complement! Or do we? See wrong_revcom.pl.
• What is the flaw in this approach?
$DNA = "AAAAGGGGCCCCTTTAGCTAGCT";
$DNA_UNTOUCHED = $DNA;
print "After no substitutions: DNA is : $DNA\n";
#substitute all the A's to T's
$DNA =~ s/A/T/g;
print "After A-T substitution: DNA is : $DNA\n";
#substitute all the G's to C's
$DNA =~ s/G/C/g;
print "After G-C substitution: DNA is : $DNA\n";
#substitute all the C's to G's
$DNA =~ s/C/G/g;
print "After C-G substitution: DNA is : $DNA\n";
#substitute all the T's to A's
$DNA =~ s/T/A/g;
print "After T-A substitution: DNA is : $DNA\n";
$DNA = reverse ($DNA);
The tr/// operator
• The previous code was logically flawed. In general, you must be very careful
when doing sequential substitutions, as Perl has no way of spotting flaws in your logic.
• Ideally we want to make all our substitutions in one statement that understands our needs, and Perl provides one: the tr operator.
• tr is a bit like s, but accepts a whole set of single-character substitutions at once.
• tr/ABCD/dcba/ would make AABBCCDD into ddccbbaa.
• A working DNA complementer can be found in revcomp.pl.
TR/REVERSE Example
#!/usr/bin/perl
use strict;
use warnings;
my ($DNA, $DNA_UNTOUCHED);
$DNA = "AAAAGGGGCCCCTTTAGCTAGCT";
$DNA_UNTOUCHED = $DNA;
$DNA =~ tr/AGCT/TCGA/;
$DNA = reverse ($DNA);
print "$DNA_UNTOUCHED has a reverse complement of:\n$DNA\n";
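To make the contrast between the two approaches concrete, the sketch below runs both on a short invented sequence, with comments tracing each step. In the sequential version, bases that have already been changed are changed again by the later substitutions. tr swaps each character exactly once.

```perl
use strict;
use warnings;

my $original = "AACG";

# Approach 1: sequential substitutions (flawed).
my $sequential = $original;
$sequential =~ s/A/T/g;   # AACG -> TTCG
$sequential =~ s/G/C/g;   # TTCG -> TTCC
$sequential =~ s/C/G/g;   # TTCC -> TTGG (the C that came from G is changed again)
$sequential =~ s/T/A/g;   # TTGG -> AAGG (the T's that came from A are changed back)

# Approach 2: tr handles every character in a single pass.
my $transliterated = $original;
$transliterated =~ tr/AGCT/TCGA/;   # AACG -> TTGC

print "Sequential substitutions: $sequential\n";
print "Single transliteration:   $transliterated\n";
```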
REVERSE and 'DWIM'
• The sharp-eyed here may have noticed that reverse was used to reverse a string in the last example, even though it was introduced as being a function which reverses the elements in an array.
• Perl is a language which often tries to 'Do What I Mean' (DWIM), and this is the case with reverse, which does two different things in different situations.
• This can be handy, but if Perl guesses wrongly it might trip you up:
my $string = "abcde";
my $reversed = reverse($string);
my ($notreversed) = reverse($string);
print "Reversed string is $reversed, but not $notreversed.\n";
• If in doubt use scalar(reverse($string)) to force the issue.
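A place where this often bites is print, which treats its arguments as a list. In the sketch below (variable names invented), reverse therefore reverses a one-element list, which changes nothing, unless scalar is used to force string reversal.

```perl
use strict;
use warnings;

my $string = "abcde";

# Assigning to an array gives list context: a one-element list,
# reversed, is still the same one element.
my @as_list = reverse($string);

# scalar forces string reversal whatever the surrounding context.
my $forced = scalar(reverse($string));

print reverse($string), "\n";           # list context: string is not reversed
print scalar(reverse($string)), "\n";   # scalar context: string is reversed
```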