EGTDC Perl Course 2004

(1)


Builtin Functions

Tim Booth

(2)

Overview

• Perl has a large variety of builtin functions, which you can call any time you need them.

• You have already met some, e.g. 'print', 'push' and 'pop'.

• You can get a full list of the functions, and instructions for using them, on the 'perlfunc' manpage (type 'perldoc perlfunc', or 'perldoc -f split' to look up a single function).

• The functions range from the very basic to some which are quite complex or obscure.

(3)

General points

• Functions are called by giving the function name followed by its arguments, usually in parentheses (for builtins the parentheses are often optional, as with print).

function_name(...stuff...)

• A function may return a value or values, which we can capture.

$result = function(...stuff...);

or

@results = function(...stuff...);
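A minimal sketch of capturing return values, using two builtins from this course: push returns one value (the new number of elements), so a scalar receives it, while sort returns a whole list, so an array receives it. The array contents are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = ("A", "C", "G");

# push returns a single value (the new number of elements), so a scalar captures it
my $count = push(@bases, "T");

# sort returns a whole list, so an array captures it
my @sorted = sort(@bases);

print "count=$count, sorted=@sorted\n";
```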

(4)

SPLIT

• split takes a text string and chops it into pieces, returning a list of the individual fragments. The separator (our "recognition sequence") can be any regex, but it is not kept in the resulting array (split.pl).

#!/usr/bin/perl
use strict;
use warnings;

my $dna_strand = "AGCTATCGATGCTTTAAACGGCTATCGAGTTTTTTTT";
print "My DNA strand is: $dna_strand\n";
print "If we split this using TTTAAA we get the following fragments:\n";
my @dna_fragments = split(/TTTAAA/, $dna_strand);
foreach my $fragment (@dna_fragments) {
    print "$fragment\n";
}
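Because the separator can be any regex, split is more flexible than a fixed "recognition sequence". This sketch (the sequences are invented) shows two common cases: an empty pattern, which splits a string into single characters, and a pattern matching a run of N's, used here as a stand-in for gaps between contigs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# An empty pattern splits a string into single characters
my @bases = split(//, "GATTACA");

# Any regex works as the separator; here, a run of one or more N's
my @contigs = split(/N+/, "ACGTNNNNGGCCNTTAA");

print "Bases: @bases\n";
print "Contigs: @contigs\n";
```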

(5)

JOIN

• join is the conceptual opposite of split. Let's think of it in terms of a DNA ligation with a linker sequence (join.pl):

my $ligated_fragments;

my @dna_fragments;

@dna_fragments=("AGGCTT", "AGCCCAAATT", "AGCCCCATTA");

$ligated_fragments = join ("aaattt", @dna_fragments);

print "The fragments have been ligated with an aaattt linker:\n";

print "$ligated_fragments\n";
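A short sketch of the split/join relationship, with made-up fragments: join places the linker only between elements (never at the ends), and splitting on the same linker recovers the pieces. The \Q...\E wrapper makes split treat the linker as literal text rather than as a regex. One caveat: split drops trailing empty fields, so the round trip is only exact when the string does not end with the separator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $linker    = "aaattt";
my @fragments = ("AGGCTT", "AGCCCAAATT");

# join puts the linker only BETWEEN elements, never at the ends
my $ligated = join($linker, @fragments);

# Splitting on the same (literal) linker recovers the original fragments
my @recovered = split(/\Q$linker\E/, $ligated);

print "$ligated\n";
print "@recovered\n";
```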

(6)

LENGTH

• length finds the length of a string (or a bit of DNA!) (length.pl).

• It does NOT find the length of an array – use 'scalar' for that.

#!/usr/bin/perl -w
use strict;

my ($genome, $genome_length);
$genome = "AGATCATCGATCGATCGATCAGCATTCAGCTACTAGCTAGCTGGGGGGATCATCTATC";
$genome_length = length($genome);
print "My genome sequence is:\n$genome\nand is $genome_length bases long\n";
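To make the length/scalar distinction concrete, here is a small sketch with an invented list of gene names: scalar counts the elements of the array, while length counts the characters in one string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @genes = ("lacZ", "lacY", "lacA");

# scalar() gives the number of elements in an array
my $gene_count = scalar(@genes);

# length() gives the number of characters in a single string
my $name_length = length($genes[0]);

print "There are $gene_count genes; the first name has $name_length characters\n";
```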

(7)

SUBSTR

• substr extracts a specified part of a string (substr.pl).

• substr($scalar, $start_position, $length): positions are counted from 0, so the first character is at position 0.

#!/usr/bin/perl -w

use strict;

my ($dna_sequence, $substring);

$dna_sequence = "AGCTATACGACTAGTCTGATCGATCATCGATGCTGA";

$substring = substr ($dna_sequence, 0, 5);

print "The first 5 bases of $dna_sequence are:\n$substring\n";
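A slightly larger sketch, assuming a hypothetical open reading frame, shows two further uses of substr: a negative start position counts back from the end of the string, and stepping the start position by 3 in a loop pulls out successive codons.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical open reading frame: ATG AAA CCC GGG TAA
my $orf = "ATGAAACCCGGGTAA";

my $start_codon = substr($orf, 0, 3);   # 3 characters from position 0
my $stop_codon  = substr($orf, -3);     # a negative start counts back from the end

# Step along the string 3 bases at a time to collect every codon
my @codons;
for (my $pos = 0; $pos + 3 <= length($orf); $pos += 3) {
    push(@codons, substr($orf, $pos, 3));
}

print "Start: $start_codon, stop: $stop_codon\n";
print "Codons: @codons\n";
```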

(8)

UC/LC

• uc (uppercase) and lc (lowercase) simply change the case of a string (uclc.pl).

#!/usr/bin/perl -w
use strict;

my ($mixed_case, $uppercase, $lowercase);

$mixed_case = "AgCtAAGggGTCaCAcAAAAaCCCcATTTgcCC";

$uppercase = uc ($mixed_case);

$lowercase = lc ($mixed_case);

print "FrOm $mixed_case we get:\n";

print "UPPERCASE: $uppercase\n";

print "lowercase: $lowercase\n";
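A common practical use of uc/lc is normalising case before a comparison, since eq is case-sensitive. This sketch uses two invented sequences that differ only in case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sequences that differ only in case
my $seq1 = "acgtACGT";
my $seq2 = "ACGTacgt";

# eq is case-sensitive, so compare normalised copies instead
my $plain_match      = ($seq1 eq $seq2) ? 1 : 0;
my $normalised_match = (uc($seq1) eq uc($seq2)) ? 1 : 0;

print "eq alone: $plain_match, after uc: $normalised_match\n";
```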

(9)

POP/PUSH/SHIFT/UNSHIFT

• These four functions add or remove values at the ends of arrays: push adds and pop removes at the end (right), while unshift adds and shift removes at the start (left).

(pushpop.pl)

#!/usr/bin/perl
use strict;
use warnings;

my @animals = ("dog", "cat", "badger", "snake");
my $last = pop(@animals);     # snake
my $first = shift(@animals);  # dog
print "Last is $last, first is $first\n";
# Put last animal first:
unshift(@animals, $last);     # snake is now the first animal
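These functions are often combined to treat an array as a stack or a queue. A minimal sketch with placeholder values: push and pop both work at the end (last in, first out), whereas push with shift gives first in, first out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stack: push and pop both work on the END of the array (last in, first out)
my @stack;
push(@stack, "A", "B", "C");
my $top = pop(@stack);        # "C"

# Queue: push adds at the end, shift removes from the START (first in, first out)
my @queue = ("A", "B", "C");
my $front = shift(@queue);    # "A"
push(@queue, "D");            # @queue is now ("B", "C", "D")

print "Popped $top from the stack, shifted $front from the queue\n";
print "Queue now holds: @queue\n";
```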

(10)

REVERSE/SORT

• Both take a list as input and return a list.

• Beware: by default sort compares items as strings, character by character, not as numbers. This means the list (1, 2, 10) will sort to (1, 10, 2)! (revsort.pl)

#!/usr/bin/perl
use strict;
use warnings;

my @animals = ("dog", "cat", "badger", "snake");
my @numbers = (1, 10, 2, 130);
my @sorted_animals = sort(@animals);
my @reverse_sorted_animals = reverse(@sorted_animals);
#Using sort as follows will do a numeric sort. Don't worry about exactly how it works!
my @sorted_numbers = sort { $a <=> $b } @numbers;
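To see the difference side by side, this sketch sorts the same numbers three ways. The comparison block receives two items as $a and $b; <=> compares them numerically, and swapping $a and $b reverses the order. Note that sort returns a new list and leaves @numbers unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (1, 10, 2, 130);

# Default sort compares as strings, character by character
my @string_order = sort(@numbers);              # (1, 10, 130, 2)

# A comparison block using <=> compares numerically
my @ascending  = sort { $a <=> $b } @numbers;   # (1, 2, 10, 130)
my @descending = sort { $b <=> $a } @numbers;   # (130, 10, 2, 1)

print "String order: @string_order\n";
print "Ascending:    @ascending\n";
print "Descending:   @descending\n";
```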

(11)

GREP

• grep filters a list, keeping only the elements that meet a criterion you specify.

• The criterion can be anything you like, but the simplest form is a pattern match. We will use grep again later on (grep.pl).

#!/usr/bin/perl
use strict;
use warnings;

my @fragments = ("aagt", "gt", "cct", "agtcgat");
#Get all the fragments containing 'ag'
my @filtered = grep(/ag/, @fragments);
#Remember that 'scalar' gets the length of an array.
print scalar(@filtered);
print " fragments contain 'ag'.\n";
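Since the criterion "can be anything you like", here is a sketch of grep's block form, where $_ holds each element in turn and any expression can decide whether it is kept. It also shows that grep in scalar context returns the number of matching elements rather than the elements themselves.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fragments = ("aagt", "gt", "cct", "agtcgat");

# The block form of grep accepts any test; $_ holds each element in turn
my @long_fragments = grep { length($_) > 3 } @fragments;

# In scalar context grep returns how many elements passed the test
my $ag_count = grep { /ag/ } @fragments;

print "Long fragments: @long_fragments\n";
print "$ag_count fragments contain 'ag'\n";
```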

(12)

Substitution : s///

• This is one of the most useful Perl constructs.

• At the sequence level, the obvious difference between DNA and RNA is that T (thymine) is replaced by U (uracil).

• Let's mimic the transcription of DNA to RNA with a quick Perl script.

• We can use the global substitution operator 's///g'.

• It replaces the text matching a pattern inside a scalar variable with new text.

• This takes the form s/[pattern]/[replacement]/g

• The addition of the 'g' modifier specifies a global substitution, so that all occurrences will be replaced in one go.

• Let's see it in action (transcription.pl).

(13)

#!/usr/bin/perl
use strict;

use warnings;

my ($dna_molecule, $rna_molecule);

$dna_molecule = "AGCTATCGATGCTTTCGATCACCGGCTATCGAGTTTTTTTT";

print "My DNA molecule is $dna_molecule\n";

$rna_molecule = $dna_molecule;

$rna_molecule =~ s/T/U/g;

print "My RNA molecule is $rna_molecule\n";
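Two details of s/// are worth checking for yourself, shown here on an invented short sequence: without the g modifier only the first match is replaced, and the operation returns the number of replacements it made, which you can capture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATGCTTTA";

# Without /g only the FIRST match is replaced
my $first_only = $dna;
$first_only =~ s/T/U/;

# With /g every match is replaced; s/// returns how many replacements it made
my $rna = $dna;
my $changes = ($rna =~ s/T/U/g);

print "Original:   $dna\n";
print "First only: $first_only\n";
print "All ($changes changes): $rna\n";
```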

(14)

• What is that funny =~ sign?

• This is called the "pattern binding operator".

• It lets you specify the target of a pattern match or substitution operation. In Perl, patterns are usually written between a pair of / symbols.

• We have $rna_molecule =~ s/T/U/g; - which means perform the operation s/T/U/g on $rna_molecule.

• Note that the substitution is not a function: it modifies the target variable itself. This is why we copy the original $dna_molecule to the new scalar $rna_molecule before applying the substitution.
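=~ binds plain pattern matches as well as substitutions. In this sketch (the sequence is invented) a match simply returns true or false, the ^ anchor requires the pattern at the start of the string, and !~ is the negated form, true when the pattern does not match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $orf = "ATGCGTTAG";

# =~ binds a match to $orf; ^ anchors the pattern to the start of the string
my $starts_with_atg = ($orf =~ /^ATG/) ? 1 : 0;

# !~ is the negated form: true when the pattern does NOT match
my $no_unknown_bases = ($orf !~ /N/) ? 1 : 0;

print "Starts with ATG: $starts_with_atg\n";
print "No N bases:      $no_unknown_bases\n";
```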


(15)

Transliteration

• Since substitution lets you change one thing into another, we might try the same technique to get the complement of a DNA strand.

• All we have to do is change all the A's to T's, all the G's to C's, all the T's to A's and all the C's to G's.

• Then if we reverse it we get the reverse complement! Or do we? See wrong_revcom.pl.

• What is the flaw in this approach?

(16)

#!/usr/bin/perl
use strict;
use warnings;

my $DNA = "AAAAGGGGCCCCTTTAGCTAGCT";
my $DNA_UNTOUCHED = $DNA;
print "After no substitutions: DNA is : $DNA\n";
#substitute all the A's to T's
$DNA =~ s/A/T/g;
print "After A-T substitution: DNA is : $DNA\n";
#substitute all the G's to C's
$DNA =~ s/G/C/g;
print "After G-C substitution: DNA is : $DNA\n";
#substitute all the C's to G's
$DNA =~ s/C/G/g;
print "After C-G substitution: DNA is : $DNA\n";
#substitute all the T's to A's
$DNA =~ s/T/A/g;
print "After T-A substitution: DNA is : $DNA\n";
$DNA = reverse($DNA);
print "Reverse 'complement' of $DNA_UNTOUCHED is: $DNA\n";
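Tracing the same four sequential substitutions from wrong_revcom.pl on a tiny sequence makes the problem easy to spot. This sketch uses "ACGT", whose correct complement is TGCA; the comments show the string after each step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ACGT";    # the correct complement would be TGCA
my $attempt = $dna;

# The same four sequential substitutions as in wrong_revcom.pl
$attempt =~ s/A/T/g;   # TCGT
$attempt =~ s/G/C/g;   # TCCT
$attempt =~ s/C/G/g;   # TGGT
$attempt =~ s/T/A/g;   # AGGA

print "Complement of $dna should be TGCA, but we got $attempt\n";
```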

(17)

The tr/// operator

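• The previous code was logically flawed: each substitution also rewrites the bases produced by the earlier ones. After the A-to-T step, the later T-to-A step turns those bases back into A, and G-to-C followed by C-to-G turns every G and C into G, so the result contains only A's and G's. In general, you must be very careful when doing sequential substitutions, as Perl has no way of spotting flaws in your logic.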

• Ideally we want to make all our substitutions in a single statement, and Perl provides one: the tr operator.

• tr is a bit like s, but accepts a whole set of single-character substitutions and applies them all at once, in a single pass over the string, so no character is translated twice.

• tr/ABCD/dcba/ would make AABBCCDD into ddccbbaa.

• A working DNA complementer can be found in revcomp.pl.
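tr also returns the number of characters it matched. A common idiom, sketched here on an invented sequence, is to give tr an empty replacement list: the string is left unchanged and the return value is a count, which is handy for GC content.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "ATGCGCAT";

# With an empty replacement list, tr/// leaves the string unchanged
# and simply returns how many characters matched the search list
my $gc_count   = ($seq =~ tr/GCgc//);
my $gc_percent = 100 * $gc_count / length($seq);

print "$seq has $gc_count G/C bases ($gc_percent% GC)\n";
```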

(18)

TR/REVERSE Example

#!/usr/bin/perl

use strict;

use warnings;

my ($DNA, $DNA_UNTOUCHED);

$DNA = "AAAAGGGGCCCCTTTAGCTAGCT";

$DNA_UNTOUCHED = $DNA;

$DNA =~ tr/AGCT/TCGA/;

$DNA = reverse ($DNA);

print "$DNA_UNTOUCHED has a reverse complement of:\n$DNA\n";
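If you need reverse complements repeatedly, the two steps can be wrapped in a subroutine. This is a sketch only: revcomp is a hypothetical helper name, and the tr lists are extended so that lowercase bases stay lowercase. The scalar() around reverse matters because return passes on the caller's context, so a caller writing my ($rc) = revcomp(...) would otherwise get the string back unreversed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the reverse complement of a DNA string,
# keeping lowercase bases lowercase
sub revcomp {
    my ($seq) = @_;
    (my $comp = $seq) =~ tr/ACGTacgt/TGCAtgca/;   # complement every base in one pass
    return scalar(reverse($comp));                # scalar() forces string reversal
}

my $forward = "AAAAGGGGCCCCTTTAGCTAGCT";
my $reverse_complement = revcomp($forward);
print "$forward has a reverse complement of:\n$reverse_complement\n";
```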

(19)

REVERSE and 'DWIM'

• The sharp-eyed among you may have noticed that reverse was used to reverse a string in the last example, even though it was introduced as a function which reverses the order of the elements in a list.

• Perl is a language which often tries to 'Do What I Mean' (DWIM), and reverse is a good example: it does two different things depending on context. In list context it returns the elements in reverse order; in scalar context it joins its arguments into one string and returns that string reversed.

• This can be handy, but if Perl guesses wrongly it might trip you up:

my $string = "abcde";
my $reversed = reverse($string);        # scalar context: "edcba"
my ($notreversed) = reverse($string);   # list context: a one-element list, still "abcde"
print "Reversed string is $reversed, but not $notreversed.\n";

• If in doubt use scalar(reverse($string)) to force the issue.
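This sketch puts the three behaviours of reverse side by side: forcing scalar context with scalar(), reversing the order of a list, and the less obvious scalar-context case where several arguments are first joined and then reversed as one string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "abcde";

# scalar() forces scalar context: the string itself is reversed
my $forced = scalar(reverse($string));            # "edcba"

# In list context reverse flips the ORDER of the elements
my @flipped = reverse("a", "b", "c");             # ("c", "b", "a")

# In scalar context several arguments are joined first, then reversed
my $joined_then_reversed = reverse("ab", "cd");   # "dcba"

print "$forced @flipped $joined_then_reversed\n";
```

For the same reason, print reverse($string) prints the string unchanged: print supplies list context, so reverse sees a one-element list.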
