(1)

EGTDC Perl Course 2004

Built-in Functions

Tim Booth

(2)

Overview

• Perl has a large variety of built-in functions, which you can call any time you need them.

• You have already met some – e.g. 'print', 'push', 'pop'.

• You can get a full list of the functions, and instructions for using them, on the 'perlfunc' manpage.

• The functions range from the very basic to some which are quite complex or obscure.

(3)

General points

• Functions are called by giving the function name, followed by arguments, usually in parentheses.

function_name(...stuff...)

• A function may return a value or values, which we can capture.

$result = function(...stuff...)

or

@results = function(...stuff...)
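A minimal sketch of both forms, using two built-ins covered later in this course (length and split). The variable names are illustrative only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A function returning a single value, captured in a scalar.
my $count = length("AGCT");

# A function returning several values, captured in an array.
# An empty pattern makes split break the string into single characters.
my @bases = split(//, "AGCT");

print "Length: $count\n";
print "Bases: @bases\n";
```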

(4)

SPLIT

• split can take a text string and chop it into bits, producing an array of the individual pieces. The "recognition sequence" can be any regex but is not retained in the array (split.pl).

$dna_strand = "AGCTATCGATGCTTTAAACGGCTATCGAGTTTTTTTT";

print "My DNA strand is: $dna_strand\n";

print "If we split this using TTTAAA we get the following fragments:\n";

@dna_fragments = split(/TTTAAA/,$dna_strand);

foreach $fragment (@dna_fragments) {
    print "$fragment\n";
}
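Because the recognition sequence is a regex, it can match more than one site. A small sketch, assuming a made-up sequence and two made-up cut sites chosen only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sequence containing two different recognition sites.
my $dna_strand = "AAGAATTCTTGGATCCAA";

# The | in the regex means "either of these", so we cut at both sites.
my @dna_fragments = split(/GAATTC|GGATCC/, $dna_strand);

# Neither recognition sequence appears in the resulting pieces.
foreach my $fragment (@dna_fragments) {
    print "$fragment\n";
}
```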

(5)

JOIN

• join is the conceptual opposite of split. Let's think of it in terms of a DNA ligation with a linker sequence (join.pl):

my $ligated_fragments;

my @dna_fragments;

@dna_fragments=("AGGCTT", "AGCCCAAATT", "AGCCCCATTA");

$ligated_fragments = join ("aaattt", @dna_fragments);

print "The fragments have been ligated with an aaattt linker:\n";

print "$ligated_fragments\n";
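To see why join is described as the opposite of split, here is a sketch that cuts the ligated sequence apart at the linker and then joins it again. The variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ligated = "AGGCTTaaatttAGCCCAAATTaaatttAGCCCCATTA";

# Cut at the lower-case linker (matching is case-sensitive by default)...
my @pieces = split(/aaattt/, $ligated);

# ...and ligate the pieces back together with the same linker.
my $rejoined = join("aaattt", @pieces);

print "Pieces: @pieces\n";
print "Round trip gives back the original\n" if $rejoined eq $ligated;
```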

(6)

LENGTH

• length - finds the length of a string (or a bit of DNA!) (length.pl).

• It does NOT find the length of an array – use 'scalar' for that.

#!/usr/bin/perl -w

use strict;

my ($genome, $genome_length);

$genome = "AGATCATCGATCGATCGATCAGCATTCAGCTACTAGCTAGCTGGGGGGATCATCTATC";

$genome_length = length($genome);

print "My genome sequence is:\n$genome\nand is $genome_length bases long\n";
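The difference between length (for strings) and scalar (for arrays) can be sketched like this, reusing the fragment list from the join example. The variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fragments = ("AGGCTT", "AGCCCAAATT", "AGCCCCATTA");

# scalar() gives the number of elements in the array.
my $fragment_count = scalar(@fragments);

# length() gives the number of characters in one string.
my $first_length = length($fragments[0]);

# Add up the length of every fragment.
my $total_bases = 0;
foreach my $fragment (@fragments) {
    $total_bases += length($fragment);
}

print "$fragment_count fragments, $total_bases bases in total\n";
```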

(7)

SUBSTR

• substr extracts a specified part of a string (substr.pl).

• substr($scalar, $start_position, $length)

#!/usr/bin/perl -w

use strict;

my ($dna_sequence, $substring);

$dna_sequence = "AGCTATACGACTAGTCTGATCGATCATCGATGCTGA";

$substring = substr ($dna_sequence, 0, 5);

print "The first 5 bases of $dna_sequence are:\n$substring\n";
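Note that positions are counted from zero. A short sketch of two other common forms of substr, leaving out the length and using a negative start position, with an illustrative sequence:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna_sequence = "AGCTATACGA";

# Start at position 0 (the first base) and take 3 bases.
my $first_three = substr($dna_sequence, 0, 3);

# With no length given, substr runs to the end of the string.
my $from_fourth = substr($dna_sequence, 3);

# A negative start position counts back from the end of the string.
my $last_three = substr($dna_sequence, -3);

print "$first_three $from_fourth $last_three\n";
```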

(8)

UC/LC

• uc (uppercase) and lc (lowercase) simply change the case of a string (uclc.pl).

#!/usr/bin/perl -w

use strict;

my ($mixed_case, $uppercase, $lowercase);

$mixed_case = "AgCtAAGggGTCaCAcAAAAaCCCcATTTgcCC";

$uppercase = uc ($mixed_case);

$lowercase = lc ($mixed_case);

print "FrOm $mixed_case we get:\n";

print "UPPERCASE: $uppercase\n";

print "lowercase: $lowercase\n";

(9)

POP/PUSH/SHIFT/UNSHIFT

• These four functions add or remove values from the right or left sides of arrays (pushpop.pl).

#!/usr/bin/perl

my @animals = ("dog", "cat", "badger", "snake");

my $last = pop(@animals); #snake

my $first = shift(@animals); #dog

print "Last is $last, first is $first\n";

#Put last animal first:

unshift(@animals, $last); #snake is now first animal
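A sketch showing all four functions together, with the state of the array after each step given in the comments:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @animals = ("dog", "cat", "badger", "snake");

my $last  = pop(@animals);      # removes from the right: (dog cat badger)
my $first = shift(@animals);    # removes from the left:  (cat badger)

push(@animals, $first);         # adds to the right: (cat badger dog)
unshift(@animals, $last);       # adds to the left:  (snake cat badger dog)

print "Animals are now: @animals\n";
```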

(10)

REVERSE/SORT

• Both take a list as input and return a list.

• Beware - the sort function compares items as strings (alphabetical order) by default. This means the list (1, 2, 10) will sort to (1, 10, 2)! (revsort.pl)

#!/usr/bin/perl

my @animals = ("dog", "cat", "badger", "snake");

my @numbers = (1, 10, 2, 130);

my @sorted_animals = sort(@animals);

my @reverse_sorted_animals = reverse(@sorted_animals);

#Using sort as follows will do a numeric sort. Don't worry about exactly how it works!
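A minimal sketch of a numeric sort, assuming the usual Perl idiom of a comparison block with the <=> operator. The names @default_sorted, @numeric_sorted and @descending are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (1, 10, 2, 130);

# Default sort compares the values as strings.
my @default_sorted = sort(@numbers);

# The block tells sort to compare each pair ($a and $b) as numbers.
my @numeric_sorted = sort { $a <=> $b } @numbers;

# reverse then gives largest first.
my @descending = reverse(@numeric_sorted);

print "Default: @default_sorted\n";
print "Numeric: @numeric_sorted\n";
print "Descending: @descending\n";
```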

(11)

GREP

• Grep filters an array on some criteria, which you can specify.

• The criteria can be anything you like, but the simplest form is to look for a  pattern match.  We will use grep later on. (grep.pl)

#!/usr/bin/perl

my @fragments = ("aagt", "gt", "cct", "agtcgat");

#Get all the fragments containing 'ag'

my @filtered = grep(/ag/, @fragments);

#Remember that 'scalar' gets the length of an array.

print scalar(@filtered);

print " fragments contain 'ag'.\n";
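The criteria need not be a pattern match. A sketch of the block form of grep, where each element is available as $_ and is kept if the block is true. The length cutoff of 3 is an arbitrary choice for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fragments = ("aagt", "gt", "cct", "agtcgat");

# Keep only fragments longer than 3 bases.
my @long_fragments = grep { length($_) > 3 } @fragments;

# Keep only fragments that do NOT contain 'ag'.
my @without_ag = grep { !/ag/ } @fragments;

print "Long: @long_fragments\n";
print "Without ag: @without_ag\n";
```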

(12)

Substitution: s///

• This is one of the most useful Perl constructs.

• The obvious difference between DNA and RNA is the replacement of T with U.

• Let's mimic the transcription of DNA to RNA with a quick Perl script.

• We can use the global substitution operator 's///g'.

• This replaces text matching a pattern within a scalar with some other text.

• This takes the form s/[one thing]/[for another thing]/g

• The addition of the 'g' modifier specifies a global substitution, so that all occurrences will be replaced in one go.

• Let's see it in action (transcription.pl).

(13)

#!/usr/bin/perl

use strict;

use warnings;

my ($dna_molecule, $rna_molecule);

$dna_molecule = "AGCTATCGATGCTTTCGATCACCGGCTATCGAGTTTTTTTT";

print "My DNA molecule is $dna_molecule\n";

$rna_molecule = $dna_molecule;

$rna_molecule =~ s/T/U/g;

print "My RNA molecule is $rna_molecule\n";

(14)

=~

• What is that funny =~ sign?

• This is called the "pattern binding operator".

• It allows you to specify the target of a pattern matching or substitution operation. Patterns are signified in Perl by enclosing / / symbols.

• We have $rna_molecule =~ s/T/U/g; - which means perform the operation s/T/U/g on $rna_molecule.

• Note that the substitution is not a function – it modifies the actual target variable. This is why we copy the original $dna_molecule to the new scalar $rna_molecule before applying the substitution.
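A short sketch of these points, using an illustrative sequence. It also shows that s/// hands back the number of replacements it made, and what happens when the 'g' modifier is left off:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna_molecule = "AGCTTTA";

# Copy first, because s/// changes the variable it is bound to.
my $rna_molecule = $dna_molecule;
my $replacements = ($rna_molecule =~ s/T/U/g);

# Without 'g' only the first match is replaced.
my $first_only = $dna_molecule;
$first_only =~ s/T/U/;

print "DNA is still $dna_molecule\n";
print "RNA is $rna_molecule ($replacements replacements)\n";
print "Without g we get $first_only\n";
```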

(15)

Transliteration

• Seeing that substitution allows you to change one thing into another, we might try to use the same technique to get the complement of a DNA strand.

• All we have to do is change all the A's to T's, all the G's to C's, all the T's to A's and all the C's to G's.

• Then if we reverse it we get the reverse complement! Or do we? See wrong_revcom.pl.

• What is the flaw in this approach?

(16)

$DNA = "AAAAGGGGCCCCTTTAGCTAGCT";

$DNA_UNTOUCHED = $DNA;

print "After no substitutions: DNA is : $DNA\n";

#substitute all the A's to T's

$DNA =~ s/A/T/g;

print "After A-T substitution: DNA is : $DNA\n";

#substitute all the G's to C's

$DNA =~ s/G/C/g;

print "After G-C substitution: DNA is : $DNA\n";

#substitute all the C's to G's

$DNA =~ s/C/G/g;

print "After C-G substitution: DNA is : $DNA\n";

#substitute all the T's to A's

$DNA =~ s/T/A/g;

print "After T-A substitution: DNA is : $DNA\n";

$DNA = reverse ($DNA);

(17)

The tr/// operator

• The previous code was logically flawed. In general, you must be very careful when doing sequential substitutions, as Perl has no way of spotting flaws in your logic.

• Ideally we want to make all our substitutions in one statement that understands our needs, and Perl provides one – the tr operator.

• tr is a bit like s, but accepts a whole set of single-character substitutions at once.

• tr/ABCD/dcba/ would make AABBCCDD into ddccbbaa.

• A working DNA complementer can be found in revcomp.pl.
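The difference can be seen on a very short sequence. In this sketch (the two-base sequence is chosen only for illustration) the sequential substitutions undo each other, because the second s/// cannot tell an original T from one that was just created, while tr handles each character exactly once:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = "AT";

# Sequential substitutions: A->T gives "TT", then T->A gives "AA".
my $sequential = $sequence;
$sequential =~ s/A/T/g;
$sequential =~ s/T/A/g;

# tr swaps both characters in a single pass, giving "TA".
my $transliterated = $sequence;
$transliterated =~ tr/AT/TA/;

# The example from the slide above.
my $letters = "AABBCCDD";
$letters =~ tr/ABCD/dcba/;

print "Sequential: $sequential\n";
print "tr: $transliterated\n";
print "Letters: $letters\n";
```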

(18)

TR/REVERSE Example

#!/usr/bin/perl

use strict;

use warnings;

my ($DNA, $DNA_UNTOUCHED);

$DNA = "AAAAGGGGCCCCTTTAGCTAGCT";

$DNA_UNTOUCHED = $DNA;

$DNA =~ tr/AGCT/TCGA/;

$DNA = reverse ($DNA);

print "$DNA_UNTOUCHED has a reverse complement of:\n$DNA\n";

(19)

REVERSE and 'DWIM'

The sharp-eyed here may have noticed that reverse was used to reverse a string in the last example, even though it was introduced as being a function which reverses the elements in an array.

Perl is a language which often tries to 'Do What I Mean' (DWIM), and this is the case with  reverse, which does two different things in different situations.

This can be handy, but if Perl guesses wrongly it might trip you up:

my $string = "abcde";

my $reversed = reverse($string);

my ($notreversed) = reverse($string);

print "Reversed string is $reversed, but not $notreversed.\n";

If in doubt use scalar(reverse($string)) to force the issue.
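A sketch of the two behaviours side by side. What reverse does depends on whether the result is wanted as a single value or as a list. The variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "abcde";

# Forcing scalar context reverses the characters of the string.
my $forced = scalar(reverse($string));

# Assigning to an array asks for a list, so the elements are reversed
# and each string is left as it was.
my @list_reversed = reverse("abc", "def");

# Assigning to a scalar joins the arguments, then reverses the characters.
my $joined_reversed = reverse("abc", "def");

print "$forced\n";
print "@list_reversed\n";
print "$joined_reversed\n";
```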
