Environmental Genomics Thematic Programme Data Centre


EGTDC Perl Course 2003

Dan Swan: Flow Control

http://envgen.nox.ac.uk

Statement Blocks

A statement block is a way of executing statements, in sequence.

They look like:

{

first_statement;

second_statement;

third_statement;

}

Why do we have them?

Statement blocks group code so that one block is executed if a condition is true and another is executed if it is not.

There are various ways of making decisions in Perl.

A note on equality

• When we use control structures, we generally compare one thing to another.

• What we are looking for in generalised terms is "do X if A equals B" or "do Y if A equals C".

• When comparing scalars you can compare them in a numerical context or a string context.

• Equals:

$integer == 1

$string eq "perl"

• Not equals:

$integer != 1

$string ne "perl"
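
The difference between the two contexts can be sketched as follows. This is an illustrative script, not one of the course files; the variable names are made up for the example.

```perl
#!/usr/bin/perl -w
use strict;

my $integer = 1;
my $string  = "perl";

# Numeric context: == and != compare the values as numbers.
my $num_equal     = ($integer == 1) ? "true" : "false";
my $num_not_equal = ($integer != 1) ? "true" : "false";

# String context: eq and ne compare the values as text.
my $str_equal     = ($string eq "perl") ? "true" : "false";
my $str_not_equal = ($string ne "perl") ? "true" : "false";

# The context matters: "1.0" and "1" are the same number but different strings.
my $same_number = ("1.0" == 1)   ? "true" : "false";
my $same_string = ("1.0" eq "1") ? "true" : "false";

print "integer == 1     : $num_equal\n";
print "integer != 1     : $num_not_equal\n";
print "string eq 'perl' : $str_equal\n";
print "string ne 'perl' : $str_not_equal\n";
print "'1.0' == 1       : $same_number\n";
print "'1.0' eq '1'     : $same_string\n";
```

Using == on text (or eq on numbers) is a common source of bugs, so it is worth picking the operator that matches the kind of data being compared.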

IF/ELSE

• A control expression is tested: IF it is true, one statement block is executed, ELSE a different statement block is executed (ifelse.pl).

if (control_expression is TRUE) {
    do this;
    and this;
} else {
    do that;
    and that;
}
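
A minimal runnable sketch of the same shape (this is not the course's ifelse.pl; the GC value and variable names are invented for illustration):

```perl
#!/usr/bin/perl -w
use strict;

my $gc_percent = 62;    # hypothetical GC content of a sequence
my $verdict;

if ($gc_percent > 50) {
    # Executed only when the control expression is true.
    $verdict = "GC-rich";
} else {
    # Executed only when the control expression is false.
    $verdict = "AT-rich";
}

print "A GC content of $gc_percent percent is $verdict\n";
```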

ELSIF

• if/else is great for yes/no decisions. If you want to test multiple conditions you can combine else and if to make 'elsif' (elsif.pl).

if (condition 1 is TRUE) {
    do this;
} elsif (condition 2 is TRUE) {
    do that;
} elsif (condition 3 is TRUE) {
    do the other;
} else { #all tests have failed
    do whatever;
}
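
As a concrete sketch (not the course's elsif.pl; the base and names are chosen only for illustration), the tests are tried in order and the first true one wins:

```perl
#!/usr/bin/perl -w
use strict;

my $base = "G";
my $name;

if ($base eq "A") {
    $name = "adenine";
} elsif ($base eq "C") {
    $name = "cytosine";
} elsif ($base eq "G") {
    $name = "guanine";
} elsif ($base eq "T") {
    $name = "thymine";
} else {
    # All tests have failed.
    $name = "unknown";
}

print "$base stands for $name\n";
```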

WHILE

• Let's say you want to repeat a series of actions for as long as a certain condition is true (while.pl):

while (expression is true) {
    do this;
    do that;
    do the other; #until no longer true
}
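
A small sketch of a while loop (not the course's while.pl), loosely modelled on PCR doubling; the target of 100 copies is an arbitrary assumption:

```perl
#!/usr/bin/perl -w
use strict;

my $copies = 1;    # start with a single template molecule
my $cycles = 0;

# Keep doubling while we have fewer than 100 copies.
while ($copies < 100) {
    $copies = $copies * 2;
    $cycles++;
}

print "After $cycles cycles there are $copies copies\n";
```

The condition is tested before every pass, so the loop body must eventually make it false or the loop never ends.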

FOREACH

• Takes a list of values and assigns each one in turn to a scalar variable, executing a block of code once for each value (foreach.pl).

foreach $element (@list) {
    do this;
    do that;
    do the_other; #until no more $element's
}
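
A runnable sketch of the same pattern (not the course's foreach.pl; the list contents are illustrative):

```perl
#!/usr/bin/perl -w
use strict;

my @bases  = ("A", "C", "G", "T");
my $joined = "";

# $base holds each element of @bases in turn.
foreach my $base (@bases) {
    print "Base: $base\n";
    $joined = $joined . $base;
}

print "All bases seen: $joined\n";
```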

FOREACH(2)

• There are a few things missing from this code snippet compared to the previous one (foreach2.pl).

• No specification of $element this time. And yet it still works! Why?

foreach (@list) {
    do this;
    do that;
    do the_other; #until no more implied $_'s
}

FOREACH(3)

• There is an implied scalar on the previous slide - $_

• The $_ is a special scalar variable - almost like a scratchpad - it's a container for information (foreach3.pl). Notice it works both for the foreach AND the print statement.

• Perl knows that if you use foreach (@list) that it is going to assign each element to a scalar - so it will use $_ by default.

foreach $_ (@list) {
    do this;
    do that;
    do the_other; #until no more $_'s
}
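
A sketch of $_ at work (not the course's foreach3.pl; the codon list is made up). Both foreach and print fall back on $_ when no variable is given:

```perl
#!/usr/bin/perl -w
use strict;

my @codons = ("ATG", "GCC", "TAA");
my $seen   = "";

foreach (@codons) {
    print;          # with no argument, print prints $_
    print "\n";
    $seen .= $_;    # $_ can also be used explicitly
}
```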

FOR

• The statement people remember from BASIC (or C!)

• An initial expression sets up the loop variable ($init_val = 0). A test expression is checked before each iteration ($init_val < 10). Finally, the loop variable is changed after each iteration ($init_val++).

for ($init_val = 0; $init_val < 10; $init_val++) {
    print "$init_val\n";
}
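
The three parts of the for statement can be seen in this small sketch, which adds up the numbers 0 to 9 (the variable names are illustrative):

```perl
#!/usr/bin/perl -w
use strict;

my $total = 0;

# initial expression; test expression; change after each iteration
for (my $number = 0; $number < 10; $number++) {
    $total = $total + $number;
}

print "The numbers 0 to 9 add up to $total\n";
```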

A few asides - other control structures

• unless/else - like if/else - but the first block runs unless the condition is true (i.e. when it is false) rather than if it is true.

• do/while and do/until - "does" a statement block once, then repeats it "while" a condition is true, or "does" a statement block once, then repeats it "until" a condition becomes true.

• last - allows you to get out of a loop early - e.g. instead of the loop finishing when the loop conditions are met, a loop can end when conditions internal to the loop are met. See also "next" and "redo", and read up on "labelled blocks" for more info.
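
A hedged sketch of these structures in one script. The codon list and the rule of skipping "NNN" are assumptions made for the example, not part of the course material:

```perl
#!/usr/bin/perl -w
use strict;

my @codons        = ("ATG", "NNN", "GCC", "TAA", "GGG");
my $reading_frame = "";

foreach my $codon (@codons) {
    if ($codon eq "NNN") {
        next;    # skip this codon and carry on with the next one
    }
    if ($codon eq "TAA") {
        last;    # stop codon: leave the loop early
    }
    $reading_frame .= $codon;
}

# unless runs its block only when the condition is false.
unless ($reading_frame eq "") {
    print "Codons kept before the stop codon: $reading_frame\n";
}

# do/until runs the block once before the condition is first tested.
my $count = 0;
do {
    $count++;
} until ($count >= 3);

print "The do/until block ran $count times\n";
```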

EGTDC Perl Course 2003

Dan Swan: More built-in Perl functions

SPLIT

• split can take a scalar and chop it into bits, each individual bit then ends up in an array. The "recognition sequence" is user-defined but not retained (split.pl).

$dna_strand = "AGCTATCGATGCTTTAAACGGCTATCGAGTTTTTTTT";

print "My DNA strand is: $dna_strand\n";

print "If we split this using TTTAAA we get the following fragments:\n";

@dna_fragments = split(/TTTAAA/,$dna_strand);

foreach $fragment (@dna_fragments) {
    print "$fragment\n";
}

JOIN

• join is the conceptual opposite of split. Let's think of it in terms of a DNA ligation with a linker sequence (join.pl):

my ($ligated_fragments);

my (@dna_fragments);

@dna_fragments=("AGGCTT", "AGCCCAAATT", "AGCCCCATTA");

$ligated_fragments = join ("aaattt", @dna_fragments);

print "The fragments have been ligated with an aaattt linker:\n";

print "$ligated_fragments\n";
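
To see that join and split really are opposites, the ligated molecule can be cut again at the linker. This is a sketch reusing the fragments above; the extra variable names are illustrative:

```perl
#!/usr/bin/perl -w
use strict;

my @dna_fragments = ("AGGCTT", "AGCCCAAATT", "AGCCCCATTA");

# Ligate with the linker, then cut at the linker again.
my $ligated   = join("aaattt", @dna_fragments);
my @recovered = split(/aaattt/, $ligated);

my $fragment_count = scalar(@recovered);
my $round_trip     = join("aaattt", @recovered);

print "Ligated:   $ligated\n";
print "Recovered: $fragment_count fragments\n";
foreach my $fragment (@recovered) {
    print "$fragment\n";
}
```

The match is case sensitive, so the lowercase linker is not confused with the uppercase bases in the fragments.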

LENGTH

• length - finds the length of a scalar (or a bit of DNA!) (length.pl).

#!/usr/bin/perl -w
use strict;

my ($genome, $genome_length);
$genome = "AGATCATCGATCGATCGATCAGCATTCAGCTACTAGCTAGCTGGGGGGATCATCTATC";
$genome_length = length($genome);
print "My genome sequence is:\n$genome\nand is $genome_length bases long\n";

SUBSTR

• substr extracts a specified part of a scalar (substr.pl).

• substr($scalar, $start_position, $length)

#!/usr/bin/perl -w

use strict;

my ($dna_sequence, $substring);

$dna_sequence = "AGCTATACGACTAGTCTGATCGATCATCGATGCTGA";

$substring = substr ($dna_sequence, 0, 5);

print "The first 5 bases of $dna_sequence are:\n$substring\n";
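
Note that positions are counted from 0. As a further sketch (the short sequence and variable names are invented for the example), substr can be combined with a for loop and length to read a sequence three bases at a time:

```perl
#!/usr/bin/perl -w
use strict;

my $dna_sequence = "AGCTATACG";
my $codons       = "";

# Step through the sequence in jumps of 3, starting at position 0.
for (my $position = 0; $position < length($dna_sequence); $position += 3) {
    my $codon = substr($dna_sequence, $position, 3);
    print "Codon at position $position: $codon\n";
    $codons .= "$codon ";
}
```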

UC/LC

• uc (uppercase) and lc (lowercase) simply change the case of a scalar (uclc.pl).

#!/usr/bin/perl -w

use strict;

my ($mixed_case, $uppercase, $lowercase);

$mixed_case = "AgCtAAGggGTCaCAcAAAAaCCCcATTTgcCC";

$uppercase = uc ($mixed_case);

$lowercase = lc ($mixed_case);

print "FrOm $mixed_case we get:\n";

print "UPPERCASE: $uppercase\n";

print "lowercase: $lowercase\n";

S///

• This is proper Perl :-)

• The obvious difference between DNA and RNA is the replacement of T with U.

• Let's mimic the transcription of DNA to RNA with our newfound Perl skills.

• We can use the substitution operator 's'.

• This can convert one element in a scalar to another element.

• This takes the form s/[one thing]/[for another thing]/

• Let's see it in action (transcription.pl).

#!/usr/bin/perl -w

use strict;

my ($dna_molecule, $rna_molecule);

$dna_molecule = "AGCTATCGATGCTTTCGATCACCGGCTATCGAGTTTTTTTT";

print "My DNA molecule is $dna_molecule\n";

$rna_molecule = $dna_molecule;

$rna_molecule =~ s/T/U/g;

print "My RNA molecule is $rna_molecule\n";

exit();

=~

• What is that crazy =~ sign?

• This is called the "=~ operator" (the Perl documentation calls it the binding operator).

• It allows you to specify the target of a pattern matching operation (FYI the /[whatever]/ bit is a "matching operator").

• By default matching operators act on $_, i.e. if you just saw s/T/U/g; in a program on its own it would be acting on $_.

• We have $rna_molecule =~ s/T/U/g; which means perform the s/T/U/g on $rna_molecule. We have re-assigned the target of the matching operator from $_ to $rna_molecule.

• If you want a scalar to remain unchanged but still need an altered version of it, assign it to another scalar first and alter the copy (as transcription.pl does with $dna_molecule).
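
A short sketch of both behaviours, using made-up sequences: the substitution bound with =~ changes only the named scalar, while a bare substitution changes $_.

```perl
#!/usr/bin/perl -w
use strict;

my $dna_molecule = "ATTGC";
my $rna_molecule = $dna_molecule;    # work on a copy

# Bound with =~ : the substitution acts on $rna_molecule only.
$rna_molecule =~ s/T/U/g;

# No =~ : the substitution acts on $_.
$_ = "TTAA";
s/T/U/g;
my $default_target = $_;

print "DNA (unchanged): $dna_molecule\n";
print "RNA (copy):      $rna_molecule\n";
print "\$_ after s///:   $default_target\n";
```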

REVERSE and TR

• So substitution allows you to change one thing into another. This is great - we could use the same technique to get the complement of a DNA strand!

• All we have to do is change all the A's to T's, all the G's to C's, all the T's to A's and all the C's to G's.

• Then if we reverse it we get the reverse complement! Or do we? See wrong_revcom.pl.

• I guess the game is given away in the filename that there's something up with this.

• Look closely.

• Think about what each line is going to do to the scalar $DNA.

$DNA = "AAAAGGGGCCCCTTTAGCTAGCT";
$DNA_UNTOUCHED = $DNA;
print "After no substitutions: DNA is : $DNA\n";

#substitute all the A's to T's
$DNA =~ s/A/T/g;
print "After A-T substitution: DNA is : $DNA\n";

#substitute all the G's to C's
$DNA =~ s/G/C/g;
print "After G-C substitution: DNA is : $DNA\n";

#substitute all the C's to G's
$DNA =~ s/C/G/g;
print "After C-G substitution: DNA is : $DNA\n";

#substitute all the T's to A's
$DNA =~ s/T/A/g;
print "After T-A substitution: DNA is : $DNA\n";

$DNA = reverse ($DNA);
print "$DNA_UNTOUCHED reverse complemented is:\n$DNA\n";

The answer

• You can't use sequential substitutions! Each later substitution also changes the bases produced by an earlier one, e.g. the A's that became T's are turned back into A's by the final T to A step.

• WATCH YOUR PERL SYNTAX vs YOUR INTERNAL LOGIC! If your thinking is wrong, even if your Perl is correct, your output will be the result of your flawed logic! i.e. - WRONG!

• Ideally we want to make all our substitutions in one statement that understands our needs.

• Come forth the tr operator.

• tr is like s, but better for tasks like this: it translates character by character in a single pass.

• tr/ABCD/dcba/ would make AABBCCDD into ddccbbaa.

• Don't believe me?

TR/REVERSE

#!/usr/bin/perl -w

use strict;

my ($DNA, $DNA_UNTOUCHED);

$DNA = "AAAAGGGGCCCCTTTAGCTAGCT";

$DNA_UNTOUCHED = $DNA;

$DNA =~ tr/AGCT/TCGA/;

$DNA = reverse ($DNA);

print "$DNA_UNTOUCHED has a reverse complement of:\n$DNA\n";

exit ();
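
The difference between the two approaches can be checked side by side. This sketch uses a short made-up sequence and illustrative variable names; it first confirms the tr/ABCD/dcba/ claim, then compares sequential substitution with tr on the same input:

```perl
#!/usr/bin/perl -w
use strict;

# tr translates each character once, in a single pass.
my $letters = "AABBCCDD";
$letters =~ tr/ABCD/dcba/;
print "tr/ABCD/dcba/ gives: $letters\n";

my $dna = "AACG";

# Sequential substitutions: later steps undo earlier ones.
my $sequential = $dna;
$sequential =~ s/A/T/g;    # AACG -> TTCG
$sequential =~ s/G/C/g;    # TTCG -> TTCC
$sequential =~ s/C/G/g;    # TTCC -> TTGG
$sequential =~ s/T/A/g;    # TTGG -> AAGG

# tr: every base is swapped exactly once.
my $complement = $dna;
$complement =~ tr/AGCT/TCGA/;

print "Sequential s///: $sequential (wrong)\n";
print "Single tr///:    $complement (the true complement)\n";
```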

EGTDC Perl Course 2003

Dan Swan: Perl documentation and resources

Built in help - perldoc

• Perl has a fantastic built-in help system called perldoc.

• These are man (manual) pages for Perl.

• Type perldoc perldoc for information on perldoc and instructions on how to use it.

• perldoc perl will give you an overview of the Perl language (and some Perl humour).

• perldoc perltoc gives you the perldoc table of contents.

• If you don't fancy all the text scrolling by you can get a nicely hyperlinked and webalised version at: http://www.perldoc.com

Major Web Resources

• www.perl.org

– Perlmongers Advocacy site.

– Perl history.

– Perl "bibliography" from creator Larry Wall.

• www.perl.com

– Download Perl.

– Core documentation for the Perl language.

– CPAN mirror (more on CPAN later).

– Perl FAQs.

– Resource topics (referenced pages).

– Perl articles grouped by topic (very good!).

More Web Resources

• learn.perl.org

– Mailing lists: beginners, beginners-cgi, tips.

• www.perlmonks.com

– Beginner friendly perl Q&A site.

– Post a question, get a fast reply!

– Try to read the documentation before posting.

– When you improve, answer other people's questions to keep building the community.

CPAN

• The world's finest Perl resource.

• "Comprehensive Perl Archive Network".

• Access via the web (http) and by the CPAN shell.

• A collection of:

– Perl modules (bits of re-useable code).

– Perl scripts.

– Perl documentation.

• search.cpan.org is probably your first port of call.

CPAN and modules

• CPAN contains 4801 modules - bits of code that can be plugged into your scripts. These end in .pm (for Perl Module).

• Need to interface with a database? Use the DBD/DBI modules.

• Need to make webpages with Perl? Use the CGI module.

• Need to make a graphical interface? Use the Tk module.

• Need to do bioinformatics? Use the BioPerl modules!

• To use a module you "use module_name;" in the same way as you "use strict;"

• Modules can be installed in 2 ways:

– From source

– From CPAN using CPAN.pm
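
As a small sketch of what using a module looks like in practice: List::Util is chosen here only because it ships with Perl and needs no installation; it is not one of the modules named above, and the list of lengths is made up.

```perl
#!/usr/bin/perl -w
use strict;
use List::Util qw(max sum);    # import two functions from the module

my @fragment_lengths = (6, 10, 10);

# max and sum come from List::Util rather than from core Perl syntax.
my $longest = max(@fragment_lengths);
my $total   = sum(@fragment_lengths);

print "Longest fragment: $longest bases\n";
print "Total length:     $total bases\n";
```

Modules installed from CPAN are loaded in exactly the same way once they are installed.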

Installing modules from source

• Modules are distributed as .tar.gz files.

• Download the module to your home directory.

• If you are missing components required for the module to work you will be told at the 2nd step below.

tar -zxvf test_module.tar.gz
perl Makefile.PL
make
make test
make install

Installing modules with CPAN

• perl -MCPAN -e 'shell' on the command line launches the CPAN shell - a function of CPAN.pm - a perl module that comes bundled with Perl.

• CPAN shell is very clever as it automatically downloads, builds, tests, installs and resolves dependencies (e.g. if one Perl module requires another).

• On Linux you need a few things installed to get it going.

• LWP modules, ncftp, links...

• Can accept most defaults (it asks a lot of questions when it first starts!)

• Great for getting BioPerl up and running fast:

– install Bundle::BioPerl

– install B/BI/BIRNEY/bioperl-1.2.1.tar.gz

Perl on Windows

• If you want to do Perl - but have no UNIX machine - don't panic.

• Perl runs on Windows too.

• Run, don't walk, to www.activestate.com

• Download "ActivePerl", a free binary distribution of Perl for Windows.

• Uses "ppm", the "Perl Package Manager", to control module installation - it's like the CPAN shell only 10x easier.

• Use "notepad" or a similar text editor to write your scripts.

• If you want to mimic a Unix environment in Windows for your programming needs, you might want to look at www.cygwin.com, which gives a Linux-like environment and the development software you would expect on a Linux system.
