Introduction to Perl Programming

Oxford University

Computing Services


Version  Date         Author           Changes made

1.0      09 Feb 2011  Christopher Yau  Created
1.1      18 Apr 2011  Christopher Yau  Minor revisions and corrections
1.2      20 Jun 2011  Christopher Yau  Added section on Regular Expressions
1.3      2 Dec 2011   Christopher Yau  Minor corrections (hand over to Sebastian Kelm)
1.4      10 May 2012  Sebastian Kelm   Minor corrections
1.5      17 Feb 2013  Sebastian Kelm   Removed section on Switch statements; hand over to Thaddeus Aid
1.6      17 Apr 2013  Thaddeus Aid     Added 2d arrays, minor corrections

Acknowledgements

With thanks to Alistair Wire, Rehan Ali and Jeremie Becker for careful reading of the drafts and original document and Susan Hutchinson for the use of her introduction notes to Linux.

Copyright Notice

The copyright of the remainder of this document and its associated resources lies with Oxford University IT Services.


1 Introduction to Perl
  1.1 A First Perl Program
  1.2 Handling numbers and strings
    1.2.1 Numbers
    1.2.2 Strings
    1.2.3 Reading from user input
  1.3 Arrays
    1.3.1 Creating and manipulating arrays
    1.3.2 Array functions
    1.3.3 Sorting arrays
    1.3.4 Two dimensional arrays

2 Program Control
  2.1 If-else statements
  2.2 Nested and Compound conditionals
  2.3 Loops
    2.3.1 While loops
    2.3.2 Foreach loops
    2.3.3 For loops

3 File Handling
  3.1 File Handles
  3.2 Closing a filehandle
  3.3 Error Checking
  3.4 Reading files
  3.5 Writing to files
  3.6 Writing to multiple files
  3.7 Tying files to an array
  3.8 File System Operations
    3.8.1 Changing directory
    3.8.2 Deleting files
    3.8.3 Listing files
    3.8.4 Testing files

4 Regular Expressions
  4.1 Basic string comparisons
  4.2 Using Wildcards and Repetitions
  4.3 Groups
  4.4 Character Classes
  4.5 Putting it All Together
    4.6.1 Substitutions
    4.6.2 Translations

5 Hash Tables
  5.1 Creating a Hash
  5.2 Testing for keys in a hash
  5.3 Retrieving keys and values from a hash
  5.4 Frequency tables using hashes
  5.5 Using records

6 Sub-routines
  6.1 Sub-routines
    6.1.1 Local and global scoping
    6.1.2 Passing Scalar Arguments
    6.1.3 Passing Arrays and Hashes as Arguments

A Getting Started with Linux

In this course we will be using a live version of the Ubuntu Linux operating system. This means that the operating system is run from a USB drive and comes with a default Linux environment and preconfigured settings ready to go. However, before we start we may need to make a few local modifications in order to set up the keyboard correctly. If you have never used Linux or the Linux terminal/command line interface before, a quick primer can be found in the Appendix. These notes are borrowed from the IT Services “Introduction to Linux” course developed by Susan Hutchinson. A digital version of these notes, as well as a few data files, can be found on the course website. You will be referred to this URL whenever you need to obtain extra files to complete an exercise:

http://www.stats.ox.ac.uk/~aid/perl/

If you cannot complete all the exercises during the allotted time, you may wish to finish them at home. If you run into trouble you cannot solve by yourself, you can contact the course teacher by email. Please include the words “IT Services Perl” in the subject line.

[email protected]

The keyboard is currently set up with the US-style layout. We need to configure this for our UK keyboards. This is only necessary on Live Linux systems. When using an installed version of Linux the keyboard layout will be configured at installation time.

1. Open System > Preferences > Keyboard

4. Click on Add.

5. Make sure that Generic 104-key PC is selected in the Keyboard model field.

6. Now remove the USA and highlight United Kingdom and click on Close.

This will need to be done at the beginning of every session when running Live Ubuntu but is not normally necessary when using a desktop installation of Linux.


Introduction to Perl

Perl is a programming language originally developed by Larry Wall in 1987 as a Unix-based scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular amongst programmers. Perl is a stable language: the current version, Perl 5, has been in use since 1994. A major revision, Perl 6, is currently in development. Although this version will introduce some fundamental changes, it should nonetheless remain sufficiently “Perl-like” that many Perl programmers will never notice the difference.

Perl is sometimes nicknamed “the Swiss Army chainsaw of programming languages” due to its flexibility and adaptability, and there are a number of general features of Perl that make it particularly worthy of this description:

High-level Perl uses strong abstraction, which hides many of the physical and low-level details of the computer's architecture from the user. This means that once we have written a Perl program, it should function identically on any computer running the same version of Perl. Perl also uses natural language elements, which make it easy to read and interpret (as far as a programming language can be!).

Interpreted Perl programs are executed directly from the source file by a piece of software known as the interpreter. There is no need to compile and generate an executable binary file (e.g. an .exe file). The source file containing Perl code is translated into an efficient intermediate representation which is then immediately executed by the interpreter software.

Dynamic Perl executes at run-time several behaviours that other lower-level languages might perform during compilation. This means that, for instance, it is unnecessary in Perl to pre-define the size of arrays.

Perl borrows features from other programming languages, in particular C and Unix shell scripting languages such as sh, awk, and sed. The language provides powerful text processing facilities without the arbitrary data length limits of many contemporary Unix/Linux tools, facilitating easy manipulation of text files. It is also used for graphics programming, system administration, network programming, applications that require database access and CGI programming on the Web.

This booklet presents a brief introduction to Perl programming utilising many guided and self-learning exercises. It is important to note that there are often many ways to write a Perl script to perform a task; there is rarely a single “correct” way. In the examples and solutions shown in this booklet, the coding style has been chosen to emphasise simplicity and promote understanding rather than computational efficiency or coding brevity. If you are able to master the material in this booklet, you will have established a solid foundation for learning further Perl programming. If you want further exercise, a few examples can be found in the Appendix to stretch your mind. A number of texts are available for supporting further Perl programming education and in addition there are many free resources available on the Internet. Enjoy!

1.1 A First Perl Program

Traditionally, the first program that one writes when learning a new computer programming language is one which prints “Hello World” on the screen, and then exits. It is surprising how much you can learn about a language just from being able to do this.

Perl programs do not have to be named with a .pl extension but you will need to name them like this for text editors to recognise that they are Perl scripts and correctly highlight keywords. It is also useful to keep this convention so you can more easily catalogue your files by looking at the file extension.

EXERCISE 1: Hello World

1. Open a text editor (e.g. gedit).

2. Start a new text document.

3. Enter the source code for the Hello World program below:

#!/usr/bin/perl

use strict;

print "Hello World\n"; # prints "Hello World" to the screen

4. Save the file as first.pl.

5. Open a terminal window and change to the directory containing first.pl using the shell command cd.

6. Run the Hello World program by using the command perl first.pl. You should see the following output:

> perl first.pl

Hello World

The first line is a special comment. On Unix/Linux systems, if the very first two characters on the first line of a text file are #!, they are followed by the name of the program that executes the rest of the file. In this case, the program is stored in the file /usr/bin/perl. The second line contains a pragma, which is an instruction to the Perl interpreter to do something special when it runs your program. In this case, the command use strict does two things that make it harder to write bad software: (i) it requires that you declare all your variables and (ii), as we shall see later on, it makes it harder for Perl to mistake your intentions when you are using sub-routines (Chapter 6). The third line prints the words “Hello World” to the screen followed by a new line specified by the \n. The script is executed by running the command perl in a terminal followed by the name of the script. This starts the Perl interpreter, which immediately begins to execute the script whose name was supplied as an argument.

Note, the hash symbol # is used in Perl to denote comments; lines that will not be interpreted as code. Adding comments throughout your code is useful for reminding yourself and others what different parts of a program are meant to be doing.

1.2 Handling numbers and strings

In Perl, single items of data (e.g. a number or a string) are stored using a scalar variable. A scalar variable can be created by prefixing a $ in front of the variable name, e.g. $x, $car, $my_house, etc. A value can be assigned to a variable using the assignment operator (=), e.g. $x = 1.

In these exercises, since we are utilising the use strict functionality of Perl, it is also necessary to use the keyword my in the declaration of a variable before its first use, e.g. my $x = 1.

1.2.1 Numbers

Perl contains a standard complement of numerical operators which can be applied to scalar variables containing numbers:

= ... assignment, e.g. $x = 10 assigns the value 10 to the variable $x

+ ... addition, e.g. $x + $y gives the sum of $x and $y

* ... multiplication, e.g. $x * $y gives the product of $x and $y

- ... subtraction, e.g. $x - $y gives the difference between $x and $y

/ ... division, e.g. $x / $y divides $x by $y

% ... modulo, computes the division remainder, e.g. 15 % 10 gives 5

** ... exponentiation, e.g. 2**3 gives 8
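The operators listed above can be tried together in one short script. This is only a minimal sketch: the variable names and the values 15 and 10 are illustrative choices, not part of the course exercises.

```perl
#!/usr/bin/perl
use strict;

my $x = 15;
my $y = 10;

my $sum        = $x + $y;    # addition
my $difference = $x - $y;    # subtraction
my $product    = $x * $y;    # multiplication
my $quotient   = $x / $y;    # division (the result is not truncated to a whole number)
my $remainder  = $x % $y;    # modulo: what is left over after dividing 15 by 10
my $power      = 2 ** 3;     # exponentiation

print "$sum $difference $product $quotient $remainder $power\n";   # prints "25 5 150 1.5 5 8"
```

Note that division of two whole numbers in Perl gives a fractional result where appropriate; the modulo operator is the one to use when only the remainder is wanted.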

EXERCISE 2: Numbers

1. Create a new text document and name it addnumbers.pl.

2. Enter the following code:

#!/usr/bin/perl

use strict;

my $x = 1; # assign the value 1 to variable $x

my $y = 3; # assign the value 3 to variable $y

my $z = $x + $y; # assign the sum of $x and $y to variable $z

print "The first number is: $x\n"; # print the value of $x

print "The second number is: $y\n"; # print the value of $y

print "The sum is: $z\n"; # print the value of $z

3. Run the program in the terminal. You should see the following output:

> perl addnumbers.pl

The first number is: 1

The second number is: 3

The sum is: 4

4. Extend the program to calculate and print the difference, product and quotient of $x and $y.

In this program, the numbers 1 and 3 are stored in the variables $x and $y respectively and the sum of the two variables is stored in $z before the results are printed on screen. The value of the variables can be displayed on screen using the print command.

1.2.2 Strings

Strings can also be assigned to variables in the same way by enclosing strings of characters in single (') or double (") quotes. Single quotes specify that the contents between the quotes should be used literally, whilst double quotes mean that variables contained between the quotes should be replaced by their values. The following exercise illustrates the difference between the two modes.

EXERCISE 3: Strings

1. Create a new text document and name it addstrings.pl.

2. Enter the source code below:

#!/usr/bin/perl

use strict;

my $x = "Jim";

my $y = "Hendrix";

print "My name is $x $y\n";

print 'My name is $x $y\n';

3. Switch to the terminal and run the program.

> perl addstrings.pl

My name is Jim Hendrix

My name is $x $y\n

4. Create a new variable to store the string “is a legend” and print out all three strings.

In this script, we have used variable interpolation in the first print statement. When print is called by the Perl interpreter, the variables $x and $y are replaced by their contents before being printed. The final print statement uses a literal string and the characters $x and $y are printed literally.
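Two further details of interpolation are sketched here with an illustrative variable name ($fruit is not part of the exercise). When a variable is followed directly by letters, curly braces mark where its name ends, and a backslash in front of a $ stops that variable from being interpolated.

```perl
use strict;

my $fruit = "apple";

my $plural  = "I have two ${fruit}s";                 # braces delimit the variable name
my $escaped = "The variable \$fruit holds $fruit";    # \$ gives a literal dollar sign
my $literal = 'No $fruit here';                       # single quotes: nothing is replaced

print "$plural\n$escaped\n$literal\n";
```

Without the braces, Perl would look for a variable called $fruits, which under use strict stops the program because no such variable has been declared.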

Perl contains a number of operators and functions for string manipulation. We introduce a few here:

1. (.) - the dot operator can be used for string concatenation (joining two strings together), e.g.

$x = "Hello"; $y = "World";

$z = $x . " " . $y; # $z contains "Hello World"

2. length - the length function can be used to find the length of a string

$x = "Hello";

$length = length($x); # $length contains 5

3. uc and lc - the upper/lower case functions can be used to convert a string to upper/lower case respectively

$x = "Hello";

$upper = uc($x); # $upper contains "HELLO"

$lower = lc($x); # $lower contains "hello"

4. reverse - the reverse function can be used to reverse the order of characters of the string

$x = "Hello";

$xrev = reverse($x); # $xrev contains "olleH"

5. substr - to extract a piece of a string, e.g. substr($x, 2, 5), where $x is the string, 2 specifies the start position (counting from 0) and 5 specifies the length of the sub-string to extract.

$x = "ambulance";

$y = substr($x, 2, 5);

should return the string bulan.

The function substr can also be used to replace a piece of a string, e.g. substr($x, 2, 5, $y), where $x is the string, 2 specifies the start position, 5 specifies the length of the sub-string to replace and $y contains the piece of text to be inserted.

For example:

$x = "jelly baby";

substr($x, 6, 4, "babe"); # $x contains "jelly babe"

After this call, $x contains the string jelly babe.

6. $variable =~ s/{pattern to search for in $variable}/{what to insert}/ - substitutions can be made in strings using a regular expression

$x = "jelly baby";

$x =~ s/baby/babe/; # $x contains "jelly babe"
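The string functions above can be combined in a single script. The following is a minimal sketch using illustrative variable names; every variable is declared with my so that it runs under use strict.

```perl
use strict;

my $word = "Perl";

my $greeting = "Hello" . " " . $word;       # concatenation gives "Hello Perl"
my $size     = length($greeting);           # 10 characters, including the space
my $shout    = uc($greeting);               # "HELLO PERL"
my $backward = reverse($word);              # "lreP"
my $middle   = substr($greeting, 6, 4);     # 4 characters starting at position 6: "Perl"

my $changed = $greeting;                    # take a copy so $greeting is left alone
$changed =~ s/Hello/Goodbye/;               # "Goodbye Perl"

print "$greeting\n$size\n$shout\n$backward\n$middle\n$changed\n";
```

One thing to be aware of: reverse only reverses the characters of a string when its result is used as a scalar, as in the assignment above. Written directly as print reverse($word), it treats its argument as a one-item list and prints the string unchanged.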

1.2.3 Reading from user input

A really useful facility is the ability to read in information from the console. This allows you to produce interactive programs which can ask the user questions and receive answers. To achieve this we can use what is known as the STDIN filehandle (more on filehandles later). You usually only read one line at a time from STDIN (so the input stops when the user presses return).

print "What is your name? ";

my $name = <STDIN>;

chomp($name); # chomp removes the return character that you entered

print "What is your quest? ";

my $quest = <STDIN>;

chomp($quest);

print "What is your favourite colour? ";

my $colour = <STDIN>;

chomp($colour);
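To see what chomp actually does, the sketch below uses a fixed string in place of real user input. The assumption is that the user typed Arthur and pressed return, so the line read from STDIN would end in a newline character; the variable names are illustrative.

```perl
use strict;

my $line   = "Arthur\n";       # stands in for what <STDIN> would return
my $before = length($line);    # 7: six letters plus the newline

chomp($line);                  # remove the trailing newline, if there is one
my $after  = length($line);    # 6

print "Hello $line, your name has $after letters\n";
```

Without the call to chomp, the newline would be printed in the middle of the sentence and would be counted as part of the name.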


EXERCISE 4:

1. Write a Perl script to read in a string from the console and print:

(a) The length of the string

(b) The reverse of the string

(c) The upper and lower case versions of the string

2. Modify your script to accept two string inputs and print the concatenation of the two strings separated by a space.

1.3 Arrays

Perl has three built-in data types: scalars (numbers and strings), arrays and hashes. Hashes will be considered later in the text and we will focus on arrays for now. What are arrays? Simply, arrays are an ordered collection of scalars. Arrays therefore allow you to store scalars in an ordered manner and to retrieve them based on their position. Unlike many other programming languages, arrays in Perl are dynamic, which means that you can add or remove items from them at will. You do not need to specify in advance how big the array must be.

1.3.1 Creating and manipulating arrays

Arrays in Perl are denoted by a leading @ symbol (e.g. @array). Array elements are scalars and therefore start with a $, but in order to identify which element of the array you are referring to it is necessary to also specify its position in the array (known as its index) in square brackets at the end of the name. The first element in an array has index 0 so, for example, the fifth element in @array would be $array[4].

The following code example illustrates the specification and use of an array:

my @array = ("cat", "dog", "rabbit", "turtle");

print "The first element of \@array is not $array[1], but $array[0]";

An array of strings called @array is created and the strings “cat”, “dog”, “rabbit” and “turtle” are stored in it. The print statement displays the second item ($array[1] is dog) and the first item ($array[0] is cat). Another useful convenience is that you can use negative indices to count back from the end of an array:

print "The last element of \@array is $array[-1]";
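Indexing from both ends can be sketched as follows. The array name @animals and the replacement value "hamster" are illustrative only.

```perl
use strict;

my @animals = ("cat", "dog", "rabbit", "turtle");

my $first       = $animals[0];     # "cat"
my $last        = $animals[-1];    # "turtle"
my $second_last = $animals[-2];    # "rabbit"

$animals[1] = "hamster";           # an element can be overwritten through its index

print "$first $last $second_last\n";
print "@animals\n";                # prints "cat hamster rabbit turtle"
```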

Perl provides a number of array functions to add or remove items from an array. The function push can be used to add items to the end of an array:

push(@array, ("badger", "fox") ); # @array contains cat, dog, rabbit, turtle, badger and fox

print "@array\n";

The code sequence above gives the following output:

cat dog rabbit turtle badger fox

showing that the push function has added badger and fox to @array. The array functions shift and pop can be used to remove the first or last array elements respectively (the example below starts again from the original four-element array):

my $first = shift(@array); # @array contains dog, rabbit and turtle

my $last = pop(@array); # @array contains dog and rabbit

print "$first\n"; # prints "cat"

print "$last\n"; # prints "turtle"

print "@array\n"; # prints dog rabbit
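The functions above have a fourth companion, unshift, which is a standard Perl built-in that adds items to the front of an array. The sketch below shows all four together; the array name @queue is illustrative.

```perl
use strict;

my @queue = ("dog", "rabbit");

unshift(@queue, "cat");        # add to the front: cat dog rabbit
push(@queue, "turtle");        # add to the end:   cat dog rabbit turtle

my $front = shift(@queue);     # remove from the front
my $back  = pop(@queue);       # remove from the end

print "$front $back\n";        # prints "cat turtle"
print "@queue\n";              # prints "dog rabbit"
```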

A slice is to an array what a substring is to a scalar. It is a way to extract several values from an array in one go without having to take the whole thing. The syntax of a slice is @array[list_of_indexes]. It returns a list of scalars which you can either assign to individual scalars or use to initialise another array.

my @array = ("cat", "dog", "rabbit", "turtle", "badger", "fox");

(my $two, my $three, my $five) = @array[2, 3, 5]; # extracts 3rd, 4th and 6th elements of @array

my @last_two = @array[4, 5]; # extracts 5th and 6th elements of @array

print "$two, $three, $five\n"; # prints "rabbit, turtle, fox"

print "@last_two\n"; # prints "badger fox"

When assigning to a list of scalars (as in $two, $three, $five) the values are mapped from the list returned by the slice onto the scalars in the list. This same technique can also be used to extract values from an array without changing it (as would happen if you used shift/pop):

my @array = ("cat", "dog", "rabbit", "turtle", "badger", "fox");

(my $red, my $orange, my $yellow) = @array; # $red contains "cat", $orange is "dog" and $yellow contains "rabbit"

In this example the values are transferred to the scalars, but @array is left intact. It does not matter that @array has more values than the list; the rest are just ignored.
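Slices also accept ranges and negative indices, and a list assignment may end with an array that collects whatever is left over. This is a minimal sketch with illustrative variable names; the range operator (..) used here also appears in Exercise 5.

```perl
use strict;

my @array = ("cat", "dog", "rabbit", "turtle", "badger", "fox");

my @first_three = @array[0 .. 2];     # a range of indices: cat dog rabbit
my @last_two    = @array[-2, -1];     # negative indices count from the end: badger fox
my ($head, @rest) = @array;           # $head takes "cat", @rest takes the other five

print "@first_three\n";
print "@last_two\n";
print "$head, then ", scalar(@rest), " more\n";   # prints "cat, then 5 more"
```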

1.3.2 Array functions

One useful thing to be able to extract is the length of an array. There are two ways to get this. For every array there is a special variable associated with it which holds the highest index number contained in the array. As array indexes start at 0, this value is always one less than the length. The special variable is $#array_name. It is a good idea to get used to the notion of special variables in Perl as experienced programmers use them a lot as shorthand. They can sometimes produce unintelligible code if you are not aware that you are looking at a special variable:

my @array = (1, 2, 3, 4, 5);

print "The last index is ", $#array; # gives "The last index is 4"


Alternatively you can get the length of an array by using it in a situation where Perl expects to see a scalar. If you use an array as a scalar you get the length of the array. For example:

my @array = (1, 2, 3, 4, 5);

my $length = @array;

print $length; # prints "5"

is completely equivalent to:

my @array = (1, 2, 3, 4, 5);

print "The length of the array is ", scalar(@array); # prints "The length of the array is 5"
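The relationship between the last index and the length can be checked side by side, including for an empty array. A minimal sketch with illustrative variable names:

```perl
use strict;

my @array = (1, 2, 3, 4, 5);

my $last_index = $#array;            # 4: the highest index
my $length     = scalar(@array);     # 5: the number of elements

my @empty = ();
my $empty_last_index = $#empty;      # -1: an empty array has no valid index
my $empty_length     = @empty;       # 0

print "$last_index $length $empty_last_index $empty_length\n";   # prints "4 5 -1 0"
```

Remember that inside double quotes "@array" is replaced by the elements of the array, not its length, which is why the length is computed outside the string here.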

As with scalars before there are a couple of functions which are only really useful in combination with arrays.

The join function turns an array into a single scalar string and allows you to provide a delimiter which it will put between each array element. It is commonly used when outputting data to write it to a file as either comma separated or tab delimited text.

my @array = ("tom", "dick", "harry");

print join("\t", @array); # Produces tab delimited text

You can also go the other way and use the split function to split a single scalar into a series of values in an array. The split command actually uses a regular expression to decide where to split a string. Do not worry about the details of this for the moment (we will come to these later); just think of it as a string in between two “/” characters.

my $scalar = "Hello.there.everyone";

my @array = split(/\./, $scalar); # @array contains "Hello", "there" and "everyone"

print "Second element is ", $array[1],"\n"; # prints "Second element is there"
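A common use of split and join together is converting a line of text from one delimiter to another. The sketch below turns a comma-separated line into tab-delimited text; the sample data and variable names are illustrative.

```perl
use strict;

my $line   = "tom,dick,harry";
my @fields = split(/,/, $line);       # split on commas: three separate values

my $count  = scalar(@fields);         # 3
my $tabbed = join("\t", @fields);     # rejoin the values with tabs between them

print "$count fields: $tabbed\n";
```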


1.3.3 Sorting arrays

A common requirement having populated an array is to sort it before doing something with it. Sorting is actually a non-trivial task but most of the complexity and technicalities are rarely relevant in most common tasks.

The function to sort an array is sort. The function takes a sorting rule, which is a small block of code that tells it how to compare two objects, and the array containing the objects to be sorted. sort does not alter the array that is passed to it but rather returns a new list consisting of the sorted elements of the original array. Perl uses the special variable names $a and $b to define sorting rules, and these variable names should be reserved for use in sort code blocks. They will work elsewhere, but it is considered bad practice to use them.

my @array = ("C", "D", "B", "A");

@array = sort {$a cmp $b} @array;

print join(" ", @array); # prints A B C D

This code sorts the array alphabetically. The code block in the sort is the bit between the curly brackets. The block must contain a statement using $a and $b to say which one comes first. The two operators you can use for comparing scalars in Perl are cmp for comparing strings and <=> (called the spaceship operator) for comparing numbers. You can apply whatever transformations you like to $a and $b before you do the comparison if you need to.
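The choice between cmp and <=> matters as soon as the values are numbers. The following sketch sorts the same illustrative list of numbers with each operator to show the difference.

```perl
use strict;

my @numbers = (10, 9, 100, 1);

my @as_strings = sort { $a cmp $b } @numbers;   # compared character by character
my @as_numbers = sort { $a <=> $b } @numbers;   # compared by numerical value
my @descending = sort { $b <=> $a } @numbers;   # $a and $b swapped: largest first

print "@as_strings\n";    # prints "1 10 100 9"
print "@as_numbers\n";    # prints "1 9 10 100"
print "@descending\n";    # prints "100 10 9 1"
```

With cmp, the value 9 ends up last because the character "9" comes after the character "1", regardless of how many digits follow.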

EXERCISE 5: Array functions

1. Create a new script with the following code:

#!/usr/bin/perl

use strict;

my @array = ( 1 .. 10 ); # create an array of numbers 1-10


my $first_element = shift(@array); # remove the first element and store in first_element

my $last_element = pop(@array); # remove the last element and store in last_element

print "The first and last elements of the array are $first_element and $last_element\n";

push(@array, ( -5 .. +5 ) ); # add the numbers -5 to +5 to the array

print "The array currently contains: @array\n";

my @sortedarray = sort{$a <=> $b}(@array); # sort the array numerically

print "The sorted array contains: @sortedarray\n";

my @new_array = qw(cat dog rabbit turtle fox badger); # create a new array using qw

print "@new_array\n";

2. Create a new script and the following array:

my @array = qw( 99players b_squad a-team 1_Boy A-team B_squad 2_Boy);

3. Sort the array using the following sorting options:

(a) Sort numerically in ascending order:

@array = sort {$a <=> $b} @array;

(b) Sort numerically in descending order (same as before but with $a and $b swapped):

@array = sort {$b <=> $a} @array;

(c) Sort alphabetically in a case-insensitive manner:

@array = sort {lc $a cmp lc $b} @array;

4. Create a new script with the following array:

my @words = qw( The quick brown fox jumps over the lazy dog and runs away );

5. Using appropriate array access and join functions, construct each of the following strings, store it in a single variable and print it to the screen:

“The quick fox jumps over the dog”

“The brown fox runs away”

“The lazy dog runs”

“The dog runs away quick”

“The quick brown dog runs over the lazy fox”

1.3.4 Two dimensional arrays

It is often useful to have a multi-dimensional array to simulate a table or other data structure such as a matrix. In Perl this is achieved by creating an array of arrays.

my @array = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);

To access the data you must add another index to your scalar listing. The first index after the scalar is the index for which sub-array you want to access. The second index is the index inside the sub-array. Like single dimensional arrays the indexes are numbered from 0 through n - 1. In order to print the centre element in the 2d array we just created we would use the command:

print $array[1][1] . "\n"; # prints 5 to the screen

print $array[1][2] . "\n"; # prints 6 to the screen

print $array[1][3] . "\n"; # this is outside of our prepared array: the value is undefined, so only an empty line is printed; use $array[2][0] instead

To print the array you can use the following code:

for $_ ( @array ) {

print "[ @$_ ],\n";

}
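Individual elements of a two dimensional array can also be visited with two nested loops, one over the sub-arrays and one over the positions inside each sub-array. This is a sketch only: the name @matrix is illustrative, and the $#{ ... } form, which gives the last index of a sub-array, is not covered in this booklet.

```perl
use strict;

my @matrix = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);

my $total = 0;
for my $row (0 .. $#matrix) {                    # index of each sub-array
    for my $col (0 .. $#{ $matrix[$row] }) {     # index inside that sub-array
        $total = $total + $matrix[$row][$col];
    }
}

print "Sum of all elements: $total\n";           # prints "Sum of all elements: 45"
```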


EXERCISE 6: Two dimensional array exercises

1. Create an array of people:

my @people = (["Clark", "Kent"], ["Lois", "Lane"], ["Bruce", "Wayne"]);

2. Use push to add “Superman” to Clark Kent’s sub-array.

3. Use pop to remove Bruce Wayne from the matrix.

4. Use a directly indexed scalar to add “Reporter” as the third element of Lois Lane’s sub-array.

5. Add a third sub-array with the values “Jimmy”, “Olsen”, “Photographer”.

6. Print the resulting matrix to the screen.

7. Print only the last names to the screen.

Program Control

In the simple scripts considered so far, the program execution starts at the top and every line is executed until we reach the bottom and the program terminates. In most programs things are not so straightforward and it is very useful to have pieces of code that are only executed when certain conditions are met. In this chapter we will examine conditional statements that allow such program control.

2.1 If-else statements

The basic conditional statement is the if. An if statement evaluates a condition and then executes a piece of code if that condition is true. A simple conditional statement using if is shown below:

my $salary = 50000;

if ( $salary > 100000 ) {

# if the value of salary is greater than 100,000

print "You must be a banker...\n";

}

Table 2.1 gives a list of some possible conditional statements that can be used to compare two numbers or two strings.

Conditional test   Data Type   Description

$x == $y           Numerical   X is equal to Y
$x != $y           Numerical   X is not equal to Y
$x > $y            Numerical   X is greater than Y
$x < $y            Numerical   X is less than Y
$x >= $y           Numerical   X is greater than or equal to Y
$x <= $y           Numerical   X is less than or equal to Y
$x eq $y           Strings     X is equal to Y
$x ne $y           Strings     X is not equal to Y
$x gt $y           Strings     X is greater than Y
$x lt $y           Strings     X is less than Y

Table 2.1: String comparisons. A list of conditional statements that can be used for comparing two numbers or two strings specified in the variables x and y. The final two comparisons compare the strings based on alphabetical order.

This piece of code assigns the value 50,000 to the scalar variable salary. A conditional statement if is used to test whether the value contained in salary is greater than 100,000. If the salary is greater than 100,000, the statement You must be a banker ... contained between the curly braces { ... } is printed.
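It matters which family of operators from Table 2.1 is used, because the same two values can compare differently as numbers and as strings. The sketch below uses illustrative variable names and values to show this.

```perl
use strict;

my $x = 10;
my $y = 9;

my $numeric_result = "no";
my $string_result  = "no";

if ( $x > $y ) {     # numerical test: 10 is greater than 9
    $numeric_result = "yes";
}

if ( $x gt $y ) {    # string test: "10" comes before "9" alphabetically, so this is false
    $string_result = "yes";
}

print "numeric: $numeric_result, string: $string_result\n";   # prints "numeric: yes, string: no"
```

As a rule of thumb, use the symbolic operators (==, >, <) for numbers and the word operators (eq, gt, lt) for text.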

An if statement can also contain an else statement which specifies a piece of code that is executed if the condition is false:

my $salary = 50000;

if ( $salary > 100000 ) {

# if the value of salary is greater than 100,000

print "You must be a banker...\n";

}

else

{

# if the value of salary is not greater than 100,000

print "You are not a banker...\n";

}

EXERCISE 7:

1. Create a new text document and name it ifthenelse.pl.

2. Enter the source code below:

#!/usr/bin/perl

use strict;

my $x = 5.1;

my $y = 5;

if ( $x > $y ) {

print "x is greater than y\n";

}

else

{

print "y is greater than x\n";

}

$x = 5.0;

$y = 5.0;

if ( $x > $y ) {

print "x is greater than y\n";

}

elsif ( $y > $x ) {

print "y is greater than x\n";

}

elsif ( $y == $x ) {

print "x is equal to y\n";

}

3. Switch to the terminal and run the program.

> perl ifthenelse.pl

x is greater than y

x is equal to y

4. Modify the program to accept the numbers entered by the user using <STDIN> and re-run the program to see the changes in the program behaviour.

5. Write a new program that computes the area of a circle with a radius that is specified by the user using <STDIN>. The area of a circle is π times the radius of the circle squared (π ≈ 3.141592654).

6. Modify the program so that if the radius is a negative number the program will print “The radius of a circle must be a positive number”.

7. Add a conditional statement to print:

(a) “This is a big circle” if the area of the circle is greater than 100.

(b) “This is a small circle” if the area of the circle is less than 100.

2.2 Nested and Compound conditionals

If we want to check more than one condition in an if statement, we can nest them to produce more complex logic:

if ($salary > 100000) # if $salary is greater than 100,000

{

if ($bonus > 100000) # if $bonus is greater than 100,000

{

# this statement is only printed if salary > 100,000 AND bonus > 100,000

print "You are a lucky boy!";

}

}

In this example code, the value of salary is first checked to see if it is greater than 100,000 and, if it is, the value of bonus is then checked to see if it is also greater than 100,000. If both variables satisfy these conditions then the print statement is executed.

However, the above code example can be equivalently expressed using a compound statement where both conditions are evaluated at the same time:

if ( ($salary > 100000) and ( $bonus > 100000 ) ) {

print "You are a lucky boy!";

}

The choice of whether to use nesting or compound conditionals typically depends on the problem being solved and the readability of the code. Long compound conditional statements are undesirable but multiple levels of nesting are also difficult to manage!
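Compound conditions are not limited to and. Perl also provides the operators or (true if either test is true) and not (which reverses a test). The sketch below uses illustrative values and messages to show both.

```perl
use strict;

my $salary  = 50000;
my $bonus   = 150000;
my $message = "";

if ( ($salary > 100000) or ($bonus > 100000) ) {
    $message = "At least one is large";              # runs if either test is true
}

if ( not ($salary > 100000) ) {
    $message = $message . ", but not the salary";    # runs because the salary test is false
}

print "$message\n";    # prints "At least one is large, but not the salary"
```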

EXERCISE 8: Nested and Compound If statements

1. Create a new text document and name it nestedif.pl.

2. Enter the source code below:

#!/usr/bin/perl

use strict;

my $x = 5.1;

my $y = 5.1;

if ( $x > 5.0 ) {

if ( $y > 5.0 ) {

print "x and y are greater than 5\n";

}

}

if ( ( $x > 5.0 ) and ( $y > 5.0 ) ) {

print "x and y are greater than 5\n";

}

3. Switch to the terminal and run the program:

> perl nestedif.pl

x and y are greater than 5

x and y are greater than 5

4. Modify the program to accept numbers from <STDIN> and re-run the program to see the changes in the program behaviour.

5. Using a combination of and, or and if statements or nested statements, write a script to print out the following statements under these salary/bonus scenarios:

(a) Salary < 100000, Bonus < 100000: “You are not a banker.”

(b) Salary > 100000, Bonus < 100000: “You are a banker with no bonus.”

(c) Salary > 100000, Bonus > 100000: “You are a banker with a big bonus.”

(d) Salary < 100000, Bonus > 100000: “You won the lottery.”

(e) Salary or Bonus > 100000: “You are buying dinner tonight.”

6. In Perl, we can use the =~ operator to perform pattern matching. The statement $x =~ /word/ is true if the variable x contains the phrase “word”. Using this pattern matching operator and conditional statements, write a case-insensitive test to see if an input string x contains the following text:

Word to find   Print this if found

Chris          Found Chris!
Bells          Ding dong!
Wonder         I was wondering about that too
Land           Air and Sea

Test your code using the following strings:

(a) Christmas Time

(b) The bells are ringing in Wonderland

(c) Stevie Wonder

(e) Wondering about your day

2.3 Loops

Until now the Perl code we have seen has been executed sequentially from start to finish with some parts missed out on the way due to conditional statements. Loops allow us to define repetitive code elements that can be executed repeatedly until some condition is satisfied.

2.3.1 While loops

The simplest kind of loop is the while loop. The while loop consists of a block of code in curly brackets preceded by a statement which is evaluated as being either true or false. If it is true the block of code is run once and the condition is then tested again. The loop continues until the condition returns a false value.

To make a while loop work you must have something change in the block of code which affects the condition you supplied at the start. If you do not have this then the loop will either not run at all, or it will continue running forever (known as an infinite loop). A simple while loop is shown below which illustrates the normal syntax for a loop:

my $count = 0;

while ($count < 5) {

print "Count is $count\n";

$count++; # this is Perl short hand for $count = $count + 1

}

In this loop the condition being evaluated is $count < 5, and the loop finishes because $count is increased by 1 every time the loop code runs, so the condition eventually becomes false.

2.3.2 Foreach loops

The other commonly used loop structure in Perl is the foreach loop. This is used to iterate through a list of values where you can supply a block of code which will be run once for each value in the list supplied. A simple foreach loop is shown below:

foreach my $value (2, 4, 6, 8, 10) {
    print "Value is $value\n";
}

This code prints "Value is ..." for each of the values 2, 4, 6, 8 and 10 giving the following output:

Value is 2
Value is 4
Value is 6
Value is 8
Value is 10

Although you can manually create a list for a foreach loop it is much more common to use an existing data structure instead. This is usually either an array, or a list of the keys of a hash.

my @animals = ("cat", "dog", "rabbit");

# for each element in the array @animals assign it to the scalar $animal

foreach my $animal (@animals) {

# print the value of $animal

print "I have a $animal\n";

}

Finally, another useful bit of Perl syntax to use with foreach loops is the range operator. This consists of two scalars separated by a double dot (..) and creates a list in which all values between the two scalars are filled in.

foreach my $number (1 .. 10) {
    print "There are $number elephants\n";
}

foreach my $letter ("a".."z","A".."Z") {
    print "This is a letter: $letter\n";
}

The behaviour of numbers in ranges is pretty intuitive (goes up by one each time). Letters are OK as long as you keep everything either upper or lower case and do not mix the two. You can do ranges with multiple letters, but watch out because they get pretty big pretty quickly!
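To give a feel for how quickly multi-letter ranges grow, the following minimal sketch counts the elements produced by three ranges. The variable names are chosen purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build three lists with the range operator and count their elements.
my @numbers = (1 .. 10);        # numbers go up by one each time
my @letters = ("a" .. "z");     # single lower-case letters
my @pairs   = ("aa" .. "zz");   # every two-letter combination

print "Numbers: ", scalar(@numbers), "\n";
print "Letters: ", scalar(@letters), "\n";
print "Pairs:   ", scalar(@pairs),   "\n";   # 26 x 26 = 676 elements
```

A three-letter range such as "aaa" .. "zzz" would already contain 17,576 elements, so it is worth checking the size of a range before looping over it.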

2.3.3 For loops

A more classically styled for loop is also available for use in Perl which is more akin to the type of for loop structures seen in other programming languages such as C or Java.

my @animals = ("cat", "dog", "rabbit");

for ( my $i = 0; $i < scalar(@animals); $i++ ) {
    print "I have a $animals[$i]\n";
}

Here, $i is an index which increments each time the loop is repeated (indicated by the $i++), and the loop continues for as long as the condition $i < scalar(@animals) is satisfied. The function scalar returns the number of elements in an array. In this code example, all the entries in the array @animals are therefore printed just as we did previously using foreach. As you might imagine, Perl programmers tend to use foreach more often than the more basic for loop, but it is helpful to understand how a for loop works as it does appear every so often and is fundamentally what the foreach command is built upon.
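An index-based loop is mainly useful when the position of each element matters. The sketch below is an illustration only: it numbers the animals with a for loop and then walks over the same indices with foreach and a range. The array @report and the use of $#animals (the index of the last element) are assumptions introduced for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @animals = ("cat", "dog", "rabbit");
my @report;   # collects one line of text per animal

# C-style for loop: $i runs over the positions 0, 1, 2
for ( my $i = 0; $i < scalar(@animals); $i++ ) {
    push @report, "Animal number $i is a $animals[$i]";
}

# The same positions generated with a range; $#animals is the last index
foreach my $i (0 .. $#animals) {
    print "$report[$i]\n";
}
```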

EXERCISE 9:

1. Create a new text document and name it loops.pl.

2. Write a script that prints out the numbers 1980 to 2010 using a loop.

3. Modify your script to use a conditional statement and print out “This is a new decade!” for years ending in nought. HINT: Use $year % 10 == 0 to test if a year is divisible by 10.

4. Use a while loop to count backwards from 10, print the numbers and print the line “We have lift off!” when the count reaches zero.

5. Create an array with the following strings as elements:

James Bond 007
Department of Statistics
University of Oxford
Fantastic 4

Use a loop to print “The string ‘x’ contains numbers” if $x does contain numbers. Print the uppercase version of strings that do not contain numbers.

HINT: The test $x =~ /[0-9]/ can be used to identify if x contains any (single digit) number from 0 to 9.


File Handling

Reading large data files and creating and writing large quantities of data to files is one of the tasks that Perl is most often used for in real applications. Perl and its associated modules provide powerful tools for handling and manipulating files in an easy to use way. In this chapter, we will explore some of these capabilities.

3.1 File Handles

In order to do any work with a file you first need to create a filehandle. A filehandle is a structure which Perl uses to allow functions to interact with a file. You create a filehandle using the open command. This can take a number of arguments:

1. The name of the filehandle (all in uppercase by convention)

2. The mode of the filehandle (read, write or append)

3. The path to the file

When selecting a mode for your filehandle, Perl uses the following symbols:

1. < for a read-only filehandle

2. > for a writable filehandle (creates a new file for writing or clears an existing file of the same name)

3. >> for an appendable filehandle (adds to an existing file of the same name or creates a new file if none exists)

If you do not specify a mode, Perl assumes a read-only filehandle, and for reading files it is usual to not specify a mode. The difference between a write and append filehandle is that if the file you specify already exists a write filehandle will wipe its contents and start again whereas an append filehandle will open the file, move to the end and then start adding content after what was already there. The code below shows an example of opening a read-only and a writeable filehandle.

open(IN, 'readme.txt'); # open a file handle called IN for reading

open(OUT, '>', 'writeme.txt'); # open a file handle called OUT for writing

By convention the filehandle names (in this example IN and OUT) should be written in all capitals. In some older coding styles you may well see the mode combined with the filename although this style should now be considered obsolete and avoided:

open(OUT, '> writeme.txt');

3.2 Closing a filehandle

When you have finished with a filehandle it is good practice to close it using the close function.

close(OUT);

If you do not explicitly close your filehandle it will automatically be closed when your program exits. If you perform another open operation on a filehandle which is already open then the first one will automatically be closed when the second one is opened.

3.3 Error Checking

Error checking is very important when opening files for read and write operations. What happens if a file does not exist or if you do not have permission to write to a file? Perl will carry on past these failures, giving you catastrophic or strange errors later in your code when you try to use the filehandle you created. It is therefore important to ensure that your Perl code ALWAYS checks that file operations are completed before proceeding.

You can check that the operation succeeded by looking at the return value of the function open. If an open operation succeeds it returns true, if it fails it returns false. You can use a normal conditional statement to check whether it worked or not.

my $return = open(IN, "readme.txt");

if ($return) {
    print "It worked!";
}
else {
    print "Could not read the file!";
    exit;
}

More commonly, Perl programmers use the following code:

open(IN, "readme.txt") or die "Cannot read readme.txt: $!";

which gives the following output upon failure:

Cannot read readme.txt: No such file or directory at line 5.

If a file open fails, Perl stores the reason for the failure in the special variable $!. The function die prints out the message it is given, which here includes the reason for failure, and also terminates the program immediately.
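Sometimes a script should report a failed open and carry on rather than stop. The sketch below shows one possible way of doing this by saving the contents of $! as soon as the open fails. The file name is hypothetical and is chosen on the assumption that no such file exists.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist, so the open should fail.
my $file = "no_such_file_for_this_demo.txt";

my $opened = open(IN, '<', $file);
my $reason = $!;   # copy the reason straight away, before another call changes it

if ($opened) {
    print "Opened $file\n";
    close(IN);
}
else {
    print "Could not open $file: $reason\n";
}

print "The script carries on after the check\n";
```

Note that $! is only meaningful immediately after a failure, which is why it is copied before anything else is done.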


3.4 Reading files

Once a filehandle has been created for a file, data can be read from the file using the <> operator. The identifier of the filehandle you want to read from is placed between the angle brackets. This reads one line of data from the file and returns it. To be more precise, this operator will read data from the file until it hits a certain delimiter. The default delimiter is your system's newline character (\n), hence you get one line of data at a time.

my $file = "/tmp/myfile.txt"; # the path and file name of the file to be read

open(IN, $file) or die "Can't read $file: $!"; # open a file handle to the file

my $first_line = <IN>; # reads next line from the file specified by the file handle IN

print $first_line; # prints the first line of the file (note that the newline character from the file is retained)

print "The end";

close(IN);

This produces the following output:

This is the first line,
The end

There is no \n specified at the end of the first print statement. This is because the <> operator does not remove the delimiter it is looking for when it reads the input filehandle, so the variable $first_line already contains a newline delimiter. Normally you want to get rid of this delimiter, and Perl has a special function called chomp for doing just this. chomp removes the same delimiter that <> uses, but only if it is at the end of a string.

my $file = "/tmp/myfile.txt";

open(IN, $file) or die "Can't read $file: $!";

my $first_line = <IN>;

chomp($first_line); # remove the delimiter

print $first_line;

print "The end";


This code produces:

This is the first line,The end

In order to read the entire file we must use a loop to apply the <> operator repeatedly to read all lines from a file. The typical way to read a file is to put the <> operator into a while loop so that the reading continues until the end of the file is reached. A while loop is preferred over a for loop because we do not need to pre-specify how many times we want the loop to run, only that we should stop when the end of the file is reached.

my $file = "/tmp/myfile.txt";

open(IN, $file) or die "Can't read $file: $!";

my $line_count = 1;

while (my $line = <IN>) {

chomp($line);

print "$line_count: $line\n";

$line_count++;

}

close(IN);

Gives:

1: This is the first line,
2: here comes the second line,
3: and here the third one...
4: This is a boring file.
5: Let’s move on to something more fun.
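For small files it can be convenient to read every line in one go. When the <> operator is assigned to an array rather than a scalar, it returns all the remaining lines at once. The sketch below assumes a hypothetical scratch file, read_demo.txt, which it creates and deletes itself so that it can be run on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical scratch file, created here so the example is self-contained.
my $file = "read_demo.txt";

open(OUT, '>', $file) or die "Can't write $file: $!";
print OUT "first\nsecond\nthird\n";
close(OUT) or die "Can't close $file: $!";

open(IN, '<', $file) or die "Can't read $file: $!";
my @lines = <IN>;    # assigning to an array reads one element per line
close(IN);

chomp(@lines);       # chomp also accepts a whole array

print "Read ", scalar(@lines), " lines; the last one is '$lines[-1]'\n";

unlink($file) or die "Couldn't delete $file: $!";
```

The while loop remains the better choice for large files, since it only ever holds one line in memory at a time.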

EXERCISE 10:

1. Download the file fruit.csv from the course website:

2. Create a new text document called readfile.pl and enter the following code:

#!/usr/bin/perl

use strict;

my $infile = "fruit.csv";

open(FH, $infile) or die "Cannot open $infile\n";

# this bit of code reads (skips) the header line

<FH>;

while ( my $line = <FH> ) {

chomp($line);

my @linedat = split(/,/, $line); # splits the line at commas

my $fruit = $linedat[0];

my $quantity = $linedat[1];

my $unitprice = $linedat[2];

$unitprice = sprintf('%0.2f', $unitprice); # converts the unit price into 2 decimal places

print "We have $quantity of $fruit at $unitprice pounds each\n";

}


3.5 Writing to files

Writing to a file is straightforward once you understand the concept of a filehandle. After opening a filehandle for writing, the only function required is print. All of the previous print statements shown so far have actually been sending data to the STDOUT filehandle (i.e. the screen). If a specific filehandle is not specified then STDOUT is the default location for any print statements.

open(OUT, '>', "write_test.txt") or die "Can't open file for writing : $!";

print OUT "Sending some data\n";

close(OUT) or die "Failed to close file: $!";

When writing to a file it is important to check that the open function has succeeded and that no error occurs when closing the filehandle. This is because errors can occur whilst you are writing data, for example if the device you are writing to becomes full whilst you are writing.
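The difference between the write (>) and append (>>) modes can be seen in the following minimal sketch. The file name append_demo.txt is a hypothetical scratch file that the script creates and then deletes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = "append_demo.txt";   # hypothetical scratch file

# '>' creates the file, or empties it if it already exists
open(OUT, '>', $file) or die "Can't open $file for writing: $!";
print OUT "line one\n";
close(OUT) or die "Failed to close $file: $!";

# '>>' keeps the existing content and adds to the end
open(OUT, '>>', $file) or die "Can't open $file for appending: $!";
print OUT "line two\n";
close(OUT) or die "Failed to close $file: $!";

# read the file back to confirm that both lines are present
open(IN, '<', $file) or die "Can't read $file: $!";
my @lines = <IN>;               # reads all lines into an array
close(IN);

print @lines;

unlink($file) or die "Couldn't delete $file: $!";
```

Had the second open also used >, the file would have contained only "line two".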

EXERCISE 11:

1. Create a new text document called writefile.pl and enter the following code:

#!/usr/bin/perl

use strict;

my $outfile = "myoutfile1.txt";

open(OUTFILE, '>', $outfile) or die "Cannot write to $outfile: $!\n";

print OUTFILE "This is my first file\n";

close(OUTFILE);

2. Using a loop, add some extra code to print the numbers 1, .., 100 to the file.


3. Use conditional statements to print only odd numbers between 1 and 100 to the file.

3.6 Writing to multiple files

A common file processing task for which Perl is used is the extraction of selected portions of data from a large data file. Multiple output files may then be generated, each with differing content. However, we may not know in advance how many output files will be created (since this may be determined by the input data) and hence how many output filehandles we might require. In order to write to multiple files, we must create multiple filehandles. Here we introduce a Perl module called FileCache, which contains a pre-defined library of file handling functions that simplify the task of writing to and managing multiple filehandles.

In order to use the FileCache module, we must insert the following code at the top of our Perl script:

no strict 'refs';

use FileCache maxopen => 16;

Here, the keyword use specifies that we are using the FileCache module, whilst maxopen => 16 is an option specific to the FileCache module. In this case it specifies the maximum number of filehandles used by FileCache at any one time (note that we can write to more than 16 different files, but only a maximum of 16 filehandles will be open at any one time and the module will take care of opening and closing filehandles appropriately). The no strict 'refs'; line is required due to implementation issues with the FileCache module which we will not detail here.

Now, instead of opening a filehandle for writing as specified previously, we use the cacheout command to return a filehandle supplied by the FileCache module. We can then use print to write to the filehandle as before:

my $FH = cacheout($file);

print $FH "Writing to this file\n";

So far it is not obvious what the point of using FileCache is, other than as a possible simplification of the normal filehandle creation process. However, consider the following example which contains an array of football clubs:

# array of club wins

my @wins = ( 'Manchester United',

'Arsenal',

'Chelsea',

'Manchester United',

'Chelsea',

'Chelsea' );

# for each club win

foreach my $club ( @wins ) {

my $file = "$club.txt"; # generate a filename for this club

my $FH = cacheout($file); # get file handle for this file

print $FH "This team won\n"; # print

}

In this example, the foreach loop goes through each club in the array, generates a filename ($file) for each club and uses cacheout to return a filehandle to that file. It then writes the sentence “This team won” to that file. The result is the creation of three files named Manchester United.txt (2), Arsenal.txt (1) and Chelsea.txt (3) respectively, with “This team won” printed the number of times indicated by the number in the brackets.

Note that we did not have to scan through the array first to work out how many different clubs were present in the array, nor did we have to explicitly create a filehandle for each club. The FileCache module has done the hard work for us! This is very useful in more realistic settings where the arrays may be considerably larger and there are a large number of different files involved.

EXERCISE 12:

1. Create a new text document called myfilecache.pl and enter the following code:

#!/usr/bin/perl

use strict;

no strict 'refs';

use FileCache maxopen => 16;

my $infile = "departments.csv"; # this is the file to read

# open a file handle

open(INFILE, $infile) or die "Cannot open $infile\n";

# skip the header line

<INFILE>;

# read one line at a time

while ( my $line = <INFILE> ) {

chomp($line);

# extract data

my ( $staffid, $firstname, $surname, $department, $employmentstatus ) = split(/,/, $line);

my $name = $firstname . " " . $surname;

print "$staffid\t$name\t$department\t$employmentstatus\n";

}

# close file handle

close(INFILE);


files, one for each department, which contain the Staff ID and Names for each person working in that Department. The output files should be named after the department they represent.

3. Modify your program to only include full-time employees (FT).

3.7 Tying files to an array

The module Tie::File allows the lines of a file on disk to be accessed as though they were the elements of a Perl array.
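As a minimal sketch of how this might look in practice, the code below ties a small file to an array, changes one line and appends another. The file name tie_demo.txt is a hypothetical scratch file created by the script itself, and the array names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

my $file = "tie_demo.txt";   # hypothetical scratch file

# create a small file to work on
open(OUT, '>', $file) or die "Can't write $file: $!";
print OUT "apple\nbanana\ncherry\n";
close(OUT) or die "Can't close $file: $!";

# tie the file to an array: each element is one line of the file
tie my @lines, 'Tie::File', $file or die "Can't tie $file: $!";

print "The file has ", scalar(@lines), " lines\n";

$lines[1] = "blueberry";     # rewrites the second line of the file
push @lines, "damson";       # adds a new line at the end of the file

untie @lines;                # finish with the file

# read the file back in the usual way to see the changes
open(IN, '<', $file) or die "Can't read $file: $!";
my @check = <IN>;
close(IN);

print @check;

unlink($file) or die "Couldn't delete $file: $!";
```

Changes made to the array are written to the file itself, so it is wise to experiment on a copy of any file that matters.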

3.8 File System Operations

As well as being able to read and write files Perl offers a number of other filesystem operations within the language.

3.8.1 Changing directory

Instead of having to include a full path every time you open or close a file it is often useful to move to the directory you want to work in and then just use filenames. You use the chdir function to change directory in Perl. As with all file operations you must check the return value of this function to check that it succeeded.

chdir ("/tmp/") or die "Couldn't move to temp directory: $!";

3.8.2 Deleting files

Perl provides the unlink function for deleting files. This accepts a list of files to delete and will return the number of files successfully deleted. Again you must check that this call succeeded.


# This works:

unlink ("/tmp/killme.txt", "/tmp/metoo.txt") == 2 or die "Couldn't delete file: $!";

# But this is better:

foreach my $file ("/tmp/killme.txt", "/tmp/metoo.txt") {

unlink $file or die "Couldn't delete $file: $!";

}

3.8.3 Listing files

Another common scenario is that you want to process a set of files. Perl provides a function called glob to list files in a directory.

my @files = glob("*.rtf");

print "I have ", scalar(@files), " rtf files in my directory\n";

You can also frequently encounter a shortcut for globbing which uses the angle brackets (<>) instead of glob. Both methods are entirely equivalent.

my @files = <*.doc>;

print "I have ", scalar(@files)," doc files in my directory\n";

Although you can return the output from a glob into an array as shown above, it is actually possible to treat it a bit like a filehandle and read filenames from it in a while loop.

chdir ("/tmp/docs") or die "Can't move to docs directory: $!";

while (my $file = <*.doc>) {
    print "Found file $file\n";
}

my @files = <*.doc>;

foreach my $file (@files) {
    print "Found file $file\n";
}


3.8.4 Testing files

It may be necessary to identify properties of a file in an application. Perl provides a series of simple file test operators to allow you to find out basic information about a file. The file test operators are as follows:

Test    Description
-e      Tests if a file exists
-r      Tests if a file is readable
-w      Tests if a file is writable
-d      Tests if a file is a directory (directories are just a special kind of file)
-f      Tests if a file is a file (as opposed to a directory)
-T      Tests if a file is a plain text file

All of these tests take a filename as an argument and return either true or false.

chdir ("/tmp/docs/") or die "Can't move to docs directory: $!";

while (my $file = <*>) {
    if (-f $file) {
        print "$file is a file\n";
    }
    elsif (! -w $file) {
        print "$file is write protected\n";
    }
}

As an aside, in the above example, the test to identify whether the file is write protected uses an ! at the start of the conditional statement. This operator reverses the sense of the test which follows. It is the Perl language equivalent of putting not at the end of a sentence. In this case ! -w tests if the file is not writable, i.e. is write protected.
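The following sketch applies some of these tests to three names: a scratch file that the script creates, the current directory (written as "."), and a name that is assumed not to exist. All three names and the array @found are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = "filetest_demo.txt";   # hypothetical scratch file

open(OUT, '>', $file) or die "Can't write $file: $!";
print OUT "some text\n";
close(OUT) or die "Can't close $file: $!";

my @found;   # one description per name tested

# "." is the current directory; the last name is assumed not to exist
foreach my $name ($file, ".", "no_such_file_for_this_demo") {
    my $kind;
    if (! -e $name) {
        $kind = "does not exist";
    }
    elsif (-d $name) {
        $kind = "is a directory";
    }
    elsif (-f $name) {
        $kind = "is a plain file";
    }
    else {
        $kind = "is something else";
    }
    push @found, "$name $kind";
    print "$name $kind\n";
}

unlink($file) or die "Couldn't delete $file: $!";
```

Testing for existence first avoids reporting a missing file as simply "not a directory" or "not a file".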

EXERCISE 13:

1. Download the file animals.zip from the course website

http://www.stats.ox.ac.uk/~aid/perl/

Extract its contents to your home directory. This should create a directory called animals with four files in it.

2. Use glob to find all the text files in the animals directory and store these in an array.

3. Create a single summary file containing a list of all the types of foxes, badgers and rabbits and their numbers. The summary file should have the following headers, with the data following underneath:

Species    Type      Number
Fox        Alopex    23

You will need to:

(a) Check that each file is a text file.

(b) Create a file handle for each data file and read in the data from each file.

(c) Create a file handle to a summary file and write the animal data to this file.

4. Write some code to delete the file called sweets.dat in the animals directory. WARNING! BE CAREFUL WHAT YOU DELETE WHEN DOING THIS!

Regular Expressions

Perl has many features that set it apart from other languages. Of all those features, one of the most important is its strong support for regular expressions. These allow fast, flexible, and reliable string handling.

A regular expression, often called a pattern in Perl, is a template that either matches or does not match a given string. Regular expressions are often used to implement the following types of tasks:

1. Complex string comparisons, e.g. find the text trans in the string variable $string = "transformer".

2. Complex string selections, e.g. select the text out in the string variable $string = "shout".

3. Complex string replacements, e.g. replace the text Presley with the text Costello in the string variable $string = "Elvis Presley".

4.1 Basic string comparisons

The most basic string comparison is:

$string =~ m/abc/;


The above returns true if the string $string contains the sub-string abc and false otherwise. The operator =~ appears between the string variable you are comparing and the regular expression you are looking for (note that in a selection or substitution the regular expression operates on the string variable rather than just comparing against it). The operator m denotes a matching operation, whilst / is the usual delimiter for the text part of a regular expression. If the sought-after text contains slashes, it is sometimes easier to use pipe symbols (|) as delimiters, but this is rare. Table 4.1 provides a list of variations on this basic operation.

Example                 Description
$string =~ m/abc/;      Checks if $string contains any instance of the text abc.
$string =~ m/^abc/;     Checks if $string contains an instance of the text abc at the beginning of the string.
$string =~ m/abc$/;     Checks if $string contains an instance of the text abc at the end of the string.
$string =~ m/^abc$/;    Checks if $string contains only the text abc.
$string =~ m/abc/i;     Performs a case-insensitive match to see if $string contains an instance of the text abc.

Table 4.1: Example string comparisons using regular expressions.
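The variations in Table 4.1 can be compared side by side with a short script. This is an illustrative sketch only: the test strings and the arrays @tests, @matched and @summary are made up for the example, and the trailing if on each push line is simply a compact way of writing a one-line if statement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @tests = ("abc", "abcdef", "xyzabc", "xxABCxx");
my @summary;   # one line of results per test string

foreach my $string (@tests) {
    my @matched;   # names of the comparisons that succeeded
    push @matched, "anywhere"      if $string =~ m/abc/;
    push @matched, "at the start"  if $string =~ m/^abc/;
    push @matched, "at the end"    if $string =~ m/abc$/;
    push @matched, "whole string"  if $string =~ m/^abc$/;
    push @matched, "ignoring case" if $string =~ m/abc/i;

    push @summary, "$string: " . join(", ", @matched);
    print "$summary[-1]\n";
}
```

Only the case-insensitive comparison succeeds for xxABCxx, whereas the string abc satisfies all five.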

4.2 Using Wildcards and Repetitions

Perl regular expressions allow us to use “wildcards” (in computing, a wildcard character can be used to substitute for a particular type of character or characters in a string) and repetitions to match multiple instances of a character. Table 4.2 provides a list of wildcard characters. You can also follow any character, wildcard, or series of characters and/or wildcards with a repetition in order to match multiple instances of particular types of characters. Table 4.3 lists some examples.

Character   Description
.           Match any character
\w          Match “word” character (alphanumeric plus _)
\W          Match non-word character
\s          Match whitespace character
\S          Match non-whitespace character
\d          Match digit character
\D          Match non-digit character
\t          Match tab
\n          Match newline
\r          Match return
\f          Match formfeed
\a          Match alarm (bell, beep, etc)
\e          Match escape

Table 4.2: Wildcards in regular expressions

Character   Description
*           Match 0 or more times
+           Match 1 or more times
?           Match 1 or 0 times
{n}         Match exactly n times
{n,}        Match at least n times
{n,m}       Match at least n but not more than m times

Table 4.3: Repetitions in regular expressions

For example, the following regular expression checks if $string starts with a percentage symbol followed by a whitespace character and then any other characters:

$string =~ m/^%\s.*/;

This regular expression checks for a percentage symbol at the start of the string using ^% that is then followed by a whitespace character \s and then any number of characters using .*. Strings that would return true include % Hello World and % Apple but not %Hello or %% Banana.

We can check to see if a string satisfies a particular format, for example, a classic DOS-style 8.3 filename format, using regular expressions.

$string =~ m/^\S{1,8}\.\S{3}$/;

This regular expression matches between one and eight non-whitespace characters at the start of the string using ^\S{1,8}, followed by a dot \. (note the use of the backslash, since an unescaped dot would match any character) and then exactly three further non-whitespace characters at the end of the string using \S{3}$.
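To see the pattern in action, the sketch below applies it to a handful of made-up filenames and keeps those that match. The names in @names and the array @valid are assumptions introduced for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up filenames to test against the 8.3 pattern
my @names = ("readme.txt", "averylongfilename.txt", "notes.text", "my file.doc");
my @valid;   # names that match the pattern

foreach my $name (@names) {
    if ( $name =~ m/^\S{1,8}\.\S{3}$/ ) {
        push @valid, $name;
        print "$name looks like an 8.3 filename\n";
    }
    else {
        print "$name does not look like an 8.3 filename\n";
    }
}
```

Here only readme.txt matches: the second name is too long before the dot, the third has four characters after the dot, and the fourth contains a space.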


4.3 Groups

Groups are regular expression characters surrounded by parentheses. Powerful regular expressions can be made with groups. At its simplest, you can match either all lowercase or name case like this:

$string =~ m/(G|g)eorge (C|c)looney/;

Detect all strings containing vowels:

$string =~ m/(A|E|I|O|U|Y|a|e|i|o|u|y)/;

Detect if the line starts with any of the last three Prime Ministers:

$string =~ m/^(Blair|Brown|Cameron)/i;

Groups can also be used for string selections:

$string = "01234 56789";

$string =~ m/(\d+)\s(\d+)/;

print "$1, $2\n";

would produce the following output:

01234, 56789

The regular expression (\d+)\s(\d+) matches one or more digits, followed by a whitespace character and then one or more digits. The special variables $1 and $2 store the matches corresponding to the groups of digits in the regular expression.
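The variables $1, $2 and so on are only set when a match succeeds, so it is safest to use them inside an if statement that tests the match. The sketch below assumes a date written as day, month name and year; the format and the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $date = "17 Apr 2013";
my ( $day, $month, $year );

# three groups: one or two digits, a word, then exactly four digits
if ( $date =~ m/^(\d{1,2})\s(\w+)\s(\d{4})$/ ) {
    ( $day, $month, $year ) = ( $1, $2, $3 );
    print "Day: $day, Month: $month, Year: $year\n";
}
else {
    print "'$date' is not in the expected format\n";
}
```

Copying $1, $2 and $3 into named variables straight away is a sensible habit, because the next successful match will overwrite them.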

4.4 Character Classes

Character classes are alternative single characters within square brackets that can be used as an alternative to groups. Character classes have three main advantages:

• Character Ranges, such as [A-Z].

• One to one mapping from one class to another, as in tr/[a-z]/[A-Z]/ (we will describe translations using tr later).

A hyphen is used to indicate all characters in the sequence between the character on the left of the hyphen and the character on its right. An uparrow (^) immediately following the opening square bracket means “Anything but these characters”, and effectively negates the character class. For instance, to match anything that is not a vowel, do this:

if ( $string =~ m/[^AEIOUYaeiouy]/ ) {
    print "This string contains a non-vowel";
}

Contrast this to the following which returns true if the string contains no vowels at all (note the use of !~ to denote not matching):

if ( $string !~ m/[AEIOUYaeiouy]/ ) {
    print "This string contains no vowels at all";
}

Print all people whose name begins with A through E:

if ( $string =~ m/^[A-E]/ ) {
    print "$string\n";
}
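The difference between a negated character class and a negated match is easy to confuse, so the sketch below runs both tests on three made-up strings. The ?: operator used here is shorthand for an if/else that chooses between two values; the array @rows is an assumption introduced for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @rows;   # one line of results per test string

foreach my $string ("cat", "aeiou", "Zzz") {
    # true if at least one character is not a vowel
    my $has_non_vowel = ( $string =~ m/[^AEIOUYaeiouy]/ ) ? "yes" : "no";

    # true only if no character at all is a vowel
    my $has_no_vowels = ( $string !~ m/[AEIOUYaeiouy]/ ) ? "yes" : "no";

    push @rows, "$string: contains a non-vowel = $has_non_vowel, no vowels at all = $has_no_vowels";
    print "$rows[-1]\n";
}
```

The string cat passes the first test but not the second, aeiou passes neither, and Zzz passes both.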

4.5 Putting it All Together

We can put all these features together to produce some powerful string matching operations. For example, the following regular expression prints everyone whose last name is Blair, Brown or Cameron. Each string is expected to consist of a first name (^\S+), a blank (\s+), a last name (Blair|Brown|Cameron), and possibly more characters after the last name:

if ( $string =~ m/^\S+\s+(Blair|Brown|Cameron)/i ) {
    print "$string\n";
}

A more complex example is to print a string if it contains a valid phone number:

$string = "(01235) 264532";

if ( $string =~ m/(\(\d{5}\)|(\+\d{1,2}\s\(\d{1}\)\s\d{4}))\s\d{6}/ ) {
    print "Phone line: $string\n";
}

$string = "+44 (0) 1235 264532";

if ( $string =~ m/(\(\d{5}\)|(\+\d{1,2}\s\(\d{1}\)\s\d{4}))\s\d{6}/ ) {
    print "Phone line: $string\n";
}

Here we use a regular expression that allows for both national and international number formats (note that \( and \) are used to match the parentheses in the strings and are not the parentheses associated with the use of groups).

4.6 Other string operations

In addition to string comparison, we can also do string substitutions and translation:

4.6.1 Substitutions

Replace the first “Gordon Brown” with “David Cameron”:

$string =~ s/Gordon Brown/David Cameron/;

Now do it ignoring the case of gOrDoN bRoWn:

$string =~ s/Gordon Brown/David Cameron/i;

Using g, instead of replacing only the first instance of the pattern encountered we can globally replace all instances of the pattern in the string:

$string =~ s/Gordon Brown/David Cameron/g;
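The i and g options can be combined, and a substitution also returns the number of replacements it made. The following sketch, using made-up strings, shows the difference between replacing the first instance and replacing all of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Without g, only the first instance is replaced
my $first_only = "Gordon Brown and Gordon Brown";
$first_only =~ s/Gordon Brown/David Cameron/;
print "$first_only\n";

# With g and i together, every instance is replaced regardless of case
my $string = "Gordon Brown met gordon brown and GORDON BROWN";
my $count  = ( $string =~ s/Gordon Brown/David Cameron/gi );
print "$count replacements: $string\n";
```

The count can be handy for checking that a substitution actually found something to change.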

4.6.2 Translations

Translations are like substitutions, except they happen on a letter by letter basis instead of substituting a single phrase for another single phrase. For instance, what if you wanted to make all vowels upper case:

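$string =~ tr/aeiouy/AEIOUY/;

Note that tr takes plain lists of characters (with - for ranges). Square brackets and commas are not needed, and if present they are themselves treated as characters to translate.

Change everything to upper case:

$string =~ tr/a-z/A-Z/;

Change everything to lower case:

$string =~ tr/A-Z/a-z/;

Change all vowels to numbers to avoid "4 letter words" in a serial number (each of the six vowels needs its own replacement character):

$string =~ tr/AEIOUY/123456/;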
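Like a substitution, a translation returns the number of characters it changed. The short sketch below, written with plain character lists (tr does not require square brackets), uses a made-up string to show both the result and the count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "regular expressions";

# each lower-case vowel is replaced by the upper-case letter in the same position
my $changed = ( $string =~ tr/aeiouy/AEIOUY/ );

print "$string ($changed characters changed)\n";
```

This prints rEgUlAr ExprEssIOns with a count of 7.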

EXERCISE 14:

1. Using regular expressions, test whether a string has a valid IP address (IPv4) format.

Note: IPv4 addresses are canonically represented in dot-decimal notation, which consists of four decimal numbers, each ranging from 0 to 255, separated by dots, e.g., 172.16.254.1.

2. Write a Perl program to read in an input file containing “Name Surname” lines and produce a second file with the format “Surname, Name” (note the comma after the surname). Use regular expressions to do the string conversion.

3. Write a Perl program to eliminate the blank lines from a text file, e.g. if the source file has the lines:

Line 1
Line 2

Line 4

Line 6

Your program should modify this file to become:

Line 1
Line 2
Line 4
Line 6

Hash Tables

The final variable type in Perl is the hash. A hash is a form of lookup table: it consists of a collection of key-value pairs, where both the key and value are scalars. You can retrieve a value from the hash by providing the key used to enter it.

Although you can have duplicate values in a hash the keys must be unique. If you try to insert the same key into a hash twice the second value will overwrite the first. Hashes do not preserve the order in which data was added to them. They store your data in an efficient manner which does not guarantee ordering. If you need things to be ordered use an array. If you need efficient retrieval use a hash. Figure 5.1 illustrates the differences between an array and hash schematically.

Figure 5.1: Schematic diagrams of (a) an array and (b) a hash. In an array the values are stored sequentially and there is an ordering to the data. In a hash, the keys are transformed into locations (via something called a “hash function”) and the values are stored in an unordered state.

Hash names all start with the % symbol. Hash keys are simple scalars. Hash values can be accessed by putting the hash key in curly brackets {} after the hash name (which would now start with a $ as we’re talking about a single scalar value rather than the whole hash). For example, to retrieve the value for the key “alpha6574” from %names we would use $names{"alpha6574"}.

5.1 Creating a Hash

When you create a hash you can populate it with data from a list. This list must contain an even number of elements which come as consecutive sets of key-value pairs:

my %eye_colour = ( "Simon Brown", "Brown",

"Iain Smith", "Blue",

"Conor Murphy", "Grey" );

print $eye_colour{"Simon Brown"}; # prints Brown

Alternatively, it is also possible to use the => operator in place of a comma (it’s also known as a fat comma). This has the same effect as a comma, and in addition it automatically quotes the value to its left when that value is a single word, so you don’t need to put quotes around such key names (keys containing spaces, as in the example here, still need their quotes). The code below does exactly the same thing as the one above.

my %eye_colour = ( "Simon Brown" => "Brown",

"Iain Smith" => "Blue",

"Conor Murphy" => "Grey" );

This version makes it much clearer which are the keys and which are the values.
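The sketch below illustrates two points made earlier in this chapter: assigning to an existing key overwrites its value, and the keys of a hash come back in no particular order unless they are sorted. The extra entry "Ann Jones" and the new eye colours are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %eye_colour = ( "Simon Brown"  => "Brown",
                   "Iain Smith"   => "Blue",
                   "Conor Murphy" => "Grey" );

$eye_colour{"Iain Smith"} = "Green";   # key already exists: the old value is overwritten
$eye_colour{"Ann Jones"}  = "Hazel";   # new key: a new entry is added

# keys returns the keys in no guaranteed order, so sort them for tidy output
foreach my $name ( sort keys %eye_colour ) {
    print "$name has $eye_colour{$name} eyes\n";
}
```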

5.2 Testing for keys in a hash

One very common operation is to query a hash to see if a particular key is already present. This is seemingly a straightforward operation but it can lead to errors if we are not careful. One of the features of a hash is that although you need to declare the hash itself the first time you use it you do