Using variables - Python Programming for Biology

As we have already illustrated, we can create a named item, which here we call ‘x’ for simplicity, and assign it a value:

>>> x = 17

Using the above example we can introduce some more computing jargon. On the left-hand side, before the equals sign, we have a variable. On the right-hand side we have a literal. The whole line is a statement, and specifies that the variable is set to have a value equal to the literal. As you might expect from the term ‘variable’, the value of x may be changed by assigning a new value:

>>> print(x)
17
>>> x = 3
>>> print(x)
3

In Python the names we give to variables can contain only the usual 26 letters (upper and lower case), numbers and underscores (‘_’), with the additional restriction that they cannot begin with a number. (Python 3 also accepts letters from other alphabets.) Note that names are case-sensitive, so the variables DNA, Dna and dna are all treated as distinct. In general, variables should have names that indicate their purpose and/or type, in order to make the code more understandable. For example, if you state freeEnergy = heat-entropy and not x = p-q, you can see at a glance what is intended, and your program is more easily understood, including by yourself at a later date, without any additional comments.
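The naming rules above can be tried out directly. The following is only a small sketch: the sequence values and the numbers for heat and entropy are made up for illustration, and str.isidentifier() is a Python 3 string method that is not mentioned in the text.

```python
# Names that differ only in case are three separate variables.
DNA = 'GATTACA'
Dna = 'CCGG'
dna = 'ttaa'
print(DNA, Dna, dna)             # GATTACA CCGG ttaa

# Descriptive names (with hypothetical values) document the intent.
heat = 12.5
entropy = 4.0
freeEnergy = heat - entropy
print(freeEnergy)                # 8.5

# str.isidentifier() reports whether some text is a legal name.
print('seq_1'.isidentifier())    # True
print('1seq'.isidentifier())     # False: a name cannot begin with a number
```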

We can use as many different variable names as we like and assign their value based on other variables. For example, in the following we assign a value to x and then assign a value for y based on x:

>>> x = 17
>>> y = x * 13
>>> print(y)
221

Unlike many computing languages, Python is not a language where you must initially specify, and then stick to, a given kind or type of data for a given variable. You could initially allocate a numeric value to ‘x’ and then later on, without advance warning, change ‘x’ to some text. This differs from languages like C and Java, for example, where you would have to declare up front what type of data ‘x’ was to contain. In Python, the type of a variable is specified by the type of whatever its value is set to. So if you redefine a variable its type may change. Although variables can change type, it is usually best to avoid that practice.

>>> x = 4 # x is set to the integer 4
>>> 3*x
12
>>> x = 7.1 # x now set to the floating point number 7.1
>>> 3*x
21.299999999999997

The above example reminds us that floating point calculations are not always precise. The answer could also depend on the Python implementation and version.
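As a brief sketch of how a name takes its type from its current value, the snippet below reassigns one variable from a number to text. The variable names kind_before and kind_after are made up for this illustration.

```python
x = 4                              # x currently holds an integer
kind_before = type(x).__name__

x = 'four'                         # the same name now holds text
kind_after = type(x).__name__

print(kind_before, kind_after)     # int str

# isinstance() checks the current type without printing it.
print(isinstance(x, str))          # True
```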

Simple data types

As with other computer languages, Python has various simple, inbuilt types of data. These are Boolean values, integers, floating point numbers, complex numbers, text strings and the null object.

Boolean values represent truth or falsehood, as used in logic operations. Not surprisingly, there are only two values, and in Python they are called True and False. Example usage:

a = True
b = False

Integers represent whole numbers, as you would use when counting items, and can be positive or negative. In Python 2 there are two types of integers, plain integers and long integers. Plain integers have a maximum size dependent on the specific Python implementation you are using. On a typical computer the largest plain integer would be 2^31−1 or 2^63−1 (for 32-bit and 64-bit systems respectively). There is no limit on long integers except for what can fit into available memory. In Python 3 there is only one type of integer, the long integer. Unless you are doing something unusual, there is no point worrying about this distinction or the difference between the two types, and in most situations in Python 2 the plain integers will suffice. Example usage:

x = -7
y = 123
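The size limits mentioned above can be checked with a short sketch. It assumes Python 3, where there is a single integer type of unlimited size; the names big and bigger are illustrative only.

```python
# Assumes Python 3: one integer type, limited only by available memory.
print(2 ** 31 - 1)            # 2147483647
big = 2 ** 63 - 1
print(big)                    # 9223372036854775807

# Going well past the 64-bit limit still gives an exact integer.
bigger = big * big
print(bigger > big)           # True
print(type(bigger).__name__)  # int
```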

Floating point numbers (in mathematics the real numbers), which are written with decimal points or exponential notation, are not always represented exactly, since a computer has only a finite amount of memory. This introduces issues to do with numerical errors, and potential instability of numerical algorithms. However, such issues are common to all computer languages. Example usage:

z = 123.45
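A minimal sketch of the inexactness described above follows. The use of math.isclose() from the standard library is one possible way to compare floating point values with a tolerance; it is not something the text itself prescribes.

```python
import math

total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False: the two values differ slightly

# Comparing with a tolerance avoids the problem.
print(math.isclose(total, 0.3))   # True

# Exponential notation is another way to write the same kind of number.
z = 1.2345e2
print(z == 123.45)                # True
```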

There is also an inbuilt data type to represent complex numbers which you would normally write in the form ‘a+bi’ (mathematical notation) or ‘a+bj’ (engineering notation). Although complex numbers occur quite naturally in mathematics, science and engineering, relatively few Python programs use them. The Python syntax follows the engineering style and the real and imaginary parts can themselves be integer or floating point:

x = 3+4j
y = 1.2-5.8j
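For readers who do need complex numbers, here is a short sketch of how the parts of one can be inspected. The attributes real and imag and the function abs() are standard Python, though the text does not cover them.

```python
x = 3 + 4j

# The real and imaginary parts are stored as floating point numbers.
print(x.real)     # 3.0
print(x.imag)     # 4.0

# abs() gives the magnitude, and normal arithmetic works as expected.
print(abs(x))     # 5.0
print(x * x)      # (-7+24j)
```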

Strings represent text, i.e. strings of characters. They can be delimited by single quotes (') or double quotes ("), but you have to use the same delimiter at both ends. Unlike some programming languages, such as Perl, there is no practical difference between the two types of quote, although using one type does allow the other type to appear inside the string as a regular character. Example usage:

r1 = 'Ala'
r2 = "Arg"

text = "It's a line with an apostrophe"

Python also allows multi-line strings, which start and end either with triple single quotes (''') or triple double quotes ("""). Example usage:

text = """Python also allows multi-line strings,
which start and end with a triple single quote
or a triple double quote."""

Note that the indentation inside the string does not have to align with the start of the statement. Any whitespace at the beginning or end of the internal lines, i.e. between the opening and closing triple quotes, does make a difference though. Hence, if the second line of text were indented, then those indentation spaces would be present in the string.
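The effect of internal indentation can be seen in a small sketch. The string content here is invented for the illustration, and repr() is used only to make the newline and the spaces visible.

```python
text = """first line
    second line"""

# repr() shows the newline and the four indentation spaces explicitly.
print(repr(text))                     # 'first line\n    second line'

# The spaces really are part of the second line of the string.
lines = text.split('\n')
print(lines[1].startswith('    '))    # True
```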

The last of the basic data types we cover here is a special built-in value called None, which can be thought of as representing nothingness or that something is undefined. For example, it can be used to indicate that a variable exists, but has not yet been set to anything specific. Example usage:

z = None

Finally, if you have a variable and want to know what its data type is then you can use the type() function. This actually generates a special object representing the type, though it prints out in an informative way:

print( type(x) ) # 'complex'
print( type(z) ) # 'NoneType'
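A short sketch of working with None and type() is given below. It assumes Python 3, where a type prints in the form shown; testing with "is None" is a common idiom rather than something stated in the text.

```python
x = 3 + 4j
z = None

# The usual way to test for None is with "is".
print(z is None)           # True

# In Python 3 a type object prints with a "class" label.
print(type(x))             # <class 'complex'>

# The bare type name is available through __name__.
print(type(z).__name__)    # NoneType
```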

Arithmetic

Python mostly uses a similar syntax to other computer languages for performing numerical arithmetic:

x + y # addition
x - y # subtraction
x * y # multiplication
x / y # division
x // y # floored division
x % y # remainder of x / y
x ** y # x to power y
pow(x, y) # x to power y

The variables x and y can be integers, floating point numbers or a mixture. If both are integers the result is also an integer, except in the case of division for Python version 3. Otherwise the result is a floating point number, even if it represents a whole number. Thus 4.6 + 2.4 is 7.0, not 7. This also includes the floored quotient, x//y, which gives the whole number part of the division of x and y as floating point. For example, 13.3//2.1 gives 6.0, not the integer equivalent.

A non-programmer might wonder why x//y is useful at all. However, it turns out that it does come up in various contexts, but mostly when x and y are integers. This brings up an oddity, which Python, before version 3, shares with many computer languages, namely that for integers, the operation x/y is the same as x//y. A non-programmer might expect that 13/5 is equal to 2.6, but in fact it is equal to 2, the integer part of that. This is in contrast to doing division where at least one floating point number is involved like 13/5.0, 13.0/5 or 13.0/5.0, which are all indeed equal to 2.6. Hence in Python 2, if you have two integers and want to do the traditional non-integer division then you can explicitly convert one of them to a floating point number using the float() function, so, for example, float(13)/5. (There is also an int() function for converting floating point numbers to their integer part.)

It is a historic accident that integer division behaves this way, although the situation changes in Python 3, where integer division reverts to its more traditional ‘human’ meaning, so 13/5 now does equal 2.6. Accordingly, it is recommended that in Python 2 you avoid x/y if x and y are integers, but instead use x//y. Example arithmetic results:

13 + 5 # 18
13.0 + 5 # 18.0
13 - 5 # 8
13 - 5.0 # 8.0
13 / 5 # 2 in Python 2; 2.6 in Python 3
float(13) / 5 # 2.6
13.0 / 5 # 2.6
13 // 5 # 2
13 // 5.0 # 2.0
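The division rules can be explored with the sketch below, which assumes Python 3. The comparison of floored division with int() on a negative number goes slightly beyond the text: the two agree for positive values but not for negative ones.

```python
# Assumes Python 3, where / is true division even for two integers.
quotient = 13 // 5           # floored division
remainder = 13 % 5           # remainder
print(13 / 5)                # 2.6
print(quotient, remainder)   # 2 3
print(quotient * 5 + remainder == 13)   # True

# With a float operand, // still floors but the result is a float.
print(13.3 // 2.1)           # 6.0

# Floored division rounds down, whereas int() drops the fractional part.
print(-13 // 5)              # -3
print(int(-13 / 5))          # -2

# ** and pow() give the same result.
print(2 ** 10, pow(2, 10))   # 1024 1024
```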

Multiplication and division have higher precedence than addition and subtraction, but arithmetic expressions can be grouped using parentheses to override the default precedence. So we have:

13 * 2 + 5 # 31 since "*" has higher precedence than "+"
(13 * 2) + 5 # 31
13 * (2 + 5) # 91

A common situation that arises is that a variable needs to be incremented by some value. For example, you could have:

x = x + 1

which increases the value of x by 1. Python allows a shorthand notation for this kind of statement:

x += 1

Also, it allows similar notation for the other arithmetic operations, for example:

x *= y

assigns x to be the product of x and y, or in other words x is redefined by being multiplied by y.
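As a small sketch, the shorthand can be followed step by step on a single variable. The name count and the starting value are arbitrary.

```python
count = 10
count += 1     # same as count = count + 1, giving 11
count *= 3     # 33
count -= 3     # 30
count //= 4    # floored division, giving 7
count **= 2    # 49
print(count)   # 49
```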

String manipulation

Text items in Python are called strings, referring to the fact that they are strings of characters. String functionality is an important part of the Python toolbox. For example, a file on disk (covered in Chapter 6) is read as a string or a list of strings; a file can be viewed as a collection of characters. Here, even if part of the loaded file represents a number, it is initially represented as a string of characters, not a proper Python numeric object. In Python, strings are not modifiable. This might seem like a limitation, but in fact it rarely is because it is easy enough to create a new, modified string from an existing string. And since strings are not modifiable it means that they can be placed in sets and used as keys in dictionaries, both of which are exceedingly useful.

In this section we will illustrate some basic manipulations on strings using the following example string:

text = 'hello world' # same as double quoted "hello world"

In some ways a string can be thought of as a list of characters, although in Python a list of characters would be a different entity (see below for a discussion of lists). Note that when we refer to something in a string as being a character, we don’t just mean the regular symbols for letters, numbers and punctuation; we also include spaces and formatting codes (tab stop, new line etc.). You can access the character at a specific position, or index, using square brackets:

text[1] # 'e'

text[5] # ' ' – a space

Thus the first character of a string is index number 0. At first this can seem odd to non-programmers, but it is by far the most sensible convention, and is used in most modern computer languages. Bear in mind that we cannot change the characters of a string. For example, we get an error if we try to change the first position to an ‘H’:

text[0] = 'H' # Fails! TypeError: 'str' object does not support item assignment

You can count backwards from the end of the string, where index -1 is the last character of the string:

text[-3] # 'r'

If a string has n characters, then the minimum value of the index is –n and the maximum value is n-1. If the index falls outside this range an error is generated; Python makes an Exception object which reports what the error was (see the next chapter for a description of these).
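The index limits can be checked with a short sketch. The try/except construct used to catch the error is covered later in the book; it appears here only so that the example runs to completion.

```python
text = 'hello world'
n = len(text)
print(n)                        # 11

# The smallest and largest valid indices are -n and n-1.
print(text[-n], text[n - 1])    # h d

# One position further and Python raises an IndexError.
try:
    text[n]
except IndexError as err:
    print('IndexError:', err)   # IndexError: string index out of range
```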

Python also has a very convenient slicing notation, to access a substring from within a string. The notation [start:stop] refers to the characters from position start up to but not including position stop. As with single indices, these positions can be negative. The fact that it is ‘up to but not including’ might seem odd, but as with the indices counting from 0, this turns out to be a sensible convention. In particular, if start and stop numbers in the slice notation are both non-negative then the number of characters in the resulting slice is just the difference between the two values (stop-start), or put another way [start:start+n] gives n characters.

As a further convenience, if you leave out the start entirely giving just [:stop], then the slice starts at the very beginning; the start point is taken to be 0. If you leave out the stop, so have [start:], then the slice continues to the very end; as if stop were taken to be the length of the string. Thus, for example, [:n] refers to the first n characters of the string.

text[1:3] # 'el'
text[1:] # 'ello world'
text[1:-1] # 'ello worl'
text[:-1] # 'hello worl'

This leads to the proper way to (effectively) change the first character of the example string. We can use a slice to access the characters we wish to keep and redefine text:

text = 'H' + text[1:] # 'Hello world'

You can check if a substring is contained in a string:

'wor' in text # True
'war' in text # False

or is not contained in (is absent from) a string:

'wor' not in text # False
'war' not in text # True
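The rule that [start:start+n] gives n characters is handy for sequence data. The sketch below uses a made-up nine-letter DNA string to pull out three-letter codons; the variable names are illustrative.

```python
seq = 'ATGGCCTAA'     # hypothetical DNA sequence

# [start:start+n] always gives n characters (if the string is long enough).
start, n = 3, 3
codon = seq[start:start + n]
print(codon)                 # GCC
print(len(codon) == n)       # True

# Omitting start or stop selects from the beginning or up to the end.
print(seq[:3], seq[-3:])     # ATG TAA
```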

There are two functions that let you determine the position of (the first occurrence of) a substring inside a string:

text.index('wor') # 6
text.find('wor') # 6

Note that the value returned is the index of the first character of the substring in the string. The difference between these functions is how they deal with the situation when the substring is not contained in the string. For the index() function an error is generated, whereas the find() function returns -1:

text.find('war') # -1
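The two behaviours lead to two styles of handling a missing substring, sketched below. The try/except construct is described later in the book, and the variable names position and missing are made up for this example.

```python
text = 'Hello world'

# find() signals absence with -1, so test for that value explicitly.
position = text.find('war')
if position == -1:
    print('not found')       # not found

# index() raises a ValueError instead, which can be caught.
missing = False
try:
    text.index('war')
except ValueError:
    missing = True
print(missing)               # True
```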

It is a matter of taste which version you use. Nonetheless, it might have been better for find() to return None if the substring isn’t present. You can search from the (right-hand) end of the string instead of the beginning:

text.index('l') # 2
text.rindex('l') # 9
text.find('l') # 2
text.rfind('l') # 9

When you read a file, you often end up with whitespace characters (newlines, carriage returns, tabs and spaces) that you want to get rid of, or deal with. There are various functions for this. Here we will consider a string with two leading spaces and two trailing spaces:

line = '  hello world  '

You can strip off the whitespace from both ends:

line.strip() # 'hello world'

Note that since strings are not modifiable, this gives back a new string; it does not modify the original string. You can also strip whitespace from just the beginning (left) or end (right) of the string:

line.lstrip() # 'hello world  '
line.rstrip() # '  hello world'

There is no inbuilt function to remove all whitespace from everywhere in the string, including any in the middle. This is possible using the regular expression module, which we discuss in detail in Appendix 5.
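As a brief preview, one possible way to remove all whitespace with the standard re module is sketched here. The pattern and variable names are illustrative; the second approach combines split() and join(), both of which are described later in this section.

```python
import re

line = '  hello world  '

# Replace every run of whitespace characters with nothing.
squeezed = re.sub(r'\s+', '', line)
print(repr(squeezed))        # 'helloworld'

# An alternative without regular expressions: split, then rejoin.
alternative = ''.join(line.split())
print(alternative == squeezed)   # True
```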

You can split up your string into separate substrings according to the presence of whitespace. This creates a list of strings, where a ‘list’ is simply a container for the strings (here represented by square brackets). Lists are Python objects in their own right and are discussed further in the next section.

line.split() # ['hello', 'world'] – a list of two strings

Note that this automatically strips off the whitespace at the beginning and end before doing any splitting. You can also split on an arbitrary substring, noting that (quite sensibly) this does not strip off the whitespace at the beginning or end:

line.split('wor') # ['  hello ', 'ld  ']
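Splitting is often the first step in reading tabular data. The sketch below assumes a hypothetical tab-separated line, as might come from a file, and shows that the pieces remain strings until they are explicitly converted.

```python
# A hypothetical tab-separated record, ending with a newline.
record = 'GLY\t1\t0.52\n'

# Strip the trailing newline, then split on the tab character.
fields = record.strip().split('\t')
print(fields)                    # ['GLY', '1', '0.52']

# Each field is still text; convert where a number is needed.
value = float(fields[2])
print(type(fields[2]).__name__, type(value).__name__)   # str float
```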

Given that you can split a string into parts, it is quite natural that you can also do the opposite and join a number of strings together into one long string. For example, given a variable that represents a list of strings, which we write inside square brackets and separate with commas:

myList = ['Homer', 'Marge', 'Maude', 'Ned']

you may want to create one long, combined string:

longText = 'Homer, Marge, Maude, Ned'

This is done using the join() function, where you connect the items from the list with some other connecting string (e.g. with commas and spaces). However, although you might expect the joining function to come from the list, it actually belongs to the connecting string. Thus, you do not do:

longText = myList.join(connectorString) # Not valid: lists have no join() function

Instead the correct Python way is:

longText = connectorString.join(myList)

The syntax can take a bit of time to become familiar, because the string that is linking things together might be defined on the same line where the joining occurs. Consider the following:

cities = ['London', 'Paris', 'Berlin']
connector = '->'
connector.join(cities) # 'London->Paris->Berlin'

The last lines could be written as one, without an intermediate variable name:

'->'.join(cities) # 'London->Paris->Berlin'

Thus, the connecting string is the thing that comes before the dot. A further point, which can catch you out, is that all the items that are to be joined together have to be strings; no other type will do. Also, the joining string is only added in between the items of the list, not at the beginning or end.
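The requirement that every item be a string is easy to trip over with numbers. The sketch below uses a made-up list of integers, and converts each item with str() inside a list comprehension, a construct the book introduces later.

```python
counts = [12, 5, 7]     # hypothetical list of integers

# Joining non-string items fails with a TypeError.
failed = False
try:
    ', '.join(counts)
except TypeError:
    failed = True
print(failed)           # True

# Converting each item to a string first makes the join work.
joined = ', '.join([str(c) for c in counts])
print(joined)           # 12, 5, 7
```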

The join() function also allows you to concatenate items together without adding any extra characters, using an empty string. For example, suppose you have a list of one-letter codes for a DNA sequence (or protein or RNA) and want to create a string of all the letters joined together. Then you could do:

sequence = ['G', 'C', 'A', 'T']

seq = ''.join(sequence) # 'GCAT'

You can also do string concatenation using the ‘+’ operator, so an alternative to the above would be:

seq = sequence[0] + sequence[1] + sequence[2] + sequence[3] # seq is 'GCAT'

This is generally not a good approach if the list is long, because it is much less efficient than using the join() method. And in any case you would usually not write out the list elements in full; you would use a loop to go through each item in turn (see the next chapter). On the other hand, for concatenating only a few strings together it is perfectly acceptable to do it this way. As another example, suppose you have some numbers and want to create a string with this information in it. Then you could do the following, converting the numbers to strings using str():

x = 12
y = 5
text = "I have " + str(x) + " apples and " + str(y) + " oranges."
# the text is "I have 12 apples and 5 oranges."

Even here, though, Python offers an alternative, which is to use a formatted string. So we could write the above instead as:

text = "I have %d apples and %d oranges." % (x,y)

Here %d is a formatting code and represents the places in the text to insert the digits. The values for the digits are contained in the round-bracketed ‘tuple’ collection at the end (see below for discussion of tuples), after the bare % sign. Naturally, there should be as many formatting codes in the initial string as there are items to insert. If we were inserting other types of data then we would use different codes, for example, %s to insert a string and %f for a floating point value:

name = 'Barry'
weight = 82.173
text = "The weight of %s is %f kg" % (name, weight)
# Gives "The weight of Barry is 82.173000 kg"

We can optionally specify the number of decimal places to use for the floating point value by adjusting its formatting code. For example, %.1f can be used so that the weight is written out with one digit after the decimal point, rounding as appropriate:

text = "The weight of %s is %.1f kg" % (name, weight)
# Gives "The weight of Barry is 82.2 kg"

If you also wanted at least five total characters for the weight, padding with spaces, you would write %5.1f. It is notable that you can actually use %s for every type of data, because values will be automatically converted into a representative string, but if you want to fine-tune the appearance of floating point numbers then it is best to use the %f construct.
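The width and %s behaviour just described can be seen in a short sketch, reusing the name and weight values from the earlier example. The variable names padded and generic are illustrative, and repr() is used only to make the padding space visible.

```python
name = 'Barry'
weight = 82.173

# %5.1f: one decimal place, padded with spaces to at least five characters.
padded = '%5.1f' % weight
print(repr(padded))     # ' 82.2'

# %s accepts any type, converting the value to its usual string form.
generic = 'The weight of %s is %s kg' % (name, weight)
print(generic)          # The weight of Barry is 82.173 kg
```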

There are analogous options for the %d construct used with integers. So %5d means that at least five places are used to display the integer, and %05d means that you zero-pad

In document Python Programming for Biology - Tim J. Stevens (Page 36-45)