
LIST COMPREHENSIONS, FUNCTIONAL PROGRAMMING, ANONYMOUS FUNCTIONS

Excursion: Functional Programming


It can be considered good practice to avoid loops on the Python level as far as possible. list comprehensions and

functional programming tools like map, filter, and reduce provide means to write code without loops that is both

compact and in general more readable. lambda or anonymous functions are also powerful tools in this context.
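As a small illustration of these tools, the following sketch squares the even numbers in a range in three different ways and then sums the result. It assumes Python 3 syntax, where reduce lives in functools and map and filter return iterators; all variable names are illustrative only.

```python
from functools import reduce  # built in under Python 2, in functools under Python 3

numbers = range(10)

# explicit loop, for comparison
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n ** 2)

# list comprehension: the same result in a single expression
squares_comp = [n ** 2 for n in numbers if n % 2 == 0]

# functional style: filter selects, map transforms, lambda defines the functions inline
squares_func = list(map(lambda n: n ** 2, filter(lambda n: n % 2 == 0, numbers)))

# reduce folds a sequence into a single value (here: the sum)
total = reduce(lambda x, y: x + y, squares_comp)

print(squares_comp)  # [0, 4, 16, 36, 64]
print(total)         # 120
```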

Dicts

dict objects are dictionaries, that is, mutable mappings (not sequences) that allow data retrieval by keys, which can, for example, be string objects. They are so-called key-value stores. While list objects are ordered and sortable, dict objects in the Python 2 used here are unordered and unsortable (since Python 3.7, they preserve insertion order). An example best illustrates further differences from list objects. Curly brackets are what define dict objects:

In [66]: d = {
             'Name' : 'Angela Merkel',
             'Country' : 'Germany',
             'Profession' : 'Chancelor',
             'Age' : 60
             }
         type(d)
Out[66]: dict

In [67]: print d['Name'], d['Age']
Out[67]: Angela Merkel 60

In [68]: d.keys()
Out[68]: ['Country', 'Age', 'Profession', 'Name']

In [69]: d.values()
Out[69]: ['Germany', 60, 'Chancelor', 'Angela Merkel']

In [70]: d.items()
Out[70]: [('Country', 'Germany'),
          ('Age', 60),
          ('Profession', 'Chancelor'),
          ('Name', 'Angela Merkel')]

In [71]: birthday = True
         if birthday is True:
             d['Age'] += 1
         print d['Age']
Out[71]: 61

There are several methods to get iterator objects from the dict object. The objects

behave like list objects when iterated over:

In [72]: for item in d.iteritems():
             print item
Out[72]: ('Country', 'Germany')
         ('Age', 61)
         ('Profession', 'Chancelor')
         ('Name', 'Angela Merkel')

In [73]: for value in d.itervalues():
             print type(value)
Out[73]: <type 'str'>
         <type 'int'>
         <type 'str'>
         <type 'str'>

Table 4-3 provides a summary of selected operations and methods of the dict object.

Table 4-3. Selected operations and methods of dict objects

Method      Arguments   Returns/result
d[k]        [k]         Item of d with key k
d[k] = x    [k]         Sets item with key k to x
del d[k]    [k]         Deletes item with key k
clear       ()          Removes all items
copy        ()          Makes a copy
has_key     (k)         True if k is a key
items       ()          Copy of all key-value pairs
iterkeys    ()          Iterator over all keys
itervalues  ()          Iterator over all values
keys        ()          Copy of all keys
pop         (k)         Returns and removes item with key k
update      ([e])       Updates items with items from e
values      ()          Copy of all values
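The operations in Table 4-3 can be sketched as follows. The sketch assumes Python 3, where has_key is replaced by the in operator, the iter* methods no longer exist, and keys, values, and items return view objects rather than list copies. The sample data is made up for illustration.

```python
d = {'Name': 'Angela Merkel', 'Country': 'Germany', 'Age': 60}

d['Age'] += 1                 # d[k] = x: set the item with key k
has_name = 'Name' in d        # Python 3 replacement for d.has_key('Name')

backup = d.copy()             # shallow copy, unaffected by later changes to d
d.update({'Profession': 'Chancellor'})  # merge in the items of another dict
country = d.pop('Country')    # return and remove the item with key 'Country'
del d['Age']                  # delete the item with key 'Age'

pairs = sorted(d.items())     # items() is a view; sorted() returns a list of pairs
print(pairs)    # [('Name', 'Angela Merkel'), ('Profession', 'Chancellor')]
print(country)  # Germany

d.clear()                     # remove all items
print(len(d), len(backup))    # 0 3
```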

Sets

The last data structure we will consider is the set object. Although set theory is a

cornerstone of mathematics and also finance theory, there are not too many practical applications for set objects. The objects are unordered collections of other objects,

containing every element only once:

In [74]: s = set(['u', 'd', 'ud', 'du', 'd', 'du'])
         s
Out[74]: {'d', 'du', 'u', 'ud'}

In [75]: t = set(['d', 'dd', 'uu', 'u'])

With set objects, you can implement operations as you are used to in mathematical set theory. For example, you can generate unions, intersections, and differences:

In [76]: s.union(t)  # all of s and t
Out[76]: {'d', 'dd', 'du', 'u', 'ud', 'uu'}

In [77]: s.intersection(t)  # both in s and t
Out[77]: {'d', 'u'}

In [78]: s.difference(t)  # in s but not t
Out[78]: {'du', 'ud'}

In [79]: t.difference(s)  # in t but not s
Out[79]: {'dd', 'uu'}

In [80]: s.symmetric_difference(t)  # in either one but not both
Out[80]: {'dd', 'du', 'ud', 'uu'}
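The same operations are also available as operators. The short sketch below checks that each operator form agrees with the corresponding method form; sorted is applied only to obtain a stable printing order, since set objects themselves are unordered.

```python
s = {'u', 'd', 'ud', 'du'}
t = {'d', 'dd', 'uu', 'u'}

assert s | t == s.union(t)                 # union
assert s & t == s.intersection(t)          # intersection
assert s - t == s.difference(t)            # difference
assert s ^ t == s.symmetric_difference(t)  # symmetric difference

print(sorted(s & t))  # ['d', 'u']
print(sorted(s ^ t))  # ['dd', 'du', 'ud', 'uu']
```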

One application of set objects is to get rid of duplicates in a list object. For example:

In [81]: from random import randint
         l = [randint(0, 10) for i in range(1000)]
             # 1,000 random integers between 0 and 10
         len(l)  # number of elements in l
Out[81]: 1000

In [82]: l[:20]
Out[82]: [8, 3, 4, 9, 1, 7, 5, 5, 6, 7, 4, 4, 7, 1, 8, 5, 0, 7, 1, 9]

In [83]: s = set(l)
         s
Out[83]: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
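Converting to a set discards the original order of the elements. If the order of first appearance matters, one option is to go through dict keys instead. This sketch assumes Python 3.7 or later, where dict objects preserve insertion order, and uses the 20 sample values printed above as input.

```python
l = [8, 3, 4, 9, 1, 7, 5, 5, 6, 7, 4, 4, 7, 1, 8, 5, 0, 7, 1, 9]

unique_sorted = sorted(set(l))            # duplicates removed, then sorted
unique_in_order = list(dict.fromkeys(l))  # duplicates removed, first-seen order kept

print(unique_sorted)    # [0, 1, 3, 4, 5, 6, 7, 8, 9]
print(unique_in_order)  # [8, 3, 4, 9, 1, 7, 5, 6, 0]
```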

NumPy Data Structures

The previous section shows that Python provides some quite useful and flexible general

data structures. In particular, list objects can be considered a real workhorse with many

convenient characteristics and application areas. However, scientific and financial applications generally have a need for high-performing operations on special data structures. One of the most important data structures in this regard is the array. Arrays generally structure other (fundamental) objects in rows and columns.

Assume for the moment that we work with numbers only, although the concept

generalizes to other types of data as well. In the simplest case, a one-dimensional array then represents, mathematically speaking, a vector of, in general, real numbers, internally represented by float objects. It then consists of a single row or column of elements only.

In a more common case, an array represents an i × j matrix of elements. This concept generalizes to i × j × k cubes of elements in three dimensions as well as to general n-dimensional arrays of shape i × j × k × l × … .

Mathematical disciplines like linear algebra and vector space theory illustrate that such mathematical structures are of high importance in a number of disciplines and fields. It can therefore prove fruitful to have available a specialized class of data structures

explicitly designed to handle arrays conveniently and efficiently. This is where the Python

library NumPy comes into play, with its ndarray class.

Arrays with Python Lists

Before we turn to NumPy, let us first construct arrays with the built-in data structures

presented in the previous section. list objects are particularly suited to accomplishing

this task. A simple list can already be considered a one-dimensional array:

In [84]: v = [0.5, 0.75, 1.0, 1.5, 2.0] # vector of numbers

Since list objects can contain arbitrary other objects, they can also contain other list objects. In that way, two- and higher-dimensional arrays are easily constructed by nested list objects:

In [85]: m = [v, v, v]  # matrix of numbers
         m
Out[85]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]

We can also easily select rows via simple indexing or single elements via double indexing (whole columns, however, are not so easy to select):

In [86]: m[1]
Out[86]: [0.5, 0.75, 1.0, 1.5, 2.0]

In [87]: m[1][0]
Out[87]: 0.5
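To make the remark about columns concrete: with nested list objects, a column has to be assembled element by element. A minimal sketch, using a small matrix with distinct rows (made up for illustration) so that the result is easy to check:

```python
m = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

row = m[1]                             # a row is a single indexing operation
column = [r[0] for r in m]             # a column needs a loop over all rows
columns = [list(c) for c in zip(*m)]   # transpose: all columns at once

print(row)      # [4.0, 5.0, 6.0]
print(column)   # [1.0, 4.0]
print(columns)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```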

Nesting can be pushed further for even more general structures:

In [88]: v1 = [0.5, 1.5]
         v2 = [1, 2]
         m = [v1, v2]
         c = [m, m]  # cube of numbers
         c
Out[88]: [[[0.5, 1.5], [1, 2]], [[0.5, 1.5], [1, 2]]]

In [89]: c[1][1][0]
Out[89]: 1

Note that combining objects in the way just presented generally works with reference pointers to the original objects. What does that mean in practice? Let us have a look at the following operations:

In [90]: v = [0.5, 0.75, 1.0, 1.5, 2.0]
         m = [v, v, v]
         m
Out[90]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]

Now change the value of the first element of the v object and see what happens to the m object:

In [91]: v[0] = 'Python'
         m
Out[91]: [['Python', 0.75, 1.0, 1.5, 2.0],
          ['Python', 0.75, 1.0, 1.5, 2.0],
          ['Python', 0.75, 1.0, 1.5, 2.0]]

This can be avoided by using the deepcopy function of the copy module:

In [92]: from copy import deepcopy
         v = [0.5, 0.75, 1.0, 1.5, 2.0]
         m = [deepcopy(v) for _ in range(3)]
           # three independent copies; 3 * [deepcopy(v)] would repeat a single copy
         m
Out[92]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]

In [93]: v[0] = 'Python'
         m
Out[93]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]
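One caveat is worth spelling out: an expression such as 3 * [deepcopy(v)] repeats the reference to one and the same copy, so the rows are decoupled from v but not from each other. The following sketch contrasts this with a list comprehension that makes one copy per row; the names m_shared and m_independent are illustrative.

```python
from copy import deepcopy

v = [0.5, 0.75, 1.0]

m_shared = 3 * [deepcopy(v)]                     # one copy, referenced three times
m_independent = [deepcopy(v) for _ in range(3)]  # three separate copies

v[0] = 'Python'             # neither matrix is affected by changes to v
m_shared[0][0] = -1.0       # changes every row of m_shared
m_independent[0][0] = -1.0  # changes only the first row

print(m_shared)       # [[-1.0, 0.75, 1.0], [-1.0, 0.75, 1.0], [-1.0, 0.75, 1.0]]
print(m_independent)  # [[-1.0, 0.75, 1.0], [0.5, 0.75, 1.0], [0.5, 0.75, 1.0]]
```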

Regular NumPy Arrays

Composing array structures with list objects works, up to a point. But it is not really convenient, and the list class has not been built with this specific goal in mind; it has a much broader and more general scope. From this point of view, a specialized class could be really beneficial for handling array-type structures.

Such a specialized class is numpy.ndarray, which has been built with the specific goal of handling n-dimensional arrays both conveniently and efficiently, i.e., in a highly performing manner. The basic handling of instances of this class is again best illustrated by examples:

In [94]: import numpy as np

In [95]: a = np.array([0, 0.5, 1.0, 1.5, 2.0])
         type(a)
Out[95]: numpy.ndarray

In [96]: a[:2]  # indexing as with list objects in 1 dimension
Out[96]: array([ 0. ,  0.5])

A major feature of the numpy.ndarray class is the multitude of built-in methods. For

instance:

In [97]: a.sum()  # sum of all elements
Out[97]: 5.0

In [98]: a.std()  # standard deviation
Out[98]: 0.70710678118654757

In [99]: a.cumsum()  # running cumulative sum
Out[99]: array([ 0. ,  0.5,  1.5,  3. ,  5. ])

Another major feature is the (vectorized) mathematical operations defined on ndarray objects:

In [100]: a * 2
Out[100]: array([ 0.,  1.,  2.,  3.,  4.])

In [101]: a ** 2
Out[101]: array([ 0.  ,  0.25,  1.  ,  2.25,  4.  ])

In [102]: np.sqrt(a)
Out[102]: array([ 0.        ,  0.70710678,  1.        ,  1.22474487,  1.41421356])
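For comparison, the same element-wise operations on a plain list require an explicit loop or comprehension, and the * operator has a different meaning there. A short sketch (np.allclose is used to compare floating-point results; the names doubled_list and roots_list are illustrative):

```python
import math

import numpy as np

values = [0, 0.5, 1.0, 1.5, 2.0]
a = np.array(values)

doubled_list = [2 * x for x in values]       # values * 2 would repeat the list instead
roots_list = [math.sqrt(x) for x in values]

print(np.allclose(a * 2, doubled_list))      # True
print(np.allclose(np.sqrt(a), roots_list))   # True
print(len(values * 2), (a * 2).shape)        # 10 (5,)
```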

The transition to more than one dimension is seamless, and all features presented so far carry over to the more general cases. In particular, the indexing system is made consistent across all dimensions:

In [103]: b = np.array([a, a * 2])
          b
Out[103]: array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
                 [ 0. ,  1. ,  2. ,  3. ,  4. ]])

In [104]: b[0]  # first row
Out[104]: array([ 0. ,  0.5,  1. ,  1.5,  2. ])

In [105]: b[0, 2]  # third element of first row
Out[105]: 1.0

In [106]: b.sum()
Out[106]: 15.0

In contrast to our list object-based approach to constructing arrays, the numpy.ndarray

class knows axes explicitly. Selecting either rows or columns from a matrix is essentially the same:

In [107]: b.sum(axis=0)
            # sum along axis 0, i.e. column-wise sum
Out[107]: array([ 0. ,  1.5,  3. ,  4.5,  6. ])

In [108]: b.sum(axis=1)
            # sum along axis 1, i.e. row-wise sum
Out[108]: array([  5.,  10.])
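Because the axes are explicit, a column can be selected with the same slicing syntax as a row. A minimal sketch that rebuilds the array b from above; tolist is used only to print plain Python lists:

```python
import numpy as np

a = np.array([0, 0.5, 1.0, 1.5, 2.0])
b = np.array([a, a * 2])

first_row = b[0, :]     # all columns of row 0
third_column = b[:, 2]  # all rows of column 2

print(first_row.tolist())     # [0.0, 0.5, 1.0, 1.5, 2.0]
print(third_column.tolist())  # [1.0, 2.0]
print(b.sum(axis=0).shape, b.sum(axis=1).shape)  # (5,) (2,)
```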

There are a number of ways to initialize (instantiate) a numpy.ndarray object. One is as

presented before, via np.array. However, this assumes that all elements of the array are

already available. In contrast, one would maybe like to have the numpy.ndarray objects

instantiated first to populate them later with results generated during the execution of code. To this end, we can use the following functions:

In [109]: c = np.zeros((2, 3, 4), dtype='i', order='C')  # also: np.ones()
          c
Out[109]: array([[[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]],

                 [[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]]], dtype=int32)

In [110]: d = np.ones_like(c, dtype='f16', order='C')  # also: np.zeros_like()
          d
Out[110]: array([[[ 1.0,  1.0,  1.0,  1.0],
                  [ 1.0,  1.0,  1.0,  1.0],
                  [ 1.0,  1.0,  1.0,  1.0]],

                 [[ 1.0,  1.0,  1.0,  1.0],
                  [ 1.0,  1.0,  1.0,  1.0],
                  [ 1.0,  1.0,  1.0,  1.0]]], dtype=float128)

With all these functions we provide the following information:

shape
    Either an int, a sequence of ints, or a reference to another numpy.ndarray

dtype (optional)
    A numpy.dtype; these are NumPy-specific data types for numpy.ndarray objects

order (optional)
    The order in which to store elements in memory: C for C-like (i.e., row-wise) or F for Fortran-like (i.e., column-wise)

Here, it becomes obvious how NumPy specializes the construction of arrays with the numpy.ndarray class, in comparison to the list-based approach:

• The shape/length/size of the array is homogeneous across any given dimension.

• It only allows for a single data type (numpy.dtype) for the whole array.
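A brief sketch of both points: an array is allocated first and populated later, and values that do not match the array's single data type are converted to it. The array names and fill values are made up for illustration; the dtype strings 'f8' and 'i8' denote 8-byte floats and integers.

```python
import numpy as np

results = np.zeros((2, 3), dtype='f8')  # allocate first ...
for i in range(results.shape[0]):
    for j in range(results.shape[1]):
        results[i, j] = 10 * i + j      # ... populate later

ints = np.zeros(3, dtype='i8')
ints[0] = 2.7                           # the float is cast to the integer dtype

mixed = np.array([1, 2.5])              # mixed input is stored with one common dtype

print(results.tolist())           # [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
print(ints.dtype, ints.tolist())  # int64 [2, 0, 0]
print(mixed.dtype)                # float64
```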

The role of the order parameter is discussed later in the chapter. Table 4-4 provides an

overview of numpy.dtype objects (i.e., the basic data types NumPy allows).

Table 4-4. NumPy dtype objects

dtype   Description              Example
t       Bit field                t4 (4 bits)
b       Boolean                  b (true or false)
i       Integer                  i8 (64 bit)
u       Unsigned integer         u8 (64 bit)
f       Floating point           f8 (64 bit)
c       Complex floating point   c16 (128 bit)
O       Object                   O (pointer to object)
U       Unicode                  U24 (24 Unicode characters)
V       Other                    V12 (12-byte data block)
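Several of the type strings in Table 4-4 can be checked directly with np.dtype. The sketch below is an assumption-laden illustration against a recent NumPy version, not part of the original listings. Note in particular that, while 'b' is the kind code of Boolean arrays, the bare string 'b' passed to np.dtype yields a signed byte; '?' (or the Python type bool) yields the Boolean type.

```python
import numpy as np

print(np.dtype('i8') == np.int64)        # True: 8-byte signed integer
print(np.dtype('u8') == np.uint64)       # True: 8-byte unsigned integer
print(np.dtype('f8') == np.float64)      # True: 8-byte floating point
print(np.dtype('c16') == np.complex128)  # True: 16-byte complex
print(np.dtype('U24').itemsize)          # 96: 24 characters at 4 bytes each
print(np.dtype('V12').itemsize)          # 12: raw 12-byte block
print(np.dtype('O').kind)                # O

# Booleans: the kind code is 'b', but the string 'b' alone means signed byte
print(np.dtype(bool).kind)               # b
print(np.dtype('?') == np.dtype(bool))   # True
print(np.dtype('b') == np.int8)          # True
```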

NumPy provides a generalization of regular arrays that loosens at least the dtype restriction,

but let us stick with regular arrays for a moment and see what the specialization brings in terms of performance.

As a simple exercise, suppose we want to generate a matrix/array of shape 5,000 × 5,000 elements, populated with (pseudo)random, standard normally distributed numbers. We then want to calculate the sum of all elements. First, the pure Python approach, where we

make heavy use of list comprehensions and functional programming methods as well as lambda functions:

In [111]: import random
          I = 5000

In [112]: %time mat = [[random.gauss(0, 1) for j in range(I)] for i in range(I)]
            # a nested list comprehension
Out[112]: CPU times: user 36.5 s, sys: 408 ms, total: 36.9 s
          Wall time: 36.4 s

In [113]: %time reduce(lambda x, y: x + y, \
                 [reduce(lambda x, y: x + y, row) \
                         for row in mat])
Out[113]: CPU times: user 4.3 s, sys: 52 ms, total: 4.35 s
          Wall time: 4.07 s

          678.5908519876674

Let us now turn to NumPy and see how the same problem is solved there. For convenience, the NumPy sublibrary random offers a multitude of functions to initialize a numpy.ndarray object and populate it at the same time with (pseudo)random numbers:

In [114]: %time mat = np.random.standard_normal((I, I))
Out[114]: CPU times: user 1.83 s, sys: 40 ms, total: 1.87 s
          Wall time: 1.87 s

In [115]: %time mat.sum()
Out[115]: CPU times: user 36 ms, sys: 0 ns, total: 36 ms
          Wall time: 34.6 ms

          349.49777911439384

We observe the following:

Syntax
    Although we use several approaches to compact the pure Python code, the NumPy version is even more compact and readable.

Performance
    The generation of the numpy.ndarray object is roughly 20 times faster and the calculation of the sum is roughly 100 times faster than the respective operations in pure Python.
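The comparison can be reproduced outside IPython with a sketch like the following. It assumes Python 3 (reduce imported from functools), replaces the %time magic with time.perf_counter, and uses a much smaller matrix so that it runs quickly. Unlike the listings above, both sums are computed on the same numbers, so they can be compared; the measured times depend on the machine and are therefore not shown.

```python
import random
import time
from functools import reduce  # needed under Python 3

import numpy as np

I = 500  # much smaller than in the text, to keep the run short

mat = [[random.gauss(0, 1) for j in range(I)] for i in range(I)]
arr = np.array(mat)  # the same numbers as an ndarray

start = time.perf_counter()
sum_python = reduce(lambda x, y: x + y,
                    [reduce(lambda x, y: x + y, row) for row in mat])
time_python = time.perf_counter() - start

start = time.perf_counter()
sum_numpy = arr.sum()
time_numpy = time.perf_counter() - start

# both approaches agree up to floating-point rounding
print(abs(sum_python - sum_numpy) < 1e-6)
print('pure Python: %.4f s, NumPy: %.4f s' % (time_python, time_numpy))
```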