Excursion: Functional Programming
LIST COMPREHENSIONS, FUNCTIONAL PROGRAMMING, ANONYMOUS FUNCTIONS
It can be considered good practice to avoid loops on the Python level as far as possible. list comprehensions and
functional programming tools like map, filter, and reduce provide means to write code without loops that is both
compact and in general more readable. lambda or anonymous functions are also powerful tools in this context.
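As a minimal sketch of the tools just named, the following compares a list comprehension with map, filter, and reduce. The sample list `numbers` is hypothetical, and the code assumes Python 3, where reduce has to be imported from functools (in Python 2 it is a built-in).

```python
from functools import reduce  # built-in in Python 2, in functools in Python 3

numbers = [1, 2, 3, 4, 5, 6]  # hypothetical sample data

# Squares of all elements via a list comprehension (no explicit loop)
squares = [x ** 2 for x in numbers]

# The same result with map and an anonymous (lambda) function
squares_map = list(map(lambda x: x ** 2, numbers))

# Keep only the even numbers with filter
evens = list(filter(lambda x: x % 2 == 0, numbers))

# Fold the list into a single value (here: the sum) with reduce
total = reduce(lambda x, y: x + y, numbers)

print(squares, evens, total)
```

The list comprehension and the map version produce the same list; which one reads better is largely a matter of taste.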
Dicts
dict objects are dictionaries, i.e., mutable mappings (not sequences), that allow data retrieval by keys
that can, for example, be string objects. They are so-called key-value stores. While list
objects are ordered and sortable, dict objects are unordered and unsortable. An example
best illustrates further differences to list objects. Curly brackets are what define dict
objects:
In [66]: d = {
             'Name' : 'Angela Merkel',
             'Country' : 'Germany',
             'Profession' : 'Chancelor',
             'Age' : 60
             }
         type(d)
Out[66]: dict
In [67]: print d['Name'], d['Age']
Out[67]: Angela Merkel 60
In [68]: d.keys()
Out[68]: ['Country', 'Age', 'Profession', 'Name']
In [69]: d.values()
Out[69]: ['Germany', 60, 'Chancelor', 'Angela Merkel']
In [70]: d.items()
Out[70]: [('Country', 'Germany'),
          ('Age', 60),
          ('Profession', 'Chancelor'),
          ('Name', 'Angela Merkel')]
In [71]: birthday = True
         if birthday is True:
             d['Age'] += 1
         print d['Age']
Out[71]: 61
There are several methods to get iterator objects from the dict object. The objects
behave like list objects when iterated over:
In [72]: for item in d.iteritems():
             print item
Out[72]: ('Country', 'Germany')
         ('Age', 61)
         ('Profession', 'Chancelor')
         ('Name', 'Angela Merkel')
In [73]: for value in d.itervalues():
             print type(value)
Out[73]: <type 'str'>
         <type 'int'>
         <type 'str'>
         <type 'str'>
Table 4-3 provides a summary of selected operations and methods of the dict object.
Table 4-3. Selected operations and methods of dict objects
Method       Arguments   Returns/result
d[k]         [k]         Item of d with key k
d[k] = x     [k]         Sets item key k to x
del d[k]     [k]         Deletes item with key k
clear        ()          Removes all items
copy         ()          Makes a copy
has_key      (k)         True if k is a key
items        ()          Copy of all key-value pairs
iterkeys     ()          Iterator over all keys
itervalues   ()          Iterator over all values
keys         ()          Copy of all keys
popitem      ()          Returns and removes an arbitrary key-value pair
update       ([e])       Updates items with items from e
values       ()          Copy of all values
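The operations in the table can be tried out in a few lines. The sketch below assumes Python 3, where has_key is replaced by the in operator and items() returns a view rather than a list copy; the dictionary contents are hypothetical sample data modeled on the example above.

```python
# Hypothetical sample data, modeled on the dict used in the text
d = {'Name': 'Angela Merkel', 'Country': 'Germany', 'Age': 60}

d['Age'] += 1                  # d[k] = x: update the value stored under a key
has_name = 'Name' in d         # Python 3 replacement for d.has_key('Name')

# items() yields (key, value) pairs; sorting the pairs gives a stable order
pairs = sorted(d.items(), key=lambda kv: kv[0])

d2 = d.copy()                  # shallow copy, independent of d at the top level
d2.update({'Profession': 'Chancellor'})  # merge items from another dict
del d2['Country']              # delete the item with key 'Country'
age = d2.pop('Age')            # pop(k) returns and removes the item with key k

print(sorted(d2.keys()), age)
```

Note that the changes to d2 leave d untouched, since copy creates a new dict object.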
Sets
The last data structure we will consider is the set object. Although set theory is a
cornerstone of mathematics and also finance theory, there are not too many practical applications for set objects. The objects are unordered collections of other objects,
containing every element only once:
In [74]: s = set(['u', 'd', 'ud', 'du', 'd', 'du'])
         s
Out[74]: {'d', 'du', 'u', 'ud'}
In [75]: t = set(['d', 'dd', 'uu', 'u'])
With set objects, you can implement operations as you are used to in mathematical set
theory. For example, you can generate unions, intersections, and differences:
In [76]: s.union(t)  # all of s and t
Out[76]: {'d', 'dd', 'du', 'u', 'ud', 'uu'}
In [77]: s.intersection(t)  # both in s and t
Out[77]: {'d', 'u'}
In [78]: s.difference(t)  # in s but not t
Out[78]: {'du', 'ud'}
In [79]: t.difference(s)  # in t but not s
Out[79]: {'dd', 'uu'}
In [80]: s.symmetric_difference(t)  # in either one but not both
Out[80]: {'dd', 'du', 'ud', 'uu'}
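Python also offers operator shorthands for these set methods. The following sketch reuses the two sets from the text and checks that each operator gives the same result as the corresponding method; the variable names for the results are chosen for illustration only.

```python
s = set(['u', 'd', 'ud', 'du', 'd', 'du'])
t = set(['d', 'dd', 'uu', 'u'])

union = s | t          # same as s.union(t)
intersection = s & t   # same as s.intersection(t)
only_s = s - t         # same as s.difference(t)
either = s ^ t         # same as s.symmetric_difference(t)

# Each operator agrees with its method counterpart
print(union == s.union(t), intersection == s.intersection(t))
print(sorted(only_s), sorted(either))
```

The operators require both operands to be sets, whereas the methods also accept other iterables such as lists.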
One application of set objects is to get rid of duplicates in a list object. For example:
In [81]: from random import randint
         l = [randint(0, 10) for i in range(1000)]
           # 1,000 random integers between 0 and 10
         len(l)  # number of elements in l
Out[81]: 1000
In [82]: l[:20]
Out[82]: [8, 3, 4, 9, 1, 7, 5, 5, 6, 7, 4, 4, 7, 1, 8, 5, 0, 7, 1, 9]
In [83]: s = set(l)
         s
Out[83]: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
NumPy Data Structures
The previous section shows that Python provides some quite useful and flexible general
data structures. In particular, list objects can be considered a real workhorse with many
convenient characteristics and application areas. However, scientific and financial applications generally have a need for high-performing operations on special data structures. One of the most important data structures in this regard is the array. Arrays generally structure other (fundamental) objects in rows and columns.
Assume for the moment that we work with numbers only, although the concept
generalizes to other types of data as well. In the simplest case, a one-dimensional array then represents, mathematically speaking, a vector of, in general, real numbers, internally represented by float objects. It then consists of a single row or column of elements only.
In a more common case, an array represents an i × j matrix of elements. This concept generalizes to i × j × k cubes of elements in three dimensions as well as to general n-dimensional arrays of shape i × j × k × l × … .
Mathematical disciplines like linear algebra and vector space theory illustrate that such mathematical structures are of high importance in a number of disciplines and fields. It can therefore prove fruitful to have available a specialized class of data structures
explicitly designed to handle arrays conveniently and efficiently. This is where the Python
library NumPy comes into play, with its ndarray class.
Arrays with Python Lists
Before we turn to NumPy, let us first construct arrays with the built-in data structures
presented in the previous section. list objects are particularly suited to accomplishing
this task. A simple list can already be considered a one-dimensional array:
In [84]: v = [0.5, 0.75, 1.0, 1.5, 2.0] # vector of numbers
Since list objects can contain arbitrary other objects, they can also contain other list
objects. In that way, two- and higher-dimensional arrays are easily constructed by nested
list objects:
In [85]: m = [v, v, v]  # matrix of numbers
         m
Out[85]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]
We can also easily select rows via simple indexing or single elements via double indexing (whole columns, however, are not so easy to select):
In [86]: m[1]
Out[86]: [0.5, 0.75, 1.0, 1.5, 2.0]
In [87]: m[1][0]
Out[87]: 0.5
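To see why column selection is less convenient with nested lists, consider the following sketch. It shows two common workarounds, a list comprehension over the rows and a transpose via zip; neither is taken from the text, and the names `first_column` and `columns` are chosen for illustration.

```python
v = [0.5, 0.75, 1.0, 1.5, 2.0]
m = [v, v, v]  # matrix as a list of rows

# A single column has to be collected element by element from every row
first_column = [row[0] for row in m]

# zip(*m) transposes the nested list, giving access to all columns at once
columns = [list(col) for col in zip(*m)]

print(first_column)
print(columns[1])
```

Both approaches build new list objects, so they cost time and memory proportional to the size of the data.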
Nesting can be pushed further for even more general structures:
In [88]: v1 = [0.5, 1.5]
         v2 = [1, 2]
         m = [v1, v2]
         c = [m, m]  # cube of numbers
         c
Out[88]: [[[0.5, 1.5], [1, 2]], [[0.5, 1.5], [1, 2]]]
In [89]: c[1][1][0]
Out[89]: 1
Note that combining objects in the way just presented generally works with reference pointers to the original objects. What does that mean in practice? Let us have a look at the following operations:
In [90]: v = [0.5, 0.75, 1.0, 1.5, 2.0]
         m = [v, v, v]
         m
Out[90]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]
Now change the value of the first element of the v object and see what happens to the m
object:
In [91]: v[0] = 'Python'
         m
Out[91]: [['Python', 0.75, 1.0, 1.5, 2.0],
          ['Python', 0.75, 1.0, 1.5, 2.0],
          ['Python', 0.75, 1.0, 1.5, 2.0]]
This can be avoided by using the deepcopy function of the copy module:
In [92]: from copy import deepcopy
         v = [0.5, 0.75, 1.0, 1.5, 2.0]
         m = [deepcopy(v) for _ in range(3)]  # one independent copy per row
         m
Out[92]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]
In [93]: v[0] = 'Python'
         m
Out[93]: [[0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0],
          [0.5, 0.75, 1.0, 1.5, 2.0]]
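The difference between shared references and independent copies can be made explicit with the is operator. The following is a small sketch with a shortened vector; the names `shared`, `independent`, and `repeated` are hypothetical and used for illustration only.

```python
from copy import deepcopy

v = [0.5, 0.75, 1.0]

shared = [v, v, v]                             # three references to one list
independent = [deepcopy(v) for _ in range(3)]  # three separate list objects
repeated = 3 * [deepcopy(v)]                   # one copy, referenced three times

v[0] = 'Python'  # mutate the original vector

print(shared[2][0])       # the change is visible through every reference
print(independent[0][0])  # the copies are unaffected

# Rows created by list repetition still alias each other
print(repeated[0] is repeated[1])
print(independent[0] is independent[1])
```

List repetition with the * operator decouples the rows from v but not from each other, so changing one row of `repeated` would change all three.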
Regular NumPy Arrays
Composing array structures with list objects works, up to a point. But it is not
really convenient, and the list class has not been built with this specific goal in mind;
it has a much broader and more general scope. A specialized class for handling array-type structures can therefore be really beneficial.
Such a specialized class is numpy.ndarray, which has been built with the specific goal of
handling n-dimensional arrays both conveniently and efficiently, i.e., with high performance. The basic handling of instances of this class is again best illustrated by examples:
In [94]: import numpy as np
In [95]: a = np.array([0, 0.5, 1.0, 1.5, 2.0])
         type(a)
Out[95]: numpy.ndarray
In [96]: a[:2] # indexing as with list objects in 1 dimension
Out[96]: array([ 0. , 0.5])
A major feature of the numpy.ndarray class is the multitude of built-in methods. For
instance:
In [97]: a.sum()  # sum of all elements
Out[97]: 5.0
In [98]: a.std() # standard deviation
Out[98]: 0.70710678118654757
In [99]: a.cumsum() # running cumulative sum
Out[99]: array([ 0. , 0.5, 1.5, 3. , 5. ])
Another major feature is the (vectorized) mathematical operations defined on ndarray
objects:
In [100]: a * 2
Out[100]: array([ 0., 1., 2., 3., 4.])
In [101]: a ** 2
Out[101]: array([ 0. , 0.25, 1. , 2.25, 4. ])
In [102]: np.sqrt(a)
Out[102]: array([ 0. , 0.70710678, 1. , 1.22474487, 1.41421356])
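To make the meaning of "vectorized" concrete, the following sketch contrasts the same operations on a list object and on an ndarray object. It uses the values from the text; the names `values`, `doubled_list`, and `doubled_array` are chosen for illustration.

```python
import numpy as np

values = [0, 0.5, 1.0, 1.5, 2.0]
a = np.array(values)

# With a list, element-wise arithmetic needs a comprehension (or a loop)
doubled_list = [x * 2 for x in values]

# With an ndarray, the operator is applied to every element at once
doubled_array = a * 2

# The * operator on a list repeats the list instead of scaling its elements
repeated = values * 2

print(doubled_list)
print(doubled_array)
print(len(repeated))
```

The operator * therefore means two different things for the two classes: repetition for list objects and element-wise multiplication for ndarray objects.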
The transition to more than one dimension is seamless, and all features presented so far carry over to the more general cases. In particular, the indexing system is made consistent across all dimensions:
In [103]: b = np.array([a, a * 2])
          b
Out[103]: array([[ 0. , 0.5, 1. , 1.5, 2. ],
                 [ 0. , 1. , 2. , 3. , 4. ]])
In [104]: b[0]  # first row
Out[104]: array([ 0. , 0.5, 1. , 1.5, 2. ])
In [105]: b[0, 2]  # third element of first row
Out[105]: 1.0
In [106]: b.sum()
Out[106]: 15.0
In contrast to our list object-based approach to constructing arrays, the numpy.ndarray
class knows axes explicitly. Selecting either rows or columns from a matrix is essentially the same:
In [107]: b.sum(axis=0)
# sum along axis 0, i.e. column-wise sum
Out[107]: array([ 0. , 1.5, 3. , 4.5, 6. ])
In [108]: b.sum(axis=1)
# sum along axis 1, i.e. row-wise sum
Out[108]: array([ 5., 10.])
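The symmetry between rows and columns also shows up in indexing. The following sketch rebuilds the array b from the text and selects a row and a column with the same slice notation; the result names are chosen for illustration.

```python
import numpy as np

a = np.array([0, 0.5, 1.0, 1.5, 2.0])
b = np.array([a, a * 2])  # shape (2, 5): 2 rows, 5 columns

first_row = b[0, :]      # all columns of row 0
third_column = b[:, 2]   # all rows of column 2

column_sums = b.sum(axis=0)  # collapses the rows, one sum per column
row_sums = b.sum(axis=1)     # collapses the columns, one sum per row

print(b.shape)
print(third_column)
print(column_sums.shape, row_sums.shape)
```

The axis argument names the dimension that is summed over and therefore disappears from the shape of the result.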
There are a number of ways to initialize (instantiate) a numpy.ndarray object. One is as
presented before, via np.array. However, this assumes that all elements of the array are
already available. In contrast, one would maybe like to have the numpy.ndarray objects
instantiated first to populate them later with results generated during the execution of code. To this end, we can use the following functions:
In [109]: c = np.zeros((2, 3, 4), dtype='i', order='C')  # also: np.ones()
          c
Out[109]: array([[[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]],
                 [[0, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]]], dtype=int32)
In [110]: d = np.ones_like(c, dtype='f16', order='C')  # also: np.zeros_like()
          d
Out[110]: array([[[ 1.0, 1.0, 1.0, 1.0],
                  [ 1.0, 1.0, 1.0, 1.0],
                  [ 1.0, 1.0, 1.0, 1.0]],
                 [[ 1.0, 1.0, 1.0, 1.0],
                  [ 1.0, 1.0, 1.0, 1.0],
                  [ 1.0, 1.0, 1.0, 1.0]]], dtype=float128)
With all these functions we provide the following information:
shape
Either an int, a sequence of ints, or a reference to another numpy.ndarray
dtype (optional)
A numpy.dtype; these are NumPy-specific data types for numpy.ndarray objects
order (optional)
The order in which to store elements in memory: C for C-like (i.e., row-wise) or F for Fortran-like (i.e., column-wise)
Here, it becomes obvious how NumPy specializes the construction of arrays with the numpy.ndarray class, in comparison to the list-based approach:
The shape/length/size of the array is homogeneous across any given dimension.
It only allows for a single data type (numpy.dtype) for the whole array.
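The single-dtype restriction has visible consequences, which the following sketch illustrates. The array names `mixed` and `ints` are hypothetical; the behavior shown (upcasting on construction, truncation on assignment into an integer array) is standard NumPy casting.

```python
import numpy as np

# Mixing int and float inputs yields one common dtype for the whole array
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # all elements are stored as floats

# An integer array keeps its dtype; assigned floats lose their fraction
ints = np.zeros(3, dtype='i')
ints[0] = 2.9
print(ints, ints.dtype)

# A list, by contrast, stores each object with its own type
as_list = [1, 2.5, 'three']
print([type(x).__name__ for x in as_list])
```

The silent truncation in the second step is worth keeping in mind when arrays are instantiated first and populated later.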
The role of the order parameter is discussed later in the chapter. Table 4-4 provides an
overview of numpy.dtype objects (i.e., the basic data types NumPy allows).
Table 4-4. NumPy dtype objects
dtype   Description              Example
t       Bit field                t4 (4 bits)
b       Boolean                  b (true or false)
i       Integer                  i8 (64 bit)
u       Unsigned integer         u8 (64 bit)
f       Floating point           f8 (64 bit)
c       Complex floating point   c16 (128 bit)
O       Object                   O (pointer to object)
U       Unicode                  U24 (24 Unicode characters)
V       Other                    V12 (12-byte data block)
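Several of the type strings from the table can be inspected directly with np.dtype. The sketch below is an illustration only: it looks up the kind character and the size in bytes for a selection of codes, and the dictionary name `info` is hypothetical.

```python
import numpy as np

codes = ['i8', 'u8', 'f8', 'c16', 'U24', 'V12']

# Map each type string to its kind character and its size in bytes
info = {}
for code in codes:
    dt = np.dtype(code)
    info[code] = (dt.kind, dt.itemsize)
    print(code, dt.kind, dt.itemsize)

# The kind character of the Boolean dtype is 'b'
bool_kind = np.dtype(bool).kind
print(bool_kind)
```

The number in a type string is a size in bytes for the numeric types (8 bytes give 64 bits), while for U it counts characters, each of which NumPy stores in 4 bytes.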
NumPy provides a generalization of regular arrays that loosens at least the dtype restriction,
but let us stick with regular arrays for a moment and see what the specialization brings in terms of performance.
As a simple exercise, suppose we want to generate a matrix/array of shape 5,000 × 5,000 elements, populated with (pseudo)random, standard normally distributed numbers. We then want to calculate the sum of all elements. First, the pure Python approach, where we
make heavy use of list comprehensions and functional programming methods as well as lambda functions:
In [111]: import random
          I = 5000
In [112]: %time mat = [[random.gauss(0, 1) for j in range(I)] for i in range(I)]
            # a nested list comprehension
Out[112]: CPU times: user 36.5 s, sys: 408 ms, total: 36.9 s
          Wall time: 36.4 s
In [113]: %time reduce(lambda x, y: x + y,
               [reduce(lambda x, y: x + y, row)
                       for row in mat])
Out[113]: CPU times: user 4.3 s, sys: 52 ms, total: 4.35 s
          Wall time: 4.07 s
          678.5908519876674
Let us now turn to NumPy and see how the same problem is solved there. For convenience,
the NumPy sublibrary random offers a multitude of functions to initialize a numpy.ndarray
object and populate it at the same time with (pseudo)random numbers:
In [114]: %time mat = np.random.standard_normal((I, I))
Out[114]: CPU times: user 1.83 s, sys: 40 ms, total: 1.87 s
          Wall time: 1.87 s
In [115]: %time mat.sum()
Out[115]: CPU times: user 36 ms, sys: 0 ns, total: 36 ms
          Wall time: 34.6 ms
          349.49777911439384
We observe the following:
Syntax
Although we use several approaches to compact the pure Python code, the NumPy
version is even more compact and readable.
Performance
The generation of the numpy.ndarray object is roughly 20 times faster and the
calculation of the sum is roughly 100 times faster than the respective operations in pure Python.
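The comparison can be reproduced outside IPython with the standard time module. The following sketch makes several assumptions: Python 3 (reduce imported from functools), a much smaller matrix of 200 × 200 elements to keep the runtime short, and the same data for both sums so that the results can be compared. Absolute timings depend on the machine and are therefore only printed, not stated here.

```python
import random
import time
from functools import reduce

import numpy as np

I = 200  # far smaller than the 5,000 used in the text, to keep this quick

# Pure Python: nested list comprehension plus nested reduce
mat_list = [[random.gauss(0, 1) for j in range(I)] for i in range(I)]
start = time.perf_counter()
sum_list = reduce(lambda x, y: x + y,
                  [reduce(lambda x, y: x + y, row) for row in mat_list])
time_list = time.perf_counter() - start

# NumPy: same data as an ndarray, summed with the built-in method
mat_np = np.array(mat_list)
start = time.perf_counter()
sum_np = mat_np.sum()
time_np = time.perf_counter() - start

print('pure Python: %.6f s, NumPy: %.6f s' % (time_list, time_np))
print(sum_list, sum_np)
```

Both sums agree up to floating-point rounding, since NumPy adds the elements in a different order than the nested reduce calls.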