Some other variants of the recommended version of Python are worth mentioning.
2.3.1 Enthought Canopy
Enthought Canopy is an alternative to Anaconda. It is available for Windows, Linux and OS X. Canopy is regularly updated and is currently freely available in its basic version. The full version is also freely available to academic users. Canopy is built using MKL, and so matrix algebra performance is very fast.
2.3.2 IronPython
IronPython is a variant which runs on the Common Language Runtime (CLR, aka Windows .NET). The core modules, NumPy and SciPy, are available for IronPython, and so it is a viable alternative for numerical computing, especially if you are already familiar with C# or if interoperation with .NET components is important. Other libraries, for example matplotlib (plotting), are not available, and so there are some important limitations.
2.3.3 Jython
Jython is a variant which runs on the Java Runtime Environment (JRE). NumPy is not available in Jython, which severely limits Jython’s usefulness for numeric work. While the limitation is important, one advantage of Python over other languages is that it is possible to run (mostly unaltered) Python code on a JVM and to call other Java libraries.
2.3.4 PyPy
PyPy is a new implementation of Python which uses just-in-time compilation to accelerate code, especially loops (which are common in numerical computing). It may be anywhere between 2 and 500 times faster than standard Python. Unfortunately, at the time of writing, the core library, NumPy, is only partially implemented, and so PyPy is not ready for use in numerical work. Current plans are to have a version ready in the near future, and if so, PyPy may quickly become the preferred version of Python for numerical computing.
2.A Relevant Differences between Python 2.7 and 3
Most differences between Python 2.7 and 3 are not important for using Python in econometrics, statistics and numerical analysis. I will make three common assumptions which will allow 2.7 and 3 to be used interchangeably. The configuration instructions in the previous chapter for IPython will produce the expected behavior when run interactively. Note that these differences are important in stand-alone Python programs.
2.A.1 print
print is used to display text in the console when running programs. In Python 2.7, print is a keyword (a statement) which behaves differently from functions. In Python 3, print behaves like most functions. The standard use in Python 2.7 is
print 'String to Print'
while in Python 3 the standard use is
print('String to Print')
which resembles calling a standard function. Python 2.7 contains a version of the Python 3 print, which can be used in any program by including
from __future__ import print_function
at the top of the file. I prefer the Python 3 version of print, and so I assume that all programs will include this statement.
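As a minimal sketch of why the function form is convenient, the example below uses the keyword arguments sep, end and file that the print function accepts. It was written with Python 3 in mind. The name buffer is chosen only for illustration, and capturing output with io.StringIO is an assumption that may need adjusting under Python 2.7.

```python
from __future__ import print_function  # no effect in Python 3; enables print() in 2.7

import io

# Basic use: identical in Python 3 and in 2.7 with the import above
print('String to Print')

# Because print is a function, it accepts keyword arguments
print('a', 'b', 'c', sep=', ')   # items joined with a custom separator
print('no newline', end='')      # suppress the trailing newline
print()                          # emit the newline separately

# Output can be sent to any file-like object (here an in-memory buffer)
buffer = io.StringIO()
print('x', 1, sep='=', file=buffer)
captured = buffer.getvalue()
```

None of these options is available with the Python 2.7 print statement without special syntax.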
2.A.2 division
Python 3 changes the way integers are divided. In Python 2.7, the ratio of two integers is always an integer, and so a fractional result is rounded down (towards negative infinity). For example, in Python 2.7, 9/5 is 1. Python 3 converts the result to a floating point number, and so in Python 3, 9/5 is 1.8. When working with numerical data, automatically converting ratios avoids some rare errors. Python 2.7 can use the Python 3 behavior by including
from __future__ import division
at the top of the program. I assume that all programs will include this statement.
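The short sketch below illustrates the two division operators once this statement is in effect. The variable names are illustrative only. The operator // requests integer (floor) division explicitly and behaves the same way in Python 2.7 and 3.

```python
from __future__ import division  # no effect in Python 3

true_ratio = 9 / 5         # true division: 1.8
floor_ratio = 9 // 5       # floor division: 1

# Floor division rounds towards negative infinity, not towards zero
negative_floor = -9 // 5   # -2
truncated = int(-9 / 5)    # int() truncates towards zero: -1

print(true_ratio, floor_ratio, negative_floor, truncated)
```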
2.A.3 range and xrange
It is often useful to generate a sequence of numbers for use when iterating over some data. In Python 2.7, the best practice is to use the built-in function xrange to do this, while in Python 3 this function has been renamed range. I will always use xrange, and so it is necessary to replace xrange with range if using Python 3.
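One possible way to avoid editing code by hand is a small compatibility shim, sketched below. This is an illustration rather than something the rest of the text relies on: it defines xrange as an alias for range only when xrange does not already exist.

```python
# Make xrange available under both Python 2.7 and Python 3
try:
    xrange            # defined in Python 2.7
except NameError:
    xrange = range    # Python 3: range already generates values lazily

# Sum the integers 0, 1, 2, 3, 4
total = 0
for i in xrange(5):
    total += i

# Start, stop (exclusive) and step can also be supplied
stepped = list(xrange(2, 10, 3))

print(total, stepped)
```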
2.A.4 Unicode strings
Unicode is an industry standard for consistently encoding text. The computer alphabet was originally limited to 128 characters, which is insufficient to contain the vast array of characters in all written languages. Unicode expands the possible space to be up to 2^31 characters (depending on encoding). Python 3 treats all strings as unicode, unlike Python 2.7 where characters are a single byte and unicode strings require the special syntax u'unicode string' or unicode('unicode string'). In practice this is unlikely to impact most numeric code written in Python except possibly when reading or writing data. If working in a language where characters outside of the standard but limited 128 character set are commonly encountered, it may be useful to use
from __future__ import unicode_literals
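The sketch below shows the distinction between text (unicode characters) and its byte encoding, which is where the difference usually surfaces when reading or writing data. The example string and variable names are illustrative, and UTF-8 is chosen here only as a common encoding.

```python
from __future__ import unicode_literals  # no effect in Python 3

# A four-character string; \u00e9 is the escape for "e" with an acute accent
text = 'caf\u00e9'

# Encoding converts text to bytes; the accented character needs 2 bytes in UTF-8
encoded = text.encode('utf-8')

# Decoding converts bytes back to text
decoded = encoded.decode('utf-8')

n_characters = len(text)
n_bytes = len(encoded)
```

The length of the text and the length of its encoded form differ, which is why byte-oriented and unicode strings cannot be treated as interchangeable when non-ASCII characters are present.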
Chapter 3
Built-in Data Types
Before diving into Python for analyzing data or running Monte Carlos, it is necessary to understand some basic concepts about the core Python data types. Unlike domain-specific languages such as MATLAB or R, where the default data type has been chosen for numerical work, Python is a general purpose programming language which is also well suited to data analysis, econometrics and statistics. For example, the basic numeric type in MATLAB is an array (using double precision, which is useful for floating point mathematics), while the basic numeric data type in Python is a scalar which may be either an integer or a double-precision floating point number, depending on the formatting of the number when input.
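As a brief illustration of how the formatting of a number determines its type, the sketch below inspects a few literals. The variable names are arbitrary.

```python
a = 1        # no decimal point: integer
b = 1.0      # decimal point: floating point
c = 1e3      # scientific notation: floating point (1000.0)
d = a + b    # mixing an integer and a float produces a float

print(type(a), type(b), type(c), type(d))
```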
3.1 Variable Names
Variable names can take many forms, although they can only contain numbers, letters (both upper and lower), and underscores (_). They must begin with a letter or an underscore and are CaSe SeNsItIve. Additionally, some words are reserved in Python and so cannot be used for variable names (e.g. import or for). For example,
x = 1.0
X = 1.0
X1 = 1.0
x1 = 1.0
dell = 1.0
dellreturns = 1.0
dellReturns = 1.0
_x = 1.0
x_ = 1.0
are all legal and distinct variable names. Note that names which begin or end with an underscore, while legal, are not normally used since by convention these convey special meaning.[1] Illegal names do not follow these rules.
[1] Variable names with a single leading underscore, for example _some_internal_value, indicate that the variable is for internal use by a module or class. While indicated to be private, this variable will generally be accessible by calling code. Double leading underscores, for example __some_private_value, indicate that a value is intended to be private; inside a class such names are mangled, which makes them harder, though not impossible, to access from outside the class. Variable names with trailing underscores are used to avoid conflicts with reserved Python words, such as class_ or lambda_. Double leading and trailing underscores are reserved for “magic” variables (e.g. __init__), and so should be avoided except when specifically accessing a feature.
# Not allowed
x: = 1.0
1X = 1
X-1 = 1
for = 1
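The naming rules can also be checked programmatically. The sketch below assumes Python 3, where strings have an isidentifier method, and uses the standard keyword module to detect reserved words. The helper is_legal_name and the list candidates are hypothetical names introduced only for this illustration.

```python
import keyword

# A mix of legal and illegal names taken from the examples above
candidates = ['x', 'X1', '_x', 'x_', 'dellReturns', '1X', 'X-1', 'for']


def is_legal_name(name):
    """Hypothetical helper: True if name has valid form and is not reserved."""
    return name.isidentifier() and not keyword.iskeyword(name)


legal = [name for name in candidates if is_legal_name(name)]
illegal = [name for name in candidates if not is_legal_name(name)]

print(legal)
print(illegal)
```

Note that for passes the form check but is rejected because it is a reserved word, while 1X and X-1 fail the form check itself.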
Multiple variables can be assigned on the same line using commas,
x, y, z = 1, 3.1415, 'a'
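The same comma syntax supports a few related idioms, sketched below under the assumption that the number of names on the left matches the number of values on the right. The names first and second are illustrative.

```python
# Assign three variables of different types in one statement
x, y, z = 1, 3.1415, 'a'

# Swap two variables without a temporary
x, y = y, x

# Unpack the elements of an existing sequence
first, second = [10, 20]

# A mismatch between names and values raises ValueError
try:
    p, q = 1, 2, 3
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(x, y, z, first, second, mismatch_detected)
```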