Performances and optimizations
10.1 Data structures
Most computer problems can be solved in an elegant and simple manner, provided that you use the right data structures – and Python provides many data structures to choose from.
Often, there is a temptation to code your own custom data structures – this is invariably a vain, useless, doomed idea. Python almost always has better data structures and code to offer – learn to use them.
For example, everybody uses dict, but how many times have you seen code like this:
def get_fruits(basket, fruit):
    # A variation is to use "if fruit in basket:"
    try:
        return basket[fruit]
    except KeyError:
        return set()
It’s much easier to use the get method already provided by the dict structure:
def get_fruits(basket, fruit):
    return basket.get(fruit, set())
It’s not uncommon for people to use basic Python data structures without being aware of all the methods they provide. This is also true for sets – for example:
def has_invalid_fields(fields):
    for field in fields:
        if field not in ['foo', 'bar']:
            return True
    return False
This can be written without a loop:
def has_invalid_fields(fields):
    return bool(set(fields) - set(['foo', 'bar']))
The set data structure has methods that can solve many problems that would otherwise need to be addressed by writing nested for/if blocks.
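As a rough illustration of what those set methods look like in practice, the sketch below expresses the same field check with issuperset and adds two related helpers built on set difference. The VALID_FIELDS constant and the helper names invalid_fields and missing_fields are made up for this example.

```python
# Hypothetical set of accepted field names, used only for illustration.
VALID_FIELDS = frozenset(['foo', 'bar'])


def has_invalid_fields(fields):
    # True when at least one field is not part of VALID_FIELDS.
    return not VALID_FIELDS.issuperset(fields)


def invalid_fields(fields):
    # Set difference: fields that were given but are not accepted.
    return set(fields) - VALID_FIELDS


def missing_fields(fields):
    # Difference the other way around: accepted fields that are absent.
    return VALID_FIELDS - set(fields)
```

Returning the offending names, rather than just a boolean, tends to be handy when building an error message.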
There are also more advanced data structures that can greatly reduce the burden of code maintenance. For example, take a look at the following code:
def add_animal_in_family(species, animal, family):
    if family not in species:
        species[family] = set()
    species[family].add(animal)

species = {}
add_animal_in_family(species, 'cat', 'felidea')
Sure, this code is perfectly valid, but how many times will your program require a variation of the above? Tens? Hundreds?
Python provides the collections.defaultdict structure, which solves the problem in an elegant way.
import collections

def add_animal_in_family(species, animal, family):
    species[family].add(animal)

species = collections.defaultdict(set)
add_animal_in_family(species, 'cat', 'felidea')
Each time that you try to access a non-existent item from your dict, the defaultdict will use the function that was passed as an argument to its constructor to build a new value, instead of raising a KeyError. In this case, the set function is used to build a new set each time we need it.
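The factory does not have to be set; any callable that takes no arguments works. The following sketch, with made-up sample words, uses list to group items. It also shows a side effect worth knowing about: merely reading a missing key creates it.

```python
import collections

# Group words by their first letter; list() builds each missing value.
words_by_letter = collections.defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    words_by_letter[word[0]].append(word)

# Reading a missing key is enough to insert it with a default value.
size_before = len(words_by_letter)
words_by_letter['z']
size_after = len(words_by_letter)
```

If that implicit insertion is unwanted, testing with the in operator or calling get does not trigger the factory.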
By the way, the collections module offers a few useful data structures that can solve other kinds of problems, such as OrderedDict or Counter.
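To give an idea of what these two look like, here is a minimal sketch written for Python 3, using an invented basket list. Counter tallies hashable items, and OrderedDict remembers insertion order and allows reordering.

```python
import collections

# Hypothetical sample data.
basket = ['apple', 'pear', 'apple', 'plum', 'apple', 'pear']

# Counter counts occurrences and returns 0 for unknown keys.
counts = collections.Counter(basket)
top_two = counts.most_common(2)

# OrderedDict keeps insertion order and can move keys around.
ordered = collections.OrderedDict()
ordered['first'] = 1
ordered['second'] = 2
ordered.move_to_end('first')  # available since Python 3.2
```

Note that since Python 3.7 plain dicts also preserve insertion order, so OrderedDict is mostly useful for its reordering methods and order-sensitive equality.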
It’s really important to look for the right data structure in Python, as the correct choice will save you time and lessen code maintenance.
10.2 Profiling
Python provides a few tools to profile your program. The standard one is cProfile, and it is easy enough to use.
Example. Using the cProfile module
$ python -m cProfile myscript.py
         343 function calls (342 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(_getframe)
        1    0.000    0.000    0.000    0.000 :0(len)
      104    0.000    0.000    0.000    0.000 :0(setattr)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.000    0.000    0.000    0.000 :0(startswith)
      2/1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 StringIO.py:30(<module>)
        1    0.000    0.000    0.000    0.000 StringIO.py:42(StringIO)
The results list indicates the number of times each function was called, and the time spent on its execution. You can use the -s option to sort by other fields; e.g. -s time will sort by internal time.
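The profiler can also be driven from inside a program, which helps when only one part of the code is of interest. The sketch below is written for Python 3 and uses a made-up busy_work function as the workload. It collects statistics with cProfile.Profile and prints them sorted by internal time through pstats.

```python
import cProfile
import io
import pstats


def busy_work(n):
    # Hypothetical workload, here only to give the profiler something to measure.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = busy_work(10000)
profiler.disable()

# Sort by internal time (the same ordering as "-s time") and keep 5 rows.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('time').print_stats(5)
report = stream.getvalue()
print(report)
```

Writing the report to a StringIO object rather than standard output makes it easy to log or post-process.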
If you’ve coded in C, as I did years ago, you probably already know the fantastic Valgrind tool, which – among other things – is able to provide profiling data for C programs. The data that it provides can then be visualized by another great tool named KCacheGrind.
You’ll be happy to know that the profiling information generated by cProfile can easily be converted to a call tree that can be read by KCacheGrind. The cProfile module has a -o option that allows you to save the profiling data, and pyprof2calltree can convert from one format to the other.
Example. Using KCacheGrind to visualize Python profiling data
$ python -m cProfile -o myscript.cprof myscript.py
$ pyprof2calltree -k -i myscript.cprof
Figure: KCacheGrind example
This provides a lot of information that will allow you to determine what part of your program might be consuming too many resources.
While this clearly works well for a macroscopic view of your program, it sometimes helps to have a microscopic view of some part of the code. In such a context, I find it better to rely on the dis module to find out what’s going on behind the scenes.
The dis module is a disassembler of Python bytecode. It’s simple enough to use:
>>> def x():
...     return 42
...
>>> import dis
>>> dis.dis(x)
  2           0 LOAD_CONST               1 (42)
              3 RETURN_VALUE
The dis.dis function disassembles the function that you passed as a parameter, and prints the list of bytecode instructions that are run by the function. It can be useful to understand what’s really behind each line of code that you write, in order to be able to properly optimize your code.
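When the instructions need to be inspected from code rather than read on screen, Python 3.4 and later also provide dis.get_instructions. A minimal sketch follows. Keep in mind that opcode names and offsets change between interpreter versions, so the result on a recent Python will not match the listing above exactly.

```python
import dis


def x():
    return 42


# get_instructions() yields one Instruction object per bytecode operation.
opnames = [instruction.opname for instruction in dis.get_instructions(x)]
print(opnames)
```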
The following code defines two functions, each of which does the same thing – concatenates three letters:
abc = ('a', 'b', 'c')

def concat_a_1():
    for letter in abc:
        abc[0] + letter

def concat_a_2():
    a = abc[0]
    for letter in abc:
        a + letter
Both appear to do exactly the same thing, but if we disassemble them, we’ll see that the generated bytecode is a bit different:
>>> dis.dis(concat_a_1)
  2           0 SETUP_LOOP              26 (to 29)
              3 LOAD_GLOBAL              0 (abc)
              6 GET_ITER
        >>    7 FOR_ITER                18 (to 28)
             10 STORE_FAST               0 (letter)

  3          13 LOAD_GLOBAL              0 (abc)
             16 LOAD_CONST               1 (0)
             19 BINARY_SUBSCR
             20 LOAD_FAST                0 (letter)
             23 BINARY_ADD
             24 POP_TOP
             25 JUMP_ABSOLUTE            7
        >>   28 POP_BLOCK
        >>   29 LOAD_CONST               0 (None)
             32 RETURN_VALUE
>>> dis.dis(concat_a_2)
  2           0 LOAD_GLOBAL              0 (abc)
              3 LOAD_CONST               1 (0)
              6 BINARY_SUBSCR
              7 STORE_FAST               0 (a)

  3          10 SETUP_LOOP              22 (to 35)
             13 LOAD_GLOBAL              0 (abc)
             16 GET_ITER
        >>   17 FOR_ITER                14 (to 34)
             20 STORE_FAST               1 (letter)

  4          23 LOAD_FAST                0 (a)
             26 LOAD_FAST                1 (letter)
             29 BINARY_ADD
             30 POP_TOP
             31 JUMP_ABSOLUTE           17
        >>   34 POP_BLOCK
        >>   35 LOAD_CONST               0 (None)
             38 RETURN_VALUE
As you can see, in the second version we store abc[0] in a temporary variable before running the loop. This makes the bytecode executed inside the loop a little smaller, as we avoid having to do the abc[0] lookup for each iteration. Measured using timeit, the second version is faster than the first one; it takes a whole microsecond less to execute! Obviously this microsecond is not worth the optimization unless you call this function millions of times – but this is the kind of insight that the dis module can provide.
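A measurement of this kind can be reproduced along the following lines. This is only a sketch: the repeat and number values are arbitrary choices, and the timings depend heavily on the machine and the Python version, so no particular result is implied.

```python
import timeit

abc = ('a', 'b', 'c')


def concat_a_1():
    for letter in abc:
        abc[0] + letter


def concat_a_2():
    a = abc[0]
    for letter in abc:
        a + letter


# Run each function 100000 times, 5 times over, and keep the best run
# to reduce the influence of background noise.
time_1 = min(timeit.repeat(concat_a_1, number=100000, repeat=5))
time_2 = min(timeit.repeat(concat_a_2, number=100000, repeat=5))
print('concat_a_1: %.4fs, concat_a_2: %.4fs' % (time_1, time_2))
```

Dividing each total by the number of calls gives the per-call time that can be compared with the figures quoted in the text.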
Whether you should need to rely on such "tricks" as storing the value outside the loop is debatable – ultimately, it should be the compiler’s work to optimize this kind of thing. On the other hand, as the language is heavily dynamic, it’s difficult for the compiler to be sure that optimization wouldn’t result in negative side effects. So be careful when writing your code!
Another bad habit I’ve often encountered when reviewing code is defining functions inside functions for no reason. This has a cost, as the inner function is going to be redefined every time the outer function is called.
Example. A function defined in a function, disassembled
>>> import dis
>>> def x():
...     return 42
...
>>> dis.dis(x)
  2           0 LOAD_CONST               1 (42)
              3 RETURN_VALUE
>>> def x():
...     def y():
...         return 42
...     return y()
...
>>> dis.dis(x)
  2           0 LOAD_CONST               1 (<code object y at 0x100ce7e30, file "<stdin>", line 2>)
              3 MAKE_FUNCTION            0
              6 STORE_FAST               0 (y)

  4           9 LOAD_FAST                0 (y)
             12 CALL_FUNCTION            0
             15 RETURN_VALUE
We can see here that it is needlessly complicated, calling MAKE_FUNCTION, STORE_FAST, LOAD_FAST and CALL_FUNCTION instead of just LOAD_CONST. That requires many more opcodes for no good reason – and function calling in Python is already inefficient.
The only case in which it is required to define a function within a function is when building a function closure, and this is a perfectly identified use case in Python’s opcodes.
Example. Disassembling a closure
>>> def x():
...     a = 42
...     def y():
...         return a
...     return y()
...
>>> dis.dis(x)
  2           0 LOAD_CONST               1 (42)
              3 STORE_DEREF              0 (a)

  3           6 LOAD_CLOSURE             0 (a)
              9 BUILD_TUPLE              1
             12 LOAD_CONST               2 (<code object y at 0x100d139b0, file "<stdin>", line 3>)
             15 MAKE_CLOSURE             0
             18 STORE_FAST               0 (y)

  5          21 LOAD_FAST                0 (y)
             24 CALL_FUNCTION            0
             27 RETURN_VALUE
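For contrast, here is a sketch of a case where the nested definition is justified, because the inner function needs to capture state from the enclosing call. The make_counter name is invented for this illustration, and the nonlocal statement makes the example Python 3 only.

```python
def make_counter(start=0):
    # "count" lives in the enclosing scope and is captured by increment().
    count = start

    def increment(step=1):
        nonlocal count  # rebind the captured variable (Python 3 only)
        count += step
        return count

    return increment


# Each call to make_counter() builds an independent closure.
counter_a = make_counter()
counter_b = make_counter(10)
```

Here the cost of building the inner function is paid once per counter rather than once per increment, which is usually an acceptable trade.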