
Idiomatic Python


Academic year: 2021


(1)

Idiomatic Python

Enrico Franchi


(2)

Could you please lend me the thing that you put in the wall when you want to turn on the hairdryer and the hairdryer comes from a different country?

Could you please lend me a power adapter?

(3)

If you are out to describe the truth,

leave elegance to the tailor.

Albert Einstein

(4)

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

Brian Kernighan

(5)

READABILITY

COUNTS

Zen of Python

(6)

TOC

Iteration
Naming
Functions are objects
Choice
Attributes and methods
Duck Typing
Exceptions [unless TimeoutError is thrown]

(7)

FOR vs. WHILE vs. ...

Iteration vs. recursion: Python limits the recursion depth (sys.setrecursionlimit(n))

for vs. while: traditionally, bounded iteration vs. unbounded iteration

In C, for and while are completely equivalent

Some languages have for/foreach to iterate on collections, e.g. the shell:

    for file in *.py; do
        pygmentize -o "${file%.py}.rtf" "$file"
    done
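CPython does not eliminate tail calls, so a recursive function whose depth grows with its input eventually hits the interpreter's recursion limit, while the equivalent loop does not. A minimal sketch contrasting the two (the function names are hypothetical):

```python
import sys


def count_recursive(n):
    # Every call adds a stack frame, so the stack depth grows with n.
    if n == 0:
        return 0
    return 1 + count_recursive(n - 1)


def count_iterative(n):
    # The same computation as a loop: constant stack depth.
    steps = 0
    while n > 0:
        n -= 1
        steps += 1
    return steps


deep = sys.getrecursionlimit() * 2
print(count_iterative(deep) == deep)  # True
try:
    count_recursive(deep)
except RuntimeError:  # RecursionError (Python 3.5+) subclasses RuntimeError
    print("recursion limit exceeded")
```

Raising the limit with sys.setrecursionlimit only moves the problem; turning the recursion into iteration removes it.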

(8)

Numerical Iteration

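    /* C: while */
    int i = 0;
    while (i < MAX) {
        printf("%d\n", i);
        ++i;
    }

    /* C: for */
    int i;
    for (i = 0; i < MAX; ++i) {
        printf("%d\n", i);
    }

    # Python: while
    i = 0
    while i < MAX:
        print i
        i += 1

    # O(n) space: in Python 2, range builds the whole list first
    for i in range(MAX):
        print i

    # O(1) space: xrange produces the values lazily (Python 3's range does the same)
    for i in xrange(MAX):
        print i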

(9)

Iteration on elements

It is also common to iterate on the elements of some collection

C uses indices to iterate on array elements

Python uses for

    # BAD
    i = 0
    while i < len(lst):
        process(lst[i])
        i += 1

    # GOOD
    for el in lst:
        process(el)

What if we want to iterate both on elements and indices?

(10)

    # BAD
    j = 0
    while j < len(lst):
        process(index=j, element=lst[j])
        j += 1

    # BAD
    for j in range(len(lst)):
        process(index=j, element=lst[j])

    # GOOD
    for j, el in enumerate(lst):
        process(index=j, element=el)
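A small sketch of enumerate at work; the helper names are hypothetical. It also shows enumerate's optional start argument, which avoids writing j + 1 when numbering from one:

```python
def numbered(lines, first=1):
    # enumerate yields (index, element) pairs; its second argument sets the
    # starting index (supported since Python 2.6).
    return ["%d: %s" % (n, line) for n, line in enumerate(lines, first)]


def index_of_longest(words):
    # No manual counter and no words[j] lookups: enumerate supplies both.
    best_index, best_length = None, -1
    for index, word in enumerate(words):
        if len(word) > best_length:
            best_index, best_length = index, len(word)
    return best_index


print(numbered(["spam", "eggs"]))            # ['1: spam', '2: eggs']
print(index_of_longest(["a", "abc", "ab"]))  # 1
```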

(11)

What about Turing?

for is usually considered the more Pythonic alternative

Ideally, every iteration should be done using for

However, we have only shown iteration on finite collections; that is to say, for alone would not provide Turing-completeness

But everybody knows about generators: Python has infinite (lazy) sequences, and they cover many other patterns as well
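A minimal sketch of how generators bring unbounded iteration back under for. The Collatz sequence is used here only as an illustration of a loop whose length is not known in advance:

```python
import itertools


def collatz(n):
    # Unbounded iteration: the number of steps is not known in advance,
    # yet the generator lets clients consume the sequence with a plain for.
    while n != 1:
        yield n
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    yield 1


for value in collatz(6):
    print(value)  # 6, 3, 10, 5, 16, 8, 4, 2, 1 (one per line)

# An infinite lazy sequence: only the values actually requested are computed.
squares = (k * k for k in itertools.count())
print(list(itertools.islice(squares, 5)))  # [0, 1, 4, 9, 16]
```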

(12)

Design Implications

Python's for statement uses external iterators, which are extremely easy to implement through generators

itertools provides lots of functions to manipulate iterators

The iteration logic is pushed inside the iterator; the client code becomes totally agnostic about how values are generated
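A sketch of that separation, with hypothetical function names: the producer is an infinite generator, and the client composes it with itertools.takewhile without knowing (or caring) whether its input is finite:

```python
import itertools


def fibonacci():
    # Producer: an infinite generator; it knows nothing about its clients.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def even_values_below(numbers, limit):
    # Client: works with any iterable of numbers, finite or infinite,
    # and never needs to know how the values are generated.
    small = itertools.takewhile(lambda x: x < limit, numbers)
    return [x for x in small if x % 2 == 0]


print(even_values_below(fibonacci(), 100))  # [0, 2, 8, 34]
print(even_values_below(range(10), 100))    # [0, 2, 4, 6, 8]
```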

(13)

    import socket

    def server_socket(host, port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, port))
        sock.listen(5)
        csock, info = sock.accept()
        return csock.makefile('rw')

    def server(host, port):
        fh = server_socket(host, port)
        for i, line in enumerate(fh):
            if line == "EOF\r\n":
                break
            fh.write("%4d:\t%s" % (i, line))
            fh.flush()  # send each numbered line back immediately
        fh.close()

... (Forking)TCPServer (SocketServer module) and higher-level modules and frameworks are better!

(14)

    import collections

    def depth_first_visit(node):
        stack = [node, ]
        while stack:
            current_node = stack.pop()
            stack.extend(reversed(current_node.children))
            yield current_node.value

    def breadth_first_visit(node):
        queue = collections.deque((node, ))
        while queue:
            current_node = queue.popleft()
            queue.extend(current_node.children)
            yield current_node.value

    for v in depth_first_visit(tree):
        print v,
    print
    for v in breadth_first_visit(tree):
        print v,
    print
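The slides assume a tree whose nodes expose .value and .children. A self-contained sketch with a hypothetical minimal Node class, showing the two visit orders and that an external iterator can be stopped early:

```python
import collections
import itertools


class Node(object):
    # Hypothetical minimal tree node: the slides only rely on .value and .children.
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def depth_first_visit(node):
    stack = [node]
    while stack:
        current_node = stack.pop()
        # Push children in reverse so that the leftmost child is popped first.
        stack.extend(reversed(current_node.children))
        yield current_node.value


def breadth_first_visit(node):
    queue = collections.deque([node])
    while queue:
        current_node = queue.popleft()
        queue.extend(current_node.children)
        yield current_node.value


tree = Node('A', [Node('B', [Node('D')]), Node('C', [Node('E')])])
print(list(depth_first_visit(tree)))    # ['A', 'B', 'D', 'C', 'E']
print(list(breadth_first_visit(tree)))  # ['A', 'B', 'C', 'D', 'E']
# External iteration: the client can stop early; the rest is never visited.
print(list(itertools.islice(depth_first_visit(tree), 2)))  # ['A', 'B']
```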

(15)

PEP-8

http://www.python.it/doc/articoli/pep-8.html

'''One of Guido's key insights is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. As PEP 20 [6] says, "Readability counts".'''

http://www.python.org/dev/peps/pep-0008/

(16)

PEP-8 (II)

Standard for source code style: names, whitespace, indentation

Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is most important.

(17)

Indentation

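4 spaces per indentation level; don't mix tabs and spaces

79 characters per line, max

Wrap long lines using the implied line continuation inside (), [] and {}; add parentheses to wrap lines if needed

Sometimes a backslash is more appropriate

Break lines after binary operators (the PEP 8 advice of the time; current PEP 8 suggests breaking before them in new code)

Two blank lines around top-level functions and classes, one blank line between methods

    (not filename.startswith('.') and
     filename.endswith(('.pyc', '.pyo')))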

(18)

Space Invaders

Put a space after “,” (parameters, lists, tuples, etc.)

Put a space after “:” in dicts, not before

Put spaces around assignments and comparisons

Unless it is a keyword argument or a default value in an argument list: f(x=1), not f(x = 1)

No spaces just inside parentheses or just before argument lists

(19)

Naming conventions (I)

Always use descriptive names; the longer the scope, the longer the name

Trailing underscore: avoids conflicts with keywords or builtins (class_)

Leading underscore: “internal use”/non-public

Double leading underscore: name mangling

Double leading and trailing underscores: “magic”

Avoid l, O, I and similar names that are easily confused with 1 and 0
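A sketch of what the underscore conventions actually do, using a hypothetical Account class: a single leading underscore is only a convention, a double leading underscore is mangled to _ClassName__name, and a trailing underscore sidesteps a keyword:

```python
class Account(object):
    def __init__(self, owner):
        self._cache = {}       # single underscore: non-public by convention only
        self.__owner = owner   # double underscore: stored as _Account__owner

    def owner(self):
        return self.__owner    # inside the class, the short name still works


def make_tag(class_):
    # Trailing underscore: "class" is a keyword, so it cannot be a parameter name.
    return '<div class="%s">' % class_


acct = Account('alice')
print(acct.owner())              # alice
print(acct._Account__owner)      # alice: mangling renames, it does not hide
print(hasattr(acct, '__owner'))  # False
print(make_tag('note'))          # <div class="note">
```

Name mangling is meant to avoid accidental clashes in subclasses, not to enforce privacy.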

(20)

Naming conventions (II)

            simple   lower_case   CamelCase   ALL_CAPS
Classes                           X
Variables   X        X
Methods     X        X
Functions   X        X
Constants                                     X
Packages    X
Modules     X        (x)

... and self/cls as the first argument name for methods

(21)

Default values

Default values are evaluated once, when the function is defined, and are shared among all calls

If the default value is a mutable object, that leads to bugs:

    >>> def f(x=[]):
    ...     x.append(1)
    ...     return x
    ...
    >>> f()
    [1]
    >>> f()
    [1, 1]
    >>> f()
    [1, 1, 1]
    >>> def g(x=None):
    ...     x = [] if x is None else x
    ...     return x
    ...
    >>> g()
    []
    >>> g([1, 2])
    [1, 2]
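A sketch (hypothetical function names) that makes the shared default visible through the function's __defaults__ attribute, next to the None-sentinel fix applied to the same append operation:

```python
def append_one_shared(x=[]):
    # The default list is created once, when "def" runs, and reused by every call.
    x.append(1)
    return x


def append_one(x=None):
    # None is an immutable sentinel; a fresh list is created on each call.
    if x is None:
        x = []
    x.append(1)
    return x


print(append_one_shared())             # [1]
print(append_one_shared())             # [1, 1]
print(append_one_shared.__defaults__)  # ([1, 1],)  the one shared object
print(append_one())                    # [1]
print(append_one())                    # [1]
print(append_one([0]))                 # [0, 1]
```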

(22)

Functions are Objects

In Python everything is an object

Thus, functions are objects

Functions can be passed as arguments (easy)

Functions can be returned as return values

Some APIs explicitly expect functions as arguments (sort(key=...))

    import sys, urllib

    # urllib.urlretrieve (Python 2) calls the hook with
    # (blocks transferred so far, block size, total size)
    def reporthook(*a):
        print a

    for url in sys.argv[1:]:
        i = url.rfind('/')
        filename = url[i+1:]
        print url, "->", filename
        urllib.urlretrieve(url, filename, reporthook)
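A sketch of the other two cases in the list: a function returned as a value (a closure built by the hypothetical make_multiplier), a function received as an ordinary argument, and a built-in API that expects a function:

```python
def make_multiplier(factor):
    # Returns a new function; it remembers "factor" through a closure.
    def multiply(x):
        return x * factor
    return multiply


def apply_twice(function, value):
    # Receives a function as an ordinary argument.
    return function(function(value))


triple = make_multiplier(3)
print(triple(5))                                   # 15
print(apply_twice(triple, 2))                      # 18
print(sorted(['pear', 'fig', 'banana'], key=len))  # ['fig', 'pear', 'banana']
```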

(23)

Internal Iterators

    import collections
    import sys

    def dfs(node, action):
        stack = [node, ]
        while stack:
            current_node = stack.pop()
            stack.extend(reversed(current_node.children))
            action(current_node.value)

    def bfs(node, action):
        queue = collections.deque((node, ))
        while queue:
            current_node = queue.popleft()
            queue.extend(current_node.children)
            action(current_node.value)

    dfs(tree, lambda x: sys.stdout.write("%s, " % x))
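A runnable version of the internal iterators, assuming a hypothetical minimal Node class. Any callable works as the action; here a bound list.append collects the visited values:

```python
import collections


class Node(object):
    # Hypothetical minimal tree node with .value and .children.
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def dfs(node, action):
    stack = [node]
    while stack:
        current_node = stack.pop()
        stack.extend(reversed(current_node.children))
        action(current_node.value)


def bfs(node, action):
    queue = collections.deque([node])
    while queue:
        current_node = queue.popleft()
        queue.extend(current_node.children)
        action(current_node.value)


tree = Node('A', [Node('B', [Node('D')]), Node('C', [Node('E')])])
visited = []
dfs(tree, visited.append)  # a bound method is a perfectly good callback
print(visited)             # ['A', 'B', 'D', 'C', 'E']
```

Unlike the generator versions, the traversal here drives the loop: the caller cannot stop it early without raising an exception from the action.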

(24)

    def dfs(node, pre_action=None, post_action=None):
        def nop(node):
            pass

        pre_action = pre_action or nop  # bad, use if
        post_action = post_action or nop  # bad

        stack = []

        def process_node(n):
            def do_pre():
                pre_action(n.value)

            def do_post():
                post_action(n.value)

            def do_process():
                stack.append(do_post)
                for child in reversed(n.children):
                    stack.append(process_node(child))
                stack.append(do_pre)
            return do_process

        stack.append(process_node(node))
        while stack:
            action = stack.pop()
            action()

    dfs(tree, pre_action=lambda x: sys.stdout.write("%s, " % x))
    print
    dfs(tree, post_action=lambda x: sys.stdout.write("%s, " % x))
    print

(25)

[Figure: step-by-step trace (steps 1 to 15) of the action stack, with Pre, Proc and Post entries, while visiting a tree with nodes A, B, C, D, E]


(27)

    class TreePrinter(object):
        def __init__(self, fh, step=' '):
            self.out = fh
            self.step = step
            self.level = 0

        def pre_print(self, value):
            self.out.write(self.step * self.level)
            self.out.write(str(value))
            self.out.write('\n')
            self.level += 1

        def post_print(self, _):
            self.level -= 1

    tp = TreePrinter(sys.stdout)
    dfs(tree, tp.pre_print, tp.post_print)
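A self-contained sketch (Python 3, because of io.StringIO) of the pre/post visit driving TreePrinter. The Node class and the sample tree are hypothetical, and the defaults are handled with explicit None checks, as the "bad, use if" comments suggest:

```python
import io


class Node(object):
    # Hypothetical minimal tree node with .value and .children.
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def dfs(node, pre_action=None, post_action=None):
    def nop(value):
        pass

    if pre_action is None:
        pre_action = nop
    if post_action is None:
        post_action = nop
    stack = []

    def process_node(n):
        def do_pre():
            pre_action(n.value)

        def do_post():
            post_action(n.value)

        def do_process():
            # Pushed in reverse: pre runs first, then the children, then post.
            stack.append(do_post)
            for child in reversed(n.children):
                stack.append(process_node(child))
            stack.append(do_pre)
        return do_process

    stack.append(process_node(node))
    while stack:
        action = stack.pop()
        action()


class TreePrinter(object):
    def __init__(self, fh, step='  '):
        self.out = fh
        self.step = step
        self.level = 0

    def pre_print(self, value):
        self.out.write(self.step * self.level + str(value) + '\n')
        self.level += 1

    def post_print(self, _):
        self.level -= 1


tree = Node('A', [Node('B', [Node('D')]), Node('C', [Node('E')])])
out = io.StringIO()
tp = TreePrinter(out)
dfs(tree, tp.pre_print, tp.post_print)
print(out.getvalue())
```

The printed outline indents D under B and E under C; after the visit, tp.level is back to 0 because every pre_print is matched by a post_print.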

(28)

The case of the missing switch

Some people think Python should have a switch/case-like statement, something that executes a block of code determined by the value of a variable

Possible solutions:

Python's if/elif/else statement

It seems a job for a dictionary + functions

A cleverly designed class can solve the problem as well

(29)

What if we use the if?

An if statement is easy to read and write if there are few branches, but confusing if there are many

Theoretically correct (provided that the conditions are disjoint)

Maybe slower, as the conditions are evaluated in order

Some suggest that if statements should be banned ;)

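$$
f(x_1, \dots, x_n) =
\begin{cases}
\varphi_1(x_1, \dots, x_n) & \text{if } \rho_1(x_1, \dots, x_n) \\
\quad\vdots & \quad\vdots \\
\varphi_m(x_1, \dots, x_n) & \text{if } \rho_m(x_1, \dots, x_n) \\
\varphi_{m+1}(x_1, \dots, x_n) & \text{otherwise}
\end{cases}
$$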

(30)

Dictionary

If the body of the switch essentially sets some (set of) variable(s), a dictionary is perfect

    def some_function(n, *more_args):
        # ...
        masks = {
            0: '0000', 1: '0001', 2: '0010', 3: '0011',
            4: '0100', 5: '0101', 6: '0110', 7: '0111',
            8: '1000', 9: '1001', 10: '1010', 11: '1011',
            12: '1100', 13: '1101', 14: '1110', 15: '1111',
        }
        # ...
        str_bits = masks[n]

(31)

Dictionary [+ Functions]

If the “actions” in the branches are naturally abstracted as functions, a dictionary is perfect

    import operator
    # ...

    class BinOp(Node):
        # ...
        def compute(self):
            operations = {
                '+': operator.add,
                '-': operator.sub,
                '*': operator.mul,
                '/': operator.div,  # Python 2 only; operator.truediv in Python 3
            }
            return operations[self.op](self.left.compute(),
                                       self.right.compute())
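A runnable sketch of the same dispatch-table idea. The Node base class and the Num leaf are hypothetical stand-ins for the elided parts of the slide; the table is built once as a class attribute, and a failed lookup plays the role of a switch's default branch:

```python
import operator


class Node(object):
    # Hypothetical base class for expression nodes.
    def compute(self):
        raise NotImplementedError


class Num(Node):
    # Hypothetical leaf node holding a number.
    def __init__(self, value):
        self.value = value

    def compute(self):
        return self.value


class BinOp(Node):
    # Built once for the class instead of at every compute() call.
    OPERATIONS = {
        '+': operator.add,
        '-': operator.sub,
        '*': operator.mul,
        '/': operator.truediv,
    }

    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def compute(self):
        try:
            operation = self.OPERATIONS[self.op]
        except KeyError:
            # The "default:" branch of the missing switch.
            raise ValueError('unknown operator: %r' % self.op)
        return operation(self.left.compute(), self.right.compute())


expr = BinOp('*', BinOp('+', Num(1), Num(2)), Num(4))
print(expr.compute())  # 12
```

Supporting a new operator means adding one entry to the table, not another elif branch.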

(32)

    import cmd

    class Example(cmd.Cmd):
        def do_greet(self, rest):
            print 'Hello %s!' % rest

        def do_quit(self, rest):
            return True

    Example().cmdloop()

    # ... versus a hand-written loop using a hypothetical switch statement
    # (not valid Python):
    while 1:
        words = raw_input('(cmd) ').split(' ', 1)
        command = words[0]
        try:
            rest = words[1]
        except IndexError:
            rest = ''
        switch command:
            case 'greet':
                print 'Hello %s!' % rest
            case 'quit':
                break
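cmd.Cmd replaces the switch with a method lookup: the first word of a line selects the do_<command> method. A Python 3 sketch that drives the class with onecmd (one command, no interactive loop) and captures its output in an io.StringIO:

```python
import cmd
import io


class Example(cmd.Cmd):
    # "greet World" is dispatched to do_greet("World").
    def do_greet(self, rest):
        self.stdout.write('Hello %s!\n' % rest)

    def do_quit(self, rest):
        return True  # a true return value stops cmdloop()


out = io.StringIO()
shell = Example(stdout=out)
shell.onecmd('greet World')
stop = shell.onecmd('quit')
print(repr(out.getvalue()))  # 'Hello World!\n'
print(stop)                  # True
```

Writing to self.stdout rather than printing keeps the handlers testable; adding a command only means adding a do_ method.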

(33)

Properties are a neat way to implement attributes whose usage resembles attribute access, but whose implementation uses method calls. These are sometimes known as “managed attributes”.

GvR

(34)

    class Track(object):
        def __init__(self, artist, title, duration):
            self.artist = artist
            self.title = title
            self.duration = duration

        def __str__(self):
            return '%s - %s - %s' % (self.artist, self.title,
                                     self.duration)

(35)

Properties (I)

Track has public attributes: a “Java” bad practice

Clients depend on “implementation details”

What if we need validation in setters and such?

property: the old attribute access syntax, function calls under the hood

    class A(object):
        def __init__(self, foo):
            self._foo = foo

        def get_foo(self):
            print 'got foo'
            return self._foo

        def set_foo(self, val):
            print 'set foo'
            self._foo = val

        foo = property(get_foo, set_foo)

    a = A('hello')
    print a.foo    # => 'got foo'
                   # => 'hello'
    a.foo = 'bar'  # => 'set foo'

(36)

Properties (II)

Sometimes we don't need the setter...

    class A(object):
        def __init__(self, foo):
            self._foo = foo

        def get_foo(self):
            print 'got foo'
            return self._foo

        foo = property(get_foo)

    a = A('ciao')
    print a.foo    # => 'got foo'
                   # => 'ciao'
    a.foo = 'bar'
    # Traceback (most recent call last):
    #   File "prop_example2.py", line 15, in <module>
    #     a.foo = 'bar'
    # AttributeError: can't set attribute

(37)

Properties (III)

Nicer syntax: decorators are handy

    class A(object):
        def __init__(self, foo):
            self._foo = foo

        @property
        def foo(self):
            print 'got foo'
            return self._foo

    a = A('hello')
    print a.foo    # => 'got foo'
                   # => 'hello'
    a.foo = 'bar'
    # Traceback (most recent call last):
    #   File "prop_example2.py", line 15, in <module>
    #     a.foo = 'bar'
    # AttributeError: can't set attribute

(38)

Properties (IV)

From Python 2.6, there is a decorator for the setter:

    class A(object):
        def __init__(self, foo):
            self._foo = foo

        @property
        def foo(self):
            print 'got foo'
            return self._foo

        @foo.setter
        def foo(self, value):
            print 'set foo'
            self._foo = value

    a = A('hello')
    a.foo = 'bar'  # => 'set foo'
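The setter decorator is where the validation mentioned earlier can live. A sketch on Track, assuming (hypothetically) that durations are non-negative numbers of seconds; even __init__ goes through the setter, so invalid objects cannot be built:

```python
class Track(object):
    def __init__(self, artist, title, duration):
        self.artist = artist
        self.title = title
        self.duration = duration  # goes through the property setter below

    @property
    def duration(self):
        return self._duration

    @duration.setter
    def duration(self, seconds):
        # Hypothetical rule: durations are non-negative numbers of seconds.
        if seconds < 0:
            raise ValueError('duration must be non-negative, got %r' % seconds)
        self._duration = seconds


track = Track('Some Artist', 'Some Title', 180)
track.duration = 200   # plain attribute syntax, validated under the hood
print(track.duration)  # 200
try:
    track.duration = -1
except ValueError as exc:
    print(exc)         # duration must be non-negative, got -1
```

Client code written against the original public-attribute Track keeps working unchanged.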

(39)

    class Track(object):
        def __init__(self, artist, title, duration):
            self._artist = artist
            self._title = title
            self._duration = duration

        @property
        def artist(self):
            return self._artist

        @property
        def title(self):
            return self._title

        @property
        def duration(self):
            return self._duration

        def __str__(self):
            return '%s - %s - %s' % (self.artist, self.title,
                                     self.duration)

(40)

How Pythonic?

We can decouple interface from implementation (getters/setters)

We have “read-only” attributes and, therefore, “immutable” objects

Trivial getters/setters are repetitive

Properties are helpful to evolve code, but they are verbose when all we want is an “immutable” object

(41)

Named Tuples

Named tuples solve the problem nicely

Immutable objects (easier to use; too much C++ and FP lately ☺)

Can be used both as objects and as tuples

__str__ and other methods have good default implementations

Subclassing can be used to change the defaults

Very quick to write!

http://code.activestate.com/recipes/500261-named-tuples/

(42)

    import collections

    Track = collections.namedtuple('Track', ['title', 'artist', 'duration'])
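A sketch of the points listed above with the same Track fields (the track data is made up): attribute and tuple access, immutability, _replace for "modified copies", and a subclass that changes the default __str__:

```python
import collections

Track = collections.namedtuple('Track', ['title', 'artist', 'duration'])


class PrettyTrack(Track):
    # Subclassing changes a default; __slots__ = () avoids a per-instance __dict__.
    __slots__ = ()

    def __str__(self):
        return '%s - %s - %s' % (self.artist, self.title, self.duration)


t = PrettyTrack('Some Title', 'Some Artist', 180)
print(t.title, t[2])                # attribute access and tuple indexing
title, artist, duration = t         # tuple unpacking
print(str(t))                       # Some Artist - Some Title - 180
longer = t._replace(duration=200)   # "modifying" returns a new object
print(t.duration, longer.duration)  # 180 200
try:
    t.title = 'Other'
except AttributeError:
    print('immutable')
```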

(43)

About Java/C++ types...

In statically typed languages like C++, we constrain parameters to be of a given type or any of its subtypes

However, a good programming practice is to program to an interface

Java interfaces (true dynamic polymorphism), C++ templates (static polymorphism)

Both solutions have problems

(however, I do love ML static typing...)

(44)

Books, search by title

If the list contains a non-Book, an exception is raised

Does not even work with subclasses

Worst strategy

Never type-check like that

Solving a non-problem

    class Book(object):
        def __init__(self, title, author):
            self.title = title
            self.author = author

    def find_by_title(seq, title):
        for item in seq:
            if type(item) == Book:  # horrible
                if item.title == title:
                    return item
            else:
                raise TypeError

    def find_by_author(seq, author):
        for item in seq:
            if type(item) == Book:  # horrible
                if item.author == author:
                    return item
            else:
                raise TypeError


(46)

Books, search by title

Subclasses are OK

However, the code does not depend on the elements being books:

They have a title

They have an author

What about songs? A bad strategy, after all

    class Book(object):
        def __init__(self, title, author):
            self.title = title
            self.author = author

    def find_by_title(seq, title):
        for item in seq:
            if isinstance(item, Book):  # bad
                if item.title == title:
                    return item
            else:
                raise TypeError

    def find_by_author(seq, author):
        for item in seq:
            if isinstance(item, Book):  # bad
                if item.author == author:
                    return item
            else:
                raise TypeError

(47)

Songs have a title and an author too, yet the isinstance(item, Book) checks reject them with a TypeError:

    class Song(object):
        def __init__(self, title, author):
            self.title = title
            self.author = author

(48)

What about movies?

Movies have a title; however, they have a director and no author

find_by_title should work, find_by_author shouldn't

An interface for Book and Song? And what about Movie?

Design patterns or code duplication

Square wheels: roads designed for square wheels

Duck typing simply avoids the problem

(49)

Books and Songs

The simplest solution is the best

Programmers do not code by chance (hopefully)

AttributeErrors are raised in case of problems

Unit tests discover this kind of error

You have unit tests, don't you?

    class Book(object):
        def __init__(self, t, a):
            self.title = t
            self.author = a

    def find_by_title(seq, title):
        for item in seq:
            if item.title == title:
                return item

    def find_by_author(seq, author):
        for item in seq:
            if item.author == author:
                return item
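A sketch of the duck-typed functions applied to books, songs and a hypothetical Movie class that has a director instead of an author. find_by_title works on all three; find_by_author fails with an AttributeError only if it actually reaches a Movie:

```python
class Book(object):
    def __init__(self, title, author):
        self.title = title
        self.author = author


class Song(object):
    def __init__(self, title, author):
        self.title = title
        self.author = author


class Movie(object):
    # Hypothetical: a movie has a title and a director, but no author.
    def __init__(self, title, director):
        self.title = title
        self.director = director


def find_by_title(seq, title):
    for item in seq:
        if item.title == title:
            return item


def find_by_author(seq, author):
    for item in seq:
        if item.author == author:
            return item


items = [Book('A Book', 'X'), Song('A Song', 'Y'), Movie('A Movie', 'Z')]
print(type(find_by_title(items, 'A Movie')).__name__)  # Movie: only .title is needed
print(type(find_by_author(items, 'Y')).__name__)       # Song
try:
    find_by_author(items, 'nobody')
except AttributeError as exc:
    print('a Movie has no author:', exc)
```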

(50)

    class NotFound(Exception):
        pass

    def find_by(seq, **kwargs):
        for obj in seq:
            for key, val in kwargs.iteritems():
                try:
                    if getattr(obj, key) != val:
                        break
                except AttributeError:
                    break
            else:
                return obj
        raise NotFound

    print find_by(books, title='Python in a Nutshell')
    print find_by(books, author='M. Beri')
    print find_by(books, title='Python in a Nutshell', author='A. Martelli')

    # each failing query needs its own try: the first NotFound would skip the rest
    try:
        print find_by(books, title='Python in a Nutshell', author='M. Beri')
    except NotFound:
        pass
    try:
        print find_by(books, title='Python in a Nutshell', pages=123)
    except NotFound:
        pass
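A self-contained sketch of find_by with made-up books. The key construct is for/else: the else clause runs only when the inner loop finishes without break, that is, when every criterion matched:

```python
class NotFound(Exception):
    pass


class Book(object):
    def __init__(self, title, author, pages):
        self.title = title
        self.author = author
        self.pages = pages


def find_by(seq, **kwargs):
    for obj in seq:
        for key, val in kwargs.items():
            try:
                if getattr(obj, key) != val:
                    break      # this object does not match: try the next one
            except AttributeError:
                break          # missing attribute: no match either
        else:
            return obj         # no break: all criteria matched
    raise NotFound(kwargs)


books = [Book('Title One', 'Author A', 300), Book('Title Two', 'Author B', 150)]
print(find_by(books, author='Author B').title)              # Title Two
print(find_by(books, title='Title One', pages=300).author)  # Author A
try:
    find_by(books, title='Title One', author='Author B')
except NotFound:
    print('no such book')
```

Any object with the requested attributes can be searched, so find_by needs no knowledge of Book at all.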

(51)

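    def find_by(seq, **kwargs):
        for obj in seq:
            for key, val in kwargs.iteritems():
                try:
                    attr = getattr(obj, key)
                except AttributeError:
                    break
                else:
                    try:
                        # exact match, or substring/membership match
                        if val != attr and val not in attr:
                            break
                    except TypeError:  # attr is not a container, e.g. an int
                        break
            else:
                return obj
        raise NotFound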

(52)

Life expectations

Function parameters and every variable bound in a function body constitute the function's local scope

The scope of these variables is the whole function body

However, using them before they are bound is an error

(53)

    # WRONG: there is no need to "declare" a in advance
    a = None
    if s.startswith(t):
        a = s[:4]
    else:
        a = t
    print a

(54)

    # GOOD: both branches bind a
    if s.startswith(t):
        a = s[:4]
    else:
        a = t
    print a
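A sketch of both points, with hypothetical function names: an assignment anywhere in the body makes a name local to the whole function, so reading it earlier raises UnboundLocalError, while binding it in every branch needs no pre-initialisation:

```python
counter = 0


def broken_increment():
    # The assignment below makes "counter" local to the WHOLE function body,
    # so the read on the right-hand side happens before the name is bound.
    counter = counter + 1
    return counter


def prefix_or_default(s, t):
    # Both branches bind "a", so there is no need to pre-initialise it.
    if s.startswith(t):
        a = s[:4]
    else:
        a = t
    return a


print(prefix_or_default('abcdef', 'ab'))  # abcd
print(prefix_or_default('xyz', 'ab'))     # ab
try:
    broken_increment()
except UnboundLocalError as exc:          # a subclass of NameError
    print('UnboundLocalError:', exc)
```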

(55)

LBYL vs. EAFP

LBYL: Look Before You Leap

EAFP: Easier to Ask Forgiveness than Permission

Usually EAFP is the best strategy

Exceptions are rather fast

Atomicity, ...

    # LBYL -- bad
    if id_ in employees:
        emp = employees[id_]
    else:
        report_error(...)

    # EAFP -- good
    try:
        emp = employees[id_]
    except KeyError:
        report_error(...)
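A side-by-side sketch of the two styles on a word count (the function names are hypothetical). Both give the same result; the EAFP version does one lookup in the common case and handles the rarer first occurrence in the except clause:

```python
def count_words_lbyl(words):
    counts = {}
    for word in words:
        if word in counts:     # look first: an extra lookup for every word
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


def count_words_eafp(words):
    counts = {}
    for word in words:
        try:
            counts[word] += 1  # just do it...
        except KeyError:
            counts[word] = 1   # ...and handle the (rarer) failure
    return counts


words = 'the cat and the hat'.split()
print(count_words_eafp(words) == count_words_lbyl(words))  # True
print(count_words_eafp(words)['the'])                      # 2
```

For this particular task, collections.Counter (Python 2.7+) is simpler still; the sketch only contrasts the two styles.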

(56)

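    import os

    # BAD: the file may vanish (or become unreadable) between the check and the open
    if os.access(filename, os.F_OK):
        fh = open(filename)
    else:
        print "Something went bad."

    # VERBOSE: the check is redundant, since open() failures must be handled anyway
    if os.access(filename, os.F_OK):
        try:
            fh = open(filename)
        except IOError:
            print "Something went bad."
    else:
        print "Something went bad."

    # GOOD
    try:
        fh = open(filename)
    except IOError:
        print "Something went bad."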

(57)

More on Exceptions

Exceptions should subclass Exception, directly or indirectly

Catch exceptions using the most specific exception class

Don't use a bare except: unless

you plan to re-raise the exception (but you probably should use finally)

you want to log any error, or something like that

A bare except: also catches KeyboardInterrupt (and SystemExit)
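A sketch of these rules with a hypothetical inventory module: a small hierarchy rooted in Exception, a handler that logs and re-raises with a bare raise, and a caller that catches the most specific class that applies:

```python
import logging


class InventoryError(Exception):
    """Hypothetical base class for this module's errors."""


class OutOfStock(InventoryError):
    pass


def take(stock, item):
    if stock.get(item, 0) <= 0:
        raise OutOfStock(item)
    stock[item] -= 1


def take_logged(stock, item):
    try:
        take(stock, item)
    except InventoryError:
        # Log, then re-raise the very same exception with a bare "raise".
        logging.getLogger(__name__).warning('cannot take %s', item)
        raise


stock = {'apple': 1}
take(stock, 'apple')
try:
    take_logged(stock, 'apple')
except OutOfStock as exc:        # the most specific class that applies
    print('out of stock:', exc)  # out of stock: apple
```

Callers that do not care about the details can catch InventoryError and still get every error of the module, without swallowing KeyboardInterrupt.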

(58)

Limit the try scope

    # BAD
    try:
        # Too broad!
        return handle_value(collection[key])
    except KeyError:
        # Will also catch KeyError raised by handle_value()
        return key_not_found(key)

    # GOOD
    try:
        value = collection[key]
    except KeyError:
        return key_not_found(key)
    else:
        return handle_value(value)
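A runnable sketch of the difference, assuming a hypothetical handle_value that has a bug of its own (it looks up a missing key). The broad version silently turns that bug into "key not found"; the narrow version lets it surface:

```python
def handle_value(value):
    # Hypothetical handler with its own bug: it may look up a missing key.
    return {'x': 1}[value]


def key_not_found(key):
    return 'missing %s' % key


def lookup_broad(collection, key):
    try:
        return handle_value(collection[key])  # too broad!
    except KeyError:
        return key_not_found(key)


def lookup_narrow(collection, key):
    try:
        value = collection[key]
    except KeyError:
        return key_not_found(key)
    else:
        return handle_value(value)


data = {'a': 'x', 'b': 'y'}
print(lookup_broad(data, 'a'))   # 1
print(lookup_broad(data, 'b'))   # missing b   (the bug in handle_value is hidden)
print(lookup_narrow(data, 'c'))  # missing c
try:
    lookup_narrow(data, 'b')
except KeyError as exc:
    print('bug surfaced:', exc)  # the KeyError from handle_value propagates
```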

(59)

References

Python in a Nutshell, 2nd ed., Alex Martelli, O'Reilly

Python Cookbook, Alex Martelli, Anna Martelli Ravenscroft and David Ascher, O'Reilly

Agile Software Development: Principles, Patterns and Practices, Robert C. Martin, Prentice Hall

Clean Code, Robert C. Martin, Prentice Hall

Structure and Interpretation of Computer Programs, H. Abelson, G. Sussman, J. Sussman, http://mitpress.mit.edu/sicp/full-text/book/book.html

(60)

References

http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html

http://dirtsimple.org/2004/12/python-is-not-java.html

http://docs.python.org/dev/howto/doanddont.html

http://www.slideshare.net/sykora/idiomatic-python

http://bayes.colorado.edu/PythonIdioms.html

(61)

References
