
Sorting lists

In document Python 3 Orientado a Objeto (Page 184-200)

Without any parameters, sort will generally do the expected thing. If it's a list of strings, it will place them in alphabetical order. This operation is case sensitive, so all capital letters will be sorted before lower case letters; that is, Z comes before a. If it is a list of numbers, they will be sorted in numerical order. If a list of tuples is provided, the list is sorted by the first element in each tuple, with later elements used only to break ties. If a mixture of unsortable items is supplied, the sort will raise a TypeError exception.
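The default behaviors described above can be checked with a short sketch. The list names used here (words, pairs, mixed) are illustrative only:

```python
# Strings sort by character code, so ASCII capitals come before lower case.
words = ["banana", "Zebra", "apple"]
words.sort()
print(words)  # ['Zebra', 'apple', 'banana']

# Tuples compare element by element; later elements only break ties.
pairs = [(2, "a"), (1, "z"), (1, "b")]
pairs.sort()
print(pairs)  # [(1, 'b'), (1, 'z'), (2, 'a')]

# Items that cannot be ordered against each other raise TypeError.
mixed = [3, "three", 1]
try:
    mixed.sort()
    sort_failed = False
except TypeError:
    sort_failed = True
print(sort_failed)  # True
```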

If we want to place objects we define ourselves into a list and make those objects sortable, we have to do a bit more work. The special method __lt__, which stands for "less than", should be defined on the class to make instances of that class comparable. The sort method on list will access this method on each object to determine where it goes in the list. This method should return True if our instance is somehow less than the passed parameter, and False otherwise. Here's a rather silly class that can be sorted based on either a string or a number:

class WeirdSortee:
    def __init__(self, string, number, sort_num):
        self.string = string
        self.number = number
        self.sort_num = sort_num

    def __lt__(self, other):
        # Compare by number when sort_num is set, otherwise by string.
        if self.sort_num:
            return self.number < other.number
        return self.string < other.string

    def __repr__(self):
        return "{}:{}".format(self.string, self.number)

The __repr__ method makes it easy to see the two values when we print a list. This __lt__ implementation compares the object to another instance of the same class (or any duck typed object that has string, number, and sort_num attributes; it will fail if those attributes are missing). The following output illustrates this class in action, when it comes to sorting:

>>> a = WeirdSortee('a', 4, True)
>>> b = WeirdSortee('b', 3, True)
>>> c = WeirdSortee('c', 2, True)
>>> d = WeirdSortee('d', 1, True)
>>> l = [a,b,c,d]
>>> l
[a:4, b:3, c:2, d:1]
>>> l.sort()
>>> l
[d:1, c:2, b:3, a:4]
>>> for i in l:
...     i.sort_num = False
...
>>> l.sort()
>>> l
[a:4, b:3, c:2, d:1]

The first time we call sort, it sorts by numbers, because sort_num is True on all the objects being compared. The second time, it sorts by letters. The __lt__ method is the only one we need to implement to enable sorting. Technically, however, if it is implemented, the class should normally also implement the similar __gt__, __eq__, __ne__, __ge__, and __le__ methods, so that all of the <, >, ==, !=, >=, and <= operators also work properly.
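One way to avoid writing all six methods by hand is the standard library's functools.total_ordering decorator, which derives the missing ordering methods from __eq__ plus one of __lt__, __le__, __gt__, or __ge__. The Version class below is a hypothetical example, not part of the text above; treat it as a minimal sketch:

```python
from functools import total_ordering


@total_ordering
class Version:
    """Hypothetical class; only __eq__ and __lt__ are written by hand."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __repr__(self):
        return "Version({}, {})".format(self.major, self.minor)


old, new = Version(1, 9), Version(2, 0)
print(old <= new, new >= old, old > new)  # True True False
print(sorted([new, old]))                 # [Version(1, 9), Version(2, 0)]
```

Python 3 already derives != from __eq__, so the decorator only has to fill in the remaining ordering operators.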

The sort method can also take an optional key argument. This argument is a function that can transform each object in a list into an object that can be somehow compared. This is useful if we have a tuple of values and want to sort on the second item in the tuple rather than the first (which is the default for sorting tuples):

>>> x = [(1,'c'), (2,'a'), (3, 'b')]
>>> x.sort()
>>> x
[(1, 'c'), (2, 'a'), (3, 'b')]
>>> x.sort(key=lambda i: i[1])
>>> x
[(2, 'a'), (3, 'b'), (1, 'c')]

The lambda keyword in the command line creates a function that takes a tuple as input and uses sequence lookups to return the item with index 1 (that is the second item in the tuple).
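As a variation on the same idea, the standard library's operator.itemgetter builds the same kind of key function without a lambda, and the built-in sorted function accepts the same key argument while leaving the original list untouched. This is a small sketch; the name by_letter is illustrative only:

```python
from operator import itemgetter

x = [(1, 'c'), (2, 'a'), (3, 'b')]

# itemgetter(1) behaves like lambda i: i[1].
by_letter = sorted(x, key=itemgetter(1))
print(by_letter)  # [(2, 'a'), (3, 'b'), (1, 'c')]
print(x)          # [(1, 'c'), (2, 'a'), (3, 'b')]  (sorted() returned a new list)

# sort() also accepts reverse=True alongside key.
x.sort(key=itemgetter(1), reverse=True)
print(x)          # [(1, 'c'), (3, 'b'), (2, 'a')]
```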

As another example, we can also use the key parameter to make a sort case insensitive. To do this, we simply need to compare the all-lowercase versions of the strings, so we can pass the built-in str.lower function as the key function:

>>> l = ["hello", "HELP", "Helo"]
>>> l.sort()
>>> l
['HELP', 'Helo', 'hello']
>>> l.sort(key=str.lower)
>>> l
['hello', 'Helo', 'HELP']

Remember, even though lower is a method on string objects, it is also a function that can accept a single argument, self. In other words, str.lower(item) is equivalent to item.lower(). When we pass this function as a key, it performs the comparison on lowercase values instead of doing the default case-sensitive comparison.
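A quick check of that equivalence, as a minimal sketch (the names item, names, and ordered are illustrative only). Note that the key function is used only for comparing; the stored strings keep their original case:

```python
item = "HELP"
# Calling the method through the class is the same as calling it on the instance.
print(str.lower(item) == item.lower())  # True

names = ["hello", "HELP", "Helo"]
ordered = sorted(names, key=str.lower)
print(ordered)  # ['hello', 'Helo', 'HELP']
print(names)    # ['hello', 'HELP', 'Helo']
```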

Sets

Lists are extremely versatile tools that suit most container object applications. But they are not useful when we want to ensure objects in the list are unique. For example, a song library may contain many songs by the same artist. If we want to sort through the library and create a list of all the artists, we would have to check the list to see if we've added the artist already before we add them again.

This is where sets come in. Sets come from mathematics, where they represent an unordered group of (usually) unique numbers. We can add a number to a set five times, but it will show up in the set only once.

In Python, sets can hold any hashable object, not just numbers. Hashable objects are the same objects that can be used as keys in dictionaries, so again, lists and dictionaries are out. Like mathematical sets, they can store only one copy of each object. So if we're trying to create a list of song artists, we can create a set of string names and simply add them to the set. This example starts with a list of (song, artist) tuples and creates a set of the artists:

song_library = [("Phantom Of The Opera", "Sarah Brightman"),
        ("Knocking On Heaven's Door", "Guns N' Roses"),
        ("Captain Nemo", "Sarah Brightman"),
        ("Patterns In The Ivy", "Opeth"),
        ("November Rain", "Guns N' Roses"),
        ("Beautiful", "Sarah Brightman"),
        ("Mal's Song", "Vixy and Tony")]

artists = set()
for song, artist in song_library:
    artists.add(artist)

print(artists)

There is no built-in syntax for an empty set as there is for lists and dictionaries (a bare {} creates an empty dictionary); we create a set using the set() constructor. However, we can use the curly braces of dictionary syntax to create a set, so long as the set contains values. If we use colons to separate pairs of values, it's a dictionary, as in {'key': 'value', 'key2': 'value2'}. If we just separate values with commas, it's a set, as in {'value', 'value2'}. Items can be added individually to the set using its add method.
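The difference between the two brace syntaxes is easy to confirm interactively. The variable names in this sketch are illustrative only:

```python
empty_braces = {}     # empty braces always mean a dictionary
empty_set = set()     # the only way to spell an empty set
print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

a_dict = {'key': 'value', 'key2': 'value2'}  # colons: dictionary
a_set = {'value', 'value2', 'value'}         # commas only: set
print(len(a_set))  # 2, because the duplicate 'value' is stored once

a_set.add('value3')
a_set.add('value3')  # adding an existing item changes nothing
print(len(a_set))    # 3
```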

If we run this script, we see that the set works as advertised:

{'Sarah Brightman', "Guns N' Roses", 'Vixy and Tony', 'Opeth'}

If you're paying attention to the output, you'll notice that the items are not printed in the order they were added to the set. Sets are unordered; they use an underlying hash-based data structure for efficiency, as dictionaries do (although dictionaries have preserved insertion order since Python 3.7). Because they are unordered, sets cannot have items looked up by index. The primary purpose of a set is to divide the world into two groups: "things that are in the set", and, "things that are not in the set". It is easy to check if an item is in the set or to loop over the items in a set, but if we want to sort or order them, we'll have to convert the set to a list.

This output shows all three of these activities:

>>> "Opeth" in artists
True
>>> for artist in artists:
...     print("{} plays good music".format(artist))
...
Sarah Brightman plays good music
Guns N' Roses plays good music
Vixy and Tony plays good music
Opeth plays good music
>>> alphabetical = list(artists)
>>> alphabetical.sort()
>>> alphabetical
["Guns N' Roses", 'Opeth', 'Sarah Brightman', 'Vixy and Tony']

While the primary feature of a set is uniqueness, that is not its primary purpose. Sets are most useful when two or more of them are used in combination. Most of the methods on the set type operate on other sets, allowing us to efficiently combine or compare the items in two or more sets. These methods have strange names if you're not familiar with mathematical sets, since they use the same terminology used in mathematics. We'll start with three methods that return the same result regardless of which is the calling set and which is the called set.

The union method is the most common and easiest to understand. It takes a second set as a parameter and returns a new set that contains all elements that are in either of the two sets; if an element is in both original sets, it will, of course, only show up once in the new set. Union is like a logical or operation; indeed, the | operator can be used to get the same effect, if you don't like calling methods.

Conversely, the intersection method accepts a second set and returns a new set that contains only those elements that are in both sets. It is like a logical and operation, and can also be referenced using the & operator.

Finally, the symmetric_difference method tells us what's left; it is the set of objects that are in one set or the other, but not both. The following example illustrates these methods by comparing some artists from my song library to those in my sister's:

my_artists = {"Sarah Brightman", "Guns N' Roses", "Opeth", "Vixy and Tony"}
auburns_artists = {"Nickelback", "Guns N' Roses", "Savage Garden"}

print("All: {}".format(my_artists.union(auburns_artists)))
print("Both: {}".format(auburns_artists.intersection(my_artists)))
print("Either but not both: {}".format(
    my_artists.symmetric_difference(auburns_artists)))

If we run this code, we see that these three methods do what the print calls suggest they will do:

All: {'Sarah Brightman', "Guns N' Roses", 'Vixy and Tony', 'Savage Garden', 'Opeth', 'Nickelback'}

Both: {"Guns N' Roses"}

Either but not both: {'Savage Garden', 'Opeth', 'Nickelback', 'Sarah Brightman', 'Vixy and Tony'}
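The same three methods can also be spelled as operators: | for union and & for intersection, as mentioned earlier, plus ^ for symmetric_difference. The sketch below reuses the two sets from the example; the result names (all_artists, both, either) are illustrative only:

```python
my_artists = {"Sarah Brightman", "Guns N' Roses", "Opeth", "Vixy and Tony"}
auburns_artists = {"Nickelback", "Guns N' Roses", "Savage Garden"}

all_artists = my_artists | auburns_artists  # union
both = my_artists & auburns_artists         # intersection
either = my_artists ^ auburns_artists       # symmetric difference

# Each operator gives the same set as the corresponding method.
print(all_artists == my_artists.union(auburns_artists))            # True
print(both == my_artists.intersection(auburns_artists))            # True
print(either == my_artists.symmetric_difference(auburns_artists))  # True

# Sorting turns the unordered results into predictable lists for display.
print(sorted(both))  # ["Guns N' Roses"]
```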

These methods all return the same result regardless of which set calls the other. We can say my_artists.union(auburns_artists) or auburns_artists.union(my_artists) and get the same result. There are also methods that return different results depending on who is the caller and who is the argument.

These methods include issubset and issuperset, which are the inverse of each other. Both return a boolean. The issubset method returns True if all of the items in the calling set are also in the set passed as an argument. The issuperset method returns True if all of the items in the argument are also in the calling set. Thus s.issubset(t) and t.issuperset(s) are identical. They will both return True if t contains all the elements in s.

Finally, the difference method returns all the elements that are in the calling set, but not in the set passed as an argument; this is like half a symmetric_difference. The difference method can also be represented by the - operator. The following code illustrates these methods in action:

my_artists = {"Sarah Brightman", "Guns N' Roses", "Opeth", "Vixy and Tony"}
bands = {"Guns N' Roses", "Opeth"}

print("my_artists is to bands:")
print("issuperset: {}".format(my_artists.issuperset(bands)))
print("issubset: {}".format(my_artists.issubset(bands)))
print("difference: {}".format(my_artists.difference(bands)))
print("*"*20)
print("bands is to my_artists:")
print("issuperset: {}".format(bands.issuperset(my_artists)))
print("issubset: {}".format(bands.issubset(my_artists)))
print("difference: {}".format(bands.difference(my_artists)))

This code simply prints out the response of each method when called from one set on the other. Running it gives us the following output:

my_artists is to bands:
issuperset: True
issubset: False
difference: {'Sarah Brightman', 'Vixy and Tony'}
********************
bands is to my_artists:
issuperset: False
issubset: True
difference: set()

The difference method, in the second case, returns an empty set, since there are no items in bands that are not in my_artists.
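These order-dependent methods have operator forms as well: <= for issubset, >= for issuperset, and - for difference. A brief sketch using the same two sets:

```python
my_artists = {"Sarah Brightman", "Guns N' Roses", "Opeth", "Vixy and Tony"}
bands = {"Guns N' Roses", "Opeth"}

print(bands <= my_artists)  # True, same as bands.issubset(my_artists)
print(my_artists >= bands)  # True, same as my_artists.issuperset(bands)
print(my_artists <= bands)  # False, the direction matters

# Difference also depends on which set is on the left.
print(sorted(my_artists - bands))  # ['Sarah Brightman', 'Vixy and Tony']
print(bands - my_artists)          # set()
```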

The union, intersection, and difference methods can all take multiple sets as arguments; they will return, as we might expect, the set that is created when the operation is called on all the parameters.
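For instance, with three small sets of numbers (chosen only for illustration), each method folds all of its arguments into the result:

```python
a = {1, 2, 3, 4}
b = {2, 3, 4, 5}
c = {3, 4, 5, 6}

# Everything that appears in any of the three sets.
print(sorted(a.union(b, c)))         # [1, 2, 3, 4, 5, 6]
# Only what appears in all three sets.
print(sorted(a.intersection(b, c)))  # [3, 4]
# What is in a but in neither b nor c.
print(sorted(a.difference(b, c)))    # [1]
```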

So the methods on sets clearly suggest that sets are meant to operate on other sets, and that they are not just containers. If we have data coming in from two different sources and need to quickly combine them in some way, to determine where the data overlaps, or is different, we can use set operations to efficiently compare them. Or if we have data incoming that may contain duplicates of data that has already been processed, we can use sets to compare the two and process only the new data.
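The duplicate-filtering case might look something like the following sketch. The identifiers (processed, incoming, new_items) and the order strings are hypothetical placeholders for whatever data is actually arriving:

```python
# Items that have already been handled in an earlier run (hypothetical data).
processed = {"order-1", "order-2", "order-3"}

# A new batch that repeats some old items and contains its own duplicates.
incoming = ["order-2", "order-4", "order-4", "order-5"]

# Converting to a set drops duplicates; subtracting drops what is already done.
new_items = set(incoming) - processed

for item in sorted(new_items):
    print("processing", item)

# Remember the newly handled items for the next batch.
processed |= new_items
```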

Extending built-ins

We discussed briefly in Chapter 3 how built-in data types can be extended using inheritance. Now, we'll go into more detail as to when we would want to do that.

When we have a built-in container object that we want to add functionality to, we have two options. We can either create a new object, which holds that container as an attribute (composition), or we can subclass the built-in object and add or adapt methods on it to do what we want (inheritance).

Composition is usually the best alternative if all we want to do is use the container to store some objects using that container's features. That way, it's easy to pass that data structure into other methods and they will know how to interact with it. But we need to use inheritance if we want to change the way the container actually works.

For example, if we want to ensure every item in a list is a string with exactly five characters, we need to extend list and override the append() method to raise an exception for invalid input. We'd also have to override __setitem__(self, index, value), a special method on lists that is called whenever we use the x[index] = "value" syntax.

That's right, all that special non-object-oriented looking syntax we've been looking at for accessing lists, dictionary keys, looping over containers, and similar tasks is actually "syntactic sugar" that maps to an object-oriented paradigm underneath.

We might ask the Python designers why they did this, when common perception suggests that object-oriented programming is always better. That question is easy to answer. In the following hypothetical examples, which is easier to read, as a programmer? Which requires less typing?

c = a + b
c = a.add(b)

l[0] = 5
l.setitem(0, 5)

d[key] = value
d.setitem(key, value)

for x in alist:
    #do something with x
it = alist.iterator()
while it.has_next():
    x = it.next()
    #do something with x

The second version in each pair shows what object-oriented code might look like (in practice, these methods actually exist as special double-underscore methods on associated objects). Python programmers agree that the non-object-oriented syntax is easier to read and to write. Non-Python programmers say that syntax like this means Python is not object-oriented. That, however, is hogwash. All of the above Python syntaxes map to object-oriented methods under the hood. These methods have special names (with double-underscores before and after) to remind us that there is a better syntax out there. However, we now have the means to override these behaviors. For example, we can make a special integer that always returns 0 when we add two of them together:

class SillyInt(int):
    def __add__(self, num):
        return 0

This is a very strange thing to do, granted, but it perfectly illustrates these object-oriented principles in action. And now we have an argument when people tell us Python isn't truly object-oriented. It's just object orientation that has been made easy to work with. Check out the above class in action:

>>> a = SillyInt(1)
>>> b = SillyInt(2)
>>> a + b
0

The awesome thing about the __add__ method is that we can add it to any class we write, and if we use the + operator on instances of that class, it will be called. This is how string, tuple, and list concatenation works.

This is true of all the special methods. If we want to use x in myobj syntax, we can override __contains__. If we want to use myobj[i] = value syntax, we implement __setitem__, and if we want to use something = myobj[i], we implement __getitem__.
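To make this concrete, here is a minimal sketch of a class that supports +, in, and square-bracket access purely by defining the corresponding special methods. The Inventory class and its attribute names are hypothetical, invented for this illustration:

```python
class Inventory:
    """Hypothetical container mapping item names to quantities."""

    def __init__(self, counts=None):
        self._counts = dict(counts or {})

    def __add__(self, other):
        # Called for: inventory + inventory
        if not isinstance(other, Inventory):
            return NotImplemented
        merged = dict(self._counts)
        for name, quantity in other._counts.items():
            merged[name] = merged.get(name, 0) + quantity
        return Inventory(merged)

    def __contains__(self, name):
        # Called for: name in inventory
        return self._counts.get(name, 0) > 0

    def __getitem__(self, name):
        # Called for: inventory[name]
        return self._counts.get(name, 0)

    def __setitem__(self, name, quantity):
        # Called for: inventory[name] = quantity
        self._counts[name] = quantity


shelf = Inventory({"apple": 3})
shelf["pear"] = 0
combined = shelf + Inventory({"apple": 2, "pear": 4})

print("apple" in shelf, "pear" in shelf)    # True False
print(combined["apple"], combined["pear"])  # 5 4
```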

There are thirty-three of these special methods on the list class in the Python version shown here (later versions add more). We can use the dir function to see all of them:

>>> dir(list)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
'__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__',
'__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__',
'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort']

Further, if we want any additional information on how any of these methods works, we can use the help function:

>>> help(list.__add__)
Help on wrapper_descriptor:

__add__(...)
    x.__add__(y) <==> x+y

The plus operator on lists concatenates two lists. We don't have room to discuss all of the available special functions in this book, but you are now able to explore all this functionality with dir and help. The official online Python reference (http://docs.python.org/) has plenty of useful information as well. Focus, especially, on the abstract base classes discussed in the collections module (in current Python versions they live in collections.abc).
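The double-underscore methods behind the syntax compared earlier can also be called directly, which makes the mapping visible. This is only a demonstration sketch (the names alist and seen are illustrative); in real code the operator and loop syntax is preferred:

```python
a, b = 2, 3
print(a + b == a.__add__(b))   # True

l = [1, 2, 3]
l.__setitem__(0, 5)            # same effect as l[0] = 5
print(l)                       # [5, 2, 3]

d = {}
d.__setitem__("key", "value")  # same effect as d["key"] = "value"
print(d)                       # {'key': 'value'}

# A for loop asks the container for an iterator, then calls next() on it
# until StopIteration is raised.
alist = ["x", "y", "z"]
seen = []
it = iter(alist)               # calls alist.__iter__()
while True:
    try:
        x = next(it)           # calls it.__next__()
    except StopIteration:
        break
    seen.append(x)
print(seen)                    # ['x', 'y', 'z']
```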

So to get back to the earlier point about when we would want to use composition versus inheritance: if we need to somehow change any of the methods on the class, including the special methods, we definitely need to use inheritance. If we used composition, we could write methods that do the validation or alterations and ask the caller to use those methods, but there is nothing stopping them from accessing the property directly (no private members, remember?). They could insert an item into our list that does not have five characters, and that might confuse other methods that use the list.
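The five-character list can be used to sketch the contrast. Everything named here (check_five, FiveCharBox, FiveCharList) is hypothetical, and the subclass is deliberately incomplete: it guards only append and integer-index assignment, so insert, extend, slice assignment, and the constructor would need the same treatment in real code.

```python
def check_five(value):
    """Raise ValueError unless value is a five-character string."""
    if not isinstance(value, str) or len(value) != 5:
        raise ValueError(
            "expected a five-character string, got {!r}".format(value))
    return value


class FiveCharBox:
    """Composition: the list is an attribute, so callers can still reach it."""

    def __init__(self):
        self.items = []

    def add(self, value):
        self.items.append(check_five(value))


class FiveCharList(list):
    """Inheritance: the list's own append and item assignment are validated."""

    def append(self, value):
        super().append(check_five(value))

    def __setitem__(self, index, value):
        # Covers x[index] = value for integer indexes only; slice
        # assignment is not handled correctly in this sketch.
        super().__setitem__(index, check_five(value))


box = FiveCharBox()
box.add("hello")
box.items.append("no")  # nothing stops direct access to the inner list
print(box.items)        # ['hello', 'no']

words = FiveCharList()
words.append("hello")
words[0] = "world"
try:
    words.append("no")
    rejected = False
except ValueError:
    rejected = True
print(words, rejected)  # ['world'] True
```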

Often, the need to extend a built-in data type is an indication that we're using the wrong sort of data type. It is not always the case, but if you're suddenly looking to extend a built-in, carefully consider whether or not a different data structure would be more suitable.

As a last example, let's consider what it takes to create a dictionary that remembers the order in which keys were inserted. (The built-in dict has done this itself since Python 3.7, and collections.OrderedDict is also available, so treat what follows as an exercise in extending a built-in.) One way (likely not the best way) to do this is to keep an ordered list of keys that is stored in a specially derived subclass of dict. Then we can override the methods keys, values, __iter__, and items to return everything in order. Of course, we'll also have to override __setitem__ and setdefault to keep our list up to date. There are likely to be a few other methods in the output of dir(dict) that need overriding to keep the list and dictionary consistent (clear and __delitem__ come to mind, to track when items are removed), but we won't worry about them for this example.

So we'll be extending dict and adding a list of ordered keys. Trivial enough, but where do we create the actual list? We could include it in the __init__ method, which would work just fine, but we have no guarantees that any subclass will call that initializer. Remember the __new__ method we discussed in Chapter 2? I said it was generally only useful in very special cases. This is one of those special cases.

We know __new__ will be called exactly once, and we can create a list on the new instance that will always be available to our class. With that in mind, here is our entire sorted dictionary:

from collections.abc import KeysView, ItemsView, ValuesView

class DictSorted(dict):
    def __new__(*args, **kwargs):
        # __new__ always runs, even if a subclass skips __init__.
        new_dict = dict.__new__(*args, **kwargs)
        new_dict.ordered_keys = []
        return new_dict

    def __setitem__(self, key, value):
        '''self[key] = value syntax'''
        if key not in self.ordered_keys:
            self.ordered_keys.append(key)
        super().__setitem__(key, value)

    def setdefault(self, key, value):
        if key not in self.ordered_keys:
            self.ordered_keys.append(key)
        return super().setdefault(key, value)
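To see why __new__ is a safer home for the key list than __init__, the sketch below repeats only the three methods shown so far and adds a hypothetical subclass, Careless, whose initializer never calls super().__init__(). The key list still exists because __new__ ran first:

```python
class DictSorted(dict):
    def __new__(*args, **kwargs):
        new_dict = dict.__new__(*args, **kwargs)
        new_dict.ordered_keys = []
        return new_dict

    def __setitem__(self, key, value):
        if key not in self.ordered_keys:
            self.ordered_keys.append(key)
        super().__setitem__(key, value)

    def setdefault(self, key, value):
        if key not in self.ordered_keys:
            self.ordered_keys.append(key)
        return super().setdefault(key, value)


class Careless(DictSorted):
    # Hypothetical subclass that forgets to call super().__init__().
    def __init__(self):
        pass


d = Careless()
d["b"] = 2
d.setdefault("a", 1)
d["b"] = 20            # updating an existing key does not move it

print(d.ordered_keys)  # ['b', 'a']
print(d["b"], d["a"])  # 20 1
```

One caveat worth keeping in mind: data passed to the constructor, as in DictSorted(a=1), and calls to update() do not go through the overridden __setitem__ in CPython, so those keys would be missing from ordered_keys unless those paths are overridden as well.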
