Iterators

In Python, an iterator protocol is defined using two methods: __iter()__ and next(). The former returns the iterator object and latter returns the next element of a sequence. Some advantages of iterators are as follows:

Readability

Supports sequences of infinite length Saving resources

There are several built-in objects in Python which implement iterator protocol, e.g. string, list, dictionary. In the following example, we create a new class that follows the iterator protocol. We then use the class to generate log2 of numbers:

As you can see, we first create an instance of the class and assign its __iter()__ function to a variable called i. Then by calling the next() function four times, we get the following output:

As you probably noticed, the lines are log2() of 1, 2, 3, 4 respectively.

15.5.13 Generators

Before we go to Generators, please understand Iterators. Generators are also Iterators but they can only be interated over once. Thats because Generators do not store the values in memory instead they generate

lambda arguments : expression

def multiply(a, b): return a*b #call the function multiply(3*5) #outputs: 15 multiply = Lambda a, b : a*b print(multiply(3, 5)) ## outputs: 15 (lambda a, b : a*b)(3*5)

map(function_object, iterable1, iterable2,...)

def multiply(x): return x *2 map(multiply2, [2, 4, 6, 8]) ## Output [4, 8, 12, 16] map(lambda x: x*2, [2, 4, 6, 8]) ## Output [4, 8, 12, 16] dict_movies = [

{'movie': 'avengers', 'comic': 'marvel'}, {'movie': 'superman', 'comic': 'dc'}]

map(lambda x : x['movie'], dict_movies) ## Output: ['avengers', 'superman'] map(lambda x : x['comic'], dict_movies) ## Output: ['marvel', 'dc'] map(lambda x : x['movie'] =="avengers", dict_movies) ## Output: [True, False]

map_output = map(lambda x: x*2, [1, 2, 3, 4]) print(map_output)

## Output: map object: <map object at 0x04D6BAB0> list_map_output = list(map_output)

print(list_map_output) ## Output: [2, 4, 6, 8]

from math import log2

class LogTwo:

"Implements an iterator of log two" def__init__(self,last =0): self.last = last def__iter__(self): self.current_num =1

returnself

def__next__(self):

ifself.current_num <=self.last: result = log2(self.current_num) self.current_num +=1 return result else: raiseStopIteration L = LogTwo(5) i = iter(L) print(next(i)) print(next(i)) print(next(i)) print(next(i)) $ python iterator.py 0.0 1.0 1.584962500721156 2.0 82

the values on the go. If we want to print those values then we can either simply iterate over them or use the for loop. 15.5.13.1 Generators with function

For example: we have a function named as multiplyBy10 which prints all the input numbers multiplied by 10.

Now, if we want to use Generators here then we will make following changes.

In Generators, we use yield() function in place of return(). So when we try to print new_numbers list now, it just prints Generators object. The reason for this is because Generators dont hold any value in memory, it yields one result at a time. So essentially it is just waiting for us to ask for the next result. To print the next result we can just say print next(new_numbers) , so how it is working is its reading the first value and squaring it and yielding out value 1. Also in this case we can just print next(new_numbers) 5 times to print all numbers and if we do it for 6th time then we will get an error StopIteration which meanns Generators has exausted its limit and it has no 6th element to print.

15.5.13.2 Generators using for loop

If we now want to print the complete list of squared values then we can just do:

The output will be:

15.5.13.3 Generators with List Comprehension

Python has something called List Comprehension, if we use this then we can replace the complete function def with just:

Here the point to note is square brackets [] in line 1 is very important. If we change it to () then again we will start getting Generators object.

We can get the individual elements again from Generators if we do a for loop over new_numbers like we did previously. Alternatively, we can convert it into a list and then print it.

But here if we convert this into a list then we loose on performance, which we will just see next. 15.5.13.4 Why to use Generators?

Generators are better with Performance because it does not hold the values in memory and here with the small examples we provide its not a big deal since we are dealing with small amount of data but just consider a scenario where the records are in millions of data set. And if we try to convert millions of data elements into a list then that will definitely make an impact on memory and performance because everything will in memory.

Lets see an example on how Generators help in Performance. First, without Generators, normal function taking 1 million record and returns the result[people] for 1 million.

I am just giving approximate values to compare it with next execution but we just try to run it we will see a serious consumption of memory with good amount of time taken.

def multiplyBy10(numbers): result = [] for i in numbers: result.append(i*10) return result new_numbers = multiplyBy10([1,2,3,4,5])

print new_numbers #Output: [10, 20, 30, 40 ,50]

def multiplyBy10(numbers): for i in numbers: yield(i*10) new_numbers = multiplyBy10([1,2,3,4,5])

print new_numbers #Output: Generators object

print next(new_numbers) #Output: 1

def multiplyBy10(numbers): for i in numbers: yield(i*10) new_numbers = multiplyBy10([1,2,3,4,5])

for num in new_numbers: print num 10 20 30 40 50 new_numbers = [x*10for x in [1,2,3,4,5]] print new_numbers #Output: [10, 20, 30, 40 ,50]

new_numbers = (x*10for x in [1,2,3,4,5]) print new_numbers #Output: Generators object

new_numbers = (x*10for x in [1,2,3,4,5]) print list(new_numbers) #Output: [10, 20, 30, 40 ,50]

names = ['John', 'Jack', 'Adam', 'Steve', 'Rick']

majors = ['Math', 'CompScience', 'Arts', 'Business', 'Economics'] ## prints the memory before we run the function

memory = mem_profile.memory_usage_resource()

print ('Memory (Before): {memory}Mb'.format(memory=memory)) def people_list(people): result = [] for i in range(people): person = { 'id' : i, 'name' : random.choice(names), 'major' : randon.choice(majors) } result.append(person) return result t1 = time.clock() people = people_list(10000000) t2 = time.clock()

## prints the memory after we run the function memory = mem_profile.memory_usage_resource() print ('Memory (After): {memory}Mb'.format(memory=memory)) print ('Took {time} seconds'.format(time=t2-t1))

#Output

Memory (Before): 15Mb Memory (After): 318Mb Took 1.2 seconds

names = ['John', 'Jack', 'Adam', 'Steve', 'Rick']

majors = ['Math', 'CompScience', 'Arts', 'Business', 'Economics'] ## prints the memory before we run the function

memory = mem_profile.memory_usage_resource()

print ('Memory (Before): {memory}Mb'.format(memory=memory))

def people_generator(people): for i in xrange(people): person = { 'id' : i,

Now after running the same code using Generators, we will see a significant amount of performance boost with alomost 0 Seconds. And the reason behind this is that in case of Generators, we do not keep anything in memory so system just reads 1 at a time and yields that.

15.5.14 Non Blocking Threads

In document PDF (Page 82-84)