Cooperative Multitasking and Coroutines

Earlier, I mentioned that modern operating systems use “pre-emptive multitasking” to get things done, forcing processes to give up control of the CPU in favor of another process. But there’s another model, known as “cooperative multitasking”, in which the system waits until a program voluntarily gives up control of the CPU. Hence the word “cooperation”: if a program decides to perform oodles of calculations and never gives up control, then there’s nothing the system can do about it.

AT THE FORGE

This sounds like a recipe for disaster; why would you write, let alone run, programs that give up the CPU? The answer is simple. When your program uses I/O, you can pretty much guarantee that you’ll be waiting around idly until you get a response, given how much slower I/O is than programs running in memory. Thus, you can voluntarily give up the CPU whenever you do something with I/O, knowing that soon enough, other programs similarly will invoke I/O and give up the CPU, returning control to you.

In order for this to work, you’re going to need all of the programs within this cooperative multitasking universe to agree to some ground rules. In particular, you’ll need them to agree that all I/O goes through the multitasking system, and that none of the tasks will hog the CPU for an extended period of time.

But wait, you’ll also need a bit more. You’ll need to give tasks a way to stop executing voluntarily for a little bit, and then restart from where they left off.

This last bit actually has existed in Python for some time, albeit with slightly different syntax. Let’s start the journey and exploration of asyncio there.

A normal Python function, when called, executes from start to finish. For example:

def foo():
    print("a")
    print("b")
    print("c")

If you call this, you’ll see:

a
b
c


Of course, it’s usually good for functions not just to print something, but also to return a value:

def hello(name):
    return f'Hello, {name}'

Now when you invoke the function, you’ll get something back. You can grab that returned value and assign it to a variable:

s = hello('Reuven')

But there’s a variation on return that will prove central to what you’re doing here, namely yield. The yield statement looks and acts much like return, but it can be used multiple times in a function, even within a loop:

def hello(name):
    for i in range(5):
        yield f'[{i}] Hello, {name}'

Because it uses yield, rather than return, this is known as a “generator function”. And when you invoke it, you don’t get back a string, but rather a generator object:

>>> g = hello('Reuven')
>>> type(g)
<class 'generator'>

A generator is a kind of object that knows how to behave inside a Python for loop. (In other words, it implements the iteration protocol.)

When put inside such a loop, the function will start to run. However, each time the generator function encounters a yield statement, it will return the value to the loop and go to sleep. When does it wake up again? When the for loop asks for the next value to be returned from the iterator:


for s in g:
    print(s)
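Under the hood, a for loop does roughly the following: it calls iter() on the object, then calls next() repeatedly until StopIteration is raised. The sketch below does this by hand, using the hello generator function shown above (repeated here so the snippet runs on its own). The variable names are only for illustration.

```python
def hello(name):
    for i in range(5):
        yield f'[{i}] Hello, {name}'

g = hello('Reuven')

# A generator is its own iterator, so iter(g) returns g itself.
it = iter(g)
same_object = it is g

# Each next() call runs the function body until the next yield.
first = next(it)
second = next(it)

# Drain the remaining values by hand, the way a for loop would.
rest = []
while True:
    try:
        rest.append(next(it))
    except StopIteration:
        # The function body has finished, so the loop ends here.
        break

print(first)
print(second)
print(len(rest))
```

Between the calls to next(), the generator is simply paused; nothing runs until the caller asks for another value.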

Generator functions thus provide the core of what you need: a function that runs normally, until it hits a certain point in the code. At that point, it returns a value to its caller and goes to sleep. When the for loop requests the next value from the generator, the function continues executing from where it left off (that is, just after the yield statement), as if it hadn’t ever stopped.
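To see how this pause-and-resume behavior can stand in for a scheduler, here is a minimal sketch of cooperative multitasking built only on generators. The names task and run_round_robin are made up for this illustration, and this is not how asyncio itself is implemented. Each bare yield simply marks a point where a task volunteers to give up control.

```python
from collections import deque

def task(name, steps, log):
    """A hypothetical task that performs `steps` units of work."""
    for i in range(steps):
        log.append(f'{name}: step {i}')
        yield  # voluntarily hand control back to the scheduler

def run_round_robin(tasks):
    """Run generator-based tasks in turn until all have finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)        # resume the task until its next yield
        except StopIteration:
            continue             # task finished; don't reschedule it
        queue.append(current)    # task paused; put it at the back

log = []
run_round_robin([task('A', 2, log), task('B', 3, log)])
for line in log:
    print(line)
```

The output interleaves the two tasks, because each one gives up control after every step. If a task looped without ever reaching a yield, the scheduler would never regain control, which is exactly the risk of the cooperative model described earlier.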

The thing is that generators as described here produce output, but can’t get any input. For example, you could create a generator to return one Fibonacci number per iteration, but you couldn’t tell it to skip ten numbers ahead. Once the generator function is running, it can’t get inputs from the caller.

It can’t get such inputs via the normal iteration protocol, that is. Generators support a send method, allowing the outside world to send any Python object to the generator. In this way, generators now support two-way communication. For example:

def hello(name):
    while True:
        name = yield f'Hello, {name}'
        if not name:
            break

Given the above generator function, you now can say:

>>> g = hello('world')
>>> next(g)
'Hello, world'
>>> g.send('Reuven')
'Hello, Reuven'
>>> g.send('Linux Journal')
'Hello, Linux Journal'

In other words, first you call the generator function to get a generator object (“g”) back. You then have to prime it with the next function, which runs the function body up to and including the first yield statement. From that point on, you can submit any value you want to the generator via the send method. You’ll continue to get output back until you send a false value, such as with g.send(None); at that point the loop breaks, the function exits and the generator raises StopIteration.
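Both the priming step and the way this coroutine ends can be checked directly. The sketch below repeats the hello generator function from above so that it runs standalone. The two exceptions it catches are standard generator behavior in Python 3, and the variable names are just for illustration.

```python
def hello(name):
    while True:
        name = yield f'Hello, {name}'
        if not name:
            break

g = hello('world')

# Sending a real value before priming fails, because the generator
# has not yet reached a yield that could receive it.
try:
    g.send('Reuven')
except TypeError as e:
    unprimed_error = str(e)

greeting = next(g)          # prime: run to the first yield
reply = g.send('Reuven')    # resume with name set to 'Reuven'

# A false value makes the loop break, so the function returns
# and the generator raises StopIteration.
try:
    g.send(None)
    finished = False
except StopIteration:
    finished = True

print(greeting)
print(reply)
print(finished)
```

Note that next(g) is equivalent to g.send(None), which is why priming works but would also end this particular coroutine if used after the first yield.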

Used in this way, the generator is known as a “coroutine”; that is, it has its own state and executes in tandem with the main routine, and you can query it whenever you want to get something from it.
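Returning to the Fibonacci example mentioned earlier, send makes it possible to tell a running generator to skip ahead. In this sketch, the function name fib and the convention that a sent integer means "how many numbers to skip" are my own assumptions, chosen only to illustrate the idea.

```python
def fib():
    """Yield Fibonacci numbers; a sent integer n skips n numbers ahead."""
    a, b = 0, 1
    while True:
        skip = yield a
        # A plain next() delivers None, so treat that as "skip nothing".
        for _ in range(skip or 0):
            a, b = b, a + b
        a, b = b, a + b      # advance to the number to yield next

g = fib()
first = next(g)       # prime the generator; yields 0
second = next(g)      # 1
third = next(g)       # 1
jumped = g.send(10)   # skip the next ten numbers, yield the one after
print(first, second, third, jumped)
```

Here the ten skipped values are 2 through 144, so the value returned by send is 233. The caller can keep mixing next and send on the same generator, which holds its position in the sequence between calls.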

Python’s asyncio uses these basic concepts, albeit with slightly different syntax, to accomplish its goals. And although it might seem like a trivial thing to be able to send data into generators, and get things back on a regular basis, that’s far from the case.

Indeed, this provides the core of an entire infrastructure that allows you to create efficient network applications that can handle many simultaneous users, without the pain of either threads or processes.

In my next article, I plan to start to look at asyncio’s specific syntax and how it maps to what I’ve shown here. Stay tuned.

Send comments or feedback via http://www.linuxjournal.com/contact or email [email protected].
