    for i in range(len(X)):
        Yprime.append(Fprime(X[i]))

    aspect='auto', origin='lower')
    plt.plot(reduced_data[:, 0], reduced_data[:, 1], 'k.', markersize=2)
    # Plot the centroids as a white X
    centroids = kmeans.cluster_centers_
    plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=169,
                linewidths=3, color='w', zorder=10)
    plt.title('K-means clustering on the digits dataset (PCA-reduced data)\n'
              'Centroids are marked with white cross')
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.xticks(())
    plt.yticks(())
    plt.show()

3.8.5.1 Multi-threading in Python
Threading in Python is well suited to I/O-bound tasks in which the process is expected to be idle regularly, e.g. web scraping. This is a very useful feature because many applications and scripts spend the majority of their runtime waiting for network or data I/O. In several cases, e.g. web scraping, the tasks, i.e. downloads from different websites, are independent most of the time. Therefore the program can run the downloads concurrently and join the results at the end.
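As a rough illustration of the I/O-bound case described above, the following sketch replaces real downloads with time.sleep(), so it runs without a network connection. The names SITES, fake_download and results are made up for this illustration and the delays are arbitrary assumptions. Because the threads are idle while they "wait", the total time should be close to the longest single wait rather than the sum of all waits.

```python
import threading
import time

# Hypothetical stand-ins for real websites: name -> simulated response time (s).
SITES = {"site-a": 0.2, "site-b": 0.2, "site-c": 0.2}
results = {}

def fake_download(name, delay):
    # The thread is idle during the sleep, much like while waiting for network I/O.
    time.sleep(delay)
    results[name] = "content of " + name

start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(name, delay))
           for name, delay in SITES.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every "download" before using the results
elapsed = time.perf_counter() - start

print(sorted(results))
print("elapsed: %.2f s, sum of waits: %.2f s" % (elapsed, sum(SITES.values())))
```

Calling join() on every thread is what "joining the results at the end" amounts to here: the main thread only reads results once all workers have finished.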
3.8.5.1.1 Thread vs Threading
There are two built-in modules in Python that are related to threading, namely thread and threading. The former has been deprecated for some time in Python 2, and in Python 3 it is renamed to _thread for the sake of backward compatibility. The _thread module provides a low-level API for multi-threading in Python, whereas the threading module builds a high-level threading interface on top of it.
Thread is the main class of the threading module. The two most important arguments of its constructor are target, which specifies the callable object to run, and args, a tuple of arguments passed to the target callable. We illustrate these in the following example:

    import threading

    def hello_thread(thread_num):
        print("Hello from Thread", thread_num)

    if __name__ == '__main__':
        for thread_num in range(5):
            # args must be a tuple, hence the trailing comma
            t = threading.Thread(target=hello_thread, args=(thread_num,))
            t.start()

This is the output of the previous example:

    In [1]: %run threading.py
    Hello from Thread 0
    Hello from Thread 1
    Hello from Thread 2
    Hello from Thread 3
    Hello from Thread 4

In case you are not familiar with the if __name__ == '__main__': statement, it makes sure that the code nested under this condition runs only when you run your module as a program, and not when your module is imported in another file.
3.8.5.1.2 Locks
As mentioned before, the memory space is shared between the threads. This is both beneficial and problematic: it makes communication between the threads easy, but you might see strange results if you let several threads change the same variable without caution, e.g. thread 2 changes variable x while thread 1 is working with it. This is where a lock comes into play. Using a lock, you can allow only one thread at a time to work with a variable. In other words, only a single thread can hold the lock. If other threads need to work with that variable, they have to wait until the thread holding the lock is done and the variable is "unlocked".
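The rule that only one thread can hold the lock at a time can be checked directly. The following is a minimal sketch; the names try_lock and attempts are made up for illustration. It relies on the blocking=False argument of Lock.acquire(), which makes the call return False immediately instead of waiting when the lock is already held.

```python
import threading

lock = threading.Lock()
attempts = []

def try_lock():
    # Non-blocking attempt: returns right away instead of waiting for the lock.
    got_it = lock.acquire(blocking=False)
    attempts.append(got_it)
    if got_it:
        lock.release()

# First attempt: the main thread is holding the lock, so the worker fails.
lock.acquire()
t = threading.Thread(target=try_lock)
t.start()
t.join()
lock.release()

# Second attempt: the lock is free again, so the worker succeeds.
t = threading.Thread(target=try_lock)
t.start()
t.join()

print(attempts)  # [False, True]
```

With the default blocking call, the first worker would simply have waited until the main thread released the lock.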
We illustrate this with a simple example:

    import threading

    counter = 0

    def incrementer1():
        global counter
        for j in range(2):
            for i in range(3):
                counter += 1
                print("Greeter 1 incremented the counter by 1")
            print("Counter is %d" % counter)

    def incrementer2():
        global counter
        for j in range(2):
            for i in range(3):
                counter += 1
                print("Greeter 2 incremented the counter by 1")
            print("Counter is %d" % counter)

    if __name__ == '__main__':
        t1 = threading.Thread(target=incrementer1)
        t2 = threading.Thread(target=incrementer2)
        t1.start()
        t2.start()

Suppose we want to print the multiples of 3 between 1 and 12, i.e. 3, 6, 9 and 12. For the sake of argument, we do this using 2 threads and a nested for loop. We create a global variable called counter and initialize it with 0. Whenever incrementer1 or incrementer2 is called, the counter is incremented by 3 twice (so the counter is incremented by 6 in each function call). If you run the previous code, you would have to be really lucky to get the following as part of your output:

    Counter is 3
    Counter is 6
    Counter is 9
    Counter is 12

The reason is the conflict that happens between the threads while they increment the counter in the nested for loop. As you probably noticed, each iteration of the outer for loop is equivalent to adding 3 to the counter. The conflict does not arise at that level but in the inner for loop, where the two threads can interleave their single increments. Accordingly, the output of the previous code can differ from run to run. This is an example output:

    $ python3 lock_example.py
    Greeter 1 incremented the counter by 1
    Greeter 1 incremented the counter by 1
    Greeter 1 incremented the counter by 1
    Counter is 4
    Greeter 2 incremented the counter by 1
    Greeter 2 incremented the counter by 1
    Greeter 1 incremented the counter by 1
    Greeter 2 incremented the counter by 1
    Greeter 1 incremented the counter by 1
    Counter is 8
    Greeter 1 incremented the counter by 1
    Greeter 2 incremented the counter by 1
    Counter is 10
    Greeter 2 incremented the counter by 1
    Greeter 2 incremented the counter by 1
    Counter is 12

We can fix this issue using a lock: whenever one of the functions is about to increment the value by 3, it will acquire() the lock, and when it is done it will release() the lock. This mechanism is illustrated in the following code:

    import threading

    increment_by_3_lock = threading.Lock()
    counter = 0

    def incrementer1():
        global counter
        for j in range(2):
            # Hold the lock for the whole "add 3 and report" step
            increment_by_3_lock.acquire(True)
            for i in range(3):
                counter += 1
                print("Greeter 1 incremented the counter by 1")
            print("Counter is %d" % counter)
            increment_by_3_lock.release()

    def incrementer2():
        global counter
        for j in range(2):
            increment_by_3_lock.acquire(True)
            for i in range(3):
                counter += 1
                print("Greeter 2 incremented the counter by 1")
            print("Counter is %d" % counter)
            increment_by_3_lock.release()

    if __name__ == '__main__':
        t1 = threading.Thread(target=incrementer1)
        t2 = threading.Thread(target=incrementer2)
        t1.start()
        t2.start()

No matter how many times you run this code, the counter values will always be printed in the correct order:

    $ python3 lock_example.py
    Greeter 1 incremented the counter by 1
    Greeter 1 incremented the counter by 1
    Greeter 1 incremented the counter by 1
    Counter is 3
    Greeter 1 incremented the counter by 1
    Greeter 1 incremented the counter by 1
    Greeter 1 incremented the counter by 1
    Counter is 6
    Greeter 2 incremented the counter by 1
    Greeter 2 incremented the counter by 1
    Greeter 2 incremented the counter by 1
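The two incrementer functions above differ only in the label they print, and the acquire()/release() pair can also be written with a with statement, which releases the lock even if the body raises an exception. The following is one possible condensed variant, not the original listing: the single incrementer function, its name parameter and the snapshots list are assumptions introduced here so that the result can be checked after the threads finish.

```python
import threading

counter = 0
snapshots = []  # counter value recorded after each locked "add 3" step
increment_by_3_lock = threading.Lock()

def incrementer(name):
    global counter
    for _ in range(2):
        # The with statement acquires the lock on entry and releases it on exit.
        with increment_by_3_lock:
            for _ in range(3):
                counter += 1
                print("%s incremented the counter by 1" % name)
            snapshots.append(counter)
            print("Counter is %d" % counter)

threads = [threading.Thread(target=incrementer, args=("Greeter %d" % n,))
           for n in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both threads before inspecting the result

print(snapshots)  # [3, 6, 9, 12]
```

Which greeter runs first may still vary between runs; the lock only guarantees that each group of three increments completes without interference.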
Using the threading module increases both the overhead associated with thread management and the complexity of the program, which is why in many situations employing the multiprocessing module might be a better approach.