How do I parallelize a simple Python loop?
This is probably a trivial question, but how do I parallelize the following loop in Python?
# set up output lists
output1 = list()
output2 = list()
output3 = list()

for j in range(0, 10):
    # calculate the individual parameter value
    parameter = j * offset
    # call the calculation
    out1, out2, out3 = calc_stuff(parameter=parameter)
    # put the results into the correct output lists
    output1.append(out1)
    output2.append(out2)
    output3.append(out3)
I know how to start single threads in Python, but I don't know how to "collect" the results.
Multiple processes would be fine too - whatever is easiest for this case. I'm currently using Linux, but the code should run on Windows and Mac as well.
What's the easiest way to parallelize this code?
Using multiple threads on CPython won't give you better performance for pure-Python code due to the global interpreter lock (GIL). I suggest using the multiprocessing module instead:

import multiprocessing

pool = multiprocessing.Pool(4)
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
Note that this won't work in the interactive interpreter.
To avoid the usual FUD around the GIL: There wouldn't be any advantage to using threads for this example anyway. You want to use processes here, not threads, because they avoid a whole bunch of problems.
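For completeness, here is a minimal self-contained sketch of the same approach as a script. The calc_stuff body and the offset value are stand-ins (the real ones come from the question), and the pool is wrapped in an if __name__ == '__main__': guard, which multiprocessing requires on Windows and which also sidesteps the interactive-interpreter issue mentioned above:

import multiprocessing

offset = 3  # stand-in value; use whatever your real calculation needs

def calc_stuff(parameter):
    # hypothetical stand-in for the real calculation, returning three values
    return parameter, parameter ** 2, parameter ** 3

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        # map() preserves input order, so the three output tuples
        # stay aligned with the parameters, just like the serial loop
        out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
    print(out1, out2, out3)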
from joblib import Parallel, delayed

def process(i):
    return i * i

results = Parallel(n_jobs=2)(delayed(process)(i) for i in range(10))
print(results)  # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
The above works beautifully on my machine (Ubuntu; the joblib package was pre-installed, but can be installed via pip install joblib).
Taken from https://blog.dominodatalab.com/simple-parallelization/
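Applied to the loop from the question, the same joblib pattern would look roughly like this (a sketch; calc_stuff and offset are the asker's names and must already exist in your code):

from joblib import Parallel, delayed

# delayed() wraps each call so Parallel can dispatch it to a worker process
results = Parallel(n_jobs=4)(delayed(calc_stuff)(parameter=j * offset) for j in range(10))
# unzip the list of 3-tuples back into the three output lists
output1, output2, output3 = (list(t) for t in zip(*results))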
Edit on Mar 31, 2021: on joblib, multiprocessing, threading and asyncio:

- joblib in the above code uses import multiprocessing under the hood (and thus multiple processes), which is typically the best way to run CPU-bound work across cores, because of the GIL.
- You can let joblib use multiple threads instead of multiple processes, but this (or using import threading directly) is only beneficial if the threads spend considerable time on I/O (e.g. reading/writing to disk, sending an HTTP request); see the sketch after the timing example below. For I/O-bound work, the GIL does not block the execution of another thread.
- Since Python 3.7, as an alternative to threading, you can parallelise work with asyncio, but the same advice applies as for import threading (though, in contrast to the latter, only 1 thread will be used).
- Using multiple processes incurs overhead. You need to check yourself whether the code snippet above improves your wall time. Here is another one, for which I confirmed that joblib produces better results:
import time
from joblib import Parallel, delayed

def countdown(n):
    while n > 0:
        n -= 1
    return n

t = time.time()
for _ in range(20):
    print(countdown(10**7), end=" ")
print(time.time() - t)
# takes ~10.5 seconds on medium sized MacBook Pro

t = time.time()
results = Parallel(n_jobs=2)(delayed(countdown)(10**7) for _ in range(20))
print(results)
print(time.time() - t)
# takes ~6.3 seconds on medium sized MacBook Pro
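As for the thread-backed variant mentioned in the list above: joblib lets you opt into threads with the prefer="threads" argument. A minimal sketch, assuming an I/O-bound task (simulated here with time.sleep; fake_io_task is a made-up name), where threads pay off despite the GIL:

import time
from joblib import Parallel, delayed

def fake_io_task(i):
    # simulated I/O: the GIL is released while sleeping,
    # so other threads can make progress in the meantime
    time.sleep(1)
    return i

t = time.time()
results = Parallel(n_jobs=4, prefer="threads")(delayed(fake_io_task)(i) for i in range(8))
print(results)
print(time.time() - t)  # roughly 2 seconds instead of ~8 sequentially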