Jupyter notebook never finishes processing using multiprocessing (Python 3)
Solution 1:
It seems that the problem in Jupyter notebook as in different ide is the design feature. Therefore, we have to write the function (prime_factor) into a different file and import the module. Furthermore, we have to take care of the adjustments. For example, in my case, I have coded the function into a file known as defs.py
def prime_factor(value):
factors = []
for divisor in range(2, value-1):
quotient, remainder = divmod(value, divisor)
if not remainder:
factors.extend(prime_factor(divisor))
factors.extend(prime_factor(quotient))
break
else:
factors = [value]
return factors
Then in the jupyter notebook I wrote the following lines
import multiprocessing
import random
from multiprocessing import Pool
import defs
if __name__ == '__main__':
pool = Pool()
to_factor = [ random.randint(100000, 50000000) for i in range(20)]
results = pool.map(defs.prime_factor, to_factor)
for value, factors in zip(to_factor, results):
print("The factors of {} are {}".format(value, factors))
This solved my problem
Solution 2:
To execute a function without having to write it into a separated file manually:
We can dynamically write the task to process into a temporary file, import it and execute the function.
from multiprocessing import Pool
from functools import partial
import inspect
def parallal_task(func, iterable, *params):
with open(f'./tmp_func.py', 'w') as file:
file.write(inspect.getsource(func).replace(func.__name__, "task"))
from tmp_func import task
if __name__ == '__main__':
func = partial(task, params)
pool = Pool(processes=8)
res = pool.map(func, iterable)
pool.close()
return res
else:
raise "Not in Jupyter Notebook"
We can then simply call it in a notebook cell like this:
def long_running_task(params, id):
# Heavy job here
return params, id
data_list = range(8)
for res in parallal_task(long_running_task, data_list, "a", 1, "b"):
print(res)
Ouput:
('a', 1, 'b') 0
('a', 1, 'b') 1
('a', 1, 'b') 2
('a', 1, 'b') 3
('a', 1, 'b') 4
('a', 1, 'b') 5
('a', 1, 'b') 6
('a', 1, 'b') 7
Note: If you're using Anaconda and if you want to see the progress of the heavy task, you can use print()
inside long_running_task()
. The content of the print will be displayed in the Anaconda Prompt console.