How do I run a long-running job in the background in Python?

Solution 1:

Celery and RQ are overengineering for a simple task. Take a look at these docs: https://docs.python.org/3/library/concurrent.futures.html

Also check this example of how to run long-running jobs in the background of a Flask app: https://stackoverflow.com/a/39008301/5569578
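
For example, a minimal sketch using ThreadPoolExecutor (the long_job function and the five-second sleep are just placeholders for your actual work):

import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)


def long_job(n):
    # stand-in for the real long-running work
    time.sleep(n)
    return f"done after {n}s"


# submit() returns immediately; the job runs in a worker thread
future = executor.submit(long_job, 5)

print(future.done())    # False while the job is still running
print(future.result())  # blocks until the job finishes, then prints "done after 5s"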

Solution 2:

The more conventional approach to this kind of problem is to extract the action from the base application and run it outside, using a task queue system like Celery.

Using this tutorial, you can create your task and trigger it from your web application.

from flask import Flask
from celery import Celery


def make_celery(app):
    # Minimal helper that builds a Celery instance from the Flask app's config.
    # (The Flask/Celery tutorial's version also wraps tasks in the app context.)
    celery = Celery(app.import_name,
                    broker=app.config['CELERY_BROKER_URL'],
                    backend=app.config['CELERY_RESULT_BACKEND'])
    celery.conf.update(app.config)
    return celery


app = Flask(__name__)
app.config.update(
    CELERY_BROKER_URL='redis://localhost:6379',
    CELERY_RESULT_BACKEND='redis://localhost:6379'
)
celery = make_celery(app)


@celery.task()
def add_together(a, b):
    return a + b

Then you can run:

>>> result = add_together.delay(23, 42)
>>> result.wait()
65

Just remember that you need to run the worker separately:

celery -A your_application worker
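
In a web view you usually don't want to block on wait(); here is a sketch of kicking off the task and polling it instead (the route names are just for illustration):

from flask import jsonify


@app.route('/add')
def start_add():
    # fire off the task and return its id right away
    result = add_together.delay(23, 42)
    return jsonify(task_id=result.id)


@app.route('/status/<task_id>')
def task_status(task_id):
    # look the task up by id and report whether it has finished yet
    result = add_together.AsyncResult(task_id)
    if result.ready():
        return jsonify(state=result.state, value=result.result)
    return jsonify(state=result.state)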

Solution 3:

Well, although your approach is not incorrect, it may basically lead your program to run out of available threads. As Ali mentioned, a general approach is to use a job queue like RQ or Celery. However, you don't need to extract functions from your application to use those libraries. For Flask, I recommend using Flask-RQ. It's simple to get started:

RQ

pip install flask-rq

Just remember to install Redis before using it in your Flask app.
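
A quick way to check that Redis is reachable before wiring up the queue (this uses the redis-py package, which RQ pulls in as a dependency):

from redis import Redis

# ping the default localhost:6379 instance; raises ConnectionError if Redis isn't running
Redis().ping()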

And simply use the @job decorator on your Flask functions:

from flask_rq import job  # formerly flask.ext.rq; the flask.ext namespace is deprecated


@job
def process(i):
    # long stuff to process
    pass


process.delay(3)

And finally, you need to run rqworker to start the worker:

rqworker
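
Once a job is enqueued, you can keep the returned job object and check on it later; a sketch, assuming .delay() hands back a regular RQ Job as in plain RQ:

job_handle = process.delay(3)

print(job_handle.get_id())     # the job's id in the queue
print(job_handle.is_finished)  # False until the worker has processed it
print(job_handle.result)       # None until finished, then the return value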

See the RQ docs for more info. RQ is designed for simple, long-running processes.

Celery

Celery is more complicated, has a huge list of features, and is not recommended if you are new to job queues and distributed processing methods.

Greenlets

Greenlets have switches that let you switch between long-running processes. You can use greenlets to run your processes. The benefit is that you don't need Redis or another worker; instead, you have to redesign your functions to be compatible:

from greenlet import greenlet

def test1():
    print(12)
    gr2.switch()  # hand control over to test2
    print(34)

def test2():
    print(56)
    gr1.switch()  # hand control back to test1
    print(78)

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()  # prints 12, 56, 34; 78 never runs because test1 finishes first

Solution 4:

Your approach is fine and will totally work, but why reinvent the background worker for Python web applications when a widely accepted solution exists, namely Celery?

I'd need to see a lot of tests before I trusted any home-rolled code for such an important task.

Plus, Celery gives you features like task persistence and the ability to distribute workers across multiple machines.