Is there a good way to share a multiprocessing Lock between gunicorn workers? I am trying to write a JSON API with Flask. Some of the API calls will interact with a Python class that manages a running process (like ffmpeg for video conversion). When I scale up my number of web workers to more than one, how can I ensure that only one worker is interacting with the class at a time?

My initial thought was to use multiprocessing.Lock so the start() function can be atomic. I haven't figured out the right place to create the Lock so that a single instance is shared across all the workers:

# runserver.py
from flask import Flask
from werkzeug.middleware.proxy_fix import ProxyFix  # werkzeug.contrib.fixers was removed in Werkzeug 1.0
import dummy

app = Flask(__name__)

@app.route('/')
def hello():
    dummy.start()
    return "ffmpeg started"

app.wsgi_app = ProxyFix(app.wsgi_app)

if __name__ == '__main__':
    app.run()
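
For context, the app is served with multiple gunicorn workers via something like this (the worker count is just an example):

gunicorn -w 4 runserver:app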

Here is my dummy operation:

# dummy.py
from multiprocessing import Lock
import time

lock = Lock()

def start():
    lock.acquire()

    # TODO do work
    for i in range(10):
        print("did work %s" % i)
        time.sleep(1)

    lock.release()

When I refresh the page a few times, I see the output from each call interleaved, so the lock is clearly not being shared between workers.

Am I barking up the wrong tree here? Is there an easier way to make sure that only one copy of the processing class (here, just the dummy start() method) runs at a time? I might need something like Celery to run tasks (using just one worker), but that seems like overkill for my small project.


Solution 1:

I tried something, and it seems to work. I put preload_app = True in my gunicorn.conf, and now the lock seems to be shared. With preload_app, gunicorn imports the application in the master process before forking the workers, so every worker inherits the same Lock object instead of creating its own copy at import time. I am still looking into exactly what's happening here, but for now this is good enough; YMMV.
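
For reference, a minimal gunicorn.conf sketch (the worker count and bind address are just examples):

# gunicorn.conf -- load the app in the master process before forking,
# so every worker inherits the same module-level Lock from dummy.py
preload_app = True
workers = 4
bind = "127.0.0.1:8000"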

Solution 2:

Following peterw's answer, the workers can share the lock resource.

However, it is better to use a try/finally block to ensure the lock is always released, even if the work raises an exception:

# dummy.py
from multiprocessing import Lock
import time

lock = Lock()

def start():
    lock.acquire()

    try:
        # TODO do work
        for i in range(10):
            print("did work %s" % i)
            time.sleep(1)
    finally:
        lock.release()
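
As a design note, the same guarantee can be written more compactly with the lock's context manager, which acquires on entry and releases on exit even if the work raises, a minimal sketch:

def start():
    # "with lock:" is equivalent to acquire() plus try/finally release()
    with lock:
        for i in range(10):
            print("did work %s" % i)
            time.sleep(1)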

Solution 3:

Late addition:
If for some reason using preload_app is not feasible, then you need a named lock. This ensures that all processes use the same underlying lock object. Creating a multiprocessing.Lock() at import time gives each worker process its own lock, negating any value.

I saw this package but have not used it yet. It supplies a named lock scoped to one machine: all processes on the same machine will use the same lock, but across machine boundaries this solution is not appropriate.
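
If pulling in a dependency is not desirable, a machine-scoped named lock can also be built on a file lock via the standard-library fcntl module (POSIX only). This is a minimal sketch under that assumption, not the package mentioned above; the lock file path is arbitrary:

# dummy_filelock.py
import fcntl
import time

LOCK_PATH = "/tmp/ffmpeg-worker.lock"  # hypothetical path; any writable location works

def start():
    with open(LOCK_PATH, "w") as f:
        # blocks until no other process on this machine holds the lock
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            for i in range(10):
                print("did work %s" % i)
                time.sleep(1)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)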