Flask App: Update progress bar while function runs

I'm building a fairly simple web app in Flask that performs functions via a website's API. My users fill out a form with their account URL and API token; when they submit the form, a Python script exports PDFs from their account via the API. This function can take a long time, so I want to display a Bootstrap progress bar on the form page indicating how far along the script is. My question is: how do I update the progress bar while the function is running? Here is a simplified version of what I'm talking about.

views.py:

@app.route ('/export_pdf', methods = ['GET', 'POST'])
def export_pdf():
    form = ExportPDF()
    if form.validate_on_submit():
      try:
        export_pdfs.main_program(form.account_url.data,
          form.api_token.data)
        flash ('PDFs exported')
        return redirect(url_for('export_pdf'))
      except TransportException as e:
        s = e.content
        result = re.search('<error>(.*)</error>', s)
        flash('There was an authentication error: ' + result.group(1))
      except FailedRequest as e:
        flash('There was an error: ' + e.error)
    return render_template('export_pdf.html', title = 'Export PDFs', form = form)

export_pdf.html:

{% extends "base.html" %}

{% block content %}
{% include 'flash.html' %}
<div class="well well-sm">
  <h3>Export PDFs</h3>
  <form class="navbar-form navbar-left" action="" method ="post" name="receipt">
    {{form.hidden_tag()}}
    <br>
    <div class="control-group{% if form.errors.account_url %} error{% endif %}">
      <label class="control-label" for="account_url">Enter Account URL:</label>
      <div class="controls">
        {{ form.account_url(size = 50, class = "span4")}}
        {% for error in form.errors.account_url %}
          <span class="help-inline">[{{error}}]</span><br>
        {% endfor %}
      </div>
    </div>
    <br>
    <div class="control-group{% if form.errors.api_token %} error{% endif %}">
      <label class="control-label" for="api_token">Enter API Token:</label>
      <div class="controls">
        {{ form.api_token(size = 50, class = "span4")}}
        {% for error in form.errors.api_token %}
          <span class="help-inline">[{{error}}]</span><br>
        {% endfor %}
      </div>
    </div>
    <br>
    <button type="submit" class="btn btn-primary btn-lg">Submit</button>
  <br>
  <br>
  <div class="progress progress-striped active">
    <div class="progress-bar"  role="progressbar" aria-valuenow="0" aria-valuemin="0" aria-valuemax="100" style="width: 0%">
      <span class="sr-only"></span>
    </div>
  </div>
</form>
</div>
{% endblock %}

and export_pdfs.py:

def main_program(url, token):
    api_caller = api.TokenClient(url, token)
    path = os.path.expanduser('~/Desktop/'+url+'_pdfs/')
    pdfs = list_all(api_caller.pdf.list, 'pdf')
    total = 0
    count = 1
    for pdf in pdfs:
        total = total + 1
    for pdf in pdfs:
        header, body = api_caller.getPDF(pdf_id=int(pdf.pdf_id))
        with open('%s.pdf' % (pdf.number), 'wb') as f:
          f.write(body)
        count = count + 1
        if count % 50 == 0:
          time.sleep(1)

In that last function, total holds the number of PDFs I will export and count is a running count while it is processing. How can I send the current progress to my .html file to fit within the style attribute of the progress bar? Preferably in a way that lets me reuse the same tool for progress bars on other pages. Let me know if I haven't provided enough info.


Solution 1:

As some others suggested in the comments, the simplest solution is to run your exporting function in another thread, and let your client pull progress information with another request. There are multiple approaches to handle this particular task. Depending on your needs, you might opt for a more or less sophisticated one.

Here's a very (very) minimal example of how to do it with threads:

import random
import threading
import time

from flask import Flask


class ExportingThread(threading.Thread):
    def __init__(self):
        self.progress = 0
        super().__init__()

    def run(self):
        # Your exporting stuff goes here ...
        for _ in range(10):
            time.sleep(1)
            self.progress += 10


exporting_threads = {}
app = Flask(__name__)
app.debug = True


@app.route('/')
def index():
    global exporting_threads

    thread_id = random.randint(0, 10000)
    exporting_threads[thread_id] = ExportingThread()
    exporting_threads[thread_id].start()

    return 'task id: #%s' % thread_id


@app.route('/progress/<int:thread_id>')
def progress(thread_id):
    global exporting_threads

    return str(exporting_threads[thread_id].progress)


if __name__ == '__main__':
    app.run()

In the index route (/) we spawn a thread for each exporting task and return that task's ID, so the client can later query its progress through the progress route (/progress/<thread_id>). The exporting thread updates its progress value whenever it sees fit.
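
To connect this to the export loop in the question, the thread can derive a percentage from how many items it has handled so far. The following is only a sketch: ExportTask and handle_item are hypothetical names (not from the question), and handle_item stands in for whatever fetches and writes a single PDF.

```python
import threading


class ExportTask(threading.Thread):
    """Runs a job over `items` and exposes progress as an integer percentage."""

    def __init__(self, items, handle_item):
        super().__init__(daemon=True)
        self.items = list(items)        # materialize so len() is available
        self.handle_item = handle_item  # hypothetical per-item callback
        self.progress = 0               # 0..100, read by the /progress route

    def run(self):
        total = len(self.items)
        if total == 0:
            # Nothing to do, so report completion immediately.
            self.progress = 100
            return
        for done, item in enumerate(self.items, start=1):
            self.handle_item(item)
            self.progress = done * 100 // total


if __name__ == "__main__":
    exported = []
    task = ExportTask(["a", "b", "c", "d"], exported.append)
    task.start()
    task.join()
    print(task.progress, exported)
```

In the question's code, the list returned by list_all would play the role of items, and the body of the second for loop would become handle_item.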

On the client side, you would get something like this (this example uses jQuery):

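function check_progress(task_id, progress_bar) {
    function worker() {
        $.get('progress/' + task_id, function(data) {
            // The route returns the percentage as plain text.
            var progress = parseInt(data, 10);
            // set_progress is a placeholder for however you update your bar.
            progress_bar.set_progress(progress);
            if (progress < 100) {
                // Not done yet: ask again in one second.
                setTimeout(worker, 1000);
            }
        });
    }
    // Start polling right away.
    worker();
}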

As I said, this example is very minimal and you should probably go for a slightly more sophisticated approach. Usually, we would store the progress of a particular task in a database or a cache of some sort, so that we don't rely on a shared in-process structure, which avoids most of the memory and concurrency issues my example has (the dictionary is never cleaned up, random IDs can collide, and it only works while every request is served by the same process).
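
If you stay with an in-process structure for now, guarding it with a lock and using unguessable IDs removes some of those issues. This is a rough sketch, assuming a single server process; ProgressRegistry and its method names are made up for illustration.

```python
import threading
import uuid


class ProgressRegistry:
    """Thread-safe map of task ID -> progress percentage (single process only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._progress = {}

    def create(self):
        # uuid4 avoids the collisions a small random integer range can produce.
        task_id = uuid.uuid4().hex
        with self._lock:
            self._progress[task_id] = 0
        return task_id

    def update(self, task_id, percent):
        # Clamp to 0..100 so a buggy worker cannot overflow the bar.
        with self._lock:
            self._progress[task_id] = max(0, min(100, int(percent)))

    def get(self, task_id):
        # Returns None for unknown IDs so the route can answer with a 404.
        with self._lock:
            return self._progress.get(task_id)

    def discard(self, task_id):
        # Call once the client has seen 100 to keep the map from growing.
        with self._lock:
            self._progress.pop(task_id, None)
```

This still does not help once you run several worker processes, which is where an external store comes in.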

Redis (https://redis.io) is an in-memory data store that is generally well suited to this kind of task. It integrates very nicely with Python (https://pypi.python.org/pypi/redis).
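
As a sketch of what that could look like with redis-py: the helper names and the export:progress: key prefix are assumptions of mine, and client is expected to be something like redis.Redis() (anything with compatible set and get methods works, which also makes the helpers easy to try without a server).

```python
def progress_key(task_id):
    # Hypothetical key naming scheme; pick whatever fits your app.
    return "export:progress:%s" % task_id


def save_progress(client, task_id, percent, ttl_seconds=3600):
    # `ex` sets an expiry in seconds, so abandoned tasks clean themselves up.
    client.set(progress_key(task_id), int(percent), ex=ttl_seconds)


def load_progress(client, task_id):
    # redis-py returns bytes by default (or None if the key is missing);
    # int() accepts both bytes and str.
    raw = client.get(progress_key(task_id))
    return None if raw is None else int(raw)
```

The exporting thread would call save_progress as it goes, and the /progress route would return load_progress for the requested ID.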

Solution 2:

I ran this simple but educational Flask SSE (server-sent events) implementation on localhost. To handle a third-party (user-uploaded) library in GAE:

  1. Create a directory named lib in your root path.
  2. Copy the gevent library directory into the lib directory.
  3. Add these lines to your main.py:

    import sys
    sys.path.insert(0,'lib')
    
  4. That's all. If you use the lib directory from a child folder, use a relative reference: sys.path.insert(0, '../../blablabla/lib')

From http://flask.pocoo.org/snippets/116/

# author: [email protected]
#
# Make sure your gevent version is >= 1.0
import gevent
# gevent.wsgi was removed in gevent 1.3; gevent.pywsgi provides WSGIServer
from gevent.pywsgi import WSGIServer
from gevent.queue import Queue

from flask import Flask, Response

import time


# SSE "protocol" is described here: http://mzl.la/UPFyxY
class ServerSentEvent(object):

    def __init__(self, data):
        self.data = data
        self.event = None
        self.id = None
        self.desc_map = {
            self.data : "data",
            self.event : "event",
            self.id : "id"
        }

    def encode(self):
        if not self.data:
            return ""
        # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
        lines = ["%s: %s" % (v, k)
                 for k, v in self.desc_map.items() if k]

        return "%s\n\n" % "\n".join(lines)

app = Flask(__name__)
subscriptions = []

# Client code consumes like this.
@app.route("/")
def index():
    debug_template = """
     <html>
       <head>
       </head>
       <body>
         <h1>Server sent events</h1>
         <div id="event"></div>
         <script type="text/javascript">

         var eventOutputContainer = document.getElementById("event");
         var evtSrc = new EventSource("/subscribe");

         evtSrc.onmessage = function(e) {
             console.log(e.data);
             eventOutputContainer.innerHTML = e.data;
         };

         </script>
       </body>
     </html>
    """
    return(debug_template)

@app.route("/debug")
def debug():
    return "Currently %d subscriptions" % len(subscriptions)

@app.route("/publish")
def publish():
    #Dummy data - pick up from request for real data
    def notify():
        msg = str(time.time())
        for sub in subscriptions[:]:
            sub.put(msg)

    gevent.spawn(notify)

    return "OK"

@app.route("/subscribe")
def subscribe():
    def gen():
        q = Queue()
        subscriptions.append(q)
        try:
            while True:
                result = q.get()
                ev = ServerSentEvent(str(result))
                yield ev.encode()
        except GeneratorExit: # Or maybe use flask signals
            subscriptions.remove(q)

    return Response(gen(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.debug = True
    server = WSGIServer(("", 5000), app)
    server.serve_forever()
    # Then visit http://localhost:5000 to subscribe 
    # and send messages by visiting http://localhost:5000/publish
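
To use this for a progress bar rather than a broadcast, the subscribe generator could stream one task's progress until it reaches 100. A minimal sketch, assuming a read_progress callable that returns the current percentage (it is hypothetical, as are the function names here):

```python
import time


def sse_message(data, event=None):
    """Format one server-sent event; `data` must be a single-line value."""
    lines = []
    if event is not None:
        lines.append("event: %s" % event)
    lines.append("data: %s" % data)
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def progress_stream(read_progress, poll_seconds=1.0, sleep=time.sleep):
    """Yield an SSE message each time the progress changes, ending at 100."""
    last = None
    while True:
        current = read_progress()
        if current != last:
            yield sse_message(current)
            last = current
        if current >= 100:
            break
        # Under gevent, pass sleep=gevent.sleep so other greenlets can run.
        sleep(poll_seconds)
```

The generator can be returned the same way as gen() in the subscribe route, that is, wrapped in Response(..., mimetype="text/event-stream"), and the onmessage handler on the page would set the bar's width from e.data.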