Progress Bar while download file over http with Requests

I need to download a sizable (~200MB) file. I figured out how to download and save the file with here. It would be nice to have a progress bar to know how much has been downloaded. I found ProgressBar but I'm not sure how to incorperate the two together.

Here's the code I tried, but it didn't work.

bar = progressbar.ProgressBar(max_value=progressbar.UnknownLength)
with closing(download_file()) as r:
    for i in range(20):
        bar.update(i)

Solution 1:

I suggest you try tqdm, it's very easy to use. Example code for downloading with requests library:

from tqdm import tqdm
import requests

url = "http://www.ovh.net/files/10Mb.dat" #big file test
# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes= int(response.headers.get('content-length', 0))
block_size = 1024 #1 Kibibyte
progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
with open('test.dat', 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data))
        file.write(data)
progress_bar.close()
if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
    print("ERROR, something went wrong")

Solution 2:

The tqdm package now includes a function designed more specifically for this type of situation: wrapattr. You just wrap an object's read (or write) attribute, and tqdm handles the rest; there's no messing with block sizes or anything like that. Here's a simple download function that puts it all together with requests:

def download(url, filename):
    import functools
    import pathlib
    import shutil
    import requests
    from tqdm.auto import tqdm
    
    r = requests.get(url, stream=True, allow_redirects=True)
    if r.status_code != 200:
        r.raise_for_status()  # Will only raise for 4xx codes, so...
        raise RuntimeError(f"Request to {url} returned status code {r.status_code}")
    file_size = int(r.headers.get('Content-Length', 0))

    path = pathlib.Path(filename).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)

    desc = "(Unknown total file size)" if file_size == 0 else ""
    r.raw.read = functools.partial(r.raw.read, decode_content=True)  # Decompress if needed
    with tqdm.wrapattr(r.raw, "read", total=file_size, desc=desc) as r_raw:
        with path.open("wb") as f:
            shutil.copyfileobj(r_raw, f)

    return path

Solution 3:

It seems that there is a disconnect between the examples on the Progress Bar Usage page and what the code actually requires.

In the following example, note the use of maxval instead of max_value. Also note the use of .start() to initialized the bar. This has been noted in an Issue.

The n_chunk parameter denotes how many 1024 kb chunks to stream at once while looping through the request iterator.

import requests
import time

import numpy as np

import progressbar


url = "http://wikipedia.com/"

def download_file(url, n_chunk=1):
    r = requests.get(url, stream=True)
    # Estimates the number of bar updates
    block_size = 1024
    file_size = int(r.headers.get('Content-Length', None))
    num_bars = np.ceil(file_size / (n_chunk * block_size))
    bar =  progressbar.ProgressBar(maxval=num_bars).start()
    with open('test.html', 'wb') as f:
        for i, chunk in enumerate(r.iter_content(chunk_size=n_chunk * block_size)):
            f.write(chunk)
            bar.update(i+1)
            # Add a little sleep so you can see the bar progress
            time.sleep(0.05)
    return

download_file(url)

EDIT: Addressed comment about code clarity.
EDIT2: Fixed logic so bar reports 100% at completion. Credit to leovp's answer for using the 1024 kb block size.

Solution 4:

Also python library enlighten can be used, it is powerful, provides colorful progress bars and correctly works in Linux, Windows.

Below is code + live screen-cast. This code can be run here on repl.it.

import math
import requests, enlighten

url = 'https://upload.wikimedia.org/wikipedia/commons/a/ae/Arthur_Streeton_-_Fire%27s_on_-_Google_Art_Project.jpg?download'
fname = 'image.jpg'

# Should be one global variable
MANAGER = enlighten.get_manager()

r = requests.get(url, stream = True)
assert r.status_code == 200, r.status_code
dlen = int(r.headers.get('Content-Length', '0')) or None

with MANAGER.counter(color = 'green', total = dlen and math.ceil(dlen / 2 ** 20), unit = 'MiB', leave = False) as ctr, \
     open(fname, 'wb', buffering = 2 ** 24) as f:
    for chunk in r.iter_content(chunk_size = 2 ** 20):
        print(chunk[-16:].hex().upper())
        f.write(chunk)
        ctr.update()

Output (+ ascii-video)

ascii

Solution 5:

It seems like you're going to need to get the remote file size (answered here) to calculate how far along you are.

You could then update your progress bar while processing each chunk... if you know the total size and the size of the chunk, you can figure out when to update the progress bar.