Stream large binary files with urllib2 to file
I use the following code to stream large files from the Internet into a local file:
fp = open(file, 'wb')
req = urllib2.urlopen(url)
for line in req:
    fp.write(line)
fp.close()
This works but it downloads quite slowly. Is there a faster way? (The files are large so I don't want to keep them in memory.)
Solution 1:
There's no reason to work line by line (that means small chunks, and it also makes Python find the line ends for you!). Just read it in bigger chunks, e.g.:
# from urllib2 import urlopen # Python 2
from urllib.request import urlopen # Python 3
response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            break
        f.write(chunk)
Experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements.
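If you want to measure the effect yourself, a rough timing sketch along these lines can help (Python 3 only; the URL, output path, and the helper name timed_download are placeholders for illustration, and network variance will dominate small differences):
import time
from urllib.request import urlopen

def timed_download(url, path, chunk_size):
    # Copy url to path in chunk_size reads and return the elapsed seconds.
    start = time.time()
    with urlopen(url) as response, open(path, 'wb') as f:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
    return time.time() - start

# Hypothetical test URL; substitute a large file of your own.
url = "http://www.example.com/largefile.bin"
for size in (4 * 1024, 16 * 1024, 64 * 1024, 256 * 1024):
    print(size, timed_download(url, "/tmp/chunk_test.bin", size))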
Solution 2:
You can also use shutil:
import shutil
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

def get_large_file(url, file, length=16*1024):
    req = urlopen(url)
    with open(file, 'wb') as fp:
        shutil.copyfileobj(req, fp, length)
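A quick usage example (the URL and local filename are placeholders); under the hood, shutil.copyfileobj does essentially the same fixed-size read/write loop as Solution 1:
get_large_file("http://www.example.com/largefile.bin", "largefile.bin")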
Solution 3:
I used to use the mechanize module and its Browser.retrieve() method. In the past it took 100% CPU and downloaded things very slowly, but a recent release fixed this bug and it works very quickly.
Example:
import mechanize
browser = mechanize.Browser()
browser.retrieve('http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.32-rc1.tar.bz2', 'Downloads/my-new-kernel.tar.bz2')
Mechanize is based on urllib2, so urllib2 may have a similar method... but I can't find one now.
Solution 4:
You can use urlretrieve() to download files:
Example:
try:
    from urllib import urlretrieve  # Python 2
except ImportError:
    from urllib.request import urlretrieve  # Python 3

url = "http://www.examplesite.com/myfile"
urlretrieve(url, "./local_file")
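urlretrieve also accepts an optional reporthook callback, which is handy for showing progress on large downloads. A minimal sketch (the URL, filename, and the progress function are just illustrative):
try:
    from urllib import urlretrieve  # Python 2
except ImportError:
    from urllib.request import urlretrieve  # Python 3

def progress(block_count, block_size, total_size):
    # Called by urlretrieve at the start and after each block is fetched.
    if total_size > 0:
        percent = min(100.0, 100.0 * block_count * block_size / total_size)
        print("%.1f%% downloaded" % percent)

urlretrieve("http://www.examplesite.com/myfile", "./local_file", reporthook=progress)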