Only download a part of the document using python requests

Solution 1:

What you want to use here is called Range HTTP Header.

See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html (Specifically the bit on Range).

Solution 2:

I landed here from the question: Open first N characters of a url file with Python . However, I don't think that's a strict duplicate since it doesn't explicitly mention in the title whether it is mandatory to use the requests module or not. Also, it may be the case that range bytes are not supported by the server where the request is to be made, for whatever reason. In that case, I'd rather simply talk HTTP directly:

#!/usr/bin/env python

import socket
import time

TCP_HOST = 'stackoverflow.com' # This is the host we are going to query
TCP_PORT = 80 # This is the standard port for HTTP protocol
MAX_LIMIT = 1024 # This is the maximum size of the info we want in bytes

# Create the string to talk HTTP/1.1
MESSAGE = \
"GET /questions/23602412/only-download-a-part-of-the-document-using-python-requests HTTP/1.1\r\n" \
"HOST: stackoverflow.com\r\n" \
"User-Agent: Custom/0.0.1\r\n" \
"Accept: */*\r\n\n"

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # Create a socket
s.connect((TCP_HOST, TCP_PORT)) # Connect to remote socket at given address
s.send(MESSAGE) # Let's begin the transaction

time.sleep(0.1) # Machines are involved, but... oh, well!

# Keep reading from socket till max limit is reached
curr_size = 0
data = ""
while curr_size < MAX_LIMIT:
    data += s.recv(MAX_LIMIT - curr_size)
    curr_size = len(data)

s.close() # Mark the socket as closed

# Everyone likes a happy ending!
print data + "\n"
print "Length of received data:", len(data)

Sample run:

$ python sample.py
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
X-Frame-Options: SAMEORIGIN
X-Request-Guid: 3098c32c-3423-4e8a-9c7e-6dd530acdf8c
Content-Length: 73444
Accept-Ranges: bytes
Date: Fri, 05 Aug 2016 03:21:55 GMT
Via: 1.1 varnish
Connection: keep-alive
X-Served-By: cache-sin6926-SIN
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1470367315.724674,VS0,VE246
X-DNS-Prefetch-Control: off
Set-Cookie: prov=c33383b6-3a4d-730f-02b9-0eab064b3487; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly

<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/QAPage">
<head>

<title>http - Only download a part of the document using python requests - Stack Overflow</title>
    <link rel="shortcut icon" href="//cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d">
    <link rel="apple-touch-icon image_src" href="//cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon.png?v=c78bd457575a">
    <link rel="search" type="application/open

Length of received data: 1024

Only download a part of the document using python requests

Solution 1:

Solution 2:

Related

Recent Posts