How to partially download a remote file with cURL?
You can also set the Range header with the php-curl extension, via CURLOPT_RANGE.
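$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.spiegel.de/');
// byte ranges are inclusive: 0-500 asks for the first 501 bytes
curl_setopt($ch, CURLOPT_RANGE, '0-500');
// has no effect since PHP 5.1.3; only matters on older versions
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
// return the body from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
echo $result;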
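To find out whether the server actually honored the range, you can look at the status code: 206 Partial Content means a range was served, while a plain 200 means the whole file is coming. A rough sketch of that check follows; range_was_honored and fetch_range are made-up names for illustration, not part of php-curl.

```php
<?php
// Hypothetical helper: 206 Partial Content means the server served a range;
// 200 means it ignored the Range header and is sending the whole file.
function range_was_honored(int $httpStatus): bool {
    return $httpStatus === 206;
}

// Sketch of a wrapper around the request shown above. The name and the
// shape of the returned array are assumptions for illustration.
function fetch_range(string $url, int $first, int $last): array {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RANGE, $first . '-' . $last); // inclusive byte range
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $body = curl_exec($ch); // false if the transfer failed
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return [
        'partial' => range_was_honored($status),
        'body'    => $body,
    ];
}
```

If 'partial' comes back false, the body is the full document and you may want to cut it down yourself.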
But as noted before, if the server doesn't honor this header and sends the whole file anyway, curl will download all of it. For example, http://www.php.net ignores the header. You can (in addition) set a write function callback and abort the request once enough data has been received, e.g.:
// php 5.3+ only
// use function writefn($ch, $chunk) { ... } for earlier versions
$writefn = function($ch, $chunk) {
    static $data = '';
    static $limit = 500; // 500 bytes, it's only a test

    $len = strlen($data) + strlen($chunk);
    if ($len >= $limit) {
        // keep only the bytes still missing, print them, then abort
        $data .= substr($chunk, 0, $limit - strlen($data));
        echo strlen($data), ' ', $data;
        return -1; // any value other than strlen($chunk) aborts the transfer
    }

    $data .= $chunk;
    return strlen($chunk);
};
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.php.net/');
curl_setopt($ch, CURLOPT_RANGE, '0-500');
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);
$result = curl_exec($ch);
curl_close($ch);
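The cut-off logic in the callback can be tried without any network access by feeding it chunks by hand. This is only a minimal sketch: make_limited_writer is a made-up name, and unlike the version above it collects the data in a caller-supplied buffer rather than a static variable.

```php
<?php
// Hypothetical factory: builds a write callback that keeps at most
// $limit bytes in $buffer and aborts once the limit is reached.
function make_limited_writer(int $limit, string &$buffer): Closure {
    return function ($ch, string $chunk) use ($limit, &$buffer): int {
        $room = $limit - strlen($buffer);
        if (strlen($chunk) >= $room) {
            $buffer .= substr($chunk, 0, $room);
            return -1; // anything other than strlen($chunk) aborts the transfer
        }
        $buffer .= $chunk;
        return strlen($chunk);
    };
}

// Feed the callback by hand; $ch is not used by it, so null is fine here.
$buffer = '';
$write  = make_limited_writer(10, $buffer);
$first  = $write(null, 'abcdef'); // all 6 bytes fit
$second = $write(null, 'ghijkl'); // only 4 more bytes fit, so it aborts
```

In a real transfer you would pass $write as CURLOPT_WRITEFUNCTION. After the abort, curl_exec() returns false and curl_errno() should report CURLE_WRITE_ERROR, which here is the expected outcome rather than a failure.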
Get the first 100 bytes of a document:
curl -r 0-99 http://www.get.this
This is from the curl manual. Make sure you have a recent version of curl.
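Note that the range is inclusive, which is why the first 100 bytes are 0-99 and not 0-100. If you build these strings in PHP for CURLOPT_RANGE, something like the following may help; the function names are made up for illustration.

```php
<?php
// Hypothetical helpers for building inclusive byte-range strings,
// usable with CURLOPT_RANGE or with `curl -r`.
function first_bytes_range(int $count): string {
    return '0-' . ($count - 1); // first $count bytes
}

function last_bytes_range(int $count): string {
    return '-' . $count;        // last $count bytes
}

function from_offset_range(int $offset): string {
    return $offset . '-';       // everything from $offset to the end
}

echo first_bytes_range(100), "\n"; // prints 0-99
```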
Thanks for the nice solution, VolkerK. I needed to use this code as a function, though, so here's what I came up with; I hope it's useful for others. The main difference is use ($limit, &$datadump): the limit can be passed in as a parameter, and the by-reference variable $datadump lets the function return the collected data as its result. I also added CURLOPT_USERAGENT because some websites won't allow access without a User-Agent header.
See http://php.net/manual/en/functions.anonymous.php for details on anonymous functions and the use clause.
function curl_get_contents_partial($url, $limit) {
    $datadump = ''; // collects the body; stays valid even if the document is shorter than $limit
    $writefn = function($ch, $chunk) use ($limit, &$datadump) {
        $len = strlen($datadump) + strlen($chunk);
        if ($len >= $limit) {
            // keep only the bytes still missing, then abort the transfer
            $datadump .= substr($chunk, 0, $limit - strlen($datadump));
            return -1;
        }
        $datadump .= $chunk;
        return strlen($chunk);
    };
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    //curl_setopt($ch, CURLOPT_RANGE, '0-1000'); // not honored by many sites, maybe just remove it altogether
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);
    curl_exec($ch); // returns false when the callback aborts; the data is in $datadump
    curl_close($ch);
    return $datadump;
}
Usage:
$page = curl_get_contents_partial('http://some.webpage.com', 1000); // read the first 1000 bytes
echo $page; // or do whatever with the result