Reusing the same curl handle. Big performance increase?
Solution 1:
Crossposted from "Should I close cURL or not?" because I think it's relevant here too.
I benchmarked curl using a new handle for each request versus reusing the same handle, with the following code:
ob_start(); // capture the output so CURLOPT_RETURNTRANSFER isn't needed, keeping the curl options to a minimum
$start_time = microtime(true);
for ($i = 0; $i < 100; ++$i) {
    $rand = rand(); // cache-busting query parameter
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "http://www.google.com/?rand=" . $rand);
    curl_exec($ch);
    curl_close($ch);
}
$end_time = microtime(true);
ob_end_clean();
echo 'Curl without handle reuse: ' . ($end_time - $start_time) . '<br>';

ob_start(); // same as above
$start_time = microtime(true);
$ch = curl_init();
for ($i = 0; $i < 100; ++$i) {
    $rand = rand();
    curl_setopt($ch, CURLOPT_URL, "http://www.google.com/?rand=" . $rand);
    curl_exec($ch);
}
curl_close($ch);
$end_time = microtime(true);
ob_end_clean();
echo 'Curl with handle reuse: ' . ($end_time - $start_time) . '<br>';
and got the following results:
Curl without handle reuse: 8.5690529346466
Curl with handle reuse: 5.3703031539917
So reusing the same handle provides a substantial performance increase when connecting to the same server multiple times, presumably because the underlying keep-alive connection is reused. I then tried connecting to different servers:
$url_arr = array(
    'http://www.google.com/',
    'http://www.bing.com/',
    'http://www.yahoo.com/',
    'http://www.slashdot.org/',
    'http://www.stackoverflow.com/',
    'http://github.com/',
    'http://www.harvard.edu/',
    'http://www.gamefaqs.com/',
    'http://www.mangaupdates.com/',
    'http://www.cnn.com/'
);
ob_start(); // capture the output again to keep the curl options to a minimum
$start_time = microtime(true);
foreach ($url_arr as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
    curl_close($ch);
}
$end_time = microtime(true);
ob_end_clean();
echo 'Curl without handle reuse: ' . ($end_time - $start_time) . '<br>';

ob_start();
$start_time = microtime(true);
$ch = curl_init();
foreach ($url_arr as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
}
curl_close($ch);
$end_time = microtime(true);
ob_end_clean();
echo 'Curl with handle reuse: ' . ($end_time - $start_time) . '<br>';
and got the following results:
Curl without handle reuse: 3.7672290802002
Curl with handle reuse: 3.0146431922913
Still quite a substantial performance increase, even though connection reuse can't help across different hosts; presumably the savings come from not having to create and tear down a fresh handle for every request.
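To check how much of a given speed-up actually comes from connection reuse, curl can report how many new connections a transfer needed. A minimal sketch, assuming the same reuse pattern as above (the URLs are just placeholders):

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the response bodies out of the output

foreach (array('http://www.google.com/', 'http://www.google.com/search?q=curl') as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch);
    // CURLINFO_NUM_CONNECTS is the number of new connections the last
    // transfer had to open: 0 means an existing connection was reused
    echo $url . ': ' . curl_getinfo($ch, CURLINFO_NUM_CONNECTS) . " new connection(s)\n";
}
curl_close($ch);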
Solution 2:
It depends on whether the URLs are on the same servers or not. If they are, consecutive requests to the same server will reuse the connection (see CURLOPT_FORBID_REUSE, which disables this behavior).
If the URLs are only sometimes on the same server, you need to sort the URLs so that requests to the same host end up next to each other, because the default connection cache is limited to ten or twenty connections; a sketch of that grouping follows below.
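A minimal sketch of such grouping, assuming a plain list of URLs (the list itself is just an example):

// sort URLs by host so that consecutive requests can share a connection
$urls = array(
    'http://example.com/a',
    'http://example.org/b',
    'http://example.com/c',
);

usort($urls, function ($a, $b) {
    return strcmp(parse_url($a, PHP_URL_HOST), parse_url($b, PHP_URL_HOST));
});

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_exec($ch); // requests to the same host now run back to back
}
curl_close($ch);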
If they are all on different servers, there is no speed advantage to using the same handle.
With curl_multi_exec you can connect to different servers at the same time (in parallel). Even then you need some queuing so you don't open thousands of simultaneous connections; a sketch of that follows below.
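A minimal sketch of such a queued parallel download using the multi interface; the concurrency cap and the URL list are assumptions to adapt to your own workload:

$urls = array(/* ... your URLs ... */);
$maxConcurrent = 10; // assumed cap on simultaneous connections

// start one transfer and register it with the multi handle
function add_request($mh, $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
}

$mh = curl_multi_init();
$queue = $urls;
$inFlight = 0;

while ($queue || $inFlight) {
    // top the pool up to the concurrency cap
    while ($inFlight < $maxConcurrent && $queue) {
        add_request($mh, array_shift($queue));
        $inFlight++;
    }

    curl_multi_exec($mh, $running); // drive the transfers
    curl_multi_select($mh, 1.0);    // wait for socket activity instead of busy-looping

    // collect finished transfers and free their slots
    while ($info = curl_multi_info_read($mh)) {
        $ch = $info['handle'];
        // $body = curl_multi_getcontent($ch); // process the response here
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        $inFlight--;
    }
}

curl_multi_close($mh);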
Solution 3:
I have a similar scenario where I post data to a server. The data is chunked into requests of ~100 lines each, so it produces a lot of requests. In a benchmark run I compared two approaches for 12,614 lines (127 requests needed), plus authentication and another housekeeping request (129 requests total).
The requests go over a network to a server in the same country, not on-site. They are secured by TLS 1.2 (the handshake will also take its toll, but given that HTTPS is becoming more and more the default choice, this might even make it more similar to your scenario).
With cURL reuse:
one $curlHandle that is curl_init()'ed once and then only modified with CURLOPT_URL and CURLOPT_POSTFIELDS per request
Run 1: ~42.92s
Run 3: ~41.52s
Run 4: ~53.17s
Run 5: ~53.93s
Run 6: ~55.51s
Run 11: ~53.59s
Run 12: ~53.76s
Avg: 50.63s / Std. Dev: 5.8s
TCP conversations / SSL handshakes: 5 (measured with Wireshark)
Without cURL reuse:
one curl_init() per request
Run 2: ~57.67s
Run 7: ~62.13s
Run 8: ~71.59s
Run 9: ~70.70s
Run 10: ~59.12s
Avg: 64.24s / Std. Dev: 6.5s
TCP conversations / SSL handshakes: 129 (measured with Wireshark)
It isn't the largest of datasets, but one can say that all of the "reused" runs are faster than all of the "init" runs. The average times show a difference of almost 14 seconds.
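The pattern behind the faster runs looks roughly like the sketch below; the endpoint URL, the input file, and the chunking are placeholders, not the actual setup:

$lines = file('data.txt', FILE_IGNORE_NEW_LINES); // hypothetical input

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_URL, 'https://api.example.com/import'); // placeholder endpoint

foreach (array_chunk($lines, 100) as $chunk) {
    curl_setopt($ch, CURLOPT_POSTFIELDS, implode("\n", $chunk));
    curl_exec($ch);
    // the TCP connection and TLS session stay alive between iterations,
    // so only the first request pays the full handshake cost
}
curl_close($ch);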
Solution 4:
It depends on how many requests you will be making; the overhead of closing and reopening the handle for each one is negligible, but when doing a thousand? It could add up to a few seconds or more.
I believe curl_multi_init would be the fastest method (see the parallel sketch under Solution 2).
The whole thing depends on how many requests you need to do.