Performance of FOR vs FOREACH in PHP
My personal opinion is to use what makes sense in the context. Personally, I almost never use a for loop for array traversal. I use it for other types of iteration, but foreach is just too easy... The time difference is going to be minimal in most cases.
The big thing to watch for is:
for ($i = 0; $i < count($array); $i++) {
That's an expensive loop, since it calls count() on every single iteration. So long as you're not doing that, I don't think it really matters...
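If you do have that pattern, the usual fix is to hoist the count out of the loop (a minimal sketch; the variable name is arbitrary):
$length = count($array);
for ($i = 0; $i < $length; $i++) {
    // count() now runs once, not once per iteration
}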
As for the reference making a difference, PHP uses copy-on-write, so if you don't write to the array, there will be relatively little overhead while looping. However, if you start modifying the array within the loop, that's where you'll start seeing differences between them (since one will need to copy the entire array, and the reference version can just modify it inline)...
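To make the by-value vs. by-reference distinction concrete, here is a minimal sketch (the array contents are arbitrary):
$data = array(1, 2, 3);

// By value: the first write forces PHP to separate (copy) the array.
foreach ($data as $k => $v) {
    $data[$k] = $v * 2;
}

// By reference: each element is modified in place, no copy needed.
foreach ($data as &$v) {
    $v *= 2;
}
unset($v); // always break the reference when done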
As for the iterators, foreach is equivalent to:
$it->rewind();
while ($it->valid()) {
$key = $it->key(); // If using the $key => $value syntax
$value = $it->current();
// Contents of loop in here
$it->next();
}
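For a concrete case, SPL's ArrayIterator implements Iterator, so foreach and the manual loop above do the same thing (a small illustrative sketch):
$it = new ArrayIterator(array('a' => 1, 'b' => 2));

// The sugared form...
foreach ($it as $key => $value) {
    echo "$key => $value\n";
}

// ...and the desugared equivalent.
$it->rewind();
while ($it->valid()) {
    $key = $it->key();
    $value = $it->current();
    echo "$key => $value\n";
    $it->next();
}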
As far as there being faster ways to iterate, it really depends on the problem. But I really need to ask, why? I understand wanting to make things more efficient, but I think you're wasting your time for a micro-optimization. Remember, Premature Optimization Is The Root Of All Evil
Edit: Based upon the comment, I decided to do a quick benchmark run...
$a = array();
for ($i = 0; $i < 10000; $i++) {
$a[] = $i;
}
$start = microtime(true);
foreach ($a as $k => $v) {
$a[$k] = $v + 1;
}
echo "Completed in ", microtime(true) - $start, " Seconds\n";
$start = microtime(true);
foreach ($a as $k => &$v) {
$v = $v + 1;
}
unset($v); // break the reference so the next loop doesn't write through it
echo "Completed in ", microtime(true) - $start, " Seconds\n";
$start = microtime(true);
foreach ($a as $k => $v) {}
echo "Completed in ", microtime(true) - $start, " Seconds\n";
$start = microtime(true);
foreach ($a as $k => &$v) {}
echo "Completed in ", microtime(true) - $start, " Seconds\n";
And the results:
Completed in 0.0073502063751221 Seconds
Completed in 0.0019769668579102 Seconds
Completed in 0.0011849403381348 Seconds
Completed in 0.00111985206604 Seconds
So if you're modifying the array in the loop, it's several times faster to use references...
And the overhead of just taking the reference is actually less than that of copying the array. So, on 5.3.2 at least, references appear to be significantly faster...
EDIT: Using PHP 8.0 I got the following:
Completed in 0.0005030632019043 Seconds
Completed in 0.00066304206848145 Seconds
Completed in 0.00016379356384277 Seconds
Completed in 0.00056815147399902 Seconds
I repeated this test numerous times, and the ranking of the results was consistent.
I'm not sure this is so surprising. Most people who code in PHP are not well versed in what PHP is actually doing at the bare metal. I'll state a few things, which will be true most of the time:
If you're not modifying the variable, by-value is faster in PHP. This is because it's reference counted anyway, and by-value gives it less to do. It knows that the second you modify that ZVAL (PHP's internal data structure for most types), it will have to break it off in a straightforward way (copy it and forget about the other ZVAL). But you never modify it, so it doesn't matter. References make that more complicated, with more bookkeeping needed to know what to do when you modify the variable. So if you're read-only, it's paradoxically better not to point that out with the &. I know, it's counterintuitive, but it's also true.
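One rough way to watch copy-on-write happen is via memory_get_usage() (a sketch; the exact numbers vary by PHP build):
$a = range(1, 100000);
echo memory_get_usage(), "\n"; // baseline with one array
$b = $a;                       // assignment alone does not copy
echo memory_get_usage(), "\n"; // roughly unchanged
$b[] = 1;                      // the first write triggers the real copy
echo memory_get_usage(), "\n"; // jumps by about the size of the array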
Foreach isn't slow. And for simple iteration, the condition it's testing against — "am I at the end of this array" — is done using native code, not PHP opcodes. Even if it's APC cached opcodes, it's still slower than a bunch of native operations done at the bare metal.
Using a for loop like "for ($i=0; $i < count($x); $i++)" is slow because of the count(), and because PHP (like really any interpreted language) has no way to evaluate at parse time whether anything inside the loop modifies the array. This prevents it from evaluating the count just once.
But even once you fix it with "$c=count($x); for ($i=0; $i<$c; $i++)", the $i<$c is a bunch of Zend opcodes at best, as is the $i++. Over the course of 100,000 iterations, this can matter. Foreach knows at the native level what to do: no PHP opcodes are needed to test the "am I at the end of this array" condition.
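To make the comparison concrete, these two loops do the same work, but the foreach version leaves the end-of-array test to the engine (process() is a hypothetical callback, not part of the answer above):
$c = count($x);
for ($i = 0; $i < $c; $i++) { // compare + increment + index fetch: opcodes every time
    process($x[$i]);
}

foreach ($x as $item) {       // the "am I at the end?" test runs in native code
    process($item);
}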
What about the old school "while(list() = each())" stuff? Well, each(), current(), etc. all involve at least one function call per iteration, which isn't slow, but isn't free either. Yes, those are PHP opcodes again! So while + list + each has its costs as well.
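For reference, that old-school form looks like the sketch below. Note that each() was deprecated in PHP 7.2 and removed in PHP 8.0, so it only runs on older versions:
reset($arr);
while (list($key, $value) = each($arr)) {
    // at least one function call (each) per iteration
}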
For these reasons foreach is understandably the best option for simple iteration.
And don't forget, it's also the easiest to read, so it's win-win.
One thing to watch out for in benchmarks (especially phpbench.com) is that even though the numbers are sound, the tests are not. A lot of the tests on phpbench.com do things that are trivial and abuse PHP's ability to cache array lookups, which skews the benchmarks; and the array-iteration tests don't measure real-world cases (no one writes empty for loops). I've done my own benchmarks, which I've found fairly reflective of real-world results, and they always show the language's native iteration syntax, foreach, coming out on top (surprise, surprise).
//make a nicely random array
$aHash1 = range( 0, 999999 );
$aHash2 = range( 0, 999999 );
shuffle( $aHash1 );
shuffle( $aHash2 );
$aHash = array_combine( $aHash1, $aHash2 );
$start1 = microtime(true);
foreach($aHash as $key=>$val) $aHash[$key]++;
$end1 = microtime(true);
$start2 = microtime(true);
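// note: each() was removed in PHP 8.0, so this block requires PHP 5 or 7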
while(list($key) = each($aHash)) $aHash[$key]++;
$end2 = microtime(true);
$start3 = microtime(true);
$key = array_keys($aHash);
$size = sizeOf($key);
for ($i=0; $i<$size; $i++) $aHash[$key[$i]]++;
$end3 = microtime(true);
$start4 = microtime(true);
foreach($aHash as &$val) $val++;
$end4 = microtime(true);
echo "foreach ".($end1 - $start1)."\n"; //foreach 0.947947025299
echo "while ".($end2 - $start2)."\n"; //while 0.847212076187
echo "for ".($end3 - $start3)."\n"; //for 0.439476966858
echo "foreach ref ".($end4 - $start4)."\n"; //foreach ref 0.0886030197144
//For these tests we MUST do an array lookup,
//since that is normally the *point* of iteration
//I'm also calling noop on it so that PHP doesn't
//optimize out the lookup.
function noop( $value ) {}
//Create an array of increasing indexes, w/ random values
$bHash = range( 0, 999999 );
shuffle( $bHash );
$bstart1 = microtime(true);
for($i = 0; $i < 1000000; ++$i) noop( $bHash[$i] );
$bend1 = microtime(true);
$bstart2 = microtime(true);
$i = 0; while($i < 1000000) { noop( $bHash[$i] ); ++$i; }
$bend2 = microtime(true);
$bstart3 = microtime(true);
foreach( $bHash as $value ) { noop( $value ); }
$bend3 = microtime(true);
echo "for ".($bend1 - $bstart1)."\n"; //for 0.397135972977
echo "while ".($bend2 - $bstart2)."\n"; //while 0.364789962769
echo "foreach ".($bend3 - $bstart3)."\n"; //foreach 0.346374034882
It's 2020 and things have greatly evolved with PHP 7.4 and opcache.
Here is the benchmark from the answer above, rerun as a Unix CLI script, without the echo and HTML parts.
The test was run locally on a regular computer.
php -v
PHP 7.4.6 (cli) (built: May 14 2020 10:02:44) ( NTS )
Modified benchmark script:
<?php
## preparations; just a simple environment state
$test_iterations = 100;
$test_arr_size = 1000;
// a shared function that makes use of the loop; this should
// ensure no funny business is happening to fool the test
function test($input)
{
//echo '<!-- '.trim($input).' -->';
}
// for each test we create a separate array; this should keep the
// array's internal representation or any optimizations from
// getting in the way.
// normal array
$test_arr1 = array();
$test_arr2 = array();
$test_arr3 = array();
// hash tables
$test_arr4 = array();
$test_arr5 = array();
for ($i = 0; $i < $test_arr_size; ++$i)
{
mt_srand();
$hash = md5(mt_rand());
$key = substr($hash, 0, 5).$i;
$test_arr1[$i] = $test_arr2[$i] = $test_arr3[$i] = $test_arr4[$key] = $test_arr5[$key]
= $hash;
}
## foreach
$start = microtime(true);
for ($j = 0; $j < $test_iterations; ++$j)
{
foreach ($test_arr1 as $k => $v)
{
test($v);
}
}
echo 'foreach '.(microtime(true) - $start)."\n";
## foreach (using reference)
$start = microtime(true);
for ($j = 0; $j < $test_iterations; ++$j)
{
foreach ($test_arr2 as &$value)
{
test($value);
}
}
echo 'foreach (using reference) '.(microtime(true) - $start)."\n";
## for
$start = microtime(true);
for ($j = 0; $j < $test_iterations; ++$j)
{
$size = count($test_arr3);
for ($i = 0; $i < $size; ++$i)
{
test($test_arr3[$i]);
}
}
echo 'for '.(microtime(true) - $start)."\n";
## foreach (hash table)
$start = microtime(true);
for ($j = 0; $j < $test_iterations; ++$j)
{
foreach ($test_arr4 as $k => $v)
{
test($v);
}
}
echo 'foreach (hash table) '.(microtime(true) - $start)."\n";
## for (hash table)
$start = microtime(true);
for ($j = 0; $j < $test_iterations; ++$j)
{
$keys = array_keys($test_arr5);
$size = sizeOf($test_arr5);
for ($i = 0; $i < $size; ++$i)
{
test($test_arr5[$keys[$i]]);
}
}
echo 'for (hash table) '.(microtime(true) - $start)."\n";
Output:
foreach 0.0032877922058105
foreach (using reference) 0.0029420852661133
for 0.0025191307067871
foreach (hash table) 0.0035080909729004
for (hash table) 0.0061779022216797
As you can see, the evolution is insane: about 560 times faster than reported in 2012.
On my machines and servers, following my numerous experiments, basic for loops are the fastest. This is even clearer with nested loops ($i, $j, $k...).
A for loop is also the most flexible in usage and, in my view, the most readable.
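A sketch of the nested case referred to above ($n is an arbitrary bound, not from the answer):
for ($i = 0; $i < $n; $i++) {
    for ($j = 0; $j < $n; $j++) {
        for ($k = 0; $k < $n; $k++) {
            // the index arithmetic stays explicit and cheap at every level
        }
    }
}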