what is faster: in_array or isset? [closed]
This question is mostly for my own benefit, as I always like to write optimized code that also runs well on cheap, slow servers (or servers with a lot of traffic).
I looked around and was not able to find an answer. I was wondering which of these two examples is faster, keeping in mind that the array's keys are not important in my case (pseudo-code, naturally):
<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
if(!in_array($new_val, $a)){
$a[] = $new_val;
//do other stuff
}
}
?>
<?php
$a = array();
while($new_val = 'get over 100k email addresses already lowercased'){
if(!isset($a[$new_val])){
$a[$new_val] = true;
//do other stuff
}
}
?>
As the point of the question is not array key collisions, I would like to add that if you are worried about colliding inserts for $a[$new_val], you can use $a[md5($new_val)]. It can still cause collisions, but it would make a possible DoS attack harder when reading from a user-provided file (http://nikic.github.com/2011/12/28/Supercolliding-a-PHP-array.html)
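A minimal sketch of the hashed-key idea, assuming the input is a plain list of strings. The sample addresses and the `$seen` and `$unique` names are made up for illustration; the original value is stored under its digest so it can still be recovered afterwards.

```php
<?php
// Hypothetical input standing in for lines read from a user-provided file.
$input = ['a@example.com', 'b@example.com', 'a@example.com'];

$seen = [];
foreach ($input as $new_val) {
    $key = md5($new_val);           // 32-character hex digest used as the array key
    if (!isset($seen[$key])) {
        $seen[$key] = $new_val;     // keep the original value alongside its digest
        // do other stuff
    }
}

// The unique values, in first-seen order.
$unique = array_values($seen);
print_r($unique);
```

Hashing every value adds its own cost, so this trades some speed for keys the input's author cannot choose directly.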
The answers so far are spot-on. Using isset in this case is faster because
- It uses an O(1) hash search on the key, whereas in_array must check every value until it finds a match.
- Being an opcode, it has less overhead than calling the in_array built-in function.
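To make the two points concrete, here is a small sketch of both approaches run on the same data. The sample addresses and the `$byValue` and `$byKey` names are made up for illustration; it only shows that both produce the same set, not how fast they are.

```php
<?php
// Hypothetical sample data; the addresses are made up for illustration.
$emails = ['a@example.com', 'b@example.com', 'a@example.com', 'c@example.com'];

// Variant 1: addresses stored as values, membership checked with in_array().
// Each check scans the array until it finds a match (or reaches the end).
$byValue = [];
foreach ($emails as $email) {
    if (!in_array($email, $byValue, true)) {
        $byValue[] = $email;
    }
}

// Variant 2: addresses stored as keys, membership checked with isset().
// Each check is a single hash lookup on the key.
$byKey = [];
foreach ($emails as $email) {
    if (!isset($byKey[$email])) {
        $byKey[$email] = true;
    }
}

// Both variants end up with the same unique addresses.
print_r($byValue);
print_r(array_keys($byKey));
```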
These can be demonstrated by using an array with values (10,000 in the test below), forcing in_array to do more searching.
isset: 0.009623
in_array: 1.738441
This builds on Jason's benchmark by filling in some random values and occasionally finding a value that exists in the array. All random, so beware that times will fluctuate.
$a = array();
for ($i = 0; $i < 10000; ++$i) {
$v = rand(1, 1000000);
$a[$v] = $v;
}
echo "Size: ", count($a), PHP_EOL;
$start = microtime( true );
for ($i = 0; $i < 10000; ++$i) {
isset($a[rand(1, 1000000)]);
}
$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;
$start = microtime( true );
for ($i = 0; $i < 10000; ++$i) {
in_array(rand(1, 1000000), $a);
}
$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;
Which is faster: isset() vs in_array()?
isset() is faster.
While it should be obvious, isset() only tests a single value, whereas in_array() will iterate over the entire array, testing the value of each element.
Rough benchmarking is quite easy using microtime().
Results:
Total time isset(): 0.002857
Total time in_array(): 0.017103
Note: Results were similar regardless of whether the item existed or not.
Code:
<?php
$a = array();
$start = microtime( true );
for ($i = 0; $i < 10000; ++$i) {
isset($a['key']);
}
$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;
$start = microtime( true );
for ($i = 0; $i < 10000; ++$i) {
in_array('key', $a);
}
$total_time = microtime( true ) - $start;
echo "Total time: ", number_format($total_time, 6), PHP_EOL;
exit;
Additional Resources
I'd encourage you to also look at:
- PHP Benchmark
- PHPPerf
- XDebug
Using isset() takes advantage of speedier lookup because it uses a hash table, avoiding the need for O(n) searches.
The key is first hashed using the djb hash function to determine the bucket of similarly hashed keys in O(1). The bucket is then searched iteratively until the exact key is found, which is O(n) in the number of keys that share that bucket.
Barring any intentional hash collisions, this approach yields much better performance than in_array().
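The two-step lookup described above can be sketched with a toy hash table written in plain PHP. This is not PHP's internal implementation: `toy_hash`, `toy_insert`, `toy_isset` and `BUCKET_COUNT` are hypothetical names, the table size is fixed at 8, duplicate keys are not handled, and the 32-bit masking assumes a 64-bit PHP build.

```php
<?php
// Assumed small, fixed number of buckets, chosen only for illustration.
const BUCKET_COUNT = 8;

// Toy "times 33" string hash in the style of the djb function.
// The result is masked to 32 bits so the integer never overflows on 64-bit PHP.
function toy_hash(string $key): int
{
    $h = 5381;
    $len = strlen($key);
    for ($i = 0; $i < $len; $i++) {
        $h = (($h * 33) + ord($key[$i])) & 0xFFFFFFFF;
    }
    return $h;
}

// Step 1 (O(1)): pick the bucket from the hash. Then append to that bucket's chain.
function toy_insert(array &$table, string $key, $value): void
{
    $bucket = toy_hash($key) % BUCKET_COUNT;
    $table[$bucket][] = [$key, $value];
}

// Step 1 (O(1)): jump to the bucket. Step 2: scan only that bucket's chain.
function toy_isset(array $table, string $key): bool
{
    $bucket = toy_hash($key) % BUCKET_COUNT;
    foreach ($table[$bucket] ?? [] as $entry) {
        if ($entry[0] === $key) {
            return true;
        }
    }
    return false;
}

$table = [];
toy_insert($table, 'a@example.com', true);
toy_insert($table, 'b@example.com', true);

var_dump(toy_isset($table, 'a@example.com'));
var_dump(toy_isset($table, 'z@example.com'));
```

If many keys land in the same bucket, the scan in `toy_isset` gets longer, which is the degenerate case that intentional collisions try to provoke.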
Note that when using isset() in the way that you've shown, passing the final values to another function requires using array_keys() to create a new array. A memory compromise can be made by storing the data in both the keys and values.
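Both options might look roughly like this. The sample addresses and the `$seen`, `$asList` and `$both` names are made up for illustration.

```php
<?php
// Hypothetical sample data; the addresses are made up for illustration.
$emails = ['a@example.com', 'b@example.com', 'a@example.com'];

// Keys only: cheap membership test, but the data lives in the keys.
$seen = [];
foreach ($emails as $email) {
    $seen[$email] = true;
}
// A new array has to be built before the values can be passed on as a list.
$asList = array_keys($seen);

// Keys and values: each address is stored under itself, so the array can be
// iterated for its values directly while isset() still works on the keys.
$both = [];
foreach ($emails as $email) {
    $both[$email] = $email;
}

print_r($asList);
print_r($both);
```

One thing to watch with either option: PHP converts keys that look like decimal integers (for example '123') to integer keys, so array_keys() returns them as integers rather than strings.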
Update
A good way to see how your code design decisions affect runtime performance is to check out the compiled version of your script:
echo isset($arr[123])
compiled vars: !0 = $arr
line # * op fetch ext return operands
-----------------------------------------------------------------------------
1 0 > ZEND_ISSET_ISEMPTY_DIM_OBJ 2000000 ~0 !0, 123
1 ECHO ~0
2 > RETURN null
echo in_array(123, $arr)
compiled vars: !0 = $arr
line # * op fetch ext return operands
-----------------------------------------------------------------------------
1 0 > SEND_VAL 123
1 SEND_VAR !0
2 DO_FCALL 2 $0 'in_array'
3 ECHO $0
4 > RETURN null
Not only does in_array() use a relatively inefficient O(n) search, it also needs to be called as a function (DO_FCALL), whereas isset() uses a single opcode (ZEND_ISSET_ISEMPTY_DIM_OBJ) for this.
The second would be faster, as it looks only for that specific array key and does not need to iterate over the array until the value is found (in_array will look at every array element if the value is not found).
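If the values already sit in a plain list, one way to get the key-based lookup is to flip the list once with array_flip() and then use isset() on the result. This is a sketch under the assumption that the values are strings or integers, which is what array_flip() accepts; the sample addresses and variable names are made up.

```php
<?php
// Hypothetical existing list of values.
$list = ['a@example.com', 'b@example.com', 'c@example.com'];

// Values become keys; the former indexes become the values.
$lookup = array_flip($list);

$found   = isset($lookup['b@example.com']);
$missing = isset($lookup['z@example.com']);

var_dump($found, $missing);
```

The flip itself walks the whole list once, so it only pays off when the same list is checked many times.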