Efficiently pick n random elements from PHP array (without shuffle)

I have the following code to pick $n elements from an array $array in PHP:

shuffle($array);
$result = array_splice($array, 0, $n);

Given a large array but only a few elements (for example 5 out of 10000), this is relatively slow, so I would like to optimize it such that not all elements have to be shuffled. The values must be unique.

I'm looking fo the most performant alternative. We can assume that $array has no duplicates and is 0-indexed.


$randomArray = [];
while (count($randomArray) < 5) {
  $randomKey = mt_rand(0, count($array)-1);
  $randomArray[$randomKey] = $array[$randomKey];
}

This will provide exactly 5 elements with no duplicates and very quickly. The keys will be preserved.

Note: You'd have to make sure $array had 5 or more elements or add some sort of check to prevent an endless loop.


This function performs a shuffle on only $n elements where $n is the number of random elements you want to pick. It will also work on associative arrays and sparse arrays. $array is the array to work on and $n is the number of random elements to retrieve.

If we define the $max_index as count($array) - 1 - $iteration.

It works by generating a random number between 0 and $max_index. Picking the key at that index, and replacing its index with the value at $max_index so that it can never be picked again, as $max_index will be one less at the next iteration and unreachable.

In summary this is the Richard Durstenfeld's Fisher-Yates shuffle but operating only on $n elements instead of the entire array.

function rand_pluck($array, $n) {
    $array_keys = array_keys($array);
    $array_length = count($array_keys);
    $max_index = $array_length -1;
    $iterations = min($n, $array_length);
    $random_array = array();
    while($iterations--) {
        $index = mt_rand(0, $max_index);
        $value = $array_keys[$index];
        $array_keys[$index] = $array_keys[$max_index];
        array_push($random_array, $array[$value]);
        $max_index--;
    }
    return $random_array;
}

The trick is to use a variation of shuffle or in other words a partial shuffle.

performance is not the only criterion, statistical efficiency, i.e unbiased sampling is as important (as the original shuffle solution is)

function random_pick( $a, $n ) 
{
  $N = count($a);
  $n = min($n, $N);
  $picked = array_fill(0, $n, 0); $backup = array_fill(0, $n, 0);
  // partially shuffle the array, and generate unbiased selection simultaneously
  // this is a variation on fisher-yates-knuth shuffle
  for ($i=0; $i<$n; $i++) // O(n) times
  { 
    $selected = mt_rand( 0, --$N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
    $value = $a[ $selected ];
    $a[ $selected ] = $a[ $N ];
    $a[ $N ] = $value;
    $backup[ $i ] = $selected;
    $picked[ $i ] = $value;
  }
  // restore partially shuffled input array from backup
  // optional step, if needed it can be ignored, e.g $a is passed by value, hence copied
  for ($i=$n-1; $i>=0; $i--) // O(n) times
  { 
    $selected = $backup[ $i ];
    $value = $a[ $N ];
    $a[ $N ] = $a[ $selected ];
    $a[ $selected ] = $value;
    $N++;
  }
  return $picked;
}

NOTE the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffling) and produces output which is proper array with consecutive keys (not needing extra array_values etc..)

Use example:

$randomly_picked = random_pick($my_array, 5);
// or if an associative array is used
$randomly_picked_keys = random_pick(array_keys($my_array), 5);
$randomly_picked = array_intersect_key($my_array, array_flip($randomly_picked_keys));

For further variations and extensions of shuffling for PHP:

  1. PHP - shuffle only part of an array
  2. PHP shuffle with seed
  3. How can I take n elements at random from a Perl array?