What distribution do you get from this broken random shuffle?

Solution 1:

An Empirical Approach.

Let's implement the erroneous algorithm in Mathematica:

p = 10; (* Range *)
s = {}
For[l = 1, l <= 30000, l++, (*Iterations*)
   a = Range[p];
   For[k = 1, k <= p, k++, 
     i = RandomInteger[{1, p}];
     temp = a[[k]];
     a[[k]] = a[[i]];
     a[[i]] = temp
   ];
   AppendTo[s, a];
]  

Now get the number of times each integer is in each position:

r = SortBy[#, #[[1]] &] & /@ Tally /@ Transpose[s]  

Let's take three positions in the resulting arrays and plot the frequency distribution for each integer in that position:

For position 1 the freq distribution is:

enter image description here

For position 5 (middle)

enter image description here

And for position 10 (last):

enter image description here

and here you have the distribution for all positions plotted together:

enter image description here

Here you have a better statistics over 8 positions:

enter image description here

Some observations:

  • For all positions the probability of "1" is the same (1/n).
  • The probability matrix is symmetrical with respect to the big anti-diagonal
  • So, the probability for any number in the last position is also uniform (1/n)

You may visualize those properties looking at the starting of all lines from the same point (first property) and the last horizontal line (third property).

The second property can be seen from the following matrix representation example, where the rows are the positions, the columns are the occupant number, and the color represents the experimental probability:

enter image description here

For a 100x100 matrix:

enter image description here

Edit

Just for fun, I calculated the exact formula for the second diagonal element (the first is 1/n). The rest can be done, but it's a lot of work.

h[n_] := (n-1)/n^2 + (n-1)^(n-2) n^(-n)

Values verified from n=3 to 6 ( {8/27, 57/256, 564/3125, 7105/46656} )

Edit

Working out a little the general explicit calculation in @wnoise answer, we can get a little more info.

Replacing 1/n by p[n], so the calculations are hold unevaluated, we get for example for the first part of the matrix with n=7 (click to see a bigger image):

enter image description here

Which, after comparing with results for other values of n, let us identify some known integer sequences in the matrix:

{{  1/n,    1/n      , ...},
 {... .., A007318, ....},
 {... .., ... ..., ..},
 ... ....,
 {A129687, ... ... ... ... ... ... ..},
 {A131084, A028326 ... ... ... ... ..},
 {A028326, A131084 , A129687 ... ....}}

You may find those sequences (in some cases with different signs) in the wonderful http://oeis.org/

Solving the general problem is more difficult, but I hope this is a start

Solution 2:

The "common mistake" you mention is shuffling by random transpositions. This problem was studied in full detail by Diaconis and Shahshahani in Generating a random permutation with random transpositions (1981). They do a complete analysis of stopping times and convergence to uniformity. If you cannot get a link to the paper, then please send me an e-mail and I can forward you a copy. It's actually a fun read (as are most of Persi Diaconis's papers).

If the array has repeated entries, then the problem is slightly different. As a shameless plug, this more general problem is addressed by myself, Diaconis and Soundararajan in Appendix B of A Rule of Thumb for Riffle Shuffling (2011).

Solution 3:

Let's say

  • a = 1/N
  • b = 1-a
  • Bi(k) is the probability matrix after i swaps for the kth element. i.e the answer to the question "where is k after i swaps?". For example B0(3) = (0 0 1 0 ... 0) and B1(3) = (a 0 b 0 ... 0). What you want is BN(k) for every k.
  • Ki is an NxN matrix with 1s in the i-th column and i-th row, zeroes everywhere else, e.g:

kappa_2

  • Ii is the identity matrix but with the element x=y=i zeroed. E.g for i=2:

I_2

  • Ai is

Ai= bIi + aKi

Then,

B_n

But because BN(k=1..N) forms the identity matrix, the probability that any given element i will at the end be at position j is given by the matrix element (i,j) of the matrix:

solution matrix

For example, for N=4:

B_4

As a diagram for N = 500 (color levels are 100*probability):

B_500

The pattern is the same for all N>2:

  • The most probable ending position for k-th element is k-1.
  • The least probable ending position is k for k < N*ln(2), position 1 otherwise

Solution 4:

I knew I had seen this question before...

" why does this simple shuffle algorithm produce biased results? what is a simple reason? " has a lot of good stuff in the answers, especially a link to a blog by Jeff Atwood on Coding Horror.

As you may have already guessed, based on the answer by @belisarius, the exact distribution is highly dependent on the number of elements to be shuffled. Here's Atwood's plot for a 6-element deck:

enter image description here