Select a new value from last $N$ values; how long until the last $N$ are all the same?

Say we start with $N$ distinct numbers in a line, like $1, 2, 3, \ldots, N$. In each round, we choose one of the last $N$ numbers uniformly at random and append it to the end.

What is the expected number of rounds until the last $N$ numbers are all the same?

For example, for $N = 2$, we start with

1, 2

If we choose 1, we get

1, 2, 1

and the situation is unchanged, since the last $N$ numbers are still all distinct. If we choose 2, we get

1, 2, 2

and the game ends.

Suppose the expected number of rounds is $S$; we can write

$$S = \tfrac12(S + 1) + \tfrac12 \cdot 1,$$

and we get $S = 2$.

Things become very complicated when $N > 2$, so I am turning here for help.

UPDATE:

This occurred to me while working on a project; see this if you are interested.

I just want to know whether it can be solved in a graceful way, or even in a hard way that eventually yields the answer, so I don't need a numeric answer.
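For concreteness, the process is easy to simulate. Here is a rough Monte Carlo sketch (the helper name `simulate` is my own); for $N=2$ the estimate hovers around the value $S=2$ derived above:

```python
import random

def simulate(N, trials=100_000, seed=1):
    """Monte Carlo estimate of the expected number of rounds until the
    last N values are all equal, starting from N distinct values."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        window = list(range(N))      # the last N values; initially all distinct
        rounds = 0
        while len(set(window)) > 1:  # not yet all the same
            window.append(rng.choice(window))  # copy a random recent value
            window.pop(0)            # only the last N values matter
            rounds += 1
        total += rounds
    return total / trials
```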


This is not a solution, but it might be helpful, and it is too long for a comment.

Your $N^N$ equations can be simplified, because you can exploit symmetry in the structure of the problem. Say that $L(S)$ is the expected number of rounds for the game to end after reaching state $S$, where $S$ is some string of length $N$. Then:

$$ \begin{eqnarray} L(AA)=L(BB)&=&0\\ L(AB)& =& 1+\frac12(L(BB)+L(BA)) \\ L(BA)& =& 1+\frac12(L(AA)+L(AB)) \\ \end{eqnarray} $$

Notice that the equations for $L(AB)$ and $L(BA)$ are identical, except that $A$ and $B$ have exchanged places. So by symmetry, $L(AB)=L(BA)$, and we get:

$$L(AB)=1+\frac12L(AB)$$

So $L(AB) = L(BA) = 2$. This tells us that the game ends in 2 steps (on average) from both these states.


Now we can consider the $N=3$ case.

$$ \begin{eqnarray} L(AAA)=L(BBB)=L(CCC)&=&0\\ L(ABC)&=&1+\frac13(L(BCA)+L(BCB)+L(BCC))\\ L(AAB)&=&1+\frac13(L(ABA)+L(ABA)+L(ABB)) \\ L(ABA)&=&1+\frac13(L(BAA)+L(BAB)+L(BAA))\\ L(ABB)&=&1+\frac13(L(BBA)+L(BBB)+L(BBB)) \\ &\vdots& \end{eqnarray} $$

This looks awful, but remember we can simplify. There aren't 27 variables here; there are only five:

$$ \begin{eqnarray} L(AAA)&=&L(BBB)=L(CCC)\\ L(ABC)&=&L(ACB)=L(BAC)=\cdots=L(CBA)\\ L(ABA)&=&L(ACA)=L(BAB)=\cdots=L(CBC)\\ L(AAB)&=&L(AAC)=L(BBA)=\cdots=L(CCB)\\ L(ABB)&=&L(ACC)=L(BAA)=\cdots=L(CBB) \end{eqnarray} $$

This allows us to reduce the original set of 27 equations in 27 variables to five equations in five variables:

$$ \begin{eqnarray} L(AAA)&=&0\\ L(ABC)&=&1+\frac13(L(ABC)+L(ABA)+L(ABB))\\ L(AAB)&=&1+\frac13(L(ABA)+L(ABA)+L(ABB)) \\ L(ABA)&=&1+\frac13(L(ABB)+L(ABA)+L(ABB))\\ L(ABB)&=&1+\frac13(L(AAB)\hphantom{+L(AAA)+L(AAA)}) \end{eqnarray} $$

I tried solving these with pen and paper and got $L(ABC)=\frac{19}{4} = 4\frac34$. I might have made a mistake, of course; it is after midnight. But as a proof of concept I think it was a success.
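The pen-and-paper arithmetic can be double-checked with exact rational arithmetic. A minimal sketch (variable names are mine) that eliminates by hand and then verifies every reduced equation:

```python
from fractions import Fraction

third = Fraction(1, 3)
# Unknowns: a = L(AAB), b = L(ABA), c = L(ABB), d = L(ABC); L(AAA) = 0.
# Eliminating by hand from the five reduced equations:
#   b = 1 + (b + 2c)/3   =>  b = c + 3/2
#   c = 1 + a/3
#   a = 1 + (2b + c)/3   =>  a = 2 + c     (after substituting b)
# so c = 1 + (2 + c)/3, i.e. c = 5/2.
c = Fraction(5, 2)
a = 2 + c
b = c + Fraction(3, 2)
d = (3 + b + c) / 2   # from d = 1 + (d + b + c)/3
# Verify all four non-trivial reduced equations exactly.
assert a == 1 + third * (2 * b + c)
assert b == 1 + third * (b + 2 * c)
assert c == 1 + third * a
assert d == 1 + third * (d + b + c)
```

Here `d` is $L(ABC)$, and the asserts confirm the solution satisfies the system exactly.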

Anyway, I think the technique is reasonable, and it will reduce your unmanageable 10,000,000,000 equations to a much smaller set, maybe only a few dozen.

Addendum: Sadly, this only reduces the $N=10$ case from $10^{10}$ equations to 115,975. It brings it into the realm of the feasible, but not nearly as much as I had hoped.
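Side note on that count: a state up to renaming of values is exactly a set partition of the $N$ positions (positions get the same block iff they hold the same value), so the number of reduced variables is the Bell number $B_N$: five for $N=3$, as found above, and $B_{10}=115{,}975$ for $N=10$. A quick sketch (my own helper) via the Bell triangle:

```python
def bell(n):
    """n-th Bell number via the Bell triangle: the number of set
    partitions of n items, i.e. length-n states up to renaming values."""
    row = [1]
    for _ in range(n - 1):
        nxt = [row[-1]]              # each row starts with the previous row's last entry
        for x in row:
            nxt.append(nxt[-1] + x)  # each entry adds the entry above it
        row = nxt
    return row[-1]
```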


I can improve a little on MJD's method. This was based on computing for each state $S$ (the last $N$ values) the expected remaining number of steps $L(S)$ until a final state (last $N$ values equal) is reached.

Let me change the notation slightly. Let $T_S$ be the number of steps from state $S$ until a final state is reached, and assume that this final state consists of $N$ copies of the value $F_S$. Both $T_S$ and $F_S$ are random variables, and $L(S)=\text{E}[T_S]$.

We can then write $$ \text{E}[T_S] =\sum_{u\in\mathcal{A}} \text{E}[T_S|F_S=u]\cdot\Pr[F_S=u] =\sum_{u\in\mathcal{A}}\sum_{t=0}^\infty \Pr[T_S=t, F_S=u]\cdot t $$ where $u$ runs through the values of $\mathcal{A}=\{1,\ldots,N\}$. The probability $\Pr[T_S=t, F_S=u]$ is that of reaching a final state with the final $N$ values all equal to $u$ after $t$ steps.

Instead of computing the expected remaining steps for all combinations $\mathcal{A}^N$ (modulo permutations on $\mathcal{A}$), all we need are the values $$ L_u(S)=\text{E}[T_S|F_S=u]\cdot\Pr[F_S=u] $$ which depend only on which elements of $S$ are equal to $u$. Thus, it suffices to compute $L_1(S)$ for $S\in\{1,0\}^N$, where $1$ indicates the value $u$ and $0$ a value other than $u$.

This reduces the computational burden to having $2^N$ different $L_1(S)$ to compute.

Update: Here are the equations required for solving $L_1(S)$. As noted by MJD in the comment, these are slightly different.

For ease of notation, let's restrict ourselves to the case where $S\in\{0,1\}^N$. Then we can write $$ L_1(S)=\text{E}[T_S|F_S=1]\cdot\Pr[F_S=1]=\text{E}[T_S F_S]. $$ We can then express $L_1(S)$ for non-final states $S$ (i.e. not all 0s or 1s) in terms of $L_1(S')$ where $S'$ are the possible next states as $$ \begin{split} L_1(S)&=\text{E}[T_S F_S]=\Pr[F_S=1]+\text{E}[(T_S-1) F_S]\\ &=\Pr[F_S=1]+\sum_{S'} \Pr[S\rightarrow S']\cdot\text{E}[T_{S'} F_{S'}]\\ &=\Pr[F_S=1]+\sum_{S'} \Pr[S\rightarrow S']\cdot L_1(S') \end{split} $$ where $\Pr[S\rightarrow S']$ denotes the transition probability from $S$ to the next state $S'$: i.e. if $S=(s_1,\ldots,s_N)$ the next state will be $S'=(s_2,\ldots,s_N,1)$ with likelihood $\frac{1}{N}\sum_{i=1}^N s_i$, and $S'=(s_2,\ldots,s_N,0)$ with likelihood $1-\frac{1}{N}\sum_{i=1}^N s_i$.

Note that we could not have done this on the conditional $\text{E}[T_S|F_S=1]$, although it looks tempting, as $\text{E}[T_{S'}|F_{S'}=1]$ are conditional on $F_{S'}=1$ not $F_S=1$. However, we do have $$ \Pr[F_S=1]=\sum_{S'}\Pr[S\rightarrow S']\cdot\Pr[F_{S'}=1]. $$

We now need to compute $\Pr[F_S=1]$ for all $S$.

To do this, let's go back to the original $S=(1,\ldots,N)$ and let $q_k=\Pr[F_S=k]$. The next state is $S'=(2,\ldots,N,s')$ for $s'=1,\ldots,N$, each with likelihood $1/N$. This gives us, writing $q_0=0$ for convenience, $$ q_k=q_{k-1}+\frac{q_N}{N} \implies q_k=\frac{2k}{N(N+1)}, $$ which is just the above expression for $\Pr[F_S=k]$ in terms of the sum over $S'$.

For $S\in\{0,1\}^N$ we then get $$\Pr[F_S=1]=\sum_{k=1}^N s_kq_k=\frac{2}{N(N+1)}\sum_{k=1}^N k\,s_k.$$
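This position-weighted formula can be sanity-checked against a simulation of the reduced chain, where a state is a 0/1 pattern and a 1 is appended with probability (number of 1s)$/N$. The helper names below are mine:

```python
import random
from fractions import Fraction

def win_prob_exact(bits):
    """Pr[F_S = 1] = 2/(N(N+1)) * sum of k over the positions k
    (1-indexed, oldest first) that hold the value u."""
    N = len(bits)
    return Fraction(2 * sum(k for k, s in enumerate(bits, 1) if s),
                    N * (N + 1))

def win_prob_mc(bits, trials=200_000, seed=0):
    """Monte Carlo estimate of the same probability on the reduced
    0/1 chain: append a 1 with probability (number of 1s)/N."""
    rng = random.Random(seed)
    N, wins = len(bits), 0
    for _ in range(trials):
        w = list(bits)
        while 0 < sum(w) < N:        # not yet absorbed at all-0s or all-1s
            w.append(1 if rng.random() * N < sum(w) else 0)
            w.pop(0)
        wins += w[0]                 # 1 iff absorbed at all-ones
    return wins / trials
```

For $N=2$ this reproduces $f_{10}=1/3$ and $f_{01}=2/3$ from the example below.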

Example: Let's do $N=2$.

For ease of notation, I'll write $f_S=\Pr[F_S=1]$ and $l_S=L_1(S)$ for $S\in\{0,1\}^N$.

First $f_{00}=0$, $f_{10}=1/3$, $f_{01}=2/3$ and $f_{11}=1$.

For the final states, we have $l_{00}=l_{11}=0$. Then, we have $$ \begin{split} l_{10}&=f_{10}+\frac{1}{2}(l_{00}+l_{01})=\frac{1}{3}+\frac{1}{2}l_{01}\\ l_{01}&=f_{01}+\frac{1}{2}(l_{10}+l_{11})=\frac{2}{3}+\frac{1}{2}l_{10}\\ \end{split} $$ which solves to $l_{01}=10/9$ and $l_{10}=8/9$.

Going back to the original problem, $S\in\{1,\ldots,N\}^N$ with $N=2$, we compute $$ L(12)=\sum_u L_u(12)=l_{10}+l_{01}=2 $$ where the two $l$ terms correspond to picking $u=1$ and $u=2$.

The notation could perhaps have been less confusing if I'd used the values true and false instead of 1 and 0, introduced the indicator map $\chi_u(S)$ which maps $u$ to true and the other values to false, and consistently written $S\in\mathcal{A}^N$ while using $\chi_u(S)\in\{\textit{false},\textit{true}\}^N$. However, I hope it was still clear enough.


I did the computations (using Maple to solve the linear equations). If $L_N=L(12\ldots N)$, I found: $$ \begin{split} L_2&=2\\ L_3&=\frac{19}{4}=4.75\\ L_4&=\frac{1179}{140}=8.4214\ldots\\ L_5&=\frac{12226997}{934830}=13.079\ldots\\ L_6&=\frac{633096670030808}{33784478422065}=18.739\ldots\\ &\text{etc.}\\ \end{split} $$ which are not numbers that factor nicely.
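For anyone who wants to reproduce these numbers without Maple, here is a sketch (my own code, not from the thread) that builds the $2^N$ reduced equations for $L_1(S)$ with exact rational arithmetic and solves them by Gauss-Jordan elimination:

```python
from fractions import Fraction

def expected_rounds(N):
    """L_N = L(12...N): build the 2^N equations for L_1(S) over the
    0/1 patterns S, solve exactly, and sum the N single-1 patterns."""
    M, mask = 1 << N, (1 << N) - 1
    def f(S):  # Pr[F_S = 1], weighting position k (oldest = 1) by k
        w = sum(N - i for i in range(N) if S >> i & 1)
        return Fraction(2 * w, N * (N + 1))
    A = [[Fraction(0)] * M for _ in range(M)]
    b = [Fraction(0)] * M
    for S in range(M):
        A[S][S] = Fraction(1)
        if S in (0, mask):
            continue                       # absorbed patterns: L_1 = 0
        p = Fraction(bin(S).count("1"), N)  # chance the appended value is u
        A[S][(S << 1 | 1) & mask] -= p      # shift left, append a 1
        A[S][(S << 1) & mask] -= 1 - p      # shift left, append a 0
        b[S] = f(S)
    for i in range(M):                      # Gauss-Jordan over exact rationals
        piv = next(r for r in range(i, M) if A[r][i] != 0)
        A[i], A[piv], b[i], b[piv] = A[piv], A[i], b[piv], b[i]
        inv = 1 / A[i][i]
        A[i] = [x * inv for x in A[i]]
        b[i] *= inv
        for r in range(M):
            if r != i and A[r][i] != 0:
                c = A[r][i]
                A[r] = [x - c * y for x, y in zip(A[r], A[i])]
                b[r] -= c * b[i]
    return sum(b[1 << k] for k in range(N))  # one single-1 pattern per u
```

For instance, `expected_rounds(3)` returns `Fraction(19, 4)`, matching $L_3$ in the table above.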