Probability of the outcome of this simple card game

A symmetry-based proof that $E[n'] = n$, and a possible approach for approximating (though not exactly calculating) $Var(n')$.


Imagine the original $N$ circle cards are in fact of $N$ different colors, $c_1, ..., c_N$. The same process is applied to this deck, using as many square cards (in the same $N$ colors) as needed.

When the process is finished (i.e. no more circle cards), let random variable $X_j$ be the number of final cards of color $c_j$. Clearly:

  • $E[X_1] = E[X_2] = \dots = E[X_N]$ by symmetry,

  • $X_1 + X_2 + \dots + X_N = N$ by conservation of the total number of cards,

  • $E[X_1] + E[X_2] + \dots + E[X_N] = N$ combining the above and using linearity of expectation,

  • and therefore: $E[X_j] = 1$ for all $j$.

Now imagine you are partially colorblind and cannot distinguish among $c_1, ..., c_n$, seeing all of them as "red". The final number of "red" cards (to your eyes) is $n' \equiv X_1 + \dots + X_n,$ and so $E[n'] = n$. QED
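As a sanity check, here is a minimal Monte Carlo sketch in Python. The rules are my reading of the game, matching the casework in the answer further below: each round draws a uniformly random circle card and then a uniformly random card from the whole deck, and both are replaced by square cards of the first card's color.

```python
import random

def play(N):
    """Play one game with N circle cards in N distinct colors 0..N-1.
    Returns the list of colors of the N final (square) cards."""
    circles = list(range(N))   # colors of the remaining circle cards
    squares = []               # colors of the square cards in the deck
    while circles:
        # First card: a uniformly random circle, replaced by a square
        # of its own color.
        color = circles.pop(random.randrange(len(circles)))
        squares.append(color)
        # Second card: uniformly random from the whole deck (size N),
        # also replaced by a square of the first card's color.
        j = random.randrange(len(circles) + len(squares))
        if j < len(circles):
            circles.pop(j)
        else:
            squares.pop(j - len(circles))
        squares.append(color)
    return squares

# Colors 0..n-1 play the role of "red"; n' counts the final red cards.
N, n, trials = 20, 5, 100_000
finals = [play(N) for _ in range(trials)]
nprimes = [sum(c < n for c in f) for f in finals]
print(sum(nprimes) / trials)   # should be close to E[n'] = n = 5
```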


Further thoughts on $Var(n')$:

Note that each of the $N$ colors can have at most $2$ final cards, i.e. $X_j \in \{0, 1, 2\}$. It might be possible (but difficult/tedious) to calculate $Var(X_j)$ by careful case analysis, e.g. $X_j = 2$ iff the $c_j$ circle card was picked as the first card of some round $t$ and neither of the $2$ square cards of color $c_j$ added to the deck was subsequently replaced in rounds $> t$.

If we know $Var(X_j)$, then based on $n' \equiv X_1 + \dots + X_n,$ a decent approximation might be $Var(n') \approx n \times Var(X_j)$. This would be exact if the $X_j$'s were independent, but of course they are actually dependent. However, in the regime of large $n$ but much larger $N$ (i.e. $1 \ll n \ll N$), the approximation might be quite good.

The OP reported that numerically $Var(n') \approx {3n(N-n) \over 4N}$. This would be consistent with the thoughts above if $Var(X_j) = \frac34$ in the $1 \ll n \ll N$ limit.
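Reusing the simulated games from the sketch above, we can estimate $Var(X_j)$ and $Var(n')$ directly (for the modest $N = 20$ used there, only rough agreement should be expected):

```python
from statistics import pvariance

x1 = [f.count(0) for f in finals]   # samples of X_1 across the runs above
print(sum(x1) / trials)             # should be close to E[X_1] = 1
print(pvariance(x1))                # compare with the conjectured 3/4
print(pvariance(nprimes))           # compare with 3n(N-n)/(4N) = 2.8125
```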


It is indeed true that $E[n'] = n$ for any $n \ge 1$.

The idea is to track the random variable $c_R(t)+s_R(t)$ for each $t$, where $c_R(t)$ is the number of red circles and $s_R(t)$ is the number of red squares after $t$ rounds of the process; $c_B(t)$ and $s_B(t)$ are defined analogously.

By doing casework on whether a red or a blue circle is chosen first (the right-hand sides below are conditional on the state at time $t$), we have $$E[c_R(t+1)+s_R(t+1)] = $$ $$\frac{c_R(t)}{c_R(t)+c_B(t)}\left[\frac{c_R(t)-1}{N}(c_R(t)-2+s_R(t)+2)+\frac{s_R(t)+1}{N}(c_R(t)-1+s_R(t)+1)+\frac{c_B(t)}{N}(c_R(t)-1+s_R(t)+2)+\frac{s_B(t)}{N}(c_R(t)-1+s_R(t)+2)\right]$$ $$+\frac{c_B(t)}{c_R(t)+c_B(t)}\left[\frac{c_B(t)-1}{N}(c_R(t)+s_R(t))+\frac{s_B(t)+1}{N}(c_R(t)+s_R(t))+\frac{c_R(t)}{N}(c_R(t)+s_R(t)-1)+\frac{s_R(t)}{N}(c_R(t)+s_R(t)-1)\right],$$ which after some computation, using $c_R(t)+s_R(t)+c_B(t)+s_B(t) \equiv N$ and then taking expectations over the state at time $t$, gives $$E[c_R(t+1)+s_R(t+1)] = E[c_R(t)+s_R(t)]\left(1-\frac{1}{N}\right)+E\left[\frac{c_R(t)}{c_R(t)+c_B(t)}\right].$$

Similarly, $$E\left[\frac{c_R(t+1)}{c_R(t+1)+c_B(t+1)}\right] = $$ $$\frac{c_R(t)}{c_R(t)+c_B(t)}\left[\frac{c_R(t)-1}{N}\left(\frac{c_R(t)-2}{c_R(t)+c_B(t)-2}\right)+\frac{c_B(t)}{N}\left(\frac{c_R(t)-1}{c_R(t)+c_B(t)-2}\right)+\frac{s_R(t)+1+s_B(t)}{N}\left(\frac{c_R(t)-1}{c_R(t)+c_B(t)-1}\right)\right]$$ $$+ \frac{c_B(t)}{c_R(t)+c_B(t)}\left[\frac{c_B(t)-1}{N}\left(\frac{c_R(t)}{c_R(t)+c_B(t)-2}\right)+\frac{c_R(t)}{N}\left(\frac{c_R(t)-1}{c_R(t)+c_B(t)-2}\right)+\frac{s_R(t)+s_B(t)+1}{N}\left(\frac{c_R(t)}{c_R(t)+c_B(t)-1}\right)\right],$$ which after a very long computation, also using $s_R(t)+s_B(t) \equiv N-c_R(t)-c_B(t)$, gives $$E\left[\frac{c_R(t+1)}{c_R(t+1)+c_B(t+1)}\right] = E\left[\frac{c_R(t)}{c_R(t)+c_B(t)}\right].$$

Since $E\left[\frac{c_R(0)}{c_R(0)+c_B(0)}\right] = \frac{n}{N}$, we see that $$E\left[\frac{c_R(t)}{c_R(t)+c_B(t)}\right] = \frac{n}{N}$$ for all $t \ge 0$. Therefore, $$E[c_R(t+1)+s_R(t+1)] = \left(1-\frac{1}{N}\right)E[c_R(t)+s_R(t)]+\frac{n}{N}.$$ Since $E[c_R(0)+s_R(0)] = n$, induction shows that $$E[c_R(t)+s_R(t)] = n$$ for all $t \ge 0$. In particular, since the game ends after at most $N$ rounds and the deck no longer changes once the circle cards run out, we have $$E[n'] = n,$$ as desired.
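The two "after some computation" steps can be verified symbolically. Here is a sketch using sympy (the variable names are mine); it confirms that both one-step conditional expectations above simplify as claimed:

```python
from sympy import symbols, simplify

cR, cB, sR, N = symbols('c_R c_B s_R N', positive=True)
sB = N - cR - cB - sR    # the total deck size is always N
c = cR + cB              # circle cards remaining
r = cR + sR              # red cards (circles plus squares)

# One-step conditional expectation of c_R + s_R (first display above):
E_r = ((cR/c)*((cR-1)/N*(cR-2+sR+2) + (sR+1)/N*(cR-1+sR+1)
               + cB/N*(cR-1+sR+2) + sB/N*(cR-1+sR+2))
       + (cB/c)*((cB-1)/N*(cR+sR) + (sB+1)/N*(cR+sR)
                 + cR/N*(cR+sR-1) + sR/N*(cR+sR-1)))
print(simplify(E_r - (r*(1 - 1/N) + cR/c)))   # prints 0

# One-step conditional expectation of c_R/(c_R + c_B) (second display):
E_f = ((cR/c)*((cR-1)/N*(cR-2)/(c-2) + cB/N*(cR-1)/(c-2)
               + (sR+1+sB)/N*(cR-1)/(c-1))
       + (cB/c)*((cB-1)/N*cR/(c-2) + cR/N*(cR-1)/(c-2)
                 + (sR+sB+1)/N*cR/(c-1)))
print(simplify(E_f - cR/c))                   # prints 0
```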


Sketch of the proof for $\text{Var}(n') \approx \frac{3n(N-n)}{4(N-1)}$ at large $N$:

As in antkam's answer, we can define the variable $X_i\in\{0,1,2\}$ to be the number of square cards resulting from the $i$-th circle card, so that $$n'=\sum_{i=1}^n X_i.$$ We can write $\text{Var}(n')$ as $$ \begin{align} \text{Var}(n') &= \left\langle\left(\sum_{i=1}^n X_i\right)^2\right\rangle - \left\langle\sum_{i=1}^n X_i\right\rangle^2\\ &=n\left(\text{Var}(X_1)+(n-1)\text{Cov}(X_1,X_2)\right), \end{align} $$ where we have used the fact that, by symmetry, $\text{Var}(X_i)$ is the same for all $i$ and $\text{Cov}(X_i,X_j)$ is the same for all $i\neq j$.
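Here is a quick numerical sanity check of this decomposition, reusing `finals`, `nprimes`, `n`, and `trials` from the simulation sketch in the first answer:

```python
from statistics import pvariance

def cov(xs, ys):
    """Population covariance of two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

x1 = [f.count(0) for f in finals]   # samples of X_1
x2 = [f.count(1) for f in finals]   # samples of X_2
lhs = pvariance(nprimes)
rhs = n * (pvariance(x1) + (n - 1) * cov(x1, x2))
print(lhs, rhs)   # the two estimates should roughly agree
```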

Joriki's answer proved that $\text{Var}(X_i) = 3/4$. All that is left to do is to prove that $\text{Cov}(X_i,X_j) = -\frac1{N-1} \text{Var}(X_i)$, and then we will have $$ \begin{align} \text{Var}(n') &=n\text{Var}(X_1)\left(1-\frac{n-1}{N-1}\right) = \frac34\frac{n(N-n)}{N-1}. \end{align} $$ To do this, let us first define the variables $\delta X_i \equiv X_i - \langle X_i\rangle$. Then $\text{Cov}(X_i,X_j)$ can be written in terms of the $\delta X_i$'s as (forgive the sloppy notation) $$ \begin{align} \text{Cov}(X_i,X_j) &= \int\int \delta X_i \delta X_j P(\delta X_i,\delta X_j)\, d\delta X_i\, d\delta X_j\\ &=\int\int \delta X_i \delta X_j P(\delta X_i|\delta X_j) P(\delta X_j)\, d\delta X_i\, d\delta X_j\\ &=\int\delta X_j\underbrace{\left(\int \delta X_i P(\delta X_i|\delta X_j)\,d\delta X_i \right)}_{\langle \delta X_i|\delta X_j\rangle} P(\delta X_j)\, d\delta X_j. \end{align} $$ Note that since $\sum_{i=1}^N \delta X_i = 0$ (the total number of cards is conserved), by symmetry the expected value of $\delta X_i$ given $\delta X_j$ must be $$ \langle \delta X_i|\delta X_j\rangle=-\frac{1}{N-1} \delta X_j. $$ So we have $$ \begin{align} \text{Cov}(X_i,X_j) &= \int\delta X_j\langle \delta X_i|\delta X_j\rangle P(\delta X_j)\, d\delta X_j\\ &= -\frac1{N-1}\int\left(\delta X_j\right)^2 P(\delta X_j)\, d\delta X_j\\ &= -\frac1{N-1}\text{Var}(X_j), \end{align} $$ as required (note that $\text{Var}(\delta X_j) = \text{Var}(X_j)$).
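The same simulated samples can be used to check the covariance identity and the final formula, reusing `cov`, `x1`, and `x2` from the check above (again, only rough agreement is expected at $N = 20$):

```python
# Check Cov(X_1, X_2) against -Var(X_1)/(N - 1), and Var(n') against
# the closed form 3n(N - n)/(4(N - 1)):
print(cov(x1, x2), -pvariance(x1) / (N - 1))
print(pvariance(nprimes), 3 * n * (N - n) / (4 * (N - 1)))
```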