Expected number of balls chosen from a bag when, after each draw, I throw out everything in the bag with a number less than the drawn ball's

I am mainly confused about what is wrong with my reasoning/solution below.

Problem: I have a bag containing the numbers $1, 2, \dots, n$. I draw a number uniformly at random and throw out every number less than it (the drawn number is also removed from the bag). I keep drawing in this way until I have found the largest number. What is the expected number of draws I make?

My Argument: On each pick, I expect the drawn number to sit around the middle of what remains, so every time I choose a number I expect to discard about half of the remaining numbers. Hence $E(X) = \log_2(n)$.
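For what it's worth, here is a quick Python simulation of the game as I have stated it (a rough sketch only; the helper names and trial counts are arbitrary, and I assume the drawn number is removed from the bag). The averages it prints come out noticeably below $\log_2 n$ for larger $n$, which is consistent with my solution being wrong.

```python
import math
import random

def draws_to_find_max(n):
    """One play: draw uniformly at random, discard the draw and everything
    smaller, repeat until the largest number n is drawn."""
    remaining = list(range(1, n + 1))
    draws = 0
    while True:
        d = random.choice(remaining)
        draws += 1
        if d == n:                        # largest number found, game over
            return draws
        remaining = [x for x in remaining if x > d]

def average_draws(n, trials=20_000):
    return sum(draws_to_find_max(n) for _ in range(trials)) / trials

for n in (16, 256, 1024):
    print(n, round(average_draws(n), 3), "vs log2(n) =", math.log2(n))
```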

Note that I devised this problem as an equivalent one while trying to solve the "cluster of cars on one long road" question. If it is not equivalent, please let me know, but I am confident that it is. Because of this equivalence, I know my solution above is incorrect; however, I cannot pinpoint where the issue arises.

Probability problem: cars on the road


Here is the result of work in the comments by lulu, Henry, and me.

The expected number of balls is $E[n]=\sum_{i=1}^n\frac 1i=H_n\approx \log n + \gamma + \frac 1{2n}$, the $n^{th}$ harmonic number.

The recursion is $E[n]=1+\frac 1n\sum_{i=0}^{n-1}E[i]$, because you can imagine inserting ball $n$ at a uniformly random position into an arrangement of the other $n-1$ balls (the order in which you would pull the balls out of the bag, skipping any already thrown away). If $i$ is the number of balls that come before ball $n$, you will choose, on average, $E[i]$ of those balls and then ball $n$ itself, while every ball after ball $n$ is thrown away without ever being chosen. We define $E[0]=0$ to make this come out properly when ball $n$ is put first in line.
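If you want to check this numerically, here is a small Python sketch (the function name `expected_draws` is mine) that runs the recursion and compares it with $H_n$ and the approximation above.

```python
import math

def expected_draws(n_max):
    """E[n] from the recursion E[n] = 1 + (1/n) * sum_{i=0}^{n-1} E[i], with E[0] = 0."""
    E = [0.0] * (n_max + 1)
    running_sum = 0.0                  # sum of E[0], ..., E[n-1]
    for n in range(1, n_max + 1):
        E[n] = 1 + running_sum / n
        running_sum += E[n]
    return E

E = expected_draws(1000)
gamma = 0.5772156649015329             # Euler–Mascheroni constant
for n in (10, 100, 1000):
    H_n = sum(1 / i for i in range(1, n + 1))
    print(n, E[n], H_n, math.log(n) + gamma + 1 / (2 * n))
```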

We have $E[1]=1$ as the base case for strong induction. Then if $E[j]=H_j$ for all $j$ from $1$ to $k$, $$\begin{align}E[k+1]&=1+\frac 1{k+1}\sum_{i=0}^{k}H_i\\ &=1+\frac 1{k+1}\left(1+\left(1+\frac 12\right)+\left(1+\frac 12+\frac 13\right)+\ldots+H_{k}\right)\\ &=1+\frac 1{k+1}\left(k+\frac{k-1}2+\frac{k-2}3+\ldots+\frac 1{k}\right)\\ &=1+\frac 1{k+1}\bigl((k+1)H_k-k\bigr)\\ &=1+H_k-\frac k{k+1}\\&=H_{k+1}\end{align}$$ To go from the second line to the third, count how many of the $H_i$ contain the term $\frac 1m$: it appears in $H_m,H_{m+1},\dots,H_k$, i.e. $k+1-m$ times, giving the term $\frac{k+1-m}m$. Summing, $\sum_{m=1}^k\frac{k+1-m}m=(k+1)H_k-k$, which gives the fourth line, and the last step uses $1-\frac k{k+1}=\frac 1{k+1}$.
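The counting identity used in the middle can also be spot-checked numerically (a minimal sketch; `H` is just my own helper):

```python
def H(m):
    """m-th harmonic number."""
    return sum(1 / j for j in range(1, m + 1))

# Check that sum_{i=1}^k H_i = (k+1) * H_k - k.
for k in (5, 50, 500):
    lhs = sum(H(i) for i in range(1, k + 1))
    rhs = (k + 1) * H(k) - k
    print(k, lhs, rhs)
```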


Others in the comments have computed the correct solution. Here I will only comment on what I think went wrong with the "intuitive" solution of $\log_2 n$.

For slightly easier exposition, I change the game so that we throw out all numbers larger than the drawn number (along with the drawn number itself). This way the numbers left are always $\{1, 2, \dots, k\}$, which is slightly easier (for me) to visualize.

Imagine $n=2^{10}=1024$ and I play the game $G$ as stated. Meanwhile, you start with an identical bag and play a different game $G'$: at every step, you throw out exactly the larger half of your remaining numbers. So your game will take exactly $11$ steps to finish: ten halvings, and then one last step to pick up the single number that is left. Which of us will finish first?

On your first draw, you draw $512$. On my first draw, I have about a $1/2$ chance of doing worse than you (i.e. throwing out fewer) by drawing $d > 512$, and about a $1/2$ chance of doing better by drawing $d < 512$. However, when I do worse, I lose less than a full step compared to your schedule, whereas when I do better, I can gain up to a step ($d \in [256, 511]$), up to $2$ steps ($d \in [128, 255]$), up to $3$ steps ($d \in [64, 127]$), and so on. Apologies if my boundaries above are not exactly correct, but hopefully you get the idea. So intuitively, for large $n$ I should win easily. This also means $E[X] < \log_2 n$.
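To put rough numbers on this hand-waving, here is a short simulation of game $G$ in its modified form for $n = 1024$ (names and trial counts are mine). The average number of steps comes out near $H_{1024} \approx 7.5$, well below the $\approx \log_2 n$ steps of your fixed halving schedule.

```python
import math
import random

def steps_in_G(n):
    """Modified game G: draw uniformly from {1, ..., k}, throw out the draw and
    every larger number, stop once 1 (the mirror image of the largest) is drawn."""
    k = n                           # remaining numbers are always {1, ..., k}
    steps = 0
    while True:
        d = random.randint(1, k)
        steps += 1
        if d == 1:                  # smallest number drawn, game over
            return steps
        k = d - 1                   # keep only the numbers below the draw

n, trials = 1024, 20_000
avg = sum(steps_in_G(n) for _ in range(trials)) / trials
print("game G average steps :", round(avg, 3))
print("game G' halving steps:", math.log2(n))
```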

Another way to look at it: let $K_j$ be the largest remaining number (which is also the number of remaining numbers) after $j$ steps, with $K_0 = n$. For game $G'$ we have $\log_2 K_{j+1} - \log_2 K_j = -1$ exactly. If it were true that for game $G$ we had $E[\log_2 K_{j+1} \mid K_j] - \log_2 K_j = -1$, then it would be plausible for the answer to be $\log_2 n$. In fact, however, $E[\log_2 K_{j+1} \mid K_j] - \log_2 K_j < -1$, which is why $G$ finishes faster than $G'$.
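To put a number on that last inequality under a simplifying assumption: conditional on the game not ending at the given step, $K_{j+1}$ is uniform on $\{1, \dots, K_j - 1\}$ in the modified game, and the expected drop in $\log_2 K$ per step then comes out around $\log_2 e \approx 1.44$, not $1$. A minimal sketch of that computation:

```python
import math

def expected_log2_drop(k):
    """E[log2 K_{j+1} | K_j = k, game continues] - log2 k, with K_{j+1}
    taken uniform on {1, ..., k-1} (i.e. conditioning on not drawing 1)."""
    e_log = sum(math.log2(m) for m in range(1, k)) / (k - 1)
    return e_log - math.log2(k)

for k in (16, 256, 1024, 10_000):
    print(k, round(expected_log2_drop(k), 4))    # approaches -log2(e) ≈ -1.443
```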