$n$ balls are randomly tossed into $m$ bins, each bin can hold $k$ balls. If a ball is tossed into a full bin (already has $k$ balls in it), it can be tossed repeatedly and randomly into the $m$ bins again until an empty or partly loaded bin is met. The question is that what is the expectation of the toss times for the $n$ balls, and what is the distribution of the number of balls in the $m$ bins ($n \leq mk$). For the first question, a simplified version can also be: when there are already $n$ balls accumulated in the $m$ bins according to the above toss rule, what is the average toss times for another ball tossed into the bins (we could discuss the case in this stable state rather than the build process).

I know without the re-toss process, the number of balls in the bins must follow the binomial distribution; however, when the re-toss is added, it turns to be quite different.


If you know about random walks, I would formalize this as follows :

Let $X_N$ be a discrete time process in the state space $(\mathbb N \cup \{0\})^{m}$. We let $X_0 = (0,0,\dots,0)$ and we say that $$ e_i = (0,\dots,1,\dots, 0), \quad \mathbb P(X_{N+1} = X_N + e_i) = 1/m. $$ So you add a ball in a bin and the model assumes that the bins can store infinitely many balls. Now you're interested in some events, so we need to define them properly in terms of random variables we can study. Write $$ X_N = (X_{N,1},\dots, X_{N,m}) $$ and consider $$ T_{N,i} = \min \{ X_{N,i}, k \}, $$ so that $$ T_N = (T_{N,1}, \dots, T_{N,m}) $$ would represent what is actually in your bins. If $$ \tau_n = \inf \left \{ N \in \mathbb N \, \left| \, \sum_{i=1}^m T_{N,i} \ge n \right. \right \}, $$ then using stochastic process theory one can show that this defines a stopping time (you need to define the filtrations and everything but that will work out just fine). This was too big for a comment, because I didn't actually work out the details, but having a formal setting always gives a good start. Note that what you want to look for would be :

For the expected toss of the $n$ balls, you want $\mathbb E[\tau_n]$ ;

For the distribution of $X_N$, well you can readily compute it using this model.

I hope this setting will help you get started.

EDIT: This question didn't get an answer in 8 years, so I figured I'd give it a better try just for fun.

So basically if all the bins still have less than $k$ balls in them, the average toss time is $1$. But if $\ell$ of those bins have $k$ balls in them, $0 < \ell < m$, then a toss will land the next ball in one of those bins with probability $\ell/m$, leading to a re-toss. In other words, if $\ell$ bins are full, $0 \le \ell < m$, then we get a re-toss with probability $\frac{m-\ell}m$ on the next toss.

Using the notation defined above, let $$ Y_N = \sum_{i=1}^m \mathbb 1_{\{T_{N,i} = k\}}, $$ i.e. $Y_N$ represents the number of full bins (the variable $\mathbb 1_{\{T_{N,i} = k\}}$ takes the value $1$ if the bin is full and $0$ if it isn't, so adding them up does the count). The number we are interested in computing is $$ \mathbb E[\tau_{n+1} -\tau_n] $$ since $\tau_n$ represents the number of tosses required to put in $n$ balls, and $\tau_{n+1}$ represents the number of tosses required to put in $n+1$ balls, so their difference represents the number of tosses required to put in the $(n+1)^{\text{th}}$ ball.

To compute, we condition on the value of $Y_n$: $$ \mathbb E[\tau_{n+1} -\tau_n] = \sum_{\ell=0}^{m-1} \mathbb E[\tau_{n+1} -\tau_n | Y_n = \ell] \cdot \mathbb P(Y_n = \ell). $$ The conditional expectation is easy to compute: since we re-toss until we get a toss that isn't a re-toss, the conditional random variable $\tau_{n+1} -\tau_n | Y_n = \ell$ follows a geometric distribution with parameter $\frac{m-\ell}m$ (in particular, if $\ell = 0$, we get no re-toss with probability $1$, which coincides with what we expect), so its expectation is $\frac m{m-\ell}$.

As for the probability factor, it gets a bit more tricky. First notice that if $n < k$, then $\mathbb P(Y_n = 0) = 1$ and $\mathbb P(Y_n = \ell) = 0$ for $\ell > 0$, so for these values of $n$, we can write $$ \mathbb E[\tau_{n+1} -\tau_n] = \sum_{\ell=0}^{m-1} \mathbb E[\tau_{n+1} -\tau_n | Y_n = \ell] \cdot \mathbb P(Y_n = \ell)= \frac mm \cdot 1 + \sum_{\ell = 1}^{m-1} \frac{m}{m-\ell} \cdot 0 = 1, $$ which is what you'd expect since there is no possibility for re-tossing. So we restrict our attention on computing this expectation when $k \le n < mk$. (When $n=mk$, all bins are full, so the process stops.)

There is nothing particular about any special bin, so if we know that there are $n$ balls in the bins, a given ball is in a given bin with probability $\frac 1m$. So the distribution of where are $n$ balls in the $m$ bins that can contain at most $k$ balls is as simple as positioning $n$ balls in a grid of $m$ rows and $k$ columns, and then adding up the counts in the rows. In such a grid, there is a ball in slot $(i,j)$ with probability $\frac n{km}$, and the slots contain a ball independently of each other. So the probability that $\ell$ rows are full is equal to the number of cases where $\ell$ rows are full divided by the total number of cases.

To compute the total number of cases, this is easy: we just have to pick the slots, so the count is $\binom {mk}n$. We are interested in computing the number of cases where exactly $\ell$ rows are filled, but it turns out it's actually easier to compute the number of cases where a given set of $\ell$ rows are filled, not worry about the rest, and then use inclusion-exclusion to subtract for the case where at least $\ell$ rows are filled the case where $\ell+1$ rows are filled and so on.

To compute the number of cases where $\ell$ given rows are filled, we just have to place the remaining balls in the remaining rows, so the count is $\binom{(m-\ell)k}{n-\ell k}$. To count the case where the rows are not given in advance but we just have at least $\ell$ filled rows, we use the inclusion-exclusion principle~: Let $L_{\ell} \subseteq [m] \overset{def}= \{1,\cdots,m\}$ be an arbitrary subset of the $m$ rows of cardinality $\ell$, and write $C(L_{\ell})$ for the set of placements of the $n$ balls that fill the rows indexed by $L_{\ell}$ (it has cardinality $\binom{(m-\ell)k}{n-\ell k}$ by the above). Then the number of cases with at least $\ell \ge 1$ rows filled, call it $C_n(\ell)$, is equal to $$ \left| \bigcup_{L_{\ell} \subseteq [m]} C(L_{\ell}) \right| = \sum_{j=\ell}^{m-1} (-1)^{j-\ell} \sum_{L_j \subseteq [m]} |C(L_j)| = \sum_{j=\ell}^{\lfloor \frac nk \rfloor} (-1)^{j-\ell} \binom mj \binom{(m-j)k}{n-jk}. $$ To explain the value at the top of the last sum, note that we need to stop that sum once $n-jk$ becomes negative, since that means there are not enough balls to fill the $j$ rows with $k$ balls. That happens once $jk > n$, i.e. $j > \frac nk$. So $j$ is an integer constrainted to satisfy $j \le \frac nk$, and since $j$ is an integer, $j \le \lfloor \frac nk \rfloor$.

Therefore, the number of cases we are looking for is $C_n(\ell) - C_n(\ell+1)$, which we can slightly simplify: $$ C_n(\ell) - C_n(\ell+1) \\ = \sum_{j=\ell}^{\lfloor \frac nk \rfloor} (-1)^{j-\ell} \binom mj \binom{(m-j)k}{n-jk} - \sum_{j=\ell+1}^{\lfloor \frac nk \rfloor} (-1)^{j-(\ell+1)} \binom mj \binom{(m-j)k}{n-jk} \\ = \binom m{\ell} \binom{(m-\ell)k}{n-\ell k} + 2 \left( \sum_{j=\ell+1}^{\lfloor \frac nk \rfloor} (-1)^{j-\ell} \binom mj \binom{(m-j)k}{n-jk} \right) $$

So since $$ \mathbb P(Y_n = \ell) = \frac{C_n(\ell) - C_n(\ell+1)}{\binom{mk}n}, $$ we have $$ \mathbb E[\tau_{n+1} -\tau_n] = \sum_{\ell=0}^{m-1} \frac{m}{m-\ell} \cdot \frac{C_n(\ell) - C_n(\ell+1)}{\binom{mk}n}. $$

Therefore, we obtain a closed formula for $\mathbb E[\tau_N]$ by noticing that $\mathbb E[\tau_N] = N$ for $N \le k$, and then we can do a telescopic sum for $N > k$: $$ \mathbb E[\tau_N] = k + \sum_{n=k}^{N-1} \mathbb E[\tau_{n+1} - \tau_n] = \\ k + \sum_{n=k}^{N-1} \sum_{\ell=0}^{m-1} \frac{m}{m-\ell} \cdot \frac{1}{\binom{mk}n} \left( \binom m{\ell} \binom{(m-\ell)k}{n-\ell k} + 2 \left( \sum_{j=\ell+1}^{\lfloor \frac nk \rfloor} (-1)^{j-\ell} \binom mj \binom{(m-j)k}{n-jk} \right) \right). $$

Hope you still enjoy this answer!