Expected average length of torrent download streak

When downloading large files, they are split up into many small parts, which are downloaded in random order. I was downloading a torrent file, which made me think of the following problem.

If I have downloaded $x$ of the $n$ pieces, what is the expected average length of a continuous stretch of downloaded parts? For example, if there were $100$ pieces, then each one could be downloaded or not. Some of the downloaded parts would be next to each other. These would form a chain. If I were to average the lengths of all the chains, what should I expect the answer to be?


Solution 1:

Here is an exact calculation of the expected value. The word element below refers to a minimal piece of a file (chunk) determined by the torrent being downloaded. Thus, all streaks as well as expected values of their average are measured in chunks, not in bytes.

The key point here is determining the number of ways one can choose $m$ elements among a sequence of given $n$ elements such that the selection constitute exactly $k > 0$ streaks (stretches, chains, etc.) Denote this number by $T(n, m, k)$. For the binomial coefficients, the convention $(b<0 \lor b>a) \Rightarrow {a \choose b}=0$ is used below.

Theorem. $$T(n, m, k) = {n-m+1 \choose k} {m-1 \choose k-1}.$$

Proof. Take any combination being counted in $T(n, m, k)$. There may be exactly $\hat x_0 \ge 0$ non-chosen elements before the first streak, $x_i \ge 1$ non-chosen elements between the $i$th and the $(i+1)$th streaks for any $i=1,2,\dots,k-1$, and $\hat x_k \ge 0$ non-chosen elements after the last streak. The total number of non-chosen elements must be $n-m$. Bringing $x$'s to a uniform sequence, denote $x_0 = \hat x_0 + 1,$ $x_k = \hat x_k + 1$, after which their sum increases by $2$.

Also, there may be $y_i \ge 1$ chosen elements in $i$th streak for any $i=1,2,\dots,k$. Their total number must be $m$.

We have just built a one-to-one correspondence between the combinations we're counting and positive integer solutions of the system $$\begin{align} x_0 + x_1 + \dots + x_k &= n-m+2 \\ y_1 + \dots + y_k &= m. \end{align}$$ Note that equations in this system are independent of each other, so the total number of its solutions is the product of numbers of solutions of the two equations. Solving each equation separately is nothing else than counting compositions, hence the result. $\blacksquare$

Going back to the question, we need to calculate the expected value of the random variable $X = \frac m k$ for a given parameters $n, m$ (I will use $m$ instead of $x$ in the original question here), $1 \le m \le n$. As all of ${n \choose m}$ combinations are equiprobable, $$\mathbb E[X] = \frac {\displaystyle \sum_{k=1}^m {\textstyle \frac m k \, T(n,m,k)}} {{n \choose m}} = \frac {\displaystyle \sum_{k=1}^m {\textstyle \frac m k \, {n-m+1 \choose k} {m-1 \choose k-1}}} {{n \choose m}}.$$ By the way, the above denominator is nothing else than $\sum_{k=1}^m {T(n,m,k)}$. To calculate the numerator, first note that $\frac m k \, {m-1 \choose k-1} = {m \choose k} = {m \choose m-k}$, then use Vandermonde's identity: $$\displaystyle \sum_{k=1}^m {\textstyle \frac m k \, {n-m+1 \choose k} {m-1 \choose k-1}} = \sum_{k=1}^m {\textstyle {n-m+1 \choose k} {m \choose m-k}} = \sum_{k=0}^m {\textstyle {n-m+1 \choose k} {m \choose m-k}} - 1 = \textstyle {n+1 \choose m} - 1.$$ Hence,

$$\mathbb E[X] = \frac {{n+1 \choose m} - 1} {{n \choose m}} = \frac {n+1} {n-m+1} - \frac 1 {{n \choose m}}.$$

The last term quickly approaches $0$ when $n$ is large and $m$ moves away from $1$ and $n$.


I have also considered a supplementary question: calculation of $\mathbb E[X]$ in case when only $n$ is known. Instead of $m$, probability $p$ of downloading any particular piece (chunk) is given. Thus, $m$ is random (as well as $k$) now and has the binomial distribution $B(n, p)$. The only drawback is the possibility of $k=0$ (or $m=0$ which is the same) now, but this can be eliminated by calculating $\mathbb E\left[\frac m k \mid m>0\right]$ (i.e. we know for sure that something was already downloaded, but don't know how much). To such a problem statement the answer is

$$\mathbb E[X \mid m>0] = \frac {\displaystyle \sum_{m=1}^n {p^m q^{n-m} \sum_{k=1}^m {\tfrac m k \, T(n,m,k)}}} {1-q^n} = \frac {1 - \displaystyle \frac {q^{n+2}-p^{n+2}}{q-p}}{q(1-q^n)},$$

where $q = 1-p$, and the fraction $\frac {q^{n+2}-p^{n+2}}{q-p}$ should be changed to $\frac {n+2}{2^{n+1}}$ when $p=q=\frac 1 2$.