Help understanding proof of the following statement $E(Y) = \sum_{i = 1}^{\infty} P(Y \geq k)$

If Y is a discrete random variable that assigns positive probabilities to only the positive integers, show that

$E(Y) = \sum_{i = 1}^{\infty} P(Y \geq k)$

Where $E(Y)$ is the expected value (or mean) of Y.

This is a text book question and I'm having trouble understanding the solution as they don't give any worded explanations.

Here is the solution:

1 $\hspace{1.4cm}\sum_{i = 1}^{\infty} P(Y \geq k)$

2 $\hspace{1.4cm} = \sum_{k = 1}^{\infty}\sum_{j = k}^{\infty} P(Y=k)$

3 $\hspace{1.4cm} = \sum_{k = 1}^{\infty}\sum_{j = k}^{\infty} P(j)$

4 $\hspace{1.4cm} = \sum_{j = 1}^{\infty}j\cdot P(j)$

5 $\hspace{1.4cm} = \sum_{y = 1}^{\infty}y\cdot P(y) = E(Y)$

I'm not sure how to interpret the inequality $Y \geq k$ inside the probability function, does it read "values k assigned to the random variable $Y$ such that $k$ is less than or equal to $Y$? But how can a value $k$ be less than a random variable $Y$ that doesn't take on any values?

I'm also very confused about steps 1 to 2, and 3 to 4.

(1 to 2): I don't understand how they broke down the inequality inside the probability function.

I think the nested summations are making this complicated for me to understand.

Sorry for the broad question and wordy post. Any quick short explanation of anything will be very much appreciated.

Solution 1:

I find the following visualization to be helpful. Let $p_i=P(X=i)$, and consider the following infinite grid: $$ \begin{array}{ccccc} p_1 & p_2 & p_3 & p_4 & \dots \\ & p_2 & p_3 &p_4 & \dots \\ & & p_3 &p_4 & \dots \\ &&&p_4 & \dots\\ &&&&\ddots \end{array} $$ Consider the following two ways of adding up all the numbers in this grid:

You can first add up all the columns, then add up all the column totals. The $n^{th}$ column contains $n$ copies of $p_n$, so the result is $\sum_{n=1}^\infty np_n$, which is just $E[X]$.
You can first add up all the rows, then add up all the row totals. The $n^{th}$ rows total is $p_n+p_{n+1}+\dots$, which is precisely $P(X\ge n)$, so this is $\sum_{n=1}^\infty P(X\ge n)$.

As long as you can convince yourself that both methods give you the same sum, then the result is proved.

Solution 2:

Step 1 to 2 is simple: the event $Y\geq k$ happens if $Y$ takes on any of the values $\{k,k+1,k+2,\dots\}$. After that just use that the probabilty of the union disjoint events is the sum of the probability of the individual events.

Step 3 to 4 is just a change of order of summation: instead of summing over j first, we sum over k first. For that, note that $P(Y=n)$ term occurs in the sum over $j$ for a fixed $k$ iff $n\geq k$ as the summation over $j$ starts from $k$. Thus you will have the $P(Y=n)$ term only in the sums for $k=1,\dots,n$, thus exactly $n$ times.