Striking applications of linearity of expectation
Solution 1:
Buffon's needle: rule a surface with parallel lines a distance $d$ apart. What is the probability that a randomly dropped needle of length $\ell\leq d$ crosses a line?
Consider dropping any (continuous) curve of length $\ell$ onto the surface. Imagine dividing up the curve into $N$ straight line segments, each of length $\ell/N$. Let $X_i$ be the indicator for the $i$-th segment crossing a line. Then if $X$ is the total number of times the curve crosses a line, $$\mathbb E[X]=\mathbb E\left[\sum X_i\right]=\sum\mathbb E[X_i]=N\cdot\mathbb E[X_1].$$ That is to say, the expected number of crossings is proportional to the length of the curve (and independent of the shape).
Now we need to fix the constant of proportionality. Take the curve to be a circle of diameter $d$. Almost surely, this curve will cross a line twice. The length of the circle is $\pi d$, so a curve of length $\ell$ crosses a line $\frac{2\ell}{\pi d}$ times.
Now observe that a straight needle of length $\ell\leq d$ can cross a line either $0$ or $1$ times. So the probability it crosses a line is precisely this expectation value $\frac{2\ell}{\pi d}$.
Solution 2:
As lulu mentioned in a comment, the fact that a uniformly random permutation $\pi\colon\{1,2,\dots,n\}\to\{1,2,\dots,n\}$ has in expectation one fixed point is a quite surprising statement, with a one-line proof.
Let $X$ be the number of fixed points of such a uniformly random $\pi$. Then $X=\sum_{k=1}^n \mathbf{1}_{\pi(k)=k}$, and thus $$ \mathbb{E}[X] = \mathbb{E}\left[\sum_{k=1}^n \mathbf{1}_{\pi(k)=k}\right] = \sum_{k=1}^n \mathbb{E}[\mathbf{1}_{\pi(k)=k}] = \sum_{k=1}^n \mathbb{P}\{\pi(k)=k\} = \sum_{k=1}^n \frac{1}{n} = 1\,. $$
Solution 3:
I really love this proof of Bickel and Lehmann (Ann. Math. Stat. 1969), Unbiased Estimation in Convex Families: we can use linearity of expectation to prove unbiased estimators for certain quantities are impossible. $\newcommand{\PP}{\mathbb{P}} \DeclareMathOperator{\E}{\mathbb{E}}$
Let $T(\PP)$ be some property of a distribution you wish to estimate, where $\PP$ is a distribution in some class $\mathcal P$. Let $\PP_0 \ne \PP_1$ both in $\mathcal P$ such that $\PP_\alpha = (1-\alpha) \PP_0 + \alpha \PP_1$, a weighted mixture of the two distributions, is also in $\mathcal P$ for any $\alpha \in [0, 1]$. We want to know whether there exists any estimator $\hat T(X_1, \dots, X_n)$ such that $$\E_{X_i \stackrel{iid}{\sim} \PP} \hat T(X_1, \dots, X_n) = T(\PP).$$ Assume that there is some such estimator. Then \begin{align*} R(\alpha) &= T(\PP_\alpha) \\&= \int_{x_1} \cdots \int_{x_n} \hat T (x_1, \dots, x_n) \,\mathrm{d}\PP_\alpha(x_1) \cdots \mathrm{d}\PP_\alpha(x_n) \\&= \int_{x_1} \cdots \int_{x_n} \hat T(x_1, \dots, x_n) \,\left[ (1-\alpha) \, \mathrm{d}\PP_0(x_1) + \alpha \, \mathrm{d}\PP_1(x_1) \right] \cdots \left[ (1-\alpha) \, \mathrm{d}\PP_0(x_n) + \alpha \, \mathrm{d}\PP_1(x_n) \right] \\&= (1-\alpha)^n \E_{X_i \stackrel{iid}{\sim} \PP_0}[ \hat T(X_1, \dots, X_n)] + \dots + \alpha^n \E_{X_i \stackrel{iid}{\sim} \PP_1}[ \hat T(X_1, \dots, X_n)] ,\end{align*} and so $R(\alpha)$ must be a polynomial in $\alpha$ of degree at most $n$.
Now, for example, let $T$ be the standard deviation, take $\PP_0 = \mathcal N(0, 1)$ [the standard normal distribution] and $\PP_1 = \mathcal N(0, 2^2)$. Then by the law of total variance, $$ R(\alpha) = T(\PP_\alpha) = \sqrt{(1-\alpha) \cdot 1 + \alpha \cdot 4 + \operatorname{Var}[0]} = \sqrt{1 + 3 \alpha} ,$$ which is not polynomial in $\alpha$ – and hence no unbiased estimator of the standard deviation exists, at least on any class of distributions containing two-component Gaussian mixtures.
You can also easily use this same technique to show, e.g., the non-existence of unbiased estimators for Hellinger distance or integral probability metrics (Theorem 3 here).
Solution 4:
Here is an application from theoretical computer science: recall that a 3-SAT formula is a CNF (conjunctive normal form) formula where each clause has three literals: that is, $\phi = C_1\land C_2\land \dots \land C_k$, where each $C_k$ (a clause) is an $\lor$ of 3 (distinct) literals.
Theorem. For every 3-SAT formula formula over $n$ variables, there exists an assignment $x\in\{0,1\}^n$ satisfying at least $7/8$ of the clauses.
Proof. Consider a 3-SAT formula $\phi = C_1\land C_2\land \dots \land C_k$ (we can assume wlog that no clause is trivially true). Letting $I_j$ be the indicator r.v. of whether the $j$-th clause $C_j$ is satisfied, we have that the number $N$ of clauses satisfied by a uniformly random assignment is $$ \mathbb{E}[N] = \mathbb{E}\left[\sum_{j=1}^k I_j\right] = \sum_{j=1}^k \mathbb{E}[I_j] = \sum_{j=1}^k \mathbb{P}\{C_j \text{ satisfied}\} = \sum_{j=1}^k \frac{7}{8} = \frac{7}{8}k $$ since each clause has 3 (distinct) literals, and therefore is satisfied with probability $1-1/2^3=7/8$.
Now, since $N$ has expectation $\frac{7}{8}k$, we have that $\mathbb{P}\{N\geq \frac{7}{8}k\}>0$; that is, there exists some realization of $x$ satisfying at least a $7/8$ fraction of the clauses.