How to prove Bonferroni inequalities?
Solution 1:
A proof is there. The main idea is that this is the integrated version of analogous pointwise inequalities and that, for every $k$, $$ S_k=\mathbb E\left({T\choose k}\right),\qquad T=\sum_{i=1}^n\mathbf 1_{A_i}. $$ Hence the result follows from the stronger inequalities asserting that, for every positive integer $N$, $$ \sum_{i=0}^k(-1)^ia_i,\qquad a_i={N\choose i}, $$ is nonnegative when $k$ is even and nonpositive when $k$ is odd. In turn, this fact follows from the properties that the sequence $(a_i)_{0\leqslant i\leqslant N}$ is unimodal and that $\sum\limits_{i=0}^N(-1)^ia_i=0$.
Solution 2:
Here is a self-contained proof that expands on @Did's remarks.
The assertion is that $\Delta_k\le0$ when $k$ is odd, and $\Delta_k\ge0$ when $k$ is even, where $$ \Delta_k:=P\left(\bigcup_{i=1}^n A_i\right) +\sum_{j=1}^k(-1)^j S_j.\tag1 $$ To prove this, first observe that $S_j$ is the expected value of $$ \sum_{i_1 < \cdots <i_j} I(A_{i_1}\cap\cdots\cap A_{i_j}) = {T \choose j}\tag2 $$ where $I(\cdot)$ denotes an indicator random variable and $T$ is the integer-valued random variable $T:=\sum_{i=1}^n I(A_i)$. The reason is that $I(A_{i_1}\cap\cdots\cap A_{i_j})(\omega)=1$ if and only if $\omega$ belongs to each of the $j$ sets $A_{i_1},\ldots,A_{i_j}$. Thus the LHS of (2) counts the number of ways to select $j$ different $A$'s to which $\omega$ belongs, and so does the RHS of (2). (We follow the convention ${a \choose b}=0$ when $a<b$, so (2) holds even when $T<j$.)
From (2) we see that $\Delta_k$ is the expected value of $$ I(\cup A_i) +\sum_{j=1}^k (-1)^j {T\choose j} \stackrel{(3a)}=I(\cup A_i)\left[\sum_{j=0}^k(-1)^j {T\choose j}\right] \stackrel{(3b)}=I(\cup A_i)\left[(-1)^k{T-1\choose k}\right].\tag3 $$ To justify equality (3a), consider the cases $\omega\in\cup A_i$ and $\omega\not\in\cup A_i$. For (3b) we apply (pointwise) an identity about the truncated sum of alternating binomial coefficients. From this last expression we conclude that (3) is a non-positive random variable when $k$ is odd, and a non-negative random variable when $k$ is even, which implies the claimed result.
As a bonus, plug $k=n$ in (3). Since $T\le n$, the bracketed quantity will be zero, which implies $\Delta_n=0$, which is the inclusion-exclusion principle.
Solution 3:
Bonferroni inequality is closely related to the partial sum of alternating binomial coefficients.
Let's consider an element $w$ in sample space and literally count it in the left-hand side and right-hand side of the inequality. If $w$ belongs to none of $A_1$ to $A_n$, then it is not counted in $\bigcup_{i-1}^n A_i$, and it's not counted in any $A_i$, any $A_i\cap A_i$, ..., and any $A_{i_1}\cap A_{i_2}\ldots\cap A_{i_k}$.
If $w$, however, is contained in $r$ sets from $\{A_1, A_2, \ldots, A_n\}$, let's just say $w$ lies in $A_1, \ldots , A_r$. Then $w$ is counted exactly once in $\bigcup_{i-1}^n A_i$ (LHS), and counted ${r\choose 1}-{r\choose 2}+ \ldots +(-1)^{k-1}{r\choose k}$ times on the right-hand side, where for $k\gt r$, ${r\choose k}=0$. Now, let's compare the counts on both sides.
- If $k=r$, using bionomial theorem to exapnd $(1-1)^n$, we have $1={r\choose 1}-{r\choose 2}+ \ldots +(-1)^{r-1}{r\choose r}$. That is, $w$ is counted the same times on both sides.
- If $k<r$, again compare $1$ (LHS) and ${r\choose 1}-{r\choose 2}+ \ldots +(-1)^{k-1}{r\choose k}$ (RHS). Specifically, let's find $f(k) = 1-{r\choose 1}-{r\choose 2} \ldots (-1)^{k-1}{r\choose k}$ instead. In fact, $f(k)$ is the partial sum of alternating bionomial coefficients and has closed form $(-1)^k {r-1 \choose k}$. This can be easily proved by induction and Pascal's rule, see here. Now, it's easy to see when $k$ is odd, $f(k)$ is negative and hence $w$ is counted more times on RHS; when $k$ is even, $f(k)$ is positive and hence $w$ is counted less times on the RHS.
In summary, for odd $k$, $w$ is counted either equal or more times on the RHS and hence the first $k$ terms on RHS is an upper bound of $P\left(\bigcup_{i=1}^n A_i\right)$; and for even $k$, $w$ is counted either equal or fewer times on the RHS and hence the first $k$ terms on RHS is an lower bound of $P\left(\bigcup_{i=1}^n A_i\right)$. The alternating partial sum of binomial coefficients results in the alternating Bonferroni bounds.