Prove that $P\Big( \bigcup\limits_{i=1}^{n}A_i\Big) = \sum\limits^n_{k=1}(-1)^{k-1}\sum\limits_{1 \leq i_1<\cdots<i_k \leq n} P (A_{i_1} \cap\cdots\cap A_{i_k}).$

I want to prove this formula by induction:

$$P\Big( \bigcup_{i=1}^{n}A_i\Big) = \sum^n_{k=1}(-1)^{k-1}\sum_{(i_1,\dots,i_k)\ :\ 1 \leq i_1<\cdots<i_k \leq n } P (A_{i_1}\cap A_{i_2} \cap\cdots\cap A_{i_k}). $$

I already proved the base case: $$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

I am trying to do the induction step: $$P\Big( \bigcup_{i=1}^{n+1}A_i\Big) = P\Big( \big( \bigcup_{i=1}^{n}A_i \big) \cup A_{n+1} \Big) $$

Using the base case I get:

$$= P\Big( \bigcup_{i=1}^{n}A_i\Big) + P(A_{n+1}) - P\Big( \Big( \bigcup_{i=1}^{n}A_i \Big) \cap A_{n+1} \Big)$$

Using the induction hypothesis, together with the distributive law $\big(\bigcup_{i=1}^{n}A_i\big)\cap A_{n+1}=\bigcup_{i=1}^{n}(A_i\cap A_{n+1})$, it is getting messy:

$$= P(A_{n+1}) + \sum^n_{k=1}(-1)^{k-1}\sum_{(i_1,\dots,i_k)\ :\ 1 \leq i_1<\cdots<i_k \leq n } P (A_{i_1}\cap A_{i_2} \cap\cdots\cap A_{i_k}) - \\ \sum^n_{k=1}(-1)^{k-1}\sum_{(i_1,\dots,i_k)\ :\ 1 \leq i_1<\cdots<i_k \leq n } P \big( (A_{i_1}\cap A_{n+1})\cap (A_{i_2} \cap A_{n+1}) \cap\cdots \cap(A_{i_k} \cap A_{n+1}) \big)$$
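One simplification worth noting: all the repeated $A_{n+1}$ factors in each term of the last sum coincide, so every term collapses to a single intersection:

$$(A_{i_1}\cap A_{n+1})\cap (A_{i_2} \cap A_{n+1}) \cap\cdots \cap(A_{i_k} \cap A_{n+1}) = A_{i_1}\cap A_{i_2}\cap\cdots\cap A_{i_k}\cap A_{n+1},$$

so the last sum runs over intersections of $k+1$ of the original events, all involving $A_{n+1}$.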

How do I proceed from here, or is there a better approach?


Solution 1:

The principle of inclusion/exclusion is actually based on the equality: $$\mathbf{1}_{\bigcup_{i=1}^{n}A_{i}}=\sum_{k=1}^{n}\left(-1\right)^{k-1}\sum_{1\leq i_{1}<\cdots<i_{k}\leq n}\mathbf{1}_{A_{i_{1}}\cap\cdots\cap A_{i_{k}}}\tag1$$
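Identity $(1)$ can be checked numerically for concrete finite sets. The following sketch (with hypothetical example sets) evaluates the right-hand side of $(1)$ pointwise and compares it with the indicator of the union:

```python
from itertools import combinations

def indicator_rhs(sets, omega):
    """Evaluate the RHS of (1) at the point omega."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):
        for combo in combinations(sets, k):
            # indicator of the k-fold intersection, evaluated at omega
            inter = set.intersection(*combo)
            total += (-1) ** (k - 1) * (1 if omega in inter else 0)
    return total

A = [{1, 2, 3}, {2, 3, 4}, {3, 5}]      # hypothetical example events
union = set().union(*A)
for omega in range(7):                  # sample points 0,...,6
    lhs = 1 if omega in union else 0    # indicator of the union at omega
    assert lhs == indicator_rhs(A, omega)
```

The assertion holds at every sample point, which is exactly what $(1)$ claims.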

Note that LHS and RHS both give $0$ if an argument $\omega\notin\bigcup_{i=1}^nA_i$ is substituted.

Now suppose that $\omega\in A_{j}$ iff $j\in\left\{ j_{1},\dots,j_{m}\right\} $ where $\left\{ j_{1},\dots,j_{m}\right\} \subseteq\left\{ 1,\dots,n\right\} $ has cardinality $m>0$.

Then substituting $\omega$ as argument gives $1$ on LHS and also on RHS:

$$\sum_{k=1}^{m}\left(-1\right)^{k-1}\binom{m}{k}=1-\sum_{k=0}^{m}\binom{m}{k}\left(-1\right)^{k}1^{m-k}=1-\left(\left(-1\right)+1\right)^{m}=1-0=1$$
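The alternating binomial sum above can be verified for small $m$ with a quick script (using the standard-library `math.comb`):

```python
from math import comb

# Check sum_{k=1}^m (-1)^(k-1) * C(m, k) == 1, the computation
# done above via the binomial theorem applied to ((-1) + 1)^m.
for m in range(1, 10):
    s = sum((-1) ** (k - 1) * comb(m, k) for k in range(1, m + 1))
    assert s == 1
```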

This proves $(1)$, and taking expectations on both sides we consequently find: $$P\left(\bigcup_{i=1}^{n}A_{i}\right)=\sum_{k=1}^{n}\sum_{1\leq i_{1}<\cdots<i_{k}\leq n}\left(-1\right)^{k-1}P(A_{i_{1}}\cap\cdots\cap A_{i_{k}})\tag2$$
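Formula $(2)$ itself can be sanity-checked on a finite uniform probability space. A minimal sketch, with hypothetical events on $\Omega=\{0,\dots,9\}$ and exact arithmetic via `fractions.Fraction`:

```python
from itertools import combinations
from fractions import Fraction

Omega = set(range(10))
P = lambda B: Fraction(len(B), len(Omega))   # uniform measure on Omega

A = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 0}]  # hypothetical events
lhs = P(set().union(*A))                     # P of the union
rhs = sum(
    (-1) ** (k - 1) * P(set.intersection(*combo))
    for k in range(1, len(A) + 1)
    for combo in combinations(A, k)
)
assert lhs == rhs                            # both equal 7/10 here
```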

So this gives a direct proof without using induction, but it does not really help you if you insist on using induction.

Induction can be applied and your setup is okay, but if induction is not required, then I would certainly choose this approach.


Addendum

First note that for the sake of consistency I replaced every $x$ above by $\omega$.

" I don't know how $x$ can be an argument..."

So this question translates to:

" I don't know how $\omega$ can be an argument..."

As you said in your comment $A_1,A_2,\dots$ are events. That means that - if you work on probability space $(\Omega,\mathcal A,P)$ - they are elements of $\mathcal A$ where $\sigma$-algebra $\mathcal A$ is a subcollection of the power set $\wp(\Omega)$. So actually the $A_i$ are subsets of $\Omega$ in this context.

If $B\subseteq\Omega$ then $\mathbf1_B:\Omega\to\mathbb R$ is the function prescribed by $\omega\mapsto1$ if $\omega\in B$ and $\omega\mapsto0$ otherwise.

So elements of $\Omega$ serve as arguments for functions like $\mathbf1_B$ where $B\subseteq\Omega$.


" I don't get why the binomial coefficient appears..."

Start with a fixed $\omega\in A_{j_1}\cap\cdots\cap A_{j_m}$, where $1\leq j_1<\cdots<j_m\leq n$ and $\omega\notin A_i$ for every $i\in\{1,\dots,n\}\setminus\{j_1,\dots,j_m\}$. For a fixed $k$, take a look at the summation:$$\sum_{1\leq i_{1}<\cdots<i_{k}\leq n}\mathbf{1}_{A_{i_{1}}\cap\cdots\cap A_{i_{k}}}(\omega)$$

Every term equals $1$ or $0$ so the summation equals the number of terms that equal $1$.

Now note that the term $\mathbf{1}_{A_{i_{1}}\cap\cdots\cap A_{i_{k}}}(\omega)$ equals $1$ if and only if $\{i_1,\dots,i_k\}\subseteq\{j_1,\dots,j_m\}$.

So selections of $k$ elements of $\{j_1,\dots,j_m\}$ correspond to terms that equal $1$, and there are exactly $\binom{m}{k}$ such selections.
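This counting argument can be replayed directly with `itertools.combinations`. A sketch with hypothetical values $n=8$ and $J=\{j_1,\dots,j_m\}=\{1,3,4,6,8\}$:

```python
from itertools import combinations
from math import comb

J = {1, 3, 4, 6, 8}     # indices j with omega in A_j (hypothetical)
m = len(J)
for k in range(1, m + 1):
    # a term equals 1 iff {i_1,...,i_k} is a subset of J
    hits = sum(1 for combo in combinations(range(1, 9), k)
               if set(combo) <= J)
    assert hits == comb(m, k)
```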


" Also we did not introduce the concept of expectation yet.."

I am not going to give a full lecture in order to introduce expectation, but I trust that you will be made familiar with it shortly.

Actually, for this only two things are important, and they are not too much to mention here:

  • If $B\in\mathcal A$ then the expectation of $\mathbf1_B$ exists with $\mathbb E\,\mathbf1_B=P(B)$.
  • Linearity of expectation is applied on the RHS.
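The first bullet can be illustrated on a tiny uniform space (a hypothetical fair-die example), computing $\mathbb E\,\mathbf1_B$ as a weighted sum over $\Omega$:

```python
from fractions import Fraction

Omega = [1, 2, 3, 4, 5, 6]                    # a fair die
B = {2, 4, 6}                                 # the event "even"
indicator = lambda w: 1 if w in B else 0      # the function 1_B

# E[1_B] = sum over omega of 1_B(omega) * P({omega})
E = sum(Fraction(indicator(w), len(Omega)) for w in Omega)
P_B = Fraction(len(B), len(Omega))
assert E == P_B                               # both equal 1/2
```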

I hope things are more clear now.

Often the principle of inclusion/exclusion is taught without mentioning the underlying identity $(1)$. That is really a pity, and this answer is an effort to save you (and hopefully also others) from that.