Intuition behind the concept of indicator random variables.

Solution 1:

As the name implies, an indicator random variable indicates something: the value of $I_A$ is $1$ precisely when the event $A$ occurs, and is $0$ when $A$ does not occur (that is, when $A^c$ occurs). Think of $I_A$ as a Boolean variable that indicates the occurrence of the event $A$. This Boolean variable has value $1$ with probability $P(A)$, and so its average (expected) value is $P(A)$. In terms of long-run frequencies, $I_A$ will equal $1$ on roughly $N\cdot P(A)$ of $N$ independent trials of the experiment, so its average value over those $N$ trials will be approximately $P(A)$.
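For a quick sanity check of this frequency argument, here is a minimal simulation sketch (the die experiment, the event $A$ = "the roll is a six", and the number of trials are illustrative choices of mine, not part of the answer): the average of $I_A$ over many independent trials settles near $P(A) = 1/6$.

```python
import random

# Illustrative sketch: A = "a fair die shows a six", so P(A) = 1/6.
N = 100_000
count = 0
for _ in range(N):
    roll = random.randint(1, 6)   # one trial of the experiment
    I_A = 1 if roll == 6 else 0   # indicator of the event A on this trial
    count += I_A

average = count / N               # long-run average of I_A
print(f"average of I_A over {N} trials: {average:.4f}  (P(A) = {1/6:.4f})")
```

The printed average is simply the fraction of trials on which $A$ occurred, which is exactly the point: averaging an indicator counts occurrences of the event.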

Solution 2:

An indicator function returns the value $1$ when a statement (or event) $A$ is true and $0$ when it is false: $$\mathbf{1}[A] = \begin{cases} 1, & A \text{ is true,} \\ 0, & A \text{ is false.} \end{cases}$$

Taking the expectation of an indicator is therefore the same as computing the expected value of a Bernoulli random variable: the value $1$ times the probability that $A$ is true, plus the value $0$ times the probability that it is false, $$E\bigl[\mathbf{1}[A]\bigr] = 1\cdot P(A) + 0\cdot\bigl(1 - P(A)\bigr) = P(A).$$
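To see the Bernoulli connection numerically, here is a small sketch (the probability $p = 0.3$, the sample size, and the seed are arbitrary choices for illustration, not taken from the answer): sampling the indicator as a Bernoulli($p$) variable and averaging recovers $p$.

```python
import numpy as np

# Illustrative sketch: the indicator of A is Bernoulli(p) with p = P(A).
rng = np.random.default_rng(seed=0)
p = 0.3                                            # assumed value of P(A)
samples = rng.binomial(n=1, p=p, size=1_000_000)   # draws of 1[A]
print(samples.mean())                              # approximately E[1[A]] = p
```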