The definition of independence is not intuitive

The point of independent events is not quite what you have in mind. If you know that a roll gave you a 1 or 2, then you know with absolute certainty that the roll was not a 5 or 6. In other words, the probability of it being 5 or 6 dropped from 2/6 to zero. Very much dependence there!

What the term "independent" seeks to capture is the following: You roll two dice, one colored red and the other green. The red one comes up two. What's the probability that the green one is a six? Here there is no cosmic connection between the two dice, so the intuitive reaction should be: why would the outcome of the red roll affect the green roll? Well, it shouldn't. That's what we call independent. What you describe is "disjoint events". Those do play a role in probability, but the word "independent" is reserved for this other useful concept.
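
A quick way to make this concrete is to enumerate the outcomes (a minimal sketch; the particular events, red showing two and green showing six, are just for illustration):

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely (red, green) outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

green_six = [o for o in outcomes if o[1] == 6]
red_two = [o for o in outcomes if o[0] == 2]
both = [o for o in outcomes if o[0] == 2 and o[1] == 6]

p_green_six = Fraction(len(green_six), len(outcomes))           # 6/36 = 1/6
p_green_six_given_red_two = Fraction(len(both), len(red_two))   # 1/6

# Conditioning on the red die changes nothing: the rolls are independent.
print(p_green_six, p_green_six_given_red_two)  # 1/6 1/6
```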

To address your last question: at the dawn of probability theory, another word might well have been chosen to describe this. But there already is a word for the situation $E\cap F=\emptyset$! You called such events disjoint yourself (the technically correct term here is mutually exclusive)! If we used two words for the same concept, there would be no end to the resulting confusion. Also, this meaning of the word independent does match the intuition of practitioners of probability theory, at least after they have seen it used a few times. You will quickly join us!

Adding one more thing: defining independence this way is just so damn useful. As an example I'm vaguely familiar with, the functioning of cell phones depends to some extent on our ability to model and analyze various and sundry sources of noise and interference using independent random variables of exactly this kind. Other posters can undoubtedly list even more commonplace applications.


The probabilistic definition of independence is related to the idea of causality: roughly speaking, independence is the absence of any influence in either direction. If knowledge of one event changes the probability of another event, there is a dependence. Independence, on the other hand, is a certain balance point on the spectrum between exclusive and "inclusive" events.

Probability also talks about an event space (the standard term is sample space): the set of all possible outcomes of an "experiment", for example rolling one die, or rolling two dice.

Your first confusion is whether, when $A$ is rolling a $1$ or $2$ and $B$ is rolling a $5$ or $6$, these live in the same event space (referring to the same roll of the die) or in different event spaces (different rolls of the die). Given a fair die, $A$ and $B$ will be independent if they come from different rolls. But if they both refer to the same roll, then of course they are mutually exclusive, and in that case there is a strong dependence at work: $$ \begin{array}{lll} P(A)=\frac13 & P(A|B)=0 & P(A|\overline{B})=\frac12 \\ P(B)=\frac13 & P(B|A)=0 & P(B|\overline{A})=\frac12 \end{array} $$

One source of confusion is that when we say $A$ and $B$ come from separate events, we probably mean they live in different event spaces, for example the first and second roll of a die. But if we say they are distinct events, it is perhaps unclear whether we mean different events in the same space (the same roll of the die) or events from different spaces. Unfortunately, the problem of clearly describing identity and difference of object types and instances cannot be avoided, and one must be careful to clarify this, both as a reader and as a writer.
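
These six numbers are easy to verify by enumeration (a small sketch for the same-roll case, with $A$ and $B$ as above):

```python
from fractions import Fraction

omega = set(range(1, 7))   # one roll of a fair die
A = {1, 2}                 # "the roll is a 1 or 2"
B = {5, 6}                 # "the roll is a 5 or 6" -- same roll as A

def p(event):
    return Fraction(len(event), len(omega))

def p_given(event, cond):
    return Fraction(len(event & cond), len(cond))

print(p(A), p_given(A, B), p_given(A, omega - B))  # 1/3 0 1/2
print(p(B), p_given(B, A), p_given(B, omega - A))  # 1/3 0 1/2
```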

Given any events $A$ and $B$, we can draw a Venn diagram representing them, perhaps with the areas representing their probabilities $P(A)$ and $P(B)$. The area of intersection is then the probability $P(AB)$ of both events being true. If the events are mutually exclusive, then the areas have no intersection and so $P(AB)=0$ (each event precludes the other).

[Venn diagram: two mutually exclusive events as disjoint regions]

At the other extreme, one event may include the other, in which case one area is inside the other. For example if $A$ is contained in $B$, i.e. if $A \implies B$, then $P(AB)=P(A)$, and the conditional probability $P(B|A)=\frac{P(AB)}{P(A)}$ of $B$ given $A$ is thus $1$.

[Venn diagram: $A$ contained inside $B$]

So a general law is that $$0\le P(AB) \le \min\left( P(A),~ P(B) \right) $$ $$0\le P(A|B)=\frac{P(AB)}{P(B)} \le \min\left( \frac{P(A)}{P(B)},~1 \right) $$ $$0\le P(B|A)=\frac{P(AB)}{P(A)} \le \min\left(1,~\frac{P(B)}{P(A)} \right).$$ When the sandwiched quantity equals one of the minimum values, we have inclusion/implication; when it equals zero, we have mutual exclusivity.
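
These bounds are easy to spot-check by enumeration (a small sketch reusing one roll of a fair die; the three event pairs below are my own picks to hit each regime):

```python
from fractions import Fraction

omega = set(range(1, 7))       # one roll of a fair die

def p(event):
    return Fraction(len(event), len(omega))

cases = [({1, 2}, {5, 6}),     # mutually exclusive: P(AB) hits 0
         ({1}, {1, 2}),        # inclusion: P(AB) hits min(P(A), P(B))
         ({1, 2, 3}, {3, 4})]  # ordinary intersection: strictly between

for A, B in cases:
    pab = p(A & B)
    assert 0 <= pab <= min(p(A), p(B))
    print(p(A), p(B), pab)     # 1/3 1/3 0, then 1/6 1/3 1/6, then 1/2 1/3 1/6
```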

[Venn diagram: two events that merely intersect]

Between these two extremes of exclusive and "inclusive" events, there is a balance point. That balance point is when $P(AB)=P(A)\,P(B)$, and this is called independence. However, it is not so easy to tell from the diagrams alone whether two events merely intersect, or whether they are in fact independent. This is where we need the numerical formulation of independence. As an exercise, you should convince yourself from the formula for conditional probability above that

$$P(A|B)=P(A) \iff P(AB)=P(A)\,P(B) \iff P(B|A)=P(B).$$
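
Spelling out the exercise (a short derivation, assuming $P(A)>0$ and $P(B)>0$ so that both conditional probabilities are defined):

$$P(A|B)=P(A) \iff \frac{P(AB)}{P(B)}=P(A) \iff P(AB)=P(A)\,P(B) \iff \frac{P(AB)}{P(A)}=P(B) \iff P(B|A)=P(B).$$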

Independence says that you can recover the joint probability from the marginal probabilities. If you take a conventional Venn diagram for two events $A$ and $B$ and draw them inside a unit square as rectangles, and if you can do this with $A$ along one axis and $B$ along the other and the areas of the rectangles and their intersection still represent the probabilities, then $A$ and $B$ are independent.

[Diagram: $A$ and $B$ drawn as perpendicular strips of the unit square; their intersection is a rectangle of area $P(A)\,P(B)$]

This is not a fluke of geometry. Independence has a geometrical interpretation because independent probabilities multiply exactly the way areas of axis-aligned rectangles do: the joint event corresponds to a Cartesian product, with one factor along each coordinate axis, and its probability is the product of the side lengths.

It's also good to have a tabular example. Let's say here that we roll two dice, a red die and a blue die. Let $A$ be the event that the blue die is in $\{1,2,3,4\}$ (or any particular four possibilities) and $B$ be the event that the red die is in $\{5,6\}$ (or any two values). These events should be independent. The table below is roughly the mirror image of the graphic above (with a vertical flip): $$ \begin{array}{c|cc|c} & A & \overline{A} & \text{Total} \\ \hline B & \frac29 & \frac19 & \frac13 \\ \overline{B} & \frac49 & \frac29 & \frac23 \\ \hline \text{Total} & \frac23 & \frac13 & 1 \end{array} $$ We can also rewrite the above probabilities as (proportional) counts in the event space: $$ \begin{array}{c|cc|c} & A & \overline{A} & \text{Total} \\ \hline B & 2 & 1 & 3 \\ \overline{B} & 4 & 2 & 6 \\ \hline \text{Total} & 6 & 3 & 9 \end{array} $$ Try writing out the next few numeric examples you encounter this way and the concept will become second nature.
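
A quick enumeration check of this table (a minimal sketch; the events match the example above):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))       # (blue, red), 36 total

A = {o for o in outcomes if o[0] in {1, 2, 3, 4}}     # blue die in {1,2,3,4}
B = {o for o in outcomes if o[1] in {5, 6}}           # red die in {5,6}

def p(event):
    return Fraction(len(event), len(outcomes))

# The joint cell equals the product of the marginal totals: 2/9 = 2/3 * 1/3.
assert p(A & B) == p(A) * p(B)
print(p(A & B), p(A) * p(B))   # 2/9 2/9
```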

Independence means that the joint distribution factors into the marginals.

If you are familiar with the concept of Cartesian products of sets...

If we start with a priori unrelated events $A\subseteq X$ and $B\subseteq Y$, where $p_X$ and $p_Y$ are the probability distributions on $X$ and $Y$, then $p_{X\times Y}(A\times B)=p_X(A)\cdot p_Y(B)$ defines a probability on the product event space $X\times Y$ under which every event $A$ of $X$ is independent of every event $B$ of $Y$. Independence means that the product space decomposes under marginalization, and that the decomposition forms a commutative diagram: reversing marginalization, by reintroducing each variable, recovers the correct/original joint distribution.
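
A sketch of this product construction in code (finite distributions as dictionaries; the particular spaces and names here are my own choices, nothing standard):

```python
from fractions import Fraction

# Two a priori unrelated finite distributions, p_X on X and p_Y on Y.
p_X = {1: Fraction(1, 2), 2: Fraction(1, 2)}
p_Y = {'a': Fraction(1, 3), 'b': Fraction(2, 3)}

# The product distribution on X x Y: every event of X is
# independent of every event of Y under p_XY.
p_XY = {(x, y): p_X[x] * p_Y[y] for x in p_X for y in p_Y}

# Marginalizing the product recovers the originals (the commutative diagram).
marg_X = {x: sum(p_XY[x, y] for y in p_Y) for x in p_X}
marg_Y = {y: sum(p_XY[x, y] for x in p_X) for y in p_Y}

assert marg_X == p_X and marg_Y == p_Y
```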

Independence, and conditional independence, are also interesting from the perspectives of Bayesian networks (a kind of graphical model) and, via entropy, information theory. When there are many variables in a Bayesian network, there is an interesting method of diagramming their relationships called plate notation. In decision theory, there are also influence diagrams.