What do the decomposition, weak union and contraction rules mean for conditional probability, and what are their proofs?

In many of the above solutions, I have found several mistakes. One of them was pointed out by JesterII, namely that $X \perp Y \mid Z$ & $X \perp W \mid Z \Rightarrow X \perp Y,W \mid Z$ is false in general. Another thing I noticed in several places is the use of the equality $ P(X,Y \mid Z)P(W \mid Z) = P(X,Y,W \mid Z)$, which is not true unless $W \perp X,Y \mid Z$. Here is my attempt at the proofs.

  • Decomposition: $ X \perp Y,W \mid Z \Rightarrow X \perp Y \mid Z $
  • Proof: From $X\perp Y,W \mid Z$, we get:

    $$ P(X,Y,W \mid Z) = P(X \mid Z)P(Y,W \mid Z) $$

    Marginalizing W (i.e. summing over all values of W), we get

    $$ \sum_W P(X,Y,W \mid Z) = \sum_W P(X \mid Z)P(Y,W \mid Z) = P(X \mid Z)\sum_W P(Y,W \mid Z)$$

    Thus,

    $$ P(X,Y \mid Z) = P(X \mid Z)P(Y \mid Z) $$

    Hence Proved

  • Contraction: $(X \perp Y \mid Z) \land (X \perp W \mid Y,Z) \Rightarrow X \perp Y,W \mid Z$

  • Proof: By the chain rule of probability:

    $$P(X,Y,W \mid Z) = P(X \mid Y,W,Z)P(Y,W \mid Z)$$

    Using the independence property $ X \perp W \mid Y,Z $, we get:

    $$P(X,Y,W \mid Z) = P(X \mid Y,Z)P(Y,W \mid Z)$$

    Now use the independence property $ X \perp Y \mid Z $ to get:

    $$P(X,Y,W \mid Z) = P(X \mid Z)P(Y,W \mid Z)$$

    Hence Proved

  • Weak Union: $ X \perp Y,W \mid Z \Rightarrow X \perp Y \mid Z,W $

  • Proof: By the chain rule of probability:

    $$P(X,Y \mid W,Z) = P(X \mid Y,W,Z)P(Y \mid W,Z)$$

    Using the independence property $X \perp Y,W \mid Z $, we get:

    $$P(X,Y \mid W,Z) = P(X \mid Z)P(Y \mid W,Z)$$

    Using the decomposition property, we have $X \perp Y,W \mid Z \Rightarrow X \perp W \mid Z$, which gives us $P(X \mid Z) = P(X \mid W,Z)$. Thus

    $$P(X,Y \mid W,Z) = P(X \mid Z)P(Y \mid W,Z) = P(X \mid W,Z)P(Y \mid W,Z)$$

    Hence Proved.

  • Intersection: $(X \perp Y \mid Z,W) \land (X \perp W \mid Y,Z) \Rightarrow X \perp Y,W \mid Z$

  • Proof: This property requires the distribution to be strictly positive, so that all the conditionals below are well defined. Given the 2 independence properties $(X \perp Y \mid Z,W) \land (X \perp W \mid Y,Z)$, we have

    $$P(X \mid Z,W) = P(X \mid Y,Z,W) = P(X \mid Y,Z)$$

    Applying the definition of conditional probability to the first and last terms of the above equation, we get:

    $$\frac{P(X,W \mid Z)}{P(W \mid Z)} = \frac{P(X,Y \mid Z)}{P(Y \mid Z)}$$

    Thus,

    $$ P(X,W \mid Z)P(Y \mid Z) = P(X,Y \mid Z)P(W \mid Z) $$

    Summing over W, we get:

    $$ \sum_W P(X,W \mid Z)P(Y \mid Z) = \sum_W P(X,Y \mid Z)P(W \mid Z) $$

    Since $\sum_W P(X,W \mid Z) = P(X \mid Z)$ and $\sum_W P(W \mid Z) = 1$, this simplifies to

    $$ P(X \mid Z)P(Y \mid Z) = P(X,Y \mid Z) $$

    This proves that $X \perp Y \mid Z$.

    Given $X \perp Y \mid Z$ and $X \perp W \mid Y,Z$, we can now apply the contraction property to get $X \perp Y,W \mid Z$.

    Hence Proved
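To complement the algebraic proofs above, here is a quick numerical sanity check (a sketch of my own, not part of the proofs): it builds small joint distributions over four binary variables that satisfy the premises of each property by construction, and verifies that the conclusions hold. The helper names (`rand_dist`, `marg`, `indep`) are mine; `indep` tests $A \perp B \mid C$ via the identity $P(A,B,C)\,P(C) = P(A,C)\,P(B,C)$, which avoids any division.

```python
import itertools
import random

random.seed(0)
VARS = 'xywz'  # four binary variables

def rand_dist(n):
    """A random, strictly positive distribution over n outcomes."""
    v = [random.random() + 0.1 for _ in range(n)]
    s = sum(v)
    return [x / s for x in v]

def marg(P, keep):
    """Marginal of P over the variables named in `keep` (keys in VARS order)."""
    out = {}
    for assign, p in P.items():
        k = tuple(v for n, v in zip(VARS, assign) if n in keep)
        out[k] = out.get(k, 0.0) + p
    return out

def indep(P, a, b, c, tol=1e-12):
    """Check A independent of B given C via P(a,b,c) P(c) == P(a,c) P(b,c)."""
    tables = {s: marg(P, s) for s in (a + b + c, a + c, b + c, c)}
    def look(s, t):
        return tables[s][tuple(v for n, v in zip(VARS, t) if n in s)]
    return all(abs(look(a + b + c, t) * look(c, t)
                   - look(a + c, t) * look(b + c, t)) < tol
               for t in itertools.product((0, 1), repeat=4))

# P1(x,y,w,z) = P(z) P(x|z) P(y,w|z): enforces X indep. (Y,W) given Z.
pz = rand_dist(2)
px_z = [rand_dist(2) for _ in range(2)]
pyw_z = [rand_dist(4) for _ in range(2)]
P1 = {(x, y, w, z): pz[z] * px_z[z][x] * pyw_z[z][2 * y + w]
      for x, y, w, z in itertools.product((0, 1), repeat=4)}

# P2(x,y,w,z) = P(z) P(x|z) P(y|z) P(w|y,z): enforces the contraction
# premises, X indep. Y given Z and X indep. W given (Y,Z).
py_z = [rand_dist(2) for _ in range(2)]
pw_yz = [[rand_dist(2) for _ in range(2)] for _ in range(2)]
P2 = {(x, y, w, z): pz[z] * px_z[z][x] * py_z[z][y] * pw_yz[y][z][w]
      for x, y, w, z in itertools.product((0, 1), repeat=4)}

checks = {
    'decomposition': indep(P1, 'x', 'yw', 'z') and indep(P1, 'x', 'y', 'z'),
    'weak union':    indep(P1, 'x', 'yw', 'z') and indep(P1, 'x', 'y', 'wz'),
    'contraction':   (indep(P2, 'x', 'y', 'z') and indep(P2, 'x', 'w', 'yz')
                      and indep(P2, 'x', 'yw', 'z')),
    # P1 is strictly positive, so the intersection premises apply to it.
    'intersection':  (indep(P1, 'x', 'y', 'wz') and indep(P1, 'x', 'w', 'yz')
                      and indep(P1, 'x', 'yw', 'z')),
}
for name, ok in sorted(checks.items()):
    print(name, ok)
```

Note that `rand_dist` returns strictly positive probabilities, which is exactly the positivity the intersection property needs.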


Greetings from Germany!
Funny, it seems we have been struggling with exactly the same problem at exactly the same time. The difference is just that I found these unproven lemmas in Judea Pearl's book "Causality". It seems that different authors copy the same text passages, including the confusing parts. But one part of your question helped me to find out what I got wrong. Here is how to solve it:

The most important insight is that what seems to be a definition of conditional independence is actually a first lemma derived from the real definition. The definition of conditional independence is:

$(X \perp Y \mid Z) \Leftrightarrow P(x \mid z)P(y \mid z) = P(x,y \mid z)$ with $P(z)>0$.

In another publication, Pearl also states as a side remark that from now on all conditioning events are assumed to have probability greater than 0, so that conditional probabilities can be combined and split up freely throughout. That is what you have to know to find the proofs! For example, it is then possible to derive what I (and probably you) mistook to be the actual definition of conditional independence:

$(X \perp Y \mid Z)$

$\Leftrightarrow P(x,y \mid z) = P(x \mid z)P(y \mid z)$

$ \Leftrightarrow \frac{ P(x,y \mid z) }{ P(y \mid z) } = P(x \mid z) $

$ \Leftrightarrow \frac{P(x,y,z)P(z)}{P(y,z)P(z)} = P(x \mid z)$

$ \Leftrightarrow \frac{P(x,y,z)}{P(y,z)} = P(x \mid z)$

$ \Leftrightarrow P(x \mid y,z) = P(x \mid z) $
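This equivalence is easy to check numerically as well. Here is a small sketch of my own (the helper and variable names are mine): build a joint distribution that satisfies the factorized definition $P(x,y \mid z) = P(x \mid z)P(y \mid z)$ by construction, and verify that the derived form $P(x \mid y,z) = P(x \mid z)$ holds at every point.

```python
import itertools
import random

random.seed(0)

def rand_dist(n):
    """A random, strictly positive distribution over n outcomes."""
    v = [random.random() + 0.1 for _ in range(n)]
    s = sum(v)
    return [x / s for x in v]

# P(x,y,z) = P(z) P(x|z) P(y|z): conditional independence by the
# factorized definition, with everything strictly positive.
pz = rand_dist(2)
px_z = [rand_dist(2) for _ in range(2)]
py_z = [rand_dist(2) for _ in range(2)]
P = {(x, y, z): pz[z] * px_z[z][x] * py_z[z][y]
     for x, y, z in itertools.product((0, 1), repeat=3)}

# Check the derived form P(x | y,z) = P(x | z) at every point.
ok = True
for x, y, z in itertools.product((0, 1), repeat=3):
    p_yz = sum(P[xx, y, z] for xx in (0, 1))                    # P(y,z)
    p_xz = sum(P[x, yy, z] for yy in (0, 1))                    # P(x,z)
    p_z = sum(P[xx, yy, z] for xx in (0, 1) for yy in (0, 1))   # P(z)
    ok = ok and abs(P[x, y, z] / p_yz - p_xz / p_z) < 1e-12
print(ok)
```

The strict positivity built into `rand_dist` is what makes the divisions by $P(y,z)$ and $P(z)$ safe, mirroring Pearl's side remark above.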

Using the real definition of conditional independence, it is also possible to examine your interpretation of the mysterious concatenation of capital letters. I had the same problem and made the same guess about the semantics, but I was not sure if one really is allowed to split up the expression into two parts. Here is the attempt:

$(X \perp Y \mid Z) \& (X \perp W \mid Z)$

$ \Leftrightarrow P(x \mid z)P(y \mid z) = P(x,y\mid z) \ \& \ P(x \mid z)P(w|z) = P(x,w \mid z) $

$ \Leftrightarrow P(x \mid z) = \frac{P(x,y \mid z)}{P(y \mid z)} \ \& \ P(x \mid z)P(w \mid z) = P(x,w \mid z)$

Fitting the left term into the right one (i.e. substituting $P(x \mid z)$ on the RHS with its expression from the LHS), we get:

$ \Leftrightarrow \frac{P(x, y \mid z)P(w\mid z)}{P(y \mid z)} = P(x,w \mid z)$

Here the original chain went on to replace $P(x,y \mid z)P(w \mid z)$ by $P(x,y,w \mid z)$ and, later, $P(y \mid z)P(w \mid z)$ by $P(y,w \mid z)$, concluding $P(x \mid y,w,z) = P(x \mid z)$, i.e. $(X \perp Y, W \mid Z)$. But those replacements are valid only if $W \perp X,Y \mid Z$ and $Y \perp W \mid Z$, respectively — exactly the mistake the first answer warns about. In fact, the "composition" property $(X \perp Y \mid Z) \ \& \ (X \perp W \mid Z) \Rightarrow (X \perp Y,W \mid Z)$ is false in general: take $Y$ and $W$ to be independent fair coins, $X = Y \oplus W$, and $Z$ constant; then both premises hold, but $X$ is a deterministic function of $(Y,W)$. Only the converse direction $(X \perp Y,W \mid Z) \Rightarrow (X \perp Y \mid Z) \ \& \ (X \perp W \mid Z)$ holds, and that is just the decomposition property applied twice.
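One caveat worth flagging: the chain above merges products like $P(x,y \mid z)P(w \mid z)$ into $P(x,y,w \mid z)$, which, as the first answer points out, does not hold in general. The forward direction $(X \perp Y \mid Z)\ \&\ (X \perp W \mid Z) \Rightarrow (X \perp Y,W \mid Z)$ (the "composition" property) in fact fails, which a minimal numerical sketch shows ($Z$ trivial, $X = Y \oplus W$; the helper names are mine):

```python
import itertools

# Y, W: independent fair coins; X = Y XOR W; Z is trivial and omitted.
P = {}
for y, w in itertools.product((0, 1), repeat=2):
    P[(y ^ w, y, w)] = 0.25  # keys are (x, y, w); only 4 points carry mass

def marg(keep):
    """Marginal over the variables named in `keep` (subset of 'xyw')."""
    out = {}
    for (x, y, w), p in P.items():
        k = tuple(v for n, v in zip('xyw', (x, y, w)) if n in keep)
        out[k] = out.get(k, 0.0) + p
    return out

px, py, pw = marg('x'), marg('y'), marg('w')
pxy, pxw, pyw, pxyw = marg('xy'), marg('xw'), marg('yw'), marg('xyw')

# Both pairwise independencies hold ...
ok_xy = all(abs(pxy[x, y] - px[(x,)] * py[(y,)]) < 1e-12 for x, y in pxy)
ok_xw = all(abs(pxw[x, w] - px[(x,)] * pw[(w,)]) < 1e-12 for x, w in pxw)
# ... but the joint independence X vs (Y,W) fails: X is a function of (Y,W).
ok_joint = all(abs(pxyw[x, y, w] - px[(x,)] * pyw[(y, w)]) < 1e-12
               for x, y, w in pxyw)
print(ok_xy, ok_xw, ok_joint)  # True True False
```

So splitting $(X \perp Y,W \mid Z)$ into the two pairwise statements is safe (that is decomposition), but recombining the two pairwise statements into the joint one is not.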

And, finally, the decomposition direction also leads to the proof of the weak union feature we both were trying to find at the same time:

Step 1:

$(X \perp Y, W \mid Z)$

$\implies$ (by the decomposition property):

$(X \perp W \mid Z)$

$ \Leftrightarrow P(x \mid w,z) = P(x \mid z)$

Step 2:

$(X \perp Y, W \mid Z)$

$ \Leftrightarrow P(x \mid y,w,z) = P(x \mid z)$

Step 3, combining the results of steps 1 and 2:

$P(x\mid z) = P(x \mid y,w,z) = P(x \mid w,z)$

$\Leftrightarrow (X \perp Y \mid W, Z)$

So, in total we derived:

$(X \perp Y, W \mid Z) \implies (X \perp Y \mid W, Z)$ q.e.d.
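The three steps above can also be sanity-checked numerically. Here is a small sketch of my own (variable names are mine): build a distribution with $X \perp (Y,W) \mid Z$ by construction and test the weak union conclusion $P(x \mid y,w,z) = P(x \mid w,z)$ pointwise.

```python
import itertools
import random

random.seed(0)

def rand_dist(n):
    """A random, strictly positive distribution over n outcomes."""
    v = [random.random() + 0.1 for _ in range(n)]
    s = sum(v)
    return [x / s for x in v]

# Enforce the premise X indep. (Y,W) given Z:
# P(x,y,w,z) = P(z) P(x|z) P(y,w|z).
pz = rand_dist(2)
px_z = [rand_dist(2) for _ in range(2)]
pyw_z = [rand_dist(4) for _ in range(2)]
P = {(x, y, w, z): pz[z] * px_z[z][x] * pyw_z[z][2 * y + w]
     for x, y, w, z in itertools.product((0, 1), repeat=4)}

# Weak union says X indep. Y given (W,Z), i.e. P(x | y,w,z) = P(x | w,z).
ok = True
for x, y, w, z in itertools.product((0, 1), repeat=4):
    p_ywz = sum(P[xx, y, w, z] for xx in (0, 1))                   # P(y,w,z)
    p_xwz = sum(P[x, yy, w, z] for yy in (0, 1))                   # P(x,w,z)
    p_wz = sum(P[xx, yy, w, z] for xx in (0, 1) for yy in (0, 1))  # P(w,z)
    ok = ok and abs(P[x, y, w, z] / p_ywz - p_xwz / p_wz) < 1e-12
print(ok)
```

Again, the strict positivity of `rand_dist` keeps all the conditioning events well defined, matching the standing assumption mentioned above.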

Before I realized all this, I had already been able to prove the decomposition feature by assuming the weak union feature. But with this new understanding of what is defined and what is derived, it should be even more straightforward to prove the other features. As soon as I have finished the other proofs, I might post them here, if still needed by someone.

Just one last tip: In Pearl's book, he is citing two earlier articles that can easily be found online and look very informative:

Dawid (1979) Conditional independence in statistical theory

This is the article where this "non-standard notation" of conditional independence was first introduced.

Pearl & Paz (1987) Graphoids: A graph-based logic for reasoning about relevance relations

To my knowledge, this was the first pioneering work on combining graphs with conditional independence to yield graphical probability models. In an article by Chaitin, I recently read that in order to get a feeling for the intuition behind a mathematical theory, it is a good idea to read the very first early publications in which the theory was still under construction, so this might be a good read.

I hope this helps. And if I get stuck again, I now know a new site where to post my questions! :-)

Roul from Bochum & Osnabrück, Germany


Your interpretation of the statement is correct, and I actually prefer "your" notation $X\perp\!\!\!\perp (Y,W)\mid Z$ rather than the one without the parenthesis.

So what does $X\perp\!\!\!\perp Y\mid Z$ even mean? One possible way of defining this, is to require that $$ P(X\in A,Y\in B\mid Z)=P(X\in A\mid Z)P(Y\in B\mid Z) $$ should hold for any (measurable) $A,B\subseteq\mathbb{R}$. An equivalent definition is that $$ {\rm E}[f(X)g(Y)\mid Z]={\rm E}[f(X)\mid Z]{\rm E}[g(Y)\mid Z] $$ should hold for all bounded (measurable) $f,g:\mathbb{R}\to\mathbb{R}$.

As for the "decomposition" statement, suppose $X\perp\!\!\!\perp (Y,W)\mid Z$ and let $A,B\subseteq\mathbb{R}$. Then $$ P(X\in A,Y\in B\mid Z)=P(X\in A,Y\in B,W\in \mathbb{R}\mid Z) $$ since $P(W\in \mathbb{R})=1$. Using the conditional independence assumption, this equals $$ \begin{align} P(X\in A,(Y,W)\in B\times\mathbb{R}\mid Z)&=P(X\in A\mid Z)P((Y,W)\in B\times\mathbb{R}\mid Z)\\ &=P(X\in A\mid Z)P(Y\in B\mid Z) \end{align} $$ and hence $X\perp\!\!\!\perp Y\mid Z$.
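On a finite space, the expectation form of this definition is easy to check directly. Here is a sketch (assuming discrete variables, with helper names of my own): build $P(x,y,z) = P(z)P(x \mid z)P(y \mid z)$ and verify ${\rm E}[f(X)g(Y) \mid Z] = {\rm E}[f(X) \mid Z]\,{\rm E}[g(Y) \mid Z]$ for a pair of arbitrary bounded functions.

```python
import itertools
import random

random.seed(0)

def rand_dist(n):
    """A random, strictly positive distribution over n outcomes."""
    v = [random.random() + 0.1 for _ in range(n)]
    s = sum(v)
    return [x / s for x in v]

# X, Y, Z each take values in {0, 1, 2}; X indep. Y given Z by construction:
# P(x,y,z) = P(z) P(x|z) P(y|z).
K = 3
pz = rand_dist(K)
px_z = [rand_dist(K) for _ in range(K)]
py_z = [rand_dist(K) for _ in range(K)]
P = {(x, y, z): pz[z] * px_z[z][x] * py_z[z][y]
     for x, y, z in itertools.product(range(K), repeat=3)}

def f(x):
    return x * x - 1.0   # arbitrary bounded functions on a finite space

def g(y):
    return 3.0 * y + 0.5

# Compare E[f(X)g(Y) | Z=z] with E[f(X) | Z=z] * E[g(Y) | Z=z] for each z.
ok = True
for z in range(K):
    e_fg = sum(f(x) * g(y) * P[x, y, z]
               for x, y in itertools.product(range(K), repeat=2)) / pz[z]
    e_f = sum(f(x) * px_z[z][x] for x in range(K))
    e_g = sum(g(y) * py_z[z][y] for y in range(K))
    ok = ok and abs(e_fg - e_f * e_g) < 1e-12
print(ok)
```

On a finite space every function is bounded and measurable, so this check exercises exactly the equivalence between the event-based and expectation-based definitions stated above.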