confused about joint mutual information

I have difficulty understanding 'joint mutual information'.

I don't understand expressions like $I(X,Y;B)$.

Is there a good example that would help me understand joint mutual information?

Specifically, I am trying to understand the following text (Michel 2004, The Extraction and Use of Informative Features for Scale Invariant Recognition).

In this context, expressions like $I(A,B;C)$ appear. I want to understand them, but there is little material to study.

I also found http://isites.harvard.edu/fs/docs/icb.topic467421.files/1-entropy.pdf .

In it, $I(X;Y)$, $I(X;Y|Z)$ and $I(X_1,X_2;Y)$ all appear, and it is hard for me to understand the differences between them.




Solution 1:

Mutual information relates two random variables $X$ and $Y$. The variables are usually separated by a semicolon, and the relation is symmetric. So when you read $I(X;Y)$ you should think of it as $\{X\} \overset{I}{\longleftrightarrow}\{Y\}$

(BTW, the main relations are $I(X;Y)=H(X)-H(X|Y)=H(Y)-H(Y|X)=I(Y;X)$, but you probably already knew this).
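For concreteness, here is a minimal numerical sketch (mine, not part of the original answer; the joint pmf is made up) that computes $I(X;Y)$ via the equivalent identity $I(X;Y)=H(X)+H(Y)-H(X,Y)$, which follows from $H(X|Y)=H(X,Y)-H(Y)$:

```python
# A minimal sketch: I(X;Y) in bits from a hypothetical joint pmf p(x, y).
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability array; zero entries are ignored."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Made-up joint distribution p(x, y): rows index X, columns index Y.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

H_X  = entropy(p_xy.sum(axis=1))   # H(X)   from the marginal of X
H_Y  = entropy(p_xy.sum(axis=0))   # H(Y)   from the marginal of Y
H_XY = entropy(p_xy.flatten())     # H(X,Y) joint entropy

# I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
print(H_X + H_Y - H_XY)            # the same number whichever form you use: symmetry
```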

When we write $I(X_1,X_2;Y)$ we (usually) mean that $X_1$ and $X_2$ should be regarded as a composite (multivariate) variable, and we are computing the mutual information of this composite variable with $Y$. That is, $\{X_1,X_2\} \overset{I}{\longleftrightarrow}\{Y\}$
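To see that nothing new is happening, here is a small sketch (again mine, with an arbitrary random pmf) that computes $I(X_1,X_2;Y)$ by literally merging $X_1$ and $X_2$ into one composite variable and then applying the ordinary mutual-information formula:

```python
# A sketch: I(X1,X2;Y) is plain mutual information once (X1, X2) is one variable.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_ab):
    """I(A;B) in bits from a 2-D joint pmf, A on axis 0 and B on axis 1."""
    H_A  = entropy(p_ab.sum(axis=1))
    H_B  = entropy(p_ab.sum(axis=0))
    H_AB = entropy(p_ab.flatten())
    return H_A + H_B - H_AB

# Hypothetical joint pmf p(x1, x2, y) with shape (|X1|, |X2|, |Y|).
rng = np.random.default_rng(0)
p = rng.random((2, 3, 2))
p /= p.sum()

# Merge the first two axes: (X1, X2) becomes a single variable with 2*3 states,
# and I(X1,X2;Y) is just the mutual information of that variable with Y.
p_composite_y = p.reshape(2 * 3, 2)
print(mutual_information(p_composite_y))
```

Nothing depends on whether the composite came from two variables or ten; that is all the comma inside $I(\,\cdot\,,\cdot\,;\,\cdot\,)$ means.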

Regarding $I(X;Y|Z)=H(X|Z)-H(X|Y,Z)$: this is again a mutual information between $X$ and $Y$, only now conditioned on knowledge of $Z$. It is, again, symmetric, so $I(X;Y|Z)=I(Y;X|Z)$.
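And a sketch of the conditional case (illustrative only, with an arbitrary pmf): it computes $I(X;Y|Z)$ once via joint entropies, using $I(X;Y|Z)=H(X,Z)+H(Y,Z)-H(Z)-H(X,Y,Z)$ (which is the same as $H(X|Z)-H(X|Y,Z)$), and once as the $p(z)$-weighted average of the ordinary $I(X;Y)$ within each slice $Z=z$:

```python
# A sketch: two equivalent ways to get I(X;Y|Z) from a hypothetical pmf p(x, y, z).
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Made-up joint pmf p(x, y, z) with shape (|X|, |Y|, |Z|).
rng = np.random.default_rng(1)
p = rng.random((2, 2, 3))
p /= p.sum()

# Way 1: I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
H_XZ  = entropy(p.sum(axis=1).flatten())
H_YZ  = entropy(p.sum(axis=0).flatten())
H_Z   = entropy(p.sum(axis=(0, 1)))
H_XYZ = entropy(p.flatten())
I_cond = H_XZ + H_YZ - H_Z - H_XYZ

# Way 2: average the ordinary I(X;Y) of each conditional slice p(x, y | z) over p(z).
p_z = p.sum(axis=(0, 1))
I_avg = 0.0
for z, pz in enumerate(p_z):
    p_xy_given_z = p[:, :, z] / pz
    H_X  = entropy(p_xy_given_z.sum(axis=1))
    H_Y  = entropy(p_xy_given_z.sum(axis=0))
    H_XY = entropy(p_xy_given_z.flatten())
    I_avg += pz * (H_X + H_Y - H_XY)

print(I_cond, I_avg)   # both equal I(X;Y|Z), and it is symmetric in X and Y
```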

So, if you were confused about the "precedence" in the above expressions: let's say that "$,$" (composition of variables) binds more strongly than "$;$" (mutual information), so that $I(X_1,X_2;Y)$ should be read as $I((X_1,X_2);Y)$; and the latter binds more strongly than "$|$" (global conditioning).

Any textbook on information theory explains the concept and properties of mutual information, e.g. Cover and Thomas, *Elements of Information Theory*.