Non-zero Conditional Differential Entropy between a random variable and a function of it

Consider two continuous random variables, one of which is a function of the other: $X$ and $Y=g(X)$. Their mutual information can be written as $$I(X;Y) = h(X) - h(X|Y) = h(Y) - h(Y|X)$$ where lowercase $h$ denotes differential entropy, the entropy concept for continuous rv's. It is a proven fact that the mutual information between two variables, for discrete as well as for continuous rv's, is non-negative, and equals zero only when the two rv's are independent (clearly not our case). Using the fact that $Y$ is a function of $X$ we have $$I(X;g(X)) = h(g(X)) - h(g(X)|X) > 0 \;\Rightarrow\; h(g(X)) > h(g(X)|X).$$ Now, differential entropy (unlike the entropy of discrete rv's) can take negative values. Assume it so happens that $h(g(X)) < 0$. Then, from the positivity of mutual information, we obtain $$0 > h(g(X)) > h(g(X)|X) \;\Rightarrow\; h(g(X)|X) \neq 0.$$

And this is the counter-intuitive puzzle: for any discrete random variable $Z$ we always have $H(g(Z)|Z) = 0$. This is intuitive: if $Z$ is known, then any function of $Z$ is completely determined; no entropy, no uncertainty remains, and so the conditional entropy is zero. But we just saw that, when dealing with continuous rv's where one is a function of the other, the conditional differential entropy may be non-zero (it doesn't matter whether it is positive or negative), which is not intuitive at all, because even in this strange world of continuous rv's, knowing $X$ completely determines $Y=g(X)$.

I have searched high and low for any discussion, comment or exposition of the matter, but I have found nothing. The Cover & Thomas book does not mention it, other books do not mention it, and a myriad of scientific papers and web sites do not mention it either.
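To make the puzzle concrete, here is a minimal numerical sketch under an illustrative choice (not essential to the argument): $X \sim \mathrm{Uniform}(0,1)$ and $g(x) = x/4$, so that $g(X) \sim \mathrm{Uniform}(0,1/4)$ and $h(g(X)) = \log(1/4) < 0$.

```python
# Minimal sketch of the puzzle, assuming X ~ Uniform(0, 1) and g(x) = x / 4
# (an illustrative choice). SciPy's closed-form differential entropies, in nats:
from scipy.stats import uniform

h_X  = uniform(loc=0, scale=1.0).entropy()   # h(X)    = log(1)   =  0
h_gX = uniform(loc=0, scale=0.25).entropy()  # h(g(X)) = log(1/4) ~ -1.386

print(float(h_X), float(h_gX))
```

If $h(g(X)|X)$ were $0$, the non-negativity of mutual information would force $h(g(X)) \ge 0$, contradicting the $\log(1/4) < 0$ computed here; this is exactly the tension described above.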

My motives: a) Scientific curiosity. b) I want to use the concept of mutual information for continuous rv's in an econometrics paper I am writing, and I feel very uncomfortable just mentioning the "non-zero conditional differential entropy" case without being able to discuss it a bit. So any intuition, reference, suggestion, idea, or of course a full answer, would be greatly appreciated. Thanks.


Solution 1:

The paradox can be stated in a simpler form:

We know that $I(X;Y)=h(X)-h(X|Y)\ge 0$ holds for continuous variables as well. Take the particular case $Y=X$; then the second term vanishes ($h(X|X)=0$) and we get

$$ I(X;X)= h(X)-h(X|X)=h(X) \ge 0$$

But this cannot be right: the differential entropy can be negative. So what is going on?

From the question: "when dealing with continuous rv's where one is a function of the other, the conditional differential entropy may be non-zero (it doesn't matter whether it is positive or negative), which is not intuitive at all."

Your problem (and the problem with the above paradox) is that you implicitly assume that the principle "zero entropy means no uncertainty" applies also to differential entropy. That's false. Differential entropy is not really an entropy in that sense. It's false that $h(g(X)|X)=0$, it's false that $h(X|X)=0$, and it's false that $h(X)=0$ implies zero uncertainty. The fact that a differential entropy can be made negative by a mere change of scale suggests by itself that zero differential entropy (conditional or not) has no special meaning here. In particular, a uniform variable on $[0,1]$ has $h(X)=0$.
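A minimal numerical sketch of this scale dependence (my illustration, using the uniform family and assuming SciPy): since $h(aX) = h(X) + \log|a|$, a mere change of units moves the differential entropy above or below zero.

```python
# Sketch of the scale dependence: for X ~ Uniform(0, a), h(X) = log(a) nats,
# so rescaling (changing units) moves the differential entropy through zero.
import numpy as np
from scipy.stats import uniform

for a in (1.0, 0.5, 100.0):
    h = float(uniform(loc=0, scale=a).entropy())
    print(f"a = {a:6.1f}   h(X) = {h:8.3f}   log(a) = {np.log(a):8.3f}")
# a = 1.0 is the Uniform[0, 1] case with h(X) = 0; a = 0.5 gives a negative value,
# a = 100 a positive one -- the same variable measured in different units.
```

The same change of units applied to a discrete variable would leave its Shannon entropy unchanged, which is one way to see that the two quantities do not share an interpretation.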

Solution 2:

Differential entropy is defined only for random variables whose distribution is absolutely continuous with respect to Lebesgue measure. The joint distribution of

$$(X,g(X))$$

is not absolutely continuous because its support is

$$\{(x,g(x)):x\in\mathbb{R}\},$$

a measure 0 subset of $\mathbb{R}^2$.

The definition of conditional differential entropy requires an absolutely continuous joint distribution with density $f(x,y)$: \begin{align} h(X|Y) &= -\int f(x,y) \log f(x|y)\, dx\, dy \\ &= h(X,Y)-h(Y). \end{align} If $(X,Y)$ is not absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^2$, then there is no function $f(x,y)$ such that the probability measure of $(X,Y)$ can be expressed as an integral against $f$.
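For contrast, here is a minimal sketch (an assumed illustrative example, not part of the answer) of a pair to which the definition does apply: $(X,Y)$ bivariate standard Gaussian with correlation $\rho$, $|\rho|<1$, which has a joint density on $\mathbb{R}^2$.

```python
# Contrast case (illustrative assumption): a bivariate standard Gaussian with
# correlation rho, |rho| < 1, is absolutely continuous on R^2, so
# h(X|Y) = h(X, Y) - h(Y) is well defined and matches its closed form.
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.9
h_XY = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]]).entropy()
h_Y  = norm(0, 1).entropy()

print(float(h_XY - h_Y))                              # h(X|Y), ~ 0.588 nats
print(0.5 * np.log(2 * np.pi * np.e * (1 - rho**2)))  # closed form, same value
```

As $\rho \to 1$ the pair degenerates toward $(X,X)$: the joint density ceases to exist and $h(X|Y) \to -\infty$ rather than $0$.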

Therefore neither the quantity $h(g(X)|X)$ nor the quantity $I(X;g(X))$ is well defined in terms of differential entropies.
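One way to see this numerically (a sketch under the assumed choice $X \sim \mathrm{Uniform}(0,1)$ and $g(x)=x/4$): the binned plug-in estimate of $I(X;g(X))$ does not settle at a finite value but grows like the log of the number of bins as the partition is refined.

```python
# Binned (plug-in) estimate of I(X; g(X)) for X ~ Uniform(0, 1), g(x) = x / 4
# (illustrative choice). Because g(X) is a deterministic function of X, the
# estimate tracks log(n_bins) and diverges as the partition is refined.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=200_000)
y = x / 4

for n_bins in (4, 16, 64, 256):
    pxy, _, _ = np.histogram2d(x, y, bins=n_bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
    mask = pxy > 0
    mi = np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask]))
    print(f"{n_bins:4d} bins: I_hat = {mi:.3f} nats   (log n_bins = {np.log(n_bins):.3f})")
```

Here refining the partition sends the estimate to infinity, in line with the conclusion that $I(X;g(X))$ cannot be expressed through finite differential entropies.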