I'm a bit confused about the concept of differential entropy. Wikipedia says that the differential entropy of a Gaussian is $\log(\sigma\sqrt{2\pi e})$. However, as $\sigma \rightarrow 0^+$, the only intuitive value for the entropy seems to me to be 0: we are then 100% sure that the outcome will equal $\mu$, and nothing is required to store the knowledge of what the outcome will be. Instead, the expression above gives $-\infty$.
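For concreteness, here is a quick numerical check of my own (using base-2 logarithms so the values are in bits) of how the formula behaves as $\sigma$ shrinks:

```python
# A quick numerical check (my own, not from any reference): the Gaussian
# differential entropy in bits, log2(sigma * sqrt(2*pi*e)), drifts
# towards -infinity as sigma shrinks.
import math

for sigma in (1.0, 0.1, 0.01, 0.001):
    h = math.log2(sigma * math.sqrt(2 * math.pi * math.e))
    print(f"sigma = {sigma:6.3f}  ->  h = {h:8.3f} bits")
```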

So I must be misunderstanding something, right?

Just to clarify: the reason I'm asking is that I'm trying to figure out whether my approach to this question about Empirical Entropy makes any sense.


Edit: "Own work"

Now I have thought a bit about this. Take the simplest distribution, the uniform distribution on $[a,b]$, which (according to Wikipedia) has differential entropy $\log(b-a)$; say $b-a = 2^k$ and use base-2 logarithms.

If k = -1, this would be -1 bit and the interval would be length 0.5.

If k = 0, this would be 0 bit and the interval would be length 1.0.

If k = 1, this would be 1 bit and the interval would be length 2.0.

So if the interval has length 0.5, we would "save" one bit compared to storing an outcome from an interval of length 1 at the same precision. So differential entropy is in some sense the information needed "in excess of" whatever resolution we choose to store with (see the sketch below). Does this make any sense?
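Here is a minimal sketch of that "excess bits" reading (my own illustration, assuming a fixed storage resolution `delta` that I picked arbitrarily): quantizing a uniform distribution of width $b-a$ into bins of width $\Delta$ gives a discrete entropy of $\log_2\frac{b-a}{\Delta}$, i.e. the differential entropy $\log_2(b-a)$ plus $\log_2\frac{1}{\Delta}$ bits of "precision".

```python
# My own illustration of the "excess bits" interpretation above.
import math

delta = 2 ** -10                          # assumed storage resolution
for k in (-1, 0, 1):
    width = 2.0 ** k                      # interval length b - a
    h_diff = math.log2(width)             # differential entropy in bits
    h_disc = math.log2(width / delta)     # entropy of the quantized variable
    print(f"width = {width}: differential = {h_diff:+.0f} bits, "
          f"quantized = {h_disc:.0f} bits = {h_diff:+.0f} + {-math.log2(delta):.0f}")
```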


Solution 1:

Yes, the differential entropy disregards resolution (quantization). A continuous random variable can't be represented exactly with a finite number of bits. By introducing an approximate representation through quantization you can relate it to the classical (discrete) entropy. For example, if you quantize uniformly with intervals of length $\Delta=2^{-n}$, you get a discrete random variable with $p_i=\int_{(i-1/2)\Delta}^{(i+1/2)\Delta} f(x)\,dx$. By the mean value theorem, $p_i = f(\xi_i)\Delta$ for some $\xi_i$ in the $i$-th interval, so the entropy is $$-\sum_i p_i \log p_i = -\sum_i \int_{(i-1/2)\Delta}^{(i+1/2)\Delta} f(x)\,dx\,\log\big(f(\xi_i)\Delta\big) = \\ -\sum_i \int_{(i-1/2)\Delta}^{(i+1/2)\Delta} f(x)\log f(\xi_i)\,dx \;-\; \log \Delta,$$ where the last term uses $\sum_i p_i = 1$, and the first term is approximated by the differential entropy $-\int f(x)\log f(x)\,dx$ when the quantization is fine.
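As a numerical sanity check of this relation (my own sketch, with base-2 logs so everything is in bits; `quantized_entropy` is just a helper name I made up), the entropy of a finely binned Gaussian is close to $h(X) - \log_2\Delta$ with $h(X) = \log_2(\sigma\sqrt{2\pi e})$:

```python
# My own numerical check: for a fine quantization of width delta, the
# entropy of the binned Gaussian is approximately
# log2(sigma * sqrt(2*pi*e)) - log2(delta).
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def quantized_entropy(sigma, delta, span=12.0):
    """Entropy in bits of N(0, sigma^2) binned into cells of width delta."""
    n_bins = int(span * sigma / delta) + 1          # cover roughly +/- 12 sigma
    H = 0.0
    for i in range(-n_bins, n_bins + 1):
        p = (gaussian_cdf((i + 0.5) * delta, 0.0, sigma)
             - gaussian_cdf((i - 0.5) * delta, 0.0, sigma))
        if p > 0:
            H -= p * math.log2(p)
    return H

sigma, delta = 1.0, 2 ** -8
h_diff = math.log2(sigma * math.sqrt(2 * math.pi * math.e))
print(quantized_entropy(sigma, delta))    # roughly 10.05
print(h_diff - math.log2(delta))          # roughly 2.05 + 8 = 10.05
```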

You may consider a uniform distribution on $[0, 2^m]$ to see that the differential entropy is $m$ bits, and quantizing with $\Delta = 2^{-n}$ gives entropy $m+n$ bits. So intuitively, the differential entropy gives the bits needed to cover the "spread", and the resolution contributes $n$ additional bits to cover the "precision". This assumes $m+n \geq 0$, so that a quantization interval does not exceed the range.
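Concretely (my own check, with the same base-2 convention), the quantized uniform has $2^{m+n}$ equally likely cells, hence entropy $m+n$ bits:

```python
# My own check of the uniform example: Uniform(0, 2**m) quantized into
# cells of width 2**-n gives 2**(m+n) equally likely outcomes, so the
# entropy is m + n bits (requires m + n >= 0).
import math

for m, n in [(3, 4), (0, 8), (-2, 5)]:
    cells = 2 ** (m + n)                  # number of equally likely cells
    H = -sum(1 / cells * math.log2(1 / cells) for _ in range(cells))
    print(f"m = {m:2d}, n = {n}:  H = {H:.0f} bits  (m + n = {m + n})")
```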

In your initial example, as $\sigma \to 0$ the fine-quantization assumption eventually becomes invalid, but if you go back to the exact expression, the discrete entropy tends to 0 as expected.
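Here is a small sketch of that limit (my own, using the same kind of binning as in the earlier sketch; `quantized_entropy` is again a name of my choosing): with the bin width $\Delta$ held fixed, the exact discrete entropy goes to 0 as $\sigma \to 0$, while the fine-quantization approximation $h(X) - \log_2\Delta$ goes to $-\infty$ and stops being meaningful.

```python
# My own check: for a fixed bin width delta, the exact entropy of the
# quantized Gaussian tends to 0 as sigma -> 0, while the approximation
# log2(sigma * sqrt(2*pi*e)) - log2(delta) becomes negative and invalid.
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def quantized_entropy(sigma, delta, span=12.0):
    n_bins = max(int(span * sigma / delta), 1) + 1
    H = 0.0
    for i in range(-n_bins, n_bins + 1):
        p = (gaussian_cdf((i + 0.5) * delta, 0.0, sigma)
             - gaussian_cdf((i - 0.5) * delta, 0.0, sigma))
        if p > 0:
            H -= p * math.log2(p)
    return H

delta = 2 ** -4
for sigma in (1.0, 0.1, 0.01, 0.001):
    exact = quantized_entropy(sigma, delta)
    approx = math.log2(sigma * math.sqrt(2 * math.pi * math.e)) - math.log2(delta)
    print(f"sigma = {sigma}:  exact H = {exact:.4f} bits,  approximation = {approx:.4f}")
```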

Solution 2:

The differential entropy $h(X)$ is not a true generalization of the (discrete, true) entropy $H(X)$; only some of the properties of the latter apply to the former. In particular, the property that $H(X)\ge 0$, with $H(X)=0$ meaning "zero uncertainty" (or full knowledge), does not apply to $h(X)$. The differential entropy can be negative, and $h(X)=0$ has no special significance.