How is the entropy of the normal distribution derived?

Wikipedia says the entropy of the normal distribution is $\frac{1}2 \ln(2\pi e\sigma^2)$

I could not find any proof for that, though. I found some proofs showing that the maximum entropy is $\frac{1}2+\ln(\sqrt{2\pi}\sigma)$, and while I see that this can be rewritten as $\frac{1}2\ln(e\sigma\sqrt{2\pi})$, I do not get how the square root can be gotten rid of and how the extra $\sigma$ can be put into the $\ln$. It is clear that an additional summand $\frac{1}2\ln(\sigma\sqrt{2\pi})$ would help, but where do we get it from? I am probably just thinking about this the wrong way...

So, what is the proof that the entropy of the normal distribution is $\frac{1}2\ln(2\pi e\sigma^2)$?


Solution 1:

Notice that $\ln(\color{blue}{\sqrt{\color{black}{x}}}) = \ln(x^{\color{blue}{\frac{1}{2}}}) = \color{blue}{\frac{1}{2}}\ln(x)$ and that $\ln(y) \color{red}{+} \ln(z) = \ln(y \color{red}{\cdot} z)$ for all $x,y,z > 0$. Using these identities, let us re-write the maximum entropy, $\frac{1}{2} + \ln(\sqrt{2\pi}\sigma)$, as follows:

$$
\begin{align}
\frac{1}{2} + \ln(\sqrt{2\pi}\sigma) &= \frac{1}{2} + \ln(\color{blue}{\sqrt{\color{black}{2\pi\sigma^2}}}) \\
&= \frac{1}{2} + \color{blue}{\frac{1}{2}}\ln(2\pi\sigma^2) \\
&= \frac{1}{2}(1 + \ln(2\pi\sigma^2)) \\
&= \frac{1}{2}(\ln(\mathrm{e}) \color{red}{+} \ln(2\pi\sigma^2)) = \frac{1}{2}\ln(\mathrm{e}\color{red}{\cdot}2\pi\sigma^2)
\end{align}
$$

So, the entropy reported in Wikipedia is correct.
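
If you want a quick numerical sanity check of that algebra, here is a small sketch (assuming SciPy is available; `norm(...).entropy()` returns the differential entropy in nats):

```python
import numpy as np
from scipy.stats import norm

sigma = 2.5  # any positive standard deviation, chosen arbitrarily

# The two algebraic forms discussed above, both in nats
form_a = 0.5 + np.log(np.sqrt(2 * np.pi) * sigma)   # 1/2 + ln(sqrt(2*pi)*sigma)
form_b = 0.5 * np.log(2 * np.pi * np.e * sigma**2)  # 1/2 * ln(2*pi*e*sigma^2)

# SciPy's differential entropy of N(0, sigma^2), also in nats
scipy_val = norm(loc=0.0, scale=sigma).entropy()

print(form_a, form_b, scipy_val)  # all three agree up to floating-point error
```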

Solution 2:

For a continuous distribution like the Normal/Gaussian, we compute the differential entropy.

You can find the derivation here http://www.biopsychology.org/norwich/isp/chap8.pdf
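
In case that link ever goes dead, here is a sketch of the standard computation for $X \sim \mathcal{N}(\mu,\sigma^2)$ with density $p$, using only $\int p(x)\,dx = 1$ and $\mathbb{E}[(X-\mu)^2] = \sigma^2$:

$$
\begin{align}
h(X) &= -\int_{-\infty}^{\infty} p(x)\ln p(x)\,dx, \qquad p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\mathrm{e}^{-\frac{(x-\mu)^2}{2\sigma^2}} \\
&= \int_{-\infty}^{\infty} p(x)\left(\frac{1}{2}\ln(2\pi\sigma^2) + \frac{(x-\mu)^2}{2\sigma^2}\right)dx \\
&= \frac{1}{2}\ln(2\pi\sigma^2) + \frac{\mathbb{E}\left[(X-\mu)^2\right]}{2\sigma^2} \\
&= \frac{1}{2}\ln(2\pi\sigma^2) + \frac{1}{2} = \frac{1}{2}\ln(2\pi \mathrm{e}\,\sigma^2)
\end{align}
$$

Note that the second-to-last expression is exactly the $\frac{1}{2} + \ln(\sqrt{2\pi}\sigma)$ form from the question.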

For more info on differential entropy I recommend the book "Elements of Information Theory" by Cover and Thomas.

Solution 3:

You have already gotten some good answers; I thought I would add something that is not really an answer, but may be useful if you find differential entropy to be a strange concept.

Since we cannot store a real (continuous) number exactly, entropy for continuous distributions conceptually means something different than entropy for discrete distributions.

It means the information required beyond that given by the resolution of the representation. Take for example the uniform distribution on $[0,2^a)$ for an integer $a$. At integer resolution it has $2^a$ equiprobable states, which gives $a$ bits of entropy. Also, the differential entropy is $\log_2(2^a-0) = a$ bits, which happens to be the same. But if we want another resolution, fewer or more bits are of course required: doubling the resolution (bins of width $0.5$ instead of $1$) would require 1 more bit (on average).
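
To make that concrete, here is a small numerical sketch (with $a$ and the bin widths chosen arbitrarily for illustration): quantizing the uniform distribution on $[0,2^a)$ into equiprobable bins of width $\Delta$ gives a discrete entropy of $a - \log_2\Delta$ bits, i.e. the differential entropy minus $\log_2$ of the resolution.

```python
import numpy as np

a = 4                    # uniform on [0, 2**a), differential entropy = a bits
width = 2.0 ** a

for bin_width in (1.0, 0.5, 0.25):           # resolution of the representation
    n_bins = int(width / bin_width)          # equiprobable bins of this width
    p = np.full(n_bins, 1.0 / n_bins)        # uniform discrete distribution
    discrete_bits = -np.sum(p * np.log2(p))  # discrete (Shannon) entropy in bits
    predicted = a - np.log2(bin_width)       # differential entropy minus log2(bin width)
    print(bin_width, discrete_bits, predicted)  # the two values agree
```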