Can the maximum likelihood estimator be unbiased and fail to achieve Cramer-Rao lower bound?

If some maximum likelihood estimator (MLE) turns out to be unbiased (which does not necessarily hold), does it then achieve the Cramer-Rao lower bound (CRLB) even in finite samples? (It does when the parameter to estimate is the mean of a normal, Poisson, or binomial distribution, for example.)

I feel there should be some MLE which is unbiased but does not achieve the CRLB, but I could not come up with an example. So I am wondering whether the claim actually holds.


Solution 1:

An example can be given when we have a misspecification.
Assume that we have an i.i.d. sample of size $n$ of random variables following the Half Normal distribution. The density and moments of this distribution are

$$f_H(x) = \sqrt{2/\pi}\cdot \frac 1{v^{1/2}}\cdot \exp\big\{-\frac {x^2}{2v}\big\},\quad x\ge 0\\ E_H(X) = \sqrt{2/\pi}\cdot v^{1/2}\equiv \mu_x,\;\; \operatorname{Var}_H(X) = \left(1-\frac 2{\pi}\right)v$$
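(As a quick sanity check of these moments, here is a short Monte Carlo sketch; the value $v = 2.5$ is an arbitrary choice of mine, and I use `scipy.stats.halfnorm`, which is parameterized by the scale $\sigma = \sqrt v$.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
v = 2.5  # arbitrary choice of the variance parameter

# scipy's halfnorm takes a scale parameter sigma, so scale = sqrt(v)
x = stats.halfnorm.rvs(scale=np.sqrt(v), size=1_000_000, random_state=rng)

print(x.mean(), np.sqrt(2 / np.pi) * np.sqrt(v))  # E_H(X) = sqrt(2/pi) v^(1/2)
print(x.var(), (1 - 2 / np.pi) * v)               # Var_H(X) = (1 - 2/pi) v
```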

The log-likelihood of the sample is

$$L(v\mid \mathbf x) = n\ln\sqrt{2/\pi}-\frac n2\ln v -\frac {1}{2v}\sum_{i=1}^nx_i^2$$

The first and second derivatives with respect to $v$ are

$$\frac {\partial}{\partial v}L(v\mid \mathbf x) = -\frac n{2v} + \frac {1}{2v^2}\sum_{i=1}^nx_i^2,\;\; \frac {\partial^2}{\partial v^2}L(v\mid \mathbf x) = \frac n{2v^2} - \frac {1}{v^3}\sum_{i=1}^nx_i^2$$

So the Fisher Information for parameter $v$ is

$$\mathcal I(v) = -E\left[\frac {\partial^2}{\partial v^2}L(v\mid \mathbf x)\right] = -\frac n{2v^2} + \frac {1}{v^3}\sum_{i=1}^nE(x_i^2) = -\frac n{2v^2} + \frac {n}{v^3}E(X^2)$$

$$=-\frac n{2v^2} + \frac {n}{v^3}\left[\operatorname{Var}(X)+\big(E[X]\big)^2\right] = -\frac n{2v^2} + \frac {n}{v^3}v$$

$$\Rightarrow \mathcal I(v) = \frac n{2v^2}$$

The Fisher Information for the mean $\mu_x$ is then

$$\mathcal I (\mu_x) = \mathcal I(v) \cdot \left(\frac {\partial \mu_x}{\partial v}\right)^{-2} = \frac n{2v^2}\cdot \left(\sqrt{2/\pi}\frac 12 v^{-1/2}\right)^{-2} = \frac {n\pi}{v}$$

and so the Cramer-Rao lower bound for the mean is

$$CRLB (\mu_x) = \left[\mathcal I (\mu_x)\right]^{-1} = \frac {v}{n\pi}$$
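(This whole chain can be verified symbolically; the following sympy sketch reproduces $\mathcal I(v)$, the reparameterization to $\mu_x$, and the bound. The symbol `S` is my stand-in for $\sum_i x_i^2$.)

```python
import sympy as sp

v, n, S = sp.symbols('v n S', positive=True)  # S plays the role of sum(x_i^2)

# Log-likelihood, dropping the constant n*ln(sqrt(2/pi)), which is free of v
L = -n / 2 * sp.log(v) - S / (2 * v)

# Fisher information: -E[d^2 L / dv^2], using E(S) = n * E(X^2) = n * v
I_v = sp.simplify(-sp.diff(L, v, 2).subs(S, n * v))
print(I_v)  # n/(2*v**2)

# Reparameterize to the mean mu_x = sqrt(2/pi) * sqrt(v)
mu = sp.sqrt(2 / sp.pi) * sp.sqrt(v)
I_mu = sp.simplify(I_v / sp.diff(mu, v)**2)
print(I_mu, sp.simplify(1 / I_mu))  # pi*n/v, and the CRLB v/(pi*n)
```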

Assume now that we want to estimate the mean using maximum likelihood, but we make a mistake: we assume that these random variables follow an Exponential distribution with density

$$g(x) = \frac 1{\beta}\cdot \exp\big\{-(1/\beta)x\big\},\quad x\ge 0$$ The mean here is equal to $\beta$, and the maximum likelihood estimator will be

$$\hat \beta_{mMLE} = \hat E(X)_{mMLE} = \frac 1n\sum_{i=1}^nx_i$$ where the lowercase $m$ denotes that this estimator is based on a misspecified density. Nevertheless, its moments must be calculated using the true density that the $X$'s actually follow. Then we see that this is an unbiased estimator, since

$$E_H[\hat E(X)_{mMLE}] = \frac 1n\sum_{i=1}^nE_H[x_i] = E_H(X) = \mu_x$$

while its variance is

$$\operatorname{Var}(\hat E(X)_{mMLE}) = \frac 1n\operatorname{Var}_H(X) = \frac 1n\left(1-\frac 2{\pi}\right)v$$

This variance is greater than the Cramer-Rao lower bound for the mean because

$$ \operatorname{Var}(\hat E(X)_{mMLE}) = \frac 1n\left(1-\frac 2{\pi}\right)v > CRLB (\mu_x) = \frac {v}{n\pi} $$

$$\Rightarrow 1-\frac 2{\pi} > \frac {1}{\pi} \Rightarrow 1 > \frac 3{\pi}$$

which holds. So we have an MLE which is unbiased but does not attain the Cramer-Rao lower bound for the quantity that it estimates. Its efficiency is

$$\frac {CRLB (\mu_x)}{\operatorname{Var}(\hat E(X)_{mMLE})} = \frac {\frac {v}{n\pi}}{\frac 1n\left(1-\frac 2{\pi}\right)v} = \frac 1{\pi - 2} \approx 0.876$$
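(A Monte Carlo sketch of this whole comparison, with arbitrary settings $v = 2$, $n = 50$: the sample mean should be unbiased for $\mu_x$, its variance should match $(1-2/\pi)v/n$, and the ratio CRLB/variance should be close to $1/(\pi-2)\approx 0.876$.)

```python
import numpy as np

rng = np.random.default_rng(1)
v, n, reps = 2.0, 50, 200_000  # arbitrary settings

# |N(0, v)| draws are Half Normal with parameter v
x = np.abs(rng.normal(scale=np.sqrt(v), size=(reps, n)))
beta_hat = x.mean(axis=1)  # the misspecified MLE: the sample mean

mu_x = np.sqrt(2 / np.pi) * np.sqrt(v)
crlb = v / (n * np.pi)

print(beta_hat.mean(), mu_x)                    # unbiased: both ~ 1.128
print(beta_hat.var(), (1 - 2 / np.pi) * v / n)  # variance above the CRLB
print(crlb / beta_hat.var())                    # efficiency ~ 1/(pi - 2)
```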

Note that the MLE for the mean under the correct specification, $\hat \mu_x = \sqrt{2/\pi}\cdot \hat v^{1/2}$ with $\hat v = \frac 1n\sum_{i=1}^n x_i^2$, is biased, with a downward bias: by Jensen's inequality, $E\big[\hat v^{1/2}\big] < \big(E[\hat v]\big)^{1/2} = v^{1/2}$.
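(A quick simulation of that bias; I deliberately pick a small $n = 10$ so that the bias is visible, and the other settings are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(2)
v, n, reps = 2.0, 10, 500_000  # small n makes the bias visible

x = np.abs(rng.normal(scale=np.sqrt(v), size=(reps, n)))
v_hat = (x**2).mean(axis=1)                   # MLE of v under the correct model
mu_hat = np.sqrt(2 / np.pi) * np.sqrt(v_hat)  # implied MLE of the mean

print(mu_hat.mean(), np.sqrt(2 / np.pi) * np.sqrt(v))  # first value is lower
```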

Solution 2:

This may be useful (adapted from "Theory of Point Estimation", 2nd ed., by Lehmann and Casella, Section 2.5 on the Information Inequality).

Assume the parameter lives in $\Omega$, an open interval (possibly infinite), such that $P_\theta$ has common support $A$ independent of $\theta$, and $\frac{\partial p_\theta (x)}{\partial \theta}$ exists and is finite for any $x\in A$ and $\theta \in \Omega$.

Theorem (similar to 2.5.12): Assume $\delta$ is an unbiased estimator of $\theta$ with finite variance under every $\theta \in \Omega$. Then $\delta$ attains the CRLB iff there exists a continuously differentiable function $\psi(\theta)$ such that $p_\theta (x) = C(\theta) e^{\psi(\theta) \delta(x)} h(x)$ is a density (i.e. $p_\theta$ constitutes an exponential family).

There are some other similar theorems cited in the same section.
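(To connect the theorem back to the examples in the question: for an i.i.d. $N(\theta,1)$ sample, the joint density factors as $C(\theta)\, e^{n\theta \bar x}\, h(\mathbf x)$, an exponential family with $\delta(\mathbf x) = \bar x$, so the sample mean attains the bound $1/n$ exactly. A quick numeric sketch, with arbitrary settings:)

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 1.0, 25, 200_000  # arbitrary settings

# Sample means of i.i.d. N(theta, 1) samples of size n
xbar = rng.normal(loc=theta, size=(reps, n)).mean(axis=1)

print(xbar.var(), 1 / n)  # both ~ 0.04: the CRLB is attained exactly
```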


Consider a symmetric density $f(x)$ supported on $(-1/2,1/2)$ which is sufficiently smooth and has a unique peak at $0$. Then, for $\theta \in \mathbb{R}$, consider the location family $p_\theta (x) = f(x - \theta)$. The Fisher information is $\int_{-1/2}^{1/2} \left( \frac{f'(x)}{f(x)}\right)^2 f(x)\, dx$.

Consider the one-sample case. Since $f$ peaks at $0$, the likelihood $f(x-\theta)$ is maximized at $\theta = x$, so the MLE is the observation itself; by symmetry it is unbiased. We therefore need to see whether $\operatorname{Var}(X) > \frac{1}{\int_{-1/2}^{1/2} \left( \frac{f'(x)}{f(x)}\right)^2 f(x)\, dx}$.

Consider $f(x) = -6(x+1/2)(x-1/2) = \frac 32 - 6x^2$ on $(-1/2,1/2)$. The LHS is $\operatorname{Var}(X) = E[X^2] = 1/20$. For the RHS, the integrand $\left(f'(x)\right)^2/f(x) = 144x^2/\left(\frac 32 - 6x^2\right)$ behaves like $6/\left(\frac 12 - |x|\right)$ near the endpoints, so the Fisher information integral diverges and the RHS is $0$. The unbiased MLE thus has variance $1/20 > 0$ and does not attain the bound.
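(Both numbers can be checked numerically; the sketch below computes $\operatorname{Var}(X)$ by quadrature and shows the Fisher information integral growing without bound as the integration limits approach $\pm 1/2$.)

```python
import numpy as np
from scipy import integrate

f = lambda x: 1.5 - 6 * x**2   # the parabolic density on (-1/2, 1/2)
fp = lambda x: -12 * x         # its derivative

var, _ = integrate.quad(lambda x: x**2 * f(x), -0.5, 0.5)
print(var)  # 0.05 = 1/20, the variance (the mean is 0 by symmetry)

# (f'/f)^2 f = (f')^2 / f behaves like 6/(1/2 - |x|) near the endpoints, so
# the Fisher information diverges; truncated integrals grow like log(1/eps)
for eps in (1e-2, 1e-3, 1e-4, 1e-5, 1e-6):
    I, _ = integrate.quad(lambda x: fp(x)**2 / f(x),
                          -0.5 + eps, 0.5 - eps, limit=200)
    print(eps, I)
```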