When does a maximum likelihood estimate fail to exist?

I have been told that a maximum likelihood estimate (MLE) does not always actually exist. Why is this the case? It is clear that the MLE may not be unique, but there should always be a maximum, no?


Solution 1:

I can think of the following example where the MLE does not exist, in the sense that the estimation diverges. Let's say you have a few samples $x_1,...,x_n$ and you want to model them with a mixture of Gaussians $p(x) = \sum_{k=1}^K \pi_k \mathcal N(x|\mu_k, \sigma_k)$, where the $\pi_k$ sum to one.

In principle, you could just derive the log-likelihood of all the samples and optimize it with respect to the parameters via gradient ascent (or gradient descent on the negative log-likelihood), which would be straightforward MLE. However, during the optimization it can happen that one of the Gaussians sits exactly on one of the training examples. What the optimization would do in that case is to make the $\sigma$ of that Gaussian smaller and smaller, which increases the likelihood every time (essentially, we get an infinitely narrow and high Gaussian spike sitting on one example). If you only had a single Gaussian, the other examples would become less and less likely, which would force a trade-off at some point. However, since the other Gaussians of the mixture give the remaining examples some finite likelihood, it does not matter that we make one single Gaussian ever narrower. Therefore, the likelihood diverges and no maximizer exists.
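The degeneracy above is easy to see numerically. Here is a minimal sketch (my own toy setup, not from the answer): a two-component, equal-weight mixture whose first component is centred exactly on the first sample. Shrinking that component's standard deviation drives the total log-likelihood up without bound, because the broad second component keeps the remaining points at finite likelihood.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_log_likelihood(xs, mus, sigmas, weights):
    """Log-likelihood of the data under a Gaussian mixture."""
    ll = 0.0
    for x in xs:
        p = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
        ll += math.log(p)
    return ll

xs = [0.0, 1.0, 2.5, 4.0]   # toy data (assumed values)
mus = [xs[0], 2.0]          # component 1 sits exactly on the first sample
weights = [0.5, 0.5]

# As sigma1 -> 0, the spike on xs[0] dominates and the log-likelihood grows.
for sigma1 in [1.0, 0.1, 0.01, 0.001]:
    ll = mixture_log_likelihood(xs, mus, [sigma1, 1.5], weights)
    print(f"sigma1 = {sigma1:6.3f}  log-likelihood = {ll:.2f}")
```

The first term of the log-likelihood contains $\log\!\big(\tfrac{1}{2}\cdot\tfrac{1}{\sigma_1\sqrt{2\pi}} + \text{const}\big)$, which tends to infinity as $\sigma_1 \to 0$, while the other three terms stay bounded below thanks to the second component.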

Solution 2:

The MLE exists if the parameter space is compact and the likelihood function is continuous on the parameter space.

It is unique if the parameter space is convex and the likelihood function is strictly concave.

Edit

I should add that the above conditions are sufficient conditions.
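To see why compactness matters, here is a standard counterexample (my own illustration, not part of the original answer) on an open, hence non-compact, parameter space:

```latex
Observe $X_1 = \cdots = X_n = 0$, i.i.d.\ $\mathrm{Bernoulli}(p)$ with the
open parameter space $p \in (0,1)$. The likelihood
\[
  L(p) = (1-p)^n
\]
is strictly decreasing on $(0,1)$, so $\sup_{p \in (0,1)} L(p) = 1$ is
approached as $p \to 0$ but never attained: no MLE exists. On the compact
closure $[0,1]$, the maximum is attained at $\hat p = 0$.
```

The likelihood here is perfectly continuous; it is only the lack of compactness that lets the supremum escape to the boundary.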

Solution 3:

Suppose you start with the Gauss--Markov assumptions:

  • Errors have expectation $0$;
  • Errors have equal variances (not necessarily identical distributions);
  • Errors are uncorrelated (not necessarily independent).

It is under those assumptions that one proves that the least-squares estimates of the regression coefficients are the best linear unbiased estimators. ("Linear" in this case means linear in the vector of response variables; it has nothing to do with whether one is fitting a straight line or something else.)
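That sense of "linear" can be checked directly. The sketch below (my own illustration, assuming a simple straight-line fit) computes the least-squares intercept and slope from the usual closed-form formulas and verifies that the estimator is a linear map of the response vector: fitting $y^{(1)} + y^{(2)}$ gives the sum of the two individual fits.

```python
def ols_line(xs, ys):
    """Least-squares (intercept, slope) for a straight-line fit."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

xs = [0.0, 1.0, 2.0, 3.0]       # fixed design points (assumed values)
y1 = [1.0, 2.0, 2.0, 4.0]
y2 = [0.5, -1.0, 3.0, 1.5]

b0_1, b1_1 = ols_line(xs, y1)
b0_2, b1_2 = ols_line(xs, y2)
# Linearity in the responses: the fit of y1 + y2 is the sum of the fits.
b0_sum, b1_sum = ols_line(xs, [a + b for a, b in zip(y1, y2)])
print(abs(b0_sum - (b0_1 + b0_2)) < 1e-9, abs(b1_sum - (b1_1 + b1_2)) < 1e-9)
```

Both the slope $\sum_i (x_i - \bar x) y_i / \sum_i (x_i - \bar x)^2$ and the intercept $\bar y - \hat\beta_1 \bar x$ are linear functions of $y$, which is all "linear" asserts here.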

Notice that under those assumptions one is estimating what are often called "parameters"---not only the regression coefficients but also the variance---but one does not actually have a parametrized family of probability distributions, so it does not even make sense to speak of MLEs.

For more on when MLEs either don't exist or behave badly, look at Lucien Le Cam's ironically titled paper "Maximum Likelihood: An Introduction."