How do you differentiate the likelihood function for the uniform distribution when finding the MLE?

There is a classic problem:

Suppose that $X_1,\ldots,X_n$ form an i.i.d. sample from a uniform distribution on the interval $(0,\theta)$, where $\theta>0$ is unknown. I would like to find the MLE of $\theta$.

The pdf of each observation has the form: $$ f(x\mid\theta) = \begin{cases} 1/\theta\quad&\text{for }\, 0\leq x\leq \theta\\ 0 &\text{otherwise}. \end{cases} $$ The likelihood function therefore has the form: $$ L(\theta) = \begin{cases} 1/\theta^n \quad&\text{for }\; 0\leq x_i \leq \theta\;\; \text{for all }i,\\ 0 &\text{otherwise}. \end{cases} $$ The usual conclusion is that the MLE of $\theta$ must be a value of $\theta$ satisfying $\theta \geq x_i$ for all $i$ that maximizes $1/\theta^n$ among all such values.

The reasoning is that since $1/\theta^n$ is a decreasing function of $\theta$, the estimate is the smallest possible value of $\theta$ such that $\theta\geq x_i$ for all $i$.

Therefore, the MLE of $\theta$ is $\hat{\theta}=\max(X_1,\ldots,X_n)$.
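As a quick sanity check, here is a minimal simulation sketch (assuming NumPy; the true $\theta$, sample size, and seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 4.0                       # hypothetical true parameter
sample = rng.uniform(0.0, theta_true, size=50)

theta_hat = sample.max()               # the MLE derived above
print(theta_hat)                       # always <= theta_true, and close to it for moderate n
```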

What I do not understand is why we cannot just differentiate the likelihood function with respect to $\theta$ and set the derivative equal to $0$.

Thanks!


The likelihood function can be written as $$ L(\theta)=\frac{1}{\theta^n}\mathbf{1}_{\theta\geq c}, $$ where $c=\max\{x_1,\ldots,x_n\}$. Therefore, $\theta\mapsto L(\theta)$ is not differentiable on the whole of $(0,\infty)$ (it jumps at $\theta=c$), and hence we cannot simply solve $L'(\theta)=0$ to locate maxima and minima. (Extrema of a function $f$ must be sought among the points $x$ where either $f'(x)=0$ or $f'(x)$ is undefined.)
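To make this concrete, one can check symbolically that $L'(\theta)=0$ has no solution on the region where $L$ is smooth and nonzero (a sketch using SymPy, with a fixed sample size chosen for illustration):

```python
import sympy as sp

theta = sp.symbols("theta", positive=True)
n = 5                                  # any fixed sample size behaves the same way
L = theta ** (-n)                      # the likelihood on (c, oo), where it is smooth
dL = sp.diff(L, theta)                 # -5/theta**6: strictly negative, never zero
print(sp.solve(sp.Eq(dL, 0), theta))   # [] -- no stationary point exists
```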

Note, however, that $L$ is differentiable on $(0,\infty)\setminus\{c\}$: we have $L(\theta)=0$ for $\theta\in (0,c)$, and on $(c,\infty)$ the derivative is $L'(\theta)=-n/\theta^{n+1}<0$, so $L$ is strictly decreasing there. Since $$ L(c)=\frac{1}{c^n}>\frac{1}{\theta^n}=L(\theta)\quad \text{for all }\;\theta>c, $$ we see that $L(c)$ is the global maximum.
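A quick numerical check of this picture (a self-contained sketch with a made-up sample, so here $c=3.1$):

```python
import numpy as np

x = np.array([0.9, 2.3, 1.7, 3.1])     # made-up sample, so c = max(x) = 3.1
c, n = x.max(), len(x)

def L(theta):
    # 1/theta^n once theta covers every observation, 0 otherwise
    return theta ** (-n) if theta >= c else 0.0

# L is 0 just below c, jumps up to 1/c^n at c, then strictly decreases:
for theta in (c - 0.1, c, c + 0.1, c + 1.0):
    print(f"L({theta:.1f}) = {L(theta):.5f}")
```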


In addition to Stefan Hansen's great answer (+1), intuitively you can think of it as follows:

  1. As you say, $L(\theta)$ is a decreasing function of $\theta$, so to maximize it, $\theta$ has to be as small as possible.

  2. Secondly, given the restriction imposed by the observations (the random variables $X_i$), can $\theta$ be smaller than $\max(X_i)$? The answer is no.

Points 1 and 2 imply that even if differentiating were to suggest a value of $\theta$ with a larger likelihood, respecting the second restriction means that such a value is not admissible.