Maximal logarithmic likelihood function of the Student-t distribution

The negative logarithm of the Student-t probability density function is $$f(\nu,x) := -\ln\Gamma\left(\frac{\nu+1}{2}\right) +\ln\Gamma\left(\frac{\nu}{2}\right) +\frac{1}{2}\ln(\pi\nu) +\frac{\nu+1}{2}\ln\left(1+\frac{x^2}\nu\right)$$

How would one prove or disprove that $f$ has only one local minimum with respect to $\nu>0$ for any given $x$?

Numerical computation seems to suggest that $f(\nu,x)$ strictly decreases in $\nu$ on $(0,\infty)$ for $x\in[0,1.5]$, and that, for $x\in (b,\infty)$ with some $b\ge 1.5$, $f(\nu,x)$ is convex in $\nu$ on $(0,a)$ and concave on $(a,\infty)$ for some $a>0$.
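For what it's worth, here is a minimal sketch of the kind of numerical computation behind this observation (the grid and the sample values of $x$ are my own arbitrary choices):

```python
import numpy as np
from scipy.special import gammaln

def f(nu, x):
    """Negative log-density of the Student-t distribution, as defined above."""
    return (-gammaln((nu + 1) / 2) + gammaln(nu / 2)
            + 0.5 * np.log(np.pi * nu)
            + (nu + 1) / 2 * np.log1p(x**2 / nu))

nu = np.linspace(0.01, 500, 500000)          # grid in nu (arbitrary resolution)
for x in (0.5, 1.0, 1.5, 2.0, 3.0):          # sample values of x
    n_incr = np.count_nonzero(np.diff(f(nu, x)) > 0)
    print(f"x = {x}: number of grid steps on which f increases = {n_incr}")
```

Such a check of course proves nothing; it only illustrates the observed behaviour.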


To facilitate the solution, I post the first and second partial derivatives of $f$ below.

\begin{align}2\frac{\partial f}{\partial \nu}=\frac{1-x^2}{\nu+x^2}+\ln\Big(1+\frac{x^2}\nu\Big)-\int_0^\infty \frac{e^{-\frac \nu2t}}{1+e^{-\frac t2}}\,dt, \end{align} $$4\frac{\partial^2 f}{\partial \nu^2}=-2\frac{\nu+x^4}{\nu(\nu+x^2)^2}+\int_0^\infty \frac{te^{-\frac \nu2t}}{1+e^{-\frac t2}}\,dt$$
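(These formulas are easy to sanity-check numerically against a finite difference of $f$; the short script below, with helper names of my own choosing, does exactly that and is of course not part of any proof.)

```python
import numpy as np
from scipy.special import gammaln
from scipy.integrate import quad

def f(nu, x):
    return (-gammaln((nu + 1) / 2) + gammaln(nu / 2)
            + 0.5 * np.log(np.pi * nu)
            + (nu + 1) / 2 * np.log1p(x**2 / nu))

def two_df_dnu(nu, x):
    """Right-hand side of the formula above for 2 * (partial f / partial nu)."""
    integral = quad(lambda t: np.exp(-nu * t / 2) / (1 + np.exp(-t / 2)), 0, np.inf)[0]
    return (1 - x**2) / (nu + x**2) + np.log1p(x**2 / nu) - integral

nu, x, h = 3.0, 2.0, 1e-5
print(two_df_dnu(nu, x) / 2,                      # formula above
      (f(nu + h, x) - f(nu - h, x)) / (2 * h))    # central finite difference
```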


Solution 1:

The following used to contain a mistake, found by Hans. I have found a fix and have incorporated it into the proof below.


Here is a sketch of an argument that the log likelihood function has at most one critical point. It should be read in conjunction with Hans's partial answer.

The idea is to use the "variation diminishing property of the Laplace transform", according to which the Laplace transform of a function with $k$ sign changes cannot have more than $k$ sign changes itself. For a function $\phi:\mathbb R^+\to\mathbb R$, let $S(\phi)$ be the maximal $k$ for which there exist $0<x_0<x_1<x_2<\cdots < x_{k}$ such that $\phi(x_i)\phi(x_{i+1})<0$ for all $0\le i < k$. Then the Laplace transform $g(s)=\int_0^\infty e^{-sx} G(x)\,dx$ of $G$ obeys $S(g)\le S(G)$. This topic is not well explained in Wikipedia articles, but the result used here is in chap. V, paragraph 80 of vol. 2 of Pólya and Szegő's Problems and Theorems in Analysis (p. 225 in my copy); it is discussed at length in Karlin's book Total Positivity (see Theorem 3.1, page 21, and pages 233 and 237), in papers by I.J. Schoenberg, and elsewhere. One can think of it as a continuous analogue of Descartes' Rule of Signs. I used it in answering this MSE problem.
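(A toy numerical illustration of the statement, with an example of my own: take $G(x)=x-1$, which has exactly one sign change; its Laplace transform $g(s)=1/s^2-1/s$ then has at most one.)

```python
import numpy as np
from scipy.integrate import quad

def sign_changes(values, tol=1e-12):
    """Count sign changes along a sequence, ignoring (near-)zero entries."""
    s = np.sign(values[np.abs(values) > tol])
    return int(np.count_nonzero(s[:-1] != s[1:]))

G = lambda x: x - 1.0                                   # one sign change, at x = 1
g = lambda s: quad(lambda x: np.exp(-s * x) * G(x), 0, np.inf)[0]

xs = np.linspace(0.0, 10.0, 2001)
ss = np.linspace(0.1, 10.0, 200)
print(sign_changes(G(xs)),                              # 1 sign change for G
      sign_changes(np.array([g(s) for s in ss])))       # at most 1 for g (here at s = 1)
```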

If the logarithm of the likelihood function had two or more local maxima, its derivative would have three or more roots, since between every two local maxima lies a local minimum. So it suffices, by the variation diminishing property of the LT, to show that what the OP, in his draft answer, calls $\tilde f$ has at most two sign changes.

This seems evident numerically, but deserves proof just as much as the original problem does. Here is one way of seeing this, using another application of the variation diminishing property of the Laplace transform.

Here is the argument. First, I will change notation, using $s$ instead of $t$ and setting $y=x^2$. The claim is that, for fixed real $y\ge0$, $$\tilde f(s) = \frac{1-e^{-ys}}s +(1-y)e^{-ys}-\frac 2{1+e^{-s}}$$ has at most two sign changes as a function of $s\in\mathbb R^+$. Let $g(s)=\dfrac{1+e^{-s}}{s^2}\tilde f(s)$; clearly $g$ has as many sign changes as $\tilde f$ does. But $g$ is itself a Laplace transform: \begin{align}g(s)&=\frac{1+e^{-s}-e^{-ys}-e^{-(y+1)s}}{s^3}+(1-y)\frac{e^{-ys}+e^{-(y+1)s}}{s^2} - \frac2{s^2}\\ &=\int_0^\infty e^{-sx} G(x) dx,\end{align} from which one reads off \begin{align}G(x)&=\frac 1 2\left((x)_+^2 - (x-y)_+^2 +(x-1)_+^2 - (x-(y+1))_+^2\right) \\&+ (1-y)\left((x-y)_++(x-y-1)_+\right)-2x.\end{align} Here $(x)_+=\max(x,0)$. Since $x\mapsto (x)_+$ is continuous, so is $G$. If $y<1$ the function $G$ is piecewise quadratic on each of the intervals $(0,y)$, $(y,1)$, $(1,y+1)$, $(y+1,\infty)$; if $y>1$ then $G$ is piecewise quadratic on the intervals $(0,1)$, $(1,y)$, $(y,y+1)$, $(y+1,\infty)$, so verification of the claim is in principle easy in a case-by-case manner; in practice, it is tedious and error prone.
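(Before the case analysis, one can spot-check numerically that this $G$ really does have $g$ as its Laplace transform; the script below, with names of my own, is only such a sanity check.)

```python
import numpy as np
from scipy.integrate import quad

def ftilde(s, y):
    return (1 - np.exp(-y * s)) / s + (1 - y) * np.exp(-y * s) - 2 / (1 + np.exp(-s))

def G(x, y):
    p = lambda u: np.maximum(u, 0.0)                     # (u)_+
    return (0.5 * (p(x)**2 - p(x - y)**2 + p(x - 1)**2 - p(x - (y + 1))**2)
            + (1 - y) * (p(x - y) + p(x - y - 1)) - 2 * x)

for y in (0.5, 3.0):
    for s in (0.7, 2.0):
        lhs = (1 + np.exp(-s)) / s**2 * ftilde(s, y)     # g(s)
        rhs = quad(lambda x: np.exp(-s * x) * G(x, y), 0, np.inf)[0]
        print(y, s, lhs, rhs)                            # the last two columns should agree
```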

If $y<1$ the formula for $G(x)$ reduces to $$G(x)=\begin{cases} x^2/2 -2x&0\le x<y\\ y^2/2-x-y&y\le x<1\\ x^2/2-2x+(y-1)^2/2&1\le x<1+y\\ y^2-2y-1&1+y\le x\end{cases}$$ and if $y>1$, the formula reduces to $$G(x)=\begin{cases} x^2/2 -2x&0\le x<1\\ x^2-3x+1/2&1\le x<y\\ x^2/2-2x+(y-1)^2/2&y\le x<1+y\\ y^2-2y-1&1+y\le x.\end{cases}$$

These can be merged into the following, where the cases are referred to below: $$ G(x)=\begin{cases} x^2/2 -2x&\text{A: if }0\le x<\min(1,y)\\ y^2/2-x-y&\text{B: if }y\le x< 1\\ x^2-3x+1/2&\text{C: if }1\le x< y\\ x^2/2-2x+(y-1)^2/2&\text{D: if }\max(1,y)\le x< 1+y\\ y^2-2y-1&\text{E: if }1+y< x \end{cases} $$ Note that cases B and C are mutually exclusive. Computations show that for fixed $y$ the function $G(x)$ has at most one sign change; I sketch an argument for this below. (Omitting an analysis of the possibility of sign changes at the case boundaries.)

$G$ has no sign changes in cases A or E (in A, the only possibilities are $x=0$ or $x=4$, the former is not a sign change, and $x=4$ does not obey $0\le x<\min(1,y)$. Constant functions, as in case E, do not have sign changes.) Case B has no sign changes, for the value $x=y^2/2-y$ violates $y\le x<1$. In case C, a sign change could only occur at $x=(3\pm\sqrt 7)/2$, and then $1<x<y$ implies $x=(3+\sqrt7)/2$ and $y>(3+\sqrt7)/2$. In case D, a sign change can only occur at $x=2\pm\sqrt{3+2y-y^2}$, and $\max(1,y)\le x<1+y$ is only possible if $x=2+\sqrt{3+2y-y^2}$ and $1+\sqrt 2<y<(3+\sqrt 7)/2$. Putting these together: if $y<1$ then there can be no sign changes in the relevant cases A,B,D,E. If $y>1$ there might be at most one sign change in each of C, D (out of the relevant A,C,D,E), but not actually both, since that would violate $(3+\sqrt7)/2 <y<(3+\sqrt 7)/2$. Hence, $G$ has at most one sign change among A,B,C,D,E.

Finally, since $G(0)=0$, $G'(0^+)=-2$ for $y>0$ (so $G<0$ just to the right of $0$), and $G(\infty)=y^2-2y-1$, we see that $G$ has exactly one sign change if $y^2-2y-1>0$ and none if $y^2-2y-1\le0$.
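(Again only as a numerical cross-check of this conclusion, not as a proof: counting sign changes of $G$ on a fine grid, with a threshold of my own choosing to ignore rounding noise, is consistent with it on the sample values of $y$ below, namely $0$ sign changes when $y^2-2y-1\le0$ and $1$ when $y^2-2y-1>0$.)

```python
import numpy as np

def G(x, y):
    p = lambda u: np.maximum(u, 0.0)
    return (0.5 * (p(x)**2 - p(x - y)**2 + p(x - 1)**2 - p(x - (y + 1))**2)
            + (1 - y) * (p(x - y) + p(x - y - 1)) - 2 * x)

def sign_changes(values, tol=1e-9):
    s = np.sign(values[np.abs(values) > tol])
    return int(np.count_nonzero(s[:-1] != s[1:]))

x = np.linspace(0.0, 30.0, 300001)
for y in (0.5, 2.0, 1 + np.sqrt(2) - 0.01, 1 + np.sqrt(2) + 0.01, 5.0, 20.0):
    print(round(y, 3), sign_changes(G(x, y)))
```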

The meta-motivation is to shoehorn the original question into an application of the variation diminishing machinery given in my first paragraphs. The micro-motivation for my choice of $g$ (and hence of $G$) comes from the realization that $\tilde f$ is the Laplace transform of the signed measure $$\mu = \lambda_{[0,y]} + (1-y)\delta_y -2\sum_{k\ge0}(-1)^k \delta_k,$$ where $\lambda_{[0,y]}$ is Lebesgue measure restricted to $[0,y]$ and $\delta_k$ represents the unit point mass at $k$. The signed measure $\mu$ has infinitely many sign changes, but the telescoping series $\mu*(\delta_0+\delta_1)$ does not, where $*$ denotes convolution of measures, so $1+e^{-s}$ times $\tilde f$ is a better candidate for the variation diminishing trick sketched above.

Dividing by a power of $s$ has the effect of smoothing the signed measure, and eliminating some small oscillations that create their own extraneous sign changes. The mistake Hans found in an earlier version of this answer was to divide by $s$, which allowed for 3 sign changes for a certain range of $y$. Dividing by $s^2$ fixed this problem, at the price of making $G$ piecewise quadratic instead of piecewise linear.

Solution 2:

As suggested by @kimchilover, I write \begin{align}2\frac{\partial f}{\partial \nu} &=\frac{1-x^2}{\nu+x^2}+\ln\Big(1+\frac{x^2}\nu\Big)-\int_0^\infty \frac{e^{-\frac \nu2t}}{1+e^{-\frac t2}}\,dt \\ &= \int_0^\infty \tilde f(t,x)e^{-\nu t}dt, \end{align} where $$\tilde f(t,x):=\frac{1-e^{-x^2t}}t+(1-x^2)e^{-x^2t}-\frac2{1+e^{-t}}=\frac1t+\Big(-\frac1t+1-x^2\Big)e^{-x^2t}-\frac2{1+e^{-t}}.$$ However, more work seems to be needed for the case in which $\tilde f(t,x)$ has two sign changes with respect to $t$, which occurs for large $x$.
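(As a quick numerical sanity check of this representation, one can compare the integral against the closed form of $2\,\partial f/\partial\nu$, using $\int_0^\infty \frac{e^{-\nu t/2}}{1+e^{-t/2}}\,dt=\psi\big(\frac{\nu+1}{2}\big)-\psi\big(\frac{\nu}{2}\big)$; the script below, with names of my own, does that at a few sample points.)

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import digamma

def ftilde(t, x):
    return (-np.expm1(-x**2 * t) / t + (1 - x**2) * np.exp(-x**2 * t)
            - 2 / (1 + np.exp(-t)))

def two_df_dnu(nu, x):
    # same quantity, with the integral in the first line written via digamma
    return ((1 - x**2) / (nu + x**2) + np.log1p(x**2 / nu)
            - (digamma((nu + 1) / 2) - digamma(nu / 2)))

for nu, x in [(0.5, 1.0), (3.0, 2.0), (10.0, 0.7)]:
    lhs = quad(lambda t: ftilde(t, x) * np.exp(-nu * t), 0, np.inf)[0]
    print(nu, x, lhs, two_df_dnu(nu, x))   # the last two columns should agree
```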

Again running with @kimchilover's idea, set $h(t,y):=(1+e^{-t})\tilde f(t,y).$ Then $$\tilde g(s,y)=$$

Solution 3:

Just some thoughts (I will add proofs in the future, if possible.)

For convenience, we replace $x^2$ with $y$.

We have $$ \frac{\partial f}{\partial v} = \frac{1 - y}{2v + 2y} + \frac12 \ln \left(1 + \frac{y}{v}\right) - \frac12 \psi\left(\frac{v + 1}{2}\right) + \frac12\psi\left(\frac{v}{2}\right) $$ where $\psi(\cdot)$ is the digamma function defined by $\psi(u) = \frac{\mathrm{d} \ln \Gamma(u)}{\mathrm{d} u} = \frac{\Gamma'(u)}{\Gamma(u)}$.

(i) If $0 < y \le 1$, then \begin{align*} \frac{\partial f}{\partial v} &\le \frac{1 - y}{2v} + \frac12 \cdot \frac{y}{v} - \frac12 \left(\ln \frac{v + 1}{2} - \frac{1}{v + 1} - \frac{1}{3(v + 1)^2}\right) + \frac12\left(\ln \frac{v}{2} - \frac{1}{v}\right)\\ &= \frac{1}{2(v + 1)} + \frac{1}{6(v + 1)^2} - \frac12\ln\left(1 + \frac{1}{v}\right)\\ &< 0 \end{align*} where we have used $\ln z - \frac{1}{2z} - \frac{1}{12z^2} < \psi(z) < \ln z - \frac{1}{2z}$ for all $z > 0$ (both bounds follow from Theorem 5, [1]), and $\ln(1 + u) \le u$ for all $u \ge 0$. Note: The last inequality follows from $\ln(1 + u) > \frac{u}{1 + u} + \frac{u^2}{3(1 + u)^2}$ for all $u > 0$ (applied with $u = 1/v$), which is easy to prove by taking derivative.
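(A quick numerical sanity check of the final displayed quantity, over a log-spaced grid of my own choosing:)

```python
import numpy as np

v = np.geomspace(1e-3, 1e4, 100000)
bound = 1 / (2 * (v + 1)) + 1 / (6 * (v + 1)**2) - 0.5 * np.log1p(1 / v)
print(bound.max())   # negative on this grid, as claimed
```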

(ii) If $1 < y \le 1 + \sqrt{2}$, then \begin{align*} \frac{\partial f}{\partial v} &\le \frac{1 - (1 + \sqrt2)}{2v + 2(1 + \sqrt2)} + \frac12 \ln \left(1 + \frac{1 + \sqrt2}{v}\right)\\ &\quad - \frac12\left(\ln \frac{v + 1}{2} - \frac{1}{v + 1} - \frac{1}{3(v + 1)^2}\right)\\ &\quad + \frac12\left(\ln \frac{v}{2} - \frac{1}{v} - \frac{1}{12(v/2 + 1/14)^2}\right)\\ &< 0 \end{align*} where we have used $\ln u - \frac{1}{2u} - \frac{1}{12u^2} < \psi(u) < \ln u - \frac{1}{2u} - \frac{1}{12(u + 1/14)^2}$ for all $u > 0$ (Theorem 5, [1]), and the fact that $y \mapsto \frac{1 - y}{2v + 2y} + \frac12 \ln \left(1 + \frac{y}{v}\right)$ is strictly increasing on $(1, \infty)$. Note: The monotonicity claim is easy to prove by taking the derivative with respect to $y$.

Remark: Where does $1 + \sqrt2$ come from? We rewrite $\frac{\partial f}{\partial v}$ as $$\frac{\partial f}{\partial v} = \frac{v \mathrm{e}^{A}}{2(v + y)} \left\{\left(1 + \frac{y}{v}\right)\mathrm{e}^{-A} \ln \left[\left(1 + \frac{y}{v}\right)\mathrm{e}^{-A}\right] + \left(1 + \frac{1}{v}\right)\mathrm{e}^{-A}\right\}$$ where $A = 1 + \psi\left(\frac{v + 1}{2}\right) - \psi\left(\frac{v}{2}\right)$. From $\frac{\partial f}{\partial v} = 0$, we solve $y = {\mathrm e}^{W\left( - (1 + 1/v)\mathrm{e}^{-A} \right) + A}\,v - v$, where $W(\cdot)$ is the Lambert W function. Maple tells us $\lim_{v\to \infty} \left( {\mathrm e}^{W\left( - (1 + 1/v)\mathrm{e}^{-A} \right) + A}\,v - v\right) = 1 + \sqrt{2}$. By the way, @heropup also pointed out in a comment that the point $x = \sqrt{1 + \sqrt{2}}$ is a demarcation point.
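(One does not need Maple for a quick check of this limit; for instance, evaluating the displayed expression with `scipy` at a few large values of $v$, using the principal branch of $W$, gives values that appear to approach $1+\sqrt2\approx 2.4142$. The helper name below is mine.)

```python
import numpy as np
from scipy.special import digamma, lambertw

def y_of_v(v):
    """y solving df/dv = 0, via the Lambert-W formula in the remark above."""
    A = 1 + digamma((v + 1) / 2) - digamma(v / 2)
    w = lambertw(-(1 + 1 / v) * np.exp(-A)).real   # principal branch, real part
    return np.exp(w + A) * v - v

for v in (10.0, 100.0, 1000.0, 10000.0):
    print(v, y_of_v(v))    # appears to approach 1 + sqrt(2) = 2.41421...
```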

(iii) If $y > 1 + \sqrt{2}$ is fixed, we claim that $f$ has exactly one global minimizer $v^\ast$ on $(0, \infty)$; furthermore, $f$ is strictly decreasing on $(0, v^\ast)$ and strictly increasing on $(v^\ast, \infty)$.

We can prove the claim if the following conjecture is true:

Conjecture 1: $Q < 0$ for all $v > 0$, where $$Q = - 4 - v(v + 2)^2 \int_0^\infty \frac{t^2\mathrm{e}^{-vt/2}}{1 + \mathrm{e}^{-t/2}}\mathrm{d} t + (6v^2 + 16v + 8)\int_0^\infty \frac{t\mathrm{e}^{-vt/2}}{1 + \mathrm{e}^{-t/2}}\mathrm{d} t.$$ Numerical evidence supports it, but we have not yet proved it.
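(For what it's worth, here is the kind of numerical check referred to; the grid is my own choice and this is of course no substitute for a proof.)

```python
import numpy as np
from scipy.integrate import quad

def I(k, v):
    """I_k(v) = integral of t^k e^{-v t/2} / (1 + e^{-t/2}) over (0, infinity)."""
    return quad(lambda t: t**k * np.exp(-v * t / 2) / (1 + np.exp(-t / 2)),
                0, np.inf)[0]

def Q(v):
    return -4 - v * (v + 2)**2 * I(2, v) + (6 * v**2 + 16 * v + 8) * I(1, v)

vs = np.geomspace(0.1, 50.0, 50)
print(max(Q(v) for v in vs))   # negative on this grid, consistent with Conjecture 1
```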

References

[1] L. Gordon, “A stochastic approach to the gamma function”, Amer. Math. Monthly, 101(9), 1994, 858-865.