What is the meaning of the cumulant generating function itself?

Solution 1:

For simplicity let us assume that $X$ has mean zero, so that I don't accidentally say something obviously wrong by mixing up cumulants and moments.

A few basic comments:

You can look at $\Psi(z)=E[e^{zX}]$ for a complex parameter $z$. This unifies the characteristic function (the restriction of $\Psi$ to the imaginary axis) and the moment generating function (the restriction of $\Psi$ to the real axis), whose logarithm $\ln \Psi$ is the cumulant generating function.

This unified object $\Psi$ is really the "Fourier transform" of the (formal) density of $X$. So the characteristic function and the moment generating function are really the same object; the only issue is that the domain of $\Psi$ often fails to contain the real axis, whereas it is always guaranteed to contain the imaginary axis.
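To make the unification concrete, here is a minimal Monte Carlo sketch (my own illustration, assuming NumPy and taking $X$ to be standard normal so that the closed form $\Psi(z)=e^{z^2/2}$ is available for comparison); evaluating the same estimator at a real point and at a purely imaginary point recovers the moment generating function and the characteristic function respectively:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)   # draws of X ~ N(0, 1)

def Psi(z):
    """Monte Carlo estimate of Psi(z) = E[exp(z X)] for a complex parameter z."""
    return np.mean(np.exp(z * samples))

for z in (0.5, 0.5j):                      # real axis (MGF) vs. imaginary axis (characteristic function)
    print(z, Psi(z), np.exp(z**2 / 2))     # estimate vs. closed form for N(0, 1)
```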

The term "generating function" should really already be alluding to the fact that the cumulant generating function is a tool, not really an object of interest per se. In general generating functions are used as methods for studying the coefficients of their (perhaps formal) power series, and are not of much interest in and of themselves.

With that said, the most direct interpretation of the cumulant generating function per se that I can think of comes from Cramér's theorem. This loosely says that if the $X_i$ are i.i.d. random variables whose moment generating function $\Psi$ is finite near the origin, and $n$ is large, then the probability that $|\sum_{i=1}^n X_i|>nx$ is approximately $e^{-nI(x)}$. Here $I(x)$ is called the rate function and is given explicitly by the Legendre transform of the cumulant generating function $\ln \Psi$:

$$I(x)=\sup_{t \in \mathbb{R}} \bigl( tx-\ln \Psi(t) \bigr).$$
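For example (a standard sanity check, not part of the original argument): if $X$ is standard normal then $\ln \Psi(t)=t^2/2$, and

$$I(x)=\sup_{t \in \mathbb{R}} \Bigl( tx-\frac{t^2}{2} \Bigr)=\frac{x^2}{2},$$

recovering the familiar Gaussian tail exponent $e^{-nx^2/2}$.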

Notice that the supremum defining $I(x)$, if it is finite, will be attained where $(\ln \Psi)'(t)=\Psi'(t)/\Psi(t)=x$. Thus in effect we can look at $\psi=(\Psi'/\Psi)^{-1}$, and then $I(x)=x\,\psi(x)-\ln \Psi(\psi(x))$ (on the domain of $\psi$, anyway). Here $\Psi'/\Psi$ is guaranteed to be injective (but not surjective) because $\Psi$ is strictly log-convex whenever $X$ is nondegenerate.
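This recipe is easy to carry out numerically. Below is a minimal sketch (assuming SciPy; the centered exponential $X=E-1$ with $E\sim\mathrm{Exp}(1)$ is my own choice of test case, for which $\ln \Psi(t)=-t-\ln(1-t)$ and the rate function $I(x)=x-\ln(1+x)$ is known in closed form): it inverts $\Psi'/\Psi$ by root-finding and then evaluates $x\psi(x)-\ln\Psi(\psi(x))$.

```python
import numpy as np
from scipy.optimize import brentq

# Test case (my choice, not from the answer above): X = E - 1 with E ~ Exp(1),
# so Psi(t) = exp(-t) / (1 - t) for t < 1 and, in closed form, I(x) = x - ln(1 + x).
log_Psi  = lambda t: -t - np.log(1 - t)    # ln Psi(t), finite only for t < 1
dlog_Psi = lambda t: -1 + 1 / (1 - t)      # (ln Psi)'(t) = Psi'(t) / Psi(t)

def rate(x):
    """I(x) = x * psi(x) - ln Psi(psi(x)), with psi obtained by inverting Psi'/Psi."""
    t_star = brentq(lambda t: dlog_Psi(t) - x, -1e6, 1 - 1e-12)  # solve (ln Psi)'(t) = x
    return x * t_star - log_Psi(t_star)

for x in (0.5, 2.0):
    print(rate(x), x - np.log(1 + x))      # numerical vs. closed-form rate function
```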

But $I$ has a relatively concrete interpretation as measuring the decay rate of large deviations, so this gives us a way of thinking about $\Psi$ and $\psi$.

An instructive example is when each $X_i$ is equally likely to be $-1$ or $1$. In this case $\Psi=\cosh$, so $\Psi'/\Psi=\tanh$ and $\psi=\tanh^{-1}$, giving $I(x)=x\tanh^{-1}(x)+\frac{1}{2}\ln(1-x^2)$ on $(-1,1)$ (extended by continuity to $\pm 1$, where the value is easily seen to be $\ln 2$ by a simple counting argument, since $P(\sum_{i=1}^n X_i = n)=2^{-n}$). This gives us the exponential decay of the tail behavior of the sum.
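One can check this decay directly, since the one-sided tail of the sum is an exact binomial probability (the factor of $2$ from the two-sided tail is invisible at this scale). A minimal sketch, assuming NumPy and SciPy; the particular values $n=2000$ and $x=0.3$ are arbitrary, and agreement is only up to the subexponential prefactor:

```python
import numpy as np
from scipy.stats import binom

def I(x):
    # rate function for +-1 signs: x * arctanh(x) + (1/2) * ln(1 - x^2)
    return x * np.arctanh(x) + 0.5 * np.log(1 - x**2)

n, x = 2000, 0.3
# S_n = sum of n independent +-1 signs, so S_n >= n*x  <=>  #(+1's) >= n*(1 + x)/2
k = int(np.ceil(n * (1 + x) / 2))
tail = binom.sf(k - 1, n, 0.5)        # exact P(S_n >= n*x)
print(-np.log(tail) / n, I(x))        # both near I(0.3) ~ 0.046, up to an O(log(n)/n) correction
```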

But neither term of this rate function really expresses it properly in isolation. For instance, notice that the two terms cancel out their respective singularities at $\pm 1$, so there is no hope of understanding the behavior there without both terms. To put it another way, quantitatively understanding the tail requires us to know not so much how $\psi$ and $\Psi$ behave by themselves as how much $\mathrm{id} \cdot \psi$ differs from $\ln \circ \Psi \circ \psi$. That can't possibly be encapsulated in a single value of $\Psi$; at the very least you need to know $\Psi$ on some interval to get this information.
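To make the cancellation explicit (a quick check, writing $\tanh^{-1}$ as a logarithm):

$$x\tanh^{-1}(x)+\frac{1}{2}\ln(1-x^2)=\frac{x}{2}\ln\frac{1+x}{1-x}+\frac{1}{2}\ln(1-x)+\frac{1}{2}\ln(1+x)=\frac{1+x}{2}\ln(1+x)+\frac{1-x}{2}\ln(1-x),$$

and in the last expression the coefficient $\frac{1-x}{2}$ kills the logarithmic singularity as $x \to 1^-$, leaving $I(1)=\ln 2$ (and symmetrically at $-1$).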