"Wild" exponents of $e$
One of the things I'm curious about is why some functions look like this:
$$f(x,y) = e^{-\frac{x^2+y^2}{2\sigma^2}}$$
And people mostly take it for granted and throw it around as various kernels for image processing... But nobody takes the time to explain why there's a fraction in the exponent. What does it do? Raise $e$ to the negated sum of the squared inputs over twice the square of sigma? What does that even mean?
Is there even any point in doing "the math" in your head for these wild equations? $f(1,1)$ with $\sigma = 1$ is not that hard, it's $\frac{1}{e}$. Not to mention bigger inputs, which are not so friendly.
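To make it concrete, here's a tiny NumPy sketch (the 5x5 size and the $\sigma$ are arbitrary choices of mine) of how this function actually gets used as a blur kernel, which is exactly the part that mystifies me:

```python
import numpy as np

sigma = 1.0

# f(1, 1) with sigma = 1: e^(-(1 + 1)/2) = 1/e
print(np.exp(-(1**2 + 1**2) / (2 * sigma**2)))  # 0.3678... = 1/e

# A 5x5 Gaussian blur kernel built from f(x, y), normalized to sum to 1
ax = np.arange(-2, 3)
xx, yy = np.meshgrid(ax, ax)
kernel = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
kernel /= kernel.sum()  # normalized so the blur preserves brightness
print(kernel.round(4))
```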
I've written a few articles on the subject, so I took the time to edit this answer to be even more intuitive. Again, in the interest of readability and usefulness to newcomers, rigorous formality is set aside for the duration of this post.
The normal distribution, also known as the Gaussian distribution, has found countless applications over the past 200 years. You must've run into it yourself, not to mention all the derivative concepts which are strongly based on it. With the ubiquitous nature of the subject, you'd assume humans would have a solid grasp on where it came from and why the hell it works. And, (wo)man, would you be wrong. Even among those who think they do, many would just give you a circular argument with no logical underpinning, or one that depends on foreknowledge of the very thing being derived. Let us rederive the concept.
Humble beginnings
Notice how I didn't start with the final form of the equation representing a normal distribution. Not even a graph of it. "I do not know" is the beginning of all knowledge; just ask Lieutenant Commander Data. Right now, "we don't know it exists". We'll get to that. Sit back and enjoy the ride. To find intuition, we need footing in the real world. If you think about it, people often like to judge whether things or events are normal. Many associate positive emotions with normality. Being normal is being in the balance between two extremes. Awesome, terrible. Genius, imbecile. Overly social, anti-social. Just to name a few. And for whatever reason, the universe mapped most people right in the middle. Normal people.
To us, it seems natural because it is intuitive; we take it for granted. There is no reason why it is distributed that way, but it is. So, normality is defined by the words most and middle. Observe such a distribution in the real world for all of your life, and chances are you'll eventually want to encode it in a mathematical function in order to predict a pattern-enforcing universe. Especially if you're allocated towards the genius extreme of the normal distribution of mental capacity. Mathematics already implements the concept of a middle: the arithmetic average, sometimes dubbed just the average or the mean. I've already gone into deriving the notion of the arithmetic mean on this website. For now, you're already familiar with it in the form:
$$\mu = \frac{1}{n}\sum_{i=1}^{n}{x_i}$$
What we desire to find is a distribution function which replicates the concept of normality in day-to-day life. In other words, we want a function which yields its highest value at the middle between the two extremes, or aptly put, at the mean. In terms of statistical analysis and probability, such a function is called a probability (distribution) function. Since we're not developing a function which operates on discrete-valued variables, it is more precisely called a probability density function (it is evaluated over intervals rather than at single points, and it returns not a probability but a probability per unit). Given a sample space of possible values from one extreme to the other, we desire a peak when $x = \mu$, the very definition of real-life normality.
Right now, we only have a vague idea of what the maximum of the function is. We need to introduce constraints to actually trap a computable manifestation of the function. From everyday experience, we can say that a very intelligent person and a very unintelligent person are equally improbable; that is to say, the function we're looking for must be symmetric around the mean. In the simplified view where the mean $\mu = 0$, this can easily be expressed as $p(-x)=p(x)$. A symmetric function with a peak at $\mu$ implies its derivative is $0$ when $x=\mu$:
$$\LARGE \left. \frac{\mathbf{d}p(x - \mu)}{\mathbf{d}x} \right|_{x=\mu} = 0$$
The fact that the function is symmetric and requires its derivative to be $0$ at $\mu$ means that this particular probability density function must be non-linear and smooth, because otherwise there would be a sharp corner at $\mu$ and a very useful derivative wouldn't be available to us. It seems like we don't have a lot to go on here, but the derivative of a symmetric function is guaranteed to have "equal amounts" of negative and positive values (in other words, it is anti-symmetric), which when integrated from $-\infty$ to $\infty$ evaluates to $0$. We can use this to our advantage:
$$ \int_{-\infty}^{\infty}{p'(x - \mu)} \space dx = 0$$ $$ \int_{-\infty}^{\infty}{(x - \mu)} \space dx = 0$$
Both integrands are anti-symmetric around the mean and vanish at $x = \mu$ (the second integral understood as a symmetric limit around the mean), so the simplest connection we can posit between them is proportionality:
$$ p'(x - \mu) \propto x - \mu$$ $$ p'(x - \mu) = k(x - \mu)$$
If we were to integrate this particular expression on both sides, we'd get a parabola. While it does what we want it to do, achieves a peak and falls off symmetrically (if we make sure $k < 0$), it doesn't stop at $0$; it keeps plunging into negative values, and a probability density must never be negative. We can't have that. We need to establish a ratio which always keeps the function above water, above $0$. Fortunately, this is very easy:
$$\int_{-\infty}^{\infty}{\frac{p'(x - \mu)}{p(x - \mu)}\space dx} = 0$$
You can easily see that this still integrates to $0$: $p$ is symmetric (even) while $p'$ is anti-symmetric (odd), so their ratio is still anti-symmetric. Continuing from there, we've set up the expression to fall into our trap:
$$ \frac{p'(x - \mu)}{p(x - \mu)} = k(x - \mu)$$
And from here we simply integrate both sides and pick up the candy from the floor. Two bits of housekeeping: we rename $k \to -k$ so that $k$ is positive from here on (remember, the original constant had to be negative), and we drop the integration constant, since it will be absorbed by the normalization in the next section:
$$ \int{\frac{p'(x - \mu)}{p(x - \mu)}} \space dx = \int{-k(x - \mu)} \space dx \Rightarrow \ln{p(x - \mu)} = \frac{-k(x - \mu)^2}{2}$$ $$\LARGE p(x - \mu) = e^{\frac{-k(x - \mu)^2}{2}}$$
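As a quick sanity check, here's a minimal sympy sketch (with $k$ declared positive per the renaming above) confirming that this $p$ really does solve $p'/p = -k(x - \mu)$ and peaks at the mean:

```python
import sympy as sp

x, mu = sp.symbols("x mu", real=True)
k = sp.symbols("k", positive=True)

# The candidate density from the derivation (normalization comes later)
p = sp.exp(-k * (x - mu) ** 2 / 2)

# p'/p collapses back to -k*(x - mu), our starting differential equation
assert sp.simplify(sp.diff(p, x) / p + k * (x - mu)) == 0

# The derivative vanishes at x = mu, so the peak sits at the mean
assert sp.diff(p, x).subs(x, mu) == 0
```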
Do note that we're far from done, as a probability density function must be normalized before it can be of any use.
Normalizing the normal distribution
Normalizing the normal distribution... Ugh, the wording. Our terms are simple: the total probability must integrate to $1$:
$$\LARGE \int ne^{\frac{-k(x - \mu)^2}{2}} \space dx = 1$$
But life is not. The function we developed has no elementary antiderivative, so the indefinite integral is not an option. What can we do in the face of such tragedy? Take the next best thing: select an integration range $[-a, a]$ and find a way to perform the integration with $a = \infty$ without the Universe collapsing:
$$\LARGE \int_{-a}^a ne^{\frac{-k(x - \mu)^2}{2}} \space dx = 1$$
Now, we need to get very devious in order to avoid brute-force numerical integration or, as it is known in some countries, the peasant way. Fortunately, mathematics allows us to interpret any equation in any number of valid ways, as long as we respect the well-established rules. With that in mind, let's continue. The first thing we're going to try on our path of discovery is to square it and see what we get. It seems like we're shooting in the dark, but that's why we call it discovering mathematics: we cast a lot of educated guesses and try to pinpoint our target:
$$ \left(\int_{-a}^a ne^{\frac{-k(x - \mu)^2}{2}} \space dx\right)^2 = n^2\int_{-a}^a e^{\frac{-k(x - \mu)^2}{2}} \space dx \int_{-a}^a e^{\frac{-k(x - \mu)^2}{2}} \space dx = 1^2$$
Usually, at this point, you get smacked in the face with Fubini's theorem, even though people should actually be talking about Tonelli's theorem. Both theorems are really difficult to prove for most people because they are not well-versed in measure theory, which generalizes the concept of measure (length, area, volume and beyond). This means there's a huge amount of abstraction and a lot of formal notation, and it simply freaks people out. Fortunately, for the Gaussian integral, familiarity with Fubini's or Tonelli's theorem is absolutely not required. Why? We're currently working with two variables, which can be easily interpreted in the intuitive notions of plain length and area. It's really easy to see that the following is true, due to the mutual independence of the variables involved:
$$ \int_{-a}^a f(x) \space dx \int_{-a}^a g(y) \space dy = \int_{-a}^a\int_{-a}^a f(x) g(y) \space dx dy$$
Nothing magical about it. Essentially, integrating iteratively and observing that the inner integral is constant with respect to $y$ allows you to eject it outside, and boom. If you're a little queasy, write it out:
$$ \int_{-a}^a g(y) \left(\int_{-a}^a f(x) \space dx \right) dy = \left(\int_{-a}^a f(x) \space dx\right) \int_{-a}^a g(y) \space dy$$
Where Fubini and Tonelli really shine is in higher dimensions, where it is really difficult to establish the validity of such an exchange without measure theory. Luckily for us, it's pretty clear we can do that here, so let us apply it to our problem:
$$ \int_{-a}^a e^{\frac{-k(x - \mu)^2}{2}} \space dx \int_{-a}^a e^{\frac{-k(x - \mu)^2}{2}} \space dx = \int_{-a}^a \int_{-a}^a e^{\frac{-k(x - \mu)^2}{2}} e^{\frac{-k(x - \mu)^2}{2}} \space dx dx = \frac{1}{n^2}$$
Now, we need to clean this up into something functional. First of all, we need a way of separating the two $x$ variables, simply by calling the right one $y$. It changes nothing beyond improving readability, and it also allows us to reinterpret the problem in terms of area. As the mean $\mu$ is constant, we can introduce a substitution with no repercussions: let $x_{\mu} = x - \mu$ and $y_{\mu} = y - \mu$. Also, we shall now commit to the infinite range with $a = \infty$. Here it is all combined:
$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{\frac{-k}{2}x_{\mu}^2} e^{\frac{-k}{2}y_{\mu}^2} \space dx_{\mu} dy_{\mu} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{\frac{-k}{2}(x_{\mu}^2 + y_{\mu}^2)} \space dx_{\mu} dy_{\mu} = \frac{1}{n^2}$$
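None of this is sacred, and you can verify the squaring-and-separating step numerically before trusting it. A minimal scipy sketch (I picked $k = 1$ and $\mu = 0$ purely for illustration):

```python
import numpy as np
from scipy import integrate

k, mu = 1.0, 0.0  # arbitrary illustrative values

f = lambda t: np.exp(-k * (t - mu) ** 2 / 2)

# The 1D integral, squared...
one_d, _ = integrate.quad(f, -np.inf, np.inf)

# ...equals the 2D integral of the product, as the separation argument claims
two_d, _ = integrate.dblquad(lambda y, x: f(x) * f(y),
                             -np.inf, np.inf,
                             lambda x: -np.inf, lambda x: np.inf)

print(one_d ** 2, two_d)  # both ~6.2832, i.e. 2*pi for k = 1
```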
Attacking the problem as if we were integrating over an area, we can see something familiar popping up in the exponent. This suggests solving the problem in a more amenable coordinate system, such as polar coordinates. The first step is to notice $r^2 = x_{\mu}^2 + y_{\mu}^2$. Expressing area in polar coordinates is not that different from expressing volume in spherical coordinates; I took the liberty of making you a fancy graph below where you can see where $\mathbf{d}A = \mathbf{d}x \mathbf{d}y = r \mathbf{d} \theta \mathbf{d}r$ comes from. Changing the integration range is simple: $r$ runs from $0$ (a negative radius makes no sense, which the squares enforce anyway) to the same upper bound of $\infty$, while $\theta$ goes from $0$ to $2\pi$.
And here is what we've discussed in the previous paragraph combined:
$$\int_{0}^{\infty} \int_{0}^{2\pi} e^{\frac{-k}{2}r^2} r\space d\theta dr = \frac{1}{n^2}$$
Everything in the inner integral is constant with respect to $\theta$:
$$ \int_{0}^{\infty} \left(\int_{0}^{2\pi} e^{\frac{-k}{2}r^2} r \space d\theta\right) dr = \int_{0}^{\infty} e^{\frac{-k}{2}r^2} r \left(\int_{0}^{2\pi} 1 \space d\theta \right) dr = 2\pi\int_{0}^{\infty} e^{\frac{-k}{2}r^2} \space r dr = \frac{1}{n^2}$$
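Before finishing this analytically, we can peek at the polar form numerically and see that it already agrees with the Cartesian double integral from before (same illustrative $k = 1$, scipy assumed):

```python
import numpy as np
from scipy import integrate

k = 1.0  # same illustrative constant as in the earlier check

# The radial integral left over after the angular part contributes 2*pi
polar, _ = integrate.quad(lambda r: np.exp(-k * r ** 2 / 2) * r, 0, np.inf)
print(2 * np.pi * polar)  # ~6.2832 again, matching the 2D Cartesian value
```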
To resolve the outer integral, we need to introduce a substitution once more and adjust the integral accordingly:
$$ u = \frac{k}{2}r^2 $$ $$ \frac{d}{dr}\left(\frac{k}{2}r^2\right) = kr $$ $$ \frac{du}{k} = rdr $$ $$ \int_{0}^{\infty} e^{-u} \space du = \frac{k}{2\pi n^2}$$
Finalizing the integral explicitly requires one last substitution; most of you probably already know the answer just by looking at it:
$$ v = -u $$ $$ \frac{d}{du}\left(-u\right) = -1 $$ $$ -\int_{0}^{-\infty} e^{v} \space dv = \frac{k}{2\pi n^2}$$
The anti-derivative of $e^v$ is $e^v$, therefore evaluating at limits:
$$ e^0 - \frac{1}{e^{\infty}} = 1 = \frac{k}{2\pi n^2}$$ $$ n = \sqrt{\frac{k}{2\pi}}$$
And our normalized $p(x - \mu)$ is therefore:
$$\LARGE p(x - \mu) = \sqrt{\frac{k}{2\pi}} e^{\frac{-k(x - \mu)^2}{2}}$$
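A minimal numerical check that the normalization constant does its job (scipy assumed; the values of $k$ and $\mu$ below are arbitrary):

```python
import numpy as np
from scipy import integrate

k, mu = 2.5, 1.0  # arbitrary positive k and mean, purely illustrative
n = np.sqrt(k / (2 * np.pi))

# The normalized density should integrate to exactly 1
total, _ = integrate.quad(lambda x: n * np.exp(-k * (x - mu) ** 2 / 2),
                          -np.inf, np.inf)
print(total)  # ~1.0, whatever k and mu you pick
```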
Beautiful. You could now analyze the behavior of the function to establish the variance and standard deviation (compute $\int_{-\infty}^{\infty} x^2 p(x) \space dx$ with $\mu = 0$ and you'll find the variance is $\frac{1}{k}$); I'll just give you the final values, which you can use to check your little exercise:
$$k = \frac{1}{\sigma^2}$$
$$\LARGE p(x - \mu) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
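And to close the loop with the kernel from the original question: our hand-derived density matches a reference implementation (scipy's `norm.pdf`) point for point:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 2.0
p = lambda x: np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

xs = np.linspace(-5.0, 5.0, 101)
print(np.allclose(p(xs), norm.pdf(xs, loc=mu, scale=sigma)))  # True
```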
Below is the final function's graph with the area under it colored in blue, which has now reached unity. The parametrization used is $\mu = 0$, $\sigma = 2$.
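If you want to reproduce that graph yourself, here's a minimal matplotlib sketch (the styling choices are my own):

```python
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0.0, 2.0
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 400)
p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

plt.plot(x, p)
plt.fill_between(x, p, color="blue", alpha=0.3)  # the unit area under the curve
plt.title(r"$\mu = 0,\ \sigma = 2$")
plt.show()
```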
One way to interpret the necessity of the fraction is in the sense of units. If, say, $x$ and $y$ are measured in meters, then the argument of the exponential must be unitless: $x^2 + y^2$ carries units of square meters, so $2\sigma^2$ must as well, hence $\sigma$ must have the same units as the inputs. Indeed, if you think in terms of probability, $\sigma$ in the sense of the standard deviation carries exactly those units.
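If you want to see that units argument mechanically enforced, a small sketch with the pint units library (my choice here; any units package would do) makes it explicit:

```python
import numpy as np
import pint

ureg = pint.UnitRegistry()

x = 1.0 * ureg.meter
y = 1.0 * ureg.meter
sigma = 1.0 * ureg.meter  # sigma must carry the same units as x and y

exponent = -(x ** 2 + y ** 2) / (2 * sigma ** 2)
print(exponent.dimensionless)      # True: m^2 over m^2 cancels completely
print(np.exp(exponent.magnitude))  # 1/e, the f(1, 1) value from the question
```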