$\newcommand{\erf}{\operatorname{erf}}$ This may be a very naïve question, but here goes.

The error function $\erf$ is defined by $$\erf(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}dt.$$ Of course, it is closely related to the normal cdf $$\Phi(x) = P(N < x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-t^2/2}dt$$ (where $N \sim N(0,1)$ is a standard normal) by the expression $\erf(x) = 2\Phi(x \sqrt{2})-1$.
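For reference, the identity follows from the substitution $t = u/\sqrt{2}$:

$$\erf(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt = \frac{2}{\sqrt{2\pi}} \int_0^{x\sqrt{2}} e^{-u^2/2}\,du = 2\left(\Phi(x\sqrt{2}) - \tfrac{1}{2}\right) = 2\Phi(x\sqrt{2}) - 1.$$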

My question is:

Why is it natural or useful to define $\erf$ normalized in this way?

I may be biased: as a probabilist, I think much more naturally in terms of $\Phi$. However, anytime I want to compute something, I find that my calculator or math library only provides $\erf$, and I have to go check a textbook or Wikipedia to remember where all the $1$s and $2$s go. Being charitable, I have to assume that $\erf$ was invented for some reason other than to cause me annoyance, so I would like to know what it is. If nothing else, it might help me remember the definition.
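To spare anyone else the lookup, here is a minimal sketch of the conversion in Python (`math.erf` is in the standard library; the helper name `normal_cdf` is mine):

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cdf via erf: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Sanity checks: Phi(0) = 1/2 and Phi(1.96) ≈ 0.975.
assert math.isclose(normal_cdf(0.0), 0.5)
assert math.isclose(normal_cdf(1.96), 0.975, abs_tol=1e-3)
```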

Wikipedia says:

The standard normal cdf is used more often in probability and statistics, and the error function is used more often in other branches of mathematics.

So perhaps a practitioner of one of these mysterious "other branches of mathematics" would care to enlighten me.

The most reasonable expression I've found is that $$P(|N| < x) = \erf(x/\sqrt{2}).$$ This at least gets rid of all but one of the apparently spurious constants, but still has a peculiar $\sqrt{2}$ floating around.
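A quick numerical check of that expression (Python standard library again; `statistics.NormalDist` supplies $\Phi$):

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal cdf

for x in (0.5, 1.0, 1.96, 3.0):
    two_sided = Phi(x) - Phi(-x)  # P(|N| < x)
    assert math.isclose(two_sided, math.erf(x / math.sqrt(2.0)), rel_tol=1e-12)
```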


Some paper chasing netted this short article by George Marsaglia, in which he also quotes the article by James Glaisher where the error function was given a name and notation (but with a different normalization). Here's the relevant section of the paper:

In 1871, J.W. Glaisher published an article on definite integrals in which he comments that while there is scarcely a function that cannot be put in the form of a definite integral, for the evaluation of those that cannot be put in the form of a tolerable series we are limited to combinations of algebraic, circular, logarithmic and exponential—the elementary or primary functions. ... He writes:

The chief point of importance, therefore, is the choice of the elementary functions; and this is a work of some difficulty. One function however, viz. the integral $\int_x^\infty e^{-x^2}\mathrm dx$, well known for its use in physics, is so obviously suitable for the purpose, that, with the exception of receiving a name and a fixed notation, it may almost be said to have already become primary... As it is necessary that the function should have a name, and as I do not know that any has been suggested, I propose to call it the Error-function, on account of its earliest and still most important use being in connexion with the theory of Probability, and notably with the theory of Errors, and to write

$$\int_x^\infty e^{-x^2}\mathrm dx=\mathrm{Erf}(x)$$

Glaisher goes on to demonstrate use of $\mathrm{Erf}$ in the evaluation of a variety of definite integrals. We still use "error function" and $\mathrm{Erf}$, but $\mathrm{Erf}$ has become $\mathrm{erf}$, with a change of limits and a normalizing factor: $\mathrm{erf}(x)=\frac2{\sqrt{\pi}}\int_0^x e^{-t^2}\mathrm dt$ while Glaisher’s original $\mathrm{Erf}$ has become $\mathrm{erfc}(x)=\frac2{\sqrt{\pi}}\int_x^\infty e^{-t^2}\mathrm dt$. The normalizing factor $\frac2{\sqrt{\pi}}$ that makes $\mathrm{erfc}(0)=1$ was not used in early editions of the famous “A Course in Modern Analysis” by Whittaker and Watson. Both were students and later colleagues of Glaisher, as were other eminences from Cambridge mathematics/physics: Maxwell, Thomson (Lord Kelvin), Rayleigh, Littlewood, Jeans, Whitehead and Russell. Glaisher had a long and distinguished career at Cambridge and was editor of The Quarterly Journal of Mathematics for fifty years, from 1878 until his death in 1928.

It is unfortunate that changes from Glaisher’s original $\mathrm{Erf}$: the switch of limits, names and the standardizing factor, did not apply to what Glaisher acknowledged was its most important application: the normal distribution function, and thus $\frac1{\sqrt{2\pi}}\int e^{-\frac12t^2}\mathrm dt$ did not become the basic integral form. So those of us interested in its most important application are stuck with conversions...

...A search of the Internet will show many applications of what we now call $\mathrm{erf}$ or $\mathrm{erfc}$ to problems of the type that seemed of more interest to Glaisher and his famous colleagues: integral solutions of differential equations. These include the telegrapher’s equation, studied by Lord Kelvin in connection with the Atlantic cable, and Kelvin’s estimate of the age of the earth (25 million years), based on the solution of a heat equation for a molten sphere (it was far off because of then unknown contributions from radioactive decay). More recent Internet mentions of the use of $\mathrm{erf}$ or $\mathrm{erfc}$ for solving differential equations include short-circuit power dissipation in electrical engineering, current as a function of time in a switching diode, thermal spreading of impedance in electrical components, diffusion of a unidirectional magnetic field, recovery times of junction diodes and the Mars Orbiter Laser Altimeter.
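To make the quoted renaming concrete, here is a small Python sketch (`math.erfc` is in the standard library; `glaisher_Erf` is my own name for his unnormalized integral):

```python
import math

def glaisher_Erf(x: float) -> float:
    """Glaisher's original Erf(x) = ∫_x^∞ e^(-t²) dt = (√π / 2) · erfc(x)."""
    return 0.5 * math.sqrt(math.pi) * math.erfc(x)

# The modern pair is complementary, and the 2/√π factor makes erfc(0) = 1.
assert math.isclose(math.erf(1.0) + math.erfc(1.0), 1.0)
assert math.isclose(math.erfc(0.0), 1.0)
assert math.isclose(glaisher_Erf(0.0), 0.5 * math.sqrt(math.pi))
```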

On the other hand, for applications where the error function is to be evaluated at complex values (spectroscopy, for instance), the more "natural" function to consider is probably the Faddeeva (or Voigt) function:

$$w(z)=\exp\left(-z^2\right)\mathrm{erfc}(-iz)$$

There, the normalization factor simplifies most of the formulae in which it is used. In short, I suppose the choice of whether you use the error function, the normal distribution cdf $\Phi$, or the Faddeeva function in your applications is a matter of convenience.
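As a numerical illustration (a sketch assuming SciPy is available; `scipy.special.wofz` computes the Faddeeva function, and SciPy's `erfc` accepts complex arguments):

```python
import numpy as np
from scipy.special import erfc, wofz

z = 0.7 + 1.3j
w_direct = wofz(z)                        # Faddeeva function w(z)
w_by_def = np.exp(-z**2) * erfc(-1j * z)  # exp(-z²) · erfc(-iz)
assert np.isclose(w_direct, w_by_def)
```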


I think the normalization in $x$ is easy to account for: it's natural to write down the integral $\int_0^x e^{-t^2} \, dt$ with the simplest possible Gaussian integrand, even if it's not the most natural probabilistic quantity. So it remains to explain the normalization in $y$, and as far as I can tell this is so that $\lim_{x \to \infty} \text{erf}(x) = 1$.
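Concretely, the Gaussian integral is what fixes the constant:

$$\int_0^\infty e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2} \quad\Longrightarrow\quad \lim_{x\to\infty} \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt = 1.$$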

Beyond that, the normalization has probably stuck more for historical reasons than anything else.