Why does convolution regularize functions?

Perhaps this might soothe some of your discomfort.

Smoothing Action

There are many ways that convolution is useful in mathematics. First of all, as you have noted, $$ \mathrm{D}^\alpha\left(f\ast g\right)=\left(\mathrm{D}^\alpha f\right)\ast g\tag{1} $$ This follows from repeatedly interchanging the order of integration and differentiation: $$ \frac{\mathrm{d}}{\mathrm{d}x}\int f(x-t)\,g(t)\,\mathrm{d}t=\int f'(x-t)\,g(t)\,\mathrm{d}t\tag{2} $$ This interchange can be justified in different ways depending on the context. For instance, if the limit that defines the derivative of $f$, $$ f'(x)=\lim_{h\to0}\frac{f(x+h)-f(x)}{h}\tag{3} $$ converges uniformly, then $(2)$ is valid for all $g\in L^1$.
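A quick numerical sanity check of $(2)$ may help. The functions are an illustrative choice, not from the answer above: $f(x)=e^{-x^2}$ (whose difference quotients converge uniformly) and $g=\mathbf{1}_{[0,1]}$, which is discontinuous but in $L^1$. For this pair, $(f'\ast g)(x)=f(x)-f(x-1)$ in closed form, so both sides of $(2)$ can be compared against it. The integrals use a plain midpoint rule; the step sizes are assumptions chosen for accuracy, nothing more.

```python
import math

# Illustrative choice: f is a Gaussian, g is the indicator of [0, 1].
def f(x):
    return math.exp(-x * x)

def f_prime(x):
    return -2.0 * x * math.exp(-x * x)

def convolve_with_box(h, x, n=2000):
    """Midpoint-rule approximation of (h * 1_[0,1])(x) = integral_0^1 h(x - t) dt."""
    dt = 1.0 / n
    return sum(h(x - (i + 0.5) * dt) for i in range(n)) * dt

def derivative_of_convolution(x, step=1e-5):
    """Left side of (2): integrate first, then differentiate (central difference)."""
    return (convolve_with_box(f, x + step) - convolve_with_box(f, x - step)) / (2.0 * step)

def convolution_of_derivative(x):
    """Right side of (2): differentiate first, then integrate."""
    return convolve_with_box(f_prime, x)

for x in (-1.0, 0.3, 2.0):
    exact = f(x) - f(x - 1.0)  # closed form of (f' * g)(x) for this particular g
    print(x, derivative_of_convolution(x), convolution_of_derivative(x), exact)
```

The three columns agree to many digits even though $g$ itself has jumps: all the differentiation lands on $f$.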

Convolution combines the smoothness of two functions. That is, if $f$, $g$, and their first derivatives are all in $L^1$, then the second derivative of their convolution is in $L^1$. This is because $f\ast g = g\ast f$, so we can use $(2)$ once to move a derivative onto $f$ and once more to move a derivative onto $g$: $$ \frac{\mathrm{d}^2}{\mathrm{d}x^2}(f\ast g)=f'\ast g'\tag{4} $$ and $f'\ast g'\in L^1$ because the convolution of two $L^1$ functions is in $L^1$.
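Here is a small check of $(4)$, again with an illustrative choice not taken from the answer: $f=g=e^{-x^2}$. Then $f\ast g=\sqrt{\pi/2}\,e^{-x^2/2}$, so $(f\ast g)''=\sqrt{\pi/2}\,(x^2-1)\,e^{-x^2/2}$, and we compare that with a numerical $f'\ast g'$. The truncation of the integral to $[-8,8]$ and the grid size are assumptions that are harmless for Gaussians.

```python
import math

# Illustrative choice: f = g = exp(-x^2), so f' = g' = -2x exp(-x^2).
def gaussian_prime(x):
    return -2.0 * x * math.exp(-x * x)

def conv(h1, h2, x, lo=-8.0, hi=8.0, n=4000):
    """Midpoint-rule approximation of (h1 * h2)(x) = integral h1(x - t) h2(t) dt over [lo, hi]."""
    dt = (hi - lo) / n
    return sum(h1(x - t) * h2(t) for t in (lo + (i + 0.5) * dt for i in range(n))) * dt

def second_derivative_exact(x):
    """(f * g)'' for f * g = sqrt(pi/2) exp(-x^2/2)."""
    return math.sqrt(math.pi / 2.0) * (x * x - 1.0) * math.exp(-x * x / 2.0)

for x in (0.0, 0.7, 1.5):
    print(x, conv(gaussian_prime, gaussian_prime, x), second_derivative_exact(x))
```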

Fourier Analysis

Convolution plays an important role in Fourier analysis. The key formulas express the duality between convolution and multiplication: $$ \mathscr{F}(f\ast g)=\mathscr{F}(f)\mathscr{F}(g)\quad\text{and}\quad\mathscr{F}(fg)=\mathscr{F}(f)\ast\mathscr{F}(g)\tag{5} $$ There is also a duality between decay at $\infty$ and smoothness. Roughly, each derivative's worth of smoothness of $f$ corresponds to one factor of $1/x$ in the decay of $\mathscr{F}(f)$, and vice versa.
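The identities in $(5)$ have an exact discrete counterpart that is easy to test: the discrete Fourier transform (DFT) turns circular convolution into pointwise multiplication. This is a sketch of that discrete analogue, not of the continuous transform $\mathscr{F}$; with the unnormalized DFT used below, the second identity picks up a factor $1/N$, which is purely a normalization convention.

```python
import cmath
import random

def dft(x):
    """Unnormalized DFT: X_k = sum_m x_m exp(-2 pi i k m / N), computed directly in O(N^2)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def circular_convolve(a, b):
    """Circular convolution (a * b)_k = sum_m a_m b_{(k - m) mod N}."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]

def max_diff(u, v):
    return max(abs(p - q) for p, q in zip(u, v))

random.seed(0)
a = [random.uniform(-1.0, 1.0) for _ in range(16)]
b = [random.uniform(-1.0, 1.0) for _ in range(16)]
A, B = dft(a), dft(b)

# First identity in (5): the transform of a convolution is the product of transforms.
print(max_diff(dft(circular_convolve(a, b)), [p * q for p, q in zip(A, B)]))

# Second identity in (5): with this unnormalized DFT, a factor 1/N appears.
print(max_diff(dft([p * q for p, q in zip(a, b)]),
               [z / len(a) for z in circular_convolve(A, B)]))
```

Both printed discrepancies are at the level of floating-point rounding.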

The product of decaying functions decays even faster; e.g. $x^{-n}x^{-m}=x^{-(n+m)}$. Since $(5)$ turns convolution into multiplication, $\mathscr{F}(f\ast g)$ decays faster than either factor, so the convolution of smooth functions is even smoother.
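The decay-versus-smoothness trade can be seen on the simplest example, chosen here for illustration: the box $\mathbf{1}_{[-1/2,1/2]}$, which has jumps, and the tent $\max(0,1-|x|)$, which equals box $\ast$ box and is continuous with corners. Assuming the convention $\mathscr{F}(f)(\omega)=\int f(x)\,e^{-i\omega x}\,\mathrm{d}x$, the box transform is $2\sin(\omega/2)/\omega$ and the tent transform is its square. At $\omega=(2k+1)\pi$, where $|\sin(\omega/2)|=1$, we should therefore see $\omega\,|\mathscr{F}(\text{box})|\approx2$ and $\omega^2\,|\mathscr{F}(\text{tent})|\approx4$: one extra factor of $1/\omega$ for one extra convolution.

```python
import math

def box(x):
    """Indicator of [-1/2, 1/2]: has jump discontinuities."""
    return 1.0 if abs(x) <= 0.5 else 0.0

def tent(x):
    """box * box = max(0, 1 - |x|): continuous, but with corners."""
    return max(0.0, 1.0 - abs(x))

def cosine_transform(h, w, half_width, n=20000):
    """Midpoint rule for integral_{-half_width}^{half_width} h(x) cos(w x) dx.

    For an even h this equals the Fourier transform under the assumed convention."""
    dx = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * dx
        total += h(x) * math.cos(w * x)
    return total * dx

for k in range(1, 11):
    w = (2 * k + 1) * math.pi  # here |sin(w / 2)| = 1
    fb = cosine_transform(box, w, 0.5)
    ft = cosine_transform(tent, w, 1.0)
    print(f"w={w:7.2f}  w*|F(box)|={w * abs(fb):.4f}  w^2*|F(tent)|={w * w * abs(ft):.4f}")
```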

The Riemann-Lebesgue Lemma says that for $f\in L^1$, $$ \lim_{|x|\to\infty}\mathscr{F}(f)(x)=0\tag{6} $$ However, this is decay with no rate attached. For $f,g\in L^1$, about all that can be said is that $f\ast g\in L^1$. On the other hand, if $f,g\in L^2$, then $f\ast g$ is continuous.
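Both halves of that last contrast can be checked in a couple of lines; the specific function in the second part is an illustrative choice. For $f,g\in L^2$, the Cauchy-Schwarz inequality and the substitution $u=y-t$ give $$ |(f\ast g)(x)-(f\ast g)(y)|\le\int|f(t)|\,|g(x-t)-g(y-t)|\,\mathrm{d}t\le\|f\|_2\,\|g(\cdot+x-y)-g\|_2 $$ and the right side tends to $0$ as $x\to y$ because translation is continuous in $L^2$. So $f\ast g$ is (uniformly) continuous. For $L^1$ nothing like this holds: take $f(x)=|x|^{-1/2}$ for $|x|\le1$ and $f(x)=0$ otherwise, so that $\|f\|_1=4$. Then $$ (f\ast f)(0)=\int_{-1}^{1}|t|^{-1/2}\,|t|^{-1/2}\,\mathrm{d}t=\int_{-1}^{1}\frac{\mathrm{d}t}{|t|}=\infty $$ and in fact $(f\ast f)(x)$ grows logarithmically as $x\to0$, so $f\ast f$ is in $L^1$ but not continuous.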

Summing Dice

Perhaps one of the earliest uses of convolution was in probability. If $f_n(k)$ is the number of ways to roll a $k$ on $n$ six-sided dice, then $$ f_n(k)=\sum_jf_{n-1}(k-j)f_1(j)\tag{7} $$ That is, for each way to achieve $k$ on $n$ dice, we must have $k-j$ on $n-1$ dice and $j$ on the remaining die. Equation $(7)$ represents discrete convolution.
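Equation $(7)$ translates directly into code. This is a minimal sketch; the names `convolve`, `ONE_DIE` and `dice_counts` are hypothetical, and lists are indexed by the total rolled, so index $0$ holds the (zero) count for a total of $0$.

```python
def convolve(a, b):
    """Discrete convolution: (a * b)[k] = sum_j a[k - j] * b[j], as in (7)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# f_1(k): exactly one way to roll each total 1..6 on a single die.
ONE_DIE = [0, 1, 1, 1, 1, 1, 1]

def dice_counts(n):
    """f_n(k) = number of ways to roll a total of k on n six-sided dice."""
    counts = [1]  # zero dice: one way to reach a total of 0
    for _ in range(n):
        counts = convolve(counts, ONE_DIE)
    return counts

print(dice_counts(2)[2:13])  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
print(dice_counts(3)[10])    # 27
```

As a consistency check, `sum(dice_counts(n))` equals $6^n$, the total number of outcomes.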

The distribution function for the roll of a single six-sided die is evenly distributed among $6$ possibilities. This has discontinuities at $1$ and $6$ ($n=1$). The distribution function for the sum of two six-sided dice is the convolution of two of the one-die distributions. This is continuous, but not smooth ($n=2$). The distribution function for the sum of three six-sided dice is the convolution of the one-die and two-dice distributions. This is smoother still: even its slope changes without jumps ($n=3$). For each die we add, we convolve with one more one-die distribution, and the function gets smoother.

[Figure: distribution of the sum of $n$ six-sided dice for $n=1,2,3$]
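Since $f_n$ lives on the integers, "smoothness" has to be read discretely. One way to make the picture precise, using finite differences as a stand-in for derivatives (our own illustration, not from the answer above): $f_n$ is a piecewise polynomial of degree $n-1$, so its $(n-1)$-th difference is constant on runs of $6$. For $n=1$ the function itself is piecewise constant, for $n=2$ the first difference is, and for $n=3$ only the second difference jumps. Each die pushes the first jump one derivative higher. The helper names are hypothetical.

```python
def convolve(a, b):
    """Discrete convolution: (a * b)[k] = sum_j a[k - j] * b[j]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

ONE_DIE = [0, 1, 1, 1, 1, 1, 1]  # index = total rolled

def dice_counts(n):
    """f_n(k) = number of ways to roll a total of k on n six-sided dice."""
    counts = [1]
    for _ in range(n):
        counts = convolve(counts, ONE_DIE)
    return counts

def difference(h):
    """Backward difference h(k) - h(k - 1), treating h as 0 outside the list.

    The result is one entry longer, so the final drop back to zero is kept."""
    padded = h + [0]
    return [padded[k] - (padded[k - 1] if k > 0 else 0) for k in range(len(padded))]

def repeated_difference(h, times):
    for _ in range(times):
        h = difference(h)
    return h

for n in (1, 2, 3):
    # Start at index n, the smallest possible total on n dice.
    print(n, repeated_difference(dice_counts(n), n - 1)[n:])
```

For $n=3$ this prints six $1$s, six $-2$s and six $1$s: the counts form three quadratic pieces glued together.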

As $n\to\infty$, the distribution approaches a shifted and scaled version of the standard normal density $\frac1{\sqrt{2\pi}}e^{-x^2/2}$.
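To see the approach numerically, one can compare the exact probabilities $f_n(k)/6^n$ with the normal density having the same mean and variance. One die has mean $7/2$ and variance $35/12$, so the sum of $n$ dice has mean $7n/2$ and variance $35n/12$; that is the shift and scaling. The "largest pointwise gap" used below is just one convenient measure, chosen for illustration.

```python
import math

def convolve(a, b):
    """Discrete convolution: (a * b)[k] = sum_j a[k - j] * b[j]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def dice_counts(n):
    """f_n(k) = number of ways to roll a total of k on n six-sided dice."""
    counts = [1]
    for _ in range(n):
        counts = convolve(counts, [0, 1, 1, 1, 1, 1, 1])
    return counts

def max_gap_to_normal(n):
    """Largest |P(sum = k) - normal density at k| over all totals k for n dice."""
    mean, var = 3.5 * n, 35.0 * n / 12.0
    total = 6 ** n
    gap = 0.0
    for k, ways in enumerate(dice_counts(n)):
        density = math.exp(-(k - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        gap = max(gap, abs(ways / total - density))
    return gap

for n in (1, 2, 5, 10, 20):
    print(n, max_gap_to_normal(n))
```

The gap shrinks quickly as dice are added, i.e. as more convolutions are applied.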


Concerning II:

When $f$ is a smooth function, all its translates $T_xf$, defined by $T_xf(t):=f(t-x)$, are equally smooth. The convolution of $f$ with an arbitrary $u$ can be viewed as a linear combination of such translates: $$f*u=\int_{-\infty}^\infty u(x)\>T_x f\ dx\ ,$$ so we expect it to inherit this smoothness. Looking now at what this process has done to $u$, we get the feeling that $u$ has been smoothed out.
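The "linear combination of translates" picture can be taken literally: replace the integral by a finite sum of weighted translates $u(x_i)\,T_{x_i}f\,\Delta x$. In this sketch, $f(t)=e^{-t^2}$ and $u=\mathbf{1}_{[0,1]}$ are illustrative choices, and the helper names are hypothetical. Every term of the sum is smooth, and the sum matches the exact convolution $\tfrac{\sqrt\pi}{2}\left(\operatorname{erf}(t)-\operatorname{erf}(t-1)\right)$, which is smooth even though $u$ jumps at $0$ and $1$.

```python
import math

# Illustrative choice: f is smooth, u is the (discontinuous) indicator of [0, 1].
def f(t):
    return math.exp(-t * t)

def u(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def translate(h, x):
    """T_x h, defined by (T_x h)(t) = h(t - x)."""
    return lambda t: h(t - x)

def convolution_as_translates(u, h, lo, hi, n=2000):
    """Approximate h * u = integral u(x) T_x h dx by a finite sum of weighted translates."""
    dx = (hi - lo) / n
    pieces = [(u(x) * dx, translate(h, x)) for x in (lo + (i + 0.5) * dx for i in range(n))]
    return lambda t: sum(weight * g(t) for weight, g in pieces)

def exact(t):
    """(f * u)(t) = integral_0^1 exp(-(t - x)^2) dx in closed form."""
    return 0.5 * math.sqrt(math.pi) * (math.erf(t) - math.erf(t - 1.0))

approx = convolution_as_translates(u, f, 0.0, 1.0)
for t in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(t, approx(t), exact(t))
```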


I can't help with the first question. As for the second, the way I think of it intuitively is that the convolution of two functions mixes their values and then integrates. Integrating means averaging out, so whatever properties each function brought into the convolution are relaxed. For instance, if one function has a sharp corner in its graph but is otherwise quite tame, then the contribution of the corner to the convolution is moderated by the smoothness of the function at other points. This is why local ill-behavior tends to disappear in the convolution. I hope this helps.
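The averaging intuition can be checked on the simplest function with a corner, $|x|$. Averaging it over the window $[x-a,x+a]$, i.e. convolving with a box of height $1/(2a)$, replaces the corner by the parabola $(x^2+a^2)/(2a)$ for $|x|\le a$ and leaves $|x|$ unchanged for $|x|\ge a$. The slopes match ($\pm1$) at $x=\pm a$, so the corner is gone. The half-width $a=1/2$ and the helper name are illustrative assumptions.

```python
def moving_average(h, x, a, n=4000):
    """Average of h over [x - a, x + a] (convolution with a box of height 1/(2a)),
    computed with the midpoint rule."""
    dt = 2.0 * a / n
    return sum(h(x - a + (i + 0.5) * dt) for i in range(n)) * dt / (2.0 * a)

a = 0.5
for x in (-1.0, -0.25, 0.0, 0.25, 1.0):
    # Closed form of the average: a parabola near the corner, |x| farther away.
    rounded = (x * x + a * a) / (2.0 * a) if abs(x) <= a else abs(x)
    print(x, moving_average(abs, x, a), rounded)
```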