Why General Leibniz rule and Newton's Binomial are so similar?
The binomial expansion:
$$(x+y)^{n} = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$
The General Leibniz rule (used as a generalization of the product rule for derivatives):
$$(fg)^{(n)} = \sum_{k=0}^{n} \binom{n}{k} f^{(k)} g^{(n-k)}$$
Both formulas can be obtained simply by induction; Newton's binomial also has a combinatorial proof (here's the relevant wikipedia page). It's striking how these formulas are similar; is there a possible connection between them? I was thinking that maybe the General Leibniz rule could be obtained using a combinatorial argument as well (hence the binomial coefficients)...
Solution 1:
Here's one combinatorial way to look at both formulas.
For the first one, let $M$ be the operator which eats a polynomial $f(x,y)$ and returns the polynomial $(x+y)f(x,y)$. Note $M$ is linear, since multiplication is distributive. We want to start with $1$, apply $M$ $n$ times, and see what we get. The point is that $M$ sends any monomial $x^r y^s$ to a sum of two related ones: $$M : x^r y^s \to x^{r+1}y^s + x^r y^{s+1}.$$
Therefore, $(x+y)^n$ enumerates paths from the top of the following diagram to the bottom row; the coefficient of $x^k y^{n-k}$ is the number of paths from $1$ to $x^k y^{n-k}$, but that's $\binom nk$ since any path is a sequence of $n$ left/right choices, and you have to go left $k$ times and right $n-k$ times.
$$\matrix{ &&&&&&1\\ &&&&&\swarrow&&\searrow\\ &&&&x&&&&y\\ &&&\swarrow&&\searrow&&\swarrow&&\searrow\\ &&x^2&&&&xy&&&&y^2\\ &&&&\dots&&\dots&&\dots\\ &\swarrow&&\searrow&&\swarrow&&\searrow&&\swarrow&&\searrow\\ x^n&&&&x^{n-1}y&&\dots&&xy^{n-1}&&&&y^n }$$
For the second formula, $D$ is the derivative operator; we would like to apply it $n$ times to the product $f g$. Notice that $D$ acts nicely on $f^{(r)}g^{(s)}$: $$D : f^{(r)}g^{(s)}\to f^{(r+1)}g^{(s)}+f^{(r)}g^{(s+1)}.$$
So $(fg)^{(n)}$ enumerates paths from the top of the following diagram to the bottom row. $$\matrix{ &&&&&&f^{(0)}g^{(0)}\\ &&&&&\swarrow&&\searrow\\ &&&&f^{(1)}g^{(0)}&&&&f^{(0)}g^{(1)}\\ &&&\swarrow&&\searrow&&\swarrow&&\searrow\\ &&f^{(2)}g^{(0)}&&&&f^{(1)}g^{(1)}&&&&f^{(0)}g^{(2)}\\ &&&&\dots&&\dots&&\dots\\ &\swarrow&&\searrow&&\swarrow&&\searrow&&\swarrow&&\searrow\\ f^{(n)}g^{(0)}&&&&f^{(n-1)}g^{(1)}&&\dots&&f^{(1)}g^{(n-1)}&&&&f^{(0)}g^{(n)} }$$
Same graph, same numbers of paths, same coefficients.
Solution 2:
Suppose, we have two functions $ f(x) $ and $ g(x)$ , then the Taylor expansion of both around a point $ x= \alpha $ is of the form:
$$ f(x) = \sum_{k=0}^{\infty} a_k (x-\alpha)^k$$
$$ g(x) = \sum_{k=0}^{\infty} b_k (x-\alpha)^k$$
and, if we multiply the two series, we get another series of the form,
$$ f(x) \cdot g(x) = \sum_{k=0}^{\infty} c_k (x-\alpha)^k$$
where, $$ c_k = \sum_{i=0}^{k} a_k b_{k-i}$$
by the Cauchy product formula, and now, we bring in Taylor's theorem,
$$ a_i = \frac{f^{i} (a)}{i!} $$
and,
$$ b_{k-i} =\frac{ g^{k-i} (a) }{(k-i)!}$$
And,
$$ c_k = \sum_{i=0}^{k} \frac{f^{i} (a)}{i!} \frac{ g^{k-i} (a) }{(k-i)!}$$
Now, suppose, we take definition of $c_k$ from the definition of Taylor expansion, then,
$$ c_k = \frac{ (fg)^k }{k!}\bigg|_{a}$$
Equating the two definitions for $ c_k$
$$ \frac{ (fg)^k }{k!}\bigg|_{a} = \sum_{i=0}^{k} \frac{f^{i} (a)}{i!} \frac{ g^{k-i} (a) }{(k-i)!}$$
re-arranging
$$ (fg)^k\bigg|_{a} = \sum_{i=0}^{k} \binom{k}{i} f^{i} (a) g^{k-i} (a)$$
At this point, we can just replace '$a$' with '$x$' because we are not really evaluating anything using the definitions of functions yet, also, we replace '$k$' with '$n$' for aesthetic
$$ (fg)^n(x)= \sum_{i=0}^{n} \binom{n}{i} f^{i} (x) g^{n-i} (x)$$
Note: functions are assumed to be nice Note: exponents on functions mean derivatives not powers.
Solution 3:
Taylor series expansion of $f(x+h)$ $$ f(x+h)=f(x)+h\frac{d}{dx} \left(f(x) \right)+\frac{h^2 }{2!}\frac{d^2}{dx^2} \left( f(x) \right)+\frac{h^3 }{3!}\frac{d^3}{dx^3} \left( f(x) \right)+.... $$
Taylor series expansion of $g(x+h)$ $$ g(x+h)=g(x)+h\frac{d}{dx} \left(g(x) \right)+\frac{h^2 }{2!}\frac{d^2}{dx^2} \left( g(x) \right)+\frac{h^3 }{3!}\frac{d^3}{dx^3} \left( g(x) \right)+.... $$
Taylor series expansion of $f(x+h)g(x+h)$ $$ f(x+h)g(x+h)=f(x)g(x)+h\frac{d}{dx} \left(f(x)g(x) \right)+\frac{h^2 }{2!}\frac{d^2}{dx^2} \left( f(x)g(x) \right)+\frac{h^3 }{3!}\frac{d^3}{dx^3} \left( f(x)g(x) \right)+.... $$
$$ f(x+h)g(x+h)=f(x)g(x)+h\frac{d}{dx} \left(f(x)g(x) \right)+\frac{h^2 }{2!}\frac{d^2}{dx^2} \left( f(x)g(x) \right)+\frac{h^3 }{3!}\frac{d^3}{dx^3} \left( f(x)g(x) \right)+....=(f(x)+h\frac{d}{dx} \left(f(x) \right)+\frac{h^2 }{2!}\frac{d^2}{dx^2} \left( f(x) \right)+\frac{h^3 }{3!}\frac{d^3}{dx^3} \left( f(x) \right)+....)(g(x)+h\frac{d}{dx} \left(g(x) \right)+\frac{h^2 }{2!}\frac{d^2}{dx^2} \left( g(x) \right)+\frac{h^3 }{3!}\frac{d^3}{dx^3} \left( g(x) \right)+....) $$
If you order $h^n$, it will give you binom expansion
$$\frac{d^{n}}{dx^{n}} f(x)g(x)=\sum_{i=0}^n {n \choose i} f^{(i)}(x)g^{(n-i)}(x)$$
Because it has exactly the same coefficient of
$$ =(1+hx +\frac{h^2 x^2 }{2!}+\frac{h^3x^3 }{3!}+....)(1+hy +\frac{h^2 y^2 }{2!}+\frac{h^3y^3 }{3!}+....)=e^{hx}e^{hy}=e^{h(x+y)} $$
$$ e^{h(x+y)}=1+h(x+y) +\frac{h^2 (x+y)^2 }{2!}+\frac{h^3(x+y)^3 }{3!}+.... $$
If you order $h^n$ of $e^{hx}e^{hy}$ and equal to $h^n$ of $e^{h(x+y)}$ , it will give you binom expansion proof.
$$\frac{(x+y)^n }{n!}=\sum_{i=0}^n \frac{x^i y^{n-i}}{i! (n-i)!} $$
$$(x+y)^n=\sum_{i=0}^n \frac{n! x^i y^{n-i}}{i! (n-i)!} $$
$$(x+y)^n=\sum_{i=0}^n {n \choose i} x^i y^{n-i} $$
Thus it is also true $$\frac{d^{n}}{dx^{n}} f(x)g(x)=\sum_{i=0}^n {n \choose i} f^{(i)}(x)g^{(n-i)}(x)$$
Solution 4:
The main feature of the Leibniz law $f(x)g(x)' = f'(x)g(x) + f(x)g'(x)$ is that it turns a product into a sum. Another way to write it to show that more explicitly is
$$ \frac{d}{dx}\left(f(x)g(x)\right)=\left(\left.\frac{\partial}{\partial s}\right\rvert_{s,t=x}+\left.\frac{\partial}{\partial t}\right\rvert_{s,t=x}\right)\left(f(s)g(t)\right)=\frac{df}{dx}g(x)+f(x)\frac{dg}{dx} $$
or if you don't like the contrivance of introducing distinct dummy variables only to then sub them out for $x$ again, you could write with a tensor product like
$$ \frac{d}{dx}\left(f(x)\otimes g(x)\right) = \left(1\otimes \frac{d}{dx} + \frac{d}{dx}\otimes 1\right)\left(f(x)\otimes g(x)\right). $$
Either way we think of it, the point is that the derivative of a product is really a sum of two differential operators, applied to that product.
And how do you compute the $n$th power of a sum of two commuting variables? Why with the binomial theorem of course. The formula
$$ (x+y)^n = \sum_{i=0}^n{n \choose i}x^iy^{n-i} $$
makes sense for any formal commuting variables. So it makes sense for $\frac{\partial}{\partial s}$ and $\frac{\partial}{\partial t}$. So we have
$$ \left(\frac{\partial}{\partial s}+\frac{\partial}{\partial t}\right)^n = \sum_{i=0}^n{n \choose i}\frac{\partial^i}{\partial s^i}\frac{\partial^{n-i}}{\partial t^{n-i}} $$ which, when applied to the product $f(s)g(t)$ and evaluated at $s=t=x$, gives you the generalized Leibniz formula you desire. So we see that formula as just a special case of the binomial theorem. After all, the binomial theorem is just a bookkeeping of the combinatorics of applying the distributive law $n$ times. And differential operators distribute over addition, just as much as multiplication does. That is why an $n$th derivative product rule is just $n$ applications of the distributive law. The combinatorics of it is nicely explained in Tad's answer.
So the question here isn't why an $n$-fold derivative that's actually a sum has the same form as the $n$th power of a sum. That's now obvious: the former is just a special case of the latter. The question instead is: why is the derivative of a product a sum in the first place? What is the intuition behind the non-generalized first derivative Leibniz law? The graphic on wikipedia is helpful. This thread has some nice answers. I will offer this: a derivative is the linear response of a function to a small change of input. It makes sense that when you compute the linear response of a product of two functions, you end up with a product of sums, which you distribute. But mostly that's probably a topic for a separate question.
I will add that in physics and signal processing and other places where you spend your days in Fourier transforms, it is very natural to think of differentiation and multiplication by our independent variable as two sides of the same coin. Every function admits dual representation in configuration space and in momentum space. Multiplying in configuration space is the same as differentiating the momentum space representation, and vice versa. So of course $n$ applications of one is the same as $n$ applications of the other. I bet this point of view could be fleshed out into a complete answer to the question.