Orthogonality for Binomial Coefficients
Solution 1:
They don’t come from the binomial theorem: they come from the inclusion-exclusion principle. This is easier to see if you multiply them by $(-1)^n$ to get
$$\sum_{k=0}^n\binom{n}k(-1)^{n-k}k^r=\begin{cases} 0,&\text{if }r<n\\ n!,&\text{if }r=n\;. \end{cases}$$
The left-hand side counts surjections from $[r]=\{1,\ldots,r\}$ to $[n]=\{1,\ldots,n\}$. Of course this is $0$ when $r<n$ and $n!$ when $r=n$.
The left-hand side can be rewritten as follows:
$$\begin{align*} \sum_{k=0}^n\binom{n}k(-1)^{n-k}k^r&=\sum_{k=0}^n\binom{n}{n-k}(-1)^{n-k}k^r\\ &=\sum_{k=0}^n\binom{n}k(-1)^k(n-k)^r\\ &=n^r-\binom{n}1(n-1)^r+\binom{n}2(n-2)^r-+\ldots\;. \end{align*}$$
The first term, $n^r$, is the number of functions from $[r]$ to $[n]$. For each $k\in[n]$ there are $(n-1)^r$ functions from $[r]$ to $[n]\setminus\{k\}$, and there are $n$ possible choices for $k$; subtracting $\binom{n}1(n-1)^r$ throws out these non-surjective functions from $[r]$ to $[n]$. However, functions whose ranges miss (at least) two elements of $[n]$ get thrown out (at least) twice and have to be added back in, giving
$$n^r-\binom{n}1(n-1)^r+\binom{n}2(n-2)^r\;.$$
This is now an overcount, since functions whose ranges miss (at least) three elements of $[n]$ have now been counted once, removed three times, and recounted three times: on net they’ve been counted once and need to be thrown away again.
The inclusion-exclusion principle ensures that the full summation correctly accounts for everything and therefore really does give the number of surjections from $[r]$ to $[n]$.
Solution 2:
A Proof Using Binomial Theorem:
We prove by induction.
The binomial theorem says $$(x-1)^n = \sum_{k}{n\choose k}(-1)^{n-k} x^k.$$ Setting $x=1$ gives a proof for $r=0$. Suppose the statement is true for $<r$.
Suppose $r\leq n$. Take $r$th derivative of the formula above, we get $$\begin{eqnarray}n(n-1)\ldots (n-r+1)(x-1)^{n-r} &=& \sum_{k}{n\choose k}(-1)^{n-k}k(k-1)\ldots (k-r+1)x^{k-r}\\ &=& \sum_{k}{n\choose k}(-1)^{n-k}P(k)x^{k-r},\end{eqnarray}$$ where $P(k)=k^r+$ lower terms. Set $x=1$. By the induction hypothesis, the terms evaluated by the lower terms sum to $0$, and so RHS $=\sum_k{n\choose k}(-1)^{n-k}k^r$. If $r < n$, then LHS $=0$. If $r=n$, then LHS $=n!$. This completes the proof.
Remark: A slick way to prove it is to count the number of surjections from $[r]$ to $[n]$. By the inclusion-exclusion principle, we get the number of surjections equal to $$\sum_k {n\choose k}(-1)^{n-k}k^r.$$ However, when $r<n$, there are no surjections, whereas, when $r=n$, there are $n!$ many.
Solution 3:
Consider $\binom{k}{j}$ as a degree $k$ polynomial (combinatorial polynomial) in $k$: $$ \binom{k}{j}=\frac{k(k-1)(k-2)\cdots(k-j+1)}{j!}\tag{1} $$ It is not to difficult to see that we can write any polynomial of degree $m$ as sum of combinatorial polynomials of degree $m$ or less. In particular, we have $$ \newcommand{\stirtwo}[2]{\left\{{#1}\atop{#2}\right\}} k^m=\sum_{j=0}^mj!\stirtwo{m}{j}\binom{k}{j}\tag{2} $$ where $\stirtwo{m}{j}$ is a Stirling Number of the Second Kind.
Since $k^m$ can be written as a sum of combinatorial polynomials of degree $m$ or less, $$ n\gt m\implies\stirtwo{m}{n}=0\tag{3} $$ Furthermore, since the coefficient of $k^m$ in $m!\binom{k}{m}$ is $1$, $$ \stirtwo{m}{m}=1\tag{4} $$ Using $(2)$ in your sum yields $$ \begin{align} \sum_{k=0}^n\binom{n}{k}(-1)^kk^m &=\sum_{k=0}^n\binom{n}{k}(-1)^k\sum_{j=0}^mj!\stirtwo{m}{j}\binom{k}{j}\\ &=\sum_{j=0}^mj!\stirtwo{m}{j}\sum_{k=0}^n(-1)^k\binom{n}{k}\binom{k}{j}\\ &=\sum_{j=0}^mj!\stirtwo{m}{j}\sum_{k=0}^n(-1)^k\binom{n}{j}\binom{n-j}{k-j}\\ &=\sum_{j=0}^mj!\stirtwo{m}{j}\binom{n}{j}(-1)^j(1-1)^{n-j}\\ &=(-1)^nn!\stirtwo{m}{n}\tag{5} \end{align} $$ Equtions $(3)$, $(4)$, and $(5)$ give the results sought.