On finding polynomials that approximate a function and its derivative (extensions of Stone-Weierstrass?)
If we assume a little more smoothness, we can use a trick similar to the one in this question.
Let me take $n=2$ for concreteness. Suppose $f$ is $C^2$ on a neighborhood $U$ of $K$. With an appropriate cutoff function we can find a $C^2$, compactly supported $g$ which agrees with $f$ on a (possibly smaller) neighborhood of $K$. We can also find a large enough square $Q = (-M,M)$ which contains the support of $g$, so that $g$ and all its derivatives vanish on the boundary $\partial Q$.
Fix $\epsilon > 0$. The second partial $\partial_x \partial_y g$ is continuous, so we can find a polynomial $p_0$ with $|p_0 - \partial_x \partial_y g| < \epsilon$ on $\bar{Q}$. Set $$p(x,y) = \int_{-M}^x \int_{-M}^y p_0(s,t)\,dt\,ds.$$ Then by the fundamental theorem of calculus and Fubini's theorem we have, for any $(x,y) \in Q$, $$\begin{align*}|p(x,y) - g(x,y)| &= \left|\int_{-M}^x \int_{-M}^y p_0(s,t) - \partial_x \partial_y g(s,t)\,ds\,dt\right|\\ & \le \int_{-M}^x \int_{-M}^y |p_0(s,t) - \partial_x \partial_y g(s,t)|\,ds\,dt \\&\le (2M)^2 \epsilon \end{align*}$$ and $$\begin{align*}|\partial_x p(x,y) - \partial_x g(x,y)| &= \left| \int_{-M}^y p_0(s,t) - \partial_x \partial_y g(s,t)\,dt\right|\\ & \le \int_{-M}^y |p_0(s,t) - \partial_x \partial_y g(s,t)|\,dt \\&\le 2M \epsilon. \end{align*}$$ A similar argument works for $\partial_y$. Since $f=g$ on a neighborhood of $K$, we have that $p$ and $f$, as well as their derivatives, are uniformly close on a neighborhood of $K$.
In $\mathbb{R}^n$, this approach requires us to assume that $f$ is $C^n$.
By using a partition of unity, we may assume that $f$ has compact support. Let $\phi_m$ be the density of the normal vector with mean $\bf 0$ and covariance matrix $I/m$. Let $q_m$ be a Maclaurin polynomial approximating $\phi_m$ of sufficiently high degree that $\lim_{m \to \infty} \|\phi_m - q_m\|_{C(K)} = 0$, where $K$ is the Minkowski difference of where you want the approximation to hold and the support of $f$. Now let $p_m := q_m * f$. These $p_m$ do the job.
Remark: Nachbin has a generalization of the Stone-Weierstrass theorem to include approximating derivatives, even on manifolds. See MR0030590: Nachbin, Leopoldo, Sur les algèbres denses de fonctions différentiables sur une variété. C. R. Acad. Sci. Paris 228, (1949), 1549–1551.
I published a constructive proof of this back in 2008 for the $n$-variate case. Continuously differentiable is all that is required. See Theorem 5 in
Peet, Matthew M., Exponentially stable nonlinear systems have polynomial Lyapunov functions on bounded regions, IEEE Trans. Autom. Control 54, No. 5, 979-987 (2009). ZBL1367.93442.
Arxiv link here.
The following theorem combines Lemmas 3 and 4 with the Weierstrass approximation theorem. It says that the polynomials are dense in $\mathcal{C}_{\infty}^{1}$ with respect to the Sobolev norm for $W^{1, \infty},$ among others.
Theorem 5. Suppose $v \in \mathcal{C}_{\infty}^{1}(B) .$ Then for any $\epsilon>0,$ there exists a polynomial $p,$ such that $$ \max _{\alpha \in Z^{n}}\left\|D^{\alpha} p-D^{\alpha} v\right\|_{\infty} \leq \epsilon. $$
Here, $\mathcal{C}_{\infty}^{i}(\Omega):=\left\{f: D^{\alpha} f \in \mathcal{C}(\Omega) \text{ for any } \alpha \in \mathbb{N}^{n} \text{ such that } \|\alpha\|_{\infty}=\max _{j} \alpha_{j} \leq i \right\}$.
Summary of the construction from the paper using the multinomial notation:
The difficulty of this problem is a consequence of the fact that, for a given function, not all the partial derivatives of that function are independent. For example, we have the identities \begin{align*} &D^{(1,1,0)}f(x,y,z)=\\ &D^{(1,1,0)}\left(f(x,y,0) + \int_{0}^z D^{(0,0,1)}f(x,y,s)\,ds\right)\\ &=D^{(1,0,0)}\left(D^{(0,1,0)}f(x,y,z)\right), \end{align*} among others. Therefore, given approximations to the partial derivatives of a function, these approximations will, in general, not be the partial derivatives of any function. The problem becomes, for each partial derivative approximation, how to extract the information which is unique to that partial derivative in order to form an approximation to the original function. The following construction shows how this can be done.
In the paper, there then is a rather general formula for the construction, K, which maps the essential information from all the partial derivatives to the original function. However, this can be better understood using examples.
Basically, to approximate the original function, p, approximate each partial derivative, $q_\alpha = D^\alpha p$ with a polynomial, $\tilde q_\alpha$. Then construct the polynomial as
Example: If $p=K(\{q_\alpha\}_{\alpha \in Z^2})$, then \begin{align*} p(x_1,x_2)&=\int_0^{x_1} \int_0^{x_2} q_{(1,1)}(s_1,s_2)\,ds_2\, ds_1\\ &+\int_0^{x_1} q_{(1,0)}(s_1,0)\,ds_1\\ &+\int_0^{x_2} q_{(0,1)}(0,s_2)\,ds_2\\ &+q_{(0,0)}(0,0). \end{align*}
If $n=3$, then \begin{align*} p(x_1,x_2,x_3)&=\int_0^{x_1} \int_0^{x_2} \int_0^{x_3} q_{(1,1,1)}(s_1,s_2,s_3)\,ds_3\, ds_2\, d s_1\\ &+\int_0^{x_1} \int_0^{x_2} q_{(1,1,0)}(s_1,s_2,0)\,ds_2\, ds_1\\ &+\int_0^{x_1} \int_0^{x_3} q_{(1,0,1)}(s_1,0,s_3)\,ds_3\, d s_1\\ &+ \int_0^{x_2} \int_0^{x_3} q_{(0,1,1)}(0,s_2,s_3)\, ds_3\, d s_2\\ &+\int_0^{x_1} q_{(1,0,0)}(s_1,0,0)\,ds_1\\ &+\int_0^{x_2} q_{(0,1,0)}(0,s_2,0)\,ds_2\\ &+\int_0^{x_3}q_{(0,0,1)}(0,0,s_3)\,d s_3\\ &+q_{(0,0,0)}(0,0,0). \end{align*}
Notice that this structure gives a natural way of approximating the function $p$ and all of its partial derivatives. For example, if $p$ is defined as above for $n=2$, then \begin{equation} \frac{\partial }{\partial x_1}p(x_1,x_2)=\int_0^{x_2} q_{(1,1)}(x_1,s_2)\, ds_2 + q_{(1,0)}(x_1,0). \end{equation} Now, if one approximates $q_\alpha$ by $\tilde q_\alpha$ and uses the construction to create $\tilde p$, then \begin{equation} \frac{\partial }{\partial x_1}\tilde p(x_1,x_2)=\int_0^{x_2} \tilde q_{(1,1)}(x_1,s_2)\, ds_2 + \tilde q_{(1,0)}(x_1,0). \end{equation}
Since the integral is a bounded operator, and since $q_\alpha \approx \tilde q_\alpha$, then $\frac{\partial }{\partial x_1}\tilde p \approx \frac{\partial }{\partial x_1}p$.