Pushforward of Lie Bracket
Solution 1:
This is much simpler in coordinate-independent form. A diffeomorphism $ f : M \rightarrow N $ induces a map $ f_* : \mathcal{X}(M) \rightarrow \mathcal{X}(N) $ which, for each $p\in M $, sends the tangent space $T_p(M)$ to $T_{f(p)}(N) $. It is defined by requiring, for all $ g \in C^\infty(N) $, that $f_* (X)(g)(f(p)) = X(g\circ f)(p)$, that is, $ f_*(X)(g)\circ f = X(g\circ f ) $. Thus for $X,Y \in \mathcal{X}(M) $ and all $ g \in C^\infty(N) $ we have \begin{align*} f_*[X,Y]_{f(p)}(g) & = [X,Y]_p(g\circ f) \\ & = X_p(Y(g\circ f))-Y_p(X(g\circ f)) \\ & = X_p(f_*(Y)(g)\circ f) - Y_p(f_*(X)(g)\circ f) \\ & = f_*(X)_{f(p)}(f_*(Y)(g))-f_*(Y)_{f(p)}(f_*(X)(g)) \\ & = [f_*(X),f_*(Y)]_{f(p)} (g) \end{align*} Since $f$ is surjective, every point of $N$ has the form $f(p)$, hence $ f_*[X,Y] = [f_*(X),f_*(Y)] $.
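As a quick sanity check of the identity, here is a small worked example. The choices $M=\mathbb{R}$, $N=(0,\infty)$, $f(x)=e^x$, $X=\partial_x$ and $Y=x\,\partial_x$ are assumptions made purely for illustration and are not part of the answer above. Writing $y=e^x$, a field $a(x)\,\partial_x$ is pushed forward to $a(\ln y)\,y\,\partial_y$, so \begin{align*} [X,Y] & = \left(1\cdot\frac{d}{dx}(x) - x\cdot\frac{d}{dx}(1)\right)\partial_x = \partial_x, \qquad f_*[X,Y] = y\,\partial_y \\ f_*(X) & = y\,\partial_y, \qquad f_*(Y) = y\ln y\,\partial_y \\ [f_*(X),f_*(Y)] & = \left(y\cdot\frac{d}{dy}(y\ln y) - y\ln y\cdot\frac{d}{dy}(y)\right)\partial_y \\ & = \left(y(\ln y+1) - y\ln y\right)\partial_y = y\,\partial_y \end{align*} so both sides agree in this case.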
Solution 2:
To do this computation in coordinates, without explicitly writing functions and points, you have to adopt the physicist's way of writing things, which is messy and unpleasant :-) However, the chain rule takes care of all the evaluation matters.
Let $f$ map $x$ to $y$. We denote the Jacobian and the inverse Jacobian by $\frac{\partial y^j}{\partial x^i}$ and $\frac{\partial x^j}{\partial y^i}$, respectively.
We will write $\tilde{Z}=f^*Z$ and $\tilde{W}=f^*W$ for the fields transported to the $y$ coordinates (this is the pushforward $f_*$ of Solution 1), so the "components change as"
$Z^j = \tilde{Z}^i\frac{\partial x^j}{\partial y^i}$
$W^j = \tilde{W}^i\frac{\partial x^j}{\partial y^i}$
(here we really view $Z$ as the pushforward of $\tilde{Z}$ by the inverse map, and likewise for $W$)
Consider $f^*[Z,W]$
$=\left(Z^i\frac{\partial}{\partial x^i}(W^j) - W^i\frac{\partial}{\partial x^i}(Z^j)\right)\frac{\partial y^l}{\partial x^j}\frac{\partial}{\partial y^l}$
$=\left(\tilde{Z}^k\frac{\partial x^i}{\partial y^k}\frac{\partial}{\partial x^i}\left(\tilde{W}^m\frac{\partial x^j}{\partial y^m}\right) - \tilde{W}^k\frac{\partial x^i}{\partial y^k}\frac{\partial}{\partial x^i}\left(\tilde{Z}^m\frac{\partial x^j}{\partial y^m}\right)\right)\frac{\partial y^l}{\partial x^j}\frac{\partial}{\partial y^l}$
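By the chain rule, $\frac{\partial x^i}{\partial y^k}\frac{\partial}{\partial x^i} = \frac{\partial}{\partial y^k}$. After applying the product rule, the mixed (second) derivative terms are of the form
$\tilde{Z}^k\tilde{W}^m\frac{\partial}{\partial y^k}\frac{\partial x^j}{\partial y^m}-\tilde{W}^k\tilde{Z}^m\frac{\partial}{\partial y^k}\frac{\partial x^j}{\partial y^m} = 0$
which cancel because second partial derivatives are symmetric in $k$ and $m$. Using $\frac{\partial x^j}{\partial y^m}\frac{\partial y^l}{\partial x^j} = \delta^l_m$, the remaining terms give
$=\left(\tilde{Z}^k\frac{\partial}{\partial y^k}(\tilde{W}^m) - \tilde{W}^k\frac{\partial}{\partial y^k}(\tilde{Z}^m)\right)\delta^l_m\frac{\partial}{\partial y^l}$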
which is $[f^*Z,f^*W]$. The trick is to use the chain rule whenever you need the derivation to be compatible with the function it is applied to. The intermediate expressions may not be completely sensible (they are just formal expressions that show the steps), but the derivations are. This works only for diffeomorphisms because the component change rules given above are just the pushforward written in coordinates, and they involve the inverse Jacobian, which is what allows the chain rule to be used at those steps.
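To see the coordinate formula at work, one can try polar coordinates. The map, the fields and the symbol $\rho$ below are illustrative assumptions and do not come from the answer above. Take $x=(r,\theta)$ with $r>0$, $0<\theta<2\pi$, and $y=(y^1,y^2)=(r\cos\theta, r\sin\theta)$, which is a diffeomorphism onto its image. Let $Z=\frac{\partial}{\partial r}$ and $W=\frac{\partial}{\partial\theta}$, so that $[Z,W]=0$, and write $\rho=\sqrt{(y^1)^2+(y^2)^2}$. The Jacobian gives \begin{align*} \tilde{Z} & = \frac{y^1}{\rho}\frac{\partial}{\partial y^1} + \frac{y^2}{\rho}\frac{\partial}{\partial y^2}, \qquad \tilde{W} = -y^2\frac{\partial}{\partial y^1} + y^1\frac{\partial}{\partial y^2}, \qquad \tilde{W}^k\frac{\partial\rho}{\partial y^k} = \frac{-y^2y^1+y^1y^2}{\rho} = 0 \\ [\tilde{Z},\tilde{W}]^1 & = \tilde{Z}^k\frac{\partial}{\partial y^k}(-y^2) - \tilde{W}^k\frac{\partial}{\partial y^k}\left(\frac{y^1}{\rho}\right) = -\frac{y^2}{\rho} - \left(-\frac{y^2}{\rho}\right) = 0 \\ [\tilde{Z},\tilde{W}]^2 & = \tilde{Z}^k\frac{\partial}{\partial y^k}(y^1) - \tilde{W}^k\frac{\partial}{\partial y^k}\left(\frac{y^2}{\rho}\right) = \frac{y^1}{\rho} - \frac{y^1}{\rho} = 0 \end{align*} so the bracket of the transported fields vanishes, in agreement with the transport of $[Z,W]=0$.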
Solution 3:
Maybe the following plausibility argument will also help: $[X,Y]$ is the Lie derivative of $Y$ in the direction $X$, which can be defined by a limiting process using the one-parameter group (flow) obtained by integrating $X$; see e.g. [F. Warner, Foundations of Differentiable Manifolds and Lie Groups, 1971, p. 69]. By the uniqueness of solutions of differential equations, a diffeomorphism $f$ carries the flow of $X$ to the flow of $f_*X$, so the compatibility of Lie derivatives with diffeomorphisms should follow.
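A sketch of how this argument might be written out, under assumptions not stated above: $\varphi_t$ denotes the flow of $X$, $\psi_t = f\circ\varphi_t\circ f^{-1}$, and the sign convention is $[X,Y]_p = \frac{d}{dt}\big|_{t=0}\, d\varphi_{-t}\left(Y_{\varphi_t(p)}\right)$. Differentiating $\psi_t(q)$ in $t$ gives $(f_*X)_{\psi_t(q)}$, so by uniqueness $\psi_t$ is the flow of $f_*X$. Since $f\circ\varphi_{-t} = \psi_{-t}\circ f$ and $df_p$ is linear, \begin{align*} f_*[X,Y]_{f(p)} & = df_p\left(\frac{d}{dt}\Big|_{t=0}\, d\varphi_{-t}\left(Y_{\varphi_t(p)}\right)\right) \\ & = \frac{d}{dt}\Big|_{t=0}\, d(f\circ\varphi_{-t})\left(Y_{\varphi_t(p)}\right) \\ & = \frac{d}{dt}\Big|_{t=0}\, d\psi_{-t}\left(df\left(Y_{\varphi_t(p)}\right)\right) \\ & = \frac{d}{dt}\Big|_{t=0}\, d\psi_{-t}\left((f_*Y)_{\psi_t(f(p))}\right) \\ & = [f_*X,f_*Y]_{f(p)} \end{align*} where the fourth line uses $f(\varphi_t(p)) = \psi_t(f(p))$.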