Substituting for Taylor series
Def: A function $f$ is $o(x^n)$ for $x \to 0$ if $$\lim_{x \to 0} \frac{f(x)}{x^n} = 0$$
If $f$ is $o(x^n)$ for $x \to 0$, and if $m < n$, then $f$ is $o(x^m)$ too.
There is a theorem, due to Peano, that if $f(x) = P(x) + o(x^n)$, for $P(x)$ a polynomial of degree $n$, then $P(x)$ is the Taylor polynomial of order $n$ of $f$, around zero. Clearly the definition I gave right at the beginning can be generalized for another center, $x_0$.
I'll find the Taylor expansion around zero of $\ln (1 + x^2)$, of order $6$. It is known that: $$\ln (1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} + o(x^3)$$
Using $x^2$ instead of $x$, we get: $$\ln (1 + x^2) = x^2 - \frac{x^4}{2} + \frac{x^6}{3} + o(x^6)$$ Since the remainder is $o(x^6)$, that is the polynomial desired, for Peano's theorem. Bigger examples might require that some terms are "swallowed" by the remainder $o(x^n)$, if the degree of said terms gets more than $n$. See that $o(x^n)$ is a notation, so things like $o(x^n) + o(x^n) = o(x^n)$ shouldn't bother you. At the end of the day, the $o(x^n)$ you have is not the same you began with. For example, $x^4 + x^5 + o(x^3) = o(x^3)$.
So, it is not simple as just making substitutions, you always have to pay attention to the remainder, or else you might get a polynomial, that is not the best approximation.
Intuitively, the taylor series of $f$, denoted by $T_f$, evaluated at $x$ is equal to the function $f$ evaluated at $x$. $$T_f(x)=f(x)$$ Thus evaluating it at some point $x=g(t)$ yields $$T_f(g(t))=f(g(t))\tag{1}\label{1}$$ which is what we wanted. We will treat this intuitive conclusion as our hypothesis and prove it in a way that addresses your concerns regrading chain rule.
As a preliminary, note that due to the operator
$$e^{h\partial}=\sum\limits_{n=0}^{\infty}\frac{(h\partial)^n}{n!}\tag{2}\label{eq:1}$$ where $\partial=\frac{d}{dx}$, taylor series expansion is an operator on functions. We will use it to prove the hypothesis $$e^{h\partial}(f\circ g)=e^{h\partial}(f)\circ g\tag{3}\label{hypo}$$ meaning that the taylor series of $f\circ g$ is equivalent to composing the taylor series of $f$ with the function $g$, namely $T_{f\circ g}(x)=T_f(g(x))$.
Taylor series of compositions is expressed as
$$e^{h\partial}(f\circ g)=\exp(\partial_fe^{h\partial_g})(g,f)\tag{4}\label{eq:2}$$
where $\exp(\partial_fe^{h\partial_g})$ is an operator on pairs of maps $(g,f)$, where $\partial_g$ is applied to $g,$ and $\partial_f$ to $f$.
The proof of $\eqref{eq:2}$ can be found here [Theorem 5.6.] and it contains the answer to your question. The lines of the proof pertaining to your inquiry regrading chain rule are
$$\exp(\partial_fe^{h\partial_g})=\exp\left(\partial_f\sum\limits_{i=0}^{\infty}\frac{(h\partial_g)^i}{i!}\right)=\prod_{i=1}^{\infty}e^{\partial_f\frac{(h\partial_g)^i}{i!}}\Big(e^{\partial_f}\Big)$$ $$\implies$$ $$\exp(\partial_fe^{h\partial_g})(g,f)=\sum\limits_{\forall_n}\frac{h^n}{n!}\sum\limits_{\lambda(n)}n!\prod\limits_{k\cdot l\in\lambda}\Big(\frac{\partial_f\partial_g^l(g)}{l!}\Big)^k\frac{1}{k!}\Big(\Big(e^{\partial_f}\Big)f\Big)$$ where $\lambda(n)$ stands for the partitions of $n$. Taking into consideration the fact that $e^{\partial_f}(f)$ evaluated at a point $x$ is the same as evaluating $f$ at $x$, we see
$$\exp(\partial_fe^{h\partial_g})(g,f)=\sum\limits_{\forall_n}\frac{h^n}{n!}\sum\limits_{\lambda(n)}n!\prod\limits_{k\cdot l\in\lambda}\Big(\frac{\partial_f\partial_g^l(g)}{l!}\Big)^k\frac{1}{k!}\Big(f(g)\Big)$$
which is percisely the taylor series expansion of a composition (noting Faà di Bruno's formula in the innermost sum, which resolves your issue with higher order chain rule). Thus
$$e^{h\partial}(f\circ g)=e^{h\partial}(f)\circ e^{h\partial}(g)$$
meaning that taylor series of $f\circ g$ is equivalent to composing the taylor series of $f$ with the taylor series of $g$. Again, noting that $e^{\partial_g}(g)$ evaluated at a point $x$ is the same as evaluating $g$ at $x$, we see $$e^{h\partial}(f)\circ e^{h\partial}(g)=e^{h\partial}(f)\circ g$$
$$\implies$$
$$e^{h\partial}(f\circ g)=e^{h\partial}(f)\circ g$$
meaning that the taylor series of $f\circ g$ is equivalent to composing the taylor series of $f$ with the function $g$, proving \eqref{hypo}, as inquired in the question.
If you have $\displaystyle f(x)=\sum_{n=0}^\infty a_n x^n$ for any $x\in (-r,r)$ then you have the same equality for $f(x^2), x\in(-\sqrt{r},\sqrt{r}),$ for $f(\sqrt{x}) ,x\in(-r^2,r^2),$ $\cdots$ It is just a change of the name of the variable. The only important thing is that the new variable belongs to the domain where equality holds.
Of course, you can get the Taylor series of $f(x^2)$ or $f(\sqrt{x})$ (if it exists, in the second case), but if you know the Taylor series of $f(x)$ is just to make a substitution.
We know that for any real number $x,$
$$e^x=\sum_{n=0}^{\infty}\frac{x^n}{n!}.$$
Now if $x$ is a real number, then so is $x^2!$ It follows that
$$\tag 1 e^{x^2}=\sum_{n=0}^{\infty}\frac{(x^2)^n}{n!} = \sum_{n=0}^{\infty}\frac{x^{2n}}{n!}.$$
We didn't use the word "substitution" or say "let $u$ equal …" anywhere. That's all that is going on here.
How do we know $(1)$ is the Taylor series of $e^{x^2}$ at $0?$ That follows from this theorem: Suppose $f(x)=\sum_{n=0}^{\infty} a_nx^n,$ where the power series converges in $(-r,r)$ for some $r>0.$ Then $f$ is infinitely differentiable in $(-r,r),$ and $a_n = D^nf(0)/n!$ for all $n.$
Now to your comment on this easy approach vs. the long awful slog of computing the Taylor coefficients of $e^{x^2}$ by hand: I'm not sure why you're bothered by this. It is just one of many examples of a problem that has two solutions, one simple and direct, the other long, difficult, and possibly tedious. Happens all the time.