Numerical Approximation of Differential Equations with Midpoint Method
Solution 1:
There is too much going on; keep it simple so you don't get lost. Just look at the Taylor expansion about the midpoint. Since one point lies above the midpoint and one below, each at a distance of $\frac{\Delta t}{2}$, you get
$$ u(t+\Delta t) = u(t+\frac{\Delta t}{2}) + (\frac{\Delta t}{2})u'(t+ \frac{\Delta t}{2}) + \frac12(\frac{\Delta t}{2})^2 u''(t + \frac{\Delta t}{2}) + \mathcal{O}(\Delta t)^3 $$
Here I use the shorthand $f(t)=f(t,u(t))$; plugging in $u' = f$ (and hence $u'' = f'$, the total derivative along the solution) you get
$$u(t+\Delta t) = u(t+\frac{\Delta t}{2}) + (\frac{\Delta t}{2})f(t+ \frac{\Delta t}{2}) + \frac12(\frac{\Delta t}{2})^2 f'(t + \frac{\Delta t}{2}) + \mathcal{O}(\Delta t)^3 $$
Call that equation 1. The same steps for the other side give
$$u(t) = u(t+\frac{\Delta t}{2}) - (\frac{\Delta t}{2})f(t + \frac{\Delta t}{2}) + \frac12(\frac{\Delta t}{2})^2 f'(t + \frac{\Delta t}{2}) + \mathcal{O}(\Delta t)^3 $$
Call that equation 2. This was the only hard part. Now subtract equation 2 from equation 1 to get equation 3:
$$u(t+\Delta t) - u(t) = (\Delta t) f(t + \frac{\Delta t}{2}) + \mathcal{O}(\Delta t)^3 $$
What this tells us is that if you know the solution at the midpoint and use it to calculate the next step, then you have an error of $\mathcal{O}(\Delta t)^3$, which is what we want. But how do we find $u$ at the midpoint, i.e. $u(t+\frac{\Delta t}{2})$? Add equation 1 to equation 2 and divide by 2:
$$\frac{u(t+\Delta t) +u(t)}{2} = u(t+\frac{\Delta t}{2}) + \mathcal{O}(\Delta t)^2$$
Notice the $\mathcal{O}(\Delta t)$ term cancelled. Plugging this into $f$ as our approximation for $u$ at the midpoint, we get
$$ f(t+\frac{\Delta t}{2},u(t+\frac{\Delta t}{2})) = f(t+\frac{\Delta t}{2},\frac{u(t+\Delta t) +u(t)}{2}) + \mathcal{O}(\Delta t)^2 $$
Notice that this is a second-order approximation to $f$ at the midpoint, but since $f$ is multiplied by $\Delta t$, the resulting error is of order 3; substituting into equation 3 gives:
$$u(t+\Delta t) - u(t) = (\Delta t) f(t+\frac{\Delta t}{2},\frac{u(t+\Delta t) +u(t)}{2}) + \mathcal{O}(\Delta t)^3 $$
which is the midpoint method together with its error term. I hope this illuminates "why" it works: expanding about the center makes the order-1 terms flip sign and cancel. This trick is used a lot in numerical methods.
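The scheme in the last equation is implicit, since $u(t+\Delta t)$ appears inside $f$. As a sanity check of the claimed order, here is a minimal Python sketch (the test problem $u'=-u$ and the fixed-point solver are my own choices, not part of the derivation) showing that halving $\Delta t$ quarters the global error, as a local $\mathcal{O}(\Delta t)^3$ error predicts:

```python
import math

def f(t, u):
    # test problem (an assumption): u' = -u, exact solution u(t) = e^{-t}
    return -u

def implicit_midpoint(f, u0, t0, t1, n):
    """Integrate u' = f(t, u) using equation 3 as a scheme:
    u_next = u + dt * f(t + dt/2, (u + u_next)/2),
    solving the implicit equation by fixed-point iteration."""
    dt = (t1 - t0) / n
    t, u = t0, u0
    for _ in range(n):
        v = u                      # initial guess for u_next
        for _ in range(50):        # fixed-point iteration; contracts for small dt
            v = u + dt * f(t + dt / 2, (u + v) / 2)
        t, u = t + dt, v
    return u

exact = math.exp(-1.0)
err_coarse = abs(implicit_midpoint(f, 1.0, 0.0, 1.0, 50) - exact)
err_fine = abs(implicit_midpoint(f, 1.0, 0.0, 1.0, 100) - exact)
print(err_coarse / err_fine)       # ≈ 4: local O(dt^3) error gives global order 2
```

The local error of order 3 accumulates over the $1/\Delta t$ steps, leaving a global error of order 2, hence the ratio of about 4.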
Solution 2:
It is easiest to do this for an autonomous ODE, and then only for the scalar case; the error terms for the other cases have a similar structure. Using $u'(t)=f(u(t))$, and thus $u''(t)=f'(u(t))u'(t)=f'(u(t))f(u(t))$ etc., gives
\begin{align} u(t+h)&=u(t)+u'(t)·h+\frac12·u''(t)·h^2+\frac16·u'''(t)·h^3+\frac1{24}·u^{(4)}(t)·h^4+… \\ &=u(t)+f·h+\frac12·f'·f·h^2+\frac16·(f''·f^2+f'^2·f)·h^3 \\ &\qquad\qquad+\frac1{24}·(f'''·f^3+4·f''f'f^2+f'^3·f)·h^4+… \end{align}
and for the improved Euler or midpoint method at $u=u(t)$
\begin{align} u_+&=u+h·f\left(u+\frac h2·f\right) \\ &=u+h·\left(f+f'·\frac h2·f+\frac12·f''·\left(\frac h2·f\right)^2+\frac16·f'''·\left(\frac h2·f\right)^3\right)+… \\ &=u+f·h+\frac12·f'·f·h^2+\frac18·f''·f^2·h^3+\frac1{48}·f'''·f^3·h^4+… \end{align}
where the coefficients first differ at the $h^3$ term.
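The $h^3$ discrepancy can be seen numerically: since the expansions agree through $h^2$, one step of the midpoint method has local error $\mathcal{O}(h^3)$, so halving $h$ should divide the one-step error by about $2^3=8$. A small Python check (the test equation $u'=u^2$ with exact solution $1/(1-t)$ is my own choice):

```python
def f(u):
    # autonomous test ODE (an assumption): u' = u^2, u(0) = 1, exact u(t) = 1/(1 - t)
    return u * u

def midpoint_step(u, h):
    # one explicit midpoint step: u_+ = u + h*f(u + (h/2)*f(u))
    return u + h * f(u + h / 2 * f(u))

def local_error(h):
    # error of a single step of size h against the exact solution
    return abs(midpoint_step(1.0, h) - 1.0 / (1.0 - h))

ratio = local_error(1e-2) / local_error(5e-3)
print(ratio)                       # ≈ 8: the one-step error is O(h^3)
```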
See also the (Heun-)Ralston method. https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods#Usage
For a comprehensive methodology for treating the vector case, see Butcher's rooted trees, as presented in the introduction here
In the vector case, the terms in the expansion of the midpoint step all have the form $f^{(k)}(u)[f(u),…,f(u)]$ with $k$ repetitions of $f(u)$. The terms in the Taylor series get more structure, since the derivatives are now tensors: $f'$ is a linear operator, $f''$ a vector-valued symmetric 2-form, $f'''$ a vector-valued symmetric 3-form, etc.
\begin{align} u'&=f(u)\\ u''&=f'(u)[f(u)]\\ u'''&=f''(u)[f(u),f(u)]+f'(u)[f'(u)[f(u)]]\\ u^{(4)}&=f'''(u)[f(u),f(u),f(u)]+3·f''(u)[f'(u)[f(u)],f(u)]+f'(u)[f''(u)[f(u),f(u)]]+f'(u)[f'(u)[f'(u)[f(u)]]] \end{align}
From this one can see why rooted trees are a good way to describe the individual terms.
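A tiny Python illustration of the first identity, $u''=f'(u)[f(u)]$ (the example system is my own): for the linear field $f(u)=(u_2,-u_1)$, i.e. the harmonic oscillator written as a first-order system, it reduces to the familiar $u''=-u$:

```python
def f(u):
    # example vector field (an assumption): f(u1, u2) = (u2, -u1)
    return (u[1], -u[0])

def df(u, v):
    # f'(u)[v]: for this linear f the Jacobian is the constant matrix [[0, 1], [-1, 0]]
    return (v[1], -v[0])

u = (0.8, -0.3)
upp = df(u, f(u))                  # u'' = f'(u)[f(u)]
print(upp)                         # (-0.8, 0.3), i.e. u'' = -u as expected
```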
As to the last point of the question, this is a Taylor expansion that IMO hides more of the structure of the problem than it helps to expose. The structure is $$ u_+=u+h·f(u+h·a,t+h·b),\text{ where } a=\frac12·f(u,t)\text{ and }b=\frac12 $$ For the error analysis, the first $f$ gets replaced by its Taylor expansion around $(u,t)$; the expansion is in the increments $h·a$ and $h·b$, while $u$, $t$ and thus $f(u,t)$ are held constant. Thus $$ u_+=u+h·\Biggl(f(u,t)+h·\Bigl(f_u(u,t)·a+f_t(u,t)·b\Bigr)\\ +\frac{h^2}{2}·\Bigl(f_{uu}(u,t)·a^2+2f_{ut}(u,t)·ab+f_{tt}(u,t)·b^2\Bigr)\\ +\frac{h^3}6·\Bigl( f_{uuu}(u,t)·a^3+3f_{uut}(u,t)·a^2b+3f_{utt}(u,t)·ab^2+f_{ttt}(u,t)·b^3\Bigr) +…\Biggr) $$ Now insert the expressions for $a$ and $b$ to get the full form of the expansion.
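This expansion can also be checked numerically; in the Python sketch below the concrete $f(u,t)=e^u\cos t$ and the base point are my own choices. Truncating the bracket after the $h^2$-order terms leaves a remainder of size $\mathcal{O}(h^4)$ in $u_+$, so halving $h$ should shrink it by about $2^4=16$:

```python
import math

def F(u, t):
    # concrete stand-in for f(u, t) (an assumption); any smooth function works
    return math.exp(u) * math.cos(t)

u0, t0 = 0.3, 0.7                  # arbitrary base point (an assumption)
F0 = F(u0, t0)
Fu = math.exp(u0) * math.cos(t0)   # partial derivatives of F at (u0, t0), by hand
Ft = -math.exp(u0) * math.sin(t0)
Fuu, Fut, Ftt = Fu, Ft, -F0        # second partials of e^u cos t

a, b = F0 / 2, 0.5                 # a = f(u,t)/2 and b = 1/2, as above

def remainder(h):
    exact = h * F(u0 + h * a, t0 + h * b)
    truncated = h * (F0 + h * (Fu * a + Ft * b)
                     + h**2 / 2 * (Fuu * a * a + 2 * Fut * a * b + Ftt * b * b))
    return abs(exact - truncated)

ratio = remainder(1e-2) / remainder(5e-3)
print(ratio)                       # ≈ 16: the first neglected term is O(h^4)
```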