Intuition of Picard Iteration
Solution 1:
Before we begin, I would like to draw your attention to three points.
( i ) The proofs we find in modern books are not the original ones discovered by Picard and Lindelöf but refined versions of them. Back in 1894 they had a different mindset, different intuitions and much less readily available knowledge. Furthermore, the branch of functional analysis was in its early years, Lipschitz continuity was not yet known as Lipschitz continuity (in fact, Lipschitz was still alive) and Banach was just a baby.
( ii ) Picard iteration is proposed in chapter XI, section III, of (E. Picard) Traité d'Analyse II as a method of successive approximations to show the existence (and implicitly the uniqueness) of solutions, as an alternative to the Cauchy-Lipschitz method, which is studied in detail in subchapters XI.I and XI.II. The content discussed in these two sections is very likely what motivated his own method; particularly XI.II, entitled "On an important property of the Cauchy-Lipschitz method". (Nowadays the Cauchy-Lipschitz theorem and the Picard-Lindelöf theorem refer to the same theorem.)
( iii ) The kind of intuition you should expect is the more abstract and technically refined intuition found throughout functional analysis, since the argument used in the Picard-Lindelöf theorem is essentially a functional analysis argument. This kind of intuition does suggest "ad hoc conditions" in some situations, but keep in mind that deep down they are very well educated guesses.
That said, let's discuss a plausible modern path to crafting the uniqueness proof, with the intention that at the right moment Lipschitzianity will look evident. We start with an open set $\mathcal{O} \subseteq \mathbb{R}^n$, an open interval $I \subseteq \mathbb{R}$, $ f: \mathcal{O} \times I \to \mathbb{R}^n $ continuous, $x : I \to \mathbb{R}^n$ differentiable, the problem in its differential form $$\begin{cases} \dot{x}(t) = f(x(t), t) \\ x(t_0) = x_0 \end{cases} \tag{1}$$
and its equivalent integral form $$ x(t) = x_0 + \int_{t_0}^t f(x(s), s)\ ds \tag{2}$$
(According to the hypotheses, there exists a local solution.)
Someone acquainted with functional spaces and maps between them is able to note that the RHS of $(2)$ defines one of the latter, namely $$ \mathcal{F} : X \to Y : x \mapsto \left(I \to \mathbb{R}^n : t \mapsto x_0 + \int_{t_0}^t f(x(s), s)\ ds \right) $$
where $X$ and $Y$ are suitable functional spaces. Thus $(2)$ is simply $$ x = \mathcal{F}(x) \tag{3} $$
This means that the function $x$ is a fixed point of the higher-order function $\mathcal{F}$. (Points in functional spaces are functions.)
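To make the "higher-order function" picture concrete, here is a minimal numerical sketch (the names, the trapezoidal discretization and the grid size are all mine, not part of the argument): $\mathcal{F}$ consumes a function and returns a function, and a solution is precisely an input that comes back unchanged.

```python
import numpy as np

# A sketch of the higher-order map F: it consumes a function x and returns
# a new function F(x): t -> x0 + integral from t0 to t of f(x(s), s) ds.
def picard_map(f, x0, t0, x):
    def Fx(t):
        s = np.linspace(t0, t, 201)          # quadrature grid on [t0, t]
        v = f(x(s), s)                       # integrand f(x(s), s) on the grid
        h = s[1] - s[0]
        integral = h * (v.sum() - 0.5 * (v[0] + v[-1]))  # trapezoidal rule
        return x0 + integral
    return Fx

# A solution of x' = f(x, t) is exactly a fixed point: F(x) agrees with x.
f = lambda x, t: x                   # the scalar ODE x' = x with x(0) = 1
sol = lambda t: np.exp(t)            # its exact solution
F_sol = picard_map(f, 1.0, 0.0, sol)
print(F_sol(0.5), sol(0.5))          # both ≈ 1.64872: sol is (numerically) fixed
```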
With enough familiarity with complete metric or Banach spaces, and since we want this fixed point to be unique, we start to wonder about the Banach fixed-point theorem. If we could somehow find a complete metric space $X$ with $x \in X$ that is invariant under $\mathcal{F}$, in the sense that $\mathcal{F}(X) \subseteq X$, we could use it.
It does not take long to remember one of the best-known such spaces: the set of bounded functions, $\mathcal{B}(I, \mathbb{R}^n)$, which together with the supremum norm, $\| \cdot \|_{\infty}$, constitutes a Banach space, hence induces a complete metric space. Since $x$ is continuous, to bound it we just take a compact subinterval $J \subset I$ and henceforth consider $X = Y = (\mathcal{B}(J, \mathbb{R}^n), \| \cdot \|_{\infty})$.
Skipping the finer details, as we try to show $\mathcal{F}(X) \subseteq X$ we find that $X$ and $Y$ should be restricted to the functions such that $\| x(t) - x_0 \| \le \beta$ for a suitable $\beta$, that is, the functions whose distance to $x_0$ never exceeds $\beta$. It turns out that such a subspace is still complete. Furthermore, to make sure that solutions do not exit $f$'s domain, we restrict it to $\overline{B}(x_0, \beta) \times J$.
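For the record (spelling out a detail skipped above), this restricted space is just the closed ball of radius $\beta$ around the constant function $t \mapsto x_0$, $$ X = \{\, x \in \mathcal{B}(J, \mathbb{R}^n) \ :\ \| x - x_0 \|_{\infty} \le \beta \,\}, $$ and it is complete because it is a closed subset of a complete space: a uniform limit of functions staying within $\beta$ of $x_0$ stays within $\beta$ as well.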
Now that we are certain that $X$ is an adequate complete metric space and $\mathcal{F}(X) \subseteq X$, we wonder whether it is possible for $\mathcal{F}$ to be a contraction. We already have a very nice compact convex domain; there is not much else to ask for here. So what is left to adjust? Only $f$'s continuity (which, by the way, is now uniform). Well, there are different levels of continuity. The usual types of continuity on compact sets are ordered as follows:
- uniformly (globally) continuous
- locally Hölderian
- (globally) Hölderian
- locally Lipschitzian
- (globally) Lipschitzian
- bounded derivative
- continuously differentiable
Since showing that $\mathcal{F}$ is a contraction amounts to showing the inequality
$$ \| \mathcal{F}(x)(t) - \mathcal{F}(y)(t) \| = \left\| \int_{t_0}^{t} \big( f(x(s),s) - f(y(s), s) \big)\ ds \right\| \le \lambda \, \|x - y \|_{\infty} \tag{4}$$
and integration interacts nicely with norms
$$ \left\| \int_{t_0}^{t} (f(x(s),s) - f(y(s), s))\ ds \right\| \le \left| \int_{t_0}^{t} \| f(x(s),s) - f(y(s), s) \| ds \right| \tag{5}$$
it is reasonable to wonder about a type of continuity that also relates norms through an inequality. In particular, $(4)$ and $(5)$ suggest Hölder or Lipschitz continuity. Since Hölder continuity does not seem convenient here because of its exponent, we choose (global) Lipschitz continuity. This immediately allows the following estimate
$$ \left| \int_{t_0}^{t} \| f(x(s),s) - f(y(s), s) \| ds \right| \le L \left| \int_{t_0}^{t} \| (x(s), s) - (y(s), s) \| ds \right| \tag{6}$$
which is enough to make $\mathcal{F}$ a contraction in this setting. We have achieved our objective.
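Spelling out that last step: for every $t \in J$, $$ L \left| \int_{t_0}^{t} \| (x(s), s) - (y(s), s) \|\ ds \right| = L \left| \int_{t_0}^{t} \| x(s) - y(s) \|\ ds \right| \le L\, |t - t_0|\, \| x - y \|_{\infty} \le L\, |J|\, \| x - y \|_{\infty}, $$ which is exactly $(4)$ with $\lambda = L\, |J|$; shrinking the compact interval $J$ until $L\, |J| < 1$ makes $\mathcal{F}$ a contraction.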
Notice that Lipschitzianity in $t$ is not needed, for it cancels in $(6)$: $(x(s), s) - (y(s), s) = (x(s) - y(s),\, 0)$. Furthermore, because a locally Lipschitzian function on a compact set is globally Lipschitzian, and we will in any case restrict $f$ to a compact set, we just need $f$ to be locally Lipschitzian in $x$.
What makes this proof interesting (and what made it popular) is that it elegantly yields a method for constructing the (locally, since we reduced $f$'s domain) unique solution through $\mathcal{F}$. In light of Banach's fixed-point theorem, we just choose any starting function in $X$, say $\gamma_0 \in X$, and iterate $\mathcal{F}$ on it: $\mathcal{F}^n(\gamma_0)$. Thus $\gamma = \lim\limits_{n \to \infty} \mathcal{F}^n(\gamma_0)$ exists and is (locally) unique. Therefore $\gamma$ is the (locally) unique fixed point of $\mathcal{F}$ and consequently the (locally) unique solution of $(1)$. We choose $\gamma_0(t) = x_0$ because it is the only function we are sure belongs to $X$.
The sequence $\mathcal{F}^n(\gamma_0)$ is referred to as Picard iteration because it exactly matches Picard's original method of successive approximations. It is important to emphasize here that it is the uniqueness proof that produces "Picard iteration" and not the other way around. However, what Picard (amazingly) managed to do, with the knowledge and mathematical maturity available at the time, was to bypass this polished reasoning (which today is considered simple).
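To watch the contraction at work on a toy example of my own (not from Picard's text): for $\dot{x} = x$, $x(0) = 1$ on $J = [0, \tfrac12]$, the iterates $\mathcal{F}^n(\gamma_0)$ starting from $\gamma_0 \equiv 1$ are the Taylor partial sums of the unique solution $e^t$, and their sup-norm distance to it shrinks at least as fast as the contraction constant $L\,|J| = \tfrac12$ guarantees.

```python
import numpy as np

# Picard iterates for x' = x, x(0) = 1 on J = [0, 0.5]; exact solution exp(t).
# gamma_0 is the constant function x0; each pass applies
# F(gamma)(t) = x0 + integral_0^t gamma(s) ds via a cumulative trapezoidal rule.
t = np.linspace(0.0, 0.5, 501)
dt = t[1] - t[0]
x0 = 1.0
gamma = np.full_like(t, x0)                          # gamma_0(t) = x0

for n in range(1, 7):
    trapezoids = dt * 0.5 * (gamma[1:] + gamma[:-1])             # f(x, t) = x here
    gamma = x0 + np.concatenate(([0.0], np.cumsum(trapezoids)))  # gamma_n = F(gamma_{n-1})
    err = np.abs(gamma - np.exp(t)).max()            # sup-norm distance to the fixed point
    print(f"n = {n}: ||gamma_n - exp||_sup = {err:.3e}")
```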
Now take a moment to contemplate how sharp your intuition must be to get through this reasoning without the structure provided by functional analysis.
Last but not least, the failure of local Lipschitzianity does not necessarily permit non-unique solutions. Uniqueness still holds under a weaker condition, known as the Osgood criterion. For more theorems on uniqueness, see (R.P. Agarwal, V. Lakshmikantham) Uniqueness and Nonuniqueness Criteria for Ordinary Differential Equations. World Scientific (1993).
Solution 2:
There are really two parts here. First, why would you recast the differential equation
$$y'=f(x,y),\qquad y(0)=y_0$$
into the integral equation
$$y(x)=y_0+\int_0^x f(z,y(z)) dz$$
? This is a particular case of converting the "strong" form of an equation to a "weak" form, meaning that fewer derivatives are required to make sense of it (in this case, in fact, none are required). I find this hard to motivate in general. But the introduction of the notion of weak formulations is at the heart of rigorous analysis of differential equations (especially PDEs), so it's worth getting used to it.
Second, why would you solve the integral equation by Picard iteration? Well, in general it makes sense to think about doing a "fixed point type" approximation scheme, in which you improve approximations according to a scheme
$$y_{n+1}=F(y_n)$$
where $F$ is some mapping from functions to functions. If this scheme is to converge, then the solution to the differential equation must be a fixed point of $F$. We notice that the integral equation is already in fixed point form $y=F(y)$. So one might hope that fixed point iteration with the RHS of the original integral equation will converge.
Quite frequently in other situations this fails. For example, if we want to iteratively solve the algebraic fixed point equation $x=x^2-100$, the scheme $x_{n+1}=x_n^2-100$ will not work. In such cases we need to find some equivalent fixed point equation in order to construct a convergent iteration. One of these is $x_{n+1}=\sqrt{x_n+100}$; another is $x_{n+1}=x_n-\frac{x_n^2-x_n-100}{2x_n-1}$.
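A quick numerical check of this (a sketch of mine; the three update rules are exactly the ones displayed above): all three schemes target the same positive root $(1+\sqrt{401})/2 \approx 10.5125$, yet only the reformulated ones converge.

```python
import math

# Three reformulations of the fixed-point equation x = x^2 - 100
# (equivalently x^2 - x - 100 = 0), started near the positive root:
# equivalent equations, but very differently behaved iterations.
schemes = {
    "x_{n+1} = x_n^2 - 100": lambda x: x**2 - 100,                        # diverges
    "x_{n+1} = sqrt(x_n + 100)": lambda x: math.sqrt(x + 100),            # converges
    "x_{n+1} = Newton step": lambda x: x - (x**2 - x - 100) / (2*x - 1),  # converges fast
}

for name, step in schemes.items():
    x = 10.0                          # same starting guess for all three
    for _ in range(8):
        x = step(x)
        if abs(x) > 1e12:             # stop once divergence is obvious
            break
    print(f"{name:28s} -> {x}")
print("root:", (1 + math.sqrt(401)) / 2)
```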
In the particular case of Picard iteration for suitable $f$, it just turns out that the most obvious thing to try when you start from the integral equation above works. It is probably not so easy to anticipate this without actually doing the analysis... but you can always just do the analysis and find out what happens.
Solution 3:
You are trying to prove the existence of the solution of an equation via a fixed-point reformulation. The natural choices for the ODE are $$ y'_{n+1}=f(x,y_n(x)) $$ and $$ y'_{n}=f(x,y_{n+1}(x)). $$ The second one has several problems: first, one needs to solve an implicit equation to get the value of $y_{n+1}$; second, differentiation is smoothness-destroying, which severely restricts the applicability of this iteration.
In contrast, the first variant requires integrating $y_{n+1}'$ to obtain $$ y_{n+1}(x)=y_{n+1}(x_0)+\int_{x_0}^x f(s,y_n(s))\,ds. $$ Using the natural choice $y_{n+1}(x_0)=y_0$, one finds the Picard iteration. Due to the integration, the smoothness class of the iterates increases by one order in each step, until the smoothness of $f$ is surpassed by one.
Additionally, $C^{k+1}$ is a small, thin subset of $C^k$, so that projecting back down after a Picard step results in something close to a contraction from these qualitative considerations alone; with a Lipschitz condition one obtains a quantifiable contraction.
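This gain of one order per step can be watched symbolically. A small sketch of my own for $y' = y$, $y(0) = 1$ (here $f$ is smooth, so what visibly grows is the polynomial degree: one new Taylor term of the solution $e^x$ per integration):

```python
import sympy as sp

# Symbolic Picard iteration y_{n+1}(x) = y0 + integral_0^x f(s, y_n(s)) ds
# for f(x, y) = y with y0 = 1. Each integration raises the degree by one.
x, s = sp.symbols("x s")
y = sp.Integer(1)                                  # y_0: the constant guess y0
for n in range(1, 5):
    y = 1 + sp.integrate(y.subs(x, s), (s, 0, x))  # one Picard step
    print(f"y_{n}(x) =", sp.expand(y))
# y_1 = 1 + x, y_2 = 1 + x + x**2/2, ...: the partial sums of exp(x)
```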