My own (not always entirely successful...) attempt to remember the lemma and its proof is the following:

If $u$ satisfies a (differential or integral) inequality of a suitable type, then this limits the growth of $u$ in such a way that $u$ can become at most as big as the function $y$ which satisfies the corresponding equality.

(This function $y$ is, so to speak, pushing its rate of growth to the maximum within the bounds imposed by the inequality. Any other function $u$ that doesn't use all the freedom it has for growing will end up no bigger than $y$.)

So let's first look at the case with equality. For example (to take the specific variant of the lemma that you mentioned), suppose $y(t)$ satisfies the integral equation
$$ y(t) = a(t) + \int_0^t b(s) \, y(s) \, ds . \tag{$*$} $$
We can solve this by rewriting it as an ODE (with initial condition) for the integral $I(t)=\int_0^t b(s) \, y(s) \, ds$ that appears on the right-hand side:
$$ I'(t) = b(t) \, y(t) = [\text{according to ($*$)}] = b(t) \, \bigl( a(t) + I(t) \bigr) ,\qquad I(0)=\int_0^0 b(s) \, y(s) \, ds=0 . $$
In other words:
$$ I'(t) - b(t) \, I(t) = a(t) \, b(t) ,\qquad I(0)=0 . $$
Multiply by the integrating factor $e^{-B(t)}$, where $B(t)=\int_0^t b(s) \, ds$ is an antiderivative of $b$. This gives
$$ \frac{d}{dt}\Bigl( I(t) \, e^{-B(t)} \Bigr) = a(t) \, b(t) \, e^{-B(t)} ,\qquad I(0)=0 . $$
Integrate this from $0$ to $t$, to get
$$ I(t) \, e^{-B(t)} - \underbrace{I(0)}_{=0} \, e^{-B(0)} = \int_0^t a(s) \, b(s) \, e^{-B(s)} \, ds , $$
that is,
$$ I(t) = \int_0^t a(s) \, b(s) \, e^{B(t)-B(s)} \, ds . $$
Now we have found the solution $y(t) = a(t) + I(t)$ to the integral equation ($*$):
$$ y(t) = a(t) + \int_0^t a(s) \, b(s) \, e^{B(t)-B(s)} \, ds . \tag{${*}{*}$} $$
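
As a quick sanity check of (${*}{*}$) (this particular example is mine, not part of the argument above): take $a(t)=a_0$ and $b(t)=b_0>0$ constant. Then $B(t)=b_0 t$, and
$$ y(t) = a_0 + \int_0^t a_0 \, b_0 \, e^{b_0 (t-s)} \, ds = a_0 + a_0 \bigl( e^{b_0 t} - 1 \bigr) = a_0 \, e^{b_0 t} , $$
which is exactly the familiar exponential solution one expects from $y' = b_0 y$ with $y(0) = a_0$.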

Next, the inequality. Suppose
$$ u(t) \le a(t) + \int_0^t b(s) \, u(s) \, ds . \tag{${*}{*}{*}$} $$
Let $J(t) = \int_0^t b(s) \, u(s) \, ds$ be the integral appearing on the right-hand side. It satisfies $J'(t) = b(t) \, u(t)$. By assumption we have $u \le a+J$, so provided that the additional condition $b\ge 0$ is satisfied we get $bu \le b(a+J)$, so that
$$ J'(t) \le b(t) \, \bigl( a(t) + J(t) \bigr) . $$
Now (assuming also $t \ge 0$) we can go through exactly the same steps as above, but with $\le$ instead of $=$, and then we of course end up with the same result, except that we get $\le$ instead of $=\,$:
$$ u(t) \le a(t) + \int_0^t a(s) \, b(s) \, e^{B(t)-B(s)} \, ds . \tag{${*}{*}{*}{*}$} $$
In other words, $u(t) \le y(t)$ for $t \ge 0$, which is what we wanted to prove.
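
If it helps to see the bound in action, here is a small numerical sketch. The particular functions below ($a(t)=1$, $b(t)=1$, $u(t)=\tfrac12+\tfrac12 e^t$) are my own illustrative choices, not anything from the argument above; the script just checks the hypothesis (${*}{*}{*}$) and the conclusion (${*}{*}{*}{*}$) on a grid.

```python
# Minimal numerical sanity check of the integral form of Gronwall's lemma.
# Illustrative choices (mine): a(t) = 1, b(t) = 1 (so B(t) = t), and
# u(t) = 0.5 + 0.5*exp(t), which satisfies u(t) <= a(t) + int_0^t b(s) u(s) ds.
import numpy as np

def trapz(y, x):
    """Trapezoidal approximation of the integral of y over the whole grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def cumtrapz(y, x):
    """Cumulative trapezoidal integral, same length as the grid x."""
    dx = np.diff(x)
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * dx)))

t = np.linspace(0.0, 2.0, 2001)
a = np.ones_like(t)
b = np.ones_like(t)
u = 0.5 + 0.5 * np.exp(t)

# Hypothesis (***): u(t) <= a(t) + J(t), where J(t) = int_0^t b(s) u(s) ds.
J = cumtrapz(b * u, t)
assert np.all(u <= a + J + 1e-9)

# Conclusion (****): u(t) <= a(t) + int_0^t a(s) b(s) exp(B(t) - B(s)) ds,
# which for these particular choices is just e^t.
B = cumtrapz(b, t)
bound = np.array([
    a[i] + trapz(a[:i + 1] * b[:i + 1] * np.exp(B[i] - B[:i + 1]), t[:i + 1])
    for i in range(len(t))
])
assert np.all(u <= bound + 1e-9)
print("min slack (bound - u):", np.min(bound - u))
```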


I had been looking for some similar intuition. I found this book by T. Tao, which gives some intuition about Gronwall's lemma as a result about "feedback" of non-linearities in ODEs (see Chapter 1.2).

http://www.math.ucla.edu/~tao/preprints/chapter.pdf

Specifically, it tells you how to control the "worst-case" growth behaviour, where the "forcing term" ($\beta(t)$ in the Wiki article) measures the amount of feedback in the system. This intuitively suggests that the resulting bound should be something exponential in the integral of $\beta$. In the linear case, Gronwall's lemma tells us that solutions are bounded for all (finite) times and that there is no finite-time blow-up.
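
To make that last point concrete (this is just the differential form of the lemma, in the Wiki article's notation): if $u'(t) \le \beta(t) \, u(t)$ for $t \ge 0$, then
$$ u(t) \le u(0) \, \exp\Bigl( \int_0^t \beta(s) \, ds \Bigr) , $$
so as long as $\beta$ is locally integrable the right-hand side is finite for every finite $t$, and the solution cannot blow up in finite time.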