Is non-standard analysis worth learning?
I think this (interesting!) question is yet another instance of the occasional mismatch of science (and human perception) and formal mathematics. For example, the arguments used by the questioner, and common throughout science and engineering, were also those used by Euler and other mathematicians for 150 years prior to Abel, Cauchy, Weierstrass, and others' "rigorization" of calculus. The point is that the extreme usefulness and power of calculus and differential equations were demonstrated prior to epsilon-delta proofs!
Similarly, c. 1900 Heaviside's (and others') use of derivatives of non-differentiable functions, of the "Dirac delta" function and its derivatives, and so on, brought considerable ridicule on him from the mathematical establishment, but his mathematics worked well enough to build the transatlantic telegraph cable. "Justification" was only provided by the work of Sobolev (1930s) and Schwartz (1940s).
And I think there are still severe problems with Feynman diagrams, even tho' he and others could immediately use them to obtain correct answers to previously-thorny quantum computations.
One conclusion I have reached from such stories is that, if we have a reasonable physical intuition, we have less obligation to fret than undergrad textbooks would have us believe.
But, back to the actual question: depending on one's tastes, non-standard analysis can be pretty entertaining to study, especially if one does not worry about the "theoretical" underpinnings. However, to be "rigorous" in use of non-standard analysis requires considerable effort, perhaps more than that required by other approaches. For example, the requisite model theory itself, while quite interesting if one finds such things interesting, is non-trivial. In the early 1970s, some results in functional analysis were obtained first by non-standard analysis, raising the possibility that such methods would, indeed, provide means otherwise unavailable. However, people found "standard" proofs soon after, and nothing comparable seems to have happened since, with regard to non-standard analysis.
With regard to model theory itself, the recent high-profile proof of the "fundamental lemma" in Langlands' program did make essential use of serious model theory... and there is no immediate prospect of replacing it. That's a much more complicated situation, tho'.
With regard to "intuitive analysis", my current impression is that learning an informal version of L. Schwartz' theory of distributions is more helpful. True, there are still issues of underlying technicalities, but, for someone in physics or mechanics or PDE... those underlying technicalities themselves have considerable physical sense, as opposed to purely mathematical content.
Strichartz' nice little book "A Guide to Distribution Theory and Fourier Transforms" captures the positive aspect of this, to my taste, altho' the Sobolev aspect is equally interesting. And, of course, beware the ponderous, lugubrious sources that'll make you sorry you ever asked... :) That is, anything can be turned into an ugly, technical mess in the hands of someone who craves it! :)
So, in summary, I think some parts of "modern analysis" (done lightly) more effectively fulfill one's intuition about "infinitesimals" than does non-standard analysis.
It can be shown that non-standard analysis (in the framework of Nelson's internal set theory) is a conservative extension of the usual ZFC. In other words, every theorem provable using non-standard analysis that can be stated using only "standard" terminology can also be proved without using non-standard analysis. A well-known example is the Invariant Subspace Problem. Bernstein and Robinson were able to resolve a special case using non-standard analysis techniques; upon reading the preprint, Halmos was able to re-prove the same thing using standard techniques. (The proofs are roughly the same length; B&R's paper is longer because they have to introduce terminology and include an introduction.)
So in terms of practical uses of non-standard analysis, it probably will not give you any fundamentally new insight. But it will give you a language in which certain analytical manipulations are more easily stated. It is more than just keeping track of $\epsilon$s: if you unravel many theorem statements in standard analysis, you get a bunch of $\forall$s and $\exists$s mixed up in some order, and the ordering is extremely important (think of the difference between uniform continuity and continuity; see the comparison below). Such statements often become simpler to state in non-standard analysis. It generally will not make it easier to come up with a proof, but it does allow you to spend less time worrying about $\epsilon$s and more time worrying about the bigger picture. In other words, it gives you a tool that makes rigorously realising a heuristic easier, but it won't help much in coming up with the heuristic to start with. This comes at a sacrifice: because you are no longer explicitly keeping track of the $\epsilon$-$\delta$ arguments, you easily lose control of quantitative bounds (though you can retain control of qualitative statements). For a lot of mathematical analysis that is not a big deal, but there are times when the quantitative bounds are useful.
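To make the quantifier point concrete, compare the usual $\epsilon$-$\delta$ definitions (for $f : \mathbb{R} \to \mathbb{R}$):

$$\begin{align} \text{continuity at } x &: \forall \epsilon > 0\ \exists \delta > 0\ \forall y\ \left( |y - x| < \delta \implies |f(y) - f(x)| < \epsilon \right) \\ \text{uniform continuity} &: \forall \epsilon > 0\ \exists \delta > 0\ \forall x\ \forall y\ \left( |y - x| < \delta \implies |f(y) - f(x)| < \epsilon \right) \end{align}$$

with their well-known non-standard counterparts, where $y \simeq x$ means "$y - x$ is infinitesimal" and ${}^*f$ is the natural extension of $f$:

$$\begin{align} \text{continuity at standard } x &: {}^*f(y) \simeq f(x) \text{ whenever } y \simeq x \\ \text{uniform continuity} &: {}^*f(y) \simeq {}^*f(x) \text{ whenever } y \simeq x \text{, for all hyperreal } x, y \end{align}$$

The shuffled quantifiers collapse into a single change in where the variables are allowed to range.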
Apropos of whether it makes calculus any easier: that is for you to decide. From my point of view, because of the need to rigorously introduce the hyperreals and the notion of the standard part of an expression, for an absolute beginner it is just passing the buck. You either learn how to write $\epsilon$-$\delta$ arguments, or you accept a less intuitive number system and learn all the rules about what you can and cannot do in it. I think that a real appreciation of non-standard analysis can only come after one has developed some fundamentals of standard analysis. In fact, for me, one of the most striking things about non-standard analysis is that it provides an a posteriori justification of why the (by modern standards) sloppy notations of Newton, Leibniz, Cauchy, etc. did not result in a theory that collapses under more careful scrutiny. But that's just my opinion.
To sum up: it is a useful tool. Learn it if you are interested. But don't think that it is the magic bullet that will solve all your problems.
Terence Tao has written several interesting blog posts on how nonstandard analysis can be useful in pure mathematics. My impression of his basic point is this: nonstandard analysis allows you to carry out certain intuitive arguments that don't work in standard analysis, and it shortens the length of certain types of proofs that would otherwise require you to juggle many epsilons.
Nonstandard analysis is also an application of model theory, which has recently gotten the attention of many mathematicians: model theory, as it turns out, is an effective tool for proving things of interest to non-logicians.
I cannot see how Willie Wong's example of the Bernstein-Robinson result supports his conclusion. It seems to me to do the opposite, and I am not alone here. Halmos himself admits in his autobiography: "The Bernstein-Robinson proof uses non-standard models of higher order predicate languages, and when Abby [Robinson] sent me his preprint I really had to sweat to pinpoint and translate its mathematical insight." Halmos did sweat because, as all of his comments and actions regarding NSA indicate, he was against it for philosophical or personal reasons, and he was eager to downplay this result precisely because it seemed like support for using NSA. (NSA, at least in Robinson's approach, is nonconstructive, relying on the existence of nonprincipal ultrafilters; the compactness theorem likewise relies on some equivalent of the axiom of choice.)
Also, the fact that a formal proof of some formula exists (which is precisely what it means to be a theorem) is only trivially relevant to the question of whether a theory might help you find a proof. Besides, who other than automated theorem-provers actually thinks in terms of formal proofs? In my experience, the concepts and tools of a theory, the objects that it lets you talk about, and the ideas that it lets you express are what make a theory useful for proving things.
One thing that the OP might find attractive about NSA is that saying "x is infinitely close to y" is perfectly fine and meaningful -- and it probably means what you already think it means: two numbers are infinitely close iff their difference is infinitely small, i.e., an infinitesimal. You also get things like halos (all numbers infinitely close to some number) and shadows (the standard number infinitely close to some number), which can be fun and intuitive concepts to think with.
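In symbols (the usual definitions, with the shadow defined only for limited, i.e. finite, hyperreals):

$$\operatorname{hal}(x) = \{\, y : y - x \text{ is infinitesimal} \,\}, \qquad \operatorname{sh}(x) = \text{the unique standard real in } \operatorname{hal}(x).$$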
For example, here is how the limit of a (hyperreal) sequence is defined. First, sequences are no longer indexed by the natural numbers $\mathbb{N}$. Rather, sequences are indexed by the hypernaturals $^*\mathbb{N}$, which include numbers larger than any standard natural. Such numbers are called infinite (or unlimited). (Warning: this is not the same concept as "infinity" in "as $x$ goes to infinity"; infinite naturals are smaller than (positive) infinity, when it makes sense to compare them.) Now, a hyperreal $L$ is the limit of a sequence $\langle s_n \rangle$ (indexed by $^*\mathbb{N}$!) iff $L$ is infinitely close to $s_n$ for all infinite $n$.
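For a standard sequence and a standard $L$, this single condition is equivalent to the familiar three-quantifier definition:

$$\forall \epsilon > 0\ \exists N \in \mathbb{N}\ \forall n > N\ \left( |s_n - L| < \epsilon \right) \quad\Longleftrightarrow\quad {}^*s_n \text{ is infinitely close to } L \text{ for all infinite } n.$$

All the quantifier-juggling is absorbed into the single word "infinite".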
For another example, consider proofs using "sloppy" reasoning where you end up with some infinitesimal term and so just ignore it or drop it from an equation (provoking derisive comments about "ghosts of departed quantities"). In NSA, rather than ignoring the term, you can actually say that it is infinitesimal and end up with a result that is infinitely close to the result of your sloppy alternative. E.g., let $f(x) = x^2$ (extended to the hyperreals) and consider the (I presume familiar) formula for the derivative, where we will let $h$ be a nonzero infinitesimal:
$$\begin{align} \frac{f(x+h) - f(x)}{h} &= \frac{(x+h)^2 - x^2}{h} \\ &= \frac{x^2 + 2xh + h^2 - x^2}{h} \\ &= 2x + h \\ &\simeq 2x \end{align}$$
The symbol $\simeq$ denotes the relation "infinitely close". This derivation works because, when $h$ is infinitesimal, $a + h$ is infinitely close to $a$ for any hyperreal $a$. Under sensible restrictions on $f$ and $x$, this derivation shows that $2x$ is the standard derivative of $x^2$, as every schoolgirl knows.
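In general, the nonstandard definition of the derivative packages this up using the standard-part (shadow) operation $\operatorname{st}$: for suitable $f$ and standard $x$,

$$f'(x) = \operatorname{st}\!\left( \frac{{}^*\!f(x + h) - f(x)}{h} \right),$$

provided the right-hand side exists and takes the same value for every nonzero infinitesimal $h$.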
A cost-benefit analysis for learning NSA should probably include (i) for a benefit, how interesting or valuable you find the nonstandard concepts, and (ii) for a cost, how much work you'll have to do to learn it. The latter will depend on what text or approach you choose. If you are willing to take some things for granted and just use the resulting tools, you can get away with bypassing a good chunk of the model-theoretic machinery (compactness, ultrafilters, elementary extensions, transfer, formal languages). If you understand the ultrapower construction, which builds the hyperreals as equivalence classes of infinite sequences of real numbers (similar to the construction of the reals from the rationals using Cauchy sequences), then the resulting system behaves as you would expect -- relations and operations are defined componentwise, as sketched below. This part is relatively easy. Alternatively, you can get away with not understanding the construction very well if you are willing to take the definitions of the relations and operations on the hyperreals as axiomatic.
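As a sketch of those componentwise definitions (with $\mathcal{U}$ a nonprincipal ultrafilter on $\mathbb{N}$, and $[s]$ the equivalence class of the sequence $\langle s_1, s_2, \ldots \rangle$):

$$[s] + [t] = [\langle s_n + t_n \rangle], \qquad [s] \cdot [t] = [\langle s_n t_n \rangle], \qquad [s] < [t] \iff \{\, n : s_n < t_n \,\} \in \mathcal{U},$$

where $[s] = [t]$ iff $\{\, n : s_n = t_n \,\} \in \mathcal{U}$, i.e. iff the sequences agree on a "large" set of indices.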
If you want to look into NSA, I would recommend either (a) Goldblatt's Lectures on the Hyperreals if you don't have a strong background or interest in mathematical logic or (b) Hurd and Loeb's Introduction to Nonstandard Real Analysis otherwise. The latter is out of print and sadly about $100 if you want to buy it, but check libraries. It's very thoughtful and well-written. Also, if you are excited about the model-theoretic aspects, look them up in Chang and Keisler's Model Theory book as you go along. Hodges' model theory book is also very good but doesn't cover this material as extensively.
Cheers, Rachel
I love NSA: but I love it because I find the mathematics fascinating and not because it allows me to prove more analysis theorems.
There is a sense in which certain calculus arguments become more natural when written in the language of NSA. However, once you have enough mathematical maturity to really understand how NSA works, you probably won't have too much difficulty translating intuitive arguments that have a natural expression in the language of NSA into $\epsilon$-$\delta$ arguments.
I do recommend learning some non-standard mathematics though, because it's extremely cool. There are various approaches: Nelson-style IST (see Robert's book Nonstandard Analysis, or many others), ultrapowers (there's a book by Goldblatt), or smooth infinitesimal analysis (A Primer of Infinitesimal Analysis by Bell -- in this theory you drop the law of the excluded middle, so "not not $A$" no longer implies $A$. The infinitesimals are the numbers $x$ for which it is not the case that $x \neq 0$!! See the sketch below.) Whichever way you take, you'll meet some amazing stuff -- ultrafilters and Łoś's theorem, toposes, all sorts of fun things. But don't imagine it's a way of making hard analysis problems easy: the law of conservation of difficulty is not easy to get round.
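For a taste of the smooth approach: writing $\Delta = \{\, \epsilon : \epsilon^2 = 0 \,\}$ for the nilsquare infinitesimals, Bell's basic axiom (microaffineness, a form of the Kock-Lawvere axiom) says that every function on $\Delta$ is affine in exactly one way:

$$\forall g : \Delta \to \mathbb{R}\ \ \exists!\, b \in \mathbb{R}\ \text{ such that }\ g(\epsilon) = g(0) + b\,\epsilon \ \text{ for all } \epsilon \in \Delta,$$

so the derivative of $f$ at $x$ is simply the unique $b$ attached to $g(\epsilon) = f(x + \epsilon)$ -- no limits needed.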