Solution 1:

differentials (in a way of standard analysis) are not very rigorous in mathematics

Standard or not has nothing to do with it.

Operations on differential notation are fully rigorous in standard mathematics, as a well-defined representation of properties of differentiable functions. If you want the differentials themselves to be mathematical objects that satisfy the rules of the differential notation, that can be done (and is done, in several different useful ways) in standard mathematics. Robinson's nonstandard analysis is only one option.

this approximation never really becomes equality, the smaller the angles the better it works, but still never equality!

That is true in all forms of infinitesimal reasoning except those with nilpotents (which can be defined by asserting that the order $k$ Taylor approximations are an equality, for a selected value of $k$). Calculations are always "up to higher order terms".

we actually have in mind that this quantity contains higher order terms, but they will vanish after we parametrise?

We have in mind that only a finite leading-order term is being calculated at the end, and the rest is an error term that is qualitatively smaller in the sense that it vanishes in the limit, or is infinitesimal or nilpotent or non-non-zero (which are some of the meanings of "qualitatively smaller" that are assigned in different formalizations of infinitesimal reasoning). For example, the product rule $d(FG) = (dF)G + F(dG)$ is true only up to second-order terms, which means ignoring $(dF)(dG)$ as something that will not contribute to the end result when calculating some quantity such as $d(FG)/dt$. If you do not want the limiting or infinitesimal answer, but a finite change in $FG$ over a finite interval of $t$, then the product rule is false and the higher order terms need to be accounted for. The calculus rules are only for handling the principal terms in asymptotic calculations that become "exact in the limit".
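To make the "up to higher order terms" point concrete, here is a minimal numeric sketch (the functions $F(t)=t^2$ and $G(t)=t^3$ are hypothetical choices, not from the discussion above). The exact finite change in $FG$ decomposes as $(dF)G + F(dG) + (dF)(dG)$, and the dropped cross term vanishes faster than the increment $h$ itself:

```python
# Sketch: the product rule drops the (dF)(dG) term, which is second
# order and so vanishes faster than h as h -> 0.
def F(t):
    return t ** 2

def G(t):
    return t ** 3

t = 1.0
for h in [1e-1, 1e-2, 1e-3]:
    dF = F(t + h) - F(t)                      # finite change in F
    dG = G(t + h) - G(t)                      # finite change in G
    d_FG = F(t + h) * G(t + h) - F(t) * G(t)  # exact finite change in FG
    leading = dF * G(t) + F(t) * dG           # the "product rule" part
    error = d_FG - leading                    # equals dF * dG exactly
    print(h, error / h)                       # error/h -> 0: the cross term is o(h)
```

Dividing the error by $h$ and watching it shrink is exactly the sense in which the product rule becomes "exact in the limit".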

Solution 2:

Concerning differentials per se, the answer of xyz should clarify your question. Furthermore, the answer of user10676 settles the question concerning the line element. For those who do not know much about Riemannian geometry, let's elaborate a bit on user10676's answer:

Think for a moment about lines. One way to view a (straight) line segment $l$ from $x$ to $y$ in Euclidean space is that $l$ realizes the shortest curve from $x$ to $y$. Hence we can think of line segments in $\mathbb{R}^3$ as shortest connections between two points. From an abstract point of view: the Euclidean space $\mathbb{R}^3$ is a metric space with respect to the usual Euclidean metric $$d(x,y)= \sqrt{\sum_{i=1}^3 (x_i-y_i)^2},$$ where the $x_i, y_i$ are the components of your vectors (this is just the abstract formula for the Pythagorean theorem). This metric is induced by the Euclidean scalar product $\langle x,y\rangle = \sum_{i=1}^3 x_iy_i$ via $d(x,y) = \sqrt{\langle x-y,x-y\rangle}$. Now the derivative $\frac{d}{dt}\big|_{t=s} c(t)$ of a (differentiable) curve $c\colon \mathbb{R} \rightarrow \mathbb{R}^3$ yields a vector in $\mathbb{R}^3$ at each point $s$, and we can measure its length using the Euclidean metric. For your straight line from $x$ to $y$ the derivative is $y-x$ (and its length is the Euclidean distance). As you learn in a calculus course, for differentiable curves the length is computed exactly as described in the post of user10676. In our case, for the line from $x$ to $y$ given by $c(t)=x+t(y-x)$: $$\ell(c) = \int_0^1 \sqrt{\langle y-x,y-x\rangle}\,dt = (1-0)\sqrt{\sum_{i=1}^3 (x_i-y_i)^2}=d(x,y).$$ Notice that we can identify the integrand in the above equation with the sum of the $(dx_i)^2$ in user10676's answer. However, we used a lot of the structure on $\mathbb{R}^3$. In an essential way the construction depends on $\mathbb{R}^3$ being a (finite-dimensional) vector space. This was used in the following way:

We computed the derivative (which then turned out to be at each point a vector in the same vector space). Then we measured the length of the derivative with the euclidean scalar product.
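This recipe can be checked numerically. The sketch below (with hypothetical endpoints $x$ and $y$) approximates $\ell(c) = \int_0^1 |c'(t)|\,dt$ for the straight segment $c(t) = x + t(y-x)$ by a Riemann sum and recovers the Euclidean distance $d(x,y)$:

```python
import math

# Length of the straight segment c(t) = x + t(y - x), computed as the
# integral of |c'(t)| via a midpoint Riemann sum; it reproduces d(x, y).
x = [0.0, 0.0, 0.0]
y = [1.0, 2.0, 2.0]

def c_prime(t):
    # derivative of c(t) = x + t(y - x) is the constant vector y - x
    return [yi - xi for xi, yi in zip(x, y)]

n = 1000
length = sum(math.sqrt(sum(v * v for v in c_prime((k + 0.5) / n))) / n
             for k in range(n))
dist = math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))
print(length, dist)   # both are approximately 3.0
```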

Obviously the sphere is not a vector space, so we cannot copy exactly what was going on in $\mathbb{R}^3$. What should be our geometric intuition for a line segment? It should be a straight line, i.e. the shortest connection between two points $x$ and $y$ on the sphere. To measure the length of (differentiable) curves we would like to copy the trick from the vector space: compute the derivatives and measure their length.

The answer to the first problem is to put a natural differentiable structure on the sphere. In fancy words: The sphere $\mathbb{S}^2$ in $\mathbb{R}^3$ is a smooth manifold modeled on $\mathbb{R}^2$. For curves into such objects one can compute derivatives. In our example: The derivative of a curve $c\colon \mathbb{R} \rightarrow \mathbb{S}^2$ at $s \in \mathbb{R}$ is an element in the tangent space at $c(s)$. Geometrically this is a vector in the tangent plane of $\mathbb{S}^2$ at the point $c(s)$. Now we have derivatives and want to compute their length. Thus we hit the second problem: How do we compute the length?
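The first part of this, the derivative landing in the tangent plane, can be checked concretely. In the sketch below (a hypothetical curve on the unit sphere, a circle of latitude), the velocity $c'(t)$ is orthogonal to the position $c(t)$, i.e. it lies in the tangent plane at $c(s)$:

```python
import math

# For a curve c(t) on the unit sphere, |c(t)| = 1 for all t, so the
# velocity c'(t) is orthogonal to c(t): it lies in the tangent plane.
def c(t):
    # a circle of latitude on S^2 (polar angle fixed at 1 radian)
    return (math.cos(t) * math.sin(1.0),
            math.sin(t) * math.sin(1.0),
            math.cos(1.0))

def c_prime(t, h=1e-6):
    # central-difference approximation of the derivative
    p, q = c(t + h), c(t - h)
    return tuple((pi - qi) / (2 * h) for pi, qi in zip(p, q))

t = 0.7
dot = sum(a * b for a, b in zip(c(t), c_prime(t)))
print(abs(dot) < 1e-8)   # True: c'(t) is tangent to the sphere at c(t)
```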

Answer: Riemannian geometry. Simply speaking, measuring lengths in the tangent planes should be related to the smooth structure on the sphere: the way we measure length should depend smoothly on the basepoint of the tangent space. The correct way to do this is to use a Riemannian metric (and the formula you get out is exactly the one provided by user10676). I will not go into details, since this problem is more involved and there are very good books on Riemannian geometry explaining it. If you are interested in the metric aspects (without too much Riemannian geometry) I recommend:

M.R. Bridson and A. Haefliger: Metric Spaces of Non-Positive Curvature

The first chapters derive exactly the formulas for line segments and put them in a very nice perspective.

Solution 3:

Yes, it is mathematically rigorous. A line element is given by a Riemannian metric.

Recall that a differential $1$-form on a vector space $V$ is a map $V \rightarrow V^*:={\rm Hom}(V,{\mathbb R})$. A Riemannian metric is a map $g:V \rightarrow {\rm Sym}(V^*)$ which is everywhere positive definite, and its length element $ds$ is the square root of $g$. Given a Riemannian metric $g$ and a parametrized curve $\gamma$, its length is $\int_{\gamma} ds := \int_0^1 \sqrt{g_{\gamma(t)}(\gamma'(t),\gamma'(t))}\, dt$ (which is independent of the parametrization).

In the case of $V:={\mathbb R}^3$, $dx,dy,dz$ are the usual differential forms, their squares $(dx)^2,(dy)^2,(dz)^2$ are maps ${\mathbb R}^3 \rightarrow {\rm Sym}(V^*)$, and their sum gives the usual metric.

Same for the sphere ${\mathbb S}^2$. The differentials of $\theta$ and $\phi$ are 1-forms $d\theta, d\phi :{\mathbb S}^2 \rightarrow T^*({\mathbb S}^2)$, and the map $r^2 (d\theta)^2 + r^2 \sin^2\theta\,(d\phi)^2 : {\mathbb S}^2 \rightarrow {\rm Sym }(T^*{\mathbb S}^2)$ (with $\theta$ the polar angle) is the usual Riemannian metric.
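As a numeric sanity check of the round metric (in the convention where $\theta$ is the polar angle, so $ds^2 = r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2$), this sketch computes the length of the equator half from $\phi = 0$ to $\phi = \pi$ on a sphere of hypothetical radius $r = 2$; the answer should be $\pi r$:

```python
import math

# Length of the half-equator theta = pi/2, phi = t for t in [0, pi],
# using ds^2 = r^2 dtheta^2 + r^2 sin^2(theta) dphi^2.
r = 2.0

def speed(t):
    theta = math.pi / 2          # equator: polar angle fixed
    dtheta, dphi = 0.0, 1.0      # derivatives of (theta, phi) w.r.t. t
    return math.sqrt(r**2 * dtheta**2 + r**2 * math.sin(theta)**2 * dphi**2)

n = 1000
length = sum(speed((k + 0.5) / n * math.pi) * math.pi / n for k in range(n))
print(length)   # approximately pi * r = 6.2831...
```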

Solution 4:

[This is best considered a comment, but a bit too long to post as one]

I'm not an expert on this subject, but I can give you a reference that you might find helpful. The book I have in my collection is called "A Primer of Infinitesimal Analysis" by John L. Bell. It is an alternative approach to real analysis that rigorously develops an idea of an infinitesimal that is different from Robinson's. The approach is intrinsically based on constructive logic (denies the law of excluded middle); thus the 'admissible functions' you see are a lot different from the ones you encounter in classical analysis.

The number system that is developed is not quite a field---it behaves a lot like $\mathbb{R}[x]/(x^2)$. Thus it contains a subfield isomorphic to $\mathbb{R}$, but around each point is an 'infinitesimal neighborhood.' Unlike in Robinson's approach, the elements in the infinitesimal neighborhood around 0 square to 0.

There is also a marked difference in philosophy. This approach does not consider $\mathbb{R}$ as a bunch of points tightly packed together, but rather as a lot of infinitesimal line segments overlapping and intermingling (I believe this idea formalizes what you are trying to capture). The graphs of the functions to and from this system are seen as 'tilting' these line segments, and the derivative at a 'point' is the tilt of the infinitesimal neighborhood that contains that point.
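The algebra $\mathbb{R}[x]/(x^2)$ mentioned above can be played with directly as "dual numbers" $a + b\varepsilon$ with $\varepsilon^2 = 0$. This is only a classical model, not Bell's constructive system itself, but it exhibits the key computation: expanding $f(a + \varepsilon)$ gives $f(a) + f'(a)\varepsilon$, so the $\varepsilon$-coefficient is exactly the "tilt" described above (the example function $f$ is a hypothetical choice):

```python
# Dual numbers: a + b*eps with eps^2 = 0, a classical model of R[x]/(x^2).
class Dual:
    def __init__(self, re, eps=0.0):
        self.re, self.eps = re, eps

    def __add__(self, other):
        return Dual(self.re + other.re, self.eps + other.eps)

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
        return Dual(self.re * other.re,
                    self.re * other.eps + self.eps * other.re)

eps = Dual(0.0, 1.0)
print((eps * eps).re, (eps * eps).eps)   # 0.0 0.0 -- eps squares to 0

def f(x):
    # f(x) = x^2 + x, so f'(x) = 2x + 1
    return x * x + x

y = f(Dual(3.0) + eps)
print(y.re, y.eps)   # 12.0 7.0 -- f(3) = 12 and f'(3) = 7
```

Evaluating $f$ on $3 + \varepsilon$ produces the derivative $f'(3) = 7$ purely algebraically, with no limits taken anywhere.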

If you don't mind me asking, what's not 'rigorous' about Robinson's system?