Is this "derivation" of the path length formula actually correct?

What he did is correct, though the reasons for doing it seem to have been glossed over. I'll give you a rigorous version of what he did.

Let $\Delta x > 0$ represent the length of some horizontal line segment, and for now let $y = f(x)$, where $f$ is some function. (If $y$ is not a function of $x$, simply break the curve up into several pieces, each of which is.) Then define $$\Delta y = f(x + \Delta x) - f(x)$$ Now, $\Delta x$ and $\Delta y$ are the lengths of the legs of a right triangle, so we should probably talk about the hypotenuse as well, whose length I will denote by $\Delta s$. By the Pythagorean Theorem, $$(\Delta s)^2 = (\Delta x)^2 + (\Delta y)^2 = (\Delta x)^2\left[1 + \left(\frac{\Delta y}{\Delta x}\right)^2\right]$$ (where I pulled out $(\Delta x)^2$ from both terms on the RHS). Substituting our two expressions from above, $$(\Delta s)^2 = (\Delta x)^2\left[1 + \left(\frac{f(x + \Delta x) - f(x)}{\Delta x}\right)^2\right]$$ $$\implies \left(\frac{\Delta s}{\Delta x}\right)^2 = 1 + \left(\frac{f(x + \Delta x) - f(x)}{\Delta x}\right)^2$$

Now, take limits of both sides as $\Delta x \rightarrow 0$: $$\lim\limits_{\Delta x \rightarrow 0}\left(\frac{\Delta s}{\Delta x}\right)^2 = \lim\limits_{\Delta x \rightarrow 0}\left[1 + \left(\frac{f(x + \Delta x) - f(x)}{\Delta x}\right)^2\right]$$ $$\implies \left(\lim\limits_{\Delta x \rightarrow 0}\frac{\Delta s}{\Delta x}\right)^2 = 1 + \left(\lim\limits_{\Delta x \rightarrow 0}\frac{f(x + \Delta x) - f(x)}{\Delta x}\right)^2$$ (since squaring is continuous, the limit may be moved inside the square on each side, provided the limits exist) $$\implies \left(\frac{ds}{dx}\right)^2 = 1 + \left[f'(x)\right]^2 = 1 + \left(\frac{dy}{dx}\right)^2$$ $$\implies \left|\frac{ds}{dx}\right| = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}$$ If you assume $s$ increases as $x$ increases, i.e. the length $s$ of your path increases as you move from left to right, we can drop the absolute values: $$\frac{ds}{dx} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}$$ or, in differential form, $$ds = dx\sqrt{1 + \left(\frac{dy}{dx}\right)^2}$$
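If you want a quick numerical sanity check of this result, here is a minimal sketch in Python (the curve $f(x) = x^2$ on $[0,1]$ and the step size are my own choices for illustration): summing the chord lengths $\sqrt{(\Delta x)^2 + (\Delta y)^2}$ over a fine partition should agree with integrating $\sqrt{1 + (dy/dx)^2}\,dx$.

```python
import math

# Example curve (my choice): y = f(x) = x^2 on [0, 1].
f = lambda x: x * x
fprime = lambda x: 2 * x  # dy/dx

a, b, n = 0.0, 1.0, 100_000
dx = (b - a) / n

# Sum of chord lengths sqrt((dx)^2 + (dy)^2) over a fine partition.
chord_sum = sum(
    math.hypot(dx, f(a + (i + 1) * dx) - f(a + i * dx)) for i in range(n)
)

# Midpoint-rule approximation of the integral of sqrt(1 + (dy/dx)^2) dx.
integral = sum(
    math.sqrt(1 + fprime(a + (i + 0.5) * dx) ** 2) * dx for i in range(n)
)

# Closed form for this particular curve: sqrt(5)/2 + asinh(2)/4 ≈ 1.4789429.
exact = math.sqrt(5) / 2 + math.asinh(2) / 4

print(chord_sum, integral, exact)  # the three values agree closely
```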


This is what helped me stay afloat during general relativity classes; it might help you too. First note that this is always in the context of some (given or arbitrary) path $S$. Say your path is parametrised by some parameter $t$, and that the path goes from $a$ to $b$ as $t$ goes from $0$ to $1$, just to have something concrete to work with. Then $ds = \sqrt{dx^2+dy^2}$ can be translated into $$ \frac{ds}{dt}=\sqrt{\left(\frac{dx}{dt}\right)^2+\left(\frac{dy}{dt}\right)^2} $$ or, correspondingly (using the appropriate form of the fundamental theorem of calculus), $$ \int_a^bds=\int_0^1\sqrt{\left(\frac{dx}{dt}\right)^2+\left(\frac{dy}{dt}\right)^2}dt $$ Now you can factor $\frac{dx}{dt}$ out of the square root and apply the chain rule to get what I have learned to think of when I see $ds = dx \sqrt{1+\left( \frac{dy}{dx} \right)^2}$, namely $$ \int_a^bds=\int_0^1\left|\frac{dx}{dt}\right| \sqrt{1+\left(\frac{dy}{dx}\right)^2}dt $$ which is valid as long as the path isn't vertical ($\frac{dy}{dx}$ is evaluated along the path, where $y$ can locally be regarded as a function of $x$ as long as the path isn't vertical). But if the path is vertical, then you can factor $\frac{dy}{dt}$ out of the square root instead, and everything works out nicely. The integral above can, of course, also be turned into an integral over $x$ by a simple substitution if that makes the problem easier.
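To make that last remark concrete, here is a small numerical sketch (the path $x(t) = t^2$, $y(t) = t^3$ on $t \in [0,1]$ is my own example, not something from the answer above): along this path $y = x^{3/2}$, so $\frac{dy}{dx} = \frac{3}{2}\sqrt{x}$, and the integral over $t$ and the substituted integral over $x$ should give the same length.

```python
import math

n = 200_000

# Path (my example): x(t) = t^2, y(t) = t^3 for t in [0, 1].
dxdt = lambda t: 2 * t
dydt = lambda t: 3 * t * t

# Parametric form: integrate sqrt((dx/dt)^2 + (dy/dt)^2) dt over t in [0, 1].
dt = 1.0 / n
length_t = sum(
    math.hypot(dxdt((i + 0.5) * dt), dydt((i + 0.5) * dt)) * dt for i in range(n)
)

# After substituting to x: y = x^(3/2), dy/dx = 1.5*sqrt(x); integrate over x in [0, 1].
dydx = lambda x: 1.5 * math.sqrt(x)
dx = 1.0 / n
length_x = sum(
    math.sqrt(1 + dydx((i + 0.5) * dx) ** 2) * dx for i in range(n)
)

print(length_t, length_x)  # both approximately (13**1.5 - 8)/27 ≈ 1.4397
```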

Note that in general relativity (and perhaps elsewhere in physics) the form $ds^2=dx^2+dy^2$ is not unheard of. As far as I can tell that's just because they don't want to bother with the square root sign when writing down formulae. You still definitely have to put the square root back into the expression before any calculations can be done.


Although mathematicians go nuts over this stuff, abusing notation and working with symbols in a way that relies on intuition is often critical to a physicist and useful to a mathematician who wants to build deeper understanding and intuition. We should all be grateful that not everyone is paralyzed by rigor-mortis because scientific discovery benefits from the audacity to break formal rules.

A mathematician would say that for a parameterized curve $\gamma(t) = (x(t), y(t))$, the arc length along the curve between parameter values $t_a$ and $t_b$ is

$$ \int_{t_a}^{t_b} |\dot\gamma(t)|\, dt = \int_{t_a}^{t_b} \sqrt{\dot x(t)^2 + \dot y(t)^2}\, dt $$

This definition is motivated by thinking about little pieces of time $dt$ and the fact that $|\dot\gamma(t)|$ is the speed at time $t$. Over a little piece of time, the speed times the time gives a little, nearly straight piece of distance traveled, and then one simply adds up the lengths of all these little pieces. In other words, there is a very nice physical motivation for this definition of arc length along a parameterized curve.
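To see that motivation in action, here is a minimal sketch (the quarter circle $\gamma(t) = (\cos t, \sin t)$, $t \in [0, \pi/2]$, is my own example): chop time into little pieces, multiply speed by time for each piece, and add; the sum lands on the true arc length $\pi/2$.

```python
import math

# Quarter circle (my example): gamma(t) = (cos t, sin t), t in [0, pi/2].
# The speed |gamma'(t)| is identically 1, and the true arc length is pi/2.
xdot = lambda t: -math.sin(t)
ydot = lambda t: math.cos(t)

n = 10_000
dt = (math.pi / 2) / n

# Add up speed * (little piece of time) over all the little pieces.
length = sum(
    math.hypot(xdot((i + 0.5) * dt), ydot((i + 0.5) * dt)) * dt for i in range(n)
)

print(length, math.pi / 2)  # ~1.5707963, i.e. pi/2
```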

Now as a special case, suppose that the curve whose arc length we're trying to compute admits a parameterization of the following form:

$$ \alpha(s) = (s, y(s)). $$

Since the $x$-coordinate function is simply the identity function, we might as well call this parameter $x$ (it's a dummy variable anyway), in which case we get

$$ \alpha(x) = (x,y(x)). $$

Now plug this into the original arc length definition to obtain

$$ \int_{x_a}^{x_b} |\alpha'(x)|dx = \int_{x_a}^{x_b}\sqrt{1 + y'(x)^2}\, dx. $$

This is precisely the formula written down by the physicist.
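As a sanity check that the special case really is just the general definition in disguise, here is a small sketch (the curve $y = x^2$ on $[0,1]$ and the alternative parameterization $\gamma(t) = (t^2, t^4)$ are my own choices): both computations should return the same length.

```python
import math

n = 100_000

# Graph form: alpha(x) = (x, x^2), |alpha'(x)| = sqrt(1 + (2x)^2), x in [0, 1].
dx = 1.0 / n
length_graph = sum(
    math.sqrt(1 + (2 * ((i + 0.5) * dx)) ** 2) * dx for i in range(n)
)

# A different parameterization of the same curve: gamma(t) = (t^2, t^4), t in [0, 1].
dt = 1.0 / n
length_param = sum(
    math.hypot(2 * ((i + 0.5) * dt), 4 * ((i + 0.5) * dt) ** 3) * dt for i in range(n)
)

print(length_graph, length_param)  # both approximately 1.4789
```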

Comment on the Edit. Treating derivatives as difference quotients of small quantities often works because, well, that really is what a derivative is doing. Look at the definition of the derivative as a limit of a difference quotient. If $\Delta x$ is small, then replacing the derivative by the difference quotient won't generally incur a large error, so it's not always such a bad way to look at things. This should, of course, be taken with a grain of salt when you want to rigorously clean everything up in the end, but it's often unproductive to tie your hands and not think about these things intuitively, especially when you're first learning them.
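For instance, here is a tiny numerical illustration (using $f(x) = \sin x$ at $x = 1$, my own choice): the difference quotient tracks $f'(x)$, and the error shrinks roughly in proportion to $\Delta x$.

```python
import math

f = math.sin
x = 1.0
exact = math.cos(x)  # f'(x) for f = sin

for dx in (0.1, 0.01, 0.001, 0.0001):
    quotient = (f(x + dx) - f(x)) / dx
    # The error of the difference quotient is roughly (dx/2)*|f''(x)|, about 0.42*dx here.
    print(dx, quotient, abs(quotient - exact))
```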

I enjoy and appreciate rigor as much as any other respectable citizen off the street, but I've also learned to appreciate that working loosely with mathematical quantities can often lead to great intuition and insight. Take, for example, path integrals in physics. No one really knows how to define these beasts in a way that would satisfy a modern mathematician (especially path integrals in quantum field theory), but nonetheless physicists' formal manipulations have led to some of the most accurately predicted measurements in human history.


Your professor probably doesn't mean dx and dy to be tiny numbers. He was probably saying that as a minor justification of how he set up the original equation.

I can't fully prove that what your professor did was valid; after all, I am willing to assume that his first equation was valid. As many answers point out, that is his 'definition' of arc length. Of course, one can go deeper and somehow prove that it is arc length, but let's be frank: arc length is a human-defined term. We have to accept that as the starting point.

Moving on from that, however, I believe I can clarify, in my own words, what your professor was trying to do in a mathematical manner.

Proposition: For every pair of real-valued functions $f$ and $g$, there exist differential functors $Pf$ and $Pg$ such that $\frac{Pf}{Pg} = D_g(f)$, $f \cdot Pg = \int f \, dg$, and $\frac{Pf}{Pf} = 1$.

What this all does is establish that the handwavy math that the professor did was valid. Things like "dx" (in my eyes) are neither functions nor numbers. They are something else. I don't know what to call them, but the algebra of them appears to work just fine.

Think of it this way. The professor is not doing algebra of numbers or an algebra of functions. He is doing an algebraic manipulation of something else, and I wouldn't write it off as garbage. All the professor truly needs to do is set forth what kind of objects these things are. Are they elements of a function space? Are they infinitesimally small numbers? Are they some kind of number-operator hybrid (never heard of such a thing but physics people are weird)?

Honestly, the only one who can answer this question is your professor. Just tell the professor that you recognize that dx and dy are neither numbers, functions, nor operators, and that you wish to know what they are. If he just says "they are numbers", then you have your answer: he is most certainly doing some non-rigorous stuff and messing up the math (but it is physics, so that shouldn't be an issue).

Chances are, he'll point you in the direction of some other mathematical construct and you'll learn something new. Or maybe he'll just say that's how he learned it and he was simply attempting to 'justify' the definition.

To answer the question in your edit: I think it really is like I said. You can write down any equation you want and claim the symbols are the objects I set forth; the issue is whether or not the result is well defined. I see no reason it wouldn't be. The worst case is that you get something that is neither an integral nor a derivative, and in that situation I'd argue that the equation is simply unknown in meaning, not just undefined.