Polar Coordinate Transformation - Motivation

I am trying to work out the reason why the integral

$$ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}{e^{-(x^2+y^2)}}\,dx\,dy $$

is, in polar coordinates,

$$ \int_{0}^{2\pi}\int_{0}^{\infty}{e^{-r^2}} \,r\,dr\,d\theta$$

As I understand it, a polar coordinate transformation involves the following substitution:

$$ (x,y) \rightarrow (r\cos{\theta},r\sin{\theta})$$

This would imply that

$$-(x^2+y^2) = -((r\cos{\theta})^2+(r\sin{\theta})^2) = -r^2((\cos{\theta})^2+(\sin{\theta})^2) = -r^2 $$

which gets us this far

$$ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}{e^{-r^2}}\,dx\,dy $$

To motivate why

$$ dx\,dy = r\,dr\,d\theta$$

I thought of the following argument:

$$ dx = d(x(\theta,r)) = \frac{\partial(x(\theta,r))}{\partial\theta}d\theta + \frac{\partial(x(\theta,r))}{\partial r}dr \,+ \frac{\partial^2(x(\theta,r))}{(\partial r)^2}(dr)^2 + \, ... \\ =r(-\sin{\theta})\,d\theta+ dr\cos{\theta}\, +\,...$$

$$dy = r(\cos{\theta})\,d\theta+ dr\sin{\theta} \,+\,...$$

$$ \therefore\: dx\,dy = [r(-\sin{\theta})\,d\theta+ dr\cos{\theta}\, +\,...]\,[ r(\cos{\theta})\,d\theta+ dr\sin{\theta} \,+\,...] \\ = r\,dr\,d\theta \, ((\cos{\theta})^2 -(\sin{\theta})^2)\,+\,... \\ = r\,dr\,d\theta \, ((\cos{\theta})^2 -(\sin{\theta})^2),\, \text{ ignoring }\textit{o}((dr)^2) \text{ and } \textit{o}((d\theta)^2)\text{ terms.} $$

However, I am off by a minus sign. With a plus sign in place of the minus, I could argue

$$dx\,dy = r\,dr\,d\theta \, ((\cos{\theta})^2 + (\sin{\theta})^2) = r\,dr\,d\theta $$

If it were correct, I would find this line of argument much more analytically convincing than the typical argument involving

$$ dx\,dy = dA = r\,dr\,d\theta$$

which I find to be less mechanistically obvious than the substitution-based argument above, which I am not able to fully justify.

Could you please tell me whether my substitution-based argument can work, potentially by correcting some mistake or another that I might have made? If not, do you have any similarly analytical or mechanistic justification as to why $ dx\,dy = r\,dr\,d\theta$ ?

Solution 1:

No, this mere substitution argument cannot work, and you haven't made an algebraic mistake at all. This goes precisely to show you that these differentials $dx,dy, dr, d\theta$ are NOT real numbers. So, there is absolutely no reason you should treat them as real numbers and perform the usual arithmetic of real numbers with them. Therefore, the better question to ask yourself is why would you even expect this naive "substitution" to work? Sure, in single variable calculus, many such arguments just happen to "work", but of course this is not at all a good reason to expect it to keep working.

So, how do we treat such integrals? The key result is the multi-dimensional change of variables formula. The proof of this formula is very technical and not easy at all, but the rough idea is, I think, straightforward. A very rough way of saying it is that in cartesian coordinates, the function you're integrating is $f(x,y) = e^{-(x^2+y^2)}$, so when you change to polar coordinates, you have modified your function to \begin{align} F(r, \theta) = f(r \cos \theta, r \sin \theta) = e^{-r^2}. \end{align} If we modify the function, then of course we have to modify how we measure the areas. Now, in cartesian coordinates, a little area element is described as $dx \, dy$, which you intuitively think of as a small box with sides $dx$ and $dy$. But if you now want to integrate in polar coordinates, you have to think about what a little area element looks like (roughly, a small polar box has sides of length $dr$ and $r \, d\theta$). There are several geometrical justifications available for this on this site and others, and the correct answer is $r \, dr \, d \theta$. That factor of $r$ encodes how the area changes between measuring things in cartesian coordinates and measuring things in polar coordinates.

Once again, there is absolutely no reason to think the correct area element should be obtained by plugging in $dx = \cos \theta \, dr + \dots$ and likewise for $dy$, into the "product" (however it is defined) $dx \, dy$. Nor should you expect that "because $dA = dx \, dy$ in cartesian coordinates, therefore $dA = dr \, d \theta$ in polar coordinates". None of these is a proper way of determining the area element in a different coordinate system. It's not only wrong, but there's also no reason to expect it to work.

In general, the correct scale factor which accounts for the difference in area measurements in different coordinates is the absolute value of the determinant of the derivative of the coordinate transformation. This is a non-trivial issue, and there are several answers on this site which explain in more detail why this is the right "conversion factor" (and also entire chapters of textbooks devoted to this formula and its geometric significance).

Now that you're hopefully convinced that naive substitution is not even a reasonable thing to do, let's see how one can arrive at the answer mechanically. It's very simple, and even very memorable: \begin{align} dx \, dy &= \left| \det \dfrac{\partial (x,y)}{\partial (r, \theta)} \right| \, dr \, d \theta \end{align} where that funny derivative is a $2 \times 2$ matrix of partial derivatives, which in this case is: \begin{align} \dfrac{\partial (x,y)}{\partial (r, \theta)} &= \begin{pmatrix} \cos \theta & -r \sin \theta \\ \sin \theta & r \cos \theta \end{pmatrix} \end{align} The determinant is $r \cos^2 \theta + r \sin^2 \theta = r$, so its absolute value is $|r|$, which is simply $r$, since $r \geq 0$. Therefore, $dx \, dy = r \, dr \, d \theta$.
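As a rough numerical sanity check of the determinant formula, here is a minimal Python sketch. The function names (`polar_to_cartesian`, `jacobian_det`, `gaussian_integral_polar`), the step size `h`, and the cutoff `r_max` are illustrative choices made for this sketch, not anything from a library. It estimates the Jacobian determinant by central differences and approximates the polar integral with a midpoint rule, truncating the radial range at `r_max` on the assumption that the tail of $e^{-r^2}$ beyond it is negligible.

```python
import math

def polar_to_cartesian(r, theta):
    """Map polar coordinates (r, theta) to Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_det(r, theta, h=1e-6):
    """Central-difference estimate of det d(x, y)/d(r, theta)."""
    x_rp, y_rp = polar_to_cartesian(r + h, theta)
    x_rm, y_rm = polar_to_cartesian(r - h, theta)
    x_tp, y_tp = polar_to_cartesian(r, theta + h)
    x_tm, y_tm = polar_to_cartesian(r, theta - h)
    dx_dr = (x_rp - x_rm) / (2 * h)
    dy_dr = (y_rp - y_rm) / (2 * h)
    dx_dt = (x_tp - x_tm) / (2 * h)
    dy_dt = (y_tp - y_tm) / (2 * h)
    # Determinant of the 2x2 matrix [[dx_dr, dx_dt], [dy_dr, dy_dt]].
    return dx_dr * dy_dt - dx_dt * dy_dr

def gaussian_integral_polar(r_max=8.0, n=2000):
    """Midpoint rule for the integral of e^{-r^2} r over 0 <= r <= r_max,
    multiplied by 2*pi because the integrand does not depend on theta."""
    dr = r_max / n
    radial = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        radial += math.exp(-r * r) * r * dr
    return 2 * math.pi * radial

if __name__ == "__main__":
    print(jacobian_det(2.0, 0.7))     # should be close to r = 2
    print(gaussian_integral_polar())  # should be close to pi
```

The second function relies on the factor of $r$ being present; dropping it from the integrand gives a visibly different number, which is one way to see that the scale factor is not optional.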

Another way to approach the subject is by means of differential forms. The differential forms machinery pretty much encodes the determinant of the Jacobian matrix in its definition. Now, I won't detail the entire theory of differential forms, but here are the "rules" (again, if you want to know more, there are entire books on the subject).

Here, things like $dx, dy, dr, d \phi, d \theta$ etc. are no longer real numbers or "infinitesimally small but non-zero numbers" (whatever that is supposed to mean in typical analysis). What are they? They're sections of the cotangent bundle. OK, that's probably just some mumbo jumbo right now, but here's how we calculate with them.

For the purposes of integration, we no longer write $dx \, dy$. Instead, we put a little $\wedge$, like $dx \wedge dy$; this is called the wedge product. The key property it has is that $dx \wedge dy = - dy \wedge dx$, so it changes sign whenever you flip two of them. In particular, $dx \wedge dx = - dx \wedge dx$ (I flipped them) and therefore $dx \wedge dx = 0$. In other words, whenever you wedge something with itself, it vanishes. Finally, all the other rules, like associativity and distributivity ("expanding brackets"), work as normal. Now we can compute: \begin{align} dx \wedge dy &= \left( \cos \theta dr - r \sin \theta \, d \theta\right)\wedge \left( \sin \theta dr + r \cos \theta \, d \theta\right) \\ &= \left( \cos \theta \sin \theta \right) \underbrace{dr \wedge dr}_{=0} + \left(r \cos^2 \theta \right) dr \wedge d \theta + \left( - r \sin^2 \theta\right) \underbrace{d \theta \wedge dr}_{ = -dr \wedge d \theta} + \left( -r^2 \sin \theta \cos \theta\right)\underbrace{d \theta \wedge d \theta}_{=0} \\ &= r \, dr \wedge d \theta. \end{align}
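The wedge computation above can be mimicked in a few lines of Python. This is only a sketch under a simplifying assumption: a one-form $a \, dr + b \, d\theta$ at a point is stored as the pair `(a, b)`, and the helper names (`wedge`, `naive_product`, `dx`, `dy`) are made up for illustration. It compares the antisymmetric wedge product with the commutative "multiply and keep the cross terms" product from the question, which is exactly where the sign of the $\sin^2 \theta$ term differs.

```python
import math

def dx(r, theta):
    """Coefficients (a, b) of dx = a*dr + b*dtheta for x = r*cos(theta)."""
    return (math.cos(theta), -r * math.sin(theta))

def dy(r, theta):
    """Coefficients (a, b) of dy = a*dr + b*dtheta for y = r*sin(theta)."""
    return (math.sin(theta), r * math.cos(theta))

def wedge(alpha, beta):
    """Coefficient of dr ^ dtheta in alpha ^ beta.

    Uses dr ^ dr = 0, dtheta ^ dtheta = 0 and dtheta ^ dr = -dr ^ dtheta,
    so (a1 dr + b1 dtheta) ^ (a2 dr + b2 dtheta) = (a1*b2 - b1*a2) dr ^ dtheta.
    """
    a1, b1 = alpha
    a2, b2 = beta
    return a1 * b2 - b1 * a2

def naive_product(alpha, beta):
    """Cross-term coefficient if dr and dtheta are multiplied like
    ordinary commuting numbers (the approach that fails): a1*b2 + b1*a2."""
    a1, b1 = alpha
    a2, b2 = beta
    return a1 * b2 + b1 * a2

if __name__ == "__main__":
    r, theta = 2.0, 0.7
    # Wedge product: r*(cos^2 + sin^2) = r.
    print(wedge(dx(r, theta), dy(r, theta)))
    # Commuting product: r*(cos^2 - sin^2) = r*cos(2*theta).
    print(naive_product(dx(r, theta), dy(r, theta)))
```

The only difference between the two functions is the sign in front of `b1 * a2`, which is the sign contributed by $d\theta \wedge dr = -dr \wedge d\theta$.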

It is a general fact that the factor you get in front will always be the determinant of the Jacobian matrix of the coordinate change. The main reason for this is that there is a deep connection between determinants, alternating objects like differential forms, and volumes of parallelepipeds (which you may have learnt in linear algebra).

These relations are very general, and work for any coordinate system. For example, if you consider parabolic coordinates $\xi, \eta$, whose relation to cartesian coordinates is $x = \xi \eta$ and $y = \dfrac{1}{2}(\eta^2 - \xi^2)$, you can work through a similar mechanical procedure to find \begin{align} dx \wedge dy &= (\xi^2 + \eta^2) \, d\xi \wedge d \eta. \end{align}
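For readers who want to see the intermediate steps of the parabolic-coordinate case, here is one way the computation can be written out, using only the wedge rules stated earlier. The differentials follow from $x = \xi \eta$ and $y = \frac{1}{2}(\eta^2 - \xi^2)$: \begin{align} dx &= \eta \, d\xi + \xi \, d\eta, \qquad dy = -\xi \, d\xi + \eta \, d\eta, \\ dx \wedge dy &= \left( \eta \, d\xi + \xi \, d\eta \right) \wedge \left( -\xi \, d\xi + \eta \, d\eta \right) \\ &= \left( -\xi \eta \right) \underbrace{d\xi \wedge d\xi}_{=0} + \eta^2 \, d\xi \wedge d\eta + \left( -\xi^2 \right) \underbrace{d\eta \wedge d\xi}_{= -d\xi \wedge d\eta} + \left( \xi \eta \right) \underbrace{d\eta \wedge d\eta}_{=0} \\ &= (\xi^2 + \eta^2) \, d\xi \wedge d\eta. \end{align} The same factor appears as the determinant of the Jacobian matrix: \begin{align} \det \dfrac{\partial (x,y)}{\partial (\xi, \eta)} &= \det \begin{pmatrix} \eta & \xi \\ -\xi & \eta \end{pmatrix} = \eta^2 + \xi^2. \end{align}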

Final Remarks.

There are several subtleties which I have had to gloss over, like the concept of orientation, why differential forms are "suitable objects to integrate", etc. But hopefully this gives you an idea of why naive substitution doesn't (and shouldn't be expected to) work, and why one needs an appropriate scale factor to treat the areas; also, hopefully there have been enough "mechanical" rules to help (of course you should eventually try to understand them at a deeper level).

One more thing I'd like to highlight is that while the differential forms approach gives a very quick and easy mechanical route to the change of volume factor, and while computing determinants algebraically is very straightforward, it is very important to keep in mind the geometry behind these formulas. These Riemann integrals are all about areas of various (simple) figures, so it is absolutely essential to at least somewhat connect the algebraic significance of the determinant to its geometric significance regarding areas/volumes.
