Choice of $q$ in Baby Rudin's Example 1.1

First, my apologies if this has already been asked/answered. I wasn't able to find this question via search.

My question comes from Rudin's "Principles of Mathematical Analysis," or "Baby Rudin," Ch. 1, Example 1.1 on p. 2. In the second part of the example, showing that the sets $A$ and $B$ have no largest and no smallest element respectively, he presents a seemingly arbitrary choice of a number $q$ that satisfies equations (3) and (4), plus the other conditions needed to make the proof work. As an exercise, I tried to derive his choice of $q$ so that I might learn more about the problem.

If we write equation (3) as $q = p - (p^2 - 2)x$, we can write (4) as

$$ q^2 - 2 = (p^2 - 2)[1 - 2px + (p^2 - 2)x^2]. $$

Here, we need a rational $x > 0$, chosen such that the expression in $[...]$ is positive. Using the quadratic formula and the sign of $(p^2 - 2)$, it can be shown that we need

$$ x \in \left(0, \frac{1}{p + \sqrt{2}}\right) \mbox{ for } p \in A, $$

or, for $p \in B$, $x \in \left(0, \frac{1}{p + \sqrt{2}}\right)$ or $x > \frac{1}{p - \sqrt{2}}$.

Notice that there are MANY solutions to these inequalities! The easiest to see, perhaps, is letting $x = 1/(p + n)$ for an integer $n \geq 2$. Rudin chooses $n = 2$ for his answer, but the choice checks out easily for other $n$ as well.
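
For instance, a quick check with exact rational arithmetic (a sketch in Python; the sample points $7/5 \in A$ and $3/2 \in B$ are arbitrary) confirms that $q = p - (p^2 - 2)/(p + n)$ lands in the same set as $p$, strictly between $p$ and $\sqrt{2}$, for several values of $n \geq 2$:

```python
from fractions import Fraction

def q(p, n):
    """q = p - (p^2 - 2) * x with x = 1/(p + n)."""
    return p - (p * p - 2) / (p + n)

# 7/5 is in A ((7/5)^2 = 49/25 < 2), 3/2 is in B ((3/2)^2 = 9/4 > 2).
for n in range(2, 6):
    for p in (Fraction(7, 5), Fraction(3, 2)):
        new = q(p, n)
        same_side = (p * p > 2) == (new * new > 2)      # q stays in the same set
        moved_toward_2 = (p < new) if p * p < 2 else (new < p)
        print(n, p, new, same_side and moved_toward_2)  # always True
```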

The Question: Why does Rudin choose $x = 1/(p + 2)$ specifically? Is it just to make the expressions work out clearly algebraically? Why doesn't he comment on his particular choice or the nature of the set of solutions that will work for the proof? Is there a simpler derivation for the number $q$ that I am missing?


The point here is that iterations of the Möbius transformation $p \mapsto \frac{2p+2}{p+2}$ converge to $\sqrt{2}$, so every time you apply the transformation you get closer to $\sqrt{2}$. This can be thought of as describing a generalized continued fraction

$$2 - \cfrac{2}{4 - \cfrac{2}{4 - \cfrac{2}{\cdots}}}$$

for $\sqrt{2}$. The dynamics of Möbius transformations in general are fairly well understood; for the transformation $p \mapsto \frac{ap+b}{cp+d}$ the possible fixed points are the roots of the quadratic equation $cp^2 + dp = ap + b$, and using the Banach fixed point theorem (or a closed form for the iterates of a Möbius transformation obtained via linear algebra) one can determine which of the fixed points are attracting and which are repelling.
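
For instance, a quick numerical sketch (the starting values $1$ and $10$ are arbitrary) shows the iterates of $p \mapsto \frac{2p+2}{p+2}$ settling on the attracting fixed point $\sqrt{2}$ from either side:

```python
import math

def rudin_step(p):
    """One application of the Mobius map p -> (2p + 2)/(p + 2)."""
    return (2 * p + 2) / (p + 2)

for p0 in (1.0, 10.0):          # one start below sqrt(2), one above
    p = p0
    for _ in range(10):
        p = rudin_step(p)
    print(p0, p, abs(p - math.sqrt(2)))   # both end up very close to 1.41421356...
```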

So if one wanted to design a Möbius transformation converging to $\sqrt{n}$ for some non-square $n$, this would require $d = a$ and $b = cn$; normalizing to $c = 1$, $b = n$ gives

$$p \mapsto \frac{ap + n}{p + a}$$

for some $a$, and it should not be hard to find a value of $a$ such that the fixed point $\sqrt{n}$ is attractive.
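
As a rough sketch of that recipe (the choices $n = 5$ and $a = 3$ here are arbitrary, not canonical):

```python
import math

def sqrt_map(n, a):
    """The Mobius map p -> (a*p + n)/(p + a); its fixed points are +-sqrt(n)."""
    return lambda p: (a * p + n) / (p + a)

f = sqrt_map(5, 3)       # target sqrt(5); a = 3 is an arbitrary positive choice
p = 1.0
for _ in range(15):
    p = f(p)
print(p, math.sqrt(5))   # the iterates settle near 2.2360679...
```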

While this technique is a nice trick, because the fixed-point equation $cp^2 + dp = ap + b$ is quadratic, it does not generalize to prove the corresponding result for roots of cubic or higher-degree polynomials.


In the interest of making this question and answer more self-contained, here is the example in question. My answer is below.

1.1 Example We now show that the equation $$ p^{2}=2\tag{1} $$ is not satisfied by any rational $p$. If there were such a $p$, we could write $p=m/n$ where $m$ and $n$ are integers that are not both even. Let us assume this is done. Then (1) implies $$ m^{2}=2 n^{2}. \tag{2} $$ This shows that $m^{2}$ is even. Hence $m$ is even (if $m$ were odd, $m^{2}$ would be odd), and so $m^{2}$ is divisible by 4. It follows that the right side of (2) is divisible by 4, so that $n^{2}$ is even, which implies that $n$ is even.

$\qquad$The assumption that (1) holds thus leads to the conclusion that both $m$ and $n$ are even, contrary to our choice of $m$ and $n$. Hence (1) is impossible for rational $p$.

$\qquad$We now examine this situation a little more closely. Let $A$ be the set of all positive rationals $p$ such that $p^{2}<2$ and let $B$ consist of all positive rationals $p$ such that $p^{2}>2$. We shall show that $A$ contains no largest number and $B$ contains no smallest.

$\qquad$More explicitly, for every $p$ in $A$ we can find a rational $q$ in $A$ such that $p<q$, and for every $p$ in $B$ we can find a rational $q$ in $B$ such that $q<p$.

$\qquad$To do this, we associate with each rational $p>0$ the number $$q=p-\frac{p^{2}-2}{p+2}=\frac{2 p+2}{p+2}.\tag{3}$$ Then $$q^{2}-2=\frac{2\left(p^{2}-2\right)}{(p+2)^{2}}.\tag{4}$$ $\qquad$If $p$ is in $A$ then $p^{2}-2<0$, (3) shows that $q>p$, and (4) shows that $q^{2}<2$. Thus $q$ is in $A$.

$\qquad$If $p$ is in $B$ then $p^{2}-2>0$, (3) shows that $0<q<p$, and (4) shows that $q^{2}>2$. Thus $q$ is in $B$.

(Transcribed from this image)

I think you hit the nail on the head. He was looking for a rational $y$ such that $q = p+y$ will have the desired properties in both cases.

First, if $p \in A$ we want $p<q \Leftrightarrow y > 0$ and if $p \in B$ we want $p>q \Leftrightarrow y < 0$. We might as well take advantage of the sign of $p^2-2$ in each case to achieve this by searching for a positive quantity $x$ such that $q = p - (p^2-2)x$.

As you showed, any choice $0 < x < 1/(p+\sqrt{2})$ will satisfy the requirements $p \in A \Rightarrow q \in A$ and $p \in B \Rightarrow q \in B$. He wanted to ensure $x$ was rational, and the easiest way to do this is to take $x = 1/(p+k)$ where $k$ is an integer larger than or equal to $2$. There's no need to complicate matters further than that, so he simply chooses the smallest $k$ which works, namely $2$.
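
A quick sanity check in exact rational arithmetic (the test point $7/5 \in A$ is arbitrary) shows why $k = 1$ is ruled out, since $1 < \sqrt{2}$, while every integer $k \geq 2$ works:

```python
from fractions import Fraction

def q_of(p, k):
    """q = p - (p^2 - 2)/(p + k), i.e. x = 1/(p + k)."""
    return p - (p * p - 2) / (p + k)

p = Fraction(7, 5)                  # an arbitrary point of A: (7/5)^2 = 49/25 < 2
for k in range(1, 7):
    new = q_of(p, k)
    print(k, new, "still in A" if new * new < 2 else "left A")
# k = 1 overshoots past sqrt(2); every k >= 2 keeps q in A.
```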

My derivation was the same as yours, and I doubt you could get it much simpler than that.

As for why he didn't comment on his choice: well, that's kind of just how Rudin is. He will rarely (if ever?) comment on the motivation for his proofs! It is endearing to some and perhaps a little infuriating to others.


Below I show that Rudin's approximation arises simply by applying the secant method - a difference analog of Newton's method for finding successively better approximations to roots.

As the linked Wikipedia article shows, the recurrence relation for the secant method is as below.

$$\rm S_{n+1}= \dfrac{S_{n-1}\ f\:(S_n) - S_n\ f\:(S_{n-1})}{f\:(S_n)-f\:(S_{n-1})}\qquad\qquad\qquad\qquad$$

For $\rm\ (S_{n-1},S_n,S_{n+1}) = (q,p,p')\ $ and $\rm\ f\:(x) = x^2-d\:,\:$ we obtain

$$\rm p'\ =\ \dfrac{q\:(p^2-d) - p\:(q^2-d)}{p^2-d-(q^2-d)}\ =\ \dfrac{(p-q)\:(p\:q+d)}{p^2-q^2}\ =\ \dfrac{p\:q+d}{p+q}$$

Finally specializing $\rm\: q = 2 = d\: $ yields Rudin's approximation $\rm\displaystyle\ p'\ =\ \frac{2\:p+2}{\ \:p+2}$
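
A brief numerical sketch of this (the starting pair $1, 2$ is arbitrary): the genuine secant iteration converges to $\sqrt{2}$, and freezing the second argument at $q = 2$ gives exactly Rudin's fixed-point iteration, which also converges, just more slowly:

```python
def secant_step(p, q, d):
    """One secant step for f(x) = x^2 - d, simplified to (p*q + d)/(p + q)."""
    return (p * q + d) / (p + q)

d = 2.0
p, q = 1.0, 2.0                       # two rough approximations to sqrt(2)
for _ in range(6):
    p, q = secant_step(p, q, d), p    # genuine secant iteration
print(p)                              # very close to 1.41421356...

p = 1.0
for _ in range(10):
    p = secant_step(p, 2.0, d)        # freeze q = 2: Rudin's p -> (2p + 2)/(p + 2)
print(p)
```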

The secant method has beautiful connections with the group law on conics. To learn about this folklore, I highly recommend Sam Northshield's Associativity of the Secant Method. The reader already familiar with the group law on elliptic curves, but unfamiliar with the degenerate case of conics, might also find helpful some of Franz Lemmermeyer's expositions, e.g. Conics - a poor man's elliptic curves.

Finally, note that the approximation can be derived purely algebraically as follows.

Given lower and upper approximations to a square-root, we may obtain a better lower approximation $\rm p'$ by "composing" them, namely:

THEOREM $\rm\displaystyle\quad\ \ q\ >\ \sqrt d\ > \ p\ \ \:\Rightarrow\:\ \ \sqrt d\ > \ p'\ >\ p\quad\ \ for\quad\ p' \:=\ \frac{p\:q+d}{p+q} $

Proof: $\rm\quad\displaystyle 0\ \: >\ (q-\sqrt d)\ \ (p-\sqrt d)\ =\ p\:q+d - (p+q)\:\sqrt d\ \ \Rightarrow\ \ \sqrt d\ >\ p'$

Finally $\rm\quad\quad\displaystyle p'-p\ =\ \frac{p\:q+d}{p+q} - p\ =\ \frac{\ d - p^2}{p+q}\: >\ 0\ \ \Rightarrow\ \ p'\ >\ p$
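
A numeric spot-check of the theorem, with $d = 2$ and the (arbitrary) rational bracket $p = 4/3 < \sqrt{2} < 3/2 = q$:

```python
from fractions import Fraction

d = 2
p, q = Fraction(4, 3), Fraction(3, 2)   # q > sqrt(2) > p:  9/4 > 2 > 16/9
p_new = (p * q + d) / (p + q)           # the "composed" approximation p'
print(p_new)                            # 24/17
print(p < p_new, p_new * p_new < d)     # True True, i.e. sqrt(d) > p' > p
```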