Inductive proof of the gcd Bézout identity (from Apostol, Mathematical Analysis, 2nd ed.)

I've done proofs in discrete mathematics, but I'm still at the stage where proofs with more than a few steps make me uncomfortable.

From Apostol's Mathematical Analysis [2nd Ed.] on page 5, we have

Theorem 1.6. Every pair of integers $a$ and $b$ has a common divisor $d$ of the form $$ d = ax + by $$ where $x$ and $y$ are integers. Moreover, every common divisor of $a$ and $b$ divides this $d$.

The proof (with my questions throughout) goes as follows:

Proof. First assume that $a \geq 0, b \geq 0$ and use induction on $n = a + b$. If $n = 0$ then $a = b = 0$, and we can take $d = 0$ with $x = y = 0$. Assume, then, that the theorem has been proved for $0, 1, 2, ..., n - 1$.

I am a little confused about taking $n$ to be $a + b$, since it's not obvious that all pairs $\{a, b\}$ would be covered by induction for all combinations of $a, b \in \mathbb{Z}$.

By symmetry, we can assume $a \geq b$. If $b = 0$ take $d = a, x = 1, y = 0$.

OK.

If $b \geq 1$ we can apply the induction hypothesis to $a - b$ and $b$, since their sum is $a = n - b \leq n - 1$. Hence there is a common divisor $d$ of $a - b$ and $b$ of the form $d = (a - b)x + by$.

I'm going to let $a' = a - b$, let $b' = b$ and let $d' = a'x + b'y$. (I wish Apostol did something like this to make his proofs clearer.)

I don't understand this logical step. Why does the fact that $a' + b' \leq n - 1$ imply that $d'$ exists and is a common divisor of $a'$ and $b'$? This seems like a huge leap.

This $d$ also divides $(a - b) + b = a$, so $d$ is a common divisor of $a$ and $b$ and we have $d = ax + (y-x)b$, a linear combination of $a$ and $b$.

At this point I am clueless. Why does $d$ divide $a$ and why does this imply it also divides $b$? And where does Apostol get $y-x$ from??

To complete the proof we need to show that every common divisor divides $d$. Since a common divisor divides $a$ and $b$, it also divides the linear combination $ax + (y-x)b = d$. This completes the proof if $a \geq 0$ and $b \geq 0$. If one or both of $a$ and $b$ is negative, apply the result just proved to $|a|$ and $|b|$.

Why not just do the entire proof with absolute values from the beginning?


Soft question: is it normal for authors to be very terse and not explain or give motivation for any steps? How do you go about trying to understand proofs that require a higher level of intuition than you currently have?


Theorem 1.6. Every pair of integers $a$ and $b$ has a common divisor $d$ of the form $$ d = ax + by $$ where $x$ and $y$ are integers. Moreover, every common divisor of $a$ and $b$ divides this $d$.

The proof (with my questions throughout) goes as follows:

Proof. First assume that $a \geq 0, b \geq 0$ and use induction on $n = a + b$. If $n = 0$ then $a = b = 0$, and we can take $d = 0$ with $x = y = 0$. Assume, then, that the theorem has been proved for $0, 1, 2, ..., n - 1$.

I am a little confused about taking $n$ to be $a + b$, since it's not obvious that all pairs $\{a, b\}$ would be covered by induction for all combinations of $a, b \in \mathbb{Z}$.

Define the height $h$ of a point $(a,b)\in\Bbb N^2$ by $\,h(a,b) = a+b.\,$ We prove by induction on height that the statement $P(a,b)$ is true for all points $\,(a,b)\in\Bbb N^2.\,$ Since this type of induction often proves puzzling to students I will explain it from a geometric viewpoint to aid intuition.

The points $(x,y)$ of height $n$ satisfy $\,x+y = n,\,$ i.e. $\,y = n -x,\,$ so they are the lattice points on the line segment $\ell_n$ of slope $\,-1\,$ from $(0,n)$ to $(n,0)\,$ in the first quadrant. If we rotate the plane $\,45^\circ $ counter-clockwise then $\ell_n$ is the $n$'th horizontal line in the partition of the first quadrant (looking up from the origin).

These lines $\ell_n$ partition $\Bbb N^2$ so to prove that the statement $P$ is true for all points in $\Bbb N^2\,$ it suffices to prove that the statement $P$ is true for all points on each line $\,\ell_n,\,$ which we do by complete induction on $\,n,\,$ lifting the truth of $P$ on lower height lines $\ell_k,\ k < n\,$ up to the line $\,\ell_n.\,$
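As a small illustration of this picture (only the first few heights are shown), the lowest lines are

$$ \ell_0 = \{(0,0)\},\quad \ell_1 = \{(0,1),(1,0)\},\quad \ell_2 = \{(0,2),(1,1),(2,0)\},\quad \ell_3 = \{(0,3),(1,2),(2,1),(3,0)\}. $$

Each line $\ell_n$ contains exactly $n+1$ lattice points, and every $(a,b)\in\Bbb N^2$ lies on exactly one of them, namely $\ell_{a+b}$.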

By symmetry, we can assume $a \geq b$. If $b = 0$ take $d = a, x = 1, y = 0$. If $b \geq 1$ we can apply the induction hypothesis to $a - b$ and $b$, since their sum is $a = n - b \leq n - 1$. Hence there is a common divisor $d$ of $a - b$ and $b$ of the form $d = (a - b)x + by$.

I'm going to let $a' = a - b$, let $b' = b$ and let $d' = a'x + b'y$. (I wish Apostol did something like this to make his proofs clearer.)

I don't understand this logical step. Why does the fact that $a' + b' \leq n - 1$ imply that $d'$ exists and is a common divisor of $a'$ and $b'$? This seems like a huge leap.

$h(a',b') = h(a\!-\!b,b) = \color{#c00}a\!-\!b\!+\!\color{#c00}b = \color{#c00}n\!-\!b <n $ (by $\,b\ge 1)$ so $\,(a',b')\,$ is on lower height line $\,\ell_{n-b}\,$ so $P(a',b')$ is true (our induction hypothesis is that $P$ is true for all points on lower height lines).

Here $P(a,b) := [\![\,d\mid a,b\,$ and $\,d = ax+by\,$ for some $\,x,y\in\Bbb Z\,]\!],\,$ so $\,P(a',b')$ $\,\Rightarrow\,d\mid a',b'\,$ i.e. $\,d\mid a\!-\!b,\,b\,$ and $\,d = a'x+b'y = (a-b)x+by$.

This $d$ also divides $(a - b) + b = a$, so $d$ is a common divisor of $a$ and $b$ and we have $d = ax + (y-x)b$, a linear combination of $a$ and $b$.

At this point I am clueless. Why does $d$ divide $a$ and why does this imply it also divides $b$? And where does Apostol get $y-x$ from??

Here we are transforming the lower height statement $P(a',b')$ into the form $P(a,b)$ at height $n$. From lower height we have $\,d\mid a\!-\!b,\,b\,$ so $\,d\mid (a\!-\!b)+b = a,\,$ hence $\,d\mid a,b,\,$ which is what we need for $\,P(a,b)\,$ at height $n$. Similarly we lift the linear combination by rearranging it into the desired form $\,d = (a\!-\!b)x + by = ax+b(y\!-\!x) = ax+by'$ in the required $P(a,b)$ form.
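A minimal Python sketch of the recursion hidden in this induction may make the lifting step concrete. The function name `bezout` is my own (hypothetical, not Apostol's), and the sketch handles only the case $a, b \ge 0$ treated so far. Each recursive call corresponds to one appeal to the induction hypothesis at a lower height.

```python
def bezout(a, b):
    """Return (d, x, y) with d == a*x + b*y, mirroring the induction on a + b.

    Illustrative sketch only: assumes a >= 0 and b >= 0.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles a >= 0 and b >= 0")
    if a < b:
        # "By symmetry": solve for (b, a), then swap the coefficients back.
        d, x, y = bezout(b, a)
        return d, y, x
    if b == 0:
        # Case b = 0 of the proof: d = a, x = 1, y = 0.
        # (For a = b = 0 this gives d = 0; the proof picks x = y = 0 there.)
        return a, 1, 0
    # Inductive step: (a - b, b) has height a < a + b because b >= 1.
    d, x, y = bezout(a - b, b)
    # Lift: d = (a - b)*x + b*y = a*x + b*(y - x).
    return d, x, y - x
```

For instance, `bezout(5, 3)` returns `(1, -1, 2)`, i.e. $1 = 5(-1) + 3(2)$.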

To complete the proof we need to show that every common divisor divides $d$. Since a common divisor divides $a$ and $b$, it also divides the linear combination $ax + (y-x)b = d$. This completes the proof if $a \geq 0$ and $b \geq 0$. If one or both of $a$ and $b$ is negative, apply the result just proved to $|a|$ and $|b|$.

Why not just do the entire proof with absolute values from the beginning?

Because peppering sign handling throughout the proof would obfuscate the essence of the matter, which has nothing to do with signs. As you've seen, the proof can be challenging to understand already without this extra complexity.


Soft question: is it normal for authors to be very terse and not explain or give motivation for any steps? How do you go about trying to understand proofs that require a higher level of intuition than you currently have?

Yes, unfortunately many proofs are presented completely unmotivated so you have to "reverse engineer" them to discover the underlying intuition.

The intuition is obfuscated in this presentation. The key idea is that sets of integers closed under subtraction are closed under remainder, hence closed under gcd, so they are precisely the multiples of their least positive element (= gcd of all elements), as is easily proved by descent using the Euclidean algorithm (in subtractive form, as here, or in remainder form). This is explained in elementary language in this answer. It will be clarified if you study algebra (viz. Euclidean domains are PIDs).


I am a little confused about taking $n$ to be $a+b$, since it's not obvious that all pairs $\{a,b\}$ would be covered by induction for all combinations of $a,b\in\mathbb{Z}$.

Note that at this point in the proof we have already restricted our attention to non-negative integers $a,b$, according to the very first statement "First assume that $a\ge0$, $b\ge0$". The proof comes back to all integers at the very end, but for now $a,b$ are non-negative. For any such non-negative integers $a,b$, their sum $n=a+b$ is also a non-negative integer. So induction on $n\ge0$ covers all the pairs $\{a,b\}$ that we are currently considering.

Why does the fact that $a'+b'\le n-1$ imply that $d$ exists and is a common divisor of $a'$ and $b'$?

He didn't say that yet. But he will justify it in the next paragraph. For now, here's what has been said, using your notation for more clarity.

Let $a'=a-b$ and let $b'=b$. Then by the induction hypothesis there exists $d=a'x+b'y$ satisfying the conclusion of the theorem for $a'$ and $b'$, which in particular means that $d$ is a common divisor of both $a'=a-b$ and $b'=b$. Note that I intentionally used the notation "$d$" rather than "$d'$" for this new number.

Before we move on to the next part, let me reiterate where we are. For now, this $d$ has been found for $a'=a-b$ and $b'=b$, but not for $a$ and $b$ yet. However, as the next step, we will show that the very same $d$ works for $a$ and $b$ too.

Why does $d$ divide $a$ and why does this imply it also divides $b$?

Since $a=(a-b)+b=a'+b'$ and we know that $d$ divides both $a'$ and $b'$, it also divides their sum. And it divides $b=b'$ from the previous step.

And where does Apostol get $y-x$ from?

From $d=a'x+b'y=(a-b)x+by=ax-bx+by=ax+b(y-x)$.
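For a concrete instance (the numbers are chosen purely for illustration), take $a=5$ and $b=3$, so $a'=2$ and $b'=3$. One valid choice supplied by the induction hypothesis is $d=1$ with $x=-1$, $y=1$, and then

$$ d = a'x+b'y = 2(-1)+3(1) = 1, \qquad d = ax+b(y-x) = 5(-1)+3\bigl(1-(-1)\bigr) = -5+6 = 1. $$

The same $d$ and the same $x$ are reused; only the coefficient of $b$ changes, from $y$ to $y-x$.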

Why not just do the entire proof with absolute values from the beginning?

That's effectively exactly what he did by saying that first of all we consider the case of $a,b\ge0$.
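To spell out that final reduction (a sketch; the symbols $\varepsilon_a,\varepsilon_b$ are my notation, not Apostol's): write $a=\varepsilon_a|a|$ and $b=\varepsilon_b|b|$ with $\varepsilon_a,\varepsilon_b\in\{1,-1\}$. If the non-negative case gives $d=|a|x+|b|y$, then

$$ d = |a|x+|b|y = a(\varepsilon_a x) + b(\varepsilon_b y), $$

because $|a| = \varepsilon_a a$ and $|b| = \varepsilon_b b$. Since $a$ and $|a|$ have the same divisors (likewise $b$ and $|b|$), this $d$ is still a common divisor of $a$ and $b$ that every common divisor divides.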


Your crucial problem with this proof would appear to be at the point where you say: "I don't understand this logical step. Why does the fact that $a'+b'\le n-1$ imply that $d'$ exists and is a common divisor of $a'$ and $b'$? This seems like a huge leap."

If you consider the first paragraph of the proof you will see it stated that we are assuming the theorem has already been proved for every pair of non-negative integers whose sum is at most $n-1$.

Since $(a-b)+b = a$ is less than $a+b=n$ (because $b\ge 1$), we can therefore assume the theorem to be true for $a-b$ and $b$, and that is precisely what Apostol has done.