Proving the properties of polynomial functions without using calculus
I have a little project where I'm trying to prove most of the properties of polynomial functions without using any calculus whatsoever.
I've been able to show, using the completeness of the real numbers, that if a polynomial function $f: \mathbf{R} \rightarrow \mathbf{R}$ is such that $f(a) f(b) < 0$ for some reals $a < b$, then there exists a real number $c \in (a, b)$ such that $f(c) = 0$.
Then, I defined the concept of the derivative polynomial: Let $f: \mathbf{R} \rightarrow \mathbf{R}$ be the polynomial function given by $\displaystyle f(x) = \sum_{k=0}^n a_k x^k$ for all $x \in \mathbf{R}$. We define the derivative of $f$ as the polynomial function $f' : \mathbf{R} \rightarrow \mathbf{R}$ given by $\displaystyle f'(x) = \sum_{k=1}^n k a_k x^{k-1}$ for all $x \in \mathbf{R}$. I am trying to show, by elementary means, that if $a < b$ and $f'(x) > 0$ for all $x \in (a, b)$, then $f$ is increasing on $(a, b)$, but I have no idea how to do that.
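For concreteness, this purely formal derivative can be computed directly from a coefficient list. Here is a minimal Python sketch; the list representation $[a_0, a_1, \dots, a_n]$ and the function names are my own, not part of the question:

```python
def formal_derivative(coeffs):
    """Formal derivative of a polynomial given as a coefficient list.

    coeffs[k] is the coefficient a_k of x^k, so the derivative of
    sum(a_k x^k) has coefficients k * a_k on x^(k-1).
    """
    return [k * a for k, a in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# f(x) = 1 + 3x + 2x^2  has formal derivative  f'(x) = 3 + 4x
print(formal_derivative([1, 3, 2]))  # [3, 4]
```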
Solution 1:
This assumes that it is ok to use that $[x,y]$ is compact (which follows from the completeness you already used and the fact that such an interval is totally bounded). Let $x,y\in (a,b)$ with $x<y$, and let $z\in [x,y]$. Note that there is a polynomial $g$ (depending on $z$) such that $$\frac{f(z+h)-f(z)}{h}=f'(z)+hg(h)$$ for $h\in\Bbb R\setminus\{0\}$. This follows purely algebraically from the algebraic definition of the derivative.
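To see the identity concretely (my own worked example, not from the answer): for $f(x) = x^3$ we get $f(z+h) - f(z) = 3z^2h + 3zh^2 + h^3 = h\,\bigl(f'(z) + h\,(3z + h)\bigr)$, so $g(h) = 3z + h$. A quick exact check at integer points:

```python
def f(x):
    return x ** 3          # the sample polynomial f(x) = x^3

def f_prime(x):
    return 3 * x ** 2      # its formal derivative

def g(z, h):
    return 3 * z + h       # the remainder polynomial for f(x) = x^3 (depends on z)

# Check (f(z+h) - f(z)) / h == f'(z) + h * g(h); the difference is always
# divisible by h here, so integer division is exact.
for z in [-2, 0, 1, 5]:
    for h in [-3, -1, 1, 2]:
        assert (f(z + h) - f(z)) // h == f_prime(z) + h * g(z, h)
```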
By assumption $f'(z)\gt0$. It is not hard to see that polynomials are bounded on bounded intervals, hence for $h\gt0$ small enough, i.e. for $0\lt h\lt f'(z)/M$ where $M\gt0$ is a bound for $|g|$ on a bounded interval around $0$, we have $f'(z)+hg(h)\gt0$, hence $f(z+h)\gt f(z)$. Similarly, for $h\lt0$ small enough in absolute value, we have $f(z)\gt f(z+h)$. Thus for all $z\in[x,y]$ there is some $h\gt0$ such that $$f(z-h')\lt f(z)\lt f(z+h')$$ for all $0\lt h'\lt h$. As $[x,y]$ is (assumed to be) compact, there are finitely many points $x=z_1\lt \dots \lt z_n=y$ and radii $h_1,\dots,h_n\gt0$ such that $[x,y]\subseteq\bigcup_{i=1}^n (z_i-h_i,z_i+h_i)$ and $f(z_i-h')\lt f(z_i)\lt f(z_i+h')$ for $0<h'<h_i$. This then implies that $f(x)\lt f(y)$ as follows:
There is some $i\gt 1$ such that $(z_1-h_1,z_1+h_1)\cap (z_i-h_i,z_i+h_i)\ne\emptyset$, let then $$z\in (z_1-h_1,z_1+h_1)\cap (z_i-h_i,z_i+h_i)$$ where we may assume $z_1\lt z\lt z_i$, so we can write $z=z_1+h_1'=z_i-h_i'$ with $h_1'\lt h_1,h_i'\lt h_i$. Thus $f(x)=f(z_1)\lt f(z_1+h_1')=f(z_i-h_i')\lt f(z_i)$. Then we can pick some $j\gt i$ such that $f(z_j)\gt f(z_i)$ and so on. As we have only a finite number of $z_i$ this has to stop at some point which can only be $z_n=y$, so $f(x)=f(z_1)\lt f(z_i)\lt f(z_j)\lt \dots\lt f(y)$.
Solution 2:
Hint
I would proceed as follows, mimicking calculus. Let $p$ be a polynomial.
1. A polynomial is bounded on a bounded interval. The proof is specific to polynomials and uses basic properties of the reals.
2. For any interval $[a,b]$ with $a \lt b$, there exists a sequence $\{x_n\}$ in $[a,b]$ converging to some $x^* \in[a,b]$ such that $M=\sup\limits_{x \in [a,b]} p(x) = \sup \{ p(x_n) \mid n \in \mathbb N\}$. This uses the completeness of the reals and the least-upper-bound property, based on 1.
3. Prove Lipschitz continuity of $p$ on $[a,b]$: first for monomials, using $x^n-y^n=(x-y)(x^{n-1} + x^{n-2}y + \dots +y^{n-1})$, and then using linearity.
4. Using 2. and 3., prove that $p(x^*) = M$, i.e. that the supremum is attained.
5. Prove that if $p(a)=p(b)$, there exists $c \in (a,b)$ such that $p(c) = M$, even if it means changing $p$ into $-p$. This uses only the points above.
6. Prove Rolle's theorem for polynomials by proving that $p^\prime(c) = 0$. You can do that by contradiction, based on point 3. applied to the derivative and the link between the derivative and the formula $x^n-y^n=(x-y)(x^{n-1} + x^{n-2}y + \dots +y^{n-1})$.
7. Prove the Mean Value Theorem for polynomials from Rolle's theorem, applied to the polynomial $q(x) = p(x) - rx$ where $r = \frac{p(b)-p(a)}{b-a}$.
8. Conclude the desired result using the MVT for polynomials.
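As a sanity check on the Lipschitz step above (my own sketch, with my own sample polynomial): on $[a,b]$ with $M = \max(|a|, |b|)$, the factorization gives $|x^k - y^k| \le k M^{k-1} |x - y|$, and summing over monomials yields a Lipschitz constant for $p$.

```python
from fractions import Fraction

def lipschitz_constant(coeffs, a, b):
    """A Lipschitz constant for p(x) = sum a_k x^k on [a, b], from
    |x^k - y^k| <= k * M**(k-1) * |x - y| with M = max(|a|, |b|)."""
    M = max(abs(a), abs(b))
    return sum(abs(c) * k * M ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

# p(x) = 1 - 3x + 2x^3 on [-2, 2]; verify |p(x) - p(y)| <= L|x - y| on a grid
coeffs = [1, -3, 0, 2]
a, b = Fraction(-2), Fraction(2)
L = lipschitz_constant(coeffs, a, b)
grid = [a + (b - a) * Fraction(i, 8) for i in range(9)]
for x in grid:
    for y in grid:
        assert abs(evaluate(coeffs, x) - evaluate(coeffs, y)) <= L * abs(x - y)
```

The constant is deliberately crude (here $L = 27$, while the sharpest constant on $[-2,2]$ is $21$); crude is enough for every step of the outline.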
Solution 3:
First some definitions:
- a subset $F$ of $\mathbf{C}$ is a field if it contains the integers, and for every pair of numbers $a, b \in F$, the sum, difference, product and quotient (if $b \neq 0$) are also in $F$
- a field $F$ is a real field if $-1$ cannot be written as a sum of squares $-1 = a_1^2 + a_2^2 + \dots + a_n^2$ of elements of $F$. It turns out this is equivalent to saying $F \subseteq \mathbf{R}$, but that uses a bit of calculus. For now, let's just think $F \subseteq \mathbf{R}$; I will clarify the matter in a bit.
- if $F$ is a real field then it has a real closure, let me call it $F'$. By definition, $F'$ is the largest subset of $\mathbf{C}$ which contains $F$ and is also a real field. One can show (again using calculus) that this is the set of all $a \in \mathbf{R}$ which are a root of some polynomial $f(x)$ with coefficients in $F$.
Ok, so about the matter of the definition of a "real field": what we need calculus for is the fundamental theorem of algebra:
every non-zero polynomial $f(x)$ with complex coefficients has a complex root
Granting the fundamental theorem of algebra (which needs calculus), we would not need to extend $\mathbf{C}$ to a larger set in order to form the set $\mathbf{R}'$, which is the largest field for which:
- for every $a \in \mathbf{R}'$, either $a$ has a square root or $-a$ has a square root (e.g. $-2$ does not have a real square root but $2$ does)
- for every polynomial in $\mathbf{R}'$ of odd-degree, there is at least one root in $\mathbf{R}'$
Now of course, we know that $\mathbf{R}' = \mathbf{R}$ because using calculus we can show these facts, but working without calculus, all we can show is that $\mathbf{R} \subseteq \mathbf{R}'$.
Think about it like this: when we go from $\mathbf{Q}$ to $\mathbf{R}$ we use calculus/analytic methods to "fill in the gaps." When we go from $\mathbf{Q}$ to $\mathbf{Q}'$ we are still filling in gaps but only with roots of polynomials. (Here $\mathbf{Q}'$ is the set of all real numbers which are a root of some polynomial with rational coefficients such as $\sqrt{2}$ or the roots of $3x^5 - x^2 + 1$.)
The point is that by using algebra to fill in the gaps, we get an algebraic version of the intermediate value theorem:
Suppose $f(x)$ is a polynomial over $\mathbf{Q}'$ or $\mathbf{R}'$ and $a < b$ and $f(a) < 0 < f(b)$. Then there exists some $c \in (a, b)$ such that $f(c) = 0$.
The difference between this and the calculus IVT is that $f(x)$ must be a polynomial.
Proof: The idea here is to get an odd-degree polynomial and then use the definition of $F'$ to conclude that there is a root. Before we get to that, we must first ensure that the root is between $a$ and $b$.
So suppose $a_1, a_2,\dots,a_n$ are all the roots of $f(x)$ in $\mathbf{R}'$ (repeated with the correct multiplicity). If any of them are between $a$ and $b$ we are done, so let's now consider what happens if none of them are.
If none of the roots are between $a$ and $b$, then let us suppose they are numbered such that $a_1,\dots, a_k$ are all smaller than $a$ and $a_{k+1},\dots,a_n$ are all larger than $b$. Now let us factor $f(x) = (x - a_1)\dots(x - a_n)g(x)$. Note that if $x \in [a, b]$ then $(x - a_1)\dots(x - a_k)$ is always positive and $(x - a_{k+1})\dots(x - a_n)$ has sign $(-1)^{n - k}$. So depending on what $n$ and $k$ are, we either have that $f(x)$ and $g(x)$ have the same sign or opposite signs. If the signs are opposite, replace $g(x)$ by $-g(x)$.
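The sign count can be checked on a small example (my own sample roots and interval, chosen to match the setup above):

```python
from fractions import Fraction

def linear_product(roots, x):
    """Product of the linear factors (x - r) over the given roots."""
    prod = Fraction(1)
    for r in roots:
        prod *= x - r
    return prod

# Roots a_1..a_k below [a, b] give positive factors on [a, b];
# roots a_{k+1}..a_n above [a, b] give n - k negative factors, sign (-1)^(n-k).
below = [Fraction(-3), Fraction(-1)]             # k = 2
above = [Fraction(5), Fraction(6), Fraction(7)]  # n - k = 3, so sign is -1
a, b = Fraction(0), Fraction(4)
for i in range(9):
    x = a + (b - a) * Fraction(i, 8)
    assert linear_product(below, x) > 0   # always positive on [a, b]
    assert linear_product(above, x) < 0   # constant sign (-1)^(n-k) on [a, b]
```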
Now what we have is $g(a) < 0 < g(b)$ and $g$ has no roots in $\mathbf{R}'$. Here comes the tricky bit. We know that the roots of $g(x)$ come in conjugate pairs; what we don't know is that these roots actually lie in $\mathbf{C}$ (that's the fundamental theorem of algebra, which requires calculus). On the other hand, we do know that, by possibly going outside of $\mathbf{C}$, we can find conjugate-pair roots for $g$, and this is sufficient.
So we can factor $g(x)$ into a product of quadratics of the form $$(x - (p + qi))(x - (p - qi)) = x^2 - 2px + (p^2 + q^2)$$ (writing $p + qi$ for a conjugate pair of roots, to avoid clashing with the endpoints $a$ and $b$). What's important here is that this is back to being a polynomial over $\mathbf{R}'$, and moreover, it is a quadratic with no real roots. But what we know about quadratics with no real roots is that such a polynomial is always positive (or always negative). Furthermore, since $g$ is a product of such quadratics, $g$ is always positive or always negative. But this can't be so, because $g(a) < 0 < g(b)$. So we see that it was impossible not to have a root between $a$ and $b$.
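Completing the square is why a quadratic with no real roots has constant sign: if the discriminant is negative, $x^2 + \beta x + \gamma = (x + \beta/2)^2 + (\gamma - \beta^2/4) > 0$. A minimal check, with my own sample coefficients:

```python
from fractions import Fraction

def quad(beta, gamma, x):
    """Monic quadratic x^2 + beta*x + gamma."""
    return x * x + beta * x + gamma

# When the discriminant beta^2 - 4*gamma is negative, completing the square
# gives x^2 + beta*x + gamma = (x + beta/2)^2 + (gamma - beta^2/4) > 0.
beta, gamma = Fraction(-4), Fraction(13)   # roots 2 +/- 3i, none real
assert beta ** 2 - 4 * gamma < 0
floor_value = gamma - beta ** 2 / 4        # the minimum value, here 9
for i in range(-40, 41):
    x = Fraction(i, 4)                     # sample points in [-10, 10]
    assert quad(beta, gamma, x) >= floor_value > 0
```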
Whew, that was a lot of work. But what it does is establish the intermediate value theorem for polynomials without using calculus. Once you have a polynomial IVT, you can go through the same motions as the calculus-based proofs (Rolle's theorem, Mean Value Theorem), except now your theorems are for polynomials. So with the same motions, we can say that $f(b) - f(a) = (b - a)f'(c)$ for some $c$ between $a$ and $b$... except we aren't assuming that $c \in \mathbf{R}$, only that $c \in \mathbf{R}'$, which we still have to pretend could be larger than $\mathbf{R}$. And we only know that $f'(c) > 0$ when $c \in \mathbf{R}$, not necessarily when $c \in \mathbf{R}'$.
Imagine for a second that $\mathbf{R} = \mathbf{Q}$ and the derivative is $f'(x) = (x^2 - 2)^2$, and you'll see what I mean. For every rational point $c$, $f'(c) > 0$, but there are irrational points where $f'(c) = 0$. On the other hand, we do at least want to make sure that $f'(c)$ can't be negative.
Well suppose $f'(c) < 0$, then we can find, say by bisection, a sequence of rational points $a_1, a_2, \dots$ such that $|a_i - c|$ is as small as we want. But now we can use algebra (not calculus!) to get a bound on $|f'(a_i) - f'(c)|$ and show that $f'(c) < 0$ is impossible.
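To illustrate with the $(x^2 - 2)^2$ example (my own numerical sketch): bisection over the rationals produces points $a_i$ as close to $c = \sqrt 2$ as we like, and $f'(a_i) = (a_i^2 - 2)^2$ is strictly positive at every rational point yet becomes arbitrarily small. That is exactly why rational sampling can rule out $f'(c) < 0$ but not $f'(c) = 0$.

```python
from fractions import Fraction

def d(x):
    """d(x) = (x^2 - 2)^2: strictly positive at every rational x,
    zero only at the irrational points x = +/- sqrt(2)."""
    return (x * x - 2) ** 2

# Bisect x^2 - 2 on [1, 2] to approach c = sqrt(2) through rational points.
lo, hi = Fraction(1), Fraction(2)
for _ in range(30):
    mid = (lo + hi) / 2
    if mid * mid < 2:
        lo = mid
    else:
        hi = mid

# d stays positive at every rational point, but the values shrink toward 0.
assert d(lo) > 0 and d(hi) > 0
assert hi - lo == Fraction(1, 2 ** 30)
assert d(lo) < Fraction(1, 10 ** 15)
```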
Ok, so now we're back in our $(x^2 - 2)^2$ case, where $f'(c)$ might be $0$. On the other hand, if $f'$ is not identically $0$, then $f'$ can't have more than a finite number of zeroes; since $f' \ge 0$ everywhere and vanishes only finitely often, $f$ is still strictly increasing. Indeed, imagine $f$ were constant on $\mathbf{Q}$: then $f'$ would have infinitely many zeroes by Rolle's theorem (our algebraic version), so $f'$ would be identically $0$, and $f$ would be constant not just on $\mathbf{Q}$ but on $\mathbf{Q}'$ as well.
TLDR: use algebra to replace the IVT with a polynomial version, and imagine that you're working over $\mathbf{Q}$: the point $c$ you get from the MVT might satisfy $c \not\in \mathbf{Q}$, so you can't say $f'(c) > 0$ anymore.