Polynomial map is surjective if it is injective

Solution 1:

I would like to comment that the model-theoretic proof really is not that complicated, and this is coming from someone who is very algebro-geometrically minded.

While there are many places you can probably find this on the internet, let me rephrase the argument here. Perhaps I'll say it in some way that is, for some reason, more clear to you. I will be semi-informal about the intuitive stuff, in an effort to make it more comprehensible.

I will include many of the proofs of the important theorems, although they can be easily disregarded. It may be worth reading them, though, just to convince yourself how trivial their proofs really are.

Also, I am no model theorist, by any standard of the term, so there is a possibility I'll make a mistake below. I hope, in that case, one of our resident model theorists could correct me :)


Recall that a language $\mathcal{L}$ is nothing more than a collection of formal strings of symbols built from some "signature/alphabet" of base symbols. For example, consider the language of rings $\mathcal{L}_\text{ring}$, which consists of the sentences formed out of the alphabet $\{+,\cdot,0,1\}$. Another example might be the language of groups $\mathcal{L}_\text{group}$, which is formed out of the alphabet $\{\cdot,e,i\}$ (where $\cdot$ is supposed to be a binary operation, $e$ the identity element of the group, and $i$ the inverse function).

Now, all of the elements of this alphabet are supposed to represent either a function (with some arity), a constant (just some element), or a relation. For example, in $\mathcal{L}_\text{group}$ the symbol $e$ is a constant, and the symbols $\cdot$ and $i$ are functions with arities $2$ and $1$ respectively.

Distinguishing between constant symbols and function symbols (since they are at this point just meaningless symbols) seems silly until one defines an $\mathcal{L}$-structure. Namely, given a language $\mathcal{L}$, an $\mathcal{L}$-structure is nothing more than a set $S$ together with an interpretation of each of the symbols of $\mathcal{L}$ in terms of $S$. For example, an $\mathcal{L}_\text{group}$-structure is nothing but an interpretation of $e$, $\cdot$, and $i$ in $S$. What does this mean? Well, since $e$ is a constant symbol, its interpretation should just be an element $e\in S$ that I've picked. Since $\cdot$ is a function symbol with arity $2$, an interpretation is just some function $\cdot:S^2\to S$. Finally, since $i$ is a function symbol with arity $1$, an interpretation is just a function $i:S\to S$.

Now, note that there was absolutely no restriction on how I interpreted my symbols in an $\mathcal{L}$-structure, and moreover there are many such interpretations. As an example, I can make $\{0,1\}$ an $\mathcal{L}_\text{group}$-structure by saying $e=0$, $\cdot:\{0,1\}^2\to\{0,1\}$ just sends everything to $1$, and $i:\{0,1\}\to\{0,1\}$ just sends everything to $0$. Indeed, at this point I have not dictated that the interpretations satisfy any properties. But, of course, this seems silly. We'd expect that the language of groups should have something to do with groups. This is precisely where a theory comes in.

A theory $\mathcal{T}$ in a language $\mathcal{L}$ is nothing more than a collection of sentences using only the alphabet, variables, equality, the standard logical quantifiers ($\forall,\exists$), and the standard logical connectives (e.g. $\vee,\wedge,\neg$). For example, the theory $\mathcal{T}_\text{group}$ in $\mathcal{L}_\text{group}$ consists of the following three sentences

$$1.\ (\forall x,y,z)(x\cdot(y\cdot z)=(x\cdot y)\cdot z)$$

$$2.\ (\forall x)(\cdot(x,i(x))=e=\cdot(i(x),x))$$

$$3.\ (\forall x)(\cdot(x,e)=x=\cdot(e,x))$$

A model of a theory $\mathcal{T}$ is then nothing but an $\mathcal{L}$-structure for which the interpretations of the symbols satisfy the sentences in $\mathcal{T}$ (where we are quantifying over the elements of the set). For example, my interpretation of $\mathcal{L}_\text{group}$ in the set $\{0,1\}$ is NOT a model of the theory $\mathcal{T}_\text{group}$. That said, the $\mathcal{L}_\text{group}$-structure on the set $\{0,1\}$ where $e=0$, $\cdot(0,1)=\cdot(1,0)=1$, $\cdot(1,1)=\cdot(0,0)=0$, and $i(0)=0$, and $i(1)=1$ is a model of $\mathcal{T}_\text{group}$. In fact, I think you can see that a model of $\mathcal{T}_\text{group}$ is nothing more than a group!
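For a finite set, checking whether a structure is a model can be done by brute force. The following is only a minimal sketch: the function name `is_group_model` and the dictionary encoding of $\cdot$ and $i$ are my own hypothetical choices, not anything standard. It tests the two $\mathcal{L}_\text{group}$-structures on $\{0,1\}$ described above against the three sentences of $\mathcal{T}_\text{group}$.

```python
from itertools import product

def is_group_model(S, e, mul, inv):
    """Check the three sentences of T_group in the structure (S, e, mul, inv).

    mul is a dict keyed by pairs (x, y); inv is a dict keyed by elements.
    (Hypothetical encoding, chosen only for this illustration.)
    """
    # Sentence 1: associativity.
    assoc = all(mul[x, mul[y, z]] == mul[mul[x, y], z]
                for x, y, z in product(S, repeat=3))
    # Sentence 2: i(x) is a two-sided inverse of x.
    inverses = all(mul[x, inv[x]] == e == mul[inv[x], x] for x in S)
    # Sentence 3: e is a two-sided identity.
    identity = all(mul[x, e] == x == mul[e, x] for x in S)
    return assoc and inverses and identity

S = [0, 1]
pairs = list(product(S, repeat=2))

# First structure: e = 0, the product sends everything to 1, i sends everything to 0.
silly = is_group_model(S, 0, {pq: 1 for pq in pairs}, {0: 0, 1: 0})

# Second structure: e = 0, the product is addition mod 2, i is the identity map.
z2 = is_group_model(S, 0, {(x, y): (x + y) % 2 for x, y in pairs}, {0: 0, 1: 1})

print(silly, z2)
```

The first structure fails sentences 2 and 3 (for instance $\cdot(0,e)=1\ne 0$), while the second passes all three.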

Now that all of the basic terminology is established, which really was quite intuitive, we can finally discuss something substantive. I've often been told that there are only two theorems in model theory. The first is the following:

Theorem (Compactness; Gödel, Maltsev): A theory of first-order sentences has a model if and only if every finite subset of it has a model.

This, from what I understand (I've never seen the proof), isn't really that complicated. In fact, if you interpret it correctly in terms of Stone spaces, it apparently comes directly from Tychonoff's theorem. It also has a very intuitive formulation in terms of ultraproducts, in which case it follows from Łoś's theorem. The intuition for the correctness of the compactness theorem follows much like the intuition in algebraic geometry that one can often descend problems about schemes $X/\mathbb{C}$ to a statement about some scheme $X'/R$ for some finitely generated subring $R$ of $\mathbb{C}$ (this is Grothendieck's trick of "passing to the limit").

The second foundational theorem of model theory is the following:

Theorem(Lowenheim-Skolem): If $\mathcal{T}$ is a theory which admits an infinite model, then it admits a model of size $\kappa$ for every $\kappa\geqslant \max\{\# \mathcal{L},\aleph_0\}$.

The above theorem actually is two theorems rolled into one: the "upward" Lowenheim-Skolem theorem, and the "downward" Lowenheim-Skolem theorem. The upward part says that if you have an infinite model of size $\kappa$, you have a model of size $\lambda$ for every $\lambda\geqslant\kappa$ (with $\lambda\geqslant\#\mathcal{L}$). The downward part says that if you have an infinite model of size $\kappa$, you have a model of every infinite size $\lambda\leqslant\kappa$ with $\lambda\geqslant\#\mathcal{L}$.

The downward part of Lowenheim-Skolem is annoying and elementary--you explicitly construct the models. This is also where the size of the language comes into play. Intuitively, the size of the language comes into play since your language could contain nothing but constant symbols, all of which are dictated by your theory to be distinct. The upward Lowenheim-Skolem theorem is easy to prove from compactness and the downward part of Lowenheim-Skolem.

Proof (Upward Lowenheim-Skolem): Let $M$ be your infinite model of size $\kappa$, and let $\lambda\geqslant\kappa$ be a cardinal (with $\lambda\geqslant\#\mathcal{L}$ as well). Consider the language $\mathcal{L}'$ which is obtained from the language of $\mathcal{T}$ by appending a set of new constant symbols $\{c_i\}_{i<\lambda}$. Let $\mathcal{T}'$ be the theory obtained from $\mathcal{T}$ by appending the sentences $\varphi_{i,j}:c_i\ne c_j$ for all $i\ne j$.

Note then that for all finite subsets $\Delta$ of $\mathcal{T}'$, there is a model of $\Delta$. Indeed, consider the finitely many sentences $\varphi_{i,j}\in \Delta$, with $(i,j)\in I$ where $I$ is some finite set. Let us define an $\mathcal{L}'$-structure $M_\Delta$, whose underlying set is the same as $M$. We interpret $\mathcal{L}$ in $M_\Delta$ the same way we interpreted it in $M$, and for each constant symbol $c_i$ we choose some element $c_i\in M$. We choose the $c_i$, though, such that $c_i\ne c_j$ if $(i,j)\in I$; this is possible because $M$ is infinite and only finitely many constants are involved. Note then that, by the very construction of $M_\Delta$, it is a model of $\Delta$ as desired.

Thus, by compactness, $\mathcal{T}'$ admits some model $N$. Necessarily, $\# N\geqslant\lambda$ since $N$ interprets $\lambda$ many symbols $c_i$ which, by virtue of modeling $\mathcal{T}'$, are all distinct. We then get a model of size exactly $\lambda$ by applying downward Lowenheim-Skolem. $\blacksquare$

The amazing thing is that, given the Compactness Theorem, everything else becomes very easy, at least as far as the proof of Ax-Grothendieck is concerned.

Before we get to the Ax-Grothendieck theorem though, we first need to discuss some technically powerful, but surprisingly simple theorems.

Let us say that a theory $\mathcal{T}$ is complete if for every sentence $\varphi$ in the language, either $\varphi$ or $\neg\varphi$ ($\neg$ is negation) is a consequence of $\mathcal{T}$. This says that either every model $M$ of $\mathcal{T}$ satisfies $\varphi$, or every model $M$ of $\mathcal{T}$ satisfies $\neg\varphi$. For example, the theory $\mathcal{T}_\text{group}$ is NOT complete. Consider the sentence $\varphi$ given by $(\forall x,y)(x\cdot y=y\cdot x)$. Then, there are models of $\mathcal{T}_\text{group}$ for which $\varphi$ holds (e.g. abelian groups) and those for which $\varphi$ does not hold (non-abelian groups). Thus, being a complete theory is a very, very strong thing, yet an obviously desirable one. Why is it so desirable? Because, to check that some sentence is true for all models of the theory, one needs only check it is true for ONE model of the theory.

Now, while completeness seems like a very attractive property, it has some obvious immediate downsides. First, one would expect that most theories aren't complete (think of three theories--none of them are probably complete). Second, even if something is complete, it seems very difficult to prove it is such--think about how one might attack such a problem. That said, there is a very important model theoretic property of a theory which guarantees completeness, and is a property that is secretly familiar to us from our everyday mathematical lives. This property is $\kappa$-categoricity.

Let us say that a theory $\mathcal{T}$ is $\kappa$-categorical if $\mathcal{T}$ admits, up to isomorphism, only one model of cardinality $\kappa$. Isomorphism between two models $M,M'$ of $\mathcal{T}$ is what you'd expect--it's a bijection which preserves the structure dictated by the language (in all of the theories you'd care about, rings, groups, etc., isomorphisms are what you'd expect). Why is $\kappa$-categoricity so nice? Well, because of the following:

Theorem (Vaught's Test): Suppose that $\mathcal{T}$ is $\kappa$-categorical where $\kappa\geqslant \max(\aleph_0,\#\mathcal{L})$, and every model of $\mathcal{T}$ is infinite. Then, $\mathcal{T}$ is complete.

Proof: If $\mathcal{T}$ were not complete, there would exist some sentence $\varphi$ in the language of $\mathcal{T}$ such that there are models $M$ and $M'$ of $\mathcal{T}$ for which $\varphi$ holds in $M$ and $\neg\varphi$ holds in $M'$. Consider then the theories $\mathcal{S}=\mathcal{T}\cup\{\varphi\}$ and $\mathcal{S}'=\mathcal{T}\cup\{\neg\varphi\}$. Note then that $M$ is a model of $\mathcal{S}$ and $M'$ is a model of $\mathcal{S}'$. Since $M$ and $M'$ are infinite (since all models of $\mathcal{T}$ are) we have by Lowenheim-Skolem that there is a model $N$ of $\mathcal{S}$ and $N'$ of $\mathcal{S}'$, each of size $\kappa$. Evidently $N$ and $N'$ are models of $\mathcal{T}$ of size $\kappa$, and so by assumption of $\kappa$-categoricity, isomorphic. But, this is a contradiction since $\varphi$ holds in $N$ but not in $N'$. $\blacksquare$

As an example of a $2^{\aleph_0}$-categorical theory, consider the theory of $\mathbb{F}_2$ vector spaces (defined in the way you'd expect). By cardinality considerations, any $\mathbb{F}_2$ vector space of cardinality $2^{\aleph_0}$ must have dimension $2^{\aleph_0}$, so any two are isomorphic.
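To spell out the cardinality consideration (a sketch; the names $V$ for the vector space and $B$ for a chosen basis are mine): every vector is a sum of finitely many distinct basis elements, so $V$ is in bijection with the set of finite subsets of $B$. If $V$ is uncountable then $B$ must be infinite, and an infinite set has exactly as many finite subsets as elements, so

$$\# V=\#\{\text{finite subsets of }B\}=\# B=\dim_{\mathbb{F}_2}V.$$

Hence $\# V=2^{\aleph_0}$ forces $\dim_{\mathbb{F}_2}V=2^{\aleph_0}$, and two vector spaces of the same dimension over the same field are isomorphic.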

It's interesting to note that, in fact, $\mathbb{F}_2$ vector spaces are $\kappa$-categorical for every uncountable cardinal $\kappa$. This is no accident, since Morley's categoricity theorem (a very deep result) says that if a theory in a countable language is $\kappa$-categorical for one uncountable cardinal, it's $\lambda$-categorical for all uncountable cardinals $\lambda$.


Ok, so now onto what you actually care about.

Let us define the theory $\mathsf{ACF}$ of algebraically closed fields. This is a theory in the language of rings $\mathcal{L}_\text{ring}$, with the obvious sentences dictating the field axioms (commutativity, existence of inverses, etc.), as well as the existence of roots of all non-constant polynomials. This last condition is a little annoying to encode with just $\mathcal{L}_\text{ring}$ and logical quantifiers/connectives. It goes something like this: for each $n\geqslant 1$, add in the sentence $(\forall a_0,\ldots,a_n)(a_n\ne 0\rightarrow(\exists x)(a_0+a_1x+\cdots+a_n x^n=0))$.

Now, the theory $\mathsf{ACF}$ is NOT complete. Indeed, consider the sentence $(\forall x)(x+x=0)$. Then, this sentence is modeled by $\overline{\mathbb{F}_2}$ but the negation is modeled by $\mathbb{C}$. Somewhat amazingly, this type of sentence (specifying characteristic) is the ONLY obstruction to completeness.

To clarify this statement, let us define the following theories. For each prime $p$, let $\mathsf{ACF}_p$ denote the theory $\mathsf{ACF}$ with the extra sentence $(\forall x)(\underbrace{x+\cdots+x}_{p\text{ times}}=0)$ thrown in. Evidently the models of $\mathsf{ACF}_p$ are just the algebraically closed fields of characteristic $p$. Note though that we also want to define the theory $\mathsf{ACF}_0$ of algebraically closed fields of characteristic $0$. Now, how we do this is VERY important. Instead of throwing in the sentence $(\forall x)(\underbrace{x+\cdots+x}_{p\text{ times}}=0)$ we throw in its negation FOR ALL primes $p$. In other words, we specify characteristic $0$ by specifying not characteristic $p$ for all $p$. In particular, it takes infinitely many sentences to specify characteristic $0$.

Now, my statement that the only thing stopping completeness for $\mathsf{ACF}$ is justified by the following:

Theorem: For every $p$ that is either a prime or $0$, $\mathsf{ACF}_p$ is $\kappa$-categorical for every uncountable cardinal $\kappa$.

This will imply by Vaught's test (since every algebraically closed field is infinite and the language of rings is countable) that each $\mathsf{ACF}_p$ is complete!

The proof is actually very simple.

Proof: Suppose that $K,K'$ are two uncountable algebraically closed fields of the same characteristic. Suppose that $k$ is the prime subfield of $K,K'$ (i.e. $k=\mathbb{F}_p$ or $k=\mathbb{Q}$). Then, by cardinality considerations we have that

$$\text{tr.deg}_k K=\# K=\kappa=\# K'=\text{tr.deg}_k K'$$

Thus, choosing transcendence bases of $K$ and $K'$ over $k$, both of cardinality $\kappa$, we know that there are embeddings $K\hookleftarrow k(\{x_i\}_{i<\kappa})\hookrightarrow K'$ such that $K/k(\{x_i\})$ and $K'/k(\{x_i\})$ are both algebraic. But, since $K,K'$ are algebraically closed, we can conclude that $K$ and $K'$ must both be algebraic closures of $k(\{x_i\})$ and thus isomorphic. $\blacksquare$

Just to reiterate, this tells us, for example, that to check whether or not a sentence is true for all algebraically closed fields of characteristic $0$, it suffices to prove it's true for $\mathbb{C}$ (this, and several other various forms, is known as the Lefschetz principle).

Ok, with this, we can finally state the "big theorem" you were alluding to

Theorem: Let $\varphi$ be a sentence in $\mathcal{L}_\text{ring}$ then the following are equivalent

  1. The sentence is true for some algebraically closed field $K$ of characteristic $0$.
  2. The sentence is true for all algebraically closed fields of characteristic $0$.
  3. The sentence holds true for an algebraically closed field $K$ of characteristic $p$, for arbitrarily large $p$.
  4. The sentence holds true for all algebraically closed fields of characteristic $p$ for arbitrarily large $p$.

Proof: The equivalence of 1 and 2, and the equivalence of 3 and 4, follow from the completeness of $\mathsf{ACF}_0$ and $\mathsf{ACF}_p$ respectively. To prove that 1 and 3 are equivalent, we merely use compactness. First assume 3; we want to show that $\mathsf{ACF}_0\cup\{\varphi\}$ (where $\varphi$ is our sentence) has a model. By compactness, $\mathsf{ACF}_0\cup\{\varphi\}$ will have a model if and only if every FINITE subset of it has a model. But a finite subset $\Delta$ can contain the sentences $\neg(\forall x)(\underbrace{x+\cdots+x}_{p\text{ times}}=0)$ for only finitely many primes $p$; say all of these primes are less than $m$. By assumption, we can choose a prime $p_0\geqslant m$ and an algebraically closed field $M$ of characteristic $p_0$ in which $\varphi$ holds. Then $M$ satisfies the axioms of $\mathsf{ACF}$, satisfies $\varphi$, and has characteristic different from every prime mentioned in $\Delta$, so $M$ is a model of $\Delta$. Thus, by compactness, we obtain a model of $\mathsf{ACF}_0\cup\{\varphi\}$, which is 1. Conversely, suppose 3 fails. Then for all sufficiently large $p$ the sentence $\varphi$ fails in every algebraically closed field of characteristic $p$, so $\neg\varphi$ satisfies 3. By what we just proved, $\neg\varphi$ holds in some algebraically closed field of characteristic $0$, and so, by completeness of $\mathsf{ACF}_0$, $\varphi$ holds in none of them; that is, 1 fails. $\blacksquare$

So, with this, we can make rigorous the proof of Ax-Grothendieck:

Theorem(Ax-Grothendieck): If $K$ is an algebraically closed field, every injective polynomial map $f:K^n\to K^n$ is surjective.

Proof: I leave it to you to show that, for each fixed $n$ and each bound $d$ on the degrees, the statement "every injective polynomial map $K^n\to K^n$ whose components have degree at most $d$ is surjective" can be phrased as a sentence in $\mathcal{L}_\text{ring}$ (it's long and arduous, but elementary). Thus, by the above theorem applied to each of these sentences, to prove the statement for all algebraically closed fields, it suffices to show that for each prime $p$, there is SOME algebraically closed field $K_p$ of characteristic $p$ for which the statement is true. Let's choose $K_p=\overline{\mathbb{F}_p}$.
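As a sketch of the encoding left to the reader above, here is what I believe the sentence looks like in the simplest case $n=1$, for polynomials of degree at most $d$ (the name $\Phi_{1,d}$ is my own label, and $x^i$ abbreviates the term $x\cdot x\cdots x$ with $i$ factors):

$$\Phi_{1,d}:\ (\forall a_0,\ldots,a_d)\Big[(\forall x,y)\Big(\sum_{i=0}^d a_i x^i=\sum_{i=0}^d a_i y^i\rightarrow x=y\Big)\rightarrow(\forall z)(\exists x)\Big(\sum_{i=0}^d a_i x^i=z\Big)\Big]$$

The point is that the coefficients are quantified over as elements of the field, so no reference to "polynomials" as objects is needed. For general $n$ one quantifies in the same way over the coefficients of $n$ polynomials in $n$ variables, and the theorem is the conjunction of one such sentence for every $n$ and $d$.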

To prove the statement for $\overline{\mathbb{F}_p}$ we proceed as follows. Let $f:\overline{\mathbb{F}_p}^n\to\overline{\mathbb{F}_p}^n$ be an injective polynomial map. Choose any $\overline{b}\in \overline{\mathbb{F}_p}^n$. Let $\ell$ be the field obtained by adjoining the coordinates of $\overline{b}$ and the coefficients of the polynomials defining $f$ to $\mathbb{F}_p$. Since $\ell$ is generated over $\mathbb{F}_p$ by finitely many algebraic elements, it is a finite field. Observe then that $f$ restricts to an injective polynomial map $f:\ell^n\to\ell^n$. But $\ell^n$ is a finite set, so by cardinality considerations $f$ must be surjective there, and thus $\overline{b}$ is in the image of $f$ as desired. The Ax-Grothendieck theorem follows. $\blacksquare$
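The counting step (an injective self-map of a finite set is surjective) can be checked by brute force on a small example. This is only an illustration of that one step, not of the full proof: I take the finite field to be $\mathbb{F}_5$ and $n=2$, and the maps `shear` and `square` as well as the helper names are hypothetical choices of mine.

```python
from itertools import product

p = 5  # work over the finite field F_5 (an assumption made for this example)

def is_injective(f, points):
    """True if f takes distinct points to distinct images."""
    images = [f(pt) for pt in points]
    return len(set(images)) == len(images)

def is_surjective(f, points):
    """True if every point of the finite set is hit by f."""
    return {f(pt) for pt in points} == set(points)

plane = list(product(range(p), repeat=2))  # the 25 points of F_5^2

def shear(pt):
    # (x, y) -> (x + y^2, y): injective, with inverse (x, y) -> (x - y^2, y).
    x, y = pt
    return ((x + y * y) % p, y)

def square(pt):
    # (x, y) -> (x^2, y): not injective, since x and -x have the same square.
    x, y = pt
    return ((x * x) % p, y)

shear_inj, shear_surj = is_injective(shear, plane), is_surjective(shear, plane)
square_inj, square_surj = is_injective(square, plane), is_surjective(square, plane)

def poly_map(coeffs):
    # One-variable polynomial map on F_5 with the given coefficients a_0..a_3.
    return lambda pt: (sum(c * pt[0] ** i for i, c in enumerate(coeffs)) % p,)

line = list(product(range(p), repeat=1))  # the 5 points of F_5, as 1-tuples

# Exhaustive check over all 5^4 polynomials of degree at most 3 in one variable:
# injectivity and surjectivity always coincide on a finite set.
pigeonhole_ok = all(
    is_injective(poly_map(c), line) == is_surjective(poly_map(c), line)
    for c in product(range(p), repeat=4)
)

print(shear_inj, shear_surj, square_inj, square_surj, pigeonhole_ok)
```

Here `shear` is both injective and surjective, `square` is neither, and no polynomial in the exhaustive check is injective without being surjective.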

Let me point out something somewhat miraculous about the above proof. The most obvious "Wow!" factor came from our ability to prove a statement in characteristic $0$ by working solely in characteristic $p$, but to me there is something equally amazing happening. Our proof in the case of $K=\overline{\mathbb{F}_p}$ COMPLETELY relied on the fact that $K/\mathbb{F}_p$ was algebraic. It would have failed for any other algebraically closed field of characteristic $p$ (because it would no longer be algebraic over $\mathbb{F}_p$). We were only able to conclude the result for the other algebraically closed fields of characteristic $p$ by the completeness of $\mathsf{ACF}_p$, which itself rests on compactness. This, to me, seems like magic.


The thing to notice, for me, about the above is how everything, even Lowenheim-Skolem (although only the upward part), followed formally from the compactness theorem. It really is quite astounding. I mean, it's so formal, and dare I say trivial, that even someone like me (who knows almost no model theory) can deduce it.

Something else to keep in mind about the above is the limited range of its powers. Upon first glance, one might expect the techniques above to revolutionize mathematics. I mean, it seems like such a profitable course of action to prove things about fields/algebro geometric things, by reducing it to the finite field case. The problem with this is simple--many of the statements we care about as algebraic geometers are outside the purview of first order logic (or at least phrasing them would be nightmarishly difficult). Try stating Riemann-Roch in the language of $\mathsf{ACF}$. So, while the above is powerful, and there are some deep philosophical consequences, it is not the end-all-be-all of mathematical theorems.

It is worth noting that if there is a subject over which model theory ought to have strong power, it's the algebraic geometry of algebraically closed fields. There, not only do we have a well-behaved theory (e.g. $\mathsf{ACF}_p$ is complete, it has quantifier elimination, etc.) but the structure morphisms are polynomial maps--they are definable.

This, by the way, is how I feel about much of model theory (from my very uneducated point of view). It seems like a philosophically powerful, but practically ineffective theory. Besides the work of those like Hrushovski, I haven't heard of model theory being a sledgehammer in more "mainstream" mathematics. Just a thought.

Solution 2:

Please see the Wikipedia article on the Ax-Grothendieck Theorem.

Remark: This is really a comment, but I would not like to see the result disappear from MSE for lack of an answer.

Solution 3:

For an argument that doesn't use model theory, see for instance paragraph 4.1 "Injective endomorphisms are surjective" in Arno van den Essen, Polynomial Mappings and the Jacobian Conjecture. Here the more general claim is proven that every injective endomorphism of an algebraic variety over an algebraically closed field is surjective.

It is still, as far as I can see, a similar argument to the model theoretic argument: injectivity and non-surjectivity can be described using polynomial constraints, which makes it possible to reduce this to the case of a finite field. But it is a purely algebraic argument.

If people are interested, I can try to reproduce the proof here; it will be shorter than the argument by Alex Youcis above, but still relatively lengthy.