Can't EF game theory be applied to finite languages WITH function symbols?

Let's start with a few definitions.

Definition 1: Let $\mathcal{A}$ be a structure with domain $A$ and $B \subseteq A$. We define $\langle B \rangle$, the substructure of $\mathcal{A}$ generated by $B$, to be the smallest substructure of $\mathcal{A}$ containing $B$.

Definition 2: Let $G_n(\mathcal{A}, \mathcal{B})$ be defined as in the question. A play is a pair $\langle \bar{a}, \bar{b} \rangle$, such that $\bar{a} \in A$ and $\bar{b} \in B$, such that each element from $\bar{a}$ and $\bar{b}$ has been chosen by a player.

Definition 3: A play counts as a win for Player II if there's an embedding $f: \langle \bar{a} \to \bar{b}\rangle$ such that $f(\bar{a}) = \bar{b}$.

These are standard definitions generalizing the EF game to structures with functional symbols; cf., e.g., Hodges's Model Theory, p. 95, or Poizat's A Course in Model Theory, p. 35; the latter presents his definition in terms of local isomorphisms, i.e. partial embeddings, but the translation to the game-theoretic context is immediate. When dealing with function symbols, it's necessary to take the substructures generated by $\bar{a}$ and $\bar{b}$, otherwise the notion of an embedding is not well defined. For instance, consider two structures, $\mathcal{N} = (\omega, S)$ and $\mathcal{M} = (M, S')$, where $\mathcal{N}$ is the standard natural numbers with the successor function and $\mathcal{M}$ is a non-standard elementary end-extension of $\mathcal{N}$. Consider the play of $G_2(\mathcal{N}, \mathcal{M}$ $\langle \langle 2, 4\rangle, \langle 3, 5 \rangle$. In order to determine if this play is a partial embedding $h$, we should check if, e.g., $h(S(2)) = S'(h(2))$. But neither $S(2)$ nor $S'(h(2)) = S'(3)$ belong to the domain or image of $h$, so how can we evaluate this?

Notice that, in the case in which a language contains only relation symbols, the substructure generated by, e.g., $\bar{a}$, will just be the set of all elements from the tuple, so that this is indeed a generalization (it covers the relational case as a particular case).

Given this, it's not difficult to show that, if Player II has a winning strategy for $G_n(\mathcal{A}, \mathcal{B})$ for all $n$, then $\mathcal{A} \equiv \mathcal{B}$. On the other hand, the converse is true only for finite languages containing only relational symbols. Observe that both conditions are essential. I'll give two counter-examples, one showing that finiteness is essential, the other showing that the restriction to relational languages is also essential. Considering this pircture, a natural question to ask is: are there ways of recovering the theorem for finite functional languages? The answer is yes. In this answer, I'll therefore proceed in the following way: first, I'll present the counterexamples to the theorem in the case of either functional languages or else infinite relational languages. Afterwards, I'll show that a natural way of recovering the theorem (the one sketched in your question) works given a modified way of ranking the formulas of the language (this was mentioned by Andreas Blass in his comment).


Example 1: The theorem fails for finite languages with function symbols

Let $\mathcal{N} = (\omega, 0, S)$ and $\mathcal{M} = (M, 0, S)$, where $\mathcal{M}$ is an elementary end-extension of $\mathcal{N}$, i.e. $\omega \subseteq M$ and $\mathcal{N} \preceq \mathcal{M}$; $S$ is the usual successor function. Since $\mathcal{M}$ is an end-extension, it will contain a non-standard element, $c$.

Claim: The only substructure of $\mathcal{N}$ is $\mathcal{N}$ itself.

This is obvious from the fact that $\omega$ is the least set containing $0$ and closed under the successor function.

Claim: Any substructure of $\mathcal{M}$ will contain $\omega$ and, if it contains $c$, also any upper section of the "cut" determined by $c$.

This is also obvious from the fact that any substructure of $\mathcal{M}$ will need to be closed under the successor function.

Claim: If $h$ is an embedding between $\mathcal{N}$ and $\mathcal{M}$, and $c$ is a non-standard element, then $c$ won't be in the in the range of $h$.

This is clear from the fact that, for any $n$, there's a term denoting $n$, namely $S(S(S(\dots 0) \dots ))$ (the $S$ applied $n$ times to $0$). Since any embedding will take $0$ to $0$, it will take $n$ to $n$, and, thus, it won't take $n$ to $c$ (otherwise $c$ would be reached from $0$ by a finite application of the successor function, which can't happen).

Therefore:

Claim: Player I can win $G_1(\mathcal{N}, \mathcal{M})$ by choosing a non-standard number $c$ on the first round.

If Player I chooses $c$, then Player II must choose $n \in \omega$. But then, consider the substructures $\langle n \rangle = \mathcal{N}$ and $\langle c \rangle = \mathcal{M}'$. As we have just seen, there any embedding $h: \mathcal{N} \to \mathcal{M}'$ would have to send $n$ to $n$. Thus, there isn't any embedding $h$ such that $h(n) = c$, whence Player II loses the game.

A couple of observations: notice that the fact that $S$ is a function plays a crucial role in the proof. First, in determining the substructures generated by the elements chosen in the play. Second, in excluding the possibility that $h(n) = c$. If we replace the successor function by its graph, then (a) the substructures generated by $n$ and $c$ would just be their singletons, so Player II would win this particular play; and (b) Player II will be able to mimic Player I's moves by always choosing pairs which are or aren't in the successor relation, a much less restrictive condition than the one above.


Example 2: The theorem fails for infinite relational languages

I took this one from Monk's Mathematical Logic, exercise 26.47.

Let $\mathcal{A} = (\omega, R_i)_{i \in \omega}$ and $\mathcal{B} = (\omega, S_i)_{i \in \omega}$, where each $R_i$ and $S_i$ is an unary relation symbol. Define $R_0 = S_0 = \omega$ and $R_{i+1} = R_i \setminus \{i+1\}$ and $S_{i+1} = S_i \setminus \{i\}$. Thus, $R_1 = \omega \setminus \{1\}$, $S_1 = \omega \setminus \{0\}$, etc. I used different symbols to distinguish one structure from another, but we may assume they share a common language (say $\mathcal{L} = \{P_i\}_{i \in \omega}$, such that $P_i$ is interpreted as $R_i$ in $\mathcal{A}$ and $S_i$ in $\mathcal{B}$; I'll continue to use $R_i$ and $S_i$ throughout, though).

Claim: $\mathcal{A} \equiv \mathcal{B}$.

This should be clear enough: as there aren't any constant symbols, the only sentences will be quantifications over the $R_i$s and $S_i$'s and boolean combinations thereof. As each $R_i$ will be non-empty iff the corresponding $S_i$ is non-empty, it follows that the structures are elementary equivalent.

Claim: Player I has a winning strategy for $G_1(\mathcal{A}, \mathcal{B})$.

Player I must choose $0 \in A$ in the first round. Then $\mathcal{A} \models R_i x [0]$ for any $i \in \omega$, but there isn't any element from $B$ with the same property (say Player II chooses $n$; then $\mathcal{B} \not \models S_{n+1} x [n]$). As any partial embedding must preserve atomic formulas, Player II loses.

It's clear that the above example wouldn't work for structures with only finitely many relations, as Player II would always be able to choose a suitably high $n$.


Given the above, one natural way of trying to recovering the theorem is by modifying the definition of partial embedding in order to treat functions as relations, as you put it. This gives us condition (2) of your question. Using this, is it possible to prove the theorem for finite languages with function symbols? The answer is yes, as I'll show below, once you adopt a modified rank for formulas and terms (I took this idea from Ebbinghaus, Flum & Thomas, Mathematical Logic, chapter 12, exercise 3.15; the whole chapter is a nice exposition of Fraïssé's theorem, by the way). To see that this modification is necessary, consider the following example:

Let $\mathcal{A} = (\{a, b\}, f)$ and $\mathcal{B} = (\{c, d, e\}, f')$ with:

\begin{align*} f: a &\mapsto b\\ b &\mapsto b \end{align*}

and

\begin{align*} f': c &\mapsto d\\ d &\mapsto d\\ e &\mapsto c \end{align*}

Then Player II can always win the one round game. If (i) Player I plays $a$, Player II plays $c$, if (ii) Player I plays $b$, Player II plays $d$, and similarly with the reverse positions. Finally, if (iii )Player I plays $e$, Player II plays $a$. It seems to me that the embedding $j$ thus generated will always satisfy your condition (2): in case of (i), $a$ is not the image of anyone under $f$, and neither is $c$; (ii), the only element in $\mathrm{dom}(j)$ is $b$ itself, and $b = f(b)$ iff $d = h(b) = f(h(b)) = f(d)$. Case (iii) is analogous to (i).

However, $\mathcal{B} \models \exists x (f(f(x)) \not = f(x))$ (consider $f'(f'(e)) = d$, but $f'(e) = c$), but $\mathcal{A} \not \models \exists x (f(f(x)) \not = f(x))$ ($f(f(a)) = b = f(a)$ and $f(f(b)) = b = f(b)$). Therefore, Player II has a winning strategy for $G_1(\mathcal{A}, \mathcal{B})$, but $\mathcal{A} \not \equiv_1 \mathcal{B}$, as claimed.


Recovering the theorem

Let $\mathcal{L}$ be finite language, possibly with function symbols, and let $\mathcal{A}, \mathcal{B}$ be two $\mathcal{L}$-structures. Then:

Theorem: $\mathcal{A} \equiv \mathcal{B}$ iff, for every $n$, Player II has a winning strategy for the game $G_n(\mathcal{A}, \mathcal{B})$.

I won't prove the theorem in full, but merely note the modifications necessary in order to prove it (the rest is routine). As I said above, the key idea is to employ a modified criterion for ranking formulas and terms, $\mathrm{mrk}$. This rank is defined recursively as follows:

\begin{align*} \mathrm{mrk}(x) &= 0,\\ \mathrm{mrk}(c) &=1,\\ \mathrm{mrk}(f(t_1, \dots, t_n)) &= 1+ \mathrm{mrk}(t_1) + \dots + \mathrm{mrk}(t_n),\\ \mathrm{mrk}(t_1 = t_2) &= \mathrm{max}(0, \mathrm{mrk}(t_1) + \mathrm{mrk}(t_2) - 1), \\ \mathrm{mrk}(\neg \phi) &= \mathrm{mrk}(\phi),\\ \mathrm{mrk}(\phi \wedge \psi) &= \mathrm{max}(\mathrm{mrk}(\phi), \mathrm{mrk}(\psi)),\\ \mathrm{mrk}(\exists x \phi) &= 1 + \mathrm{mrk}(\phi). \end{align*}

Notice that the above is a generalization of quantifier rank for the relational case; that is, if $\mathcal{L}$ is relational, then $\mathrm{mrk}(\phi) = \mathrm{qr}(\phi)$.

The idea is now to adapt the proof of the finite relational case to the finite general case by employing this modified rank definition. This is mostly bookkeeping in order to show that the lemmas still go through once we make the necessary adaptations. I'll show how to do this for the crucial lemma 2.4.8 in Marker (p. 54):

Lemma: For each $n$ and $l$, there is a finite list of formulas $\phi_1, \dots, \phi_k$ of $\mathrm{mrk}$ at most $n$ in free variables $x_1, \dots, x_l$ such that every formula of depth at most $n$ in free variables $x_1, \dots, x_l$ is equivalent to some $\phi_i$.

The proof is exactly the same as in Marker, that is, an induction on $n$ which uses the disjunctive normal form theorem. The only thing we need to check is the base case, i.e. $n=0$. Given the original definition of $\mathrm{qr}$, this was problematic because just one function symbol generates infinitely many, non-equivalent formulas of $\mathrm{qr} = 0$ (consider, e.g., equalities involving iterations of the successor function). However, given the new definition, this is no longer a problem: for each constant symbol $c$, function symbol $f$ and relation symbol $R$, the formulas of $\mathrm{mrk}=0$ will be of the form $c=x$, $f(x_1, \dots, x_m)=x_{m+1}$, and $R(x_1, \dots, x_m)$. It's obvious that, up to changes in the variables, there will be only finitely many of those. The induction step then follows exactly as in Marker's proof.


Comments

So, what's happening here? Why the modified rank works? The idea is that, in order to translate formulas with function symbols into a relational setting, you must first "unnest" the formulas. An unnested formula is one of the form:

\begin{align*} x &= y\\ c&=y\\ f(x_1, \dots, x_n) &= y\\ R(x_1, \dots, x_n)& \end{align*}

In fact, there is the following theorem, which I'll state without proof:

Theorem: Every formula $\phi$ is equivalent to a formula $\phi^*$ with only unnested subformulas.

The rough idea of the proof is the following: suppose you have a formula of the type $f(t_1, \dots, t_n) = t_{n+1}$. Then this formula is equivalent to:

\begin{align*}\exists x_1, \dots,x_{n+1} (t_1 = x_1 \wedge \dots \wedge x_{n+1} = t_{n+1} \wedge f(x_1, \dots, x_n) = x_{n+1}).\end{align*}

Thus, it is possible to translate a language with function symbols into one with relational symbols: introduce relation symbols for the graph of the functions, unnest the formulas in which function symbols appear, and then substitute the function symbols in the unnested formula by the corresponding relation symbol (this theorem is a bit tedious to prove, so excuse me for not positing the proof). However, as you can see, this translation considerably raises the quantifier rank of a formula, which explains why the induction on quantifier rank is not enough to take care of the functional case.

Indeed, it's possible to use this idea to introduce directly a "patch" to the EF games that embody the same insight (see Hodges's Model Theory, section 3.3). Define $G_n[\mathcal{A}, \mathcal{B}]$, the unnested EF game, to be the same game as $G_n(\mathcal{A}, \mathcal{B})$, except that Player II wins if, given a play $(\bar{a}, \bar{b})$, for every unnested atomic formula $\phi$, $\mathcal{A} \models \phi(\bar{a}) \iff \mathcal{B} \models \phi(\bar{b})$. Using this, one is able to prove a similar version to the theorem above, except that, instead of employing formulas of quantifier rank $n$, one employs unnested formulas of quantifier rank $n$. The proof is essentially the same as in the relational case.