Represent the definition of elementary substructure in FOL
Per the comments, I'm going to address the question: "How do we talk about elementary substructurehood inside $\mathsf{ZFC}$ (which is a first-order theory)?"
This question reduces to the following more natural one:
How can we define, given a structure $\mathcal{M}$, the set $\mathcal{ED}(\mathcal{M})$ of all true sentences in the language of $\mathcal{M}$ + parameters from $\mathcal{M}$?
(Here "$\mathcal{ED}$" stands for "elementary diagram.")
The rough idea is as follows. First, we define the set of all "$\mathcal{M}$-formulas" (= formulas in the language of $\mathcal{M}$ + constants corresponding to elements of $\mathcal{M}$) and "$\mathcal{M}$-sentences" (= $\mathcal{M}$-formulas without free variables - this is what we actually care about). This is no harder than defining formulas and sentences as usual, the only subtlety being the need to pick at the outset a family of unused constant symbols to name the elements of $\mathcal{M}$.
To recap, we now have a formula which defines an assignment to each structure $\mathcal{M}$ in a language $\Sigma$ a set $Sent(\mathcal{M})$ of sentences in a larger language $\Sigma_\mathcal{M}$. We now make a further definition:
A satisfaction function for $\mathcal{M}$ is a function $f:Sent(\mathcal{M})\rightarrow\{0,1\}$ satisfying the following rules:
If $\varphi\in Sent(\mathcal{M})$ is atomic, then $f(\varphi)=1\iff \mathcal{M}\models\varphi$.
- Yes, I just used "$\models$" - don't worry, I'll explain why that's okay below.
If $\varphi$ has the form $\neg\psi$ then $f(\varphi)=1-f(\psi)$.
If $\varphi$ has the form $\psi\wedge\theta$ then $f(\varphi)=\min\{f(\psi),f(\theta)\}$.
If $\varphi$ has the form $\forall x\psi(x)$ then $f(\varphi)=\min_{a\in\mathcal{M}}f(\psi[x/c_a])$, where $c_a$ is the new constant symbol corresonding to $a$.
We then prove in $\mathsf{ZFC}$ (or indeed vastly less) that for every $\mathcal{M}$ there is exactly one satisfaction function for $\mathcal{M}$; this is a quick application of the recursion theorem (or a slightly less quick application if you're using a narrower version of the theorem).
The only weird point in the above is the use of $\models$ in the atomic clause. But the point is that we only every ask about the atomic senteces true in $\mathcal{M}$ here, so we just need an "atomic satisfaction predicate." And this is easy: it's basically already in $\mathcal{M}$ itself, in the form of $\mathcal{M}$'s interpretation of the various symbols in $\Sigma$.
We then define $\mathcal{ED}(\mathcal{M})$ as the set of $\Sigma_\mathcal{M}$-sentences which the unique satisfaction function for $\mathcal{M}$ maps to $1$. Elementary substructurehood is then definable as $$\mathcal{A}\preccurlyeq\mathcal{B}\quad\iff\quad \mathcal{ED}(\mathcal{A})\subseteq \mathcal{ED}(\mathcal{B}).$$
(There is actually a minor subtlety here around picking new constant symbols corresonding to elements of structures, the point being that our comparison of $\mathcal{ED}(\mathcal{A})$ and $\mathcal{ED}(\mathcal{B})$ presupposes that $\mathcal{B}$ and $\mathcal{A}$ use the same constant symbols for elements of $\mathcal{A}$. This can be addressed in a couple different ways: we can talk about the "translation" of $\Sigma_\mathcal{A}$ into $\Sigma_\mathcal{B}$, or we can whip up a uniform way of picking constant symbols.)
As a coda, here are a couple comments:
-
Note that the definition of $\mathcal{ED}$ above is really just the standard definition of $\models$ as it appears in logic textbooks; as usual, the $\mathsf{ZFC}$ implementation is just a careful rephrasing of the standard presentation. In particular, this tells us how to talk about (say) quantifier elimination in the language of set theory: a theory $T$ in a language $\Sigma$ eliminates quantifiers iff for every $\Sigma$-formula $\varphi(x_1,...,x_n)$ there is some quantifier-free $\Sigma$-formula $\psi(x_1,...,x_n)$ such that for every $\Sigma$-structure $\mathcal{M}$ and every tuple of elements $a_1,...,a_n\in\mathcal{M}$ we have $$f(\varphi[x_1/c_{a_1},..., x_n/c_{a_n}])=f(\psi[x_1/c_{a_1},..., x_n/c_{a_n}])$$ where $f$ is the unique satisfaction function for $\mathcal{M}$. This line of thought addresses your other question by showing how the various notions of model theory are expressible in set theory with no funny business, so proofs via model theory are totally genuine; my comment there meanwhile is aimed at addressing the general "sense of mystery" around model theory which I think enhances these concerns.
-
This argument also serves as a "blueprint" for more complicated logics. We can, again in $\mathsf{ZFC}$, define what it means for one structure to be an elementary substructure of another with respect to (say) second-order logic in place of first-order logic; the key point is in the definition of the appropriate analogue of the elementary diagram $\mathcal{ED}(\mathcal{M})$, and this will just consist of the standard description of the semantics of that logic. So e.g. part of the definition for second-order logic is "If $\varphi$ has the form $\exists X\psi(X)$," then $f(\varphi)=\min_{A\subseteq\mathcal{M}}f(\psi[X/R_A])$" where now $R_A$ is a fresh relation symbol naming $A$ - since our quantifiers are ranging over more things, we need to use more "expanded" language than just fresh constant symbols naming elements of the structure. The general study of logics beyond first-order logic is called abstract model theory, and - while quite technical - the standard text on the subject is the collection Model-theoretic logics.