What's the deal with empty models in first-order logic?
Asaf's answer here reminded me of something that should have been bothering me ever since I learned about it, but which I had more or less forgotten about. In first-order logic, there is a convention to only work with non-empty models of a theory $T$. The reason usually given is that the sentences $(\forall x)(x = x)$ and $(\forall x)(x \neq x)$ both hold in the "empty model" of $T$, so if we want the set of sentences satisfied by a model to be consistent, we need to disallow the empty model.
This smells fishy to me. I can't imagine that a sufficiently categorical setup of first-order logic (in terms of functors $C_T \to \text{Set}$ preserving some structure, where $C_T$ is the "free model of $T$" in an appropriate sense) would have this defect, or if it did it would have it for a reason. So something is incomplete about the standard setup of first-order logic, but I don't know what it could be.
The above looks like an example of too simple to be simple, except that I can't explain it to myself in the same way that I can explain other examples.
Both $(\forall x)(x = x)$ and $(\forall x)(x \not = x)$ do hold in the empty model, and it's perfectly consistent. What we lose when we move to empty models, as Qiaochu Yuan points out, are certain inference rules that we're used to.
For first-order languages that include equality, the set $S$ of statements that are true all models (empty or not) is a proper subset of the set $S_N$ of statements that are true in all nonempty models. Because the vast majority of models we are interested in are nonempty, in logic we typically look at sets of inference rules that generate $S_N$ rather than rules that generate $S$.
One particular example where this is useful is the algorithm to put a formula into prenex normal form, which is only correct when we limit to nonempty models. For example, the formula $(\forall x)(x \not = x) \land \bot$ is false in every model, but its prenex normal form $(\forall x)(x \not = x \land \bot)$ is true in the empty model. The marginal benefit of considering the empty model doesn't outweigh the loss of the beautiful algorithm for prenex normal form that works for every other model. In the rare cases when we do need to consider empty models, we realize we have to work with alternative inference rules; it just isn't usually worth the trouble.
From a different point of view, only considering nonempty models is analogous to only considering Hausdorff manifolds. But with the empty model there is only one object being ignored, which we can always treat as a special case if we need to think about it.
Isn't this a non-issue?
Many of the most common set-ups for the logical axioms were developed long ago, in a time when mathematicians (not just logicians) thought that they wanted to care only about non-empty structures, and so they made sure that $\exists x\, x=x$ was derivable in their logical system. They had to do this in order to have the completeness theorem, that every statement true in every intended model was derivable. And so those systems continue to have that property today.
Meanwhile, many mathematicians developed a fancy to consider the empty structure seriously. So logicians developed logical systems that handle this, in which $\exists x\, x=x$ is not derivable. For example, this is always how I teach first order logic, and it is no problem at all. But as you point out in your answer, one does need to use a different logical set-up.
So if you care about it, then be sure to use the right logical axioms, since definitely you will not want to give up on the completeness theorem.
(This is just a minor addition to the other excellent answers.)
There are categorical foundations for model theory: Makkai and Reyes, First order categorical logic (LNM 611). Here is a quote from page 72:
An important point is that we allow the (partial) domains $M(s)$ of $M$ to be empty. In model theory, usually the domains are stipulated to be non-empty. This difference slightly effects what sequents are considered logically valid; c.f. below.
Okay, if this is the answer, it is quite silly. The axioms of first-order logic in my notes include
$$(\forall x) p \Rightarrow p[t/x]$$
which is manifestly false for the empty model and $p = \perp$ so should just be thrown out and replaced by the correct axiom
$$(\forall x) p \wedge (\exists x) \Rightarrow p[t/x].$$
Nothing changes except for the empty set, where the statement $(\forall x) \perp$ is true but $\perp$ is not, so there is no contradiction.