Logical/Etymological reason for unique conjugation of third person singular present tense

As this is a very broad question whose full answer would fill several books, I first present a brief orientation and outline of how we got here today, with pointers to more detailed material.


Closely related to this question are questions like these, some of which you may have actually been asking about indirectly:

  • Why is it ‑s not ‑xyzzy or ‑rumplestiltskin?
  • Why is it pronounced three ways?
  • Why is it sometimes spelled differently than other times?
  • Why isn’t it always “spelled the way it’s pronounced”?
  • What’s its relationship to its two sound-alikes, the plural inflection for nouns and the possessive enclitic?
  • Why don’t we have this in other persons and tenses or nonfinite forms?
  • Why don’t we have this in the preterite?
  • Why don’t we have this in the modals?
  • Why doesn’t the required subject suffice to say which person it is?
  • Why do we have this at all?
  • Since the plural of lives is live, why then isn’t the plural of has just plain *ha — and all the rest like that?

In those questions as well as in the one which I believe was asked, synchronic analysis fails to provide a satisfying answer, or really any at all in most cases. Instead one must examine the language diachronically to draw out a sensible answer, and a full treatment of that answer must reach back over six millennia.

Languages with mainly unbound morphemes — atomic units of meaning at the lexemic level only — are classified as analytic languages, while those whose individual words each comprise multiple bound morphemes are classified as synthetic languages.

An individual word in a synthetic language combines several bound morphemes where each little internal piece adds something to the word’s overall meaning. Morphological inflection can occur via affixes (give > giving, ox > oxen, walk > walked), via sound changes of vowels or consonants (give > gave, shoot > shot, man > men, mouse > mice, this > these), or via both (brother > brethren, swell > swollen, give > gavest).

Today’s English is for the most part an analytic language. For meaning, we rely far more strongly on fixed word-order and on little “function” words (including auxiliary verbs, articles and other determiners, conjunctions, and prepositions) than we do on synthesis via inflectional morphology the way synthetic languages do.

At the same time, English still has a few inflections left in it thanks to its ultimate derivation from a long genetic line of highly synthetic languages stretching back over 6,000 years. We can trace English’s ancestors all the way to the prehistoric (read: unwritten) Proto-Indo-European (PIE) language of our distant ancestors. That language was strongly synthetic in all its open word classes, including in its verb forms.

One distinctive PIE verb inflection that occurred in certain verbs’ third-person present singular conjugations was *‑t or *‑ti. This became in prehistoric Proto-Germanic *‑di or *‑þi, in Old English ‑(e)þ, in Middle English ‑(e)þ, and in Early Modern English the ‑(e)th of he liveth, which gave way quickly enough to the distinctive inflection you’ve asked about, the ‑(e)s form of he lives or he itches in today’s English.

PIE third-person singular inflections also produced forms like German er bleibt, Latin manet or cōnstat, and Old French il remaint. The same ending survives in the imperfect il restait of today’s French, although you can no longer normally hear it pronounced in speech; it is gone from the present tense except in certain relics such as subjunctive qu’il soit for “that he/it should be”.

Old English was a much more synthetic language than Middle English, which saw dramatic reductions in inflections as the language transitioned to an analytic one. There are several proposed explanations for why this happened, but that’s a whole nother topic with its own lines of investigation. Suffice it to say that Middle English was a furious time of mergers and acquisitions that saw as sweeping changes to the grammar as to the lexicon.

Just as Middle English remade synthetic Old English analytically by reducing inflections across the board, Early Modern English, whose verbs still enjoyed more conjugations than today’s English does, saw those conjugations undergo rapid evolution in turn. From the Wikipedia article on that topic we read:

Verbs

Marking tense and number

During the Early Modern period, English verb inflections became simplified as they evolved towards their modern forms:

  • The third person singular present lost its alternate inflections: ‑(e)th became obsolete while ‑s survived. (The alternate forms’ coexistence can be seen in Shakespeare’s phrase, “With her, that hateth thee and hates us all”).

  • The plural present form became uninflected. Present plurals had been marked with ‑en, ‑th, or ‑s (‑th and ‑s survived the longest, especially with the plural use of is, hath, and doth). Marked present plurals were rare throughout the Early Modern period, though, and ‑en was probably only used as a stylistic affectation to indicate rural or old-fashioned speech.

  • The second person singular was marked in both the present and past tenses with ‑st or ‑est (for example, in the past tense, walkedst or gav’st). Since the indicative past was not (and is not) otherwise marked for person or number, the loss of thou made the past subjunctive indistinguishable from the indicative past for all verbs except to be.

I reckon that that’s as detailed an answer to a rather broad question as one dare get here.

I note in passing that English does retain a single, unique inflectional distinction in the past indicative’s singular was versus its plural were (which is also the past subjunctive irrespective of number). Verbs other than be are no longer so marked.


Postamble

Lastly, in John Lawler’s comment:

And {‑Z₁}, the English third person singular present tense suffix — one of the 8 inflections left in English, and one of three that all use /z/, /s/, and /əz/ — is the only mark of present tense around (everything else can be taken for an infinitive, and often is in nascent Englishes). So that suffix, and the subsequent worry about “is XYZ singular or plural?” becomes a status symbol, like whom, and is often mistaken and frequently omitted. That’s the way the cookie crumbles, etymologically speaking.

When John writes {‑Z₁} using an archiphonemic {Z}, what he means is that that morphological inflection ends up being pronounced in three slightly different ways depending on the sound that immediately precedes it:

  1. kits has /s/
  2. kids has /z/
  3. kisses has /əz/

Moreover, we use that same {Z} archiphoneme for three of English’s eight remaining inflections:

  1. First we use it for present-tense verbs’ third-person singular inflections.
  2. Second we use it for the plural inflections of singular nouns.
  3. And third we use it for possessives formed via enclitic.

All three of these follow the same pronunciation rules to translate archiphonemic {Z} into actual phonetics. (Please don’t worry about the spelling; spelling is merely an immaterial side-effect of writing technology, and so shouldn’t be paid any attention to here since we’re talking about language not technology.)
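
For the programmatically inclined, those pronunciation rules can be roughly sketched as below. This is only a minimal illustration, not a full phonological model: it assumes each stem is supplied as a list of phoneme symbols, and the function name z_allomorph and the two phoneme sets are my own simplified inventions for the purpose.

```python
# Illustrative sketch only: the phoneme sets are a simplified assumption,
# and z_allomorph is a hypothetical helper, not an established API.

# Sibilants: {Z} after these needs a buffer vowel, giving /əz/.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}

# Voiceless non-sibilant consonants: {Z} devoices to /s/ after these.
VOICELESS = {"p", "t", "k", "f", "θ"}


def z_allomorph(stem_phonemes):
    """Return the surface form of {Z} after a stem given as phoneme symbols."""
    last = stem_phonemes[-1]
    if last in SIBILANTS:
        return "əz"
    if last in VOICELESS:
        return "s"
    # Everything else (voiced consonants and all vowels) takes /z/.
    return "z"


examples = [
    ("kits", ["k", "ɪ", "t"]),
    ("kids", ["k", "ɪ", "d"]),
    ("itches", ["ɪ", "tʃ"]),
    ("lives", ["l", "ɪ", "v"]),
]

for word, stem in examples:
    print(word, "/" + z_allomorph(stem) + "/")
```

The sibilant test has to come first, since /s/, /ʃ/, and /tʃ/ are themselves voiceless and would otherwise wrongly pick up plain /s/. The same function serves for the verb inflection, the noun plural, and the possessive alike, which is the whole point of writing all three with one {Z}.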