Why is there an extra "t" in Lemmatization?
"Lemma" is from a Greek word that had a t in some of its forms
Etymologically, the t in lemmatize comes from the stem of the Greek word λῆμμα, which is the source of the English word lemma. Greek nouns have many inflected forms: the citation form λῆμμα is just the nominative (and accusative) singular form. Most other forms of a Greek noun are built on a stem that often differs at the end from the nominative singular form. Greek has many neuter nouns ending in -μα (-ma) with stems ending in -ματ- (-mat-). Conventionally, the inflected form used to identify the stem of a Greek noun is the genitive singular, which for λῆμμα is λήμματος (lemmatos). By removing the -ος, you can identify the stem lemmat-. The nominative and accusative plural λήμματα (lemmata) is built on the same stem.
Some other English words follow the same pattern
Most English speakers don't know these kinds of details about the etymology of words like lemma or about how Greek nouns inflect. They just memorize the form of the English word lemmatize, possibly aided by analogy with other pairs of similarly related words.
A number of other Greek -μα nouns have entered English as -ma nouns and show the same variation with -mat- in derived words:
- stigma, stigmatize
- asthma, asthmatic
- trauma, traumatic, traumatize
- aroma, aromatic, aromatize
- enigma, enigmatic
- cinema, cinematic
- drama, dramatic, dramatize
There are also some English nouns ending in -m that are from Greek neuter nouns ending in -μα and that are related to words containing -mat-:
- system, systematic, systematize
- problem, problematic, problematize
- emblem, emblematic
- symptom, symptomatic
- sperm, spermatic
But custom, as in customize, is not such a noun. I don't think there's any easy way to figure that out aside from looking up its etymology.
/t/ is not just automatically inserted after any vowel followed by -ize, although there might be some non-automatic tendency towards /t/-insertion in certain contexts
As I mentioned, most speakers are not aware of the etymological source of the t in lemmatize.
Some comments and answers have brought up the idea that, from a synchronic (as opposed to diachronic) perspective, the /t/ in lemmatize could be analyzed as a consonant inserted to prevent hiatus (a sequence of two vowels in separate syllables with no intervening consonant).
I don't think that hypothesis is untenable, but I want to point out that any such process of /t/-insertion before -ize is not very productive, and it is more limited than a simple rule like "-tize is used after vowels".
Looking at other words ending in /ə/ spelled -a, we see the following alternatives to inserting /t/ before -ize:
- hiatus, with a possible change in vowel quality (which could be viewed as introducing a front glide): algebra > algebraize /-eɪaɪz/ or /-əaɪz/
- dropping the first vowel: formulize, silicize, nebulize, patinize
If we look at other vowels, we also see those alternative strategies being used fairly frequently. Dropping the first vowel is very common with bases ending in /i/:
- jeopardize, scrutinize, summarize, agonize, theorize, notarize, anatomize, empathize, eulogize, prioritize, botanize, alchemize, etymologize, militarize, melodize, theologize, lobotomize, strategize, astronomize, philosophize, memorize, allegorize, sorcerize
For bases ending in /oʊ/, hiatus (which could be viewed as involving a back glide) seems no less common than t-insertion. Hiatus occurs in ghettoize, heroize, jumboize, and memoize. The only case of t-insertion after /oʊ/ that I know of is egotize, which coexists with a less frequent alternative form with hiatus, egoize.
For non-rhotic speakers, there are a great many words ending in the sound /ə/ with spellings that end in the letter R. When such words are suffixed with -ize, the consonant sound /r/ is inserted after the /ə/, as in the following list:
- characterize, rubberize, rasterize, vulgarize, vascularize, exteriorize, valorize, factorize
Etymonline states:
1560s, in mathematics, from Greek lemma (plural lemmata) "something received or taken; an argument; something taken for granted,"
(emphasis mine)
This is where the t comes from. In addition, note that lemma derives from Greek, whereas your other examples come from Latin through French. This would account for the difference in forms.