Is it true that the 100 most common English words are all Germanic in origin?

There is an oft-quoted statement that the 100 most common (frequently used) words in the English language are entirely Germanic/Anglo-Saxon in origin. (Also sometimes said is that ~80% of the 1000 most common are Germanic in origin.) While this did not surprise me so much, I did recently stumble across this Wikipedia page, which lists the supposed 100 most common words, with an attributed source.

A quick glance suggested (to my surprise) several words of non-Germanic (specifically, Latin) origin:

  • use
  • person
  • just
  • because (the cause part)

There may be others I've missed too? Indeed, perhaps due to the entry of Latin words into the Germanic languages in the proto-Germanic period (and the fact they are both ultimately Indo-European languages) some of the etymologies may be uncertain. Do correct me if that's not the case, as I am no historical linguist.

Clearly, depending on the statistical sample used to compile the list, results can vary. However, is there any accepted/standard list of the 100 most common English words? And moreover, is it a myth that they're all Germanic in origin (as I now doubt)?


is there any accepted/standard list of the 100 most common English words?

I suppose it all depends on your definition of authoritative, but I think a good start is The Oxford English Corpus, a collection containing over 2 billion words of 21st century English from around the world. Here's a list of facts about the corpus, including the 100 commonest words in the English language.

Neat facts about distribution: 10 lemmas (a lemma is the base form of a word; is and are are both forms of the lemma be) make up 25% of the corpus, 100 make up 50%, 1000 make up 75%, 7000 make up 90%, 50,000 make up 95%, and you need over a million to get 99% coverage.

So, one quarter of all words used are the, be, to, of, and, a, in, that, have, and I.

Is it a myth that they're all Germanic in origin (as I now doubt)?

Yeah, most of them are Germanic in origin, but not all.

As you noted:

use is of Latin origin (by way of French) and replaced the O.E. verb brucan (which survives as the verb brook "to tolerate, put up with something unpleasant")

because is partly of Latin origin: it comes from the Middle English phrase bi cause "by cause," where cause goes back to Latin causa by way of French.

and

people is also Latin by way of French.

Those are the only words that jumped out at me. Of course, most of the common words have Indo-European origin, so they'll ultimately share a common root anyway. See two and duo.


It's usually pretty simple to spot Latin loans, even if they were borrowed in the common Germanic period. Grimm's law means that most of the consonants are different in inherited words and Latin loans.

It's also worth noting that English has a certain number of words borrowed from Norman. This means that in some cases you have three versions of what is essentially a single proto word: an inherited version, a Latin loan, and a Norman loan. The last two will of course be quite similar, but not identical.

As to your question, I'd be surprised if there are no loans at all in the top 100 words. If nothing else, some of the personal pronouns ("they" and "them" if memory serves) are borrowed from Norse. A related language, yes, but inherited forms would be different from what we have in modern English.


Here is one site that blends British, American, and Australian English together: http://www.world-english.org/english500.htm


Latin influenced an already existing language: English. Therefore, all the most basic words already existed: pronouns, articles, particles, basic verbs such as to talk and to eat, and basic nouns such as the seasons, earth, food, etc. English speakers didn't "need" a Romance word for these. They needed words for things that were being introduced to them by these new people, like a gladiator, not words for things they already knew about, like the sun. Also, because the most common words are used the most, they would resist being replaced by the influx of French the most. If you use the word 'gaderian' (Old English: gather) once a year, it's easier for you to shift from saying 'gaderian' to 'assembler' (Old French: assemble) than it would be for you to stop using a word you use every day, such as 'æfter' (Old English: after), and start using "apres" (Old French: after).

To sum it up, all the most common words existed already in English before Romance influence, and since their frequency of use makes them more resistant to change, almost all of the most common words are of Germanic origin.

The following is not exactly on topic, but pretty related and, mostly, very interesting.

The words we use most of the time are these 100 most common words. They create the entire structure for the language, and are then filled in with rare and specific words. No matter what crazy animal you see at the zoo, you're going to use a slew of these 100 most common words, with the addition of specific information to convey details. Whether a lion hunted, a monkey climbed, a wolf howled, or any other thing like that, this is true. "I saw a monkey climb a tree, it was so cool." 'I', 'saw' (see; inflections are attributed to the root), 'a', 'it', 'was' (be), and 'so' are all on this list. You could change it to be about a lion roaring, but 7 of the 11 words (counting 'a' twice), more than half, don't need to change. By placing specific words into the framework created by the common words, we get a full language. This sounds obvious once spelled out, but I think it's a good way of understanding why 100 words make up half of the words we use.
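Here is a rough sketch of that count in Python, in case it helps to see it done mechanically. The word set and the lemma table are hand-written assumptions that cover only this example (they are not the full corpus list or a real lemmatizer), and `common_share` is just a name I made up. It counts every token, so 'a' is counted each time it appears.

```python
import re

# Assumption: a hand-picked subset of a top-100 lemma list,
# containing only the entries this example needs.
COMMON_LEMMAS = {"i", "see", "a", "it", "be", "so"}

# Assumption: a tiny hand-written lemma table, not a real lemmatizer.
LEMMA_OF = {"saw": "see", "was": "be"}


def common_share(sentence):
    """Return (common_tokens, total_tokens) for a sentence."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", sentence)]
    common = [t for t in tokens if LEMMA_OF.get(t, t) in COMMON_LEMMAS]
    return len(common), len(tokens)


monkey = common_share("I saw a monkey climb a tree, it was so cool.")
lion = common_share("I saw a lion roar, it was so cool.")
print(monkey, lion)
```

Swapping the animal changes only the specific words; the common-word frame is still counted the same way.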

If you are interested in something highly related, and incredibly interesting, look into Zipf's law. It's a statistical "law" (it isn't a law like gravity; it only gives approximations and doesn't hold in all cases) that describes a frequently occurring pattern in statistics. Basically, in a set (the words in a given sample of a language, such as a book or even all of Wikipedia, work very well), the 2nd most common thing (word, in this case) is used 1/2 as often as the most common word. The 3rd most common word is used 1/3 as often as the most common, the 4th is used 1/4 as often, continuing on until you get to single instances. Single-instance words, interestingly enough, make up a large share of the distinct words used. Check out https://www.youtube.com/watch?v=fCn8zs912OE for more information.
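For anyone who wants to play with the numbers, here is a minimal Python sketch of the idealized version of the law (each rank r occurring 1/r as often as rank 1). Real text only follows this approximately, and the function names and the example count of 1200 are my own assumptions for illustration.

```python
def zipf_expected(top_count, max_rank):
    """Expected counts for ranks 1..max_rank under idealized Zipf:
    the word at rank r occurs top_count / r times."""
    return [top_count / rank for rank in range(1, max_rank + 1)]


def harmonic(n):
    """Sum of 1/k for k = 1..n (the total weight of the top n ranks)."""
    return sum(1.0 / k for k in range(1, n + 1))


def zipf_share(top_n, vocab_size):
    """Fraction of all tokens covered by the top_n words, assuming
    an idealized Zipf distribution over vocab_size distinct words."""
    return harmonic(top_n) / harmonic(vocab_size)


# If the most common word occurred 1200 times (a made-up figure),
# the next three ranks would be expected at about 1/2, 1/3 and 1/4 of that.
print(zipf_expected(1200, 4))

# Share of tokens covered by the top 100 words in a hypothetical
# 50,000-word vocabulary under the idealized law.
print(zipf_share(100, 50000))
```

The share depends heavily on the assumed vocabulary size, so treat the second figure as an illustration of the shape of the curve rather than a measurement of English.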


In general, the words English inherited from Germanic mostly have one or two syllables. While this also describes some words with French or Latin origins, most of the multi-syllable words in English come from French or Latin rather than from Germanic.

But the easier Germanic words make up most (not all) of the "top 100."