What are the percentages of the parts of speech in English?
What are the percentages of the parts of speech in English?
For instance, what percent of English is comprised of nouns, verbs, adjectives, etc.?
I have done an extensive web search using a variety of formulations of the question but cannot find this information.
Solution 1:
This is a complex question requiring a complex answer, because it all depends on what kind of language you examine. Corpus evidence used in the ‘Longman Grammar of Spoken and Written English’ shows the following approximate frequencies of thousands of words per million:
LEXICAL WORDS
CONVERSATION
Adverbs 50
Adjectives 25
Verbs 125
Nouns 150
ACADEMIC PROSE
Adverbs 30
Adjectives 100
Verbs 100
Nouns 300
FUNCTION WORDS
CONVERSATION
Pronouns 165
Primary auxiliary verbs 85
Prepositions 55
Determiners 45
Coordinators 30
Modals 20
Subordinators 15
Adverbial particles 10
ACADEMIC PROSE
Pronouns 40
Primary auxiliary verbs 65
Prepositions 150
Determiners 100
Coordinators 40
Modals 15
Subordinators 10
Adverbial particles 5
These figures can give only a crude picture and show only the figures for one kind of written English. In general, though, nouns and verbs are the most common words, and conversation seems to use a higher proportion of verbs, adverbs and pronouns, while written English uses a higher proportion of nouns and adjectives.
Solution 2:
The first problem is that there is no consensus on how many words the English language comprises. It depends in a very banal sense on the way you define the word "word". For example: Are "eat" and "eats" one word or two?
The other problem is that very many words function as different parts of speech. For example, present can be a noun, verb or adjective.
A better approach is to look at a body of text (a corpus) and analyse that by a process called POS (part of speech) tagging . The proportions of the different word classes will likely vary according to whether the corpus is of written or spoken language, in the academic field or popular media, and so on.
Here is an article that estimated the percentage of nouns in the LOB and Brown corpora at 37%.
Solution 3:
Eric Lease Morgan has created Perl scripts to analyse a number of English texts and written about his findings in a fascinating essay called Foray’s into parts-of-speech, summarising:
The result was very surprising to me. Despite the wide range of document sizes, and despite the wide range of genres, the relative percentages of POS [parts-of-speech] are very similar across all of the documents. The last column in the table represents the average percentage of each POS use. Notice how the each individual POS value differs very little from the average.
...
The similarity across all the documents can be further illustrated with a line graph:
Visit the blog for much more information, charts and conclusions.