English parts of speech — better new treatments
Can anyone please recommend a better treatment of English parts of speech / word classes than that offered by most traditional grammars?
Many of the latter stick with the sacrosanct eight of antiquity, or perhaps allow for one or two more, while POS tagsets may contain many hundreds of tags (catering for subsets such as plural nouns, verb forms, etc.). Demanding that we stick with eight for sentimental reasons seems like saying 'Let's just have four (chemical) elements, like they used to, because that's easier.' Some parts of speech ('function words') are distinguished according to function, or are simply called adverbs if they show a vague resemblance in some respect and we don't like having more than 8 (or 12 or 24) classes. However, having 750 classes as a working model does seem to be erring the other way.
Does anyone know of a sensible (Goldilocks!) treatment?
Solution 1:
I would suggest looking at one of the better modern descriptions of English grammar. The best, of course, is "The Cambridge Grammar of the English Language" by Huddleston and Pullum. This has an excellent analysis of English POS but is very expensive, so an alternative is "A Student's Introduction to English Grammar", written by the same authors and based on the larger work, but briefer and cheaper.
Solution 2:
Which set of categories is "just right" depends on your purpose and the type of analysis that you want to perform. (Though yes, it's clear that the traditional set of 8-10 categories is inadequate for pretty much any purpose...)
I would suggest that you start with the tagset for one of the major corpora and group together categories that you don't need to differentiate for your purposes.
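As a rough illustration of that grouping step, here is a minimal Python sketch that collapses Penn Treebank tags into a handful of coarser classes. The tag names are from the Penn Treebank tagset, but the coarse class names, the particular grouping (for example, putting modals under VERB and "TO" under PREP), and the `collapse` helper are assumptions made for this example only; you would regroup to suit your own analysis.

```python
# Illustrative grouping of Penn Treebank tags into coarser classes.
# The class names and the grouping are assumptions for this sketch,
# not a standard; adjust them to the distinctions you actually need.
COARSE_CLASSES = {
    "NOUN": {"NN", "NNS", "NNP", "NNPS"},
    "VERB": {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"},
    "ADJ": {"JJ", "JJR", "JJS"},
    "ADV": {"RB", "RBR", "RBS", "WRB"},
    "PRON": {"PRP", "PRP$", "WP", "WP$"},
    "DET": {"DT", "PDT", "WDT"},
    "PREP": {"IN", "TO"},
    "CONJ": {"CC"},
}

# Invert the grouping into a direct lookup: fine-grained tag -> coarse class.
TAG_TO_CLASS = {
    tag: coarse for coarse, tags in COARSE_CLASSES.items() for tag in tags
}


def collapse(tagged_tokens, other="OTHER"):
    """Map (word, fine_tag) pairs to (word, coarse_class) pairs.

    Tags not listed in COARSE_CLASSES fall into the `other` bucket.
    """
    return [(word, TAG_TO_CLASS.get(tag, other)) for word, tag in tagged_tokens]


# Hand-tagged sample; in practice the tags would come from a corpus or tagger.
sample = [("The", "DT"), ("dogs", "NNS"), ("barked", "VBD"),
          ("loudly", "RB"), (".", ".")]
print(collapse(sample))
# [('The', 'DET'), ('dogs', 'NOUN'), ('barked', 'VERB'), ('loudly', 'ADV'), ('.', 'OTHER')]
```

Keeping the grouping in one table makes it easy to split a class again later (say, separating modals from lexical verbs) without touching the rest of the code.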