Researching word difficulty in greater detail
I noticed that dictionary.com has a word difficulty index, which it calls "proprietary."
Questions:
- Is there an open-source list of English words that assigns a difficulty index value to each word?
- Are there other such indices available?
- Is there a position paper (or, better, a publication) that explains how dictionary.com derives this index for each word?
Solution 1:
Yes, there are several legitimate ways of scoring a word that are comparable to this unexplained 'difficulty'. Candidates include:
- frequency - the number of occurrences of a word in a given corpus
- complexity/readability - the number of syllables, sounds, or characters in the word
- questions - the number of times the word is asked about or looked up (in a given online dictionary or web app)
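As a minimal sketch of the frequency measure, the Python below counts a word's occurrences in a corpus and normalizes them to a rate per million tokens, so that corpora of different sizes can be compared. The function name `word_frequencies`, the crude regex tokenizer, and the toy corpus are illustrative assumptions, not any dictionary's actual method.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> dict[str, float]:
    """Return each word's frequency per million tokens of `text`."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    # Normalize raw counts to a per-million rate so different corpora compare.
    return {word: count * 1_000_000 / total for word, count in counts.items()}


# Hypothetical toy corpus; real frequency lists are built from large corpora.
corpus = (
    "The report is updated periodically. "
    "The data is checked periodically, and the nonce is new."
)
freqs = word_frequencies(corpus)
print(round(freqs["periodically"]), round(freqs["nonce"]))
```

The per-million normalization matters because a raw count of 2 means something very different in a 15-word sample than in a billion-word corpus.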
Surely there are others. Several online resources list English words by frequency, as do some online dictionaries (the Google Ngram Viewer doesn't give a list, but it lets you compare words by frequency). Readability is easily calculated per word once you choose a metric. The number of searches for a word is very app-specific and also very volatile: it depends heavily on whether a particular uncommon word happens to be used by someone in the news. I can't find anything online about dictionary.com's method of calculation.
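To illustrate what choosing a per-word metric can look like, here is a sketch of one very simple complexity measure: character count plus a heuristic syllable estimate (count vowel groups, then drop a silent final 'e'). The heuristic and the names `estimate_syllables` and `word_complexity` are assumptions for illustration; established readability formulas define their own syllable rules.

```python
import re


def estimate_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final 'e' is often silent ("nonce"), but not in "-le" ("table") or "-ee" ("agree").
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def word_complexity(word: str) -> tuple[int, int]:
    """Return (character count, estimated syllable count) for one word."""
    return len(word), estimate_syllables(word)


for w in ["nonce", "periodically"]:
    print(w, word_complexity(w))
```

This heuristic returns 5 for "periodically" because it treats "io" as a single vowel group, while a dictionary may split those vowels into separate syllables; any count of this kind is an approximation.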
There is still the matter of deciding what exactly 'difficulty' should be, given some combination of these measures and any others. I think frequency is what most people really have in mind when they use the word 'difficulty': 'periodically' is far more 'complex' than 'nonce', but it is also far more frequent.
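One hypothetical way to combine the measures while letting frequency dominate is a weighted sum of a log-scaled rarity term and a small syllable term. The function `difficulty`, its weights, and the input frequencies below are made up for illustration; they are not measured values and not dictionary.com's formula.

```python
import math


def difficulty(freq_per_million: float, syllables: int,
               freq_weight: float = 1.0, syllable_weight: float = 0.1) -> float:
    """Hypothetical score: higher means harder. Rarity dominates; syllables add a little."""
    # Log scale, since word frequencies span many orders of magnitude.
    # log10(1e6 / f) is 0 for a word that makes up every token and grows as f shrinks.
    # The floor on f avoids division by zero for unseen words.
    rarity = math.log10(1_000_000 / max(freq_per_million, 1e-6))
    return freq_weight * rarity + syllable_weight * syllables


# Made-up frequencies for illustration only, not measured values.
frequent_long = difficulty(freq_per_million=5.0, syllables=6)  # like "periodically"
rare_short = difficulty(freq_per_million=0.2, syllables=1)     # like "nonce"
print(f"{frequent_long:.2f} {rare_short:.2f}")
```

With these weights, a frequent six-syllable word still scores as easier than a rare one-syllable word, mirroring the 'periodically' versus 'nonce' contrast. Raising `syllable_weight` shifts the score toward complexity instead.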