Is it true that a word ending in -y is more likely to be an adjective than a noun?
I originally wrote a quick Python script that ran on the "Part of Speech Database" here, which is a combination of WordNet and Moby. Then I modified it to run on the frequency list here, which is based on COCA.
The first script found 29476 words ending in -y, of which 13677 were -ly. Therefore we are left with 15799 words ending in -y but not -ly. Among these words, only 2643 were adjectives.
Therefore our key result is 2643/15799 = 0.16729. Approximately 1 out of 6.
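As a rough illustration of the counting step (a minimal sketch, not the original script): the `pos_db` mapping from a word to its set of part-of-speech labels and the function name `count_y_words` are hypothetical stand-ins for however the database is actually loaded, and the four sample entries are toy data.

```python
def count_y_words(pos_db):
    """Count -y words, -ly words, and adjectives among -y-but-not-ly words.

    pos_db: dict mapping a word to a set of part-of-speech labels
    (hypothetical format; the real database layout may differ).
    """
    y_total = ly_total = adj_not_ly = 0
    for word, tags in pos_db.items():
        w = word.lower()
        if not w.endswith("y"):
            continue
        y_total += 1
        if w.endswith("ly"):
            ly_total += 1
        elif "adjective" in tags:
            adj_not_ly += 1
    return y_total, ly_total, adj_not_ly


# Toy data for illustration only, not taken from the database.
sample = {
    "happy": {"adjective"},
    "quickly": {"adverb"},
    "otolaryngology": {"noun"},
    "table": {"noun"},
}
y_total, ly_total, adj_not_ly = count_y_words(sample)
not_ly = y_total - ly_total
print(adj_not_ly, "of", not_ly, "=", adj_not_ly / not_ly)

# The same ratio computed from the counts reported above.
print(round(2643 / (29476 - 13677), 5))
```

A word with several parts of speech counts as an adjective here if any of its labels is "adjective"; whether the original script treated such words the same way is an assumption.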
This did not take word frequencies into account, and I suspected they would boost the proportion somewhat, since many of the -y non-adjectives were quite rare (for example otolaryngology, a noun). So I edited the program to tally instances of each word from a COCA-derived frequency list.
This found:
23,771,109 instances of -y words;
5,713,230 instances of -ly words;
18,057,879 instances of -y words that were not -ly words;
1,632,165 instances of adjectives among this set.
This leads to a frequency of 1632165/18057879 = 0.090385. Roughly 9% of words ending in -y but not -ly were adjectives. Surprisingly, this result was even smaller. I guess in the scheme of things "traditionally-suffixed" adjectives aren't really that common.
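The frequency-weighted tally can be sketched the same way. This is only a sketch under assumptions: the `(word, pos, count)` row format and the name `weighted_counts` are hypothetical, since the layout of the COCA-derived list is not described here, and the sample rows are made-up numbers.

```python
def weighted_counts(freq_rows):
    """Sum token counts for -y words instead of counting each word once.

    freq_rows: iterable of (word, pos, count) tuples (hypothetical format).
    """
    totals = {"y": 0, "ly": 0, "adj_not_ly": 0}
    for word, pos, count in freq_rows:
        w = word.lower()
        if not w.endswith("y"):
            continue
        totals["y"] += count
        if w.endswith("ly"):
            totals["ly"] += count
        elif pos == "adjective":
            totals["adj_not_ly"] += count
    return totals


# Made-up rows for illustration only.
rows = [
    ("happy", "adjective", 50),
    ("really", "adverb", 200),
    ("they", "pronoun", 700),
    ("the", "determiner", 1000),
]
totals = weighted_counts(rows)
not_ly = totals["y"] - totals["ly"]
print(totals["adj_not_ly"] / not_ly)

# The same ratio computed from the instance counts reported above.
print(round(1632165 / (23771109 - 5713230), 6))
```

The toy rows also show why weighting can lower the share: a handful of very frequent non-adjectives such as "they" dominate the denominator.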
I also used the data to answer the converse question (does being an adjective generally imply a -y ending?). There were 28426173 total instances of adjectives, of which 2134139 ended in -y, including -ly. The result here was quite similar: 2134139/28426173 = 0.075077. Only about 3 out of every 40 adjective instances have the "traditional" suffix.
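The converse ratio and its "3 out of 40" approximation can be checked directly from the two counts above. The variable names are illustrative only.

```python
from fractions import Fraction

adjective_instances = 28426173    # total instances of adjectives
adjective_y_instances = 2134139   # of those, ending in -y (including -ly)

share = adjective_y_instances / adjective_instances
print(round(share, 6))

# Closest fraction with a denominator of at most 40.
approx = Fraction(adjective_y_instances, adjective_instances).limit_denominator(40)
print(approx)
```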
Frequency results (percent) using WRI curated data.
----------------------------------------------------
                           Word ending
                ------------------------------------
Part of speech  "y"      "ly"     "y" but not "ly"
----------------------------------------------------
Noun            61.58%   17.03%   81.09%
Adverb          24.24%   77.57%    0.88%
Verb             4.35%    1.06%    5.78%
Adjective       12.90%    6.46%   15.72%
Interjection     0.40%    0.13%    0.53%
Determiner       0.12%    -        0.17%
Pronoun          0.06%    0.02%    0.08%
Preposition      0.02%    -        0.03%
Conjunction      0.03%    -        0.05%
----------------------------------------------------
The columns add up to more than 100% because the same word can be counted in several rows.
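A small sketch of why the column totals exceed 100%: every part of speech a word has is tallied once, but each share is divided by the number of words. The three entries below are toy data, and the dictionary format is a hypothetical stand-in for the curated data.

```python
from collections import Counter

# Toy data: each word maps to all of its parts of speech.
words = {
    "early": {"Adjective", "Adverb"},
    "daily": {"Adjective", "Adverb", "Noun"},
    "quickly": {"Adverb"},
}

n = len(words)  # denominator: number of words, not number of labels
tally = Counter(pos for tags in words.values() for pos in tags)
shares = {pos: count / n for pos, count in tally.items()}

for pos, value in sorted(shares.items()):
    print(f"{pos:<10} {value:.2%}")
print("column total:", f"{sum(shares.values()):.2%}")
```

With these three words the tally holds six labels, so the shares sum to 200% even though each individual share is at most 100%.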
For reference, I used scripts like the following (only one shown, in Mathematica):
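(* number of dictionary words ending in "ly" *)
n = Length@Flatten@WordData[___ ~~ "ly", "Lookup"]
(* tally the parts of speech of those words; each share is relative to n *)
{#[[1]], N@#[[2]]/n} & /@
  Tally@Flatten@(WordData[#, "PartsOfSpeech"] & /@
     WordData[___ ~~ "ly", "Lookup"]) // TableForm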