Where can I find an "official" list of English graphemes? [closed]
Do you know of a list provided by some academic institution? I did find some lists, but I am unable to judge the quality and/or completeness of these:
- This pdf, referenced here.
- and this pdf, referenced here.
Background: I am trying to program a random name generator for project working titles, using the approach outlined here: extracting graphemes from these downloadable free corpus samples and feeding them to some kind of Markov chain.
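To make the idea concrete, here is a minimal sketch of a first-order Markov chain over graphemes. It is not the code from the linked approach; the names `build_chain`, `generate`, `START`, `END` and the toy corpus are made up for illustration, and the words are assumed to be already split into graphemes.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers for word start and word end.
START, END = "^", "$"

def build_chain(words):
    """Map each grapheme to the list of graphemes observed right after it."""
    chain = defaultdict(list)
    for graphemes in words:
        seq = [START, *graphemes, END]
        for current, following in zip(seq, seq[1:]):
            chain[current].append(following)
    return chain

def generate(chain, rng, max_graphemes=6):
    """Random-walk the chain from START until END or the length cap."""
    out = []
    current = START
    while len(out) < max_graphemes:
        # Duplicates in the list make frequent transitions more likely.
        current = rng.choice(chain[current])
        if current == END:
            break
        out.append(current)
    return "".join(out).capitalize()

# Toy input: words already split into graphemes (made-up segmentation,
# standing in for whatever is extracted from a real corpus).
corpus = [
    ["th", "i", "ng"],
    ["sh", "i", "p"],
    ["ch", "ea", "p"],
    ["th", "ough", "t"],
]
chain = build_chain(corpus)
rng = random.Random(0)  # seeded only so that runs are repeatable
names = [generate(chain, rng) for _ in range(5)]
print(names)
```

With a real corpus, `corpus` would hold one grapheme list per word. A higher-order chain (keying on the last two graphemes instead of one) would probably give more word-like output, at the cost of needing more data.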
UPDATE:
I used the Wikipedia list as suggested by @tchrist and the free COCA sample corpus referenced above. The approach worked quite well for my purposes. Here is a small random set of generated words for anyone interested:
Wanstasy, Indricis, Voformer, Colutove, Ingerstr, Tottione, Lspheres,
Umandsam, Extivelo, Pironoba, Zofiropr, Bingernt, Kitleron, Viewinef,
Juntialt, Enabbyth, Uplpofor, Everopeo, Heventri, Ntozzler, Buncener,
Granalse, Nocosacc, Randeren, Randantu, Caredyou, Ftedowla, Ncesnarr,
Ulilkien, Factitur, Grontoft, Noughtoo, Lackeded, Zofricsp, Viewedon,
Tuartand, Dossions, Kifreaps, Xicatage, Evertsom, Emorever, Manksgis,
Ponkiold, Nsualina, Atofficl, Mallitsi, Spmethir, Dayspeed, Anditout,
Xatofrse, Izamedoo, Bupleati, Plitteni, Failitha, Hinglood, Dcoveyou,
It may help to look at the various spellings for each phoneme listed in the “Sound to Spelling Correspondences” section of Wikipedia’s article on English orthography.
I’ve looked at both your PDF sources: the Wikipedia section is better than either of those. Your task is harder than you may realize.
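As a rough illustration of one difficulty, here is a sketch of a greedy longest-match segmenter that splits a word against a grapheme list. The `GRAPHEMES` set below is a tiny made-up sample, not the Wikipedia list, and `segment` is a hypothetical helper; letters that match no multi-letter grapheme are simply emitted on their own.

```python
def segment(word, graphemes):
    """Split a word into graphemes by greedy longest match, left to right."""
    longest = max((len(g) for g in graphemes), default=1)
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, falling back to a single letter.
        for size in range(min(longest, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if size == 1 or piece in graphemes:
                out.append(piece)
                i += size
                break
    return out

# Tiny illustrative sample; a real list would be far longer.
GRAPHEMES = {"th", "sh", "ch", "ng", "ough", "igh", "ea", "ee", "oo", "ck"}

print(segment("thought", GRAPHEMES))   # ['th', 'ough', 't']
print(segment("hothouse", GRAPHEMES))  # ['h', 'o', 'th', 'o', 'u', 's', 'e']
```

The second result shows the problem: in "hothouse" the t and h belong to different parts of the word, but a purely spelling-based matcher cannot know that and merges them into "th". For a name generator this kind of error is probably tolerable, since the output only has to look plausible.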