Can the schwa sound predict spelling?
Solution 1:
Absolutely not. Pronunciation never determines spelling in English. Spelling has its own ancient history, one far removed from any attempt to encode pronunciation.
These are all schwas:
- Alan
- mountain
- kitten
- cetacean
- foreign
- satin
- atomician
- motion
- lemon
- autumn
- Eryn
- rhythm
- acre
- little
In the case of the two most commonly misspelled words in the English language, separate and occurrence, these are spelled that way because of how Latin respectively spelled sēparāre < sē + parāre and occurrencia < occurrentia.
The same sort of thing is true with a very great many other words in English. You have to understand their history to understand their spelling. Pronunciation is virtually immaterial.
Solution 2:
The other answers address the question, I think, but as regards "separate" specifically: it is possible that people who misspell it aren't predicting an "e" from the schwa sound (after all, Boondoggle's data shows that "o" would make as much sense there as "e" does, if not more). They may instead be spelling by analogy with the word/syllable "per", which is pronounced with a schwa/r-colored vowel the same way as the "par" in "separate", whereas the word/syllable "par" in isolation is not.
Solution 3:
The schwa predicts the spelling to some extent, although highly inconsistently in most cases.
Hanna et al. (1966) found that phoneme to grapheme correspondence is dependent upon the position of the phoneme within the syllable. An algorithm scored whether a phoneme occurred in the first, the middle, or in the final part of a syllable. For instance the word 'string' would be separated into initial: S T R, medial: I, and final: NG. The phonemes were matched to the graphemes of the words based on the Zipfian least effort principle. Based on phonological information alone about 49 percent of the 17,000 words in the corpus could be spelled correctly.
Below is a table with the 7 most popular graphemic options for the schwa sound in relation to the position within a syllable, adapted from Hanna et al. (1966, p. 59):
From this table we can calculate the schwa-letter correspondence:
O | 26.79%
A | 23.91%
I | 22.40%
E | 12.68%
OU | 5.58%
U | 4.93%
E-E | 1.67%
The schwa sound is about twice as likely to correspond to the letter "A" (23.9%) as to the letter "E" (12.7%). When the schwa is in the final part of a syllable the most probable candidate for spelling is the letter "A" (1419/3023 * 100 = 46.9%) followed by the letter "I" (1332/3023 * 100 = 44.1%). However, the probability of the schwa being an "A" in a middle part of a syllable is less than one percent (only 13 occurrences).
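The overall comparison can be checked with a few lines of Python. This is only a sketch: the dictionary simply restates the percentages listed above, and the name `schwa_share` is made up for illustration.

```python
# Overall schwa-to-grapheme shares (percent), as listed above.
schwa_share = {
    "O": 26.79, "A": 23.91, "I": 22.40, "E": 12.68,
    "OU": 5.58, "U": 4.93, "E-E": 1.67,
}

# How much more likely is "A" than "E" as a spelling of schwa?
a_to_e = schwa_share["A"] / schwa_share["E"]
print(f"A is {a_to_e:.2f} times as likely as E")  # A is 1.89 times as likely as E

# Share of schwa occurrences covered by the seven listed graphemes.
covered = sum(schwa_share.values())
print(f"The seven graphemes cover {covered:.2f}% of schwas")
```

The ratio comes out a little under two, which is what "about twice as likely" refers to. The seven graphemes do not add up to 100%, so less common spellings account for the remainder.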
There's a ~99% chance that the schwa does not represent an "A" or "I" in the first or middle part of syllables. Now what does this mean in practice?
Consider the word separate.
It can be divided into seven phonemes and three syllables:
S E3 P - SWA - R A T (verb)
S E3 P - SWA - R I3 T (adj)
Six phonemes with the R-colored vowel instead of the schwa (cmudict-0.7b):
S EH1 P - ER0 - EY2 T (verb)
S EH1 P - ER0 - IH0 T (adj)
Or six phonemes and two syllables:
S EH1 P - R AH0 T (adj)
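The phoneme and syllable counts for the cmudict-style transcriptions can be verified mechanically. The sketch below assumes ARPAbet notation as used in cmudict, where every vowel phoneme ends in a stress digit (0, 1 or 2), so the number of vowels equals the number of syllables. The hyphens are the syllable separators as written in this answer, and the function name is hypothetical.

```python
def count_phonemes_and_syllables(pron):
    """Return (phonemes, syllables) for an ARPAbet string such as 'S EH1 P - ER0 - EY2 T'."""
    # Drop the hyphens used here as syllable separators.
    phones = [p for p in pron.split() if p != "-"]
    # In ARPAbet, only vowels carry a trailing stress digit.
    syllables = sum(1 for p in phones if p[-1].isdigit())
    return len(phones), syllables

for pron in ("S EH1 P - ER0 - EY2 T", "S EH1 P - ER0 - IH0 T", "S EH1 P - R AH0 T"):
    print(pron, count_phonemes_and_syllables(pron))
```

This only applies to the three cmudict-style lines. The first two transcriptions (with SWA) use a different notation and would need their own vowel list.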
In the adjective version the second syllable may not be pronounced. In other cases the schwa phoneme or the R-vowel is used in the second syllable. When the schwa is in medial position (i.e. the middle of syllable) the predicted spelling would be an "O" (1176/2046 * 100% = 57.5%) or an "E" (420/2046 * 100% = 20.5%).
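The positional percentages are plain ratios of the counts quoted from Hanna et al. A minimal sketch for recomputing them follows. Only the counts mentioned in the text are included, and the names `counts` and `share` are illustrative.

```python
# Counts quoted in the text (Hanna et al. 1966, p. 59); other graphemes are omitted.
counts = {
    "final": {"total": 3023, "A": 1419, "I": 1332},
    "medial": {"total": 2046, "O": 1176, "E": 420},
}

def share(position, grapheme):
    """Percentage of schwas in the given syllable position spelled with the grapheme."""
    row = counts[position]
    return 100 * row[grapheme] / row["total"]

for position, row in counts.items():
    for grapheme in row:
        if grapheme != "total":
            print(f"{position:6s} {grapheme}: {share(position, grapheme):.1f}%")
```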
It might be useful to teach English writers the exceptions to phoneme-grapheme correspondence, particularly schwa-A (19 words) and schwa-I (15 words).
Hanna et al. have a list of those words, some of which are questionable:
Schwa spelled "A":
- initial position of the syllable (p. 1423): anarchist, anarchy, ballast, damask, harass, palatable.
- medial position of the syllable (p. 1423): breakfast, canvas, canvass, carcass, compass, cutlass, encompass, pampas, purchaser, trespass, trespasser, windlass.
Schwa spelled "I":
- initial position of the syllable (p. 1437): basil, civil, gossip, imperil, peril, vigil.
- medial position of the syllable (p. 1437): council, moccasin, nostril, pencil, stencil, tendril, tonsil, tulip, turnip.
In some dialects a word may not actually be pronounced with a schwa such as with anarchy. And the syllabification in some cases is questionable such as with palatable. This makes it more difficult to generate a reliable spelling rule based on pronunciation across dialects.
Solution 4:
In some accents, schwa is less likely to be written with "i" or "y"
This depends partly on one's accent. Some accents maintain a somewhat (although not entirely) stable phonemic distinction in some words between two types of "weak" or "reduced" vowels: a more front one, usually identified with the strong/unreduced vowel found in the word "kit" (/ɪ/), and a more central one, usually identified with the "schwa" symbol /ə/, and often thought of as being similar to the strong/unreduced vowel found in the word "strut".
But speakers of some other accents feel that there is (at least in general) no stable phonemic distinction like this between different kinds of weak/reduced vowels. The absence of a phonemic distinction between weak /ɪ/ and /ə/ has been called the "weak vowel merger".
You can see from the discussion beneath tchrist's answer that FumbleFingers does not seem to have this merger, while tchrist does have this merger.
My understanding is that the Oxford English Dictionary (OED) uses transcriptions that assume that an American English speaker will have the weak vowel merger, but a British English speaker may or may not have the merger. The symbol /ᵻ/ is used in OED transcriptions to indicate a vowel that may be pronounced as /ɪ/ or as /ə/; in contemporary British English, this usually corresponds to a word that was considered to have "weak" /ɪ/ in "RP" English.
- I believe the letter "i" in an unstressed syllable often, although not always, corresponds to /ɪ/ rather than /ə/ for a speaker without the weak vowel merger. The same goes for "y". The OED transcribes the verb "predicate" as "/ˈprɛdᵻkeɪt/, U.S. /ˈprɛdəˌkeɪt/", and the verb "carboxylate" as "Brit. /kɑːˈbɒksᵻleɪt/, U.S. /ˌkɑrˈbɑksəˌleɪt/". Word-internally, it is possible for "i" to correspond to /ɪ/ before /r/, as in "perspirate" (OED: "Brit. /ˈpəːspᵻreɪt/, U.S. /ˈpərspəˌreɪt/") or "hydrargyrum" (OED (1899): /hʌɪˈdrɑːdʒɪrəm/).
- The situation with "e" seems to be more complicated, but in at least some circumstances it can correspond to weak /ɪ/ rather than /ə/. The OED transcribes the verb "aggregate" as "Brit. /ˈaɡrᵻɡeɪt/, U.S. /ˈæɡrəˌɡeɪt/". However, it seems like "e" is generally pronounced /ə/ before /r/, even in accents without the weak vowel merger: the OED transcribes "generate" as "Brit. /ˈdʒɛnəreɪt/, U.S. /ˈdʒɛnəˌreɪt/".
- I am not aware of any case where the letter "o" corresponds to weak /ɪ/. There might be a few, but it doesn't seem to be a regular correspondence. The OED transcribes the verb "advocate" as "Brit. /ˈadvəkeɪt/, U.S. /ˈædvəˌkeɪt/".
- I think the letter "a" typically can only correspond to weak /ɪ/ in word-final syllables with a consonant after the "a" (such as "-ace" in "palace" (OED "Brit. /ˈpalᵻs/, U.S. /ˈpæləs/") and "-age" in "rummage" (OED "Brit. /ˈrʌmɪdʒ/, U.S. /ˈrəmədʒ/"; oddly, the OED entry for "manage" says "Brit. /ˈmanɪdʒ/, U.S. /ˈmænɪdʒ/")).
- I think the letter "u" can only correspond to weak /ɪ/ in a few words, such as "minute" (OED "Brit. /ˈmɪnɪt/, U.S. /ˈmɪnᵻt/"), but not in words like "adjuvate" (v.), which the OED transcribes "Brit. /ˈadʒᵿveɪt/, U.S. /ˈædʒəˌveɪt/" (the symbol ᵿ in OED transcriptions is a shorthand for "/ʊ/ or /ə/").
However, in an accent without the weak vowel merger, the digraphs "ir" and "yr" (with no pronounced /r/ consonant sound) can represent schwa (e.g. in "elixir", "satyr", "martyr", "confirmation").
The linked Wikipedia article about the "weak vowel merger" also mentions "The use of final /əl/ in words like evil and pencil is now extremely common in both General American and RP, to the extent that the alternative /ɪl/ can sound archaic or stilted."
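To make the OED shorthand concrete, here is a small Python sketch that expands a transcription containing ᵻ or ᵿ into the pronunciations it stands for. It relies only on the two definitions given above (ᵻ = /ɪ/ or /ə/, ᵿ = /ʊ/ or /ə/). The names `OED_SHORTHAND` and `expand_oed` are my own and are not part of any OED tooling.

```python
from itertools import product

# Shorthand symbols and the vowels each may stand for, per the descriptions above.
OED_SHORTHAND = {"ᵻ": ("ɪ", "ə"), "ᵿ": ("ʊ", "ə")}

def expand_oed(transcription):
    """List every concrete pronunciation covered by an OED-style transcription."""
    # Each character contributes either its alternatives or just itself.
    options = [OED_SHORTHAND.get(ch, (ch,)) for ch in transcription]
    return ["".join(combo) for combo in product(*options)]

print(expand_oed("ˈprɛdᵻkeɪt"))  # ['ˈprɛdɪkeɪt', 'ˈprɛdəkeɪt']
print(expand_oed("ˈadʒᵿveɪt"))   # ['ˈadʒʊveɪt', 'ˈadʒəveɪt']
```

A speaker with the weak vowel merger would only have the /ə/ variant of a ᵻ transcription; a speaker without it might use either, depending on the word.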
Accents with the weak vowel merger
As you can see from the above, in an accent with the weak vowel merger (like the accent that the OED chooses as representative of the U.S.), schwa may correspond to any of the letters a, e, i, o, u, y (or even, as tchrist points out, a combination of multiple letters).
There is a tendency for /-ər-/ to be spelled "-er-"
The tendency that you have observed for some people to write "separate" as "seperate" may be due to a few factors:
- The letter "e" seems to be more common than the letter "a" in general in English.
- The spelling pattern "er" = /ər/ is fairly common. It occurs in the common suffix "-er", as well as showing up word-internally before a vowel in many words from French and/or Latin such as general, federal, mineral, lateral, literal, several, generous, numerous, moderate, operate, desperate, temperate, temperature, refrigerate, different. In fact, in non-initial syllables, the sequence "erV" was more common than many other VrV sequences in Latin words because of historical sound changes of vowel reduction that caused short vowels to change to "e" in this context. The word "separate" does not in fact come from a Latin word with a reduced vowel in the second syllable (probably due to its origins as a compound word, although I don't know the details), but an English speaker is unlikely to have an intuitive sense of which non-final syllables in words from Latin went through the process of vowel reduction and which did not.
The web page "Spelling the Vowels of English Received Pronunciation" (which seems to be based on data from one of the projects of Washington University in St. Louis’s Reading and Language Lab) has some notes that seem consistent with what I say above (there is also a link to the statistics that these notes are based on):
/ə/ as in adore
Rules
- <a> normally;
[...]
- <e> in medial syllables before /r/;
[...]
Exegesis
- <a> is by far the most common spelling for /ə/ in the general case (saliva), but 8 or 9 other spellings are also quite common, including <o> (daffodil), <er> (clever), <e> (shellac), <or> (forbid), <u> (triumph), <ar> (cellar), <re> (fibre), <ur> (pursue).
[...]
- This pattern depends to a large extent on the fact that Latin mostly had <e> in this position (camera, viscera). Otherwise <a> (minaret), and <o> (calorie) are quite common as well.
(bolding added by me)
So in fact, it seems like the common misspelling "seperate" may use a spelling of schwa that is in this context (before an intervocalic /r/) somewhat more common than the "a" used in the standard spelling "separate" (although the "a" spelling is also noted to be "quite common" in this context).
16 rules for going from sound to spelling for schwa in "RP"
The linked web page lists 16 total rules of thumb for spelling schwa; you can see that this is a complicated area of English spelling. Note that the analysis is of the "RP" ("Received Pronunciation") accent of English, which is "non-rhotic" ("data" and "corner" both end in schwa), so a considerable number of the spellings discussed include "r". An analysis of a rhotic accent such as "General American" English would surely have significantly different results in many places.
The full set of identified tendencies for the spelling of schwa in RP, along with the explanations:
1. <a> normally;
2. <u> after initial /s/;
3. <o> after initial /k/;
4. <o> before /n/;
5. <o> in medial syllables before /l/;
6. <e> in medial syllables before /r/;
7. <e> before final /nt/;
8. <o> before final /k/, /p/, /t/, or /m/;
9. Unspelt in final /zəm/;
10. <u> before final <s>;
11. <e> before final /l/;
12. <ar> before final /d/;
13. <er> after /t/, when word-final or preceding /n/;
14. <er> word-finally after /d/;
15. <er> word-finally after /ð/;
16. <er> word-finally after /p/.
Exegesis
Statistics. Schwa appears only in unstressed syllables, but is very frequent there. Some care should be taken here, in that northern speech often has a full vowel where RP, and therefore this list, has a schwa.
1. <a> is by far the most common spelling for /ə/ in the general case (saliva), but 8 or 9 other spellings are also quite common, including <o> (daffodil), <er> (clever), <e> (shellac), <or> (forbid), <u> (triumph), <ar> (cellar), <re> (fibre), <ur> (pursue).
2. This case is due mostly to forms of the Latin prefix sub-. These forms have a tendency to be pronounced with /ʊ/ in the north. A few words in <a> (saloon) and <o> (solicit) are also found.
3. This case is due mostly to forms of the Latin prefix con- (commence, condense). These forms have a tendency to be pronounced with /ɒ/ in the north. In other forms, the spelling <a> is most common (casino).
4. This pattern emerges in part because it recapitulates the con- rules above (confide), in part because final /ən/ is disproportionately common (apron). This is only partly due to the Greek ending -on (rhododendron). The spellings <a> (pagan), <e> (token), and <er> (lantern) are also quite common. Note that no particular effort has been made here to distinguish the sequence /ən/ from syllabic /n̩/.
5. This pattern is due mostly to an Italian diminutive pattern (tremolo); otherwise <a> (buffalo), and <e> (porcelain) are quite common as well. Note that it is not unusual for vowels to disappear entirely in such words.
6. This pattern depends to a large extent on the fact that Latin mostly had <e> in this position (camera, viscera). Otherwise <a> (minaret), and <o> (calorie) are quite common as well.
7. This depends on a common Latin pattern (aliment, silent), but there is also a prolific competing French pattern in -ant (tenant).
8. In general, <o> is used before word-final voiceless or nasal stops. This rule doesn't generalize to voiced stops, however; in particular, note the rule for final /d/ below. Other spellings are possible, but the default spelling <a> is surprisingly rare before voiceless stops.
9. An exception to the general rule for final /m/ (prism, spasm).
10. This pattern is due to the many Latin loans in -us. A more typical English spelling is with <a> (carcass, compass, terrace).
11. -el is a common native ending, but there are also many words in Latinate -al.
12. A few words use <o> or <a>.
13. Although <ter> (bitter) and <tern> (lantern) are common patterns, there are also many words in <ta> (data), <tan> (titan), <tre> (centre), <tar> (altar), <ton> (carton), and <tor> (motor).
14. <der> is most common in native words (murder), but there are many Latin and Romance words in <da> (agenda, armada).
15. This is a particularly reliable pattern in that there are a dozen words meeting it, and no exceptions. On the other hand, most of the words are like either in being grammatical.
16. There are only a handful of exceptions, like pupa.
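To give a feel for how such tendencies could be applied, here is a deliberately tiny Python sketch covering only five of the sixteen: the default <a>, <o> before /n/, <o> and <e> in medial syllables before /l/ and /r/, and the unspelt schwa in final /zəm/. The function name, its parameters, and the assumption that a more specific tendency overrides the default are all mine, not the web page's, and the output is a guess rather than a reliable spelling.

```python
def guess_schwa_spelling(next_sound="", medial_syllable=False, final_zem=False):
    """Guess a spelling for RP schwa from a small subset of the listed tendencies."""
    if final_zem:
        return ""   # unspelt in final /zəm/ (prism)
    if medial_syllable and next_sound == "r":
        return "e"  # medial syllable before /r/ (camera)
    if medial_syllable and next_sound == "l":
        return "o"  # medial syllable before /l/ (tremolo)
    if next_sound == "n":
        return "o"  # before /n/ (apron)
    return "a"      # the default spelling (saliva)

print(guess_schwa_spelling(next_sound="r", medial_syllable=True))  # e
print(guess_schwa_spelling(next_sound="n"))                        # o
print(guess_schwa_spelling())                                      # a
```

For the second syllable of "separate" (a medial syllable before /r/) this sketch returns "e", which is the common misspelling rather than the standard spelling. That is a reminder that these are statistical tendencies with many exceptions.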