Is the word “formulæ” valid English?
Solution 1:
It’s not really part of Modern English vocabulary, though that depends on how you define vocabulary. Does vocabulary include every word and letter actually in use? Or does it cover only the words and letters that are officially part of English, such as a or apple?
Wikipedia says this about it:
Æ (minuscule: æ) is a grapheme formed from the letters a and e. Originally a ligature representing a Latin diphthong, it has been promoted to the full status of a letter in the alphabets of some languages, including Danish, Faroese, Norwegian, and Icelandic. As a letter of the Old English Latin alphabet, it was called æsc (“ash tree”) after the Anglo-Saxon futhorc rune ᚫ, which it transliterated; its traditional name in English is still ash /æʃ/.
It’s not a letter of the modern English alphabet, so depending on how you define vocabulary, Æ may or may not count as part of English.
Another such letter is œ, which once appeared in the word diarrhoea when it was spelled diarrhœa.
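To see how modern software treats these two characters, here is a minimal Python sketch using the standard unicodedata module. It shows that æ and œ are each encoded as a single code point rather than as an a+e or o+e pair, and that, unlike a purely typographic ligature such as ﬁ, neither has a compatibility decomposition back into two letters. The helper name describe is hypothetical, chosen only for illustration.

```python
import unicodedata

def describe(ch: str) -> str:
    """Hypothetical helper: return the code point and Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in "æœﬁ":
    # NFKD applies compatibility decompositions. æ and œ have none,
    # whereas the ﬁ ligature (U+FB01) is decomposed into plain "fi".
    folded = unicodedata.normalize("NFKD", ch)
    print(describe(ch), "->", repr(folded))

# U+00E6 LATIN SMALL LETTER AE -> 'æ'
# U+0153 LATIN SMALL LIGATURE OE -> 'œ'
# U+FB01 LATIN SMALL LIGATURE FI -> 'fi'
```

So Unicode names æ a letter and œ a ligature, yet treats both as indivisible characters: normalizing “formulæ” with NFKD leaves it unchanged, so under normalization alone it will not compare equal to “formulae”.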
Solution 2:
Are there any other examples of English words that contain letters not found in the standard English alphabet?
If by “English words” one means terms that appear in the Oxford English Dictionary, then yes, there are a very great many such words. Here are just a few examples of the sorts you will find there:
Allerød, après-ski, Bokmål, brassière, caña, crème, crêpe, désoeuvrement, Fabergé, façade
fête, feuilleté, flügelhorn, Gödelian, jalapeño, Madrileño, Möbius, Mohorovičić discontinuity, moiré, naïve
Niçoise, piñon, plaçage, prêt-à-porter, Provençal, quinceañera, Ragnarök, résumé, Schrödinger, Shijō
smørrebrød, soirée, tapénade, vicuña, vis-à-vis, Zuñi, α-ketoisovaleric acid, (α-)lipoic acid, (β-)nornicotine, ψ-ionone
As you can see, a great many English words are spelled with letters outside the A to Z set. How fussy you are about using them depends partly on your audience, your input method, and your own attention to such details. In modern software there is no excuse not to use the full character set available to you, but sometimes data can only be entered on a typewriter-style keyboard that lacks all such niceties.
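When input really is limited to a typewriter-style keyboard, one common fallback is to strip accents. The following is a minimal Python sketch, assuming that dropping combining marks after canonical decomposition (NFD) is an acceptable loss for your audience; the function name ascii_fold is hypothetical. It also shows the limits of the trick: letters such as ø, æ, and Greek α are not accented forms of A to Z, so they survive the folding and the result is still not ASCII.

```python
import unicodedata

def ascii_fold(word: str) -> str:
    """Hypothetical helper: drop combining diacritics, keep letters with no decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    # The accents in these words (acute, grave, tilde, cedilla, caron,
    # macron, diaeresis) all have a nonzero canonical combining class.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Recompose anything NFD split apart that was not a combining mark.
    return unicodedata.normalize("NFC", stripped)

words = ["jalapeño", "résumé", "Mohorovičić", "Shijō", "smørrebrød", "α-ketoisovaleric"]
for w in words:
    folded = ascii_fold(w)
    status = "ASCII" if folded.isascii() else "still non-ASCII"
    print(f"{w:<18} -> {folded:<18} {status}")
```

Here jalapeño, résumé, Mohorovičić, and Shijō fold to plain ASCII, while smørrebrød and α-ketoisovaleric do not, which is exactly the kind of word that suffers most on a limited keyboard.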
On this subject, Robert Bringhurst writes the following on pages 179–182 of The Elements of Typographic Style, version 3.2:
(TL;DR: For the summary, skip to the part I’ve editorially set in bold and bold italics, and the paragraph immediately following it.)
9.1 The Hundred‐Thousand Character Alphabet
It is often said that the Latin alphabet consists of 26 letters, the Greek of 24 and the Arabic of 28. If you confine yourself to one case only, a narrow historical window and the dialect in power, this assertion can hold true. If you include both caps and lower case, accented letters and a global set of consonants and vowels — á à â å ã ä ą ă ā æ ǽ ç ć č ð đ é ł ñ ň ņ ő š ș þ ű ū ŵ ý ž ź ż and all the rest — the Latin alphabet is not 26 letters long after all; it is closer to 600 and able to increase at any time. The alphabet that classicists now use for classical Greek, with its long parade of vowels and diacritics — ά ὰ ᾶ ἀ ἃ ἅ ἆ ἁ ἅ ἃ ἇ ᾷ ᾇ, and so on — is modest by comparison: fewer than 300 glyphs altogether.
To the 600‐character globalized Latin alphabet, mathematicians, grammarians, chemists, and even typographers are prone to make additions: arabic numerals, punctuation, technical symbols, letters borrowed from Hebrew, Greek, and Cyrillic, and, where the letterforms require or invite them, a few typographic ligatures and alternates as well. There is no hope at this stage of counting the number of sorts or glyphs precisely, but the total is clearly over a thousand.
At the end of the eighteenth century, an English‐speaking hand compositor’s standard lower case had 54 compartments, holding roman or italic a to z, arabic numerals, basic ligatures, spaces, and punctuation. The upper case had another 98, containing caps and analphabetics. That total, 98 + 54 = 152, is the English‐speaking hand compositor’s minimum basic allotment. When more sorts are required, as they very often are, supplementary cases are used. Two pair give 304 compartments; three pair give 456; four pair give 608. How Gutenberg’s cases were arranged we do not know, but we know how big they were. He used not 26 but 290 different sorts, in one face and one size, in an unaccented script, to set his 42‐line Bible. The Monotype machine, built five centuries later, with 255 (later 272) positions in a standard matrix case, had fallen only a little ways behind.
Early computers and e‐mail links were, by comparison, living in typographic poverty. The alphabet they used was the basic character set defined by the American Standard Code for Information Interchange, or ASCII. Each character was limited to seven bits of binary information, so the maximum number of characters was 2⁷ = 128. Thirty‐three of those were normally subtracted for control codes, and one was the code for an empty space. This leaves 94: not even enough to hold the standard working character set of Spanish, French, or German. The fact that such a character set was long considered adequate tells us something about the cultural narrowness of American civilization, or American technocracy, in the midst of the twentieth century.
[ . . . ]
Few of us may need (and few may want to memorize) 100,000 characters. Typographers working in Chinese have often mastered 20,000; those who work in Korean learn 3,000 or more; most literate humans learn a thousand characters or fewer. Yet authors, editors, typographers, and ordinary citizens who just want to be able to spell Dvořák, Miłosz, Mą’ii, or al‐Fārābī, or to quote a line of Sophocles or Pushkin, or the Vedas or the Sutras or the Psalms, or to write φ ≠ π, are beneficiaries of a system this inclusive. So is everyone who wants to read their e‐mail in an alphabet other than Latin or a language other than English.
There may also never be a font of 100,000 well‐made characters designed by one designer. But good fonts with well over ten thousand characters, keyed to the Unicode system, are now readily available. Computer operating systems now support them. More importantly, fonts for particular symbol sets and alphabets can be linked and tuned to one another by adjusting weight, letterfit and scale. This kind of typographic diplomacy is a task of some importance — and when character sets are joined in this way, sharing typographic space whether or not they are all on one font, Unicode can serve as a coordinating mechanism.
Unicode is relatively new, but many of the resources it catalogues are ancient. Composition software, communication links, and keyboards are just starting to catch up.
As Bringhurst says, it’s about inclusiveness, and software is only now starting to catch up with what hand compositors, and before them the scribes of manuscripts, did with perfect ease.
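Bringhurst’s count of the ASCII repertoire (2⁷ = 128 codes, less 33 control codes and one space, leaving 94) is easy to verify. This minimal Python sketch classifies the 7-bit code points with Python’s own str.isprintable(), which treats the C0 controls 0x00 to 0x1F and DEL (0x7F) as non-printable but counts the ASCII space as printable; the variable names are illustrative only.

```python
# The 7-bit ASCII range: 2**7 = 128 code points, 0x00 through 0x7F.
codes = [chr(i) for i in range(2 ** 7)]

# Control codes: 0x00-0x1F plus DEL (0x7F), which Python reports as non-printable.
controls = [c for c in codes if not c.isprintable()]

# The single space character, 0x20.
spaces = [c for c in codes if c == " "]

# Everything else is a visible (graphic) character: 0x21 through 0x7E.
graphic = [c for c in codes if c.isprintable() and c != " "]

print(len(codes), len(controls), len(spaces), len(graphic))  # 128 33 1 94

# None of these letters needed for Spanish, French, or German is among the 94.
for ch in "ñéçäöüß":
    print(ch, ch in graphic)  # each line ends in False
```

The same gap shows up in practice whenever text such as "façade" is forced through an ASCII-only channel: Python's "façade".encode("ascii") raises UnicodeEncodeError rather than silently dropping the ç.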