Is there a grammar rule that defines the properties of a legally accepted word [closed]
I would like to know if there is a grammar rule (or set of rules) that defines whether a word is grammatically legal or not. I understand that a word is given meaning by humans, and that anyone can give meaning to anything. I therefore realize it is probably impossible to create a set of laws that absolutely defines the legality of a string of letters. Setting that extreme case aside, is there a practical/general set of such rules?
For example, I remember my grade 2 teacher saying that if a word does not contain at least one vowel, then it is not a legal word. Based on that principle, I might claim that the word 'lkjsdlf' is not a legal word.
Is there a generally accepted set of grammatical parameters that define whether a word is legal or not (apart from looking it up in a dictionary)?
The reason I'm asking this is to determine if it's possible to programmatically validate a word (rather than using a list of 100,000+ words from a dictionary). The goal is to categorize 'lkjsdlf' and 'apple' as 'invalid' and 'valid' respectively.
Not so much a grammar rule but people have analysed the frequency of all the letter combinations of various lengths in samples of English text. They then used this to randomly generate a kind of pseudo English.
I'm not sure where I originally saw this (I think it was a little more scholarly), but here's an example of someone's generated pseudo-English: http://ibbly.com/Pseudo-words.html
and here's someone else's attempt: http://www.fourteenminutes.com/fun/words/
But you could use the same frequency data to quantify how typically "English" a word is, i.e. how probable it is as a word in English.
Of course, there's more to words than just an unstructured letter sequence, as @curiousdannii has pointed out, so there are further considerations possible in this kind of analysis.
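As a rough illustration of the frequency idea, here is a minimal sketch of a character-bigram scorer. The word list, the function names and the add-one smoothing are all assumptions made for this example; a usable model would be trained on a large sample of English text rather than fifteen words.

```python
import math
from collections import Counter

# Tiny illustrative sample (an assumption for this sketch);
# a real model would be trained on a large corpus.
SAMPLE_WORDS = [
    "apple", "apply", "ample", "maple", "table", "people", "simple",
    "please", "plate", "paper", "letter", "little", "happen", "purple",
    "staple",
]

def train_bigrams(words):
    """Count character bigrams, with ^ and $ marking word boundaries."""
    pair_counts = Counter()
    context_counts = Counter()
    for word in words:
        padded = "^" + word.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts

def avg_log_prob(word, pair_counts, context_counts, n_symbols=27):
    """Average log-probability per transition, with add-one smoothing.

    n_symbols is the number of symbols that can follow a character:
    26 letters plus the end marker.
    """
    padded = "^" + word.lower() + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (pair_counts[(a, b)] + 1) / (context_counts[a] + n_symbols)
        total += math.log(p)
    return total / (len(padded) - 1)

pairs, contexts = train_bigrams(SAMPLE_WORDS)
for candidate in ("apple", "lkjsdlf"):
    print(candidate, round(avg_log_prob(candidate, pairs, contexts), 2))
```

A higher (less negative) score means the letter sequence looks more like the training words. Turning that into a valid/invalid decision would need a threshold chosen by experiment, and the score only says a string is English-like, not that it is an actual word.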
This question is not really about grammar, but about phonotactics. According to Wikipedia (quoting from Harley, Heidi, English Words: A Linguistic Introduction) there are fourteen constraints on English words:
- All syllables have a nucleus
- No geminates
- No onset /ŋ/ or /ʒ/
- No /h/ in the syllable coda
- No affricates in complex onsets
- The first consonant in a complex onset must be an obstruent
- The second consonant in a complex onset must not be a voiced obstruent
- If the first consonant in a complex onset is not an /s/, the second must be a liquid or a glide
- Substring principle, stating that "Every subsequence contained within a sequence of consonants must obey all the relevant phonotactic rules."
- No glides in codas
- If there is a complex coda, the second consonant must not be /ŋ/, /ʒ/, or /ð/
- If the second consonant in a complex coda is voiced, so is the first
- Non-alveolar nasals must be homorganic with the next segment
- Two obstruents in the same coda must share voicing
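To show how rules like these could be checked programmatically, here is a minimal sketch covering only the onset constraints from the list. It assumes the word has already been transcribed into phonemes and split into syllables, which is the hard part and is not shown. The phoneme sets and the function name `onset_violations` are illustrative assumptions, and the substring principle is approximated by checking every adjacent pair of consonants.

```python
# Phoneme classes as IPA strings (a simplified inventory, assumed for this sketch).
AFFRICATES = {"tʃ", "dʒ"}
VOICED_OBSTRUENTS = {"b", "d", "g", "v", "ð", "z", "ʒ", "dʒ"}
OBSTRUENTS = VOICED_OBSTRUENTS | {"p", "t", "k", "f", "θ", "s", "ʃ", "h", "tʃ"}
LIQUIDS_AND_GLIDES = {"l", "r", "w", "j"}

def onset_violations(onset):
    """Return the onset constraints violated by a list of consonant phonemes."""
    violations = []
    if any(c in {"ŋ", "ʒ"} for c in onset):
        violations.append("no onset /ŋ/ or /ʒ/")
    if len(onset) > 1 and any(c in AFFRICATES for c in onset):
        violations.append("no affricates in complex onsets")
    # Approximate the substring principle: test each adjacent pair.
    for first, second in zip(onset, onset[1:]):
        if first not in OBSTRUENTS:
            violations.append("first consonant must be an obstruent")
        if second in VOICED_OBSTRUENTS:
            violations.append("second consonant must not be a voiced obstruent")
        if first != "s" and second not in LIQUIDS_AND_GLIDES:
            violations.append("after a non-/s/ consonant, expect a liquid or glide")
    return violations

print(onset_violations(["p", "l"]))       # as in "play"
print(onset_violations(["s", "t", "r"]))  # as in "string"
print(onset_violations(["l", "k"]))       # as in the start of 'lkjsdlf'
```

The first two onsets produce an empty list, while the third is flagged. Note that this works on sounds, not letters, so it cannot be applied to a spelled word without a pronunciation step first.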
In general, and cross-linguistically, these constraints are based on the sonority hierarchy. Sounds where the mouth is most open will be found in the middle of syllables (like vowels) and sounds where the mouth is most closed will be found at the edges of syllables (stop consonants). There are usually exceptions though, as the list of English constraints above shows.
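The sonority idea can also be sketched in code. The example below assumes a coarse six-level scale (stops lowest, vowels highest), uses plain letters as stand-ins for sounds, and handles a single syllable only. The scale and the function name are assumptions for illustration, not a standard implementation.

```python
# Coarse sonority scale, lowest to highest (an assumed simplification).
SONORITY = {}
for level, sounds in enumerate(
    ["ptkbdg", "fvszh", "mn", "lr", "wj", "aeiou"], start=1
):
    for sound in sounds:
        SONORITY[sound] = level

def follows_sonority_sequencing(segments):
    """True if sonority rises strictly to one peak and then falls strictly."""
    levels = [SONORITY[s] for s in segments]
    peak = levels.index(max(levels))
    rising = all(a < b for a, b in zip(levels[:peak + 1], levels[1:peak + 1]))
    falling = all(a > b for a, b in zip(levels[peak:], levels[peak + 1:]))
    return rising and falling

print(follows_sonority_sequencing(list("plant")))  # True
print(follows_sonority_sequencing(list("lpa")))    # False
print(follows_sonority_sequencing(list("stop")))   # False
```

The last result illustrates the exceptions: "stop" is a perfectly good English word, but /s/ followed by a stop breaks a strict sonority rise, which is why English needs the /s/-specific onset rule in the list.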