
legalese - Is there a grammar rule that defines the properties of a legally accepted word



I would like to know if there is a grammar rule (or set of rules) that defines whether a word is grammatically legal or not. I understand that a word is given meaning by humans, and anyone can give meaning to anything, so I realize it is probably impossible to create a set of laws that absolutely defines the legality of a string of letters. Barring that extreme case, is there a practical, general set of such rules?


For example, I remember my grade 2 teacher saying that if a word does not contain at least one vowel, then it is not a legal word. Based on that principle, I might claim that the word 'lkjsdlf' is not a legal word.


Is there a generally accepted set of grammatical parameters that define whether a word is legal or not (apart from looking it up in a dictionary)?


The reason I'm asking this is to determine if it's possible to programmatically validate a word (rather than using a list of 100,000+ words from a dictionary). The goal is to categorize 'lkjsdlf' and 'apple' as 'invalid' and 'valid' respectively.



Answer



This is not so much a grammar rule, but people have analysed how frequently letter combinations of various lengths occur in samples of English text. They then used those frequencies to randomly generate a kind of pseudo-English.


I'm not sure where I originally saw this (I think it was somewhere a little more scholarly), but here's an example of someone's generated pseudo-English: http://ibbly.com/Pseudo-words.html


and here's someone else's attempt: http://www.fourteenminutes.com/fun/words/


But you could use the same frequency data to quantify how typically "English" a word is, i.e. how probable it is as a word in English.
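As a rough illustration of that idea, here is a minimal Python sketch that scores a word by the average log-probability of its letter pairs (bigrams). Everything in it is an assumption made for the example: the tiny `SAMPLE_WORDS` list stands in for a real corpus of English text, the names `bigrams`, `train` and `score` are made up, and add-one smoothing is just one simple way to handle letter pairs that were never seen.

```python
import math
from collections import Counter

# Hypothetical, tiny training sample. A real model would be built from a
# large sample of English text, not from a couple of dozen words.
SAMPLE_WORDS = [
    "apple", "people", "simple", "table", "letter", "little", "paper",
    "place", "plane", "please", "example", "sample", "purple", "couple",
    "water", "later", "other", "there", "where", "these", "those",
    "grammar", "english", "language", "sentence", "vowel", "word",
]

# 26 letters plus "^" (word start) and "$" (word end).
ALPHABET = "abcdefghijklmnopqrstuvwxyz^$"


def bigrams(word):
    """Return the letter pairs of a word, including start/end markers."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(words):
    """Count each letter pair and how often each letter starts a pair."""
    pair_counts = Counter()
    first_counts = Counter()
    for w in words:
        for bg in bigrams(w):
            pair_counts[bg] += 1
            first_counts[bg[0]] += 1
    return pair_counts, first_counts


def score(word, model):
    """Average log-probability per letter pair; higher means more English-like."""
    pair_counts, first_counts = model
    bgs = bigrams(word)
    total = 0.0
    for bg in bgs:
        # Add-one smoothing so unseen pairs get a small, non-zero probability.
        p = (pair_counts[bg] + 1) / (first_counts[bg[0]] + len(ALPHABET))
        total += math.log(p)
    return total / len(bgs)


MODEL = train(SAMPLE_WORDS)

if __name__ == "__main__":
    for w in ("apple", "lkjsdlf"):
        print(w, round(score(w, MODEL), 2))
```

With this sample, 'apple' gets a higher score than 'lkjsdlf'. To turn the score into a valid/invalid decision you would still need to pick a cut-off, and that would have to be tuned against real data; the score only says how typical the letter sequence is, not whether the word is in any dictionary.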


Of course, there's more to words than just an unstructured letter sequence, as @curiousdannii has pointed out, so there are further considerations possible in this kind of analysis.

