The Chaos of English spelling
To figure out how a word should be pronounced, FreeTTS first looks it up in an internal dictionary. If the word is not in the dictionary, a set of letter-to-sound rules is applied to guess the pronunciation. There's actually quite a bit of code in FreeTTS devoted to determining the proper pronunciation from spelling. Many people have wondered why, if we have such a set of letter-to-sound rules, we need a dictionary at all. Well, in fact, the FreeTTS dictionary (which consists of 60,000 or so words) contains just the exceptions to the rules. Spelling in English is so irregular that even with 1000 lines of Java code driving a letter-to-sound state machine with 13,000 states, there are still about 60,000 spelling exceptions. This state of affairs was highlighted by Gerard Nolst Trenité in the poem The Chaos. Here's an excerpt:
Dearest creature in creation,
Studying English pronunciation,
I will teach you in my verse
Sounds like corpse, corps, horse and worse.
I will keep you, Susy, busy,
Make your head with heat grow dizzy;
Tear in eye, your dress you'll tear;
Queer, fair seer, hear my prayer.
Pray, console your loving poet,
Make my coat look new, dear, sew it!
Strewn with stones like rowlock, gunwale,
Islington, and Isle of Wight,
Housewife, verdict and indict.
Don't you think so, reader, rather,
Saying lather, bather, father?
Finally, which rhymes with enough,
Though, through, bough, cough, hough, sough, tough?
Hiccough has the sound of sup.
My advice is: GIVE IT UP!
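
For the curious, the lookup order described above (exception dictionary first, letter-to-sound rules as the fallback) can be sketched in a few lines of Java. This is only an illustration, not the actual FreeTTS code or API: the class name PronunciationSketch, the two dictionary entries, their phone strings, and the toy applyRules method (which just emits one symbol per letter) are all made up for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of "dictionary first, rules second"; not the FreeTTS API.
class PronunciationSketch {
    // Exception dictionary: words the rules would get wrong.
    // The entries and phone strings here are illustrative only.
    private final Map<String, String> exceptions = new HashMap<>();

    PronunciationSketch() {
        exceptions.put("corps", "k ao r");
        exceptions.put("indict", "ih n d ay t");
    }

    // Toy stand-in for the letter-to-sound rules: one symbol per letter.
    // The real rules are a large state machine, which this does not imitate.
    static String applyRules(String word) {
        StringBuilder phones = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (phones.length() > 0) {
                phones.append(' ');
            }
            phones.append(c);
        }
        return phones.toString();
    }

    // Look the word up in the dictionary; fall back to the rules on a miss.
    String pronounce(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        String known = exceptions.get(key);
        return known != null ? known : applyRules(key);
    }

    public static void main(String[] args) {
        PronunciationSketch lexicon = new PronunciationSketch();
        System.out.println(lexicon.pronounce("corps")); // found in the dictionary
        System.out.println(lexicon.pronounce("cat"));   // falls through to the rules
    }
}
```

The point of the design is that the dictionary only has to hold the words the rules get wrong, so every entry is, in effect, a recorded spelling exception.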