Monday Jul 19, 2004

We package the Sphinx-4 demos as jar files. These can be easily run from the command line like so:

% java -jar HelloDigits.jar

Now, some of our demos can be quite large and require a heap that is larger than the default heap size. To run these demos the user has to invoke the jar file with a -mx option like so:

% java -mx200m -jar HelloWorld.jar

This is just bonkers. Why should the user have to throw that switch? It is awkward for the user and requires more documentation by the developer. The developer knows how much memory the program needs, so let the developer configure the app instead of making the user do the work.

Wouldn't it be great if you could put Java command line options such as "-mx200m" into the manifest in the jar file, so it wouldn't be necessary to set the heap size explicitly? You can do this sort of thing with Java Web Start JNLP files, but there's no equivalent mechanism for simple jar files. If we had this option we could eliminate a whole bunch of pesky documentation, make things easier for the user and reduce the possible failure modes for Java apps.
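Until something like that exists, one possible workaround is a tiny launcher class that checks the heap it was given and, if it is too small, restarts itself with a bigger one. What follows is only a sketch under assumptions: HeapLauncher, the demo.relaunched property and the 200 MB figure are made up for illustration and are not part of Sphinx-4. It uses -Xmx, the documented spelling of the heap switch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeapLauncher {
    // Assumption: the demo needs about 200 MB, as in the -mx200m example.
    static final int REQUIRED_HEAP_MB = 200;
    // Hypothetical marker property so the child JVM never relaunches again.
    static final String RELAUNCHED_FLAG = "demo.relaunched";

    // True when the current heap limit is too small and we have not relaunched yet.
    public static boolean needsRelaunch(long maxHeapBytes, boolean alreadyRelaunched) {
        long required = REQUIRED_HEAP_MB * 1024L * 1024L;
        // maxMemory() can report a bit less than the -Xmx value, so allow 10% slack.
        return !alreadyRelaunched && maxHeapBytes < required / 10 * 9;
    }

    // Builds the command line for a child JVM with the larger heap.
    public static List<String> buildCommand(String javaHome, String classPath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaHome + File.separator + "bin" + File.separator + "java");
        cmd.add("-Xmx" + REQUIRED_HEAP_MB + "m");
        cmd.add("-D" + RELAUNCHED_FLAG + "=true");
        cmd.add("-cp");
        cmd.add(classPath);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        long maxHeap = Runtime.getRuntime().maxMemory();
        if (needsRelaunch(maxHeap, Boolean.getBoolean(RELAUNCHED_FLAG))) {
            List<String> cmd = buildCommand(
                    System.getProperty("java.home"),
                    System.getProperty("java.class.path"),
                    HeapLauncher.class.getName());
            cmd.addAll(Arrays.asList(args));
            // Run the child with our stdin/stdout/stderr and pass its exit code along.
            Process child = new ProcessBuilder(cmd).inheritIO().start();
            System.exit(child.waitFor());
        }
        System.out.println("Heap is large enough; the real demo would start here.");
    }
}
```

With this class named as Main-Class in the manifest, a plain "java -jar" would be enough for the user, at the cost of starting a second JVM.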

Saturday Jul 17, 2004

I was checking through my blog logs and noticed an unusual referrer from Google. Apparently someone was getting ready for Talk Like A Pirate Day and had searched for the phrase:

how do you say "good-bye" in pirate language

Strangely enough, the number one hit in Google for this is indeed this blog. Go figure. You can try it for yourself ... Search Google for 'how do you say "good-bye" in pirate language'

Arrgh!, See Ya Chum!

Friday Jul 16, 2004

Next week I'm giving a Sphinx-4 talk and demo. The talk is ready, but I still need a good demo. Willie came up with an idea for a really cool demo, but it may not pan out, so I needed a backup demo, just in case. For a backup, I thought I would integrate our speech recognizer with a chess game. The idea is that you can control the chess game with commands like "move the queen to A5". I figured there would be a few open source all-Java chess programs out there and I'd be able to find one that I could integrate with our recognizer.

Indeed, I looked on SourceForge and FreshMeat and found a number of them. A couple of them wouldn't compile, a couple wouldn't run, and one looked really nice but was too hard to integrate with (I had a budget of an afternoon to do it). Finally I settled on JChessBoard. The code was simple and fairly clean, and it was very easy to add the speech recognizer to it. Unfortunately, there were a number of nagging deadlocks caused by race conditions due to some improper updates of Swing components. A few calls to SwingUtilities.invokeLater fixed this up though. One really good thing about the JChessBoard app is that the computer AI is pretty poor, so I have a fairly decent chance at beating it.
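For anyone who hasn't run into this, the general shape of the fix looks something like the sketch below. It is not the actual JChessBoard change; MoveDisplay and its status label are hypothetical stand-ins. The point is that a thread other than the event dispatch thread (here, an assumed recognizer thread) hands its Swing update to SwingUtilities.invokeLater instead of touching the component directly.

```java
import javax.swing.JLabel;
import javax.swing.SwingUtilities;

public class MoveDisplay {
    // Hypothetical status label; a real app would also build its components
    // on the event dispatch thread.
    private final JLabel statusLabel = new JLabel("waiting");

    // May be called from any thread, e.g. the thread delivering recognition results.
    public void showMove(final String move) {
        if (SwingUtilities.isEventDispatchThread()) {
            // Already on the event dispatch thread: update directly.
            statusLabel.setText(move);
        } else {
            // Queue the update so it runs later on the event dispatch thread.
            SwingUtilities.invokeLater(() -> statusLabel.setText(move));
        }
    }

    // Reads the label text (used here only to inspect the result).
    public String currentText() {
        return statusLabel.getText();
    }
}
```

Because invokeLater returns immediately, the calling thread never waits on the GUI, which removes one common way for the two threads to end up blocked on each other.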

I still have a bit of tuning to do, but it seems to work pretty well. I'm getting very good recognition rates (using alpha, bravo, charlie for those highly-confusable letters). So I guess I'll have something to demo next week when/if the cool demo doesn't work out.
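Turning the spoken words back into a board square is the easy part. A minimal sketch, assuming the recognizer hands back the file and rank as separate words (SpokenSquare and its word lists are illustrative, not the demo's actual grammar):

```java
public class SpokenSquare {
    // Spoken names for files a..h and ranks 1..8 (assumed vocabulary).
    private static final String[] FILES =
            {"alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"};
    private static final String[] RANKS =
            {"one", "two", "three", "four", "five", "six", "seven", "eight"};

    // Returns the position of word in words, ignoring case, or -1 if absent.
    private static int indexOf(String[] words, String word) {
        for (int i = 0; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(word)) {
                return i;
            }
        }
        return -1;
    }

    // Converts e.g. ("alpha", "five") to "a5"; returns null for unknown words.
    public static String toSquare(String fileWord, String rankWord) {
        int file = indexOf(FILES, fileWord);
        int rank = indexOf(RANKS, rankWord);
        if (file < 0 || rank < 0) {
            return null;
        }
        return "" + (char) ('a' + file) + (rank + 1);
    }
}
```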

Thursday Jul 15, 2004

RJ mentions a new speech-related blog called SpeechBlog. There seems to be some business/marketing info as well as some technical info.

And don't forget to check out RJ's Log too. RJ is, among other things, the editor of the CCXML spec which is designed to provide telephony call control support for VoiceXML.

The W3C has approved the transition of the Speech Synthesis Markup Language to Proposed Recommendation. If all goes well, SSML will become an official W3C Recommendation by early September. SSML is based upon the Java Speech Markup Language that was part of the original Java Speech API.

Wednesday Jul 14, 2004

In order for FreeTTS to figure out how a word should be pronounced, it first looks it up in an internal dictionary. If the word is not in the dictionary, then a set of letter-to-sound rules is applied to attempt to guess the pronunciation. There's actually quite a bit of code in FreeTTS that is involved with determining the proper pronunciation based upon spelling. Many people have wondered why, if we have such a set of letter-to-sound rules, we need a dictionary at all. Well, in fact, the FreeTTS dictionary (which consists of 60,000 or so words) contains just the exceptions to the rules. Spelling in English is just so irregular that even with 1000 lines of Java code driving a letter-to-sound state machine with 13,000 states, there are still 60 thousand spelling exceptions. This state of affairs was highlighted by Gerard Nolst Trenité in the poem The Chaos. Here's an excerpt:

Dearest creature in creation,
Studying English pronunciation,
        I will teach you in my verse
        Sounds like corpse, corps, horse and worse.
I will keep you, Susy, busy,
Make your head with heat grow dizzy;
        Tear in eye, your dress you'll tear;
        Queer, fair seer, hear my prayer.
Pray, console your loving poet,
Make my coat look new, dear, sew it! 
Strewn with stones like rowlock, gunwale,
        Islington, and Isle of Wight,
        Housewife, verdict and indict.
Don't you think so, reader, rather,
Saying lather, bather, father?       
        Finally, which rhymes with enough,
        Though, through, bough, cough, hough, sough, tough??
Hiccough has the sound of sup.
My advice is:  GIVE IT UP!   
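
The dictionary-first, rules-second lookup described above can be sketched in a few lines. This is not the FreeTTS code: PronunciationLookup, the sample phone string and the naiveRules stand-in are all invented for illustration, and the real letter-to-sound rules are the large state machine mentioned earlier, not a letter-by-letter guess.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PronunciationLookup {
    // Exceptions dictionary: word -> phones (illustrative format).
    private final Map<String, String> exceptions;
    // Fallback rules applied when the word is not in the dictionary.
    private final Function<String, String> letterToSound;

    public PronunciationLookup(Map<String, String> exceptions,
                               Function<String, String> letterToSound) {
        this.exceptions = new HashMap<>(exceptions);
        this.letterToSound = letterToSound;
    }

    // Dictionary first; only guess from the spelling when there is no entry.
    public String pronounce(String word) {
        String key = word.toLowerCase();
        String phones = exceptions.get(key);
        return phones != null ? phones : letterToSound.apply(key);
    }

    // Toy stand-in for real rules: one "phone" per letter.
    public static String naiveRules(String word) {
        return String.join(" ", word.split(""));
    }
}
```

A lookup built with an entry for "corps" and PronunciationLookup::naiveRules as the fallback returns the dictionary entry for "corps" and falls back to the spelling-based guess for everything else.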

Monday Jul 12, 2004

Check out the Java Roguelike Engine Project. The goal of this project is to "make it easy for anyone to create their own variant of a Roguelike game, and to make it possible for programmers to create truly outstanding variants". Hmmm... a speech-controlled Rogue might make a good demo for Sphinx-4.

Over at Microsoft's research lab, they've built a singing speech synthesiser. You define a song in terms of lyrics and music (via MIDI), and Whistler will render the song with vocals. The example of 'Mark' singing Penny Lane sounds a lot like Kermit the frog. It doesn't look like you can download the program, nor does it look to be open source, so I guess all you can do for now is listen to the three audio samples. Still... it's pretty neat.

Read more at the Whistler Page at Microsoft.

Sunday Jul 11, 2004

Much research has been done lately on 'talking head' simulations. These simulations attempt to present a realistic animation of a person while they are talking, such that the movement of the lips and the visibility of the tongue give cues as to what is being said. This can help the hearing impaired as well as improve understanding for the general listener.

One talking head system developed at the University of Sheffield is described in the paper Image-based Talking Heads using Radial Basis Functions. More information about their research can be found on their Talking Heads Project Page. These folks are using FreeTTS too.

Friday Jul 09, 2004

I've taken a quick tour through a number of music composition languages, and I've settled on jMusic. First, it is all Java, so I don't have to learn or relearn another programming language. It is extensible, so it should be possible to add support for CSound score files. It supports MIDI and JavaSound out of the box, so I can hear what I am doing as I go along. Finally, it is an active project with good documentation.

Thursday Jul 08, 2004

Tuesday, we took a trip to one of our favorite places in the White Mountains: the Basin-Cascades trail in Franconia Notch. The trail parallels the Cascade Brook for about a mile or so. We never walk the trail though. Instead, we walk up the brook in our bathing suits. The brook is a seemingly endless series of ice cold pools, potholes and waterfalls, perfect for cooling off on a hot day because no matter how hot it is, the water is very, very cold. It takes a sturdy constitution to brave the water beyond the knee. It's a great place to go, but keep it secret.

Climbing the country music charts is (I Wanna Hear) A Cheatin' Song, a duet featuring Anita Cochran and Conway Twitty. Strange thing is, Conway Twitty died a decade before the song was written. Mr. Twitty did not sing the song from his grave. Instead, record producer Jim Ed Norman extracted hundreds of snippets of Twitty's vocals from other recordings and painstakingly stitched them together, using the audio production tool Pro Tools to adjust pitch, smooth out the transitions and make it sound like a seamless recording.

In the future, as speech synthesis technology improves, the same thing could be done with actors' voices, allowing them to speak long after they have taken their last curtain call. George Lucas could continually retinker with Darth Vader's lines long after James Earl Jones quits in disgust. Ah well, maybe it's not such a good idea.

More on Conway Twitty's song in USA Today.

Wednesday Jul 07, 2004

More data in the (apparently) never-ending performance war between Java and C/C++. The folks at the University of Southern California have published an article called Performance of Java versus C++ that surveys a number of benchmarks for numerical code, shows that Java performance is as good as or better than C++, and presents 3 reasons why Java will continue to outpace C++.

Computerworld has a short story on The origins of speech synthesis.

Yep, they mention Daisy.

Tuesday Jul 06, 2004

After my last entry about talking robots, researcher Alexander Koller, from the Department of Computational Linguistics at Universität des Saarlandes in Germany, pointed me to this course: Talking Robots with LEGO MindStorms. In this course, students build robots using Lego Mindstorms, program them with Java (via leJOS) and control them via speech (using the Java Speech API).

Last year's crowd favorite was Luigi, the shell game player.

There are more pictures of the final robots.

They've also written a paper describing their work and how Legos can be used by dialogue researchers to explore the robot-human interface.

I wish there had been courses like this when I was in school!

This blog copyright 2010 by plamere