Tuesday Jun 08, 2004

Woohoo! Vim 6.3 has just been released. It is said to be the "most stable Vim release ever!" And for the Emacs users out there who still remain in the dark, the text editor war is over: Vim won.

I originally learned vi way back in 1984, so I've had 20 years for vi commands to work their way into the lower portions of my brain. The commands are now so ingrained that if someone asks me how to do something in 'vi', I can't tell them; I have to actually do it, watch what I do, and then tell them. The downside of being so tied to an editor is that it makes it really hard to like one of them new-fangled IDEs like Eclipse or NetBeans. Try as I might, the editor and the usual reliance on the mouse (now lift your right hand off of the keyboard, move it 8 inches to the right, grab the mouse, carefully position the pointer, and triple-click to select the line, compared to a simple 'yy' in vi) always drive me back to the comfortable home of vi. Ah well... I guess I am just getting old.

Monday Jun 07, 2004

A bit of a celebration in the Speech Group here in Sun Labs. According to the SourceForge statistics, as of today FreeTTS, the speech synthesizer written entirely in Java, has been downloaded over 100,000 times. Its popularity certainly has exceeded my expectations. It seems like I am hearing about a new application that uses FreeTTS almost every day.

Here's a WebStart demo of a talking clock that uses FreeTTS.

SpeechBot is a "simple IRC bot that connects to an IRC server and uses the FreeTTS speech synthesizer to read out loud all channel messages it receives."

SpeechBot was created by the folks at Jibble.org.
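For the curious, getting FreeTTS to say something takes only a few lines. Here's a minimal sketch, assuming the FreeTTS jars and voice data are on your classpath ("kevin16" is the 16 kHz diphone voice that ships with the standard FreeTTS distribution):

```java
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class HelloFreeTTS {
    public static void main(String[] args) {
        // Look up the bundled 16 kHz "kevin16" voice
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        if (voice == null) {
            System.err.println("Voice not found; check that the FreeTTS and voice jars are on the classpath");
            return;
        }
        voice.allocate();                   // load the voice data
        voice.speak("Hello from FreeTTS");  // synthesize and play
        voice.deallocate();                 // release audio resources
    }
}
```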

Saturday Jun 05, 2004

No, it's not 'A Bug's Life' or 'Antz'... it is a talking build tool for Java. Eu has posted a "quick and dirty logger for Ant that speaks some events of the Ant build process". Drop a few jar files in your Ant lib directory, set a command line option, and Ant will speak to you. Check it out at:

Talking Ant build
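If the logger follows Ant's standard extension mechanism, wiring it up looks something like this (the jar and class names below are placeholders; the real ones come from Eu's post):

```shell
# Copy the logger jar and FreeTTS jars into Ant's lib directory (paths illustrative)
cp talkinglogger.jar freetts.jar $ANT_HOME/lib/

# Tell Ant to use the custom logger via the standard -logger option
ant -logger com.example.TalkingLogger build
```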

Friday Jun 04, 2004

It is with great pleasure that we announce the Alpha 0.1 release of Sphinx-4:


Sphinx-4 is a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. It was created through a collaboration among the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).

The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a "research-ready" system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source under a very generous BSD-style license.

With the Alpha 0.1 release, you get the complete Sphinx-4 source tree along with several acoustic and language models capable of handling a variety of tasks ranging from simple digit recognition to large-vocabulary n-gram recognition.

Because it is written entirely in the Java programming language, Sphinx-4 can run on a variety of platforms without requiring any special compilation or changes. We've tested Sphinx-4 on the following platforms with success: the Solaris 9 Operating System on the SPARC platform, Mac OS X 10.3.3, RedHat 9.0, Fedora Core 1, Microsoft Windows XP, and Microsoft Windows 2000.

Please give Sphinx-4 0.1 alpha a try and post your questions, comments, and feedback to one of the CMU Sphinx Forums:


We can also be reached at [email protected]


The Sphinx-4 Team:  Evandro Gouvea, CMU (developer and speech advisor)
(in alph. order)    Philip Kwok, Sun Labs (developer)
                    Paul Lamere, Sun Labs (design/technical lead)
                    Beth Logan, HP (speech advisor)
                    Pedro Moreno, Google (speech advisor)
                    Bhiksha Raj, MERL (design lead)
                    Mosur Ravishankar, CMU (speech advisor)
                    Bent Schmidt-Nielsen, MERL (speech advisor)
                    Rita Singh, CMU/MIT (design/speech advisor)
                    JM Van Thong, HP (speech advisor)
                    Willie Walker, Sun Labs (overall lead)
                    Manfred Warmuth, UCSC (speech advisor)
                    Joe Woelfel, MERL (developer and speech advisor)
                    Peter Wolf, MERL (developer and speech advisor)

The RNDTXT project is all about creating art from the random words that appear in spam. I especially like Vicimus GEGAN, an ambient piece created with Python, Csound, SoX, and FreeTTS.

Thursday Jun 03, 2004

At the Sphinx-4 website you will find ZipCity, a WebStart demonstration of Sphinx-4, the speech recognition system written entirely in the Java programming language. This WebStart application is a simple demonstration that will recognize spoken zip codes and display the associated city and state. The really neat thing about this demo is that it includes an entire speech recognition system as well as all of the necessary acoustic models for digit recognition, all in about 2.5 MB. This may be the first-ever speech recognizer deployed with a single click!
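Once the recognizer returns a digit string, the non-speech half of a demo like ZipCity is just a table lookup from zip code to place name. Here's a toy sketch of that step (the three entries below are an illustrative sample, not ZipCity's actual table):

```java
import java.util.Map;

public class ZipLookup {
    // Illustrative three-entry sample; a real demo would ship a full zip code table.
    private static final Map<String, String> ZIPS = Map.of(
            "02139", "Cambridge, Massachusetts",
            "15213", "Pittsburgh, Pennsylvania",
            "95054", "Santa Clara, California");

    // Map a recognized zip code string to its city and state.
    static String lookup(String zip) {
        return ZIPS.getOrDefault(zip, "unknown zip code");
    }

    public static void main(String[] args) {
        System.out.println(lookup("15213"));
    }
}
```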

Tuesday Jun 01, 2004

A good article about the just-announced Sun/Fujitsu deal is at The Register, including this analysis:

Sun is placing a lot of bets on its new chip technology, but, to its credit, neither Intel nor IBM has laid out equally rich processor roadmaps. Come 2007, Sun will have three distinct RISC options tuned for various types of software workloads versus just one chip each from Intel and IBM.

A new release of Self is available from Sun Labs. Read more about Self in the Self tutorial.

Friday May 28, 2004

The JHome project is a Java application for controlling X10-enabled lights and appliances.

JHome uses the FreeTTS speech synthesizer to announce X10 events.

Thursday May 27, 2004

I don't think that there is a big future, at least in the next five years, for speech on the desktop. Speech recognition still takes too much computing power and is still too inaccurate to be a viable replacement for the keyboard. I think that the next big area for speech recognition is indexing audio streams (think NPR and CNN) for search engines.

Our colleagues over at HP's research lab put together SpeechBot, a prototype system that allows you to search a number of audio streams. It uses Sphinx as its speech recognizer.

StreamSage is building a business around indexing audio streams for search. They have a demonstration of their technology at CampaignSearch.com.

There's a good overview of the field in a ZDNet article: Search engines try to find their sound

Wednesday May 26, 2004

Since yesterday was Towel Day (where folks around the world carry a towel all day in memory of Douglas Adams) I took a trek to the local Douglas Adams memorial (every town should have one). I took a picture:

Tuesday May 25, 2004

One of the most unusual applications of FreeTTS is From Beyond. Combine Monty-Python-esque animation, Lovecraftian prose, Java, and FreeTTS and you have an odd application that reads stories such as Call of Cthulhu to you. (And yes, the lips move.) Here's a screen shot:

According to this story in The Register, Sun will soon be releasing a developer's kit for Project Looking Glass, the 3-D desktop written in Java for the JDS. A good next step on the way to turning LG from a cool demo into something real.

Friday May 21, 2004

I've been working on a WebStart demo for Sphinx-4, and I've been having a bit of trouble on my Linux box with ALSA sound. I'm not sure if the problem is specific to my configuration or endemic to all Linux/ALSA systems. So... if you feel so inclined, give ZipCity a try and let me know how it works. Thanks!

This blog copyright 2010 by plamere