At Thursday's keynote, James Gosling and Simon Ritter demonstrated a speech recognizer called Sphinx. The recognizer they showed was Sphinx-4, a speech recognition system written entirely in Java.

Sphinx-4 is a state-of-the-art, open source speech recognition system created through a collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and others.

Sphinx-4 performs well across a wide range of recognition tasks, from the very small to the very large. The following table compares the word error rate (WER) and the real-time ratio (RT) of Sphinx-4 and Sphinx 3.3 (CMU's 'fast' recognizer, written in C). Lower numbers are better.

Test      S3.3 WER  S4 WER  S3.3 RT  S4 RT(1)  S4 RT(2)  Vocabulary Size  Language Model
TI46      1.217     0.168   0.14     0.03      0.02      11               isolated digits recognition
TIDIGITS  0.661     0.549   0.16     0.07      0.05      11               continuous digits
AN4       1.300     1.192   0.38     0.25      0.20      79               trigram
RM1       2.746     2.739   0.50     0.50      0.40      1,000            trigram
WSJ5K     7.323     7.174   1.36     1.22      0.96      5,000            trigram
HUB4      18.845    18.878  3.06     ~4.4      3.8       60,000           trigram


  • WER - Word error rate (%) (lower is better)
  • RT - Real time: the ratio of processing time to audio time (lower is better)
  • S3.3 RT - Results for a single- or dual-CPU configuration
  • S4 RT(1) - Results on a single-CPU configuration
  • S4 RT(2) - Results for a dual-CPU configuration
This data was collected on a dual-CPU UltraSPARC-III running at 1015 MHz with 2 GB of memory.
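
To make the two metrics in the legend concrete, here is a minimal sketch of how WER and RT are commonly computed. It is not code from Sphinx-4; the class name RecognizerMetrics, its method names, and the sample sentences are hypothetical and chosen only for illustration. It assumes WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, and that RT is processing time divided by audio time, as described above.

```java
public class RecognizerMetrics {

    /**
     * Word error rate in percent, assuming the usual definition:
     * (substitutions + deletions + insertions) / reference word count * 100.
     * The minimum error count is found with a word-level edit distance.
     */
    public static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // prev[j] holds the distance between the first i-1 reference words
        // and the first j hypothesis words; curr[j] is the row being filled.
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // j insertions against an empty reference prefix
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i; // i deletions against an empty hypothesis prefix
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return 100.0 * prev[hyp.length] / ref.length;
    }

    /** Real-time ratio: processing time divided by audio duration. */
    public static double realTimeFactor(double processingSeconds, double audioSeconds) {
        if (audioSeconds <= 0) {
            throw new IllegalArgumentException("audio duration must be positive");
        }
        return processingSeconds / audioSeconds;
    }

    // Splits on whitespace; an empty or blank string yields no words.
    private static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical transcript pair: one substitution and one deletion.
        double wer = wordErrorRate("one two three four", "one too three");
        // Hypothetical timing: 2.5 s of processing for 10 s of audio.
        double rt = realTimeFactor(2.5, 10.0);
        System.out.println("WER (%) = " + wer + ", RT = " + rt);
    }
}
```

In this made-up example the hypothesis has two errors against four reference words, so the WER is 50%, and 2.5 seconds of processing for 10 seconds of audio gives an RT of 0.25. An RT below 1.0 means the recognizer runs faster than real time.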

Sphinx-4 was recently released and is available at SourceForge.



This blog copyright 2010 by plamere