The Riddle of the Sphinx.
At Thursday's keynote, James Gosling and Simon Ritter demonstrated a speech recognizer called Sphinx. More precisely, it was Sphinx-4, a speech recognition system written entirely in Java.
Sphinx-4 is a state-of-the-art, open-source speech recognition system created through a collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and others.
Sphinx-4 performs well across a wide range of recognition tasks, from the very small to the very large. The following table compares the word error rate (WER) and real-time ratio (RT) of Sphinx-4 and Sphinx 3.3 (CMU's 'fast' recognizer, written in C). Lower numbers are better.
Test | S3.3 WER | S4 WER | S3.3 RT | S4 RT(1) | S4 RT(2) | Vocabulary Size | Language Model |
---|---|---|---|---|---|---|---|
TI46 | 1.217 | 0.168 | 0.14 | 0.03 | 0.02 | 11 | isolated digits recognition |
TIDIGITS | 0.661 | 0.549 | 0.16 | 0.07 | 0.05 | 11 | continuous digits |
AN4 | 1.300 | 1.192 | 0.38 | 0.25 | 0.20 | 79 | trigram |
RM1 | 2.746 | 2.739 | 0.50 | 0.50 | 0.40 | 1,000 | trigram |
WSJ5K | 7.323 | 7.174 | 1.36 | 1.22 | 0.96 | 5,000 | trigram |
HUB4 | 18.845 | 18.878 | 3.06 | ~4.4 | 3.8 | 60,000 | trigram |
Key:
- WER - Word error rate (%) (lower is better)
- RT - Real Time - Ratio of processing time to audio time (lower is better)
- S3.3 RT - Results for a single- or dual-CPU configuration
- S4 RT(1) - Results for a single-CPU configuration
- S4 RT(2) - Results for a dual-CPU configuration
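
To make the two metrics in the key concrete, here is a minimal Java sketch of how they are commonly computed. It assumes the usual definition of WER (word-level edit distance between the reference and recognized transcripts, divided by the number of reference words) and takes RT as processing time divided by audio time, as in the key. The class and method names (`RecognitionMetrics`, `wordErrorRate`, `realTimeFactor`) are made up for illustration and are not part of the Sphinx-4 API; the scoring actually used to produce the table may differ in details such as text normalization.

```java
// Hypothetical helper, not part of Sphinx-4: illustrates the WER and RT metrics.
final class RecognitionMetrics {

    /**
     * Word error rate in percent: the minimum number of word substitutions,
     * deletions and insertions needed to turn the hypothesis into the
     * reference, divided by the number of reference words.
     */
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // Two-row Levenshtein distance over words instead of characters.
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // empty reference prefix: j inserted words
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i; // empty hypothesis prefix: i deleted words
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return 100.0 * prev[hyp.length] / ref.length;
    }

    /** Real-time ratio: processing time divided by audio duration. */
    static double realTimeFactor(double processingSeconds, double audioSeconds) {
        if (audioSeconds <= 0.0) {
            throw new IllegalArgumentException("audio duration must be positive");
        }
        return processingSeconds / audioSeconds;
    }

    /** Splits on whitespace; an empty or blank string yields no words. */
    private static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // One substituted word and one deleted word against four reference words.
        System.out.println(wordErrorRate("one two three four", "one too three"));
        // Five seconds of processing for ten seconds of audio.
        System.out.println(realTimeFactor(5.0, 10.0));
    }
}
```

Under this definition an RT below 1.0 means the recognizer runs faster than real time, and WER can exceed 100% when the hypothesis contains many inserted words.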
Sphinx-4 was recently released and is available at SourceForge.