The Riddle of the Sphinx.

At Thursday's keynote, James Gosling and Simon Ritter demonstrated a speech recognizer called Sphinx. They were talking about Sphinx-4, a speech recognition system written entirely in Java.

Sphinx-4 is a state-of-the-art, open source speech recognition system created through a collaboration between the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and others.

Sphinx-4 performs well across a wide range of recognition tasks, from an 11-word digit vocabulary to a 60,000-word vocabulary. The following table compares the word error rate (WER) and real-time ratio (RT) for Sphinx-4 and Sphinx 3.3 (CMU's 'fast' recognizer, written in C). Lower numbers are better.

Test	S3.3 WER	S4 WER	S3.3 RT	S4 RT(1)	S4 RT(2)	Vocabulary Size	Language Model
TI46	1.217	0.168	0.14	0.03	0.02	11	isolated digits recognition
TIDIGITS	0.661	0.549	0.16	0.07	0.05	11	continuous digits
AN4	1.300	1.192	0.38	0.25	0.20	79	trigram
RM1	2.746	2.739	0.50	0.50	0.40	1,000	trigram
WSJ5K	7.323	7.174	1.36	1.22	0.96	5,000	trigram
HUB4	18.845	18.878	3.06	~4.4	3.8	60,000	trigram

Key:

WER - Word error rate, in percent (lower is better)
RT - Real-time ratio: processing time divided by audio time (lower is better)
S3.3 RT - Results for a single or dual CPU configuration
S4 RT(1) - Results on a single-CPU configuration
S4 RT(2) - Results for a dual-CPU configuration
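
To make the two metrics in the key concrete, here is a minimal Java sketch of how they are commonly computed. It assumes the usual definition of WER (word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words) and takes RT as processing time over audio time, as the key states. The class and method names are made up for illustration; this is not the scoring code used to produce the table, and it is not part of the Sphinx-4 API.

```java
public class RecognitionMetrics {

    /**
     * Word error rate in percent, assuming the common definition:
     * (substitutions + deletions + insertions) / number of reference words.
     */
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // Word-level Levenshtein distance, keeping only the previous row.
        int[] prev = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // j insertions turn an empty reference into hyp[0..j)
        }
        for (int i = 1; i <= ref.length; i++) {
            int[] cur = new int[hyp.length + 1];
            cur[0] = i; // i deletions turn ref[0..i) into an empty hypothesis
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = cur[j - 1] + 1;
                cur[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            prev = cur;
        }
        return 100.0 * prev[hyp.length] / ref.length;
    }

    /** Splits on whitespace; an empty or blank string yields no words. */
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Real-time ratio: processing time divided by audio duration. */
    static double realTimeFactor(double processingSeconds, double audioSeconds) {
        if (audioSeconds <= 0) {
            throw new IllegalArgumentException("audio duration must be positive");
        }
        return processingSeconds / audioSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical utterance: one substitution (two -> too) and one deletion (four).
        double wer = wordErrorRate("one two three four", "one too three");
        // Hypothetical timing: 2 seconds of processing for 10 seconds of audio.
        double rt = realTimeFactor(2.0, 10.0);
        System.out.println("WER (%) = " + wer + ", RT = " + rt);
    }
}
```

In this made-up example, two word errors against a four-word reference give a WER of 50%, and the RT of 0.2 means the recognizer runs five times faster than real time. An RT above 1.0, as in the HUB4 row, means recognition takes longer than the audio itself.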

This data was collected on a dual-CPU UltraSPARC-III system running at 1015 MHz with 2 GB of memory.

Sphinx-4 was recently released and is available at SourceForge.

Duke Listens!: Visit my main blog at MusicMachinery.com
