Finally, a good application for speech recognition ... PodZinger uses speech recognition to index podcasts and videocasts, making this whole world of spoken audio searchable.  When you search on PodZinger, it shows the set of podcasts that match your query, and even lets you play the content directly at the point that matches your query.  It's audio-based passage retrieval.  PodZinger is not trying to create transcripts of podcasts, but instead to make them searchable.  That means the speech recognition doesn't have to be 100% accurate (or even 85% accurate); it just has to be good enough to get the 'content' words ... which tend to be longer and less confusable than typical stop words like 'of', 'a', 'and' and 'the'.  Still, the text passages shown by PodZinger are surprisingly understandable, and give you a good idea of whether the associated podcast is interesting enough to warrant a listen.
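
To make the idea of audio passage retrieval concrete, here is a minimal sketch of a time-stamped word index. It is not PodZinger's implementation, which isn't public as far as this post knows. The input format (tuples of podcast id, start time in seconds, and recognized word), the names build_index and search, and the tiny stop word list are all assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative stop word list (assumption): short, easily confused words
# that a search index can afford to lose.
STOP_WORDS = {"of", "a", "and", "the"}


def build_index(recognized):
    """Map each content word to the places it was heard.

    `recognized` is an iterable of (podcast_id, start_seconds, word)
    tuples, standing in for the output of a hypothetical recognizer.
    """
    index = defaultdict(list)
    for podcast_id, start_seconds, word in recognized:
        term = word.lower()
        if term not in STOP_WORDS:
            index[term].append((podcast_id, start_seconds))
    return index


def search(index, query):
    """Return sorted (podcast_id, start_seconds) hits for any query term."""
    hits = []
    for term in query.lower().split():
        # .get() avoids creating empty entries for unknown terms.
        hits.extend(index.get(term, []))
    return sorted(hits)


recognized = [
    ("ep1", 0.0, "the"),
    ("ep1", 0.4, "senate"),
    ("ep1", 12.5, "Cheney"),
    ("ep2", 3.2, "cheney"),
]
index = build_index(recognized)
hits = search(index, "Cheney")
```

Each hit carries a start time, so a player could seek straight to the matching passage. A misrecognized stop word costs nothing here, which is why the recognizer only needs to get the content words right.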

PodZinger also has a feature called the 'Zing Index', which is like the Google Zeitgeist.  It reports on who and what is being talked about most on podcasts.  Dick Cheney is topping this week's Zing Index.

The key to success for such an ambitious project is the quality of the speech engine.  Speaker-independent, continuous speech recognition of spontaneous speech (especially with multiple speakers, background music and noise) is very difficult.  Add to that the scaling problems ... trying to process 50,000 hours of speech in a week takes a lot of CPU time.  This is not a problem that I'd expect a small startup company like PodZinger to be able to tackle, but it turns out PodZinger is not really a small startup ... it's tied to the venerable BBN, the research contractor that has a long history of developing speech recognition engines.  BBN certainly has the know-how to deal with these issues.
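
For a rough sense of that scaling problem, the sketch below does the back-of-the-envelope arithmetic. The real-time factor (CPU hours needed per hour of audio) is an assumption, since the speed of BBN's engine isn't stated here; the function name cores_needed is also made up for illustration.

```python
import math


def cores_needed(audio_hours, wall_clock_hours, real_time_factor):
    """CPUs that must run nonstop to finish within the wall-clock window.

    real_time_factor is an assumed value: 1.0 means one hour of audio
    takes one CPU hour to recognize.
    """
    return math.ceil(audio_hours * real_time_factor / wall_clock_hours)


HOURS_PER_WEEK = 7 * 24  # 168

# 50,000 hours of speech in one week, at two assumed engine speeds.
at_real_time = cores_needed(50_000, HOURS_PER_WEEK, 1.0)
at_five_times_real_time = cores_needed(50_000, HOURS_PER_WEEK, 5.0)
```

Even if the recognizer keeps up with real time, that is roughly 300 processors busy around the clock; a slower engine multiplies the count accordingly.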

This blog copyright 2010 by plamere