Saturday Mar 19, 2005

Music Information Retrieval (MIR) is a fairly new field, with most of the papers and interesting research appearing in just the last five years. MIR rivals, and perhaps even exceeds, speech recognition in its technical challenges. Successful MIR researchers typically need to master signal processing, machine learning, symbolic representation, search techniques, and pattern classification, as well as music theory.

Advances in speech recognition over the years have been aided by a number of freely available toolkits such as HTK, ISIP and the Sphinx family of recognition engines. Just as with speech recognition, the availability of good tools will help advance the state of the art of music information retrieval.

IMIRSEL (the International Music Information Retrieval Systems Evaluation Laboratory) has just issued the first release of M2K, the Music-to-Knowledge toolkit. M2K is an open source, Java-based framework designed to allow Music Information Retrieval researchers to rapidly prototype, share and scientifically evaluate their sophisticated MIR techniques.

M2K comes with a large set of MIR-specific modules (currently oriented toward audio-content-based analysis), as well as a number of sample itineraries (an itinerary is a task-oriented configuration of modules). M2K builds upon the D2K data mining framework developed by the Automated Learning Group at the NCSA.

D2K offers a visual programming environment that allows users to connect the M2K modules to build more complex behaviors. For example, here's a screen shot of how M2K and D2K could be used to put together a music feature extractor that generates a set of feature vectors containing spectral and timbral features that can be used for genre or artist classification.
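To make "spectral features" a bit more concrete, here's a minimal sketch of one of them, the spectral centroid, in plain Java. This is not M2K or D2K code: the class and method names are made up for illustration, and it uses a naive DFT on a single frame rather than the FFT a real extractor would use.

```java
// Hypothetical sketch: none of these names come from M2K or D2K.
final class SpectralCentroidSketch {

    /** Magnitude spectrum of one frame via a naive DFT (bins 0 .. n/2). */
    static double[] magnitudeSpectrum(double[] frame) {
        int n = frame.length;
        double[] mags = new double[n / 2 + 1];
        for (int k = 0; k < mags.length; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(angle);
                im -= frame[t] * Math.sin(angle);
            }
            mags[k] = Math.sqrt(re * re + im * im);
        }
        return mags;
    }

    /** Spectral centroid in Hz: the magnitude-weighted mean of the bin frequencies. */
    static double spectralCentroid(double[] frame, double sampleRate) {
        double[] mags = magnitudeSpectrum(frame);
        double weighted = 0.0;
        double total = 0.0;
        for (int k = 0; k < mags.length; k++) {
            double freq = k * sampleRate / frame.length;
            weighted += freq * mags[k];
            total += mags[k];
        }
        // A silent frame has no energy, so report 0 rather than dividing by zero.
        return total == 0.0 ? 0.0 : weighted / total;
    }

    public static void main(String[] args) {
        double sampleRate = 8000.0;
        double[] frame = new double[256];
        // A pure 1000 Hz tone; its centroid should land very close to 1000 Hz.
        for (int t = 0; t < frame.length; t++) {
            frame[t] = Math.sin(2.0 * Math.PI * 1000.0 * t / sampleRate);
        }
        System.out.printf("centroid: %.1f Hz%n", spectralCentroid(frame, sampleRate));
    }
}
```

A feature vector for a classifier would typically stack several numbers like this one per frame; the centroid alone is a rough "brightness" measure.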

M2K has a lot going for it. It is open source, and the code is well written by folks who understand MIR. Even in its first alpha release it contains quite a bit of very useful functionality. M2K provides bridges to other toolkits such as Marsyas and Matlab. Perhaps most important, M2K has a tie-in to IMIRSEL, which will soon be hosting a large collection of music (in audio as well as symbolic form) that will serve as a resource for MIR researchers. Such a resource is key to MIR researchers since getting access to large bodies of music can be very difficult and expensive. I'm really hoping that a toolkit such as M2K will emerge as the tool of choice for MIR researchers.

Monday Feb 07, 2005

Willie has just rolled a new release of FreeTTS, the speech synthesizer written entirely in the Java programming language. Check out the release announcement. Some notable changes with this release:

  • Improvements to the emacspeak server demo
  • Performance improvements for time-to-first-sample and cancel
  • Improvements to the state name expansion logic
  • Pronunciations for British Sterling and "not" sign
  • New pronunciations for Linux terms
  • Clarifications to message
  • Fix for -dump{Multi}Audio problem in freetts app
  • Elimination of hang with JDK1.5
  • Tools specifically for importing CMU ARCTIC Voices.
  • Fix for MBROLA phoneme remapping
  • Rudimentary internationalization to FestVox import.
  • Support to build FreeTTS w/o requiring JSAPI
  • Only build tests on demand (more freedom from JUnit)

If you want to try out FreeTTS, check out the Talking clock webstart demo.

A couple of weeks ago I wrote about the Million song iPod and how it was the inevitable next step in portable music. Well ... sure enough, last night during the Super Bowl, I saw the commercial for Napster To Go. With Napster To Go you can subscribe to the 1,000,000 or so song catalog for $14.95 a month and you can download the songs to your qualified player. (The player has to be able to run Microsoft's DRM software.) I thought it was a pretty good ad showing the math:
10,000 songs on iTunes + iPod  : $10,000
1,000,000 songs on Napster + (lots of players): $14.95 per month
It's a tempting proposition, but Napster itself is a Windows-only music store ... so that's not for me. The next step is to see if Apple follows suit with iTunes.
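For the curious, the ad's math can be pushed one step further: how many months of subscription fees add up to the $10,000 purchase price? Here's a tiny sketch using only the two numbers from the ad. The class and method names are mine, and it ignores the cost of the player, the difference in catalog size, and the fact that rented songs go away when the subscription lapses.

```java
// Hypothetical helper; the dollar figures are the ones quoted in the ad.
final class SubscriptionMath {

    /** Months of subscription fees that add up to a one-time purchase cost. */
    static double breakEvenMonths(double purchaseCost, double monthlyFee) {
        return purchaseCost / monthlyFee;
    }

    public static void main(String[] args) {
        double months = breakEvenMonths(10000.0, 14.95);
        System.out.printf("break-even: %.0f months (%.1f years)%n", months, months / 12.0);
    }
}
```

That works out to roughly 669 months, or a bit under 56 years of subscribing before the rental costs as much as the purchase.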

Monday Jan 31, 2005

Congrats to the Solaris 10 team for releasing Solaris 10. I've just installed Solaris 10 x86 on my dual Opteron box. I'm quite pleased ... Things are working wonderfully. Here's a screenshot showing all the Solaris 10 goodness.

Updated: Made the shot smaller with a click-thru for the larger image

Thursday Jan 27, 2005

Last week, I mentioned Jens Vonderheide's java_mp3, a Java-based MP3 ID3 tag library. This week I've been fooling around with another Java-based MP3 tag library called JID3. This library is quick and easy to use. It supports V1.0, V1.1 and V2.3.0 tags (but not V2.2.0 tags). The developer, Paul Grebenc, has been helpful with my questions and comments. Worth checking out if you need to process MP3 tags.
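If you're wondering what the simplest of those tag flavors looks like on disk, here's a minimal sketch that reads a V1.0/V1.1 tag with nothing but the JDK. It does not use JID3 (or java_mp3) at all, and the class and method names are made up for illustration. It assumes the standard ID3v1 layout: the last 128 bytes of the file, starting with "TAG", followed by 30-byte title, artist and album fields.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a bare-bones ID3v1 reader, not the JID3 API.
final class Id3v1Sketch {
    static final int TAG_SIZE = 128;

    /** Decodes one fixed-width field, dropping NUL or space padding. */
    static String field(byte[] tag, int offset, int length) {
        String raw = new String(tag, offset, length, StandardCharsets.ISO_8859_1);
        int nul = raw.indexOf('\0');
        return (nul >= 0 ? raw.substring(0, nul) : raw).trim();
    }

    /** Returns {title, artist, album}, or null if the block is not an ID3v1 tag. */
    static String[] parse(byte[] tag) {
        if (tag.length != TAG_SIZE || tag[0] != 'T' || tag[1] != 'A' || tag[2] != 'G') {
            return null;
        }
        return new String[] { field(tag, 3, 30), field(tag, 33, 30), field(tag, 63, 30) };
    }

    /** Reads the last 128 bytes of a file and parses them as an ID3v1 tag. */
    static String[] read(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            if (file.length() < TAG_SIZE) {
                return null;
            }
            byte[] tag = new byte[TAG_SIZE];
            file.seek(file.length() - TAG_SIZE);
            file.readFully(tag);
            return parse(tag);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Id3v1Sketch <file.mp3>");
            return;
        }
        String[] fields = read(args[0]);
        if (fields == null) {
            System.err.println("no ID3v1 tag found for " + args[0]);
        } else {
            System.out.println("Title: " + fields[0]);
            System.out.println("Artist: " + fields[1]);
            System.out.println("Album: " + fields[2]);
        }
    }
}
```

The V2.x tags are a different beast (variable-length frames at the front of the file), which is exactly why a library is worth having.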

Wednesday Jan 26, 2005

And so the epic tale begins ...

    Wants pawn term, dare worsted ladle gull hoe lift wetter murder inner ladle cordage, honor itch offer lodge, dock, florist. Disk ladle gull orphan worry putty ladle rat cluck wetter ladle rat hut, an fur disk raisin pimple colder Ladle Rat Rotten Hut.
Wha..? That looks like the string of words one finds in spam email that is trying hard to wind its way through the spam filters. But it's not spam and it's not just random words, it's the opening paragraph of a well-known story. This version was written by Professor H. L. Chace, who wanted to demonstrate that prosody - that is, the melody of a language - is an integral part of its meaning.

To hear this story read with the proper prosody, which will make the story much more recognizable, check out the Exploratorium page on Ladle Rat Rotten Hut.

And never forget the moral of the story:

    Yonder nor sorghum stenches shut ladle gulls stopper torque wet strainers

Thanks to Paul Martin for showing me this one.

Tuesday Jan 25, 2005

My brother-in-law, Matt, is a 'buyer'. He doesn't rent DVDs, he buys them. He has a fairly large DVD collection (maybe 100 or so) and he can watch them whenever he wants, but he limits his DVD purchases to one every other week. I'm a 'renter'. I don't buy DVDs (with a few notable exceptions); I rent them. I probably rent two a week on average. I'd guess about 80% of people fall into the 'renter' category for movies while the rest are the 'buyers'.

However, for music, most of us are 'buyers'. We want to own the music. Perhaps we have a more personal attachment to our music and the rental-bond is not strong enough for us and that is why we buy. Or perhaps there just isn't any good way for people to rent music. It's not impossible to rent music. I can go to VirginDigital or one of the other music subscription services, pay a modest monthly fee and download all of the music I want, at any time. The problem is that the music is tied to my computer. I can't put it on my iPod, I can't listen to it in my car, I can only listen to it when I am sitting at my computer. This has been the biggest barrier to the acceptance of music subscription services - you can't take the music with you.

However, things are about to change. This week's Technology Review has an article called Gunning for iTunes that describes how these subscription services are adopting Microsoft's Janus technology. With this DRM technology, music renters will be able to take their rented music with them on their portable devices. Renters won't be chained to their computers while listening to their music. They'll be able to load up their portable music player with their rented songs just like they can with their purchased iTunes songs. As long as their subscription is current and they sync their player at least once a month, they can listen to all of their rented music.

Right now, if I want to legally fill up an iPod with songs purchased from iTunes, it is going to cost me about $10,000. That's way more than I want to pay. However, I'd be quite happy to pay $15 per month to have access to a million song collection that I could listen to at home, in my car, at work, wherever. I'd have, in effect, a million song iPod. Well, except it wouldn't be an iPod. iTunes isn't a subscription service and Apple hasn't adopted (and probably won't adopt) Microsoft's DRM. But I wouldn't be surprised if Apple decides to roll out its own subscription-based iTunes built on its own DRM. Apple has everything it needs: the DRM, the deals with the music publishers, and, most importantly, the ubiquitous iPod. Look for the million song iPod; I'm betting that it's coming soon.

Saturday Jan 22, 2005

Trying to decide whether two performances are similar or not can be very difficult, even if the two performances are of the same song. Doug Eck has put together a page called My favorite things which demonstrates this wonderfully. Doug has collected performances of the Rodgers and Hammerstein melody by a number of artists (from Julie Andrews to Outkast) in snippets of mp3 files for easy comparison. On one axis (melody) the songs are extremely similar, but on others (instrumentation, rhythm, tempo, timbre, mood) they are very different.

Friday Jan 21, 2005

I need to be able to read all the various flavors of MP3 ID3 tags from Java. I took a tour through a number of different Java mp3 tagging packages (see id3 java mp3 google search). Some were slow, some were way too complex, but Jens Vonderheide's java_mp3 was just right. It's fast, small, and easy to use. Here's a code snippet that dumps the artist, album and title of a song:

import java.io.File;

import de.vdheide.mp3.*;

    // Prints the artist, album and title stored in a file's ID3 tag.
    public void showID3(File file) {
        try {
            ID3 id3 = new ID3(file);
            System.out.println("Artist: " + id3.getArtist());
            System.out.println("Album: " + id3.getAlbum());
            System.out.println("Title: " + id3.getTitle());
        } catch (NoID3TagException e) {
            // The file carries no ID3 tag at all.
            System.err.println("no ID3 Info found for " + file);
        }
    }

Thursday Jan 20, 2005

My buddy and Sun Labs cohort Steve Green has just started a new blog. Steve's Search Guy blog is about search technologies and all of the interesting stuff that surrounds search. Steve's most recent entry is about perverse ways to hack Google's spelling checker. Steve's a smart and really funny guy, so it's probably worth checking out Search Guy every once in a while.

An article at NewsForge, Speak to me, Linux, is a quick survey of Linux desktop speech. It includes brief descriptions of the various recognizers (sphinx-[2,3,4]) and synthesizers (festival, flite and freetts) as well as some of the speech apps such as KTTS and Perlbox. The article concludes: "The ease of using these speech engines and speech recognition systems could make Linux the preferred OS for the visually impaired." Hmmm ... I've never heard the terms 'ease of use', 'speech' and 'Linux' used together in one sentence before. Interesting.

Wednesday Jan 19, 2005

JVoiceXML is an open source implementation of VoiceXML written in Java. JVoiceXML is currently in the pre-alpha stage. It's not a full implementation yet but it looks to be a great start. The code looks to be of very high quality. The first pre-alpha release was made available today (January 19, 2005), so this is about as fresh as it gets. Worth keeping an eye on.

Tuesday Jan 18, 2005

Gracenote (the folks who own the CDDB database) and ScanSoft (the folks who make desktop and embedded speech recognizers) are partnering to allow voice control of digital media devices such as car audio, mp3 players and home stereos. The metadata that Gracenote provides (artist, song name, genre info) can be used to generate speech grammars for these devices. You'll be able to say "play misty for me". Read more at Tom's Hardware.
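Here's a rough idea of how song metadata could be turned into a speech grammar. This is only a sketch under my own assumptions: it is not how Gracenote or ScanSoft do it, the class and method names are made up, and it emits a grammar in JSGF (the Java Speech Grammar Format) simply because that's a format I know. Titles are reduced to lowercase letters, digits and spaces so they can be dropped into the grammar as plain tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: builds a JSGF grammar from a list of song titles.
final class GrammarSketch {

    /** Keeps letters, digits and single spaces so a title is a plain token sequence. */
    static String normalize(String title) {
        return title.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    /** Returns a JSGF grammar that accepts "play" followed by one of the titles. */
    static String buildGrammar(List<String> titles) {
        List<String> alternatives = new ArrayList<>();
        for (String title : titles) {
            String spoken = normalize(title);
            if (!spoken.isEmpty()) {
                alternatives.add(spoken);
            }
        }
        // <VOID> is the JSGF special rule that can never be spoken.
        String songs = alternatives.isEmpty() ? "<VOID>" : String.join(" | ", alternatives);
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");
        sb.append("grammar music;\n");
        sb.append("public <command> = play <song>;\n");
        sb.append("<song> = ").append(songs).append(";\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildGrammar(Arrays.asList("Misty", "My Favorite Things")));
    }
}
```

A real system would also need pronunciations for artist and song names that aren't in the recognizer's dictionary, which is presumably where a lot of the hard work is.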

Hit Song Science is a program that uses pattern matching and clustering technology to determine if a song has what it takes to be a hit song. For fifty dollars an artist can run their song through HSS, which compares the musical characteristics of the song (such as melody, harmony, chord progression, beat and tempo) against 30 years' worth of Billboard hit singles and returns a score that represents how likely it is that the song will become a hit. According to the Guardian article Together in electric dreams, HSS predicted Norah Jones' success well before her debut album became a hit. An interesting idea to be sure ... but I am extremely skeptical.

Update: Looks like this is a Slashdot story now too.

Friday Jan 14, 2005

Good interview: Asking "Why?" at Sun Laboratories: A Conversation with Director, Glenn Edens. Glenn says:
    Product organizations are mostly staffed with engineers. And engineers are mostly nerds, who ask: "How are we going to get this done? How does this work? How can we make it better?" How, how, how.

    A research lab tends to consist of hippies, and hippies just ask why. Why, why, why. Why do I have to do it this way? Why should I do that? Why do I need to fill out this form? Why do I have to -- anything.

I think it's true, the labs seem to be more hippie than nerd ... there certainly is more tie-dye and more dead-heads here than any place else I've worked.

This blog copyright 2010 by plamere