Tuesday Sep 13, 2005

So let's talk about ISMIR. This year's ISMIR is being hosted by Queen Mary, University of London and Goldsmiths College, University of London. This year ISMIR has some corporate sponsors, including Microsoft Research, Hewlett-Packard, Philips Research, and Sun Microsystems. ISMIR is starting to be recognized in the industrial/commercial world as an important conference in today's world of digital music. (It has already been recognized as such in the academic world.)

About 230 people are attending ISMIR this year, representing 25 countries. The proceedings are over 700 pages long (and add to the amount of exercise I've been getting as I lug this precious volume up and down the streets of London).

One of the themes of this year's ISMIR seems to be "let's hook up the music technologists with the musicologists." Indeed, these two groups, musicologists and technologists, have not really collaborated much in the MIR community, which causes endless difficulties and duplication of effort. For instance, recently on the MUSIC-IR mailing list there's been a debate about the usefulness of music genre classification (and the use of genre in general). It was pointed out (several times) today that it would be wise (and a heck of a lot easier) to just ask the musicologists how to deal with genre and all the problems it is causing, since musicologists have experience answering just this sort of question. This theme of improved communication and cooperation was outlined in the opening talk by Nicholas Cook, a Research Professor of Music at Royal Holloway, University of London. Professor Cook warned that although computers have been useful tools for musicologists for quite some time, it is important to remember that the listening experience is an essential part of music analysis, and that with 'computer musicology' we are in danger of losing that.

One of my favorite talks of the day was by Jin Ha Lee from the University of Illinois at Urbana-Champaign. She talked about a very interesting problem that I didn't even know existed. Imagine hearing music from another culture. If you really like the music, you might want to hear more of it, or you may want to learn more about the artist or the genre of this new music. Jin Ha points out that since all of our music search tools are text-based, we have no real way of forming queries for this music. We may not understand the lyrics, we certainly don't recognize the genre, we don't recognize the artist, and we may not even understand the mood (since the mood of a song is often conveyed in the lyrics). Jin Ha suggests that for cross-cultural/multilingual music information seeking, the best approaches may be non-language queries such as query by example, queries based upon music similarity, or culturally neutral labels.

Gerhard Widmer from Johannes Kepler University described some interesting work involving data mining the web for co-occurrence of artists, using a technique similar to Google's PageRank to discover 'prototypical artists', that is, those that are the best representatives of their particular genre. One of the difficulties with mining the web is dealing with bands whose names are also common words. Bands such as Queen, Kiss and Yes get skewed statistics because of their common names. I didn't dare ask how they'd deal with the band "The The".
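The actual system is more involved, but the core idea can be sketched in a few lines. This toy (all counts invented, and not Widmer's code) runs a PageRank-style power iteration over an artist co-occurrence graph, so artists that co-occur heavily with other well-connected artists float to the top:

```python
# Toy sketch: rank artists by "prototypicality" with a PageRank-style
# power iteration over a co-occurrence graph. Counts are made up.

def prototypicality(cooccur, damping=0.85, iters=50):
    """cooccur: dict mapping artist -> {other_artist: co-occurrence count}."""
    artists = list(cooccur)
    n = len(artists)
    rank = {a: 1.0 / n for a in artists}
    for _ in range(iters):
        new = {}
        for a in artists:
            # each neighbor passes on rank proportional to co-occurrence weight
            incoming = 0.0
            for b in artists:
                total = sum(cooccur[b].values())
                if total and a in cooccur[b]:
                    incoming += rank[b] * cooccur[b][a] / total
            new[a] = (1 - damping) / n + damping * incoming
        rank = new
    return sorted(rank.items(), key=lambda kv: -kv[1])

counts = {
    "Queen": {"Kiss": 40, "Yes": 10},
    "Kiss":  {"Queen": 40, "Yes": 5},
    "Yes":   {"Queen": 10, "Kiss": 5},
}
for artist, score in prototypicality(counts):
    print(artist, round(score, 3))
```

With these made-up counts, Queen ends up on top simply because the other artists pass most of their weight to it.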

Ian Knopke of McGill presented some early work on the geospatial location of music and sound files. Ian has a web crawler designed to find music, and then uses a number of techniques to discover where the data is physically located.

Thomas Lidy presented some excellent research on how to evaluate feature extractors and psychoacoustic transformations for music genre classification. Thomas was able to show how their rhythm histogram led to genre classification accuracy improvements. I wonder about some of the testing methodology, though. I think some of the test data sets were too small, or Thomas's classifier was over-fitting, because there was some unexplained large difference in accuracy across different data sets. This highlights one of the trouble spots for audio classification: there just isn't enough data for testing and training systems. Copyright problems make it very hard to get an adequately sized, labelled collection that researchers can share.
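One cheap sanity check for that kind of over-fitting is k-fold cross-validation: if accuracy swings wildly between folds on the same data set, the model or the data is suspect. Here's a minimal stdlib-only sketch with synthetic features and a toy 1-NN classifier (nothing to do with Lidy's actual setup):

```python
# Toy k-fold cross-validation: report per-fold accuracy so that large
# fold-to-fold swings (a symptom of over-fitting or tiny data) are visible.
import random

def kfold_accuracy(data, labels, classify, k=5, seed=0):
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        correct = sum(
            classify([data[j] for j in train], [labels[j] for j in train],
                     data[i]) == labels[i]
            for i in fold)
        scores.append(correct / len(fold))
    return scores

def nearest_neighbor(train_x, train_y, x):
    # 1-NN with squared Euclidean distance
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]

# two synthetic, well-separated "genres"
rng = random.Random(42)
data = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)] +
        [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(40)])
labels = ["ambient"] * 40 + ["metal"] * 40
print(kfold_accuracy(data, labels, nearest_neighbor))
```

On real, small genre collections the interesting signal is the variance between those five numbers, not just their mean.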

Cory McKay of McGill described the framework for optimizing music classification that they've been building. Cory and others have been building ACE, the Autonomous Classification Engine. Given a set of features, ACE will try a number of classifiers, tuning parameters, classifier ensembles and other techniques to try to optimize the classification. ACE is written in Java and is released (or will be soon) under the GPL. ACE looks really interesting. It will be a great tool for anyone trying to improve music classification algorithms. I expect that ACE will help raise the bar in a number of classification tasks at next year's MIREX. The McGill team is also releasing several other open source Java-based packages for use by MIR developers. With these contributions, McGill is really adding value to the MIR community. Go McGill!
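The idea behind ACE can be sketched in miniature (ACE itself is Java and far more capable; this toy is my own illustration, not its API): given features and labels, try several candidate classifiers on a held-out split and keep the winner.

```python
# Toy model selection in the spirit of ACE: evaluate a few candidate
# classifiers on a held-out split and pick the most accurate one.
import random

rng = random.Random(3)

# synthetic two-class feature vectors, shuffled and split train/test
data = [([rng.gauss(c * 3, 1.0), rng.gauss(c * 3, 1.0)], c)
        for c in (0, 1) for _ in range(40)]
rng.shuffle(data)
train, test = data[:60], data[60:]

def nn_classify(x):
    # 1-nearest-neighbor on the training split
    return min(train, key=lambda p: sum((a - b) ** 2
                                        for a, b in zip(p[0], x)))[1]

def centroid_classify(x):
    # nearest class centroid
    cents = {}
    for c in (0, 1):
        pts = [p[0] for p in train if p[1] == c]
        cents[c] = [sum(v) / len(pts) for v in zip(*pts)]
    return min(cents, key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(cents[c], x)))

def accuracy(classify):
    return sum(classify(x) == c for x, c in test) / len(test)

candidates = {"1-NN": nn_classify, "centroid": centroid_classify}
best = max(candidates, key=lambda name: accuracy(candidates[name]))
print(best, accuracy(candidates[best]))
```

ACE layers parameter tuning and classifier ensembles on top of this basic loop, but the select-the-best-performer core is the same.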

Audio to Score

There were a number of talks in the 'audio to score' track describing techniques for extracting pitch, onset, and harmonic content directly from audio. These were probably some of the most mathematically intensive talks of the conference; the math was flowing. At one point a slide was shown with probably ten separate formulas, with the speaker's comment of "and then some math occurs and then ...". Two separate papers presented by students from the University of Tokyo involved pitch extraction. There was an interesting audio demo of converting an audio stream to MIDI data. These systems were able to deal with harmonies generated by piano and guitar quite well.
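For flavor, here is the simplest classic monophonic pitch-extraction idea, autocorrelation (the papers presented use far more sophisticated probabilistic models): the lag at which a signal best matches a shifted copy of itself gives the period.

```python
# Toy autocorrelation pitch detector: search lags corresponding to
# plausible fundamental frequencies and return the best-matching one.
import math

def autocorr_pitch(samples, sr, fmin=80.0, fmax=1000.0):
    lo, hi = int(sr / fmax), int(sr / fmin)
    best_lag = max(range(lo, hi + 1),
                   key=lambda lag: sum(samples[i] * samples[i + lag]
                                       for i in range(len(samples) - lag)))
    return sr / best_lag

sr = 8000
tone = [math.sin(2 * math.pi * 220.0 * n / sr) for n in range(1024)]
print(round(autocorr_pitch(tone, sr), 1))  # close to the true 220 Hz
```

This falls apart quickly on polyphonic piano and guitar, which is exactly why the audio-to-score papers needed all that math.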

Monday Posters

During lunch time we were able to review a number of posters while we munched on our sandwiches and finger food. There were just too many posters for me to get a chance to talk to all of the presenters, but I did talk to a few folks. Some highlights:
  • PlaySOM: Robert Neumayer was demonstrating his PlaySOM system. PlaySOM is a user interface that lets the user browse a large music collection by navigating a map of clustered music tracks. This approach provides a potentially better way to find and build playlists on small devices.
  • Sonixplorer: a system that uses visualization and auralization for content-based exploration of music collections. Dominik Lubbers showed a clustering system where aural browsing of music collections was possible.
  • Foafing the Music: this system uses 'friend of a friend' (FOAF) and RSS to recommend music based on the user's personal tastes. Developed by folks at Universitat Pompeu Fabra in Barcelona, Spain, it mines data from RSS feeds (including Audioscrobbler) as well as the content of the music itself (i.e. content-based music similarity).
  • Rebecca Fiebrink described her work on using genetic algorithms to determine feature weightings for music classification. You can try it out at their website.
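A genetic algorithm for feature weighting can be sketched in a few lines (this toy, with synthetic data, is my own illustration and not Fiebrink's system): evolve a weight vector so that a weighted-distance classifier scores well, keeping and mutating the fittest weights each generation.

```python
# Toy GA for feature weighting: fitness is the training accuracy of a
# nearest-centroid classifier under the candidate's weighted distance.
import random

rng = random.Random(1)

# synthetic data: feature 0 separates the classes, feature 1 is pure noise
data = [([rng.gauss(c * 4, 1), rng.gauss(0, 1)], c)
        for c in (0, 1) for _ in range(30)]

def fitness(weights):
    cents = {c: [sum(x[i] for x, cc in data if cc == c) / 30 for i in range(2)]
             for c in (0, 1)}
    def classify(x):
        return min(cents, key=lambda c: sum(w * (a - b) ** 2
                                            for w, a, b in zip(weights, x, cents[c])))
    return sum(classify(x) == c for x, c in data) / len(data)

def mutate(w):
    return [max(0.0, wi + rng.gauss(0, 0.1)) for wi in w]

pop = [[rng.random(), rng.random()] for _ in range(20)]
for _ in range(15):                       # generations
    pop.sort(key=fitness, reverse=True)   # selection: keep the fittest half
    survivors = pop[:10]
    # children: one gene from each of two parents, then mutation
    pop = survivors + [mutate([rng.choice(survivors)[0],
                               rng.choice(survivors)[1]]) for _ in range(10)]
best = max(pop, key=fitness)
print([round(w, 2) for w in best], fitness(best))
```

With data like this, the GA should learn to weight the informative feature and largely ignore the noisy one.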

The day ended at about 6:30. I ended up joining a group going out to dinner at an East London Indian restaurant. Jeremy Pickens was kind enough to shepherd the six of us through the tube and to the restaurant. The food was excellent, and plentiful (and extremely inexpensive for London).

And finally the day was capped off with a Jazz/rap concert by Soweto Kinch in Spitalfields.

You don't hear music like that in New Hampshire.

All in all it was a great day. And now ... off to bed ...

Sunday Sep 11, 2005

Well, I made it to London in one piece. Funny thing was that the last 15 miles from the airport to the hotel took almost as long as the first 3000 miles. Getting through the passport line was the hardest of all: 1.5 hours of standing in line after a long flight, blah. But after a good night's sleep (in the smallest hotel room in the world) I took a good tour around London. Here's a nice shot from the top of the London Eye.

After my walk around town, I connected up with a few folks and went to find the conference venue, register and get the conference packet. The packet included a printed copy of the proceedings (W00T) which will give me many hours of reading pleasure. One of the neat things about ISMIR is that all of the papers are published on-line shortly after the conference. For instance, you can find all of the 2004 papers here.

Tomorrow things start in earnest... looking forward to some MIR fun.

Saturday Sep 10, 2005

In just a few minutes, I'll be heading out the door to catch a plane. I'm heading to London, where I'll be attending ISMIR, the International Conference on Music Information Retrieval. ISMIR is the primary venue for MIR researchers and practitioners. This will be my first trip to ISMIR and I'm looking forward to meeting many of the folks that I know of through emails, papers and journal articles. I'm anticipating a great week of learning, talking and getting to know people. Plus, after the conference I'll have over a hundred new papers to read and digest. Lots of fun.

Wednesday Sep 07, 2005

Nice article from Investor's Business Daily about some of the search projects going on here in the labs: Search Firms: Google, Yahoo ... And Sun?

Friday Aug 26, 2005

In the last few months, we've been working hard trying to improve our algorithms for determining acoustic and perceptual similarity of music. One aspect of this task is to choose a good set of features to extract from music. We've done a number of experiments with various feature sets to help us understand which ones work best for various tasks.

One major difficulty with these types of experiments is that feature extraction can be extremely CPU intensive. Lots of time-consuming DSP algorithms (MP3 decoding, FFTs, filtering, windowing, convolutions, DCTs) are used during feature extraction. A fast feature extractor can run at 0.1x real time; that is, it can process ten seconds of audio in one second. That seems pretty fast, but it still takes nearly three days to process our modest-sized test collection of 10,000 songs. And we don't just want to extract one feature set; we want to extract and experiment with twenty different kinds of feature sets. Plus, sooner or later we are going to want to scale this up to industrial-sized music collections. Extracting a single feature set for 2,000,000 songs could take 18 months of continuous processing.

Luckily, I work for a company like Sun, which has some pretty good computing resources. Sun is rolling out its Sun Grid, the $1/CPU-hr compute utility. With the grid, I should be able to take my feature extractor and distribute it over a collection of hundreds of CPUs to yield a 100x speedup. My 3-day feature extraction will run in about 45 minutes. The 18-month processing of 2,000,000 songs will take less than a week. This has the potential to really change how I work. I'll be able to try all sorts of experiments that would just take too long otherwise.
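The back-of-the-envelope numbers above check out. Here's the arithmetic (the ~4-minute song length is an assumption; the 0.1x extractor speed is the figure from above):

```python
# Sanity-checking the feature-extraction time estimates.
SECONDS_PER_SONG = 4 * 60   # assume ~4-minute songs
RT_FACTOR = 0.1             # extractor runs at 0.1x real time
CPUS = 100                  # grid CPUs for the hoped-for 100x speedup

def extract_days(songs, cpus=1):
    return songs * SECONDS_PER_SONG * RT_FACTOR / cpus / 86400

print(round(extract_days(10_000), 1))               # 2.8 days on one CPU
print(round(extract_days(10_000, CPUS) * 24 * 60))  # 40 minutes on the grid
print(round(extract_days(2_000_000) / 30, 1))       # 18.5 months on one CPU
print(round(extract_days(2_000_000, CPUS), 1))      # 5.6 days on the grid
```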

I'm really excited about this. With the grid, I'll be able to do all sorts of things that would otherwise be nearly impossible.

Thursday Aug 25, 2005

Lots of rumors that Amazon.com is getting ready to roll out its own music download and subscription service. It's nice to see more choice in this arena, but so far the digital music services all suffer from the same problems: DRM tied to a specific platform, overly restrictive DRM, and missing artists.

The day I can purchase a Beatles song from my Solaris laptop and transfer it to my iPod is still a long way off. Sigh.

Wednesday Jul 13, 2005

Interesting article about FreeTTS in The Star Online.

Tuesday Jun 21, 2005

There have been a number of rumors today that Google is getting ready to launch their own iTunes clone. The source of this rumor is this little blurb by Dave Winer: "I've been hearing rumors that Google is readying an iTunes-clone, based on RSS 2.0, and fully podcast-capable. Multiple sources on this one."

Monday Jun 13, 2005

Rumors are that Microsoft is planning to deploy a music subscription service similar to the one recently unveiled by Yahoo. The interesting thing, of course, is that all of the music subscription services that allow you to take your rented music on the road in a portable music player rely on Microsoft's PlaysForSure technology. So Microsoft will have to figure out how to offer a music subscription service without alienating its DRM customers at the same time. Hmmm ... it would be nice if there were some alternative portable DRM solutions out there ...

Robert Brewer has just released the first version of SpeechLion - a small speech recognition application based on Sphinx-4 which provides the ability to control your desktop via speech.

Thursday Jun 09, 2005

I just hooked up a Sun Ray in my home office. It was surprisingly easy to set up: just connect the supplied VPN router to the home router, connect the Sun Ray to the router, plug in the monitor, keyboard and mouse, and I was done. Some of the big surprises:

  • I have a 1920x1600 flat-panel monitor. The Sun Ray was able to recognize and drive the display at its full resolution.
  • The Sun Ray is very responsive. Editing, browsing, mouse clicking ... all work just as well as when I am working at the office.

So far, I'm very happy with the Sun Ray ... it saves me lots of time. I don't have to fool around with long VPN passwords, and my work session travels with me when I go from home to work and back home again. I just plug my badge into the Sun Ray wherever I am and my session appears.

Thursday Jun 02, 2005

Nifty article on Java Performance at OSNews. They cite the FreeTTS Performance Paper.

Friday May 27, 2005

Stephen Downie's M2K team has just released the 1.1 alpha version of M2K. M2K is a set of music-specific D2K modules that can be used by MIR researchers for development, prototyping and evaluation.

This release is quite a significant upgrade from the 1.0 release. It includes a full range of itineraries for the various tasks in the upcoming Music Information Retrieval Evaluation eXchange (MIREX).

Wednesday May 25, 2005

It just keeps on raining here in New Hampshire. No sun for such a long time. I feel a Google Poem coming on ...

It just keeps on raining

It just keeps on raining; pouring from the sky.

It just keeps on raining, watching life go by.

You can feel offended that it rains on your birthday party but the rain doesn’t care – it just keeps on raining.

But it just keeps on raining and I just keep complaining And I really wouldn't blame you If you didn't want me for your friend.

I feel like this cloud of depression is floating over me and it just keeps on raining.

Man, it just keeps on raining. Which makes me lazier. I have a busy work schedule leading up to Jamaica which means the hole I'm digging won't be as bad.

"But the weather is terrible, as it just keeps on raining!" protests virtually all of my friends. In contrast, I am hoping it would rain, ...

It just keeps on raining here. Everyday it rains. All weekend it rained. Each day I got up early, ran though the rain to the garage, and got in my car.

It's so good to see that after the drought we've been living with for many years. Now I hope it just keeps on raining this summer.

Tuesday May 24, 2005

This recent article on Slashdot called New Phone Service Promises to ID Songs talks about a phone-based service called 411-SONG. There's nothing new here; audio fingerprinting systems such as Shazam, Relatable and MusicBrainz have been doing this for a while.

The paper A Highly Robust Audio Fingerprinting System gives a good description of how an audio fingerprinting system works. Update: fixed the broken PDF link.
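For the curious, the core trick of that paper (Haitsma and Kalker's scheme) can be sketched: each audio frame is reduced to a 32-bit sub-fingerprint whose bits record whether the energy differences between adjacent frequency bands increased relative to the previous frame, and matching then reduces to Hamming distance on bit strings. The spectral 'energies' below are random stand-ins for real filterbank output.

```python
# Toy Haitsma-Kalker-style sub-fingerprints over fake band energies.
import random

def sub_fingerprints(energies):
    """energies: list of frames, each a list of 33 band energies."""
    prints = []
    for t in range(1, len(energies)):
        bits = 0
        for b in range(32):
            # bit b is set if the energy difference between band b and b+1
            # grew relative to the previous frame
            d_now = energies[t][b] - energies[t][b + 1]
            d_prev = energies[t - 1][b] - energies[t - 1][b + 1]
            bits = (bits << 1) | (1 if d_now - d_prev > 0 else 0)
        prints.append(bits)
    return prints

def hamming(a, b):
    return bin(a ^ b).count("1")

rng = random.Random(7)
frames = [[rng.random() for _ in range(33)] for _ in range(10)]
fp = sub_fingerprints(frames)
# a slightly degraded copy (simulating compression noise) still matches closely
noisy = [[e + rng.gauss(0, 0.01) for e in frame] for frame in frames]
ber = sum(hamming(a, b) for a, b in zip(fp, sub_fingerprints(noisy))) / (32 * len(fp))
print(round(ber, 2))  # low bit-error rate between original and degraded copy
```

The sign-of-differences encoding is what makes the fingerprint robust: mild noise rarely flips which of two band-energy differences is larger.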

This blog copyright 2010 by plamere