Yahoo Research's Malcolm Slaney presented a paper at ISMIR on Wednesday about how they are using user rating data to create song similarity data. Yahoo is in the enviable position of having billions of user-taste data points about music. This data, naturally, can be used to generate item-to-item similarities that would be extremely useful as ground truth for any number of MIR tasks. Malcolm's motivation for the talk was to propose an alternative to the rather time-consuming and painful process of human evaluation used in the music similarity task in MIREX. Malcolm presented a rather traditional item-to-item collaborative filtering system, with nothing new in the approach. I was hoping that at the end Malcolm would say that they are giving a big wad of the taste data or the item-to-item similarity data to the MIR community, but alas, Malcolm says that it is just too hard to give away such data, especially after the AOL shared data fiasco of last year.
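
For anyone who hasn't seen item-to-item collaborative filtering before, here is a minimal sketch of the general idea. To be clear, this is not Yahoo's system: the toy ratings, the song names, the `item_similarities` helper, and the choice of cosine similarity (with unrated songs treated as zero) are all my own assumptions for illustration.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(ratings):
    """Cosine similarity between items from {user: {item: rating}} data.

    Assumption: an item a user has not rated counts as 0 in that
    user's dimension. Returns {(item_a, item_b): similarity} with
    item_a < item_b, only for pairs rated together by at least one user.
    """
    dot = defaultdict(float)      # sum of rating products per item pair
    norm_sq = defaultdict(float)  # sum of squared ratings per item
    for user_ratings in ratings.values():
        items = sorted(user_ratings)
        for i, a in enumerate(items):
            norm_sq[a] += user_ratings[a] ** 2
            for b in items[i + 1:]:
                dot[(a, b)] += user_ratings[a] * user_ratings[b]

    sims = {}
    for (a, b), d in dot.items():
        denom = sqrt(norm_sq[a]) * sqrt(norm_sq[b])
        if denom > 0:  # skip items whose ratings are all zero
            sims[(a, b)] = d / denom
    return sims


# Hypothetical toy data: three users, three songs, ratings from 1 to 5.
ratings = {
    "u1": {"song_a": 5, "song_b": 5},
    "u2": {"song_a": 4, "song_b": 4, "song_c": 1},
    "u3": {"song_b": 1, "song_c": 5},
}

sims = item_similarities(ratings)
for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy data song_a and song_b are liked by the same users, so they come out as much more similar to each other than either is to song_c. A real system would also have to deal with rating bias, sparsity, and scale, none of which this sketch attempts.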

That's all fair. And now I understand your question about the not-so-similar results. Nobody, including me, has shown that item-to-item similarity forms a metric space. Any ideas?

As far as data goes... we have released a large dataset of music rating data. The dataset contains over 717 million ratings of 136 thousand songs given by 1.8 million users of Yahoo! Music services. The data was collected between 2002 and 2006. Each song in the dataset is accompanied by artist, album, and genre attributes. The users, songs, artists, and albums are represented by randomly assigned numeric IDs so that no identifying information is revealed. Alas, since the media IDs are randomized, there is no way to connect this to content.

Send email to me at [email protected] for more information.

- Malcolm

Posted by Malcolm Slaney on September 29, 2007 at 03:29 AM EDT #


This blog copyright 2010 by plamere