Improving social tagging

There's a nifty Flash demo showing how the iTunes/Qloud plugin supports tagging of songs. It is quite nice how the Qloud tagger is integrated directly into iTunes. You can easily tag your songs, and then use those tags to generate good playlists. The tags that you use to organize your local collection are also sent back to Qloud, where they are aggregated and used to assist music discovery. Funny thing, though: in the tagging example they show a user asking for songs that are tagged with 'love' to get a set of romantic songs. However, tagging is a somewhat fickle process. The user may get romantic songs, but the user may also get songs that were tagged 'love' because the tagger loved the song, not because it was a perfect first-date song. So mixed in with the heart-melters may be a face-melter, with the result that your romantic candlelight dinner may be interrupted by the wails of Eddie Van Halen.

That good-looking Search Guy, Steve Green, will tell you that these social tagging systems just need to get a little smarter. With a little clustering-goodness, a tagging system such as Qloud should be able to notice that there are two types of 'love'-tagged songs: those associated with romance, and those associated with user preference. Then, if you build a playlist using an ambiguous tag, the system could suggest a less ambiguous tag (such as 'romantic' or 'favorite').
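
Here is a rough sketch of the kind of clustering I have in mind. It is only an illustration, not how Qloud (or anyone else) actually does it: the function name tag_senses, the min_cooccur threshold, and the sample songs are all made up. The idea is to look at the other tags that show up on 'love'-tagged songs, link two of those tags when they appear together often enough, and treat each connected group of tags as one sense of 'love'.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tag_senses(songs, tag, min_cooccur=2):
    """Group the tags that co-occur with `tag` into clusters ("senses").

    songs: dict mapping song title -> set of tags (a made-up data layout).
    Returns a list of clusters; each cluster lists its co-tags with the
    most frequent one first, a candidate for a less ambiguous tag.
    """
    # Keep only songs carrying the ambiguous tag, minus the tag itself.
    cotag_sets = [tags - {tag} for tags in songs.values() if tag in tags]

    # Count how often each co-tag and each pair of co-tags appears.
    tag_counts = Counter(t for s in cotag_sets for t in s)
    pair_counts = Counter(
        pair for s in cotag_sets for pair in combinations(sorted(s), 2)
    )

    # Link two co-tags when they appear together often enough.
    neighbors = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Each connected component of the link graph is one sense of the tag.
    clusters, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        component.sort(key=lambda t: (-tag_counts[t], t))
        clusters.append(component)

    # Largest senses first; ties broken alphabetically for stable output.
    clusters.sort(key=lambda c: (-len(c), c))
    return clusters


# Invented example data: three slow-dance songs and three guitar workouts.
songs = {
    "Song A": {"love", "romantic", "slow", "ballad"},
    "Song B": {"love", "romantic", "ballad"},
    "Song C": {"love", "romantic", "slow"},
    "Song D": {"love", "favorite", "guitar"},
    "Song E": {"love", "favorite", "guitar", "loud"},
    "Song F": {"love", "favorite", "loud"},
}

# One suggested replacement tag per sense of 'love'.
suggestions = [cluster[0] for cluster in tag_senses(songs, "love")]
# suggestions == ['favorite', 'romantic']
```

Connected components are about the crudest clustering you could pick, and on real, noisy tag data you would want something more robust, but it is enough to show how 'love' can split into a 'romantic' sense and a 'favorite' sense.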

Flickr is starting to do this. You can explore the clusters associated with a tag. For instance, Explore / Tags / apple / clusters will show the clusters associated with the word 'apple'. There are four main 'apple' clusters: pictures of Macintosh computers, pictures of the fruit, screenshots of OS X, and pictures of New York City (including one 'double-apple' showing a picture of the new Apple store in Manhattan).

Music discovery systems such as last.fm and Qloud need to start doing the same thing: cluster the tags and help us disambiguate them. This will give us more understandable results when we start to use the tags for generating playlists.

Comments:

It seems like you could take advantage of recent work in statistical (text) topic modeling to do something like this. I'm thinking in particular of LDA, Latent Dirichlet Allocation. (This is more than just document or tag clustering.) If you applied something like this to the sets of tags on a music collection, you might indeed learn that there are two senses of the word "love", and could generate your playlists accordingly. However, the big gain here would come from not limiting yourself to text/tag features, but integrating some musicological processing of the raw audio features, too, to assist with the topic modeling. I'd really like to see a company start doing something like that.

Posted by Jeremy P on October 23, 2006 at 03:19 PM EDT #

paul--

i'm a little late on getting back to you on this. anyhow . . .

regarding the quality of the data we collect, we think that the dynamic goes like this: users tag primarily for their own benefit, to better organize their collections, whether that be with bookmarks (delicious) or, in our case, their music libraries.

this is why having our tagging occur inside iTunes is so key. users tag there for their own benefit . . . the impact on our service happens as a secondary consequence. when adding a tag directly benefits the individual, the quantity and quality of the tags is greatly enhanced. and tags, while not the only dimension we use in our search, are the most important.

and yes, we see clustering as very important and plan on doing it too.

on your 30 second clip question . . . again, users tag for us while they're listening in their own library. the quality of our data coming in is not affected by the length of our clips. nevertheless, we think it could be a better discovery experience for users to also have access to a full length streaming version of each song. we're thinking of adding that too.

thanks again for blogging about us. we appreciate the discussion!

Posted by Toby on October 24, 2006 at 01:30 PM EDT #

Duke Listens!: Visit my main blog at MusicMachinery.com
