GIBRO - Garbage In, Bad Recommendations Out
Social music web sites like last.fm and iLike are all the rage. They harness the wisdom of the crowds to make recommendations, using the Amazon-esque 'people who listen to X also listen to Y' style of recommendation. The core value of these social music sites is the taste data that they collect on all of their users. However, because this taste data is user-submitted, it is fairly easy for users to submit fake data. For instance, here's a little AppleScript from Doug's Applescripts that sets the playcount of the first selected song in iTunes to 250,000:
tell application "iTunes"
	-- t is the first track in the current iTunes selection
	set t to get item 1 of selection
	set played count of t to 250000
end tell
Many of the social recommender sites will sync up with the iTunes playcount data. So all I have to do is select 'Perfect Me', run the script, sync up with my favorite social music site and I've become Deerhoof's #1 fan. People really do this: last.fm user Kikoin has 'played' Metallica songs over 128,000 times. iLike user Shay K has 'played' the White Stripes over 472,000 times. Take a look at the playcounts for Shay K's top ten artists:
472,744 The White Stripes
337,880 The Beatles
337,819 Beastie Boys
270,334 Stephen Malkmus
270,211 Tom Petty & The Heartbreakers
256,433 Ween (on tour)
202,665 The Cure
137,006 The Polyphonic Spree
135,455 Wilco (on tour)
135,290 Vince Guaraldi Trio
Adding up the playcounts for Shay K's top 30 tracks, we get more than 5 million plays. At 3 minutes a song, that works out to about 30 years of continuous music, even if Shay K listened 24 hours a day. Imagine now that Shay K's data goes into the iLike music recommender unfiltered - Shay K's synthetic taste could dominate the similarity model and we'd end up with some rather surprising recommendations like "if you like the White Stripes, you may also like the Vince Guaraldi Trio".
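The listening-time estimate is easy to check. Here's a rough sketch in Python: the 5 million plays and the 3 minutes per song come from the figures above, while the function name is just made up for illustration.

```python
MINUTES_PER_YEAR = 60 * 24 * 365


def years_of_continuous_listening(total_plays, minutes_per_song=3.0):
    """Return the years needed to play `total_plays` songs back to back."""
    return total_plays * minutes_per_song / MINUTES_PER_YEAR


# 5 million plays at 3 minutes each, with no breaks at all
years = years_of_continuous_listening(5_000_000)
print(f"{years:.1f} years of non-stop listening")
```

That comes out a little under the 30 years quoted above, which is close enough to make the point: nobody has listened to that much music.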
If you are going to deploy a 'wisdom of the crowds' music recommender, you are going to have to work hard to separate the real taste data from the manufactured taste data, or else you will generate bad recommendations. Garbage In - Bad Recommendations Out.
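To make the idea concrete, here is one very simple kind of filter, sketched in Python. Everything in it is an assumption for illustration: the toy playcounts (apart from the two Shay K numbers quoted above), the account ages, the function names, and the crude co-play weighting. It is not how iLike or last.fm actually build their models. The filter just drops any user whose total playcount is more than they could physically have played since their account was created.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical taste data: user -> {artist: playcount}
plays = {
    "shay_k": {"The White Stripes": 472_744, "Vince Guaraldi Trio": 135_290},
    "user_a": {"The White Stripes": 120, "The Beatles": 80},
    "user_b": {"The White Stripes": 200, "The Beatles": 150},
    "user_c": {"Vince Guaraldi Trio": 40, "The Beatles": 60},
}

# Hypothetical account ages in days
account_days = {"shay_k": 365, "user_a": 365, "user_b": 365, "user_c": 365}


def max_plausible_plays(days, minutes_per_song=3.0):
    """Most songs anyone could play in `days` of round-the-clock listening."""
    return int(days * 24 * 60 / minutes_per_song)


def drop_implausible_users(plays, account_days):
    """Keep only users whose total playcount is physically possible."""
    return {
        user: artists
        for user, artists in plays.items()
        if sum(artists.values()) <= max_plausible_plays(account_days[user])
    }


def co_play_weights(plays):
    """Crude 'people who listen to X also listen to Y' weights.

    Each user adds min(plays of X, plays of Y) to the (X, Y) pair.
    """
    weights = defaultdict(int)
    for artists in plays.values():
        for a, b in combinations(sorted(artists), 2):
            weights[(a, b)] += min(artists[a], artists[b])
    return dict(weights)


raw = co_play_weights(plays)
filtered = co_play_weights(drop_implausible_users(plays, account_days))

for label, weights in (("unfiltered", raw), ("filtered", filtered)):
    print(label)
    for pair, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"  {weight:>7}  {pair[0]} + {pair[1]}")
```

With the unfiltered data, the White Stripes / Vince Guaraldi Trio pair swamps everything else on the strength of a single user. With the filter, that pair disappears and the weights reflect the other listeners. A real system would need something far less blunt, since a determined shill can simply stay under any fixed cap.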
John Riedl at GroupLens has done some interesting work on this issue, including his 2004 paper, "Shilling recommender systems for fun and profit".
I think it is worth pointing out that the spam problem is much worse for "winner take all" systems than for recommender systems. If you manage to get to the top of Digg or Google search results, you get huge amounts of traffic. If you manage to manipulate a recommender system, you will likely be seen by only a small fraction of the audience.
Posted by Greg Linden on May 22, 2007 at 12:09 PM EDT #
Posted by elias on May 22, 2007 at 05:26 PM EDT #