This week's Inside the Net podcast features an interview with Tim Westergren of
Pandora.
Pandora is a content-based music recommender/music streamer. In
the interview, Tim enthusiastically describes how Pandora and the
underlying Music Genome Project work. According to Tim, Pandora
employs about 40 musicians who spend their days labelling
songs with about 400 different attributes. It takes a trained musician
20 to 30 minutes to rate a song, and they are
currently adding about 7,000 songs a month. After six years,
Pandora has amassed labels for hundreds of thousands of songs and can do
a fine job of recommending music based upon similarity. However,
they do have difficulties with scale. They are only able to add a
very small fraction of new music each year and have to pick and choose
which songs get added to their database. They skip some genres
completely. For instance, there is no classical music in Pandora.
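
To put the scaling problem in numbers, here is a quick back-of-the-envelope calculation using the figures from the interview (40 musicians, 20 to 30 minutes per song, 7,000 songs a month). The 25-minute midpoint and the helper name `monthly_labelling_hours` are assumptions for illustration, and the sketch counts labelling time only.

```python
# Back-of-the-envelope workload for hand-labelling songs.
# Figures from the interview; the 25-minute midpoint is an assumption.
MUSICIANS = 40
MINUTES_PER_SONG = (20 + 30) / 2   # midpoint of the quoted 20-30 minute range
SONGS_PER_MONTH = 7_000


def monthly_labelling_hours(songs, minutes_per_song):
    """Hypothetical helper: total labelling hours needed for a month's songs."""
    return songs * minutes_per_song / 60


total_hours = monthly_labelling_hours(SONGS_PER_MONTH, MINUTES_PER_SONG)
hours_per_musician = total_hours / MUSICIANS
songs_per_year = SONGS_PER_MONTH * 12

print(f"{total_hours:.0f} labelling hours per month")         # 2917
print(f"{hours_per_musician:.0f} hours per musician per month")  # 73
print(f"{songs_per_year:,} songs per year")                    # 84,000
```

Labelling alone comes to roughly 2,900 hours a month, and at 7,000 songs a month the catalogue grows by only about 84,000 songs a year, which is why Pandora has to choose carefully what gets added.
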
One advantage that a content-based recommender like Pandora has
over the more traditional collaborative filtering models used in systems
like last.fm is that it is immune to the popularity bias found in
collaborative filtering systems. A content-based system is just as
likely to recommend a garage band as it is to recommend the Beatles,
because it only looks at how the music sounds, whereas a collaborative filtering
system that is based upon user listening patterns is much more likely to
ignore unknown bands: it doesn't recommend them because no
one is listening to them. Content-based recommenders can push
listeners into the long tail of music, finding unknowns that sound like music we already like.
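
The difference is easy to see in a toy example. Everything below is invented for illustration: the song names, the three attributes (Pandora uses about 400), and the listening histories. The collaborative side is a minimal item co-occurrence count, a deliberately simple stand-in and not the model last.fm or any other real system actually uses.

```python
import math
from collections import Counter

# Hypothetical hand-labelled attribute vectors, each scored 0..1:
# (guitar_distortion, vocal_harmony, tempo). All values are invented.
ATTRIBUTES = {
    "kinks_song":   (0.40, 0.80, 0.60),
    "beatles_song": (0.30, 0.90, 0.60),
    "garage_band":  (0.45, 0.80, 0.55),
    "death_metal":  (0.95, 0.05, 0.90),
}

# Hypothetical listening histories, one set of songs per user.
# Only one listener has ever played the garage band.
HISTORIES = [
    {"kinks_song", "beatles_song"},
    {"kinks_song", "beatles_song", "death_metal"},
    {"beatles_song", "death_metal"},
    {"kinks_song", "beatles_song"},
    {"garage_band"},
]


def cosine(a, b):
    """Cosine similarity between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def content_based(seed):
    """Rank songs by how similar they sound to the seed; play counts are never used."""
    others = [s for s in ATTRIBUTES if s != seed]
    return sorted(others,
                  key=lambda s: cosine(ATTRIBUTES[seed], ATTRIBUTES[s]),
                  reverse=True)


def item_cooccurrence(seed):
    """Minimal item-to-item collaborative filter: rank songs by how many users
    played them alongside the seed. Songs nobody co-played never appear."""
    counts = Counter()
    for history in HISTORIES:
        if seed in history:
            counts.update(history - {seed})
    return [song for song, _ in counts.most_common()]


print(content_based("kinks_song"))      # ['garage_band', 'beatles_song', 'death_metal']
print(item_cooccurrence("kinks_song"))  # ['beatles_song', 'death_metal']
```

The content-based ranking puts the sound-alike garage band first, while the co-occurrence ranking cannot surface it at all, because nobody who played the seed song ever played the garage band.
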
Unfortunately for Pandora, the scaling problem makes them less able
to push people into the long tail. Pandora has to pick and choose
which songs to process, and they start with the most popular songs,
so they really can't push people too far into the long tail
(yet). Despite these issues, Pandora is a really good way to explore
and discover new music. It is worth trying, and since it relies
on Flash, it runs on just about any platform out there.