Tuesday May 22, 2007

Social music web sites like last.fm and iLike are all the rage. They harness the wisdom of the crowds to make recommendations, using the Amazon-esque 'people who listen to X also listen to Y' style of recommendation. The core value of these social music sites is the taste data that they collect from all of their users. However, because this taste data is user-submitted, it is fairly easy for users to submit fake data. For instance, here's a little AppleScript from Doug's AppleScripts that sets the playcount of the first selected song in iTunes to 250,000:

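tell application "iTunes"
    -- grab the first track in the current iTunes selection
    set t to get item 1 of selection
    -- overwrite its play count with an absurdly large number
    set played count of t to 250000
end tell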

Many of the social recommender sites will sync up with the iTunes playcount data. So all I have to do is select 'Perfect Me', run the script, sync up with my favorite social music site, and I've become Deerhoof's #1 fan. People really do this: last.fm user Kikoin has 'played' Metallica songs over 128,000 times, and iLike user Shay K has 'played' the White Stripes over 472,000 times. Take a look at the playcounts for Shay K's top ten artists:

      472,744 The White Stripes
      337,880 The Beatles
      337,819 Beastie Boys
      270,334 Stephen Malkmus     
      270,211 Tom Petty & The Heartbreakers               
      256,433 Ween (on tour)
      202,665 The Cure
      137,006 The Polyphonic Spree
      135,455 Wilco (on tour)
      135,290 Vince Guaraldi Trio

Adding up the playcounts for Shay K's top 30 tracks, we get more than 5 million total plays. At 3 minutes a song, listening 24 hours a day, that's nearly 30 years of continuous music. Now imagine that Shay K's data goes into the iLike music recommender unfiltered: Shay K's synthetic taste could dominate the similarity model, and we'd end up with some rather surprising recommendations like "if you like the White Stripes, you may also like the Vince Guaraldi Trio".
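A quick check of that arithmetic, assuming 3 minutes per play as above and a 365.25-day year. The ten playcounts are the ones listed above; the 5 million figure is the post's own total for the top 30 tracks, so the code just plugs it in.

```python
# Playcounts for Shay K's top ten artists, copied from the list above.
top_ten = {
    "The White Stripes": 472_744,
    "The Beatles": 337_880,
    "Beastie Boys": 337_819,
    "Stephen Malkmus": 270_334,
    "Tom Petty & The Heartbreakers": 270_211,
    "Ween": 256_433,
    "The Cure": 202_665,
    "The Polyphonic Spree": 137_006,
    "Wilco": 135_455,
    "Vince Guaraldi Trio": 135_290,
}

MINUTES_PER_PLAY = 3                  # rough average song length used in the post
MINUTES_PER_YEAR = 60 * 24 * 365.25   # round-the-clock listening


def years_of_listening(plays: int, minutes_per_play: float = MINUTES_PER_PLAY) -> float:
    """Years of nonstop listening needed to rack up `plays` plays."""
    return plays * minutes_per_play / MINUTES_PER_YEAR


top_ten_total = sum(top_ten.values())
print(f"{top_ten_total:,} plays")                        # 2,555,837 plays
print(f"{years_of_listening(top_ten_total):.1f} years")  # 14.6 years
print(f"{years_of_listening(5_000_000):.1f} years")      # 28.5 years
```

Shay K's top ten artists alone account for almost 15 years of nonstop listening, and 5 million plays works out to roughly 28.5 years.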

If you are going to deploy a 'wisdom of the crowds' music recommender, you are going to have to work hard to separate the real taste data from the manufactured taste data, or else you will generate bad recommendations. Garbage in, bad recommendations out.
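One simple first line of defense is to reject any profile whose reported plays imply more listening time than could fit in an observation window. This is a sketch, not how last.fm or iLike actually filter their data; the `UserProfile` type, the 3-minute average, the 6-year window and the `max_fraction` threshold are all hypothetical choices. The test profile reuses two of Shay K's playcounts from above; the "typical listener" is made up.

```python
from dataclasses import dataclass, field

MINUTES_PER_PLAY = 3.0  # assumed average track length, as in the estimate above


@dataclass
class UserProfile:
    """Hypothetical record of one user's submitted taste data."""
    name: str
    playcounts: dict[str, int] = field(default_factory=dict)  # artist -> plays


def implied_hours(profile: UserProfile) -> float:
    """Hours of listening needed to produce the reported playcounts."""
    return sum(profile.playcounts.values()) * MINUTES_PER_PLAY / 60


def is_plausible(profile: UserProfile, window_hours: float,
                 max_fraction: float = 0.5) -> bool:
    """Reject profiles claiming more listening than max_fraction of the
    observation window (0.5 means 12 hours a day, every day)."""
    return implied_hours(profile) <= window_hours * max_fraction


def filter_profiles(profiles: list[UserProfile], window_hours: float,
                    max_fraction: float = 0.5) -> list[UserProfile]:
    """Keep only the profiles that pass the plausibility check."""
    return [p for p in profiles if is_plausible(p, window_hours, max_fraction)]


# Usage with a hypothetical 6-year observation window.
window = 6 * 365 * 24
shay = UserProfile("Shay K", {"The White Stripes": 472_744, "The Beatles": 337_880})
typical = UserProfile("typical listener", {"The White Stripes": 120, "Wilco": 85})
print([p.name for p in filter_profiles([shay, typical], window)])
```

This only catches blatant cases like the ones above; a user who inflates counts modestly would slip through, and catching that would take something more statistical.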


Monday May 21, 2007

Last week J. wrote about how building a music mashup is a great model: little startup risk, no liability, no licensing costs. Highlighting the downside of a business built on the services of others, this week Amazon, according to TechCrunch, has banned startup Zilo from collecting referral fees for Amazon products listed in the Zilo stores.

Last week I wrote about the trouble with resolving artist names. Here's another real-life example of the problem. El-Producto (aka EL-P) is, according to Wikipedia, an American rapper, producer and entrepreneur from NYC who has been a major driving force in alternative hip hop for over a decade. Last.fm recommends similar artists like Cannibal Ox, Company Flow, Mr. Lif, Aesop Rock, Cage and Vast Aire. All reasonable recommendations. Musicmobs yields artists like Murs, Cage and Company Flow. Again, reasonable. But iLike yields recommendations like Yes, Kansas, Genesis and Superior. These are not good recommendations. It looks like iLike is confusing EL-P the rapper with ELP the prog rock band.
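It's easy to see how this kind of mix-up can happen. I don't know what iLike's matcher does, but a common shortcut is to normalize names by lowercasing and stripping punctuation, and that maps 'EL-P' and 'ELP' to the same key. Here is a sketch of the collision and of one fix: an explicit alias table that points each spelling at a distinct artist ID (the IDs are made up).

```python
import re
from typing import Optional


def naive_key(name: str) -> str:
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# With this normalization, two very different artists collide.
print(naive_key("EL-P"), naive_key("ELP"))  # elp elp

# Hypothetical alias table: each spelling points at a distinct artist ID,
# so a collision in the normalized form no longer merges the two artists.
ALIASES = {
    "el-p": "artist/el-producto",
    "el-producto": "artist/el-producto",
    "elp": "artist/emerson-lake-palmer",
    "emerson, lake & palmer": "artist/emerson-lake-palmer",
    "emerson lake and palmer": "artist/emerson-lake-palmer",
}


def resolve(name: str) -> Optional[str]:
    """Map a spelling to an artist ID, or None if the spelling is unknown."""
    return ALIASES.get(name.strip().lower())


print(resolve("EL-P"), resolve("ELP"))
```

Unknown spellings come back as `None`, so they can be queued for review instead of being guessed at.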

It's lucky for many of the music recommender sites that it is really hard to objectively evaluate how well a recommender works; otherwise, I think some of the high-profile, well-funded sites would be embarrassed when compared to some of the smaller, scrappier sites.
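Agreement between sites isn't the same thing as accuracy, but it is a cheap sanity check: a list that shares nothing with anyone else's deserves a closer look. Here is a sketch using Jaccard overlap (an illustrative choice, not a standard recommender evaluation) on the EL-P lists mentioned above.

```python
def jaccard(a: list[str], b: list[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty lists agree trivially
    return len(sa & sb) / len(sa | sb)


# Similar-artist lists for EL-P, as reported above.
lastfm = ["Cannibal Ox", "Company Flow", "Mr. Lif", "Aesop Rock", "Cage", "Vast Aire"]
musicmobs = ["Murs", "Cage", "Company Flow"]
ilike = ["Yes", "Kansas", "Genesis", "Superior"]

print(round(jaccard(lastfm, musicmobs), 2))               # 0.29
print(jaccard(lastfm, ilike), jaccard(musicmobs, ilike))  # 0.0 0.0
```

Last.fm and Musicmobs share two artists (Company Flow and Cage); iLike's list shares none with either.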

Last year I tried out iLike to see how well it worked. I didn't like their 'similar artists' recommendations very much. I figured it was a cold-start problem: since they were a new site, they didn't have enough data to make good recommendations. I tried iLike again this morning to see how much they had improved after 6 months of data. I was surprised to find that I was getting the exact same recommendations as I got back in October. It is as if they have not updated their similarity data since they launched. Some of these recommendations are horrendous. Type in Emerson, Lake & Palmer and the related artists are:

  • Elmo and Patsy
  • Jose Feliciano
  • Vince Guaraldi Trio
  • Brenda Lee
  • Holiday Express
  • Bing Crosby
  • Burl Ives
  • Trans-Siberian Orchestra

iLike still thinks that the most distinctive feature of ELP is their one Christmas song.

Even more telling are the recommendations for Rodrigo y Gabriela. Even though Rod and Gab have been played 20,000 times by iLike users, iLike can't offer any recommendations at all. My guess is that when they built their artist similarity model back in October, Rod and Gab weren't on the map yet, so they had no data. But 6 months later, Rod and Gab are very popular, so iLike should have no trouble generating recommendations for them. In their FAQ, iLike says:

How do you pick which artists are "related"? We use computer algorithms that figure this out using the "wisdom of crowds", i.e. "people who listen to this artist also listen to these other artists". The more people use the iLike Sidebar for iTunes, the better our computer algorithms will get at figuring out which artists are related.

But this doesn't seem to be happening.  Perhaps iLike doesn't see value in generating good recommendations, and just hasn't bothered to push all of their social data into their recommendation engine.
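The FAQ describes item-to-item co-occurrence: artists are related when the same people listen to both. A toy sketch (not iLike's actual algorithm, and with made-up listening histories) shows why a model built once from an October snapshot has nothing to say about an artist who got popular later, and why rebuilding it from current data fixes that.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_similarity(histories: list[list[str]]) -> dict[str, Counter]:
    """For every pair of artists, count the listeners who played both."""
    co: dict[str, Counter] = defaultdict(Counter)
    for artists in histories:
        for a, b in combinations(sorted(set(artists)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def related(co: dict[str, Counter], artist: str, n: int = 3) -> list[str]:
    """The n artists most often co-listened with `artist` (empty if unknown)."""
    return [other for other, _ in co.get(artist, Counter()).most_common(n)]


# Made-up listening histories, one list of artists per listener.
october = [
    ["ELP", "Yes", "Genesis"],
    ["ELP", "Yes", "Kansas"],
    ["Yes", "Genesis"],
]
newer_listeners = [
    ["Rodrigo y Gabriela", "Artist X"],
    ["Rodrigo y Gabriela", "Artist X", "Artist Y"],
]

stale = build_similarity(october)                    # built once, never refreshed
fresh = build_similarity(october + newer_listeners)  # rebuilt from current data
print(related(stale, "Rodrigo y Gabriela"))  # []
print(related(fresh, "Rodrigo y Gabriela"))  # ['Artist X', 'Artist Y']
```

Raw co-occurrence counts favor popular artists, so a real system would normalize them (for example with cosine similarity), but the staleness problem is the same either way.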

Saturday May 19, 2007

Note that the deadline for submission to Recommender Systems 2007 has been extended to June 3 for long papers and June 10 for short papers.

Friday May 18, 2007

Moody looks pretty neat. You tag your personal collection by song mood on a colorful Thayer mood scale, and you can play songs back based on their mood. What puzzles me, though, is that they don't seem to aggregate the moods of all of their users so that people can share their mood tags. It seems like the obvious thing to do. If they don't do it, maybe Owen will.
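Pooling everyone's tags is straightforward if each tag is a point on Thayer's two axes (energy and stress). This is a sketch that assumes a hypothetical tag tuple of (user, song, energy, stress) with both axes scaled to [-1, 1]; I don't know how Moody stores its tags internally. Averaging per song and keeping the vote count would let a client show the crowd's mood next to a user's own.

```python
from collections import defaultdict
from statistics import mean


def aggregate_moods(tags):
    """Pool (user, song, energy, stress) tags into one crowd mood per song.

    Returns {song: (mean_energy, mean_stress, number_of_tags)}.
    """
    by_song = defaultdict(list)
    for _user, song, energy, stress in tags:
        by_song[song].append((energy, stress))
    return {
        song: (mean(e for e, _ in points), mean(s for _, s in points), len(points))
        for song, points in by_song.items()
    }


# Made-up tags from three users on two songs.
tags = [
    ("ann", "Song A", 0.5, -0.25),
    ("bob", "Song A", 0.25, 0.25),
    ("cat", "Song B", -0.5, 0.75),
]
print(aggregate_moods(tags))
```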

This post on boingboing highlights the difficulty of identifying a song when you have no other information about it. There are technologies such as MusicDNS and Shazam that are designed to help with song identification, but they don't always work. iden.tify.us takes a wisdom-of-the-crowds approach. Trying to identify a song? Upload a clip to iden.tify.us, type in a lyric, or just describe the song, and let others help identify it. iden.tify.us is a brand new site (beta 0.1) with a small number of users, so the term 'wisdom of the crowds' may be relying on the "two's company, three's a crowd" definition of 'crowd'. iden.tify.us is trying some novel ways to expand its user base. Frequent song identifiers can get referral bonuses, and the site makes RSS and podcast feeds available for its content, allowing you to take song identification on the road (but would you really want to?).

I don't see this becoming the next digg, but it might be popular enough to attract the crowd it needs to be useful. (And yes, I hate the del.icio.us-style names; I can never remember where to put the dots.) (via toby)

The recent press release from Amazon about their soon-to-open music store indicated that they had deals with 12,000 record labels. That seemed like an incredible number of labels, stretching the bounds of believability, and it left me wondering how many music labels there are. A web search leads to this large list of record labels on Wikipedia. There's also a pointer to AllRecordLabels, which claims to have links to 23,000 (!) record labels and net labels. That's a whole lotta labels.

Thursday May 17, 2007

My new favorite blog is Save the Robot. This is the work blog of Chris Dahlen, who writes about popular culture, music and gaming. Chris is remembered for his Pitchfork review of Travistan, where his 0.0 rating sent the CD straight to the seconds bin. In his blog, Chris writes about things I want to read about: Hermione/Cho, The Office, quirky, attractive indie pop girrls, parenting, and especially the tension between old media and new. Good stuff, and the best part is that Chris is a real writer, so he can turn a phrase (Steve particularly liked the story about the 'swelling Harry Potter fan fiction community').

Wednesday May 16, 2007

Hot off the press: Amazon has announced that they will be launching a DRM-free music store with music from 12,000 labels, including EMI. Back in January I wrote about why I think Amazon could be a big deal in digital music. Two things that Amazon could bring to digital music:

  • Discovery - Amazon's focus on discovery makes Amazon a much better online bookstore than any other bookstore. They use all sorts of ways to connect a reader with a book: collaborative filtering, book reviews, customer lists, content search, best-seller lists, special deals. These techniques help get their readers deep into the long tail of books. Discovery is in Amazon's genes. When they start selling digital music, you can bet that they will have the same focus on discovery and give listeners new and interesting ways to find music in the long tail. A listener may come to Amazon to pick up the latest U2 track, but may find themselves happily downloading a track by an obscure artist. This is good for the listener, who will be exposed to a larger variety of music, and it is good for the long tail artists.
  • Metadata - Amazon has a great set of web services built around their data. Using Amazon's web services, one can get access to book descriptions, book cover images, reviews and pricing information; just about any piece of data in Amazon's database is exposed via their web services. Exposing their data in this fashion places Amazon at the center of the online literary ecosystem. Any startup company that wants to be in a book-related business will use Amazon's API because it is easy, the data is of high quality, and it is free. This is good for the startup, and even better for Amazon, since all of those startups end up sending their customers to Amazon. Amazon is already a big part of the music ecosystem. They already have lots of data for music CDs available via their web APIs, and they are probably the largest supplier of album art on the web. The Amazon part number, the ASIN, is used throughout the web as an unambiguous identifier for an album. Once Amazon starts to sell individual tracks, I would expect Amazon to create an ASIN or an equivalent for each track in their database. This track-level identifier may become the primary way of identifying tracks in the music world, since Amazon makes it so easy to get all of the information about an item once you have the ASIN. This could be a key enabler for the next generation of music: a ubiquitous song ID tied to deep metadata.
The store is set to open later this year ... I am very interested in seeing what this does to the world of digital music and music discovery.
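Here is a sketch of why a shared track ID matters so much: once every service keys its data on the same identifier, combining them is a simple merge instead of fuzzy matching on artist and title strings. The ID, services and field names below are hypothetical; this is not Amazon's API.

```python
from typing import Any

Record = dict[str, Any]


def merge_by_track_id(*sources: dict[str, Record]) -> dict[str, Record]:
    """Combine per-track records from several services keyed by one shared ID.

    Fields are unioned; when two sources disagree, the earlier source wins.
    """
    merged: dict[str, Record] = {}
    for source in sources:
        for track_id, record in source.items():
            entry = merged.setdefault(track_id, {})
            for key, value in record.items():
                entry.setdefault(key, value)
    return merged


# Hypothetical services that all key on the same (made-up) track ID.
store = {"TRACK-0001": {"artist": "Deerhoof", "title": "Perfect Me", "price": 0.99}}
art = {"TRACK-0001": {"cover_url": "https://example.com/cover.jpg"}}
tags = {"TRACK-0001": {"tags": ["indie", "noise pop"], "title": "perfect me"}}

print(merge_by_track_id(store, art, tags)["TRACK-0001"])
```

Putting the catalog source first means its spelling of the title wins over the lowercase one from the tagging service.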


Some music highlights from JavaOne 2007:

  • 'Real' bumper music - In previous years, the music at the keynotes was canned, non-commercial music, the type of music you could buy a license for and play forever. I can still remember the trombone solo in one of the frequently played tracks. This year, they played 'real' music, that is, commercial music: songs by OK Go and U2 with Mary J. Blige, songs that most people would recognize. Same for the tech sessions; the real music was nice. I did notice that in the keynote webcasts I've watched, they've replaced the commercial music with the canned music. Sigh... such is the state of music licensing on the web.
  • DJ Anon - The opening session started with electronic music by DJ Anon, which set a good vibe for the day.
  • Tech Sessions - There were only two tech sessions directly related to music: a talk on the fascinating JFugue, and my talk on Search Inside the Music.
  • Mini music bof - A bunch of us that have an interest in music and Java went out for dinner one evening.  Next year, we should have a real BOF around music and Java.
  • School of Rock - One of the music highlights of JavaOne came during the walk back to the hotel one evening. There was a stage set up in the middle of Union Square, and a bunch of kids were playing music. I stopped and watched 3 bands play. These kids were remarkable, playing some extremely difficult songs (like Zappa). The kids were from the School of Rock. Awesome stuff.

Robert Kaye blogs that MusicBrainz is now running on the new Sun server.  Robert says "I'm pleased to announce that the new server is largely bored with the traffic we're throwing at it right now -- which is exactly what I had hoped for."  Awesome.

Tuesday May 15, 2007

Mike Love posted this on his blog:

[James Vasile]  proposed separating the recommendation from distribution of the actual song - either by referencing an audio fingerprint or some other unique id (CDDB?). Then each listener can choose a hierarchy of how they would like to find the audio: find the mp3s on the web, last.fm, amazon samples, purchase the songs, etc. This would involve the creation of some standardized XML style format for playlists, and we talked about how Songbird seems like a good open platform for receiving these playlists and then using a diversity of networks to find the audio or at least a sample.  (I’m hoping that somehow last.fm, amazon or itunes will make their samples more portable to benefit from click-through - maybe this is impossible or unlikely)
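The listener-chosen hierarchy James Vasile proposes maps naturally onto a chain of resolvers tried in order. This is a sketch with hypothetical sources and song IDs standing in for real services; a real resolver would query the web, a store or a streaming service instead of a dictionary.

```python
from typing import Callable, Optional

# A resolver takes a song ID and returns a playable URL, or None.
Resolver = Callable[[str], Optional[str]]


def resolve_audio(song_id: str,
                  resolvers: list[tuple[str, Resolver]]) -> Optional[tuple[str, str]]:
    """Try each (name, resolver) in the listener's preferred order and
    return the first hit as (source name, URL), or None if nothing matches."""
    for name, resolver in resolvers:
        url = resolver(song_id)
        if url is not None:
            return name, url
    return None


# Made-up sources standing in for "mp3s on the web", "store samples", etc.
web_mp3s: dict[str, str] = {}
store_samples = {"song-42": "https://example.com/samples/song-42.mp3"}

preferences = [("web mp3", web_mp3s.get), ("store sample", store_samples.get)]
print(resolve_audio("song-42", preferences))
print(resolve_audio("song-99", preferences))  # None
```

Reordering `preferences` is all it takes for a different listener to prefer, say, purchased tracks over free samples.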

I replied with this comment, but I'm not sure if it took, so I copied it here because lots of copies keeps stuff safe.

My reply ...

The technologies that you suggest are necessary to promote a music recommendation ecosystem are spot on. We need a universal song ID and a standard playlist format. The latter already exists: there is an XML playlist format called XSPF (pronounced 'spiff') that captures all the information needed for portable playlists. It is the product of many smart people (including Lucas Gonze from WebJay and Robert Kaye from MusicBrainz). Many folks are working on tools, APIs and players that generate, play and resolve XSPF playlists. I think XSPF has the best chance of becoming *the* universal playlist format. As for song ID, there are many commercial audio fingerprinting systems out there that can derive a unique (or nearly unique) ID based solely on the audio. The problem, however, is that they all cost money to license, and because of that no system has become the standard (de facto or otherwise). The MusicDNS system probably has the best chance, since it is very low cost (essentially free for all but the biggest users), and it ties in with the public domain music metadata being created by the MusicBrainz folks. Still, the problem with a song ID system is that unless it is universally used, it is not very useful. Companies like Apple have little incentive to use such a system, since they already own the market; they'd rather not make it easier for others to work in this space. My hope has been that a company like Amazon would come along, adopt these standard formats (you can read more about this in this post) and make a recommendation ecosystem possible. It hasn't happened yet.
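For a sense of what an XSPF playlist looks like, here is a small Python sketch that builds one with the standard library. The `playlist`, `trackList` and `track` elements, the `version="1"` attribute and the namespace come from the XSPF version 1 spec; the track data and the `urn:example:` identifier are made up. The second track carries only an `identifier` and no `location`, leaving it to a player to resolve it to playable audio.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"
# Serialize XSPF elements with a default namespace instead of an "ns0:" prefix.
# (This registration is global to the ElementTree module.)
ET.register_namespace("", XSPF_NS)

# A subset of the child elements XSPF allows inside <track>.
TRACK_FIELDS = ("location", "identifier", "title", "creator", "album")


def q(tag: str) -> str:
    """Qualify a tag name with the XSPF namespace."""
    return f"{{{XSPF_NS}}}{tag}"


def make_xspf(title: str, tracks: list[dict[str, str]]) -> str:
    """Build an XSPF playlist document from simple track dictionaries."""
    playlist = ET.Element(q("playlist"), version="1")
    ET.SubElement(playlist, q("title")).text = title
    track_list = ET.SubElement(playlist, q("trackList"))
    for t in tracks:
        track = ET.SubElement(track_list, q("track"))
        for name in TRACK_FIELDS:
            if name in t:
                ET.SubElement(track, q(name)).text = t[name]
    return ET.tostring(playlist, encoding="unicode")


print(make_xspf("Portable mix", [
    {"location": "https://example.com/audio/perfect-me.mp3",
     "creator": "Deerhoof", "title": "Perfect Me"},
    {"identifier": "urn:example:fingerprint:1234",
     "creator": "Rodrigo y Gabriela", "title": "Tamacun"},
]))
```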

Starting  tomorrow, Pandora will no longer stream audio to Canada.  From the Pandora blog

Right before JavaOne, I ordered a few books from Amazon to read on the airplane. Unfortunately, they arrived too late for my trip to SFO, so I was left reading an old novel. But now my reading queue is filled with three books, which should keep me reading for a while. On the list:


This blog copyright 2010 by plamere