Music Recommender Turing Test Results
The results for the Music Recommender Turing Test
challenge are in! In this challenge, I posed the question: "Can
you tell which music recommendation was generated by a human, and which
was generated by a machine?" Two recommendation lists were
generated in response to the question "If you like Miles Davis, you
might like?" One list was generated by a professional music
critic, the other by an algorithm developed here in the labs. The
lists generated were:
Recommendation List A:
If you like "Miles Davis" you might like:
- John Coltrane
- Duke Ellington
- Thelonious Monk
- Charlie Parker
- Herbie Hancock
- Bill Evans
- Charles Mingus
- Sonny Rollins
- Wayne Shorter
- Weather Report
Recommendation List B:
If you like "Miles Davis" you might like:
- John Coltrane
- Weather Report
- Herbie Hancock
- Wayne Shorter
- Charlie Parker
- Dave Douglas
- Chet Baker
- Tony Williams Lifetime
- Can
- Sly & the Family Stone
The challenge received 29 responses, of which 19 included direct predictions (the others opined about the quality of the recommendations or the difficulty of the challenge). The 19 predictions were quite skewed: 15 respondents thought that list A was created by a human, while 4 thought that list B was created by a human. Some respondents were quite sure of their predictions:
"Well, this is too easy... A = human, B = machine. No doubt in my mind."
Others were less sure:
"I bet B is machine generated, although it's a tough one to call"
One interesting thing is that although the clear majority thought that list A was created by a human, those that indicated a preference, seemed to prefer list B. One commenter said "B has differences that wouldn't show up in a human's list because a human will stick to the genre. That said I think I like B more."
The Results
To summarize: 15 indicate that list A is a human, while 4
indicate that list B is a human. The actual sources of the
lists are:
- List A - a machine - a collaborative filtering system developed in Sun Labs based upon last.fm listener data
- List B - a human - a professional music critic named Dominique.
About 79% of the respondents who made a prediction (15 of 19) got the answer wrong.
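For the curious, here is a quick back-of-the-envelope check of those numbers, as a minimal Python sketch. The counts (19 predictions, 15 of them picking list A as the human) come from the responses above. The coin-flip model and the helper name `binomial_tail` are assumptions made purely for illustration; they were not part of the challenge itself.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


predictions = 19        # responses that made a direct prediction
picked_a_as_human = 15  # wrong: list A was the machine
picked_b_as_human = 4   # correct: list B was the critic

# Share of predictors who got the answer wrong.
wrong_share = picked_a_as_human / predictions

# Assumption: if everyone were guessing by coin flip, how often would
# 15 or more out of 19 land on list A?
tail = binomial_tail(picked_a_as_human, predictions)

print(f"{wrong_share:.1%} of predictions were wrong")
print(f"P(15 or more of 19 pick A by chance) = {tail:.4f}")
```

Under that coin-flip assumption, a split this lopsided would be unusual, which fits the impression that respondents were not guessing at random. With only 19 casual predictions, though, it is suggestive at best.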
BTW, Dominique, the music critic, responded to the comments as follows: "yes, I
am most certainly a robot! I like the guy who commented about
miles' different eras -- one of the hardest parts about recommending
based on him is trying to figure out what aspect/period of miles davis
people are responding to."
This was a really fun exercise for me. The resulting dialog provided many insights into what it takes to make a good recommendation. I find the results quite surprising: a rather traditional recommendation algorithm can mimic a human well enough to convince a significant majority that it is not a machine. Perhaps this bodes well for machine recommendation. (Of course, this experiment is too small and too casual to support any long-term conclusions.)
And the Winner is ...
Of the 4
correct answers, I selected one at random to receive my extra copy of
"Net, Blogs and Rock 'n' Roll". The winner could have been David
Jennings (the author of the book), which would have been an interesting
twist, but the coin wavered and sent the winning book to
JuniorBonner. Congrats, Junior.
cool
first time I've won anything for ages
thanks a lot
Posted by juniorbonner on October 06, 2007 at 08:34 AM EDT #
Phew, close shave! I've seen about as much of that book (http://www.netblogsrocknroll.com/2007/08/dont-you-wish-y.html) as I can tolerate... <wink>
I seem to remember I said my answer was a guess. Now that I see it was right, I realise that it was a matter of tacit subconscious inner wisdom.
But to be semi-serious for a moment, Paul, your post doesn't make explicit the obvious conclusion that you at Sun have devised a computer that passes the Turing Test. When can we expect the press releases, the Nobel prizes and all that follows from that?
Posted by David Jennings on October 06, 2007 at 11:05 AM EDT #
Very fun. I think it is interesting that "those that indicated a preference, seemed to prefer list B"... does that mean that there is something innate about a "Human" recommendation that bubbles under the surface and speaks to us more clearly?
Liking the performers in 'List A' better and welcoming our robot overlords,
Zac
Posted by Zac on October 06, 2007 at 04:29 PM EDT #
I think it is awesome that list B (the human list) had Can in it. That's what convinced me that B was human. Hardly anyone knows about Can, so a traditional recommendation engine wouldn't see it.
I think the problem with most recommendation engines is that they don't capture passion. A good recommendation should be someone you don't already know about, otherwise, what's the point? Since most recommendation engines are based primarily on popularity, they will usually only tell you about things you are more likely to already know about. The advantage of humans is that they will tell you obscure bands that they are passionate about, that deserve more recognition -- or at least a good listen.
Posted by Wm on November 01, 2007 at 02:23 PM EDT #