Music Recommender Turing Test Results

The results for the Music Recommender Turing Test challenge are in! In this challenge, I posed the question: "Can you tell which music recommendation was generated by a human, and which was generated by a machine?" Two recommendation lists were generated in response to the question "If you like Miles Davis, you might like?" One list was generated by a professional music critic, the other by an algorithm developed here in the labs. The lists were:

Recommendation List A:

If you like "Miles Davis" you might like:

John Coltrane
Duke Ellington
Thelonious Monk
Charlie Parker
Herbie Hancock
Bill Evans
Charles Mingus
Sonny Rollins
Wayne Shorter
Weather Report

Recommendation List B:

If you like "Miles Davis" you might like:

John Coltrane
Weather Report
Herbie Hancock
Wayne Shorter
Charlie Parker
Dave Douglas
Chet Baker
Tony Williams Lifetime
Can
Sly & the Family Stone

The challenge received 29 responses, of which 19 had direct predictions (the others commented on the quality of the recommendations or the difficulty of the challenge). The 19 predictions were quite skewed: 15 indicated that list A was created by a human, while 4 indicated that list B was created by a human. Some respondents were quite sure of their predictions:

"Well, this is too easy...

A = human

B = machine

no doubt in my mind."

Others were less sure:

"I bet B is machine generated, although it's a tough one to call."

One interesting thing is that although the clear majority thought that list A was created by a human, those that indicated a preference, seemed to prefer list B. One commenter said "B has differences that wouldn't show up in a human's list because a human will stick to the genre. That said I think I like B more."

The Results

To summarize: 15 respondents indicated that list A was created by a human, while 4 indicated that list B was created by a human. The actual sources of the lists are:

List A - a machine - a collaborative filtering system developed in Sun Labs, based upon last.fm listener data
List B - a human - a professional music critic named Dominique.

Roughly 80% of the respondents (15 of 19) got the answer wrong.
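To make the arithmetic explicit, here is a small Python sketch of the tally. It uses only the vote counts reported in this post (15 respondents naming list A as human, 4 naming list B) and the revealed sources; the variable names are just illustrative.

```python
# Votes from the 19 direct predictions: which list each respondent thought was human.
votes = {"A": 15, "B": 4}

# Actual sources, as revealed above.
human_list = "B"    # Dominique, the professional music critic
machine_list = "A"  # the Sun Labs collaborative filtering system

total = sum(votes.values())    # 19 predictions in all
correct = votes[human_list]    # respondents who named B as the human list
wrong = total - correct        # respondents who named A as the human list
wrong_pct = 100 * wrong / total

print(f"{wrong} of {total} wrong ({wrong_pct:.1f}%)")
```

This prints `15 of 19 wrong (78.9%)`, which rounds to the "roughly 80%" figure.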

BTW, Dominique, the music critic, responded to the comments: "yes, I am most certainly a robot! I like the guy who commented about miles' different eras -- one of the hardest parts about recommending based on him is trying to figure out what aspect/period of miles davis people are responding to."

This was a really fun exercise for me. The resulting dialog provided many insights into what it takes to make a good recommendation. I find the results quite surprising: a rather traditional recommendation algorithm mimicked a human well enough to convince a clear majority of respondents that it was not a machine. Perhaps this bodes well for machine recommendation. (Of course, this experiment is too small and too casual to draw any long-term conclusions.)
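For readers curious what a "rather traditional" recommender looks like, here is a minimal item-based collaborative filtering sketch in Python. It is an illustration under stated assumptions, not the Sun Labs system: the listener histories are invented toy data (not last.fm data), and scoring artists by cosine similarity over binary listener sets is an assumed, commonly used choice. The helper names (`artist_listeners`, `cosine`, `similar_artists`) are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data: listener -> set of artists they play.
# These listeners are invented for illustration; this is NOT last.fm data.
listening = {
    "u1": {"Miles Davis", "John Coltrane", "Herbie Hancock"},
    "u2": {"Miles Davis", "John Coltrane", "Thelonious Monk"},
    "u3": {"Miles Davis", "Weather Report", "Herbie Hancock"},
    "u4": {"John Coltrane", "Thelonious Monk"},
    "u5": {"Can", "Sly & the Family Stone"},
}

def artist_listeners(data):
    """Invert listener -> artists into artist -> set of listeners."""
    inverted = defaultdict(set)
    for user, artists in data.items():
        for artist in artists:
            inverted[artist].add(user)
    return inverted

def cosine(a, b):
    """Cosine similarity between two binary listener vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def similar_artists(seed, data, k=10):
    """Rank other artists by how much their listener set overlaps the seed's."""
    by_artist = artist_listeners(data)
    seed_fans = by_artist.get(seed, set())
    scores = [
        (cosine(seed_fans, fans), artist)
        for artist, fans in by_artist.items()
        if artist != seed
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    # Keep the top k, dropping artists that share no listeners with the seed.
    return [(artist, round(score, 3)) for score, artist in scores[:k] if score > 0]

print(similar_artists("Miles Davis", listening))
```

On this toy data the call returns `[('Herbie Hancock', 0.816), ('John Coltrane', 0.667), ('Weather Report', 0.577), ('Thelonious Monk', 0.408)]`. Artists whose listeners never overlap with the seed's, like Can here, score zero and never surface, which is one reason a purely co-listening approach tends to stay close to the seed artist's neighborhood.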

And the Winner is ...

Of the 4 correct answers, I selected one at random to receive my extra copy of "Net Blogs and Rock 'N' Roll". The winner could have been David Jennings (the author of the book), which would have been an interesting twist, but the coin wavered and sent the winning book to JuniorBonner. Congrats, Junior.

Comments:

cool
first time I've won anything for ages
thanks a lot

Posted by juniorbonner on October 06, 2007 at 08:34 AM EDT #

Phew, close shave! I've seen about as much of that book (http://www.netblogsrocknroll.com/2007/08/dont-you-wish-y.html) as I can tolerate...<wink>

I seem to remember I said my answer was a guess. Now that I see it was right, I realise that it was a matter of tacit subconscious inner wisdom.

But to be semi-serious for a moment, Paul, your post doesn't make explicit the obvious conclusion that you at Sun have devised a computer that passes the Turing Test. When can we expect the press releases, the Nobel prizes and all that follows from that?

Posted by David Jennings on October 06, 2007 at 11:05 AM EDT #

Very fun. I think it is interesting that "those that indicated a preference, seemed to prefer list B"... does that mean that there is something innate about a "Human" recommendation that bubbles under the surface and speaks to us more clearly?

Liking the performers in 'List A' better and welcoming our robot overlords,
Zac

Posted by Zac on October 06, 2007 at 04:29 PM EDT #

I think it is awesome that list B (the human list) had Can in it. That's what convinced me that B was human. Hardly anyone knows about Can, so a traditional recommendation engine wouldn't see it.

I think the problem with most recommendation engines is that they don't capture passion. A good recommendation should be someone you don't already know about, otherwise, what's the point? Since most recommendation engines are based primarily on popularity, they will usually only tell you about things you are more likely to already know about. The advantage of humans is that they will tell you obscure bands that they are passionate about, that deserve more recognition -- or at least a good listen.

Posted by Wm on November 01, 2007 at 02:23 PM EDT #

Duke Listens!: Visit my main blog at MusicMachinery.com
