The results for the Music Recommender Turing Test
challenge are in! In this challenge, I posed the question: "Can
you tell which music recommendation was generated by a human, and which
was generated by a machine?" Two recommendation lists were
generated in response to the question "If you like Miles Davis, you
might like?" One list was generated by a professional music
critic, the other by an algorithm developed here in the labs. The
lists were:
Recommendation List A:
If you like "Miles Davis" you might like:
- John Coltrane
- Duke Ellington
- Thelonious Monk
- Charlie Parker
- Herbie Hancock
- Bill Evans
- Charles Mingus
- Sonny Rollins
- Wayne Shorter
- Weather Report
Recommendation List B:
If you like "Miles Davis" you might like:
- John Coltrane
- Weather Report
- Herbie Hancock
- Wayne Shorter
- Charlie Parker
- Dave Douglas
- Chet Baker
- Tony Williams Lifetime
- Can
- Sly & the Family Stone
The challenge received 29 responses, of which 19 made direct predictions (the others opined about the quality of the recommendations or the difficulty of the challenge). The 19 predictions were quite skewed: 15 respondents thought that list A was created by a human, while 4 thought that list B was created by a human. Some respondents were quite sure of their predictions:
"Well, this is too easy...
A = human
B = machine
no doubt in my mind."
Others were less sure:
"I bet B is machine generated, although it's a tough one to call"
One interesting thing is that although the clear majority thought that list A was created by a human, those who indicated a preference seemed to prefer list B. One commenter said: "B has differences that wouldn't show up in a human's list because a human will stick to the genre. That said I think I like B more."
The Results
To summarize: 15 respondents indicated that list A is a human, while 4
indicated that list B is a human. The actual sources of the
lists are:
- List A - a machine - a collaborative filtering system developed in Sun Labs based upon last.fm listener data
- List B - a human - a professional music critic named Dominique.
About 79% of the respondents who made a prediction (15 of 19) got the answer wrong.
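For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the counts reported in this post (29 responses, 19 predictions, 15 for list A as human, 4 for list B as human); the variable names are just mine for illustration.

```python
# Counts taken from the responses described in this post.
total_responses = 29
predictions = 19        # responses that made a direct prediction
said_a_is_human = 15    # wrong: list A came from the machine
said_b_is_human = 4     # right: list B came from the critic

# Sanity check: the two groups account for every prediction.
assert said_a_is_human + said_b_is_human == predictions

wrong_share = said_a_is_human / predictions
right_share = said_b_is_human / predictions

print(f"wrong: {wrong_share:.1%}")  # wrong: 78.9%
print(f"right: {right_share:.1%}")  # right: 21.1%
```

So roughly four out of five predictors were fooled. Note that the share is computed over the 19 predictions, not over all 29 responses.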
BTW,
Dominique, the music critic, responded to the comments: "yes, I
am most certainly a robot! I like the guy who commented about
miles' different eras -- one of the hardest parts about recommending
based on him is trying to figure out what aspect/period of miles davis
people are responding to."
This was a really fun exercise for me. The resulting dialog provided many insights into what it takes to make a good recommendation. I find the results quite surprising: a rather traditional recommendation algorithm can mimic a human well enough to convince a significant majority that it is not a machine. Perhaps this bodes well for machine recommendation. (Of course, this experiment is too small and too casual to draw any long-term conclusions.)
And the Winner is ...
Of the 4
correct answers, I selected one at random to receive my extra copy of
"Net Blogs and Rock 'N' Roll". The winner could have been David
Jennings (the author of the book), which would have been an interesting
twist, but the coin wavered and sent the winning book to
JuniorBonner. Congrats, Junior.