How to evaluate a playlist?
Ben Fields has started the ball rolling with a post on his blog: introductory thoughts on a playlist generation task in MIREX 2009. If you are interested in playlist generation, then join the conversation.
Some ideas that have been floated for a playlist evaluation:
- The traditional IR approach - use a large database of human-generated playlists (from Webjay, Musicmobs, etc.), randomly remove tracks from each playlist, and calculate precision and recall for systems that try to predict which tracks were removed.
- Human evaluation - have experts (DJs, music critics) and non-experts evaluate the playlists.
- A reverse Turing test - present each system with a set of playlists, some human-created and some created at random, and have the systems try to predict which playlists are human-generated.
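
To make the first idea a bit more concrete, here is a rough Python sketch of the hold-out scoring. It is only an illustration, not a proposed MIREX protocol: the function names `hold_out` and `precision_recall` are made up for this example, tracks are assumed to be plain string IDs, and a system's prediction is assumed to be an unordered list of track IDs.

```python
import random


def hold_out(playlist, n_remove, rng):
    """Randomly remove n_remove tracks from a playlist.

    Returns (remaining, removed), both in original playlist order.
    Hypothetical helper for illustration only.
    """
    removed_idx = set(rng.sample(range(len(playlist)), n_remove))
    remaining = [t for i, t in enumerate(playlist) if i not in removed_idx]
    removed = [t for i, t in enumerate(playlist) if i in removed_idx]
    return remaining, removed


def precision_recall(predicted, removed):
    """Score a system's predicted tracks against the removed tracks.

    precision = hits / number of distinct predicted tracks
    recall    = hits / number of distinct removed tracks
    """
    predicted_set, removed_set = set(predicted), set(removed)
    hits = len(predicted_set & removed_set)
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(removed_set) if removed_set else 0.0
    return precision, recall


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the split is repeatable
    playlist = ["track_a", "track_b", "track_c", "track_d", "track_e"]
    remaining, removed = hold_out(playlist, 2, rng)
    # A real system would look at `remaining` and guess the missing tracks;
    # here the guess is hard-coded purely to exercise the scoring.
    guess = ["track_a", "track_z"]
    print(remaining, removed, precision_recall(guess, removed))
```

This treats predictions as a set, so it ignores where in the playlist a removed track belonged. Whether track order should count is one of the open questions for the task.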