One of the challenges of building a music similarity model is figuring out how to evaluate it. Unlike other tasks such as artist identification, there's no absolute ground truth to rely on. Some researchers have proposed good ways of estimating similarity performance, but when it gets right down to it, there's no real substitute for human judgment: having people listen to two songs and decide whether or not they are similar.
This year, as part of ISMIR 2006, MIREX (the Music Information Retrieval Evaluation eXchange) is organizing the first music similarity evaluation. The centerpiece of the evaluation is human judging: people listen to music and decide how similar songs are to a seed song.
The evaluation is coordinated by the Evalutron 6000, a system being created by Dr. Stephen Downie and his IMIRSEL team. Stephen and his ace crew have done a great job figuring out how to put together a system that makes the evaluations possible. Probably the hardest part is setting up an evaluation that satisfies all of the researchers. Everyone has an opinion about how such an evaluation should be done, and getting everyone to agree is harder than herding cats.

There's a sandbox version of the Evalutron 6000 that you can try if you are curious to see how the evaluation works.


This blog copyright 2010 by plamere