I need 5,000 MP3 files - quick
One of the difficulties in building a Music Information Retrieval (MIR) system is acquiring enough music data to train and test it. Music publishers hold the IP of their music very tightly, which makes it difficult for MIR researchers to use and share that music. If I have published some MIR research results using a body of 5,000 songs and you want to repeat the experiment and duplicate the results, you will have to build up the identical 5,000-song collection on your own. This is unlike other disciplines: text retrieval folks can use Text REtrieval Conference (TREC) data, and speech researchers can use the extensive data provided by the Linguistic Data Consortium. There is currently no equivalent resource for music.
The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) project is an attempt to rectify this problem by providing a standard set of evaluations and a common set of music data for MIR researchers to use. Dr. Downie and his team are working hard to establish this as the 'TREC' of Music Information Retrieval. They have just released M2K, which will serve as the MIR evaluation framework for this year's ISMIR (the International Conference on Music Information Retrieval). IMIRSEL looks to be a great resource for MIR researchers.
Unfortunately, IMIRSEL doesn't offer any music data collections yet, so researchers have to look elsewhere for music data. One excellent source is Magnatune.com, the open source record label. Magnatune makes available for download over 350 albums by nearly 200 artists, for a total of nearly 5,000 songs spanning all genres. The music is licensed under a Creative Commons license, which means that non-commercial use of the music is free. The 128 kbps MP3 files are consistently tagged with genre information, making them great for a number of MIR tasks such as artist classification, genre classification and music similarity (and the music sounds good too!). Until IMIRSEL is fully established, Magnatune may be the best place to find good data for MIR research.
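Since the whole point is being able to repeat someone else's experiment, here is a minimal sketch of how a genre-tagged collection like this could be divided into training and test sets in a reproducible way. It assumes the genre labels have already been read out of the MP3 tags into (path, genre) pairs. The function name `stratified_split`, the 20% test fraction, the fixed seed and the file paths are all my own illustrative choices, not anything Magnatune or IMIRSEL provides.

```python
import random
from collections import defaultdict


def stratified_split(tracks, test_fraction=0.2, seed=0):
    """Split (path, genre) pairs into train and test lists, genre by genre.

    Hypothetical helper: the split is done within each genre so that both
    sets keep roughly the same genre mix as the full collection.
    """
    by_genre = defaultdict(list)
    for path, genre in tracks:
        by_genre[genre].append(path)

    rng = random.Random(seed)  # fixed seed so the same split can be rebuilt
    train, test = [], []
    for genre in sorted(by_genre):
        paths = sorted(by_genre[genre])  # sort first so input order is irrelevant
        rng.shuffle(paths)
        n_test = round(len(paths) * test_fraction)
        if len(paths) >= 2:
            # Keep at least one track on each side of the split.
            n_test = min(max(n_test, 1), len(paths) - 1)
        else:
            n_test = 0  # a single-track genre goes to training only
        test += [(p, genre) for p in paths[:n_test]]
        train += [(p, genre) for p in paths[n_test:]]
    return train, test


# Made-up paths standing in for a downloaded collection.
tracks = [(f"classical/track{i:02d}.mp3", "Classical") for i in range(10)]
tracks += [(f"rock/track{i:02d}.mp3", "Rock") for i in range(5)]
train, test = stratified_split(tracks)
print(len(train), len(test))  # prints: 12 3
```

Publishing the seed and the list of tracks along with the results would let another researcher rebuild the same split from the same downloads. For artist classification, the same idea applies with the artist name in place of the genre.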
Posted by Michelle Meyer on March 08, 2007 at 11:09 PM EST