If you have a large music collection you probably know all about messy metadata. Artist, song and album name misspellings are common. Missing or incomplete data are par for the course. Inconsistent numbering and formatting, improper internationalization, duplicates, partial albums, multiple encodings, compilations - all make this a very sticky problem. Now, imagine you are a music researcher with 100,000 tracks, or even imagine you are Last.fm with millions of tracks, all with messy metadata. The messy metadata gets in the way all of the time - it ruins recommendations and playlists, it confuses, and just makes you look clueless when you can't tell that ELP, Emerson, Lake and Palmer, Emerson, Lake & Palmer, EL&P are all all the same band.
Last.fm has decided to take on this problem and solve it. But not just for themselves, but for the world. They are distributing an audio fingerprinter that will collect data on common misspellings for tracks, artists and albums. Soon they will be offering web services that you can use to clean up the data. This will be a boon to all of mankind. RJ describes the project on the last.fm blog: Audio Fingerprinting for Clean Metadata.
I really hope this will tie in with the MusicBrainz database.
Update: Elias points to this post on the MusicBrainz forum where Russ (of last.fm) clears things up:
We have no intention of dropping MusicBrainz (especially since it's
taken so long to get the license in the first place!). MB is a lot more
than just the fingerprinting: we think MB is a great source of metadata
- and the more metadata sources we have, the better. Our fingerprinting
services will definitely return MBIDs in the future.
That's just super! Now hopefully 'the future' is 'real soon'.




This month I'm featuring photos of the various music information retrieval labs that are the brains behind music 2.0. 

Getting
a cascading style sheet to work properly to work properly is always
difficult for me. Sometimes (most of the time) things just don't
work they way I think they should. This problem gets magnified
when working on a web app - the edit/build/deploy/test cycle can take 60
seconds or more, add to that the problem of the browser or web server
occasionally caching the CSS so I don't even see the changes that I
thought I made and it can get downright frustrating.