If you have a large music collection you probably know all about messy metadata. Artist, song and album name misspellings are common. Missing or incomplete data are par for the course. Inconsistent numbering and formatting, improper internationalization, duplicates, partial albums, multiple encodings, compilations - all make this a very sticky problem. Now, imagine you are a music researcher with 100,000 tracks, or even imagine you are Last.fm with millions of tracks, all with messy metadata. The messy metadata gets in the way all of the time - it ruins recommendations and playlists, it confuses, and just makes you look clueless when you can't tell that ELP, Emerson, Lake and Palmer, Emerson, Lake & Palmer, EL&P are all all the same band.
Last.fm has decided to take on this problem and solve it. But not just for themselves, but for the world. They are distributing an audio fingerprinter that will collect data on common misspellings for tracks, artists and albums. Soon they will be offering web services that you can use to clean up the data. This will be a boon to all of mankind. RJ describes the project on the last.fm blog: Audio Fingerprinting for Clean Metadata.
I really hope this will tie in with the MusicBrainz database.
Update: Elias points to this post on the MusicBrainz forum where Russ (of last.fm) clears things up:
We have no intention of dropping MusicBrainz (especially since it's
taken so long to get the license in the first place!). MB is a lot more
than just the fingerprinting: we think MB is a great source of metadata
- and the more metadata sources we have, the better. Our fingerprinting
services will definitely return MBIDs in the future.
That's just super! Now hopefully 'the future' is 'real soon'.