There are a number of aspects to music similarity. Melody, instrumentation, tempo, rhythm, spectral shape, and acoustic density all factor into our concept of similarity. One aspect of similarity that is very hard to extract directly from the audio is the lyrical content. Lyrics can play a large role in similarity, but for the near future at least, any system that uses lyrics to determine similarity will have to get the lyrics from hand-edited databases. It is just too hard for a speech recognizer to recognize lyrics (recognizers have enough of a problem with clean speech in a noise-free environment). Song lyrics are hard enough for people to understand. The Misheard Lyrics Hall of Fame documents some of the more frequent misheard lyrics. Some of my favorites:

Wrong lyric: The ants are my friend, they're blowin' in the wind
Right lyric: The answer, my friend, is blowin' in the wind

Wrong lyric: I'll never leave your pizza burning
Right lyric: I'll never be your beast of burden

There are lots more at the Misheard Lyrics Hall of Fame.


My favorite example of how difficult it is to extract speech from audio was a paper entitled "How to wreck a nice beach."

I think the title says it all; you don't even need to read the paper. (If you still don't get it, say the title out loud.)

Posted by Brian Utterback on May 20, 2005 at 10:09 AM EDT #

Nice blog. I like this. Paul

Posted by Paul on August 23, 2005 at 06:20 AM EDT #

