Over the last few days, I've been looking at building a song lyric similarity model.  The idea is to build a system that can find songs that have similar lyrics to a seed song.  This is useful, for instance, when you want to generate playlists that have some cohesive theme. 

This is a pretty simple idea, but it can be difficult to implement.  First, you need a search engine that can efficiently determine document similarity for a large number of documents.  Second, you need access to lots and lots of lyrics.  Now, I am pretty lucky in that I work with the Advanced Search Technologies folks here in Sun Labs.  The AST team has developed an incredible search engine that will let me do all sorts of neat things - document similarity being one of them.  With this search engine, I can index the lyrics of each song as a document and then make simple queries to find similar documents.  It's fast, it is well architected, and it's written in Java.
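To make the "index each song as a document, then query for similar documents" idea concrete, here is a minimal sketch in Python. It is not the AST search engine or its API. It swaps in a plain TF-IDF weighting with cosine similarity, which is one common way to score document similarity. The function names (`tokenize`, `build_index`, `similar_songs`) and the smoothed IDF formula are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the lyrics and split them into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(lyrics_by_title):
    """Build a TF-IDF vector (dict of term -> weight) for each song.

    lyrics_by_title is a hypothetical mapping of song title -> lyric text.
    """
    term_counts = {
        title: Counter(tokenize(text)) for title, text in lyrics_by_title.items()
    }
    n_docs = len(term_counts)

    # Document frequency: in how many songs does each term appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    index = {}
    for title, counts in term_counts.items():
        index[title] = {
            # Smoothed IDF (an assumption) so common terms keep a small weight.
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }
    return index


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_songs(index, seed_title, top_n=10):
    """Return up to top_n (title, score) pairs most similar to the seed song."""
    seed = index[seed_title]
    scored = [
        (cosine(seed, vector), title)
        for title, vector in index.items()
        if title != seed_title
    ]
    scored.sort(reverse=True)
    return [(title, score) for score, title in scored[:top_n]]
```

Calling `similar_songs(build_index(lyrics), "some seed title")` would return the closest songs first. This brute-force version compares the seed against every song, so it only illustrates the scoring; a real search engine would use an inverted index to stay fast on a large collection.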

So, I have a search engine that will do the heavy lifting - but without lyric data, the search engine would have nothing to do. So where can I get a whole lot of lyric data? Why, LyricWiki of course! LyricWiki is a Wikipedia-style song lyric site that (unlike most lyric sites) provides lyrics without a flood of invasive ads. LyricWiki has a nice clean Wiki-style interface, and it provides a SOAP interface to its data. With the SOAP interface, I could easily crawl the site to build up a nice database of lyrics. However, it seemed like it might be a bit anti-social to do that, so I contacted Sean Colombo, the creator of LyricWiki. I explained to him the kind of thing I wanted to do, and he graciously offered to give me a dump of his entire database! What a super thing for him to do!

So armed with a world-class search engine and 300,000 (!) song lyrics, I've been able to build a song lyric similarity engine.  It is pretty neat. First of all, it does a great job of finding song covers (duh!).  But it also works well at generating themed playlists.   For example, I used Led Zeppelin's Gallows Pole as a query.  Here's the resulting playlist, filled with songs that will make you swing (in the worst sense of swing).

This is fun stuff - I'd love to figure out how to make this available for people to play with. Perhaps I'll create an XSPF-generating web site and web service that will give you these playlists. Let me know if you are interested in such a thing. Many thanks to Steve and Jeff for help with the search engine, and to Sean for all of the lyrics.
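For anyone curious what the XSPF output of such a service might look like, here is a rough sketch in Python that turns a list of (artist, song title) pairs into an XSPF document. The function name `playlist_to_xspf` and the choice of fields are assumptions for illustration. Only the `playlist`, `title`, `trackList`, `track` and `creator` elements and the `http://xspf.org/ns/0/` namespace come from the XSPF format itself.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"


def playlist_to_xspf(title, tracks):
    """Serialize a playlist to an XSPF string.

    tracks is a list of (artist, song_title) pairs; a real service would
    probably also add a <location> URL per track, which is omitted here.
    """
    # Emit the XSPF namespace as the default namespace (no prefix).
    ET.register_namespace("", XSPF_NS)

    playlist = ET.Element(f"{{{XSPF_NS}}}playlist", {"version": "1"})
    ET.SubElement(playlist, f"{{{XSPF_NS}}}title").text = title

    track_list = ET.SubElement(playlist, f"{{{XSPF_NS}}}trackList")
    for artist, song_title in tracks:
        track = ET.SubElement(track_list, f"{{{XSPF_NS}}}track")
        ET.SubElement(track, f"{{{XSPF_NS}}}title").text = song_title
        ET.SubElement(track, f"{{{XSPF_NS}}}creator").text = artist

    return ET.tostring(playlist, encoding="unicode")
```

For example, `playlist_to_xspf("Gallows songs", [("Led Zeppelin", "Gallows Pole")])` yields a one-track playlist that an XSPF-aware player or resolver could consume.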
Comments:

"Let me know if you are interested in such a thing." yes please

Posted by juniorbonner on March 03, 2007 at 03:45 AM EST #

Paul this is fantastic. I'd love to hook this into our Second Life xspf parser and player. Could make for some real social fun on the metaverse grid. Also, can you pipe the output to your playlist resolver?

Posted by nathan on March 03, 2007 at 12:14 PM EST #


This blog copyright 2010 by plamere