Songs that make you swing
Over the last few days, I've been looking at building a song lyric similarity model. The idea is to build a system that can find songs that have similar lyrics to a seed song. This is useful, for instance, when you want to generate playlists that have some cohesive theme.
This is a pretty simple idea, but it can be difficult to implement. First, you need a search engine that can efficiently determine document similarity for a large number of documents. Second, you need access to lots and lots of lyrics. Now, I am pretty lucky in that I work with the Advanced Search Technologies folks here in Sun Labs. The AST team has developed an incredible search engine that will let me do all sorts of neat things - document similarity being one of them. With this search engine, I can index the lyrics of each song as a document and then make simple queries to find similar documents. It's fast, it is well architected, and it's written in Java.
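To make "index each song's lyrics as a document, then query for similar documents" concrete, here is a minimal sketch using TF-IDF weights and cosine similarity. This is not the Sun Labs search engine or its API, just one common way to score document similarity. The function names, the toy song keys, and the made-up lyric snippets are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_tfidf_vectors(lyrics_by_song):
    """Map each song to a {term: tf-idf weight} dict (hypothetical helper)."""
    term_counts = {song: Counter(tokenize(text))
                   for song, text in lyrics_by_song.items()}
    n_docs = len(term_counts)

    # Document frequency: in how many songs does each term appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    vectors = {}
    for song, counts in term_counts.items():
        # Smoothed idf: a term that appears in every song gets weight 0.
        vectors[song] = {
            term: tf * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, tf in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(seed, vectors, top_n=5):
    """Rank every other song by similarity to the seed song."""
    scores = [(cosine(vectors[seed], vec), song)
              for song, vec in vectors.items() if song != seed]
    scores.sort(reverse=True)
    return [(song, score) for score, song in scores[:top_n]]


# Made-up toy "lyrics", purely for illustration.
toy_lyrics = {
    "seed": "hangman hangman hold the rope a little while",
    "gallows": "the hangman waits beside the gallows with his rope",
    "sunny": "sunshine on the beach and dancing all day",
}

if __name__ == "__main__":
    vectors = build_tfidf_vectors(toy_lyrics)
    for song, score in most_similar("seed", vectors):
        print(f"{song}: {score:.3f}")
```

A real search engine would use an inverted index rather than comparing the seed against every document, which is what makes this practical for hundreds of thousands of songs.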
So, I have a search engine that will do the heavy lifting - but without lyric data, the search engine would have nothing to do. So where can I get a whole lot of lyric data? Why, LyricWiki, of course! LyricWiki is a Wikipedia-style song lyric site that (unlike most lyric sites) provides lyrics without a flood of invasive ads. LyricWiki has a nice clean wiki-style interface, and it provides a SOAP interface to its data. With the SOAP interface, I could easily crawl the site to build up a nice database of lyrics. However, it seemed like it might be a bit anti-social to do that, so I contacted Sean Colombo, the creator of LyricWiki. I explained to him the kind of thing I wanted to do, and he graciously offered to give me a dump of his entire database! What a super thing for him to do!
So, armed with a world-class search engine and 300,000 (!) song lyrics, I've been able to build a song lyric similarity engine. It is pretty neat. First of all, it does a great job of finding song covers (duh!). But it also works well at generating themed playlists. For example, I used Led Zeppelin's Gallows Pole as a query. Here's the resulting playlist, filled with songs that will make you swing (in the worst sense of swing).
- Peter, Paul & Mary: Hangman
- Robert Plant: Hey Joe
- Dust for Life: The End
- The Walkabouts: Hang Man
- Smog: Hangman Blues
- Bay Laurel: We Lost
- Samael: Worship Him