Saturday May 21, 2005
Friday May 20, 2005
There are a number of aspects to music similarity. Melody, instrumentation, tempo, rhythm, spectral shape, and acoustic density all factor into our concept of similarity. One aspect of similarity that is very hard to extract directly from the audio is the lyrical content. Lyrics can play a large role in similarity, but for the near future at least, any system that uses lyrics to determine similarity will have to get the lyrics from hand-edited databases such as those maintained by Lyrics.com. It is just too hard for a speech recognizer to recognize lyrics (they have enough of a problem recognizing clean spoken speech in a noise-free environment). Song lyrics are hard enough for people to understand. The Misheard Lyrics Hall of Fame documents some of the more frequent misheard lyrics. Some of my favorites:
- Wrong lyric: The ants are my friend, they're blowin' in the wind
- Right lyric: The answer, my friend, is blowin' in the wind
- Wrong lyric: I'll never leave your pizza burning
- Right lyric: I'll never be your beast of burden
There are lots more at KissThisGuy.com.
Thursday May 19, 2005
Reader Geoffrey Peters sent me a link to his recent project Song Search by Tapping. This is a Java applet that allows you to tap out the rhythm of a song on your keyboard, and the applet will try to identify the song based solely upon its rhythm. It's a neat little applet that worked well for me. It was even able to distinguish between the songs "London Bridge is Falling Down" and "Mary Had a Little Lamb", which are practically identical. Right now the song database that is searched is very small (only 30 tunes), but Geoffrey says that they are interested in seeing how their system will scale up to much larger databases. There's a paper, Song Search and Retrieval by Tapping, that describes how the system works.
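The paper has the details of the real system. As a rough illustration of the general idea only, here is a minimal sketch in Java of matching a tapped rhythm against a tiny database. Everything in it is an assumption of mine rather than Geoffrey's method: the class name `TapSearch`, the two made-up rhythms, and the choice to compare inter-onset intervals scaled so they sum to 1 (which makes the comparison independent of how fast you tap).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TapSearch {
    /** Converts tap times (ms) into inter-onset intervals scaled to sum to 1. */
    public static double[] normalizedIntervals(long[] tapTimesMs) {
        if (tapTimesMs.length < 2) {
            throw new IllegalArgumentException("need at least two taps");
        }
        int n = tapTimesMs.length - 1;
        double total = tapTimesMs[n] - tapTimesMs[0];
        double[] intervals = new double[n];
        for (int i = 0; i < n; i++) {
            intervals[i] = (tapTimesMs[i + 1] - tapTimesMs[i]) / total;
        }
        return intervals;
    }

    /** Sum of absolute differences; rhythms of different lengths never match. */
    public static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            return Double.POSITIVE_INFINITY;
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Returns the title whose stored rhythm is closest, or null if none is comparable. */
    public static String bestMatch(long[] tapTimesMs, Map<String, double[]> database) {
        double[] query = normalizedIntervals(tapTimesMs);
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> song : database.entrySet()) {
            double d = distance(query, song.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = song.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical database: onset positions in beats for two invented rhythms.
        Map<String, double[]> database = new LinkedHashMap<>();
        database.put("even", normalizedIntervals(new long[] {0, 1, 2, 3, 4}));
        database.put("long-short", normalizedIntervals(new long[] {0, 3, 4, 7, 8}));

        // A slightly sloppy long-short-long-short tapping, in milliseconds.
        long[] taps = {0, 610, 790, 1420, 1600};
        System.out.println(bestMatch(taps, database)); // prints long-short
    }
}
```

A sketch like this requires the query to have exactly as many taps as the stored rhythm, so a usable system would need some way to handle partial, extra or missed taps.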
I've tried a number of music content query systems, including systems that use query by humming and query by Parsons code. The query-by-humming systems are often troublesome because getting the microphone configuration just right is hard. Parsons codes work well but are not very intuitive and can be difficult for the musically untrained to generate. I find the query-by-tapping interface to be the easiest way to query for music content.
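For readers who haven't run into Parsons code: it describes a melody only by its contour, marking each note as up (u), down (d) or a repeat (r) relative to the previous note, with * standing for the first note. Here's a minimal sketch in Java, assuming the melody is already available as MIDI pitch numbers. The class and method names are mine, not from any of the systems mentioned above.

```java
public class ParsonsCode {
    /** Encodes a melody, given as MIDI pitch numbers, into Parsons code. */
    public static String encode(int[] pitches) {
        if (pitches.length == 0) {
            return "";
        }
        StringBuilder code = new StringBuilder("*"); // the first note has no predecessor
        for (int i = 1; i < pitches.length; i++) {
            int diff = pitches[i] - pitches[i - 1];
            if (diff > 0) {
                code.append('u');      // pitch went up
            } else if (diff < 0) {
                code.append('d');      // pitch went down
            } else {
                code.append('r');      // same pitch repeated
            }
        }
        return code.toString();
    }

    public static void main(String[] args) {
        // Opening of "Twinkle, Twinkle, Little Star": C C G G A A G
        int[] twinkle = {60, 60, 67, 67, 69, 69, 67};
        System.out.println(encode(twinkle)); // prints *rururd
    }
}
```

Generating the code from pitches is the easy part. The hard part for the musically untrained is going the other way, from a tune in your head to a string of u's, d's and r's.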
There are some other folks who are looking at query-by-tapping systems. The BeatBank system described in this paper can identify the correct song about two-thirds of the time with a small (56-song) database. The Super MBox system shows a 15% accuracy with an 11,000-song database. I'm looking forward to the day when I can use a query-by-tapping system to find that song I heard on the radio on my way to work, but given the current state of the art there's quite a ways to go before that happens.
Wednesday May 18, 2005
We've all heard The Story about Ping ... but how about the Story about Pong? Now, with all of the latest buzz around the PS3, Xbox 360 and Nintendo Revolution, it's good to look back at home video games' roots. Today's Nashua Telegraph has a good article about Ralph Baer, the inventor of the video game. There's lots more about early video game history too.
Monday May 16, 2005
- softest pppppppp (8 p's) in Ligeti's Etudes for Piano, 1st Book
- loudest ffffffff (8 f's) in Ligeti: Etudes for Piano, 2nd Book (the 1812 Overture only reaches ffff)
- Instruments to be played by one performer in a piece - Mahler: Symphony no. 5 calls for one clarinetist playing six different instruments.
- Most repeated notes in a melody - 32 in Prokofieff: Toccata, Op. 11 (1912)
There are many others, quite interesting.
Friday May 13, 2005
Today I used google to search for
Music Information Retrieval is a fairly new area of research. In fact, this year will be only the second year in which a community-wide algorithm and systems evaluation contest is held. The Music Information Retrieval Evaluation eXchange (MIREX) will take place during the 6th ISMIR Conference in London, UK, September 11-15, 2005. The goal of this contest is to compare state-of-the-art algorithms and systems relevant to Music Information Retrieval.
Nine evaluation tasks are planned:
- Audio Artist Identification
- Audio Drum Detection
- Audio Genre Classification
- Audio Melody Extraction
- Audio Onset Detection
- Audio Tempo Extraction
- Audio and Symbolic Key Finding
- Symbolic Genre Classification
- Symbolic Melodic Similarity
I find the Artist and the Genre classification tasks to be particularly interesting.
This year, MIREX will be using the new M2k (Music to Knowledge) toolset as the framework for the evaluation. The deadlines for MIREX are:
- Participation Statement: June 12, 2005
- Algorithms and 1-page Abstracts: June 26, 2005
- Extended Abstracts: September 2, 2005
Thursday May 12, 2005
Here's another way to have fun on your computer. JFugue is a Java API for music programming. It has a rich set of classes for creating phrases, chords, transforms and so on. Here's a bit of JFugue code that plays a scale:

    import org.jfugue.*;

    public class MyMusicApp {
        public static void main(String[] args) {
            // The Player renders music strings as MIDI
            Player player = new Player();
            // Notes of the C major scale, one per token
            Pattern pattern = new Pattern("C D E F G A B");
            player.play(pattern);
        }
    }

JFugue includes lots of documentation and is released under a Creative Commons license.
Wednesday May 11, 2005
Yahoo just announced its own music subscription to go: Yahoo Music Unlimited. Here's the good:
- $4.99 per month for access to one million songs ($10 cheaper than Napster-to-go)
- Downloads for $0.79
- Load songs into portable devices (just like Napster-to-go)
- Supports XSPF: XML Shareable Playlist Format
Here's the bad:
- $4.99 is not going to last for long. Expect the price to rise after you are locked in.
- DRM won't let you get at the bits
- Can't load subscription downloads onto the iPod
- Only runs on Windows - no Mac, Linux or Solaris support.
prefuse is an open-source, Java-based interface toolkit for building interactive visualizations of data. Using this toolkit, developers can create responsive, animated graphical interfaces for visualizing, exploring, and manipulating various forms of data. Here's a cool demo showing a force-directed layout of a social network. According to the paper "prefuse: a toolkit for interactive information visualization", prefuse is designed to work with very large datasets. One example in the paper demonstrates using prefuse for browsing through a 600,000-node web directory.
This package looks like it will be very useful for visualizing artist and music similarity networks. Some folks over at Berkeley are doing just this with project Orpheus. Orpheus is an exploration and visualization tool for discovering the universe of new and independent recording artists. Check out the demo of Orpheus. (Thanks to Brooke Maury for the prefuse tip.)
Tuesday May 10, 2005
Scansoft and Nuance announced yesterday that they plan to merge. This continues Scansoft's acquisition of just about every small speech engine business (L&H, Speechworks, Rhetorical). The merged company will be called Nuance.
Monday May 09, 2005
Looks like Napster is getting into the ringtone business. It's a funny business where you charge $0.99 for a high-quality, full-length song that you can listen to anytime on your PC or iPod, and at the same time sell an instrumental, 30-second rendition of the same song for $1.99. Oh yeah, you can also purchase an AudioTone (which is the actual song audio, degraded for your phone) for $2.99. So why can't I just make my own AudioTone from one of the songs in my mp3 collection? Crazy business model.
Right now, it looks like the only way to get music onto the N91 is either by hooking it up to your computer (via USB) and copying it there, or by recording FM. It doesn't look like there's any support for hooking it up to iTunes or one of the subscription services like Napster-to-go. I think the music market will get really interesting when people can start to browse and buy music from their phone/mp3 player. It will be here very soon!
Friday May 06, 2005
Each scientific discipline has its tough problems. Physicists search for the grand unified theory. Mathematicians try to prove Fermat's last theorem or solve the Poincare Conjecture. Neuroscientists struggle to explain consciousness.
Music Information Retrieval researchers have been similarly vexed by a single tough problem. Like Fermat's last theorem, this problem is simply stated, and many have tried to solve it, yet its answer remains elusive. For many MIR researchers, this problem is what brought them to the field of MIR, while for some, the intractable nature of the problem ultimately drives them away.
In 1965 the US Government tasked several federal agencies with trying to solve this problem. Some promising progress was made, but ultimately these efforts ground to a halt and the programs were abandoned.
For the 40 years or so since then, a series of solutions has been proposed by successive generations of researchers, musicologists and performers. I am hoping to make my mark in the MIR field by contributing to the ultimate solution to this problem. I think we are getting close, and soon we all may be able to celebrate a great advance when MIR researchers can finally answer the question "What are the lyrics to 'Louie Louie'?"
Some resources:
- A good summary of current research
- Final agency report. May 25, 1965
- The latest proposed (but incomplete) solution
- How this unsolved problem is affecting middle school students in Benton Harbor, Michigan.
This blog copyright 2010 by plamere