Expectation Maximization
Many pattern classification problems, including speech recognition and music classification, are solved by representing a single statistical distribution as a mixture model: a weighted combination of several simpler probability distributions. The expectation-maximization (EM) algorithm is often used to estimate the parameters of those component distributions, given the raw data and the desired number of components. S. Akaho has a Java applet that demonstrates how the EM algorithm works. First you draw your data, then you choose how many probability distributions you want, and finally you start the algorithm, which uses EM to fit that many distributions to the data.
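To make the procedure concrete, here is a minimal sketch in Python of EM for a one-dimensional mixture of Gaussians. It is an illustration under stated assumptions, not the code behind the applet: the function names `gaussian_pdf` and `em_gmm_1d`, the quantile-based starting means, the fixed iteration count, and the small variance floor are all choices made for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k, iterations=100):
    """Fit a k-component 1-D Gaussian mixture to `data` with plain EM.

    Returns (weights, means, variances), each a list of length k.
    """
    n = len(data)
    ordered = sorted(data)

    # Assumed initialisation: means at evenly spaced quantiles of the data,
    # every component starting with the overall variance and equal weight.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: for each point, the probability that each component
        # generated it (the "responsibilities"), given current parameters.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([p / total for p in joint])

        # M-step: re-estimate each component from the points, weighted by
        # how responsible that component is for each of them.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps a component from collapsing

    return weights, means, variances
```

Calling `em_gmm_1d(data, 2)` on a list of numbers returns the mixing weights, means, and variances of two fitted components. EM only climbs to a local maximum of the likelihood, so a different starting guess can lead to a different fit.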
A good paper on the EM algorithm is "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models" by Jeff A. Bilmes.