For this Open Research experiment, I shall be working with a set of social tags collected via the Audioscrobbler web services. Since it might be handy for anyone following along to have access to the same data set, I am making it available to any researcher who wants to use it.

The dataset is available for download here: Lastfm-ArtistTags2007

Here are the details from the README file:

The LastFM-ArtistTags2007 Data set
Version 1.0
June 2008

What is this?

    This is a set of artist tag data collected using the
    Audioscrobbler web service during the spring of 2007.

    The data consists of the raw tag counts for up to the 100 most
    frequently occurring tags that listeners have applied
    to each of over 20,000 artists.

    An undocumented (and deprecated) option of the Audioscrobbler
    web service was used to bypass the normalization of tag
    counts, so this data set provides raw tag counts.

Data Format:

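  The data is formatted one entry per line, with four fields separated
  by the literal string <sep>: the artist's MusicBrainz ID, the artist
  name, the tag, and the raw tag count. For example:
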
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art punk<sep>21
    11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art rock<sep>18
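A minimal sketch of reading one entry in the format above, assuming the separator really is the literal text <sep> and that no artist name or tag contains that string. The names TagEntry and parse_line are hypothetical helpers, not part of the data set.

```python
from typing import NamedTuple

SEP = "<sep>"  # literal separator string between fields (assumption based on the examples)

class TagEntry(NamedTuple):
    artist_id: str  # UUID in the first field
    artist: str     # artist name
    tag: str        # tag text as applied by listeners
    count: int      # raw (unnormalized) tag count

def parse_line(line: str) -> TagEntry:
    """Split one ArtistTags.dat line into its four fields.

    Assumes no field itself contains "<sep>"; a line with the wrong number
    of fields (or a non-integer count) raises ValueError.
    """
    artist_id, artist, tag, count = line.rstrip("\r\n").split(SEP)
    return TagEntry(artist_id, artist, tag, int(count))

entry = parse_line(
    "11eabe0c-2638-4808-92f9-1dbd9c453429<sep>Deerhoof<sep>art punk<sep>21\n"
)
print(entry.artist, entry.tag, entry.count)  # Deerhoof art punk 21
```

Splitting on the whole <sep> string rather than a single character is what lets tags with spaces, such as "art punk", survive intact.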

Data Statistics:

    Total Lines:      952810
    Unique Artists:    20907
    Unique Tags:      100784
    Total Tags:      7178442
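The statistics above can in principle be recomputed from ArtistTags.dat. The sketch below is one way to do it, under two assumptions the README does not state: "Total Tags" is read as the sum of the raw counts, and an artist is identified by its (ID, name) pair. The function name dataset_stats is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

SEP = "<sep>"  # literal field separator in ArtistTags.dat

def dataset_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Recompute the summary statistics from ArtistTags.dat-style lines.

    Assumptions (not stated in the README): an artist is identified by its
    (ID, name) pair, and "Total Tags" is the sum of the raw tag counts.
    """
    artists: Set[Tuple[str, str]] = set()
    tags: Set[str] = set()
    total_lines = 0
    total_tags = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline at EOF
        artist_id, artist, tag, count = line.split(SEP)
        total_lines += 1
        artists.add((artist_id, artist))
        tags.add(tag)
        total_tags += int(count)
    return {
        "Total Lines": total_lines,
        "Unique Artists": len(artists),
        "Unique Tags": len(tags),
        "Total Tags": total_tags,
    }

# Example (assumes ArtistTags.dat is in the working directory and UTF-8 encoded):
# with open("ArtistTags.dat", encoding="utf-8") as f:
#     for name, value in dataset_stats(f).items():
#         print(f"{name + ':':<16}{value:>8}")
```

If the totals you get differ from the README, the assumed definitions of "Unique Artists" or "Total Tags" are the first things to revisit.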


    Some minor filtering has been applied to the tag data. The web
    service occasionally reports tags with counts of zero or less;
    these tags have been removed.

    Artists with no tags have not been included in this data set.
    Of the nearly a quarter million artists that were inspected, 20,907
    artists had one or more tags.


    ArtistTags.dat  - the tag data
    README.txt      - this file
    artists.txt     - artists ordered by tag count
    tags.txt        - tags ordered by tag count
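The README does not describe the layout of artists.txt and tags.txt, but an ordering similar to theirs can be derived from ArtistTags.dat. This sketch assumes "ordered by tag count" means the sum of raw counts across all entries (other readings, such as the number of distinct tags per artist, are possible); the function name rank_by_count is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

SEP = "<sep>"  # literal field separator in ArtistTags.dat

def rank_by_count(lines: Iterable[str]) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return (artists, tags), each sorted by summed raw tag count, highest first.

    Assumption: "ordered by tag count" means the sum of raw counts.
    Counter.most_common keeps ties in the order first encountered.
    """
    artist_totals: Counter = Counter()
    tag_totals: Counter = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        _artist_id, artist, tag, count = line.split(SEP)
        artist_totals[artist] += int(count)
        tag_totals[tag] += int(count)
    return artist_totals.most_common(), tag_totals.most_common()
```

For example, two artists A (rock 3, indie 2) and B (rock 5, jazz 4) would rank as B then A, with tags ordered rock, jazz, indie.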


    The data in LastFM-ArtistTags2007 is distributed with permission and
    is made available for non-commercial use only under the Creative
    Commons Attribution-NonCommercial-ShareAlike UK License. Those
    interested in using the data or web services in a commercial
    context should contact partners at last dot fm.


    Thanks to the Audioscrobbler web services for providing access
    to this tag data.


    This data was collected and filtered by Paul Lamere of Sun Labs. Send
    questions or comments to [email protected]

Hello Paul!
Thanks for that - that's really useful! I will try to add support for the tag web service one of these days.

It looks like the <sep> separators in your examples get thrown out by my feed reader - I was wondering whether the band in the example was Deerhoof or Deerhoo. In the latter case, it would have been a really weird set of tags :-)


Posted by Yves on June 10, 2008 at 05:53 AM EDT

Thanks Yves, I've fixed the <sep> issue.

Posted by Paul on June 10, 2008 at 06:01 AM EDT


This blog copyright 2010 by plamere