metal archives micro analysis
Dec 29, 2014
2 minutes read

background

Over the past couple of weeks I’ve been writing a small lib to fetch data from the Encyclopaedia Metallum, a.k.a. the Metal Archives, and while building it I ran a few experiments with the site.

The site itself doesn’t have a public API. The topic has been brought up in their forum a few times, but so far there are no promises. They do, however, have a couple of limited, yet similar, APIs that are used in some places via AJAX. These are the ones I tested:

  • Advanced search
  • Browse bands by letter
  • Browse bands by country

Since the number of bands in their database wasn’t enormous (~100k), I decided to dump it and run a small analysis. I ended up making 3 dumps in total, one for each of the endpoints listed above.

The first one (search) returned all the bands, with the following data: id, name, genre, country, url.

The second one (by letter) returned the same data plus the status of the band (active, split-up, etc.), but it did not return all the results: it omitted the ones using the Cyrillic alphabet.

The last one (by country) returned all the bands with the same data as the second one, status included, plus the location if any (e.g. “Barcelona, Catalonia”).

Clearly I stuck with the last dataset, since it was the most complete one.

Even though this is not supposed to be a coding post, here is, as a bonus, the tl;dr of the dump process: scraping through all the countries, paginating the results, parsing the data (some of it returned as HTML), extracting a CSV file, importing it into SQLite, and fun.
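For the curious, the steps above could look roughly like the sketch below. It is not the code I actually ran: the row layout, the `fetch_page` callable, the page size and the `example.invalid` URLs are all assumptions made up for illustration, and a fake fetcher stands in for the site so the snippet runs offline.

```python
import csv
import html
import io
import re
import sqlite3

FIELDS = ["id", "name", "url", "country", "genre", "location", "status"]

# Assumed cell format: the band name arrives as an HTML anchor whose URL ends in the id.
ANCHOR = re.compile(r'<a href="(?P<url>[^"]*/(?P<id>\d+))">(?P<name>.*?)</a>')
TAG = re.compile(r"<[^>]+>")


def parse_row(country, row):
    """Flatten one raw row [name_html, genre, location, status_html] into a dict."""
    match = ANCHOR.search(row[0])
    if match is None:
        raise ValueError(f"unexpected name cell: {row[0]!r}")
    return {
        "id": int(match["id"]),
        "name": html.unescape(match["name"]),
        "url": match["url"],
        "country": country,
        "genre": row[1],
        "location": row[2],
        "status": TAG.sub("", row[3]).strip(),  # drop the markup around the status
    }


def dump_country(country, fetch_page, page_size=500):
    """Yield parsed bands for one country, paging until a short page comes back."""
    start = 0
    while True:
        rows = fetch_page(country, start, page_size)
        for row in rows:
            yield parse_row(country, row)
        if len(rows) < page_size:
            break
        start += page_size


def write_csv(bands, out):
    """Write the parsed bands to a CSV file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(bands)


def import_csv(conn, src):
    """Load the CSV into a 'bands' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bands (id INTEGER PRIMARY KEY, name TEXT, "
        "url TEXT, country TEXT, genre TEXT, location TEXT, status TEXT)"
    )
    rows = [
        (int(r["id"]), r["name"], r["url"], r["country"],
         r["genre"], r["location"], r["status"])
        for r in csv.DictReader(src)
    ]
    conn.executemany("INSERT OR REPLACE INTO bands VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def fake_fetch(country, start, size):
    """Stand-in for the real request; the three rows are invented sample data."""
    data = [
        ['<a href="https://example.invalid/bands/Foo/1">Foo</a>',
         "Black Metal", "Barcelona, Catalonia", '<span class="active">Active</span>'],
        ['<a href="https://example.invalid/bands/Bar/2">Bar</a>',
         "Viking Metal", "", '<span class="split_up">Split-up</span>'],
        ['<a href="https://example.invalid/bands/Baz/3">Baz</a>',
         "Death Metal", "Madrid", '<span class="on_hold">On hold</span>'],
    ]
    return data[start:start + size]


buf = io.StringIO()
write_csv(dump_country("Spain", fake_fetch, page_size=2), buf)
buf.seek(0)

conn = sqlite3.connect(":memory:")
import_csv(conn, buf)
band_count = conn.execute("SELECT COUNT(*) FROM bands").fetchone()[0]
print(band_count)
```

In a real run `fake_fetch` would be replaced by a function that performs the HTTP request for one page, and `dump_country` would be called once per country.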

the data

  • date: 2014/12/24
  • bands: 100523
  • genres: 7693 (variations like “Viking Folk Metal”, “Viking Metal”)
  • countries: 140
  • locations: 22713
  • status: 5
    • bands active: 54704
    • bands split-up: 33642
    • bands on hold: 2296
    • bands unknown: 5488
    • bands changed name: 4393
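A quick sanity check on those numbers: the five status counts should add up to the total number of bands. The snippet below only re-uses the figures listed above; the variable names are mine.

```python
# Status counts as listed in the dump summary.
status_counts = {
    "active": 54704,
    "split-up": 33642,
    "on hold": 2296,
    "unknown": 5488,
    "changed name": 4393,
}
total_bands = 100523

# Every band has exactly one status, so the counts must sum to the total.
status_total = sum(status_counts.values())
print(status_total == total_bands)

# Share of each status, as a percentage rounded to one decimal.
shares = {name: round(100 * n / total_bands, 1) for name, n in status_counts.items()}
for name, share in shares.items():
    print(f"{name}: {share}%")
```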

 

top 10 genres:

top 10 countries:

top 10 locations:

P.S. The biggest “location” was actually the empty one: 4907 bands have no location set.
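Each of these top 10 lists boils down to a grouped count over the SQLite table. Here is a minimal sketch, assuming a `bands` table with `genre`, `country` and `location` columns; the table layout, the `top` helper and the sample rows are invented for illustration and are not the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bands (id INTEGER PRIMARY KEY, name TEXT, genre TEXT, "
    "country TEXT, location TEXT)"
)

# Made-up sample rows, only to have something to count.
sample = [
    (1, "Foo", "Black Metal", "Spain", "Barcelona, Catalonia"),
    (2, "Bar", "Viking Metal", "Norway", ""),
    (3, "Baz", "Black Metal", "Norway", "Oslo"),
    (4, "Qux", "Death Metal", "Spain", ""),
    (5, "Quux", "Black Metal", "Norway", ""),
]
conn.executemany("INSERT INTO bands VALUES (?, ?, ?, ?, ?)", sample)

ALLOWED = {"genre", "country", "location"}


def top(conn, column, limit=10):
    """Return the most common values of one column as (value, count) pairs."""
    if column not in ALLOWED:  # column names cannot be bound as SQL parameters
        raise ValueError(f"unsupported column: {column}")
    query = (
        f"SELECT {column}, COUNT(*) AS n FROM bands "
        f"GROUP BY {column} ORDER BY n DESC, {column} LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


top_genres = top(conn, "genre")
top_locations = top(conn, "location")
print(top_genres)
print(top_locations)
```

With this layout an empty location is just another group, which is why it can end up at the top of the list. Adding `WHERE location != ''` to the query would leave it out.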

 

future

There are many options ahead to use this data in cool ways, and since I got the location of most of the bands, my favorite option right now would be an interactive bubble map. We’ll see how this goes.