Bulk export/embed album art in Banshee

Based on the MD5 lookup in Oli's script (thanks!), I've written a Python script that uses the eyeD3 module to find MP3s, look up the album artwork from Banshee's cache, and embed the artwork inside the MP3s. It skips any files that already have artwork embedded.

It's not perfect, but it worked on about 90% of my MP3s, and you can manually handle any exceptions using EasyTag. As it stands the script expects the MP3s to be two directory levels deep from the target directory (music root/artist/album). The script prints a report once it's done highlighting any files it couldn't process or for which it couldn't find artwork.

Obviously you need to install Python and the eyeD3 module to use it:

#! /usr/bin/env python

import os, sys, glob, eyeD3, hashlib

CACHE_FILE_PREFIX = os.getenv("HOME") + "/.cache/media-art/album-"

def embedAlbumArt(dir = "."):
    artworkNotFoundFiles = []
    errorEmbeddingFiles = []
    noMetadataFiles = []
    mp3s = findMP3Files(dir)

    for mp3 in mp3s:
        print "Processing %s" % mp3

        tag = eyeD3.Tag()
        hasMetadata = tag.link(mp3)

        if not hasMetadata:
            print "No Metadata - skipping."
            noMetadataFiles.append(mp3)
            continue

        if hasEmbeddedArtwork(tag):
            print "Artwork already embedded - skipping."
            continue

        artworkFilename = findAlbumArtworkFile(tag)

        if not artworkFilename:
            print "Couldn't find artwork file - skipping."
            artworkNotFoundFiles.append(mp3)
            continue

        print "Found artwork file: %s" % (artworkFilename)

        wasEmbedded = embedArtwork(tag, artworkFilename)

        if wasEmbedded:
            print "Done.\n"
        else:
            print "Failed to embed.\n"
            errorEmbeddingFiles.append(mp3)

    if artworkNotFoundFiles:
        print "\nArtwork not found for:\n"
        print "\n".join(artworkNotFoundFiles)

    if errorEmbeddingFiles:
        print "\nError embedding artwork in:\n"
        print "\n".join(errorEmbeddingFiles)

    if noMetadataFiles:
        print "\nNo Metadata found for files:\n"
        print "\n".join(noMetadataFiles)

def findMP3Files(dir = "."):    
    pattern = "/".join([dir, "*/*", "*.mp3"])   
    mp3s = glob.glob(pattern)
    mp3s.sort()
    return mp3s

def hasEmbeddedArtwork(tag):
    return len(tag.getImages())

def findAlbumArtworkFile(tag):
    key = "%s\t%s" % (tag.getArtist(), tag.getAlbum())
    md5 = getMD5Hash(key)
    filename = CACHE_FILE_PREFIX + md5 + ".jpg"
    if os.path.exists(filename):
        return filename
    else:
        return 0

def getMD5Hash(string):
    string = string.encode("utf-8")
    md5 = hashlib.md5()
    md5.update(string)
    return md5.hexdigest()

def embedArtwork(tag, artworkFilename):
    tag.addImage(eyeD3.ImageFrame.FRONT_COVER, artworkFilename)
    success = 0
    try:
        success = tag.update()
    except:
        success = 0
    return success

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print "Usage: %s path" % (sys.argv[0])
    else:
        embedAlbumArt(sys.argv[1])

I wrote this little script that follows what Banshee does (which is slightly different to the proper specs).

In short, this loops my music directories and, forms a hash based on the artist and album (from directory names), looks for a file with that hash in and if it exists, copies it into the album's directory. Simple.

#!/bin/bash

TPATH="/home/oli/.cache/media-art/"

cd /media/ned/music/

for f in *; do 
        cd "$f"
        for al in *; do
                THUMB="${TPATH}album-$(echo -ne "$f\t$al" | md5sum | cut -b1-32).jpg"
                if [ -e $THUMB ]; then
                        cp $THUMB ./cover.jpg
                        echo "/media/ned/music/$f/$al/cover.jpg" >> ~/coverlog
                fi
        done
        cd ..        
done

The echo to ~/coverlog is just there to catch where files have been copied to (in case something goes wrong and you need to delete all the cover files this writes.


To make sure I found all albums, I needed to normalize the string to NFKD before hashing. I solved it in python by:

def strip_accents(s):
    return unicodedata.normalize('NFKD', s)

My whole script is based on alphaloop's solution, yet i switched to mutagen to also deal with flac and m4a:

def getArtistAlbum(musicfile):
     """ return artist and album strings of a music file """
     import mutagen

     # key of stored information per file extension
     keys={'flac': ('artist','album'),
           'mp3': ('TPE2','TALB'),
           'm4a': ('\xa9ART','\xa9alb')}

     # read the tag
     tag = mutagen.File(musicfile)
     # get extension of musicfile
     ext = os.path.splitext(musicfile)[1][1:]

    try:
        return tag[keys[ext][0]][0], tag[keys[ext][1]][0]
    except KeyError:
        return None,None