Find directories that DON'T contain a file

Yes, I'm sorting out my music. I've got everything arranged beautifully in the following pattern: /Artist/Album/Track - Artist - Title.ext, and if one exists, the cover sits in /Artist/Album/cover.(jpg|png).

I want to scan through all the second-level directories and find the ones that don't have a cover. By second level, I mean I don't care if /Britney Spears/ doesn't have a cover.jpg, but I would care if /Britney Spears/In The Zone/ didn't have one.

Don't worry about the cover downloading (that's a fun project for me tomorrow); I only care about the glorious bash-fu of an inverse-ish find example.


Solution 1:

Case 1: You know the exact file name to look for

Use find with test -e your_file to check whether a file exists. For example, to list the directories that have no cover.jpg in them:

find base_dir -mindepth 2 -maxdepth 2 -type d '!' -exec test -e "{}/cover.jpg" ';' -print

It's case sensitive though.
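
A minimal sketch for trying Case 1 on a throwaway tree before pointing it at a real library. The artist and album names are made up for illustration, and it assumes a find that supports -mindepth/-maxdepth and replaces {} inside a longer argument (GNU find does).

```shell
# Build a scratch tree (hypothetical names) to try the command on.
base_dir=$(mktemp -d)
mkdir -p "$base_dir/Artist A/Album 1" "$base_dir/Artist A/Album 2" "$base_dir/Artist B/Album 3"
touch "$base_dir/Artist A/Album 1/cover.jpg"   # exact file name
touch "$base_dir/Artist B/Album 3/Cover.JPG"   # differs only in case

# Same command as above; sort only makes the output order predictable.
missing=$(find "$base_dir" -mindepth 2 -maxdepth 2 -type d '!' -exec test -e "{}/cover.jpg" ';' -print | sort)
printf '%s\n' "$missing"

rm -rf "$base_dir"   # clean up the scratch tree
```

On a case-sensitive file system this lists Album 2 and Album 3: the second has a cover, but not under the exact name cover.jpg.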

Case 2: You want to be more flexible

You're not sure of the case, and the extension might be jPg, png...

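find base_dir -mindepth 2 -maxdepth 2 -type d '!' -exec sh -c 'ls -1 "$1"|egrep -i -q "^cover\.(jpg|png)$"' sh {} ';' -print

Explanation:

  • You need to spawn a shell (sh) for each directory, because -exec runs a single command directly and cannot set up a pipe by itself
  • The directory is handed to that shell as the argument $1 (the sh just before {} only fills $0). Pasting {} straight into the script text would break on directory names that contain quotes, $ or backticks
  • ls -1 "$1" outputs just the filenames of the directory find is currently visiting
  • egrep (instead of grep) uses extended regular expressions; it is the same as grep -E, and recent GNU grep releases print an obsolescence warning for the egrep spelling. -i makes the search case insensitive, -q makes it omit any output
  • "^cover\.(jpg|png)$" is the search pattern. In this example, it matches e.g. cOver.png, Cover.JPG or cover.png. The . must be escaped, otherwise it matches any character. ^ marks the start of the line, $ its end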
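
A sketch for trying Case 2 on a scratch tree. It hands the directory to the inner shell as $1 rather than embedding {} in the script text, and spells egrep as grep -E. The album names are made up, and one of them deliberately contains a double quote and a dollar sign.

```shell
# Scratch tree with hypothetical names; the last album is called: 12" $ingle
base_dir=$(mktemp -d)
mkdir -p "$base_dir/Artist A/Album 1" "$base_dir/Artist A/Album 2" "$base_dir/Artist B/12\" \$ingle"
touch "$base_dir/Artist A/Album 1/cOver.PNG"       # odd case, still a cover
touch "$base_dir/Artist B/12\" \$ingle/cover.jpg"  # awkward directory name

# The directory arrives in the inner shell as $1; "sh" only fills $0.
missing=$(find "$base_dir" -mindepth 2 -maxdepth 2 -type d \
  '!' -exec sh -c 'ls -1 "$1" | grep -E -i -q "^cover\.(jpg|png)$"' sh {} ';' -print | sort)
printf '%s\n' "$missing"

rm -rf "$base_dir"   # clean up the scratch tree
```

Only Album 2 is reported: cOver.PNG is accepted because of -i, and the awkwardly named album is handled like any other.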

Other search pattern examples for egrep:

Substitute the egrep -i -q "^cover\.(jpg|png)$" part with:

  • egrep -i -q "cover\.(jpg|png)$" : Also matches cd_cover.png, album_cover.JPG ...
  • egrep -q "^cover\.(jpg|png)$" : Matches cover.png, cover.jpg, but NOT Cover.jpg (case sensitivity is not turned off)
  • egrep -iq "^(cover|front)\.jpg$" : matches e.g. front.jpg, Cover.JPG but not Cover.PNG
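
To check a pattern before letting it loose on the whole collection, a small helper along these lines can feed it a few sample names. try_pattern is a hypothetical name invented for this sketch, and it uses grep -E, which is the same thing as egrep.

```shell
# try_pattern FLAGS PATTERN NAME...
# Reports which of the sample names the pattern accepts.
# FLAGS is passed to grep as-is, e.g. -i, or '' for a case-sensitive match.
try_pattern() {
  flags=$1 pattern=$2
  shift 2
  for name in "$@"; do
    # $flags is intentionally unquoted so that an empty value disappears
    if printf '%s\n' "$name" | grep -E $flags -q -- "$pattern"; then
      echo "match:    $name"
    else
      echo "no match: $name"
    fi
  done
}

try_pattern -i 'cover\.(jpg|png)$'     cd_cover.png album_cover.JPG
try_pattern '' '^cover\.(jpg|png)$'    cover.png Cover.jpg
try_pattern -i '^(cover|front)\.jpg$'  front.jpg Cover.JPG Cover.PNG
```

The three calls correspond to the three patterns listed above, with the sample names taken from the same list.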

For more info on this, check out Regular Expressions.

Solution 2:

Simple, it transpires. The following gets a list of directories with a cover and compares that with a list of all the second-level directories. Lines that appear in both "files" are suppressed, as are lines that appear only in the first (a cover sitting somewhere other than an album directory), leaving a list of directories that need covers.

comm -13 \
    <(find ~/Music/ -iname 'cover.*' -printf '%h\n' | sort -u) \
    <(find ~/Music/ -maxdepth 2 -mindepth 2 -type d | sort) \
| sed 's/^.*Music\///'

Hooray.
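
A sketch of the same comparison on a scratch tree, for anyone who wants to see it work before running it on the real thing. It writes the two lists to explicit temporary files in place of <(...), so it also runs in a plain sh, and it uses comm -13, which keeps only the lines unique to the second list. The names are made up, and -printf assumes GNU find.

```shell
# Scratch library with hypothetical names.
music=$(mktemp -d)
mkdir -p "$music/Artist A/Album 1" "$music/Artist A/Album 2" "$music/Artist B/Album 3"
touch "$music/Artist A/Album 1/cover.jpg" "$music/Artist A/Album 1/Cover.png"  # two covers, one directory
touch "$music/Artist B/cover.jpg"                                            # cover at artist level only

with_cover=$(mktemp)
all_albums=$(mktemp)
find "$music" -iname 'cover.*' -printf '%h\n' | sort -u > "$with_cover"   # directories holding a cover
find "$music" -mindepth 2 -maxdepth 2 -type d | sort > "$all_albums"      # every album directory

# -1 drops lines found only in the first list, -3 drops lines found in both;
# what remains are album directories without a cover.
need_cover=$(comm -13 "$with_cover" "$all_albums" | sed "s|^$music/||")
printf '%s\n' "$need_cover"

rm -rf "$music" "$with_cover" "$all_albums"   # clean up
```

This prints Artist A/Album 2 and Artist B/Album 3. The cover sitting directly in Artist B does not count for Album 3, and it does not leak into the output either.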

Notes:

  • comm's arguments are as follows:

    • -1 suppress lines unique to file1
    • -2 suppress lines unique to file2
    • -3 suppress lines that appear in both files
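  • comm only takes files, hence the kooky <(...) input method. This is bash's process substitution: each <(...) is replaced by a file name (typically /dev/fd/N or a named pipe) from which comm reads the command's output, so no real temporary file is written.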

  • comm needs sorted input or it doesn't work, and find by no means guarantees an order. The lists should also be free of duplicates: the first find could match several cover.* files in one directory, so that directory could be listed more than once. sort -u quickly ruffles those down to one. The second find is always going to be unique.

  • -printf '%h\n' (a GNU find feature) prints the directory each match sits in. dirname is a handy tool for getting the same thing from a single path without resorting to sed (et al).

  • find and comm are both a bit messy with their output. The final sed is there to clean things up so you're left with Artist/Album. This may or may not be desirable for you.
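
The comm flags in the notes above are easiest to see on two tiny lists. A sketch with made-up one-letter entries, using temporary files so it does not depend on <(...):

```shell
list1=$(mktemp)
list2=$(mktemp)
printf '%s\n' a b c > "$list1"   # already sorted
printf '%s\n' b c d > "$list2"   # already sorted

only_first=$(comm -23 "$list1" "$list2")    # suppress columns 2 and 3: lines only in list1
only_second=$(comm -13 "$list1" "$list2")   # suppress columns 1 and 3: lines only in list2
in_both=$(comm -12 "$list1" "$list2")       # suppress columns 1 and 2: lines in both
not_shared=$(comm -3 "$list1" "$list2")     # suppress column 3 only: two columns remain

printf 'only in first:  %s\n' "$only_first"
printf 'only in second: %s\n' "$only_second"
printf 'in both:        %s\n' $in_both
printf 'comm -3 output:\n%s\n' "$not_shared"

rm -f "$list1" "$list2"
```

With -3 alone, the entries unique to the second file are indented by a tab, which is part of the mess the final sed tidies up.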