Find directories that DON'T contain a file
Yes, I'm sorting out my music. I've got everything arranged beautifully in the following mantra: /Artist/Album/Track - Artist - Title.ext and, if one exists, the cover sits in /Artist/Album/cover.(jpg|png).
I want to scan through all the second-level directories and find the ones that don't have a cover. By second level, I mean I don't care if /Britney Spears/ doesn't have a cover.jpg, but I would care if /Britney Spears/In The Zone/ didn't have one.
Don't worry about the cover-downloading (that's a fun project for me tomorrow); I only care about the glorious bash-fuiness of an inverse-ish find example.
Solution 1:
Case 1: You know the exact file name to look for
Use find with test -e your_file to check whether a file exists. For example, to list the directories that have no cover.jpg in them:
find base_dir -mindepth 2 -maxdepth 2 -type d '!' -exec test -e "{}/cover.jpg" ';' -print
It's case sensitive though.
Case 2: You want to be more flexible
You're not sure of the case, and the extension might be jPg, png ...
find base_dir -mindepth 2 -maxdepth 2 -type d '!' -exec sh -c 'ls -1 "{}"|egrep -i -q "^cover\.(jpg|png)$"' ';' -print
Explanation:
- You need to spawn a shell (sh) for each directory, since -exec runs a single command directly and a pipe only exists inside a shell
- ls -1 "{}" outputs just the filenames of the directory find is currently traversing
- egrep (instead of grep) uses extended regular expressions; -i makes the search case insensitive, -q makes it omit any output
- "^cover\.(jpg|png)$" is the search pattern. In this example, it matches e.g. cOver.png, Cover.JPG or cover.png. The . must be escaped, otherwise it matches any character. ^ marks the start of the line, $ its end
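One caveat with the command above: the directory name is pasted into the shell script text, so a name containing $, backticks or double quotes gets interpreted by the inner sh. A possible variant passes the name as a positional parameter ($1) instead. This is a minimal sketch, assuming GNU or BSD find; the temporary tree and its directory names are made up for the demo, and grep -E is used as the equivalent of egrep.

```shell
#!/bin/sh
# Hypothetical demo tree; base_dir stands in for your music folder.
base_dir=$(mktemp -d)
mkdir -p "$base_dir/Artist A/With Cover" \
         "$base_dir/Artist A/No Cover" \
         "$base_dir/A\$AP/Album"
touch "$base_dir/Artist A/With Cover/Cover.JPG"
touch "$base_dir/A\$AP/Album/cover.png"

# The directory is handed to sh as $1 rather than spliced into the script,
# so a "$" in the name (as in A$AP) is never expanded by the inner shell.
missing=$(find "$base_dir" -mindepth 2 -maxdepth 2 -type d \
    '!' -exec sh -c 'ls -1 "$1" | grep -E -i -q "^cover\.(jpg|png)$"' sh {} ';' \
    -print)

# Only ".../Artist A/No Cover" should be listed.
printf '%s\n' "$missing"
rm -rf "$base_dir"
```

The extra sh after the script is just the value for $0; the directory found by find follows as $1.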
Other search pattern examples for egrep:
Substitute the egrep -i -q "^cover\.(jpg|png)$" part with:
- egrep -i -q "cover\.(jpg|png)$": also matches cd_cover.png, album_cover.JPG ...
- egrep -q "^cover\.(jpg|png)$": matches cover.png, cover.jpg, but NOT Cover.jpg (case sensitivity is not turned off)
- egrep -iq "^(cover|front)\.jpg$": matches e.g. front.jpg, Cover.JPG but not Cover.PNG
For more info on this, check out Regular Expressions.
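If you want to try the patterns without touching your music folder, you can feed grep a made-up list of file names. This is only a sketch: the names are invented for the demo, -q is dropped so the matches are visible, and grep -E stands in for egrep.

```shell
#!/bin/sh
# Invented file names, one per line, standing in for the output of ls -1.
names='cover.jpg
Cover.JPG
cd_cover.png
front.jpg
Cover.PNG
discover.txt'

# Anchored at both ends, case insensitive.
anchored=$(printf '%s\n' "$names" | grep -E -i '^cover\.(jpg|png)$')
# No ^ anchor: anything ending in cover.jpg or cover.png.
suffix=$(printf '%s\n' "$names" | grep -E -i 'cover\.(jpg|png)$')
# No -i: lower case only.
exact_case=$(printf '%s\n' "$names" | grep -E '^cover\.(jpg|png)$')
# Alternative base names, jpg only.
either=$(printf '%s\n' "$names" | grep -E -i '^(cover|front)\.jpg$')

printf 'anchored:\n%s\n\nsuffix:\n%s\n\nexact case:\n%s\n\ncover or front:\n%s\n' \
    "$anchored" "$suffix" "$exact_case" "$either"
```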
Solution 2:
Simple, it transpires. The following gets a list of directories with the cover and compares that with a list of all the second-level directories. Lines that appear in both "files" are suppressed, leaving a list of directories that need covers.
comm -3 \
<(find ~/Music/ -iname 'cover.*' -printf '%h\n' | sort -u) \
<(find ~/Music/ -maxdepth 2 -mindepth 2 -type d | sort) \
| sed 's/^.*Music\///'
Hooray.
Notes:
- comm's arguments are as follows:
  - -1 suppress lines unique to file1
  - -2 suppress lines unique to file2
  - -3 suppress lines that appear in both files
- comm only takes files, hence the kooky <(...) input method. Bash hands comm a file name for each command's output (a /dev/fd entry or a named pipe), so no regular temporary file is needed.
- comm needs sorted input or it doesn't work, and find does by no means guarantee an order. The input also needs to be unique. The first find operation could find multiple files for cover.* so there could be duplicate entries. sort -u quickly ruffles those down to one. The second find is always going to be unique.
- dirname is a handy tool for getting a file's dir without resorting to sed (et al). Here find's -printf '%h\n' does the same job directly.
- find and comm are both a bit messy with their output. The final sed is there to clean things up so you're left with Artist/Album. This may or may not be desirable for you.
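To see what the comm flags do in isolation, here is a small sketch with invented Artist/Album lines written to temporary files (plain files are used so it also runs in shells without <(...)). It also shows -13 as a possible alternative to -3: combining -1 and -3 leaves only the lines unique to the second file, i.e. the directories with no cover, and without the leading tab.

```shell
#!/bin/sh
# Invented sample data; the real lists would come from the two find commands.
tmp=$(mktemp -d)
printf '%s\n' 'A/One' 'A/Two' 'B/Live' | sort > "$tmp/all_dirs"
# A/One appears twice (say cover.jpg and cover.png); sort -u removes the repeat.
printf '%s\n' 'A/One' 'B/Live' 'A/One' | sort -u > "$tmp/with_cover"

# -3 hides the common lines; lines unique to the second file keep a leading tab.
both_unique=$(comm -3 "$tmp/with_cover" "$tmp/all_dirs")

# -13 additionally hides lines unique to the first file, so no tab is printed.
no_cover=$(comm -13 "$tmp/with_cover" "$tmp/all_dirs")

printf 'comm -3:\n%s\n\ncomm -13:\n%s\n' "$both_unique" "$no_cover"
rm -rf "$tmp"
```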