Disk usage of files whose names match a regex, in Linux?
So, in many situations I wanted a way to know how much of my disk space is used by what, so I know what to get rid of, convert to another format, store elsewhere (such as data DVDs), move to another partition, etc. In this case I'm looking at a Windows partition from a SliTaz Linux bootable media.
In most cases, what I want is the size of files and folders, and for that I use NCurses-based ncdu:
But in this case, I want a way to get the size of all files matching a regex. An example regex for .bak files:
.*\.bak$
How do I get that information, considering a standard Linux with core GNU utilities or BusyBox?
Edit: The output is intended to be parseable by a script.
Solution 1:
I suggest something like: find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1
Some notes:
- The
-print0
option forfind
and--files0-from
fordu
are there to avoid issues with whitespace in file names - The regular expression is matched against the whole path, e.g.
./dir1/subdir2/file.bak
, not justfile.bak
, so if you modify it, take that into account - I used
h
flag for du to produce a "human-readable" format but if you want to parse the output, you may be better off withk
(always use kilobytes) - If you remove the
tail
command, you will additionally see the sizes of particular files and directories
Sidenote: a nice GUI tool for finding out who ate your disk space is FileLight. It doesn't do regexes, but is very handy for finding big directories or files clogging your disk.
Solution 2:
du
is my favorite answer. If you have a fixed filesystem structure, you can use:
du -hc *.bak
If you need to add subdirs, just add:
du -hc *.bak **/*.bak **/**/*.bak
etc etc
However, this isn't a very useful command, so using your find:
TOTAL=0;for I in $(find . -name \*.bak); do TOTAL=$((TOTAL+$(du $I | awk '{print $1}'))); done; echo $TOTAL
That will echo the total size in bytes of all of the files you find.
Hope that helps.
Solution 3:
Run this in a Bourne Shell to declare a function that calculates the sum of sizes of all the files matching a regex pattern in the current directory:
sizeofregex() { IFS=$'\n'; for x in $(find . -regex "$1" 2> /dev/null); do du -sk "$x" | cut -f1; done | awk '{s+=$1} END {print s}' | sed 's/^$/0/'; unset IFS; }
(Alternatively, you can put it in a script.)
Usage:
cd /where/to/look
sizeofregex 'myregex'
The result will be a number (in KiB), including 0
(if there are no files that match your regex).
If you do not want it to look in other filesystems (say you want to look for all .so
files under /
, which is a mount of /dev/sda1
, but not under /home
, which is a mount of /dev/sdb1
, add a -xdev
parameter to find
in the function above.