Command of files and size
You can use `file` to determine the actual file type (MIME type) from each file's content rather than its extension, and pure Bash to sum the sizes per type.
Have a look at this example:
$ find Pictures/ -printf '%s\t' -exec file --brief --mime-type {} \;|{ declare -A A;while IFS=$'\t' read -r B T;do A["$T"]=$((A["$T"]+B));done;for T in "${!A[@]}";do printf '%12d\t%s\n' "${A["$T"]}" "$T";done;}|sort -bnr
72046936 image/jpeg
57324445 image/png
23712181 application/x-7z-compressed
17144737 image/gif
6563757 image/x-xcf
697098 image/svg+xml
53248 inode/directory
To verify the result: the sum of all the values above is exactly what `du -sb` reports (`-b` makes `du` count apparent sizes in bytes, which is what `find`'s `%s` prints):
$ du -sb Pictures/
177542402 Pictures/
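The check can also be done mechanically. As a minimal sketch (the function name `sum_first_column` is made up for illustration), this pure-Bash helper adds up the first column of a table like the one above, so that the total can be compared with the `du` figure:

```shell
#!/bin/bash
# Hypothetical helper: sum the first whitespace-separated column of stdin.
sum_first_column() {
    local total=0 bytes rest
    # with the default IFS, `read` strips leading blanks, so right-aligned numbers work too
    while read -r bytes rest ; do
        # skip blank lines and anything that does not start with a plain number
        [[ $bytes =~ ^[0-9]+$ ]] || continue
        total=$(( total + bytes ))
    done
    printf '%d\n' "$total"
}

# The per-type sizes from the example output above
sum_first_column <<'EOF'
72046936 image/jpeg
57324445 image/png
23712181 application/x-7z-compressed
17144737 image/gif
6563757 image/x-xcf
697098 image/svg+xml
53248 inode/directory
EOF
```

Piping the output of the one-liner into `sum_first_column` instead of using the here-document should give the same total.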
Here is the one-liner from the first example, commented and formatted more readably as a script:
#!/bin/bash
# Recursively find all files (and directories) in `Pictures/`,
# then output each one's size in bytes (its apparent size, not its disk usage), followed by a tab
# and the output of `file`, showing only the short MIME type without path and extra info (e.g. "image/png"):
find Pictures/ -printf '%s\t' -exec file --brief --mime-type {} \; | {
    # declare the `ARR` variable to be an associative array (mapping type strings to total size)
    declare -A ARR
    # parse the above output line by line, reading the tab-separated columns into
    # the variables `BYTES` and `TYPE` respectively
    while IFS=$'\t' read -r BYTES TYPE ; do
        # add the current `BYTES` number to the corresponding entry in our `ARR` array
        ARR["$TYPE"]=$(( ARR["$TYPE"] + BYTES ))
    done
    # loop over all keys (MIME types) in our `ARR` array
    for TYPE in "${!ARR[@]}" ; do
        # output the total bytes (right-aligned up to 12 digits) followed by a tab and the type
        printf '%12d\t%s\n' "${ARR["$TYPE"]}" "$TYPE"
    done
# sort the resulting output table numerically, in descending order and ignoring leading space
} | sort -bnr
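The aggregation step itself does not depend on `find` or `file`, so it can be tried on its own. A sketch under that assumption, with `sum_by_key` as an invented name and made-up sample data: it reads `SIZE<TAB>KEY` lines from standard input and prints one total per key, like the block in braces does.

```shell
#!/bin/bash
# Hypothetical stand-alone version of the aggregation block
# (needs Bash 4 or later for associative arrays).
sum_by_key() {
    local -A sums=()
    local bytes key
    # read tab-separated "SIZE<TAB>KEY" lines
    while IFS=$'\t' read -r bytes key ; do
        # Bash rejects an empty associative-array key, so skip such lines
        [[ -n $key ]] || continue
        sums[$key]=$(( ${sums[$key]:-0} + bytes ))
    done
    # print each total right-aligned in 12 columns, then a tab and the key
    for key in "${!sums[@]}" ; do
        printf '%12d\t%s\n' "${sums[$key]}" "$key"
    done
}

# Made-up sample input standing in for real `find`/`file` output
printf '%s\t%s\n' 100 image/png 250 image/jpeg 50 image/png | sum_by_key | sort -bnr
```

With the function defined, the real pipeline would presumably become `find Pictures/ -printf '%s\t' -exec file --brief --mime-type {} \; | sum_by_key | sort -bnr`.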
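Another method, which groups by file extension instead of MIME type and reports disk usage (`%b` is the number of 512-byte blocks a file occupies) along with a file count, would be: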
find . -name '?*.*' -type f -printf '%b.%f\0' |
awk -F . -v RS='\0' '
{s[$NF] += $1; n[$NF]++}
END {for (e in s) printf "%15d %4d %s\n", s[e]*512, n[e], e}' |
sort -n
Result from my Desktop:
873172992 1 mkv
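To make the record handling in the awk program easier to follow, here is a rough pure-Bash counterpart. It is only a sketch: the function name `sum_by_extension` and the sample file names are invented, and unlike awk it skips names that end in a dot, because Bash does not accept an empty array key. Each record has the form `BLOCKS.FILENAME`, so the text before the first dot is the block count (awk's `$1`) and the text after the last dot is the extension (awk's `$NF`).

```shell
#!/bin/bash
# Hypothetical pure-Bash counterpart of the awk program (needs Bash 4 or later).
sum_by_extension() {
    local -A blocks=() count=()
    local record ext
    # records are NUL-terminated "BLOCKS.FILENAME" strings, as printed by -printf '%b.%f\0'
    while IFS= read -r -d '' record ; do
        ext=${record##*.}              # text after the last dot
        [[ -n $ext ]] || continue      # an empty key is not allowed in Bash
        # ${record%%.*} is the text before the first dot, i.e. the block count
        blocks[$ext]=$(( ${blocks[$ext]:-0} + ${record%%.*} ))
        count[$ext]=$(( ${count[$ext]:-0} + 1 ))
    done
    # convert 512-byte blocks to bytes and print: bytes, number of files, extension
    for ext in "${!blocks[@]}" ; do
        printf '%15d %4d %s\n' "$(( ${blocks[$ext]} * 512 ))" "${count[$ext]}" "$ext"
    done
}

# Made-up records; 1705416 blocks * 512 = 873172992 bytes, the mkv figure shown above
printf '%s\0' 1705416.holiday.video.mkv 8.notes.txt 16.todo.txt | sum_by_extension | sort -n
```

In a real run the sample `printf` would be replaced by the `find . -name '?*.*' -type f -printf '%b.%f\0'` command from the answer.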