How can I get the extension(s) of a file based on its content?
I'm planning on downloading a bunch of images from a website that don't come with an extension, so I want to add one based on the file's content or mime-type.
file <filename> does a great job at identifying the file type; however, I need the extension.
--extension
Print a slash-separated list of valid extensions for the file type found.
This is from file's man page, but it does not seem to work:
$ file --extension test_text_file.txt
test_text_file.txt: ???
$ file --extension test_png_file.png
test_png_file.png: ???
$ file --extension test_gif_file.gif
test_gif_file.gif: ???
It literally prints ??? for every file I pass to it, even those that already have a proper extension. All of these are valid files of their types and get recognized perfectly by file without --extension.
Why does file --extension not work for me, and what can I use to get a file's extension?
An idea would be to use file --mime-type and then create a dispatch table array that maps known mime-types to their extensions, but I'd much rather have a simpler and safer solution.
Why does file --extension not work for me?
Not only for you. See this question. One of the comments there seems right:
Maybe just a very, very incomplete feature?
I haven't found any standard Unix tool to do the conversion, so your idea may be the easiest solution anyway.
An idea would be to use file --mime-type and then create a dispatch table array that maps known mime-types to their extensions, but I'd much rather have a simpler and safer solution.
Note that such a map exists: /etc/mime.types. See this other question on Unix & Linux SE. Based on one of the answers there, I came up with the following function:
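function getext() {
    # Require exactly one argument, and it must be readable.
    [ "$#" != 1 ] && { echo "Wrong number of arguments. Provide exactly one." >&2; return 254; }
    [ -r "$1" ] || { echo "Not a file, nonexistent or unreadable." >&2; return 1; }
    # Select the /etc/mime.types line that starts with "<mime-type><TAB>",
    # then print its second tab-separated field (the extension list).
    grep "^$(file -b --mime-type "$1")"$'\t' /etc/mime.types |
        awk -F '\t+' '{print $2}'
}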
Usage:
getext test_text_file.txt # it takes just one argument
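For the use case in the question (downloaded images without extensions), the lookup can be paired with a rename step. The following is only a rough sketch: mime_to_ext and add_extension are hypothetical names, and the small hand-written map is an assumption used to keep the example self-contained, not a replacement for /etc/mime.types.

```shell
# Hypothetical hand-written map; add an arm for every type you expect.
mime_to_ext() {
    case "$1" in
        image/jpeg) echo jpg ;;
        image/png)  echo png ;;
        image/gif)  echo gif ;;
        *)          return 1 ;;
    esac
}

# Hypothetical helper: append an extension chosen from the detected MIME type.
add_extension() {
    local mime ext
    mime=$(file -b --mime-type -- "$1") || return 1
    ext=$(mime_to_ext "$mime") || { echo "No mapping for $mime: $1" >&2; return 1; }
    # Refuse to overwrite an existing file.
    [ -e "$1.$ext" ] && { echo "Target exists: $1.$ext" >&2; return 1; }
    mv -- "$1" "$1.$ext"
}

# Usage (the directory name is an assumption):
#   for f in ./downloads/*; do [ -f "$f" ] && add_extension "$f"; done
```

Files whose type is not in the map are left untouched and reported on stderr, so the loop can be rerun after extending the map.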
Tailor getext to your needs, turn it into a script, and so on. The main concerns:
- On success (exit status 0), the output may be non-empty or empty (not even \n).
- Some mime-types return more than one extension. You can use cut -d ' ' -f 1 to get at most one, though it may not be the one you want.
- A custom map file instead of /etc/mime.types may therefore be useful. This command will show you which mime-types exist in the current directory (and subdirectories):
  find . -type f -exec file -b --mime-type {} + | sort | uniq
- grep shouldn't match more than once (at least with /etc/mime.types); ^ (line start) and $'\t' (tab) are there to avoid partial matching. Use grep -m 1 ... (or head -n 1 later) to be sure you'll get at most one line.
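The concerns above (partial matches, several extensions per type, empty output with exit status 0) could also be handled in a single awk call. This is a minimal sketch, not a drop-in replacement: first_ext is a hypothetical name, and it assumes the map file follows the mime.types layout of a type followed by whitespace-separated extensions.

```shell
# Hypothetical helper: print the first extension listed for a MIME type.
#   $1 = MIME type, $2 = map file (defaults to /etc/mime.types)
# Exit status is 0 if an extension was printed, 1 otherwise.
first_ext() {
    awk -v mt="$1" '
        # Exact string comparison on the first field, so no regex surprises
        # from "." or "+" in the type, and no partial matches.
        $1 == mt && NF > 1 { print $2; found = 1; exit }
        END { exit !found }
    ' "${2:-/etc/mime.types}"
}

# Usage:
#   first_ext "$(file -b --mime-type somefile)"
#   first_ext image/jpeg ./my-custom-map
```

Because awk splits fields on any run of tabs or spaces, the same function works for a custom map file as long as it keeps the type in the first column.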