How can I get the extension(s) of a file based on its content?

I'm planning on downloading a bunch of images from a website that don't come with an extension, so I want to add one based on the file's content or mime-type.

file <filename> does a great job of identifying the file type, but I need the extension.

--extension
      Print a slash-separated list of valid extensions for the file type found.

This is from file's man page, but it does not seem to work:

$ file --extension test_text_file.txt
test_text_file.txt: ???

$ file --extension test_png_file.png
test_png_file.png: ???

$ file --extension test_gif_file.gif
test_gif_file.gif: ???

It literally prints ??? for every file I pass to it, even those that already have a proper extension. All of these are valid files of their types and get recognized perfectly by file without --extension.

Why does file --extension not work for me and what can I use to get a file's extension?

An idea would be to use file --mime-type and then create a dispatch table array that maps known mime-types to their extensions, but I'd much rather have a simpler and safer solution.
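For illustration, that dispatch-table approach might be sketched like this in bash (4.0 or later, which is needed for associative arrays). The table entries and the function name mime_to_ext are hypothetical. A real table would need an entry for every MIME type you expect to encounter.

```shell
#!/usr/bin/env bash
# Hypothetical hand-written dispatch table: MIME type -> preferred extension.
# Requires bash 4+ for associative arrays (declare -A). Entries are illustrative.
declare -A mime_ext=(
  [image/png]=png
  [image/jpeg]=jpg
  [image/gif]=gif
  [text/plain]=txt
)

# mime_to_ext MIME-TYPE
# Print the mapped extension; return 1 (with no output) if the type is unknown.
mime_to_ext() {
  [ -n "$1" ] || return 1            # an empty key is an invalid array subscript
  local ext=${mime_ext[$1]-}         # empty if the key is not in the table
  [ -n "$ext" ] || return 1
  printf '%s\n' "$ext"
}
```

Combined with file, it could be called as mime_to_ext "$(file -b --mime-type somefile)". An unknown type produces no output and a non-zero exit status.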


Why does file --extension not work for me?

It's not just you. See this question; one of the comments there seems right:

Maybe just a very, very incomplete feature?

I haven't found any standard Unix tool to do the conversion, so your idea may be the easiest solution anyway.

An idea would be to use file --mime-type and then create a dispatch table array that maps known mime-types to their extensions, but I'd much rather have a simpler and safer solution.

Note that such a map already exists: /etc/mime.types. See this other question on Unix & Linux SE. Based on one of its answers, I came up with the following function:

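function getext() {
   # Expect exactly one argument: the file to inspect.
   [ "$#" -ne 1 ] && { echo "Wrong number of arguments. Provide exactly one." >&2; return 254; }
   [ -r "$1" ] || { echo "Not a file, nonexistent or unreadable." >&2; return 1; }
   # Find the line that starts with the detected MIME type followed by a tab,
   # then print the second tab-separated field (the list of extensions).
   grep "^$(file -b --mime-type "$1")"$'\t' /etc/mime.types |
      awk -F '\t+' '{print $2}'
}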

Usage:

getext test_text_file.txt   # it takes just one argument

Tailor it to your needs, turn it into a script, etc. The main caveats:

  • If it succeeds (exit status 0), the output may still be empty (not even \n), e.g. when the detected MIME type isn't listed in /etc/mime.types.
  • Some MIME types map to more than one extension. You can pipe the output through cut -d ' ' -f 1 to get at most one, though it may not be the one you want.
  • That's why a custom map file instead of /etc/mime.types may be useful. This command shows which MIME types occur among the files in the current directory (and its subdirectories):

    find . -type f -exec file -b --mime-type {} + | sort | uniq
    
  • grep shouldn't match more than once (at least with /etc/mime.types); ^ (line start) and $'\t' (tab) are there to avoid partial matching. Use grep -m 1 ... (or head -n 1 later) to be sure you'll get at most one line.
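A variant that addresses the caveats above might look like the following sketch. It assumes a mime.types-style map (a MIME type, then whitespace-separated extensions) and uses awk's exact comparison on the first field instead of a grep regex, so "." or "+" in a MIME type is matched literally. It prints only the first extension, returns a non-zero status when nothing is found, and accepts an optional custom map file. The names ext_for_mime and add_missing_extensions are hypothetical. The renaming helper treats any dot in a file name as an existing extension, which is a simplification.

```shell
#!/usr/bin/env bash
# ext_for_mime MIME-TYPE [MAP-FILE]   (hypothetical helper)
# Print the first extension listed for MIME-TYPE in MAP-FILE
# (default /etc/mime.types). Returns 1 with no output if none is found.
ext_for_mime() {
  local mime=$1 map=${2:-/etc/mime.types} ext
  [ -n "$mime" ] && [ -r "$map" ] || return 1
  # Exact string match on field 1; NF > 1 skips types with no extensions;
  # exit after the first match (like grep -m 1).
  ext=$(awk -v m="$mime" '$1 == m && NF > 1 { print $2; exit }' "$map")
  [ -n "$ext" ] || return 1
  printf '%s\n' "$ext"
}

# add_missing_extensions [DIR]   (hypothetical helper, requires file(1))
# Append a content-derived extension to each regular file in DIR whose
# name contains no dot. Never overwrites an existing file (mv -n).
add_missing_extensions() {
  local dir=${1:-.} f mime ext
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case ${f##*/} in *.*) continue ;; esac   # name already contains a dot
    mime=$(file -b --mime-type "$f") || continue
    if ext=$(ext_for_mime "$mime"); then
      mv -n -- "$f" "$f.$ext"
    else
      printf 'no extension known for %s (%s)\n' "$f" "$mime" >&2
    fi
  done
}
```

For example, ext_for_mime "$(file -b --mime-type test_png_file.png)" should print png if your map lists image/png with png as its first extension. Pass a custom map as the second argument to control which extension wins.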