What is the relationship between MIME types and File .extensions?

I may have the wrong idea here, but I thought that:

MIME types are identifying codes, embedded inside a file.
File .extensions are identifying codes, suffixed to the file's name.

I thought, from something I heard in the dim dark ages, that Linux was .extension agnostic... ( maybe it was in the early days, and things have changed since then...? )

I've recently come from the Windows world, where, at the Operating-System level, a file .extension is the only way (as far as I know) to associate it with its relevant Application program.

Because I don't know why, I find it a bit disconcerting that a file named "fred" and a file named "fred.txt" both open up in a Text Editor.

Is there a clear-cut hierarchy at work here?


Solution 1:

MIME types are just a way to name types. They don't have anything to do with how the type of a file is determined.

There are two ways to determine the type of a file: a) look at its extension and hope that it is accurate, or b) look at its contents and guess based on that. If a file has no extension, b is the only option.

Many (binary) file formats have a specific header that you can look at to determine their type. This makes option b quite reliable for those types.
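As a rough illustration of option b for binary formats, here is a minimal Python sketch that compares the first bytes of some data against a few well-known signatures. The table and the function name sniff_header are made up for this example, and real detectors know far more formats than this.

```python
# Hypothetical, deliberately tiny signature table: (leading bytes, MIME type name).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_header(data: bytes):
    """Return a MIME type name if data starts with a known signature, else None."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None
```

In practice you would pass it the first few bytes read from a file opened in binary mode.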

Plain text file formats can often be determined by their structure (if a file contains a lot of HTML tags, it's probably an HTML file).

On Unix and Linux systems you can use the "file" command-line utility to find out the type of a file based on its contents.
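If you want that answer from a script, one possible way is to call the utility from Python. This is only a sketch: the wrapper name file_mime_type is invented for this example, and it assumes a "file" implementation that accepts the --brief and --mime-type options.

```python
import shutil
import subprocess


def file_mime_type(path: str):
    """Ask the `file` utility for a MIME type; return None if it is not installed."""
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        # "--" stops option parsing, so a path starting with "-" is still a path.
        ["file", "--brief", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Called on a plain text file, it should report something like text/plain whatever the file is named, because "file" looks at the contents rather than the extension.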

File managers often use some combination of options a and b (e.g. look at the file extension first; if it's not known, or the file does not have an extension, look at the contents).
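A minimal Python sketch of that "extension first, contents second" strategy might look like the following. It is not how any particular file manager does it: the function name guess_type and the fallback rules (a PNG signature check and a "no NUL bytes means text" heuristic) are assumptions chosen to keep the example short.

```python
import mimetypes


def guess_type(path: str) -> str:
    """Guess a MIME type: try the extension first, then peek at the contents."""
    # Step a: look the extension up in Python's built-in table.
    mime, _encoding = mimetypes.guess_type(path)
    if mime is not None:
        return mime

    # Step b: unknown or missing extension, so inspect the first bytes.
    with open(path, "rb") as handle:
        head = handle.read(512)
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # Crude heuristic: binary formats usually contain NUL bytes, text rarely does.
    if b"\x00" in head:
        return "application/octet-stream"
    return "text/plain"
```

With this sketch, a text file named "fred" and one named "fred.txt" both come out as text/plain, the first via its contents and the second via its extension.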

A file's type is not stored as metadata on common Linux file systems.

Solution 2:

In Linux, the file extension is part of the file name and doesn't actually mean anything to the operating system. A MIME type is a description of the content of the file. fred and fred.txt would both have a MIME type of text/plain.

File managers, such as Nautilus, use this MIME type to know which program to open the file with. In a terminal, the xdg-open command does the same thing. However, this happens at the user-space level, not the operating-system level.

Files don't actually contain this MIME type, but the programs that open them use various methods to work out what it is. Some rely only on the file extension, but most use a mixture of techniques, including looking at the data at the beginning of the file.

If the beginning of the file only contains bytes that can be represented as ASCII characters, it is reasonably safe to assume it is a text file. If the extension is then something like .html, the program assumes it is an HTML page and so opens it with a web browser. The same thing works with binary file formats. For example, the bitmap (BMP) file format begins with 'BM' in ASCII, followed by binary data representing the image.

Put simply, Linux programs make an educated guess based upon the data in a file and its file extension. This may not sound very reliable but the algorithms used are more complicated than the examples I have given and are actually really accurate.

Solution 3:

In the Linux world, file extensions are only one indicator of the type of a file. There is a command-line tool called "file" which guesses the type of a file.

There are three main indicators of this type:

  • The extension
  • Special attributes in the filesystem (for symlinks, directories, device files, etc.)
  • The content
    • Binary (like ELF information in executables)
    • Textual (like <html>, #!/bin/bash)
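The second indicator in the list, the filesystem attributes, can be read without opening the file at all. Here is a small Python sketch using the standard os and stat modules; the function name filesystem_kind and the labels it returns are invented for this example.

```python
import os
import stat


def filesystem_kind(path: str) -> str:
    """Classify a path by the type bits the filesystem stores in its mode."""
    # lstat() does not follow symlinks, so a link is reported as a link.
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "regular file"
```

For a regular file this says nothing about what is inside it, which is where the extension and the content come in.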

The relation between the file type and the MIME type is that MIME is only a standard way to name a file's type (like text/css).

Linux "guesses" the file type and opens an associated program for that type.