Determine file type of an image

I'm downloading some images from a service that doesn't always include a content-type and doesn't provide an extension for the file I'm downloading (ugh, don't ask).

What's the best way to determine the image format in .NET?

The application that is reading these downloaded images needs to have a proper file extension or all hell breaks loose.


Solution 1:

A probably easier approach would be to use Image.FromFile() and then check the RawFormat property, since it already knows about the magic bytes in the headers of the most common formats, like this:

// Dispose the Image when done; FromFile keeps the file locked until then.
using (Image i = Image.FromFile("c:\\foo"))
{
    if (System.Drawing.Imaging.ImageFormat.Jpeg.Equals(i.RawFormat))
        MessageBox.Show("JPEG");
    else if (System.Drawing.Imaging.ImageFormat.Gif.Equals(i.RawFormat))
        MessageBox.Show("GIF");
    // Same for the rest of the formats
}

Solution 2:

You can use the code below without referencing System.Drawing and without creating an unnecessary Image object. You can also adapt Alex's solution to work without a stream or a reference to System.IO.

public enum ImageFormat
{
    bmp,
    jpeg,
    gif,
    tiff,
    png,
    unknown
}

public static ImageFormat GetImageFormat(Stream stream)
{
    // Requires: using System.IO; using System.Linq; using System.Text;
    // see http://www.mikekunz.com/image_file_header.html
    var bmp = Encoding.ASCII.GetBytes("BM");      // BMP
    var gif = Encoding.ASCII.GetBytes("GIF");     // GIF
    var png = new byte[] { 137, 80, 78, 71 };     // PNG
    var tiff = new byte[] { 73, 73, 42, 0 };      // TIFF, little-endian ("II", then 42)
    var tiff2 = new byte[] { 77, 77, 0, 42 };     // TIFF, big-endian ("MM", then 42)
    var jpeg = new byte[] { 255, 216, 255 };      // JPEG (JFIF, Exif and other variants)

    // Stream.Read may return fewer bytes than requested, so loop until
    // the buffer is full or the stream ends.
    var buffer = new byte[4];
    int read = 0, n;
    while (read < buffer.Length &&
           (n = stream.Read(buffer, read, buffer.Length - read)) > 0)
        read += n;

    if (bmp.SequenceEqual(buffer.Take(bmp.Length)))
        return ImageFormat.bmp;

    if (gif.SequenceEqual(buffer.Take(gif.Length)))
        return ImageFormat.gif;

    if (png.SequenceEqual(buffer.Take(png.Length)))
        return ImageFormat.png;

    if (tiff.SequenceEqual(buffer.Take(tiff.Length)))
        return ImageFormat.tiff;

    if (tiff2.SequenceEqual(buffer.Take(tiff2.Length)))
        return ImageFormat.tiff;

    if (jpeg.SequenceEqual(buffer.Take(jpeg.Length)))
        return ImageFormat.jpeg;

    return ImageFormat.unknown;
}

Solution 3:

Most image formats start with a fixed sequence of bytes:

  • JPG: 0xFF 0xD8
  • PNG: 0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A
  • GIF: 'G' 'I' 'F'

Search for "jpg file format" replacing jpg with the other file formats you need to identify.
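
A minimal sketch of checking those signatures by hand and mapping them to a file extension, since an extension is what the question ultimately needs. The names ImageSniffer and GuessExtension are made up for illustration and are not part of any library. Only the three formats listed above are covered; other signatures would be added to the table the same way.

```csharp
using System;

// Hypothetical helper (not a library API): maps leading bytes to an extension.
public static class ImageSniffer
{
    // Signature bytes taken from the list above.
    private static readonly (byte[] Signature, string Extension)[] Signatures =
    {
        (new byte[] { 0xFF, 0xD8 }, ".jpg"),
        (new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, ".png"),
        (new byte[] { (byte)'G', (byte)'I', (byte)'F' }, ".gif"),
    };

    // Returns the extension of the first matching signature, or null if none matches.
    public static string GuessExtension(byte[] header)
    {
        foreach (var (signature, extension) in Signatures)
        {
            if (StartsWith(header, signature))
                return extension;
        }
        return null;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

The first 8 bytes of the downloaded file are enough for these three signatures, so there is no need to load the whole image.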

As Garth recommends, there is a database of such 'magic numbers' identifying the file type of many files. If you have to detect a lot of different file types, it's worth looking through it to find the information you need. If you need to extend this to cover many file types, look at the associated file command, which implements the engine that uses the database correctly (it's non-trivial for many file formats, and is almost a statistical process).

-Adam

Solution 4:

Adam is pointing in exactly the right direction.

If you want to find out how to sense almost any file, look at the database behind the file command on a UNIX, Linux, or Mac OS X machine.

file uses a database of “magic numbers” — those initial bytes Adam listed — to sense a file's type. man file will tell you where to find the database on your machine, e.g. /usr/share/file/magic. man magic will tell you its format.

You can either write your own detection code based on what you see in the database, use pre-packaged libraries (e.g. python-magic), or — if you're really adventurous — implement a .NET version of libmagic. I couldn't find one, and hope another member can point one out.

In case you don't have a UNIX machine handy, the database looks like this:

# PNG [Portable Network Graphics, or "PNG's Not GIF"] images
# (Greg Roelofs, [email protected])
# (Albert Cahalan, [email protected])
#
# 137 P N G \r \n ^Z \n [4-byte length] H E A D [HEAD data] [HEAD crc] ...
#
0       string          \x89PNG         PNG image data,
>4      belong          !0x0d0a1a0a     CORRUPTED,
>4      belong          0x0d0a1a0a
>>16    belong          x               %ld x
>>20    belong          x               %ld,
>>24    byte            x               %d-bit
>>25    byte            0               grayscale,
>>25    byte            2               \b/color RGB,
>>25    byte            3               colormap,
>>25    byte            4               gray+alpha,
>>25    byte            6               \b/color RGBA,
#>>26   byte            0               deflate/32K,
>>28    byte            0               non-interlaced
>>28    byte            1               interlaced
1       string          PNG             PNG image data, CORRUPTED

# GIF
0       string          GIF8            GIF image data
>4      string          7a              \b, version 8%s,
>4      string          9a              \b, version 8%s,
>6      leshort         >0              %hd x
>8      leshort         >0              %hd
#>10    byte            &0x80           color mapped,
#>10    byte&0x07       =0x00           2 colors
#>10    byte&0x07       =0x01           4 colors
#>10    byte&0x07       =0x02           8 colors
#>10    byte&0x07       =0x03           16 colors
#>10    byte&0x07       =0x04           32 colors
#>10    byte&0x07       =0x05           64 colors
#>10    byte&0x07       =0x06           128 colors
#>10    byte&0x07       =0x07           256 colors
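
To show how entries like these translate into code, here is a rough C# sketch of just the PNG and GIF rules above: match the string at offset 0, then read the dimensions at the offsets the database gives (belong is a big-endian 32-bit value, leshort is a little-endian 16-bit value). MagicSketch and Describe are hypothetical names. This is not a port of libmagic, and the output only loosely imitates what file prints.

```csharp
using System;

// Hypothetical illustration of two magic-database entries; not a libmagic port.
public static class MagicSketch
{
    // "belong": 32-bit big-endian integer at the given offset.
    private static uint BeLong(byte[] d, int offset) =>
        ((uint)d[offset] << 24) | ((uint)d[offset + 1] << 16) |
        ((uint)d[offset + 2] << 8) | d[offset + 3];

    // "leshort": 16-bit little-endian integer at the given offset.
    private static int LeShort(byte[] d, int offset) =>
        d[offset] | (d[offset + 1] << 8);

    // "string": ASCII text expected at the given offset.
    private static bool HasAscii(byte[] d, int offset, string text)
    {
        if (d.Length < offset + text.Length) return false;
        for (int i = 0; i < text.Length; i++)
            if (d[offset + i] != (byte)text[i]) return false;
        return true;
    }

    public static string Describe(byte[] d)
    {
        // 0 string \x89PNG; >4 belong 0x0d0a1a0a; >>16 and >>20 belong (width, height)
        if (d.Length >= 24 && d[0] == 0x89 && HasAscii(d, 1, "PNG"))
        {
            if (BeLong(d, 4) != 0x0d0a1a0a) return "PNG image data, CORRUPTED";
            return $"PNG image data, {BeLong(d, 16)} x {BeLong(d, 20)}";
        }

        // 0 string GIF8; >6 and >8 leshort (width, height)
        if (d.Length >= 10 && HasAscii(d, 0, "GIF8"))
            return $"GIF image data, {LeShort(d, 6)} x {LeShort(d, 8)}";

        // Fallback for anything this sketch does not recognize.
        return "data";
    }
}
```

Passing the first 24 bytes of a file to Describe is enough for both rules; the remaining PNG lines (bit depth, color type, interlacing) would follow the same pattern at offsets 24, 25 and 28.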

Good luck!