Automating the scanning of graphics files for corruption

Does anyone know of a way to check graphics files (particularly JPEG, GIF, and PNG) for corruption (preferably in an automated way)?


Explanation:

A few days ago, a command worked incorrectly and ended up deleting thousands of graphics files from a FAT32 volume that was practically out of space. I’ve used several different file/photo-recovery programs, but naturally, they are limited in how much they can recover (though fortunately the volume has 8KB clusters, which helps somewhat).

Anyway, some of the larger files that were fragmented are now corrupt. Some of them are not even real files at all (the recovery software merely dumped the clusters pointed to by now-overwritten directory entries), while others are broken because of fragmentation.

Moreover, because some picture formats embed a smaller version of the picture as a thumbnail, scanning the thumbnails for corruption is not reliable: the thumbnail may be intact while the actual file (i.e., the picture when viewed full-size) is corrupt.


Here are a couple of examples:

Here’s the second one. It’s so damaged that it doesn’t display anything.

damaged image

(A third one wouldn’t even upload because it doesn’t even have the correct header!)


Solution 1:

Since I stumbled across this while trying to answer the same question I'll add another great solution I found:

Bad Peggy

Screenshot of the application

Usage
From the menu, select File > Scan and then use the file dialog to browse to the folder in which the images are located. The program will then start scanning the folder and all subfolders for images (.jpg, .png, .bmp, .gif). If you want to scan a lot of pictures, this will take some time because the program needs to fully load and parse each image file, so you might want to let it run overnight.

While it's scanning, it shows a progress percentage in the status bar. Any images it finds that are not perfect show up directly in the list. If you click any image in the list, it shows a preview of what the image looks like. Quite often an image will only have a minor issue with the file format and will still look just fine. Other times the image will not render at all and the preview will be just black. Sometimes the image will be damaged and you'll see something like the one in the screenshot above.

A very handy trick is to click the Reason column header: the images are then sorted according to how badly they're damaged (e.g. all the bad file formats that still render correctly move to the bottom, letting you focus on the more serious cases).

Also if the first scan has finished and you start another scan, the results will simply be added to the list. So if you have many different folders with images you can simply scan them sequentially without the list being cleared when you start a new scan. If you do want the list to clear, use the context menu and click Clear list.

Links
Downloads for Windows, Linux and OS X can be found here:
https://www.coderslagoon.com

Source code is here:
https://github.com/llaith/BadPeggy

Solution 2:

Try the jpeginfo '-c' option for your JPEG files.

I've seen the corruption you show happen with bad memory cards too.
What you want should be possible and available; see Corruption of Graphics Files,
a section of the online Encyclopedia of Graphics File Formats.

Also see File Integrity Checks in A Basic Introduction to PNG Features.
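
As a rough illustration of the kind of integrity check PNG supports: every PNG chunk stores a CRC-32 computed over its type and data, so a damaged chunk can be detected without decoding the image. Below is a minimal sketch using only the Python standard library. The function name check_png_chunks is made up for this example, and it only checks the chunk structure and CRCs, not whether the pixel data actually decompresses into a sensible picture.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_png_chunks(data):
    """Return a list of problems found in a PNG byte string (empty if none)."""
    if not data.startswith(PNG_SIGNATURE):
        return ["missing PNG signature"]

    problems = []
    seen_iend = False
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        if pos + 8 > len(data):
            problems.append(f"truncated chunk header at offset {pos}")
            break
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(data):
            problems.append(f"chunk {ctype!r} at offset {pos} runs past end of file")
            break
        (stored_crc,) = struct.unpack(">I", data[end - 4:end])
        # The CRC covers the chunk type and chunk data, but not the length field.
        actual_crc = zlib.crc32(data[pos + 4:end - 4]) & 0xFFFFFFFF
        if stored_crc != actual_crc:
            problems.append(f"bad CRC in chunk {ctype!r} at offset {pos}")
        pos = end
        if ctype == b"IEND":
            seen_iend = True
            break

    if not seen_iend:
        problems.append("IEND chunk not found")
    return problems
```

To try it on a file, read the bytes first, e.g. check_png_chunks(open("picture.png", "rb").read()), where picture.png is a placeholder name. An empty list means the chunk layout and checksums look consistent.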

You may be interested in this Stack Overflow question,
How do I programmatically check whether an image (PNG, JPEG, or GIF) is corrupted?


Update: Source tarball for jpeginfo version 1.6.1 by Timo Kokkonen.
You should be able to build a binary for your machine.

Solution 3:

ImageMagick's identify program will let you know if an image is corrupt. A 'for i in find' loop testing for a non-zero return code from identify would let you script the test pretty easily and dump a list of damaged or corrupted files. It works on Windows with PowerShell too.

The following code, with the paths changed to match your system, works well in PowerShell:

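# Write the full path of every file that identify rejects to corrupt_jpegs.txt
$stream = [System.IO.StreamWriter] "corrupt_jpegs.txt"
Get-ChildItem "c:\" -Include *.jpg -Recurse | ForEach-Object {
    # Discard identify's normal output; only the exit code matters here
    & "C:\Program Files\ImageMagick-6.7.1-Q16\identify.exe" $_.FullName > $null
    if ($LastExitCode -ne 0) {
        $stream.WriteLine($_.FullName)
    }
}
$stream.Close()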
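
For anyone who would rather not use PowerShell or a shell loop, the same idea (run a checker on each file and keep the ones where it exits with a non-zero code) can be sketched in Python. This is only an illustration: the function name find_failing_files and its parameters are made up for this example, and it assumes the checker, identify by default, is on your PATH. Otherwise pass its full path in the tool argument.

```python
import subprocess
from pathlib import Path


def find_failing_files(root, tool=("identify",),
                       suffixes=(".jpg", ".jpeg", ".png", ".gif")):
    """Return the paths under root for which the external tool exits non-zero."""
    failing = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Run the tool on one file; its output is discarded because only the
        # exit code is used to decide whether the file is reported.
        result = subprocess.run(
            [*tool, str(path)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failing.append(path)
    return failing


# Example (the folder is a placeholder):
# for path in find_failing_files(r"C:\Pictures"):
#     print(path)
```

The tool is passed as a sequence so that extra arguments can be added, for example the full path to identify.exe as in the PowerShell version above.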

Solution 4:

This can be done by using the Python Imaging Library's .verify() method.[1]

To run this in Windows, install Python (I installed the current latest release of Python 2), and then install Pillow (a fork of Python Imaging Library (PIL)). Then, copy the code of jpeg_corrupt.py[2] and save its contents to a .PY file, e.g. jpeg_corrupt.py.

Note that I changed the following line of code in jpeg_corrupt.py:
self.globs = ['*.jpg', '*.jpe', '*.jpeg']
to
self.globs = ['*.jpg', '*.jpe', '*.jpeg', '*.png', '*.gif']
This is so that .PNG and .GIF files are scanned too.

It can then be executed through the Windows command prompt (cmd.exe) like this: C:\Python27\python.exe "C:\Directory containing the .PY file\jpeg_corrupt.py" "C:\Directory of folder to be scanned"

The first part of the command, 'C:\Python27\python.exe', might be different depending on which version of Python you installed and which directory you installed it to. In my example, it is the default installation directory of Python 2.7.

It should scan all JPG, GIF and PNG images in the specified directory and all of its subdirectories. It prints a message for each image file it detects as corrupted.

I ran this on OP's sample image and it gave this error message: ...\YcB9n.png: string index out of range.

The command could also be put in a .BAT script file, so you can easily run it on a specified directory without needing to use the command prompt:

C:\Python27\python.exe "C:\Directory containing the .PY file\jpeg_corrupt.py" "%CD%"
pause
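
If you would rather not depend on the linked script, the .verify() approach can be sketched in a few lines. This is not the jpeg_corrupt.py code, just a minimal illustration written for Python 3 with Pillow installed. The names check_image and scan_tree are made up for this example. Because verify() does not decode the pixel data, the sketch reopens the file and calls load() as well, which should also catch damage in the full-size picture data (for multi-frame GIFs, only the first frame is decoded here).

```python
from pathlib import Path

from PIL import Image  # provided by the Pillow package


def check_image(path):
    """Return None if the image decodes cleanly, else a short error description."""
    try:
        # verify() checks the file structure without decoding the image data.
        with Image.open(path) as img:
            img.verify()
        # An image object cannot be reused after verify(), so reopen the file
        # and force a full decode of the pixel data.
        with Image.open(path) as img:
            img.load()
    except Exception as exc:  # broad on purpose: bad data raises many exception types
        return f"{type(exc).__name__}: {exc}"
    return None


def scan_tree(root, suffixes=(".jpg", ".jpe", ".jpeg", ".png", ".gif")):
    """Yield (path, error) for every image under root that fails check_image."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            error = check_image(path)
            if error is not None:
                yield path, error


# Example (the folder is a placeholder):
# for path, error in scan_tree(r"C:\Directory of folder to be scanned"):
#     print(f"{path}: {error}")
```

The full decode makes the scan slower than verify() alone, so on a very large collection it may be worth running the verify() step first and the load() step only on files that pass.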



Sources:

[1]: Answer in Stack Overflow - "How do I programmatically check whether an image (PNG, JPEG, or GIF) is corrupted?" by ChristopheD
[2]: Comment by Denilson Sá in the SO answer linked in [1]