Is it possible to store arbitrary data in an image file?
Solution 1:
For most file formats, yes. For example, PNG files are composed of typed chunks, so you could add a chunk named aAAA or lOLZ with arbitrary data. JPEG has "application-specific" segments (APPn); the Exif tags in JPEGs are actually a complete TIFF structure inside such a segment. Other formats such as GIF are less extensible, but they often do have a field for textual comments; this has already been abused.
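To make the PNG case concrete, here is a minimal sketch of adding such a chunk. It builds a 1x1 grayscale PNG in memory so that no input file is needed; the helper names (make_chunk, minimal_png, add_chunk_before_iend, read_chunks) are made up for this example, and it assumes a well-formed file whose last 12 bytes are the empty IEND chunk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk: 4-byte length, 4-byte type, data, CRC of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def minimal_png() -> bytes:
    """Build a 1x1, 8-bit grayscale PNG (hypothetical test image)."""
    # width, height, bit depth, color type, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # one filter byte + one pixel byte
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat)
            + make_chunk(b"IEND", b""))


def add_chunk_before_iend(png: bytes, chunk_type: bytes, payload: bytes) -> bytes:
    """Insert a new chunk right before IEND (assumed to be the last 12 bytes)."""
    iend_start = len(png) - 12
    if png[iend_start + 4:iend_start + 8] != b"IEND":
        raise ValueError("expected the file to end with an IEND chunk")
    return png[:iend_start] + make_chunk(chunk_type, payload) + png[iend_start:]


def read_chunks(png: bytes):
    """Yield (type, data) for every chunk after the signature."""
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC


png = add_chunk_before_iend(minimal_png(), b"aAAA", b"arbitrary payload")
print([chunk_type for chunk_type, _ in read_chunks(png)])
# [b'IHDR', b'IDAT', b'aAAA', b'IEND']
```

The lowercase first letter of the chunk name marks it as ancillary, so a decoder that does not recognize it should simply skip it and still display the image.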
However, there are ways to protect against this; for example, websites such as Imgur automatically process all uploads with pngcrush or similar tools, which drop anything that's not absolutely required.
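The idea behind that kind of cleanup can be sketched as follows. This is only an illustration, not what pngcrush or Imgur actually run: it keeps the critical chunks (those whose name starts with an uppercase letter) and drops everything else, including anything after IEND. Some ancillary chunks affect how the image is rendered, so a real optimizer is more selective. The function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk (used here only to build a sample file)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def strip_ancillary(png: bytes) -> bytes:
    """Return a copy of the PNG that contains only its critical chunks."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    kept = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        end = pos + 12 + length
        # Bit 5 of the first type byte is set for lowercase letters,
        # which marks the chunk as ancillary (safe for a decoder to ignore).
        if not chunk_type[0] & 0x20:
            kept.append(png[pos:end])
        pos = end
        if chunk_type == b"IEND":
            break  # ignore any bytes after the end marker
    return b"".join(kept)


# Sample: a 1x1 grayscale image with an extra chunk and trailing bytes.
sample = (PNG_SIGNATURE
          + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
          + make_chunk(b"lOLZ", b"hidden payload")
          + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
          + make_chunk(b"IEND", b"")
          + b"trailing junk")

cleaned = strip_ancillary(sample)
print(b"hidden payload" in cleaned, b"trailing junk" in cleaned)
# False False
```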
But in the end, data exchange cannot be prevented. Aside from the aforementioned image steganography, you have Twitter and its clones, dozens of pastebins (in which incomprehensible posts are considered fairly normal), comment forms of old blog posts (I'm still trying to remember the book this was suggested in), and so on. More realistically, most malware will simply contact its "own" servers.
Solution 2:
There is no such thing as malicious data. Data doesn't become malicious until it's executed, at which point it's no longer data. The problem with this sort of thing would not be the image, it'd be the software (Windows, Photoshop, whatever) that contains a bug that causes the data to be executed. This is obviously an important concern of major software vendors, and you can be fairly sure they would fix these bugs very soon after they've been discovered.
That said, as is stated in the other answers, it is possible to add data that is not part of the image itself to the file. However, this is often useful or even standard practice. I think it's much more important to be careful with executables than with random images you find on the internet. The risk here is not that big.
Solution 3:
Image files, including PNG, have a specific format. The header portion of the file describes the image, and any data following would be interpreted as image data (based on the headers).
However, you can append arbitrary data to the end of a PNG, past the image data, which can then be read later. This would be fairly easy to detect, since there shouldn't be any data past the end of the image data (in a PNG, that means anything after the IEND chunk).
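Detecting appended data could look roughly like this. The sketch walks the chunk list until it reaches IEND and returns whatever bytes follow; it assumes the chunk lengths in the file are trustworthy and does not verify CRCs. The names png_end_offset, trailing_data and sample_png are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one chunk (used here only to build a sample file)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def sample_png() -> bytes:
    """Build a 1x1, 8-bit grayscale PNG so the example needs no input file."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
            + make_chunk(b"IEND", b""))


def png_end_offset(data: bytes) -> int:
    """Return the offset just past the IEND chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        pos += 12 + length  # length + type + data + CRC
        if chunk_type == b"IEND":
            return pos
    raise ValueError("no IEND chunk found")


def trailing_data(data: bytes) -> bytes:
    """Return any bytes stored after the end of the PNG proper."""
    return data[png_end_offset(data):]


clean = sample_png()
tampered = clean + b"secret appended after the image"
print(trailing_data(clean))
# b''
print(trailing_data(tampered))
# b'secret appended after the image'
```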
Alternatively, you can encode arbitrary data into the image itself, using steganography. This subtly alters the image itself in a way that is largely undetectable unless you know exactly what to look for (prior knowledge of the encoding method is often required).
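One of the simplest such schemes is least-significant-bit embedding, sketched below as an illustration only (real tools are usually more sophisticated). It works on a flat sequence of 8-bit sample values, which is assumed to be already-decoded pixel data; decoding and re-encoding the image file is left out. Each message bit replaces the lowest bit of one sample, so no sample changes by more than 1. The function names are hypothetical.

```python
def embed_lsb(samples: bytes, message: bytes) -> bytes:
    """Hide message bits in the least significant bit of successive samples."""
    bits = [(byte >> shift) & 1
            for byte in message
            for shift in range(7, -1, -1)]  # most significant bit first
    if len(bits) > len(samples):
        raise ValueError("message does not fit into the available samples")
    out = bytearray(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the low bit, then set it
    return bytes(out)


def extract_lsb(samples: bytes, length: int) -> bytes:
    """Recover `length` bytes; the reader must know the length in advance."""
    message = bytearray()
    for i in range(length):
        value = 0
        for sample in samples[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        message.append(value)
    return bytes(message)


# Stand-in for decoded pixel data: 256 grayscale samples.
pixels = bytes(range(256))
stego = embed_lsb(pixels, b"hi")
print(extract_lsb(stego, 2))
# b'hi'
print(max(abs(a - b) for a, b in zip(pixels, stego)))
# 1
```

Note that the receiver needs to know the bit order and the message length, which is the "prior knowledge of the encoding method" mentioned above.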