Opened a JPG picture with notepad, pasted all the "text" to a new notepad file, changed to .JPG and it no longer opens. Why?
This phenomenon has left me puzzled.
Here is the experiment in detail (my OS is Windows 7 x64 SP1):
- I changed a picture (JPG) file to TXT by simply changing its extension (or one could just choose to open the JPG with notepad, same thing)
It should look like odd sequences of characters, and a few of them (very rare) are actually meaningful, such as "creator: dg-jpeg v1.0..." in my screenshot.
- I disabled wrapping and selected all the text using Ctrl+A (to make sure nothing's missed)
- I pasted the copied text into another blank TXT file and saved it as JPG, then compared the new file's size with the original JPG. All of them (the original JPG, the converted TXT file and the newly created TXT file) are exactly the same size, to the byte.
When I tried to open the new file, Windows said "Windows Photo Viewer can't open this picture because the file appears to be damaged, corrupted, or is too large".
I even tested it another way: I opened the JPG with Notepad, cut ONE known character from an easy-to-remember location (like the first character of the 2nd line) and saved the file. The viewer of course displayed the same message. Then I opened the file again and pasted the character back at the EXACT location (Notepad remembers its exit state such as window position, wrapping and font size, so I had no problem getting this right).
And still the same error. You can try this to get the idea; remember to choose a small picture, or else Notepad will act like an old rusty man.
What could have been the cause of this phenomenon?
Solution 1:
Depending on the encoding used to open the file, you might see different behaviour. My Windows 7 Notepad lets you open a file as ANSI, UTF-8, Unicode or Unicode big endian.
I tested this with a small 2x2 pixel JPEG image created with GIMP, opening and saving the image file with ANSI encoding. Comparing the original and the saved image in a hex editor, I see that every 00 byte (two hex digits, the NUL control character) has been converted to 20 (the space character).
Replacing every 20 with 00 again in the hex editor restores the image.
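To make the effect concrete, here is a minimal sketch that mimics the substitution on a few made-up bytes. It is not what Notepad runs internally; the function names and the sample bytes are assumptions for illustration only. It also shows why the "replace 20 with 00" repair is only safe when the original file contained no genuine 20 bytes.

```python
def notepad_like_roundtrip(data: bytes) -> bytes:
    """Mimic the observed effect: every NUL byte (0x00) becomes a space (0x20)."""
    return data.replace(b"\x00", b"\x20")


def naive_restore(data: bytes) -> bytes:
    """Undo the substitution by turning every 0x20 back into 0x00."""
    return data.replace(b"\x20", b"\x00")


# Made-up sample bytes that merely start like a JPEG header; not a real image.
original = bytes.fromhex("ffd8ffe000104a46494600")
damaged = notepad_like_roundtrip(original)

print(len(original) == len(damaged))       # the size is unchanged
print(original == damaged)                 # but the content is not
print(naive_restore(damaged) == original)  # repair works: no real 0x20 in the sample

# If the original had a genuine space, the naive repair destroys it.
tricky = b"a b\x00"
print(naive_restore(notepad_like_roundtrip(tricky)) == tricky)
```

This matches the observation in the question that the file size stays identical to the byte while the picture no longer opens.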
I googled it a bit and didn't find any references that explain why it does that, only a reference to a post that warns about it (Google cache link, the page is no longer available).
If you open/save the file as UTF-8, it seems that it still converts NUL characters to spaces, but the resulting file is also larger because single-byte characters are converted to multi-byte UTF-8 sequences.
If you open/save the file as Unicode, it seems that it still converts NUL characters to spaces, but it also adds a byte order mark (the BOM, two bytes in UTF-16) to the beginning of the file.
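The size changes can be reproduced in a few lines. This is only a sketch under assumptions: Latin-1 stands in for Notepad's "ANSI" code page (which is usually Windows-1252 on Western systems), Notepad's "Unicode" is taken to be little-endian UTF-16, and the sample bytes are made up.

```python
import codecs

# Made-up sample: four bytes >= 0x80 followed by two plain ASCII letters.
raw = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x41, 0x42])

# Assumption: treat each byte as one character, roughly what an "ANSI" open does.
text = raw.decode("latin-1")

as_utf8 = text.encode("utf-8")
# Assumption: "Unicode" in Notepad means UTF-16 little-endian with a BOM.
as_utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(len(raw))            # 6
print(len(as_utf8))        # 10: each byte >= 0x80 became two bytes
print(len(as_utf16))       # 14: 2-byte BOM + 2 bytes per character
print(as_utf16[:2].hex())  # fffe
```

Either way the bytes on disk are no longer the bytes of the image, even before the NUL substitution is taken into account.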
Solution 2:
Why it fails:
Notepad substitutes a space (ASCII code 32) for characters like NUL (ASCII code 0), because the Windows API's text box only accepts a null-terminated char * (ASCIIZ: a pointer to a character array). Without the substitution the text would be cut off at the first NUL.
That happens because the Windows API is mostly written in C, where null-terminated strings are a common feature. Even in modern Windows with Unicode, the same null-terminated strings occur. So Notepad simply replaces the NULs with spaces so that you can view the complete file.
So when you save the file it is corrupted.
Reference: Wikipedia, "Null-terminated string"
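The truncation can be illustrated without touching the Windows API at all. The sketch below uses Python's ctypes to read a buffer the way C string functions would; the sample bytes are made up, and this only demonstrates the null-termination convention, not Notepad's actual code.

```python
import ctypes

# Made-up data with a NUL in the middle, as a binary file typically has.
data = b"JFIF\x00rest of the file"

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(data)

# .value reads the buffer as a C string: it stops at the first NUL.
as_c_string = buf.value
print(as_c_string)  # b'JFIF'

# .raw exposes the whole buffer, so nothing was lost in memory,
# it is just invisible to anything that treats the buffer as a C string.
print(buf.raw[:len(data)] == data)  # True

# Swapping NUL for a space keeps the whole text visible, at the cost of
# changing the bytes.
visible = ctypes.create_string_buffer(data.replace(b"\x00", b" ")).value
print(visible)  # b'JFIF rest of the file'
```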
How to do further research:
You can use a comparison tool such as Beyond Compare (commercial, with a trial) to see the character replacement effect. Other binary compare tools work too.
Note: 20 in hexadecimal is 32 in decimal, i.e. (20)16 = (32)10
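If no compare tool is at hand, a byte-level comparison is easy to sketch. The helper name `diff_bytes` and the sample bytes are hypothetical; for real files you would read both with `open(path, "rb").read()` and pass the results in.

```python
def diff_bytes(a: bytes, b: bytes, limit: int = 10):
    """Return up to `limit` (offset, byte_a, byte_b) tuples where the inputs differ.

    Only the common prefix length is compared, so use it on equal-sized inputs.
    """
    diffs = []
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            diffs.append((offset, x, y))
            if len(diffs) == limit:
                break
    return diffs


# Made-up sample with one NUL byte at offset 4, and a copy with NUL -> space.
original = bytes.fromhex("ffd8ffe000104a464946")
edited = original.replace(b"\x00", b" ")

for offset, x, y in diff_bytes(original, edited):
    print(f"offset {offset:#06x}: {x:02x} -> {y:02x}")  # offset 0x0004: 00 -> 20

print(int("20", 16))  # 32
```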
Why Notepad acts slowly on large files
It checks each character and replaces special characters with spaces. Other software does not do such in-memory conversions (at least not as primitive as Notepad's); it just renders special characters differently, and it uses more advanced buffering techniques.
Looking into Notepad.exe (XP 32 bit)
(I'm assuming it's still written in C++, or at least uses a comparable linker.)
I'm using the PEiD tool (whose development stopped with the introduction of PE+/64-bit exes).
PEiD can be found bundled in the bin folder of Universal Extractor.
I extracted the notepad.ex_ file from the Windows XP ISO. Try it out: it's a CAB file, which you can extract using 7z.
Warning! Your virus scanner might detect Universal Extractor/PEiD as hack tools or viruses. If you don't trust it, don't download it!
Further info about the Windows API
Credit: Jason C
It's not just the text box; WM_SETTEXT in general provides no parameter for specifying the string length, and strings are always assumed to terminate at null. You could always create a custom text box with a custom message that specified the string length, but Notepad and most other programs reasonably do not. The function SetWindowText does not provide a length parameter either.
Solution 3:
Notepad does not preserve all special / extended characters exactly as they are. I don't have a reference for this behaviour immediately at hand, but I have found it to be the case with, for example, the UNIX-style end of line LF, which Notepad will convert into CRLF, and null (0x00), which it will ignore. In a binary file such as a JPG there are liable to be random occurrences of the character(s) that Notepad does not preserve. Try your experiment with a hex-aware editor and it should work. I'll update my answer if I find a good reference and once I've tested a hex editor.
Update: I tried a few well-known programmers' editors but only one of them worked right off the bat, HxD by Maël Hörz. I had never used HxD before but found it thanks to an answer to this Stack article, A hex viewer / editor plugin for Notepad++.
The editors that didn't work after a few minutes' effort were Notepad++, Notepad2 and UltraEdit (v17.3, an older version). A couple of these had problems with the copy / paste of the first few bytes, the JPEG file signature (magic number) FF D8 FF. Maybe they would work with a little more fiddling than I have time for at present.
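A quick way to see whether a copy survived is to check those first three bytes. The sketch below is an assumption-laden illustration, not a full JPEG validator: the helper name `looks_like_jpeg` is made up, and the UTF-8 re-encoding is just one plausible way an editor could mangle the signature.

```python
JPEG_SIGNATURE = bytes([0xFF, 0xD8, 0xFF])


def looks_like_jpeg(data: bytes) -> bool:
    """Check only the leading magic number; says nothing about the rest of the file."""
    return data.startswith(JPEG_SIGNATURE)


# Made-up header bytes, not a complete image.
header = bytes([0xFF, 0xD8, 0xFF, 0xE0])
print(looks_like_jpeg(header))  # True

# Hypothetical mishap: the bytes are treated as text and written back as UTF-8.
mangled = header.decode("latin-1").encode("utf-8")
print(mangled.hex())             # c3bfc398c3bfc3a0
print(looks_like_jpeg(mangled))  # False
```

For a file on disk, `looks_like_jpeg(open(path, "rb").read(3))` would do the same check, where `path` is whatever file you are testing.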
Solution 4:
You used to be able to do this with Write back in the day. It was a standard program in Windows 3.1 but I can't remember if Windows 95 included it. Write would allow binary safe editing of any file it could open (probably very limited file size). Notepad is definitely not binary safe (the text remains the same but the actual bytes of non-text characters [e.g. control codes] may change) which is why your JPG example is not working. Try getting a copy of Write (and very old Windows) and try your experiment again!
According to Wikipedia's "Windows Write" article, Write was included up to Windows NT 3.5. It was replaced by WordPad from Windows 95 onwards; write.exe was still present in the Windows directory but was simply a wrapper for opening WordPad.