Why don't you see binary code when you open a binary file with a text editor?
Solution 1:
Binary and text data aren't separated: They are simply data. It depends on the interpretation that makes them one or the other. If you open binary data (such as an image file) in a text editor, much of it won't make sense, because it does not fit your chosen interpretation (as text).
What you call text is a subset of the possible file contents: Data that in a given character set translates to readable characters.
For example, in ASCII, you can see that, of 128 "allowed" values, only about half are letters and digits, about 30 are punctuation, and the rest are mostly control characters. The latter group just isn't used a lot in text files, and its members have no really good textual representation. Tab and newline are among them, and text editors already need to get creative in displaying those.
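If you want to check that split yourself, here is a rough sketch in Python. The category boundaries are my own choice (I count the space character separately), so treat the exact numbers as one reasonable way of slicing it rather than an official classification.

```python
import string

# Sort all 128 ASCII code points into rough categories.
alnum = string.ascii_letters + string.digits
counts = {
    "letters and digits": sum(1 for c in range(128) if chr(c) in alnum),
    "punctuation": sum(1 for c in range(128) if chr(c) in string.punctuation),
    "control": sum(1 for c in range(128) if c < 32 or c == 127),
    "space": sum(1 for c in range(128) if c == 32),
}

for name, count in counts.items():
    print(f"{name}: {count}")
print("total:", sum(counts.values()))
```

Tab (9), line feed (10) and carriage return (13) all land in the "control" bucket here.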
Some text editors have options to explicitly display whitespace. Then they'll actually be drawn as characters, in addition to their regular formatting behavior (which is also just the interpretation of these characters).
Pure ASCII only interprets 128 values. The bytes used to store this information have 256 possible values each, so half of the possible values aren't allowed in ASCII. Those are e.g. used in region-specific character sets, such as Latin 1, but in ASCII, they're undefined. They have no useful representation in a text viewer that can only handle ASCII.
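A quick way to see this, sketched in Python: the same three bytes are rejected by a strict ASCII decoder but are perfectly fine as Latin-1. The sample bytes are just ones I picked for the demonstration.

```python
data = bytes([0x48, 0x69, 0xE9])  # 'H', 'i', and one byte above 0x7F

# Strict ASCII has no meaning for 0xE9, so decoding fails.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Latin-1 assigns a character to every byte value from 0x00 to 0xFF.
latin1_text = data.decode("latin-1")

print("valid ASCII:", ascii_ok)
print("as Latin-1:", latin1_text)
```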
Binary data is not usually interpreted as text. So in these files, all possible byte values are commonly found. Everything else would be wasteful (and that's a reason you can compress text very well). Image file formats are complicated, and you don't usually view them as text, so they don't need to be readable.
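To get a feel for the difference, here is a small Python comparison. Note the assumption: I use random bytes as a stand-in for dense binary data. Real binary formats are not random, although compressed ones (JPEG, PNG, ZIP) come fairly close.

```python
import os
import zlib

# English prose standing in for a text file.
text = (
    b"Binary and text data aren't separated: they are simply data. "
    b"It depends on the interpretation that makes them one or the other. "
    b"If you open binary data in a text editor, much of it won't make sense, "
    b"because it does not fit your chosen interpretation as text. "
    b"What you call text is a subset of the possible file contents."
)
# Random bytes standing in for dense binary data (an assumption for this demo).
binary = os.urandom(4096)

# How many of the 256 possible byte values actually occur?
text_values = len(set(text))
binary_values = len(set(binary))

# Compressed size relative to original size (smaller means more redundancy).
text_ratio = len(zlib.compress(text)) / len(text)
binary_ratio = len(zlib.compress(binary)) / len(binary)

print(f"text:   {text_values} distinct byte values, ratio {text_ratio:.2f}")
print(f"binary: {binary_values} distinct byte values, ratio {binary_ratio:.2f}")
```

The text only touches a small slice of the byte range and shrinks when compressed; the random data uses nearly every value and does not shrink.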
As there is no common data interpretation (character set) that maps all possible values to readable characters, and since that wouldn't make a lot of sense anyway (as it's not readable text), major parts are displayed as gibberish.
A hex editor chooses a different representation for the data: It displays each byte as two hexadecimal digits. It's just a different representation, and one with an easily human-readable character set: All 256 possible byte values can be represented as two hex digits.
Since there's an easy mapping of binary data to hex and vice versa (4 binary digits to/from one hexadecimal digit), and binary contains very little information per digit, hexadecimal is generally the preferred way for humans to read binary, unless there are specific reasons to prefer a different representation.
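As a minimal sketch of what a hex view does (the function names are mine, not from any particular hex editor), this Python snippet renders the same bytes once as hex and once as binary:

```python
def to_hex(data: bytes) -> str:
    """Render each byte as two hexadecimal digits."""
    return " ".join(f"{b:02X}" for b in data)


def to_binary(data: bytes) -> str:
    """Render each byte as eight binary digits."""
    return " ".join(f"{b:08b}" for b in data)


sample = bytes([0x00, 0x7F, 0xFF, 0x41])
hex_view = to_hex(sample)
bin_view = to_binary(sample)

print(hex_view)
print(bin_view)

# Each hex digit corresponds to exactly four binary digits.
high_nibble = int("0111", 2)   # the hex digit 7
low_nibble = int("1111", 2)    # the hex digit F
print(f"{high_nibble:X}{low_nibble:X}")
```

The binary line carries the same information as the hex line but needs four times as many digits.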
Some text editors might have a hex editor mode and some heuristic that tries to determine whether a file is text or binary, and automatically select one mode or the other. But this can be difficult to get right, because no property of the file itself says whether it's one kind or the other.
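To illustrate what such a heuristic might look like, here is a hypothetical one in Python. The function name, the 30% threshold and the NUL-byte rule are all my own assumptions, not the rule any specific editor uses.

```python
def looks_binary(data: bytes, threshold: float = 0.30) -> bool:
    """Guess whether data is binary. A hypothetical heuristic only."""
    if not data:
        return False
    # NUL bytes almost never occur in plain single-byte text.
    if b"\x00" in data:
        return True
    # Bytes plain ASCII text commonly contains: printable range, tab, LF, CR.
    text_bytes = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    suspicious = sum(1 for b in data if b not in text_bytes)
    return suspicious / len(data) > threshold


print(looks_binary(b"Hello, world!\n"))
print(looks_binary(bytes([0x00, 0x00, 0x00, 0x7F, 0xFF])))
# Perfectly good non-English text in UTF-8 consists mostly of bytes above
# 0x7F, so this naive rule misjudges it.
print(looks_binary("ÿÿÿÿÿÿÿÿÿÿ".encode("utf-8")))
```

The last call shows why this is hard to get right: the guess depends on assumptions about the encoding, which is exactly the information the file doesn't carry.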
Some FTP clients ask you to specify which file extensions denote text data. These programs will then change the line endings in those files to match the OS of the machine you're connected to, as Windows uses a different line ending character sequence (CR/LF) than Linux and Unix (including Mac OS X; LF).
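The conversion itself is simple. Here is a sketch in Python (function names are mine; this shows the idea, not any FTP client's actual code), including what happens when it is wrongly applied to binary data:

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Replace every CR/LF pair with a single LF."""
    return data.replace(b"\r\n", b"\n")


def to_windows_line_endings(data: bytes) -> bytes:
    """Replace every line ending with CR/LF."""
    # Normalize first so existing CR/LF pairs are not doubled.
    return to_unix_line_endings(data).replace(b"\n", b"\r\n")


text = b"first line\r\nsecond line\r\n"
print(to_unix_line_endings(text))

# Binary data that merely happens to contain the bytes 0x0D 0x0A
# gets silently altered by the same conversion.
blob = bytes([0x00, 0x0D, 0x0A, 0xFF])
damaged = to_unix_line_endings(blob)
print(len(blob), len(damaged))
```

That is why the client needs to be told which files are text: for anything else, the conversion is corruption.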
Solution 2:
Because you've opened it in a text editor, not a binary editor.
Solution 3:
It's all to do with context and interpretation. What's in your computer is patterns of high and low voltage, or magnetised regions of a disk, that only gain meaning when we decide how we want to interpret them.
Under different circumstances, the pattern low-high-low-low-low-low-low-high might mean the number 65, a capital letter 'A', a sky-blue colour, that a customer ordered coffee, the date 'March 6th' or anything at all, really.
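Here is that same pattern in Python, read a few different ways. The number and letter readings are standard; the two lookup tables are made up purely for illustration, since a colour or an order code only means something to the program that defined it.

```python
pattern = "01000001"  # low-high-low-low-low-low-low-high, written as bits

as_number = int(pattern, 2)      # read as an unsigned integer
as_letter = chr(as_number)       # read as an ASCII character
as_hex = f"0x{as_number:02X}"    # the same value in hexadecimal

# Hypothetical application-specific meanings for the same value.
palette = {65: "sky blue"}
order_codes = {65: "coffee"}

print(as_number, as_letter, as_hex)
print(palette[as_number], order_codes[as_number])
```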
When you open your image file in a graphics program, it knows to interpret it as an image, knows which patterns indicate the image format, which patterns indicate the image size and so on.
When you open your image file in a text editor, it gets treated as text. This is a very simple format, much closer to what's really going on in the computer, but there is still some interpretation going on. Specifically, nearly every pattern gets interpreted as a particular character, some normal like A-Z, but also some weird characters. A few patterns don't show up as characters but instead are treated as basic formatting: newline, tab.
(The situation is slightly complicated by things such as Unicode and text encodings such as UTF-8 but I won't deal with those here for the sake of simplicity.)
When you have a binary file open in a text editor, take care not to make changes, because almost any change you make will completely disrupt the normal interpretation of the file's contents. That is, it will ruin the file and make it unusable.
Solution 4:
As a simplified example, consider an image file opened with a text editor.
The image is a simple chess pattern, with the squares 3 pixels wide and a 1-pixel gray border between each square.
- three black pixels, a grey border pixel, three white pixels, a grey border pixel, repeat.
The first line in that image would have the following value four times:
Black Black Black Gray White White White Gray
0x000000 0x000000 0x000000 0x7F7F7F 0xFFFFFF 0xFFFFFF 0xFFFFFF 0x7F7F7F
(In Hex, rather than Binary - the string in Binary would be four times as long - 0x7F being replaced with 0b01111111)
If you load that string of data in a text editor, you would get the following text:
[Nul][Nul][Nul][Nul][Nul][Nul][Nul][Nul][Nul][Del][Del][Del][Blank][Blank][Blank][Blank][Blank][Blank][Blank][Blank][Blank][Del][Del][Del]
This is because 0x00 is the ASCII code for the Null character, and you need to write it 3 times to get the value for a black pixel (in 24-bit BMP, anyway), and you have 3 black pixels. Then 0x7F is the ASCII code for Delete, and you need THAT three times to get a gray pixel. 0xFF isn't a valid ASCII code at all, since ASCII stops at 0x7F; what an editor shows for it depends on the encoding it assumes (in Latin-1, for instance, it is ÿ), and it is written as [Blank] here. You need to write it 9 times to get 3 white pixels. Finishing it off, you get three more Deletes to write a gray pixel.
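The same thing as a Python sketch. It builds only the raw pixel bytes for one repeat of the pattern (no file header, and none of the other details of a real BMP file), and the `describe` helper with its placeholder names is my own.

```python
BLACK = bytes([0x00, 0x00, 0x00])
GRAY = bytes([0x7F, 0x7F, 0x7F])
WHITE = bytes([0xFF, 0xFF, 0xFF])

# Three black pixels, a gray border, three white pixels, a gray border.
unit = BLACK * 3 + GRAY + WHITE * 3 + GRAY
row = unit * 4  # the pattern repeats four times across the line


def describe(byte: int) -> str:
    """Name a byte the way a text viewer might struggle to show it."""
    if byte == 0x00:
        return "[Nul]"
    if byte == 0x7F:
        return "[Del]"
    if byte > 0x7F:
        return "[Blank]"  # placeholder: not ASCII, display depends on encoding
    return chr(byte)


shown = "".join(describe(b) for b in unit)
print(len(unit), len(row))
print(shown)

# Under Latin-1, the 0xFF bytes come out as a real character instead.
print(repr(unit.decode("latin-1")[12]))
```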
A different way to show it, which might be more usefully explanatory, is the reverse example - what DO you have to write to a file in order to get zeroes and ones when opened in a text editor?
The ASCII codes for zero and one, of course! A zero in a text editor isn't stored as a single bit with value 0. It is stored as 8 bits with value 0b00110000, or in hex 0x30.
The ASCII code for '0' is 0x30, and the ASCII code for '1' is 0x31, so if you want to store a chess pattern as zeroes and ones, your file will look like this:
text editor:
10101010
01010101
10101010
01010101
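Stored data (ASCII values for '1', '0' and the line break; 0x0D is carriage return, where Unix-style files would use line feed, 0x0A, and Windows files 0x0D 0x0A):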
0x31 0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x0D 0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x31 0x0D 0x31 0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x0D 0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x31
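You can reproduce that byte listing with a few lines of Python. This sketch joins the rows with a carriage return (0x0D) to match the listing above; your editor may well use a different line ending.

```python
rows = ["10101010", "01010101", "10101010", "01010101"]

# Join the rows with a carriage return, as in the listing above.
text = "\r".join(rows)
stored = text.encode("ascii")

hex_view = " ".join(f"0x{b:02X}" for b in stored)
print(hex_view)
print(len(stored), "bytes for", sum(len(r) for r in rows), "visible digits")
```

Every visible '0' or '1' costs a full byte, so the 32 digits on screen take 32 bytes plus the line breaks.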
There is a lot more to it than this - files have starts and stops and metadata and all other kinds of things, but the take-home lesson and answer to your question is:
Unless the first 8 bits of your file are 0b00110000, your text editor will not write '0', because that's the ASCII code for the character '0'. Unless the first 8 bits of your file are 0b00110001, your text editor will not write '1', because that's the ASCII code for the character '1'.