Why can't we understand the contents of a binary file after it is compiled?

First, registers don't have addresses. Each instruction in an assembly language translates to an opcode. Opcodes in x86 can be one, two, three, or even more bytes long (in some other processors, instructions are "fixed-width"). Usually the opcode identifies the instruction, the addressing mode, and the registers involved (on x86, partly through a ModR/M byte that follows the opcode). The addressing mode determines whether the CPU needs more than the opcode. For example, "immediate" addressing means the instruction's data follows right after (or "immediately after") the opcode, and "absolute" addressing means a memory address follows the opcode and is used by that instruction.
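To make the opcode/addressing-mode idea concrete, here is a minimal sketch that decodes a single instruction from a hypothetical, hand-picked subset of 32-bit x86 opcodes. It ignores prefixes and ModR/M/SIB bytes, so it is an illustration of the principle, not a real decoder. The opcode alone tells the decoder how many operand bytes follow and whether they are an immediate value or an absolute address.

```python
# Hypothetical, hand-picked subset of 32-bit x86 opcodes; not a real decoder
# (it ignores prefixes, ModR/M and SIB bytes).
# opcode byte -> (mnemonic template, operand bytes that follow, operand kind)
OPCODES = {
    0x90: ("nop", 0, None),             # no operands: the opcode says everything
    0xC3: ("ret", 0, None),
    0x50: ("push eax", 0, None),        # register is encoded in the opcode itself
    0xB0: ("mov al, {}", 1, "imm"),     # immediate: an 8-bit value follows
    0xB8: ("mov eax, {}", 4, "imm"),    # immediate: a 32-bit value follows
    0xA1: ("mov eax, [{}]", 4, "abs"),  # absolute: a 32-bit address follows
}


def decode_one(code: bytes, offset: int = 0) -> tuple[str, int]:
    """Decode the instruction at `offset`; return (text, length in bytes)."""
    opcode = code[offset]
    if opcode not in OPCODES:
        return f"db 0x{opcode:02x}", 1  # unknown byte: show it as raw data
    template, size, kind = OPCODES[opcode]
    if size == 0:
        return template, 1
    operand = code[offset + 1 : offset + 1 + size]
    if len(operand) < size:
        return f"db 0x{opcode:02x}", 1  # truncated: not enough operand bytes
    value = int.from_bytes(operand, "little")  # x86 stores values little-endian
    shown = f"0x{value:x}" if kind == "abs" else str(value)
    return template.format(shown), 1 + size
```

For example, `decode_one(bytes([0xB8, 0x2A, 0x00, 0x00, 0x00]))` returns `("mov eax, 42", 5)`: one opcode byte plus four bytes of immediate data.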

You can look up the encoding of an instruction such as MOV BP,SP and then search the file for those bytes. x86 has a lot of instructions that operate on the stack pointer.
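A minimal sketch of that search, assuming you are looking for the 32-bit instruction `mov ebp, esp`, which has two valid encodings (`89 E5` and `8B EC`). The function names here are hypothetical. Keep in mind that a match is only a candidate: the same two bytes can also occur inside data or inside another instruction's operand.

```python
def find_pattern(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which `pattern` occurs in `data`."""
    hits = []
    pos = data.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + 1)  # +1 so overlapping matches are found too
    return hits


# `mov ebp, esp` has two valid encodings; compilers differ in which one they emit.
MOV_EBP_ESP = (b"\x89\xe5", b"\x8b\xec")


def find_mov_ebp_esp(path: str) -> dict[bytes, list[int]]:
    """Map each encoding of `mov ebp, esp` to the file offsets where it appears."""
    with open(path, "rb") as f:  # "rb": read raw bytes, not text
        data = f.read()
    return {enc: find_pattern(data, enc) for enc in MOV_EBP_ESP}
```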

But please, please quit using Notepad and use a hex editor instead. I would recommend HxD, although there are many others.
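If you just want a quick look at what a hex editor would show, a minimal hex dump is only a few lines. This sketch (the function name is hypothetical) prints the offset, each byte in hexadecimal, and the printable ASCII characters, with `.` for everything else, which is the layout most hex editors use.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes like a hex editor: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset : offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.', as most hex editors do
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)
```

Usage: `with open("ping.exe", "rb") as f: print(hexdump(f.read(64)))` shows the first 64 bytes, starting with the `4d 5a` ("MZ") header bytes of a Windows executable.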

And @David Schwartz is correct. A disassembler will iterate through a file, and translate opcodes back into readable text. What you want to do is totally possible.
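The simplest way a disassembler can "iterate through a file" is a linear sweep: decode one instruction, use its length to find where the next one starts, and repeat. This is a minimal sketch over a hypothetical five-opcode subset of 32-bit x86; real disassemblers need full opcode tables and handle prefixes, ModR/M and SIB bytes.

```python
# Hypothetical mini-table: opcode -> (mnemonic template, operand size in bytes).
OPCODES = {
    0x55: ("push ebp", 0),
    0x5D: ("pop ebp", 0),
    0x90: ("nop", 0),
    0xC3: ("ret", 0),
    0xB8: ("mov eax, {}", 4),  # followed by a 32-bit little-endian immediate
}


def linear_sweep(code: bytes, start: int = 0) -> list[tuple[int, str]]:
    """Decode instructions one after another, starting at `start`."""
    out = []
    pc = start
    while pc < len(code):
        opcode = code[pc]
        template, size = OPCODES.get(opcode, (None, 0))
        if template is None or pc + 1 + size > len(code):
            out.append((pc, f"db 0x{opcode:02x}"))  # unknown or truncated: raw byte
            pc += 1
            continue
        value = int.from_bytes(code[pc + 1 : pc + 1 + size], "little")
        out.append((pc, template.format(value) if size else template))
        pc += 1 + size  # this instruction's length tells us where the next starts
    return out
```

For the bytes `55 B8 2A 00 00 00 5D C3` it yields `push ebp`, `mov eax, 42`, `pop ebp`, `ret` at offsets 0, 1, 6 and 7.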

However, you need to know where in the file the instructions start, because if you start at the wrong address, some data that should be the "operands" of opcodes (such as instructions that take an address as an operand or "argument") might get misinterpreted as opcodes. Knowing this requires knowledge of the format the executable is in, which on Windows is the "Portable Executable" or PE format (on Linux it is usually ELF). I'm sure there are disassemblers that understand PE, etc., but I don't know of any offhand.
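As an illustration of what "knowing the format" involves, this sketch reads the entry point from a PE file's headers. It relies on the documented layout: the file starts with `MZ`, the 32-bit value at offset `0x3C` (`e_lfanew`) points to the `PE\0\0` signature, a 20-byte COFF header follows, and `AddressOfEntryPoint` sits 16 bytes into the optional header. Note the result is an RVA (an address relative to where the image is loaded), not a file offset; converting it requires walking the section table, which this sketch deliberately skips. The function name is hypothetical.

```python
import struct


def pe_entry_point(data: bytes) -> int:
    """Return AddressOfEntryPoint (an RVA) from a PE file's headers."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ (DOS) header")
    # e_lfanew: 32-bit little-endian offset of the PE signature, stored at 0x3C
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset : pe_offset + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    # Signature (4 bytes) + COFF file header (20 bytes), then the optional header;
    # AddressOfEntryPoint is at offset 16 of it (same for PE32 and PE32+).
    optional_header = pe_offset + 4 + 20
    (entry_rva,) = struct.unpack_from("<I", data, optional_header + 16)
    return entry_rva
```

Usage: `with open("ping.exe", "rb") as f: print(hex(pe_entry_point(f.read())))`.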


So, if I have understood everything correctly

Not quite.

It is a binary file and its data is incomprehensible for us humans

Typically a binary file is incomprehensible to humans and machines alike, especially when the purpose of the file is unknown. Note that not all binary files are executable files. Many binary files are data files that contain no machine instructions at all. That is why (in some OSes) file extensions are used when naming files. The .com extension was used by CP/M to denote an executable file. The .exe extension was added by MS-DOS to denote another executable file format. Unix-like systems use the execute permission to mark which files can be executed, although such a file may be a script rather than machine code.
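Besides extensions and permissions, many file formats begin with a fixed "magic number" that identifies them, which is how a program can guess what a binary file is for. This is a minimal sketch with a handful of well-known signatures; the table and function name are illustrative, not exhaustive. Some executables, such as CP/M-style .com files, have no header at all and cannot be identified this way.

```python
# A few well-known "magic number" prefixes (file signatures).
MAGIC = [
    (b"\x7fELF", "ELF executable/object (Linux and other Unix-likes)"),
    (b"MZ", "DOS/Windows executable (MZ header, possibly PE)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF-", "PDF document"),
    (b"#!", "script with a shebang line"),
]


def sniff(data: bytes) -> str:
    """Guess a file's type from its first bytes instead of its extension."""
    for prefix, description in MAGIC:
        if data.startswith(prefix):
            return description
    return "unknown (could still be code, e.g. a headerless .com file)"
```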

As already mentioned by others, binary files, which contain numbers, should be viewed by a hex dump program or hex editor and not by a text viewer.

there is a example of the content of the ping.exe program

That file is actually a relocatable program, and not all of the data in that file represents machine code. There is information about the program such as which dynamic libraries it needs, which routines have to be linked, requirements for stack and program & data memory, and the program's entry point. Address operands in the file could be relative values that need to be calculated to absolute values, or references that need to be resolved.
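Two concrete cases of turning stored address values into absolute ones can be sketched as follows. First, an x86 `call rel32` (opcode `E8`) stores a signed displacement measured from the end of the instruction, so the real target depends on where the code is loaded. Second, a base-relocation fixup shifts an absolute address by the difference between the preferred and the actual load address. Both functions are hypothetical illustrations, not a loader implementation.

```python
def call_target(code: bytes, offset: int, load_address: int) -> int:
    """Resolve an x86 `call rel32` (E8) at `offset` to an absolute address.

    `load_address` is where the first byte of `code` sits in memory.
    """
    if code[offset] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    # The 4-byte operand is a signed, little-endian displacement
    disp = int.from_bytes(code[offset + 1 : offset + 5], "little", signed=True)
    next_instruction = load_address + offset + 5  # E8 + 4 operand bytes
    return next_instruction + disp


def rebase(stored_address: int, preferred_base: int, actual_base: int) -> int:
    """Apply a base-relocation fixup: shift an absolute address by the load delta."""
    return stored_address + (actual_base - preferred_base)
```

For example, `E8 10 00 00 00` at address `0x401000` calls `0x401015`, and an absolute address `0x401000` in an image built for base `0x400000` but loaded at `0x10000000` becomes `0x10001000`.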

The "program file" that you're probably thinking of is called a binary image file or a dump of program memory. Such a file would contain only machine code and data, with all address references properly set for execution.

even if they know Assembly code(the lowest level of machine language.)

Assembly language is not the same as machine language. A typical CPU (excluding the rare high-level-language computers) accepts machine code as input, one instruction at a time. The operands are registers, immediate values, or numeric memory addresses. Assembly language is a higher-level language that can use symbolic labels for instruction locations and variables, and it replaces numeric opcodes with mnemonics. An assembly language program has to be converted to machine code before it can actually be executed (typically by utilities called an assembler, a linker and a loader).
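To show what an assembler does with mnemonics and symbolic labels, here is a toy two-pass assembler for a hypothetical four-instruction subset of 32-bit x86 (`nop`, `ret`, `mov eax, N`, `jmp label`). Pass 1 assigns an address to every label; pass 2 emits the numeric opcodes and replaces each label with a relative offset. Real assemblers support far more, but the label-resolution idea is the same, and it is exactly the symbolic information that disassembly cannot recover.

```python
def assemble(source: str) -> bytes:
    """Two-pass assembler for a toy subset of 32-bit x86."""
    lines = [ln.split(";")[0].strip() for ln in source.splitlines()]  # drop comments
    lines = [ln for ln in lines if ln]

    def size_of(ins: str) -> int:
        if ins in ("nop", "ret"):
            return 1
        if ins.startswith("mov eax,"):
            return 5  # B8 + imm32
        if ins.startswith("jmp "):
            return 2  # EB + rel8 (short jump)
        raise ValueError(f"unsupported instruction: {ins}")

    # Pass 1: assign an address to every label
    labels, address = {}, 0
    for ln in lines:
        if ln.endswith(":"):
            labels[ln[:-1]] = address
        else:
            address += size_of(ln)

    # Pass 2: emit machine code, replacing label names with numeric offsets
    out = bytearray()
    for ln in lines:
        if ln.endswith(":"):
            continue
        if ln == "nop":
            out.append(0x90)
        elif ln == "ret":
            out.append(0xC3)
        elif ln.startswith("mov eax,"):
            value = int(ln.split(",", 1)[1], 0)  # accepts 42 or 0x2a
            out += bytes([0xB8]) + value.to_bytes(4, "little")
        elif ln.startswith("jmp "):
            target = labels[ln.split()[1]]
            rel = target - (len(out) + 2)  # relative to the next instruction
            out += bytes([0xEB]) + rel.to_bytes(1, "little", signed=True)
    return bytes(out)
```

Assembling `start:` / `mov eax, 42` / `jmp done` / `nop` / `done:` / `ret` produces `B8 2A 00 00 00 EB 01 90 C3`: the label `done` has become the number `01`.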

The reverse operation, disassembly, can be performed on program files with reasonable success, although symbolic information is lost. Disassembly of a memory dump or program image file is more trial and error, as code and data locations need to be identified manually.
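Why code and data boundaries matter can be shown with a small, hypothetical example: the same six bytes decode into completely different instructions depending on where decoding starts, because operand bytes get read as opcodes. The sketch uses a tiny opcode subset and a simple linear sweep.

```python
# Hypothetical mini-table: opcode -> (mnemonic template, operand size in bytes)
OPCODES = {0x90: ("nop", 0), 0xC3: ("ret", 0), 0x55: ("push ebp", 0),
           0x5D: ("pop ebp", 0), 0xB8: ("mov eax, 0x{:x}", 4)}


def sweep(code: bytes, start: int) -> list[str]:
    """Linear-sweep decode from `start`; unknown/truncated bytes become `db`."""
    out, pc = [], start
    while pc < len(code):
        template, size = OPCODES.get(code[pc], (f"db 0x{code[pc]:02x}", 0))
        if pc + 1 + size > len(code):
            template, size = f"db 0x{code[pc]:02x}", 0  # operand runs past the end
        value = int.from_bytes(code[pc + 1 : pc + 1 + size], "little")
        out.append(template.format(value) if size else template)
        pc += 1 + size
    return out


code = bytes([0xB8, 0x90, 0x90, 0x90, 0xC3, 0xC3])
```

Starting at offset 0, `sweep(code, 0)` gives `mov eax, 0xc3909090` followed by `ret`. Starting one byte too late, `sweep(code, 1)` gives `nop`, `nop`, `nop`, `ret`, `ret`: the four operand bytes of the `mov` have been misread as instructions.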

BTW, there are people who can read and write numeric machine code directly. Of course this is a lot easier on an 8-bit CPU or microcontroller than on a 32-bit CISC processor with a dozen memory addressing modes.


You can't see the proper, intended encoding of a binary file through Notepad. Please review this for future reference. Most text editors do not interpret binary formats; they expect the file to contain character codes such as ASCII.

So opening a binary file in a text editor shows each byte as the corresponding character, which makes no sense given the original format of the binary data. As mentioned, use a hex editor instead; some can also display the contents in pure binary (bit) form.
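To see the effect, this sketch renders the bytes of a short 32-bit x86 function (`push ebp` / `mov ebp, esp` / `mov eax, 42` / `pop ebp` / `ret`) the way a text editor might, and then as hex. It assumes the editor decodes with Windows code page 1252, which is typical for "ANSI" text on Western-European Windows systems; the exact characters you see depend on the editor and the code page. The function names are hypothetical.

```python
# push ebp / mov ebp, esp / mov eax, 42 / pop ebp / ret
CODE = bytes([0x55, 0x89, 0xE5, 0xB8, 0x2A, 0x00, 0x00, 0x00, 0x5D, 0xC3])


def as_text(data: bytes, encoding: str = "cp1252") -> str:
    """Render bytes as a text editor would: one character per byte."""
    # Bytes the code page doesn't define become U+FFFD instead of raising
    return data.decode(encoding, errors="replace")


def as_hex(data: bytes) -> str:
    """Render bytes as a hex editor would: two hex digits per byte."""
    return " ".join(f"{b:02X}" for b in data)
```

`as_text(CODE)` gives `U‰å¸*` followed by three NUL characters and `]Ã`, which says nothing about the instructions, while `as_hex(CODE)` gives `55 89 E5 B8 2A 00 00 00 5D C3`, where each opcode and operand is visible.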

You are incorrect that the contents of a binary file cannot be understood. Hand-disassembling raw binary into the instructions the CPU (or an emulated/virtual CPU) executes is hard, and on modern architectures extremely hard, but it can be done.

How do you think emulators are programmed? The developer needs to know the opcodes so that the emulated system recognizes them and behaves as the real hardware would. Many CPU architectures are documented, and even GPUs have such documentation (though it is often kept more secret).
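At its core an emulator is a fetch-decode-execute loop keyed on opcodes. This is a minimal sketch for a hypothetical four-opcode subset of 32-bit x86 (`mov eax, imm32`, `add eax, imm32`, `inc eax`, `ret`). It does not model flags, memory, or a stack, and it treats `ret` as "stop and report EAX" rather than popping a return address.

```python
def run(code: bytes) -> int:
    """Fetch-decode-execute loop for a toy x86 subset; returns EAX at `ret`."""
    eax, pc = 0, 0
    while True:
        opcode = code[pc]                       # fetch
        if opcode == 0xB8:                      # mov eax, imm32
            eax = int.from_bytes(code[pc + 1 : pc + 5], "little")
            pc += 5
        elif opcode == 0x05:                    # add eax, imm32 (wraps at 32 bits)
            imm = int.from_bytes(code[pc + 1 : pc + 5], "little")
            eax = (eax + imm) & 0xFFFFFFFF
            pc += 5
        elif opcode == 0x40:                    # inc eax (32-bit mode encoding)
            eax = (eax + 1) & 0xFFFFFFFF
            pc += 1
        elif opcode == 0xC3:                    # ret: simplified to "halt"
            return eax
        else:
            raise ValueError(f"unimplemented opcode 0x{opcode:02x} at {pc}")
```

For example, `run(bytes([0xB8, 0x28, 0, 0, 0, 0x40, 0x40, 0xC3]))` loads 40 into EAX, increments it twice and returns 42.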

Another thing to note is that at the lowest level, the "binary data" is not really a bunch of zeroes and ones but high and low voltages switched through electrical circuits. The two correspond, but they are not the same thing.

Binary digits usually map one-to-one onto these two voltage levels, so it makes sense to use the base-2 number system to describe them.