What's the use of "org xxxx" in assembly for a legacy PC BIOS MBR bootloader?
Recently I've been learning how to write a boot sector. Here is the complete code I'm studying:
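org 07c00h                ; assume the code is loaded at offset 7C00h
    mov ax, cs
    mov ds, ax
    mov es, ax            ; DS = ES = CS
    call DispStr          ; print the string
    jmp $                 ; loop forever
DispStr:
    mov ax, BootMessage
    mov bp, ax            ; ES:BP = address of the string
    mov cx, 16            ; CX = number of characters to write
    mov ax, 01301h        ; AH = 13h (write string), AL = 01h (update cursor)
    mov bx, 000ch         ; BH = page 0, BL = attribute 0Ch (light red on black)
    mov dl, 0             ; DL = column 0 (the row in DH is not set here)
    int 10h               ; BIOS video service
    ret
BootMessage: db "Hello, OS!"
times 510-($-$$) db 0     ; pad with zeros up to byte 510
dw 0xaa55                 ; boot signature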
It's very simple code if you know how a system boots. The result is the line "Hello, OS!" displayed on the screen. The only thing I don't understand is the first line: org 07c00h.
The book tells me that this line lets the assembler place the code at address 7c00h, but the explanation is very ambiguous, and I really don't know what it's for here. What in the world does the line org 07c00h do here?
I tried removing the line, used nasm to build a bin file, and then booted that file in bochs. Nothing changed from before: "Hello, OS!" was displayed on the screen too.
Can I say that the first line does nothing here? What's the use of org xxxx?
The assembler translates each line of your source code into a processor instruction and emits these instructions in sequence, one after another, into the output binary file. As it does so, it maintains an internal counter that tracks the current address of each instruction, starting from 0 and counting upwards.
If you're assembling a normal program, these instructions end up in the code section of some object file, with blank slots for addresses that the linker fills in with the proper values afterwards, so it's not a problem.
But when you assemble a flat binary file without any sections, relocations, or other formatting, just raw machine instructions, the assembler has no information about where your labels point or what the addresses of your code and data are. So, for example, when you have an instruction like mov si, someLabel, the assembler can only calculate the offset of this label from 0 at the beginning of the binary file (i.e. the default is ORG 0 if you don't specify one).
If that isn't true, and you want your machine instructions and data to begin at some other address in memory, e.g. 7C00, then you need to tell the assembler that the starting address of your program is 7C00 by writing org 0x7C00 at the beginning of your source. This directive tells the assembler to start its internal address counter at 7C00 instead of 0. The result is that all addresses used in such a program are shifted by 7C00: the assembler simply adds 7C00 to the address it calculates for each label. The effect is as if the label were located in memory at the address, say, 7C48 (7C00 + 48) instead of just 0048 (0000 + 48), even though it is only 48h bytes from the beginning of the binary image file. Once the image is loaded at offset 7C00, that gives the proper address.
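To make that arithmetic concrete, here is a small Python sketch, used purely as a calculator. The helper name label_address is hypothetical and not something NASM provides; it only mimics how a label's value in a flat binary is the org value plus the label's byte offset in the file.

```python
def label_address(org, file_offset):
    """Value the assembler would emit for a label in a flat binary.

    org         -- value given to the org directive (0 if omitted)
    file_offset -- distance in bytes from the start of the file to the label
    """
    # 16-bit code: offsets wrap around at 64 KiB
    return (org + file_offset) & 0xFFFF


# A label 48h bytes into the file, as in the example above
print(hex(label_address(0x0000, 0x48)))  # 0x48
print(hex(label_address(0x7C00, 0x48)))  # 0x7c48
```

The bytes of the label's target sit at the same place in the file either way; only the number baked into instructions such as mov si, someLabel changes.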
These "addresses", if used directly as in jmp si or mov al, [si], are the offset part of seg:off logical addressing. In real mode the segment part is left-shifted by 4 to get a base that the offset is added to. (So 07C0:0000 and 0000:7C00 refer to the same linear address, 7C00.) The segment part comes from whatever you've put into the relevant segment register, or whatever the BIOS left there if you didn't set it to a fixed value.
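The seg:off rule is easy to check with a few lines of Python. This is only a sketch of the real-mode calculation: the function name linear is made up for illustration, and wraparound at 1 MiB (the A20 line) is ignored.

```python
def linear(seg, off):
    """Real-mode linear address of seg:off (A20 wraparound ignored)."""
    # The segment is shifted left by 4 bits (multiplied by 16),
    # then the 16-bit offset is added to that base.
    return ((seg & 0xFFFF) << 4) + (off & 0xFFFF)


print(hex(linear(0x07C0, 0x0000)))  # 0x7c00
print(hex(linear(0x0000, 0x7C00)))  # 0x7c00
```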
If your cs, ds, and/or es segment registers are set to match where your MBR is loaded in linear address space (always 7C00), for example so that the first byte of your file is at es:0, then using that offset with a correctly set segment base will actually reach your data. jmp si will jump to that label if cs is set so that cs:si is where your code is, i.e. if cs:org references the first byte of your MBR. mov ax, [si] will load 2 bytes from it if ds is set correctly.
In your case, int 10h / ah=13h uses es:bp, and there are no other uses of absolute addressing, only relative jumps/calls whose encoding doesn't depend on org. You set es from cs at the start of the bootloader for some reason, instead of setting it to a fixed value that matches the org you're using. This is a bug: your bootloader won't work on BIOSes that jump to the MBR with CS:IP = 07C0:0000, only on ones that use 0000:7C00, matching your org. Fix this by replacing mov ax,cs with xor ax,ax. It doesn't matter whether DS/ES differ from CS or not, just that ES: BootMessage-$$ + org is where your data actually is.
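The following Python sketch works through that bug numerically. All names in it are hypothetical, and the file offset of BootMessage (1Eh) is my own hand count of the instruction sizes, so treat it as illustrative; the conclusion doesn't depend on the exact value.

```python
LOAD_ADDRESS = 0x7C00    # linear address where the BIOS loads the MBR
ORG = 0x7C00             # value of the org directive in the question
MSG_FILE_OFFSET = 0x1E   # assumed offset of BootMessage in the file (illustrative)


def linear(seg, off):
    """Real-mode linear address of seg:off."""
    return (seg << 4) + off


def string_pointer(es):
    """Linear address that int 10h / ah=13h reads from, for a given ES."""
    bp = ORG + MSG_FILE_OFFSET   # the value "mov ax, BootMessage" loads
    return linear(es, bp)


actual = LOAD_ADDRESS + MSG_FILE_OFFSET  # where the string bytes really are

for cs in (0x0000, 0x07C0):
    es_from_cs = string_pointer(cs)   # mov ax, cs / mov es, ax
    es_zero = string_pointer(0x0000)  # xor ax, ax / mov es, ax
    print(f"CS={cs:04X}: ES=CS -> {es_from_cs:05X}, "
          f"ES=0 -> {es_zero:05X}, data at {actual:05X}")
```

With ES copied from CS, the pointer only lands on the string when the BIOS entered at 0000:7C00; with ES forced to 0, it lands there in both cases.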
Linear vs. Logical addresses
As to your other question: 7C00 is the linear (physical) address of the bootloader. You can represent this physical address as a logical address (segment:offset) in different ways, because segments overlap: each segment starts 16 bytes (10 in hex) after the previous one. For example, you can use the logical address 0000:7C00, which is the simplest configuration: you use segment 0, starting at the beginning of your RAM, and offset 7C00 from that 0. Or you can use the logical address 07C0:0000, which is the 7C0th segment. Remember that segments start 16 bytes apart from each other? So you simply multiply this 7C0 by 10 (16 in decimal) and you get 7C00, see? It's just a matter of shifting the hexadecimal address one digit to the left! :-) Now you add your offset, which is 0 this time, so it's still 7C00 physically: byte 0 of segment 07C0, which starts at 7C00 in memory.
Of course you can also use more complicated addresses, for example 0234:58C0, which means that the segment starts at 2340, and when you add the offset 58C0 to it you get 7C00 again :-) But doing that could be confusing. It all depends on what configuration you need. If you want to treat the physical address 7C00 as the start of your segment, just use segment 07C0; your first instruction will then be at offset 0, so you don't need an org directive, or you can write org 0. But if you need to read or write data below address 7C00 (for example, to peek at the BIOS data area or fiddle with interrupt vectors), then use segment 0 and offset 7C00, which means your first instruction (byte 0 of your binary file) will be located at physical address 7C00 in memory; in that case you have to add the org 0x7C00 directive, for the reasons described above.
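Put differently, the org value you need follows from the segment you choose. Here is a minimal Python sketch of that relationship; org_for_segment is a hypothetical helper, and it assumes the same segment value is used for every register you address the MBR through.

```python
LOAD_ADDRESS = 0x7C00  # linear address where the BIOS loads the MBR


def org_for_segment(seg):
    """org value that makes label offsets correct for the chosen segment."""
    base = seg << 4                 # linear address where the segment starts
    offset = LOAD_ADDRESS - base    # offset of the MBR's first byte in it
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("this segment cannot reach the load address")
    return offset


for seg in (0x0000, 0x07C0, 0x0234):
    print(f"segment {seg:04X} -> org {org_for_segment(seg):04X}h")
```

For the three segments discussed above this gives org 7C00h, 0000h and 58C0h respectively.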
The BIOS will jump to your code with CS:IP = 07C0:0000 or 0000:7C00, and with unknown values in DS/ES/SS:SP. You should write your bootloader to work either way, using xor ax,ax / mov ds,ax to set the DS base to zero if you're using org 0x7c00.
See Michael Petch's general tips for bootloader development for more about writing robust bootloaders that avoid making assumptions about the state the BIOS left, except for ones that all BIOSes must get right to work at all with mainstream software. (e.g. loading your 512-byte MBR at linear address 0x00007c00 and drive number in DL).
Almost(?) all BIOSes start an MBR with either CS=0 or CS=07C0, not some other seg:off way of reaching the same linear address. But you definitely shouldn't assume one or the other.
This matters where the assembler and linker are effectively one step. The org directive tells the assembler, which in turn acts as the linker (in these cases often the same program), where in memory the code that follows will live. When you use a C compiler or some other high-level language compiler, you often have separate compile and link steps (although the compiler often calls the linker for you behind the scenes). The source is compiled to a relocatable object file, with some address fields in the instructions left unresolved until the link step. The linker takes the objects and a linker script, or information from the user describing the memory space, and then encodes the instructions for that memory space.
User786653 said it quite well: it tells the assembler something it can't figure out on its own, namely the memory address where these instructions are going to live, in case it needs to produce position-dependent encodings in the instructions. The assembler also uses that information in the output file if the format includes address information, for example elf, srec, ihex, etc.