How to produce a minimal BIOS hello world boot sector with GCC that works from a USB stick on real hardware?

I have managed to produce a minimal boot sector that works with QEMU 2.0.0 on Ubuntu 14.04:

.code16
.global _start
_start:
    cli
    mov $msg, %si
    mov $0x0e, %ah
loop:
    lodsb
    or %al, %al
    jz halt
    int $0x10
    jmp loop
halt:
    hlt
msg:
    .asciz "hello world"
.org 510
.word 0xaa55

Compiled with:

as -o main.o main.S
ld --oformat binary -o main.img -Ttext 0x7C00 main.o
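
As a quick sanity check before writing the image anywhere, something like the following could confirm that it is exactly 512 bytes and ends with the 0x55 0xAA signature that .org 510 and .word 0xaa55 are meant to produce. The function name check_boot_sector is made up for this sketch, and it assumes POSIX wc, od and tr are available:

```shell
# Succeeds only if FILE is exactly 512 bytes long and its last two
# bytes are 0x55 0xAA (the little-endian encoding of .word 0xaa55).
# check_boot_sector is a hypothetical helper, not part of any toolchain.
check_boot_sector() {
    size=$(wc -c < "$1" | tr -d ' ')
    # Dump 2 bytes starting at offset 510 as hex, then strip whitespace.
    sig=$(od -An -tx1 -j510 -N2 "$1" | tr -d ' \n')
    [ "$size" = "512" ] && [ "$sig" = "55aa" ]
}

# Usage, with main.img being the image produced by the ld command above:
# check_boot_sector main.img && echo "size and signature look right"
```

This only validates the layout of the image. It says nothing about whether the code inside makes correct assumptions about the machine state at boot.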

The example is available on this repo: https://github.com/cirosantilli/x86-bare-metal-examples/tree/2b79ac21df801fbf4619d009411be6b9cd10e6e0/no-ld-script

When I run:

qemu -hda main.img

it shows hello world on the emulator screen as expected.

But if I write the image to a USB stick:

sudo dd if=main.img of=/dev/sdb

then plug the USB stick into a ThinkPad T400 or T430, hit F12, and select the USB stick, what I observe is:

  • some boot messages show up quickly
  • then the screen goes blank, with only an underscore cursor at the top

I have also tested the same USB stick with an Ubuntu 14.04 image, and it booted fine, so the USB stick is working.

How should I change this example so that it will boot on the hardware and show the hello world message?

What is the difference between the Ubuntu image and the one I've created?

Where is this documented?

I have uploaded the output of sudo dmidecode on the T400 to: https://gist.github.com/cirosantilli/d47d35bacc9be588009f#file-lenovo-t400


As mentioned by @Jester, I had to zero DS with:

@@ -4,2 +4,4 @@ _start:
     cli
+    xor %ax, %ax
+    mov %ax, %ds
     mov $msg, %si

Note that it is not possible to mov an immediate directly into ds; the value must go through a general-purpose register such as ax. See: 8086- why can't we move an immediate data into segment register?

So the root of the problem was the difference between QEMU's initial state and that of the real hardware.
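
To make the effect of a nonzero DS concrete, here is a small shell sketch of the real-mode address calculation that lodsb relies on when it reads from DS:SI (physical address = segment * 16 + offset). The helper name phys_addr, the offset 0x7C10 for msg, and the segment value 0x07C0 are illustrative assumptions, not values read from the ThinkPad:

```shell
# Real-mode physical address: segment * 16 + offset.
# phys_addr is a hypothetical helper; 20-bit wraparound is ignored here.
phys_addr() {
    printf '0x%05X\n' $(( ($1 << 4) + $2 ))
}

# With -Ttext 0x7C00, msg is linked at an offset of roughly 0x7C10 (assumed).
phys_addr 0x0000 0x7C10   # DS = 0: prints 0x07C10, where the BIOS loaded msg
phys_addr 0x07C0 0x7C10   # assumed nonzero DS: prints 0x0F810, unrelated memory
```

With any DS other than zero, the same SI value points away from the loaded boot sector, which would be consistent with a blank screen instead of the message.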

I am now adding the following 16-bit initialization code to all my bootloaders to guarantee a cleaner initial state. Not all of these steps are mandatory, as mentioned by Michael Petch in the comments.

.code16
cli
/* This sets %cs to 0. TODO Is that really needed? */
ljmp $0, $1f
1:
xor %ax, %ax
/* We must zero %ds for any data access. */
mov %ax, %ds
/* The other segments are not mandatory. TODO source */
mov %ax, %es
mov %ax, %fs
mov %ax, %gs
/*
TODO What to move into BP and SP? https://stackoverflow.com/questions/10598802/which-value-should-be-used-for-sp-for-booting-process
Setting BP does not seem mandatory for BIOS.
*/
mov %ax, %bp
/* Automatically disables interrupts until the end of the next instruction. */
mov %ax, %ss
/* We should set SP because BIOS calls may depend on that. TODO confirm. */
mov %bp, %sp

I have also found this closely related question: C Kernel - Works fine on VM but not actual computer?

The Intel Manual Volume 3 System Programming Guide (325384-056US, September 2015), section 9.10.2 "STARTUP.ASM Listing", contains a large initialization example.