How is the x86 data segment used in real operating systems and processes?
I've been writing x86 asm programs (bootloaders) in real mode, and I know how to use segments, registers and stuff like that.
I see from debuggers like OllyDbg and similar that the DS, SS, etc. registers are obviously used, but how do normal "Windows-like" processes use them? I know that segmentation is partially used (just to separate ring0 from ring3) and that the segment descriptors live in the GDT. I also know that paging is involved and that it totally changes the addresses through the PDEs and PTEs, but I can't quite link everything together and understand what the data, stack and extra segments are all about. Does each process have a different DS/SS/ES?
Solution 1:
From the Intel 80386 Programmer's Reference Manual (1986):
Figure 5-1. Address Translation Overview. A logical address (a 16-bit SELECTOR plus a 32-bit OFFSET) goes through SEGMENT TRANSLATION to produce a 32-bit linear address. If paging is enabled (the PG bit in CR0 is set), the linear address, split into DIR, PAGE and OFFSET fields, goes through PAGE TRANSLATION to produce the physical address; if paging is disabled, the linear address is used directly as the physical address.
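The PAGE TRANSLATION step of Figure 5-1 can be sketched as a two-level table walk. This is a minimal Python model, assuming 4 KiB pages with no PSE or PAE; the page directory and page tables are plain dicts standing in for the 4-byte entries the CPU reads from physical memory, and `split_linear`, `translate_page` and `PageFault` are hypothetical names chosen for illustration.

```python
PRESENT = 0x1            # bit 0 of a PDE/PTE
FRAME_MASK = 0xFFFFF000  # bits 31..12 hold the 4 KiB-aligned frame address

class PageFault(Exception):
    """Raised where the CPU would raise #PF."""

def split_linear(linear):
    """Split a 32-bit linear address into its DIR, PAGE and OFFSET fields."""
    dir_index = (linear >> 22) & 0x3FF   # bits 31..22 index the page directory
    page_index = (linear >> 12) & 0x3FF  # bits 21..12 index the page table
    offset = linear & 0xFFF              # bits 11..0 select the byte in the page
    return dir_index, page_index, offset

def translate_page(linear, page_directory, page_tables):
    """Walk the two-level tables and return the physical address.

    page_directory: dict DIR -> PDE (page-table frame address | flags)
    page_tables:    dict page-table frame address -> dict PAGE -> PTE
    """
    dir_index, page_index, offset = split_linear(linear)
    pde = page_directory.get(dir_index, 0)
    if not pde & PRESENT:
        raise PageFault(f"page directory entry {dir_index} not present")
    pte = page_tables[pde & FRAME_MASK].get(page_index, 0)
    if not pte & PRESENT:
        raise PageFault(f"page table entry {page_index} not present")
    return (pte & FRAME_MASK) | offset

# Map the linear page at 0x00401000 to the physical frame at 0x0009A000.
page_directory = {1: 0x00005000 | PRESENT}
page_tables = {0x00005000: {1: 0x0009A000 | PRESENT}}
print(hex(translate_page(0x00401234, page_directory, page_tables)))
```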
Figure 5-2. Segment Translation. The SELECTOR picks a SEGMENT DESCRIPTOR out of a DESCRIPTOR TABLE; the descriptor's BASE ADDRESS is added to the OFFSET, and the sum is the linear address (DIR | PAGE | OFFSET).
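The segment translation of Figure 5-2 boils down to "look up the descriptor, check the limit, add the base". A minimal sketch, assuming expand-up segments and leaving out the privilege, type and present-bit checks real hardware performs; `SegmentDescriptor`, `decode_selector`, `segment_translate` and `GeneralProtectionFault` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class SegmentDescriptor:
    base: int   # linear address where the segment starts
    limit: int  # highest valid offset (granularity already applied)

class GeneralProtectionFault(Exception):
    """Raised where the CPU would raise #GP."""

def decode_selector(selector):
    """Split a 16-bit selector into (index, table indicator, RPL)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3

def segment_translate(selector, offset, gdt, ldt=()):
    """Return the 32-bit linear address for selector:offset.

    Only the table lookup, limit check and base addition are modelled.
    """
    index, ti, _rpl = decode_selector(selector)
    if ti == 0 and index == 0:
        raise GeneralProtectionFault("null selector")
    table = ldt if ti else gdt
    if index >= len(table):
        raise GeneralProtectionFault("selector beyond descriptor table limit")
    descriptor = table[index]
    if offset > descriptor.limit:
        raise GeneralProtectionFault("offset beyond segment limit")
    return (descriptor.base + offset) & 0xFFFFFFFF

gdt = [
    None,                                              # entry 0: null descriptor
    SegmentDescriptor(base=0, limit=0xFFFFFFFF),       # flat 4 GiB segment
    SegmentDescriptor(base=0x00010000, limit=0xFFFF),  # 64 KiB segment at 0x10000
]
print(hex(segment_translate(0x08, 0x1234, gdt)))
print(hex(segment_translate(0x10, 0x1234, gdt)))
```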
In Windows, DS=ES=SS in most processes most of the time, and the CS and DS values are shared by all processes. A process may change its segment registers, but it's rarely needed, so you're going to see the same set of CS and DS/ES/SS values almost always. The kernel uses its own CS and DS.
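Decoding those recurring values shows why they look the way they do. The selectors below are the ones commonly reported for 32-bit Windows (treat them as an assumption and confirm them in your own debugger). Each selector is a GDT index shifted left by 3, plus a table-indicator bit and a 2-bit requested privilege level, so the user-mode selectors have their low two bits set (RPL 3) while the kernel ones have RPL 0. `describe` is a hypothetical helper.

```python
# Selector values commonly reported on 32-bit Windows. These are OS conventions
# (an assumption to verify in your debugger), not architectural constants.
WINDOWS_X86_SELECTORS = {
    "kernel CS": 0x08,
    "kernel DS/ES/SS": 0x10,
    "user CS": 0x1B,
    "user DS/ES/SS": 0x23,
    "user FS (TEB)": 0x3B,
}

def describe(selector):
    """Return (table, index, RPL) encoded in a segment selector."""
    table = "LDT" if selector & 0b100 else "GDT"
    return table, selector >> 3, selector & 0b11

for name, selector in WINDOWS_X86_SELECTORS.items():
    table, index, rpl = describe(selector)
    print(f"{name:<16} {selector:#06x} -> {table}[{index}], RPL {rpl}")
```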
Solution 2:
Usually, both in x86 protected mode and in x86-64 long mode, segmentation is practically unused (flat memory model). There are four main segment descriptors, each allowing access to the whole address space: ring0 code, ring0 data, ring3 code and ring3 data. Memory protection is enforced using paging. So, in general, all processes are given the same CS, DS, SS and ES values.
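Combining the two stages shows where process isolation comes from in the flat model: segmentation maps an offset to the same linear address in every process, and only the per-process page tables differ. A minimal sketch; the single-level `page_map` dicts are a hypothetical simplification of the real page directory that CR3 points to, and the addresses are made up.

```python
PAGE_SIZE = 4096

# Flat model: each of the four descriptors has base 0 and a 4 GiB limit.
FLAT_DESCRIPTORS = {
    "ring0 code": (0, 0xFFFFFFFF),
    "ring0 data": (0, 0xFFFFFFFF),
    "ring3 code": (0, 0xFFFFFFFF),
    "ring3 data": (0, 0xFFFFFFFF),
}

def to_linear(segment, offset):
    """Segmentation step: base + offset, which is just the offset here."""
    base, limit = FLAT_DESCRIPTORS[segment]
    if offset > limit:
        raise ValueError("offset beyond segment limit")
    return base + offset

def to_physical(page_map, linear):
    """Paging step with a hypothetical single-level map: page number -> frame number.

    A missing entry raises KeyError, standing in for a page fault.
    """
    frame = page_map[linear // PAGE_SIZE]
    return frame * PAGE_SIZE + linear % PAGE_SIZE

# Both processes use the same "ring3 data" segment but different page maps.
process_a = {0x400: 0x1234}
process_b = {0x400: 0x5678}

linear = to_linear("ring3 data", 0x00400010)
print(hex(to_physical(process_a, linear)))
print(hex(to_physical(process_b, linear)))
```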
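Note that some operating systems use the FS and GS segments to address thread-local data, for example the TIB (Thread Information Block) in Windows, which FS points to in 32-bit processes and GS in 64-bit ones.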
It is also worth mentioning that in x86 protected mode the flat model is optional and the kernel is free to use multiple segments for memory protection, whereas in x86-64 long mode segmentation is largely disabled: in 64-bit code the processor treats the CS, DS, ES and SS bases as zero and does not check their limits, so the operating system is forced to use a flat memory model (though it can still use FS and GS for addressing thread-local data and operating system structures).
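The 64-bit rule can be written as a small function: the CS, DS, ES and SS bases are forced to zero, while FS and GS still add their base, which the OS typically sets per thread (on x86-64, through the IA32_FS_BASE and IA32_GS_BASE MSRs). A minimal sketch; `linear_address_64` and the example base addresses are hypothetical, and canonical-address checks are not modelled.

```python
MASK64 = (1 << 64) - 1

# In 64-bit mode the processor treats these segment bases as 0 and skips limit checks.
BASE_FORCED_TO_ZERO = {"CS", "DS", "ES", "SS"}

def linear_address_64(segment, offset, bases):
    """Return the linear address for segment:offset in 64-bit mode.

    bases maps segment names to the base the OS loaded; it is only consulted
    for FS and GS, mirroring how the hardware ignores the other bases.
    """
    if segment in BASE_FORCED_TO_ZERO:
        base = 0
    else:
        base = bases[segment]
    return (base + offset) & MASK64

# Hypothetical per-thread state: both threads share DS, but each has its own GS base.
thread_1 = {"DS": 0x1000, "GS": 0x7FF0_0000_0000}
thread_2 = {"DS": 0x1000, "GS": 0x7FF0_0001_0000}

print(hex(linear_address_64("DS", 0x30, thread_1)))  # the DS base 0x1000 is ignored
print(hex(linear_address_64("GS", 0x30, thread_1)))
print(hex(linear_address_64("GS", 0x30, thread_2)))
```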
You may also want to check this invaluable source of information on the x86 and x86-64 architecture: the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A (section 3.2 should clear up your doubts about segmentation).