Steps in Context Switching
I am asked to describe the steps involved in a context switch (1) between two different processes and (2) between two different threads in the same process.
- During a context switch, the kernel will save the context of the old process in its PCB and then load the saved context of the new process scheduled to run.
- Threads in the same process can be scheduled by the operating system so that they appear to execute in parallel; a context switch between two such threads is usually faster than a context switch between two different processes.
Is this too general, or what would you add to explain the process more clearly?
It's much easier to explain those in reverse order because a process-switch always involves a thread-switch.
A typical thread context switch on a single-core CPU happens like this:
1. All context switches are initiated by an 'interrupt'. This could be an actual hardware interrupt that runs a driver (e.g. from a network card, keyboard, memory-management or timer hardware), or a software call (system call) that performs a hardware-interrupt-like call sequence to enter the OS. In the case of a driver interrupt, the OS provides an entry point that the driver can call instead of performing the 'normal' direct interrupt-return, and so allows a driver to exit via the OS scheduler if it needs the OS to set a thread ready (e.g. it has signaled a semaphore).
2. Non-trivial systems will have to initiate a hardware-protection-level change to enter a kernel state so that the kernel code/data etc. can be accessed.
3. Core state for the interrupted thread has to be saved. On a simple embedded system, this might just be pushing all registers onto the thread stack and saving the stack pointer in its Thread Control Block (TCB).
4. Many systems switch to an OS-dedicated stack at this stage so that the bulk of OS-internal stack requirements are not inflicted on the stack of every thread.
5. It may be necessary to mark the thread stack position where the change to interrupt state occurred to allow for nested interrupts.
6. The driver/system call runs and may change the set of ready threads by adding/removing TCBs from the internal queues for the different thread priorities. For example, the network card driver may have set an event or signaled a semaphore that another thread was waiting on, so that thread will be added to the ready set, or a running thread may have called sleep() and so elected to remove itself from the ready set.
7. The OS scheduler algorithm is run to decide which thread to run next, typically the highest-priority ready thread that is at the front of the queue for that priority (see the sketch below). If the next-to-run thread belongs to a different process than the previously-run thread, some extra work is needed here (see later).
8. The saved stack pointer from the TCB of that thread is retrieved and loaded into the hardware stack pointer.
9. The core state for the selected thread is restored. On my simple system, the registers would be popped from the stack of the selected thread. More complex systems will have to handle a return to user-level protection.
10. An interrupt-return is performed, transferring execution to the selected thread.
In the case of a multicore CPU, things are more complex. The scheduler may decide that a thread currently running on another core needs to be stopped and replaced by a thread that has just become ready. It can do this by using its interprocessor driver to hardware-interrupt the core running the thread that has to be stopped. The complexities of this operation, on top of all the other stuff, are a good reason to avoid writing OS kernels :)
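To make steps 3, 6 and 7 a little more concrete, here is a minimal sketch in C of a TCB and the per-priority ready queues the scheduler picks from. Every name here (tcb_t, make_ready, pick_next_thread) is invented purely for illustration; a real kernel's structures carry far more state.

```c
#include <stddef.h>

/* Hypothetical TCB: just enough state for the sketch. Real kernels also
 * store saved registers, the owning process, accounting data, ... */
typedef struct tcb {
    void       *saved_sp;    /* stack pointer saved at step 3 */
    int         priority;    /* 0 = highest priority          */
    struct tcb *next;        /* link in a ready queue         */
} tcb_t;

#define NUM_PRIORITIES 8

/* One FIFO ready queue per priority level (step 6 adds/removes TCBs here). */
static tcb_t *ready_head[NUM_PRIORITIES];
static tcb_t *ready_tail[NUM_PRIORITIES];

/* Append a thread that has just become ready, e.g. because a semaphore it
 * was waiting on has been signaled. */
void make_ready(tcb_t *t)
{
    t->next = NULL;
    if (ready_tail[t->priority])
        ready_tail[t->priority]->next = t;
    else
        ready_head[t->priority] = t;
    ready_tail[t->priority] = t;
}

/* Step 7: pick the highest-priority ready thread, FIFO within a level. */
tcb_t *pick_next_thread(void)
{
    for (int p = 0; p < NUM_PRIORITIES; p++) {
        tcb_t *t = ready_head[p];
        if (t) {
            ready_head[p] = t->next;
            if (!ready_head[p])
                ready_tail[p] = NULL;
            return t;   /* steps 8-10 restore its stack pointer and state */
        }
    }
    return NULL;        /* nothing ready: a real kernel runs an idle thread */
}
```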
A typical process context switch happens like this:
Process context switches are initiated by a thread context switch, so all of the above, steps 1-10, is going to need to happen.
At step 7 above, the scheduler decides to run a thread belonging to a different process from the one that owned the previously-running thread.
The memory-management hardware has to be loaded with the address space of the new process, i.e. whatever selectors/segments/flags allow the thread(s) of the new process to access its memory.
The context of any FPU hardware needs to be saved/restored from the PCB.
There may be other process-dedicated hardware that needs to be saved/restored.
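As a hedged sketch of that extra, process-only work, the C fragment below shows the check on the owning process plus the FPU save/restore described above. All type and function names (process_t, thread_t, fpu_save, mmu_load_address_space, ...) are invented, and the helpers are empty stand-ins for the real FXSAVE/XSAVE-style instructions and MMU reload, so this only illustrates the control flow.

```c
#include <stddef.h>

/* Invented, illustration-only structures. */
typedef struct process {
    unsigned char fpu_state[512];   /* FPU/SIMD context kept with the PCB, as above */
    /* ... page-table root, accounting, etc. ... */
} process_t;

typedef struct thread {
    process_t *proc;                /* owning process */
} thread_t;

/* Empty stand-ins: a real kernel would execute FXSAVE/XSAVE-like
 * instructions and reprogram the MMU here. */
static void fpu_save(unsigned char *buf)          { (void)buf; }
static void fpu_restore(const unsigned char *buf) { (void)buf; }
static void mmu_load_address_space(process_t *p)  { (void)p; }

/* The extra work only applies when the owning process changes. */
void switch_process_context(thread_t *prev, thread_t *next)
{
    if (prev->proc == next->proc)
        return;                               /* plain thread switch: nothing extra */

    fpu_save(prev->proc->fpu_state);          /* save outgoing FPU context    */
    mmu_load_address_space(next->proc);       /* load the new address space   */
    fpu_restore(next->proc->fpu_state);       /* restore incoming FPU context */
}
```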
On any real system, the mechanisms are architecture-dependent and the above is a rough and incomplete guide to the implications of either context switch. There are other overheads generated by a process-switch that are not strictly part of the switch - there may be extra cache-flushes and page-faults after a process-switch since some of its memory may have been paged out in favour of pages belonging to the process owning the thread that was running before.
I hope I can provide a more detailed and clearer picture.
First of all, the OS schedules threads, not processes, because threads are the only executable units in the system. A process switch is just a thread switch where the threads belong to different processes, and therefore the procedure is basically the same.
-
The scheduler is invoked. There are three basic scenarios in which this may happen:
- Involuntary switch. Some external event affecting scheduling has occurred outside the currently running thread. For example, an expired timer has woken up a thread with a high priority; or the disk controller has reported that the requested part of a file has been read into the memory and the thread waiting for it can continue its execution; or the system timer has told the kernel that your thread ran out of its time quantum; and so on.
- Voluntary switch. The thread explicitly requests rescheduling through a system call. For example, it may have requested to yield the CPU to some other thread, to be put to sleep, or to wait until a mutex is released.
- Semi-voluntary switch. The thread implicitly triggered rescheduling by performing some unrelated system call. For example, it asked to read a file. The OS forwarded this request to the disk controller and, so as not to waste time having the calling thread busy-wait, decided to switch to another thread.
In all cases, to be able to perform a context switch, control should be passed to the kernel. In the case of involuntary switches, this is performed by an interrupt. In the case of voluntary (and semi-voluntary) context switches, control is passed to the kernel via a system call.
In both cases, kernel entry is CPU-assisted. The processor performs a permissions check, saves the instruction pointer (so that execution can continue from the right instruction later), switches from user mode to kernel mode, activates the kernel stack (specific to the current thread) and jumps to a predefined and well-known point in the kernel code.
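Both entry paths can be observed from an ordinary user program. The short, runnable example below triggers two voluntary switches of the kind described above using the standard POSIX calls sched_yield() and nanosleep(); each call enters the kernel via the system-call mechanism just described.

```c
#define _POSIX_C_SOURCE 199309L
#include <sched.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    /* Voluntary switch: explicitly ask the kernel to reschedule.
     * The calling thread goes back into the ready queue. */
    sched_yield();

    /* Voluntary switch with blocking: the thread asks to sleep, so the
     * kernel moves it to a wait queue and picks another thread to run. */
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
    nanosleep(&ts, NULL);

    puts("back in user mode after two trips through the scheduler");
    return 0;
}
```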
The first action performed by the kernel is saving the contents of the CPU registers that it needs to use for its own purposes. Usually the kernel uses only general-purpose CPU registers and saves them by pushing them onto the stack.
The kernel then handles a primary request if needed. It may handle an interrupt, prepare a file read request, reload a timer etc.
At some point during request handling, the kernel performs an action that affects the state of either the current thread (for example, deciding that there is currently nothing for this thread to do because it is waiting for something) or of another thread or threads (for example, a thread becomes ready to run because an event it was waiting for has occurred, such as a mutex being released).
-
The kernel invokes the scheduler. The scheduler has to make two decisions.
- What to do with the current thread? Should it be blocked, and if so, in which wait queue should it be placed? If the switch is involuntary, the thread is placed at the end of the ready queue; otherwise, it is placed in one of the wait queues.
- Which thread should be run next?
Once both decisions have been made, the scheduler performs the context switch using the TCB of the current thread as well as that of the thread that is to be run next.
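Here is a hedged sketch of those two decisions in C, with a single ready queue and invented names throughout (queue_t, reschedule, context_switch, ...); real schedulers use per-priority queues and much more bookkeeping.

```c
#include <stddef.h>

/* Invented, illustration-only types. */
typedef struct thread { struct thread *next; } thread_t;
typedef struct { thread_t *head, *tail; } queue_t;

static queue_t ready_queue;                     /* runnable threads */

static void enqueue(queue_t *q, thread_t *t)
{
    t->next = NULL;
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
}

static thread_t *dequeue(queue_t *q)
{
    thread_t *t = q->head;
    if (t) { q->head = t->next; if (!q->head) q->tail = NULL; }
    return t;
}

/* Stand-in for the low-level switch (register save/restore, stack swap). */
static void context_switch(thread_t *prev, thread_t *next) { (void)prev; (void)next; }

typedef enum { SWITCH_INVOLUNTARY, SWITCH_VOLUNTARY } switch_kind_t;

/* Decision 1: requeue or block the current thread.
 * Decision 2: pick the next thread and switch to it. */
void reschedule(thread_t *current, switch_kind_t kind, queue_t *wait_queue)
{
    if (kind == SWITCH_INVOLUNTARY)
        enqueue(&ready_queue, current);         /* preempted but still runnable */
    else
        enqueue(wait_queue, current);           /* blocked until its event occurs */

    thread_t *next = dequeue(&ready_queue);     /* simplest policy: FIFO, one priority */
    if (next && next != current)
        context_switch(current, next);
}
```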
-
A context switch itself consists of three main steps (a runnable user-space illustration follows this list).
- The kernel figures out what CPU registers the thread actually uses and saves their content either on the stack or in the TCB of the unscheduled thread. In the case of the IA-32 CPU platform, if the thread does not use FPU and SSE registers, their content will not be saved.
- The kernel pushes the instruction pointer onto the stack and saves the value of the stack pointer in the TCB of the unscheduled thread. It then loads the stack pointer from the TCB of the scheduled thread and pops the instruction pointer from the top of its stack.
- The kernel figures out which registers are actually used by the scheduled thread and loads them with their previously stored contents (see step 1 above).
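The same save-registers / swap-stacks / restore-registers dance can be demonstrated in user space with the (now obsolescent, but still widely available) POSIX <ucontext.h> API. This is not kernel code, just an analogue: swapcontext() saves the current registers and stack pointer into one context structure and restores them from another, which is exactly the shape of the three steps above.

```c
#include <stdio.h>
#include <ucontext.h>

/* Two "threads" implemented as user-space contexts. */
static ucontext_t main_ctx, worker_ctx;

static void worker(void)
{
    puts("worker: running on its own stack");
    swapcontext(&worker_ctx, &main_ctx);   /* save worker state, resume main */
    puts("worker: resumed exactly where it left off");
}

int main(void)
{
    static char stack[64 * 1024];          /* the worker's private stack */

    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = stack;
    worker_ctx.uc_stack.ss_size = sizeof stack;
    worker_ctx.uc_link = &main_ctx;        /* resume main when worker() returns */
    makecontext(&worker_ctx, worker, 0);

    puts("main: switching to worker");
    swapcontext(&main_ctx, &worker_ctx);   /* save main, restore worker */
    puts("main: back again, switching a second time");
    swapcontext(&main_ctx, &worker_ctx);   /* worker resumes after its swap */
    puts("main: done");
    return 0;
}
```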
At this point the kernel checks whether the scheduled and unscheduled threads belong to the same process. If not (a "process" rather than a "thread" switch), the kernel switches the current address space by pointing the MMU (Memory Management Unit) at the page table of the scheduled process. The TLB (Translation Lookaside Buffer), a cache of recent virtual-to-physical address translations, is also flushed to prevent stale translations. Note that this is the only step in the entire set of context switch actions that cares about processes!
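A minimal sketch of that process-only step, using invented names (process_t, mmu_set_root); on x86 the real operation is a single write of the page-table root to the CR3 register, which flushes the non-global TLB entries as a side effect.

```c
#include <stdint.h>

/* Hypothetical process descriptor: only the field this step cares about. */
typedef struct process {
    uintptr_t page_table_root;   /* physical address of the top-level page table */
} process_t;

typedef struct thread {
    process_t *proc;
} thread_t;

/* Stand-in for the architecture-specific operation; recorded in a variable
 * only so that the sketch compiles and can be inspected. */
static uintptr_t current_page_table_root;
static void mmu_set_root(uintptr_t root) { current_page_table_root = root; }

/* The only process-aware step of the whole context switch. */
void maybe_switch_address_space(thread_t *prev, thread_t *next)
{
    if (prev->proc == next->proc)
        return;                                    /* same process: keep mappings and TLB */

    mmu_set_root(next->proc->page_table_root);     /* new mappings; TLB flushed by hardware */
}
```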
The kernel prepares Thread Local Storage (TLS) for the scheduled thread. For example, it maps the respective memory pages at the expected addresses. As another example, on the IA-32 platform a common approach is to load a segment register so that it points to the TLS data of the incoming thread.
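The effect of that per-thread TLS setup is easy to observe from user space. In the runnable example below (compile with -pthread), a C11 _Thread_local variable gives each pthread its own copy of counter precisely because the TLS base is repointed at every switch.

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread gets its own copy; after every context switch the TLS base
 * points at the block belonging to the thread now running. */
static _Thread_local int counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 3; i++)
        counter++;
    printf("worker counter = %d\n", counter);   /* always 3, never shared */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("main counter = %d\n", counter);     /* main's own copy: still 0 */
    return 0;
}
```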
The kernel loads the current thread's kernel stack address into the CPU. After this, every kernel invocation will use this kernel stack instead of the kernel stack of the unscheduled thread.
Another step which may be performed by the kernel is reprogramming the system timer. When the timer fires, control is returned to the kernel. The time period between the context switch and the timer firing is called a time quantum and indicates how much execution time the current thread is given at that time. This is known as pre-emptive scheduling.
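A hedged sketch of that timer step, assuming a hypothetical one-shot routine hw_timer_arm_oneshot(); the actual hardware and interface (local APIC timer, ARM generic timer, ...) vary per platform.

```c
#include <stdint.h>

/* Invented stand-in for the platform timer. When it fires, the interrupt
 * re-enters the kernel and the scheduler gets a chance to run again. */
static void hw_timer_arm_oneshot(uint32_t usec) { (void)usec; }

/* Arm the quantum for the thread that is about to run; different priority
 * classes might be given different slice lengths. */
void start_time_quantum(int priority_class)
{
    static const uint32_t quantum_us[] = { 5000, 10000, 20000, 40000 };
    hw_timer_arm_oneshot(quantum_us[priority_class & 3]);
}
```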
Kernels usually collect statistics during context switches to improve scheduling as well as to show system administrators and users what is going on in the system. These statistics may include such information as how much CPU time the thread has consumed, how many times it has been scheduled, how many times its time quantum has expired, how frequently context switches are occurring in the system etc.
The context switch can be considered complete at this point, and the kernel continues the previously interrupted system activity. For example, if the thread had tried to acquire a mutex during a system call, and the mutex is now free, the kernel may finish the interrupted operation.
At some point the thread finishes its system activities and wants to return to user mode to execute non-system code. The kernel pops the contents of the general-purpose registers, previously saved upon kernel entry, off the kernel stack and makes the CPU execute a special instruction to return to user mode.
The CPU restores the values of the instruction pointer and stack pointer that were saved when kernel mode was entered. At this point the thread's user-mode stack is activated and kernel mode is exited (which prohibits the use of special system instructions).
Finally, the CPU continues execution from the point where the thread was when it was unscheduled. If that happened during a system call, the thread will proceed from the point where the system call was invoked, capturing and handling its result. In the case of pre-emption by an interrupt, the thread will continue its execution as if nothing had happened.
Some summary notes:
The kernel only schedules and executes threads, not processes - context switches take place between threads.
The procedure of switching to the context of a thread from another process is essentially the same as a context switch between threads belonging to the same process. Only one additional step is required: changing the page tables (and flushing the TLB).
Thread context is stored either on the kernel stack or in the TCB (not the PCB!).
Context switching is an expensive operation - it has a significant direct cost in performance, and the indirect cost caused by cache pollution (and the TLB flush, if the switch occurred between processes) is even greater.