
Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

Processes are the abstraction that lets us divide up the CPU and other system resources between different programs. Historically, the idea of a process grew out of the separate virtual machines of classic mainframe OS architectures. Over time, some of the features traditionally associated with a process (such as a separate address space) have been shed, giving rise to a newer notion of a thread, which slices up the CPU without dividing up other resources. Curiously, this is almost the reverse of what happened with subroutines, which started out only affecting the instruction pointer in many early programming languages (BASIC, FORTRAN) but later added separate address spaces (local variables, objects). We'll follow the historical pattern by talking about processes first.

1. Processes

A process typically encompasses a bunch of things bundled together:

  * The state of the CPU: instruction pointer, registers, stack pointer.
  * An address space: the memory the process can see, including its code, data, heap, and stack.
  * Other resources held on the process's behalf by the kernel: open files, network connections, and bookkeeping information like the process ID.

In order to implement processes, we need to solve two problems:

  1. How do we divide up the CPU between processes?
  2. How do we keep all the other bits of processes separate?

The first problem can be further divided into building a mechanism that lets the CPU switch between processes (multitasking) and choosing how to decide which process it should run next (scheduling). (This is an example of the split between mechanism and policy that recurs throughout OS design and development.) The second problem is a question of allocating the right data structures, which we'll talk about below, and setting up the memory system to give each process its own address space, which we'll defer to VirtualMemory.

2. Multitasking

Here's a picture of a single process A running along doing its stuff:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Here are two processes A and B running along in parallel (say on a system with multiple CPUs):

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
And here are A and B sharing a single CPU:

AAAAAAAA             AAAAAAA                AAAAAA
        BBBBBBBBBBBBB       BBBBBBBBBBBBBBBB      BBBB

The trick here is that as long as A and B don't have the ability to detect that they've been shunted aside, they won't be able to tell the difference between this picture and the previous picture. So when designing a multitasking system, we just have to make sure that we clean up anything that B does when it's running so that A doesn't notice and vice versa.

We also have to figure out how to switch the CPU's instruction pointer from A's code to B's code. Fortunately this is not a new problem: we do it all the time when executing a procedure call. The difference between executing a procedure call and doing a context switch between processes is that (a) we generally don't expect to come back, and (b) the "calling" process doesn't necessarily know that the context switch is happening. But since we already have mechanisms for calling other code, we can exploit them to build context switches. There are two typical ways this is done, depending on whether we expect our processes to cooperate or not.

2.1. Option 1: Cooperative multitasking

Here we implement a context-switch as a first-class operation triggered by the running process, typically through a yield system call. So we might write code like

#include <myfancyos.h>

int main(int argc, char **argv)
{
    while(1) {
        do_something();  /* whatever useful work this program does */
        yield();         /* give up the CPU so other processes can run */
    }
    exit(0);             /* not reached */
}

(Note that exit will also be a system call or a library wrapper to a system call.)

2.1.1. Mechanism

Here most of the code just runs like standard C code. But when we execute yield, several strange things happen under the hood:

  1. The CPU state is saved. Typically this involves pushing the IP and process registers onto the stack. On an x86 machine, this could be done using the CALL instruction to save the IP (automatically done as part of the yield procedure call if it is implemented as a library procedure; for software interrupts we rely on similar behavior in INT) and the PUSHAD instruction to push all the registers. For other architectures that don't provide such convenient tools the process may be more complicated, but is usually a simple matter of programming. The last step is to save the stack pointer itself in the process control block—the data structure that holds all the extra junk the kernel has to keep track of for the process.

  2. Any other process state that shouldn't be clobbered is saved. At minimum this involves switching a pointer in the kernel to a new process control block. It may also involve talking to the memory management hardware to switch to a new address space and deal with any effects on caches.

As the automobile service manuals say, "installation is the reverse of removal":

  1. Copy the saved state of the new process to wherever it needs to go; switch the PCB pointer and update memory management if necessary.
  2. Restore the stack pointer and start popping: POPAD, IRET, or some such.

You should be wondering here how yield knows which process to switch to; we'll talk about this under Scheduling below. You should also notice that even if we don't implement any other separation between processes, we at minimum have to have separate stacks. This is the main thing that separates a multi-threaded process from a standard single-threaded one.
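The process control block mentioned in step 1 varies wildly from kernel to kernel, but a minimal sketch might look like the struct below. The field names here are invented for illustration; real PCBs (Linux's struct task_struct, for example) carry far more state.

```c
/* Hypothetical minimal process control block; field names are
   made up for illustration. */
enum proc_state { PROC_RUNNING, PROC_READY, PROC_BLOCKED };

struct pcb {
    int             pid;         /* process identifier */
    enum proc_state state;       /* what the scheduler thinks it's doing */
    void           *saved_sp;    /* stack pointer saved at context switch;
                                    the registers live on that stack */
    void           *page_table;  /* root of this process's address space */
    struct pcb     *next;        /* link for the scheduler's run queue */
};
```

Note that once the registers have been pushed onto the process's own stack, saving the stack pointer alone is enough to recover all of them later.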

Note that these steps are generally very architecture-dependent. Some CPU architectures (including the x86) include features for hardware context switching, where state changes can be handled by switching a pointer built into the CPU. Since these features tend to be even more architecture-dependent and also depend on the OS agreeing to use the CPU's idea of what to put in a PCB, they tend not to be used much in OSs aiming for portability. But they are handy if you are coding for a single architecture (like our friends in Redmond or the builders of some embedded systems) and really care about performance.

The advantage of cooperative multitasking is that the implementation is not very hard and you can yield only when it is safe to do so. The disadvantage is that you have to remember to yield a lot, which can produce filibusters if the processes want to be rude. This causes trouble not only with this code snippet:

    while(1);

But it also annoys anybody using the same machine as this fine program:

    while(1) {
        compute_wind_tunnel_simulation_without_yielding();
        yield();
    }

Here the on-screen mouse pointer is likely to move very jerkily (if at all), since if we can only switch contexts when yielding we don't get to run our window system very often. The solution is to interrupt these rude processes whether they like it or not.

2.2. Option 2: Pre-emptive multitasking

Pre-emption is when the kernel yields on behalf of a process rather than making the process do it. For the most part, this is a good thing: like running a garbage collector instead of malloc and free or using virtual memory instead of paging by hand, we are giving up a little bit of control to the underlying system in return for getting rid of a lot of headaches. Unfortunately in each of these cases we create new headaches. The particular new headache we get with pre-emption is that we have to take concurrency seriously.

2.2.1. Mechanism

Typically we still allow voluntary yielding, although this operation is usually hidden under some other blocking system call (e.g. read on data that isn't available yet). But we now also have incoming interrupts that may produce a context switch between processes. The mechanism for switching out a process is now:

  1. Interrupt triggers switch to ring 0 interrupt handler. The interrupt might be from an I/O device but much of the time will be triggered by a timer every fixed number of milliseconds.
  2. Interrupt handler calls kernel-internal yield routine.
  3. Kernel-internal routine saves process state as before and jumps to the scheduler.

The mechanism for switching a process back in is essentially unchanged.

2.2.2. Gotchas

With pre-emptive multitasking, some other process may sneak in and break things while I'm not looking. With separate address spaces (or just separate data structures) this is not a problem. In a threading model with a common address space, this can lead to all sorts of nasty inconsistencies. This is particularly likely to come up inside the kernel, where a common approach is to have one thread per user process sharing a common set of core data structures.

The simplest approach to preventing bad outcomes of unwanted concurrency is to prevent the concurrency, usually through some sort of locking or critical section mechanism. A lock marks a data structure as inaccessible to anybody but the lock-holder. This requires politeness, and in a reasonable implementation also requires interaction with the scheduler so that processes don't spin waiting for a lock to become available. A critical section enforces that no other thread takes the CPU while it is running. This only requires turning off interrupts (CLI on x86). We'll talk about how to use these and how not to use them later.
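As a taste of the locking mechanism, here is a minimal spinlock built from a C11 atomic flag (the function names are made up). It requires politeness in exactly the sense above, since nothing stops a thread from touching shared_counter without taking the lock, and a reasonable implementation would tell the scheduler to block the caller rather than spin.

```c
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static int shared_counter;             /* the data the lock protects */

void acquire(void)
{
    while (atomic_flag_test_and_set(&lock))
        ;                              /* spin until the holder releases */
}

void release(void)
{
    atomic_flag_clear(&lock);
}

void increment(void)
{
    acquire();
    shared_counter++;                  /* no other lock-holder can interleave here */
    release();
}
```

The test-and-set is atomic, so even if a pre-emption lands between any two instructions, at most one thread ever sees the flag clear and proceeds.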

3. Scheduling

We've deferred the question of how to choose which process to switch to. This is generally the job of a scheduler, which might range anywhere from a couple of lines of code inside yield in a simple kernel to a full-blown user-space process in a microkernel-based design. However we set up the scheduling mechanism, we can separately consider the issue of scheduling policy. Here the goal is to balance between (a) keeping the kernel simple and understandable, and (b) not annoying the user. We also have to keep track of which processes are blocked waiting for locks or slow I/O operations.

Here are some common policies:

Round-robin scheduling
Processes move to the end of a queue when pre-empted (or unblocked). The next process to run is whatever is at the head of the queue. This has the advantage of speed and simplicity, but it doesn't give much control over resource allocation.
Random scheduling
Instead of putting the runnable processes in a queue, put them in a bag and draw the next one to run at random. Theoretically this is approximately equivalent to round-robin, assuming your random number generator is reasonably good. OS implementers hate it, since an unlucky process may starve.
Priority scheduling
Use a priority queue. High-priority processes take precedence over lower-priority processes. Priorities can be assigned either by the user (e.g. the update-the-mouse-pointer-now-so-the-user-doesn't-get-annoyed process beats the wind tunnel simulator) or by the system based on past performance (CPU hogs that have a history of getting pre-empted get lower priority than processes that yield the CPU quickly). This is the strategy used in Linux, which even has a particularly fast (constant-time!) priority queue implementation.
Real-time scheduling
Here some processes have hard constraints on when they run next or how often they run (think airplane control systems). The scheduler runs an algorithm that enforces these constraints if possible.
Fair-share scheduling
Each user gets a fixed slice of the CPU that is further divided between the user's processes using some other scheduling mechanism.
Let the user do it
This is the common microkernel approach alluded to earlier. The kernel's scheduler policy is to always run the designated user-space scheduler process, which can delegate its control of the CPU to specific other processes.
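The round-robin policy above is short enough to sketch in full: a singly linked run queue of (hypothetical) process structures, where pre-emption sends a process to the tail and the scheduler takes the next one from the head. The names here are invented for this sketch.

```c
#include <stddef.h>

struct proc { int pid; struct proc *next; };   /* stand-in for a PCB */

static struct proc *rq_head, *rq_tail;

/* Called when a process is pre-empted or unblocked. */
void rq_enqueue(struct proc *p)
{
    p->next = NULL;
    if (rq_tail)
        rq_tail->next = p;
    else
        rq_head = p;
    rq_tail = p;
}

/* Called by the scheduler to pick the next process to run;
   returns NULL if there is nothing runnable. */
struct proc *rq_pick_next(void)
{
    struct proc *p = rq_head;
    if (p) {
        rq_head = p->next;
        if (!rq_head)
            rq_tail = NULL;
    }
    return p;
}
```

Both operations are constant-time, which is the "speed and simplicity" advantage mentioned above; the cost is that every runnable process gets the same treatment, hog or not.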

There are trade-offs in choosing any of these policies between simplicity, efficiency, performance, predictability, and user control. Which factors are the most important depends on the application. A typical modern OS will provide some hybrid policy whose default has been observed to work for most applications but which can be overridden by the user when necessary.

An issue orthogonal to scheduling policy is the quantum, or minimum time that we schedule a process for before pre-empting it. With a small quantum we pre-empt a lot (which costs time). With a big quantum the mouse pointer stops moving. Consumer OSs will typically use a fixed quantum of about 10ms, which is less than most humans will notice and comparable to disk-access delays. Real-time embedded OSs may use quanta measured in microseconds that vary depending on the real-time constraints on other processes. Again these are policy trade-offs that depend on the application.


2014-06-17 11:58