Some notes on processor scheduling. For more details, see Silberschatz, Galvin, and Gagne (SGG), Chapter 5.
1. Processor scheduling: basics
Recall the basic picture for processes: we have various tasks (processes or threads) waiting in queues (which may not enforce strict FIFO ordering) for various resources. One resource that pretty much every task needs is the CPU. We'd like to arrange access to the CPU to take advantage of the particular needs of particular processes and minimize the noticeable waiting time and general sluggishness of our system. The main mechanism for doing this is a short-term scheduler or CPU scheduler that chooses which process to pull off the ready queue when the CPU becomes idle. More sophisticated schedulers may also include a long-term scheduler that tracks the behavior of processes and adjusts their parameters to influence the behavior of the short-term scheduler, or (in batch systems) may even plot out CPU assignments well into the future. We'll mostly be talking about short-term scheduling.
Note that some of the issues in CPU scheduling might apply to other scarce resources, e.g. traffic shaping on a network. Conversely, if you have a lot of CPUs or few processes, CPU scheduling becomes much less of an issue since the resource isn't scarce.
1.1. CPU burst cycle
Processes typically alternate CPU bursts with I/O bursts. During the CPU burst the process needs the CPU; during the I/O burst the process doesn't, because it's waiting for some slow device. Through observation and experiment, OS researchers have determined that the length of CPU bursts measured across all processes is distributed approximately exponentially, with short bursts much more likely than long bursts. This pattern is somewhat true even of single processes (think of a program that has to do several I/O operations in a row after a long computation), but the distribution may be shifted depending on whether the process is CPU bound (mostly wanting to use the CPU for long computations) or I/O bound (mostly waiting for I/O).
1.2. Types of jobs
- System processes
- Core system services. These typically have to be scheduled quickly, or the system keels over dead. Behavior tends to be predictable since they ship with the OS.
- Real-time processes
- Have hard real-time constraints (e.g. turn off the X-ray gun after at most 10 ms).
- Interactive processes
- User notices if they are slow: window system, word processors, web servers.
- Batch processes
- User doesn't notice if they are slow as long as they finish eventually and don't get in the way: background animation rendering, anti-virus scanning.
Goal is to take advantage of the characteristics of these jobs to maximize the right criterion for each.
1.3. Performance measures
(See SGG §5.2.)
- CPU utilization: what percentage of time the CPU is not idle. A good scheduler keeps the CPU busy. In particular this means that we want to space processes out so they aren't all trying to do I/O at once.
- Throughput: how many processes are completed per time unit. (Depends on what the processes are doing).
- Turnaround time: how long it takes a particular process to finish.
- Waiting time: total time spent by all processes in the ready queue.
- Response time: time between user input and corresponding system action.
Response time likely will be the sole criterion for real-time processes and the main criterion for interactive processes (and possibly system processes that they depend on). Batch processes are likely to care more about turnaround time. The other measures (which tend to be closely related) are mostly useful to let you know if your scheduler is screwing up somehow, but waiting time can vary dramatically depending on scheduling policy and is the factor that most closely affects response time.
There is a relationship between average queue length n, average arrival rate λ (in processes/second), and average waiting time for a single process W (in seconds): n = λW. This allows computing any of these quantities from the other two. Total waiting time will be nW = λW²; so total waiting time grows faster than individual process waiting time. This is another reason why waiting time is so important.
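As a sanity check on the units, here is the n = λW relationship in code. The arrival rate and per-process wait below are made-up numbers, chosen only to illustrate the arithmetic:

```python
# Little's law: n = lambda * W.  All numbers below are hypothetical.
arrival_rate = 20.0       # lambda: processes arriving per second
wait_per_process = 0.5    # W: average wait per process, in seconds

queue_length = arrival_rate * wait_per_process  # n = lambda * W
total_wait = queue_length * wait_per_process    # n * W = lambda * W**2

print(queue_length)  # average number of processes in the ready queue
print(total_wait)    # total process-seconds of waiting
```

Note that doubling W doubles n too, which is why total waiting time (nW) grows quadratically in W.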
Again, if the CPU is free most of the time, it doesn't really matter what scheduling method we use. The goal is to get graceful degradation in performance when the system is loaded.
2. When scheduling happens
To a first approximation, we schedule at every context switch. Following SGG §5.1.3, we can observe four places where this might happen, as a function of a change in state of some process:
- Running → waiting, because of an I/O or mutex request, waiting on some condition, or explicitly calling yield.
- Running → ready. (Preemption.)
- Waiting → ready. A resource becomes available.
- Running → terminated. The process exits.
The first and last cases involve a process giving up the CPU, so we have to schedule some other process. A cooperative or non-preemptive scheduler will run only in these cases. A preemptive scheduler may be called in the other cases as well.
If we are doing scheduling often, we want our scheduler to be cheap (possibly even running inside the kernel thread that is being preempted, so we avoid an extra context switch). At the same time, we want to use a good scheduling policy to minimize waiting time (and thus response time for interactive processes). So there is a trade-off between scheduling algorithm complexity and performance. Since the performance gains depend on factors that are much less predictable than the cost of added complexity, this tends to favor simpler algorithms.
3. Scheduling algorithms
3.1. First-come first-served
The ready queue is a queue; new processes go to the end of the line. No preemption. Main advantage: simplicity.
Disadvantage: High waiting times if processes vary in CPU burst length. Consider processes with CPU bursts of 100, 1, 1, 1, 1, and 1 milliseconds. If the first process is the 100-ms one, we get 510 ms of total waiting time as compared to 15 ms if we sort in the other order.
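The arithmetic in this example is easy to check mechanically. A few lines of Python, using the burst lengths from the text:

```python
def fcfs_total_wait(bursts):
    """Total waiting time under FCFS, given CPU bursts in queue order.
    Each process waits for the sum of the bursts ahead of it."""
    total, elapsed = 0, 0
    for burst in bursts:
        total += elapsed   # this process has waited 'elapsed' ms
        elapsed += burst
    return total

print(fcfs_total_wait([100, 1, 1, 1, 1, 1]))  # 510: long job first
print(fcfs_total_wait([1, 1, 1, 1, 1, 100]))  # 15: long job last
```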
3.2. Round robin
Basically FCFS with preemption. After using up its quantum, a process is bumped from the CPU whether it is done or not. Waiting time for an individual process now looks roughly like ceil((length of CPU burst)/((quantum)-(context switch time))) * (quantum) * (average number of waiting processes); a quick way to read this is that a process waits for roughly (n-1)/n of its run time, where n is the size of the ready queue.
Adjusting the quantum is important: With a big quantum, we get poor response time. With a small quantum, we spend a lot of time on unnecessary context switches. A common heuristic is to aim for most CPU bursts to be shorter than the quantum, so that most processes are not preempted.
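To see the quantum trade-off concretely, here is a minimal round-robin simulator. It assumes all jobs arrive at time 0 and charges an invented per-dispatch switch_cost; it is a sketch, not a faithful model of any real scheduler:

```python
from collections import deque

def round_robin(bursts, quantum, switch_cost=0.0):
    """Simulate round robin over CPU bursts, all ready at time 0.
    Returns {job index: completion time}; switch_cost is charged per dispatch."""
    queue = deque((i, b) for i, b in enumerate(bursts))
    t, done = 0.0, {}
    while queue:
        i, remaining = queue.popleft()
        t += switch_cost                        # dispatch overhead
        run = min(quantum, remaining)
        t += run
        if remaining > run:
            queue.append((i, remaining - run))  # preempted: back of the line
        else:
            done[i] = t
    return done

# A small quantum lets the short jobs finish early...
print(round_robin([100, 1, 1], quantum=1))
# ...but the long job pays for many more dispatches once switching costs.
print(round_robin([100, 1, 1], quantum=1, switch_cost=0.1))
```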
3.3. Shortest-job-first
If we know the burst lengths, we can minimize waiting time by always running the shortest job first, sorting jobs as in the FCFS example above. If we don't know the burst lengths, we can guess based on averaging previous burst lengths. The typical approach is to use an exponentially-declining weighted average, computed as (guess) = (previous guess) * α + (last observation) * (1-α) for some constant α between 0 and 1.
This still has the problem that once a big job gets the CPU any jobs that arrive after it will have to wait.
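The exponential-averaging rule just described is one line of code. The initial guess, α, and the burst values below are arbitrary:

```python
def update_guess(prev_guess, observed, alpha=0.5):
    """Exponentially-declining weighted average of CPU burst lengths:
    guess = previous guess * alpha + last observation * (1 - alpha)."""
    return prev_guess * alpha + observed * (1 - alpha)

guess = 10.0                      # initial estimate (arbitrary)
for burst in [6, 4, 6, 4]:        # hypothetical observed bursts
    guess = update_guess(guess, burst)
print(guess)  # the influence of old bursts decays geometrically
```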
3.4. Priority scheduling
Generalizes SJF by allowing for arbitrary priorities rather than just burst length. The highest-priority job gets the CPU whenever it becomes free or, in a preemptive model, whenever the CPU is held by a lower-priority job.
Disadvantages: Low-priority jobs may starve (solution: slowly increase the priority of starving jobs). A high-priority job waiting for a low-priority job to do something may have to wait a long time, a situation called priority inversion (solution: temporarily bump up the priority of the job being waited for). General problem: Priority queues are an expensive data structure.
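A toy model of the aging fix for starvation (the priorities, job names, one-tick quantum, and aging increment are all invented): each tick a fresh high-priority system job arrives, so without aging a low-priority batch job never runs.

```python
import heapq
import itertools

def run_with_aging(initial_jobs, ticks=10, aging=1):
    """Each tick: a new priority-0 job arrives, the job with the lowest
    priority number runs, and every job still waiting has its priority
    improved by 'aging'.  initial_jobs is a list of (priority, name)
    pairs; returns the order in which jobs got the CPU."""
    counter = itertools.count()             # FIFO tie-break among equals
    heap = [(p, next(counter), name) for p, name in initial_jobs]
    heapq.heapify(heap)
    ran = []
    for t in range(ticks):
        heapq.heappush(heap, (0, next(counter), f"sys{t}"))
        prio, c, name = heapq.heappop(heap)
        ran.append(name)
        heap = [(p - aging, c, n) for p, c, n in heap]   # age the waiters
        heapq.heapify(heap)
    return ran

print(run_with_aging([(5, "batch")], aging=1))  # "batch" eventually runs
print(run_with_aging([(5, "batch")], aging=0))  # "batch" starves
```

Rebuilding the heap every tick is O(n), which illustrates the "priority queues are expensive" complaint; real schedulers avoid touching every waiting job per tick.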
For real-time scheduling, earliest deadline first is a common priority-based scheme that guarantees completion of all jobs by their deadlines if possible.
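For jobs that are all released at once, EDF reduces to "run in deadline order," and the guarantee can be checked directly: if this order misses a deadline, no order meets them all. The deadlines, bursts, and names below are invented:

```python
def edf_schedule(jobs):
    """jobs = [(deadline, burst, name)], all released at time 0.
    Run in increasing-deadline order; report any missed deadlines."""
    t, order, missed = 0, [], []
    for deadline, burst, name in sorted(jobs):
        t += burst
        order.append(name)
        if t > deadline:
            missed.append(name)
    return order, missed

order, missed = edf_schedule([(30, 5, "log"), (10, 2, "xray_off"), (20, 8, "sensor")])
print(order)   # ['xray_off', 'sensor', 'log']
print(missed)  # []: every deadline met
```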
3.5. Multilevel queue scheduling
Partition the jobs into classes based on their scheduling characteristics and schedule each class off of a separate ready queue. So, for example, we can schedule real-time jobs using earliest-deadline-first, system jobs using priorities, interactive jobs using round-robin, and batch jobs using first-come first-served. We still have to decide how to allocate the CPU between the queues; here a mix of priority scheduling (real-time jobs always take priority) and timeslicing (say, 10% of the CPU goes to batch jobs) typically works.
3.6. Multilevel feedback-queue scheduling
Like MQS, but assign processes to classes based on their past behavior. So processes with short CPU bursts get put on a queue with a smaller quantum, while processes that never give up the CPU are eventually classified as batch jobs (since they don't do any I/O, nobody will notice!). Hideously complex in its full glory.
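A skeletal version of the feedback rule, stripped of its full-glory complexity. The two levels, quanta, and demote-on-full-quantum policy are the standard textbook setup; everything else here is invented:

```python
from collections import deque

def mlfq_dispatch(queues, quanta):
    """One dispatch of a multilevel feedback queue.  queues[i] holds
    (name, remaining_burst) pairs; quanta[i] is level i's quantum.
    A job that uses its whole quantum looks CPU-bound and is demoted;
    a job that finishes early (as if blocking for I/O) just leaves."""
    for level, q in enumerate(queues):
        if not q:
            continue                      # try the next (slower) level
        name, remaining = q.popleft()
        run = min(quanta[level], remaining)
        remaining -= run
        if remaining > 0:                 # burned the full quantum: demote
            dest = min(level + 1, len(queues) - 1)
            queues[dest].append((name, remaining))
        return name, run
    return None

queues = [deque([("editor", 2), ("render", 50)]), deque()]
quanta = [4, 16]
print(mlfq_dispatch(queues, quanta))  # ('editor', 2): short burst, stays put
print(mlfq_dispatch(queues, quanta))  # ('render', 4): full quantum, demoted
print(mlfq_dispatch(queues, quanta))  # ('render', 16): now on the slow queue
```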
4. Multiple processors
Ideal case: one processor per process. No scheduling!
More typical case: small number of processors, many processes. Two basic approaches: centralized scheduling (asymmetric multiprocessing) or distributed scheduling (symmetric multiprocessing). The latter does scheduling on a per-processor basis, with long-term load balancing to move jobs from heavily-loaded processors to lightly-loaded ones, either by having the heavily-loaded processors push jobs away or by having the lightly-loaded ones engage in work stealing (more efficient, since the lightly-loaded processors have time on their hands to look for new jobs).
Load balancing must balance two conflicting goals:
- Processor affinity
- It's best to keep a job on the same processor as much as possible, since that's where all its cached data sits.
- Load balancing
- If we never move anybody, we can't fix imbalances.
A typical solution is to delay load balancing until it's clear that the long-term payoff from adjusting the load exceeds the cost of moving a process. We can also allow processes to lock themselves to particular CPUs if they know better than we do what they are doing.
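A sketch of the "steal only when it pays" rule. The threshold is a crude stand-in for the migration-cost comparison, and the per-CPU queues are just deques of job names:

```python
from collections import deque

def try_steal(run_queues, idle_cpu, threshold=2):
    """An idle CPU finds the most-loaded run queue and steals one job,
    but only when the imbalance exceeds 'threshold' -- a stand-in for
    'payoff from balancing exceeds the cost of losing cache affinity'."""
    victim = max(range(len(run_queues)), key=lambda i: len(run_queues[i]))
    imbalance = len(run_queues[victim]) - len(run_queues[idle_cpu])
    if victim != idle_cpu and imbalance >= threshold:
        run_queues[idle_cpu].append(run_queues[victim].pop())  # migrate one job
        return True
    return False

queues = [deque(["a", "b", "c", "d"]), deque()]
print(try_steal(queues, idle_cpu=1))  # True: imbalance of 4 justifies a steal
print(try_steal(queues, idle_cpu=1))  # True: still imbalanced (3 vs 1)
print(try_steal(queues, idle_cpu=1))  # False: 2 vs 2, not worth moving anyone
```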
5. Algorithms in practice
See SGG §5.6. Short version: Solaris uses multilevel queue scheduling with distinct scheduling methods for each class; Windows XP uses priority scheduling with a base priority class and "relative priority" adjustments that penalize processes that don't give up the CPU before the end of their quantum; Linux uses static priorities for real-time tasks and dynamic priorities for most tasks.
How do we evaluate a scheduling algorithm? Theoretical approach: use queueing theory. Practical approach: simulate first, then implement what seems to work. See SGG for more details.