Basic idea: Allow address spaces to contain logical addresses that don't refer to any physical address, but instead point to blocks on a disk. Use the page fault mechanism in the paging system to copy the disk blocks into main memory as needed (this requires flushing other blocks back to disk to make room).
Mostly we are cribbing from Silberschatz, Galvin, and Gagne (SGG), Chapter 9, here, so you should read that instead of this page.
1. Terminology
- Backing store
Where we put pages that don't fit in main memory (typically some sort of disk). The specific region on a disk used for backing store is often called swap space.
- Pager
Program or other mechanism that manages moving pages back and forth.
- Demand paging
- Only fetching pages when requested (as opposed to e.g. loading up all pages from a program at startup).
- Resident set
Set of pages from a process that are present (resident) in main memory.
- Page fault
- Failed attempt to access (read or write) a page in main memory, because the page is not currently resident there.
2. Handling a page fault
What does the kernel do when a process requests a page marked as invalid in its page table?
- Check against VM page table (possibly implemented in the same structure).
- Maybe it really is an invalid address → SIGSEGV the process.
- Otherwise it's out on backing store (or somewhere else).
- If on backing store, allocate a physical memory frame from the free list and copy it in.
If free list is empty, this requires flushing some other page first. If the other page hasn't been written to (it is clean), we can just throw it away. If it has been written to (it's dirty), then we have to write it out before reusing its frame.
- Restart the process when the page is available.
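The steps above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the class `ToyVM`, its fields, and the use of FIFO to pick a victim are all hypothetical choices made for the example, and "disk" traffic is just counted rather than performed.

```python
from collections import deque


class ToyVM:
    """Toy model of the page-fault path (all names are hypothetical)."""

    def __init__(self, num_frames, valid_pages):
        self.free_frames = deque(range(num_frames))  # free list of frame numbers
        self.valid_pages = set(valid_pages)          # pages the process may legally touch
        self.resident = {}                           # page -> frame, i.e. the resident set
        self.dirty = set()                           # resident pages written since loading
        self.load_order = deque()                    # FIFO victim choice (placeholder policy)
        self.disk_reads = 0
        self.disk_writes = 0

    def access(self, page, write=False):
        if page not in self.resident:                # "invalid" in the page table
            self.handle_fault(page)
        if write:
            self.dirty.add(page)
        return self.resident[page]

    def handle_fault(self, page):
        # Really an invalid address: the analogue of SIGSEGV.
        if page not in self.valid_pages:
            raise MemoryError(f"SIGSEGV: page {page} is not mapped")
        # No free frame: flush some other page first.
        if not self.free_frames:
            victim = self.load_order.popleft()
            if victim in self.dirty:                 # dirty: write it out before reuse
                self.disk_writes += 1
                self.dirty.discard(victim)
            # A clean victim is simply thrown away.
            self.free_frames.append(self.resident.pop(victim))
        # Copy the page in from backing store, then the access can be retried.
        frame = self.free_frames.popleft()
        self.disk_reads += 1
        self.resident[page] = frame
        self.load_order.append(page)
```

With two frames, touching pages 0 (written), 1, and 2 in turn costs three reads plus one write, because the evicted page 0 was dirty; evicting the clean page 1 afterwards costs no write.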
3. What can we page?
Userspace processes? Sure, why not? Except don't page out your disk driver daemon.
Kernel code? As long as you don't page out your pager.
Page tables? Yes, if you are very very careful.
We will mostly ignore this issue.
4. Performance
Expected time for memory access = (probability of page fault)×(cost of servicing page fault) + (normal memory access time).
- Normal access time = 10ns
Service time = 10ms = 10⁶ normal memory accesses.
- p = 0.001
Expected access time = 10.01μs ≈ 10³ normal memory accesses!
So we need a very low page fault rate: ~10⁻⁶ if we want to only double expected access time, lower if we want to be able to ignore the VM.
Fortunately, that 10⁻⁶ isn't as ludicrous a page fault rate as it looks: for a typical program with a lot of locality, a loop spinning through the same page or two generates millions of accesses that are probably not going to generate page faults as long as we are smart about not swapping out the pages it uses. Getting a low page fault rate by being smart in this way is the job of the page replacement algorithm.
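The arithmetic above is easy to replay for other fault rates. The sketch below just evaluates the expected-access-time formula with the figures used in the text (10ns access, 10ms service); the function name and its defaults are assumptions made for this example.

```python
def expected_access_ns(p, access_ns=10.0, service_ns=10e6):
    """Expected access time in ns: p * (fault service time) + (normal access time)."""
    return p * service_ns + access_ns


if __name__ == "__main__":
    for p in (1e-3, 1e-6, 1e-8):
        t = expected_access_ns(p)
        print(f"p = {p:g}: {t:.2f} ns, {t / 10.0:.2f}x a normal access")
```

At p = 0.001 the fault term dominates completely; at p = 10⁻⁶ the two terms are equal, which is where the "only double" figure comes from.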
5. Page replacement strategies
Observation: most processes have a working set of actively-used pages, whose size can be much smaller (say 10%) than the size of the address space. If the working sets of all the active processes fit in memory, we can avoid having any page faults.
This leaves us with two conflicting goals (cf. process scheduling):
- Get a good approximation to the working sets resident in memory so that we don't get too many page faults.
- Don't spend too much time thinking about paging.
The page replacement algorithm chooses which pages to get rid of; the idea is that the pages that are left should be ones we are likely to need again. To make this work, we will assume that the working set of each process changes slowly over time, so that a page we are using now we are likely to need again soon.
- FIFO
Flush the page that has been resident longest. Ignores usage. We don't like it: among other things, some request sequences can cause more paging as memory gets larger (Belady's anomaly). (See SGG §9.4.2 for analysis of the bad sequence 1 2 3 4 1 2 5 1 2 3 4 5, which causes slightly more page faults with 4 frames than with 3.)
- Optimal paging (OPT)
- If we know the request sequence, we can compute an optimal page replacement schedule with dynamic programming (or, more simply, by always flushing the page whose next use lies farthest in the future). But we usually don't know the request sequence.
- Least-recently-used (LRU)
Flush page that has been untouched the longest. Requires hardware support to track page-use time (e.g. in TLB); overhead to track every reference in software is prohibitive. So we are at the mercy of hardware vendors, who don't actually provide last-access-time fields.
- Approximate LRU
What we do get out of hardware is reference bits that indicate whether a page has been written to or read from since we last cleared the bits. This lets us make guesses about which pages are LRU or at least which have not been used in a while. Some variants:
- Second chance (clock algorithm)
- Pages are in a circular queue. Clock pointer acts like the pointer in FIFO, except that if a page's reference bit is set, we clear it and move on to the next page in the cycle. In the worst case we clear every bit and get back to the original page (we hope this doesn't happen too often—it basically means we get a very expensive implementation of FIFO). Why this works: frequently-accessed pages will get their reference bits set before we get back to them.
- Enhanced second chance
- Basic second chance prefers flushing unreferenced pages (0 bit) to referenced pages (1 bit). In the enhanced version we have separate access bits and dirty (modified) bits, and prefer flushing 00 to 01 to 10 to 11; the idea is that given a choice between a clean page to flush and a dirty page to flush, we flush the clean page because we don't need to do an extra disk access to write it out.
- Counting schemes
e.g. least-frequently used, most-frequently used. Mostly useful to make other page-replacement strategies look good.
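To compare the strategies listed above, here is a minimal fault-counting sketch for FIFO, LRU, OPT, and the basic clock algorithm. It rests on several assumptions: the function names are invented for this example, OPT is implemented with the farthest-next-use rule rather than dynamic programming, and the clock version sets the reference bit when a page is loaded (real systems differ on that detail).

```python
from collections import OrderedDict, deque


def fifo_faults(refs, nframes):
    frames, order, faults = set(), deque(), 0
    for p in refs:
        if p in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            frames.discard(order.popleft())      # flush the longest-resident page
        frames.add(p)
        order.append(p)
    return faults


def lru_faults(refs, nframes):
    frames, faults = OrderedDict(), 0            # keys kept in least-recent-first order
    for p in refs:
        if p in frames:
            frames.move_to_end(p)                # p is now the most recently used
            continue
        faults += 1
        if len(frames) == nframes:
            frames.popitem(last=False)           # flush the least recently used page
        frames[p] = True
    return faults


def opt_faults(refs, nframes):
    frames, faults = set(), 0
    for i, p in enumerate(refs):
        if p in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            future = refs[i + 1:]

            def next_use(q):
                # Pages never used again sort after every page that is.
                return future.index(q) if q in future else len(future)

            frames.discard(max(frames, key=next_use))
        frames.add(p)
    return faults


def clock_faults(refs, nframes):
    pages = [None] * nframes                     # circular queue of frames
    refbit = [False] * nframes
    hand, faults = 0, 0
    for p in refs:
        if p in pages:
            refbit[pages.index(p)] = True        # "hardware" sets the reference bit
            continue
        faults += 1
        while refbit[hand]:                      # second chance: clear the bit, move on
            refbit[hand] = False
            hand = (hand + 1) % nframes
        pages[hand] = p
        refbit[hand] = True                      # assumption: loading counts as a reference
        hand = (hand + 1) % nframes
    return faults


if __name__ == "__main__":
    seq = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]   # the bad sequence for FIFO
    for name, fn in [("FIFO", fifo_faults), ("LRU", lru_faults),
                     ("OPT", opt_faults), ("clock", clock_faults)]:
        print(name, fn(seq, 3), fn(seq, 4))
```

On the bad sequence, FIFO takes 9 faults with 3 frames and 10 with 4, while LRU (10, then 8) and OPT (7, then 6) both improve when a frame is added.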
6. Buffering strategies
By keeping a few spare frames around we can speed things up. Instead of flushing a victim page immediately on a page fault, we allocate an empty frame and read the new page into it, so that even when the victim is dirty we can restart the process after 1 disk access instead of 2 disk accesses. The victim page is then flushed in the background when the disk is otherwise free.
A natural way to implement this is to have a swapper daemon process that just flushes likely victims when it can. We can do this because the page replacement algorithm generally only selects victims and doesn't care about what we are replacing them with.
7. Other virtual memory tricks
7.1. Shared pages
We've already mentioned (in Paging) the use of a paging system to allow processes to share blocks within their address spaces, e.g. for doing InterProcessCommunication or keeping around only a single copy of a read-only executable or shared library. The same tricks work just as well when paging is supplemented by virtual memory.
7.2. Copy-on-write
Suppose we have two processes that share most of their initial state but may diverge over time (e.g. the parent and child in a fork in Unix). Instead of allocating pages for both copies, we keep a single copy of each duplicate page and clone it only when one of the processes tries to write to it (which we can detect as a page fault by turning off write access to all the shared pages). This is also useful for processes that start off with large blocks of zeros in their address space (e.g. big global C arrays, or the result of calling sbrk to ask for more heap space from the OS). Instead of handing out thousands of identical blank pages, we hand out thousands of pointers to the same page, and allocate the real pages only when we need to.
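The clone-on-write idea can be modelled in a few lines. This is a toy sketch, not how any real kernel is structured: `Frame`, `AddressSpace`, `ZERO_PAGE`, and `PAGE_SIZE` are hypothetical names, and a reference count stands in for the write-protection fault that real hardware would raise.

```python
PAGE_SIZE = 16  # tiny page size, chosen only to keep the example small


class Frame:
    """A physical frame plus a count of the address spaces that map it."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refcount = 1


# One shared page of zeros. Its initial refcount of 1 models the kernel's own
# reference, so any mapping of it is always treated as shared.
ZERO_PAGE = Frame(bytes(PAGE_SIZE))


class AddressSpace:
    def __init__(self, pages=None):
        self.pages = pages if pages is not None else {}   # page number -> Frame

    def map_zero(self, page):
        ZERO_PAGE.refcount += 1
        self.pages[page] = ZERO_PAGE

    def fork(self):
        # The child shares every frame; nothing is copied yet.
        for frame in self.pages.values():
            frame.refcount += 1
        return AddressSpace(dict(self.pages))

    def read(self, page, offset):
        return self.pages[page].data[offset]

    def write(self, page, offset, value):
        frame = self.pages[page]
        if frame.refcount > 1:            # stands in for the write-protection fault
            frame.refcount -= 1
            frame = Frame(frame.data)     # clone the page privately
            self.pages[page] = frame
        frame.data[offset] = value
```

After `child = parent.fork()`, both address spaces point at the same `Frame` objects; the first write by either side replaces only that side's mapping, so the other process never sees the change.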
7.3. Memory-mapped files
The POSIX mmap call lets you take a file in the filesystem and map it into memory. Here the VM mechanism allocates a region in the address space that is backed by the file rather than the swap space. Otherwise everything looks exactly like standard VM: pages from the file are read in on demand, and written back (if dirty) when space is needed. There is a slight complication in that eventually the file is unmapped (either explicitly or when the process terminates), and we need to go and flush all the pages when this happens (we may also provide a mechanism to force a flush, e.g. if we want to be sure that our changes will survive a power failure).
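As a small illustration of a file-backed mapping, the sketch below uses Python's standard `mmap` module, which wraps the operating system's mapping facility. The helper name `patch_file_via_mmap` is made up for this example, and the file is assumed to be non-empty, since an empty file cannot be mapped.

```python
import mmap


def patch_file_via_mmap(path, offset, data):
    """Overwrite len(data) bytes of the file at `offset` through a memory mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; ACCESS_WRITE makes stores write through to it.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            m[offset:offset + len(data)] = data   # an ordinary memory store
            m.flush()                             # force dirty pages back to the file
```

The explicit `flush()` plays the role of the "force a flush" mechanism mentioned above; leaving the `with` block unmaps the file.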
8. Bad outcomes
The bad outcome for a virtual memory system is thrashing, the VM equivalent of a traffic jam where the total working-set size of all active processes exceeds the size of physical memory. Now a large proportion of memory accesses will cause page faults, and though the system will continue to struggle along sluggishly, we can expect to see a slowdown of many orders of magnitude.
This problem is similar in many ways to Deadlock, and has essentially the same solutions: ignore the problem, or detect the disaster and kill or suspend processes to reduce the conflict. As with deadlock, "ignore the problem" is easier to implement and arguably gives the user more control.
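How abruptly thrashing sets in can be seen in a toy model. The sketch below assumes LRU replacement and a process that sweeps cyclically through its working set; both are simplifications chosen for this example, and the function name is hypothetical.

```python
from collections import OrderedDict


def lru_fault_rate(working_set, nframes, passes=100):
    """Fraction of accesses that fault when sweeping `working_set` pages cyclically."""
    frames, faults, accesses = OrderedDict(), 0, 0
    for _ in range(passes):
        for page in range(working_set):
            accesses += 1
            if page in frames:
                frames.move_to_end(page)          # mark as most recently used
                continue
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)        # flush the least recently used page
            frames[page] = True
    return faults / accesses


if __name__ == "__main__":
    for ws in (6, 8, 9, 12):
        print(f"working set {ws}, 8 frames: fault rate {lru_fault_rate(ws, 8):.2f}")
```

In this model, a working set that fits in memory faults only while warming up, but a working set just one page larger than memory faults on every single access.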