For more up-to-date notes see http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf.
1. The problem
The goal is to share some critical resource between processes without more than one using it at a time—this is the fundamental problem in time-sharing systems.
The solution is to only allow access while in a specially-marked block of code called a critical section, and only allow one process at a time to be in a critical section.
A mutual exclusion protocol guarantees this, usually in an AsynchronousSharedMemory model.
Formally: We want a process to cycle between states trying (trying to get into critical section), critical (in critical section), exiting (cleaning up so that other processes can enter their critical sections), and remainder (everything else—essentially just going about its non-critical business). Only in the trying and exiting states does the process run the mutual exclusion protocol to decide when to switch to the next state; in the critical or remainder states it switches to the next state on its own.
Core mutual exclusion requirements:
- Mutual exclusion
- At most one process is in the critical state at a time.
- No deadlock (progress)
- If there is at least one process in a trying state, then eventually some process enters a critical state; similarly for exiting and remainder states.
Note that the protocol is not required to guarantee that processes leave the critical or remainder state, but we generally have to insist that the processes at least leave the critical state on their own to make progress.
Additional useful properties (not satisfied by all mutual exclusion protocols; see LynchBook §10.4):
- No lockout (lockout-freedom): If there is a particular process in a trying or exiting state, that process eventually leaves that state. This means that I don't starve because somebody else keeps jumping past me and seizing the critical resource before I can.
Stronger versions of lockout-freedom include explicit time bounds (how many rounds can go by before I get in) or bounded bypass (nobody gets in more than k times before I do).
3. Mutual exclusion using strong primitives
See AttiyaWelch §4.3 or LynchBook §10.9. The idea is to use some sort of read-modify-write register, where the RMW operation computes a new value from the old value of the register and writes it back as a single atomic operation, usually also returning the old value to the caller.
3.1. Test and set
A test-and-set operation does the following sequence of actions atomically:
    oldValue ← read(bit)
    write(bit, 1)
    return oldValue
(Typically there is a second reset operation for setting the bit back to zero.)
Because a test-and-set operation is atomic, if two processes both try to perform test-and-set on the same bit, only one of them will see a return value of 0. This is not true if each process simply executes the above code on a stock atomic register: there is an execution in which both processes read 0, then both write 1, then both return 0 to whatever called the non-atomic test-and-set subroutine.
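The bad interleaving can be replayed deterministically. Here is a small Python sketch (the names `FakeBit` and `run_bad_schedule` are made up for illustration) that executes the schedule read, read, write, write and shows both callers of the non-atomic subroutine returning 0:

```python
# Replay the bad interleaving for a NON-atomic test-and-set built from
# plain reads and writes: both processes read first, then both write.
class FakeBit:
    """An ordinary register holding 0 or 1 (no atomic read-modify-write)."""
    def __init__(self):
        self.value = 0

def run_bad_schedule(bit):
    old0 = bit.value      # process 0 reads 0
    old1 = bit.value      # process 1 reads 0
    bit.value = 1         # process 0 writes 1
    bit.value = 1         # process 1 writes 1
    return old0, old1     # both subroutine calls return their old values

r0, r1 = run_bad_schedule(FakeBit())
# Both callers saw 0, so both would believe they acquired the lock.
assert (r0, r1) == (0, 0)
```

A true test-and-set would serialize the two read-write pairs, so exactly one caller would see 0.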
Test-and-set provides a trivial implementation of mutual exclusion:
    while true do:
        ... do remainder stuff ...
        /* entry code */
        state ← trying
        while testAndSet(lock) = 1:
            /* spin */
        state ← critical
        ... do critical section stuff ...
        /* exit code */
        state ← exiting
        reset(lock)
        state ← remainder
It is easy to see that this code provides mutual exclusion: once one process gets a 0 out of the lock, no other can escape the inner while loop until that process calls reset in its exiting state. It also provides progress (assuming the lock is initially set to 0). The only part of the code that is not straight-line code (which gets executed eventually by the fairness condition) is the inner loop. If the lock is 0, some process escapes the loop; if the lock is 1, some process is in the region between the testAndSet call and the reset call, so it eventually reaches the reset and lets the next process in (or itself, if it is very fast).
The algorithm does not provide lockout-freedom: nothing prevents a single fast process from scooping up the lock bit every time it goes through the outer loop, while the other processes ineffectually grab at it just after it is taken away. Lockout-freedom requires a more sophisticated turn-taking strategy.
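The whole protocol can be sketched in Python. Since Python exposes no hardware test-and-set, the `TestAndSetBit` class below simulates the atomicity of testAndSet and reset with an internal `threading.Lock`; this is a stand-in for the primitive, not an implementation of it:

```python
import threading

class TestAndSetBit:
    """Simulated atomic test-and-set bit; the internal lock stands in for
    the hardware's atomicity guarantee."""
    def __init__(self):
        self._bit = 0
        self._guard = threading.Lock()

    def test_and_set(self):
        with self._guard:            # read-modify-write happens atomically
            old, self._bit = self._bit, 1
            return old

    def reset(self):
        with self._guard:
            self._bit = 0

lock = TestAndSetBit()
counter = 0                          # the shared critical resource

def worker(iterations):
    global counter
    for _ in range(iterations):
        while lock.test_and_set() == 1:
            pass                     # spin: trying state
        counter += 1                 # critical section
        lock.reset()                 # exiting state

threads = [threading.Thread(target=worker, args=(300,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 1200               # no increment was lost
```

Without the lock, the bare `counter += 1` can occasionally lose updates, since it compiles to separate read and write steps.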
3.2. A lockout-free algorithm using an atomic queue
Basic idea: In the trying phase, each process enqueues itself on the end of a shared queue (assumed to be an atomic operation). When a process comes to the head of the queue, it enters the critical section, and when exiting it dequeues itself. So the code would look something like this:
    while true do:
        ... do remainder stuff ...
        /* entry code */
        state ← trying
        enqueue(Q, my-id)
        while head(Q) ≠ my-id:
            /* spin */
        state ← critical
        ... do critical section stuff ...
        /* exit code */
        state ← exiting
        dequeue(Q)
        state ← remainder
Here the proof of mutual exclusion is that only the process whose id is at the head of the queue can enter its critical section. Formally, we maintain an invariant that any process whose program counter is between the inner while loop and the dequeue(Q) line must be at the head of the queue; this invariant is easy to show because a process can't leave the while loop unless the test fails (i.e., it is already at the head of the queue), no enqueue operation changes the head value (if the queue is nonempty), and the dequeue operation (which does change the head value) can only be executed by a process already at the head (from the invariant).
Deadlock-freedom follows from proving a similar invariant that every element of the queue is the id of some process in the trying, critical, or exiting states, so eventually the process at the head of the queue passes the inner loop, executes its critical section, and dequeues its id.
Lockout-freedom follows from the fact that once a process is at position k in the queue, every execution of a critical section reduces its position by 1; when it reaches the front of the queue (after some finite number of critical sections), it gets the critical section itself.
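A Python sketch of the queue-based algorithm follows, with the atomic queue simulated by a lock-protected deque (the lock stands in for the atomicity the algorithm assumes; the `AtomicQueue` class and its method names are made up for illustration):

```python
import threading, time
from collections import deque

class AtomicQueue:
    """Queue whose enqueue/dequeue/head each execute atomically; the
    internal lock simulates the atomic queue object the algorithm assumes."""
    def __init__(self):
        self._q = deque()
        self._guard = threading.Lock()

    def enqueue(self, x):
        with self._guard:
            self._q.append(x)

    def dequeue(self):
        with self._guard:
            self._q.popleft()

    def head(self):
        with self._guard:
            return self._q[0] if self._q else None

Q = AtomicQueue()
counter = 0

def worker(my_id, iterations):
    global counter
    for _ in range(iterations):
        Q.enqueue(my_id)             # trying: get in line
        while Q.head() != my_id:
            time.sleep(0)            # spin; the yield is for speed only
        counter += 1                 # critical section
        Q.dequeue()                  # exiting: step out of line

threads = [threading.Thread(target=worker, args=(i, 200)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 600
```

The FIFO order of the queue is what buys lockout-freedom here: a waiter can only move toward the head.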
3.2.1. Reducing space complexity
AttiyaWelch §4.3.2 gives an implementation of this algorithm using a single read-modify-write (RMW) register instead of a queue; this drastically reduces the (shared) space needed by the algorithm. The reason this works is because we don't really need to keep track of the position of each process in the queue itself; instead, we can hand out numerical tickets to each process and have the process take responsibility for remembering where its place in line is.
The RMW register has two fields, first and last, both initially 0. Incrementing last simulates an enqueue, while incrementing first simulates a dequeue. The trick is that instead of testing if it is at the head of the queue, a process simply remembers the value of the last field when it "enqueued" itself, and waits for the first field to equal it.
Here's the code above rewritten to use this technique. The way to read the rmw operations is that the first argument specifies the variable to update and the second specifies an expression for computing the new value. Each rmw operation returns the old state of the object, before the update.
    while true do:
        ... do remainder stuff ...
        /* entry code */
        state ← trying
        position ← rmw(V, <V.first, V.last+1>)     // enqueue
        while rmw(V, V).first ≠ position.last:     // while head(Q) ≠ my-id
            /* spin */
        state ← critical
        ... do critical section stuff ...
        /* exit code */
        state ← exiting
        rmw(V, <V.first+1, V.last>)                // dequeue
        state ← remainder
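The same ticket discipline can be sketched in Python, with the two-field RMW register simulated by an internal lock; the update is passed as a function rather than an expression, but the rmw call still returns the old pair as in the pseudocode:

```python
import threading, time

class RMWRegister:
    """Two-field register <first, last>, both initially 0, supporting an
    atomic read-modify-write simulated by an internal lock."""
    def __init__(self):
        self.first = 0
        self.last = 0
        self._guard = threading.Lock()

    def rmw(self, update):
        """Atomically replace (first, last) with update(first, last);
        return the OLD pair, as in the pseudocode."""
        with self._guard:
            old = (self.first, self.last)
            self.first, self.last = update(self.first, self.last)
            return old

V = RMWRegister()
counter = 0

def worker(iterations):
    global counter
    for _ in range(iterations):
        _, my_ticket = V.rmw(lambda f, l: (f, l + 1))   # enqueue: take a ticket
        while V.rmw(lambda f, l: (f, l))[0] != my_ticket:
            time.sleep(0)                               # spin until our ticket is called
        counter += 1                                    # critical section
        V.rmw(lambda f, l: (f + 1, l))                  # dequeue: call the next ticket

threads = [threading.Thread(target=worker, args=(200,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 600
```

Note that each process remembers only its own ticket (`my_ticket`, the old value of last); the shared state is just the one register.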
4. Mutual exclusion using only atomic registers
4.1. Peterson's tournament algorithm
Here's Peterson's lockout-free mutual exclusion protocol for 2 processes p0 and p1 (see also AttiyaWelch §4.4.2 or LynchBook §10.5.1). It uses only atomic registers.
Shared atomic bits:

    waiting, initially arbitrary
    present[i] for each process p[i], initially 0

Process p[i]'s code:

    while true do:
    0     ... do remainder stuff ...
          /* entry code */
    1     state ← trying
    2     present[i] ← 1
    3     waiting ← i
          while true do:
    4         if present[¬i] = 0: break
    5         if waiting ≠ i: break
    6     state ← critical
    7     ... do critical section stuff ...
          /* exit code */
    8     state ← exiting
    9     present[i] ← 0
    10    state ← remainder
Note that the protocol requires that waiting be multi-writer, but it's OK for the present bits to be single-writer. We write lines 4–5 as two separate lines because they include two separate read operations. Lines 1, 6, 8, and 10 are pretty much irrelevant to the code, but we'll leave them in to make it clear which of the four states we are cycling through.
4.1.1. Why this works
Intuitively, let's consider all the different ways that the entry code of the two processes could interact. There are basically two things that each process does: it sets its own present in line 2 and grabs the waiting variable in line 3. Here's a typical case where one process gets in first:
- present[0] ← 1
- waiting ← 0
- process 0 reads present[1] = 0 and enters critical section
- present[1] ← 1
- waiting ← 1
- process 1 reads present[0] = 1 and waiting = 1 and loops
- present[0] ← 0
- process 1 reads present[0] = 0 and enters critical section
The idea is that if I see a 0 in your present, I know that you aren't playing, and can just go in.
Here's a more interleaved execution where the waiting variable decides the winner:
- present[0] ← 1
- waiting ← 0
- present[1] ← 1
- waiting ← 1
- process 0 reads present[1] = 1
- process 1 reads present[0] = 1
- process 0 reads waiting = 1 and enters critical section
- process 1 reads present[0] = 1 and waiting = 1 and loops
- present[0] ← 0
- process 1 reads present[0] = 0 and enters critical section
Note that it's the process that set the waiting variable last (and thus sees its own value) that stalls. This is necessary because the earlier process might long since have entered the critical section.
Sadly, examples are not proofs, so to show that this works in general, we need to formally verify each of mutual exclusion and lockout-freedom. Mutual exclusion is a safety property (see IOAutomata), so we expect to prove it using invariants. LynchBook gives a proof based on translating the pseudocode directly into automata (including explicit program counter variables); we'll do essentially the same proof but without doing the full translation to automata. Below, we write that pi is at line k if the operation in line k is enabled but has not occurred yet.
- Invariant 1
If present[i] = 0 then pi is at line 0, 1, or 2.
- Immediate from the code.
- Invariant 2
If pi is at line 6, 7, or 8 and p¬i is at line 4, 5, 6, 7, or 8, then waiting = ¬i.
- Let's assume without loss of generality that i = 0. The proof is by induction on the schedule. We need to check that any event that makes the LHS of the invariant true or the RHS false also makes the whole invariant true. The relevant events are:
  - Transitions by p0 from line 4 to line 6: these occur only if present[1] = 0, implying p1 is at line 0, 1, or 2 by Invariant 1, so the LHS is false.
  - Transitions by p0 from line 5 to line 6: these occur only if waiting ≠ 0, i.e. waiting = 1, so the RHS is true.
  - Transitions by p1 from line 3 to line 4: these set waiting = 1, making the RHS true.
  - Transitions that set waiting = 0: these are transitions by p0 from line 3 to line 4, making the LHS false.
We can now read mutual exclusion directly off of Invariant 2: if both p0 and p1 are at line 7 or 8, then we get waiting = 1 and waiting = 0, a contradiction.
To show progress, observe that the only place where both processes can get stuck forever is in the loop at lines 4–5. But then waiting isn't changing, and so some process i reads waiting = ¬i and leaves. To show lockout-freedom, observe that if 0 is stuck in the loop while 1 enters the critical section, then after 1 leaves it sets present[1] to 0 in line 9 (which lets 0 in if 0 reads present[1] in time), but even if it then sets present[1] back to 1 in line 2, it still sets waiting to 1 in line 3, which lets 0 into the critical section. With some more tinkering this argument shows that 1 enters the critical section at most twice while 0 is in the trying state, giving 2-bounded bypass (see LynchBook Lemma 10.12). With even more tinkering we get a constant bound on the waiting time for process i to enter the critical section, assuming the other process never spends more than O(1) time inside the critical section.
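The two-process protocol can be transcribed into Python threads. A caveat on the assumptions: under CPython's global interpreter lock, individual reads and writes of list elements are atomic and happen in program order, so they behave like the atomic registers the proof assumes; on real hardware this code would additionally need memory fences.

```python
import threading, time

present = [0, 0]       # present[i]: single-writer atomic bits
waiting = [0]          # one-element list standing in for the multi-writer waiting bit
counter = 0            # shared resource touched only inside the critical section

def worker(i, iterations):
    global counter
    for _ in range(iterations):
        present[i] = 1                        # line 2
        waiting[0] = i                        # line 3
        while True:
            if present[1 - i] == 0:           # line 4
                break
            if waiting[0] != i:               # line 5
                break
            time.sleep(0)                     # spin; the yield is for speed only
        counter += 1                          # critical section (line 7)
        present[i] = 0                        # line 9

threads = [threading.Thread(target=worker, args=(i, 500)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 1000                        # every increment survived
```

The non-atomic `counter += 1` is safe here precisely because it runs under the mutual exclusion the protocol provides.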
4.1.2. Generalization to n processes
(See also AttiyaWelch §4.4.3.)
The easiest way to generalize Peterson's two-process algorithm to n processes is to organize a tournament in the form of a log-depth binary tree. At each node of the tree, the roles of the two processes are taken by the winners of the subtrees, i.e. the processes who have entered their critical sections in the two-process algorithms corresponding to the child nodes. The winner of the tournament as a whole enters the real critical section, and afterwards walks back down the tree unlocking all the nodes it won in reverse order. It's easy to see that this satisfies mutual exclusion, and not much harder to show that it satisfies lockout-freedom—in the latter case, the essential idea is that if the winner at some node reaches the root infinitely often then lockout-freedom at that node means that the winner of each child node reaches the root infinitely often.
The most natural way to implement the nodes is to have the present and waiting variables at each node be multi-writer variables that can be written by any process in the appropriate subtree. Because the present bits don't do much, we can also implement them as the OR of many single-writer variables (this is what is done in LynchBook §10.5.3), but there is no immediate payoff to doing this since the waiting variables are still multi-writer.
Nice properties of this algorithm are that it uses only bits and that it's very fast: O(log n) time in the absence of contention.
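One way to code the tournament for n = 4 in Python is sketched below. The node/role indexing (node = (i + n) >> (level + 1), a heap-style layout) is a standard implementation trick, not something taken from the text, and as before CPython's global interpreter lock supplies the sequential consistency the algorithm needs:

```python
import threading, time

N = 4                                  # number of processes (a power of two)
LEVELS = 2                             # log2(N) rounds of two-process Peterson
present = [[0, 0] for _ in range(N)]   # present bits for tree nodes 1..N-1
waiting = [0] * N                      # waiting bit for each tree node
counter = 0

def lock(i):
    # Walk up the tree, winning a two-process Peterson contest at each node.
    for lvl in range(LEVELS):
        node = (i + N) >> (lvl + 1)    # heap-style index of this level's node
        role = (i >> lvl) & 1          # which of the node's two slots we play
        present[node][role] = 1
        waiting[node] = role
        while present[node][1 - role] == 1 and waiting[node] == role:
            time.sleep(0)              # spin; the yield is for speed only

def unlock(i):
    # Walk back down, releasing the nodes we won in reverse order.
    for lvl in reversed(range(LEVELS)):
        node = (i + N) >> (lvl + 1)
        present[node][(i >> lvl) & 1] = 0

def worker(i, iterations):
    global counter
    for _ in range(iterations):
        lock(i)
        counter += 1                   # critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i, 200)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 800
```

At each node at most one real process plays each role at a time (the winner of the corresponding subtree), which is why the two-process argument carries over.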
4.2. Lamport's Bakery algorithm
Here's the actual algorithm:
Shared variables:

    choosing[i], an atomic bit for each i, initially 0
    number[i], an *unbounded* atomic register, initially 0

Process i's code:

    while true do:
    0     ... do remainder stuff ...
          /* entry code */
    1     state ← trying
    2     choosing[i] ← 1
    3     number[i] ← 1 + max(number[j] for j ≠ i)
    4     choosing[i] ← 0
          for j ≠ i do:
    5         loop until choosing[j] = 0
    6         loop until number[j] = 0 or (number[i], i) < (number[j], j)
    7     state ← critical
    8     ... do critical section stuff ...
          /* exit code */
    9     state ← exiting
    10    number[i] ← 0
    11    state ← remainder
Note that several of these lines are actually loops; this is obvious for lines 5 and 6, but is also true for line 3, which includes an implicit loop to read all n-1 values of number[j].
The intuition for mutual exclusion is that if you have a lower number than I do, then I block waiting for you; for lockout-freedom, the key observation is that eventually I have the smallest number. For a real proof see AttiyaWelch §4.4.1 or LynchBook.
Its selling points are the strong near-FIFO guarantee and the use of only single-writer registers (which need not even be atomic; it's enough that they return correct values when no write is in progress). Its weak point is the use of unbounded registers.
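Here is a Python transcription for three threads, again leaning on CPython's global interpreter lock to make the individual reads and writes behave like atomic registers. The registers really are single-writer here: only thread i writes choosing[i] and number[i].

```python
import threading, time

N = 3
choosing = [0] * N                    # choosing[i], written only by thread i
number = [0] * N                      # number[i], written only by thread i
counter = 0

def lock(i):
    choosing[i] = 1                                              # line 2
    number[i] = 1 + max(number[j] for j in range(N) if j != i)   # line 3
    choosing[i] = 0                                              # line 4
    for j in range(N):
        if j == i:
            continue
        while choosing[j] == 1:                                  # line 5
            time.sleep(0)
        while True:                                              # line 6
            nj = number[j]                # read number[j] once per test
            if nj == 0 or (number[i], i) < (nj, j):
                break
            time.sleep(0)

def unlock(i):
    number[i] = 0                                                # line 10

def worker(i, iterations):
    global counter
    for _ in range(iterations):
        lock(i)
        counter += 1                  # critical section
        unlock(i)

threads = [threading.Thread(target=worker, args=(i, 200)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 600
```

Note that ties in number[] are broken by process id, which is why line 6 compares the pairs (number, id) rather than the numbers alone.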
4.3. Lower bound on the number of registers
There is a famous result due to Burns and Lynch that any mutual exclusion protocol using only read/write registers requires at least N of them. Details are in LynchBook §10.8. AttiyaWelch §4.4.4 also gives a slightly different version of the argument, or you can read the original paper: burns-lynch-1993.pdf. The proof is a nice example of an IndistinguishabilityProof, where we use the fact that if a group of processes can't tell the difference between two executions, they behave the same in both.
Assumptions: We have a protocol that guarantees mutual exclusion and progress.
Key idea: In order for some process p to enter the critical section, it has to do at least one write to let the other processes know it is doing so. If not, they can't tell whether p ever showed up at all, so eventually either some p' will enter the critical section and violate mutual exclusion, or (in the no-p execution) nobody enters the critical section and we violate progress. Now suppose we can park a process pi on each register ri with a pending write to ri; in this case we say that pi covers ri. If every register is so covered, we can let p go ahead and do whatever writes it likes and then deliver all the covering writes at once, wiping out anything p did. Now the other processes again don't know whether p exists. So we can say something stronger: before some process p can enter a critical section, it has to write to an uncovered register.
The hard part is showing that we can cover all the registers without letting p know that there are other processes waiting: if p can see that other processes are waiting, it can just sit back and wait for them to go through the critical section and make progress that way. So our goal is to produce states in which (a) processes p1...pk (for some k) between them cover k registers, and (b) the resulting configuration is indistinguishable from an idle configuration to pk+1...pn.
Starting from any idle configuration C, there exists an execution in which only processes p1...pk take steps that leads to a configuration C' that is indistinguishable by any of pk+1...pn from some idle configuration C'' and in which k registers are covered by p1...pk.
The proof is by induction on k. For k=1, just run p1 until it is about to do a write, let C' be the resulting configuration and let C'' = C.
For larger k, the essential idea is that starting from C, we first run to a configuration C1 in which p1...pk-1 cover k-1 registers and which is indistinguishable from an idle configuration by the remaining processes, and then run pk until it covers one more register. If we let p1...pk-1 go, they overwrite anything pk wrote. Unfortunately, they may not come back to covering the same registers as before when we rerun the induction hypothesis (and in particular might cover the same register that pk does). So we have to look for a particular configuration C1 that not only covers k-1 registers but also has an extension that covers the same k-1 registers.
Here's how we find it: Start in C. Run the induction hypothesis to get C1; here there is a set W1 of k-1 registers covered in C1. Now let processes p1 through pk-1 do their pending writes, then each enter the critical section, leave it, and finish; then rerun the induction hypothesis to get to a state C2, indistinguishable from an idle configuration by pk and up, in which the k-1 registers in some set W2 are covered. Repeat to get sets W3, W4, etc. Since this sequence is unbounded, and there are only (r choose k-1) distinct sets of registers to cover (where r is the number of registers), eventually we have Wi = Wj for some i ≠ j. The configurations Ci and Cj are now our desired configurations covering the same k-1 registers.
Now that we have Ci and Cj, we run until we get to Ci. We then run pk until it is about to write some register not covered in Ci (it must eventually do so, since otherwise we could wipe out all of its writes while it is in the critical section and then go on to violate mutual exclusion). Then we let p1 through pk-1 do all their pending writes (which immediately destroys any evidence that pk ran at all) and run the execution that gets them to Cj. We now have k-1 registers covered by p1 through pk-1 and a k-th register covered by pk, in a configuration that is indistinguishable from idle: this proves the induction step.
The final result follows from the fact that when k = n, the n processes cover n distinct registers, so there must be at least n registers.