*For more up-to-date notes see http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf.*

For full details see AttiyaWelch Chapter 4 or LynchBook Chapter 10.

# 1. The problem

The goal is to share some critical resource between processes without more than one using it at a time—this is *the* fundamental problem in time-sharing systems.

The solution is to only allow access while in a specially-marked block of code called a **critical section**, and only allow one process at a time to be in a critical section.

A **mutual exclusion protocol** guarantees this, usually in an AsynchronousSharedMemory model.

Formally: We want a process to cycle between states **trying** (trying to get into critical section), **critical** (in critical section), **exiting** (cleaning up so that other processes can enter their critical sections), and **remainder** (everything else—essentially just going about its non-critical business). Only in the trying and exiting states does the process run the mutual exclusion protocol to decide when to switch to the next state; in the critical or remainder states it switches to the next state on its own.

# 2. Goals

(See also AttiyaWelch §4.2, LynchBook §10.2.)

Core mutual exclusion requirements:

- Mutual exclusion
  - At most one process is in the critical state at a time.
- No deadlock (progress)
  - If there is at least one process in a trying state, then eventually some process enters a critical state; similarly for exiting and remainder states.

Note that the protocol is not required to guarantee that processes leave the critical or remainder state, but we generally have to insist that the processes at least leave the critical state on their own to make progress.

Additional useful properties (not satisfied by all mutual exclusion protocols; see LynchBook §10.4):

- No lockout (lockout-freedom): If there is a particular process in a trying or exiting state, that process eventually leaves that state. This means that I don't starve because somebody else keeps jumping past me and seizing the critical resource before I can.

Stronger versions of lockout-freedom include explicit time bounds (how many rounds can go by before I get in) or **bounded bypass** (nobody gets in more than k times before I do).

# 3. Mutual exclusion using strong primitives

See AttiyaWelch §4.3 or LynchBook §10.9. Idea is we will use some sort of **read-modify-write** register, where the RMW operation computes a new value based on the old value of the register and writes it back as a single atomic operation, usually returning the old value to the caller as well.

## 3.1. Test and set

A **test-and-set** operation does the following sequence of actions atomically:

    oldValue ← read(bit)
    write(bit, 1)
    return oldValue

(Typically there is a second **reset** operation for setting the bit back to zero.)

Because a test-and-set operation is atomic, if two processes both try to perform test-and-set on the same bit, only one of them will see a return value of 0. This is not true if each process simply executes the above code on a stock atomic register: there is an execution in which both processes read 0, then both write 1, then both return 0 to whatever called the non-atomic test-and-set subroutine.

Test-and-set provides a trivial implementation of mutual exclusion:

    while true do:
        ... do remainder stuff ...
        /* entry code */
        state ← trying
        while testAndSet(lock) = 1:
            /* spin */
        state ← critical
        ... do critical section stuff ...
        /* exit code */
        state ← exiting
        reset(lock)
        state ← remainder

It is easy to see that this code provides mutual exclusion, as once one process gets a 0 out of the lock, no other can escape the inner while loop until that process calls the `reset` operation in its exiting state. It also provides progress (assuming the lock is initially set to 0); the only part of the code that is not straight-line code (which gets executed eventually by the fairness condition) is the inner loop, and if the lock is 0, some process escapes it, while if the lock is 1, some process is in the region between the `testAndSet` call and the `reset` call, and so it eventually gets to the `reset` and lets the next process in (or itself, if it is very fast).

The algorithm does *not* provide lockout-freedom: nothing prevents a single fast process from scooping up the lock bit every time it goes through the outer loop, while the other processes ineffectually grab at it just after it is taken away. Lockout-freedom requires a more sophisticated turn-taking strategy.
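Since the pseudocode above isn't directly executable, here is a runnable Python sketch (an illustration, not part of the original notes) of the test-and-set spinlock. The atomic test-and-set is simulated with an internal `threading.Lock` standing in for the hardware's atomicity guarantee:

```python
import sys
import threading

sys.setswitchinterval(0.0005)  # switch threads often so spinning threads yield quickly

class TestAndSetBit:
    """Simulated atomic test-and-set bit; the internal guard lock stands in
    for the atomicity the real RMW primitive would provide."""
    def __init__(self):
        self._bit = 0
        self._guard = threading.Lock()

    def test_and_set(self):
        with self._guard:        # read and write happen as one atomic step
            old = self._bit
            self._bit = 1
            return old

    def reset(self):
        with self._guard:
            self._bit = 0

lock = TestAndSetBit()
counter = 0   # stand-in for the shared critical resource

def worker():
    global counter
    for _ in range(200):
        while lock.test_and_set() == 1:
            pass                 # spin: somebody else holds the lock
        counter += 1             # critical section
        lock.reset()             # exit code

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 800: every increment was protected
```

With mutual exclusion, the four workers' 200 increments each never interleave, so the final count is exactly 800; running the same experiment with a non-atomic test-and-set would occasionally lose increments.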

## 3.2. A lockout-free algorithm using an atomic queue

Basic idea: In the trying phase, each process enqueues itself on the end of a shared queue (assumed to be an atomic operation). When a process comes to the head of the queue, it enters the critical section, and when exiting it dequeues itself. So the code would look something like this:

    while true do:
        ... do remainder stuff ...
        /* entry code */
        state ← trying
        enqueue(Q, my-id)
        while head(Q) ≠ my-id:
            /* spin */
        state ← critical
        ... do critical section stuff ...
        /* exit code */
        state ← exiting
        dequeue(Q)
        state ← remainder

Here the proof of mutual exclusion is that only the process whose id is at the head of the queue can enter its critical section. Formally, we maintain an invariant that any process whose program counter is between the inner while loop and the `dequeue(Q)` line must be at the head of the queue; this invariant is easy to show because a process can't leave the while loop unless the test fails (i.e., it is already at the head of the queue), no enqueue operation changes the head value (if the queue is nonempty), and the dequeue operation (which does change the head value) can only be executed by a process already at the head (from the invariant).

Deadlock-freedom follows from proving a similar invariant that every element of the queue is the id of some process in the trying, critical, or exiting states, so eventually the process at the head of the queue passes the inner loop, executes its critical section, and dequeues its id.

Lockout-freedom follows from the fact that once a process is at position k in the queue, every execution of a critical section reduces its position by 1; when it reaches the front of the queue (after some finite number of critical sections), it gets the critical section itself.
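As an illustration (not from the original notes), here is a Python sketch of the queue-based lock. The assumed atomic queue operations are simulated by protecting a `deque` with a guard lock:

```python
import sys
import threading
from collections import deque

sys.setswitchinterval(0.0005)  # let spinning threads yield quickly

class AtomicQueue:
    """Sketch of the assumed atomic queue; a guard lock makes enqueue,
    dequeue, and head reads atomic, as the algorithm requires."""
    def __init__(self):
        self._q = deque()
        self._guard = threading.Lock()

    def enqueue(self, pid):
        with self._guard:
            self._q.append(pid)

    def head(self):
        with self._guard:
            return self._q[0] if self._q else None

    def dequeue(self):
        with self._guard:
            self._q.popleft()

Q = AtomicQueue()
counter = 0

def worker(pid):
    global counter
    for _ in range(100):
        Q.enqueue(pid)            # entry code: get in line
        while Q.head() != pid:
            pass                  # spin until at the head of the queue
        counter += 1              # critical section
        Q.dequeue()               # exit code: give up the head

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 400: four workers, 100 protected increments each
```

Note that lockout-freedom is visible here: a worker that has enqueued itself can be passed at most by the processes already ahead of it in the queue.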

### 3.2.1. Reducing space complexity

AttiyaWelch §4.3.2 gives an implementation of this algorithm using a single read-modify-write (RMW) register instead of a queue; this drastically reduces the (shared) space needed by the algorithm. The reason this works is because we don't really need to keep track of the position of each process in the queue itself; instead, we can hand out numerical tickets to each process and have the process take responsibility for remembering where its place in line is.

The RMW register has two fields, *first* and *last*, both initially 0. Incrementing last simulates an enqueue, while incrementing first simulates a dequeue. The trick is that instead of testing if it is at the head of the queue, a process simply remembers the value of the last field when it "enqueued" itself, and waits for the first field to equal it.

Here's the code above rewritten to use this technique. The way to read the rmw operations is that the first argument specifies the variable to update and the second specifies an expression for computing the new value. Each rmw operation returns the old state of the object, before the update.

    while true do:
        ... do remainder stuff ...
        /* entry code */
        state ← trying
        position ← rmw(V, <V.first, V.last+1>)   // enqueue
        while rmw(V, V).first ≠ position.last:   // while head(Q) ≠ my-id
            /* spin */
        state ← critical
        ... do critical section stuff ...
        /* exit code */
        state ← exiting
        rmw(V, <V.first+1, V.last>)              // dequeue
        state ← remainder
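This is essentially a ticket lock. Here is a runnable Python sketch (my illustration, not from the notes) where the two-field RMW register V = <first, last> is a small object whose updates are made atomic by a guard lock:

```python
import sys
import threading

sys.setswitchinterval(0.0005)  # let spinning threads yield quickly

class TicketLock:
    """The RMW register V = <first, last>; the guard lock simulates the
    atomicity of each read-modify-write operation."""
    def __init__(self):
        self.first = 0    # next ticket to be served (simulated queue head)
        self.last = 0     # next ticket to hand out (simulated queue tail)
        self._guard = threading.Lock()

    def acquire(self):
        with self._guard:             # rmw(V, <V.first, V.last+1>): enqueue
            my_ticket = self.last
            self.last += 1
        while True:                   # spin until first reaches our ticket
            with self._guard:         # rmw(V, V): read V without changing it
                if self.first == my_ticket:
                    return

    def release(self):
        with self._guard:             # rmw(V, <V.first+1, V.last>): dequeue
            self.first += 1

lock = TicketLock()
counter = 0

def worker():
    global counter
    for _ in range(100):
        lock.acquire()
        counter += 1   # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 400: four workers, 100 protected increments each
```

Each process remembers only its own ticket number, so the shared state is two counters instead of a whole queue, which is exactly the space saving the text describes.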

# 4. Mutual exclusion using only atomic registers

## 4.1. Peterson's tournament algorithm

Here's Peterson's lockout-free mutual exclusion protocol for two processes p_{0} and p_{1} (see also AttiyaWelch §4.4.2 or LynchBook §10.5.1). It uses only atomic registers.

    Shared atomic bits:
        waiting, initially arbitrary
        present[i] for each process p[i], initially 0

    Process p[i]'s code:
        while true do:
    0       ... do remainder stuff ...
            /* entry code */
    1       state ← trying
    2       present[i] ← 1
    3       waiting ← i
            while true do:
    4           if present[¬i] = 0 break
    5           if waiting ≠ i break
    6       state ← critical
    7       ... do critical section stuff ...
            /* exit code */
    8       state ← exiting
    9       present[i] ← 0
    10      state ← remainder

Note that the protocol requires that waiting be multi-writer, but it's OK for the present bits to be single-writer. We write lines 4–5 as two separate lines because they include two separate read operations. Lines 1, 6, 8, and 10 are pretty much irrelevant to the code, but we'll leave them in to make it clear which of the four states we are cycling through.

### 4.1.1. Why this works

Intuitively, let's consider all the different ways that the entry code of the two processes could interact. There are basically two things that each process does: it sets its own present in line 2 and grabs the waiting variable in line 3. Here's a typical case where one process gets in first:

- present[0] ← 1
- waiting ← 0
- process 0 reads present[1] = 0 and enters critical section
- present[1] ← 1
- waiting ← 1
- process 1 reads present[0] = 1 and waiting = 1 and loops
- present[0] ← 0
- process 1 reads present[0] = 0 and enters critical section

The idea is that if I see a 0 in your present, I know that you aren't playing, and can just go in.

Here's a more interleaved execution where the waiting variable decides the winner:

- present[0] ← 1
- waiting ← 0
- present[1] ← 1
- waiting ← 1
- process 0 reads present[1] = 1
- process 1 reads present[0] = 1
- process 0 reads waiting = 1 and enters critical section
- process 1 reads present[0] = 1 and waiting = 1 and loops
- present[0] ← 0
- process 1 reads present[0] = 0 and enters critical section

Note that it's the process that set the waiting variable last (and thus sees its own value) that stalls. This is necessary because the earlier process might long since have entered the critical section.

Sadly, examples are not proofs, so to show that this works in general, we need to formally verify both mutual exclusion and lockout-freedom. Mutual exclusion is a safety property (see IOAutomata), so we expect to prove it using invariants. LynchBook gives a proof based on translating the pseudocode directly into automata (including explicit program counter variables); we'll do essentially the same proof but without doing the full translation to automata. Below, we write that p_{i} is at line k if the operation in line k is enabled but has not occurred yet.

- Invariant 1
  - If present[i] = 0 then p_{i} is at line 0, 1, or 2.
- Proof
  - Immediate from the code.

- Invariant 2
  - If p_{i} is at line 6, 7, or 8 and p_{¬i} is at line 4, 5, 6, 7, or 8, then waiting = ¬i.
- Proof
  - Let's assume without loss of generality that i = 0. The proof is by induction on the schedule. We need to check that any event that makes the LHS of the invariant true or the RHS false also makes the whole invariant true. The relevant events are:
    - Transitions by p_{0} from line 4 to line 6; these occur only if present[1] = 0, implying p_{1} is at line 0, 1, or 2 by Invariant 1, so the LHS is false.
    - Transitions by p_{0} from line 5 to line 6; these occur only if waiting ≠ 0, so the RHS is true.
    - Transitions by p_{1} from line 3 to line 4; these set waiting = 1, making the RHS true.
    - Transitions that set waiting = 0; these are transitions by p_{0} from line 3 to line 4, making the LHS false.

We can now read mutual exclusion directly off of Invariant 2: if both p_{0} and p_{1} are at line 7 or 8, then we get waiting = 1 and waiting = 0, a contradiction.

To show progress, observe that the only place where both processes can get stuck forever is in the loop in lines 4-5. But then waiting isn't changing, and so some process i reads waiting = ¬i and leaves. To show lockout-freedom, observe that if 0 is stuck in the loop while 1 enters the critical section, then after 1 leaves it sets present[1] to 0 in line 9 (which lets 0 in if 0 reads present[1] in time), but even if it then sets present[1] back to 1 in line 2, it still sets waiting to 1 in line 3, which lets 0 into the critical section. With some more tinkering this argument shows that 1 enters the critical section at most twice while 0 is in the trying state, giving 2-bounded bypass (see LynchBook Lemma 10.12). With even more tinkering we get a constant time bound on the waiting time for process i to enter the critical section, assuming the other process never spends more than O(1) time inside the critical section.
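As a sanity check (my illustration, not from the notes), the two-process protocol can be run directly in Python. This relies on CPython's global interpreter lock giving sequentially consistent reads and writes of the shared variables; in C or C++ the same code would need memory fences to be correct:

```python
import sys
import threading

sys.setswitchinterval(0.0005)  # let spinning threads yield quickly

# Shared variables, as in the pseudocode above. Sequential consistency is
# assumed (provided here by CPython's GIL).
present = [0, 0]
waiting = 0
counter = 0

def peterson(i):
    global waiting, counter
    for _ in range(200):
        present[i] = 1               # line 2
        waiting = i                  # line 3
        while present[1 - i] == 1 and waiting == i:
            pass                     # spin (lines 4-5)
        counter += 1                 # critical section
        present[i] = 0               # line 9 (exit code)

threads = [threading.Thread(target=peterson, args=(i,)) for i in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 400 if mutual exclusion held
```

The unprotected `counter += 1` is a load, an add, and a store; it comes out as exactly 400 only because Peterson's protocol keeps the two increments from interleaving.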

### 4.1.2. Generalization to n processes

(See also AttiyaWelch §4.4.3.)

The easiest way to generalize Peterson's two-process algorithm to n processes is to organize a tournament in the form of a log-depth binary tree. At each node of the tree, the roles of the two processes are taken by the winners of the subtrees, i.e. the processes who have entered their critical sections in the two-process algorithms corresponding to the child nodes. The winner of the tournament as a whole enters the real critical section, and afterwards walks back down the tree unlocking all the nodes it won in reverse order. It's easy to see that this satisfies mutual exclusion, and not much harder to show that it satisfies lockout-freedom—in the latter case, the essential idea is that if the winner at some node reaches the root infinitely often then lockout-freedom at that node means that the winner of each child node reaches the root infinitely often.

The most natural way to implement the nodes is to have present[0] and present[1] at each node be multi-writer variables that can be written to by any process in the appropriate subtree. Because the presents don't do much we can also implement them as the OR of many single-writer variables (this is what is done in LynchBook §10.5.3, but there is no immediate payoff to doing this since the waiting variables are still multi-writer).

Nice properties of this algorithm are that it uses only bits and that it's very fast: O(log n) time in the absence of contention.
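Here is a Python sketch of the tournament (my illustration, with heap-style node indexing as an implementation choice; again it leans on CPython's GIL for sequential consistency):

```python
import sys
import threading

sys.setswitchinterval(0.0005)  # let spinning threads yield quickly

N = 4        # number of processes; assumed a power of two for simplicity
DEPTH = 2    # log2(N): levels of two-process games each process must win

# One two-process Peterson instance per internal tree node, heap-indexed:
# node 1 is the root; node k has children 2k and 2k+1; process i starts
# at leaf N + i. Sequential consistency assumed (CPython's GIL).
present = [[0, 0] for _ in range(2 * N)]
waiting = [0] * (2 * N)
counter = 0

def lock_node(node, side):
    present[node][side] = 1
    waiting[node] = side
    while present[node][1 - side] == 1 and waiting[node] == side:
        pass                             # spin until we win this game

def unlock_node(node, side):
    present[node][side] = 0

def worker(i):
    global counter
    for _ in range(50):
        path = []
        node, side = (N + i) // 2, (N + i) % 2
        for _ in range(DEPTH):           # climb the tree, winning each game
            lock_node(node, side)
            path.append((node, side))
            node, side = node // 2, node % 2
        counter += 1                     # critical section (tournament won)
        for node, side in reversed(path):
            unlock_node(node, side)      # unlock the won nodes in reverse order

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 200 if mutual exclusion held
```

The side a process plays at a node is determined by which child subtree it came from, so at most one process from each subtree (the winner of the child game) ever competes at a node at a time.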

## 4.2. Lamport's Bakery algorithm

See AttiyaWelch §4.4.1 or LynchBook §10.7.

Here's the actual algorithm:

    Shared variables:
        choosing[i], an atomic bit for each i, initially 0
        number[i], an *unbounded* atomic register, initially 0

    Process i's code:
        while true do:
    0       ... do remainder stuff ...
            /* entry code */
    1       state ← trying
    2       choosing[i] ← 1
    3       number[i] ← 1 + max(number[j] for j ≠ i)
    4       choosing[i] ← 0
            for j ≠ i do:
    5           loop until choosing[j] = 0
    6           loop until number[j] = 0 or ((number[i], i) < (number[j], j))
    7       state ← critical
    8       ... do critical section stuff ...
            /* exit code */
    9       state ← exiting
    10      number[i] ← 0
    11      state ← remainder

Note that several of these lines are actually loops; this is obvious for lines 5 and 6, but is also true for line 3, which includes an implicit loop to read all n-1 values of number[j].

The intuition for mutual exclusion is that if you have a lower number than I do, then I block waiting for you; the intuition for lockout-freedom is that eventually I have the smallest number. For a real proof see AttiyaWelch §4.4.1 or LynchBook §10.7.

Selling points are the strong near-FIFO guarantee and the use of only single-writer registers (which need not even be atomic; it's enough that they return correct values when no write is in progress). The weak point is the unbounded registers.
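A runnable Python sketch of the bakery (my illustration; as before, CPython's GIL supplies the sequential consistency the algorithm assumes, and note that the ticket-taking in line 3 deliberately need not be atomic):

```python
import sys
import threading

sys.setswitchinterval(0.0005)  # let spinning threads yield quickly

N = 3
choosing = [0] * N
number = [0] * N
counter = 0

def bakery(i):
    global counter
    for _ in range(50):
        choosing[i] = 1
        # line 3: take a ticket one bigger than any currently visible
        number[i] = 1 + max(number[j] for j in range(N) if j != i)
        choosing[i] = 0
        for j in range(N):
            if j == i:
                continue
            while choosing[j]:                                       # line 5
                pass             # wait until j has finished choosing
            while number[j] != 0 and (number[j], j) < (number[i], i):  # line 6
                pass             # wait while j has priority over us
        counter += 1             # critical section
        number[i] = 0            # line 10: exit code

threads = [threading.Thread(target=bakery, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 150 if mutual exclusion held
```

Ties in line 3 (two processes picking the same number) are broken by process id in the lexicographic comparison of line 6, which is why the pairs (number, id) are compared rather than the numbers alone.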

## 4.3. Lower bound on the number of registers

There is a famous result due to Burns and Lynch that any mutual exclusion protocol using only read/write registers requires at least N of them. Details are in LynchBook §10.8. AttiyaWelch §4.4.4 also gives a slightly different version of the argument, or you can read the original paper: burns-lynch-1993.pdf. The proof is a nice example of an IndistinguishabilityProof, where we use the fact that if a group of processes can't tell the difference between two executions, they behave the same in both.

Assumptions: We have a protocol that guarantees mutual exclusion and progress.

Key idea: In order for some process p to enter the critical section, it has to do at least one write to let the other processes know it is doing so. If not, they can't tell if p ever showed up at all, so eventually either some p' will enter the critical section and violate mutual exclusion or (in the no-p execution) nobody enters the critical section and we violate progress. Now suppose we can park a process p_{i} on each register r_{i} with a pending write to i; in this case we say that p_{i} **covers** r_{i}. If every register is so covered, we can let p go ahead and do whatever writes it likes and then deliver all the covering writes at once, wiping out anything p did. Now the other processes again don't know if p exists or not. So we can say something stronger: before some process p can enter a critical section, it has to write to an uncovered register.

The hard part is showing that we can cover all the registers without letting p know that there are other processes waiting—if p can see that other processes are waiting, it can just sit back and wait for them to go through the critical section and make progress that way. So our goal is to produce states in which (a) processes p_{1}...p_{k} (for some k) between them cover k registers, and (b) the resulting configuration is indistinguishable from an idle configuration to p_{k+1}...p_{n}.

- Lemma
  - Starting from any idle configuration C, there exists an execution in which only processes p_{1}...p_{k} take steps that leads to a configuration C' that is indistinguishable by any of p_{k+1}...p_{n} from some idle configuration C'' and in which k registers are covered by p_{1}...p_{k}.

The proof is by induction on k. For k = 1, just run p_{1} until it is about to do a write, let C' be the resulting configuration, and let C'' = C.

For larger k, the essential idea is that starting from C, we first run to a configuration C_{1} where p_{1}...p_{k-1} cover k-1 registers that is indistinguishable from an idle configuration by the remaining processes, and then run p_{k} until it covers one more register. If we let p_{1}...p_{k-1} go, they overwrite anything p_{k} wrote. Unfortunately, they may not come back to covering the same registers as before if we rerun the induction hypothesis (and in particular might cover the same register that p_{k} does). So we have to look for a particular configuration C_{1} that not only covers k-1 registers but also has an extension that covers the same k-1 registers.

Here's how we find it: Start in C. Run the induction hypothesis to get C_{1}; here there is a set W_{1} of k-1 registers covered in C_{1}. Now let processes p_{1} through p_{k-1} do their pending writes, then each enter the critical section, leave it, and finish, and rerun the induction hypothesis to get to a state C_{2}, indistinguishable from an idle configuration by p_{k} and up, in which k-1 registers in W_{2} are covered. Repeat to get sets W_{3}, W_{4}, etc. Since this sequence is unbounded, and there are only (r choose k-1) distinct sets of registers to cover (where r is the number of registers), eventually we have W_{i} = W_{j} for some i ≠ j. The configurations C_{i} and C_{j} are now our desired configurations covering the same k-1 registers.

Now that we have C_{i} and C_{j}, we run until we get to C_{i}. We now run p_{k} until it is about to write some register not covered by C_{i} (it must do so, or otherwise we can wipe out all of its writes while it's in the critical section and then go on to violate mutual exclusion). Then we let the rest of p_{1} through p_{k-1} do all their writes (which immediately destroys any evidence that p_{k} ran at all) and run the execution that gets them to C_{j}. We now have k-1 registers covered by p_{1} through p_{k-1} and a k-th register covered by p_{k}, in a configuration that is indistinguishable from idle: this proves the induction step.

The final result follows from the fact that when k = n the lemma gives a configuration covering n distinct registers, so there must be at least n registers.