Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. Group mutual exclusion

In group mutual exclusion, each process that wants to enter the critical section specifies a session id, and it is permitted for two or more processes with the same session id to be in the critical section at the same time. Slightly more formally, we can define a group mutual exclusion protocol as one that satisfies: (a) group mutual exclusion: no two processes with the different session ids are in the critical section simultaneously; (b) progress: if no process is in the critical section, eventually some process enters the critical section; and (c) concurrent entering: if a process p with a given session id attempts to enter the critical session, and no process with a different session id is executing the protocol, then p enters the critical section after a bounded number of steps.

Give an implementation of a group mutual exclusion protocol as defined above using a single RMW register.
Suppose that session ids are either 0 or 1, and we have each process run Peterson's two-process tournament algorithm (as describe in MutualExclusion) using its session id in place of its process id. Is the resulting protocol a group mutual exclusion protocol?

1.1. Solution

1. Use the RMW register to hold the session id plus a count of how many processes are in the critical section. The code is:

while rmw(r, <if r.id ∈ { my-id, ⊥ } then (my-id, r.count+1) else r>)).id ∉ { my-id, ⊥ }:
    spin
/* critical section */
rmw(r, <(r.count - 1, if r.count = 1 then ⊥ else r.id)>)

This satisfies group mutual exclusion, because we maintain the invariant that r.count is always equal to the number of processes in the critical section while r.id equals their common session id or ⊥ if r.count = 0. It satisfies progress, because if nobody is in the critical section, r.count = 0 and r.id = ⊥, and so any process executing the first rmw operation sees r.id = ⊥ and enters. It satisfies concurrent entering, because if I am competing with no processes with a different id, then either r.id = ⊥ or r.id = my-id, and I enter the critical section in one operation. (Note that it is not starvation-free; building a starvation-free group mutual exclusion protocol is harder.)

2. It doesn't work. For example, suppose we have two processes with session id 0. When one of them exits the critical section, it sets present[0] to 0, allowing any process with id 1 to enter the critical section even though the other id-0 process may still be in there.

2. Two agreement protocols

Suppose you have an asynchronous system in which any one process may crash, and that you want an agreement protocol satisfying the usual requirements of agreement, validity, and termination. From the FischerLynchPaterson impossibility result, we know that this is impossible for deterministic processes.

Consider the following protocol: each process sends its input (0 or 1) to all processes (including itself). It then waits to receive n messages and decides on the OR of their values. Show that this protocol satisfies agreement and validity, but not termination.
Consider this protocol, for n even and at least 4: Each process executes a loop implementing a sequence of phases. Initially, process p sets v_p to input_p. In phase i, process p sends (i, v_p) to all processes (including itself). It then waits to receive n-1 phase-i messages and sets v_p to the majority of the values its sees (we assume messages from later phases are buffered so p can pretend it received them during the appropriate phase). If all the messages p receives have the same value for three phases in a row, p decides on that value. Show that this protocol satisfies agreement and validity, but not termination.

2.1. Solution

Any process that receives n messages computes the OR of all n processes' inputs. Since this is the same set of inputs for all processes, we get agreement, and since the OR of n zeros is zero and n ones is one, we get validity. Unfortunately, if a process crashes before sending all n of its messages, at least one process fails to receive n messages and thus waits forever; this violates termination.
Validity follows from using majority everywhere; we can prove by induction on the phase number that if the input is constant, all values in all phases are equal to it. Agreement follows from the fact that if I see n-1 ones and decide, nobody will see fewer than n-2 ones, and so any subsequent majority and thus any possible decision value will also be one (the same argument applies to zero). We can show non-termination by dividing the processes into groups A₀ and A₁ of size n/2 each. Initially, each p in A_i has v_p = input_p = i, and at each phase, we deliver n/2 messages from group A_i and n/2-1 messages from group A_¬i to each process p in group A_i, delaying the remaining message until p has moved on to a later phase. Since the majority value in these n-1 messages is always i, v_p never changes, and no process ever sees unanimity.

3. Synchronous agreement with intermittent faults

Suppose we have a system where one process in each round suffers a transmitter failure, causing some subset of its outgoing messages for that round to be lost. These faults are intermittent; a process p that is faulty in round r is not necessarily faulty in round r+1, and indeed if some other process q is faulty in round r+1, then p must not be faulty in round r+1, because at most one process is faulty in each round.

Is it possible to solve synchronous agreement (see SynchronousAgreement) in this model? Give a protocol and prove that it works, or give an impossibility result.

3.1. Solution

Surprisingly, achieving agreement in this model is not possible (at least for n≥2). We can show this using a bivalence argument.

First observe that there is an initial bivalent configuration. When all inputs are 0, the configuration is 0-valent; when all are 1, the configuration is 1-valent. Suppose we change inputs from 0 to 1 one at a time, and consider the resulting intermediate configurations. If all of them are univalent, there are adjacent configurations C₀ and C₁ that differ only in the input to some process p, with C₀ 0-valent and C₁ 1-valent. But if we cut off all outgoing messages from p forever, the remaining processes can't distinguish these two cases, giving a contradiction. It follows that some initial configuration is bivalent.

Now we need to argue that, if we are in a bivalent configuration after k rounds, there is some bivalent configuration reachable from it after one additional round. Given a configuration C, the set of messages sent by all processes is fixed, but different successor configurations C' are reachable depending on which of these messages are delivered. Suppose none of these C' are bivalent; then there exist round k+1 configurations C₀ and C₁ be round k+1 configurations reachable from C that are 0-valent and 1-valent respectively, where C₀ is reached following a transmission failure of some process p₀ in round k and C₁ is reached following a transmission failure of some process p₁ (possibly equal to p₀).

We now construct a sequence of configurations C₀=A₀A₁A₂...A_m=C₁, where each A_i differs from A_i+1 by changing one message delivery. Initially, we get A_i+1 by adding in one message that is not delivered in A_i, until we reach some configuration A_j where all round k messages are delivered. We then work backwards from A_m by having A_m-1 similarly be reached by adding one undelivered message to A_m until we again reach configuration A_j. In each case we have that there is exactly one process q (the recipient of the omitted message) whose state is different in A_i and A_i+1; given two adjacent configurations that are 0-valent and 1-valent, silencing this process q gives a contradiction. It follows that there is in fact some bivalent configuration in this sequence.

(This particular impossibility result, plus several related ones, can be found in this paper.)