
Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

For more up-to-date notes see http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf.

Basic shared-memory model. See also AttiyaWelch §4.1.

1. Atomic registers

An atomic register supports read and write operations; we think of these as happening instantaneously, and think of operations of different processes as interleaved in some sequence. Each read operation on a particular register returns the value written by the last previous write operation. Write operations return nothing.
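
As a concrete illustration, here is a minimal sketch of an atomic register in Python; the class and the use of threading.Lock are choices made for this sketch, not part of the model, with the lock standing in for the assumption that operations happen instantaneously:

    import threading

    class AtomicRegister:
        """Sketch: reads return the value of the last preceding write;
        writes return nothing."""
        def __init__(self, initial=None):
            self._value = initial
            self._lock = threading.Lock()  # stands in for atomicity

        def read(self):
            with self._lock:
                return self._value

        def write(self, v):
            with self._lock:
                self._value = v            # a write returns nothing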

A process is defined by giving, for each state, the operation that it would like to do next, together with a transition function that specifies how the state will be updated in response to the return value of that operation. A configuration of the system consists of a vector of states for the processes and a vector of values for the registers. An execution consists of a sequence of alternating configurations and operations C0, π1, C1, π2, C2 ..., where in each triple Ci, πi+1, Ci+1 the configuration Ci+1 is the result of applying πi+1 to configuration Ci. For read operations, this means that the state of the reading process is updated according to its transition function. For write operations, the state of the writing process is updated, and the state of the written register is also updated.
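
A hedged sketch of this machinery, with dictionaries standing in for the vectors of process states and register values (the encoding of operations and the transition function delta are conventions of the sketch, not part of the formal model):

    def apply_op(config, op, delta):
        """Apply operation pi to configuration C_i, returning C_{i+1}."""
        states, registers = config
        states, registers = dict(states), dict(registers)
        p, kind, r = op["proc"], op["kind"], op["reg"]
        if kind == "read":
            ret = registers[r]            # a read returns the register's value
        else:                             # a write
            registers[r] = op["value"]    # ...updates the written register
            ret = None                    # ...and returns nothing
        states[p] = delta(states[p], ret) # process state updated by its transition function
        return (states, registers)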

Pseudocode for shared-memory protocols is usually written using standard pseudocode conventions, with the register operations appearing either as explicit subroutine calls or implicitly as references to shared variables. Sometimes this can lead to ambiguity; for example, in the code fragment

    done ← leftDone and rightDone    

when all three variables are shared, it is clear that the operation write(done, some-value) happens after read(leftDone) and read(rightDone), but it is not clear which of read(leftDone) and read(rightDone) happens first. When the order is important, we'll write the sequence out explicitly:

    leftIsDone ← read(leftDone)
    rightIsDone ← read(rightDone)
    write(done, leftIsDone and rightIsDone)

Here leftIsDone and rightIsDone are internal variables of the process, so using them does not require read or write operations to the shared memory.

2. Single-writer versus multi-writer registers

One variation that does come up even with atomic registers is what processes are allowed to read or write a particular register. A typical assumption is that registers are single-writer multi-reader—there is only one process that can write to the register (which simplifies implementation since we don't have to arbitrate which of two near-simultaneous writes gets in last and thus determines the long-term value), although it's also common to assume multi-writer multi-reader registers, which if not otherwise available can be built from single-writer multi-reader registers using AtomicSnapshot. Less common are single-reader single-writer registers, which act much like message-passing channels except that the receiver has to make an explicit effort to pick up its mail.

3. Fairness and crashes

From the perspective of a schedule, the fairness condition says that every process gets to perform an operation infinitely often, unless it enters either a crashed or halting state where it invokes no further operations. (Note that unlike in AsynchronousMessagePassing, there is no way to wake up a process once it stops doing operations, since the only way to detect that any activity is happening is to read a register and notice it changed.) Because the registers (at least in multi-reader models) provide a permanent fault-free record of past history, shared-memory systems are much less vulnerable to crash failures than message-passing systems (though FischerLynchPaterson still applies); so in extreme cases, we may assume as many as n-1 crash failures, which makes the fairness condition very weak. The n-1 crash failures case is called the wait-free case—since no process can wait for any other process to do anything—and has been extensively studied in the literature.

For historical reasons, work on shared-memory systems has tended to assume crash failures rather than Byzantine failures—possibly because Byzantine failures are easier to prevent when you have several processes sitting in the same machine than when they are spread across the network, or possibly because in multi-writer situations a Byzantine process can do much more damage. But the model by itself doesn't put any constraints on the kinds of process failures that might occur.

4. Complexity measures

Time

Assume that no process takes more than 1 time unit between operations (but some fast processes may take less). Assign the first operation in the schedule time 1 and each subsequent operation the largest time consistent with the bound. The time of the last operation is the time complexity. This is also known as the big-step or round measure because the time increases by 1 precisely when every non-faulty process has taken at least one step, and a minimum interval during which this occurs counts as a big step or a round.
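
A hedged sketch of the round measure, assuming no crashes: given a schedule listed as the sequence of processes that take steps, count the minimal intervals during which every process takes at least one step.

    def rounds(schedule, processes):
        """Count big steps/rounds: a round closes as soon as every
        process in `processes` has taken at least one step since the
        previous round closed (assumes no process is faulty)."""
        t, seen = 0, set()
        for p in schedule:
            seen.add(p)
            if seen == set(processes):  # everyone has taken a step
                t += 1                  # ...so the time advances by 1
                seen = set()
        return t

    # e.g. rounds("ababbba", "ab") == 3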

Total work

The total work is just the length of the schedule, i.e. the number of operations. This doesn't consider how the work is divided among the processes, e.g. an O(n²) total work protocol might dump all O(n²) operations on a single process and leave the rest with almost nothing to do. There is usually not much of a direct correspondence between total work and time. For example, any algorithm that involves busy-waiting—where a process repeatedly reads a register until it changes—may have unbounded total work (because the busy-waiter might spin very fast) even though it runs in bounded time (because the register gets written to as soon as some slower process gets around to it). However, it is trivially the case that the time complexity is never greater than the total work.
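
A hedged sketch of the busy-waiting scenario (the shared flag and thread setup are illustrative only): the waiter's total work depends on how fast it spins, while the time measure is bounded by how long the writer takes.

    import threading

    flag = [False]                    # stands in for a shared register

    def writer():
        flag[0] = True                # a single write operation

    def busy_waiter():
        reads = 0
        while not flag[0]:            # each iteration is one read operation
            reads += 1                # total work grows with the waiter's speed...
        return reads                  # ...but time is bounded by the writer's delay

    threading.Thread(target=writer).start()
    print(busy_waiter())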

Per-process work

Measures the maximum number of operations performed by any single process. Minimizing this measure produces more equitably distributed workloads (or at least reveals inequitably distributed ones). Like total work, per-process work gives an upper bound on time, since each time unit includes at least one operation from the longest-running process, but time complexity might be much less than per-process work (e.g. in the busy-waiting case above).

Contention

In multi-writer or multi-reader situations, it may be bad to have too many processes pounding on the same register at once. The contention measures the maximum number of pending operations on any single register during the schedule (this is the simplest of several definitions out there). A single-reader single-writer algorithm always has contention at most 2, but achieving such low contention may be harder for multi-reader multi-writer algorithms. Of course, the contention is never worse than n, since we assume each process has at most one pending operation at a time.

Space

Just how big are those registers anyway? Much of the work in this area assumes they are very big. But we can ask for the maximum number of bits in any one register or the total size or number of all registers, and will try to minimize these quantities when possible. We can also look at the size of the internal states of the processes for another measure of space complexity.

5. Fancier registers

In addition to stock read-write registers, one can also imagine more tricked-out registers that provide additional operations. These usually go by the name of read-modify-write (RMW) registers, since the additional operations consist of reading the state, applying some function to it, and writing the state back, all as a single atomic action. Examples of RMW registers that have appeared in real machines at various times in the past include:

test-and-set bits
A test-and-set operation sets the bit to 1 and returns the old value.
fetch-and-add registers
A fetch-and-add operation adds some increment (typically -1 or 1) to the register and returns the old value.
compare-and-swap registers
Compare-and-swap writes a new value only if the previous value is equal to a supplied test value.

These are all designed to solve various forms of MutualExclusion or locking.
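
A hedged sketch of these three primitives in Python, with a lock standing in for the hardware's atomicity guarantee (real machines provide these as single instructions; the return-the-old-value convention for compare-and-swap is one of several found in practice):

    import threading

    class RMWRegister:
        def __init__(self, initial=0):
            self._value = initial
            self._lock = threading.Lock()           # stands in for hardware atomicity

        def test_and_set(self):
            with self._lock:
                old, self._value = self._value, 1   # set the bit to 1
                return old                          # return the old value

        def fetch_and_add(self, delta):
            with self._lock:
                old = self._value
                self._value += delta                # typically delta is 1 or -1
                return old

        def compare_and_swap(self, test, new):
            with self._lock:
                old = self._value
                if old == test:                     # write only if the old value matches test
                    self._value = new
                return old

For example, a test-and-set bit gives a simple spin lock: a process acquires the lock when test_and_set() returns 0, and releases it by writing 0 back (a reset operation not shown in the sketch).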

Some more exotic read-modify-write registers that have appeared in the literature are

fetch-and-cons
The register holds a linked list; fetch-and-cons adds a new head and returns the old list.
sticky bits (or sticky registers)
Once the initial empty value is overwritten, all further writes fail.
bank accounts
Replace the write operation with deposit, which adds a non-negative amount to the state, and withdraw, which subtracts a non-negative amount from the state provided the result would not go below 0; otherwise, it has no effect.

These solve harder problems under bad conditions. Note that they all have to return something in response to an invocation: while one might imagine using blocking objects like locks or semaphores, these don't fit into the RMW framework.
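
A hedged sketch of these objects (the return values are not pinned down above, so the choices below are assumptions of the sketch; the locks are stand-ins for atomicity as in the earlier RMW sketch):

    import threading

    class FetchAndCons:
        """The register holds a list; fetch-and-cons prepends an element
        and returns the old list."""
        def __init__(self):
            self._list, self._lock = [], threading.Lock()

        def fetch_and_cons(self, head):
            with self._lock:
                old = self._list
                self._list = [head] + old
                return old

    class StickyBit:
        """Only the first write to the initially-empty register succeeds."""
        def __init__(self):
            self._value, self._lock = None, threading.Lock()

        def write(self, v):
            with self._lock:
                if self._value is None:   # later writes fail (here: have no effect)
                    self._value = v

        def read(self):
            with self._lock:
                return self._value

    class BankAccount:
        """deposit always succeeds; withdraw has no effect if it would
        drive the balance below 0."""
        def __init__(self, balance=0):
            self._balance, self._lock = balance, threading.Lock()

        def deposit(self, amount):
            with self._lock:
                self._balance += amount
                return self._balance      # return something (here, the new balance)

        def withdraw(self, amount):
            with self._lock:
                if self._balance >= amount:
                    self._balance -= amount
                return self._balance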

We can also consider generic read-modify-write registers that can compute arbitrary functions (passed as an argument to the read-modify-write operation) in the modify step. Here we typically assume that the read-modify-write operation returns the old value of the register. Generic read-modify-write registers are not commonly found in hardware but can be easily simulated (in the absence of failures) using MutualExclusion.
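
A hedged sketch of such a generic read-modify-write register, with a lock standing in for a mutual exclusion protocol and no failures assumed:

    import threading

    class GenericRMW:
        """rmw(f) atomically replaces the value v with f(v) and returns
        the old value, using a lock to simulate mutual exclusion."""
        def __init__(self, initial=0):
            self._value = initial
            self._lock = threading.Lock()

        def rmw(self, f):
            with self._lock:            # enter critical section
                old = self._value
                self._value = f(old)    # apply the arbitrary function
                return old              # return the old value

    # e.g. fetch-and-add and test-and-set are special cases:
    #   r.rmw(lambda v: v + 1)   # fetch-and-increment
    #   r.rmw(lambda v: 1)       # test-and-set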

6. Examples

7. I/O automata version

In IOAutomata terms, register operations get modeled as pairs of actions e.g. invoke-read(p, r), respond-read(p, r, value) or invoke-write(p, r, value), respond-write(p, r), where the first is an input action to the register and the second is the corresponding output action. (This is a bit like send and recv, except the response always goes back to the invoking process instead of to somebody else.) The register is atomic because even though a read or write is split into two actions, it appears to occur instantaneously at some point in between the two actions, as if there was a real internal read or write action happening that just happened to be triggered when the register finally got around to dealing with the invocation, and whose return value (if any) is delivered at some later date in the response. Subject to this assumption, the actual return value of a read operation will be the value appearing in the most recent previous write, or some default initial value if there is no previous write.

We could define an atomic register using a reference implementation, e.g.

states
value × (list of pending reads) × (list of undelivered (p, value) read responses) × (list of pending (p, value) writes) × (list of undelivered write responses)
actions
invoke-read(p, r)
(input) effect: add p to list of pending reads
respond-read(p, r, v)
(output) precondition: (p, v) is in list of undelivered responses; effect: remove (p, v) from list
do-read(p, r, v)
(internal) precondition: p is in list of pending reads, v = value; effect: remove p from that list, add (p, v) to response list
...similar actions for invoke-write and respond-write...
do-write(p, r, v)
(internal) precondition: (p, v) is in list of pending writes; effect: value := v, remove (p, v) from pending write list, add p to write response list

But this gets ugly since almost all of the code involves buffering invocations and responses, and the real action is in the internal do-read and do-write actions. So when modeling asynchronous shared memory we deviate from I/O Automata orthodoxy and just keep track of the internal do-read and do-write actions, and pretend that the invocations and responses happen at the same time as the internal actions (i.e., "atomically"). This is the reason for assuming that registers are atomic: if the registers were weaker (e.g. if you had a register that might return arbitrary values to a read that happens in between an invoke-write and the corresponding respond-write) then we would have to keep track of the full details of what starts and finishes when. Instead, when describing an execution of a shared-memory system we write down a schedule of register operations, e.g. write(p1, r1, 7), read(p1, r1, 7), write(p2, r1, 8), write(p3, r1, 9), read(p1, r1, 9), where each entry in the schedule corresponds to an internal do-read or do-write action in the reference implementation.
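
As a hedged sketch of this simplified view, a schedule can be replayed directly against the register values, with each do-write updating a register and each do-read checked against the register's current contents (the tuple encoding is just one possible representation):

    def replay(schedule, initial=None):
        """Apply a schedule of (op, process, register, value) entries,
        checking that each read returns the last value written."""
        registers = {}
        for op, p, r, v in schedule:
            if op == "write":
                registers[r] = v
            else:  # read
                assert registers.get(r, initial) == v, (p, r, v)
        return registers

    # the example schedule from the text:
    replay([("write", "p1", "r1", 7), ("read", "p1", "r1", 7),
            ("write", "p2", "r1", 8), ("write", "p3", "r1", 9),
            ("read", "p1", "r1", 9)])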

For simplicity, we generally require that a process have at most one outstanding invocation at a time: having executed e.g. invoke-read on some register, it can't do another invoke-read or invoke-write until it gets back a respond-read or respond-write. This allows us to define the process behavior in terms of a deterministic choice of what operation to invoke after each response comes back.


CategoryDistributedComputingNotes

1. Without using randomization or making some other strong assumption about the model.

