Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

For more up-to-date notes see http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf.

Intuitive model: we have n processes, processes communicate by passing messages, but unlike in SynchronousMessagePassing there are few guarantees on when (or in some models, if) messages are delivered.

Formal model: we'll model processes and communications channels as IOAutomata, possibly with some additional constraints on channel traces.

Modeling processes is easy: process p_i is an automaton with input events of the form recv(sender, i, message) and output events of the form send(i, receiver, message); otherwise the behavior of the process is up to its programmer.

Modeling channels involves more detail, since we actually want to specify their possible executions. From the outside, a channel automaton for the i->j channel is just an automaton with input events send(i, j, m) and output events recv(i, j, m). But channels can vary in how they choose to generate recv events in response to send events.

1. Axiomatic definitions of channels

General approach: given a trace of a channel, we want to assign to each receive event a preceding send event that caused it. This assignment corresponds to constructing a cause() function that describes the mapping. Following LynchBook §14.1.2 we can write down four axioms for a reliable FIFO channel, which we'll drop for less reliable or less FIFO channels.

If a = recv(i,j,m) then cause(a) = send(i,j,m) for the same i,j,m. [Messages are not modified in transit.]
cause is surjective. [Messages are not lost---every send event has a matching recv event.]
cause is injective. [Messages are not duplicated.]
If a₁ < a₂ (in the trace ordering), then cause(a₁) < cause(a₂). [Messages are delivered in order.]

There is also an implicit "axiom 0": cause(a) < a for all a: messages do not travel backwards in time. Systems that violate this axiom can do all sorts of interesting things, but they appear to be difficult to build in practice.

In real life, the closest thing one gets to a reliable FIFO channel is a TCP connection, which attempts to guarantee all of the properties of a reliable channel but may go down at some point, with unpredictable results.

If we drop axiom 4, we get a reliable reordering channel---this delivers each message exactly once, but may deliver them out of order.

If we drop condition 3, we can duplicate packets. If we drop condition 2, we can lose packets. UDP packets, for example, can be both lost and duplicated.

Lossy channels are not very interesting, because they have a trivial implementation that never issues a recv event. So one often adds some additional restrictions on traces (which may be difficult to enforce internally in an automaton specification, as we will see below):

Strong loss limitation: Any message that is sent infinitely many times is received infinitely many times.
Weak loss limitation: If infinitely many messages are sent, infinitely many messages are received (but particular messages may never be received).
Finite duplication: A single send event can generate at most finitely many receive events.

These assumptions are used in data-link protocols that try to build reliable channels on top of unreliable ones.

2. Automata-theoretic definitions of channels

An alternative to giving axioms for traces is to supply representative automata, and say that a given candidate automaton C is a reliable FIFO channel (say) iff its fair traces correspond to the fair traces of a standardized reliable FIFO channel. This works better for some combinations of axioms than others, because of the limitations of the fairness constraints for I/O automata.

2.1. The universal reliable FIFO automaton

Here is the automaton for an i->j channel. Recall that the signature for all channel automata makes send and input action and recv an output action.

states

M^* where M is the message space.

actions

send(i,j,m)

effects: state <- state :: m

recv(i,j,m)

precondition: state = m :: Q
effects: state <- Q

In English, it's a big queue. The task partition includes all recv events as a single equivalence class (in fact, any task partition works).

To prove that the fair traces of this automata satisfy the axioms, observe that any fair trace is either finite (and the queue is empty at the end => same number of send and recv events) or infinite with infinitely many send and recv events (the last is slightly tricky---either the queue is always full so recv events are enabled infinitely often and thus occur infinitely often, or it is empty infinitely often so any enqueued message must get sent soon thereafter). To get cause(r), map the k-th recv event in the trace to the k-th send event. We can easily show that the size of the queue after any prefix = (# of sends) - (# of recvs) >= 0, so axiom 0 applies; the ordering of the queue ensures axiom 1, and axioms 2-4 are immediate from our definition of cause.

In the other direction, we'd like to show that any trace t that satisfies axioms 0-4 is a fair trace of our automaton. We can split this task into showing that it's a trace and then that it's fair. To show the former, suppose t satisfies 0-4 and let t' be the shortest prefix of t that is not a trace of the automaton A. Then the last event in t' is a recv event. Suppose it's the r-th recv event in t', and that there are s send events in t'. If r > s we violate either axiom 0 or axiom 3, so we have r <= s. From the code for the automaton we can prove that A can only issue recv(i,j,m) where m message in the r-th send event. We then argue from axioms 2-4 that since cause is an order-preserving bijection between send events and recv events and axiom 1 (a matches cause(a)) that this is precisely what occurs in t'.

2.2. A reliable reordering channel

Replace the queue with a multiset (or "bag").

states

multisets in-transit over M where M is the message space.

actions

send(i,j,m)

effects: in-transit <- in-transit + m

recv(i,j,m)

precondition: m is in in-transit
effects: in-transit <- in-transit - m

This is identical to the automaton that appears in LynchBook §14.1.2. The technique that LynchBook uses to get the liveness property of this automaton (essentially axiom 2, all messages are eventually delivered), is to specify it as an additional trace property required of the automaton. However, in this case we can also encode it in the task partition: make all events send(i,j,m) into their own singleton tasks. It follows that in any fair trace that for any particular message m either (1) the trace is finite and there are no messages in in-transit at the end; (2) the trace is infinite, and recv(i,j,m) is not enabled infinitely often => all m's are delivered since every send(i,j,m) is followed by a state in which no m's remain in in-transit; or (3) the trace is infinite, and recv(i,j,m) occurs infinitely often. In either case we have that the appearence of m in in-transit at any time implies some later recv(i,j,m) event, which is the trace-property version.

2.3. A lossy FIFO channel

states

M^* where M is the message space.

actions

send(i,j,m)

effects: state <- state :: m^k for any k > 0

recv(i,j,m)

precondition: state = m :: Q
effects: state <- Q

The task partition puts all recv events in the same class. This channel provides finite duplication and weak loss limitation in infinite traces. Note that if only finitely many send events occur, no messages are lost at all. It's not entirely obvious how to design a lossy FIFO channel automaton whose fair traces (including the finite ones) are precisely those that satisfy axioms 0, 1, 4 + finite duplication + WLL, so here we are worrying only about infinite traces. LynchBook handles this by allowing k=0 (somewhat implicitly) and using an explicit WLL restriction on traces.

2.4. A lossy reordering channel

As above, but with multisets:

states

multisets in-transit over M where M is the message space.

actions

send(i,j,m)

effects: in-transit <- in-transit + k*m for some k > 0

recv(i,j,m)

precondition: m is in in-transit
effects: in-transit <- in-transit - m

Here the task partition again puts all recv tasks in the same bin. This has the same problem in the preceding case of requiring all messages are delivered at least once if there are only finitely many sends, and a better alternative is probably to augment the automaton (without k > 0) with an explicit WLL axiom.

3. Channel simulations

Simulating more reliable channels using less reliable channels is the responsibility of DataLinkProtocols, which we'll discuss more later. For the moment, let's just look at a simple simulation of a reliable FIFO channel C+ using a reliable reordering channel C. The idea is that we will augment C with extra automata S and R at each end that handle tagging messages and processing messages so that they are delivered in order. Given a system P1 -> S -> C -> R -> P2, if S -> C -> R composes together into something that acts like C+, then we can treat the system like P1 -> C+ -> P2 even if the actual implementation looks more like (P1 -> S) -> C -> (R -> P2).

Here's the code for S. We assume that S has input actions send(i,j,m) that it translates into output actions send'(i,j,m) (which are in turn inputs to C):

states

count, initially 0; multiset M_S of (count, message) pairs, initially empty

actions

send(i,j,m)

effects: M <- M + (count, M), count <- count + 1

send'(i,j,(c,m))

precondition: (c,m) in M
effects: M <- M - (c,m)

task partition

all send'(i,j,(c,m)) in separate classes

And here's the code for R. Note that here recv' is an input action (output of C) and recv is an output action.

states

count, initially 0; multiset M_R of (count,message) pairs, initially empty

actions

recv'(i,j,(c,m))

effects: M <- M + (c,m)

recv(i,j,m)

precondition: M contains (c,m) where c = count
effects: M <- M - (c,m), count <- count + 1

task partition

not needed---at most one recv action is enabled in any reachable state of S+C+R

Now we want to argue that the traces of S+C+R are a subset of the traces of C. To do so, we use a simulation map: the composite state ((c_S,M_S),in-transit,(c_R,M_R)) is mapped the union of all m in M_S, in-transit and M_R, where the messages in the union are ordered by their attached counts. We also need an invariant on how c_S and c_R relate to the set of counts for messages in in-transit and M.

Invariant: Either c_R = c_S or the set of message numbers in M_S union in-transit union M_R is precisely c_R c_R+1, ... c_S-1.
Proof: It's true in the initial state. A send action increments c_S and adds (c_S-1, m) to M_S; send' and recv' just move existing messages between bags; and recv pulls (c_R, m) then increments c_R; in each case the invariant is preserved.

Now we just check that each operation preserves the simulation map, i.e. that for any of send, send', recv', or recv taking s to s' the corresponding trace (empty trace for send' or recv' since they're internal) takes f(s) to f(s') in the reliable automaton.

send(i,j,m): Adds new (c_S, m) to union of the bags and thus Q = f(s); it's at the end since c_S > previous sequence numbers by the invariant.
send'(i,j,m), recv(i,j,m): moves (c,m) from one bag to another, so no effect on f(s); equivalent to a length-zero trace of Q.
recv(i,j,m): Removes (c_R,m) from M_R and thus from Q = f(s). By the invariant c_R is the smallest sequence number in the bags, so m is the first message in Q.

The nice thing about this proof is that it is very short, and once the invariant and simulation map are given the rest is essentially mechanical verification. So it's harder to make mistakes than if, say, we had to reason about cause maps directly using the axioms. The downside is that we have only shown trace(S+C+R) is a subset of trace(Q); we still need a separate argument that demonstrates that fair-trace(S+C+R) generates only fair traces of Q. (Intuitively, we can look at any individual (c,m) pair and argue that (a) it eventually moves from M_S to C because of the fairness constrain on S, (b) it eventually moves to M_R because of the fairness constraint on C, and (c) it is eventually delivered because of fairness on R, but the details are not as mechanical as the simulation-map checking.)

CategoryDistributedComputingNotes