For more up-to-date notes see http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf.
Like SynchronousAgreement except that we replace crash failures with Byzantine failures, where a faulty process can ignore its programming and send any messages it likes. Since we are operating under a universal quantifier, this includes the case where the Byzantine processes appear to be colluding with each other under the control of a centralized adversary.
1. Lower bounds
1.1. Minimum number of rounds
We've already seen an f+1 lower bound on rounds for crash failures (see IndistinguishabilityProofs). Applies a fortiori to Byzantine failures, since Byzantine failures can simulate crash failures.
1.2. Minimum number of processes
We can also show that we need n ≥ 3f+1 processes. For n = 3 and f = 1 the intuition is that Byzantine B can play A and C off against each other, telling A that C is Byzantine and C that A is Byzantine. Since A is telling C the same thing about B that B is saying about A, C can't tell the difference and doesn't know who to believe. Unfortunately, this tragic soap opera is not a real proof, since we haven't actualy shown that B can say exactly the right thing to keep A and C from guessing B is evil.
The real proof (see also AttiyaWelch §5.2.3): consider an artifical execution where (non-Byzantine) A, B, and C are duplicated and then placed in a ring A0 B0 C0 A1 B1 C1, where the digits indicate inputs. We'll still keep the same code for n=3 on A0, B0, etc, but when A0 tries to send a message to what it thinks of as just C we'll send it to C1 but messages from B0 will instead go to C0. For any adjacent pair of processes (e.g. A0 and B0), the behavior of the rest of the ring could be simulated by a single Byzantine process (e.g. C), so each process in the 6-process ring behaves just as it does in some 3-process execution with 1 Byzantine process. It follows that all of the processes terminate and decide in the unholy 6-process Frankenexecution1 the same value that they would in the corresponding 3-process Byzantine execution. So what do they decide?
Given two processes with the same input e.g. A0 and B0, the giant execution is indistinguishable from an A0 B0 C execution where C is Byzantine. Validity says A0 and B0 must both decide 0. Since this works for any pair of processes with the same input, we have each process deciding its input. But now consider the execution of C0 A1 B, where B is Byzantine. In the big execution, we just proved that C0 decides 0 and A1 decides 1, but since the C0 A1 B execution is indistinguishable from the big execution to C0 and A1, they do the same thing here and violate agreement.
This shows that with n=3 and f=1, we can't win. We can generalize this to n = 3f. Suppose that there were an algorithm that solved Byzantine agreement with n=3f processes. Group the processes into groups of size f, and let each of the n=3 processes simulate one group (giving everybody the same input). Then we get a protocol for n=3 and f=1, an impossibility.
1.3. Minimum connectivity
So far, we've been assuming a complete communication graph. If the graph is not complete, we may not be able to tolerate as many failures. In particular, we need the connectivity of the graph (minimum number of nodes that must be removed to split it into two components) to be at least 2f+1. See LynchBook §6.5 for the full proof. The essential idea is that if we have an arbitrary graph with a vertex cut of size k < 2f+1, we can simulate it on a 4-process graph where A is connected to B and C (but not D), B and C are connected to each other, and D is connected only to B and C. Here B and C each simulate half the processes in the size-k cut, A simulates all the processes on one side of the cut and D all the processes on the other side. We then construct an 8-process artificial execution with two non-faulty copies of each of A, B, C, and D and argue that if one of B or C can be Byzantine then the 8-process execution is indistinguishable to the remaining processes from a normal 4-process execution. An argument similar to the n ≥ 3f+1 proof then shows we violate one of validity or agreement.
Conversely, if we have connectivity 2f+1, then the processes can simulate a general graph by sending each other messages along 2f+1 predetermined vertex-disjoint paths and taking the majority value as the correct message. Since the f Byzantine processes can only corrupt one path each (assuming the non-faulty processes are careful about who they forward messages from), we get at least f+1 good copies overwhelming the f bad copies. This reduces the problem on a general graph with sufficiently high connectivity to the problem on a complete graph, allowing Byzantine agreement to be solved if the other lower bounds are met.
2. Upper bounds
2.1. Exponential information gathering gets n = 3f+1
We'll show that a variant of Exponential Information Gathering for SynchronousAgreement works with n ≥ 3f+1 in f+1 rounds.
Recall EIG gives us at each node a set of pairs (path, value) where path spans all sequences of 0 to n distinct ids and value is the input value forwarded along that path. We think of the set of paths as a tree where w is the parent of wj for each path w and each id j not in w. To apply EIG in the Byzantine model, ill-formed messages received from j are treated as missing messages, but otherwise the data-collecting part of EIG proceeds as in the crash failure model. However, we compute the decision value recursively as follows. First replace any pair (w, ⊥) with |w| = f+1 with (w, 0). Then for each path w, define val'(w, i) = majority value among val'(wj, i) for all j, or val(w, i) if |w| = f+1. Finally, decide val'(< >, i).
In computing the final decision we ignore any values attached to short paths, replacing them with the computed majority values. One way to think about this is that I don't trust j to give me the right value for wj, so instead a take a majority of values of wj that j allegedly reported to other people---which are in turn not computed from the untrustworthy accounts of those unreliable witnesses but from majorities repeating what they said.
2.1.1. Proof of correctness
- If i, j, and k are all non-faulty then for all w, val(wk, i) = val(wk, j) = val(w, k).
- Trivial: k announces the same value val(w, k) to both i and j.
- If j is non-faulty then val'(wj, i) = val(wj, i) for all non-faulty i and all w.
By induction on f+1-|wj|. If |wj| = f+1, then val'(wj, i) = val(wj, i) = val(w, j) = val(wj, i). If |wj| < f+1, then val(wj, k) = val(w, j) for all non-faulty k. It follows that val(wjk, i) = val(w, j) for all non-faulty i and k (that do no appear in w). The bad guys report at most f bad values val(wj, k'), but the good guys report at least n-f-|wj| good values val(wj, k). Since n ≥ 3f + 1 and |wj| <= f we have n - f - |wj| ≥ 3f+1 - f - f ≥ f+1 good values, which are a majority.
- Proof of Validity
- If everybody has the same input v, we always get a majority of val' values equal to this input at each step, and we win.
- Proof of Agreement
Call a node common if val'(w, i) = val'(w, j) for all non-faulty i, j. Lemma above says wk is common if k is good. With n ≥ 3f+1, taking majorities ensures any parent of only common nodes is also common. Observe that every path has a common node on it, since a path travels through f+1 nodes and one of them is good. Now suppose root is not common: it must have a not-common kid, that node must have a not-common kid, etc. But this constructs a path with a not-common kid which we just proved can't happen.
2.2. Phase king gets constant-size messages
The following algorithm, due to Berman, Garay, and Perry , achieves Byzantine agreement in 2(f+1) rounds using constant-size messages, provided n ≥ 4f+1. The description here is drawn from AttiyaWelch §5.2.5. There is also a variant that achieves n = 3f+1 in 3(f+1) rounds (see http://cm.bell-labs.com/who/garay/bit.ps), but it's more complicated.
2.2.1. The algorithm
The basic idea of the algorithm is that we avoid the recursive majority voting of EIG by running a vote in each of f+1 "phases" through a "phase king," some process chosen in advance to run the phase. Since the number of phases exceeds the number of faults, we eventually get a non-faulty phase king. The algorithm is structured so that one non-faulty phase king is enough to generate agreement and subsequent faulty phase kings can't undo the agreement.
Here is pseudo-code for the algorithm:
Each processes i maintains an array prefi[j] where j ranges over all process ids.
- There are also utility values majority and multiplicity.
Initially, prefi[i] = input and prefi[j] = 0 for j <> i.
- Phase k consists of rounds 2k-1 and 2k:
- In round 2k-1 (first round of phase k):
send prefi[i] to all processes (including myself)
set prefi[j] = vj where vj is the value received from process j.
set majority = majority value in prefi and multiplicity = # of times majority appears in prefi.
- In round 2k (second round of phase k):
- if i = k then send majority to all processes (only the phase king does this)
- receive king-majority from the phase king (everybody does this)
if multiplicity > n/2 + f then
prefi[i] = majority
prefi[i] = king-majority
if k = f+1 then decide prefi[i].
- In round 2k-1 (first round of phase k):
What is going on here is that in each phase, everybody announces their current preference (initially the inputs). If the majority of these preferences is large enough (e.g. all inputs are the same), everybody adopts the majority preference. Otherwise everybody adopts the preference of the phase king.
2.2.2. Proof of correctness
Termination is immediate from the algorithm.
For Validity, suppose all inputs are v. We'll show that all non-faulty i have prefi[i] = v after every phase. In the first round of each phase, process i receives at least n-f messages containing v; since n ≥ 4f + 1 we have n-f ≥ 3f+1 and n/2 + f <= (4f+1)/2 + f = 3f+1/2, and thus these n-f messages exceed the n/2+f threshold for adopting them as the new preference. So all non-faulty processes ignore the phase king and stick with v, eventually deciding v after round 2(f+1).
For Agreement, we'll ignore all phases up to the first phase with a non-faulty phase king. Let k be the first such phase, and assume that the pref values are set arbitrarily at the start of this phase. We want to argue that at the end of the phase, all non-faulty processes have the same preference. There are two ways that a process can set its new preference in the second round of the phase:
The process i observes a majority of more than n/2+f identical values v and ignores the phase king. Of these values, more than n/2 of them were sent by non-faulty processes. So the phase king also receives these values (even if the faulty processes change their stories) and chooses v as its majority value. Similarly, if any other process j observes a majority of n/2+f identical values, the two >n/2 non-faulty parts of the majorities overlap, and so j also chooses v.
- The process i takes its value from the phase king. We've already shown that i then agrees with any j that sees a big majority; but since the phase king is non-faulty, process i will agree with any process j that also takes its new preference from the phase king.
This shows that after any phase with a non-faulty king, all processes agree. The proof that the non-faulty processes continue to agree is the same as for Validity.
2.2.3. Performance of phase king
It's not hard to see that this algorithm sends exactly (f+1)(n^2+n) messages of 1 bit each (assuming 1-bit inputs). The cost is doubling the minimum number of rounds and reducing the tolerance for Byzantine processes. As mentioned earlier, a variant of phase-king with 3-round phases gets optimal fault-tolerance with 3(f+1) rounds (but 2-bit messages). Still better is a rather complicated descendant of the EIG algorithm due to Garay and Moses (see http://cm.bell-labs.com/who/garay/3t.pdf), which gets f+1 rounds with n ≥ 3f+1 while still having polynomial message traffic.
3. Weak Byzantine agreement
(See LynchBook §6.6.)
Like regular Byzantine agreement but Validity is only required to hold if there are no faulty processes at all, i.e. if there is a single faulty process, the non-faulty processes can output any value regardless of their inputs (as long as they agree on it). Sadly, this weakening doesn't improve things much: even weak Byzantine agreement can be solved only if n ≥ 3f+1.
Proof: As in the strong Byzantine agreement case, we'll construct a many-process Frankenexecution to figure out a strategy for a single Byzantine process in a 3-process execution. The difference is that now the number of processes in our synthetic execution is much larger, since we want to build an execution where at least some of our test subjects think they are in a non-Byzantine environment. The trick is to build a very big, highly-symmetric ring so that at least some of the processes are so far away from the few points of asymmetry that might clue them in to their odd condition that the protocol terminates before they notice.
Fix some protocol that allegedly solves weak Byzantine agreement, and let r be the number of rounds for the protocol. Construct a ring of 6r processes A01 B01 C01 A02 B02 C02 ... A0r B0r C0r A10 B10 C10 ... A1r B1r C1r where each Xij runs the code for process X in the 3-process protocol with input i. For each adjacent pair of processes, there is a 3-process Byzantine execution which is indistinguishable from the 6r-process execution for that pair: since Agreement holds in all Byzantine executions, each adjacent pair decides the same value in the big execution and so either everybody decides 0 or everybody decides 1 in the big execution.
Now we'll show that means that Validity is violated in some no-failures 3-process execution. We'll extract this execution by looking at the execution of processes A0r/2 B0r/2 C0r/2. The argument is that up to round r, any input-0 process that is at least r steps in the ring away from the nearest 1-input process acts like the corresponding process in the all-0 no-failures 3-process execution. Since A0r/2 is 3r/2 > r hops away from A1r and similarly for C0r/2, our 3 stooges all decide 0 by Validity. But now repeat the same argument for A1r/2 B1r/2 C1r/2 and get 3 new stooges that all decide 1. This means that somewhere in between we have two adjacent processes where one decides 0 and one decides 1, violating agreement in the corresponding 3-process execution where the rest of the ring is replaced by a single Byzantine process.
Not a real word. (1)