#
Highly Fault-Tolerant Parallel Computation

**Author:**
Daniel A. Spielman.
**Bibliographic Information:**
Appeared in *Proceedings of the 37th Annual IEEE Conference
on Foundations of Computer Science*, pp. 154-163, 1996.

## Abstract

We re-introduce the {\em coded model} of fault-tolerant
computation in which the input and
output of a computational device are treated as words in an error-correcting
code.
A computational device correctly computes a function in the coded model
if its input and output, once decoded, are
a valid input and output of the function.
In the coded model, it is reasonable to hope for exponentially
reliable computation with devices in which each component
can fail with
some constant probability.
We consider fine-grained parallel computations in which
each processor has a constant probability of producing
the wrong output at each time step.
We show that any parallel computation
that runs for time $t$ on $w$ processors can be
performed on a faulty machine in the coded model
using
$w \log^{O(1)}w $ processors and time
$t \log^{O(1)}w$.
The failure probability of the computation will
be $t \cdot exp(-w^{1/4})$.
We show that any computation that can be performed efficiently
by a parallel machine can be performed
in the coded model
by an exponentially reliable
fault-tolerant circuit whose size is greater by a polylogarithmic factor:
Any circuit of width $w$ and
height $h$ can be simulated in the coded model by a circuit
of size $h w \log^{O(1)} w$ that will fail with
probability at most $h \cdot \mbox{{\rm exp}}(-w^{1-1/\log \log w})$.

The codes used to communicate with our fault-tolerant
circuits can be
encoded and decoded in $\O{n \log^{2} n}$ sequential time and
are independent of the circuit they are used to communicate with.

We show how techniques from this paper can
be used to self-correct many linear functions in parallel
with arbitrarily small overhead.

You can download this paper as
Postscript,
compressed Postscript,
PDF, or
compressed PDF.