Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

For more up-to-date notes see http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf.

Distributed computing systems are characterized by their structure: a typical distributed computing system will consist of some large number of interacting devices that each run their own programs but that are affected by receiving messages or observing shared-memory updates from other devices. Examples of distributed computing systems range from simple systems in which a single client talks to a single server to huge amorphous networks like the Internet as a whole.

As distributed systems get larger, it becomes harder to predict or even understand their behavior. Part of the reason is that we as programmers have not yet developed the kind of tools for managing complexity (like subroutines or objects with narrow interfaces, or even simple structured programming mechanisms like loops or if/then statements) that are standard in sequential programming. Part of the reason is that large distributed systems bring with them large amounts of inherent nondeterminism: unpredictable events like delays in message arrivals, the sudden failure of components, or in extreme cases the nefarious actions of faulty or malicious machines opposed to the goals of the system as a whole. Because of the unpredictability and scale of large distributed systems, it can often be difficult to test or simulate them adequately. Thus there is a need for theoretical tools that allow us to prove properties of these systems that will let us use them with confidence.

The first task of any theory of distributed systems is modeling: defining a mathematical structure that abstracts out all relevant properties of a large distributed system. There are many foundational models for distributed systems, but for CS425 we will follow Attiya and Welch and use simple automaton-based models. Here we think of the system as a whole as passing from one global state to another in response to events, e.g. local computation at some processor, an operation on shared memory, or the delivery of a message by the network. The details of the model will depend on what kind of system we are trying to represent:

We'll see many of these at some point in CS425, and examine which of them can simulate each other under various conditions.
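To make the automaton view above concrete, here is a minimal sketch in Python of a toy message-passing system. It is an illustration only, not the formal model from Attiya and Welch. The names `initial_configuration`, `enabled_events`, `step`, and `run` are hypothetical, as is the toy protocol itself (each process broadcasts its value once and keeps the maximum it has seen). Shared-memory events are left out. A seeded random scheduler stands in for the nondeterminism of the network.

```python
import random


def initial_configuration(values):
    """Build a global state: per-process local state plus messages in transit."""
    return {
        "local": {p: {"value": v, "sent": False} for p, v in enumerate(values)},
        "in_transit": [],  # list of (receiver, payload) pairs
    }


def enabled_events(config):
    """List the events that could occur next in this global state."""
    # A process may take its (single) computation step if it has not yet sent.
    events = [("compute", p) for p, s in config["local"].items() if not s["sent"]]
    # Any message currently in transit may be delivered, in any order.
    events += [("deliver", i) for i in range(len(config["in_transit"]))]
    return events


def step(config, event):
    """Apply one event and return the resulting global state (input is not mutated)."""
    kind, arg = event
    local = {p: dict(s) for p, s in config["local"].items()}
    in_transit = list(config["in_transit"])
    if kind == "compute":
        # Local computation at process `arg`: broadcast its current value once.
        local[arg]["sent"] = True
        for q in local:
            if q != arg:
                in_transit.append((q, local[arg]["value"]))
    elif kind == "deliver":
        # The network delivers one message; the receiver keeps the larger value.
        receiver, payload = in_transit.pop(arg)
        local[receiver]["value"] = max(local[receiver]["value"], payload)
    else:
        raise ValueError(f"unknown event kind: {kind!r}")
    return {"local": local, "in_transit": in_transit}


def run(values, seed=0):
    """Run until no event is enabled, letting a seeded scheduler pick each event."""
    rng = random.Random(seed)
    config = initial_configuration(values)
    trace = []
    while True:
        events = enabled_events(config)
        if not events:
            break
        event = rng.choice(events)  # stand-in for nondeterministic scheduling
        config = step(config, event)
        trace.append(event)
    return config, trace
```

Calling `run([3, 7, 5], seed=s)` with different seeds produces different event orders, but in this toy protocol every process ends up holding 7: the process that starts with the maximum never changes its value and sends it to everyone exactly once. This is the kind of schedule-independent property that one would want to prove rather than merely test.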

Properties we might want to prove about a model include:




2014-06-17 11:58