Any process on the Internet may attempt to initiate a session. The participants in a session are not known until the session is initiated. Recall that these characteristics originally caused us to label this distributed system anarchic in Section 1.1. In this section, we propose compositional methodologies that help application developers deal with the anarchy.
A specification is a precise definition of behavior. In our model, common specifications allow application developers to write programs with an understanding of:
We discuss specifications further in Section 4.
Specifications allow application developers to reason about the correctness of their processes, interactions, and sessions. Correctness verification is achieved using preconditions and postconditions, which are assertions on the states of program components before and after the execution of statements that cause a transition from one state to another.
Program components can be composed in a number of ways: sequentially, by choice, and in parallel. Given some number of components, with sequential composition, all of the components are executed in order, one after the other. By contrast, given some number of components, with parallel composition, all of the components are executed in some order that cannot be predicted; this execution might happen concurrently on multiple machines. Also, given a number of alternative components, choice composition chooses one to execute under some specified arbitration policy. The different types of composition may be nested to create larger programs from smaller program components.
In our implementation, a process is a Java program with files (i.e., persistent state), that can interact with other processes by sending and receiving messages through its output and input queues. We reason about a process as a state transition system. The state of a process includes the states of its input and output queues. There are two kinds of state transitions in the process: (a) transitions in which the process takes a step, and (b) transitions in which the communication layer takes a step and modifies message queues or raises message-related exceptions.
A state transition in which the process takes a step is an action by a thread of the process. An action can change the program counter of the thread, change local variables, append a message to an output queue, receive a message from an input queue, or query an input queue. An action in which the communication layer takes a step can append a message to an input queue, remove the message at the head of an output queue, or raise message-related exceptions.
A specification of a process is defined in terms of the externally visible aspects of the process: the messages delivered to and sent by the process, and also the process state. For instance, in a calendar application, a specification for processing a ``make appointment'' message is that the state (e.g., the appointments schedule) has changed appropriately. Even if the entire process state is not externally visible, some predicates on the state (which can be defined as ``thought'' or auxiliary variables) are visible.
Process specifications are given in terms of safety properties (e.g., next, stable, and invariant) and progress properties (defined using leads-to) [CM88, Lam94, CS95]. Processes can only be composed in parallel; we do not deal with sequential or choice composition of processes, though we do support sequential and choice composition of sessions.
A session is defined in terms of preconditions and postconditions on the states of processes that participate in the session. A session that is initiated with the prescribed preconditions on specified processes must terminate establishing the prescribed postconditions on the processes. For instance, a session to establish a time for a BOF meeting has the precondition true and the postcondition that the state (i.e., calendar) of each member attending the meeting is changed to record the appointment for the meeting.
A session may be implemented by parallel composition of processes. Since a session does not have input message queues and output message queues, there is no way to bind one session to another by binding message queues. Since sessions themselves cannot be composed in parallel, sessions can be defined in terms of preconditions and postconditions.
We reason about a session as an atomic operation that can change the states of several processes. Sessions can be composed in any of the ways in which statements in a process are composed. For instance, sessions can be composed using sequential and choice composition. Different threads of a process can execute sessions concurrently. Thus, since our system supports threads (because Java does), parallel composition of sessions is possible by having parallel threads initiate sessions.
Since a session can be encapsulated within a statement in a thread of a process, and since processes can be composed to form sessions, we conjecture that the arbitrary nesting of processes and sessions can be supported, given certain constraints. We are presently investigating what those constraints should be.
These concepts are illustrated in Figure 8. Suppose we have threads running in four Java processes P, Q, R, and S. Using the two-phase initiate-and-commit (described in Section 2.1), processes P, Q, and S can synchronize and enter a session together. The corresponding threads in those processes suspend while the session takes place. Meanwhile, other threads in those processes (and threads in other processes such as R) can execute normally (or enter into other sessions that do not interfere with this session with respect to the corresponding process states). When the session terminates, the threads in processes P, Q, and S resume. Later, if the threads in processes Q and R want to hold a session, they can do so, using the same technique. As indicated previously, we can reason about each of these sessions simply as a single operation in a thread, that potentially modifies the states of the participating processes.
Figure 8: can be composed into the threads
of processes in any of the ways in which other statements can be
composed into processes; for example, in this figure, sessions are
sequentially composed into process threads. When all of the
participating processes commit to a session, the session is initiated,
and the corresponding threads suspend; when the session terminates, the
modified process states are saved in the persistent store, and the
suspended threads resume.
Our infrastructure supports services to sessions [CRS96]. One of the challenges is to provide a service layer that sessions can employ when we do not know in advance what the applications that include the sessions do. Next, we give a brief discussion of some of the service support we have planned for sessions.
Traditional distributed systems (e.g., [Tan95]) are architected in a series of well-defined layers, with each layer providing services to the layer above it and using services of the layer below. For instance, a distributed database application employs services (e.g., checkpointing, deadlock detection, and transaction abortion) of the distributed operating system on which it runs.
A session also needs operating system services. The model of application development for sessions and processes is, however, very different from that in traditional systems. We do not expect each process developer to also develop all the operating systems services (e.g., checkpointing, termination detection, and multiway synchronization) that an application needs. Our challenge is to facilitate the development of a library of operating systems services (which we could call servlets) that process developers could use in their processes, as needed.
We consider the problem of composing services with processes. The challenge is to make these services generic enough so that they can be used for very different kinds of applications, and make the services powerful enough to simplify the design of processes.
We focus our discussion here on inter-process services. Methods for coordination within a process use standard Java classes [SC96]. The questions we address are: How can objects associated with a service be bound into a process in a straightforward way, and, what sorts of services are helpful for process designers?
There are complementary ways of providing services to processes. We can provide a collection of service objects that a designer can include in a process. In addition, we can have a resource manager process, executing on each machine, that provides a rich collection of services to processes executing on that machine. Our focus in this paper is on the former approach; we give a few examples of service objects and show how these services can be used within a process.
Distributed operating systems manage indivisible resources shared by processes [Tan95]; we would like to provide service objects with this functionality, which a process designer can incorporate as needed. A problem is that generic service objects do not have information about the specific resources used in a given application.
A solution is to treat indivisible resources in a generic way. The generic service deals with managing indivisible resources, sharing them among processes in a way that avoids deadlock (if processes release all resources before next requesting resources), and detecting deadlock if it does occur (if a process holds on to some resources and then requests more). The designer of a process can separate these service functions from other concerns, and using a library of common service functions can simplify process design.
We treat each resource as a token. Tokens are objects that are neither created nor destroyed; once they are initially set up, a fixed number of them are communicated and shared among the processes of a system. Tokens have colors; tokens of one color cannot be transmuted into tokens of another color. A token represents an indivisible resource, and a token color is a resource type. A file, for instance, is represented by a token and each file-token has a unique color.
A network of token-manager objects manages tokens shared by all the processes in a session. A token is either held by a process or by the network of token managers. A token manager associated with a process has a data member that maintains the number of tokens of each color that the process holds.
A process can carry out the following operations on its token manager. A token request suspends until the tokens requested (i.e., a specified number for each color) are available, and then these tokens are removed from the token manager collection and given to the process. If the token managers detect a deadlock, an exception is raised. A specific positive number of tokens of a given color can be requested or the request can ask for all tokens of a given color. A token release releases the specified tokens from the process and returns them to the token managers. If the tokens specified are not actually held, an exception is raised. A process can also probe the total number of tokens of all colors in the system.
The process that constructs the network of token managers ensures that the initial number of tokens is set appropriately. Tokens are defined by the invariant that the total number of tokens of each color in the system remains unchanged.
Tokens can be used in many ways. For example, suppose we want at most one process to modify an object at any point in the computation. We associate a single token with that object, and only the process holding the token can modify the object. A process that needs to access the object requests the appropriate token from the token management servlet.
As another example, tokens can be used to implement a simple read/write control protocol that allows multiple concurrent reads of an object but at most one concurrent write (and no reads concurrent with a write) of the object. The object is associated with a token color. A process writes the object only if it has all tokens associated with the object, and a process reads the object only if it has at least one token associated with the object.
Access to a global clock simplifies the design of many distributed algorithms. For instance, a global state can be easily checkpointed: all processes checkpoint their local states at some predetermined time T, and the states of the channels are the sequences of messages sent on the channels before T and received after T.
Another use of global clocks is in distributed conflict resolution. Each request for a set of resources is timestamped with the time at which the request is made. Conflicts between two or more requests for a common indivisible resource are resolved in favor of the request with the earlier timestamp. Ties are broken in favor of the process with the lower id. If processes release all resources before requesting resources, and release all resouces within finite time, then all requests will be satisfied.
The problem is that processes do not share a global clock. Though local clocks are quite accurate they are not perfectly synchronized. We can, however, use unsynchronized clocks for checkpointing provided they satisfy the global snapshot criterion [CL85]. The global snapshot criterion is satisfied, provided every message that is sent when the sender's clock is T, is received when the receiver's clock exceeds T. A simple algorithm [Lam78] to establish this criterion is: every message is timestamped with the sender's clock; upon receiving a message, if the receiver's clock value does not exceed the timestamp of the message, then the receiver's clock is set to a value greater than the timestamp.
Our message-passing layer is designed to provide local clocks that satisfy the global snapshot criterion. Our local clocks can be used for checkpointing and conflict resolution just as though they were global clocks. Process designers can separate the generic concerns of clock synchronization from other concerns specific to their application.
Other servlets for sessions we are investigating include a library to enable the creation and maintenance of distributed data structures (e.g., for diffusing computations), and the infrastructure to permit constrained forms of process stack layering.
Also, some servlets, rather than providing services to sessions, provide services to processes or threads. For example, Java provides constructs for synchronizing threads within a process by using something like a monitor [GYtJT96]. We have implemented and verified other kinds of synchronization constructs -- barriers, single-assignment variables, channels, and semaphores -- for threads within a process [SC96]. We are extending these designs to allow synchronizations between threads in different processes in different address spaces.