Towards Abstractions for Distributed Systems

Abstract. For historical, sociological and technical reasons, lambda-calculi have been the dominant theoretical paradigm in the study of functional computation. Similarly, but to a lesser degree, pi-calculi dominate advanced mathematical accounts of concurrency. Alas, and despite its ever increasing ubiquity, an equally convincing formal foundation for distributed computing has not been forthcoming. This thesis seeks to contribute towards ameliorating that omission. To this end, guided by the assumption that distributed computing is concurrent computing with partial failures of various kinds, we extend the asynchronous pi-calculus with

a notion of sites,
the possibility of site failure,
a persistence mechanism to deal with site failures,
the distinction between inter-site and intra-site communication,
the possibility of message loss in inter-site communication and
a timer construct, as is often used in distributed algorithms to deal with various failure scenarios.

The basic theory of two of the resulting augmented pi-calculi is explored in considerable detail: the asynchronous pi-calculus with timers and the asynchronous pi-calculus with timers, sites and message failure. The emphasis of this development is on soundly approximating reduction congruence, the canonical equivalence for asynchronous pi-calculi. In the case of the asynchronous pi-calculus with timers we manage to obtain a characterisation of reduction congruence as a labelled bisimilarity. Our results appear to be robust under variations of the underlying calculi. In addition, as a test to evaluate the usefulness and the integration of the aforementioned extensions, we cleanly and incrementally represent the Two Phase Commit Protocol -- an important distributed algorithm, used to ensure consistency of distributed databases in the face of various kinds of (non-byzantine) failures -- in the asynchronous pi-calculus, the asynchronous pi-calculus with timers, sites and message failure and in the asynchronous pi-calculus with timers, sites, site failure and message failure. We establish the correctness of our representations by equational reasoning.

Update (5 May 2005). An anonymous reviewer found a rather clever counterexample to my Theorem 25 (the labelled characterisation of the maximal sound theory). In other words: Theorem 25 is wrong! That does not influence the rest of the development, as Theorem 25 is not really used anywhere. In particular, the labelled bisimilarities are still sound, but I'm currently investigating the source of the problem and will publish the forensics. It seems that the problem can be overcome. Moreover, the counterexample is such that it is irrelevant to the verification of realistic programs. The pdf file below has not yet been modified. I will update the thesis when I have some more time. For now, my Reply to referees below explains the problem.