Abstract
This work applies the theory of knowledge in distributed systems to the design of efficient fault-tolerant protocols. We define a large class of problems requiring coordinated, simultaneous action in synchronous systems, and give a method of transforming specifications of such problems into protocols that areoptimal in all runs: these protocols are guaranteed to perform the simultaneous actions as soon as any other protocol could possibly perform them, given the input to the system and faulty processor behavior. This transformation is performed in two steps. In the first step we extract, directly from the problem specification, a high-level protocol programmed using explicit tests for common knowledge. In the second step we carefully analyze when facts become common knowledge, thereby providing a method of efficiently implementing these protocols in many variants of the omissions failure model. In the generalized omissions model, however, our analysis shows that testing for common knowledge is NP-hard. Given the close correspondence between common knowledge and simultaneous actions, we are able to show that no optimal protocol for any such problem can be computationally efficient in this model. The analysis in this paper exposes many subtle differences between the failure models, including the precise point at which this gap in complexity occurs.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
J. Burns and N. A. Lynch, The Byzantine firing squad problem, MIT Technical Report MIT/LCS/TM-275, 1985.
K. M. Chandy and J. Misra, How processes learn,Distrib. Comput.,1(1), 1986, 40–52.
B. Coan, A communication-efficient canonical form for fault-tolerant distributed protocols,Proceedings of the Fifth PODC, 1985, pp. 63–72.
B. Coan, D. Dolev, C. Dwork, and L. Stockmeyer, The distributed firing squad problem,Proceedings of the Seventeenth STOC, 1985, pp. 335–345.
D. Dolev, R. Reischuk, and H. R. Strong, Eventual is earlier than immediate,Proceedings of the 23th FOCS, 1982, pp. 196–203.
C. Dwork and Y. Moses, Knowledge and common knowledge in a Byzantine environment: The case of crash failures,Proceedings of the Conference on Theoretical Aspects of Reasoning About Knowledge, Monterey, 1986, J. Y. Halpern ed., Morgan Kaufmann, Los Altos, CA, pp. 149–170. Slightly revised as MITTechnical Report MIT/LCS/TM-300, 1986. To appear inInformation and Computation.
M. J. Fisher, The consensus problem in unreliable distributed systems (a brief survey), Yale University Technical Report YALEU/DCS/RR-273, 1983.
M. J. Fischer and N. Immerman, Foundations of knowledge for distributed systems,Proceedings of the Conference on Theoretical Aspects of Reasoning About Knowledge, Monterey, 1986, J. Y. Halpern ed., Morgan Kaufmann, Los Altos, CA, pp. 171–185.
M. J. Fischer and N. A. Lynch, A lower bound for the time to assure interactive consistency,Infor. Process. Lett.,14(4), 1982, 183–186.
M. R. Garey and D. S. Johnson,Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, San Francisco, 1979.
V. Hadzilacos, A lower bound for Byzantine agreement with fail-stop processors, Harvard University Technical Report TR-21-83.
J. Y. Halpern and R. Fagin, A formal model of knowledge, action, and communication in distributed systems,Proceedings of the Fourth PODC, 1985, pp. 224–236.
J. Y. Halpern and Y. Moses, Knowledge and common knowledge in a distributed environment. Version of August 1987 is available as IBM Research Report RJ 4421. Early versions appeared inProceedings of the Third PODC, 1984, pp. 50–61; and as IBM Research Report RJ 4421, 1984 and 1986.
J. Y. Halpern and Y. Moses, A guide to the modal logic of knowledge and belief,Proceedings of the Ninth IJCAI, 1985, pp. 480–490.
J. Hintikka,Knowledge and Belief, Cornell University Press, Ithaca, NY, 1962.
J. E. Hopcroft and J. D. Ullman,Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, Massachusetts, 1979.
L. Lamport and M. J. Fischer, Byzantine generals and transaction commit protocols, SRI Technical Report Op. 62, 1982.
R. Michel, Attaining common knowledge in synchronous distributed networks, unpublished manuscript, 1986.
C. Mohan, H. R. Strong, and S. Finkelstein, Methods for distributed transaction commit and recovery using Byzantine agreement within clusters of processors,Proceedings of the Second PODC, 1983, pp. 89–103.
Y. Moses, Knowledge in a distributed environment, Ph.D. Thesis, Stanford University Technical Report STAN-CS-1120, 1986.
R. Parikh and R. Ramanujam, Distributed processes and the logic of knowledge (preliminary report),Proceedings of the Workshop on Logics of Programs, 1985, pp. 256–268.
M. Pease, R. Shostak, and L. Lamport, Reaching agreement in the presence of faults,J. Assoc. Comput. Mach.,27(2), 1980, 228–234.
K. Perry and S. Toueg, Distributed agreement in the presence of processor and communication faults,IEEE Trans. Software Engrg,12(3), 1986, 477–482.
M. O. Rabin, Efficient solutions to the distributed firing squad problem, private communication.
Author information
Authors and Affiliations
Additional information
Communicated by Jeffrey Scott Vitter.
This research was supported by the Office of Naval Research under contract N00014-85-K-0168, by the Office of Army Research under contract DAAG29-84-K-0058, by the National Science Foundation under Grant DCR-8302391, and by the Defense Advanced Research Projects agency (DARPA) under contract N00014-83-K-0125, and was performed while both authors were at MIT. A preliminary version of this work appeared in theProceedings of the 27th Annual IEEE Symposium on Foundations of Computer Science, Toronto, 1986.
This author was primarily supported by an IBM postdoctoral fellowship.
Rights and permissions
About this article
Cite this article
Moses, Y., Tuttle, M.R. Programming simultaneous actions using common knowledge. Algorithmica 3, 121–169 (1988). https://doi.org/10.1007/BF01762112
Received:
Revised:
Issue Date:
DOI: https://doi.org/10.1007/BF01762112