Abstract
We define a simple variant of the Byzantine agreement (BA) problem, called the Failure Discovery (FD) problem, that roughly speaking, amounts to reaching BA provided that no failures are discovered. We show how a protocol for FD can be extended to one for BA, with no message overhead in the failure-free runs. We also show that, for so-calledbenign failures, if the FD protocol satisfies an additional property, the message-preserving extension to a BA protocol can be accomplished with minimal time overhead in the failure-free runs. Our results show that FD is a useful building block for BA; indeed, it has been used in this way in a companion paper (Hadzilacos and Halpern, 1993).
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Amdur, E. S., S. M. Weber, and V. Hadzilacos. On the Message Complexity of Binary Byzantine Agreement Under Crash Failures.Distributed Computing,5:175–186, 1992.
Attiya, H., N. A. Lynch, and N. Shavit. Are Wait-Free Algorithms Fast? InProc. 31st Symp. on Foundations of Computer Science, pp. 55–64, October 1990.
Bar-Noy, A., D. Dolev, C. Dwork, and H. R. Strong. Shifting Gears: Changing Algorithms on the Fly To Expedite Byzantine Agreement. InProc. 6th ACM Symp. on Principles of Distributed Computing, pp. 42–51, August 1987.
Bernstein, P. A., V. Hadzilacos, and N. Goodman.Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading, MA, 1987.
Chandra, T. D., and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. InProc. 10th ACM Symp. on Principles of Distributed Computing, pp. 325–340, August 1991.
Dolev, D. The Byzantine Generals Strike Again.Journal of Algorithms,3(1): 14–30, January 1982.
Dwork, C. Strong Verifiable Secret Sharing. InProc. 4th Internat. Workshop on Distributed Algorithms, pp. 213–227. Lecture Notes in Computer Science, Vol. 486. Springer-Verlag, Berlin, 1990.
Eager, D. L., and K. C. Sevcik. Achieving Robustness in Distributed Database Systems.A CM Transactions on Database Systems,8(3): 354–381, September 1983.
Fischer, M. J. The Consensus Problem in Unreliable Distributed Systems. Research Report RR-273, Department of Computer Science, Yale University, June 1983.
Hadzilacos, V. Issues of Fault-Tolerance in Concurrent Computations. Ph.D. dissertation, Aiken Computation Laboratory, Harvard University, June 1984.
Hadzilacos, V. On the Relationship Between the Atomic Commitment and Consensus Problems. Workshop on Fault Tolerant Distributed Computing, March 17–19, 1986, Pacific Grove, CA. (Proceedings published as Lecture Notes in Computer Science, Vol. 448 (B. Simons and A. Spector, eds.), Springer-Verlag, Berlin 1990.)
Hadzilacos, V., and J. Y. Halpern. Message-Optimal Protocols for Byzantine Agreement.Mathematical Systems Theory, this issue, pp. 41–102, 1993.
Halpern, J. Y., and Y. Moses. Knowledge and Common Knowledge in a Distributed Environment.Journal of the ACM,37(3):549–587, 1990. (Preliminary version inProc. 3rd ACM Symp. on Principles of Distributed Computing, pp. 50–61, August 1984.)
Lamport, L. The Weak Generals Problem.Journal of the ACM,30(3):668–676, July 1983.
Lamport, L., and M. J. Fischer. Byzantine Generals and Transaction Commit Protocols. Technical Report Op. 62, SRI International, April 1982.
Lamport, L., R. Shostak, and M. Pease. The Byzantine Generals Problem.ACM Transactions on Programming Languages and Systems,4(3):382–401, July 1982.
Moses, Y., and O. Waarts. Coordinate Traversal:(t + l)-Round Byzantine Agreement in Polynomial Time. InProc. 29th Symp. on Foundations of Computer Science, pp. 246–255, October 1988.
Neiger, G., and S. Toueg. Automatically Increasing the Fault-Tolerance of Distributed Algorithms.Journal of Algorithms,11(3): 374–419, September 1990.
Pease, M., R. Shostak, and L. Lamport. Reaching Agreement in the Presence of Faults.Journal of the ACM,27(2):228–234, April 1980.
Ricciardi, A. M., and K. P. Birman. Using Process Groups To Implement Failure Detection in Asynchronous Environments. InProc. 10th ACM Symp. on Principles of Distributed Computing, pp. 341–351, August 1991.
Skeen, M. D. Crash Recovery in a Distributed Database System. Ph.D. dissertation, Department of Electrical Engineering and Computer Science, University of California at Berkeley, 1982.
Srikanth, T. K., and S. Toueg. Simulating Authenticated Broadcasts To Derive Simple Fault-Tolerant Algorithms.Distributed Computing,2:80–94, 1987.
Zwaenepoel, W. Protocols for Large Data Transfers over Local Area Networks. InProc. 9th Data Communications Symposium, pp. 22–32, Sept. 1985.
Author information
Authors and Affiliations
Additional information
The work of V. Hadzilacos was supported, in part, by a grant from the Natural Sciences and Engineering Research Council of Canada.
Rights and permissions
About this article
Cite this article
Hadzilacos, V., Halpern, J.Y. The Failure Discovery problem. Math. Systems Theory 26, 103–129 (1993). https://doi.org/10.1007/BF01187075
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/BF01187075