Abstract
Self-stabilization is an elegant approach for designing a class of fault-tolerant distributed protocols. A self-stabilizing protocol is guaranteed to eventually converge to a legitimate state after a transient fault. However, even a minor transient fault can cause vast disruption in the system before legitimacy is reached. This paper introduces the notion of fault-containment to address this particular weakness of self-stabilizing systems. Informally, a fault-containing self-stabilizing protocol, in addition to providing self- stabilization, contains the effects of faults. This ensures that disruption during recovery from faults, is proportional to the extent of the faults. The paper begins with a formal framework for specifying and evaluating fault-containing self-stabilizing protocols. The main result of the paper is a transformer that converts any non-reactive self-stabilizing protocol into an equivalent fault-containing self-stabilizing protocol that can repair any single fault in the system in O(1) time. For a large class of input protocols, the corresponding output protocols produced by the transformer have O(1) space overhead. The small time and space overhead make the fault-containing self-stabilizing protocol a practical alternative to the original self-stabilizing protocol. The transformer is based on a novel stabilizing timer paradigm that significantly simplifies the task of fault-containment.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Afek, Y., Dolev, S.: Local stabilizer. In: Proceedings of the 5th Israeli Conference on Theory of Computing and Systems, pp. 74–84 (1997)
Anagnostou, E., El-Yaniv, R., Hadzilacos, V.: Memory adaptive self-stabilizing protocols. In: WDAG92 Distributed Algorithms 6th International Workshop Proceedings. LNCS vol. 647, pp. 203–220. Springer, Heidelberg (1992)
Arora A. and Gouda M.G. (1994). Distributed reset. IEEE Trans. Comput. 43: 1026–1038
Arora, A., Kulkarni, S.S.: Designing masking fault-tolerance via nonmasking fault-tolerance. In: SRDS95 Proceedings of the 14th Symposium on Reliable Distributed Systems, pp. 174–185 (1995)
Arora A. and Kulkarni S.S. (1997). Component based design of multitolerance. IEEE Trans. Softw. Eng. 43: 1026–1038
Awerbuch B. (1985). Complexity of network synchronization. J. ACM 32: 804–823
Awerbuch, B., Kutten, S., Mansour, Y., Patt-Shamir, B., Varghese, G.: Time optimal self-stabilizing synchronization. In: Proceedings of the 25th Annual ACM Symposium on Theory of Computing, pp. 652–661 (1993)
Awerbuch, B., Varghese, G.: Distributed program checking: a paradigm for building self-stabilizing distributed protocols. In: Proceedings of the 31st Annual IEEE Symposium on Foundations of Computer Science, pp. 258–267 (1991)
Beauquier, J., Genolini, C., Kutten, S.: Optimal reactive k-stabilization – the case of mutual exclusion. In: Proceedings of the 18th Annual ACM Symposium on Principles of Distributed Computing, pp. 209–218 (1999)
Bruell S.C., Ghosh S., Karaata M.H. and Pemmaraju S.V. (1999). Self-stabilizing algorithms for finding centers and medians of trees. SIAM J. Comput. 29(2): 600–614
Buskens, R.W., Bianchini, R.P.: Self-stabilizing mutual exclusion in the presence of faulty nodes. In: FTCS95 Proceedings of the 25th International Symposium on Fault-Tolerant Computing Systems, pp. 144–153 (1995)
Chen N.S., Yu H.P. and Huang S.T. (1991). A self-stabilizing algorithm for constructing spanning trees. Informat. Process. Lett. 39: 147–151
Couvreur, M., Francez, N., Gouda, M.G.: Asynchronous unison. In: Proceedings of the 12th International Conference on Distributed Computing Systems, pp. 486–493 (1992)
Dijkstra E.W. (1974). Self stabilizing systems in spite of distributed control. Communi. ACM 17: 643–644
Dolev, S.: Optimal time self-stabilization in dynamic systems. In: WDAG93 Distributed Algorithms 7th International Workshop Proceedings. LNCS, vol. 725, pp. 160–173 Springer, Heidelberg (1993)
Dolev S. (2000). Self-Stabilization. MIT Press, Cambridge
Dolev, S., Herman, T.: Superstabilizing protocols for dynamic distributed systems. In: Proceedings of the Second Workshop on Self-Stabilizing Systems, pp. 3.1–3.15 (1995)
Dolev, S., Israeli, A., Moran, S.: Uniform dynamic self-stabilizing leader election. In: WDAG91 Distributed Algorithms 5th International Workshop Proceedings. LNCS, vol. 579, pp. 167–180 Springer, Heidelberg (1991)
Ghosh S. and Gupta A. (1996). An exercise in fault-containment: Self-stabilizing leader election. Informat. Process. Lett. 5(59): 281–288
Ghosh, S., Gupta, A., Herman, T., Pemmaraju, S.V.: Fault-containing self-stabilizing distributed protocols. Technical Report TR-00-01, Department of Computer Science, The University of Iowa, Iowa City (2000)
Ghosh S., Gupta A. and Pemmaraju S.V. (1996). A fault-containing self-stabilizing algorithm for spanning trees. J. Comput. Informat. 2: 322–338
Ghosh, S., Gupta, A., Pemmaraju, S.V.: Fault-containing network protocols. In: Proceedings of 12th Annual ACM Symposium on Applied Computing (1997)
Ghosh S., Gupta A. and Pemmaraju S.V. (1997). A self-stabilizing algorithm for the maximum flow problem. Distributed Comput. 10: 167–180
Ghosh, S., He, X.: Scalable self-stabilization. In: Proceedings of the 4th Workshop on Self-Stabilization, pp. 18–24 (1999)
Gopal, A.S., Perry, K.J.: Unifying self-stabilization and fault-tolerance. In: Proceedings of the 12th Annual ACM Symposium on Principles of Distributed Computing, pp. 195–206 (1993)
Gouda, M.G., Schneider, M.: Maximum flow routing. In: Proceedings of the Second Workshop on Self-Stabilizing Systems, pp. 2.1–2.13 (1995)
Herman, T.: Superstabilizing mutual exclusion. In: Proceedings of 1st International Conference on Parallel and Distributed Processing: Techniques and Applications (1995)
Kutten, S., Patt-Shamir, B.: Asynchronous time-adaptive self-stabilization. In: Brief Announcement, Proceedings of the 17th Annual ACM Symposium on Principles of Distributed Computing (1998)
Kutten S. and Patt-Shamir B. (1999). Stabilizing time-adaptive protocols. Theor. Comput. Sci. 220: 93–111
Kutten, S., Peleg, D.: Fault-local distributed mending. In: Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing, pp. 20–27 (1995)
Manna Z. and Pnueli A. (1992). The Temporal Logic of Reactive and Concurrent Systems. Springer, Heidelberg
Nelson, V.: Fault-tolerant computing: fundamental concepts. IEEE Comput., pp. 19–25 (1990)
Parlati, G., Yung, M.: Non-exploratory self-stabilization for constant-space symmetry-breaking. In: Algorithms ESA 94. LNCS, vol. 855, pp. 183–201 Springer, Heidelberg (1994)
Schneider M. (1993). Self-stabilization. ACM Comput. Surveys 25: 45–67
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ghosh, S., Gupta, A., Herman, T. et al. Fault-containing self-stabilizing distributed protocols. Distrib. Comput. 20, 53–73 (2007). https://doi.org/10.1007/s00446-007-0032-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00446-007-0032-2