Abstract
Service-oriented systems are widely employed in e-business, e-government, finance, management systems, and other domains. Service fault tolerance is one of the most important techniques for building highly reliable service-oriented systems. In this paper, we provide an overview of service fault tolerance techniques in three sections: fault tolerance strategy design, fault tolerance strategy selection, and Byzantine fault tolerance. In the first section, we introduce the design of static and dynamic fault tolerance strategies, as well as the major problems that arise when designing them. In the second section, building on these strategies, we identify significant components of a complex service-oriented system and investigate algorithms for optimal fault tolerance strategy selection. Finally, in the third section, we discuss a special type of service fault tolerance technique, namely Byzantine fault tolerance.
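Abstract (translated from Chinese)
Service-oriented systems are widely used in e-commerce, e-government, finance, management systems, and other fields. Service fault tolerance is one of the most important techniques for building highly reliable service-oriented systems. This paper gives an overview of service fault tolerance techniques in three parts: fault tolerance strategy design, fault tolerance strategy selection, and Byzantine fault tolerance. The first part focuses on the design of static and dynamic fault tolerance strategies and on the main problems to be solved when designing service fault tolerance strategies. Given the wide variety of service fault tolerance strategies, the second part covers methods for quickly locating the key components of a complex service-oriented system, together with algorithms for optimal fault tolerance strategy selection. Finally, the third part discusses a special service fault tolerance technique, Byzantine fault tolerance.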
Cite this article
Zheng, Z., Lyu, M.R.T. & Wang, H. Service fault tolerance for highly reliable service-oriented systems: an overview. Sci. China Inf. Sci. 58, 1–12 (2015). https://doi.org/10.1007/s11432-015-5313-y
DOI: https://doi.org/10.1007/s11432-015-5313-y