Abstract
To make systems in which tasks may fail fault-tolerant, traditional methods deploy multiple servers as replicas to perform the same task. Further, in real-time systems, computations must meet strict time constraints: a delayed output is unacceptable, even if correct. This paper considers the effectiveness of sending task replicas to multiple servers simultaneously, and using the result from whichever one responds first, as a means of reducing response time and improving fault tolerance. Once a request completes execution successfully at one server, it immediately cancels (kills) its replicas that remain at other servers. We assume a Markovian system and use the generating function method to determine the Laplace transform of the response time probability distribution, jointly with the probability that not all replicas fail, in the case of two replicas. When the failure rate of each task is greater than the service rate of the server, we make the approximation that the queues are independent, each with a geometric queue length probability distribution at equilibrium. We compare our approximation with simulation results, as well as with the exact solution in a truncated state space, and find that for failure rates in that region the approximation is generally good. At lower failure rates, the method of spectral expansion provides an excellent approximation in a truncated, multi-mode, two-dimensional Markov process.
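As a rough illustration of the replication idea only, the following Python sketch simulates a single task sent to several idle servers and takes the first successful result. It is not the model analysed in the paper: it ignores queueing and cancellation effects entirely, and the names `mu` (service rate), `delta` (failure rate), `simulate_replicated_task` and `estimate` are assumptions introduced here. Under these toy assumptions each replica independently completes with probability mu / (mu + delta), so the probability that not all n replicas fail is 1 - (delta / (mu + delta))^n.

```python
import random


def simulate_replicated_task(mu, delta, n_replicas, rng):
    """Simulate one task replicated on n_replicas idle servers (toy model).

    Each replica has an exponential service time (rate mu) and an
    exponential time to failure (rate delta); it succeeds only if service
    finishes first. Returns (succeeded, response_time), where response_time
    is the earliest successful completion, or None if every replica failed.
    """
    best = None
    for _ in range(n_replicas):
        service = rng.expovariate(mu)
        failure = rng.expovariate(delta)
        if service < failure:  # this replica completes before it fails
            if best is None or service < best:
                best = service
    return best is not None, best


def estimate(mu, delta, n_replicas, n_tasks=50_000, seed=1):
    """Estimate P(not all replicas fail) and the mean response time of
    successful tasks by Monte Carlo simulation."""
    rng = random.Random(seed)
    successes = 0
    total_time = 0.0
    for _ in range(n_tasks):
        ok, t = simulate_replicated_task(mu, delta, n_replicas, rng)
        if ok:
            successes += 1
            total_time += t
    p_success = successes / n_tasks
    mean_time = total_time / successes if successes else float("nan")
    return p_success, mean_time


if __name__ == "__main__":
    for n in (1, 2):
        p, t = estimate(mu=1.0, delta=0.5, n_replicas=n)
        print(f"replicas={n}: P(success) ~ {p:.3f}, mean response ~ {t:.3f}")
```

In this simplified setting a second replica both raises the success probability and shortens the mean response time of successful tasks. With queues, replicas also add load to every server, which is the trade-off the paper's analysis addresses and this sketch does not capture.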
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Harrison, P.G., Qiu, Z. (2013). Performance Enhancement by Means of Task Replication. In: Balsamo, M.S., Knottenbelt, W.J., Marin, A. (eds) Computer Performance Engineering. EPEW 2013. Lecture Notes in Computer Science, vol 8168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40725-3_15
DOI: https://doi.org/10.1007/978-3-642-40725-3_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40724-6
Online ISBN: 978-3-642-40725-3
eBook Packages: Computer Science, Computer Science (R0)