Abstract
In this paper we study a fault-tolerant model for Grid environments based on task replication. The basic idea is to produce and submit to the Grid multiple replicas of a given task, assuming that the failure probability of each replica is known a priori. We introduce a scheme for calculating the number of replicas when the failure probabilities of the task replicas differ, and we propose an efficient resource management scheme, based on a fair-share technique, that handles the task replicas so as to maintain fault tolerance in the Grid in a fair way. The study concludes with simulation results that validate the proposed scheme.
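To make the replica-count idea concrete, the following is a minimal sketch and not the scheme proposed in the paper. It assumes that replica failures are independent, that replicas are assigned in the order given, and that the goal is for at least one replica to succeed with a chosen probability. The function name `replicas_needed` and the parameter `target_reliability` are hypothetical, introduced only for illustration.

```python
from typing import Sequence


def replicas_needed(failure_probs: Sequence[float], target_reliability: float) -> int:
    """Return the smallest k such that at least one of the first k replicas
    succeeds with probability >= target_reliability.

    Assumption (not from the paper): replica failures are independent, so the
    probability that the first k replicas all fail is the product of their
    individual failure probabilities.
    """
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must lie strictly between 0 and 1")

    all_fail = 1.0  # probability that every replica considered so far fails
    for k, p in enumerate(failure_probs, start=1):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each failure probability must lie in [0, 1]")
        all_fail *= p
        if 1.0 - all_fail >= target_reliability:
            return k

    raise ValueError("target reliability is not reachable with the given replicas")


if __name__ == "__main__":
    # Hypothetical failure probabilities for three candidate replicas.
    print(replicas_needed([0.2, 0.3, 0.1], target_reliability=0.99))
```

Under these assumptions, sorting the candidate resources by increasing failure probability before calling the function gives the fewest replicas for a given target, since the most reliable resources are used first.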
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Litke, A., Tserpes, K., Dolkas, K., Varvarigou, T. (2005). A Task Replication and Fair Resource Management Scheme for Fault Tolerant Grids. In: Sloot, P.M.A., Hoekstra, A.G., Priol, T., Reinefeld, A., Bubak, M. (eds) Advances in Grid Computing - EGC 2005. EGC 2005. Lecture Notes in Computer Science, vol 3470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508380_104
DOI: https://doi.org/10.1007/11508380_104
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26918-2
Online ISBN: 978-3-540-32036-4
eBook Packages: Computer Science (R0)