A Task Replication and Fair Resource Management Scheme for Fault Tolerant Grids

Litke, Antonios; Tserpes, Konstantinos; Dolkas, Konstantinos; Varvarigou, Theodora

doi:10.1007/11508380_104

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3470)

Included in the following conference series:

European Grid Conference

Abstract

In this paper we study a fault-tolerant model for Grid environments based on the task replication concept. The basic idea is to produce multiple replicas of a given task and submit them to the Grid, assuming that the failure probability of each replica is known a priori. We introduce a scheme for calculating the number of replicas when the task replicas have diverse failure probabilities, and we propose an efficient resource management scheme, based on the fair share technique, which handles the task replicas so that fault tolerance is maintained fairly across the Grid. The study concludes with simulation results that validate the proposed scheme.
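The abstract does not spell out how the number of replicas is computed. As a minimal sketch, assuming replica failures are independent and each candidate replica i fails with a known probability f_i, the task fails only if every replica fails, so n replicas suffice when the product of their f_i is at most 1 − R for a target reliability R. The helper `replicas_needed` below is hypothetical and is not the authors' scheme; it takes the most reliable candidates first, which minimises n because the product of the n smallest failure probabilities is the smallest achievable with n replicas.

```python
from typing import Sequence


def replicas_needed(failure_probs: Sequence[float], target_reliability: float) -> int:
    """Hypothetical helper: smallest number of replicas whose joint success
    probability reaches target_reliability.

    Assumes replica failures are independent, so the task fails only if all
    chosen replicas fail (probability = product of their failure probabilities).
    failure_probs[i] is the a-priori failure probability of candidate replica i.
    """
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must lie in (0, 1)")
    allowed_failure = 1.0 - target_reliability
    p_all_fail = 1.0
    # Most reliable candidates first: this minimises the replica count.
    for n, f in enumerate(sorted(failure_probs), start=1):
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"invalid failure probability: {f}")
        p_all_fail *= f
        if p_all_fail <= allowed_failure:
            return n
    raise ValueError("target reliability cannot be reached with these candidates")
```

For example, with candidate failure probabilities 0.2, 0.1 and 0.3, a target reliability of 0.95 needs two replicas (0.1 × 0.2 = 0.02 ≤ 0.05), while 0.99 needs all three. The sketch ignores resource cost and load; the paper's fair-share resource management is a separate concern not modelled here.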

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, National Technical University of Athens, 9 Heroon Polytechniou Str., 15773 Athens, Greece
Antonios Litke, Konstantinos Tserpes, Konstantinos Dolkas & Theodora Varvarigou

Editor information

Editors and Affiliations

Faculty of Sciences, Section of Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Peter M. A. Sloot
Section Computational Science, University of Amsterdam, The Netherlands
Alfons G. Hoekstra
INRIA Rennes - Bretagne Atlantique, Campus de Beaulieu, 35042 Rennes Cedex, France
Thierry Priol
Zuse Institute Berlin
Alexander Reinefeld
Institute of Computer Science, AGH, Poland
Marian Bubak

About this paper

Cite this paper

Litke, A., Tserpes, K., Dolkas, K., Varvarigou, T. (2005). A Task Replication and Fair Resource Management Scheme for Fault Tolerant Grids. In: Sloot, P.M.A., Hoekstra, A.G., Priol, T., Reinefeld, A., Bubak, M. (eds) Advances in Grid Computing - EGC 2005. EGC 2005. Lecture Notes in Computer Science, vol 3470. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11508380_104

DOI: https://doi.org/10.1007/11508380_104
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26918-2
Online ISBN: 978-3-540-32036-4
eBook Packages: Computer Science (R0)
