Dependable Grid Workflow Scheduling Based on Resource Availability

Tao, Yongcai; Jin, Hai; Wu, Song; Shi, Xuanhua; Shi, Lei

doi:10.1007/s10723-012-9237-0

Published: 29 September 2012

Volume 11, pages 47–61, (2013)

Journal of Grid Computing


Yongcai Tao¹, Hai Jin², Song Wu², Xuanhua Shi² & Lei Shi¹


Abstract

Because the Grid environment is highly dynamic, dependable workflow scheduling is critical. Various scheduling algorithms have been proposed, but they seldom consider resource reliability. Current Grid systems rely mainly on fault tolerance mechanisms to guarantee dependable workflow execution, which wastes system resources. This paper proposes a dependable Grid workflow scheduling system called DGWS. It introduces a Markov chain-based resource availability prediction model and, based on that model, presents a reliability cost driven workflow scheduling algorithm. Performance evaluation, including simulation on both parametric randomly generated DAGs and two real scientific workflow applications, demonstrates that, compared with existing workflow scheduling algorithms, DGWS improves the task success ratio and reduces the workflow makespan, and thus improves the dependability of workflow execution in dynamic Grid environments.
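The abstract does not give the details of the availability model, so the following is only a generic sketch of how a discrete-time Markov chain can be used to estimate resource availability, not the DGWS model itself. The two-state chain (state 0 = available, state 1 = unavailable), the transition probabilities, and the function names `availability_after` and `prob_stays_up` are all assumptions made for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def step(dist: Sequence[float], P: Matrix) -> List[float]:
    """Advance a state distribution by one time step: dist' = dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def availability_after(P: Matrix, start: int, steps: int,
                       up_states: Sequence[int]) -> float:
    """Probability that the resource is in an 'up' state after `steps` steps."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(steps):
        dist = step(dist, P)
    return sum(dist[s] for s in up_states)


def prob_stays_up(P: Matrix, start: int, steps: int,
                  up_states: Sequence[int]) -> float:
    """Probability that the resource stays 'up' during every one of `steps` steps.

    After each step, probability mass that reached a 'down' state is discarded,
    so only uninterrupted trajectories are counted.
    """
    up = set(up_states)
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(steps):
        dist = step(dist, P)
        dist = [p if i in up else 0.0 for i, p in enumerate(dist)]
    return sum(dist)


# Hypothetical transition matrix (illustrative numbers, not from the paper):
# row 0: available   -> stays available 0.9, fails    0.1
# row 1: unavailable -> recovers        0.5, stays    0.5
P_EXAMPLE: Matrix = [[0.9, 0.1],
                     [0.5, 0.5]]

if __name__ == "__main__":
    print(availability_after(P_EXAMPLE, start=0, steps=2, up_states=[0]))
    print(prob_stays_up(P_EXAMPLE, start=0, steps=2, up_states=[0]))
```

Under these assumptions, a scheduler that cares about reliability would be more interested in `prob_stays_up` over the expected task duration than in the point-in-time availability, since a task that is interrupted has to be restarted or recovered.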



Author information

Authors and Affiliations

School of Information Engineering, Zhengzhou University, Zhengzhou, Henan, 450000, China
Yongcai Tao & Lei Shi
Services Computing Technology and System Lab, Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan, 430074, China
Hai Jin, Song Wu & Xuanhua Shi

Corresponding author

Correspondence to Yongcai Tao.

About this article

Cite this article

Tao, Y., Jin, H., Wu, S. et al. Dependable Grid Workflow Scheduling Based on Resource Availability. J Grid Computing 11, 47–61 (2013). https://doi.org/10.1007/s10723-012-9237-0

Received: 14 September 2011
Accepted: 13 September 2012
Published: 29 September 2012
Issue Date: March 2013
DOI: https://doi.org/10.1007/s10723-012-9237-0
