Abstract
Because Grid environments are highly dynamic, dependable workflow scheduling is critical. Various scheduling algorithms have been proposed, but they seldom consider resource reliability. Current Grid systems rely mainly on fault-tolerance mechanisms to guarantee dependable workflow execution, which wastes system resources. This paper proposes a dependable Grid workflow scheduling system called DGWS. It introduces a Markov chain-based resource availability prediction model and, based on that model, presents a reliability-cost-driven workflow scheduling algorithm. The performance evaluation, which includes simulations on both parametric, randomly generated DAGs and two real scientific workflow applications, demonstrates that, compared with existing workflow scheduling algorithms, DGWS improves the task success ratio and reduces the workflow makespan, thereby improving the dependability of workflow execution in dynamic Grid environments.
Cite this article
Tao, Y., Jin, H., Wu, S. et al. Dependable Grid Workflow Scheduling Based on Resource Availability. J Grid Computing 11, 47–61 (2013). https://doi.org/10.1007/s10723-012-9237-0
DOI: https://doi.org/10.1007/s10723-012-9237-0