Abstract
Because they offer dynamic provisioning and the illusion of unlimited resources, clouds are becoming a popular alternative for running scientific workflows. In a cloud system that processes workflow applications, performance is heavily influenced by two factors: the scheduling strategy and component failures. Failures in a cloud system can affect several users simultaneously and reduce the number of available computing resources. A poor scheduling strategy can increase the expected makespan and the idle time of physical machines. In this paper, we propose an optimization method for scheduling scientific workflows on cloud systems. The method couples a meta-heuristic algorithm with a performability model that provides the fitness of each explored solution. To represent the combined effect of scheduling and component failures, we adopt discrete event simulation for the performability model. Experimental results show the effectiveness of the hybrid simulation-optimization approach in optimizing the number of allocated virtual machines and the scheduling of tasks with respect to performability.
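The coupling of an optimizer with a simulation-based fitness function can be illustrated with a minimal sketch. This is not the method evaluated in the paper: the toy workflow, the `MTTF` and `MTTR` values, the exponential failure and repair times, the restart-on-failure rule, and all function names are assumptions made for illustration. A simple stochastic local search stands in for the meta-heuristic, and a small Monte Carlo simulation stands in for the discrete event simulation model.

```python
import random

# Hypothetical toy workflow: task -> (work in time units, predecessor tasks).
# Tasks are listed in topological order.
WORKFLOW = {
    0: (4.0, []),
    1: (3.0, [0]),
    2: (5.0, [0]),
    3: (2.0, [1, 2]),
}
MTTF = 50.0  # assumed mean time to failure of a VM
MTTR = 2.0   # assumed mean time to repair a VM


def run_task(work, rng):
    """Time to complete a task when a VM failure forces a restart after repair."""
    elapsed = 0.0
    while True:
        time_to_failure = rng.expovariate(1.0 / MTTF)
        if time_to_failure >= work:
            return elapsed + work
        # The VM failed mid-task: lose the progress, wait for repair, retry.
        elapsed += time_to_failure + rng.expovariate(1.0 / MTTR)


def simulate_makespan(assignment, rng):
    """One replication: assignment[task] is the VM index that runs the task."""
    vm_free, finish = {}, {}
    for task, (work, preds) in WORKFLOW.items():
        vm = assignment[task]
        ready = max((finish[p] for p in preds), default=0.0)
        start = max(ready, vm_free.get(vm, 0.0))
        finish[task] = start + run_task(work, rng)
        vm_free[vm] = finish[task]
    return max(finish.values())


def fitness(assignment, replications=200, seed=0):
    """Mean makespan; a fixed seed gives every candidate the same random stream."""
    rng = random.Random(seed)
    total = sum(simulate_makespan(assignment, rng) for _ in range(replications))
    return total / replications


def local_search(num_vms, iterations=200, seed=1):
    """Reassign one random task per step and keep the change if fitness improves."""
    rng = random.Random(seed)
    best = [rng.randrange(num_vms) for _ in WORKFLOW]
    best_fit = fitness(best)
    for _ in range(iterations):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.randrange(num_vms)
        cand_fit = fitness(cand)
        if cand_fit < best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit
```

Calling `local_search(2)` returns a task-to-VM assignment and its estimated mean makespan. Fixing the seed inside `fitness` means competing schedules are compared under identical failure scenarios, which reduces the noise the search has to cope with.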
Cite this article
Oliveira, D., Brinkmann, A., Rosa, N. et al. Performability Evaluation and Optimization of Workflow Applications in Cloud Environments. J Grid Computing 17, 749–770 (2019). https://doi.org/10.1007/s10723-019-09476-0
DOI: https://doi.org/10.1007/s10723-019-09476-0