Abstract
The Cloud computing is an emerging distributed systems which follows a “pay-as-you-use” model. It is a new type of shared infrastructure able to offer several resources through the Internet. There is large number of users using the services over the cloud, which generating large volume of data. The scheduling of dependent tasks is a NP-complete problem and has become as one of the most challenging problems in cloud environment. There is a need of specifying a sequence of execution of these tasks to satisfy the user requirements in terms of QoS parameters such as cost, execution time, etc. The workflow scheduling is considered to be difficult, when it becomes a multi-objective optimization problem. In this paper, we presented a comprehensive description of the existing approaches based on meta-heuristics for workflow scheduling. On the basis of the related works, we found the Genetic algorithm as the best method for scheduling. A GA searches the problem space globally and therefore, scholars have investigated combining GAs with other meta-heuristic methods to resolve the local search problem. We feel that there is a scope of using hybrid meta-heuristics approach that combines Artificial Bee Colony algorithm and Genetic Algorithm (ABC-GA) for scheduling workflows in Cloud computing. Cross-over and mutation operators of GA can be embedded into ABC to improve scheduling strategy.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Cloud computing is an emerging research area for the past few years and it is the transformation of the computing in which all the services are being commoditized and provided similar to traditional utilities like water, electricity, gas, and telephony. The users are allowed to access the services and resources according to their usage based on some desired quality of service. These resources are distributed and accessed the basis of Pay-as-per-use model [1]. The demand for high performance computing is growing for complex and large-scale applications in science, engineering and commerce. The elasticity feature of cloud computing is suitable for such applications because of large amount of data and computation is involved. Many complex and computation intensive applications can be modeled as workflows. These workflow applications are generally described as the sequence of tasks to be processed in well defined order to accomplish a specific goal [2]. Cloud workflow system can be regarded as a platform service that facilitates the automation of distributed large-scale applications in the cloud computing. Workflows are represented as directed acyclic graphs (DAG), which consists of nodes that are linked according to their flow dependencies: G = (V, E), which G represents the graph, V represents a vertex or a node that is an operation or a task, and E represents the edge that shows the relationship between two nodes [3]. Flow dependency imposed the sequence of execution as a parent node will get executed before its child nodes always. Workflow scheduling is the key problem in cloud computing. A workflow scheduling framework is needed to optimize the performance of the resource provisioning process in a cloud system that can allocate the tasks into the appropriate resources according to certain scheduling strategies. This mapping of the tasks has an impact on the performance of workflow scheduling. The goal should be to attain an optimal trade-off between performance and cost. In general, the performance of the workflow varies depending on the nature of tasks. The users may work towards different goals, such as, the shortest possible execution time, the most inexpensive execution cost, or the optimized throughput. Moreover, some users may need to create a schedule plan that satisfies more than one goal. To achieve high performance by satisfying more than one objective makes the scheduling process difficult and such problem is termed as multi-objective optimization problem [4]. There are various scheduling strategies based on heuristics and meta-heuristics methods.
Given this motivation, we investigate meta-heuristics based workflow scheduling methods capable of being applied to complex domains. This paper gives a survey of various meta-heuristics scheduling methods used in Cloud and Grid. The remainder of the paper is organized as follows. We introduce workflow scheduling algorithms for clouds and grids in Sect. 2 with Table 1 and various scheduling parameters of existing algorithms with Table 2 and Sect. 3 gives the research challenges and finally, we conclude the paper with directions for the future work in Sect. 4.
2 Related Work
2.1 Workflow Scheduling in Cloud Computing: A State-of-the-Art
Initial study has been carried out on understanding scheduling algorithms for executing workflows in cloud computing systems. A lot of work has been done in the area of workflow scheduling [2, 3]. To solve the optimization problem of scheduling, several solutions have been proposed in the literature. There are heuristic-based and meta-heuristic based scheduling strategies to achieve near-optimal solutions within polynomial time, as workflow scheduling is an NP-complete problem. The meta-heuristics based scheduling strategies can deal with massive search space and find a near-optimal solution within polynomial time for all workflow structures. Some of these approaches are based on Genetic Algorithms (GA) [18, 19]. In [20], the authors run the genetic algorithm starting with an initial population consisting of randomly generated solutions. The genetic algorithm started with an initial population consisting of a solution produced by one of the simple heuristics together with other randomly generated solutions. This proposed algorithm either minimized the monetary cost while meeting user’s budget constraint, or minimized the execution time while meeting user’s deadline constraints. Artificial Bee Colony algorithm for multivariable, multimodal function optimization have been used in [5, 30]. The use of Intelligent Water Drops (IWD) algorithm for solving multi-objective job shop scheduling has been discussed in [31], which optimized the makespan, tardiness and mean flow time of the schedules in job shop. Similar to this, authors in [25] has employed IWD with ACO algorithm to optimize the makespan and load balancing of the jobs in grid environment. The paper [26] showed the use of IWD as a hybrid meta-heuristics with ACO to optimize the mean flow time of the jobs in cloud systems. There is also work in which particle swarm optimization (PSO) is used for the scheduling of workflow in Cloud environment [8–11]. CSO based multi-objective optimization approach has been used in [29] to schedule workflow in a cloud computing environment. Researches show that ant colony optimization (ACO) is considered as one of the best meta-heuristic for scheduling workflow in Cloud computing [14]. Most of these researches focus only on one or two optimization objective most often cost or time.
However, current state-of-the-art studies tackle different scheduling problems in cloud workflow systems by focusing on general QoS optimization constraints as described in a Table 1. It is clear that most of the scheduling algorithms have focused on optimizing makespan and cost [5, 6, 8, 10, 11, 21]. Some of the algorithms have considered makespan and load balancing [10, 14, 16, 19, 22, 25], while others have taken care of optimizing makespan and security [11]. Optimization of resource utilization is considered in [13, 17, 25, 27] in addition to makespan and load balancing. Some have also considered the issue of reliability along with makespan and cost [21]. Also the papers [11, 15, 18] based on meta-heuristics approaches considered the energy efficiency parameter for scheduling of the workflow tasks.
2.2 Scheduling Parameters
The scheduling decision during workflow scheduling must be guided by user’s QoS constraints [1]. There are different parameters which need to be considered when developing a scheduling algorithm for workflows. It is not possible for an algorithm to consider all the parameters in a single solution because it depends on many factors like nature or size of the job, resource availability, working environment etc. Some of the parameters considered in the existing workflow scheduling algorithms in Table 2 are explained as below:
-
Makespan Makespan, M, is the total elapsed time required to execute the entire workflow. The makespan of the workflow is computed as M = finish time − submission time. It represents the duration from the user submitting a workflow to the time it completes and receives the results.
-
Cost The cost is the monetary value which incurred while running a cloud workflow. It consists of the processing cost and the data transfer costs.
-
Security Security is an essential requirement in the cloud computing as there is a risk of sensitive data being leaked or tampered in the process of transmission or execution.
-
Scheduling overhead There can be the situation when numbers of tasks demand for the same resource. Such tasks then need to be queued up waiting for the desired resource which creates a problem of scheduling overhead.
-
Resource utilization Scheduling refers to the appropriate assignment of tasks to the resources available like CPU, memory and storage, such that there is a maximum utilization of resources.
-
Failure rate Many discrete events may lead to failures of an application such as non-availability of required services or overloaded resource conditions. The failure density function is defined as:
$$ f\left( t \right) = \lambda e^{ - \lambda t } (t \ge 0) $$(1)where λ is the failure rate of a resource [32]. If numfails be the number of failures within a resource during the job execution period runtime then the failure rate can be calculated as: \( \frac{1}{\text{MTTF}} = \frac{{{\text{num}}_{\text{fails}} }}{{{\text{run}}_{\text{time}} }} \), which is the inverse of mean time to failure (MTTF).
-
Reliability Reliability represents the probability that the workflow will be executed successfully and it is to be maximized. A schedule is an assignment of the tasks to the virtual machine and the reliability of a schedule is defined as the probability that it finishes correctly and is given by the probability that all the VMs be functional during the execution of all their assigned tasks.
-
Energy efficiency Data centers consist of huge numbers of heterogeneous servers which both consume and simultaneously waste massive power to execute numerous assigned tasks due to poor task assignment optimization. So, scheduling plays a very important role in determining the efficient execution of tasks in virtualized environments. Tasks are to be mapped to the virtual machines in such a way that the energy consumption should be reduced.
3 Research Challenges
Based on the related literature, we find that the following issues have not been sufficiently solved. These issues can give the directions for future work.
-
Scheduling overhead There are number of resources available and it may possible that more than one user demand the same resource. This creates a scenario of multiple jobs waiting in a queue, which leads to the problem of scheduling overhead. Some mechanism needs to be discussed to reduce such overhead while generating schedules in a workflow. In [33] the authors presented different clustering techniques incorporated into the Pegasus workflow mapping system on the TeraGrid and results showed significant reduction of overall workflow runtime. Similar thing can be thought in the clouds systems also.
-
Scheduling with Fault tolerance There is a need of defining a method to integrate fault tolerance with scheduling algorithm based on some meta-heuristics approach. There is no evidence of such work done in the related literature. There are various fault tolerance techniques that reduce the effect of failures on application execution when the failure occurs. Check point recovery, over-provisioning or task replication and task resubmission schemes can be integrated with scheduling as discussed in [34].
-
Storage cost and performance trade-off workflow scheduling that takes both storage cost and execution time into consideration has not been studied as extensively. The tasks of a workflow are to be mapped to the VM-instances (Virtual Machine) in the cloud. Every available virtual machine is associated with both cost and performance related attributes like processor speed, bandwidth. Reducing the storage cost has also become a challenge for the scientists while optimizing the workflow performance [35].
4 Conclusion and Future Directions
In this paper, we have discussed various workflow scheduling algorithms in clouds and Grids. Scheduling criteria should include multiple constraints to be satisfied based on the user’s requirements. There can be heuristics approach or meta-heuristics approach for optimizing the various scheduling criteria. We discussed both approaches but focus of our survey is meta-heuristics based methods. Some of the algorithms are based on Artificial Bee colony algorithms (ABC) [5, 7]. There are some work in which Genetic Algorithm (GA) is combined with other heuristic method [18–23]. Also, the use of Particle Swarm Optimization (PSO) discussed in [8, 9, 11, 12] for the scheduling of workflow in Cloud environment. Clonal selection algorithm is also being used for scheduling workflow in cloud environment [15, 17]. In [17], authors combined ACO with Clonal selection algorithm for scheduling the resources in the clouds. ACO is also combined with intelligent water drops algorithm in [25, 31] to improve the scheduling. In [27, 28], a new method based on Cuckoo Optimization has been used to generate optimal schedules in a Grid environment. Researches show that Genetic Algorithm is one of the best meta-heuristic for scheduling workflow in Cloud computing [18, 20]. In [29], Cat Swarm Optimization has been used to minimize the cost, makespan, and CPU idle time. Discrete CSO [36, 37] can also be used in combination with other heuristics method to achieve the better performance. As there is no evidence of using Discrete CSO based approach for workflow scheduling in cloud systems.
Most of the above stated researches focus only on one or two optimization objective such as cost or time and do not consider the energy efficiency and failure rate while scheduling the workflows. But, there is need to consider the failure rate and security also. The efficient scheduling will give better optimized results if energy efficiency factor is also considered. A hybrid heuristic-based scheduling [22] considered the failure rate while scheduling the tasks in workflow which is shown as in a Table 2 given. Also, the papers [15, 18] based on meta-heuristics approaches considered the energy efficiency parameter for scheduling of the workflow tasks. There is need to develop scheduling algorithms that would focus on failure rate, reliability, and energy efficient parameters.
Also, we found that the Artificial Bee Colony has not been used for optimization in workflow scheduling. The experimental results of various research works [5, 30, 38, 39] reveal that the ABC algorithm is having potential to solve optimization problems and more research is required to adapt it to other engineering problems. After getting inspiration from its successful implementation in [18] to solve Job scheduling problem in cloud computing, we feel that there is a scope of using hybrid meta-heuristics approach that combines Artificial Bee Colony algorithm and Genetic Algorithm (ABC-GA) for multi-objective optimization workflow scheduling problems. Cross-over and mutation operators of GA can be embedded into ABC to improve scheduling strategy. A hybrid meta-heuristics based energy efficient workflow scheduling can be considered for future work, which schedules the tasks on the resources by minimizing the cost and that consider the failure rate of the resources.
References
Grance T, Mell P. The NIST definition of cloud computing—recommendations of the National Institute of Standards and Technology. Special Publication 800-145, NIST, Gaithersburg; 2011.
Buyya RK, Kotagiri R, Yu J. Workflow scheduling algorithm for grid computing. In: Meta-heuristics for scheduling in distributed computing environment, vol. 146. Berlin Heidelberg: Springer; 2008. p. 173–214.
Barrionuevo JJD, Fard HM, Prodan R, Fahringer T. A multi-objective approach for workflow scheduling in heterogeneous environment: cluster, cloud and grid computing. In: 12th IEEE International conference; 2012. p. 300–309.
Hoheisel A, Prodan R, Wieczorek M. Taxonomies of the multi-criteria grid workflow scheduling problem. In: Grid middleware and service. Springer; 2008. p. 237–64.
Achalakul T, Udomkasemsub O, Li XO. A multiple-objective workflow scheduling framework for cloud data analytics. In: International Joint Conference; 2012. p. 391–398.
Bitam S. Bees life algorithm for job scheduling in cloud computing. In: Second international conference on communications and information technology; 2012.
Dhinesh Babu LD, Venkata KP. Honey bee behavior inspired load balancing of tasks in cloud computing environments. Appl Soft Comput. 2013; 2292–2303.
Sivanandam SN, Visalakshi P. Dynamic task scheduling with load balancing using hybrid PSO. Int J Open Problems Comput Math. 2009; 475–488.
Buyya RK, Guru SM, Pandey S, Wu L. A particle swarm optimization based heuristic for scheduling workflow applications in cloud computing environments. In: 24th IEEE international conference on advanced information networking and applications (AINA). 2010; 400–407.
Sultan EI. Quantum PSO technique for load balancing in cloud computing. PhD Thesis; 2013.
Jianfang C, Junjie C, Qingshan Z. An optimized scheduling algorithm on a cloud workflow using a discrete particle swarm. Cybern Inform Technol. 2014; 25–39.
Buyya RK, Rodriguez MA. Deadline based resource provisioning and scheduling algorithm for scientific workflows on clouds. IEEE Trans Cloud Comput. 2014; 222–235.
Channa I, Rajni: Bacterial foraging based hyper-heuristic for resource scheduling in grid computing. In: Future Generation Computer System; 2012. p. 751–762.
Dong Y, Li K, Wang D, Xu G, Zhao G. Cloud task scheduling based on load balancing ACO. In: Sixth annual chinagrid Conference; 2011. p. 3–9.
Shu W, Wang W, Wang Y. A novel energy efficient resource allocation algorithm based on Immune clonal optimization for green cloud computing. EURASIP J Wireless Commun Networking; 2014.
Cao J, Hwang K, Khan SU, Li K, Zhanga F. Multi-objective scheduling of many tasks in cloud platforms. In: Future generation computer systems; 2013.
Lin J, Lin X, Lin H, Zhong Y, Zeng Q. Hybrid ant colony algorithm clonal selection in the application of the cloud’s resource scheduling. In: Distributed parallel, and cluster computing (cs.DC). arXiv:1411.2528v1; 2014.
Liu J, Li B, Luo XG, Zhang XM, Zhang F. Job scheduling model for cloud computing based on multi-objective genetic algorithm. Int J Comput Sci. 2013; 134–9.
Abraham A, Amendola D, Cordeschi N, Javanmardi S, Liu H, Shojafar M. Hybrid job scheduling algorithm for cloud computing environment. In: Proceedings of the fifth international conference on innovations in bio-inspired computing and applications IBICA 2014, vol 303. Advances in Intelligent Systems and Computing; 2014. p. 43–52.
Buyya RK, Yu J. Scheduling scientific workflow applications with deadline and budget constraints using genetic algorithms. In: Scientific Programming Journal IOS Press, Amsterdam; 2006. p. 217–230.
Chelouah R, Nacer MA, Sellami K, Tiako PF. Immune genetic algorithm for scheduling service workflows with QoS constraints in cloud computing. S Afr J Ind Eng. 2013;24:68–82.
Aryan Y, Delavar AG. HSGA: a hybrid heuristic algorithm for workflow scheduling in cloud systems. J Cluster Comput. Springer; 2013. p. 129–137.
Abraham A, Pooranian Z, Shojafar M, Singhal M, Tavoli R. A hybrid metaheuristics algorithm for job scheduling on computational grids. Informatica. 2013;37:157–64.
Fan Z, Li Y, Shen H, Wu Y. Simulated-annealing load balancing for resource allocation in cloud environments. In: International conference on parallel and distributed computing, applications and technologies. IEEE Computer Society, Washington, USA; 2013. p. 1–6.
Mathiyalagan P, Sivanandam SN, Saranya KS. Hybridization of modified ant colony optimization and intelligent water drops algorithm for job scheduling. In: Computational Grid. ICTACT Journal on Soft Computing; 2013. p. 651–655.
Johnson M, Preethima RA. Hybrid ACO-IWD optimization algorithm for minimizing weighted flow time in cloud based parameter sweep experiments. Int J Res Eng Technol. 2014; 317–321.
Rabiee M, Sajedi H. Job scheduling in grid computing with cuckoo optimization algorithm. Int J Comput Appl. 2013;62:38–43.
Rabiee M, Sajedi H. A metaheuristic algorithm for job scheduling in grid computing. Int J Mod Educ Comput Sci. 2014;05:52–9.
Bilgaiyan S, Das M, Sagnika S. A multi-objective cat swarm optimization algorithm for workflow scheduling in cloud computing environment. In: Proceedings of international conference on intelligent computing, communication and devices. Advances in Intelligent Systems and Computing. Springer; 2015. p. 73–84.
Basturk B, Karaboga D. A powerful and efficient algorithm for numerical function optimization- artificial bee colony (ABC) algorithm. J Global Optim. 2007; 459–471.
Niu SH, Nee AYC, Ong SK. An improved intelligent water drops algorithm for solving multi-objective job shop scheduling. Eng Appl Artif Intell. 2013; 2431–2442.
Buyya RK, Su J, Wang X, Yeo CS. Optimizing makespan and reliability for workflow applications with reputation and look-ahead genetic algorithm. Future Gener Comput Syst. 2011;27:1124–34.
Berriman B, Deelman E, Good J, Katz DS, Mehta G, Singh G, Su MH, Vahi K. Workflow task clustering for best effort systems with pegasus. In: Proceedings of the 15th ACM Mardi Gras conference; 2008.
Cooper K, Koelbel C, Mandal A, Zhang Y. Combined fault tolerance and scheduling techniques for workflow applications on computational grids. In: Proceedings of 9th IEEE/ACM international symposium on cluster computing and the grid; 2009. p. 244–251.
Lin X, Wu CQ. On scientific workflow scheduling in clouds under budget constraint. In: Proceedings of 42nd international conference on parallel processing. IEEE; 2013. p. 90–99.
Khanesar A, Sharafi Y, Teshnehlab M. Discrete binary cat swarm optimization algorithm. In: Proceedings of 3rd international conference on computer, control and communication; 2013. p. 1– 6.
Bouzidi A, Riffi ME. Discrete cat swarm optimization to resolve the traveling salesman problem. Int J Adv Res Comput Sci Softw Eng. 2013; 13–18.
Bahriye A, Karaboga D. A comparative study of artificial bee colony algorithm. Erciyes University, Department of Computer Engineering, Melikgazi, 38039 Kayseri, Turkey 2009.
Achalakul T, Banharnsakun A, Sirinaovakul B. Job shop scheduling with the best-so far ABC. Eng Appl Artif Intell 2011.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Poonam, Dutta, M., Aggarwal, N. (2016). Meta-Heuristics Based Approach for Workflow Scheduling in Cloud Computing: A Survey. In: Dash, S., Bhaskar, M., Panigrahi, B., Das, S. (eds) Artificial Intelligence and Evolutionary Computations in Engineering Systems. Advances in Intelligent Systems and Computing, vol 394. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2656-7_121
Download citation
DOI: https://doi.org/10.1007/978-81-322-2656-7_121
Published:
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2654-3
Online ISBN: 978-81-322-2656-7
eBook Packages: EngineeringEngineering (R0)