Keywords

1 Introduction

Cloud computing is an emerging research area for the past few years and it is the transformation of the computing in which all the services are being commoditized and provided similar to traditional utilities like water, electricity, gas, and telephony. The users are allowed to access the services and resources according to their usage based on some desired quality of service. These resources are distributed and accessed the basis of Pay-as-per-use model [1]. The demand for high performance computing is growing for complex and large-scale applications in science, engineering and commerce. The elasticity feature of cloud computing is suitable for such applications because of large amount of data and computation is involved. Many complex and computation intensive applications can be modeled as workflows. These workflow applications are generally described as the sequence of tasks to be processed in well defined order to accomplish a specific goal [2]. Cloud workflow system can be regarded as a platform service that facilitates the automation of distributed large-scale applications in the cloud computing. Workflows are represented as directed acyclic graphs (DAG), which consists of nodes that are linked according to their flow dependencies: G = (V, E), which G represents the graph, V represents a vertex or a node that is an operation or a task, and E represents the edge that shows the relationship between two nodes [3]. Flow dependency imposed the sequence of execution as a parent node will get executed before its child nodes always. Workflow scheduling is the key problem in cloud computing. A workflow scheduling framework is needed to optimize the performance of the resource provisioning process in a cloud system that can allocate the tasks into the appropriate resources according to certain scheduling strategies. This mapping of the tasks has an impact on the performance of workflow scheduling. The goal should be to attain an optimal trade-off between performance and cost. In general, the performance of the workflow varies depending on the nature of tasks. The users may work towards different goals, such as, the shortest possible execution time, the most inexpensive execution cost, or the optimized throughput. Moreover, some users may need to create a schedule plan that satisfies more than one goal. To achieve high performance by satisfying more than one objective makes the scheduling process difficult and such problem is termed as multi-objective optimization problem [4]. There are various scheduling strategies based on heuristics and meta-heuristics methods.

Given this motivation, we investigate meta-heuristics based workflow scheduling methods capable of being applied to complex domains. This paper gives a survey of various meta-heuristics scheduling methods used in Cloud and Grid. The remainder of the paper is organized as follows. We introduce workflow scheduling algorithms for clouds and grids in Sect. 2 with Table 1 and various scheduling parameters of existing algorithms with Table 2 and Sect. 3 gives the research challenges and finally, we conclude the paper with directions for the future work in Sect. 4.

Table 1 Survey of workflow scheduling algorithms based on hybrid heuristics and meta-heuristics methods
Table 2 Scheduling parameters considered by existing workflow scheduling algorithms

2 Related Work

2.1 Workflow Scheduling in Cloud Computing: A State-of-the-Art

Initial study has been carried out on understanding scheduling algorithms for executing workflows in cloud computing systems. A lot of work has been done in the area of workflow scheduling [2, 3]. To solve the optimization problem of scheduling, several solutions have been proposed in the literature. There are heuristic-based and meta-heuristic based scheduling strategies to achieve near-optimal solutions within polynomial time, as workflow scheduling is an NP-complete problem. The meta-heuristics based scheduling strategies can deal with massive search space and find a near-optimal solution within polynomial time for all workflow structures. Some of these approaches are based on Genetic Algorithms (GA) [18, 19]. In [20], the authors run the genetic algorithm starting with an initial population consisting of randomly generated solutions. The genetic algorithm started with an initial population consisting of a solution produced by one of the simple heuristics together with other randomly generated solutions. This proposed algorithm either minimized the monetary cost while meeting user’s budget constraint, or minimized the execution time while meeting user’s deadline constraints. Artificial Bee Colony algorithm for multivariable, multimodal function optimization have been used in [5, 30]. The use of Intelligent Water Drops (IWD) algorithm for solving multi-objective job shop scheduling has been discussed in [31], which optimized the makespan, tardiness and mean flow time of the schedules in job shop. Similar to this, authors in [25] has employed IWD with ACO algorithm to optimize the makespan and load balancing of the jobs in grid environment. The paper [26] showed the use of IWD as a hybrid meta-heuristics with ACO to optimize the mean flow time of the jobs in cloud systems. There is also work in which particle swarm optimization (PSO) is used for the scheduling of workflow in Cloud environment [811]. CSO based multi-objective optimization approach has been used in [29] to schedule workflow in a cloud computing environment. Researches show that ant colony optimization (ACO) is considered as one of the best meta-heuristic for scheduling workflow in Cloud computing [14]. Most of these researches focus only on one or two optimization objective most often cost or time.

However, current state-of-the-art studies tackle different scheduling problems in cloud workflow systems by focusing on general QoS optimization constraints as described in a Table 1. It is clear that most of the scheduling algorithms have focused on optimizing makespan and cost [5, 6, 8, 10, 11, 21]. Some of the algorithms have considered makespan and load balancing [10, 14, 16, 19, 22, 25], while others have taken care of optimizing makespan and security [11]. Optimization of resource utilization is considered in [13, 17, 25, 27] in addition to makespan and load balancing. Some have also considered the issue of reliability along with makespan and cost [21]. Also the papers [11, 15, 18] based on meta-heuristics approaches considered the energy efficiency parameter for scheduling of the workflow tasks.

2.2 Scheduling Parameters

The scheduling decision during workflow scheduling must be guided by user’s QoS constraints [1]. There are different parameters which need to be considered when developing a scheduling algorithm for workflows. It is not possible for an algorithm to consider all the parameters in a single solution because it depends on many factors like nature or size of the job, resource availability, working environment etc. Some of the parameters considered in the existing workflow scheduling algorithms in Table 2 are explained as below:

  • Makespan Makespan, M, is the total elapsed time required to execute the entire workflow. The makespan of the workflow is computed as M = finish time − submission time. It represents the duration from the user submitting a workflow to the time it completes and receives the results.

  • Cost The cost is the monetary value which incurred while running a cloud workflow. It consists of the processing cost and the data transfer costs.

  • Security Security is an essential requirement in the cloud computing as there is a risk of sensitive data being leaked or tampered in the process of transmission or execution.

  • Scheduling overhead There can be the situation when numbers of tasks demand for the same resource. Such tasks then need to be queued up waiting for the desired resource which creates a problem of scheduling overhead.

  • Resource utilization Scheduling refers to the appropriate assignment of tasks to the resources available like CPU, memory and storage, such that there is a maximum utilization of resources.

  • Failure rate Many discrete events may lead to failures of an application such as non-availability of required services or overloaded resource conditions. The failure density function is defined as:

    $$ f\left( t \right) = \lambda e^{ - \lambda t } (t \ge 0) $$
    (1)

    where λ is the failure rate of a resource [32]. If numfails be the number of failures within a resource during the job execution period runtime then the failure rate can be calculated as: \( \frac{1}{\text{MTTF}} = \frac{{{\text{num}}_{\text{fails}} }}{{{\text{run}}_{\text{time}} }} \), which is the inverse of mean time to failure (MTTF).

  • Reliability Reliability represents the probability that the workflow will be executed successfully and it is to be maximized. A schedule is an assignment of the tasks to the virtual machine and the reliability of a schedule is defined as the probability that it finishes correctly and is given by the probability that all the VMs be functional during the execution of all their assigned tasks.

  • Energy efficiency Data centers consist of huge numbers of heterogeneous servers which both consume and simultaneously waste massive power to execute numerous assigned tasks due to poor task assignment optimization. So, scheduling plays a very important role in determining the efficient execution of tasks in virtualized environments. Tasks are to be mapped to the virtual machines in such a way that the energy consumption should be reduced.

3 Research Challenges

Based on the related literature, we find that the following issues have not been sufficiently solved. These issues can give the directions for future work.

  • Scheduling overhead There are number of resources available and it may possible that more than one user demand the same resource. This creates a scenario of multiple jobs waiting in a queue, which leads to the problem of scheduling overhead. Some mechanism needs to be discussed to reduce such overhead while generating schedules in a workflow. In [33] the authors presented different clustering techniques incorporated into the Pegasus workflow mapping system on the TeraGrid and results showed significant reduction of overall workflow runtime. Similar thing can be thought in the clouds systems also.

  • Scheduling with Fault tolerance There is a need of defining a method to integrate fault tolerance with scheduling algorithm based on some meta-heuristics approach. There is no evidence of such work done in the related literature. There are various fault tolerance techniques that reduce the effect of failures on application execution when the failure occurs. Check point recovery, over-provisioning or task replication and task resubmission schemes can be integrated with scheduling as discussed in [34].

  • Storage cost and performance trade-off workflow scheduling that takes both storage cost and execution time into consideration has not been studied as extensively. The tasks of a workflow are to be mapped to the VM-instances (Virtual Machine) in the cloud. Every available virtual machine is associated with both cost and performance related attributes like processor speed, bandwidth. Reducing the storage cost has also become a challenge for the scientists while optimizing the workflow performance [35].

4 Conclusion and Future Directions

In this paper, we have discussed various workflow scheduling algorithms in clouds and Grids. Scheduling criteria should include multiple constraints to be satisfied based on the user’s requirements. There can be heuristics approach or meta-heuristics approach for optimizing the various scheduling criteria. We discussed both approaches but focus of our survey is meta-heuristics based methods. Some of the algorithms are based on Artificial Bee colony algorithms (ABC) [5, 7]. There are some work in which Genetic Algorithm (GA) is combined with other heuristic method [1823]. Also, the use of Particle Swarm Optimization (PSO) discussed in [8, 9, 11, 12] for the scheduling of workflow in Cloud environment. Clonal selection algorithm is also being used for scheduling workflow in cloud environment [15, 17]. In [17], authors combined ACO with Clonal selection algorithm for scheduling the resources in the clouds. ACO is also combined with intelligent water drops algorithm in [25, 31] to improve the scheduling. In [27, 28], a new method based on Cuckoo Optimization has been used to generate optimal schedules in a Grid environment. Researches show that Genetic Algorithm is one of the best meta-heuristic for scheduling workflow in Cloud computing [18, 20]. In [29], Cat Swarm Optimization has been used to minimize the cost, makespan, and CPU idle time. Discrete CSO [36, 37] can also be used in combination with other heuristics method to achieve the better performance. As there is no evidence of using Discrete CSO based approach for workflow scheduling in cloud systems.

Most of the above stated researches focus only on one or two optimization objective such as cost or time and do not consider the energy efficiency and failure rate while scheduling the workflows. But, there is need to consider the failure rate and security also. The efficient scheduling will give better optimized results if energy efficiency factor is also considered. A hybrid heuristic-based scheduling [22] considered the failure rate while scheduling the tasks in workflow which is shown as in a Table 2 given. Also, the papers [15, 18] based on meta-heuristics approaches considered the energy efficiency parameter for scheduling of the workflow tasks. There is need to develop scheduling algorithms that would focus on failure rate, reliability, and energy efficient parameters.

Also, we found that the Artificial Bee Colony has not been used for optimization in workflow scheduling. The experimental results of various research works [5, 30, 38, 39] reveal that the ABC algorithm is having potential to solve optimization problems and more research is required to adapt it to other engineering problems. After getting inspiration from its successful implementation in [18] to solve Job scheduling problem in cloud computing, we feel that there is a scope of using hybrid meta-heuristics approach that combines Artificial Bee Colony algorithm and Genetic Algorithm (ABC-GA) for multi-objective optimization workflow scheduling problems. Cross-over and mutation operators of GA can be embedded into ABC to improve scheduling strategy. A hybrid meta-heuristics based energy efficient workflow scheduling can be considered for future work, which schedules the tasks on the resources by minimizing the cost and that consider the failure rate of the resources.