1 Introduction

Complex scientific applications in physics, bioinformatics, earth science, astronomy, and disaster modeling can be represented naturally in the form of workflows [1, 2]. One benefit of the workflow representation is that a workflow can be reused, reproduced, and even traced through other workflows [3, 4]. A workflow, depicted as a Directed Acyclic Graph (DAG), consists of computational activities interconnected through data-flow and control-flow dependencies [5]. Scientific workflows are partitioned into many tasks that process complex data of different sizes and can require hundreds of processing hours [6]. Moreover, in time-critical domains such as disaster modeling, the results become useless if workflow execution takes longer than a specified time. Such workflow-based applications are therefore highly demanding: they must process huge amounts of data within real-time constraints while also reducing the cost of the computing resources [7].

The cloud provides computing as a utility-based IT resource over the Internet, where users pay for the computing resources they actually consume [8]. A cloud service provider's basic offerings are categorized as (a) Software as a Service (SaaS), widely used for remote software application services; (b) Platform as a Service (PaaS), which provides an application development platform for building and deploying applications; and (c) Infrastructure as a Service (IaaS), which provides middleware-associated computing instances [9]. SaaS- and PaaS-based solutions are not well suited to workflow applications, since they only provide a platform to design and develop applications or to port legacy applications to a new platform [10]. On the other hand, IaaS offers several benefits over traditional distributed environments [11]. On-demand provisioning allows users to directly provision and de-provision the required computing resources, and elasticity offers the flexibility of acquiring or releasing computing resources with varying configurations. Therefore, for any scientific workflow application, the resource pool can grow or shrink based on its requirements.

In the cloud, a workflow is executed in two phases. In the first phase, resources are identified and acquired from the cloud to run the workflow tasks. The second phase generates a schedule by mapping each task to a suitable resource so that Quality of Service (QoS) requirements, such as the deadline and task precedence constraints, are met. Previous work on workflow scheduling in traditional distributed systems focused mainly on the scheduling phase, because those environments provide a static pool of resources whose configuration is known in advance. Moreover, most researchers in traditional distributed systems concentrated on minimizing the makespan, whereas researchers in the cloud also target other important parameters such as economic cost, energy consumption, and secure computation [12] besides execution time.

Existing works on the workflow scheduling problem assume that the monetary cost of a computation is based on the amount of resources actually used [13]. Under this assumption, two key corollaries follow: (1) the total cost of a workflow is the sum of the costs of all its sub-tasks, and (2) the cost of a task is fixed when it runs on a given service. In cloud environments, however, the cost is determined by the running time of the underlying hosting instances. In addition, the runtime is usually measured in fixed-size time intervals, with partially used intervals rounded up. Such schemes make the cost incurred by a task hard to predict precisely before scheduling. For example, a task that shares a time interval with the previous task hosted on the same instance might not produce any extra cost, whereas a task that starts a new time interval but does not use it entirely might cost more than estimated.

In scheduling, there are still some specific challenges that need to be addressed. First, CPU performance varies due to virtualization and the shared, heterogeneous nature of the underlying hardware. Schad et al. [14] reported CPU performance variation of up to 24% in Amazon's EC2 cloud. To make an allocation decision, many scheduling policies depend on estimates of a task's runtime on various types of resources, assuming that the capacity of the resource is always optimal. Due to performance variation in the virtual machine, the actual completion of a task may be delayed. This delay propagates to the successor tasks and may cause an application's deadline to be missed or its budget to be exceeded. Hence, the performance variation of resources has a substantial impact on workflow scheduling. Second, acquiring a resource requires some time for proper initialization (acquisition delay) before it is available to the user; likewise, terminating a resource requires an approximate shutdown time (termination delay). Long acquisition times may therefore cause an application to miss its deadline or exceed its budget. Third, traditional heterogeneous environments are limited in the number and types of available resources. Existing list-based heuristics allocate the best-suited resource to a task by traversing the entire list of available resources [13]. Since the cloud provides a virtually unlimited pool of resources with various configurations and pricing schemes, it is not feasible to traverse the entire pool. For example, PSO defines particle positions and velocities as t × r matrices, where t is the number of tasks and r is the number of resources; in the cloud, r would be too large. Typical encoding schemes represent a mapping of tasks to resources, which might not be feasible for the cloud, since resources are acquired and released dynamically; such encodings result in unnecessarily high time and space complexity. A scheduling algorithm should therefore make suitable resource choices with both performance and cost optimization in view. All the above challenges of the cloud dictate the development of novel resource provisioning and workflow scheduling algorithms.

Workflow scheduling is widely known to be an NP-hard problem [17]. For NP-hard problems, heuristic algorithms are more suitable than deterministic algorithms [15]. The inherently parallel working of the Firefly Algorithm (FA) makes it well suited to such optimization problems. The 'lower level' (heuristic) in FA focuses on generating new solutions within the search space and selecting the best solutions for survival. Randomization, in turn, allows the search process to avoid being trapped in local optima, while local search improves a candidate solution until no further improvement is detected, i.e., it places the solution in a local optimum.

In this paper, an attempt is made to use the Firefly Algorithm to solve the workflow scheduling problem in the cloud and to produce cost-effective, realistic schedules. This paper proposes a novel scheduling algorithm, CEFA, which schedules workflow tasks onto low-cost virtual machines. CEFA considers the key characteristics of the cloud, such as on-demand resource provisioning, elasticity, and the pay-per-use pricing model, and it also addresses issues such as CPU performance variation and resource acquisition delay. The major contributions of this work are as follows.

  • Through modeling the IaaS cloud and workflow, we formulate the workflow scheduling problem in the IaaS cloud as an optimization one that attempts to minimize the makespan and execution cost of workflows simultaneously.

  • In order to effectively solve the workflow scheduling problem in clouds, we propose a new deadline-constrained scheduling algorithm named Cost-Effective Firefly based Algorithm (CEFA), by designing Novel schemes for workflow encoding, initial population, and fitness evaluation.

  • We conduct extensive experiments on synthetic data of real scientific workflows to verify the effectiveness and efficiency of the proposed CEFA algorithm. Experimental results show that CEFA achieves better cost-makespan tradeoffs than five existing algorithms: IC-PCP, PSO, RCT, RTC, and RWO.

The rest of the paper is organized as follows. Section 2 reviews the literature on workflow scheduling and highlights the major works and proposals. Section 3 presents the virtual machine model, the workflow model, and the problem formulation, followed by the fundamentals of the firefly algorithm in Section 4. The proposed solution to the workflow scheduling problem is described in Section 5. Section 6 reports the evaluation results and comparison charts. Finally, Section 7 concludes the paper and suggests possible directions for future work on workflow scheduling.

2 Related work

A multitude of studies have addressed scheduling problems in distributed systems. The problem is NP-hard [17], so a feasible schedule cannot be found in polynomial time unless NP = P. Various heuristic and meta-heuristic approaches have focused on generating near-optimal schedules [18,19,20,21,22,23], and these solutions aim to meet the QoS requirements while generating a schedule. This section summarizes a bibliographic review of several scheduling approaches in the IaaS cloud that consider the user's QoS requirements.

Workflow scheduling approaches are classified as either Heuristic or Meta-heuristic and then sub-categorized according to workflow multiplicity and scheduling techniques as shown in Fig. 1.

Fig. 1 Classification of scheduling algorithms

In heuristic-based workflow scheduling, Mao and Humphrey [24] proposed a Scaling-Consolidation-Scheduling (SCS) algorithm to execute workflow ensembles on the cloud. They acknowledge that there are various types of VMs with different prices and that they can be leased on demand, depending on the application's requirements. Furthermore, they tailor their approach to the cloud's pricing model, in which VMs are paid for per fraction of time, in most cases one hour, so that the execution cost is minimized. They try to minimize the execution cost by applying a set of heuristics such as merging tasks into a single one, identifying the most cost-effective VM type for each task, and consolidating instances. Although this is a valid approach capable of reducing the execution cost of workflows on clouds, the proposed solution only guarantees a reduction in cost, not a near-optimal solution.

Another work on workflow ensembles for the cloud is presented by Malawski et al. [25]. They proposed three algorithms: two dynamic ones, DPDS (Dynamic Provisioning Dynamic Scheduling) and WA-DPDS (Workflow-Aware DPDS), and one static one, SPSS (Static Provisioning Static Scheduling). These algorithms aim to maximize the number of workflows completed under QoS constraints such as deadline and budget. Their solutions acknowledge the different delays involved when dealing with VMs leased from IaaS cloud providers, such as instance acquisition and termination delays. Furthermore, their approach is robust in the sense that a task's estimated execution time may vary based on a uniform distribution, and they use a cost safety margin to avoid generating a schedule that goes over budget. Their work, however, considers only a single type of VM, ignoring the heterogeneous nature of IaaS clouds.

Static Provisioning-Static Scheduling under Energy and Deadline Constraints (SPSS-ED) and Static Provisioning-Static Scheduling under Energy and Budget Constraints (SPSS-EB) are two energy-aware resource provisioning algorithms proposed by Pietri et al. [26] to schedule workflow ensembles, based on Static Provisioning-Static Scheduling (SPSS). SPSS-ED focuses on meeting deadline and energy constraints, whereas SPSS-EB focuses on budget and energy constraints. Both algorithms aim to reduce energy consumption and increase the number of completed workflows. SPSS-ED builds its execution plan by scheduling each task of the workflow, and the plan is accepted only if the deadline and energy constraints are satisfied. SPSS-EB repeats the same process but uses a budget constraint instead of a deadline. However, neither algorithm accounts for CPU performance variation, different workflow structures, data transfer costs, or instance acquisition and termination delays.

Further heuristic-based workflow scheduling algorithms for executing a single workflow in the IaaS cloud are presented in [27,28,29,30]. The work in [27,28,29] is based on the partial critical path (PCP) of the workflow. These approaches reduce the execution cost by scheduling all the workflow tasks on a critical path on a single machine. However, they do not have a global optimization strategy capable of providing a near-optimal solution; instead, they use task-level optimization and hence fail to generate a better solution for the whole workflow.

Abrishami et al. [27] formulated the IaaS Cloud Partial Critical Path (IC-PCP) algorithm, a non-robust deadline-constrained algorithm. It schedules a workflow on IaaS offerings while reducing the execution cost within the user-specified application deadline. It follows Partial Critical Paths (PCPs): it starts by identifying the critical paths associated with the exit node of the workflow and then assigns each task of a critical path to the cheapest already-leased resource that can execute it before its latest finish time. If this cannot be achieved, the cheapest resource that can finish the tasks before their latest finish time is leased, and the PCP is assigned to it. The procedure is executed recursively until all tasks are scheduled. The execution cost is minimized using the heuristic of scheduling all tasks in a partial critical path on a single machine that can finish them before their latest finish time (which is calculated based on the application's deadline and the fastest available instance). However, the algorithm has no global optimization technique capable of producing a near-optimal solution; instead, it uses task-level optimization and hence fails to exploit the whole workflow structure and characteristics to generate a better solution.

Poola et al. [29] proposed the RCT (Robustness-Cost-Time) and RTC (Robustness-Time-Cost) algorithms, which are also based on Partial Critical Paths (PCPs). These algorithms are robust and fault-tolerant, and they handle performance variation and failures of cloud resources. They schedule tasks on two kinds of cloud instances, on-demand and spot (dynamic) instances, using just-in-time heuristics. To handle resource performance variation, slack time is added to each PCP depending on the desired type of robustness; the slack is driven by the PCP execution time and its tolerance to fluctuations. To further improve fault tolerance, checkpoints are taken at regular intervals, and when a task fails the algorithm resumes it from the last checkpoint. However, the algorithms handle uncertainty by assuming a known distribution function for all input data, and they increase infrastructure costs because storing checkpoints becomes mandatory. Our proposed work is compared with these robust scheduling algorithms only for the parameters that cater for performance variation and VM acquisition delays; furthermore, we select the RCT and RTC resource-selection policies as baseline algorithms.

The Just-In-Time (JIT-C) workflow scheduling algorithm for the cloud environment presented by Sahni et al. [30] is a dynamic, cost-effective scheduling approach. If the deadline is achievable, an optimal solution is generated; otherwise, the user is prompted to revise the deadline. Their algorithm exploits the advantages of cloud computing while accounting for resource performance variation and acquisition delay.

Meta-heuristic algorithms [13, 31,32,33,34,35] form the other category of workflow scheduling algorithms. Modern meta-heuristics are efficient at finding a "good enough" solution in a "small enough" computing time. Population-based meta-heuristics find good solutions by iteratively selecting and combining existing solutions from a population, mimicking the principles of natural evolution. Commonly used population-based algorithms are Particle Swarm Optimization (PSO), the Genetic Algorithm (GA), and the more recently developed Firefly Algorithm (FA), which shows considerable potential [36, 37] in workflow scheduling.

The authors of [32,33,34] solved the workflow scheduling problem using Particle Swarm Optimization (PSO). Pandey et al. [33] presented an approach that lowers the execution cost while balancing load across the available machines. The execution time of the workflow is not considered among the scheduling objectives, so it can be considerably high as a result of the cost-minimization policy. The authors do not consider the elasticity of the cloud and assume a fixed set of VMs is available beforehand. For this reason, the solution is similar to those used for grids, where the generated schedule is a mapping between tasks and resources rather than a more comprehensive schedule indicating the number and type of resources to be leased, when they should be acquired and released, and the order in which tasks should be executed on them.

Rodriguez and Buyya [32] proposed a meta-heuristic approach based on Particle Swarm Optimization (PSO) that gives promising results, but there is still room for improvement. In their approach, particles are encoded using resource indices that define the particle positions. Because an index carries little information about the resource itself, particles moving in different dimensions towards the individual best and the global best do not necessarily lead to an optimal solution. The algorithm does account for essential cloud characteristics such as elasticity, the pay-as-you-go pricing model, and the heterogeneous nature of unlimited resources, including performance disparities and VM acquisition and termination delays. The PSO parameters considered in that work are the acceleration coefficients (ci, i = 1, 2), the number of particles (n), and the inertia weight (ω). The acceleration coefficients were varied from 1.5 to 2.0, the inertia weight ranged from 0.1 to 1.0, and the number of particles was set to 100.

Wu et al. [34] proposed the Revised Discrete Particle Swarm Optimization (RDPSO), which minimizes either cost or time while meeting a deadline or budget constraint. However, it assumes an initial set of resources is available beforehand and hence fails to utilize the elasticity of IaaS clouds.

Reddy and Kumar [47] proposed the Regressive Whale Optimization (RWO) algorithm for workflow scheduling in the cloud computing environment. RWO is a meta-heuristic algorithm that schedules tasks according to a fitness function defined in terms of resource utilization, energy, and Quality of Service (QoS) constraints. The algorithm differs from the standard Whale Optimization Algorithm (WOA) in the position-update step, where a regression-based position update is introduced. In their approach, solutions are generated at random without considering the resource type; the resource type provides the necessary information about a VM, which in turn helps to comply with the deadline. RWO also ignores essential cloud characteristics such as elasticity, the pay-as-you-go pricing model, and the heterogeneous nature of unlimited resources, including performance disparities and VM acquisition and termination delays.

Furthermore, the approaches presented by Huang [35], Zhu et al. [13], and Chen et al. [31] are based on Genetic Algorithms. Chen et al. [31] proposed a cost optimization strategy under a deadline constraint that uses the same encoding approach as Rodriguez and Buyya [32]. When no feasible solution is found, their strategy switches from the cost function to the time function, minimizing the execution time but not the cost. Zhu et al. [13] and Huang [35] focused on essential cloud characteristics but ignored CPU performance variation and acquisition delay. Moreover, these algorithms need to evolve over numerous generations, and a feasible solution may still not be found.

Numerous heuristic and meta-heuristic approaches have thus been proposed in the literature. Meta-heuristic algorithms outperform other algorithms when the number of tasks is small but fall behind when the number of tasks is large, and the meta-heuristic algorithms outlined above do not conform to the basic principles of the cloud. Therefore, this work proposes a Cost-Effective Firefly based Algorithm that addresses challenges such as CPU performance variation and acquisition delay, and attempts to optimize the cost under a deadline constraint.

3 Modeling and problem formulation

In this section, we first give the models of virtual machines (VMs) and workflows, and then formulate the scheduling problem.

3.1 Virtual machine modeling

The cloud platform provides a virtually unlimited number of VMs with various configurations. Let \(VT=\left \{vt_{1}, vt_{2}, vt_{3}, \dots , vt_{m}\right \} \) represent the m types of virtual machines available in the cloud. The parameter \(v m_{k}^{vt_{u}}\) represents the kth VM of type \(vt_{u}\). Each virtual machine \(v m_{k}^{vt_{u}}\) is associated with an estimated lease begin time \(LBT_{v m_{k}^{vt_{u}}}\) and lease end time \(LET_{v m_{k}^{vt_{u}}}\). The price of the virtual machine, denoted \( price (v m_{k}^{vt_{u}})\), is the cost per unit interval of time. It is worth noting that virtual machines can be acquired and terminated at any point in time, and that they are charged per unit interval of time, with any partial use of an interval charged as a whole interval.
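For illustration only, this VM model can be captured in a pair of small data structures; the class and field names below are our own assumptions, not part of the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VMType:
    """One VM type vt_u offered by the provider (names are illustrative)."""
    name: str                   # e.g. "m3.medium" (hypothetical)
    processing_capacity: float  # PC, e.g. in MFLOPS
    price_per_interval: float   # price(vm_k^{vt_u}) per billing interval

@dataclass
class VMInstance:
    """One leased VM vm_k^{vt_u} and its lease window [LBT, LET]."""
    vm_type: VMType
    lease_begin: Optional[float] = None  # LBT, set when the VM is acquired
    lease_end: Optional[float] = None    # LET, set when the VM is released
```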

3.2 Workflow modeling

In the cloud, a workflow (W) can be modeled as \(W=\left \{G, w_{d} \right \}\), where G and wd represent the workflow's structure and deadline constraint respectively. The structure G of workflow W can be formally modeled as a directed acyclic graph (DAG), i.e., \(G=\left (T, E\right )\), where \(T=\left \{t_{1}, t_{2}, \ldots , t_{n}\right \}\) is the set of tasks and n is the task count; the vertex tj denotes the jth task in workflow W. Also, \(E \subseteq T \times T\) denotes the directed edges among tasks. An edge ek,j ∈ E of the form (tk,tj) exists if there is a precedence constraint between tasks tk and tj, where tk is an immediate predecessor of task tj and tj is an immediate successor of task tk. The set pred(tj) contains all of tj's immediate predecessors, and succ(tj) contains all of tj's immediate successors. A simple workflow is shown in Fig. 2.

Fig. 2 An example workflow
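For illustration, the workflow model W = {G, wd} can be sketched as a small DAG structure with precomputed pred/succ sets; the class and the example task names below are hypothetical.

```python
from collections import defaultdict

class Workflow:
    """W = {G, w_d} with G = (T, E); a minimal sketch of the model above."""
    def __init__(self, tasks, edges, deadline):
        self.tasks = list(tasks)      # T = {t_1, ..., t_n}
        self.edges = set(edges)       # E: (t_k, t_j) means t_k must finish before t_j starts
        self.deadline = deadline      # w_d
        self.pred = defaultdict(set)  # pred(t_j)
        self.succ = defaultdict(set)  # succ(t_j)
        for tk, tj in self.edges:
            self.pred[tj].add(tk)
            self.succ[tk].add(tj)

# Example: a small diamond-shaped workflow t1 -> {t2, t3} -> t4 with a 1-hour deadline
wf = Workflow(tasks=["t1", "t2", "t3", "t4"],
              edges=[("t1", "t2"), ("t1", "t3"), ("t2", "t4"), ("t3", "t4")],
              deadline=3600.0)
```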

3.3 Problem formulation

In workflow scheduling, let ETj,k denote the execution time of a task tj on VM \(v m_{k}^{vt_{u}}\). The execution time of task tj on an instance is calculated as the task's size (Size(tj)) divided by the processing capacity (\( PC_{v m_{k}^{vt_{u}}} \)) of the instance type vtu.

$$ ET_{j,k}=\frac{Size (t_{j} ) }{PC_{v m_{k}^{vt_{u}}}} $$
(1)

Additionally, CPU performance variation of the resource is modeled by altering the processing capacity of the VM \(v m_{k}^{vt_{u}}\) by introducing a performance degradation (\( PerDeg_{v m_{k}^{vt_{u}}} \)) parameter. So, (1) is updated as

$$ ET_{j,k}=\frac{Size (t_{j} ) }{ (PC_{v m_{k}^{vt_{u}}}\ast (1-PerDeg_{v m_{k}^{vt_{u}}} )) } $$
(2)
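Equations (1) and (2) translate directly into a one-line helper; the numeric example below is illustrative only and assumes task sizes in MFLOP and capacities in MFLOPS.

```python
def execution_time(task_size, processing_capacity, perf_degradation=0.0):
    """ET_{j,k} per Eq. (2); with perf_degradation = 0 it reduces to Eq. (1)."""
    return task_size / (processing_capacity * (1.0 - perf_degradation))

# e.g. a 36,000 MFLOP task on a 100 MFLOPS VM degraded by 12% -> about 409 s
et = execution_time(36_000, 100, perf_degradation=0.12)
```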

Further, data transfer time DTj,k across the tasks on different virtual machines is the ratio of output-data file size (\( d_{t_{j}} \)) to average bandwidth capacity β within the same data center in the cloud, as shown in (3). The data transfer time DTj,k becomes zero when both parent and child tasks are scheduled on the same VM.

$$ DT_{j,k}=\frac{d_{t_{j}}}{\beta} $$
(3)

Finally, the total processing time PTj,k of task tj on \( v m_{k}^{vt_{u}} \) is obtained using (4), where e is the out-degree of task tj and IsVMSame is zero whenever tj and tk run on the same VM and 1 otherwise.

$$ P T_{j,k}=E T_{j,k}+{\sum\limits_{1}^{e}}\left( D T_{e_{jk}} * IsVM_{S a m e}\right) $$
(4)
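Equations (3) and (4) can likewise be expressed as small helpers; in the sketch below the (DT, same_vm) pairs stand for the e outgoing edges of tj, with the indicator equal to 0 when parent and child share a VM, exactly as defined above.

```python
def transfer_time(output_size, bandwidth):
    """DT_{j,k} per Eq. (3): output-data size d_{t_j} over average bandwidth beta."""
    return output_size / bandwidth

def processing_time(et_jk, child_transfers):
    """PT_{j,k} per Eq. (4); child_transfers holds (DT, same_vm) for each outgoing edge of t_j."""
    return et_jk + sum(dt * (0 if same_vm else 1) for dt, same_vm in child_transfers)
```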

Besides, the symbols ESTj and EFTj denote the earliest start time and earliest finish time of task tj. The earliest start time is the earliest time at which task tj can begin its execution and is calculated as follows.

$$ E S T_{j}=\left\{\begin{array}{cc}{0,} & {\text{ if } \text{pred}(t_{j})=N U L L} \\ {\max\limits_{t_{p} \in \text{pred}\left( t_{j}\right)}\left\{EST_{p}+M E T_{p}+M T T_{p,j}\right\},} & {\text{ otherwise }} \end{array}\right. $$
(5)

Where MTTp,j denotes the minimum data communication time from predecessor task tp to the current task tj, and METj represents the minimum execution time of task tj over all VM types \( vm_{k}^{vt_{u}} \in VT \) in the cloud, computed as

$$ M E T_{j}=\min_{v m_{k}^{vt_{u}} {\epsilon} VT}\left\{E T_{j, k}\right\} $$
(6)

Accordingly, the earliest finish time EFTj can be computed as

$$ E F T_{j}=E S T_{j}+M E T_{j} $$
(7)

In an arbitrary workflow schedule, the latest finish time LFTj of task tj is the latest time by which the task must complete its computation so that the finish time of workflow W does not exceed the user-specified deadline wd. It is defined as

$$ L F T_{j} = \left\{\begin{array}{cc}{w_{d},} & {\text{ if } \text{succ}(t_{j}) = N U L L} \\ {\min\limits_{\!\!t_{p} \in \text{succ}\left( t_{j}\right)}\!\left\{L F T_{p} - M E T_{p} - M T T_{p,j}\right\}\!,} & {\text{ otherwise }} \end{array}\right. $$
(8)
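As a sketch, Eqs. (5) to (8) amount to two topological sweeps over the DAG. Here MET and MTT are assumed to be precomputed lookup tables keyed as in the equations, and the illustrative Workflow structure from Section 3.2 is reused.

```python
def earliest_times(wf, MET, MTT, topo_order):
    """EST (Eq. 5) and EFT (Eq. 7); topo_order must be a valid topological order of wf.tasks."""
    EST, EFT = {}, {}
    for tj in topo_order:
        if not wf.pred[tj]:
            EST[tj] = 0.0
        else:
            EST[tj] = max(EST[tp] + MET[tp] + MTT[(tp, tj)] for tp in wf.pred[tj])
        EFT[tj] = EST[tj] + MET[tj]
    return EST, EFT

def latest_finish_times(wf, MET, MTT, topo_order):
    """LFT (Eq. 8), computed by walking the tasks in reverse topological order."""
    LFT = {}
    for tj in reversed(topo_order):
        if not wf.succ[tj]:
            LFT[tj] = wf.deadline
        else:
            LFT[tj] = min(LFT[tp] - MET[tp] - MTT[(tp, tj)] for tp in wf.succ[tj])
    return LFT
```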

Due to the precedence constraints in a workflow, a task cannot start executing until it has gathered all of the data from its immediate predecessors, i.e.,

$$ F T_{p, k}+D T_{p, j} \leq S T_{j, k} \quad \forall e_{p, j} \in E $$
(9)

Where FTp,k denotes task tp's finish time on VM \( vm_{k}^{vt_{u}} \), DTp,j denotes the data transfer time between tasks tp and tj, and STj,k represents the start time of task tj on VM \( vm_{k}^{vt_{u}} \).

In workflow scheduling, the finish time of a workflow, WFT, is the maximum finish time over all its tasks and is defined as

$$ W_{FT}=\max_{t_{j} \in\left( W\right)}\left\{F T_{j, k}\right\} $$
(10)

The cost cj,k for executing task tj on VM \( vm_{k}^{vt_{u}} \) is calculated as

$$ c_{j, k}={Price}\left( v m_{k}^{v t_{u}}\right) *\left\lceil\frac{E T_{j, k}}{N_{t}}\right\rceil $$
(11)

Where \( Price (vm_{k}^{vt_{u}}) \) is the cost of VM \( vm_{k}^{vt_{u}} \) per interval of time and Nt is the length of one time interval.

To meet the deadline of a workflow, all tasks in W must finish their execution before the deadline wd. This introduces another constraint

$$ W_{FT} \leq w_{d} $$
(12)

Subject to the aforementioned constraints in (9) and (12), the primary optimization goal is to minimize the Total Execution Cost (TEC) of completing workflow W, which is computed as

$$ \text{Minimize } T E C=\sum\limits_{k=1}^{|V M|} {Price}(v m_{k}^{v t_{u}}) \cdot p_{k} $$
(13)

where |VM| represents the number of acquired VMs and pk denotes the number of time units of leased VM \( vm_{k}^{vt_{u}} \).
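Equations (11) and (13) with the round-up billing rule might be computed as follows; the sketch reuses the illustrative VMInstance structure from Section 3.1 and assumes its lease times are already set.

```python
import math

def task_cost(et_jk, price_per_interval, interval_length):
    """c_{j,k} per Eq. (11): partially used billing intervals are charged in full."""
    return price_per_interval * math.ceil(et_jk / interval_length)

def total_execution_cost(leased_vms, interval_length):
    """TEC per Eq. (13): sum over acquired VMs of price times leased billing units p_k."""
    tec = 0.0
    for vm in leased_vms:  # vm: the illustrative VMInstance from Section 3.1
        p_k = math.ceil((vm.lease_end - vm.lease_begin) / interval_length)
        tec += vm.vm_type.price_per_interval * p_k
    return tec
```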

4 Firefly algorithm

Swarm-based algorithms have received increasing attention in the research community due to advances in computing infrastructure. Fireflies are tiny winged beetles belonging to the family Lampyridae. They are capable of flashing light to attract mates. They work like a capacitor: they gradually charge to a definite threshold and then discharge the accumulated energy in the form of light. Based on this phenomenon, Yang [37] developed the firefly algorithm.

The firefly algorithm has many advantages such as:

  1. It has a high rate of convergence.

  2. It can be used as a general as well as a global problem solver.

  3. Each firefly works independently and settles at a position superior to its current position and to the positions of the other fireflies. Hence, it easily escapes local optima and converges towards the global optimum.

  4. It provides diversified levels of robustness compared to other meta-heuristic algorithms.

  5. It is simple, flexible, versatile, and efficient in addressing a wide range of diverse real-time problems.

4.1 Inspiration

The Firefly Algorithm (FA), developed by Yang [37, 46], is one of the newer meta-heuristic techniques; it idealizes the characteristics and flashing behavior of fireflies. The algorithm rests on three idealized principles [45]:

  1. Fireflies are unisex; every firefly is attracted to every other firefly, irrespective of sex.

  2. Attractiveness is directly proportional to brightness: a firefly with lower brightness moves towards a firefly with higher brightness.

  3. The brightness of a firefly is given by a fitness function that is maximized; here it determines the quality of the workflow schedule.

In accordance with the previous rules, the basic Firefly algorithm is given in Algorithm 1.

Algorithm 1 Basic firefly algorithm
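Since Algorithm 1 appears only as a figure in the source, the sketch below reconstructs the canonical firefly loop from Yang [37] for a minimization objective. It is a generic illustration over a continuous search space, not the authors' exact pseudocode and not CEFA itself.

```python
import math
import random

def firefly_optimize(objective, dim, n=25, generations=100,
                     beta0=1.0, gamma=1.0, alpha=0.2, bounds=(0.0, 1.0)):
    """Canonical minimizing firefly loop: brighter firefly = lower objective value."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    light = [objective(y) for y in pop]
    for _ in range(generations):
        for i in range(n):
            for j in range(n):
                if light[j] < light[i]:                    # firefly j is brighter than i
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))  # squared Eq. (14)
                    beta = beta0 * math.exp(-gamma * r2)   # Eq. (15)
                    pop[i] = [min(hi, max(lo, yi + beta * (yj - yi)         # Eq. (16)
                                          + alpha * (random.random() - 0.5)))
                              for yi, yj in zip(pop[i], pop[j])]
                    light[i] = objective(pop[i])
        alpha *= 0.97                                      # gradually damp the random walk
    best = min(range(n), key=lambda k: light[k])
    return pop[best], light[best]
```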

4.2 Firefly evaluation

The light-intensity computation of a firefly depends strongly on the problem under investigation. In this scenario, the quality of a workflow schedule solution is computed as the Total Execution Cost (TEC) using (13).

4.3 Distance

Let the distance between any two fireflies i and j, located at yi and yj, be rij. In the Cartesian framework, rij is defined using (14), where yj,k is the kth component of the spatial coordinate yj associated with the jth firefly and d is the dimension of the problem under investigation [46].

$$ r_{i j}=\left\|y_{i}-y_{j}\right\|=\sqrt{\sum\limits_{k=1}^{d}\left( y_{i, k}-y_{j, k}\right)^{2}} $$
(14)

4.4 Attractiveness

The attractiveness between two fireflies depends on the distance (r) between them and the light absorption coefficient (γ). The degree of attraction β(r) of a firefly is estimated by (15).

$$ \beta(r)=\beta_{0} e^{-\gamma r^{2}} $$
(15)

4.5 Movement

The movement of the firefly k at yk towards more attractive firefly j is established using (16).

$$ y_{k+1}=y_{k}+\beta_{0} e^{-\gamma r_{kj}^{2}}\left(y_{j}-y_{k}\right)+\alpha\left(rand-0.5\right) $$
(16)

4.6 Efficacy of firefly algorithm

By inheritance, FA follows all the advantages that other swarm intelligence-based algorithms have. Upon analysis, a few insights that draw attention are as follows:

  1. FA has the potential to automatically divide the whole population into subgroups, because local attraction is invariably stronger than long-distance attraction; as a result, FA naturally and efficiently handles highly nonlinear, multimodal optimization problems.

  2. FA uses neither the historical individual best nor an explicit global best, which helps it resist premature convergence. Additionally, FA does not rely on velocities as PSO does.

  3. FA can be tuned and adapted to the problem at hand by controlling scaling parameters such as γ.

5 Proposed firefly based workflow scheduling

In cloud computing, an efficient encoding mechanism can yield a solution that fulfills the QoS constraints and raises efficiency. Modeling the problem for the Firefly Algorithm involves two steps: first, the encoding mechanism used to define the problem, that is, how an initial solution is represented; second, the fitness function that measures the quality of a solution.

5.1 Map between FA and workflow scheduling

A mapping between workflow scheduling and FA is provided before describing the proposed scheduling method. The proposed mapping is as follows.

  a. The dimension of the problem (d) is mapped to the total number of tasks in the workflow.

  b. The location (yi) of each firefly is mapped to a potential solution of the workflow scheduling problem.

  c. The intensity (I) is mapped to the fitness of each solution; schedules with lower cost correspond to fireflies of higher intensity.

  d. The movement of low-intensity fireflies towards high-intensity fireflies is mapped to changing non-optimal schedules into more optimal ones.

5.2 Firefly modeling

For the workflow scheduling scenario, a firefly represents the workflow tasks; hence, the total number of tasks in the workflow determines the size of a firefly. The execution order of the workflow tasks is determined by the dependency constraints of the workflow, and an index is assigned to each task. A workflow schedule consists of three elements: the task execution order (TaskPriorityIndex), the allocation of an appropriate resource (TaskToResource), and the resource type (ResourceType). Figure 3 shows the firefly encoding for the sample workflow of Fig. 2 and clearly shows the mapping of tasks to resources. For example, task t3 is executed on virtual machine number 2 of type 1.

Fig. 3 Firefly encoding for the sample workflow of Fig. 2
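One possible in-memory form of this three-part encoding is sketched below; the field names mirror the paper's terms, while the concrete types and the example values are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FireflyPosition:
    """One candidate schedule: three parallel arrays indexed by task position."""
    task_priority_index: List[int]  # TaskPriorityIndex: execution order respecting dependencies
    task_to_resource: List[int]     # TaskToResource: VM index assigned to each task
    resource_type: List[int]        # ResourceType: index into VT for each assigned VM

# Example in the style of Fig. 3: task t3 (position 2) runs on VM #2 of type vt_1
example = FireflyPosition(
    task_priority_index=[0, 1, 2, 3],
    task_to_resource=[1, 1, 2, 3],
    resource_type=[0, 0, 1, 2],
)
```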

The fitness function should reflect the objectives of workflow scheduling, as it determines how "good" a potential solution is. Here, the fitness value is the overall execution cost of the schedule derived from the firefly's position.

5.3 Schedule generation

To handle the workflow scheduling problem, the total execution time and total execution cost of the considered workflow must be evaluated for an explicit task-resource mapping. The evaluation of the Total-Execution-Cost (TEC) and the Total-Execution-Time (TET) for a generic workflow is presented in Algorithm 2. An array VMS is initialized to maintain the provisioned resources. The algorithm evaluates the execution time of each task ti ∈ T on each type vti ∈ VT using (2) and the data transfer time DTj,k to transfer data from task tj to tk using (3). The start time of task tj (STj) has two scenarios: if the task has no parents, it can start its execution as soon as a virtual machine is allotted; otherwise, it starts as soon as all its parents have finished their execution and transferred their output. The finish time of task tj is obtained by adding STj to PTj, and the corresponding resource in VMS is then updated. This process continues until every workflow task is scheduled.

Algorithm 2 Schedule generation and evaluation of TEC and TET
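Algorithm 2 is likewise given only as a figure; the sketch below is a simplified reading of the schedule-generation loop described above. Acquisition delay and performance degradation are omitted for brevity, task_size and out_size are assumed lookup tables, and the Workflow, VMType, and FireflyPosition sketches from earlier sections are reused.

```python
import math

def evaluate_schedule(wf, position, vm_types, task_size, out_size, bandwidth, interval_length):
    """Return (TET, TEC) for one firefly position; a simplified reconstruction, not the authors' code."""
    vms = {}                                   # VMS: provisioned VMs, keyed by VM index
    finish, placed_on = {}, {}
    order = [wf.tasks[i] for i in position.task_priority_index]  # dependency-respecting order
    for pos_idx, tj in enumerate(order):
        k = position.task_to_resource[pos_idx]
        vt = vm_types[position.resource_type[pos_idx]]
        vm = vms.setdefault(k, {"type": vt, "free_at": 0.0, "start": None})
        et = task_size[tj] / vt.processing_capacity              # Eq. (1)
        ready = 0.0                                              # wait for parents and their data
        for tp in wf.pred[tj]:
            dt = 0.0 if placed_on[tp] == k else out_size[tp] / bandwidth   # Eq. (3)
            ready = max(ready, finish[tp] + dt)
        st = max(vm["free_at"], ready)                           # ST_j
        ft = st + et                                             # FT_j = ST_j + PT_j
        finish[tj], placed_on[tj] = ft, k
        vm["start"] = st if vm["start"] is None else vm["start"]
        vm["free_at"] = ft
    tet = max(finish.values())                                   # Eq. (10)
    tec = sum(v["type"].price_per_interval *
              math.ceil((v["free_at"] - v["start"]) / interval_length)    # Eq. (13)
              for v in vms.values())
    return tet, tec
```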

5.4 Initial population

The execution time of the tasks on the critical path affects the total execution time significantly, while their execution cost is only a minor part of the total execution cost. Therefore, assigning critical-path tasks to faster virtual machines significantly reduces the total execution time while having only a limited impact on the total execution cost. The diversity and quality of the initial firefly population greatly influence performance. To improve convergence speed and solution quality, in 20% of the initial population the critical-path tasks [44] are assigned to the fastest virtual machines, and in the next 20% of the initial population they are assigned to virtual machines with low processing capability; the rest of the population is generated randomly, as sketched below.
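The following is a hedged sketch of this seeding rule, under the interpretation that in 20% of the fireflies the critical-path tasks are forced onto the fastest VM type and in the next 20% onto the slowest; random_firefly is a hypothetical helper returning a dependency-respecting random FireflyPosition.

```python
def seed_population(n_fireflies, critical_path_positions, fastest_type, slowest_type,
                    random_firefly):
    """Seed the initial swarm: 20% biased fast, 20% biased slow, 60% fully random."""
    population = [random_firefly() for _ in range(n_fireflies)]
    biased = int(0.2 * n_fireflies)
    for i, firefly in enumerate(population):
        if i < biased:                         # first 20%: critical-path tasks on the fastest type
            for p in critical_path_positions:
                firefly.resource_type[p] = fastest_type
        elif i < 2 * biased:                   # next 20%: critical-path tasks on the slowest type
            for p in critical_path_positions:
                firefly.resource_type[p] = slowest_type
        # remaining 60%: left fully random
    return population
```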

6 Performance evaluation

This section presents the experiments conducted to evaluate the performance of the proposed approach.

6.1 Experimental workflows

The performance of CEFA is evaluated with workflows used in different scientific fields: LIGO, Epigenomics, CyberShake, and Montage. These workflows have diverse structural properties, such as pipeline, aggregation, distribution, and redistribution, as well as different compositions, as shown in Fig. 4. Montage, an astronomy application, stitches a series of images together to create personalized sky mosaics; its tasks are I/O-intensive but require little CPU processing capacity. The LIGO workflow aims to detect gravitational waves and requires large memory and high CPU capacity. Epigenomics is used in bioinformatics to automate genome-sequencing operations; its tasks demand high-performance computational processors with limited I/O. CyberShake is suited to characterizing earthquake hazards using synthetic seismograms; its tasks require large memory and high CPU capacity. To ease the evaluation of scheduling algorithms, Bharathi [43] developed a set of synthetic workflows of various sizes that resemble real scientific workflows. The synthetic workflows are characterized as DAGs in XML format and are available at [42]. To assess the performance of the proposed algorithm, experiments were carefully designed and carried out on the above workflows with varying numbers of tasks: small (about 25 tasks), average (about 100 tasks), and large (about 1000 tasks).

Fig. 4 Workflow structures and their characterization

6.2 Experimental settings

Cloud service providers offer various types of VMs with varying configurations. The VM configurations, based on EC2 cloud offerings [41], are shown in Table 1. It is assumed that, for each VM type, the processing capacity in terms of floating-point operations per second (FLOPS) is available from the provider or can be estimated [40]. The estimated execution time of workflow tasks on the various types of virtual machines is obtained from their processing capacity. The variation in CPU performance of each VM is modeled based on the results reported by Schad et al. [14]: each virtual machine's performance is reduced by at most 24%, following a normal distribution with a mean of 12% and a standard deviation of 10%. Similarly, the data transfer time within the same data center is increased by at most 19% [14], following a normal distribution with a mean of 9.5% and a standard deviation of 5%. The average bandwidth is set to 20 MBps, based on Amazon's Elastic Block Store [39]. The VM billing interval is set to 10 minutes, and the estimated acquisition delay is set to one minute, similar to Meena et al. [10].
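A sketch of how these perturbations might be sampled in the simulation is shown below (truncated normal draws); the exact sampling mechanism is not spelled out in the text, so this is only an assumption consistent with the stated means, deviations, and caps.

```python
import random

def sample_degradation(mean=0.12, std=0.10, cap=0.24):
    """CPU performance loss per VM: normal(12%, 10%) truncated to [0, 24%]."""
    return min(cap, max(0.0, random.gauss(mean, std)))

def sample_transfer_inflation(mean=0.095, std=0.05, cap=0.19):
    """Intra-data-center transfer slowdown: normal(9.5%, 5%) truncated to [0, 19%]."""
    return min(cap, max(0.0, random.gauss(mean, std)))
```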

Table 1 VM Instance specifications

In this work, the FA parameters considered are the number of generations (G), the number of fireflies (n), the light absorption coefficient (γ), the randomization parameter (α), and the initial attractiveness (β0). These are selected at random, and the product n × G determines the extent of the search in the solution space for obtaining an optimal solution. A high value of n × G requires longer computational time but helps to reach the best solution. In this computation, the light absorption coefficient (γ) varies from 0 to 10, similar to Fister et al. [16], and the randomization parameter takes fractional values between 0 and 1.

To evaluate the performance metrics, two types of deadline constraint, SoftDeadline and HardDeadline, are considered. We introduce a deadline factor δ, similar to Abrishami et al. [27], and vary it from 0 to 3.2 with a step length of 0.4. First, all workflow tasks are executed on the fastest resources to obtain the minimum time to run the workflow, WMET, which is the lower limit of the workflow execution time. The deadlines are then established according to the rule specified in (17).

$$ w_{d}=W_{M E T} *(1+\delta) $$
(17)

Where WMET is the minimum time required to execute the submitted workflow and δ is the deadline factor, defined as follows:

For 'HardDeadline': 0 ≤ δ ≤ 1.2

For 'SoftDeadline': 1.2 < δ ≤ 3.2

The deadline factor δ was varied in steps of 0.4.
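Equation (17) and the hard/soft split translate directly into the small helpers below.

```python
def workflow_deadline(w_met, delta):
    """w_d per Eq. (17)."""
    return w_met * (1.0 + delta)

def deadline_class(delta):
    """Hard for delta <= 1.2, soft otherwise."""
    return "HardDeadline" if delta <= 1.2 else "SoftDeadline"

deltas = [round(0.4 * i, 1) for i in range(9)]   # 0.0, 0.4, ..., 3.2
```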

6.3 Results analysis

The performance of the proposed CEFA is analyzed in terms of compliance with the deadline factor. The success rate for the different scientific workflows under different deadlines is shown in Table 2, which presents the percentage of improvement for the experimental workflows under hard and soft deadline factors. The recently published algorithms IC-PCP [27], PSO [32], Robustness-Cost-Time (RCT) [29], Robustness-Time-Cost (RTC) [29], and RWO [47] are used as baseline algorithms to evaluate the proposed solution.

Table 2 Percentage of improvement for experimental workflows with deadline factor

6.3.1 Evaluation of deadline constraints

It can be seen from Table 2 that the proposed CEFA gives an improvement of 90.5%, 84.5%, 87.5% and 87.5% in Montage, Epigenomics, LIGO, and CyberShake respectively for hard deadline condition. It is also seen that for soft deadline conditions, the proposed CEFA gives 73%, 62%, 70.5% and 66.5% improvement for Montage, Epigenomics, LIGO, and CyberShake respectively.

From Table 2 it is also observed that IC-PCP has the lowest performance among all reference algorithms under both 'HardDeadline' and 'SoftDeadline' constraints. IC-PCP does not consider the performance degradation and acquisition delay of resources, which have a substantial effect on execution cost and execution time, whereas the proposed CEFA considers performance degradation (PerDeg) and cloud resource startup time, which helps to calculate the beginning and end times of tasks and to ensure that every task is executed under the deadline constraint. The RTC and RCT algorithms work on resource-selection policies whose priorities are specified based on the Partial Critical Path (PCP) heuristic; both can tolerate the CPU performance variation of an instance only to a certain extent.

In PSO, particles are encoded based on resource indices that define the particle positions. Because an index does not carry much information about the resource, particles tend to move in various directions towards the individual best, and the global best may not lead to an ideal solution. In RWO, solutions are generated randomly from a static resource pool without much information about resource types; RWO also ignores the CPU performance variation and acquisition delay of the resources, which have a significant impact on execution cost and execution time. Consequently, if the user specifies a hard deadline, it is difficult for these approaches to generate a feasible schedule. CEFA encodes a firefly based on the TaskPriorityIndex, the TaskToResource mapping, and the ResourceType: in TaskPriorityIndex, each task is assigned an integer index based on its execution order; in TaskToResource, each task is mapped to a suitable resource; and in ResourceType, a resource of type vtu is selected from the pool of resources. This mechanism provides the necessary information about the VM, which in turn helps to comply with the deadline factor. Therefore, CEFA exceeds all baseline algorithms in meeting the user-specified deadline factor.

6.3.2 Makespan and cost evaluation (TET and TEC)

The proposed solution is expected to generate a schedule that is cost-effective under the deadline factor; therefore, the average makespan and the average execution cost are compared. For each sample workflow, the average execution cost (in $, Fig. 5b, d, f, h) and the total average execution time (in seconds, Fig. 5a, c, e, g) are depicted. The X-axis denotes the user-defined deadline factor values: a deadline factor between 0 and 1.2 corresponds to a hard deadline, and any value above 1.2 corresponds to a soft deadline.

Fig. 5 Makespan (a, c, e, g) and execution cost (b, d, f, h) for the Montage, LIGO, CyberShake, and Epigenomics workflows

As seen in Fig. 5a, c, e, g, IC-PCP generates the cheapest schedule but with the longest running time for each workflow and deadline, and hence it does not meet any of the deadlines. The goal of the algorithm, however, is to devise a feasible, optimized schedule by minimizing the cost under the deadline factor; an inexpensive schedule that violates the deadline factor is therefore not useful. Consequently, the comparison is made among the baseline algorithms whose objective includes meeting the deadlines. As seen in Fig. 5a, c, e, g, among PSO, RTC, RCT, RWO, and CEFA, the proposed algorithm generates the most cost-effective schedule for each type of workflow, with an average success rate of 87.5% for hard deadlines and 100% for soft deadlines.

As shown in Fig. 5a, for the Montage workflow under the deadline factor, CEFA gives an average makespan that is 28%, 11%, 22%, and 13% lower than RCT, PSO, RTC, and RWO respectively. A similar lower-makespan trend (Fig. 5c, e, g) is observed for the LIGO, CyberShake, and Epigenomics workflows.

Figure 5b, d, f, h presents the cost incurred in the execution of Montage, LIGO, CyberShake, and Epigenomics for RTC, RCT, PSO, RWO, and CEFA. It can be observed that the cost of the proposed CEFA is 30%, 21%, and 24% lower than RCT, PSO, and RWO respectively for the Montage workflow. A similar cost-reduction trend is observed for LIGO, CyberShake, and Epigenomics.

Thus, the results show that CEFA delivers better performance than the baseline algorithms. For hard deadlines, CEFA achieves the maximum success rate for all experimental workflows at a lower cost, and as deadlines are relaxed, CEFA further reduces the execution time and cost by capitalizing on the increased slack time.

6.3.3 Computational complexity

To derive the computational complexity, assume that the scheduled workflow W is composed of k tasks and e edges. Since W is represented as a DAG, the maximum number of edges in the workflow is \( k(k-1) / 2 \approx \mathrm {O}\left (k^{2}\right ) \). For each firefly generation, the distance, attractiveness, and objective function of the fireflies are evaluated. The extent of the search in the solution space is determined by the number of fireflies (N) and their dimension (d). The complexity of the objective function is determined by the schedule-generation algorithm and the numbers of workflow tasks (T) and resources (R). Since d = k by the formulation, the computational complexity of the proposed mechanism is \( O\left (N * T^{2} * R\right ) \) per iteration. RCT, RTC, and IC-PCP are based on heuristic approaches and execute much faster than the proposed approach: they have polynomial time complexity, whereas the time complexity of CEFA is higher, similar to other meta-heuristic algorithms [38]. Although the proposed algorithm has a higher time complexity, it provides better schedules than the existing RCT, RTC, PSO, RWO, and IC-PCP algorithms; hence this drawback can be tolerated.

7 Conclusion and future work

This paper presented a meta-heuristic Cost-Effective Firefly based Algorithm (CEFA) that minimizes the execution cost under a deadline constraint. CEFA considers parameters such as CPU performance variation and the acquisition and termination delays of Virtual Machines (VMs) to attain the proposed objectives. The CloudSim toolkit is used for simulation, and the obtained results are compared with baseline algorithms, namely IC-PCP, PSO, RWO, Robustness-Cost-Time, and Robustness-Time-Cost, on diverse real-world scientific workflows. The observations reveal that the proposed scheme provides better results in terms of cost-effective, realistic schedules. The proposed work can be extended to a real cloud environment. The present scheme considers pricing from a single cloud service provider; the work can be extended to multiple pricing schemes from various cloud service providers.