1 Introduction

The development of cloud computing technology provides a good platform for parallel applications, especially scientific workflows such as Epigenomics in bioinformatics, Montage in astronomy, and LIGO in gravitational physics. Cloud platforms offer large numbers of networked, flexible, and scalable resources and services, and users pay only for what they use. However, the inherent flexibility of cloud computing platforms, while powerful, may also lead to inefficient usage and high costs when inadequate scheduling and provisioning decisions are made [1, 2]. Therefore, effective workflow scheduling is key to ensuring that workflow applications actually benefit from the cloud computing environment.

Workflow scheduling maps each task in a workflow to a suitable resource and orders the tasks on each resource so that given metrics are met. As a well-known NP-hard problem, it has been a hot research topic in the distributed computing community for many years [3], and many scheduling algorithms have been proposed. However, most early scheduling algorithms target shared community environments, such as community grids. In general, their scheduling objective is to minimize the execution time of the workflow, and workflows with QoS constraints receive little attention.

With the development of service-oriented grids, many workflow scheduling algorithms based on QoS constraints have been proposed, such as [4,5,6,7,8,9,10,11,12]. However, these methods only consider a fixed number of resources and cannot be directly applied to the cloud computing environment, where resources can be acquired at any time and released when they are idle. Several scheduling algorithms have been proposed for workflows with QoS constraints in cloud computing environments in recent years, but the majority of this research focuses on either cost or time, such as [3, 13,14,15,16,17,18,19]. In cloud computing, however, time and cost are two of the most relevant user concerns in user-defined quality of service (QoS), and considering time and cost constraints simultaneously makes the scheduling problem even more challenging. A few scheduling approaches consider both cost and time constraints, such as [20, 21]. However, the approaches in [20] and [21] first divide the tasks in the workflow into different levels and then assign a sub-deadline to each level, so that tasks at the same level share the same sub-deadline. This level-wise division of the deadline prevents each task from making full use of the spare deadline to balance time and cost.

Towards this, we propose a new heuristic algorithm for workflow scheduling under multiple QoS constraints, namely cost and time, called Budget-Deadline Constrained Workflow Scheduling (BDCWS). The BDCWS algorithm introduces an optimistic spare budget and an optimistic spare deadline, and defines a new balancing factor and selection strategy based on them, which makes the control of task execution cost and time consumption more effective.

The main contributions of this paper are as follows. (1) A multi-QoS constrained workflow scheduling algorithm is proposed to solve the workflow scheduling problem with budget and deadline constraints in the cloud computing environment. (2) New balancing factors and selection strategies based on the optimistic spare deadline and the optimistic spare budget expand the adjustable range of time and cost, which is more conducive to balancing time and cost consumption and thereby increases the possibility of meeting both deadline and budget constraints. (3) The task execution cost is controlled twice, through the set of affordable virtual machines (built from the task optimistic available budget) and the new balance factor, which not only ensures the cost competitiveness of the scheduling result but also ensures a high success rate. (4) The algorithm is evaluated with real workflow applications, and the experimental results show that it achieves a 26.3–79.7% higher success rate than state-of-the-art algorithms. (5) The cost frequency is introduced to analyze the experimental results, which shows that the schedules generated by our algorithm are more cost-competitive.

The rest of the paper is organized as follows. After outlining the related work in Sect. 2, we describe the system scheduling model in Sect. 3. Section 4 introduces the proposed scheduling algorithm. Section 5 shows the experimental results, and Sect. 6 concludes the paper.

2 Related work

Workflow scheduling problems have been studied extensively for many years. Many of the early scheduling algorithms target shared community environments such as community grids and assume that the available computation and storage resources are limited; in general, their objective is to minimize the execution time of the workflow, and workflows with QoS constraints receive little attention. The problem becomes more challenging when QoS parameter constraints are considered. Time, cost, energy, and reliability are the QoS parameters most commonly considered in this area. Yu et al. [10] proposed a QoS workflow scheduling method based on a Markov decision process, which minimizes cost while satisfying the user-given deadline constraint in the grid. Liu et al. [11] proposed a path balance algorithm to adjust the length of each path in the workflow and a cost optimization algorithm based on path balancing. Sakellariou et al. [5] proposed two algorithms, LOSS and GAIN, for time-optimized, cost-constrained scheduling; both use other heuristics to produce an initial allocation that minimizes one objective and then apply a reassignment strategy to optimize the other objective while meeting the user's budget constraint. Zheng et al. [6, 7] proposed the Budget-constrained Heterogeneous Earliest Finish Time (BHEFT) algorithm, which optimizes the execution time of a budget-constrained workflow. Similar to BHEFT, [4] proposed the Heterogeneous Budget Constrained Scheduling (HBCS) algorithm, which defines a Cost Coefficient to adjust the ratio between the available budget and the cheapest possibility, so as to control the budget and minimize the execution time of the workflow application. However, HBCS is unfair to low-priority tasks because high-priority tasks receive more budget than low-priority tasks [12]. Chen et al. [12] proposed a scheduling algorithm, Minimizing the Schedule Length using the Budget Level (MSLBL), to resolve this unfairness of HBCS. Wu et al. [22] proposed PCP-B2, a cost-constrained, time-optimizing heuristic that uses a PCP-wise budget distribution mechanism to balance the budget among partial critical paths according to their sequential or parallel structure. Prodan et al. [9] proposed a general bi-criteria scheduling heuristic called the dynamic constraint algorithm (DCA), which models the scheduling problem as an extension of the multiple-choice knapsack problem and uses dynamic programming to optimize two criteria: DCA first selects a primary criterion and optimizes it, then establishes a sliding constraint and optimizes the secondary criterion within that constraint. Arabnejad et al. [8] proposed a deadline-budget constrained scheduling (DBCS) algorithm, which defines a quality measure to balance time and cost. Sun et al. [23] proposed a scheduling algorithm for workflow applications under budget and deadline constraints that uses a sub-deadline for each task to assign its priority.

The above methods offer valuable experience regarding the opportunities and challenges of workflow scheduling in a grid environment. However, cloud computing environments differ greatly from grid environments in their resource provisioning and pricing mechanisms [13]. References [24,25,26,27,28,29] classify and describe many workflow scheduling methods for cloud computing environments. Since the scheduling constraints in this paper are time and cost, only these two QoS parameters are considered in our review of previous work. The relevant algorithms fall into three categories: constraining one parameter while optimizing the other, bi-parameter optimization, and bi-parameter constraint.

Concerning cost optimization under a deadline constraint, Mao et al. [30] proposed an "auto-scaling" mechanism that automatically increases or decreases computing resources based on workload information to minimize execution cost while meeting deadlines. Abrishami et al. [31] designed two algorithms based on the Partial Critical Path (PCP) [32]: the one-phase IaaS Cloud Partial Critical Paths (IC-PCP) algorithm and the two-phase IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2) algorithm; both aim to minimize workflow execution cost while meeting a user-defined deadline. Calheiros et al. [33] presented the Enhanced IC-PCP with Replication (EIPR) algorithm, which improves IC-PCP by applying task replication to increase the chances of meeting deadlines. Rodriguez et al. [13] proposed a Particle Swarm Optimization (PSO) based meta-heuristic to minimize the overall workflow execution cost while meeting deadline constraints. Hong et al. [18] proposed a hybrid distribution estimation algorithm to optimize cost under deadline constraints and setup times in cloud computing. Arabnejad et al. [14] introduced a new heuristic scheduling algorithm, Deadline Distribution Ratio (DDR), to minimize the cost of cloud computing resources while meeting a given deadline. Anwar et al. [34] proposed a dynamic scheduling algorithm for bag-of-tasks based workflows that meets user-defined deadline constraints while minimizing cost. Sahni et al. [15] proposed a dynamic, cost-effective, deadline-constrained heuristic algorithm for scheduling scientific workflows in public clouds. Singh et al. [19] proposed the Partition Problem based Dynamic Provisioning and Scheduling (PPDPS) algorithm to optimize the execution cost of deadline-constrained workflow applications. Meena et al. [35] proposed a cost-effective meta-heuristic genetic algorithm to minimize workflow execution cost while meeting the deadline in cloud computing environments. Wu et al. [3] proposed a meta-heuristic algorithm, L-ACO, and a simple heuristic, ProLiS, to minimize workflow execution cost in the cloud under a deadline constraint.

Concerning makespan optimization under a budget constraint, Wu et al. [36] rigorously proved that the budget-constrained scheduling problem is not only NP-complete but also incomparable, and designed a heuristic solution for it. Ghafouri et al. [16] proposed the CB-DT (Constrained Budget-Decreased Time) scheduling algorithm to reduce makespan while meeting the budget constraint of workflow applications; to obtain a smaller makespan, the algorithm uses back-tracking to select faster and more expensive resources for critical tasks whenever possible. Rodriguez et al. [17] proposed a budget-driven algorithm with fine-grained billing periods whose purpose is to optimize how the budget is spent in order to minimize the application's makespan (i.e., total execution time). Arabnejad et al. [37] proposed the Budget Distribution with Trickling (BDT) scheduling algorithm and concluded that biasing the budget distribution toward the earlier part of the workflow results in a lower makespan within the budget. Faragardi et al. [38] proposed the Greedy Resource Provisioning and modified HEFT (GRP-HEFT) algorithm to minimize the makespan of a given workflow subject to a budget constraint under the hourly-based cost model of modern IaaS clouds. GRP-HEFT consists of (i) a resource provisioning algorithm and (ii) a scheduling algorithm; the provisioning algorithm ranks the instance types by their efficiency rate (capacity divided by cost), and for the scheduler, [38] modified the HEFT algorithm to consider a budget limit. Rizvin et al. [39] proposed the Fair Budget-Constrained Workflow Scheduling algorithm (FBCWS) to minimize makespan while satisfying the budget constraint and scheduling every task fairly. Chakravarthi et al. [40] proposed the Normalization based Budget constraint Workflow Scheduling algorithm (NBWS), which controls the resource selection range of each task according to the available budget, thereby improving the probability of selecting the 'best' resource for each task.

With regard to bi-parameter optimization, Su et al. [41] proposed a cost-efficient scheduling algorithm based on the concept of Pareto dominance, which produces the same makespan as the Heterogeneous Earliest Finish Time (HEFT) algorithm [42] while significantly reducing cost. Zhu et al. [43] proposed an evolutionary multi-objective optimization (EMO) algorithm to solve the multi-objective cloud scheduling problem of minimizing both makespan and cost. Choudhary et al. [44] proposed a hybrid of the Gravitational Search Algorithm (GSA) and HEFT for bi-objective workflow scheduling in cloud computing, i.e., optimizing time and cost, but it assumes a fixed number of virtual machines.

Regarding the bi-parameter constraint, Malawski et al. [45] proposed several dynamic (online) and static (offline) algorithms for a given budget and deadline, but these algorithms consider only one instance type. Verma et al. [46] proposed Bi-Criteria Priority based Particle Swarm Optimization (BPSO), which solves workflow scheduling problems in the cloud under deadline and budget constraints; the algorithm produces good schedules but has a high time complexity. In [47], Verma et al. proposed the Budget and Deadline Constraint Heterogeneous Earliest Finish Time (BDHEFT) algorithm, an extension of HEFT, for workflow scheduling with deadline and budget constraints. Similar to BDHEFT, Arabnejad et al. [20] proposed the Budget Deadline Aware Scheduling (BDAS) algorithm for workflow scheduling under budget and deadline constraints in the cloud. BDAS first divides the tasks in the workflow into different levels, then allocates a budget and sub-deadline to each level, and finally selects resources for each task based on a cost-time trade-off factor. The experimental results in [20] show that BDAS outperforms BDHEFT. Ghasemzadeh et al. [21] proposed the Deadline-Budget Workflow Scheduling (DBWS) algorithm, which, like BDAS, also divides the tasks in the workflow into different levels.

Most of the scheduling strategies mentioned above focus on constraining one QoS parameter while optimizing another. In this paper, our goal is to propose a novel workflow scheduling algorithm that finds a feasible solution for a workflow under both budget and deadline constraints. To evaluate the performance of our scheduling strategy, we compare it with the two most recent low-complexity algorithms based on budget and deadline constraints, namely BDAS [20] and DBWS [21].

3 System scheduling model

The resource model, application model, and scheduling objective are described in this section.

The resource model assumed in this work consists of virtualized resources provided by an IaaS cloud service provider. The provider offers a variety of virtual machine (VM) types, represented as \({\text{VMT}} = \{ vmt_{1} ,vmt_{2} ,...,vmt_{{|{\text{VMT}}|}} \}\). The processing power of the VM types can vary, due to (i) different MIPS rates of the virtual processors, (ii) different numbers of virtual cores, and (iii) different memory sizes, storage capacities, and storage access times. Cloud providers specify the processing power of the provided VM types using a metric named Compute Unit (CU) (e.g., the ECU used by Amazon EC2) [38]. \(CU(vmt_{k} )\) denotes the compute unit of \(vmt_{k}\).

In the same way as in [20, 21], all VMs are assumed to be in the same data center or region, and the average bandwidth between virtual machines is roughly equal. We assume that there is no limit on the number of VM instances leased (used) by an application. When a VM is leased, it requires an initial boot time for proper initialization before it becomes available to the user; similarly, when the VM is released, it takes some time to shut down properly. In addition, the pricing model is a pay-as-you-go billing scheme in which users are charged for the number of time intervals they lease a VM. We use the fine-grained (1-s) billing period provided by major providers such as Amazon EC2 [48], Google Compute Engine [49], and Microsoft Azure [50]. Fine-grained billing has only emerged in cloud computing in the past few years; coarse-grained billing periods limit flexibility and easily cause unavoidable waste [17]. In September 2017, Amazon announced that EC2 would use per-second billing from October 2017 [51], and to protect user benefits and increase competitiveness, many other cloud providers, such as Google Compute Engine [49] and Microsoft Azure [50], have also introduced per-second billing.

Since internal data transfer is free in most cloud environments, the data transfer cost is assumed to be zero.

3.1 Application model

A typical workflow application can generally be described as a Directed Acyclic Graph (DAG). A DAG is modeled as a two-tuple \(G = (T,E)\), where \(T = \{ t_{1} ,t_{2} ,...,t_{n} \}\) is the set of nodes, each representing a workflow task, and E is the set of directed edges, each representing a dependency between two task nodes. That is, for every \(e_{i,j} \in E\), task \(t_{i}\) must finish its execution and transfer the resulting data to resolve the data dependency before task \(t_{j}\) starts; thus, \(t_{i}\) is an immediate predecessor of \(t_{j}\) and \(t_{j}\) is an immediate successor of \(t_{i}\).

In a given DAG, a task with no predecessor is called an entry task, and a task with no successor is called an exit task. It is assumed that a DAG has only one entry task and one exit task. If there is more than one entry task (exit task) in a DAG, a dummy entry task (exit task) with 0 weight and 0 communication can be added to the graph.
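For concreteness, the sketch below shows one possible in-memory representation of the resource and application models described above; the class and field names (VMType, Task, ref_time, and so on) are our own illustrative choices and are not prescribed by the paper.

```python
from dataclasses import dataclass, field

@dataclass
class VMType:
    """A VM type offered by the IaaS provider (Sect. 3)."""
    name: str
    cu: float       # compute units, CU(vmt_k)
    price: float    # monetary cost per billing period, VMC(vmt_k)

@dataclass
class Task:
    """A node of the workflow DAG G = (T, E)."""
    tid: int
    ref_time: float                             # reference execution time for CU = 1
    succ: dict = field(default_factory=dict)    # successor id -> data_{i,j} to transfer
    pred: dict = field(default_factory=dict)    # predecessor id -> data_{j,i} received
```

A dummy entry or exit task can then be added as a Task with zero ref_time and zero data on its edges, matching the convention above.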

3.2 Scheduling objective

The objective of our algorithm is to find a feasible schedule for the workflow that meets the user-defined QoS constraints, i.e., the finish time of the workflow must not exceed the user-defined time constraint (\(DEADLINE\)), and the total cost must not exceed the user-defined cost constraint (\(BUDGET\)). If a schedule satisfies both the time and cost constraints of the workflow, it is considered successful; otherwise, it is considered a failure.

4 Algorithm design

This section first gives the basic definitions used in the proposed algorithm (Sect. 4.1) and then presents the algorithm itself (Sect. 4.2). Table 1 summarizes the common notations used throughout this paper.

Table 1 List of notations

4.1 Basic definition

Definition 1

The execution time of task \(t_{i}\) on virtual machine \(vm_{k}\) is represented by \(ET_{{t_{i} }}^{{vm_{k} }}\). For the workflows considered here, Juve et al. [52] published the execution times of tasks on Xeon@2.33 GHz CPUs (CU = 8). The execution time of a task is inversely proportional to the CU value [38]. In the same way as in [38], we use the execution time of a task on a VM with \(CU\) = 1 as its reference execution time, denoted by \(ref\_time\). Accordingly, \(ET_{{t_{i} }}^{{vm_{k} }}\) is given by:

$$ ET_{{t_{i} }}^{{vm_{k} }} = \frac{{ref\_time(t_{i} )}}{{CU(vmt(vm_{k} ))}} $$
(1)

Definition 2

The data transfer time between the tasks \(t_{i}\) and \(t_{j}\) is defined as:

$$ TT_{i,j} = \left\{ {\begin{array}{*{20}c} {data_{i,j} /bw} & {vm(t_{i} ) \ne vm(t_{j} )} \\ {0,} & {vm(t_{i} ) = vm(t_{j} )} \\ \end{array} } \right. $$
(2)

where \(data_{i,j}\) is the amount of data that \(t_{i}\) sends to \(t_{j}\), and \(bw\) is the communication bandwidth between VMs. If \(t_{i}\) and \(t_{j}\) are assigned to the same VM (denoted by \( \, vm(t_{i} ) = vm(t_{j} )\)), \(TT_{i,j}\) becomes 0.
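As an illustrative sketch, Definitions 1 and 2 translate directly into code, reusing the hypothetical Task and VMType classes from Sect. 3.1:

```python
def exec_time(task, vm_type):
    """Eq. (1): ET = reference execution time divided by the CU of the VM's type."""
    return task.ref_time / vm_type.cu

def transfer_time(data, bw, same_vm):
    """Eq. (2): transfer time is data/bw, or zero when both tasks share a VM."""
    return 0.0 if same_vm else data / bw
```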

Definition 3

All immediate predecessors of task \(t_{i}\) are defined as:

$$ pred(t_{i} ) = \{ t_{j} |(t_{j} ,t_{i} ) \in E\} $$
(3)

Definition 4

All immediate successors of task \(t_{i}\) are defined as:

$$ succ(t_{i} ) = \{ t_{j} |(t_{i} ,t_{j} ) \in E\} $$
(4)

Definition 5

\(sched_{{vm_{k} }}\) denotes a set of tasks that are scheduled to be on \(vm_{k}\).

$$ sched_{{vm_{k} }} = \{ t_{i} |vm(t_{i} ) = vm_{k} \} $$
(5)

where \(vm(t_{i} )\) is the VM which is assigned to the task \(t_{i}\).

Definition 6

The start time of the task \(t_{i}\) on the VM \(vm_{k}\) is calculated as follows:

$$ ST_{{t_{i} }}^{{vm_{k} }} = \max \left\{ {avail_{{vm_{k} }} ,\mathop {\max }\limits_{{t_{j} \in pred(t_{i} )}} \{ FT_{{t_{j} }} + TT_{j,i} \} } \right\} $$
(6)

where \(avail_{{vm_{k} }}\) is the available time of \(vm_{k}\).

$$ avail_{{vm_{k} }} = \left\{ {\begin{array}{*{20}c} {boot\_time_{{vm_{k} }} ,} & {sched_{{vm_{k} }} = \phi } \\ {FT_{{t_{L} }}^{{vm_{k} }} ,} & {sched_{{vm_{k} }} \ne \phi } \\ \end{array} } \right. $$
(7)

where \(boot\_time_{{vm_{k} }}\) is the VM startup/boot time, and \(t_{L}\) is the last task in the sorted list of tasks scheduled on virtual machine \(vm_{k}\) (\(sched_{{vm_{k} }}\)).

Definition 7

The Finish Time of the task \(t_{i}\) on \(vm_{k}\) is calculated as follows:

$$ FT_{{t_{i} }}^{{vm_{k} }} = ST_{{t_{i} }}^{{vm_{k} }} + ET_{{t_{i} }}^{{vm_{k} }} $$
(8)
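A minimal sketch of Definitions 6–8, assuming a leased-VM object with illustrative bookkeeping fields boot_time, schedule (a list of (task id, finish time) pairs), and vm_type, none of which are prescribed by the paper; exec_time is reused from the sketch above.

```python
def vm_available_time(vm):
    """Eq. (7): boot time for an empty VM, otherwise the finish time of its last task."""
    return vm.boot_time if not vm.schedule else vm.schedule[-1][1]

def start_time(task, vm, finish, placement, bw):
    """Eq. (6): earliest start time of `task` on `vm`; finish[j] and placement[j]
    hold the finish time and the assigned VM of each already-scheduled task j."""
    ready = vm_available_time(vm)
    for j, data in task.pred.items():                  # immediate predecessors
        tt = 0.0 if placement[j] is vm else data / bw  # Eq. (2)
        ready = max(ready, finish[j] + tt)
    return ready

def finish_time(task, vm, st):
    """Eq. (8): FT = ST + ET."""
    return st + exec_time(task, vm.vm_type)
```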

Definition 8

\(DAG_{makespan}\) indicates makespan or schedule length, which is the finish time of the exit task of the workflow:

$$ DAG_{makespan} = FT_{exit} $$
(9)

Definition 9

The execution cost of task \(t_{i}\) on \(vm_{k}\) is calculated as follows:

$$ {\text{Cost}}_{{t_{i} }}^{{vm_{k} }} = \left\{ {\begin{array}{*{20}c} {\left\lceil {\frac{{FT_{{t_{i} }}^{{vm_{k} }} - ST_{{t_{i} }}^{{vm_{k} }} }}{IT}} \right\rceil *VMC(vmt(vm_{k} )),} & {sched_{{vm_{k} }} = \phi } \\ {0,} & {FT_{{t_{i} }}^{{vm_{k} }} \le RT_{{vm_{k} }} \;{\text{and}}\;sched_{{vm_{k} }} \ne \phi } \\ {\left\lceil {\frac{{FT_{{t_{i} }}^{{vm_{k} }} - RT_{{vm_{k} }} }}{IT}} \right\rceil *VMC(vmt(vm_{k} )),} & {otherwise} \\ \end{array} } \right. $$
(10)

where \(VMC(vmt(vm_{k} ))\) is the monetary cost per charging unit (price) of VM type \(vmt(vm_{k} )\), \(IT\) is the billing period, and \(RT_{{vm_{k} }}\) denotes the end of the last billing interval used by the last scheduled task on \(vm_{k}\).

$$ RT_{{vm_{k} }} \, = \left\lceil {\frac{{FT_{{t_{L} }}^{{vm_{k} }} }}{IT}} \right\rceil *IT $$
(11)

where \(t_{L}\) is the last task in the sorted list of tasks scheduled on \(vm_{k}\).
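The billing logic of Eqs. (10) and (11) can be sketched as follows; the field vm.rt caching \(RT_{{vm_{k}}}\) is our own bookkeeping convention, not notation from the paper.

```python
import math

def task_cost(st, ft, vm, interval, price):
    """Eq. (10): incremental billed cost of a task with start/finish times [st, ft] on `vm`.
    vm.rt caches RT_vm (Eq. 11), the end of the last billing interval already paid for."""
    if not vm.schedule:                                # first task placed on this VM
        return math.ceil((ft - st) / interval) * price
    if ft <= vm.rt:                                    # fits entirely in paid-for intervals
        return 0.0
    return math.ceil((ft - vm.rt) / interval) * price

def update_rt(vm, interval):
    """Eq. (11): recompute RT_vm from the finish time of the VM's last scheduled task."""
    vm.rt = math.ceil(vm.schedule[-1][1] / interval) * interval
```

Summing task_cost over all tasks of the final mapping then gives \(DAG_{cost}\) of Eq. (12).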

Definition 10

\(DAG_{cost}\) represents the overall cost of executing a workflow application and is defined as:

$$ DAG_{\text{cost}} = \sum\limits_{{t_{i} \in T}} {\{ {\text{Cost}}_{{t_{i} }}^{{vm_{k} }} |t_{i} \in sched_{{vm_{k} }} \} } $$
(12)

Definition 11

\(ST_{{t_{curr} }}^{best}\) is the best start time of the task \(t_{curr}\), that is, its start time on the VM with the earliest finish time among all tested VMs.

4.2 The BDCWS algorithm

The BDCWS algorithm consists of two main phases, the task selection phase and the VM selection phase.

4.2.1 Task selection phase

Tasks are selected according to their priorities. To assign a priority to a task in the DAG, the upward rank (\(urank\)) is computed. For a task \(t_{i}\), \(urank\) represents the length of the longest path from \(t_{i}\) to the exit task \(t_{exit}\), including the execution time of task \(t_{i}\), and it is given by Eq. (13).

$$ urank_{{t_{i} }} = ET_{{t_{i} }}^{\min } + \mathop {\max }\limits_{{t_{child} \in succ(t_{i} )}} \{ urank_{{t_{child} }} + TT_{i,child} \} $$
(13)

where \(ET_{{t_{i} }}^{\min }\) is the minimum execution time of task \(t_{i}\) among all VM types. For the exit node, \(urank_{{t_{exit} }}\) = \(ET_{{t_{exit} }}^{\min }\).

The task with the highest \(urank\) value receives the highest priority, followed by the task with the next highest \(urank\) value, and so on.
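For illustration, a sketch of the task selection phase, computing Eq. (13) by memoised recursion over successors and then ordering tasks by decreasing \(urank\); min_et is an assumed lookup of \(ET^{\min}\) per task and bw the average bandwidth.

```python
def upward_ranks(tasks, min_et, bw):
    """Eq. (13): upward rank of every task; `tasks` maps task id -> Task."""
    ranks = {}

    def urank(i):
        if i not in ranks:
            longest = 0.0
            for j, data in tasks[i].succ.items():
                longest = max(longest, urank(j) + data / bw)  # urank(child) + TT_{i,child}
            ranks[i] = min_et[i] + longest                    # exit task: urank = ET^min
        return ranks[i]

    for i in tasks:
        urank(i)
    order = sorted(tasks, key=lambda i: ranks[i], reverse=True)  # task selection order
    return order, ranks
```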

4.2.2 VM selection phase

The VM selection phase is responsible for selecting the appropriate VM for the current task \(t_{curr}\). To control the time and cost consumed during the scheduling process, a limit value for each factor is needed. We define two variables, TOD (Task Optimistic Deadline) and TOAB (Task Optimistic Available Budget), as limits for time and cost, as shown in Eqs. (14) and (16), respectively.

$$ TOD_{{t_{curr} }} = ST_{{t_{curr} }}^{best} + OSD_{{t_{curr} }} + ET_{{t_{curr} }}^{\min } $$
(14)

where \(ET_{{t_{curr} }}^{\min }\) is the minimum execution time of task \(t_{curr}\) among all VM types, and \(OSD_{{t_{curr} }}\) denotes the optimistic spare deadline of the current task \(t_{curr}\), as shown in Eq. (15).

$$ OSD_{{t_{curr} }} = DEADLINE - ST_{{t_{curr} }}^{best} - urank_{{t_{curr} }} $$
(15)

where \(urank_{{t_{curr} }}\) is the \(urank\) for the current task \(t_{curr}\).

$$ TOAB_{{t_{curr} }} = \frac{{ET_{{t_{curr} }}^{\max } }}{IT}*VMC(vmt(vm_{\max } )) + OSB $$
(16)

where \(ET_{{t_{curr} }}^{\max }\) is the maximum execution time of the current task \(t_{curr}\) among all VM types. \(OSB\) denotes the optimistic spare budget, which equals the budget minus the sum of the cost of the allocated tasks and the cheapest cost of the unscheduled tasks, as follows:

$$ OSB = BUDGET - \left[ {\sum\limits_{{t_{i} \in T_{assigned} }} {{\text{Cost}}_{{t_{i} }}^{{vm_{sel} }} } + \sum\limits_{{t_{i} \in T - T_{assigned} }} {{\text{Cost}}_{{t_{i} }}^{\min } } } \right] $$
(17)

where \(vm_{sel}\) is the VM selected to run the corresponding allocated task, \(Cost_{{t_{i} }}^{\min }\) denotes the minimum execution cost of task \(t_{i}\) among all VM types, and \(T_{assigned}\) is the set of allocated tasks.
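Equations (14)–(17) are simple arithmetic; a direct sketch with our own function and argument names follows.

```python
def osd(deadline, best_st, urank_curr):
    """Eq. (15): optimistic spare deadline of the current task."""
    return deadline - best_st - urank_curr

def tod(best_st, osd_curr, min_et_curr):
    """Eq. (14): task optimistic deadline."""
    return best_st + osd_curr + min_et_curr

def osb(budget, assigned_cost, cheapest_remaining_cost):
    """Eq. (17): budget minus (cost of assigned tasks + cheapest cost of unscheduled tasks)."""
    return budget - (assigned_cost + cheapest_remaining_cost)

def toab(max_et_curr, interval, price_of_max_vm, osb_curr):
    """Eq. (16): task optimistic available budget."""
    return (max_et_curr / interval) * price_of_max_vm + osb_curr
```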

In the VM selection phase, to guarantee that the workflow can be executed without exceeding the budget constraint, all \(vm_{j} \in R\) are first filtered by \(TOAB_{{t_{curr} }}\) to construct an affordable set \(S_{affordable} (t_{curr} )\), where \(R\) represents the set of resources (VM instances) used in previous scheduling steps.

$$ S_{affordable} (t_{curr} ) = \{ vm_{j} |{\text{Cost}}_{{t_{curr} }}^{{vm_{j} }} \le TOAB_{{t_{curr} }} ,vm_{j} \in {\text{R}}\} $$
(18)

Then, the VM is selected using the following selection rules:

(1) If \(S_{affordable} (t_{curr} ) = \phi\) and \(TOD_{{t_{curr} }} \le 0\), the VM with the earliest finish time in \(R \cup R^{^{\prime}}\) is selected to execute the current task \(t_{curr}\), where \(R^{^{\prime}}\) is defined as the set containing one temporary VM of each available type in \(VMT\).

(2) If \(S_{affordable} (t_{curr} ) = \phi\) and \(TOD_{{t_{curr} }} > 0\), the cheapest VM in \(R \cup R^{^{\prime}}\) is selected to execute the current task \(t_{curr}\).

(3) If \(S_{affordable} (t_{curr} ) \ne \phi\), first, all \(vm_{j} \in R^{^{\prime}}\) are filtered with \(TOAB_{{t_{curr} }}\) to build the affordable set \(S^{\prime}_{affordable} (t_{curr} )\) (Eq. 19). Then, for all \(vm_{j} \in S_{affordable} (t_{curr} ) \cup S^{\prime}_{affordable} (t_{curr} )\) with \(FT_{{t_{curr} }}^{{vm_{j} }} < TOD_{{t_{curr} }}\), the Bi-factor (BF) values are calculated (Eq. 20). Finally, the VM is selected for the current task \(t_{curr}\) according to (a) and (b) below:

  (a) If there exists \(vm_{j} \in S_{affordable} (t_{curr} ) \cup S^{\prime}_{affordable} (t_{curr} )\) such that \(FT_{{t_{curr} }}^{{vm_{j} }} < TOD_{{t_{curr} }}\), the VM with the smallest BF value is selected for the current task \(t_{curr}\).

  (b) Otherwise, the VM with the earliest finish time in \(S_{affordable} (t_{curr} ) \cup S^{\prime}_{affordable} (t_{curr} )\) is selected to execute the current task \(t_{curr}\).

$$ S^{\prime}_{affordable} (t_{curr} ) = \{ vm_{j} |{\text{Cost}}_{{t_{curr} }}^{{vm_{j} }} \le TOAB_{{t_{curr} }} ,vm_{j} \in R^{\prime}\} $$
(19)
$$ BF_{{t_{curr} }}^{{vm_{j} }} = TF_{{t_{curr} }}^{{vm_{j} }} + CF_{{t_{curr} }}^{{vm_{j} }} $$
(20)
where \(TF_{{t_{curr} }}^{{vm_{j} }}\) and \(CF_{{t_{curr} }}^{{vm_{j} }}\) represent the time factor and cost factor, respectively, as given in Eqs. (21) and (22).

$$ TF_{{t_{curr} }}^{{vm_{j} }} = FT_{{t_{curr} }}^{{vm_{j} }} /\left( {\frac{{ET_{{t_{curr} }}^{{vm_{j} }} }}{{urank_{{t_{curr} }}^{{}} }}*OSD_{{t_{curr} }} } \right) $$
(21)
$$ CF_{{t_{curr} }}^{{vm_{j} }} = \frac{{cost_{{t_{curr} }}^{{vm_{j} }} }}{OSB} $$
(22)
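To summarize the selection logic, the sketch below implements the balance factor of Eqs. (20)–(22) and the selection rules (1)–(3); the candidate-building step (computing FT, ET, and cost for every VM in \(R \cup R^{\prime}\)) is omitted, the dictionary keys are our own naming, and reading "cheapest VM" in rule (2) as the lowest task cost is our assumption.

```python
def balance_factor(ft, et, urank_curr, osd_curr, cost, osb_curr):
    """Eqs. (20)-(22): BF = TF + CF for one candidate VM (assumes OSD and OSB > 0)."""
    tf = ft / ((et / urank_curr) * osd_curr)   # Eq. (21)
    cf = cost / osb_curr                       # Eq. (22)
    return tf + cf

def select_vm(cands, tod_curr, toab_curr, urank_curr, osd_curr, osb_curr):
    """VM selection rules of Sect. 4.2.2. `cands` holds one dict per VM in R ∪ R' with
    keys 'vm', 'ft', 'et', 'cost' and 'in_R' (True for already-leased VMs)."""
    S = [c for c in cands if c['in_R'] and c['cost'] <= toab_curr]            # Eq. (18)
    if not S:
        if tod_curr <= 0:                                                     # rule (1)
            return min(cands, key=lambda c: c['ft'])
        return min(cands, key=lambda c: c['cost'])                            # rule (2)
    S_prime = [c for c in cands if not c['in_R'] and c['cost'] <= toab_curr]  # Eq. (19)
    pool = S + S_prime                                                        # rule (3)
    meets_tod = [c for c in pool if c['ft'] < tod_curr]
    if meets_tod:                                                             # (a): smallest BF
        return min(meets_tod, key=lambda c: balance_factor(
            c['ft'], c['et'], urank_curr, osd_curr, c['cost'], osb_curr))
    return min(pool, key=lambda c: c['ft'])                                   # (b): earliest FT
```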

The pseudo code of BDCWS is shown in Table 2.

Table 2 The BDCWS algorithm

In the algorithm, the \(urank\) values of all tasks are first calculated (line 1). Then, in the while loop of lines 2–35, the algorithm tries to allocate a VM to every task. At each iteration, line 3 selects the task with the highest \(urank\) as the current task \(t_{curr}\), and lines 4–5 calculate its \(TOD_{{t_{curr} }}\) and \(TOAB_{{t_{curr} }}\) values. Line 6 constructs the set of affordable resources \(S_{affordable} (t_{curr} )\) for the current task \(t_{curr}\); then the algorithm selects a VM for \(t_{curr}\) according to the selection rules implemented in lines 7–32. Finally, line 33 adds the mapping of the current task \(t_{curr}\) to \(vm_{sel}\) to the MAP.

The time complexities of each step in the algorithm are as follows:

(a) The complexity of calculating the \(urank\) of all tasks is \(O(e + n)\), where \(e\) is the number of edges in the DAG and \(n\) is the number of nodes.

(b) At each step of the VM selection phase, the complexity of calculating the Task Optimistic Available Budget TOAB is \(O(n \cdot m)\), where \(m\) is the number of available VMs.

(c) At each step of the VM selection phase, the complexities of calculating the Task Optimistic Deadline TOD and of constructing \(S_{affordable} (t_{curr} )\) are each \(O(m)\).

Thus, the total time is \(O(e + n + n \cdot (n \cdot m + m + m))\), resulting in a total algorithm complexity of \(O(n^{2} \cdot m)\).

5 Experimental evaluation

Since it is very difficult to perform repeatable experiments on real datacenters, we implement our proposed approach on the simulation platform of [3], which uses Java 2 Standard Edition V1.7.0 and is available at https://github.com/wuquanwang/workflow. To evaluate the proposed algorithm, the DBWS [21] and BDAS [20] algorithms, the two most recent heuristic scheduling algorithms for deadline and budget constraints, are chosen as baselines for comparison with BDCWS.

In the experiments, we used the 9 VM types from [3], each with different processing power and cost; the details are shown in Table 3. The average bandwidth between VMs is set to 20 Mbps, which is the approximate average bandwidth between computing services in Amazon EC2 [53]. The billing period and service startup time are set to 1 s and 97 s [54], respectively.

Table 3 Capabilities and costs of available service types

In addition, the same approach (Eq. 10) is used to calculate the execution cost of each task for all algorithms.

5.1 Budget and deadline constraints

To evaluate the proposed algorithm, a predetermined deadline and budget constraint must be defined for each workflow, as given in Eqs. (23) and (24).

$$ DEADLINE = \min_{D} *(0.8 + \alpha_{D} ) $$
(23)
$$ BUDGET = \min_{B} *(0.8 + \alpha_{B} ) $$
(24)

where \(\min_{{\text{D}}}\) represents the makespan of scheduling the target workflow using HEFT on the fastest virtual machine set, \(\min_{B}\) represents the total execution cost of scheduling the target workflow using HEFT on the cheapest virtual machine set, and \(\alpha_{D}\) and \(\alpha_{B}\) represent the time parameter and cost parameter, respectively.

In the experiments, we let \(\alpha_{D}\) = [0.2, 0.4, 0.6, 0.8, 1, 2, 3, 4] and \(\alpha_{B}\) = [0.2, 0.4, 0.6, 0.8, 1, 2, 3, 4].
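For reference, a small sketch of how the constraints of Eqs. (23) and (24) are derived from the HEFT baselines:

```python
def constraints(min_d, min_b, alpha_d, alpha_b):
    """Eqs. (23)-(24): derive DEADLINE and BUDGET from the HEFT reference values."""
    return min_d * (0.8 + alpha_d), min_b * (0.8 + alpha_b)

# The tightest setting used in the experiments, alpha_D = alpha_B = 0.2, yields
# DEADLINE = min_D and BUDGET = min_B, i.e. exactly the HEFT reference values.
```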

5.2 Performance metrics

To compare our algorithm with the other algorithms, the following performance metrics are used.

  • Planning Successful Rate (PSR): the percentage of experiments in which a feasible schedule satisfying the user-defined deadline and budget is found, as expressed by Eq. (25):

    $$ PSR = 100 \times \frac{\begin{gathered} Number\,of\, experiments\,that \hfill \\ successfully\,meet\,deadline\,and\,budget \hfill \\ \end{gathered} }{Total\,number\,of\, experiments} $$
    (25)
  • Makespan to Deadline Ratio (MDR): the ratio of the achieved makespan to the defined deadline for each workflow, as expressed by Eq. (26):

    $$ MDR = \frac{{DAG_{makespan} }}{DEADLINE} $$
    (26)
  • Cost to Budget Ratio (CBR): the ratio of the execution cost of the produced schedule to the defined budget for each workflow, as expressed by Eq. (27):

    $$ CBR = \frac{{DAG_{cost} }}{BUDGET} $$
    (27)
  • Cost frequency of algorithm A relative to algorithm B: the cost frequency includes the best cost frequency, the worse cost frequency and the equal cost frequency, and the calculation formulas are given by Eqs. (28), (29) and (30), respectively.

    $$ Best \, cost \, frequency = \frac{Number \, of \, best \,cost}{{Total \, number \, of \, experiments}} $$
    (28)
    $$ Worse \, cost \, frequency = \frac{Number \, of \, worse \, cost}{{Total \, number \, of \, experiments}} $$
    (29)
    $$ Equal \, cost \, frequency = \frac{Number \, of \, equal \,cost}{{Total \, number \, of \, experiments}} $$
    (30)

where best cost means that executing the same workflow according to the schedule generated by Algorithm A costs less than with Algorithm B, worse cost means that it costs more, and equal cost means that it costs the same. If the best cost frequency of Algorithm A relative to Algorithm B is greater than its worse cost frequency, Algorithm A is more cost-effective than Algorithm B.
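A sketch of how these metrics can be computed for one constraint setting; results is assumed to be a list of (makespan, cost) pairs for one algorithm, and the cost frequencies are computed over paired runs of the same workflows.

```python
def evaluate(results, deadline, budget):
    """Eqs. (25)-(27) for one constraint setting; `results` is a list of (makespan, cost)."""
    n = len(results)
    psr = 100.0 * sum(1 for m, c in results if m <= deadline and c <= budget) / n  # Eq. (25)
    mdr = [m / deadline for m, _ in results]                                       # Eq. (26)
    cbr = [c / budget for _, c in results]                                         # Eq. (27)
    return psr, mdr, cbr

def cost_frequencies(costs_a, costs_b):
    """Eqs. (28)-(30): best/worse/equal cost frequencies of algorithm A relative to B."""
    n = len(costs_a)
    best = sum(1 for a, b in zip(costs_a, costs_b) if a < b) / n
    worse = sum(1 for a, b in zip(costs_a, costs_b) if a > b) / n
    equal = sum(1 for a, b in zip(costs_a, costs_b) if a == b) / n
    return best, worse, equal
```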

5.3 Experimental data sets

In our experiments, we use the Workflow Generator [55] to generate workflows from three different domains, namely Epigenomics from bioinformatics, Montage from astronomy, and LIGO from gravitational physics; details about them can be found in Ref. [52]. These workflow applications differ in their computational characteristics, structure, and communication data. The small-size workflow of each application type is shown in Fig. 1.

Fig. 1 The structure of the small size workflows

In particular, 8 sizes are selected for each workflow application, with 30, 50, 100, 200, 300, 400, 500, and 1000 tasks, respectively. Since 100 different workflows are tested for each size, the total number of experimental workflows is 3 × 8 × 100 = 2400.

5.4 Results and analysis

For our experiments, we define 8 different deadline factors and 8 different budget factors. Combining the two factors yields 64 different cases per workflow, so each workflow application comprises 51,200 test cases.

5.4.1 EPIGENOMIC

Figures 2, 3, 4, and 5 show the results for Epigenomics, where Figs. 2 and 4 show the CBR and MDR obtained by each algorithm. The CBR and MDR values are divided into two main categories: a valid schedule is represented in yellow and an invalid schedule in green. A CBR value (Eq. 27) of less than 1 means that the algorithm meets the user-defined budget, whereas CBR > 1 means that the algorithm failed to find a schedule whose cost is lower than the user-defined budget; the same interpretation applies to MDR (Eq. 26). BDCWS uses almost the entire deadline and executes the workflow at a cost lower than the defined budget. As shown in Fig. 2, over the whole range of the deadline factor \(\alpha_{{\text{D}}}\), the execution cost of the schedule obtained by DBWS almost always meets the user-defined budget constraint. As can be seen in Figs. 3 and 5, DBWS and BDAS both show their worst performance at the first deadline factor (Fig. 3) and the first budget factor (Fig. 5), which means 100% failure.

Fig. 2 CBR value for EPIGENOMIC grouped by \(\alpha_{{\text{D}}}\)

Fig. 3 PSR value for EPIGENOMIC grouped by \(\alpha_{{\text{D}}}\)

Fig. 4 MDR value for EPIGENOMIC grouped by \(\alpha_{{\text{B}}}\)

Fig. 5 PSR value for EPIGENOMIC grouped by \(\alpha_{{\text{B}}}\)

5.4.2 MONTAGE

Figures 6, 7, 8, and 9 show the results for MONTAGE. From Fig. 9, it can be seen that the PSR obtained by the BDCWS algorithm is better than those of the DBWS and BDAS algorithms, especially when the budget factor is 0.2: the PSR obtained by DBWS and BDAS is less than 4%, whereas under the same conditions the PSR obtained by the proposed BDCWS algorithm is higher than 80%. As shown in Fig. 6, the behavior of the DBWS algorithm on MONTAGE is similar to that on EPIGENOMIC, that is, the costs obtained over the whole range of the deadline factor \(\alpha_{{\text{D}}}\) almost always satisfy the budget constraint. Furthermore, an interesting feature is observed when the deadline factor \(\alpha_{{\text{D}}}\) is equal to 0.2: all cost-to-budget ratios correspond to valid budgets, yet the success rate is 0%, which means that 100% of the failures are due to deadline violations.

Fig. 6 MDR value for MONTAGE grouped by \(\alpha_{{\text{D}}}\)

Fig. 7 PSR value for MONTAGE grouped by \(\alpha_{{\text{D}}}\)

Fig. 8 MDR value for MONTAGE grouped by \(\alpha_{{\text{B}}}\)

Fig. 9 PSR value for MONTAGE grouped by \(\alpha_{{\text{B}}}\)

5.4.3 LIGO

The results for LIGO are shown in Figs. 10, 11, 12, and 13. In Fig. 12, the BDAS box plot is not shown because any ratio greater than 1 means that a valid schedule cannot be generated, and we only display values up to 4 to present the results of valid schedules in more detail. The behavior of BDCWS on LIGO is similar to that on EPIGENOMIC: it uses almost the entire deadline and completes the workflow within the specified budget. For the DBWS algorithm, we can also see that as the budget factor increases, the makespan-to-deadline ratio gradually decreases. Furthermore, it can be observed in Figs. 11 and 13 that the BDAS algorithm fails almost 100% of the time when the deadline and budget are tight.

Fig. 10 MDR value for LIGO grouped by \(\alpha_{{\text{D}}}\)

Fig. 11 PSR value for LIGO grouped by \(\alpha_{{\text{D}}}\)

Fig. 12 MDR value for LIGO grouped by \(\alpha_{{\text{B}}}\)

Fig. 13 PSR value for LIGO grouped by \(\alpha_{{\text{B}}}\)

5.4.4 The total success rate

Table 4 shows the total success rate of each algorithm over all deadline factors \(\alpha_{D}\) and budget factors \(\alpha_{B}\). For each workflow application, 8 different sizes are used, and 100 different workflows are tested for each size. For each workflow, we test 64 different cases, which are the combinations of the 8 deadline factors \(\alpha_{D}\) and 8 budget factors \(\alpha_{B}\). Therefore, this study provides 51,200 test cases for each scientific workflow application.

Table 4 Total success rate for three different scientific workflows

The results in Table 4 clearly show that the BDCWS algorithm performs best, with a success rate of more than 81.7% on all data sets and its best performance on MONTAGE, reaching 90.8%. The worst performer is the BDAS algorithm, with 99.4% of test cases failing on MONTAGE. Therefore, according to the success rates in Table 4, our method is more likely to generate an acceptable scheduling scheme under the defined constraints.

5.4.5 Cost frequency

Since it is meaningless to compare cost frequencies when the budget and deadline constraints are not met, the experimental results for \(\alpha_{D} = [1,2,3,4]\) and \(\alpha_{B} = [1,2,3,4]\) are used to compare the cost frequencies, i.e., the settings under which the success rates of both BDCWS and DBWS reach 100%.

Table 5 compares the best, worse, and equal cost frequencies of BDCWS relative to DBWS. Compared with the DBWS algorithm, BDCWS has a best cost frequency that exceeds its worse cost frequency; in particular, on MONTAGE the best cost frequency reaches 100%. Therefore, the BDCWS algorithm is more cost-competitive than the DBWS algorithm.

Table 5 Cost frequency

6 Conclusions and future work

In this paper, the BDCWS algorithm is proposed for the workflow scheduling problem in the cloud environment. The BDCWS algorithm maps a workflow constrained by user-defined deadline and budget values to cloud resources. BDCWS takes the basic characteristics of the cloud environment into account, such as heterogeneity, on-demand resource provisioning, a pay-as-you-go pricing model with billing periods, and VM boot time, and adopts a series of targeted measures to improve its effectiveness. The experimental results show that, compared with the two most recent algorithms (DBWS and BDAS), the proposed BDCWS algorithm increases the chances of meeting deadline and budget constraints. Especially when the deadline and budget are tight, the improvement of BDCWS is significantly larger than that of the DBWS and BDAS algorithms. In addition, the BDCWS algorithm is more cost-competitive than DBWS. We also found that the structure of the workflow seems to have a significant impact on the success rate: for example, the success rate of BDCWS on MONTAGE is higher than 90%, while on EPIGENOMICS it is only 81.7%. Therefore, in future work we will study how to intelligently choose a proper algorithm for a given workflow to obtain a better schedule. Another future direction is to explore the impact of VM performance variation on workflow execution and to improve our strategy to adapt to volatile environments.