Keywords

1 Introduction

Biomedical workflow applications consist of numerous interdependent tasks, and there exist huge data transfer between the tasks. These applications are generally represented as Directed Acyclic Graphs (DAGs), which brings a problem of resource scheduling in distributed systems. Cloud is a distributive computing environment that dynamically delivers scalable, on demand services through virtualization of hardware and software over the internet. Cloud is based on a market-oriented paradigm, where the services consumed by the customers are charged on pay-as-you-go model [1, 2]. The prospect of running workflow applications through the cloud is made attractive by its benefits. The essential benefits includes,

  • Virtualization - Cloud gives the illusion of unlimited resources and this allows the user to acquire sufficient resources at any time.

  • Elasticity - Cloud providers offer scalable resources to its users so that the resources are gained and released as per the requirements.

A notion of discovering the suitable resource from the heterogeneous resource pool for the workflow task is referred as scheduling [3]. The mapping of tasks to the resources is an NP-Complete problem [4]. Scheduling of biomedical workflow applications involves substantial communicational and computational costs, which strongly emphasizes the usage of cloud computing for their execution. Cloud providers offer heterogeneous computing resources with different capabilities at various prices. Generally, high computing resources are expensive than the slower resource. Hence different scheduling is possible for the same workflow, which in turn impacts the scheduling time and cost. Therefore a special care for scheduling should be taken to avoid the unnecessary cost.

An optimized cost based workflow scheduling algorithm is proposed to schedule the biomedical workflow application in the cloud to minimize the overall execution time and cost for the execution of the workflow.

2 Related Works

Suraj pandey et al. [5] proposed the particle swarm optimization algorithm based heuristic for the scheduling of workflow applications to cloud resources with an aim to reduce the execution cost by considering the computation cost and data transmission cost. Arabnejad et al. [6] presented a Proportional Deadline Constrained (PDC) for mapping workflow tasks to cloud resources which minimizes the cost while meeting deadline constraints. The algorithm considers the execution cost and time for the selection of resource. Amandeep Verma et al. [7] proposed Budget and Deadline Constrained Heterogeneous Earliest Finish Time (BDHEFT) for workflow scheduling. The spare workflow cost and current task cost are considered for the selection of cost-efficient resource. Abrishami et al. [8] proposed QoS-based workflow scheduling algorithm based on Partial Critical Path (PCP), which tries to minimize the execution cost while meeting user defined deadline. PCP algorithm tries to schedule the critical task that is; the tasks present in the critical path to the resources that executes the task earliest in order to minimize the total cost of the path and executes all the tasks before its finish time. Su et al. [9] proposed a Pareto optimal scheduling heuristic (POSH) to schedule tasks to the cost conscious resource based on pareto dominance. It uses the execution time along with the cost factors to map the higher priority task to the cost efficient resource. Convolbo et al. [10] proposed cost aware scheduling algorithm for solving cost optimization problem for DAG scheduling on IaaS cloud. It schedules the job to the cost efficient resource by computing the execution time and resource usage cost.

3 System Model

3.1 Application Model

A workflow application is modeled by DAG is defined as W = G(T,E), where T is the set of n task{t1,t2,….tn} and E represents the set of directed edges {e1,e2….ek} between the workflow tasks. A task ti ε T, represents a task in workflow application and each edge (ti, tj) = e1 ε E, corresponds to a precedence constraints, where tj ε T cannot be executed till ti ε T finishes its execution. Each task ti ε T represents a computational workload, Wli which takes millions of instructions (MI) as a unit of measurement. A task with no predecessor and successor tasks is called as entry task and exit task respectively and the workflow size is determined by the number of tasks [12].

3.2 Cloud Model

Workflowsim [13] is a toolkit used in this experiment to mimic the cloud computing infrastructure. The service provider offers heterogeneous computational resources in the form of virtual machines VM {VM 1, VM 2,…VMm} with different prices. Each resource (Virtual Machine) VMm ε VM is capable of executing the given workflow application and its processing power is expressed in Millions of Instructions per Second (MIPS). Each VM has a different number of cores, MIPS, memory and storage configurations. Pricing is based on pay per use strategy similar to commercial clouds, where the users are charged based on the time interval and type of the resource used.

4 Optimized Cost Scheduling Algorithm

OCSA is an online scheduling algorithm, which comprises of three phases to schedule the biomedical workflow application in the cloud. The phases include Task selection, Resource Selection, and Resource allocation. Task selection phase selects the task with maximum execution time by preserving the parent-child relationship of a given biomedical workflow application. Resource selection phase is a significant phase of this proposed work, as it selects the optimal resource for the task execution and Resource allocation phase allocates the chosen resource to the workflow task for execution. The resource allocation phase in OCSA is a crucial phase where the actual scheduling occurs.

The main objective of the proposed work is to reduce the execution time and monetary cost of Biomedical Workflow applications in a cloud environment. Monetary cost includes execution cost, communication cost, storage cost and resource usage cost [14]. The following time factors are computed before resource selection which in turn is used to compute the various costs resulting in monetary cost for the selection of optimal VM.

$$ CT_{{t_{i} }} = \frac{{L_{{t_{i} }} }}{{VM_{c} }} $$
(1)

where CTti is the Computation time of task ti which calculates the Computation time of the task by the length of the task Lti with the capacity of Virtual Machine VMc.

Data transfer time between the interdependent tasks in a workflow application is calculated as

$$ CMT_{{t_{i} }} = \frac{{\sum\limits_{FS = 0}^{t} {FS} }}{{VM_{b} }} $$
(2)

where CMTti, represents the communication time of the task ti, which computes the Communication time between the tasks by the input and output file sizes, FS with Virtual Machine bandwidth, VMb.

Expected Execution Time of the workflow task is calculated from the Eqs. (1) and (2), as follows

$$ EET_{{t_{i} }} = CT{}_{{t_{i} }} + CMT_{{t_{i} }} $$
(3)

where EETti is the Expected Execution Time of the task ti, which computes the Expected Execution Time of the task ti on the VM by the computation time of the task CTti with the communication time of the task, CMTti.

Total execution cost for the workflow is calculated by using the execution time with resource usage cost, memory cost, communication cost and storage cost.

$$ EC = ET_{{t_{i} }} \times C_{r} $$
(4)

where EC is the Execution Cost which computes the Cost for Execution of the task on the VM by the execution time ETti with the resource cost Cr.

$$ MC = FS \times ET_{{t_{i} }} \times C_{\mu } $$
(5)

where MC is the Memory cost which computes the Cost for Memory Usage by the File size FS of the respective task along with its execution time and the memory usage cost, Cµ.

$$ CC = FS \times CT_{{t_{i} }} \times C_{\beta } $$
(6)

where CC is the communication cost which calculates the communication cost between the tasks by the input and output size of the files FS with the time of communication, CTti and Bandwidth Cost, Cβ. Communication cost is applicable, only when there exist a dependency between the tasks that is when ei,ej > 0. And the communication cost will be zero for the tasks executing on the same resource.

$$ SC = FS \times C_{s} $$
(7)

Storage Cost, SC is computed by the size of the file stored with the Storage cost, Cs. And finally, the Minimum Execution Cost is computed from the Eqs. (4)–(7), which is used for selecting the appropriate resource from the heterogeneous resource pool.

$$ MEC_{w} = EC + MC + CC + SC $$
(8)

where MECw is the Minimum Execution Cost required to execute the Workflow task on the VM.

figure a

OCSA selects the appropriate resources for scheduling with the notion to reduce the time and monetary cost of the biomedical workflow applications. It selects the optimal cost VM by considering the execution cost, memory cost, communication cost, storage cost and resource cost so that the workflow tasks are executed in an optimal resource which is shown in the Algorithm 1.

Direct Cost of the applications is measured using the individual resource usage that is data storage cost, resource cost, resource computation cost, Network cost, I/O cost [15, 16]. As the proposed algorithm selects the Optimal VM by computing the direct cost as shown in the Eqs. (4)–(7), it significantly reduces the overall time and cost of execution of workflows in a cloud environment.

5 Experimental Setup and Result Analysis

Workflowsim toolkit is used to create a cloud environment for experimentation purposes, which consists of a Service Provider offering heterogeneous computational resources for workflow execution. The data center configuration is presented in the Table 1, while the resource characteristics and cost for using the resources are shown in Tables 2 and 3 respectively.

Table 1 Datacenter Characteristics
Table 2 Resources characteristics
Table 3 Cost of the resources

For the enhanced analysis and evaluation of the proposed algorithm the experiment is conducted with the real world biomedical workflow applications with a diverse structure and range of tasks varies randomly from 10, 25, 100 and 500. The traces of biomedical workflow applications are downloaded from Pegasus Workflow Generator [11]. The workflow structure of the considered biomedical workflow applications are depicted in the Fig. 1. The proposed algorithm OCSA is compared with the existing workflow scheduling algorithms in workflowsim (FCFS, MAXMIN, MINMIN and MCT) in terms of time and cost.

Fig. 1
figure 1

Structure of the Biomedical Workflow Applications

The execution time of the workflow application is calculated using Eqs. (1) and (2) and the results of OCSA compared with the existing scheduling algorithms are presented in Fig. 2, and the average execution time for the various workflow tasks which results from the average of 20 runs of each workflow execution is compared in Fig. 3, which substantiate that the execution cost of OCSA is considerably minimum than the other scheduling algorithms.

Fig. 2
figure 2

Execution Time comparisons of biomedical workflow applications

Fig. 3
figure 3

Average Execution Time comparison of biomedical workflow applications

The execution cost is computed for different biomedical workflow application based on the Eq. (4) and the results are depicted in Fig. 4 and the average cost for different workflow tasks are depicted in Fig. 5, which clearly illustrate that the proposed algorithm performance supersedes the existing approaches and reduces the overall cost.

Fig. 4
figure 4

Execution Cost Comparison of biomedical workflow applications

Fig. 5
figure 5

Average Execution Cost Comparison of biomedical workflow applications

Optimal Cost Scheduling Algorithm maps the task to appropriate VM by considering monetary costs which are computed by the summation of various individual costs involving storage cost, data transfer cost, memory cost and resource usage cost. This is turn, results in a best scheduling strategy that minimizes the overall execution time and cost of the workflow application.

6 Conclusion

An Optimized Cost Scheduling Algorithm is proposed to schedule a biomedical workflow application in cloud with an aim to minimize the overall execution time and cost. The algorithm is evaluated using workflowsim toolkit for four real world biomedical workflow applications and the comparison is made with the existing scheduling approaches of workflowsim (in terms of execution time and cost). The result analysis reveals that the proposed OCSA schedules a workflow application with minimal time and cost in the cloud environment.