1 Introduction

Cloud computing (CC) is a revolutionary solution for science and business communities demanding powerful computing services. In its Infrastructure as a Service (IaaS) deployment model, CC offers virtualized hardware for applications whose owners require stronger control over computing services. A virtual machine (VM), the main unit of computation in IaaS, behaves almost identically to a desktop machine. This flexibility allows users to assemble networks of VMs into powerful computing tools. Such tools have a great impact on the scientific community, which depends heavily on computing to solve a wide range of problems, including scientific workflows.

A workflow is a group of simple processes that together solve a complex problem. Scientific workflows cover a wide range of areas such as astronomy, geology, biology, cosmic analysis and biotechnology [1], as well as image processing [2, 3]. These applications demand high computing power and large storage capacity. Current computing systems have enough capacity to execute them, and CC is an attractive environment for running scientific workflows because (1) it offers extraordinary computing power and immense data storage capacity; (2) contrary to grid computing systems, any person or institution can access cloud resources; (3) users do not need to invest in costly systems, as with supercomputers; (4) as opposed to cluster computing systems, customers can scale the number of resources up or down; and (5) customers have immediate access to resources, in contrast to supercomputing systems where waiting lists of weeks are common.

Workflow owners must first obtain the required tools in order to run their experiments on cloud resources. To begin, the application owner must define which variables need to be optimized based on his/her needs. Secondly, the user must divide the application into subgroups and upload them onto the available resources. Additionally, the user must set up communication protocols to connect the machines that will execute parallel tasks. Existing tools offer partial solutions to these issues. For instance, workflow manager systems (WMS) [4, 5] provide a link between the workflow owner and cloud resources and handle the protocols needed to establish communication among computing resources and trigger experimentation. Even so, most WMS do not offer a critical analysis of the application. To overcome this issue, researchers have developed scheduling algorithms that analyze a user's application in order to optimize selected variables. However, specialized scheduling algorithms have not reached full efficiency, and cloud users still need assistance in assembling a complete system from the aforementioned tools.

To date, different studies exist to guide cloud customers in executing their workflows on cloud environments; however, these approaches fail in at least one of the following ways: (1) they do not offer a complete framework with time and cost estimations for executing workflow tasks; (2) they do not present different options to the user to run his/her experiments; or (3) they do not provide a specialized scheduler that considers inter-task dependencies and/or large data files [6,7,8,9,10,11]. CUPA, the framework proposed in this article, encompasses these aspects of workflow execution on cloud environments; it acts as a layer on top of a WMS. In contrast to common WMS, CUPA provides a specialized scheduler capable of analyzing an application prior to execution in order to discover file and task dependencies and distribute resources efficiently.

For these reasons, the objective of this study is to provide an architecture for the execution of scientific workflows on cloud environments. In summary, this study makes the following contributions: (1) a cloud architecture focused on the provider–customer relationship in executing scientific workflows, offering (i) execution time and cost estimation, (ii) feedback to the user with the available options to execute his/her application and (iii) a description of all the stages the user needs to follow in order to execute his/her application; and (2) a workflow scheduler based on the PSO metaheuristic able to analyze both computing- and data-intensive applications.

2 Related work

The background of this study has three foundations: workflow manager systems (WMS), schedulers and architectures for the execution of applications on cloud systems.

2.1 Workflow manager systems for cloud systems

The main objective of a WMS is to act as a link between cloud resources and users' workflows, as expressed in Fig. 1. A WMS produces the configuration files necessary to execute workflow tasks over the available computing resources. Consequently, users do not need to deal with low-level specifications such as data transfers and task dependencies. WMS offer different configuration options to distribute and execute workflows over computer systems. Numerous WMS were originally designed for grid systems; because cloud computing and grid systems run on similar technologies, most of these WMS can be used on cloud systems with minor revision or even without any adjustment. The following are well-known WMS in use in the scientific community. The Pegasus system [4] handles heterogeneous resources and manages applications ranging from a few tasks up to one million; it has demonstrated its functionality on Amazon EC2 as well as on the Nimbus cloud platform. HTCondor [5], a specialized system for managing computing-intensive tasks, executes tasks on dedicated and non-dedicated machines, i.e., computing resources normally used by a person or another process that HTCondor can exploit whenever they become idle. Kepler [12], a workflow manager based on web services, provides a graphical interface for visualizing the flow of job execution; it simplifies the effort required to create an executable model by using a visual representation of a particular process. Taverna [13] provides a user-friendly interface for building workflows and the ability to monitor previous and current workflows as well as intermediate and final results during execution. Similarly, SwinDeW-G [14] is a peer-to-peer manager system: it accepts workflow submission on any peer and then distributes the work among nearby peers with enough capacity. However, this tool needs further adaptation for use on cloud systems.

Fig. 1
figure 1

Representation of the input and output of a workflow manager system: it receives workflow tasks and produces the files that the underlying systems need in order to execute those tasks

2.2 Schedulers

A scheduler analyzes an application and distributes its tasks among available resources in order to optimize particular objectives, as represented in Fig. 2: the scheduler receives a workflow description, defines a number of queues and distributes tasks to execute the application efficiently according to user demands. Workflows can be sent to resources following simple approaches such as queues or using more complex techniques. Whenever a user requires efficient usage of cloud resources, he/she needs a specialized scheduler focused on his/her particular objectives. The following works present scheduling algorithms applicable to cloud environments. HEFT (Heterogeneous Earliest Finish Time) [15] is a widely recognized scheduler that achieves efficient performance on computing systems. HEFT's objective is to execute tasks on heterogeneous machines with the earliest completion time, and an important feature is its low scheduling overhead. It first ranks tasks based on their computational demands and then assigns each task to the VM that executes it with the earliest finish time. The authors in [16] developed a scheduler that scales the number of VMs up and down based on provenance information obtained during execution. Provenance, as we will denote this algorithm, optimizes execution time, monetary cost and reliability: it first groups tasks with similar file dependencies and then chooses a set of VMs to execute them. The authors in [17] present a scheduler with a flexible selection of the number of VMs that balances cloud revenue and customer budget. Flexible, as we will call this scheduler, estimates the number of VMs needed to execute a given application based on its computing requirements. Grounded in microeconomics, Flexible finds an equilibrium between cloud user and provider needs: the customer aims to execute his/her workflow within a budget, while the cloud provider aims to maximize revenue; equilibrium is reached when both parties fulfill their needs. In [18], the authors developed a load forecasting algorithm to efficiently manage web applications on cloud computing environments; it predicts loads on computing machines based on historical and current data and produced outstanding results in selecting target resources. BaRRS [19] is a scheduler that balances task queues to execute scientific applications in cloud environments while optimizing runtime and monetary cost. GA-ETI [20] is a powerful scheduler that maps workflow applications to cloud resources; it modifies the genetic algorithm with specific alterations to the crossover and mutation operators in order to reduce randomness, an inherent characteristic of genetic algorithms. After a close examination of these scheduling approaches, we observe that most of them (1) do not provide different options to execute the application with different execution times, budget limits and numbers of resources; (2) tend to spend the user's budget without analyzing alternative scheduling options that may be more efficient; (3) do not analyze the complete application prior to execution, resulting in limited scheduling decisions and poor performance; or (4) incur excessive scheduling overhead.
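To make the list-scheduling idea behind HEFT concrete, the sketch below shows only its assignment phase in simplified form: tasks are assumed to be already ordered by priority (in HEFT, by upward rank), and each task is greedily placed on the VM that finishes it earliest. Precedence constraints, communication costs and insertion-based slot search are omitted, so this is an illustrative reading rather than the published algorithm; all names are hypothetical.

```python
# Simplified sketch of HEFT's assignment phase (hypothetical names).
# Tasks are assumed to be already ranked; each one is placed on the VM
# that yields the earliest finish time. Precedence and data-transfer
# costs are ignored here for brevity.
def earliest_finish_time_assignment(ranked_tasks, vm_count, exec_time):
    """ranked_tasks: task ids in priority order.
    exec_time[task][vm]: estimated runtime of the task on that VM."""
    vm_ready = [0.0] * vm_count      # time at which each VM becomes free
    schedule = {}                    # task -> (vm, start, finish)
    for task in ranked_tasks:
        best_vm = min(range(vm_count),
                      key=lambda vm: vm_ready[vm] + exec_time[task][vm])
        start = vm_ready[best_vm]
        finish = start + exec_time[task][best_vm]
        vm_ready[best_vm] = finish
        schedule[task] = (best_vm, start, finish)
    return schedule

# Two hypothetical tasks on two VMs with different speeds
print(earliest_finish_time_assignment(
    ranked_tasks=[0, 1], vm_count=2,
    exec_time=[[4.0, 2.0], [3.0, 6.0]]))
```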

Fig. 2
figure 2

Representation of a workflow scheduler. A scheduler receives a workflow description, and then it defines a number of queues and distributes tasks to efficiently execute the application based on user demands

2.3 Architectures for the execution of applications on cloud systems

An architecture, in this context, is a framework comprising the components involved in the execution of applications on cloud environments. Cloud computing only provides the hardware to run computer experimentation; additional components such as a scheduler, application receiver, workflow editor/engine, application monitor and discovery manager are needed to build a complete execution system. The following architectures share a similar objective with this study. The broker-based framework in [7] defines an architecture with three main components: a workflow engine, a service broker and cloud providers. Its objective is to exploit the economic benefits of cloud environments: based on the service level agreements chosen by the user, it selects adequate resources through a cloud service broker. The Workflow Engine for Clouds from [9] proposes a versatile architecture for the business perspective of cloud computing, featuring a market-oriented component that acts as an intermediary between the user and computing resources. The multi-agent architecture presented in [6] runs applications using one or more cloud providers, targeting concurrent and parallel execution. The work in [8] introduces the SciCumulus middleware, which exploits parallelism with two techniques: parameter sweep and data fragmentation. Nevertheless, these approaches fail to provide feedback to the user with the options to run his/her application in the cloud.

As the previous paragraphs show, each of these foundations has its own objective. The purpose of this study is to combine them into a strong architecture for executing workflows on cloud environments. Firstly, current WMS are well-designed tools that only lack specialized scheduling policies; for this reason, there is no need to develop a new WMS, and this study instead selects one of them to act as a link between cloud resources and our scheduler. Secondly, we propose a scheduler capable of overcoming the drawbacks of the aforementioned scheduling algorithms. Finally, every component is enclosed in the architecture to show the user the main components he/she needs to retrieve and understand in order to have the application executed.

3 Cloud user–provider affiliation: architecture to conduct scientific experiments on cloud systems

CUPA is a mechanism to guide researchers in executing their workflows on cloud environments, acting as a bridge between workflows and cloud resources. CUPA's main features are: (1) it specifies and provides guidance on the steps a user must follow to get his/her workflow executed on cloud resources; (2) it uses historical data or a program profiler to retrieve task running times when this information is unknown; (3) it accepts scientific applications demanding high computing power and/or intense file transferring; and (4) it includes a complete scheduler based on PSO to make efficient use of cloud resources.

The architecture of CUPA is presented in Fig. 4. Stage (1) is a process completed by the user; he/she develops a workflow \({{\varvec{W}}}=\left\{ {t_1,\ldots , t_n } \right\} \) to solve a problem in a particular scientific area. Each task \(t_i\) of \({{\varvec{W}}}\) has a set of parents \(t_i^{\mathrm{parents}}\) linked by a set of files with total size \(f^{\mathrm{size}}\). Examples of scientific workflows are presented in Fig. 3. At stage (2), CUPA requests the execution time \(\hat{t}_i^{\mathrm{exe}}\) of each task. CUPA offers two options for obtaining this information: (i) using a program profiler and (ii) retrieving historical data from previous executions. CUPA recommends that the user employ historical data; where such data are not available, a profiler such as SimpleScalar [21] may be used. The user collects the calculated execution time and attaches it to each task. At stage (3), the user selects the optimization level for each objective, i.e., monetary cost and makespan. It is important to highlight that the user selects a percentage rather than a budget or time limit, since CUPA's internal scheduler performs an analysis and then provides different options based on the user's selection. At stage (4), PSO-DS receives the workflow with its attached information, including the optimization levels. The user then receives feedback with the possible scheduling configurations, each with a different number of VMs, monetary cost \({ MCst}\) and makespan \({ MSpn}\), and selects the one that best suits his/her situation. Finally, at stages (5–6), CUPA requests the required VMs, submits tasks to each resource based on the selected scheduling configuration and triggers execution.
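As a hedged illustration only, the per-task information gathered in stages (1)–(2), namely the estimated execution time, the transferred file size and the parent tasks, could be represented along the lines of the following sketch; the field names are hypothetical and do not reflect CUPA's actual data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    task_id: int
    exec_time: float                 # \hat{t}_i^{exe}, from history or a profiler
    file_size: float = 0.0           # size of the files the task passes on
    parents: List[int] = field(default_factory=list)   # ids of parent tasks

@dataclass
class Workflow:
    tasks: List[Task]                # W = {t_1, ..., t_n}

# Hypothetical four-task workflow: t3 depends on t1, t4 depends on t2
workflow = Workflow(tasks=[
    Task(1, exec_time=1.0, file_size=1.0),
    Task(2, exec_time=1.0, file_size=1.0),
    Task(3, exec_time=1.0, file_size=1.0, parents=[1]),
    Task(4, exec_time=1.0, file_size=1.0, parents=[2]),
])
```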

Fig. 3
figure 3

Example of five well-known scientific applications from different research areas including astronomy, geology, biology, cosmic analysis and biotechnology. The CUPA objective is to guide the user in analyzing, scheduling and executing these types of workflows

Fig. 4
figure 4

Cloud user–provider affiliation. The affiliation consists of the steps a user's workflow goes through to be analyzed before the scheduler presents execution options for the cloud system

3.1 Profiler description

This study proposes the use of a profiler to obtain task execution times whenever this information is not available to the user. The objective of a profiler is to examine a computer program to determine memory usage, instruction count and time complexity. Profiling is performed over program code or binary executable files. Different profilers are currently available; for example, SimpleScalar LLC [21], a popular profiler in computer architecture research [22,23,24,25,26,27,28,29], models a program's performance and provides a detailed examination of processor architecture as well as hardware and software co-verification, as exemplified in Fig. 5. SimpleScalar outputs the execution time, the number of instructions and the CPU cycles required to execute a program on a particular processor architecture. The use of profilers such as SimpleScalar is recommended for workflows in which the same task/program is executed with different inputs, since SimpleScalar needs to run the task/program itself to produce these statistics.

Fig. 5
figure 5

SimpleScalar basic functions to analyze applications

Fig. 6
figure 6

Example of calculating makespan and monetary cost with Eqs. 1–6

3.2 The scheduling problem

The scheduler's responsibility is to organize tasks into a set of queues \(vm_j^{\mathrm{queue}}\), each executed by a given \(vm_j\), with the objective of minimizing total makespan \({ MSpn}\) (Eq. 1) and monetary cost \({ MCst}\) (Eq. 2). To accomplish this, the scheduler needs to define the size of the pool of resources \({{\varvec{VM}}}=\left\{ {vm_1,\ldots , vm_v} \right\} \) based on the cost of hiring each resource \(vm_j^{\mathrm{cost}}\) and the potential time \(vm_j^{\mathrm{time}}\) each \(vm_j\) needs to execute its set of tasks. Makespan is the latest finishing time (LFT) over all \(vm_j^{\mathrm{queue}}\). Equation 3 defines \(vm_j^{\mathrm{time}}\) as the LFT among all tasks assigned to \(vm_j\). Equation 4 expresses \(\hat{t}_i^{\mathrm{total}}\), the time to execute \(t_i\), while Eq. 5 is the time to execute all its parent tasks. Finally, Eq. 6 calculates the file transfer time between \(vm_p\) and \(vm_i\). Figure 6 presents an example of calculating \({ MSpn}\) and \({ MCst}\) for a four-task workflow and a set of two VMs with unitary values for \(\hat{t}_i^{\mathrm{exe}}\), \(f_i^{\mathrm{size}}\), \(vm_j^{bw}\) and \(vm_j^{\mathrm{cost}}\). Firstly, the four tasks are distributed evenly over the two VMs as indicated. Secondly, files are transferred only between tasks residing on different VMs. Thirdly, tasks are placed on the timeline with their corresponding execution time \(\hat{t}_i^{\mathrm{exe}}\) and transfer time \(\hat{t}_i^f\). Task \(t_1 \) must transfer its corresponding \(f_1^{\mathrm{size}}\), while \(t_2 \) does not require data transmission since it is allocated to the same VM. In contrast, \(t_3 \) must transfer \(f_3^{\mathrm{size}}\) from \(vm_1 \) to \(vm_2 \), while \(t_4\) does not require any data transmission. As a result, \(vm_1\) executes its set of tasks within three units of time, while \(vm_2\) executes its tasks in five units of time. Finally, \({ MSpn}\) takes the largest value from [\(vm_1^{\mathrm{time}}, vm_2^{\mathrm{time}}\)], i.e., five, and \({ MCst}\) amounts to eight cost units.

$$\begin{aligned}&{ MSpn} = \mathrm{LFT}_{j=1}^{\left| {VM} \right| } [vm_j^{\mathrm{time}}] \end{aligned}$$
(1)
$$\begin{aligned}&{ MCst} = \mathop {\sum }\limits _{j=1}^{\left| {VM} \right| } \lceil vm_j^{\mathrm{time}} vm_j^{\mathrm{cost}}\rceil \end{aligned}$$
(2)
$$\begin{aligned}&vm_j^{\mathrm{time}} = \mathrm{LFT}_{i=1}^{\big |{vm}_j^{\mathrm{queue}}\big |} [ {\hat{t}_i^{\mathrm{total}} }] \end{aligned}$$
(3)
$$\begin{aligned}&\hat{t}_i^{\mathrm{total}} = \hat{t}_i^f +\hat{t}_i^{\mathrm{exe}} +\hat{t}_i^{\mathrm{parent}} \end{aligned}$$
(4)
$$\begin{aligned}&\hat{t}_i^{\mathrm{parent}} = \mathop {\sum }\limits _{p=1}^{\left| {\mathrm{parents}} \right| } \hat{t}_p^f +\hat{t}_p^{\mathrm{exe}} \end{aligned}$$
(5)
$$\begin{aligned}&\hat{t}_i^f = \frac{f_i^{\mathrm{size}} }{{ min}\left( {vm_p^{bw}, vm_i^{bw} } \right) } \end{aligned}$$
(6)

This model assumes that (1) every VM has a fixed bandwidth \((vm_j^{bw})\), number of cores \((vm_j^{\mathrm{cores}})\), memory size \((vm_j^{\mathrm{mem}})\) and disk size \((vm_j^{\mathrm{disk}})\) and (2) the CUPA system negotiates with a cloud provider/broker to obtain the number of VMs indicated by the scheduling analysis. Since this study uses Pegasus as part of the experimentation, the workflow description follows its format, including executable files, input data and the DAX file, an abstract description of the workflow and its internal dependencies. Additionally, the workflow description must include the time to execute each of its tasks. This study covers cases where this information is not available by employing a profiler to retrieve the missing data. Once the information is complete, the analyzer produces scheduling plans with different finishing times, monetary costs and numbers of VMs. For this task, CUPA employs a scheduler based on the PSO optimization technique, as described in Sect. 4.
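A direct, hedged reading of Eqs. 1–6 is sketched below. It treats transfers between tasks on the same VM as free, as in the Fig. 6 walk-through, and the usage example at the end uses a hypothetical four-task DAG rather than reproducing the exact Fig. 6 timeline, whose dependency structure is only given graphically.

```python
from math import ceil

# Hedged reading of Eqs. 1-6, not CUPA's implementation. exec_time[i] and
# file_size[i] describe task i, parents[i] lists its parent ids, assign[i]
# is the VM index executing it, and bw[j]/cost[j] are VM j's bandwidth and
# hourly price. Transfers between tasks on the same VM are taken as free.
def transfer_time(i, file_size, parents, assign, bw):                  # Eq. 6
    remote = [p for p in parents[i] if assign[p] != assign[i]]
    if not remote:
        return 0.0
    return file_size[i] / min(min(bw[assign[p]] for p in remote), bw[assign[i]])

def total_time(i, exec_time, file_size, parents, assign, bw):          # Eqs. 4-5
    t_parent = sum(transfer_time(p, file_size, parents, assign, bw)
                   + exec_time[p] for p in parents[i])
    return (transfer_time(i, file_size, parents, assign, bw)
            + exec_time[i] + t_parent)

def makespan_and_cost(exec_time, file_size, parents, assign, bw, cost):
    vm_time = {j: max((total_time(i, exec_time, file_size, parents, assign, bw)
                       for i in assign if assign[i] == j), default=0.0)
               for j in set(assign.values())}                          # Eq. 3
    mspn = max(vm_time.values())                                       # Eq. 1
    mcst = sum(ceil(vm_time[j] * cost[j]) for j in vm_time)            # Eq. 2
    return mspn, mcst

# Hypothetical two-VM, four-task DAG with unit parameters (cf. Fig. 6)
print(makespan_and_cost(exec_time={1: 1, 2: 1, 3: 1, 4: 1},
                        file_size={1: 1, 2: 1, 3: 1, 4: 1},
                        parents={1: [], 2: [1], 3: [1], 4: [3]},
                        assign={1: 0, 2: 0, 3: 1, 4: 1},
                        bw={0: 1, 1: 1}, cost={0: 1, 1: 1}))   # -> (3.0, 5)
```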

4 Particle Swarm Optimization in solving the scheduling problem

For this study, the authors developed a scheduling approach based on the particle swarm optimization (PSO) mechanism to solve the aforementioned problem. PSO is a process to find solutions to nonlinear problems in terms of their optimization functions. It is based on particles continuously moving through the search space while aiming to reach the coordinates that optimize the evaluation value, as illustrated in Fig. 7. PSO is strongly related to swarming theory and has similarities with genetic algorithms (GAs) [30]. Compared with GAs, PSO has lower demands in terms of computational power, memory capacity and coding effort, with exceptional capabilities to solve different kinds of optimization problems [31].

Fig. 7
figure 7

The PSO process. Each particle (solution) moves through the solution space. Particles change velocity and direction on each iteration targeting a global best highlighted by a white shadow

4.1 Particle Swarm Optimization (PSO): the original model

In its original form [30], the PSO GBEST model is a technique that searches the solution space for an optimal answer and is oriented toward problems expressed with real numbers. To extend PSO's scope, a discrete version of the swarm algorithm was developed in [32]; the core of the original version was kept intact, differing only in the discrete way the problem is manipulated. Each particle in the PSO represents a solution and has a position and a velocity in the search space. Through a series of iterations, particles swarm through the solution space to find the maximum (or minimum) value of a given evaluation function. The following notation introduces the discrete PSO (Algorithm 1).

In a population of size P, consider the position of the ith particle as \(X_i^t =\left( {x_{i1}^t, x_{i2}^t,\ldots , x_{iD}^t} \right) \) with D bits, where \(x_{id}^t \in \left\{ {0,1} \right\} \). A particle's velocity is then defined as \(V_i^t =\left( {v_{i1}^t, v_{i2}^t,\ldots , v_{iD}^t } \right) \), where \(v_{id}^t \in R\). The PSO keeps a record of each particle's best position in \({ PBEST}_i^t =\left( {{ pbest}_{i1}^t, { pbest}_{i2}^t,\ldots , { pbest}_{iD}^t } \right) \), as well as the best solution ever found in \({ GBEST}^{t}=\left( {{ gbest}_{i1}^t, { gbest}_{i2}^t,\ldots , { gbest}_{iD}^t } \right) \). Equation 7 presents the function to calculate the velocity \(v_{id}^t \) for the dth dimension of the ith particle at iteration t. The \(\omega \) term introduced in [33] is a particle's inertia to continue moving toward its original destination. The acceleration coefficients \(c_1\) and \(c_2\) act as the particle's memory, inclining it to move toward \({ PBEST}_i^t\) and \({ GBEST}^{t}\), respectively. The objective of \(v_{id}^t \) is to drive particles toward a "superior" position in terms of the evaluation function value. Position \(X_i^t \) is updated on every iteration of the PSO.

The sigmoid function (Eq. 8) is employed to map velocities to probability values in the interval [0, 1]. Additionally, \(v_{id}^t \) is limited to a fixed range \(\left[ {-V_{{ max}}, +V_{{ max}}} \right] \) to prevent \(s\left( {v_{id}^t} \right) \) from saturating at the upper or lower bound of [0, 1]. In our experiments, as advised by [34], \(V_{{ max}} =4\). Algorithm 1 presents the generic discrete version of the PSO for a maximization problem. In step 1, it initializes an array of particles with random positions \(X_i^0 \) and velocities \(V_i^0 \). Steps 2–20 form the main cycle. The function F evaluates each particle in step 4; if a particle's value is greater than its previous best, then \({ PBEST}_i^t \) is updated accordingly. Similarly, the global best \({ GBEST}^{t}\) is compared with \(P_i^t \) and updated if required (steps 7–9). In the sub-cycles of steps 10–18, Algorithm 1 updates the velocities and positions of all D dimensions of a particle. In steps 11–12, each particle updates its velocity \(v_{id}^t \) and caps its value. Finally, based on the result of the sigmoid function, each particle sets the value of each dimension \(x_{id}^{t+1}\) for iteration \(t+1\). The main cycle continues until a termination criterion is met. A compact sketch of this update step follows Algorithm 1.

$$\begin{aligned}&v_{id}^t = \omega v_{id}^{t-1} +c_1 r_1 \left( {{ pbest}_{id}^t -x_{id}^t } \right) + c_2 r_2 \left( {{ gbest}_{id}^t -x_{id}^t } \right) \end{aligned}$$
(7)
$$\begin{aligned}&s\left( {v_{id}^t } \right) = \frac{1}{1+\hbox {exp}\left( {-v_{id}^t } \right) } \end{aligned}$$
(8)
figure a
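For illustration, one velocity-and-position update of the generic discrete PSO (Eqs. 7–8 and steps 10–18 of Algorithm 1) might look like the sketch below for a single particle; the parameter values and bit-sampling rule follow the description above, but the code is a hedged reading, not the authors' implementation.

```python
import math
import random

def binary_pso_step(x, v, pbest, gbest, w=1.0, c1=2.0, c2=2.0, v_max=4.0):
    """One discrete-PSO update of a single particle (Eqs. 7-8).
    x, pbest, gbest are bit vectors; v holds the current velocities."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()
        vel = (w * v[d] + c1 * r1 * (pbest[d] - x[d])
               + c2 * r2 * (gbest[d] - x[d]))             # Eq. 7
        vel = max(-v_max, min(v_max, vel))                # cap to [-Vmax, +Vmax]
        prob = 1.0 / (1.0 + math.exp(-vel))               # Eq. 8, sigmoid
        new_v.append(vel)
        new_x.append(1 if random.random() < prob else 0)  # sample the new bit
    return new_x, new_v

# A five-bit particle pulled toward a global best of all ones
x, v = binary_pso_step(x=[0, 1, 0, 0, 1], v=[0.0] * 5,
                       pbest=[0, 1, 0, 1, 1], gbest=[1, 1, 1, 1, 1])
```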

4.2 PSO-DS as the scheduling engine for CUPA

In this section, we present our modified particle swarm optimization with discrete adaptation and a featured SuperBEST particle (PSO-DS), an extension of the generic PSO, to solve the scheduling problem in this article. The following subsections describe PSO-DS particles, their velocities and the featured SuperBEST particle.

4.2.1 Adaptation of particle format

As in [35], PSO-DS unfolds the original discrete PSO particles so that they can encode integer numbers and thus the scheduling problem. In PSO-DS, particles have an augmented format \(X_i^t =\left( {x_{i11}^t, x_{i21}^t,\ldots , x_{inv}^t } \right) \), \(x_{ijk}^t \in \left\{ {0,1} \right\} \), where \(x_{ijk}^t =1\) if the jth task of the ith particle is executed on \(vm_k \), and \(x_{ijk}^t =0\) otherwise. For ease of explanation, this study introduces a short format to represent particles: \(X^{\prime t}_i =\left( {x^{\prime t}_{i1}, x^{\prime t}_{i2},\ldots , x^{\prime t}_{in}} \right) \), \(x^{\prime t}_{ij} \in \left\{ {vm_1,\ldots , vm_\mathrm{v}} \right\} \), is the abstract representation of particle \(X_i^t \), where \(x^{\prime t}_{ij}\) is the \(vm_k \) executing task j of particle i at time t. An example particle is presented in Fig. 8 for a workflow with four tasks \({{\varvec{W}}}=\left\{ {t_1, t_2, t_3, t_4} \right\} \) and a set of two resources \({{\varvec{VM}}}=\left\{ {vm_1, vm_2 } \right\} \); particle i is expressed in its long and abstract formats, \(X_i^t \) and \(X^{\prime t}_i\), respectively. A small sketch of this mapping follows Fig. 8.

Fig. 8
figure 8

Each particle represents a workflow scheduling configuration. The particle’s long format is used for velocity calculation, while its abstract representation expresses the scheduling distribution
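The correspondence between the long (one-hot) format used for velocity calculations and the abstract (VM-index) format shown in Fig. 8 can be sketched as follows, assuming 0-based task and VM indices; the helper names are hypothetical.

```python
def to_long_format(abstract, v):
    """abstract[j] is the 0-based VM index executing task j; the long format
    is the flattened one-hot encoding x_ijk used for velocity updates."""
    return [1 if k == vm else 0 for vm in abstract for k in range(v)]

def to_abstract_format(long_fmt, v):
    """Inverse mapping: recover each task's VM index from its one-hot bits."""
    return [long_fmt[j * v:(j + 1) * v].index(1)
            for j in range(len(long_fmt) // v)]

# Four tasks on two VMs, e.g. (vm1, vm2, vm1, vm2) with 0-based indices
abstract = [0, 1, 0, 1]
long_fmt = to_long_format(abstract, v=2)        # [1,0, 0,1, 1,0, 0,1]
assert to_abstract_format(long_fmt, v=2) == abstract
```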

For a particle expressed in its abstract format \(X^{\prime t}_i\), the number of different values each dimension can take is v, given \({{\varvec{VM}}}=\left\{ {vm_1,\ldots , vm_\mathrm{v} } \right\} \). At the same time, v is driven by the number of parallel tasks in the workflow. Figure 9 illustrates this concept: for Workflow 1, the maximum number of tasks that can be executed in parallel is four (in its third level). As a consequence, \({{\varvec{VM}}}\) is set to \(\left\{ {vm_1, vm_2, vm_3, vm_4 } \right\} \), because any additional VM beyond four in the pool would remain idle during the execution of this workflow (a sketch of this rule follows Fig. 9).

Fig. 9
figure 9

Setting up the maximum number of values a particle's dimensions can take
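One simple way to obtain the bound illustrated in Fig. 9 is to compute the workflow's widest level, i.e., the largest number of tasks sharing the same depth in the DAG. The sketch below assumes a task's level is defined by its deepest parent and is only an illustrative reading of this rule.

```python
from collections import defaultdict

def max_parallel_width(parents):
    """parents[i] lists the parent ids of task i. A task's level is one more
    than its deepest parent's; the widest level bounds the useful VM count."""
    level = {}
    def depth(i):
        if i not in level:
            level[i] = 0 if not parents[i] else 1 + max(depth(p) for p in parents[i])
        return level[i]
    width = defaultdict(int)
    for i in parents:
        width[depth(i)] += 1
    return max(width.values())

# Hypothetical workflow: one root, four parallel tasks, one sink task
parents = {1: [], 2: [1], 3: [1], 4: [1], 5: [1], 6: [2, 3, 4, 5]}
print(max_parallel_width(parents))   # -> 4, so at most four VMs stay busy
```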

4.2.2 Particle’s velocity adaptation

The velocity from Eq. 7 is transformed into Eq. 9 to follow the particle adaptation. Firstly, for \(X_i^t\) (expressed in its long format), the best position is defined as \({ PBEST}_i^t =\left( {{ pbest}_{i11}^t, { pbest}_{i21}^t,\ldots , { pbest}_{inv}^t } \right) \), while the global best particle in the population is defined as \({ GBEST}^{t}=\left( {{ gbest}_{11}^t, { gbest}_{21}^t,\ldots , { gbest}_{nv}^t } \right) \). Parameters \(\omega \), \(c_1 \), \(r_1 \), \(c_2 \), \(r_2 \) and \(V_{{ max}}\) have the same functions as in the original discrete PSO described in the previous section. Additionally, Eq. 8 is slightly modified into Eq. 10 to manage dimension velocities as a set of probabilities, i.e., velocities in the range [0, 1]. Here, each dimension's velocity \(v_{ijk}^t \) is the probability of \(vm_k \) executing \(t_j \).

$$\begin{aligned}&v_{ijk}^t = \omega v_{ijk}^{t-1} +c_1 r_1 \left( {{ pbest}_{ijk}^t -x_{ijk}^t } \right) + c_2 r_2 \left( {{ gbest}_{jk}^t -x_{ijk}^t } \right) \end{aligned}$$
(9)
$$\begin{aligned}&s\left( {v_{ijk}^t } \right) = \frac{1}{1+\hbox {exp}\left( {-v_{ijk}^t } \right) } \end{aligned}$$
(10)

4.2.3 Scheduling reconstruction from particle velocity

PSO-DS has a population of solutions expressed as velocity probabilities, which require interpretation to construct scheduling configurations. In this context, VMs compete to execute tasks, while each task can only be assigned to a single resource. Unlike [35], where particle construction forces consecutive tasks to be assigned to different VMs, our approach allows sets of successive tasks to be assigned to the same VM to avoid unnecessary data transfers. Consider \(s^{\prime }\left( {v_{ijk}^t } \right) \) (Eq. 11) as the probability of assigning the jth task to the kth resource from the pool of v machines, where \(\sum _{k=1}^v s^{\prime }\left( {v_{ijk}^t } \right) =1\). In a process repeated for every task j of every particle i, only one dimension, namely the kth, is set to \(x_{ijk}^t =1\), while \(\left\{ {x_{ijk^{\prime }}^t =0|k^{\prime }\ne k} \right\} \); k is the index of the dimension with the maximum value. Figure 10 presents a velocity and position update example for a single-task workflow and a set of four VMs. Here, \({ GBEST}^{t}\) indicates that the task should be allocated to \(vm_4 \), while particle \(X_i^t \) has assigned the task to \(vm_1 \). In the resulting \(S^{\prime }\left( {V_i^t } \right) \), the last dimension exhibits the highest probability of obtaining 1, and thus in the updated particle \(X_i^{t+1}\), \(vm_4 \) executes \(t_1 \). A sketch of this reconstruction step follows Fig. 10.

$$\begin{aligned} s^{\prime }\left( {v_{ijk}^t } \right) =\frac{s\left( {v_{ijk}^t } \right) }{{\sum }_{k=1}^v s\left( {v_{ijk}^t } \right) } \end{aligned}$$
(11)
Fig. 10
figure 10

Particle updates its velocity and position on each iteration. The process to calculate the velocity is done by expressing the particle in its long format
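The reconstruction step of Eq. 11 and Fig. 10, i.e., normalize the sigmoid-transformed velocities of the v competing VMs for each task and give the winning bit to the VM with the highest probability, can be sketched as follows; the function names are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def reconstruct_assignment(velocities):
    """velocities[j][k] is the velocity of task j for VM k. Eq. 11 normalizes
    the sigmoid values per task; the VM with the highest probability gets the 1."""
    assignment = []
    for task_vel in velocities:
        s = [sigmoid(v) for v in task_vel]
        total = sum(s)
        probs = [p / total for p in s]                       # Eq. 11
        assignment.append(max(range(len(probs)), key=probs.__getitem__))
    return assignment

# Single-task workflow with four competing VMs (cf. Fig. 10): the largest
# velocity wins, so the task goes to the last VM (index 3).
print(reconstruct_assignment([[-1.0, 0.2, 0.5, 3.0]]))   # -> [3]
```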

4.2.4 SuperBEST particle and the GBEST

We define a new particle in PSO-DS, namely \({ SuperBEST}^{t}\), which is built from the most popular elements of the particles in the population. Consider the set \(X^{\prime \prime t}_j =\left\{ {x^{\prime \prime t}_{j1}, x^{\prime \prime t}_{j2},\ldots ,x^{\prime \prime t}_{jP}} \right\} \), \(x^{\prime \prime t}_{ji} \in \left\{ {vm_1,\ldots , vm_\mathrm{v} } \right\} \), where \(x^{\prime \prime t}_{ji}\) is the value assigned to dimension j in the ith particle (expressed in its abstract format) for a population of P particles. Then \({ SuperBEST}^{t}=\left( {sb_1^t, sb_2^t,\ldots , sb_n^t } \right) \), where \(sb_j^t \) holds the value with the highest frequency in \(X^{\prime \prime t}_j\). Figure 11 illustrates the formation of \({ SuperBEST}^{t}\): from a population of three particles \(X_1^t \), \(X_2^t \) and \(X_3^t \), a set of \(X^{\prime \prime t}_j\) vectors expresses the population; the frequency of occurrence of each \(vm_k \) is counted; and finally, \({ SuperBEST}^{t}\) is composed (a sketch of this per-dimension vote follows Fig. 11). PSO-DS uses \({ SuperBEST}^{t}\) to build \({ GBEST}^{t}\): for \({ GBEST}_{{ Pbased}}^t ={ max}_{i=1}^P \left[ {{ PBEST}_i^t } \right] \) and \({ GBEST}_{{ Xbased}}^t ={ max}_{i=1}^P \left[ {XBEST_i^t } \right] \), \({ GBEST}^{t}={ max}\left( {{ SuperBEST}^{t}, { GBEST}_{{ Pbased}}^t, { GBEST}_{{ Xbased}}^t } \right) \).

Fig. 11
figure 11

The SuperBEST particle formation. For every dimension, SuperBEST selects the value with the highest number of occurrences. SuperBEST is updated and included in PSO-DS
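Building \({ SuperBEST}^{t}\) amounts to a per-dimension majority vote over the population expressed in abstract format. A minimal sketch, assuming 0-based VM indices and arbitrary tie-breaking:

```python
from collections import Counter

def super_best(population):
    """population[i] is particle i in abstract format (one VM index per task).
    For every task dimension, pick the VM chosen by the most particles."""
    n_tasks = len(population[0])
    return [Counter(p[j] for p in population).most_common(1)[0][0]
            for j in range(n_tasks)]

# Three particles scheduling four tasks on two VMs (0-based indices)
population = [[0, 1, 0, 1],
              [0, 1, 1, 1],
              [1, 1, 0, 0]]
print(super_best(population))   # -> [0, 1, 0, 1]
```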

4.2.5 Evaluation function and termination criterion

This study adopts a maximization formulation for the scheduling process in PSO-DS. It employs an evaluation function \(E_{{ value}}\) (Eq. 12) that integrates the makespan \({ MSpn}\) and monetary cost \({ MCst}\) with weight values \(w_1\) and \(w_2\) to control the optimization of each objective. The variables \({ max}^{{ MSpn}}, { min}^{{ MSpn}}, { max}^{{ Cst}}\) and \({ min}^{{ Cst}}\) retain the maximum and minimum values of makespan and monetary cost, continuously updated during the PSO-DS run. PSO-DS continues until \(E_{{ value}}\) shows no improvement, and it outputs GBEST as the final solution.

$$\begin{aligned} E_{{ value}} =w_1 \frac{\left( { max}^{{ MSpn}}-{ MSpn}\right) }{\left( { max}^{{ MSpn}}-{ min}^{{ MSpn}}\right) }+w_2 \frac{\left( { max}^{{ Cst}}-{ MCst}\right) }{\left( { max}^{{ Cst}}-{ min}^{{ Cst}}\right) } \end{aligned}$$
(12)
figure b
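Equation 12 can be read as a weighted sum of two normalized terms, one per objective, where the normalization bounds are the best and worst values observed so far during the run. A hedged sketch, assuming the tracked bounds are strictly different:

```python
def evaluation_value(mspn, mcst, max_mspn, min_mspn, max_mcst, min_mcst,
                     w1=0.5, w2=0.5):
    """Eq. 12: higher is better. Both objectives are rescaled to [0, 1] using
    the best and worst makespan/cost values seen so far in the run."""
    time_term = (max_mspn - mspn) / (max_mspn - min_mspn)
    cost_term = (max_mcst - mcst) / (max_mcst - min_mcst)
    return w1 * time_term + w2 * cost_term

# A schedule halfway between the observed extremes scores 0.5
print(evaluation_value(mspn=50, mcst=30, max_mspn=80, min_mspn=20,
                       max_mcst=40, min_mcst=20))
```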

4.2.6 The proposed PSO-DS algorithm

The resulting PSO-DS algorithm is presented in Algorithm 2. In step 1, it creates a population of size P with random positions \(X_i^t \) and velocities \(V_i^t \). Steps 2–38 form the main loop. Steps 4–6 update \({ PBEST}_i^t \) for every particle; \({ GBEST}_{{ Xbased}}^t \) and \({ GBEST}_{{ Pbased}}^t \) are updated if the current particle has a higher evaluation value. In steps 14–18, the algorithm builds the \({ SuperBEST}^{t}\) particle. In step 19, PSO-DS selects \({ GBEST}^{t}\) from the pool \(\left\{ { SuperBEST}^{t}, { GBEST}_{{ Pbased}}^t, { GBEST}_{{ Xbased}}^t \right\} \). In steps 23–26, velocity \(v_{ijk}^t \) is updated and capped to the range \(\left[ {-V_{{ max}}, +V_{{ max}} } \right] \). Next, \(s\left( {v_{ijk}^t } \right) \) expresses \(v_{ijk}^t \) as a probability in the interval [0, 1]; the corresponding position \(x_{ijk}^t \) is set to 0 as a preliminary step before updating each particle's position. Steps 28–35 update each particle's position and are repeated n times for a workflow \({{\varvec{W}}}\) of size n. Steps 29–34 form a loop that calculates \(s^{\prime }\left( {v_{ijk}^t}\right) \) for all k resources competing to execute the jth task of the ith particle. Finally, only the winning resource k (step 32) obtains the value 1 in \(x_{ijk}^t\) for the jth task of the ith particle.

5 Experiments and analysis

This section evaluates the performance of PSO-DS within CUPA using three main tests. In experiment 1, we compare the execution of workflows using Pegasus-WMS without any external scheduling policy against the complete CUPA framework including the PSO-DS scheduler. In experiment 2, we analyze the need to guide the user in selecting a limited budget by comparing monetary costs when executing workflows with and without a budget limit. Finally, in experiment 3, we compare the performance of PSO-DS against Provenance [16], HEFT [15], Flexible [17] and GA-ETI [20].

The experimental test bed consists of a private cloud with three Krypton Quattro R6010 servers, each with 4-way AMD Opteron™ 6300 series processors (64 cores each). We selected Pegasus-WMS (4.2) as the WMS, installed on the Ubuntu 14.04 operating system. VMware vSphere (5.5) manages the computing resources and provides virtual machines with the aforementioned platform. As described for CUPA, PSO-DS acts as the scheduling engine with the parameters shown in Table 1. To define the PSO-DS parameters, we ran preliminary experiments testing \(\omega \) in the range [0.9, 1.2] with \(c_1 =c_2 =2\), as advised in [34, 36]; from those trials, a \(\omega \) value of 1.2 produced the best results. Additionally, [36] recommends a relatively small population of 20 particles, but the nature of the scheduling problem demanded a larger population; preliminary tests showed that 100 particles give the best results. \(V_{{ max}}\) is set to 4 to prevent \(s\left( {v_{id}^t } \right) \) in Eq. 8 from continuously approaching the upper and lower bounds of [0, 1]. Finally, \(w_1 \) and \(w_2\) have a value of 0.5 to give the same priority to monetary cost and execution time. We selected five scientific workflows from [1] for the experiments; they represent applications from different scientific areas including astronomy, geology, biology, cosmic analysis and biotechnology. Their details are presented in Table 2.

Table 1 PSO-DS setup
Table 2 Characteristics of the scientific workflows employed in the experiments
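For reference, the PSO-DS settings reported in Table 1 and discussed above could be collected into a single configuration object along the lines of the sketch below; the structure and names are illustrative, not the authors' code.

```python
# Illustrative collection of the PSO-DS settings reported in Table 1;
# names and structure are hypothetical, not the authors' configuration.
PSO_DS_CONFIG = {
    "population_size": 100,   # preliminary tests favored 100 particles
    "inertia_w": 1.2,         # best value found when testing [0.9, 1.2]
    "c1": 2.0,                # acceleration toward PBEST
    "c2": 2.0,                # acceleration toward GBEST
    "v_max": 4.0,             # keeps the sigmoid away from 0 and 1
    "w1": 0.5,                # weight of makespan in Eq. 12
    "w2": 0.5,                # weight of monetary cost in Eq. 12
}
```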

5.1 Analysis 1: the need for a specialized scheduler on top of WMS

This first experiment has the objective of highlighting the need to add specialized scheduling analysis on top of a WMS. Table 3 provides makespan and monetary cost results for PSO-DS and Pegasus-WMS. The epigenomics workflow presents the biggest difference in terms of time and cost due to its high level of parallelism. In this application, PSO-DS converges to solutions where dependent tasks are executed on the same VM; as a consequence, PSO-DS reduces the number of data transfers to a minimum. Similar scenarios arise in the rest of the applications. For example, the Ligo workflow has three main groups of tasks running in parallel; PSO-DS converges to solutions where not all of these tasks are executed at the same time, since monetary cost is also an optimization objective and the algorithm balances makespan against cost. Pegasus, in turn, presents higher makespan and cost values because its only objective is to execute the application. Pegasus uses HTCondor as its internal DAG (directed acyclic graph) executor: HTCondor receives the workflow and sends its tasks for execution with almost no control over which tasks execute on a given VM or which data files are replicated. In contrast, PSO-DS evaluates a number of different scheduling configurations and chooses the one with the highest optimization value.

Table 3 Results for makespan and monetary cost for PSO-DS and PEGASUS-WMS

5.2 Analysis 2: the need to guide users in selecting a limited budget

In this experiment, we allow PSO-DS to produce scheduling configurations while relaxing the monetary cost optimization. Results are presented in Fig. 12; the scheduler first plans for two VMs, then three, four and so on. For each workflow, three graphs are presented: the first shows the evaluation function values, and the second and third show the corresponding makespan and monetary cost. For each case, a red shadow highlights the configurations with a function value above 80%. As readers will notice, the makespan drops dramatically as the number of VMs increases, but once a low makespan is achieved it does not decrease notably; further increasing the number of resources reduces the makespan only by small amounts, while the monetary cost grows linearly with the number of VMs.

With the exception of epigenomics, the evaluation function values behave similarly in every case: they rise as the number of VMs increases, reach a peak and finally drop. The peak occurs when the number of VMs is optimal for the particular case. For instance, when PSO-DS distributes Cybershake's tasks over four VMs, it obtains an optimal case with a makespan of 4377 s at a cost of 1.256 dollars, even though the minimal achievable makespan is 3891 s at a cost of 3.768 dollars. As for epigenomics, its evaluation function rises as the number of VMs increases, reaches its peak with five VMs and then starts dropping, but it rises again with 12 VMs. The reason for this behavior is that the maximum number of parallel tasks is 24, so PSO-DS converges to a uniform distribution of these parallel tasks among 12 VMs. The reason PSO-DS does not produce an improved scenario with 24 VMs is that our scheduling model assumes hourly charging for every VM, so although the configuration with 24 VMs presents the minimal makespan, its cost rises tremendously.

5.3 Analysis 3: performance of PSO-DS against up-to-date schedulers

In order to demonstrate the competence of PSO-DS, in this experiment we compare it against the four schedulers previously presented. With the exception of Cybershake, the results show that PSO-DS provides better results, especially in the cases with a low number of VMs and function values above 80%, as highlighted in the previous section. An important reason for these positive results is that PSO-DS is designed to consider the most predominant factors affecting makespan, such as task grouping based on dependencies, file sizes and the available number of VMs. It is important to highlight that although HEFT and GA-ETI converge to similar makespan values, they only do so as the number of VMs increases. Another factor to emphasize is that optimal function values (above 80%) appear where the makespan makes its biggest drops. For example, in the Sipht graph in Fig. 13c, the makespan drops from 7188 to 2987 s between two and five VMs; beyond that number of resources, the makespan does not improve substantially for any of the algorithms. The remaining workflows exhibit the same behavior.

Fig. 12
figure 12

Results for evaluation function values, makespan and monetary cost for the five scientific workflows, highlighting the area where evaluation function values are above 80%

Fig. 13
figure 13

Execution time results with a different number of VMs

6 Discussion

This section provides a deeper analysis of the PSO-DS performance compared with other algorithms for solid reference.

6.1 PSO-DS performance on a Pareto front fashion

Figure 14 presents the results of the five scheduling algorithms in a Pareto front fashion, i.e., on the \({ MSpn}\) vs \({ MCst}\) graph. The Pareto front is a concept from economics with applications in engineering [37]: it is formed by the individuals that cannot improve their position without deteriorating another's, and connecting such individuals produces a curve that in engineering is used as a trade-off between values. Here we analyze only the epigenomics workflow, since the remaining workflows present similar behavior. Figure 14 exhibits the results for makespan with the corresponding monetary cost. As readers will notice, PSO-DS converges to values spread along the complete \({ MSpn}\) vs \({ MCst}\) curve. This analysis shows that PSO-DS does not get trapped in particular sections of the solution space and corroborates that the local best solution does not overdominate the search. In a similar manner, GA-ETI and HEFT place their values in great proximity to those of PSO-DS thanks to their analysis and filtering of solutions. In contrast, Flexible and Provenance miss the opportunity to provide superior results due to their minimal workflow analysis.

Fig. 14
figure 14

Epigenomics’ \({ MSpn}\)\({ MCst}\) graph to analyze where in the solution space schedulers search for a solution

6.2 Measure scheduling time

An important factor in scheduling algorithms is the time required to run the scheduling process itself. Table 4 presents the time to produce a scheduling configuration for a workflow with 100 nodes. PSO-DS and GA-ETI present the highest processing times, 1300 and 1450 ms, respectively, while Flexible executes in 3.1 ms. The reason for this large difference is that Flexible does not base its scheduling approach on an evolution of solutions; it rather chooses a final solution from a limited number of configurations. Similarly, HEFT considers a single solution and executes in only 16.1 ms. In contrast, PSO-DS and GA-ETI evaluate a group of solutions over a series of iterations, unfolding solutions and therefore requiring a larger timeframe. However, none of the approaches has an excessive scheduling time compared with the final makespan, as shown in the third column of Table 4. Moreover, the higher scheduling time of PSO-DS yields a reduced final makespan.

Table 4 Scheduling time and its relation with final makespan

7 Conclusion

In this article, we proposed CUPA, an architecture for executing scientific workflows on cloud computing systems. CUPA guides users in selecting the tools they need in order to run their experiments efficiently. CUPA features PSO-DS, a specialized workflow scheduler based on PSO with a particular adaptation to the scheduling problem, including a discrete formatting of particles and an enhanced super element. Using five workflows representing current scientific problems, PSO-DS was tested and proved its dominance over four cloud schedulers (HEFT, Provenance, Flexible and GA-ETI). Through experimentation, PSO-DS highlighted the need for a specialized scheduler on top of a WMS. PSO-DS provides superior results in terms of makespan and monetary cost compared with the other schedulers, in particular in the cases with a small number of resources, where it delivers function values above 80%. Additionally, PSO-DS provides scheduling configurations with values spread along the complete \({ MSpn}\) vs \({ MCst}\) curve. PSO-DS's positive results exhibit the main factors to consider during the scheduling process in order to optimize time and cost: task grouping, job dependencies, file sizes and the available number of VMs. Additionally, PSO-DS demonstrated that superior solutions execute parallel tasks sequentially on the same VM in order to reduce file transfers. The experiments also underline the importance of not relaxing the monetary budget: a user may have an unlimited budget, or a scheduler may assume one, but loosening the budget may yield similar results at the expense of pointless charges. To continue this work, the authors aim to package the CUPA system and make it available as a tool for other studies. As for future studies, the authors are focusing on the analysis of large sets of files (Big Data) for applications associated with biological viruses, terrorism and economic crisis behavior.