1 Introduction

Cloud computing is an important computing paradigm that provides computing resources over the Internet [1]. To cloud consumers, it appears as a virtually unlimited pool of resources and offers reliability, scalability, and flexibility [2].

The increasing demand for cloud computing has made energy consumption one of the major concerns in clouds [3]. It is reported that cloud data centers accounted for \(1\%\) of global electricity usage in 2010, and this share is expected to grow to 3–13% by 2030 [4]. The high energy consumption of data centers also increases global greenhouse gas emissions (GHGE) [5]. Data centers have become the largest contributor within the information and communication technology (ICT) sector, rising from \(33\%\) of global ICT emissions in 2010 to \(45\%\) by 2020 [6]. Owing to its advantages, cloud computing is widely used to execute scientific workflow applications [7], and over the last few years workflow scheduling has become one of the primary energy consumers in clouds [8]. As a result, energy saving should be taken into account when designing workflow scheduling algorithms.

This paper considers a multi-objective workflow scheduling problem with a budget constraint. Budget-constrained workflow scheduling is one of the main research directions in the field of workflow scheduling. However, most existing works on budget-constrained workflow scheduling cannot always satisfy the budget constraint or guarantee the feasibility of solutions; instead, they report the success rate of scheduling in their experiments [8,9,10]. Unlike these works, [11] and [12] propose methods that guarantee feasibility. Both methods are based on a concept called the budget level: they convert the budget constraint of the whole workflow into per-task budget constraints. These methods always produce feasible solutions, but they are complicated, and the per-task budget constraints are only sufficient conditions for the original budget constraint. In this paper, we propose a much simpler way to guarantee the feasibility of solutions. We work directly with the original budget constraint instead of converting it, building on a concept called the remaining cheapest budget (RCB) [9].

In workflow scheduling, it has become a trend to consider more than one objective [13]. However, the weight selection problem is usually ignored in these works. Determining an appropriate weight for each objective in multi-objective optimization is an acknowledged difficult problem [14], and inappropriate weights reduce the quality of solutions. From the literature, we find that the combination of reinforcement learning (RL) and the Chebyshev scalarization function can solve the weight selection problem efficiently [15]. RL based on the Chebyshev scalarization function is a relatively new algorithmic framework that provides a general approach to multi-objective optimization, so we design our scheduling algorithm on top of it.

In this paper, we propose an energy-aware multi-objective RL (EnMORL) algorithm. Our goal is to find a set of Pareto approximations that simultaneously minimize the makespan and energy consumption of the workflow under a specified budget. The energy model in this paper is based on the dynamic voltage/frequency scaling (DVFS) technique [16]. Like most list-based scheduling algorithms [17], EnMORL consists of a ranking phase and a mapping phase: we first assign a priority to each task in the workflow and then select a virtual machine (VM) of an appropriate type for each task in priority order. Finally, we compare EnMORL with two state-of-the-art multi-objective meta-heuristics on four workflows drawn from different scientific applications. The experimental results show that EnMORL outperforms these existing methods.

The main contributions of this paper can be summarized as follows:

  1. RL is capable of solving difficult multi-step decision-making problems [18], which makes it well suited to the workflow scheduling problem studied in this paper.

  2. We propose a new method based on RCB to satisfy the budget constraint and guarantee the feasibility of solutions. This method is much simpler than those in [11] and [12].

  3. We propose the EnMORL algorithm based on the Chebyshev scalarization function, which is efficient in solving the weight selection problem.

  4. The experimental results demonstrate that EnMORL competes efficiently with two state-of-the-art multi-objective meta-heuristics in terms of makespan, energy consumption, and the hypervolume indicator [19].

The remainder of this paper is organized as follows. We introduce related works from two aspects in Sect. 2. We briefly introduce multi-objective optimization in Sect. 3. We give the mathematical formulation of the scheduling problem in Sect. 4. We present our proposed EnMORL algorithm in Sect. 5. We evaluate its performance in Sect. 6. Finally, we conclude this paper in Sect. 7.

2 Related work

In this section, we first introduce some representative works of budget-constrained workflow scheduling, and then the related works of energy-aware workflow scheduling.

The quality of service (QoS) requirements, such as makespan and cost, are the traditional optimization objectives in workflow scheduling. In this area, budget-constrained workflow scheduling is one of the major research hotspots.

Most existing works on budget-constrained workflow scheduling do not guarantee the feasibility of the solution; instead, they discuss the success rate or failure rate of scheduling in their experiments. Arabnejad et al. [9] propose a heterogeneous budget-constrained scheduling (HBCS) algorithm that aims to minimize the makespan under a specified budget; it combines time and cost factors to select the processor for the current task. Garg et al. [10] propose an \(\varepsilon \)-fuzzy dominance sort-based discrete particle swarm optimization (PSO) algorithm, called \(\varepsilon \)-FDPSO, to solve workflow scheduling in grid computing. Their goal is to optimize makespan, cost, and reliability simultaneously under deadline and budget constraints, and they use the \(\varepsilon \)-fuzzy dominance-based sorting procedure to select particles with lower ranks.

Unlike the above works, Wu et al. [11] aim to find feasible solutions for budget-constrained workflow scheduling. Their critical-greedy (CG) algorithm preassigns tasks a budget level to meet the budget constraint. However, this algorithm only applies to homogeneous data centers. In [12], Chen et al. extend the concept of the budget level to heterogeneous environments and propose an efficient algorithm called minimizing the schedule length using the budget level (MSLBL), which aims to minimize the makespan of the workflow while satisfying the budget constraint. To meet the budget constraint, the authors convert the budget constraint of the whole workflow into per-task constraints. Using these task budgets, MSLBL can at least assign each task to the processor with the minimum cost. The methods in [11] and [12] are complex, and the per-task budget constraints are only sufficient conditions for the original budget constraint. In this paper, we use a much simpler way to guarantee the feasibility of solutions: we design our method directly on the original budget constraint instead of converting it.

An increasing number of scientific workflows are migrating into clouds. As a result, workflow scheduling has become one of the primary energy consumers in data centers, and in recent years much attention has been paid to reducing the energy consumption of workflow scheduling.

Li et al. [20] propose a cost and energy-aware scheduling (CEAS) algorithm for workflow scheduling in clouds. CEAS is composed of five sub-algorithms, and its goal is to minimize the execution cost and energy consumption under a deadline constraint; to meet the deadline, the authors propose the concept of sub-makespan. Qureshi et al. [21] introduce the concept of power-aware application profiles (APs) and use them to compute the execution cost of a workflow according to its energy consumption requirements. Their scheduling algorithm considers CPU, memory, I/O, and energy consumption requirements.

To reduce makespan and energy consumption, Sofia et al. [13] propose a multi-objective algorithm based on the non-dominated sorting genetic algorithm (NSGA-II) [22]. The authors also propose two single-objective genetic algorithms (GAs) to minimize energy consumption and makespan individually. Verma et al. [8] propose a hybrid PSO (HPSO) algorithm to handle the workflow scheduling problem in clouds. HPSO combines heterogeneous earliest finish time (HEFT) [23] with multi-objective PSO (MOPSO) [24]. Their objectives include time minimization, cost minimization, and energy saving, and they consider both the bi-objective and the tri-objective versions of the workflow scheduling problem.

We notice that the energy models in the above works are all based on the DVFS technique. This technique has been widely used in energy-aware workflow scheduling problems.

3 Multi-objective optimization

A maximization multi-objective problem with m decision variables and n objectives can be defined as [25]:

$$\begin{aligned} \max&\ \overrightarrow{f}(\overrightarrow{x})=[f_1(x_1,x_2,\ldots ,x_m),\ldots ,f_n(x_1,x_2,\ldots ,x_m)] \nonumber \\ s.t.\nonumber \\&\overrightarrow{x}=(x_1,x_2,\ldots ,x_m)\in X \end{aligned}$$
(1)
$$\begin{aligned}&\overrightarrow{f}=(f_1,f_2,\ldots ,f_n)\in Y, \end{aligned}$$
(2)

where \(\overrightarrow{x}\) is the decision vector, X is the search space, \(\overrightarrow{f}\) is the objective vector, and Y is the objective space.

A solution \(\overrightarrow{x_1}\) is said to dominate another solution \(\overrightarrow{x_2}\) (denoted \(\overrightarrow{x_1}\succ \overrightarrow{x_2}\)) if and only if both of the following conditions hold [26]:

  1. The solution \(\overrightarrow{x_1}\) is not worse than \(\overrightarrow{x_2}\) in any objective.

  2. The solution \(\overrightarrow{x_1}\) is strictly better than \(\overrightarrow{x_2}\) in at least one objective.

The points that are not dominated by any other points are called the non-dominated points, and the corresponding decision variables are called non-dominated solutions or Pareto optimal solutions. The Pareto optimal front is defined as the set of all non-dominated points. Similarly, the Pareto optimal set is the set of all non-dominated solutions.

The concept of the Pareto optimal set is used when optimizing problems with multiple objectives. A multi-objective algorithm aims to find a Pareto approximate set that approximates the true Pareto optimal set. In this paper, the set of non-dominated solutions is updated after the construction of each feasible solution.
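As a concrete illustration, the following minimal Java sketch (with illustrative names, not taken from our implementation) shows the dominance test and the archive update applied after each feasible solution is constructed, specialized to the bi-objective minimization case (makespan, energy) considered in this paper:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the dominance test and Pareto-archive update described above,
// written for the bi-objective minimization case (makespan, energy). Names are illustrative.
public class ParetoArchive {

    // A candidate schedule reduced to its two objective values.
    public static class Solution {
        final double makespan;
        final double energy;
        Solution(double makespan, double energy) {
            this.makespan = makespan;
            this.energy = energy;
        }
        // True if this solution dominates the other one: not worse in any objective
        // and strictly better in at least one.
        boolean dominates(Solution other) {
            boolean notWorse = this.makespan <= other.makespan && this.energy <= other.energy;
            boolean strictlyBetter = this.makespan < other.makespan || this.energy < other.energy;
            return notWorse && strictlyBetter;
        }
    }

    private final List<Solution> archive = new ArrayList<>();

    // Called after each feasible solution is constructed: keep it only if it is
    // non-dominated, and drop every archived solution it dominates.
    public void update(Solution candidate) {
        for (Solution s : archive) {
            if (s.dominates(candidate)) {
                return; // candidate is dominated, archive unchanged
            }
        }
        archive.removeIf(candidate::dominates);
        archive.add(candidate);
    }

    public List<Solution> getNonDominated() {
        return archive;
    }
}
```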

4 System models and mathematical formulation

Our algorithm aims to schedule a budget-constrained workflow on a set of available heterogeneous resources, which are in the form of VMs in clouds. This section consists of five parts, namely the workflow model, cloud data center model, makespan model, energy model, and mathematical formulation.

4.1 Workflow model

In general, a workflow application is represented by a directed acyclic graph (DAG). A DAG can be modeled as a tuple \(G(T,D)\), where \(T=\{t_1,\ldots ,t_m\}\) is the set of m tasks in the workflow, and D is the set of directed edges between tasks. Directed edges represent the task-dependency constraints. If there exists an edge \((t_i,t_j)\in D\), then \(t_i\) is called an immediate predecessor of \(t_j\), and \(t_j\) is called an immediate successor of \(t_i\). We use \({\mathrm{pre}}(t_i)\) to denote the set of all immediate predecessors of task \(t_i\), and \({\mathrm{succ}}(t_i)\) to denote the set of all immediate successors of task \(t_i\). A task can only start after all of its predecessors have completed their execution. A task with no predecessors is called an entry task, denoted \(t_{\mathrm{entry}}\); a task with no successors is called an exit task, denoted \(t_{\mathrm{exit}}\). The computational size of task \(t_i\) is denoted by \({\mathrm{in}}_i\) and expressed in millions of instructions (MI). The size of the data transferred between tasks \(t_i\) and \(t_j\) is denoted by \({\mathrm{tran}}_{i,j}\) and expressed in MB.
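For illustration, a workflow in this model can be represented by a small data structure such as the following sketch; the class and field names are ours and purely illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the workflow (DAG) model: tasks carry their workload in MI,
// edges carry the transferred data size in MB. Names are ours, not from the paper's code.
public class WorkflowTask {
    final int id;
    final double workloadMI;                                    // in_i, millions of instructions
    final List<WorkflowTask> pre = new ArrayList<>();           // immediate predecessors
    final List<WorkflowTask> succ = new ArrayList<>();          // immediate successors
    final Map<WorkflowTask, Double> tranMB = new HashMap<>();   // tran_{i,j} on edge (t_i, t_j)

    WorkflowTask(int id, double workloadMI) {
        this.id = id;
        this.workloadMI = workloadMI;
    }

    // Adds a dependency edge (this -> child) with the given data size in MB.
    void addEdge(WorkflowTask child, double dataMB) {
        this.succ.add(child);
        child.pre.add(this);
        this.tranMB.put(child, dataMB);
    }

    boolean isEntry() { return pre.isEmpty(); }
    boolean isExit()  { return succ.isEmpty(); }
}
```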

4.2 Cloud data center model

In a cloud data center, computing resources are usually provided in the form of VMs. We assume that the cloud data center offers n types of VMs, which differ in price and performance. We use \(V=\{v_1,\ldots ,v_n\}\) to denote the set of VM types. For simplicity, we assume that each VM type \(v_j\) can provide an unlimited number of VMs to execute the tasks of a workflow.

In this paper, the cloud service is charged based on the number of time intervals for which a VM has been used; consumers pay for a whole time interval even if it is not completely used. We use \(c_j\) to denote the unit price of using VM type \(v_j\) for one time interval, and \(p_j\) to denote the processing speed of type \(v_j\), expressed in million instructions per second (MIPS).

Let \(x_{i,j}\) be a binary variable, which represents whether task \(t_i\) is executed on VM type \(v_j\) \((x_{i,j}=1)\) or not \((x_{i,j}=0)\). Each task \(t_i\in T\) is executed exactly once, so there exists the equation \(\sum _{j=1}^{n}x_{i,j}=1\), where n is the number of VM types in the data center.

4.3 Makespan model

Without loss of generality, we assume that each task can be executed on any VM in the data center. If task \(t_i\) is assigned to VM type \(v_j\), the execution time \({\mathrm{ET}}(t_i,v_j)\) of task \(t_i\) is expressed as follows:

$$\begin{aligned} {\mathrm{ET}}(t_i,v_j)= \frac{\mathrm{in}_i}{p_j} \end{aligned}$$
(3)

When considering the decision variable \(x_{i,j}\), the actual execution time \({\mathrm{ET}}(t_i)\) of task \(t_i\) can be calculated as:

$$\begin{aligned} {\mathrm{ET}}(t_i)= \sum _{j=1}^{n}{\mathrm{ET}}(t_i,v_j)\cdot x_{i,j}=\sum _{j=1}^{n}\frac{\mathrm{in}_i}{p_j}\cdot x_{i,j} \end{aligned}$$
(4)

We have assumed that all the VMs are in the same data center, which means the storage cost and the transmission cost can be ignored. Let \({\mathrm{EC}}(t_i)\) denote the cost of executing task \(t_i\), which can be calculated as follows:

$$\begin{aligned} \mathrm{EC}(t_i)=\mathrm{ET}(t_i)\cdot c_j=\sum _{j=1}^{n}\frac{\mathrm{in}_i}{p_j}c_j\cdot x_{i,j} \end{aligned}$$
(5)

We have assumed that each VM type \(v_j\) can provide a sufficient number of VMs to execute all the tasks of a workflow. We assume that all the VMs in the cloud data center have the same communication bandwidth, the value of which is set to a constant b. Then, the actual finish time \(\mathrm{FT}(t_i)\) of task \(t_i\) can be calculated as:

$$\begin{aligned} \mathrm{FT}(t_i)=\left\{ \begin{array}{ll} \mathrm{ET}(t_i),&{}\quad \mathrm{pre}(t_i)=\emptyset \\ \max \limits _{t_p\in \mathrm{pre}(t_i)}\left\{ \mathrm{FT}(t_p)+\frac{\mathrm{tran}_{p,i}}{b}\right\} +\mathrm{ET}(t_i),&{}\quad \mathrm{pre}(t_i)\ne \emptyset \end{array} \right. \end{aligned}$$
(6)

The makespan MS of the workflow is defined as the maximum actual finish time of the exit tasks:

$$\begin{aligned} \mathrm{MS}=\max \{\mathrm{FT}(t_\mathrm{exit})\} \end{aligned}$$
(7)
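The following sketch shows how Eqs. (3)–(7) can be evaluated for a fixed task-to-VM-type assignment, assuming tasks are indexed in topological order; all names are illustrative:

```java
// Minimal sketch of Eqs. (3)-(7): finish times and makespan for a fixed task-to-VM-type
// assignment. Tasks are assumed indexed in topological order; names are illustrative.
public class MakespanModel {

    // workloadMI[i] = in_i, speedMIPS[i] = p_j of the VM type chosen for task i,
    // pre[i] = indices of immediate predecessors, tranMB[p][i] = data sent from p to i,
    // bandwidth = b. Returns the finish time of every task.
    static double[] finishTimes(double[] workloadMI, double[] speedMIPS,
                                int[][] pre, double[][] tranMB, double bandwidth) {
        int m = workloadMI.length;
        double[] ft = new double[m];
        for (int i = 0; i < m; i++) {
            double et = workloadMI[i] / speedMIPS[i];          // Eq. (3)/(4)
            double ready = 0.0;                                // entry task: FT = ET
            for (int p : pre[i]) {                             // Eq. (6)
                ready = Math.max(ready, ft[p] + tranMB[p][i] / bandwidth);
            }
            ft[i] = ready + et;
        }
        return ft;
    }

    // Eq. (7): the makespan is the largest finish time among the (exit) tasks.
    static double makespan(double[] ft) {
        double ms = 0.0;
        for (double t : ft) ms = Math.max(ms, t);
        return ms;
    }
}
```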

4.4 Energy model

The energy consumption in this paper is calculated based on the DVFS technique [16]. The dynamic power consumption is the dominant factor, so we ignore the static energy consumption [27]. The dynamic power consumption \(P_\mathrm{d}\) is expressed as follows:

$$\begin{aligned} P_\mathrm{d}= BCV^2f, \end{aligned}$$
(8)

where B is the number of switches per clock cycle, C is the total capacitance load, V is the supply voltage, and f is the frequency. From Eq. (8), we know that the supply voltage is the dominant factor. The total energy consumed by the whole workflow can be calculated as follows [8]:

$$\begin{aligned} E=\sum _{i=1}^{m} BC(V_i)^2f\cdot \mathrm{ET}(t_i)=\sum _{i=1}^{m} \alpha (V_i)^2\cdot \mathrm{ET}(t_i), \end{aligned}$$
(9)

where \(V_i\) denotes the supply voltage of the VM on which task \(t_i\) is executed. It is often assumed that VMs are operating at the maximum voltage level when they are busy [8]. The VM type with higher performance has a higher supply voltage. Without loss of generality, we assume that a VM with higher performance consumes more energy.
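As a simple illustration of Eq. (9), the total energy of a schedule can be accumulated as in the sketch below, where \(\alpha \) folds together B and C and the per-task voltages would come from the VSL tables introduced in Sect. 6.2; the names are illustrative:

```java
// Minimal sketch of the energy model in Eq. (9): E = sum over tasks of alpha * V_i^2 * ET(t_i).
// alpha folds together B and C; supplyVoltage[i] is the voltage of the VM running task i.
public class EnergyModel {

    static double totalEnergy(double alpha, double[] supplyVoltage, double[] execTime) {
        double e = 0.0;
        for (int i = 0; i < execTime.length; i++) {
            e += alpha * supplyVoltage[i] * supplyVoltage[i] * execTime[i];
        }
        return e;
    }
}
```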

4.5 Mathematical formulation

The two objectives in this paper are to minimize makespan and energy consumption. Besides, the scheduling problem subjects to a budget constraint. Based on Eqs. (7) and (9), the scheduling problem is formulated as follows:

$$\begin{aligned}&\mathrm{Goals:} \nonumber \\&\min \ MS \ \mathrm{and} \ \min \ E \nonumber \\&\mathrm{Constraints:}\nonumber \\&\sum _{i=1}^{m} \mathrm{EC}(t_i) \le \mathrm{BG} \end{aligned}$$
(10)
$$\begin{aligned}&\sum \limits _{j=1}^{n} x_{i,j}= 1,\quad \forall t_i\in T \end{aligned}$$
(11)
$$\begin{aligned}&x_{i,j}\in \left\{ 0,1 \right\} , \quad \forall t_i\in T,\ v_j\in V, \end{aligned}$$
(12)

where m and n are the numbers of tasks and VM types, respectively, and BG denotes the budget.

Constraint (10) ensures that the cost of the workflow does not exceed the budget. Constraint (11) ensures that each task is executed exactly once. Constraint (12) defines the decision variables. Note also that the task-dependency constraint is captured by Eq. (6), and that \(x_{i,j}\) appears in all objectives and constraints.

A consumer can determine the budget within the range provided by the following equation [9]:

$$\begin{aligned} \mathrm{BG}= C_{c}+k_{\mathrm{BG}}\cdot (C_{h}-C_{c}), \end{aligned}$$
(13)

where \(C_{c}\) denotes the cost of the cheapest schedule, \(C_{h}\) denotes the cost of the most expensive schedule, and \(k_\mathrm{BG}\) is the budget factor, which is chosen by the consumer. When \(k_\mathrm{BG}<0\), we have \(\mathrm{BG}<C_{c}\), meaning the consumer cannot afford the workflow at all. When \(k_\mathrm{BG}\ge 1\), \(\mathrm{BG}\ge C_{h}\) and the budget constraint is no longer binding. Only when \(0\le k_\mathrm{BG}< 1\) is the scheduling problem budget-constrained; if \(k_\mathrm{BG}=0\), the consumer can only choose the cheapest schedule. In the experiments, we vary \(k_\mathrm{BG}\) from 0.1 to 0.9 with a step of 0.1.

Without loss of generality, we assume that a VM with higher computing performance charges more for executing a given task. Given \(t_i\in T\) and \(v_j, v_k\in V\) with processing speeds \(p_j<p_k\), this means \(\mathrm{ET}(t_i,v_j)\cdot c_j< \mathrm{ET}(t_i,v_k)\cdot c_k\). Dividing both sides by \(\mathrm{in}_i\) yields:

$$\begin{aligned} \frac{c_j}{p_j}<\frac{c_k}{p_k} \end{aligned}$$
(14)

The cheapest cost \(C_c\) is obtained by selecting the VM type with the lowest performance to execute every task, and the highest cost \(C_h\) is obtained by selecting the VM type with the highest performance for every task.
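The budget range of Eq. (13) can therefore be computed directly from the VM-type parameters, as in the following illustrative sketch:

```java
// Sketch of Eq. (13) and the surrounding discussion: the cheapest cost C_c uses the slowest
// (cheapest) VM type for every task, the highest cost C_h uses the fastest (most expensive)
// one, and the consumer's budget interpolates between them via k_BG. Names are illustrative.
public class BudgetRange {

    // Cost of running every task on a single VM type with the given speed and unit price (Eq. (5)).
    static double scheduleCost(double[] workloadMI, double speedMIPS, double unitCost) {
        double cost = 0.0;
        for (double w : workloadMI) cost += (w / speedMIPS) * unitCost;
        return cost;
    }

    // workloadMI[i] = in_i; speedMIPS[j] and unitCost[j] describe VM type v_j.
    static double budget(double[] workloadMI, double[] speedMIPS, double[] unitCost, double kBG) {
        int cheapest = 0, fastest = 0;
        for (int j = 1; j < speedMIPS.length; j++) {
            if (speedMIPS[j] < speedMIPS[cheapest]) cheapest = j;
            if (speedMIPS[j] > speedMIPS[fastest]) fastest = j;
        }
        double cC = scheduleCost(workloadMI, speedMIPS[cheapest], unitCost[cheapest]);
        double cH = scheduleCost(workloadMI, speedMIPS[fastest], unitCost[fastest]);
        return cC + kBG * (cH - cC);                                       // Eq. (13)
    }
}
```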

In conclusion, the problem studied in this paper is a multi-objective scientific workflow scheduling problem in which makespan and energy consumption are optimized under a budget constraint. Workflow scheduling is NP-hard [28]. In the next section, we apply a multi-objective RL algorithm based on the Chebyshev scalarization method to this problem.

5 The proposed EnMORL algorithm

This section consists of four parts, namely reinforcement learning, the Chebyshev scalarization function, the learning agent of EnMORL, and the description of EnMORL.

5.1 Reinforcement learning

Single-objective RL is usually described in the form of a Markov decision process (MDP) [29]. We use \(S=\{s_1,\ldots ,s_M\}\) to denote the state space, and \(A=\{a_1,\ldots ,a_N \}\) to denote the set of N actions available in the current state. For each combination of a current state s, an available action \(a\in A\), and a next state \(s'\), there exist a transition probability \(p(s'|s,a)\) and a reward signal \(r(s,a)\).

RL aims to find an optimal policy \(\pi \), which maximizes the expected discounted reward \({\mathbb {E}}[\sum _{i=0}^{\infty }\gamma ^ir_i]\), where \(\gamma \in (0,1]\) is the discount factor, and \(r_i\) is the reward signal at time step i. This goal can be expressed using Q values which record the expected discounted reward for each state-action pair. We use \(Q^{\pi }(s,a)\) to denote the Q value of taking action \(a\in A\) in state s under policy \(\pi \), and \(Q^*\) to denote the optimal Q value.

In the well-known Q-learning algorithm [30], an estimate \({\widehat{Q}}(s,a)\) is maintained to iteratively approximate \(Q^*\). The \({\widehat{Q}}(s,a)\) value is updated according to the following rule:

$$\begin{aligned} {\widehat{Q}}(s,a)= (1-\alpha _i){\widehat{Q}}(s,a)+\alpha _i\left( r+\gamma \max _{a'}{\widehat{Q}}(s^{\prime },a^{\prime })\right) , \end{aligned}$$
(15)

where \(0<\alpha _i\le 1\) denotes the learning rate at time step i, and r is the reward of taking action a in state s. The \({\widehat{Q}}(s,a)\) value converges to \(Q^*(s,a)\) if all state-action pairs are visited infinitely often and an appropriate learning rate is used [31].
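For reference, a minimal tabular implementation of the update rule in Eq. (15) looks as follows; EnMORL keeps one such table per objective, and the state/action encodings here are illustrative:

```java
// Minimal tabular Q-learning update (Eq. (15)) for a single objective; the multi-objective
// variant keeps one such table per objective. State/action encodings are illustrative.
public class QLearning {
    final double[][] q;           // q[s][a] = current estimate of Q*(s, a)
    final double alpha, gamma;

    QLearning(int numStates, int numActions, double alpha, double gamma) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    // One-step update after observing (s, a, r, s'); assumes s' has at least one action.
    void update(int s, int a, double reward, int sNext) {
        double best = Double.NEGATIVE_INFINITY;
        for (double v : q[sNext]) best = Math.max(best, v);
        q[s][a] = (1 - alpha) * q[s][a] + alpha * (reward + gamma * best);
    }
}
```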

In a multi-objective RL algorithm, the MDP is usually extended to a multi-objective MDP, and the reward signal is extended to a reward vector [32]. The vector of reward signals in a multi-objective RL algorithm can be denoted as \(\overrightarrow{r}(s,a)=\{r_1(s,a),\ldots ,r_k(s,a)\}\), where k is the number of objectives.

5.2 The Chebyshev scalarization function

The linear scalarization function is widely used in multi-objective algorithms, including RL, PSO, and Ant Colony Optimization (ACO) [15]. However, it can only discover solutions in convex regions of the Pareto front [33]. Besides, it is an acknowledged difficult problem to determine appropriate weights when using the linear scalarization function [14], and inappropriate weights degrade the quality of solutions. Therefore, we use the Chebyshev scalarization method in our proposed algorithm.

The Chebyshev scalarization function is categorized as a nonlinear scalarization method, which overcomes the shortcomings of the linear scalarization function. It can discover Pareto optimal solutions regardless of the front shape. Besides, the Chebyshev scalarization function is not particularly dependent on the actual weights used. We can select a relative optimal weight tuple from several randomly generated ones in preliminary experiments [15].

The Chebyshev scalarization function evaluates actions using the metric \(L_p\). This metric is defined as the distance between a point in the objective space and a utopian point \(z^*\), which records the best value for each objective [34]. During the learning process, \(z^*\) is constantly updated. The value of \(z_i^*\) for objective i can be calculated as follows:

$$\begin{aligned} z_i^*= f_i^\mathrm{best}(x) + \mu , \end{aligned}$$
(16)

where \(f_i\) denotes the function of objective i, \(f_i^\mathrm{best}(x)\) denotes the best value obtained so far for objective i, and \(\mu \) denotes a small constant, which we set to 2.0. We use \(L_p(x)\) to denote the distance between the point corresponding to solution x and the utopian point \(z^*\); it can be expressed as follows:

$$\begin{aligned} L_p(x)=\left( \sum _{i=1}^kw_i |f_i(x)-z_i^*|^p \right) ^{1/p}, \end{aligned}$$
(17)

where \(p\ge 1\), k denotes the number of objectives, and \(0\le w_i\le 1\) is the weight of objective i, with \(\sum _{i=1}^k w_i=1\). When \(p =\infty \), the metric is called the Chebyshev metric and can be expressed as follows:

$$\begin{aligned} L_{\infty }(x)=\max _{i=1\cdots k}w_i |f_i(x)-z_i^*| \end{aligned}$$
(18)

In multi-objective RL, \(f_i(x)\) is replaced by \({\widehat{Q}}_i(s,a)\), which denotes the \({\widehat{Q}}\) value for each objective i. Then, the \({\widehat{SQ}}\) value, or scalarized \({\widehat{Q}}\) value, of a state-action pair \((s,a)\) can be calculated as follows:

$$\begin{aligned} {\widehat{SQ}}(s,a)=\max _{i=1\cdots k}w_i |{\widehat{Q}}_i(s,a)-z_i^*| \end{aligned}$$
(19)

The above equation describes the scalarization of \({\widehat{Q}}\) values when using the Chebyshev scalarization function in RL. The action corresponding to the smallest \({\widehat{SQ}}\) value will be chosen as the greedy action in state s. In other words, the greedy strategy can be expressed as follows:

$$\begin{aligned} \mathrm{greedy}_{a}(s)=\arg \min _{a\in A}{\widehat{SQ}}(s,a), \end{aligned}$$
(20)

where A is the available action set in state s. Algorithm 1 shows the scalarized \(\epsilon \)-greedy strategy for a multi-objective RL algorithm, where \(\overrightarrow{w}\) is the vector of weights for multiple objectives.

Algorithm 1: The scalarized \(\epsilon \)-greedy strategy
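A compact sketch of this scalarized \(\epsilon \)-greedy choice (Eqs. (19), (20)) is given below; the method and variable names are illustrative rather than taken from our implementation:

```java
import java.util.List;
import java.util.Random;

// Sketch of the scalarized epsilon-greedy choice of Algorithm 1 / Eqs. (19)-(20): each
// available action is scored by the weighted Chebyshev distance of its Q-vector to the
// utopian point z*, and the action with the smallest SQ value is taken greedily.
public class ChebyshevSelection {
    static final Random RNG = new Random();

    // qValues.get(a)[i] is Q_i(s, a) for objective i; w[i] are the weights; zStar[i] is z*_i.
    static int selectAction(List<double[]> qValues, double[] w, double[] zStar, double epsilon) {
        if (RNG.nextDouble() < epsilon) {
            return RNG.nextInt(qValues.size());            // explore
        }
        int best = 0;
        double bestSQ = Double.POSITIVE_INFINITY;
        for (int a = 0; a < qValues.size(); a++) {
            double sq = 0.0;                               // Eq. (19): max_i w_i |Q_i - z*_i|
            for (int i = 0; i < w.length; i++) {
                sq = Math.max(sq, w[i] * Math.abs(qValues.get(a)[i] - zStar[i]));
            }
            if (sq < bestSQ) { bestSQ = sq; best = a; }    // Eq. (20): greedy = arg min SQ
        }
        return best;
    }
}
```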

5.3 The learning agent of EnMORL

In this part, we introduce the learning agent of EnMORL in terms of the state space, the available action set, and the reward vector.

5.3.1 The state space

The state space in the EnMORL algorithm describes the usage of all VM types at each time step, which is denoted by \(S=\{s_1,\ldots ,s_q\}\). Apart from the final state, each state corresponds to a time step.

The state \(s_i\in S\) denotes the usage of all VM types at time step i. It can be expressed as \(s_i= \{s_{i}^1,\ldots ,s_{i}^n\}\), where n is the number of VM types, and \(s_{i}^j\in s_i\) is the set of all tasks that have been assigned to VM type \(v_j \in V\) at time step i. In our implementation, the usage of all VM types and the task assignments in the current state \(s_i\) are determined from the current values of the decision variables. Note that constraint (11) can only be verified after a complete solution has been constructed.

At each time step, we schedule exactly one task based on the available action set, so the maximum time step is determined by the number of tasks in the workflow and \(q=m+1\), where m is the number of tasks. In particular, \(s_{m+1}\) is the final state, in which all tasks have been assigned to a VM.

5.3.2 The available action set

The set of available actions at time step i is denoted by \(A_i\). Tasks are sorted and scheduled in descending order of their priorities, which are calculated according to Eq. (26). We denote the task to be scheduled at time step i as \(t_i\).

As described above, most existing works of budget-constrained workflow scheduling cannot ensure the feasibility of solutions. Instead, they discuss the success rate or the failure rate of the scheduling in the experiments. Only a few works, like [11] and [12], can guarantee the feasibility. However, their methods are too complex.

To guarantee the feasibility of solutions, we use a concept called the RCB in our algorithm. This concept was originally proposed in [9] to select computing resources for tasks. RCB is defined as the remaining cheapest cost of the unscheduled tasks excluding the current task \(t_i\), and it is updated as follows:

$$\begin{aligned} \mathrm{RCB}=\mathrm{RCB}-\mathrm{EC}_{c}(t_i), \end{aligned}$$
(21)

where \(\mathrm{EC}_{c}(t_i)\) is the least cost of executing task \(t_i\). In this paper, we define a concept called the possible remaining budget. If task \(t_i\) is assigned to VM type \(v_j\) at time step i, then the possible remaining budget \(\mathrm{PRB}(t_i,v_j)\) can be calculated as:

$$\begin{aligned} \mathrm{PRB}(t_i,v_j)=\left\{ \begin{array}{ll} \mathrm{BG}-\mathrm{ET}(t_i,v_j)\cdot c_j, &{}\quad i=1\\ \mathrm{BG}-\sum _{u=1}^{i-1}\mathrm{EC}(t_u)-\mathrm{ET}(t_i,v_j)\cdot c_j,&{}\quad i>1, \end{array} \right. \end{aligned}$$
(22)

where \(\sum _{u=1}^{i-1}\mathrm{EC}(t_u)\) denotes the total cost of tasks that have been assigned before, and \(\mathrm{EC}(t_u)\) is calculated according to Eq. (5). In EnMORL, taking action means selecting an appropriate VM type to execute the current task. Then, the available action set \(A_i\) at time step i can be represented as follows:

$$\begin{aligned} A_i= \{v_j\in V| \mathrm{PRB}(t_i,v_j) \ge \mathrm{RCB} \} \end{aligned}$$
(23)

Our core idea is to ensure that the actual remaining budget (ARB) never drops below the RCB during the mapping phase; in other words, the remaining budget is always kept sufficient to finish the remaining tasks at the cheapest rate. This method guarantees feasibility even in the worst case of a budget factor \(k_\mathrm{BG}=0\): at each time step i, the inequality \(\mathrm{PRB}(t_i,v_j) \ge \mathrm{RCB}\) then holds only when task \(t_i\) is assigned to the cheapest VM type. We do not consider \(k_\mathrm{BG}=0\) in the experiments, because there is only one feasible solution in that case.
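The construction of the available action set in Eqs. (21)–(23) can be sketched as follows, assuming the RCB has already been updated for the tasks after \(t_i\); the names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Eqs. (21)-(23): the actions available at step i are the VM types whose possible
// remaining budget PRB stays at or above the remaining cheapest budget RCB of the tasks
// that still have to be scheduled. Variable names are illustrative.
public class FeasibleActions {

    // execCost[j] = ET(t_i, v_j) * c_j for the current task on VM type v_j,
    // spentSoFar   = total cost of all previously assigned tasks,
    // rcb          = remaining cheapest budget for the tasks after t_i (already updated, Eq. (21)).
    static List<Integer> availableVmTypes(double budget, double spentSoFar,
                                          double[] execCost, double rcb) {
        List<Integer> actions = new ArrayList<>();
        for (int j = 0; j < execCost.length; j++) {
            double prb = budget - spentSoFar - execCost[j];   // Eq. (22)
            if (prb >= rcb) {                                 // Eq. (23)
                actions.add(j);
            }
        }
        return actions;
    }
}
```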

5.3.3 The reward vector

In multi-objective RL algorithms, we use a vector of reward signals instead of a single reward signal. Let \(\overrightarrow{r}=[r_1,r_2]^T\) denote the reward vector, where \(r_1\) is the reward of the first objective, and \(r_2\) is the reward of the second objective.

If the action selected at time step i is \(a_i\in A_i\), the reward of the first objective can be calculated as follows:

$$\begin{aligned} r_1(s_i,a_i)=\frac{\mathrm{FT}_{\mathrm{best}}(t_i)}{\mathrm{FT}(t_i,a_i)}, \end{aligned}$$
(24)

where \(\mathrm{FT}_{\mathrm{best}}(t_i)\) denotes the finish time of task \(t_i\) when it is executed on a VM with the highest performance, and \(\mathrm{FT}(t_i,a_i)\) denotes the finish time of task \(t_i\) when it is executed on VM type \(a_i\). Similarly, the reward of the second objective can be calculated as follows:

$$\begin{aligned} r_2(s_i,a_i)=\frac{E_{\mathrm{best}}(t_i)}{E(t_i,a_i)}, \end{aligned}$$
(25)

where \(E_{\mathrm{best}}(t_i)\) denotes the energy consumption of task \(t_i\) when it is executed on a VM with the lowest power capability, and \(E(t_i,a_i)\) denotes the energy consumption of task \(t_i\) when it is executed on VM type \(a_i\).

5.4 The description of EnMORL

Like other list-based algorithms, EnMORL includes a ranking phase and a mapping phase.

In the ranking phase, each task \(t_i\) in the workflow is assigned a priority [23], denoted by \(\mathrm{pr}(t_i)\), which is calculated as follows:

$$\begin{aligned} \mathrm{pr}(t_i)=\overline{\mathrm{ET}(t_i)}+\max \limits _{t_s\in \mathrm{succ}(t_i)} \left\{ \frac{\mathrm{tran}_{i,s}}{b}+\mathrm{pr}(t_s) \right\} , \end{aligned}$$
(26)

where \(\overline{\mathrm{ET}(t_i)}\) is the average execution time of task \(t_i\) over all VM types. The priority \(\mathrm{pr}(t_i)\) represents the length of the longest path from task \(t_i\) to the exit node. Because we do not yet know where the tasks will run in the ranking phase, the average value is used in Eq. (26). In the mapping phase, tasks are sorted in descending order of their priorities and assigned according to Algorithm 2.
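A memoized recursion over the DAG suffices to compute Eq. (26); the sketch below mirrors the illustrative task structure of Sect. 4.1 and uses the mean of \(\mathrm{in}_i/p_j\) over all VM types as the average execution time:

```java
// Sketch of the upward-rank priority in Eq. (26): pr(t_i) is the average execution time of
// t_i over all VM types plus the most expensive (transfer + successor priority) path to the
// exit task, computed by memoized recursion over the DAG. Names are illustrative.
public class RankingPhase {

    // workloadMI[i] = in_i, speedMIPS[j] = p_j, succ[i] = successor indices,
    // tranMB[i][s] = data on edge (t_i, t_s), bandwidth = b.
    static double[] priorities(double[] workloadMI, double[] speedMIPS,
                               int[][] succ, double[][] tranMB, double bandwidth) {
        double meanSpeedInv = 0.0;                              // mean of 1/p_j over VM types
        for (double p : speedMIPS) meanSpeedInv += 1.0 / p;
        meanSpeedInv /= speedMIPS.length;

        double[] pr = new double[workloadMI.length];
        java.util.Arrays.fill(pr, Double.NaN);                  // NaN marks "not yet computed"
        for (int i = 0; i < pr.length; i++) {
            rank(i, workloadMI, meanSpeedInv, succ, tranMB, bandwidth, pr);
        }
        return pr;
    }

    private static double rank(int i, double[] workloadMI, double meanSpeedInv,
                               int[][] succ, double[][] tranMB, double bandwidth, double[] pr) {
        if (!Double.isNaN(pr[i])) return pr[i];
        double avgEt = workloadMI[i] * meanSpeedInv;            // average ET over all VM types
        double maxTail = 0.0;                                   // remains 0 for the exit task
        for (int s : succ[i]) {
            maxTail = Math.max(maxTail, tranMB[i][s] / bandwidth
                    + rank(s, workloadMI, meanSpeedInv, succ, tranMB, bandwidth, pr));
        }
        pr[i] = avgEt + maxTail;
        return pr[i];
    }
}
```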

Algorithm 2 shows the pseudocode of the EnMORL algorithm. This algorithm can also be described in the following five steps.

  Step 1: The ranking phase. In this step, we assign a priority to each task \(t_i\in T\) and then sort all tasks in descending order of their priorities.

  Step 2: The initialization phase. In this step, we initialize the Pareto approximate set \(S_\mathrm{p}\) and the \({\widehat{Q}}\) value for each objective.

  Step 3: The mapping phase. In this step, the learning agent constructs a feasible solution (a trajectory) according to the scalarized \(\epsilon \)-greedy strategy shown in Algorithm 1. After each action, the \({\widehat{Q}}\) values are updated according to Eq. (15).

  Step 4: Update of the Pareto approximate set. After a solution has been constructed, the Pareto approximate set \(S_\mathrm{p}\) is updated. If the solution constructed in the current iteration is not dominated by any solution in \(S_\mathrm{p}\), it is added to the set, and all solutions dominated by the newly added one are removed from the set.

  Step 5: The termination check. If the maximum number of iterations is reached, the algorithm terminates and returns the Pareto approximate set \(S_\mathrm{p}\). Otherwise, go to Step 3.

Algorithm 2: The EnMORL algorithm

More specifically, Step 1 corresponds to lines 1–4 in Algorithm 2, Step 2 corresponds to lines 5, 6, Step 3 corresponds to lines 8–29, Step 4 corresponds to line 31, and Step 5 corresponds to line 32.

6 Performance evaluation

In this section, we compare our EnMORL algorithm with state-of-the-art scheduling algorithms in terms of makespan, energy consumption, and the quality of the Pareto approximate set. The section consists of five parts, namely workflow structure, simulation setup, weight selection, solution selection, and experimental results.

6.1 Workflow structure

In simulation experiments, we use four different synthetic workflows, which are described in [35], to evaluate the performance of our proposed algorithm. These synthetic workflows are based on realistic workflow structures from different scientific areas, which are as follows:

  1. Montage: Astronomy

  2. EpiGenomics: Biology

  3. CyberShake: Earthquake

  4. SIPHT: Biology

The characterization of the above workflows, including their structure, data, and computational requirements, is described in detail in [35]. The corresponding workflow structures are shown in Fig. 1.

Fig. 1 Overview of four workflow structures

6.2 Simulation setup

In simulations, all algorithms are implemented in the Java language and run on a PC with an Intel Core i5-9400F CPU at 2.90 GHz and 8 GB RAM.

In this paper, we use CloudSim [36], a widely used framework, to simulate the environment of a cloud data center. CloudSim offers a repeatable and controllable experimental environment, allowing users to ignore low-level hardware details.

The cloud data center provides computing resources in the form of 10 different VM types. Each VM type is dynamic voltage scaling (DVS) enabled and is generated using a method similar to those in [27] and [8]. For each VM type \(v_j\), a set of voltage supply levels (VSLs) is chosen uniformly at random from the three sets of VSLs shown in Table 1. The processing speed of each VM type is chosen at random in the range of 1000–5000 MIPS, so the fastest VM type is five times faster than the slowest one. A VM operates at its maximum voltage level whenever a task is executing on it; when a VM is idle, its supply voltage drops to the minimum level [8]. The bandwidth of every communication link between VMs is identical and set to 25 Mbps. Moreover, a pricing model similar to Amazon EC2's is used in the experiments, and we assume that the unit price of each VM type is proportional to its performance.

In Pareto-based RL algorithms, the convergence time, i.e., the time needed to obtain the optimized solution set, is usually measured with respect to a quality indicator called hypervolume [15, 25, 26].

As shown in Fig. 2, hypervolume is defined as the volume of the region bounded by the solution points and the reference point [19]. For a maximization problem, a suitable reference point can be obtained by determining the lower limit of each objective and subtracting a small constant; for a minimization problem, it is obtained by determining the upper limit of each objective and adding a small constant. If there is only one solution in the Pareto approximate set, the hypervolume value is 0. For a Pareto approximate set, a higher hypervolume means better quality.
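For the bi-objective minimization case used in this paper, the hypervolume can be computed by sweeping the front sorted by one objective, as in the sketch below; following the convention stated above, a set with a single solution is assigned the value 0, and all names are illustrative:

```java
import java.util.Arrays;

// Sketch of the bi-objective hypervolume indicator: for a minimization problem
// (makespan, energy) and a reference point R worse than every solution, it is the
// area dominated by the non-dominated front and bounded by R. Names are illustrative.
public class Hypervolume {

    // points[k] = {makespan, energy} of the k-th non-dominated solution; ref = {R_ms, R_en}.
    static double compute(double[][] points, double[] ref) {
        if (points.length <= 1) return 0.0;       // single-solution convention used in this paper
        double[][] front = points.clone();
        Arrays.sort(front, (a, b) -> Double.compare(a[0], b[0]));   // ascending makespan
        double volume = 0.0;
        for (int i = 0; i < front.length; i++) {
            double nextX = (i + 1 < front.length) ? front[i + 1][0] : ref[0];
            volume += (nextX - front[i][0]) * (ref[1] - front[i][1]);
        }
        return volume;
    }
}
```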

The Q-learning algorithm converges to the optimal policy if each state and action is sampled sufficiently often [31]. For example, Fig. 3 shows the convergence of EnMORL with respect to hypervolume in the case of Montage and \(k_\mathrm{BG}=0.4\). The convergence time here is expressed in iterations; as the number of iterations increases, the hypervolume converges to a fixed value.

Because parameter settings are important for RL, we conduct experiments to configure them, starting from values commonly used in RL algorithms. Table 2 shows a set of experiments in the case of the Montage workflow and \(k_\mathrm{BG}=0.4\); the columns "hyper" and "\(I_\mathrm{con}\)" represent the hypervolume value and the convergence iteration, respectively. From the table, we find that varying the parameters has a greater influence on the convergence time than on the hypervolume, and we observe similar behavior in the other cases. Therefore, the parameter configuration is mainly based on convergence time. The same parameter configuration is used for EnMORL in all experiments: the parameter \(\epsilon \) in the \(\epsilon \)-greedy strategy is set to 0.1, the learning rate \(\alpha _i\) is fixed at 0.1 for every time step, and the discount factor \(\gamma \) is set to 0.9. Results are collected and averaged over 30 trials of 500 iterations each.

Fig. 2 Given a reference point R, the gray area represents the hypervolume obtained for a Pareto approximate set in a bi-objective environment

Table 1 Voltage–relative speed pairs
Fig. 3 The hypervolume value relative to the iterations in the case of Montage and \(k_\mathrm{BG}=0.4\)

Table 2 Parameter configuration in the case of Montage and \(k_\mathrm{BG}=0.4\)

6.3 Weight selection

Weight selection is very important in multi-objective problems: inappropriate weights severely degrade the quality of the solution set. However, it is an acknowledged difficult problem to determine the weights in multi-objective algorithms [14]. Moffaert et al. [15] find that the combination of RL and the Chebyshev scalarization function is not particularly dependent on the weights used.

When applying the Chebyshev scalarization function to RL, most weight tuples can achieve good results. We can choose a relatively optimal weight tuple from randomly generated ones. The weight selection in the Chebyshev scalarization method is based on hypervolume.

In preliminary experiments, we select a relatively optimal weight tuple from randomly generated ones for each workflow according to the hypervolume value. For instance, consider the weight selection in the case of the Montage workflow and \(k_\mathrm{BG}=0.4\). We randomly generate 10 scalarization weights \(w_1\in (0,1)\) for the makespan objective; the corresponding weights of the energy objective are then \(w_2=1-w_1\). We collect and average the hypervolume of each weight tuple over 20 trials. Figure 4 shows the hypervolume values corresponding to the 10 weight tuples in the case of Montage and \(k_\mathrm{BG}=0.4\). In particular, EnMORL returns only one solution for the fourth tuple, so the corresponding hypervolume value is 0. We choose the third tuple, which corresponds to the largest hypervolume.

Fig. 4 The obtained hypervolume corresponding to the policies learned for each of the 10 weight tuples in the case of Montage and \(k_\mathrm{BG}=0.4\)

6.4 Solution selection

The Pareto approximate set contains multiple different solutions. Since our goal is to minimize both makespan and energy consumption, we select the solution closest to the origin to evaluate performance. To do so, we first normalize the values of the two objectives and then calculate the Euclidean distance between each normalized solution point and the origin. The Euclidean distance \(D_\mathrm{e}(x)\) of solution x can be calculated as follows:

$$\begin{aligned} D_\mathrm{e}(x)=\sqrt{\mathrm{NM}(x)^2+\mathrm{NE}(x)^2}, \end{aligned}$$
(27)

where \(\mathrm{NM}(x)=f_1(x)/\mathrm{MS}_{c}\) is the normalized makespan, \(\mathrm{NE}(x)=f_2(x)/E_{h}\) is the normalized energy consumption, \(f_1(x)\) is the actual makespan of solution x, \(\mathrm{MS}_{c}\) is the makespan when all tasks are executed on the cheapest VM type, \(f_2(x)\) is the actual energy consumption of solution x, and \(E_{h}\) is the energy consumption when all tasks are executed on the most expensive VM type.
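The selection rule of Eq. (27) then reduces to a simple scan over the Pareto approximate set, as in the following illustrative sketch:

```java
// Sketch of the selection rule in Eq. (27): each solution's makespan and energy are
// normalized by MS_c and E_h, and the solution closest to the origin in Euclidean
// distance is picked from the Pareto approximate set. Names are illustrative.
public class SolutionSelection {

    // makespan[k], energy[k] are the objective values of the k-th non-dominated solution.
    static int closestToOrigin(double[] makespan, double[] energy,
                               double msCheapest, double eHighest) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < makespan.length; k++) {
            double nm = makespan[k] / msCheapest;     // NM(x)
            double ne = energy[k] / eHighest;         // NE(x)
            double d = Math.sqrt(nm * nm + ne * ne);  // D_e(x)
            if (d < bestDist) { bestDist = d; best = k; }
        }
        return best;
    }
}
```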

6.5 Experimental results

In this section, EnMORL is compared with two popular multi-objective meta-heuristics, NSGA-II [22] and MOPSO [24], in terms of makespan, energy consumption, and the hypervolume indicator. More specifically, the NSGA-II used for comparison is the multi-objective algorithm in [13], and the MOPSO used for comparison is the newer variant HPSO [8].

Fig. 5 Makespan of three multi-objective algorithms under different \(k_\mathrm{BG}\) in the case of a Montage, b EpiGenomics, c CyberShake and d SIPHT

The experiments consist of four parts in the case of four different workflows, namely Montage, EpiGenomics, CyberShake, and SIPHT. They are representative and reasonable examples of workflows.

6.5.1 Montage workflow

The experimental results of the Montage workflow with different budget factors are shown in Figs. 5a, 6a, and 7a.

Figure 5a shows the makespan results. On the whole, the makespan generally declines as the budget factor rises. When \(k_\mathrm{BG}=0.1\), the makespan obtained by using EnMORL is \(29.33\%\) and \(20.36\%\) lower than that obtained by using NSGA-II and MOPSO, respectively. When \(k_\mathrm{BG}\) increases to 0.9, the makespan obtained by using EnMORL becomes \(16.93\%\) and \(7.55\%\) lower than that obtained by using NSGA-II and MOPSO, respectively.

As we can see from Fig. 6a, energy consumption generally increases with the budget factor. The main reason is that, as the budget increases, more tasks are assigned to VM types with higher performance, and a task executed on a higher-performance VM consumes more energy. In the case of \(k_\mathrm{BG}=0.1\), the energy consumption obtained by using EnMORL is \(16.02\%\) and \(5.71\%\) lower than that obtained by using NSGA-II and MOPSO, respectively. When \(k_\mathrm{BG}\) increases to 0.9, the energy consumption obtained by using EnMORL becomes \(38.24\%\) and \(29.01\%\) lower than that obtained by using NSGA-II and MOPSO, respectively.

Hypervolume is a commonly accepted quality indicator in multi-objective RL [15]. Figure 7a shows the hypervolume values obtained by the three multi-objective algorithms when the budget factor \(k_\mathrm{BG}\) is 0.2, 0.4, 0.6, and 0.8. The hypervolume values obtained by EnMORL are higher than those obtained by NSGA-II and MOPSO. This is because the weights used in EnMORL are selected according to hypervolume, whereas the weights used in NSGA-II and MOPSO are simply set to 0.5 and both algorithms rely on the linear scalarization function. Determining the weights for the linear scalarization function is an acknowledged difficult problem [14], and the weight tuple is often interpreted as the preference for each objective. A very low or zero hypervolume value indicates that the weight tuple in use is inappropriate.

Fig. 6 Energy consumption of three multi-objective algorithms under different \(k_\mathrm{BG}\) in the case of a Montage, b EpiGenomics, c CyberShake and d SIPHT

6.5.2 EpiGenomics workflow

The experimental results of the EpiGenomics workflow with different budget factors are shown in Figs. 5b, 6b and 7b.

Figure 5b shows the makespan results. On the whole, the makespan generally declines as the budget factor rises. When \(k_\mathrm{BG}=0.1\), the makespan obtained by using EnMORL is \(45.95\%\) and \(37.22\%\) lower than that obtained by using NSGA-II and MOPSO, respectively. When \(k_\mathrm{BG}\) increases to 0.9, the makespan obtained by using EnMORL becomes \(48.36\%\) and \(25.29\%\) lower than that obtained by using NSGA-II and MOPSO, respectively.

As we can see from Fig. 6b, energy consumption generally increases with the budget. In the case of \(k_\mathrm{BG}=0.1\), the energy consumption obtained by using EnMORL is \(23.63\%\) and \(7.37\%\) lower than that obtained by using NSGA-II and MOPSO, respectively. When \(k_\mathrm{BG}\) increases to 0.9, the energy consumption obtained by using EnMORL becomes \(34.54\%\) and \(21.55\%\) lower than that obtained by using NSGA-II and MOPSO, respectively.

Figure 7b shows the hypervolume values obtained by the three multi-objective algorithms when the budget factor \(k_\mathrm{BG}\) is 0.2, 0.4, 0.6, and 0.8. The hypervolume values obtained by using EnMORL are always higher than those obtained by using NSGA-II and MOPSO. A very low or zero hypervolume value indicates that the weight tuple in use is inappropriate.

Fig. 7 Hypervolume of three multi-objective algorithms under different \(k_\mathrm{BG}\) in the case of a Montage, b EpiGenomics, c CyberShake and d SIPHT

6.5.3 CyberShake workflow

The experimental results of the CyberShake workflow with different budget factors are shown in Figs. 5c, 6c and 7c.

Figure 5c shows the makespan results. On the whole, the makespan generally declines as the budget factor rises. In the case of \(k_\mathrm{BG}=0.1\), the makespan obtained by using EnMORL is \(48.54\%\) and \(37.75\%\) lower than that obtained by using NSGA-II and MOPSO, respectively. When \(k_\mathrm{BG}\) increases to 0.9, the makespan obtained by using EnMORL becomes \(41.53\%\) and \(19.09\%\) lower than that obtained by using NSGA-II and MOPSO, respectively.

As we can see from Fig. 6c, energy consumption generally increases with the budget factor. In the case of \(k_\mathrm{BG}=0.1\), the energy consumption obtained by using EnMORL is \(29.60\%\) and \(16.87\%\) lower than that obtained by using NSGA-II and MOPSO, respectively. When \(k_\mathrm{BG}\) increases to 0.9, the energy consumption obtained by using EnMORL becomes \(49.09\%\) and \(31.36\%\) lower than that obtained by using NSGA-II and MOPSO, respectively.

Figure 7c shows the hypervolume values obtained by the three multi-objective algorithms when the budget factor \(k_\mathrm{BG}\) is 0.2, 0.4, 0.6, and 0.8. The hypervolume values obtained by using EnMORL are always higher than those obtained by using NSGA-II and MOPSO. A very low or zero hypervolume value indicates that the weight tuple in use is inappropriate.

6.5.4 SIPHT workflow

The experimental results of the SIPHT workflow with different budget factors are shown in Figs. 5d, 6d and 7d.

The makespan results are shown in Fig. 5d. On the whole, the makespan generally declines as the budget factor rises. In the case of \(k_\mathrm{BG}=0.1\), the makespan obtained by using EnMORL is \(54.82\%\) and \(47.92\%\) lower than that obtained by using NSGA-II and MOPSO, respectively. When \(k_\mathrm{BG}\) increases to 0.9, the makespan obtained by using EnMORL becomes \(48.73\%\) and \(36.34\%\) lower than that obtained by using NSGA-II and MOPSO, respectively.

As we can see from Fig. 6d, energy consumption generally increases with the budget factor. In the case of \(k_\mathrm{BG}=0.1\), the energy consumption obtained by using EnMORL is \(15.61\%\) and \(12.49\%\) lower than that obtained by using NSGA-II and MOPSO, respectively. When \(k_\mathrm{BG}\) increases to 0.9, the energy consumption obtained by using EnMORL becomes \(37.07\%\) and \(25.29\%\) lower than that obtained by using NSGA-II and MOPSO, respectively.

Figure 7d shows the hypervolume values obtained by the three multi-objective algorithms when the budget factor \(k_\mathrm{BG}\) is 0.2, 0.4, 0.6, and 0.8. The hypervolume values obtained by using EnMORL are always higher than those obtained by using NSGA-II and MOPSO. A very low or zero hypervolume value indicates that the weight tuple in use is inappropriate.

In conclusion, the comparison results show that the EnMORL algorithm outperforms the other two algorithms in terms of makespan, energy consumption, and the hypervolume indicator.

7 Conclusion

Workflow scheduling has become a major energy consumer in clouds. This paper considers a multi-objective workflow scheduling problem with a budget constraint, with the goal of simultaneously minimizing makespan and energy consumption while meeting the budget. The paper focuses on the weight selection problem and on how to satisfy the budget constraint, both of which have seldom been considered in related works.

We propose a multi-objective RL algorithm called EnMORL. In EnMORL, we introduce a simple new method, based on RCB, to satisfy the budget constraint and guarantee the feasibility of solutions; this method can also be applied to other budget-constrained workflow scheduling problems. EnMORL builds on the Chebyshev scalarization function, which efficiently addresses the weight selection problem. The experimental results show that EnMORL outperforms two state-of-the-art multi-objective meta-heuristics on four different workflows.

In future work, we will extend our algorithm to the dynamic workflow scheduling problem. We intend to use a conditional variational auto-encoder (CVAE) to predict the current situation from historical data.