
1 Introduction

Recently, various scientific fields have employed workflows to analyze large amounts of data and to perform complex simulations and experiments efficiently. A process in such scientific applications can be modeled as a workflow by dividing it into smaller and simpler tasks, which can then be distributed to multiple computing resources [1]. Workflow systems usually present graphical interfaces that combine different technologies with efficient methods for using them, thereby increasing the productivity of scientists. Workflows are usually represented as directed graphs, with nodes representing discrete computational components and edges representing the connections along which data and results are communicated among components. Workflows come in different types, and their execution typically requires computing platforms that satisfy different QoS requirements, e.g., minimum completion time, load balancing, and economic cost.

Recently, cloud computing has been recognized as a promising paradigm for providing flexible, on-demand computing infrastructure over the Internet for large-scale scientific-workflow-based applications. The services provided by clouds include Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS) [2]. SaaS clouds offer web applications/software over the Internet, running on cloud infrastructure. PaaS and SaaS clouds are less suitable for scientific workflows than IaaS ones because they mainly offer environments to design, develop, and test web-based applications. IaaS clouds, in contrast, offer an easily accessible, flexible, and scalable infrastructure suitable for the deployment of large-scale scientific workflows on an on-demand, pay-per-use basis [3].

One of the most challenging NP-complete problems that researchers try to address is how to schedule large-scale scientific applications onto distributed and heterogeneous computational nodes, e.g., IaaS clouds, such that quantitative objective functions such as process make-span are optimized while execution constraints such as communication cost and storage requirements are fulfilled. From the end-user's perspective, a low make-span is always preferred, whereas from the system's perspective, system-level efficiency and fairness also matter: scientific applications and tasks should be fairly distributed among computational resources to avoid hot spots and performance bottlenecks. However, a careful investigation of related work shows that only a few schemes deal with both perspectives, i.e., optimizing user objectives (e.g., make-span) while fulfilling other constraints and providing a fair workload distribution among the physical computational resources of clouds.

The primary aim of this paper is therefore to propose a multi-objective scheduling method that addresses the real-time workflow scheduling problem on multiple IaaS clouds. Specifically, we consider a multi-objective workflow scheduling approach based on a dynamic game-theoretic model. It aims at reducing workflow make-spans, reducing cost, and maximizing system fairness in terms of workload distribution among heterogeneous VMs. We conduct extensive case studies based on various well-known scientific workflow templates and heterogeneous VMs created on real-world third-party commercial IaaS clouds, i.e., Amazon, Tencent, and Ali clouds. Experimental results clearly suggest that our proposed approach outperforms traditional ones by achieving lower workflow make-spans, lower cost, and better system fairness. Table 1 summarizes the notations and their descriptions.

Table 1. Notations and description summary

The paper is structured as follows. In Sect. 2, we review related work. In Sect. 3, we present the formulation of the heterogeneous-VM-based multi-workflow scheduling problem. In Sect. 4, we present the real-time multi-objective scheduling algorithm based on the dynamic game-theoretic model. In Sect. 5, we conduct extensive case studies to validate our proposed approach. The paper concludes in Sect. 6 with a summary.

2 Related Work

With the rapidly growing data and computational requirements of large-scale workflow applications, scheduling multiple workflows in distributed systems has become an important and challenging research topic. In this section, we briefly review the most relevant related work.

2.1 Multi-objective Workflow Scheduling

Optimization models for (single or multiple) workflow scheduling aim at finding tradeoffs among multiple quantitative objectives, e.g., make-span, cost, reliability, energy consumption, security, or load balancing. Extensive efforts, e.g., [4,5,6,7,8,9,10,11], have been devoted to this direction. Durillo et al. [12] proposed tradeoff solutions generated by a multi-objective heterogeneous earliest finish time (MOHEFT) algorithm for the multi-objective workflow scheduling problem. Yassa et al. [13] proposed an approach based on the dynamic voltage and frequency scaling (DVFS) technique for multi-objective workflow scheduling in clouds to minimize energy consumption, and introduced a hybrid particle swarm optimization (PSO) algorithm to optimize scheduling performance.

Many multi-objective evolutionary algorithms have been extended to deal with such problems. Khajemohammadi et al. [14] proposed a fast genetic workflow scheduling algorithm for grid infrastructures. Zhu et al. [15] proposed an evolutionary multi-objective optimization (EMO)-based workflow scheduling algorithm with novel schemes for problem-specific encoding, population initialization, fitness evaluation, and genetic operators. Chen et al. [16] proposed an ant colony optimization (ACO) algorithm to schedule large-scale workflows with make-span and cost as objectives. Padmaveni et al. [17] introduced a hybrid particle swarm memetic (PSM) algorithm with make-span and deadline as the optimization objectives.

Recently, Pareto-optimal methods have been frequently employed. They pursue a set of compromise solutions that represent good approximations to the Pareto-optimal front (PF). For instance, Zheng et al. [18] proposed a Pareto-based fruit fly optimization algorithm (PFOA) to solve the task scheduling and resource allocation (TSRA) problem in cloud computing environments. Hou et al. [19] studied Pareto optimization for scheduling crude oil operations in a refinery via a genetic algorithm. Ebadifard et al. [20] introduced a black-hole optimization (BHO) framework for workflow scheduling based on a Pareto optimizer, which allows users to select the best scheduling plan from a proper set of candidates.

2.2 Game-Theoretic-Based Scheduling

Game-theoretic models and methodologies are widely applied to multi-constraint process scheduling on clouds, covering social, economic, and resource scheduling problems. Fard et al. [21] suggested a novel pricing model and a truthful scheduling mechanism that find the best resource using game-theoretic concepts. Duan et al. [22] modeled the workflow scheduling problem as a sequential cooperative game and proposed a communication- and storage-aware multi-objective algorithm with network bandwidth and storage requirements as constraints. Sujana et al. [23] applied a game-based multi-objective algorithm to minimize the execution time and cost of single workflow applications.

3 Model and Formulation

In this section, we first present the problem description and formulation of multi-objective workflow scheduling over heterogeneous cloud VMs. Then, we propose a finite multi-stage game model, i.e., Fig. 1, to reconcile multiple objectives and introduce a dynamic game-theoretic-based algorithm to reduce make-span, optimize system fairness and reduce the total cost.

Fig. 1.

The abstract model for multi-workflow scheduling problem

3.1 Problem Formulation

In this study, we consider scientific computational processes described by multiple workflows that are to be scheduled onto heterogeneous VMs created over multiple IaaS CSPs. Each workflow can be represented by a directed acyclic graph (DAG), \(W = (V, E)\), where V is a set of n tasks, i.e., \(\{t_1, t_2,\dots , t_n \}\), and E is a set of precedence dependencies. Each task \(t_i\) represents an individual application with a certain execution time \(v_i\) on a VM. A precedence dependency \(e_{ij}=(t_i,t_j)\) indicates that \(t_j\) starts only after the data from \(t_i\) are received. The source and destination of a dependency \(e_{ij}\) are called the parent and the child task, respectively. Each workflow has an input and an output task, added to its beginning and end, respectively. When multiple workflows are ready for execution, we first partition their tasks into multiple phases based on their hops from the input task, as shown in Fig. 2. After the partition, tasks are scheduled to VMs according to our proposed method, with tasks at earlier phases scheduled before those at later ones. Three quantitative objectives are considered: make-span, fairness, and total cost. Note that reducing make-span, i.e., the time required to execute all workflows, usually conflicts with cost reduction, so we consider game-theoretic approaches to reconcile such conflicting optimization aims. The fairness maximization objective aims at achieving a fair distribution of workloads among all VMs, avoiding hot spots and performance bottlenecks.

Fig. 2.

The partition procedure of multiple workflows
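The hop-based partition described above can be sketched in code. The following is a minimal illustration, not the paper's implementation: task ids and edge pairs are assumed representations, and a task's phase is taken as one more than the largest phase among its parents, so all predecessors land in strictly earlier phases.

```python
from collections import defaultdict, deque

def partition_into_phases(tasks, edges):
    """Group DAG tasks into phases by hop distance from the input task(s).

    `tasks` is an iterable of task ids; `edges` is a list of
    (parent, child) precedence pairs. Tasks with no parents form
    phase 0; every other task's phase is one more than the largest
    phase among its parents.
    """
    parents = defaultdict(list)
    children = defaultdict(list)
    indegree = {t: 0 for t in tasks}
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
        indegree[v] += 1

    # Kahn-style topological sweep, recording each task's phase.
    phase = {t: 0 for t in tasks if indegree[t] == 0}
    queue = deque(phase)
    while queue:
        u = queue.popleft()
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                phase[v] = 1 + max(phase[p] for p in parents[v])
                queue.append(v)

    phases = defaultdict(list)
    for t, p in phase.items():
        phases[p].append(t)
    return [sorted(phases[p]) for p in sorted(phases)]
```

For the diamond-shaped workflow input → {a, b} → output, this yields three phases, matching the layered structure of Fig. 2.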

The following hypotheses are stipulated to facilitate the development of the game-theoretic-based method: (1) VMs are created on multiple CSPs. (2) Each task can be executed by only one VM. (3) The task execution duration is the interval between the task setup time and the task cutting time. (4) The dynamic game is finite because the numbers of workflows and tasks are finite; the game thus ends within finitely many moves, and every player has finitely many available choices at every moment.

Based on the above hypotheses, we can formulate the problem into a multi-stage dynamic game-theoretic model:

$$\begin{aligned} Min \,\, f_1 = \text{make-span} = Max\; C_{ijpk} \end{aligned}$$
(1)
$$\begin{aligned} Max\,\, f_2 = \text{fairness index} = \frac{(\sum _{i=1}^{K}{ET}_{ijpk})^2}{K\cdot \sum _{i=1}^{K}{{ET}_{ijpk}}^2} \end{aligned}$$
(2)
$$\begin{aligned} Min \,\, f_3 = \text{cost} = \sum _{p=1}^m\sum _{k=1}^{m_p}\sum _{i=1}^n\sum _{j=1}^{n_i}(t_{ijpk}-t_{ijpks})\cdot u_{pk}\cdot x_{ijpk} \end{aligned}$$
(3)

subject to:

$$\begin{aligned} i \in [1,n], j \in [1, n_i], p \in [1,m], k \in [1, m_p] \end{aligned}$$
(4)
$$\begin{aligned} C_{ijpk}\le C_{i,j+1,p,k}-t_{i,j+1,p,k}-t_{i,j+1,p,k,s}, C_{ijpk}\ge 0 \end{aligned}$$
(5)
$$\begin{aligned} \sum _{k \in VM(T_{ij})}x_{ijpk}=1 \end{aligned}$$
(6)
$$\begin{aligned} VM(T_{ij})\subset VM \end{aligned}$$
(7)

3.2 The Proposed Dynamic Game Model

In this paper, multi-stage dynamic game theory is applied to deal with the conflicts and competition among the multiple optimization objectives of the multi-workflow scheduling problem. The optimization objectives are treated as players in the multi-stage dynamic game model, and the players are assumed to be fully rational. The game equilibrium solutions are taken as the optimal results. Players take actions sequentially, and the choice of an earlier player affects the selection of a later one, because the latter can observe the action of the former. The condition upon which a later player makes its choice is denoted as \(h^l\). The utility functions of the first, second, and third players correspond to make-span (\(u_1=f_1\)), fairness (\(u_2=f_2\)), and total cost (\(u_3=f_3\)), respectively. Consequently, the multi-stage dynamic game formulation of the problem can be described as follows:

$$\begin{aligned} G(h^l)=\{p_1,p_2,p_3; \{S_1^l\}_{l=0}^L, \{S_2^l\}_{l=0}^L, \{S_3^l\}_{l=0}^L; u_1,u_2,u_3 \} \end{aligned}$$
(8)

Let \(H^l=\{h^l\}\) be the set of all possible stage-\(l\) histories. A pure strategy for player i specifies a contingency for every possible history \(h^l\). Formally, the \(l^{th}\)-stage history of the game is denoted as \(h^l=(a^0,a^1, \dots ,a^{l-1})\). The mapping \(\varphi _i: s_i \rightarrow \{S_i^l\}_{l=0}^L\) indicates the pure strategy of player i, which is a collection of mappings from all possible histories into available actions. \(S_i^l\) is a mapping \(\varphi _i: H^l \rightarrow A_i(H^l)\), i.e., for all \(h^l\), \(S_i^l\) satisfies \(S_i^l(h^l) \in A_i (h^l)\). The mapping \(\varphi _i: f_i \rightarrow u_i\) indicates the utility functions of the game. At each stage l, player i computes its pure Nash equilibrium solution based on the history information from the previous stage, i.e., \(h^{l-1}\).
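A pure strategy in this sense is simply a rule that returns an action for every observed history. The fragment below is a hypothetical encoding for illustration: histories are tuples of past action profiles, and the strategy greedily maximizes a supplied utility, standing in for the stage-wise pure Nash equilibrium computation used in the paper.

```python
def make_greedy_strategy(utility):
    """Build a pure strategy S_i: for any history h^l and the set of
    actions A_i(h^l) available after it, return the utility-maximizing
    action. `utility(history, action)` is a hypothetical stand-in for
    the player's objective function u_i.
    """
    def strategy(history, available_actions):
        return max(available_actions, key=lambda a: utility(history, a))
    return strategy
```

For example, a player whose utility is the action's value itself would pick 3 from the available actions [1, 3, 2] regardless of the history.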

4 The Algorithm to Obtain Approximate Equilibrium

According to the earlier discussion, each sequential game is represented by a game tree of length \(L+1\), as shown in Fig. 3. To determine the optimal behaviors of players, we employ sub-game perfectness in the finite multi-stage game with perfect information. A multi-stage dynamic game with perfect information may have multiple Nash equilibria, some of which involve non-credible threats or promises. The sub-game perfect Nash equilibria (SPNE) are those that pass credibility tests. An SPNE solution can be found through a standard procedure [24] using the backward induction method. However, the standard procedure requires a traversal of the game tree, and this tree is unfortunately extremely large for the multi-VM multi-workflow problem. We therefore consider approximate equilibrium solutions with reduced complexity, defined as follows:

$$\begin{aligned} \sum _{l=0}^Lu^l(S_1^l,S_2^l,S_3^l,h^{l-1})\ge \sum _{l=0}^Lu^l(S_1^*,S_2^*,S_3^*,h^{l-1}) \end{aligned}$$
(9)

The approximate equilibrium \(S^*=(S_1^*,S_2^*,S_3^*)\) is a set of strategies based on the game in Eqs. (8)–(9), where \(S^*\) is a combination of the pure-strategy Nash equilibria at the L stages. The decision strategy space S equals the variable space X.
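The standard backward-induction procedure mentioned above can be sketched on a toy game tree. This is a generic textbook construction under assumed data structures, not the paper's multi-VM tree (whose size is exactly what motivates the approximation): a leaf is a payoff tuple, an internal node is a (player index, branches) pair.

```python
def backward_induction(node):
    """Solve a finite perfect-information game tree by backward induction.

    A leaf is a tuple of payoffs (one per player); an internal node is
    (player_index, {action: subtree}). Returns (payoffs, action_path):
    the subgame-perfect payoff vector and the equilibrium actions
    taken from this node downward.
    """
    # Leaf: second element is a payoff number, not a branch dict.
    if not isinstance(node[1], dict):
        return node, []
    player, branches = node
    best_payoffs, best_path = None, None
    for action, subtree in branches.items():
        payoffs, path = backward_induction(subtree)
        # The player to move keeps the branch maximizing its own payoff.
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_payoffs, best_path = payoffs, [action] + path
    return best_payoffs, best_path
```

In the two-player tree where player 0 chooses L or R, and after L player 1 chooses between payoffs (3, 1) and (0, 2): player 1 would pick (0, 2), so player 0 prefers R with payoffs (2, 0). The non-credible outcome (3, 1) is correctly eliminated, which is the credibility test SPNE formalizes.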

We introduce a multi-stage dynamic game-theoretic (MDGT) algorithm, Algorithm 1, to obtain the approximate equilibrium solutions. During each stage l of workflow planning, a dynamic game-theory-based real-time scheduling method is triggered so that tasks can be assigned to the most suitable VMs based on the real-time cloud environment. The aim of the scheduling layer is to map optional tasks to the most appropriate VMs. The algorithm repeatedly handles each stage until all tasks are scheduled; the major steps within each stage are as follows:

Step 1: create a real-time scheduling task pool with multi-phase tasks from multiple workflows and put \(T_{option}^l\) into it. \(T_{option}^l\) must satisfy the topological dependence of its corresponding workflow, i.e., a task is executed only after all its predecessors have been executed.

Step 2: assign \(VM_{idle}^l\) to the three objectives in turn. Each virtual machine of \(VM_{idle}^l\) allocated to \(f_i\) chooses the corresponding task from the real-time scheduling task pool. The mapping of tasks to idle virtual machines constitutes the strategies of the players.

Step 3: calculate the utility functions based on Eqs. (1)–(3) using the pure-strategy Nash equilibrium based on the historical information \(h^{l-1}\). Each task in \(T_{option}^l\) is best matched with a VM in \(VM_{idle}^l\) at stage l.

Step 4: construct a finite dynamic game model and obtain the equilibrium solutions.
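The four steps above can be sketched as one stage of the MDGT loop. All names below are hypothetical, and a per-objective utility maximization stands in for the full pure-Nash-equilibrium computation of Step 3:

```python
def mdgt_stage(ready_tasks, idle_vms, utilities, history):
    """One illustrative stage of the MDGT loop.

    Step 1: `ready_tasks` is the pool of tasks whose predecessors are
    done. Step 2: idle VMs are assigned to the objectives f1, f2, f3
    in turn (round-robin). Step 3: each VM picks the pooled task that
    maximizes its objective's utility given the history h^{l-1}.
    Step 4: the resulting VM-to-task mapping is this stage's
    (approximate) equilibrium strategy profile.
    """
    pool = list(ready_tasks)
    schedule = {}
    for idx, vm in enumerate(idle_vms):
        if not pool:
            break
        objective = utilities[idx % len(utilities)]  # round-robin over f1..f3
        task = max(pool, key=lambda t: objective(t, vm, history))
        pool.remove(task)
        schedule[vm] = task
    return schedule, pool
```

Repeating this per stage until the pool and all later phases are exhausted mirrors the "handle each stage until all tasks are scheduled" loop described above.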

Fig. 3.

The game tree template of multiple players

Table 2. The unit price of heterogeneous VMs from three IaaS providers

5 Case Study

In this section, we conduct extensive case studies based on 5 well-known scientific workflow templates, as shown in Fig. 4, and real-world third-party commercial clouds, i.e., Amazon EC2, Tencent, and Ali Clouds. Every task in all workflows implements a Gauss–Legendre calculation of 8 million digits by executing the Super-Pi program on VMs. We create heterogeneous VMs on these clouds and expose them to the scheduling algorithms.

Fig. 4.

The case templates of five workflows

Table 2 shows the price-per-unit-time of such VMs with different resource configurations. A resulting scheduling scheme generated by our proposed method is shown in Fig. 5.

Fig. 5.

An example of multi-objectives scheduling scheme for five workflows by using our proposed method

We compare our proposed method with a traditional non-game-theoretic algorithm proposed in [25]. Note that: (1) we are aware of several other game-theoretic scheduling algorithms, e.g., [22, 23, 26], but they are intended for different problems and based on different architectural configurations and resource constraints, so we are unable to compare them with our proposed method; (2) other non-game-theoretic methods can be found in, e.g., [5, 27], but our tests show that their performance is very close to that of the baseline; (3) we are aware that meta-heuristic algorithms, e.g., PSO- and GA-based ones, could well be promising options, but we do not implement and compare them because we consider scientific applications to be time-critical and meta-heuristic algorithms have high time complexity.

Tables 3 and 4 present the comparisons of make-span and cost, respectively. As the total number of tasks from the five workflows increases, the number of game stages increases, and our proposed MDGT method performs better than the baseline method.

Table 3. The comparisons of make-span
Table 4. The comparisons of the total cost
Fig. 6.

The comparisons of different CSPs fairness

Fig. 7.

The comparisons of average fairness

In Fig. 6, we compare the fairness indexes for different IaaS cloud service providers. The curves show that our method outperforms the baseline method. Similarly, the results in Fig. 7 show that the MDGT method performs better than the baseline method in terms of average fairness.

6 Conclusion

In this paper, we studied the multi-objective multi-workflow scheduling problem over heterogeneous VMs created on multi-cloud platforms and introduced a multi-stage dynamic game-theoretic (MDGT) scheduling approach. The proposed method features an approximation algorithm for identifying equilibrium solutions that aims at optimizing workflow make-span, system fairness, and total cost. In addition, we conducted extensive experiments based on various well-known scientific workflow templates and real-world third-party commercial IaaS clouds. Experimental results demonstrate that our approach outperforms traditional baseline ones.