1 Introduction

Nowadays, large-scale distributed systems such as clusters, clouds, grids, and datacenters are used to provide high-performance computing (HPC) solutions to large parallel applications. These systems consume a huge amount of energy, which raises several environmental and economic concerns [11, 31]. For example, a 360-teraFLOPS high-performance cluster can consume nearly 20 MW of power [32]. A datacenter with one thousand racks demands almost 10,000 kW of electric power per day to function [19]. In 2006, the United States Environmental Protection Agency reported that the overall energy consumed by U.S. datacenters and servers was 61.4 billion kWh, equivalent to the power expenditure of 5.8 million households [32]. It was estimated in 2008 that HPC devices consume approximately 0.5% of the world's total electricity production, a share expected to quadruple by 2020 [5]. High power consumption in cluster systems and datacenters raises their operational costs and produces more heat, requiring cooling devices that ultimately emit large quantities of carbon dioxide and harm the environment; it also affects the reliability, availability, and performance of the systems. Therefore, reducing the energy consumption of HPC cluster systems has become a vital research topic.

Over the past decade, appreciable developments in computing technologies have boosted the power efficiency of HPC systems. Dynamic Power Management (DPM) [3, 7,8,9, 18, 22,23,24] and Dynamic Voltage and Frequency Scaling (DVFS) [7, 10, 16, 26,27,28,29,30] are two popular techniques used in modern processors to minimize power consumption. With DPM, processors are turned off when they are idle [3, 22]. DPM is appropriate when the idle slots of the processors are very long, and not suitable for applications with precedence-constrained tasks, since such applications involve only short idle times between task computations [7, 18]. With DVFS, processors can operate at different levels of frequency and voltage [16]. DVFS allows processors to reduce power consumption by scaling down their supply voltages and corresponding frequencies, since there is a quadratic relationship between the power consumed and the operating voltage [21]. For example, the AMD Opteron processor operates at six frequency levels varying from 1000 to 2600 MHz and correspondingly consumes between 32 and 95 W of power [1].

Parallel applications are modeled as directed acyclic graphs (DAGs). In a DAG, each node represents a task, and the edges represent dependencies between tasks [14]. Normally, the weight of a node represents the computation cost of a task, and the weight of an edge represents the communication cost required to send data from a parent task to its child [13, 17]. With DVFS, it is possible to minimize the energy consumption of a parallel application by assigning a different frequency to each task of the application [7]. However, minimizing energy consumption can slow down the computation of parallel applications, because the techniques used for energy reduction decrease the overall computational power of HPC devices, which may degrade system performance [20]. Scaling down the supply voltages and turning off some processors decrease the computational capability of the system and increase the overall schedule length, or makespan, of the application. Therefore, when concentrating on energy efficiency, the makespan must also be taken care of [20].

Many task scheduling algorithms have been proposed that use the DVFS technique to obtain energy-efficient schedules. The main idea behind these algorithms is to use the slack time of tasks to scale down the supply voltages and frequencies of the processors. This paper presents an energy-aware task scheduling algorithm using slack reclamation for heterogeneous cluster systems. The algorithm decreases energy consumption as much as possible without affecting the schedule length, using the DVFS technique. The presented algorithm is an improved version, for heterogeneous cluster systems, of NCM (Not Changing Makespan), a sub-algorithm of the EASLA algorithm [7]. It also makes use of a fast, low-complexity algorithm called PEFT (Predict Earliest Finish Time) [2] to compute the schedule length of the application, and it downscales processor frequencies only when the schedule length of the application does not change.

The contributions of our work are summarized as follows:

  1. We develop an improved version of the NCM algorithm which can be utilized for different types of DAG applications.

  2. It makes use of the PEFT algorithm [2] to determine the schedule length of the application. PEFT is a fast, low-complexity scheduling algorithm which uses a lookahead feature to compute the schedule of the application.

  3. It downscales the frequencies of the processors using the DVFS concept and, as in NCM, redistributes the slack of the non-critical tasks of the application to reduce the energy consumption of the processors in the cluster system.

  4. We perform a simulation study to evaluate the effectiveness of the proposed improved NCM algorithm over the NCM algorithm on randomly generated graphs with various characteristics. The comparative results demonstrate its effectiveness.

The rest of the paper is organized as follows: Sect. 2 explains the related work. Sect. 3 introduces the system models and formalizes the scheduling problem. Sect. 4 presents the improved NCM algorithm and its complexity analysis. An execution trace of the presented algorithm, with the help of an example, is also given in this section. The experimental results and discussion are presented in Sect. 5. Sect. 6 concludes the work.

2 Related work

In the literature, many algorithms have been proposed for energy efficient scheduling of precedence-constrained tasks of the parallel applications. In this section, we give a brief explanation of some existing energy-aware task scheduling algorithms.

Juurlink and Langen proposed a leakage-aware multiprocessor scheduling algorithm to reduce total energy consumption [4]. The algorithm first finds a schedule using an earliest-deadline-first scheduler. Next, binary search is used to find the number of processors required such that the total length of the schedule is not more than that obtained earlier. After that, frequency and voltage scaling is applied to further reduce the energy consumed. This algorithm was designed for homogeneous multiprocessor systems.

Some authors proposed duplication-based scheduling algorithms [12, 32] and clustering-based scheduling algorithms [16, 26] to minimize energy consumption. In [32], the authors presented two energy-efficient duplication-based scheduling algorithms named EAD (Energy-Aware Duplication) and PEBD (Performance-Energy Balanced Duplication) for parallel applications on homogeneous clusters; both leverage the DVFS concept to reduce energy dissipation in processors. In [12], the authors proposed an algorithm that minimizes duplication and saves energy on homogeneous systems. For this, an ideal threshold was calculated, depending on the schedule length and the power of the processor, and tasks are duplicated only up to this threshold. Moreover, a duplicated task must satisfy the property that duplicating it neither increases the energy consumed nor degrades performance. Tasks are grouped according to the predecessor tasks they communicate with in the graph, and each group of tasks is assigned to one of the available processors. The tasks are executed at the highest frequency levels, and DVFS is applied only during the times when the processors are idle. In [16], the authors gave a power-aware task scheduling algorithm called EAEPS (Energy Aware Edge Priority Scheduling) for multiprocessors. This algorithm is an energy-aware version of the EPS (Edge Priority Scheduling) algorithm [15]. It zeroes the higher-priority edges and reduces energy consumption by applying the DVFS technique to non-critical tasks, utilizing the concept of voltage scaling for non-critical tasks of the application similar to [26].

Wang et al. [27] studied the slack time of non-critical tasks and reduced energy consumption without increasing the makespan of the applications. Further, they considered green SLAs (Service Level Agreements) and presented energy-aware task scheduling heuristics, namely a best-effort power scheduling algorithm and an energy-performance tradeoff scheduling algorithm, to reduce energy consumption in DVFS-enabled clusters. In [26], the authors extended the work of [27] and proposed two energy-aware task scheduling algorithms for parallel applications on heterogeneous multiprocessor systems, named PALS (Power Aware List Scheduling) and PATC (Power Aware Task Clustering). In this work, along with energy-aware scheduling, the authors also considered how to allocate data to the different memories of a heterogeneous system with memory shared among various processors.

In [20], the authors developed a power-aware task scheduling algorithm called PASTA for heterogeneous systems. It follows a score-based approach. The goal is to find, out of the given set of processors, a combination whose makespan matches that of the HEFT (Heterogeneous Earliest Finish Time) algorithm [25]. Scores are first calculated for the processors; tasks are then scored by multiplying their earliest finish time by the energy consumed on each of the chosen processors. For a given task, the processor with the best (i.e., lowest) score is selected. However, PASTA produces quite a high makespan.

In [28], the authors defined a term ECR (energy consumption ratio) to characterize energy-efficient schedules and converted the energy-efficient task scheduling problem into the problem of minimizing the total ECR. They proved that minimizing ECR is NP-hard by transforming it into a variable-sized bin-packing problem and gave two heuristics, MECRI (Minimum Energy Consumption Ratio Increment) and PTAB (Power-Aware Task Allocation for Batch Tasks), to minimize ECR. Further, they presented an algorithm for task migration and frequency readjustment which further minimizes the energy consumption of the server.

In [29], the authors proposed an algorithm to minimize energy consumption for DVFS-enabled systems. The algorithm is developed using both an upward approach (from exit to entry tasks) and a downward approach (from entry to exit tasks). It uses the existing HEFT algorithm [25] to compute the schedule length of the application.

Hu et al. [7] developed an algorithm called EASLA for cluster systems with DVFS-enabled processors. The main idea behind their algorithm is to distribute the slacks of sets of tasks and scale voltages/frequencies down to achieve energy reduction. It uses the ETF (Earliest Task First) algorithm to obtain the initial schedule and iteratively allocates slack to the maximal independent set of tasks that achieves the greatest energy reduction. Our approach is quite similar, but we make use of the PEFT algorithm [2], which has proved to be faster.

3 System model and problem formulation

This section introduces the system model used in the work and formulates the scheduling problem. The notations used throughout the text of this paper with their respective descriptions are shown in Table 1.

Table 1 Symbols and their meaning

3.1 Cluster model

The cluster model utilized in this work consists of a set of heterogeneous processors \(P = \{p_1, p_2, \ldots, p_p\}\) which are fully interconnected with each other through identical communication links. Each processor in the cluster system is DVFS-enabled and has different voltage levels with corresponding frequencies. The communication latency between processors is assumed to be negligible. When two tasks are executed on the same processor, the communication cost between them is omitted. A task being executed by a processor cannot be preemptively interrupted by another task. A processor can perform execution and communication simultaneously. Also, duplication of tasks is not considered in this model.
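As a point of reference, the following C++ sketch shows one possible in-memory representation of this cluster model. The type and field names (PowerState, Processor, Cluster, service_rate) are our own illustrative choices, not part of the paper.

```cpp
#include <vector>

// Illustrative representation of the cluster model described above.
struct PowerState {
    double voltage;       // supply voltage V_j at state j (V)
    double frequency;     // operating frequency f_j at state j (MHz)
    double service_rate;  // SR_j, relative execution speed at state j
};

struct Processor {
    // State 0 is the highest power state; higher indices are the
    // downscaled DVFS states.
    std::vector<PowerState> states;
    double available_time = 0.0;  // T_av(p_j): earliest time p_j is free
};

// A heterogeneous, fully connected cluster is simply a set of processors.
using Cluster = std::vector<Processor>;
```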

3.2 Application model

An application, a set of tasks, is depicted using a Directed Acyclic Graph (DAG) \(G = (V, E)\), where V is the set of nodes, each node denoting a task, and E is the set of directed edges between tasks. In the DAG, a directed edge \({e}_{i,j} \in E\) represents the precedence constraint from task \({t}_{i} \in V\) to \({t}_{j} \in V\), meaning \({t}_{j}\) cannot begin its execution until task \({t}_{i}\) has finished and its output has been transferred to \({t}_{j}\). The weight on an edge, denoted \(c_{i,j}\), gives the communication time from task \({t}_{i}\) to \({t}_{j}\). The communication time between two tasks is considered zero when both tasks are executed on the same processor. In addition, a computation cost matrix gives the execution cost of each task \({t}_{i} \in V\) on the different processors of the cluster system; for instance, \(w_{i,j}\) represents the computation cost of task \(t_i\) on processor \(p_j\). A sample DAG [8] and its computation cost matrix are given in Fig. 1 and Table 2, respectively.

Fig. 1 A sample DAG

Table 2 Computation cost matrix of the tasks on each processor for a 3-processor system
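For concreteness, the application model can be represented as in the following minimal sketch; the names Task, DAG, comm, and w are assumptions of the sketch, not code from the paper.

```cpp
#include <vector>

// Illustrative representation of the application model described above.
struct Task {
    std::vector<int> parents;   // parent(t_i): indices of direct parents
    std::vector<int> children;  // child(t_i): indices of direct children
};

struct DAG {
    std::vector<Task> tasks;                // V, tasks indexed 0..n-1
    std::vector<std::vector<double>> comm;  // c[i][j]: edge cost, 0 if no edge
    std::vector<std::vector<double>> w;     // w[i][j]: cost of t_i on p_j
};
```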

3.3 Energy model

The energy model used in this work is based on the power consumption model of CMOS-based logic circuits. The processors considered in this model are DVFS-enabled, so energy consumption can be minimized by appropriately scaling down the frequencies and supply voltages of the processors. Power consumption in CMOS-based logic circuits is due to leakage currents, short-circuit currents, and the switching activity of instruction processing. In general, the power consumption of a processor is the sum of its static and dynamic power consumption. Since static power consumption is not affected by the DVFS state of the processors, it is excluded from the model, and the total power consumption of a processor is taken to be its dynamic power consumption, expressed as follows:

$$\begin{aligned} {\mathscr {P}}_{c} = C_{eff}V_{j}^2f_{j} \end{aligned}$$
(1)

where \(C_{eff}\) is the effective capacitance, and \(V_{j}\) and \(f_{j}\) are the supply voltage and operating frequency of the processor at the \(j^{th}\) state [6]. Thus, the power consumed is reduced by using DVFS, as there is a quadratic relationship between the power consumed and the voltage supplied.

Let \({\mathscr {P}}^{i}_j\) represent the power consumed by the processor working on task \(t_i\) at power state j, and let \(w^{i}_j\) be the computation cost of task \({t}_{i}\) at power state j. Each state of a DVFS-enabled processor has its own power consumption rate and service rate (SR). The highest power state is indexed 0, with lower states indexed increasingly. If task \(t_i\) is assigned to processor k, then \(w^{i}_j\) is given by

$$\begin{aligned} w^{i}_j = w_{i,k} \cdot \frac{SR_0}{SR_j} \end{aligned}$$
(2)

Taking n as the number of tasks in the application, the total energy consumed by the system can be written as

$$\begin{aligned} EN = \sum _{i=0}^{n-1} {\mathscr {P}}^{i}_j \cdot w^{i}_j = \sum _{i=0}^{n-1} C_{eff} (V^{i}_j)^2 f^{i}_j \cdot w^{i}_j \end{aligned}$$
(3)

where \(f^{i}_j\) represents the processor frequency, and \(V^{i}_j\) represents the voltage, on which task \(t_i\) is executed at power state j.
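The following sketch puts Eqs. (1)–(3) together, reusing the Processor, Cluster, and DAG structures sketched earlier. The assignment and state vectors, which record the allocated processor and power state of each task, are assumptions of the sketch.

```cpp
#include <cstddef>
#include <vector>

// Dynamic power at a given state, Eq. (1): P = C_eff * V^2 * f.
double dynamicPower(double c_eff, const PowerState& s) {
    return c_eff * s.voltage * s.voltage * s.frequency;
}

// Computation cost of a task at power state j on processor p, Eq. (2):
// the cost at the highest state (w_ik) stretched by the service-rate ratio.
double scaledCost(double w_ik, const Processor& p, int j) {
    return w_ik * p.states[0].service_rate / p.states[j].service_rate;
}

// Total energy, Eq. (3): power times (scaled) execution time, summed over
// all tasks. assignment[i] and state[i] record the processor and power
// state of task t_i; both vectors are assumptions of this sketch.
double totalEnergy(const DAG& g, const Cluster& cluster, double c_eff,
                   const std::vector<int>& assignment,
                   const std::vector<int>& state) {
    double en = 0.0;
    for (std::size_t i = 0; i < g.tasks.size(); ++i) {
        const Processor& p = cluster[assignment[i]];
        en += dynamicPower(c_eff, p.states[state[i]])
            * scaledCost(g.w[i][assignment[i]], p, state[i]);
    }
    return en;
}
```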

3.4 Problem formulation

This subsection first gives some known definitions ascribing common properties of task scheduling and then formulates the task scheduling problem.

Definition 1

(\(parent({t}_{i})\)). It denotes the set of direct parents of task \({t}_{i}\) in a graph. An entry task \({t}_{entry}\) is a task which doesn't have any parent. A graph may contain multiple entry tasks; if it does, a dummy entry task having zero computation and communication costs may be added to the graph.

Definition 2

(\(child({t}_{i})\)). It denotes the set of direct children of task \({t}_{i}\) in a graph. An exit task \({t}_{exit}\) is a task which doesn't have any child. A graph may contain multiple exit tasks; if it does, a dummy exit task having zero computation cost, with zero-cost communication edges from the existing exit tasks to the dummy task, may be added to the graph.

The uniqueness of \({t}_{entry}\) and \({t}_{exit}\) is not presumed in this work.

Definition 3

(\(AFT({t}_{i})\)). It is the Actual Finish Time of a task \({t}_{i}\) and is defined as the point of time where the execution of task \({t}_{i}\) is finished by some allocated processor.

Definition 4

Makespan or schedule length refers to the finish time of the last task in the scheduled DAG and it can be defined as follows:

$$\begin{aligned} makespan = AFT(t_{exit}) \end{aligned}$$
(4)

where \(AFT(t_{exit})\) is the Actual finish time of the exit task.

Definition 5

(\(EST(t_i,p_j)\)). It denotes the Earliest Start Time of a task \(t_i\) on a processor \(p_j\) and is obtained as follows:

$$\begin{aligned}&EST(t_i,p_j) \nonumber \\&\quad = \max \Big \{T_{av}(p_j), \max _{t_k \in parent(t_i)} \{AFT(t_k) + c_{k,i} \} \Big \} \end{aligned}$$
(5)

where \(T_{av}(p_j)\) is the earliest time at which processor \(p_j\) becomes available. The communication cost \(c_{k,i}\) is taken as zero if \(t_k\) is allocated to processor \(p_j\). EST is zero for the entry task.

Definition 6

(\(EFT(t_i,p_j)\)). It denotes the Earliest Finish Time of a task \(t_i\) on a processor \(p_j\) and is obtained as

$$\begin{aligned} EFT(t_i,p_j) = EST(t_i,p_j) + w_{i,j} \end{aligned}$$
(6)

which is the sum of the EST of task \(t_i\) on processor \(p_j\) and the execution cost of \(t_i\) on processor \(p_j\).
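Eqs. (5) and (6) translate directly into code, reusing the earlier structures; the AFT and assignment arrays are assumed to be filled in as tasks get scheduled, and the names are our own.

```cpp
#include <algorithm>
#include <vector>

// EST per Eq. (5): the later of the processor's available time T_av(p_j)
// and the arrival of the last parent's data.
double EST(const DAG& g, const Cluster& cluster, int i, int j,
           const std::vector<double>& AFT, const std::vector<int>& assignment) {
    double ready = cluster[j].available_time;                  // T_av(p_j)
    for (int k : g.tasks[i].parents) {
        double c = (assignment[k] == j) ? 0.0 : g.comm[k][i];  // zero on same processor
        ready = std::max(ready, AFT[k] + c);
    }
    return ready;  // zero for the entry task on an idle processor
}

// EFT per Eq. (6): EST plus the computation cost of t_i on p_j.
double EFT(const DAG& g, const Cluster& cluster, int i, int j,
           const std::vector<double>& AFT, const std::vector<int>& assignment) {
    return EST(g, cluster, i, j, AFT, assignment) + g.w[i][j];
}
```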

Definition 7

(\(LFT(t_i,p_j)\)). It denotes the Latest Finish Time of a task \(t_i\) on a processor \(p_j\) and is defined as follows:

$$\begin{aligned}&LFT(t_i,p_j)\nonumber \\&\quad = \min _{p_x \in P} \Big \{ \min _{t_k \in child(t_i)} \{LFT(t_k,p_x) - c_{i,k} - w_{k,x} \} \Big \} \end{aligned}$$
(7)

where \(c_{i,k}\) is zero if \(p_x = p_j\). The LFT of the exit task equals the makespan on all processors.
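Eq. (7) can be evaluated from the exit task backwards (in reverse topological order), as in this sketch; the lft table is assumed to already hold the children's values.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// LFT per Eq. (7): for the exit task the LFT equals the makespan on every
// processor; otherwise each child, on each candidate processor, constrains
// how late t_i may finish.
double LFT(const DAG& g, int n_proc, int i, int j, double makespan,
           const std::vector<std::vector<double>>& lft /* children filled */) {
    if (g.tasks[i].children.empty()) return makespan;  // exit task
    double limit = std::numeric_limits<double>::infinity();
    for (int x = 0; x < n_proc; ++x) {
        for (int k : g.tasks[i].children) {
            double c = (x == j) ? 0.0 : g.comm[i][k];  // zero on same processor
            limit = std::min(limit, lft[k][x] - c - g.w[k][x]);
        }
    }
    return limit;
}
```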

Definition 8

(\(OCT(t_i,p_j)\)). It denotes the Optimistic Cost Table value, which indicates the maximum among the shortest paths from the children of task \(t_i\) to the exit task when \(t_i\) is assigned to processor \(p_j\). It is defined as follows:

$$\begin{aligned}&OCT(t_i,p_j) \nonumber \\&\quad = \max _{t_k \in child(t_i)} \Big \{ \min _{p_x \in P} \{ OCT(t_k,p_x) + w_{k,x} + \overline{c_{i,k}} \} \Big \} \end{aligned}$$
(8)

\(\overline{c_{i,k}}\) is zero if \(p_x = p_j\).

\(OCT(t_{exit},p_j) = 0\) for all processors \(p_j\). The definition of OCT is taken from [2].
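Eq. (8) admits the same reverse-topological evaluation, sketched below; note that we use the plain edge cost comm[i][k] as a stand-in for \(\overline{c_{i,k}}\), which is a simplifying assumption of this sketch.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// OCT per Eq. (8), evaluated in reverse topological order: the oct table
// is assumed to already hold the values of t_i's children.
double OCT(const DAG& g, int n_proc, int i, int j,
           const std::vector<std::vector<double>>& oct /* children filled */) {
    if (g.tasks[i].children.empty()) return 0.0;       // OCT(t_exit, p_j) = 0
    double worst = 0.0;
    for (int k : g.tasks[i].children) {
        double best = std::numeric_limits<double>::infinity();
        for (int x = 0; x < n_proc; ++x) {
            double c = (x == j) ? 0.0 : g.comm[i][k];  // zero on same processor
            best = std::min(best, oct[k][x] + g.w[k][x] + c);
        }
        worst = std::max(worst, best);                 // max over children
    }
    return worst;
}
```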

Our problem can be formulated as follows. We are given an application graph consisting of |V| tasks with their precedence constraints defined, and p processors, each with its own DVFS states. Our goal is to minimize the energy consumption of the cluster system as much as possible while keeping the schedule length the same as that achieved by the best-effort algorithm (PEFT in our case).

Let \(EN_{best\_effort}\) be the energy consumed by the schedule of the best-effort algorithm, calculated by running all processors at their highest power states, and let \(EN_{algo}\) be the energy consumed by an algorithm using the DVFS technique. The energy saving (ES) metric can be defined from [7] as follows:

$$\begin{aligned} ES = \frac{EN_{best\_effort} - {EN}_{algo}}{EN_{best\_effort}} \end{aligned}$$
(9)

Hence, we have to maximize the value of ES, as expressed by Eq. (9).
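Eq. (9) translates into a one-line helper; both inputs come from Eq. (3).

```cpp
// Energy saving per Eq. (9): relative reduction against the best-effort
// schedule run entirely at the highest power states.
double energySaving(double en_best_effort, double en_algo) {
    return (en_best_effort - en_algo) / en_best_effort;
}
```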

4 The improved NCM algorithm

In [7], the authors presented an energy-aware scheduling algorithm called EASLA (Energy-Aware Service Level Agreement) for cluster systems having DVFS-enabled processors. The EASLA algorithm reduces energy consumption by utilizing the slack of tasks and downscaling the frequencies of processors. It takes the schedule obtained from the ETF (Earliest Task First) algorithm as input and consists of two sub-algorithms named NCM (not changing makespan) and UAME (using accepted makespan extension). The NCM algorithm downscales frequencies when the schedule length does not change, whereas the UAME algorithm downscales frequencies when the schedule length is extended. In this work, we present an improved NCM algorithm for heterogeneous cluster systems that downscales frequencies when the schedule length does not change. Like NCM [7], it reduces energy consumption by using the slacks of the tasks without affecting the schedule length of the application. The improved NCM algorithm differs from the NCM algorithm in that it uses the PEFT algorithm [2] to determine the makespan of the application. PEFT is a fast, low-complexity list-based scheduling algorithm which uses a lookahead feature to compute the schedule of the application.

Algorithm 1 presents the improved NCM algorithm. The algorithm first computes the makespan of the given application according to the PEFT algorithm [2] and then downscales the frequencies of the processors according to the NCM algorithm [7]. The main steps are as follows. A matrix called the Optimistic Cost Table (OCT) is built, in which each element \(OCT(t_{i},p_{j})\) gives the maximum among the shortest paths from task \(t_{i}\)'s children to the exit node when processor \(p_{j}\) is selected. Then the Optimistic EFT (OEFT) is computed as the sum of the OCT value and the Earliest Finish Time (EFT), as given in Eq. (10). Among the tasks that are ready to execute, the task with the highest OCT is chosen, and it is assigned to the processor with the minimum OEFT for it.

$$\begin{aligned} OEFT(t_i, p_j) = EFT(t_i, p_j) + OCT(t_i, p_j) \end{aligned}$$
(10)

The next step is the calculation of the available slack for each task. The makespan is the overall time required to complete all the tasks. The slack of a task is the difference between its LFT and the sum of its EST and computation cost [7].

$$\begin{aligned} slack_i = LFT(t_i,p_j) - EST(t_i,p_j) - w_{i,j} \end{aligned}$$
(11)

Here, \(p_j\) is the processor assigned to the \(i^{th}\) task in the cluster system. Only tasks which are independent of each other can share slack; in other words, tasks which are not reachable from each other in the DAG can share slack.
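The reachability condition just stated can be checked with a plain depth-first search, as in the following sketch (reachable and canShareSlack are hypothetical helper names, reusing the DAG structure sketched in Sect. 3.2):

```cpp
#include <vector>

// True if 'to' is reachable from 'from' along directed edges of the DAG.
bool reachable(const DAG& g, int from, int to) {
    std::vector<bool> seen(g.tasks.size(), false);
    std::vector<int> stack = {from};
    while (!stack.empty()) {
        int u = stack.back(); stack.pop_back();
        if (u == to) return true;
        if (seen[u]) continue;
        seen[u] = true;
        for (int k : g.tasks[u].children) stack.push_back(k);
    }
    return false;
}

// Two tasks may share slack only if neither is reachable from the other.
bool canShareSlack(const DAG& g, int i, int j) {
    return !reachable(g, i, j) && !reachable(g, j, i);
}
```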

A list L is initialized with all the tasks, sorted in non-increasing order of their OCT values. EST is calculated for each task in increasing order of OCT values, and LFT for each task in the reverse order. If the task under consideration cannot be scaled down using the available slack, it is deleted from L. Otherwise, frequency/voltage scaling is applied on the task's assigned processor and the amount of energy saved is calculated. Among the remaining tasks, the task which achieves the maximum energy reduction is scaled down using DVFS. Then the voltage, frequency, and computation time of that task are updated according to the new values. The algorithm terminates when L becomes empty.

Thus, at the end, we obtain a schedule which gives the processors assigned to the tasks, their execution order, and the DVFS level of the processor used for each task.
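Putting the pieces together, the following condensed sketch mirrors the slack-reclamation loop just described, reusing the structures and helpers from the earlier sketches. For brevity it takes precomputed est and lft arrays and does not re-propagate them after each scaling step, which Algorithm 1 does; w_cur holds each task's current (possibly stretched) computation cost, initialized to its cost at the highest state, and all names are our own assumptions.

```cpp
#include <algorithm>
#include <list>
#include <vector>

void reclaimSlack(const DAG& g, const Cluster& cluster, double c_eff,
                  const std::vector<int>& assignment, std::vector<int>& state,
                  const std::vector<double>& est, const std::vector<double>& lft,
                  std::vector<double>& w_cur) {
    std::list<int> L;
    for (int i = 0; i < (int)g.tasks.size(); ++i) L.push_back(i);
    // (Algorithm 1 sorts L by non-increasing OCT; omitted here for brevity.)
    while (!L.empty()) {
        int best_task = -1, best_state = -1;
        double best_saving = 0.0, best_cost = 0.0;
        for (auto it = L.begin(); it != L.end(); ) {
            int i = *it, j = assignment[i];
            const Processor& p = cluster[j];
            double slack = lft[i] - est[i] - w_cur[i];               // Eq. (11)
            double e_cur = dynamicPower(c_eff, p.states[state[i]]) * w_cur[i];
            int cand = -1;
            double cand_saving = 0.0, cand_cost = 0.0;
            // Try lower power states whose stretched cost still fits the slack.
            for (int k = state[i] + 1; k < (int)p.states.size(); ++k) {
                double w_k = scaledCost(g.w[i][j], p, k);            // Eq. (2)
                if (w_k - w_cur[i] > slack) continue;                // does not fit
                double save = e_cur - dynamicPower(c_eff, p.states[k]) * w_k;
                if (save > cand_saving) {
                    cand = k; cand_saving = save; cand_cost = w_k;
                }
            }
            if (cand < 0) { it = L.erase(it); continue; }            // cannot scale: drop
            if (cand_saving > best_saving) {
                best_task = i; best_state = cand;
                best_saving = cand_saving; best_cost = cand_cost;
            }
            ++it;
        }
        if (best_task < 0) break;           // no remaining task can save energy
        state[best_task] = best_state;      // apply DVFS to the winning task
        w_cur[best_task] = best_cost;       // record its stretched cost
        L.remove(best_task);
        // Algorithm 1 re-propagates EST/LFT here before the next pass.
    }
}
```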

Algorithm 1 The improved NCM algorithm

4.1 Time complexity of algorithm

The time complexity of our algorithm can be analyzed as follows. Let |V| be the number of tasks, |E| the number of edges, p the number of processors, and m the number of power states per processor. The first step is scheduling with the best-effort algorithm, i.e., PEFT, whose complexity is known to be \(O(p(|E| + |V|) + |V|^{2}.p)\). Then the tasks are sorted by their OCT values, which takes O(|V|log|V|) time. Next, the available slack times of the tasks are found iteratively, and the processors' DVFS states are checked for a state that can be scaled down using the slack time. Each iteration requires \(O(m|V| + |E|)\) time, so over all iterations this step takes \(O(|V|(m|V| + |E|))\). Thus, the time complexity of our algorithm is \(O(p(|E| + |V|) + |V|^{2}.p + |V|(log|V| + m|V| + |E|))\).

4.2 An illustrative example

Consider the DAG given in Fig. 1, which contains eight tasks labeled \({t}_{0}\) to \({t}_{7}\). The corresponding computation cost matrix of the tasks for a three-processor system is given in Table 2. The tasks \({t}_{0}\) and \({t}_{7}\) are the entry and exit tasks of the DAG, respectively. The edges of the graph are labeled with the communication costs.

The improved NCM algorithm initially calculates the schedule length of the given DAG using the PEFT algorithm and then determines the slack of tasks using the NCM algorithm. PEFT first computes the OCT of each task on the different processors, along with an average OCT for each task. The OCT values for this example are shown in Table 3. Based on the OCT computation, tasks are assigned to the processors in order. The tasks selected and the processors allocated, along with the actual finish times, are given in Table 4. After this, slack times are calculated for the various tasks; they are given in Table 5.

Table 3 OCT for the illustration
Table 4 Tasks and the processors assigned and the actual finish time
Table 5 Slack times after first iteration

Based on the slack times of the tasks, DVFS is applied and energy is saved according to the algorithm discussed above. Here, tasks \(t_2\), \(t_5\), and \(t_7\) cannot be scaled down and are removed from the list L. Task \(t_4\) achieves the maximum ES for DVFS state 5 of processor 2. Thus, to save energy, the power state of processor 2 is scaled down from state 0 to state 5, and task \(t_4\) is removed from the list L. Now tasks \(t_0\), \(t_1\), \(t_3\), and \(t_6\) remain in L. Further iterations proceed similarly using slack allocation, and overall, 12.29% energy is saved by our algorithm compared to the best-effort algorithm. The energy of the best-effort algorithm is calculated by running all processors at their highest power states.

5 Experimental results and discussion

In this section, we evaluate the results of the improved NCM algorithm and compare its performance with the NCM algorithm [7]. For the comparison, we use energy saving (ES), as given by Eq. (9), as the performance metric. For this purpose, we generated random graphs based on different input parameters such as the number of tasks, CCR values, fat, and regularity. We performed the simulations in the C++ programming language and used the three types of processors listed below. The processors are DVFS-enabled, and their details are taken from [24]. The specifications of the processors are given in Tables 6 and 7.

  • AMD Opteron processors (high-performance and high-energy).

  • Intel Pentium M processors (low-performance and low-energy).

  • VIA C7-M processors (medium-performance and low-energy).

Table 6 P-states and power consumption
Table 7 Service rate of processors

The random DAGs used in the experiments are generated using the following parameters [17]:

  • n: It denotes the number of tasks present in an input DAG.

  • CCR: Communication-to-computation ratio. It represents the ratio of the sum of the communication costs of all edges to the sum of the computation costs of all nodes in a DAG.

  • fat: It affects the width and height of the DAG. The width and height of the DAG are maintained by a uniform distribution using \(fat \times \sqrt{n}\) and \(\frac{\sqrt{n}}{fat}\) as a mean value for width and height, respectively.

  • regularity: It determines how uniformly tasks are distributed across the levels of the graph. A smaller value of regularity means that the levels of the DAG contain different numbers of tasks, whereas a higher value indicates that the number of tasks at each level is almost the same.

Table 8 Parameters and their corresponding values used in the generation of random graphs

Table 8 gives the details of the parameters used to generate the random graphs. 144 DAGs are generated using combinations of these parameters. Further, for each DAG, we generated ten different random graphs having the same structure but with different execution and communication costs on the nodes and edges, respectively. Thus, in total, 1440 random DAGs are used in this work.

Fig. 2 Energy saving results for random graphs with respect to number of tasks when processors are 3

Fig. 3 Energy saving results for random graphs with respect to number of tasks when processors are 6

Fig. 4 Energy saving results for random graphs with respect to number of tasks when processors are 9

Fig. 5 Energy saving results for random graphs with respect to number of tasks when processors are 12

Figs. 2, 3, 4 and 5 show the ES results for random graphs with respect to the number of tasks for 3, 6, 9, and 12 processors. The results show that the improved NCM algorithm gives better ES than the NCM algorithm for the various numbers of processors. For example, when the DAG size is 200, the improved NCM algorithm saves 13.71%, 14.75%, 15.14%, and 16.03% energy as compared to the NCM algorithm for 3, 6, 9, and 12 processors, respectively. It is also noticed that increasing the number of processors leads to a slight increase in ES. This can be credited to the fact that with an increase in the number of processors, the number of tasks allocated to each processor decreases. In our experiments, the improved NCM algorithm always outperforms the NCM algorithm.

Fig. 6 Energy saving results for random graphs with respect to number of processors when tasks are 10

Fig. 7 Energy saving results for random graphs with respect to number of processors when tasks are 20

Fig. 8 Energy saving results for random graphs with respect to number of processors when tasks are 40

Fig. 9 Energy saving results for random graphs with respect to number of processors when tasks are 80

Fig. 10 Energy saving results for random graphs with respect to number of processors when tasks are 100

Fig. 11 Energy saving results for random graphs with respect to number of processors when tasks are 200

Figs. 6, 7, 8, 9, 10 and 11 show the ES results for random graphs with respect to the number of processors for 10, 20, 40, 80, 100, and 200 tasks. The results show that the improved NCM algorithm gives better ES than the NCM algorithm for the various numbers of tasks. For example, when the number of processors is 9, the improved NCM algorithm saves 13.12%, 13.47%, 13.96%, 14.59%, 14.87%, and 15.14% energy as compared to the NCM algorithm for 10, 20, 40, 80, 100, and 200 tasks, respectively. It is also noticed that as the number of tasks in the application increases, the energy saving also increases; here too, the improved NCM algorithm performs better than the NCM algorithm in all cases. However, the saving stops increasing at some point, as the amount of usable slack decreases.

Fig. 12 Energy saving results for random graphs with respect to CCR

Fig. 12 presents the energy saving for random graphs with respect to CCR. We used 0.2, 0.5, 1, and 2 as the CCR values in the experiments. For all values of CCR, the results show that the improved NCM algorithm gives better energy savings than the NCM algorithm, saving up to 14.93% energy as compared to the NCM algorithm.

6 Conclusions

In this paper, we presented an energy-aware scheduling algorithm using slack reclamation for heterogeneous cluster systems with DVFS-enabled processors. The algorithm decreases energy consumption as much as possible without affecting the schedule length. The presented algorithm is an improved version of the NCM algorithm, a sub-algorithm of the well-known EASLA algorithm. The time complexity of the proposed algorithm is \(O(p(|E| + |V|) + |V|^{2}.p + |V|(log|V| + m|V| + |E|))\), where |V| and |E| are the numbers of tasks and edges in a given DAG, and p and m are the number of processors and the total number of DVFS power states. We performed experiments on various randomly generated DAGs and compared the performance of the improved NCM algorithm with the NCM algorithm. The results demonstrate that the improved NCM algorithm outperforms the NCM algorithm in terms of energy saving. In future work, we may add some real-world application graphs to our experiments. Furthermore, real systems may be used to run the applications and record their efficiency and energy savings.