# Power Consumption Characterization of Synthetic Benchmarks in Multicores

Jonathan Muraña $^{1(\boxtimes)},$ Sergio Nesmachnow<sup>1</sup>, Santiago Iturriaga<sup>1</sup>, and Andrei Tchernykh<sup>2</sup>

 <sup>1</sup> Universidad de la República, Montevideo, Uruguay {jmurana,sergion,siturria}@fing.edu.uy
 <sup>2</sup> CICESE Research Center, Ensenada, Baja California, Mexico chernykh@cicese.mx

Abstract. This article presents an empirical evaluation of power consumption of synthetic benchmarks in multicore computing systems. The study aims at providing an insight of the main power consumption characteristics of different applications when executing over current high performance computing servers. Three types of applications are studied executing individually and simultaneously on the same server. Intel and AMD architectures are studied in an experimental setting for evaluating the overall power consumption of each application. The main results indicate the power consumption behavior has a strong dependency with the type of application. An additional performance analysis shows that the best load of the server regarding energy efficiency depends on the type of the applications, with efficiency decreasing in heavily loaded situations. These results allow characterizing applications according to power consumption, efficiency, and resource sharing, and provide useful information for resource management and scheduling policies.

**Keywords:** Green computing  $\cdot$  Energy efficiency  $\cdot$  Multicores Computing efficiency

## 1 Introduction

Nowadays, data centers are key infrastructures for executing industrial and scientific applications. Data centers have become highly popular for providing storage, computing power, middleware software, and others information technology (IT) utilities, available to researchers with ubiquitous access [3]. However, their energy efficiency has become a major concern in recent years, having a significant impact on monetary cost, environment, and guarantees for service-level agreements (SLA) [4].

The main sources of power consumption in data centers are the computational resources and the cooling system [13]. When focusing on power consumption due to resource utilization, several techniques for hardware and software optimization can be applied to improve energy efficiency. Software characterization techniques [1] are used to determine features that are useful to analyze

<sup>©</sup> Springer International Publishing AG 2018

E. Mocskos and S. Nesmachnow (Eds.): CARLA 2017, CCIS 796, pp. 21–37, 2018. https://doi.org/10.1007/978-3-319-73353-1\_2

the software behavior, including power consumption [2]. Estimation of power consumption is an important issue that can determine the quality and even the feasibility of both software products and big data center infrastructures.

In this line of work, this article focuses on the characterization of power consumption for scientific computing applications over nowadays multicore hardware used in scientific computing platforms. Such characterization is useful for designing energy efficient scheduling strategies for scientific computing platforms. Three synthetic benchmarks are studied over two physical setups from a real High Performance Computing (HPC) platform, registering their power consumption with a power meter device. Furthermore, the experimental analysis studies the power consumption of different applications sharing a computing resource via simultaneous execution. The proposed study is very relevant for nowadays data centers and HPC infrastructures, where many applications are executed at the simultaneously. This impacts on both the energy efficiency and the quality of service (QoS) offered to the users of the platform.

The article is organized as follows. Section 2 reviews related works on energy characterization in multicores. Section 3 describes the proposed methodology for energy characterization, the benchmarks, and the physical setup for experiments. The experimental results are reported and discussed in Sect. 4. The conclusions and the main lines for future work are presented in Sect. 5.

### 2 Related Work

The analysis of related works allows identifying two types of studies. A first group proposes the empirical evaluation of scientific applications to develop specific models or to adjust existing models for studying power and performance behavior. Other works introduces optimization techniques over existing models and presents optimized results. This section focuses on reviewing those articles about empirical measurements and evaluation of power consumption models.

Iturriaga et al. [7] studied the problem of finding schedules with appropriate trade-off between power consumption and execution time in heterogeneous computers systems, considering uncertainty. Specific versions of well-known heuristic were proposed for scheduling on realistic scenarios, applying the power consumption model in [12] and considering only CPU-bound workloads. A model for uncertainty on power consumption was determined through empirical evaluations using three CPU-bound benchmarks. Regarding scheduling results, online heuristics computed better schedules than offline approaches. Results also confirmed that uncertainly has a significant impact in the accuracy of the scheduling algorithms. The power consumption behavior of CPU-bound benchmarks shown in [7] is consistent with the one in our research. Moreover, we propose a fully empirical power consumption characterization, considering also two additional types of benchmarks: memory bound and disk bound.

Srikantaiah et al. [14] studied workload consolidation strategies for energy optimization in cloud computing systems. An empirical study of the relationship between power consumption, performance, and resource utilization was presented. The experiments were executed in four physical server connected to a power meter to track the power consumption. The resource utilization was monitorized using the Xperf toolkit. Only two resources were considered in the study: processor and disk. The performance degraded for high levels of disk utilization, and variations in CPU usage did not result in significant performance variations. Energy results were presented in terms of power consumption per transaction (including power consumption in idle state), resources utilization, and performance degradation. On the other hand, results showed that power consumption per transaction, is more sensitive to CPU utilization than disk utilization. The authors also proposed an heuristic method to solve a modified bin packing problem where the servers are bins and the computing resources are bin dimensions. Results were reported for small scenarios where the power consumption of the solutions computed by the heuristic is about 5% more than the optimal solution. The tolerance for performance degradation was 20%.

Du Bois et al. [5] presented a framework for generating workloads with specific features, applied to compare energy efficiency in commercial systems. CPUbound, memory-bound, and disk-bound benchmarks were executed on a power monitoring setup composed of an oscilloscope connected to the host and a logging machine to persist the data. Two commercial systems were studied: a high-end with AMD processors and a low-end with Intel processors. Benchmarks were executed independently, isolating the power consumption of each resource. Results confirmed that energy efficiency depends on the workload type. Comparatively, the high-end system had better results for the CPU-bound workload, the lowend system was better for disk-bound, and both had similar efficiency for the memory-bound workload. Our work complements this approach by including a study of the power consumption behavior when executing different types of tasks simultaneously on specific architectures for high performance computing.

Feng et al. [6] evaluated the energy efficiency of a high-end distributed system, with focus on scientific workloads. The authors proposed a power monitoring setup that allows isolating the power consumption of CPU, memory, and disk. The experimental analysis studied single node executions and distributed executions. In the single node experiments, results of executing a memory-bound benchmark showed that the total power consumption is distributed as follow: 35% corresponds to the CPU, 16% corresponds to the physical memory and 7% corresponds to the disk. The rest is consumed by power supply, fans, network, and other components. Idle state represented 66% of the total power consumption. In distributed experiments, benchmarks that are intensive in more than one computing resource were studied. Results showed that energy efficiency increased with the number of nodes used for execution.

Kurowski et al. [9] presented a data center simulator that allows specifying various energy models and management policies. Three types of theoretical energy models are proposed: (i) *static approach*, which consider a unique power value by processing unit; (ii) *dynamic approach*, which consider power levels, representing the usage of the processing unit; and (iii) *application specific approach*, which consider usage of application resources to determine the power consumption. Simulation results were compared with empirical measurements over real hardware to validate the theoretical energy models in arbitrary scenarios. All models obtained accurate results (error was less than 10% with respect to empirical measurements), and the dynamic approach was the most precise.

Langer et al. [10] studied energy efficiency of low voltage operations in manycore chips. Two scientific applications were considered for benchmarking over a multicore simulator. The performance model considered for a chip was  $S = a_k(\sum f_i) + b_k$ , where S are the instructions per cycle, f is the frequency of the core i and  $a_k, b_k$  are constants that depend on k, the number of cores in the chip. A similar model is used for power consumption. Across 25 different chips, an optimization method based on integer linear programming achieved 26% in energy savings regarding to the power consumption of the faster configuration.

Several works in literature have focused on modeling and characterizing power consumption of scientific applications. However, to the best of our knowledge there is no empirical research focused on the inter-relationship between power consumption and CPU, memory, and disk utilization. Also, there is no experimental analysis of critical levels of resource utilization (close to 100%) and its impact on power consumption and performance. This article contributes in this line of research, proposing empirical analysis for both aforementioned issues.

## 3 Methodology for Power Consumption Evaluation

This section describes the proposed methodology for power consumption evaluation, the benchmarks and architectures studied, the power evaluation setup, and the experiments designed.

## 3.1 Overview of the Proposed Methodology

Experiments characterize the power consumption of the most relevant computing resources: CPU, memory, and disk [5–7]. We aim at studying holistic behaviors and analyzing the power consumption of hosts close to 100% of computing resource utilization. The analysis is complemented with performance experiments to study the trade-off between power consumption and performance degradation.

## 3.2 Benchmarks

The benchmarks used in the analysis are part of the Sysbench toolkit [8]. Sysbench is a cross-platform software written in C that provides CPU, memory, and disk intensive benchmarks for performance evaluation. The components used in the experiments are:

1. *CPU-bound*: an algorithm that calculates  $\pi(n)$  (the prime counting function) using a backtracking method. The algorithm contains loops, square root and module operations, as described in Algorithm 1.

#### Algorithm 1. CPU-bound benchmark code

```
1: C \leftarrow 3
 2: while C < MAX_PRIME do
       T \leftarrow sqrt(C)
 3:
       L \leftarrow 2
 4:
       while L < T do
 5:
          if C \pmod{T} = 0 then
 6:
 7:
            break
 8:
         end if
 9:
         if L > T = 0 then
10:
            N \leftarrow N + 1
11:
          end if
12:
          L \leftarrow L + 1
       end while
13:
       C \leftarrow C + 1
14:
15: end while
16: return N
```

2. *Memory-bound*: a program that executes write operations in memory, as described in Algorithm 2, where the *BUF* variable is an array of integers. The cells of the array are overwritten with value of *TMP* until the last position of the array, i.e., the value of the *END* variable.

### Algorithm 2. Memory-bound benchmark code

| 1: | while $BUF < END$ do     |
|----|--------------------------|
| 2: | $*BUF \leftarrow TMP$    |
| 3: | $BUF \leftarrow BUF + 1$ |
| 4: | end while                |

3. *Disk-bound*: a program that reads/writes content in files. Read or write requests are generated randomly and executed until a given number of requests (*MAX\_REQUEST*) is reached, as described in Algorithm 3.

| Algorithm 3. Disk-bound benchmark code |                                               |  |  |  |
|----------------------------------------|-----------------------------------------------|--|--|--|
| 1:                                     | while $REQS\_COUNT < MAX\_REQS$ do            |  |  |  |
| 2:                                     | $REQ \leftarrow generate\_rnd\_request()$     |  |  |  |
| 3:                                     | $\mathbf{if} \ is\_read(REQ) \ \mathbf{then}$ |  |  |  |
| 4:                                     | read(REQ.FILE)                                |  |  |  |
| 5:                                     | end if                                        |  |  |  |
| 6:                                     | if $is\_write(REQ)$ then                      |  |  |  |
| 7:                                     | write(REQ.FILE)                               |  |  |  |
| 8:                                     | end if                                        |  |  |  |
| 9:                                     | $REQS\_COUNT + +$                             |  |  |  |
| 10:                                    | end while                                     |  |  |  |

#### 3.3 Multicore Hosts and Power Monitoring Setup

Experiments were performed on Cluster FING, the HPC platform from Universidad de la República, Uruguay [11]. Two hosts were chosen according to their characteristics and availability: HP Proliant DL385 G7 server (2 AMD Opteron 6172 CPUs, 12 cores each, and 72 GB RAM), and HP Proliant DL380 G9 server (2 Intel Xeon CPU E5-2680v3 CPUs, 12 cores each, and 128 GB RAM).

Figure 1 presents the power monitoring setup. Benchmarks were executed in a host connected to the power source via a Power Distribution Unit (PDU) to register the instant power consumption. In an secondary machine, a polling demon logged data for post-processing. This configuration is similar to the one used in related works [5,7].



Fig. 1. Power monitoring setup

### 3.4 Design of Experiments

The power consumption (PC) of each host is computed as the average of 20 independent executions for each benchmark and combination of benchmarks. Idle consumption (IC), i.e., average consumption of the host without load, is registered to compute the effective consumption (EC) as EC = PC - IC.

In a first stage, benchmarks are evaluated independently from each other, analyzing only one resource. Utilization level (UL) is defined as the percentage of processors being used regarding the total number of processors in the host. Eight ULs were considered for single benchmark execution: 12%, 25%, 37.5%, 50%, 62.5%, 75%, 87.5% and 100%. Figure 2 shows an example of 37.5% utilization level: the host has 24 processors of which 9 execute instances of the CPU-bound benchmark. The remaining processors are in *idle* state.

In a second stage, the simultaneous execution of benchmarks is evaluated, analyzing several combinations of resource utilization at the same time. Two and three benchmarks are executed together in different ULs. In this case, UL is a vector where each entry represents the percentage of processors being used by



Fig. 2. CPU-bound benchmark, UL of 37.5%

each type of benchmark considered in the study (CPU-bound, memory-bound, and disk-bound, in that order). The utilization levels chosen were (25%, 25%), (25%, 50%), (25%, 75%), (50%, 25%), (50%, 50%), (75%, 25%) in pair executions and (25%, 25%, 25%), (25%, 25%, 50%), (25%, 50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%, 25%), (50%), (25%, 25%), (50%), (25%, 25%), (50%), (25%, 25%), (25%, 25%), (25%, 25%), (25%, 25%), (25%, 25%), (25%, 25%), (25%, 25%), (25%, 25%), (25%, 25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25%), (25



Fig. 3. CPU-bound and memory-bound benchmarks, UL of (25%, 50%).



Fig. 4. CPU-, memory-, and disk-bound benchmarks in UL of (25%, 25%, 50%).

Each instance of the memory-bound benchmark is configured to use (100/N) percent of the available memory, where N is the number of processors of the host (100% of the memory is used in full utilization mode). Each instance of the

disk benchmark is configured to use 4 GB of disk size in AMD experiments and 2 GB in Intel experiments. These sizes were chosen taking into account the disk size available in each host. Instances were executed and monitored for 60 s.

Finally, the impact on performance is analyzed. The *makespan* when executing multiple applications at the same time is compared with the makespan of single executions. The reported makespan values are the average computed over 20 independent executions.

## 4 Experimental Results

This section reports the experimental results of the power consumption and the performance evaluation. The average idle consumption was 183.4 W for the AMD host and 57.0 W for the Intel host.

#### 4.1 Single Benchmark Executions

Figure 5 reports PC and EC values for the CPU-bound benchmark and a graphic comparison of EC values in both hosts. Results show an average difference of 56 W between the EC of the Intel host and the AMD host for all ULs. The almost linear behavior indicates that power consumption is proportional to the UL.



Fig. 5. CPU-bound PC and EC results

Figure 6 reports the PC and EC values for the memory-bound benchmark and a graphic comparison of the EC values in both hosts. Results show a significant increment in EC with regard to CPU-bound executions for all ULs (104% for the AMD host and 36% for the Intel host, on average).



Fig. 6. Memory-bound PC and EC results



Fig. 7. Disk-bound PC and EC results

A logarithmic behavior is observed for both PC and EC, which does not occur in CPU-bound case, mainly due to the bottleneck in the access to the main memory that reduces the CPU usage. No significant increase is detected on high/critical ULs, possibly by effective resource contention by the operating system for solving conflicts over access to shared resources.

Figure 7 reports PC and EC values for the disk-bound benchmark and a comparison of EC values in both hosts. The maximum EC variation through ULs is 4 W in Intel and 2 W in AMD. These low power variation indicate that disk usage has low impact in power consumption in comparison with CPU and memory, mainly due to waits generated by bottlenecks in disk access.

#### 4.2 Combined Benchmark Executions

CPU and memory. Figure 8 reports PC and EC when executing CPU- and memory-bound benchmarks together and a comparison of EC values on AMD. Figure 9 reports the same analysis on Intel. Symbol  $\uparrow$  indicates the EC of the combined benchmarks is higher than the sum of the ECs of each benchmark executed independently, i.e., the combined execution is less efficient than the independent execution. Symbol  $\downarrow$  indicates the opposite, that is, the combined execution is more efficient. Symbol = indicates that the values are equal (less than 1 W of difference). Results show that for AMD, combined executions reduces EC compared to independent executions. On the contrary, EC of combined executions is higher than independent executions for most cases on Intel.



Fig. 8. Combined CPU- and memory-bound PC and EC on AMD

*CPU and disk.* Figure 10 (AMD) and Fig. 11 (Intel) report PC and EC when executing CPU- and disk-bound benchmarks combined, and the EC graphics on each host. Results show that the combined execution of CPU and disk benchmarks improves energy efficiency for most ULs in both hosts, with regard to EC.



Fig. 9. Combined CPU- and memory-bound PC and EC on Intel



Fig. 10. Combined CPU- and disk-bound PC and EC on AMD



Fig. 11. Combined CPU- and disk-bound PC and EC on Intel

*Memory and disk.* Figure 12 (AMD) and Fig. 13 (Intel) report PC and EC for memory- and disk-bound benchmarks combined, and the EC graphics on each host. Results show that the combined execution presents higher values of EC than their independent execution, except for low ULs of the AMD host.



Fig. 12. Combined memory- and disk-bound PC and EC on AMD



Fig. 13. Combined memory- and disk-bound PC and EC on Intel

*CPU, memory, and disk.* Table 1 reports PC and EC of CPU-, memory- and disk-bound benchmarks executed together. Results show that the combined execution on AMD has higher EC compared to their independent execution, mainly at high ULs. However, on Intel, combined executions reduce EC compared to independent executions for all ULs.

| UL              | AMD               |                    | Intel             |                   |  |
|-----------------|-------------------|--------------------|-------------------|-------------------|--|
|                 | PC                | EC                 | PC                | EC                |  |
| (25%, 25%, 25%) | $265.9 {\pm} 4.3$ | 82.5 =             | $176.9{\pm}3.8$   | 119.9 ↓           |  |
| (25%, 25%, 50%) | $266.6 \pm 4.8$   | 83.2 ↓             | $178.2 \pm 3.3$   | $121.6\downarrow$ |  |
| (25%, 50%, 25%) | $303.0{\pm}3.1$   | $119.6$ $\uparrow$ | $221.3 \pm 5.4$   | $164.3\downarrow$ |  |
| (50%, 25%, 25%) | $287.9 \pm 1.8$   | $104.5$ $\uparrow$ | $194.5 {\pm} 1.8$ | $137.5\downarrow$ |  |

Table 1. Combined CPU-, memory- and disk-bound PC and EC

#### 4.3 Performance Evaluation

This subsection analyzes the performance evaluation experiments.

Figure 14 reports the makespan of the CPU-bound benchmark and a graphic comparison for both hosts. Results show that increasing the UL does not impact significantly the completion time, due to the absence of resource competition. However, both hosts present a slight degradation for UL 100%, possibly due to conflicts with operating system processes.



Fig. 14. CPU-bound makespan results

Figure 15 reports the makespan of the memory-bound benchmark and a graphic comparison for both hosts. Performance degrades in AMD; there is a gap of 400 s between the lowest and the highest UL. For Intel the difference is only 48 s. The difference in gaps is possibly due to specific disk features of each host, such as transfer speed.

Figure 16 reports the makespan of the disk-bound benchmark and a graphic comparison for both hosts. The disk-bound case presents a notorious degradation in performance when increasing UL when compared with other benchmarks.

#### 4.4 Energy Efficiency Analysis

This subsection analyzes the energy efficiency from the collected measurements. The energy efficiency metric  $\frac{PC \times makespan}{number \ of \ instances \times 3600}$  is defined for comparing results. The lower the metric value, the higher energy efficiency of the host.



Fig. 15. Memory-bound makespan results



Fig. 16. Disk-bound makespan results

Table 2 shows the average energy efficiency of all tests for all ULs and all hosts. Best values for each host are presented in bold. The study shows that for both hosts the CPU-bound benchmark is more efficient at high ULs, memory-bound benchmark is more efficient at medium ULs and disk-bound benchmark is more efficient at low ULs. Intel host is more efficient than AMD for all ULs and all types of benchmarks. Finally, the high-critic UL (100%) is less efficient than the high-medium UL (87.5%), except for disk-bound executions.

| UL     | AMD   |       |       | Intel |       |       |
|--------|-------|-------|-------|-------|-------|-------|
|        | CPU   | mem.  | Disk  | CPU   | mem.  | Disk  |
| 12.5%  | 0.801 | 1.804 | 4.106 | 0.404 | 0.407 | 0.226 |
| 25.0%  | 0.422 | 1.511 | 4.817 | 0.191 | 0.193 | 0.233 |
| 37.5%  | 0.295 | 1.429 | 4.926 | 0.159 | 0.226 | 0.240 |
| 50.0%  | 0.231 | 1.391 | 6.147 | 0.109 | 0.185 | 0.246 |
| 62.5%  | 0.193 | 1.335 | 5.801 | 0.089 | 0.266 | 0.255 |
| 75.0%  | 0.168 | 1.410 | 5.907 | 0.081 | 0.216 | 0.260 |
| 87.5%  | 0.150 | 1.570 | 6.367 | 0.083 | 0.235 | 0.265 |
| 100.0% | 0.154 | 1.656 | 5.686 | 0.107 | 0.238 | 0.262 |

**Table 2.** Efficiency  $(PC \times makespan/3600/number of instances)$ 

#### 5 Conclusions and Future Work

This article presented an empirical analysis of the power consumption of synthetic benchmarks in high-end multicore systems. The main contribution is an exhaustive study of the inter-relationship among the power consumption of the main computing resources at different ULs, over AMD and Intel architectures.

The experimental methodology consisted on executing synthetic benchmarks over high-end hosts connected to a PDU, considering different ULs and combinations, aimed at characterizing the power consumption of each computing resource (CPU, memory, and disk). The operations performed by the benchmarks include mathematical functions and read/write of main memory and disk.

The study was complemented with performance experiments. A total number of 144 experiments were performed, 96 evaluating the power consumption and 48 evaluating the performance. For each experiment, 20 independent executions were performed. Results showed that in single executions, CPU utilization has a linear relation with power consumption. Memory utilization has significant impact on power consumption when compared to CPU usage, up to 157% more EC for AMD and 46% more EC for Intel. On the other hand, disk usage presented low EC variation for all ULs.

Combined executions are able to reduce EC with regard to independent executions manly for CPU and disk combined execution. Efficiency analysis showed that different benchmarks performed more efficiently at different ULs: CPU at high ULs, memory at medium ULs and disk at low ULs. Critic UL (100%) showed worse efficiency than high-medium UL (87.5%), except for disk.

The main lines for future work are related to extend the power and performance characterization of different benchmarks (including GPU-bound, networkbound, and no-synthetic benchmarks) and other high-end hosts. We are also working on using the characterization to build energy models for evaluating energy-aware scheduling strategies on HPC infrastructures and datacenters.

## References

- Anghel, A., Vasilescu, L., Mariani, G., Jongerius, R., Dittmann, G.: An instrumentation approach for hardware-agnostic software characterization. Int. J. Parallel Prog. 44(5), 924–948 (2016)
- Brandolese, C., Corbetta, S., Fornaciari, W.: Software energy estimation based on statistical characterization of intermediate compilation code. In: International Symposium on Low Power Electronics and Design, pp. 333–338 (2011)
- 3. Buyya, R., Vecchiola, C., Selvi, S.: Mastering Cloud Computing: Foundations and Applications Programming. Morgan Kaufmann, San Francisco (2013)
- Dayarathna, M., Wen, Y., Fan, R.: Data center energy consumption modeling: a survey. IEEE Commun. Surv. Tutorials 18(1), 732–794 (2016)
- Du Bois, K., Schaeps, T., Polfliet, S., Ryckbosch, F., Eeckhout, L.: Sweep: evaluating computer system energy efficiency using synthetic workloads. In: 6th International Conference on High Performance and Embedded Architectures and Compilers, pp. 159–166 (2011)
- Feng, X., Ge, R., Cameron, K.: Power and energy profiling of scientific applications on distributed systems. In: 19th IEEE International Parallel and Distributed Processing Symposium, pp. 34–44 (2005)
- Iturriaga, S., García, S., Nesmachnow, S.: An empirical study of the robustness of energy-aware schedulers for high performance computing systems under uncertainty. In: Hernández, G., Barrios Hernández, C.J., Díaz, G., García Garino, C., Nesmachnow, S., Pérez-Acle, T., Storti, M., Vázquez, M. (eds.) CARLA 2014. CCIS, vol. 485, pp. 143–157. Springer, Heidelberg (2014). https://doi.org/10.1007/ 978-3-662-45483-1\_11
- 8. Kopytov, A.: Sysbench repository https://github.com/akopytov/sysbench. Accessed 1 May 2017
- Kurowski, K., Oleksiak, A., Piątek, W., Piontek, T., Przybyszewski, A., Węglarz, J.: Dcworms-a tool for simulation of energy efficiency in distributed computing infrastructures. Simul. Model. Pract. Theory 39, 135–151 (2013)
- Langer, A., Totoni, E., Palekar, U.S., Kalé, L.V.: Energy-efficient computing for HPC workloads on heterogeneous manycore chips. In: Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 11–19 (2015)
- Nesmachnow, S.: Computación científica de alto desempeño en la Facultad de Ingeniería. Universidad de la República. Revista de la Asociación de Ingenieros del Uruguay, 61(1), pp. 12–15 (2010). Text in Spanish
- Nesmachnow, S., Dorronsoro, B., Pecero, J., Bouvry, P.: Energy-aware scheduling on multicore heterogeneous grid computing systems. J. Grid Comput. 11(4), 653– 680 (2013)

 Nesmachnow, S., Perfumo, C., Goiri, I.: Holistic multiobjective planning of datacenters powered by renewable energy. Cluster Comput. 18(4), 1379–1397 (2015)

37

 Srikantaiah, S., Kansal, A., Zhao, F.: Energy aware consolidation for cloud computing. In: Conference on Power Aware Computing and Systems, vol. 10, pp. 1–5 (2008)