1 Introduction

Energy generation, consumption and saving are not minor issues. Although sources of clean energy are being sought, energy generation still causes large CO2 emissions. Approximately 72% of global energy consumption goes to transportation, lighting and electric heating, while the rest is mainly due to electrical and electronic equipment, including computing devices [1,2,3,4,5,6].

Energy consumption in computing devices is a rapidly growing area of research and development. This growth is driven by several critical problems that must be thoroughly addressed, including the energy gap problem, energy consumption in autonomous systems and the problem of exaflop scaling, among others [7,8,9,10,11,12,13]. Firstly, regarding the energy gap problem, Nokia (2010) reported that the power consumed by new applications increases annually faster than what mobile device batteries can supply, so new smartphones need to be recharged more often, limiting user mobility [7, 14]. Secondly, autonomous vehicles, such as robots and drones, require efficient energy management. In fact, if the algorithms that control them or use them for specific tasks are not energetically optimal, the battery is put at risk, which can compromise the tasks of the vehicle and even its integrity [9,10,11]. Thirdly, regarding the exaflop problem, there is a steady growth in supercomputer performance, as can be seen from the Top500.org reports [12]. With today's computer technology, the increase in the number of floating point operations per second (flops) is directly proportional to power consumption; hence, exaflop computers would be energetically unviable. Nevertheless, teams of engineers and scientists around the world are aiming to manufacture, by 2020, exaflop computers with a target power consumption of 20 MW [12, 13].

Various strategies have been developed and implemented to save energy. For mobile devices: use of low-power processors, mobile computation offloading, favoring sequential rather than parallel executions of applications, and scheduler optimization for both the processor and the antennas, to name a few [15]. For autonomous vehicles: strategies such as approximate computing have been developed so that these vehicles can save energy in the expensive computing operations they perform [16, 17]. For exaflop scaling: use of processing units with a good flop-to-watt ratio, including low-power processors, use of accelerators such as GPUs or Xeon Phi, use of multicore processors with variable frequency, cooling technology and energy capture/recovery [12, 13].

Most energy models based on extensions to Amdahl’s law do not consider the use of variable frequency as provided by Turbo Boost [18]. Meneses [19] presents an extension to Amdahl’s law for Intel processors with Turbo Boost, but only to study the performance behavior, not the energy behavior, of parallel programs. Verner [20] presents a model that considers frequency variation, but only in the parallel regions of a program. Hence, a model is needed to analyze the energy behavior of parallel programs running on multicore processors with Turbo Boost technology. The main contribution of this work is such a model: an extension to Amdahl’s law to analyze the energy consumption of parallel programs executed on Intel multicore processors with Turbo Boost.

2 Related work

Power consumption is an important factor during the design phase of new processors. Recently, the number of cores in multicore processors has increased from 2 to 48. In addition, it is common to manufacture processors with more cache memory and with frequencies above 3.6 GHz. This frequency has a limit because energy consumption generates heat: while an instruction stream is being executed, the energy consumed by the processor may cause overheating, which in turn requires additional energy to cool it down [21,22,23,24,25].

Fuller and Miller [26] addressed the power consumption problem in 2010, exploring new technologies for the design of modern processors. Furthermore, different strategies have been used to reduce the power consumption of these new processor architectures, heterogeneous or asymmetric, including specialized cores, vectorization, dynamic voltage and frequency scaling (DVFS) and systematic techniques for power activation, among others [27, 28].

The study of multicore energy behavior has followed two different paths: mathematical models and experimentation. Table 1 summarizes the models related to the power consumption analysis of multicore processors.

Table 1 Models related to the study of power consumption

Traditionally, Amdahl’s law has been used to analyze the speedup of parallel programs [47], and it has been taken as the basis for the study of the power consumption of parallel programming on multicore architectures. Research on energy models based on extensions to Amdahl’s law can be found in [29,30,31,32,33,34,35]. Most of these studies have focused on the power consumption of multicore architectures without considering frequency variations. Kim et al. [48] focused their research on frequency variations in the sequential regions of the parallel program. Pei et al. [35] present a model for heterogeneous multicore processors in which the data preparation overhead (ODP) is the focus of their extension to Amdahl’s law.

Energy proportionality refers to a model that establishes that a server should consume power in proportion to its utilization [36,37,38,39,40]. Since its introduction in 2007, the model has been embraced by the server industry as a design goal for further optimizations of energy efficiency. In [36], the authors found that modern servers have taken the proportionality of server power consumption from linear to quadratic. Hsu et al. [37] reviewed different energy proportionality metrics and proposed a single unified indicator to represent them all, called EP (energy proportionality). Furthermore, the authors noted that although EP is considered better than other existing metrics, it does not capture an important aspect of server power proportionality, namely linearity. Energy proportionality is still under research for data center environments [38,39,40].

Some models can be categorized as energy complexity models. Some authors [41,42,43,44] have formalized the problem of energy consumption by extending the ideas of computational complexity into a new area called energy complexity. In this framework, energy complexities have been obtained for traditional algorithms, such as sorting algorithms. The main idea is to express the energy used as a function of the problem size.

There are other models that use different strategies to analyze energy behavior, based on processors with independently powered resources [33, 45]. These resources can be cores, cache memory and the interconnection network, among others. Basmadjian and de Meer [45] worked on the design of a software-based power consumption model for multicore processors. They pointed out that the presence of more than one processing unit directly affects the power behavior of the processor, because computational resources are shared. They also mentioned that measuring power consumption at the hardware level is not trivial for multiprocessors with a high number of internal circuits. Cho and Melhem [33] studied the effects of parallelism on program performance and energy consumption. Their proposed model was tested on a machine that supports core off-lining. They predicted that adding more cores, combined with a high percentage of parallel code in a process, helps to reduce energy consumption. Their main conclusion is that the more processing units share common elements powered by the same source, the greater the energy savings.

Finally, another energy consumption model is presented in [46]. The authors take the energy delay product (EDP) as the metric for energy efficiency: the lower the EDP, the better the energy efficiency. For a shared-memory multicore processor system, they used processor overclocking and memory frequency scaling to achieve better performance and lower power consumption, providing better energy efficiency. The authors also stated that processor overclocking alone cannot guarantee better energy efficiency, and, similarly, that reducing the memory frequency combined with processor overclocking cannot guarantee it either. Hence, they propose a processor overclocking and memory frequency scaling algorithm, the Holistic Energy-Efficient (HEE) algorithm, which determines when to use processor overclocking alone to improve energy efficiency and when to combine it with memory frequency scaling.

Most research on the energy consumption behavior of Intel multicore processors is based on experimentation. Several benchmarks have been used to study the impact on energy consumption of specific system tasks, such as memory access, use of DVFS (dynamic voltage and frequency scaling), use of Turbo Boost, use of vectorization, use of thermal and power counters, and operating system configurations, to name a few [18, 28, 49,50,51,52,53,54,55,56,57].

James et al. [25] present a performance analysis of Intel Core i7 (Nehalem) processors. Their analysis combines different workload scenarios. The authors determine that Turbo Boost activation depends on inherent features of the application, and they note that Turbo Boost mitigates Amdahl’s law. The article reports a 6% reduction in execution time with a 16% increase in energy consumption, and that on the Core i7 the frequency step size is 133.33 MHz. Turbo Boost is made possible by a processor feature called power gating. Traditionally, an idle processor core consumes zero active power while dissipating static power due to leakage current. Power gating also reduces the leakage current, further reducing the power consumption of the idle core. The available power margin can then be diverted to the active cores to increase their voltage and frequency without violating the power, voltage and thermal envelopes [25]. Turbo Boost makes Nehalem a dynamically asymmetric processor; this means that different cores can execute the same set of instructions while their frequencies vary independently. Their experiments are CPU intensive, both floating point and integer, with large memory accesses. The article also presents a frequency-based processor power consumption model.

Acun et al. [49] present a series of experiments on four supercomputers to observe the behavior of HPC applications on nodes with Turbo Boost. The test program they use is MKL double-precision matrix multiplication, as it represents a benchmark for heavy hardware use. The authors report a degradation in application performance as the number of cores increases on processors with dynamic frequency change. The problem observed is that not all nodes increase their frequency at the same time. They tried setting the frequency constant but did not obtain good results. The solution they found was an algorithm that migrates processes to processors that have Turbo Boost active and are observed to increase their frequency in the same way.

Aniruddha et al. [50] noted in 2017 that the models used to predict energy and temperature performance in clusters assume homogeneous execution systems, that is, processors and cores running at the same frequencies, so processor frequency variations must be studied carefully. Their paper presents an empirical study of the performance of processors for HPC; in particular, they review the Intel processor families with Turbo Boost. The performance problem due to energy variation grows as processors ship with larger dynamic frequency increases, and the variation is further amplified by the restrictions imposed by the hardware (number of cores, power management functionality, among others). The authors show that, under the hardware power constraint of the processor, the variation in processor performance and energy efficiency has increased up to fourfold in recent Intel processors (Sandy Bridge, Ivy Bridge and Broadwell). This work suggests that there is a risk of greater variation in performance and energy efficiency as the number of cores per processor increases.

Current energy models based on extensions to Amdahl’s law do not consider the use of variable frequency as provided by Turbo Boost [18]. Verner [20] and Meneses [19] present extensions to Amdahl’s law for Intel processors with Turbo Boost, but only to study the performance behavior, not the energy behavior, of parallel programs. Hence, a model is needed to analyze the energy behavior of parallel programs running on multicore processors with Turbo Boost technology. We therefore present a model, an extension to Amdahl’s law, to analyze the energy consumption of parallel programs executed on multicore processors with variable frequencies.

3 Energy consumption model

We give an energy consumption model for multicore processors with variable frequency, based on Amdahl’s law. This model extends the model proposed in [34] to cover processors with variable frequency.

The relation between energy, power and time is given by [14]:

$$\begin{aligned} energy = power \times time. \end{aligned}$$
(1)

Based on this relation, our energy consumption model consists of two main components: power and time. On one hand, we obtain the power model for multicore processors with variable frequency by taking into account two different power measurements: the power associated with the processor’s base frequency, and the power associated with the processor once variable frequency has been enabled. Since frequency is directly related to power, higher frequency implies higher power consumption; hence, for multicore processors with variable frequency, an increase in power consumption is expected as the frequency increases. On the other hand, we use Amdahl’s law to model the time of a parallel program running on a multicore processor, and we then extend the model to multicore processors with variable frequency. This extension predicts not only the speedup in terms of time, but also the energetic speedup of the multicore processor. In Sects. 3.1 and 3.2, we describe the two components of the energy consumption model, the power and the time models, respectively. Section 3.3 develops the energy model and Sect. 3.4 develops an extension of Amdahl’s law for the energetic speedup.

3.1 Power

The power model is developed for two cases, which we call \(W_{off}\) and \(W_{on}\). Case \(W_{off}\) refers to the processor running without frequency variation, and case \(W_{on}\) to the processor running with frequency variation active.

In general, according to [34], it is possible to represent the entire power usage as the sum of three terms: the base power, the power of the active cores and the idle power of all cores. These terms are denoted by \(W_{base}\), \(pW_{active}\) and \(mW_{idle}\), where p is the number of active cores and m the total number of cores in the processor. The parameter \(pW_{active}\) can be \(pW_{aoff}\) or \(pW_{aon}\), depending on the case.

The power of a processor with m cores of which p are active is given by:

$$\begin{aligned} W_{off}(p) & = W_{base} + pW_{aoff} + mW_{idle}, \end{aligned}$$
(2)
$$\begin{aligned} W_{on}(p)& = W_{base} + pW_{aon} + mW_{idle}. \end{aligned}$$
(3)

Since \(W_c = W_{base} + mW_{idle}\) is constant, we will use \(W_{c}\) to refer to this term from now on. As stated in [45], this constant depends on the layout of the resources shared by the multicore processor, including cache memory and the interconnection network, among others. Equations (4) and (5) rewrite the power in terms of \(W_c\).

$$\begin{aligned} W_{off}(p)& = W_{c} + pW_{aoff}, \end{aligned}$$
(4)
$$\begin{aligned} W_{on}(p)& = W_{c} + pW_{aon}. \end{aligned}$$
(5)

When only one core is used, Eqs. (6) and (7) represent the power required by a sequential program in a multicore processor for both cases.

$$\begin{aligned} W_{off}(1)& = W_{c}+W_{aoff}, \end{aligned}$$
(6)
$$\begin{aligned} W_{on}(1)& = W_{c}+W_{aon}. \end{aligned}$$
(7)
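
To make the power model concrete, the following C sketch evaluates Eqs. (4) and (5); the function names are ours, and the wattage values are the illustrative ones used later in the example of Sect. 3.4, not measurements.

```c
#include <stdio.h>

/* Illustrative values (watts) taken from the example in Sect. 3.4. */
#define W_C    2.5   /* constant term W_c = W_base + m*W_idle          */
#define W_AOFF 6.5   /* active power per core, frequency variation off */
#define W_AON  8.5   /* active power per core, frequency variation on  */

/* Eq. (4): processor power with p active cores, Turbo Boost off. */
static double w_off(int p) { return W_C + p * W_AOFF; }

/* Eq. (5): processor power with p active cores, Turbo Boost on.  */
static double w_on(int p)  { return W_C + p * W_AON; }

int main(void) {
    for (int p = 1; p <= 4; p++)
        printf("p=%d  W_off=%5.1f W  W_on=%5.1f W\n", p, w_off(p), w_on(p));
    return 0;
}
```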

3.2 Time

According to Amdahl [47], the execution time of a sequential program to solve a problem of size n, running on a single processor is given by Eq. (8), where \(\sigma (n)\) is the serial portion of the computation and \(\varphi (n)\) is the portion of the computation that can be executed in parallel.

$$\begin{aligned} T(n,1) = \sigma (n) + \varphi (n). \end{aligned}$$
(8)

Following the same principle, the execution time for a program running on a multicore processor is given by Eq. (9), where p denotes the number of cores in the multicore processor and \(k(n, p)\) is the overhead related to the intercommunication between processors.

$$\begin{aligned} T(n,p) = \sigma (n) + \frac{\varphi (n)}{p} + k(n, p). \end{aligned}$$
(9)

Hence, the speedup of a parallel program is the ratio between the sequential execution time and the parallel execution time, as given by Eq. (10). Assuming \(k(n, p) > 0\) and \(\sigma (n)>0\), we obtain inequality (11).

$$\begin{aligned} \Psi (n,p)& = \frac{\sigma (n) + \varphi (n) }{\sigma (n) + \frac{\varphi (n)}{p} + k(n, p)}, \end{aligned}$$
(10)
$$\begin{aligned} \Psi (n,p)\le & \frac{1}{f + (1-f)/p}, \;\text{ where }\; f= \frac{\sigma (n)}{\sigma (n) + \varphi (n)}. \end{aligned}$$
(11)
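
For clarity, the step from (10) to (11) only drops the positive overhead term \(k(n,p)\) and divides the numerator and denominator by \(\sigma (n) + \varphi (n)\):

$$\begin{aligned} \Psi (n,p) = \frac{\sigma (n) + \varphi (n)}{\sigma (n) + \frac{\varphi (n)}{p} + k(n,p)} \le \frac{\sigma (n) + \varphi (n)}{\sigma (n) + \frac{\varphi (n)}{p}} = \frac{1}{\frac{\sigma (n)}{\sigma (n)+\varphi (n)} + \frac{1}{p}\,\frac{\varphi (n)}{\sigma (n)+\varphi (n)}} = \frac{1}{f + (1-f)/p}. \end{aligned}$$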

Note that \(\sigma (n) > 0\) because, in practice, every program has a sequential code region, even if it is very small, and the inherently sequential proportion is given by f. From (11), it follows that the maximum speedup as \(f \rightarrow 0\) is given by:

$$\begin{aligned} \lim _{f\rightarrow 0}\Psi (n,p) = p. \end{aligned}$$
(12)

3.2.1 Amdahl’s law for processors with variable frequency

We give an extension of the speedup (10) to multicore processors with variable frequency, based on Amdahl’s law. We consider the execution time of a program when the frequency change is active and when it is not. Let \(T_{off}(n, p)\) be the time a program needs to solve a problem of size n with p cores when the frequency change is not active. We take \(T_{off}(n,1)\) and \(T_{off}(n,p)\) to be given by Eqs. (8) and (9), respectively.

To calculate the execution time of a program that solves a problem of size n on a processor with the frequency change active, we must consider the frequency ratio H given by:

$$\begin{aligned} H = \frac{\text{ increased } \text{ frequency }}{\text{ base } \text{ frequency }}. \end{aligned}$$

Then, \(T_{on}(n,1)\) and \(T_{on}(n,p)\) are defined as

$$\begin{aligned} T_{on}(n,1)& = \frac{\sigma (n)}{H} + \frac{\varphi (n)}{H}, \end{aligned}$$
(13)
$$\begin{aligned} T_{on}(n,p)& = \frac{\sigma (n)}{H} + \frac{\varphi (n)}{Hp} + \frac{\kappa (n,p)}{H}. \end{aligned}$$
(14)

The speedup of a parallel program that solves a problem of size n with p cores on a multicore processor with variable frequency is given by:

$$\begin{aligned} \Psi (n,p)& = \frac{T_{off}(n,1)}{T_{on}(n,p)}\nonumber \\&=\frac{ \sigma (n) + \varphi (n)}{\frac{\sigma (n)}{H}+\frac{\varphi (n)}{Hp} + \frac{k(n, p)}{H}}. \end{aligned}$$
(15)

We obtain (16) from (15) by dropping the overhead term, as in the derivation of (11):

$$\begin{aligned} \Psi (n,p) & \le \frac{ \sigma (n) + \varphi (n)}{\frac{\sigma (n)}{H}+\frac{\varphi (n)}{Hp}}\nonumber \\ & = H\left( \frac{ \sigma (n) + \varphi (n)}{\sigma (n)+\frac{\varphi (n)}{p}}\right) . \end{aligned}$$
(16)

Taking \(f= \frac{\sigma (n)}{ \sigma (n)+ \varphi (n)} \), for \(\sigma (n)>0\), turns (16) into (17).

$$\begin{aligned} \Psi (n,p)\le \frac{H}{f+\frac{(1-f)}{p}}. \end{aligned}$$
(17)

Note that the maximum speedup for a parallel program running on a processor with frequency variation is given by \(\lim _{f\rightarrow 0}\Psi (n, p) \); hence

$$\begin{aligned} \lim _{f\rightarrow 0}\Psi (n,p) \le pH. \end{aligned}$$
(18)

Figure 1a, b shows the speedup for three different values of f: 0.2, 0.1 and 0.02. Figure 1a shows the speedup of a parallel program without frequency variation (traditional Amdahl’s law), and Fig. 1b shows the speedup of a parallel program with \(H=1.14\). The value of H was obtained assuming that the processor increases its frequency from 2.7 to 3.1 GHz (\(H \approx 3.1\,GHz/2.7\,GHz\)). The same values are used in the experiments of Sect. 4.

In Fig. 1b, it can be observed that for the frequency ratio \(H=1.14\) there are regions where the speedup is higher than the upper limit of Amdahl’s law [Eq. (12)]; for example, when \(f=0.02\) and \(p \in [1, 8]\), where p is the number of cores, the speedup is greater than p. Speedup describes the behavior of a parallel program with respect to a sequential program: if the speedup is greater than one, the parallel program performs better than the sequential program; if there is a slowdown, the sequential program performs better. In the same way, for processors with frequency change, the speedup describes the behavior of a parallel program on a processor with variable frequency with respect to the sequential execution on the same multicore processor without frequency change. For \(f=0.02\), the speedup exceeds the traditional limit of Amdahl’s law, given in Eq. (12), because the parallel program executes on the multicore processor at frequencies higher than those of the sequential execution, so each stream executes H times more instructions per unit of time. It can also be seen that the speedup stays below pH, as indicated in Eq. (18).

Fig. 1 Amdahl’s law for multicore processors without and with frequency variation
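
The curves in Fig. 1 can be reproduced by evaluating the bounds (11) and (17) directly; the following C sketch (our code, not part of the original experiments) uses the values of f and H quoted above.

```c
#include <stdio.h>

/* Upper bound on speedup, Eq. (17); H = 1 recovers the classic bound, Eq. (11). */
static double speedup_bound(double f, int p, double H) {
    return H / (f + (1.0 - f) / p);
}

int main(void) {
    const double f_values[] = {0.2, 0.1, 0.02};   /* serial fractions used in Fig. 1 */
    const double H = 3.1 / 2.7;                   /* frequency ratio, approx. 1.14   */
    for (int i = 0; i < 3; i++)
        for (int p = 1; p <= 8; p++)
            printf("f=%.2f p=%d  Amdahl: %.3f  with frequency variation: %.3f\n",
                   f_values[i], p,
                   speedup_bound(f_values[i], p, 1.0),  /* Fig. 1a */
                   speedup_bound(f_values[i], p, H));   /* Fig. 1b */
    return 0;
}
```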

3.3 Energy consumption model

In Sects. 3.1 and 3.2, we introduced the power and time models for multicore processors that can change their frequency. In this section, we combine both models to obtain an energy consumption model. Analogously to the modeling of time, we begin by introducing the general formula for the energy of a sequential program. The energy required to solve a problem of size n on a single core is denoted by J(n, 1), where \(\sigma (n)\) refers to the inherently sequential portion of the computation and \(\varphi (n)\) to the potentially parallel portion. Then, J(n, 1) is given by Eq. (19).

$$\begin{aligned} J(n,1)& = W(1)\sigma (n) + W(1)\varphi (n) \nonumber \\& = W(1)(\sigma (n) + \varphi (n)), \end{aligned}$$
(19)

where W(1) depends on whether the processor frequency is variable or not [Eqs. 4 and 5].

We define \(J(n,p)\) as the energy to solve a problem of size n on p cores. We have:

$$\begin{aligned} J(n,p) = W(1)\sigma (n) + W(p)\frac{\varphi (n)}{p} + W_{\kappa }\kappa (n,p), \end{aligned}$$
(20)

where W(1) and W(p) can be taken from Eqs. 4 and 5, depending on whether the processor has variable frequency active, and \(W_{\kappa }\kappa (n,p)>0\) refers to the energy overhead inherent in a parallel program.

Now, to model the energy for a multicore processor of variable frequency, we consider two cases. The first one is when the frequency of the processor does not have any variation, and the second case is when the frequency has variation. We denote these two cases by \(J_{off}\) and \(J_{on}\), respectively.

Equations (21) and (22) correspond to the sequential and parallel executions when there is no frequency variation in the multicore processor, where \(W_{off}(1)\) is given by Eq. (6) and \(W_{off}(p)\) by Eq. (4).

$$\begin{aligned} J_{off}(n,1)& = W_{off}(1)\sigma (n) + W_{off}(1)\varphi (n) \nonumber \\& = W_{off}(1)(\sigma (n) + \varphi (n)). \end{aligned}$$
(21)
$$\begin{aligned} J_{off}(n,p)& = W_{off}(1)\sigma (n) + W_{off}(p)\frac{\varphi (n)}{p} + W_{\kappa }\kappa (n,p). \end{aligned}$$
(22)

Similarly, for the case when frequency change is enabled, Eqs. (23) and (24) give the energy consumption for solving a problem of size n in one and p cores, respectively.

$$\begin{aligned} J_{on}(n,1)& = W_{on}(1)\frac{\sigma (n)}{H} + W_{on}(1)\frac{\varphi (n)}{H} \nonumber \\& = \frac{W_{on}(1)}{H}(\sigma (n) + \varphi (n)), \end{aligned}$$
(23)
$$\begin{aligned} J_{on}(n,p)& = W_{on}(1)\frac{\sigma (n)}{H} + W_{on}(p)\frac{\varphi (n)}{Hp} + W_{\kappa }\kappa (n,p). \end{aligned}$$
(24)
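
A direct transcription of Eqs. (22) and (24) into C could look as follows. Here sigma, phi and kappa are the time components of Eqs. (8) and (9), the power arguments come from the model of Sect. 3.1, and all names are ours; this is a sketch of the formulas, not of a measurement procedure.

```c
/* Eq. (22): energy (joules) to solve a problem with p cores, frequency variation off.
 * sigma, phi, kappa are times in seconds; woff1 = W_off(1), woffp = W_off(p) and
 * wk = W_kappa are powers in watts. */
double j_off(double sigma, double phi, double kappa,
             double woff1, double woffp, double wk, int p) {
    return woff1 * sigma + woffp * phi / p + wk * kappa;
}

/* Eq. (24): energy with p cores and frequency variation on; H is the frequency ratio,
 * won1 = W_on(1) and wonp = W_on(p). */
double j_on(double sigma, double phi, double kappa,
            double won1, double wonp, double wk, int p, double H) {
    return won1 * sigma / H + wonp * phi / (H * p) + wk * kappa;
}
```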

3.4 Amdahl’s law extension for energetic speedup

Based on the energy consumption model, it is possible to calculate the energetic speedup using Amdahl’s law. By knowing the energetic speedup, the energy saving of a multicore processor with variable frequency can also be known. Again, we consider two cases: \(\Psi _{off}(n,p)\) for multicore processors without variable frequency and \(\Psi _{on}(n,p)\) for multicore processors with variable frequency. This representation allows us to study the energy consumption behavior of a parallel program on multicore processors. When the frequency change is inactive, as in Amdahl’s law, the energy consumption of a sequential program \(J_{off}(n,1)\) is divided by the energy consumption of a parallel program \(J_{off}(n,p)\), as given in Eq. (25).

$$\begin{aligned} \Psi _{off}(n,p) & = \frac{J_{off}(n,1)}{J_{off}(n,p)} \nonumber \\ & = \frac{W_{off}(1)(\sigma (n) + \varphi (n))} {W_{off}(1)\sigma (n) + W_{off}(p)\frac{\varphi (n)}{p} + W_{\kappa }\kappa (n,p)} \nonumber \\ & \le \frac{W_{off}(1)(\sigma (n) + \varphi (n))}{W_{off}(1)\sigma (n) + W_{off}(p)\frac{\varphi (n)}{p}}\nonumber \\ & \le \frac{1}{f + (1-f)\frac{W_{off}(p)}{pW_{off}(1)}}, \end{aligned}$$
(25)

where \(f = \frac{\sigma (n)}{(\sigma (n) + \varphi (n))}\).

The maximum energy speedup for a parallel program running on a processor without frequency variation is given by \(\lim _{f\rightarrow 0} \Psi _{off}(n,p)\); hence

$$\begin{aligned} \lim _{f\rightarrow 0} \Psi _{off}(n,p) = \frac{pW_{off}(1)}{W_{off}(p)}. \end{aligned}$$
(26)

If this limit is greater than 1, then we are in a scenario where we can have energy savings. If this value is less than or equal to 1, then we are in a scenario where sequential programs consume less energy than parallel programs.

On the other hand, when the frequency change is active, the energy consumption of a sequential execution with the frequency change inactive, \(J_{off}(n,1)\), is divided by the energy consumption of the parallel program on a multicore processor with the frequency change active, \(J_{on}(n,p)\), as given by Eq. (27).

$$\begin{aligned} \Psi _{on}(n,p)& = \frac{J_{off}(n,1)}{J_{on}(n,p)} \nonumber \\ & = \frac{W_{off}(1)(\sigma (n) + \varphi (n))} {W_{on}(1)\frac{\sigma (n)}{H} + W_{on}(p)\frac{\varphi (n)}{Hp} + W_{\kappa}\kappa (n,p)} \nonumber \\ & \le \frac{W_{off}(1)(\sigma (n) + \varphi (n))}{W_{on}(1)\frac{\sigma (n)}{H} + W_{on}(p)\frac{\varphi (n)}{Hp}} \nonumber \\ & \le \left( \frac{W_{off}(1)}{W_{on}(1)}\right) \frac{H}{f + (1 - f)\frac{W_{on}(p)}{W_{on}(1)p}}. \end{aligned}$$
(27)

The maximum energy speedup for a parallel program running on a processor with frequency variation is given by \(\lim _{f\rightarrow 0} \Psi _{on}(n,p)\); hence

$$\begin{aligned} \lim _{f\rightarrow 0} \Psi _{on}(n,p) = \frac{pHW_{off}(1)}{W_{on}(p)}. \end{aligned}$$
(28)

As with the limit (26), if this limit is greater than 1, we can have energy savings; if it is less than or equal to 1, the sequential program consumes less energy than the parallel program. The relevant point is that we can know the maximum energy speedup when p equals the number of cores of the processor, provided we know \(H, W_{off}(1)\) and \(W_{on}(p)\).

For example, Fig. 2 shows an instance of the model that describes how the energy speedup behaves in a processor that can change its frequency. In this example, we consider a 4-core processor with a base frequency of 2.75 GHz that can raise its frequency to 3.1 GHz, so \(H=1.127\). The power values associated with the processor are \(W_c=2.5\) watts for the constant power, \(W_{aoff}=6.5\) watts for the active power per core when Turbo Boost is off, and \(W_{aon} = 8.5\) watts for the active power per core when Turbo Boost is on. With these values it is possible to compute \(W_{off}(p) = W_c + pW_{aoff}\) and \(W_{on}(p)=W_c+pW_{aon}\). Figure 2a shows the energy speedup when the frequency is constant, and Fig. 2b shows it when the frequency changes. These figures show that the processor without frequency changes can reach a greater energy speedup than the processor with frequency changes; however, the gain is not significant and stays close to 1. This means that parallel programs running on processors with or without frequency change consume roughly the same energy when they use all available cores. So, from the point of view of the model, a processor with frequency change offers greater performance for parallel programs while using about the same amount of energy as the sequential program.
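
The limits (26) and (28) for this example can be checked with a few lines of C (our sketch); with the values above it yields an energy speedup limit of roughly 1.26 without frequency change and roughly 1.11 with it, consistent with the behavior described for Fig. 2.

```c
#include <stdio.h>

/* Example of Sect. 3.4: 4-core processor, base 2.75 GHz, boosted 3.1 GHz. */
#define W_C    2.5    /* constant power (W)                   */
#define W_AOFF 6.5    /* active power per core, Turbo off (W) */
#define W_AON  8.5    /* active power per core, Turbo on  (W) */
#define H      1.127  /* frequency ratio 3.1/2.75             */

static double w_off(int p) { return W_C + p * W_AOFF; }  /* Eq. (4) */
static double w_on(int p)  { return W_C + p * W_AON;  }  /* Eq. (5) */

int main(void) {
    const int p = 4;
    double lim_off = p * w_off(1) / w_off(p);     /* Eq. (26) */
    double lim_on  = p * H * w_off(1) / w_on(p);  /* Eq. (28) */
    printf("energy speedup limit, Turbo off: %.3f\n", lim_off);
    printf("energy speedup limit, Turbo on : %.3f\n", lim_on);
    return 0;
}
```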

Fig. 2 Energy speedup for parallel programs on a multicore processor for \(J_{off}(n,p)\) and \(J_{on}(n,p)\)

4 Experiments and discussion

Several experiments have been conducted to validate the energy model introduced in Sect. 3.3. These experiments were executed on the three different Intel processors with Turbo Boost technology listed below. This technology allows the processor to switch the frequency variation on and off during the processing of instructions.

  1. Intel Core i5 dual core, 2.7 GHz (PowerBook).

  2. Intel Core i5 quad core, 2.3 GHz (iMac).

  3. Intel Xeon eight-core, 3.4 GHz (server).

To measure the energy consumption of the processor, we used Power Gadget, an Intel software-based power usage monitoring tool available for the processors used in these experiments.

To test the proposed model, part of the Linpack benchmark was used, specifically double-precision multiplication of square matrices. It is well known that the complexity of this task is \(O(n^3)\) and that it stresses the processor. Moreover, the parallel part exceeds 98% of the execution time for matrices larger than \(1500\times 1500\). The matrix sizes tested were \(1500\times 1500\), \(2000\times 2000\), \(3000\times 3000\) and \(3500\times 3500\).

OpenMP was used for the tests. The number of threads used in the OpenMP programs ranged from one up to the number of cores of the test platform. That is, on the Intel Core i5 dual core processor, the experiments used one and two threads; on the Intel Core i5 quad core processor, one to four threads; and on the Intel Xeon eight-core, one to eight threads. Special care was taken so that increasing the number of threads increased the number of cores used in the processor: if k threads were used, then k cores of the processor were used. For this reason, we use the terms threads and cores interchangeably in the following sections.
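
The exact benchmark code is not listed in the paper; a minimal OpenMP kernel of the kind described (double-precision multiplication of square matrices, one thread per core) might look like the sketch below, where the routine name and signature are ours.

```c
#include <omp.h>

/* Double-precision multiplication of n x n matrices, C = A * B.
 * One OpenMP thread per core: calling with threads = k occupies k cores,
 * as in the experiments of Sect. 4. Illustrative sketch only. */
void dpMM(const double *A, const double *B, double *C, int n, int threads) {
    #pragma omp parallel for num_threads(threads)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```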

4.1 Dual core experiments

Table 2 shows the power behavior of the Intel Core i5 dual core processor, as measured with the Intel Power Gadget tool. Columns \(W_c\) and \(W_a\) show the constant power and the active power per core [Eqs. (4) and (5)]. Column W(1) is the power of the processor with one active core [Eqs. (6) and (7)], and column W(2) is the power with two active cores [Eqs. (4) and (5)].

Table 2 Frequency and electric power for Intel Core i5 dual core with Turbo Boost off and on

Figure 3 and Table 3 show the execution time and speedup of double-precision matrix multiplication for different sizes of square matrices. In Fig. 3, the solid lines show the execution time of the program running on one and two cores with Turbo Boost off, while the dashed lines show the behavior on one and two cores with Turbo Boost on. Notice that when Turbo Boost is activated, execution times are shorter than when it is off. The process that takes the most time is the sequential one with Turbo Boost off, and the process that takes the least time is the one that uses two cores with Turbo Boost on. This behavior can be deduced from Eqs. (12) and (18): since \(H=3.1/2.7\approx 1.148 > 1\), the speedup obtained with Turbo Boost on is greater than with Turbo Boost off.

Fig. 3 Double-precision matrix multiplication (dpMM) time performance

Table 3 shows the experimental speedup and the model speedup associated with Eq. (17) for double-precision matrix multiplication using one and two cores, with Turbo Boost on and off. The speedup is calculated with Eq. (15), since we are interested in the execution behavior of parallel programs on architectures in which the frequency of the processor cores changes. In Table 3, we can observe that the speedup with Turbo Boost on exceeds the traditional linear limit of Amdahl’s law [Eq. (12)]. Moreover, according to the behavior predicted by the model [Eq. (18)], the upper limit is pH; since \(H=1.148\), the upper limit of the speedup is 2.29x. This result is consistent with the Model Two Threads column for Turbo Boost on in Table 3, where the maximum speedup reported is 2.2953 for a matrix of size 3500.

Table 3 Speedup on Intel Core i5 dual core processor
Fig. 4 Double-precision matrix multiplication (dpMM) energy behavior on Intel Core i5 dual core

Figure 4 shows the energy consumption of double-precision matrix multiplication for different matrix sizes on one and two cores. Figure 4a shows the energy consumption measured in the experiments. Figure 4b shows the energy consumption predicted by the model, according to Eqs. (22) and (24), using the processor data in Table 2 and the sequential time with Turbo Boost off. The solid lines correspond to the energy consumption of matrix multiplication when Turbo Boost is off and the dashed lines to the same experiment when Turbo Boost is on. Comparing both graphs in Fig. 4, the behavior predicted by the model corresponds to that of the experiments. Notice that the process consumes more energy when Turbo Boost is on than when it is off, and that the execution on two cores consumes less energy than the sequential execution. Another behavior that can be observed is that the program that consumes the most energy is the sequential one with Turbo Boost on, while the one that consumes the least energy uses two cores with Turbo Boost off. Finally, in Fig. 4, we can see that the process running on one core with Turbo Boost off and the process running on two cores with Turbo Boost on consume almost the same amount of energy.

Table 4 Experimental and model energetic speedup for double-precision matrix multiplication on Intel Core i5 dual core processor

Table 4 shows the energetic speedup of double-precision matrix multiplication with Turbo Boost on and off. The values in this table are for experiments with several matrix sizes and for the model given by Eqs. (25) and (27), using the power values in Table 2. Table 4 shows that, for both the experiments and the model, a greater energy speedup is obtained when Turbo Boost is off than when it is on. An interesting observation is that, for both the experiments and the model, when the two cores of the processor are used, the speedup is very close to one. This means that a process using two cores with Turbo Boost on spends the same energy as the sequential process with Turbo Boost off. Finally, we observe that the speedup obtained using one core with Turbo Boost on is less than one; this is because a sequential process with Turbo Boost on consumes more energy than a sequential process with Turbo Boost off, as shown in Fig. 4.

4.2 Quad core experiments

Table 5 shows the main power and frequency attributes of the Intel Core i5 quad core processor. The values of the constant power \(W_c\), the active power \(W_a\) and the power of the processor with 1, 2, 3 and 4 active cores were obtained experimentally with the Intel Power Gadget tool. Column W(1) is the power of the processor with one active core [Eqs. (6) and (7)], and columns W(p), with \(p\in \{2,3,4\}\), give the power of the processor with two, three and four active cores [Eqs. (4) and (5)].

Table 5 Frequency and electric power for Intel Core i5 quad core with Turbo Boost off and on

Figure 5 and Table 6 show the execution time behavior of double-precision matrix multiplication for different sizes of square matrices. In Fig. 5, the solid lines show the behavior of the experiment running on one, two, three and four cores with Turbo Boost off, while the dashed lines show the behavior on one, two, three and four cores with Turbo Boost on. In the figure and in the table, it can be seen that the execution on one, two, three and four cores takes less time when the frequency is increased. The process that takes the most time is the sequential one with Turbo Boost off, and the process that takes the least time is the one that uses four cores with Turbo Boost on. As in the case of the dual core processor, this behavior is predicted by Eqs. (8), (9), (13) and (14), where \(H = 2.8/2.7 \approx 1.037 > 1\).

Fig. 5 Double-precision matrix multiplication (dpMM) time performance

Table 6 Speedup on Intel Core i5 quad core processor

Table 6 shows the experimental speedup and the model speedup corresponding to Eq. (15). The columns show the speedup with one, two, three and four cores with Turbo Boost on and off, for both the experiments and the model. As can be seen in the speedup columns for two, three and four cores, the values obtained from the model are very similar to those obtained in the experiments. As in the case of the dual core processor, the traditional Amdahl’s law (10) is satisfied when Turbo Boost is off, but when Turbo Boost is on, the upper limit of Amdahl’s law (12) is exceeded, as indicated by inequality (18). This upper limit is pH; since \(H\approx 1.037\), the upper limit of the speedup is 2.074x for two cores, 3.11x for three cores and 4.148x for four cores. This behavior can be appreciated from the experimental speedup values, which remain below this upper limit.

Fig. 6 Double-precision matrix multiplication (dpMM) energy behavior on Intel Core i5 quad core

Figure 6 and Table 7 show the energy consumption and energy speedup behavior of matrix multiplication. Figure 6 shows the energy consumption of double-precision matrix multiplication for different matrix sizes on one, two, three and four cores. Figure 6a shows the energy consumption measured in the experiments. Figure 6b shows the energy consumption given by the model, as obtained from Eqs. (22) and (24), with the processor data in Table 5 and the sequential time with Turbo Boost off. The solid lines correspond to the energy consumption of matrix multiplication when Turbo Boost is off and the dashed lines to the same experiment when Turbo Boost is on. From Fig. 6a, b it is observed that the behavior predicted by the model corresponds to that of the experiments. As in the case of the dual core processor, Fig. 6 shows that the process consumes more energy when Turbo Boost is on. However, the execution that uses all cores consumes less energy than the sequential execution (one thread). Moreover, the program that consumes the most energy is the sequential one with Turbo Boost on, while the process that consumes the least energy is the one that uses all the available cores with Turbo Boost off.

Table 7 Energetic speedup model and experimental for Intel Core i5 quad core processor

Table 7 shows the energetic speedup of double-precision matrix multiplication, for both the experiments and the model, with Turbo Boost on and off. The values in this table are for experiments with several matrix sizes and for the model given by Eqs. (25) and (27), using the power values in Table 5. It should be noted that, although there is an error between the model values and the experimental values, the model does capture the qualitative behavior of the energy for the cases in which Turbo Boost is off and on. Unlike the dual core case, where the maximum energy speedup is approximately one, we find that when all the cores of the quad core processor are used, the speedup is close to 2x, both with Turbo Boost on and off. Thus, when we use the multicore processor at its maximum computational capacity with Turbo Boost on, we save time (more than 4x speedup) and spend half of the energy consumed by the sequential program with Turbo Boost off.

4.3 Xeon eight-core experiments

Table 8 shows the main power and frequency attributes of the Intel Xeon eight-core processor. The values of the constant power \(W_c\) and of the power of the processor with 1 to 8 active cores were obtained experimentally with the Intel Power Gadget tool. Column W(1) is the power of the processor with one active core [Eqs. (6) and (7)], and columns W(p), with \(p\in \{2,\ldots ,8\}\), give the power of the processor with two to eight active cores.

Table 8 Frequency and electric power for Intel Xeon eight-core with Turbo Boost off and on

For this case, the H element of Eq. (17) is \(H=3.6\,GHz/3.4\,GHz=1.058823\). For visualization purposes, the tables and figures that describe the experimental and model behavior use 1, 2, 4, 6 and 8 threads.

Figure 7 shows the experimental and the model time behavior, corresponding to Eq. (15), for double-precision matrix multiplication. The model reproduces the time behavior observed in the experiments.

Fig. 7 Double-precision matrix multiplication (dpMM) performance on Intel Xeon eight-core

Table 9 compares the speedup model from inequalities (10) and (17) with the speedup obtained in the experiments. The case with Turbo Boost off corresponds to the traditional Amdahl’s law.

Table 9 Speedup experimental and model for Intel Xeon eight-core processor

Figure 8 shows the experimental and the model energy behavior, corresponding to Eqs. (22) and (24), for double-precision matrix multiplication. The model reproduces the energy consumption behavior observed in the experiments.

Fig. 8 Double-precision matrix multiplication (dpMM) energy behavior on Intel Xeon eight-core

Table 10 compares the energy speedup model from inequalities (25) and (27) with the speedup obtained in the experiments. It can be appreciated that as more cores are used, more energy is saved. The worst energy consumption occurs when a single core is used with Turbo Boost on. The maximum energy speedup is 4.8x and is obtained when eight cores are used with Turbo Boost off. The multicore processor saves the most energy when all cores are used.

Table 10 Energetic speedup experimental and model for Intel Xeon eight-core processor

Table 11 shows the values obtained with the Verner model [20]. We note that this model coincides with ours in the case of Turbo Boost off, and that, when comparing the two models for Turbo Boost on, our model is closer to the experiments (Table 10).

Table 11 Energy speedup on Intel Xeon eight-core processor with Verner Model

5 Conclusions

It is possible to obtain extensions of Amdahl’s law to study the performance and energy behavior of parallel programs running on Intel processors with Turbo Boost technology. The model presented in this work can help to understand the behavior of parallel programs on processors with Turbo Boost. When frequency changes in the processing units are considered, the rate of change of the frequency must be taken into account, as given by Eq. (17). To extend Amdahl’s law to energy consumption behavior, it is necessary to establish the power model of the processor when the frequency change is active and when it is not [Eqs. (4) and (5)]. By combining the model extensions for time and power, it is possible to extend Amdahl’s law for energy, considering constant or variable frequency [Eqs. (25) and (27)].

Several experiments were carried out on platforms with Intel Core i5 processors of two and four cores and an Intel Xeon eight-core processor. These processors allow frequency change to be enabled or disabled through the Turbo Boost technology. The operation considered was double-precision matrix multiplication for different matrix sizes, in order to stress the processor. The advantage of this test is that it is highly parallelizable and the values of f given in Eq. (11) are very close to zero, which allows studying the behavior of a highly parallelizable task.

The experiments made it possible to verify that the model reproduces the qualitative behavior in time and energy of parallel programs running on processors that allow frequency change. In addition, the experiments show that sequential programs consume the most energy when the processor has frequency change enabled, and that the parallel programs that consume the least energy are those that use the maximum number of cores available in the processor, in accordance with the model. Parallel programs consume less energy when the processor frequency is not changed than when the processor increases its frequency. From Figs. 4 and 6, it is possible to observe upper and lower limits of the energy consumption: the programs that consume the most energy are the sequential ones with Turbo Boost on, and the programs that consume the least energy are those with Turbo Boost off that use the maximum number of cores available in the processor. However, when all available cores are used, the energy difference between the Turbo Boost off and Turbo Boost on cases is very small.

As expected, it was validated that at higher frequencies the processor cores operate at higher power, as noted in Tables 2, 5 and 8. However, the speedup analyses show that using more cores decreases both the execution time and the energy consumption. In fact, when all the processor cores are used, the energy consumption is almost the same with Turbo Boost on and off.

The power model presented in this work assumes linear behavior; nevertheless, it is sufficient to derive the formulas for the energy speedup, and the experiments show that it agrees well with the experimental results. In the future, a nonlinear correction could be applied to the power model to better predict the energy consumption behavior, taking into account energy proportionality models.

The experimental results validate the model for architectures with Intel Turbo Boost technology. In future work, tests will be done for AMD’s Turbo Core technology and for ARM processors.