1 Introduction

Power consumption of servers can be acquired through different approaches, e.g., by using external devices such as energy sensors and energy meters, or by modeling the power consumption with the help of performance counters and external devices, as in Mantis [7]. However, mounting energy sensors or watt meters, or otherwise instrumenting the systems with energy meters, is not only expensive but also hinders the normal operation of the data center [16].

The introduction of the running average power limit (RAPL) interface [12] in recent Intel processors has mitigated this problem. RAPL provides model-specific registers (MSRs) from which the processor's energy consumption can be read in real time. It provides energy readings for four domains: package (both cores and uncore, e.g., the last-level cache), pp0 (all cores in one package), pp1 (a specific device in the uncore, e.g., the on-chip graphics processing unit, GPU), and the dynamic random access memory (DRAM) plane. With RAPL, we can obtain fine-grained and reliable energy measurements without custom-instrumenting the hardware [9].
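RAPL exposes cumulative energy counters rather than instantaneous power, so power is obtained by differencing two readings over a time interval and handling counter wraparound. The sketch below illustrates this on Linux through the powercap sysfs interface instead of raw MSRs or Likwid (which the paper uses); the path `/sys/class/powercap/intel-rapl:0` (package domain of socket 0), the helper names, and the 1 s interval are assumptions for illustration. On recent kernels, reading `energy_uj` typically requires root.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL domain under the Linux powercap driver.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name: str) -> int:
    """Read an integer microjoule value from the (assumed) RAPL sysfs directory."""
    return int((RAPL_DIR / name).read_text().strip())


def power_from_counters(e0_uj: int, e1_uj: int, dt_s: float, max_range_uj: int) -> float:
    """Average power in watts between two cumulative energy readings.

    The counter wraps back to zero after max_range_uj, so a smaller second
    reading is treated as exactly one wrap during the interval.
    """
    delta = e1_uj - e0_uj
    if delta < 0:
        delta += max_range_uj
    return delta / 1e6 / dt_s


def sample_package_power(interval_s: float = 1.0) -> float:
    """Sample package power once over interval_s seconds (needs RAPL access)."""
    max_range = read_uj("max_energy_range_uj")
    e0, t0 = read_uj("energy_uj"), time.monotonic()
    time.sleep(interval_s)
    e1, t1 = read_uj("energy_uj"), time.monotonic()
    return power_from_counters(e0, e1, t1 - t0, max_range)
```

On a machine with the powercap RAPL driver loaded, `sample_package_power()` returns the average package power over the interval; `power_from_counters` is kept free of I/O so the wraparound logic can be checked without RAPL hardware.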

The introduction of RAPL has enabled multiple new opportunities, e.g., measuring the energy consumption of short code paths [10] and power limiting and capping for main memory [6]. Furthermore, RAPL has been extensively studied and incorporated into different tools for fine-grained energy measurements of computing systems [9, 11]. Unlike those works, this paper takes a different angle on RAPL: we leverage RAPL readings to model the power drawn from the wall socket.

Knowing the wall socket power draw is beneficial. It helps to determine the exact energy spending and thus to allocate a proper energy budget for the system. In addition, it helps to set power limits properly so as to best exploit pricing variations (e.g., setting a high power limit when the hourly electricity price is low, and vice versa) [1]. Using RAPL readings to estimate wall socket power brings several advantages: (1) it promises minimal interruption of the regular operation of data centers; (2) it is easy to deploy because it does not require any external sensors or energy meters to be mounted on the system.

We make the following major contributions in this paper:

  1. Through a set of experiments performed on three different Intel-based systems, we demonstrate that there exists a strong linear relationship between wall power and package power.

  2. Based on this observation, we apply machine learning techniques to formulate a power model that predicts wall power from RAPL package power when arbitrary workloads are running on the system.

  3. Experimental results demonstrate the good accuracy of the model, i.e., it achieves a maximum error rate of 5.6 % across different processor frequencies.

The rest of the paper is structured as follows. Section 2 discusses the related work, while Sect. 3 describes the experimental setup and benchmark specifications. Section 4 presents the experimental results from our empirical study, and Sect. 5 describes the model formulation. Section 6 discusses the use case of our model and Sect. 7 concludes the paper.

2 Related work

Prior to the introduction of RAPL, power modeling techniques usually relied on carefully designed software monitors and hardware measurement tools, or leveraged machine learning techniques such as stochastic power models. McCullough et al. [15] presented an extensive evaluation of such techniques. They observed that such models perform well for single- and multi-core scenarios but poorly for subsystem power modeling, due to increased system complexity and hidden power states that are not exposed to the OS. RAPL mitigates the hidden-power-state problem because it now exposes energy readings and power states to the OS. Hackenberg et al. [8] presented a comprehensive overview of different power consumption measurement methodologies using RAPL. Venkatesh et al. [21] proposed a shared-memory, window-based solution that uses RAPL to model the energy consumed by processes engaged in message passing interface (MPI) operations. Balaji et al. [20] investigated the efficacy of RAPL in achieving energy proportionality for the SPECpower benchmark. Similar to our work, Castano et al. [5] presented a model for full system instantaneous power dissipation using energy consumption information from a subset of advanced technology extended (ATX) lines. In addition, considerable work has been directed towards modeling the full system power consumption of server-based systems [4, 7, 10, 17].

There are several differences between these works and our approach. First, most power modeling techniques formulate power models using a bottom-up approach, i.e., they model the power consumption of the individual components and then sum them to obtain the full system power. Such approaches tend to accumulate the component modeling errors in the full system estimate, and the error rate can usually reach up to 10 %, while our approach achieves better accuracy (at most 5.6 % error). Second, approaches that achieve low error rates usually instrument the system board with external metering tools, while our approach avoids using such tools during the regular operation of the system.

Table 1 System specifications

3 Experimental setup and benchmark specifications

We choose three different Intel-based systems to conduct the experiments; Table 1 lists their specifications. For brevity, in the rest of the paper we refer to them as Machine 1, Machine 2, and Machine 3, respectively. Machines 1 and 2 are workstation-grade, and Machine 3 is a server-grade machine. To cover different aspects of the systems, we use 16 workloads in total (cf. Table 2), which cover CPU-, memory-, and network-intensive tasks, high energy physics (HEP) workloads, and non-trivial applications.

Stress-ng [19] was originally designed to stress different subsystems of a computer. In our work, we use Stress-ng to load the CPU cores at 100 % by running sqrt() operations on pseudo-random values.

Stream, by McCalpin [14], is a well-known benchmark designed to measure sustainable memory bandwidth. Using Stream helps us understand the power consumption characteristics of the different systems when running a memory-intensive task.

ParFullCMS is a Geant4 [2] benchmark, i.e., a multi-threaded high energy physics workload. It employs a complex geometry in its simulation and exhibits properties similar to those of the Compact Muon Solenoid (CMS) experiment at CERN.

Parsec is not a synthetic benchmark suite; hence, it provides opportunities to test power models with a diverse instruction mix, memory accesses, and network operations [3]. It includes emerging applications from different domains, e.g., finance, computer vision, and deduplication. In Table 2, the 13 benchmarks starting from blackscholes all belong to the Parsec benchmark suite.

To measure the power consumption at the wall, we deploy a Plugwise smart plug [18]. The RAPL measurements are obtained with the latest stable version of the Likwid tool suite [13].

Table 2 Benchmark specifications

4 Modeling wall power consumption from RAPL

In this section, we present the experimental results and discuss the empirical analysis that underpins the modeling technique.

Fig. 1 Experimental results of Machine 1 - Stress-ng

4.1 Stress-ng and Stream benchmarks observations

Figure 1a presents the processor package power, DRAM power, and wall power consumption as the CPU frequency varies for the CPU-intensive Stress-ng workload. All three systems presented in Table 1 offer 15 distinct frequencies from which the operating system chooses under dynamic voltage and frequency scaling (DVFS). For these experiments, we pin the frequency at a specific value to acquire the exact power consumption of the system at that frequency. We plot the wall power consumption against the package power in Fig. 1b. The linear fit in this figure suggests a near-exact linear relationship between package power and wall power; in fact, the correlation coefficient between these two quantities is 0.999. For the Stress-ng benchmark, the linear equation obtained from the fit is:

$$\begin{aligned} P_{wall}= 1.214 * P_{package} + 20.221 \end{aligned}$$
(1)

where \(P_{package}\) represents the package power as measured with RAPL and \(P_{wall}\) represents the wall power. For Machine 1, we perform similar experiments with the Stream benchmark, and the observations are similar. In this case, the correlation between package power and wall power consumption is again 0.999, and the linear equation is presented in Eq. 2.

$$\begin{aligned} P_{wall}= 1.212 * P_{package} + 25.580 \end{aligned}$$
(2)

For Machines 1 and 2, the idle wall power consumption is 24.65 W and 33.97 W, respectively. We observe that the constant terms of the linear Eqs. 1 and 2 (20.221 W and 25.580 W) are dominated by the idle wall power consumption. In general, the observations for Machine 2 follow the same trends as those for Machine 1; thus, for brevity, we do not present the results from Machine 2 in this paper.
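The fits in Eqs. 1 and 2 are ordinary least-squares lines, and the reported 0.999 values are correlation coefficients. A minimal sketch of both computations with NumPy follows; the input arrays are synthetic values generated from Eq. 1 plus small noise (15 points, one per pinned frequency, over an assumed 10 to 60 W package range), not the paper's measurements, and the helper name is hypothetical.

```python
import numpy as np


def fit_wall_vs_package(p_pkg, p_wall):
    """Least-squares line P_wall = slope * P_package + intercept, plus Pearson r."""
    p_pkg = np.asarray(p_pkg, dtype=float)
    p_wall = np.asarray(p_wall, dtype=float)
    slope, intercept = np.polyfit(p_pkg, p_wall, deg=1)  # highest degree first
    r = np.corrcoef(p_pkg, p_wall)[0, 1]                 # Pearson correlation
    return slope, intercept, r


# Synthetic illustration only: 15 package-power levels mapped through Eq. 1
# with small Gaussian noise added. These are NOT measured values.
rng = np.random.default_rng(0)
pkg = np.linspace(10.0, 60.0, 15)
wall = 1.214 * pkg + 20.221 + rng.normal(0.0, 0.3, pkg.size)

slope, intercept, r = fit_wall_vs_package(pkg, wall)
print(f"P_wall = {slope:.3f} * P_package + {intercept:.3f}  (r = {r:.4f})")
```

With real data, `pkg` and `wall` would be the time-aligned RAPL and smart-plug readings for each pinned frequency.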

For the server-grade Machine 3, the same set of experiments also exhibits a linear relationship between package power and wall power consumption. Observations for the Stress-ng benchmark are presented in Fig. 2. For Machine 3, the correlation between package power and wall power is 0.999 for both the Stress-ng and Stream benchmarks. The linear equations for Machine 3 are presented in Eqs. 3 and 4 for the Stress-ng and Stream workloads, respectively.

$$\begin{aligned} P_{wall}= 1.327 * P_{package} + 97.732 \end{aligned}$$
(3)
$$\begin{aligned} P_{wall}= 1.330 * P_{package} + 116.70 \end{aligned}$$
(4)
Fig. 2 Experimental results of Machine 3 - Stress-ng

Although Machine 3 is a server-grade machine with two processor sockets, the relationship between package power and wall power remains linear.

Fig. 3 Wall and package power consumption over time - ParFullCMS - Machine 1

Fig. 4 Package power vs. wall power - Parsec

4.2 ParFullCMS and Parsec benchmarks observations

Figure 3 presents a detailed view of the wall power and package power consumption over time (in seconds) as we run ParFullCMS at different processor frequencies. It shows that the multithreaded ParFullCMS has two distinct phases: an initialization phase and a compute-intensive phase. The initialization phase sets up the events to be processed and consumes relatively little power, whereas the compute-intensive phase performs the bulk of the simulation tasks and consumes relatively more power. The figure shows that, irrespective of the processor frequency, package power and wall power correlate strongly and follow the same pattern. In this case, the correlation coefficient is 0.997, and the linear equation is presented in Eq. 5.

$$\begin{aligned} P_{wall}= 1.140 * P_{package} + 21.40 \end{aligned}$$
(5)

Figure 4 presents package power vs. wall power consumption for a subset of the Parsec benchmarks running on Machine 1. Because of space limitations, Fig. 4 only includes the observations from canneal, fluidanimate, ferret, facesim, netferret, and netstreamcluster; the observations from the remaining Parsec benchmarks are similar. We run the Parsec benchmarks with the native input size (the largest input set) and make sure that the number of threads is sufficient to keep all physical cores active. The Parsec results (Fig. 4) show that the strong correlation between wall power and package power holds for non-trivial applications.

The observations obtained from running the ParFullCMS and Parsec benchmarks confirm that, irrespective of the type of workload, a careful formulation of power models can yield the full system power consumption from RAPL package power. Our experiments show that package power consumption almost always remains strongly correlated with full system power consumption, even when the non-CPU power draw of a machine is not constant and when different components of the system exhibit dynamic behaviour with multiple phases.

5 Wall power modeling

As shown in the previous sections, the linear model fits excellently, but the coefficients vary across workloads. Thus, we need an approach to calibrate the model for any arbitrary workload. In this section, we develop a generic power model for wall power consumption using machine learning techniques. For brevity, in the rest of the paper we present results for Machine 1 only; the other machines demonstrate similar trends.

5.1 Model calibration

We use the Stream, Stress-ng, and Parsec benchmark data for training and validating the wall power consumption model, and then use the ParFullCMS data to test the accuracy of the model.

We formulate the model using the general least squares solution. Assuming that we have a training set of N observations of package power and wall power pairs (pkg, wall), our aim is to obtain a function \(f: \mathbb {R} \rightarrow \mathbb {R}\) (which calculates wall power from package power) such that the average squared error E is minimized. E is defined as

$$\begin{aligned} E = \frac{1}{N}\sum _{t=1}^N(wall_t - f(pkg_t))^2. \end{aligned}$$
(6)

Let us consider basis functions \(y_i: \mathbb {R} \rightarrow \mathbb {R}\). In general terms, f(pkg) can be defined as

$$\begin{aligned} f(pkg) = \sum _{i=0}^{k-1} a_i \cdot y_i(pkg) \end{aligned}$$
(7)

where \(A = (a_0,\ldots ,a_{k-1})\) is a \(1 \times k\) vector of coefficients. Our aim is to find the optimal f(pkg) that minimizes E. E is minimized when all of its partial derivatives with respect to the coefficients \(a_j\) are zero, which gives

$$\begin{aligned} \frac{\partial E}{\partial a_j} = -\frac{2}{N}\sum _{t=1}^N\left( wall_t - \sum _{i=0}^{k-1} a_i \cdot y_i(pkg_t)\right) y_j(pkg_t) = 0. \end{aligned}$$
(8)

If we simplify Eq. 8, we can get

$$\begin{aligned} \sum _{t=1}^N wall_t y_j(pkg_t)= & {} \sum _{t=1}^N\sum _{i=0}^{k-1} a_i \cdot y_i(pkg_t)y_j(pkg_t) \end{aligned}$$
(9)

where j ranges from 0 to \(k-1\). In matrix notation, Eq. 9 reads \(WY^T = AYY^T\), and solving for A yields

$$\begin{aligned} A = WY^T (YY^T)^{-1}. \end{aligned}$$
(10)

In Eq. 10, Y is a \(k \times N\) matrix with \(Y_{i,t}= y_i(pkg_t)\), and \(W = (wall_1,\ldots ,wall_N)\) is a \(1 \times N\) vector. We define the basis functions as \(y_i(pkg)=pkg^i\), which means that f is a polynomial with k terms (i.e., of degree \(k-1\)) and A becomes the vector of coefficients obtained from polynomial regression.
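Eq. 10 can be implemented directly with NumPy. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and it solves the normal equations \(AYY^T = WY^T\) with `np.linalg.solve` instead of forming \((YY^T)^{-1}\) explicitly, which gives the same A but is numerically safer.

```python
import numpy as np


def fit_polynomial_eq10(pkg, wall, k):
    """Solve Eq. 10, A = W Y^T (Y Y^T)^{-1}, with basis y_i(pkg) = pkg**i.

    pkg, wall: length-N sequences of package and wall power samples.
    k: number of basis functions (polynomial of degree k - 1).
    Returns A = (a_0, ..., a_{k-1}) so that f(pkg) = sum_i a_i * pkg**i.
    """
    pkg = np.asarray(pkg, dtype=float)
    W = np.asarray(wall, dtype=float)            # the 1 x N row vector W
    Y = np.vstack([pkg**i for i in range(k)])    # k x N, Y[i, t] = y_i(pkg_t)
    G = Y @ Y.T                                  # k x k matrix Y Y^T
    b = W @ Y.T                                  # length-k vector W Y^T
    # A G = b and G is symmetric, so G A^T = b^T: solve without inverting G.
    return np.linalg.solve(G, b)


def predict(A, pkg):
    """Evaluate f(pkg) = sum_i a_i * pkg**i (Eq. 7)."""
    pkg = np.asarray(pkg, dtype=float)
    return sum(a * pkg**i for i, a in enumerate(A))
```

For example, fitting `k=2` to points lying exactly on Eq. 11 recovers `A` close to `(22.084, 1.227)`. With raw powers of pkg, \(YY^T\) becomes poorly conditioned as k grows; scaling pkg or calling `np.linalg.lstsq` on \(Y^T\) is a common safeguard for higher orders.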

We solve Eq. 10 for different polynomial orders k, where k varies from 1 to 4. The previous sections provide the inductive bias of our learning algorithm: a linear relation exists between RAPL package power and wall power. However, to test whether this assumption generalizes, we fit polynomials of different orders to our data set and check whether the linear relationship still holds for a diverse workload mix in comparison to higher-order polynomials. Once we formulate the model, we calculate \(E_{T}\), \(E_{V}\), \(E_{T+V}\), and \(E_{Test}\), which represent the average squared training error, validation error, training+validation error, and test error, respectively. The values are presented in Table 3. \(E_{V}\) and \(E_{T+V}\) (Table 3) measure the error on the validation set and on the total training set; the lower these values, the better the model performs.
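The order-selection procedure can be sketched as follows. Each candidate uses k basis functions, i.e., a polynomial of degree \(k-1\) following Eq. 7, fitted by least squares with `np.polyfit` (equivalent to solving Eq. 10), and each split is scored with the average squared error of Eq. 6. The synthetic data, split sizes, fitting on the training split only, and helper names are assumptions for illustration; in the paper, the test split is the ParFullCMS data.

```python
import numpy as np


def mse(y, y_hat):
    """Average squared error E from Eq. 6."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))


def compare_orders(train, val, test, ks=(1, 2, 3, 4)):
    """Fit a polynomial with k terms (degree k - 1) on `train` for each k and
    return E_T, E_V, E_T+V and E_Test. Each split is a (pkg, wall) pair."""
    tv_pkg = np.concatenate([train[0], val[0]])
    tv_wall = np.concatenate([train[1], val[1]])
    results = {}
    for k in ks:
        f = np.poly1d(np.polyfit(train[0], train[1], deg=k - 1))
        results[k] = {
            "E_T": mse(train[1], f(train[0])),
            "E_V": mse(val[1], f(val[0])),
            "E_T+V": mse(tv_wall, f(tv_pkg)),
            "E_Test": mse(test[1], f(test[0])),
        }
    return results


# Synthetic (pkg, wall) samples around Eq. 11 with noise; illustrative only.
rng = np.random.default_rng(1)


def synth(n):
    pkg = rng.uniform(10.0, 60.0, n)
    return pkg, 1.227 * pkg + 22.084 + rng.normal(0.0, 0.5, n)


train, val, test = synth(40), synth(20), synth(20)
for k, errs in compare_orders(train, val, test).items():
    print(k, {name: round(e, 3) for name, e in errs.items()})
```

Training error can only shrink as k grows, so the choice of order is driven by \(E_{V}\) and \(E_{Test}\) rather than \(E_{T}\).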

Table 3 Polynomial regression results

As Table 3 shows, linear regression performs best on both the training data and the test data. The obtained power model for Machine 1 is

$$\begin{aligned} P_{wall}= 1.227 * P_{package} + 22.084 \end{aligned}$$
(11)

Equation 11 predicts the wall power consumption for the ParFullCMS benchmark with an average square error rate of 5.59 % across different processor frequencies on Machine 1.

Our model requires a one-time run of the benchmarks on each machine with external AC power measurement equipment connected. Once we acquire the wall power and the corresponding package power data for a specific machine, we perform the training and validation stage, which yields a power model. The power model is then tested on arbitrary applications to measure its accuracy. The model can predict the wall power consumption of any workload running on the machine without external power meters, with promising accuracy. If an external power meter becomes available for the machine at a later point in time, the evaluation stage can compare the predicted data with the measured data; this feedback can be fed into the training and validation stage to fine-tune and recalibrate the model for better prediction.
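The one-time calibration and later recalibration workflow described above might be organized as in the following sketch. The class and method names are hypothetical; it assumes the linear form of Eq. 11 and simply refits by least squares on all metered samples collected so far whenever new meter data arrive, which is one plausible way to realize the feedback loop, not necessarily the authors'.

```python
import numpy as np


class WallPowerModel:
    """Hypothetical helper: linear wall-power model calibrated against a meter."""

    def __init__(self):
        self.pkg = np.empty(0)    # metered RAPL package power samples (W)
        self.wall = np.empty(0)   # matching wall power samples (W)
        self.slope = self.intercept = None

    def calibrate(self, pkg, wall):
        """Add metered (package, wall) samples and refit the line on all of them."""
        self.pkg = np.concatenate([self.pkg, np.asarray(pkg, dtype=float)])
        self.wall = np.concatenate([self.wall, np.asarray(wall, dtype=float)])
        self.slope, self.intercept = np.polyfit(self.pkg, self.wall, deg=1)
        return self

    def predict(self, pkg):
        """Estimate wall power from RAPL package power alone (no meter needed)."""
        if self.slope is None:
            raise RuntimeError("model has not been calibrated")
        return self.slope * np.asarray(pkg, dtype=float) + self.intercept
```

In this sketch, `WallPowerModel().calibrate(pkg, wall)` stands for the one-time metered run, `predict` then needs only RAPL readings, and a later `calibrate` call with fresh meter samples updates the coefficients.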

6 Discussion

The proposed model can predict wall power from RAPL package power for any workload in which the CPU performs the bulk of the system operations. RAPL measurements are not sufficient to estimate wall power consumption when components other than the CPU perform the bulk of the system operations, for example, when a file server with multiple disks performs a disk-intensive task, or when a server with a discrete (non-integrated) GPU executes the processing on the GPU rather than on the CPU. For the former example, the wall power consumption can be estimated using the following equation.

$$\begin{aligned} P_t = P_i + P_{RAPL} + P_{disk} \end{aligned}$$
(12)

where \(P_t\) is the wall power consumption at time t, \(P_i\) is the wall power consumption when the system is idle, \(P_{RAPL}\) is the difference in RAPL-domain power consumption between operating mode and idle mode, and \(P_{disk}\) is the difference in power consumption of the disk drive between operating mode and idle mode, which can be obtained from its specification datasheet.
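Eq. 12 reduces to simple arithmetic once the idle baseline and the two deltas are known. The sketch below assumes that the per-disk active and idle power come from the drive's datasheet and that all disks behave identically; the `n_disks` factor is an assumption added for the multi-disk file server example, and the numbers in the usage line are hypothetical placeholders, not measurements.

```python
def estimate_wall_power(p_idle_wall, p_rapl_active, p_rapl_idle,
                        p_disk_active, p_disk_idle, n_disks=1):
    """Eq. 12: P_t = P_i + P_RAPL + P_disk (all values in watts).

    P_RAPL and P_disk are increases over idle; the disk figures are per drive
    (e.g., from a datasheet) and are scaled by the hypothetical n_disks.
    """
    p_rapl = p_rapl_active - p_rapl_idle
    p_disk = n_disks * (p_disk_active - p_disk_idle)
    return p_idle_wall + p_rapl + p_disk


# Hypothetical placeholder values, not measurements.
print(estimate_wall_power(100.0, 60.0, 20.0, 7.0, 4.0, n_disks=4))  # 152.0
```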

Note that although we focus only on Intel processors that support RAPL, our method itself is not limited to RAPL, because it only needs power consumption data for the components of a computing system (specifically, CPU package and DRAM power). As such, similar models can be developed for AMD or ARM processors as well.

7 Conclusion

In this paper, we present an empirical study of wall socket power consumption and propose a power model that predicts wall power from RAPL package power. The proposed power model can predict full system power for any workload with only a one-time calibration against an external power meter. Experimental results demonstrate that our model predicts wall power consumption with a worst-case error rate of 5.6 %. Our findings suggest that a careful formulation of the linear coefficients yields a useful and efficient model for predicting wall socket power consumption from RAPL package power. We plan to extend our work by evaluating the power draw characteristics of different processor architectures (AMD, ARM, etc.) and by fine-tuning the power model with diverse workloads.