1 Introduction

In recent years, the wide adoption of Information and Communication Technologies (ICT) and the exponential growth of Internet users have significantly contributed to the increase of the world energy consumption [1, 2], and the impact of the digital economy is expected to increase even more over the next years [1, 3]. Even if ICT is actually helping other sectors of the economy to reduce their environmental impact, the energy consumption of the ICT sector itself cannot be neglected.

The main strategy of the research effort put so far in ICT energy issues is the reduction of energy consumption, with the constraint of guaranteeing the same quality of service. In addition, partially replacing polluting energy by installing green energy plants close to big consumers is also part of the solution to the problem.

In Cloud computing, data centers are well known for being particularly energy hungry. Electricity consumed by global data centers is estimated to be between 1.1 and 1.5% of total electricity use [4]. Typically, data centers are rather inefficient and consume more energy than required [5], leaving room for improvement achievable through intelligent management techniques.

Breaking down the energy consumption of data centers into its components, as shown in Fig. 1 [6], we observe that about 52% of the energy is consumed by computing equipment, while the remaining 48% goes to power equipment and cooling systems.

Fig. 1 Data center energy consumption breakdown

One reason for this energy inefficiency is the underutilization of servers, whose consumption is not proportional to computing load. Statistics show that average server utilization in data centers is around 30% [7], because capacity is over-provisioned for worst-case scenarios in order to ensure high levels of reliability [8].

To address this problem and improve the power efficiency of Cloud systems, VM (virtual machine) migration has been proposed and has shown great potential. Migrating a VM consists of changing its physical host without service interruption. This can be done for different purposes, such as fault resilience or system maintenance. In power management strategies, it can be used to move services running on a large set of underutilized servers to a smaller set of optimally loaded servers, so that the others can be switched off to save power.

Even if VM migration also consumes energy, it has been shown that it is more effective than leaving underutilized servers switched on [9, 10].

Another important component that consumes energy in Cloud services is the communication network. Networks are also typically provisioned for worst-case scenarios such as traffic bursts and busy-hour load. In practice, the network typically has a large capacity margin even with respect to peak load, for service quality and robustness reasons, and therefore usually wastes a lot of energy [11]. In the internal data center network, the main energy consumers are Ethernet switches that are hierarchically interconnected. In the external network based on IP technology, the core network routers dominate energy consumption [12]. The relative contribution to energy consumption of core router components is shown in Fig. 2 [12]. IP networks typically operate at less than 50% utilization, while still consuming almost 100% of maximum power due to an almost flat energy profile (consumption versus load) [11]. To make network devices consume less energy, two main approaches are used: turning off nodes or scaling down their performance [13, 14].

Fig. 2 Breakdown of power consumed by a core router

Most previous work on energy efficiency in Cloud systems has focused on managing computing and networking components separately. However, optimizing the energy consumption of data centers and their network independently may be significantly inefficient, in particular when dynamic resource management schemes like VM migration are considered. Considering the energy consumption of data center servers only may cause traffic congestion and degrade the quality of Cloud services offered to end users, as well as decrease the energy efficiency of the network. On the network side, energy saving techniques are based on estimations of the traffic matrices over time, and if data centers are not considered, large traffic variations due to decisions taken by dynamic resource managers can cause energy waste. Integrating techniques for managing the energy consumption of computing and networking components in a new generation of Cloud systems can potentially provide non-negligible efficiency gains.

A key aspect that makes some level of integration in services offered by data centers and networks particularly important is the geographic distribution of Cloud systems. Distributing data centers over different locations brings Cloud services closer to end users, and offers the opportunity to better exploit the variation of energy prices in different locations and time zones, as well as the efficient use of the green energy that is locally generated.

Although a few existing works in the literature have investigated the impact of joint optimization solutions for energy saving in Cloud systems [15,16,17], the effectiveness of such solutions from an energy cost point of view and their contribution to reducing environmental impact through the use of green energy remain open issues that have motivated our work (see Sect. 2 for more details).

In this paper, we present a holistic approach for jointly managing Cloud data centers and their networks. In the considered scenario, the Cloud system provides Platform as a Service (PaaS) to a variety of users, and data centers are distributed geographically in different locations and interconnected by a network. We propose an optimization model based on mixed integer linear programming (MILP), which has the goal of minimizing the Cloud energy cost and exploiting the availability of green energy sources in the different places where data centers are located. The proposed approach covers many aspects of Cloud computing including live migration of VMs, energy storage management, and green energy exploitation. In this model, we consider the energy consumption of both data-center servers and their interconnection network, and optimize the use of energy coming from the electrical grid, as well as the energy locally generated using renewable resources, also exploiting energy storage [18].

The rest of the paper is organized as follows. Section 2 presents related work on energy saving. Section 3 describes the proposed approach and introduces the model formulation. Section 4 presents the tests performed and the obtained results. Finally, concluding remarks are given in Sect. 5.

2 Related Work

As mentioned in the introduction, most of the existing works tackle the problem of energy efficiency in Cloud systems separating the management of data center servers and network nodes.

For the data centers, there is a large body of work on energy management of computing resources. We can categorize existing approaches into two classes: server consolidation with power state management and workload scheduling.

Server consolidation consists of efficiently using the available computing resources with a view to reducing the total number of active servers and thus saving energy by turning off unused ones. Entropy is a resource manager proposed in [19] that is based on constraint programming, and it is able to consolidate applications running on a number of underutilized servers to a smaller number of highly utilized servers using live migration of VMs. The adopted scheme does not take into account heterogeneity in application requirements and servers, which is rather common in multiple cloud provider environments. A similar approach able to cope with heterogeneous environments is pMapper [20], an application placement controller based on continuous optimization. For more complex environments, with a combination of service level agreements (SLAs), different power models and energy policies, a VM consolidation engine named Plug4Green has been proposed in [21].

The workload placement in modern data centers with a large number of servers significantly affects their operating temperature in addition to energy consumption. A smart placement using workload scheduling techniques may reduce cooling requirements and save even more energy. A good example of schemes based on this scenario is EnaCloud [22]. EnaCloud is an energy-aware heuristic-based approach that chooses the most appropriate scheme for dynamic application placement based on their arrivals, departures or resizing events. The approach in [23] proposes an integer linear programming (ILP) model that combines job allocation and VM migration.

None of the above solutions considers network requirements or the geographic distribution of data centers.

As for previous work on Cloud networks, available contributions are mainly focused on designing and operating the communication infrastructure in order to achieve fault tolerance, scalability, high utilization and cost efficiency [24, 25]. For energy efficiency, besides hardware improvements, many contributions related to protocols and network architecture aim at achieving a better trade-off between performance and energy consumption.

In [26], Green Route (G-Route), a service routing protocol for achieving energy efficiency and collaboration among cloud providers, is proposed. It is a routing scheme that creates autonomous energy-efficient paths between different providers before running a specific service. It has been implemented and tested on the Amazon EC2 cloud infrastructure, and has shown quite significant energy and cost savings per service request. A drawback of this approach is that it needs a trusted third party to control the energy profiling process. Other energy-aware routing solutions can be found in [27,28,29,30,31,32,33,34].

Switching off network nodes and rerouting traffic on other paths has a significant impact on saving energy. In [35] the authors proposed an integer linear programming model and some heuristic algorithms that minimize energy by finding the set of routers and links that must remain powered on for a given traffic level while switching off the others. This model is based on the knowledge of the traffic profile exchanged between source/destination nodes, and the maximum link utilization. Unlike most similar works where the objective is to minimize cost or to maximize performance, the authors minimize the total power consumption of the network. Other works that consider switching off network components and sleep mode for saving energy can be found in [36,37,38,39,40,41,42,43].

PCube [44] is an elastic data center scheme that conserves energy by adjusting the network topology and varying bandwidth availability based on traffic demand. It is designed to be able to dynamically adjust network structure depending on different traffic volumes, and to turn off a set of switches to save energy. Similar solutions are Bcube [45] and ElasticTree [46].

The above solutions focus on network only and do not consider the energy optimization of servers in data centers together with network nodes.

Relatively few papers consider joint management of data centers and network. In [16], the authors proposed an optimization approach to jointly minimize the energy consumption in data center hosts and network. The basic idea is to consider both VM placement and traffic routing for energy saving. To avoid the complexity of the problem, a unified representation method is proposed and the optimization model of VM placement is made similar to a routing problem; then the placement and routing problems are solved as a single one. A similar approach is PowerNetS [15], a power optimization strategy based on workload and traffic correlation analysis. The problem is formulated using constrained programming with the goal of consolidating VMs which are not positively correlated onto the same physical machines. At the same time, the model takes into account the network by consolidating VMs that are linked through traffic flows onto the same server or servers close to each other. In [47] the authors proposed an energy saving scheme for VM placement considering both physical servers and network resources. The problem is modeled as a combination of bin packing and quadratic assignment problems with multi-objective optimization and solved with a greedy algorithm that combines hierarchical clustering with a best-fit scheme. The above approaches focus on the network internal to the data center and consider neither geographically distributed Cloud systems nor the availability of green resources and energy storage.

The approach in [48] jointly minimizes the cost in big data processing based on three factors: task assignment, data placement, and data routing. The cost minimization problem is formulated as a mixed-integer nonlinear programming (MINLP) model, which is then linearized to make it tractable. This work considers geographically distributed data centers, but it does not explicitly model the energy consumption of the external network, it does not exploit the difference of energy prices in various locations, and it does not take into account green energy usage.

In [49], the authors jointly optimize three problems: VM placement, the distribution of requests, and data center resizing. The geo-distribution of data centers is explored by considering the variation in energy prices across different locations. The problem was formulated using MILP and solved using a two-phase heuristic. However, neither the energy consumption of the network nor the exploitation of green energy resources was included.

In this paper, we study the energy cost minimization problem of Cloud systems by managing data centers and network as a whole. Unlike existing works, where the focus is on only one aspect of Cloud computing such as VM placement/migration, we aggregate multiple Cloud computing aspects into one approach, and propose a holistic energy-aware solution for managing Cloud data centers and their interconnection network.

3 Global Green Cloud Management Framework

3.1 Problem Description

We consider a PaaS (Platform as a Service) scenario, where the provider operates on a virtualized infrastructure composed of multiple data centers distributed over different geographical locations. Each data center is equipped with thousands of physical servers. This scenario is rather common nowadays, even if the number of locations and servers varies with the size of the provider. For example, Google data centers are distributed among various locations in the world: 19 in the US, 12 in Europe, one in Russia, one in South America, and three in Asia [50], while the Amazon Web Services Cloud operates 42 Zones within 16 geographic Regions around the world [51].

Let \({\mathcal{I}}\) be the set of available data centers. We assume they are fully connected by a backbone network (mesh topology), where in each path between two data centers i and j, the number of routers and the available bandwidth capacity are known.

We assume that the Cloud Provider is able to host different user applications by offering a set \({\mathcal{L}}\) of heterogeneous types of VMs. Each type of VM executes a specific service application that is capable of serving a set \({\mathcal{K}}\) of user request classes, as shown in the system model (Fig. 3).

Fig. 3 Cloud system model

We consider a 1-day horizon, divided into 24 time periods, in which the duration of each time period is 1 h, and we solve the problem in advance for each day. We consider predictions of the application workload based on historical traffic information [52,53,54], thus, an estimation of the incoming traffic for each application is provided. We denote by \(\lambda_{ik}^{t}\) the arrival rate of requests of class \({k} \in {\mathcal{K}}\) to data center i at time \({t} \in {\mathcal{T}}\).

Based on the traffic profile, our goal is to minimize the total energy cost in the cloud system by allocating VMs to servers and if necessary, migrating them between data centers, considering the fact that migration itself costs energy on both source and destination. Depending on the location of a data center and its time zone (day/night), the price of energy varies. We exploit different energy regions by migrating VMs to data centers where the price of energy is cheaper. We also consider the availability of energy from renewable resources for reducing environmental impact of the cloud system. Therefore, migration of VMs to data centers with more available green energy is an opportunity that can be exploited to optimize costs. Basically, using VM migration we actually reduce the load of expensive and polluting data centers, while we exploit cheap and green energy when available.

We consider that VMs are live migrated between data centers (DCs) using the post-copy live migration scheme [55], in which the VM is suspended immediately upon beginning of the migration process. First, the CPU state is transferred to the destination DC, while the memory is still at the source DC. Then the destination DC requests faulted pages from the source DC, while the latter is transferring the memory state to the destination DC. Post-copy live migration decreases the migration traffic and therefore reduces the total migration time, since each VM memory page is transferred only once over the network, unlike in pre-copy live migration [56]. In pre-copy migration, memory pages are copied from the source DC to the destination DC without interrupting the execution of the virtual machine, which implies a succession of memory transfer iterations before stopping the VM execution on the source DC and starting it again at the destination DC.
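The difference in migration traffic between the two schemes can be illustrated with a toy model. The sketch below is not taken from [55] or [56]: it assumes a constant page-dirtying rate, a constant link bandwidth and a fixed number of pre-copy rounds, it ignores the small CPU state, and all function and parameter names are hypothetical.

```python
def post_copy_traffic_mb(memory_mb):
    """Post-copy: every memory page crosses the network exactly once."""
    return memory_mb


def pre_copy_traffic_mb(memory_mb, dirty_rate_mb_s, bandwidth_mb_s, rounds):
    """Pre-copy under the assumed model: each round resends the pages
    that were dirtied while the previous round was being transferred."""
    total = 0.0
    to_send = float(memory_mb)  # the first round copies the whole memory
    for _ in range(rounds):
        total += to_send
        round_time_s = to_send / bandwidth_mb_s
        # Pages dirtied during this round must be sent again
        # (never more than the whole memory).
        to_send = min(float(memory_mb), dirty_rate_mb_s * round_time_s)
    # Final stop-and-copy phase: the VM is paused and the remainder is sent.
    return total + to_send


# Assumed figures: 1000 MB of memory, 10 MB/s dirtied, 100 MB/s link.
print(post_copy_traffic_mb(1000))             # 1000
print(pre_copy_traffic_mb(1000, 10, 100, 2))  # 1110.0
```

Under these assumptions pre-copy moves 11% more data than post-copy, and the gap grows with the dirtying rate.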

When migrating VMs from one data center to another, we consider the migration energy cost at the destination data center, the source data center, and the network. The energy of the network is assumed to be proportional to the amount of exchanged traffic, which includes the bandwidth consumed by migrated VMs, including their memory, the disk images of the VMs, and user traffic.

Moreover, we jointly manage the use of green and brown energy. We assume that all the energy coming from the electrical grid is brown energy, while we consider the energy generated in data centers as green and available for free if it is consumed locally, to privilege the use of on-site generated green energy. However, it is straightforward to modify the formulations to include also an amount of green energy coming from the electrical grid (with a cost depending on location).

We assume that data centers are able to generate on-site an amount of green energy using renewable sources, such as solar energy, wind energy or geothermal energy. The use of these smart on-site electricity sourcing strategies is increasing, e.g., Facebook’s solar-powered data center [57] and the Green House Data wind-powered data center [58]. Since our work focuses only on system management, we do not include capital expenditures for renewable sources. The costs of green energy generation have declined significantly over the last few years. Installation prices vary with the technology: for example, parabolic trough plants used to generate Concentrating Solar Power (CSP) have capital costs as low as 4600 $/kW in the USA market, while wind power technologies tend to be more competitive, between 1800 and 2200 $/kW [59].

Matching exactly the energy consumption with green energy generation is difficult and can potentially generate inefficiencies when produced energy cannot be consumed immediately. Therefore, we relax this problem by considering the use of energy storage technologies. In our scenario, data centers are equipped with rechargeable battery systems that are able to store the locally generated green energy. Balancing the use of green energy produced, between immediate usage and storage in batteries for later consumption allows for green energy to be available when the price of brown energy is high, as well as to solve the problem of discontinuous availability of renewable resources.

For each time period, the model defines how to allocate the load in each data center, in other words, how many VMs are kept active or off. The same applies to the network, where we define for each time period which links are to be turned on and which should be off, depending on the number of routers in each link and their capacity. Note that, even if we associate energy consumption with links, the real energy consumers in the network are the line cards connected to the links in the routers on both sides.

3.2 Problem Formulation

In this section, we first introduce decision variables. We then formulate the cost minimization objective function and the problem constraints.

3.2.1 Decision Variables

The goal of our optimization model is twofold: (1) finding how many VMs are to be migrated among DCs, together with the source and destination of the migrations, and (2) managing the usage of the available green energy sources. These decisions make it possible to move the load and the energy consumption among DCs during the day in order to exploit the availability of green energy and the differences in brown energy prices across DC locations.

We formulate the problem using several sets of decision variables. The first pair of main decision variables refers to VM migration and request forwarding. The integer variable \({v_{ijl}^{t}}\) represents the number of VMs of type l to be migrated from DC i to DC j during time period t. This variable depends on the number of requests received by each data center. We use a continuous non-negative variable \({x_{ijkl}^{t}}\) to represent the arrival rate of class k requests received by VMs of type l in data center i and then served in data center j after live migration.

The second pair of main decision variables are related to the green energy management. In particular, variables \(sge_i^t\) and \(dge_i^t\) indicate the sources of the green energy supplied to the DC i during the time period t. \(sge_i^t\) is the amount of energy coming from DC batteries, while \(dge_i^t\) is the green energy produced at the DC i directly supplied to the DC for its operations.

Together with these main decision variables, there are several secondary variables that, depending on the values of the main ones, are used to model the behavior of the system. We can group them into three domains: VM and Migration, Networking, and Battery and Green Energy Management.

3.2.2 VM and Migration

Integer variable \({\overline{w}_{il}^{t}}\) denotes the number of active VMs of type l that are originally running on data center i, while the integer variable \({\underline{w}_{il}^{t}}\) refers to the number of VMs executing on data center i after all live migrations have taken place. The variables \({{won}_{il}^{t}}\) and \({{woff}_{il}^{t}}\) denote the number of VMs to be turned on and off, respectively, at time period t. The energy consumption associated with the migration of type-l VMs, including both current VMs at DC i migrated to other DCs and new VMs migrated to DC i from other DCs, is captured by the variable \({mig}_{il}^{t}\). Figure 4 describes a small illustrative scenario of the migration process, in which DC i receives a number \({v_{ijl}^{t-1}}\) of migrated VMs from a DC j during time slot t − 1 and then, during time slot t, migrates a number \({v_{ijl}^{t}}\) to a DC j. The space between time bands in the figure only serves to show the number of turned-off VMs \({{woff}_{il}^{t}}\).

Fig. 4 VM variables

Networking. As for the network, we assume that the energy consumption of a link is proportional to its load, expressed as the ratio of used bandwidth to available bandwidth. We rely on variable \({b_{ij}^{t}}\) to express the bandwidth used on the link (i, j) connecting DC i with DC j during time period t, which is determined by the traffic volume exchanged by the two DCs due to all migration processes between them. In addition, we allow unused links to be switched off. We capture this behavior using binary variables \({z_{ij}^{t}}\), equal to 1 if the link (i, j) is active, and 0 otherwise. Similarly to data centers, \({{zon}_{ij}^{t}}\) and \({{zoff}_{ij}^{t}}\) indicate whether each link has to be turned on or off at the beginning of time period t according to its status during the previous time period (t − 1). Figure 5 illustrates how the networking variables change in a small example of transmissions on the link between a DC i and a DC j. In this example, the link is used for transmissions from time band 5 until time band 8. During time band 9 there are no transmissions, so the link is turned off, which we indicate by assigning 1 to the variable \({{zoff}_{ij}^{t}}\). The link has to be turned on again during time band 11, so the variable \({{zon}_{ij}^{t}}\) takes the value 1.

Fig. 5 Networking variables
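The relation between the link state and the switching indicators in this example can be reproduced with a few lines of code. This is only a sketch of the transition logic described in the text, not the constraint set of the model; the function name and the link states assumed for time bands 1 to 4, 10 and 12 are hypothetical.

```python
def link_transitions(z, initial_state=0):
    """Derive the turn-on (zon) and turn-off (zoff) indicators of one link
    from its activity z over consecutive time bands.

    z[t] is 1 if the link is active in time band t and 0 otherwise;
    initial_state is the link state before the first band (assumed off).
    """
    zon, zoff = [], []
    previous = initial_state
    for state in z:
        zon.append(1 if (state == 1 and previous == 0) else 0)
        zoff.append(1 if (state == 0 and previous == 1) else 0)
        previous = state
    return zon, zoff


# Time bands 1..12: active in bands 5-8, idle in 9-10, active again from 11.
z = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
zon, zoff = link_transitions(z)
print(zon)   # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
print(zoff)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

The indicators are 1 only in the band where the state changes (bands 5 and 11 for turning on, band 9 for turning off), which is what lets the objective charge the switching energy once per transition.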

Battery and Green Energy Management. Concerning green energy and batteries, the variable \({c_{i}^{t}}\) represents the amount of energy charged into the battery of DC i at time t from the renewable energy sources installed at DC i. It is related to the main decision variables \(sge_i^t\) and \(dge_i^t\) as shown in Fig. 6: the generated green energy that is not immediately supplied as \(dge_i^t\) is used to recharge the batteries.

Fig. 6 Energy generation, storage and consumption

In addition to the above-mentioned variables, we have a set of variables to model the energy charging and discharging phases at DC i. We suppose that the energy charged at a time t cannot be used in the same time period. To define the energy level of a battery we use two different variables: \(\overline{s}_{i}^{t}\) denotes the level of energy at the beginning of a time period t, and \(\underline{s}_{i}^{ t}\) denotes the level of energy at the end of the time period t. Figure 7 shows how variable \(sge_i^t\), which indicates the amount of the batteries’ energy consumed during time period t to run the DC, is connected to other energy-related variables. Finally, variables \(sge_i^t\) and \(dge_i^t\) together define the total amount of green energy provided to the DC during the time period t, expressed by the variable \(g_i^t\).

Fig. 7 Batteries functioning
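To make the interplay of the battery variables concrete, the following sketch steps one battery through a sequence of time periods. It is one plausible reading of the description above, not the constraint set of the model: it assumes lossless charging, a hypothetical battery capacity, that all surplus green energy is charged until the battery is full, and that any further surplus is discarded. The function name and dictionary keys are hypothetical.

```python
def simulate_battery(generated, dge, sge, initial_level, capacity):
    """Track one DC battery over consecutive periods.

    generated[t]  green energy produced on site in period t
    dge[t]        green energy supplied directly to the DC in period t
    sge[t]        energy drawn from the battery in period t
    Returns one row per period with the start level, end level,
    charged energy c and total green energy g = sge + dge.
    """
    rows = []
    s_begin = initial_level
    for t, (gen, direct, from_battery) in enumerate(zip(generated, dge, sge)):
        if direct > gen:
            raise ValueError(f"period {t}: dge exceeds generated green energy")
        if from_battery > s_begin:
            # Energy charged in period t is not usable in the same period.
            raise ValueError(f"period {t}: sge exceeds stored energy")
        s_end = s_begin - from_battery
        # Assumption: surplus generation is charged up to the free capacity.
        charged = min(gen - direct, capacity - s_end)
        rows.append({"s_begin": s_begin, "s_end": s_end,
                     "c": charged, "g": from_battery + direct})
        s_begin = s_end + charged  # charge becomes available next period
    return rows


for row in simulate_battery(generated=[5, 0], dge=[2, 0], sge=[0, 3],
                            initial_level=0, capacity=10):
    print(row)
```

In this run the 3 units left over in the first period are stored and then cover the whole green supply of the second period, when nothing is generated.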

All defined variables are summarized in Table 1.

Table 1 Decision variables

3.2.3 Objective Function

The objective of our model is to minimize the energy cost, which consists of two components: DC energy consumption and networking energy consumption. Therefore, we design the following objective function:

$$\begin{aligned} \min& \sum \limits_{t \in {\mathcal{T}}} \sum \limits_{i \in {\mathcal{I}}} \left \lbrace M_{i}^{t} \left [{\rho_{i}} \sum \limits_{l \in {\mathcal{L}}} \left ({\alpha_{il}} \underline{w}_{il}^{t} +{\eta_{il}}{won}_{il}^{t} +{\theta_{il}}{woff}_{il}^{t} +{mig}_{il}^{t} \right ) -{g_{i}^{t}} \right] \right \rbrace \\ &+ \sum \limits_{t \in {\mathcal{T}}} \sum \limits_{i,j \in {\mathcal{I}}}{E_{ij}^{t}}{R_{ij}} \left [ ({\gamma_{ij}} -{\delta_{ij}} ) \frac{{b_{ij}^{t}}}{{{{{\mathcal{Q}}}}_{ij}}} +{\delta_{ij}} z_{ij}^{t} + {\tau_{ij}}{zon}_{ij}^{t} + {\xi_{ij}}{zoff}_{ij}^{t} \right ]\end{aligned}$$
(1)

The first term accounts for the cost of data center consumption. It considers the costs of all data centers over all time periods, where for each data center we multiply the site-specific cost of brown energy, \(M_i^t\), and the PUE (Power Usage Effectiveness), \(\rho_i\), by the total energy consumed by the servers. The consumed energy consists of:

  • the total consumption of running VMs, where \(\alpha_{il}\) is the energy needed for running a type l VM in DC i (e.g., Wh)

  • the energy needed for turning the servers on and off, where \(\eta_{il}\) and \(\theta_{il}\) are, respectively, the energy needed for turning on and off a type l VM in DC i (e.g., Wh)

  • the energy consumed by DCs to migrate VMs, captured by variable \({mig}_{il}^{t}\)

In addition, the consumed energy is discounted by the amount of green energy provided by renewable energy sources installed at DCs represented by \(g_i^t\). As we consider the green energy produced locally, the operating cost of green plants is set to zero in the model.

The second term of the objective function accounts for the network consumption. It is computed as a sum of each path cost, which in turn consists of two components:

  • the energy required to operate each router on link (i, j) during the time period t, considering both active and idle state energy consumption, respectively indicated by \(\gamma_{ij}\) and \(\delta_{ij}\)

  • the energy required to turn on and off a router at the beginning of the time period, respectively, \(\tau_{ij}\) and \(\xi_{ij}\)

The model assumes that all the routers on the link (i, j) are identical; therefore, the energy consumption of each router is multiplied by the number of routers along the link, \(R_{ij}\). This assumption can be easily relaxed by considering individual energy consumption values, which have been omitted here for ease of presentation. Finally, in order to compute the energy cost of this second term, the total energy consumption is multiplied by the average energy price along the link during time period t, \(E_{ij}^t\).
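As an illustration of how the two terms of objective (1) are assembled, the sketch below evaluates the contribution of one data center and one link in a single time period for given variable values. It only evaluates the cost and does not solve the MILP; the function names, argument layout and all numbers are hypothetical.

```python
def dc_cost_term(price, pue, vm_rows, green):
    """Cost of one data center in one time period (first term of Eq. 1).

    price   brown energy price M at this site and period
    pue     power usage effectiveness rho of the site
    vm_rows one tuple per VM type l:
            (alpha, w_after, eta, won, theta, woff, mig)
    green   green energy g supplied to the DC in this period
    """
    server_energy = sum(
        alpha * w_after + eta * won + theta * woff + mig
        for alpha, w_after, eta, won, theta, woff, mig in vm_rows
    )
    return price * (pue * server_energy - green)


def link_cost_term(price, routers, gamma, delta, tau, xi,
                   b, capacity, z, zon, zoff):
    """Cost of one link (i, j) in one time period (second term of Eq. 1)."""
    per_router = ((gamma - delta) * b / capacity  # load-dependent part
                  + delta * z                     # idle consumption if active
                  + tau * zon + xi * zoff)        # switching energy
    return price * routers * per_router


# Hypothetical values for a single DC, VM type, link and period.
dc = dc_cost_term(price=2.0, pue=1.5,
                  vm_rows=[(100, 10, 5, 2, 3, 1, 20)], green=200)
link = link_cost_term(price=2.0, routers=3, gamma=10, delta=4, tau=1, xi=1,
                      b=50, capacity=100, z=1, zon=1, zoff=0)
print(dc, link, dc + link)  # 2699.0 48.0 2747.0
```

The full objective is the sum of such terms over all time periods, data centers, VM types and links.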

3.2.4 Constraints

In this section we present the different groups of constraints used to model VM migration, the operations of data centers, the network, and battery and green energy management features.

VM Migration. First, we must ensure that all the requests received from cloud users are processed by the data centers. The requests of different classes have to be processed by suitable types of VMs. For this purpose we use the following constraints:

$$\sum \limits_{l \in {\mathcal{L}}_k} \sum \limits_{j \in {\mathcal{I}}}{x_{ijkl}^{t}}= \lambda_{ik}^{t}\; \quad\forall i \in {\mathcal{I}}, \;\forall k \in {\mathcal{K}}, \;\forall t \in {\mathcal{T}}$$
(2)

In particular, constraint (2) ensures that all the incoming traffic is processed in the Cloud, by any of the DCs with an appropriate VM. Note that we consider the set \({\mathcal{L}}_k\), which is the set of VM classes that can process class-k requests.
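A candidate assignment of request rates can be checked against constraint (2) as follows. This is a minimal sketch for a single time period; the dictionary layout, the function name, and the class and type labels are hypothetical.

```python
def demand_is_served(x, arrival_rate, compatible_types, tol=1e-9):
    """Check constraint (2) for one time period.

    x[(i, j, k, l)]       rate of class-k requests arriving at DC i that are
                          handled by type-l VMs and served at DC j
    arrival_rate[(i, k)]  arrival rate lambda of class-k requests at DC i
    compatible_types[k]   the set L_k of VM types able to serve class k
    """
    for (i, k), lam in arrival_rate.items():
        served = sum(
            rate
            for (src, _dst, cls, vm_type), rate in x.items()
            if src == i and cls == k and vm_type in compatible_types[k]
        )
        if abs(served - lam) > tol:
            return False
    return True


# Two DCs, one request class and one VM type (illustrative labels).
x = {(0, 0, "web", "small"): 30.0,   # served locally at DC 0
     (0, 1, "web", "small"): 25.0,   # forwarded from DC 0 to DC 1
     (1, 1, "web", "small"): 40.0}   # served locally at DC 1
arrival_rate = {(0, "web"): 55.0, (1, "web"): 40.0}
print(demand_is_served(x, arrival_rate, {"web": {"small"}}))  # True
```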

In addition, a migration plan requires defining the number of VMs to migrate, as well as their source and destination DCs. For this purpose we use the following constraints:

$$\sum \limits_{j \in {\mathcal{I}}}{v_{ijl}^{t}}\ge { \sum \limits_{k \in {\mathcal{K}}} \left[\left(\sum \limits_{j \in {\mathcal{I}}}{x_{ijkl}^{t}} \right ) -{x_{iikl}^{t}} \right ] } /{{\mu_{l}}} \quad \forall i \in {\mathcal{I}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in {\mathcal{T}}$$
(3)
$$\sum \limits_{j \in {\mathcal{I}}}{v_{jil}^{t}}\ge { \sum \limits_{k \in {\mathcal{K}}} \left[ \left( \sum \limits_{j \in {\mathcal{I}}}{x_{jikl}^{t}} \right ) -{x_{iikl}^{t}} \right ] } /{{\mu_{l}}} \quad \forall i \in {\mathcal{I}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in {\mathcal{T}}$$
(4)
$${v_{iil}^{t}}= 0 \quad \; \forall i \in {\mathcal{I}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in {\mathcal{T}}$$
(5)

Constraints (3) and (4) determine the number of VMs that are sent and received, respectively, by each DC. These numbers are bounded from below by the rates of requests redirected to and received from other DCs, respectively, divided by the service rate. Note that the term \(x_{iikl}^t\) represents the requests that arrive at DC i and are served locally. The number of migrated VMs must be sufficient to serve the rate of forwarded requests; this is captured by the parameter \(\mu_l\), which expresses the maximum service rate of a type l VM. The last constraint (5) guarantees that a DC does not migrate VMs to itself.
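Because the migration variables are integers, the right-hand sides of (3) and (4) translate into rounded-up lower bounds. The sketch below computes these bounds for one DC, one VM type and one time period; the function name, data layout and labels are hypothetical.

```python
import math


def min_migrations(x, dc, vm_type, service_rate):
    """Smallest integer totals satisfying constraints (3) and (4).

    x[(i, j, k, l)] is the rate of class-k requests arriving at DC i,
    handled by type-l VMs and served at DC j. Returns the minimum number
    of type-l VMs that DC `dc` must send and must receive.
    """
    forwarded = sum(rate for (i, j, _k, l), rate in x.items()
                    if l == vm_type and i == dc and j != dc)
    received = sum(rate for (i, j, _k, l), rate in x.items()
                   if l == vm_type and j == dc and i != dc)
    return (math.ceil(forwarded / service_rate),
            math.ceil(received / service_rate))


x = {(0, 0, "web", "small"): 30.0,
     (0, 1, "web", "small"): 25.0,
     (1, 1, "web", "small"): 40.0}
# With a service rate of 10 requests per VM, forwarding 25 needs 3 VMs.
print(min_migrations(x, dc=0, vm_type="small", service_rate=10.0))  # (3, 0)
print(min_migrations(x, dc=1, vm_type="small", service_rate=10.0))  # (0, 3)
```

Excluding the pairs with i = j from both sums plays the role of subtracting the locally served term in (3) and (4).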

The following constraint (6) determines the number of active VMs in a data center i after the necessary migrations have been made. Starting from the number of VMs that were created in a data center (\({\overline{w}_{il}^{t}}\)), we subtract the number of VMs that migrated away and add the ones that arrived from other locations. In most cases, a data center performs only one of these operations, sending or receiving VMs, but not both at the same time, depending on its capacity and the energy constraints.

$${\underline{w}_{il}^{t}}={\overline{w}_{il}^{t}} - \sum \limits_{j \in {\mathcal{I}}}{v_{ijl}^{t}} + \sum \limits_{j \in {\mathcal{I}}}{v_{jil}^{t}}\; \quad\forall i \in {\mathcal{I}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in {\mathcal{T}}$$
(6)

Finally, constraint (7) calculates the energy consumed by migration operations.

$${mig}_{il}^{t}={EH_{l}} \sum \limits_{j \in {\mathcal{I}}}{v_{jil}^{t}} +{ES_{l}} \sum \limits_{j \in {\mathcal{I}}}{v_{ijl}^{t}}\; \quad\forall i \in {\mathcal{I}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in {\mathcal{T}}$$
(7)

Parameters \(ES_{l}\) and \(EH_{l}\) represent the energy consumed to migrate a type-l VM at the source and destination DC, respectively.
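
To make constraints (6) and (7) concrete, the sketch below evaluates both for a single DC, VM type and time period. The function name, argument names and numeric values are hypothetical and chosen only for illustration; energy units are left abstract.

```python
def apply_migrations(w_created, v_out, v_in, es, eh):
    """Evaluate constraints (6) and (7) for one DC, one VM type and one period.

    w_created : VMs created at the DC (the overlined w in the model)
    v_out     : list with the number of VMs sent to each other DC
    v_in      : list with the number of VMs received from each other DC
    es, eh    : migration energy per VM at the source and at the destination
    """
    w_active = w_created - sum(v_out) + sum(v_in)   # constraint (6)
    mig_energy = eh * sum(v_in) + es * sum(v_out)   # constraint (7)
    return w_active, mig_energy

w_active, mig_energy = apply_migrations(
    w_created=10, v_out=[3, 0], v_in=[0, 2], es=0.25, eh=0.5)
print(w_active, mig_energy)  # 9 1.75
```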

DC Behavior Together with the migration plan, we need to model the DC behavior. We assume each DC has a maximum number of requests that it can process per time period. This number depends on the number of VMs that a data center is able to handle simultaneously. Hence, we use the following constraints to ensure that the capacity requirements of data centers are not exceeded:

$${\underline{{{w}}}}_{il}^{t}\le P_{il} \quad \forall i \in {\mathcal{I}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in {\mathcal{T}}$$
(8)
$${\overline{{{w}}}}_{il}^{t}\ge \sum \limits_{j \in {\mathcal{I}}} \sum \limits_{k \in {\mathcal{K}}} \frac{{x_{ijkl}^{t}}}{\mu_{l}} \quad \forall i \in {\mathcal{I}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in{{\mathcal{T}}}$$
(9)
$${\underline{{{w}}}}_{il}^{t}\ge \sum \limits_{j \in {\mathcal{I}}} \sum \limits_{k \in {\mathcal{K}}} \frac{{x_{jikl}^{t}}}{\mu_{l}} \quad \forall i \in{{\mathcal{I}}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in{{\mathcal{T}}}$$
(10)

Here, constraint (8) ensures that the number of running VMs after all migration operations have taken place does not exceed the capacity of the system resources. In other words, the overall utilization of the resources dedicated to running class-l VMs stays below a planned threshold \(P_{il}\) in each DC i. Constraint (9) defines the number of VMs of type l originated in DC i, while constraint (10) defines the number of VMs of type l running on DC i after making all the necessary live migrations. Both numbers depend on the outgoing and incoming rates of migrated requests.
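
A small sketch of how constraints (8) to (10) interact is given below: it derives the smallest integer VM counts compatible with (9) and (10) for a given routing, then checks the capacity bound (8). The data layout, names and values are assumptions for illustration, and taking the bounds as tight is a simplification of the full model.

```python
import math

def vm_requirements(x, mu, capacity, i, l):
    """Return (min VMs originated at i, min VMs running at i, capacity respected?).

    x        : dict mapping (a, b, k, vm) to the rate of class-k requests arriving
               at DC a and served at DC b by VMs of type vm (hypothetical layout)
    mu       : dict mapping a VM type to its maximum service rate
    capacity : dict mapping (DC, VM type) to the threshold P of constraint (8)
    """
    arriving = sum(rate for (a, b, k, vm), rate in x.items() if a == i and vm == l)
    served = sum(rate for (a, b, k, vm), rate in x.items() if b == i and vm == l)
    w_over = math.ceil(arriving / mu[l])    # lower bound from constraint (9)
    w_under = math.ceil(served / mu[l])     # lower bound from constraint (10)
    return w_over, w_under, w_under <= capacity[(i, l)]   # constraint (8)

x = {(0, 0, 0, 0): 500, (0, 1, 0, 0): 250, (1, 0, 0, 0): 120}
mu = {0: 100}
print(vm_requirements(x, mu, {(0, 0): 7}, i=0, l=0))  # (8, 7, True)
```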

In order to ensure time continuity in the number of running VMs at each DC, we need the following constraints.

$${{won}_{il}^{t}}\ge {\overline{w}_{il}^{t}} -{\underline{w}_{il}^{t-1}} \quad \forall i \in {\mathcal{I}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in {\mathcal{T}}$$
(11)
$${{woff}_{il}^{t}}\ge {\underline{w}_{il}^{t-1}} -{\overline{w}_{il}^{t}} \quad \forall i \in {\mathcal{I}}, \;\forall l \in {\mathcal{L}}, \;\forall t \in {\mathcal{T}}$$
(12)

Constraints (11) and (12) determine the number of VMs to be turned on and off at the beginning of time period t, according to their number at the end of time period t − 1.
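
Constraints (11) and (12) are lower bounds, so the smallest values they admit can be sketched as follows. The example assumes that both variables are non-negative and are pushed to these bounds by the cost terms of the objective, which is an assumption here since the objective is defined outside this section; the names are hypothetical.

```python
def vm_switching(w_created_now, w_active_prev):
    """Smallest non-negative values allowed by constraints (11) and (12).

    w_created_now : VMs needed at the beginning of period t
    w_active_prev : VMs active at the end of period t - 1
    """
    won = max(0, w_created_now - w_active_prev)    # VMs to turn on, from (11)
    woff = max(0, w_active_prev - w_created_now)   # VMs to turn off, from (12)
    return won, woff

print(vm_switching(12, 9))  # (3, 0)
print(vm_switching(5, 9))   # (0, 4)
```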

Networking For the network, the following constraints are defined to ensure that we do not exceed bandwidth capacity and to guarantee the proper operation of network links:

$${b}_{ij}^{t} = \sum \limits_{l \in {\mathcal{L}}} {\phi_{ijl}^{t}} \left ({VMsize_{l}} +{DI_{l}} \right ) + \sum \limits_{k \in {\mathcal{K}}} \left( {{{B}}_{k}} \sum \limits_{l \in {\mathcal{L}}}{x_{ijkl}^{t}} \right) \quad \forall i \in {\mathcal{I}}, \;\forall j \in {\mathcal{I}}, \;\forall t \in {\mathcal{T}}$$
(13)
$${b}_{ij}^{t} +{b}_{ji}^{t} \le {{{Q}}_{ij}}{z_{ij}^{t}} \quad \forall i \in {\mathcal{I}}, \;\forall j \in {\mathcal{I}}, \;\forall t \in {\mathcal{T}}$$
(14)
$${z}_{ij}^{t} ={z}_{ji}^{t} \quad \forall i \in {\mathcal{I}}, \;\forall j \in {\mathcal{I}}, \;\forall t \in {\mathcal{T}}$$
(15)

Constraint (13) computes the portion of bandwidth used for transferring data between different DCs. In our scenario, all the exchanged data is related to migration operations. We consider the size of each type of migrated VM, in terms of memory state and CPU register content, and we associate with each VM type an amount of user traffic expressed in requests. Parameters \(VMsize_{l}\) and \(DI_{l}\) indicate the bandwidth consumed to migrate a type-l VM and the size of the disk images of type-l VMs, respectively, while \(B_k\) is the bandwidth required for a type-k request. To account for large latency between data centers, a scale factor can be added to Eq. (13) to reflect the extra bandwidth needed to maintain memory consistency during VM migration. However, a detailed investigation of the effects of latency on VM performance is beyond the scope of this paper, which mainly focuses on energy efficiency. Therefore, in the rest of the paper we assume that this scale factor can be set to approximately 1.

Constraint (14) guarantees that the traffic exchanged in both directions does not exceed the link capacity \(Q_{ij}\); the available capacity is forced to 0 when the link is switched off (\(z_{ij}^{t} = 0\)). Finally, constraint (15) ensures that if a link is active in one direction, it is also active in the other one.
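
The following sketch evaluates the right-hand side of constraint (13) for one directed link and checks constraint (14). It assumes that \(\phi_{ijl}^{t}\) counts the type-l VMs migrated from i to j, which is an interpretation made for this example; all names, units and values are hypothetical.

```python
def link_load(phi, x, vm_size, disk_image, req_bw):
    """Directed load b_ij of constraint (13) for one link and one period.

    phi        : dict VM type -> number of VMs of that type migrated from i to j
                 (assumed meaning of the phi variable)
    x          : dict (request class, VM type) -> rate of requests forwarded from i to j
    vm_size    : dict VM type -> bandwidth needed to migrate one VM
    disk_image : dict VM type -> disk image size of one VM
    req_bw     : dict request class -> bandwidth per request
    """
    migration = sum(n * (vm_size[l] + disk_image[l]) for l, n in phi.items())
    requests = sum(req_bw[k] * rate for (k, l), rate in x.items())
    return migration + requests

def link_within_capacity(b_ij, b_ji, q_ij, z_ij):
    """Constraint (14): two-way load fits the capacity of an active link."""
    return b_ij + b_ji <= q_ij * z_ij

b = link_load(phi={0: 2}, x={(0, 0): 50},
              vm_size={0: 4}, disk_image={0: 10}, req_bw={0: 2})
print(b)                                      # 128
print(link_within_capacity(b, 0, 200, 1))     # True
print(link_within_capacity(b, 0, 200, 0))     # False: link switched off
```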

Similarly to the case of the number of VMs at DCs, we need to ensure the time continuity of the link status. Constraints (16) and (17) define which links have to be turned on or off at each time period transition.

$${{zon}_{ij}^{t}}\ge {{z}_{ij}^{t}} -{{z}_{ij}^{t-1}} \quad \forall i \in {\mathcal{I}}, \;\forall j \in {\mathcal{I}}, \;\forall t \in {\mathcal{T}}$$
(16)
$${{zoff}_{ij}^{t}}\ge {{z}_{ij}^{t-1}} -{{z}_{ij}^{t}} \quad \forall i \in {\mathcal{I}}, \;\forall j \in {\mathcal{I}}, \;\forall t \in {\mathcal{T}}$$
(17)

Green Energy and Batteries Management Green energy management and storage play an important role in our model. The constraints below guarantee the appropriate behavior of the available renewable resources and batteries.

$${g_{i}^{t}} \le{\rho_{i}} \sum \limits_{l \in {\mathcal{L}}} \left ({\alpha_{il}} \underline{w}_{il}^{t} +{\eta_{il}}{won}_{il}^{t} +{\theta_{il}}{woff}_{il}^{t} +{mig}_{il}^{t} \right) \quad \forall i \in {\mathcal{I}}, \; \forall t \in {\mathcal{T}}$$
(18)
$${g_{i}^{t}} ={sge_{i}^{t}}\beta_{i}+{dge_{i}^{t}} \quad \forall i \in {\mathcal{I}}, \;\forall t \in {\mathcal{T}}$$
(19)
$$c_{i}^{t} + dge_{i}^{t} \le \Gamma_{i}^{t} \quad \forall i \in {\mathcal{I}}, \quad \;\forall t \in {\mathcal{T}}$$
(20)
$$\sum \limits_{t \in {\mathcal{T}}}{g_{i}^{t}} \le \sum \limits_{t \in {\mathcal{T}}}{\Gamma_{i}^{t}} \quad \forall i \in {\mathcal{I}}$$
(21)

In particular, constraint (18) ensures that the green energy consumed during a time period does not exceed the amount required to run the corresponding DC. Constraint (19) states that the green energy at time period t can be provided directly from the renewable source (\(dge_i^t\)) or from batteries (\(sge_i^t\)), taking into account the energy loss rate during a time period due to storage, denoted by \(\beta_{i}\). Constraint (20) guarantees that the energy charged into batteries plus the energy directly supplied does not exceed the total amount generated in one time period, denoted by \(\Gamma_{i}^{t}\), while constraint (21) ensures that the total green energy consumed during a day in a data center (\(\sum_{t \in {\mathcal{T}}}g_i^t\)) does not exceed the total amount generated (\(\sum_{t \in {\mathcal{T}}}\Gamma_i^t\)). This makes the daily repetition of the plan sustainable.
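
As a rough feasibility check for constraints (18) to (21) at a single DC, the sketch below takes a candidate daily plan and verifies each bound. The DC energy demand on the right-hand side of (18) is passed in precomputed, and all names and numbers are hypothetical.

```python
def green_plan_feasible(c, dge, sge, gamma, beta, demand):
    """Check constraints (18)-(21) for one DC over all time periods.

    c, dge, sge : per-period energy charged, directly supplied, and discharged
    gamma       : per-period green energy generated
    beta        : storage factor multiplying sge in constraint (19)
    demand      : per-period energy required by the DC (right-hand side of (18))
    """
    g = [s * beta + d for s, d in zip(sge, dge)]                   # constraint (19)
    within_demand = all(gt <= dt for gt, dt in zip(g, demand))     # constraint (18)
    within_generation = all(ct + dt <= gm
                            for ct, dt, gm in zip(c, dge, gamma))  # constraint (20)
    daily_balance = sum(g) <= sum(gamma)                           # constraint (21)
    return within_demand and within_generation and daily_balance

ok = green_plan_feasible(c=[4, 0], dge=[6, 5], sge=[0, 2],
                         gamma=[10, 10], beta=0.5, demand=[8, 8])
print(ok)  # True
```

Raising `dge` in the first period from 6 to 7 would violate constraint (20), since 4 + 7 exceeds the 10 units generated in that period.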

$$\overline{s}_{i}^{t}= c_{i}^{t-1} \psi_{i} + \underline{s}_{i}^{t-1}\zeta_{i} \quad \forall i \in {\mathcal{I}}, \;\forall t \in {\mathcal{T}}{\setminus}\left\{ 1\right\}$$
(22)
$$sge_{i}^{t}= \overline{s}_{i}^{t} - \underline{s}_{i}^{ t} \quad \forall i \in {\mathcal{I}}, \; \forall t \in {\mathcal{T}}$$
(23)
$$sge_{i}^{t}\le \overline{s}_{i}^{t} \quad \forall i \in {\mathcal{I}}, \; \forall t \in {\mathcal{T}}$$
(24)

Constraints (22), (23), and (24) are related to batteries. Constraint (22) states that the energy level in a battery at the beginning of period t is given by the energy remaining at the end of time period t − 1 (considering the energy discharging efficiency of the battery \(\zeta_{i}\)) and the energy charged during (t − 1) (considering the energy charging efficiency of the battery \(\psi_{i}\)). Constraint (23) forces the amount of discharged energy, \(sge_i^t\), to be equal to the difference between the level of energy at the beginning and at the end of time period t. Constraint (24) makes sure that the energy discharged from a battery does not exceed the energy available in that battery at the beginning of the time period.

The following final constraints are related to the physical limitations of the batteries. Constraints (25), (26) and (27) ensure that the model does not exceed the energy capacity and the charging and discharging rate limits of a battery.

$$\overline{s}_{i}^{t}\le Smax_{i} \quad \forall i \in {\mathcal{I}}, \; \forall t \in {\mathcal{T}}$$
(25)
$$c_{i}^{t}\le Cmax_{i} \quad \forall i \in {\mathcal{I}}, \; \forall t \in {\mathcal{T}}$$
(26)
$$sge_{i}^{t}\le Dmax_{i} \quad \forall i \in {\mathcal{I}}, \; \forall t \in {\mathcal{T}}$$
(27)

Parameters \(Smax_i\), \(Cmax_i\), and \(Dmax_i\) refer, respectively, to the maximum energy storage capacity, the maximum energy charged in 1 h, and the maximum energy discharged in 1 h (the duration of one time period) for the considered batteries. We also assume that the energy charged into a battery during time period t cannot be used until the next time period; therefore, we add constraints (28), (29) and (30) to initialize the battery status at t = 1.

$$sge_{i}^{1}= 0 \quad \forall i \in {\mathcal{I}}$$
(28)
$$\overline{s}_{i}^{1}= 0 \quad \forall i \in {\mathcal{I}}$$
(29)
$$\underline{s}_{i}^{1}= 0 \quad \forall i \in {\mathcal{I}}$$
(30)
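
The battery constraints (22) to (30) can be read as a simple recursion over time periods. The sketch below simulates it for one battery and raises an error when a bound is violated; the function name, the 0-based period indexing and the sample values are assumptions made for illustration.

```python
def simulate_battery(charge, discharge, psi, zeta, s_max, c_max, d_max):
    """Track battery levels over consecutive periods following constraints (22)-(30).

    charge[t], discharge[t] : energy charged (c) and discharged (sge) in period t,
                              with t = 0 standing for the first period of the day
    Returns (s_start, s_end): levels at the beginning and at the end of each period.
    Raises ValueError if one of the bounds is violated.
    """
    if discharge[0] != 0:
        raise ValueError("constraint (28): no discharge in the first period")
    s_start, s_end = [0.0], [0.0]                     # constraints (29) and (30)
    for t in range(len(charge)):
        if charge[t] > c_max:
            raise ValueError(f"constraint (26) violated in period {t + 1}")
        if t == 0:
            continue
        level = charge[t - 1] * psi + s_end[t - 1] * zeta   # constraint (22)
        if level > s_max:
            raise ValueError(f"constraint (25) violated in period {t + 1}")
        if discharge[t] > min(level, d_max):
            raise ValueError(f"constraint (24) or (27) violated in period {t + 1}")
        s_start.append(level)
        s_end.append(level - discharge[t])            # constraint (23)
    return s_start, s_end

s_start, s_end = simulate_battery(
    charge=[1.0, 0.0, 0.0], discharge=[0.0, 0.5, 0.0],
    psi=0.88, zeta=0.88, s_max=2.0, c_max=1.08, d_max=1.08)
print(s_start, s_end)
```

With these values, the energy charged in the first period only becomes available in the second one, reduced by the charging efficiency, which mirrors the assumption that charged energy cannot be used until the next time period.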

Table 2 summarizes all model parameters.

Table 2 Model parameters

4 Model Evaluation

Our model has been evaluated using a state-of-the-art MILP solver and considering various instances and workload configurations. In this section, we present results obtained on a set of scenarios with realistic values for parameters.

4.1 Parameter Setting

We considered 15 data centers distributed geographically all over the world. For their locations, we took inspiration from the Cloud computing infrastructure of Google [60]. We used four geographical macro areas: West USA, East USA, Europe and Asia. In each area, data centers are in the same or nearby time zones. A detailed view of the data center locations and numbers of servers is provided in Table 3. Note that the number of servers used for each data center is not the real number for Google DCs; it is generated within the range [5000, 16,000].

Table 3 Data centers location and number of servers

For each data center, we associate a PUE value in order to include the power facilities that support the IT equipment load, such as cooling systems. According to [61] the global average PUE of the largest data centers is around 1.7, while the average PUE for all Google data centers is 1.12. In our tests, we vary PUE values between 1.1 and 2.

For data center capacity, we generated a random number of physical servers for each DC, within the range [5000, 16,000], and we assume a 1:1 ratio for the assignment of physical to virtual resources (i.e., 1 physical core is assigned to 1 virtual core of equal capacity).

Regarding the technical characteristics of servers in DCs, we consider an HP ProLiant DL370 G6 with an Intel Xeon W5580 processor (8 cores at 3200 MHz) and 96 GB of total memory. Although we considered three different classes of VMs (see below), we modeled only a single server type in order to simplify the energy consumption analysis. For this reason, all VMs require the same amount of energy to run at peak load or when idle, while they differ in the class of requests that they can process and in their total number per time period. The values of energy consumed for migrating a VM are taken from an experiment designed to estimate the consumption of the servers involved (host/destination) due to live migration [9]. Although the model is tested here on homogeneous data centers, the formulation can accommodate heterogeneity: each class of VMs can be assigned to a specific physical server configuration, so that different classes of VMs take different energy consumption values.

Another important parameter is the energy cost, which varies over time, with peak hours that do not occur simultaneously in different time zones. In this paper, we used as input the average day-ahead energy prices in different markets around the world, including GME (Gestore dei Mercati Energetici) in Italy, New England Market and PJM in California, USA, SEMO in Ireland, and many others. Energy prices were collected and averaged over October 2014. Table 4 reports the list of market managers considered. The resulting energy costs vary between 10 and 65 Euro/MWh. Figure 8 shows the energy price trend for each macro area.

Table 4 Energy market managers considered in the paper
Fig. 8 Average energy prices for different macro areas during 1 day

We assume that data centers are fully connected, and we consider link capacities varying between 0.5 and 1 Gbps. A typical link connecting data centers is made up of both physical lines (such as optical fiber) and network components (such as routers and switches). Therefore, we estimate the energy cost of each link as the cost of the energy consumed by its routers, in proportion to the bandwidth in use. To determine the number of routers on each link, we used a traceroute application to count the hops between two nodes. We also considered a single reference router, a Juniper E320, with a maximum power consumption of 3.84 kW [73]. The other router-related parameter values are listed in Table 6.

We built workloads based on a trace of requests registered at the website of a large university. This trace was collected hourly over 1 year from sessions registered on 100 servers. To generate the workload, we consider the total number of Internet users in each country where a data center is located [74]; then, based on the number of Google searches performed worldwide and the percentage of Internet users in that country, we estimate the request rate for each data center.

For VM types, we consider 3 classes of VMs, where each class is able to serve 5 types of requests. The difference between these three classes is mainly the size of the VM and the size of its disk images, while each type of request has different bandwidth requirements. For VM sizes and related parameters, we took inspiration from Amazon EC2 Instance Types [75]. Disk image size varies depending on the type of VM. In the considered scenario, we assume that the VM types are not storage intensive; therefore, we consider disk image sizes between 0.5 and 20 Gb. The latency factor is set to 1 in the following tests. The considered values of each class are summarized in Table 5.

Table 5 VMs settings

In order to estimate the total amount of green energy produced by each data center during a single day, we multiply the average energy produced by a green plant per square meter by the average data center size, which we vary between 450 and 10,000 m\(^2\) [76]. Moreover, we consider that data centers are equipped with Li-ion (lithium-ion) batteries with an overall capacity of 1486 Ah. Batteries of this kind have a C-rate of around 73 Ah per module, so with a voltage of 14.8 V, a module can charge 1.08 kWh in 1 h, which is almost the full capacity of a module. Energy charging and discharging efficiencies are both set to 88% [77]. Table 6 summarizes the parameter settings used to test the model.
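
The per-module figure quoted above follows from multiplying the charge rate by the voltage. A short check, with a hypothetical helper name:

```python
def module_energy_kwh(charge_ah, voltage_v):
    """Energy in kWh for a given charge in ampere-hours at a given voltage."""
    return charge_ah * voltage_v / 1000.0

# 73 Ah at 14.8 V gives 1080.4 Wh, i.e. about 1.08 kWh charged in one hour.
print(round(module_energy_kwh(73, 14.8), 2))  # 1.08
```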

Table 6 Parameters settings

4.2 Numerical Results

We used the commercial solver IBM ILOG CPLEX 12.1 as the MILP optimization solver [78]. The model was run on an 8-core 2.4 GHz Intel Xeon server with 96 GB of RAM. To evaluate the energy saving of joint optimization and compare it with traditional strategies that separate data center and network energy management, we considered the proposed model, denoted by Global Green, and two alternative scenarios.

In the Servers Only scenario, we optimize only the server side without any optimization of the network. In other words, network equipment is considered to be turned on all the time, without any energy management strategy. The second scenario, called Separated, consists of two separate energy management strategies, one for the data centers and one for the network. In this case, data centers collaborate to minimize their energy consumption without considering that the traffic they exchange can have an impact on the network energy consumption, while the network side turns routers on and off based on the amount of traffic imposed by DC load balancing. The third scenario, called Global Green, represents our proposed approach.

Considering that our model is solved based on 1 day horizon, all of the following results represent the behavior of the system in 1 day, energy expenditures included.

Figure 9 shows the energy costs of the different scenarios for different values of traffic (number of requests per day). We also report the energy savings (in percentage) of the Global Green scenario with respect to the other two. These costs are the objective function values resulting from the optimization described in Sect. 3.2. The joint optimization approach (Global Green) can reduce the energy cost by up to 70% compared to the cost without network power management. The reason is the non-negligible energy saved in the network by turning routers on and off according to traffic dynamics. The savings can be up to 34% compared to the Separated case, because solving the two problems separately leads to suboptimal solutions and because fixed traffic values are used for the optimization of the network. As expected, the savings tend to decrease as traffic increases, since we are forced to keep more system nodes (servers and routers) active to accommodate a larger number of requests, and we have less room (a smaller space of admissible solutions) for optimizing energy consumption.

Fig. 9 Comparison of the optimal daily cost of energy for the different optimization approaches

In Fig. 10 we plot the energy consumption of the same three scenarios. We observe a significant saving of about 43% for the joint model compared to the Servers Only case, owing to the network energy consumption. On the other hand, the energy consumption of the Separated and joint models is comparable. The reason is that the objective function considers energy costs rather than energy consumption. Since the locally generated green energy is available for free, the system tends to exploit it as much as possible, also using storage to better match production periods with consumption.

Fig. 10 Comparison of the optimal daily energy consumption for the different optimization approaches

To better investigate the behavior of the proposed joint optimization model, we performed a series of further tests aimed at understanding the contribution of different system features, such as the geographical distribution of data centers with different energy prices and green energy availability. For this purpose, we considered three additional scenarios. In the Brown Base scenario, we consider neither load balancing between data centers nor the use of green resources. In the Green Base scenario, we introduce the local use of green resources but without transferring load between data centers. Conversely, in the Global Brown scenario, we exploit traffic distribution among data centers but without green energy generation. Note that in scenarios where green energy production is considered, we also use storage to optimize its use over time.

Before analyzing the results of the models, it is worth considering their computational complexity. Figure 11 shows the average execution time required by CPLEX to solve the models using different workload configurations. We observe that in all cases it is possible to solve the problem within a few minutes, even for a very large number of user requests. For small instances, i.e., 2 billion requests, the solver takes around 1 min for the Global Green scenario and half a minute for the Global Brown scenario. For the two base cases (Brown and Green), the solution time is always less than half a second, although the problems become infeasible for instances with more than 16 billion requests, due to the capacity limits of data center servers. In the worst case, with very high traffic compared to capacity (40 billion requests per day), solving the joint optimization problem took 4.52 min, which is very good for a 24 h traffic planning horizon.

Fig. 11 Solving time

Figure 12 shows the energy costs obtained in the scenarios mentioned above for different traffic levels. On top of each bar, we indicate the percentage saving of the Global Green model compared to the model indicated by the bar. A significant cost reduction is achieved through collaboration between data centers using VM migration. Moreover, with the cooperative and jointly optimized schemes the available capacity of the Cloud system is higher than with the non-cooperative schemes, since for high load levels the Brown Base and Green Base models are not feasible. We also notice that using the cooperative model without green energy (Global Brown) still provides non-negligible savings compared with the non-cooperative approach (Brown Base).

Fig. 12 Overall cost comparison

To better understand how the Global Green model uses energy in the system, we can analyze the split of energy use in Figs. 13 and 14. The amount of green energy is limited by the capacity of the generators considered in our instances; it is a significant portion of the energy used only at low traffic levels, while it becomes rather small at high load. Taking a closer look at the optimal solutions, we notice that in different time periods the system tends to saturate the capacity of sites where energy is cheaper and green energy is available by migrating VMs to them, and uses the other sites for the load exceeding that capacity, as long as the savings are significant compared to migration costs. As load increases, the capacity of cheap and green sites tends to be saturated by local demand, and the cost savings decrease since migration is used mainly for load balancing.

Fig. 13 Energy consumption split

Fig. 14 Energy split percentage

Figure 15 shows the number of migrated VMs in both the Global Green and Global Brown scenarios. As expected, it is proportional to the number of received requests. Even though the VM migration process itself costs energy, the overall cost saved is more significant. We can also notice that, when green sources of energy are present, more VMs are migrated to exploit them, unlike the brown case where we only benefit from the price differences between locations.

Fig. 15 Number of virtual machines migrated

While our model can achieve significant savings in the total power cost, it may cause the consumption of larger amounts of energy, as shown in Fig. 16. The reason is that, besides data center servers, we use network devices for VM load balancing, and these consume additional power. On the other hand, our model is better at exploiting green energy by migrating VMs, as reported in Fig. 17. Therefore, even with the additional energy consumed, the proposed model is greener because it replaces brown sources of power with green ones.

Fig. 16 Total energy consumption

Fig. 17 Green energy usage

5 Conclusion

Most existing work on energy optimization in Cloud systems manages data center servers and their interconnection network separately. In this paper we presented a new optimization framework based on MILP for the joint management of Cloud data centers and their network.

The proposed model considers a set of data centers geographically distributed over different locations around the world. Data centers collaborate by migrating VMs between them when necessary to exploit different energy prices in various time zones. Another factor that we consider is the availability of green energy resources in some data centers and the possibility to store this energy using batteries.

In Cloud scenarios, migrating VMs between different sites requires additional network resources due to the size of the VMs themselves and their data. Besides managing both data centers and their network, we also manage the use of both brown and green energy. Our strategy consists in redirecting the load to sites with more available green energy. We also suppose that some data centers are able to store the generated energy for later use; we can therefore save the clean power and use it when generation is not possible or during peak energy price periods, which addresses the intermittent availability of green energy.

We show that the proposed optimization model can be solved using a state-of-the-art MILP solver (CPLEX) in a reasonable time, even for large instances. The obtained results are very promising and show that our approach allows significant cost savings compared to the base scenarios in use today. Moreover, from an environmental point of view, our model reduces greenhouse gas emissions by pushing the Cloud to use more green power resources, while also optimizing their use in each data center through local energy storage.