1 Introduction

Cloud computing is a modern direction for computing as opposed to conventional desktop computing. This model has received significant attention from both industry and academia and has emerged as a widely accepted style of computing [1]. The cloud computing model offers a number of remarkable advantages over desktop computing, including location-independent access to services, rapid elasticity, measured service, minimal capital investment, and lower maintenance cost.

A cloud service is composed of a set of data centers around the globe. Each data center contains thousands of servers and resides at a different geographic point for performance and reliability purposes, while connected to the other data centers with high-speed telecommunication links [2]. High operational costs, such as electricity, cooling, and footprint costs, are the leading challenge for today's cloud data center providers. Managing operational cost enables cloud providers to offer affordable and competitive services to their customers [3]. However, among these costs, the rampant electricity cost constitutes the largest fraction of the total cost of a data center and needs to be properly controlled [4,5,6,7]. The steady increase of energy consumption has become a real concern in operating a data center. One of the main reasons for high energy consumption is inefficient management of resources. In particular, most servers suffer from severe underutilization, which is a major cause of power overconsumption [8, 9].

In modern data centers, virtualization as a primitive technology provides a potential solution to enhance resource utilization while guaranteeing performance and isolation for cloud applications [10]. Even though virtualization offers a potential platform to improve resource utilization, some issues still obstruct the maximum utilization of hardware. One of these issues is the inefficient deployment of virtual machines in a virtualized data center [11]. Given a data center with an enormous number of PMs, the basic challenge is how to map each input VM to the most suitable PM so that the total electricity consumed by all active PMs is minimized and maximum resource utilization is achieved for each individual PM. Optimal placement of virtual machines is one of the prevailing approaches for efficient use of hardware in a data center and is regarded as a core resource-management problem in cloud computing. Nonetheless, VM placement is a complex computational task, recognized as an NP-hard combinatorial optimization problem [12]. Despite past research, the problem still needs more attention and further research effort should be devoted to this domain. This paper delineates the characteristics and significance of VM placement and presents a comprehensive review of different approaches for solving it. These approaches either optimize a sole VM placement objective function or take a multi-objective optimization approach. For each of the two categories, various research works are reviewed. In addition, VM placement solutions are discussed based on other perspectives and properties. Finally, a number of open issues are listed which can serve as a platform for further research in this domain. It is worth mentioning that many of the research works discussed in this paper could also apply to private data centers, as they are not exclusively designed for a cloud computing environment.

While, to the best of our knowledge, there are few related survey articles in the domain of this article, what makes the present paper distinct is its concentration on the optimization perspective. Because the majority of existing surveys overlooked or paid little attention to the optimization aspect, we discuss the problem in detail within a single/multi-objective optimization framework. Two common approaches for dealing with VM placement with multiple objectives, the weighted-sum approach and the Pareto-based approach, are clearly explained, and existing works are discussed, classified, and summarized in accordance with these two approaches. Furthermore, special attention is paid to the heuristic methodologies proposed to deal with the problem in existing works. The heuristic methods presented in the literature are described, analyzed, and compared, including their strengths and weaknesses. Lastly, unlike the majority of existing surveys, which were published a few years prior to this study, the present work covers the state-of-the-art research conducted within the domain. In the following, a few of the most recent and closely related survey articles are briefly compared to our present work.

Usmani and Singh [13] presented a study of VM placement techniques used in a green cloud. That survey's focus is solely on improving energy efficiency; it does not cover the range of VM placement objectives, such as network traffic, resource wastage, and response time, that are discussed in our work. Also, unlike the wide range of optimization methods presented in the present paper, only a limited number of mostly deterministic algorithms were discussed.

In another related survey, Masdari et al. [14] provided a review of VM placement schemes for cloud computing. In contrast to our work, the authors did not address VM placement from the optimization point of view, along with its theoretical foundation and specification in single- and multi-objective variations. Moreover, that survey does not present the advantages and disadvantages of each existing method as presented in our work. Our work, by contrast, provides an inclusive and well-structured taxonomy of the different properties of VM placement, along with a detailed and organized summary of single-objective and multi-objective methods.

A systematic review of VM placement was conducted by [15]. Our work significantly differs from that study, which is a systematic and concise review of the relevant literature: it mainly focuses on collecting, organizing, and quantitatively summarizing the literature in a systematic way rather than providing a detailed analysis, discussion, and conclusion of the various VM placement schemes.

Pietri and Sakellariou [16] also surveyed the body of literature related to mapping VMs onto PMs. Their survey is structured on the basis of four factors: VM configuration, VM placement, optimization objectives, and application metrics and tools. The optimization objectives are classified into three categories: resource utilization, monetary units, and energy consumption. However, the survey's concentration on the optimization aspect of the problem (including its theoretical representation, the different approaches to the problem, and detailed methodology) is minimal.

The rest of this paper is organized as follows: in the next section, a background of the problem is presented, including a summary of the cloud computing paradigm, virtualization technology, and the significance of the energy consumption concern in modern large-scale data centers. Section 3 defines the VM placement problem along with its characteristics. Section 4 reviews the state-of-the-art VM placement strategies, classified into single-objective and multi-objective VM placement. Section 5 discusses the VM placement problem from different perspectives on the basis of the taxonomy presented in Fig. 19. A number of open issues for future research are listed in Sect. 6. Lastly, Sect. 7 concludes this paper.

2 Background

This section commences with an introduction to the concept of cloud computing as the modern paradigm of computing, and particularly Infrastructure as a Service (IaaS) as its underlying service. Then, an introduction to virtualization as the fundamental technology of cloud computing is provided. Thereafter, the importance of escalating energy consumption in today's modern data centers is highlighted. Finally, in the last subsection, a general architecture for VM placement in a data center is illustrated.

2.1 Cloud computing paradigm

In the cloud computing paradigm, rather than owning a local computing asset, users rely on a service they receive from a distant high-performance provider [17]. A cloud facility is administered by a Cloud Service Provider (CSP) and consists of a large number of servers spread over different geographical points around the world. Cloud services are delivered via three typical models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).

Figure 1 shows the different cloud service delivery models with examples for each model. In the SaaS model, an application is hosted on the cloud server and users are given on-demand access to a ready application [18]. Examples of SaaS are Google Apps, Dropbox, and Salesforce.com applications. In PaaS, developers make use of an online programming framework to develop their applications without the cost of the underlying software and hardware [18]. Heroku and Google App Engine are two instances of PaaS. In IaaS, computing equipment such as processors, memory, network, and storage is abstracted and delivered to users as a service. Examples of IaaS are Amazon EC2 and Google Compute Engine. As the IaaS environment serves as the platform for VM placement, this paper mainly focuses on the IaaS cloud system.

Fig. 1 Different service delivery models in cloud computing

IaaS has emerged as one of the most popular and powerful services in cloud computing [19]. An IaaS cloud provides on-demand and scalable service to users through a large shared pool of computing resources in the form of VMs, while users are charged based on a pay-as-you-go pricing model similar to utilities such as water and electricity [20].

Due to the elasticity of IaaS services, resources can be instantly and flexibly scaled up/down, so users are not required to anticipate their future hardware demands. This feature gives users the illusion of infinite computing resources without having to establish their own infrastructure [21]. The user specifies the hardware configuration of the requested VM, including processor speed, memory size, disk space, and network bandwidth, and the VMs are created accordingly and instantly run on the cloud provider's infrastructure [22]. From the customer's standpoint, this approach results in substantial cost effectiveness since capital expenditure is eliminated. Moreover, improved reliability is gained because the service is provided by the provider's robust infrastructure. With the recent rapid development of the IaaS market, this model has become a remarkable alternative to local ownership of computing infrastructure, and a number of prestigious companies such as Netflix have shifted to the IaaS cloud instead of owning dedicated servers [23]. Examples of real-world IaaS providers are Amazon EC2, GoGrid, and Rackspace Cloud. An IaaS cloud system leverages virtualization technology to manage the underlying computing resources and deliver flexible and dynamic service to customers [24]. A schematic of an IaaS cloud is shown in Fig. 2.

Fig. 2 IaaS cloud system model

2.2 Virtualization technology

Virtualization was first introduced in the late 1960s by IBM to make efficient use of the expensive hardware of that time, and it was mostly applied to the desktop sector [6, 25, 26]. A large underutilized mainframe's hardware was logically partitioned into slots, which enabled users to use the resource in a time-sharing fashion [27]. Even though today's computer hardware is not as expensive as before, virtualization is still applied as a technique to divide the hardware resources of a single computer into multiple segregated computing environments [28]. Modern data centers also make use of virtualization for other advantages [29]: reduced management complexity [29], portability, encapsulation, isolation, and efficient utilization through server consolidation [30]. Virtualization facilitates the way an administrator manages a data center. The isolation property prevents malicious applications, security flaws, and software failures from affecting other co-located machines [29, 31]. Encapsulation ensures that the whole state of a machine, as an image file, can be cloned or migrated to another host for the purpose of load balancing, scheduling, and fault tolerance in case of hardware failure [28, 31]. Virtualization also makes applications and services easily portable across heterogeneous hosts in different geographical locations, and it provides finer-grained resource allocation [26]. However, the main application of virtualization in a data center is preventing server sprawl and making efficient use of hardware through server consolidation [25]. Consolidating many underutilized servers into a small number of hosts results in significant savings in energy cost [28].

In the virtualization concept, on top of the underlying hardware layer, a control program called a hypervisor or Virtual Machine Monitor (VMM) (such as VMware, KVM, Hyper-V, or Xen) creates, executes, and manages virtualized instances of a machine. Such a virtual instance, called a virtual machine (VM), contains its own operating system and applications and acts as the logical equivalent of a physical machine [32]. Applications installed on top of the virtualized machine are expected to perform identically as if they were installed on a real physical machine. The hypervisor provides a transparent underlying hardware layer to the virtualized software, making applications independent of a specific type of hardware and therefore resilient to future hardware changes [6]. Multiple virtual machines can co-reside on a single physical machine, and each virtual machine encapsulates a complete operating system bundled with the necessary applications.

2.3 Significance of energy consumption in modern data centers

The energy consumed in large-scale data centers has risen dramatically in recent years. This alarming growth of energy consumption has caused deep concern in industry and research communities. Kaplan et al. [33] note that a data center consumes, on average, as much electricity as 25,000 households. During 2008–2010, data centers in the United States consumed the energy provided by ten nuclear power stations [34]. It is reported that Google data centers consume power equal to the total consumption of a small city like San Francisco [35]. Gartner estimates that energy cost for the IT industry will grow from 10 to 50% in the next few years [36]. Apart from the directly associated electricity cost, over-consumption of electricity also requires larger expenditure on air conditioning for computer systems, since cooling cost is proportional to the energy spent for computation. Moreover, emission of carbon dioxide (CO2) is the negative environmental effect of excessive energy consumption, and it is one of the causes of global warming and climate change. According to Gartner [37], 2% of global CO2 emissions come from the IT industry. All these facts motivate researchers to move toward green data centers by minimizing the level of energy consumed in data centers.

So far, two approaches have been suggested for controlling the increasing level of electricity consumed in data centers. The first approach focuses on the design and manufacture of energy-efficient computer hardware such as processors, memory, and disks, while the second approach relies on the efficient management of existing hardware systems; that is, the latter approach tends to make efficient use of the available hardware resources. One of the leading reasons for energy over-consumption in data centers is underutilization of hardware resources. A study [6] shows that the utilization level in a data center is between 10 and 50%. Underutilization often occurs as a result of over-provisioning, which is the allocation of more than enough computing resources to the input workloads: due to the uncertainty of future demands, resources are initially allocated based on the peak resource requirement of applications. Energy consumption and low utilization are two correlated issues, since resources with low utilization still consume a non-trivial amount of power [38]. A larger number of underutilized servers in a data center results in higher energy consumption, greater expense for cooling systems, and extra required footprint. Therefore, consolidating a large number of poorly utilized machines onto a minimum number of fully utilized ones contributes to cutting down the amount of electricity used in a data center.

2.4 System architecture

The context for VM placement is a large-scale data center with numerous physical servers inter-connected by high-speed telecommunication links. Each server is characterized by the capacity it offers for each resource type, such as CPU, memory, disk storage, and bandwidth. Figure 3 shows the architecture of the system. The main roles in such systems are the cloud provider, the client, and the end-user [39].

Fig. 3 System architecture

  • The cloud provider is the owner of the infrastructure; it manages the data center and its resources and leases them to clients.

  • The client leases an infrastructure in the form of VMs for a certain period of time from a cloud provider.

  • The end-user uses the applications run on the cloud infrastructure.

It should be mentioned that, in the case of multiple independent public cloud providers, an additional entity often called a Cloud Broker provides an intermediation service and allows clients to deploy their VMs across multiple providers [40]. However, the common interaction between the above roles is as follows [39]:

  • A client plans to run an application on the cloud infrastructure and submits a request for the provisioning of a set of VMs with a pre-determined hardware specification.

  • The cloud provider creates the requested collection of VMs accordingly.

  • Having a determined set of VMs, the cloud provider decides how to assign each VM to the most suitable PM.

  • Once the VMs are deployed on different PMs in the data center, they are available to serve requests from end-users.

A piece of software called the Virtual Machine Manager (VMM) continuously monitors the available resources in each PM and, based on its placement algorithm, places a requested set of VMs on a certain subset of the available PMs in a data center. The virtual machine manager is comprised of two modules: the local manager and the global manager [41]. The local manager operates on each individual PM; its role is to monitor the current residual resource capacity of the PM, together with its resource utilization, and to report the statistics to the global manager. The global manager module resides on the master PM and receives and compiles the information coming from each local manager. The global manager processes these reports to build a global picture of current resource usage in the whole data center. In addition, the global manager is responsible for optimizing the VM placement.

3 Problem definition

The virtual machine placement problem is defined as follows: given a set \(V = \left\{{v_{1}, v_{2}, \ldots, v_{m}} \right\}\) of virtual machines and a set \(P = \left\{{p_{1}, p_{2}, \ldots, p_{n}} \right\}\) of physical machines, the goal is to find a specific mapping of the VMs in \(V\) onto the PMs in \(P\) that minimizes/maximizes certain predefined objective(s). The most common objective is minimizing the number of running PMs; however, other objectives, such as network traffic, can also be defined. Each VM demands a different amount of each resource type (i.e., CPU, memory, disk space, and network bandwidth). On the other hand, each PM has a certain capacity of each resource type. It is assumed that no VM demands more resources than a single PM can offer [42].

For a large data center with thousands of PMs, assigning VMs to PMs is an intricate decision-making task for a human administrator. This is due to the presence of an enormous number of potential mappings, while only one or a few of these mappings result in an optimal value of the predetermined objective. Theoretically, VM placement is known to be an NP-hard combinatorial optimization problem [12, 43,44,45], since there is no provably efficient algorithm to solve it [46]. A search for a solution must be conducted within a large space of possible mappings. Exact algorithms, which provide the optimal solution, often take a long time to produce it; therefore, in practice, an approximate algorithm is employed to deliver a near-optimal solution in reasonable computation time.

3.1 Objectives

Although VM placement has been addressed in the literature with respect to a variety of objectives from different perspectives, the most common objective is minimizing the number of active PMs. This is based on the underlying assumption that the consumed energy is proportional to the number of powered-on PMs in a data center. Reducing the number of active PMs also contributes to reducing the server footprint and capital investment in a data center [25]. However, VM placement can have other objectives, such as power consumption or inter-communication among a set of VMs. A list of different objectives for VM placement based on the literature is presented in Tables 1 and 2. In general, VM placement is defined with a single objective or multiple objectives, as shown in Fig. 4. As the name suggests, single-objective VM placement is the optimization (maximization or minimization) of one objective. In the multi-objective form of VM placement, two or more objectives are to be optimized simultaneously. For instance, minimizing the resource wastage in physical machines along with minimizing the power consumption is a multi-objective virtual machine placement problem with two objectives.

Table 1 Summary of different single objective VM placement schemes
Table 2 Summary of multi-objective VM placement schemes
Fig. 4 A general formulation of the VM placement problem

3.2 Constraints

Besides objectives, the search space for VM placement can be restricted when constraints are introduced (as shown in Fig. 4). These constraints are of two types: basic constraints, which serve as intrinsic assumptions of the problem, and additional technical constraints, which are added in accordance with a practical application or specific requirement.

In the following, the basic constraint and additional constraints are listed.

3.2.1 Basic constraints

  • I: Each VM can be hosted by exactly one PM:

    $$\sum\limits_{{j = 1}}^{n} {x_{{ij}} } = 1,\quad \forall i:1 \le i \le m$$
    (1)

    where \(x_{ij} \in \left\{{0,1} \right\}, 1 \le i \le m, 1 \le j \le n\) is 1 if VM i is assigned to PM j and 0 otherwise.

  • II: For each type of resource (e.g. CPU, memory) and for each active PM, the sum of the resource demands for all the VMs sharing that PM should not exceed the capacity of the PM:

    $$\sum\limits_{{i = 1}}^{m} {d_{{ri}} } x_{{ij}} \le c_{{rj}} y_{j} ,\quad \forall j:1 \le j \le n$$
    (2)

    where \(d_{ri}\) is the demand for resource type r by VM i, \(c_{rj}\) is the capacity of resource type r offered by PM j, and \(y_{j}\) is 1 if PM j is active/powered-on and 0 otherwise.
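
As a concrete illustration of constraints (1) and (2), the following sketch checks a candidate placement for feasibility. It is a minimal example in Python; the representation (an `assignment` list that encodes constraint (1) directly) and all names are ours, not from any cited work.

```python
def satisfies_basic_constraints(assignment, demands, capacities):
    """Check the basic constraints for a candidate VM placement.

    assignment[i] = j  means VM i is placed on PM j; this representation
                       makes constraint (1) hold by construction.
    demands[i][r]      is the demand of VM i for resource type r.
    capacities[j][r]   is the capacity of PM j for resource type r.
    """
    n_resources = len(capacities[0])
    used = {}  # PM index -> accumulated demand vector (constraint (2))
    for i, j in enumerate(assignment):
        acc = used.setdefault(j, [0.0] * n_resources)
        for r in range(n_resources):
            acc[r] += demands[i][r]
    return all(used[j][r] <= capacities[j][r]
               for j in used for r in range(n_resources))

# Example: 3 VMs, 2 PMs, resource types = (CPU, memory)
demands = [(2, 4), (1, 2), (3, 1)]
capacities = [(4, 8), (4, 4)]
print(satisfies_basic_constraints([0, 0, 1], demands, capacities))  # True
print(satisfies_basic_constraints([0, 0, 0], demands, capacities))  # False: CPU 6 > 4
```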

3.2.2 Additional constraints

In addition to the above basic constraints, additional constraints may also apply. These constraints are as follows [45]:

  • III: The number of VMs assigned to a particular PM j is limited to a certain number:

    $$\sum\limits_{{i = 1}}^{m} {x_{{ij}} } \le t_{j} ,\quad \forall j:1 \le j \le n$$
    (3)

    where \(t_{j}\) is the maximum number of VMs that PM j can host.

  • IV: A subset S of the VMs may need to be assigned to different PMs for security or technical purposes:

    $$\sum\limits_{{i \in S}} {x_{{ij}} } \le 1,\quad \forall j:1 \le j \le n$$
    (4)
  • V: A subset S of the VMs may need to be assigned to the same PM in order to facilitate inter-application communication or other requirements:

    $$\sum\limits_{{i \in S - \left\{ e \right\}}} {x_{{ij}} } = \left( {\left| S \right| - 1} \right) \cdot x_{{ej}} ,\quad e \in S,\forall j:1 \le j \le n$$
    (5)
  • VI: A particular VM \(v\) may need to be assigned to a subset \(R\) of the PMs. This is because \(v\) requires a certain hardware specification, such as storage or network bandwidth, that is only provided by the PMs in \(R\):

    $$\sum\limits_{{j \in R}} {x_{{vj}} } = 1$$
    (6)
  • Also, any of the problem objectives can act as a constraint if it does not appear as an objective; for example, minimizing the number of PMs while keeping the inter-VM traffic within a given bound.
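
To make constraints IV and V tangible, a small hedged sketch follows, reusing the `assignment` representation from the previous example; the helper names are hypothetical.

```python
def satisfies_anti_affinity(assignment, group):
    """Constraint IV: the VMs in `group` must land on pairwise distinct PMs."""
    hosts = [assignment[i] for i in group]
    return len(hosts) == len(set(hosts))

def satisfies_affinity(assignment, group):
    """Constraint V: all VMs in `group` must share a single PM."""
    return len({assignment[i] for i in group}) == 1

assignment = [0, 1, 1, 2]                              # VM index -> PM index
print(satisfies_anti_affinity(assignment, [0, 1, 3]))  # True: PMs 0, 1, 2
print(satisfies_affinity(assignment, [1, 2]))          # True: both on PM 1
```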

3.3 Solution

A potential solution to the VM placement problem with m VMs and n PMs can be represented by the matrix shown in Fig. 5. However, among the potential solutions, only those that satisfy the basic constraints are feasible. As can be inferred from the matrix in Fig. 5, the number of different combinations is \(2^{m \times n}\), as each cell takes either 0 or 1. In fact, \(2^{m \times n}\) is the size of the search space for a typical VM placement problem, and it is also the (exponential) time complexity of a brute-force algorithm that enumerates all possible solutions in the search space to find the optimal one. This is an indication of the intractable nature of the VM placement problem.
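
The exponential growth of this search space is easy to demonstrate numerically; the snippet below (illustrative only) contrasts the raw number of 0/1 matrices with the number of mappings that respect constraint (1), i.e. n choices per VM.

```python
# Raw solution space (all 0/1 matrices) versus the space restricted by
# constraint (1): each VM assigned to exactly one PM.
for m, n in [(2, 2), (4, 3), (10, 5)]:
    raw = 2 ** (m * n)   # all possible placement matrices
    feasible = n ** m    # matrices with exactly one 1 per row
    print(f"m={m:2d} VMs, n={n} PMs: 2^(m*n) = {raw:.3e}, n^m = {feasible}")
```

Even for 10 VMs and 5 PMs the raw space already exceeds 10^15 matrices, which is why exact enumeration is abandoned in favor of heuristics.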

Fig. 5 Matrix representation of potential solutions to the VM placement problem

4 State-of-the-art VM placement strategies

In this section, the state-of-the-art VM placement approaches are classified into two categories, namely single-objective and multi-objective approaches. For each category, a number of salient methods from the literature are described according to the particular objective used.

4.1 Single objective VM placement

In the following, a number of eminent works that address single-objective VM placement are reviewed according to their specific objectives. At the end of this section, a summary of the discussed schemes, together with their corresponding objectives and their assumptions/drawbacks, is presented in Table 1.

4.1.1 Number of PMs

One of the most common and intuitive approaches to reducing power consumption is minimizing the number of active PMs in a data center. In such approaches, power consumption is deemed proportional to the number of PMs [7]. This objective can be represented by the following formula:

$$\text{minimize} \sum\limits_{{j = 1}}^{n} {y_{j} }$$
(7)

where \(y_{j}\) is 1 if PM j is active/powered-on and 0 otherwise. Figure 6 shows an example of minimizing the number of PMs in a data center. The upper part is a data center with five underutilized PMs, whereas the bottom part is the same data center with a minimum of two highly utilized active PMs and three switched-off servers. The load of the three switched-off PMs has been transferred to the active PMs.

Fig. 6 Minimizing the number of PMs
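
A simple baseline for this objective is the First Fit Decreasing (FFD) heuristic referred to throughout this section. The sketch below is a minimal single-resource version with hypothetical names; real placements are multi-dimensional, as discussed in Sect. 3.

```python
def first_fit_decreasing(vm_demands, pm_capacity):
    """Greedy FFD: sort VMs by demand (descending) and place each on the
    first PM with enough residual capacity, powering on a new PM if none
    fits.  Returns one list of VM demands per active PM."""
    residual, placement = [], []
    for demand in sorted(vm_demands, reverse=True):
        for j, free in enumerate(residual):
            if demand <= free:
                residual[j] -= demand
                placement[j].append(demand)
                break
        else:  # no existing PM fits: power on a new one
            residual.append(pm_capacity - demand)
            placement.append([demand])
    return placement

print(first_fit_decreasing([50, 70, 30, 20, 30], pm_capacity=100))
# [[70, 30], [50, 30, 20]] -> two active PMs instead of five
```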

Tang et al. [47] proposed a dynamic forecast scheduling algorithm called VM-DFS for VM placement. The problem is formalized as a bin packing problem and the FFD algorithm is employed to solve it with the objective of minimizing the number of active PMs. VM-DFS uses a prediction model to forecast the future memory consumption of VMs with dynamic memory demand and places VMs on the most suitable PMs based on their predicted future consumption. The results of simulation experiments in CloudSim show that VM-DFS reduces the number of active PMs compared to the default static algorithm in CloudSim. However, the proposed algorithm considers only memory usage; the impact of other important resources such as CPU and bandwidth is overlooked.

Liu et al. [48] developed an Ant Colony System (ACS) based algorithm named OEMACS, coupled with a local search technique, to minimize the number of active PMs in a cloud data center. The problem formulation considers only two resource types: CPU and memory. The performance of OEMACS was assessed by comparing its results to those of FFD [49], RGGA [50], ACO, and MACO in a series of experiments with both homogeneous and heterogeneous servers. The results indicate that OEMACS finds an optimal or quasi-optimal solution in both homogeneous and heterogeneous data center environments in terms of the number of active servers and average memory and CPU utilization.

Yan et al. [51] presented SOWO, a discrete particle swarm optimization (PSO) based VM placement algorithm to minimize the number of active PMs in a cloud data center. To reduce the computational complexity of the problem, SOWO focuses only on CPU and memory as the two most critical system resources. For each PM, the total workload after placing k VMs on that PM is defined as:

$$\frac{1}{{1 - \left({cpu_{p} + \mathop \sum \nolimits_{i = 1}^{k} cpu_{i}} \right)}} \times \frac{1}{{1 - \left({mem_{p} + \mathop \sum \nolimits_{i = 1}^{k} mem_{i}} \right)}}$$
(8)

where \(cpu_{p}\) and \(mem_{p}\) are the current CPU and memory load of the PM, respectively, and \(cpu_{i}\) and \(mem_{i}\) are the CPU and memory utilization of VM \(v_{i}\). The authors used this workload formulation as the fitness function for their PSO-based algorithm.
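
Under the stated assumption that all utilizations are fractions of capacity in [0, 1), Eq. (8) translates directly into code; a minimal sketch (function name ours):

```python
def sowo_workload(cpu_p, mem_p, vm_cpu, vm_mem):
    """Eq. (8): workload of a PM after adding k VMs.  The product grows
    sharply as either CPU or memory utilization approaches 1."""
    cpu_total = cpu_p + sum(vm_cpu)
    mem_total = mem_p + sum(vm_mem)
    assert cpu_total < 1 and mem_total < 1, "PM would be overloaded"
    return 1.0 / (1.0 - cpu_total) * 1.0 / (1.0 - mem_total)

# A PM at 25% CPU / 25% memory receiving two small VMs:
print(sowo_workload(0.25, 0.25, [0.125, 0.125], [0.125, 0.125]))  # 4.0
```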

SOWO was implemented as a scheduler in OpenStack, and experiments were conducted to compare SOWO with the native OpenStack scheduler, known as the Filter Scheduler. In the experiments, ten homogeneous PMs with identical specifications were used along with four types of VM templates. The results show that SOWO is capable of using fewer PMs than the native OpenStack scheduler. In terms of computational time, the two methods behave almost identically as the number of input VMs increases. However, when it comes to resource utilization, the OpenStack native scheduler performs in a more stable and balanced manner than SOWO.

4.1.2 Power consumption

The power consumption of a server in a data center is often determined by the sum of the power consumption of all the main hardware components of that server: CPU, memory, disk storage, and network adapter. However, the CPU consumes the largest fraction of the energy compared to the other hardware parts [52]. Recent studies [42, 52,53,54,55] report that there is a linear relationship between the power consumption of a server and its CPU utilization level. Furthermore, a server in its idle state still consumes about 70% of the power it draws when operating at maximum capacity. Hence, power consumption is usually defined as a function of CPU utilization, as shown in the following formula [30, 41, 52, 56,57,58,59]:

$$P\left(U \right) = P_{idle} + \left({P_{busy} - P_{idle}} \right) \times U$$
(9)

where \(P_{idle}\) is the power consumption in the idle state, \(P_{busy}\) denotes the maximum power consumption when the CPU is fully utilized, \(U\) is the CPU utilization of the server, and \(P(U)\) is the power consumption of the server at that utilization level. The CPU utilization varies over time due to changes in CPU load, and therefore the total energy consumption \(E\) over a period of time \([t_{a}, t_{b}]\) is calculated as follows [52, 60]:

$$E = \mathop \smallint \limits_{{t_{a}}}^{{t_{b}}} P\left({U\left(t \right)} \right)dt$$
(10)

It is noteworthy that Dynamic Voltage and Frequency Scaling (DVFS) is an effective technology for managing the energy consumption of the processor. This technology allows the processor to operate at variable frequencies with different voltages. However, although DVFS has been widely applied in embedded, multicore, and multiprocessor systems, it is less adopted in virtualized data centers [58]. As a result, the servers in today's data centers are not energy proportional [61].
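
Equations (9) and (10) translate directly into code. The sketch below uses the linear power model with example wattages chosen so that idle draw is 70% of peak, matching the figure quoted above, and approximates the integral with the trapezoidal rule; all numbers are illustrative.

```python
def power(u, p_idle=161.0, p_busy=230.0):
    """Eq. (9): linear power model; u is CPU utilization in [0, 1].
    The defaults give p_idle / p_busy = 0.7 (illustrative values)."""
    return p_idle + (p_busy - p_idle) * u

def energy(utilization_samples, dt):
    """Eq. (10) approximated by the trapezoidal rule over utilization
    samples taken every `dt` seconds; returns joules (watt-seconds)."""
    return sum(0.5 * (power(u0) + power(u1)) * dt
               for u0, u1 in zip(utilization_samples, utilization_samples[1:]))

# One hour of utilization samples at 5-minute granularity:
samples = [0.2, 0.4, 0.9, 0.8, 0.6, 0.5, 0.5, 0.3, 0.2, 0.2, 0.4, 0.6, 0.7]
print(f"{energy(samples, dt=300) / 3.6e6:.3f} kWh")
```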

Verma et al. [62] presented the design, implementation, and evaluation of a placement controller called PMapper to address application and VM placement for a heterogeneous data center. The VM placement is performed in the second phase and is intended to minimize power consumption under a fixed performance constraint in the form of an SLA. PMapper uses an extension of the FFD heuristic (called Min Power Parity, or mPP) in which more power-efficient servers are utilized first.

Beloglazov et al. [52] presented a modified version of the Best Fit Decreasing heuristic, called MBFD, to solve the static form of VM placement. In MBFD, all VMs are sorted in descending order of their CPU demands and each VM is mapped to the PM that has sufficient capacity and incurs the least increase in power consumption from the mapping. The time complexity of the algorithm is \(O(n \cdot m)\), where n is the number of VMs and m is the number of PMs. Abdullah et al. [63] enhanced MBFD by proposing Fast Best Fit Decreasing (FBFD). The main idea of FBFD is to sort the list of PMs in increasing order of CPU utilization using a binary search tree before the placement takes place; after each placement, the list of PMs is sorted again. As a result, VMs are assigned to the most power-efficient PMs first. Since FBFD uses a binary search tree to find the most suitable PM, with a time complexity of \(O(\log_{2} n)\) per placement instead of searching the whole list as in MBFD, its overall time complexity is \(O(m \cdot \log_{2} n)\) compared to \(O(m \cdot n)\) for MBFD. For the dynamic scenario, and to determine when to migrate VMs, a double-threshold policy is introduced: the CPU utilization of a PM should always stay between a lower threshold (Tl) and an upper threshold (Tu). If the total CPU utilization of a PM exceeds Tu, some VMs have to migrate to other PMs to reduce the utilization level and prevent performance degradation. In the event that the CPU utilization of a PM falls below Tl, all the VMs on that particular PM have to migrate to other PMs. The authors also proposed three policies to determine which VMs should be migrated: the Minimization of Migration (MM) policy, the Highest Potential Growth (HPG) policy, and the Random Choice (RC) policy. MM identifies the minimum number of VMs to be migrated according to the current utilization level of the PM together with the two thresholds. RC randomly selects a number of VMs to be migrated. HPG migrates a set of VMs with the lowest CPU usage relative to the requested CPU capacity, as defined by the set S in the following formula:

$$S \mid S \in P\left( {V_{j} } \right),\; u_{j} - \sum\limits_{v \in S} u_{a} \left( v \right) < T_{u},\; \sum\limits_{v \in S} \frac{u_{a} \left( v \right)}{u_{r} \left( v \right)} \to \min$$
(11)

where Vj is the set of VMs already placed on PM j, P(Vj) is the power set of Vj, uj is the current CPU utilization of PM j, ua(v) is the CPU utilization allocated to VM v, and ur(v) is the CPU initially requested by VM v. The simulation results show that, overall, the MM policy provides the best results among the three policies in terms of SLA violations, energy savings, and the number of VM migrations.
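
The core of MBFD as described above fits in a few lines; the sketch below is our reading of the heuristic (CPU-only, power per Eq. (9), helper names ours), not the authors' reference implementation.

```python
def mbfd(vm_demands, pms):
    """Modified Best Fit Decreasing (after Beloglazov et al. [52]):
    sort VMs by CPU demand, place each on the feasible PM whose power
    draw increases least.  pms: dicts with 'util', 'p_idle', 'p_busy'."""
    def pwr(pm, util):
        return pm['p_idle'] + (pm['p_busy'] - pm['p_idle']) * util

    allocation = []
    for demand in sorted(vm_demands, reverse=True):
        best_j, best_rise = None, float('inf')
        for j, pm in enumerate(pms):
            if pm['util'] + demand > 1.0:
                continue  # insufficient CPU capacity
            rise = pwr(pm, pm['util'] + demand) - pwr(pm, pm['util'])
            if rise < best_rise:
                best_j, best_rise = j, rise
        if best_j is None:
            raise RuntimeError("no PM can host this VM")
        pms[best_j]['util'] += demand
        allocation.append((demand, best_j))
    return allocation

pms = [{'util': 0.0, 'p_idle': 160, 'p_busy': 230},   # power-efficient PM
       {'util': 0.0, 'p_idle': 200, 'p_busy': 300}]   # less efficient PM
print(mbfd([0.3, 0.5, 0.4], pms))  # [(0.5, 0), (0.4, 0), (0.3, 1)]
```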

Gao et al. [64] presented a dynamic resource management scheme to minimize the power consumed by the physical infrastructure while also meeting SLA requirements. The proposed scheme leverages both DVFS and server consolidation to reduce power consumption and provide the desired performance guarantee. In particular, a greedy heuristic based on BFD was presented to assign a set of VMs to PMs with regard to the above objective. The experimental validation shows that the proposed scheme yields 50.3% power savings compared to the four other policies considered.

Kessaci et al. [65] proposed a new placement technique to be embedded in OpenNebula [66], a cloud management software. The proposed algorithm, called EMLS-ONC, is based on a multi-start local search metaheuristic and aims to minimize the energy consumption of the entire infrastructure. The multi-start feature gives the algorithm more exploration power within the search space, while the local search adds accuracy. EMLS-ONC was compared to OpenNebula's default scheduler and to FFD. The results show that EMLS-ONC improves on OpenNebula's default scheduler by 26% on average, and improves on an energy-aware FFD-based approach by up to 25%.

The dynamic form of VM placement was also addressed by Ferreto et al. [67]. In particular, the authors investigated mapping VMs with variable resource demands onto the smallest number of PMs. As opposed to the static form, where the VM resource demands remain unchanged during the VM lifetime and VM resource capacities are assigned according to the peak demand, VMs with dynamic demands change their resource demands based on their current and actual need. The dynamic approach, however, may require migrating VMs between PMs in order to remove VMs from an overloaded PM, or to switch off a PM once all its VMs have been moved elsewhere. The authors presented a Linear Programming (LP) formulation to deal with the problem. The whole lifetime of a VM is divided into a number of consolidation steps. In each consolidation step, the placement process is repeated using the VM demands at that particular moment, which may require migrating VMs to different PMs and can also affect the number of required PMs. The key idea behind the proposed approach (called dynamic consolidation with migration control) is to avoid migrating VMs with steady demands; performance degradation due to migration, or QoS deterioration, is the stated justification for prioritizing VMs with fixed demands. The authors also applied the same idea to common heuristics such as FFD, BFD, WFD, and AWFD, with the following modifications to ensure that VMs with steady demands are not migrated: (a) in each consolidation step, map VMs with steady demands to their previously mapped PMs; (b) sort physical servers according to the lexicographic order of their resource (CPU, memory, and network) capacities. The proposed approach was evaluated on TU-Berlin and Google workloads, and the main finding is that avoiding the migration of VMs with steady demands reduces the total number of migrations while having a minimal adverse effect on the number of PMs.

In another study, Alharbi et al. [68] took advantage of the ant colony system optimization technique to solve the dynamic VM placement problem with the objective of minimizing the total energy consumption of all active PMs in a data center. The energy consumption is modeled similarly to Eq. (9). The proposed method utilizes VM and PM profile information extracted from historical data logs, including CPU and memory capacity, residual capacity, and minimum and maximum energy consumption. The dynamic scenario is implemented by considering a change in VM requests in each time interval: at the beginning of each interval, the list of VMs is updated and the resources released by expired VMs are made available to new VMs. According to simulations on small, medium, and large-scale data centers, the proposed method offers improved energy efficiency in comparison with FFD, ACO-VMP [69], and PVM [70]. However, this improvement is gained at the cost of more execution time. In terms of scalability, the runtime of the proposed method increases linearly with the number of PMs.

To minimize the overall energy consumption in a data center, Xiao and Ming [71] proposed a partitioned optimization framework. The framework classifies the set of PMs into three pools, namely a running pool, a sleeping pool, and an off pool. Running PMs are currently loaded with VMs, while off PMs are switched off for energy efficiency. Sleeping PMs are put into a low-energy state and can be awakened when needed. PMs in different pools consume different amounts of energy. The overall energy model, denoted by Eall, is defined as a summation of three parts, as shown in the following equation:

$$E_{all} = \mathop \sum \limits_{i = 1}^{m} \left({E_{sta} \left({i,t} \right) + E_{swi} \left(i \right)} \right) + \mathop \sum \limits_{j = 1}^{n} E_{mig} \left(j \right)$$
(12)

where \(E_{sta} \left({i,t} \right)\) is the energy consumption of the ith PM in its current state (running, sleeping, or off), \(E_{swi} \left(i \right)\) is the energy consumed for state switching of the ith PM, and \(E_{mig} \left(j \right)\) is the energy consumed for migrating the jth VM. The main idea of the proposed optimization method is to reduce the search space by avoiding states/solutions that cannot make the current energy consumption any lower. The authors proposed a memetic algorithm to solve the dynamic placement of VMs. To demonstrate the superiority of the proposed algorithm, experiments were conducted comparing it to the heuristics FF (First Fit), BFI (Best Fit Increasing), BFD (Best Fit Decreasing), Greedy, and LB (Load Balance). Based on the results, the authors conclude that the proposed algorithm outperforms these heuristics in terms of the percentage of energy consumption improvement. One drawback of the proposed method is that it ignores the impact of other hardware resources, such as memory and even GPUs, on the energy consumption of data centers. Including such resources in the energy model would enable more precise prediction of energy consumption and therefore more realistic solutions to the VM placement problem.
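
A minimal sketch of the three-part accounting in Eq. (12) follows; the per-state powers, switching energies, and migration energy are placeholders of our own choosing, not values from [71].

```python
STATE_POWER = {'running': 220.0, 'sleeping': 15.0, 'off': 0.0}   # watts
SWITCH_ENERGY = {('sleeping', 'running'): 500.0,                 # joules
                 ('running', 'sleeping'): 300.0,
                 ('off', 'running'): 2000.0}

def total_energy(pm_states, interval, switches, migrations, e_mig=800.0):
    """E_all per Eq. (12): per-state energy over `interval` seconds,
    plus state-switching energy, plus per-VM migration energy."""
    e_state = sum(STATE_POWER[s] * interval for s in pm_states)
    e_switch = sum(SWITCH_ENERGY.get(sw, 0.0) for sw in switches)
    return e_state + e_switch + e_mig * migrations

# Two running PMs and one sleeping PM over 60 s, one wake-up, one migration:
print(total_energy(['running', 'running', 'sleeping'], 60,
                   [('sleeping', 'running')], migrations=1))  # 28600.0 J
```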

4.1.3 Power consumption for network communication

Most studies on efficient power management in the data center primarily focus on the computer hardware with the highest power demands, such as servers and cooling systems. However, network equipment also consumes 10–20% of the total power in a data center [72], which introduces the new challenge of reducing networking power consumption without adversely impacting overall network performance. This issue motivated Fang et al. [73] to propose a novel approach called VMPlanner to conserve network power in a data center. The main idea behind VMPlanner is to optimize the VM placement and traffic flow among VMs such that dispensable networking elements can be switched off for maximum possible energy conservation. The problem is formulated as a combinatorial optimization problem and solved in three steps. First, all the VMs are partitioned into a set of groups with a minimum amount of inter-group communication. Second, using a Tabu search technique, VM groups are assigned to the corresponding PM racks such that total inter-rack communication is minimized. Third, the network traffic among VMs is managed such that dispensable networking equipment can be switched off to conserve energy. The evaluation results show that VMPlanner achieves 60% more power savings compared to the situation in which all the network equipment is fully operating.

In multi-core servers, the inter-communication among the cores can substantially impact overall system performance. Network-on-chip (NoC) technology leverages computer networking and packet switching concepts to provide efficient communication among multiple cores, compared to the conventional communication architecture that uses wires and buses to connect cores. The advantages of NoC over the traditional architecture are lower latency, higher performance, and lower power consumption. Liu et al. [74] designed an energy-aware on-chip VM placement scheme with the aim of efficiently allocating a number of VMs on a multi-core server with high efficiency and performance. An ant colony heuristic is used to place the VMs running the same application on closer cores, based on traffic rates, energy consumption, and communication delay. The problem is formulated as binary integer programming with the objective of minimizing the power consumption of inter-VM communication. The simulation results show that the proposed scheme attains better energy efficiency compared to FFD placement and random placement.

4.1.4 Traffic

A typical topology for a data center is a tree structure that includes switches. In such a topology, the communication delay between two VMs is proportional to the distance between them, measured as the number of hops from the sender VM to the receiver VM. When a collection of VMs forms a single application, inter-communication among the collaborating VMs is likely needed. Therefore, placing the most communicative VMs on PMs with minimum network distance is a way to alleviate the communication overhead [39]. Besides VM placement policies that minimize the number of active PMs, power consumption, or other criteria, network-aware techniques [39, 75] intend to improve application performance by minimizing the communication latency among VMs. However, in such techniques, clients are required to at least provide the application's interconnection network and its communication requirements in order to facilitate the decision-making process of effective resource management. As the simple example in Fig. 7 shows, the placement strategy can accelerate data transfer by moving communicating VM pairs (e.g. VM1–VM3, VM2–VM4) from distant physical servers to a single local physical server.

Fig. 7 Network traffic optimization

Meng et al. [75] addressed the scalability concern of modern traffic-intensive data centers. The proposed traffic-aware placement policy (called Cluster-and-Cut) intends to minimize the average traffic latency of the data center network by placing the most communicative VMs in close proximity. Cluster-and-Cut is a two-tier heuristic algorithm that receives the traffic matrix between the VMs, first partitions VMs and hosts into different clusters, and then matches VMs to hosts at the cluster level and thereafter at the individual level. The experimental analysis indicates that the proposed placement algorithm significantly reduces the aggregate traffic and computational time compared to the existing generic methods presented in [76] and [77]. In another effort to minimize the energy consumption of network equipment, da Silva and da Fonseca [78] presented a topology-aware strategy to place communicating groups of VMs closer together in a small area of the data center. As a result, VMs require a shorter path and fewer network switches to communicate with each other, and thus less energy is consumed. The proposed algorithm, called TAVMP, receives a group of VMs as input and then splits the whole data center topology into smaller sub-graphs. The same strategy is recursively applied to each sub-graph, and when the lowest level is reached the placement decision is made by another algorithm named Placement in Current Area (PCA). The performance of TAVMP was assessed in a simulation experiment based on the blocking ratio (the percentage of VMs that were not placed) and energy efficiency. The results indicate that TAVMP accepts more virtual machines without degrading energy efficiency compared to the Power Aware Best Fit Decreasing (PABFD) and Round Robin (ROUND) algorithms.

Rahimzadeh Ilkhechi et al. [79] studied VM placement with the objective of maximizing a metric named satisfaction, in a particular scenario of interest where some VMs are highly inclined to exchange traffic with certain nodes called sinks. A sink can be a supercomputer, a connection point, or any physical resource on which other nodes are highly dependent. The satisfaction of a VM is measured based on the appropriateness of the PM that hosts it. Moreover, the metric takes into account the cost (proximity) of VMs to sinks, together with the flow demand of VMs, in order to determine the suitability of each PM. The authors presented greedy and heuristic-based algorithms to assign VMs to PMs and found these algorithms more effective than random assignment. However, the presented algorithms assume that knowledge of the communication patterns and flow demand profiles is provided beforehand.

Song et al. [80] formulated the VM placement problem as a convex optimization problem and proposed an optimization-based scheme for solving it in large-scale data centers, taking into account both the network dependencies between VMs and server constraints. The objective is to minimize the communication traffic among VMs. To validate the proposed approach, it was compared to random placement and a traditional bin packing algorithm in four different scenarios, subject to four popular data center architecture topologies: Tree, VL2, Fat-Tree, and BCube. The results indicate that employing the optimization-based scheme in such topologies (especially BCube) reduces the communication cost between VMs and thus yields higher performance than the other methods. In addition, the proposed scheme requires the fewest PMs compared to random placement and First Fit placement.

4.1.5 Balance of the residual resource/resource utilization

The residual resources along the different dimensions of each server should always be balanced in anticipation of future requests. This is to prevent resource wastage due to fragmentation [81]. Figure 8 shows an example of resource allocation along two dimensions, namely CPU and memory, for a typical PM.

Fig. 8 (a) Unbalanced PM versus (b) balanced PM

Figure 8a shows an unbalanced placement that results in resource wastage, while the balanced placement in Fig. 8b helps provide sufficient capacity for future requests. The resource wastage in Fig. 8a arises because the remaining resource along the CPU dimension is too small and thus unlikely to accommodate future requests. Placing a VM on a PM consumes a certain amount of each resource dimension on that PM: each inner rectangle represents the resource usage of one VM, while the outer rectangle is the total resource capacity of the PM. Some existing works [7, 82] address VM placement with load balancing as the objective. Specifically, Cho et al. [82] proposed a hybrid meta-heuristic called ACOPS (a hybrid of ant colony optimization and particle swarm optimization) to maximize the balance of resource utilization across different resource dimensions. The approach uses the workload of historical requests to predict the workload of future requests, where each request is for a VM along with its resource demands. To quantify the load balancing, a degree of balance (DB) is defined over three resource types, i.e. memory, CPU, and disk, as follows:

$$DB = b_{1} \times V + b_{2} \times \left({1 - D} \right)$$
(13)

where V is the utilization feature, D is the maximum-difference feature, and \(b_{1}\), \(b_{2}\) are weighting coefficients. In addition, to speed up the ACO process, infeasible solutions are rejected before scheduling. The results of the simulation experiments indicate that ACOPS is faster than the conventional ant colony optimization algorithm, has a shorter makespan, and also outperforms the other approaches in terms of the balance of resource utilization. The authors report the time complexity of the proposed algorithm as \(O(n^{2}MAI)\), where n is the number of VMs, M the number of PMs, A the number of ants, and I the maximum number of iterations.
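
Since [82] leaves V and D loosely specified, the sketch below adopts one plausible reading: V as the mean utilization across the resource dimensions and D as the largest utilization gap between them. Treat it as an assumption-laden illustration of Eq. (13), not the authors' exact definition.

```python
def degree_of_balance(utilizations, b1=0.5, b2=0.5):
    """Eq. (13) under our assumed reading: V = mean utilization over the
    resource dimensions (memory, CPU, disk), D = max pairwise gap."""
    v = sum(utilizations) / len(utilizations)
    d = max(utilizations) - min(utilizations)
    return b1 * v + b2 * (1 - d)

print(degree_of_balance([0.8, 0.8, 0.7]))  # balanced dimensions -> ~0.83
print(degree_of_balance([0.9, 0.2, 0.5]))  # skewed dimensions   -> ~0.42
```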

He et al. [83] developed a Genetic Algorithm (GA) to consolidate moldable VMs in a cloud system. In contrast to rigid VMs, whose resource capacities remain unchanged, the resource capacities of moldable VMs are adjustable. Their approach particularly deals with a set of virtual clusters, each of which provides a specific type of service to users. Since a steady level of Quality of Service (QoS) is expected from the whole cluster as a single entity, it is not necessary to keep the capacities of individual VMs in the cluster constant. The authors employed a genetic algorithm to search for the mapping that minimizes the standard deviation of spare capacity across the different resource types. The outcome of the genetic algorithm is an optimized system state representing the optimal mapping of the VMs in a virtual cluster to the PMs. If the resource capacities of VMs change, a new system state is calculated and the old state is transitioned to the new one. Transitioning to the new state may involve different VM operations, such as VM creation, VM deletion, and VM migration. As each VM operation has a different cost, a heuristic approach was developed to obtain a reconfiguration plan with the lowest possible cost in a reasonable time. In addition to the initial placement of VMs, a reconfiguration algorithm is leveraged to dynamically transform the current allocation state into the new one. In a simulation experiment, the developed GA technique was compared to the Entropy consolidation scheme presented in [84], and the results demonstrate that the GA performs better than Entropy at packing VMs into a smaller number of physical nodes.

4.1.6 Carbon footprint

The amount of carbon dioxide (CO2) emitted by a data center's energy source is referred to as its ICT carbon footprint and is considered an acute environmental concern. Today's large-scale data centers are confronted with a substantial increase in carbon emissions, and minimizing the carbon footprint has therefore become one of the significant industry priorities [85]. Proper handling of this issue will contribute towards a sustainable and green ICT technology. Khosravi et al. [86] proposed an energy- and carbon-efficient VM placement algorithm called ECE based on a best-fit heuristic. ECE places VMs on a distributed data center with the objective of minimizing the carbon footprint. A broker decides to place the VMs on the most suitable sites and servers according to different parameters, such as the data center's power usage effectiveness (PUE), the carbon footprint rate of its energy source, and proportional power. PUE is a metric of data center efficiency, calculated as:

$$PUE = \frac{{Total\,data\,center\,power\,consumption}}{{Data\,center\,IT\,power\,consumption}}$$
(14)

where the total data center power consumption refers to the sum of the power drawn by the data center for all purposes, including IT equipment, lighting, cooling, etc., while the data center IT power consumption reflects the power consumed for IT equipment only (as illustrated in Fig. 9).

Fig. 9 Illustration of different power consumption for calculating PUE

PUE is a value greater than 1. In the ideal condition, PUE = 1, which implies that 100% of the electricity provided to the data center goes to the IT equipment; this is practically impossible. The smaller the PUE, the more efficient the data center. Higher values of PUE mean that a larger portion of the input electricity is spent on cooling, lighting, and so on. To evaluate ECE, it was compared with four First-Fit based heuristics in four data centers with heterogeneous infrastructure. The results demonstrate that, with an increasing number of VMs, ECE reduces the carbon footprint by at least 45%. Moreover, in terms of power consumption, ECE achieves a minimum of 8% power savings.
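
Eq. (14) is trivial to evaluate; a tiny sketch with illustrative numbers:

```python
def pue(total_power_kw, it_power_kw):
    """Eq. (14): power usage effectiveness; >= 1 in practice."""
    return total_power_kw / it_power_kw

# A facility drawing 1500 kW in total to support 1000 kW of IT load:
print(pue(1500, 1000))  # 1.5 -> 500 kW spent on cooling, lighting, etc.
```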

Moghaddam et al. [87] proposed two new algorithms for the placement of VMs in multiple clouds. The algorithms extend the GGA algorithm presented in [81] and were studied with different objectives: minimizing energy consumption (MLGGA-EA) and reducing carbon emissions (MLGGA-CA). The proposed algorithms were compared to GGA, FFD, and Swarm [88]. The results demonstrate that, overall, MLGGA-EA achieves better solutions for the multi-cloud scenario, while MLGGA-CA is a promising choice for the energy-efficiency case; however, MLGGA-EA is not recommended when the carbon footprint is the main concern. The proposed approach lacks any concretely formulated model for energy consumption and carbon footprint.

4.1.7 Total VM performance

A Service Level Agreement (SLA) is an agreement between user and provider that specifies a minimum level of quality of service to be offered to the user [89]. In the IaaS cloud model, one of the important factors in complying with the endorsed SLA is ensuring that tenants always receive the promised quality and specification of the hardware they lease. In particular, with virtualization technology and the interference caused by the co-existence of several VMs on a single PM, an important requirement is to improve VM performance, which can be represented as VM response time (delay) [65] or VM throughput.

Tordsson et al. [90] presented a cloud brokering approach that involves the optimal placement of VMs across multiple heterogeneous clouds. The cloud broker has two roles: first, providing a scheduling mechanism for determining the optimal VM placement; second, providing a uniform and transparent management interface for dealing with different VMs without depending on a specific cloud architecture or technology. A schematic of the architecture of the proposed cloud brokering approach is shown in Fig. 10. The placement algorithm is static, designed on the basis of a binary integer programming formulation, and is meant to maximize the total performance of the running VMs across multiple clouds while satisfying various constraints on performance, budget, service configuration, and load balancing. The objective function is represented as:

$$TIC = \mathop \sum \limits_{j = 1}^{l} C_{j} \left({\mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{k = 1}^{m} x_{ijk}} \right)$$
(15)

where \(C_{j}\) is the performance of a VM of type j and \(x_{ijk} = 1\) if VM i of type j is placed on cloud k, and 0 otherwise.

Fig. 10 Cloud brokering approach

To solve the problem, the mathematical programming language AMPL [91] was used along with CPLEX [92] as a backend solver. The cloud brokering approach was evaluated with a high-throughput computing cluster over multiple cloud providers, and the most significant finding is that deploying VMs over multiple clouds results in better performance and lower cost compared to the single-cloud scenario.
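The underlying binary integer program can be sketched with an off-the-shelf solver. The code below is not Tordsson et al.'s implementation (they rely on AMPL/CPLEX); it is a minimal, hypothetical instance of the Eq. (15) objective using the open-source PuLP library, with invented performance values, prices, and an illustrative budget constraint:

```python
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

# Hypothetical instance: 4 VMs, 2 VM types, 2 clouds (all numbers invented).
vms, types, clouds = range(4), range(2), range(2)
perf = {0: 1.0, 1: 2.5}                                # C_j of VM type j
price = {(0, 0): 2, (0, 1): 3, (1, 0): 5, (1, 1): 4}   # cost of type j on cloud k
budget = 14

prob = LpProblem("tic_maximization", LpMaximize)
x = {(i, j, k): LpVariable(f"x_{i}_{j}_{k}", cat=LpBinary)
     for i in vms for j in types for k in clouds}

# Objective (Eq. 15): total performance of the deployed VMs.
prob += lpSum(perf[j] * x[i, j, k] for i in vms for j in types for k in clouds)

# Each VM is deployed exactly once, as one type on one cloud.
for i in vms:
    prob += lpSum(x[i, j, k] for j in types for k in clouds) == 1

# Illustrative side constraint: keep the total price within budget.
prob += lpSum(price[j, k] * x[i, j, k]
              for i in vms for j in types for k in clouds) <= budget

prob.solve()
print([(i, j, k) for (i, j, k), var in x.items() if var.value() > 0.5])
```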

4.1.8 Cost of deployment

In today's cloud market, service providers offer a diverse range of service plans, and these plans are sometimes revised by the provider from time to time. However, it is not always straightforward for cloud end users to choose the most economical service subject to their budget and limitations. This is particularly true in the case of the federated cloud, where a service has to be provisioned through a set of stand-alone clouds with different pricing schemes. In this scenario, an intuitive goal is to minimize the total cost of deploying the VMs by choosing the lowest-priced plan/cloud. The cost is defined as the sum of the costs of each VM to be deployed [40]. Analogous to Tordsson et al. [90], Lucas-Simarro et al. [40] proposed a modular cloud brokering architecture including a scheduling module for the multi-cloud scenario. The VM placement strategy was implemented in the scheduling component and aimed to optimally deploy VMs across different cloud environments in which each vendor offers different and dynamic pricing schemes. The problem is formulated to minimize the so-called Total Infrastructure Cost (TIC). TIC is defined as the total cost of placing the VMs for a particular period of time, represented as:

$$TIC\left(t \right) = \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{l} \mathop \sum \limits_{k = 1}^{m} X_{i,j,k} \left(t \right) \times P_{j,k} \left(t \right)$$
(16)

where t denotes a 1-h time period and \(X_{i,j,k} \left(t \right) = 1\) if VM i of type j is placed on cloud k during period t, and 0 otherwise. \(P_{j,k} \left(t \right)\) is the price of placing a VM of type j on cloud k for period t. The placement is repeated before the beginning of each 1-h period, and the prices of similar VM instances are subject to change across periods. Additionally, the second objective is to maximize the Total Infrastructure Performance (TIP), the total performance of the VMs over a certain period of time:

$$TIP\left(t \right) = \mathop \sum \limits_{i = 1}^{n} \mathop \sum \limits_{j = 1}^{l} \mathop \sum \limits_{k = 1}^{m} X_{i,j,k} \left(t \right) \times Perf_{j,k} \left(t \right)$$
(17)

where \(Perf_{j,k} \left(t \right)\) is the performance of a VM of type j on cloud k for period t. The performance of a virtual machine depends on a number of factors, such as the requirements of the application, the type of VM and the PM that hosts the VM. The LINPACK benchmark [93] was used to analyze the performance of each VM instance.

In addition, a few constraints with respect to the budget and the minimum expected performance were added. Thereafter, AMPL [94], a mathematical programming language, was used with the MINOS solver to optimize the mathematical model. The performance of the architecture and placement strategy was evaluated on HPC cluster and web server use cases, and the results demonstrate that multi-VM placement outperforms single-VM placement and that multi-cloud deployment using the broker is superior to single-cloud deployment regardless of the interference of the cloud broker. Also, by making use of the cloud broker, users benefit from a 4–6% improvement in performance or budget.

4.2 Multi-objective VM placement

This section reviews a number of prominent VM placement approaches that address the multi-objective form of the problem. In addition, a summary of the discussed schemes together with their objectives and assumptions/drawbacks is presented in Table 2. Although the majority of proposed VM placement techniques formulate the problem based on a single objective, there exist other strategies [12, 41, 65, 95] that mainly address the multi-objective variation of the problem. Besides the primary objective, which is often minimizing total power consumption, other criteria such as traffic, load balancing, and thermal dissipation can be considered to establish the multi-objective form of the problem. In multi-objective optimization, the aim is to minimize/maximize a number of objectives simultaneously. Generally, there are two main approaches to tackling multi-objective problems. The first approach relies on the Pareto concept [96] to find a range of tradeoff solutions which are equally optimal and are called non-dominated solutions, shown in grey in Fig. 11. The figure illustrates a sample solution space for a problem of minimizing two objectives: power consumption and network traffic. The circles in white represent solutions that have greater (worse) values for both objectives and are therefore dominated by the better solutions represented in grey. The second approach transforms an originally multi-objective problem into a weighted sum of the individual objectives as a single objective, as shown in the following equation for the two objectives of power consumption and traffic:

$$Minimize\,\,\, w_{1} \times power + w_{2} \times traffic$$
(18)

where \(w_{1}\) and \(w_{2}\) are the weighting coefficients for power consumption and traffic respectively, and they indicate the relative importance of their corresponding objectives. Although the weighted-sum approach is regarded as the simplest way to deal with a multi-objective problem, choosing a proper weight vector is not always a straightforward task [97]. Moreover, prior determination of the weight vector excludes some other potentially good solutions from the search space [98]. In practice, the weighting coefficients are chosen based on the relative importance of the individual objectives for the specific problem of interest [96]. For example, if minimizing power usage has higher priority than minimizing traffic overhead, then \(w_{1} = 0.7, w_{2} = 0.3\) can be a reasonable choice; in the case of equal importance, both coefficients can be set to 1 (i.e. \(w_{1} = 1, w_{2} = 1\)). Table 3 lists VM placement schemes that utilize a weighted-sum approach to deal with multiple objectives, along with their rationale for choosing the weighting coefficients.
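To make the scalarization of Eq. (18) concrete, the toy sketch below scores hypothetical candidate placements by a weighted sum of already-computed power and traffic values and keeps the best one; all numbers and weights are illustrative, and in practice the objectives should first be normalized to comparable scales:

```python
def weighted_score(power: float, traffic: float,
                   w1: float = 0.7, w2: float = 0.3) -> float:
    """Scalarized objective of Eq. (18): lower is better."""
    return w1 * power + w2 * traffic

# Hypothetical (power, traffic) outcomes of three candidate placements.
candidates = {"A": (120.0, 40.0), "B": (100.0, 90.0), "C": (110.0, 55.0)}

best = min(candidates, key=lambda c: weighted_score(*candidates[c]))
print(best)  # scores: A = 96.0, B = 97.0, C = 93.5  ->  "C" wins
```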

Fig. 11 Pareto concept for multi-objective optimization

Table 3 List of VM placement schemes with weighted sum approach

4.2.1 Number of PMs and resource utilization

Liu et al. [69] proposed an ant colony based algorithm called ACO-VMP. The objective is to minimize the weighted sum of a resource utilization term and the number of used servers, as shown by the formula below:

$$w_{1} \cdot \mathop \sum \limits_{i = 1}^{M} \left({\frac{1}{{PC_{i} - UC_{i}}} + \frac{1}{{PM_{i} - UM_{i}}}} \right) + w_{2} \cdot M$$
(19)

where \(w_{1}\) and \(w_{2}\) are weight coefficients, PCi and PMi represent the CPU and memory capacities of PM i respectively, UCi and UMi are its CPU and memory utilization levels respectively, and M is the number of PMs. The performance of ACO-VMP was compared to the FFD algorithm in [49], and the experimental results show that the solution returned by ACO-VMP always requires fewer PMs when the number of VMs is varied between 100 and 600. However, no comparison between the two algorithms was performed in terms of computational time. Also, the performance of ACO-VMP in minimizing resource utilization was not assessed.
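A direct transcription of the Eq. (19) objective is straightforward; the sketch below evaluates it for a hypothetical set of PMs (capacities and utilizations invented), which is the quantity an ant colony search would try to minimize:

```python
def aco_vmp_objective(pms, w1=0.5, w2=0.5):
    """Eq. (19): pms is a list of dicts holding CPU/memory capacities (PC, PM)
    and utilizations (UC, UM). The reciprocal terms grow as a PM's residual
    capacity shrinks; the second term counts active PMs."""
    util_term = sum(1.0 / (p["PC"] - p["UC"]) + 1.0 / (p["PM"] - p["UM"])
                    for p in pms)
    return w1 * util_term + w2 * len(pms)

# Two hypothetical active PMs (capacities/utilizations in arbitrary units).
pms = [{"PC": 16, "UC": 12, "PM": 32, "UM": 24},
       {"PC": 16, "UC": 4,  "PM": 32, "UM": 8}]
print(aco_vmp_objective(pms))  # 0.5 * 0.5 + 0.5 * 2 = 1.25
```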

4.2.2 Energy consumption and VM performance

In addition to the single-objective algorithm by Kessaci et al. [65], which was previously discussed in Sect. 4.1.2, the authors proposed a bi-objective version of EMLS-ONC called EMLS-ONC-MO, which addresses both the energy consumption and the performance of VMs. Moreover, among the set of best-found solutions, priority is given to the solution that packs the highest number of VMs. The VM performance model is defined based on the response time of the VMs; the response time is calculated according to a linear relationship with memory increase, as shown in Fig. 12, where memoryj is the memory requirement of VM j and mem_usagei is the current memory usage of PM i. To evaluate EMLS-ONC-MO, it was compared to FFD and OpenNebula's default scheduler when applied to the individual objectives (i.e. energy consumption and performance of VMs). The results achieved by EMLS-ONC-MO are 24% and 9% better than those of OpenNebula's default scheduler when energy and VM performance, respectively, are the objectives. Comparing EMLS-ONC-MO to the other approaches, the authors also report its superiority. However, no comparison with prominent Pareto-based multi-objective approaches (such as NSGA-II [99]) was presented.

Fig. 12 VM performance model

4.2.3 Resource wastage and power consumption

Gao et al. [95] studied VM placement as a multi-objective combinatorial optimization problem with two objectives: resource wastage and power consumption. The authors modeled the resource wastage of the jth PM (\(W_{j}\)) as follows:

$$W_{j} = \frac{{\left| {L_{j}^{p} - L_{j}^{m}} \right| + \varepsilon}}{{U_{j}^{p} + U_{j}^{m}}}$$
(20)

where \(U_{j}^{p}\) and \(U_{j}^{m}\) denote the ratios of the used amounts of CPU and memory, respectively, to the corresponding total available resources, while \(L_{j}^{p}\) and \(L_{j}^{m}\) represent the normalized remaining amounts of CPU and memory respectively. ε is a very small positive value.
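A direct transcription of Eq. (20), assuming resource capacities normalized to 1 so that the remaining share is one minus the used share; the usage figures below are invented:

```python
def resource_wastage(used_cpu: float, used_mem: float, eps: float = 1e-4):
    """Eq. (20): wastage W_j of one PM from normalized usage in [0, 1].
    L (remaining share) is 1 minus U (used share) for each dimension."""
    l_cpu, l_mem = 1.0 - used_cpu, 1.0 - used_mem
    return (abs(l_cpu - l_mem) + eps) / (used_cpu + used_mem)

# A balanced PM wastes almost nothing; an imbalanced one wastes much more.
print(resource_wastage(0.7, 0.7))  # ~0.00007
print(resource_wastage(0.9, 0.3))  # ~0.5
```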

The power consumption model used was already discussed in Sect. 4.1.2. An ant colony optimization algorithm called VMPACS was proposed to simultaneously minimize both problem objectives. The goal is to find a set of non-dominated solutions that provide the best possible trade-off between the two objectives, following the concept already discussed in Sect. 4.2. The performance of the proposed approach was compared with the single-objective ant colony optimization (SACO) algorithm in [100], the multi-objective genetic algorithm (MGGA) proposed in [81] and the single-objective FFD heuristic in [49]. Two performance metrics dedicated to multi-objective algorithms, ONVG [101] and Spacing [102], were employed to evaluate the effectiveness of the proposed algorithm. The conducted experiments demonstrate the superiority of VMPACS over the other algorithms. Furthermore, the experimental results verify the scalability of the approach in large data centers with many VMs and show that VMPACS takes less than 3 min to solve a placement problem with up to 2000 VMs. However, the authors did not assess the performance of VMPACS in terms of computational complexity compared to MGGA, SACO, and FFD.

In another attempt to minimize the power consumption and resource wastage of a cloud data center, Jamali et al. [103] applied an imperialist competitive algorithm (ICA), a novel optimization technique first introduced by Atashpaz-Gargari and Lucas [104] for tackling real-world optimization problems. To reduce the complexity, the bi-objective problem was converted into a single-objective one using the weighted-sum approach. The performance of ICA was compared to well-known approaches such as ant colony optimization, the genetic algorithm and FFD in a CloudSim simulation environment in terms of the power consumption and resource wastage criteria. The simulation results indicate that ICA performs better in reducing the power consumption and resource wastage of PMs compared to the other placement algorithms. However, no comparative analysis of computation time was provided.

Zheng et al. [42] proposed a novel solution called VMPMBBO. The proposed evolutionary algorithm is a biogeography-based optimization technique intended to find a solution that simultaneously minimizes resource wastage and power consumption. The authors extended the model in [81, 95] to quantify the cost of resource wastage along the three dimensions of CPU, memory, and bandwidth. The power consumption model is based on the CPU utilization level, similar to the model presented in Sect. 4.1.2. Through simulative experiments using both synthetic and real data, the proposed method was compared with two other multi-objective optimization algorithms, MGGA [81] and VMPACS [95], and the results show that in most cases VMPMBBO has better convergence and is also computationally more efficient. However, the conducted experimental analysis does not include any performance comparison with other state-of-the-art evolutionary approaches (such as NSGA-II [99]) which have been successfully applied to a variety of optimization problems in different domains. In addition, employing dedicated multi-objective performance metrics seems necessary to evaluate the efficiency of the proposed algorithm precisely.

The problem of VM placement with the two objectives of power consumption and resource wastage was also addressed by Gupta and Amgoth [109]. A power consumption model similar to the one presented in Sect. 4.1.2 and a more involved resource wastage model, based on the CPU and memory demands of the VMs and the maximum CPU and memory utilization of the PMs, were presented. The main idea behind the proposed method, named RVMP, is the use of a new two-dimensional resource usage model (as shown in Fig. 13). The model partitions the CPU and memory utilization space into three domains according to the degree of balance in resource utilization; based on this model, VM migration is limited and resource utilization is balanced. The three domains in the proposed model are: the Acceptance Domain (AD), where the residual resource amounts are nearly balanced, so there is little resource wastage; this is the ideal case for all PMs and has the highest priority. The Balance Domain (BD), where there is no apparent disequilibrium in resource utilization and utilization is fairly balanced; this domain has the second highest priority. The Unbalance Domain (UD), where there is an obvious disequilibrium in resource utilization; this domain has the least priority. RVMP is divided into two phases: VM placement and VM migration. The decision to place the VMs is made based on a so-called Resource Usage Factor (RUF), which measures the suitability of a PM to host a VM. RUF is calculated based on the resource utilization of the VMs and the remaining resources of the PMs. The VM migration phase is performed based on the RUF and the posterior usage state of a PM in the aforementioned resource utilization model. The posterior usage state of a PM with respect to a VM is defined as the new usage state of the PM if the VM migrates to that PM, and implies the suitability of that PM according to the domain in the two-dimensional model into which the posterior usage falls. To evaluate its performance, RVMP was compared with the existing algorithms First Fit, VMPACS [95], MBFD [52] and OBFD [110] in terms of power consumption, resource wastage, overall CPU/memory utilization and the number of active PMs. The simulation results, using user-customized VMs as well as Amazon EC2 instances, demonstrate the superior performance of the proposed algorithm.

Fig. 13 Multi-dimensional resource usage model

4.2.4 Power consumption and VM execution time

Kansal and Chana [9] proposed the ERU (Energy-aware Resource Utilization) model to efficiently manage resources in the cloud computing environment. The goal of ERU is to reduce the energy consumption of the cloud infrastructure without degrading the performance of the users' applications, which is expressed as VM execution time. The model is meant to achieve the maximum possible resource utilization, which results in enhanced energy efficiency of the data center. A weighted sum of the two objectives (execution time and power consumption) is minimized. The model for calculating the VM execution time (ET) is defined as:

$$ET = \mathop \sum \limits_{i = 1}^{M} ET_{i}$$
(21)

where ETi is the execution time of the VMs running on the ith PM and M is the number of PMs. \(ET_{i}\) is defined as:

$$ET_{i} = \mathop \sum \limits_{j = 1}^{n} \mathop \sum \limits_{k = 1}^{l} ET_{ijk}$$
(22)

where \(ET_{ijk}\) is the execution time of the kth job running on the jth VM on the ith PM.

Likewise, the total energy consumption (EC) is calculated as the sum of the energy consumed by every single PM (ECi):

$$EC = \mathop \sum \limits_{i = 1}^{M} EC_{i}$$
(23)

and consequently, the energy consumption of every single PM, ECi, is its power consumption PCi over t units of time, as shown below:

$$EC_{i} = PC_{i} \times t$$
(24)

The key features of the proposed model are monitoring cloud resources to determine the current level of energy consumption, providing users with the requested resources and enhancing resource utilization. As an element of the proposed model, the scheduler module is responsible for finding the best physical nodes for the users' jobs. To avoid conflicts among workloads of different natures and to prevent potential resource contention, workloads are segregated into CPU-intensive and memory-intensive workloads. The scheduler uses an artificial bee colony (ABC) optimization technique to place the dynamic user workload on an optimal set of physical nodes. Through a simulation-based experiment on the CloudSim toolkit [111], the performance of the ABC-based technique (called ERU) was evaluated against Ant Colony Optimization (ACO) [100] and the First-Fit Decreasing heuristic (FFD) [112]. The experimental results show that ERU takes more time than FFD and less time than ACO to obtain the final output. In addition, employing the ERU approach results in less energy consumption compared to the FFD and ACO techniques. Specifically, 11% of PMs and 10.7% of power were saved using ERU over FFD; the PMs and power conserved using ERU over ACO are 6.35% and 6.63% respectively.

4.2.5 Resource fragmentation and number of PMs

Since resource wastage/fragmentation results from imbalanced use of resources across multiple dimensions (such as CPU, memory and disk space), Li et al. [7] proposed a novel multi-dimensional space partition model to describe the resource usage status of PMs. Figure 14 shows the multi-dimensional space partition model for two different resources. All resource dimensions are normalized to capacities in the same range of [0, 1]. The point O indicates that all resource dimensions are unused and the PM is thus idle; point E, on the other hand, refers to the state in which all resource dimensions are exhausted. The model is partitioned into three domains: (1) the acceptance domain (AD), where all D resource dimensions are almost exhausted; a PM whose usage state falls in this domain is an ideal candidate for placing a new VM; (2) the forbidden domain (FD), which implies an imbalance in D-dimensional resource utilization and should therefore be avoided; (3) the safety domain (SD), which indicates that there is no obvious imbalance of resource utilization and is considered a balanced case. On top of this underlying model, a dynamic energy-efficient VM placement algorithm called EAGLE is proposed to reduce the number of active PMs and therefore decrease the amount of energy consumed in a data center. EAGLE attempts to place the VMs in a balanced rather than arbitrary manner; its central idea is to achieve a compromise between multi-dimensional resource utilization and minimizing the number of active PMs. EAGLE decides whether to place VM v on PM p based on the so-called posterior usage state of p, which refers to the new available resources of p after the presumptive placement of v (see the sketch following Fig. 14). If the posterior usage state of p lies in the acceptance domain, p has the highest priority to be selected, while if it lies in the safety domain, p has second priority; p will not be selected if its posterior usage state lies in the forbidden domain. PMs with identical posterior usage states are compared according to two further metrics, \({\Re}\) and \({\mathfrak{D}}\). The dynamic feature of EAGLE is established by dividing time into time-slots of equal length \({\Delta}t\). There are \(k\left(\tau \right)\) VMs to be placed at the \(\tau^{th}\) time-slot, while \(N\left(\tau \right)\) and \(M\left(\tau \right)\) represent the total numbers of PMs and VMs at the \(\tau^{th}\) time-slot. EAGLE starts at the initial time-slot (e.g. \(\tau = 0\)), records the number of running PMs \(N\left(\tau \right)\) at the end of each iteration, and repeats the same procedure for the next time-slot (\(\tau + 1\)). To evaluate the performance of EAGLE, it was compared to FFD [49], a well-known heuristic for the bin packing problem. The experimental results for a single VM request per time-slot show that using EAGLE results in 10% less power consumption compared to FFD, which is a substantial amount of energy saving for a large data center. Moreover, for multiple VM requests per time-slot, FFD uses 1.15 times as many PMs as EAGLE, which implies that the EAGLE placement saves 15% of the energy cost.

Fig. 14 Multi-dimensional space partition model
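The paper does not give numeric boundaries for the three domains, so the following is only a toy sketch of an EAGLE-style domain test under hypothetical thresholds: a posterior state is "forbidden" when the per-dimension usages are too unbalanced, "acceptance" when all dimensions are nearly exhausted in a balanced way, and "safety" otherwise.

```python
def classify_posterior_state(usages, imbalance_max=0.25, near_full=0.85):
    """Toy domain test for an EAGLE-style decision. `usages` holds the
    normalized per-dimension utilizations of a PM *after* a presumptive
    VM placement. Thresholds are hypothetical, not from the paper."""
    if max(usages) - min(usages) > imbalance_max:
        return "FD"   # forbidden: obvious imbalance, never select this PM
    if min(usages) >= near_full:
        return "AD"   # acceptance: balanced and nearly exhausted, first pick
    return "SD"       # safety: balanced with room left, second priority

# Posterior states of one VM tried against three hypothetical PMs.
print(classify_posterior_state([0.90, 0.88]))  # AD -> highest priority
print(classify_posterior_state([0.60, 0.55]))  # SD -> second priority
print(classify_posterior_state([0.95, 0.40]))  # FD -> rejected
```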

4.2.6 Power consumption and VM performance degradation

Lovász et al. [59] addressed the performance degradation incurred when multiple VMs share a single PM in a heterogeneous server infrastructure. Running multiple VMs on a single piece of hardware is susceptible to resource contention: different VMs issue many requests to access shared hardware resources such as the CPU, which leads to frequent context switching and consequently degrades VM performance. The proposed approach is energy- and performance-aware and is meant to strike a tradeoff between energy consumption and performance degradation. In addition, the authors provided a model to predict the performance degradation overhead (in terms of response time) of a service encapsulated in a VM when it is co-placed with other VMs, compared to the performance of the same service in a non-virtualized environment. The model considers three parameters that influence the performance degradation of VM v: (1) \(v\#\), the number of VMs competing for a specific CPU core; (2) \(s^{CPU}\), the total load on the CPU core of server \(s\); and (3) \(v^{CPU}\), the CPU demand of VM \(v\) itself. The equation below was derived to represent the relationship between these parameters and the performance degradation of VM \(v\) (\(p_{virt} \left({v,s} \right)\)) in the virtualized environment when \(v\) is placed on server \(s\):

$$p_{virt} \left({v,s} \right) = p_{no\_virt} \left({v,s} \right) + v\# \cdot \left({\lambda_{1} + \lambda_{2} v^{CPU}} \right)$$
(25)

where \(p_{no\_virt} \left({v,s} \right)\) is the performance of the virtual service \(v\) on server \(s\) in a non-virtualized environment, and \(\lambda_{1}\) and \(\lambda_{2}\) are two experimentally determined constants.
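Eq. (25) is straightforward to evaluate once the constants are known; in the sketch below the λ values and workload figures are invented placeholders, since the paper determines them experimentally:

```python
def p_virt(p_no_virt: float, n_competing_vms: int, v_cpu: float,
           lambda1: float = 0.8, lambda2: float = 0.05) -> float:
    """Eq. (25): predicted response time of a service inside a VM that
    shares a CPU core with n_competing_vms other VMs."""
    return p_no_virt + n_competing_vms * (lambda1 + lambda2 * v_cpu)

# A service with a 120 ms bare-metal response time, sharing a core with
# 3 other VMs while demanding 40% of the core (all numbers hypothetical).
print(p_virt(120.0, 3, 40.0))  # 120 + 3 * (0.8 + 2.0) = 128.4
```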

On the basis of this model, two heuristic algorithms, termed the greedy heuristic and ModifiedFirstFit, were proposed to approximate the optimal solution to the problem. The average overall performance of the two algorithms was evaluated against four competitors: load balancing, maximum-density consolidation, exhaustive optimal allocation, and best-of-random allocation. The experimental results demonstrate that the proposed algorithms perform significantly better than the competitors in terms of energy saving; the greedy heuristic provides an additional energy saving of 30%. However, this energy saving is achieved at the price of a higher degree of performance degradation. With regard to computational complexity, both proposed heuristics have a complexity of \(O\left({n \cdot m} \right)\), while the complexities of best-of-random allocation and exhaustive optimal allocation are \(O\left({n \cdot m} \right)\) and \(O\left({m^{n}} \right)\) respectively (m: number of PMs, n: number of VMs).

In another work, Zhao et al. [113] proposed an ant colony based method named PPVMP to solve the bi-objective VM placement problem with the objectives of power consumption and performance degradation. In dealing with the multiple objectives, the authors took advantage of the Pareto concept to find solutions that are optimal with respect to both objectives simultaneously. PPVMP is built on two fundamental models, for power consumption and performance degradation. The power consumption model, denoted by \(PW_{j} \left({U_{j}} \right)\), is formulated as:

$$PW_{j} \left({U_{j}} \right) = PW_{j}^{idle} + PW_{j}^{dync} \left({U_{j}} \right)$$
(26)

where \(PW_{j}^{idle}\) is the power consumed by physical machine j in the idle state and \(PW_{j}^{dync}\) is the power consumed by physical machine j when it is busy; \(U_{j}\) is the CPU utilization level. To characterize the resource contention in a PM, three individual performance models for CPU, memory, and network are used. The CPU relative performance of a VM running on PM Mj is denoted by \(mp_{c}^{i}\) and defined as follows:

$$mp_{c}^{i} \propto \left\{{\begin{array}{ll} 1 &\mathop\sum\limits_{i} v_{c}^{i} \le M_{c}^{j} - M_{c,r}^{j}\\{\frac{{M_{c}^{j} - M_{c,r}^{j}}}{\gamma_{c} \mathop \sum \nolimits_{i} v_{c}^{i}}} &Otherwise \\ \end{array}}\right.$$
(27)

where \(M_{c,r}^{j}\) is the CPU capacity reserved for running Mj, \(\gamma_{c}\) is the CPU performance degradation parameter, \(M_{c}^{j}\) is the total CPU capacity of Mj and \(v_{c}^{i}\) is the CPU requirement of VM \(V_{i}\). The memory relative performance is denoted by \(mp_{m}^{i}\) and defined as follows:

$$mp_{m}^{i} \propto \frac{{M_{m}^{j} - M_{m,r}^{j}}}{{\gamma_{m} \mathop \sum \nolimits_{i} v_{m}^{i}}}$$
(28)

where \(M_{m,r}^{j}\) is the memory reserved for running Mj, \(\gamma_{m}\) is the memory performance degradation parameter, \(M_{m}^{j}\) is the total memory of Mj and \(v_{m}^{i}\) is the memory requirement of VM \(V_{i}\). Likewise, the network relative performance is denoted by \(mp_{n}^{i}\) and defined as follows:

$$mp_{n}^{i} \propto \frac{{M_{n}^{j}}}{{\gamma_{n} \mathop \sum \nolimits_{i} v_{n}^{i}}}$$
(29)

where \(\gamma_{n}\) is the network performance degradation parameter, \(M_{n}^{j}\) is the total network bandwidth of \(M_{j}\) and \(v_{n}^{i}\) is the network bandwidth requirement of VM \(V_{i}\).
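Up to the proportionality constant in Eq. (27), the CPU model can be sketched as follows; the core counts, reservation and degradation parameter are invented for illustration:

```python
def cpu_relative_performance(vm_cpu_demands, m_c, m_c_reserved, gamma_c):
    """Eq. (27), up to its proportionality constant: VMs on PM M_j run at
    full speed while their aggregate CPU demand fits into the unreserved
    capacity, and degrade proportionally once it does not."""
    available = m_c - m_c_reserved
    total_demand = sum(vm_cpu_demands)
    if total_demand <= available:
        return 1.0
    return available / (gamma_c * total_demand)

# Hypothetical PM: 16 cores, 2 reserved for the hypervisor, degradation
# parameter 1.2, and three VMs demanding 18 cores in total.
print(cpu_relative_performance([6, 6, 6], 16, 2, 1.2))  # 14 / 21.6 ~ 0.648
```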

To evaluate its efficiency, PPVMP was compared to CMBFD [114], VMPBBO [42] and VMPACS [95] in CloudSim and on a real OpenStack cloud platform. In terms of both objectives, the authors found PPVMP to perform best among the compared methods. According to the authors, the superiority of the proposed approach is mainly attributed to their choice of both the power consumption model and the performance degradation model. However, as the authors mention, performance degradation is not considered a target/objective in the three compared works, and a comparison with other VM placement methods that share both objectives would therefore make more sense here.

4.2.7 Energy consumption and interference among VMs

Sharifi et al. [105] applied a simulated annealing (SA) technique to schedule a number of VMs on a set of PMs in a data center. The goal is to minimize the total power consumption of the whole data center while also minimizing the performance interference among different types of workloads. The workloads are either processor-intensive or disk-intensive. The scheduler uses a criterion called consolidation fitness (CF) to assess the consolidation of a set of VMs on a number of PMs before the scheduling actually takes place. CF is calculated by dividing the performance degradation of a set of VMs by the amount of energy saving gained through consolidation, as shown by the equation below:

$$CF = PD/SE$$
(30)

where PD denotes the performance degradation when the VMs are consolidated and SE is the energy saved by the consolidation. The smaller the CF, the more reasonable the VM placement. Figure 15 illustrates the way CF is calculated. In Fig. 15a, the energy consumption (\(e_{1}\)) and the execution time (\(t_{1}\)) are measured for two VMs running on two separate PMs. In Fig. 15b, the same parameters are measured again (denoted by \(e_{2}\) and \(t_{2}\)) when both VMs are placed on a single PM and the spare PM is switched off. PD and SE are then computed by the following formulas:

$$PD = \frac{{t_{2} - t_{1}}}{{t_{1}}} \times 100$$
(31)
$$SE = \frac{{e_{1} - e_{2}}}{{e_{1}}} \times 100$$
(32)
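Putting Eqs. (30)–(32) together, the sketch below computes CF from hypothetical before/after measurements:

```python
def consolidation_fitness(t1, e1, t2, e2):
    """Eqs. (30)-(32): CF = PD / SE, both in percent. A smaller CF means
    the consolidation costs little performance per unit of energy saved."""
    pd = (t2 - t1) / t1 * 100.0   # performance degradation (Eq. 31)
    se = (e1 - e2) / e1 * 100.0   # saved energy (Eq. 32)
    return pd / se

# Hypothetical measurements: two VMs on separate PMs (t1, e1) versus both
# consolidated on one PM with the spare PM switched off (t2, e2).
print(consolidation_fitness(t1=100.0, e1=500.0, t2=110.0, e2=300.0))
# PD = 10%, SE = 40%  ->  CF = 0.25
```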

The proposed power model calculates the total power consumption as the sum of the power consumed by the processor and the disk. First, each individual objective is minimized separately using simulated annealing to find an optimal point for each objective. Thereafter, a weighted-sum technique is employed to transform the original multi-objective problem into a single-objective equivalent. Upon optimizing the problem objectives, the proposed method generates a system state as output, which includes a binary matrix mapping VMs to PMs (\(X_{ij}\)) along with a binary vector (\(Z_{i}\)) that determines which PMs are off or on. This system state can be fed back into the same method as input to move to the next system state (including \(X^{\prime}_{{ij}}\) and \(Z^{\prime}_{{i}}\)). The difference between the two system states determines which PMs should be switched off or on and which VMs should be migrated to which PMs. However, the authors did not address how frequently the algorithm is executed and under what conditions. To evaluate the performance, through a simulative experiment the presented algorithm was compared to the static algorithms presented in [115, 116] and the dynamic load-balancing scheduling methods presented in [117]. The comparison was carried out based on the power consumption and computation time of the different algorithms. The experimental results indicate that the proposed approach saves 24.9% more energy than the two other methods. However, the total execution time of all the VMs for the proposed approach is 1.2% higher than for the static method, since static algorithms naturally require less time to complete, having full knowledge of all jobs. The authors also reported the time complexity of the proposed algorithm as \(O\left({M \times N} \right)\) (M: number of PMs, N: number of VMs).

Fig. 15 Calculating CF metric

4.2.8 VMs communication latency and number of PMs

Pascual et al. [39] proposed an evolutionary multi-objective placement policy which attempts to simultaneously minimize the communication latency and the number of active servers for an application. The application is formed by a set of communicating VMs and has to be assigned to a group of physical servers in the data center. The VMs intercommunicate according to a specific layer-based organization similar to the one depicted in Fig. 16. The client is assumed to be aware of the communication needs and the interconnection network of his application and hands this information over to the provider to be used in the placement process. To reduce the inter-VM communication overhead, the placement policy places the most communicative VMs as close together as possible. The proposed model for calculating the communication latency is based on the bandwidth and the distance between VMs, as shown by the following formula:

$$Latency = \mathop \sum \limits_{{v_{i}, v_{j} \in V}} d(v_{i},v_{j}) \times bw\left({v_{i},v_{j}} \right)$$
(33)

where \(d\left({v_{i},v_{j}} \right)\) is the network distance between the cores of the PMs assigned to VMs i and j, \(bw\left({v_{i},v_{j}} \right)\) is the bandwidth required between VMs i and j, and \(V\) is the set of all VMs. Distance is measured as the number of hops from the sender PM to the receiver PM. For example, as shown in Fig. 16, the distance between the VMs assigned to PM 1 and PM 2, \(d\left({1,2} \right)\), is 2. Similarly, \(d\left({5,7} \right) = 4\) and \(d\left({8,16} \right) = 6\).
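Eq. (33) is a sum over communicating VM pairs; the sketch below evaluates it for a hypothetical three-VM application with invented distances and bandwidth demands:

```python
def communication_latency(distance, bandwidth):
    """Eq. (33): sum over communicating VM pairs of hop distance times
    required bandwidth. Both arguments map (vm_i, vm_j) pairs to values."""
    return sum(d * bandwidth[pair] for pair, d in distance.items())

# Hypothetical 3-VM application: hop distances and bandwidth demands
# between VM pairs (units arbitrary).
distance = {("v1", "v2"): 2, ("v1", "v3"): 4, ("v2", "v3"): 4}
bandwidth = {("v1", "v2"): 10, ("v1", "v3"): 1, ("v2", "v3"): 2}
print(communication_latency(distance, bandwidth))  # 2*10 + 4*1 + 4*2 = 32
```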

Fig. 16 Representation of physical organization of data center

Two well-known evolutionary multi-objective algorithms, SPEA-II [118] and NSGA-II [99], were employed to tackle the placement problem. In addition, the results obtained from common heuristic-based placement strategies (FF and RR) were used as starting points for the evolutionary optimization technique. The VM placement strategies were tested in a simulation-based environment. According to the results, SPEA-II performs better than NSGA-II in the majority of cases. Moreover, the results indicate that applying the placement policy has a positive impact on both the data center and the VMs in terms of VM execution time and energy consumption. Specifically, the average execution time per request is reduced by up to 11–19%, and for a highly loaded data center the energy saving is between 7.26 and 13.81%. One downside of the proposed approach is that it requires clients to specify the application architecture and interconnection network in advance.

4.2.9 Power consumption, resource wastage, and thermal dissipation

Thermal performance is one of the key indicators in managing a data center [81]. Workloads tightly packed on a small number of servers create hotspots, which make hardware prone to failure and incur extra cooling expenditure [81]. Well-designed cooling equipment and ventilation systems are necessary to avoid overheating, performance degradation or even hardware failure [119]. Nevertheless, a suitable thermal management policy is still crucial to further reduce cooling cost, alleviate hotspots and keep the temperature in a safe range [81]. Xu and Fortes [81] proposed temperature-aware placement policies to improve the overall performance by minimizing server temperature alongside other objectives such as power consumption and resource wastage. The proposed policy employs an improved genetic algorithm (called MGGA) with a fuzzy multi-objective evaluation to search for the solution that best minimizes the above-mentioned conflicting objectives. According to the conducted profiling study, there is a linear relationship between CPU temperature and power consumption, as denoted by the following formula:

$$T = PR + T_{amb}$$
(34)

where \(T\) is the temperature, \(P\) is the power consumption, R denotes the thermal resistance and \(T_{amb}\) is the ambient temperature. The presented resource wastage model calculates the wasted resource (\(W\)) as the sum of the differences between the smallest normalized residual resource (\(R_{k}\)) and the others (\(R_{i}\)), as shown in the following formula:

$$W = \mathop \sum \limits_{i \ne k} \left({R_{i} - R_{k}} \right)$$
(35)

where \(R_{i}\) is the ratio of the residual resource to the total resource for resource type i and \(R_{k} = \mathop {\min}\limits_{i} R_{i}\). Therefore, the larger the differences among the dimensions, the more resources are wasted. The results of simulative experiments show that the proposed approach is superior to approaches such as bin packing algorithms and single-objective algorithms (which tend to minimize individual objectives) in terms of performance, scalability, and robustness. To validate the performance, the authors showed that, overall, MGGA returns lower values for the different objectives. Scalability was evaluated by varying the number of VMs (100–2000) and PMs (50–1000); MGGA takes up to 3 min to solve the problem with 1000 PMs and 2000 VMs. In addition, the execution time of MGGA grows linearly with the population size S and the number of generations G. Finally, to validate robustness, the authors showed that the results of MGGA are not sensitive to the particular values of S and G.
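Eq. (35) can be transcribed directly; the residual ratios below are invented to show how imbalance drives the wastage value up:

```python
def wasted_resource(residual_ratios):
    """Eq. (35): W = sum_i (R_i - R_k), where R_k is the smallest
    normalized residual resource across all dimensions."""
    r_k = min(residual_ratios)
    return sum(r - r_k for r in residual_ratios)

# Hypothetical residual ratios for CPU, memory and disk on one PM:
print(wasted_resource([0.10, 0.12, 0.15]))  # balanced usage   -> 0.07
print(wasted_resource([0.05, 0.60, 0.50]))  # imbalanced usage -> 1.00
```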

In addition to this work, another study [12] by the same authors addresses VM placement in the dynamic scenario and proposes a controller which automatically maps VMs to PMs in order to satisfy the same objectives as the previous study while also reducing migration cost.

4.2.10 Power consumption, network traffic, and migration cost

Dong et al. [106] proposed two placement strategies, one for the static and one for the dynamic scenario. The first algorithm, called VM-P, is a greedy algorithm designed for the initial placement of VMs. The static problem is abstracted as a multi-dimensional bin packing problem with the objective of minimizing a weighted sum of the power consumption and the network traffic among VMs. In addition, VM-Mig was developed for the dynamic scenario and is meant to minimize the weighted sum of the aforementioned objectives coupled with a third objective, the migration cost, which is defined as the number of migrated VMs. However, the authors did not mention when VM-Mig is triggered. Basically, VM-Mig uses the same VM-P strategy for finding a new placement of VMs on PMs, but the outcome of VM-P is accepted only if the number of migrations (resulting from the difference between the previous and current placements) is less than a pre-determined threshold. In the event that the new placement requires more migrations than the threshold value, only some of the migrations (fewer than the threshold) that can improve the performance are accepted. The drawbacks of VM-Mig are poor stability and a tendency to get stuck in local optima.

Through a simulation experiment, the proposed algorithms were compared to the FFD, T-opt and Random algorithms. Overall, the proposed algorithms were found to attain better results in terms of energy consumption and communication traffic. In addition, the time complexities of VM-P and VM-Mig were reported as \(O\left({m \cdot n^{2}} \right)\) and \(O\left({nMax \cdot n^{2}} \right)\) respectively (m: number of PMs, n: number of VMs, nMax: number of loop iterations for finding a placement with fewer migrations), which means VM-P is computationally more expensive than FFD (\(O\left({n\log n} \right)\)).

4.2.11 Power consumption by PMs and power consumption for inter VM traffic

As an improvement to their preliminary work [120], Tang and Pan [107] applied a hybrid genetic algorithm (HGA), which combines a genetic algorithm with a local search technique, to solve the VM placement problem. Analogous to the previous work, the objective is to minimize the power consumed by the PMs together with the power consumed by the communication network of the data center. The presumed communication network topology is similar to the structure used by Pascual et al. [39]. The power consumption of the communication network depends on the number of network devices, such as switches, used by the VMs to communicate with each other. The communication between pairs of VMs is categorized into four classes, as shown in the example of Fig. 17. These classes are: C1, communication that does not involve any network device (two VMs placed on a single PM), e.g. the communication between VM 1 and VM 2 in Fig. 17; C2, communication that uses only one network device, e.g. between VM 1 and VM 3; C3, communication that involves three network devices, e.g. between VM 3 and VM 4; and C4, communication that uses five network devices, e.g. between VM 4 and VM 5. The authors approximate the total network power consumption as follows:

$$E\left(c \right) = e\left(c \right) \times l\left(c \right)$$
(36)

where \(l\left(c \right)\) denotes the amount of data to be transmitted over communication c and \(e\left(c \right)\) is the power needed to transfer a unit of data within class c. \(e\left(c \right)\) is defined as follows:

$$e\left( c \right) = \left\{ {\begin{array}{ll} {e_{i}} & {if\,c \in C_{i} ,\quad 2 \le i \le 4} \\ 0 & {if\,c \in C_{1}} \\ \end{array}} \right.$$
(37)

The evaluation results demonstrate the superiority of HGA over the original genetic algorithm presented in [120] in terms of performance, efficiency, and scalability. In particular, HGA was found to be better at discovering new solutions in the search space, to converge faster to the optimal solutions and to produce better solutions with respect to minimizing the objective function. The mean total energy consumption of the solutions found by HGA is 27.36–43.90% less than that of the original GA, while the mean computation time of HGA is 73.30–88.61% less than that of the original GA.

Fig. 17 Different categories of VM communication

Furthermore, as the number of PMs and VMs increases, HGA exhibits nearly linear computation time. However, since the multi-objective problem is transformed into a weighted-sum form, determining proper weight coefficients is not always straightforward and usually requires a time-consuming trial-and-error process.

4.2.12 Resource usage, server usage, and bandwidth usage

Kanagavelu et al. [44] developed a greedy approach called Greedy VM Placement with Two Routing (GVMTPR) to reduce the possibility of network congestion and balance the load in a data center by distributing the traffic over multiple paths. The maximum load on the links is considered a measure of congestion. Besides, the proposed method offers partial traffic protection to enhance link reliability. This is performed by splitting the traffic flow between two adjacent VMs across two disjoint paths, so that at least one path remains available in the event of a single link failure. Specifically, by dividing b units of traffic into b1 and b2 units, the minimum of b1 and b2, denoted min(b1, b2), remains available in the event of a single link failure. Partial protection is measured by the protection grade, which is defined as the fraction of guaranteed bandwidth in case of a single link failure. The protection grade for two paths with b1 and b2 units of bandwidth is \(\hbox{min} \left({b_{1},b_{2}} \right)/b\), where b is the guaranteed bandwidth of a particular flow. In each stage of its greedy procedure, GVMTPR attempts to minimize a weighted sum of three costs: Resource Usage (RU), the fraction of resources used in a particular server; Server Usage (SU), the fraction of active servers in the data center; and Bandwidth Usage (BU), the bandwidth needed for a pair of physical servers \(S_{x}\) and \(S_{y}\) to communicate with each other, weighted by the hop distance \(h_{x,y}\). If \(B_{x,y}\) is the total bandwidth required for all the VMs placed on \(S_{x}\) to communicate with the VMs on \(S_{y}\), the bandwidth usage (BU) over all servers is calculated as:

$$BU = \frac{{\mathop \sum \nolimits_{x,y} h_{x,y} \cdot B_{x,y}}}{{B_{T}}}$$
(38)

where \(B_{T}\) is the total bandwidth requirement of the traffic.

The performance of GVMTPR was compared to the first-fit heuristic and random placement, and the achieved results demonstrate the effectiveness of the proposed algorithm in terms of bandwidth cost and performance. The authors also report the time complexity of the proposed method as \(O\left({n^{2} m^{2}} \right)\), where n is the number of servers and m is the number of VMs.

4.2.13 Maximum bandwidth occupancy on the uplinks of the ToR switches and maximum number of VM partitions of all the requests

Chen et al. [121] proposed the Least-Load First Based Placement (LLBP) algorithm to simultaneously minimize the maximum number of VM partitions over all requests and the maximum bandwidth occupancy on the uplinks of Top-of-Rack (ToR) switches. The VM placement problem was studied on the basis of the three-layer tree-like architecture exemplified in Fig. 18. In this topology, ToR switches are connected to aggregation switches and the aggregation switches are connected to core switches. The traffic accumulated at the higher-level links makes them bottlenecks, prone to oversubscription; distributing the traffic evenly across all ToR switch uplinks is therefore necessary to hinder the creation of hotspots. The data center network is assumed to have n ToR switches (as shown in Fig. 18), represented by \(T = T_{1}, T_{2}, \ldots,T_{n}\). Each ToR switch can accept at most c VMs on the PMs connected to it. There are \(R = \left\{{R_{1},R_{2}, \ldots,R_{m}} \right\}\) requests from different tenants (each request from one tenant), and each request \(R_{i}\) asks for the placement of a set \(S_{i}\) of VMs that spans more than one ToR switch (\(\left| {S_{i}} \right| > c\)). Each VM under a ToR switch contributes a certain amount of traffic toward the higher level. The first objective is defined as:

$$min \mathop {\max}\limits_{{k \in \left[{1..n} \right]}} B_{Lk}$$
(39)

where \(B_{Lk}\) is the accumulated bandwidth occupancy on uplink \(L_{k}\). To reduce the communication overhead, the VMs of the same request are placed on as few PMs as possible. If \(T_{{R_{i}}}\) is the subset of ToR switches under which the VMs of request \(R_{i}\) are placed, the second objective is defined as follows:

$$min \mathop {\max}\limits_{{i \in \left[{1..m} \right]}} \left| {T_{{R_{i}}}} \right|$$
(40)

where \(\left| {T_{{R_{i}}}} \right|\) denotes the number of switches assigned to the VMs of request \(R_{i}\).

Fig. 18 Three-layer architecture for minimizing maximum bandwidth occupancy of ToR switches

The proposed heuristic algorithm (LLBP) places the requests in non-increasing order of their number of VMs. LLBP tries to place each request on the least-loaded ToR switch that has the capacity for all the VMs of the request. The performance of LLBP was evaluated against Greedy Based Placement (GBP) as a baseline algorithm, as well as Longest Processing Time Based Placement (LPTBP), which can generate near-optimal solutions but does not take VM communication locality into account. According to the simulation results, and based on the minimum and maximum recorded values of bandwidth occupancy on the uplinks of all ToR switches, LPTBP performs best, LLBP is in the middle, and GBP is the worst. The authors also observed that GBP takes full advantage of the communication locality property (as reflected by the second objective), LPTBP is able to spread the traffic equally across all the ToR switch uplinks (as reflected by the first objective), and LLBP is effective in balancing both objectives simultaneously.

4.2.14 Response time, failure rate and resource utilization

Chen and Jiang [108] developed a fault-tolerant VM placement method to guarantee the reliability of cloud applications. The proposed method considers three factors as constraints, namely the response time, failure rate and resource consumption of a cloud application running on a VM. In addition, four well-known fault-tolerance strategies are employed: Retry, Recovery Block, N-Version Programming and Active. The objective function is defined as minimizing the weighted sum of the response-time ratio, the failure-rate ratio and the resource utilization of a cloud application running on a VM. A two-phase VM placement algorithm was proposed: in the first phase, the best objective-function value for each fault-tolerance strategy is obtained; in the second phase, the VM placement is solved based on the results of the first phase. The authors compared the performance of the proposed fault-tolerant approach to three other fault-tolerance strategies (NOFTPlace, RandomFTPlace, ResourceFTPlace) as the constraint values were progressively increased, and the results show that the objective-function value achieved by the proposed approach is lower than that of the other approaches. The authors also reported the time complexity of the proposed VM placement algorithm as O(a × v × n), where v is the number of VMs, n is the number of PMs and a is the number of fault-tolerance strategies.

4.3 High-performance computing (HPC) applications

Cloud computing can be envisioned as a potentially cost-effective solution for high-performance computing applications. This can be particularly attractive for users with small computing capacity who are unable to establish their own cluster infrastructure. The pool of interconnected commodity computers, together with virtualization technology, makes the cloud a considerable choice for HPC applications [122]. However, the mismatch between the nature of HPC applications and current cloud architectures might hinder the effective utilization of cloud infrastructure for HPC purposes. An HPC-aware VM placement strategy is expected to improve the performance of HPC applications because HPC applications are typically tightly coupled, with internal computing nodes that frequently interact with each other. Some works, such as [122, 123], studied VM placement strategies that take the characteristics of HPC in the cloud into account.

Gupta et al. [122] explored the challenges and advantages of HPC-oriented VM placement techniques for the cloud computing environment. Two techniques for optimizing VM placement with respect to HPC applications were implemented: topology awareness and hardware awareness. Topology awareness requires providing knowledge of the network topology to the HPC application; typically, in the cloud context, the cluster topology is hidden from the users. For an HPC application, the goal is to place VMs on PMs with the least possible distance from each other in order to reduce communication overhead. To address this issue, the authors attempted to place all requested VMs on the same rack rather than randomly distributing them over the data center. Hardware awareness requires providing the specification of the underlying hardware to the HPC application. HPC applications are composed of a number of iterations, each comprising two phases: computation and communication/synchronization. The next iteration cannot start until all processing nodes have fully completed the previous iteration. Cloud infrastructures consist of heterogeneous commodity PMs; when a slow PM takes longer to complete an iteration, time on the faster PMs is wasted and the overall application performance degrades. In compliance with the hardware-awareness characteristic, a proper VM placement strategy is expected to place all VMs on a set of PMs with equal computing power; the authors address this issue by attempting to place all the requested VMs on an identical type of processor. Topology awareness and hardware awareness were implemented on top of the OpenStack scheduler layer [124]; the OpenStack scheduler is responsible for receiving the VM request and determining the proper PM to host the VM. An evaluation was conducted on the OpenCirrus [125] test-bed. The results indicate that using the topology-aware mechanism yields a 5% performance improvement compared to random scheduling. In addition, after applying the hardware-aware technique, an improvement of 20% in time × N CPU-hours (N: number of processors used) is achieved in terms of execution time.

Due to their communicative nature, HPC applications are often subject to competitive access to the shared last-level cache (SLLC). This incurs a serious issue called cache contention, which overshadows the performance isolation that virtualization offers to HPC applications running within VMs. Jin et al. [126] addressed the performance degradation resulting from the cache contention of applications in the HPC cloud. An enhanced reuse-distance analysis with an accelerated cyclic compression algorithm is employed to classify the HPC applications based on their cache access behavior. According to this classification, HPC cloud applications are divided into three categories: cache pollution applications, which occupy a large amount of cache capacity; cache sensitive applications, which strongly depend on the available cache resources; and cache friendly applications, which consume a small amount of cache capacity. In addition to the reuse-distance analysis, a Cache Contention-Aware virtual machine Placement method (CCAP) is designed to cope with the cache contention problem. CCAP dispatches VMs to distinct cores based on the applications' cache behavior information. Indeed, CCAP tends to minimize the interference between cache sensitive applications and cache pollution applications, and thus mitigates the negative impact of cache contention. The evaluation results show that CCAP significantly enhances the performance of cache sensitive applications when they are co-scheduled with cache pollution applications.

Similar to [126], Kim et al. [127] addressed the performance degradation of applications hosted in multiple VMs. The VMs are to be mapped to PMs with modern multi-core processor architectures, in which each individual core has its own private cache while the last-level cache (LLC) and the memory bus are shared among the cores. VMs co-located on a PM with a multi-core processor contend for access to the LLC and the memory bus, and performance degradation therefore arises due to interference among applications. A performance model is proposed based on two measures: interference intensity, which measures how much an application hurts its co-located applications, and interference sensitivity, which measures how much an application suffers from its co-located applications. Based on this performance model, a VM placement algorithm called swim is presented. Swim aims to minimize the average performance degradation ratio over all the applications; its main idea is to co-locate highly interference-intensive VMs with less interference-sensitive VMs. The experimental results show that applying swim causes performance degradation similar to that of the optimal allocation.

In another study, based on performance analysis, Mc Evoy et al. [128] conclude that the chosen strategy for mapping virtual clusters to physical resources, together with the inter-communication pattern between the application processes, has a significant impact on the performance of HPC parallel applications.

5 Taxonomy

This section provides a thematic taxonomy of VM placement approaches, as depicted in Fig. 19. The presented taxonomy is organized around several parameters and aspects, such as the uniformity of PMs/VMs, the number of clouds, the operation mode, the problem objectives, the methodology, the number of objectives and the resource demand mode. On the basis of this taxonomy, and after a discussion of the aforementioned aspects in the following sections, a detailed comparison of the different methods is also presented in Table 4.

Fig. 19 Taxonomy of the state-of-the-art VM placement techniques

Table 4 Comparison of different virtual machine placement schemes

5.1 Uniformity

VM placement is defined in two different contexts, namely placement in a heterogeneous data center environment and placement in a homogeneous environment. In the heterogeneous platform, PMs have different hardware specifications (such as CPU speed, memory size, and disk storage amount) [129], while in the homogeneous platform all the machines have an identical hardware configuration. In a cloud computing environment where old and new computing nodes operate side by side, placement is often carried out on a heterogeneous platform. As a result, an assumption of homogeneous PMs in a placement strategy is less realistic and can be a limiting factor in practice. In the design and implementation of a placement policy, the uniformity mode of the underlying hardware should be taken into account. Aside from the uniformity of PMs, the uniformity of VMs refers to mapping a set of VMs with identical hardware specifications onto a set of PMs.

5.2 Number of clouds

Although the majority of works studied the VM placement problem in a single-cloud environment, the problem has also been addressed [40, 90, 130, 131] in the multi-cloud scenario. Figure 20 shows the architecture of the multi-cloud scenario, in which a cloud client requests a collection of VMs to be deployed across multiple cloud providers. Each cloud provider offers a different pricing plan and diverse VM instance types, and has different resource management interfaces [132]. Due to the complicated decision-making task such a scenario poses for a typical client, a cloud brokering mechanism is often required to serve as an intermediary between the client and the providers [40, 132]. The broker middleware gathers information from the individual clouds and then optimally distributes the VMs to the most suitable servers across the multiple cloud systems, based on the VM requirements and the clouds' resource specifications [132]. In addition, the cloud broker provides a uniform management interface to the client with a transparent view of a heterogeneous set of providers, regardless of the particular cloud provider technology [40, 90]. The advantages of a multi-cloud service for the client are reduced cost, fault tolerance and enhanced service reliability. In dynamic placement modes, one of the salient challenges is the communication overhead due to VM migration between different cloud providers [132]. Likewise, in the case of tightly coupled VMs with large inter-VM traffic, the multi-cloud scenario results in high communication overhead between the different clouds [90]. Furthermore, over the long term, cloud specifications such as pricing schemes and VM instance types are subject to frequent revision, and the placement algorithm therefore needs to run recurrently to adapt the resource allocation to the latest changes at the cloud providers [40, 90].

Fig. 20 Multi-cloud architecture

5.3 Operation mode

The problem of placing a set of VMs on proper PMs is defined under two different modes: static (or initial/offline) placement and dynamic (or online) placement. Static placement places a number of VMs at once on an unloaded data center, subject to the VMs' requirements and the resource capacities of the PMs. Static placement is also often performed when the system resumes operation after a period of idleness or a reset. The decision made in static placement plays an important role in overall data center performance, since a large later change to the initial VM assignment incurs extra migrations and therefore imposes a large communication overhead. In addition, static placement is performed less frequently than dynamic placement and has a long-term effect, since extensive changes incur large overhead [12]. In dynamic placement, on the other hand, VMs are re-assigned to PMs due to unforeseen changes in VM requirements, VM terminations, and the halt or launch of VMs. In a dynamic placement scenario, an instant decision must be made at run time to minimize the migration overhead and improve overall system performance [41]. Indeed, each placement mode calls for a different strategy.

5.4 Objectives

In general, the objectives of VM placement fall into two types: Cloud Service Provider (CSP) oriented and user-oriented. CSP-oriented objectives are defined to fulfill a requirement or minimize a cost for the benefit of the data center provider, while user-oriented objectives are meant to serve users faster and more affordably. For example, power consumption, a common objective, is meant to reduce the operational cost for the data center provider. In contrast, the cost of deployment is a typical objective of interest for a cloud user.

5.5 Methodology

In this section, a spectrum of the most prominent algorithms proposed in the literature for dealing with VM placement is briefly described. Finally, a comparison of these algorithms based on their advantages and limitations is presented in Table 5. The algorithms proposed for solving the VM placement problem can be broadly classified into two categories: exact algorithms and approximate algorithms. Exact algorithms are guaranteed to provide an optimal solution to the problem. However, because of their high computational time they are impractical except on small problem instances. If finding an optimal solution takes too long, a tradeoff must be made between optimality and efficiency. In practice, for large instances with many VMs and PMs as input, approximate algorithms (either heuristics or meta-heuristics) are used to deliver a sub-optimal solution within a reasonable amount of time. Heuristic algorithms are problem-dependent methods that exploit the specifics of the problem at hand. Meta-heuristic algorithms, on the other hand, are generic problem-independent methods that can be applied to a wide range of complex problems.

Table 5 Comparison of existing solutions for solving VM placement problem

5.5.1 Greedy heuristics

A greedy algorithm is a simple optimization algorithm that proceeds through a series of stages. In each stage, it makes the best possible choice for that particular stage. A greedy algorithm thus makes a sequence of locally optimal choices in the expectation that they construct a globally optimal solution at the end [133, 134]. Even though these techniques do not always guarantee an optimal solution, in contrast to exact approaches they reach a sub-optimal solution in reasonable time, especially for large problem sizes [67].

VM placement is often formulated as a variant of the multidimensional bin packing problem, in which a set of objects with multiple dimensions is to be packed into a number of multidimensional bins. The goal is to pack the objects into a minimum number of bins. The bin packing problem is also recognized as NP-hard; that is, no polynomial-time algorithm is known that always finds an optimal solution. When VM placement is reduced to bin packing, PMs represent bins while individual VMs stand for objects. So far, a number of well-known heuristics have been proposed to deal with the problem [49, 67, 135], most of them greedy-based. In the following, a few well-known greedy-based heuristics intended to address VM placement as a variant of multidimensional bin packing are described.

First Fit Decreasing (FFD): FFD is one of the well-known heuristics for tackling the multidimensional bin packing problem. The basic idea of FFD is to first sort the list of VMs in descending order of a certain criterion and then place the VMs sequentially, from the largest to the smallest, on the first PM with sufficient residual resources. The criterion can be a particular resource such as CPU or memory [67], or a single scalar calculated as a function of the individual resources of a VM. FFD is proven to allocate VMs to no more than \(\frac{11}{9} OPT + 1\) PMs, where \(OPT\) is the optimal number of PMs [112]. A minimal sketch is given below.
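To make the procedure concrete, the following Python sketch implements FFD under the simplifying assumptions of a single resource dimension and identical PM capacities; the demand figures are hypothetical, and practical implementations operate on multi-dimensional demands.

    def ffd_placement(vm_demands, pm_capacity):
        """Scan VMs in decreasing order of demand and place each on the
        first PM with enough residual capacity."""
        # Sort VM indices by descending demand (the "Decreasing" step).
        order = sorted(range(len(vm_demands)), key=lambda i: -vm_demands[i])
        residual = []            # residual capacity of each opened PM
        assignment = {}          # VM index -> PM index
        for i in order:
            for j, free in enumerate(residual):
                if vm_demands[i] <= free:      # first fit
                    residual[j] -= vm_demands[i]
                    assignment[i] = j
                    break
            else:                              # no opened PM fits: open a new one
                residual.append(pm_capacity - vm_demands[i])
                assignment[i] = len(residual) - 1
        return assignment, len(residual)

    # Example: six VMs, identical PMs with capacity 1.0 (normalized CPU share).
    assignment, pms_used = ffd_placement([0.5, 0.7, 0.3, 0.2, 0.6, 0.1], 1.0)
    print(pms_used, assignment)

On this hypothetical input the sketch opens three PMs, which is optimal here since the total demand is 2.4.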

Best Fit Decreasing (BFD): BFD first builds a list of VMs sorted in descending order of a certain criterion, as in FFD. In the second step, however, each VM is placed on the feasible PM with the least sufficient residual resources (the tightest fit). BFD is also shown to require at most \(\frac{11}{9} OPT + 1\) PMs in the worst case.

Worst Fit Decreasing (WFD): In the WFD heuristic, the list of VMs is first sorted according to a certain criterion, as in the two heuristics above. In the next step, however, each VM is placed on the feasible PM with the largest sufficient residual resources. The sketch below shows how the three heuristics differ only in the PM selection rule.
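Under the same illustrative assumptions as the FFD sketch above, the only part that changes between FFD, BFD, and WFD is how the target PM is chosen among the feasible ones:

    def pick_pm(residual, demand, strategy):
        """Select a PM for a VM of the given demand; return None if no
        opened PM fits (the caller then opens a new PM)."""
        feasible = [j for j, free in enumerate(residual) if free >= demand]
        if not feasible:
            return None
        if strategy == "best":                # BFD: tightest fit
            return min(feasible, key=lambda j: residual[j])
        if strategy == "worst":               # WFD: loosest fit
            return max(feasible, key=lambda j: residual[j])
        return feasible[0]                    # FFD: first fit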

5.5.2 Linear programming and integer programming

Linear programming (LP) is a mathematical optimization technique in which the objective function to be optimized (maximized or minimized) is expressed in linear form and a number of linear equality and inequality constraints must be satisfied. Integer programming is a special form of linear programming in which the variables can take integer values only [136]. A variety of real-world problems can be modeled and solved by LP. Generally, an LP is expressed as follows [137]:

$$\begin{aligned} &\text{Maximize/Minimize} \quad Cx \\ &\text{subject to:} \quad Ax \le B, \quad x \ge 0 \end{aligned}$$
(41)

where \(C\) is a vector of constants, \(x\) is a vector of decision variables, \(A\) is a matrix of coefficients, and \(B\) is a vector of bounds. After representing a problem in LP form, a dedicated solver such as CPLEX [92] is used to solve it.
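As an illustration (a generic formulation, not one taken from a particular surveyed work), VM placement with a single resource dimension can be cast as an integer program, where \(d_i\) is the demand of VM \(i\), \(C_j\) the capacity of PM \(j\), \(x_{ij} = 1\) iff VM \(i\) is placed on PM \(j\), and \(y_j = 1\) iff PM \(j\) is powered on:

$$\begin{aligned} &\text{Minimize} \quad \sum_{j} y_j \\ &\text{subject to:} \quad \sum_{j} x_{ij} = 1 \;\; \forall i, \qquad \sum_{i} d_i x_{ij} \le C_j y_j \;\; \forall j, \qquad x_{ij},\, y_j \in \{0, 1\} \end{aligned}$$

Minimizing \(\sum_j y_j\) corresponds to minimizing the number of active PMs; a multi-dimensional variant simply repeats the capacity constraint once per resource type.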

5.5.3 Genetic algorithm

Genetic algorithm (GA) which was first introduced by Holland [138] is a search technique for solving the optimization problems. GA emerged as a popular and powerful approach in finding near-optimal solutions for the complex problem with large search space in different domains. The technique is an inspiration of biological evolution that takes place in nature as expressed by Darwinian Theory of natural selection. In the natural environment, the fittest living organisms are more likely to resist against diseases and other dangers and eventually are able to survive and reproduce the next generation which have even fitter individuals than the previous generation. A simple genetic algorithm (SGA) simulates the natural evolution through a series of computer instructions. SGA commences with an initial population of random individuals. Each individual represents a candidate solution to the problem at hand. The fitness of an individual commensurate to the degree it minimizes/maximizes the problem’s objective function. To create a new generation of individuals, first, individuals with the highest fitness value are chosen through a particular selection mechanism. Then, crossover and mutation operators are applied to the selected parent in order to produce new offspring. The iterative evolution from one generation to the next is continued until a solution with satisfactory fitness is discovered.
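The following Python sketch shows one way an SGA can be specialized to VM placement; the encoding (one PM index per VM), the penalty-based fitness, and all parameter values are illustrative assumptions rather than a method from any surveyed work.

    import random

    DEMANDS = [0.5, 0.7, 0.3, 0.2, 0.6, 0.1]   # hypothetical VM demands
    N_PMS, CAPACITY = 6, 1.0                    # identical PMs, one resource

    def fitness(chrom):
        """Lower is better: number of active PMs plus a penalty for
        any capacity violation."""
        load = [0.0] * N_PMS
        for vm, pm in enumerate(chrom):
            load[pm] += DEMANDS[vm]
        active = sum(1 for l in load if l > 0)
        overload = sum(max(0.0, l - CAPACITY) for l in load)
        return active + 100 * overload

    def evolve(pop_size=30, generations=200, p_mut=0.1):
        pop = [[random.randrange(N_PMS) for _ in DEMANDS]
               for _ in range(pop_size)]
        for _ in range(generations):
            # Tournament selection of two parents.
            parents = [min(random.sample(pop, 3), key=fitness) for _ in range(2)]
            cut = random.randrange(1, len(DEMANDS))       # one-point crossover
            child = parents[0][:cut] + parents[1][cut:]
            for vm in range(len(child)):                  # mutation
                if random.random() < p_mut:
                    child[vm] = random.randrange(N_PMS)
            pop.sort(key=fitness)
            pop[-1] = child                               # replace the worst
        return min(pop, key=fitness)

    best = evolve()
    print(best, fitness(best))

The penalty term steers the search toward feasible assignments; a repair operator is a common alternative design choice.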

5.5.4 Ant colony optimization

Ant colony optimization (ACO) is a population-based meta-heuristic inspired by the foraging behavior of ants in nature [139]. This behavior enables real ants to find the shortest path from their nest to food sources [140] by depositing pheromone trails on the ground as a medium of inter-communication among ants. Artificial ants simulate this characteristic to solve combinatorial optimization problems [141]. The two core rules are sketched below.
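In a VM placement setting, the two core ACO ingredients can be sketched as follows: an ant assigns a VM to a PM with probability proportional to pheromone^alpha times heuristic^beta, and pheromone evaporates globally while being reinforced along good assignments. All names and parameters here are illustrative assumptions.

    import random

    def choose_pm(pheromone_row, heuristic_row, alpha=1.0, beta=2.0):
        """Pick a PM index for one VM with probability proportional to
        pheromone**alpha * heuristic**beta."""
        weights = [(tau ** alpha) * (eta ** beta)
                   for tau, eta in zip(pheromone_row, heuristic_row)]
        return random.choices(range(len(weights)), weights=weights)[0]

    def update_pheromone(pheromone, best_assignment, deposit, rho=0.1):
        """Evaporate on all (VM, PM) pairs, then deposit along the best
        assignment found in this iteration."""
        for row in pheromone:
            for j in range(len(row)):
                row[j] *= (1 - rho)
        for vm, pm in enumerate(best_assignment):
            pheromone[vm][pm] += deposit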

5.5.5 Simulated annealing

Simulated annealing (SA) is a popular search heuristic for combinatorial optimization inspired by the annealing process in metallurgy, where a metal is heated to its melting point and then slowly cooled [142]. In each iteration, SA considers moves in the neighborhood of the current solution and randomly picks one. Moves that improve the quality of the solution are always accepted, while non-improving moves are accepted with a certain decreasing probability of less than one [143]. In fact, the key feature of SA is avoiding entrapment in local optima, which occurs in older techniques such as hill climbing [144]. A sketch follows.
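A minimal Python sketch of SA for VM placement, assuming a cost function like the penalty-based fitness in the GA sketch above (lower is better); the move operator, the geometric cooling schedule, and all parameters are illustrative:

    import math
    import random

    def simulated_annealing(cost, initial, n_pms, t0=10.0, cooling=0.995,
                            t_min=1e-3):
        """A move reassigns one random VM to a random PM."""
        current, best = list(initial), list(initial)
        t = t0
        while t > t_min:
            neighbor = list(current)
            neighbor[random.randrange(len(neighbor))] = random.randrange(n_pms)
            delta = cost(neighbor) - cost(current)
            # Metropolis criterion: always accept improvements; accept
            # worsening moves with probability exp(-delta / t).
            if delta <= 0 or random.random() < math.exp(-delta / t):
                current = neighbor
                if cost(current) < cost(best):
                    best = list(current)
            t *= cooling          # geometric cooling schedule
        return best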

5.5.6 Artificial bee colony optimization

The artificial bee colony (ABC) algorithm, first introduced by D. Karaboga [145], is a subclass of swarm-intelligence-based algorithms that imitate the collective intelligence of honeybee swarms to solve various optimization problems. In the ABC algorithm, the colony of artificial bees splits into three groups: employed bees that forage for food, onlooker bees that observe other bees, and scout bees that randomly search for new food sources. The position of a food source represents a potential solution to the problem. Only one employed bee is designated per food source, and the quality of the solution is proportional to the amount of nectar available at the source. The employed bees forage for food sources, and when they bring nectar to the hive they share the gathered information about a food source, its position, and its quality with the onlooker bees through a so-called waggle dance as a medium of communication. The onlookers evaluate the information from the employed bees and choose the best food sources to forage [146]. While employed bees exploit the search space by slightly modifying a food source's position in the hope of improving the solution, scout bees perform exploration by randomly discovering new promising sources. The canonical position update is sketched below.
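As a small illustration, the canonical employed-bee update (stated, as in the original formulation, for continuous search spaces; applying ABC to the discrete VM placement problem requires an additional encoding step):

    import random

    def employed_bee_step(sources, i):
        """Perturb one randomly chosen dimension of food source i
        toward/away from another randomly chosen source k."""
        x = sources[i]
        k = random.choice([s for s in range(len(sources)) if s != i])
        j = random.randrange(len(x))          # dimension to perturb
        phi = random.uniform(-1.0, 1.0)
        candidate = list(x)
        candidate[j] = x[j] + phi * (x[j] - sources[k][j])
        return candidate                      # kept only if fitter (greedy)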

5.5.7 Tabu search

Tabu search (TS) is a local search strategy developed by Glover [147] to cope with the problem of becoming trapped in local optima. The strategy is characterized by its capability to memorize a history of previously encountered solutions: TS uses a short-term memory called the tabu list to record recently explored solutions (or moves) and thereby avoids re-visiting them, which prevents cycling. A tabu search algorithm begins by evaluating all neighbor solutions of the current solution; then the solution with the highest quality is selected and the tabu list is updated accordingly. A sketch follows.
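A minimal Python sketch of TS for VM placement, assuming the same cost function as the earlier sketches; here a move is the reassignment of a single VM, recently applied moves stay tabu for a fixed tenure, and the common aspiration criterion is omitted for brevity:

    from collections import deque

    def tabu_search(cost, initial, n_pms, iters=500, tenure=20):
        current, best = list(initial), list(initial)
        tabu = deque(maxlen=tenure)           # short-term memory of moves
        for _ in range(iters):
            # Enumerate the full single-reassignment neighborhood.
            moves = [(vm, pm) for vm in range(len(current))
                     for pm in range(n_pms)
                     if pm != current[vm] and (vm, pm) not in tabu]
            if not moves:
                break
            def make_move(move):
                s = list(current)
                s[move[0]] = move[1]
                return s
            vm, pm = min(moves, key=lambda m: cost(make_move(m)))
            current = make_move((vm, pm))
            tabu.append((vm, pm))
            if cost(current) < cost(best):
                best = list(current)
        return best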

5.5.8 Imperialist competitive algorithm

The imperialist competitive algorithm (ICA) was first introduced in [104] to solve real-world optimization problems. The algorithm mimics the imperialist competition among empires. Analogously to the genetic algorithm, ICA commences with an initial population of random individuals, each called a country. A country can be either an imperialist or a colony. During the imperialist competition process, the powerful imperialists, which represent better solutions to the problem, make the weaker ones collapse and take control of their colonies. The competition is iterated until a single empire, consisting of one imperialist along with its colonies, is left at the end. The final imperialist represents the best-found solution to the problem.

5.5.9 Memetic algorithm

The memetic algorithm (MA) was first introduced by Moscato [148]. MA is a population-based metaheuristic and a hybrid of the genetic algorithm (GA) and local search techniques [149,150,151]. Incorporating a local search capability into the genetic algorithm accelerates the search and increases the chance of convergence [151]. Like GA, MA begins with a population of random members. A local search is then applied to each member to improve the quality of the solution it represents. Thereafter, new offspring are produced by applying the crossover and mutation operators, and the local search is applied again to the new offspring forming the new population. New generations are produced until convergence occurs [151, 152]. A sketch of the refinement step is given below.
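As an illustration of the memetic refinement step, the following sketch applies a simple local search to a chromosome produced by the GA sketch above: it tries to empty the least-loaded PM by moving its VMs to other active PMs with spare capacity. All names and parameters are illustrative assumptions.

    def local_search(chrom, demands, capacity, n_pms):
        """Try to vacate the least-loaded active PM."""
        load = [0.0] * n_pms
        for vm, pm in enumerate(chrom):
            load[pm] += demands[vm]
        active = [j for j in range(n_pms) if load[j] > 0]
        if len(active) < 2:
            return chrom
        src = min(active, key=lambda j: load[j])    # least-loaded PM
        improved = list(chrom)
        for vm, pm in enumerate(chrom):
            if pm != src:
                continue
            # Move the VM to any other active PM that can host it.
            for dst in active:
                if dst != src and load[dst] + demands[vm] <= capacity:
                    improved[vm] = dst
                    load[dst] += demands[vm]
                    load[src] -= demands[vm]
                    break
        return improved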

5.6 Resource demand type

Most studies on VM placement assume that VM resource demands are constant over time; this type of demand is called static/deterministic [153]. Static demands are simply compared with the residual capacity of the target PM at the time of placement. Unlike VMs with static demands, VMs with dynamic demands change their resource requirements during their lifetime. According to some recent studies [154,155,156], the VM demand for particular resources such as network bandwidth can fluctuate and is therefore difficult to anticipate initially [157]. In this case, the mean or maximum VM demand is used as an estimate, although the estimation is not always accurate and may result in over-provisioning or resource wastage [153, 158]. Some works [153, 158] investigated VMs with stochastic demands and used a probabilistic model based on random variables to represent the uncertainty of future demands [157, 158]; a sketch of such a probabilistic admission test follows. In one of the attempts to estimate dynamic demands, Isci et al. [159] introduced a resource demand estimation technique meant to be lightweight, accurate, and general. Through experiments on synthetic and real data, the authors found that the technique significantly improves the efficiency of dynamic VM placement.
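A minimal sketch of a probabilistic admission test in this spirit, assuming each VM's demand is an independent Gaussian random variable (the demand figures are hypothetical): a PM hosts a set of VMs only if the probability of overload stays below a threshold epsilon.

    import math
    from statistics import NormalDist

    def fits_probabilistically(means, variances, capacity, epsilon=0.05):
        """Accept the VM set iff P(total demand > capacity) <= epsilon,
        under a normal approximation of the aggregate demand."""
        mu = sum(means)
        sigma = math.sqrt(sum(variances))
        # Chance constraint: mu + z_{1-eps} * sigma <= capacity.
        z = NormalDist().inv_cdf(1 - epsilon)
        return mu + z * sigma <= capacity

    # Three VMs with fluctuating demand on a PM with capacity 1.0.
    print(fits_probabilistically([0.3, 0.25, 0.2], [0.01, 0.02, 0.01], 1.0))

With these numbers the mean load (0.75) fits within the capacity, but the 95% chance constraint (about 1.08) does not, illustrating how mean-based estimates can mask overload risk.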

6 Open issues and future directions

This section highlights a number of salient research directions that have received less attention in the past and thus deserve more focus from researchers. The items presented suggest a potential platform for future research in the domain of VM placement.

6.1 Thermal-aware placement policy

A significant portion of the electricity consumed by computer equipment is transformed into heat. Operating at high temperatures reduces the lifetime of hardware parts and makes them less reliable and more susceptible to failure and malfunction. Therefore, keeping hardware within a safe temperature zone is always necessary. Even though cooling systems are widely deployed in modern data centers, they entail additional expenditure for purchase, maintenance, and electricity. One potential way to minimize heat dissipation is continuous monitoring of the thermal state of PMs and re-placement of VMs once a hotspot is created [160]; the relieved PM then requires less cooling power. Further research is required on the thermal topology and analysis of a data center in order to facilitate the efficient placement of VMs.

6.2 Price aware placement policy in a multi-cloud scenario

Today, a diverse set of cloud providers has established a competitive market for users. Each cloud provider offers diverse service plans with different time-varying price schemes (dynamic or static), specifications, and value-added features. This circumstance gives users the opportunity to select the most affordable service with the maximum level of desired quality. To the best of our knowledge, few works [90] address this issue, and further research seems necessary to study, design, and implement price-aware placement techniques in the multi-cloud scenario.

6.3 Security

Although VM placement has been studied from several performance-oriented aspects, the problem is less explored from the security and privacy perspective. For instance, a client might require that two specific VMs not run on the same PM, in order to prevent a potential leakage of business secrets to competitors. In addition, clients might be interested in restricting the placement of their VMs to data centers in certain geographical regions (e.g., due to legal issues).

6.4 Scalability

For many current VM placement approaches, there is little rigorous analysis and evaluation to demonstrate that the proposed strategy scales efficiently to modern gigantic data centers with thousands of VMs and PMs. Hence, a potential future research direction is to study the extension of current multi-cloud placement techniques to cope with real-world large-scale data centers.

6.5 A dynamic placement scheme from the scratch

Although dynamic VM placement has been studied in a number of research works, the majority of these works are designed on the basis of a recurring static placement across different time slots or consolidation states, or they focus merely on the variable resource demands of VMs [7, 67, 83]. In these works, at the beginning of each time slot, a static VM placement algorithm is invoked again and the difference between the two outputs/states (for the previous and current time slot) is calculated. The difference determines the VMs that should be migrated to other PMs as well as the PMs that should be powered on/off. However, to the best of our knowledge, none of these works addresses VM placement as a natively dynamic scenario in which every VM has its own lifespan and may undergo load changes during its lifetime. A potential future direction in this context is the investigation of a comprehensive dynamic mechanism for VM placement that addresses the dynamic creation and deletion of VMs, dynamic resource demands, failure of PMs, and the addition of new PMs.

6.6 Renewable energy resources

Modern data centers have begun moving toward renewable green energy resources such as solar, wind, and tidal power as a replacement for energy supplied by the electrical grid. In a geographically distributed cloud system with data center sites spread over different points of the globe, each data center may be operated using a particular type of renewable energy whose availability depends on the time of day or the weather conditions at its location. Efforts are necessary to make efficient use of these energy sources, which requires the design and implementation of VM placement strategies that work in accordance with energy availability. For instance, during night time, PMs in some data centers around the globe lose their source of solar energy, so their running VMs should be migrated to PMs in data centers currently operating in daytime. The initial placement of VMs must also consider the availability of energy at each data center location before placing a VM.

6.7 Multi-core processors

To the best of our knowledge, little research has so far been carried out on the study, design, and development of VM placement policies compliant with modern multi-core processing architectures. In particular, an elaborate analytical power model seems essential to precisely estimate the energy consumption of these systems. In addition, adequate effort should be devoted to mitigating the potential performance degradation caused by contention for shared resources in these systems.

6.8 Resource dimensions

The majority of the research works discussed in this paper address the VM placement problem with a main focus on CPU and memory as the two prime resource types, as reflected in Table 4. However, in the era of modern applications such as online gaming, video streaming, and augmented reality, other resource dimensions such as the Graphical Processing Unit (GPU) can become bottlenecks, act as important factors in resource management, and contribute significantly to the power consumption of a data center. Therefore, prospective research in this domain should devote adequate effort to studying the impact of the GPU in the designed power model and resource management policy.

7 Conclusion

One of the most critical issues for large-scale data centers is the substantial growth of power consumption. Efficient management of hardware resources can significantly reduce the power consumption of a data center. Many of the servers in a data center operate at a low utilization level. Consolidating underutilized servers onto an optimal number of fully utilized servers and turning off the spare servers would significantly help cut down the rampant electricity consumption. In a virtualized data center, VM placement is a primary and complex decision that affects the overall energy consumption of the data center.

In this paper, we have investigated several methods proposed in the literature for dealing with the VM placement problem. The problem, as we have seen, is defined under diverse settings in terms of the number of objectives (single or multiple), the type of objectives (energy consumption, resource utilization, number of PMs, etc.), and the presence of constraints. These settings vary from one scheme to another. Based on our observation of different research works, the choice of problem setting is highly dependent on the particular context and priorities. For example, when energy efficiency is the most important or critical issue for a data center administration, it is usually selected as the main objective of interest, while less important issues are simply ignored or added as constraints to the problem formulation. We found FFD to be the most common method for dealing with VM placement in its single-objective form; it provides a sub-optimal solution with a worst-case bound. As seen in this paper, many research works compare their proposed method to FFD as a baseline.

When there are two or more equally important objectives to be minimized/maximized, we encounter an intrinsically multi-objective VM placement problem. We observed that many existing schemes try to tackle the multi-objective VM placement problem using the weighted-sum approach. Although this approach is simple and straightforward to use, it has some disadvantages, as discussed earlier in this paper. In general, Pareto-based approaches are more recommended for tackling VM placement with multiple objectives, since they find the set of solutions that are optimal with respect to all objectives. As the search space of VM placement, and particularly of multi-objective VM placement as an NP-hard problem, is extremely large, a naïve (exhaustive) search can be computationally prohibitive. Therefore, meta-heuristics such as evolutionary techniques or swarm-intelligence methods, which find near-optimal solutions in a reasonable amount of time, are helpful and necessary for medium-sized to large data centers with large numbers of PMs and VMs. CPU and memory are the two most common hardware resources considered in the literature; however, in some applications other resources such as the GPU and network equipment have a significant impact on the objective of VM placement, and future work should concentrate more on these resources.