1 Introduction

Cloud computing is changing the way people use computing resources. Recent research shows that more than 93% of organizations are using or preparing to use Infrastructure as a Service (IaaS). With the Cloud, people no longer need to buy and maintain their own physical devices; instead, they can lease computing resources from Cloud providers to serve their needs. As more and more people move their jobs into the Cloud, data centers keep expanding, and the number of virtual machines in data centers rises quickly. The increasing demand for cloud computing resources and the growing scale of data centers pose a challenge to efficiently managing users' requests and physical resources.

Traditional research usually focused on improving the utilization of resources such as CPU and memory by introducing intelligent virtual machine placement algorithms [12, 20] and periodic virtual machine consolidation [5]. These studies often cast the problem as the classical multi-dimensional bin packing problem, which is known to be NP-hard, and solved it with heuristic algorithms. Recently, more studies have turned their attention to improving the communication ability of virtual machines, since numerous distributed computing tasks such as MapReduce have been deployed in data centers. Such distributed tasks involve massive data transfers among virtual machines, which require high network bandwidth guarantees.

To guarantee the performance of distributed tasks with high bandwidth requirements, cloud managers should assign virtual machines with large mutual bandwidth usage to host machines in close proximity. However, although a variety of network-aware virtual machine placement algorithms have been proposed, to the best of our knowledge none of these studies considered a dedicated network-aware virtual machine consolidation algorithm. When a task finishes, its virtual machines are shut down and deleted from the system, which generates plenty of resource fragments. Such fragments lead to low resource utilization and increase network delay. Furthermore, communication patterns are hard to predict, so at run time some heavy communication traffic may travel across a long distance. To solve this problem, we designed a self-adaptive network-aware virtual machine clustering and consolidation algorithm that automatically detects resource fragments and virtual machines with high-cost network communication, and then clusters and consolidates them through appropriate live migrations. The goal of our algorithm is to maximize the communication ability among virtual machines and thereby improve their performance at relatively low cost. The algorithm has linear time complexity and is suitable for use in large data centers. Experiments showed that our algorithm considerably reduced the number of high-delay tasks, lowered the average traffic passing through aggregate switches and improved the communication ability among virtual machines.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the data center architecture and analyzes the network influence on typical applications in data centers. Section 4 formulates the virtual machine placement and consolidation problem and analyzes its complexity. Section 5 explains the proposed self-adaptive network-aware virtual machine clustering and consolidation algorithm. After that, Sect. 6 presents the experimental results in simulation and in our cloud computing testbed, and Sect. 7 concludes the paper. In Sect. 8, we introduce the next steps of our work.

2 Related work

Meng Wang et al. [17] formulated virtual machine placement as a stochastic bin packing problem and proposed an online packing algorithm. Zhu Jiang et al. [21] studied virtual machine allocation with bandwidth guarantees and proposed an online allocation algorithm for tenants with homogeneous bandwidth demand, aiming to improve the accuracy of existing algorithms. Meng et al. [10] proposed a traffic-aware virtual machine placement algorithm to improve network scalability, which tries to allocate VMs with large mutual bandwidth usage to host machines in close proximity. Alicherry et al. [1] extended the network-aware virtual machine placement problem to distributed Clouds. They divided the problem into two stages, first choosing the data center and then choosing the rack and server, and proposed a 2-approximation algorithm. Steiner et al. [15] considered the same setting as Alicherry but focused on the service level. Jiang et al. [8] studied a joint tenant (e.g., server or virtual machine) placement and routing problem and gave an approximation algorithm to minimize traffic costs. Ofer Biran et al. [3] proposed a heuristic network-aware VM placement algorithm that seeks a placement which not only satisfies the predicted communication demand but is also resilient to demand time variations. Wilson et al. [18] studied the influence of bandwidth on web applications in data centers and proposed a deadline-aware control protocol to lower flow latency and improve burst tolerance. Kliazovich et al. [9] considered the network-aware virtual machine placement problem from the perspective of energy saving. Ming Xia et al. [19] studied the traffic features of virtualized NFs (vNFs), formulated optimal vNF placement as a binary integer program (BIP), and proposed an efficient heuristic algorithm to solve it. Beloglazov et al. [2] studied the virtual machine consolidation problem on the OpenStack platform and implemented a program called NEAT to perform consolidation. Ferdaus et al. [6] surveyed the network-aware virtual machine placement problem and identified the benefits and limitations of existing techniques. All these works focused on real-time network-aware virtual machine placement at request arrival time but ignored the benefits of consolidation.

Breitgand et al. [4] studied the cost of reconfiguring virtual machines in response to workload variations. They observed that live migration requires a significant amount of spare CPU on the source server, and if spare CPU is not available, both the duration of the migration and the performance of the application being migrated suffer. Alexander et al. [14] introduced network topology-aware scheduling models that take workload characteristics, the network bandwidth requirements of migrations and network topologies into account. Shrivastava et al. [13] studied the inherent dependencies between VMs and the complex load interactions between the underlying physical servers. They introduced an application-aware virtual machine migration algorithm that incorporates inter-VM dependencies and the underlying network topology into migration decisions. These works carefully studied the process and cost of live migration of virtual machines, which form the basis of our study.

3 Data center architecture and typical application

The most typical network structure in a data center is a tree or tree-like topology, as depicted in Fig. 1a. Figure 1b shows our cloud computing testbed. In a modern data center, each physical server commonly has four or six NICs, and each pair of NICs is bonded. These two or three bonds are used for management, virtual machine traffic and storage traffic, respectively. Each bond connects directly to one rack switch, each rack switch connects to a group switch (aggregate switch), and finally each group switch connects to a top-level switch. Since these three networks have the same structure, we simplify them into a single tree model. In the tree structure, the bandwidth of a higher-level switch is shared by the lower-level switches, which decreases the bandwidth available for inter-group communication. Commonly, the bandwidth between two nodes under the same TOR switch is up to 10 Gb/s, while the bandwidth between two nodes under different aggregate switches is lower than 1 Gb/s. Unfortunately, two typical types of applications in data centers, distributed computing and three-tier web applications, both involve large data transfers and therefore require high network bandwidth guarantees. Hence, in this section, we study the influence of the network on these two applications. We use the term tree level to denote the level of the network tree that traffic passes through.

Fig. 1 Network architecture. a Abstract network architecture, b experimental cloud computing testbed

3.1 Distributed computing scenario

Leasing data center resources to run temporary distributed computing tasks is one of the most popular applications of cloud computing, and MapReduce is a typical type of distributed computing. Therefore, we chose PUMA (footnote 1), a MapReduce benchmark, to test the influence of the network on the performance of MapReduce jobs. We ran the experiment under two situations, normal and overloaded network traffic, and compared job finish times under different virtual machine placements in these two situations.

Table 1 MapReduce performance in different network conditions
Table 2 RUBiS performance in different network conditions

As Table 1 shows, the average finish time of MapReduce jobs grows as the tree level rises. When traffic is overloaded, the negative influence on the performance of MapReduce jobs becomes even more conspicuous.

3.2 Web application scenario

Current web applications normally use a three-tier architecture. For performance, manageability and robustness, developers typically deploy each tier on different physical or virtual machines. Therefore, the communication ability between VMs affects the overall performance of a web application. RUBiS (footnote 2) is an auction site prototype modeled after eBay.com, which we use to evaluate the performance of application servers (virtual machines).

As Table 2 shows, in the normal situation, the average response time stays almost the same as the tree level rises. However, when traffic is overloaded, the average response time increases quickly with the tree level. The response time in the worst case is nearly double that of the best case.

4 Virtual machine placement and consolidation

In the previous section, we found that the tree level that inter-virtual-machine traffic passes through has a strong impact on the performance of distributed computing and web application tasks. In addition, after the system runs for a long time, some virtual machines are shut down or deactivated because their jobs have finished, which leaves resource fragments. These fragments lead to the scattered-task phenomenon (i.e., a task's virtual machines are allocated to different racks or even different rack groups). A simulation in which distributed computing tasks are continuously submitted and run (Fig. 2) shows that the number of scattered tasks rises as time goes on.

Fig. 2 Rising of scattered tasks

Therefore, even if we knew the traffic model in advance, we could not easily allocate the virtual machines to the same rack because of the fragmentation problem. Besides, not all users can predict their tasks' traffic models precisely, and some do not submit all their resource requirements at a single time.

4.1 Virtual machine placement

To decrease the number of scattered tasks, we first give a formal definition of the virtual machine placement problem.

Suppose a data center has \(k_\mathrm{{RG}}\) rack groups, and each rack group has \(k_{R_{i}}\) servers. The total number of physical servers is \(k_{P}\). We denote each server as \(p_{i}\) and the physical resources belonging to it as a vector \(\mathbf {H}_{i}\). If \(p_{i}\) is running, then \(S(i)=1\); otherwise \(S(i)=0\). \(\mathrm{RG}(i) = i_\mathrm{{RG}}\) and \(R(i) = i_{R}\) mean that server \(p_{i}\) belongs to the \({i_\mathrm{{RG}}}\)th rack group and the \({i_{R}}\)th rack in this group. \(h_{ij}\) is the minimum tree level between \(p_{i}\) and \(p_{j}\), which can be computed from \(\mathrm{RG}(i)\) and R(i), and we define \(h_{ii}=0\).
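As a concrete illustration, the following minimal Python sketch (our own, assuming the three-level tree of Fig. 1a) computes \(h_{mn}\) from the rack group and rack indices:

```python
def min_tree_level(m, n, rg_of, rack_of):
    """h_mn: the minimum tree level that traffic between servers p_m and
    p_n must climb, assuming the three-level tree of Fig. 1a.
    0 = same server, 1 = same rack (TOR switch),
    2 = same rack group (aggregate switch), 3 = top-level switch."""
    if m == n:
        return 0                      # h_mm = 0 by definition
    if rg_of[m] != rg_of[n]:
        return 3                      # different rack groups: top-level switch
    if rack_of[m] != rack_of[n]:
        return 2                      # same group, different racks: aggregate switch
    return 1                          # same rack: TOR switch only
```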

Consider the situation in which a user \(u_{k}\) initiates a task that requests a group of VMs \(G_{k}\) of size \(w_{k}\). Each virtual machine \(v_{i}^{k}\) in \(G_{k}\) has its specific requirements for CPU cycles and memory size, denoted as the vector \(\mathbf {R}_{i}^{k}\); \(\mathbf {R}_{i}^{k}\) can be extended to include other resources such as I/O operations and outlet bandwidth according to the specific application. \(T_{k}\) is the \(w_{k}\times w_{k}\) traffic matrix of \(G_{k}\), and each item \(t_{ij}^{k}\) in \(T_{k}\) is the traffic between \(v_{i}^{k}\) and \(v_{j}^{k}\) during time \({\Delta }\,t\). The traffic matrix can either be given by the user or be calculated by the service provider from collected data. Notice that our traffic model is quite generic and clearly covers both the three-tier web application and the distributed computing scenarios. We denote \(Z_{i}^{k}(m)=1\) if virtual machine \(v_{i}^{k}\) is hosted on physical server \(p_{m}\), and \(Z_{i}^{k}(m)=0\) otherwise. Therefore, a feasible decision space for virtual machine placement is characterized by

$$\begin{aligned} \mathcal {Z}= & {} \left\{ Z_{i}^{k}(m)\,\Big |\, Z_{i}^{k}(m) \in \{0,1\},\ \sum \limits _{m=1} Z_{i}^{k}(m)=1,\ \forall (i,k);\right. \nonumber \\&\quad \left. \sum \limits _{k=1}\sum \limits _{i=1}^{w_{k}}Z_{i}^{k}(m)\cdot \mathbf {R}_{i}^{k} \le \mathbf {H}_{m} \cdot S(m),\ \forall {m} \right\} , \end{aligned}$$
(1)

where the first constraint guarantees that each virtual machine is placed on exactly one host, and the second constraint ensures that the total resource consumption of the virtual machines on a physical server does not exceed the host's physical capacity, provided the machine is turned on. \(HV_{Z}(v_{i}^{k},v_{j}^{k})\) is the minimum tree level between \(v_{i}^{k}\) and \(v_{j}^{k}\) under placement Z:

$$\begin{aligned} HV_{Z}(v_{i}^{k},v_{j}^{k}) \ = \ \sum _{m=1}\sum _{n=1}\left[ h_{mn} \cdot Z_{i}^{k}(m) \cdot Z_{j}^{k}(n) \right] . \end{aligned}$$
(2)
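To make Eqs. 1 and 2 concrete, here is a small sketch (our own encoding: a dict `Z[(k, i)] = m` in place of the 0/1 indicators, so each VM is mapped to exactly one host by construction; `req`, `cap` and `S` stand for \(\mathbf {R}_{i}^{k}\), \(\mathbf {H}_{m}\) and S(m)):

```python
import numpy as np

def is_feasible(Z, req, cap, S):
    """Eq. 1: per-host resource sums must not exceed cap[m] * S(m);
    the dict encoding already enforces one host per VM."""
    load = {m: np.zeros_like(cap[m]) for m in cap}
    for (k, i), m in Z.items():
        load[m] += req[(k, i)]
    return all(np.all(load[m] <= cap[m] * S[m]) for m in cap)

def HV(Z, k, i, j, rg_of, rack_of):
    """Eq. 2: minimum tree level between v_i^k and v_j^k under placement Z."""
    return min_tree_level(Z[(k, i)], Z[(k, j)], rg_of, rack_of)
```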

4.2 Traffic pattern modeling

Since we know the specific position of each virtual machine and the traffic matrix between virtual machines, we can calculate the total traffic through each switch and then judge whether it is overloaded. We mark all switches in the data center as \(\{s_{0},s_{1},\ldots \}\). \(L(m,n) = (b_{0},b_{1},b_{2},\ldots ),\ b_{i}\in \{0,1\}\) is a route function between \(p_{m}\) and \(p_{n}\): \(b_{i}=1\) means traffic between \(p_{m}\) and \(p_{n}\) passes through switch \(s_{i}\); otherwise it does not. Having defined L(m, n) between \(p_{m}\) and \(p_{n}\), we can define \(\mathrm{LM}\left( v_{i}^{k},v_{j}^{k}\right) \) as the route function between \(v_{i}^{k}\) and \(v_{j}^{k}\):

$$\begin{aligned} \mathrm{LM}(v_{i}^{k},v_{j}^{k}) \ = \ \sum _{m=1}\sum _{n=1}\left[ L(m,n) \cdot Z_{i}^{k}(m) \cdot Z_{j}^{k}(n) \right] . \end{aligned}$$
(3)

Therefore, the total traffic of a task on all switches can be calculated as follows:

$$\begin{aligned} D(k) \ = \ \sum _{i=1}^{w_{k}}\sum _{j=1}^{w_{k}}\left[ \mathrm{LM}(v_{i}^{k},v_{j}^{k}) \cdot t_{ij}^{k} \right] . \end{aligned}$$
(4)
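A sketch of Eqs. 3 and 4 under the same encoding; `route(m, n)` is a hypothetical helper returning L(m, n) as a 0/1 numpy vector over all switches:

```python
import numpy as np

def LM(Z, k, i, j, route):
    """Eq. 3: since each VM sits on exactly one host, the route between
    v_i^k and v_j^k reduces to L(m, n) for their two hosts."""
    return route(Z[(k, i)], Z[(k, j)])

def task_switch_traffic(Z, k, T, w, route):
    """Eq. 4: D(k), the traffic that group G_k puts on every switch
    during Δt; T is the w x w traffic matrix of the group."""
    D = np.zeros_like(route(0, 0), dtype=float)
    for i in range(w):
        for j in range(w):
            D += LM(Z, k, i, j, route) * T[i][j]
    return D
```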

4.3 Measurement of network benefit

As shown in the previous sections, an overloaded switch has a strong negative effect on the performance of virtual machines that have large traffic passing through it. Since we already have all tasks' traffic and routes, we can obtain the total traffic of switch \(s_{i}\),

$$\begin{aligned} \mathrm{DS}(i) \ = \ \sum _{k=1}\ D(k) \cdot \mathbf {1}_{i}, \end{aligned}$$
(5)

where \(\mathbf {1}_{i}\) denotes a vector whose ith bit equals 1 and whose other bits equal 0. If \(\mathrm{DS}(i)/\Delta t \ge C(i)\), where C(i) is the capacity of switch \(s_{i}\), then \(s_{i}\) is overloaded. However, it is obviously too late to reschedule once a switch is already overloaded. Therefore, we introduce a parameter \(\alpha \in (0,1]\): if \(\mathrm{DS}(i)/\Delta t \ge \alpha \cdot C(i)\) and \(s_{i}\) is not a rack switch, consolidation is triggered.
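In code, Eq. 5 and the α-threshold test might look as follows (a sketch; `is_rack_switch` is a hypothetical predicate):

```python
def switch_total_traffic(task_traffic):
    """Eq. 5: summing every task's D(k) vector gives the total traffic on
    all switches; DS(i) is component i of the result."""
    return sum(task_traffic.values())   # task_traffic[k] = D(k)

def switches_near_capacity(DS, C, delta_t, alpha, is_rack_switch):
    """Non-rack switches whose traffic rate reaches alpha * capacity;
    any such switch triggers consolidation."""
    return [i for i in range(len(C))
            if DS[i] / delta_t >= alpha * C[i] and not is_rack_switch(i)]
```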


Our target is to improve the communication capability among the VMs in each group \(G_{k}\); thus, a practical and effective way is to keep all VMs of a group in close positions (e.g., to schedule them onto the same server or into the same rack). This has three advantages. First, the lower the tree level that traffic has to pass through, the higher the bandwidth it may share. Second, if traffic passes only through low tree nodes (switches), the network capacity of high tree nodes is saved, which means more bandwidth can be shared by lower nodes. Third, as Table 1 shows, overload occurring at lower-level switches has less influence on tasks than overload occurring at higher-level switches.

Therefore, we define a Benefit function to estimate the network influence quantitatively:

$$\begin{aligned} \mathrm{Benefit}_{Z}(G_{k})\ =\ \sum _{i=1}^{w_{k}}\sum _{j=1}^{w_{k}}\frac{t_{ij}^{k}}{B\left( HV_{Z}\left( v_{i}^{k},v_{j}^{k}\right) \right) }, \end{aligned}$$
(6)

where \(B(HV_{Z}(v_{i}^{k},v_{j}^{k}))\) is a bandwidth function of the minimum tree level. To simplify the computation, we define \(B(x) = {(N)}^{x}\), where N is the bandwidth declining rate. According to Cisco's Data Center Infrastructure Design Guide (footnote 3), a typical data center network design has a bandwidth declining rate between 2.5:1 and 8:1. Here we use 4:1 as the bandwidth declining rate in our experiments; it should be adjusted to the real declining rate of the specific data center architecture.
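For reference, a direct transcription of Eq. 6 with \(B(x)=N^{x}\) and N = 4, reusing the helpers sketched above:

```python
N = 4  # bandwidth declining rate (4:1); adjust to the real architecture

def B(x):
    """Bandwidth penalty for communicating across tree level x."""
    return N ** x

def benefit(Z, k, T, w, rg_of, rack_of):
    """Eq. 6: traffic kept at low tree levels is divided by a small
    penalty, so a larger Benefit means better communication ability."""
    return sum(T[i][j] / B(HV(Z, k, i, j, rg_of, rack_of))
               for i in range(w) for j in range(w))
```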

4.4 Consolidation optimization

Suppose that at time \(t_{0}\) the placement of virtual machines was \(\left\{ X_{i}^{k}(m)\right\} \in \mathcal {Z}\), and that after consolidation the new placement is \(\left\{ Y_{i}^{k}(m)\right\} \in \mathcal {Z}\). We can formulate our network-aware consolidation (NAC) problem as follows:

$$\begin{aligned} \begin{array}{rl} \mathbf {NAC(Y)} &{} \\ \text {Maximize} &{}\quad \sum _{k=1}^{K}\ \mathrm{Benefit}_{Y}(G_{k}) \\ \text {Subject to} &{}\quad \left\{ Y_{i}^{k}(m)\right\} \in \mathcal {Z},\ \forall {i,\,k,\,m} \\ &{}\quad \mathrm{Benefit}_{X}(G_{k}) \le \mathrm{Benefit}_{Y}(G_{k}),\ \forall {k} \\ \end{array} \end{aligned}$$

The objective optimizes the overall communication ability of the data center. The constraints guarantee that the new placement is valid and that each group of virtual machines benefits from consolidation.

5 Consolidation algorithms

Since the Benefit function is nonlinear and virtual machine placement is an integer programming problem, NAC is a nonlinear integer programming problem. Murty et al. [11] proved that nonlinear programming is NP-hard; therefore, nonlinear integer programming, as a subproblem of nonlinear programming, is also NP-hard. Since current data centers usually have more than 10,000 physical servers with 100,000 virtual machines running on them, obtaining the exact solution of this NP-hard problem at such a scale is prohibitively time-consuming, which is unacceptable in real applications. To solve the above optimization problem in a feasible time, we design a heuristic algorithm.

Before introducing our algorithm, we first give three important observations.

  • Since live migration is a costly operation that generates large burst traffic, moving all virtual machines of a group to new servers or racks is too expensive.

  • Traffic between virtual machines is irregular; therefore, virtual machine pairs that generate large traffic should be consolidated first.

  • Communication through high-level switches has higher delay and lower bandwidth. Therefore, virtual machine pairs that communicate through high-level switches should be rescheduled first.

Based on these observations, we define a priority function,

$$\begin{aligned} \mathrm{Priority}(G_{k})\ =\ \sum _{i=1}^{w_{k}}\sum _{j=1}^{w_{k}} t_{ij}^{k}\cdot \left[ B\left( HV_{X}\left( v_{i}^{k},v_{j}^{k}\right) \right) \right] . \end{aligned}$$
(7)

The value of Priority\((G_{k})\) reflects the urgency of consolidating the group. Our algorithm first calculates the Priority value of each virtual machine group and sorts the groups in descending order. We then select virtual machine groups for consolidation in this order. While processing each group, we try to move virtual machine pairs that have large mutual traffic into the same rack with the minimum number of operations. Since consolidating all virtual machines of a group into the same rack is not always possible, we cluster virtual machines according to their traffic pattern when needed. We therefore iteratively cluster and consolidate virtual machines until all virtual machines within each clustered group are consolidated into the same rack.
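Eq. 7 mirrors Eq. 6 with the penalty multiplied rather than divided, so groups with heavy traffic crossing high tree levels sort first. A sketch, reusing the helpers above:

```python
def priority(X, k, T, w, rg_of, rack_of):
    """Eq. 7: heavy traffic across high tree levels yields a large value."""
    return sum(T[i][j] * B(HV(X, k, i, j, rg_of, rack_of))
               for i in range(w) for j in range(w))

# Groups are then processed in descending order of urgency, e.g.:
# order = sorted(groups, key=lambda k: priority(X, k, T[k], w[k],
#                                               rg_of, rack_of), reverse=True)
```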

Fig. 3 Process of virtual machine clustering

5.1 Traffic-based VM clustering

When we cannot consolidate all virtual machines of a group into the same rack, we cluster the virtual machines according to their traffic pattern. The goal of clustering is to minimize the traffic between the clustered parts. Since a traffic matrix can be mapped to a weighted graph equivalently, clustering a traffic matrix can be converted into cutting a weighted graph. The weighted minimum K-cut problem is NP-complete [7] when K is not fixed. When \(K=2\), the Stoer–Wagner algorithm [16] finds the optimal solution in polynomial time. Therefore, we apply the Stoer–Wagner algorithm to split the virtual machines of a group into two smaller groups and then consolidate the virtual machines of each smaller group. If consolidation fails, we split the failed group further by applying the Stoer–Wagner algorithm again. The clustering process is demonstrated in Fig. 3. The vertices of the graph represent virtual machines, and the edge weights represent the traffic between virtual machines. After the first clustering step, the vertices are split into two sets \(\{v1,v2,v3\}\) and \(\{v4,v5,v6\}\). The traffic between \(\{v1,v2,v3\}\) and \(\{v4,v5,v6\}\) is 2, which is the minimum over all possible bisections. In the second clustering step, \(\{v4,v5,v6\}\) is further clustered into \(\{v4,v5\}\) and \(\{v6\}\).
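The bisection step can be sketched with NetworkX's `stoer_wagner` (an implementation choice of ours, not necessarily the authors'; the traffic graph is assumed connected, as `stoer_wagner` requires):

```python
import networkx as nx

def bisect_by_traffic(vms, T):
    """Split a VM set into two parts so that the traffic crossing the cut
    is minimum (the K = 2 case, solved exactly by Stoer-Wagner)."""
    G = nx.Graph()
    G.add_nodes_from(vms)
    for i in vms:
        for j in vms:
            if i < j and T[i][j] + T[j][i] > 0:
                G.add_edge(i, j, weight=T[i][j] + T[j][i])
    cut_value, (part_a, part_b) = nx.stoer_wagner(G)
    return list(part_a), list(part_b)
```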

5.2 Network-aware VM consolidation

For each group of virtual machines, we initialize a queue and push the group into it. While the queue is not empty, we pop the head element, a set of virtual machines, and try to consolidate them into the same rack, as sketched below. If consolidation fails, we cluster that set into two smaller sets and push both into the queue. In each round of consolidation, we keep the virtual machines in the rack that holds the most virtual machines of the group unchanged and migrate the other virtual machines into this rack. This process iterates until the queue is empty. The processes of clustering and consolidation are demonstrated in Fig. 3. The detailed algorithm is described as follows.

figure a (network-aware VM clustering and consolidation algorithm)
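In lieu of the pseudocode figure, the loop can be reconstructed roughly as follows (our sketch; `try_consolidate_into_rack` stands for the step that fixes the rack already holding most of the VMs and migrates the rest into it, and is hypothetical):

```python
from collections import deque

def cluster_and_consolidate(group, T):
    """Queue-driven clustering and consolidation of one VM group G_k."""
    queue = deque([list(group)])
    while queue:
        vms = queue.popleft()
        if try_consolidate_into_rack(vms):
            continue                     # this cluster now shares one rack
        if len(vms) <= 1:
            continue                     # a single VM cannot be split further
        part_a, part_b = bisect_by_traffic(vms, T)  # Stoer-Wagner bisection
        queue.append(part_a)
        queue.append(part_b)
```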

To avoid burst traffic from live migration, which may congest the management network, we cap the migration speed at \(v_{i}\), which satisfies \(\mathrm{DS}(i)/\Delta t + v_{i}\cdot L(j,k) \le C(i)\).
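Equivalently, the cap is the residual capacity of the most loaded switch on the migration path (sketch; `path_switches` is a hypothetical helper returning the switches with \(L(j,k)=1\)):

```python
def migration_speed_cap(src, dst, DS, C, delta_t, path_switches):
    """Largest v with DS(i)/Δt + v <= C(i) for every switch s_i that the
    migration between hosts src and dst passes through."""
    return max(0.0, min(C[i] - DS[i] / delta_t
                        for i in path_switches(src, dst)))
```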

5.3 Self-adaptive mechanism

We adopt two different ways to execute our clustering and consolidation algorithm: regularly and self-adaptively. The regular way runs the consolidation algorithm periodically: a short period achieves better performance but brings more cost, while a long period is cheaper but may miss the best time to consolidate. The self-adaptive method resorts to a more intelligent approach: it checks the switches' current status and traffic history and then judges whether consolidation is necessary. The self-adaptive consolidation trigger is described as follows.

As the Self-Adaptive Trigger describes, we calculate the total traffic of each switch and compare it with the switch's capacity. If the total traffic approaches the capacity, the switch is overloaded or will be overloaded in the near future. When a switch is overloaded, we compare the current traffic of each virtual machine group with its historical data. If the current traffic conspicuously declines, the virtual machine group is very likely suffering from insufficient bandwidth. In that condition, consolidation is necessary.

figure b (Self-Adaptive Trigger algorithm)
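A sketch of the trigger logic just described (`decline_threshold` and the per-group traffic histories are our own names; the overload test reuses `switches_near_capacity` from Sect. 4.3):

```python
def self_adaptive_trigger(DS, C, delta_t, alpha, is_rack_switch,
                          current_traffic, history_traffic,
                          decline_threshold=0.8):
    """Return the VM groups to consolidate: only when some non-rack switch
    nears capacity, and only groups whose current traffic has conspicuously
    declined against their history (a sign of insufficient bandwidth)."""
    if not switches_near_capacity(DS, C, delta_t, alpha, is_rack_switch):
        return []
    return [k for k, cur in current_traffic.items()
            if cur < decline_threshold * history_traffic[k]]
```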

Notice that in practice we choose at least 600 s as the value of \({\Delta }\,t\) because traffic fluctuates irregularly. We also ignore migration traffic, since it travels on the management network, which is isolated from the virtual machine traffic network.

6 Experiments

To evaluate the effectiveness of the proposed approach, we first conducted experiments on our experimental data center with a small data set. After demonstrating effectiveness on the small data set, we conducted a group of simulation experiments with a large data set. The data sets were based on a log of real parallel workloads (footnote 4). The small data set was derived from a small part of the above data set and was modified to fit the scale of our experimental data center. Since this log includes only the start time, end time, number of machines, CPU requirement and memory requirement, we generated a random traffic matrix for each task to fit our simulation. Since all the traffic is randomly generated and can describe various scenarios, there is no need to distinguish three-tier web applications from MapReduce applications, as discussed before.

6.1 Experiments in a cloud computing testbed

Our experimental cloud computing testbed (see Fig. 1b) contains two rack groups, and each rack group has two five-server racks. All the switches used in the experiments are rated at 1 Gb/s. All the servers are connected by a tree-like network.

Fig. 4 Experiment results in the cloud computing testbed. a Effects on distributed computing tasks, b effects on web applications

As Fig. 4a shows, the four lines represent the performance of MapReduce tasks when using First-Fit (FF) alone, network-aware placement [1] alone, and each of them combined with our consolidation algorithm. Performance deteriorated quickly when using the FF algorithm alone, while combining our consolidation algorithm with FF significantly improved its performance. Owing to the limited scale of our experimental environment, our consolidation algorithm did not bring much improvement to the network-aware allocation algorithm here. However, in the long-running simulations with the large data set shown in the next subsection, our clustering and consolidation algorithm brought considerable improvement to the network-aware allocation algorithm as well.

Figure 4b compares the performance of web applications under the same four algorithms. As in the MapReduce scenario, delay increased quickly when using the FF algorithm alone, and combining our clustering and consolidation algorithm with FF again achieved good performance. For the reason mentioned in the previous subsection, our clustering and consolidation algorithm did not help the network-aware allocation algorithm much in this setting.

6.2 Simulation experiments

In this part, we use the large data set to conduct our simulation experiments. Since it is hard to compute a task's finish time accurately in simulation, we use the number of scattered tasks and the total Benefit to indicate task performance. A scattered task is one whose virtual machines have been allocated to different racks; if a task is scattered, its communication ability is weakened, which may degrade performance. The Benefit function defined in Eq. 6 indicates communication ability more accurately. According to the documentation of the Thunder Linux Cluster (footnote 5), we constructed a virtual data center that contains 21 rack groups, 269 racks and 3224 physical servers. The workload includes 120,000 tasks over one month.

Fig. 5 Results of simulation experiments. a Scattered tasks: First-Fit, b scattered tasks: network aware, c scattered tasks: traffic aware, d normalized Benefit: First-Fit, e normalized Benefit: network aware, f normalized Benefit: traffic aware

Figure 5a compares using the First-Fit algorithm alone with First-Fit plus clustering and consolidation. The number of scattered tasks increases quickly with the First-Fit algorithm alone; when we add our clustering and consolidation algorithm to First-Fit, the number of scattered tasks decreases significantly. Figure 5d uses the Benefit value as an indicator of the network performance of tasks and shows the same tendency as Fig. 5a.

Figure 5b compares using the network-aware algorithm [1] alone with the network-aware algorithm plus clustering and consolidation. Although the network-aware VM allocation algorithm alone achieves good performance in terms of scattered tasks, it does better when coordinated with our clustering and consolidation algorithm. Figure 5e uses the Benefit value as an indicator of the network performance of tasks; it also shows that the clustering and consolidation algorithm improves the performance of the original network-aware algorithm.

Figure 5c compares using the traffic-aware algorithm [10] alone with the traffic-aware algorithm plus clustering and consolidation. The traffic-aware algorithm achieves performance comparable to the network-aware algorithm, and it too performs better when coordinated with our clustering and consolidation algorithm. As in the previous results, Fig. 5f uses the Benefit value as an indicator of the network performance of tasks and shows that our clustering and consolidation algorithm improves the performance of the original traffic-aware algorithm.

Fig. 6 Top-level switch traffic. a First-Fit, b network aware, c traffic aware

Figure 6 shows the average traffic passing through the top-level switches. We attached our clustering and consolidation algorithm to the First-Fit, network-aware and traffic-aware allocation algorithms. The results show that in all cases, combining the allocation algorithm with our clustering and consolidation algorithm reduces top-level switch traffic significantly. Compared with First-Fit, network-aware and traffic-aware allocation produce much less top-level switch traffic; nevertheless, combining them with our clustering and consolidation algorithm decreases top-level switch traffic even further.

Fig. 7 Middle-level switch traffic. a Group0, b Group1, c Group2, d Group3, e Group0, f Group1, g Group2, h Group3

Figure 7 shows the average traffic passing through the middle-level switches. Due to space constraints, we display only the traffic of the first four groups, and since the traffic difference between the traffic-aware and network-aware algorithms is not significant, we display only the results of the network-aware algorithm. As with top-level switch traffic, the results show that whether we use the First-Fit or the network-aware allocation algorithm, assisting it with our clustering and consolidation algorithm reduces middle-level switch traffic tremendously. Since live migration generates large traffic in a short time, there are some spikes in Fig. 7e–h; because the total traffic was low, these spikes do not cause side effects.

7 Conclusion

Virtual machine allocation and scheduling are key management issues in data centers. In this paper, we analyzed the network influence on distributed computing tasks and web applications and proposed a network-aware virtual machine clustering and consolidation algorithm with a self-adaptive mechanism. Our algorithm focuses on lowering the communication cost among virtual machines through conditional and automatic virtual machine live migrations. We used two benchmarks, RUBiS and PUMA, in our experimental cloud computing testbed to examine the network influence on tasks. Using real workload data in both simulation and our experimental testbed, we compared the results with and without our algorithm under different virtual machine allocation algorithms. Results in real experiments showed that our algorithm reduced the average finish time of MapReduce tasks and reduced the response delay of web applications. Simulation results showed that our algorithm considerably reduced the number of high-delay jobs, decreased the average traffic of high-level switches and improved the communication ability among virtual machines.

8 Future work

Software-defined networking (SDN) is gradually being introduced into data centers, which makes the network structure increasingly complicated. Virtual routers and virtual switches create hundreds of virtual networks on top of the physical network and change communication paths. Therefore, the next step of our work will focus on extending our network-aware consolidation algorithm to an SDN environment.