1 Introduction

Cloud data centers are growing exponentially in number and size to accommodate an escalating number of users and an expansion in applications. In the current “Cisco Global Cloud Index”, IT manufacturer Cisco predicts that by 2019, more than four-fifths of the workload in data centers will be handled in cloud data centers [1]. As a result, the tremendous energy consumption and carbon dioxide emissions of cloud data centers are becoming a great concern worldwide. According to a report from the Natural Resources Defense Council (NRDC), cloud data center energy consumption is estimated to reach 140 billion kWh by 2020, which will be responsible for the emission of nearly 150 million tons of carbon pollution [2]. Therefore, producing energy-efficient systems has become a focus for the development and operation of cloud data centers.

The main contributions of this paper are summarized as follows:

  1. To reduce energy consumption and achieve greener cloud computing, we propose an energy-efficient Virtual Machine (VM) allocation strategy with an asynchronous multi-sleep mode and an adaptive task-migration scheme.

  2. We present a method to model the proposed VM allocation strategy and to evaluate the system performance in terms of the average response time of tasks and the energy saving rate of the system.

  3. With the help of an intelligent searching algorithm, we optimize the proposed VM allocation strategy to trade off different performance measures, such as the average response time of tasks and the energy saving rate of the system.

2 Review of Related Literature

In this section, we review research on energy saving strategies in cloud data centers, sleep mode based energy saving strategies, and enhanced particle swarm optimization algorithms. We then outline the motivation for our research.

2.1 Energy Saving Strategies in Cloud Data Centers

In cloud data centers, an enormous amount of energy can be wasted due to excessive provisioning [3, 4], while insufficient provisioning risks Service Level Agreement (SLA) violations [5, 6]. In [7], by introducing dynamic voltage and frequency scaling (DVFS) methods as part of a consolidation approach, Arianyan et al. proposed a novel fuzzy multi-criteria and multi-objective resource management solution to reduce energy consumption and alleviate SLA violation. In [8], Jungmin et al. proposed a dynamic overbooking strategy, allocating a more precise amount of resources to VMs and traffic with a dynamically changing workload. In this strategy, both energy consumption and SLA violations were considered. In [9], to minimize energy consumption, Hosseinimotlagh et al. introduced an optimal utilization level of a host to execute a certain number of instructions. Furthermore, they proposed a VM scheduling algorithm based on the unsurpassed utilization level in order to derive the optimal energy consumption while satisfying a given Quality of Service (QoS) requirement. The literature mentioned above has contributed to reducing energy consumption while guaranteeing response performance in cloud data centers. However, the energy consumption generated by idle hosts in cloud data centers has been ignored.

2.2 Sleep Mode Based Energy Saving Strategies

The use of a sleep mode is an efficient approach for reducing the energy consumption in data centers [10]. In [11], Duan et al. proposed a dynamic idle interval prediction scheme that can estimate the future idle interval length of a CPU and thereby choose the most cost-effective sleep state to minimize the power consumption during runtime. In [12], Sarji et al. proposed two energy models based on the statistical analysis of a server’s operational behavior. With these models, the Energy Savings Engine (ESE) in the cloud provider decided either to migrate the VMs from a lightly-loaded server and then put the machine into a sleep mode, or to keep the current server running and ready for receiving any new tasks. In [13], Liu et al. proposed a sleep state management model to balance the system’s energy consumption and the response performance. In this model, idle nodes were classified into different groups according to their sleep states. In the resource allocation process, nodes with the highest level of readiness were preferentially provided to the application. This research emphasized applying a sleep mode to a Physical Machine (PM).

To improve the energy efficiency of cloud data centers, Jin et al. proposed an energy-efficient strategy with a speed switch on PMs and a synchronous multi-sleep mode on partial VMs [14]. In [15], by applying dynamic power management (DPM) technology to PMs and introducing a synchronous semi-sleep mode to partial VMs, Jin et al. proposed a novel VM scheduling strategy for reducing energy consumption in cloud data centers. Both of the studies mentioned above applied a synchronous sleep mode to the VMs. However, there has so far been no research into the effect of asynchronous sleep modes at the level of VMs in cloud data centers.

2.3 Enhanced Particle Swarm Optimization Algorithms

In 1995, particle swarm optimization (PSO) was developed as an effective tool for function optimization. Since then, numerous research studies on improving the searching ability of PSO algorithms have appeared. In [16], to enhance the performance of PSO algorithms, Cao et al. improved PSO algorithms by introducing a nonlinear dynamic inertia weight and two dynamic learning factors. In [17], Zhang et al. proposed a novel PSO algorithm based on an adaptive inertia weight and chaos optimization, which enhanced the local optimization ability of the PSO algorithm and helped objective functions jump out of local optima more easily. In [18], Tian presented a new PSO algorithm by introducing chaotic maps (Tent and Logistic), a Gaussian mutation mechanism, and a local re-initialization strategy into the standard PSO algorithm. The chaotic maps were utilized to generate uniformly distributed particles so as to improve the quality of the initial population. From the research mentioned above, we note that the searching ability of PSO algorithms is greatly influenced by the inertia weight and the initial positions of the particles.

2.4 Motivation for Our Research

Inspired by the literature mentioned above, in this paper, we propose an energy-efficient strategy for VM allocation over cloud data centers. We note that letting all the VMs in a virtual cluster go to sleep may degrade the quality of cloud service. Taking both the response performance and the energy conservation level into consideration, we divide the VMs in a virtual cluster into two parts: Module I and Module II. The VMs in Module I stay awake all the time to provide an instant cloud service for accomplishing tasks, while the VMs in Module II may go to sleep whenever possible to reduce energy consumption. The energy consumption of a VM is related to the processing speed of the VM. Generally speaking, the higher the processing speed is, the more energy will be consumed. In our proposed strategy, the VMs in Module I process tasks at a higher speed to guarantee the response performance, while the VMs in Module II process tasks at a lower speed to save more energy. In order to further enhance the energy efficiency of the proposed strategy, we introduce an adaptive task-migration scheme which shifts an unfinished task in Module II to an idle VM in Module I. When an idle VM appears in Module I, a task being processed on a VM in Module II will migrate to the idle VM in Module I, and then the just evacuated VM in Module II will go to sleep independently. To analyze the proposed strategy, we build a queueing model with partial asynchronous multiple vacations, solve it by using a matrix geometric solution, and investigate the system performance through theoretical analysis and simulation experiments. Finally, in order to optimize the proposed strategy, we construct a cost function to balance different system performance levels, such as the average response time of tasks and the energy saving rate of the system, and apply the PSO algorithm to optimize the system parameter settings.

The rest of this paper is organized as follows. In Sect. 3, a novel energy-efficient VM allocation strategy is proposed and a queueing model is built accordingly. In Sect. 4, the queueing model is analyzed by using a matrix geometric solution. In Sect. 5, the expressions of the average response time of tasks and the energy saving rate of the system are derived. With numerical experiments, the system performance is evaluated in Sect. 6. In Sect. 7, an intelligent searching algorithm is used to optimize the number of the VMs in Module II and the sleeping parameter together. Finally, Sect. 8 outlines conclusions from the research.

3 Energy-Efficient VM Allocation Strategy and System Model

In this section, an energy-efficient VM allocation strategy with an asynchronous multi-sleep mode and an adaptive task-migration scheme is proposed. Accordingly, a type of continuous-time multi-server queueing model with partial asynchronous multiple vacations is established.

3.1 Energy-Efficient VM Allocation Strategy

In conventional cloud data centers, all the VMs remain awake waiting for the arrival of tasks, regardless of the current traffic. This may result in a great deal of wasted energy. To address this problem, a novel VM allocation strategy with an asynchronous multi-sleep mode and an adaptive task-migration scheme is proposed in this paper. It should be emphasized that the asynchronous multi-sleep mode considered in this paper is at the level of VMs rather than that of PMs.

Considering both the processing capability and the energy conservation level, all the VMs hosted in a virtual cluster are divided into two modules, namely, Module I and Module II. The VMs in Module I stay awake all the time and run at a high speed when tasks arrive, whereas the VMs in Module II switch between the sleep state and the busy state.

For a busy VM in Module II, state transition only happens at the instant when a task is completely processed. Given that a task is completely processed in Module II, if the system buffer is empty, the evacuated VM in Module II will go to sleep. Once a VM in Module II switches to the sleep state, a sleep timer will be started, the data in the memory will be saved to a hibernation file on the hard disk, and then the power of the other accessories, except for the memory, will be cut off, so that the VM will no longer be available for processing tasks in the system. Given that a task is completely processed in Module I, if the system buffer is empty and there is at least one task being processed in Module II, one of the tasks being processed in Module II will be migrated to Module I, and then the evacuated VM in Module II will go to sleep. We note that the task-migration considered in this paper is a kind of online VM-migration between different modules within a virtual cluster.

For a sleeping VM in Module II, state transition only happens at the instant when a sleep timer expires. At the moment that a sleep timer expires, the sleeping VM in Module II will listen to the system and decide whether to keep sleeping or to wake up. If the system buffer is empty, another sleep timer will be started and the sleeping VM in Module II will begin another sleep period, so that multiple sleep periods are formed. Otherwise, the sleeping VM in Module II will wake up to process the first task waiting in the system buffer at a lower speed. Once a VM in Module II switches to the awake state, the corresponding sleep timer will be turned off, the data of the hibernation file on the hard disk will be read into the memory, and then the power of all accessories will be turned on, so that the VM will be available for processing tasks in the system.

With our proposed sleep mode, energy can be saved, but incoming tasks may not receive timely service. We speculate that the average response time of tasks is lower with a smaller number of VMs in Module II, while the energy saving rate of the system is higher with a suitable number of VMs in Module II. We also speculate that the average response time of tasks is lower with a shorter sleep period, while the energy saving rate of the system is higher with a longer sleep period. Given this, we should optimize the proposed strategy by trading off the average response time of tasks against the energy saving rate of the system. (The optimization approach will be given in Sect. 7.)

We show the state transition of a virtual cluster in the cloud data center under the proposed VM allocation strategy in Fig. 1.

Fig. 1 State transition of a virtual cluster considered in three cases in this paper

As shown in Fig. 1, in the proposed strategy, the numbers of the VMs in Module I and Module II are denoted as c and d, respectively. All the VMs hosted in one virtual cluster are dominated by a control server, in which several sleep timers and a VM scheduler are deployed. Each sleep timer is responsible for controlling the sleep time of a VM in Module II. The numbers of tasks in the system, busy VMs in Module I and sleeping VMs in Module II are denoted as M, b and s, respectively. Given these parameters, the VM scheduler adjusts the VM state.

According to the state of VMs both in Module I and in Module II, we consider three cases as follows:

Case 1: There is at least one idle VM in Module I, and all the VMs in Module II are sleeping.

In Case 1, each arriving task can be processed immediately at a high speed in Module I. However, as more tasks arrive at the system, more VMs in Module I will be occupied. If there are no VMs available, a newly incoming task has to wait in the system buffer. Once a sleep timer expires, the corresponding VM in Module II will wake up to process the first task queueing in the system buffer at a low speed, and then the system will be converted to Case 2.

Case 2: All the VMs in Module I are busy, and there is at least one sleeping VM in Module II.

In Case 2, with the departures of the tasks, more VMs in Module II will go to sleep. At the moment a task is completely processed in Module I and there are no tasks waiting in the system buffer, i.e., \(M>c\) and \(b<c\), one of the tasks being processed in Module II will be migrated to Module I, then the just evacuated VM in Module II will go to sleep. When all the VMs in Module II are asleep, i.e., \(M\le c\) and \(s=d\), the system will be converted back to Case 1.

We note that for Case 2, there are no idle VMs in the system, so a newly incoming task will queue in the system buffer. When a task is completely processed on one of the VMs in Module I, the just evacuated VM in Module I will process the first task queueing in the system buffer at a high speed. Also, when one of the sleep timers expires, the corresponding VM in Module II will wake up and process the first task queueing in the system buffer at a low speed. Once all the VMs in Module II wake up, i.e., \(M \ge c+d\) and \(s=0\), the system will be converted to Case 3.

Case 3: All the VMs in both Module I and Module II are busy.

In Case 3, a newly incoming task has to wait in the system buffer since all the VMs hosted in the virtual cluster are occupied. With the departures of the tasks, more tasks in the system buffer will be processed on the evacuated VMs. Once the system buffer is empty and there exists at least one sleeping VM in Module II, i.e., \(M<c+d\) and \(s>0\), the system will be converted back to Case 2.

3.2 System Model

In cloud data centers, there are many available task scheduling schemes, such as event-driven scheduling schemes, preemptive scheduling schemes and random scheduling schemes. In our paper, we assume that an available VM can be assigned to the first task queueing in the system buffer. Regarding a task as a customer, a VM as an independent server, a sleep period as a vacation and multiple sleep periods as multiple vacations, we model the proposed strategy as a type of novel queueing model with partial asynchronous multiple vacations.

The system model has an infinite state space. Let the random variable \(N(t)=i, i\in \{0,1,\ldots \}\) be the total number of tasks in the system at instant t. N(t) is also called the system level. Let the random variable \(J(t)=j, j\in \{0,1,\ldots ,d\}\) be the number of busy VMs in Module II at instant t. J(t) is also called the system stage. \(\{N(t), J(t), t\ge 0\}\) constitutes a two-dimensional continuous-time stochastic process with the state-space \(\varvec{\varOmega }\) as follows:

$$\begin{aligned} \varvec{\varOmega }& {}= \{(i, 0): 0\le i\le c\} \cup \ \{(i,j):c<i\le c+d,\ 0\le j\le i-c\}\ \nonumber \\&\quad \cup \ \{(i, j): i>c+d, 0\le j\le d \}. \end{aligned}$$
(1)

In our research, we focus on user initiated tasks [19], and we make the following assumptions. We suppose that the arrival intervals of tasks, the service times of a task processed in Module I and in Module II, and the time lengths of a sleep timer are independent, identically distributed (i.i.d.) random variables. Task arrivals are assumed to follow a Poisson process with parameter \(\lambda ~(\lambda > 0)\), while the service times of a task processed in Module I and in Module II are assumed to follow exponential distributions with parameters \(\mu _{1}~(\mu _{1}>0)\) and \(\mu _{2}~(0<\mu _{2}<\mu _{1})\), respectively. In addition, the time length of a sleep timer is assumed to follow an exponential distribution with parameter \(\theta\), called the sleeping parameter. It should be noted that in the system model, we assume that no time is taken for a task to migrate or for a sleeping VM to wake up.

Based on the assumptions above, \(\{N(t), J(t), t\ge 0\}\) can be regarded as a two-dimensional continuous time Markov chain (CTMC).

We define \(\pi _{i,j}\) as the steady-state probability distribution of the system model for the system level being equal to i and the system stage being equal to j. \(\pi _{i,j}\) is then given as follows:

$$\begin{aligned} {\pi _{i,j}=\lim _{t\rightarrow \infty }P\{N(t)=i,J(t)=j\}}, \ (i,j)\in \varvec{\varOmega }. \end{aligned}$$
(2)

We define \(\varvec{\pi }_i\) as the steady-state probability distribution when the system level is i. \(\varvec{\pi }_i\) can be given as follows:

$$\begin{aligned} \varvec{\pi }_i=\left\{ \begin{array}{l} \pi _{i,0},\ 0\leqslant i\leqslant c \\ (\pi _{i,0},\pi _{i,1},\ldots ,\pi _{i,i-c}),\ c<i\leqslant c+d \\ (\pi _{i,0},\pi _{i,1},\ldots ,\pi _{i,d}),\ i>c+d. \\ \end{array} \right. \end{aligned}$$
(3)

The steady-state probability distribution \(\varvec{\varPi }\) of the two-dimensional CTMC is composed of \(\varvec{\pi }_i~(i\ge 0)\). \(\varvec{\varPi }\) is given as follows:

$$\begin{aligned} \varvec{\varPi }=(\varvec{\pi }_{0},\varvec{\pi }_{1},\ldots ). \end{aligned}$$
(4)

4 Model Analysis

In this section, the transition rate matrix of the two-dimensional CTMC is firstly investigated. Then, the steady-state probability distribution of the system model is derived.

4.1 Transition Rate Matrix

Let Q be the one step state transition rate matrix of the two-dimensional CTMC \(\left\{ (N(t),J(t)),t\ge 0\right\}\). Based on the system level, Q is separated into several sub-matrices. Let \({{\varvec{Q}}}_{k,l}\) be the one step state transition rate sub-matrix for the system level changing from \(k\ (k=0,1,\ldots )\) to \(l\ (l=0,1,\ldots )\). For convenience of presentation, we denote \({{\varvec{Q}}}_{k, k-1}\), \({{\varvec{Q}}}_{k, k+1}\) and \({{\varvec{Q}}}_{k, k}\) as \({{\varvec{B}}}_{k}\), \({{\varvec{C}}}_{k}\) and \({{\varvec{A}}}_{k}\), respectively. \({{\varvec{B}}}_{k}\), \({{\varvec{C}}}_{k}\) and \({{\varvec{A}}}_{k}\) are discussed in the following cases.

  1. When the initial system level k ranges from 0 to c, k VMs in Module I are busy and all the VMs in Module II are sleeping.

For the case of \(k=0\), there are no tasks at all in the system. This means that the possible state transitions are from (0, 0) to (1, 0) and from (0, 0) to (0, 0). If a task arrives at the system, the system level will increase by one but the system stage will remain unchanged, i.e., the system state will transform to (1, 0) from (0, 0) with the transition rate \(\lambda\). Otherwise, the system state will remain fixed at (0, 0) with the transition rate \(-\lambda\). Thus, \(C_{0}\) and \(A_{0}\) are given as follows:

$$\begin{aligned} C_{0}=\lambda ,\ A_{0}=-\lambda . \end{aligned}$$

For the case of \(0< k\leqslant c\), all the tasks in the system are being processed on the VMs in Module I. If a task is completely processed, the system level will decrease by one but the system stage will remain unchanged, i.e., the system state will transform to \((k-1,0)\) from (k, 0) with the transition rate \(k\mu _{1}\). If a task arrives at the system, the system level will increase by one but the system stage will remain unchanged, i.e., the system state will transfer to \((k+1,0)\) from (k, 0) with the transition rate \(\lambda\). Otherwise, the system state will remain fixed at (k, 0) with the transition rate \(-(\lambda +k\mu _{1})\). Thus, \(B_{k}\), \(C_{k}\) and \(A_{k}\) are given as follows:

$$\begin{aligned} B_{k}=k\mu _{1},\ C_{k}=\lambda ,\ A_{k}=-(\lambda +k\mu _{1}). \end{aligned}$$
  2. When the initial system level k ranges from \((c+1)\) to \((c+d)\), all the VMs in Module I are busy, while at most \((k-c)\) VMs in Module II are busy.

For the case of \(k=c+x,\ x=1,2,\ldots ,d-1\), the number of busy VMs in Module I is c, while in Module II, there are at most x busy VMs.

If a task is completely processed and there is at least one task in the system buffer, the first task queueing in the system buffer will occupy the evacuated VM to receive service. Consequently, the system level will decrease by one, but the system stage will remain fixed, i.e., the system state will transform to \((k-1,n)\) from \((k,n)\) with the transition rate \((c\mu _{1}+n\mu _{2})\), where \(n~(0\le n \le x)\) is the number of busy VMs in Module II. If a task is completely processed on the VM in Module I and there are no tasks in the system buffer, one of the tasks being processed in Module II will migrate to the evacuated VM in Module I and the just-evacuated VM in Module II will start sleeping. If a task is completely processed on the VM in Module II and there are no tasks in the system buffer, the evacuated VM in Module II will start sleeping directly. Consequently, both the system level and the system stage will decrease by one, i.e., the system state will transform to \((k-1,x-1)\) from \((k,x)\) with the transition rate \((c\mu _{1}+x\mu _{2})\). Thus, \({{\varvec{B}}}_{k}\) is a rectangular \((x+1)\times x\) matrix and is given as follows:

$$\begin{aligned} {{\varvec{B}}}_{k}=\left( \begin{array}{cccc} c\mu _{1} &{} \ &{} \ &{} \ \\ \ &{} c\mu _{1}+\mu _{2} &{} \ &{} \ \\ \ &{} \ &{} \ddots &{} \ \\ \ &{} \ &{} \ &{} c\mu _{1}+(x-1)\mu _{2} \\ \ &{} \ &{} \ &{} c\mu _{1}+x\mu _{2} \\ \end{array} \right) . \end{aligned}$$

None of the VMs in Module II will wake up before their corresponding sleep timers expire, even though the system buffer is not empty. If a task arrives at the system before one of the sleep timers expires, the system level will increase by one but the system stage will remain fixed, i.e., the system state will transform to \((k+1,n)\) from \((k,n)\) with the transition rate \(\lambda\). Thus, \({{\varvec{C}}}_{k}\) is a rectangular \((x+1)\times (x+2)\) matrix and is given as follows:

$$\begin{aligned} {{\varvec{C}}}_{k}= \left( \begin{array}{cccccc} \lambda &{}&{}&{}&{}&{}0\\ & \lambda &{} &{} &{} &{} 0 \\ &{} \ &{}\ddots &{} &{} &{} \vdots \\ \ &{} &{} &{} \lambda & &{} 0 \\ &{} &{} &{} &{} \lambda &{} 0 \\ \end{array} \right) . \end{aligned}$$

If one of the sleep timers expires, the corresponding VM in Module II will wake up and process the first task queueing in the buffer. Consequently, the system level k will remain fixed but the system stage n will increase by one, i.e., the system state will transform to \((k,n+1)\) from \((k,n)\) with the transition rate \((d-n)\theta\). Otherwise, the system state will remain fixed: when the system buffer is not empty, the transition rate is \(-h_{n}\), where \(h_{n}=\lambda +c\mu _{1}+n\mu _{2}+(d-n)\theta\); when the system buffer is empty, the transition rate is \(-(\lambda +c\mu _{1}+x\mu _{2})\). Thus, \({{\varvec{A}}}_{k}\) is a square matrix of the order \((x+1)\times (x+1)\) and is given as follows:

$$\begin{aligned} {{\varvec{A}}}_{k}= \left( \begin{array}{ccccccccc} -h_{0} &{} d\theta &{} \ &{} \ &{} \ \\ \ &{} -h_{1} &{} (d-1)\theta &{} \ &{} \ \\ \ &{} \ &{} \ddots &{}\ddots &{} \ \\ \ &{} \ &{} \ &{} -h_{x-1} &{} (d-x+1)\theta \\ \ &{} \ &{} \ &{} \ &{} -(\lambda +c\mu _{1}+x\mu _{2}) \\ \end{array} \right) . \end{aligned}$$

For the case of \(k=c+d\), the number of tasks in the system is equal to the total number of VMs. This is a special case of the situation discussed above. \({{\varvec{B}}}_{k}\) is a rectangular \((d+1)\times d\) matrix, while \({{\varvec{C}}}_{k}\) and \({{\varvec{A}}}_{k}\) are square matrices of the order \((d+1)\times (d+1)\). \({{\varvec{B}}}_{k}\), \({{\varvec{C}}}_{k}\) and \({{\varvec{A}}}_{k}\) are given as follows:

$$\begin{aligned}&{{\varvec{B}}}_{k}=\left( \begin{array}{cccc} c\mu _{1} &{} \ &{} \ &{} \ \\ \ &{} c\mu _{1}+\mu _{2} &{} \ &{} \ \\ \ &{} \ &{} \ddots &{} \ \\ \ &{} \ &{} \ &{} c\mu _{1}+(d-1)\mu _{2} \\ \ &{} \ &{} \ &{} c\mu _{1}+d\mu _{2} \\ \end{array} \right) , \\&{{\varvec{C}}}_{k}=\left( \begin{array}{ccccc} \lambda &{} \ &{} \ &{} \ &{} \ \\ \ &{} \lambda &{} \ &{} \ &{} \ \\ \ &{} \ &{}\ddots &{} \ &{} \ \\ \ &{} \ &{} \ &{} \lambda &{} \ \\ &{} &{} &{} & \lambda \\ \end{array} \right) , \\&{{\varvec{A}}}_{k}=\left( \begin{array}{ccccc} -h_{0} &{} d\theta &{} \ &{} \ &{} \ \\ \ &{} -h_{1} &{} (d-1)\theta &{} \ &{} \ \\ \ &{} \ &{} \ddots &{} \ddots &{} \ \\ \ &{} \ &{} \ &{} -h_{d-1} &{} \theta \\ \ &{} \ &{} \ &{} \ &{} -h_{d} \\ \end{array} \right) . \end{aligned}$$
  3. When the initial system level is greater than the total number of VMs, i.e., \(k>c+d\), all the VMs in Module I are busy, while the VMs in Module II are either busy or sleeping. \({{\varvec{B}}}_{k}\), \({{\varvec{C}}}_{k}\) and \({{\varvec{A}}}_{k}\) are square matrices of the order \((d+1)\times (d+1)\). Similar to the discussion in item (2), the sub-matrices \({{\varvec{B}}}_{k}\), \({{\varvec{C}}}_{k}\) and \({{\varvec{A}}}_{k}\) are given as follows:

$$\begin{aligned}&{{\varvec{B}}}_{k}=\left( \begin{array}{ccccccc} c\mu _{1} &{} \ &{} \ &{} \ &{} \ \\ \ &{} c\mu _{1}+\mu _{2} &{} \ &{} \ &{} \ \\ \ &{} \ &{} \ddots &{} \ &{} \ \\ \ &{} \ &{} \ &{} c\mu _{1}+(d-1)\mu _{2} &{} \ \\ \ &{} \ &{} \ &{} \ &{} c\mu _{1}+d\mu _{2} \\ \end{array} \right) ,\\&{{\varvec{C}}}_{k}=\left( \begin{array}{ccccc} \lambda &{} \ &{} \ &{} \ &{} \ \\ \ &{} \lambda &{} \ &{} \ &{} \ \\ \ &{} \ &{}\ddots &{} \ &{} \ \\ \ &{} \ &{} \ &{} \lambda &{} \ \\ &{} &{} &{} & \lambda \\ \end{array} \right) ,\\&{{\varvec{A}}}_{k}=\left( \begin{array}{ccccc} -h_{0} &{} d\theta &{} \ &{} \ &{} \ \\ \ &{} -h_{1} &{} (d-1)\theta &{} \ &{} \ \\ \ &{} \ &{} \ddots &{} \ddots &{} \ \\ \ &{} \ &{} \ &{} -h_{d-1} &{} \theta \\ \ &{} \ &{} \ &{} \ &{} -h_{d} \\ \end{array} \right) . \end{aligned}$$

Now, all the sub-matrices in the one step state transition rate matrix Q have been addressed. Starting from the system level \((c+d)\), the sub-matrices \({{\varvec{A}}}_{k}\) and \({{\varvec{C}}}_{k}\) in Q repeat indefinitely. Starting from the system level \((c+d+1)\), the sub-matrices \({{\varvec{B}}}_{k}\) in Q repeat indefinitely. The repetitive sub-matrices \({{\varvec{B}}}_{k}\), \({{\varvec{A}}}_{k}\) and \({{\varvec{C}}}_{k}\) are represented by \({{\varvec{B}}}\), \({{\varvec{A}}}\) and \({{\varvec{C}}}\), respectively. Thus, Q is written as follows:

$$\begin{aligned} {{\varvec{Q}}}=\left( \begin{array}{ccccccccccccccc} A_{0} &{} C_{0} &{} &{} &{} &{} &{} &{} &{} &{} &{} &{}\\ B_{1} &{} A_{1} &{} C_{1} &{} &{} &{} &{} &{} &{} &{} &{} &{}\\ &{} \ddots &{} \ddots &{}\ddots &{} &{} &{} &{} &{} &{} &{} \\ &{} &{}B_{c}&{} A_{c} &{}C_{c} &{} &{} &{} &{} &{} &{} \\ &{} &{} &{}{{\varvec{B}}}_{c+1}&{} {{\varvec{A}}}_{c+1} &{}{{\varvec{C}}}_{c+1} &{} &{} &{} &{} &{} \\ &{} &{} &{} &{} \ddots &{}\ddots &{}\ddots &{} &{} &{} &{} \\ &{} &{} &{} &{} &{}{{\varvec{B}}}_{c+d} &{}{{\varvec{A}}} &{}{{\varvec{C}}} &{} &{} \\ &{} &{} &{} &{} &{} &{}{{\varvec{B}}} &{}{{\varvec{A}}} &{}{{\varvec{C}}} &{} \\ &{} &{} &{} &{} &{} &{} &{}\ddots &{}\ddots &{}\ddots \\ \end{array} \right) . \end{aligned}$$
(5)

The block-tridiagonal structure of the one step state transition rate matrix Q shows that the state transitions occur only between adjacent system levels. Referring to [20], we know that the two-dimensional CTMC \(\{N(t),J(t),t\ge 0\}\) can be seen as a type of Quasi Birth-and-Death (QBD) process.
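To make the construction of this subsection concrete, the following sketch builds the sub-matrices \({{\varvec{B}}}_{k}\), \({{\varvec{A}}}_{k}\) and \({{\varvec{C}}}_{k}\) for an arbitrary level k. It is only a minimal illustration in Python/NumPy; the function names and their interface are ours and do not appear in the paper.

```python
import numpy as np

def level_dim(k, c, d):
    """Dimension of the probability vector pi_k at system level k."""
    return 1 if k <= c else min(k - c, d) + 1

def level_blocks(k, c, d, lam, mu1, mu2, theta):
    """Sub-matrices (B_k, A_k, C_k) of Q for system level k (a sketch)."""
    def h(n):                            # total outflow rate with a non-empty buffer
        return lam + c * mu1 + n * mu2 + (d - n) * theta

    if k <= c:                           # all VMs in Module II are sleeping
        B = np.array([[k * mu1]])
        A = np.array([[-(lam + k * mu1)]])
        C = np.zeros((1, level_dim(k + 1, c, d)))
        C[0, 0] = lam
        return B, A, C

    x = min(k - c, d)                    # largest possible number of busy Module-II VMs
    # departures: level k -> k-1
    if k <= c + d:                       # the buffer may empty out: rectangular block
        B = np.zeros((x + 1, x))
        for n in range(x):
            B[n, n] = c * mu1 + n * mu2
        B[x, x - 1] = c * mu1 + x * mu2  # last busy Module-II VM goes to sleep
    else:                                # buffer never empties on a departure: diagonal
        B = np.diag([c * mu1 + n * mu2 for n in range(d + 1)])
    # arrivals: level k -> k+1
    C = np.zeros((x + 1, level_dim(k + 1, c, d)))
    for n in range(x + 1):
        C[n, n] = lam
    # wake-ups and diagonal entries: level k -> k
    A = np.zeros((x + 1, x + 1))
    for n in range(x):
        A[n, n] = -h(n)
        A[n, n + 1] = (d - n) * theta
    A[x, x] = -(lam + c * mu1 + x * mu2)  # equals -h(x) when x = d
    return B, A, C
```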

4.2 Steady-State Probability Distribution

For the QBD process \(\{N(t),J(t),t\ge 0\}\) with the one step state transition rate matrix Q, the necessary and sufficient condition for positive recurrence is that the matrix quadratic equation

$$\begin{aligned} {{\varvec{R}}}^2{{\varvec{B}}} + {{\varvec{R}}}{{\varvec{A}}} + {{\varvec{C}}} = \mathbf 0 \end{aligned}$$
(6)

has the minimal non-negative solution \({{\varvec{R}}}\) with the spectral radius \(SP({{\varvec{R}}}) < 1\). This solution, called the rate matrix and denoted by \({{\varvec{R}}}\), can be explicitly determined.

From Sect. 4.1, we find that the sub-matrices \({{\varvec{B}}}\), \({{\varvec{A}}}\) and \({{\varvec{C}}}\) are upper-triangular matrices. So, the rate matrix \({{\varvec{R}}}\) must be an upper-triangular matrix and can be expressed as follows:

$$\begin{aligned} {{\varvec{R}}}= \left( \begin{array}{cccccc} r_{0}&{}r_{0,1}&{}r_{0,2}&{}\cdots &{}r_{0,d-1}&{}r_{0,d}\\ &{}r_{1}&{}r_{1,2}&{}\cdots &{}r_{1,d-1}&{}r_{1,d}\\ &{}&{}r_{2}&{}\cdots &{}r_{2,d-1}&{}r_{2,d}\\ &{}&{}&{}\ddots &{}\vdots &{}\vdots \\ &{}&{}&{}&{}r_{d-1}&{}r_{d-1,d}\\ &{}&{}&{}&{}&{}r_{d}\\ \end{array} \right) . \end{aligned}$$
(7)

Then, the elements of \({{\varvec{R}}}^{2}\) are

$$\begin{aligned} ({{\varvec{R}}}^{2})_{kk}&=r^{2}_{k},\ 0\leqslant k\leqslant d,\\ ({{\varvec{R}}}^{2})_{jk}&=\sum \limits ^{k}_{i=j}r_{ji}r_{ik},\ 0\leqslant j\leqslant d-1, j+1\leqslant k\leqslant d. \end{aligned}$$

Substituting \({{\varvec{R}}}^{2}\), \({{\varvec{R}}}\), \({{\varvec{A}}}\), \({{\varvec{B}}}\) and \({{\varvec{C}}}\) into Eq. (6) yields a set of equations:

$$\begin{aligned} \left\{ \begin{array}{l} (c\mu _{1}+k\mu _{2})r_{k}^{2}-h_{k}r_{k} +\lambda =0,\ 0\leqslant k\leqslant d\\ (c\mu _{1}+k\mu _{2})\sum \nolimits ^{k}_{i=j}r_{ji}r_{ik}-h_{k}r_{jk}+(d-k+1)\theta r_{j,k-1}\\ \quad =\,0,\ 0\leqslant j\leqslant d-1,\ j+1\leqslant k\leqslant d. \end{array}\right. \end{aligned}$$
(8)

If the traffic load \(\rho =\lambda (c\mu _{1}+d\mu _{2})^{-1}<1\), it can be proven that the first equation of Eq. (8) has two real roots \(0<r_{k}<1\) and \(r_{k}^{*} \ge 1\). Note that the diagonal elements of \({{\varvec{R}}}\) are \(r_{k}~(0\le k \le d)\) and the spectral radius of R satisfies:

$$\begin{aligned} SP(\varvec{R})=\max \{ r_{0}, r_{1}, \ldots , r_{d} \} < 1. \end{aligned}$$
(9)

The off-diagonal elements of \({{\varvec{R}}}\) satisfy the last equation of Eq. (8). It is an arduous task to give a general expression for \(r_{jk} \ (0\le j\le d-1, j+1\le k\le d)\) in closed-form, so we recursively compute the off-diagonal elements based on the diagonal elements.
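Rearranging the second equation of Eq. (8), and noting that \(r_{j,j}=r_{j}\) and \(r_{k,k}=r_{k}\), gives

$$\begin{aligned} r_{j,k}=\frac{(c\mu _{1}+k\mu _{2})\sum \nolimits ^{k-1}_{i=j+1}r_{j,i}r_{i,k}+(d-k+1)\theta r_{j,k-1}}{h_{k}-(c\mu _{1}+k\mu _{2})(r_{j}+r_{k})},\ 0\le j\le d-1,\ j+1\le k\le d, \end{aligned}$$

so the off-diagonal elements can be filled in row by row, starting from the bottom row, once the diagonal elements are known. The sketch below (our own illustration in Python/NumPy, not code from the paper) implements this recursion; under the stability condition \(\rho <1\) the denominator stays positive.

```python
import numpy as np

def rate_matrix(c, d, lam, mu1, mu2, theta):
    """Minimal non-negative solution R of Eq. (6) for this model (a sketch)."""
    def h(n):
        return lam + c * mu1 + n * mu2 + (d - n) * theta

    R = np.zeros((d + 1, d + 1))
    # diagonal elements: the root in (0, 1) of (c*mu1 + k*mu2) r^2 - h_k r + lam = 0
    for k in range(d + 1):
        a = c * mu1 + k * mu2
        R[k, k] = (h(k) - np.sqrt(h(k) ** 2 - 4.0 * a * lam)) / (2.0 * a)
    # off-diagonal elements, filled row by row from the bottom row upwards,
    # so that every r_{i,k} with i > j needed in the sum is already available
    for j in range(d - 1, -1, -1):
        for k in range(j + 1, d + 1):
            a = c * mu1 + k * mu2
            s = sum(R[j, i] * R[i, k] for i in range(j + 1, k))
            R[j, k] = (a * s + (d - k + 1) * theta * R[j, k - 1]) \
                      / (h(k) - a * (R[j, j] + R[k, k]))
    return R
```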

Since the QBD process with the one step state transition rate matrix Q is positive recurrent, the stationary distribution is easily expressed in the matrix geometric form with the rate matrix \({{\varvec{R}}}\) as follows:

$$\begin{aligned} \varvec{\pi }_i = \varvec{\pi }_{c+d}{{\varvec{R}}}^{i-(c+d)}, \ i\ge c+d. \end{aligned}$$
(10)

In order to obtain the unknown stationary distribution \(\varvec{\pi }_{0}\), \(\varvec{\pi }_{1}\), \(\ldots\), \(\varvec{\pi }_{c+d}\), we construct a square matrix \(B[{{\varvec{R}}}]\) of the order \(\left[ c+\frac{(d+1)(d+2)}{2}\right] \times \left[ c+\frac{(d+1)(d+2)}{2}\right]\) as follows:

$$\begin{aligned} B [{{\varvec{R}}}]= \left( \begin{array}{cccccccccccc} A_{0} &{} C_{0}\\ B_{1} &{} A_{1} &{} C_{1}\\ &{} \ddots &{} \ddots &{}\ddots \\ &{} &{}B_{c}&{} A_{c} &{}C_{c}\\ &{} &{} &{}{{\varvec{B}}}_{c+1}&{} {{\varvec{A}}}_{c+1} &{}{{\varvec{C}}}_{c+1}\\ &{} &{} &{} &{} \ddots &{}\ddots &{}\ddots &{} &{} &{} &{} \\ &{} &{} &{} &{} &{}{{\varvec{B}}}_{c+d-1} &{}{{\varvec{A}}}_{c+d-1} &{}{{\varvec{C}}}_{c+d-1}\\ &{} &{} &{} &{} &{} &{}{{\varvec{B}}}_{c+d} &{}{{\varvec{RB}}}+{{\varvec{A}}}\\ \end{array} \right) . \end{aligned}$$
(11)

Using the method of a matrix geometric solution, we can construct an augmented matrix equation as

$$\begin{aligned} (\varvec{\pi }_{0},\varvec{\pi }_{1},\ldots ,\varvec{\pi }_{c+d})\left( B[{{\varvec{R}}}],\ \left( \begin{array}{c} \varvec{e}_{1} \\ ({{\varvec{I}}}-{{\varvec{R}}})^{-1}\varvec{e}_{2} \end{array} \right) \right) =(\varvec{0},1) \end{aligned}$$

(12)

where \(\varvec{e}_{1}\) is a \(\left[ c+\frac{d(d+1)}{2} \right] \times 1\) vector with ones, \(\varvec{e}_{2}\) is a \((d+1)\times 1\) vector with ones, and \({{\varvec{I}}}\) is the identity matrix of order \((d+1)\). The first block of Eq. (12) collects the balance equations of the boundary levels, while the appended column imposes the normalization condition that all the steady-state probabilities sum to one.

Applying the Gauss–Seidel method [21] to solve Eq. (12), we can obtain \(\varvec{\pi }_0\), \(\varvec{\pi }_1\), ..., \(\varvec{\pi }_{c+d}\). Substituting \(\varvec{\pi }_{c+d}\) obtained in Eq. (12) into Eq. (10), we can obtain \(\varvec{\pi }_i\ (i=c+d+1, c+d+2,\ldots )\). Then the steady-state probability distribution \(\varvec{\varPi }=(\varvec{\pi }_{0}, \varvec{\pi }_{1}, \ldots )\) of the system can be given mathematically.
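Combining the two sketches above (level_blocks and rate_matrix), the boundary probabilities can be computed directly. The sketch below assembles \(B[{{\varvec{R}}}]\), appends the normalization column of Eq. (12), and solves the resulting system by least squares; this direct solve is an implementation choice for the sketch and stands in for the Gauss–Seidel iteration of [21].

```python
import numpy as np

def boundary_probabilities(c, d, lam, mu1, mu2, theta):
    """pi_0, ..., pi_{c+d} from Eq. (12), plus the rate matrix R (a sketch)."""
    R = rate_matrix(c, d, lam, mu1, mu2, theta)            # sketch in Sect. 4.2
    dims = [level_dim(k, c, d) for k in range(c + d + 1)]  # sketch in Sect. 4.1
    off = np.concatenate(([0], np.cumsum(dims)))
    n = off[-1]                                            # = c + (d+1)(d+2)/2
    BR = np.zeros((n, n))
    for k in range(c + d + 1):
        Bk, Ak, Ck = level_blocks(k, c, d, lam, mu1, mu2, theta)
        if k < c + d:
            BR[off[k]:off[k + 1], off[k]:off[k + 1]] = Ak
            BR[off[k]:off[k + 1], off[k + 1]:off[k + 2]] = Ck
        else:                                              # last diagonal block: R B + A
            Brep, Arep, _ = level_blocks(c + d + 1, c, d, lam, mu1, mu2, theta)
            BR[off[k]:, off[k]:] = R @ Brep + Arep
        if k >= 1:
            BR[off[k]:off[k + 1], off[k - 1]:off[k]] = Bk
    # normalization column: e_1 for levels 0..c+d-1, (I - R)^{-1} e_2 for level c+d
    v = np.ones(n)
    v[off[c + d]:] = np.linalg.solve(np.eye(d + 1) - R, np.ones(d + 1))
    M = np.column_stack([BR, v])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M.T, rhs, rcond=None)         # solves pi (B[R], v) = (0, 1)
    boundary = [pi[off[k]:off[k + 1]] for k in range(c + d + 1)]
    return boundary, R
```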

5 Performance Measures

In this section, the performance measures in terms of the average response time of tasks and the energy saving rate of the system are mathematically evaluated.

We define the response time of a task as the duration from the instant a task arrives at the system to the instant this task is completely processed.

Based on the steady-state probability distribution of the system model given in Sect. 4.2, the average response time E[T] of tasks is given as follows:

$$\begin{aligned} E[T]=\frac{1}{\lambda } \left( \sum ^{c}_{i=0}i\pi _{i0}+\sum ^{c+d}_{i=c+1}\sum ^{i-c}_{j=0}i\pi _{i,j}+\sum ^{\infty }_{i=c+d+1}\sum ^{d}_{j=0}i\pi _{i,j} \right) . \end{aligned}$$
(13)
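Because the tail probabilities obey Eq. (10), the infinite sum in Eq. (13) reduces to matrix expressions, using \(\sum _{m\ge 1}{{\varvec{R}}}^{m}={{\varvec{R}}}({{\varvec{I}}}-{{\varvec{R}}})^{-1}\) and \(\sum _{m\ge 1}m{{\varvec{R}}}^{m}={{\varvec{R}}}({{\varvec{I}}}-{{\varvec{R}}})^{-2}\). A minimal sketch of the corresponding computation, assuming the boundary probabilities and R from the sketch in Sect. 4.2:

```python
import numpy as np

def avg_response_time(boundary, R, c, d, lam):
    """E[T] of Eq. (13); the geometric tail is summed in closed form (a sketch)."""
    ones = np.ones(d + 1)
    inv = np.linalg.inv(np.eye(d + 1) - R)                 # (I - R)^{-1}
    mean_level = sum(i * bi.sum() for i, bi in enumerate(boundary))
    # sum_{i > c+d} i * pi_i * 1 = pi_{c+d} [ (c+d) R (I-R)^{-1} + R (I-R)^{-2} ] 1
    tail = boundary[c + d] @ ((c + d) * (R @ inv) + R @ inv @ inv) @ ones
    return (mean_level + tail) / lam
```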

In our proposed VM allocation strategy, energy consumption can be reduced during the sleep period. We let \(\omega \ (\omega >0)\) be the energy consumption per unit time for a busy VM in Module II, and \(\omega _{s}\ (\omega _{s}>0)\) be the energy consumption per unit time for a sleeping VM in Module II. Obviously, \(\omega > \omega _{s}\). We note that additional energy will be consumed when a task migrates from Module II to Module I, when a VM in Module II listens to the system buffer, as well as when a VM in Module II wakes up from sleep state. Let \(\omega _{m}\ (\omega _{m}>0)\), \(\omega _{l}\ (\omega _{l}>0)\) and \(\omega _{u}\ (\omega _{u}>0)\) be the energy consumption for each migration, listening and wakeup, respectively.

We define the energy saving rate of the system as the energy conservation per unit time with our proposed strategy. Based on the discussions above and the steady-state probability distribution of the system model given in Sect. 4.2, the energy saving rate \(\psi\) of the system is given as follows:

$$\begin{aligned} \psi&=(\omega -\omega _{s})\sum ^{\infty }_{i=0}\sum ^{d}_{j=0}(d-j)\pi _{i,j}- \left( \omega _{m}\sum ^{c+d}_{i=c+1}\sum ^{d}_{j=1}c\mu _{1}\pi _{i,j} \right. \\&\quad \left. +\, \omega _{l}\sum ^{\infty }_{i=0}\sum ^{d}_{j=0}\theta (d-j)\pi _{i,j}+ \omega _{u}\sum ^{d-1}_{j=0}\sum ^{\infty }_{i=c+j+1}\theta (d-j)\pi _{i,j} \right) . \end{aligned}$$
(14)
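The infinite sums in Eq. (14) can be evaluated in the same way, using \(\sum _{i>c+d}\varvec{\pi }_{i}=\varvec{\pi }_{c+d}{{\varvec{R}}}({{\varvec{I}}}-{{\varvec{R}}})^{-1}\). A sketch, with the energy parameters passed in explicitly (their numerical values belong to Table 1 and are not reproduced here):

```python
import numpy as np

def energy_saving_rate(boundary, R, c, d, mu1, theta,
                       omega, omega_s, omega_m, omega_l, omega_u):
    """Energy saving rate psi of Eq. (14) (a sketch)."""
    inv = np.linalg.inv(np.eye(d + 1) - R)
    tail = boundary[c + d] @ (R @ inv)                    # sum_{i > c+d} pi_i, per stage j
    # expected number of sleeping VMs: sum_i sum_j (d - j) pi_{i,j}
    sleeping = sum((d - j) * p for bi in boundary for j, p in enumerate(bi))
    sleeping += sum((d - j) * tail[j] for j in range(d + 1))
    # migrations: c*mu1*pi_{i,j} over c < i <= c+d, 1 <= j
    migration = sum(c * mu1 * boundary[i][j]
                    for i in range(c + 1, c + d + 1)
                    for j in range(1, len(boundary[i])))
    # every sleep-timer expiry triggers one listening
    listening = theta * sleeping
    # wake-ups: timer expiries in states with a non-empty buffer (i >= c + j + 1)
    wakeup = 0.0
    for j in range(d):
        wakeup += sum(theta * (d - j) * boundary[i][j]
                      for i in range(c + j + 1, c + d + 1) if j < len(boundary[i]))
        wakeup += theta * (d - j) * tail[j]
    return (omega - omega_s) * sleeping \
        - (omega_m * migration + omega_l * listening + omega_u * wakeup)
```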

6 Numerical Experiments

In order to evaluate the average response time of tasks and the energy saving rate of the system with the proposed VM allocation strategy, we provide numerical experiments with both analysis and simulation. The analysis results are obtained from Eqs. (13) and (14) using Matlab 2011a. The simulation results are obtained by using MyEclipse 2014. We create a JOB class with the states UNARRIVE, WAIT, RUNHIGH, RUNLOW and FINISH to record the state of a task, and a SERVER class with the states SLEEP, IDLE, BUSYLOW and BUSYHIGH to record the state of a VM. The necessary and sufficient condition for the system to be stable is \(\rho <1\), so we analyze the system model and evaluate the system performance only under this condition. To compare our proposed strategy with existing VM allocation strategies, we set the parameters in the numerical experiments by referencing [14]. The parameter settings in the numerical experiments are shown in Table 1.

Table 1 Parameter settings in numerical experiments

We note that, as long as the system remains stable, the trends of the performance measures change little under other parameter settings.
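Since every timing quantity in the model is exponential, the strategy can also be cross-checked by simulating the CTMC \(\{N(t),J(t)\}\) directly. The sketch below is a simplified stand-in for the Java simulator described above (like the analytical model, it ignores migration and wake-up overheads); it estimates E[T] through Little's law and reports the time-average number of sleeping VMs in Module II.

```python
import random

def simulate(c, d, lam, mu1, mu2, theta, horizon=1e5, seed=1):
    """Event-driven simulation of the proposed strategy (a simplified sketch)."""
    random.seed(seed)
    t, i, j = 0.0, 0, 0                       # time, tasks in system, busy Module-II VMs
    area_tasks, area_sleep = 0.0, 0.0
    while t < horizon:
        busy1 = min(i - j, c)                 # busy VMs in Module I
        buffered = i - busy1 - j              # tasks waiting in the system buffer
        rates = [lam,                                      # 0: arrival
                 busy1 * mu1,                              # 1: Module I completion
                 j * mu2,                                  # 2: Module II completion
                 (d - j) * theta if buffered > 0 else 0.0] # 3: effective wake-up
        total = sum(rates)
        dt = random.expovariate(total)
        area_tasks += i * dt
        area_sleep += (d - j) * dt
        t += dt
        u, event = random.random() * total, 0
        while u > rates[event]:
            u -= rates[event]
            event += 1
        if event == 0:
            i += 1
        elif event == 1:                      # departure from Module I
            i -= 1
            if i - j < c and j > 0:           # buffer was empty: migrate, that VM sleeps
                j -= 1
        elif event == 2:                      # departure from Module II
            i -= 1
            if i - j < c:                     # buffer was empty: the VM goes to sleep
                j -= 1
        else:                                 # a sleep timer expires with waiting tasks
            j += 1
    return area_tasks / t / lam, area_sleep / t   # Little's law: E[T] = E[N] / lambda
```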

Figure 2 examines the influence of the sleeping parameter \(\theta\) on the average response time E[T] of tasks for different numbers d of VMs in Module II.

Fig. 2 Average response time E[T] of tasks versus sleeping parameter \(\theta\)

From Fig. 2, we observe that if there are fewer VMs in Module II (such as \(d<24\)), the average response time E[T] of tasks remains nearly constant across all the values of the sleeping parameter \(\theta\). For this case, the capability of the VMs in Module I is strong enough to process all the arriving tasks, and there are no tasks waiting in the system buffer. As a result, it is likely that the VMs in Module II keep sleeping. So, the average response time of tasks is approximately the average service time \((\mu _{1}^{-1})\) of tasks processed in Module I.

From Fig. 2, we also observe that if there are more VMs in Module II (such as \(d=24,41,44,50\)), the average response time E[T] of tasks initially decreases sharply from a high value, then decreases slightly before finally converging to a certain value as the sleeping parameter \(\theta\) increases. For this case, the processing capability of the VMs in Module I is insufficient to cope with the existing traffic load, so some arriving tasks have to wait in the system buffer. As a result, the VMs in Module II are more likely to be awake after a sleep period and process the tasks waiting in the system buffer. The influence of the sleeping parameter on the average response time of tasks is discussed as follows.

When the sleeping parameter \(\theta\) is relatively small (such as \(0<\theta <0.4\) for \(d=41\)), the tasks arriving in the sleep period will have to wait longer in the system buffer. This results in a higher average response time of tasks. For this case, the influence on the average response time of tasks exerted by the sleeping parameter is greater than that exerted by the arrival rate of tasks and the service rate of tasks. Consequently, the average response time of tasks will decrease sharply as the sleeping parameter increases.

When the sleeping parameter \(\theta\) becomes larger (such as \(0.4<\theta <2.0\) for \(d=41\)), the tasks arriving during a sleep period will be processed earlier. This results in a lower average response time of tasks. For this case, the arrival rate of tasks and the service rate of tasks are the dominant factors influencing the average response time of tasks. Consequently, there is only a slight decreasing trend in the average response time of tasks with respect to the sleeping parameter.

From Fig. 2, we also notice that for the same sleeping parameter \(\theta\), the average response time E[T] of tasks will increase as the number d of VMs in Module II increases. As the number of VMs in Module II increases, the system capability becomes weaker, so tasks sojourn longer in the system. This will inevitably increase the average response time of tasks.

Comparing the results in Fig. 2a, b, we find that for the same number d of VMs in Module II and the same sleeping parameter \(\theta\), a larger service rate \(\mu _{2}\) of a task on a VM in Module II leads to a lower average response time E[T] of tasks. This is because the larger the service rate of a task on a VM in Module II is, the more quickly the VMs in Module II will process the tasks, and the fewer tasks will wait in the buffer. Therefore, the average response time of tasks will be lower.

Figure 3 examines the influence of the sleeping parameter \(\theta\) on the energy saving rate \(\psi\) of the system for different numbers d of VMs in Module II.

Fig. 3 Energy saving rate \(\psi\) of the system versus sleeping parameter \(\theta\)

From Fig. 3, we observe that for the same number d of VMs in Module II, the energy saving rate \(\psi\) of the system decreases as the sleeping parameter \(\theta\) increases. The larger the sleeping parameter is, the more frequently the VM in Module II listens to the system buffer and consumes additional energy. Therefore, the energy saving rate of the system will decrease.

From Fig. 3, we also notice that for the same sleeping parameter \(\theta\), either too few or too many VMs being deployed in Module II will lead to a lower energy saving rate \(\psi\) of the system. When the number of VMs in Module II is very small (such as \(d=0,1,4\)), less energy can be saved even though all the VMs in Module II are sleeping. This results in a lower energy saving rate of the system. When the number of VMs in Module II is very large (such as \(d=41,44,50\)), the system capability gets weaker. There is hardly any chance for the VMs in Module II to go to sleep. This results in a lower energy saving rate of the system.

Comparing the results shown in Fig. 3a, b, we find that the number d of VMs in Module II and the service rate \(\mu _{2}\) of a task on a VM in Module II have different influences on the energy saving rate \(\psi\) of the system.

When fewer VMs are deployed in Module II (such as \(d=4\) for \(\theta =0.2\)), the service capability of Module I is strong enough to process most of the arriving tasks, therefore only a few VMs in Module II will wake up and process the remaining tasks. In this case, the energy saving rate of the system mainly depends on the service rate of a task on an awake VM in Module II. As the service rate of a task on a VM in Module II increases, the VMs in Module II will consume more energy. This results in a lower energy saving rate of the system.

When more VMs are deployed in Module II (such as \(d=41, 44, 50\) for \(\theta =0.2\)), the service capability of Module I is weaker, therefore more VMs in Module II have to wake up and process the arriving tasks. In this case, the number of sleeping VMs in Module II is the dominant factor influencing the energy saving rate of the system. The larger the service rate of a task on a VM in Module II is, the more quickly the VMs in Module II will process the tasks, and the more VMs in Module II will go to sleep, so more energy will be saved. This results in a higher energy saving rate of the system.

From the discussions above, we see that when deploying the VMs in Module II and setting the sleeping parameter, we need to take the service rate of a task on a VM in Module II into account.

In Figs. 2 and 3, the experiment results with \(d=0\) are for the conventional strategy where all the VMs always stay awake. The experiment results with \(d=50\) are for the strategy where all the VMs are under an asynchronous multi-sleep mode. Compared to the conventional strategy where all the VMs always stay awake, our proposed strategy achieves a higher energy saving rate without significantly affecting the response performance. Compared to the strategy where all the VMs are under an asynchronous multi-sleep mode, our proposed strategy performs better in guaranteeing the response performance at the cost of some degradation in the energy saving effect.

Comparing the results shown in Figs. 2 and 3, we find that a larger sleeping parameter leads to not only a shorter average response time of tasks but also a lower energy saving rate of the system, while a smaller sleeping parameter leads to not only a higher energy saving rate of the system but also a longer average response time of tasks. We also find that the energy saving rate of the system is higher with a moderate number of VMs in Module II, while the average response time of tasks is lower with a smaller number of VMs in Module II. Therefore, a trade-off between the average response time of tasks and the energy saving rate of the system should be aimed for when setting the number of VMs in Module II and the sleeping parameter in our proposed VM allocation strategy.

7 Performance Optimization

By trading off the average response time of tasks against the energy saving rate of the system, we establish a system cost function \(F(d,\theta )\) as follows:

$$\begin{aligned} F(d,\theta )=f_{1}E[T]-f_{2}\psi \end{aligned}$$
(15)

where \(f_{1}\) and \(f_{2}\) are the impact factors of the average response time E[T] of tasks and the energy saving rate \(\psi\) of the system, respectively, on the system cost function.

We note that the average response time E[T] of tasks and the energy saving rate \(\psi\) of the system are difficult to express in closed form, so the monotonicity of the system cost function is uncertain. In order to jointly optimize the number of the VMs in Module II and the sleeping parameter so as to minimize the system cost function, we turn to the Particle Swarm Optimization (PSO) intelligent searching algorithm.

Compared with other intelligent optimization algorithms, the PSO algorithm is simple to implement, and there are not many parameters to be adjusted [22, 23]. However, the traditional PSO algorithm has the disadvantages of premature convergence and easily falling into local extrema. To address this, in this paper, we adopt a PSO algorithm with a chaotic mapping mechanism and a nonlinear decreasing inertia weight to optimize the number of the VMs in Module II and the sleeping parameter together.

The main steps to jointly optimize the number of the VMs in Module II and the sleeping parameter are given in Table 2.

In Table 2, we use the system parameters given in Table 1, and set \(f_{1}=4\), \(f_{2}=1\), \(N=100\), \(iter_{max}=200\), \(c_{1}=1.4962\), \(c_{2}=1.4962\), \(w_{max}=0.95\), \(w_{min}=0.40\), \(Ub=2\), \(Lb=0\) and \(X=50\). For different service rates \(\mu _{2}\), we obtain the optimal combination \((d^{*},\theta ^{*})\) for the number of VMs in Module II and the sleeping parameter with the minimum system cost function \(F^{*}\) in Table 3.
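The following is a sketch of the optimization loop outlined in Table 2. We assume here a logistic map for the chaotic initialization and a quadratic decay between \(w_{max}\) and \(w_{min}\) for the nonlinear decreasing inertia weight; Table 2 fixes the exact variants used in the paper. The cost argument is a user-supplied function evaluating Eq. (15), e.g. \(F(d,\theta )=f_{1}E[T]-f_{2}\psi\) computed with the sketches of Sect. 5.

```python
import random

def pso_optimize(cost, n_particles=100, iters=200, c1=1.4962, c2=1.4962,
                 w_max=0.95, w_min=0.40, d_max=50, theta_lb=0.0, theta_ub=2.0,
                 seed=1):
    """PSO over (d, theta) minimising the system cost function (a sketch)."""
    random.seed(seed)

    def logistic(n, x):                       # chaotic sequence in (0, 1)
        seq = []
        for _ in range(n):
            x = 4.0 * x * (1.0 - x)
            seq.append(x)
        return seq

    # chaotic initialisation of the swarm over [0, d_max] x [theta_lb, theta_ub]
    ch_d, ch_t = logistic(n_particles, 0.7), logistic(n_particles, 0.13)
    pos = [[ch_d[k] * d_max, theta_lb + ch_t[k] * (theta_ub - theta_lb)]
           for k in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [cost(round(p[0]), p[1]) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for it in range(iters):
        # nonlinear (quadratic) decreasing inertia weight, one common choice
        w = w_max - (w_max - w_min) * (it / iters) ** 2
        for k in range(n_particles):
            for dim in range(2):
                r1, r2 = random.random(), random.random()
                vel[k][dim] = (w * vel[k][dim]
                               + c1 * r1 * (pbest[k][dim] - pos[k][dim])
                               + c2 * r2 * (gbest[dim] - pos[k][dim]))
                pos[k][dim] += vel[k][dim]
            pos[k][0] = min(max(pos[k][0], 0.0), d_max)           # 0 <= d <= X
            pos[k][1] = min(max(pos[k][1], theta_lb), theta_ub)   # Lb <= theta <= Ub
            val = cost(round(pos[k][0]), pos[k][1])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return round(gbest[0]), gbest[1], gbest_val
```

With the settings above, the optimal combination \((d^{*},\theta ^{*})\) would be read off from the returned triple; whether this reproduces Table 3 depends on the exact chaotic map and weight schedule, which are only assumed here.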

Table 2 Main steps to obtain optimal combination \((d^{*},\theta ^*\))

The optimization results in Table 3 depend on the arrival intensity of tasks, the serving capability of VMs and the cloud capacity. By substituting the arrival rate \(\lambda\), the service rate \(\mu _{2}\) of a task on the VM in Module II, and the total number \((c+d)\) of VMs in a virtual cluster, etc. into the algorithm in Table 2, the optimal parameter combination \((d^{*},\theta ^{*})\) for the number of VMs in Module II and the sleeping parameter can be obtained for the proposed strategy.

Table 3 Optimization results: \((d^*,\theta ^*)\) and \(F^{*}\)

8 Conclusions

In this paper, with the aim of reducing energy consumption and achieving greener computing, we proposed a novel energy-efficient Virtual Machine (VM) allocation strategy. Considering an asynchronous multi-sleep mode and an adaptive task-migration scheme with the proposed strategy, we established a type of queueing model with partial asynchronous multiple vacations, and derived the steady-state distribution of the system model. The queueing model quantified the effects of the number of VMs in Module II and the sleeping parameter. These effects were measured by two performance measures: the average response time of tasks and the energy saving rate of the system. Experimental results showed that the energy saving rate of the system is higher with a moderate number of VMs in Module II, while the average response time of tasks is lower with a smaller number of VMs in Module II. Accordingly, we built a system cost function to investigate a trade-off between different performance measures. By using a PSO algorithm with a chaotic mapping mechanism and a nonlinear decreasing inertia weight, we jointly optimized the number of VMs in Module II and the sleeping parameter with the minimum system cost function. In our future work, we would investigate a VM allocation strategy with an (N, T) policy to trade off the average response time of tasks against the energy saving rate of the system, and build a four-dimensional Markov chain to gain insight into the proposed strategy by considering the migration time and the wakeup time. Moreover, we would introduce a more versatile stochastic process, such as a Markov Modulated Poisson Process (MMPP) or an Interrupted Poisson Process (IPP), to model the task arrivals, and use a real-world dataset to enhance the contribution of our research.