Keywords

1 Introduction

In modern data center, virtualization technology [1] has played an important role in simplifying resource management, integrating server capability and improving resource utilization. It has become the key supporting technology in cloud computing systems. Virtual machine resource scheduling [2] is one of the core techniques in this field, how to design the resource scheduling algorithms for virtual machines and thus ensure the load balancing of system and improve the user experience has been a hot research topic. Most resource scheduling algorithms for virtual machines apply serial ways to deploy user jobs, which can cause the higher resource updating frequency, the longer of the overall job completion time and the poor user experience. Furthermore, serial job deployment usually uses greedy strategy and cannot achieve the global optimum. This may lead to the serious load imbalance and affect the overall system performance. To solve these problems, researchers have proposed many resource scheduling algorithms for virtual machines.

Liu et al. [3] presented two new metrics, Balance Capacity (BC) and Potential Capacity (PC) for virtual machines and used greedy strategy to deploy batch jobs serially, although it makes virtual machines have better scalability, it also can lead to the overall job completion time longer and load imbalance of a cloud system. Zhang [4] proposed a resource scheduling approach for virtual machines based on OpenStack, which can reduce the unbalance of resource and the consumption of power, but the algorithm don’t consider the dependencies of the virtual machines. Minarolli and Freisleben [5] presented a resource management algorithm based on artificial neural network which reduced the number of active servers, but this method mainly focuses on energy consumption, and do not consider load balance and user experience. [6] proposed a random resource scheduling algorithm which has good load balancing. But the state of the virtual machine is unknown, which may lead to the heavy load imbalance of virtual machines and long waiting time of the tasks. Qu et al. [7] proposed a CPU resource scheduling method based on workload-Aware, this algorithm is a cyclical algorithm, and each cycle should adjusts the weights, and it lead to the computational cost very expensive. Dong et al. [8] proposed a two-stage virtual machine scheduling model achieving a tradeoff between energy efficiency and network performance, but the method cannot accurately estimate the cost of dynamic migrating virtual machines. Aiming at the problem of uncertainties in the resource allocation, Umamageswari and Babu [9] proposed a dynamic resource allocation optimization method for virtual machines, which improved the cloud system load balancing in some extent, and reduced the expense cost, but this method used the static allocation algorithm and cannot adapt to requirements of dynamic job resource scheduling. Carril et al. [10] presented a fault-tolerant virtual cluster architecture, which mainly focused on the fault-tolerant issues of the virtual machine clusters in a cloud environment.

In summary, the existing resource scheduling algorithms for virtual machines consider only one or two factors; the overall job completion time is too long, and it may lead to the system load imbalance. To solve the problems, an improved PC based resource scheduling algorithm for virtual machines is proposed, which comprehensively considers the overall job completion time and load balance of a cloud platform, use an improve potential capacity (IPC) to accurately calculate resource remaining capacities for virtual machines, which can avoid inexact match between jobs and servers, and a batch job deployment way is proposed to reduce the update time and the serial deployment time. The experimental results show that the proposed algorithm can effectively reduce the overall job completion time, improve the load balance of a cloud system and has better performance.

2 The Improved PC Based Resource Scheduling Algorithm for Virtual Machines

2.1 The Measures of Virtual Machine Capacity

In cloud computing systems, virtual machines are basic units of dynamic deploying, sharing server’s calculation capacity and facilitating management. How to exactly and dynamically measure the capacity of virtual machines is a very important problem in jobs scheduling. In this paper we made the deep research on the elastic new metrics, Balance Capacity (BC) and Potential Capacity (PC) for virtual machines which are proposed by Liu et al. [3], and did the necessary amendments to the calculation method to make it more reasonable. On this basis, we designed a new batch job scheduling algorithm. Usually the resource sharing model apples Min, Max and Share three parameters to describe the server resources occupied by a virtual machine, where the parameter Min represents the amount of the server resources occupied by a virtual machine without any load; Max represents the amount of the server resources occupied by a virtual machine with full loads; Share represents the share ratio of a virtual machine be allocated the competition resource. Next we will give the definitions of BC and PC and the corresponding calculation formulas.

Definition 1.

Balance Capacity (BC) is the guarantee capacity to assign each virtual machine in the worst case. Considering a virtual machine set of \( VM = \{ vm_{1} , \ldots ,vm_{n} \} \), vm i is the i-th (\( 1 \le {\text{i}} \le {\text{n}} \)) virtual machine, Min(Min > 0) is re minimum load of the virtual machine, Max ≤ pCap is the maximum load of the virtual machine, pCap represents the total resource number of the physic machine containing the virtual machine. The BC of virtual machine i can be calculated as the following.

$$ pCap^{\prime} = pCap - \sum\nolimits_{k = 1}^{m} {Min_{k} } $$
(1)
$$ BC_{i}^{ '} = Min_{i} + share_{i} /\sum\nolimits_{k = 1}^{m} {share_{k} *pCap^{\prime}} $$
(2)
$$ BC_{i} = \left\{ {\begin{array}{*{20}c} {BC_{i}^{ '} } & {BC_{i}^{ '} < Max_{i} } \\ {Max_{i} } & {BC_{i}^{ '} \ge Max_{i} } \\ \end{array} } \right. $$
(3)

Obviously, the BC of a virtual machine is independent of the dynamic system parameters of cloud systems, when the number of virtual machines on a physic node is fixed; the BCs of the virtual machines on the physic node are fixed and become constant values, so it do not reflect the system dynamic characters.

Definition 2.

Potential Capacity (PC) is the maximum available capacity of a virtual machine in some system state.

PC of a virtual machine relies on the current remaining resource of physic node in the cloud platform. The current resource utilization of vm i is U i . When there are N + 1 virtual machines running on a physic node, the computation of PC B for a virtual machine B is as shown in formula (4).

$$ PC_{B} = \left\{ {\begin{array}{*{20}c} {min\{ pCap - \sum\nolimits_{i = 1}^{N} {U_{i} ,Max_{B} } \} } & {\sum\nolimits_{i = 1}^{N} {U_{i} } \le \sum\nolimits_{i = 1}^{N} {BC_{i} } } \\ {pCap - \sum\nolimits_{i = 1}^{N} {U_{i} } \quad \quad \quad } & {otherwise} \\ \end{array} } \right. $$
(4)

Obviously, with the change of utilization of other virtual machines, and the value of PC is dynamic changes, it is possible to truly reflect the virtual machines available capacity. \( {\text{PC}} - {\text{U}}_{\text{i}} \) (\( {\text{U}}_{\text{i}} \) is the resource utilization of i-th virtual machine) represents the current resource remaining capacity for every virtual machine in the cloud platform, when the current resource utilization ratio of a virtual machine increases, resource remaining capacity should decrease, which is reasonable. In summary, as the value of PC increase, the value of \( {\text{PC}} - {\text{U}}_{\text{i}} \) increase, it indicates that the greater the ability of the remaining resource of the virtual machines, the more of available resource.

According to the formula (4), we can get the value of PC for various resource of virtual machines such as CPU, memory, network, I/O and etc. The current resource remaining capacity of the i-th virtual machine can be represented by a set VM i  = {vm i1vm i2, …, vm im }, vm ij  = pc ij  − u ij , vm ij is the remaining amount of the j-th resource on the i-th virtual machine.

2.2 Measures of Job Resource Requirements

Jobs are divided into independent jobs and collaboration jobs in Cloud Computing systems [11], we only consider scheduling strategy and deployment strategy of independent jobs in this paper. It assumes that a set of batch jobs \( {\text{Job}} = \{ {\text{job}}_{1} , \ldots ,{\text{job}}_{\text{n}} \} \), \( {\text{job}}_{\text{i}} \) \( (0 \le {\text{i}} \le {\text{n}} \)) represents the i-th jobs in job queue submitted by a user. In order to express various VM resource requirements of jobs, we use the job requirement vector to express various resources requirements. The job requirement vector of the i-th job is \( {\text{Re}}_{\text{i}} = \{ {\text{e}}_{{{\text{i}}1}} ,{\text{e}}_{{{\text{i}}2}} , \ldots ,{\text{e}}_{\text{im}} \} \), \( {\text{e}}_{\text{ij}} \) is the value of the j-th (\( 0 \le {\text{j}} \le {\text{m}} \)) resource requirement of the i-th job.

In order to calculate and measure the various resources requirements of jobs conveniently, these resources value need to be normalized to between 0 and 1 (\( 0 \le {\text{e}}_{\text{ij}} \le 1) \). It assumes that the maximum value of CPUs is \( {\text{Max}}_{\text{cpu}} \), and the maximum value of memory is \( {\text{Max}}_{\text{mem}} \), the maximum value of network is \( {\text{Max}}_{\text{network}} \), the maximum value of I/O is \( {\text{Max}}_{{{\text{i}}/{\text{o}}}} \) and the maximum value of other types of resources. The resource requirement value of the j-th resource on the i-th job can be normalized to \( {\text{e}}_{\text{ij}} = {\text{e}}_{\text{ij}} /{\text{Max}}_{{({\text{cpu}},{\text{mem}},{\text{network}},{\text{i}}/{\text{o}})}} \), and the resource requirements vector of job i can be expressed as \( {\text{Re}}_{\text{i}} = \left\{ {{\text{e}}_{{{\text{i}}1}} ,{\text{e}}_{{{\text{i}}2}} , \ldots ,{\text{e}}_{\text{im}} } \right\}, 0 \le {\text{e}}_{\text{ij}} \le 1 \).

Job resource requirements cannot adequately reflect the importance of jobs requirement for various resource such as compute-intensive jobs [12], network-intensive jobs, I/O-intensive jobs and etc. So we need a vector \( {\text{Weight}}_{\text{i}} \) to represents the weights of various resource in job i, which is called job resource weight vector, \( {\text{Weight}}_{\text{i}} = \{ {\text{w}}_{{{\text{i}}1}} ,{\text{w}}_{{{\text{i}}2}} ,..,{\text{w}}_{\text{im}} \} \), \( {\text{w}}_{\text{ij}} \) is the weight value of the j-th resource for the i-th job, and \( \sum\nolimits_{{{\text{k}} = 1}}^{\text{m}} {{\text{w}}_{\text{ik}} = 1} \).

2.3 Job Deployment Algorithm

A match matrix value is used to express the match degree between jobs requirement and resource remaining capacity of virtual machines, the matrix value can be computed by the formula (5).

$$ Value = \left[ {\begin{array}{*{20}c} {\sum\nolimits_{i = 1}^{m} {(\frac{{vm_{1i} }}{{e_{1i} }} *w_{1i} )} } & \cdots & {\sum\nolimits_{i = 1}^{m} {(\frac{{vm_{ni} }}{{e_{1i} }} *w_{1i} )} } \\ \vdots & \ddots & \vdots \\ {\sum\nolimits_{i = 1}^{m} {(\frac{{vm_{1i} }}{{e_{ni} }} *w_{ni} )} } & \cdots & {\sum\nolimits_{i = 1}^{m} {(\frac{{vm_{ni} }}{{e_{ni} }} *w_{ni} )} } \\ \end{array} } \right] $$
(5)

The matrix value is consists of n rows and m columns, m is number of virtual machines (VMs) and n is the number of jobs, an element \( {\text{a}}_{\text{ij}} \) in the value represents the match degree between job i and VM j. if the value of \( {\text{a}}_{\text{ij}} \) is less than 1, it represents that the i-th job cannot deployed to the j-th VM; if the value of \( {\text{a}}_{\text{ij}} \) is greater than or equal to 1, it shows that the i-th job can be deployed to the j-th VM. Based on the above matrix, we proposed the cross elimination approach to deploy the jobs.

In the matrix value, if \( {\text{a}}_{\text{ij}} \) is the optimal match pair <job, VM> from a global perspective, then eliminates i-th row and j-th column, and the i-th job is deployed to the j-th virtual machine. Repeated do this process, we can realize the deployment of batch jobs in a short time. The proposed job deployment method selects match pairs <jobs, VMs> from a global perspective and deploys batch jobs as optimal as possible. Its basic idea is that trying to minimize the number of elements that their value are less than 1 in the matrix value, and makes the number of mismatch pairs <jobs, VMs> be minimum. There are two steps to choose \( {\text{a}}_{\text{ij}} \), the first step is to select \( {\text{a}}_{\text{ij}} \) that is greater than or equal to 1 in the matrix; the second step is to count \( {\text{sum}}_{\text{i}} \) of the elements whose value are less than 1 in this row of the matrix, and \( {\text{sum}}_{\text{j}} \) which is the number of elements that its value are less than 1 in this column. If sum of \( {\text{sum}}_{\text{i}} \) and \( {\text{sum}}_{\text{j}} \) is greater than the sum of the number of row elements and the column elements which are less than 1, then the i-th job is dynamically deployed to the j-th virtual machine, and the row and column elements value in this element \( {\text{a}}_{\text{ij}} \) assigns to 0. This approach eliminates the elements of jobs and the virtual machine does not match elements from a global perspective. Repeated this process until there is no element can be eliminated.

Lastly the results have two cases. First, the matrix value is empty, it shows that all batch jobs which remove from the job queue have already been completed deployment; second, the matrix value is not empty, it shows that there are some jobs that cannot be completed deployment. In order to complete the jobs deployment of which cannot be successfully deployed as soon as possible, the jobs are inserted to the front of job queue to redeployment, so that they can be selected in next time. It may decrease the overall job completion time as soon as possible and reduce job QoS requirement impact for users, the deployment algorithm shown in Algorithm 1.

3 Experimental Evaluation

To evaluate the performance of the improved algorithm, we setup the corresponding experiment environment and make a large number of experiments. The comparisons between the proposed algorithm and other algorithms are given in detail.

3.1 Experiment Environment

We designed and implemented the proposed algorithm based on the cloud simulation platform CloudSim3.0.1 [13], and then tested the performance of the proposed algorithm from the job completion time and load balancing capacity two aspects. Table 1 illustrates the experiment parameters including the experimental data size and its scope, where MIPS is the executing speed of VMs or jobs, and the job length represents the length of job executing time.

Table 1. Experimental parameters

3.2 Minimum and Maximum Load of VMs and the Number of VMS

We tested the algorithm on our virtualization management cloud platform consists of five servers and created 20 to 35 VMs, each VM has and 1 CPU of 700 MIPS, 512 M RAM and 100G Disk. After launching the service of Xend and virtualization management, we use MPSTAT command to check the CPU loads. In Fig. 1, (a) is CPU loads of server before VM created; (b) is the CPU loads after create one virtual machine on the server; (c) is the CPU loads of the server when running CPU pressure test cases and VM reaches its full load state.

Fig. 1.
figure 1

The CPU load variation of a server with different VM loads

It can be concluded that the percentage of the minimum load of a VM on this server is 99.96−99.88 = 0.08, while the maximum load is 99.96−66.67 = 33.29. So the minimum CPU load of configuring virtual machine deployment on the server is 0.08 %, the maximum load is 33.29 %. So the number of VMs when all VMs running on a server node reached full load is 3 (0.3329*3 = 0.9987 is close to 1). On the other hand, when the VMs are running without any load, the number of VMs may be 12 (0.08*12 = 0.96 is close to 1). Therefore, the number of VMs on each server will be between 3 and 12. After reasonably calculation, the reasonable number of VMs on each server ranges from 4 to 7. Here the number of VMs on each server is set to 6.

3.3 Feasibility of Cross Elimination Method

For the Value matrix obtained, the cross elimination method is proposed to perform the batch job deployment, elements range from 0 to 2.5, and the scale is 100 × 100. Experiments were repeated 100 times, 500 times and 1000 times respectively. The average success rate of the method is shown in Fig. 2.

Fig. 2.
figure 2

The results of cross elimination

From the Fig. 2 we can see that when the experiments were repeated 100 times, 500 times and 1000 times, the average success rate of cross elimination method exceeds 0.84. We also analyzed the failure rate of the cross elimination method. When the matrix value is not empty, the average remaining elements of the matrix is as shown in Table 2.

Table 2. Average remaining elements number after cross elimination

As the result shows that when the matrix Value is not empty, average remaining elements number of the matrix ranges from 1.80 to 2.54, indicating that only 1 to 2 job failed to complete the deployment, which only has a slight impact on the job completion time, job throughput and load balance of cloud system. Therefore, we can use cross elimination method for matching jobs resource requirements and VMs remaining resources and implementing the job deployment. So the cross elimination method is suitable for batch job deployment.

3.4 Performance Analysis of the Algorithm

When calculating the IPCs of VMs, the share value of each VM should be known. Assuming that the share is average value, if there are 5 VMs on a server, and the share value of each VM is 1/5 = 0.2. We compared the algorithms from two aspects.

  1. (1)

    The comparison of the overall job completion time. In CloudSim, We set the task instruction length (MI) to 2000 and MIPS of each VM to 1000, we use three groups of data for experiment. The range of instruction length of first group is (0, 5000], the second group is [5000, 10000], and the third group is [10000, 15000], the results of three groups are shown in Figs. 3, 4 and 5 respectively.

    Fig. 3.
    figure 3

    Overall completion time of jobs (job length ϵ (0, 5000))

    Fig. 4.
    figure 4

    Overall completion time of jobs (job length ϵ (5000, 10000))

    Fig. 5.
    figure 5

    Overall completion time of jobs (job length ϵ (10000, 15000))

We can see from the figures that when the number of jobs is 3500 and the range of instruction length is (0, 5000], the overall time of the improved potential capacity (IPC) algorithm is reduced by about 500 ms; when the range of job length is [5000, 15000], the overall time of IPC algorithm is reduced by about 200 ms.We can concluded that IPC algorithm performed better than the greedy algorithm and CloudSim polling strategy as for the overall job completion time, especially when the range of job length is (0, 5000]. That is because PC-U of VMs can dynamically represent the resource remaining capacity of the VMs, when the length of a job is shorter, PC-U can better meet the resource requirements of jobs.

  1. (2)

    The comparison of the load balancing. To verify the load balance of computing system under different algorithms, the load balance degree is introduced to measure the load balance level of computing system, which is calculated by using the variance of each server at one time point, the calculation is as formula (6).

$$ LoadDegree = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {D_{i} } $$
(6)

Where \( {\text{D}}_{\text{i}} \) is the CPU load variance of all servers in computing system at the i-th time point, n is the times of collecting data for each server load on the cloud platform. The higher of LoadDegree means the worse of the load balancing, and the lower of LoadDegree means the better of load balancing. In the experiment, we set the number of time point is 10, the time points are uniformly distributed throughout the process of the experiments. When the total number of jobs submitted by user is 500 and job length is [5000, 10000], the results of three algorithms is shown as Fig. 6. When the total number of jobs is 1500 and job length is [5000, 10000] the results is show as Figs. 7 and 8 represents the results that the total number of jobs is 2500 and job length is [5000, 10000]. The results of Figs. 6, 7 and 8 show that the proposed algorithm IPC achieves better load balancing than any other algorithm.

Fig. 6.
figure 6

Load balance of cloud platform (the number of jobs = 500)

Fig. 7.
figure 7

Load balance of cloud platform (the number of jobs = 1500)

Fig. 8.
figure 8

Load balance of cloud platform (the number of jobs = 2500)

From the above two aspects, we can get a conclusion that the proposed resource scheduling algorithm IPC can decrease the overall job completion time and has better load balancing than the other algorithm.

4 Conclusions

In this paper, we deeply study the resource schedule algorithms for VMs, and propose an improved potential capacity (IPC) based resource schedule algorithm. The algorithm comprehensively considers the overall job completion time and system load balancing, and uses the IPC to dynamically compute the resource remaining capacities of VMs, so it can avoid the inexact matching between the jobs and the server resources, and greatly reduce the job deployment time and improve the load balancing of the system. The algorithm is compared with the greedy algorithm and the polling strategy based on CloudSim simulation software, the experimental results show that the proposed algorithm has a better performance in reducing overall job completion time and improving the load balance of cloud platform. Next we will further improve the performance of the proposed algorithm and apply it to real cloud systems for further test.