Keywords

1 Introduction

Cloud computing provides an on-demand and scalable delivery model for the users [1]. It has been used to solve the complicated computation and storage problems by more and more governments, research institutions and industries [2] due to provide resources as a service to the users. Effective virtual resource scheduling can improve the utilization of resources and meet the needs of users. The virtual resource scheduling problem is considered to be a combinatorial optimization problem, but also a NP complete problem [3].

The workload of each virtual machine (VM) is always changing, and some may exhibit cyclical changes. We usually tend to over allocate virtual resources in order to ensure the application to have a better performance during the peak [4], which will inevitably lead to low utilization of resources. To operate and manage resources more conveniently, effective monitor is often used [5]. But there will be a traffic surge for quite a time after application is deployed and then decline gradually based on the monitoring. For example, in the school’s teaching information system, a few students will visit this website in ordinary time, but a huge amount of traffic will be generated when students need to select the course. Then the application system needs to extend resources to deal with the requests traffic. We usually monitor the system and warn system managers to deal with the lack of resources. But there is often a delay to schedule resources by monitor and warning.

To deal with the above challenges and to improve the utilization of virtual resources and flexibility of scheduling, we propose a method to dynamically schedule the virtual resources based on the Autoregressive Integrated Moving Average Model (ARIMA) prediction model. Resources will be allocated in advance on the basis of the prediction data. Our paper addresses the following problems:

Find a prediction model to accurately predict the possible application workload of the next time interval.

Dynamic scheduling of virtual machine resources according to the change of workload and prediction data to ensure more efficient use of resources.

The rest of our paper is organized as follows. Section 2 is an overview of the current state of the academic research on the virtual resource scheduling. Section 3 describes the system architecture. Section 4 focuses on the prediction model we will use. Section 5 describes the virtual machine scheduling strategy and the corresponding experiment details are illustrated in Sect. 6. Section 7 presents conclusions and future work.

2 Related Works

Since cloud computing uses virtualized resources, scheduling and resource allocation are the important research topics [6]. It is a hot research topic that how to use effective scheduling strategies to reduce the cost of execution and improve the utilization of resources.

Silpa et al. [6] discuss the current scheduling algorithm that has been published in cloud computing, in this paper, 15 different algorithms are studied and compared, such as fuzzy genetic algorithm based on task scheduling.

Some resource scheduling frameworks are put forward. Singh et al. [7] present an efficient cloud workload management framework under the premise of certain workload and quality of service. Shuja et al. [8] introduce a resource scheduling framework for efficient utilization.

Hassan et al. [9] present Nash bargaining to save resources and optimize the number of servers. Singh et al. [10] propose service quality indicators to optimize the execution time and avoid waste of resources. And some studies about resource optimization algorithms, such as the article [11, 12].

Zheng et al. [13] point out that virtual machine placement is also a way to optimize the resources, two kinds of virtual machine placement method, incremental placement and consolidated placement, are discussed and a new virtual machine placement strategy is proposed. Wang et al. [14] explore an energy and QoS-aware VM placement optimization approach based on particle swarm optimization.

Liu et al. [15] present an approach to process the dynamic user service requests more cost-effectively. Zhou et al. [16] introduce a dynamically adjust the virtual resource rental strategy to help cloud service providers maximizing profits.

There are also some researches on prediction methods. Salah et al. [17] use Markov chains to estimate the number of virtual machine instances that are required to be allocated under a given Service Level Object (SLO) standard. Shyama et al. [18] study a Bayesian model to predict resource needs of CPU and memory intensive applications in the short term and long term. However, the article [17] focus on load balance, and in [18] mainly focus on CPU cores and RAM, but they are not very suitable for our scenarios.

3 System Architecture

3.1 Architecture Description

The system architecture used in this paper showed in Fig. 1 includes the following layers User Layer, Application Layer, Control Layer, Virtual Layer, Physical Layer.

Fig. 1.
figure 1

System architecture diagram

User Layer: Users who use the application in cloud plateform.

Application Layer: It provides basic application environment, such as tomcat server, MySQL database, Hbase, etc.

Control Layer: It includes Load Balance Module, Data Collector Module, Data Predictor Module, Resource Scheduler Module.

Load Balancer Module: All requests submitted by the users will be forwarded through the Load Balancer to our web server. We use Nginx as a load balancing server in the system architecture.

Data Collector Module: This module is mainly to get the data through real-time monitoring. Some data will be collected by this module, such as CPU, memory, request quantity, etc., and then data will be pre processed. We store the requests per second to database in order to analyze historical data in the future. The module will calculate the response time for each request, which is ready for us to assess the application response time.

Data Predictor Module: The module uses the data collected in the Data Collector Module to calculate the number of requests to be reached at the next time interval of the application. The next time interval is the amount of user requests that the application will achieve after 5 min. On one hand, we mainly study the prediction of the application workload in the short term; On the other hand, the high frequency statistics and calculation will cause some pressure on our server. Generally each virtual machine can be completed in 30–96 s from creation to deployment [17], so the time is enough for us to deal with the coming pressure on the system. Of course, you can adjust the time interval.

Resource Scheduler Module: Using the Data Predictor Module to get the predicted requests of the application to decide to increase or reduce the virtual resource.

Virtual Layer: It supply virtual resource and composes a virtual resource pool.

Physical Layer: It is infrastructure and mainly includes the physical servers and the virtual machines deployed on the physical servers, which provides the underlying resources for the application.

4 Workload Prediction Methods

In this section we will discuss several methods for predicting the workload. We simplify the workload to the number of user requests in our scenario. But the requested quality of service has been taken into account, we assume that the response time of the request is satisfied in the range of 0 to 2 s, the maximum workload of the server is considered under such a condition. We mainly discuss three prediction methods: Moving Average method, Polynomial Fitting method, ARIMA Model method. We have also introduced the error to analyze the three methods which is more suitable for our scenario.

① Moving Average: Moving Average is one of widely known technical indicator used to predict the future data in time series analysis [19]. In statistics, a moving average is a calculation to analyze data points by creating series of averages of different subsets of the full data set. It includes simple moving average, the cumulative moving average and weighted moving average [20]. We use simple moving average in this paper.

We assume that the requests for a certain time of application is ri, then the sequence of requests can be expressed as R = {r1, r2 … ri … rn | n < T}, where T is our measurement time. According to the definition of the moving average, we can predict the amount requested at time n + 1 is:

$$ {\text{Pre}}{\_}{\text{R}}\text{ = }{{\left( {\text{r}_{\text{1}} \text{ + r}_{\text{2}} { + \ldots + }{\text{r}}_{\text{n}} } \right)} \mathord{\left/ {\vphantom {{\left( {\text{r}_{\text{1}} \text{ + r}_{\text{2}} { + \ldots + }{\text{r}}_{\text{n}} } \right)} \text{n}}} \right. \kern-0pt} \text{n}} . $$
(1)

Pre_R is the predicted value at time n + 1 in Eq. (1), n denotes the average movement cycle, r1 to rn are the first n values.

② Polynomial Fitting method: Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modelled as an nth degree polynomial [21]. Linear relationship is not a good description of the relationship between the amounts of application requests at different times, so the polynomial method is one aspect to consider.

We assume that the ti time corresponds to a request amount of ri, and (ti, ri) is a point on a two curve, namely:

$$ \text{r}_{\text{i}} \text{ = at}_{\text{i}}^{\text{2}} \text{ + bt}_{\text{i}} \text{ = c}\text{.} $$
(2)

Since it is the conic section, we only need to use three sets of values to be able to seek out a, b and c solution. If we predict ri+1, just need three points, that is, \( ({\text{t}}_{{{\text{i}} - 2}} ,{\text{ r}}_{{{\text{i}} - 2}} ),({\text{t}}_{{{\text{i}} - 1}} ,{\text{ r}}_{{{\text{i}} - 1}} ), \, ({\text{t}}_{\text{i}} ,{\text{ r}}_{\text{i}} ) \).

③ ARIMA Model: The ARIMA model is built based on Markov random process, which not only absorbs the dynamic advantages of the regression analysis, but also the advantages of moving average [22]. Non-seasonal ARIMA model use ARIMA (p, d, q) to express, wherein p, d, q are non-negative integers, p is the order of autoregressive model; d indicates the degree of differencing, q represents the moving average model order. Seasonal ARIMA model using ARIMA (p, d, q) (P, D, Q) m, m refers to the number of cycles per season, P, D, Q, respectively, refers to the autoregressive, moving average and differential [23].

5 Scheduling Algorithm and Implementation

5.1 VM Provisioning Algorithm and Implementation

We assume that a certain time point T, the requests for our application is R. We simplify the application workload to the user’s request. Our machine (Web Server) number is N in the current state. We adopt a polling workload allocation strategy. The number of requests for each web server shared by the load balancer is R/N.

We use Data Collector Module to collect raw data and pre-processing data. We use Tresponse to denote the response time for each request, and pass the request of the pre processed data to the ARIMA module. Each time the user requests for the application will be recorded in the database, we can calculate the amount of user requests of each interval, abbreviated Ri. We predict requests of the next time interval, assumed to be Spredict based on Ri. Spredict and Scurrent_max, the maximum workload that the virtual machines can bear under the current scale, will be passed to the Dynamic Scheduling Module. If Spredict is greater than Scurrent_max, then the DSM will find the right virtual machine template in physical servers to configure virtual resources. The way we are using is to randomly select a virtual machine that is providing services for the application and obtain its information (the using operation system, CPU kernel number, memory, etc.). Finding the information from the template in physical servers and here we assume that each virtual machine that has been used is created by a template in the physical machine server. In order to reduce the false positive rate, we can set the number of times to meet the judgment. We begin to configure virtual resources, when the times are more than a specified number of times. Assuming that each virtual machine created by virtual machine template can withstand the maximum workload of MHVMW. The difference between the predicted workload and the maximum workload that the current size of the virtual machines can withstand is the workload that we need to create virtual machines to support. We can roughly calculate the number of virtual machines that need to be created at a time combined with MHVMW, using Nneed to denote the calculated number. To determine the location of the virtual machine created, first of all, physical machines workload will be sorted from least to most, and we use the word “size” to denote the number of physical machines that meet the conditions for creating virtual machines, then taking the remainder of “size” from 0 to Nneed. The purpose of this is to create a virtual machine in a low workload physical machine. Finally, the strategy returns the positions of the virtual machine to be created.

Symbol definition:

LVMWi: Lowest Virtual Machine Workload, the minimum workload value allowed by each virtual machine, if the current workload is lower than the value, we will consider it to be an idle virtual machine.

HVMWi: Highest Virtual Machine Workload, the maximum workload value allowed by each virtual machine, if the current workload is higher than the value, we will believe that the quality of service provided can not meet the needs of users.

Spredict: The predicted workload value obtained from the prediction model.

Ncurrent: Number of current virtual machines.

Scurrent_max: The maximum workload that the virtual machines can bear under the current scale.

$$ {\text{S}}_{{{\text{current}}\_{ \hbox{max} }}} = \sum\limits_{i = 1}^{Ncurrent} {HVMW_{i} } $$
(3)

Scurent_min: The minimum workload required for the virtual machines at the current scale.

$$ {\text{S}}_{{{\text{curent}}\_{ \hbox{min} }}} = \sum\limits_{i = 1}^{Ncurrent} {LVMW_{i} } \,. $$
(4)

MHVMW: Model Highest Virtual Machine Workload, the virtual machine can be made into a template, this variable represents the maximum workload that it can take after the template is turned into a virtual machine.

PM: Physical Machine, PM = {PM1, PM2 … PMi | i < PMCount}.

VM: Virtual Machine, VM = {VM1, VM2 … VMj | j < VMCount}.

figure a
figure b
figure c

5.2 VM Recycling Algorithm and Implementation

It is obviously a waste of resources that virtual resources remain the largest scale after the peak of the traffic is over. VM recycling Algorithm will be enabled when the current workload is less than the specified minimum workload. We will carry out an ascending sort to workload of all virtual machines when the recycling algorithm is enabled. In order to reduce the false positive rate, we can also use the method that predicted times reach the number of count times specified to enable VM Recycling Algorithm. We will poll the VM workload and recycle the VM that is the minimum workload until the remaining one virtual machine to provide services.

6 Experimental Design and Results

6.1 Experimental Environment

In order to verify whether the prediction model and the resource scheduling algorithms are effective, we do some experiments in this part. The experimental environment used is: 2 LoadRunner servers, 1 load balance server, 4 application servers, 1 MySQL servers and 3 physical machines (Table 1).

Table 1. Configuration information

6.2 Experiment and Result Analysis of the Predict Methods

We deploy our application in two groups of two virtual machines, one group using the default policy, another group using the proposed strategy in this paper. Using 2 LoadRunner servers to simulate the request, and then using our prediction model to predict. We will show the number of requests for the site below and the curve drawing of the real value and the predictive value obtained by using the methods in the Sect. 4.

Fig. 2.
figure 2

Site request volume graph

Fig. 3.
figure 3

The real value and predictive value of MA

Fig. 4.
figure 4

The real value and predictive value of polynomial fitting

Fig. 5.
figure 5

The real value and predictive value of ARIMA

Fig. 6.
figure 6

Relative error curve of MA, Polynomial and ARIMA

Fig. 7.
figure 7

Default policy and VRDS comparison graph with failed requests

Fig. 8.
figure 8

Default policy and VRDS response time comparison graph

Fig. 9.
figure 9

Changes in the number of VMs under the default policy and VRDS

Figure 2 is the general trend of accepting requests from our website. The number of requests for the site is a gradual increase from less to more, after a certain period of time, the requests will gradually decline. This is the scenario we need to deal with in the project. Figure 3 uses moving average method to fit our scenarios, and the overall trend is well fitted, but we can find out from the graph, the fitting curve of this method is relatively backward, which cannot be very good to help us to predict the future trend of requests. Figure 4 method is very good at the request of the amount of the increase and decrease of the scene, but the volatility of prediction results is larger in the peak period of requests. In our scenario, the Fig. 5 method can be well fitted with both increasing and decreasing values. In order to observe the real value and the predicted value of the scene, we introduce the error analysis. Figure 6 shows the relative error of the three methods.

From Fig. 6, we can see that the relative error of ARIMA is relatively stable, and the error is the smallest of the three methods, we will use the model to predict in our scene.

6.3 Resource Scheduling Experiment Results and Analysis

We use our prediction model in the virtual machine scheduling algorithm. By prediction, it will inform our Virtual Resource Dynamic Scheduling (VRDS) algorithm when the user’s traffic continues to increase. And then our VRDS algorithm calculates the size of the required resources, select a reasonable resource scheduling and decide to create or recycle the virtual machine. In the default policy we give a fixed number of virtual machines, and in the VRDS strategy will base on the load situation to do resource scheduling. In Fig. 7, it shows the comparison of the number of requests for the users to access the web site under the two different strategies. VRDS strategy can effectively reduce the number of failed requests. It is assumed that the maximum response time for each user to request is 2 s. The response time of the user request is illustrated in the case of Fig. 8 with two strategies, which are continuously increasing with the number of requests on the website. VRDS strategy can be more close to the response time we set.

Figure 9 shows the change in the number of VMs, VRDS can create and recycle VMs in different stages.

In summary, in our scenario, the prediction model used in this article can be more fitting site requests constantly increasing amount of requests to the maximum amount and gradually decreasing after the scene. VRDS algorithm combined with the prediction model in this paper can effectively and timely response to the site of the high workload situation.

7 Conclusion and Future Work

With the advent of big data era, the data is growing geometrically. Our web site or application is likely to generate a huge surge in traffic because of sudden or hot events. Relying solely on the traditional way apparently is unable to cope with such pressure, and cloud computing brings us a new revolution. Deploying our applications in the cloud will help us to avoid the collapse of the application because of heavy workload. However, there are still many deficiencies in the dynamic scheduling of cloud resources.

In this paper, we proposed a method for dynamic scheduling of virtual resources based on prediction. The prediction will help us to make the decision to deal with the load too much earlier, and change the passive into the initiative. By actively calculating the size of the virtual resources that are needed to cope with the current workload and the decision to create a reasonable location for the virtual machine, we will be more rapid in response to the heavy workload of cloud applications and ensure that the application can easily cope with the massive use of access.

Of course, that we simplify the server workload to the user’s request for the application is not enough to completely express the actual situation of the workload, and the workload prediction method is still not fine enough, the scene is relatively simple. In future work we will consider more factors that are more close to the actual situation and simulate our experiments, and apply our algorithm to more practical scenarios.