1 Introduction

Many devices generate large volumes of raw data, and this data has the potential to yield information that changes the world. In turning raw data into useful information, prediction is becoming increasingly important. Several methods exist for predicting future values, and one of the best-known prediction models developed from time series analysis is ARIMA (Auto Regressive Integrated Moving Average) [1, 2].

Cloud computing, which relies on virtualization technology, is one of the most popular techniques for handling the generated big data [3]. Because some users found cloud services hard to use directly, Cloud Service Brokers (CSBs) emerged, and Cloud Service Providers (CSPs), of which Amazon EC2 [4] is a representative example, introduced the Reserved VM (RVM) service for users and CSBs. In a CSB system, it is beneficial to merge a prediction scheme into the RVM reservation policy, because CSBs hold Service Level Agreements (SLAs) with both CSPs and Cloud Service Clients (CSCs). Every prediction carries some error, however, and a way to cover this error is needed, although it is hard to provide.

For the prediction mechanism, Kim et al. [5] proposed a scheme for setting the proper number of RVMs to be leased on the CSB’s side. Their scheme, called C-VMR (VM reservation scheme), adaptively chooses the number of RVMs to lease based on demand predicted with ARIMA, and its algorithm provides the basic idea for our step 1 algorithm in Sect. 2. Shumway et al. [6] describe the ARIMA modeling method with application examples in R. To reduce the effect of the prediction error, we adopt the VM replacement concept as our Replacement policy. It was suggested in abstract form by Kang et al. [7], who proposed the A3R (Recycle, Replacement, Reposition) algorithm, which focuses on how to broker VMs cost-efficiently in cloud computing services. The VM replacement scheme offers only a concept: when no RVM matching the user-requested RVM is available to supply, an RVM with a larger capacity than the demanded one can be leased instead.

In this paper, we integrate the ARIMA prediction model into the RVM reservation policy. We also use a replacement scheme based on Kang et al. [7] to cover the error that occurs in the demand for RVMs. Simply put, in the broker system, if no RVM of the requested type is available to lease to a user but RVMs with larger capacities than the requested one are available, the CSB leases the larger ones to the user to obtain more benefit. The rest of the paper is organized as follows. Section 2 introduces the prediction-based RVM reservation policy and the applied RVM Replacement method. Section 3 presents the experiments and their evaluation. Lastly, Sect. 4 concludes the paper.

2 VM Reservation with Prediction and Job Allocation

2.1 Problem Statement

From a commercial point of view, providers who can predict future demand from a collection of historical data gain a benefit. The same concept applies to CSBs in cloud computing, where the prediction of VM requests is becoming important. Figure 1(a) shows the relationship of SLAs in a cloud computing environment. The SLA contract between clients and brokers needs information such as the deadline and budget of the users. With this data, brokers can provision their resources more easily. If resource demand varies as depicted in Fig. 1(b), over-provisioning wastes cost and under-provisioning violates the SLA. The uncertainty of resource demand is therefore an inevitable problem, and reserving the proper number of resources by predicting the demand is hard.

Fig. 1. CSB constraints: (a) SLA relationship of cloud broker between consumers and providers, (b) Demand variation causing QoS problem

To resolve the problem, many prediction models [8] can be considered, and Kim et al. [5] proposed the C-VMR method, which reserves VMs based on an ARIMA model. Kim’s approach is based on Eq. (1), and we try to improve its operation scheme. We consider a different prediction model and a different treatment of the predicted values, among other changes, and thus expect better performance in VM reservation. Applying Eq. (1) directly to VM reservation can cause over-provisioning or under-provisioning, because extreme values such as \( \max D_{p}(t) \) and \( \min D_{p}(t) \) within each \( T_{p} \) pull the mean away from the typical demand and mislead the number of RVMs to lease.

$$ n_{RVM}^{l} \left( {\text{t}} \right) = \left\lfloor { \frac{1}{{T_{p} }}\mathop \sum \limits_{k = t}^{{t + T_{p} - 1}} D_{p} \left( k \right) - n_{RVM}^{e} \left( t \right)} \right\rfloor $$
(1)
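The sensitivity of Eq. (1) to extreme values can be illustrated with a minimal Python sketch. It reads \( n_{RVM}^{l}(t) \) as the number of RVMs to lease, \( D_{p}(k) \) as the predicted demand, and \( n_{RVM}^{e}(t) \) as the number of existing RVMs, by analogy with the symbol definitions given for Eqs. (2) to (5). The function name and the demand values are hypothetical and chosen only for illustration.

```python
import math


def cvmr_rvms_to_lease(predicted_demand, existing_rvms):
    """Eq. (1): floor of (mean predicted demand over T_p minus existing RVMs)."""
    t_p = len(predicted_demand)  # prediction period T_p, in time slots
    mean_demand = sum(predicted_demand) / t_p
    return math.floor(mean_demand - existing_rvms)


# Hypothetical predicted demand over T_p = 7 slots (not data from the paper).
steady = [10, 11, 9, 10, 10, 11, 9]
spiky = [10, 11, 9, 10, 45, 11, 9]  # same series with a single outlier

print(cvmr_rvms_to_lease(steady, existing_rvms=6))  # 4
print(cvmr_rvms_to_lease(spiky, existing_rvms=6))   # 9
```

With these made-up numbers, a single outlier more than doubles the number of RVMs to lease, which is the over-provisioning risk described above.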

Kang et al. [7] proposed a cost-efficient VM brokering method. One of the schemes they suggested is the replacement algorithm, whose concept is that a request for a smaller-capacity VM can be served by borrowing a larger-capacity RVM. We can improve this method in two ways. First, the A3R scheme targets situations in which prediction is not applicable; however, if it is used when VM allocation proceeds together with a prediction method, the replacement algorithm acts as an error controller that covers the difference between the real VM demand and the reserved VMs. Second, they described the scheme only abstractly and did not demonstrate its applicability under a cost policy, so it needs to be made concrete as an algorithm with a specific form.

2.2 A Proposed Model Description

Figure 2 shows the proposed model for overcoming the problems discussed in Sect. 2.1. Below the dotted line, historical demand serves as the input to the prediction model, and the module generates the predicted future demand. Based on the predicted number of VMs to lease, the heuristic algorithm that we propose helps the CSB reserve VMs. The result of VM reservation affects the broker’s VM pool, and the cost can be calculated from the numbers of RVMs and on-demand VMs (OVMs) that the broker holds. In Fig. 2, when a user request arrives at the broker, it needs to be allocated. The C-VMA scheme (VM Allocation Scheme) [5] is applied for task allocation first, and then the replacement algorithm works. In this process, the VM allocation module tells the broker how many OVMs to lease; when more OVMs are needed, the broker first checks the VM pool for a replaceable RVM. If a replaceable RVM exists, the VM replacement method is used and no OVM leasing cost is incurred. The entire process represents VM allocation, and it can be used to check utilization performance. This model is thus the overall conceptual diagram of how the proposed scheme works; the specific algorithms and formulas are explained in the rest of this section.

Fig. 2. The proposed model for VM reservation using M-C-VMR (Modified C-VMR) and task allocation based on M-C-VMA (Modified C-VMA)

The prediction scheme in this paper is based on C-VMR [5] and uses the predicted demand from time t over the prediction period \( T_{p} \). Here, \( n_{{RVM_{\alpha } }}^{l} ({\text{t}}) \) denotes the number of RVMs to be leased at time t, \( D_{\alpha } (t) \) is the predicted demand at time t, and \( \alpha \) is the RVM type, which takes a value in {S, M, L}. \( n_{{DRVM_{\alpha } }}^{e} (t) \) is the predicted number of demanded RVMs in the VM pool at time t. \( n_{{ADRVM_{\alpha } }}^{e} (t) \) is the number of actually demanded VMs in the VM pool at time t. \( n_{{RVM_{\alpha } }}^{e} ({\text{t}}) \) is the number of RVMs in the VM pool at time t. \( opt(n_{{RRVM_{\beta } }}^{e} \left( t \right)) \) is the optimized number of Replaceable RVMs in the VM pool at time t; because we focus on applying demand prediction to the original VM Replacement scheme, we assume that this term is negligible in this experiment.

Equation (2) gives the average demand during \( T_{p} \); removing the maximum and minimum predicted demands before averaging yields a mean that better fits the demand over \( T_{p} \). Equation (3) gives the number of RVMs to lease based on the result of Eq. (2). \( \alpha \) is a VM type that ranges over all VM types, and \( \beta \) is the VM type used to designate Replaceable RVMs (RRVMs) for the Replacement policy. The number of RRVMs of type \( \beta \) is calculated with Eq. (4). With the RRVM concept, Eq. (5) represents the maximum number of RVMs that can be leased to users. The VM reservation equations are as follows.

$$ n_{{DRVM_{\alpha } }}^{e} \left( t \right) = \frac{1}{{T_{p} - 2}}\left\{ \sum\limits_{k = t}^{{t + T_{p} - 1}} D_{\alpha } \left( k \right) - \max\limits_{{k \in [t,\, t + T_{p} - 1]}} D_{\alpha } \left( k \right) - \min\limits_{{k \in [t,\, t + T_{p} - 1]}} D_{\alpha } \left( k \right) \right\} $$
(2)
$$ n_{{RVM_{\alpha } }}^{l} \left( {\text{t}} \right) = \left\lfloor {n_{{DRVM_{\alpha } }}^{e} (t) - n_{{RVM_{\alpha } }}^{e} \left( t \right) - {\text{opt}}(n_{{RRVM_{\beta } }}^{e} (t))} \right\rfloor $$
(3)
$$ n_{{RRVM_{\beta } }}^{e} \left( t \right) = n_{{RVM_{\beta } }}^{e} \left( t \right) - n_{{ADRVM_{\beta } }}^{e} \left( t \right) $$
(4)
$$ \hbox{max} (n_{{RVM_{\alpha } }}^{e} \left( t \right)) = n_{{RVM_{\alpha } }}^{e} \left( t \right) + \mathop \sum \limits_{\beta > \alpha } n_{{RRVM_{\beta } }}^{e} \left( t \right) $$
(5)
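Equations (2) to (5) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the ordering of the types S < M < L by capacity, and all numeric values are assumptions, and the opt(·) term defaults to zero because the text treats it as negligible.

```python
import math

# Assumed capacity order of the RVM types alpha in {S, M, L}.
TYPES = ["S", "M", "L"]


def trimmed_mean_demand(predicted):
    """Eq. (2): mean of the predicted demand over T_p without its max and min."""
    t_p = len(predicted)
    if t_p <= 2:
        raise ValueError("T_p must exceed 2 so that T_p - 2 is positive")
    return (sum(predicted) - max(predicted) - min(predicted)) / (t_p - 2)


def rvms_to_lease(predicted, existing_rvms, opt_replaceable=0):
    """Eq. (3): floor(trimmed mean - existing RVMs - opt(replaceable RVMs))."""
    return math.floor(
        trimmed_mean_demand(predicted) - existing_rvms - opt_replaceable
    )


def replaceable_rvms(existing_rvms, actual_demand):
    """Eq. (4): RVMs of type beta that are not actually demanded.

    The source does not say how a negative value is handled, so none is applied.
    """
    return existing_rvms - actual_demand


def max_leasable(vm_type, existing, actual):
    """Eq. (5): own RVMs plus the replaceable RVMs of every larger type."""
    larger_types = TYPES[TYPES.index(vm_type) + 1:]
    return existing[vm_type] + sum(
        replaceable_rvms(existing[b], actual[b]) for b in larger_types
    )


# Hypothetical values for illustration only.
spiky = [10, 11, 9, 10, 45, 11, 9]
existing = {"S": 5, "M": 8, "L": 6}
actual = {"S": 7, "M": 6, "L": 4}

print(rvms_to_lease(spiky, existing_rvms=6))  # 4
print(max_leasable("S", existing, actual))    # 9
```

In this example the outlier 45 is discarded by Eq. (2), so the reservation follows the typical demand, and the small type can draw on two idle medium and two idle large RVMs through Eq. (5).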

In step 1 of Algorithm 1, we set the prediction method to obtain the RVM demand for each type. From the historical data, we estimate the future demand for all RVM types. The output tells the broker the approximate demand and how many RVMs to lease. Step 2 actually generates the RVMs according to the result of step 1. \( n_{{RVM_{\alpha } }}^{e} (t) \) is the number of RVMs to maintain according to the prediction, and NG(t) is the number of RVMs to be newly generated (Fig. 3).

Fig. 3. Prediction-based VM reservation scheme

Algorithm 2 makes up the difference between the actual demand and the predicted demand that arises in Algorithm 1 because of the RVM reservation error of the prediction method. It runs during VM allocation. We took C-VMA and modified it with the RVM Replacement method, which had been suggested only abstractly. When the CSB allocates a user-requested task to one of its VMs, it first searches the OVM pool for an OVM whose residual time minus the predicted application execution time is larger than \( \delta \). If such an OVM exists, it is used for the task. If no OVM satisfies the condition, the CSB searches for an RVM to use. Lastly, when no RVM is available, the replacement step is executed and an RVM with a larger capacity than the requested RVM is used. If no VM is found even after all these searches, the CSB leases an OVM from one of the CSPs (Fig. 4).

Fig. 4. Task allocation using replacement scheme
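The search order of Algorithm 2 can be sketched as below. This is a simplified illustration under stated assumptions rather than the algorithm of Fig. 4 itself: the `VM` class, the `allocate` function, the requirement that an OVM match the requested type, and the choice of the smallest larger RVM in the replacement step are all hypothetical details that the text does not specify.

```python
from dataclasses import dataclass

# Assumed capacity ranking of the VM types.
CAPACITY_RANK = {"S": 0, "M": 1, "L": 2}


@dataclass
class VM:
    kind: str                    # "OVM" or "RVM"
    vm_type: str                 # "S", "M" or "L"
    busy: bool = False
    residual_time: float = 0.0   # remaining lease time (used for OVMs)


def allocate(task_type, exec_time, pool, delta, ovm_lease_time):
    """Return (vm, how) following the OVM -> RVM -> replacement -> new OVM order."""
    def idle(kind):
        return [vm for vm in pool if vm.kind == kind and not vm.busy]

    # 1. An already leased OVM whose residual time minus the predicted
    #    execution time exceeds delta.
    for vm in idle("OVM"):
        if vm.vm_type == task_type and vm.residual_time - exec_time > delta:
            vm.busy = True
            return vm, "ovm"

    # 2. An idle RVM of the requested type.
    for vm in idle("RVM"):
        if vm.vm_type == task_type:
            vm.busy = True
            return vm, "rvm"

    # 3. Replacement: an idle RVM with larger capacity (smallest one first,
    #    which is an assumption of this sketch).
    larger = [vm for vm in idle("RVM")
              if CAPACITY_RANK[vm.vm_type] > CAPACITY_RANK[task_type]]
    if larger:
        vm = min(larger, key=lambda v: CAPACITY_RANK[v.vm_type])
        vm.busy = True
        return vm, "replacement"

    # 4. Nothing fits: lease a new OVM from a CSP.
    vm = VM("OVM", task_type, busy=True, residual_time=ovm_lease_time)
    pool.append(vm)
    return vm, "new_ovm"


pool = [VM("RVM", "S", busy=True), VM("RVM", "M"), VM("RVM", "L")]
vm, how = allocate("S", exec_time=60, pool=pool, delta=10, ovm_lease_time=300)
print(how, vm.vm_type)  # replacement M
```

Only the fourth branch incurs an OVM leasing cost, which is why the replacement branch can absorb part of the reservation error.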

3 Performance Evaluation

3.1 Prediction-Based VM Reservation with M-C-VMR

In this part, we first generated the demand of CSCs for leasing VMs of each type from the CSB over 4 years. The generated demand is measured as the number of user VM requests per day. From this demand, we calculated the average number of VM requests per week, and from that data we obtained the predicted average number of VM requests per week. To do this, we used R [9] with the astsa package and an ARIMA model for prediction. The graphs in Fig. 5 describe this procedure for obtaining the predicted demand.

Fig. 5. Demand prediction process using ARIMA model
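The aggregation from daily requests to weekly averages, which precedes the ARIMA fit, can be sketched as follows. The ARIMA fitting itself was done in R and is not reproduced here. The function name is hypothetical, and dropping a trailing incomplete week is an assumption of this sketch.

```python
def weekly_averages(daily_requests, days_per_week=7):
    """Average number of VM requests per day within each complete week."""
    full_weeks = len(daily_requests) // days_per_week
    return [
        sum(daily_requests[w * days_per_week:(w + 1) * days_per_week])
        / days_per_week
        for w in range(full_weeks)
    ]


# Hypothetical daily request counts: two full weeks plus two extra days.
daily = [7] * 7 + [14] * 7 + [1, 2]
print(weekly_averages(daily))  # [7.0, 14.0]
```

The resulting weekly series is the kind of input that would then be passed to the prediction model.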

We used the cost policy of GoGrid [10], which is shown in Table 1. The minimum lease time of an OVM (\( MT_{O} \)) is an hour, and the minimum lease time of an RVM (\( MT_{R} \)) is a month. We assume that the broker initially holds small, medium, and large RVMs leased from CSPs. As shown in Fig. 6, we varied the initial number of VMs of each type; for example, 70/70/70 denotes the numbers of small/medium/large VMs that the broker holds. These settings are on the horizontal axis, and the vertical axis is the actual total cost that the broker needs to pay. The total cost is calculated with Eq. (6). The ratio \( (\delta ) \) satisfies \( \frac{{MT_{O} \times MT_{O} }}{{MT_{R} }} \le \delta \le \frac{{MT_{O} \times MT_{R} }}{{MT_{R} }} \), and we set the OVM usage time to \( MT_{O} \) in this evaluation. This reflects the fact that an OVM is leased for a shorter term than an RVM. The OVM cost is counted only if an OVM leasing event happens.

Table 1. GoGrid cloud cost policy [10]

Fig. 6. Comparison of budget policy using M-C-VMR, M-C-VMA with others

$$ {\text{C}}_{TotalSum} = \mathop \sum \limits_{\alpha } \{ {\text{C}}_{{RVM_{\alpha } }}^{{MT_{R} }} \cdot n_{{RVM_{\alpha } }}^{e} \left( t \right) + {\text{C}}_{{OVM_{\alpha } }}^{{MT_{O} }} \cdot n_{{OVM_{\alpha } }}^{e} \left( t \right) \cdot \left\lfloor {\frac{{\delta \times MT_{R} }}{{MT_{O} }}} \right\rfloor \} $$
(6)
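Equation (6) and the bounds on \( \delta \) can be evaluated with a short Python sketch. The prices and VM counts below are hypothetical placeholders, not the GoGrid values of Table 1, and taking a month as 720 hours is an assumption. Exact fractions are used so that the floor in Eq. (6) is not affected by floating-point rounding.

```python
import math
from fractions import Fraction


def total_cost(rvm_price, ovm_price, n_rvm, n_ovm, delta, mt_r, mt_o):
    """Eq. (6): RVM cost plus OVM cost scaled by floor(delta * MT_R / MT_O)."""
    lower = Fraction(mt_o * mt_o, mt_r)  # MT_O * MT_O / MT_R
    upper = Fraction(mt_o)               # MT_O * MT_R / MT_R
    if not lower <= delta <= upper:
        raise ValueError("delta is outside the range stated in the text")
    ovm_units = math.floor(delta * mt_r / mt_o)
    return sum(
        rvm_price[a] * n_rvm[a] + ovm_price[a] * n_ovm[a] * ovm_units
        for a in rvm_price
    )


MT_O = 1     # minimum OVM lease time: one hour
MT_R = 720   # minimum RVM lease time: one month, assumed to be 720 hours

# Hypothetical prices per minimum lease time and hypothetical VM counts.
rvm_price = {"S": 100, "M": 200, "L": 400}
ovm_price = {"S": 1, "M": 2, "L": 4}
n_rvm = {"S": 70, "M": 70, "L": 70}
n_ovm = {"S": 3, "M": 0, "L": 1}

# Lower bound of delta: the OVM usage time equals MT_O, as in the evaluation.
delta = Fraction(MT_O * MT_O, MT_R)
print(total_cost(rvm_price, ovm_price, n_rvm, n_ovm, delta, MT_R, MT_O))  # 49007
```

At the upper bound \( \delta = MT_{O} \), the same inputs give 54040, since each OVM is then charged for 720 hourly units.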

The lines in the result graphs of Fig. 6 are as follows. The diamond line is the case in which only OVMs are used without any RVM. The circle line is the case in which RVMs are introduced, the initially leased number of RVMs is maintained, and, in an under-provisioning state, the CSB leases OVMs from the CSP to avoid breaking the SLA with its CSCs. The circle line reaches its minimum cost at the 90/90/90 setting, because about 90 VMs of each type is the proper number of RVMs to lease for this demand. The plus line is the C-VMR prediction model, and it shows an overall smaller cost than the circle line because prediction is introduced. The cross line is the proposed prediction concept, and its cost is about $200 to $1200 smaller than that of the plus line. The square line adds the replacement concept to the plus line and is cheaper than the plus line. Sometimes the cost trend goes up; this happens because, when large RVMs have been used in place of medium RVMs and large RVMs are then needed, the broker must lease OVMs from the CSP. However, when many more RVMs are reserved than the actual demand requires, the replacement algorithm is effective at reducing the cost. Lastly, the triangle line combines the proposed scheme with replacement. By excluding the points that may lie far from the average and by using the replacement scheme, it achieves the lowest cost (Fig. 6).

3.2 VM Utilization with M-C-VMA

Through experiments, we evaluated the performance of M-C-VMA in comparison with C-VMA. Our testbed was implemented as shown in Fig. 7. Each machine has two quad-core Intel(R) Xeon(R) E5620 CPUs with hyper-threading, 8 GB of main memory, and a 1,000 GB hard disk, and the machines are clustered to provide cloud services using OpenStack [11]. We prepared two VMs of three types (small, medium, large) for the RVM set and a VM of the same type for the OVM. We assume that users want to execute three different types of the Montage scientific application [12] in the cloud (m105-1.5, m106-1.7, m108-1.7) and that their requests, each requiring a specific type, arrive according to a Poisson process with a mean of 30 s. The minimum leasing time is set to 2 h for an RVM and 5 min for an OVM. These factors are scaled down relative to reality.

Fig. 7. The experiment environment for M-C-VMA performance evaluation
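The request arrivals described for this experiment can be simulated with a minimal sketch like the one below. It assumes that the stated mean of 30 s is the mean inter-arrival time, that each request picks one of the three Montage workloads uniformly at random, and that a 2 h horizon is simulated; none of these details is confirmed by the text, and the function name is hypothetical.

```python
import random

APPS = ["m105-1.5", "m106-1.7", "m108-1.7"]


def poisson_arrivals(mean_interarrival_s, horizon_s, seed=None):
    """Return (arrival_time_s, app) pairs for a Poisson arrival process.

    Inter-arrival times of a Poisson process are exponentially distributed
    with rate 1 / mean_interarrival_s.
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    while True:
        t += rng.expovariate(1.0 / mean_interarrival_s)
        if t > horizon_s:
            return arrivals
        arrivals.append((t, rng.choice(APPS)))


requests = poisson_arrivals(mean_interarrival_s=30, horizon_s=2 * 3600, seed=1)
print(len(requests), requests[0])
```

Over a 2 h horizon this yields roughly 240 requests on average, which could then be fed to an allocation routine one at a time.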

Figure 8 shows the VM utilization of M-C-VMA compared with that of C-VMA. The bar graph indicates that M-C-VMA achieves better VM utilization on the medium and large types, whereas C-VMA performs better only on the small type. This is because the proposed replacement scheme used in the task allocation process changes which VMs are used. The method manages the VMs that have larger capacities than the requested VM type. When no VM is left for a specific application, C-VMA needs to lease a new VM. M-C-VMA, however, first looks for a Replaceable VM and lets the application run on a larger VM, whose specification is more than sufficient for an application intended for a smaller VM. This is why the small-type VM utilization of the proposed scheme is lower than that of C-VMA: under the replacement scheme, the small-type VMs have fewer chances to be utilized. The proposed algorithm improves two things. First, M-C-VMA does not need to lease an OVM from the beginning, so RVM utilization increases and the cost that would otherwise be wasted is saved. This explanation is supported by the cost comparison in Fig. 6. Second, the execution time can be decreased, because a task that would have been executed on a smaller type runs on a larger type.

Fig. 8. The comparison of average utilization in VM

4 Conclusion

In this paper, we proposed an adaptive prediction-based VM replacement scheme for the cloud to address the difficulty that CSBs face in managing their VM pools. To maintain RVMs cost-effectively, we proposed a prediction scheme, M-C-VMR, which combines a prediction model with a heuristic to obtain the number of RVMs to newly lease. In addition, M-C-VMA applies the replacement method in the VM allocation phase and offsets the extra cost caused by the prediction error. The cost evaluation showed that M-C-VMR and M-C-VMA can decrease the budget. The VM utilization evaluation showed that the proposed scheme reduces cost by increasing VM utilization and can reduce execution time by giving tasks a chance to run on a larger VM. Future work is to improve the proposed prediction method, for example with meta-heuristics, and to propose another scheme to cover the error due to the prediction.