1 Introduction

Cloud computing is a promising technology that delivers various virtualized resources, software, and platforms as services to its customers on a pay-per-use basis [1]. To provide high-performance cloud services for end-users, resource management in cloud DCs is of high importance [2, 3], as it can decrease energy consumption costs as well as CO2 emissions [4, 5]. In general, resource management schemes can be classified into reactive and proactive categories. In the reactive case, resource management is triggered when the workload increases/decreases to a predefined threshold [6]. However, given the boot time of VMs, the reactive method cannot deal with sudden workload bursts [7] and may result in service level agreement (SLA) violations. Proactive methods, on the other hand, solve this problem by predicting the future workload of the DC, recognizing possible resource usage patterns, and provisioning the required resources in advance. Consequently, with effective prediction, performance degradation can be deterred and idle resources can be reduced to further improve profit. However, proactive resource management is not a trivial process, and the variable workload of cloud-hosted services may lead to the following problems:

  • Under-provisioning: The application does not get enough resources to process all of its requests, which may cause SLA violations (SLAV).

  • Over-provisioning: More virtual resources are assigned to the application than needed, which incurs extra cost to the customer. However, a degree of over-provisioning is required to handle workload fluctuations.

  • Oscillation: A combination of the over-provisioning and under-provisioning problems, which happens as a result of repeated, opposing auto-scaling actions.

Consequently, accurate workload prediction is a crucial factor in conducting effective proactive resource management and allocating on-demand resources to user requests [8]. Thus, cloud resource management systems should, on the one hand, be able to allocate the desired virtual resources to prevent performance loss and, on the other hand, prevent resource wastage by de-allocating idle resources (auto-scaling) [9, 10]. To deal with these issues, as shown in Fig. 1, the cloud-hosted services should be monitored and their loads should be logged. This historical load data can then be processed and fed into a workload predictor to forecast the future load. For this purpose, various resources such as CPU, memory, network bandwidth, and even I/O operations can be employed in the prediction process. Using this information, resource management and auto-scaling schemes can scale the virtual resources up/down as needed [11]. In general, cloud resources can be scaled horizontally and vertically: in the horizontal case [12], more VMs are provisioned, as predicted, to deal with future loads, while in the vertical case, the resources of existing VMs are increased [7, 13, 14], although operating systems often do not allow such changes because of security risks [15,16,17,18].

Fig. 1 Elasticity using workload prediction

However, workload prediction in cloud computing is a challenging issue since, unlike HPC systems and Grid computing, cloud workloads have higher variance, are shorter and more interactive, and their average noise is almost 20 times that of grid computing. In addition, since cloud resources are shared by several users or tasks, they may suffer from fluctuations, and new workload patterns can continuously emerge. Besides, non-stationary workloads in cloud infrastructures, whose patterns change over time, make retraining of the prediction models more frequent and correspondingly increase the overhead. To solve these problems, and given the importance of accurate workload prediction in the effective resource management of cloud DCs, a significant deal of attention has been paid to load prediction using various mathematical models and machine learning-based prediction algorithms [8, 19,20,21,22,23,24]. This article presents a thorough investigation of the state-of-the-art workload forecasting schemes, their applied techniques, and the motivations behind them. It categorizes these schemes by their applied prediction method and describes how each framework tries to predict the future load and employ the results in resource management, auto-scaling, and scheduling. After an in-depth analysis of the literature, open research issues in this context are provided, which can lay the foundation for future studies.

To the best of our knowledge, this is the first article to carry out a comprehensive study on workload prediction schemes in the cloud computing context. The main contributions of this article are as follows:

  • The background knowledge and existing challenges concerning load prediction are presented.

  • A classification of the recently published load prediction schemes is conducted according to their applied prediction algorithm. Also, the main contributions of each load prediction scheme are summarized, and in each category of workload prediction schemes, the applied simulation factors, simulation environments, workloads, and predicted factors are listed and compared.

  • A critical discussion and a comprehensive comparison of the load prediction schemes are provided and their features are analyzed, which can be useful in determining future research areas.

  • Future research challenges and open problems in the load prediction context are illuminated.

The remainder of this article is organized as follows: the rest of Sect. 1 provides background concepts about load prediction, Sect. 2 presents the classification and overview of the literature, Sect. 3 provides the discussion and comparison results, and Sect. 4 presents the concluding remarks and open research directions. Table 1 specifies the abbreviations applied in the rest of this article.

Table 1 Abbreviations and Acronyms

1.1 Workload prediction

Generally, a workload can be defined as all input requests sent from online interactions of end-users with the cloud services or belonging to batch-processed jobs. This section presents the main challenges, advantages, and various details of workload prediction in cloud DCs.

1.2 Motivations and objectives

Using workload prediction, dynamic resource management and proactive auto-scaling can achieve several important objectives. For instance, an accurate forecast of the near-future workload has a direct effect on reducing the response time, SLAV, and the over-provisioning and under-provisioning problems. Effective handling of the workloads increases the scalability and throughput of the systems. Also, by preventing the over-provisioning of virtual resources, the power consumption of the cloud DCs, the cost, and the number of failed requests can be decreased, and customer satisfaction can be improved.

Figure 2 indicates the main steps of the auto-scaling process, which should be executed in the cloud environment to provide elasticity and deal with fluctuating workloads. As shown in this figure, these four steps, known as the MAPE loop, are monitoring, analysis, planning, and execution. In the monitoring step, the auto-scaler monitors the specified performance indicators to determine the need for scaling operations. In the analysis step, the auto-scaler determines whether it is necessary to perform scaling actions according to the monitored information.

Fig. 2 MAPE loop

To be more specific, the following issues should be considered in these steps:

  • Scaling timing: The auto-scaler should decide when to perform scaling actions. It can reactively/proactively provision or de-provision the resources.

  • Load prediction: If the auto-scaler is proactive, the load should be predicted accurately.

  • Adaptiveness to changes: The auto-scaler should handle the changes and timely adapt its model and tunings to the new situation.

  • Oscillation mitigation: Scaling oscillation happens when the auto-scaler performs opposite scaling actions within a short period of time. Since this problem causes high resource wastage and SLAV, it should be prevented.

The planning step estimates the total virtual resources that should be provisioned/de-provisioned in the next scaling action, subject to constraints such as monetary cost. To be more specific, the following operations are performed in this step:

  • Resource estimation: The planning step should be able to estimate how many resources are just enough to handle the current or incoming load. This is a difficult task, as the auto-scaler must determine the needed resources without being able to actually execute the scaling plan and observe the real application performance, and it has to take the specific application deployment model into account in this process.

  • Resource combination: To provision resources, the auto-scaler can use vertical scaling or horizontal scaling. If horizontal scaling is employed, since the CSPs offer various types of VMs, the auto-scaler can choose among them.

In the last step, the execution phase, the scaling plan is executed to provision or de-provision the decided resources.
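To make the loop concrete, the following minimal Python sketch wires the four MAPE steps together. The `monitor`, `predictor`, and `provision` callables, the thresholds, and the one-VM scaling step are hypothetical placeholders, not part of any surveyed scheme.

```python
import time

def mape_autoscaler(monitor, predictor, provision, interval=60,
                    scale_out_at=0.8, scale_in_at=0.3):
    """Minimal MAPE loop sketch: Monitor -> Analyze -> Plan -> Execute."""
    while True:
        utilization = monitor()              # Monitoring: sample the indicators
        predicted = predictor(utilization)   # Analysis: forecast the next interval
        if predicted > scale_out_at:         # Planning: decide the scaling action
            plan = +1                        # add one VM (horizontal scale-out)
        elif predicted < scale_in_at:
            plan = -1                        # release one idle VM (scale-in)
        else:
            plan = 0                         # dead band to mitigate oscillation
        if plan:
            provision(plan)                  # Execution: apply the scaling plan
        time.sleep(interval)
```

The dead band between the two thresholds is one simple way to address the oscillation-mitigation issue listed above.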

1.3 Challenges

Figure 3 depicts the main challenges of the workload prediction in cloud computing DCs, which can be elaborated as follows [8]:

Fig. 3 Workload prediction challenges

  • Adaptability: The prediction model should be adaptable to the behavior changes of the hosted applications and must learn the applications' dynamic behavior to decrease the prediction error. However, workload prediction schemes may fail when the workload data does not follow any specific distribution.

  • Proactivity: Since VM provisioning and migration are time-consuming, the prediction should be proactive. Thus, before a load burst occurs, the model should predict the future demand so that the resource manager has enough time to provide the appropriate resources.

  • Historic Data: An effective prediction model should investigate all resources and parameters that affect workload behavior. It should consider the correlation between resource patterns extracted from historical data, which can show the application behavior in various dimensions and help estimate the future behavior accurately. However, proactive resource management schemes suffer from the cold start problem, in which the historical workload data required to train the workload predictor is not yet available.

  • Complexity: To be efficient, time and space complexities of the prediction model should not be significant.

  • Data Granularity: The initial phase in designing the prediction model is to determine which resources should be monitored. Then, the length of the sampling intervals should be defined: coarse-grained, long-term sampling causes the model to lose the dynamism of the system, while fine-grained, short-term sampling increases the cost of data collection and processing. It may also include details that are not useful, and the model complexity increases to capture them.

  • Pattern Length: Choosing the pattern length is a challenging issue, and it should be selected so as to find the most popular patterns and the application behavior. In most prediction models, the pattern length is fixed and a sliding window is used to extract the patterns. An improper pattern length prevents the model from learning the specific patterns.

1.4 Workload type

Generally, cloud DC workloads consist of a collection of diverse applications and services, each with its own performance and resource requirements and bound by constraints specified in the form of SLAs.

Workloads can be classified according to their processing model, architectural structure, resource requirements, and non-functional requirements. Regarding the processing model, workloads can be online (interactive) or offline (batch), which have different behaviors, requirements, and impacts on the resource management policies. For instance, an interactive workload may consist of short tasks, while a batch one consists of resource-intensive, long tasks.

Also, cloud workloads can be classified according to their architectural structure, expressed in terms of the data flows and processing of each individual application. For example, multi-task applications can be structured by the pipeline, parallel, and hybrid models. Furthermore, regarding the amount of applied resources, workloads can be classified as I/O-intensive, compute-intensive, and bandwidth-sensitive. In general, network bandwidth is important for online interactive workloads, while storage and computing resources characterize batch workloads. Moreover, the resource requirements of some workloads may be stable, while, as shown in Fig. 4, others may exhibit specific temporal patterns such as periodic, bursting, growing, and on/off. These patterns typically depend on the intrinsic characteristics of the applications, as well as on the workload intensity; for example, a communication-intensive phase can be followed by a computation-intensive phase. Burstiness of the workload intensity in cloud DCs can increase resource demands and may have a negative impact on cloud performance.

Fig. 4 Workload type

1.5 Datasets

The workloads applied to evaluate the workload prediction approaches can be synthetic or real. Synthetic workloads are generated with workload generators, while real workloads can be obtained from benchmark datasets such as the Google cluster trace, the NASA dataset, etc., or must be retrieved from real cloud platforms. Various datasets and workloads are used to evaluate the workload prediction approaches. Figure 5a depicts a host load from the Google traces and Fig. 5b indicates a trace load from the AuverGrid dataset. The Google workload contains over 40 million task events at minute resolution across about 12,000 hosts over a 1-month period in 2011. These traces specify the resource and scheduling information of each task, such as scheduling class, event type, resource request, priority, resource usage rate, etc. The host load at a given time point is the total load of all tasks running on that host. Workload prediction schemes often conduct seasonal and non-seasonal studies on the workload time series.

Fig. 5 A host load in two workload datasets

1.6 Evaluation factors

To evaluate the effectiveness of workload prediction and analyze its impact on resource management, the following metrics are used:

  • Accuracy: The prediction models are mainly evaluated by the accuracy of their predicted results; the model whose outputs are closest to the actual values is the best. The deviation or error metrics measure the difference between the real and the predicted behavior of the application; prediction errors may result in problems such as under-provisioning and over-provisioning. Figure 6 indicates some of the basic prediction error metrics utilized in the evaluation of the workload prediction approaches (see the sketch after this list).

    Fig. 6 Prediction error metrics

  • Cost: Prediction errors can lead to SLAV and low resource utilization. The cost metrics are employed to measure the cost resulting from the prediction error.

  • Success: Success metrics specify how well the prediction method is able to forecast the future behavior of the application. The success rate is defined as the ratio of the number of accurate estimations to the total number of estimations, where an estimation is considered accurate if it falls within some delta of the actual value.

  • Profit: Profit metrics are applied to compute the profit of the CSP, calculated according to the revenue obtained from renting out the resources while preventing SLAV and resource wastage.
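As a companion to Fig. 6, the following sketch computes four of the basic deviation metrics (MAE, MSE, RMSE, MAPE) with NumPy; the sample values are illustrative only.

```python
import numpy as np

def error_metrics(actual, predicted):
    """Basic deviation metrics used to evaluate workload predictors."""
    actual = np.asarray(actual, dtype=float)
    err = actual - np.asarray(predicted, dtype=float)
    return {
        "MAE":  np.mean(np.abs(err)),                 # mean absolute error
        "MSE":  np.mean(err ** 2),                    # mean squared error
        "RMSE": np.sqrt(np.mean(err ** 2)),           # root mean squared error
        "MAPE": 100 * np.mean(np.abs(err / actual)),  # mean absolute % error
    }

print(error_metrics([10, 12, 15], [11, 12, 13]))
```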

2 Load prediction schemes in cloud computing

A number of workload prediction schemes such as [25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] have appeared in the literature. This section presents a classification of the proposed workload prediction frameworks and describes their main contributions and their utilized techniques for cloud workload prediction. Figure 7 depicts the classification of the workload prediction frameworks in the cloud computing environment according to their applied algorithms in the forecasting process. To be more specific, this section highlights the following issues about the investigated forecasting schemes:

Fig. 7 Taxonomy of the load prediction schemes in cloud computing

  • What are the main contributions of each workload prediction scheme?

  • Which prediction algorithms are used to forecast the workload accurately?

  • Which workload datasets are applied in each forecasting scheme?

  • Which environments are used to evaluate each workload prediction scheme?

  • Which evaluation factors are applied to assess the accuracy and effectiveness of each load forecasting scheme?

  • Which resources are predicted by each scheme to recognize the incurred workload?

2.1 Regression-based schemes

This subsection reviews the regression-based workload prediction frameworks [51] designed for various cloud environments.

In [52], Antonescu et al. presented two predictive SLA-aware VM-scaling algorithms for dEIS systems, aimed at finding better scaling conditions for distributed applications under SLA constraints using models derived from constant-load benchmarks. They used autoregressive, predictive SLA-aware scaling to guarantee performance in distributed cloud applications. As an advantage, the authors provide a comprehensive evaluation of their work with respect to various metrics such as RMSD, execution time, number of VMs, and so on.

In [53], Yang et al. presented a linear regression model to estimate the load and applied it in an auto-scaling mechanism that scales virtual resources through real-time scaling and pre-scaling. They formulated the pre-scaling using integer programming and introduced a greedy method for accurate forecasting that incurs a lower cost and fewer SLAV.
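To illustrate the general regression-based approach (not the specific algorithm of [52] or [53]), a minimal sliding-window linear predictor can be sketched with scikit-learn; the synthetic CPU trace and the window size of 12 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_lr_predictor(load, window=12):
    """Fit a linear model mapping the last `window` samples to the next one."""
    X = np.array([load[i:i + window] for i in range(len(load) - window)])
    y = load[window:]
    return LinearRegression().fit(X, y)

cpu = 50 + 30 * np.sin(np.linspace(0, 20, 300))     # stand-in CPU trace
model = fit_lr_predictor(cpu)
print(model.predict(cpu[-12:].reshape(1, -1)))      # one-step-ahead forecast
```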

2.1.1 ARIMA-based schemes

This subsection reviews the ARIMA-based load prediction frameworks such as [54, 55]. In [56], Li et al. presented ARIMA-DEC, a load prediction-based VM provisioning technique. This scheme employs an ARIMA-based load predictor with dynamic error compensation and applies it in TBAMP, a time-based cost-aware provisioning algorithm. ARIMA-DEC can reduce the SLA default rate, and TBAMP can save rental cost by taking the cost of both adjusted and released VMs into account.

In [57], Kumar et al. tried to conduct a better forecast of the load to reduce the power cost. They compared the forecast performance of ARIMA, SARIMA (seasonal ARIMA), and ARFIMA (fractionally integrated ARIMA) with the singular spectrum analysis method using CPU, RAM, and network traces collected from the Wikimedia Grid. They showed that increasing the input size does not necessarily provide better forecasting results, while the ARFIMA model suffers from high computation time as the input size increases.

In [58], Calheiros et al. provided a proactive approach for dynamic resource provisioning based on forecasts produced by an ARIMA model. It applies a load analyzer component that provides its estimations to the other components to enable them to properly scale the resources. However, because of the limitations of the ARIMA model, this scheme is not able to predict peak resource consumption.

In [59], Messias et al. tried to predict the requests arriving in the next time period to prevent overloading. This problem becomes complicated when no historical data is available to be evaluated. They proposed a prediction approach using a GA to aggregate time series-based forecasting models. The authors conducted workload prediction using the ARMA and ARIMA methods. They also applied the Holt-Winters approach to capture seasonality, but they do not provide a cost model to be optimized.
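A minimal sketch of the ARIMA-based forecasting idea shared by these schemes, using statsmodels; the synthetic trace and the (2, 1, 2) order are illustrative assumptions, since each surveyed scheme selects its own orders.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

cpu = 50 + 10 * np.sin(np.linspace(0, 30, 400)) + np.random.randn(400)  # stand-in
model = ARIMA(cpu, order=(2, 1, 2)).fit()   # 2 AR terms, 1 difference, 2 MA terms
print(model.forecast(steps=6))              # CPU demand for the next 6 intervals
```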

2.1.2 Support vector regression-based schemes

This subsection reviews the support vector regression (SVR)-based load prediction frameworks, a number of which have been proposed in the workload prediction literature. For instance, in [60], Barati et al. provided TSVR, a tuned SVR-based approach that tunes three SVR parameters using the GA and PSO algorithms. It uses a chaotic sequence to improve prediction accuracy and prevents premature convergence by increasing the exploration and diversity in the search space. It also reduces the computational burden of generating random numbers in comparison to the GA. In addition, kernel-based methods are applied to forecast memory and CPU loads. They performed simulations using Google cloud traces. Nevertheless, TSVR takes a long time to tune the SVR parameters at the beginning of the algorithm.
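A minimal sketch of the SVR-based prediction idea (without the GA/PSO tuning of TSVR) can be written with scikit-learn; the hyperparameters C and epsilon, which TSVR-style schemes tune automatically, are fixed here as assumptions.

```python
import numpy as np
from sklearn.svm import SVR

def fit_svr_predictor(load, window=12):
    """RBF-kernel SVR mapping the last `window` readings to the next one."""
    X = np.array([load[i:i + window] for i in range(len(load) - window)])
    y = load[window:]
    return SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)

cpu = 0.5 + 0.3 * np.sin(np.linspace(0, 15, 300)) + 0.05 * np.random.randn(300)
svr = fit_svr_predictor(cpu)
print(svr.predict(cpu[-12:].reshape(1, -1)))   # one-step-ahead CPU forecast
```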

The work in [61] provided a decision-maker to handle VM migration by estimating the load and combining it with the predicted performance factors of the migration process. Thus, the migration can be started when the required resources are available and no performance degradation of the applications happens. Figure 8 depicts the architecture of load prediction in this scheme.

Fig. 8 Dynamic resource provisioning in [61]

Table 2 determines the datasets, simulation software, evaluation factors, and predicted factors applied in the evaluation of the regression-based schemes.

Table 2 Comparison of the regression-based load estimation solutions

2.2 Classifier-based schemes

This part of the paper discusses the load forecasting approaches which have applied various types of classifiers for workload prediction.

2.2.1 SVM-based schemes

This subsection reviews the SVM-based load prediction frameworks designed for various cloud environments. For instance, in [62], Tong et al. proposed a periodical coefficient feature and implemented several existing classification methods. Experiments on a real-world dataset validate the efficiency of the newly proposed feature: included in the most effective combinations of features, it boosts the success rate and decreases the MSE. The SVM method achieves nearly the same performance as the Bayes methods.

In [63], the authors presented WWSVM, a load prediction model using weighted wavelet SVM to estimate the PMs' load in the cloud DC. They used the wavelet transform as a kernel function in the SVM, assigned a weight to each sample according to its importance, and thereby enhanced the prediction accuracy. They applied the PSO algorithm for parameter optimization and used the Google dataset to verify their approach. As shown in Fig. 9, this scheme consists of data preprocessing and load prediction phases, in which the first phase performs workload normalization and autocorrelation analysis. To validate the performance of this load prediction scheme, experiments were conducted using the Google dataset, with CPU utilization chosen as the predicted factor.

Fig. 9 The block diagram of the prediction model in [63]

In [64], Nikravesh et al. tried to improve the prediction accuracy of auto-scaling using SVM and ANN classification. They indicated that the prediction accuracy of SVM and ANN depends on the load pattern: SVM provides better prediction accuracy with periodic and increasing load patterns, while ANN has better results in forecasting unpredictable load patterns. They evaluated this scheme using Amazon EC2.

2.2.2 Random forest-based schemes

In [65], Cetinski et al. provided AME-WPC, a model for workload forecasting in the DCs which improves the prediction accuracy. They handled load prediction using combined classification and regression methods and tested it with the random forest classifier. The architecture of this approach is depicted in Fig. 10. However, the events influencing workload fluctuations are not considered in this scheme.

Fig. 10 Load forecasting in [65]

2.2.3 Artificial neural network-based schemes

This subsection reviews the ANN-based workload estimation schemes [66,67,68,69,70] designed for cloud environments. For instance, in [71], Imam et al. employed a time-delay ANN and a regression method to forecast jitter in the load. This regression model fits the trace only moderately well, as evidenced by spline interpolation. Nevertheless, the analysis indicates that regression modeling techniques can be further improved when dealing with such traces.

The work provided in [72] introduced POSITING, a forecasting model which conducts sequential pattern mining, exploits the correlation between various resources, and finds the applications' behavioral patterns. The authors investigated the capabilities of online learning for POSITING to provide reliable results, although the basic model is not adaptable to load variations. As an advantage, this scheme considers the correlation between different resources and extracts the behavioral patterns of applications independently.

In [73], Kumar et al. proposed a load prediction model using an ANN and the DE algorithm, which is capable of learning the proper mutation method and crossover rate. The simulations were performed on HTTP traces provided by NASA. As an advantage, this scheme avoids the risk of being trapped in local optima. Figure 11 exhibits the ANN structure applied in this scheme.

Fig. 11 Load predictor model in [73]

In [74], Lu et al. introduce RVLBPNN, a load prediction model which uses the BPNN algorithm to exploit the relationships among the arriving loads. RVLBPNN improves prediction accuracy compared to the HMM and naive Bayes classifier-based models by a considerable margin. However, issues such as periodicity of the workload are not considered in this scheme.

In [75], Zhou et al. presented a solution for dynamic load balancing based on AHPGD and an HHGA-RBF ANN, which focuses on balancing the allocation of user request tasks in a cloud. This load prediction model uses a hybrid hierarchical GA and the recursive least-squares method to train the parameters of RBF ANNs. It is combined with the weighted round-robin algorithm and updates the weights of each node within each time period. Their algorithm comprises three modules: a node load information monitoring module, a load prediction module, and a request scheduling module. The architecture of this scheme is shown in Fig. 12.

Fig. 12 RBF neural network training by HHGA in [75]

Imam et al. also presented a resource allocation scheme to support the increasing need for VMs. They used time-delay ANN and regression techniques for load prediction and utilized real load traces for performance evaluation to show that a time-delay ANN can predict the load in a cloud environment.

2.2.4 Bayesian-based schemes

This subsection reviews the Bayesian-based load prediction frameworks [76,77,78,79] designed for various cloud environments.

In [80], Di et al. proposed a forecasting method based on the Bayes model to estimate the load over long-term intervals and the average load in future time intervals. They detected predictive features of the load to capture the predictability and host load patterns, and determined the most effective combinations of these features for prediction. As an advantage, this scheme can detect the mean load for the coming hours with high accuracy and low MSE, regardless of fluctuations.

In [81], Dietrich et al. provided a Least Mean Squares (LMS) linear predictor, a regression-based model for system parameter identification. Load fluctuation is estimated via a linear-in-parameters model. This formulation reduces the complexity of parameter estimation, as the LMS learns the parameters of the model iteratively as the game progresses. However, the LMS cannot always outperform a hand-tuned PID controller.

In [82], the authors tried to minimize content reorganization and tolerate imperfect workload prediction for cloud-based video-on-demand services. They presented a video-on-demand servicing system built on a pay-as-you-go cloud, proposed a load absorber, and designed a provisioning algorithm called Absorb Window. Load absorbers eliminate bandwidth wastage and reduce content reorganization. The architecture of this approach is depicted in Fig. 13.

Fig. 13 Control loop in [82]

2.2.5 Deep learning-based schemes

Deep learning approaches are suitable for long-term prediction of workloads, and their performance can be further improved by increasing the size of the training data and the depth of the model. A number of deep learning-based approaches have been proposed to forecast workload in cloud DCs. For instance, in [83], Patel et al. tried to find correlations among the workloads of VMs and used them to predict the workload of the related VMs accurately. In addition, they optimized the granularity of the training data, the activation functions, and the number of layers. The predicted workload information is used for VM management, and the chosen migration plan is transferred to the application provisioner, which receives the accepted user requests and applies a suitable VM placement strategy to map the VMs to PMs. They evaluated the effectiveness of their deep learning model using PlanetLab traces and showed that the LSTM can improve the performance of workload prediction, while a convolutional ANN gives low performance. The architecture of this approach is depicted in Fig. 14. Their model receives the CPU utilization of VMs as input and forecasts the CPU utilization in the future.

Fig. 14 Utilization-aware load forecasting in [83]

In [84], Gupta et al. applied multivariate LSTM models to forecast resource usage in cloud DCs. They used the Google cluster traces and evaluated the LSTM model and a bidirectional LSTM (BLSTM) model against fractional difference-based methods. They indicated that the LSTM model captures long-range dependencies in time series-based resource consumption data and produces better out-of-sample estimations. As an advantage, these multivariate extensions of the LSTM and BLSTM models generate better estimations than univariate ones.
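A minimal Keras sketch of the LSTM-based forecasting idea used by these schemes; the synthetic trace, window size, and network size are illustrative assumptions rather than the configurations of [83] or [84].

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, w):
    """Sliding-window samples: predict the next reading from the last w."""
    X = np.array([series[i:i + w] for i in range(len(series) - w)])
    return X[..., None], series[w:]      # LSTM wants (samples, timesteps, features)

cpu = np.random.rand(1000)               # stand-in CPU-utilization trace
X, y = make_windows(cpu, w=24)

model = Sequential([LSTM(32, input_shape=(24, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.predict(X[-1:], verbose=0))  # one-step-ahead forecast
```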

In [85], Zhang et al. introduced a deep learning model using canonical polyadic decomposition to forecast the cloud load. They used the deep learning model to learn the important features of the complex load data in VMs and applied the canonical polyadic decomposition to compress the parameters and enhance the training efficiency. Table 3 determines the datasets, environments, evaluation factors, and predicted factors applied in the classifier-based load forecasting schemes.

Table 3 Comparison of the classifier-based workload prediction schemes

2.3 Stochastic-based workload prediction schemes

This subsection addresses the stochastic prediction schemes designed to estimate various loads in the cloud DCs using stochastic models.

2.3.1 Markov model-based schemes

A Markov chain is a mathematical tool to model a system that transitions over time from one state to another according to certain probabilistic rules. Markov chains can be classified as discrete-time and continuous-time Markov chains. Also, based on the number of previous states considered when deciding the next state, they can be classified as first-order and high-order Markov chains. By definition, in a first-order Markov chain, each state only depends on its previous state, while in a high-order Markov chain, each state depends on several of its predecessors. Markov chain models have been successfully applied by various schemes such as [86, 87] to model workload prediction. This subsection reviews the Markov chain-based schemes such as [88]. For example, in [89], Pacheco et al. studied web load fluctuations to determine how to provision virtual resources under fluctuating traffic. They investigated Markovian arrival processes (MAPs) and the related M/M/1 queueing model for performance forecasting of the deployed servers. MAPs are a special type of Markov model applied as a compact description of the time-varying characteristics of loads. MAPs can capture heavy-tail distributions in HTTP traffic and can be applied within analytical queueing models to estimate system performance.
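A minimal sketch of first-order Markov-chain load prediction: discretize the utilization into states, estimate the transition matrix from the trace, and take the most probable next state. The number of states and the random trace are assumptions.

```python
import numpy as np

def transition_matrix(loads, n_states=4):
    """First-order Markov model over discretized load levels (loads in [0, 1))."""
    states = np.minimum((np.asarray(loads) * n_states).astype(int), n_states - 1)
    T = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        T[a, b] += 1                                  # count observed transitions
    T /= np.maximum(T.sum(axis=1, keepdims=True), 1)  # row-normalize to probabilities
    return T, states

cpu = np.random.rand(500)                # stand-in CPU utilization trace
T, states = transition_matrix(cpu)
next_state = np.argmax(T[states[-1]])    # most probable next load level
print("predicted next load level:", next_state)
```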

In [90], Shen et al. presented CloudScale, which automates resource scaling by using request prediction and prediction error handling. It deals with scaling conflicts using migration. They implemented CloudScale on top of the Xen hypervisor and conducted experiments using the RUBiS benchmark driven by real Web server traces. As an advantage, this scheme employs DVFS to mitigate energy usage while respecting the SLA.

2.3.2 Hidden Markov model-based schemes

The hidden Markov model (HMM) is one of the most widely applied statistical Markov modeling tools for discrete time series [91]. In contrast to the Markov chain models, where all states are visible, an HMM uses hidden states which are unobservable. The HMM can be used to predict the future state of a stochastic variable, and HMMs are also used for workload prediction. For example, in [92], Khan et al. provide a co-clustering solution to find groups of VMs that have correlated load patterns and activation periods. They introduced an HMM-based method to detect the temporal correlations in the VM clusters and to forecast fluctuations in their patterns.
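A minimal sketch of HMM-based load modeling using the hmmlearn library (an assumption; the surveyed schemes use their own implementations): fit a Gaussian HMM to a utilization trace, decode the hidden regimes, and read the most likely next regime off the transition matrix.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM   # pip install hmmlearn

cpu = np.random.rand(500, 1)           # stand-in CPU-utilization observations
model = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
model.fit(cpu)                          # learn hidden load regimes
states = model.predict(cpu)             # decode the regime sequence
next_state = np.argmax(model.transmat_[states[-1]])   # most likely next regime
print("expected next-regime mean load:", model.means_[next_state, 0])
```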

In [56], Xu et al. tried to forecast and categorize the short-term cloud load using an HMM-based clustering approach. The Bayesian information criterion and the Akaike information criterion are used to find the optimal HMM model size and number of clusters. The trained HMMs are applied to detect the cluster that most likely generated the current load, and with its data a GA-optimized Elman network is used to forecast the future load. Figure 15 depicts the block diagram of this forecasting scheme. However, they have not considered the correlation among the CPU, memory, and disk workloads.

Fig. 15 Block diagram of the forecasting process in [56]

2.3.3 Queuing model-based schemes

This subsection addresses schemes such as [93,94,95] which have used queueing models for workload prediction. For instance, in [96], Sahni et al. provided a heterogeneity-aware solution to handle dynamic loads and keep the required QoS level. It conducts estimation using online resource profiling and the workload history. It also provides the required resource configurations to achieve QoS at reduced cost and improved resource utilization. It captures the performance variation among the VMs and uses the request arrival pattern and the service rate to configure resources. However, this model only considers independent applications and does not support dependencies among the incoming requests.

The work in [97] provided a VM-level resource auto-scaling scheme for web applications which can forecast the requests and determine the optimal resource demand using queuing theory and multi-objective optimization. This scheme takes factors such as cost, latency, and SLAV into account in each time-unit re-assignment. They employed the Amazon cloud and evaluated their scheme using three real datasets.
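To illustrate how queueing models translate a predicted arrival rate into a resource estimate, the following sketch sizes an M/M/c system with the Erlang-C formula; this is a textbook construction, not the algorithm of [96] or [97], and the rates in the example are assumptions.

```python
import math

def min_servers_mmc(lam, mu, target_wait):
    """Smallest server count c for which the M/M/c (Erlang C) mean queueing
    delay stays below `target_wait`; lam = arrival rate, mu = service rate."""
    c = max(1, math.ceil(lam / mu))
    while True:
        rho = lam / (c * mu)                     # per-server utilization
        if rho < 1:
            a = lam / mu                         # offered load in Erlangs
            summ = sum(a ** k / math.factorial(k) for k in range(c))
            pc = (a ** c / math.factorial(c)) / (1 - rho)
            p_wait = pc / (summ + pc)            # Erlang C waiting probability
            wq = p_wait / (c * mu - lam)         # mean time spent queueing
            if wq <= target_wait:
                return c
        c += 1

# e.g., predicted 45 req/s, each VM serves 10 req/s, target mean wait 50 ms
print(min_servers_mmc(lam=45.0, mu=10.0, target_wait=0.05))
```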

Table 4 gives the datasets, simulation environments, and evaluation factors applied in the evaluation of the stochastic workload forecasting schemes.

Table 4 Comparison of the stochastic load prediction schemes

2.4 Grey predicting-based schemes

The scheme in [98] presented a load prediction approach that uses a grey prediction model to allocate VMs. The authors used the time-dependent load in the same period of each day and forecasted whether the VM load tendency is towards increasing or decreasing. They compared the forecasted value with the workload of the previous time period to decide which VM in the PM should be migrated to obtain a balanced workload and lower energy usage. Their experiments indicated that this scheme uses less data in the prediction process and can allocate VM resources in an energy-saving manner. The architecture of this approach is depicted in Fig. 16.

Fig. 16 Dynamic resource management in [98]
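A minimal sketch of the classical GM(1,1) grey prediction model that underlies such schemes (not necessarily the exact variant of [98]); the sample load values are illustrative.

```python
import numpy as np

def gm11_forecast(x0, steps=1):
    """Classical GM(1,1) grey model: fit on series x0, forecast `steps` ahead."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                           # accumulated generating operation
    z1 = 0.5 * (x1[1:] + x1[:-1])                # mean sequence of x1
    B = np.column_stack([-z1, np.ones(n - 1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]   # grey coefficients
    k = np.arange(n + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # whitened solution
    x0_hat = np.diff(x1_hat, prepend=0.0)              # inverse accumulation
    return x0_hat[n:]

load = [62.0, 58.0, 65.0, 71.0, 69.0, 74.0]      # stand-in VM load samples
print(gm11_forecast(load, steps=2))
```

One noted advantage of grey models is visible here: the fit needs only a handful of recent samples rather than a long training history.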

2.5 Autocorrelation clustering-based schemes

In [99], Kluge et al. employed autocorrelation clustering to predict the load of a periodic soft real-time application. Using this forecasting method, they tuned the processor performance to meet all deadlines. Nevertheless, they have not handled the numerical instabilities induced by the implicit rounding during the execution of the autocorrelation clustering algorithm.

2.6 Chaos-based schemes

In [100], Ardagna et al. applied capacity allocation techniques to coordinate multiple distributed resource controllers working in geographically distributed cloud sites. Capacity allocation solutions are integrated with a load redirection mechanism which forwards the incoming requests between various domains. The advantages include reducing the costs of the allocated VMs and meeting QoS constraints such as the average response time.

In [101], Qazi et al. presented PoWER, which tries to predict the behavior of the cluster, distributes VMs in the cluster, and turns off unused PMs to reduce power consumption. They used chaos theory to make the prediction indifferent to the type of loads and the cycles inherent in them, and their experiments indicated that their approach outperforms the FFT-based time series method in load prediction.

2.7 Kalman filter model-based schemes

In [102], Hu et al. presented three models to estimate the load using a Kalman filter and put forward a pattern matching model to forecast the load. They applied the results to provide a new trigger strategy for the automatic scaling mechanism of cloud elasticity. This model improves the forecasting accuracy and reduces the automatic scaling delay, but it should be extended to support other workload prediction scenarios and to further improve its prediction accuracy. Table 5 determines the datasets, simulation software, evaluation factors, and predicted factors applied in the outlined workload prediction schemes.

Table 5 Comparison of the grey predicting, autocorrelation, chaos, and Kalman filter-based workload prediction schemes
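Before moving to the wavelet-based schemes, a minimal local-level Kalman filter sketch illustrates the filtering idea behind the schemes in Sect. 2.7; the process and measurement noise values q and r are assumptions, and the surveyed schemes combine such filters with pattern matching and other components.

```python
import numpy as np

def kalman_level(y, q=1e-3, r=0.1):
    """Local-level Kalman filter: the filtered level doubles as the
    one-step-ahead load prediction under a random-walk model."""
    x, p = y[0], 1.0              # initial state estimate and variance
    filtered = []
    for z in y[1:]:
        p = p + q                 # predict: level evolves as a random walk
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # correct with the new measurement
        p = (1 - k) * p
        filtered.append(x)
    return np.array(filtered)

noisy = 50 + 10 * np.sin(np.linspace(0, 6, 200)) + 3 * np.random.randn(200)
print(kalman_level(noisy)[-1])    # predicted load level for the next interval
```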

2.8 Wavelet-based schemes

This part of the article discusses the wavelet-based load estimation schemes such as [103,104,105,106,107] designed for cloud computing DCs. For example, in [108], Liu et al. proposed a VM migration solution which applies a time series-based load forecasting algorithm. They tuned the upper and lower load bounds for hosts and predicted the tendency of their subsequent loads by creating a load time series using the cloud model. Afterward, they proposed WAM, a load-aware VM migration algorithm which chooses a source PM, a destination PM, and a VM on the source PM to be migrated. In this scheme, the authors considered CPU consumption as the workload and applied the PlanetLab dataset and the CloudSim software for evaluation. The flowchart of this framework is provided in Fig. 17.

Fig. 17 Load forecasting in [108]

In [109], Lyu et al. introduced a forecasting method consisting of a forecast module, an adjustment module, and a collection module. The first module applies machine learning methods to enhance the forecasting accuracy. As an advantage, they introduced an effective dual-threshold load rate forecast mechanism to balance availability and profit. The architecture of this approach is depicted in Fig. 18.

Fig. 18 Workload predicting architecture in [109]

In [110], Qazi et al. presented an efficient method to predict the cluster behavior based on its history, re-distribute the VMs to free under-utilized PMs, and turn those PMs off to save power. They evaluated real loads modeled as a chaotic time series. Chaos theory with optimizations makes this framework indifferent to the type of loads and the cycles inherent in them.

2.9 Collaborative filtering-based schemes

In [111], Duggan et al. presented a learning-based solution for load forecasting for analytical databases deployed by different CSPs. By enabling load performance estimations that can be ported across hardware configurations, it can help cloud users with their service-purchase decisions and CSPs with their provisioning decisions. This approach applies collaborative filtering to forecast lightweight load fingerprints that model the behavior of concurrent query loads for choosing hardware configurations.

In [112], Zhang et al. provided a prediction-based scaling solution which combines collaborative filtering with a pattern matching technique. It enhances reactive rule-based scalability techniques and provides a method to link the SLA to lower-level metrics from the infrastructure. Nevertheless, for fine-tuning of this approach, more infrastructure metrics should be considered.

Table 6 determines the datasets, simulation environments, and the predicted and evaluation factors of the wavelet and collaborative filtering-based workload prediction schemes.

Table 6 Properties of the wavelet and collaborative filtering-based load forecasting approaches

2.10 Ensemble-based schemes

Even though some of the previously discussed workload prediction schemes apply a single prediction method, their accuracy may not be as high as required, and the prediction horizon cannot easily be extended. To mitigate these problems, several ensemble-based load forecasting frameworks have been proposed in the literature, which this subsection reviews.

For example, in [113], Cao et al. introduced an ensemble method which uses multiple models to improve CPU load forecasting performance. They apply a two-layer ensemble model consisting of a predictor optimization layer and an ensemble layer. The predictor optimization layer creates new predictor instances and removes those with poor performance. The ensemble layer produces the final forecast based on the results of the multiple predictor instances and can provide feedback to the predictor optimization layer, which helps it to adopt appropriate optimization strategies. In this scheme, predictor replacement based on performance evaluation is used to maintain the performance of the predictor set: the poorest predictor is removed and another predictor is added. The architecture of this approach is depicted in Fig. 19.

Fig. 19 Ensemble-based forecasting in [113]

The work in [114] provided a prediction method to enhance the accuracy of auto-scalers using an ensemble-based load forecasting approach. The authors evaluated several predicting models for various load patterns. This ensemble technique was implemented using three real-world loads. They trained each model in real time and aggregated the forecasted results based on weights computed from the inverse errors of the fitted values on the training data. However, further work is needed to identify the optimum input window size that maximizes accuracy while meeting the temporal restrictions on computing the forecasts in real time.
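A minimal sketch of the inverse-error weighting described above; the per-model forecasts and RMSE values are illustrative stand-ins, not results from [114].

```python
import numpy as np

def inverse_error_weights(errors):
    """Weight each base predictor by the inverse of its training error."""
    inv = 1.0 / np.maximum(np.asarray(errors, dtype=float), 1e-9)
    return inv / inv.sum()

forecasts = np.array([72.0, 68.5, 75.2])   # stand-in outputs of three base models
rmses = np.array([4.1, 2.3, 5.0])          # their training-set RMSEs
w = inverse_error_weights(rmses)
print("ensemble forecast:", float(np.dot(w, forecasts)))
```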

In [115], Singh et al. tried to reduce PMs' power usage, cooling, and CO2 emissions to improve the sustainability of the cloud infrastructure. They used load forecasting techniques to guide the identification of servers, time intervals, and other critical parameters needed in cloud DCs. This scheme is able to deal with non-stationary workloads and, by updating its learning parameters, avoids re-training its prediction models. Furthermore, they applied Weighted Majority and Simulatable Experts to deal with the extensive non-stationarity and massive online streaming data.

In [116], Sommer et al. proposed PRUF, an ensemble-based forecasting module to predict the future utilization of VMs. They proposed a proactive VM migration policy using predictive overload detection and performed a study in CloudSim. The architecture of this approach is depicted in Fig. 20. Table 7 compares the properties of the ensemble-based methods.

Fig. 20 Forecasting architecture in [116]

Table 7 Comparison of the ensemble-based workload prediction schemes

2.11 Hybrid load prediction schemes

This subsection discusses the load forecasting schemes designed using a combination of the aforementioned prediction methods.

2.11.1 SVR + Kalman filter

In [117], Hu et al. presented KSwSVR, a multi-step-ahead load prediction method which integrates SVR and a Kalman smoother. A public trace is applied to verify its forecasting accuracy, stability, and adaptability. A CPU allocation experiment indicated that KSwSVR can reduce resource usage while meeting SLA requirements. In this scheme, the Kalman smoother is employed to reduce the noise in the resource usage data caused by measurement errors.

2.11.2 Deep learning + SVM

In [118], Tarsa et al. used hierarchical sparse coding, a form of deep learning, to model user-driven loads using on-chip hardware performance counters. They predicted periods of low instruction throughput, during which frequency and voltage can be scaled down to reclaim power. Using a multi-layer coding structure, this method codes counter values via features learned from data and passes them to an SVM classifier, where they act as signatures for predicting future load states.

2.11.3 ARIMA + RNN

In [119], Janardhanan et al. focused on the time series predicting of CPU usage in DCs using LSTM network and evaluated it against the ARIMA model.

2.11.4 ARIMA + wavelet decomposition

In [120], Bi et al. introduced a hybrid method which uses wavelet decomposition and ARIMA to forecast the future load. It smooths the task time series using Savitzky-Golay filtering and decomposes it into multiple components via wavelet decomposition. The forecasting results are reconstructed via wavelet reconstruction to estimate the number of arriving tasks. However, better data smoothing algorithms could be used to further improve the prediction accuracy of this scheme.
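A minimal sketch of the smoothing-plus-decomposition front end of such hybrid schemes, using SciPy's Savitzky-Golay filter and the PyWavelets package; the filter and wavelet parameters are assumptions, and each component would then be forecast separately (e.g., with ARIMA) before reconstruction.

```python
import numpy as np
from scipy.signal import savgol_filter
import pywt                              # pip install PyWavelets

tasks = np.random.poisson(100, 512).astype(float)   # stand-in arriving-task counts
smoothed = savgol_filter(tasks, window_length=11, polyorder=3)  # noise removal
coeffs = pywt.wavedec(smoothed, "db4", level=3)     # split into sub-series
# Each coefficient series could now be forecast separately (e.g., with ARIMA)
# and the forecasts recombined via the inverse transform:
reconstructed = pywt.waverec(coeffs, "db4")
print(reconstructed[:5])
```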

2.11.5 LR + SVM

In [121], Liu et al. proposed an adaptive approach for load forecasting which classifies loads into various classes according to their features and assigns a suitable prediction model to each class. They transformed the load classification problem into a task assignment problem using a mixed 0-1 integer programming model and provided an online solution for it. For prediction, they used linear regression and SVM, the latter being good at predicting nonlinear data. They applied the Google cluster trace to evaluate this approach. The architecture of this solution is exhibited in Fig. 21. As an advantage, this approach improves the platform's cumulative relative forecasting errors.

Fig. 21 Load forecasting in [121]

2.11.6 ANN + regression

In [122], Tang et al. introduced MLWNN, which applies linear regression and a wavelet ANN to forecast the short-term load. They provide a heuristic power-aware job scheduling approach with this load forecasting method and employed the error backpropagation algorithm to train a three-layered feed-forward WNN model with minimum error. The presented job scheduling approach includes a resource management method based on the MLWNN workload prediction. They conducted their experiments using the CloudSim software and indicated that their approach can reduce power usage and increase resource utilization.

In [123], Gandhi et al. tried to improve resource allocation in cloud DCs to reduce SLAV and power usage. They employed a predictive resource provisioning method which handles load estimation at coarse time scales, combined with reactive provisioning to deal with any excess load at finer time scales. The combination of predictive and reactive provisioning achieves improvements in meeting SLAs, conserving power, and reducing provisioning costs. The architecture of this scheme is shown in Fig. 22.

Fig. 22 Hybrid resource provisioning in [123]

2.11.7 ARMA + regression + SVR

In [124], Guo et al. proposed NUP, a hybrid forecasting method which uses the load type to switch between forecasting algorithms. It uses the autocorrelation coefficients and Hurst exponents of loads to determine whether a load belongs to the periodic or the trend category. NUP applies linear regression and similarities among periods to replace missing data in trend and periodic loads. It uses linear regression and ARMA to predict the trend loads and SVR to forecast the periodic ones.

Table 8 determines the datasets, simulation software, evaluation factors, and the prediction factors considered in the hybrid and ensemble-based load forecasting schemes.

Table 8 Comparison of the hybrid and ensemble-based load prediction approaches

3 Discussion

This section provides an extensive comparison of the workload forecasting approaches designed for the various cloud environments; its results can illuminate future research directions. It mainly analyzes the following issues about these schemes:

  • The publication years of the outlined workload prediction schemes.

  • Simulator software and environments applied to analyze the outlined schemes.

  • Factors applied to compare the proposed frameworks and exhibit their effectiveness.

  • Datasets and workloads employed in the investigated prediction schemes.

  • The number of the load prediction schemes which have applied each forecasting method.

  • The number of schemes which have predicted various resources in their predictions.

Figure 23 depicts the publication years of the outlined load forecasting schemes. As shown in this figure, many workload prediction schemes have been proposed recently to deal with this problem, and this context is an active research field.

Fig. 23 Publication year of the outlined load forecasting schemes

Furthermore, Fig. 24 exhibits the datasets employed in the studied schemes and specifies the number of solutions which apply each dataset. As shown in this figure, the main datasets applied in this context are the Google and NASA datasets and data self-collected by the authors from real environments.

Fig. 24 Applied datasets in the workload forecasting schemes

Figure 25 depicts the experiment factors applied in the evaluation of the load prediction schemes and the number of schemes which have applied each evaluation factor. As shown in this figure, factors such as CPU load, cost, and execution time are the most employed by the studied schemes. Figure 26 shows the number of load prediction schemes designed and proposed using each of the prediction methods outlined before. As shown in this figure, ANN and the wavelet transform are the most frequently used methods among the load forecasting schemes. Figure 27 indicates the factors forecasted by the workload prediction schemes: some of the schemes recognize the load by the increase of CPU consumption, while others may consider other factors such as memory, bandwidth, and even disk I/O. As shown in this figure, CPU consumption is the critical factor considered by most forecasting schemes to detect the workload.

Fig. 25 Simulation factors applied in the load prediction

Fig. 26 Applied algorithms in the load predicting methods

Fig. 27 Number of schemes which forecasted each factor

Figure 28 shows the number of factors predicted by the workload prediction approaches to forecast the workload. As shown in this figure, only a few schemes have considered three or four factors in their predictions, and in future studies this issue can be further investigated to better predict the workload and prevent resource wastage. Figure 29 reflects the environments and simulators utilized in the evaluation of the investigated schemes and determines the number of approaches which have used each kind of simulator. As shown in this figure, CloudSim is the most popular simulation software applied in this context.

Fig. 28 Factors predicted in the load forecasting schemes

Fig. 29 Environments and simulators

Figure 30 exhibits the number of schemes which have applied only one dataset and the number of schemes that have used two datasets in their simulation and verification process. As depicted in this figure, only a few schemes have applied two datasets; consequently, in future research, workload forecasting schemes can be evaluated using multiple datasets to further ensure their accuracy.

Fig. 30 Datasets

4 Conclusion

The main objective of the cloud computing paradigm is to provide various virtual remote resources and services to its customers. In this context, providing guaranteed QoS, increasing throughput, and improving return on investment are among the features that can be achieved by effective resource management in cloud DCs. Future workload prediction in cloud DCs is an essential step in proper resource management and auto-scaling approaches, which aids cloud service providers in provisioning/de-provisioning virtual resources. However, prediction errors can cause problems such as under-provisioning or over-provisioning: the former reduces the cloud performance and leads to SLA violations, while the latter leads to resource wastage.

Given the importance of accurate load prediction based on historical workload data and of handling issues such as workload fluctuation and Slashdot effects, various load prediction schemes have been provided in the literature. This paper first presented the basic concepts and challenges in the workload prediction process. Then, it delivered a taxonomy and survey of the investigated load forecasting approaches and described their main contributions and the algorithms they apply to conduct predictions. Furthermore, features such as the applied workload datasets, simulation factors, predicted factors, and simulators employed by each forecasting scheme were illuminated, and an extensive analysis of the workload forecasting schemes was provided, which can be useful for future studies. In future studies, the following issues can be further investigated:

  • Exploring other machine learning techniques to further improve the workload prediction’s performance.

  • Providing better load forecasting schemes to recognize the more realistic and complex request patterns which may happen in real life.

  • Defining new workload prediction metrics, for example on the lags in burst predictions. Also, since the cost of prediction errors in the cloud environment is not symmetric, better evaluation metrics should be defined with this issue in mind.

  • Regarding the suitability of the non-linear prediction models to predict time series with seasonal variations, they can be used for optimizing processes with longer time horizons.

  • Investigating the resource management algorithms to utilize the achieved forecasting results.

  • Integrating the load prediction schemes with the intrusion detection schemes to recognize the DDoS attacks from the Slashdot effects.

  • Creating lightweight workload prediction schemes to be applied in recently emerging technologies such as IoT, cloudlets, fog computing, and mobile edge computing, which have fewer resources than the cloud DCs.

  • One of the important directions for future research is the integration of the auto-scaling schemes with IDS and IPS systems to better handle DDoS and Yo-Yo attacks. Generally, auto-scaling systems convert DDoS attacks into EDoS attacks when dealing with malicious behaviors. Distinguishing the DDoS workload from the legitimate users' workload is an open issue which should be addressed in future research [55, 125,126,127].