1 Introduction

Elasticity is the ability of a system to dynamically adapt to load variations, increasing or decreasing resources to meet performance and efficiency demands. There are several platforms, both in the public cloud and on-premises, that provide autoscaling functionality to implement elasticity. However, in some cases (e.g. Kubernetes) their approach is based on a reactive algorithm that can have difficulty anticipating and handling rapid and extreme changes in load.

Rather than simply reacting to load changes after they occur, our approach uses the predictive capability of AI models to forecast load variations and proactively adapt system resources. This approach has the potential to improve efficiency, performance and responsiveness under highly variable load conditions. This paper evaluates the use of supervised machine learning models such as logistic regression, decision trees and neural networks, as well as the unsupervised K-means strategy. The paper is structured as follows: Sects. 2 and 3 present the background and related work. Subsequently, in Sect. 4 we present the methodology used. Then, in Sects. 5 and 6, we describe the data collection and preparation process. Section 7 presents the results obtained, and Sect. 8 draws the preliminary conclusions of the work.

2 Background

Before starting, it is important to introduce the concepts related to this research.

As previously mentioned, elasticity is the ability of certain types of systems to automatically create or eliminate resources in order to adapt to varying load conditions. One of the ways to implement elasticity is by means of autoscaling or automatic scaling. There are two fundamental types of autoscaling. In reactive autoscaling, the system monitors current workload traffic or resource usage; when certain thresholds in performance metrics are reached, the system will then calculate and determine a suitable scaling decision. Proactive autoscaling uses sophisticated techniques to predict future resource demands. Based on these predictions the autoscaler decides to scale up or down according to a predetermined forecast.
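The reactive approach described above can be sketched as a simple threshold rule. The following is an illustrative sketch, not the algorithm of any particular platform; the threshold values are assumptions chosen for the example.

```python
# Minimal sketch of a reactive scaling decision: the autoscaler only acts
# after a monitored metric crosses a (hypothetical) threshold.
def reactive_decision(cpu_pct: float, ram_pct: float,
                      scale_up: float = 80.0, scale_down: float = 30.0) -> int:
    """Return +1 to add a replica, -1 to remove one, 0 to do nothing."""
    if cpu_pct > scale_up or ram_pct > scale_up:
        return +1
    if cpu_pct < scale_down and ram_pct < scale_down:
        return -1
    return 0

print(reactive_decision(85.0, 40.0))  # high CPU usage -> scale up (+1)
```

A proactive autoscaler would apply the same kind of rule, but to *forecast* metric values rather than current ones, so the scaling action precedes the load change.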

The following is a brief description of the models evaluated in this work:

Logistic regression [3] is a statistical technique that aims to produce, from a set of observations, a model that allows the prediction of values taken by a categorical variable, often binary, from a series of continuous and/or binary explanatory variables.
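As a concrete illustration, a logistic regression can be fitted to resource-usage observations to predict a binary scaling label. This sketch uses synthetic data and scikit-learn; the feature names and the threshold that generates the labels are assumptions for the example, not the paper's dataset.

```python
# Hedged sketch: logistic regression on synthetic (cpu_pct, ram_pct) samples
# predicting a binary "scale needed" label derived from a threshold rule.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(500, 2))              # columns: cpu_pct, ram_pct
y = ((X[:, 0] > 74) | (X[:, 1] > 74)).astype(int)   # illustrative labeling rule

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[90.0, 20.0]]))                   # classify a high-CPU sample
```

Because the decision boundary of logistic regression is linear, it can only approximate the OR-shaped region of this labeling rule; tree-based models (next) capture it more exactly.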

Decision trees [7] are supervised learning models that predict an output as a function of successive decisions made on the inputs. Each node in the tree represents a decision, which makes trees easy to understand and interpret. The Gradient Boosting algorithm [1] builds predictive models as an ensemble of weaker models, usually decision trees; it combines several weak models to create a strong one. XGBoost is a specific implementation of Gradient Boosting that has been optimized to be highly efficient and flexible. Finally, neural networks [2] are deep learning models inspired by the way the human brain works. They are particularly useful for tasks that involve learning from large amounts of high-dimensional data, such as images or text.
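The ensemble idea can be illustrated with scikit-learn's Gradient Boosting implementation on the same kind of synthetic data; again, the data-generating rule is an assumption for the sketch, not the paper's dataset.

```python
# Illustrative sketch: Gradient Boosting (an ensemble of shallow decision
# trees) on synthetic resource-usage data with an OR-shaped labeling rule.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(1000, 2))             # cpu_pct, ram_pct
y = ((X[:, 0] > 74) | (X[:, 1] > 74)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
model = GradientBoostingClassifier(n_estimators=100).fit(X_tr, y_tr)
print(f"test accuracy: {model.score(X_te, y_te):.2f}")
```

Axis-aligned threshold rules like this one are exactly the kind of structure tree ensembles recover well, which is consistent with the strong results reported for these models in Sect. 7.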

The K-means algorithm [4] is one of the most widely used unsupervised learning algorithms for partitioning a given data set into K clusters. The “means” in the name refers to the centroids of the clusters, which are imaginary or real points representing the centers of the clusters.

3 Related Work

In the literature, several works have addressed proactive autoscaling of containers through AI techniques. In [8], for example, proactive autoscaling using an LSTM (a recurrent neural network) is proposed. The authors describe a methodology similar to the one presented in this paper: a data set is collected from cloud-native applications and used to train a learning-based forecasting model, which then effectively predicts the application's future resource demand. The authors of [5] propose artificial neural networks (ANN) for proactive autoscaling, and [6] uses a workload prediction model based on an extreme learning machine (ELM), whose learning time is very low and which predicts the workload more accurately.

4 Methodology

The methodology used in this work was based on CRISP-DM (Cross Industry Standard Process for Data Mining). We added some steps to build and tune the AI models: model architecture selection, hyperparameter tuning, data augmentation, and cross-validation. The steps are shown in Fig. 1.

Fig. 1. Implementation phases of the CRISP-DM methodology for AI (image authored by the authors)

5 Data Generation

In order to obtain the data to train the models, we developed and deployed three services on a machine running the Windows 11 operating system, with an Intel Core i7 processor (4 cores, 8 logical processors) and 16 GB of RAM.

The services, implemented in Spring Boot and exposed through a REST API, are described as follows: a) Create-token generates a token with a parameterized validity period, allowing a user with the correct credentials to authenticate within a given time window. b) generateRAMMemoryConsumption gradually increases its RAM usage, emulating a scenario where a service requires more and more resources as it processes more data. c) calculoAltoConsumoProcesador generates high CPU usage by performing mathematical operations to compute prime numbers in a specific range.

The metrics obtained for each service were CPU and memory usage. The load was generated using JMeter and was gradually increased to simulate different levels of demand. More than 30,000 observations were collected.

6 Data Preparation

Once the data had been collected, the relevant features had to be selected and prepared to be compatible with the AI models. First, a cleaning process was performed to remove any incorrect, incomplete, null, inaccurate or irrelevant data. Then, data transformation and normalization processes were performed. Finally, a new feature (variable) called “new_vm” was created, which is a binary representation of whether or not a new virtual machine or container needs to be created. This feature was generated using a threshold defined on CPU and RAM usage. If CPU or RAM usage exceeded this threshold, then “new_vm” was set to 1, indicating that a new computing entity was needed. Otherwise, it was set to 0. In the pre-processing stage, the threshold was set to 74%. The correlation between different features in the data set was also calculated to identify whether there are features that are strongly associated with each other (see Table 1).
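The labeling step described above can be sketched in a few lines of pandas. The column names and sample values here are illustrative, not the paper's schema; only the 74% threshold and the CPU-or-RAM rule come from the text.

```python
# Sketch of the "new_vm" feature: 1 when CPU or RAM usage exceeds the 74%
# threshold defined in pre-processing, 0 otherwise.
import pandas as pd

THRESHOLD = 74.0
df = pd.DataFrame({
    "cpu_pct": [10.5, 80.2, 60.0, 75.1],   # illustrative observations
    "ram_pct": [20.0, 30.0, 90.3, 40.0],
})
df["new_vm"] = ((df["cpu_pct"] > THRESHOLD) |
                (df["ram_pct"] > THRESHOLD)).astype(int)
print(df["new_vm"].tolist())  # [0, 1, 1, 1]
```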

Table 1. Correlation of the variables with respect to the new_vm variable

According to the results in Table 1, all the characteristics related to processor and memory usage show a moderate to strong positive correlation with ‘new_vm’, implying that when these parameters increase, the need for a new computing entity (virtual machine or container) also tends to increase.

The last step was to split the data into training and test sets. For the exercise, a 70/30 split was used, where 70% of the data was used to train the models and 30% to test them.
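The 70/30 split can be sketched with scikit-learn's `train_test_split`; the placeholder data below stands in for the actual feature matrix and the "new_vm" labels.

```python
# The 70/30 train/test split described above, with stratification so both
# partitions keep the same label proportions.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # placeholder feature matrix
y = np.arange(100) % 2              # placeholder binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)
print(len(X_train), len(X_test))  # 70 30
```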

7 Results

A summary of the results of the supervised techniques is shown in Table 2. Details of each technique are discussed below the table.

Table 2. Summary of the supervised techniques

Logistic Regression: With an accuracy of 87%, this model provided a good basis. However, its performance may not be sufficient for critical applications where prediction accuracy is of paramount importance. The main advantage of this model is its simplicity and ease of interpretation.

Decision Tree: With 97% accuracy, this model outperformed logistic regression. It can be used as a proactive solution because of its training speed and its clear graphical representation of decisions, which can be useful for understanding which factors are most important in the autoscaling decision. It was the second-best ranked model in terms of FN (false negatives); however, its initial depth suggested that overfitting may have occurred on the training data, which could affect its performance on new data.

Gradient Boosting: Like the decision tree, this model achieved an accuracy of 97%, which suggests that it is capable of making very accurate predictions. An advantage of Gradient Boosting is that it can capture complex interactions between features, which could be useful given the nature of the data. It was the model that stood out with respect to false negatives, and its training time makes it a good choice for further use.

Neural Network: Although it had a slightly lower accuracy of about 96%, this model showed good performance. We believe that further adjustments could be made to optimize its performance, such as modifying the hyperparameters or changing the architecture. Neural networks are especially powerful in capturing nonlinear interactions and complex patterns in data. A disadvantage, in a practical implementation, is that if the model is required to constantly learn and retrain itself with new data, the time it takes to train may affect its usability.

With respect to the application of the K-means algorithm, we note that although it is not used to obtain predictions, its silhouette coefficient of 0.53 and WCSS of 1996 indicate that it produces a good segmentation of the data. This technique is useful for exploring the underlying structure of the data and detecting patterns. It can be used when clear labels are not available or when seeking to understand the underlying relationships and structures between variables.
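The two quality measures mentioned above (silhouette coefficient and WCSS, exposed by scikit-learn as `inertia_`) can be computed as follows. The data here is synthetic, with two artificial load regimes, so the scores will differ from the paper's values.

```python
# Sketch: K-means segmentation of synthetic load data, plus the silhouette
# coefficient and WCSS (within-cluster sum of squares) used to assess it.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
low = rng.normal(loc=[20, 25], scale=5, size=(100, 2))    # low-usage regime
high = rng.normal(loc=[85, 80], scale=5, size=(100, 2))   # high-usage regime
X = np.vstack([low, high])

km = KMeans(n_clusters=2, n_init=10, random_state=2).fit(X)
print(f"silhouette: {silhouette_score(X, km.labels_):.2f}, "
      f"WCSS: {km.inertia_:.0f}")
```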

8 Conclusions

The results of the applied models show good performance, with some of them achieving an accuracy of up to 97%. These results suggest that it is feasible to predict the need for new computing entities using the available monitoring data. Although there are other ways to address this autoscaling problem, such as the use of expert systems, the results obtained and previous work lead us to believe that machine learning techniques are suitable for further investigation. These techniques will be applied in variable-load scenarios, especially in cloud environments; in such scenarios, expert systems can become difficult to maintain and scale, whereas machine learning models can handle large volumes of data in an efficient, scalable manner.

As future work, the evaluated models can be integrated into a container orchestration system such as Kubernetes to predict the need for new resources and to proactively scale up. In this way, one could have a proactive system instead of a reactive one: the system could adapt to resource needs before a decrease in performance occurs.
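One hypothetical shape for such an integration is glue logic that turns the model's predicted probability of needing a new computing entity into a target replica count, which an orchestrator client could then apply (e.g. via a Kubernetes scale subresource). This is a sketch under assumed thresholds, not a tested controller.

```python
# Hypothetical glue logic for proactive scaling: map a model's forecast
# probability that a new computing entity is needed to a replica count.
def target_replicas(current: int, prob_new_vm: float,
                    up_at: float = 0.8, down_at: float = 0.2,
                    min_r: int = 1, max_r: int = 10) -> int:
    """Scale up ahead of predicted demand; scale down when it is predicted low."""
    if prob_new_vm >= up_at:
        return min(current + 1, max_r)
    if prob_new_vm <= down_at:
        return max(current - 1, min_r)
    return current

print(target_replicas(3, 0.92))  # forecast predicts high load -> 4
```

Hysteresis between the up and down thresholds avoids oscillating replica counts when the forecast hovers near a single cutoff, a known concern for threshold-based autoscalers.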