1 Introduction

Many widely used cloud-based systems, such as social media platforms, online video streaming services, and e-commerce sites, depend critically on cloud resource provisioning. Leading companies such as Facebook, Netflix, and Alibaba are major consumers of cloud services. To manage cloud resource provisioning on demand, cloud service providers (CSPs) have introduced the autoscaler mechanism [6]. However, current autoscalers operate primarily reactively, triggering scaling actions only when certain metrics breach predefined thresholds, which can lead to problems during peak hours when a substantial number of users interact with cloud-based applications.
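
To make the reactive mechanism concrete, the following is a minimal sketch of a threshold-based autoscaler decision; the thresholds, step size, and VM bounds are illustrative assumptions, not values from any particular provider:

```python
# Illustrative reactive, threshold-based autoscaler decision.
# The thresholds, step size, and VM bounds are hypothetical values.

def reactive_scale(vms, cpu_util, scale_out_at=0.80, scale_in_at=0.30,
                   min_vms=1, max_vms=10):
    """Return the new VM count after one reactive scaling decision."""
    if cpu_util > scale_out_at and vms < max_vms:
        return vms + 1      # upper threshold breached: add capacity
    if cpu_util < scale_in_at and vms > min_vms:
        return vms - 1      # lower threshold breached: release capacity
    return vms              # within thresholds: no scaling action
```

Because the rule fires only after a threshold has already been breached, capacity is added after demand has risen, which is precisely the lag that hurts during peak hours.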

Despite the use of sophisticated techniques (e.g., reinforcement learning, queuing theory, and time-series analysis [15]), cloud scaling is still prone to erroneous decisions, leading to over- or under-provisioning of resources for applications. Over-provisioning results in resource waste, increased energy consumption due to activated server hosts and cooling devices, and the release of greenhouse gases [16]. Under-provisioning, on the other hand, causes performance degradation and resource scarcity for managed applications [15]. Therefore, an autoscaler must make accurate scaling decisions, ensuring adequate resource provisioning without compromising Quality of Service (QoS) and Quality of Experience (QoE) requirements [3]. Responsibility for policy correctness lies with cloud architects, necessitating meticulous design-time tasks to mitigate the risk of QoS and QoE violations. However, defining reliable and universal scaling policies is complex due to the intricate configuration of conditions, parameter variations in the unpredictable cloud environment, and the heterogeneity of cloud provision. Universal scaling policies are needed so that a policy can be adopted by cloud service providers regardless of the underlying cloud provision technologies [5].

The above-mentioned challenge requires an effective assessment approach for ensuring that cloud scaling policies are correct by design. Existing approaches can be categorized into (i) experimentation, (ii) simulation, and (iii) analytical modeling [4]. Experimentation requires deploying the scaling policies in a real environment and monitoring their performance for a certain period of time to capture real usage patterns and perform the relevant assessment. However, this approach requires long-term monitoring to capture diverse workload patterns effectively. Simulation uses simulation tools and techniques to simulate various scaling policies and observe their behavior and performance under different conditions. Nevertheless, it may not fully capture the dynamic nature of cloud systems and workloads. Analytical modeling uses mathematical techniques to gain insights into the behavior, performance, or properties of a system. However, it may not fully capture all aspects of real-world complexity and uncertainty. Despite these limitations, analytical modeling remains a powerful tool for gaining insights, making predictions, and guiding decision-making in many domains. It is often complemented by other techniques, such as simulation and empirical studies, to provide a more comprehensive understanding of complex systems.

Formal verification, a specialized form of analytical modeling, offers a rigorous method to produce correct-by-design cloud scaling policies. Probabilistic model checking (PMC), a subfield of formal verification, provides a way to achieve this objective. This is possible because a cloud scaling system exhibits the characteristics of stochastic systems [12], such as randomness and uncertainty caused by the dynamic and unpredictable nature of workloads and resource demands. When applying PMC, the cloud architect needs to formalize the cloud scaling process using appropriate modeling techniques. This prerequisite task has to be carefully constructed in order to carry out the verification effectively. Each modeling technique has its own characteristics and advantages; hence, choosing one over another, or utilizing multiple models to maximize the insights, is a key decision that the cloud architect has to make. This choice is a significant challenge that affects the benefits gained from the PMC activity. There is therefore a need for a deeper understanding of formal models in the context of the cloud scaling process using PMC. Herein, we focus on two formal modeling techniques, namely the Markov Decision Process (MDP) and the Discrete-Time Markov Chain (DTMC).

Existing works have applied PMC to guarantee that cloud scaling policies are correct by design. These works can be viewed from one perspective (i.e., MDP or DTMC) or both. From an MDP perspective, Naskos et al. [21] applied the MDP model to verify latency-aware cloud scaling that considers non-deterministic scaling actions. From the DTMC perspective, Evangelidis et al. [5] proposed a DTMC model for verifying utilization-aware cloud scaling that excludes the control of non-deterministic actions in the scaling process. From both perspectives, Moreno et al. [17] applied MDP and DTMC to capture the interaction between the decision maker and the environment of a self-adaptive system: the MDP was used to model the decision maker, while the DTMC was used to model the environment. Following these works may assist interested parties in extending or improving the targeted models. However, the fundamental question remains: "which formal models should be applied to capture which aspects of the cloud scaling process?". This question is non-trivial to answer when there is no comprehensive comparison of the formal models within a similar context.

Thus, our work takes a novel perspective in modeling and assessing the cloud scaling process with MDP and DTMC models under PMC. The DTMC enables the representation of stochastic behavior, while the MDP allows the abstraction of the non-deterministic scaling actions of the cloud scaling system. In comparison with [21], our MDP modeling approach considers additional metrics beyond latency, namely response time and CPU and RAM utilization. In comparison with [5], our DTMC modeling approach includes additional metrics, namely latency and RAM utilization. In comparison with [17], our work utilizes both models to capture the cloud scaling behavior within a similar context. To support the model assessment, we give specific attention to three non-functional dimensions, namely performance, complexity, and robustness. The performance assessment aims to reveal the capability of each model in adapting to changes in workload conditions. The complexity assessment investigates the state-space exploration and time complexity of the models to support the computation of the verification process. Meanwhile, the robustness assessment evaluates their reliability and stability. The PRISM model checker [14] is used for modeling and assessing the formalized models, and probabilistic computation tree logic (PCTL) is used to quantify the probabilities and verify the models against the cloud scaling objectives.

In summary, the main contributions of this research work are as follows:

  • An approach to formalize and assess cloud scaling processes based on two formal modeling techniques, namely MDP and DTMC, together with PMC, as presented in Sects. 4 and 5;

  • The assessment results that compare the complexity of the formalized cloud scaling process, the performance of the verified cloud scaling policies, and the robustness of the formalized cloud scaling process as presented in Sect. 5;

  • The insights gained from the assessment regarding the advantages of utilizing DTMC and MDP for verifying the cloud scaling process, as well as the practical significance of these insights to industry, as presented in Sect. 6.

The remainder of this paper is organized as follows. Section 2 provides the background of the cloud scaling scenario as well as the DTMC and MDP models. Section 3 summarizes the related work and highlights its key findings. Section 4 presents the proposed verification approach for cloud scaling. Section 5 discusses the experimental setup and results. Section 6 discusses the advantages of each model-checking approach, followed by their practical significance to industry. Finally, we conclude and present directions for future work in Sect. 7.

2 Background

In this section, we delve into a cloud scaling scenario aimed at analyzing the effectiveness of cloud scaling policies. Subsequently, we present the mathematical models for Discrete-Time Markov Chain (DTMC) and Markov Decision Process (MDP) utilized in this study.

2.1 Cloud scaling verification scenario

Figure 1 presents an overview of a cloud scaling system composed of front-end, back-end, and autoscaler components. The front end enables user interactions with the cloud-based application, while the back end handles virtual machines with allocated computing resources (e.g., CPU and RAM). To meet the application’s demands, the autoscaler dynamically adjusts resource allocation. Since incoming requests vary, ensuring sufficient resources is crucial to maintain acceptable response times, and the autoscaler plays a vital role in achieving optimal system performance.

The responsibility of the cloud architect is to produce reliable cloud scaling policies for the autoscaler, satisfying the requirements of both cloud service providers (e.g., Amazon Web Services, Microsoft Azure, Alibaba Cloud) and service users. However, cloud architects encounter challenges in utilizing the appropriate formal modeling techniques to verify the cloud scaling process, particularly involving peak-hour and normal-hour provisioning patterns.

Consider an online video streaming site deployed as a cloud application that is accessed by users worldwide to watch movies. During working hours, user access to the site is minimal, resulting in a small number of streaming patterns available for assessment by the cloud architect. In contrast, after working hours, especially at night, user access surges, leading to an increase in streaming patterns. These patterns depend on parameters like usage hours, concurrent viewers, and content popularity, which in turn influence the required computing resources (e.g., CPU utilization) and system performance (e.g., latency) of the streaming site. The architect must evaluate these patterns before designing appropriate scaling policies to enable the autoscaler to dynamically adjust resource allocation accordingly. However, data collected during peak hours exhibit high variation compared to data from normal hours, adding complexity due to the probabilistic nature of workload patterns, latency requirements, and resource provisioning.

To address these challenges, the cloud architect needs a modeling and assessment approach that can:

  1. consider the probabilistic nature of workload patterns, which evolve based on uncertain user behavior, to accurately determine the resources required by the cloud system;

  2. take into account the demand for low latency when accessing the cloud system while designing scaling policies, to avoid performance issues, especially during peak hours;

  3. verify the effectiveness of resource provisioning during normal and peak hours to avoid extreme provisioning events. Over-provisioning during normal hours incurs unnecessary costs, while under-provisioning during peak hours may cause service degradation and user dissatisfaction.

Therefore, addressing this issue necessitates implementing an approach that tackles these challenges and aids the cloud architect in designing reliable scaling policies. These policies should be rooted in user requests and resource usage data collected during both normal and peak hours to enhance overall task performance.

Fig. 1 High-level view of cloud scaling system

2.2 Probabilistic model checking

This section discusses the fundamental background of DTMC and MDP in PMC, as well as the types of properties that can be used to verify the two models.

2.2.1 Discrete time Markov chain (DTMC)

The DTMC is the simplest Markov model; it can be interpreted as a transition system augmented with probabilities, in which a sequence of random variables takes values in some state set according to fixed transition probabilities. Each variable in a state captures the configuration of the system being modeled. DTMCs are discrete stochastic processes with the Markov property, indicating that the probability of future states is solely determined by the current state and not by prior states [7]; in other words, the process is memoryless. As the name suggests, the transition from one state to another occurs in discrete time steps. The DTMC is represented as a tuple \(D = \{ S, s_{init}, P, L \}\) where:

  • S is a finite set of states;

  • \(s_{init}\) is the initial state;

  • \(P: S \times S \rightarrow [0,1]\) is a stochastic matrix \((\sum _{s'\in S}P(s,s')=1 \ \forall s \in S)\). An element \(P(s_i,s_j)\) represents the probability that the next state of the process will be \(s_j\) given that the current state is \(s_i\);

  • \(L: S \rightarrow 2^{AP}\) is a labeling function that assigns a set of atomic propositions to each state.
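
As a concrete illustration of this definition, the following sketch encodes a small hypothetical DTMC as a stochastic matrix and samples memoryless transitions from it; the states and probabilities are invented for illustration only:

```python
import random

# Hypothetical 3-state DTMC written as a stochastic matrix P
# (each row sums to 1), matching the tuple (S, s_init, P, L) above.
P = {
    's0': {'s0': 0.2, 's1': 0.8},
    's1': {'s0': 0.5, 's2': 0.5},
    's2': {'s2': 1.0},            # absorbing state
}
S_INIT = 's0'

def step(state, rng):
    """Sample the next state from the row distribution P[state].
    The choice depends only on the current state (memorylessness)."""
    r, acc = rng.random(), 0.0
    for nxt, p in P[state].items():
        acc += p
        if r < acc:
            return nxt
    return nxt  # guard against floating-point round-off

def run(n_steps, seed=0):
    """Simulate a path of n_steps discrete transitions from S_INIT."""
    rng = random.Random(seed)
    s, path = S_INIT, [S_INIT]
    for _ in range(n_steps):
        s = step(s, rng)
        path.append(s)
    return path
```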

2.2.2 Markov decision process (MDP)

The MDP is a mathematical framework that supports modeling decision-making in many situations. It can also be considered an extension of the DTMC, since the model provides non-determinism in addition to probabilistic behavior [13]. This characteristic supports triggering non-deterministic scaling actions based on certain conditions. The MDP is represented as a tuple \(M = \{S, s_{init}, Act, P, AP, L\}\) where:

  • S is a set of states;

  • \(s_{init}\) is an initial state;

  • Act is a set of actions;

  • \(P: S \times Act \rightarrow Dist(S)\) is the partial transition probability function, where Dist(S) refers to the set of all discrete probability distributions over S;

  • AP is a set of atomic propositions;

  • \(L: S \rightarrow 2^{AP}\) is a labeling function that assigns to each state \(s \in S\) a set L(s) of atomic propositions.

Furthermore, it has been acknowledged in [19] that an MDP can be modeled using two approaches, namely bounded-by-action (BBA) and bounded-by-state (BBS). In BBA, the scaling action is considered part of the predefined rules for scaling policies, and the scaling process terminates once the do-nothing action is reached at the goal state. Meanwhile, BBS employs the same implementation as BBA but terminates the process when a pre-specified number of transitions (steps) is reached at the goal state.
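
The following sketch illustrates these notions on a toy VM scaler: states are VM counts, the enabled actions form the non-deterministic choice, dist(s, act) plays the role of \(P(s, act)\), and a run terminates either BBA-style (when the do-nothing action is chosen) or BBS-style (after a fixed number of steps). All bounds, step sizes, and probabilities are assumed for illustration:

```python
import random

# Toy MDP scaler sketch; bounds and probabilities are assumed values.
MIN_VMS, MAX_VMS = 1, 4

def actions(vms):
    """Set of actions enabled in a state (the non-deterministic choice)."""
    acts = ['nothing']
    if vms < MAX_VMS:
        acts.append('out')
    if vms > MIN_VMS:
        acts.append('in')
    return acts

def dist(vms, act):
    """Dist(S) for a state-action pair: add/remove 1 or 2 VMs,
    each with probability 0.5, clamped to the VM bounds."""
    if act == 'out':
        targets = [min(vms + 1, MAX_VMS), min(vms + 2, MAX_VMS)]
    elif act == 'in':
        targets = [max(vms - 1, MIN_VMS), max(vms - 2, MIN_VMS)]
    else:
        targets = [vms, vms]
    d = {}
    for t in targets:       # merge targets that clamp to the same state
        d[t] = d.get(t, 0.0) + 0.5
    return d

def run(policy, vms, max_steps, mode='BBA', seed=0):
    """Resolve non-determinism with `policy`, then sample transitions.
    BBA: stop when the policy picks 'nothing'. BBS: stop after max_steps."""
    rng = random.Random(seed)
    for _ in range(max_steps):
        act = policy(vms)
        if mode == 'BBA' and act == 'nothing':
            break
        r, acc = rng.random(), 0.0
        for s, p in dist(vms, act).items():
            acc += p
            if r < acc:
                vms = s
                break
    return vms
```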

2.2.3 Properties

Two types of standard properties can be used to verify decisions made by the MDP model, namely probabilistic and expected reachability properties [1]. The first focuses on determining the probability of reaching a certain state that satisfies the predicate needed for the quantitative analysis of the model, while the second supports a cost-based analysis of the model. These types of properties can also be applied to verify behavior captured with a DTMC [22]. However, unlike the DTMC, the MDP enables both properties to be evaluated under a decision strategy that minimizes or maximizes certain attributes of the targeted process.

The properties to be verified against these models can be specified in probabilistic temporal logics such as PCTL and linear temporal logic (LTL) [12]. The syntax of these temporal logics is built from four main elements: the property operators (i.e., probability (P) and reward (R)), the LTL formula, the reward objective, and single instances of the P or R operators (e.g., true).
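
To illustrate what a probabilistic reachability property computes, the sketch below evaluates a PCTL-style query of the form "what is the probability of eventually reaching the goal state?" on a small hypothetical DTMC by fixed-point iteration, the computation that a model checker such as PRISM automates (the chain and its probabilities are invented for illustration):

```python
# PCTL-style probabilistic reachability ("P=? [ F goal ]") on a small
# hypothetical DTMC, computed by fixed-point iteration.
P = {
    's0': {'s1': 0.5, 's2': 0.5},
    's1': {'s1': 1.0},            # goal state (absorbing)
    's2': {'s2': 1.0},            # failure trap (absorbing)
}

def reach_prob(goal, iters=100):
    """Iterate x_s = 1 if s is the goal, else sum_t P(s,t) * x_t."""
    x = {s: (1.0 if s == goal else 0.0) for s in P}
    for _ in range(iters):
        x = {s: 1.0 if s == goal
             else sum(p * x[t] for t, p in P[s].items())
             for s in P}
    return x
```

Here the probability of eventually reaching the goal from s0 converges to 0.5, matching the single probabilistic branch out of s0; expected reachability (reward) properties are computed by an analogous fixed point over accumulated rewards.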

3 Related work

This section compares the existing works that utilize DTMC-based and/or MDP-based modeling using PMC to address policy verification problems of resource management in the cloud, edge, and fog computing environments.

3.1 DTMC and MDP-based modeling with PMC

The existing work most similar to our research is that of Moreno et al. [17]. They utilized DTMC and MDP modeling techniques to verify the adaptation tactics of a self-adaptive system, which involve adding or removing servers in the cloud system. However, our objective differs from theirs, as we focus on analyzing specific aspects (i.e., response time, latency, and CPU and RAM utilization) of cloud scaling policies rather than adaptation tactics for the cloud system itself. While [17] integrated MDP and DTMC to capture the interaction between the decision maker and the environment of the self-adaptive system, our unique contribution lies in independently employing DTMC and MDP models. Our focus is on capturing the cloud scaling behavior and comparing their respective analyses. To achieve this, we model the cloud scaling process using both modeling approaches with identical states but distinct characteristics. State transitions in both models are governed by probabilities, but non-deterministic scaling actions are additionally considered in the MDP to control its decision-making process. By conducting a comparative analysis between these two models, we gain a comprehensive understanding of the system’s behavior, including the strengths, limitations, and trade-offs associated with each model’s specification when accurately capturing latency-aware cloud scaling policies. Our independent application of DTMC and MDP models not only advances the understanding of cloud scaling policies but also sets a methodological precedent for evaluating the behavior of the cloud scaling process using DTMC and MDP.

Other studies have applied multiple modeling techniques to address modeling and analysis problems. In relation to DTMC and MDP, we have identified several works in the problem domains of virtual machine (VM) migration [2], human-robot interaction [18], and energy router (ER)-based systems [8]. Specifically, Bashar et al. [2] focused on the interaction between MDP and DTMC models: they employed an MDP-based application manager model to establish the initial state for a DTMC-based hypervisor model, addressing a specific challenge related to live VM migration within a cloud environment. Muhammad et al. [18] analyzed the human-robot interaction during the sensor replacement process in a wireless sensor network, utilizing a DTMC to model the operator and an MDP to model the robot. Meanwhile, Gao et al. [8] applied the continuous-time Markov chain (CTMC), an extension of the DTMC, to model multiple ER-based systems for reliability analysis, and used an MDP to model the electricity trading scheme of the second CTMC model to describe its trading behavior.

3.2 DTMC-based modeling with PMC

The work that employed DTMC in the context of the cloud autoscaling process is [5]. They examined the Quality of Service (QoS) of the policies based on the number of bad autoscale states resulting from the process. Their formalized policies consider only the number of allocated VMs and the CPU utilization of the system environment as the main metrics that drive the scaling process. Our model introduces two additional metrics, namely the RAM utilization and the latency of the cloud application. These metrics are critical because RAM is a vital resource in cloud environments, and latency is a crucial factor for applications serving end-users. Inadequate memory allocation can lead to performance degradation and even application crashes; by incorporating RAM utilization as a metric, we can ensure that the autoscaling process allocates sufficient memory resources to handle the workload efficiently. Simultaneously, high latency can result in suboptimal user experiences, potentially leading to customer dissatisfaction and loss of business; by monitoring and optimizing latency as a metric, we can ensure that the examined cloud application maintains acceptable response times. Moreover, instead of assigning a fixed reward value to the bad autoscale state based on CPU violations caused by the policies, we adopt the same approach (i.e., assigning the affected state a fixed value) to examine how frequently the policies lead to under-provisioning and over-provisioning scenarios.

There are other existing studies that utilized DTMC-based model checking to investigate different areas of resource management. For instance, [25] addresses the issue of bandwidth allocation in a network of self-adaptive cameras. Similarly, [23] applies the same specification method to model and verify the performance of policies related to Mobile Edge Computing (MEC) allocation. Differing from these studies, our objective is to analyze the performance of cloud scaling policies specified in DTMC, particularly in terms of minimizing latency violations.

3.3 MDP-based modeling with PMC

The works that align with the proposed study are [19, 20] and Naskos et al. [21]. They utilized MDP to specify and analyze cloud resource elasticity in terms of latency violations, deployment costs, and security attacks, respectively. Two types of MDP were proposed in their works, namely Simple and Advanced: the Simple model utilized a BBA approach, while the Advanced model utilized a BBS approach. Our proposed MDP model adopts only the BBA version, since our focus is on the comparison between MDP and DTMC rather than on the types of MDP. Our MDP model has several significant differences, which can be viewed from the layer, parameter, and reward perspectives. In terms of layer, our MDP model captures VM scaling at the infrastructure layer rather than the database layer. Additionally, we include three additional parameters in the model (i.e., CPU utilization, allocated RAM, and application latency) to drive the scaling decision-making process, whereas they consider only the number of VMs and latency. In terms of rewards, we have implemented a reward model that encompasses various metrics, including response time, latency, CPU utilization, RAM utilization, and latency violations occurring during both under-provisioning and over-provisioning events. In contrast, their reward models focus solely on capturing latency violations during these two events.

There are also existing works that applied MDP-based model checking to other aspects of resource management. For instance, Ismail and Kwiatkowska [9] and Ray and Banerjee [24] applied stochastic multiplayer game (SMG) model checking, which builds upon multiple MDPs, to address decision-making problems: [9] tackled the formalization and synthesis of selecting the optimal cloud collaborator, and [24] addressed the request allocation problem for MEC servers. Meanwhile, Kochovski et al. [11] utilized MDP to model the container placement problem in a multi-tier environment (edge-fog-cloud). In comparison, we adopt MDP-based model checking for the cloud scaling problem.

4 Proposed approach

Fig. 2 Modeling and assessment approach of the cloud scaling process based on DTMC and MDP formal models and PMC

This section presents the steps taken to formally model and encode the cloud scaling process to enable the model checking task.

4.1 Overview

This research aims to propose a comprehensive solution for the interested party (i.e., cloud architects) to deliver reliable cloud scaling policies to cloud service providers. The process involves several phases, as depicted in Fig. 2. Initially, the formalized cloud scaling processes are modeled using two specification methods: DTMC and MDP. The next step involves encoding these formalized models in the PRISM high-level language and expressing the properties in PCTL through the PRISM model checker. Once the encoding is completed, the models are verified against the specified properties. This verification captures essential information about various metrics of the formalized cloud scaling process, including response time, latency, and CPU and RAM utilization. The verification results are then assessed through three methods: complexity analysis, performance analysis, and robustness analysis.

In terms of complexity analysis, this paper primarily focuses on aspects such as state-space exploration complexity, time complexity (i.e., duration of model checking), and space complexity (i.e., storage space required for computations and state-space exploration). The second analysis pertains to the performance assessment of formally modeled cloud scaling policies, specifically concentrating on the aforementioned four metrics. The third analysis involves evaluating the robustness of the models. This is achieved by examining how sensitive the decision-making process is to changes in parameter values and an increase in the number of simulation samples used in the decision-making process. This thorough evaluation ensures the reliability and effectiveness of the designed policies. Upon successful verification of the reliability of the designed policies, they can potentially be provided to the cloud service provider for implementation in the autoscaler. This comprehensive approach empowers cloud architects to offer dependable cloud scaling solutions, enhancing the overall efficiency and effectiveness of cloud services.

Table 1 Extracted information from cloud scaling scenario for DTMC and MDP

Table 1 presents the key information extracted from the cloud scaling scenario for constructing the MDP and DTMC models. The Concepts and Descriptions columns encompass relevant considerations and accompanying explanations. The Modeling Roles column assigns roles to concepts, using Constant for user-input or fixed parameters, Variable for parameters updated during state transitions, Action for scaling events, and Update for state transitions. The Modeling Elements column refers to the symbols used in the model illustration to represent specific information, while the Model column lists the probabilistic models that consider each particular concept. Subsequently, this gathered information is mapped onto the cloud scaling models based on DTMC and MDP approaches. To validate the behavior of the models, a sample run was conducted using the PRISM model checker’s built-in simulator. Furthermore, we also encode a generalized cloud scaling system, without specific emphasis on any particular operating system or hardware technology. The models are designed to address the scaling of VMs at the Infrastructure as a Service (IaaS) level, with the goal of enhancing resource utilization (CPU and RAM), reducing latency, and improving response times when dealing with incoming user requests, which we refer to as load in our model.

Fig. 3 Illustrated model of cloud scaling process

4.2 Modeling cloud scaling with DTMC

The scaling concepts obtained from the extracted information and the behavior specified in Table 2 for the DTMC-based cloud scaler are interpreted in the illustration shown in Fig. 3a. Both our DTMC and MDP models exhibit the behaviors detailed in Table 2, with the exception that the behavior identified as ID BR12 is exclusively applicable to the MDP model.

The formalized cloud scaling process is modeled to receive a range of uncertain numbers of requests, which affect the model's states in terms of several parameters, including the current number of VMs, CPU utilization, RAM utilization, and latency. The states s change probabilistically based on a defined probability value p, and the data of the previous parameters are updated. The discrete time step t is incremented by one to indicate that the model has transitioned to a new state. When s undergoes a scale-out process, either 1 or 2 VMs are added to the existing number of VMs; conversely, for scale-in, either 1 or 2 VMs are removed. However, the model skips the scaling process when the cloud system is allocated the minimum or maximum number of VMs. To obtain a finite quantitative analysis result, the DTMC scaling process terminates when the time steps reach their upper bound. In this study, the unit for a time step is milliseconds.

Table 2 Scaling behavior

The DTMC model in Fig. 3a has two state labels, namely min VM and max VM, which represent the atomic propositions of the scaling process. The colored arrows represent the transition to the next state after the scaling decision is made, with blue indicating scale-out, red indicating scale-in, and black indicating no scaling action. The initial state of the process comprises the initial number of VMs, the number of loads (i.e., user requests) determined probabilistically, as well as the CPU utilization, RAM utilization, and latency associated with the pair of VM-load information. Based on the illustration, we can assume that \(S_1\) is the initial state of the cloud system. \(S_1\) can only move to a state that represents the scale-out process (i.e., an increasing number of VMs) rather than the scale-in process, as the system is currently allocated the minimum number of VMs allowed. Thus, the model can move to a state that increases the number of VMs by 1 or 2 steps with probability p or \(1-p\), respectively. More broadly, if the transition probability p is set to a value greater than 0.5, the probability of selecting the one-step scaling decision is higher than that of the two-step scaling decision, where the decisions refer to adding or reducing the number of VMs allocated to the system. In our model, however, p is set to 0.5 to ensure that the model takes a fair transition. Meanwhile, \(S_2\) represents the system state after performing the scale-out process, which causes the number of VMs in this state to be greater than in the previous one. On the other hand, \(S_4\) represents the state where the cloud system is allocated its maximum number of VMs. In this state, the cloud system can no longer move to a state with a greater number of VMs, although it can move to a state with a lower number of VMs, because at this stage the scale-out process is disallowed to avoid over-provisioning problems.
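
A minimal sketch of this transition rule, with assumed VM bounds and the fair step-size probability p = 0.5 described above (the paper's actual model is encoded in the PRISM language):

```python
import random

# Sketch of the DTMC transition rule described above: a scale-out adds
# 1 or 2 VMs and a scale-in removes 1 or 2 VMs, with the step size
# chosen with probability p = 0.5; scaling is skipped at the VM bounds.
# The bounds are assumed values.
MIN_VMS, MAX_VMS = 1, 4
P_ONE_STEP = 0.5

def dtmc_step(vms, direction, rng):
    """One probabilistic transition; direction is 'out', 'in', or 'none'."""
    delta = 1 if rng.random() < P_ONE_STEP else 2
    if direction == 'out' and vms < MAX_VMS:
        return min(vms + delta, MAX_VMS)
    if direction == 'in' and vms > MIN_VMS:
        return max(vms - delta, MIN_VMS)
    return vms  # at a bound (or no scaling): the VM count is unchanged
```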

4.3 Modeling cloud scaling with MDP

Figure 3b shows our illustrated model, which has been adapted from existing work. In comparison with [19], instead of treating the S state as the cluster size, we define the S state as the condition of the cloud system. The condition is driven by the current number of VMs, CPU utilization, RAM utilization, and the latency associated with the allocated VMs. Furthermore, the elements modeled for our proposed MDP are similar to those of our DTMC model, as both have two state labels, namely max VM and min VM. However, unlike the DTMC model, which relies solely on probability values for state transitions, the MDP model also incorporates scaling actions to determine its path. These scaling actions are represented as act=0 for scale-out, act=1 for scale-in, and act=2 for no scaling. This model ends its decision-making either when no scaling action is required or when it reaches the maximum time step. Furthermore, the color-coded arrows in the model signify the scaling actions taken throughout the process, with a blue arrow denoting a scale-out action (i.e., VM addition), a red arrow indicating a scale-in action (i.e., VM removal), and a black arrow representing the absence of any scaling action.

The scaling process encompasses three actions, with an approximate probability of p = 0.5 governing each state transition: the scale-out action can increase the current number of VMs by 1 or 2, while the scale-in action allows for the removal of 1 or 2 VMs. Additionally, the decision-making process is influenced by constraints based on three metrics: latency, CPU utilization, and RAM utilization. To differentiate the MDP-based cloud scaling process from the DTMC-based one, an additional element is considered in the constraints, namely the scaling action that must be triggered for the following state. For clarity, the scaling constraints are excluded from Fig. 3b, which focuses solely on the non-deterministic scaling actions. It is important to note that the scaling process cannot proceed beyond the maximum number of VMs. Moreover, in this study, we implemented the BBA approach, in which the decision-making process concludes when scaling actions are no longer triggered because the maximum number of VMs allowed for the application has been reached.
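The MDP view of a scaling step can be sketched as follows. This is a hedged illustration in Python rather than PRISM: the action encoding (act=0 scale-out, act=1 scale-in, act=2 no scaling), the p = 0.5 one-step versus two-step resolution, and the VM bounds follow the text, but the helper names and the specific bound values are our assumptions.

```python
import random

# Sketch of the MDP scaling step: a transition is first chosen by a scaling
# action, which then resolves probabilistically to a 1- or 2-VM change.
MIN_VM, MAX_VM, P = 1, 100, 0.5

def enabled_actions(vms):
    """Actions available in a state: no scale-out at max VM, no scale-in at min VM."""
    acts = [2]                       # "no scaling" is always available
    if vms < MAX_VM:
        acts.append(0)               # scale-out
    if vms > MIN_VM:
        acts.append(1)               # scale-in
    return acts

def mdp_step(vms, act, rng):
    """Resolve one chosen action; the step size (1 or 2 VMs) is probabilistic."""
    assert act in enabled_actions(vms)
    if act == 2:
        return vms
    delta = 1 if rng.random() < P else 2
    return min(vms + delta, MAX_VM) if act == 0 else max(vms - delta, MIN_VM)
```

The non-determinism lives in which `act` the scheduler picks; only after that choice does probability enter, which is the key structural difference from the DTMC.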

4.4 Modeling penalties

The penalties within our models serve as indicators of the effectiveness of probabilistic models like DTMC and MDP. These penalties come into play when extreme provisioning events occur for cloud applications. In this context, extreme provisioning means either over-provisioning or under-provisioning resources for cloud applications. Over-provisioning refers to allocating more resources than necessary, while under-provisioning occurs when the application does not receive sufficient resources to handle incoming requests.

To determine penalty values, we utilize Eq. 1 to capture information about provisioning events linked to latency violations. The first and second rewards quantify the frequency of over-provisioning events, whereas the third and fourth rewards estimate the frequency of under-provisioning events. In these equations, lat represents the current latency experienced by the cloud application when it is allocated a certain number of VMs, while \(lat_{up}\) indicates the upper latency threshold. Equation 2 provides the formula used to calculate lat from the load and the number of VMs; in this paper, the load refers to the number of user requests for the cloud application. Additionally, the upper and lower thresholds for CPU and RAM utilization are denoted as \(util_{up}\), \(util_{low}\), \(ram_{up}\), and \(ram_{low}\), respectively. The model's decisions are evaluated under two conditions. First, if the current system latency exceeds its threshold while CPU or RAM utilization falls below its lower threshold, the model's scaling action is considered to over-allocate resources, resulting in a penalty. Second, if the current system latency exceeds its threshold while CPU or RAM utilization exceeds its upper threshold, the action is deemed to under-allocate resources.

$$Penalties = \begin{cases} 1, & lat \ge lat_{up} \ \& \ (util \le util_{low} \ \Vert \ ram \le ram_{low})\\ 0, & lat \ge lat_{up} \ \& \ (util > util_{low} \ \Vert \ ram > ram_{low})\\ 1, & lat \ge lat_{up} \ \& \ (util \ge util_{up} \ \Vert \ ram \ge ram_{up})\\ 0, & lat \ge lat_{up} \ \& \ (util < util_{up} \ \Vert \ ram < ram_{up}) \end{cases}$$
(1)
$$latency = \frac{load}{number\ of\ VMs}$$
(2)
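Equations 1 and 2 can be transcribed directly into code. The following is our own Python rendering of the two definitions, not the authors' PRISM reward encoding; the function and parameter names mirror the symbols in the equations.

```python
# Eq. 2: latency is the load divided by the number of allocated VMs.
def latency(load, num_vms):
    return load / num_vms

# Eq. 1: a penalty of 1 is issued when a latency violation coincides with
# under-utilized resources (over-provision) or saturated resources
# (under-provision); otherwise the penalty is 0.
def penalty(lat, lat_up, util, util_low, util_up, ram, ram_low, ram_up):
    if lat < lat_up:
        return 0                     # no latency violation, no penalty
    if util <= util_low or ram <= ram_low:
        return 1                     # over-provision: resources under-utilized
    if util >= util_up or ram >= ram_up:
        return 1                     # under-provision: resources saturated
    return 0
```

For instance, a load of 1000 requests spread over 20 VMs yields a latency of 50 units, and a violation paired with 10% CPU utilization against a 20% lower threshold is flagged as an over-provisioning penalty.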

Analyzing latency violations is significant in two ways. First, users gain insight into the decision-making process and policies adopted by the models; this helps them understand the factors that contribute to latency violations and guides the design of better policies and decision-making strategies to minimize or prevent such violations. Second, latency violations can significantly impact user experience, system responsiveness, and overall resource utilization. By understanding how each model contributes to latency violations, it becomes possible to optimize resource planning and utilization strategies, including determining the appropriate number of VMs, CPU and RAM allocations, and scaling policies to meet the desired performance targets while minimizing latency violations.

4.5 Encoding DTMC and MDP models

Fig. 4: Free parameter constants

Fig. 5: Modules to generate CPU utilization, RAM utilization and response time for the cloud system environment

The PRISM model checker has been utilized to encode the formalized cloud scaling process in the PRISM high-level language. The structure of the proposed DTMC and MDP models comprises several elements: free and fixed parameter constants, generator modules, a scaler module, and reward components.

The free parameters are constants (i.e., variables in the PRISM language) that receive their values through user input, while fixed parameters are assigned a specific value. The generator modules randomly provide the number of loads (i.e., user requests for the cloud system) as well as the CPU utilization, RAM utilization, latency, and response time of the cloud system before cloud scaling is performed. The scaler module consists of the state transitions that represent the condition of the cloud system during the scaling process. The state of the model holds the current number of VMs (starting with the initial number of VMs input by the user) as well as the CPU utilization, RAM utilization, and latency associated with that number of VMs. The transition between states is controlled by a probability p, which has been assigned the value 0.5. The reward components of this model are encoded as state-based rewards to capture latency, CPU utilization, allocated RAM, and response time. In addition, we modeled a reward that penalizes the model if a latency violation occurs. The DTMC-based cloud scaling process terminates once the maximum time step is reached, while the MDP can terminate either when no scaling action needs to be triggered (it only has to maintain the current number of VMs) or when it reaches the maximum time step.

Figure 4 shows the constants that are treated as free parameters. The parameters related to VMs are minVM, i_vm, and maxVM, which represent the minimum number of VMs allowed for the system, the initial number of VMs allocated before the scaling process starts, and the maximum number of VMs allowed for the system, respectively. The parameter that holds the number of user requests is i_load. The parameters that represent time-based information are maxRt, maxLat, and maxStep, which refer to the response time threshold, the latency threshold, and the maximum time step of the model.

Fig. 6: Variables in the Scaler module

Fig. 7: Sample commands in the Scaler module

In Fig. 5, we present the generator modules. One of these modules, cpu_generator, is responsible for generating random CPU utilization values that reflect the current system load; the initial CPU utilization is obtained from user input. In addition to assigning random CPU utilization to the models, this module updates the utilization variable with the new value after the scaling process occurs. Another module, ram_generator, serves a similar purpose by generating random RAM utilization values; it likewise provides random utilization and updates the variable with the reduced utilization to signify that scaling has taken place. The final generator module, rt_generator, randomizes and updates the response time of the model, just like the other generator modules. It is important to note that CPU and RAM utilization are assumed to be represented in percentages and megabytes, respectively, while response time is measured in milliseconds.
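The behavior of the generator modules can be sketched as follows. This is our illustrative Python stand-in for the PRISM modules, not the paper's code; the class name, value ranges, and seeds are assumptions.

```python
import random

# Sketch of a generator module: it seeds the model with a random metric value
# and later overwrites that value once a scaling step has taken effect.
class Generator:
    def __init__(self, low, high, initial, rng):
        self.low, self.high, self.value, self.rng = low, high, initial, rng

    def randomize(self):
        """Provide a fresh random value for the metric (e.g., CPU utilization)."""
        self.value = self.rng.uniform(self.low, self.high)
        return self.value

    def update(self, new_value):
        """Overwrite the metric after a scaling action has been applied."""
        self.value = new_value

rng = random.Random(7)
cpu_generator = Generator(0.0, 100.0, 50.0, rng)     # percent
ram_generator = Generator(0.0, 4096.0, 2048.0, rng)  # megabytes
rt_generator = Generator(0.0, 500.0, 180.0, rng)     # milliseconds
```

Each of the three instances plays the role of cpu_generator, ram_generator, and rt_generator in Fig. 5, with the units noted in the comments.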

Fig. 8: Formulas to update latency, CPU utilization, RAM utilization and response time

Fig. 9: Reward components for latency, CPU utilization, RAM utilization and response time

Fig. 10: Reward component for latency violations

Figures 6 and 7 illustrate the variable components and a set of commands implemented within the Scaler module. Both proposed models, the DTMC- and MDP-based cloud scaling processes, are designed to represent the same scaling behavior. The Scaler module encompasses several scaling policies, each consisting of rules for analyzing state information, including the current number of virtual machines (VMs), CPU utilization, latency, and response time, along with possible scaling actions. Additionally, several variable components capture information that represents the state of the models. The sample commands, written in the PRISM language, can be referenced in Fig. 7; note that the code snippet in the figure represents only a portion of the modeled policies. These rules determine whether the formalized cloud system is currently allocated the minimum allowable number of VMs while experiencing CPU utilization between 60% and 100% together with high latency or response time. When the model's state satisfies these rules, it triggers the provisioning of 1 or 2 additional VMs. Consequently, several variables are updated, including the number of VMs, latency, CPU utilization, RAM utilization, response time, time step, load, and load counter. All components of the rule implementation apply to both models presented in this paper; however, the condition on act applies exclusively to the MDP, ensuring that its decisions also consider a potential scaling action to improve the model's state. Furthermore, during each state transition, the values of latency, response time, and CPU and RAM utilization are updated by computing the formulas shown in Fig. 8.
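One such scale-out rule can be rendered in Python rather than PRISM. The 60-100% CPU band and the 1- or 2-VM provisioning follow the text, and latency is recomputed per Eq. 2; the helper name, the state dictionary layout, and the utilization rebalancing formula are our assumptions.

```python
# Sketch of a single Scaler-module rule: at the minimum VM allocation with a
# busy CPU and high latency or response time, provision 1 or 2 extra VMs and
# refresh the dependent state variables.
def scale_out_rule(state, max_lat, max_rt, extra_vms):
    """Apply the rule; returns the updated state dict, or None if it does not fire."""
    fires = (state["vms"] == state["min_vm"]
             and 60 <= state["util"] <= 100
             and (state["lat"] > max_lat or state["rt"] > max_rt))
    if not fires:
        return None
    assert extra_vms in (1, 2)                        # 1- or 2-step provisioning
    new_vms = state["vms"] + extra_vms
    new = dict(state, vms=new_vms, step=state["step"] + 1)
    new["lat"] = state["load"] / new_vms              # recompute latency (Eq. 2)
    new["util"] = state["util"] * state["vms"] / new_vms  # assumed rebalancing
    return new

state = {"vms": 1, "min_vm": 1, "util": 80, "lat": 900, "rt": 100,
         "load": 900, "step": 0}
```

With the sample state above, a latency of 900 against a 400-unit threshold fires the rule, and provisioning two extra VMs drops the latency to 300.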

4.6 Encoding reward and properties

Table 3 Properties

The reward components in both models were encoded as in Figs. 9 and 10. Reward components that capture latency, response time, CPU, and RAM utilization are associated with the number of VMs. Meanwhile, the reward component that captures latency violations has been encoded as discussed in Sect. 4.4.

Typically, there exists an inverse correlation between latency and the number of VMs allocated in a cloud system. As the number of VMs allocated in a cloud system increases, the latency in processing the load tends to decrease. This correlation arises due to the system’s capacity to distribute workloads across multiple VMs, enabling parallel processing and reducing the time required to complete tasks. Furthermore, CPU and RAM utilization may also exhibit an inverse correlation with the number of VMs, particularly when the scale-out process is initiated. Allocating more resources to the formalized cloud system often leads to an overall reduction in system utilization since the processing load can be adequately supported by the newly added VMs. In addition, the increase in the number of VMs indirectly impacts response time. This occurs because a larger number of workloads can be distributed within the system, resulting in higher CPU utilization, which, in turn, affects response time. For instance, when CPU utilization approaches or reaches 100%, it indicates a heavily loaded CPU, which can potentially introduce delays in processing requests. Consequently, the response time for task execution or request servicing may become prolonged.

Meanwhile, Table 3 presents the encoded probabilistic properties assigned to specific requirement IDs, namely SR1 through SR6, for the purpose of verification. The cumulative duration, denoted as "k," can be arbitrarily assigned any value (e.g., 10, 100, 1000) to encompass results within the specified temporal units. A longer duration allows the model to capture an extended series of behaviors throughout the designated process. In order to analyze the models using the verification process, expected reward properties were employed, and these properties were encoded using PCTL, a formalism that facilitates the verification of reward-based properties [10].

Specifically, SR1 to SR3 aim to elucidate the latency, CPU utilization, and RAM utilization experienced by the proposed models when they operate in a state where the current number of VMs is at or below the maximum threshold. The maximum number of VMs is obtained from the user at the outset of the model-checking procedure. SR4 captures the response time resulting from the cloud scaling behaviors when CPU utilization is less than or equal to 100%. Lastly, SR5 and SR6 describe the frequency of latency violations resulting from scaling behaviors in the models under under-provisioning and over-provisioning scenarios, respectively.

5 Assessment

Our assessment goals are threefold. First, we examine the complexity of the formalized cloud scaling process using the model-checking approach. Second, we investigate the performance of the formalized cloud scaling policies by verifying the proposed models against the specified properties; in this study, we focus on four performance metrics: response time, latency, CPU utilization, and RAM utilization. Third, we evaluate the robustness of the formalized cloud scaling process by introducing variations in parameter values to the proposed models and simulating the process with three different sample sizes.

This assessment addresses the following research questions:

  • RQ1: Which formalized cloud scaling process has greater complexity in terms of implementation?

  • RQ2: How do the verified cloud scaling policies impact the performance metrics?

  • RQ3: Which formalized cloud scaling process is more robust in facing various sizes of datasets?

5.1 Assessment environment setup

We employed the PRISM model checker tool to carry out model checking and verification tasks. These experiments were conducted within a virtual machine running a 64-bit Windows Server 2016 operating system, equipped with octa-core processors and 64 GB of RAM.

Table 4 Configuration for model checking
Table 5 Configuration for performance evaluation
Table 6 Configuration for sensitivity analysis

To address RQ1, we initiated the model-checking process on the two models to ensure their correctness and the absence of deadlocks before verification. The process is performed under the configuration presented in Table 4; these parameters are described in Sect. 4.5. Whenever a deadlock was detected, we verified a PCTL property, filter(print, "deadlock"), to identify the problematic state, and subsequent adjustments were made to rectify the affected states. Once we confirmed the models were error-free and devoid of deadlocks or state-explosion issues, we proceeded with the verification process.

In addressing RQ2, we conducted experiments based on the configurations detailed in Table 5. These configurations differed primarily in workload patterns, with the "Normal" setting representing workloads during regular hours (with user request arrivals ranging from 100 to 1000) and the “Peak” setting simulating peak-hour workloads (with user request arrivals ranging from 10,000 to 10,900). Six other parameters remained constant across both settings. We assumed an initial allocation of 20 virtual machines (VMs) for the cloud system, with a minimum of one VM required for basic processing and a maximum limit of 100 VMs to encourage resource optimization. These configurations have been simulated with 1000 samples generated by the PRISM model checker.

To address RQ3, we conducted sensitivity analyses on our two models. First, we examined the sensitivity of cloud scaling behavior by varying parameter values in our proposed DTMC and MDP models; the specifics of this experiment are outlined in Table 6. The parameters under scrutiny included the response time and latency thresholds, as well as the initial and minimum number of VMs allocated to the system. We altered one parameter at a time: for example, to assess the model's sensitivity to the minimum number of VMs, we varied the minVM parameter between 1 and 20 while keeping the other parameters at the values specified in the Initial Value column. Second, we further evaluated the sensitivity of our two proposed models by varying the number of samples used for simulating the cloud scaling process. PRISM employs simulation-based methods to estimate probabilistic properties, and the number-of-samples parameter dictates how many simulations the tool conducts to obtain statistical approximations. Increasing the number of samples enhances accuracy but also prolongs the analysis, so selecting an appropriate number of samples is crucial for balancing accuracy and computational efficiency. In this experiment, we ran simulations with 1000, 5000, and 10,000 samples.
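The one-at-a-time sweep described above can be sketched generically. The sweep structure follows Table 6's setup, but the `toy_model` function and the concrete parameter values here are illustrative stand-ins for the PRISM models, not the paper's artifacts.

```python
# One-at-a-time sensitivity analysis: vary a single parameter across its range
# while the others stay at their initial values, recording the resulting metric.
def sensitivity_sweep(model, initial, sweeps):
    """For each parameter, evaluate `model` over its range with the rest fixed."""
    results = {}
    for name, values in sweeps.items():
        results[name] = []
        for v in values:
            params = dict(initial, **{name: v})  # only one parameter varied
            results[name].append((v, model(params)))
    return results

# Illustrative stand-in model: average latency violation magnitude shrinks
# as the minimum number of VMs grows (latency = load / VMs, per Eq. 2).
def toy_model(p):
    return max(0.0, p["load"] / p["min_vm"] - p["max_lat"])

initial = {"min_vm": 1, "load": 1000, "max_lat": 400}
sweeps = {"min_vm": range(1, 21)}    # vary minVM between 1 and 20, as in the text
out = sensitivity_sweep(toy_model, initial, sweeps)
```

Replacing `toy_model` with a call into a PRISM simulation would reproduce the experiment design of Table 6 for any of the six parameters.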

5.2 Complexity of formalized cloud scaling process

Table 7 Model-checking results

To answer RQ1, the two formalized processes for scaling cloud services have been evaluated against the deadlock property specified in the PCTL expression. The results of the model checking process are presented in Table 7. Both models were configured with identical values for various parameters, including the initial number of VMs, the maximum and minimum number of VMs, maximum latency, and the number of loads (i.e., arrival requests). The DTMC necessitates reaching the maximum time step for scaling process termination, while the MDP offers flexibility, allowing termination based on the associated scaling action value or reaching the maximum time step.

The DTMC demonstrated higher complexity than the MDP, as evident from the time taken to explore reachable states and construct the model. Both models have the same number of states; however, the number of transitions in the MDP model is greater because its decisions are influenced by the non-deterministic scaling actions. Moreover, the MDP exhibited faster analysis of all feasible states that satisfied the investigated properties. This discrepancy arose from the termination conditions of the scaling processes: the MDP-based process terminated when the no-operation scaling action was executed or when the maximum time step was reached, whereas the DTMC-based process ended only when the maximum time step was reached. In terms of resources, both models were built upon the same amount of Colorado University Decision Diagram (CUDD) memory and Java heap. CUDD memory refers to the memory allocated specifically for the CUDD library to build and store Binary Decision Diagrams (BDDs), while the Java heap is the portion of memory allocated by the Java Virtual Machine (JVM) to store Java objects and data dynamically created during the execution of PRISM.

5.3 Performance of verified cloud scaling policies

Fig. 11: Performance analysis results across normal and peak-hour scenarios

In this experiment, we introduced two ranges of user requests to the formalized cloud scaling process, representing workload patterns during normal hours and peak hours. The two scaling policies were verified against the PCTL properties introduced in Sect. 4.6 and evaluated in terms of response time, latency, CPU utilization, and RAM utilization to answer RQ2. The results are presented in the form of kernel density estimation (KDE) graphs to describe the density of the random output values.

The impact of the scaling process on response times is depicted in Fig. 11a. The decision taken by the DTMC model demonstrates superior performance in minimizing response times in both scenarios. The model’s response times fall within a narrow range, approximately spanning from 176 to 179 milliseconds, with a distribution density between 0.9 and 1.0. In contrast, the response times generated by the MDP decision, while not as low as those achieved by DTMC, exhibit a wider range. The MDP model captures response times ranging from approximately 178 to 183 milliseconds, maintaining a lower distribution density, around 0.5. These differences arise from the deterministic nature of transitions in the DTMC model, ensuring consistency, whereas the MDP’s probabilistic transitions introduce variability, resulting in less consistent response time measurements. Moreover, in a cloud environment, response times are not solely contingent on the volume of user requests but are also influenced by various other factors, such as the number of virtual machines (VMs) and their associated resources (e.g., CPU and RAM). If the system is adequately provisioned with resources to handle increased user requests, the response time may not experience a significant increase. However, inadequate adjustments in resource allocation to meet demand can adversely impact response times.

Figure 11b illustrates that the DTMC-based cloud scaling decision results in higher latency compared to the MDP in both scenarios. In normal hour settings, DTMC records latencies ranging from 200 to 400 milliseconds, while the MDP registers latencies of less than 200 milliseconds. During peak hour settings, DTMC records latencies spanning from 800 to 1400 milliseconds, whereas the MDP captures latencies ranging from approximately 150 to 450 milliseconds. The increase in latency values is influenced by the growing number of loads introduced to the models and the number of VMs allocated for the model’s execution.

In Fig. 11c, we can observe the distribution of CPU utilization values captured by the models in both scenarios. During the normal hour setting, DTMC captures utilization levels ranging from approximately 40% to 50%, while the MDP records utilization levels varying from 30% to 70%. Conversely, in the peak hour setting, the MDP captures lower utilization values, approximately between 35% and 45%, while DTMC registers levels ranging from approximately 40% to 50%. From this analysis, we deduce that the high utilization observed during normal hours is attributed to over-provisioning events. In contrast, the lower utilization during peak hours signifies that the system now possesses more capacity to accommodate incoming requests, leading to a more even distribution of the workload across the available resources.

Concerning RAM utilization, Fig. 11d demonstrates that the decision-making process based on MDP outperforms DTMC in both settings. The choices made with MDP managed to achieve a minimum utilization of 1500 to 1600 megabytes, whereas DTMC captured values as low as 1600 to 1700 megabytes. Since RAM is a secondary resource influenced by the scaling decisions, its data distribution pattern does not differ significantly from the CPU utilization distribution for both models in the two settings.

Fig. 12: Sensitivity analysis across different sample sizes for over-provision events

5.4 Robustness of formalized cloud scaling process

The robustness of our proposed models can be assessed by examining their sensitivity to changes in parameter values. The results are available in Fig. 12 for over-provision events and Fig. 13 for under-provision events. In broader terms, our analysis reveals that the DTMC model displays higher sensitivity toward over-provisioning events, whereas the MDP model exhibits greater sensitivity toward under-provisioning events.

Figure 12 offers insights into our models’ sensitivity across six parameters and three distinct sample sizes, particularly from an over-provision perspective. Despite both the cloud scaling processes in DTMC and MDP being constrained by a maximum time step, MDP exhibits superior performance in minimizing average latency violations compared to DTMC. This advantage stems from MDP’s ability to consider various scaling actions within its decision-making framework, allowing it to select scaling decisions (i.e., model paths) that can effectively balance latency, CPU utilization, and RAM utilization. Moreover, when we evaluated the models using three sample sizes (i.e., 1000, 5000, and 10,000), we observed two distinct sensitivity patterns. These patterns can be categorized as either positive or negative sensitivity, indicating how the models respond to variations in parameter values.

In the case of DTMC, it exhibits positive sensitivity to variations in three parameters: the minimum number of VMs, RAM utilization, and maximum latency. Conversely, it shows negative sensitivity to changes in CPU utilization, initial VMs, and response time thresholds. On the other hand, MDP demonstrates positive sensitivity to variations in CPU utilization, maximum latency, response time thresholds, and the minimum number of VMs. In contrast, MDP shows negative sensitivity to changes in RAM utilization and initial VMs. Regarding sample size, when the number of samples is increased, DTMC experiences a significant increase in average latency violation when varying the minimum number of VMs. In the case of MDP, a substantial decrease is observed when initial VMs are varied. The sensitivity of these models is rooted in their inherent characteristics and how they handle certain parameters. DTMC’s sensitivity to the minimum number of VMs arises from its impact on the model’s state space, crucial for capturing key system behaviors. Conversely, MDP places greater importance on the initial number of VMs, as it significantly influences early decision-making and, consequently, the long-term behavior of the system.

Figure 13 illustrates the sensitivity of the models when viewed from the perspective of under-provisioning. The sensitivity of both models is notably influenced by variations in the initial number of VMs. As the simulation employs a larger number of samples, the average latency violation consistently decreases. Under-provisioning in these models occurs when latency exceeds the maximum threshold, and CPU or RAM utilization surpasses their upper thresholds. This situation indicates that the system possesses fewer VMs than required to effectively handle the workload. When the initial number of VMs is set too low, the system begins with insufficient resources, resulting in elevated latency and performance challenges. Consequently, fluctuations in the initial number of VMs can have a significant impact on the models’ sensitivity, leading to noticeable variations in system behavior. Expanding the number of samples in the simulations enhances the accuracy and comprehensiveness of the findings. In the context of under-provisioning, a larger sample size enables the model to encompass a broader spectrum of scenarios and workload patterns. With an expanded dataset, the model gains greater proficiency in exploring diverse decision paths and evaluating the consequences of under-provisioning. Consequently, the average latency violation tends to decrease, as the model possesses more comprehensive data, empowering it to make informed decisions and optimizations to mitigate under-provisioning challenges.

Fig. 13: Sensitivity analysis across different sample sizes for under-provision events

6 Discussion

Through our experimental study, we have gained valuable insights by assessing the cloud scaling process using two modeling techniques, DTMC and MDP, with probabilistic model checking. Each technique offers unique advantages for different aspects of cloud systems and holds practical significance for industry.

6.1 Advantages of DTMC for verifying cloud scaling process

Model checking with DTMC offers several advantages that are worth acknowledging when utilizing it for verifying the cloud scaling process.

  1. Capturing Time Dependency: The DTMC allows modeling the cloud scaling process with a maximum timestep, capturing time-dependent behavior, which is essential when time plays a critical role in scaling decisions.

  2. Consistent Response Time: The DTMC-based cloud scaling process demonstrates more consistent response time measurements compared to the MDP. The deterministic nature of transitions ensures response time predictability, which is valuable in certain applications.

  3. Lower Latency: The DTMC-based process experiences lower latency compared to the MDP-based process. Simplified strategies result in faster actions and lower decision-making overhead, benefiting time-sensitive systems.

  4. Lower CPU Utilization: The DTMC-based process captures lower CPU utilization compared to the MDP, conservatively utilizing resources and minimizing consumption in priority scenarios.

  5. Stability and Predictability: The DTMC shows lower sensitivity to certain parameters during under-provision and over-provision events, providing a more stable and predictable system behavior.

  6. Simplicity in Sensitivity Analysis: The DTMC’s deterministic nature and simplified strategies facilitate a straightforward sensitivity analysis, improving computational efficiency.

  7. Handling Over-provision Events: The DTMC demonstrates lower sensitivity to initial VM allocation during over-provision events, avoiding potential over-provisioning and wastage.

  8. Robustness with Smaller Sample Sizes: The DTMC-based cloud scaling process exhibits higher robustness when evaluated with smaller sample sizes, accurately capturing system behavior.

  9. Simple and Short-Term Decision-Making: The DTMC is well-suited for simple, short-term decision-making processes, offering a higher level of detail within a limited time horizon.

  10. Discrete State Space: The DTMC’s discrete nature allows it to efficiently operate in scenarios where continuous state spaces are not necessary, leading to more stable and predictable behavior in certain applications.

6.2 Advantages of MDP for verifying cloud scaling process

Model checking with MDP presents several advantages that are worth considering when utilizing it for verifying the cloud scaling process.

  1. Lower Complexity: The MDP-based cloud scaling process demonstrates lower complexity compared to the DTMC model. Faster analysis of feasible states and a simpler structure make MDP more efficient for certain scenarios.

  2. Faster Analysis: The MDP-based process exhibits faster analysis of all feasible states, crucial for real-time or time-sensitive systems.

  3. Smaller Number of Reachable States: The MDP model has a smaller number of reachable states, reducing computational overhead by accessing only states relevant to specific scaling actions.

  4. Flexible Termination Conditions: MDP’s termination condition allows fine-grained control over the scaling process, beneficial in situations requiring specific state or action achievements.

  5. Response Time and Latency Flexibility: The MDP-based process exhibits greater flexibility in capturing response time and latency, adapting to varying conditions through probabilistic exploration.

  6. Resource Utilization Adaptability: The MDP-based process demonstrates higher CPU utilization, effectively adjusting resource allocation to varying workloads.

  7. Flexibility in Sensitivity to Parameters: MDP shows higher sensitivity to parameters, adapting dynamically to changes in workload and scaling conditions through probabilistic exploration.

  8. Effective Handling of Under-provision Events: MDP better responds to sudden workload increases during under-provision events, accommodating higher demands and avoiding latency violations.

  9. Robustness with Larger Sample Sizes: The MDP-based cloud scaling process demonstrates higher robustness when evaluated with larger sample sizes, providing accurate evaluations of long-term outcomes.

  10. Complex Decision-Making: MDP excels in handling complex scenarios, effectively responding to dynamic environments with fluctuating workloads and varying resource demands.

  11. Long-Term Considerations: MDP’s continuous state space and probabilistic nature allow assessing long-term outcomes and optimizing resource utilization over extended periods.
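
The key structural difference underlying these advantages — nondeterministic action choice in an MDP versus a single fixed distribution in a DTMC — can be sketched in PRISM’s modeling language. The following is an illustrative toy model only; the constants, action names, bounds, and probabilities are our assumptions, not the model evaluated in this work:

```prism
mdp

const int MIN_VMS = 1; // assumed minimum number of VMs
const int MAX_VMS = 8; // assumed maximum number of VMs

module scaler
  vms : [MIN_VMS..MAX_VMS] init 2;

  // Each guarded command below is a nondeterministic choice left to the
  // scheduler; in a DTMC these branches would instead be merged into one
  // fixed probability distribution over successor states.
  [scale_up]   vms < MAX_VMS -> 0.9:(vms'=vms+1) + 0.1:(vms'=vms);
  [scale_down] vms > MIN_VMS -> 0.9:(vms'=vms-1) + 0.1:(vms'=vms);
  [hold]       true          -> (vms'=vms);
endmodule
```

When verifying such a model, PRISM resolves the nondeterminism by synthesizing best- and worst-case schedulers (e.g., via `Pmax=?` / `Pmin=?` queries), which is what gives the MDP formulation its flexibility in termination conditions and decision-making noted above.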

6.3 Practicality

Through this study, we can highlight several points of practical significance for industry, namely:

  • Comparing the complexity of the two models provides insight into the computational requirements and resource utilization of each approach, and understanding this complexity can help industry practitioners choose the most suitable model for their specific needs. For example, if one model offers similar performance with lower computational complexity, it can lead to cost savings and more efficient resource allocation in cloud scaling processes.

  • Analyzing the performance of policies derived from DTMC and MDP models allows us to assess how well each approach adapts to changing workload conditions. This evaluation is crucial for industries that require efficient and responsive cloud scaling mechanisms to meet varying demands while ensuring optimal resource allocation. The findings can help organizations make informed decisions about which modeling technique suits their specific requirements.

  • Assessing the robustness of both models through parameter tuning and testing across diverse sample sizes reveals their reliability and stability. Conducting sensitivity analysis with variations in parameter values and sample sizes enables model calibration and validation. This process helps identify parameter values that align the model more closely with real-world observations and historical data, demonstrating its capacity to handle fluctuations and uncertainties in practical scenarios. A more robust model instills greater confidence within the industry, affirming the scalability and adaptability of its cloud infrastructure.
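
As a sketch of how such parameter tuning can be automated, PRISM allows constants to be left undefined in the model and swept as an experiment while a property is checked. The constant names and the `latency_violation` label below are hypothetical, chosen only to mirror the parameters discussed in this study:

```prism
// Leave the tuned parameters undefined in the model file:
const int INIT_VMS; // initial number of VMs (swept in the experiment)
const int MIN_VMS;  // minimum number of VMs (swept in the experiment)

// Example property (in a separate .pctl/.props file): the maximum
// probability, under the worst-case scheduler, of reaching a state
// labelled "latency_violation" within 100 steps.
// Pmax=? [ F<=100 "latency_violation" ]
```

Supplying ranges on the command line (e.g., `-const INIT_VMS=1:5,MIN_VMS=1:3`) then evaluates the property over the whole parameter grid, which is one way the sensitivity analysis described above can be carried out systematically.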

7 Conclusion

In this research, we have conducted an extensive analysis of cloud scaling processes using the probabilistic model checker PRISM, focusing on the comparison of two distinct modeling techniques: DTMC and MDP. Our primary aim was to assess the effectiveness of different cloud scaling policies modeled separately with DTMC and MDP. By leveraging PRISM’s simulation capabilities, we obtained valuable insights into the performance and robustness of these policies. The utilization of PRISM for simulation allowed us to explore and quantify the efficiency of cloud scaling policies in the context of infrastructure-as-a-service (IaaS). In addition, we limited our evaluation to arbitrary (synthetic) data, which allowed us to create controlled scenarios for assessing model behavior and performance under specific conditions. This control is valuable for understanding each model’s strengths and weaknesses.

Furthermore, our findings offer insights into the strengths and limitations of each modeling technique, providing valuable guidance for practitioners aiming to implement effective cloud scaling strategies. In our evaluation of both MDP and DTMC models, we identified two parameters that exerted significant influence on model sensitivity during over-provisioning and under-provisioning events: the minimum number of VMs and the initial number of VMs, respectively. Interestingly, we observed that increasing the number of samples had distinct effects on the performance of the two models. For DTMC, larger sample sizes led to a higher average latency violation during over-provisioning events, yet reduced this violation when evaluated against under-provisioning events. In contrast, MDP consistently minimized average latency violation for both extreme events across various sample sizes.

Moving forward, there are several promising opportunities for further research in this domain. First, expanding the scope of analysis to include additional cloud scaling processes (e.g., vertical scaling, container-based scaling, predictive scaling) and policies would enhance the applicability and generalizability of our findings. Moreover, exploring probabilistic model checking tools and techniques beyond PRISM could provide alternative perspectives on cloud scaling behavior. Furthermore, conducting experiments with real-world data could offer a more realistic assessment of cloud scaling policies’ performance. Finally, investigating the scalability and adaptability of the proposed approach in multi-cloud and hybrid-cloud environments would be a valuable direction for future research, as such scenarios pose unique challenges that demand tailored solutions for efficient cloud resource management.