1 Introduction

Modern software systems operate in a complex ecosystem of protocols, libraries, services, and execution platforms that change over time in response to new technologies, repair activities (due to faults and vulnerabilities), varying availability of resources and services, and reconfiguration of the environment. Predictability is very hard to achieve, since modern software-intensive systems are often situated in complex ecosystems that can be hard or even impossible to fully understand and specify at design time. In particular, these systems are exposed to multiple sources of uncertainty that can arise from specifications that are ambiguous or not completely known before the system is running. Common examples of applications affected by uncertain and probabilistic behavior are control policies in robotics, speech recognition, security protocols, and service-based web applications (e.g., e-commerce, e-health, online banking, etc.). In this latter case, highly dynamic and changing ecosystems influence a workflow of interacting distributed components (e.g., web services or microservices) owned by multiple third-party providers with different Quality of Service (QoS) attributes (e.g., reliability, performance, cost, etc.). Testing is the most common validation technique, since it generally represents a lightweight and vital process to establish confidence in the developed software systems. Nevertheless, there is little work in the scientific community that focuses on executable testing frameworks for uncertain systems, with notable exceptions in the context of cyber-physical systems [11, 12].

As part of our ongoing research activity on testing under uncertainty [2, 3], this paper introduces HYPpOTesT: a model-based HYPOthesis Testing Toolkit for service-based web applications that considers uncertainty as a first-class concern. Specifically, we focus on statistical hypothesis testing [5] of uncertain QoS parameters of the System Under Test (SUT), modeled by a Markov Decision Process (MDP) [7]. MDPs are a widely adopted formalism for modeling systems exhibiting both probabilistic and nondeterministic behavior. As described in [4], hypothesis testing (differently from functional testing) is a fundamental activity to assess whether the frequencies observed during model-based testing (MBT) processes correspond to the probabilities specified in the model. Thus, HYPpOTesT has been tailored to deal with the uncertainty quantification problem [3] by means of hypothesis testing while executing a model-based exploration of the SUT. The MDP specification, along with assumptions on the uncertain QoS parameters, guides the automatic generation of test cases so that the probability of stressing the uncertain components of the application is maximized. Testing feeds a Bayesian inference [5] process that calibrates the uncertain parameters depending on the observed behavior. Bayesian inference is used to compute the Posterior density function associated with specific uncertain parameters \(\theta \) of the MDP model. As described in [5], Bayesian inference is an effective technique used to update beliefs about \(\theta \). A Prior density function (or simply Prior) is the probability distribution that expresses one's beliefs about the \(\theta \) parameters before any evidence (i.e., experimental data) is taken into account.
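
In its standard (textbook) form, the update of the belief about \(\theta \) given observed evidence \(D\) reads

\[
p(\theta \mid D) \;=\; \frac{p(D \mid \theta )\, p(\theta )}{\int p(D \mid \theta ')\, p(\theta ')\, d\theta '} \;\propto\; p(D \mid \theta )\, p(\theta ),
\]

where \(p(\theta )\) is the Prior, \(p(D \mid \theta )\) the likelihood of the observations, and \(p(\theta \mid D)\) the Posterior. Section 3 describes how HYPpOTesT instantiates this rule with conjugate Dirichlet Priors.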

This paper focuses on engineering aspects of HYPpOTesT, such as design and implementation concerns. To sum up, our toolkit supports: (i) modeling of a service-based web application (SUT) in terms of an MDP using a simple domain-specific language; (ii) explicit elicitation of the uncertain QoS parameters using Prior probability distributions; (iii) automatic generation, execution, and evaluation of test cases using an uncertainty-aware (on-the-fly) model-based testing algorithm. Throughout the paper, we adopt an uncertain web-based e-commerce application, U-Store (Uncertain Store), as a running example to illustrate the main features of the toolkit and as an exemplar for a preliminary validation of our testing approach.

Related Work. In [1] MDPs are used to model systems exhibiting stochastic failures. The proposed approach aims at finding input-selection (i.e., testing) strategies that maximize the probability of hitting failures. The approach introduced in [10] is based on machine learning and aims at inferring a behavioral model of the SUT in order to select those tests which the inferred model is “least certain” about. Results suggest that such test case generation outperforms conventional random testing. In [12] test case generation strategies based on uncertainty theory and multi-objective search are proposed in the context of cyber-physical systems. Results in this work showed that this test strategy increases the likelihood of observing more uncertainties due to unknown behaviors of the physical environment. In [11] a testing method that takes into account uncertainty in timing properties of embedded software systems is proposed. This method improves the fault detection effectiveness of test suites derived from timed automata compared to traditional testing approaches. Summarizing, there are few and recently defined approaches that deal with testing driven by uncertainty awareness; notable examples have been briefly described above. The topic definitely needs further investigation in the area of service-based applications, where QoS attributes are influenced by highly dynamic and uncertain ecosystems.

This paper is organized as follows. Section 2 introduces the running example; Sect. 3 describes our testing toolkit; Sect. 4 reports some experimental results of a preliminary validation of our toolkit; and Sect. 5 concludes the paper.

Table 1. Services composing the U-Store web-based application.

2 Running Example: The U-Store Web Application

U-Store consists of a number of services that implement specialized pieces of functionality and interact with each other using HTTP resource APIs. Table 1 lists the services of U-Store and provides a brief description of each of them. From the user perspective, the application behavior can be viewed as a number of functional statuses (or states), each of them with a number of feasible inputs that cause services to execute specific tasks. Service tasks generate outputs and allow the current state to be changed accordingly.

In this context, both functional and non-functional quality attributes of the web application depend on parameters (e.g., performance, bandwidth, available memory, etc.) that are typically subject to different sources of uncertainty (e.g., job arrivals, fault tolerance, scalability, etc.) [6]. As an example, suppose the user navigates U-Store towards the Checkout web page. After selecting the payment method, the user submits the buy request. At this stage, U-Store asks the external \(\mathtt {payment}\) service to execute the proper task. The outcome of this operation is inherently influenced by several sources of uncertainty (such as those mentioned above); from the user perspective, uncertainty manifests as common types of failure or undesired behavior upon the buy request, such as unexpected errors and/or high latency.

3 The Hypothesis Testing Toolkit

HYPpOTesT (see Fig. 1) is tailored to perform hypothesis testing of uncertain service-based web applications by combining uncertainty-aware MBT and Bayesian inference. A description of the three major components follows.

Fig. 1. Main components of the HYPpOTesT toolkit.

Modeler. The modeler is an Eclipse IDE plugin that allows the MDP specification to be created using a textual Domain Specific Language (DSL). As described in [7], an MDP model is composed of finite sets of states, transitions, and actions. Transitions between states occur by taking a nondeterministic choice among the actions available from the current state and then a stochastic choice of the successor state, according to a partial probabilistic transition function. Figure 2 reports an extract of the U-Store MDP design-time model written in our DSL. The keywords reflect the structural elements of the model: \(\textsf {actions}\) (line 2), \(\textsf {states}\) (line 6), and \(\textsf {arcs}\) (line 14). Figure 3 contains a visual representation of this MDP extract. It is worth noting that upon the \(\textsf {submit}\) action from state \(S_6\) (\(\textsf {readyToPay}\)) we can have multiple responses from the system. Each of them is associated with a different probability value reflecting our assumption about the behavior of the \(\mathtt {payment}\) service, typically based on past experience or previous studies. To express uncertainty on the assumptions, the DSL allows Prior probability density functions to be specified (e.g., line 10). In particular, modelers use Dirichlet distributions as conjugate Priors for the uncertain transition probabilities of the MDP model. In fact, as described in [5], the Dirichlet distribution \(Dir(\alpha )\), with concentration parameter vector \(\alpha =(\alpha _{1}, \ldots , \alpha _{n})\), is the natural conjugate prior of the categorical distribution. Priors are used to express uncertainty on model parameters describing QoS attributes, such as reliability of the services or the communication channels, response time, and cost in terms of resource usage or energy consumption.
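
For reference, and using standard notation rather than the toolkit-specific one, the density of a Dirichlet Prior placed over the vector \(\theta =(\theta _1,\ldots ,\theta _n)\) of outgoing transition probabilities of an uncertain action is

\[
Dir(\theta \mid \alpha ) \;=\; \frac{\Gamma \left(\sum _{i=1}^{n}\alpha _i\right)}{\prod _{i=1}^{n}\Gamma (\alpha _i)}\; \prod _{i=1}^{n}\theta _i^{\alpha _i-1}, \qquad \theta _i \ge 0,\quad \sum _{i=1}^{n}\theta _i = 1,
\]

so that larger concentration parameters encode stronger prior confidence, and the Prior mean of each transition probability is \(\mathbb {E}[\theta _i] = \alpha _i / \sum _{j=1}^{n}\alpha _j\).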

To make our framework able to carry out model-based generation and execution of tests, the structural elements in the model are bound to components in the SUT, as informally sketched in Fig. 3. Such a binding is defined by the modeler at design time. MDP actions are mapped to controllable actions, while arcs are mapped to observable events. Controllable actions are user inputs supplied to the application through the available web UI. A wide range of inputs typically seen in web applications is supported by our DSL, such as clicking on different UI elements (e.g., links, buttons, checkboxes, etc.), filling in text fields, submitting forms, navigating back and forth, and more. As an example, the \(\textsf {submit\_form}\) controllable action (line 4) allows a form to be submitted. Its arguments specify the form id, a timeout, and the id of a specific UI element which contains the result of the executed task. Each arc in the model is mapped to an observable event (e.g., line 22), that is, an arbitrary Java boolean expression, where we typically make assertions on the resulting UI element (i.e., a \(\mathtt {WebElement}\) object of the Selenium [8] library package \(\mathtt {org.openqa.selenium}\)). After the execution of a controllable action, HYPpOTesT waits until one of the suitable observable events happens and performs the corresponding transition. These operations are executed by an automatically generated test harness (AspectJ instrumentation). The MBT module generates test cases using the MDP specification and makes them executable upon the SUT through the test harness; a sketch of what such an action/event pair might look like is shown below.
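
The following Java fragment is a minimal, hypothetical sketch (not the actual generated harness code) of how a \(\textsf {submit\_form}\) controllable action and an observable event could be realized on top of the Selenium WebDriver API, assuming a Selenium 4 style \(\mathtt {WebDriverWait}\) constructor; the element ids (buyForm, paymentResult) and the helper names are illustrative only.

    import java.time.Duration;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    // Hypothetical sketch: a controllable action and an observable event
    // realized with the Selenium WebDriver API. Ids and names are
    // illustrative and do not come from the generated harness.
    public class PaymentHarnessSketch {

        // Controllable action: submit a form and wait (up to timeoutSec)
        // for the UI element that will contain the result of the task.
        static WebElement submitForm(WebDriver driver, String formId,
                                     long timeoutSec, String resultId) {
            driver.findElement(By.id(formId)).submit();
            return new WebDriverWait(driver, Duration.ofSeconds(timeoutSec))
                    .until(ExpectedConditions.visibilityOfElementLocated(By.id(resultId)));
        }

        // Observable event: an arbitrary Java boolean expression over the
        // resulting WebElement (here, "the payment was confirmed").
        static boolean paymentSucceeded(WebElement result) {
            return result.isDisplayed()
                    && result.getText().contains("Payment confirmed");
        }
    }

In this sketch, submitForm("buyForm", 10, "paymentResult") would play the role of a controllable action selected by the exploration policy, while paymentSucceeded corresponds to one of the observable events that determine which outgoing arc of the model is taken.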

Fig. 2. U-Store MDP extract in our DSL.

Fig. 3. Visual representation of the U-Store MDP extract and of the mapping to SUT components.

Model-Based Testing Framework. The main components of the testing framework are: the MBT module, responsible for test case generation; the Selenium WebDriver, responsible for test case execution; and the Inference module, responsible for hypothesis testing.

The MBT module dynamically generates test cases from the MDP specification according to an uncertainty-aware test case generation strategy. Essentially, this strategy solves a dynamic decision problem [7] to compute the best exploration policy \(\pi ^*\) that returns, for each state s, the actions that maximize the probability of reaching the uncertain \(\theta \) parameters in the model. More technical details on this strategy can be found in [2]. Thus, the testing process stochastically samples the state space by choosing those inputs that allow the uncertain components of the SUT to be stressed. To this end, the test harness provides a high-level view of the SUT behavior matching the abstraction level of the MDP specification. Technically, it allows the actions selected by the exploration policy \(\pi ^*\) to be translated into valid inputs for the SUT by means of the Selenium WebDriver, which interacts directly with the web UI. At the same time, observable events provide a serialized view of the SUT behavior to keep track of the execution trace and extract meaningful data to perform hypothesis testing. From a theoretical perspective, the MBT module uses the test harness to conduct an input/output conformance game [2, 9] between the model and the SUT. During the conformance game, hypothesis testing is carried out by the Inference module, which incrementally updates the beliefs on the \(\theta \) parameters by using the Bayesian inference formulation: \(Posterior \propto Likelihood \cdot Prior\). In our context, the Prior and Posterior are conjugate distributions and the Posterior can be obtained by applying a very efficient updating rule [2, 5]. In fact, the Posterior is distributed as \(Dir(\alpha ')\), where \(\alpha ' = \alpha + (c_1,\ldots ,c_n)\) and \(c_i\) is the number of observations in category \(i\). At termination, the MBT module summarizes the Posteriors (by computing their mean values) and calibrates the \(\theta \) parameters; a minimal sketch of this conjugate update is given below.
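
The conjugate update itself is straightforward to implement. The following self-contained Java sketch (illustrative only, not the Inference module's actual code) adds the observed counts to the concentration parameters and summarizes the Posterior by its mean, which is exactly the rule stated above; the Prior values and counts are hypothetical.

    // Illustrative sketch of the Dirichlet conjugate update used for an
    // uncertain transition distribution: alpha' = alpha + counts, and the
    // Posterior mean alpha'_i / sum(alpha') summarizes each theta_i.
    public class DirichletUpdateSketch {

        public static void main(String[] args) {
            // Hypothetical Prior for the outcomes of the payment service
            // (e.g., success, error, timeout) -- values are illustrative.
            double[] alpha = {8.0, 1.0, 1.0};
            // Counts observed for each outcome during model-based testing.
            long[] counts = {942, 37, 21};

            double[] posteriorAlpha = new double[alpha.length];
            double sum = 0.0;
            for (int i = 0; i < alpha.length; i++) {
                posteriorAlpha[i] = alpha[i] + counts[i]; // alpha'_i = alpha_i + c_i
                sum += posteriorAlpha[i];
            }
            for (int i = 0; i < alpha.length; i++) {
                double mean = posteriorAlpha[i] / sum;    // E[theta_i | data]
                System.out.printf("theta_%d ~= %.4f%n", i + 1, mean);
            }
        }
    }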

Two termination conditions are currently supported by our testing framework: a traditional condition based on the number of executed tests, and a condition based on the convergence of the Bayes factor [5]. The latter, in particular, is a model selection method that allows the testing activity to be terminated when the \(\theta \) parameters do not substantially change during the inference process. By using this latter method, the Inference module decides when the inferred \(\theta \) parameters are strongly supported by the data under consideration.
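
For reference, the Bayes factor comparing two competing hypotheses \(H_1\) and \(H_0\) on the same data \(D\) is the ratio of their marginal likelihoods (this is the standard definition; the specific convergence threshold used by the toolkit is not detailed here):

\[
K = \frac{p(D \mid H_1)}{p(D \mid H_0)} = \frac{\int p(D \mid \theta _1, H_1)\, p(\theta _1 \mid H_1)\, d\theta _1}{\int p(D \mid \theta _0, H_0)\, p(\theta _0 \mid H_0)\, d\theta _0}.
\]

Values of \(K\) far from 1 indicate that the data strongly favor one hypothesis over the other.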

Testing UI. The UI allows information about hypothesis testing to be visualized for human consumption. Three different canvases in the main window show: the MDP model, the Posterior density functions, and a log produced by the MBT module. The MDP model canvas contains an animated visualization of the model. During testing, the UI highlights the current model state and the current action selected by the test case generation strategy. The Probability charts canvas displays the Posterior distributions so that the tester can see how the inference process updates the knowledge about the \(\theta \) parameters while testing goes on. The log canvas shows textual information generated by the MBT module for each uncertain parameter: the number of executed tests, the summarization of the Posterior density functions, and the Bayes factor.

Fig. 4. Inference of a \(\theta \) parameter.

Fig. 5. Inference effectiveness.

4 Evaluation

We are evaluating our testing framework by conducting a large testing campaign on the U-Store application. Here we briefly discuss some significant results and refer the reader to our implementation for the replicability of the presented data. To measure the effectiveness of HYPpOTesT, we artificially induced abnormal conditions due to sources of uncertainty. Namely, the services composing the U-Store application have been configured to simulate service degradation by means of failure rates and random delays. As an example, we forced the uncertain \(\mathtt {payment}\) component to have specific failure/error rates and we executed our testing framework starting from a wrong hypothesis (i.e., using an informative Prior having a relative error of 1.5). Figure 4 shows how the uncertain success rate associated with the \(\mathtt {payment}\) service varies during hypothesis testing of U-Store. The uncertainty-aware strategy maximizes the probability of testing the \(\mathtt {payment}\) component during model-based exploration. As evidence is collected, the Posterior knowledge is incrementally updated. The termination condition based on the Bayes factor allows the uncertain \(\theta \) parameter to be inferred with high precision (i.e., the order of magnitude of the Posterior relative error is \(10^{-2}\)) after executing \(\sim 8k\) tests.
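
The paper does not detail how the degradation was injected; as a purely illustrative sketch, a service stub could combine a failure rate and a bounded random delay along the following lines (class, method, and parameter names are hypothetical).

    import java.util.Random;

    // Purely illustrative: simulate service degradation through a failure
    // rate and a bounded random delay, as one possible way to induce the
    // abnormal conditions described above. Names and values are hypothetical.
    public class DegradedPaymentStub {

        private final double failureRate;   // probability of returning an error
        private final long maxDelayMillis;  // upper bound of the random delay
        private final Random random = new Random();

        public DegradedPaymentStub(double failureRate, long maxDelayMillis) {
            this.failureRate = failureRate;
            this.maxDelayMillis = maxDelayMillis;
        }

        public String pay() throws InterruptedException {
            // Random delay to emulate high latency.
            Thread.sleep((long) (random.nextDouble() * maxDelayMillis));
            // Bernoulli draw to emulate the configured failure rate.
            return (random.nextDouble() < failureRate) ? "ERROR" : "OK";
        }
    }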

We also compared the effectiveness of our uncertainty-aware test case generation strategy with two traditional model-based exploration strategies: a completely random walk approach and a history-based exploration approach. Figure 5 shows the accuracy in terms of Posterior relative error and Posterior Highest Probability Density (HPD) region width, very often used as a measure of the confidence gained after the inference activity (i.e., the smaller the region width, the higher the accuracy). This comparative evaluation assumes equal effort (i.e., 5k tests) spent using the different strategies. For each strategy, we started the hypothesis testing activity using a Prior with 0.5 relative error and 0.1 HPD region width. In our running example, the uncertainty-aware strategy used by HYPpOTesT always yields more precise inference. On average, we measured a Posterior relative error decreased by a factor of \(\sim 50\).
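
For clarity on the metrics (used here without formal definitions), and assuming \(\hat{\theta }\) is the Posterior point estimate (mean) and \(\theta ^*\) the true value injected into the SUT, the relative error is

\[
\varepsilon _{rel} = \frac{|\hat{\theta } - \theta ^*|}{\theta ^*},
\]

while the HPD region is the narrowest region containing a given fraction (e.g., 95%) of the Posterior probability mass; its width shrinks as the inference becomes more confident.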

5 Conclusion

In this paper we presented HYPpOTesT, a model-based testing toolkit for uncertain service-based web applications modeled in terms of MDPs. HYPpOTesT adopts an online MBT technique that combines test case generation guided by uncertainty-aware strategies and Bayesian inference. Namely, we focused on statistical hypothesis testing of uncertain QoS parameters of the SUT. The U-Store application was used throughout the paper to illustrate the features of the toolkit and as a validation benchmark.

As future work, we plan to study finer-grained uncertainty-aware testing methods in order to assess whether the confidence delivered by testing can be improved when looking at specific, uncertain model-based properties of interest. We also plan to enhance the toolchain supporting the proposed approach with the ability to perform sensitivity analysis, so that the observed uncertainty can be apportioned to different experimental factors (e.g., traffic conditions, frequency of requests, workload of services).