1 Introduction

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction [2]. There are three service models in the cloud: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). An application built or composed of a set of these services is called a Cloud Service Based Application (CSBA).

In a CSBA, a client can rent a cloud service or a set of cloud services from a single provider or from multiple providers to make up an application. The provisioning of these services relies on a Service Level Agreement (SLA) [3], which is a contract between the client and the service provider including a set of non-functional requirements called Quality of Service (QoS). In the case of SLA violations, costly penalty payments and adjustments to the contract or the underlying system may follow. For this reason, monitoring SLAs and detecting, predicting, and resisting violations is important, and it is all the more challenging in CSBAs since various providers are involved.

Although there is a lot of work in the literature on monitoring, detecting, and predicting SLA violations, most of the available systems rely on grid or service-oriented infrastructures, which are not directly compatible with Clouds due to differences in the resource usage model. Considerable efforts have been made lately to monitor and detect SLA violations in CSBAs, but these approaches are reactive, i.e., they can only detect a violation and take the necessary action once the violation has already happened.

Today, elasticity in Cloud computing allows the Cloud environment to assign, ideally automatically, a dynamic number of resources to a task, aiming to ensure that the amount of resources needed for its execution is indeed provided to the respective task. Whether elasticity in cloud computing is a replacement for monitoring SLA violations is a controversial research problem that is out of the scope of our research. From our perspective, however, systems providing elasticity today, such as Amazon CloudWatch and Amazon Auto Scaling, are specific to a single provider and do not fit CSBAs formed of cloud services from various providers. Moreover, these tools react to the symptoms of resource deficiency in the system rather than being proactive and predicting violations. What we propose, in contrast, is a paradigm for proactive learning from SLA violations in CSBAs.

Learning from history and from failures is an instinct of human beings. We learn from the historical events and failures we face in our lives, building up rules in our minds that adapt to new contexts, proactively anticipate future events, and help us avoid failure. This idea is at the core of proactive event processing and artificial intelligence research. For example, if our mind learns that an event A followed by an event B leads to a failure C, the next time we face A followed by B we proactively react and anticipate its consequences.

The question that arises here is: can we adapt the concept of learning from history and building proactive rules to the context of SLA violations in CSBAs?

In CSBAs, a failure in one SLA might lead to a failure in another SLA or in the global SLA; can we correlate or associate these precedences? The most important question is: what is the mechanism to figure out these precedences and build the proactive rules?

The contributions of this paper are twofold: (i) we present a novel monitoring framework for CSBA, namely proactive learning from SLA violations, based on the MAPE-K adaptation loop, and (ii) we concretely address the 'Analysis' component of the proposed framework. This novel proactive learning approach takes advantage of the massive amount of past process execution data in order to predict potential violations, and it identifies the best countermeasures to apply. As a proof of concept, a prototype has been developed that ascertains the applicability and feasibility of the proposed solution.

The rest of this paper is structured as follows. Section 2 discusses related work. Section 3 introduces a running scenario that will be used throughout the paper. Section 4 lays the background needed to understand the work proposed in this paper. The proposed predictive monitoring framework is presented in Sect. 5. Section 6 presents our proposed SLA violations learning approach. Section 7 discusses the implementation of the proposed approach on a real-life log. Finally, Sect. 8 draws conclusions and perspectives.

2 Literature Review

There is a massive amount of work in the literature related to cloud-based environments, covering various aspects of this multi-disciplinary domain. In the following discussion, we focus on prominent efforts in the area of SLA monitoring and management in CSBA, which are appraised against the work proposed in this paper.

In principle, approaches for monitoring and detecting SLA violations with respect to QoS constraints are mainly based on techniques and strategies to adapt QoS settings according to changes and violations detected during CSBA execution. In this case, QoS parameters are generally used to repair and optimize a web service. Generally, these adaptive approaches rely on the ability to select and replace failed services dynamically at runtime or during deployment. The selection is governed not only by the need to substitute services but also by the need to optimize the QoS requirements of the system. Accordingly, the system must autonomously adapt itself in order to improve the quality of service of the process.

The proposals in [6, 10] address the problem of violation detection and adaptation of SLA contracts between several layers, while [11] introduced a Cloud Application and SLA monitoring architecture; these inter-layer and monitoring approaches are reviewed in detail in Sect. 2.2.

2.1 Monitoring and Detecting SLA Violations in CSBA Related to QoS Management

Considerable efforts have been made lately to monitor and detect SLA violations in CSBA [8,9,10], but all the proposed approaches are reactive, i.e., they can only detect a violation and take the necessary action once the violation has already happened.

These approaches are based on techniques and strategies that adapt QoS settings according to changes and violations detected during execution of the CSBA, generally by selecting and replacing failed services dynamically at runtime or during deployment, as outlined above. The objective of [8,9,10] is to select the best set of services available at execution time, taking into account the constraints of the process as well as the preferences of the user and the execution context.

Reference [11] proposed a novel hybrid and adaptive multi-learner approach for online QoS modeling in the cloud, describing an adaptive solution that dynamically selects the best learning algorithm for prediction.

To determine the inputs of the QoS model at runtime, they partition the input space into two sub-spaces, each of which applies a different symmetric-uncertainty-based selection technique, and then they combine the sub-space results.

In [12], the PREVENT approach is described to support the prediction and prevention of SLA violations in service compositions, based on event monitoring and machine learning techniques. The prediction of violations is calculated only at predefined checkpoints in the composition, based on regression classifier prediction models. However, this approach does not support changes in the composition according to problems that may appear in any portion of the composition.

In contrast, [3] addressed the QoS degradation problem that can create SLA violations of a composite service, and suggested that managing these violations while performing service composition requires consideration of the composition structure and the dependencies between the participating services. They proposed an approach that determines the impact of each service on the overall performance of a composition, aiming to estimate the impact factor of the QoS of each service involved in the composition. The composition shaft concept is proposed to characterize the dependency relationships between a service composition and its SLA when analyzing the impact factors of the component services.

2.2 SLA Management Including Violation Detection

The works [6, 11] address the problem of violation detection and adaptation of SLA contracts between several layers. For example, [6] proposed a methodology to create, monitor, and adapt inter-layer SLA contracts; the proposed SLA model includes parameters such as KPIs (key performance indicators), KGIs (key goal indicators), and infrastructure metrics. In a second work, [11] proposed a solution to avoid SLA violations by applying inter-layer techniques. The proposed approach uses three layers for the prediction of SLA violations. The identification of adaptation needs is based on the prediction of QoS, which uses assumptions about the characteristics of the execution context.

Reference [3] proposed a methodology and a tool for learning adaptive strategies of web services to automatically select the optimal repair actions. The proposed methodology is able to learn its repair knowledge incrementally once a detected fault has previously been repaired. It is therefore possible to ensure adaptability at runtime, according to the current characteristics of the faults and the history of previously performed repair actions. This methodology thus includes the ability to autonomously learn the two model parameters that are useful to determine the type of fault and the repair strategies that are successful and suitable for a particular fault.

Reference [13] introduced a Cloud Application and SLA monitoring architecture and proposed two methods for determining the frequency at which applications need to be monitored; they also identified the challenges facing application provisioning and SLA enforcement, especially in multi-tenant Cloud services. [5] discusses autonomous QoS management using a proxy-like approach, in which SLAs can be exploited to define certain QoS parameters that a service has to maintain during its interaction with a specific customer. However, their approach is limited to Web services and does not consider the Cloud.

2.3 SLA Violation Prediction

Reference [8] introduced a general approach to the prediction of SLA violations for composite services, taking into account both QoS and process instance data, and using estimates to approximate not yet available data. In contrast to our work, their approach only alerts the provider to potential quality problems; it cannot directly help prevent them or suggest proactive actions. Moreover, their system introduces the notion of checkpoints (points in the execution of the composition where prediction can be done).

Reference [4] introduced some concepts, such as the basic idea of using prediction models based on machine learning techniques and the trade-off between early prediction and prediction accuracy. However, the authors do not discuss important issues such as the integration of instance and QoS data, or strategies for updating prediction models. Additionally, this work does not take estimates into account, and relatively little technical information about their implementation is publicly available.

Reference [11] proposed a new methodology that predicts any deviation in SLA thresholds, for a response-time metric in an SOA-based system, using stochastic models. They created an analytic model that implements different SOA features and then accompanied it with a failure model that is able to figure out whether the result from the analytic model fails to comply with a predefined SLA. The model in this work cannot predict every QoS metric in an SLA; rather, it is mainly used for predicting response-time compliance according to changes in workload.

2.4 Discussion

The adaptation approaches presented above address the problem of SLA violations and service execution failures. However, these approaches aim at establishing a connection only with new services that comply with the functional and non-functional requirements of adaptation, and they generally ignore similar services that do not meet the requirements for interoperability and interaction (e.g., services with incompatible communication protocols or different interfaces). As a result, candidate services with the best characteristics may be neglected. The main limitation of these approaches is that they only consider certain service regions (execution points) of the composition and do not consider all process tasks. Moreover, most of the works targeting SLA violation prediction address grid environments or service-oriented infrastructures that differ from cloud infrastructure; therefore, the applicability of these approaches to CSBAs is limited.

To the best of our knowledge, the approach proposed in this paper is the first that uses data mining techniques to learn from SLA violations in order to correlate multiple violated SLAs, and that recommends actions for automatically reconfiguring the CSBA to avoid a predicted violation before its occurrence.

3 Motivating Scenario

To illustrate the ideas presented in this paper, we use a simple travel agency scenario composed of three services: (i) Reserve Flight, (ii) Payment Service, and (iii) Reserve Hotel Service.

We summarize in Table 1 the Service Level Objectives (SLOs) for our CSBA; they correspond to the SLA specifications of all the QoS constraints for the whole application. Each cloud service provider involved in the Travel-Agency-CSBA configuration promises, through a Service Level Agreement (SLA) with its consumer, to satisfy the stipulated Qualities of Service (QoS).

Table 1. QoS constraints of SLA relevant to the scenario.

Each of these services is made up of a mixture of rented Cloud services (SaaS, PaaS, and IaaS). This work aims at locating the failure event and determining adaptation actions as soon as possible, in order to prevent its spread to the other layers. The central focus of the Travel-Agency CSBA scenario is the SLA between the client Travel-Agency and the cloud service providers offering the Reserve Flight, Reserve Hotel, and Payment cloud services. Upon receiving a request placed by a customer 'C', a process instance is created (Fig. 1).

Fig. 1. A simple process describing the travel agency scenario.

For this instance, the process execution starts with the activity 'Reserve Flight' (S1). Then, the SLA monitor is called and the software services are invoked if they are available. The response time of the whole process should be less than 20 s; if this maximum duration is exceeded, a violation of the respective SLA occurs, as can be seen in Fig. 2.

Fig. 2. Travel agency SLA violations.

4 Background

Data mining covers numerous tasks, which can be classified into descriptive tasks, here the association rules, and predictive tasks, here the decision tree used for prediction from execution logs.

In this section, after recalling the CSBA layers, we introduce the decision tree, a commonly used data mining technique for building prediction models from execution logs, so as to predict potential violations and react to them proactively. This is followed by an overview of association rules.

4.1 CSBA Layers

A CSBA is a set of different services, each provided by a different supplier. There are three service models in the cloud: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). An application built or composed of a set of these rented services is called a Cloud Service Based Application (CSBA).

In a CSBA, a client can rent a cloud service or a set of cloud services from one or several providers to make up an application. The provisioning of these services relies on a Service Level Agreement (SLA), which is a contract between the client and the service provider including a set of non-functional requirements called Quality of Service (QoS) [2].

4.2 Decision Tree

Objective:

Classification of people or things into groups by recognizing patterns. The user or the expert always has a tendency to structure or classify data into groups of similar objects called classes. For this purpose, distance measures are used to evaluate whether an object belongs to a class. The best-known classification methods are nearest neighbor and decision trees.

For detecting violations, we use decision tree learning, which uses a decision tree as a model to predict the value of a target variable based on input variables (features). We generate the tree using the WEKA tool (https://weka.wikispaces.com/), which implements the J48 algorithm. Additionally, based on monitoring data of historical process instances, we use decision tree learning to learn the dependencies and to construct classification models, which are then used to predict the outcome of an instance while it is still running.

If a violation is predicted, we identify adaptation requirements, and adaptation strategies are extracted from the decision tree in order to prevent the violation. We decided to use decision trees because of the following advantages in our context: (i) they constitute a white-box model, as they explicitly show the relationships between explanatory attribute value ranges and categorical target attributes (i.e., KPI classes); they are therefore easy for people to understand and interpret, and they enable human support in the learning and adaptation phases; (ii) they support both explanation and prediction; and (iii) in particular, they support the extraction of adaptation requirements.

A decision tree algorithm works by splitting the instance set into subsets, selecting an explanatory attribute (new node in the tree) and corresponding splitting predicates on the values of that attribute (branches). This process is then repeated on each derived subset in a recursive manner, until all instances of the subset at a node have the same value of the target attribute or until splitting no longer improves the prediction accuracy.

In order to find out these dependencies, we use classification learning as known from machine learning and data mining. In a classification problem, a dataset is given consisting of a set of examples (a.k.a. instances) described in terms of a set of explanatory attributes (a.k.a. predictive variables) and a categorical target attribute. The explanatory attributes may be partly categorical and partly numerical. Using a learning algorithm, a classification model is learned from the example dataset (a.k.a. training set); this is known as supervised learning. Its purpose is to identify recurring relationships among the explanatory variables that describe the examples belonging to the same class of the target attribute. The resulting classification model can be used to explain the dependencies in past instances and, in particular, to predict the class of (future) instances for which only the values of the explanatory attributes are known.

Each leaf of the decision tree is associated with a learned rule qualified by its class support and a probability distribution (class probability). Class support is the number of examples in the training set that follow the path from the root to the leaf and are correctly classified; class probability (prob) is the percentage of correctly classified examples with respect to all the examples following that specific path, as shown in the following formula:

$$ prob = \frac{\#(corr\_class\_leaf\_examples)}{\#(corr\_class\_leaf\_examples) + \#(incorr\_class\_leaf\_examples)} $$
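For instance, under this definition, a hypothetical leaf reached by 8 correctly and 2 incorrectly classified training examples would have class support 8 and class probability:

$$ prob = \frac{8}{8 + 2} = 0.8 $$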

A rule is always given with two measures, the support (S) and the confidence (C), describing its strength and its interestingness. The support (Eq. (1)) is the percentage of transactions that satisfy both A and B among all the transactions of the transaction base. The confidence (Eq. (2)) is the percentage of transactions that verify the consequent of a rule among those that satisfy its antecedent (premise).

$$ Support(A \to B) = P\left( {A \cup B} \right) $$
(1)
$$ Confidence(A \to B) = P(B \mid A) = \frac{support(A \cup B)}{support(A)} $$
(2)
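As a hypothetical numerical illustration of these two measures: if 40 out of 100 transactions contain both A and B, while 50 contain A, then:

$$ Support(A \to B) = \frac{40}{100} = 0.4, \qquad Confidence(A \to B) = \frac{0.4}{0.5} = 0.8 $$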

4.3 Association Rules

Objective:

Associating which events are likely to occur together.

Association models aim to discover relationships or correlations in a set of items. Association rule mining is a data mining technique intended to find associated values in a given dataset and to serve decision making. An association rule has the form A → B (S%, C%), meaning that tuples satisfying the conditions in A also satisfy the conditions in B.

As defined in Sect. 4.2, each rule is qualified by two measures describing its strength and interestingness: the support (Eq. (1)), the percentage of transactions that satisfy both A and B among all the transactions of the transaction base, and the confidence (Eq. (2)), the percentage of transactions that verify the consequent of the rule among those that satisfy its antecedent (premise).

The extraction of association rules is the generation of the interesting rules whose support and confidence are greater than minimum thresholds of support and confidence. The process of extracting association rules involves two distinct phases: first, the items whose support exceeds the minimum threshold are identified; second, the most frequent items are combined in order to generate the associations.
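To illustrate these two phases, the following is a minimal, self-contained Java sketch; the toy transactions, identifiers, and thresholds are illustrative assumptions of ours, and the rule generation is simplified to single-consequent rules. It first grows frequent itemsets level by level and then derives the rules that clear the support and confidence thresholds:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AprioriSketch {
    // Toy transaction base: each transaction is the set of SLA violations
    // observed in one process instance (identifiers are illustrative).
    static final List<Set<String>> TRANSACTIONS = List.of(
            Set.of("ViolRT_S1", "ViolRT_Global"),
            Set.of("ViolRT_S1", "ViolRT_S2", "ViolRT_Global"),
            Set.of("ViolRT_S2"),
            Set.of("ViolRT_S1", "ViolRT_Global"));

    // Support of an itemset = fraction of transactions containing all its items.
    static double support(Set<String> items) {
        long n = TRANSACTIONS.stream().filter(t -> t.containsAll(items)).count();
        return (double) n / TRANSACTIONS.size();
    }

    public static void main(String[] args) {
        double minSupport = 0.5, minConfidence = 0.8;

        // Phase 1: frequent itemsets, grown level by level (Apriori-style).
        Set<Set<String>> frequent = new HashSet<>();
        Set<Set<String>> level = new HashSet<>();
        TRANSACTIONS.stream().flatMap(Set::stream).distinct()
                .map(item -> Set.of(item))
                .filter(s -> support(s) >= minSupport)
                .forEach(level::add);
        while (!level.isEmpty()) {
            frequent.addAll(level);
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> a : level)
                for (Set<String> b : level) {
                    Set<String> cand = new HashSet<>(a);
                    cand.addAll(b);
                    if (cand.size() == a.size() + 1 && support(cand) >= minSupport)
                        next.add(cand);
                }
            level = next;
        }

        // Phase 2: derive rules (antecedent -> single consequent, a simplification).
        for (Set<String> itemset : frequent) {
            if (itemset.size() < 2) continue;
            for (String consequent : itemset) {
                Set<String> antecedent = new HashSet<>(itemset);
                antecedent.remove(consequent);
                double conf = support(itemset) / support(antecedent);
                if (conf >= minConfidence)
                    System.out.printf("%s -> %s (S=%.2f, C=%.2f)%n",
                            antecedent, consequent, support(itemset), conf);
            }
        }
    }
}

On this toy base, the sketch would report, for example, {ViolRT_S1} -> ViolRT_Global with support 0.75 and confidence 1.0.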

5 A Framework for Proactive Learning from SLA Violations

Figure 3 portrays a high-level architectural view of the proposed cross-layer self-adaptation framework. The framework is based on the MAPE-K adaptation loop [5], introduced by IBM as an efficient and novel approach for self-adaptation in autonomic computing. As shown in Fig. 3, the MAPE-K adaptation loop comprises five main components corresponding to its acronym, which are discussed in the following. The main focus of this paper is on the Analysis component of the MAPE-K loop, for which a proactive learning approach is proposed (cf. Sect. 6) to predict potential QoS violations based on historical execution logs and to react accordingly so as to avoid the predicted violation. The upcoming sections therefore focus on presenting the details of the framework and of this component.

Fig. 3. Architecture of service-based business processes running on the cloud.

Knowledge:

As the services of a CSBA rely on third-party cloud service providers, an SLA involves a 'consumer' and one to many 'providers'. For example, a client "X" buys an infrastructure service from Amazon, a platform service from Google, and a software service from SalesForce. These providers must satisfy the obligations they promise to the client "X". A violation indicates that either a cloud service consumer or a provider fails to satisfy one or more constraints contained in the agreement.

The SLA violation patterns are classified into QoS constraint violation patterns, security constraint violation patterns, privacy constraint violation patterns, and regulatory constraint violation patterns. The term 'constraint' refers to obligations related to QoS, privacy, regulatory, and security requirements, which are stipulated and must be satisfied by the involved service providers. The knowledge component is responsible for storing and maintaining the SLAs specified upon a cloud service-based system (Fig. 4).

Fig. 4. High-level architectural view of the proposed proactive monitoring framework.

Monitoring:

Monitoring SLA compliance is of crucial importance to the proposed framework. Monitoring is intended to operate in near real time, in order to take corrective actions before it is too late [14]. Our intention is also to predict possible SLA violations and avoid them before they occur. To tackle the monitoring task, we rely on complex event processing (CEP) technology.

In traditional systems, the data are static while the queries change. For example, in traditional database systems, the data are stored in tables, and users can write different queries that access those tables to process the data and obtain results. When using complex event processing (CEP) technology, the roles of data and queries are reversed: the queries are static, while the data, or events, are dynamic, based on input event streams from different sources. Events are heterogeneous and can be emitted from sources such as sensors, software applications, and databases. Events generated from such sources are called raw events; raw events gain more value when they are combined together.

There are several CEP platforms. For the work proposed here, we use ESPER (http://www.espertech.com/esper/index_redirected.php), an open-source CEP platform. For each of the event data sources, an event type is registered within ESPER. To process these events, ESPER defines the Event Processing Language (EPL), an SQL-like query language with select/from/where/order by/group by clauses in addition to built-in temporal operators to reason about the sequencing of events.
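To make this concrete, here is a minimal sketch using the classic (pre-8.0) Esper client API; the event class, names, and values are illustrative assumptions of ours, not the framework's actual event model. It registers an event type and a static EPL statement that flags samples breaking the scenario's 20 s response-time SLO:

import com.espertech.esper.client.Configuration;
import com.espertech.esper.client.EPServiceProvider;
import com.espertech.esper.client.EPServiceProviderManager;
import com.espertech.esper.client.EPStatement;

public class SlaMonitorSketch {
    // Hypothetical raw event carrying one measured QoS sample.
    public static class ResponseTimeEvent {
        private final String service;
        private final double responseTime;
        public ResponseTimeEvent(String service, double responseTime) {
            this.service = service;
            this.responseTime = responseTime;
        }
        public String getService() { return service; }
        public double getResponseTime() { return responseTime; }
    }

    public static void main(String[] args) {
        Configuration config = new Configuration();
        config.addEventType("ResponseTimeEvent", ResponseTimeEvent.class);
        EPServiceProvider engine = EPServiceProviderManager.getDefaultProvider(config);

        // Static EPL query over the dynamic event stream: flag any sample
        // exceeding the 20 s response-time SLO of the travel agency scenario.
        EPStatement stmt = engine.getEPAdministrator().createEPL(
                "select service, responseTime from ResponseTimeEvent where responseTime > 20");
        stmt.addListener((newData, oldData) -> System.out.println(
                "SLA violation: " + newData[0].get("service")
                + " took " + newData[0].get("responseTime") + " s"));

        // Raw events would normally be pushed here by the monitoring probes.
        engine.getEPRuntime().sendEvent(new ResponseTimeEvent("ReserveFlight", 25.0));
    }
}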

For more details about the monitoring component, we refer the reader to [14].

Analysis:

Based on the monitoring results, the analysis component is responsible for performing complex data analysis and reasoning, by the continuous interaction with the knowledge component.

In particular, the analysis component carries out the processing, correlation, and analysis of events stemming from the history of past instances and predicts potential violations for the running instance. Based on that prediction, the planning component takes the necessary actions. For this to happen, we rely on a set of historical traces of the cloud service-based system that provide insights into how the process was executed in the past. Based on information extracted from these historical traces, predictions and recommendations are provided for running instances. Such predictions and recommendations rely on data mining techniques, more specifically decision trees.

Planning:

Once a violation is predicted, the planning component takes over. To deal with the predicted violation, it starts by searching for alternative solutions that avoid the occurrence of the predicted violation. The planning component attempts to adapt the smallest possible set of services, without directly targeting a re-engineering of the whole system.

Execution:

Based on the plans constructed by the planning component, the execution component is responsible for selecting the adaptation plan (in the form of the recommendations passed from the planning component) with the highest probability of success.

The plan involves recommendations of adaptation actions for all directly and indirectly affected layers in the cloud stack; that is, the adaptation actions propagate from top to bottom. The selected actions are executed to avoid the anticipated QoS violations, and the applied actions are then evaluated to check their impact. This involves evaluating whether the applied adaptation plan was actually efficient and prevented the anticipated QoS violations, and whether it resulted in any other negative impacts. This evaluation iteratively enhances the quality of the prediction models through better learning.

6 The Proposed Learning Approach in Cloud Environments

In this section, we present the details of the proposed learning approach, which combines existing techniques, from association rule extraction to decision tree learning, to provide predictions at runtime about the achievement of business goals in a Business Process (BP) execution trace. In the following sections, we provide an overview of the approach; Sect. 7 then discusses the implementation of the proposed approach as a proof of concept (POC).

6.1 The Proposed Learning Approach

The process of learning from SLA violations and establishing precedence dependencies between different SLA violations in CSBAs has been identified as a major research challenge in Cloud environments.

An SLA does not contain information about the dynamicity of the system; in other words, it is independent of the context of the business process, yet it contains the information about the service behavior and quality provided by the service that we aim to exploit. SLAs are not mathematically defined: the semantics of SLA elements and metrics are defined in natural language, which makes the semantics of QoS harder to understand and usually dependent on the client-provider contract. Thus, being precise and formal about SLA semantics is necessary. Since SLA violations come from different kinds of failures, determining the appropriate type of actions to be taken when predicting an SLA violation is equally important.

First Phase:

Learning phase: this is a continuously evolving process. The association rule extraction is stated as follows:

  • Given: a set of historical BP event logs of SLA violations.

  • Find: Association rules.

Our method of Association Rule (AR) determination goes through three steps:

  • The first step is to discover frequent itemsets.

  • The second step is dedicated to finding ARs on the basis of the first step's outputs.

  • The third step is the refinement of the extracted ARs.

These steps are depicted in the diagram shown in Fig. 5. Two steps are combined in order to carry out the AR extraction: the generation of frequent itemsets and the extraction of association rules. The frequent itemsets are extracted as defined in Algorithm 1.

Fig. 5. Proposed SLA violation learning approach: Architecture overview.

Algorithm 1. Extraction of frequent itemsets.

The input corresponds to the set of historical data generated by previous SLA violations of the CSBA. An AR contains a QoS property of an SLA in its antecedent and the violated SLA as the consequent of the rule. The proposed form of the AR is given below:

$$ \textbf{IF}\ \text{event}\ \textbf{AND}\ \text{condition}\ \textbf{THEN}\ (\text{perform})\ \text{action} $$

The next phase is the prediction phase (described below). In that phase, some historical executions of the CSBA are necessary to bootstrap the prediction. The concrete amount of instances that are necessary depends both on the expected quality of prediction and on the size and complexity of the service composition.

Second Phase:

SLA Prediction:

The objective of this phase is (i) to predict potential violations, and (ii) to construct the best configuration as a recommendation from what has been detected and learned in the previous learning phase. The set of such entries is presented to the decision tree, whose output is the frequent set of dependencies. The prediction algorithm gives precise predictions and avoids unnecessary adaptations. Generally, the approach predicting SLA violations is based on the idea of predicting concrete SLO values from whatever monitoring information is already available. In order to identify which data should be used to train which model, some domain knowledge is necessary; alternatively, dependency analysis can be used to identify the factors that have the biggest influence on the respective SLOs.

The association rules prediction is explained as follows:

  • Given: ARs extracted.

  • Find: Predictive ARs.

The process of this phase is shown in Algorithm 2:

Algorithm 2. Prediction of SLA violations.
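To make the prediction step concrete, the following minimal Java sketch matches a running instance's partial metrics against extracted predictive rules; as simplifying assumptions of ours, each rule antecedent is reduced to 'metric ≥ threshold' conditions, and all names and values are illustrative rather than the paper's actual data structures:

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ViolationPredictorSketch {
    // A predictive AR: if every antecedent metric of the running instance
    // reaches its threshold, predict the named violation (illustrative form).
    record PredictiveRule(Map<String, Double> thresholds, String violation, double confidence) {}

    // Return the highest-confidence violation whose antecedent the running
    // instance already satisfies, if any.
    static Optional<String> predict(Map<String, Double> runningMetrics, List<PredictiveRule> rules) {
        return rules.stream()
                .filter(r -> r.thresholds().entrySet().stream().allMatch(
                        e -> runningMetrics.getOrDefault(e.getKey(), 0.0) >= e.getValue()))
                .max(Comparator.comparingDouble(PredictiveRule::confidence))
                .map(r -> r.violation() + " (C=" + r.confidence() + ")");
    }

    public static void main(String[] args) {
        // One toy rule loosely inspired by the scenario: a high RT1 on the
        // partial trace predicts the global violation Viol(P1, P6).
        List<PredictiveRule> rules = List.of(
                new PredictiveRule(Map.of("RT1", 5.0), "Viol(P1, P6)", 0.9));
        System.out.println(predict(Map.of("RT1", 6.0, "RT2", 4.0), rules));
    }
}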

6.2 Exemplifying on the Running Scenario

In this section, the prediction of violations is applied to the Travel agency CSBA scenario (described in Sect. 3). The rules below are an outcome of the decision tree mining the data sets sent as inputs from an Excel file of the Travel-Agency scenario.

Example of Rule:

IF (RT2 + RT3 > 10) = no AND RT3 ≤ 6 AND RT2 ≤ 11 AND RT1 ≥ 5 THEN Violation = Viol(P1, P6); the configuration will be: {Response time < 20 s, 1 CPU, 2 GB RAM}.

As presented in Fig. 2, the IaaS is an infrastructure service rented from the Amazon service provider with 1 CPU and 2 GB RAM, and the PaaS is a rented IIS (Internet Information Services) platform, also with 1 CPU and 2 GB RAM, each promising to satisfy a response time < 20 s. The global SLA promises to satisfy a response time < 20 s, allocating 10 s to the 'Reserve Hotel' service, 5 s to 'Reserve Flight', and 5 s to the 'Payment Application' service. The response time could be violated, for example, when any of these application services has an internal response-time violation at the SaaS, PaaS, or IaaS level. In order to avoid such a situation, the SLA manager acts proactively based on the history of completed activities. At the same time, another monitoring component detects that there is an I/O failure at the SaaS layer, as S1 has produced a wrong output.

A specific rule is triggered that derives the best strategy, which consists of executing another instance of the web service on a more powerful server with a better memory and CPU allocation (Amazon, 3 CPUs, 3 GB RAM). Assume that a monitoring component, running on the server where the web service is executed, detects that the available main memory is not sufficient for the web service (IaaS layer). At this stage, proactive actions are suggested based on predefined suitable actions for each type of violation the system may encounter. In Table 2, each violation has a violation type, such as availability or security, depending on the kind of violation. The actions taken are of two types, namely surgery actions and elastic actions.

Table 2. Proactive actions.

6.3 Cross-Layer Rules

The DM technique uses event rules to reason about the events that occur at a specific service layer. Based on a received RemoteFault event (caused by an AuthorizationDenied fault) at a higher layer, a specific rule is fired which derives that the best strategy is to execute another instance of the failed service on a more powerful server with a better memory and CPU allocation. But if the new service requires different protocols or resources, this may lead to adaptation in the PaaS and IaaS layers as well.

Indeed, monitoring events at each layer in an isolated manner is not enough, as events may depend on each other. For this purpose, aligning events and defining relationships between them allows a correct diagnosis and avoids the execution of conflicting adaptations. Events are derived based on rules of the form: Antecedent → Consequent.

Example:

A composition fault caused by a missing or an extra role may trigger the ReplayScope action. Since re-executing the process activity from its beginning with a missing partnerLink may cause the same problem, a SWRL rule may recommend adaptation at the SaaS layer by selecting a substituting service for the corresponding role, so that all of the activities inside the faulty scope can be re-executed.

hasActivity(ws, act) ∧ (causesFault(act, ExtraRole) ∨ causesFault(act, MissingRole)) → isManagedBy(ws, Substitute) ∧ isManagedBy(act, ReplayScope) → causesFault(ws, ProcessFault).

As shown in Fig. 5, simulated or real data are presented to the decision tree in the form of an Excel or text (delimited) file. These data contain entries for process instances with a set of violations at different services; below is an example of an entry:

A set of such entries is presented to the decision tree, whose output is a frequent set of dependencies (in other words, the violating services that most often occur together). The output then needs to be filtered, since the time constraint on the violating services is important.

For example, the rules below are an outcome of the decision tree mining the data sets input from the Excel file:

Example of Rule:

IF (RT2 + RT3 > 10) = no AND RT3 ≤ 6 AND RT2 ≤ 11 AND RT1 ≥ 5 THEN Violation = Viol(P1, P6); the configuration will be: {Response time < 20 s, 1 CPU, 2 GB RAM}.

We extract the best configuration available in the database to proactively prevent the global violation before it occurs.

The configuration manager is responsible for configuring the CSBA. The proactive actions suggested by the proactive engine are mapped into the configuration manager, which takes the action. The action taken is recorded in the knowledge base.

Figure 7 shows the form displayed when "suggest action" is pressed. Proactive actions are suggested based on an Excel file containing the suitable action corresponding to the predicted violation.

7 Implementation

To demonstrate the applicability and feasibility of our approach, we developed a prototype in Java. We trigger the execution of 100 process instances using a test client. For each of these instances, we select the concrete supplier service and shipper service randomly, in order to ensure that the history data used for learning contains metrics data on each of these services. During process instance execution, the previously specified metrics are measured and saved in the knowledge database. Then, for each checkpoint, a decision tree is learned using the J48 algorithm.

For the implementation of the Predictor, we rely on the Weka J48 implementation of the C4.5 algorithm, which takes as input an '.arff' file and builds a decision tree. The '.arff' file contains a list of typed variables (including the target variable) and, for each trace prefix (e.g., for each data snapshot), the corresponding values. The resulting decision tree is then analyzed to generate predictions and recommendations, as shown in Fig. 6.

Fig. 6. Learning phase.
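As an illustration of this step, the minimal sketch below loads an '.arff' log and trains a J48 tree with the Weka Java API; the file name "violations.arff" and the attribute layout are assumptions for illustration, not the exact prototype code:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48PredictorSketch {
    public static void main(String[] args) throws Exception {
        // Load the historical execution log: one row per trace prefix,
        // with the violation class as the last attribute (assumed layout).
        Instances data = new DataSource("violations.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Build the C4.5/J48 decision tree used by the Predictor.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree); // textual form of the learned tree (rules)

        // 10-fold cross-validation to estimate prediction accuracy.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}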

The configuration manager is responsible for configuring the CSBA. The proactive actions suggested by the proactive engine are mapped into the configuration manager, which takes the action; the action taken is then stored in the knowledge base. The algorithm searches the database (as shown in Fig. 7) for suitable actions that can be used. Figure 8 shows part of the Excel file that we used for our decisions. For example, as shown in Table 2, since the violation concerns response time, the suitable action is to add 1 CPU to the violating service.

Fig. 7. Decision tree.

Fig. 8. Text files of real data.

We evaluated the model's performance and accuracy experimentally. The experiments were performed on a machine with a quad-core 2.6 GHz CPU, 8 GB RAM, and the Mac OS X operating system. This experiment evaluates the algorithm's raw relevant and absolute accuracy. The static metrics precision and recall are measured while varying the interval size from 4 to 20 events. Figure 10 shows that relevant precision is 1 for small intervals and falls as the interval size increases, while absolute precision fluctuates similarly at lower levels (as more irrelevant sub-patterns are discovered) (Fig. 9).

Fig. 9. A screenshot of ARs extraction.

8 Conclusion and Future Work

In this paper, we studied adaptation technologies, particularly detection and proactive technologies, for CSBA. We discussed the outcomes of our analysis and, in particular, the limitations of different approaches to cross-layer service adaptation.

The major limitation we found is the lack of coordination between adaptation activities, which may lead to conflicts or incompatibilities. According to our study, current solutions do not consider the fact that adaptation in one layer may adversely affect the other layers of a service-based system, and current cross-layer adaptation approaches lack the efficient coordination needed to avoid such conflicts and incompatibilities. We believe these problems must be addressed for efficient cross-layer service adaptation. We also presented the results of a brief study on adaptive agent-based systems, finding that agent-based adaptive systems offer advanced features such as context-awareness and self-adaptation. Adaptive SBAs can benefit from these features; in particular, service-based adaptive systems can become more intelligent and autonomous.

Additionally, based on our understanding, we presented some research directions in the area of cross-layer service adaptation. We strongly believe that research in this area should focus on context awareness, self-adaptation, and performance in order to develop high-performance solutions. We also presented a proposal for a solution on which we are currently working.

There are a few limitations to our study. Firstly, this is merely a literature review; the state of the art could be better reviewed and understood by benchmarking the existing solutions. A comparison of adaptation technologies in different contexts could be conducted by following a set of rigorous protocols, and this paper is missing such a comparison. In our future work, we plan to conduct an empirical study with current cross-layer adaptation technologies. We also plan to conduct a study covering more contexts.

Fig. 10. Evaluation results.