1 Introduction

Many business endeavors depend heavily on the development of information systems (Ross et al. 2006). For these to be successful, it is desirable not to develop by trial-and-error, but rather by predicting the properties of envisioned services as early as possible in the lifecycle (Kurpjuweit and Winter 2007). Such predictions may guide architects and developers, allowing them to explore and compare design alternatives at a low cost (van Sinderen et al. 2012). Designers routinely argue for or against design alternatives based on the expected impact of those choices on, e.g., future financial viability, availability, security or functional capabilities. However, experience-based predictions made by individual developers have drawbacks in terms of transparency, consistency, cost and availability (Cooke and Goossens 2004; Nielsen 1994). Therefore, it is desirable to have formal approaches to business and IT architecture property prediction. In addition to prediction, system property analysis methods may be employed to assess properties of existing systems that are difficult to measure directly, as in the case of, for instance, information security (Freiling 2008).

1.1 Contribution

In this article, we present P2AMF, a framework for generic architecture analysis of systems. It has to be stressed that the term system here is not limited to IT systems alone. Rather, following the Oxford dictionaries, it refers to a "set of things working together as parts of a mechanism or an interconnecting network; a complex whole" (Oxford Dictionaries 2013). P2AMF can be used to describe everything that can be expressed in terms of classes, their attributes and the relations between these classes. P2AMF is based on the OCL (Object Constraint Language 2010). The most prominent difference between P2AMF and OCL is the probabilistic nature of P2AMF: P2AMF allows the user to capture uncertainty in both attribute values and model structure.

One of the main purposes of performing such architecture analysis is to provide decision support. Seen from the perspective of a generic decision cycle, e.g. the OODA loop (Richards 2004), cf. Fig. 1, the work presented in this article can aid in encoding observations in terms of scenarios and in describing the as-is and various to-be architectures. In particular, P2AMF can analyze these scenarios in the orientation phase of the OODA loop, facilitating rational decision-making based on analysis rather than subjective judgement.

Fig. 1 The presented work seen from the perspective of the OODA loop

1.2 Architecture prediction

In business and IT system design, many qualities are worth predicting. These include theoretically well-established non-functional properties such as performance, reliability and schedulability. For these properties, widely accepted analysis approaches have been established (UML Profile for MARTE 2009; Lyu 1996; Smith and Williams 2001). There are also properties where consensus on the theoretical base has yet to materialize, e.g. in the case of business network profitability (Gordijn et al. 2005; Johnson et al. 2013; Osterwalder 2004), security (Jürjens 2002; Lodderstedt et al. 2002; Sommestad et al. 2010), and interoperability (Chen and Doumeingts 2003; Ullberg et al. 2012). Finally, there are many functional capabilities and non-functional properties that are so specific to a certain context that the analysis approach needs to be tailored for each instance, e.g. compliance with local and national regulations for financial reporting. The multitude of potentially interesting analyses prompts the need for generic languages and frameworks for system property analysis. An additional raison d'être for such formalisms is the integrated analysis of multiple properties that they enable. Multi-attribute analysis provides a base for structured quality attribute trade-off, and trade-offs between different properties are a key element in any design activity.

To accommodate analysis algorithms for multiple business and IT system properties, a framework needs to feature an appropriate and sufficiently flexible language. Many system property analysis approaches are based on logic (Halpern and Weissman 2003; Hansson and Jonsson 1994). A number of quality analyses also employ arithmetic operations (Joseph and Pandya 1986; Smith and Williams 2001). Finally, many business and IT property analyses require information on structural aspects of the system (role assignment, logical structure, deployment structure, etc.) (Gokhale 2007; Ritchey and Ammann 2000).

Although analysis is the most prominent feature of the framework, it should also act as the communicator of the design. It must therefore offer descriptive capabilities in terms of modeling. One of the most widely applied generic modeling languages today is the Unified Modeling Language (UML) (OMG Unified Modeling Language 2011). Practically all major IT architecture and design tools are based on or support UML modeling (Lee et al. 2005; Quatrani 2002). Any generic framework for quality analysis therefore benefits from UML compatibility, allowing models to be shared between design and analysis.

The OCL (Object Constraint Language 2010), developed under the aegis of the Object Management Group (OMG), satisfies these requirements, including compatibility with UML. OCL incorporates predicate logic, arithmetic and set theory, making it sufficiently expressive to cover most system property analysis needs. OCL was developed with normative purposes in mind. However, OCL is also suitable for the descriptive (in particular predictive) purposes of architecture analysis, allowing the designer to predict the effects of various changes to a planned business or IT system, or to better understand the workings of an existing system. OCL has previously been employed for various types of analysis (Briand et al. 2003; Lodderstedt et al. 2002; Skene and Emmerich 2003). However, one increasingly important characteristic of modern business and IT systems is not captured by OCL: uncertainty.

1.3 The importance of uncertainty

As industries grow older, our knowledge of the business and IT systems being developed grows less certain. There are several reasons for this development. Firstly, the systems and the architectures describing them are rapidly growing more complex; they are growing in size as well as in the complexity of the underlying technologies. While it was feasible a few decades ago for one person to fully grasp the workings of a company's IT architecture or its collaborations with suppliers and customers, this is no longer the case, simply due to the increased complexity. Secondly, as businesses and systems grow older, so do the people who designed them. The original developers of many systems and businesses running today have changed jobs, retired, or even died. Combined with the poor state of documentation that plagues many projects, this adds to our uncertainty. Thirdly, the use of externally developed and maintained business and IT services is increasing. The resulting uncertainty about, e.g., the physical location of stored data may make an exit plan from a vendor or partner difficult to execute (Joint et al. 2009), thus resulting in a lock-in effect (Rold and Chamberlin 2011).

However, it is important to remember that the uncertainty inherent in the business yields not only risk, but also opportunity. Renowned consultancy Gartner argues that for organizations with highly variable total demand or uncertain future needs, cloud solutions, generally coupled with high uncertainties (Khajeh-Hosseini et al. 2010; Marston et al. 2011; Randles et al. 2010), outperform traditional methods, since the pay-per-use principle minimizes waste (Claunch 2011). However, to reap such potential business benefits, well-reasoned decisions under uncertainty must be made.

Arguably, these uncertainties are, in general, so significant that they ought not to be ignored in the analysis. Whether a given prediction should be used for decision making depends on its credibility. In the face of uncertainty, the decision maker must trade off the expected consequences of inaccuracy against the cost of reducing it, typically through additional data collection and modeling. Due to its importance for decision making, it is standard practice in many areas, e.g. the medical domain, to report on the credibility of analysis results and treatments (Altman et al. 1983; Daly and Bourke 2008; Gardner and Altman 1986). Such credibility is typically specified in terms of significance levels, confidence intervals or similar.

To allow for explicit consideration of uncertainty in the analysis of non-functional properties in business and IT, the framework presented in this paper, P2AMF, is capable of expressing and comprehensively treating uncertainty in UML models. In P2AMF, attributes are random variables. P2AMF also allows the explicit modeling of structural uncertainty, i.e. uncertainty regarding the existence of objects and links. Indeed, as opposed to comparable formalisms (cf. Sect. 7 on related work), P2AMF features probabilistic versions of logic, arithmetic and set operators, properly reflecting both structural uncertainty and the uncertainty of attribute values.

1.4 Outline

This article is composed of eight sections. In Sect. 2, P2AMF is described from the perspective of the user; this section presents the contribution of the article in its most accessible form. The qualities that make P2AMF suitable for different types of analyses are presented in Sect. 3; these qualities are put to the test in the business case analysis of Sect. 4. The most challenging part of the development of P2AMF was the extension of the OCL inference mechanism to a probabilistic context. The proposed inference approach is presented in Sect. 5. Section 6 reports on the implementation of a software tool for P2AMF modeling and analysis. In Sect. 7, related work is considered. Finally, Sect. 8 contains the conclusions.

2 Introduction to P2AMF

In this section, P2AMF is described from the point of view of the user, i.e. an analyst evaluating a particular property. In the first subsection, the differences between P2AMF and the UML–OCL duo are explained. Then, an example class diagram is introduced and subsequently instantiated. This is followed by a subsection where the object diagram attribute values are predicted. The final two subsections describe the expressiveness and some current applications of P2AMF.

2.1 Differences between P2AMF and UML–OCL

The OCL is a formal language used to state expressions on UML models. These expressions typically specify invariant conditions that must hold for the system being modeled, or queries over objects described in a model (Object Constraint Language 2010). OCL expressions are written in the context of a class, an association or an attribute in order to specify an invariant for one of these concepts. Starting from this context, an OCL expression can navigate through class diagram associations to produce collections of objects or attributes. Based on these, it is possible to evaluate various conditions, e.g. the existence of an object with specific properties, or comparing the value of an attribute with a threshold value.

From the user perspective, P2AMF has many similarities to UML–OCL; from a syntax perspective, every valid P2AMF statement is also a valid OCL statement. There are, however, also significant differences. The first and most important difference is that while OCL is mainly employed in the design phase to specify constraints on a future implementation, P2AMF is used to reason about existing or potential systems. P2AMF may, for example, be employed to predict the uptime of a system, while OCL is used to pose requirements on the uptime of the same system. While OCL is mainly normative, mandating what should be, P2AMF is descriptive and predictive, calculating what is or will be.

A second difference between UML–OCL and P2AMF is the importance of the object diagram for P2AMF. As in standard UML, class diagrams with embedded expressions may be constructed that represent a whole class of concepts. These diagrams may then be instantiated into object diagrams representing the actual architectures using these concepts. In P2AMF, however, the object diagrams become particularly significant as inference is performed on them.

Furthermore, P2AMF takes uncertainty into consideration. In particular, two kinds of uncertainty are introduced. Firstly, attributes may be stochastic. For instance, when classes are instantiated, the initial values of their attributes may be expressed as probability distributions. As will be described later, the values may subsequently be tuned for each instance.

Secondly, the existence of objects and links may be uncertain. It may, for instance, be the case that we no longer know whether a specific role in the organization is still in use or whether it has been retired. This is a case of object existence uncertainty. Such uncertainty is specified using an existence attribute that is mandatory for all classes. We may also be uncertain about whether a role is responsible for a process or if a server is still in the cluster servicing a specific application. These are examples of association uncertainty. Similarly, this is specified with an existence attribute on the association, implemented using association classes.

The introduction of two mandatory existence attributes and the specification of attribute values by means of probability distributions thus constitute the only changes to OCL as perceived by the user. These modest changes, however, allow for a comprehensive probabilistic treatment of the affected class and object diagrams, including both attribute uncertainty and structural uncertainty, enabling proper probabilistic inference over OCL expressions.

2.2 An example class diagram

To illustrate the usage of P2AMF, consider the simple example of a cloud service. This is a case where the probabilistic nature of P2AMF is relevant; in cloud computing, the sheer complexity of the cloud calls for an architecture-based approach to both description and analysis. Furthermore, there is fundamental uncertainty about such things as the number of servers currently providing a given service, the characteristics of these particular servers, etc. Nevertheless, these aspects influence the properties of the service at hand.

The class diagram for the example is given in Fig. 2. It contains three classes: Service, Cloud and Server. In the present example, we assume that the service provider would like to predict the future response time of the provided service. Thus, responseTime is an attribute of each of the three classes. Furthermore, every server can be up or down, thus prompting the attribute available. If a server is down, the time to repair is given by the attribute timeToRepair. Some of the attributes are given initial values while the rest are derived from other attributes. Such initial values may be either deterministic or probability distributions; they correspond to the attribute value relevant for the whole class of concepts and may be overwritten with more precise information for a specific instance in the object diagram. There is also a helper operation, min, returning the minimum of the provided values. Below, the model's P2AMF expressions are provided. The class diagram also specifies cardinalities for the relationships. These are always specified deterministically, as in a normal class diagram. However, since the existence of objects and links is uncertain, there may be object models that do not satisfy the cardinality constraints; this is managed by the sampling algorithm (see Sect. 5).

Fig. 2 An example class diagram

Going from the bottom up in the P2AMF expressions below, first consider the Boolean server existence attribute. The probability that a given server exists is given by a Bernoulli distribution with parameter 0.97. Since the running example concerns a future state, this probability represents the belief that a server will in fact be installed as planned, and depends on the modeler's or other experts' knowledge. Continuing to the attribute available, the distribution specifies a 95 % probability that a given server is up and running at any given moment. For the attributes timeToRepair and responseTimeWA, truncated normal distributions specify the expected time (in seconds) until a server is up and running again after a failure, and the response time of a server that has not failed, respectively. Both are truncated at 0 to avoid the small probability of negative values. So far, we have considered four attributes assigned initial probability distributions on the class level. They thus represent the whole population of considered servers. Later, as the class diagram is instantiated, these estimates can be updated with system-specific data.

The top-most attribute of the Server class differs from those previously presented in that it is derived. The derivation states that the response time of the server depends on whether it is available or not: if it is available, responseTimeWA gives the response time, while timeToRepair returns the relevant value when the server is down. The Execution association connects the Server to the Cloud class. As there is uncertainty about whether a given server is connected to the Cloud, its existence attribute is assigned a probability of 70 %.

The Cloud class has two attributes: its existence, which is similar to the existence attribute of the Server class except that we are certain that the Cloud exists, and a real attribute specifying the expected response time of the networking infrastructure. The Provision association class connects Service to the Cloud. Its features are similar to those of the Execution association class.

(Figure a: the P2AMF expressions of the example class diagram, rendered as an image in the original article)

Finally, the class Service contains one derived attribute, responseTime, one operation, min, and the mandatory existence attribute. The service response time is given as the sum of the Cloud networking infrastructure response time on the one hand, and the minimum response time of the set of providing servers on the other. The min operation simply returns the minimum value of a set of values. The existence attribute is similar to those of the other classes.
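Since the original listing is rendered as an image, the following Python sketch is a hedged reconstruction of its semantics as a Monte Carlo simulation. The Bernoulli parameters (0.97, 0.95, 0.70) and the truncation at 0 come from the text; all means and standard deviations, as well as the helper truncnorm, are illustrative assumptions.

```python
import random

def truncnorm(mu, sigma):
    """Sample a normal distribution truncated at 0 (by resampling)."""
    while True:
        x = random.gauss(mu, sigma)
        if x >= 0:
            return x

def sample_service_response_time():
    # Means and standard deviations marked (*) are invented for illustration
    cloud_rt = truncnorm(0.030, 0.005)           # (*) Cloud.responseTime, seconds
    candidate_rts = []
    for rt_mean in (0.100, 0.050, 0.140):        # (*) per-server responseTimeWA means
        exists = random.random() < 0.97          # Server.existence ~ Bernoulli(0.97)
        connected = random.random() < 0.70       # Execution.existence ~ Bernoulli(0.70)
        if not (exists and connected):
            continue
        if random.random() < 0.95:               # Server.available ~ Bernoulli(0.95)
            candidate_rts.append(truncnorm(rt_mean, 0.010))   # responseTimeWA
        else:
            candidate_rts.append(truncnorm(6000.0, 1000.0))   # timeToRepair, seconds
    if not candidate_rts:
        return float("inf")   # no providing server: response time unbounded
    return cloud_rt + min(candidate_rts)         # Service.responseTime derivation

samples = [sample_service_response_time() for _ in range(10_000)]
```

Each call to sample_service_response_time corresponds to one sampled object diagram; a histogram of the collected samples approximates the kind of responseTime distribution discussed in Sect. 2.4.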

2.3 An example object diagram

The class diagram captures the general type of system and the causal effects that such systems are subject to. In order to make specific predictions, however, object diagrams detailing actual system instances are required. Instantiation follows the same rules as in object orientation in general: classes are instantiated into objects, associations into links, multiplicities must be respected, and attributes may be assigned values (in the case of P2AMF, either deterministic values or probability distributions).

There is, however, one interesting and useful difference. In ordinary UML/OCL, values may not be assigned to derived attributes, since those attributes are inferred from the derivation expression; assigning a value different from the one resulting from the derivation rule would lead to an inconsistent model. The probabilistic inference algorithm presented in Sect. 5, however, does allow the assignment of values to derived attributes, as long as the assigned values lie within the ranges specified by the class-level probability distributions. The most useful consequence of this capability is the possibility to infer backwards in the causal chain. In our running example, we can therefore gain knowledge about the availability of the servers merely by observing the response time of the service. This capacity for backwards reasoning is not available in standard OCL/UML. As an example, consider a model where x = y + z. If x is assigned a value, OCL can tell us nothing of the value of y. P2AMF, however, can. Therefore, in P2AMF, all information provided in the object diagram is used to improve the predictions of attribute values.
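To make the x = y + z example concrete, here is a minimal Python sketch of backward inference by rejection sampling; the priors for y and z and the observed value of x are invented for illustration, and a tolerance is used because the attributes are continuous.

```python
import random

# Assumed priors: y ~ N(10, 2), z ~ N(20, 2); derived attribute x = y + z.
# Evidence: x is observed to be 35.
accepted_y = []
while len(accepted_y) < 1000:
    y = random.gauss(10, 2)
    z = random.gauss(20, 2)
    if abs((y + z) - 35.0) < 0.5:   # keep only samples that conform to the evidence
        accepted_y.append(y)

# The posterior mean of y exceeds its prior mean of 10, since the observed
# x = 35 is larger than the prior mean of x (10 + 20 = 30).
print(sum(accepted_y) / len(accepted_y))
```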

Returning to the running example, consider the object diagram of Fig. 3. In this instance of the class diagram, the calculator—an instance of the Service class—uses theCloud, which is the single instance of the Cloud class. Three redundant Server instances are present in the Cloud: calcServA, calcServB and calcServC.

Fig. 3 Instantiation of the example class diagram

We assume that the service provider estimates the attribute values as presented in Table 1. Note that attributes may be assigned either deterministic values, as theCloud.existence, or stochastic ones, as e.g. calcServC.timeToRepair. Some are not assigned any values at all; these will instead be inferred as part of the prediction. Again, note that unlike standard UML/OCL, any attribute may be assigned a value and any attribute may be left unassigned; inference will still be possible. A modeler can therefore obtain predictions based on the current state of knowledge, however poor that knowledge is. Of course, high uncertainty in the object diagram will generally lead to high uncertainty in the predictions.

Table 1 Attributes are assigned either probability distributions or deterministic values in the object diagram

2.4 Inference in the object diagram

With the support of a tool (Ullberg et al. 2010), the analyst can perform predictive inference on the object diagram described above at the click of a button. The details of the underlying algorithms are presented in Sect. 5. The results of the inference are new probability distributions assigned to the attributes. As these are typically non-parametric, they are most easily presented in the form of diagrams. Figure 4 displays the distribution of the most interesting attribute, calculator.responseTime. We note that the most probable response time is 80 ms. This is the sum of the most probable response times of theCloud and calcServB, as calcServB is the fastest server and is probably available. However, there is a certain probability (24 %) that calcServB is down (i.e. that available is false) or that it is not in service (that existence is false). In this case, calcServA will most probably (83 %) be available, and the response time will increase to 130 ms on average. If calcServA also fails or is not in service, calcServC will provide a mean response time of 170 ms. Despite the triple redundancy, there is a small probability (1.2 %) that none of the servers are available. In that case, the response time depends on the installed server with the shortest time to repair, i.e. either calcServA or calcServC, with a mean of 1 h 40 min (6,000 s) each. Finally, although quite unlikely, there is the risk (0.3 %) that none of the servers will exist as modeled; they could have been taken out of service or were perhaps never installed in the first place. In this case, the response time will be so high that the exact value no longer matters.

Fig. 4 calculator.responseTime probability distribution

As mentioned, backward inference is an important capability of probabilistic reasoning. As an example, suppose that once the system has been installed, an end user of the calculator service measures its response time at 130 ms. From this information, the prediction system automatically infers that both calcServA and calcServB must be either unavailable (90 % probability) or non-existent, e.g. retired (10 % probability), while calcServC must be providing the service. This conclusion is reached automatically, but it can be understood intuitively as follows: provided by redundant servers, the calculator service response time is given by the fastest available server. Since the measured service response time (taking the Cloud into account) is slower than those of calcServA and calcServB, these two must be down. Since the measured response time fits the probability distribution of calcServC when it is up and running, this must be the providing server.

3 Expressiveness of P2AMF

A set of expressive characteristics featured by P2AMF makes it particularly well suited for specifying predictive system property models. These include object orientation, support for first-order logic, arithmetic and set theory, and support for expressing both class- and instance-level uncertainty. This section expands on these capabilities, and the next section illustrates them in context.

3.1 Object orientation

P2AMF operates on class and object diagrams. The object-oriented features of such diagrams may therefore be leveraged by the predictive systems in P2AMF. These features are well known and include class instantiation, inheritance, polymorphism, etc. The benefits include the following:

  • Real-world modeling The object-oriented paradigm has consistently proven suitable for modeling the real world.

  • Software system modeling The object-oriented paradigm has been equally successful in modeling the software systems that operate on that world.

  • Instantiation From the point of view of prediction, the concept of instantiation allows a clear differentiation between the general prediction theory (expressed on the class level) and the specific instances of prediction (represented by object models). This separation is also a separation of concerns between the developer of the predictive system (e.g. an expert on some quality attribute) and its user (the modeler of a specific system).

Due to object-orientation’s suitability for system analysis, several object-oriented prediction systems have previously been proposed (UML Profile for MARTE 2009; Grassi et al. 2007; Smith and Williams 2001).

3.2 First-order logic

P2AMF is able to express first-order logical relations. The predictive benefits of predicate logic are undisputed; it serves as the basis for many deductive formalisms (Jackson 2002; Spivey 1992), including those aimed at, for instance, information security (Halpern and Weissman 2003), interoperability (Allen 1997), and correctness (Moriconi and Qian 1994).

3.3 Arithmetic

If predicate logic is important for predictive systems, arithmetic, the oldest branch of mathematics, is perhaps even more so. Arithmetic is used for the prediction of properties ranging from hardware-related ones such as throughput (Smith and Williams 2001), reliability (Lyu 1996) and execution time (Puschner and Koza 1989) to organizational and economic ones such as cost (Drury 2007), efficiency (Mason-Jones and Towill 1999) and value (Rappaport 2000).

3.4 Set theory

The object-oriented concept of instantiation allows the creation of many objects of the same class. In order to make predictions on such models efficiently, set theory is indispensable. The ability to reason about the number of components in a certain system, the qualities of a set of objects reached by a given navigation path in an object diagram, etc., is important for predictions on most systems with varying structure. The many prediction-oriented software specification formalisms based on set theory (Jackson 2002; Spivey 1992) testify to its relevance.

3.5 Instance-level uncertainty

As previously discussed, for many real-world systems and situations, perfect information is rare. On the contrary, the available information is often incomplete, old, vague, conflicting or otherwise uncertain (Aier et al. 2009). In P2AMF, attributes of objects, e.g. the uptime of a certain server, the age of a piece of hardware, or the efficiency of an employee, may be expressed by probability distributions.

For many systems, not only the attribute values are associated with uncertainty, but also the system structure. Is it still the case that system X communicates with system Y? Does cloud service Z have duplicated servers as the specification claims, or was one retired last month? Structural uncertainty becomes more important as the system grows larger and moves further from the modeler, e.g. into the Cloud. The introduction of the existence attribute on classes and associations allows the specification of structural uncertainty in P2AMF.

3.6 Class-level uncertainty

As mentioned previously, the object-oriented separation of class and object diagrams by instantiation is particularly suitable for predictive models, as the generic, theoretical prediction laws and structures may be captured on the class level while particulars about a specific system are left to the object model. This division also pertains to the specification of uncertainty. While the object-level modeler may be uncertain about the structure and attributes of a specific system model, the class-level modeler may need to express uncertainty about, e.g., the strengths of attribute relations. For instance, the extent to which a certain category of firewalls reduces the success rate of cyber attacks is rarely known precisely. Uncertainty regarding such information may be codified by means of class-level attribute uncertainty in P2AMF. As on the instance level, the existence attribute also allows the specification of structural uncertainty on the class level.

4 A business case analyzed with P2AMF

In order to show the usefulness of P2AMF for business-related analysis, this section offers an applied example. Since IT is increasingly being procured "as a service", the example addresses profitability analysis of service level agreements (SLAs) from the perspective of the service provider. In particular, we focus on availability provision, an important but difficult area of SLA writing.

Specifically, assume that we are about to sell an IT service to a customer. Following negotiations, an SLA has been proposed. It sets an AvailabilityRequirement, expressed as a percentage, and also a requirement on the Time To Recovery, TTRRequirement, expressing the customer demand that recovery in the case of an outage does not exceed h hours. To enforce these requirements, the SLA contains provisions on fines in case of breaches: there is an AvailabilityFinePerBasisPoint, which has to be paid if the average annual availability falls below the requirement. For example, with a requirement of 99.50 % availability, a result of 99.35 % would entail 15 times the fine per basis point to be paid. Similarly, there is a TTRFinePerHour, which has to be paid if the TTR exceeds the requirement. For example, with a requirement of 4 hours recovery time, a recovery lasting 6 hours would entail twice the fine per hour to be paid. Finally, of course, the SLA specifies a SalesIncome, i.e. the amount for which we sell the service. But will it be profitable?

To find the NetIncome, i.e. the SalesIncome minus our costs for the IT service offered and any fines, we need to model our ApplicationServiceOffer in greater detail. As service providers, we have access to certain information that the customer does not. In particular, we know the Cost of our ApplicationService, as well as the distribution of its Time To Failure, TTF. For the sake of the example, we let this distribution be Weibull, following Schroeder and Gibson (2010) and Heath et al. (2002). Similarly, we know the Cost of our RecoveryProcess, as well as the distribution of its Time To Recovery, TTR. For the sake of the example, we let this distribution be log-normal, following Schroeder and Gibson (2010) and Franke et al. (2013b).

Before proceeding with the example, it is worth dwelling on how this problem would be addressed manually, without P2AMF. The business analyst would probably apply the well-known availability equation (1):

$$A =\frac{\hbox{MTTF}}{\hbox{MTTF}+\hbox{MTTR}}$$
(1)

Using mean TTF and TTR, this equation gives a valid result for the steady-state availability. This is a single figure—the mean—not the entire distribution. But, as pointed out by Snow and Weckman (2007), it is dangerous to base conclusions about SLA breaches on means. P2AMF allows us to consider the entire distribution of both availability and TTR. The result of a simulation (with suitable parameters) is given in Fig. 5. As is evident, the SLA at hand has a good chance of being profitable. However, there is also a substantial risk that it will run at a loss, and a small chance that it will run at a considerable loss. The mean value (also displayed in the figure) gives no insight into these finer details of the opportunities and risks involved.
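To illustrate the difference between the mean-based equation (1) and the distributional view, the following Python sketch samples Weibull TTF and log-normal TTR values. All distribution parameters are invented for illustration and do not correspond to the simulation behind Fig. 5.

```python
import random

def ttf_hours():   # Time To Failure; assumed Weibull parameters (scale, shape)
    return random.weibullvariate(2000.0, 0.8)

def ttr_hours():   # Time To Recovery; assumed log-normal parameters (mu, sigma)
    return random.lognormvariate(1.0, 1.0)

# Mean-based steady-state availability, as in Eq. (1):
N = 100_000
mttf = sum(ttf_hours() for _ in range(N)) / N
mttr = sum(ttr_hours() for _ in range(N)) / N
print("A from Eq. (1):", mttf / (mttf + mttr))

# Distributional view: one simulated year of failure/repair cycles per sample
def annual_availability(hours_per_year=8760.0):
    up = down = 0.0
    while up + down < hours_per_year:   # approximate: the last cycle may overshoot
        up += ttf_hours()
        down += ttr_hours()
    return up / (up + down)

samples = sorted(annual_availability() for _ in range(1000))
print("5th percentile:", samples[50], "median:", samples[500])
```

It is the spread of the sampled annual availabilities, rather than the single steady-state figure, that determines the probability of breaching a requirement such as 99.50 %.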

Fig. 5 ServiceLevelAgreement.NetIncome probability distribution

Briefly considering the P2AMF implementation, the main work is done in the definition of the NetIncome attribute:

(Figure b: the P2AMF definition of the NetIncome attribute, rendered as an image in the original article)

The two if-clauses (illustrating first-order logic) check whether any of the fines for excessive downtime or insufficient availability are to be applied. If so, the amounts are calculated (illustrating arithmetic) based on the values of the Availability and TTR attributes of the ApplicationServiceOffer and the RecoveryProcess, respectively. These are stochastic, based on sampling of the Weibull TTF and log-normal TTR (illustrating instance-level uncertainty). In the final calculation, the stochastic fines and the fixed costs of the provider are both subtracted from the fixed income gained from selling the service, thus yielding the stochastic net income illustrated in the histogram in Fig. 5.
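The listing itself is not reproduced here, but the following Python sketch mirrors the structure just described. All monetary amounts, requirements and distribution parameters are invented; in the actual P2AMF model, Availability and TTR are derived from the sampled TTF and TTR distributions rather than sampled directly as below.

```python
import random

SALES_INCOME  = 100_000.0   # assumed SalesIncome
SERVICE_COST  = 60_000.0    # assumed Cost of ApplicationService + RecoveryProcess
AVAIL_REQ     = 99.50       # AvailabilityRequirement, percent
FINE_PER_BP   = 500.0       # assumed AvailabilityFinePerBasisPoint
TTR_REQ       = 4.0         # TTRRequirement, hours
FINE_PER_HOUR = 1_000.0     # assumed TTRFinePerHour

def sample_net_income():
    # Stand-ins for the stochastic Availability and TTR attributes
    availability = 100.0 * random.betavariate(400, 2)   # percent; mean ~99.5
    ttr = random.lognormvariate(1.0, 0.8)                # hours

    fines = 0.0
    if availability < AVAIL_REQ:                 # first if-clause
        # percentage points below the requirement, converted to basis points
        fines += (AVAIL_REQ - availability) * 100.0 * FINE_PER_BP
    if ttr > TTR_REQ:                            # second if-clause
        fines += (ttr - TTR_REQ) * FINE_PER_HOUR

    return SALES_INCOME - SERVICE_COST - fines   # NetIncome

net_income_samples = [sample_net_income() for _ in range(10_000)]
```

A histogram of net_income_samples corresponds to Fig. 5, and its tail probabilities quantify the risk of the SLA running at a loss.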

Indeed, this model can help us not only with the immediate decision of whether to accept or decline a concrete SLA on offer. It can also help us make more strategic decisions on availability. There are two ways of increasing availability: increasing TTF or decreasing TTR (Franke 2012). The P2AMF model enables us to see how much it is worth paying for better hardware or more robust software (which will increase TTF), and how much it is worth paying for a stand-by repair crew (which will decrease TTR), given the SLAs that we have currently signed with our customers.

Improving business analysis of availability SLAs is important. In Franke et al. (2013c) practitioners explain that their companies are immature when it comes to SLA writing, particularly when IT services are delivered in complex architectures with many layers of sub-contractors. Furthermore, recent research suggests that practitioners often fail to maximize expected utility when faced with availability SLA decisions (Franke et al. 2013a).

4.1 Other applications and examples

P2AMF has been used to predict such diverse properties as availability (Franke et al. 2013c), interoperability (Ullberg et al. 2012), cyber security (Holm et al. 2013) and the effects of changes to the organizational structure of an enterprise (Gustafsson et al. 2009). It has also been used for multi-property analysis (Närman et al. 2012) of aggregated systems and services, and for trade-offs between different attributes, including cost and availability (Österlind et al. 2013). The cited papers contain further applied examples of P2AMF for the interested reader. In total, more than 40 analyses have so far been conducted using P2AMF, involving more than 20 practitioners working either on their own or with support from the authors. Most of these cases belong to the financial, defense or power utility sectors. In most cases, the practitioners used the tool for availability analysis, followed by cost evaluations and data accuracy analysis. Furthermore, P2AMF as encapsulated in the tool has been used to teach students enterprise architecture analysis; across five courses, more than 50 students have used the tool to date.

5 Probabilistic inference

In this section, we explain how inference is performed in P2AMF models. A Monte Carlo approach is employed, where the probabilistic P2AMF object diagram is sampled to create a set of deterministic UML/OCL object diagrams. For each of these sample diagrams, standard OCL inference is performed, thus generating sample values for all model attributes. For each attribute, the sample set collected from all sampled OCL models is used to characterize the posterior distribution.

Several Monte Carlo methods may be employed for probabilistic inference in P2AMF models, including forward sampling, rejection sampling and Metropolis-Hastings sampling (Koller and Friedman 2009; Walsh 2004). Of these, rejection and Metropolis-Hastings sampling allow the specification of evidence on any attribute in the object models, while forward sampling only allows evidence on root attributes. In this section, we present rejection and Metropolis-Hastings sampling, since evidence on arbitrary attributes is a likely scenario.

Both sampling algorithms have in common that the first step is to generate random samples from the existence attributes' probability distribution $P(\mathbf{X})$: $\mathbf{x}[1],\ldots,\mathbf{x}[M]$. For each sample $\mathbf{x}[i]$, and based on the P2AMF object diagram $O_P$, a reduced object diagram $N_i \in \mathbf{N}$, containing only those objects and links whose existence attributes $X_j$ were assigned the value true, is extracted. Some object models generated in this manner will not conform to the constraints of UML. In particular, object models may appear where a link is connected to only one or even zero objects. Such samples are rejected. Other generated object models will violate, e.g., the multiplicity constraints of the class model. Such samples are also rejected. Additionally, some OCL derivations are undefined for certain object models, for instance a summation over an empty set of attributes. After this rejection procedure, a set of traditional UML/OCL object diagrams remains, $\boldsymbol{\Xi} \subset \mathbf{N}$. The structures in $\boldsymbol{\Xi}$ vary, but all elements are syntactically correct. The attributes are not yet assigned values.
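A minimal Python sketch of this structural sampling step, using hypothetical data structures (object existence probabilities and links with endpoint names and existence probabilities), might look as follows:

```python
import random

def sample_structure(objects, links):
    """Sample existence attributes and extract one reduced object diagram.

    objects: {object_name: existence_probability}
    links:   [(end_a, end_b, existence_probability)]
    Returns (kept_objects, kept_links), or None if the sample is
    structurally invalid and must be rejected.
    """
    kept_objects = {o for o, p in objects.items() if random.random() < p}
    kept_links = []
    for a, b, p in links:
        if random.random() < p:
            if a not in kept_objects or b not in kept_objects:
                return None           # dangling link: reject the sample
            kept_links.append((a, b))
    # Multiplicity checks against the class diagram would follow here.
    return kept_objects, kept_links
```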

5.1 Rejection sampling

The objective of the rejection sampling algorithm is to generate samples from the posterior probability distribution $P(\mathbf{X}, \mathbf{Y} \mid \mathbf{e})$, where $\mathbf{e} = \mathbf{e}_X \cup \mathbf{e}_Y$ denotes the evidence on the existence attributes as well as on the remaining attributes. The objective is thus to approximate the probability distributions of all attributes, given observations of the actual values of some attributes, and prior probability distributions representing beliefs about the values of all attributes prior to observing any evidence.

(Algorithm 1: rejection sampling, rendered as an image in the original article)

The algorithm is depicted in Algorithm 1. Rejection sampling requires the attributes that are part of a sample, $Y_1,\ldots,Y_n$, to be sorted in topological order, i.e. parent attributes appear earlier in the sequence than the attributes that are calculated based on them, their children. Following the general first step, in the second step, for each of the remaining object diagrams $\Xi_i$, the probability distribution of the root attributes $P(\mathbf{Y}^r)$ is sampled, thus producing the sample set $\mathbf{y}^r[1],\ldots,\mathbf{y}^r[\mathit{size}(\boldsymbol{\Xi})]$. If there is evidence on a root attribute, the sample is assigned the evidence value. Based on the samples of the root attributes, the OCL expressions are calculated in topological order for each remaining attribute in the object diagram, $y_i^{\bar{r}} = f_{y_i^{\bar{r}}}(\mathbf{Pa}_{y_i^{\bar{r}}})$. The result is a set of deterministic UML/OCL object diagrams, $\boldsymbol{\Lambda} \subset \boldsymbol{\Xi}$, where in each diagram all attributes are assigned values.

The third step of the rejection sampling algorithm rejects those object diagrams that contain attributes which do not conform to the evidence. The sampling process ensures that root attributes always do conform, but this is not the case for OCL-defined attributes. The final set of object diagrams, $\mathbf{O} \subset \boldsymbol{\Lambda}$, contains attribute samples from the posterior probability distribution $P(\mathbf{X}, \mathbf{Y} \mid \mathbf{e})$. These samples may thus be used to approximate the posterior.
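Condensed into Python for a single structural sample, the three steps might look as follows; the data structures (prior samplers for root attributes, topologically ordered derivation functions, an evidence dictionary and a tolerance for continuous evidence) are assumptions for illustration.

```python
import random

def rejection_sample(roots, derivations, evidence, n, tol=0.5):
    """roots:       {name: sampler}, prior samplers for root attributes
    derivations: [(name, f)] in topological order, f: values dict -> value
    evidence:    {name: observed_value}"""
    accepted = []
    while len(accepted) < n:
        values = {}
        for name, sampler in roots.items():
            # Step 2: evidence on a root attribute is injected directly
            values[name] = evidence[name] if name in evidence else sampler()
        for name, f in derivations:      # children computed after their parents
            values[name] = f(values)
        # Step 3: reject samples whose derived attributes contradict evidence
        if all(abs(values[k] - v) <= tol
               for k, v in evidence.items() if k not in roots):
            accepted.append(values)
    return accepted

# Applied to the x = y + z example of Sect. 2.3:
posterior = rejection_sample(
    roots={"y": lambda: random.gauss(10, 2), "z": lambda: random.gauss(20, 2)},
    derivations=[("x", lambda v: v["y"] + v["z"])],
    evidence={"x": 35.0}, n=1000)
```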

5.2 Metropolis-Hastings sampling

Metropolis-Hastings sampling (Koller and Friedman 2009; Walsh 2004) is an iterative sampling technique that converges to a desired limiting distribution. It aims to create a Markov chain whose stationary distribution is the desired one, i.e., a chain of samples in which the sampled attribute values match the specified evidence. The algorithm is described in Algorithm 2. First, one valid sample is created using rejection sampling. Once this sample is found, it is used as the first element of the Markov chain.

(Algorithm 2: Metropolis-Hastings sampling, rendered as an image in the original article)

The second step is to create a new chain element based on the last added element. A new sample is created as a copy of the last chain element; for the attributes without any specified evidence, new values are generated using a candidate-generating distribution. Then the likelihood of the new sample given the old sample, $P(\mathbf{x}' \mid \mathbf{x})$, is evaluated, and from it the acceptance probability $\alpha$ of the sample is calculated. If $\alpha$ is greater than a given limit $l$, the sample is added to the chain; otherwise, the last added element is added again. The second step is repeated until a predefined number $M$ of chain elements has been added. The first samples, the so-called burn-in samples $B$, are typically not used to evaluate the model, as they are generated while the chain is still converging. As a final step, the burn-in samples are removed.

Similarly to rejection sampling, Metropolis-Hastings sampling allows specifying evidence for any attribute of the model. The algorithm needs comparatively fewer samples and is therefore more efficient, especially for models with a large number of attributes. The biggest disadvantage of Metropolis-Hastings sampling is that, especially for models with many local minima, a solution that is not the best one might be found. This is a consequence of the chain structure of the result, where each sample is based on its predecessor.
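For illustration, the following Python sketch shows a textbook Metropolis-Hastings loop with a symmetric proposal. It is a simplified stand-in for Algorithm 2, which, as described above, accepts a candidate when α exceeds a fixed limit l; the log-density and proposal functions are assumed to be supplied by the caller.

```python
import math
import random

def metropolis_hastings(initial, log_density, propose, M, burn_in):
    """initial:     a valid sample, e.g. found by rejection sampling
    log_density: log-probability of a sample given the evidence
    propose:     returns a copy with unobserved attributes resampled"""
    chain = [initial]
    while len(chain) < M:
        current = chain[-1]
        candidate = propose(current)
        # With a symmetric proposal, the acceptance probability reduces
        # to the ratio of the target densities
        log_alpha = log_density(candidate) - log_density(current)
        if log_alpha >= 0 or random.random() < math.exp(log_alpha):
            chain.append(candidate)    # accept the candidate
        else:
            chain.append(current)      # re-add the last element
    return chain[burn_in:]             # discard the burn-in samples
```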

6 Implementation of P2AMF

In this section, we report on a software tool that allows modeling of both probabilistic class diagrams and probabilistic object diagrams. It also performs inference as described in Sect. 5. A complementary presentation of the tool is available in Buschle et al. (2013).

6.1 Design

The presented tool is implemented in Java using the Eclipse rich client platform. To provide the modeling facility, the Eclipse Modeling Framework (EMF) (Steinberg et al. 2008) is used and extended. The tool is divided into two components, the Class Modeler and the Object Modeler, corresponding to two file types: class and object diagrams.

The Class Modeler is a graphical editing tool for probabilistic class diagrams. In addition to the basic editing functionality provided by other class diagram modeling tools, the Class Modeler (1) allows attribute values to be defined either by probability distributions or by OCL expressions, (2) requires a value for the mandatory existence attributes of classes and associations, and (3) provides OCL syntax checking using the OCL plugin of the Eclipse Modeling Framework (2011). A screen shot of the Class Modeler is presented in Fig. 6.

Fig. 6 Screen shot of the Class Modeler

The Object Modeler (cf. Fig. 7) has two components: (1) an editing tool for probabilistic object models, and (2) an inference engine. The editing tool differs from other object diagram editors mainly in (1) allowing probabilistic attribute values, including the mandatory existence attributes, (2) displaying histograms representing the probability distributions of all attributes after inference, and (3) offering different inference algorithms and parameters. At the click of a button, the calculations described in Sect. 5 generate posterior probability distributions for all attributes.

Fig. 7 Screen shot of the Object Modeler

6.2 Usage and performance

As mentioned, the tool has been used for modeling and prediction of several system properties. The largest class diagrams created in these projects have reached sizes of some twenty classes and sixty attributes. As object diagrams can grow significantly larger, we have produced examples with some seventy objects and five hundred attributes.

While editing models of these sizes is straightforward, inference performance can be an issue, depending on the complexity of the P2AMF expressions, the selected inference algorithm and its parameters. For our most complicated models, one Monte Carlo sample requires 0.5 s for the OCL inference on a standard laptop. For all algorithms, the inference time grows linearly with the number of samples. Figure 8 displays the inference time as a function of the number of samples for the example presented in Fig. 3. Note that 1,000 samples is normally a sufficient sample size; in the diagram, we display the results for larger sample sizes to highlight the linear relationship. For rejection sampling, the acceptance rate (the share of samples that conform to the evidence and are thus not rejected) influences inference time inversely proportionally. Figure 9 shows the inference time per sample as a function of the inverted acceptance rate, also for the example of Fig. 3. Thus, the more unlikely the evidence, the greater the share of samples that is rejected, and the longer the computation required to produce a specific number of samples. For the Metropolis-Hastings algorithm, inference time is also affected by the choice of proposal distribution and burn-in time. Overall, the current implementation is usable but can encounter performance problems for certain models. Forward sampling with 1,000 samples in our largest models requires some 80 seconds on the aforementioned hardware. Compared to the time required for modeling (counted in hours or days), this is quite acceptable.

Fig. 8 Inference time as a function of the number of samples for the example of Fig. 3. 1,000 samples is normally a sufficient sample size

Fig. 9 Inference time per sample as a function of the inverted acceptance rate, for the example of Fig. 3

There are several options available for performance improvements, including more efficient coding, multithreading (which is expected to improve performance significantly) and even high-performance cloud computing.

7 Related work

There are three categories of work that in different ways are similar to P2AMF. The first category includes variants of first-order probabilistic models. Among other proposals, these include Bayesian Logic (BLOG) (Milch et al. 2005), Probabilistic Relational Models (PRM) (Friedman et al. 1999) and Directed Acyclic Probabilistic Entity-Relationship (DAPER) models (Heckerman et al. 2004). These are similar to P2AMF in their use of object-based templates which may be instantiated into structures amenable to probabilistic inference. However, first-order probabilistic models also differ from P2AMF; most importantly, they do not consider how logic and arithmetic operators are affected by structural uncertainty. Consider the following expression,

(Figure e: an OCL expression computing the average age of a person's friends, rendered as an image in the original article)

In contrast to the works mentioned, P2AMF will properly weigh each friend's probability of existence (object existence probability) as well as the probability that they are indeed friends (link existence probability) in order to provide a relevant measure of the average age. When all friends' existence is certain, the expression evaluates to their average age (if the friends are 15, 20 and 60 years old, the average is 31.67 years). If one friend's existence probability decreases, her age's influence on the average shrinks proportionally. Figure 10 displays the probability distribution in the case where the three friends each have a 75 % existence probability.
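This weighting can be reproduced in a small Python Monte Carlo sketch; rejecting samples in which no friend exists is one possible handling of the then-undefined average, consistent with the treatment of undefined derivations in Sect. 5.

```python
import random

AGES = [15, 20, 60]    # the three friends of the example
P_EXIST = 0.75         # existence probability used for Fig. 10

def sample_average_age():
    present = [age for age in AGES if random.random() < P_EXIST]
    if not present:
        return None    # no friend exists: the average is undefined
    return sum(present) / len(present)

samples = [s for s in (sample_average_age() for _ in range(100_000))
           if s is not None]
# With all existence probabilities equal to 1, every sample would be
# (15 + 20 + 60) / 3 = 31.67, the deterministic average.
```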

Fig. 10 Average age of three friends

The second category of related work comprises query and constraint languages such as SQL (Melton and Simon 1993) and OCL (Object Constraint Language 2010). Similarly to P2AMF, these languages allow logical and arithmetic queries over object or entity models. They are, however, deterministic rather than probabilistic. Probabilistic versions of such query languages do exist, such as PSQL (Dey and Sarkar 1998); these approaches have similarities with the second type of uncertainty introduced in Sect. 2.1, the existence of objects, but do not cover stochastic attributes.

The third and most important category of related work is work on stochastic quality prediction for software architecture. Some approaches, such as MARTE (UML Profile for MARTE 2009), KLAPER (Grassi et al. 2008) and ArgoPerformance (Distefano et al. 2005), are concerned with the analysis of UML or other Meta-Object Facility (MOF) compliant models. Others, such as the Palladio component model for model-driven performance prediction (Becker et al. 2009), the work by Meedeniya et al. (2012) on architecture-based reliability evaluation and the work of the Q-ImPrESS EU project on performance, reliability, and maintainability (Becker et al. 2008), have opted for non-UML modeling formalisms. In the specific context of cloud computing, Stantchev (2009) proposes a method for performance evaluation, Klems et al. (2009) offer a framework for economic value analysis, and Lee et al. (2009) propose metrics for non-functional Software-as-a-Service properties. However, common to all of these contributions is their focus on the analysis of particular properties. P2AMF differs from these, as it does not propose specific analyses but rather provides a general language for expressing them. The closest match is probably the work by Ferrer et al. (2012) on multiple non-functional property evaluation, using the Dempster-Shafer approach to probabilistic reasoning. However, P2AMF is more general still, aiming to offer not just a toolbox but a unified language in which best practices of, e.g., reliability or performance modeling can be expressed. Within this third category, there are also generic frameworks for system quality analysis, such as ATAM (Bass et al. 2003). These typically provide quite different support than P2AMF and are not based on probabilistic foundations.

8 Conclusions

It is desirable to predict and assess the expected quality and behavior of business and IT systems already at the design stage. Furthermore, in today's constantly changing, complex and uncertain business environment, the need for such analyses to deal with uncertainty grows.

In this paper, we have reported on a language and tool for probabilistic prediction and assessment of business and IT system properties. The formalism, P2AMF, supports automatic probabilistic reasoning based on set theory, first-order logic and arithmetic. Based on class and object diagrams, P2AMF is compatible with UML. This paper has introduced P2AMF and exemplified it on some simple analysis cases. The use of P2AMF for predicting such diverse properties as system availability, interoperability, performance, usability, data accuracy and the effects of changes to the organizational structure of an enterprise has been reported. Two algorithms for performing the required probabilistic inference were proposed, and a software tool supporting both modeling and inference was presented.