
1 Introduction

The coarse-grained, loosely coupled service-oriented architecture (SOA) handles communication among services through simple, well-defined interfaces that are independent of the underlying implementation platform and network communication module. Web services (WSs) are typically combined with one another to realize SOA-based software systems, and the WS composition problem has therefore received a great deal of research attention. With the proliferation of WSs on the Internet, quality of service (QoS) is commonly adopted to describe non-functional WS characteristics, and optimizing QoS-aware service composition (QSC) has become an especially popular research subject in this field. The goal is to select a composite WS that maximizes certain aggregated-quality functions while implementing the desired user functionalities and satisfying several QoS constraints.

The workflow-based QSC problem has been extensively studied as well [1,2,3,4]. However, most previous researchers consider each WS to have a deterministic QoS. In actuality, the QoS of a WS is intrinsically probabilistic due to the complexity and dynamic nature of the network environment [5] and is very challenging to estimate accurately. For example, the response time of a WS depends on the number of requests invoking it. As discussed by Wang et al. [6], the QoS obtained from the service provider's description (or the QoS value calculated from historical information) does not truly reflect the performance of the service. For scientific computing tasks, service-oriented applications, and MapReduce applications in cloud environments, research has shown that CPU, network, and I/O performance may fluctuate significantly in the short term [7, 8]. Armbrust et al. [9] found that the performance of a service can fluctuate by 4–16% due to network and disk I/O interference. The QoS of a WS should therefore be described in an uncertain form in order to ensure an accurate and workable QSC problem model [10].

Previous researchers have represented the QoS of a WS as a single value, multiple values [11], a standard statistical distribution [12, 13], or an arbitrary probability distribution [14]. QoS represented as a constant value fails to capture quality variations. It is more reasonable to model QoS as a standard statistical distribution (e.g., a normal distribution) than as several values with different frequencies. Though not every QoS measure of a WS follows a normal distribution, allowing arbitrary probability distributions increases the difficulty of the problem significantly.

In this study, we assume that the QoS of a WS follows a normal distribution. For directed acyclic graph (DAG)-based workflows, we establish QoS aggregation methods for several QoS criteria and build an integer quadratically constrained programming (IQCP) model to tackle the QSC problem with uncertain QoS. The main contributions of this work are as follows.

  • We propose an original and efficient aggregation approach for max/min-type and product-type QoS criteria. Compared to representing QoS by an arbitrary probability distribution, the proposed method estimates the aggregated QoS more quickly and accurately.

  • We formulate the QSC problem with uncertain QoS as the well-known IQCP model, which is promising for exactly solving the composition problem with uncertain QoS.

The remainder of this paper is structured as follows. Section 2 reviews research on Web service composition with uncertain QoS. Section 3 lists the necessary assumptions and theorems. Section 4 describes the workflow and QoS model. Section 5 details the QoS aggregation calculation process. Section 6 proposes the WS composition model with uncertain QoS, and Sect. 7 evaluates its performance in detail. Section 8 provides a brief summary and conclusion.

2 Related Work

The global constraint decomposition strategy [15,16,17] can be adopted to tackle the QSC problem while considering QoS uncertainty. This typically involves dividing the WS composition process into two phases: decomposition of global constraints and local optimization selection. In the former phase, the global constraints are decomposed into a series of constraints imposed on each subtask only. Using these local constraints, the local selection process quickly selects the best services while ensuring that the global constraints are satisfied. When exceptions occur at run time, an appropriate substitution can be quickly identified by simply repeating the local selection process. The strategy is thus adaptable to dynamic environments to a certain extent. Chen et al. [18] proposed an instant recommendation approach to manage uncertain QoS, which reveals the most reliable and robust services from the execution log of composite services so that user demands can be fulfilled with higher probability. Hyunyoung et al. [19] also estimated the actual QoS performance of a service based on its real transaction history rather than the QoS information published by its provider.

Representing QoS as multiple values or probability distributions may be a more straightforward way to resolve the uncertain QoS service composition problem. Wang et al. [6] and Shen et al. [20] used the cloud model to evaluate QoS uncertainty; three key parameters (expected value, entropy, and hyper entropy) were used to characterize the stability of QoS, and redundant services were then pruned by Skyline computing to decrease the number of candidate or composite services. Skyline computing was also adopted by subsequent work. Fu et al. [21] used an empirical distribution function to describe QoS uncertainty with special focus on stochastic dominance (SD) theory. The method discussed by Fu et al. [22] does not require the assumption that QoS follows a specific distribution and focuses on aggregating the QoS in a cumulative manner. Yu et al. [23] developed the novel p-dominant service skyline concept, which is computed based on a p-R-tree indexing structure and a dual-pruning scheme.

Some researchers have focused on calculating the QoS of a composite service, called QoS aggregation, which is one of the core issues of the QSC problem. Hwang et al. [5] presented a uniform probabilistic model to denote the QoS of atomic or composite WSs with corresponding computation algorithms. The method is precise, but extremely time-consuming. Zheng et al. [14] developed a set of formulas for QoS aggregation according to four typical patterns: sequential, concurrent, selection, and loop. As opposed to the method presented by Hwang et al. [5], their numerical computation algorithms stipulate that the starting point and width of the intervals must be consistent across all QoS distributions, which unfortunately makes QoS monitoring and parameter setting more difficult. They also ignore QoS aggregation for multiplicative QoS (e.g., reliability) to avoid combinatorial explosion. Chellammal et al. [15] also represent QoS as a probability mass function (PMF). By introducing the global constraint decomposition strategy, QoS aggregation is only calculated when the composite service selected via local optimization is unfit for user requests, which reduces the high time overhead of QoS aggregation. By modeling QoS values with normal distributions, Schuller et al. [24] selected the optimized service combination at minimal cost under QoS requirements; they used a simulation approach for QoS estimation. Wang et al. [25] focused on the uncertainty of service execution rather than the uncertainty of QoS. Du et al. [26] and George et al. [27] each considered only one QoS criterion: the former used response time, the latter used cost.

3 Underlying Assumptions and Theorems

We held the following assumptions in conducting this study:

  1. The QoS of a WS follows a normal distribution, and the QoS of one WS is independent of the QoSs of other WSs.

  2. When the QoS of each component WS follows a normal distribution, the aggregated QoS of a composite service built from these WSs also follows a normal distribution.

  3. For a given workflow F, let s = (s1, s2, …, sn) be an arbitrary composite service of F and let the response time of si (i = 1, 2, …, n) follow a normal distribution \( N\left( {\mu_{i} ,\sigma_{i}^{2} } \right) \). There exist two non-negative real number sequences (x1, x2, …, xn) and (y1, y2, …, yn) that can be used to calculate the expectation E(s) and variance D(s) of the response time of s as follows:

$$ E\left( s \right) = \sum\nolimits_{i = 1}^{n} {x_{i} \cdot \mu_{i} } ,\quad D\left( s \right) = \sum\nolimits_{i = 1}^{n} {y_{i} \cdot \sigma_{i}^{2} } $$
(1)

Assume that Xi (i = 1, 2, …, n) are random variables with \( X_{i} \sim N\left( {\mu_{i} ,\sigma_{i}^{2} } \right) \), and that Xi is independent of Xj for i ≠ j. Let \( Y_{n} = \prod\nolimits_{i = 1}^{n} {X_{i} } \). The expectation and variance of Yn are denoted as E(Yn) and D(Yn), respectively.

Theorem 1.

\( E\left( {Y_{n} } \right) = \prod\nolimits_{i = 1}^{n} {\mu_{i} } \).

Proof:

Because X1, X2, …, Xn are independent of each other, E(Yn) can be obtained as follows:

$$ E\left( {Y_{n} } \right) = E\left( {\mathop \prod \limits_{i = 1}^{n} X_{i} } \right) = \mathop \prod \limits_{i = 1}^{n} E(X_{i} ) = \mathop \prod \limits_{i = 1}^{n} \mu_{i} $$

Theorem 2.

If \( \mu_{i} = \varepsilon \sigma_{i} \) for all i, then \( D\left( {Y_{n} } \right) = \left[ {\left( {1 + \frac{1}{{\varepsilon^{2} }}} \right)^{n} - 1} \right]\prod\nolimits_{i = 1}^{n} {\mu_{i}^{2} } \).

Proof:

We proceed by mathematical induction on n.

  1. When n = 2,

    $$ \begin{aligned} D\left( {Y_{2} } \right) & = D\left( {X_{1} X_{2} } \right) = \sigma_{1}^{2} \sigma_{2}^{2} + \sigma_{1}^{2} \mu_{2}^{2} + \sigma_{2}^{2} \mu_{1}^{2} \\ & = \left( {2\varepsilon^{2} + 1} \right)\sigma_{1}^{2} \sigma_{2}^{2} = \left[ {\left( {1 + \frac{1}{{\varepsilon^{2} }}} \right)^{2} - 1} \right]\mu_{1}^{2} \mu_{2}^{2} \\ \end{aligned} $$
  2. Let us assume that the theorem is true when n = k. That is,

    $$ D\left( {Y_{k} } \right) = \left[ {\left( {1 + \frac{1}{{\varepsilon^{2} }}} \right)^{k} - 1} \right]\prod\nolimits_{i = 1}^{k} {\mu_{i}^{2} } . $$

When n = k + 1,

$$ \begin{aligned} D\left( {Y_{k + 1} } \right) & = D\left( {Y_{k} } \right)D\left( {X_{k + 1} } \right) + D\left( {Y_{k} } \right)\left[ {E\left( {X_{k + 1} } \right)} \right]^{2} + D\left( {X_{k + 1} } \right)\left[ {E\left( {Y_{k} } \right)} \right]^{2} \\ & = \left\{ {\left[ {\left( {1 + \frac{1}{{\varepsilon^{2} }}} \right)^{k} - 1} \right]\mathop \prod \limits_{i = 1}^{k} \mu_{i}^{2} } \right\}\left( {\sigma_{k + 1}^{2} + \mu_{k + 1}^{2} } \right) + \sigma_{k + 1}^{2} \mathop \prod \limits_{i = 1}^{k} \mu_{i}^{2} \\ & = \left[ {\left( {1 + \frac{1}{{\varepsilon^{2} }}} \right)^{k + 1} - 1} \right]\mathop \prod \limits_{i = 1}^{k + 1} \mu_{i}^{2} \\ \end{aligned} $$

These two steps yield the conclusion.
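
As an illustrative check of Theorems 1 and 2 (not part of the original experiments), the following sketch compares the closed-form expectation and variance of the product against Monte Carlo estimates; the parameter values and variable names are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
epsilon = 20.0                                  # mu_i = epsilon * sigma_i for every factor
mu = rng.uniform(0.9, 0.99, size=n)             # e.g., reliabilities close to 1
sigma = mu / epsilon

# Monte Carlo estimate of E(Y_n) and D(Y_n) with Y_n = X_1 * ... * X_n
samples = rng.normal(mu, sigma, size=(1_000_000, n)).prod(axis=1)

E_theory = mu.prod()                                         # Theorem 1
D_theory = ((1 + 1 / epsilon**2) ** n - 1) * (mu**2).prod()  # Theorem 2

print(E_theory, samples.mean())   # the two expectations should agree closely
print(D_theory, samples.var())    # the two variances should agree closely
```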

4 Workflow and QoS Model

A workflow represents how the capabilities of different WSs are combined using four basic patterns (sequence, concurrency, selection, and loop). The labeled graph [28], numbered graph [20], and DAG [29] are common ways to represent a workflow. In the resource allocation field, DAGs are commonly used to represent workflows [30, 31]. A DAG cannot directly represent the selection and loop patterns. However, the loop pattern can be regarded as a special sequential one, and a workflow with selection patterns can be broken up into several workflows without any selection pattern. Therefore, a workflow with selection and loop patterns can be transformed into several workflows that can each be represented by a DAG. Consider the workflow shown in Fig. 1, which can be split into the two workflows shown in Fig. 2a and b, respectively. Here, we only consider workflows that can be represented by a DAG.

Fig. 1. Workflow with selection pattern

Fig. 2. Equivalent workflow to Fig. 1 represented by DAG

The QoS is used to measure the performance of candidate services. The most commonly used QoS criteria include cost, response time, reliability, availability, reputation, and throughput. According to the aggregation method, these criteria can be divided into four classes: sum-type (e.g., cost), min/max-type (e.g., response time), product-type (e.g., reliability), and average-type (e.g., reputation). Similar to [14], the aggregation rules for sequence and concurrency patterns and different types of QoS criteria are summarized in Table 1.

Table 1. Aggregation rules for different patterns and types of QoS
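
As a rough illustration of the four aggregation classes, the sketch below applies the conventional rules (sum for sum-type, sum or max for min/max-type under the sequence and concurrency patterns, product for product-type, average for average-type); Table 1's exact entries are not reproduced here, and the toy input values are assumptions.

```python
import math
from statistics import mean

def aggregate(pattern, costs, times, reliabilities, reputations):
    """Aggregate the QoS of component services combined in a single pattern."""
    return {
        "cost":        sum(costs),                                          # sum-type
        "time":        sum(times) if pattern == "sequence" else max(times), # min/max-type
        "reliability": math.prod(reliabilities),                            # product-type
        "reputation":  mean(reputations),                                   # average-type
    }

print(aggregate("sequence",    [3.0, 5.0], [0.2, 0.4], [0.99, 0.95], [4.0, 4.6]))
print(aggregate("concurrency", [3.0, 5.0], [0.2, 0.4], [0.99, 0.95], [4.0, 4.6]))
```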

5 Expectation and Variance of QoS for Composite Services

Assume that there are n tasks T = {T1, T2, …, Tn} in a workflow F. Each task Ti, i \( \in \left[ {1,n} \right] \), has m candidate WSs si = {si1, si2, …, sim}. A set of 0–1 variables x = {pij} (\( 1 \le i \le n \), \( 1 \le j \le m \)) represents a composite service cs(x) of F: when task Ti selects service sij, pij = 1; otherwise pij = 0. Let the cost, response time, reliability, and reputation of sij follow normal distributions \( {\text{N}}\left( {\mu p_{ij} ,\sigma p_{ij}^{2} } \right) \), \( {\text{N}}\left( {\mu t_{ij} ,\sigma t_{ij}^{2} } \right) \), \( {\text{N}}\left( {\mu r_{ij} ,\sigma r_{ij}^{2} } \right) \), and \( {\text{N}}\left( {\mu c_{ij} ,\sigma c_{ij}^{2} } \right) \), respectively. According to our assumptions, the cost, response time, reliability, and reputation of cs(x) also follow normal distributions. Their corresponding expectations and variances are discussed below.

5.1 Expectation and Variance of Cost

According to Table 1, the cost of a composite service is the sum of the costs of all its components. A linear combination of independent normal random variables still follows a normal distribution, so the cost of cs(x) is also normally distributed. Thus, the expectation Ep(cs(x)) and variance Dp(cs(x)) of the cost of cs(x) can be obtained by Formulas (2) and (3), respectively:

$$ {\text{E}}_{p} \left( {cs\left( x \right)} \right) = \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu p_{ij} } } $$
(2)
$$ {\text{D}}_{p} \left( {cs\left( x \right)} \right) = \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \sigma p_{ij}^{2} } } $$
(3)

5.2 Expectation and Variance of Response Time

Let j represent an arbitrary composite service of F which consists of a series of services (\( s_{{1j_{1} }} \), \( s_{{2j_{2} }} \), …, \( s_{{nj_{n} }} \)). Under our assumptions, there exist two non-negative real number sequences (x1, x2, …, xn) and (y1, y2, …, yn) such that:

$$ \sum\nolimits_{i = 1}^{n} {x_{i} \cdot \mu t_{{ij_{i} }} } = \mu t_{j} ,\quad \sum\nolimits_{i = 1}^{n} {y_{i} \cdot \sigma t_{{ij_{i} }}^{2} } = \sigma t_{j}^{2} $$
(4)

where \( \mu t_{j} \) and \( \sigma t_{j} \) denote the expectation and standard deviation of the response time of j, respectively. Their values can be estimated by sampling as follows:

$$ \mu t_{j} = \frac{1}{Times}\sum\nolimits_{k = 1}^{Times} {t_{j} \left( k \right)} ,\quad \sigma t_{j}^{2} = \frac{1}{Times}\sum\nolimits_{k = 1}^{Times} {\left[ {t_{j} \left( k \right) - \mu t_{j} } \right]^{2} } $$
(5)

where \( t_{j} \left( k \right) \) denotes the k-th sampled response time of j, aggregated over the workflow from the k-th sampled response times \( t_{{ij_{i} }} \left( k \right) \) of its component services \( s_{{ij_{i} }} \), and Times is the number of sampling iterations.

Randomly select q different composite services (j1, j2, …, jq) of F and let:

$$ x = \left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right)^{\text{T}} ,\quad u = (\mu t_{j1} , \mu t_{j2} , \ldots , \mu t_{jq} )^{\text{T}} $$
(6)
$$ U = \left( {\begin{array}{*{20}l} {\mu t_{{1j_{11} }} } \hfill & {\mu t_{{2j_{12} }} } \hfill & \cdots \hfill & {\mu t_{{nj_{1n} }} } \hfill \\ {\mu t_{{1j_{21} }} } \hfill & {\mu t_{{2j_{22} }} } \hfill & \cdots \hfill & {\mu t_{{nj_{2n} }} } \hfill \\ \cdots \hfill & \cdots \hfill & \cdots \hfill & \cdots \hfill \\ {\mu t_{{1j_{q1} }} } \hfill & {\mu t_{{2j_{q2} }} } \hfill & \cdots \hfill & {\mu t_{{nj_{qn} }} } \hfill \\ \end{array} } \right) $$
(7)

yielding the following expression:

$$ Ux = u $$
(8)

When q > n, Formula (8) is an overdetermined linear equation system with non-negativity constraints on x. Its solution, that is, the value of x, can be calculated by known methods (e.g., non-negative least squares).

Similarly, let:

$$ y = \left( {y_{1} ,y_{2} , \ldots ,y_{n} } \right)^{\text{T}} ,\quad o = (\sigma t_{j1}^{2} , \sigma t_{j2}^{2} , \ldots , \sigma t_{jq}^{2} )^{\text{T}} $$
(9)
$$ O = \left( {\begin{array}{*{20}l} {\sigma t_{{1j_{11} }}^{2} } \hfill & {\sigma t_{{2j_{12} }}^{2} } \hfill & \cdots \hfill & {\sigma t_{{nj_{1n} }}^{2} } \hfill \\ {\sigma t_{{1j_{21} }}^{2} } \hfill & {\sigma t_{{2j_{22} }}^{2} } \hfill & \cdots \hfill & {\sigma t_{{nj_{2n} }}^{2} } \hfill \\ \cdots \hfill & \cdots \hfill & \cdots \hfill & \cdots \hfill \\ {\sigma t_{{1j_{q1} }}^{2} } \hfill & {\sigma t_{{2j_{q2} }}^{2} } \hfill & \cdots \hfill & {\sigma t_{{nj_{qn} }}^{2} } \hfill \\ \end{array} } \right) $$
(10)

This yields the following equation:

$$ Oy = o $$
(11)

y can be obtained by solving Eq. (11).

Calculating the values of x and y is time-consuming. However, this calculation can be completed offline because it depends only on the workflow and the candidate services and is independent of user requirements. Hence, it does not affect the time overhead of composing services.
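
The following is a hedged sketch of this offline estimation of x and y: it samples the end-to-end response time of q randomly chosen composite services on a toy DAG (Formula (5)), builds the systems Ux = u and Oy = o, and solves them with non-negative least squares (the paper uses Matlab's lsqnonneg; scipy.optimize.nnls plays the same role here). The toy DAG, parameter values, and helper names are assumptions.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n, m, q, Times = 5, 8, 30, 2000

# Candidate response-time distributions: mut[i, j], sigt[i, j] for service s_ij.
mut = rng.uniform(0.5, 2.0, size=(n, m))
sigt = 0.15 * mut

# A toy DAG over the n tasks, given as predecessor lists (task 0 is the entry,
# task n-1 the termination node).
preds = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [3]}

def end_to_end_time(task_times):
    """Finish time of the termination node, i.e. the longest path through the DAG."""
    finish = {}
    for t in sorted(preds):                       # keys are already in topological order
        start = max((finish[p] for p in preds[t]), default=0.0)
        finish[t] = start + task_times[t]
    return finish[n - 1]

U, u, O, o = [], [], [], []
for _ in range(q):                                # q randomly chosen composite services
    choice = rng.integers(0, m, size=n)           # one candidate per task
    mu_i = mut[np.arange(n), choice]
    var_i = sigt[np.arange(n), choice] ** 2
    # Formula (5): sample the composite response time Times times.
    samples = [end_to_end_time(rng.normal(mu_i, np.sqrt(var_i))) for _ in range(Times)]
    U.append(mu_i);  u.append(np.mean(samples))
    O.append(var_i); o.append(np.var(samples))

x, _ = nnls(np.asarray(U), np.asarray(u))         # Formula (8):  U x = u
y, _ = nnls(np.asarray(O), np.asarray(o))         # Formula (11): O y = o
print("x =", x, "\ny =", y)
```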

After determining values of x and y, the expectation and variance of the response time of cs(x), denoted as Et(cs(x)) and Dt(cs(x)), respectively, can be calculated as follows:

$$ {\text{E}}_{t} \left( {cs\left( x \right)} \right) = \sum\nolimits_{i = 1}^{n} {\left( {x_{i} \sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu t_{ij} } } \right)} ,\quad {\text{D}}_{t} \left( {cs\left( x \right)} \right) = \sum\nolimits_{i = 1}^{n} {\left( {y_{i} \sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \sigma t_{ij}^{2} } } \right)} $$
(12)

5.3 Expectation and Variance of Reliability

According to Table 1, the reliability of a composite service is obtained by multiplying the reliabilities of all its components. Based on Theorem 1, the expectation of the reliability of cs(x), denoted as Er(cs(x)), can be calculated as follows:

$$ {\text{E}}_{r} \left( {cs\left( x \right)} \right) = \prod\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu r_{ij} } } $$
(13)

According to Theorem 2, the variance of the reliability of cs(x), denoted as Dr(cs(x)), can be approximated by Formula (14):

$$ {\text{D}}_{r} \left( {cs\left( x \right)} \right) = \left[ {\left( {1 + \frac{1}{{\varepsilon^{2} }}} \right)^{n} - 1} \right]\prod\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu r_{ij}^{2} } } $$
(14)

where the parameter \( \varepsilon \) can be obtained as follows:

$$ \varepsilon = \frac{1}{n \cdot m}\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {\frac{{\mu r_{ij} }}{{\sigma r_{ij} }}} } $$
(15)
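
The following minimal sketch evaluates Formulas (13)–(15) for one selected combination of services; the candidate data are toy values (the fluctuation range for reliability follows the setting described later in Sect. 7.2) and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 8
mur = rng.uniform(0.95, 0.999, size=(n, m))            # reliability expectations mu r_ij
sigr = rng.uniform(0.001, 0.015, size=(n, m)) * mur    # reliability standard deviations
choice = rng.integers(0, m, size=n)                    # selected candidate per task

mu_sel = mur[np.arange(n), choice]
eps = (mur / sigr).mean()                              # Formula (15)
Er = mu_sel.prod()                                     # Formula (13)
Dr = ((1 + 1 / eps**2) ** n - 1) * (mu_sel**2).prod()  # Formula (14)
print(Er, Dr)
```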

5.4 Expectation and Variance of Reputation

According to Table 1, the reputation of a composite service can be calculated by averaging the reputation of all its components. The expectation and variance of the reputation of cs(x), denoted as Ec(cs(x)) and Dc(cs(x)), respectively, can be calculated as follows:

$$ {\text{E}}_{c} \left( {cs\left( x \right)} \right) = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu c_{ij} } } ,\quad {\text{D}}_{c} \left( {cs\left( x \right)} \right) = \frac{1}{{n^{2} }}\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \sigma c_{ij}^{2} } } $$
(16)

6 Web Service Composition Model with Uncertain QoS

Without loss of generality, our aim is to minimize the cost while satisfying QoS constraints on response time, reliability, and reputation. Our model is described in detail below.

$$ {\text{Object:}}\quad { \hbox{min} }\left( {\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu p_{ij} } } + \beta \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \sigma p_{ij}^{2} } } } \right) $$
(17)
$$ {\text{s}} . {\text{t}} .:\quad P\left( {q_{t} \le C_{t} } \right) \ge p_{t} ,\quad P\left( {q_{r} \ge C_{r} } \right) \ge p_{r} ,\quad P\left( {q_{c} \ge C_{c} } \right) \ge p_{c} $$
(18)
$$ p_{ij} \in \left\{ {0,1} \right\}, 1 \le i \le n, 1 \le j \le m $$
(19)
$$ \sum\nolimits_{j = 1}^{m} {p_{ij} } = 1, 1 \le i \le n $$
(20)

where \( \beta \) is a tunable parameter, P(X ≤ x) is the probability that the value of X falls into the interval (− \( \infty \), x], Ct, Cr, and Cc respectively represent the constraints of response time, reliability, and reputation, and pt, pr, and pc are given constants.

Considering that the QoS values of composite services follow normal distributions, Inequation (18) can be converted into Inequations (21)–(23) in accordance with the \( 3\sigma \) principle:

$$ \mu_{t} + 3\sigma_{t} \le C_{t} $$
(21)
$$ \mu_{r} - 3\sigma_{r} \ge C_{r} $$
(22)
$$ \mu_{c} - 3\sigma_{c} \ge C_{c} $$
(23)

where \( \mu_{t} \), \( \sigma_{t} \), \( \mu_{r} \), \( \sigma_{r} \), \( \mu_{c} \), and \( \sigma_{c} \) respectively represent the expectation and standard deviation of the response time, reliability, and reputation of a composite service.

Inequation (21) is equivalent to the following two inequations:

$$ 0 \le C_{t} - \mu_{t} $$
(24)
$$ 9\sigma_{t}^{2} \le (C_{t} - \mu_{t} )^{2} $$
(25)

Substituting Formula (12) into Inequation (25) and introducing the tunable parameters \( \beta_{1} \) and \( \beta_{2} \) (to account for errors in the expectation and variance of the response time) yields the following:

$$ 9\beta_{2} \sum\nolimits_{i = 1}^{n} {\left( {y_{i} \sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \sigma t_{ij}^{2} } } \right)} \le \left( {C_{t} - \beta_{1} \sum\nolimits_{i = 1}^{n} {\left( {x_{i} \sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu t_{ij} } } \right)} } \right)^{2} $$
(26)

Introducing a variable \( \gamma = \sqrt {\left( {1 + \frac{1}{{\varepsilon^{2} }}} \right)^{n} - 1} \), substituting Formulas (13) and (14) into Inequation (22), and introducing the tunable parameter \( \beta_{3} \) (to account for error in the variance of the reliability) yields the following:

$$ \prod\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu r_{ij} - 3\beta_{3} \gamma } } \sqrt {\prod\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu r_{ij}^{2} } } } \ge C_{r} $$
(27)

Note that \( \sqrt {p_{ij} } = p_{ij} \) since \( p_{ij} \in \left\{ {0,1} \right\} \). After some simplification, Inequation (27) is equivalent to the following inequality:

$$ \left( {1 - 3\beta_{3} \gamma } \right) \cdot \prod\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu r_{ij} \ge C_{r} } } $$
(28)

Taking the logarithm of both sides of Inequality (28) yields the following:

$$ \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \log \left( {\mu r_{ij} } \right)} } \ge { \log }\left( {C_{r} /\left( {1 - 3\beta_{3} \gamma } \right)} \right) $$
(29)

Inequality (23) is equivalent to the following condition:

$$ \mu_{c} - C_{c} \ge 0,\quad (\mu_{c} - C_{c} )^{2} \ge 9\sigma_{c}^{2} $$
(30)

Substituting Formula (16) into Inequality (30) yields:

$$ \left( {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu c_{ij} } } - C_{c} } \right)^{2} \ge \frac{9}{{n^{2} }}\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \sigma c_{ij}^{2} } } $$
(31)

In summary, the WS composition problem with uncertain QoS can be represented as the following IQCP model.

$$ {\text{Object:}}\quad { \hbox{min} }\left( {\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu p_{ij} } } + \beta \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \sigma p_{ij}^{2} } } } \right) $$
(32)
$$ {\text{s}} . {\text{t}} . :\quad 0 \le C_{t} - \beta_{1} \sum\nolimits_{i = 1}^{n} {\left( {x_{i} \sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu t_{ij} } } \right)} $$
(33)
$$ 9\beta_{2} \sum\nolimits_{i = 1}^{n} {\left( {y_{i} \sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \sigma t_{ij}^{2} } } \right)} \le \left( {C_{t} - \beta_{1} \sum\nolimits_{i = 1}^{n} {\left( {x_{i} \sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu t_{ij} } } \right)} } \right)^{2} $$
(34)
$$ \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \log \left( {\mu r_{ij} } \right)} } \ge { \log }\left( {C_{r} /\left( {1 - 3\beta_{3} \gamma } \right)} \right) $$
(35)
$$ \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu c_{ij} } } - n \cdot C_{c} \ge 0 $$
(36)
$$ \left( {\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \mu c_{ij} } } - n \cdot C_{c} } \right)^{2} \ge 9 \cdot \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{m} {p_{ij} \cdot \sigma c_{ij}^{2} } } $$
(37)
$$ p_{ij} \in \left\{ {0,1} \right\}, 1 \le i \le n, 1 \le j \le m $$
(38)
$$ \sum\nolimits_{j = 1}^{m} {p_{ij} } = 1, \,\,1 \le i \le n $$
(39)
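
For clarity, the sketch below evaluates the objective (32) and constraints (33)–(37) for a given 0–1 selection matrix; it is only a feasibility checker under assumed inputs, not the CPLEX-based solver used in Sect. 7, and the function and parameter names are our own.

```python
import numpy as np

def evaluate(p, mup, sigp2, mut, sigt2, mur, muc, sigc2,
             x, y, gamma, Ct, Cr, Cc, beta=0.0, b1=1.0, b2=1.0, b3=1.0):
    """p: n x m 0-1 matrix with exactly one 1 per row (constraints (38)-(39)).
    The mu*/sig* arrays hold per-candidate QoS expectations and variances;
    x, y, and gamma are the precomputed quantities from Sects. 5.2 and 5.3."""
    n = p.shape[0]
    objective = np.sum(p * mup) + beta * np.sum(p * sigp2)              # (32)
    Et = np.sum(x * np.sum(p * mut, axis=1))                            # E_t(cs(x)), Formula (12)
    Dt = np.sum(y * np.sum(p * sigt2, axis=1))                          # D_t(cs(x)), Formula (12)
    c33 = Ct - b1 * Et >= 0                                             # (33)
    c34 = 9 * b2 * Dt <= (Ct - b1 * Et) ** 2                            # (34)
    c35 = np.sum(p * np.log(mur)) >= np.log(Cr / (1 - 3 * b3 * gamma))  # (35), needs 1 - 3*b3*gamma > 0
    rep = np.sum(p * muc)
    c36 = rep - n * Cc >= 0                                             # (36)
    c37 = (rep - n * Cc) ** 2 >= 9 * np.sum(p * sigc2)                  # (37)
    feasible = bool(c33 and c34 and c35 and c36 and c37)
    return objective, feasible
```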

7 Experiments

7.1 Robustness Metrics

Several previous researchers have explored temporal robustness metrics for resource scheduling or service composition problems. There is no consensus on which metric should be adopted; the choice is left to the researcher's discretion for the problem at hand. Tolerance time [30], makespan mean [31], slack time [32], and robustness probability [33] are commonly used metrics. In this study, we established the following two metrics based on these.

The first is the robustness probability Rp, which represents the probability that the selected composite service satisfies the stated constraints. Let TotalTimes represent the total number of tests and FailedTimes represent the number of tests in which a constraint was violated. Rp is calculated as follows:

$$ R_{p} = \left( {{\text{TotalTimes}} - {\text{FailedTimes}}} \right)/{\text{TotalTimes}} $$
(40)

The other is the relaxation metric Rs, which represents the gap between the user constraints and the aggregated QoS of the selected composite service:

$$ R_{s} = \left( {C_{t} - t} \right)/C_{t} + \left( {r - C_{r} } \right)/C_{r} + \left( {c - C_{c} } \right)/C_{c} $$
(41)

where Ct, Cr, and Cc respectively denote the constraints on response time, reliability, and reputation, while t, r, and c respectively denote the response time, reliability, and reputation of the selected composite service. The values of t, r, and c are random, so Rs is taken as the minimum value over multiple measurements.
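
A minimal sketch of the two metrics is given below; the arrays of observed QoS values for the selected composite service are assumed to be supplied by the test harness.

```python
import numpy as np

def robustness(samples_t, samples_r, samples_c, Ct, Cr, Cc):
    """samples_*: arrays of observed response time, reliability, and reputation
    of the selected composite service over repeated test runs."""
    ok = (samples_t <= Ct) & (samples_r >= Cr) & (samples_c >= Cc)
    Rp = ok.mean()                                        # Formula (40)
    Rs = np.min((Ct - samples_t) / Ct                     # Formula (41), minimum over
                + (samples_r - Cr) / Cr                   # multiple measurements
                + (samples_c - Cc) / Cc)
    return Rp, Rs
```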

7.2 Simulation Environment and Parameter Settings

We conducted experiments on a PC with a 2.4 GHz CPU and 4 GB of memory running Windows 7 and JRE 6. We used CPLEX to solve the IQCP model and the function lsqnonneg in Matlab to solve the non-negative overdetermined linear equation systems. The expectations of response time, reliability, and reputation of each candidate service were taken from the QWS dataset [34]. The expectation of cost was randomly drawn from the interval [100, 200] due to the lack of cost information in this dataset. As pointed out by Armbrust et al. [9], the fluctuation range of response time can reach 4–16%. For the response time, we let the standard deviation be a random multiple of the expectation, with the multiplier drawn from [0.1, 0.2]; we took a similar approach for the standard deviations of cost and reputation. The reliability criterion is of the product type: if the magnitude of fluctuation is relatively large, the reliability of a composite service with many component services tends towards zero. Thus, for reliability the multiplier was drawn from [0.001, 0.015]. The maximum values of reliability and reputation were capped at 1.

Consider a baseline composite service in which, for each task of the workflow, the selected candidate's QoS equals the average expectation of the QoS over all candidate services for that task. The response time, reliability, and reputation of this baseline composite service are denoted as BCt, BCr, and BCc, respectively; the values of Ct, Cr, and Cc were then set to 1.2 · BCt, 0.8 · BCr, and 0.8 · BCc, respectively. We set the number of samples to 10,000, \( \beta = 0 \), and \( \beta_{1} = \beta_{2} = \beta_{3} = 1 \).

The DAGs in our experiments were randomly generated. The number of nodes starts at 10 and increases to 100 in steps of 10. Each DAG has an initial node and a termination node. Every node except the termination node has 1–4 direct child nodes at a ratio of 6:3:2:1. The number of candidate services per task also starts at 10 and increases to 100 in steps of 10.
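
The exact generator used in the experiments is not specified, so the sketch below is one plausible construction consistent with the description above (children drawn only from higher-numbered nodes to keep the graph acyclic, branching ratio 6:3:2:1); all names are our own.

```python
import random

def random_dag(num_nodes, seed=0):
    """Nodes 0 .. num_nodes-1; node 0 is the initial node, the last node the
    termination node. Returns a dict mapping each node to its set of children."""
    rnd = random.Random(seed)
    children = {i: set() for i in range(num_nodes)}
    for i in range(num_nodes - 1):                      # termination node has no children
        k = rnd.choices([1, 2, 3, 4], weights=[6, 3, 2, 1])[0]  # 1-4 children, ratio 6:3:2:1
        k = min(k, num_nodes - 1 - i)
        children[i] = set(rnd.sample(range(i + 1, num_nodes), k))
    for j in range(1, num_nodes):                       # ensure every node has a parent
        if not any(j in c for c in children.values()):
            children[rnd.randrange(0, j)].add(j)
    return children

print(random_dag(10))
```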

7.3 Robustness Analysis

When the number of tasks was fixed at 20 and the number of WSs varied between 10 and 100, the values of Rp and Rs were as shown in Table 2. Table 3 shows these values when the number of WSs was fixed at 20 and the number of tasks varied between 10 and 100. The Rp values in both tables are approximately 99.9% across different numbers of tasks and WSs, i.e., higher than the 99.74% guaranteed by the \( 3\sigma \) principle. The value of Rs is around 0.7, indicating that there were still gaps between the user constraints and the aggregated QoS of the selected composite services in our experiments. The values of Rp and Rs were only slightly affected by the scale of the problem, indicating that our model has good stability.

Table 2. Rp and Rs over WSs with 20 tasks
Table 3. Rp and Rs over tasks with 20 WSs

The results shown in Figs. 3 and 4 indicate that the time overhead increases rapidly with the number of tasks and the number of services when using CPLEX. More efficient algorithms are still needed.

Fig. 3. Time overhead over a range of WSs with 20 tasks

Fig. 4. Time overhead over a range of tasks with 20 WSs

7.4 QoS Estimation of Composite Services

Accurate and rapid estimation of QoS is the key to resolving the large-scale WS composition problem with uncertain QoS. We evaluated the QoS distribution and time overhead of our approach (labeled M1) compared to the method adopted by Hwang et al. [5] (labeled M2) and the simulation method adopted by Zheng et al. [14] (labeled M3). The numbers of tasks and WSs were set to 20 and 100, respectively. For M2, we adopted the algorithm and parameters recommended by Hwang et al. [5]; that is, the aggregate random variable discovery (ARVD) problem was solved with the greedy strategy, the sample space size of a single random variable was set to 20, and the sample space size of the aggregated variable was set to 30. For M3, the number of samples was 10,000.

We estimated the QoS distribution of an arbitrary composite service for a given workflow with 20 tasks using the above three methods; the results are shown in Fig. 5. Generally, when the number of samples is large enough, the results obtained by M3 are very close to the actual distribution. The distributions of cost (Fig. 5a), response time (Fig. 5b), reliability (Fig. 5c), and reputation (Fig. 5d) obtained by our method were approximately the same as those of M3, while the results obtained by M2 deviated substantially.

Fig. 5. QoS distribution of composite services for the three methods

As shown in Fig. 6, the time overhead of M1 was far less than that of M2 or M3 across all numbers of tasks. In effect, our method is better suited to solving large-scale service composition problems with uncertain QoS.

Fig. 6. Time overhead of QoS calculation for the three methods

8 Conclusions

As distributed and integrated applications, WSs are invoked over a network (usually the Internet). The corresponding QoS is affected by many factors, including the network environment, hardware facilities, and user behavior, making it very challenging to estimate accurately. The model used to describe the WS composition problem with uncertain QoS must be sufficiently robust; in other words, the selected composite services should have a high probability of meeting user requirements even if the QoSs of the component WSs are volatile. In this study, we represented the WS composition problem with uncertain QoS as an IQCP model based on several assumptions and approximations. We validated the proposed model through a series of simulations.

In the future, we plan to further optimize the IQCP model and its parameters. We also plan to develop more effective algorithms to solve the model and to extend it to other types of QoS probability distributions.