5.1 Overview

As shown in Fig. 5.1, with a set of standard protocols, i.e., SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language), and UDDI (Universal Description, Discovery and Integration), Web services provided by different organizations can be discovered and integrated to develop service-oriented applications [3]. With the growing number of Web services on the Internet, many alternative Web services can provide similar functionalities to fulfill users’ requests. Syntactic or semantic matching approaches based on services’ tags in UDDI repositories are usually employed to discover suitable Web services [9]. However, discovering Web services from UDDI repositories suffers from several limitations. First, since the UDDI repository is no longer a popular way of publishing Web services, most UDDI repositories are seldom updated, which means that a significant part of the information in these repositories is out of date. Second, the arbitrary tagging methods used in different UDDI repositories add to the complexity of searching for Web services of interest.

To address these problems, an automated mechanism is required to explore existing Web services. Since WSDL files are used for describing Web services and can be obtained in several ways other than UDDI repositories, several WSDL-based Web service search approaches have been proposed, such as Binding Point, Grand Central, Salcentral, and Web Service List. However, these engines simply employ keyword-based search techniques, which are insufficient for capturing Web services’ functionalities. First, keywords cannot represent Web services’ underlying semantics. Second, since a Web service is supposed to be used as part of the user’s application, keywords cannot precisely specify the information the user needs or the interface acceptable to the user. In this chapter, we employ not only keywords but also operation parameters to comprehensively capture a Web service’s functionality.

Fig. 5.1 Service-oriented system architecture. \(\copyright \) [2010] IEEE. Reprinted, with permission, from Ref. [11]

In addition, Web services sharing similar functionalities may possess very different non-functional qualities (e.g., response time, throughput, availability, usability, performance, integrity). In order to provide effective personalized Web service ranking, it is essential to consider both the functional and non-functional characteristics of Web services. Unfortunately, the Web service search engines mentioned above cannot distinguish the non-functional differences between Web services.

QoS-driven Web service selection is a popular research problem [1, 6, 10]. A basic assumption in this field is that all the Web services in the candidate set share identical functionality. Under this assumption, most selection approaches differentiate only among the Web services’ non-functional QoS characteristics, regardless of their functionalities. When these QoS-driven selection approaches are directly applied to Web service search engines, several problems arise. One is that Web services whose functionalities are not exactly equivalent to the user’s search query are completely excluded from the result list. Another is that Web services in the result list are ordered only according to their QoS metrics, whereas combining both functional and non-functional attributes is a more reasonable method.

To address the above issues, we propose a new Web service discovery approach that considers the functional attributes as well as the non-functional features of Web services. A search engine prototype, WSExpress, is built as an implementation of our approach. Experimental results show that our search engine can successfully discover user-interested Web services within the top results. In particular, the contributions of this chapter are threefold:

  • Different from all previous work, we propose a brand new Web service searching approach that considers both the functional and non-functional qualities of the service candidates.

  • We conduct a large-scale distributed experimental evaluation on real-world Web services. 3738 Web services (15,811 operations) located in 69 countries are evaluated both on their functional and non-functional aspects. The evaluation results show that we can recommend high-quality Web services to the user. The precision and recall performance of our functional search is substantially better than the approach in previous work [7].

  • We publicly release our large-scale real-world Web service WSDL files and associated QoS datasets for future research. To the best of our knowledge, our dataset is the first publicly available real-world dataset for functional and non-functional Web service searching research.

The rest of this chapter is organized as follows: Sect. 5.2 introduces Web service searching backgrounds. Section 5.3 presents the system architecture. Section 5.4 presents our QoS-aware searching approach. Section 5.5 describes our experimental results. Section 5.6 concludes the chapter.

5.2 Motivation

Figure 5.2 shows a common Web service query scenario. A user wants to find an appropriate Web service containing operations that can be integrated as part of the user’s application. The user specifies the functionality of a suitable operation by filling in the keywords, input, and output fields. The user may also have special requirements on service quality, such as a maximum price; these personal requirements can be represented by setting the QoS constraint field. The criticality of the different quality criteria for a user can be defined by setting the QoS weight field.

Many Web services can be accessed over the Internet, and each service candidate provides one or more operations. Generally, these operations can be described in the structure shown in Fig. 5.2. Each operation includes a name, the parameters of its input and output elements, and descriptions, in its associated WSDL document, of the functionality of the operation as well as of the Web service it belongs to. The service quality associated with an operation is represented by several criteria values, e.g., Q1 and Q2 in Fig. 5.2.

Fig. 5.2 Web service query scenario. \(\copyright \) [2010] IEEE. Reprinted, with permission, from Ref. [11]

Table 5.1 User query examples.

Table 5.1 shows Web service query examples. In query 1, a user wants to find a Web service that provides appropriate operations for displaying the prices of different types and brands of cars. The input information provided by the user for that particular operation is the types and names of cars. This query is structured into three parts: keywords, input, and output. The keywords part defines the domain the query is about; in this example, the user is concerned with the domain “car.” The input part contains “name” and “type” since they can be provided by the user. The output part is set to “price” to specify the information the user wants to obtain from an appropriate operation.

Table 5.2 Web service examples.

In Table 5.2, we enumerate three possible results for the user’s search query. Web service 1 provides one operation, CarPrice, whose functionality is almost the same as what the user specifies in the query; in addition, its service quality meets the user’s requirements. Web service 2 provides the operation AutomobileInformation, which can provide many information details, including the prices of automobiles, after being invoked with “name” and “model” as input. However, some of its service quality criteria, such as the service price (Q1) and the response time (Q2), are beyond the user’s tolerance. The operation VehicleRecommend provided by Web service 3 recommends suitable vehicles for the user to rent. Although its target is to suggest the most suitable vehicles and vehicle rental companies to the user, it can also be invoked to obtain the prices of cars, since it provides cost information. Besides, VehicleRecommend’s service quality fits the user’s constraints and preferences quite well. Among these three Web services, the most suitable one is Web service 1, another acceptable one is Web service 3, and Web service 2 is not recommended due to its service quality. Thus, a reasonable order of the recommendation list for the user’s query is Web service 1, Web service 3, Web service 2.

Fig. 5.3 System architecture. \(\copyright \) [2010] IEEE. Reprinted, with permission, from Ref. [11]

5.3 System Architecture

We now describe the system architecture of our QoS-aware Web service search engine. As shown in Fig. 5.3, after accepting a user’s query specification, our search engine should provide a practical Web service recommendation list. The search engine consists of three components: non-functional evaluation, functional evaluation, and QoS-aware Web service ranking.

There are two phases in the non-functional evaluation component. In phase 1, the search engine obtains QoS criteria values of all the available Web services. In phase 2, the search engine computes the QoS utilities of different Web services according to the constraints and preferences specified in the QoS part of the user’s query.

The functional evaluation component contains two phases. In phase 1, the search engine carries out a preprocessing work on the WSDL files associated with the Web services. This work aims at removing noise and improving accuracy of functional evaluation. In phase 2, the search engine evaluates the Web service candidates’ functional features. These features are described by similarities between the functionality specified in the query and the functionality of operations provided by those Web services.

Finally, the search engine combines both functional and non-functional features of Web services in the QoS-aware Web service ranking component. A practical and reasonable Web service recommendation list is then provided as a result to the user’s search query.

5.4 QoS-Aware Web Service Searching

5.4.1 QoS Model

In our QoS model, we describe the quantitative non-functional properties of Web services as quality criteria. These criteria include generic criteria and business-specific criteria. Generic criteria, such as response time, throughput, availability, and price, are applicable to all Web services, while business-specific criteria, such as penalty rate, apply only to certain kinds of Web services.

Assuming that m criteria are employed to represent a Web service’s quality, we can describe the service quality using a QoS vector \((q_{i,1}, q_{i,2}, \ldots , q_{i,m})\), where \(q_{i,j}\) represents the jth criterion value of Web service i.

Some QoS criteria values of Web services, such as penalty rate and price, can be obtained from the service providers directly. However, other QoS attributes’ values like response time, availability, and reliability need to be generated from all the users’ invocation records due to the differences between network environments. In this chapter, we use the approach proposed in [12] to collect QoS performance on real-world Web services.

We put all the Web services’ QoS vectors together to form a QoS matrix Q, in which each row represents a Web service and each column represents a QoS criterion.

$$\begin{aligned} Q= \begin{pmatrix} q_{1,1} & q_{1,2} & \ldots & q_{1,m} \\ q_{2,1} & q_{2,2} & \ldots & q_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ q_{s,1} & q_{s,2} & \ldots & q_{s,m} \end{pmatrix} \end{aligned}$$
(5.1)

A utility function is used to evaluate the multi-dimensional quality of a Web service. The utility function maps a QoS vector to a real value for evaluating the Web service candidates. To represent user priorities and preferences, two steps are involved in the utility computation: (1) the QoS criteria values are normalized to enable a uniform measurement of the multi-dimensional quality-of-service, independent of units and ranges; (2) a weighted evaluation of the criteria is carried out to represent the user’s constraints, preferences, and special requirements.

5.4.1.1 Normalization

In this step, each criterion value is transformed into a real value between 0 and 1 by comparing it with the maximum and minimum values of that particular criterion. For some criteria, the possible absolute value could be very large or even unbounded, so a pair of maximum and minimum values is specified for each criterion. Let \(q_{j,u}\) be the upper bound value and \(q_{j,l}\) be the lower bound value of the jth criterion, respectively. Every QoS value is first clamped into this range:

$$ f(x)= \left\{ \begin{array}{ll} q_{j,l}, &\text{ if } x<q_{j,l} \\ q_{j,u}, &\text{ if } x>q_{j,u}\\ x, & \text {otherwise}. \end{array} \right. $$

The normalized value of \(q_{i,j}\), denoted \(q'_{i,j}\), is then computed as follows:

$$\begin{aligned} q'_{i,j}=\frac{f(q_{i,j})-q_{j,l}}{q_{j,u}-q_{j,l}}. \end{aligned}$$
(5.2)

Thus, the QoS matrix Q is transformed into a normalized matrix \(Q'\) as follows:

$$\begin{aligned} Q'= \begin{pmatrix} q'_{1,1} & q'_{1,2} & \ldots & q'_{1,m} \\ q'_{2,1} & q'_{2,2} & \ldots & q'_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ q'_{s,1} & q'_{s,2} & \ldots & q'_{s,m} \end{pmatrix} \end{aligned}$$
(5.3)

5.4.1.2 Utility Computation

Some Web services need to be excluded from the candidate set due to their inconsistency with the user’s QoS constraints. The QoS constraints set the worst quality the user can accept. These constraints are usually set according to the application developer’s experience or computed by some QoS-driven composition algorithm. A Web service with any QoS criterion value that fails to satisfy the user’s constraints may cause problems when integrated into the user’s application. For example, if a service fails to return its result within a given period of time, another service may exit with a timeout error while waiting for the result. Assume a user’s constraint vector is \(C=(c_{1}, c_{2}, \ldots , c_{m})\), in which \(c_{i}\) sets the minimum normalized value of the ith criterion. We only consider those Web services whose criteria values all satisfy the constraints. In other words, we delete the rows that fail to satisfy the constraints from \(Q'\) and produce a new matrix \(Q^{*}\):

$$\begin{aligned} Q^*= \begin{pmatrix} q^*_{1,1} & q^*_{1,2} & \ldots & q^*_{1,m} \\ q^*_{2,1} & q^*_{2,2} & \ldots & q^*_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ q^*_{s,1} & q^*_{s,2} & \ldots & q^*_{s,m} \end{pmatrix} \end{aligned}$$
(5.4)

For the sake of simplicity, we only consider positive criteria, whose values need to be maximized (negative criteria can easily be transformed into positive ones by multiplying their values by −1).

Finally, a weight vector \(W=(w_{1},w_{2},\ldots ,w_{m})\) is used to represent the user’s priorities and preferences on the different criteria, with \(w_{k}\in \mathbb {R}^{+}_{0}\) and \(\sum _{k=1}^m w_{k} =1\). The final QoS utility vector \(U=(u_{1},u_{2},\ldots )\) of the Web service candidates can therefore be computed as follows:

$$\begin{aligned} U= Q^{*}\, W^{T} \end{aligned}$$
(5.5)

in which \(u_{i}\) is the QoS utility value of the ith Web service, lying in the range [0, 1].
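To make the utility computation concrete, the following is a minimal sketch (in Python with NumPy; the data and bounds are hypothetical) of the whole pipeline: clamping and normalizing a QoS matrix, filtering by the user’s constraints, and computing the weighted utilities of Eq. (5.5). Instead of deleting rows, the sketch marks excluded services with \(-\infty \) so that service indices stay aligned.

```python
import numpy as np

def qos_utilities(Q, bounds, constraints, weights):
    """Sketch of the QoS utility pipeline (Sects. 5.4.1.1-5.4.1.2).

    Q           : (s, m) matrix, one row per Web service (positive criteria).
    bounds      : (m, 2) array of (lower, upper) bounds per criterion.
    constraints : (m,) minimum acceptable normalized criterion values.
    weights     : (m,) user preference weights summing to 1.
    """
    lo, hi = bounds[:, 0], bounds[:, 1]
    # Clamp into [lower, upper] and normalize to [0, 1] (Eq. 5.2).
    Qn = (np.clip(Q, lo, hi) - lo) / (hi - lo)
    # Exclude services violating any constraint (matrix Q* in Eq. 5.4);
    # here they are marked with -inf instead of being deleted.
    ok = (Qn >= constraints).all(axis=1)
    # Weighted utility per remaining service (Eq. 5.5).
    return np.where(ok, Qn @ weights, -np.inf)

# Hypothetical example: 3 services, 2 criteria (availability, throughput).
Q = np.array([[0.99, 40.0], [0.80, 10.0], [0.95, 25.0]])
bounds = np.array([[0.5, 1.0], [0.0, 50.0]])
print(qos_utilities(Q, bounds,
                    constraints=np.array([0.3, 0.2]),
                    weights=np.array([0.6, 0.4])))
```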

5.4.2 Similarity Computation

Web services provide reusable functionalities, which are described by the input and output parameters defined in their WSDL files.

We now describe a similarity model for computing similarities between a user query and Web service operations. In this model, a vector \((Keywords, Input, Output)\) is used to represent the functionality part of a user query as well as the functionality part of a Web service operation. In particular, the keywords of a Web service operation are extracted from the descriptions in its associated WSDL file. Three phases are involved in the similarity search: WSDL preprocessing, parameter clustering, and similarity computation.

5.4.2.1 WSDL Preprocessing

In order to improve the accuracy of the similarity computation for operations and user queries, we first need to preprocess the WSDL files. There are two steps, as follows:

  1. Identify useful terms in WSDL files. Since the descriptions, operation names, and input/output parameter names are created manually by the service provider, there are many misspelled and abbreviated words in real-world WSDL files. This step replaces such words with their normalized forms.

  2. Perform word stemming and remove stop words. A stem is the basic part of a word that never changes even when the word is morphologically inflected; this process eliminates the differences between inflectional morphemes. Stop words are words with little substantive meaning.
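As an illustration of these two steps, here is a minimal preprocessing sketch; the stop word list and the suffix stemmer are deliberately naive placeholders (a real system would use a full stemmer, such as Porter’s, and a proper stop word list).

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "get", "set"}  # tiny placeholder

def split_identifier(name):
    """Split camelCase WSDL identifiers into lowercase terms."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]

def naive_stem(word):
    """Very naive stemming: strip a few common inflectional suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(identifier):
    """Identify terms, drop stop words, and stem (steps 1 and 2 above)."""
    return [naive_stem(t) for t in split_identifier(identifier)
            if t not in STOP_WORDS]

print(preprocess("GetCarPricesByType"))  # -> ['car', 'price', 'by', 'type']
```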

5.4.2.2 Similarity Computation

We now describe how to measure the similarity of a Web service operation to a user’s query. The functionality part of a user’s query \(R_{f}\) consists of three elements, \(R_{f}=(r^{k},r^{in},r^{out})\). The keywords element is a vector \(r^{k}=(r^{k}_{1},r^{k}_{2},\ldots ,r^{k}_{l})\), where \(r^{k}_{i}\) is the ith keyword. Moreover, the input element is \(r^{in}=(r^{in}_{1},r^{in}_{2},\dots ,r^{in}_{m})\) and the output element is \(r^{out}=(r^{out}_{1},r^{out}_{2},\dots ,r^{out}_{n})\), where \(r^{in}_{i}\) and \(r^{out}_{i}\) are the ith terms of the input element and output element, respectively. A Web service operation also consists of three elements, \(OP_{f}=(K,In,Out)\). The keywords element of operation i is a vector of words \(K^{i}=(k^i_{1},k^i_{2},\dots , k^i_{l'})\). The input and output elements are vectors \(In^{i}=(in^{i}_{1}, in^{i}_{2}, \dots , in^{i}_{m'})\) and \(Out^{i}=(out^{i}_{1}, out^{i}_{2}, \dots , out^{i}_{n'})\), respectively. Thus, user queries and Web service operations are described as sets of terms. By applying the TF/IDF (Term Frequency/Inverse Document Frequency) measure [8] to these sets, we can measure the similarity \(s_{i}\) between Web service operation i and a user’s query.

Vector similarity (VS) measures the cosine of the angle between two vectors and takes it as their similarity. In similarity search for Web services, the two vectors are the Web service operation vector and the user query vector:

$$\begin{aligned} s_{i}= \frac{ \sum _{j=1}^{t} r_{j} \cdot t_{j} }{ \sqrt{\sum _{j=1}^{t} r_{j}^2} \cdot \sqrt{\sum _{j=1}^{t} t_{j}^2} }, \end{aligned}$$
(5.6)

where \(r_{j}\) and \(t_{j}\) are the TF/IDF values of the jth term of the user’s query vector and the operation vector, respectively, and t is the total number of terms.

The Pearson correlation coefficient (PCC), another popular similarity measure, has been adopted in a number of recommender systems for similarity computation, since it is easy to implement and can achieve high accuracy. The similarity between an operation and a user’s query can be calculated with PCC as follows:

$$\begin{aligned} s_{i}= \frac{ \sum _{j=1}^{t} (r_{j}-\bar{r}) \cdot (t_{j}-\bar{t}) }{ \sqrt{\sum _{j=1}^{t} (r_{j}-\bar{r})^2} \cdot \sqrt{\sum _{j=1}^{t} (t_{j}-\bar{t})^2} } \end{aligned}$$
(5.7)

where \(\bar{r}\) is the average TF/IDF value of all terms in the user’s query vector and \(\bar{t}\) is the average TF/IDF value of all terms in an operation vector. The PCC similarity value \(s_i\) lies in the interval \([-1, 1]\), and a larger value indicates higher similarity.
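Both measures are straightforward to implement once the query and operation have been mapped into a shared term space; the following minimal sketch (with hypothetical TF/IDF vectors) computes Eqs. (5.6) and (5.7).

```python
import numpy as np

def cosine_similarity(r, t):
    """Vector similarity (Eq. 5.6): cosine of the angle between the
    query TF/IDF vector r and the operation TF/IDF vector t."""
    return float(r @ t / (np.linalg.norm(r) * np.linalg.norm(t)))

def pcc_similarity(r, t):
    """PCC (Eq. 5.7): cosine of the mean-centered vectors; in [-1, 1]."""
    rc, tc = r - r.mean(), t - t.mean()
    return float(rc @ tc / (np.linalg.norm(rc) * np.linalg.norm(tc)))

# Hypothetical TF/IDF vectors over the vocabulary ["car", "price", "rent"].
r = np.array([0.8, 0.6, 0.0])   # user query
t = np.array([0.7, 0.5, 0.2])   # Web service operation
print(cosine_similarity(r, t), pcc_similarity(r, t))
```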

5.4.3 QoS-Aware Web Service Searching

With an increasing number of Web services being made available on the Internet, users are able to choose functionally appropriate Web services with high non-functional quality from a much larger set of candidates than ever before. It is therefore highly desirable to recommend to the user a list of service candidates that fulfill both the user’s functional and non-functional requirements.

5.4.3.1 Utility Computation

A final rating score \(r_i\) is defined to evaluate how well each Web service i achieves the search goal:

$$\begin{aligned} r_{i}=\lambda \cdot \frac{1}{\log (p_{s_{i}}+1) }+(1-\lambda )\cdot \frac{1}{\log (p_{u_{i}}+1)} , \end{aligned}$$
(5.8)

where \(p_{s_{i}}\) is the functional rank position and \(p_{u_{i}}\) is the non-functional rank position of Web service i among all the service candidates. Since the absolute values of similarity and service quality indicate different features of a Web service and have different units and ranges, rank positions rather than absolute values are a better choice for indicating the appropriateness of the candidates. \(\frac{1}{\log (p+1)}\) calculates the appropriateness value of a candidate at position p for a query. \(\lambda \in [0,1]\) controls how much more important the functionality factor is than the non-functionality factor in the final recommendation.

\(\lambda \) can be a constant that allocates a fixed percentage of the two parts’ contributions to the final rating score \(r_i\). However, it is more realistic to express \(\lambda \) as a function of \(p_{s_{i}}\):

$$\begin{aligned} \lambda = f(p_{s_{i}}) \end{aligned}$$
(5.9)

\(\lambda \) is smaller if the position in the similarity rank is lower. This means that a Web service is inappropriate if it cannot provide the required functionality to the user, no matter how well it performs on QoS. The relationship between searching accuracy and the formula of \(\lambda \) will be investigated in our future work to extend the search engine prototype.
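For a fixed \(\lambda \), the rating score of Eq. (5.8) reduces to a one-line computation; the following small sketch (the rank positions and \(\lambda \) value are hypothetical) illustrates it. Natural logarithms are used since Eq. (5.8) does not fix the base.

```python
import math

def rating_score(func_rank, qos_rank, lam=0.5):
    """Final rating score (Eq. 5.8), combining the functional rank
    position and the QoS-utility rank position of a Web service."""
    return (lam / math.log(func_rank + 1)
            + (1 - lam) / math.log(qos_rank + 1))

# A service ranked 1st functionally but 5th on QoS, with lambda = 0.7.
print(rating_score(func_rank=1, qos_rank=5, lam=0.7))
```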

5.4.3.2 Rank Aggregation

After receiving the user’s query, the functional component of WSExpress computes the similarity \(s_{i}\) (Sect. 5.4.2) between the search query \(R_{f}\) and the operations of Web service i, while the non-functional component employs \(R_{q}\) to compute the QoS utility \(u_{i}\) (Sect. 5.4.1) of each Web service i.

Our goal now is to consider the user’s preferences on both functional and non-functional features and provide a ranked list by combining the evaluation results of the two aspects of the service candidates. Given the user’s preference on the functional and non-functional aspects, we can provide a personalized rank list by assigning each service candidate a score based on its positions in the similarity ranking and the QoS utility ranking. In other words, we aggregate the rankings of similarity and QoS utility according to the user-defined preference.

We formally describe the optimal rank aggregation problem as follows. Given a set \(S=\{s_1, s_2, \ldots \}\) of service candidates, a ranking list \(l=\langle l(1), l(2), \ldots \rangle \) is a permutation of all service candidates, where l(i) denotes the service at position i of l. Given the two ranking lists \(l_p\) and \(l_q\) of similarity and QoS utility, respectively, the optimal rank list \(l_o\), which is an aggregation of \(l_p\) and \(l_q\), should be recommended to users.

Given the similarity values or QoS utility scores of the candidates, we assume that the ranking list \(l_p\) or \(l_q\) is uncertain. In other words, any service \(s_j\in S\) is assumed to have a chance of being ranked in the top position of l, but different services may have different likelihood values. Under this assumption, we define the top one probability of Web service \(s_j\) as follows:

$$\begin{aligned} P(s_j)=\frac{f(r_{j})}{\sum _{k=1}^{m}f(r_{k})}, \end{aligned}$$
(5.10)

where f(x) can be any monotonically increasing and strictly positive function, so that \(P(s_j)>0\) and \(\sum P(s_j)=1\). For simplicity, we take the exponential function for f(x) [2]. Note that the top one probabilities \(P(s_j)\) form a probability distribution over the set of services S. The top one probability indicates the probability of a service being ranked in the top position of a user’s ranking list. By Eq. (5.10), a Web service with a high similarity value or QoS utility value is assigned a high probability value.

In order to estimate the quality of a recommended Web service list, we need to define the distance between two ranking lists. The ranking list distance evaluates the similarity of two lists: the distance value is smaller if more items are ordered in similar positions. Given two ranking lists \(l_1\) and \(l_2\) over the Web service set S, the distance between \(l_1\) and \(l_2\) is defined by

$$\begin{aligned} d(l_1,l_2)=-\sum _{j=1}^{m}P(s_{1j})P(s_{2j}), \end{aligned}$$
(5.11)

where \(s_{1j}\) is the service in the jth position of \(l_1\) and \(s_{2j}\) is the service in the jth position of \(l_2\).

We therefore define the Web service recommendation as the following optimization problem:

$$\begin{aligned} \underset{l_o}{\min }\mathcal {L}(l_p,l_q) = \lambda d(l_o, l_p) + (1-\lambda )d(l_o, l_q), \end{aligned}$$
(5.12)

where \(d(l_o, l_p)\) is the distance between the optimal ranking list and the functionality ranking list, \(d(l_o, l_q)\) is the distance between the optimal ranking list and the non-functionality ranking list, and \(\lambda \) controls the trade-off between functionality and non-functionality.

Intuitively, Web services recommended by the final ranking list functionally comply with the users’ requirements and have a high QoS level. Our goal is to find a ranking of all candidates that minimizes the objective function in Eq. (5.12). One possible approach is to check every possible ranking list in the solution space and select the one that minimizes the objective function; however, the size of the solution space is O(n!) for n candidates. In fact, this is an NP-complete problem, which can be proved by a transformation to the problem of finding a minimum cost perfect matching in a bipartite graph. Therefore, we propose a greedy algorithm to find a suboptimal solution as follows:

Algorithm (figure a): Optimal Rank Aggregation.
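Since the algorithm itself is given as a figure, the following is only a plausible greedy realization under the definitions above, not necessarily the exact algorithm of the original: similarity values and QoS utilities are turned into top one probabilities (Eq. 5.10 with exponential f), and services are emitted in decreasing order of the \(\lambda \)-weighted combined probability, which greedily reduces the objective of Eq. (5.12).

```python
import numpy as np

def top_one_probs(scores):
    """Top one probabilities (Eq. 5.10) with f(x) = exp(x)."""
    e = np.exp(scores - scores.max())   # shift for numerical stability
    return e / e.sum()                  # identical to exp(x)/sum(exp(x))

def greedy_aggregate(similarities, utilities, lam=0.5):
    """Greedy rank aggregation: order services by the combined
    top one probability lambda*P_p + (1 - lambda)*P_q."""
    combined = (lam * top_one_probs(similarities)
                + (1 - lam) * top_one_probs(utilities))
    return list(np.argsort(-combined))  # service indices, best first

# Hypothetical example with 4 candidate services.
sim = np.array([0.9, 0.4, 0.7, 0.2])    # functional similarities s_i
util = np.array([0.3, 0.8, 0.6, 0.9])   # QoS utilities u_i
print(greedy_aggregate(sim, util, lam=0.6))
```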

5.4.4 Online Ranking

In this section, we propose an online service recommendation algorithm. Since the QoS performance of Web services is dynamic at runtime, the ranking list should incorporate the updated QoS information. Therefore, the Optimal Rank Algorithm is extended to integrate QoS information dynamically. A nice property of our rank aggregation approach is that the functional utility and non-functional utility are calculated independently before aggregation. For the functional similarity search, the ranking list remains the same across time intervals, while the QoS ranking list changes from time to time. Therefore, the optimal recommendation list should be adapted to the new QoS values accordingly. The online service recommendation algorithm is described as follows:

Algorithm (figure b): Online Service Recommendation.
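The online algorithm is likewise shown as a figure; a minimal sketch of the idea, reusing the greedy_aggregate function from the previous sketch and assuming a callable that returns fresh QoS utilities in each interval, could look as follows.

```python
import numpy as np

def online_recommend(similarities, fetch_utilities, intervals, lam=0.5):
    """Re-aggregate the ranking in each time interval: the functional
    similarity ranking stays fixed while QoS utilities are refreshed."""
    for t in range(intervals):
        utilities = fetch_utilities(t)   # latest monitored QoS utilities
        yield greedy_aggregate(similarities, utilities, lam)

# Hypothetical usage: utilities of 4 services decay over 3 intervals.
sim = np.array([0.9, 0.4, 0.7, 0.2])
fetch = lambda t: np.array([0.3, 0.8, 0.6, 0.9]) * (0.9 ** t)
for ranking in online_recommend(sim, fetch, intervals=3, lam=0.6):
    print(ranking)
```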

5.4.5 Application Scenarios

5.4.5.1 Searching Styles

To address the above problems, we propose a novel search engine that provides the user with brand new searching styles. We define a user search query in the form of a vector \(R=(R_{f},R_{q})\), which contains a functionality part \(R_f\) and a non-functionality part \(R_q\) representing the user’s ideal Web service candidate. \(R_{q}=(C,W)\) defines the user’s non-functional requirements, where C and W set the user’s constraints and preferences on QoS criteria, respectively, as described in Sect. 5.4.1. Our new searching procedure consists of the three styles discussed below.

Keywords Specified In this searching style, the user only needs to enter the keywords vector \(r^k\) and the QoS requirements \(R_{q}\). The keywords should capture the main functionality the user requires. Taking the query in Table 5.1 as an example, since the user needs price information of cars, it is reasonable to specify “car” or “car, price” as the keywords vector.

Interface Specified In order to improve the searching efficiency, we design the “interface specified” searching style. In this style, the user specifies the expected functionality by setting the input vector \(r^{in}\) and/or output vector \(r^{out}\) as well as QoS requirements \(R_{q}\). The input vector \(r^{in}\) represents the largest amount of information the user can provide to the expected Web service operation, while the output vector represents the least amount of information that should be returned after invoking the Web service operation.

Similar Operations For more accurate and advanced Web service searching, we design the “similar operations” searching style by combining the above two styles. This style is especially suitable in the following two situations. In the first situation, the user has already received a Web service recommendation list by performing one of the above searching styles. The user picks a Web service to explore in detail, checks the inputs and outputs of its operations, and may even try some of the operations. After carefully inspecting a Web service, the user may find that it is not suitable for the application; however, the user does not want to repeat the time-consuming inspection process for the other service candidates. This style enables the user to find similar Web service operations by modifying only a small part of the previous query to exclude the inappropriate features. In the second situation, the user has already integrated a Web service into the application for a particular functionality, but for some reason this Web service becomes inaccessible. Without requiring an extra query, the search engine can automatically find substitutes.

Now, we discuss in detail how the functional evaluation component operates in different scenarios.

  • If only the keywords vector in the functionality part of the user query is defined, the similarity is computed as described in Sect. 5.4.2 using the keywords vector \(r^k\) of the query and the keywords vector K extracted from the descriptions, operation names, and parameter names.

  • If the input and output vectors in the functionality part of the user query are defined, the input similarity and output similarity are computed as described in Sect. 5.4.2 using the input/output vector \(r^{in}\)/\(r^{out}\) of the query and the input/output vector In/Out of an operation. The functional similarity is a combination of the input and output similarities.

  • If the whole functionality part of a query is available, the functional similarity of an operation is a combination of the above two kinds of similarities, computed using \(R_f\) and \(OP_f\).

5.5 Experiments

The aim of the experiments is to study the performance of our approach compared with other approaches (e.g., the one proposed in [7]). We conduct two experiments in Sects. 5.5.1 and 5.5.2, respectively. First, we show that the Top-k Web services returned by our approach achieve much higher QoS gain than those of other approaches. Second, we demonstrate that our approach achieves results as relevant as those of other similarity-based service searching approaches even when no QoS values are available.

5.5.1 QoS Recommendation Evaluation

In this section, we conduct a large-scale real-world experiment to study the QoS performance of the Top-k Web services returned by our searching approach.

To obtain real-world WSDL files, we developed a Web crawling engine to crawl WSDL files from different Web resources (e.g., UDDI, Web service portals, and Web service search engines). We obtained a total of 3738 WSDL files from 69 countries; these Web services contain 15,811 operations in total. To measure the non-functional performance of these Web services, 339 distributed computers in 30 countries from PlanetLab are employed to monitor them. The detailed non-functional performance of the Web service invocations is recorded by these service users (distributed computer nodes). In total, 1,267,182 QoS performance results are collected. Each invocation record is a k-dimensional vector representing the QoS values of k criteria. For simplicity, we use two matrices, representing the response-time and throughput QoS criteria, respectively, for the experimental evaluation in this chapter. Without loss of generality, our approach can easily be extended to include more QoS criteria.

Table 5.3 Statistics of the WS QoS dataset.

The statistics of Web service QoS dataset are summarized in Table 5.3. Response-time and throughput are within the range 0–20 s and 0–1000 kbps, respectively. The means of response-time and throughput are 0.910 s and 47.386 kbps, respectively. Figure 5.4 shows the distributions of response-time and throughput. Most of the response-time values are between 0.1 and 0.8 s, and most of the throughput values are between 5 and 40 kbps.

In most searching scenarios, users tend to look at only the top items of the returned result list. Items in higher positions, especially the first position, are more important than items in lower positions. To evaluate the quality of the Top-k returned results in a ranked list, we employ the Normalized Discounted Cumulative Gain (NDCG), a standard IR measure [4], as the performance evaluation metric. Let \(s_1,s_2,\ldots ,s_p\) be a ranked list of Web services produced by a searching approach, and let \(u_i\) be the QoS utility value of the Web service ranked at position i. The Discounted Cumulative Gain (DCG) and NDCG at rank k are defined, respectively, as

Fig. 5.4 Value distributions. \(\copyright \) [2010] IEEE. Reprinted, with permission, from Ref. [11]

$$\begin{aligned} DCG_{k}=u_1 + \sum _{i=2}^{k} \frac{u_{i}}{\log _2 i}, \end{aligned}$$
(5.13)
$$\begin{aligned} NDCG_{k}=\frac{DCG_{k}}{IDCG_{k}}, \end{aligned}$$
(5.14)

where \(IDCG_k\) is the maximum possible gain value, obtained with the optimal reordering of the k Web services in the list \(s_1,s_2,\ldots ,s_p\). For example, consider the following QoS utility values, ordered according to the positions of the associated Web services in a ranked Web service list:

\(u= [0.3, 0.2, 0.3, 0, 0, 0.1, 0.2, 0.2, 0.3, 0]\)

The perfect ranking would have QoS utility values of each rank of

\(u= [0.3, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0, 0, 0]\)

which would give ideal DCG utility values.
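As a check, a small sketch that computes Eqs. (5.13) and (5.14) for the example above:

```python
import math

def dcg(utilities, k):
    """DCG at rank k (Eq. 5.13): the top item counts fully, the rest
    are discounted by log2 of their position."""
    return utilities[0] + sum(u / math.log2(i)
                              for i, u in enumerate(utilities[1:k], start=2))

def ndcg(utilities, k):
    """NDCG at rank k (Eq. 5.14): DCG normalized by the ideal DCG."""
    ideal = sorted(utilities, reverse=True)
    return dcg(utilities, k) / dcg(ideal, k)

u = [0.3, 0.2, 0.3, 0, 0, 0.1, 0.2, 0.2, 0.3, 0]
print(round(ndcg(u, 10), 3))   # NDCG_10 for the example ranking
```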

Fig. 5.5 NDCG of Top-k Web services. \(\copyright \) [2010] IEEE. Reprinted, with permission, from Ref. [11]

To study the performance of our approach, we compare our WSExpress Web service search engine with URBE [7], a keywords matching approach, using the real-world dataset described above. In total, five query domains are studied in this experiment, each containing four user queries. Figure 5.5 shows the NDCG values of the Top-k recommended Web services. The Top-k NDCG values of our WSExpress engine are considerably higher than those of URBE (e.g., 0.767 for WSExpress versus 0.200 for URBE at Top5, and 0.697 versus 0.303 at Top10). This means that, given a query, our search engine can recommend high-quality Web services in the first positions.

Table 5.4 NDCG values (a larger NDCG value means better performance).

Table 5.4 shows the NDCG values of the Top-k recommended Web services in the five domains. In most of the queries, the NDCG values of WSExpress are much higher than those of URBE. In some search scenarios, such as query 2, the NDCG values of WSExpress and URBE for Top5 are identical, since in this particular case the most functionally appropriate Web services also have the most appropriate non-functional properties. In other words, these Top5 Web services have the highest QoS utilities and similarity values. However, when more top Web services are considered, such as Top10, the NDCG values of WSExpress become much higher than those of URBE.

5.5.2 Functional Matching Evaluation

In this experiment, we study the relevance of the recommended Web services to the user’s query without considering non-functional performance of the Web services. By comparing our approach with URBE, we observe that the Top-k Web services in our recommendation list are highly relevant to the user’s query even without any available QoS values.

The benchmark adopted for evaluating the performance of our approach is the OWLS service retrieval test collection OWLS-TC v2 [5]. This collection consists of more than 570 Web services and 1000 operations covering seven application domains (i.e., education, medical care, food, travel, communication, economy, and weaponry). The benchmark includes WSDL files of the Web services, 32 test queries, and a set of relevant Web services associated with each of the queries. Since the QoS feature is not considered in this experiment, we set the QoS utility value of each Web service as 1.

Top-k recall (\(Recall_{k}\)) and Top-k precision (\(Precision_{k}\)) are adopted as metrics to evaluate the performance of the different Web service search approaches. \(Recall_{k}\) and \(Precision_{k}\) can be calculated by

$$\begin{aligned} Recall_{k}=\frac{|Rel\cap Ret_k|}{|Rel|}, \end{aligned}$$
(5.15)
$$\begin{aligned} Precision_{k}=\frac{|Rel\cap Ret_k|}{|Ret_k|}, \end{aligned}$$
(5.16)

where Rel is the set of relevant Web services for a query and \(Ret_k\) is the set of Top-k Web service search results.
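Both metrics reduce to simple set operations; a minimal sketch (the service identifiers are hypothetical):

```python
def recall_at_k(relevant, retrieved, k):
    """Top-k recall (Eq. 5.15): fraction of relevant services retrieved."""
    return len(set(relevant) & set(retrieved[:k])) / len(set(relevant))

def precision_at_k(relevant, retrieved, k):
    """Top-k precision (Eq. 5.16): fraction of top-k results that are relevant."""
    top_k = set(retrieved[:k])
    return len(set(relevant) & top_k) / len(top_k)

rel = {"ws1", "ws3", "ws7"}
ret = ["ws1", "ws2", "ws3", "ws4", "ws5"]
print(recall_at_k(rel, ret, 5), precision_at_k(rel, ret, 5))
```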

Fig. 5.6 Recall and precision performance. \(\copyright \) [2010] IEEE. Reprinted, with permission, from Ref. [11]

Since users tend to check only the top few Web services in common search scenarios, an approach with high Top-k precision values is very practical in reality. Figure 5.6 shows the experimental results of our WSExpress approach and the URBE approach. In Fig. 5.6a, the Top-k recall values of WSExpress are higher than those of URBE. In Fig. 5.6b, the Top-k precision values of WSExpress are considerably higher than those of URBE, indicating that more relevant Web services are recommended in high positions by our approach.

5.5.3 Online Recommendation

In this chapter, we propose an online Web service recommendation approach. Different from the previous ranking approach, it adopts real-time QoS information to recommend Web services. In this section, we evaluate the performance of the online recommendation approach.

In this experiment, we deploy 142 distributed computers located in 22 countries from PlanetLab. In total, 4532 publicly available real-world Web services from 57 countries are monitored continuously by each computer. Each of the 142 computers sends null operation requests to all 4532 Web services during every time interval. The experiment lasts for 16 h, divided into 15-min time intervals. By collecting the invocation records from all the computers, we finally obtain 30,287,611 QoS performance results for the Web service QoS dataset. Each invocation record is a k-dimensional vector representing the QoS values of k criteria. We then extract a set of \(142 \times 4532 \times 64\) user-service-time tensors, each of which stands for a particular QoS property, from the QoS invocation records. For simplicity, we employ two tensors, representing the response-time and throughput QoS criteria, respectively, for the experimental evaluation in this chapter. Without loss of generality, our approach can easily be extended to include more QoS criteria.

Table 5.5 Statistics of online QoS dataset.

The statistics of Web service QoS dataset are summarized in Table 5.5. Response-time and throughput are within the range of 0–20 s and 0–1000 kbps, respectively. The means of response-time and throughput are 3.165 s and 9.609 kbps, respectively. The distributions of the response-time and throughput values of the user-service-time tensors are shown in Fig. 5.7a, b respectively. Most of the response-time values are between 0.1 and 0.8 s and most of the throughput values are between 0.8 and 3.2 kbps.

The experimental results are shown in Fig. 5.8. Each time interval lasts for 15 min, and the parameter setting is Top-k = 5. From Fig. 5.8, we observe that in each time interval the online recommendation approach has a higher NDCG value than URBE, which means that Web services with better QoS performance are recommended. Since URBE cannot adopt the dynamic QoS information for recommendation in time, its NDCG values decrease significantly as time passes. After about 30 time intervals, the NDCG value is below 0.3, which means that the QoS performance of the recommended Web services is highly likely to fail the users’ non-functional requirements. In our online rank aggregation approach, we employ the latest QoS information of Web services for recommendation. Therefore, the NDCG values are maintained at a high level, which indicates that we can always recommend appropriate Web services with high QoS performance to the users.

Fig. 5.7 QoS value distributions of the online dataset. \(\copyright \) [2010] IEEE. Reprinted, with permission, from Ref. [11]

Fig. 5.8 NDCG of online recommendation. \(\copyright \) [2010] IEEE. Reprinted, with permission, from Ref. [11]

Fig. 5.9 Impact of \(\lambda \). \(\copyright \) [2010] IEEE. Reprinted, with permission, from Ref. [11]

5.5.4 Impact of \(\lambda \)

In our method, the parameter \(\lambda \) controls the user’s preference between functionality and non-functionality; a larger value of \(\lambda \) means functionality is preferred. In Fig. 5.9, we study the impact of \(\lambda \) by varying its value from 0 to 1 with a step of 0.1. The other parameter setting is Top-k = 10.

Figure 5.9a shows the NDCG values and Fig. 5.9b shows the precision values. From Fig. 5.9a, we observe that \(\lambda \) impacts the NDCG performance significantly, which demonstrates that incorporating the QoS information greatly improves the non-functional quality of the recommended Web services. In general, as the value of \(\lambda \) increases from 0 to 1, the NDCG value decreases. This observation indicates that if functionality is preferred, the QoS performance of the recommended Web services decreases. If \(\lambda =0\), we employ only the QoS information for Web service recommendation; therefore, the NDCG value is 1. If \(\lambda =1\), we employ only the functional similarity information; therefore, the NDCG value is very small.

From Fig. 5.9b, we observe that \(\lambda \) also impacts the precision significantly, which demonstrates that incorporating the functional similarity information greatly improves the recommendation accuracy. In general, as the value of \(\lambda \) increases from 0 to 1, the precision value increases. This observation indicates that if functionality is preferred, the functional requirements can be fulfilled well. If \(\lambda =0\), we employ only the QoS information; therefore, the precision value is very small. If \(\lambda =1\), we employ only the functional similarity information; therefore, the precision value is 1. In the other cases, we fuse the QoS and functionality information for Web service recommendation.

A proper value of \(\lambda \) is highly related to the preference of the user. The user defines the importance of functionality and non-functionality. A proper value of \(\lambda \) can be defined by analyzing the impact of \(\lambda \) on a small sample dataset.

5.6 Summary

In this chapter, we present a novel Web service search engine, WSExpress, to find desired Web services. Both the functional and non-functional characteristics of Web services are captured in our approach. We provide users with three searching styles in WSExpress to adapt to different searching scenarios. A large-scale real-world experiment in a distributed environment and an experiment on the OWLS-TC v2 benchmark are conducted to study the performance of our search engine prototype. The results show that our approach outperforms related work.

In future work, we will conduct data mining on our dataset to identify for which formulas of \(\lambda \) our search approach achieves optimal performance. Clustering algorithms for similarity computation will be designed to improve the functional accuracy of the search results. Finally, the non-functional evaluation component will be extended to dynamically collect quality information of Web services.