Keywords

1 Introduction

Web services technologies are emerging as a powerful vehicle for organizations that need to integrate their applications within and across organizational boundaries [14]. The growing presence and adoption of Web services as the delivery mode in business have caused a wide range of attentions not only from industry but also academia. Due to the large number of Web services that meet the specific functional requirements of service users, its time and resource consuming for service users to choose appropriate services to satisfy their various needs. To help service users find their desired services, there is urgently need to build effective methods of service selection and recommendation.

To facilitate effective Web service recommendation, the quality of candidate services needs to be assessed from non-functional properties [5]. Quality-of-Service (QoS), which is usually employed for representing the non-functional characteristic of Web services such as response-time, failure-rate and availability, has become an important differentiating point of different Web services [6]. And different Web service QoS properties can be divided into two parts: one is the user-independent QoS properties (e.g., price, popularity, etc.) and the other is the user-dependent QoS properties (e.g., response-time, failure-rate, throughput, etc.) [7]. Specially, the values of user-independent QoS properties are usually provided by service providers and equal for different users, while the values of user-dependent QoS properties can fluctuate widely for different service users since the values are highly related to the Internet conditions and the positions of the service users. Consequently, the same Web service may have very different QoS values for different service users, and methods where QoS values obtained by one user are straight used by other users are inappropriate. Instead, a personalized Web service prediction is needed, in which accurate values of the user-dependent QoS properties should be obtained immediately.

Some previous works [811] have employed collaborative filtering (CF) [1215] to Web service recommendation. These methods can predict the QoS performance of a Web service for a user by employing historical QoS information from other similar service users, who have similar historical QoS experience on the same set of commonly-invoked Web services [57].

To improve the prediction accuracy, some hybrid prediction methods are proposed by combining of both memory-based CF method and model-based CF method [16] or user-based CF [17, 18] method and item-based CF method [10, 11, 19, 20]. Work [16] employs clustering model and item-based CF method to enhance the prediction accuracy. Zheng et al. [10, 11] and Tang et al. [19, 20] combine user-based CF method and item-based CF method to balance two different CF methods, then increase the prediction accuracy of the traditional CF methods.

However, there is still one problem of the previous works to be solved, which influences the performance of QoS prediction. That is to say, the current methods fail to recognize the importance of the asymmetric correlation among different service users. Through the experiments implemented on a real-world Web service datasetFootnote 1, which includes 1.97 million Web service invocation results of 339 Web services observed by 5825 service users, we conclude that the asymmetric correlation among service users is the same significant as the similarity of services for QoS prediction.

To address the absence of asymmetric correlation among service users, different from the above methods, we propose a novel method for QoS prediction in dynamic Web services. The proposed method systematically integrates asymmetric correlation among service users and asymmetric correlation propagation into the deviation computation of different service items in QoS prediction. In addition, this paper mainly focuses on the asymmetric correlation model for service users, asymmetric correlation propagation model and deviation computation method with asymmetric correlation.

The main contributions of this work are threefold:

  • Firstly, the asymmetric correlation for service users is introduced into the QoS prediction in dynamic Web services, which has a significant effect on the prediction accuracy.

  • Secondly, asymmetric correlation propagation model is also proposed to take into account of the propagation and attenuation of asymmetric correlation for service users.

  • Finally, experiments are conducted to evaluate our method by employing a real-world QoS dataset which includes about 1.97 million invocations for Web services.

The remainder of this paper is organized as follows: Sect. 2 provides an overview on the proposed framework of QoS value prediction. Section 3 introduces the QoS value prediction method with asymmetric correlation in detail. Section 4 presents the experiments of our method. Section 5 concludes the paper with a summary and a description of future work.

2 Framework of QoS Value Prediction

To provide the optimal services for service users among functionally equivalent ones, the framework shown in Fig. 1 acts as a platform for service users to obtain optimal Web services with satisfying predicted QoS values by employing asymmetric correlation and asymmetric correlation propagation. Figure 1 shows the framework overview of QoS value prediction, which includes the following procedures.

  1. (a)

    Asymmetric correlation (AC) modeling is the first crucial problem to be solved in our framework. Firstly, existing QoS values of Web services are collected and analyzed to build the correlation matrix (CM) for QoS values. And the correlation matrix records the numbers of Web services that have been invoked by each pair of service users. After that, asymmetric correlation matrix (ACM) for service users can be constructed by using correlation matrix, where each entry in the ACM matrix is related to not only the numbers of Web services invoked by each pair of service users but also the numbers of Web services invoked by the others. Finally, according to the asymmetric correlation matrix, asymmetric correlation graph for service users can be built.

  2. (b)

    Asymmetric correlation propagation modeling is the second crucial problem to be attacked. In this step, the propagation of asymmetric correlation for service users in asymmetric correlation graph is taken into account adequately. Furthermore, it’s very important to obtain the relation flow, which transfers the values of asymmetric correlation for a service user to others strongly connected to it.

  3. (c)

    Deviation computation of Web services in QoS prediction framework will be built based on the values of asymmetric correlation propagation and the QoS dataset. Then the results of deviation computation can be used to find the different services for corresponding service users. In this step, traditional deviation computation employing SLOPEONE method [14] is modified and extended with the values of propagation.

  4. (e)

    After finding the appropriate different services, the missing QoS values for service users can be predicted by employing the deviation character.

Fig. 1.
figure 1

Framework overview of QoS value prediction

3 QoS Value Prediction with Asymmetric Correlation

3.1 Preliminaries

In order to make an accurate prediction, some researchers employ collaborative filtering methods to provide personalized prediction of QoS values for service users [811]. Due to the great successes in modeling characteristics of service users and Web services, collaborative filtering techniques have been widely employed in famous commercial systems, such as Amazon, eBay, etc. [10]. In the QoS prediction scenario, CF methods can automatically predict QoS values of unused services for service users by collecting Web service QoS information from other users.

In a typical QoS prediction scenario [10, 11], there is a user-item matrix which includes M service users and N Web service items. Every entry in this user-item matrix \(r_{m,n}\) denotes a vector of QoS values (e.g., response-time, failure-rate, etc.), which is obtained by the service user m on the Web service item n. And if Web service item n is not invoked by service user m previously, then \(r_{m,n}=0\).

Item-based CF method can provide dramatically better performance than user-based method [10, 15]. And Slope One method (named as SLOPEONE) is a typical item-based CF method which works on comparing the intuitive principle of a popular differential between service items rather than similarity between service items [15]. The typical deviation computation of service item i and j can be calculated by:

$$\begin{aligned} Dev(i,j) = \sum \limits _{u \in U(i) \cap U(j)} {\frac{{r_{u,i} - r_{u,j} }}{{|U(i) \cap U(j)|}}}, \end{aligned}$$
(1)

where every user u has invoked both service item i and service item j, and \(|U(i)\cap U(j)|\) is the number of service users consisting of u. \(r_{u,i}\) and \(r_{u,j}\) represent the QoS values of service item i and service item j observed by user u, respectively.

In addition, the SLOPEONE method uses different service items to predict the QoS values by employing the following equation:

$$\begin{aligned} P(r_{u,i} ) = \overline{r_{u}} + \frac{{\sum \limits _{j \in R_u } {Dev(i,j)} }}{{|R_u |}}, \end{aligned}$$
(2)

where \(P(r_{u,i})\) is a vector of predicted QoS values of entry \(r_{u,i}\) in the user-item matrix using SLOPEONE method. \(\overline{r_{u}}\) is a vector of average QoS values of different service items invoked by service user u. Furthermore, \(R_u = \{ j|j \in r_u ,j \ne i,num(Dev(i,j)) > 0\}\) is the set of all relevant items, in which \(num(Dev(i,j)) > 0\) represents the number of service user u, who invoked both service item i and service item j, should be larger than 0. And \(|R_u|\) is the number of \(R_u\). Dev(ij) can be calculated by Eq. (1).

3.2 Asymmetric Correlation Modeling

In this paper, asymmetric correlation is introduced to address the absence of correlations among service users, which have a significant effect on QoS value prediction. Furthermore, modeling of asymmetric correlation includes three steps which are described as follows.

Correlation Matrix (CM) for QoS Values Building. A necessary first step before correlation matrix (CM) building is to give the definition of CM, which is based on the assumption that the more items that have been rated by both user \(u_p\) and user \(u_q\), the closer the users are [21]. Based on work [21], in which correlation rating matrix records the number of items that have been rated by each pair of users in MovieLens scenario, the CM in QoS value prediction for Web services can be defined by:

(3)
$$\begin{aligned} CM_{u_p ,u_q } = |S(u_p ,u_q )\mathrm{{ |,}} \qquad p = 1,...,M,q = 1,...,M, \end{aligned}$$
(4)

where \(S(u_p ,u_q )\) is a set of Web services observed by both service user \(u_p\) and service user \(u_q\), and \(|S(u_p ,u_q )|\) is the number of \(S(u_p ,u_q )\). \(r_{u_p ,i_n }\) and \(r_{u_q ,i_n }\) are the sets of QoS values of Web service in observed by service user \(u_p\) and service user \(u_q\), respectively. Furthermore, M is the number of service users and N is the number of Web service items in user-item matrix.

Equations (3) and (4) show that CM is an M-by-M symmetric matrix. Taking into account the degrees of service user correlations, the values of correlations for service users, which invoked different number of Web services, should be different.

Fig. 2.
figure 2

Asymmetric correlation graph for service users

Asymmetric Correlation Matrix (ACM) for Service Users Building. According to [21], the correlation of service users is related to not only the numbers of Web services invoked by each pair of users but also the total numbers of Web services invoked by other users. Therefore, CM matrix can be improved and transformed to asymmetric correlation matrix (ACM). So, the ACM matrix for service users in our method can be calculated by:

$$\begin{aligned} ACM_{u_p ,u_q } = \left\{ \begin{array}{l} \frac{{|S(u_p ,u_q )\mathrm{{ |}}}}{{\sum \limits _{l \in M}^{l \ne p} {|S(u_p ,u_l )|} }}\mathrm{{,}}\;p \ne q,\mathrm{{ }} \\ \ 0\ \quad \qquad \quad \mathrm{{,}}\;p = q, \\ \end{array} \right. p = 1,...,M,q = 1,...,M, \end{aligned}$$
(5)

where \(u_p\) and \(u_q\) are the service users in user-item matrix, and \(u_l\) are the service users who are not equal to \(u_p\).

From Eqs. (4) and (5), it can be observed that ACM matrix is different from CM matrix and ACM matrix is an asymmetric matrix.

Asymmetric Correlation Graph for Service Users Building. According to ACM matrix, the correlations of service users can be built as a directed graph as shown in Fig. 2, in which a node \(u_p\) expresses a service user and a directed edge \(e_{pq}\) from node \(u_p\) to node \(u_q\) expresses an asymmetric correlation from \(u_p\) to \(u_q\). Therefore, the asymmetric correlation graph can be employed in asymmetric correlation propagation model to obtain the relation flow.

3.3 Asymmetric Correlation Propagation Modeling

In the step, we take into account not only asymmetric correlation among service users but also propagation and attenuation of asymmetric correlation. Observing from Fig. 2, we can intuitively see that the correlation among service users can spread throughout the graph and some users may obtain high correlation than others.

PageRank is an excellent way to prioritize the results of Web keyword searches, which counts the approximation of a pages importance or quality by not counting links from all pages equally, and by normalizing by the number of links on a page. And the PageRank for node n is defined as follows [22]:

$$\begin{aligned} PR(n)=a\cdot \sum \limits _{q:(q,n) \in \varepsilon } {\frac{{PR(q)}}{{w_q }}} + (1 - a)\cdot \frac{1}{{|v|}}, \end{aligned}$$
(6)

where PR(n) is the importance score for every node \(n \in v\) according to the graph connectivity. |v| is the number of nodes in graph \(G = (v,\varepsilon )\), in which \(\varepsilon \) is the set of directed connected edges among nodes. a is a decay factor which is usually set to 0.85. And \(w_q\) is the out-degree of node q.

Similar to PageRank [23], an asymmetric correlation propagation model is proposed which is based on the intuition that the correlation of service users can be effectively computed by employing a similar method as PageRank.

So, the following equation can be defined to compute the propagation of correlation among service users by modifying Eq. (6). And then, different correlation ranks for service users can be obtained.

$$\begin{aligned} PR(u_q ) = a\cdot \sum \limits _{p \in N(u_q )} {PR(u_p )} ACM(u_p ,u_q ) + (1 - a)\cdot \frac{1}{M}, \end{aligned}$$
(7)

where \(PR(u_q)\) is a vector of correlation ranks for service user \(u_q\). \(N(u_q)\) is a set of service users that connect to service user \(u_q\). ACM is the asymmetric correlation matrix calculated by Eq. (5), and M is the number of service users in user-item matrix.

In vector form, Eq. (7) can be equivalently defined as:

$$\begin{aligned} \left[ \begin{array}{l} PR(u_1 ) \\ \mathrm{{ }}\ \quad . \\ \mathrm{{ }}\ \quad . \\ \mathrm{{ }}\ \quad . \\ PR(u_M ) \\ \end{array} \right] = a\cdot ACM^T \cdot \left[ \begin{array}{l} PR(u_1 ) \\ \mathrm{{}}\ \quad . \\ \mathrm{{}}\ \quad . \\ \mathrm{{}}\ \quad . \\ PR(u_M ) \\ \end{array} \right] + \left[ \begin{array}{l} (1 - a)/M \\ \mathrm{{ }}\quad \quad . \\ \mathrm{{ }}\quad \quad . \\ \mathrm{{ }}\quad \quad . \\ (1 - a)/M \\ \end{array} \right] , \end{aligned}$$
(8)

where \(ACM^T\) is the transposed matrix of the ACM.

The correlation ranks for service users can be determined by calculate the eigenvector with eigenvalue 1 or by repeating the process until all correlation ranks become stable [23].

3.4 Deviation Computation and QoS Value Prediction

In traditional CF, Gao et al. [21] use different weights for users to represent the relationships. Different from this work which focuses on movie recommendation by using MovieLens dataset, our method employs about 1.7 million real-world QoS records (eg., response-time, throughput.) for Web services observed by service users to predict the missing QoS values for service users, and modifies the propagation model to suit the dynamic QoS prediction Scenario.

In this step, the asymmetric correlation and propagation among service users are combined into deviation computation of Web services for QoS value prediction in service-oriented architecture.

And then, the following equations are employed to compute the deviation and predict the QoS values by using enhanced SLOPEONE method respectively.

$$\begin{aligned} Dev'(i,j) = \frac{{\sum \limits _{u \in U(i) \cap U(j)} {(r_{u,i} - r_{u,j} )*PR(u)} }}{{\sum \limits _{u \in U(i) \cap U(j)} {PR(u)} }}, \end{aligned}$$
(9)
$$\begin{aligned} P'(r_{u,i} ) = \overline{r_u } + \frac{{\sum \limits _{j \in R_u } {Dev'(i,j)} }}{{|R_u |}}, \end{aligned}$$
(10)

where PR(u) is a set of correlation ranks for service user u which can be calculated by Eq. (8). \(Dev'(i,j)\) is the new deviation value that is employed to predict the QoS values in Eq. (10). \(P'(r_{u,i})\) is a new vector of predicted QoS of \(r_{u,i}\) in user-item matrix, in which the asymmetric correlation and propagation have been took into account adequately.

4 Experiments

4.1 Dataset Description

To study the QoS value prediction performance of our method, we use a well-known dataset provided by Zheng et al. [7], which includes about 1.97 million invocations for Web services which are obtained from 339 service users on 5825 publicly available Web services. In addition, a \(339\,\times \,5825\) user-service item matrix can be obtained by analyzing and processing the dataset, and each entry in the user-service item matrix represents a QoS value (i.e., response-time.).

In order to evaluate the performance of our method, we randomly choose different number of services from 5825 available Web services named as NS ranging from 1000 to 4000 with a step value of 1000. Then, a \(339\,\times \,NS\) user-service item matrix is constructed. After that, all entries in the \(339\,\times \,NS\) user-service item matrix are divided into two parts randomly, one as the training set and the other as the test set. A parameter percent (0 \(<\) percent \(\le \) 1) is employed to present the percentage of training set in whole user-service item matrix, and (1-percent) shows the percentage of test data. Furthermore, to simulate the real situation, entries in training set are removed randomly with density (0 \(<\) density \(\le \)1).

4.2 Metrics

Mean Absolute Error (MAE) is employed to measure the prediction accuracy of our method in comparison with other well-known CF methods, such as the User-based method using Pearson Correlation Coefficient (named as UPCC), the Item-based method using Pearson Correlation Coefficient (named as IPCC), and WSRec method [10]. The MAE is defined as:

$$\begin{aligned} MAE = \frac{{\sum \nolimits _{u,i} {|r_{u,i} - \hat{r}_{u,i} |} }}{N}, \end{aligned}$$
(11)

where \(r_{u,i}\) is the real QoS value of service item i invoked by user u, \(\hat{r}_{u,i}\) denotes the predicted QoS value for service item i, and N is the number of predicted values. In addition, the Normalized Mean Absolute Error (NMAE) is also used to measure the prediction quality as follows.

$$\begin{aligned} NMAE = \frac{{MAE}}{{\sum \nolimits _{u,i} {r_{u,i} } /N}}. \end{aligned}$$
(12)

The lower the MAE, NMAE are, and the more effective the prediction methods are.

In order to study the prediction performance more completely, other metrics are introduced in this paper, which include Precision, Recall, and F-measure [24, 25].

Precision is defined as the ratio of number of truly “high” ratings among these that were predicted to be “high” by a prediction method in this way [25]:

$$\begin{aligned} Precision = \frac{{|R_{u,i} \cap T_{u,i} |}}{{|R_{u,i} |}}, \end{aligned}$$
(13)

where \(R_{u,i}\) is a vector of predicted QoS values of \(r_{u,i}\) with “high” QoS values and \(T_{u,i}\) is a vector of true QoS values observed by users with “high” QoS values.

Recall is defined as the ratio of number of correctly predicted “high” ratings among all the ratings obtained to be “high” which is calculated by [25]:

$$\begin{aligned} Recall = \frac{{|R_{u,i} \cap T_{u,i} |}}{{|T_{u,i} |}}. \end{aligned}$$
(14)

Larger Precision and Recall correspond to the better performances respectively. However, the values of Precision and Recall are usually conflictual. That is to say, the Precision can be improved by increasing the number of “high” QoS values with high efficiency, while the Recall is increasing correspondingly with high cost [26]. Due to the inconsistent performances between precision and recall, F-measure metric can be used to balance of above two competitive metrics [27].

$$\begin{aligned} F_\beta = \frac{{(1 + \beta ^2 ) \cdot (Precision \cdot Recall)}}{{\beta ^2 \cdot Precision + Recall}}, \end{aligned}$$
(15)

where the parameter \(\beta \) is a regular certain value which is set to 0.5, 1, and 2 respectively.

4.3 Comparison

Since the user-service item matrix is usually very sparse in reality, the QoS value prediction methods always encounter the sparsity problem. Then, the parameter density is set to 0.1 to keep the consistency with the real-world data distribution. Table 1 shows the MAE and NMAE values of five different prediction methods on response-time with percent = 0.8, density = 0.1 and NS from 1000 to 4000 with a step value of 1000. And the best performance result is shown with parameter \(\lambda =0.2\) in WSRec method [10]. The UPCC method and IPCC method are the famous CF methods which are widely studied. In addition, SLOPEONE method is defined in Eq. (2) and our method is named SLOPEONE-R which takes into account the asymmetric correlation and propagation adequately. Figure 3 includes three versions of Table 1. One version compares the MAE and NMAE values with different NS range from 1000 to 4000, the second version uses the bar graph to show the MAE and NMAE values with different methods while the last version employs curve graph to illustrate the five methods with different MAE and NMAE values under different experimental sets.

Table 1. MAE and NMAE comparison with basic methods
Fig. 3.
figure 3

MAE and NMAE values comparison (Color figure online)

The experimental results for the five methods using the same error measure and over the same data set shown in Table 1 and Fig. 3 indicate that:

  • Our SLOPEONE-R method improves the prediction performance, and outperforms other four methods for two different metrics-MAE and NMAE under all experimental settings.

  • The SLOPEONE-based methods outperform the Pearson Correlation Coefficient (PCC)-based methods, which indicates that deviation computation is more important than similarity computation in QoS prediction.

  • The SLOPEONE-R method obtains smaller MAE and NMAE values than SLOPEONE method. This observation shows that asymmetric correlation and propagation among service users have an important effect on QoS prediction accuracy.

  • Fig. 3(a)–(d) show that the MAE and NMAE values have the similar downward trend of five different methods with different NS, which the MAE is smaller than NMAE consistently and the UPCC method obtains the largest MAE and NMAE values while SLOPEONE-R method always gains the smallest MAE and NMAE values. This illustrates the better prediction accuracy of our method.

  • Both Fig. 3(e) and (f) illustrate that the MAE and NMAE values of five methods generally experience a downward trend with the rise of the Web service number from 1000 to 3000, indicating that more QoS values can improve the prediction accuracy. By contrast, the MAE values of five methods experience an upward trend when the Web service number is set to 4000, since the prediction accuracy is influenced not only by the Web service number, but also by the nature of datasets. Furthermore, too many QoS values may suffer from data over fitting.

  • It can be seen from the Table 1 and Fig. 3 that the absolute values of MAE and NMAE of our method are improved with slight enhancement. The reason is that the number of service users in user-service item matrix is 339 and the mean rank for every user is about 0.002950 with a small value. Though with slight enhancement of MAE and NMAE values, our method can obtain better performance with other metrics which will be discussed in following paragraph.

As shown in Table 2, we compare our SLOPEONE-R method with WSRec method and SLOPEONE method by employing three metrics, such as Precision, Recall and F-measure with the parameter \(\beta \)  = 0.5, \(\beta \)  = 1 and \(\beta \)  = 2 respectively. In the experiment, we set NS = 1000, percent = 0.8, and density = 0.1. In services computing, the “high” QoS values in Eqs. (13) and (14) are more than 0 in the experiment. In addition, the ratios of performance improved by using SLOPEONE-R method versus WSRec method and SLOPEONE method are also presented in Table 2. Furthermore, Fig. 4 illustrates the different levels of performance with seven metrics which are composed of Tables 1 and 2.

Table 2. Performance comparison of three methods

Observing from Table 2 and Fig. 4, we draw the conclusion that:

  • The SLOPEONE-R method outperforms other competitive methods steadily in all metrics except Precision, indicating that by using asymmetric correlation and propagation among service users, the prediction accuracy can be enhanced noticeably.

  • It can be seen from the Table 2 that the prediction accuracy of SLOPEONE-R method obtains significant enhancement on average by 87.33 % versus WSRec, and by 0.47 % versus SLOPEONE for four different metrics (e.g., Recall, F0.5, F1, F2.), indicating better prediction accuracy of our method.

  • In Table 2, the Precision values of three different methods are equal, since \(R_{u,i}\) is the subset of \(T_{u,i}\) in QoS prediction and the true and effective QoS values in \(T_{u,i}\) are all larger 0, then the values of Precision for different methods are equal to 1.

  • In Fig. 4, the same as the experimental results given in Table 1 and Fig. 4, the experimental results of three different methods are following the same trends, which SLOPEONE-based methods outperforms PCC-based methods and SLOPEONE-R method exceeds SLOPEONE method with high values of metrics-Recall, F0.5, F1 and F2.

Fig. 4.
figure 4

Performance comparison with seven metrics (Color figure online)

5 Conclusion and Future Directions

In this paper, we leverage the correlations of service users, more specifically asymmetric correlation and asymmetric correlation propagation among service users, to discover relations among all service users involved in QoS prediction. Furthermore, our method combines asymmetric correlation and asymmetric correlation propagation into the deviation computation of different service items in QoS prediction, by using improved PageRank method and extended SLOPEONE method to improve the prediction accuracy. In addition, experiments are conducted in real-world dataset indicating the effectiveness and feasibility of our method. Future direction includes integrating more QoS properties in our method to predict QoS values. Moreover, experiments of proposed method will be enlarged under more different situations.