
1 Introduction

Context-Aware Recommender Systems (CARS) for Web services aim not only to recommend Web services similar to those the user has already rated highly, but also to combine contextual information with the recommendation process [1, 4, 10]. In recent years, preliminary benefits have been observed in Web services recommendation that considers various contextual factors [2, 3]. For example, temporal [7, 8], spatial [9, 12, 13] and social [5, 6, 11, 14, 16] contexts are widely exploited, typically in isolation, for personalized Web services recommendation.

Specifically, one of the key steps of a CARS method is to refer to previous service invocation experiences made under locations similar to that of the current user in order to predict Quality of Service (QoS) [17]. Existing works mainly discuss the influence of regional correlations on user preference [14]. Several novel methods also combine location-aware contexts with matrix factorization [13]. However, the user preference expansion triggered by instant updates of the user location has not been fully explored for personalized recommendation. For instance, when making a recommendation for a user, we should immediately become aware of a rapid change of the user's location, and hence of the corresponding expansion of the user's preference.

In this paper, we propose a Web services recommendation approach dubbed CASR-UPE. Our approach consists of three steps: (1) model the influence of user location update on user preference; (2) perform context-aware similarity mining for the updated location; (3) predict the QoS of Web services by Bayesian inference and recommend the best Web service to the user. Finally, we evaluate the CASR-UPE algorithm on the WS-Dream dataset [17] using both RMSE and MAE as evaluation metrics. Experimental results show that our approach outperforms six state-of-the-art benchmark methods by a significant margin.

The remainder of the paper is organized as follows. Section 2 introduces related work. Section 3 presents a motivating example. Section 4 details the CASR-UPE method. Section 5 reports the experimental results and discussion. Finally, Sect. 6 concludes the paper and outlines perspectives.

2 Related Works

Context-aware recommender systems have gained significant momentum in recent years. The development of mobile devices and their crowd-sensing capabilities has enabled the collection of rich contextual information on time and location [9, 11–13].

Temporal contexts [7, 8] have been widely used in conventional CARS methods. Another widely discussed type of context information is the location context [9, 12, 13], especially in location-based social networks (LBSNs) [5, 6]. A location-aware services recommendation method is presented in [9, 10], which refers to previous service invocation experiences made under locations similar to that of the current user. However, these methods merely use the location as a filter when making recommendations to the current user. The influence of regional correlation on user preference is considered in [13]. In addition, a location-based hierarchical matrix factorization (HMF) method [15] has been proposed to perform personalized QoS prediction. In short, the above location-based service recommendation methods overlook the user preference expansion triggered by instant updates of the user location.

Furthermore, temporal location correlations [20, 21] have been studied for location recommendation in LBSNs. However, location recommendation in LBSNs differs from Web services recommendation, and temporal location effects do not appear suitable for Web services recommendation.

In contrast to existing works on location-aware contexts in CARS, we propose a Web services recommendation approach that considers the influence of user location update on user preference and then performs similarity mining for the updated location.

3 A Motivating Example

Figure 1 shows a usage scenario for recommending weather forecast services while considering user preference expansion. The upper part represents a Web services repository (\( S_1, S_2, \ldots, S_n \)) and many service users (\( u_1, u_2, \ldots, u_m \)), where services and users are distributed all over the world. Suppose \( S_1 \) = “Weather China”, \( S_2 \) = “Moji Weather China”, \( S_3 \) = “US National Weather Service”, and \( S_4 \) = “Le Figaro météo in France”. The lower part illustrates the distributed networks. The curves link users and services to their corresponding geographic positions.

Fig. 1. A scenario of weather forecast services recommendation considering user preference expansion

Firstly, in the scenario of Fig. 1, \( U_{\text{current}} \) is in New York City at the present time. The accuracy of a weather forecast service is highly relevant to the location, and it is natural to assume that a user prefers a service located in his/her city. So when making recommendations to \( U_{\text{current}} \) at the present time, we should consider her current location (New York) and recommend \( S_3 \) to her. However, if we know that the user will attend a conference in Beijing next week (e.g., from his/her mobile phone calendar), we should consider the influence of the location update (Beijing) on her preference and recommend \( S_1 \) to \( U_{\text{current}} \) when recommending weather services for next week. In a word, when making recommendations to users, we should consider the influence of user location update on user preference.

Secondly, when making recommendations to \( U_{\text{current}} \), we should also consider the set of users in the same location as \( U_{\text{current}} \), because the more similar the contexts (e.g., locations) of two users are, the more likely the two users are to share the same preference. A user location update therefore leads to a different set of similar users. For instance, while \( U_{\text{current}} \) is in New York at the present time, her set of similar users consists of \( U_4, U_5, U_6 \) (all from New York). But since \( U_{\text{current}} \) will be in Beijing next week, her set of similar users will become \( U_1, U_2, U_3 \) (all from Beijing). So we should also consider the influence of user location update on the set of similar users.

In summary, we take into account the expansion of user preference triggered by instant updates of user location when making recommendations to the current user.

4 CASR-UPE Algorithm: Context-Aware Web Services Recommendation Based on User Preference Expansion

4.1 Problem Definition

To help readers understand our algorithm, we give the following definitions.

We assume a set of users \( U = \left\{ u_1, u_2, \ldots, u_n \right\} \) and a set of Web services \( S = \left\{ s_1, s_2, \ldots, s_m \right\} \) in the context-aware Web services recommender system. Each user \( u_i \left( 1 \le i \le n \right) \) from \( U \) must have invoked a service from \( S \) at least once.

\( L_{U,t} = \left\{ l_{u_i,t} \right\} \left( 1 \le i \le n \right) \) is the set of temporal locations \( l_{u_i,t} \) of the users \( u_i \) at time \( t \).

\( L_S = \left\{ l_{s_k} \right\} \) is the set of network locations \( l_{s_k} \) of the services \( s_k \).

\( R = \left\{ r_{u_i,s_k} \right\} \) is the set of rating records of the Web services \( s_k \) by the users \( u_i \), where \( 1 \le i \le n \) and \( 1 \le k \le m \).

\( \bar{R} = \left\{ \bar{r}_1, \bar{r}_2, \ldots, \bar{r}_i, \ldots, \bar{r}_n \right\} \) is the set of mean ratings, where \( \bar{r}_i \) is the mean rating over all Web services invoked by the user \( u_i \).

When a service \( s_k \) is invoked by the user \( u_j \), the invocation yields a set of QoS properties. We then have \( Q_{k,j} = \langle q_1^{k,j}, q_2^{k,j}, \ldots, q_l^{k,j} \rangle \), an \( l \)-tuple denoting the invocation record of \( s_k \) invoked by the user \( u_j \), where \( q_h^{k,j} \left( 1 \le k \le m, 1 \le j \le n, 1 \le h \le l \right) \) denotes the value of the \( h \)-th QoS property recorded during the invocation of \( s_k \) by \( u_j \).
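To make these definitions concrete, the following is a minimal sketch in Python of how the entities above could be represented; all names (`Invocation`, `Dataset`, etc.) are ours and not part of the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Invocation:
    user_id: int       # index i into U = {u_1, ..., u_n}
    service_id: int    # index k into S = {s_1, ..., s_m}
    qos: tuple         # l-tuple <q_1, ..., q_l> of QoS property values

@dataclass
class Dataset:
    user_locations: dict = field(default_factory=dict)     # L_{U,t}: (user_id, t) -> location vector
    service_locations: dict = field(default_factory=dict)  # L_S: service_id -> network location
    ratings: dict = field(default_factory=dict)            # R: (user_id, service_id) -> rating
    invocations: list = field(default_factory=list)        # all recorded Q_{k,j} tuples

    def mean_rating(self, user_id: int) -> float:
        r"""\bar{r}_i: mean rating over all services invoked by user u_i."""
        vals = [r for (u, _s), r in self.ratings.items() if u == user_id]
        return sum(vals) / len(vals) if vals else 0.0
```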

4.2 Modeling the Influence of User Location Update on User Preference

We know that the user's location changes over time. As described in Sect. 3, for region-related services (e.g., weather forecast services), the accuracy of the recommendation is highly relevant to the specific region in real time. The impact of regional correlation on user preference is defined as:

$$ P_{RC\,L(t)} = \begin{cases} 1 & \text{if the Web service is region-related} \\ 0 & \text{if the Web service is not region-related,} \end{cases} $$
(1)

For region-unrelated services, it is also reasonable for users to prefer services near their region, because the network distance between users and services (mainly due to transfer delay) has an obvious effect on Internet application performance (e.g., response time and throughput). The influence of network distance on user preference can be defined as:

$$ P_{ND L\left( t \right)} = P_{0} Dis\left( {l_{{u_{i} ,t}} ,l_{{s_{k} }} } \right)_{nor} , $$
(2)

Here \( P_0 \) is a constant, which we set to 1 in our experiments. \( Dis\left( l_{u_i,t}, l_{s_k} \right) \) is the network distance between \( l_{u_i,t} \) (the user's network location) and \( l_{s_k} \) (the service's network location). Network distance measurement technology can provide \( Dis\left( l_{u_i,t}, l_{s_k} \right) \), which is normalized as \( Dis\left( l_{u_i,t}, l_{s_k} \right)_{nor} \) so that a uniform evaluation criterion applies.

In addition, different weights are assigned to the impacts of regional correlation and network distance (\( w_1 \) for \( P_{RC\,L(t)} \) and \( w_2 \) for \( P_{ND\,L(t)} \)). Thus, the influence of user location update on user preference is represented as follows:

$$ P_{L(t)} = w_1 P_{RC\,L(t)} + w_2 P_0 \, Dis\left( l_{u_i,t}, l_{s_k} \right)_{nor}, $$
(3)

Finally, we filter the data based on \( P_{L(t)} \) and obtain the services that correspond to the current preference of the user.
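As an illustration, the following Python sketch implements Eqs. (1)–(3) and the filtering step. The weights `w1` and `w2`, the threshold, and the exact filtering rule (keep services whose score passes the threshold) are our assumptions, since the paper does not fix them here.

```python
def location_preference(region_related: bool, net_dist_nor: float,
                        w1: float = 0.5, w2: float = 0.5, p0: float = 1.0) -> float:
    """Eq. (3): P_{L(t)} = w1 * P_{RC L(t)} + w2 * P_0 * Dis(...)_nor."""
    p_rc = 1.0 if region_related else 0.0   # Eq. (1): regional correlation
    p_nd = p0 * net_dist_nor                # Eq. (2): normalized network distance
    return w1 * p_rc + w2 * p_nd

def filter_services(services, region_related, net_dist_nor, threshold=0.5):
    """Keep the services whose preference score passes the threshold
    (the exact filtering rule is our assumption, not the paper's)."""
    return [s for s in services
            if location_preference(region_related[s], net_dist_nor[s]) >= threshold]
```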

4.3 Context-Aware Similarity Mining for Updated Location

In this step, it is assumed that for location-based services recommendation, the more similar the contexts (e.g., locations) of the current user and another user are, the more likely the two users are to observe similar QoS on the same Web service. This step yields the set of users whose locations are similar to that of the current user.

We use the Euclidean distance to describe the similarity between two users' locations: the smaller the distance, the more similar they are. The following formula gives the Euclidean distance between \( l_{u_i,t} \) and \( l_{u_j,t} \):

$$ Sim\left( {l_{{u_{i} ,t}} ,l_{{u_{j} ,t}} } \right) = \sqrt {\sum\nolimits_{k = 1}^{N} {\left( {l_{i,t,k} - l_{j,t,k} } \right)^{2} } } , $$
(4)

Furthermore, we calculate the distances between the current user's location and the other users' locations, and thus obtain the set of users closest to the current user.
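A possible implementation of this similarity mining step is sketched below; the size of the similar-user set (`top_k`) is our assumption, as the paper does not fix it.

```python
import math

def location_distance(loc_a, loc_b):
    """Eq. (4): Euclidean distance between two N-dimensional location vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(loc_a, loc_b)))

def similar_users(current_loc, user_locations, top_k=3):
    """Return the top_k users whose locations are closest to current_loc.
    user_locations maps user_id -> location vector at time t."""
    ranked = sorted(user_locations.items(),
                    key=lambda item: location_distance(current_loc, item[1]))
    return [user_id for user_id, _loc in ranked[:top_k]]
```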

4.4 QoS Predication and Services Recommendation

In the final step, we use the Bayesian inference to make QoS prediction and services recommendation based on the past invocation records filtered from the above steps. The formula of Bayesian inference is defined as:

$$ P\left( {OS = 1 |s_{i} } \right) = \frac{{P\left( {s_{i} |OS = 1} \right) *P\left( {OS = 1} \right)}}{{P\left( {s_{i} } \right)}}, $$
(5)

To explain our formula, we give an example in Table 1. In this example, we suppose a threshold (e.g., \( q = 0.7 \)) just for the purpose of explaining the Bayesian inference (in the experiments, we set different values of \( q \) and select the one that leads to the best result). That is to say, if \( QoS > 0.7 \), we say the service satisfies the user who invoked it, while if \( QoS < 0.7 \), we say the service is not satisfactory. We use 1 to denote “satisfied” and 0 to denote “not satisfied”.

Table 1. Example of Bayesian inference

Table 1 shows an example of service invocation records, where each triple \( (s_i, u_j, n) \) represents the \( n \)-th invocation of service \( s_i \) by the user \( u_j \). In formula (5), \( P\left( OS = 1 | s_i \right) \) denotes the predicted QoS of the Web service \( s_i \) for the current user, \( P\left( OS = 1 \right) \) denotes the probability of satisfactory invocations among all Web services, and \( P\left( s_i | OS = 1 \right) \) denotes the probability of Web service \( s_i \) among the satisfactory invocations. The maximum result represents the best service; thus, we can recommend the top-\( n \) Web services to the current user. \( P\left( OS = 1 | s_i \right) \) is calculated as:

$$ P\left( OS = 1 | s_1 \right) = \frac{P\left( s_1 | OS = 1 \right) \cdot P\left( OS = 1 \right)}{P\left( s_1 \right)} = \frac{\frac{1}{2} \cdot \frac{1}{2}}{\frac{3}{8}} = \frac{2}{3} $$
$$ P\left( OS = 1 | s_2 \right) = \frac{P\left( s_2 | OS = 1 \right) \cdot P\left( OS = 1 \right)}{P\left( s_2 \right)} = \frac{\frac{1}{4} \cdot \frac{1}{2}}{\frac{3}{8}} = \frac{1}{3} $$
$$ P\left( OS = 1 | s_3 \right) = \frac{P\left( s_3 | OS = 1 \right) \cdot P\left( OS = 1 \right)}{P\left( s_3 \right)} = \frac{\frac{1}{4} \cdot \frac{1}{2}}{\frac{2}{8}} = \frac{1}{2} $$

Thus, the predicted QoS for service \( s_1 \) is 2/3. Ranking the results of the Web services from highest to lowest, we conclude that \( s_1 \) should be recommended to the current user rather than \( s_2 \) or \( s_3 \).
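The following Python sketch reproduces this Bayesian computation from raw invocation records; feeding it the eight records implied by the example above yields the same scores 2/3, 1/3 and 1/2. The function name and record format are ours.

```python
from collections import Counter

def bayes_qos_scores(records, q=0.7):
    """Eq. (5): P(OS=1 | s_i) = P(s_i | OS=1) * P(OS=1) / P(s_i),
    estimated from records [(service_id, qos_value), ...]; an invocation
    is 'satisfied' (OS = 1) when its overall QoS value exceeds q."""
    total = len(records)
    satisfied = [(s, v) for s, v in records if v > q]
    p_os1 = len(satisfied) / total                  # P(OS = 1)
    count_all = Counter(s for s, _ in records)      # counts for P(s_i)
    count_sat = Counter(s for s, _ in satisfied)    # counts for P(s_i | OS = 1)
    scores = {}
    for s, n in count_all.items():
        p_s = n / total
        p_s_given_os1 = count_sat[s] / len(satisfied) if satisfied else 0.0
        scores[s] = p_s_given_os1 * p_os1 / p_s
    return scores  # rank descending to obtain the top-n recommendations
```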

The entire procedure of the CASR-UPE algorithm combines the three steps above.
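A minimal end-to-end sketch of this procedure, composed from the helper functions sketched earlier; all function names and default parameters here are our assumptions, not the paper's.

```python
def casr_upe_recommend(dataset, current_user, t, region_related, net_dist_nor,
                       pref_threshold=0.5, top_k=3, top_n=1, q=0.775):
    """End-to-end sketch of CASR-UPE built from the helpers above."""
    # Step 1 (Sect. 4.2): keep the services matching the preference induced
    # by the user's (possibly updated) location at time t.
    candidates = set(filter_services(dataset.service_locations, region_related,
                                     net_dist_nor, pref_threshold))
    # Step 2 (Sect. 4.3): mine the users whose location at time t is closest
    # to the current user's updated location.
    locs_at_t = {u: loc for (u, tt), loc in dataset.user_locations.items() if tt == t}
    current_loc = locs_at_t.pop(current_user)
    neighbours = set(similar_users(current_loc, locs_at_t, top_k))
    # Step 3 (Sect. 4.4): Bayesian QoS prediction over the neighbours'
    # filtered invocation records (ratings stand in for overall QoS here).
    records = [(inv.service_id, dataset.ratings.get((inv.user_id, inv.service_id), 0.0))
               for inv in dataset.invocations
               if inv.user_id in neighbours and inv.service_id in candidates]
    scores = bayes_qos_scores(records, q=q)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```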

5 Experiments

In this section, we compare CASR-UPE with six algorithms on the WS-Dream dataset, using both MAE and RMSE as evaluation metrics.

5.1 Datasets and Data Processing

WS-Dream dataset 1 [17] is adopted in our experiments. It contains 1,542,884 Web service invocation records produced by 150 distributed service users on 100 Web services; approximately, every user invokes each Web service 100 times. Each invocation record contains six parameters: IP address, WSID (ID of the Web service), RTT (round-trip time), Data Size, Response HTTP Code, and Response HTTP Message. Since the Response HTTP Code and Message are highly correlated, we omit the Response HTTP Code.

The raw data must be normalized before use. A Gaussian approach is used to normalize the QoS data because of its well-balanced distribution. The normalization rule for the Response HTTP Message is as follows: if the message is “OK”, the normalized value is 1; otherwise, it is 0. The normalization rule for RTT and Data Size is defined as:

$$ r_l^{k,j} = 0.5 + \frac{r_l^{k,j} - \overline{r_l^{j}}}{2 \times 3\sigma_j}, $$
(6)

where \( \sigma_j \) is the standard deviation of user \( u_j \)'s QoS data on the \( l \)-th property and \( \overline{r_l^{j}} \) denotes the arithmetic mean of the QoS data collected from user \( u_j \) on the \( l \)-th QoS property. We can then simulate the feedback of a user after invoking a Web service by evaluating the overall QoS of the service. The weighted QoS formula is:

$$ QoS = w_1 \cdot v_{RTT} + w_2 \cdot v_{DataSize} + w_3 \cdot v_{RHTTPMessage}, $$
(7)

where \( w_1, w_2 \) and \( w_3 \) are set to 0.35, 0.05 and 0.6 respectively, according to their different significance. The Response HTTP Message indicates whether the invocation succeeded, so it is the fundamental property and is given weight 0.6; RTT and Data Size are less important and are given weights 0.35 and 0.05 respectively.
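A short Python sketch of this normalization and aggregation, assuming per-user, per-property normalization as described; the guard against zero standard deviation is our addition.

```python
import statistics

def gaussian_normalize(values):
    """Eq. (6): r = 0.5 + (r - mean) / (2 * 3 * sigma), applied to one user's
    raw values of a single QoS property (RTT or Data Size)."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values) or 1.0   # guard: avoid division by zero
    return [0.5 + (v - mean) / (2 * 3 * sigma) for v in values]

def overall_qos(v_rtt, v_datasize, v_http_message, w=(0.35, 0.05, 0.6)):
    """Eq. (7): weighted overall QoS with the paper's weights."""
    return w[0] * v_rtt + w[1] * v_datasize + w[2] * v_http_message
```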

All experiments were developed in Matlab. They were performed on a Lenovo desktop computer with the following configuration: Intel Core i5 2.50 GHz CPU and 2 GB RAM, running the Windows 7 operating system.

5.2 Evaluation Metrics

The evaluation metrics [17] we use in our experiments are Mean Absolute Error (MAE) and Root Mean Square Error (RMSE):

$$ MAE = \frac{{\mathop \sum \nolimits_{u,s} \left| {Q_{u,s} - \hat{Q}_{u,s} } \right|}}{N}, $$
(8)
$$ RMSE = \sqrt{\frac{\mathop \sum \nolimits_{u,s} \left( Q_{u,s} - \hat{Q}_{u,s} \right)^{2}}{N}}, $$
(9)

In formulas (8) and (9), \( Q_{u,s} \) denotes the actual QoS value of a Web service \( s \) observed by the user \( u \), \( \hat{Q}_{u,s} \) represents the predicted QoS value of service \( s \) for the user \( u \), and \( N \) denotes the number of predicted values.
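For completeness, a direct transcription of the two metrics over paired actual/predicted QoS values (Eq. (9) as corrected above, with \( N \) inside the root):

```python
import math

def mae(actual, predicted):
    """Eq. (8): mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Eq. (9): root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```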

5.3 Evaluation

Comparative Algorithms.

Six algorithms are compared with our CASR-UPE in this paper:

  • RBA (Recommendation by all): recommend Web services based on invocation records collected from all users, without any filtering.

  • UPCC [22]: recommend Web services based on records collected from users sharing similar preferences with the current user (PCC based on user profiles).

  • IPCC [23]: recommend web services similar to the ones the current user preferred in the past (PCC based on services).

  • CASR [9]: make recommendations based on the service invocation experiences made under a location context similar to the current user's.

  • ITRP-WS [24]: extend UPCC with time decay effects.

  • CASR-UP [25]: make recommendations considering the user preference determined by the user's location.

Performance Comparison.

Figure 2 shows the MAE and RMSE results of the different algorithms. The results are generated for different thresholds \( q \) (from 0.65 to 0.95, with an interval of 0.025) at a 14:1 ratio of training to test data. From Fig. 2, we can conclude that: (1) the MAEs and RMSEs of CASR-UPE are much better than those of the six compared algorithms when the threshold is \( 0.725 \le q \le 0.925 \); (2) the results are abnormal when the threshold is \( q = 0.95 \); and (3) when \( q \le 0.725 \), the MAEs and RMSEs of the algorithms remain almost invariant. We can also see that the best \( q \) is 0.775. In Sect. 5.4, we further explain the reasons for both (2) and (3). In general, the results demonstrate the effectiveness of the CASR-UPE algorithm in recommending Web services while considering user preference expansion.

Fig. 2. MAE and RMSE results of the compared methods (14:1)

Figure 3 shows the average MAE/RMSE results of the algorithms for different ratios (8:7, 9:6, 10:5, 11:4, 12:3, 13:2, and 14:1). From these results we can learn that: (1) as the ratio of training to test data increases, the MAE and RMSE results of the algorithms decrease; (2) across the different ratios, CASR-UPE again performs better than the six compared algorithms; and (3) CASR-UPE performs worse than CASR-UP at the ratios of 8:7, 9:6 and 10:5 when RMSE is adopted as the evaluation metric. In Sect. 5.4, we further explain the reasons for these three results. In general, the results at different ratios demonstrate the effectiveness of the CASR-UPE algorithm in recommending Web services while considering user preference expansion.

Fig. 3. MAE and RMSE results of the compared methods (at various ratios)

5.4 Discussion

In this subsection, we discuss two aspects of our experiments to further explain the results in Sect. 5.3.

Trade-off Parameters:

From the results in Fig. 2, we can infer the following. (1) The MAEs and RMSEs of CASR-UPE are smaller than those of the other algorithms when the threshold is \( 0.725 \le q \le 0.925 \); but why is \( q = 0.95 \) an exception? Analyzing the CASR-UPE method, we find that after selecting Web services according to the user's dynamic preference, the invocation records of these selected services are the most useful. As the threshold \( q \) rises to 0.95, most of the positive services are excluded and the results take high, abnormal values. (2) Why do the MAEs and RMSEs of the algorithms remain almost invariant when \( q \le 0.725 \)? When the threshold \( q \) decreases, the QoS requirement decreases and many negative services are included. When \( q \) is low enough, all Web services are included, so the MAEs and RMSEs remain invariant. (3) We conclude that the threshold \( q \) for the calculated probability is highly relevant to the result: if \( q \) is too low, many negative Web services are included, while if \( q \) is too high, many positive Web services are excluded.

Figure 3 shows the influence of different ratios on the MAE and RMSE results. When the ratio of training to test data rises, more data is used to train the algorithm and less data is used to test the results, so the accuracy improves. However, why does CASR-UPE perform worse than CASR-UP at the ratios of 8:7, 9:6 and 10:5 when RMSE is adopted as the evaluation metric? A possible reason is the randomly changing training dataset when the ratio of training to test data decreases.

Impact of User Preference Expansion.

By comparing with the algorithms that do not consider user preference expansion, we obtain the impact of user preference expansion on recommendation accuracy. The results shown in Figs. 2 and 3 collectively demonstrate that: (1) combining the influence of user location update on user preference yields better recommendations; and (2) the updated location similarity mining further improves the accuracy of recommendation.

6 Conclusion

In this paper, we have proposed the CASR-UPE algorithm, which models the influence of user location update on user preference and performs similarity mining for the updated location. The experimental results show that the CASR-UPE algorithm improves predictive accuracy and outperforms the compared methods.

Despite the significant progress brought by user preference expansion in context-aware Web services recommendation, numerous avenues remain to explore. Our future work includes: (1) incorporating novel context properties, such as social context (interpersonal interest similarity, interpersonal influence in social networks, etc.), to enable more personalized recommendation; and (2) focusing on the correlations between context properties, such as temporal-spatial correlations, to improve the accuracy of QoS prediction.