Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Recommender Systems are defined as software tools and techniques that provide suggestions for items to be of use to a target user [14]. Although recommender systems have been an important research area since the mid-1990s, interest in this area has increased dramatically as a result of the vast amounts of online data, information, and options that are now available to users [1, 6]. Recommender systems are now a vital component of many online services and retail e-commerce sites including Amazon (general products), Netflix (movies), YouTube (videos) and Pandora (music).

Collaborative filtering has been the most popular and successful approach in building recommender systems [12, 17]. In contrast to content-based filtering, in which the list of recommendations is based on the similarity of an item to items the user has previously purchased, collaborative filtering bases the recommendations on the item similarities as well as the purchases of users who have similar tastes and preferences; for example, if a target user had bought books on recommendation systems in the past, then recommend books that are preferred by other users who have also bought books on recommendation systems. One of the main problems suffered by collaborative filtering approaches is the sparsity problem; i.e., because the number of items is so large, the average user will only have rated an extremely small proportion of these items, meaning that even the most popular items will have very few ratings.

Graph-based random walk models have recently become a popular approach to collaborative filtering [2, 4, 12, 16, 17]. These models represent users and items as the nodes of a graph. A user node and item node are connected if the user has rated the item, purchased the item, or displayed interest in the item in some other way. In the case of ratings data (which we focus on in this paper), the value of the rating is usually represented as the weight of the edge connecting the user and item. Commencing from the node for some target user, a random walk is performed on the graph, and the results are then used to perform useful tasks such as ranking items in order of their importance to the user. Representing the relationships between users and items as a graph helps overcome the data sparsity problems inherent in more traditional approaches to collaborative filtering. Graph-based models also offer the advantage of being easily able to incorporate social network data, as recently demonstrated in [2, 16].

A problem with the graph-based representation described above, which we will refer to as the ‘weighted-edge representation’, is that it is unable to fully capture the similarity relations that are implicit in the data. The premise of collaborative filtering is to make item recommendations based on user similarities; however, because random walk favors large-weighted connections, walk is more likely to proceed through two users that share a high rating for some item than through users who share a low rating. That is, two users who are similar by virtue of rating the same items highly will be more influential in the construction of the recommendation list than two users who have rated the same items poorly. We refer to this as the ‘High Ratings/Low Ratings’ problem. The problem arises as a result of representing ratings using a single weight.

In this paper we introduce a novel scheme for representing recommendation data as a graph. Unlike the weighted-edge approach, in which a user and an item are each represented by a single node, under the proposed scheme item ratings are represented by multiple nodes. This enables the explicit representation of low ratings, thus facilitating the flow of information through both low-rating and high-rating connections. Empirical results on the MovieLens dataset show that recommendations made using the proposed scheme are much better correlated with results in the test ratings set, and that under a top-k evaluation, there is an improvement of up to 15 % in precision and recall. An attractive feature of the approach is that it also associates a confidence value with a recommendation.

The remainder of the paper is structured as follows. Section 2 describes the weighted-edge graph representation, and how random walk commencing from some target user can be used to produce a ranking of nodes according to their importance to that user. The section also provides a simple example demonstrating the inability of the weighted-edge representation to fully capture and utilize similarities based on low rankings. Section 3 presents the proposed approach, which we refer to as the ‘rating-nodes representation’. Section 4 presents results comparing the performance of the two approaches on the well-known MovieLens dataset. Section 5 provides further discussion of the rating-nodes approach, and concludes the paper.

2 Graph-Based Recommender Models

We define a weighted-edge recommendation graph as an undirected graph G = {V, E}, in which the set of nodes V is the union of a set of user nodes U and a set of item nodes I. A user node is connected to an item node if the user has rated the item; i.e., for some user u ϵ U and item i ϵ I, edge (u, i) ϵ E only if r ui  ≠ 0. Associated with the edge will be a weight w ui , based on some normalization of r ui . We assume for simplicity that the recommendation graph is bipartite, meaning that edges may exist only between user nodes and item nodes, but not between nodes of the same type; however, this assumption does not limit the generality of the approach, which can easily be extended to deal with connections between nodes of the same type, as well as the introduction of other node types (e.g., nodes representing social network data, as in [16]).

The Markov Chain describing the sequence of nodes visited by a random walker is called a ‘random walk’. The concept of random walk is important, because the amount of time a random walker spends visiting some node provides a measure of the relative importance of that node. The probability of moving from node i to node j is calculated by dividing the edge weight w ij by the sum of the weights of all of i’s outgoing edges:

$$ p_{ij} = \frac{{w_{ij} }}{{\sum\nolimits_{k \in N(i)} {w_{ik} } }} $$
(1)

where N(i) denotes the nodes that are the immediate neighbors of i. The larger the value of an outgoing weight, the larger is the probability of traversing that edge as opposed to the other outgoing edges.

The relative importance of nodes can be determined by finding the stationary distribution. For a graph with adjacency matrix W = {w ij }, the stationary distribution can be found by solving the eigenvector equation x = Wx, in which case the dominant eigenvector x will represent the stationary distribution. Page-Rank [3] is a variation of this in which at each step the walker is teleported with probability (1 – d) to a random node rather than following an edge (typically d ≅ 0.85). Personalized PageRank [8] is a further variation in which the user is teleported not to a random node, but to the node corresponding to that user. The eigenvector equation in this case is

$$ \varvec{x} = d\varvec{Wx} + (1 - d)\varvec{\theta} $$
(2)

where θ is a vector in which θ = 1 for the node corresponding to the target user, and 0 for all others. The components of x are the PageRank values. Each node will have a PageRank value.

In order to find the dominant eigenvector x, the above eigenvector equation needs to be solved. A general and robust approach is power iteration, which begins with a random vector x k , and iterates the step \( \varvec{x}_{k + 1} = d\varvec{Wx}_{k} + (1 - d)\varvec{\theta} \) until convergence, when x will be the dominant eigenvector [13]. Note that the dominant eigenvector will not be unique, since any linear scaling of this eigenvector will also satisfy the eigenvector equation. Therefore it is the relative, not absolute, scores which are important. It is common to normalize the eigenvector such that its components sum to one.

The PageRank values obtained by performing personalized PageRank for some target user provide a measure of the relative importance of those nodes to the user. A recommendation list for that user can be constructed by simply sorting the user’s un-rated items in decreasing order of PageRank.

2.1 The High Ratings/Low Ratings Problem

When a random walk is applied to a weighted-edge graph as described above, the Page-Rank value of an item node provides a relative measure of the strength of the recommendation of that item to the target user. The PageRank for an item ultimately depends on how similar users rated the item, and we now demonstrate through a simple example that the weighted-edge representation is deficient in fully capturing the similarity relations that may be present in ratings data.

Consider the graph shown in Fig. 1, which represents recommendation data for four users and three items. The edge weightings represent the ratings, and can take an integer value from 1 to 5, with a 1 representing a low rating and a 5 representing a high rating. The absence of an edge indicates that the user has not rated the item. Table 1 is the adjacency matrix for the graph, where a value of 0 indicates no edge (i.e., no rating).

Fig. 1.
figure 1

Sample recommendation graph for 4 users and 3 items.

Table 1. Adjacency matrix for Fig. 1 graph

User 1 and User 2 gave a rating of 5 to Item 1, while User 3 and User 4 gave a rating of 1. Users 1 and 2 liked the item as much as each other, and Users 3 and 4 (dis)liked it as much as each other. In addition to this, User 1 gave a rating of 5 to Item 2, while User 3 gave a rating of 5 to Item 3. We want to find out how User 2 would have rated Item 2 and how User 4 would have rated Item 3. That is, we need Item 2’s PageRank for target User 2, and Item 3’s PageRank for User 4.

Table 2 shows the PageRank values for the three items for separate random walks commencing from each user. Note that the PageRank values in each row do not sum to unity. This is because some of the PageRank will be assigned to user nodes. The percentages indicate the proportion of item PageRank assigned to each item node. The PageRank of Item 2 for target User 2 is 0.0795 (17 %), while the PageRank of Item 3 for User 4 is significantly lower than this at 0.0425 (9 %). This is despite the fact that, on the basis of the ratings data available, User 1 and User 2 are as similar to each other (i.e., one rating in common) as User 3 and User 4 are to each other (also one rating in common). Also problematic is the fact that for User 4, Item 2 actually receives a higher PageRank than Item 3, which is also at odds with what the similarities in the ratings data suggest.

Table 2. PageRank values for Fig. 1 graph

We refer to this problem as the ‘High ratings/Low ratings’ problem. The problem occurs because the weighted-edge representation causes random walk to favor high-weighted edges. Item 1 has four outgoing links: edges of weight 1 to each of Users 3 and 4; and edges of weight 5 to Users 1 and 2. Any walk passing through Item 1, will be more likely to proceed to Item 2 (via User 1) than to Item 3 (via User 3), thus explaining the observations from Table 2. Under the weighted-edge representation, two users who are similar by virtue of rating the same items highly will be more influential in the construction of the recommendation list than two users who have rated the same items poorly. In the next section we provide an alternative representation scheme that avoids these problems.

3 Rating Node Representation

In order to overcome the High/Low Ratings problem, we propose in this section a novel graph representation scheme which we refer to as the ‘rating-nodes representation’.

Whereas in the weighted-edge representation each item is represented by a single node, under the rating-nodes representation each item is represented by multiple nodes, which we refer to as ‘rating nodes’. That is, G = {V, E} is an undirected graph in which V: = UI R , where U is the set of users, and I R : = I 1I 2 ∪  …  ∪ I N is the set of item rating nodes. Corresponding to each item there are N rating nodes, not just one. This allows the representation of an item rating to be distributed over a collection of rating nodes, and it is this distributed representation that allows a fuller representation of the similarity relationships implicit in the ratings dataset.

There is some scope in how an item rating may be represented using such a collection of nodes. In the following, each of the rating nodes for an item corresponds to one of the possible ratings that can be given to an item. For example, in the MovieLens dataset, ratings can take an integer value from 1 to 5; thus ratings will be represented using five rating nodes, each corresponding to each of the possible rating values. If a user u ϵ U gave movie i a rating of n (where n is a number between 1 and 5), then there will be an edge (of weight 1) between user node u and item rating node i n , but no connection (i.e., weight 0) between u and the other rating nodes for item i. Suppose that a user has given an item a rating of 5, and that another user has given the same item a rating of 1. Figure 2a shows the weighted-edge representation, and Fig. 2b shows the rating nodes representation.

Fig. 2.
figure 2

Recommendation graph for 2 users and 1 item (a) weighted-edge representation; (b) ranking-node representation.

The rating–nodes representation does not change the nature of the random walk, and personalized PageRank can be applied exactly as described in Sect. 2. At convergence, the PageRank for a particular item will be distributed over the rating nodes for that item. Prior to describing how these PageRank values can be collapsed to provide a single recommendation value, we return to the example from Sect. 2.1 to demonstrate how the rating-nodes representation overcomes the High Ratings/Low Ratings problem.

3.1 High Ratings/Low Ratings Revisited

Under the ranking-nodes representation, the rating information in Table 1 would be represented as per the adjacency matrix shown in Table 3. Running personalized Page-Rank from each of the four users, results in the PageRank values shown in Table 4. There are two things to note. Firstly, the PageRank of Item 2 for target User 2 (0.101) is now the same as the PageRank value of Item 3 for target User 4. Secondly, the PageRank value of Item 3 for User 4 (0.101) is now equal to the PageRank value of Item 2 for User 2. The rating-nodes representation has solved the High Ranking/Low Ranking problem: similarity is equally propagated through both low and high ratings.

Table 3. Adjacency matrix for Fig. 1 graph (ranking-nodes representation)
Table 4. PageRank values for Fig. 1 graph (ranking-nodes representation)

3.2 Recommendation Value

The recommendation value \( rec_{ui} \) of an item i for some target user u is simply the PageRank-weighted sum of the rating values represented by the rating nodes; i.e.

$$ rec_{ui} = {{\sum\nolimits_{j = 1}^{n} {x_{i}^{j} r_{j} } } \mathord{\left/ {\vphantom {{\sum\nolimits_{j = 1}^{n} {x_{i}^{j} r_{j} } } {\sum\nolimits_{j = 1}^{n} {x_{i}^{j} } }}} \right. \kern-0pt} {\sum\nolimits_{j = 1}^{n} {x_{i}^{j} } }} $$
(3)

where \( x_{i}^{j} \) is the PageRank score for the j th rating node for item i and r j is the rating value represented by the rating node (1 through to 5 for the MovieLens dataset). While \( rec_{ui} \) will always take a real value on the interval [1 5], this value should not be interpreted as being on the same scale as the ratings in the dataset, since by the nature of the averaging process the \( rec_{ui} \) values will rarely take extreme values of 1 or 5. Note also that the same value for \( rec_{ui} \) may result from more than one combination of values for the \( x_{i}^{j} \). For example, multiplying each of the \( x_{i}^{j} \) by a factor of 2 will not affect the value of \( rec_{ui} \). The total PageRank associated with an item i (i.e., the denominator in the above equation) can be interpreted as the strength of the recommendation. That is, some \( rec_{ui} \) value may be predicted strongly or weakly.

3.3 Additional Comments

One potential issue that can be identified from Table 4 concerns the disconnection between nodes. For example, random walk starting at User 3 now gives a PageRank of 0 for Item 2. This might be considered reasonable on the grounds that there is no longer a path from User 3 to Item 2 (because Users 3 and 1 disagree on their rating of Item 1, which has a pivotal position in this graph). However the fact that Users 1 and 3 even reviewed the same movie (despite disagreeing about their ratings of it) suggests that there should perhaps be some connection between the two users. This issue of disconnection is not a problem under the weighted-edge representation since any item that was reviewed by the same two reviewers will result in weights of at least 1 connecting the two users with that item node. This problem can be solved in the rating-nodes approach by inserting edges, weighted with some small positive value α (e.g., 0.05), between user nodes and item rating nodes corresponding to recommendations not made by the user. However we will see from the results in Sect. 4 that in the case of large graphs, disconnection is not an issue, as there are many other pathways through which random walk can proceed. While the rating-nodes representation that we have described in this section has assumed that nodes are of two types—user nodes and item nodes—the representation can easily be extended to incorporate social attributes. In this paper we are primarily interested in determining whether the rating-nodes representation leads to improved performance, and we leave the investigation of incorporating social attributes to future work.

4 Experiments

This section presents empirical results comparing the performance of the rating-nodes and weighted-edge representations on the well-known MovieLens dataset (www.movielens.umn.edu) [5]. The version of the dataset used in this research contains 943 users, 1682 movies and 100,000 ratings in the scale of 1 to 5. The data is organized into five 80 %-20 % training/test splits, for use in five-fold cross-validation. The dataset has been collected on the MovieLens web site and made available by GroupLens Research and has been used in [6, 10, 11, 15, 16].

4.1 Correlation-Based Evaluation

The recommendation value \( rec_{ui} \) defined in Sect. 3.2 provides a measure which can be used to produce a ranked item recommendation list for the target user. This suggests an evaluation based on correlation; i.e., how well does the ranking of items in the recommendation list correlate with the ranking of items by their star-rating value in the test set?

This correlation can be measured using Spearman’s rank correlation coefficient, defined as

$$ \rho = \frac{{\sum\nolimits_{i} {\left( {x_{i} - \bar{x}} \right)\left( {y_{i} - \bar{y}} \right)} }}{{\sum\nolimits_{i} {\left( {x_{i} - \bar{x}} \right)^{2} \sum\nolimits_{i} {\left( {y_{i} - \bar{y}} \right)^{2} } } }} $$
(4)

where x i and y i are ranks of the scores represented by variables X i and Y i . In our case, one of these variables corresponds to the recommendation values; the other to the star-ratings in the test set. Since the star-ratings in the test set can only take an integer value in the set {1, 2, …, 5}, the test ratings will involve many ties. Ties are assigned a rank equal to the average of their position in the descending order of values. For example, assuming that six movies are sorted according to their star-ratings of 5, 4, 4, 3, 2, 1 (second and third movies tied with a rating of 4), the rankings given to the movies would be 1, 2.5, 2.5, 4, 5, 6 (The second and third movie both receive a rating of 2.5 = (2 + 3)/2).

Table 5 shows the Spearman rank correlations between test rankings and recommendation list for the weighted-edge and rating-nodes representations. For the latter we used both α = 0 and α = 0.1. The correlations were obtained by averaging the correlations across all users and all 5 train/test splits. The recommendation values resulting from the rating-nodes approach are clearly much more highly correlated with the test set ratings than are the recommendation values from the weighted-edge approach.

Table 5. Spearman rank correlations between test ratings and recommendation values

To explore this further, Fig. 3 shows scatterplots of recommendation rankings (vertical axis) versus test set rankings (horizontal axis) for a typical user (User 1, Test Set 1). The test set for this user consisted of 137 movie reviews. Higher-rated movies appear to the left of the plot; lower-rated movies appear to the right. Thus the first vertical bar corresponds to movies with a 5-star rating; the second to movies with a 4-star rating, etc. The correlation coefficient for the weighted-edge representation is 0.331, and that for the rating-nodes representation is 0.578. This difference in correlation can be clearly discerned through visual inspection, and is particularly apparent for both higher-rated and lower-rated movies, and to a lesser degree for middle-rated movies.

Fig. 3.
figure 3

Recommendation rankings (vert.) versus test rankings (horz.) for User 1, Test_Set 1. (a) weighted-node representation (ρ = 0.331); (b) rating-nodes representation (ρ = 0.578).

4.2 Top-K Evaluation

A common approach to evaluating recommendations on the MovieLens dataset is Top-k evaluation (used, for example in [7, 9, 16]). Under this style of evaluation, a recommendation set is formed by selecting the k most highly recommended items, and a test set is formed by selecting the most highly rated items from some test data. Measures such as Precision and Recall are then calculated based on these lists. It is not surprising that graph-based approaches such as the weighted-edge approach typically perform well under this type of evaluation, since these type of recommendation systems are designed to predict higher-rated items. Given the correlation results above, it is interesting to compare the performance of the rating-nodes approach with the weighted-edge approach under this style of evaluation.

To evaluate the predictions for some user u, we select the k most highly recommended movies from the sorted recommendation list, and place these in the Recommendation Set (recset). Following Shang et al. [16], we use only the test set movies with a 5-star rating. We place all of the movies to which user u has given a 5-star rating in the Test Set (testset). Any movie which appears in both the Test Set and the Recommendation Set is a hit. Recall, Precision and F1-measure are then defined as follows:

$$ \begin{aligned} Recall & = \frac{{\left| {recset \cap testset} \right|}}{{\left| {testset} \right|}} \\ Precision & = \frac{{\left| {recset \cap testset} \right|}}{{\left| {recset} \right|}} \\ F1 - measure & = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \\ \end{aligned} $$

Figure 4 shows the Precision, Recall, and F1 curves corresponding to the weighted-edge and rating-nodes representations for k = 5, 10, …, 50. (For the rating-nodes representation, curves for α = 0 and α = 0.1 were identical). The values are averaged across all users and all 5 training/test splits. As can be seen from the curves, the Precision and Recall for low values of k are approximately 15 % higher for the rating-nodes representation than they are for the weighted-edge representation. As the value of k increases, the difference in performance becomes less significant.

Fig. 4.
figure 4

Recall, Precision and F1-measure for Top-k analysis.

5 Conclusion

Graph-based approaches provide a powerful means of combining collaborative and content-based filtering. Users are similar if they like the same items, and items are similar if they are liked by the same users. This provides a powerful mechanism by which similarity relations can be propagated through such networks. In Sect. 2.1 we demonstrated that while the weighted-edge representation is able to capture similarities between users who share high-ratings for products, it is not able to adequately capture relationships between users who share a low-rating for some item, and we referred to this as the high-ratings/low-ratings problem. The rating-nodes representation proposed in Sect. 3 was designed to overcome this problem.

Results on the MovieLens dataset have shown that recommendation rankings made using the rating-nodes representation are much better correlated with results in the test ratings than are the rankings produced using the weighted-edge approach. However, what is surprising about this result is that the correlation improved so dramatically over the full range of ratings; i.e., for high ratings as well as low ratings. While we expected a significant increase in correlation for low ratings, there is less room for improvement in the case of high rankings since the weighted-edge approach is designed to predict high ratings. The improved performance in predicting high ratings was corroborated by the results of the top-k analysis, in which the rating-nodes approach displayed a clear improvement in precision and recall over the weighted-edge approach.

The graphs we used in this paper were bipartite; i.e., edges exist only between user nodes and item nodes, but not between nodes of the same type. However, both the weighted-edge and rating-nodes representations are general, and can easily represent user-user links, as well as item-item links. There has been considerable recent interest in incorporating social network data into graph-based recommender systems. For example Bogers [2] and Shang et al. [16] have recently incorporated social network data into graph-based recommender systems using the weighted-edge representation. With the increasing amount and variety of social networking data that is becoming available, this area of research will no doubt continue to grow, and we believe that the distributed, multiple-node representation scheme that we have proposed in this paper will provide a powerful method of incorporating such data, thus allowing a richer representation of similarities than is possible through weighted-edge representations.