Abstract
Recommender systems are becoming increasingly attractive to both the research and commercial communities owing to the information overload problem and the popularity of Internet applications. Collaborative Filtering, a popular branch of recommendation approaches, makes predictions based on the historical data available in the system. In particular, user-based Collaborative Filtering depends on how users rate the items in the database, and its success relies largely on the pairwise similarity between users. However, popular items may have a negative effect on the choice of users similar to the target user. The proposed work, User Similarity Adjustment based on Item Diversity (USA_ID), is designed to achieve personalized recommendations by modifying user similarity scores in order to reduce the negative effect of popular items in the user-based Collaborative Filtering framework. Recommender systems have traditionally focused almost exclusively on accurate recommendations, i.e., on providing the items most relevant to the needs of a user. From the user's perspective, however, monotonous recommendations are uninteresting even if they are accurate. While much research effort is spent on improving the accuracy of recommendations, less effort is devoted to analyzing their usefulness. Novelty and diversity have been identified as key dimensions of recommendation utility. Greater accuracy has been observed to lead to lower diversity, which results in an accuracy-diversity trade-off in personalized recommender systems. The proposed work provides an approach to increase the utility of a recommender system by improving accuracy as well as diversity. Experiments are conducted on the benchmark data set MovieLens, and the results show the effectiveness of the proposed approach in improving the quality of predictions.
1 Introduction
Recommender Systems (RS) help people identify their preferences from a large collection of candidate objects. They are used in a variety of applications such as online recommendation of books [5], CDs [18], movies [10], news [12] and many others. RSs are now popular both commercially and in the research community [21]. Many commercial applications such as Amazon.com\(^\mathrm{TM}\) (www.amazon.com) and Netflix (www.netflix.com) use recommendations to increase business profits. An RS can be viewed as a form of personalized information retrieval in which the user's wish is expressed not by an explicit query but by implicit information about the user's interests. RSs are attracting growing attention in electronic commerce because of their potential business value. Research communities from Machine Learning, Data Mining, Information Retrieval and Statistics are working in the RS domain.
RSs are based on one of three strategies [3]: Collaborative Filtering (CF) [20], Content Based Filtering (CB) [19], or a hybrid of the two [8]. CB creates a profile for each user or product to characterize its nature, and the profiles are used to associate users with matching products [19]. An alternative to CB is CF, which relies only on past user behavior in the form of previous transactions or product ratings [13]. CF analyzes relationships between users and inter-dependencies among products to identify new user-item associations.
A CF approach can be designed based on either user similarities or item similarities derived from historical data [22]. In CF, the prediction is based on a database of past purchases or ratings made by the users of the system [19]. Each rating shows how much a particular user likes an item. The task of a CF-based recommender system is to predict how much a user will like an item that the user has not yet rated.
The most common form of CF is the neighborhood-based approach (also known as k Nearest Neighbors, KNN) [15]. Neighborhood CF techniques can be user based or item based [22]. These KNN techniques identify items that are likely to be rated similarly, or like-minded people with a similar history of rating or purchasing, in order to predict unknown relationships between users and items. The merits of the neighborhood-based approach are its intuitiveness, the absence of many parameters to train and tune, and the ability to easily explain the reasoning behind a recommendation [4]. The item-based approach looks into the set of items similar to the target item i and selects the k most similar items \(i_1, i_2, \ldots, i_k\). Once the most similar items are found, the prediction is computed as a weighted average of the target user's ratings for those similar items [20].
The user-based Collaborative Filtering approach assumes that users who agreed on preferred objects in the past will tend to agree in the future. It looks into the set of users who share similar preferences with the target user u, selects the k most similar users \(u_1, u_2, \ldots, u_k\), and predicts the rating of the target item i from the preferences these users have given for i.
The ultimate goal of any RS is to satisfy the user's requirements [11]. Improving the accuracy of predictions has been the main motive of the RS research community for a few decades. At the same time, there is a growing demand among users for recommendations that are interesting rather than merely accurate. Repeatedly recommending the same kind of items to a user makes a Recommender System monotonous [17]. Always recommending items that are highly similar to those already rated by a user brings no novelty to the user [3]. In particular, novelty and diversity in recommendations have been identified as key dimensions of user satisfaction in many application domains [23]. For E-Commerce websites, recommending a variety of items has the potential to increase profits by increasing sales diversity [6].
In user-based Collaborative Filtering, the similarity between users is the core of the recommendation process. The similarity score of two users is high if their ratings correlate on many items, and low if they have co-rated only a few items. Popular items, however, have been rated by many users. Thus, even when there is no actual correlation between two users, their similarity score is influenced by the popular items that both of them have rated. The proposed work therefore reduces the negative impact of popular items on user similarity computations.
The objective of the proposed technique is to overcome the limitations of accuracy-only recommendation by striking a suitable trade-off between the accuracy and the diversity of recommendations. The proposal
- calculates a global popularity score for each item from the ratings available in the system, from which each item's global diversity score is derived
- modifies the pairwise user similarity, calculated from the historical data using Pearson Correlation or Cosine similarity, with the help of the global diversity scores of items, in order to reduce the adverse effect of popular items in the user-based Collaborative Filtering framework
- empirically shows the usefulness of the proposed user similarity modification approach on the benchmark data set MovieLens
The rest of the paper is organized as follows. Section 2 describes the state-of-the-art user-based Collaborative Filtering framework, Sect. 3 discusses related work on diversity enhancement, Sect. 4 describes the proposed method, Sect. 5 discusses the experimental evaluation, and Sect. 6 concludes the paper and outlines future work.
2 Existing User-Based Collaborative Filtering Framework
This section describes the user-based Collaborative Filtering framework [16], which is used as the baseline technique for making predictions.
In user-based CF, given a target user u, users who share a similar rating pattern with u are considered neighbors, and their ratings are used to predict the ratings of the items that u has not rated. The effectiveness of user-based CF methods depends on the pairwise similarity scores between users. User profiles (row vectors) are sorted by their dissimilarity to the target user's profile, so that the most similar users come first, and the ratings of these similar users contribute to the prediction of the target item's rating. The most commonly used metrics for calculating similarity between users are Cosine similarity [22] and the Pearson Correlation coefficient [3].
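The Pearson Correlation coefficient between each pair of users u and v is formulated as
\[ S_{u,v} = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}} \]
where I is the set of items rated by both users u and v, \(r_{u,i}\) is the rating provided by user u for item i, and \(\bar{r}_u\) is the average rating of user u. \(S_{u,v}\) ranges from \(-1\) to \(+1\).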
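By treating each user profile as a vector in a high-dimensional space, Cosine similarity calculates the similarity score between two users as the cosine of the angle between the two corresponding user profile vectors. Writing \(\vec{r}_u\) for the profile vector of user u, the Cosine similarity between users u and v is calculated as
\[ S_{u,v} = \cos(\vec{r}_u, \vec{r}_v) = \frac{\vec{r}_u \cdot \vec{r}_v}{\|\vec{r}_u\|_2 \, \|\vec{r}_v\|_2} \]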
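The two similarity measures of this section can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names and the dictionary-based profile representation (item mapped to rating) are assumptions, the user mean is assumed to be taken over all items the user has rated, and a similarity of 0 is returned when a denominator vanishes.

```python
import math


def pearson_similarity(ru, rv):
    """Pearson correlation over the co-rated items of two user profiles.

    ru and rv are dicts mapping item -> rating (hypothetical representation).
    """
    common = set(ru) & set(rv)  # I: items rated by both users
    if not common:
        return 0.0
    # Assumption: the user mean is taken over all items the user has rated.
    mean_u = sum(ru.values()) / len(ru)
    mean_v = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mean_u) * (rv[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ru[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((rv[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # undefined correlation, treated as "no similarity"
    return num / (den_u * den_v)


def cosine_similarity(ru, rv):
    """Cosine of the angle between two profile vectors (unrated entries are 0)."""
    common = set(ru) & set(rv)
    # Unrated items contribute 0 to the dot product, so only co-rated items matter.
    dot = sum(ru[i] * rv[i] for i in common)
    norm_u = math.sqrt(sum(r * r for r in ru.values()))
    norm_v = math.sqrt(sum(r * r for r in rv.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Storing profiles as sparse dictionaries avoids materialising the full rating matrix, which is mostly empty for data sets such as MovieLens.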
The most important step in a Collaborative Filtering system is to generate the output in terms of predictions [20]. Once the set of users most similar to the target user a has been identified, the next step is to calculate a's rating for an item i with the help of the ratings provided by those neighbours for i. The prediction is made as a weighted average of the neighbours' known ratings, where \(P_{a,i}\) is the predicted rating of target user a for item i and K is the set of the top k users most similar to a.
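A minimal Python sketch of this prediction step follows. It assumes that the weights are the similarity scores and that the average is normalised by the sum of their absolute values, which is one common reading of "weighted average" rather than a form confirmed by the text. The function name, the data layout, and the restriction of neighbours to users who have actually rated the item are likewise assumptions.

```python
def predict_rating(item, ratings, sims, k):
    """Predict the target user's rating for `item` from the top-k neighbours.

    ratings: dict user -> {item: rating} for the candidate neighbours
    sims:    dict user -> similarity of that user to the target user
    Returns None when no neighbour has rated the item.
    """
    # Assumption: only users who rated the item can act as neighbours for it.
    raters = [v for v in sims if item in ratings.get(v, {})]
    # K: the k raters most similar to the target user.
    neighbours = sorted(raters, key=lambda v: sims[v], reverse=True)[:k]
    num = sum(sims[v] * ratings[v][item] for v in neighbours)
    den = sum(abs(sims[v]) for v in neighbours)
    return num / den if den > 0 else None
```

For example, with neighbour similarities 0.8 and 0.2 and ratings 4 and 2 for the item, the prediction is (0.8 · 4 + 0.2 · 2) / 1.0 = 3.6.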
3 Related Work
Making only accurate recommendations is not always useful to users. For example, recommending only popular items (e.g., blockbuster movies that many users tend to like) may achieve high accuracy, but can also lead to a decline in other aspects of recommendation quality, including recommendation diversity [2]. Recommending long-tail items to individual users can counteract this effect. Thus, more consumers would be attracted to companies that carry a large selection of long-tail items and pursue long-tail strategies, such as providing more diverse recommendations [7].
Diversification is defined as the process of maximizing the variety of items in recommendation lists [1]. In [24], Ziegler et al. conducted a large-scale online study, and their experimental results show that users' overall satisfaction with recommendation lists depends not only on accuracy but also on the range of interests covered. They also found that human perception can capture only a certain level of the diversification inherent in a list.
Temporal diversity is an important facet of recommender systems [17]. The authors showed how CF data changes over time, performed a user survey, and evaluated three CF algorithms from the point of view of diversity in the sequence of recommendation lists produced over time.
In [23], the authors classify diversity into two types, namely individual diversity and aggregate diversity. The former accounts for how different the items in a single user's recommendation list are from one another, which is the notion of diversity employed in most works. Aggregate diversity, in contrast, is understood as the total number of different items that a recommendation algorithm can provide to the community of users.
Brynjolfsson et al. [6] demonstrated that recommendations would increase sales of items in the long tail, resulting in an improvement of aggregate diversity rather than individual diversity. Herlocker et al. [16] proposed an aggregate diversity measure defined as the percentage of items for which the recommender system is able to make recommendations (often known as coverage). Adomavicius et al. [2] discussed the importance of aggregate diversity in recommendation. They proposed the diversity-in-Top-N metric, which can serve as an indicator of the level of personalization provided by a recommender system.
A common approach to diversified ranking is based on the notion of maximal marginal relevance (MMR) [9]. Marginal relevance is defined as a weighted combination of two metrics, accuracy and diversity, in order to account for the trade-off between them. A method called PLUS (Power Law adjustments of User Similarities) is proposed in [14] to achieve personalized recommendations. PLUS uses a power function to adjust user similarity scores in order to reduce the adverse effects of popular objects in the user-based Collaborative Filtering framework. The proposed work (USA_ID) likewise aims to reduce the negative effect of popular items in order to improve the quality of recommendations.
4 Proposed Technique
The objective of a personalized recommender system is to rank a set of items for a given user so that highly ranked items are more preferred by the user [14]. To achieve this, the proposed technique, User Similarity Adjustment based on Item Diversity (USA_ID), modifies the pairwise similarity between users. The modification reduces the negative impact of popular items, which are expected to be rated by most of the users.
4.1 User Similarity Computation
If there are m users who have given ratings for n items, the ratings data can be represented as an \(m \times n\) matrix with rows representing users and columns representing items. This matrix is called the user-item rating matrix R. Each element \(r_{u,i}\) is an ordinal value ranging from \(R_{min}\) to \(R_{max}\). Unrated entries are set to zero.
For the given \(m \times n\) user-item rating matrix R, the user-user similarity matrix can be represented as an \(m \times m\) symmetric matrix S. Its rows and columns represent users, and each entry \(S_{u,v}\) represents the similarity between user u and user v. More specifically, given the profiles of users u and v, \(S_{u,v}\) can be either the Pearson Correlation coefficient or the Cosine similarity defined in Sect. 2.
4.2 Computing Popularity Score of Items
The global popularity score of an item i is defined as the ratio of the number of users who rated the item to the total number of users in the system, so items that have been rated by many users receive a higher popularity score. The popularity score of item i, \(pop_i\), is defined as
\[ pop_i = \frac{|U_i|}{m} \]
where \(U_i = \{u : r_{u,i} \ne 0\}\) is the set of users who rated item i, \(r_{u,i}\) is the rating assigned by user u to item i, and m is the total number of users. The global diversity score of item i is then computed from its popularity score.
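The popularity and diversity scores can be sketched in Python as follows. The popularity score follows the definition in the text (fraction of users who rated the item). The form of the diversity score is an assumption: the sketch uses the complement \(1 - pop_i\), which is one simple choice that makes popular items score low, and it should not be read as the authors' exact definition. Function names and the data layout are hypothetical.

```python
def popularity_scores(ratings, items):
    """pop_i = (number of users who rated item i) / (total number of users).

    ratings: dict user -> {item: rating}; items: iterable of all item ids.
    """
    m = len(ratings)  # total number of users
    return {i: sum(1 for profile in ratings.values() if i in profile) / m
            for i in items}


def diversity_scores(pop):
    """Assumed form of the global diversity score: div_i = 1 - pop_i."""
    return {i: 1.0 - p for i, p in pop.items()}
```

Under this assumption, an item rated by every user has diversity 0, while an item rated by a quarter of the users has diversity 0.75.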
4.3 User Similarity Modification
The basic assumption in user-based Collaborative Filtering is that users with similar preferences in the past will have similar preferences in the future. Predictions are therefore made based on the preferences given by the close neighbours of the target user.
The similarity between two users is based on how closely they agree when giving preferences for the various items of the domain. Popular items have, by definition, been rated by many users, and thus the chance that any two users correlate on those items is high [14]. Consequently, a lower similarity score should be assigned to two users who correlate only on popular items. Even though such users then carry less weight in the prediction process, they still have an impact on the quality of predictions. The proposed technique therefore discounts the pairwise similarity between two users who have correlated only on popular items: the pairwise similarity of such users is multiplied by the diversity score of the items on which they correlate. The technique can adjust user similarity values calculated using either Pearson Correlation or Cosine similarity.
To this end, the similarity between each pair of users u and v is modified using the diversity scores of the items in C, where C is the set of items co-rated by users u and v and t is the cardinality of C. \(T_{u,v}\) denotes the modified similarity between users u and v. The modification is applied to every element of the similarity matrix S. The prediction computations are then carried out with the modified similarity values, where \(P_{a,i}\) is the predicted rating of active user a for item i and K is the set of the top k users most similar to a.
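The adjustment can be sketched in Python as follows. The exact formula is an assumption: the sketch multiplies the original similarity by the mean diversity score of the co-rated items, i.e. \(T_{u,v} = S_{u,v} \cdot \frac{1}{t} \sum_{i \in C} div_i\), which is one reading consistent with the description of C and t. The function name and data layout are hypothetical.

```python
def adjusted_similarity(s_uv, ru, rv, div):
    """Discount a similarity score by the mean diversity of the co-rated items.

    s_uv: original similarity S_{u,v} (Pearson or Cosine)
    ru, rv: dicts item -> rating for users u and v
    div: dict item -> global diversity score
    """
    common = set(ru) & set(rv)  # C: items co-rated by u and v
    t = len(common)             # t: cardinality of C
    if t == 0:
        return 0.0              # no co-rated items, nothing to adjust
    # Assumed form: T_{u,v} = S_{u,v} * (1/t) * sum of div_i over C.
    return s_uv * sum(div[i] for i in common) / t
```

Two users who agree only on items with diversity close to 0 end up with an adjusted similarity close to 0, so they contribute little as neighbours when the adjusted scores replace the original ones in the prediction step.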
5 Experimental Results
This section discusses the data set used, the evaluation metrics, and the effectiveness of the proposed approach, USA_ID.
5.1 Data Set Used
The experiments are conducted on MovieLens 100k (www.Movielens.com), a standard data set for evaluating Collaborative Filtering techniques. The data set contains 100,000 ratings given by 943 users for 1682 items, with ratings in the range 1 to 5. We split the data set into a training set with 80 % of the ratings of the original rating matrix and a test set with the remaining 20 %. Five-fold cross-validation is performed on the data set to report the results.
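The 80/20 split with five-fold cross-validation can be sketched in Python as follows. The paper does not say how the folds were drawn, so the random shuffling, the seed, and the function name are assumptions made purely for illustration.

```python
import random


def five_fold_splits(ratings, n_folds=5, seed=0):
    """Yield (train, test) pairs of rating triples for n-fold cross-validation.

    ratings: list of (user, item, rating) triples.
    Each fold serves once as the test set (about 20 % of the ratings for
    n_folds=5) while the remaining folds form the training set.
    """
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # assumption: folds drawn at random
    folds = [shuffled[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test = folds[f]
        train = [t for g in range(n_folds) if g != f for t in folds[g]]
        yield train, test
```

Reported metrics would then be averaged over the five (train, test) pairs.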
5.2 Evaluation Metrics Used
To evaluate the quality of the recommendations, two categories of metrics, namely accuracy metrics and diversity metrics, are adopted. To assess the accuracy of the proposed approach, two classification accuracy measures, Precision and Recall [16], are considered. Precision is defined as the ratio of the number of relevant items selected to the number of items selected. It represents the probability that a selected item is relevant.
\[ Precision = \frac{N_{rs}}{N_s} \]
Recall is defined as the ratio of the number of relevant items selected to the total number of relevant items available. It represents the probability that a relevant item will be selected.
\[ Recall = \frac{N_{rs}}{N_r} \]
where \(N_{rs}\) is the number of relevant items retrieved, \(N_s\) is the number of items retrieved and \(N_r\) is the total number of relevant items in the data set.
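Precision and Recall for a single recommendation list can be computed as in the following Python sketch. How an item is judged "relevant" (for example, a held-out rating above some threshold) is not stated in the text, so the set of relevant items is taken here as a given input; the function name is hypothetical.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one recommendation list.

    recommended: list of item ids selected for the user (N_s = its length)
    relevant:    collection of item ids considered relevant (N_r = its size)
    """
    recommended = list(recommended)
    relevant = set(relevant)
    n_rs = sum(1 for i in recommended if i in relevant)  # relevant and selected
    precision = n_rs / len(recommended) if recommended else 0.0
    recall = n_rs / len(relevant) if relevant else 0.0
    return precision, recall
```

For a list of four recommended items of which two are among three relevant items, Precision is 2/4 and Recall is 2/3.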
To assess the diversity achieved by the proposed approach, two metrics, namely ILD (Intra List Diversity) and MN (Mean Novelty), are used. Ziegler et al. [24] introduced ILS (Intra List Similarity) to assess the topical diversity of recommendation lists; ILD increases as ILS decreases. The authors suggested that ILS is an efficient measure that complements existing accuracy measures in capturing user satisfaction. ILS is calculated from the pairwise similarities of the items in a recommendation list, where n is the total number of items recommended, RL is the Top-k list recommended to user u, and \(Sim_{i,j}\) is the similarity score between item i and item j. The similarity score used here is the Cosine similarity discussed in [20]. A higher ILS score denotes lower diversity. If Sim is normalized to the range 0 to 1, ILD can be computed directly from ILS.
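The ILS and ILD measures can be sketched in Python as follows. The normalisation is an assumption: ILS is taken as the mean similarity over all unordered item pairs of the list, \(ILS_u = \frac{2}{n(n-1)} \sum_{i<j} Sim_{i,j}\), and ILD as its complement \(1 - ILS_u\), which is only meaningful when Sim lies in the range 0 to 1. The function names and the `sim` callable are hypothetical.

```python
from itertools import combinations


def intra_list_similarity(rec_list, sim):
    """Mean pairwise similarity of the items in one recommendation list.

    rec_list: list of item ids (the Top-k list RL of a user)
    sim:      callable (i, j) -> similarity in [0, 1], assumed symmetric
    """
    pairs = list(combinations(rec_list, 2))  # all unordered pairs i < j
    if not pairs:
        return 0.0  # convention for lists with fewer than two items
    return sum(sim(i, j) for i, j in pairs) / len(pairs)


def intra_list_diversity(rec_list, sim):
    """Assumed form: ILD = 1 - ILS, valid when sim is normalised to [0, 1]."""
    return 1.0 - intra_list_similarity(rec_list, sim)
```

Averaging over pairs keeps the score comparable across the Top10, Top20 and Top50 lists, whereas an unnormalised sum would grow with the list length.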
The next metric for evaluating the diversity of the system is Mean Novelty (MN) [14]. For each item, we calculate the fraction of users that have rated the item and obtain the information content of the item as the negative logarithm of this fraction. Let \(D_u(k)\) be the set of Top-k ranked items of user u. Given the Top-k items recommended to a user, we average the information content of these items to obtain the novelty of the system for that user. The mean novelty of the system is the average novelty over all users:
\[ MN = \frac{1}{|U|} \sum_{u \in U} \frac{1}{k} \sum_{i \in D_u(k)} -\log f_i \]
where \(f_i\) is the fraction of users that have rated the \(i^{th}\) item and U is the set of users.
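Mean Novelty can be sketched in Python as follows. The base of the logarithm is not stated in the text; base 2 is assumed here, so novelty is measured in bits. The function name and data layout are hypothetical.

```python
import math


def mean_novelty(top_k_lists, f):
    """Average, over users, of the mean information content of their Top-k items.

    top_k_lists: dict user -> list of recommended item ids, D_u(k)
    f:           dict item -> fraction of users who rated the item (0 < f_i <= 1)
    """
    per_user = []
    for items in top_k_lists.values():
        if not items:
            continue  # users without recommendations are skipped
        # Information content of an item: -log2(f_i); assumption: base 2.
        novelty = sum(-math.log2(f[i]) for i in items) / len(items)
        per_user.append(novelty)
    return sum(per_user) / len(per_user) if per_user else 0.0
```

An item rated by every user contributes 0 bits, while an item rated by a quarter of the users contributes 2 bits, so lists dominated by popular items receive a low novelty score.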
5.3 Improvement of Recommendation Accuracy and Utility
This section compares the effectiveness of the proposed method USA_ID with two baselines: the unmodified user similarity computed from the historical data with Pearson Correlation or Cosine similarity (Sect. 2), referred to as UserSim, and PLUS, discussed in [14].
For each user, the Top-K recommendations are considered when assessing the proposed approach. Experiments are carried out for three values of K, namely 10, 20 and 50. A comprehensive comparison of the approaches is shown in Tables 1, 2 and 3 for the Top10, Top20 and Top50 recommendations respectively. In Table 1, USA_ID with Pearson Correlation yields an improvement of 21 % and 15 % over UserSim and PLUS respectively on the Precision measure. USA_ID with Cosine gives an improvement of 18 % and 8 % over UserSim and PLUS respectively on the Precision measure.
USA_ID with Pearson Correlation provides an improvement of 8 % and 7 % over UserSim and PLUS respectively with respect to Recall measure. USA_ID with Cosine provides an improvement of 4 % and 2 % over UserSim and PLUS respectively with respect to Recall measure.
USA_ID with Pearson Correlation offers an improvement of 4.8 % and 4.4 % over UserSim and PLUS respectively with respect to ILD measure. USA_ID with Cosine offers an improvement of 5.5 % and 5 % respectively over UserSim and PLUS with respect to ILD measure.
USA_ID with Pearson Correlation yields an improvement of 55 % and 54 % over UserSim and PLUS respectively with respect to MN. USA_ID with Cosine yields an improvement of 57 % and 56 % over UserSim and PLUS respectively with respect to MN. Similar improvements are reported in Tables 2 and 3 for the Top20 and Top50 recommendations. We observe from the tables that USA_ID significantly improves the performance of the recommender system in terms of accuracy, measured by Precision and Recall, as well as diversity, measured by ILD and MN.
6 Conclusion
Novelty and diversity in recommendations are considered significant dimensions for attracting users. This work presents an approach called USA_ID, which modifies user similarity to reduce the negative impact of the popular items of the domain. The standard user-based Collaborative Filtering framework is used to carry out the prediction computations. Experiments are conducted on the standard data set MovieLens. The results show that USA_ID is effective in improving both the accuracy and the diversity of recommendations. The proposed method can be investigated further to check its applicability in the item-based Collaborative Filtering framework. The modified similarity scores of the users can also be modeled as a graph, whose properties can be analyzed to further improve the quality of predictions.
References
1. Adamopoulos, P., Tuzhilin, A.: On unexpectedness in recommender systems: or how to expect the unexpected. In: Workshop on Novelty and Diversity in Recommender Systems (DiveRS 2011), at the 5th ACM International Conference on Recommender Systems (RecSys 2011), pp. 11–18, Chicago, Illinois, USA. ACM (2011)
2. Adomavicius, G., Kwon, Y.: Maximizing aggregate recommendation diversity: a graph-theoretic approach. In: Proceedings of the 1st International Workshop on Novelty and Diversity in Recommender Systems (DiveRS 2011), pp. 3–10 (2011)
3. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005)
4. Bell, R.M., Koren, Y.: Improved neighborhood-based collaborative filtering. In: KDD Cup and Workshop at the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007)
5. Bogers, T., Van Den Bosch, A.: Fusing recommendations for social bookmarking web sites. Int. J. Electron. Commer. 15(3), 31–72 (2011)
6. Brynjolfsson, E., Hu, Y., Simester, D.: Goodbye Pareto principle, hello long tail: the effect of search costs on the concentration of product sales. Manage. Sci. 57(8), 1373–1386 (2011)
7. Brynjolfsson, E., Hu, Y., Smith, M.D.: Research commentary-long tails vs. superstars: the effect of information technology on product variety and sales concentration patterns. Inf. Syst. Res. 21(4), 736–747 (2010)
8. Burke, R.: Hybrid recommender systems: survey and experiments. User Mod. User-Adap. Interact. 12(4), 331–370 (2002)
9. Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 335–336. ACM (1998)
10. Carrer-Neto, W., Hernández-Alcaraz, M.L., Valencia-García, R., García-Sánchez, F.: Social knowledge-based recommender system. Application to the movies domain. Expert Syst. Appl. 39(12), 10990–11000 (2012)
11. Castells, P., Vargas, S., Wang, J.: Novelty and diversity metrics for recommender systems: choice, discovery and relevance. In: International Workshop on Diversity in Document Retrieval (DDR 2011) at the 33rd European Conference on Information Retrieval (ECIR 2011), pp. 29–36. Citeseer (2011)
12. Chuanmin, M., Xiaofei, S., Jing, M., Xin, Z.: Collaborative filtering algorithm based on random walk with choice. sekeie-14, pp. 192–196 (2014)
13. Ekstrand, M.D., Riedl, J.T., Konstan, J.A.: Collaborative filtering recommender systems. Found. Trends Hum.-Comput. Interact. 4(2), 81–173 (2011)
14. Gan, M., Jiang, R.: Improving accuracy and diversity of personalized recommendation through power law adjustments of user similarities. Decis. Support Syst. 55(3), 811–821 (2013)
15. Herlocker, J.L., Konstan, J.A., Borchers, A., Riedl, J.: An algorithmic framework for performing collaborative filtering. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 230–237. ACM (1999)
16. Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. (TOIS) 22(1), 5–53 (2004)
17. Lathia, N., Hailes, S., Capra, L., Amatriain, X.: Temporal diversity in recommender systems. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 210–217. ACM (2010)
18. Linden, G., Smith, B., York, J.: Amazon.com recommendations: item-to-item collaborative filtering. Internet Comput. 7(1), 76–80 (2003)
19. Pazzani, M.J.: A framework for collaborative, content-based and demographic filtering. Artif. Intell. Rev. 13(5–6), 393–408 (1999)
20. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, pp. 285–295. ACM (2001)
21. Shani, G., Gunawardana, A.: Evaluating recommendation systems. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 257–297. Springer, New York (2011)
22. Su, X., Khoshgoftaar, T.M.: A survey of collaborative filtering techniques. Adv. Artif. Intell. 2009, Article No. 4 (2009)
23. Vargas, S., Castells, P.: Rank and relevance in novelty and diversity metrics for recommender systems. In: Proceedings of the Fifth ACM Conference on Recommender Systems, pp. 109–116. ACM (2011)
24. Ziegler, C.N., McNee, S.M., Konstan, J.A., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32. ACM (2005)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Latha, R., Nadarajan, R. (2015). User Similarity Adjustment for Improved Recommendations. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science(), vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_48
DOI: https://doi.org/10.1007/978-3-319-26832-3_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science, Computer Science (R0)