Abstract
Data collected for collaborative filtering (CF) might be cross distributed between two online vendors, even competing ones. Such companies might want to integrate their data to provide more precise and reliable recommendations. However, due to privacy, legal, and financial concerns, they do not want to disclose their private data to each other. If privacy-preserving measures are in place, they might agree to generate predictions collaboratively from their distributed data. In this study, we investigate how to offer hybrid CF-based referrals with decent accuracy on cross distributed data (CDD) held by two e-commerce sites while preserving their privacy. Our proposed schemes are designed to prevent each data holder from learning the true ratings and rated items held by the other, while still allowing both to provide accurate CF services efficiently. We perform experiments on real data to evaluate our proposals in terms of accuracy. The results show that the proposed methods provide precise predictions. Moreover, we analyze our schemes in terms of privacy and supplementary costs. We demonstrate that our schemes are secure and that the online overhead costs due to privacy concerns are insignificant.
Cite this article
Yakut, I., Polat, H. Privacy-preserving hybrid collaborative filtering on cross distributed data. Knowl Inf Syst 30, 405–433 (2012). https://doi.org/10.1007/s10115-011-0395-3
DOI: https://doi.org/10.1007/s10115-011-0395-3