Abstract
Most existing personalization systems rely on site-centric user data, in which the inputs available to the system are the user’s behaviors on a specific site. We use a dataset supplied by a major audience measurement company that represents a complete user-centric view of clickstream behavior. Using the supplied product purchase metadata to set up a prediction problem, we learn models of the user’s probability of purchase within a time window for multiple product categories, using features that represent the user’s browsing and search behavior on all websites. As a baseline, we compare our results to the best such models that can be learned from site-centric data at a major search engine site. We demonstrate substantial improvements in accuracy with comparable and often better recall. We also propose a novel search term suggestion algorithm, based on behavior rather than syntax, for feature selection on clickstream data. Finally, our models are not privacy-invasive. If deployed client-side, our models amount to a dynamic “smart cookie” that is expressive of a user’s individual intentions with a precise probabilistic interpretation.
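The prediction setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Event` record, the `make_example` function, the two count features, and the example site names are all hypothetical, chosen only to show how a labeled example might be built from behavior before a cutoff, with the label taken from purchases inside the following time window. Passing `only_site` restricts the features to a single site, which mimics the site-centric view used as the baseline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    """Hypothetical clickstream record; field names are illustrative."""
    user: str
    time: int      # hypothetical timestamp, e.g. a day index
    site: str      # site on which the event occurred
    kind: str      # "visit", "search", or "purchase"
    category: str  # product category the event relates to


def make_example(
    events: List[Event],
    user: str,
    category: str,
    cutoff: int,
    window: int,
    only_site: Optional[str] = None,
) -> Tuple[Dict[str, int], bool]:
    """Build (features, label) for one user and one product category.

    Features count behavior strictly before `cutoff`; the label is True
    if a purchase falls in [cutoff, cutoff + window). If `only_site` is
    given, features see only that site (a site-centric view).
    """
    mine = [e for e in events if e.user == user and e.category == category]
    observed = [
        e for e in mine
        if e.time < cutoff and (only_site is None or e.site == only_site)
    ]
    features = {
        "visits": sum(e.kind == "visit" for e in observed),
        "searches": sum(e.kind == "search" for e in observed),
    }
    # The label always uses the full purchase record, on any site.
    label = any(
        e.kind == "purchase" and cutoff <= e.time < cutoff + window
        for e in mine
    )
    return features, label


# Made-up events for a single user, spread over three sites.
events = [
    Event("u1", 1, "search.example", "search", "laptops"),
    Event("u1", 2, "shop-a.example", "visit", "laptops"),
    Event("u1", 3, "shop-b.example", "visit", "laptops"),
    Event("u1", 6, "shop-b.example", "purchase", "laptops"),
]

user_centric = make_example(events, "u1", "laptops", cutoff=5, window=7)
site_centric = make_example(
    events, "u1", "laptops", cutoff=5, window=7, only_site="search.example"
)
```

In this toy data both views share the same label, but the site-centric features miss the two shop visits that the user-centric features capture. Examples of this shape could then be passed to any probabilistic classifier.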
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Lukose, R., Li, J., Zhou, J., Penmetsa, S.R. (2008). Learning User Purchase Intent from User-Centric Data. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_63
DOI: https://doi.org/10.1007/978-3-540-68125-0_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science (R0)