Web user clustering and Web prefetching using Random Indexing with weight functions

Wan, Miao; Jönsson, Arne; Wang, Cong; Li, Lixiang; Yang, Yixian

doi:10.1007/s10115-011-0453-x

Regular Paper
Published: 30 November 2011

Knowledge and Information Systems, volume 33, pages 89–115 (2012)


Miao Wan¹,², Arne Jönsson¹, Cong Wang², Lixiang Li² & Yixian Yang²

Abstract

Users of a Web site usually express their interests by clicking on or visiting Web pages, and these actions are recorded in access log files. Clustering Web user access patterns can capture the interests that users of a Web site have in common and, in turn, build user profiles for advanced Web applications such as Web caching and prefetching. Conventional Web usage mining techniques for clustering Web user sessions can discover usage patterns directly, but cannot identify the latent factors or hidden relationships in users’ navigational behaviour. In this paper, we propose an approach based on a vector space model, called Random Indexing, to discover such intrinsic characteristics of Web users’ activities. The underlying factors are then utilised for clustering individual user navigational patterns and creating common user profiles. The clustering results are used to predict and prefetch Web requests for grouped users. We demonstrate the usability and superiority of the proposed Web user clustering approach through experiments on a real Web log file. The clustering and prefetching tasks are evaluated against previous studies, and the results show better clustering performance and higher prefetching accuracy.
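
The general idea of Random Indexing applied to access logs can be illustrated with a minimal sketch. This is not the authors' implementation: the session data, the dimensionality, the number of non-zero entries, and the logarithmic weight function are all assumptions chosen for illustration. Each page receives a sparse random index vector, and each user is represented by the weighted sum of the index vectors of the pages that user visited, so that users with similar navigation end up with similar vectors.

```python
import math
import random


def index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few +1/-1 entries, all others 0."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def user_vectors(sessions, dim=512, nonzeros=4, seed=0):
    """Build one context vector per user from per-page visit counts."""
    rng = random.Random(seed)
    pages = sorted({p for visits in sessions.values() for p in visits})
    index = {p: index_vector(dim, nonzeros, rng) for p in pages}
    vectors = {}
    for user, visits in sessions.items():
        ctx = [0.0] * dim
        for page, count in visits.items():
            # Hypothetical weight function (log-scaled visit count);
            # the weight functions studied in the paper are not shown here.
            weight = math.log(1 + count)
            for i, x in enumerate(index[page]):
                ctx[i] += weight * x
        vectors[user] = ctx
    return vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up sessions: page -> number of visits for each user.
sessions = {
    "u1": {"/news": 3, "/sports": 1},
    "u2": {"/news": 2, "/sports": 2},
    "u3": {"/courses": 4, "/exams": 1},
}
vecs = user_vectors(sessions)
sim_shared = cosine(vecs["u1"], vecs["u2"])    # users with common pages
sim_disjoint = cosine(vecs["u1"], vecs["u3"])  # users with no common pages
```

The resulting user vectors have a fixed, reduced dimensionality regardless of the number of pages, and could then be passed to any standard clustering algorithm to group users before predicting requests for each group.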

Author information

Authors and Affiliations

Department of Computer and Information Science, Linköping University, 581 83, Linköping, Sweden
Miao Wan & Arne Jönsson
Information Security Center, Beijing University of Posts and Telecommunications, P.O. Box 145, 100876, Beijing, China
Miao Wan, Cong Wang, Lixiang Li & Yixian Yang

Corresponding author

Correspondence to Miao Wan.

About this article

Cite this article

Wan, M., Jönsson, A., Wang, C. et al. Web user clustering and Web prefetching using Random Indexing with weight functions. Knowl Inf Syst 33, 89–115 (2012). https://doi.org/10.1007/s10115-011-0453-x

Received: 30 June 2010
Revised: 01 July 2011
Accepted: 22 October 2011
Published: 30 November 2011
Issue Date: October 2012
DOI: https://doi.org/10.1007/s10115-011-0453-x
