Abstract
Clustering web users based on their access patterns is a significant task in Web Usage Mining. Beyond clustering itself, it is important to evaluate the resulting clusters in order to choose the best clustering for a particular framework. This paper examines the use of Kullback-Leibler (KL) divergence, an information-theoretic distance, in conjunction with the k-means clustering algorithm. It compares KL divergence with other well-known distance measures (Euclidean, Standardized Euclidean and Manhattan) and evaluates the clustering results using both the value of the objective function and the Davies-Bouldin index. Since it is imperative to assess whether the results of a clustering process are susceptible to noise, especially in noisy environments such as the Web, our approach takes the impact of noise into account. The clusters obtained with the KL approach appear to be superior to those obtained with the other distance measures when the data have been corrupted by noise.
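To make the idea of pairing KL divergence with k-means concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each user is described by a vector of page-access counts, that rows are smoothed and normalized into probability distributions, and that initial centroids are chosen by the caller; the names `kl_kmeans`, `kl_matrix`, `normalize`, `init_idx` and the smoothing constant `eps` are hypothetical. The centroid update uses the arithmetic mean of the member distributions, which minimizes the summed KL divergence from the members to the centroid.

```python
import numpy as np


def normalize(counts, eps=1e-12):
    """Turn rows of non-negative counts into probability distributions.

    eps is an assumed smoothing constant that keeps log() finite.
    """
    rows = np.asarray(counts, dtype=float) + eps
    return rows / rows.sum(axis=1, keepdims=True)


def kl_matrix(p, c):
    """Return d with d[i, j] = KL(p_i || c_j) for rows of p and c."""
    log_ratio = np.log(p)[:, None, :] - np.log(c)[None, :, :]
    return np.sum(p[:, None, :] * log_ratio, axis=2)


def kl_kmeans(counts, init_idx, n_iter=50):
    """k-means style clustering with KL divergence as the distance.

    init_idx lists the rows used as initial centroids (an assumption
    made here so that the example is deterministic).
    """
    p = normalize(counts)
    centroids = p[init_idx].copy()
    for _ in range(n_iter):
        # Assignment step: nearest centroid in the KL sense.
        labels = kl_matrix(p, centroids).argmin(axis=1)
        # Update step: mean of the member distributions.
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = p[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and objective (sum of KL to assigned centroid).
    d = kl_matrix(p, centroids)
    labels = d.argmin(axis=1)
    objective = float(d[np.arange(len(p)), labels].sum())
    return labels, centroids, objective


if __name__ == "__main__":
    # Four hypothetical users, four pages: two users favour pages 0-1,
    # two favour pages 2-3.
    visits = [[9, 1, 0, 0], [8, 2, 0, 0], [0, 0, 7, 3], [0, 0, 6, 4]]
    labels, centroids, objective = kl_kmeans(visits, init_idx=[0, 2])
    print(labels, objective)
```

The returned objective value is the kind of quantity that can be compared across distance measures only with care, since each measure has its own scale; an index such as Davies-Bouldin is one way to compare clusterings on a common footing.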
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Petridou, S.G., Koutsonikola, V.A., Vakali, A.I., Papadimitriou, G.I. (2006). A Divergence-Oriented Approach for Web Users Clustering. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751588_130
DOI: https://doi.org/10.1007/11751588_130
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34072-0
Online ISBN: 978-3-540-34074-4
eBook Packages: Computer Science, Computer Science (R0)