Abstract
Graphs arise in numerous applications, such as the analysis of the Web, router networks, social networks, and co-citation graphs. Virtually all the popular methods for analyzing such graphs, for example, k-means clustering, METIS graph partitioning and SVD/PCA, require the user to specify various parameters such as the number of clusters, number of partitions and number of principal components. We propose a novel way to group nodes, using information-theoretic principles to choose both the number of such groups and the mapping from nodes to groups. Our algorithm is completely parameter-free, and also scales practically linearly with the problem size. Further, we propose novel algorithms which use this node group structure to get further insights into the data, by finding outliers and computing distances between groups. Finally, we present experiments on multiple synthetic and real-life datasets, where our methods give excellent, intuitive results.
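To make the idea of choosing a grouping by information-theoretic cost more concrete, the following is a minimal sketch, not the paper's actual cost function or search procedure. It assumes a binary adjacency matrix, scores a candidate node-to-group mapping by the bits needed to encode each block of the matrix at its own density, and adds a crude, hypothetical penalty for describing the grouping itself. The function names and the exact form of `model_cost` are illustrative assumptions.

```python
import math


def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) source; zero when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def data_cost(adj, groups):
    """Bits to encode the adjacency matrix block by block.

    `groups[u]` is the group label of node u, with labels 0..k-1.
    Each (row group, column group) block is coded at its own edge density.
    """
    k = max(groups) + 1
    sizes = [0] * k
    for g in groups:
        sizes[g] += 1
    ones = [[0] * k for _ in range(k)]
    for u, row in enumerate(adj):
        for v, cell in enumerate(row):
            ones[groups[u]][groups[v]] += cell
    total = 0.0
    for a in range(k):
        for b in range(k):
            cells = sizes[a] * sizes[b]
            if cells:
                total += cells * binary_entropy(ones[a][b] / cells)
    return total


def model_cost(n, k):
    """Assumed penalty: bits for the node-to-group map plus one count per block."""
    return n * math.log2(k) + k * k * math.log2(n * n + 1)


def total_cost(adj, groups):
    """Description length of the grouping plus the matrix given the grouping."""
    k = max(groups) + 1
    return model_cost(len(adj), k) + data_cost(adj, groups)


# Toy graph: two fully connected groups of three nodes (self-loops included).
adj = [[1 if (u < 3) == (v < 3) else 0 for v in range(6)] for u in range(6)]

candidates = {
    "one group": [0, 0, 0, 0, 0, 0],
    "two groups": [0, 0, 0, 1, 1, 1],
    "interleaved": [0, 1, 0, 1, 0, 1],
}
best = min(candidates, key=lambda name: total_cost(adj, candidates[name]))
print(best)
```

In this toy case the two-group mapping makes every block homogeneous, so the data cost drops to zero and outweighs the extra bits spent describing the groups. No number of groups is supplied by the user; it falls out of comparing total costs, which is the sense in which such a scheme can be parameter-free.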
This material is based upon work supported by the National Science Foundation under Grants No. IIS-9817496, IIS-9988876, IIS-0083148, IIS-0113089, IIS-0209107, IIS-0205224, INT-0318547, SENSOR-0329549, EF-0331657, and IIS-0326322; by the Pennsylvania Infrastructure Technology Alliance (PITA) Grant No. 22-901-0001; and by the Defense Advanced Research Projects Agency under Contract No. N66001-00-1-8936. Additional funding was provided by donations from Intel, and by a gift from Northrop-Grumman Corporation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, or other funding parties.
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chakrabarti, D. (2004). AutoPart: Parameter-Free Graph Partitioning and Outlier Detection. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_13
DOI: https://doi.org/10.1007/978-3-540-30116-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5