AutoPart: Parameter-Free Graph Partitioning and Outlier Detection

Chakrabarti, Deepayan

doi:10.1007/978-3-540-30116-5_13

Deepayan Chakrabarti

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3202)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Graphs arise in numerous applications, such as the analysis of the Web, router networks, social networks, co-citation graphs, etc. Virtually all the popular methods for analyzing such graphs, for example, k-means clustering, METIS graph partitioning and SVD/PCA, require the user to specify various parameters such as the number of clusters, number of partitions and number of principal components. We propose a novel way to group nodes, using information-theoretic principles to choose both the number of such groups and the mapping from nodes to groups. Our algorithm is completely parameter-free, and also scales practically linearly with the problem size. Further, we propose novel algorithms which use this node group structure to get further insights into the data, by finding outliers and computing distances between groups. Finally, we present experiments on multiple synthetic and real-life datasets, where our methods give excellent, intuitive results.
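
To make the information-theoretic idea in the abstract concrete, here is a minimal sketch of how a node grouping can be scored by the number of bits needed to encode the adjacency matrix block by block. It is an illustration under stated assumptions, not the chapter's actual cost function: the names `binary_entropy` and `block_code_cost`, the toy graph, and the choice to score each block with the binary entropy of its edge density are all assumptions made for this example.

```python
import math
from itertools import product


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source; 0 for a pure block."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def block_code_cost(adj, groups):
    """Approximate bits needed to encode a 0/1 adjacency matrix when its
    rows and columns are split into blocks by the node-to-group mapping.

    adj    : square list of lists with 0/1 entries
    groups : groups[i] is the group index (0..k-1) of node i
    (Hypothetical helper for illustration; not taken from the chapter.)
    """
    k = max(groups) + 1
    members = [[i for i, g in enumerate(groups) if g == a] for a in range(k)]
    total = 0.0
    for rows, cols in product(members, repeat=2):
        cells = len(rows) * len(cols)
        if cells == 0:
            continue
        ones = sum(adj[i][j] for i in rows for j in cols)
        # A block that is nearly all ones or all zeros compresses well.
        total += cells * binary_entropy(ones / cells)
    return total


# Toy graph (an assumption for the demo): two disjoint 3-node cliques,
# with self-loops so that each diagonal block is entirely ones.
adj = [[1 if (i < 3) == (j < 3) else 0 for j in range(6)] for i in range(6)]

cost_two_groups = block_code_cost(adj, [0, 0, 0, 1, 1, 1])
cost_one_group = block_code_cost(adj, [0, 0, 0, 0, 0, 0])
print(cost_two_groups, cost_one_group)
```

On this toy graph the two-group mapping yields homogeneous blocks and a cost of 0 bits, while lumping all nodes together gives a half-dense 6×6 block costing 36 bits. Note that this block cost alone always favors more groups; a complete description-length objective would also charge for describing the grouping itself, which is omitted here.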

This material is based upon work supported by the National Science Foundation under Grants No. IIS-9817496, IIS-9988876, IIS-0083148, IIS-0113089, IIS-0209107, IIS-0205224, INT-0318547, SENSOR-0329549, EF-0331657, and IIS-0326322; by the Pennsylvania Infrastructure Technology Alliance (PITA) Grant No. 22-901-0001; and by the Defense Advanced Research Projects Agency under Contract No. N66001-00-1-8936. Additional funding was provided by donations from Intel and by a gift from Northrop-Grumman Corporation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding parties.

Author information

Authors and Affiliations

Carnegie Mellon University
Deepayan Chakrabarti

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Dipartimento di Informatica, Università degli Studi di Bari
Floriana Esposito
Pisa KDD Laboratory, ISTI - CNR, Area della Ricerca di Pisa, Via Giuseppe Moruzzi 1, Pisa, Italy
Fosca Giannotti
Dipartimento di Informatica, Via F. Buonarroti 2, 56127, Pisa, Italy
Dino Pedreschi

Cite this paper

Chakrabarti, D. (2004). AutoPart: Parameter-Free Graph Partitioning and Outlier Detection. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_13

DOI: https://doi.org/10.1007/978-3-540-30116-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive
