Measuring Constraint-Set Utility for Partitional Clustering Algorithms

Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato

doi:10.1007/11871637_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4213)

Included in the following conference series:

European Conference on Principles of Data Mining and Knowledge Discovery

Abstract

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves performance, with respect to the true data labels. However, in most of these experiments, results are averaged over different randomly chosen constraint sets, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
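The abstract names the two measures but does not define them, so the following is only a minimal sketch of the evaluation setting it describes, not the authors' method. It assumes pairwise must-link ("ML") and cannot-link ("CL") constraints, scores a partition against the true labels with the Rand index, and, as one hypothetical proxy for how much a constraint set tells an algorithm that it did not already know, counts the fraction of constraints that an unconstrained partition violates. The data, the `unconstrained` partition, and both constraint sets are made up for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of instance pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def violated_fraction(constraints, labels):
    """Fraction of constraints that a partition breaks.

    Each constraint is (i, j, kind), where kind is "ML" (must-link)
    or "CL" (cannot-link). This is an illustrative proxy only.
    """
    violated = 0
    for i, j, kind in constraints:
        same = labels[i] == labels[j]
        if (kind == "ML" and not same) or (kind == "CL" and same):
            violated += 1
    return violated / len(constraints)


# Hypothetical toy data: six instances, two true classes.
true_labels = [0, 0, 0, 1, 1, 1]
# Hypothetical output of an unconstrained clusterer (instance 2 misplaced).
unconstrained = [0, 0, 1, 1, 1, 1]

# Two constraint sets of equal size, both consistent with the true labels.
set_a = [(0, 1, "ML"), (3, 4, "ML"), (0, 5, "CL")]  # already satisfied
set_b = [(0, 2, "ML"), (2, 3, "CL"), (1, 2, "ML")]  # all about instance 2

print(rand_index(true_labels, unconstrained))
print(violated_fraction(set_a, unconstrained))
print(violated_fraction(set_b, unconstrained))
```

Both sets agree with the true labels, yet the unconstrained partition already satisfies every constraint in `set_a` and violates every constraint in `set_b`. Averaging results over such sets would hide exactly this kind of per-set difference.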

Author information

Authors and Affiliations

State University of New York, Albany, NY, 12222, USA
Ian Davidson
Jet Propulsion Laboratory, Pasadena, CA, 91109, USA
Kiri L. Wagstaff
SRI International, Menlo Park, CA, 94025, USA
Sugato Basu

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt,
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Davidson, I., Wagstaff, K.L., Basu, S. (2006). Measuring Constraint-Set Utility for Partitional Clustering Algorithms. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_15

DOI: https://doi.org/10.1007/11871637_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45374-1
Online ISBN: 978-3-540-46048-0
eBook Packages: Computer Science, Computer Science (R0)
