Abstract
Constrained clustering algorithms can incorporate domain-dependent constraints into the clustering process to achieve better results. However, most existing constrained clustering algorithms are k-means-like methods, which can handle only distance-based similarity measures. In this paper, we propose a constrained hierarchical clustering method, called Correlational-Constrained Complete Link (C-CCL), for gene expression analysis that takes gene-pair constraints into account while using correlation coefficients as the similarity measure. We compared C-CCL with a correlational version of the COP-k-Means method (C-CKM) on a real yeast dataset. Evaluated with two validation measures, C-CCL substantially outperforms C-CKM in clustering quality.
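The abstract does not spell out the steps of C-CCL, so the following is only a generic sketch of the idea it names: complete-link agglomeration driven by Pearson correlation, with pairwise must-link and cannot-link constraints. It should not be read as the authors' algorithm. The function name `constrained_complete_link`, its parameters, and the choice to merge must-link pairs up front are all assumptions made for illustration.

```python
import numpy as np

def constrained_complete_link(X, k, must_link=(), cannot_link=()):
    """Illustrative sketch (not the paper's C-CCL): complete-link
    agglomerative clustering of the rows of X using Pearson correlation
    as similarity, honoring pairwise constraints given as index pairs."""
    n = X.shape[0]
    sim = np.corrcoef(X)  # rows are treated as variables (gene profiles)

    # Assumption: must-link pairs are merged before agglomeration (union-find).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = list(groups.values())

    forbidden = {frozenset(p) for p in cannot_link}

    def violates(c1, c2):
        # A merge is inadmissible if it would join a cannot-link pair.
        return any(frozenset((i, j)) in forbidden for i in c1 for j in c2)

    while len(clusters) > k:
        best, best_pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if violates(clusters[a], clusters[b]):
                    continue
                # Complete link: score a merge by its least similar pair.
                s = sim[np.ix_(clusters[a], clusters[b])].min()
                if s > best:
                    best, best_pair = s, (a, b)
        if best_pair is None:
            break  # no admissible merge remains; more than k clusters are returned
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = np.empty(n, dtype=int)
    for idx, members in enumerate(clusters):
        labels[members] = idx
    return labels

# Toy profiles (made-up numbers): rows 0-1 rise, rows 2-3 fall.
X = np.array([[1, 2, 3, 4, 5],
              [2, 4, 6, 8, 11],
              [5, 4, 3, 2, 1],
              [10, 8, 6, 4, 3]], dtype=float)
free = constrained_complete_link(X, k=2)
split = constrained_complete_link(X, k=2, cannot_link=[(0, 1)])
```

In this toy run the unconstrained call groups the two rising and the two falling profiles, while the cannot-link constraint forces rows 0 and 1 into different clusters. The pairwise search is cubic in the number of clusters per run and is meant for exposition, not for full expression datasets.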
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tseng, V.S., Chen, LC., Kao, CP. (2008). Constrained Clustering for Gene Expression Data Mining. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2008. Lecture Notes in Computer Science, vol 5012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68125-0_73
DOI: https://doi.org/10.1007/978-3-540-68125-0_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68124-3
Online ISBN: 978-3-540-68125-0
eBook Packages: Computer Science, Computer Science (R0)