Abstract
Since minimum sum-of-squares clustering (MSSC) is an NP-hard combinatorial optimization problem, techniques from global optimization appear promising for reliably clustering numerical data. In this paper, concepts of combinatorial heuristic optimization are considered for approaching the MSSC: an iterated local search (ILS) approach is proposed which is capable of finding (near-)optimum solutions very quickly. On gene expression data resulting from biological microarray experiments, it is shown that ILS outperforms multi-start k-means as well as three other clustering heuristics combined with k-means.
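To make the general idea concrete, the following is a minimal sketch of an iterated local search for MSSC, not the implementation evaluated in the paper. It assumes that k-means serves as the local search, that the objective is the sum of squared Euclidean distances from each point to its nearest center, and that only improving solutions are accepted. The perturbation shown (moving one randomly chosen center onto a random data point) is a hypothetical choice made for illustration; the abstract does not specify the perturbation used. All function names and parameters are likewise illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centers):
    """MSSC objective: each point contributes its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    """Local search: alternate nearest-center assignment and centroid update."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        if new_centers == centers:  # no center moved: local optimum reached
            break
        centers = new_centers
    return centers


def perturb(points, centers, rng):
    """Assumed perturbation (not from the paper): relocate one center to a random point."""
    kicked = [list(c) for c in centers]
    kicked[rng.randrange(len(kicked))] = list(rng.choice(points))
    return kicked


def iterated_local_search(points, k, iterations=50, seed=0):
    """Repeat perturbation followed by k-means; keep the candidate only if it improves."""
    rng = random.Random(seed)
    best = kmeans(points, rng.sample(points, k))
    best_cost = sse(points, best)
    for _ in range(iterations):
        candidate = kmeans(points, perturb(points, best, rng))
        cost = sse(points, candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

In contrast to multi-start k-means, which restarts from an unrelated random initialization each time, this scheme restarts from a slightly modified copy of the best solution found so far, so most of the structure already discovered is retained between local searches.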
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Merz, P. (2003). An Iterated Local Search Approach for Minimum Sum-of-Squares Clustering. In: Berthold, M.R., Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds) Advances in Intelligent Data Analysis V. IDA 2003. Lecture Notes in Computer Science, vol 2810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45231-7_27
DOI: https://doi.org/10.1007/978-3-540-45231-7_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40813-0
Online ISBN: 978-3-540-45231-7
eBook Packages: Springer Book Archive