Abstract
Given a set of entities associated with points in Euclidean space, minimum sum-of-squares clustering (MSSC) consists in partitioning this set into clusters such that the sum of squared distances from each point to the centroid of its cluster is minimized. A column generation algorithm for MSSC was given by du Merle et al. in SIAM Journal on Scientific Computing 21:1485–1505. The bottleneck of that algorithm is the solution of the auxiliary problem of finding a column with negative reduced cost. We propose a new way to solve this auxiliary problem based on geometric arguments. This greatly improves the efficiency of the whole algorithm and leads to the exact solution of instances with over 2,300 entities, i.e., more than 10 times as many as previously done.
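As an illustration of the objective defined above, the following minimal sketch evaluates the MSSC cost of a given partition. It is not the column generation algorithm of the paper; the function name `mssc_cost` and the toy data are assumptions introduced here for illustration only.

```python
from collections import defaultdict


def mssc_cost(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    `points` is a sequence of equal-length coordinate tuples and `labels`
    gives the cluster index of each point (hypothetical interface).
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster's points.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Hypothetical toy data: four points on a line and two candidate partitions.
points = [(0.0,), (2.0,), (10.0,), (12.0,)]
good = mssc_cost(points, [0, 0, 1, 1])  # nearby points grouped together
bad = mssc_cost(points, [0, 1, 0, 1])   # distant points grouped together
```

On this toy data the first partition has the lower cost, which is what an exact MSSC method would certify as optimal over all partitions into two clusters.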
References
Aloise D., Deshpande A., Hansen P., Popat P.: NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn. 75, 245–249 (2009)
Aloise D., Hansen P.: A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering. Pesquisa Operacional 29, 503–516 (2009)
Aloise, D., Hansen, P.: Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering. To appear in J. Glob. Optim. (2010)
An L.T., Belghiti M.T., Tao P.D.: A new efficient algorithm based on DC programming and DCA for clustering. J. Glob. Optim. 37, 593–608 (2007)
Asuncion, A., Newman, D.J.: UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html (2007)
Bagirov A.M.: Modified global k-means algorithm for minimum sum-of-squares clustering problems. Pattern Recognit. 41, 3192–3199 (2008)
Bagirov A.M., Yearwood J.: A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems. Eur. J. Oper. Res. 170, 578–596 (2006)
Bonami, P., Lee, J.: BONMIN user’s manual. Technical report, IBM Corporation, June (2007)
Brusco M.J.: A repetitive branch-and-bound procedure for minimum within-cluster sum of squares partitioning. Psychometrika 71, 347–363 (2006)
Brusco M.J., Steinley D.: A comparison of heuristic procedures for minimum within-cluster sums of squares partitioning. Psychometrika 72, 583–600 (2007)
Christou, I.T.: Exact method-based coordination of cluster ensembles. To appear in IEEE Trans. Pattern Anal. Mach. Intell. (2010)
Diehr G.: Evaluation of a branch and bound algorithm for clustering. SIAM J. Sci. Stat. Comput. 6, 268–284 (1985)
Dinkelbach W.: On nonlinear fractional programming. Manag. Sci. 13, 492–498 (1967)
Drezner Z., Mehrez A., Wesolowsky G.O.: The facility location problem with limited distances. Transp. Sci. 25, 183–187 (1991)
du Merle O., Hansen P., Jaumard B., Mladenović N.: An interior point algorithm for minimum sum-of-squares clustering. SIAM J. Sci. Comput. 21, 1485–1505 (2000)
du Merle O., Villeneuve D., Desrosiers J., Hansen P.: Stabilized column generation. Discrete Math. 194, 229–237 (1999)
Edwards A.W., Cavalli-Sforza L.L.: A method for cluster analysis. Biometrics 21, 362–375 (1965)
Elhedhli S., Goffin J.-L.: The integration of an interior-point cutting plane method within a branch-and-price algorithm. Math. Program. 100, 267–294 (2004)
Fisher R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. VII, 179–188 (1936)
Forgy E.W.: Cluster analysis of multivariate data: efficiency vs. interpretability of classifications. Biometrics 21, 768 (1965)
Goffin J.-L., Haurie A., Vial J.-P.: Decomposition and nondifferentiable optimization with the projective algorithm. Manag. Sci. 38, 284–302 (1992)
Grötschel, M., Holland, O.: Solution of large-scale symmetric traveling salesman problems. Math. Program. 51, 141–202 (1991). Data sets available at http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp
Hansen P., Jaumard B.: Cluster analysis and mathematical programming. Math. Program. 79, 191–215 (1997)
Hansen, P., Jaumard, B., Meyer, C.: A simple enumerative algorithm for unconstrained 0–1 quadratic programming. Cahier du GERAD G-2000-59, GERAD, November (2000)
Hansen P., Mladenović N.: J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recognit. 34, 405–413 (2001)
Hansen P., Mladenović N.: Variable neighborhood search: principles and applications. Eur. J. Oper. Res. 130, 449–467 (2001)
Hansen P., Mladenović N., Pérez J.A.M.: Variable neighborhood search: methods and applications. 4OR 6, 319–360 (2008)
Hansen P., Ngai E., Cheung B.K., Mladenović N.: Analysis of global k-means, an incremental heuristic for minimum sum-of-squares clustering. J. Classif. 22, 287–310 (2005)
Hartigan J.A.: Clustering Algorithms. Wiley, New York (1975)
Heinz, G., Peterson, L.J., Johnson, R.W., Kerk, C.J.: Exploring relationships in body dimensions. J. Stat. Education 11 (2003). Data set available at http://www.amstat.org/publications/jse/v11n2/datasets.heinz.html
Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering. In: Proceedings of the 10th ACM Symposium on Computational Geometry, pp. 332–339 (1994)
Jain A.K., Murty M.N., Flynn P.J.: Data clustering: a review. ACM Comput. Surv. 31, 264–323 (1999)
Jensen R.E.: A dynamic programming algorithm for cluster analysis. Oper. Res. 17, 1034–1057 (1969)
Kelley J.E.: The cutting plane method for solving convex programs. J. SIAM 8, 703–712 (1960)
Kogan J.: Introduction to Clustering Large and High-Dimensional Data. Cambridge University Press, New York (2006)
Koontz W.L.G., Narendra P.M., Fukunaga K.: A branch and bound clustering algorithm. IEEE Trans. Comput. C-24, 908–915 (1975)
Laszlo M., Mukherjee S.: A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering. IEEE Trans. Pattern Anal. Mach. Intell. 28, 533–543 (2006)
Laszlo M., Mukherjee S.: A genetic algorithm that exchanges neighboring centers for k-means clustering. Pattern Recognit. Lett. 28, 2359–2366 (2007)
Leyffer, S.: User manual for MINLP_BB. Technical report, University of Dundee, UK, March (1999)
Liberti L.: Reformulations in mathematical programming: definitions and systematics. RAIRO-RO 43(1), 55–86 (2009)
Likas A., Vlassis N., Verbeek J.J.: The global k-means clustering algorithm. Pattern Recognit. 36, 451–461 (2003)
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, pp. 281–297. Berkeley, CA (1967)
Mahajan M., Nimbhorkar P., Varadarajan K.: The planar k-means problem is NP-hard. Lect. Notes Comput. Sci. 5431, 274–285 (2009)
Merz P.: An iterated local search for minimum sum-of-squares clustering. Lect. Notes Comput. Sci. 2810, 286–296 (2003)
Mirkin B.: Mathematical Classification and Clustering. Kluwer, Dordrecht, The Netherlands (1996)
Mirkin B.: Clustering for Data Mining: A Data Recovery Approach. Chapman and Hall/CRC, Boca Raton (2005)
Mladenović N., Hansen P.: Variable neighborhood search. Comput. Oper. Res. 24, 1097–1100 (1997)
Pacheco J.A.: A scatter search approach for the minimum sum-of-squares clustering problem. Comput. Oper. Res. 32, 1325–1335 (2005)
Pacheco J.A., Valencia O.: Design of hybrids for the minimum sum-of-squares clustering problem. Comput. Stat. Data Anal. 43, 235–248 (2003)
Padberg, M., Rinaldi, G.: A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Rev. 33, 60–100 (1991). Data set available at http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp
Pal, S.K., Majumder, D.D.: Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Trans. Syst. Man. Cybern. 7, 625–629 (1977). Data set available at http://www.isical.ac.in/sushmita/patterns/vowel.dat
Peng J., Xia Y.: A new theoretical framework for k-means-type clustering. Stud. Fuzziness Soft Comput. 180, 79–96 (2005)
Reinelt, G.: TSPLIB – a traveling salesman library. ORSA J. Comput. 3, 376–384 (1991). http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95
Ruspini E.H.: Numerical methods for fuzzy clustering. Inf. Sci. 2, 319–350 (1970)
Ryan D.M., Foster B.A.: An integer programming approach to scheduling. In: Wren, A. (ed.) Computer Scheduling of Public Transport: Urban Passenger Vehicle and Crew Scheduling, pp. 269–280. North-Holland, Amsterdam (1981)
Sherali H.D., Adams W.P.: Reformulation-linearization techniques for discrete optimization problems. In: Du, D.Z., Pardalos, P.M. (eds) Handbook of Combinatorial Optimization 1, pp. 479–532. Kluwer, Dordrecht (1999)
Sherali H.D., Desai J.: A global optimization RLT-based approach for solving the hard clustering problem. J. Glob. Optim. 32, 281–306 (2005)
Späth H.: Cluster Analysis Algorithms for Data Reduction and Classification of Objects. Wiley, New York (1980)
Steinhaus H.: Sur la division des corps matériels en parties. Bulletin De L’Académie Polonaise Des Sciences Classe III. IV, 801–804 (1956)
Steinley D.: K-means clustering: a half-century synthesis. Br. J. Math. Stat. Psychol. 59, 1–34 (2006)
Taillard É.D.: Heuristic methods for large centroid clustering problems. J. Heuristics 9, 51–73 (2003)
Teboulle M.: A unified continuous optimization framework for center-based clustering methods. J. Mach. Learn. Res. 8, 65–102 (2007)
Tuy H.: Concave programming under linear constraints. Soviet Math. 5, 1437–1440 (1964)
van Os B.J., Meulman J.J.: Improving dynamic programming strategies for partitioning. J. Classif. 21, 207–230 (2004)
Vavasis S.A.: Nonlinear Optimization: Complexity Issues. Oxford University Press, Oxford (1991)
Xavier, A.E., Negreiros, M.J., Maculan, N., Michelon, P.: The use of the hyperbolic smoothing clustering method for planning the tasks of sanitary agents in combating dengue. In: Proceedings of IFORS 2005 (2005)
Xia, Y., Peng, J.: A cutting algorithm for the minimum sum-of-squared error clustering. In: Proceedings of the SIAM International Data Mining Conference (2005)
Yeh, I.-C.: Modeling of strength of high performance concrete using artificial neural networks. Cement and Concrete Res. 28, 1797–1808 (1998). Data set available at http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
Cite this article
Aloise, D., Hansen, P. & Liberti, L. An improved column generation algorithm for minimum sum-of-squares clustering. Math. Program. 131, 195–220 (2012). https://doi.org/10.1007/s10107-010-0349-7
DOI: https://doi.org/10.1007/s10107-010-0349-7