Abstract
In this paper, a version of K-median problem, one of the most popular and best studied clustering measures, is discussed. The model using squared Euclidean distances terms to which the K-means algorithm has been successfully applied is considered. A fast and robust algorithm based on DC (Difference of Convex functions) programming and DC Algorithms (DCA) is investigated. Preliminary numerical solutions on real-world databases show the efficiency and the superiority of the appropriate DCA with respect to the standard K-means algorithm.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Alon N., Spencer J.H. (1991): The Probabilistic Method. Wiley, New York, NY
Arora, S., Kannan, R.: Learning mixtures of arbitrary Gaussians. In: Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, July 6–8, pp. 247–257. Heraklion, Crete, Greece (2001)
Al-Sultan K. (1995): A Tabu search approach to the clustering problem. Pattern Recogn. 28(9): 443–1453
Bradley, P.S., Mangasarian, O.L., Street, W.N.: Clustering via concave minimization, Technical Report 96-03, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin May 1996. Advances In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Neural Information processing Systems 9, pp. 368–374. MIT Press, Cambridge, MA Available by ftp://ftp.cs.wisc.edu/math-prog/tech-trports/96-03.ps.Z.
Bradley, B.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector machines. In: Shavlik, J. (eds.) Machine Learning Proceedings of the Fifteenth International Conferences(ICML’98), pp. 82–90. San Francisco, CA 1998, Morgan Kaufmann.
Charikar, M., Guha, S.: Improved combinatorial algorithms for facility location and k-median problems. In: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, 17–18 October, pp. 378–388. New York, NY, USA (1999)
Charikar, M., Guha, S., Tardos, E., Shmoys, D.B.: A constant-factor approximation algorithm for the k-median problem. In: Proceedings of the 31st Annual ACM Symposium on Theory of Computing pp. 1–10 (1999)
De Leeuw J. (1997): Applications of convex analysis to multidimensional scaling, Recent developments. In: Barra J.R., et al. (eds). Statistics. North-Holland Publishing company, Amsterdam, pp. 133–145
De Leeuw J. (1988): Convergence of the majorization method for multidimensional scaling. J. Classi. 5, 163–180
Dhilon I.S., Korgan J., Nicholas C. (2003): Feature Selection and Document Clustering. In: Berry M.W. (eds). A Comprehensive Survey of Text Mining. Springer-Verlag, Berlin, pp. 73–100
Duda R.O., Hart P.E. (1972): Pattern Classification and Scene Analysis. Wiley, New York
Feder, T., Greene, D.: Optimal algorithms for approximate clustering. In: Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC) May 2–4, Chicago, Illinois, USA (1988)
Fisher D. (1988): Knowledge acquisition via incremental conceptual clusterin. Mach. Learning, 2, 139–172
Fukunaga K. (1990): Statistical Pattern Recognition. Academic Press, NY
Hiriart Urruty J.B., Lemarechal C. (1993): Convex Analysis and Minimization Algorithms. Springer Verlag, Berlin, Heidelberg
Jain A.K., Dubes R.C. (1988): Algorithms for Clustering Data. Prentice-Hall Inc, Englewood Cliffs, NJ
Hartigan J.A. (1975): Clustering Algorithms. Wiley, New York
Le Thi Hoai An: Contribution à l’optimisation non convexe et l’optimisation globale: Théorie, Algoritmes et Applications. Habilitation à Diriger des Recherches, Université de Rouen, Juin (1997)
Le Thi Hoai An, Pham Dinh Tao (1997): Solving a class of linearly constrained indefinite quadratic problems by DC algorithms. J. Global Optim. 11(3): 253–285
Le Thi Hoai An, Pham Dinh Tao: DC programming approach for large-scale molecular optimization via the general distance geometry problem. Nonconvex Optimization and Its Applications, Special Issue “Optimization in Computational Chemistry and Molecular Biology: Local and Global Approaches”, pp. 301–339. Kluwer Academic Publishers, Dordrecht (2000) (This special issue contains refereed invited papers submitted at the conference on optimization in computational chemistry and molecular biology: local and global approaches held at Princeton University, 7–9 May 1999)
Le Thi Hoai An, Pham Dinh Tao (2001): DC programming approach for solving the multidimensional scaling problem. Nonconvex optimizations and its applications: special issue “From local to global optimization”, pp. 231–276. Kluwer Academic Publishers, Dordrecht
Le Thi Hoai An, Pham Dinh Tao: DC programming: theory, algorithms and applications. The state of the art. In Proceedings of The First International Workshop on Global Constrained Optimization and Constraint Satisfaction (Cocos’ 02), 28 p. Valbonne-Sophia Antipolis, France, 2–4 October (2002)
Le Thi Hoai An, Pham Dinh Tao (2003): Large Scale Molecular Optimization from distances matrices by a DC optimization approach. SIAM J. Optim. 14(1): 77–116
Le Thi Hoai An, Pham Dinh Tao (2005): The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133, 23–46
Liu, Y., Shen, X., Doss, H.: Multicategory ψ–learning and support vector machine (29 p.). Conference on Machine Learning, Statistics and Discovery, June 22–26, Department of Statistics, the Ohio State University (2003)
Mangasarian O.L. (1997): Mathematical programming in data mining. Data Mining and Knowl. discov. 1, 183–201
MacQueen J.B. (1967): Some Methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability”, Berkeley, University of California Press,1, 281–297
Meyerson A., O’Callaghan L., Plotkin S. (2004): A k-Median algorithm with running time independent of data size. Machine Learn. 56, 61–87
Neumann, J., Schnörr, C., Steidl, G.: SVM-based feature selection by direct objective minimisation, Pattern Recognition. In: Proceeding of 26th DAGM Symposium, vol. 3175, pp. 212–219, LNCS (2004)
Pham Dinh Tao. (1984): Convergence of subgradient method for computing the bound-norm of matrices, Linear Algebra Appl. 62, 163–182
Pham Dinh Tao. (1984): Algorithmes de calcul d’une forme quadratique sur la boule unité de la norme du maximum. Numer. Math. 45, 377–440
Pham Dinh Tao, Le Thi Hoai An.(1997): Convex analysis approach to d.c. programming: Theory, Algorithms and Applications. Acta Mathematica Vietnamica, (dedicated to Professor Hoang Tuy on the occasion of his 70th birthday), 22(1): 289–355
Pham Dinh Tao, Le Thi Hoai An. (1998): DC optimization algorithms for solving the trust region subproblem. SIAM J. Optim. 8, 476–505
Rao M.R. (1971): Cluster analysis and mathematical programming. J. Amer. Stat. Associ., 66, 622–626
Rockafellar R.T. (1970): Convex Analysis. Princeton University, Princeton
Selim S.Z., Ismail M.A. (1984): K-means-Type algorithms: a generalized convergence theorem and characterization of local optimilaty. (Book Series) IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6, 81–87
Schüle T., Schnörr C., Weber S., Hornegger J. (2005): Discrete tomography by convex-concave regularization and d.c. programming. Discr. Appl. Math. 151, 229–243
Weber S., Schüle T., Schnörr C. (2005): Prior learning and convex-concave regularization of binary tomography. Electr. Notes in Discr. Math. 20, 313–327
Weber S., Schnörr C., Schüle T., Hornegger J., (2005): Binary Tomography by Iterating Linear Programs, Klette, R., Kozera, R., Noakes, L., and Weickert, J., (eds.) Computational Imaging and Vision – Geometric Properties from Incomplete Data, Kluwer Academic Press, Dordrecht
Wong, T., Katz, R., McCanne, S.: A preference clustering protocol for large-Scale Multicast Applications, Proceedings of the First International COST264 Workshop on Networked Group Communication, November 17–20, LNCS, pp. 1–18. Pisa, Italy (1999)
Wolberg W.H., Street W.N., Mangasarian O.L. (1995): Image analysis and machine learning applied to breast cancer diagnosis and prognosis. Anal. Quant. Cytol. Histol. 17(2): 77–87
Wolberg W.H., Street W.N., Heisey D.M., Mangasarian O.L. (1995):Computerized breast cancer diagnosis and prognosis from fine-needle aspirates. Arch. Surg. 130, 511–516
Wolberg W.H., Street W.N., Heisey D.M., Mangasarian O.L. (1995): Computer-derived nuclear features distinguish malignant from benign breast cytology. Hum. Patholo., 26, 792–796
Yuille A.L., Rangarajan A. (2002): The Concave-Convex Procedure (CCCP). Avances in Neural Information Processing Systems 14, pp. 1033–1040. MIT Press, Cambridge, MA
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
An, L.T.H., Belghiti, M.T. & Tao, P.D. A new efficient algorithm based on DC programming and DCA for clustering. J Glob Optim 37, 593–608 (2007). https://doi.org/10.1007/s10898-006-9066-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10898-006-9066-4