Abstract
Cluster analysis is an important tool for data exploration and it has been applied in a wide variety of fields like engineering, economics, computer sciences, life and medical sciences, earth sciences and social sciences. The typical cluster analysis consists of four steps (i.e. feature selection or extraction, clustering algorithm design or selection, cluster validation and results interpretation) with feedback pathway. These steps are closely related to each other and affect the derived clusters. In this paper, a new metaheuristic algorithm is proposed for cluster analysis. This algorithm uses an Ant Colony Optimization to feature selection step and a Greedy Randomized Adaptive Search Procedure to clustering algorithm design step. The proposed algorithm has been applied with very good results to many data sets.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Aha, D. W., & Bankert, R. L. (1996). A comparative evaluation of sequential feature selection algorithms. In D. Fisher & J.-H. Lenx (Eds.), Artificial intelligence and statistics. New York: Springer.
Al-Ani, A. (2005a). Feature subset selection using ant colony optimization. International Journal of Computational Intelligence, 2(1), 53–58.
Al-Ani, A. (2005b). Ant colony optimization for feature subset selection. Transactions on Engineering, Computing and Technology, 4, 35–38.
Al-Sultan, K. (1995). A tabu search approach to the clustering problem. Pattern Recognition, 28(9), 1443–1451.
Azzag, H., Guinot, C., Venturini, G. (2006). Data and text mining with hierarchical clustering ants. In A. Abraham, C. Grosan & V. Ramos (Eds.), Swarm intelligence in data mining (pp. 153–190).
Azzag, H., Venturini, G., Oliver, A., & Gu, C. (2007). A hierarchical ant based clustering algorithm and its use in three real-world applications. European Journal of Operational Research, 179, 906–922.
Babu, G., & Murty, M. (1993). A near-optimal initial seed value selection in k-means algorithm using a genetic algorithm. Pattern Recognition Letters, 14(10), 763–769.
Brown, D., & Huntley, C. (1992). A practical application of simulated annealing to clustering. Pattern Recognition, 25(4), 401–412.
Cano, J. R., Cordón, O., Herrera, F., & Sánchez, L. (2002). A GRASP algorithm for clustering. In F. J. Garijo, J. C. Riquelme, & M. Toro (Eds.), LNAI: Vol. 2527. IBERAMIA 2002 (pp. 214–223). Berlin: Springer.
Cantu-Paz, E., Newsam, S., & Kamath, C. (2004). Feature selection in scientific application. In Proceedings of the 2004 ACM SIGKDD international conference on knowledge discovery and data mining (pp. 788–793).
Celeux, G., & Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis, 14, 315–332.
Chen, L., Tu, L., & Chen, H. (2005). A novel ant clustering algorithm with digraph. In L. Wang, K. Chen, & Y. S. Ong (Eds.), LNCS: Vol. 3611. ICNC 2005 (pp. 1218–1228). Berlin: Springer.
Chu, S., & Roddick, J. (2000). A clustering algorithm using the tabu search approach with simulated annealing, In N. Ebecken & C. Brebbia (Eds.), Data mining II-proceedings of second international conference on data mining methods and databases (pp. 515–523), Cambridge, U.K.
Cowgill, M., Harvey, R., & Watson, L. (1999). A genetic algorithm approach to cluster analysis. Computers and Mathematics with Applications, 37, 99–108.
Dorigo, M., & Stutzle, T. (2004). Ant colony optimization. Cambridge: Bradford Book.
Dorigo, M., Maniezzo, V., & Colorni, A. (1996). Ant system: optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics – Part B, 26(1), 29–41.
Feo, T. A., & Resende, M. G. C. (1995). Greedy randomized adaptive search procedure. Journal of Global Optimization, 6, 109–133.
Glover, F. (1989). Tabu search I. ORSA Journal on Computing, 1(3), 190–206.
Glover, F. (1990). Tabu search II. ORSA Journal on Computing, 2(1), 4–32.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Massachusetts: Addison-Wesley.
He, Y., Hui, S. C., & Sim, Y. (2006). A novel ant-based clustering approach for document clustering. In H. T. Ng (Eds.), LNCS: Vol. 4182. AIRS 2006 (pp. 537–544). Berlin: Springer.
Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.
Jain, A., & Zongker, D. (1997). Feature selection: evaluation, application, and small sample performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 153–158.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys, 31(3), 264–323.
Janson, S., & Merkle, D. (2005). A new multi-objective particle swarm optimization algorithm using clustering applied to automated docking. In M. J. Blesa (Eds.), LNCS: Vol. 3636. HM 2005 (pp. 128–141). Berlin: Springer.
Jouve, P.-E., & Nicoloyannis, N. (2005). A filter feature selection method for clustering. In M.-S. Hacid (Eds.), LNAI: Vol. 3488. ISMIS 2005 (pp. 583–593). Berlin: Springer.
Kao, Y., & Cheng, K. (2006). An ACO-based clustering algorithm. In M. Dorigo (Eds.), LNCS: Vol. 4150. ANTS 2006 (pp. 340–347). Berlin: Springer.
Kao, Y.-T., Zahara, E., & Kao, I.-W. (2007). A hybridized approach to data clustering. Expert Systems with Applications. doi:10.1016/j.eswa.2007.01.028.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of 1995 IEEE international conference on neural networks (Vol. 4, pp. 1942–1948).
Kira, K., & Rendell, L. (1992). A practical approach to feature selection. In Proceedings of the ninth international conference on machine learning, Aberdeen, Scotland (pp. 249–256).
Li, Z., & Tan, H.-Z. (2006). A combinational clustering method based on artificial immune system and support vector machine. In B. Gabrys, R. J. Howlett, & L. C. Jain (Eds.), LNAI: Vol. 4251. KES 2006, Part I (pp. 153–162). Berlin: Springer.
Liao, S.-H., & Wen, C.-H. (2007). Artificial neural networks classification and clustering of methodologies and applications – literature analysis from 1995 to 2005. Expert Systems with Applications, 32, 1–11.
Liu, Y., Chen, K., Liao, X., & Zhang, W. (2004). A genetic clustering method for intrusion detection. Pattern Recognition, 37, 927–942.
Liu, Y., Liu, Y., Wang, L., & Chen, K. (2005). A hybrid tabu search based clustering algorithm. In R. Khosla (Eds.), LNAI: Vol. 3682. KES 2005 (pp. 186–192). Berlin: Springer.
Marinakis, Y., Migdalas, A., & Pardalos, P. M. (2005a). Expanding neighborhood GRASP for the traveling salesman problem. Computational Optimization and Applications, 32, 231–257.
Marinakis, Y., Migdalas, A., & Pardalos, P. M. (2005b). A hybrid genetic-GRASP algorithm using Langrangean relaxation for the traveling salesman problem. Journal of Combinatorial Optimization, 10, 311–326.
Marinakis, Y., Marinaki, M., Doumpos, M., Matsatsinis, N., & Zopounidis, C. (2008). Optimization of nearest neighbor classifiers via metaheuristic algorithms for credit risk assessment. Journal of Global Optimization, 42, 279–293.
Maulik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33, 1455–1465.
Meng, L., Wu, Q. H., & Yong, Z. Z. (2000). A faster genetic clustering algorithm. In S. Cagnoni (Eds.), LNCS: Vol. 1803. EvoWorkshops 2000 (pp. 22–33). Berlin: Springer.
Mirkin, B. (1996). Mathematical classification and clustering. Dordrecht: Kluwer.
Nasraoui, O., Gonzalez, F., Cardona, C., Rojas, C., & Dasgupta, D. (2003). A scalable artificial immune system model for dynamic unsupervised learning. In E. Cantú-Paz (Eds.), LNCS: Vol. 2723. GECCO 2003 (pp. 219–230). Berlin: Springer.
Ng, M. K. (2000). A note on constrained k-means algorithms. Pattern Recognition, 33, 515–519.
Paterlini, S., & Krink, T. (2006). Differential evolution and particle swarm optimization in partitional clustering. Computational Statistics and Data Analysis, 50, 1220–1247.
Ray, S., & Turi, R. H. (1999). Determination of number of clusters in K-means clustering and application in colour image segmentation. In Proceedings of the 4th international conference on advances in pattern recognition and digital techniques (ICAPRDT99), Calcutta, India.
Resende, M. G. C., & Ribeiro, C. C. (2003). Greedy randomized adaptive search procedures. In F. Glover & G. A. Kochenberger (Eds.), Handbook of metaheuristics (pp. 219–249). Boston: Kluwer.
Rokach, L., & Maimon, O. (2005). Clustering methods. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook (pp. 321–352). New York: Springer.
Selim, S., & Alsultan, K. (1991). A simulated annealing algorithm for the clustering problems. Pattern Recognition, 24(10), 1003–1008.
Shelokar, P. S., Jayaraman, V. K., & Kulkarni, B. D. (2004). An ant colony approach for clustering. Analytica Chimica Acta, 509, 187–195.
Shen, H.-Y., Peng, X.-Q., Wang, J.-N., & Hu, Z.-K. (2005a). A mountain clustering based on improved PSO algorithm. In L. Wang, K. Chen, & Y. S. Ong (Eds.), LNCS: Vol. 3612. ICNC 2005 (pp. 477–481). Berlin: Springer.
Shen, J., Chang, S. I., Lee, E. S., Deng, Y., & Brown, S. J. (2005b). Determination of cluster number in clustering microarray data. Applied Mathematics and Computation, 169, 1172–1185.
Sheng, W., & Liu, X. (2006). A genetic k-medoids clustering algorithm. Journal of Heuristics, 12, 447–466.
Sherafat, V., Nunes de Castro, L., & Hruschka, E. R. (2004). TermitAnt: an ant clustering algorithm improved by ideas from termite colonies. In N. R. Pal (Eds.), LNCS: Vol. 3316. ICONIP 2004 (pp. 1088–1093). Berlin: Springer.
Sun, J., Xu, W., & Ye, B. (2006). Quantum-behaved particle swarm optimization clustering algorithm. In X. Li, O. R. Zaiane, & Z. Li (Eds.), LNAI: Vol. 4093. ADMA 2006 (pp. 340–347). Berlin: Springer.
Sung, C., & Jin, H. (2000). A tabu-search-based heuristic for clustering. Pattern Recognition, 33, 849–858.
Tarsitano, A. (2003). A computational study of several relocation methods for k-means algorithms. Pattern Recognition, 36, 2955–2966.
Tsang, C.-H., & Kwong, S. (2006). Ant colony clustering and feature extraction for anomaly intrusion detection. Studies in Computational Intelligence (SCI), 34, 101–123.
Tseng, L., & Yang, S. (2000). A genetic clustering algorithm for data with non-spherical-shape clusters. Pattern Recognition, 33, 1251–1259.
Tseng, L., & Yang, S. (2001). A genetic approach to the automatic clustering problem. Pattern Recognition, 34, 415–424.
Wu, F.-X., Zhang, W. J., & Kusalik, A. J. (2003). A genetic k-means clustering algorithm applied to gene expression data. In Y. Xiang & B. Chaib-draa (Eds.), LNAI : Vol. 2671. AI 2003 (pp. 520–526). Berlin: Springer.
Xu, R., & Wunsch, D., II (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3), 645–678.
Yang, Y., & Kamel, M. S. (2006). An aggregated clustering approach using multi-ant colonies algorithms. Pattern Recognition, 39, 1278–1289.
Yeh, J.-Y., & Fu, J. C. (2007). A hierarchical genetic algorithm for segmentation of multi-spectral human-brain MRI. Expert Systems with Applications. doi:10.1016/j.eswa.2006.12.012.
Younsi, R., & Wang, W. (2004). A new artificial immune system algorithm for clustering. In Z. R. Yang (Eds.), LNCS: Vol. 3177. IDEAL 2004 (pp. 58–64). Berlin: Springer.
Zhang, C., & Hu, H. (2005). Ant colony optimization combining with mutual information for feature selection in support vector machines. In S. Zhang & R. Jarvis (Eds.), LNAI: Vol. 3809. AI 2005 (pp. 918–921). Berlin: Springer.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Marinakis, Y., Marinaki, M., Doumpos, M. et al. A hybrid ACO-GRASP algorithm for clustering analysis. Ann Oper Res 188, 343–358 (2011). https://doi.org/10.1007/s10479-009-0519-2
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10479-009-0519-2