1 Introduction

Cluster analysis, or clustering, is one of the main tools in Data Analysis and Machine Learning: it aims to discover groups or classes in large data sets of objects described by observed variables, thus summarizing the data set with a small number of clusters. Most clustering methods are based on dissimilarities, graphs, models, or densities. In our case, we deal with dissimilarities or distances for numerical data sets. There are two main families of such methods: partitioning methods and hierarchical ones, of which K-means and agglomerative hierarchical clustering, respectively, are the most widely used in practice. Both suffer from local optimality problems: K-means converges to local minima that depend on the initialization, and agglomerative hierarchical clustering follows a greedy procedure.

Several combinatorial optimization metaheuristics have been used for cluster partitioning (Handl & Knowles [14]; Ng & Wong [21]; Sarkar, Yegnanarayana, & Khemani [23]; Trejos, Murillo, & Piza [27]). In this article, we deal with partitioning for numerical data sets, using an ant colony optimization (ACO) approach in order to overcome the local optima problem.

According to Handl and Knowles [14], published in 2006, “a few implementations of ACO have been proposed for data clustering, with the construction graph typically employed to directly represent cluster assignments (Handl & Meyer [15]; Runkler [22])”.

In 2004, we published a first paper on clustering using an ant colony optimization approach (Trejos, Murillo, & Piza [28]) for the minimization of the within sum-of-squares criterion. In that method, ants were associated with partitions that were modified during the iterations, according to a selection probability that depends on the visibility (a function of the distance between objects) and the pheromone trail (which reflects how often objects have been classified together in the partitions). The pheromone matrix measured the intensity of association between pairs of objects.

Around the same time, Shelokar, Jayaraman, and Kulkarni [24] published another clustering method based on ACO for minimizing the same criterion as in Trejos, Murillo, and Piza [28], with a pheromone trail but no local heuristic. The pheromone matrix relates objects and clusters, and it is defined by the inverse of the objective function. The matrix is used as a kind of adaptive memory containing information provided by the previously found superior solutions, and it is updated at the end of each iteration (Shelokar et al. [24]). This information is considered by the other ants to continue the clustering process. However, it is not clear how the authors selected the parameters to execute the ACO algorithm: they indicate that several simulations were performed to find them (Shelokar et al. [24]), but they do not present details about the process. They also present a comparison between their ant algorithm and other heuristic methods such as genetic algorithms, simulated annealing, and tabu search.

Later on, Kao and Cheng, in a short paper [17], improved Shelokar's algorithm by introducing a local heuristic, or visibility, based on the inverse of the distance between objects and class centers. The pheromone trail is also defined by the inverse of the criterion, and the algorithm follows almost the same steps as the Shelokar algorithm (Shelokar et al. [24]), the only difference being the introduction of visibility.

Neither Shelokar et al. [24] nor Kao and Cheng [17] give a detailed analysis on the choice of parameters for their methods.

In the present article, we use ACO with ants that construct partitions. The strategy is based on the way the traveling salesman problem (TSP) was tackled with ACO in Bonabeau, Dorigo, and Theraulaz [4], adapted here to the clustering problem. It is a constructive method in which each ant builds a partition. This part of the process is similar to the ideas presented in Kao and Cheng [17] and Shelokar et al. [24], discussed above; but this paper has three different aims: first, to carry out a parameter-fitting analysis that studies the behavior of the algorithm on the clustering problem as its parameters vary; second, to introduce a local search procedure based on the K-means algorithm, in order to improve the performance of the basic ACO (BACO) algorithm; and finally, to compare the performance of the K-means algorithm (KM), the BACO algorithm, and the BACOK algorithm (BACO improved with the local search procedure).

The article is organized as follows. Section 2 contains the main clustering concepts used in the article and introduces the necessary notation. In Sect. 3 the artificial ant concept is explained and the classical ACO algorithm is presented. In Sect. 4 we introduce the proposed ACO algorithm. Section 5 describes the parameter analysis, Sect. 6 presents the additional data sets and the results, and Sect. 7 gives some concluding remarks.

2 Clustering

Cluster analysis, or clustering, deals with finding homogeneous groups of objects, such that similar objects belong to the same class and objects in different classes can be distinguished. Cluster analysis can be stated as an optimization problem in which a function combining within-cluster similarity and between-cluster dissimilarity has to be optimized (Jafar & Sivakumar [16]; Xavier & Xavier [30]). In the numerical case, there is a set of objects \(\Omega = \{\mathbf {x}_1,\mathbf {x}_2,\ldots ,\mathbf {x}_n\}\) such that \(\mathbf {x}_i \in \mathbb {R}^p\) for all i, that is, the objects are described by p numerical or quantitative variables. The most widely used criterion (Everitt, Landau, Leese, & Stahl [9]) is the minimization of the within sum-of-squares, also known as within inertia or variance:

$$\begin{aligned} W = \frac{1}{n} \sum _{k=1}^K \sum _{\mathbf {x}_i\in C_k} \Vert \mathbf {x}_i - \mathbf {g}_k \Vert ^2, \end{aligned}$$
(1)

where K is the number of classes or clusters (fixed a priori), \(P = (C_1,C_2,\ldots ,C_K)\) is a partition of \(\Omega \), and \(\mathbf {g}_k\) is the barycenter or mean vector of \(C_k\). Minimizing W(P) is equivalent to maximizing the between sum-of-squares (between inertia or variance):

$$\begin{aligned} B = \sum _{k=1}^K \frac{|C_k|}{n} \Vert \mathbf {g}_k - \mathbf {g} \Vert ^2, \end{aligned}$$

where \(\mathbf {g}\) is the overall barycenter and \(|C_k|\) is the cardinality of class \(C_k\), since the sum \(I = W(P)+B(P)\) is a constant (the total inertia) (Everitt, Landau, Leese, & Stahl [9]).
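
As a concrete check of this decomposition (a minimal Python sketch with our own variable names, not part of the original method), the following code computes W and B for an arbitrary partition and verifies numerically that \(W + B\) equals the total inertia I:

```python
import numpy as np

def inertia_decomposition(X, labels):
    """Within inertia W and between inertia B of the partition given by labels."""
    n = X.shape[0]
    g = X.mean(axis=0)                       # overall barycenter
    W = B = 0.0
    for k in np.unique(labels):
        Ck = X[labels == k]
        gk = Ck.mean(axis=0)                 # barycenter of class C_k
        W += np.sum(np.linalg.norm(Ck - gk, axis=1) ** 2) / n
        B += (len(Ck) / n) * np.linalg.norm(gk - g) ** 2
    return W, B

# Quick check on random data: W + B equals the total inertia I.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
labels = rng.integers(0, 3, size=30)
W, B = inertia_decomposition(X, labels)
I_total = np.sum(np.linalg.norm(X - X.mean(axis=0), axis=1) ** 2) / len(X)
assert abs((W + B) - I_total) < 1e-9
```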

The function W(P) is not convex, so it may have several local minima (Ng & Wong [21]; Sarkar et al. [23]). As a consequence, traditional clustering algorithms based on local search, such as K-means, mostly find local minima (Trejos et al. [27]). Furthermore, global optimization algorithms (such as linear programming, interval methods, and branch and bound methods) are highly sensitive to relatively high-dimensional data tables, for which their probability of finding the optimal partition is very low; in those cases, they report solutions that differ significantly from the optimal clustering (Bagirov [2]). These features make it worthwhile to look for alternative optimization strategies, and combinatorial optimization heuristics are a viable option.

In recent years heuristic algorithms have been used to solve complex optimization problems, since their random nature helps them avoid convergence to local minima (Babu & Murty [1]; Klein & Dubes [19]; Trejos et al. [27]). Examples of optimization heuristics used in clustering include simulated annealing, tabu search, genetic algorithms, particle swarm optimization, and ant colony optimization.

In the particular case of ant colony optimization there are several contributions, such as the ones already mentioned (Kao & Cheng [17]; Shelokar et al. [24]; Trejos et al. [28]) and some more recent ones (Handl & Meyer [15]; Handl & Knowles [14]; Runkler [22]; Zhe et al. [31]).

3 Artificial Ant Colonies

The optimization approach based on ant colonies (ACO) belongs to the larger family of swarm intelligence methods. It was proposed by Marco Dorigo in 1992 to solve discrete optimization problems (Dorigo, Di Caro, & Gambardella [6]; Jafar & Sivakumar [16]), and since then it has been applied to several combinatorial optimization problems. Like every metaheuristic, this method depends on parameters that control several decisions taken during the process. Several papers analyze the parameters of the ACO algorithm. Gaertner and Clark [13] present an empirical analysis of the sensitivity of the ACO algorithm to variations of some parameters on different instances of the TSP (traveling salesman problem). Similarly, Wei [29] reports an experiment with parameter combinations aimed at improving the speed of convergence of the ACO algorithm on the TSP; this author also indicates that, at present, research on the parameter settings and properties of the basic ant colony algorithm is mostly still in the experimental stage (Wei [29]). Meanwhile, Stützle et al. [25] provide an extensive review of available research results on parameter adaptation in ACO algorithms. They mention that ACO algorithms involve a number of parameters that need to be set appropriately, in particular \(\alpha \) and \(\beta \) (which weigh the relative influence of the pheromone trail and the heuristic information, respectively) and \(\rho \) (the evaporation rate, \(0\le \rho \le 1\)). A parameter selection in the TSP context is developed in Dorigo, Maniezzo, and Colorni [8], in three different experiments. They tested the ranges \(\alpha \in \{0,0.5,1,2,5\}\), \(\beta \in \{0,1,2,5\}\), \(\rho \in \{0.3, 0.5, 0.7, 0.99, 0.999\}\), and \(Q\in \{1,100,10000\}\). The values \(\alpha =1\) and \(\beta =5\) were selected as the best for these parameters; \(\rho \) was fixed, depending on the experiment, at 0.99, 0.999, or 0.5; and the influence of Q was found to be negligible.

In nature, the optimization performed by ants while looking for food consists basically of minimizing the distance between the nest and the food. For this reason, the first application of ACO was to the TSP (Bonabeau et al. [4]). In that problem an agent must visit n fully interconnected cities, visiting each city exactly once and then returning to the departure city, while minimizing the total distance traveled.

In this paper, the TSP idea is used to tackle the clustering optimization problem. Thus, it is necessary to introduce artificial ants, that is, agents in charge of finding a feasible solution in the search space. During this process an ant drops artificial pheromones so that other ants can rebuild the same solution. Pheromones should be volatile (they disappear over time on trails that have not been reinforced) and should accumulate on the shortest trails as the number of iterations increases (Dorigo et al. [6]).

The pheromone update formula applied in the TSP is given by \( \tau _{uv}=(1-\rho ) \tau _{uv}+\rho \Delta \tau _{uv} \) (Barcos, Rodríguez, Álvarez, & Robusté [3]; Dorigo, Birattari, & Stützle [5]; Dorigo & Gambardella [7]), where \( \tau _{uv} \) is the pheromone present on the trail from u to v, \( \rho \) is the evaporation rate, and

$$\begin{aligned} \Delta \tau _{uv}=\sum _{m=1}^{M}\Delta \tau _{uv}^{m}, \end{aligned}$$

where M is the number of ants, and \( \Delta \tau _{uv}^{m} \) is the pheromone dropped by the m-th ant on the trail (u,v), normally given by

$$ \Delta \tau _{uv}^{m}=\begin{cases} Q/d_{m} & \text{if ant } m \text{ walks across } (u,v), \\ 0 & \text{otherwise;} \end{cases} $$

where Q is a parameter to be fitted and \( d_{m} \) represents the total distance walked by ant m.

An alternative way to deal with pheromones is to make local updates: every time an ant goes from node u to node v, a local pheromone update is applied on the trail (u,v) (Dorigo & Gambardella [7]). A possible local update formula is \( \tau _{uv}=\tau _{uv}+\dfrac{Q}{d_{uv}},\) where Q is a parameter to be fitted and \( d_{uv} \) is the distance between u and v. When all ants finish their trips, the pheromone is updated by applying the evaporation rate.
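
For illustration only (this is our sketch, not the implementation of any of the cited papers), the TSP pheromone bookkeeping described above can be written as follows; `tau` is the \(n\times n\) pheromone matrix, `tours` is a list of closed city sequences, and `lengths` contains their total distances:

```python
import numpy as np

def update_pheromone(tau, tours, lengths, rho=0.5, Q=100.0):
    """Global update tau_uv = (1 - rho) * tau_uv + rho * sum_m dtau_uv^m,
    where ant m deposits Q / d_m on every edge (u, v) of its closed tour."""
    delta = np.zeros_like(tau)
    for tour, d_m in zip(tours, lengths):
        for u, v in zip(tour, tour[1:] + tour[:1]):   # edges of the closed tour
            delta[u, v] += Q / d_m
            delta[v, u] += Q / d_m                    # symmetric distances
    return (1.0 - rho) * tau + rho * delta
```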

On the other hand, each ant has to decide to which node it goes from the current node. Three factors are fundamental in that choice: visibility, the pheromone trail, and a probabilistic factor. Thus, if \( T_{m} \) represents the route built by ant m while it is on node u, then the probability of going to node v is given by

$$ p_{uv}^{m}=\begin{cases} \dfrac{[\tau _{uv}]^{\alpha } \cdot [\eta _{uv}]^{\beta }}{\sum \limits _{s\notin T_{m}} [\tau _{us}]^{\alpha } \cdot [\eta _{us}]^{\beta }} & \text{if } v \notin T_{m}, \\ 0 & \text{if } v\in T_{m}; \end{cases} $$

where \( \eta _{uv} \) is the visibility, defined by \( \eta _{uv}=1/d_{uv} \), with \( d_{uv} \) the distance from node u to node v; \( \tau _{uv} \) is the pheromone on the trail (u,v); and \( \alpha \) and \( \beta \) are parameters to be fitted (Barcos et al. [3]; Dorigo et al. [6]; Kennedy & Eberhart [18]).
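
A matching sketch of the transition rule (again only illustrative, with hypothetical names) selects the next city by a roulette over the unvisited cities, weighting pheromone and visibility by \(\alpha \) and \(\beta \):

```python
import numpy as np

def choose_next_city(u, visited, tau, dist, alpha=1.0, beta=5.0, rng=None):
    """Probabilistic choice of the next city from u among unvisited cities,
    with weights tau_uv^alpha * (1/d_uv)^beta."""
    if rng is None:
        rng = np.random.default_rng()
    candidates = [v for v in range(len(dist)) if v not in visited]
    weights = np.array([tau[u, v] ** alpha * (1.0 / dist[u, v]) ** beta
                        for v in candidates])
    probs = weights / weights.sum()
    return rng.choice(candidates, p=probs)
```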

To stop the algorithm, Bonabeau et al. [4] proposed using a maximum number of iterations. The disadvantage of this procedure is that it could stop the algorithm while it is still improving the solutions. Dorigo et al. [8] also considered detecting a stagnation behavior, in which all ants travel the same path. A stagnation process is present if a percentage of the ants report the same path distance; in that case it is almost certain that those ants are traveling the same path or, at least, paths with the same cost value.

In Algorithm 1, the classical ACO algorithm is shown.

[Algorithm 1: The classical ACO algorithm]

4 Description of the Proposed ACO Algorithm

The method starts by defining a list of M artificial ants \( \mathbf {h}_{1},\mathbf {h}_{2},\ldots ,\mathbf {h}_{M}\) that will build a clustering of the data into K classes (or clusters). At the beginning, the best ant in the colony, denoted by \( \mathbf {h}^{*} \), can be set equal to \( \mathbf {h}_{m} \) for some \( m=1,2,\ldots ,M \), because at that moment there is no basis for comparison among the ants; thus the assignment may be arbitrary.

For ant \(\mathbf {h}_{m}\), with \( m=1,2,\ldots ,M \), K random points in the space of individuals (a hyperrectangle that contains all individuals) are considered, denoted by \( \mathbf {g}_{1}^{m},\mathbf {g}_{2}^{m},\ldots ,\mathbf {g}_{K}^{m} \). These points are interpreted as the initial centroids. \( C_{k}^{m} \) denotes the class k, with centroid \( \mathbf {g}_{k}^{m} \), built by ant m. Ant \( \mathbf {h}_{m} \) also has a tabu list \( L_{m} \), a short-term memory containing the objects already classified by \(\mathbf {h}_{m}\). In each iteration, in order to complete its tour, ant m has to classify the objects not yet in \(L_{m}\); when the iteration is done, all objects should be in \(L_{m}\), which guarantees that the clustering process is complete.
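
As a sketch of this initialization (function and variable names are ours), each ant receives K centroids drawn uniformly from the bounding hyperrectangle of the data, together with an empty tabu list:

```python
import numpy as np

def init_colony(X, K, M, seed=0):
    """M ants: K random centroids inside the bounding hyperrectangle of X,
    plus an empty tabu list per ant."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    centroids = rng.uniform(lo, hi, size=(M, K, X.shape[1]))
    tabu = [set() for _ in range(M)]
    return centroids, tabu
```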

During the clustering process, each ant randomly chooses an object that is not in its tabu list. Then, the ant should randomly select a class in which to classify the object. If ant m selects object i, then the process to choose the class uses a probabilistic roulette (see Talbi [26]). The probability that \(\mathbf {h}_{m} \) assigns object i to class \( C_{k}^{m} \) is denoted by \( p_{ik}^{m} \). To calculate this probability it is necessary to consider the following factors:

  • Visibility: This factor, denoted by \( \eta _{ik}^{m} \), is the visibility with which \(\mathbf {h}_{m}\), located on object \( \mathbf {x}_{i} \), “sees” class \( C_{k}^{m} \). The visibility is defined as the reciprocal of the squared distance from object \( \mathbf {x}_{i} \) to \(\mathbf {g}_{k}^{m} \), the centroid of class \(C_{k}^{m}\). Thus, \( \eta _{ik}^{m}:=\tfrac{1}{d_{ik}^{m}} \text {, \ where }d_{ik}^{m}=d^{2}(\mathbf {x}_{i},\mathbf {g}^{m}_{k})=\left\| \mathbf {x}_{i}-\mathbf {g}_{k}^{m} \right\| ^{2}.\) If the visibility which \(\mathbf {h}_{m}\) has of class \(C_{k}^{m}\) is large, then the probability of classifying \(\mathbf {x}_{i}\) in class k is also large.

  • The pheromone trail: The pheromone trail perceived by \(\mathbf {h}_{m}\) on the arc from \(\mathbf {x}_{i}\) to \( \mathbf {g}_{k}^{m} \) is denoted by \( \tau _{ik} \). It quantifies the pheromones dropped by all ants that have classified object \(\mathbf {x}_{i}\) in their respective class k. If \( \tau _{ik} \) is large, then the probability of assigning object \(\mathbf {x}_{i}\) to class k increases.

Equation (2) shows the formula used to calculate \( p_{ik}^{m}\), considering visibility and the pheromone trail, inspired by the corresponding formula used by the agent in the TSP:

$$\begin{aligned} p_{ik}^{m}:=\dfrac{[\tau _{ik}]^{\alpha }\cdot [\eta _{ik}^{m}]^{\beta }}{\sum \limits _{r=1}^{K} [\tau _{ir}]^{\alpha }\cdot [\eta _{ir}^{m}]^{\beta } }, \end{aligned}$$
(2)

where \( \alpha \) and \( \beta \) are parameters to be fitted.
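
The following sketch shows how one ant could draw the class of an object with Eq. (2); names are ours, and the small constant guarding against a zero distance is an assumption not discussed in the text:

```python
import numpy as np

def choose_class(x_i, centroids, tau_i, alpha=0.25, beta=2.5, rng=None):
    """Roulette-wheel choice of a class for object x_i following Eq. (2):
    visibility eta_ik = 1 / ||x_i - g_k||^2, pheromone tau_i[k] = tau_ik."""
    if rng is None:
        rng = np.random.default_rng()
    d2 = np.sum((centroids - x_i) ** 2, axis=1)   # squared distances d_ik
    eta = 1.0 / np.maximum(d2, 1e-12)             # guard against d_ik = 0
    weights = (tau_i ** alpha) * (eta ** beta)
    probs = weights / weights.sum()
    return rng.choice(len(centroids), p=probs)
```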

On the other hand, when \( \mathbf {h}_{m} \) chooses class \( C_{k}^{m} \) for object \( \mathbf {x}_{i} \), the ant registers index i in its tabu list \(L_{m}\). Furthermore, \( \mathbf {h}_{m} \) must carry out the following processes related to the assignment (a short sketch combining both updates follows the list).

  • Local pheromone update: Ant \( \mathbf {h}_{m} \) should drop a pheromone trail between object \( \mathbf {x}_{i} \) and class \( C_{k}^{m} \). To do this, an auxiliary pheromone matrix is defined, denoted by \( \Gamma _{aux} \), with size \( n\times K \), such that entry ik of \( \Gamma _{aux} \) contains the pheromone between \(\mathbf {x}_{i}\) and class k. This matrix has the format presented in Table 1.

    Table 1 Auxiliary pheromone matrix \(\Gamma _{aux}\)

    Ant \( \mathbf {h}_{m} \) will drop \( \Delta \tau _{ik}^{m} \) pheromones. This quantity is defined by \(\Delta \tau _{ik}^{m}:=\dfrac{Q}{d^{m}_{ik}},\) where Q is a parameter to be fitted. Finally, the local pheromone update is done by adding \( \Delta \tau _{ik}^{m} \) to the current entry ik of \( \Gamma _{aux}\).

  • Centroid update: The final step in this process is to update the centroid \( \mathbf {g}_{k}^{m} \) of class \(C_k^m\). One possibility is to use its definition \( \mathbf {g}_{k}^{m}:=\tfrac{1}{{\left| C_{k}^{m} \right| }}\displaystyle \sum _{\mathbf {x}\in C_{k}^{m}}\mathbf {x}\). This option is not advisable because it involves many unnecessary calculations. In fact, it is possible to update \( \mathbf {g}_{k}^{m} \) recursively from its value in the previous iteration when object \(\mathbf {x}_{i}\) is transferred to class \(C_k^m\). In Trejos et al. [27] the following formula is proven and used to update the centroids more efficiently: \(\mathbf {g}_{k}^{m}:=\tfrac{1}{{\left| C_{k}^{m} \right| }}\left[ \left( {\left| C_{k}^{m} \right| }-1 \right) \mathbf {g}_{k}^{m}+\mathbf {x}_{i} \right] \).
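
A compact sketch of the two updates above (our naming; the zero-distance guard is an assumption) could be:

```python
import numpy as np

def assign_and_update(i, k, x_i, gamma_aux, centroids, counts, Q=250.0):
    """After an ant puts object x_i into class k: deposit Q / d_ik pheromones
    in the auxiliary matrix and update the centroid g_k incrementally."""
    d2 = np.sum((x_i - centroids[k]) ** 2)        # d_ik = ||x_i - g_k||^2
    gamma_aux[i, k] += Q / max(d2, 1e-12)         # local pheromone update
    counts[k] += 1                                # |C_k| after the assignment
    centroids[k] = ((counts[k] - 1) * centroids[k] + x_i) / counts[k]
```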

After clustering one object, each ant randomly selects a new object that is not in its tabu list and repeats the process described above. This is done n times, so that every ant classifies all the objects.

When the process ends, each ant has a complete clustering of objects with the respective barycenters. Also, matrix \( \Gamma _{aux} \) contains pheromones that were dropped by ants. Entry ik of \( \Gamma _{aux} \) contains pheromone \( \Delta \tau _{ik} \), which has been dropped by all ants that classified object i in its respective class k. This quantity is represented by

$$\begin{aligned} \Delta \tau _{ik}= \displaystyle \sum _{m=1}^{M}\Delta \tau _{ik}^{m} . \end{aligned}$$

The next step is to compute the within inertia of each ant, using the classification it built and the corresponding barycenters. If one of the ants has a within inertia smaller than \( W(\mathbf {h}^{*}) \) (the best inertia in memory so far), then \(\mathbf {h}^{*}\) (the best ant in memory) is updated.

Global pheromones are stored in a matrix \( \Gamma \) with the same structure as \( \Gamma _{aux}\). At the beginning, this matrix is initialized with values close to zero (indicating pheromone absence). When the travels of all ants finish, \( \Gamma \) is updated in entry ik by \( \Gamma _{ik}:=(1-\rho ) \Gamma _{ik}+\rho \Delta \tau _{ik}, \) where \( \rho \) is the pheromone evaporation rate.

When the pheromone updating process is done, matrix \( \Gamma _{aux} \) is initialized, to be used in the next iteration. Also, tabu lists (one per ant) are initialized, to start a new classification process.

As the final step of the current iteration, an intensification process is carried out by the best ant (the ant with the lowest within inertia, denoted by \( \mathbf {h}^{*} \)): \( \mathbf {h}^{*} \) repeats its path, dropping extra pheromones on the arcs it visited. The intensification follows the rule:

$$\begin{aligned} \Gamma _{ik}:=\begin{cases} \Gamma _{ik}+\tfrac{Q}{W(\mathbf {h}^{*})} & \text{if object } i \text{ is in class } k \text{ of } \mathbf {h}^{*}, \\ \Gamma _{ik} & \text{otherwise;} \end{cases} \end{aligned}$$

where \( W(\mathbf {h}^{*}) \) denotes the within inertia of the classification done by \( \mathbf {h}^{*} \). This ends the current iteration and a new clustering process is started, considering the following information: the global pheromone matrix \( \Gamma \), the barycenters of ants, which will be used as the initial centroids for the new classes, and the best ant \( \mathbf {h}^{*} \).
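
Putting the evaporation step and the intensification together, a minimal sketch (our naming) of this end-of-iteration update is:

```python
import numpy as np

def global_pheromone_update(gamma, gamma_aux, best_labels, W_best, rho=0.5, Q=250.0):
    """Evaporation plus the accumulated deposits, followed by the
    intensification carried out by the best ant h*."""
    gamma = (1.0 - rho) * gamma + rho * gamma_aux   # Gamma_ik := (1-rho)*Gamma_ik + rho*dtau_ik
    for i, k in enumerate(best_labels):             # elitist reinforcement along h*'s path
        gamma[i, k] += Q / W_best
    return gamma
```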

Algorithm 2 presents a detailed pseudocode of BACOK. The K-means algorithm is applied to each ant (see line 19 in Algorithm 2), after all ants have built their respective classifications, and it iterates until the absolute difference between the current and the previous inertia is less than 0.0001. Algorithm 3 shows how the local search strategy based on K-means works. If lines 19 to 22 are removed from Algorithm 2, the pseudocode of the BACO algorithm is obtained. Finally, Algorithm 2 stops when a fixed number of iterations (10) passes without improvement (see line 4); the corresponding counter is incremented in line 5 and is reset to zero every time a solution better than the best one in memory is found. This stopping criterion is based on the stagnation behavior concept presented in Dorigo et al. [8].

[Algorithm 2: Pseudocode of the BACOK algorithm]
[Algorithm 3: Local search based on K-means]
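
The K-means refinement applied to each ant can be sketched as follows (an illustration in the spirit of Algorithm 3, not the authors' code; the tolerance of 0.0001 follows the text):

```python
import numpy as np

def kmeans_refine(X, centroids, tol=1e-4, max_iter=100):
    """Local search on one ant's solution: alternate assignment and centroid
    steps until the within inertia changes by less than tol."""
    prev_W = np.inf
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        W = d2[np.arange(len(X)), labels].sum() / len(X)   # within inertia, Eq. (1)
        for k in range(len(centroids)):
            if np.any(labels == k):                        # keep empty classes unchanged
                centroids[k] = X[labels == k].mean(axis=0)
        if abs(prev_W - W) < tol:
            break
        prev_W = W
    return labels, centroids, W
```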

5 Parameter Analysis

To develop the parameter analysis, three data tables (T105, T525, and T2100) were built with randomly generated normal variables. The data sets T105 (\(n=105\), \(p=6\)) and T525 (\(n=525\), \(p=6\)) both have seven clusters (\(K=7\)), such that six classes have variance \(\sigma ^2=1\) and the seventh class has \(\sigma ^2=3\). The data set T105 has a “big” class with 51 objects, and each of the remaining six groups has 9 objects; T525 has a class with 265 objects, and the remaining objects are evenly distributed among the other groups. The W(P) reference values for T105 and T525 were calculated using Eq. (1), giving 7.62467183 and 7.45610263, respectively. Table T2100 has 2100 objects and seven clusters of the same cardinality, all with different variances. The W(P) reference value for this set is 22.56959210.
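
To convey the flavor of these tables, a generator along the following lines could be used (illustrative only: the cluster centers, and which class carries the larger variance and the larger size, are not specified in detail, so they are assumptions here):

```python
import numpy as np

def make_table(sizes, sigmas, p=6, spread=10.0, seed=0):
    """Gaussian clusters of given sizes and standard deviations around
    random centers; returns the data matrix and the true labels."""
    rng = np.random.default_rng(seed)
    blocks = [rng.normal(loc=rng.uniform(-spread, spread, size=p),
                         scale=s, size=(n_k, p))
              for n_k, s in zip(sizes, sigmas)]
    labels = np.repeat(np.arange(len(sizes)), sizes)
    return np.vstack(blocks), labels

# A T105-like table: one class of 51 objects and six classes of 9 objects;
# six classes with sigma^2 = 1 and one with sigma^2 = 3 (pairing assumed).
X, y = make_table([51, 9, 9, 9, 9, 9, 9], [1.0] * 6 + [3 ** 0.5])
```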

Algorithm 2 has four parameters that must be fitted in order to achieve good performance. Parameters \( \alpha \) and \( \beta \) control the relative weights assigned to the pheromone concentration and the ant visibility, respectively; \( \rho \) is the pheromone evaporation rate used to update the pheromone matrix; and Q is a pheromone amplification constant.

For the parameter analysis, tables T105 and T525 were used; for each table and for each parameter combination, 200 multistart runs were performed. Based on the ranges presented in Dorigo et al. [8], a finer analysis was developed here, using \(\rho \in \{0.1,0.2,\ldots ,0.9\}\), \(\alpha , \beta \in \{0,0.5,1,1.5,\ldots , 6\}\), and \(Q\in \{50,100,150,\ldots ,500\}\). In total, \(9\times 13\times 13 \times 10=15210\) combinations were run for each table. This analysis used \(M=10\) ants.

The pictures in Fig. 1 show some examples of the 90 contour maps built from the performance percentages (each percentage represents how many times, out of the 200 runs, the algorithm attains the W(P) reference value) obtained with table T105 for the different parameter combinations. For example, Fig. 1a shows the contour map for \(\rho =0.1\), \(Q=50\), and \(\alpha ,\beta \in \{0,0.5,1,1.5,\ldots , 6\}\). This analysis showed that \(\rho =0.5\) was the best option, because the best-performance zone for \(\rho =0.5\) (the darker red zone in Fig. 1b) has a larger area than those of the remaining \(\rho \) values.

Fig. 1 Some examples of contour maps created with the performance percentages, for \(Q=50\), \(\rho =0.1,0.5,0.9\), and varying values of \(\alpha \) and \(\beta \). Analysis done with table T105

On the other hand, very similar contour maps were obtained when \(\rho \) was fixed and Q varied from 50 to 500 (10 contour maps per \(\rho \) value). This is evidence that Q was not an important parameter in this experiment, and it coincides with the observation in Dorigo et al. [8] that Q has a negligible influence on the algorithm. Therefore, Q was fixed at 250 (the middle of the range), although it could also have been fixed at 100, as they did.

Next, an analysis of \(\alpha \) and \(\beta \) was developed with tables T105 and T525, using \(\rho =0.5\), \(Q=250\), and \(\alpha ,\beta \in \{0,0.25,0.5,0.75,\ldots ,6\}\). Figure 2 shows the contour maps obtained in this process. This analysis was not enough to determine optimal values for \(\alpha \) and \(\beta \): Fig. 2a and b only suggest that the best performance is probably obtained when \(1.5\le \beta \le 5\) and \(0<\alpha \le 2.5\). For this reason, an extra analysis was developed with table T2100. Figure 3 shows that any combination of \(\alpha \) and \(\beta \) in the dark red region could be taken; for this experiment the combination \(\beta =2.5\) and \(\alpha =0.25\) was selected. Summarizing, the parameters were chosen as \(\alpha =0.25\), \(\beta =2.5\), \(\rho =0.5\), and \(Q=250\).

Fig. 2 Contour maps created with the performance percentages, with the fixed values \(\rho =0.5\) and \(Q=250\)

Fig. 3 Contour maps created with the performance percentages, with the fixed values \(\rho =0.5\) and \(Q=250\), in table T2100

6 Extra Data Sets, Results, and Discussion

A personal computer with 8 GB of RAM and an Intel Core i7-4712MQ CPU @ 2.30 GHz processor was used in this experiment. In order to compare the algorithms BACO, BACOK, and KM, five real-life data sets were downloaded from the UCI repository of machine learning databases (UCI [20]): iris (\(n=150\), \(K=3\), \(p=4\)), wine (\(n=178\), \(K=3\), \(p=13\)), glass identification (\(n=214\), \(K=6\), \(p=9\)), red wine quality (\(n=1599\), \(K=3\), \(p=11\)), and white wine quality (\(n=4898\), \(K=3\), \(p=11\)). In the glass data set the first attribute was not considered as a variable, because it is an identification number (hence \(p=9\)); furthermore, K was fixed at 6 because glass type number 4 is not present in this data set (there are 7 types of glass in total). In both wine quality tables, attribute number 12 was not considered because it is an output variable. Additionally, two groups (A and S) of bidimensional synthetic data sets were considered (downloaded from Fränti & Sieranoja [12]), which are described in Fränti and Sieranoja [11]. Group A (3 sets) varies the number of clusters, and group S (4 sets) varies the overlap among the clusters. All cases use \(p=2\). Table 2 summarizes the main features of these sets and Fig. 4 shows a bidimensional representation of each set.

The ground truth centroids for these data sets are also available from Fränti and Sieranoja [12]; hence, it was possible to check whether the proposed BACOK algorithm was generating a reasonable clustering of the data. The centroid index (CI) presented in Fränti, Rezaei, and Zhao [10] is a cluster-level similarity measure, based on the cluster centroids, which can be used to compare one clustering against another solution or against the ground truth, if it is available. The BACOK algorithm was executed 100 times on sets A1, A2, A3, S1, S2, S3, and S4, and the best solution found in each case was compared with the ground truth solution using the CI value. In all cases the CI value was equal to zero; therefore, according to Fränti and Sieranoja [11], our algorithm is clustering those data sets properly. This experiment was run with 20 ants (\(M=20\)) and the parameters \(\alpha =0.25\), \(\beta =2.5\), \(\rho =0.5\), and \(Q=250\). Finally, using for each set the centroids of the best solution and the definition of W(P) (see Eq. 1), the best within inertia for each set (\(W_{best}\)) was calculated (see column 4 of Table 2).
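
As we read it from Fränti, Rezaei, and Zhao [10], the CI can be sketched as follows (our implementation, not theirs): each centroid of one solution is mapped to its nearest centroid in the other solution, the centroids that receive no mapping are counted as orphans, and the symmetric index takes the maximum over the two directions.

```python
import numpy as np

def centroid_index(C1, C2):
    """Sketch of the centroid index (CI): count 'orphan' centroids, i.e.
    centroids of one solution that are nearest to no centroid of the other."""
    def orphans(A, B):
        nearest = {int(np.argmin(np.sum((B - a) ** 2, axis=1))) for a in A}
        return len(B) - len(nearest)
    return max(orphans(C1, C2), orphans(C2, C1))

# CI == 0 indicates a one-to-one correspondence between the two centroid sets.
```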

Table 2 Main features of the sets in groups A and S
Fig. 4 Two-dimensional representation of the data sets in groups A and S

Table 3 Performance comparison among BACO, BACOK, and KM

Table 3 summarizes the results obtained with the three algorithms. The performance of each algorithm is reported as a percentage, corresponding to the number of times the algorithm attained the \(W_{best}\) value in 100 multistart runs. The BACO algorithm also used \(M=20\), \(\alpha =0.25\), \(\beta =2.5\), \(\rho =0.5\), and \(Q=250\), while the KM algorithm iterates until the difference between two consecutive within inertias is less than 0.001. The symbol “-” in Table 3 means that the algorithm did not attain the \(W_{best}\) reference value in any of the 100 runs. Table 3 also reports the standard deviation of the inertia, the average running time, and the standard deviation of the time over those 100 executions.

Table 3 shows that the BACOK algorithm performed very well on the available data sets. This final comparison is valuable because it reinforces one of the principal contributions of this paper: the BACO and KM algorithms did not produce good results on most of the data sets, whereas our algorithm uses the potential of K-means to improve BACO, obtaining significantly better results. The hybridization in BACOK shows that, even though the K-means algorithm by itself may not work well, it can be used to improve other heuristic algorithms. Finally, the lowest performance of BACOK was on set S4, which has the highest level of overlap (see Fig. 4).

7 Conclusions

We have presented a hybrid clustering method based on the ant colony optimization metaheuristic and the K-means algorithm. The method is based on some features developed for ACO in the traveling salesman problem and it is improved by the K-means algorithm in each iteration. The adaptation to the clustering problem takes into account the representation of clusters by barycenters, and therefore the distance between objects and barycenters is used for defining visibility and the pheromone trail.

After an extensive parameter fitting, an experiment was carried out to evaluate the method. It performed very well, attaining the reference value of the inertia on each data table in reasonable time. Furthermore, the method showed very good results when applied to other benchmark data sets for which the ground truth was available.

Finally, the experiment revealed that parameter Q does not play a relevant role in the ACO algorithm, whereas the algorithm is very sensitive to the values assigned to \(\alpha \), \(\beta \), and \(\rho \). The parameter-fitting process was necessary to improve the algorithm's performance and yielded the combination \(\alpha =0.25\), \(\beta =2.5\), and \(\rho =0.5\).