1 Introduction

Clustering, an important task in data mining, has been approached by many disciplines because of its wide range of applications in fields such as biology, information retrieval, business, medicine, social science, and earth science. In a clustering problem, the objects of a data set are partitioned into an appropriate number of clusters based on some similarity function, so that objects sharing the same cluster are more similar to each other than to objects in distinct clusters [31]. Clustering quality is often measured by an internal validity criterion, which can be based on different features of the clusters such as compactness, isolation, and connectedness. Conventionally, clustering algorithms are broadly classified as partitional, hierarchical, and density-based algorithms [15]. Hierarchical methods organize the objects as a hierarchical tree structure in which each level represents a partition of the data set. These methods do not require initialization of solutions or prior knowledge of the number of clusters. However, setting a stopping criterion for them is difficult, and objects assigned to a cluster cannot move to another cluster at a later stage [29]. Density-based clustering methods [12] build on the idea that each object in a cluster must have at least a threshold number of objects in its neighborhood (within a defined radius). Though these methods are usually well suited to identifying outliers, they find it difficult to partition data sets with overlapping clusters.

Partitional clustering methods, on the other hand, directly decompose the data set into a number of clusters. Partitional clustering can be fuzzy or hard [6]. Hard partitional clustering creates non-overlapping clusters by assigning each object to exactly one cluster. In fuzzy partitional clustering [2], each object belongs to every cluster with some membership weight, which makes it better suited to data sets with overlapping clusters. Many traditional clustering algorithms, e.g., K-means [18] and K-medoids [26], have been proposed to solve the hard partitional clustering problem. However, these algorithms suffer from several drawbacks: they get stuck in local optima, and the quality of the partitions depends on the initial solution [23]. Therefore, a large number of nature-inspired, population-based global search optimization methods, e.g., evolutionary methods and swarm intelligence methods, have been proposed to overcome these deficiencies and enhance the quality of clustering solutions [11, 17].

However, most of the proposals available in the literature partition data sets based on a single objective function. A solution optimized for a single objective function is biased towards that particular criterion. As clustering problems usually involve multiple conflicting objectives, various researchers [14, 15, 28] suggest that the quality of a clustering solution should be evaluated by multiobjective optimization methods, which reduce the dominance of any particular objective on the clustering results and offer a more natural way to partition data sets. Such methods obtain a set of non-dominated solutions known as the pareto-front, where each solution is a trade-off among the conflicting objectives and no solution in the front is inferior to any other. Finally, the decision maker picks a promising solution from this set based on his/her requirements.

Multiobjective optimization methods simultaneously optimize a number of conflicting objectives. Over the years, many classical (non nature-inspired) multiobjective methods, e.g., weighted sum methods [24] and Benson's method [10], have been proposed to solve real-world problems involving multiple conflicting objectives. Though these methods are easy to implement, they suffer from inherent drawbacks: the optimal solutions depend on the chosen initial solution, they tend to get stuck in suboptimal solutions, they are inefficient on problems with discrete search spaces, and they cannot be used efficiently on parallel machines [7]. To overcome these drawbacks, many multiobjective evolutionary algorithms [15, 32] and multiobjective swarm intelligence methods [1, 4] have been proposed to obtain trade-off solutions on the pareto-front. In [4], the authors propose multiobjective particle swarm optimization (MOPSO) to solve multiobjective problems using PSO [21] and apply it to several benchmark test functions, showing that it is highly competitive with NSGA II, microGA, and PAES. However, we observe that MOPSO has no information sharing mechanism among the particles as a group; each particle is influenced only by its own experience pbest, known as the cognitive component, and the leader of the swarm gbest, known as the social component.

Here, we present a novel method, TSMPSO, that improves the diversity mechanism of MOPSO (by building such an information sharing mechanism among particles) through the crossover operator of GA. TSMPSO, along with three competing algorithms, MOPSO, the elitist non-dominated sorting genetic algorithm (NSGA II) [9], and the multiobjective artificial bee colony (MABC) [1], is applied to clustering. Two conflicting clustering objectives, SSE [30] and connectedness [16], are optimized to obtain non-dominated solutions on a variety of real data sets with different cluster characteristics, e.g., size, shape, and density. An external archive (dynamic but limited in size) stores the non-dominated solutions as the search progresses; it is used to guide the particles' flight through the swarm during iterations and, at the end of the run, to pick the best solution, i.e., the one with the highest F-measure value. We experiment with seven real data sets from the UCI machine learning repository. The results show that TSMPSO is superior to the competing methods on all the data sets.

The rest of the paper is organized as follows. Section 2 presents a brief literature review. Section 3 presents the algorithmic background of the competing methods. Our proposed method TSMPSO is detailed in Sect. 4. The experimental setup, results, and discussion are included in Sect. 5. Finally, Sect. 6 concludes the paper and outlines future work in the field of data clustering.

2 Related work

A brief survey of multiobjective evolutionary algorithms and multiobjective particle swarm intelligence methods and their applications can be found in [33] and [27], respectively. As mentioned above, clustering usually involves conflicting objectives; a few researchers have therefore approached the problem in a multiobjective framework to find the best clustering solutions corresponding to trade-offs among the objectives. In this context, an evolutionary multiobjective optimization method [14] attempts to improve the accuracy of clustering using two conflicting objective functions, overall deviation and connectedness, when the number of clusters in the data set is known. It adopts PESA-II [5] for clustering by incorporating a specialized mutation and initialization procedure.

A well-known multiobjective hard clustering method, Multiobjective Clustering with automatic k-determination (MOCK) [15], consists of two main phases. In the first phase, the initial clustering phase, PESA-II [5] is used to optimize the two conflicting objectives mentioned in [14] to produce a set of trade-off solutions. For encoding, it uses a locus-based adjacency representation, which encodes a clustering along with the number of clusters. The initial population is filled using two clustering algorithms: half by a minimum spanning tree (MST) based algorithm and the remaining half by K-means for different numbers of clusters. It uses a uniform crossover operator to provide diversity in the search space and a neighborhood-biased mutation operator to speed up convergence. At the end, this phase generates a set of non-dominated solutions. In the second phase, the model selection phase, the number of clusters is determined by selecting a particular solution from the pareto-front; this solution is selected based on the shape of the pareto-front (the knee) using the Gap statistic.

In [16], four different pairs of clustering criteria are evaluated and analyzed in a multiobjective framework with the same encoding, variation operators, initialization, and parameter settings as in [15]; the pair of objectives employed in MOCK is shown to be the strongest combination. In [13], corrected-Rand-based selection strategies are used to reduce the high number of partitions generated in the multiobjective framework. These strategies significantly reduce the number of solutions (partitions) while maintaining diversity in the search space and preserving quality solutions. We propose a novel method, TSMPSO, which outperforms its competitors for hard partitional clustering. Performance assessment is carried out in terms of classification accuracy, distribution of solutions on the obtained pareto-front, convergence of the obtained pareto-front with respect to the reference pareto-front, and number of solutions on the obtained pareto-front.

3 Algorithm background

In this section, we briefly describe MOPSO [4], NSGA II [9], and MABC [1].

3.1 MOPSO

PSO is a population-based metaheuristic search algorithm for single objective optimization in which every particle in the swarm is a potential solution. A particle's flight is influenced by its personal best position and the global best position in the swarm [21]. The velocity (v) and position (x) of a particle are updated according to Eqs. 1 and 2, respectively.

$$\begin{aligned} v_\mathrm{bd}(t+1)&= v_\mathrm{bd}(t) + c_1.r_{1d}(y_\mathrm{bd}(t)-x_\mathrm{bd}(t))\nonumber \\&+ c_2.r_{2d}(\widehat{y}_d(t)-x_\mathrm{bd}(t))\end{aligned}$$
(1)
$$\begin{aligned} x_\mathrm{bd}(t+1)&= x_\mathrm{bd}(t) + v_\mathrm{bd}(t+1) \end{aligned}$$
(2)

where \(d\) is the \(d\mathrm{th}\) dimension of a particle; \(b\) is the \(b\mathrm{th}\) particle in the swarm; \(y\) is the personal best of a particle; \(\widehat{y}\) is the global best in the swarm; \(t\) is the iteration; \(r_1\) and \(r_2\) are random numbers in the interval (0, 1); and \(c_1\) and \(c_2\) are positive acceleration constants. A minimal sketch of this update appears after the list below. Solving a multiobjective optimization problem with PSO requires the following changes [27].

  • Selection criteria for the personal best (pbest) of a particle and the global best (gbest) of the swarm.

  • A strategy to retain all non-dominated solutions found during the run, e.g., an external archive.

  • A diversity mechanism in the search process to avoid convergence to a single solution or to sub-optimal solutions.
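
For illustration, the following is a minimal NumPy sketch of the single-objective update of Eqs. 1 and 2, vectorized over the swarm. Eq. 1 as printed has no inertia term, while Sect. 5.1 reports an inertia weight w = 0.71; the sketch therefore includes w as a multiplier on the previous velocity, which is our assumption, and all names are illustrative.

```python
import numpy as np

def pso_update(x, v, pbest, gbest, w=0.71, c1=1.49, c2=1.49, rng=None):
    """One PSO step per Eqs. 1 and 2 for the whole swarm.

    x, v, pbest: arrays of shape (n_particles, n_dims);
    gbest: array of shape (n_dims,), broadcast over the swarm.
    """
    rng = rng or np.random.default_rng()
    r1 = rng.random(x.shape)  # r_1 ~ U(0, 1), drawn per particle and dimension
    r2 = rng.random(x.shape)  # r_2 ~ U(0, 1)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. 1 (+ inertia w)
    x = x + v                                                  # Eq. 2
    return x, v
```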

Algorithm 1 demonstrates the step-by-step procedure of MOPSO for solving a multiobjective problem. MOPSO modifies PSO to deal with multiobjective optimization problems. When a particle's pbest is dominated by its current position, the pbest is replaced by the current position; if neither dominates the other, one of them is selected randomly. MOPSO uses an external repository to store the non-dominated solutions obtained in each iteration, and a leader selected from this repository guides the solutions towards the pareto-front while maintaining diversity. The concept of this archive is similar to the adaptive grid of the Pareto archived evolution strategy (PAES) [22]. PSO is well suited to multiobjective optimization because of its high speed of convergence, as long as premature convergence does not occur. Since the single information sharing mechanism in PSO among the particles can easily lead to convergence to a local optimum (i.e., premature convergence) in multiobjective optimization, appropriate promotion of diversity is an important issue.

[Algorithm 1: MOPSO]

3.2 NSGA II

NSGA II [9] is a multiobjective evolutionary algorithm based on elitism and a diversity preservation mechanism. Initially, a parent population \(PP_t\) (size N) is generated randomly according to a particular criterion. An offspring population \(OP_t\) is then generated by selecting parents from \(PP_t\) using crowded binary tournament selection and applying the genetic operators. The offspring population \(OP_t\) (size M) is combined with the parent population \(PP_t\) (size N), forming the entire population \(EP_t\) of size \(M+N\). Afterwards, the non-dominated fronts and the crowding distance (a measure of the density of solutions in the neighborhood) of the solutions in \(EP_t\) are calculated. N solutions out of the \(M+N\) are then promoted to the next generation based on better rank, or higher crowding distance within the same rank. Consequently, at the end of the run, we obtain a set of non-dominated solutions on the best pareto-front (the solutions with the best rank) in the final population. We outline NSGA II in Algorithm 2.

[Algorithm 2: NSGA II]

NSGA II uses an explicit diversity preservation mechanism, the crowding distance, to diversify solutions on a non-dominated front, and elitism to carry already obtained non-dominated solutions over to the next generation.
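
A minimal sketch of the standard crowding-distance computation of [9] for one front follows; the array layout is our assumption.

```python
import numpy as np

def crowding_distance(front):
    """NSGA II crowding distance for one non-dominated front.

    front: array of shape (n, M) of objective values; returns an (n,)
    array of distances. Boundary solutions get infinite distance so
    they are always kept.
    """
    n, M = front.shape
    dist = np.zeros(n)
    for m in range(M):
        order = np.argsort(front[:, m])
        f = front[order, m]
        dist[order[0]] = dist[order[-1]] = np.inf
        span = f[-1] - f[0]
        if span == 0:
            continue
        # interior solutions accumulate the normalized gap between
        # their two neighbours along objective m
        dist[order[1:-1]] += (f[2:] - f[:-2]) / span
    return dist
```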

3.3 MABC

The ABC algorithm is a population-based swarm intelligence optimization algorithm introduced by Karaboga [19], in which potential solutions are the food sources of honey bees. Its design is inspired by the intelligent food foraging behavior of real honey bees. The quality of solutions is evaluated based on the value (nectar amount) of each food source, here in terms of the two objectives SSE and connectedness. In a natural bee swarm, there are three types of honey bees: employed bees, onlooker bees, and scout bees. The employed bees search for new food sources that dominate the food sources in their memory. The onlooker bees gather information from the employed bees to select food sources that dominate a larger number of food sources, and further search for better quality food sources in the neighborhood of the selected ones. The scout bees replace exhausted food sources by discovering new food sources randomly. MABC [1] modifies ABC to deal with multiobjective optimization problems. The three phases of MABC for searching new food sources are described as follows:

  • Employed bees phase: In the employed bees phase, an employed bee searches for a food source in the neighborhood that dominates the food source in her memory. The position of the new food source with respect to the \(i\mathrm{th}\) food source is obtained using Eq. 3.

    $$\begin{aligned} x_{ij}' = x_{ij} + w1 \cdot \phi _{ij} (x_{ij}-x_{pj}) \end{aligned}$$
    (3)

    where \(p\) is a food source randomly selected from the archive; \(N\) denotes the number of food sources; \(j\in \{1,2,\ldots ,d\}\) is a randomly selected index; \(\phi _{ij}\) is a random number in [0, 1]; and the coefficient w1 controls the importance of \(p\) in the search for the new food source. If the newly obtained food source in the neighborhood is better than the food source in her memory, the bee updates her position with the new one. The number of employed bees equals the number of food sources. (A sketch of Eqs. 3 and 4 appears after this list.)

  • Onlooker bees phase: In the onlooker bees phase, all the employed bees provide information about the quality of the new food sources to the onlooker bees. The onlooker bees then choose their food sources with probability \(\mathrm{prob}_i\), related to the fitness of the solutions obtained in the employed bees phase. In MABC, the \(\mathrm{prob}_i\) of food source \(i\) is evaluated using Eq. 4.

    $$\begin{aligned} \mathrm{prob}_i = \frac{\mathrm{fit}_i}{\sum _{i=1}^{N}{\mathrm{fit}_i}} \end{aligned}$$
    (4)

    where \(\mathrm{fit}_i\) is the fitness value of food source \(i\), with \(\mathrm{fit}_i =\frac{\mathrm{dom}(i)}{N}\); \(\mathrm{dom}(i)\) is the number of food sources dominated by \(i\); and \(N\) is the number of food sources. Each onlooker bee modifies the solution in her memory similarly to the employed bees, guided by the \(\mathrm{prob}_i\) values; however, another coefficient, w2, controls the importance of the neighbour \(p\), which is selected randomly from the swarm. If the obtained solution dominates the previous one, the bee memorizes the non-dominated solution and forgets the previous one.

  • Scout bees phase: In the previous two phases of ABC, the positions of some food sources may not be improved within a predetermined number of trials. Such food sources are called abandoned solutions. At this point, a scout bee generates a new solution \(x_i\) randomly, as shown in Eq. 5, for each abandoned solution.

    $$\begin{aligned} x_{id} = x_{\mathrm{min},d} + \mathrm{rand}(0,1)\, (x_{\mathrm{max},d}- x_{\mathrm{min},d}) \end{aligned}$$
    (5)

    where \(d\) runs over the dimensions of the data set.
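
For illustration, here is a minimal sketch of the employed-bee move of Eq. 3 and the onlooker selection probabilities of Eq. 4; the data layout and the value of w1 are assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng()

def employed_bee_move(x, archive, w1=1.0):
    """Neighborhood search of Eq. 3 for one food source x (1-D array).

    One randomly chosen dimension j is perturbed relative to a food
    source p drawn at random from the archive.
    """
    p = archive[rng.integers(len(archive))]
    j = rng.integers(len(x))
    phi = rng.random()                           # phi_ij ~ U(0, 1)
    x_new = x.copy()
    x_new[j] = x[j] + w1 * phi * (x[j] - p[j])   # Eq. 3
    return x_new

def selection_probs(dom_counts, n_sources):
    """Onlooker selection probabilities of Eq. 4, with fit_i = dom(i)/N."""
    fit = np.asarray(dom_counts, dtype=float) / n_sources
    total = fit.sum()
    if total == 0:                               # no source dominates any other
        return np.full(len(fit), 1.0 / len(fit))
    return fit / total
```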

Algorithm 3 demonstrates the step-by-step procedure of MABC [1] for solving a multiobjective problem.

[Algorithm 3: MABC]

MABC uses a grid-based approach to maintain the non-dominated solutions in an external archive after the three phases of MABC are performed.

4 Proposed method

In this section, we present a description of the proposed algorithm TSMPSO.

4.1 Solution representation and initialization

We follow a centroid-based representation for a particle (candidate solution), as described in [23]. Every candidate solution has \(k\times \mathrm{nvar}\) dimensions, where \(k\) denotes the number of clusters, which is known a priori, and \(\mathrm{nvar}\) is the number of dimensions of the data points. Every dimension of a cluster centre is assigned a random number between the maximum \((x_{\mathrm{max}})\) and minimum \((x_{\mathrm{min}})\) values of the corresponding dimension in the data set. Here, \(\mathrm{position}(n, d)\) represents the \(d\mathrm{th}\) dimension of the \(n\mathrm{th}\) cluster centre, and rand() is a uniformly distributed random number in the range [0, 1]. Algorithm 4 describes the steps of particle initialization.

[Algorithm 4: Particle initialization]

Note that the initial velocities of the particles in PSO are set to zero. When new solutions are generated, their dimensions must satisfy the boundary conditions of the search space: if the value of a dimension of a new solution exceeds the upper bound of the corresponding dimension in the search space, it is replaced by the upper bound, and similarly for the lower bound. As this paper deals with the clustering problem where k is known a priori, a solution must contain k valid clusters. However, a newly generated solution may fail to contain k valid clusters; such a solution is regarded as erroneous and is replaced by a randomly generated new solution following the procedure described in Algorithm 4.
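
A minimal sketch of this initialization and boundary repair under the centroid-based representation above (the function names are ours):

```python
import numpy as np

def init_particle(k, nvar, x_min, x_max, rng):
    """Random particle per Algorithm 4: row n, column d holds
    position(n, d), the d-th dimension of the n-th cluster centre,
    drawn uniformly between the per-dimension data bounds.

    x_min, x_max: arrays of shape (nvar,) with the data's bounds.
    """
    return x_min + rng.random((k, nvar)) * (x_max - x_min)

def repair_bounds(position, x_min, x_max):
    """Snap out-of-range dimensions back to the violated bound."""
    return np.clip(position, x_min, x_max)
```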

4.2 Fitness functions

Since a single validity measure is not equally appropriate for data sets of different characteristics, simultaneous optimization of multiple objectives naturally improves the results. Therefore, we select a pair of conflicting objectives, intra-cluster variance (SSE) [30] and connectedness [15], based on the comparative study in [16]. The first objective, SSE, shown in Eq. 6, is based on the idea that each object should belong to its nearest cluster centre, so that the quality of a solution is evaluated in terms of the compactness of the clusters. It is effective for data sets with well-separated, spherical clusters.

$$\begin{aligned} SSE(S)= \sum ^{k}_{i=1} \sum _{\forall x_p\in c_i}\Vert x_p-m_i\Vert ^{2} \end{aligned}$$
(6)

Here \(S\) is a solution, \(x_p\) is the \(p\mathrm{th}\) object of the data set, \(m_i\) is the centroid of the \(i\mathrm{th}\) cluster, and \(c_i\) denotes the \(i\mathrm{th}\) cluster. The second objective, connectedness, shown in Eq. 7, is based on the idea that neighbouring objects should belong to the same cluster. It is better suited to identifying clusters of arbitrary shape [15].

$$\begin{aligned} \mathrm{Connectedness}(S)= \sum _{p=1}^{N} \left( \sum _{j=1}^{T} x_{p,nn_p(j)}\right) \end{aligned}$$
(7)

where \(S\) denotes a solution, \(p\) indexes the objects in the data set, \(N\) is the number of objects in the data set, \(nn_p(j)\) is the \(j\mathrm{th}\) nearest neighbour of object \(p\), and \(x_{p, nn_p(j)}\) is a penalty of value 1/j received when the \(j\mathrm{th}\) nearest neighbour of \(p\) is not in the same cluster as \(p\); otherwise, the penalty for that neighbour is zero. In other words, for each object \(p\), the total penalty is the sum of the penalties 1/j over each of its nearest neighbours \(j\) that do not share a cluster with \(p\). The parameter \(T\) specifies the number of nearest neighbours to consider; note that the penalty value gradually decreases for more distant neighbours.

Consequently, with these two objective functions, the homogeneity of the clusters is measured by SSE, while the separation between the clusters is measured implicitly by the connectedness measure.
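
For illustration, here is a minimal sketch of the two fitness functions of Eqs. 6 and 7, assuming objects are labeled with their nearest centroid and the T nearest neighbours are precomputed; names and array layouts are ours.

```python
import numpy as np

def sse(data, centroids, labels):
    """Eq. 6: sum of squared distances of objects to their cluster centres."""
    return sum(np.sum((data[labels == i] - m) ** 2)
               for i, m in enumerate(centroids))

def connectedness(labels, nn_idx):
    """Eq. 7: nn_idx[p, j] holds the index of the (j+1)-th nearest
    neighbour of object p, for the T precomputed neighbours. A penalty
    of 1/(j+1) accrues whenever that neighbour is in another cluster."""
    N, T = nn_idx.shape
    penalty = 0.0
    for p in range(N):
        for j in range(T):
            if labels[p] != labels[nn_idx[p, j]]:
                penalty += 1.0 / (j + 1)
    return penalty
```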

4.3 TSMPSO

The TSMPSO algorithm is based on a pareto-based elitist approach [7]. To initialize the swarm \(POP\) (size \(NP\)), particle velocities are set to zero and the personal best (pbest) of each particle is initialized to its initial position. The non-dominated solutions found in the swarm are preserved in an external archive, which serves two purposes: first, non-dominated solutions are preserved as the search progresses, and second, an appropriate global best (leader) particle is selected from this archive to guide the swarm during the search process. To select the leader and maintain the external archive, hypercubes [4] are created in the objective space, where the archived solutions are maintained in a coordinate system; the location of each solution within the hypercubes is determined by its objective function values. A fitness is then assigned to each hypercube containing more than one solution: it equals some number \(x>1\) (here \(x=10\), as recommended in [4]) divided by the number of solutions in the respective hypercube. This decreases the fitness of hypercubes containing more solutions. Roulette-wheel selection is performed to select a hypercube based on this fitness, and a leader is selected randomly from the solutions residing in the selected hypercube.
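
Here is a minimal sketch of this leader selection, under our own assumptions about how the adaptive grid is stored (cell_of, mapping each archive index to its hypercube id, is a hypothetical structure):

```python
import numpy as np

def select_leader(archive, cell_of, x=10.0, rng=None):
    """Roulette-wheel leader selection over hypercubes, following [4].

    archive: list of archived solutions; cell_of[i]: hypercube id of
    archive member i. Crowded hypercubes get lower fitness x / count,
    so leaders tend to come from sparsely populated regions.
    """
    rng = rng or np.random.default_rng()
    cells = {}
    for i, c in enumerate(cell_of):
        cells.setdefault(c, []).append(i)
    ids = list(cells)
    fit = np.array([x / len(cells[c]) for c in ids])
    probs = fit / fit.sum()
    chosen = ids[rng.choice(len(ids), p=probs)]   # pick a hypercube
    return archive[rng.choice(cells[chosen])]     # random member as leader
```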

In the first stage of the diversity mechanism in TSMPSO, the particles' velocities and positions in the current swarm are updated as per Eqs. 1 and 2, respectively, to create a temporary swarm, say \(POP1\). A particle's pbest is then updated to its current position unless its pbest dominates its current position, in order to provide better diversity to the solutions [3]. Since the particles are influenced only by their pbest and the leader of the swarm, there is no information sharing mechanism among all the particles beyond what each particle obtains from the leader. Therefore, to build such an information sharing mechanism among the particles and improve diversity in the search process, a genetic crossover operator is applied to the swarm in the second stage. Single-point crossover is performed with some crossover probability \(P_c\) by selecting two parents randomly from the current swarm and exchanging the segments beyond the crossover point to generate an offspring population. To perform the crossover, a random integer is generated in the range 1 to \(k*\mathrm{nvar}-1\), where \(k\) denotes the number of cluster centroids and \(\mathrm{nvar}\) the number of dimensions of a centroid (a sketch is given below). We denote the generated offspring population as \(POP2\). This additional diversity mechanism helps enhance the exploration capability of MOPSO and diversifies the solutions in the search space.
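
Below is a minimal sketch of this second-stage crossover on particles flattened to \(k\times\mathrm{nvar}\) vectors; the function name and the value of pc are placeholders of ours.

```python
import numpy as np

def single_point_crossover(pop, pc=0.9, rng=None):
    """Second-stage diversity step: random parent pairs from the swarm
    exchange segments beyond a random cut point in [1, k*nvar - 1].

    pop: array of shape (NP, k, nvar); pc: crossover probability
    (placeholder value).
    """
    rng = rng or np.random.default_rng()
    flat = pop.reshape(len(pop), -1)              # (NP, k*nvar)
    offspring = []
    for _ in range(len(pop) // 2):
        p1, p2 = flat[rng.integers(len(flat), size=2)]
        c1, c2 = p1.copy(), p2.copy()
        if rng.random() < pc:
            cut = rng.integers(1, flat.shape[1])  # cut point in [1, k*nvar - 1]
            c1[cut:], c2[cut:] = p2[cut:], p1[cut:]
        offspring.extend([c1, c2])
    return np.array(offspring).reshape(-1, *pop.shape[1:])
```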

\(POP1\) and \(POP2\) are then combined to create a temporary population \((TEMP\_POP)\), on which the different pareto-fronts are created based on the dominance criterion [7]. The new population is filled starting from the best non-dominated front, followed by the second, and so on. When the last allowed front is considered, the remaining slots in the new population may be fewer than the solutions on this front; the crowding distance (\(CD\)) is then calculated on the last allowed front. Crowding distance is a niching method that estimates the density of solutions around an individual on a non-dominated front by summing, along each objective, the normalized distance between the two neighbouring solutions \(p-1\) and \(p+1\) of a solution \(p\) in the objective space. The least crowded solutions on the last allowed front are given priority, as in NSGA II [9]. Finally, the external archive is updated with the non-dominated particles obtained in the new swarm: an existing solution in the archive is removed if it is dominated by any current non-dominated solution, and whenever the external archive becomes full, extra solutions are removed while giving priority to retaining solutions residing in less populated regions [4]. The locations of the solutions within the hypercubes are then redetermined accordingly. The pseudocode of TSMPSO is shown in Algorithm 5 and the flowchart is presented in Fig. 1.

[Algorithm 5: TSMPSO]

[Fig. 1: Flow chart of the proposed algorithm]

5 Experimental results and discussions

We compare TSMPSO experimentally with three well-known multiobjective optimization methods, MOPSO [4], NSGA II [9], and MABC [1], optimizing the two conflicting criteria intra-cluster distance and connectedness simultaneously. The obtained results demonstrate that the proposed algorithm is highly competitive with the competing algorithms over seven real data sets from the UCI machine learning repository.

5.1 Parameter setup

As the results of population-based optimization methods are influenced by their parameter settings, these values should be selected carefully based on empirical and/or theoretical evidence. We choose a population size of 100, as higher values increase the computational overhead. Based on our experiments, the stopping criterion is set to a maximum of 40,000 fitness function evaluations, as the performance of the algorithms does not improve considerably afterwards. The parameter settings of the competing algorithms applied in the clustering domain follow their original papers. We select the value of the parameter \(T\) from the recommended range given in [15]. The number of hypercubes (divisions of the adaptive grid) and the size of the external archive are chosen based on the values recommended in [4]. The inertia weight w, cognitive learning factor \(c1\), and social learning factor \(c2\) are set to 0.71, 1.49, and 1.49, respectively, as this set produced the best results overall among the different sets of values tried. Table 1 presents all parameter settings for the proposed as well as the competing methods.

Table 1 Control parameters for TSMPSO, MOPSO, and NSGA II

5.2 Datasets

The datasets are matrices of size \(n \times \mathrm{nvar}\) with real-valued elements, where \(n\) is the number of objects and \(\mathrm{nvar}\) is the number of features. The objects are partitioned into k non-overlapping clusters. In this work, we perform experiments on seven real datasets, Iris, Glass, Vowel, Wisconsin Breast Cancer (WBC), Wine, Zoo, and Dermatology, from the UCI machine learning repository to validate the performance of TSMPSO. Brief characteristics of these datasets are presented in Table 2. For WBC, we use only 683 of the original 699 samples, having removed 16 samples with missing features.

Table 2 Datasets descriptions

5.3 F-measure

We use the F-measure to judge the accuracy of the obtained clusters. The F-measure [28] is a balanced measure: the harmonic mean of precision and recall. Precision is the fraction of retrieved objects that are relevant, while recall is the fraction of relevant objects that are retrieved. The F-measure of a cluster with respect to a known class is expressed mathematically in Eq. 8.

$$\begin{aligned} \text{F-measure} = 2\times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \end{aligned}$$
(8)

where \(\mathrm{precision}= m_{pq}/m_p\), with \(m_{pq}\) the number of objects belonging to both cluster \(p\) and class \(q\), and \(m_{p}\) the total number of objects in cluster \(p\); and \(\mathrm{recall}= m_{pq}/m_q\), with \(m_{q}\) the total number of objects in class \(q\). The optimum value of the F-measure is 1.
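
A minimal sketch of Eq. 8 for one cluster-class pair, using the counts defined above:

```python
def f_measure(m_pq, m_p, m_q):
    """Eq. 8: harmonic mean of precision m_pq/m_p and recall m_pq/m_q
    for cluster p against class q."""
    if m_pq == 0:
        return 0.0
    precision = m_pq / m_p
    recall = m_pq / m_q
    return 2 * precision * recall / (precision + recall)
```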

5.4 Performance assessment metrics

Generally, the following issues are considered when quantitatively assessing algorithms in the multiobjective scenario [35].

  1. Convergence of solutions: solutions should be as close as possible to the optimal pareto-front. This is assessed by the distance between the pareto-front obtained by an algorithm and the optimal pareto-front.

  2. Diversity of solutions: solutions on the pareto-front should be distributed so that they cover the entire pareto-optimal region uniformly.

  3. Number of solutions: the number of solutions on the non-dominated pareto-front should be as large as possible.

Therefore, as no single metric can judge the performance of an algorithm completely [7], we assess the performance of the algorithms using the following commonly used performance assessment metrics; a sketch computing two of them follows the list.

  • Coverage (\(C\)) [34]: this metric compares the relative quality of two sets of solutions based on domination. For two pareto-fronts \(S_1\) and \(S_2\), \(C(S_1, S_2)\) evaluates the proportion of solutions in \(S_2\) that are dominated by solutions in \(S_1\). It is defined mathematically in Eq. 9.

    $$\begin{aligned} C(S_1,S_2) = \dfrac{ |\{a_2\in S_2; \exists a_1 \in S_1:a_1\preceq a_2\}|}{|S_2|} \end{aligned}$$
    (9)

    The value of \(C\) lies between 0 and 1, and a higher value indicates better dominance. \(C(S_1, S_2) = 1\) means each solution of \(S_2\) is dominated by at least one solution of \(S_1\); \(C(S_1, S_2) = 0\) means no solution of \(S_2\) is dominated by any solution of \(S_1\).

  • Distribution (\(\triangle \)) [9]: this metric provides information about the distribution (spread) of the non-dominated front (\(S\)) obtained by an algorithm. It is calculated from a relative distance measure between consecutive solutions in \(S\), as described mathematically in Eq. 10.

    $$\begin{aligned} \triangle (S)= \sum _{i=1}^{|S|-1} \dfrac{|d_i - \bar{d}|}{|S|-1} \end{aligned}$$
    (10)

    Here, for each solution \(i\), \(d_i = \sum _{m=1}^{M} {|{f_m}^{i+1} - {f_m}^i|}\) is the sum of the absolute differences in normalized objective function values between the consecutive solutions \(i\) and \(i+1\) in \(S\), which is sorted in each objective, and \(\overline{d}\) is the mean of the \(d_i\). A smaller value of \(\triangle \) indicates more uniformly distributed solutions; the ideal value of zero indicates that the solutions on \(S\) are uniformly distributed.

  • Convergence [8]: this metric assesses the convergence of the non-dominated solution set of an algorithm with respect to a reference set \(P^*\). It is defined mathematically in Eq. 11.

    $$\begin{aligned} \mathrm{Convergence}(S) = \dfrac{\sum _{i=1}^{|S|} d_i}{|S|} \end{aligned}$$
    (11)

    where \( d_i= \min _{j=1}^{|P^*|} \sqrt{\sum _{k=1}^{M} \left( \dfrac{f_k (i) - f_k (j)}{f_k^{\mathrm{max}} - f_k^{\mathrm{min}}}\right) ^2} \)

    Here, \(f_k^{\mathrm{max}}\) and \(f_k^{\mathrm{min}}\) are the maximum and minimum objective function values, respectively, for the \(k\mathrm{th}\) objective function in \(P^*\). A smaller value of this measure indicates a superior solution set; its ideal value of zero indicates that all solutions lie on the reference (optimal) pareto-front.

  • ONVG [25]: the Overall Non-dominated Vector Generation (ONVG) is the count of solutions on the obtained non-dominated pareto-front. It is defined mathematically in Eq. 12.

    $$\begin{aligned} ONVG(S) = |S| \end{aligned}$$
    (12)
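
For illustration, here is a minimal sketch of the coverage (Eq. 9) and convergence (Eq. 11) metrics for fronts stored as arrays of objective vectors, assuming both objectives are minimized:

```python
import numpy as np

def weakly_dominates(a, b):
    """a covers b under minimization: a is no worse in every objective."""
    return np.all(a <= b)

def coverage(S1, S2):
    """Eq. 9: fraction of solutions in S2 covered by some solution in S1."""
    return sum(any(weakly_dominates(a1, a2) for a1 in S1)
               for a2 in S2) / len(S2)

def convergence(S, P_star):
    """Eq. 11: mean distance from each solution in S to its nearest
    reference solution in P_star, normalizing each objective by its
    range over the reference front."""
    P = np.asarray(P_star, dtype=float)
    span = P.max(axis=0) - P.min(axis=0)
    span[span == 0] = 1.0                  # guard degenerate objectives
    d = [np.sqrt((((P - s) / span) ** 2).sum(axis=1)).min()
         for s in np.asarray(S, dtype=float)]
    return float(np.mean(d))
```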

5.5 Comparison of results

We compare the performance of the proposed algorithm TSMPSO with the three well-known algorithms MOPSO [4], NSGA II [9], and MABC [1] for crisp clustering on all the experimental data sets. First, the performance of the algorithms is measured based on the best solution in the obtained solution set, i.e., the one with the highest F-measure value. Table 3 shows the mean and standard deviation (in brackets) of the highest F-measure values achieved by the solution sets over the different runs. TSMPSO is clearly superior to the competing algorithms in terms of mean classification accuracy (F-measure). Moreover, TSMPSO reports the smallest standard deviation on the Glass, WBC, and Zoo data sets, which demonstrates the robustness of the algorithm. On the Wine data set, the mean accuracies achieved by the algorithms are close to each other. MOPSO demonstrates better mean accuracy than NSGA II on the Iris, WBC, and Zoo data sets, and achieves a smaller standard deviation of accuracy on all the data sets except Glass, indicating the robustness of MOPSO relative to NSGA II.

Further, we selected different metrics to quantitatively assess the performance of the algorithms in terms of diversity and convergence. Table 4 presents a quantitative comparison of the coverage metric (\(C\)) for the pareto-fronts obtained from the combined pool of solution sets over 30 independent runs, where the symbols A, B, C, and D denote TSMPSO, MOPSO, NSGA II, and MABC, respectively. The values of \(C(A, B)\), \(C(A, C)\), and \(C(A, D)\) are 1 on most of the data sets, which shows that all solutions of the competing algorithms are dominated by at least one solution of TSMPSO. Conversely, the values of \(C(B, A)\), \(C(C, A)\), and \(C(D, A)\) are 0 on most of the data sets, indicating that no solution of TSMPSO is dominated by any solution of the competing algorithms.

Table 5 shows the mean and standard deviation (in brackets) of the distribution (\(\triangle \)) achieved by the obtained solution sets over the different runs. TSMPSO reports the smallest mean and standard deviation for the Iris, Glass, Wine, and Zoo data sets, indicating that its solutions are closer to a uniform distribution along the pareto-front and that the algorithm is more robust. Furthermore, Table 6 presents the distribution (\(\triangle \)) of the non-dominated solutions obtained from the combined pool of solution sets over all runs. TSMPSO reports the smallest value on all the data sets, with the single exception that MOPSO has the smallest value on the Wine data set. This indicates that TSMPSO finds a distribution of solutions closer to uniform along the pareto-front.

Table 7 presents a quantitative comparison of the convergence metric for the pareto-fronts obtained from the combined pool of solution sets over all runs. TSMPSO reports the smallest value on all the data sets, including a value of zero for the Iris, Glass, and WBC data sets. This indicates that TSMPSO produces solutions closer to the reference pareto-front on all the data sets, whereas the other methods have difficulty converging towards the optimal pareto-front. Table 8 shows the ONVG values of the algorithms for the pareto-fronts obtained from the combined pool of solution sets over all runs. TSMPSO reports the highest ONVG among the competing algorithms on all the data sets, except that MOPSO and NSGA II have the largest values on the Vowel and Dermatology data sets, respectively. This means that TSMPSO places a larger number of solutions on the obtained non-dominated pareto-front for most of the data sets.

Table 3 Performance of the algorithms based on F-measure on different data sets
Table 4 Coverage for the non-dominated solutions obtained from the combined pool of solution sets over all runs
Table 5 Average distribution (with standard deviation) of the obtained solution sets over all runs
Table 6 Distribution for the non-dominated solutions obtained from the combined pool of solution sets over all runs
Table 7 Convergence for the non-dominated solutions obtained from the combined pool of solution sets over all runs
Table 8 ONVG for the non-dominated solutions obtained from the combined pool of solution sets over all runs

From the performance exhibited by MOPSO, it can be seen that the solutions achieved by MOPSO do not properly approach the optimal pareto-front. This indicates that MOPSO does not properly diversify the solutions in the global search space, because PSO suffers from a well-known stability problem and its information sharing mechanism among particles is unidirectional. In NSGA II, genetic crossover is a good way to provide diversity in the search space by establishing an information sharing mechanism among solutions, so its performance could be expected to be better. However, the experiments show that NSGA II does not achieve better solutions than the proposed algorithm. This behavior suggests that the good solutions promoted to the next generation are not sufficiently explored in the search space by applying the genetic crossover operation alone; a better balance of exploration and exploitation in the search space is required. It can also be observed from the empirical results that MABC does not compete with the proposed algorithm on these data sets, because in the search procedure of ABC, bees exchange only one dimension with a random neighbor, which slows the optimization as the dimension of the problem increases [20]; sufficient exploration of the search space is thus needed.

As TSMPSO incorporates exploration through GA and exploitation through PSO, it performs better, and the experimental results support its superiority over the competitors. Figure 2 compares the best F-measure values achieved by the algorithms in 30 independent runs over all the data sets and also supports the superiority of the proposed method. Further, graphical representations of the pareto-fronts obtained by the competing algorithms on the different data sets are shown in Fig. 3, which clearly shows that the solutions of TSMPSO on the obtained pareto-fronts are closer to a uniform distribution and have a larger extent; in addition, the solutions are closer to the reference optimal pareto-fronts. This validates the superiority of TSMPSO over the other competing algorithms in terms of distribution, convergence, and number of solutions, as reported quantitatively by the different metrics (see Tables 4, 5, 6, 7, 8). On the Dermatology data set, judged by F-measure values, all four algorithms perform highly unsatisfactorily, as it is a high-dimensional data set containing 34 dimensions; even there, the proposed method demonstrates better performance than the other three methods. Overall, TSMPSO is a good alternative for solving crisp clustering problems through a multiobjective approach that produces high-quality trade-off solutions.

[Fig. 2: Bar chart comparing the best F-measure values obtained in 30 runs]

[Fig. 3: Pareto fronts of the non-dominated solutions from the combined pool of obtained solutions over all independent runs. a Iris; b Glass; c Vowel; d WBC; e Wine; f Zoo; g Dermatology]

6 Conclusions

Multiobjective optimization methods are a natural way to optimize multiple clustering criteria while reducing the bias of any particular single objective. In this paper, TSMPSO is proposed to increase the exploratory capability of MOPSO by introducing an additional diversity mechanism. The results show that TSMPSO outperforms well-known competing algorithms in terms of classification accuracy (F-measure) and quality assessment metrics across a range of real data sets. As future work, the authors aim to extend this study to data sets with an unknown number of clusters.