1 Introduction

Clustering is an unsupervised technique that is widely adopted for data analysis. In clustering, there is no need to train the data; instead, a distance measure is used to group data objects into clusters. At present, clustering receives wide attention from the research community from both theoretical and practical points of view. The technique is applied in diverse fields such as process monitoring [1], image processing [2], feature selection [3], gene expression and microarray data [4], e-learning [5] and healthcare [6]. The literature shows that a large number of meta-heuristic algorithms have been presented to obtain optimal solutions for clustering problems. Such algorithms are based on natural phenomena, laws of physics and thermodynamics, or the behavior of swarms, insects and animals. Algorithms based on the laws of physics and thermodynamics include the charged system search algorithm (CSS) [7, 8] and the magnetic charged system search algorithm (MCSS) [9, 10]. Algorithms based on natural phenomena include the black hole algorithm (BH) [11] and the galaxy-based algorithm [12]. Algorithms based on the behavior of birds, insects and animals include the artificial bee colony algorithm (ABC), ant colony optimization (ACO) and the bat algorithm (BA). At present, these algorithms are more popular than classical algorithms and are widely adopted by researchers to solve diverse optimization problems for the following reasons.

  1. They offer different solution strategies.

  2. They depend less on the size of the problem, the solution space, constraints and variables.

  3. They adapt themselves to the problem domain.

  4. They provide effective mechanisms to solve combinatorial and non-linear problems.

  5. They avoid the local optima problem and require less computational power.

  6. They search the solution space from different initial points.

Due to the aforementioned advantages, these algorithms have been extensively applied in diverse fields such as image processing, biology, the sciences, market research, biomedical processes and engineering. Moreover, improved versions of these algorithms have also been presented in the literature [13].

Recently, an algorithm inspired by chemical reactions was reported in the literature for finding optimal solutions to global optimization problems [14, 15]. This algorithm is based on chemical reactions such as synthesis, decomposition and metathesis. The population of the ACRO algorithm is represented by reactants, which can be encoded using integer, character, real or float schemes to solve the optimization problem at hand. Different chemical reactions are applied to evaluate and generate new reactants during the execution of the algorithm. The objective of this work is to evaluate the efficiency of the ACRO algorithm for solving partitional clustering problems. Moreover, to improve the convergence rate of the ACRO algorithm, some modifications are incorporated into it. The performance of the proposed algorithm is tested on benchmark clustering datasets and the experimental results are compared with state-of-the-art clustering algorithms reported in the literature. The rest of the paper is organized as follows. The ACRO algorithm is presented in Sect. 1.1. Section 2 presents the ACRO algorithm for partitional clustering problems. Section 3 presents the clustering algorithms used for comparison. The simulation results of the proposed algorithm are described in Sect. 4. Section 5 summarizes the entire work.

1.1 Artificial chemical reaction optimization (ACRO) algorithm

The motivation behind the ACRO algorithm is the chemical reaction process. Atoms and molecules constantly move and collide in a viscous fluid. An atom can be described as the elementary particle of a chemical reaction, having a type, charge, mass, position, radius, orientation and velocity, whereas a molecule can be defined as a group of two or more atoms linked through bonds. During a chemical reaction, several changes occur, such as the making and breaking of bonds, changes in orientation, and substitution and replacement in molecules. The ACRO algorithm begins with the action of reactants in a vessel. Consider a fixed-volume vessel filled with a uniform mixture of N chemical reactants which undergo different types of chemical reactions. Let \(R_i\,(1 \leq i \leq N)\) denote the list of chemical types, and assume that these types can interact through M specified chemical reaction channels. In the ACRO algorithm, the encoding scheme of the reactants depends on the user's choice and can be binary, real, string, integer, etc. A new reactant can be produced through the interaction of one or two reactants. The ACRO algorithm starts with a set of initial reactants in a solution. Further, reactants are consumed and produced through different chemical reactions. The algorithm stops either when the termination condition is met or when it reaches a state where no more reactions can take place. Reactants are chosen for the chemical reactions based on their concentrations and potentials. Consecutive and competing reactions are the two types of reactions mainly reported: reactants are joined together serially in a consecutive reaction, whereas in competing reactions, different products occur depending on specific conditions. Sometimes the output of one reaction can be a reactant of another reaction, and many factors can affect the execution of a reaction.
However, the ACRO algorithm is based on the simple concept of an equal probability of monomolecular and bimolecular reactions and their alternatives. The main steps of the ACRO algorithm are listed below.

Step 1: Initialize the problem and user defined parameters of algorithm.

Step 2: Set initial reactants and evaluate them.

Step 3: Apply chemical reactions.

Step 4: Update reactants.

Step 5: Termination condition.

2 Proposed work: ACRO for clustering problems

This section presents the ACRO algorithm for determining optimal cluster centers in partition-based clustering problems, where the main task is to find the optimal set of cluster centers with minimum intra-cluster distance. Prior to applying the ACRO algorithm to partitional clustering problems, two modifications are made to the proposed algorithm. The aim of these modifications is to handle the slow convergence rate and the local optima problem. To achieve this, two operators, i.e. a position-based operator and a neighborhood operator, are incorporated into the ACRO algorithm. The position-based operator is incorporated in the synthesis reaction step, whereas for inter-reactant collision, i.e. the tradeoff between different reactants, a neighborhood operator is proposed. The main clustering steps of the ACRO algorithm are given in Sect. 2.2.

2.1 Proposed modification in ACRO algorithm

To make the ACRO algorithm more effective and improve the quality of results, two modifications are proposed. These modifications can be summarized as follows.

2.1.1 Position based operator

In this work, a position-based operator is proposed in the synthesis reaction step. The aim of this operator is to generate a new reactant. In the synthesis reaction, a new reactant is generated using Eq. 9. The position-based operator works as follows. Suppose \({{\text{R}}_{\text{i}}}\) and \({{\text{R}}_{\text{j}}}\) are two reactants that generate a new reactant \({{\text{R}}_{{\text{i}},{\text{new}}}}\). The elements of \({{\text{R}}_{\text{i}}}\) and \({{\text{R}}_{\text{j}}}\) have an equal opportunity to be selected and passed to the same position in the new reactant: each element of \({{\text{R}}_{\text{i}}}\) has a fifty percent probability of being selected and passed to the same position in \({{\text{R}}_{{\text{i}},{\text{new}}}}\), and the remaining positions of \({{\text{R}}_{{\text{i}},{\text{new}}}}\) are filled from reactant \({{\text{R}}_{\text{j}}}\). For example, with \({{\text{R}}_{\text{i}}}=[7,8,3,9,1,10,2]\) and \({{\text{R}}_{\text{j}}}\) = [11, 14, 2, 5, 8, 4, 1], the new reactant \({{\text{R}}_{{\text{i}},{\text{new}}}}\) is generated by combining \({{\text{R}}_{\text{i}}}\) and \({{\text{R}}_{\text{j}}}\). Suppose the elements 7, 3, 1 and 2 are selected randomly from reactant \({{\text{R}}_{\text{i}}}\) and the remaining elements are taken from reactant \({{\text{R}}_{\text{j}}}\); the resulting reactant is \({{\text{R}}_{{\text{i}},{\text{new}}}}\) = [7, 14, 3, 5, 1, 4, 2].
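The position-based operator described above can be sketched in a few lines of Python. This is a minimal illustration (the function name `position_based_operator` is ours, not the paper's): each position of the new reactant keeps the element of \({{\text{R}}_{\text{i}}}\) with probability 0.5 and otherwise takes the element of \({{\text{R}}_{\text{j}}}\) at the same position.

```python
import random

def position_based_operator(r_i, r_j, rng=random):
    """Each position keeps r_i's element with probability 0.5,
    otherwise it takes r_j's element at the same position."""
    return [a if rng.random() < 0.5 else b for a, b in zip(r_i, r_j)]

# Reactants from the example in the text.
r_i = [7, 8, 3, 9, 1, 10, 2]
r_j = [11, 14, 2, 5, 8, 4, 1]
r_new = position_based_operator(r_i, r_j)
```

Every element of the result comes from the same position of one of the two parent reactants, so the reactant length is preserved.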

2.1.2 Neighborhood operator

For better inter-reactant collision, i.e. a better tradeoff between reactants, the concept of a neighborhood operator is incorporated into the ACRO algorithm. The aim of the neighborhood operator is to overcome the local optima problem. To do so, two elements of the current reactant are selected and their positions are interchanged. In turn, a new reactant is generated which is different from the previous reactant. For example, let the current reactant be \({{\text{R}}_{\text{i}}}\) = [2, 3, 5, 6, 9, 4] and let the two selected elements be 5 and 9. After interchanging their positions, the newly generated reactant is [2, 3, 9, 6, 5, 4], which differs from the previous reactant.
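A minimal sketch of this swap-based neighborhood move (the helper name `neighborhood_operator` is ours): two distinct positions are chosen at random and their elements are exchanged, producing a new reactant of the same length.

```python
import random

def neighborhood_operator(reactant, rng=random):
    """Swap two randomly chosen positions to escape a local optimum."""
    new = list(reactant)
    a, b = rng.sample(range(len(new)), 2)
    new[a], new[b] = new[b], new[a]
    return new

current = [2, 3, 5, 6, 9, 4]
neighbour = neighborhood_operator(current)
```

Because only two positions are exchanged, the new reactant is a permutation of the old one.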

2.2 Steps of proposed ACRO clustering algorithm

Step 1: Initialization of user defined parameters and statement of the problem.


Clustering is a minimization problem and can be expressed as follows:

$$\text{Minimize}~f\left( x \right)\ \text{subject to}\ {y_i} \in {C_k};\,\text{where}~i=1,2,3, \ldots ,D~\text{and}~k=1,2, \ldots ,K$$

where f(x) is an objective function defined in terms of the Euclidean distance; \({y_i}\) denotes a data object; D is the dimension of a data object; \({C_k}\) is the set of kth cluster centers and K denotes the number of clusters in the given dataset. It is also noted that the computed cluster centers should lie in the range \([{\text{y}}_{{\text{i}}}^{{{\text{min}}}},\,{\text{y}}_{{\text{i}}}^{{{\text{max}}}}]\), known as the boundary constraints. Further, in this work an integer encoding scheme is used to obtain the desired results. The user-defined parameters of the ACRO algorithm, including ReacNum, i.e. the maximum number of iterations, are also defined in this step.
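The objective above can be made concrete with a short sketch (the function name `intra_cluster_distance` and the toy data are ours): f(x) is computed as the sum, over all data objects, of the Euclidean distance to the nearest cluster center.

```python
import math

def intra_cluster_distance(data, centers):
    """Sum of Euclidean distances from each object to its nearest
    center: the clustering objective f(x) to be minimized."""
    return sum(min(math.dist(y, c) for c in centers) for y in data)

# Toy example: two tight groups, centers placed at each group.
data = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.1, 4.9)]
centers = [(1.0, 1.0), (5.0, 5.0)]
```

A better set of centers yields a smaller value of this objective, which is the quantity the reactants compete on.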

Step 2: Set initial reactants and evaluation.

In this step, the population of the ACRO algorithm is specified; the population is defined in terms of initial reactants, which are uniformly sampled from the feasible search space. For the clustering problem, two reactants R1 and R2 are first identified from the dataset, e.g. \({R_1}~=~\{ {y_{i,1}},~{y_{i,2}},~\ldots,~{y_{i,d}}\} ,~{R_2}~=~\{ {y_{j,1}},~{y_{j,2}},~\ldots,~{y_{j,d}}\}\), where d denotes the length (dimension) of a reactant. The number of reactants equals the number of clusters (K). The rest of the reactants are then derived from the initially determined reactants R1 and R2 using the following procedure. First, a dividing factor (k) is initialized. Suppose k = 2; then two extra reactants R3 and R4 are generated from R1 and R2 as follows:

$${{\text{R}}_3}=\left\{ {r\,{{\text{y}}_{{\text{i}},1}},\;r\,{{\text{y}}_{{\text{i}},2}},\; \ldots ,\;r\,{{\text{y}}_{{\text{i}},{\text{d}}/2}};\;r\,{{\text{y}}_{{\text{j}},{\text{d}}/2+1}},\;r\,{{\text{y}}_{{\text{j}},{\text{d}}/2+2}},\; \ldots ,\;r\,{{\text{y}}_{{\text{j}},{\text{d}}}}} \right\}$$
(1)
$${{\text{R}}_4}=\left\{ {r\,{{\text{y}}_{{\text{j}},1}},\;r\,{{\text{y}}_{{\text{j}},2}},\; \ldots ,\;r\,{{\text{y}}_{{\text{j}},{\text{d}}/2}};\;r\,{{\text{y}}_{{\text{i}},{\text{d}}/2+1}},\;r\,{{\text{y}}_{{\text{i}},{\text{d}}/2+2}},\; \ldots ,\;r\,{{\text{y}}_{{\text{i}},{\text{d}}}}} \right\}.$$
(2)

where r denotes a random number in the range \(0<r<1\). Additional reactants are generated using the following procedure until the number of reactants equals the desired number of clusters (K).

$${{\text{R}}_5}=\left\{ {r\,{{\text{y}}_{{\text{i}},1}},\;r\,{{\text{y}}_{{\text{i}},2}},\; \ldots ,\;r\,{{\text{y}}_{{\text{i}},2{\text{d}}/3}};\;r\,{{\text{y}}_{{\text{j}},2{\text{d}}/3+1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{j}},{\text{d}}}}} \right\}$$
(3)
$${{\text{R}}_6}=\left\{ {r\,{{\text{y}}_{{\text{i}},1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{i}},{\text{d}}/3}};\;r\,{{\text{y}}_{{\text{j}},{\text{d}}/3+1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{j}},2{\text{d}}/3}};\;r\,{{\text{y}}_{{\text{i}},2{\text{d}}/3+1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{i}},{\text{d}}}}} \right\}$$
(4)
$${{\text{R}}_7}=\left\{ {r\,{{\text{y}}_{{\text{i}},1}},\;r\,{{\text{y}}_{{\text{i}},2}},\; \ldots ,\;r\,{{\text{y}}_{{\text{i}},{\text{d}}/3}};\;r\,{{\text{y}}_{{\text{j}},{\text{d}}/3+1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{j}},{\text{d}}}}} \right\}$$
(5)
$${{\text{R}}_8}=\left\{ {r\,{{\text{y}}_{{\text{j}},1}},\;r\,{{\text{y}}_{{\text{j}},2}},\; \ldots ,\;r\,{{\text{y}}_{{\text{j}},{\text{d}}/3}};\;r\,{{\text{y}}_{{\text{i}},{\text{d}}/3+1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{i}},{\text{d}}}}} \right\}$$
(6)
$${{\text{R}}_9}=\left\{ {r\,{{\text{y}}_{{\text{j}},1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{j}},{\text{d}}/3}};\;r\,{{\text{y}}_{{\text{i}},{\text{d}}/3+1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{i}},2{\text{d}}/3}};\;r\,{{\text{y}}_{{\text{j}},2{\text{d}}/3+1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{j}},{\text{d}}}}} \right\}$$
(7)
$${{\text{R}}_{10}}=\left\{ {r\,{{\text{y}}_{{\text{j}},1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{j}},2{\text{d}}/3}};\;r\,{{\text{y}}_{{\text{i}},2{\text{d}}/3+1}},\; \ldots ,\;r\,{{\text{y}}_{{\text{i}},{\text{d}}}}} \right\}$$
(8)
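Reading Eqs. (1)–(8) as complementary splits of R1 and R2 at the cut points d/2, 2d/3 and d/3 (our interpretation of the extracted formulas; the helper names `split_combine` and `initial_reactants` are ours), the initialization can be sketched as follows.

```python
import random

def split_combine(src_a, src_b, cut, rng):
    """First `cut` elements from src_a, the rest from src_b, scaled by r."""
    r = rng.random()  # random number in (0, 1)
    return [r * v for v in src_a[:cut]] + [r * v for v in src_b[cut:]]

def initial_reactants(r1, r2, k_clusters, seed=None):
    """Derive K initial reactants from R1 and R2 by complementary
    splits at d/2, 2d/3 and d/3 (one reading of Eqs. 1-8)."""
    rng = random.Random(seed)
    d = len(r1)
    pool = [list(r1), list(r2)]
    recipes = [(r1, r2, d // 2), (r2, r1, d // 2),
               (r1, r2, 2 * d // 3), (r1, r2, d // 3),
               (r2, r1, d // 3), (r2, r1, 2 * d // 3)]
    i = 0
    while len(pool) < k_clusters:
        a, b, cut = recipes[i % len(recipes)]
        pool.append(split_combine(a, b, cut, rng))
        i += 1
    return pool
```

The loop keeps drawing from the recipe list until the pool holds K reactants, matching the "until the number of reactants equals K" condition in the text.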

Step 3: Applying chemical reactions.

Step 3.1: Bimolecular reactions.


Let us consider \({{\text{R}}_1}=\{ {{\text{y}}_{{\text{i}},1}},{{\text{y}}_{{\text{i}},2}},...,{{\text{y}}_{{\text{i}},{\text{d}}}}\}\) and \({{\text{R}}_2}=\{ {{\text{y}}_{{\text{j}},1}},{{\text{y}}_{{\text{j}},2}},...,{{\text{y}}_{{\text{j}},{\text{d}}}}\}\) to be two reactants that can participate in a bimolecular reaction. Various bimolecular reactions are incorporated into the ACRO algorithm, such as the synthesis reaction, the displacement reaction and the redox2 reaction, alongside the monomolecular reactions. For all these operations, an integer representation of the population is considered rather than binary encoding. These reactions are described in detail below.

Step 3.2: Synthesis reaction. In this reaction, a new reactant is obtained using the following equation

$${\text{R}} = ({{\text{r}}_1},{{\text{r}}_2},{{\text{r}}_3},\ldots,{{\text{r}}_{\text{i}}},{{\text{r}}_{\text{j}}},{{\text{r}}_{\text{k}}},\ldots,{{\text{r}}_{\text{n}}});\,{\text{where}},{{\text{R}}_{{\text{i}},{\text{new}}}}={{\text{R}}_{\text{i}}}+{{{\varvec{\uplambda}}}_{\text{i}}}\left( {{{\text{R}}_{\text{j}}} - {{\text{R}}_{\text{i}}}} \right)$$
(9)

In the above equation, \({{{\varvec{\uplambda}}}_i}\) is a random value in the interval [0.25, 1.25]; \({{\text{R}}_{\text{i}}}{\text{~and~}}{{\text{R}}_{\text{j}}}\) are randomly selected reactants. Further, the position-based operator proposed in Sect. 2.1.1 is applied in this step to generate the new reactant.
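The synthesis update of Eq. (9) can be sketched directly (the function name `synthesis` is ours): the new reactant moves from \({{\text{R}}_{\text{i}}}\) towards \({{\text{R}}_{\text{j}}}\) by a random factor drawn from [0.25, 1.25).

```python
import random

def synthesis(r_i, r_j, rng=None):
    """Eq. (9): R_i_new = R_i + lambda_i * (R_j - R_i),
    with lambda_i uniform over [0.25, 1.25)."""
    rng = rng or random.Random()
    lam = 0.25 + rng.random()
    return [a + lam * (b - a) for a, b in zip(r_i, r_j)]

r_new = synthesis([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

With endpoints 0 and 1 per coordinate, each output coordinate equals the sampled lambda, so the result lies in [0.25, 1.25): the step can overshoot \({{\text{R}}_{\text{j}}}\), which aids exploration.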

Step 3.3: Displacement reaction.

The new reactants are obtained using the procedure below. Suppose \({{\text{R}}_{\text{k}}}=({{\text{R}}_1},{{\text{R}}_2},...,{{\text{R}}_{\text{i}}},{{\text{R}}_{\text{j}}},...,{{\text{R}}_{\text{K}}})\), where k = 1, 2, …, K.

$${{\text{R}}_{{\text{i}},{\text{new}}}}={{{\varvec{\uplambda}}}_{{\text{td}}}}{{\text{R}}_{\text{i}}}+\left( {1 - {{{\varvec{\uplambda}}}_{{\text{td}}}}} \right){{\text{R}}_{\text{j}}}$$
(10)
$${{\text{R}}_{{\text{j}},{\text{new}}}}={{{\varvec{\uplambda}}}_{{\text{td}}}}{{\text{R}}_{\text{j}}}+\left( {1 - {{{\varvec{\uplambda}}}_{{\text{td}}}}} \right){{\text{R}}_{\text{i}}}$$
(11)

where \({{{\varvec{\uplambda}}}_{td}} \in (0,1)\) and \({{{\varvec{\uplambda}}}_{td+1}}=2.3\,{({{{\varvec{\uplambda}}}_{td}})^2}\,{\text{sin}}\left( {\pi {{{\varvec{\uplambda}}}_{td}}} \right)\); the suffix 'td' of \({{{\varvec{\uplambda}}}_{td}}\) is increased by 1 each time the reaction is completed.
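The extracted parenthesization of Eqs. (10)–(11) is ambiguous; the sketch below reads them as complementary convex combinations of the two reactants, which is our assumption, together with the sinusoidal chaotic map used to update \({{{\varvec{\uplambda}}}_{td}}\) (the helper names are ours).

```python
import math

def sinusoidal_map(lam):
    """lambda_{td+1} = 2.3 * lambda_td**2 * sin(pi * lambda_td)."""
    return 2.3 * lam * lam * math.sin(math.pi * lam)

def displacement(r_i, r_j, lam):
    """Eqs. (10)-(11), read here as complementary convex combinations:
    each new reactant mixes the old pair with weights lam and 1 - lam."""
    r_i_new = [lam * a + (1 - lam) * b for a, b in zip(r_i, r_j)]
    r_j_new = [lam * b + (1 - lam) * a for a, b in zip(r_i, r_j)]
    return r_i_new, r_j_new
```

Under this reading the two offspring sit on the line segment between the parents, at mirrored positions controlled by the chaotic weight.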

Step 3.4: Redox2 reaction.

In this reaction, if R1 is the better reactant in terms of the objective function, the new reactant is computed using the following equation; the suffix 'tr' of \({{{\varvec{\uplambda}}}_{tr}}\) is increased by 1 each time the reaction is performed.

$${{\text{R}}_{{\text{i}},{\text{new}}}}={{\text{R}}_{\text{i}}}+{{{\varvec{\uplambda}}}_{{\text{tr}}}}\left( {{{\text{R}}_{\text{i}}} - {{\text{R}}_{\text{j}}}} \right);{\text{~where}}\;{{{\varvec{\uplambda}}}_{{\text{tr}}}} \in \left( {0,1} \right).$$
(12)

Otherwise, if R2 is the better reactant, the new reactant is computed using the following equation, with the suffix 'tr' again increased by 1 when the reaction is performed.

$${{\text{R}}_{{\text{j}},{\text{new}}}}={{\text{R}}_{\text{j}}}+{{{\varvec{\uplambda}}}_{{\text{tr}}}}\left( {{{\text{R}}_{\text{j}}} - {{\text{R}}_{\text{i}}}} \right);{\text{~where}}\;{{{\varvec{\uplambda}}}_{{\text{tr}}}} \in \left( {0,1} \right)$$
(13)

The value of \({{{\varvec{\uplambda}}}_{tr}}\) is updated using the following equations.

$${{{\varvec{\uplambda}}}_{{\text{tr}}+1}}=\left\{ {\begin{array}{ll} 0, & {{{\varvec{\uplambda}}}_{{\text{tr}}}}=0 \\ \frac{1}{{{{{\varvec{\uplambda}}}_{{\text{tr}}}}}}\,{\text{mod}}\left( 1 \right), & {{{\varvec{\uplambda}}}_{{\text{tr}}}} \in (0,1) \end{array}} \right.~$$
(14)
$$\frac{1}{{{{{\varvec{\uplambda}}}_{{\text{tr}}}}}}\,{\text{mod}}\left( 1 \right)=\frac{1}{{{{{\varvec{\uplambda}}}_{{\text{tr}}}}}} - \left\lfloor {\frac{1}{{{{{\varvec{\uplambda}}}_{{\text{tr}}}}}}} \right\rfloor$$
(15)
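Equations (14)–(15) describe the Gauss map: the next weight is the fractional part of the reciprocal of the current one. A small sketch (the helper name `gauss_map` is ours):

```python
import math

def gauss_map(lam):
    """Eqs. (14)-(15): next lambda is (1/lambda) mod 1, i.e. 1/lambda
    minus its integer part; lambda = 0 maps to 0."""
    if lam == 0:
        return 0.0
    inv = 1.0 / lam
    return inv - math.floor(inv)
```

For example, a weight of 0.4 maps to 2.5 − 2 = 0.5; every non-zero input stays in [0, 1), so the weight remains a valid mixing factor across iterations.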

Step 3.5: Monomolecular reactions.

Step 3.5.1: Decomposition reaction.


In this reaction, suppose \({\text{R}}=\left( {{{\text{R}}_1},{{\text{R}}_2},...,{{\text{R}}_{\text{i}}},{{\text{R}}_{\text{j}}},...,{{\text{R}}_{\text{n}}}} \right)\) is the reactant and \({{\text{C}}_{\text{k}}} \in \{ {{\text{R}}_{\text{m}}},{{\text{R}}_{\text{n}}}\}\) is an atom that can take part in the monomolecular reaction. The new atom of the molecule \({{\text{R}}_{{\text{i}},{\text{new}}}}\) is a random reactant from \(\{ {{\text{R}}_{\text{m}}},{{\text{R}}_{\text{n}}}\} \in {{\text{C}}_{\text{k}}}\), with \(\left\{ {{{\text{R}}_{\text{m}}},{{\text{R}}_{\text{n}}}} \right\} \ne ({{\text{R}}_{\text{i}}}\ {\text{and}}\ {{\text{R}}_{\text{j}}}).\)

Step 3.5.2: Redox1 reaction.


In this reaction, the new reactant is generated using the following procedure.

$${{\text{R}}_{{\text{i}},{\text{new}}}}={{\text{R}}_{\text{m}}}+{{{\varvec{\uplambda}}}_{\text{t}}}\left( {{{\text{R}}_{\text{n}}} - {{\text{R}}_{\text{m}}}} \right)$$
(16)

where \({{{\varvec{\uplambda}}}_t} \in (0,1)\) such that the initial value \({{{\varvec{\uplambda}}}_0} \notin \{ 0.0,0.25,0.5,0.75,1.0\}\), and \({{{\varvec{\uplambda}}}_{t+1}}=4{{{\varvec{\uplambda}}}_t}\left( {1 - {{{\varvec{\uplambda}}}_t}} \right)\) (the logistic map); t is increased by 1 each time the reaction is performed.
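The redox1 update of Eq. (16) together with its logistic-map weight can be sketched as follows (the helper names `logistic_map` and `redox1` are ours):

```python
def logistic_map(lam):
    """lambda_{t+1} = 4 * lambda_t * (1 - lambda_t): the chaotic
    logistic map; excluded seeds 0, 0.25, 0.5, 0.75, 1 avoid fixed cycles."""
    return 4.0 * lam * (1.0 - lam)

def redox1(r_m, r_n, lam):
    """Eq. (16): R_i_new = R_m + lambda_t * (R_n - R_m)."""
    return [a + lam * (b - a) for a, b in zip(r_m, r_n)]
```

With the excluded seed values the logistic map stays aperiodic in (0, 1), so consecutive redox1 steps use a non-repeating mixing weight.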

Step 4: Reactants Update.


In this step, a chemical equilibrium test is carried out. If the fitness of a newly generated reactant is better than that of an existing reactant, the new reactant is included in the chemical reaction process and placed into the reactant pool; the worst reactants are excluded from the reactant pool and the chemical process.

Step 5: Neighborhood operator.


In this step, the neighborhood operator is used to generate a new reactant and overcome the local optima problem.

Step 6: Termination Condition.

If ReacNum equals the user-defined maximum number of iterations, the algorithm stops its execution and outputs the optimal cluster centers; otherwise, steps 3 and 4 are repeated until the desired results are obtained or the termination condition is satisfied.

2.3 Pseudo code of the ACRO clustering algorithm

This subsection presents the main steps of proposed ACRO clustering algorithm. These steps are listed in Algorithm 1.

(Algorithm 1: pseudo code of the proposed ACRO clustering algorithm)
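Since the typeset listing is not reproduced here, the loop of Sect. 2.2 can be sketched as a simplified, hypothetical Python skeleton (flattened reactants, a fixed pool size of 8, and a synthesis-plus-swap update are our simplifying assumptions, not the paper's exact procedure):

```python
import math
import random

def acro_cluster(data, k, reac_num=200, seed=None):
    """Simplified ACRO clustering loop: a reactant is a flat list of k
    concatenated candidate centers; synthesis (Eq. 9) creates new
    reactants, the neighborhood operator swaps two positions, and the
    equilibrium test keeps the fitter pool."""
    rng = random.Random(seed)
    dim = len(data[0])

    def fitness(flat):
        centers = [flat[i * dim:(i + 1) * dim] for i in range(k)]
        return sum(min(math.dist(y, c) for c in centers) for y in data)

    # initial reactants: each concatenates k objects sampled from the data
    pool = [sum((list(rng.choice(data)) for _ in range(k)), [])
            for _ in range(8)]
    for _ in range(reac_num):
        r_i, r_j = rng.sample(pool, 2)
        lam = 0.25 + rng.random()                      # synthesis reaction
        new = [a + lam * (b - a) for a, b in zip(r_i, r_j)]
        p, q = rng.sample(range(k * dim), 2)           # neighborhood operator
        new[p], new[q] = new[q], new[p]
        worst = max(pool, key=fitness)                 # reactant update
        if fitness(new) < fitness(worst):
            pool[pool.index(worst)] = new
    best = min(pool, key=fitness)
    return [best[i * dim:(i + 1) * dim] for i in range(k)]

data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
        (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers = acro_cluster(data, k=2, reac_num=100, seed=7)
```

The pool only ever replaces its worst member with a fitter candidate, so the best intra-cluster distance in the pool is non-increasing over iterations.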

3 Other clustering algorithms

3.1 K-means algorithm

In the clustering field, the K-means algorithm, developed by MacQueen [17], is the most popular algorithm. K-means partitions n objects into k clusters that are optimal in terms of some criterion. In the K-means algorithm, the sum of Euclidean distances is taken as the similarity measure and objects are assigned to clusters according to the minimum Euclidean distance. Several variants of the K-means algorithm have been reported in the literature. Zalik developed a variant of K-means that performs clustering without pre-assignment of the number of clusters [18]; in this algorithm, the mean square error cost function is extended with a new cost function. To improve solution quality, an iterative K-means approach is reported in [19], in which the solution is improved in each iteration by removing one cluster (minus), dividing another one (plus) and re-clustering. The authors claimed that the plus and minus operations perform well in minimizing the sum of squared Euclidean distances. Tzortzis and Likas applied the MinMax method to overcome the poor initialization problem of K-means [20]. The MinMax method assigns weights to the clusters relative to their variance and optimizes them; the MinMax K-means algorithm is further improved with an iterative procedure that enables it to automatically adapt its exponent to the dataset. Malinen et al. developed the K-means* algorithm based on an inverse transformation [21]: instead of fitting the clustering model to the data, K-means* fits the data to an optimal clustering structure. Kang et al. hybridized the K-means algorithm with the mussels wandering optimization (MWO) algorithm to obtain a new swarm intelligence-based method [22]. The proposed algorithm inherits the merits of both algorithms, i.e. the local search ability of K-means and the global optimization ability of MWO. Further, the similarity-based clustering ensemble method and the weighted-incorporated similarity-based clustering ensemble method are incorporated in the proposed algorithm.
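For reference, the baseline K-means procedure described above can be sketched as a minimal Lloyd-style iteration (this is a generic textbook sketch, not code from [17] or its variants; the toy data are ours):

```python
import math
import random

def k_means(data, k, iters=50, seed=0):
    """Minimal Lloyd-style K-means: assign each object to its nearest
    center, then move each center to the mean of its assigned objects."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for y in data:
            nearest = min(range(k), key=lambda j: math.dist(y, centers[j]))
            clusters[nearest].append(y)
        for j, members in enumerate(clusters):
            if members:  # leave a center unchanged if its cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = k_means(data, k=2)
```

Because each center is the mean of a subset of the data, the result depends on the random initialization, which is exactly the sensitivity the variants cited above try to address.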

3.2 Ant colony optimization algorithm

A meta-heuristic algorithm based on the behavior of ants, called ant colony optimization (ACO), was proposed by Dorigo et al. in 1996 to optimize problem search spaces [23]. The algorithm adopts characteristics of the ant system such as positive feedback for rapid discovery of good solutions, distributed computation to avoid premature convergence, and the use of a constructive greedy heuristic to accept solutions in the early stages of the search process. Shelokar et al. adopted the ACO algorithm in the clustering field to find optimal cluster centers [24]; in this work, a pheromone matrix is used to guide the search towards the optimal solution. Several variants of the ACO algorithm have been presented in the literature for solving clustering problems. Wan et al. introduced a chaotic ant swarm approach for partitional clustering [25]; the proposed algorithm is insensitive to cluster size and density. Huang et al. hybridized the continuous ant colony optimization algorithm with the particle swarm optimization algorithm [26]. The hybrid ACO algorithm comprises four approaches: sequence, parallel, a sequence approach with an enlarged pheromone-particle table, and global best exchange. The sequence approach with an enlarged pheromone-particle table performs well compared to the other approaches. Menéndez et al. designed two medoid-based ACO clustering algorithms [27]. Both employ the ant colony procedure to determine optimal cluster centers: the goal of the Medoid-set ACO algorithm is to choose the best medoids based on distance information, while the K-Adaptive Medoid-set ACO algorithm extends it by automatically adjusting the number of clusters.

3.3 Particle swarm optimization algorithm

The PSO algorithm is based on the behavior of bird flocking and fish schooling. Cura applied the PSO algorithm to clustering search space problems [28]; the proposed algorithm is effective and applicable whether the number of clusters is known or unknown. Chuang et al. designed an accelerated chaotic particle swarm optimization algorithm [29], in which chaotic mapping techniques (logistic maps) are combined with particle swarm optimization. To enhance the exploration ability of the PSO algorithm, it has been hybridized with the big bang-big crunch algorithm [30]; the aim of this hybridization is to overcome the premature convergence problem of PSO. Further, to improve the performance of PSO, a cooperative co-evolution framework has been reported for clustering [31]; the cooperative co-evolution method works in a divide-and-conquer fashion. A hybrid PSO algorithm for effective clustering is presented in [32], in which PSO is combined with K-harmonic means and an improved cuckoo search algorithm; the proposed algorithm is quite insensitive to cluster center initialization and produces high-quality clustering results. Moreover, Hatamlou and Hatamlou hybridized PSO with a heuristic search algorithm to overcome the inadequacies of PSO [33]; in this work, particle swarm optimization is used to produce an initial solution, while the heuristic search algorithm is used to improve solution quality.

3.4 Cat swarm optimization algorithm

The cat swarm optimization (CSO) algorithm is based on the swarm intelligence of cats [34]. The algorithm works in two modes, i.e. seeking mode and tracing mode. In seeking mode the cat is resting but alert, while in tracing mode the cat is chasing a target. To improve the search ability of the CSO algorithm, Mohapatra et al. modified it using a mutation operator [35], which is used to obtain the best cat position in the search space. Further, Kumar and Sahoo combined improved cat swarm optimization with K-harmonic means to solve clustering problems [36]; in this work, an adaptive inertia weight factor and a selection mechanism are introduced to enhance the diversity and exploitation ability of the CSO algorithm. In continuation of this work, Kumar and Singh designed an improved version of the cat swarm optimization algorithm (ICSO) to solve global optimization problems [37].

3.5 Bat algorithm

A meta-heuristic algorithm based on the behavior of bats, called the bat algorithm, is presented in [38]. Microbats use echolocation to detect prey and avoid obstacles: when searching for prey, they emit short pulses and listen for echoes from their surroundings to detect objects. Senthilnath et al. applied a bat-algorithm-based approach to solve a crop clustering problem [39]. Further, to cluster large applications, a robust optimization technique based on bat behavior and k-medoids partitioning is reported in [40]; in this work, the swarm intelligence of the bat algorithm is used to guide the search. Ashish et al. designed a parallel bat algorithm using a map-reduce architecture to perform clustering on large-volume data [41]. The developed algorithm works in three modules, i.e. bat movement, fitness calculation and a reduce module. The locations of bats are identified and updated using a map function in the bat movement module; the fitness of each bat is evaluated in the fitness module; and a reduce function is applied to obtain the best bat position in the reduce module. The performance of the proposed algorithm was tested on five datasets and compared with particle swarm optimization, and the authors claimed that it works efficiently on large datasets.

4 Experimental setup

The experimental setup of the proposed work is presented in this section. The efficiency of the proposed algorithm is evaluated on several benchmark clustering datasets taken from the UCI repository. Two artificial datasets, generated in Matlab, are also considered to test the performance of the proposed algorithm. Performance is evaluated using the intra-cluster distance and f-measure parameters: the intra-cluster distance determines the quality of clustering, whereas the f-measure denotes the precision. The experimental results of the proposed algorithm are compared with the K-means, PSO, ACO, CSO, BA and CRO algorithms [17, 24, 36, 42, 43, 44, 45]. Tables 1 and 2 list the parameter settings of the proposed algorithm and the other algorithms used in this study.
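The f-measure used for evaluation can be computed as sketched below (a standard clustering f-measure; the source does not give its exact formula, so this definition and the helper name `f_measure` are our assumptions): for each true class, the best harmonic mean of precision and recall over all predicted clusters is taken, weighted by class size.

```python
def f_measure(labels_true, labels_pred):
    """Clustering f-measure: for each true class, take the best harmonic
    mean of precision and recall over all predicted clusters, weighted
    by the relative class size."""
    n = len(labels_true)
    score = 0.0
    for ci in set(labels_true):
        members_i = {t for t, l in enumerate(labels_true) if l == ci}
        best = 0.0
        for cj in set(labels_pred):
            members_j = {t for t, l in enumerate(labels_pred) if l == cj}
            nij = len(members_i & members_j)
            if nij:
                p, r = nij / len(members_j), nij / len(members_i)
                best = max(best, 2 * p * r / (p + r))
        score += len(members_i) / n * best
    return score
```

A perfect clustering scores 1.0 regardless of the arbitrary cluster labels, which makes the measure suitable for comparing algorithms that number their clusters differently.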

Table 1 Parameter settings of different algorithms
Table 2 Parameter settings of different algorithms

4.1 Experimental results

Tables 3 and 4 present the results of the proposed ACRO algorithm and the other clustering algorithms on artificial dataset1 and artificial dataset2. The proposed algorithm gives better results than all of the compared algorithms. It is also observed that the K-means algorithm obtains the worst results among all algorithms in terms of the intra-cluster distance parameter on both artificial dataset1 and artificial dataset2.

Tables 5 and 6 illustrate the results of the proposed algorithm and the other algorithms on the iris and cancer datasets. The proposed algorithm obtains better quality results than the other algorithms. Further, the K-means algorithm obtains the maximum intra-cluster distance among all algorithms on both datasets. The performance of the K-means, PSO, ACO and CSO algorithms is similar in terms of the f-measure parameter on the iris dataset, but a significant difference occurs in terms of intra-cluster distance. Further, the PSO algorithm has the worst f-measure performance among the compared algorithms on the cancer dataset.

Table 3 Simulation results of the proposed ACRO, K-means, PSO, ACO, CSO, BA and CRO algorithms using artificial dataset1
Table 4 Simulation results of the proposed ACRO, K-means, PSO, ACO, CSO, BA and CRO algorithms using artificial dataset2
Table 5 Simulation results of the proposed ACRO, K-means, PSO, ACO, CSO, BA and CRO algorithms using the iris dataset
Table 6 Simulation results of the proposed ACRO, K-means, PSO, ACO, CSO, BA and CRO algorithms using the cancer dataset

Tables 7, 8 and 9 demonstrate the results of the proposed ACRO and other algorithms on the CMC, wine and glass datasets. The proposed algorithm provides improved results in comparison to the other algorithms for the CMC and wine datasets, whereas the BA algorithm performs best on the glass dataset. Further, the K-means algorithm obtains the maximum intra-cluster distance for the CMC and wine datasets, whereas the ACO algorithm obtains the maximum intra-cluster distance for the glass dataset. For the wine and glass datasets, the K-means and ACO algorithms exhibit the worst f-measure results. In the case of the CMC dataset, PSO, ACO and CSO provide similar f-measure results.

Table 7 Simulation results of the proposed ACRO, K-means, PSO, ACO, CSO, BA and CRO algorithms using the CMC dataset
Table 8 Simulation results of the proposed ACRO, K-means, PSO, ACO, CSO, BA and CRO algorithms using the wine dataset
Table 9 Simulation results of the proposed ACRO, K-means, PSO, ACO, CSO, BA and CRO algorithms using the glass dataset

Figures 1 and 2 illustrate the dispersion of the data objects in artificial dataset 1 and artificial dataset 2, respectively. These artificial datasets are generated in Matlab; artificial dataset 1 is two dimensional, whereas artificial dataset 2 is three dimensional. Figures 3 and 4 show the partitioning of the data objects into clusters using the proposed algorithm.
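The exact Matlab generation parameters for the artificial datasets are not stated, but structurally similar 2D and 3D datasets can be produced, for example, with the following Python sketch; the centers, cluster sizes and spread below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_blobs(centers, n_per_cluster, spread):
    """Draw n_per_cluster Gaussian points around each center
    (hypothetical parameters for an illustrative synthetic dataset)."""
    centers = np.asarray(centers, dtype=float)
    X = np.vstack([c + spread * rng.standard_normal((n_per_cluster, centers.shape[1]))
                   for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_cluster)  # true cluster labels
    return X, y

# Two-dimensional analogue of artificial dataset 1
X2, y2 = make_blobs([[0, 0], [5, 5], [0, 5]], n_per_cluster=50, spread=0.8)
# Three-dimensional analogue of artificial dataset 2
X3, y3 = make_blobs([[0, 0, 0], [5, 5, 5], [0, 5, 0]], n_per_cluster=50, spread=0.8)
```

Datasets of this kind are convenient for visual inspection, since the true partition is known and the clusters can be plotted directly, as in Figs. 1, 2, 3 and 4.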

Fig. 1
figure 1

Dispersion of data objects in artificial dataset 1

Fig. 2
figure 2

Dispersion of data objects in artificial dataset 2

Fig. 3
figure 3

Clustering results of proposed ACRO algorithm on artificial dataset 1

Fig. 4
figure 4

Clustering results of proposed ACRO algorithm on artificial dataset 2

Figures 5 and 6 show the clustering of the iris dataset using the proposed artificial chemical reaction optimization algorithm. The data objects of the iris dataset are divided into three clusters: setosa, versicolour and virginica. The setosa cluster is linearly separable from versicolour and virginica, whereas the versicolour and virginica clusters are not linearly separable from each other, which can affect the performance of a clustering algorithm. Nevertheless, the proposed algorithm gives good results for all three clusters. Figure 5 illustrates the data objects of the iris dataset using the sepal width and petal width attributes, whereas Fig. 6 illustrates them using the petal length, sepal width and petal width attributes.

Fig. 5
figure 5

Clustering results of proposed ACRO algorithm on iris dataset (2D view)

Fig. 6
figure 6

Clustering results of proposed ACRO algorithm on iris dataset (3D view)

4.2 Statistical results

This subsection presents statistical tests on the proposed work. The aim of the statistical tests is to provide quantitative decisions about the proposed work: they determine whether there is enough evidence to reject the null hypothesis. If the null hypothesis is rejected, then the proposed algorithm is statistically better than the compared algorithms; otherwise, its performance is similar to that of the existing algorithms. Hence, an algorithm should perform well both experimentally and statistically. In this study, the Friedman statistical test is applied to determine whether there is a significant difference between the performance of the proposed algorithm and that of the other algorithms. Two hypotheses are formulated: the null hypothesis H0 states that the populations belong to the same sample, and H1 states that the populations belong to different samples.

  • Intra-cluster distance: The results of the Friedman statistical test for the intra-cluster distance parameter are illustrated in Tables 10 and 11. Table 10 depicts the average rank of each algorithm obtained with the Friedman test. The proposed ACRO algorithm obtains the first rank (1.5) among all algorithms, while the ACO algorithm obtains the sixth rank (5.75). The statistical results of the test are reported in Table 11. The test has 6 degrees of freedom and the critical value is 12.591587. Hence, the null hypothesis is rejected at the 0.10 level. After rejection of the null hypothesis, a post hoc test is performed between the proposed ACRO algorithm and the rest of the algorithms; in this work, the Nemenyi post hoc test is applied. Table 12 reports the unadjusted p values obtained with the Nemenyi post hoc test. It is observed that a significant difference exists between the proposed algorithm and the rest of the algorithms.

  • F-measure: Tables 13 and 14 demonstrate the results of the Friedman test using the f-measure parameter. The proposed algorithm obtains the first rank (1.4) among all compared algorithms. The statistical results of the Friedman test are reported in Table 14; the critical value is 12.591587 and the p value obtained for the test is 0.002589. Hence, the null hypothesis is rejected at the 0.10 level. After rejection of the null hypothesis, the Nemenyi post hoc test is performed between the proposed ACRO algorithm and the other algorithms; the results are reported in Table 15. It is concluded that a significant difference exists between the performance of the proposed algorithm and that of the other algorithms.
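The Friedman test and the Nemenyi critical difference used above can be sketched in Python with `scipy`; the score matrix below is illustrative, not the paper's data, and `q_alpha` must be taken from a standard studentized-range table for the chosen significance level.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Illustrative intra-cluster-distance scores (rows: datasets, columns: the
# seven compared algorithms, ACRO first); these numbers are hypothetical.
scores = np.array([
    [97.3, 102.1, 100.4, 105.9, 101.2, 99.8, 98.1],
    [2964.0, 3251.0, 3180.0, 3340.0, 3105.0, 3010.0, 2987.0],
    [5532.0, 5893.0, 5820.0, 5995.0, 5760.0, 5700.0, 5601.0],
    [16292.0, 18061.0, 16974.0, 17990.0, 16870.0, 16530.0, 16395.0],
])

# Friedman test: 7 algorithms -> 6 degrees of freedom, as in the paper
stat, p = friedmanchisquare(*scores.T)
reject_h0 = p < 0.10  # significance level used in the paper

# Average rank of each algorithm (rank 1 = best, i.e. lowest distance)
avg_ranks = rankdata(scores, axis=1).mean(axis=0)

def nemenyi_cd(k, n, q_alpha):
    """Nemenyi critical difference: two algorithms differ significantly if
    their average ranks differ by more than this value; k is the number of
    algorithms, n the number of datasets, q_alpha a tabulated critical value."""
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))
```

When the Friedman test rejects H0, pairwise comparisons of `avg_ranks` against the critical difference (or unadjusted pairwise p values, as in Tables 12 and 15) identify which algorithms differ from the proposed one.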

Table 10 Rank of each algorithm using Friedman test (intra-cluster distance)
Table 11 Friedman statistical test results using intra-cluster distance
Table 12 Results of Nemenyi post hoc test (p values without adjustment) using intra-cluster distance
Table 13 Rank of each algorithm using Friedman test (F-measure)
Table 14 Friedman statistical test results using F-measure
Table 15 Results of Nemenyi post hoc test (p values without adjustment) using F-measure

5 Conclusion

In this work, an artificial chemical reaction optimization (ACRO) algorithm is presented for solving partitional clustering problems. The proposed algorithm is inspired by the chemical reaction process. In this algorithm, reactants are used to search for the optimal solution, and these reactants are initialized uniformly from the search space; candidate solutions of the problem are represented by the reactants. The main task of the ACRO algorithm is to determine optimal cluster centroids for partitional clustering problems: the reactants represent the initial cluster centers, which are selected uniformly from the dataset, and the algorithm optimizes the values of the initially chosen reactants through its various steps. The performance of the proposed algorithm is tested on several real-life clustering problems and compared with state-of-the-art clustering algorithms. The simulation results show that the proposed algorithm achieves better clustering results than the other clustering algorithms. Finally, it is concluded that the proposed ACRO algorithm is an efficient and effective algorithm for solving partitional clustering problems.
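The reactant encoding summarized above can be sketched as follows. This is an illustrative Python outline, not the paper's implementation: each reactant is a matrix of candidate centroids drawn uniformly from the data's attribute ranges, and its fitness is the intra-cluster distance of the partition it induces.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_reactants(X, n_clusters, n_reactants):
    """Each reactant encodes one candidate solution: a (k x d) matrix of
    cluster centroids sampled uniformly within the attribute ranges of the
    dataset (a sketch of the uniform initialization described in the text)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return rng.uniform(lo, hi, size=(n_reactants, n_clusters, X.shape[1]))

def fitness(X, centroids):
    """Intra-cluster distance of the partition induced by the centroids:
    each object is assigned to its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.min(axis=1).sum()
```

A full ACRO run would then repeatedly apply the chemical-reaction operators to these reactants and retain the fittest one as the final set of cluster centroids.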