1 Introduction

In the field of data analytics, clustering is a well-known data analysis method for identifying similar data objects and grouping them into the same cluster [1, 2]. A cluster consists of similar data objects that are dissimilar to the objects present in other clusters [3]. The clustering task can be categorized into partitional, hierarchical, grid-based, density-based, and model-based clustering [4,5,6]. Partitional clustering aims to divide the set of data objects into distinct clusters based on dissimilarity measures. Hierarchical clustering corresponds to a tree structure of clusters in which each node of the tree acts as a cluster. It comprises two approaches: agglomerative (bottom up) and divisive (top down). In the agglomerative approach, each data point initially belongs to a separate cluster; as the process goes on, data points with similar characteristics are merged into a cluster. In the divisive approach, all data points initially belong to one cluster and are repeatedly divided into smaller clusters based on dissimilarity criteria. Grid-based clustering quantizes the data space into a finite number of cells and generates a grid-like structure. In density-based clustering, clusters are designed on the basis of data compactness, computed through the number of data points present within a given radius; the density of data points is then used to construct the clusters. Model-based clustering generates clusters using a probability distribution function, with each component representing a cluster. Clustering techniques have proved their potential in various fields such as image segmentation, stock market analysis, pattern recognition, outlier detection, feature extraction, and medical data analysis [7,8,9].

Recently, meta-heuristic algorithms have been widely adopted in the field of data clustering for obtaining optimum clustering solutions [10]. These algorithms are inspired by swarm intelligence and insect behaviour, like particle swarm optimization (PSO) [11, 12], artificial bee colony (ABC) [13,14,15], and ant colony optimization (ACO) [16, 17]; well-known physics laws, like the magnetic optimization algorithm (MOA) [1], charged system search (CSS) [18], black hole (BH) [19], and big bang-big crunch (BB-BC) [20, 21]; chemical processes, like artificial chemical reaction optimization (ACRO) [22]; evolutionary algorithms [23], like the genetic algorithm (GA), genetic programming (GP), and the biogeography based algorithm (BBA) [24]; animal behaviour based algorithms, like grey wolf optimization (GWO) [25], elephant herd optimization [26], cat swarm optimization [27], and the lion optimizer [28]; and population based algorithms, like the sine cosine algorithm [29], the stochastic fractal search algorithm [30], and the thermal exchange optimization algorithm [31]. These algorithms differ from each other in their local and global search mechanisms, with different mechanisms adopted for computing local and global optimum solutions. Some algorithms have strong local search ability, for example CSO, BB-BC, SCA, and BH, while the rest have strong global search ability, like PSO, ABC, BBA, and GWO [10]. However, it is observed that to obtain the optimal solution, local and global search abilities should be balanced [12]. Further, Abraham et al. [32] stated that data clustering should minimize the dissimilarity between data points within a cluster and maximize the dissimilarity between data points of different clusters. Several clustering algorithms have been designed, such as k-means, c-means, tabu search, and simulated annealing, but it is observed that these algorithms are sensitive to initial solutions and thus easily trapped in local optima. For example, Choudhury et al. [33] designed an entropy based method to determine the initial solutions for the k-means algorithm, with the aim of overcoming the dependency of k-means on initial solutions. Moreover, Torrente and Romo [34] also considered the initialization issue of k-means and developed a new initialization method based on the concepts of bootstrap and data depth for computing optimal initial solutions. It is also noticed that traditional clustering algorithms face difficulty with complex and large datasets; this issue is effectively addressed through metaheuristic algorithms. For example, Ahmadi et al. [35] designed a clustering algorithm based on the grey wolf optimization method to tackle data clustering problems, especially with large datasets; several modifications, like local search and a balancing factor, are incorporated into grey wolf optimization to effectively handle the clustering problem. Ghany et al. [36] developed a hybrid clustering algorithm based on the whale optimization algorithm (WOA) and tabu search (TS) for solving data clustering; the reason for hybridization is to overcome local optima and to improve the quality of clustering solutions. Their results stated that the hybridization of WOA and TS successfully handles the aforementioned issues.

Sorensen presented a critical evaluation of various well-known metaheuristic algorithms [37]. It was argued that researchers should focus on the actual mechanism behind the underlying concept rather than developing new metaheuristic algorithms, and should concentrate on promising research directions in the field. Keeping this in mind, rather than designing a new metaheuristic algorithm, this work considers an existing metaheuristic algorithm, i.e., the bat algorithm, for solving data clustering problems. Recently, the bat algorithm has become popular in the research community and provides optimal solutions for various optimization problems [38,39,40]. The bat algorithm was developed by Yang et al. [38] based on the behaviour of micro-bats, especially their echolocation feature. Micro-bats use echolocation to detect prey (food) and avoid obstacles: to detect prey, a micro-bat emits a short pulse, and the resulting echo allows it to recognize the shape and size of the prey. However, several performance issues are associated with the bat algorithm [40,41,42]. These issues are convergence rate, local optima, population initialization, and the trade-off factor between local and global searches. In turn, the bat algorithm converges to a near-optimal solution instead of the optimal solution. The issues related to the performance of the bat algorithm are summarized as follows:

  • Population Initialization: The initial population has a significant impact on the success of clustering algorithms [42, 43]. Since meta-heuristic algorithms select the initial population using a random function, a premature convergence problem can occur if the initial population is not selected in an effective manner.

  • Local optima: It is noticed that sometimes the population of the bat algorithm is not updated in an effective manner [39, 44]. In turn, the objective function returns the same value in successive iterations, and the algorithm finally converges to a solution that is not the optimal one. This situation is called local optima, and it occurs due to the lack of an appropriate mechanism to update the population of micro-bats.

  • Convergence Rate: The convergence rate of an algorithm depends on the optimization process and the exploration of the search space [45, 46]. The convergence rate can also be affected by a lack of coordination between the exploration (global search) and exploitation (local search) processes.

The contributions of this work are as follows:

  1. An enhanced cooperative co-evolution method is developed to handle the population initialization issue.

  2. An elitist strategy is developed for improving the convergence rate.

  3. A limit operator is incorporated to detect the local optima situation in the algorithm.

  4. A neighbourhood search mechanism is developed for exploring optimal candidate solutions during the exploration process.

  5. The proposed bat algorithm is applied to solve clustering problems.

2 Related works

The recent works reported on partitional clustering algorithms are summarized in this section. Over the past few decades, a number of clustering algorithms have been developed for obtaining optimum partitional clustering results. A few of them are discussed below.

To determine the best initial population and the number of clusters automatically, Rahman and Islam [47] designed a hybrid algorithm based on K-means (KM) and the genetic algorithm (GA). The genetic algorithm was applied to determine optimized initial cluster centres for KM, and the fuzzy c-means algorithm was adopted to obtain the final clustering results. The performance of the proposed algorithm was assessed on twenty datasets and compared against well-known clustering techniques. It was claimed that fuzzy c-means with GA gives better clustering results.

Liu et al. [48] presented a clone selection algorithm for addressing automatic clustering, in which the number of clusters is detected automatically. The authors introduced a genetic operator for detecting the number of clusters. Twenty-three well-known datasets were selected for measuring the performance of the clone selection algorithm, and the results were compared with the ILS, ACDE, VGA, and DCPSO algorithms. The authors claimed that the proposed algorithm provides better results without prior knowledge of the number of clusters.

A two-step artificial bee colony algorithm was reported for obtaining optimal clustering results [49]. Prior to implementation, three improvements were incorporated into the ABC algorithm to make it more robust and efficient: initial cluster centre locations, an updated search mechanism and equations, and abandoned food source handling. The initial cluster centre locations are determined through a one-step KM method, a PSO based search mechanism is used for exploring the promising search space, and the Hooke and Jeeves concept is considered for evaluating abandoned food source locations. The performance of the proposed two-step ABC algorithm was tested on both artificial and benchmark datasets and compared with well-known clustering algorithms. It was observed from the results that the proposed algorithm significantly improves the performance of the conventional ABC algorithm.

Cao et al. [50] developed a new initialization method based on a neighbourhood rough set model. The intra-cluster and inter-cluster similarities of an object were represented in terms of cohesion and coupling degrees. Furthermore, the method was integrated with the KM algorithm for improving clustering results. The efficacy of the proposed algorithm was tested on three datasets and compared with two other initialization algorithms; the proposed initialization method provides superior results over traditional methods.

Han et al. [51] adopted a new diversity mechanism in the gravitational search algorithm to handle clustering problems. The collective response of birds was used to design the diversity mechanism, implemented through three simple steps: (i) initialization, (ii) identification of the nearest neighbours, and (iii) orientation alteration. The candidate population is generated in the initialization step, the second step evaluates the nearest neighbours through a neighbourhood strategy, and the third step changes the current location of a candidate solution based on its nearest neighbour. Thirteen datasets were chosen for evaluating the performance of the algorithm, and the simulation results were compared with well-known clustering algorithms. The authors claimed that the proposed algorithm achieves superior clustering results.

Senthilnath et al. [52] introduced a two-phase firefly algorithm (FA) for the clustering task. This algorithm simulates the flashing pattern and social insect behaviour of fireflies. The first phase of the algorithm measures the variation of light intensity, while the second phase governs the movement of the fireflies. The efficiency of the firefly algorithm was assessed on thirteen standard datasets and compared with ABC and PSO. The simulation results favour the FA algorithm in the clustering field.

To handle the initialization issue of the K-means algorithm, Erisoglu et al. [53] developed a new initialization method based on a bi-dimensional mapping of features. Initially, two features are chosen: the first feature is the attribute with the maximum value of the variation coefficient, called the main axis, and the second feature is determined using the correlation values between the main axis and the remaining attributes, i.e., the attribute with minimum correlation. Several benchmark datasets were used to evaluate the performance of the proposed algorithm, and the simulation results proved that the proposed method performs significantly better than the KM algorithm.

Kumar and Sahoo [54] hybridized the MCSS algorithm with PSO. The personal best mechanism of the PSO algorithm was added to the magnetic charge system search algorithm, and a neighbourhood strategy was also introduced to avoid the local optima situation. Ten datasets were selected for evaluating the performance of the MCSS-PSO algorithm, and the results were compared with a wide range of clustering algorithms. The authors claimed that better quality results are achieved by the MCSS-PSO algorithm.

Zhou et al. [55] introduced a simplex method-based SSO algorithm for solving the clustering task. In this work, the simplex method is incorporated into the SSO algorithm to enhance local search ability and improve convergence speed. Eleven datasets were considered for evaluating the simulation results of the proposed algorithm, which were compared with well-known clustering algorithms. The proposed SSO algorithm performs well in terms of accuracy, robustness, and convergence speed.

Boushaki et al. [56] designed a new quantum chaotic cuckoo search (CS) algorithm for the clustering task. To extend the global search ability of the quantum chaotic cuckoo search algorithm, a nonhomogeneous update mechanism was employed, and chaotic maps were incorporated to improve convergence speed. The performance of the algorithm was compared with different variants of the CS algorithm and hybrid variants of clustering algorithms. The authors claimed that the proposed CS algorithm provides more compact clusters than the other algorithms.

A combination of GA and a message-based similarity (MBS) measure, called GAMBS, was presented for effective cluster analysis by Chand et al. [57]. The MBS measure consists of two types of messages, responsibility and availability, which are exchanged among data points and cluster centres: responsibility measures the evidence regarding cluster centres, while availability corresponds to the appropriateness of a data point with respect to the clusters. Further, GAMBS uses a variable-length real-valued chromosome representation and evolutionary operators. Artificial and real-life datasets were adopted for measuring the performance of the GAMBS algorithm, and the simulation results showed that the algorithm obtains significant clustering results.

Hatamlou [23] developed a new clustering algorithm inspired by the black hole (BH) phenomenon. Like other clustering algorithms, the BH algorithm starts with initial population selection and objective function evaluation. The performance of the proposed algorithm was tested on six benchmark datasets, and it was stated that the black hole clustering algorithm provides better clustering results.

Zhang et al. [58] presented an ABC algorithm for data clustering, in which the onlooker and employed bees are responsible for local search, while the scout bees are responsible for global search. Further, Deb's rule is incorporated to redirect the search in the solution space. The performance was tested on three real-life datasets and compared with other clustering algorithms; the results revealed that the proposed algorithm provides good quality results.

Taherdangkoo et al. [59] reported a new blind naked mole rats algorithm in the clustering field. This algorithm considers the food search capability and colony protection characteristics of mole rats. The algorithm starts by initializing the population of mole rats and searches the entire space for the optimal solution in a random fashion. In subsequent iterations, employed mole rats start moving towards the target food source and their neighbours. The performance of the proposed algorithm was tested on six standard datasets and compared with other well-known clustering algorithms. The results revealed that the blind naked mole rats algorithm provides higher accuracy with faster convergence speed.

Hatamlou [60] considered the slow convergence rate of the binary search algorithm and designed a new algorithm for cluster analysis. This algorithm chooses initial cluster points from different locations, and the search direction is based on successive objective function values: if the current objective function value is better than the previous one, the search proceeds in the same direction, otherwise in the opposite direction. Six benchmark datasets were chosen for evaluating the efficacy of the proposed algorithm, and the results were compared with the KM, GA, SA, TS, ACO, HBMO, and PSO algorithms. The proposed algorithm provides superior clustering results.

Bijari et al. [61] presented a memory-enriched BB-BC algorithm for clustering. It works in two phases: the BB phase generates random points near the initial seed points, while the BC phase optimizes these generated points. Since the BB-BC algorithm is memoryless, a memory concept is integrated into it for memorizing the best location and for maintaining the balance between the exploration and exploitation tasks. The performance of the algorithm was tested on six datasets and compared with well-known algorithms like GA, PSO, GWO, and the original BB-BC. The results stated that the clustering results are improved significantly.

Abualigah et al. [62] combined the krill herd (KH) optimization algorithm with harmony search (HS) to overcome the local optima problem in clustering. A global exploration operator and a reproduction procedure were integrated into the krill herd algorithm. Seven standard datasets were selected for measuring the performance of the proposed algorithm, and the results were compared with the GA, PSO, HS, KHA, H-GA, and H-PSO algorithms. The authors claimed that the proposed combination (KH + HS) achieves more accurate clustering results.

Pakrashi and Chaudhuri [63] hybridized the Kalman filtering algorithm with the KM algorithm. In this work, the authors addressed the slow convergence rate of the KM algorithm by improving it with the help of Kalman filtering. Further, a conditional restart mechanism was incorporated into K-means to handle the local optima situation. Seven benchmark datasets were taken for evaluating the performance of the proposed algorithm, and the results were compared with the HKA, KGA, GAC, ABCC, and PSO algorithms. It was noticed that the Kalman filtering algorithm successfully overcomes the deficiency of the KM algorithm.

Kang et al. [64] hybridized KM and the mussels wandering optimization algorithm, called K-MWO. The proposed algorithm combines the local search ability of KM with the global search accomplished through mussels wandering optimization. The performance was tested on nine datasets, and the results were compared with the K-M and K-PSO algorithms. The authors claimed that K-MWO is an effective clustering algorithm.

To solve clustering search space problems, Wang et al. [65] presented a hybrid version of the flower pollination algorithm (FPA) and the bee pollinator algorithm (BPA). The discard pollen operator of ABC is used to enhance the global search ability of the flower pollination algorithm, while the local search mechanism is improved through elite mutation and crossover operators. Several artificial and benchmark datasets were selected for measuring the performance of the proposed algorithm, and the simulation results were compared with the KM, FPA, CS, PSO, ABC, and DE algorithms. The results proved that the combination of FPA and BPA provides more optimal results than the others.

Hatamlou and Hatamlou [66] designed a two-stage clustering approach to overcome the drawbacks of particle swarm optimization, namely local optima and slow convergence speed. In the first stage, the PSO algorithm is adopted for generating the initial candidate solution; in the second stage, the HS algorithm is considered for improving the quality of the solution. Seven datasets were chosen for measuring the performance of the proposed algorithm, and the results were compared with the KM, PSO, GSA, and BB-BC methods. It was seen that the proposed algorithm determines good quality clusters.

A hybrid version of the ABC algorithm with the genetic algorithm was also presented by Yan et al. [67] for enhancing the information exchange mechanism among bees, and applied to solve data clustering problems. The information exchange mechanism is enhanced with the help of a crossover operator. Six standard datasets were adopted for evaluating the simulation results of the proposed ABC algorithm, and the results were compared with the ABC, CABC, PSO, CPSO, and GA clustering algorithms. The proposed ABC algorithm achieves better clustering results than the others.

To perform efficient clustering, Kwedlo [68] combined the differential evolution (DE) algorithm with KM. The KM algorithm is used to tune candidate solutions generated through the mutation and crossover operators of DE. Additionally, a reordering procedure is introduced to handle redundant solutions. The performance of the proposed algorithm was compared with five other well-known clustering algorithms, and it was noticed that the DE-KM algorithm gives state-of-the-art clustering results.

Yin et al. [69] presented a hybridized version of an improved GSA with KHM for solving clustering problems. This work combines the fast convergence of KHM with the diversity mechanism of GSA to develop the new algorithm. The performance of the proposed algorithm was tested on seven benchmark datasets and compared with other well-known clustering algorithms. The authors claimed that the KHM-GSA combination achieves better convergence.

A hybrid version of the ant algorithm was presented for handling clustering problems [70]. The KHM algorithm is used to hybridize the ant algorithm, giving KHM-Ant. The proposed algorithm combines the merits of both algorithms, such as the initialization characteristic of KHM and the local optima avoidance characteristic of the ant algorithm. Five benchmark datasets were considered for measuring the performance of the KHM-Ant algorithm, and the simulation results were compared with KHM and ACA. The authors claimed that more optimal results are achieved by the KHM-Ant algorithm.

Xiao et al. [71] developed a quantum-inspired GA (QGA) for partitional clustering. In this work, a Q-bit based representation and the rotation operation of quantum gates are applied to achieve better search mechanisms. Several standard and simulated datasets were selected for evaluating the performance of the QGA algorithm. The QGA is able to find optimal clusters without prior knowledge of the number of cluster centres.

Aljarah et al. [72] hybridized the grey wolf optimizer (GWO) with tabu search (TS) for cluster analysis. TS is incorporated as an operator in GWO for searching the neighbourhood, which helps in balancing the exploration and exploitation of GWO. The proposed GWOTS was tested on thirteen real datasets, and the results were compared with other popular metaheuristics. The experimental results show that GWOTS is superior in terms of convergence behaviour and optimality of results.

A PSO based clustering algorithm was presented in [73]. The concept of cooperative co-evolution is incorporated into PSO for improving the convergence rate and diversity: the cooperative co-evolution method works as a decomposer and the PSO algorithm as an optimizer. Standard and simulated datasets were selected for measuring the performance of the algorithm, which was compared with the SRPSO, ACO, ABC, DE, and KM algorithms. The concept of cooperative co-evolution improves the performance of PSO in a significant manner.

To solve clustering problems effectively, an improved CSO algorithm was reported in [74]. Several modifications, described in terms of the search equations, are incorporated into the CSO algorithm to make it effective. Further, a local search method is developed for handling the local optima problem. The performance was evaluated on five datasets and compared with several known clustering algorithms; the simulation results showed that the improved CSO obtains effective clustering results.

A classroom teaching based meta-heuristic algorithm was also presented for handling clustering problems [75]. The properties of the K-means algorithm were also investigated for effective clustering results [76]: six benchmark datasets were used to evaluate the performance of the aforementioned algorithm, with performance measured in terms of overlapping, number of clusters, dimensionality, and cluster size. An intelligent system for spam detection was presented in [77], in which more relevant features were identified using evolutionary random weight networks. Table 1 gives a summary of the various studies in the literature.

Table 1 Summary of recent state-of-the-art works on partitional clustering

3 Bat algorithm

The bat algorithm is based on the echolocation behaviour of micro-bats, especially prey detection and obstacle avoidance [38]. In search of prey, a micro-bat emits short pulses and uses the echo from nearby objects to determine their shape and size. The loudness, emission rate, and random variable of the bat algorithm are initialized using Eqs. (1)–(2).

$$A_i^{t+1}=\alpha\,A_i^t$$
(1)
$$r_i^{t+1}=\left[1-\exp\left(-\alpha\right)\right]$$
(2)

Where \(A_i^t\) is the loudness, \(r_i^t\) is the pulse emission rate, and α is a user-specified variable with a value in [0, 1]. The frequency and velocity of the bats are computed using Eqs. (3)–(4).

$$f_i^t=f_{\min}^t+\left(f_{\max}^t-f_{\min}^t\right)\mathrm{rand}()$$
(3)
$$v_i^t=v_i^{t-1}+\left(x_i^t-x_{\ast}\right)f_i^t$$
(4)

Where \(f_{\min}^t\) and \(f_{\max}^t\) denote the lowest and highest frequencies at time stamp t, rand() is a random function returning values in [0, 1], \(v_i^{t-1}\) is the previous velocity, \(x_i^t\) is the current position, and \(x_{\ast}\) is the current best position. The positions of the bats are updated using Eqs. (5)–(6).

$$x_i^{t+1}=X_{\mathrm{new}}+v_i^t$$
(5)
$$X_{\mathrm{new}}=x_i^t+\mathrm{randi}\left[-1,1\right]A_i^t$$
(6)

Where \(X_{\mathrm{new}}\) is the new position and \(x_i^{t+1}\) is the final updated position.
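To make the update rules concrete, the following Python fragment sketches one iteration of Eqs. (1)–(6). It is a minimal illustration, not the authors' implementation: the function name `bat_step`, the array shapes, and the default parameter values (`f_min`, `f_max`, `alpha`) are assumptions chosen for the example, and Eq. (2) is coded exactly as simplified above.

```python
import numpy as np

def bat_step(X, V, x_best, A, r, f_min=0.0, f_max=2.0, alpha=0.5, rng=None):
    """One iteration of the basic bat update, Eqs. (1)-(6).

    X: (n, d) positions, V: (n, d) velocities, x_best: (d,) best position,
    A: (n,) loudness values, r: (n,) pulse emission rates.
    """
    rng = rng or np.random.default_rng()
    n, d = X.shape

    f = f_min + (f_max - f_min) * rng.random((n, 1))          # Eq. (3)
    V = V + (X - x_best) * f                                  # Eq. (4)
    X_new = X + rng.uniform(-1.0, 1.0, (n, d)) * A[:, None]   # Eq. (6)
    X = X_new + V                                             # Eq. (5)

    A = alpha * A                                 # Eq. (1): loudness decays
    r = np.full(n, 1.0 - np.exp(-alpha))          # Eq. (2), as written above
    return X, V, A, r
```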

4 Improved Bat algorithm for cluster analysis

This section presents the proposed improvements in the bat algorithm. These improvements are (i) an enhanced cooperative co-evolution method for population initialization, (ii) an elitist strategy for a better convergence rate and a trade-off between local and global search, and (iii) a neighborhood strategy to avoid local optima and explore good candidate solutions.

4.1 Enhanced cooperative co-evolution method

It is observed that the efficiency of a clustering algorithm also depends on the initial cluster points [42, 44]. Several initialization methods have been reported to address the initial cluster selection problem [47,48,49,50]. To improve the performance of the bat algorithm, an enhanced cooperative co-evolution framework is introduced to select the initial cluster centers. The cooperative co-evolution method works on the divide-and-conquer paradigm: the problem is divided into subproblems, each subproblem is solved individually, and the final solution is obtained by combining the subproblem solutions. Hence, in this work, a cooperative co-evolution method with a centroid selection mechanism is proposed. This method considers the number of partitions, the partition size, and the selection criteria for population initialization.

4.1.1 Population partitions and size description

This subsection describes the number of partitions and their size used to implement the cooperative co-evolution method. The first task is to divide the population (data instances) into several predefined partitions. The number of partitions equals the number of clusters (K) for a given dataset, as given in Eq. 7.

$$p_n\propto K$$
(7)

Where \(p_n\) denotes the number of partitions and K denotes the number of clusters. The size of each subpopulation is obtained through Eq. 8.

$$p_s=\mathrm{T}/K$$
(8)

Where T is the total population size, i.e., the number of data instances in the dataset, K is the number of clusters, and \(p_s\) denotes the size of the subpopulations. The index ranges of the subpopulations are computed through Eq. 9.

$$\begin{cases}p_{s1}=1\ \text{to}\ \left\lceil p_s\right\rceil\\ p_{s2}=\left\lceil \mathrm{UB}\left(p_{s1}\right)+1\right\rceil\ \text{to}\ \left\lceil p_{s1}+p_s\right\rceil\\ \quad\vdots\\ p_{s(n-1)}=\left\lceil \mathrm{UB}\left(p_{s(n-2)}\right)+1\right\rceil\ \text{to}\ \left\lceil p_{s2}\right\rceil+\left\lceil p_{s1}\right\rceil+\left\lceil p_s\right\rceil\\ p_{sn}=\left\lceil \mathrm{UB}\left(p_{s(n-1)}\right)+1\right\rceil\ \text{to}\ \sum\limits_{n=0}^{K-1}\left\lceil p_s\right\rceil\end{cases}\qquad n=\{1,2,3,\dots,K\}$$
(9)

In Eq. 9, \(p_{s1}, p_{s2},\dots,p_{sn}\) denote the subpopulations, and UB represents the upper bound (last index) of a subpopulation. Further, from each subpopulation an appropriate centroid is selected using Eq. 10.

$$C_{kn}=\min\left(p_{sn}\right)+\left(\max\left(p_{sn}\right)-\min\left(p_{sn}\right)\right)\ast \mathrm{rand}(0,1);\quad \text{where}\ n=1\ \text{to}\ K$$
(10)

Where \(C_{kn}\) denotes the kth cluster center, \(\min(p_{sn})\) and \(\max(p_{sn})\) are the attribute-wise minimum and maximum values of the nth subpopulation \(p_{sn}\), and rand(0, 1) is a random number.

The working of the cooperative co-evolution method is illustrated using the iris dataset. This dataset contains one hundred fifty data instances and four attributes, divided into three classes; hence, the number of clusters (K) considered for the iris dataset is three. The cooperative co-evolution method uses Eqs. (7)–(10) to determine the initial population for the clustering algorithm in terms of cluster centers; a code sketch of this procedure follows the worked example below.

  a) The first step is to divide the population into subpopulations using Eq. 7.

    $$p_n\propto K;\ \text{for the iris dataset},\ K=3,\ \text{so}\ p_n=3$$
  b) In the second step, the size of the subpopulations is computed through Eq. 8.

    $$p_s=\mathrm{T}/K,\ \text{where}\ \mathrm{T}=150\ \text{and}\ K=3;\quad p_s=150/3=50$$

The size of each subpopulation \(p_s\) is 50. Further, the index ranges of the subpopulations are determined using Eq. 9.

$$\begin{aligned}&p_{s1}=1\ \text{to}\ \left\lceil p_s\right\rceil,\ \text{with}\ p_s=50:\quad p_{s1}=1\ \text{to}\ 50,\ \mathrm{LB}\left(p_{s1}\right)=1,\ \mathrm{UB}\left(p_{s1}\right)=50;\\ &p_{s2}=\left\lceil \mathrm{UB}\left(p_{s1}\right)+1\right\rceil\ \text{to}\ \left\lceil p_{s1}+p_s\right\rceil:\quad p_{s2}=51\ \text{to}\ 100,\ \mathrm{LB}\left(p_{s2}\right)=51,\ \mathrm{UB}\left(p_{s2}\right)=100;\\ &p_{s3}=\left\lceil \mathrm{UB}\left(p_{s2}\right)+1\right\rceil\ \text{to}\ 150:\quad p_{s3}=101\ \text{to}\ 150.\end{aligned}$$
  c) In the third step, Eq. 10 is used for computing the initial cluster centers for the clustering algorithm.

  • When n = 1: \(c_{k1}=\min(1:50)+\left(\max(1:50)-\min(1:50)\right)\ast \mathrm{rand}()\) = {5.5221, 4.0109, 1.7333, 0.5074}

  • When n = 2: \(c_{k2}=\min(51:100)+\left(\max(51:100)-\min(51:100)\right)\ast \mathrm{rand}()\) = {6.8022, 3.2681, 4.9022, 1.7246}

  • When n = 3: \(c_{k3}=\min(101:150)+\left(\max(101:150)-\min(101:150)\right)\ast \mathrm{rand}()\) = {5.2810, 2.4032, 4.8048, 1.5397}
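The worked example above can be reproduced with a short sketch of Eqs. (7)–(10). This is an illustrative implementation under stated assumptions: the function name `init_centroids` is hypothetical, contiguous index blocks are used for Eq. (9), and the iris data is loaded through scikit-learn only to mirror the example (any T × d array works).

```python
import numpy as np
from sklearn.datasets import load_iris

def init_centroids(data, K, rng=None):
    """Cooperative co-evolution initialization, Eqs. (7)-(10): split the
    instances into K contiguous subpopulations and draw one centroid per
    subpopulation between its attribute-wise minimum and maximum."""
    rng = rng or np.random.default_rng()
    T = data.shape[0]
    p_s = int(np.ceil(T / K))                 # Eq. (8): subpopulation size
    centroids = []
    for n in range(K):                        # Eq. (9): contiguous index ranges
        block = data[n * p_s : min((n + 1) * p_s, T)]
        lo, hi = block.min(axis=0), block.max(axis=0)
        centroids.append(lo + (hi - lo) * rng.random(data.shape[1]))  # Eq. (10)
    return np.array(centroids)

X = load_iris().data                          # 150 instances, 4 attributes
print(init_centroids(X, K=3))                 # three centers, one per block
```

Because rand() draws fresh values, each run yields different centroids within the attribute-wise bounds of each block, as in the three example vectors above.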

4.2 Elitist strategy

This subsection discusses another important aspect of clustering: convergence speed. The convergence rate depends on the search pattern of the algorithm. To improve convergence speed, an improved elitist strategy is incorporated into the bat algorithm. According to the elitist strategy, the best positions move from the previous iteration to the next iteration. In this work, the elitist strategy is implemented in two phases: an evaluation phase and an updating phase.

4.2.1 Evaluation phase

In this phase, the personal best and global best positions are computed using Eqs. (11)–(12). The comparison operator is used to calculate the global best position \(X_{\mathrm{Gbest}}\) and the personal best position \(X_{\mathrm{Pbest}}\).

$$X_{\mathrm{Pbest}}=\min\left(\text{fitness value}\right)$$
(11)
$$X_{\mathrm{Gbest}}=\min\left(\text{distance value}\right)$$
(12)

The personal best \(X_{\mathrm{Pbest}}\) is obtained using the fitness function described in Eq. 13.

$$F\left(C_k\right)=\sum_{k=1}^{K}\frac{\mathrm{SSE}\left(C_k\right)}{\sum_{k=1}^{K}\mathrm{SSE}\left(C_k\right)}$$
(13)

Where SSE denotes the sum of squared errors and \(C_k\) represents the kth centroid. After evaluation of the fitness function, the minimum value is selected as the personal best. In the next step, the global best \(X_{\mathrm{Gbest}}\) is evaluated using Eq. 12 as the minimum value of the distance (objective) function.

4.2.2 Updating phase

In this phase, the personal best and global best positions are compared with the values from the previous iteration. If the current values are better than the previous values, then the positions are updated using Eqs. (14)–(15); otherwise, the previous values are retained.

$$X_{\mathrm{Pbest}}=\begin{cases}X_{\mathrm{Pbest}}^{t} & \text{if}\ \mathrm{fit}(t)\le \mathrm{fit}(t-1)\\ X_{\mathrm{Pbest}}^{t-1} & \text{otherwise}\end{cases}$$
(14)
$$X_{\mathrm{Gbest}}=\begin{cases}X_{\mathrm{Gbest}}^{t} & \text{if}\ s(t)\le s(t-1)\\ X_{\mathrm{Gbest}}^{t-1} & \text{otherwise}\end{cases}$$
(15)

To achieve an optimum trade-off among the search mechanisms, the basic frequency, velocity, and search equations of the bat algorithm are modified using Eqs. (16)–(19).

$$f_i^t=\frac{\min\left(X_{\mathrm{Gbest}}^t\right)+\left(\max\left(X_{\mathrm{Gbest}}^t\right)-\min\left(X_{\mathrm{Gbest}}^t\right)\right)\beta}{\max\left(X_{\mathrm{Pbest}}^t\right)}$$
(16)
$$v_i^t=v_i^{t-1}+\left(X_{\mathrm{Gbest}}^t-X_{\mathrm{Pbest}}^t\right)f_i^t$$
(17)
$$X_{\mathrm{new}}=X_{\mathrm{Gbest}}^t+\mathrm{randi}\left[-1,1\right]$$
(18)
$$X_i^t=\begin{cases}X_{\mathrm{Gbest}} & \text{if}\ \mathrm{rand}()>r_i\\ X_{\mathrm{new}}+v_i^t & \text{otherwise}\end{cases}$$
(19)

where \(f_i^t\) represents the frequency of the ith bat, \(v_i^t\) denotes the velocity of the ith bat, and \(X_i^t\) represents the position of the ith bat; \(\min\left(X_{\mathrm{Gbest}}^t\right)\) and \(\max\left(X_{\mathrm{Gbest}}^t\right)\) denote the minimum and maximum values of the sum function associated with the \(X_{\mathrm{Gbest}}^t\) position; \(\max\left(X_{\mathrm{Pbest}}^t\right)\) represents the maximum value of the fitness function associated with the personal best position; and β denotes a random value in [0, 1].
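A compact sketch of the two elitist phases may help; the following Python fragment implements Eqs. (14)–(19) for a single bat. It is a sketch under assumptions: the helper name `elitist_step` is hypothetical, and the "sum function" of Eq. (16) is approximated here by the minimum and maximum entries of the \(X_{\mathrm{Gbest}}\) position vector.

```python
import numpy as np

def elitist_step(X_pb_prev, X_pb_cur, fit_prev, fit_cur,
                 X_gb_prev, X_gb_cur, s_prev, s_cur,
                 v_prev, r_i, rng=None):
    """One elitist update, Eqs. (14)-(19); positions are (d,) arrays."""
    rng = rng or np.random.default_rng()

    X_pbest = X_pb_cur if fit_cur <= fit_prev else X_pb_prev   # Eq. (14)
    X_gbest = X_gb_cur if s_cur <= s_prev else X_gb_prev       # Eq. (15)

    beta = rng.random()
    f = (X_gbest.min() + (X_gbest.max() - X_gbest.min()) * beta) \
        / X_pbest.max()                                        # Eq. (16)
    v = v_prev + (X_gbest - X_pbest) * f                       # Eq. (17)
    X_new = X_gbest + rng.uniform(-1.0, 1.0, X_gbest.shape)    # Eq. (18)
    X = X_gbest if rng.random() > r_i else X_new + v           # Eq. (19)
    return X, v, X_pbest, X_gbest
```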

4.3 Q-learning based neighborhood search mechanism

This subsection describes the Q-learning based neighborhood search for handling the local optima problem of the bat algorithm. The performance of clustering algorithms can degrade due to the local optima issue [78], and various strategies have been developed in the literature to avoid it [79, 80]. This work presents a Q-learning based neighborhood search mechanism for effectively handling the local optima issue of clustering algorithms. The proposed concept works in two steps: an identification step and an evaluation step. The first step determines the neighborhood boundary and the neighboring data objects, whereas the second step evaluates the updated position of the initial cluster points through the Q-learning concept. Fig. 1(a-c) illustrates the Q-learning based neighborhood search mechanism.

Fig. 1
figure 1

(a-c) Illustration of the Q-learning based neighbourhood search mechanism

4.3.1 Identification step

This step determines the neighboring data points of the initial cluster centers, as shown in Fig. 1(a). The Euclidean distance measure is used for evaluating the neighboring data points. In this work, the number of neighboring data objects is set to 5; hence, the five data objects with minimum Euclidean distance are selected as the neighboring data points of a given cluster center, as shown in Fig. 1(b). Let \(X_i\) represent the ith cluster center and \(X_{i,neigh}\) represent the set of neighboring data points of the ith cluster center, described as \(X_{i,neigh}=\{X_{i,1},X_{i,2},\dots,X_{i,5}\}\), where neigh = 1 to 5.

4.3.2 Evaluation step

This step determines the updated position of the initial cluster points, as shown in Fig. 1(c). Here, Q-learning [81] is used instead of the arithmetic mean to compute the position from the neighborhood data items. The Q-learning algorithm follows a simple procedure until a good-quality solution is obtained: initialize the Q-table, choose an action, perform the action, measure the reward, and update the Q-table. The Q[s, a] value of the neighbouring data points is updated using Eq. 20:

$$\mathrm{Q}"\left(\mathrm{s},\mathrm{a}\right)=\mathrm{Q}\left(\mathrm{s},\mathrm{a}\right)+\upalpha \left[\mathrm{R}\left(\mathrm{s},\mathrm{a}\right)+\upgamma \mathrm{max}{\mathrm{Q}}^{\prime}\left({\mathrm{s}}^{\prime }+{\mathrm{a}}^{\prime}\right)-\mathrm{Q}\left(\mathrm{s},\mathrm{a}\right)\right]$$
(20)

where Q " (s, a) represents the new Q-value for state (s) and action (a), Q(s, a) gives the current value, α is the learning rate, R(s, a) represents the reward for taking action for a state, γ is the discount rate, and maxQ(s + a) represents the maximum expected future reward.

Algorithm 1 Improved bat algorithm for cluster analysis

The algorithmic steps of the improved bat algorithm for cluster analysis are given in Algorithm 1. The working of the proposed approach is divided into three broad steps: (i) initialization, (ii) evaluation and assignment, and (iii) updating. Fig. 2 illustrates the flow chart of the approach.

  • Initialization Step: This step initializes the algorithmic parameters of the improved bat algorithm, along with dataset details such as dimension and number of objects. The cooperative co-evolution method (described in subsection 4.1) is used to select the initial centroids. The other parameters, such as loudness, emission rate, and the random variable, are then specified.

  • Evaluation and Assignment Step: This step evaluates the objective function and allocates objects to the nearest clusters. The Euclidean distance acts as the objective function, and objects are allocated to the cluster that minimizes it. Moreover, the Q-learning based neighborhood search mechanism is incorporated to overcome the local optima situation. A limit operator is applied to detect local optima: if the candidate solution does not improve within the predefined limit range, the algorithm is assumed to be trapped in a local optimum and the neighborhood mechanism is invoked.

Fig. 2
figure 2

Flow chart of improved bat algorithm

  • Updating Step: This step updates the positions of the bats through the search mechanism. The emission rate of the bat algorithm is compared with a random function: if the random function returns a value less than the emission rate, the neighborhood value is accepted; otherwise, a new value is calculated using the parameter variations, i.e., loudness and emission rate. If the termination criterion is met, the algorithm stops and the final solution is obtained; otherwise, it repeats steps 2–3. A minimal end-to-end sketch of this loop is given below.
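The following skeleton ties the three steps together. It is a minimal sketch, not the full IBAT: `init_centroids` is the Section 4.1 sketch given earlier (passed in as the initializer), and the Q-learning neighborhood search triggered by the limit operator is replaced here by a simple Gaussian perturbation for brevity; the function names, step sizes, and limit value are assumptions.

```python
import numpy as np

def assign_clusters(data, centers):
    """Allocate each object to its nearest center and return the labels
    together with the total intra-cluster (Euclidean) distance."""
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), d.min(axis=1).sum()

def improved_bat_clustering(data, K, init, max_iter=200, limit=10, rng=None):
    """Three-step flow; init is an initializer such as the init_centroids
    sketch from Section 4.1, called as init(data, K, rng)."""
    rng = rng or np.random.default_rng()
    centers = init(data, K, rng)                 # Step 1: initialization
    best, best_obj, stall = centers.copy(), np.inf, 0
    for _ in range(max_iter):
        _, obj = assign_clusters(data, centers)  # Step 2: evaluate and assign
        if obj < best_obj:
            best, best_obj, stall = centers.copy(), obj, 0
        else:
            stall += 1                           # limit operator bookkeeping
        if stall >= limit:                       # assumed local optimum: stand-in
            centers = best + rng.normal(0.0, 0.1, centers.shape)  # for Q-learning
            stall = 0                            # neighborhood move
        else:                                    # Step 3: bat-style update
            centers = best + rng.uniform(-1.0, 1.0, centers.shape) * 0.5
    return best, best_obj
```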

4.4 Time complexity

The working of the proposed algorithm is summarized in three steps: initialization, evaluation and assignment, and updating. The complexity of each step is computed and combined to obtain the overall complexity of the algorithm.

  • Step 1: The initialization step starts with the selection of the initial centroid positions, evaluated using Eqs. (7)–(10). The dataset is divided into M distinct parts, where M can be interpreted as the number of clusters (K), i.e., M = K. The population of the bat algorithm is described in terms of the number of bats. Hence, the time required by the initialization phase of the improved bat algorithm is O(number of bats × number of attributes), where the number of bats corresponds to the number of clusters (K) and the number of attributes to the dimension (d), i.e., the length of a cluster centre: O(K × d).

  • Step 2: The evaluation and assignment phase comprises the objective function, the allocation of data to the respective clusters, and the fitness function.

    a. Computing the objective function and allocating data to the respective clusters requires O(n² × K × d) time.

    b. In the worst case, the neighbourhood strategy requires O(n × K × d) time.

    c. The fitness function evaluation requires O(n × K × d) time.

Hence, the total complexity of Step 2 is O((n² × K × d) + (n × K × d) + (n × K × d)) = O(n² × K × d).

  • Step 3: In the updating phase, the position of each bat, i.e., each cluster centre, is updated, which requires O(K × d) time. The above process is repeated until the termination condition, i.e., max_iter, is reached. Hence, the total time required by this phase is O(K × d × max_iter).

The overall complexity of the improved bat algorithm is the combination of the above steps and is given as O(n² × K × d × max_iter), whereas the space complexity of the proposed algorithm is O(number of attributes × number of bats), i.e., O(d × K).

5 Experimental setup and results

This section presents the simulation results of the proposed BAT algorithm for solving clustering problems. Several non-healthcare and healthcare datasets are chosen for evaluating the performance of the proposed algorithm; Table 2 summarizes their description. These datasets are (i) Iris, (ii) Glass, (iii) Wine, (iv) Ionosphere, (v) Control, (vi) Vowel, (vii) Balance, (viii) Crude Oil, (ix) CMC, (x) Liver Disorder (LD), (xi) Wisconsin Breast Cancer (WBC), and (xii) Thyroid. These datasets are downloaded from the UCI repository and are freely available for assessing the efficiency of newly proposed clustering algorithms. The user-defined parameters of the proposed bat algorithm are: population = K × d, α = 0.5, loudness (Ai) ∈ (0.1, 0.9), initial velocity = 0.1, number of partitions = K, and max_iteration = 200. The average intra cluster distance (IntraCD), standard deviation (SD), accuracy, and rand index parameters are used for evaluating the simulation results. The simulation results of the proposed BAT algorithm are compared with several existing state-of-the-art meta-heuristic algorithms: in total, nineteen (19) meta-heuristic algorithms are chosen for comparison, divided into three groups: (1) standard and well-known clustering algorithms, (2) hybrid meta-heuristic clustering algorithms, and (3) recently developed clustering algorithms. The first group consists of eight standard meta-heuristic algorithms that are highly cited in the clustering literature: (i) particle swarm optimization (PSO) [11], (ii) ant colony optimization (ACO) [70], (iii) artificial bee colony (ABC) [49], (iv) differential evolution (DE) [68], (v) genetic algorithm (GA) [78], (vi) big bang-big crunch (BB-BC) [61], (vii) Bat [42], and (viii) K-means [33]. The second group consists of six popular hybrid meta-heuristic algorithms that provide state-of-the-art clustering results in the corresponding literature and focus especially on the local optima and convergence issues of clustering problems: (i) MEBBC [61], (ii) H-KHA [62], (iii) IKH [80], (iv) ICMPKHM [82], (v) PSO-BB-BC [83], and (vi) CBPSO [73]. The third group consists of recently reported meta-heuristic algorithms for solving the clustering task: (i) VS [84], (ii) MBOA [85], (iii) WOA [86], (iv) ICSO [74], and (v) Chaotic TLBO [75]. The reason behind the selection of these hybrid and recent clustering algorithms is to make a fair comparison of the performance of the proposed IBAT algorithm. The parameter settings of the aforementioned algorithms are taken as reported in the corresponding literature. Here, the population size of all algorithms is K × d and the maximum number of iterations is set to 200. The results are evaluated using intra cluster distance, accuracy, rand index, and rank measures.

Table 2 Summary of non-healthcare and healthcare datasets

5.1 Experiment 1: benchmark clustering datasets (non-healthcare datasets)

This subsection presents the simulation results of the proposed BAT algorithm on the non-healthcare datasets. The results are compared with three categories of clustering algorithms: standard/well-known clustering algorithms, hybrid clustering algorithms, and recently reported clustering algorithms.

5.1.1 Comparison of simulation results of proposed BAT and standard/well-known clustering algorithms

This subsection discusses the simulation results of the proposed BAT and the standard/well-known clustering algorithms on the non-healthcare datasets. The results are evaluated using intra cluster distance (intra), standard deviation (SD), accuracy, rand index, and rank measures. Table 3 presents the performance comparison of the proposed BAT algorithm and well-known clustering algorithms, namely K-means, PSO, ACO, ABC, DE, GA, BB-BC, and BAT, in terms of average intra cluster distance (intra), standard deviation (SD), and rank. The results show that the proposed BAT algorithm gives the minimum intra cluster distance values for the Iris (9.16E+01), Glass (1.96E+02), Ionosphere (8.01E+02), Vowel (1.49E+05), and Balance (5.01E+04) datasets, while DE achieves the minimum intra value (1.58E+04) for the wine dataset, BB-BC outperforms the others for the control dataset with an intra value of 2.38E+04, and ACO gives the minimum intra value (2.47E+02) for the crude oil dataset. The standard deviation is used to assess the efficiency of the algorithms: it represents the dispersion of data objects within a cluster, and a small value indicates that the data objects are tightly bound to their clusters. From the simulation results, it is also noted that in most cases the standard deviation is lower for the proposed BAT algorithm than for the rest of the algorithms. The accuracy results of the proposed BAT algorithm and the other clustering algorithms, measured as average case values, are illustrated in Table 4. The proposed BAT algorithm gives higher accuracy values for iris (93.00), glass (69.17), wine (76.01), ionosphere (71.94), control (75.30), and crude oil (76.64), while GA has a higher accuracy value (84.7) for the vowel dataset and PSO a higher accuracy value (89.76) for the balance dataset. Hence, it is concluded that the proposed BAT algorithm provides more accurate results in the clustering field. The simulation results of the proposed BAT algorithm and the other well-known clustering algorithms on the non-healthcare datasets using the rand index measure are given in Table 5. From the results, it is observed that the proposed BAT gives better results for the iris (0.72), glass (0.427), control (0.799), and crude oil (0.074) datasets, and promising results for the other datasets when compared to the well-known clustering algorithms. Hence, it is stated that the proposed BAT algorithm is a competent algorithm for cluster analysis.

Table 3 Simulation results of proposed BAT and standard clustering algorithms using intra cluster distance (intra) and standard deviation (SD) measures
Table 4 Simulation results of proposed BAT and standard clustering algorithms using accuracy measure
Table 5 Simulation results of proposed BAT and standard clustering algorithms using rand index measure


The convergence behaviour of the proposed BAT, BAT, BB-BC, GA, DE, ABC, ACO, PSO, and K-means clustering algorithms is shown in Fig. 3(a-h). In this graphical illustration, the X-axis labels the number of iterations and the Y-axis labels the intra-cluster distance. It is observed that the proposed BAT algorithm converges to the minimum values except for the balance and control datasets, and in most cases it provides a better convergence rate. Hence, it is stated that the proposed BAT outperforms the other well-known clustering algorithms.

Fig. 3
figure 3

(a-h) Convergence behaviour of IBAT, BAT, BB-BC, GA, DE, ABC, ACO, PSO and K-means algorithms using the intra cluster distance parameter

5.1.2 Comparison of simulation results of proposed BAT and existing hybrid clustering algorithms

This subsection discusses the simulation results of the proposed BAT algorithm on the benchmark clustering datasets in comparison with six existing hybridized clustering algorithms. Table 6 presents the simulation results of H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC, CBPSO, and the proposed BAT algorithm using the average intra cluster distance (intra) and standard deviation (SD). It is observed that the proposed BAT algorithm obtains the minimum intra cluster distance for iris (9.16E+01), glass (1.96E+02), wine (1.61E+04), ionosphere (8.01E+02), control (2.39E+04), balance (5.01E+04), and crude oil (2.51E+02), while for the vowel dataset ICMPKHM has a lower intra cluster value (1.47E+05) than the proposed algorithm. Likewise, the standard deviation values are minimum for the proposed BAT algorithm on the iris (2.12E+01) and wine (3.54E+01) datasets, and it gives favourable results in most other cases when compared to the hybridized clustering algorithms. The accuracy results of the proposed BAT algorithm and the other hybridized clustering algorithms are presented in Table 7. It is noticed that the proposed BAT algorithm provides more accurate results for iris (93.00), wine (76.01), ionosphere (71.94), control (75.30), vowel (67.11), and crude oil (76.64). For the glass and balance datasets, PSO-BB-BC performs better, with accuracy values of 69.52 and 89.21, followed by the proposed BAT with accuracy values of 69.17 and 88.92, respectively. Moreover, the rand index is also computed to prove the effectiveness of the proposed algorithm in the clustering field. Table 8 presents the simulation results of the proposed BAT algorithm and the other hybridized clustering algorithms using the rand index parameter on the benchmark clustering datasets. The proposed BAT algorithm obtains better rand index results for the wine (0.374), ionosphere (0.319), glass (0.427), control (0.799), balance (0.574), and crude oil (0.074) datasets as compared to the hybridized variants of clustering algorithms, while H-KHA achieves a better rand index (0.734) for iris and PSO-BB-BC (0.852) for the vowel dataset. From the results, it can be stated that the proposed BAT algorithm is competitive with the other hybridized variants of clustering techniques over the benchmark clustering datasets.

Table 6 Simulation results of proposed BAT and hybrid clustering algorithms using intra cluster distance (intra) and standard deviation (SD) measures
Table 7 Simulation results of proposed BAT and hybrid clustering algorithms using accuracy measure
Table 8 Simulation results of proposed BAT and hybrid clustering algorithms using rand index measure

5.1.3 Comparison of simulation results of proposed BAT and recently reported clustering algorithms

The performance of the proposed BAT algorithm is also compared with recent clustering algorithms. Table 9 presents the simulation results of VS, MBOA, WOA, ICSO, Chaotic TLBO, and the proposed BAT algorithm using the average intra cluster distance and standard deviation. It is observed that the proposed BAT algorithm obtains the minimum intra cluster distance for iris (9.16E+01), glass (1.96E+02), wine (1.61E+04), ionosphere (8.01E+02), control (2.39E+04), balance (5.01E+04), and crude oil (2.51E+02). The standard deviation values are minimum for the proposed BAT algorithm on the iris (2.12E+01), glass (1.98E+00), ionosphere (1.53E+01), control (3.62E+01), vowel (1.15E+02), and crude oil (1.06E+02) datasets, whereas for the wine (3.54E+01) and balance (3.59E+02) datasets, the proposed algorithm is second after Chaotic TLBO. From the results, it is concluded that the proposed BAT is competitive and outperforms the other algorithms in most cases. Table 10 presents the accuracy results of the proposed BAT algorithm and the recent clustering algorithms on the benchmark clustering datasets. It is noticed that the proposed BAT algorithm provides more accurate results for wine (76.01), ionosphere (71.94), control (75.30), vowel (67.11), balance (88.92), and crude oil (76.64); for the glass dataset, Chaotic TLBO performs better with an accuracy of 69.52 against 69.17 for the proposed BAT, while for iris the proposed BAT (93.00) remains ahead of Chaotic TLBO (91.19). Additionally, the rand index is also calculated to verify the efficacy of the proposed algorithm in the clustering field. Table 11 presents the simulation results of the proposed BAT algorithm and the recent clustering algorithms using the rand index parameter on the benchmark clustering datasets. The proposed BAT algorithm obtains better results for most datasets, namely iris (0.72), glass (0.427), wine (0.374), control (0.799), vowel (0.846), and balance (0.534), as compared to the recent clustering algorithms, whereas ICSO achieves a better rand index (0.319) for ionosphere and MBOA achieves a higher rand index (0.078) for the crude oil dataset. The results show that the proposed BAT outperforms and is proficient as compared to the recent clustering algorithms.

Table 9 Simulation results of proposed BAT and recent clustering algorithms using intra cluster distance (intra) and standard deviation (SD) measures
Table 10 Simulation results of proposed BAT and recent clustering algorithms using accuracy measure
Table 11 Simulation results of proposed BAT and recent clustering algorithms using rand index measure
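Similarly, the accuracy and rand index measures used in these comparisons can be computed as sketched below. Accuracy is taken here as the usual majority-label mapping of clusters to classes, and the rand index as the classical pair-counting agreement; both conventions are assumptions on our part, and the tables may report a variant of the index.

```python
import numpy as np
from itertools import combinations

def clustering_accuracy(y_true, y_pred):
    """Map every cluster to its majority class (labels assumed to be
    integer-encoded numpy arrays), then report the percentage of
    correctly grouped objects."""
    correct = sum(np.bincount(y_true[y_pred == c]).max()
                  for c in np.unique(y_pred))
    return 100.0 * correct / len(y_true)

def rand_index(y_true, y_pred):
    """Fraction of object pairs on which the clustering and the true
    labelling agree (same/same or different/different)."""
    n = len(y_true)
    agree = sum((y_true[i] == y_true[j]) == (y_pred[i] == y_pred[j])
                for i, j in combinations(range(n), 2))
    return agree / (n * (n - 1) // 2)
```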

5.2 Experiment 2: healthcare datasets

This subsection presents the simulation results of the proposed BAT clustering algorithm on the healthcare datasets.

5.2.1 Comparison of simulation results of proposed BAT and standard/well-known clustering algorithms

The performance comparison of the proposed BAT algorithm with the K-means, PSO, ACO, ABC, DE, GA, BB-BC, and BAT algorithms is presented in Table 12. The results are evaluated in terms of average intra cluster distance (intra), standard deviation (SD) and rank. Four healthcare datasets are considered to test and compare the performance of the proposed BAT with the well-known clustering algorithms. The simulation results show that the proposed BAT algorithm obtains the minimum intra cluster distance for all the considered healthcare datasets, that is, CMC (5.52E+03), LD (2.31E+02), WBC (2.89E+03), and thyroid (2.51E+02), compared to the well-known clustering algorithms. The standard deviation, which represents the dispersion of data objects within a cluster, is computed to assess the consistency of the algorithms; in most cases it is also minimum for the proposed BAT algorithm. The accuracy results of the proposed BAT algorithm and the other well-known clustering algorithms are illustrated in Table 13. The proposed BAT algorithm gives higher accuracy for CMC (48.21), WBC (96.61) and thyroid (71.98). For the LD dataset, PSO gives the best result with an accuracy of 54.05; even so, the proposed algorithm, with an accuracy of 54.02, outperforms the rest of the well-known clustering algorithms on this dataset. It is therefore concluded that the proposed BAT algorithm gives more accurate clustering results on the considered healthcare datasets. Table 14 presents the rand index results of the proposed BAT algorithm and the other well-known clustering algorithms on the healthcare datasets. The proposed BAT algorithm obtains better results than the other clustering algorithms for the CMC, LD, WBC and thyroid datasets, with values of 0.28, 0.492, 0.276 and 0.383, respectively. Hence, the proposed BAT algorithm can be considered one of the proficient algorithms for cluster analysis.

Table 12 Simulation results of proposed BAT and standard clustering algorithms using intra cluster distance (intra) and standard deviation (SD) measures
Table 13 Simulation results of proposed BAT and standard clustering algorithms using accuracy measure
Table 14 Simulation results of proposed BAT and standard clustering algorithms using rand index measure


Figure 4(a-d) shows the convergence behaviour of IBAT and the well-known clustering algorithms (BAT, BB-BC, GA, DE, ABC, ACO, PSO and K-means). In each plot, the X-axis denotes the number of iterations and the Y-axis the intra-cluster distance. It is observed that the IBAT algorithm converges to the minimum values for all the considered healthcare datasets and provides a better convergence rate in most cases. Hence, IBAT outperforms the other clustering algorithms on the considered healthcare datasets.

Fig. 4

(a-d) Convergence behaviour of IBAT, BAT, BB-BC, GA, DE, ABC, ACO, PSO and K-means algorithms using the intra cluster distance parameter for the healthcare datasets
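Convergence curves of this kind can be reproduced from per-iteration records of the best intra cluster distance; a minimal matplotlib sketch is given below, assuming each algorithm's best-so-far value has been logged at every iteration (the data structure and names are illustrative, not the authors' plotting code).

```python
import matplotlib.pyplot as plt

def plot_convergence(histories, dataset_name):
    """'histories' maps an algorithm name to its best-so-far intra
    cluster distance per iteration, e.g. {'IBAT': [...], 'PSO': [...]}."""
    for name, curve in histories.items():
        plt.plot(range(1, len(curve) + 1), curve, label=name)
    plt.xlabel('Number of iterations')
    plt.ylabel('Intra-cluster distance')
    plt.title(dataset_name)
    plt.legend()
    plt.show()
```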

5.2.2 Comparison of simulation results of proposed BAT and existing hybrid clustering algorithms

This subsection compares the performance of the proposed algorithm with six hybridized clustering algorithms on the healthcare datasets. Table 15 reports the simulation results of H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC, CBPSO and the proposed BAT algorithm in terms of average intra cluster distance (intra), standard deviation (SD) and rank. From the results, it is observed that the proposed BAT algorithm attains the minimum intra cluster distance on all the considered healthcare datasets, with values of 5.52E+03 for CMC, 2.31E+02 for LD, 2.89E+03 for WBC and 2.51E+02 for thyroid, compared to the other hybridized clustering algorithms. The proposed BAT algorithm also gives the minimum standard deviation in most cases on the considered healthcare datasets. Table 16 reports the average-case accuracy of the proposed BAT algorithm and the other hybridized clustering algorithms on the healthcare datasets. The proposed BAT gives the highest accuracy for CMC (48.21) and LD (54.02). For WBC (IBAT = 96.61, CBPSO = 96.89) and thyroid (IBAT = 71.98, CBPSO = 72.21), CBPSO is marginally better, with the proposed BAT a close second, ahead of the remaining hybridized clustering algorithms. Overall, the proposed BAT provides more accurate results than the other hybridized clustering algorithms. The rand index is also computed for the healthcare datasets to assess effectiveness in clustering. Table 17 shows the rand index results of the proposed BAT algorithm and the other hybridized clustering algorithms on the healthcare datasets. The proposed BAT algorithm gives better rand index results for CMC (0.280), WBC (0.276) and thyroid (0.383), and an identical rand index value to MEBBC for LD (0.496). Thus, the proposed BAT obtains better results than the other hybridized variants of clustering algorithms on the considered healthcare datasets.

Table 15 Simulation results of proposed BAT and hybrid clustering algorithms using intra cluster distance (intra) and standard deviation (SD) measures
Table 16 Simulation results of proposed BAT and hybrid clustering algorithms using accuracy measure
Table 17 Simulation results of proposed BAT and hybrid clustering algorithms using rand index measure

5.2.3 Comparison of simulation results of proposed BAT and recently reported clustering algorithms

This subsection compares the performance of the proposed algorithm with recent clustering algorithms on the healthcare datasets. Table 18 reports the simulation results of VS, MBOA, WOA, ICSO, Chaotic TLBO and the proposed BAT algorithm in terms of average intra cluster distance (intra), standard deviation (SD) and rank. From the results, it is observed that the proposed BAT algorithm attains the minimum intra cluster distance for LD (2.31E+02), WBC (2.89E+03) and thyroid (2.51E+02) compared to the recent clustering algorithms. For the CMC dataset, MBOA gives a lower intra cluster distance value (5.21E+03) than the proposed algorithm. The proposed BAT algorithm also gives the minimum standard deviation on almost all the considered healthcare datasets, except thyroid, for which ICSO gives the minimum standard deviation (1.16E+01), followed by the proposed BAT (1.32E+01). The average-case accuracy results of the proposed BAT algorithm and the recent clustering algorithms on the healthcare datasets are illustrated in Table 19. The proposed BAT gives higher accuracy values for CMC (48.21), LD (54.02), WBC (96.61) and thyroid (71.98); hence, IBAT provides more accurate results than the recent clustering algorithms. To assess the effectiveness of the proposed algorithm in clustering the healthcare datasets, the rand index is computed. The rand index results of the proposed BAT algorithm and the recent clustering algorithms on the healthcare datasets are shown in Table 20. The proposed BAT algorithm gives better rand index results for WBC (0.276) and thyroid (0.383) than the recent clustering algorithms, while ICSO gives a better rand index (0.283) for CMC, followed by the proposed BAT (0.280), and Chaotic TLBO obtains a better rand index (0.498) for the LD dataset. Thus, the proposed BAT is competitive and obtains better results on most of the considered healthcare datasets.

Table 18 Simulation results of proposed BAT and recent clustering algorithms using intra cluster distance (intra) and standard deviation (SD) measures
Table 19 Simulation results of proposed BAT and recent clustering algorithms using accuracy measure
Table 20 Simulation results of proposed BAT and recent clustering algorithms using rand index measure

5.3 Statistical test

This subsection describes the statistical test performed to determine the best performing algorithm among the proposed IBAT and the other existing clustering algorithms. Statistical tests are used when a new algorithm is introduced to determine whether a significant difference exists between its performance and that of the existing algorithms [87,88,89]. In this work, the Friedman statistical test is applied. Two hypotheses (H0 and H1) are formulated at the significance level 0.05. Hypothesis H0 states that there is no significant difference between the performance of the new algorithm and the rest of the algorithms; hypothesis H1 states that a significant difference exists. If no significant difference is found, H0 is not rejected and the proposed algorithm (IBAT) is said to perform similarly to the other algorithms. Otherwise, H0 is rejected in favour of H1, indicating a significant difference between the performance of the newly proposed algorithm and the rest of the algorithms. In the first step of the Friedman test, a rank is assigned to each algorithm on each dataset; the average rank of each algorithm is then computed over all datasets. The per-dataset ranks, computed using the accuracy measure, and the resulting average ranks are reported in Table 21. The proposed BAT (IBAT) algorithm obtains the best average rank (2.1) among all algorithms, while the ACO algorithm obtains the worst (18); Chaotic TLBO achieves the second best average rank (5.1). The statistical results of the Friedman test are shown in Table 22. The Friedman statistic is 63.6531, the degrees of freedom are 19, and the critical value at the 0.05 significance level is 30.1435; the computed p value is 1.01E-06. Since the test statistic considerably exceeds the critical value and the p value is far below 0.05, hypothesis H0 is rejected: a significant difference exists between the performance of the proposed BAT (IBAT) and the other existing algorithms. These results validate the performance of the proposed algorithm (IBAT) against the existing clustering algorithms; a sketch of the ranking and test computation is given after Table 23. Moreover, a posthoc test is conducted to determine possible groupings of algorithms with similar performance. The results of the posthoc test are presented in Table 23, where the symbol "+" indicates a significant difference between the performances of two algorithms and the symbol "-" indicates no significant difference. On the analysis of the posthoc test, several algorithms exhibit similar performance and can be clubbed into a single group; in all, twelve such groups are identified.
These groups are as follows. Group 1 consists of K-means, PSO, ABC, DE, GA, BB-BC, BAT, MEBBC, IKH, ICMPKHM, CBPSO, VS, MBOA, and WOA. Group 2 consists of K-means, PSO, GA, BB-BC, BAT, H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC, CBPSO, VS, MBOA, WOA, ICSO, and Chaotic TLBO. Group 3 consists of ACO, ABC, DE, GA, and BAT. Group 4 contains K-means, ACO, ABC, DE, GA, BB-BC, BAT, MEBBC, IKH, VS, MBOA, and WOA. Group 5 contains K-means, ACO, ABC, DE, GA, BB-BC, BAT, MEBBC, IKH, VS, MBOA, CBPSO, and WOA. Group 6 consists of K-means, PSO, ACO, ABC, DE, GA, BB-BC, BAT, MEBBC, IKH, VS, MBOA, CBPSO, and WOA. Group 7 consists of PSO, H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC, CBPSO, VS, MBOA, WOA, ICSO, and Chaotic TLBO. Group 8 consists of K-means, PSO, ABC, DE, GA, BB-BC, BAT, H-KHA, MEBBC, IKH, VS, MBOA, CBPSO, and WOA. Group 9 contains K-means, PSO, BB-BC, H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC, CBPSO, VS, MBOA, WOA, ICSO, and Chaotic TLBO. Group 10 consists of K-means, PSO, DE, GA, BB-BC, H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC, CBPSO, VS, MBOA, WOA, ICSO, and Chaotic TLBO. Group 11 consists of PSO, H-KHA, ICMPKHM, PSO-BB-BC, CBPSO, ICSO, and Chaotic TLBO. Group 12 contains only a single algorithm, i.e., IBAT. The posthoc test reveals that the algorithms within a group exhibit similar performance. The proposed BAT (IBAT) algorithm does not appear in any group except group 12, in which it is the only member. Hence, it is concluded that the proposed BAT algorithm performs significantly differently from the other existing algorithms and is the best performing among them.

Table 21 Average rank of each clustering algorithm using accuracy measure for non-healthcare datasets
Table 22 Statistical results of Friedman test
Table 23 Results of posthoc test after rejection of hypothesis (H0)
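As a hedged illustration of the procedure described above, the following sketch computes the Friedman average ranks and test statistic with SciPy, assuming the accuracies are arranged as a datasets-by-algorithms matrix; the helper names and data layout are ours, not the paper's.

```python
import numpy as np
from scipy import stats

def friedman_analysis(acc, names, alpha=0.05):
    """acc[i, j] = accuracy of algorithm j on dataset i.
    Returns average ranks (rank 1 = best), the Friedman statistic,
    the p value, and the chi-square critical value."""
    # Rank algorithms per dataset; negate so higher accuracy ranks first.
    ranks = np.apply_along_axis(lambda row: stats.rankdata(-row), 1, acc)
    avg_ranks = dict(zip(names, ranks.mean(axis=0)))
    stat, p = stats.friedmanchisquare(*acc.T)
    crit = stats.chi2.ppf(1 - alpha, df=acc.shape[1] - 1)  # 30.1435 for 19 df
    return avg_ranks, stat, p, crit
```

Under this computation, rejecting H0 when the statistic exceeds the critical value (here 63.6531 > 30.1435) corresponds to the conclusion drawn from Table 22.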

6 Conclusion

An improved variant of the bat algorithm is proposed for the data clustering task. The proposed algorithm efficiently deals with the initial population selection, convergence rate and local optima issues of the bat algorithm. To resolve the initial population selection problem, an enhanced co-operative co-evolution method is developed and integrated into the bat algorithm. A neighborhood search-based mechanism is designed to handle the local optima condition. Furthermore, an elitist strategy is incorporated into the bat algorithm to achieve an optimal trade-off between the search mechanisms. The solution search equations of the bat algorithm are also improved for determining the optimum solution in the final iterations. The performance of the proposed BAT algorithm is tested on eight benchmark (non-healthcare) and four healthcare clustering datasets. The simulation results of the IBAT algorithm are compared with nineteen meta-heuristic algorithms: the standard K-means, PSO, ACO, ABC, DE, GA, BB-BC, and BAT algorithms; the hybridized H-KHA, MEBBC, IKH, ICMPKHM, PSO-BB-BC and CBPSO clustering algorithms; and the recent VS, MBOA, WOA, ICSO, and Chaotic TLBO clustering algorithms. The proposed algorithm achieves better quality clustering results in terms of intra cluster distance, accuracy and rand index on most of the datasets. Furthermore, the Friedman statistical test is applied to determine the best performing algorithm. The statistical results show that hypothesis H0 is rejected at the 0.05 significance level; in turn, a significant difference exists between the performance of the proposed IBAT and the other clustering algorithms. Hence, the proposed IBAT algorithm is the best performing among the compared clustering algorithms. Finally, it is concluded that the proposed IBAT algorithm is a robust and effective algorithm for handling the data clustering task.