1 Introduction

Data mining is a procedure that pulls previously unknown patterns or information from databases. Clustering, a descriptive analytics approach within data mining, discovers patterns based on specified dissimilarity criteria [9]. Due to its extensive applicability, clustering has captured the attention of research communities, who have worked to develop several evolutionary metaheuristic algorithms to solve clustering problems (Dorigo et al. [5]; Cura [4]; Kumar and Sahoo [18,19,20]; Karaboga and Ozturk [16]; Hatamlou et al. [13]; Hatamlou [11]). Clustering methods optimally divide a set of data objects and retain them in clusters (Nanda and Panda [26]; Mat et al. [22]). The clustering process is carried out with the help of dissimilarity measures. The Euclidean distance given in Eq. (1) is a widely accepted similarity measure in partitional clustering techniques. It is defined as the square root of the sum of squared differences between data objects and cluster centres. Data objects are assigned to clusters according to the distance values.

$$ D\left({Z}_i,{C}_j\right)=\sqrt{\sum \limits_{i=1}^n\sum \limits_{k=1}^d{\left({Z}_{ik}-{C}_{jk}\right)}^2} $$
(1)

where Zi symbolizes the ith data instance/object, Cj represents the jth cluster centre/centroid, and n and d denote the number of instances/data objects and the number of dimensions/attributes in the dataset, respectively.
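To make Eq. (1) concrete, the following sketch (illustrative Python, not the authors' implementation) computes the distance of every data object to every cluster centre and assigns each object to its nearest centre:

```python
import numpy as np

def assign_to_clusters(Z, C):
    """Distance computation and assignment following Eq. (1).

    Z : (n, d) array of data objects
    C : (K, d) array of cluster centres
    Returns the (n, K) distance matrix and, for every object,
    the index of its nearest centre.
    """
    # Square root of the sum of squared attribute-wise differences
    D = np.sqrt(((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))
    return D, D.argmin(axis=1)
```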

In this study, an enhanced version of the whale optimization algorithm (WOA) is proposed to find optimized cluster centres. The WOA is a nature-inspired method that simulates the foraging behaviour of humpback whales (Mirjalili and Lewis [24]). The original WOA suffers from various issues such as the ones listed below:

  • Convergence rate: The convergence rate depends heavily on the search space mechanism, and coordination between the exploration and exploitation processes is lacking (Kumar and Sahoo [19]).

  • Local optima: This is a situation in which the candidate solution stops being updated, and it primarily occurs due to the absence of a population diversification mechanism. It is observed that the original WOA suffers from local optima (Kumar and Kaur [17]).

To overcome these and other problems that are inherent in the original WOA, this study proposes an improved algorithm called the Enhanced Whale Optimization Algorithm (EWOA). The EWOA is adapted from the original WOA and enhanced with two additional operational procedures to accelerate the convergence rate and overcome the local optima situation. To accelerate the convergence rate, position update equations from the water wave optimization algorithm are incorporated into the algorithm to improve the search space, while the tabu and neighbourhood search mechanisms are incorporated to overcome the local optima situation. The efficiency of the proposed EWOA is measured using a simulation-based experiment conducted on eight benchmark datasets, namely the Iris, Cancer, CMC, Wine, Glass, Thyroid, LR and ISOLET datasets, and the results obtained are then compared to seven existing clustering algorithms/techniques, namely the Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Cat Swarm Optimization (CSO), Genetic Algorithm (GA), Advanced Chemical Reaction Optimization (ACRO), WOA, and K-means algorithms. The performance of each algorithm is compared and analyzed using the average intra-cluster distance and f-measure parameters. The applicability and feasibility of the proposed algorithm are demonstrated via the experimental study that has been carried out in this paper. The experimental results highlight the enhancements that were made and prove the superiority of the proposed EWOA clustering algorithm.

The core contributions of this study are summarized as follows:

  i). Introduced the EWOA as an improvement to the original WOA by incorporating key enhancements to overcome the problems that are inherent in the original WOA.

  ii). Incorporated the tabu and neighbourhood search strategies to handle local optima situations.

  iii). Incorporated the position update equations from the water wave optimization algorithm to improve the search space, minimize the intra-cluster distance, and accelerate the convergence rate.

  iv). Demonstrated the feasibility and applicability of the proposed EWOA model by implementing the algorithm to solve cluster analysis problems using eight experimental benchmark datasets.

  v). Proved the superiority of the proposed EWOA model by comparing the results of the experimental study obtained via the proposed EWOA model with those of seven other well-known clustering algorithms.

This paper is divided into 6 sections. The literature review and related works are presented in Section 2, while Section 3 describes the background details of the algorithms and methods that are employed in this paper. The improvements and enhancements that were made to the algorithm are detailed in Section 4, while the experimental study and the results that were obtained are presented in Section 5. Concluding remarks and the future scope of this study are presented in Section 6, followed by the declarations, acknowledgments and the list of references.

2 Literature review

Several algorithms have been developed, hybridized, and improved in the past few decades to solve clustering problems. Some of them are expounded in this section. Premalatha and Natarajan [27] proposed an enhanced discrete binary PSO, where the model was tested using three datasets and compared with the K-means clustering algorithm; the outcome was improved execution and more diversity in the swarm. Kao et al. [15] studied partitional clustering problems using a hybrid optimization solution, while Chang et al. [3] eliminated the local optima and premature convergence problems of the standard genetic algorithm with their proposed gene rearrangement strategy. Jiang and Wang [14] presented a cooperative coevolution framework for BPSO algorithms, in which cooperative coevolution was used to decompose the problem into K subproblems, while PSO was used to solve these subproblems. Wang et al. [38] proposed a chaotic KH hybrid clustering method with better convergence speed. Kumar and Sahoo [18] proposed a hybrid data clustering algorithm by combining the CSO and K-harmonic means algorithms, tested it against various existing algorithms, and concluded that the proposed hybrid model has improved convergence speed. Kumar and Sahoo [19] introduced another hybrid algorithm that combined the PSO and MCSS algorithms, in which the neighbourhood technique was employed to improve the search process. Menéndez et al. [23] proposed a multi-medoid-based ACO clustering algorithm that automatically determines the optimum number of clusters and works without predetermined criteria, i.e., the number of clusters. Hatamlou [12] hybridized the PSO and the big bang-big crunch (BB-BC) algorithms to overcome the local optima and premature convergence problems.

Zhang et al. [41] and Karaboga and Ozturk [16] studied the use of the ABC algorithm, which simulates the intelligent foraging behaviour of honey bee swarms, in clustering problems and proved that the ABC algorithm is indeed efficient for solving multivariate data clustering problems. Yan et al. [40] introduced a hybrid variant of the ABC algorithm for solving clustering problems, in which the crossover operator of GA is integrated with the ABC algorithm to accelerate its convergence speed and reach the optimal solution faster. Alshamiri et al. [2] integrated the extreme learning machine (ELM) model into the ABC algorithm, in which the ELM model projects the input data into a high-dimensional feature space, while the ABC algorithm performs the partitioning. Kumar and Sahoo [20] proposed an efficient two-step ABC algorithm, in which the K-means algorithm is used to identify the initial seed points or food sources for the ABC algorithm.

Senthilnath et al. [31] implemented the firefly algorithm, which simulates the social behaviour and flash patterns of fireflies, in solving clustering problems. Hatamlou [11] introduced a black hole (BH) phenomenon-based algorithm for clustering, in which the search space is defined in terms of the black hole, stars, and their absorption mechanism. The efficiency of the BH-based method was tested using standard datasets, and it was proven to be an effective clustering technique. Wang et al. [39] proposed a bee pollinator with a flower pollination algorithm to improve searchability and achieve faster convergence. Siddiqi et al. [32] introduced a new hybrid model that integrated the GA and SimE algorithms to automate the partitional clustering process. A greedy method is first applied to select the initial seed points, and optimization methods are then implemented to optimize them. Kushwaha et al. [21] proposed a magnetic force-based clustering algorithm in which a magnetic force-based search mechanism is implemented to find the optimal cluster centres. The data points are treated as particles that move under magnetic forces. The optimum position for the centroid particles is achieved when the magnetic force applied by the data points approaches zero.

To automate the clustering process, Zhou et al. [43] incorporated the simplex method into the social spider algorithm. The simplex method is used to estimate and update the positions of the spiders. This stochastic variant strategy enhanced the population diversity and improved the local search ability of the traditional algorithm. Han et al. [10] hybridized the bird flock and gravitational search algorithms (BFGSA) to develop an efficient algorithm for partitional clustering that uses neighbourhood strategies to explore a broader range of the search space. This hybrid model managed to overcome the local optima, multidimensional data handling, and premature convergence problems. Ganguly [6] proposed a neighbour heuristic-based algorithm for cluster analysis, in which a function was introduced to avoid direct distance-vector computation and retrieve the most similar vectors. Singh et al. [35] introduced an artificial chemical reaction-based algorithm for partitional clustering problems, whereby neighbourhood and position-based operators were introduced to overcome the deficiencies of traditional chemical reaction algorithms, resulting in a more efficient clustering algorithm. Singh and Kumar [33] hybridized the ACRO algorithm with genetic operators, whereas Singh and Kumar [34] introduced a neighbourhood search based on the CSO algorithm and applied it to solve clustering problems. Motwani et al. [25] developed three methods to generate the initial centroids for initial cluster selection and concluded that the farthest distributed centroid clustering algorithm produces quality clusters.

Santana-Velásquez et al. [30] focused on applying Machine Learning (ML) techniques as an alternative to traditional DRG classification methods. The primary goal was to determine whether ML techniques can categorize patients according to the DRG criteria using information available at discharge. This data served as the foundation for subsequent research on the prediction of DRGs in the early phases of patients' hospitalization episodes. Stephan et al. [36] applied the HAW technique in an ANN model concurrently with feature selection (FS) and parameter optimization algorithms. Backpropagation learning was used to develop HAW in this study, which comprises robust backpropagation (HAW-RP), Levenberg-Marquardt (HAW-LM), and momentum-based gradient descent (HAW-GD) methods. The accuracy, complexity, and computation time of this hybrid model were studied using several breast cancer datasets. Goyal et al. [8] applied various optimization algorithms, such as particle swarm optimization (PSO), cat swarm optimization (CSO), BAT, the cuckoo search algorithm (CSA), and the whale optimization algorithm (WOA), for load balancing, energy efficiency, and better resource scheduling to create an efficient cloud environment. The study found that the WOA beat all the other algorithms in response time, energy consumption, execution time, and throughput in the seven- and eight-server configurations.

Stephan et al. [37] proposed a novel hybrid Artificial Bee Colony (hybrid ABC) optimization algorithm in which the strong explorative capabilities of the chemotaxis phase of bacterial foraging optimization were integrated with a spiral model-based exploitative phase of the ABC algorithm. This enabled the proposed hybrid ABC algorithm to overcome the poor exploration procedure of the standard ABC algorithm and outperform the corresponding standalone ABC algorithm. Rahnema and Gharehchopogh [29] proposed an improved version of the ABC algorithm based on the swarm intelligence characteristics of whales and found that random memory and elite memory enhanced the convergence speed of the improved algorithm. Ghany et al. [7] combined the WOA with the tabu search method. The tabu search enabled the WOA to store multiple best solutions and utilize them to explore the solution space more effectively. Purushothaman et al. [28] combined the Gray wolf optimization and grasshopper algorithms for clustering. This hybridization improved reliability and reduced computational time. Ahmadi et al. [1] modified the Gray wolf optimization algorithm by introducing a balanced approach to exploration and exploitation that centers the search around the best solution, and showed that the proposed algorithm produced state-of-the-art results with a higher accuracy rate. Kumar and Kaur [17] introduced three new variants of the bat algorithm that managed to resolve problems related to initial cluster selection, convergence rate, and local optima with the help of enhanced cooperative evolution, elitist, and neighbourhood search strategies. These enhancements resulted in a robust partitional clustering algorithm.

All these innovations to the existing bio-inspired algorithms were proven to have improved efficiency, faster convergence rate, shorter computation time, and higher accuracy when compared to the corresponding standard, standalone bio-inspired algorithms.

3 Methodology

This section gives the background description of the algorithms and methods that have been implemented in this work. The Enhanced Whale Optimization Algorithm (EWOA) is utilized in the field of clustering to produce optimal cluster centres. The dataset is first loaded into memory, and the fundamental parameters are then configured. Following that, other sequential processes, such as sampling or cluster centre selection, objective function computation, assignment of data items to appropriate clusters, and updating of points, are performed.

3.1 Whale optimization algorithm

The whale optimization algorithm is a nature-inspired algorithm that simulates the foraging behaviour of humpback whales [24]. Although it was initially designed to solve numerical problems, it was soon applied to several other domains such as clustering, due to its self-explorative nature and ability to achieve convergence at a faster rate. The formulated mathematical model simulates the prey identification and hunting strategies of humpback whales. The prey finding and encircling processes are modelled using Eqs. (2) and (3):

$$ \overrightarrow{D}=\left|\overrightarrow{C_{cv}}.\overrightarrow{Z^{\ast }}(t)-\overrightarrow{Z}(t)\right| $$
(2)
$$ \overrightarrow{Z}\left(t+1\right)=\overrightarrow{Z^{\ast }}(t)-\overrightarrow{A_{cv}}.\overrightarrow{D} $$
(3)

where \( \overrightarrow{A_{cv}}=2\overrightarrow{a}.r-\overrightarrow{a} \) and \( \overrightarrow{C_{cv}}=2r \). The terms \( \overrightarrow{Z} \) and \( \overrightarrow{Z^{\ast }} \) denote the current position vector and the global best position vector, respectively, \( \overrightarrow{C_{cv}} \) and \( \overrightarrow{A_{cv}} \) are coefficient vectors, r is a random number in (0, 1), and a is linearly decreased from 2 to 0 over the iterations.

The bubble-net attacking process is a combination of the shrinking encircling and spiral position update methods. In shrinking encircling, the coefficient vectors are varied to simulate the humpback whale behaviour, while in the spiral position update method, the formulated spiral equation is followed to model the helix-shaped movement of whales, as denoted by Eqs. (4) and (5). Whether a humpback whale performs the shrinking encircling or the spiral movement is determined by Eq. (6).

$$ {\overrightarrow{D}}^{\prime }=\left|\overrightarrow{Z^{\ast }}(t)-\overrightarrow{Z\ }(t)\right|\kern0.5em $$
(4)
$$ \overrightarrow{Z}\left(t+1\right)={\overrightarrow{D}}^{\prime }.{e}^{bl}.\cos \left(2\pi l\right)+\overrightarrow{Z^{\ast }}(t) $$
(5)
$$ \overrightarrow{Z}\left(t+1\right)=\left\{\begin{array}{cc}\overrightarrow{Z^{\ast }}(t)-\overrightarrow{A_{cv}}.\overrightarrow{D}, & if\ p<0.5\\ {\overrightarrow{D}}^{\prime }.{e}^{bl}.\cos \left(2\pi l\right)+\overrightarrow{Z^{\ast }}(t), & if\ p\ge 0.5\end{array}\right. $$
(6)

Here, \( \overrightarrow{D} \) is a distance vector, b is a constant that defines the shape of the spiral, l is a random number in [−1, 1], and p is a random number in (0, 1). The humpback whale also searches for prey randomly in the search space. The movements of the whale result in a change in the position vector, as denoted by Eqs. (7) and (8).

$$ \overrightarrow{D}=\left|\overrightarrow{C_{cv}}.\overrightarrow{Z_{rand}}-\overrightarrow{Z}\right| $$
(7)
$$ \overrightarrow{Z}\left(t+1\right)=\overrightarrow{Z_{rand}}-\overrightarrow{A_{cv}}.\overrightarrow{D} $$
(8)

The term \( \overrightarrow{Z}\left(t+1\right) \) represents a new position vector, while the term \( \overrightarrow{Z_{rand}} \) denotes a randomly chosen vector.
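For illustration, the complete per-whale position update of Eqs. (2)-(8) can be sketched as follows. This is a minimal Python sketch, assuming a vector solution representation, the standard linearly decreasing a, and spiral constant b = 1; it is not the authors' implementation.

```python
import numpy as np

def woa_update(Z, Z_best, Z_rand, t, T, b=1.0):
    """One WOA position update for a single whale (Eqs. 2-8)."""
    a = 2 - 2 * t / T                       # a decreases linearly from 2 to 0
    r = np.random.rand(*Z.shape)
    A = 2 * a * r - a                       # coefficient vector A_cv
    C = 2 * np.random.rand(*Z.shape)        # coefficient vector C_cv
    if np.random.rand() < 0.5:              # p < 0.5 branch of Eq. (6)
        if np.all(np.abs(A) < 1):           # exploit: shrink around the best
            D = np.abs(C * Z_best - Z)      # Eq. (2)
            return Z_best - A * D           # Eq. (3)
        D = np.abs(C * Z_rand - Z)          # Eq. (7): random prey search
        return Z_rand - A * D               # Eq. (8)
    l = np.random.uniform(-1, 1)            # p >= 0.5: spiral update
    D_prime = np.abs(Z_best - Z)            # Eq. (4)
    return D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + Z_best  # Eq. (5)
```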

3.2 Water wave optimization algorithm (WWOA)

Recently, a water wave theory-based optimization algorithm was introduced for solving global optimization problems (Zheng [42]). This algorithm inherits the propagation, refraction, and breaking phenomena of water waves for searching and optimization. In the propagation operation, new water waves are generated using Eq. (9), while the wavelength λ is updated using Eq. (10).

$$ \overrightarrow{Z}\left(t+1\right)=\overrightarrow{Z}+\mathit{\operatorname{rand}}\left(-1,1\right)\times \lambda \times {L}_d $$
(9)
$$ \lambda =\lambda \times {\alpha}^{\frac{f(x)-{f}_{min}+\varepsilon }{{f}_{max}-{f}_{min}+\varepsilon }} $$
(10)

Here, Ld is the length of the dth dimension of the search space (1 ≤ d ≤ n), λ is the wavelength, fmin and fmax are the minimum and maximum fitness values, respectively, α is the wavelength dropping factor, and ε is a small positive constant.
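A minimal sketch of the propagation operator, following Eqs. (9) and (10) as stated above (function and parameter names are illustrative):

```python
import numpy as np

def propagate(Z, wavelength, L, alpha, f, f_min, f_max, eps=1e-12):
    """WWO propagation: move a wave and update its wavelength.

    Z          : current wave (candidate solution), shape (d,)
    wavelength : current wavelength lambda of this wave
    L          : per-dimension length of the search space (L_d)
    alpha      : wavelength dropping factor
    f, f_min, f_max : fitness of Z and the population-wide extremes
    eps        : small positive constant avoiding division by zero
    """
    Z_new = Z + np.random.uniform(-1, 1, Z.shape) * wavelength * L      # Eq. (9)
    wavelength *= alpha ** ((f - f_min + eps) / (f_max - f_min + eps))  # Eq. (10)
    return Z_new, wavelength
```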

3.3 Tabu search

Tabu search is an elite list-based global optimization technique. The starting solutions are stored in the list and iteratively compared with the upcoming solutions. If an improved solution is obtained, the previous/starting solution is updated/replaced with the better solution. The implementation of tabu search avoids re-entering previously explored regions and uses a single point for exploration [42].
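A schematic fragment of such a list (the rounding-based comparison and the list length of 9 are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def tabu_accept(candidate, tabu_list, max_len=9):
    """Accept a candidate only if it does not revisit an explored region.

    Solutions are compared through a rounded tuple key, so nearby
    points count as the same region (a simplifying assumption).
    """
    key = tuple(np.round(candidate, 4))
    if key in tabu_list:
        return False                 # region already explored: reject
    tabu_list.append(key)
    if len(tabu_list) > max_len:
        tabu_list.pop(0)             # forget the oldest entry
    return True
```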

3.4 Neighbourhood strategy

The neighbourhood strategy is used to enhance the searchability of the algorithm and increases the probability of finding a new solution. It primarily centers on the neighbouring solutions and uses them to generate new solutions [30].

4 Proposed work: An enhanced whale optimization algorithm (EWOA) for partitional clustering

This section details the EWOA for solving partitional clustering problems. In this study, two improvements are proposed: (i) the propagation method is incorporated into the whale optimization algorithm; (ii) an integrated strategy is proposed to handle the local optima situation. A detailed description is given below.

4.1 Improvements in search space mechanism

An additional exploration mechanism is incorporated into the whale optimization algorithm to enhance its searchability. The random prey search operation of the whale optimization algorithm is replaced with the propagation method of the water wave optimization algorithm given in Eqs. (9) and (10). The explorative search mechanism of the water wave algorithm is utilized to generate the new location vector and diversify the solution, as sketched below.
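In code terms, the modification amounts to swapping the random prey search branch (Eqs. (7)-(8)) of the WOA update for the propagation step. The sketch below reuses the propagate helper sketched in Section 3.2 and is illustrative, not the authors' code:

```python
import numpy as np

def ewoa_position_update(Z, Z_best, wavelength, L, alpha,
                         f, f_min, f_max, t, T, b=1.0):
    """WOA update with WWO propagation replacing the Z_rand branch."""
    a = 2 - 2 * t / T
    r = np.random.rand(*Z.shape)
    A = 2 * a * r - a
    C = 2 * np.random.rand(*Z.shape)
    if np.random.rand() < 0.5:
        if np.all(np.abs(A) < 1):                 # exploitation (Eqs. 2-3)
            D = np.abs(C * Z_best - Z)
            return Z_best - A * D, wavelength
        # EWOA change: WWO propagation instead of the random prey search
        return propagate(Z, wavelength, L, alpha, f, f_min, f_max)
    l = np.random.uniform(-1, 1)                  # spiral move (Eqs. 4-5)
    return (np.abs(Z_best - Z) * np.exp(b * l) * np.cos(2 * np.pi * l)
            + Z_best), wavelength
```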

4.2 Integration of tabu and neighbourhood search strategies

In the second improvement, an integrated strategy based on tabu and neighbourhood search is designed and implemented to escape local optima and nullify premature convergence. Here, the tabu list is extended to store N global best positions Z_{N,gbest}. These positions are used as neighbouring points in the neighbourhood search strategy, and a single new point is then generated by computing their harmonic mean. More precisely, assume that Z_{tabu,gbest} is a tabu list that stores N global best data points Z_{N,gbest}. These data points are used as neighbouring points Z_{i,neigh} = {Z_{1,gbest}, Z_{2,gbest}, …, Z_{N,gbest}}, where N = 1, 2, …, 9, and the harmonic mean of these neighbouring points, Z_new = HarmonicMean(Z_{N,neigh}), is used to generate a new data point, as sketched below.
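A small sketch of the escape step, assuming the tabu list simply stores up to N = 9 global-best vectors and that the harmonic mean is taken attribute-wise (both read from the description above):

```python
import numpy as np
from scipy.stats import hmean

def escape_local_optimum(tabu_gbest):
    """Generate a replacement point from the stored global-best positions.

    tabu_gbest : list of up to N (N <= 9) global-best vectors Z_{N,gbest}
    Returns their attribute-wise harmonic mean, Z_new.  Note that the
    harmonic mean assumes positive attribute values.
    """
    neighbours = np.asarray(tabu_gbest)   # Z_{i,neigh}
    return hmean(neighbours, axis=0)      # Z_new
```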

4.3 Proposed EWOA model in solving clustering problems

The enhanced whale optimization algorithm is implemented in the clustering field to achieve the optimal cluster centres. Initially, the dataset is loaded into memory, and the basic parameters are initialized. Afterward, the consecutive operations of sampling or cluster centre selection, objective function computation, assignment of data objects to respective clusters, and updating follow. The pseudo-code of the proposed algorithm is detailed in Algorithm 1 and graphically presented in Fig. 1.

Fig. 1 Flow chart of EWOA
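Since Algorithm 1 is reproduced only as an image, the overall flow can be sketched as below. This is a reading of the steps described in this section, reusing the ewoa_position_update and escape_local_optimum sketches from Sections 4.1 and 4.2 (the stall limit, initial wavelength, and α value are illustrative choices):

```python
import numpy as np

def intra_cluster_distance(Z, centres):
    """Objective based on Eq. (1): each object contributes its distance
    to the nearest centre."""
    D = np.sqrt(((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2))
    return D.min(axis=1).sum()

def ewoa_clustering(Z, K, T=200, stall_limit=10, alpha=1.001):
    """Sketch of the EWOA clustering loop (illustrative, not verbatim)."""
    n, d = Z.shape
    pop = K * d                                    # population size (Section 5)
    L = np.tile(Z.max(axis=0) - Z.min(axis=0), K)  # search-space lengths
    # Each whale encodes K centres as a flat vector of length K*d
    whales = [Z[np.random.choice(n, K, replace=False)].ravel()
              for _ in range(pop)]
    wavelengths = np.full(pop, 0.5)
    fit = lambda w: intra_cluster_distance(Z, w.reshape(K, d))
    best = min(whales, key=fit).copy()
    tabu_gbest, stall = [best.copy()], 0
    for t in range(T):
        fits = np.array([fit(w) for w in whales])
        for i in range(pop):
            whales[i], wavelengths[i] = ewoa_position_update(
                whales[i], best, wavelengths[i], L, alpha,
                fits[i], fits.min(), fits.max(), t, T)
        candidate = min(whales, key=fit)
        if fit(candidate) < fit(best):             # improvement found
            best, stall = candidate.copy(), 0
            tabu_gbest = (tabu_gbest + [best.copy()])[-9:]   # keep N <= 9
        else:                                      # stagnation
            stall += 1
            if stall >= stall_limit:               # local optimum: Section 4.2
                best = escape_local_optimum(tabu_gbest)
                stall = 0
    return best.reshape(K, d)
```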

4.4 Toy example

The working of the EWOA in the clustering field is exemplified using an artificial dataset. The artificial dataset (9, 3, 4) contains 9 data instances, 3 classes, and 4 attributes.

Step 1. Load the dataset and specify the number of clusters (K = 3), total population = 9, and number of iterations = 10.

5.1  3.5  1.4  0.2
4.9  3.0  1.4  0.2
4.7  3.2  1.3  0.2
7.0  3.2  4.7  1.4
6.4  3.2  4.5  1.5
6.9  3.1  4.9  1.5
6.3  3.3  6.0  2.5
5.8  2.7  5.1  1.9
7.1  3.0  5.9  2.1

Step 2. Randomly select the initial cluster centres.

4.7000  3.2000  1.3000  0.2000
6.9000  3.1000  4.9000  1.5000
5.8000  2.7000  5.1000  1.9000

Step 3. Evaluate the objective function (distance of each data object to each of the three cluster centres).

0.5099  4.1641  4.2083
0.3000  4.2367  4.1809
0.0000  4.4159  4.3347
4.2767  0.2646  1.4491
3.8497  0.6481  1.0630
4.4159  0.0000  1.2530
5.4727  1.6155  1.3342
4.3347  1.2530  0.0000
5.5290  1.1874  1.5684

Step 4. Assign data objects to clusters according to the minimum objective function values. The objective function values of each data object, sorted in ascending order, are:

0.5099  4.1641  4.2083
0.3000  4.1809  4.2367
0.0000  4.3347  4.4159
0.2646  1.4491  4.2767
0.6481  1.0630  3.8497
0.0000  1.2530  4.4159
1.3342  1.6155  5.4727
0.0000  1.2530  4.3347
1.1874  1.5684  5.5290

The index values of the clusters are:

1  2  3
1  3  2
1  3  2
2  3  1
2  3  1
2  3  1
3  2  1
3  2  1
2  3  1
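The Step 3 distances and the nearest-centre assignments can be reproduced with a short check (a sketch assuming NumPy and SciPy; the data and centres are copied from Steps 1 and 2):

```python
import numpy as np
from scipy.spatial.distance import cdist

Z = np.array([[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2],
              [4.7, 3.2, 1.3, 0.2], [7.0, 3.2, 4.7, 1.4],
              [6.4, 3.2, 4.5, 1.5], [6.9, 3.1, 4.9, 1.5],
              [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9],
              [7.1, 3.0, 5.9, 2.1]])
C = np.array([[4.7, 3.2, 1.3, 0.2],    # initial centres from Step 2
              [6.9, 3.1, 4.9, 1.5],
              [5.8, 2.7, 5.1, 1.9]])
D = cdist(Z, C)                        # 9 x 3 matrix of Eq. (1) distances
print(np.round(D[0], 4))               # [0.5099 4.1641 4.2083], as in Step 3
labels = D.argmin(axis=1) + 1          # nearest-centre index of each object
print(labels)                          # [1 1 1 2 2 2 3 3 2]
```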

Step 5. Cluster centres generated in the 9th iteration.

5.1000  3.5000  1.4000  0.2000
6.3000  3.3000  6.0000  2.5000
7.1000  3.0000  5.9000  2.1000

Step 6. Check for local optima.

Step 7. Update the candidate solution.

Step 8. Check the 'Stop' criteria. If the requirements are met, stop; otherwise repeat Steps 3-8.

Step 9. Optimal solution:

5.1000  3.5000  1.4000  0.2000
6.3000  3.3000  6.0000  2.5000
7.1000  3.0000  5.9000  2.1000

5 Experimental results and analysis

This section provides a detailed description of the simulation results and parameter settings for the EWOA. The simulation is performed in the MATLAB 2016 environment on a machine running Windows 10 with an Intel i3 processor and 8 GB of RAM. The performance of the proposed EWOA is measured on eight datasets, whose characteristics are detailed in Table 1. The results are compared with seven clustering algorithms, namely the PSO, ACO, CSO, GA, ACRO, WOA, and K-means clustering algorithms. The user-defined parameter settings of the EWOA are: population = K × d, number of clusters or groups = K, A = [−1, 1], a random function in (0, 1), length of the search space (1 ≤ d ≤ n), and iterations = 200. Each algorithm is run thirty times, and the results are reported as the average of the performance parameters (intra-cluster distance and f-measure).

Table 1 Description of datasets

5.1 Results and discussion

This subsection presents a comparative analysis and the convergence behaviour of the EWOA and other clustering algorithms. Table 2 presents the performance comparison of the K-means, GA, PSO, ACO, CSO, ACRO, WOA, and EWOA algorithms using the average intra-cluster distance and f-measure parameters. From the simulation outcomes, it is observed that the EWOA obtains the minimum intra-cluster distance values except on the CMC dataset. Further, the f-measure is also computed to assess the assignment of data objects to the corresponding clusters. The EWOA attained a healthy f-measure rate except on the CMC and LR datasets; on the CMC dataset the ACRO algorithm, and on the LR dataset GA, obtained superior results.

Table 2 Performance comparison of EWOA and other well-known clustering algorithms
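The f-measure used for evaluation can be computed as follows; since the paper does not spell out its exact variant, this sketch uses a common definition from the partitional-clustering literature, where each true class is matched with its best cluster:

```python
import numpy as np

def f_measure(true_labels, cluster_labels):
    """Clustering f-measure: weighted best F-score over class/cluster pairs."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    n, score = len(true_labels), 0.0
    for c in np.unique(true_labels):
        n_i = np.sum(true_labels == c)          # size of true class c
        best = 0.0
        for k in np.unique(cluster_labels):
            n_j = np.sum(cluster_labels == k)   # size of cluster k
            n_ij = np.sum((true_labels == c) & (cluster_labels == k))
            if n_ij:
                p, r = n_ij / n_j, n_ij / n_i   # precision and recall
                best = max(best, 2 * p * r / (p + r))
        score += (n_i / n) * best               # weight by class size
    return score
```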

The convergence behaviour of the EWOA, WOA, ACRO, CSO, ACO, PSO, GA, and K-means clustering algorithms is depicted in Fig. 2a-h. The x-axis shows the number of iterations, and the y-axis shows the intra-cluster distance. The graphs reveal that the EWOA converges to lower intra-cluster distance values except on the CMC dataset. Overall, the EWOA provides a better convergence rate.

Fig. 2 a Convergence on Iris dataset b Convergence on Cancer dataset c Convergence on CMC dataset d Convergence on Wine dataset e Convergence on Glass dataset f Convergence on Thyroid dataset g Convergence on LR dataset h Convergence on ISOLET dataset

Except for the CMC and LR datasets, the EWOA achieved a better f-measure rate; the ACRO algorithm performed best on the CMC dataset and GA on the LR dataset.

5.2 Statistical analysis

The Friedman statistical test is carried out to prove the significance of the results and verify the feasibility of the newly proposed algorithm. Two hypotheses are projected: the null hypothesis (H0) expresses that the algorithms have similar performance, while the alternative hypothesis (H1) expresses that the algorithms have dissimilar performance. Table 3 shows the statistical analysis using the intra-cluster distance parameter. The test shows that the critical value is 14.067144 and the p value is 7.12E-07 at a significance level of 0.05. Since the p value falls below the significance level, the null hypothesis (H0) is rejected, proving that the algorithms have dissimilar performance. The EWOA was also found to have significantly distinct performance compared to the other algorithms considered in this study.

Table 3 Statistical analysis using intra-cluster distance
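The test itself is standard and can be reproduced with SciPy; the sketch below uses made-up intra-cluster distance values purely to show the mechanics, not the paper's actual numbers:

```python
from scipy.stats import friedmanchisquare

# One row of measurements per algorithm, one column per dataset
# (values below are illustrative placeholders)
kmeans = [97.3, 2988.4, 5693.2]
woa    = [96.9, 2975.1, 5542.8]
ewoa   = [96.6, 2964.4, 5532.2]

stat, p_value = friedmanchisquare(kmeans, woa, ewoa)
# Reject H0 (similar performance) when p_value < 0.05
print(stat, p_value)
```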


Table 4 Statistical analysis using F-measure

Table 4 shows the statistical analysis using the f-measure parameter. The EWOA obtains the first rank for all the datasets except the CMC and LR datasets. However, for the cancer, wine and balance datasets, the results obtained from the EWOA were found to be approximately equal to those of the ACRO algorithm. The critical value of 14.0671 indicates that there is a significant difference in the performance of the algorithms.

6 Conclusion and future work

In this study, an improvement to the original WOA called the Enhanced Whale Optimization Algorithm (EWOA) has been developed for solving clustering problems. This improved algorithm has proven able to overcome the problems that are inherent in the original WOA, namely the slower convergence rate caused by the lack of coordination between the exploration and exploitation processes, and the local optima situation. To overcome these problems, the EWOA is enhanced with two additional operational procedures to accelerate the convergence rate and overcome the local optima situation. A minimum intra-cluster distance and an accelerated convergence rate were achieved by incorporating the position update equations from the water wave optimization algorithm to improve the search space, whereas the local optima situation was overcome by implementing the tabu and neighbourhood search strategies in the algorithm. The efficiency of the proposed EWOA was measured using a simulation-based experimental study that was conducted on eight benchmark datasets, namely the Iris, Cancer, CMC, Wine, Glass, Thyroid, LR and ISOLET datasets. The results obtained were then compared to the results obtained via seven existing clustering algorithms/techniques, namely the PSO, ACO, CSO, GA, ACRO, WOA, and K-means algorithms. The performance of each algorithm was compared and analyzed using the average intra-cluster distance and f-measure parameters. The results obtained clearly showed the applicability and feasibility of the enhancements that were made in the EWOA and the superiority of the proposed EWOA model in solving clustering problems compared to the existing models/methods. The future scope of this work involves the application of the proposed EWOA model to problems related to vehicular networks, such as cluster head formation and load balancing.