1 Introduction

According to Britannica, optimization is a collection of mathematical principles and methods used in many disciplines to solve quantitative problems [1]. The optimization process is applied in almost every field today. While researchers solved simple optimization problems in traditional ways in the past, nowadays, increasingly complex optimization problems are solved with the help of computers.

Real-world problems in the optimization field are complex and challenging to solve. Algorithms used to obtain exact results in such complex and challenging to solve problems are usually slow in execution time. They are designed to function only in the problem they are intended to solve. It is impossible to use such algorithms for other challenging and complex problems. For this reason, algorithms that do not provide exact solutions and work faster have been developed. These algorithms are called heuristic algorithms. Heuristic algorithms are faster than other algorithms and aim to find a solution close to the best solution by evaluating all possibilities in the solution space. However, they never guarantee that they will find the best solution. Although such algorithms work fast, they are the algorithms dependent on the problem they apply since the information of the problem that the algorithms will use is used in the development phase. These algorithms are called classical heuristic algorithms.

Apart from classical heuristic algorithms, meta-heuristic algorithms are not dependent on the problem. Meta means high level. In other words, we can evaluate meta-heuristic algorithms as high-level heuristic algorithms. These algorithms are often designed with inspiration from nature and can be applied to many problems by simply changing the objective function [2]. Although meta-heuristic algorithms do not have any information about the problem, they can be considered a kind of black box because they can give the most appropriate variable values for the most appropriate solution. Recently, some engineering problems that we know np-hard problems were solved with metaheuristic algorithms. No Free Lunch Theorem of Optimization [3] proves that optimization algorithms are not always a suitable and perfect tool for all optimization problems. That means metaheuristic algorithms perform better results on specific optimization problems and not as well on other problems. Hence, this theorem encourages researchers to develop more efficient metaheuristic algorithms. Even in the last two years, researchers solved problems such as damage identification in plate structures [4], identification of static and dynamic cracks on mechanical structures [5], and structural damage detection [6] with metaheuristic algorithms. Also, enhanced versions of these algorithms are still being researched, such as a novel version of Cuckoo search [7] in recent years.

The primary purpose of this study is to develop a new meta-heuristic algorithm. The developed algorithm is inspired by the movement of electricity in a highly resistive environment. The proposed algorithm is called Electrical Search Algorithm (ESA). In this work, we compared our proposed algorithm with some of the early developed and recently developed metaheuristic algorithms which are Genetic Algorithm (GA) [8], Particle Swarm Optimization (PSO) [9], Backtracking Search Algorithm (BSA) [10], Differential Search Algorithm (DSA) [11], Marine Predators Algorithm (MPA) [12], K-Means Clustering Optimization Algorithm (KO) [13], and Harris Hawks Optimizer Algorithm (HHO) [14].

We compared the performance of the proposed algorithm with the performance of seven existing algorithms in four benchmark test functions and the clustering performance of four UCI machine learning clustering datasets of the K-Means algorithm. Also, we tested the proposed algorithm in IEEE CEC (2019) accuracy benchmark functions.

The rest of the paper is organized as follows. Background work and literature review are presented in Sect. 2. The problem definition of the clustering problem and the proposed Electrical Search Algorithm are presented in Sect. 3. Section 4 consists of performance evaluations and an analysis of the proposed algorithm. As a final, conclusions and future works are mentioned in Sect. 5.

2 Background work

2.1 Data clustering

Data clustering defines a collection of patterns in a uniform dataset. The objective is to develop an automatic algorithm that can accurately classify an unlabeled dataset. Data clustering is a technique for organizing data into meaningful clusters based on similarity criteria to find groupings with the smallest intra-cluster and highest inter-cluster distances. Each cluster contains data that are related to each other but different from other clusters. The clustering problem is proven that an NP-hard problem. All clustering algorithms can be classified into three categories: hierarchical, partitional, and overlapping. Hierarchical algorithms work by taking each data as a cluster. Then adds, similar data into the same cluster, and this process reduces the cluster number. The algorithm runs until the predefined cluster number is achieved. Overlapping algorithms are better expressed as fuzzy clustering. In this type of clustering, all data is a member of all clusters with a membership degree. By assigning each data to the cluster with the highest degree of membership, we can assign these data to the same clusters. Partitional algorithms take data and assign them into clusters according to the similarity between data and cluster center. The partitional clustering method splits a data set into several clusters according to the fitness function. The fitness function directly influences the nature of cluster formation. The partitioning job is transformed into an optimization problem once a suitable fitness function has been chosen, for instance, clustering data based on minimization of inter-cluster distance or maximization of the intra-cluster distance of cluster centers [15].

This approach is the most popular since MacQueen developed the k-means algorithm [16]. With this approach, algorithms can cluster large datasets easily. Because of that, researchers from various fields use these types of algorithms. These fields include, but are not limited to, signal and image processing, wireless sensor network coverage, robotics, web mining, pattern recognition, consumer identification in economics, and disease identification in medical sciences [15].

2.2 Data clustering as an optimization problem

Since we can handle clustering as an optimization problem, we can cluster data with meta-heuristic algorithms. Many studies have been done on solving clustering problems with meta-heuristics. Commonly used heuristic algorithms for solving clustering problems are Genetic Algorithms (GAs) and swarm-based optimization algorithms such as the particle swarm optimization algorithm (PSO).

In [17], the genetic algorithm is used to optimize clusters created during unsupervised clustering. The aim of the [17] is to assign data to clusters, and the algorithm’s performance depends on the initial population. Sarkar et al. used evolutionary programming for clustering in [18]. The methodology and the objective functions used for solving the clustering problem are similar to our approach. In work [19], another form of the genetic algorithm is proposed for solving clustering problems. However, it differs from [17] in that they selected initial cluster centers from data that will be clustered. In [20], another genetic algorithm variation is used to solve the clustering problem. In this work, the structure of the gene string consists of a representation of assigned clusters. In [21], the chromosome structure of the genetic algorithm is represented as strings of real numbers values of the cluster centers. The Genetic K-Means Algorithm (GKA) is introduced in [22]. GKA has one step K-Means operator, which takes assigned data points from coded strings and reassigns them to the nearest cluster centers. In [23], a hybrid k-medoid algorithm (HKA) is presented. HKA consists of k-medoid and local search heuristics, and the heuristic search part of the algorithm has hybridized with GA. Fast Genetic K-Means Algorithm (FGKA) is introduced in [24]. FGKA is inspired by [22]. The difference between the FGKA and GKA is that the FGKA allows illegal strings, and GKA tries to eliminate illegal strings. The incremental Genetic K-means Algorithm (IGKA) [25] is an extension of the FGKA [24]. IGKA has better running performance than FGKA in cases when the mutation probability is small. The idea behind the IGKA is that when the mutation probability is small, calculate the objective value and cluster centroids incrementally. Both algorithms converge to the same result, but IGKA is faster than FGKA. In the same paper, the authors also introduced the Hybrid Genetic K-means Algorithm (HGKA), a hybridized version of the IGKA and FGKA that combines the best parts of both algorithms. In [26], an algorithm called COWCLUS is presented. This algorithm is a combination of GA and hill-climbing algorithms. COWCLUS genes are represented as an assigned cluster of data points. COWCLUS acts as a standard GA algorithm, but at the last iteration, the algorithm performs a local search with the hill-climbing algorithm. COWCLUS uses the Variance Ratio Criterion (VRC) as the objective function to determine clusters.

Previously mentioned papers take the clustering problem in fixed cluster numbers. However, in the real world, in most cases, cluster size cannot be determined without prior knowledge. Algorithms that do not need to specify cluster size are developed to overcome this problem.

In [27], automatic clustering is proposed. The idea of the [27] is a weight added into the inter-distance in the objective function, which consists of subtraction of the inter-distance from intra-distance that can determine the cluster size. A small weight value causes clusters to be large numbers and compact, while a higher value of the weight results in clusters with a small number and broader. In paper [28], real coded GA is used for the clustering problem where the number of the clusters is not fixed prior. The paper’s fundamental idea is that each gene contains real coded clusters centers in the gene structure and can have a different length. That means that each gene can represent a different number of cluster centers. Another attempt for automatic clustering was made in [29]. This work also uses real-coded GA, and the gene structure contains real-coded cluster centers. Each gene’s length is fixed on a maximum number of clusters. The critical element of the paper is that in gene coding, there is a symbol represented as ”do not care,” which means genes can represent an unfixed number of cluster centers.

There are also swarm-based meta-heuristic algorithms that can solve the clustering problem. One of the popular swarm-based metaheuristic algorithms is PSO.

In [30], PSO is used for solving the clustering problem. The paper shows that the standard PSO algorithm can solve the clustering problem and develop a PSO hybrid that takes the initial seed from the K-Means algorithm. A different approach for clustering with PSO is made in [31]. Despite other approaches, a particle represents only one cluster center and is affected by data points in this approach. Each particle has one part of the solution. All particles must be united and held as the final solution to solve the clustering problem. Accelerated chaotic particle swarm optimization (ACPSO) is proposed to solve the clustering problem [32]. This work uses chaotic maps in standard PSO velocity updating formulation to fast convergence.

Besides pure PSO algorithms, hybrid algorithms are used for solving the clustering problem.

A hybrid version of the PSO and K-Harmonic Means(KHM) algorithm is introduced [33]. The algorithm called PSOKHM works sequentially as PSO and KHM. The first eight iteration works as PSO, and after the eighth iteration, the algorithm works throughout four iterations as KHM. This switching process continues until the end of the iteration limit. A combination of PSO and Rough Set (RS) is presented in [34]. Unlike the other approaches, in this work, the dimension of the data point has a cluster membership. If a data point reaches a certain amount of membership in that cluster, that data point is assigned to that cluster. In [35], a hybrid algorithm called FAPSO-ACO-K is proposed that consists of the combination of fuzzy adaptive particle swarm optimization (FAPSO), ant colony optimization (ACO), and the K-means algorithm. The algorithm differs from the PSO as all particles have their global best value, which can be selected and updated via the ACO trail intensity mechanism. After the algorithm reaches the stopping criteria, the final global best value is considered the initial solution of the K-Means algorithm, and the K-Means algorithm starts clustering. At the end of the calculation, if the found solution is better than the global best value solution algorithm considers the output result of the K-Means solution. If it is imperfect, the algorithm considers the output result of the PSO-ACO part.

In conclusion, in two decades, comprehensive work has been done to solve this np-hard problem. In our work, we selected the objective function, cluster coding scheme in agents, and clustering performance parameters in light of these works.

2.3 Particle swarm optimization

Particle Swarm Optimization (PSO) is a swarm intelligence-based optimization algorithm [9]. The behavior of a flock of birds searching for a food source has inspired this algorithm. The algorithm’s core principle is that birds in the swarm share information about newly discovered food resources. The birds move toward the best food resource among the newly discovered resources. While migrating to the best food source, birds examine the search space for new food resources, causing the algorithm to scan most of the search space for alternative food sources, potentially identifying an approximate best solution. In the algorithm, each bird in the flock is called a particle. The position update formula of the particle is given in Eq.(1), and the velocity update formula of the particle is given in Eq. (2). The mathematical expression of the search for food behavior can be defined as follows:

$$\begin{aligned} x_{ij}(t+1)=x_{ij}(t)+v_{ij}(t) \end{aligned}$$
(1)
$$\begin{aligned} v_{ij}(t+1)&=v_{ij}(t)+c_1r_{1j}(t)\left[ y_{ij}(t)-x_{ij}(t)\right] \nonumber \\&\quad +c_2r_{2j}(t)\left[ \Upsilon _j(t)-x_{ij}(t)\right] \end{aligned}$$
(2)

where variable \(x_{ij}\) and \(v_{ij}\) denote the current value and velocity, respectively, of the particle i in dimension j at iteration t. Variable \(y_{ij}\) and \(\Upsilon _{j}\) are the best solutions found by particle and the best solutions of all particles, respectively. Constants \(c_{1}\) and \(c_{2}\) are the personal and social learning coefficient, respectively, and the variables \(r_{1j}\) and \(r_{2j}\) are the random numbers generated in the range of [0.0,1.0].

2.4 Genetic algorithm

The genetic algorithm was introduced in 1975 by Holland [8]. The genetic algorithm is an evolutionary algorithm based on the biological reproduction mechanism. In the genetic algorithm, each optimization agent is defined as a gene, and the collection of these genes is expressed as a population. There are two unique steps in this algorithm. The first is the crossover stage, which combines two genes to create a new one. The second stage is a mutation, which randomly modifies different regions of the newly formed gene. With the help of these unique steps, randomly generated populations may converge to a good solution. Over time, the genetic algorithm evolved. Crossover operators [36,37,38], mutation operators [39], and gene selection mechanisms [40] were all introduced in the literature [41,42,43,44].

2.5 Differential search algorithm

The seasonal migration of various animals searching for a productive livelihood is the theory behind the development of the DSA. All organisms join together to form a superorganism and begin searching for efficient habitats. The individuals of the superorganism examine whether they fit the provisional criteria of randomly chosen places during their travels. Persons of the superorganism announced the stopover quickly nestle and continue their voyage from that area if any site is suited for their temporary layover during the trip. During a stopover, the superorganism uses a random process related to the Brownian-like random walk to explore the sites left between the organisms. Then, donors are created by reorganizing all of the superorganisms’ individuals. These donors are ideal for locating the stopover location. Randomly chosen persons of the superorganism migrate towards the donor’s destination to effectively find the stopover area. This change in the site allows the superorganism to keep moving toward the global minimum. The DSA is convenient for multimodal optimization problems because it does not prefer to select the best possible solution for a presented problem correctly. The DSA contains two fine-tuning control variables based on the task at hand. Detailed information on the DSA can be found in [11].

2.6 Backtracking search algorithm

The BSA [10], which is based on Evolutionary Algorithms (EAs), is designed to address common issues in EAs, such as high sensitivity to control parameters and premature convergence. The BSA follows the same five steps as the traditional EA: initialization, selection-I, mutation, crossover, and selection II. During selection-I, The BSA computes the historical population as a pointer of the search path. At the beginning of each iteration, the algorithm can redefine the historical people. A person from a prior generation, chosen randomly, behaves like a memory until it is modified. The BSA has a different mutation and crossover strategy than the EA and its improved forms. In the mutation phase, merely one parameter is employed to govern the expanse of the search direction matrix while producing trial populations. Crossover, on the other hand, is a difficult concept to grasp. The final trial population is created using two ways. The first method controls the number of individuals who will mutate in a trial using a mix rate.

In contrast, the second method enables merely one randomly selected person to mutate in each attempt. The population is updated using greedy selection in the second stage of the BSA, in which persons with only high fitness values in the attempted population are employed. Despite its simplistic formation, the algorithm’s usage of the dual population approach may make it time and memory-consuming to compute.

2.7 Marine predator algorithm

The idea behind the Marine Predator Algorithm (MPA) is the movements of ocean predators while hunting for their prey. MPA search agents use both Lévy flight and Brownian motion. This algorithm has three main stages. Phase one is when the prey is moving more swiftly than the predator. Phase two is when both predator and prey are moving at nearly the same rate, and phase three is when the predator moves faster than the prey. In phase one, exploration stands out, and Brownian motion is used. Later, in phase two, both Lévy flight and Brownian motion are used for exploration and exploitation. After that, in phase three, only Lévy flight is used for exploitation. The algorithm uses Brownian motion for giant leaps, and for small steps, it uses the Lévy flight strategy [12].

2.8 K-means clustering optimization algorithm

The main idea of the K-Means Clustering Optimization Algorithm (KO) is to find better potential search spaces by clustering the search space into three clusters. Unlike the other optimization algorithms, KO focuses on improving the search space, not the search agent itself. KO updates the initial population only by the assigned search agent to that specific population. Besides, the population in KO shrinks from the initial population number N to four by eliminating the worst solutions. Search agents search majorly in their cluster areas within the perimeter, which is calculated and shrank through the iterations. Also, agents can search outside their defined search space within the limit of the formulation. In each iteration, cluster centers are recalculated, and cluster centers are getting closer to the global best position at the late stages of the algorithm. Search agents select their strategy in the next iteration according to a formulation and a threshold value. More detailed information on the KO can be found in [13].

2.9 Harris Hawks optimizer algorithm

The Harris Hawks Optimizer algorithm (HHO) is inspired by the predator bird Harris’ Hawks and their behavior of foraging with their flock and family members. In nature, other predator birds search and hunt down the prey alone. In contrast, Harris’ Hawks have a unique cooperative foraging behavior. HHO has two exploration and two exploitation strategies. Exploration strategies are based on random location perch or other family members’ location perch, and they have the same chances of selection. The transition from exploration to the exploitation stage is based on escaping energy of the prey, which decreases through iterations. While the prey’s escaping energy is high in the early stages of the algorithm, the search strategy is selected as exploration. At the later stages of the algorithm, the search strategy changes from exploration the exploitation because of the decreased prey’s escaping energy. For the exploitation stage, HHO has four possibilities: soft besiege, hard besiege, soft besiege with progressive rapid dives, and hard besiege with progressive rapid dives. More detailed information on the HHO can be found in [14].

3 Material and methods

3.1 Definition of clustering problem

Clustering is the operation of dividing a given set of n points in an N-dimensional Euclidean space by a predefined number of groups (or clusters), such as K, based on the measure of similarity/dissimilarity. For example, let us say a group of n points \(x_1, x_2,\ldots , x_n\) is represented by S, and a cluster K by \(C_1, C_2,\ldots , C_K\). In this case, the mathematical expression of the clustering operation can be defined as follows:

$$\begin{aligned} C_i\ne \varnothing \qquad i=1,\ldots ,K \\ C_{ij}= \varnothing , \quad i=1,\ldots ,K, \quad j=1,\ldots ,K,\nonumber \\ i \ne j \quad \bigcup _{i=1}^K C_i=S \end{aligned}$$

The variables that we will optimize in the problem are the cluster centers. In general, this problem is a minimization problem. The problem aims to minimize the sum of the Euclidean distance difference between the points belonging to each cluster center. For the points x and y given by coordinates in n-dimensional Euclidean space, the distance formula is given in Eq. (3).

$$\begin{aligned} d(x,y)=\sqrt{\sum _{i=1}^n(x_i-y_i)^2} \end{aligned}$$
(3)

Since each point in the dataset is assigned to a cluster, the objective function can be defined as in Eq. (4).

$$\begin{aligned} f=\sum _{i=1}^nd(x_i-z_j) \end{aligned}$$
(4)

where, \(x_i\) is the member of the cluster \(k_j\) with the cluster center of \(z_j\), n is the number of points in the cluster \(k_j\) and j is defined between 1 to K, which is the number of the predefined cluster. Thus, our objective in solving the clustering problem is the minimize the function f.

Since we have an objective function, we can solve the problem with metaheuristics algorithms. In our approach, cluster centers are the variables that need to be optimized. Firstly, cluster centers for K clusters are randomly generated. After that, all data is assigned to the nearest cluster with the nearest cluster center. Thus, each data can have one cluster. After assigning each point to the nearest cluster, a new cluster center is found by simply averaging the points assigned to the cluster to ease the algorithm’s work. After that, the objective function f is calculated with new cluster centers. The optimization process continues with new cluster centers until predefined stopping criteria are met. A flowchart of the process is given in Fig. 1.

Fig. 1
figure 1

Flowchart of the clustering problem algorithm

3.2 Proposed algorithm

3.2.1 Movement of the electricity

Electricity is a phenomenon that occurs in nature. In nature, electricity can be found in many forms, such as lightning, the human nerve system, and some animals’ defense system. The movement of the electricity is called electric current and is defined as simply the motion of the atom’s valance electron. Suppose an existing potential energy source with a conductor attached to the poles of the energy source has an energy that is large enough to attract or repel an electron. In that case, the electrons in this source can move through the negative pole of the energy source to the positive pole with the help of the valance electrons in the conductor’s atom. By its nature, electricity likes to move in low-resistive areas. Since areas of less resistance require less energy than areas of higher resistance, it seeks areas of minimal resistance to moving. While searching the less resistive areas, electricity moves like Brownian motion and leaves behind Lichtenberg figures’ pattern. The mathematical structure of the search model of the proposed algorithm is based on the generation of Lichtenberg figures generated by electricity in high resistance fields [45]. In Fig. 2, we can see the pattern of the Lichtenberg figure. Although the shapes have random patterns, they are similar to trees that branch. Because of the tree structure, these shapes are also called Brownian Trees.

Fig. 2
figure 2

Lichtenberg figure created on wood with high voltage electricity [46]

The mathematical expression of how the valence electron moves in a high resistive area is given in Eq. (5) [45].

$$\begin{aligned} p(x,t)=\frac{N}{\sqrt{4\pi Dt}}\times \mathrm{{exp}}\left( \frac{-x^2}{4Dt}\right) \end{aligned}$$
(5)

where, N represents the total particle number, D the diffusion coefficient of the environment, t time, and x position.

Fig. 3
figure 3

Burn marks created on wood on high voltage electricity

The electricity moves to the opposite pole to complete the loop via low-resistance areas. The main inspiration of the work is the movement of the electricity while trying to complete the loop and its search for low resistive paths, not the shortest paths.

If we examine the Lichtenberg figures, we can see that electricity diverts to low resistive areas even if it has a chance to complete the loop via the shortest path. It moves around the high-resistive areas and branches to the low-resistive regions. The start point of the poles is the edge of the wood surface and searches the low resistive areas to the center. If we imagine the wood surface as a search space, the initialization scheme of the proposed algorithm cannot be random. It has to start at two different and far points, which is different from most metaheuristic algorithms. The electricity searches the entire wood surface slowly but stably. This persistence makes our algorithm tackle the early convergence problem. The electricity movement was done in the poles, which is a different mechanic for a search algorithm. In our proposed algorithm first half of the population searches the search space towards the opposite pole.

Meanwhile, the best location of the pole and the current best location of the entire search space is updated. After that other half of the population starts to search with updated values of the best locations. The search starts with only one agent in each pole, and new search agents are created with local search. While creating the agents, the best locations are also updated. This mechanism gives our algorithm a dynamic structure.

Moving steps of the alternating current high voltage electricity on wood given in Fig. 3. If we examine Fig. 3, we can see in 3a that the electricity starts to move with random branches. Because the alternating current, negative and positive poles of the electric source are shifting, it causes the movement of the electric start on both poles, respectively. Regular electric grid alternates at 50 or 60 Hz, depending on the country. It means polar shift happens every 0.02 or 0.0167 seconds. Because the shifting is too fast, the human eye cannot see the shifting motion, and it may seem that electricity moves from both poles simultaneously. However, it moves respectively. If we look at 3b right side of the pole is branched with two main branches, and the left side branched three separate ways but only continues from one. Later 3c, we can see that the left side of the pole continues from one way while the right side selects the main way to move forward. After that, in 3d, we can see that left side of the pole, the main way is abandoned, and a new main way is generated while the right side of the pole adjusts itself to changes in the left pole. When we look at the 3e and 3f, it may seem electricity almost completes the loop, but because of the high resistive areas main way on the left side of the pole is abandoned. A new way starts from the earlier main way, which is generated in 3c, and the right side of the pole adjusts itself and curves upward to reach the left side of the pole. In images 3g and 3h, we can see both poles search the wood for low resistive areas and move towards each other. If we look at the starting point of both poles, the shortest way is the linear line, but electricity chooses to consume less energy to move and complete the loop. This structure of the search mechanism creates our algorithm’s search pattern. Our proposed algorithm mimics this natural phenomenon and tries to search the entire search space with branches while abandoning high resistive areas and transferring energy to more valuable areas.

Fig. 4
figure 4

Flowchart of proposed ESA

3.2.2 Algorithm steps

The proposed algorithm mimics the event of the movement of electricity on a high resistive area or surface. Like electricity, our algorithm has two poles which are negative and positive. The algorithm starts in these poles. The search agent in the negative pole starts a search by referencing the minimum values of the search space dimensions. The search agent starts a search with maximum values of the search space dimensions for the positive pole. The search begins with an agent at both poles sequentially and increases the number of agents while agents of opposite poles attract each other. The maximum number of agents for each pole is equal and predefined at the beginning of the algorithm. Agents are attracted to both the best of the opposite pole agent and the global best agent, which can be in the opposite pole or the same pole as the agent. As shown in Fig. 2, many branches encountered with very high resistance are terminated prematurely and are not continued moving to the opposite pole. We can say those branches failed for search or are trapped in local extremum points. To mimic this mechanism, we added a failure variable for the agents. This variable will be increased whenever an agent fails to provide a better solution for the problem. That means our agents have a memory feature that can remember the best solution of their own.

After an agent passes the limit of the failure variable, the proposed algorithm will delete the agent, and a new agent will be created in deleted agent’s assigned pole. This mechanism provides the algorithm solution for the problem of being stuck up in the local extremum point. As a result of that, the population gets better diversity. The flowchart of the proposed algorithm is given in Fig. 4. The proposed algorithm uses Eq. (6) for global search and Eq. (7) for local search. Besides that, the local search formula is an adapted version of Eq. (5).

$$\begin{aligned} x_i(t+1)= & {} 0.7\times \left( r_1\times x_i+\left( 1-r_1\right) \times x_{\mathrm{{best}}_i} \right) \nonumber \\{} & {} +0.3\times \left( r_2\times x_i+\left( 1-r_2\right) \times x_{\mathrm{{pole}}_i}\right) \end{aligned}$$
(6)
$$\begin{aligned} x_i(t+1)= & {} x_i+\frac{0.3\times \Delta x_\mathrm{{best}}+0.7\times \Delta x_{\mathrm{{pole}}}}{\sqrt{4\pi Dt}} exp\left( \frac{-1}{4Dt}\right) \nonumber \\ \end{aligned}$$
(7)

In Eq. (6), variable \(x_i\) denotes the current value of the agent i. Variables \(r_1\) and \(r_2\) are the random numbers generated in the range of [0.0,1.0]. Finally, variables \(x_{\mathrm{{best}}_i}\) and \(x_{\mathrm{{pole}}_i}\) denote the values of the best agents in the search space and opposite pole in that iteration, respectively. Eq. (6) mimics the attraction of the opposite poles. The values \(x_{\mathrm{{best}}_i}\) and \(x_{\mathrm{{pole}}_i}\) change in every iteration, and these values are specific for the running iteration. The change in values mimics the adjustment when the pole shifting occurs and a new main way for the pole is generated. The fixed values 0.7 and 0.3 are attraction coefficients selected by numerous experiments while developing the algorithm. Because we want exploration to occur in small and big steps, we limit the opposite pole attraction to 0.3 and the best location attraction to 0.7. Attraction steps have two possibilities: the best location occurs in the opposite pole, and the best location occurs in the same pole. When the best location occurs in the opposite pole search agent moves with big steps to the opposite pole and if the best location occurs in the same pole search agent moves with small steps to the opposite pole. Either way, the agents always attract to opposite poles and tend to move towards opposite poles.

Fig. 5
figure 5

a Initialization of ESA, b Global and local search in early stages of ESA, c Global and local search in late stages of ESA, d Completion of search process of ESA, e Failing Branches, f Randomly created branches

In Eq. (7), same as Eq. (6), variable \(x_i\) denotes the current value of the agent i. Variables D and T represent the diffusion coefficient and iteration number, respectively. Finally, variables \(\Delta x_{\mathrm{{best}}}\) and \(\Delta x_{\mathrm{{pole}}}\) indicate the distance between the current agent’s position and the best agents of the search space and opposite pole, respectively. With the help of Eq. (6), the proposed algorithm uses a big step to reach the global best solution, and with Eq. (7), the algorithm uses a small step to approach the opposite pole’s best solution. All properties and phases of the ESA are visualized in Fig. 5. In section (a) of Fig. 5, the initialization of the ESA is shown. In sections (b) and (c), global and local search is initiated. Finally, in part (d) search process is completed. Subdivision (e) of Fig. 5 shows us the failing branches of the algorithm. In subdivision (f), we can see the creation of random new twigs and boughs resulting from the deletion of failing branches through the iterations. Eq. (7) mimics the branching structure of electricity. The phrase \(0.3\times \Delta x_{\mathrm{{best}}}+0.7\times \Delta x_{\mathrm{{pole}}}\) determines the direction and the phrase \(\frac{1}{\sqrt{4\pi Dt}}\times exp\left( \frac{-1}{4Dt}\right) \) determines the magnitude of the step. The algorithm’s exploitation phase consists of the agent’s branching movement after the exploration phase. The fixed values 0.7 and 0.3 are selected for the same purpose as in Eq. (6) to limit the branching with big and small steps. Also, this exploitation stage has two possibilities: When the best location occurs in the opposite pole, the search agent branches with big steps to the opposite pole, and if the best location occurs in the same pole, the search agent branches with small steps to the opposite pole.

The fixed values 0.7 and 0.3 can be adjusted for the problem, but we choose not to temper these coefficients. Because tuning these coefficients to best is itself an optimization problem.

The failure limit mimics the abandonment of the main ways and transferring energy to a different area in phenomena. This structure helps the algorithm not to trap in local points. The failure limit is determined with the ‰3 of the maximum iteration number. Also, the failure limit can be tuned. The failure limit determines whether the agents spend time on exploration or exploitation. For more exploration failure limit can be decreased, and for more exploitation failure limit can be increased.

Graphical abstract of the proposed algorithm is given in Fig. 6 and pseudo code of the algorithm is given in Algorithm 1

Fig. 6
figure 6

Graphical abstract of the proposed ESA algorithm

figure a

3.3 Test environment setup

We tested the proposed algorithm with four frequently used benchmark functions. These functions have multi-modal, unimodal, separable, and non-separable properties. Multi-modal functions have more than two local extremum points and are hard to solve compared to unimodal functions because of the probability of trapping in local extremum points. Solving a non-separable function is more challenging than solving a separable function because function’s each variable depends on the other variables. The comparison functions used to test the algorithm are given in Eq.(8), Eq. (9), Eq. (10), and Eq. (11) and named Rosenbrock, Ackley, Griewank, and Rastrigin functions, respectively. All benchmark problems are minimization problems; therefore, minimum results are better.

$$\begin{aligned} f(x)= & {} \sum _{i=1}^{d-1}\left[ 100\left( x_{i+1}-x_i^2\right) ^2+\left( x_i-1\right) ^2\right] \end{aligned}$$
(8)
$$\begin{aligned} f(x)= & {} -20\times \mathrm{{exp}}\left( -0.2\times \sqrt{\frac{1}{d}\sum _{i=1}^d x_i^2} \right) \nonumber \\{} & {} -\mathrm{{exp}}\left( \frac{1}{d}\sum _{i=1}^d\cos \left( 2\pi x_i\right) \right) +20+\mathrm{{exp}}(1) \end{aligned}$$
(9)
$$\begin{aligned} f(x)= & {} \sum _{i=1}^d\frac{x_i^2}{4000}-\prod _{i=1}^d\cos \left( \frac{x_i}{\sqrt{i}}\right) +1 \end{aligned}$$
(10)
$$\begin{aligned} f(x)= & {} 10d+\sum _{i=1}^d\left[ x_i^2-10\cos \left( 2\pi x_i\right) \right] \end{aligned}$$
(11)

Rastrigin, Ackley, and Griewank equations have a multi-modal property, and the Rosenbrock equation has unimodal property. Also, Ackley, Griewank, and Rosenbrock equations have a non-separable property, and the Rastrigin equation has separable property. We compared the proposed algorithm with GA, PSO, BSA, and DS algorithms on these four benchmark functions with three different dimension values, which are 10, 20, and 30, respectively. The control parameters of these algorithms are given in Table 1. In all test cases, the population limit was set to 100. Also, for fair evaluation, the iteration limit was set to 500, 1000, and 1500 for dimensions 10, 20, and 30, respectively.

Table 1 Control parameters of algorithms
Table 2 The basic parameters of the 100-digit challenge

The primary purpose of selecting the CEC 2019’s 100-Digit Challenge is that if we expand the time and iteration limit, our algorithm can perform better because its search mechanism structure tends to update the best locations. Besides that, the failure structure of the agents makes the agents born at different locations in the respective pole, which gives the algorithm better exploration within this expanded time and iteration limit. Because of the rounded and shifted versions of the problems, we can claim that our proposed algorithm does not tend to memorize the problems and searches randomly according to our formulations.

The CEC 2019’s 100-Digit Challenge problem can be found in [47]. The challenge consists of ten problems; seven of them are shifted and rotated. The other three problems are new and challenging. The aim is to calculate the accuracy of the function up to 10 digits without a time limit. Boundaries and dimension values of the IEEE CEC (2019) functions are given in Table 2. Control parameters of the proposed algorithm are set to population=30 Diffusion Coefficient=1.49 Iteration Number=1E+06.

The proposed algorithm and the popular clustering algorithm K-Means are compared on performance in solving the clustering problem. For comparison, we used four real-life datasets provided by the UCI Machine Learning Repository, which are Iris [48], Wine [49], Seeds [50], and HCV [51] datasets. Both for the proposed algorithm and K-Means algorithm, the maximum iteration is set to 150, and the highest cluster number is set to 10. Besides, the proposed algorithm’s agent number is limited to 200, and the failure limit is set to 5. The properties of these data sets are given in Table 3. The Davies–Bouldin index (DB-Index) [52] was used to evaluate the clustering problem performance. DB-Index measures the compactness and separateness of the clusters. The smallest DB-Index indicates that clustering performance is better for the found cluster centers.

Table 3 Dataset properties

3.3.1 Statistical tests

The comparability of the algorithms has frequently been inferred through hypothesis testing [53]. However, to conclude, it is necessary to determine the null hypothesis H0 and the alternative hypothesis H1. The null hypothesis is a statement that often denotes no differences between the algorithms being compared. The alternate hypothesis, on the other hand, represents the differences. For our purposes:

H0: There is no difference between compared algorithms

H1: There is a difference between compared algorithms

The statistical test probability value (\(\alpha \)) also determines whether the hypothesis should be disproved. The acceptable level for our test is 0.05.

A non-parametric statistical test, the Friedman test, was first presented by Friedman [54, 55]. It has been used to specify differences in how specific algorithms behave. The Friedman test results for compared algorithms are displayed in columns, and test cases are represented in rows. The test begins by ranking each row according to the values of the columns in the row, after which it calculates the overall rank values for each column. The \(X^2\) (Chi-square) distribution with k-1 degrees of freedom (df), where k is the number of compared methods, is used to calculate the test’s significance.

We can find the appropriate \(X^2\) values for df in [56]. Expected \(X^2 = 9.49\) from the table in [56] because \((\mathrm{{d}}f)\)=4 and \(\alpha =0.05\). The null hypothesis must be rejected if the actual \(X^2\) value is higher than anticipated.

Another non-parametric statistical test used to identify differences between two samples or algorithms is the Wilcoxon signed-rank test [57]. For creating the difference vector, the test typically starts with discrepancies between the two algorithms’ outputs, which have sizes of \(N*1\), where N is the total number of tests. Then, it assigns a rank of 1 to each row in the vector, starting with the minimum value. The difference vector’s \(R_-\) and \(R_+\) values are then calculated. The T value of the test, \(\min (R_-, R_+)\), is calculated based on the data. As a result, using the T value, the test’s probability value p is calculated.

4 Results and discussions

4.1 Comparison of the results of benchmark functions

All test cases are run ten times with relevant settings. Rosenbrock, Ackley, Griewank, and Rastrigin functions’ evaluation results for 10, 20, and 30 dimensions are given in Table 4. All values in Table 4 are the mean of the executed ten runs with corresponding algorithms.

Table 4 Evaluation results of the benchmark functions
Fig. 7
figure 7

Rosenbrock function in a 10 Dimension, b 20 Dimension and c 30 Dimension

Fig. 8
figure 8

Ackley function in a 10 dimension, b 20 dimension and c 30 dimension

Fig. 9
figure 9

Griewank function in a 10 dimension, b 20 dimension and c 30 dimension

Fig. 10
figure 10

Rastrigin function in a 10 dimension, b 20 dimension and c 30 dimension

We can see in Table 4 that our algorithm, ESA, has given almost similar results with MPA, KO, and HHO in each test case. Our algorithm found an exact solution in all ten runs in six of the twelve cases. Both HHO and KO have also found an exact solution in all ten runs in six of the twelve cases. MPA only found five exact solutions.

We can see in Figs. 7a, 8a, 9a, and 10a that ESA has performed a tremendous step-down pattern that is a product of our failure deletion procedure. Those failing agents are deleted whenever ESA is trapped in a local minimum point, and brand-new agents are created at corresponding poles. The Rastrigin function is the easiest of all three benchmark functions besides the Ackley and Griewank functions, which are at a similar difficulty.

ESA found better solutions than GA, PSO, BSA, and DS but only in Ackley with 20 and 30 dimensions; ESA found better results than MPA, KO, and HHO. Other than ESA performs similar results compared to MPA, KO, and HHO.

Our proposed algorithm is designed to handle the problem with a wide range of iterations and time limits. If we double the iterations, our algorithm performs the same as the MPA, KO, and HHO and always finds the exact solutions for all test cases.

4.2 Statistical analysis

4.2.1 Friedman test

We can see in Table 5 which algorithm is more successful. The algorithm that has a minimum rank value has a better performance compared to other algorithms. As shown in Table 5, ESA performs second best in all test functions.

Besides that probability value is found \(1,0245E-10\) is way less than \(\alpha =.005\), \(X^2\) value is 60, 842672, which is higher than expected value of 14.07. This means null hypothesis H0, ”There is no difference between compared algorithms” must be rejected. It means that the H1 hypothesis is true: ”There is difference between compared algorithms”. Results of the Friedman test were given in Table 6.

4.2.2 Wilcoxon signed rank test

The major differences between the compared methods are described by p values. For the null hypothesis to be rejected, these numbers need to be lower than 0.05. If not, it must accept it, implying that the comparing methods’ performance is equivalent.

We can see in Table 7 that ESA has yielded a different performance than the most of the compared algorithms. It means our proposed algorithm behaves differently from the compared algorithms. However, in the case of MPA and KO, our algorithm performed similar results.

4.3 Comparison of the results of the IEEE CEC (2019) 100 Digit Challenge Problems

100 Digit Challenge is inspired by Æsop’s fabled race between Tortoise and Hare. The main idea of the fable is that relying too much on speed as the success criterion is not suitable for always. Because of that primary goal of the challenge is to determine whether the algorithms that search persistently or aggressively are better.

The competition results and analysis are published in [58]. We can see in Table 8 that our proposed ESA algorithm has a score of 63,96. Due to our hardware limitations, we have to limit our iteration limit to \(10E+06\) and population size to 30. If we look at the competition results average iteration is between \(7*10E+8\) and \(8*10E+10\), and the population size is between 200 and 31623. Even with the hardware limitations, our proposed algorithm achieved good results. If we examine the [58], our score is above the bottom in the primary and middle in the secondary algorithms.

4.4 Comparison of the results of the clustering problem

We used the Davies–Bouldin index (DB-Index) for the performance analysis of the clustering problem. For all datasets, cluster number ”k” is started with two and ends with ten. It means that we have thirty-six test cases. We calculated the Davies–Bouldin index for each test case for ESA and K-Means algorithm and compared them. The comparison of the DB-Indexes is given in Table 9, and graphical comparisons of the DB-Indexes for the ESA and K-Means algorithm are given in Fig. 11.

We can see in Table 9 that ESA has better clustering performance than K-Means in all test cases. Also, if we look at the dataset basis, ESA conducted better clustering than K-Means on each dataset. In test cases where k is two, ESA and K-Means algorithms have the same DB-Index. The situation mentioned earlier refers to the fact that ESA has conducted the same performance on the clustering problem as the popular clustering algorithm K-Means. That proves that our algorithm can solve the clustering problem.

DB-Indexes are used to determine the optimal number ”k” and perform clustering analysis. Smaller DB-Indexes indicate generated clusters are compact and far from each other. For example, we can see in Fig. 11 that ESA has generated better clusters than the K-Means algorithm on all datasets for all tested k values.

5 Conclusion

This paper introduces a new nature-inspired metaheuristic optimization algorithm called the ESA. It is based on the movement of electricity in high-resistive areas. Like other metaheuristic algorithms, the ESA needs a population to start a search. This population consists of search agents who have a memory and a lifespan. Also, the ESA has a unique structure called negative and positive poles. Suppose we match these terms with electrical terms. Then, we can say that population is like the voltage, search agents are electrons, and pole structure is electrical polarity structure like negative and positive. Thus, electrons in opposite poles tend to move to each other. This move tendency is the basis of the search mechanism of the ESA.

To determine the efficiency and reliability of the proposed algorithm, the ESA is tested with four benchmark functions and solved the clustering problem, which is an np-hard problem with four different datasets. Benchmark functions are commonly used in literature, and data sets are well-known datasets. All benchmark function results are compared with seven different algorithms: GA, PSO, BSA, DS, MPA, KO, and HHO, and these algorithms have similar search mechanisms to the proposed ESA. As a result, the proposed ESA exhibits gratifying exploitation, exploration, and convergence characteristics with respect to GA, PSO, BSA, and DS and have similar results with MPA, KO, and HHO. Furthermore, with a great convergence rate, the ESA provides better results than the GA, PSO, BSA, and DS. Most of the time, while other algorithms cannot find the exact solutions, the ESA finds the exact solutions. Besides that, the statistical findings support the test results and show that ESA performs better than the other compared algorithms. The test results of the 100 Digit Challenge show us that our algorithm can handle complex problems even with limitations. Furthermore, ESA handled the problem efficiently for clustering performance and performed better clustering capability than the K-Means algorithm.

Table 5 Mean ranks of the algorithms

We can see that if we increase the time and iteration limit for solving the problems, our proposed algorithm can yield better results. Because of the design of the proposed algorithm, the search mechanism works at a slow but persistent rate to search valuable search spaces. Instead of using big steps for exploration, our algorithm uses mostly small steps. Therefore the iteration number for finding the exact solution is increased. Thanks to the failure mechanism and the initialization scheme, even in challenging problems proposed algorithm never traps at the local points. Results show that the proposed algorithm can get near the exact solution but, due to iteration limits, in some cases, never reaches the exact solution. The proposed algorithm differs from other metaheuristics compared to its initialization scheme, pole search mechanism, and update strategy of the best solutions.

Table 6 Friedman test statistics of the algorithms

We used a fixed ratio for exploration and exploitation movements. A dynamic ratio can be applied to the algorithm to overcome the slowness and late convergence. We used the diffusion coefficient of wood in our experiments; for future work, different diffusion coefficients can be used, and the effect on the algorithm can be compared. In addition, different initialization schemes of the initial population can be examined.

Table 7 p values for the Wilcoxon signed rank test
Table 8 Fifty runs for each function sorted by the number of correct digits
Table 9 Comparison of the DB-indexes
Fig. 11
figure 11

DB-indexes of ESA and K-means algorithm on a Iris dataset, b Wine dataset, c Seeds dataset and d HCV dataset