1 Introduction

Early-stage diagnosis plays a major role in increasing the chance of recovery from breast cancer. The World Health Organization (WHO) estimates that cancer incidence may rise to 27.5 million cases by 2040, with 16.3 million deaths due to cancer [1]. Currently, the average risk of a woman in the USA developing breast cancer sometime in her life is about 13%. According to the American Cancer Society, about 281,550 new cases of invasive breast cancer were expected in the USA in 2021, and about 43,600 women were estimated to die of breast cancer [2]. In metropolitan cities in India such as Mumbai, Chennai, Delhi, Bangalore, Ahmadabad and Bhopal, noninvasive breast cancer has affected 28% to 35% of the women population [3]. Thus, breast cancer has become a serious health issue around the globe, and early detection is essential in reducing fatalities [4]. Early detection can be done using various scanning methods such as magnetic resonance imaging, ultrasound imaging, self-examination, mammography and biopsies [5]. Traditional breast cancer diagnosis methods are time-consuming, and they fail because of inaccurate diagnoses caused by human error. Automated computer-based diagnosis schemes overcome the demerits of manual diagnosis, and hence, unnecessary surgeries and biopsies can be avoided [6]. Expert systems based on ANN have strong predictive capabilities, which makes them suitable for building medical diagnosis systems [7]. ANN-based decision-making systems have outperformed the traditional techniques used for classifying patterns.

Metaheuristic swarm intelligence approaches are used for solving real-time optimization problems [8,9,10]. Commonly used swarm intelligence approaches are ant colony optimization (ACO) [11] and particle swarm optimization (PSO) [12], inspired by the foraging behavior of ants and the social behavior of birds, respectively. The echolocation capability of microbats found in nature forms the basis of the bat algorithm (BA) [13]. A population-based swarm technique inspired by the foraging behavior of honey bees has also been introduced [14]. The dynamic and static behavior of dragonflies forms the basis of another metaheuristic called the dragonfly algorithm (DA) [15]. Based on the herding behavior of krill, a further swarm technique called the krill herd (KH) algorithm has been proposed [16].

A technique based on migration behavior, called monarch butterfly optimization (MBO), has been introduced [17]. The foraging behavior of E. coli bacteria for survival forms the basis of bacterial foraging optimization (BFO) [18]. Another swarm technique, the artificial immune system (AIS), is inspired by the biological immune system of the human body [19]. An algorithm for global optimization based on interior design and decoration has also been proposed [20]. The salp swarm algorithm (SSA), based on the swarming behavior of salps in the ocean, was introduced to solve multidimensional optimization problems [21]. Based on the Brownian and Levy movements of predators during their foraging process, another swarm technique called the marine predators algorithm (MPA) was proposed [22]. This paper focuses on hybridizing artificial bee colony optimization with the whale optimization algorithm to introduce the HAW algorithm. The proposed HAW algorithm integrates the employee bee phase of ABC with the encircling prey/bubble net attacking method of WOA to form an enhanced exploitative phase called the employee attacking phase. In the employee attacking phase, the bees follow the bubble net attacking method of the whales to find better food sources. The explorative phase of HAW is driven by a mutative initialization phase of the standard ABC algorithm.

Appropriate selection of ANN topology design parameters such as the number of hidden layers, the number of hidden nodes, the initial weight values between the connections, the learning rate and the learning algorithm plays a vital role in building a successful ANN model [23]. The convergence of the backpropagation learning process can be affected by improper selection of weights, causing the learning process to become trapped in local optima [24, 25]. Improper selection of the hidden nodes may cause the ANN classifier to suffer from underfitting and overfitting. If the number of hidden nodes in an ANN model is not appropriate to the amount of learning required for accurate diagnosis, the ANN classifier may be either overtrained, where the model gives accurate results during training but fails with inaccurate results during testing, or undertrained, where the prediction rate decreases. Based on the above discussion, this paper focuses on the optimal selection of the initial weight values and the hidden node size of an ANN model using the HAW algorithm within a wrapper architecture. The work aims at improving the learning performance of the ANN, avoiding the problems of overfitting and underfitting, with increased predictive capability.

FS deals with the deletion of irrelevant, redundant and noisy features present in the input dataset of a classifier. FS improves the generalization of an ANN classifier system with reduced computational time, as demonstrated in [26, 27]. Hence, simultaneously optimizing the input features and the ANN design parameters, such as the initial weights and hidden node size, can increase the predictability of the ANN classifier. Swarm-based intelligent systems are used for such coupled optimization of input features and ANN design parameters [28]. Given the importance of simultaneous optimization of ANN design parameters and the FS process, ANN topology optimization is coupled here with ABC optimization, whose powerful local and global search capabilities help in finding global optimal solutions.

This paper focuses on the following objectives:

  (i) A hybrid ABC-WOA optimization (HAW) that integrates the encircling prey and the bubble net attacking method of WOA with the employee bee phase of standard ABC to form an employee attacking phase.

  (ii) To enable HAW to escape from local optima, the proposed employee attacking phase uses the simulated annealing technique.

  (iii) To obtain a diversified set of solutions, the exploration of HAW is enhanced using the proposed mutative exploration phase of ABC.

  (iv) HAW is implemented for optimal feature subset selection and ANN parameter optimization using the Wisconsin breast cancer dataset. The HAW-optimized ANN model is evaluated in terms of accuracy, complexity and computational time.

1.1 Artificial bee colony (ABC) optimization

A metaheuristic swarm-based search mechanism called ABC was introduced by Karaboga in 2005. It is a population-based approach, inspired by the foraging nature of honey bees, that solves multidimensional and multimodal real-time optimization problems for different applications, as demonstrated in [29]. ABC is based on a stochastic process that is robust and highly flexible, with a small number of control parameters that keep it simple. The algorithmic steps of the ABC optimization process are described in Algorithm (1):


Algorithm 1: Artificial bee colony algorithm

  • Step 1: Initialization:

  • Food sources are randomly produced using Equation (1).

    $$A_{k}^{l} = A_{\min }^{l} + random\left( {0,1} \right)*\left( {A_{{\max }}^{l} - A_{{\min }}^{l} } \right)$$
    (1)
  • \(A_{k}^{l}\) represents the kth food source with its lth parameter, where k = 1, 2, …, N, in which N represents the maximum number of food sources, and l = 1, 2, …, dim, in which ‘dim’ represents the dimension, i.e., the number of parameters in the optimization problem. \(A_{max}^{l}\) and \(A_{min}^{l}\) are the maximum and minimum bounds of the lth parameter of the optimization problem, respectively.

  • Step 2: Quality Evaluation of food source:

  • The fitness values are identified for each food source \(A_{k}\).

  • Step 3: Employed bee Phase:

  • Food sources are assigned to employee bees or worker bees. The employee bees use Eq. (2) to search neighborhood food sources surrounding the current food sources \(A_{k}^{l}\).

    $$E_{k}^{l} = A_{k}^{l} + random\left[ { - 1,1} \right]{ }*\left( {A_{k}^{l} - A_{d}^{l} } \right)$$
    (2)
  • \(A_{d}\) is a randomly selected food source, where d ∈ {1, 2…, N} and ‘d’ should not be equal to ‘k’ for proper exploitation. ‘l’ is a random integer in {1, 2…, dim}. If the quality of \(E_{k}^{l}\) is better than that of \(A_{k}^{l}\), then the bee discards \(A_{k}^{l}\) and saves \(E_{k}^{l}\), or vice versa.

  • Step 4: Onlooker Bee Phase:

  • Information regarding the selected food sources is shared with the onlooker bees. The probability value \(Z_{k}\) of each food source received from the employee bee is calculated using Eq. (3).

    $$Z_{k} = \frac{{fitness\left( {A_{k} } \right)}}{{\mathop \sum \nolimits_{k = 1}^{N} fitness\left( {A_{k} } \right)}}$$
    (3)
  • The quality of the food source \(A_{k}\) is represented as \(fitness\left( {A_{k} } \right)\). The value \(Z_{k}\) of food source is compared with a \(random\left( {0,1} \right)\). Food sources with a \(Z_{k}\) value greater than \(random\left( {0,1} \right)\) are selected by the onlooker bees.

  • Step 5: Food source memorization:

  • The food source with the highest \(fitness\left( {A_{k} } \right)\) is selected and memorized.

  • Step 6: Scout bee phase:

  • In the scout bee phase, unimproved food sources are identified based on a counter value and they are replaced by a randomly generated food source according to Equation (1).
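Steps 1–6 can be condensed into a short, hedged Python sketch of one ABC run; the objective function, bounds and all parameter values below are illustrative assumptions rather than the settings used in this paper.

```python
import numpy as np

def abc_maximize(fitness, dim, n_sources=20, bounds=(-5.0, 5.0),
                 limit=10, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Step 1: random food sources, one row per source (Eq. 1)
    A = lo + rng.random((n_sources, dim)) * (hi - lo)
    fit = np.array([fitness(a) for a in A])          # Step 2: quality evaluation
    trials = np.zeros(n_sources, dtype=int)
    best = A[fit.argmax()].copy()

    def neighbour(k):
        # Eq. 2: perturb one random parameter toward/away from a random partner d != k
        l = rng.integers(dim)
        d = rng.choice([i for i in range(n_sources) if i != k])
        E = A[k].copy()
        E[l] = np.clip(A[k, l] + rng.uniform(-1, 1) * (A[k, l] - A[d, l]), lo, hi)
        return E

    def try_improve(k):
        E = neighbour(k)
        fE = fitness(E)
        if fE > fit[k]:                              # greedy selection
            A[k], fit[k], trials[k] = E, fE, 0
        else:
            trials[k] += 1

    for _ in range(max_iter):
        for k in range(n_sources):                   # Step 3: employed bee phase
            try_improve(k)
        Z = fit / fit.sum()                          # Eq. 3 (assumes positive fitness)
        for k in range(n_sources):                   # Step 4: onlooker bee phase
            if rng.random() < Z[k]:
                try_improve(k)
        if fit.max() > fitness(best):                # Step 5: memorize the best source
            best = A[fit.argmax()].copy()
        for k in np.where(trials > limit)[0]:        # Step 6: scouts replace stale sources
            A[k] = lo + rng.random(dim) * (hi - lo)
            fit[k], trials[k] = fitness(A[k]), 0
    return best

# Example: maximize 1/(1 + ||x||^2); the optimum lies at the origin
print(abc_maximize(lambda x: 1.0 / (1.0 + np.dot(x, x)), dim=4))
```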

1.2 Whale optimization algorithm (WOA)

WOA is a population-based swarm intelligence metaheuristic algorithm introduced by Mirjalili and Lewis [30], inspired by the foraging behavior of humpback whales. Humpback whales hunt groups of krill or small fish by swimming along a shrinking circle and producing bubbles along a ‘9’-shaped path. The exploitation phase is carried out using encircling prey and spiral-based bubble net attacking. A random search for prey is used for exploration. The exploitation phase of WOA is explained in Algorithm (2).

Algorithm 2: Encircling prey/bubble net attacking of WOA

To hunt prey for survival, humpback whales encircle the prey, which can be mathematically represented using Eqs. (4) and (5).

$$Y = \left| {\vec{L}.A^{*} \left( t \right) - A\left( t \right)} \right|$$
(4)
$$\vec{A}\left( {t + 1} \right) = \overrightarrow {{A^{*} }} \left( t \right) - \vec{M}*Y$$
(5)

where t represents the current iteration, \(\overrightarrow {{A^{*} }}\) represents the best solution found so far, \(\vec{A}\) is the position vector, | | denotes the absolute value, and \(\vec{L}\) and \(\vec{M}\) represent the coefficient vectors that can be obtained using Eqs. (6) and (7).

$$\vec{M} = 2\vec{m}*\vec{r} - \vec{m}$$
(6)
$$\vec{L} = 2* \vec{r}$$
(7)

where m is linearly decreased from 2 to 0 as the iterations proceed. r represents a random vector drawn from a uniform distribution over [0,1]. Each whale, representing a solution, updates its position using Eq. (5), where the updated position depends on the best position (prey) found so far. The positions of the whales can be controlled by adjusting the vectors L and M. The value of m is decreased to achieve the shrinking encircling behavior using Eq. (8).

$$m = 2 - t\frac{2}{MaxIterat}$$
(8)

where t represents the current iteration and MaxIterat represents the maximum number of iterations. The new position of the whale on the spiral path can be calculated using Eq. (9):

$$\vec{A}\left( {t + 1} \right) = Y^{\prime}*e^{ws} *\cos \left( {2\pi s} \right) + \overrightarrow {{A^{*} }} \left( t \right)$$
(9)

where \(Y^{\prime} = \left| {\overrightarrow {{A^{*} }} \left( t \right) - \vec{A}\left( t \right)} \right|\) indicates the distance between a whale and the best solution (prey). w is a constant that defines the shape of the logarithmic spiral, and s is a random number generated between [− 1,1]. Hence, the updated position of the whale is chosen between the two mechanisms with 50% probability using a random number \(P_{i}\) generated between [0,1], as represented by Eq. (10).

$$\vec{A}\left( {t + 1} \right) = \left\{ {\begin{array}{*{20}c} {\overrightarrow {{A^{*} }} \left( t \right) - \vec{M}*Y} & {if \left( {P_{i} < 0.5} \right)} \\ {Y^{\prime}*e^{ws} *\cos \left( {2\pi s} \right) + \overrightarrow {{A^{*} }} \left( t \right)} & {if \left( {P_{i} \ge 0.5} \right)} \\ \end{array} } \right.$$
(10)
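A hedged Python sketch of a single WOA position update over Eqs. (5)–(10) follows; the population size, the 2-D search space and the spiral constant are illustrative assumptions.

```python
import numpy as np

def woa_step(A, A_best, t, max_iter, rng, w=1.0):
    """One WOA position update for all whales at iteration t (0-based)."""
    n, dim = A.shape
    m = 2.0 - t * (2.0 / max_iter)               # Eq. 8: m decays linearly from 2 to 0
    new = np.empty_like(A)
    for i in range(n):
        M = 2.0 * m * rng.random(dim) - m        # Eq. 6
        L = 2.0 * rng.random(dim)                # Eq. 7
        s = rng.uniform(-1.0, 1.0)               # random spiral parameter in [-1, 1]
        if rng.random() < 0.5:                   # Eq. 10, first branch
            Y = np.abs(L * A_best - A[i])        # Eq. 4
            new[i] = A_best - M * Y              # Eq. 5: shrinking encircling
        else:                                    # Eq. 10, second branch
            Yp = np.abs(A_best - A[i])           # distance to the prey
            new[i] = Yp * np.exp(w * s) * np.cos(2 * np.pi * s) + A_best  # Eq. 9
    return new

# Toy usage: ten whales on a 2-D sphere function, prey = best whale so far
rng = np.random.default_rng(1)
whales = rng.uniform(-5, 5, size=(10, 2))
best = whales[np.argmin((whales ** 2).sum(axis=1))].copy()
whales = woa_step(whales, best, t=0, max_iter=50, rng=rng)
```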

1.3 Comparative investigation of ABC and WOA in terms of exploration and exploitation

In the context of exploration, WOA uses the search-for-prey phase, which depends entirely on a random search agent, a stochastic strategy. Similarly, ABC relies on scout bees performing a random search for exploration. This causes both algorithms to produce solutions concentrated in a local area at the initialization phase, losing diversification, so the search process converges prematurely and returns suboptimal solutions. Hence, both ABC and WOA are weak at exploration.

In the context of exploitation, WOA incorporates the local search process using the encircling prey and bubble net attacking methods. The WOA exploitative phase encourages convergence since the whale positions are updated using the best solution (prey) obtained so far; proper exploitation is thus guided toward the prey by the best solution found so far. Comparatively, ABC exploitation is carried out using the employee bee phase and the onlooker bee phase, where food source positions are updated by changing a single parameter of the old solution (food source), which causes similar food sources to converge at the same optimum locations. Also, the local search of ABC cyclically revisits the same solutions, creating a looping problem that makes the search converge prematurely. Hence, WOA is better at exploitation than ABC.

1.4 Problems that are addressed by the proposed HAW

Many researchers have used ABC and WOA to develop optimal classifiers for medical diagnosis purposes, but the standard ABC still suffers from the following issues, which HAW addresses.

  (i) The local search by the employee and onlooker bees cyclically revisits similar solutions, inducing a looping problem that makes the search process converge prematurely.

  (ii) ABC optimization concentrates solutions in local regions due to a lack of diversified solutions at initialization.

  (iii) The food source positions are updated by changing a single parameter of the old solution (food source), which causes similar food sources to converge at the same optimum locations.

  (iv) Exploitation is performed by two phases, namely the employee bee phase and the onlooker bee phase, whereas the exploration process is done only by the scout bees, which leads to an imbalance between exploration and exploitation.

HAW, which is capable of resolving the above issues, can be used to generate an optimized ANN classifier for accurate and efficient breast cancer diagnosis.

2 Related works

P. Shunmugapriya and S. Kanmani proposed an integrated algorithm of ABC and ACO for finding optimal feature subsets of medical datasets [31]. The global search of ABC is improved by feeding feature subsets generated by ACO into the ABC optimization process. The approach yielded an accuracy of 99.07% using the Wisconsin breast cancer dataset (WBCD). The exploitation uses traditional greedy selection, making the algorithm converge prematurely at local optima. The algorithm is used only for FS, with no parameter optimization. Zorarpacı and Ozel introduced a hybrid algorithm of DE and ABC for optimal binary subset selection [32]. The algorithm combines the high exploration property of DE with an improved onlooker bee phase of ABC. The approach achieved F-measures of 92.2, 96.4 and 97.6 for the decision tree, naive Bayes and RBF network classifiers, respectively, using WBCD. This algorithm, too, is used only for FS, with no parameter optimization.

Shanthi and Bhaskaran presented a modified ABC for FS [33]. The exploitation of the employee bees is improved, with the neighborhood search process guided by the global best solution. The modified ABC is used for FS on the benchmark mammographic image analysis society (MIAS) and digital database for screening mammography (DDSM) datasets for breast cancer diagnosis. Classification is carried out using a self-adaptive resource allocation network. The accuracy was evaluated as 96.89% and 97.17% for MIAS and DDSM, respectively. The algorithm did not focus on the explorative phase and used only randomized initial solutions, with a loss of diversification. Rao et al. applied an FS algorithm using ABC and decision trees based on the gradient boosting model [34]. The features are selected from the Wisconsin breast cancer dataset and Haberman’s survival dataset. A regression tree is used as the classifier, where gradient descent finds the direction of the gradient of the residuals. The classification accuracy is 74.3% for Haberman’s dataset and 92.8% for WBCD. The approach has not been evaluated in terms of complexity.

An efficient ABC is proposed by Badem et al. for optimal learning of deep neural networks (DNN) [35]. This algorithm used ABC and limited-memory Broyden–Fletcher–Goldfarb–Shanno (BFGS). The proposed ABC tuned the parameters of a DNN with cascaded autoencoder layers. The classification accuracy using WBCD is 73.03%. The step size of the neighborhood search is kept static throughout the entire search process, affecting convergence. Garro presented an optimized classification of DNA microarrays using ABC [36]. The optimal feature subsets from breast cancer datasets are selected using ABC and then given to an MLP, a radial basis function (RBF) neural network and a support vector machine (SVM). The accuracy attained is 94.7% for the MLP, 89.5% for the SVM and 73.7% for the RBF network. The algorithm used the standard ABC without any improvement. Palanisamy and Kanmani proposed ABC-based FS for UCI datasets [37]. The system chooses 2 features from the 9 attributes of WBCD and yielded an accuracy of 96.69%. The system is simple but used only the standard ABC.

Optimal FS using ABC for UCI repository datasets has also been proposed [38]. The employee bee phase is modified using a modification rate, where a feature is selected if a random number is greater than the modification rate. The classification accuracy is 75.87%. The algorithm did not focus on the explorative phase and used only randomized initial solutions, with a loss of diversification. Two hybrid algorithms based on ABC and PSO have been proposed [39]. In the first algorithm, the employee bee phase is hybridized with PSO to find new velocity position updates. In the second algorithm, the onlooker and scout bee phases are improved using mutations from the genetic algorithm. The algorithms achieved a highest accuracy of 99.14% with an optimal selection of 13 features using WBCD, but are used only for FS, with no parameter optimization. A hybrid algorithm for FS using the branch and bound approach and ABC has been proposed [40]. The algorithm first applies branch and bound to find a first set of features, then applies ABC to identify a second set, and a union operation forms the new set of optimal features. The algorithm did not address classification.

Schiezaro and Pedrini used optimal FS based on the whale bubble net hunting strategy for UCI repository datasets [41]. The algorithm handles exploitation using the bubble net attacking method, and a global search is carried out by the search-for-prey phase. During the evaluation, the SVM classifier attained an accuracy of 98.77%, precision of 99.15%, recall of 98.64% and F-score of 98.9%. This algorithm is used only for FS, with no parameter optimization. J. Jona and N. Nagaveni presented optimal FS using an integration of ACO and cuckoo search [42]. The local search behavior of ACO is improved using the exploitation of cuckoo search. The algorithm selected an optimal feature set from the 78 texture features derived using GLCM, with input taken from the MIAS dataset. In this approach, 5 features were selected with 94% accuracy, using the SVM classifier for prediction. This algorithm showed a performance increase of 4% and 2% when compared with PSO and ACO, respectively.

A novel hybrid whale–artificial bee colony optimizer framework was introduced by Siddavaatam and Sedaghat for cross-layer optimization in the Internet of Things (IoT) [93]. An efficient MAC for IoT was designed to minimize energy consumption with an extended network lifetime. The hybrid whale–artificial bee colony optimizer framework is used to obtain the optimal nodes and communication parameters in the IoT, saving computation resources of the resource-constrained IoT devices.

3 Materials and methodologies

A wrapper-based method that eliminates the use of statistical measures such as information gain or F-score is used for implementing the proposed HAW. The proposed architecture is depicted in Fig. 1. The input dataset is a breast cancer dataset, divided into three subsets. The first subset, containing 50% of the samples, is used for training; the next 25% of the samples are used for validation; and the remaining 25% are used for testing. Using the optimal set of input features generated by HAW, the optimal features are selected from the three subsets and the other features are rejected. The optimally selected features of the training set are used for training the underlying ANN classifier. The proposed wrapper architecture was implemented in MATLAB 8.5, with backpropagation training done using the Neural Network Toolbox.

Fig. 1

Proposed wrapper architecture

3.1 Breast cancer datasets used by the proposed wrapper architecture

HAW is evaluated using breast cancer datasets such as the WBCD [87], Wisconsin diagnostic breast cancer dataset (WDBC) [88], Wisconsin prognostic breast cancer dataset (WPBC) [89], DDSM [90], MIAS [91] and INbreast database [92]. The description of the datasets used is given in Table 1.

Table 1 Breast cancer datasets used by the proposed wrapper architecture

The optimal initial weights and hidden node size generated by HAW are used as the initial parameter settings of the ANN. The ANN error is calculated on the validation set. If the validation error increases for six consecutive iterations, the training of the ANN is stopped. The fitness of the trained ANN is calculated using Eqs. (12) and (13). The ANN with the highest fitness (best) is selected and tested using the testing set with the optimal feature subsets. The complexity (number of connections) of the final ANN is calculated using Eq. (11).

$${\text{Connect}} = U*V + V*W + V + W$$
(11)

‘U’ gives the number of input features, ‘V’ indicates the hidden node size, and ‘W’ indicates the number of output nodes. The resulting ANN with the fewest connections guarantees the lowest complexity. The fitness of the ANN is calculated by Eq. (13); a higher value of \(ANN Err\) indicates a low-fitness ANN.

$${\text{ANN Err}} = \left( {\frac{{P_{max} - P_{min} }}{l*w}} \right)\mathop \sum \limits_{i = 1}^{l} \mathop \sum \limits_{j = 1}^{w} \left( {A_{j}^{i} - B_{j}^{i} } \right)^{2}$$
(12)
$${\text{Fitness}} = \frac{1}{{\text{ANN Err}}}$$
(13)

‘l’ and ‘w’ are the numbers of output nodes and validation examples, respectively. Pmax and Pmin are the maximum and minimum actual outputs, respectively. \(B_{j}^{i }\) and \(A_{j}^{i}\) are the target output and actual output, respectively.
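As a concrete reading of Eqs. (11)–(13), the following hedged Python sketch computes the connection count and the fitness; the network sizes and sample outputs are invented for illustration.

```python
import numpy as np

def connections(U, V, W):
    # Eq. 11: weights of both layers plus one bias per hidden and output node
    return U * V + V * W + V + W

def ann_err(actual, target, p_max, p_min):
    # Eq. 12: squared error normalized by the output range and validation size
    l, w = actual.shape                    # l output nodes x w validation examples
    return (p_max - p_min) / (l * w) * np.sum((actual - target) ** 2)

def fitness(err):
    return 1.0 / err                       # Eq. 13: lower error means higher fitness

actual = np.array([[0.9, 0.1], [0.2, 0.8]])    # invented outputs: 2 nodes x 2 examples
target = np.array([[1.0, 0.0], [0.0, 1.0]])
print(connections(U=3, V=4, W=2))              # -> 26 connections
print(fitness(ann_err(actual, target, p_max=1.0, p_min=0.0)))
```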

The initial solution representation is given in Fig. 2. The I bits encode the random initial weights, so that \(2^{I}\) different initial weight settings can be explored. The J bits encode the hidden node size, so that \(2^{J}\) hidden node sizes can be explored. The K bits are the feature bits representing the total features: if a feature is selected, its ‘K’ bit is one; otherwise, it is zero. The number of K bits varies with the total number of features in the dataset.

Fig. 2

Initial solution representation
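A minimal sketch of decoding such an I/J/K bit string follows; the bit widths, the weight range and the mapping to concrete values are assumptions for illustration, since the paper fixes them per dataset.

```python
import numpy as np

I, J, K = 8, 4, 9     # assumed bit widths; K = 9 matches the WBCD's nine input features

def decode(bits, w_min=-1.0, w_max=1.0):
    """Split one I+J+K bit string into (initial-weight value, hidden nodes, feature mask)."""
    assert len(bits) == I + J + K
    w_int = int("".join(map(str, bits[:I])), 2)
    weight = w_min + w_int / (2 ** I - 1) * (w_max - w_min)   # one of 2^I weight settings
    hidden = 1 + int("".join(map(str, bits[I:I + J])), 2)     # one of 2^J hidden sizes
    features = np.flatnonzero(bits[I + J:])                   # indices of selected features
    return weight, hidden, features

bits = np.random.default_rng(2).integers(0, 2, size=I + J + K)
print(decode(bits))
```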

3.2 Detailed description of the proposed HAW algorithm

The HAW optimization algorithm is formed by integrating a mutative initialization phase of ABC optimization with the exploitation phase of the whale optimization technique. The standard ABC is weak in exploration because its poor random search process yields localized initial food sources. Hence, the HAW algorithm uses a mutative exploration phase at its initialization so that the algorithm can explore the entire problem space and find new promising regions. The employee bee phase of the ABC optimization process is integrated with the exploitative phase of WOA such that the employee bees follow the encircling prey/bubble net attacking method of whales to update the positions of the food sources. The best food source found at each iteration is considered the target prey of WOA. HAW involves two stages. In the first stage, a mutative initialization phase using different mutations derives a diversified set of candidate solutions. In the second stage, an employee bee attacking phase is applied, in which the optimum set of solutions derived by the mutative initialization phase forms the initial food source positions, and the bees follow the whales’ method of attacking prey. The simulated annealing technique is used in the employee bee attacking phase to let the algorithm escape from local optima and avoid looping problems. A flowchart representation of the HAW algorithm is shown in Fig. 3. HAW optimization is summarized as follows:

  (i) A mutative initialization phase is proposed to derive a set of diversified solutions and expedite the search in the exploration phase.

  (ii) An employee attacking phase is proposed in which the employee bees adopt the encircling prey/bubble net attacking method of whales for updating the current food source positions during their foraging process. The exploitation of the employee bee attacking phase is guided by the best food source (the prey of the whales) found so far.

  (iii) To escape from suboptimal locations and avoid looping problems, a simulated annealing (SA)-based employee attacking phase is proposed.

  (iv) The onlooker bee phase and scout bee phase follow the standard ABC optimization.

Fig. 3

Flowchart representation of the proposed HAW algorithm

3.2.1 Initialization and fitness calculation

A food source indicates a possible solution of the underlying optimization problem. Each food source is generated using ‘dim’ variables, where ‘dim’ is the dimension of the problem space considered. The initial population is generated through a random distribution of food sources using Eq. (14).

$$A_{k}^{l} = A_{\min }^{l} + random\left( {0,1} \right)*\left( {A_{\max }^{l} - A_{\min }^{l} } \right)$$
(14)

\(A_{k}^{l}\) represents the lth variable of food source \(k\), with k = 1, 2…N, where N represents the maximum number of food sources, and l = 1, 2…dim, where ‘dim’ represents the dimension based on the number of parameters of the underlying optimization problem. \(random\left( {0,1} \right)\) is a random number generated between 0 and 1. \(A_{max }^{l}\) represents the maximum bound of the lth variable of the optimization problem, and \(A_{min}^{l}\) gives the minimum bound. The estimated tight bound of Algorithm (3) is \(\theta\)(\(n^{2}\)), where n is the number of food sources. The algorithm for the initial generation of food sources is given in Algorithm (3).


Algorithm 3: Initialization of HAW


3.2.2 Proposed mutative exploration phase

The mutative exploration phase detects multiple food sources based on their quality. Better food sources are selected from the total population and divided into three subpopulations based on the fitness difference between each food source and the best food source in the population, with the help of three threshold values: limit1, limit2 and limit3. The three subpopulations are subjected to different mutations, where higher-fitness food sources are mutated less and lower-fitness food sources are mutated more; thus, the amount of mutation is inversely proportional to the fitness of the food source. The best food sources, with fitness values close to that of the best food source of the total population, are grouped as \(A_{k1}\) food sources. Since the \(A_{k1}\) food sources are the best, a local search is facilitated around them: they are mutated using Gaussian mutations, where Eq. (15) makes small random changes to the \(A_{k1}\) food sources.

$$A_{k1}^{^{\prime}} = A_{k1} + \mu_{g} .G\left( {0,1} \right)$$
(15)

\(A_{k1}^{^{\prime}}\) is the mutated food source generated after the Gaussian mutation; \(\mu_{g}\) is the mutation rate indicating the strength of the Gaussian noise added; \({A}_{k1}\) is the original food source; and \(G\left(\mathrm{0,1}\right)\) is a random number drawn from a Gaussian distribution with zero mean and unit variance.

Food sources with intermediate fitness values are grouped as \(A_{k2}\) food sources, whose fitness shows an intermediate difference from that of the best food source of the total population. Since the \(A_{k2}\) food sources are intermediate, a uniform search process is facilitated around them using uniform mutations, where Eq. (16) generates the uniformly mutated food sources \(A_{k2}^{^{\prime}}\): a randomly chosen variable of a solution is replaced with a uniform random value between the user-defined upper (Ub) and lower (Lb) limits.

$$A_{k2}^{^{\prime}} = Lb + random\left( {0,1} \right)*\left( {Ub - Lb} \right)$$
(16)

The worst food sources, with low fitness values, are grouped as \(A_{k3}\) food sources, whose fitness values differ most from that of the best food source in the population. The \(A_{k3}\) food sources are worse than \(A_{k1}\) and \(A_{k2}\). Hence, a global search is facilitated around the worst food sources, in which the \(A_{k3}\) food sources are mutated using Cauchy mutations via Eq. (17), which are more likely to escape from local optima.

$$A_{k3}^{^{\prime}} = A_{k3} + \mu_{c} .C\left( {0,1} \right)$$
(17)

\(A_{k3}^{^{\prime}}\) is the mutated food source generated after the Cauchy mutation, \(\mu_{c}\) is the Cauchy mutation rate that indicates the strength of the mutation, and \(C\left( {0,1} \right)\) is a random number drawn from a standard Cauchy distribution. The mutated food sources, along with the replaced worst food sources \(A_{x}\), form the new set of food sources \(A_{new}\) eligible for exploitation. The estimated tight bound of Algorithm 4 is \(\theta\)(\(n^{2}\)), where n is the number of food sources mutated in the explorative phase. The mutative exploration phase is described by Algorithm (4).


Algorithm 4: Mutative exploration phase

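Since the pseudocode itself appears as a figure in the original layout, a hedged Python sketch of the phase (Eqs. 15–17) is given below; the thresholds, mutation rates and bounds are illustrative assumptions.

```python
import numpy as np

def mutative_exploration(A, fit, rng, limit1=0.1, limit2=0.3,
                         mu_g=0.05, mu_c=0.5, Lb=-5.0, Ub=5.0):
    """Mutate food sources by fitness group: Gaussian (best), uniform (mid), Cauchy (worst)."""
    gap = fit.max() - fit                    # fitness distance from the best food source
    A_new = A.copy()
    for k in range(len(A)):
        if gap[k] <= limit1:                 # A_k1: small Gaussian step, local search (Eq. 15)
            A_new[k] = A[k] + mu_g * rng.normal(0.0, 1.0, size=A.shape[1])
        elif gap[k] <= limit2:               # A_k2: one variable redrawn uniformly (Eq. 16)
            l = rng.integers(A.shape[1])
            A_new[k, l] = Lb + rng.random() * (Ub - Lb)
        else:                                # A_k3: heavy-tailed Cauchy jump, global search (Eq. 17)
            A_new[k] = A[k] + mu_c * rng.standard_cauchy(size=A.shape[1])
    return np.clip(A_new, Lb, Ub)
```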

3.2.3 Proposed employee bee attacking phase

The food sources generated by the mutative exploration phase, along with the replaced randomly generated food sources, form the initial food source positions of the exploitation phase. Each food source is assigned an employee bee, which searches for a better food source around the current one in the same way a whale searches for prey. An employee bee follows a circular path while searching for a neighborhood food source, with the best food source found so far as the center of the circle. This exploitative mechanism keeps the search process guided by the best optimal locations. The step size β of the search on the circular path is kept high in the initial iterations to facilitate exploration, and is gradually decreased in later iterations to facilitate exploitation. This dynamic step size β lets the employee bees search the entire problem space, reaching remote locations that were not covered in the exploration phase, and eliminates the problems of oscillation and local optima on the circular path. A random number ran is generated using a uniform distribution and checked against the control variables \(C_{1}\) and \(C_{2}\). If ran is less than \(C_{1}\), the food source position is found using Eq. (18).

$$N_{k}^{l} = A_{k}^{l} + \beta *\left( {A_{k}^{l} - A_{d}^{l} } \right)$$
(18)

where \(\beta\) = random (− 1,1) is the step size that is dynamically varied across the iterations, \(N_{k}^{l}\) is the new food source, \(A_{k}^{l}\) is the current food source, and \(A_{d}^{l}\) is a randomly selected food source.

Otherwise, if the randomly produced number is less than \(C_{2}\), a probability check is done using \(P_{i}\): if \(P_{i} < 0.5\), the new neighborhood position is found by searching around a circular path, keeping the best food source found so far, \(A_{k}^{*}\), as the center of the circle, using Eq. (19).

$$N_{k}^{l} = \vec{A}_{k}^{*} - \vec{M}*Y$$
(19)

M represents the coefficient vector obtained using Eqs. (6) and (8). Y is the distance between the current food source and the center of the circle (the best food source), as represented in Eq. (20).

$$Y= \left| {A_{k}^{*} - { }A_{k}^{l} } \right|$$
(20)

If instead \(P_{i} \ge 0.5\), the new neighborhood position is found by searching along a logarithmic spiral path, keeping the best food source \(A_{k}^{*}\) found so far as its center, using Eq. (21):

$$N_{k}^{l} = Y*e^{aw} *\cos \left( {2\pi w} \right) + A_{k}^{*}$$
(21)

where a is a constant that represents the shape of the logarithmic spiral and w is a random number generated between [− 1,1].

If the fitness of \(N_{k}^{l}\) (the neighborhood food source) is less than the fitness of \(A_{k}^{l}\) (the current food source), the employee bee attacking phase may still accept \(N_{k}^{l}\), allowing downhill movements so that the search can escape from local optima in the fitness landscape. The acceptance of worse food sources is based on the probability value \({\text{e}}^{{\frac{{ - \Delta E}}{T}}}\), where \(\Delta E\) represents the fitness difference between the current and the candidate food sources. This is done using a simulated annealing-based selection mechanism in which both better and worse solutions can be accepted in the employee bee attacking phase. Acceptance or rejection of worse solutions is based on a controlling parameter called the simulated annealing temperature, and the probability of accepting worse solutions decreases as the iterations proceed and the temperature decreases. At the initial stages, the temperature ‘T’ is set to a high value, so \(\frac{ - \Delta E}{T}\) tends toward 0, making the probability \({\text{e}}^{{\frac{ - \Delta E}{T}}}\) tend toward 1 and allowing the acceptance of worse solutions. As the iterations grow and ‘T’ decreases, \(\frac{ - \Delta E}{T}\) becomes large and negative, making the probability \({\text{e}}^{{\frac{ - \Delta E}{T}}}\) tend toward 0, so that only better solutions are accepted. This lets the HAW algorithm allow both uphill and downhill movements in the fitness landscape, eliminating the problem of getting stuck at local optima. As the iterations grow, the simulated annealing temperature ‘T’ is cooled down using Eq. (22).

$$T\left( {t + 1} \right) = \emptyset *T\left( t \right)$$
(22)

where \(T\left( {t + 1} \right)\) is the new temperature and \(T\left( t \right)\) is the temperature of the previous iteration. ‘\(\emptyset\)’ is the simulated annealing constant, which is set close to 1. The estimated tight bound of Algorithm 5 is \(\theta\)(\(n^{2}\)), where n is the number of employee bees assigned to the food sources. The employee bee attacking phase is described by Algorithm (5).

Algorithm 5: Proposed employee bee attacking phase


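The following hedged Python sketch ties Eqs. (18)–(22) together with the SA acceptance test; the control probabilities C1/C2, the cooling constant and the simplified coefficient vector are assumptions for illustration (the paper decays M via Eq. (8)).

```python
import numpy as np

def employee_attacking(A, fit, best, fitness, T, rng,
                       C1=0.3, C2=0.9, a=1.0, phi=0.95):
    """One pass of the employee bee attacking phase with SA acceptance."""
    n, dim = A.shape
    for k in range(n):
        ran, Pi = rng.random(), rng.random()
        if ran < C1:                                  # Eq. 18: ABC-style neighbour move
            beta = rng.uniform(-1.0, 1.0)             # dynamic step size
            d = rng.choice([i for i in range(n) if i != k])
            N = A[k] + beta * (A[k] - A[d])
        elif ran < C2 and Pi < 0.5:                   # Eqs. 19-20: shrinking encircling
            M = 2.0 * rng.random(dim) - 1.0           # simplified coefficient vector
            N = best - M * np.abs(best - A[k])
        else:                                         # Eq. 21: spiral around the best source
            w = rng.uniform(-1.0, 1.0)
            N = np.abs(best - A[k]) * np.exp(a * w) * np.cos(2 * np.pi * w) + best
        fN = fitness(N)
        dE = fit[k] - fN                              # positive when N is worse
        if fN > fit[k] or rng.random() < np.exp(-dE / T):
            A[k], fit[k] = N, fN                      # accept better, or worse with prob e^(-dE/T)
    return A, fit, phi * T                            # Eq. 22: cool the temperature

# Toy usage on a 4-D sphere-like fitness
rng = np.random.default_rng(3)
A = rng.uniform(-5, 5, size=(8, 4))
f = lambda x: 1.0 / (1.0 + np.dot(x, x))
fit = np.array([f(x) for x in A])
A, fit, T = employee_attacking(A, fit, A[fit.argmax()].copy(), f, T=1.0, rng=rng)
```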

3.2.4 Onlooker bee phase

The information about the newly generated food sources \(N_{k}^{l}\) is shared with the onlooker bees. Each onlooker bee then computes the probability value \(Z_{k}\) of every food source received from the employee bee attacking phase using Eq. (3). The estimated tight bound of Algorithm 6 is \(\theta\)(\(n\)), where \(n\) is the number of food sources shared by the employee bees.

Algorithm 6: Onlooker bee phase



3.2.5 Scout bee phase

The abandoned food sources are replaced and new food sources are introduced by the scout bees. In each iteration, if a food source is not improved, the counter associated with it is incremented; if the food source does not improve for a certain number of iterations and its counter crosses the threshold limit value, it is replaced by the scout bees through the random generation process of Eq. (1). The estimated tight bound of Algorithm 7 is \(\theta\)(\(n\)), where \(n\) is the number of unimproved solutions. The detailed description of the scout bee phase is given in Algorithm (7).

Algorithm 7: Scout bee phase


4 Performance evaluation of the proposed HAW optimization

The HAW algorithm is used to generate the optimal feature subsets, initial weights and hidden node size of an ANN, which is trained using three different backpropagation-based learning mechanisms: resilient backpropagation (RP), Levenberg–Marquardt backpropagation (LM) and momentum-based gradient descent backpropagation (GD). The HAW performance is analyzed to find the backpropagation variant that achieves the least ANN error with low complexity using WBCD. The best backpropagation variant is then used for training the ANN when tested on the remaining datasets. The HAW-optimized ANN is analyzed over ten independent runs with various generation sizes, and the mean accuracy, mean connections and mean computational time are calculated.

4.1 Parameter settings and experimental setup using WBCD

The implementation of backpropagation uses the default training parameters. A winner-takes-all approach at the output nodes is used for classification. The parameter settings of the ANN classifier are shown in Table 2. HAW utilized the cancer1 dataset [88], which contains preprocessed WBCD samples. Attribute values are rescaled to between 0 and 1, and missing attribute values are filled using the mean values of the non-missing attributes. The training dataset is formed from the first 349 samples (50% of the total dataset), the next 175 samples (25%) are used as the validation dataset, and the last 175 samples (the remaining 25%) are used as the testing dataset.
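A minimal sketch of this 349/175/175 division follows; the file name and loading routine are assumptions, as the paper performs this step in MATLAB.

```python
import numpy as np

data = np.loadtxt("cancer1.dt")            # hypothetical path to the preprocessed WBCD samples
train, val, test = data[:349], data[349:524], data[524:699]
print(len(train), len(val), len(test))     # 349, 175, 175 -> 50%/25%/25% of 699 samples
```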

Table 2 Parameter settings of ANN using WBCD dataset

The HAW parameter settings are shown in Table 3. The values of the thresholds limit1 and limit2 are set using the trial-and-error method. The counter limit is set to 10 to enhance global exploration. The numbers of employee bees and onlooker bees are set according to the size of the solution space. The Gaussian, uniform and Cauchy mutation rates were set by trial and error to enhance the local, uniform and global search processes, respectively. The updating probability and the attacking probability are set by trial and error for different generation sizes so as to avoid convergence at local optima.

Table 3 Parameter settings of HAW using WBCD dataset

4.2 Evaluation of the proposed HAW using WBCD with respect to accuracy, complexity and computational time

HAW is evaluated for complexity, in terms of the number of connections, and accuracy over ten runs for generation sizes 10, 20 and 30, as given in Table 4. HAW-RP achieved the highest mean accuracy of 99.25% at generation size 20. The mean number of connections was 10.40 for HAW-RP, which is low compared to HAW-LM and HAW-GD; thus, HAW obtained its least complex ANN when trained with RP. The validation error convergence of the optimized ANN for RP, LM and GD over various generations is depicted in Fig. 4.

Table 4 Performance of the proposed HAW algorithm for various backpropagations
Fig. 4

Convergence of validation error for HAW

After HAW-RP, HAW-GD was 98.57% accurate, with an average of 14.21 connections at generation size 30. Next, HAW-LM achieved an accuracy of 98.22% at generation size 30, with an average of 15.46 connections. The classification accuracy of the optimized ANN for RP, LM and GD over various generations is depicted in Fig. 5. The accuracy of HAW-RP was 1.05% higher than HAW-LM and 0.69% higher than HAW-GD.

Fig. 5

Performance of HAW with respect to classification accuracy

The confusion matrices, based on the true positives, true negatives, false positives and false negatives of HAW-RP, HAW-LM and HAW-GD for the best network achieved, are shown in Table 5. Figure 6 gives the complexity of the resulting ANN in terms of the number of connections for various generation sizes. HAW-RP achieved the lowest complexity with the fewest connections, followed by HAW-GD and HAW-LM. The mean hidden node count of HAW-RP was 22.5% less than HAW-LM and 20.78% less than HAW-GD, and HAW-RP produced ANNs with 28.08% fewer connections than HAW-LM and 26.81% fewer than HAW-GD.

Table 5 Confusion Matrix of the HAW-optimized ANN across ten runs
Fig. 6

Performance of HAW with respect to the number of connections

The performance metrics of the optimized ANN generated using HAW-RP, HAW-LM and HAW-GD are shown in Table 6. HAW-RP showed higher sensitivity and specificity, followed by HAW-GD and HAW-LM. Figure 7 shows the hidden node count for HAW-RP, HAW-LM and HAW-GD over generation sizes.

Table 6 Performance of HAW-optimized ANN based on different metrics
Fig. 7

Evolution of hidden node count with FS for HAW across different generations

Table 7 shows the effect of FS on the classification performance of the HAW-optimized ANN: with FS, the HAW-optimized ANN improved prediction accuracy with less complexity compared to its performance without FS.

Table 7 Performance of HAW-RP based on FS

Table 8 shows the confusion matrix for the best network achieved using HAW-RP, with selected features such as uniformity of cell size, single epithelial cell size and bare nuclei.

Table 8 Confusion Matrix of HAW-RP-optimized best ANN with features selected

The evaluation of the HAW-optimized ANN with respect to computational time is given in Table 9. The mean computational time over 10 runs was estimated for generation sizes 10, 20 and 30. The computational time of the ANN optimized using HAW-LM was low in comparison with that of HAW-RP and HAW-GD.

Table 9 Performance-based on average computational time

Figure 8 compares HAW with PSO, DE, ABC, BA, ACO, BFO, DA, GA, AIS, MBO, WOA and SSA using the WBCD. HAW attained the highest accuracy, exceeding the above-mentioned algorithms by 9.67%, 7.53%, 10.89%, 12.27%, 12.78%, 8%, 10.03%, 0.97%, 8.83%, 9.91%, 5.03% and 3.39%, respectively. The time complexity of the proposed ANN architecture is O(n ∗ t ∗ (ij + jk)), where n is the number of epochs, t is the number of training examples, and i, j and k are the numbers of nodes in the first, second and third layers of the ANN architecture.

Fig. 8

Comparison between evolutionary methods using WBCD dataset

Figure 9 compares HAW with existing ABC-based hybrid algorithms using WBCD, including ABC-ACO [31], ABC-DE [32], ABC-DA [97] and the ABC-gradient decision tree [34]. The accuracy of HAW was 0.07%, 1.58%, 2.36% and 2.02% higher, respectively, than the above-mentioned algorithms.

Table 10 gives a comparison of the HAW-optimized classifier with existing breast cancer diagnosis schemes using WBCD datasets, showing that the HAW-optimized ANN classifier outperformed the existing schemes.

Table 10 Comparison with existing breast cancer diagnosis schemes

4.3 Performance evaluation of HAW using different breast cancer datasets

HAW-RP yielded the best ANNs, with high accuracy and low complexity. Hence, HAW-RP was tested on the rest of the breast cancer datasets taken for investigation. Table 11 shows the performance of the HAW-RP-optimized ANN for the different breast cancer datasets. The average classification accuracy, F-measure, number of connections and computational time were calculated over ten independent runs for different generation sizes, and the best of the average values was recorded. The HAW-optimized ANN model, namely HAW-RP, showed better performance with a smaller number of connections in comparison with the standard ABC and WOA for the datasets taken for investigation.

Table 11 Performance of HAW-RP-optimized ANN for different datasets

Figures 9 and 10 show the graphical representation of the performance of the standard ABC- and WOA-optimized ANN models and the HAW-optimized ANN model for the various breast cancer datasets taken for investigation, with respect to classification accuracy and the average number of connections. The HAW-optimized ANN model showed higher accuracy and lower complexity in comparison with the standard ABC- and WOA-optimized ANN models. Table 12 shows the confusion matrix, with the most frequently selected feature set, for the various datasets used for the investigation of the HAW-RP-based ANN after ten independent runs.

Fig. 9

Comparison of ABC-based hybrid algorithms using WBCD datasets

Fig. 10

Comparison with other evolutionary methods using WDBC and WPBC dataset

Table 12 Confusion matrix after ten runs for various datasets

Table 13 shows the performance comparison of the HAW-optimized ANN model, in terms of average classification accuracy over ten independent runs, with various classifiers on the different datasets. The HAW-ANN model showed higher accuracy than the existing classifiers for all the datasets taken for investigation.

Table 13 Average classification accuracy for various classifiers

4.4 Comparison with existing approaches using WDBC and WPBC

Figure 10 compares HAW with PSO, DE, ABC, BA, ACO, BFO, DA, GA, AIS, MBO, WOA and SSA using the WDBC. Each algorithm was executed for 10 runs with generation sizes of 10, 20 and 30, and the best result was taken for comparison. The HAW-based ANN attained the highest accuracy, exceeding the above-mentioned algorithms by 7.65%, 6.83%, 12.96%, 17.12%, 8.84%, 5.69%, 9.2%, 7.42%, 12.96%, 11.3%, 7.65% and 5.35%, respectively.

Table 14 gives a comparison of the HAW-optimized classifier with existing breast cancer diagnosis schemes using the WDBC and WPBC datasets, showing that the proposed HAW-optimized ANN classifier outperformed the existing schemes.

Table 14 Comparison with existing breast cancer schemes for WDBC and WPBC dataset

Figure 10 also compares HAW with PSO, DE, ABC, BA, ACO, BFO, DA, GA, AIS, MBO, WOA and SSA using the WPBC. Each algorithm was executed for 10 runs with generation sizes of 10, 20 and 30, and the best result was taken for comparison. The HAW-based ANN attained the highest accuracy, exceeding the above-mentioned algorithms by 7.6%, 6.06%, 11.33%, 13.03%, 6.41%, 9.31%, 10.44%, 5.02%, 10.44%, 7.6%, 2.99% and 2.67%, respectively.

4.5 Comparison with existing approaches using DDSM, MIAS and INbreast databases

Figure 11 shows the comparison of HAW with evolutionary algorithms such as PSO, DE, ABC, BA, ACO, BFO, DA, GA, AIS, MBO, WOA and SSA. Each algorithm was executed for 10 runs with generation sizes of 10, 20 and 30, and the best result was taken for comparison. Using the DDSM database, the HAW-based ANN achieved accuracy 7.98%, 7.16%, 13.3%, 17.48%, 9.17%, 6.01%, 9.53%, 7.74%, 13.3%, 11.64%, 6.81% and 6.35% higher than the above-mentioned algorithms, respectively. Using the MIAS database, the HAW-based ANN achieved the highest accuracy, 10.53%, 10.28%, 10.4%, 8.2%, 8.22%, 12.67%, 10.28%, 7.17%, 9.42%, 10.28%, 8.22% and 7.87% higher than the above-mentioned algorithms, respectively. Using the INbreast database, the HAW-based ANN achieved the highest accuracy, 7.6%, 9.14%, 14.83%, 9.26%, 5.99%, 11.47%, 8.66%, 5.99%, 8.07%, 12.36%, 10.97% and 10.11% higher than the above-mentioned algorithms, respectively.

Fig. 11

Comparison with other evolutionary methods using DDSM, MIAS and INbreast

Table 15 gives a comparison of the HAW-optimized classifier with existing breast cancer diagnosis schemes using the DDSM, MIAS and INbreast datasets, showing that the HAW-optimized ANN classifier outperformed the existing schemes.

Table 15 Comparison with existing breast cancer diagnosis schemes for DDSM, MIAS and INbreast datasets
Table 16 Comparison with existing breast cancer diagnosis schemes

4.6 Comparison with existing approaches that do not involve a complex tuning process for breast cancer diagnosis

Table 16 gives a comparison of the HAW-ANN scheme with existing breast cancer diagnosis schemes that involve either FS or parameter tuning alone. The table shows that performing both FS and parameter optimization of the ANN in parallel is vital: HAW-ANN outperformed existing breast cancer diagnosis schemes that used either FS or parameter tuning of the classifier on its own.

5 Discussions

The advantage of the proposed HAW lies in deriving an optimal ANN through the optimal search of the hidden node size, the initial weight values and the input features using a simple wrapper approach. ANN uncertainty arises from improper selection of the best set of model parameters, and also from the inherent variability present in the ANN input. If the inputs and the other parameters of the model are uncertain, the output of the model is uncertain and inaccurate. This problem of uncertainty in the ANN architecture used in this paper has been addressed by choosing the best parameters of the model using the proposed HAW algorithm, which has also selected the optimal input features from the different datasets that act as the ANN inputs. Uncertainty in the input data has further been reduced by appropriate data division and data processing methods such as normalization and data-filling procedures. However, a detailed uncertainty analysis can be taken up as future work for this research.

The main objective of this research work is to propose a hybrid optimization algorithm to select the optimal feature set to be processed by the neural network, in order to reduce computational time and complexity without sacrificing performance. The optimal feature sets obtained were only 25% of the original set of features; hence, this research work has used a feedforward neural network, considering the amount of data to be processed and to avoid overfitting. The proposed ANN architecture has not been tested on high-dimensional datasets; this could be done as future work, where deep learning models or other decision-making models can be used when the set of features processed by the model is large. For the datasets taken for evaluation, the proposed ANN architecture achieved accurate results.

6 Conclusions

This paper proposed a hybrid algorithm integrating the standard ABC with WOA for simultaneous feature subset selection and parameter optimization of an ANN for breast cancer diagnosis. Performance evaluation of HAW using the backpropagation variants RP, LM and GD was done on different breast cancer datasets. Among the variants, RP achieved the best prediction accuracy with a low-complexity ANN. HAW-RP achieved the highest mean accuracy of 99.25% on the WBCD dataset, with a mean of 10.40 connections, which is low compared to HAW-LM and HAW-GD. For the WDBC, WPBC, DDSM, MIAS and INbreast databases, the HAW-optimized architecture achieved accuracies of 98.5%, 96.3%, 98.8%, 98.7% and 99.1%, respectively. From the results of this study, it is concluded that performing both FS and parameter optimization of the ANN is vital. In comparison with existing research, the average accuracy attained by HAW is promising, and in comparison with existing evolutionary algorithms, the HAW algorithm was more accurate and less complex.

Further, the limitations of the HAW-ANN classifier follow from the ‘no free lunch’ theorem, in the sense that a classifier trained on a particular feature set may not carry over to other feature sets. Accordingly, the HAW-optimized classifier trained on the breast cancer datasets has not yet been tested on other medical datasets or on high-dimensional datasets. The proposed wrapper approach of the HAW-ANN scheme is also complex and requires more computational time than existing breast cancer schemes, since it involves the FS and parameter tuning processes in parallel.

In the future, the proposed hybrid HAW algorithm can be used for segmentation of breast regions and other abnormal regions, such as masses and microcalcifications, from digital mammograms, eliminating the background portions. This can be done by using HAW to generate optimal thresholds for segmentation methods such as region growing and multi-level thresholding. HAW can also be used for segmentation of abnormalities occurring in different parts of the body, such as the brain, liver and kidney. It can further be used to select optimal input feature subsets for various existing classifiers on different medical datasets for disease diagnosis, to find optimal parameters for ANN architectures such as CNNs, RNNs and deep belief networks, and to tune parameters for classifiers such as the SVM, RF and decision tree, as well as for optimal selection of fuzzy rules.