1 Introduction

Cancer is the uncontrolled growth and reproduction of abnormal, unwanted cells in different parts of the body. It is an umbrella term for a group of diseases caused when abnormal cells grow in different parts of the body, and is also known as neoplasm (literally, a new growth). There are hundreds of types of cancer, such as colon cancer, oral cancer, skin cancer, prostate cancer, breast cancer and lung cancer, depending upon the site at which the cells proliferate. Among all these types, breast cancer (B-Can) presents with varied appearances among women. It is the most common type of cancer affecting women throughout the world [3]. A breast lump is the most classic presenting manifestation among women with B-Can and has a relatively high predictive value for malignancy [27].

Based on histological and molecular evidence, B-Can is classified into three principal sub-types according to the presence or absence of molecular markers for progesterone (PR+) or oestrogen receptors (ER+) and human epidermal growth factor receptor 2 (ERBB2/HER2+): hormone receptor-positive (ERBB2−), ERBB2+ and triple-negative B-Can (TNBC, lacking all three standard molecular markers). The treatment and its duration vary with the type of B-Can [50]. There are numerous ways of diagnosing and treating breast cancer. Treatment methods include radiotherapy, chemotherapy, immunotherapy (using immune cells already present in breast tissue) [17], combined radiotherapy and immunotherapy [38], breast conservation therapy (BCT) and mastectomy; the latter two offer comparable survival and can be regarded as equivalent therapies in early-stage B-Can, yet the choice between BCT and mastectomy remains the most difficult one for women to make [18]. Digital mammography (DM) is the most widely recommended method for diagnosing breast tumours [33]. The diagnosis of breast cancer using IoT-enabled classifier systems is also available in the literature [31]. With these methods and earlier diagnosis, B-Can treatment has improved, leading to an increase in survival rates. However, such therapies and diagnostic procedures may have side effects, particularly in pregnant women, who face a risk of miscarriage and termination [16]. For these reasons, some patients refuse to undergo treatment, which, according to one study, increases the risk of mortality by a factor of 2.42 [15].

Being the most commonly occurring cancer among women and the second most frequent cancer overall, B-Can contributed substantially to the estimated 19.3 million new cancer cases and 10 million cancer deaths recorded worldwide in 2020 (estimates published in CA: A Cancer Journal for Clinicians) [46]. Although some studies focus on male breast cancer, it remains rare and understudied compared with female breast cancer, and the 5-year survival rate for male patients has been reported to be lower than that for female patients. In India alone, breast cancer accounted for almost 14% of total cancer cases among women. Figures show that breast cancer has now surpassed even the most frequent forms of cancer, including cervical cancer. With incidence rising in both rural and urban parts of the country, an Indian woman is diagnosed with this disease every four minutes, and post-diagnosis survival rates are reported to be as low as 80% among US women and an even lower 60% among Indian women. B-Can is the second leading cause of cancer-associated mortality in the USA, and it has been estimated that 1 out of 8 women will experience breast cancer at some point in their life [15].

It is well known that as the stage of this disease advances, the patient's chances of survival diminish. With more than half of Indian women diagnosed with the disease suffering from stage three or four breast cancer, the chances of survival are considerably reduced.

Many factors affect the chances of developing breast cancer in women. Correlating factors such as body mass index (BMI) [8], obesity during pregnancy [25], alcohol consumption patterns and intensities [49], active and passive smoking [35] and age group [30], among others, helps in more accurately evaluating the risk of developing breast cancer at later ages and the chances of recovery. Close to 50% of the total cases fall in the age group of 25–50 years. Making the situation worse, more than 70% of the advanced-stage cases have low survival rates and high mortality rates. Various studies in the past have discussed the incidence and mortality of female breast cancer, particularly in the Asia-Pacific region.

During the recent COVID-19 pandemic, B-Can patients were more susceptible to infection by the virus, leading to more deaths. In addition, due to lockdowns and shortages of medical supplies, their treatments and therapies were paused, increasing the chances of disease progression. The aftereffects of the pandemic included high rates of anxiety, insomnia, distress and depression, especially in breast cancer patients [22]. To counter these effects as well as the disease itself, an active and healthy lifestyle is believed to be very helpful. Physical exercise has been shown to lower the risk of breast cancer, reduce the incidence of recurrence and improve the survival rate of breast cancer patients. It can also aid mental wellness and the immune system [10]. The low survival rates of breast cancer patients worldwide are largely due to failure to detect the disease in its early phase. As the stages of the disease progress, the chances of survival become smaller. Apart from spreading awareness, the only practical way to change this situation is to develop and deploy accurate and efficient algorithms that can detect the disease and raise an alarm while it is still in its early phase. Breast cancer can be cured and survival rates improved if detection is done in time.

1.1 Past approaches for detection in brief

Various studies have been conducted in the past on the early-phase detection of cancer using the tools and algorithms of nature-inspired metaheuristics.

Kiyan et al. [26] proposed an algorithm which used radial basis function (RBF) networks, general regression neural network (GRNN) and probabilistic neural network (PNN) to classify the Wisconsin Breast Cancer Dataset (WBCD) and compared the results with a multilayer perceptron (MLP). Yeh et al. [55] devised a data mining method using the concept of discrete PSO. This method created a novel PSO with a feasible system structure, in which every particle is coded in positive integer numbers. Statistical analysis is used to eliminate feature variables that are insignificant.

Nazarian et al. [37] introduced a model for binary classification problems using data mining procedures based on the artificial bee colony algorithm.

Krawczyk et al. [28] presented work on computer-aided breast cancer diagnosis. They utilized a multi-objective memetic algorithm to choose a pool of one-class predictors that simultaneously show high diversity and consistency. Dheeba et al. [12] advocated a PSO-optimized WNN model. The model attempted to increase the classification accuracy of the WNN by tuning the initial network parameters using PSO. Using PSOWNN, they evolved a connected WNN and optimized it with the best network architecture, created by optimizing the momentum factor, learning rate and number of neurons in the hidden layer.

Bhardwaj et al. [4] proposed a new algorithm, the genetically optimized neural network (GONN), for classification problems. They modified the neural network to improve its architecture (structure and weights). They used the GONN algorithm to classify breast cancer tissue as malignant or benign, and demonstrated their results on the WBCD database.

Wang et al. [52] proposed the CMWOAFS-SVM model, an improved WOA based on chaos and control engineering theories, employed to optimize two critical parameters of an SVM with the RBF kernel and simultaneously select features in practical classification tasks. Similar systems have also been developed for heart disease prediction and brain stroke classification using machine learning [21, 41].

1.2 Brief about our model

Machine learning models integrating metaheuristic optimization algorithms have been employed for various applications, including the detection of breast cancer. However, such models often face issues like poor generalization and overfitting and need better parameter optimization. Although metaheuristic algorithms have been effective in optimizing SVM ensembles and have shown a better ability to find global optima for SVM, the synchronization of multiple metaheuristic algorithms needs more exploration for optimizing the combination strategies in SVM ensembles; this should be done by considering ensemble diversity and performance improvement. The SVM can be adapted to the specific characteristics of the breast cancer dataset by optimizing its parameters with the ABC algorithm for better model performance.

In our model, we employ the artificial bee colony algorithm, which is based on the communication and cognitive skills of honeybees [24]. ABC has an inherent problem of stagnation: in its first phase the positions change only gradually, so the exploration phase moves at a slow pace. We therefore implemented an improved black hole [11] algorithm to increase the convergence of the model and to expand the exploration phase. The classic black hole algorithm has difficulty maintaining a balance between the exploitation and exploration phases and tends to get stuck in local optima; it was modified to tackle this very limitation. The improved black hole algorithm does not face this limitation, since the stars orbiting around the focal point lie at random angles and move to better positions, so exploration is done efficiently. This algorithm was incorporated into the first (exploration) phase of the artificial bee colony, making it more efficient and establishing a balance between the exploration and exploitation phases.

Once a star enters the event horizon, it is destroyed. Traditional black hole algorithms create replacement points randomly in the search space, which can make the search space uneven and very different from the initial distribution. To rectify this drawback, we apply mutation and crossover genetic operators. The crossover operation, as used in well-known evolutionary algorithms such as differential evolution (DE) [45], can significantly increase an algorithm's effectiveness. Over the binary-coded GA representation, we apply a two-point crossover, which also rectifies the extreme points, and we employ it to arrive at a wider range of candidate solutions. After crossover, we apply swap mutation, another prominent genetic operator, which ensures that no two points generated in different iterations are exact replicas of each other. The resulting model is then used to optimize a majority voting ensemble that classifies the data points. All these improvements over the respective base algorithms work in harmony with each other to combat their limitations. The proposed model can improve the overall performance of the SVM ensemble and increase model accuracy, and better parameter optimization and error reduction in the proposed breast cancer detection model can produce excellent results.

The contributions of research work are summarized as follows:

  1. The ABC algorithm is synchronized with an improved black hole algorithm to make it more efficient and to overcome their respective drawbacks.
  2. Crossover and mutation genetic operators are used to generate new stars instead of randomized generation to counter the limitation of the black hole algorithm, thereby producing a more diversified population and a better newly generated star.
  3. A novel GBHABC model is proposed to optimize the weights of the ensemble model for the best possible solution.
  4. Majority voting ensemble is used to set priorities for the different kernels available in the model.

2 Related work

In the past, a diverse range of datasets has been utilized for the early diagnosis of various diseases such as diabetes, breast cancer and hepatitis. Many studies used the Wisconsin Breast Cancer Dataset (WBCD) to detect breast cancer, and this dataset is still used today to create and validate algorithms in the field of breast cancer detection. Earlier studies used data mining techniques in conjunction with nature-inspired algorithms to solve diagnosis and detection challenges. Some of the major state-of-the-art techniques are summarized in Table 1.

Yeh et al. [55] proposed a data mining technique using the discrete PSO on WBCD dataset. The proposed method created a novel PSO with a feasible system structure, with every particle being coded in positive integer numbers. Statistical analysis was used to eliminate insignificant feature variables, and after that, only six feature variables were remaining. The proposed hybrid data mining method reduced the computational complexity and sped up the whole data mining process. The proposed method showed satisfactory results on a range of performance metrics including sensitivity, accuracy and specificity.

Chen et al. [6] showed the superiority of a hybrid SVM-based model, the rough set-support vector machine (RS-SVM), over other relevant state-of-the-art algorithms for the problem of breast cancer diagnosis. In the proposed RS-SVM model, the RS feature reduction algorithm was employed for feature selection to remove redundant features of the dataset and increase the diagnostic accuracy achieved by the SVM. The RS feature reduction algorithm was applied to the whole dataset to generate optimal subsets, and only a small fraction of features was retained for the complete study; the subset of features used by the model to categorize breast tumours was identified as the most promising output of the reduction algorithm. Moreover, experiments on different partitions of the WBCD showed that the RS-SVM model performed satisfactorily in distinguishing between the two classes of breast tumour. The performance of this model depended greatly on the optimal parameters of the SVM, so the need arose for new algorithms to accurately determine these parameters. Moreover, only a single dataset (WBCD) was used for evaluating the model, which limits the evidence for its accuracy and real-life applicability.

Chen et al. [7] proposed the PSO-SVM classifier for breast cancer diagnosis on the WBCD dataset, incorporating feature selection and parameter optimization in the suggested technique. The proposed approach is divided into two parts. The first stage uses the PSO algorithm to conduct SVM parameter optimization together with feature selection. The second stage carries out the classification task with the selected features and the optimal parameter values using tenfold cross-validation and the SVM model. In the proposed model, the particle swarm framework simultaneously solved the issues of feature selection and model selection in SVM. For the design of the objective function of particle swarm optimization, a weighted function was adopted that simultaneously accounted for the number of support vectors, the selected features and the average accuracy rate of the SVM; however, the model was not feasible for practical diagnosis as the PSO-based system training cost a lot of CPU time.

Nazarian et al. [37] put forward a novel classification method using the artificial bee colony algorithm. This algorithm is a straightforward, simple and proficient optimization algorithm for real-world scenarios. In their paper, the disease is analysed using data from B-Can samples, and the results of this technique were compared with other strategies for breast cancer diagnosis.

Krawczyk et al. [28] proposed a CAD-based breast cancer diagnosis system, tested on a genuine and validated medical dataset of 675 images gathered from 75 patients, of which 25 were benign, 25 were malignant and 25 were fibroadenoma cases. In the study, each patient was represented by 9 images chosen arbitrarily by a pathologist. They proposed an effective medical decision support system that permitted distinguishing benign, malignant and fibroadenoma cases in the dataset. The nuclei recognition methodology relies on the firefly algorithm, which produces nuclei markers that are then utilized in marker-controlled watershed segmentation. In the final step, image recognition is performed by a novel classifier built from an ensemble of one-class classifiers used for multi-class decomposition, allowing the unique attributes of the examined classes to be captured efficiently and hence classified precisely. The outcomes presented in the paper showed that a computerized medical diagnosis system based on the proposed strategy would be successful and could provide precise diagnostic information.

Sheikhpour et al. [44] proposed a PSO-KDE-based model, hybridizing PSO with a classifier based on nonparametric kernel density estimation, for breast cancer diagnosis on the WBCD. In this model, PSO was used to simultaneously find a subset of features that minimized both the number of features and the classification error of the kernel density estimation classifier. The model comprised feature selection using PSO, classification with the selected feature subset and determination of the optimal kernel bandwidth. The PSO-KDE model had better average performance than the GA-KDE model in breast cancer diagnosis. Although the results showed the superiority of this model over other proven methods, the PSO used for minimizing the classification error and the number of features was not formulated as a multi-objective optimization. In contrast, other researchers have proposed a TLBO-PSO-based model [43] with a multi-objective formulation having two primary objectives: attaining the highest classification accuracy and selecting the least number of informative genes. They used a breast cancer microarray dataset for the implementation of their study and, with the proposed TLBO-PSO model, were able to enhance various performance metrics including sensitivity, accuracy and specificity.

Another study, by Wang et al. [53], proposed a CNN- and CAD-based method using a dataset of 219 patients with 614 automated breast ultrasound (ABUS) volumes containing 745 cancer regions, and 144 healthy women with a total of 900 volumes without abnormal findings. In this study, a 3D CNN architecture was proposed for CAD cancer diagnosis in ABUS volumes. The proposed system used a densely deep supervision (DDS) mechanism to learn more discriminative features for malignant representations of cancer cells and to enhance detection sensitivity, and it utilized the recommended threshold map with a novel threshold loss to streamline the cancer probability map for false-positive reduction while maintaining high sensitivity. Although the proposed detection system presented impressive results, it still had a few limitations: cystic lesions and fatty lumps could be misclassified as cancerous lesions, and small cancerous lesions might be neglected by being mistaken for a common ultrasound shadow.

Wang et al. [52] proposed the CMWOAFS-SVM model, an enhanced WOA based on control and chaos engineering theories, to optimize two principal parameters of an SVM with the RBF kernel and simultaneously select features in practical classification tasks. They tested the prototype on the WBCD, diabetes and dermatology datasets. The proposed model consisted of two parts: in the first part, feature selection and parameter optimization of the SVM were conducted simultaneously by the improved WOA algorithm; in the latter part, the optimal parameter pair and feature subset obtained from the first part were used to evaluate the suitability of the SVM for the classification task via tenfold CV analysis. The proposed model demonstrated superiority over other models in terms of performance and feature subset size, as well as its ability to tackle feature selection and parameter optimization simultaneously. Alongside these impressive results, it cannot be ignored that CMWOA offered no advantage in computational cost over the original WOA; moreover, it has a complex search process, being based on multiple swarms with stratified mechanisms.

Table 1 Related work comparison

Some authors relied more on datasets other than the WBCD, as seen in studies like the one by Dheeba et al. [12]. They proposed a PSO-optimized WNN model on a real-world clinical database of 216 mammograms collected from mammogram screening centres. In this model, they attempted to enhance the classification accuracy of the WNN by tuning the initial network parameters using PSO. Using PSOWNN, they evolved a connected WNN and optimized it with the best network architecture, created by optimizing the momentum factor, learning rate and number of neurons in the hidden layers. The proposed PSOWNN classifier showed improvement in the classification accuracy of CAD analysis of digital mammograms for the diagnosis of breast cancer.

3 Methods and data

This section is divided into five primary subsections: a discussion of the artificial bee colony algorithm, a description of the black hole algorithm, a discussion of genetic operators, a description of the SVM classifier and a discussion of the WBCD dataset.

3.1 ABC algorithm

Algorithm 1 Pseudo-code of the ABC algorithm

One of many nature-inspired algorithms, the “artificial bee colony (ABC)” [24] proposed by Karaboga is inspired by honeybees’ foraging behaviour. The algorithm treats any possible solution to the problem as a food source for the bees, where the amount of nectar in the food source corresponds to the quality (fitness) of the solution. The artificial bee colony consists of three groups, namely employed bees, onlooker bees and scouts. The employed bees exploit local information to find the locally best possible solution. Once all employed bees have finished the search process, onlooker bees evaluate and choose a food source based on the information supplied by the employed bees. This process is repeated until the globally best possible solution to the optimization problem is found. The equation proposed to produce a candidate food source (feasible solution) from the information of an old one present in memory is:

$$\begin{aligned} v_{\textrm{ij}}=x_{\textrm{ij}}+ \Phi _{\textrm{ij}}( x_{\textrm{ij}}- x_{\textrm{kj}}), \end{aligned}$$
(1)

where \(k\in \{1,2,...,\textrm{SN}\}\) and \(j \in \{1,2,...,D\}\) are randomly chosen indices. Each solution, denoted by \(x_i (i=1,2, \ldots ,\textrm{SN})\), is a D-dimensional vector, where D is the number of optimization parameters. \(\Phi _{ij}\) is a randomly chosen value in [−1, 1], controlling the production of neighbouring food sources around \(x_{ij}\) and representing a comparison between two food positions as seen by a bee. A food source is chosen by an artificial onlooker bee based on the probability value attached to that food source, \(p_i\), given by Eq. (2).

$$\begin{aligned} p_i=\textrm{fit}_i/ \sum _{n=1}^{\textrm{SN}}\textrm{fit}_n, \end{aligned}$$
(2)

where \(\textrm{fit}_i\) is the fitness value of the solution i and SN is the number of food sources which is equal to the number of employed bees.

In this algorithm, scout bees replace an old food source, whose nectar has been rejected by the bees, with a new one. The algorithm simulates this process by generating a random position in the search space and replacing the rejected food source with it. In the ABC algorithm, if the quality of a food source cannot be improved any further after a prefixed number of cycles, that food source is abandoned by the bees. This prefixed number of cycles, called the abandonment limit, is a very important and sometimes deciding control parameter for the efficiency and accuracy of the algorithm. Taking the abandoned source as \(x_i\), with \(j\in \{1, 2,\ldots , D\}\), the scout bees discover a new food source to replace \(x_i\). This operation is defined by the following equation:

$$\begin{aligned} x^j_i=x^j_{\textrm{min}}+\textrm{rand}(0,1)\left( x^j_{\textrm{max}}-x^j_{\textrm{min}}\right) , \end{aligned}$$
(3)

The pseudo-code of the ABC algorithm is shown in Algorithm 1.
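
To make the notation above concrete, the following is a minimal Python sketch of the update rules in Eqs. (1)–(3); it is an illustration only, not the authors' implementation.

```python
# Minimal sketch of the ABC update rules in Eqs. (1)-(3).
import numpy as np

rng = np.random.default_rng(0)

def neighbour(x_i, x_k):
    """Eq. (1): perturb one randomly chosen dimension of x_i towards/away from x_k."""
    j = rng.integers(x_i.size)
    phi = rng.uniform(-1.0, 1.0)
    v = x_i.copy()
    v[j] = x_i[j] + phi * (x_i[j] - x_k[j])
    return v

def onlooker_probabilities(fitness_values):
    """Eq. (2): probability of an onlooker bee choosing each food source."""
    f = np.asarray(fitness_values, dtype=float)
    return f / f.sum()

def scout_replacement(x_min, x_max):
    """Eq. (3): a fresh random food source within the bounds replaces an abandoned one."""
    return x_min + rng.uniform(0.0, 1.0, size=x_min.shape) * (x_max - x_min)
```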

3.2 Black hole algorithm

Algorithm 2 Pseudo-code of the black hole algorithm

The black hole algorithm [29, 19, 20] is a bio-inspired, population-based method sharing features with other population-based algorithms. The method was inspired by the black hole phenomenon. A black hole is essentially a region of space with so much mass concentrated in it that no entity approaching it can escape its gravitational pull; anything that falls into a black hole, including light, is forever gone from the universe. In the black hole algorithm, we start with an initial population of candidates that are feasible solutions to the optimization problem. At every iteration, the best candidate in the population is selected as the black hole and the remaining candidates become normal stars. After this initialization, the normal stars start moving towards the black hole, i.e. the best candidate. Whenever a normal star is swallowed by the black hole, a new candidate is generated in the search space to replace it.

3.2.1 The absorption rate of stars by a black hole

After initialization, the absorption of stars by the black hole begins and all the normal stars start moving towards it; this movement is calculated as follows:

$$\begin{aligned} X_i(t+1) = X_i(t) +\textrm{rand} \times (X_{\textrm{BH}}- X_i(t) ) \end{aligned}$$
(4)

where \(i=1,2,\ldots ,N\); \(X_i(t)\) and \(X_i(t+1)\) are the locations of the \(i{\textrm{th}}\) star at iterations t and \(t+1\), respectively; \(X_{\textrm{BH}}\) is the location of the black hole in the search space; rand is a randomly generated number in the interval [0, 1]; and N is the number of normal stars.

3.2.2 Probability of crossing the event horizon during moving star

While moving the normal stars towards the black hole, we use the likelihood of crossing its event horizon (\(E_H\)) to extract the most useful information from the problem’s search space. Every star that crosses the event horizon is absorbed by the black hole, which means it dies, and a new star is generated in the search space. The radius of the event horizon is calculated with the following equation:

$$\begin{aligned} R_{\textrm{BH}}=\frac{f_{\textrm{BH}}}{\sum _{i=1}^{N} f_i} \end{aligned}$$
(5)

where \(f_{\textrm{BH}}\) and \(f_i\) are the fitness values of the black hole and the \(i{\textrm{th}}\) star, respectively, and N is the number of candidate stars. The pseudo-code of the black hole algorithm is shown in Algorithm 2.
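
For illustration, a minimal sketch of one black hole iteration following Eqs. (4) and (5) is given below; it assumes a maximization problem and a Euclidean distance test against the event-horizon radius, which are modelling choices of this sketch rather than details fixed by the text.

```python
# Minimal sketch of one black hole iteration (Eqs. (4)-(5)), assuming maximization.
import numpy as np

rng = np.random.default_rng(0)

def black_hole_step(stars, fitness_fn, x_min, x_max):
    fitness = np.array([fitness_fn(s) for s in stars])
    bh = int(np.argmax(fitness))              # best candidate acts as the black hole
    x_bh = stars[bh]

    # Eq. (4): every normal star drifts towards the black hole by a random fraction.
    for i in range(len(stars)):
        if i != bh:
            stars[i] = stars[i] + rng.random() * (x_bh - stars[i])

    # Eq. (5): event-horizon radius from the fitness of the black hole and all stars.
    radius = fitness[bh] / fitness.sum()

    # Stars that cross the event horizon are absorbed and regenerated at random.
    for i in range(len(stars)):
        if i != bh and np.linalg.norm(x_bh - stars[i]) < radius:
            stars[i] = x_min + rng.random(x_min.shape) * (x_max - x_min)
    return stars
```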

3.3 Genetic algorithms

Genetic algorithms (GAs) are adaptive heuristic search algorithms that form a subset of evolutionary algorithms. They work on concepts based on natural selection and genetics. The random search is guided by historical data towards regions of better performance in the solution space. GAs are frequently utilized to produce high-quality solutions to optimization and search challenges. They operate on a population of simulated chromosomes, typically binary coded. Each chromosome represents a solution to a problem and has a fitness, a real number indicating how good a solution it is to the specific problem. A GA is a population-based search method that employs the survival-of-the-fittest idea and uses operators that guide the algorithm towards solving a particular problem. During the search procedure, GAs employ a number of operators, among them the encoding scheme, crossover, mutation and selection.

3.3.1 Encoding scheme

In many computational issues, the encoding technique, or the method used to change data into a certain form, is critical. The information provided must be transformed into a certain bit string. The encoding strategy is different depending on the issue domain. They can be used in neural networks to identify the best weights, generate programmes or expressions and so forth. Binary, octal, hexadecimal, permutation, value-based and tree encoding techniques are well known.

3.3.2 Selection techniques

In the genetic algorithm, selection plays a vital role in selecting whether or not a certain string will participate in the reproduction process. It basically chooses parents who will mate and recombine to produce offspring for the future generation. Parental selection is critical to the GA’s convergence rate because good parents lead individuals to better and fitter solutions. This operator is also known as the reproduction operator. The rate of convergence of GA is determined by the selection pressure.

3.3.3 Crossover operators

Reproduction and biological crossover are analogous to the crossover operator. In crossover, more than one parent is chosen, and one or more offspring are generated using the parents’ genetic material. Crossover is typically used with a high probability in a GA. Single-point, two-point, k-point, uniform, partially matched, order, precedence preserving crossover, shuffle, reduced surrogate and cycle are examples of well-known crossover operators. These crossover operators are fairly generic, and they can also be used to create a problem-specific crossover operator.

For our proposed model, we have used a two-point crossover. The concept of two-point crossover is identical to that of one-point crossover, except that instead of a single cut point, two cut points are randomly selected at the same positions in both parents, and two offspring are formed from them.

Process for a two-point crossover (a code sketch follows the list):

  (i) Choose two cut locations at random between each parent’s genes.
  (ii) Copy the sub-string between the two cut points from Parent 2 to Offspring 1, as indicated in Fig. 1.
  (iii) The remaining values are copied from the first parent and placed exactly as they are in Offspring 1, excluding duplicate values. (Genes 1 and 8 have already been inserted in the middle of Offspring 1 in Fig. 1, so they are omitted when copying.)
  (iv) Empty areas in Offspring 1 are filled with genes that did not occur in the first parent but did appear in the second parent after the second cut point, as illustrated in Fig. 2. (As gene 4 comes after the second cut point of Parent 2, it appears first in Offspring 1, followed by gene 5.)
  (v) Similarly, Offspring 2 can be formed by swapping the roles of the parents, as seen in Fig. 2.
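
A minimal code sketch of the listed procedure is given below, assuming permutation-encoded chromosomes as in the gene 1–8 example of Figs. 1 and 2; the example parents and cut points are illustrative values, not those of the figures.

```python
def two_point_crossover(parent1, parent2, cut1, cut2):
    n = len(parent1)
    child = [None] * n

    # (ii) copy the sub-string between the two cuts from Parent 2 into Offspring 1
    child[cut1:cut2] = parent2[cut1:cut2]
    used = set(child[cut1:cut2])

    # (iii) copy Parent 1's genes into their original positions, skipping duplicates
    for i in list(range(cut1)) + list(range(cut2, n)):
        if parent1[i] not in used:
            child[i] = parent1[i]
            used.add(parent1[i])

    # (iv) fill the remaining empty positions with Parent 2's genes, starting after
    #      the second cut point and wrapping around
    remaining = [g for g in parent2[cut2:] + parent2[:cut2] if g not in used]
    for i in range(n):
        if child[i] is None:
            child[i] = remaining.pop(0)
    return child

# (v) Offspring 2 is obtained by swapping the parents' roles
# (illustrative, hypothetical parents; not the exact example of Fig. 1)
p1, p2 = [1, 2, 3, 4, 5, 6, 7, 8], [8, 6, 4, 2, 7, 5, 3, 1]
offspring1 = two_point_crossover(p1, p2, 3, 6)
offspring2 = two_point_crossover(p2, p1, 3, 6)
```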

Fig. 1 Diagram representation of two-point crossover

Fig. 2 Diagram representation of two-point crossover

3.3.4 Mutation operators

Mutation is a genetic operator that preserves genetic variation from one population to the next. It can be characterized as a minor random change in the chromosome that results in a new solution. It is typically used with a low probability. The GA is simplified to a random search if the probability is sufficiently high. The "exploration" of the search space is related to mutation. In several circumstances, it has been discovered that mutation is required for GA convergence, whereas crossover is not. Displacement, simple inversion and scramble mutation are well-known mutation operators.
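
Since the proposed model later applies swap mutation (Sect. 4.3), a minimal sketch of that operator is shown below; the mutation probability is an illustrative value, not one prescribed by the model.

```python
import random

def swap_mutation(chromosome, mutation_prob=0.1):
    # With a small probability, exchange two randomly chosen genes of the chromosome.
    mutant = list(chromosome)
    if random.random() < mutation_prob:
        i, j = random.sample(range(len(mutant)), 2)
        mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant
```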

3.4 SVM

SVM is one of the most successful binary classification algorithms due to its excellent generalization capabilities. SVM was developed in the 1990s by Vapnik et al. [9] with the goal of minimizing structural risk and making the model very resistant to overfitting. If a model classifies testing or unknown data with a lower accuracy than the training data, it is said to be overfit.

By creating a hyperplane or a group of hyperplanes, SVM divides the data into distinct classes. A hyperplane is defined in SVM as a surface that entirely divides data points into two classes so that no data point from one class falls into the other [47].

The hyperplane with the greatest distance to the nearest data points in both classes, out of an unlimited number of potential hyperplanes, is deemed the most optimum [9].

A hyperplane is the set of points \(\textbf{x}\) satisfying the equation:

$$\begin{aligned} \textbf{w}\cdot \textbf{x}+ \textbf{b} = 0 \end{aligned}$$
(6)

where b is the bias and \(\textbf{w}\) is the weight vector, which in the linear case can be solved using Lagrange multipliers.

SVM may efficiently classify high-dimensional data by mapping it onto a high-dimensional feature space such that the mapped data can be readily separated using lower-order polynomial functions, despite its origins in linearly separable situations. A kernel function is used to map original data into a high-dimensional feature space [5].

The following common kernels are used in SVM classifier:

  1. Linear kernel:
    $$\begin{aligned} f(\alpha _{i},\alpha _{j}) = \alpha _{i}\cdot \alpha _{j} \end{aligned}$$
    (7)
  2. Polynomial kernel:
    $$\begin{aligned} f(\alpha _{i},\alpha _{j}) = (\beta \alpha _{i}^{t}\alpha _{j} + r^{2})^{2} \end{aligned}$$
    (8)
  3. RBF kernel:
    $$\begin{aligned} f(\alpha _{i},\alpha _{j}) = e^{-\gamma ||\alpha _{i} - \alpha _{j}||^{2}} \end{aligned}$$
    (9)
  4. Sigmoid kernel:
    $$\begin{aligned} f(\alpha _{i},\alpha _{j}) = \tanh (\beta \alpha _{i}^{t}\alpha _{j} + r) \end{aligned}$$
    (10)
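
For illustration, these kernels map onto scikit-learn's SVC roughly as sketched below; the parameter values shown are placeholders rather than the tuned values used later in the paper.

```python
from sklearn.svm import SVC

linear_svm  = SVC(kernel="linear")                                # Eq. (7)
poly_svm    = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0)  # Eq. (8)
rbf_svm     = SVC(kernel="rbf", gamma=0.5)                        # Eq. (9)
sigmoid_svm = SVC(kernel="sigmoid", gamma=0.5, coef0=1.0)         # Eq. (10)
```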

3.5 WBCD dataset

Breast cancer is one of the major causes of death among women around the world, so accurate diagnosis and effective treatment are urgently needed. To deal with this problem, the raw breast cancer data are taken from the Wisconsin Breast Cancer dataset (WBCD) of the UCI machine learning repository. The features in the dataset are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe the characteristics of the cell nuclei present in the image [14].

These data are preprocessed by replacing the missing values with 0 instead of removing them, as this should not introduce a noticeable change in the statistical patterns of the dataset. After dividing the dataset into training and testing subsets, the dataset is too small for our use; another common issue with the dataset is skewness. So, in order to resolve the small-dataset-size and class-imbalance issues, new data of the same dimensions are generated by introducing normally distributed random noise. Almost 75% of the data are used for training and 25% for testing. Detailed information about the WBCD dataset is presented in Table 2.
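
A minimal sketch of this preprocessing pipeline is given below; the file name, UCI column layout, noise scale and random seed are assumptions made for illustration, not values reported in the paper.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Raw WBCD file from the UCI repository; "?" marks missing values,
# the first column is an id and the last column is the class (2 = benign, 4 = malignant).
df = pd.read_csv("breast-cancer-wisconsin.data", header=None, na_values="?")
X = df.iloc[:, 1:-1].fillna(0).to_numpy(dtype=float)   # replace missing values with 0
y = (df.iloc[:, -1] == 4).astype(int).to_numpy()

# Enlarge and balance the data by adding normally distributed noise to copies
rng = np.random.default_rng(42)
X_aug = np.vstack([X, X + rng.normal(0.0, 0.05, size=X.shape)])
y_aug = np.concatenate([y, y])

# Roughly 75% for training and 25% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_aug, y_aug, test_size=0.25, stratify=y_aug, random_state=42)
```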

Table 2 Description of the WBCD dataset

4 Proposed model

ABC faces a notable challenge in the form of stagnation: the exploration phase progresses only gradually in the initial stages because positions change slowly, which means the exploration process struggles to advance at a desirable pace during the early phases of ABC. Finding an appropriate balance between the exploitation and exploration stages is one of the main difficulties of the basic black hole algorithm. The algorithm frequently runs into problems because it tends to become stuck in local optima, which makes it difficult to adequately traverse the search space and subsequently results in sub-optimal solutions. Another property of the black hole algorithm is that when a star crosses the event horizon it is destroyed; to keep the number of candidate solutions the same as at initialization, it generates new points randomly. Some areas of the search space may become densely populated as a result of this random creation of points, while other areas may remain largely unexplored. As a result, the algorithm can find it difficult to explore the whole search area, which may lead to biased findings or a failure to find the best solutions. The uneven distribution produced by random point creation can hamper the algorithm’s efficiency and efficacy, making it harder to navigate the search space successfully. All these shortcomings led to the development of the proposed algorithm.

The architecture of the proposed GBHABC (hybrid artificial bee colony and black hole with genetic operators) model is given in Fig. 3, and its pseudo-code is illustrated in Algorithm 3. In the first step, a support vector machine (SVM) is applied. SVM gives the best outcome when the RBF kernel function is used for classification; different kernel functions can be specified for the decision function, and besides the common kernels, custom kernels can also be specified. We train six different SVM experts with varied values of the RBF parameter. The values assigned to the RBF parameter for the different SVM classifiers are [0.1, 0.2, 0.5, 1, 2, 5]. As shown later in the results section, the performance measures of each SVM classifier differ depending on the selected RBF value. In the second step, the outcomes of the SVM expert system are combined into an ensemble using weighted majority voting. Littlestone and Warmuth [34] showed that assigning weights to the majority voting procedure minimizes the errors made by the ensemble system. Based on their classification accuracy, all the experts are assigned weights.
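
The two steps can be sketched as follows. The six gamma values are those listed above; the training and test arrays are assumed to come from the preprocessing sketch in Sect. 3.5, and the uniform weights are placeholders that the GBHABC optimizer is meant to replace.

```python
import numpy as np
from sklearn.svm import SVC

GAMMAS = [0.1, 0.2, 0.5, 1, 2, 5]

# Step 1: six RBF-kernel SVM experts, one per gamma value
# (X_train, y_train, X_test come from the preprocessing sketch in Sect. 3.5)
experts = [SVC(kernel="rbf", gamma=g).fit(X_train, y_train) for g in GAMMAS]

# Step 2: weighted majority voting over the experts' predictions
weights = np.ones(len(experts)) / len(experts)     # placeholder; optimized by GBHABC

def weighted_vote(X, experts, weights):
    votes = np.array([clf.predict(X) for clf in experts])   # shape (n_experts, n_samples)
    score = weights @ votes / weights.sum()                  # weighted fraction voting "malignant"
    return (score >= 0.5).astype(int)

y_pred = weighted_vote(X_test, experts, weights)
```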

Fig. 3 Architecture of proposed GBHABC

Algorithm 3 Pseudo-code of the proposed GBHABC

We have proposed the GBHABC model to generate the weights for the six SVM experts used in weighted majority voting. ABC has the limitation of stagnation, due to which it gets stuck in local optima or converges prematurely, so the black hole is introduced to make ABC more explorative. To simulate the black hole phenomenon, all candidate solutions are treated as stars, and the solution with the highest fitness among the candidates is chosen as the black hole. The flowchart of our proposed algorithm is shown in Fig. 4.

Fig. 4 Flowchart of the proposed GBHABC

4.1 Initialization and update of positions

All the solutions are initialized in a random manner in the search space as shown in Eq. (11).

$$\begin{aligned} x_{\textrm{ij}}= x_{\textrm{lb}}+ \textrm{rand}[0,1](x_{\textrm{ub}}-x_{\textrm{lb}}) \end{aligned}$$
(11)

where \(x_{\textrm{ij}}\) represents the \(i{\textrm{th}}\) food source of the population in the \(j{\textrm{th}}\) dimension, \(x_{\textrm{ub}}\) and \(x_{\textrm{lb}}\) are the upper and lower bounds for \(x_{\textrm{ij}}\), respectively, and \(\textrm{rand}[0,1]\) is a uniformly distributed random number in the interval [0, 1].

After initializing the BH (black hole) and the stars, the positions of the solutions (stars) are updated using the direction and distance of the BH. The position of each star is updated by Eq. (12).

$$\begin{aligned} x_i(t+1) = x_i(t) + \textrm{rand}(x_{\textrm{BH}}- x_i(t)) \end{aligned}$$
(12)

where \(x_i(t+1)\) and \(x_i(t)\) represent the position of the \(i{\textrm{th}}\) solution at iterations \(t+1\) and t, respectively, \(x_{\textrm{BH}}\) is the position of the black hole (best solution) in the search space and \(\textrm{rand}[0,1]\) is a uniformly distributed random number in the interval [0, 1].

If a solution (star) achieves a better position than the black hole, their positions are swapped and that star becomes the new BH of the search space. Furthermore, as a star advances towards the BH, it may cross the BH’s event horizon (EH). Crossing the horizon means that the star will be absorbed into the BH, i.e. eliminated, and a new star, i.e. a new solution in the search space, will be generated.

4.2 Generation of a new star

When a star is absorbed by the black hole, a new star is created using one of the following strategies:

  1. A new star is created at random with a probability of 0.65 (using a uniform distribution).
  2. A new star is created by selecting two stars from the population and recombining their attributes using a recombination procedure. The best of the resulting stars will be added to the population later on. This approach is used with a probability of 0.35 at first.

The probabilities for regeneration [0.65, 0.35] change during the progress of the model. For instance, using the 1st strategy the model starts with a 0.65 probability of generating stars. When the ratio of the iterations to the total number of iterations reaches 0.2, BH changes the probability for generating stars using the 2nd strategy to be 0.65 and sets the probability to be 0.35 for the 1st strategy. It will alternate again when the ratio reaches 0.4. The alternation of probabilities will continue for each 20% progress of the iterations until the stop condition is satisfied. By implementing the above strategies, we can alternate the focus on exploitation and exploration during model progression and achieve the best balance between them in order to find the optimal solution.
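
The alternation described above can be written compactly as a schedule function, sketched below under the assumption that the swap happens at every 20% of the iteration budget.

```python
def regeneration_probabilities(iteration, total_iterations):
    # Strategy 1 (random generation) starts at 0.65 and strategy 2 (recombination)
    # at 0.35; the two probabilities swap after every 20% of the total iterations.
    block = int((iteration / total_iterations) / 0.2)
    if block % 2 == 0:
        return {"random": 0.65, "recombination": 0.35}
    return {"random": 0.35, "recombination": 0.65}
```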

4.3 Recombination method for generation of a new star

According to the fitness value, the current population is sorted and divided into two subpopulations, i.e. \(p_{\textrm{male}}\) and \(p_{\textrm{female}}\). One star is randomly selected from each, and a two-point crossover is applied over the selected stars. Then, of the two resulting stars created by the crossover, the better one is selected. Finally, swap mutation is performed on the newly generated star to obtain diversity and an even distribution in the search space. This star is added to the search space, which improves the search space and makes the proposed model efficient.
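
A minimal sketch of this recombination step is given below; it assumes list-encoded stars and a fitness function to be maximized, and it draws the two cut points at random, which is an illustrative choice rather than the authors' exact setting.

```python
import random

def recombine(population, fitness_fn):
    # Sort by fitness and split into the two subpopulations p_male and p_female
    ranked = sorted(population, key=fitness_fn, reverse=True)
    half = len(ranked) // 2
    parent_a = random.choice(ranked[:half])
    parent_b = random.choice(ranked[half:])

    # Two-point crossover between the selected stars
    c1, c2 = sorted(random.sample(range(1, len(parent_a)), 2))
    child_1 = parent_a[:c1] + parent_b[c1:c2] + parent_a[c2:]
    child_2 = parent_b[:c1] + parent_a[c1:c2] + parent_b[c2:]
    best = max([child_1, child_2], key=fitness_fn)

    # Swap mutation on the better child to preserve diversity
    i, j = random.sample(range(len(best)), 2)
    best = list(best)
    best[i], best[j] = best[j], best[i]
    return best
```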

The subsequent phases of the model then follow the same steps as ABC. Similar to ABC, GBHABC is also divided into three segments: the improved employed bee segment, where the BH phenomenon with the genetic operators is applied, the onlooker bee segment and the scout bee segment. All phases other than the first are kept the same as in the original ABC algorithm.

5 Experimentations and results

This section compares and contrasts the findings achieved with and without the suggested model. First, there is a quick review of the assessment measures. As previously stated, our research consists of two key experiments. The outcomes of the traditional approaches are presented and discussed in the first experiment; the outcomes of the suggested model are then presented and discussed in the second experiment.

5.1 Evaluation metrics

One of the most crucial steps is the comparison of models by evaluation metrics. By utilizing and studying these metrics, we can analyse the performance of the proposed model and compare it with existing methods. To this end, 8 metrics have been considered: sensitivity, precision, specificity, F1 score, accuracy, FPR, FNR and MCC, computed from the true positive (TP), false negative (FN), false positive (FP) and true negative (TN) counts. The confusion matrix was created for different RBF values.

As mentioned above, 8 metrics have been used to compute the performance of the various RBF kernels of SVM. The different values taken for the RBF parameter in SVM were 0.1, 0.2, 0.5, 1.0, 2.0 and 5.0. The metrics were calculated using Eqs. (13)–(20):

$$\begin{aligned} \textrm{Specificity}= & {} \textrm{TNR} = \frac{\textrm{TN}}{(\textrm{TN} + \textrm{FP})} \end{aligned}$$
(13)
$$\begin{aligned} \textrm{Sensitivity}= & {} \textrm{TPR} = \frac{\textrm{TP}}{(\textrm{TP} + \textrm{FN})} \end{aligned}$$
(14)
$$\begin{aligned} \textrm{Precision}= & {} \frac{\textrm{TP}}{(\textrm{TP} + \textrm{FP})} \end{aligned}$$
(15)
$$\begin{aligned} F1= & {} \frac{2\textrm{TP}}{(2\textrm{TP} + \textrm{FP} + \textrm{FN})} \end{aligned}$$
(16)
$$\begin{aligned} \textrm{Accuracy}= & {} \frac{(\textrm{TP} + \textrm{TN})}{(\textrm{TP} + \textrm{TN} + \textrm{FP} + \textrm{FN})} \end{aligned}$$
(17)
$$\begin{aligned} \textrm{FPR}= & {} \frac{\textrm{FP}}{(\textrm{FP} + \textrm{TN})} \end{aligned}$$
(18)
$$\begin{aligned} \textrm{FNR}= & {} \frac{\textrm{FN}}{(\textrm{FN} + \textrm{TP})} \end{aligned}$$
(19)
$$\begin{aligned} \textrm{MCC}= & {} \frac{(\textrm{TP}\cdot \textrm{TN}) - (\textrm{FP}\cdot \textrm{FN}) }{\sqrt{(\textrm{TP}+\textrm{FP})(\textrm{TP}+\textrm{FN})(\textrm{TN}+\textrm{FP})(\textrm{TN}+\textrm{FN})}} \end{aligned}$$
(20)
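
For convenience, the metrics of Eqs. (13)–(20) can be computed directly from the raw confusion-matrix counts, as in the following sketch.

```python
from math import sqrt

def evaluation_metrics(tp, tn, fp, fn):
    return {
        "specificity": tn / (tn + fp),                                   # Eq. (13)
        "sensitivity": tp / (tp + fn),                                   # Eq. (14)
        "precision":   tp / (tp + fp),                                   # Eq. (15)
        "f1":          2 * tp / (2 * tp + fp + fn),                      # Eq. (16)
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),                  # Eq. (17)
        "fpr":         fp / (fp + tn),                                   # Eq. (18)
        "fnr":         fn / (fn + tp),                                   # Eq. (19)
        "mcc":         (tp * tn - fp * fn)
                       / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),  # Eq. (20)
    }
```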

5.2 First experiment

This experiment discusses the results obtained by using different RBF values for the SVM algorithm. As previously mentioned, in our first experiment 699 records of the whole WBCD dataset were used. The confusion matrices for the different RBF values of SVM are shown in Table 3; they correspond to the testing set used for this experiment, with 25 per cent of the dataset used for this step to verify the feasibility of SVMs in the current study. It can be observed from Table 3 that in the best case 61 malignant records were detected as malignant and classified as TP, and in the best case 109 benign records were detected as benign and classified as TN. Moreover, in the best case, 0 benign records were classified as malignant and 52 malignant records were classified in the benign class. The calculated F1 score, specificity, sensitivity, precision and accuracy values for the training steps are illustrated in Table 4.

Table 3 Confusion matrix for various SVM RBF values
Table 4 Evaluation metrics of SVM

According to Table 4, the highest evaluation metric values were produced by the SVM with RBF value 0.1, which obtained a 97.14 per cent accuracy rate. The different RBF values tested were 0.1, 0.2, 0.5, 1.0, 2.0 and 5.0. Two important factors in the RBF SVM determine the final results: gamma \((\gamma )\) and C. The C parameter strikes a compromise between the proper classification of training examples and the maximization of the decision function’s margin. For higher values of C, a narrower margin is accepted if the decision function classifies all training points more reliably, whereas a lower C encourages a larger margin and, as a result, a simpler decision function, at the cost of training accuracy; in other words, the C parameter acts as a regularizer of the model. The gamma \((\gamma )\) parameter controls the influence of individual training samples on the overall model, which in turn influences the final output, and it has a large effect on how the model behaves. If gamma is set too high, the radius of the support vectors’ sphere of influence is reduced to the support vector itself, and no amount of regularization with C can avoid overfitting. To make the procedure efficient and less complicated, only the gamma \((\gamma )\) parameter was adjusted for the multiple SVM kernels based on these observations. The parameters used during the experiment are listed with their values in Table 5.

Table 5 Parameter settings of the proposed GBHABC algorithm

Figure 5a–d shows the comparison between the above-discussed SVMs according to different evaluation metrics. Figure 5a compares all the kernels according to their accuracy, with the data given in Table 4. Similarly, Fig. 5b–d compares specificity, sensitivity and F1 score, respectively. These graphs confirm that changing gamma from 0.1 to 5.0 affects the outcomes of the models. Table 4 shows that when the value of gamma \((\gamma )\) is increased from 0.1 to 5.0, the accuracy reduces from 97.14 per cent to 70.29 per cent; similarly, sensitivity and F1 score also decrease with increasing gamma \((\gamma )\). At a fundamental level, a higher value of gamma \((\gamma )\) makes the model more forgiving but trades off accuracy and the other metrics, whereas a lower value of gamma \((\gamma )\) is stricter on the search space selection and yields higher accuracy. Both higher and lower values of gamma \((\gamma )\) have their benefits and limitations. So, in the next experiment, we utilize the maximum potential of these different kernels by using them in combination with each other, with the help of reliable ensemble learning and our proposed hybrid model, to improve the results further.

Fig. 5 SVM performance metrics

A brief discussion of the relevance of the dataset features follows. According to Table 6, the most essential attributes for identifying BC were bare nuclei, uniformity of cell shape, normal nucleoli, uniformity of cell size and bland chromatin, whereas the least important attributes were single epithelial cell size and marginal adhesion. Table 6 shows the relative relevance of all attributes using their coefficients; for example, bare nuclei, with a coefficient value of 0.28, is the most crucial attribute for identifying breast cancer, while marginal adhesion, with a value of 0.00, has the lowest relevance of all the attributes. These findings could be highly beneficial to doctors who treat patients with breast cancer: taking the most essential characteristics into account can increase the quality of breast cancer diagnosis while lowering the error rate.

Table 6 Coefficient for each attribute

5.3 Second experiment

The next phase of our study was to implement our proposed model to optimize the weights of an ensemble model which uses different SVM kernels discussed in experiment 1. As discussed before, SVM kernels with varying gamma \((\gamma )\) values have different results and benefits. So, we propose that we use all these different kernels in a weighted voting ensemble model with weights of the model being optimized by our proposed GBHABC model.

Finally, we evaluated our model on the same metrics on which the individual SVM kernels were evaluated. For clarity, the results of this experiment are based on a larger pool of the dataset than the previous experiment, but the general distribution of the data and the dataset itself were the same. Our model achieved an accuracy of \(99.42\%\), sensitivity of \(98.41\%\), specificity of \(100.00\%\), precision of \(100.00\%\) and F1 score of \(99.20\%\). These results show that our proposed approach has improved the overall results of the model on the Wisconsin dataset.

Figure 6 shows the accuracy versus epochs for the training and validation steps. The y-axis represents accuracy in the range [0, 1], which on multiplication by 100 can easily be converted to a percentage, and the x-axis represents the number of epochs. The training step is denoted by a blue line and the validation step by an orange line. As observed, the training accuracy remains almost constant throughout all the epochs, dipping by a mere \(0.57\%\) during the entire process, whereas the validation accuracy improves consistently from a value around \(95\%\) to \(99\%\), becoming equal to the training accuracy during the process and finally settling down. This graph indicates that our model handles overfitting well and generalizes to new data points rather than only being able to classify the training examples.

Fig. 6 Accuracy vs epochs for training and validation steps

We performed our experiment on different dataset distributions: 50–50, 60–40 and 75–25 as training and test sets. We calculated accuracy, sensitivity, specificity, precision and F1 score for the above-mentioned dataset distributions; the results are given in Table 7. As observed from the table, the accuracy and other evaluation metrics were very close to each other, and thus it can be inferred that the model is stable for different splits of the data.

Table 7 Dataset distribution

Equations (13)–(20) were used for final evaluation of our model. The final results for the proposed hybrid GBHABC on the basis of accuracy, specificity, sensitivity, precision, F-1 score, FPR, FNR and MCC are 99.42%, 100.00%, 98.41%, 100.00%, 99.20%, 0.00, 0.02 and 0.99, respectively, and the same is listed in Table 8. These results signify that the proposed model is superior to the classical SVM models. The confusion matrix for the given results is shown in Table 9. Figure 7 shows the visual representation of the obtained results for better and easy understanding. All these results imply that the proposed GBHABC model had a positive impact on the detection of breast cancer by improving the ability to correctly classify all the cases.

Table 8 Evaluation of our model
Table 9 Confusion matrix of our proposed GBHABC
Fig. 7 Metrics representation

5.4 Comparison with previous studies

We further compared the findings obtained using the proposed methodology with those already reported in prior studies. These studies had differing results, but all focused on the prediction of breast cancer incidence. Table 10 compares the accuracy currently achieved by our model with that of models developed in the past.

Table 10 Comparison of different methods applied for breast cancer prediction

Some of the earlier models were able to achieve significantly high accuracy on the breast cancer detection task. Sheikhpour et al. [44] proposed an approach applying particle swarm optimization (PSO) with nonparametric KDE (kernel density estimation) and achieved an accuracy of \(98.45\%\). Abdar et al. [1] achieved a comparable accuracy of \(98.07\%\) by implementing a branched ensemble approach that utilized stacking and voting techniques among the classifiers. Similarly, Nahato et al. [36], Zarbakhsh et al. [56] and Punitha et al. [40] achieved \(98.6\%\), \(99.26\%\) and \(97.53\%\) accuracy, respectively.

Figures 8, 9 and 10 show the comparison between the proposed model and some of the previous studies, displayed as bar graphs of a cross-sectional comparison with different hybrid algorithms from the past. A tabular comparison of the proposed GBHABC model with previously existing hybrid algorithms on the WBCD breast cancer dataset, on evaluation metrics such as accuracy, sensitivity and specificity, is given in Table 11. Various hybrid algorithms, namely FS-SVM, PSO-SVM, ABC, BH-SVM and fruit fly-SVM, were taken into account for the comparison with GBHABC. All of these algorithms were run on our preprocessed WBCD dataset for comparison with the proposed research, and the results were analysed. Among the aforementioned algorithms, FS-SVM had an accuracy of 96.60%, specificity of 95.27% and sensitivity of 97.79%; PSO-SVM had an accuracy of 97.81%, specificity of 98.87% and sensitivity of 96.87%; ABC had an accuracy of 97.95%, specificity of 96.34% and sensitivity of 98.42%; BH-SVM had an accuracy of 95.25%, specificity of 92.60% and sensitivity of 96.85%; and fruit fly-SVM had an accuracy of 93.62%, specificity of 96.53% and sensitivity of 93.10%. Our GBHABC model has an accuracy of \(99.42\%\), a sensitivity of \(98.41\%\) and a specificity of \(100.00\%\). GBHABC therefore appears to be superior to the other methods in terms of accuracy, sensitivity and specificity. Thus, we can conclude that the proposed GBHABC model is quite effective for the early diagnosis of breast cancer and better than existing methodologies.

Fig. 8 Proposed vs old accuracy comparison

Fig. 9 Proposed vs old sensitivity comparison

Fig. 10 Proposed vs old specificity comparison

Table 11 Model comparison

6 Conclusion and future scope

In the present study, we have proposed the GBHABC (artificial bee colony with black hole and genetic operators) model for the detection of breast cancer by distinguishing tumours as malignant or benign at an early stage of development. To do this, we used raw breast cancer data from the well-known WBCD dataset of the University of California, Irvine (UCI) machine learning repository. The proposed GBHABC-SVM model consists of six SVM classifiers (differentiated by the RBF parameter), which are combined into a weighted voting ensemble to improve performance.

Our study consisted of two primary experiments. The SVM method, with its three major parameters C, \(\epsilon\) and \(\gamma\), was used in the first experiment. Here, a basic SVM model, a polynomial SVM (with default parameters) and multiple modified SVMs with different C, \(\epsilon\) and \(\gamma\) values were compared. Our findings show that optimizing these parameters improves SVM performance, and we utilized the optimal C, \(\epsilon\) and \(\gamma\) values from the first experiment in the subsequent one. In the second experiment, we combined all of these distinct kernels into a weighted voting ensemble model, with the weights of the model being optimized using our suggested approach, introduced under the name GBHABC. Our final findings show that the suggested technique achieves an accuracy, precision, sensitivity and F1 score of 99.42%, 100%, 98.41% and 99.20%, respectively, and is consequently highly successful in predicting breast cancer. In conclusion, we believe the GBHABC model should be used in medical research for breast cancer prediction.

It would be intriguing to carry out a preliminary investigation of the model on various kinds of medical data in the future. The integration of this hybrid model into multi-class classification can be extended to other datasets and to the detection of other types of cancer. Furthermore, alternative machine learning models can leverage diverse forms of data, such as images and X-ray signals, for cancer detection. Future research may also encompass the examination of early cancer detection and the anticipation of cancer staging, and can be extended to other diseases such as brain tumour, glaucoma and diabetes detection.