1 Introduction

Data mining refers to the process of uncovering unexpected patterns in large volumes of data. Classification is a form of data analysis that extracts models describing important data classes. These models, known as classifiers, predict categorical (unordered, discrete) class labels. Classification is a key task in the data mining process. In a classification problem, the input is a training set, and the aim is to use that training set to construct a model of the class labels. This model can then be employed to predict the class of new instances whose labels are unknown.

Several approaches, such as the neural network (NN) [1, 2], radial basis function (RBF) [3], Naive Bayes (NB) [4], case-based reasoning [5], and support vector machine (SVM) [6, 7], have been employed successfully to solve classification problems [8]. Among these, the NN is a well-known and widely deployed approach. Different NN models have been devised, such as RBF, feed-forward, modular, multilayer perceptron (MLP), and probabilistic neural networks (PNNs).

Conversely, a metaheuristic is an iterative generation procedure that guides a subordinate heuristic by intelligently combining different concepts for exploring and exploiting the search space. Various schemes have been used to categorize and define metaheuristic algorithms, but essentially both population-based and single-solution-based metaheuristics can be used to train an NN. Techniques based on a single solution include simulated annealing (SA) [9], iterated local search [10], Tabu search [11], guided local search [12], and variable neighborhood search [13].

The use of a population-based technique together with an NN has drawn a lot of attention. The reason is that when an NN is combined with an evolutionary algorithm (EA), a more intelligent system can be developed than when only an EA or an NN is used [14]. For example, the particle swarm optimization (PSO) algorithm has been used to train an NN [15]. Other population-based approaches include ant colony optimization (ACO) [16], the genetic algorithm (GA) [17], the artificial bee colony (ABC) algorithm [18], the harmony search algorithm (HSA) [19], the cuckoo search (CS) algorithm [20], the artificial fish swarm (AFS) algorithm [21], the firefly algorithm (FA) with a PNN [22, 23], and biogeography-based optimization (BBO) [24].

BBO is a relatively new population-based stochastic metaheuristic optimization algorithm, first introduced by Dan Simon in 2008 [25]. The performance of BBO has been assessed by comparing it with seven other biology-based algorithms, and the results strongly favor BBO. It has been applied in several domains and fields, such as image processing, scheduling, image classification, and feature extraction. Moreover, the efficiency of BBO has been demonstrated on a real-world sensor selection problem for aircraft engine health estimation [26]. BBO has been compared with many algorithms in the literature and has performed well, which has encouraged other researchers to improve it by combining it with different methods to solve different optimization problems.

Nevertheless, despite recent advances, more research is needed to determine how a sound initial state can steer an algorithm toward better classification accuracy and faster convergence when solving classification problems. Therefore, this study deploys local search (i.e., the SA algorithm) to construct an initial solution for classification problems. It then employs a heuristic initialization hybridized with BBO to enhance the quality of the initial solution by carefully searching around it, thereby improving the solutions' ability to determine the best weights for the PNN classification approach. The research presented here investigates how the initial population of BBO can be generated in order to accelerate convergence and improve the final objective values.

The remainder of this paper is organized as follows: Section 2 explains the initialization of the population. Section 3 explains the hybridized heuristic initialization used in the proposed initialization of biogeography-based optimization (IBBO) algorithm. Section 4 describes and analyzes the results of an experiment that tests the IBBO algorithm on 11 benchmark datasets. Finally, Section 5 concludes the paper.

2 Initialization of the population

The techniques deployed for population initialization can be categorized into sequential diversification, random generation, heuristic initialization, and parallel diversification [25]. The initial population of a metaheuristic algorithm is usually produced randomly, but any heuristic (e.g., local search) can be used to initialize the population [25]. Heuristic initialization starts from scratch (i.e., from a null/empty solution) and builds a solution by assigning values to one decision variable at a time until a complete solution is produced. At each step, a local search heuristic is employed to choose a new solution, and this approach appears to be more effective than random initialization [25].

Local search or any other metaheuristic algorithm can be used to generate the initial population of an algorithm. In this research, the SA algorithm [26] is used to generate the initial population by creating a random solution (as in the random construction method) and then applying SA for a certain number of iterations to improve its quality. The improved solution is then added to the algorithm's population, and these steps are repeated until the population contains the required number of solutions. However, it should be noted that although a local search method can generate an initial population containing good-quality solutions, the population may lose its diversity, which may lead to premature convergence. A general pseudocode of the local search constructive heuristic is shown in Fig. 1; a Python sketch of the same procedure is given after the figure.

Fig. 1 Pseudocode for the local search constructive heuristic
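To make the construction concrete, the following Python sketch builds a population by improving each random solution with a short SA run. It assumes the fitness is an error value to be minimized; the function name, parameter defaults, and the Gaussian perturbation step are illustrative assumptions rather than the paper's exact settings.

```python
import math
import random

def sa_initialize_population(pop_size, dim, fitness, bounds=(-1.0, 1.0),
                             sa_iters=100, t0=1.0, cooling=0.95):
    """Build an initial population by improving random solutions with SA."""
    population = []
    for _ in range(pop_size):
        # Random construction: start from a uniformly random solution.
        current = [random.uniform(*bounds) for _ in range(dim)]
        current_fit = fitness(current)
        temperature = t0
        for _ in range(sa_iters):
            # Neighbor: perturb one decision variable at a time.
            neighbor = current[:]
            i = random.randrange(dim)
            neighbor[i] += random.gauss(0.0, 0.1)
            neighbor_fit = fitness(neighbor)
            delta = neighbor_fit - current_fit
            # Metropolis rule: improving moves are always accepted;
            # worsening moves are accepted with probability exp(-delta/T).
            if delta < 0 or random.random() < math.exp(-delta / temperature):
                current, current_fit = neighbor, neighbor_fit
            temperature *= cooling  # slow geometric cooling
        population.append(current)
    return population
```

Note the trade-off mentioned above: a larger `sa_iters` yields better starting points but reduces diversity and so raises the risk of premature convergence.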

3 Proposed method: initial population of BBO algorithm

As noted earlier, biogeography-based optimization has been applied in several domains and fields, such as image processing, scheduling, image classification, and feature extraction [27]. The results obtained by the BBO algorithm compare favorably with those of classical metaheuristics such as the GA, PSO, and ACO when applied to similar problems, and BBO also consistently outperforms these algorithms in finding optimal solutions for some other problems [27]. Moreover, experimental results have revealed that BBO can be considered a promising alternative for solving optimal power flow problems because it converges to better-quality solutions [24]. It also shows good convergence characteristics and robustness compared with PSO, differential evolution (DE), and some other approaches [27].

In recent years, BBO has received increasing attention due to its uniqueness and impressive performance, which has been confirmed in computational and optimization simulations and experiments. BBO plays a significant role in numerous areas, including data mining, bioinformatics, and information security, and its current applications indicate that it has the capacity for real-world deployment [27].

BBO has proved to be very efficient in solving many NP-hard problems, that is, problems for which even the best-known exact algorithms have exponential time complexity [28]. The complexity of each iteration of BBO is O(n²D + nO(f)), where n is the population size, D is the number of decision variables, and O(f) is the time complexity of computing the fitness function f [29].
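One way to unpack this bound (our reading, not an analysis taken from [29]) is that the quadratic term comes from migration, in which each of the $n$ habitats may import any of its $D$ features through an $O(n)$ roulette-wheel selection over emigration rates, while the linear term comes from re-evaluating every habitat once per iteration:

$$T_{\text{iter}} \;=\; \underbrace{O(n^{2}D)}_{\text{migration}} \;+\; \underbrace{n \cdot O(f)}_{\text{fitness evaluations}}$$

For a PNN fitness, $O(f)$ itself grows with the size of the training set, so in practice the second term can dominate.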

Therefore, the aim of this study is to investigate the impact of a good initial population on the convergence speed and the final objective values by using heuristic initialization (i.e., local search) to help the BBO algorithm create its initial population. Local search (i.e., the SA algorithm) is used to create a good initial population in order to reduce the number of iterations (i.e., to achieve faster convergence) and to enhance the quality of the initial population so that high-quality solutions can be found.

Specifically, this study proposes a sequential hybridization of SA and BBO rather than the version of BBO applied in Mirjalili [30] and others. The underlying hypothesis is that giving BBO a good initial point will improve its ability to find the optimal values for NNs. Heuristic initialization (i.e., local search) is therefore used to help BBO determine its initial population, so that the search can reach good solutions from anywhere in the problem space rather than behaving like a purely random search.

Figures 2 and 3 show the structure of the proposed IBBO algorithm. Figure 2 shows the first part of the proposed algorithm, which uses the SA algorithm to produce different initial solutions (hybrid initialization of the population) in order to find a good initial population state that maximizes classification accuracy and thus overcomes the slow convergence issue when solving classification problems. The SA algorithm is based on ideas from statistical mechanics, whereby a substance is heated and then slowly cooled to form a strong crystalline structure; the strength of the resulting structure depends on the cooling rate. SA was the first algorithm to allow non-improving neighbors to be selected: improving moves are always accepted, while worsening moves are accepted with a temperature-dependent probability, which allows the search to escape from local optima [31].

Fig. 2 Flowchart of the first stage of the proposed IBBO approach: heuristic initialization by SA

Fig. 3 Flowchart of the second stage of the proposed IBBO approach

The second part of the proposed approach, shown in Fig. 3, involves BBO, which serves as an improvement algorithm that controls the random steps of the search after the population is evaluated in each iteration. BBO is an efficient method for solving complex problems, so it was employed here to find optimal values for the weights of the PNN [24].

Thus, the IBBO algorithm is composed of two main stages, namely initialization (SA) and BBO, and a set of steps is performed in each stage. In the SA stage, the SA algorithm is used to construct the initial population of candidate solutions for the weights of the standard PNN (in this case). In the second stage, the fitness value of every candidate is calculated according to the BBO algorithm, and searches are conducted to find better solutions, so as to ensure efficient convergence and good solution quality and ultimately to obtain the optimal parameter settings for training the PNN, thereby achieving better classification accuracy. A sketch of this two-stage pipeline follows.
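The minimal end-to-end sketch below reuses `sa_initialize_population` from the Sect. 2 sketch. The migration and mutation steps are deliberately simplified (linear immigration rates, a two-way tournament standing in for emigration-rate-proportional selection, and no elitism), so this is an illustrative outline under our own assumptions, not the authors' exact implementation.

```python
import random

def ibbo_train(pop_size, dim, fitness, generations=100,
               mutation_prob=0.01, bounds=(-1.0, 1.0)):
    """Two-stage IBBO sketch: SA initialization, then BBO improvement.

    `fitness` is assumed to map a PNN weight vector to its classification
    error on the training set (lower is better).
    """
    # Stage 1: SA constructs the initial population (see Sect. 2 sketch).
    population = sa_initialize_population(pop_size, dim, fitness, bounds)

    # Stage 2: BBO-style migration and mutation over the population.
    for _ in range(generations):
        population.sort(key=fitness)  # index 0 holds the best habitat
        n = len(population)
        for rank, habitat in enumerate(population):
            # Worse-ranked habitats immigrate features more often.
            immigration = (rank + 1) / n
            for d in range(dim):
                if random.random() < immigration:
                    # Two-way tournament biased toward better habitats,
                    # standing in for emigration-rate roulette selection.
                    j = min(random.randrange(n), random.randrange(n))
                    habitat[d] = population[j][d]
                if random.random() < mutation_prob:
                    habitat[d] = random.uniform(*bounds)  # random mutation
    return min(population, key=fitness)
```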

4 Experimental results

An experiment was conducted to assess and compare the performance of the proposed IBBO approach and that of some other hybrid approaches in terms of classification quality and convergence speed when applied to a number of datasets. The results for classification quality are presented in tabular form in terms of accuracy, specificity, sensitivity, and error rate. The results for convergence speed are presented as graphical depictions of the algorithms’ convergence behavior. The results thus obtained are corroborated by employing a statistical significance test (i.e., the Wilcoxon test), and a brief commentary is provided.

The experiment was performed on 11 datasets that were used in another study [22], and the sizes of the training and testing sets were also adopted from that study. Table 1 summarizes the characteristics of the datasets, and Table 2 summarizes the parameters of the proposed algorithm, which were determined after a number of preliminary experiments.

Table 1 Dataset characteristics
Table 2 Parameter settings

4.1 Results for classification quality

The classification quality achieved by the tested algorithms depended on the generation of a sound initial population state, which facilitates faster convergence and enhances solution quality. Table 3 compares the performance of the proposed IBBO with that of BBO when the preliminary population is randomly generated. The best results are indicated in bold.

Table 3 Classification accuracy, error rate, sensitivity, and specificity (%) for the IBBO algorithm and BBO-PNN

The first contribution of this study is a new hybrid method that combines the PNN and BBO to solve classification problems using random initialization. The second contribution is the use of heuristic initialization (i.e., local search) to help BBO start from a good solution: local search (i.e., the SA algorithm) is used to create a good initial population in order to reduce the number of BBO iterations and thus enhance the quality of the solutions. Table 3 compares the results obtained by IBBO with the PNN against those achieved by BBO-PNN with a randomly generated preliminary solution.

As can be seen from Table 3, the hybrid IBBO method outperforms BBO in terms of classification accuracy, clearly attaining the highest accuracy on the tested datasets. Table 3 also indicates that IBBO outperforms BBO with respect to sensitivity on all of the datasets. For instance, on the BC dataset, IBBO attained a sensitivity of 83%; in other words, when a diagnostic test is conducted on a patient who has the disease, there is an 83% chance of correctly detecting that patient. As for specificity, on the ACA dataset IBBO attained 98%, meaning that if a diagnostic test is conducted on a patient who does not have the disease, there is a 98% chance of correctly identifying that patient as a negative case. In light of these results, IBBO is superior to BBO with regard to classification accuracy, specificity, sensitivity, and error rate, because when the algorithm starts from a good initial state, it can improve the final objective values.
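For reference, the four reported metrics follow directly from the binary confusion matrix. The sketch below uses the standard definitions; the variable names are ours, and both classes are assumed to be present in the test set.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, error rate, sensitivity, and specificity for binary labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),  # chance of detecting a true positive
        "specificity": tn / (tn + fp),  # chance of detecting a true negative
    }
```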

4.2 Results for convergence speed

The convergence behavior of the proposed IBBO algorithm was examined with regard to how generating various preliminary solutions to establish a good initial population state affects the convergence of the BBO-based search on the test datasets.

Figure 4 depicts the simulated convergence characteristics of the IBBO algorithm and those of a related hybrid technique, the LSFA [22]. The figure shows that IBBO converges faster than LSFA and that this trend holds across all the datasets tested.

Fig. 4 Convergence characteristics of the LSFA and IBBO algorithms

As the simulation results indicate, the search scheme of the IBBO algorithm accelerates convergence by blending the heuristic initialization approach with BBO. Prior research has observed that heuristic initialization can be employed to accelerate the search process and to attain a near-optimal solution [32]. Here, the heuristic initialization is able to produce viable initial solutions, thereby driving quicker convergence. According to the simulation outcomes, IBBO converges faster than the other models primarily because it begins from a good preliminary population state.

In sum, IBBO is an appropriate approach for solving classification problems, given that it exhibited good performance in terms of both classification quality and convergence speed.

4.3 Comparison with the literature

The proposed IBBO approach is also compared with some well-known approaches from the literature that have attained the best-known results. The approaches used for this comparison are a neural fuzzy inference system [24], fuzzy NNs [25], back-propagation (BP) [32], artificial neural networks (ANNs) [33], the CBA method [34], and a hybrid simplex-GA [35], as well as the LSFA, SFA, LFA, and FA [22]. The acronyms for these state-of-the-art methods are given in Table 4.

Table 4 Acronyms of the compared methods

The outcomes of the comparison are based on 20 independent runs on 11 benchmark datasets, and the best outcomes are shown in bold in Table 5. It should be noted that none of the methodologies in Table 4 was tested on all of the datasets used in this study. Therefore, the set of methods compared differs by dataset, and the compared results depend on various parameter settings, which could lead to different classification accuracy results (Table 3).

Table 5 Comparison between the results of IBBO and the state of the art

As can be seen from Tables 5 and 6, IBBO performs best overall in terms of classification quality, followed by LSFA. Furthermore, IBBO is able to classify the Fourclass dataset without any misclassification (an accuracy of 100%). The outcomes of the experiment, as shown in Tables 5 and 6, suggest that the proposed hybrid method (IBBO) outperforms the other methods in the literature on seven of the 11 datasets (HSS, AP, Heart, BC, SPECTF, ACA, and Fourclass). Furthermore, IBBO exhibits superior classification accuracy compared to BBO; it clearly outperformed the other hybrid approaches and attained the highest classification accuracy on all the datasets tested.

Table 6 Comparison between IBBO and the state of the art on the second part of the datasets

4.4 Validation and analysis of results

A statistical analysis of the performance of IBBO was also undertaken to ascertain whether it was statistically different from the LSFA. For this purpose, a Wilcoxon test was conducted at a 95% confidence level (α = 0.05) for classification accuracy, specificity, and sensitivity. Table 7 shows the p values obtained.

Table 7 p values from the Wilcoxon test comparing the IBBO and LSFA algorithms on classification accuracy, sensitivity, and specificity
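Such a paired comparison can be reproduced with SciPy's Wilcoxon signed-rank test. The sketch below assumes that per-run accuracies for both algorithms are available as paired samples; the array values are placeholders, not the data behind Table 7.

```python
from scipy.stats import wilcoxon

# Placeholder accuracies over 20 independent runs on one dataset
# (illustrative values only, not the paper's results).
ibbo_acc = [0.96, 0.95, 0.97, 0.96, 0.94, 0.96, 0.97, 0.95, 0.96, 0.96,
            0.97, 0.96, 0.95, 0.96, 0.96, 0.97, 0.95, 0.96, 0.94, 0.97]
lsfa_acc = [0.94, 0.93, 0.95, 0.93, 0.93, 0.94, 0.95, 0.94, 0.92, 0.94,
            0.95, 0.93, 0.93, 0.94, 0.94, 0.95, 0.93, 0.94, 0.93, 0.95]

stat, p_value = wilcoxon(ibbo_acc, lsfa_acc)
# Reject the null hypothesis of equal medians at alpha = 0.05.
print(f"W = {stat:.1f}, p = {p_value:.4f}, significant: {p_value < 0.05}")
```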

From Table 7, it can be seen that, with regard to classification accuracy, the outcomes attained by IBBO are statistically different (p value less than 0.05) from those of the LSFA algorithm, with the exception of the ACA, GCD, and Fourclass datasets. Notably, there was no significant difference between the LSFA and IBBO algorithms on six datasets, which exhibited p values greater than 0.05 (indicated in bold). With regard to specificity, IBBO and LSFA are statistically different (except on the ACA, LD, and Fourclass datasets).

One of the main shortcomings of BBO lies on the exploitation side, which has led researchers to propose modified versions of BBO [27]. One of the main reasons for proposing the hybrid IBBO approach in this study was to increase the exploitation capability by using local search (i.e., the SA algorithm) to allow BBO to start with a strong population.

5 Conclusion

The ultimate aim of this study was to enhance the solutions generated for classification problems by addressing two challenges: (1) adjusting the internal weights of the network to enable high classification accuracy and (2) attaining faster convergence. This study examined the feasibility of using BBO together with a PNN, where BBO was used to adjust the PNN's weights. Additional research was then carried out to ascertain how an initial state could yield improved classification accuracy with faster convergence when solving classification problems. This study employed the SA algorithm to produce a preliminary solution for the classification problem; BBO was then applied to enhance the quality of the preliminary solution by carefully searching around it to determine the best weights for the PNN classification methodology. The proposed hybrid IBBO methodology exhibited superior overall performance on the 11 benchmark datasets used. Going forward, the proposed approach could be further developed and applied to real-valued datasets in various domains and to major real-world problems, including text and web mining. Recently, other advanced versions of BBO have been proposed for highly demanding non-convex optimization problems; these versions could be tailored to classification problems and then compared with the IBBO proposed in this study.