1 Introduction

The widespread application of information technology at all levels has contributed to an exponential growth in the volume of stored data [1]. These data sets contain a wealth of information that is not readily accessible but which, if retrieved, may be of great value and significance. The extraction of this information from the data volume is called data mining [2,3,4,5]. The feature selection (FS) process is one of the important stages in data mining, as it is used to reach this information in the fastest and most accurate way. The process involves selecting the features that best fit the study problem: instead of retaining all of the features, only the specific features related to the study problem are chosen [6]. This means that only those features that are important for making an informed decision [7,8,9] about a particular problem are selected for further processing. This approach therefore yields considerable classification accuracy [10] and also helps to improve the understanding of statistical structures and performance [11] in a shorter time.

The FS process is composed of three key steps [12]. First, the generation process produces candidate subsets of the entire feature set. This is followed by the assessment phase and the validation phase [13]. In FS, the features associated with a problem are selected using either a filter or a wrapper approach during the assessment phase [14]. A number of specific approaches and selection criteria have been developed for the filter approach, such as mutual information [15], information gain [16], principal component analysis and others [17]. In the wrapper approach, on the other hand, a classifier such as the k-nearest neighbor (KNN) classifier or the support vector machine (SVM) [18] is used to evaluate candidate subsets.
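To make the distinction concrete, the sketch below contrasts the two assessment styles. It is a minimal illustration only, assuming scikit-learn is available; the function names, the choice of mutual information as the filter criterion, and the 5-fold evaluation are assumptions rather than the exact setup of this paper.

```python
# Filter vs. wrapper assessment of features (illustrative sketch).
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def filter_scores(X, y):
    # Filter approach: score each feature independently of any classifier.
    return mutual_info_classif(X, y)

def wrapper_score(X, y, mask, k=5):
    # Wrapper approach: score a whole candidate subset (a boolean mask over
    # the columns of X) by the cross-validated accuracy of a KNN classifier.
    if not mask.any():
        return 0.0  # an empty subset selects nothing and is treated as invalid
    return cross_val_score(KNeighborsClassifier(n_neighbors=k),
                           X[:, mask], y, cv=5).mean()
```

The wrapper score is more faithful to the final classifier but costs one full training run per candidate subset, which is why the generation step needs an efficient search strategy.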

If, in the generation process, all the possible subsets of a data set are generated, the result is a problem of very high complexity, since a data set with n features has 2^n candidate subsets [19]; for example, a data set with just 50 features already has 2^50 ≈ 10^15 subsets to evaluate. Many artificial intelligence (AI)-based methods have been used to reduce the search time while still ensuring reliable and accurate results. One of the AI methods most widely used at the current time is the metaheuristic algorithm. This type of algorithm can be employed as a tool to search for patterns and traits so as to improve the performance of a classifier [20]. A metaheuristic algorithm can perform a random search, thus accelerating the speed of finding relevant features. Furthermore, to ensure accuracy, it takes into account the information gathered during the search to guide the ongoing search process [21]. There are several types of metaheuristic algorithm that are inspired by nature and have been proved to be effective [22, 23] in finding solutions for many AI problems [24, 25].

Metaheuristic algorithms are widely used to solve FS problems [26,27,28]. They randomly generate subsets based on parameters that determine how the algorithms work [29], and they have been shown to reduce execution times dramatically while providing consistent results [30, 31]. Some of the metaheuristics that have been used to solve FS problems include the mine blast algorithm (MBA), the chaotic dragonfly algorithm [32], ant colony optimization [33], particle swarm optimization [34], bee colony optimization [35], grasshopper optimization [36], the chaotic crow search algorithm [37], the evolutionary gravitational search algorithm [38], the cat swarm optimization algorithm [39], the harmony search algorithm [40] and the cuckoo search algorithm [41], among others [8, 25, 31, 42,43,44].

Monarch butterfly optimization (MBO) is another nature-inspired metaheuristic algorithm. The MBO is based on the migratory behavior of a North American butterfly and was initially developed by Wang et al. [45]. The MBO has obtained satisfactory results when applied to many optimization problems and FS problems [46].

Building on this success, in this paper the MBO is modified in two ways in order to try to achieve more reliable results. The first modification involves the use of an enhanced crossover operator and is named MBOICO. The second modification involves the use of Lévy flight and is named MBOLF. The performance of these two proposed methods is assessed and compared with five wrapper FS methods, namely, the original MBO, the hybrid whale optimization algorithm with SA (WOASAT), the mine blast algorithm with simulated annealing (MBA-SA), the hybrid gravitational search algorithm (HGSA) and the binary gravitational search algorithm (BGSA). The findings demonstrate that MBOICO is superior to these approaches in terms of classification accuracy and the number of features chosen (selection size).

The remainder of this paper is structured as follows: Section 2 provides an overview of the works that are the most relevant to this study and also describes the MBO algorithm. Section 3 introduces the proposed modified MBO methods, namely, MBOICO and MBOLF. Section 4 describes the experiments and discusses their results. Finally, Section 5 draws some conclusions and makes some suggestions for possible future research directions.

2 Related works

In the context of nature-inspired metaheuristics, the first noteworthy study is that by Wang et al. [47], who implemented the earthworm optimization algorithm (EWA), which was inspired by the earthworm's two forms of reproduction (named Reproduction 1 and Reproduction 2). Reproduction 1 produces only one offspring, while Reproduction 2 generates one or more offspring at a time, which nine enhanced crossover operators can effectively do. The researchers also added Cauchy mutation (CM) to the EWA method. Nine different EWA methods based on the nine improved crossover operators were implemented with one, two and three offspring, respectively, giving a total of 27 variations of the method. The findings indicated that EWA23 improved performance, and on most of the tested benchmark data sets it was able to achieve a decent level of fitness compared to the others. In another approach, Sayed et al. [48] introduced a hybridized model based on the salp swarm algorithm (SSA) and chaos theory. The researchers also evaluated the efficacy of ten chaotic maps as a means of improving the results of the basic SSA. The outcomes demonstrated that the chaotic SSA model greatly improved SSA efficiency in respect of both the global and local search.

Chatterjee et al. [49] implemented S-shaped and V-shaped transfer functions to convert the continuous search space of the social ski-driver (SSD) algorithm to binary form and also implemented a local search algorithm, named late acceptance hill climbing (LAHC), to improve the SSD's exploitation capability. The results showed that this approach achieved high classification accuracy with a low selection size. On the other hand, Ghosh et al. [50] introduced a memetic algorithm for handwritten word recognition based on wrapper-filter FS. In another recent work, Shunmugapriya et al. [35] introduced a new hybrid ant and bee colony model for enhancing FS. Zawbaa et al. [51] used the ant lion optimizer (ALO) in a wrapper approach to optimize FS, whereas Shivalingegowda and Jayasree [52] proposed an approach to enhance the FS process in a wireless network by using the gravitational search algorithm.

Abdelbaset et al. [53] tried to solve feature selection problems using a mutation operator with the grey wolf algorithm. The findings demonstrate how the mutation operator increases the functionality and feasibility of this approach in enhancing the feature selection process. Moreover, in [54], Wu implemented Pareto optimization with the ABC algorithm to pick non-dominated features from big data. On the other hand, Emary and Grosan [55] enhanced ALO performance for FS with the use of chaos. To show its efficacy in reducing the features of complex data, they applied their approach to biological and medical data. In addition, Wang et al. [56] used data preprocessing with FS based on differential evolution (DE), and Shahbeig et al. [57] used binary DE for determining sponge iron consistency.

Mafarja et al. [36] implemented the grasshopper optimization algorithm (GOA) together with evolutionary population dynamics (EPD) in a wrapper model, the findings of which demonstrated the EPD's potential to greatly improve the GOA in terms of accuracy and selection size.

Another mechanism in FS was developed by Malakar et al. [58], who implemented a GA-based hierarchical FS method for Indian handwriting recognition, the findings of which indicated a significant increase in letter recognition accuracy. In another work, Saidi et al. [59] developed a hybrid solution that used the GA and the Pearson correlation coefficient to improve the FS procedure, and Emary et al. [60] suggested a wrapper FS structure using a binary ant lion approach and demonstrated that a fast convergence speed along with high classification efficiency could be achieved through their system.

Several approaches have been hybridized to address FS problems. For example, in [61, 62] a hybridization was introduced between the ACO and the GA. In addition, Mafarja and Abdullah [63] employed SA to enhance the GA local search, and Zorarpaci and Ozel [64] proposed a wrapper model hybridizing the ABC algorithm and DE.

On the other hand, Zhang et al. [65] performed the first analysis of a cost-based feature selection problem with multi-objective PSO by integrating a hybrid operator, the crowding distance, and the Pareto dominance relationship. The researchers named their method HMPSOFS. HMPSOFS was compared with three multi-objective FS algorithms on multiple test data sets and was shown to scan the solution space more efficiently. The experimental findings produced a set of non-dominated solutions rather than a single one; examining the Pareto front obtained by the multi-objective algorithm helps users select their preferred solutions to meet their own needs. It is therefore a highly efficient way of selecting features to address cost-based selection problems.

Yong et al. [66] proposed a multi-objective selection algorithm based on bare-bones particle swarm optimization. Their approach relied on a reinforced memory technique that was intended to solve the particle depletion phenomenon; in addition, a hybrid mutation operator was used with the aim of enhancing the search capabilities of the proposed algorithm.

On the other hand, Mafarja and Mirjalili [67] proposed a new hybrid integrating the WOA and SA in a wrapper FS process (WOASAT). SA was used in their research in two ways: in the first, SA and WOA collaborated with each other to pick features as one component; in the second, the SA process was applied after the end of the WOA. The findings revealed that the WOASAT approach improved the FS mechanism compared with recent state-of-the-art methods in terms of accuracy and selection size. In other research, Abualigah et al. [38] employed PSO as an FS approach, where a subset of features was extracted by using a clustering technique (k-means) to guarantee the clustering precision of the FS approach. This method strengthened the performance of textual record clustering because it used k-means text clustering to identify further similar classes. The methodology was applied to six clustering data sets of textual data, and the findings were contrasted with those produced by certain well-known previous FS strategies. The performance evaluation showed that the new approach provided more reliable findings in terms of both classification accuracy and selection size.

In addition, Mafarja et al. [68] implemented the GOA to address FS problems with the use of two strategies, one based on transfer functions and the other on the position of the best solution, which keeps the current solution simple. The efficacy of the introduced approaches was contrasted with that of 11 existing methods using 25 comparison datasets. The findings revealed that the proposed methods improved the level of convergence in finding the best solutions from the chosen features relative to the earlier methods; the mutation operators therefore significantly improved the GOA.

In a recent study, Faris et al. [69] developed a crossover operator-driven scheme for solving FS problems that involved using improved binary SSA-based optimization. The researchers compared the performance of their proposed scheme with that of prior FS methods on 22 well-known data sets. The results showed that the proposed scheme had better detection precision than other methods such as BGWO, BBA and BPSO, and it also demonstrated clearly better convergence than the hybrid methods it was compared with.

Recently, Taradeh et al. [38] proposed a new approach using evolutionary gravitational search with two classifiers, KNN and DT, to solve FS problems. The results demonstrated that the algorithm had the ability to achieve high accuracy rates through a balance between local and global search compared to previous methods. The suggested HGSA effectively outperforms other wrapper approaches and demonstrates benefits in terms of exploration-exploitation tradeoffs in search patterns and higher convergence speeds relative to its peers on multiple FS tasks.

Alweshah et al. [70] proposed two methods to enhance the FS process. The first involved the use of the MBA in order to automate the discovery step of the FS process. The second involved hybridizing the MBA with simulated annealing (MBA-SA) as a local search to enhance the MBA’s ability to locate solutions in the exploitation phase. The performance of the two suggested approaches (MBA and MBA-SA) was evaluated on 18 benchmark data sets from the UCI repository, and the detailed experimental findings indicated that MBA-SA obtained good results compared with other approaches in the literature. Later, Alweshah et al. [46] proposed a wrapper FS approach using the MBO. The experiments that were carried out showed that the MBO achieved a high classification accuracy rate and was able to dramatically reduce the number of selected features.

There are several other works which have employed MBO in varying fields and areas. Feng et al. [71] applied chaos theory and a Gaussian mutation operator to boost MBO's global search capabilities. Their suggested approach, called CMBO, was applied as a binary optimizer to three large-scale groups of instances of the 0–1 knapsack problem. The findings revealed that their approach was capable of outperforming the basic MBO and eight other algorithms in determining the best outcome for this type of problem. Chakrabarty et al. [72] used MBO with images collected by satellites to identify information in the captured images, such as forest areas and different terrains. The findings revealed that their approach obtained a high similarity between the segmented and the initial images and therefore had a slightly improved ability to segment very complicated images compared to other segmentation methods. In the meanwhile, the ABC algorithm and the MBO algorithm were combined by Ghanem et al. [73] to optimize the solution of numerical optimization problems by optimizing the equilibrium between the local search and global search phases in order to achieve effective results in terms of precision, convergence speed and the avoidance of local optima. Their suggested hybrid approach, called ABC-MBO (HAM), was applied to 13 benchmark datasets and the results were compared with those of nine metaheuristic algorithms used to solve numerical optimization problems. The findings showed that the ABC-MBO (HAM) had the potential to obtain more reliable results and improve the level of convergence relative to the other processes.

In another domain, Devikanniga and Raj [74] used the MBO algorithm to refine the weights of an artificial neural network (ANN) classifier; MBO was used to create a model to differentiate a healthy person from an adult with osteoporosis. Experimental findings revealed that the new model had a high potential for the interpretation of medical data compared to other related ANN hybridization models.

Another early MBO study was undertaken by Strumberger et al. [75], who suggested an MBO-based hybrid strategy with the firefly algorithm (FA). The approach was tested on six sample datasets, and the study proved that the new hybrid approach was more effective in solving hard optimization problems than the five methods it was compared to. Also, Faris et al. [76] modified MBO to combat premature convergence more effectively. To this end, they used a novel operator to improve the outcomes of the search in terms of precision and execution speed, and they applied their approach to training multilayer perceptrons. They implemented their approach on 15 benchmark datasets, and the results revealed an improvement in escaping local optima as well as a slightly faster convergence rate and shorter execution times.

Most of the preceding works used metaheuristic algorithms as a search approach for FS problems. Many of these works used the algorithms without modification. However, others modified them in several ways, the most important of which was hybridization to improve the balance between local search and global search so as to prevent entrapment in local optima. The MBO algorithm has been shown to obtain good results in terms of classification accuracy when it is used mostly without modification, which reflects the fact that there is a balance between global and local search in this algorithm. Nevertheless, in this paper, the global search attribute of the MBO algorithm was modified in order to achieve more precise results and to improve its efficiency in finding the correct points from the start and before turning to local search. This proposed methodology therefore differs from other approaches in the literature because, with this method, the aim is to show the significance of global search in solving FS problems.

2.1 Monarch butterfly optimization

As mentioned above, MBO, which was developed by Wang et al. [45], is one of the metaheuristic algorithms that mimic the behavior of fauna in the natural world. Specifically, it simulates the normal migration pattern of monarch butterflies that live in North America. Like many other butterflies, the monarch butterfly migrates twice a year. In the first migration, which starts in Canada, the butterflies head south to Mexico, and in the second migration, the butterflies make the return journey from Mexico north to Canada [77].

The migratory activity of these butterflies is replicated to solve multiple optimization problems. However, certain rules and fundamental concepts must be followed so that the best solution to the problem can be found [78]. First, all the butterflies that make up the population are located either in Land 1 (the pre-migration home) or in Land 2 (the post-migration home). Second, every butterfly's offspring is produced individually through the migration operator, regardless of whether the parent is in Land 1 or Land 2. Note also that the population size does not change; it must remain stable. Therefore, at each iteration, a fitness function is used to eliminate either the parent or the new child. Lastly, the butterflies selected on the basis of the fitness function are transferred to the next generation and are not changed by the migration operator.

Around the beginning of April, butterflies start to migrate when they leave Land 1 and go to Land 2, and then, in September, the reverse migration process from Land 2 to Land 1 begins. The cumulative number of monarch butterflies in the two lands represents the entire population, which is called NP.

The individual butterflies interpret the data and communicate with each other locally, spreading information within the swarm, which gives the system its evolving capability. Examining such systems is important for understanding swarm intelligence: this general viewpoint encourages one to analyze naturally inspired swarms at the level of their collective behavior rather than depending on a thorough knowledge of the micro-level.

2.1.1 Migration operator

The butterfly migration mechanism is described as follows [79]:

$$ X_{i,k}^{t+1}=X_{r_1,k}^{t} $$
(1)

where \( X_{i,k}^{t+1} \) defines the kth element of Xi at generation t + 1, describing the position of butterfly i, and \( X_{r_1,k}^{t} \) denotes the kth element of butterfly r1, selected randomly from Subpopulation 1 (Land 1) at generation t. This element is used when r ≤ p, where r is a random number defined by the following formula:

$$ r=\operatorname{rand}\times peri $$
(2)

where peri is the time span for migration and rand is a random number drawn from a uniform distribution.

Otherwise, if r > p, the kth element of the new generation's position is taken from Subpopulation 2 (Land 2) and calculated by the equation:

$$ X_{i,k}^{t+1}=X_{r_2,k}^{t} $$
(3)

where \( X_{r_2,k}^{t} \) defines the kth element of butterfly r2, selected randomly from Subpopulation 2 at generation t, and p indicates the ratio of monarch butterflies in Land 1. An overview of the butterfly migration operator is given in Algorithm 1.

[Algorithm 1: Migration operator]
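A minimal sketch of the migration operator follows, assuming real-valued positions stored as NumPy arrays; the default values peri = 1.2 and p = 5/12 are the standard MBO settings and are assumptions here, not taken from this paper's parameter table.

```python
import numpy as np

def migration_operator(pop1, pop2, peri=1.2, p=5/12, rng=None):
    """Rebuild Subpopulation 1 element-by-element (Eqs. 1-3)."""
    rng = rng or np.random.default_rng()
    offspring = np.empty_like(pop1)
    for i in range(len(pop1)):
        for k in range(pop1.shape[1]):
            r = rng.random() * peri                                  # Eq. (2)
            if r <= p:                                               # inherit from Land 1
                offspring[i, k] = pop1[rng.integers(len(pop1)), k]   # Eq. (1)
            else:                                                    # inherit from Land 2
                offspring[i, k] = pop2[rng.integers(len(pop2)), k]   # Eq. (3)
    return offspring
```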

2.1.2 Butterfly adjusting operator

In this operator, the equilibrium is controlled by the value of p, which balances the migration directions between Land 1 and Land 2. If the value of p is large, the number of butterflies selected from Land 1 is higher than that from Land 2, and vice versa.

The position of a butterfly is updated from the best butterfly if the generated rand is less than or equal to the value of p. The following equation gives the butterfly's modified position:

$$ X_{j,k}^{t+1}=X_{best,k}^{t} $$
(4)

where \( X_{j,k}^{t+1} \) represents the kth element of Xj at generation t + 1, describing the position of butterfly j, and \( X_{best,k}^{t} \) defines the kth element of Xbest, the best butterfly in Land 1 and Land 2 at the current generation t. If, instead, rand > p, the element is taken from a butterfly r3 selected randomly from Land 2:

$$ X_{j,k}^{t+1}=X_{r_3,k}^{t} $$
(5)

Then, if rand is greater than BAR, the current position is further perturbed by the following equation:

$$ X_{j,k}^{t+1}=X_{j,k}^{t+1}+\alpha \times \left({dx}_k-0.5\right) $$
(6)

where BAR is the butterfly adjusting rate and dx is the walk step of butterfly j, determined by Lévy flight as follows:

$$ dx=\mathrm{Levy}\left(X_{j}^{t}\right) $$
(7)

and α in Eq. (6) is a weighting factor calculated with Eq. (8):

$$ \alpha =S_{max}/{t}^2 $$
(8)

where Smax represents the maximum walk step a butterfly can take in a single move, and t represents the current generation. The butterfly adjusting operator is set out in Algorithm 2.

[Algorithm 2: Butterfly adjusting operator]
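The sketch below mirrors Eqs. (4)-(8), again with assumed defaults (p = BAR = 5/12, Smax = 1.0); levy_step is any generator of Lévy-distributed walk vectors, such as the Mantegna sketch given later in Section 3.2.

```python
import numpy as np

def adjusting_operator(pop2, x_best, t, levy_step,
                       p=5/12, BAR=5/12, S_max=1.0, rng=None):
    """Rebuild Subpopulation 2 (Eqs. 4-8)."""
    rng = rng or np.random.default_rng()
    offspring = np.empty_like(pop2)
    alpha = S_max / t**2                          # Eq. (8): step weight shrinks with t
    for j in range(len(pop2)):
        dx = levy_step(pop2.shape[1])             # Eq. (7): Lévy walk for butterfly j
        for k in range(pop2.shape[1]):
            if rng.random() <= p:
                offspring[j, k] = x_best[k]       # Eq. (4): inherit from best butterfly
            else:
                offspring[j, k] = pop2[rng.integers(len(pop2)), k]   # Eq. (5)
                if rng.random() > BAR:            # Eq. (6): Lévy perturbation
                    offspring[j, k] += alpha * (dx[k] - 0.5)
    return offspring
```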

The general behavior of the MBO algorithm, which combines the behaviors described in Algorithms 1 and 2, is defined in Algorithm 3 [80].

[Algorithm 3: MBO]
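Tying the two operators together, one generation of the basic MBO can be sketched as follows; this reuses the migration_operator and adjusting_operator sketches above together with a levy_step generator, and treats the fitness as minimized.

```python
import numpy as np

def mbo_generation(pop, t, fitness, levy_step, p=5/12):
    """One MBO generation: sort, split into two lands, apply both operators."""
    pop = pop[np.argsort([fitness(x) for x in pop])]         # fittest butterflies first
    n1 = int(np.ceil(p * len(pop)))                          # Land 1 holds p * NP butterflies
    pop1, pop2 = pop[:n1], pop[n1:]
    new1 = migration_operator(pop1, pop2)                    # update Subpopulation 1
    new2 = adjusting_operator(pop2, pop[0], t, levy_step)    # update Subpopulation 2
    return np.vstack([new1, new2])                           # NP stays constant
```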

3 Proposed approaches

In this research, two methods that use the wrapper FS technique are proposed. The first uses an improved crossover operator (MBOICO) to strengthen the search capability of MBO. The second uses Lévy flight in the MBO search (MBOLF).

3.1 MBO with improved crossover operator (MBOICO)

In the basic MBO algorithm studied above, all the generated butterflies move automatically to the next generation irrespective of their fitness value [81], which can degrade the expected outcomes when the offspring fitness is poor. In order to allow only butterflies with better fitness values to pass to the next stage, the MBO search was updated in this paper with a crossover operator, by updating both the migration operator and the butterfly adjusting operator.

3.1.1 Updating migration operator

To ensure that only butterflies with better fitness values pass to the next generation, for optimum precision, the migration operator is modified with a greedy selection according to the formula below [82]:

$$ X_{i,new}^{t+1}=\begin{cases} X_{i}^{t+1}, & f\left(X_{i}^{t+1}\right)<f\left(X_{i}^{t}\right)\\ X_{i}^{t}, & \text{otherwise}\end{cases} $$
(10)

where \( X_{i,new}^{t+1} \) is the updated individual of the next generation, and \( f\left(X_{i}^{t+1}\right) \) and \( f\left(X_{i}^{t}\right) \) are the fitness values of butterflies \( X_{i}^{t+1} \) and \( X_{i}^{t} \), respectively.

[Algorithm 4: Updated migration operator]
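The greedy rule in Eq. (10) is simple enough to state directly in code; the sketch below assumes a minimized wrapper fitness and NumPy arrays of butterfly positions.

```python
import numpy as np

def greedy_update(parents, children, fitness):
    """Eq. (10): a child replaces its parent only if it strictly improves fitness."""
    new_pop = parents.copy()
    for i in range(len(parents)):
        if fitness(children[i]) < fitness(parents[i]):
            new_pop[i] = children[i]
    return new_pop
```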

3.1.2 Updating butterfly adjusting operator

In the optimization process, a self-adaptive crossover operator with a greedy strategy is used to extract the maximum value from first-generation knowledge about the butterfly population. In the butterfly adjusting process, a new, more powerful form of crossover operator is developed. The updated version is formulated as follows:

$$ X_{j2}^{t+1}=X_{j1}^{t+1}\times \left(1-Cr\right)+X_{j}^{t}\times Cr $$
(11)

where \( X_{j2}^{t+1} \) is an additional, freshly generated butterfly individual created from \( X_{j1}^{t+1} \) and \( X_{j}^{t} \), and Cr represents the crossover rate, which is changed by a self-adaptive process according to the following equation:

$$ Cr=0.8+0.2\times \frac{f\left(X_{j}^{t}\right)-f\left(X_{best}\right)}{f\left(X_{worst}\right)-f\left(X_{best}\right)} $$
(12)

where \( f\left(X_{j}^{t}\right) \) is the fitness value of butterfly j in Subpopulation 2, and Xbest and Xworst are the best and worst individual butterflies in the entire butterfly population, respectively. By Eq. (12), Cr thus varies between 0.80 (for the best butterfly) and 1.00 (for the worst).

The new butterfly is then selected greedily as follows:

$$ X_{j,new}^{t+1}=\begin{cases} X_{j1}^{t+1}, & f\left(X_{j1}^{t+1}\right)<f\left(X_{j2}^{t+1}\right)\\ X_{j2}^{t+1}, & f\left(X_{j2}^{t+1}\right)<f\left(X_{j1}^{t+1}\right)\end{cases} $$
(13)

where \( f\left(X_{j1}^{t+1}\right) \) and \( f\left(X_{j2}^{t+1}\right) \) represent the fitness values of butterflies \( X_{j1}^{t+1} \) and \( X_{j2}^{t+1} \), respectively. Algorithm 5 shows the updated butterfly adjusting operator.

[Algorithm 5: Updated butterfly adjusting operator]
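A compact sketch of Eqs. (11)-(13) follows; the small epsilon guarding the denominator is an assumption added to avoid division by zero when all butterflies have equal fitness.

```python
def adaptive_crossover(x_j1, x_j, f_j, f_best, f_worst, fitness):
    """Self-adaptive crossover of MBOICO (Eqs. 11-13); fitness is minimized."""
    cr = 0.8 + 0.2 * (f_j - f_best) / (f_worst - f_best + 1e-12)   # Eq. (12)
    x_j2 = x_j1 * (1.0 - cr) + x_j * cr                            # Eq. (11)
    # Eq. (13): keep whichever of the two candidate offspring is fitter
    return x_j1 if fitness(x_j1) < fitness(x_j2) else x_j2
```

Note that Eq. (12) gives the worst butterflies the largest Cr, so their second offspring leans almost entirely on the parent, while better butterflies blend in more of the adjusted individual.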

The wrapper FS algorithm proposed in this paper uses a KNN classifier as an evaluator, while at the same time it uses MBOICO as the search technique. The KNN classifier determines the accuracy of the features selected by MBOICO. Fig. 1 illustrates the steps involved in the MBOICO technique when using wrapper-based FS.

Fig. 1 MBOICO with wrapper FS technique

3.2 MBO with Lévy flight method (MBOLF)

In this study, another method was suggested to change the MBO search by developing a Lévy flight-based mutation operator.

Animals and insects usually hunt for food randomly or semi-randomly, following many paths to reach their food target [83]. The next path in the hunt depends on the animal's current position, and the path a species selects can be described by a mathematically designed model [84].

Research by Reynolds and Frye [85] has shown that many organisms explore their surroundings using a sequence of straight pathways punctuated by sudden 90° turns, leading to an intermittent, scale-free search pattern of the Lévy flight type. Phenomena as diverse as human behavior and even light have also been associated with Lévy flights [86, 87].

Lévy flight is one of the strategies employed to increase the algorithm's convergence rate and to keep it from collapsing into local optima. This approach was first introduced by Lévy and Borel in 1954 [88]. In this process, the step length of the random walk is drawn from a heavy-tailed power-law distribution, called the Lévy distribution, and the new population is created near the current best solution, so it can speed up the local search and avoid falling into local optima [89].

Lévy flight is thus a random walk technique whose step length is drawn from the Lévy distribution, a heavy-tailed distribution. This makes it easy to use in complicated exploration cases and allows the search to escape from locally optimal regions [90].

The random number generation in Lévy flight consists of two simple stages: the random choice of a direction, and the generation of steps that follow the chosen Lévy distribution [91].

For independent, identically distributed random variables, the Lévy distribution can be described as the distribution of their sum, and its Fourier transform is defined by the following equation:

$$ F\left(k\right)=\exp \left[-\alpha {\left|k\right|}^{\beta}\right],\quad 0<\beta \le 2 $$
(14)

where α is a scale parameter within the [−1, 1] interval and β is the Lévy index of the random walk. The step length S can be calculated by Mantegna's algorithm as:

$$ S=\frac{\mu}{{\left|v\right|}^{1/\beta}} $$
(15)

where μ and v are drawn from normal distributions, that is, μ ∼ N(0, σμ²) and v ∼ N(0, σv²), and σμ can be determined by the following equation:

$$ \sigma_{\mu}={\left\{\frac{\Gamma \left(1+\beta \right)\sin \left(\pi \beta /2\right)}{\Gamma \left[\left(1+\beta \right)/2\right]\,\beta\, {2}^{\left(\beta -1\right)/2}}\right\}}^{1/\beta} $$
(16)

After that, the step size is calculated by:

$$ \mathrm{Stepsize}=0.01\times S $$
(17)

where the factor 0.01 comes from the fact that L/100 should be the typical step size of a walk, with L being the typical length scale; otherwise, the Lévy flight steps may be so aggressive that new solutions jump outside the search domain.
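The equations above translate directly into code; the following sketch of Mantegna's algorithm assumes β = 1.5 (the value used in MBOLF) and σv = 1.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, beta=1.5, rng=None):
    """Draw one Lévy-distributed step vector via Mantegna's algorithm."""
    rng = rng or np.random.default_rng()
    sigma_mu = (gamma(1 + beta) * sin(pi * beta / 2)
                / (gamma((1 + beta) / 2) * beta
                   * 2 ** ((beta - 1) / 2))) ** (1 / beta)   # Eq. (16)
    mu = rng.normal(0.0, sigma_mu, dim)       # mu ~ N(0, sigma_mu^2)
    v = rng.normal(0.0, 1.0, dim)             # v  ~ N(0, sigma_v^2), sigma_v = 1 assumed
    s = mu / np.abs(v) ** (1 / beta)          # Eq. (15): heavy-tailed step length
    return 0.01 * s                           # Eq. (17): rescale to a usable step
```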

In MBOLF, the position update is modified using the Lévy flight method to improve the efficiency of the MBO search. Initially, as in the basic MBO, the NP monarch butterflies are initialized randomly, along with the parameters peri, p, BAR and Smax. By using the Lévy flight method to update positions, each butterfly can take long leaps, improving population diversity and thereby strengthening MBO's exploration of the global search space. In the Lévy flight method, the parameter β plays a major role in the distribution: applying different values of β changes the random distribution. In this research, a constant value of β = 1.5 was chosen.

By integrating the benefits of this random walk into the MBO, the butterfly positions are strengthened in each iteration. Algorithm 6 below shows the MBOLF pseudocode, while Fig. 2 shows the MBOLF flowchart.

Fig. 2 The MBOLF flowchart

[Algorithm 6: MBOLF]
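As one plausible reading of the update described above (the precise placement of the Lévy mutation within the loop follows Algorithm 6), the position perturbation can be sketched by reusing the levy_step function:

```python
import numpy as np

def mbolf_move(pop, rng=None):
    """Perturb every butterfly with a Lévy-distributed leap (illustrative only)."""
    rng = rng or np.random.default_rng()
    steps = np.stack([levy_step(pop.shape[1], rng=rng) for _ in range(len(pop))])
    return pop + steps   # occasional long jumps diversify the population
```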

The wrapper FS algorithm suggested in this paper uses a KNN classifier as an evaluator and, at the same time, uses MBOLF as the search technique. The KNN classifier determines the accuracy of the features selected by MBOLF. Fig. 3 illustrates the procedure for implementing MBOLF with a wrapper FS strategy.

As is well known, the selection process depends on two variables, namely, the classification accuracy and the number of selected features. To achieve a balance between these two variables, a fitness function for estimating the output was created. Eq. (9) gives the fitness function used in this paper:

$$ \mathrm{Fitness}=\alpha\, {\gamma}_R\left(D\right)+\beta\, \frac{\left|R\right|}{\left|N\right|} $$
(9)

where γR(D) is the classification error rate obtained through KNN, |R| is the number of selected features, |N| is the number of features in the original data set, and β and α are the two parameters that weight the subset length and the classification quality, respectively, where α ∈ [0, 1] and β = (1 − α) [67].
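In code, Eq. (9) is a one-liner; α = 0.99 is a common setting in the cited works [67] but is an assumption here.

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Eq. (9): weighted sum of KNN error rate and feature-subset ratio (minimized)."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```

With α close to 1, accuracy dominates, and the |R|/|N| term acts as a tie-breaker that favors smaller subsets among equally accurate ones.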

The level of complexity of the proposed method depends on the crossover operators, their application, the distribution of the individuals and the population, and the fitness function. In view of the usual choices (point mutation, point crossover, roulette wheel selection), the time complexity is O(g(nm + nm + n)), where g is the number of generations, n is the population size and m is the individual size. So, the complexity is O(gnm) [92, 93].

The wrapper process, as stated earlier, embeds the classifier in the assessment phase to identify the proper features that need to be selected. This approach can produce results that are more precise in terms of choosing the correct features to match the purpose of the analysis, but, at the same time, it increases the amount of time taken to reach a solution. The wrapper approach only tests O((S + 1)N) candidate feature subsets to eventually choose S features, and in the worst case O(N²) feature subsets [92].

4 Experiments

The wrapper FS model is integrated into the proposed methods using a KNN classifier (with K = 5) in the assessment phase to evaluate the accuracy of the selected features that MBO generates during the generation cycle. Twenty-five comparison datasets from the UCI data repository were used to check the reliability of the proposed approaches [46, 67, 68, 70]. Table 1 shows the number of features and instances in each of the data sets used.

Table 1 Dataset Details

Each dataset was split using K-fold cross-validation, where K − 1 folds were used for training and the remaining fold was used for testing.

The folds are chosen in such a way that each fold comprises approximately the same distribution of class labels (stratification). Given the number of folds and repeats, the train/test splits are fixed so that models can be evaluated on identical splits; predictions are labelled with the fold and repeat number of each test instance, so that the results can be correctly measured both per fold/repeat and in aggregate.
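A minimal sketch of this evaluation protocol for one fixed feature subset follows, assuming scikit-learn; K = 5 for the classifier matches the setup described above, while the fold count and function name are illustrative.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(X, y, mask, n_splits=10, k=5):
    """Stratified K-fold accuracy of a KNN classifier on the masked features."""
    accs = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X[train_idx][:, mask], y[train_idx])               # train on K-1 folds
        accs.append(clf.score(X[test_idx][:, mask], y[test_idx]))  # test on held-out fold
    return float(np.mean(accs)), accs    # aggregated and per-fold accuracies
```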

In the experiments, the input parameters were set based on the results of some initial tests that allowed the proposed methods to produce better performance. Table 2 displays the input parameter configurations (Fig. 3).

Table 2 Parameter Settings
Fig. 3 MBOLF with wrapper FS technique

4.1 Results and discussion

Three outcomes were taken into account in order to assess the efficacy of the proposed methods: accuracy, number of selected features (selection size), and convergence speed. Table 3 displays the average accuracy and selection size of the two suggested approaches after 30 runs on each dataset.

Table 3 MBOICO and MBOLF average accuracy and selection size

MBOICO obtained very good performance in terms of accuracy and number of selected features, as seen in Table 3. The average accuracy of this method over all datasets was 92%, which in the field of solving FS problems is known to be a very high rate, and the importance of such results becomes apparent when they are compared with the findings of other approaches in the literature, as discussed in the next section. The quality of MBOICO's results is due to the improved crossover technique in the MBO search, which allows the system to produce the best possible results while avoiding slipping into local optima.

MBOICO outperformed the MBOLF method in 14 datasets: CongressEW, Credit, Exactly2, Derm, IonosphereEW, KrvskpEW, Lung, M-of-n, SpectEW, Tic-tac-toe, Vote, WQ, WineEW and zoo. It produced the same results in terms of classification accuracy in six data sets: Breastcancer, Exactly, Derm2, Led, Mushroom and SonarEW.

MBOLF also achieved high classification accuracy, with an average of 91% over all data sets. It outperformed MBOICO in five data sets: BreastEW, HeartEW, Lymphography, PenglungEW and WaveformEW. These high MBOLF results were obtained through the mutation technique based on Lévy flight.

As regards the number of features selected, the findings of the two suggested approaches were very close; MBOICO was the best in this respect, with an average of 12.04 selected features over all datasets, followed by MBOLF with an average of 12.40.

One of the goals of this work was to speed up optimization by reducing randomization and the exploration effort, discovering the optimum solution in a shorter amount of time. From Fig. 4, we can note MBOICO's success in identifying optimal solutions for the 25 data sets within the context of accelerated convergence.

Fig. 4 The convergence speed of all methods

Based on the results depicted in Fig. 4, the MBOLF method was the fastest algorithm in terms of finding better results within the search space. In all the datasets, it needed fewer than six iterations to improve the solution, except for the zoo dataset for which it needed 15 iterations to find the best solution. This confirms that the use of Lévy flight accelerates the convergence speed.

As for the MBOICO method, although it achieved the best classification accuracy in most datasets, it could not improve the convergence speed to the same extent as MBOLF.

The basic MBO produced the worst results in terms of convergence speed, but this was expected because it was not modified in order to address the issue of speed, unlike MBOICO and MBOLF.

The above outcome is due to MBOICO and MBOLF achieving a good balance between local and global search in finding solutions, which prevents the search process from falling into local optima. The results also indicate that MBOICO and MBOLF start from a good initial state. Table 4 shows the number of iterations needed by each of the modified methods to reach a better solution than the other method during the run on each dataset.

Table 4 The iteration number needed to achieve better solutions

Furthermore, the two methods, MBOICO and MBOLF, were both subjected to a T-test on the accuracy obtained for all data sets. The significance level was set at 95% (α = 0.05). This analysis was undertaken to determine whether MBOICO's results were substantially different from MBOLF's. Table 5 shows the findings for MBOICO and MBOLF, as well as the corresponding P-values of the T-test. From the table it can be seen that the difference between MBOICO and MBOLF is statistically significant, as all P-values are less than 0.03.

Table 5 P-values of the T-test

4.2 Comparison of MBOICO with methods in the literature

MBOICO was used in this section for comparison with approaches in the literature. It was preferred because it achieved higher classification accuracy than the MBOLF method. The findings of the proposed MBOICO method were compared with the results of the FS methods MBO, MBA-SA, WOASAT, BGSA and HGSA [46, 67, 68, 70]. All comparisons with the approaches in the literature were made using the same datasets and parameters as in those strategies. The comparison between the suggested MBOICO approach and the other methods was based on the following three criteria: classification accuracy, FS size and fitness value. MBOICO's classification accuracy and FS size were determined by taking the average accuracy and the average number of selected features, and the mean, maximum and minimum fitness values were measured. The fitness and the accuracy were determined using Eq. (9).

The MBOICO method was initially compared with the MBA-SA method, as both methods were applied to the same 25 datasets during the experimental phase to measure their efficiency. According to Table 6, MBOICO surpassed MBA-SA in 10 datasets: BreastEW, CongressEW, Credit, Exactly, Derm2, KrvskpEW, SonarEW, Tic-tac-toe, WaveformEW and WQ. Also, it produced the same results in nine datasets: Breastcancer, Derm, HeartEW, Lymphography, M-of-n, Mushroom, PenglungEW, WineEW and zoo. On the other hand, MBA-SA outperformed MBOICO in six datasets: Exactly2, IonosphereEW, Led, Lung, SpectEW and Vote. Overall, the average accuracy of MBOICO was better than that of MBA-SA across all 25 datasets, at 92% versus 91%.

Table 6 Comparison of MBOICO and MBA-SA by accuracy and feature selection size

As for the size of the selected features, the superiority of the MBOICO approach is evident. It was able to surpass MBA-SA in 18 datasets. It also had a better average selection size of 12.02 features across all 25 datasets, as opposed to an average of 15.97 for MBA-SA. The better overall performance of MBOICO is related to the modification applied to the global search step, which improves its ability to accurately select the most relevant features before progressing to local search.

Figure 5 shows the average accuracy rate and the selection size of MBOICO and MBA-SA over all datasets used.

Fig. 5 Average accuracy and average feature selection size of MBOICO and MBA-SA

To confirm the reliability of the proposed MBOICO method and its capacity to provide such a high level of classification accuracy whilst at the same time effectively reducing the number of features, its performance was also compared with that of four approaches in the literature, namely, MBO, WOASAT, BGSA and HGSA, across the 18 datasets that were used to test these methods in the original works. As seen in Table 7, in the comparison with WOASAT, MBOICO demonstrated good success, since it obtained stronger outcomes in the remaining seven datasets. Moreover, in eight datasets it had the same outcomes, including Breastcancer, CongressEW, Exactly, KrvskpEW, M-of-n, SonarEW, Vote and WineEW. There were only three datasets where WOASAT surpassed MBOICO, by a very slight gap of approximately 1%: BreastEW, IonosphereEW and SpectEW. As regards the comparison with the results of BGSA, MBOICO obtained better results in nine datasets, namely, CongressEW, Exactly2, IonosphereEW, KrvskpEW, Lymphography, M-of-n, PenglungEW, SonarEW and Tic-tac-toe. However, in seven datasets, including Breastcancer, BreastEW, Exactly, HeartEW, Vote, WineEW and Zoo, MBOICO recorded the same findings as BGSA. There were only two datasets where BGSA surpassed MBOICO, by an average very slight gap of 2.5%: SpectEW and WaveformEW. As for the comparison with the HGSA tests, MBOICO obtained stronger performance in eight datasets, including CongressEW, HeartEW, IonosphereEW, KrvskpEW, Lymphography, SonarEW, Tic-tac-toe and Zoo. Moreover, MBOICO provided the same findings as HGSA in eight datasets, namely, Breastcancer, BreastEW, Exactly, Exactly2, M-of-n, PenglungEW, Vote and WineEW. There were only two datasets where HGSA exceeded MBOICO, with an average discrepancy of 4%: the SpectEW and WaveformEW datasets. Ultimately, MBOICO beat all the other approaches across the 18 datasets, reaching an overall classification accuracy of 93%.

Table 7 Comparison of Accuracy of MBOICO and Methods in the Literature

Figure 6 below also illustrates the superiority of the proposed approach MBOICO compared to the four methods in the literature, based on the average classification accuracy across all 18 datasets.

Fig. 6 Classification accuracy of all methods

Table 8 demonstrates the clear dominance of MBOICO in terms of reducing the number of features compared with the other algorithms. It surpassed MBO in 11 datasets and also exceeded WOASAT in 12 datasets. As well, MBOICO outperformed BGSA in 13 datasets and HGSA in eight datasets, and in a further eight datasets it recorded the same findings as HGSA. It can be noted that the experiments on the 18 datasets showed that MBOICO achieved a better selection size than all the other methods except HGSA, with an average of 13 features.

Table 8 Selection Size Rate by All Methods

Figure 7 also demonstrates the superiority of the proposed approach over all previous methods except for HGSA.

This work also examined the results during the run cycle of the proposed algorithm to determine the best, worst and average fitness values for all the data sets evaluated. Table 9 displays the best fitness values collected for each dataset during 30 MBOICO runs, and contrasts the results with those of the four methods from the literature. MBOICO and WOASAT have the best fitness, with a mean value of 0.03, as can be seen from the table. The worst fitness values for all strategies are displayed in Table 10, where it can be seen that MBA-SA obtained the worst overall performance across all datasets, followed respectively by MBO, MBOICO and WOASAT. As for the average fitness values, Table 11 reveals that, of all the approaches, MBOICO and HGSA had the best average fitness (Fig. 7).

Table 9 Best Fitness Values Produced by All Methods
Table 10 Worst Fitness Values Produced by All Methods
Table 11 Average Fitness Values Produced by All Methods
Fig. 7 Number of selected features by all methods

From the above analyses, MBOICO achieved a high level of performance in terms of accurately selecting the best features and in selecting a relatively small number of features (selection size) compared to the other tested algorithms. The balance it achieved between global and local search was clearly demonstrated by the fast convergence speed. This shows that the proposed method has great potential for use in finding integrated and ideal solutions for applications that depend on a fast search process. In this paper, the global search was updated to improve its productivity by finding the right points from the start, before going on to local search. Thus, this methodological approach differs from other approaches in the literature because it reveals the significance of global search in solving FS problems.

5 Conclusion

Feature selection is a complicated task in the fields of data mining and machine learning. Researchers have been striving for some years to find the best way to identify the lowest total number of features so that the initial dataset can be represented as precisely and as easily as possible. Recent studies have focused on using metaheuristic algorithms to select the required features. One of these algorithms is the MBO algorithm, which was subjected to two modifications in this study: 1) the addition of an improved crossover operator (MBOICO) and 2) the incorporation of Lévy flight (MBOLF). These two modified methods were then applied with a wrapper FS system and a KNN classifier to 25 benchmark data sets; the KNN classifier was used in both the development and evaluation phases. The two suggested modified methods were also tested on 18 data sets in order to compare their results with those of four models in the literature (BGSA, HGSA, WOASAT, MBO) and MBA-SA. The findings demonstrated MBOICO's dominance in terms of accuracy and number of selected features, where it achieved an overall accuracy rate of 93% across the 18 comparison datasets, a very competitive result in this field of study. However, there is still room for further improvement. Therefore, in future works, researchers may wish to employ an SVM instead of KNN, and they may also wish to hybridize the MBO algorithm with other metaheuristic algorithms as well as another classifier. Furthermore, the modified MBO has the potential to be applied as an FS search technique in other fields such as the internet of things, image segmentation and sentiment analysis, among others.