1 Introduction

Nowadays, huge amounts of data and resources are available in various fields, which makes the digital processing of raw data a challenging issue. These data should be kept safe from damage or loss, so they are stored and arranged on physical disks in folders or datasets; these datasets may include a large number of attributes and features. However, not all of these features are important; some may be irrelevant or redundant, which may adversely affect the processing accuracy and increase the computational time due to the large search space [1]. Therefore, when researchers want to manipulate these datasets, the best practice is to apply feature selection to choose the significant and optimal subset of features [2]. The irrelevant or redundant features are removed, which improves efficiency, achieves better accuracy, and reduces data complexity [1].

Searching for the optimal solution in a large search space is an NP-hard problem [3] and is considered a multi-objective problem (minimize the number of selected features and maximize the classification accuracy) [4]. Therefore, several models treat feature selection as an optimization problem to avoid expensive computational time and stagnation in local optima. Evolutionary computation and swarm intelligence methods are effective techniques for this problem; genetic algorithms (GA) are an example of evolutionary computation [5]. Particle swarm optimization (PSO) [6], artificial bee colony (ABC) [7], grey wolf optimization (GWO) [8], ant colony optimization (ACO) [9], and the ant lion optimizer (ALO) [10] are examples of swarm intelligence.

These swarm intelligence algorithms are computational intelligence-based methods, made up of a population of artificial agents and inspired by the social behavior of animals in the real world. Most of these algorithms have been applied to the feature selection problem. In [1], a binary ABC algorithm was introduced for choosing optimal features and tested on ten benchmark datasets. This algorithm achieved the best classification performance in almost all cases and outperformed other methods such as GA and PSO, whereas [11] proposed a new algorithm based on a binary bare bones PSO algorithm and kNN that achieved the best average classification accuracies on 88% of the experiments' datasets. In addition, [12] introduced a modified cuckoo search algorithm with rough sets for feature selection, and the authors in [13] proposed a binary ALO approach to select the optimal feature subset in order to maximize the classification accuracy and minimize the number of selected features. This approach was compared with three well-known optimization algorithms, namely PSO, GA, and the binary bat algorithm; the results showed better accuracy than the other algorithms. Likewise, the social spider algorithm [14], GWO [15], the binary bat algorithm [16], and many other methods such as [17,18,19] provided good exploitation and exploration in general optimization problems, especially when applied to feature selection problems; however, their accuracy, time consumption, and ability to find the global optimum still require further improvement.

In this direction, various studies have tried to overcome these drawbacks. One of the most effective attempts is improving optimization methods by using chaotic sequences instead of random sequences; this technique has proved better at escaping from local minima than other stochastic methods [20]. In [21], the chaotic krill herd (CKH) was introduced for solving optimization problems; the Singer map is used to adjust the three main movements of the krill by regulating the KH's inertia weights. The results of CKH showed better performance than the basic KH and other robust optimization approaches, whereas [22] improved the fruit fly optimization algorithm (FOA) by introducing a new parameter (alpha) to generate food sources. This parameter was integrated with chaotic maps to produce the chaotic FOA (CFOA), which was tested on 14 well-known test functions. The Chebyshev map achieved the best performance, and CFOA showed a fast convergence rate and a good ability to find the global optimum. In addition, many other studies have utilized chaos to solve optimization problems and proved its effectiveness against standard methods, such as [23,24,25,26], and [27].

On the other hand, chaotic maps have been applied to feature selection problems. In this direction, [4] proposed a chaotic ALO (CALO) by adapting the parameter that controls the trade-off between exploration and exploitation. The performance of CALO was better than that of ALO, PSO, and GA. Along the same lines, [28] provided a chaotic binary PSO (CBPSO) that used two chaotic maps, namely the logistic and tent maps, to set the inertia weight of BPSO. The results showed that BPSO with the tent map achieved higher accuracy than with the logistic map. An improved chaotic genetic algorithm (ICGA) was introduced by Li et al. [29]. It used the tent map to generate the initial population and the logistic map in the mutation operation. The results showed better performance than other methods.

Therefore, chaos can help optimization methods overcome many drawbacks, such as trapping in local minima, slow searching, premature convergence, and time consumption; all of these motivate us to utilize chaotic maps to improve MVO and maintain the population diversity in the problem of interest. In this paper, we propose a new feature selection algorithm that combines chaotic maps with MVO. The main advantages of this improvement are to maximize the classification accuracy and minimize the size of the selected feature subset. The rest of this paper is arranged as follows: Section 2 presents the materials and methods, including the MVO algorithm and chaotic maps. The proposed algorithm is given in Sect. 3. Section 4 discusses the experimental results. The conclusion of this paper and future work are given in Sect. 5.

2 Materials and methods

2.1 Multi-verse optimizer (MVO) algorithm

The multi-verse optimizer (MVO) is a nature-inspired algorithm that emulates the multi-verse theory in physics by simulating the interaction between universes. It was introduced by Mirjalili et al. to deal with various optimization problems [30].

2.1.1 Inspiration

As described in [30], the multi-verse theory states that there was more than one big bang, each causing the generation of a universe. The concept of a multi-verse points to the existence of other universes in addition to our own [31], in contrast to the single-universe view. These universes can interact and/or collide with each other according to the multi-verse theory. There are three concepts in the multi-verse theory, namely white holes, black holes, and wormholes. A white hole (also called the big bang) is created when collisions between parallel universes occur; therefore, it is considered the main component for the birth of a universe, and such holes have not been observed in our universe. Unlike a white hole, black holes (which have been observed frequently) attract everything, including light beams, with their extremely high gravitational force [32]. Finally, wormholes connect different parts of a universe together, and they act as time/space travel tunnels where objects are able to travel instantly between any corners of a universe (or even from one universe to another). Objects are allowed to move between different universes through white/black hole tunnels. When a white/black hole tunnel is established between two universes, the universe with the higher inflation rate is considered to have a white hole, whereas the universe with the lower inflation rate is assumed to own black holes. Objects are then transferred from the white holes of the source universe to the black holes of the destination universe.

2.1.2 Mechanism

In this section, the MVO algorithm is described: the concepts of white holes and black holes are used to explore the search space, whereas wormholes are used to improve the ability of MVO to exploit the search space.

In the MVO algorithm [30], white holes tend to transmit objects to other universes, whereas black holes tend to receive these objects. Thus, over the iterations, the inflation rates of all universes are enhanced. Wormholes help in maintaining the diversity of universes, improving both the exploration and the exploitation of the search space, and preventing entrapment in local optima. The MVO algorithm starts by generating random universes; in every iteration, objects use white/black hole tunnels to transfer from universes with high inflation rates to universes with low inflation rates. In addition, the objects in any universe are moved by random teleportations via wormholes toward the best universe. These processes are repeated until the termination criterion is satisfied.

Therefore, three rules are implemented in the MVO algorithm: (1) a high inflation rate increases the probability of having a white hole and decreases the probability of having a black hole; (2) universes send objects through white holes and receive them through black holes, according to their inflation rates; (3) the objects of any universe may be updated, via wormholes, with the objects of the universe that has the best inflation rate.

In the MVO algorithm [30], each universe represents a solution, and each object in the universe is a variable of the solution. Each universe also has an inflation rate, which is proportional to the fitness function value of the corresponding solution.

The mathematical model of the white/black hole tunnels is represented by considering the population of universes U as [30]:

$$\begin{aligned} U=\begin{bmatrix} x_{1}^1 & x_{1}^2 & \dots & x_{1}^d \\ x_{2}^1 & x_{2}^2 & \dots & x_{2}^d \\ \vdots & \vdots & \ddots & \vdots \\ x_{n}^1 & x_{n}^2 & \dots & x_{n}^d \end{bmatrix} \end{aligned}$$
(1)

where d is the dimension of the problem (number of parameters) and n is the number of universes. These universes are sorted based on their inflation rates (fitness function values), and then the object \(x_i^j\) (the jth parameter of the ith universe) is exchanged using the roulette wheel mechanism, which selects a universe \(U_k\), as in Eq. (2):

$$\begin{aligned} x^j_i = {\left\{ \begin{array}{ll} x^j_k &\quad {\text{if}}\quad r_1 < NI(U_i) \\ x^j_i &\quad {\text{otherwise}} \end{array}\right. } \end{aligned}$$
(2)

where \(r_1 \in [0,1]\) is a random number, \(U_i\) is the ith universe, and \(NI(U_i)\) is the normalized inflation rate (fitness value) of \(U_i\).
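As an illustrative sketch (not the authors' implementation), the population of Eq. (1) and the exchange of Eq. (2) could be coded as follows; the min–max normalization of the inflation rates, the use of the same normalized values as roulette-wheel weights, and the example sizes are assumptions, since the paper does not spell these details out.

```python
import numpy as np

def normalize(fitness):
    # Normalized inflation rates NI(U_i): min-max scaling into [0, 1] (assumed).
    f = np.asarray(fitness, dtype=float)
    span = f.max() - f.min()
    return (f - f.min()) / span if span > 0 else np.ones_like(f)

def roulette_wheel(weights, rng):
    # Select an index with probability proportional to its weight.
    p = weights / weights.sum()
    return rng.choice(len(weights), p=p)

def white_black_hole_exchange(U, fitness, rng):
    # Eq. (2): each object x_i^j is either kept or replaced by the same
    # object of a roulette-wheel-selected universe U_k.
    n, d = U.shape
    NI = normalize(fitness)
    new_U = U.copy()
    for i in range(n):
        for j in range(d):
            if rng.random() < NI[i]:
                k = roulette_wheel(NI, rng)
                new_U[i, j] = U[k, j]
    return new_U

rng = np.random.default_rng(0)
U = rng.random((25, 10))     # Eq. (1): 25 universes with 10 objects each (example sizes)
fitness = rng.random(25)     # placeholder inflation rates
U = white_black_hole_exchange(U, fitness, rng)
```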

A lower inflation rate indicates a higher probability of sending objects through white/black hole tunnels. By supposing that each \(U_i\) has wormholes (which transport the objects of \(U_i\) through space randomly), exploitation is performed and the diversity of universes is maintained. These wormholes change the objects of the universes randomly, without consideration of their inflation rates. The wormholes are also used to update the universes' objects and improve the inflation rates by exchanging objects with the universe that has the best inflation rate, as follows:

$$\begin{aligned} x_i^j = {\left\{ \begin{array}{ll} {\left\{ \begin{array}{ll} X_j^b+{\mathrm{TDR}}\times \left( \left( ub_j-lb_j\right) \times r_4+lb_j\right) &\quad r_3 < 0.5 \\ X_j^b-{\mathrm{TDR}}\times \left( \left( ub_j-lb_j\right) \times r_4+lb_j\right) &\quad r_3 \ge 0.5 \end{array}\right. } &\quad r_2<{\mathrm{WEP}} \\ x_i^j &\quad r_2\ge {\mathrm{WEP}} \end{array}\right. } \end{aligned}$$
(3)

where \(X_j^b\) is the jth parameter of the best solution, \(lb_j\) and \(ub_j\) indicate the lower and upper bounds of the jth variable, respectively, and \(r_2, r_3,\) and \(r_4\) are random numbers in [0, 1].

The traveling distance rate (TDR) is a coefficient that determines the distance over which a wormhole can move an object around the best universe, and it is defined as:

$$\begin{aligned} {\mathrm{TDR}}=1-\frac{t^{1/p}}{T^{1/p}} \end{aligned}$$
(4)

where t is the current iteration, T is the maximum number of iterations, and p (set to 6 by default) determines the exploitation accuracy over the iterations.

The wormhole existence probability (WEP) increases linearly over the iterations to emphasize exploitation, and it is defined as:

$$\begin{aligned} \mathrm{WEP}={\mathrm{WEP}}_{\mathrm{min}}+t\times \left( \frac{{\mathrm{WEP}}_{\mathrm{max}}-{\mathrm{WEP}}_{\mathrm{min}}}{T} \right) \end{aligned}$$
(5)

where \({\mathrm{WEP}}_{\mathrm{min}}\) and \({\mathrm{WEP}}_{\mathrm{max}}\) are set to 0.2 and 1 by default, respectively.
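A direct transcription of Eqs. (3)–(5) (with the stated defaults p = 6, WEP_min = 0.2, WEP_max = 1) might look like the sketch below; passing a chaotic value through the optional r4 argument, as CMVO does later, is an assumption about how the adaptation is wired in.

```python
import numpy as np

def wormhole_update(U, best, lb, ub, t, T, p=6, wep_min=0.2, wep_max=1.0,
                    r4=None, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Eq. (5): wormhole existence probability grows linearly with iteration t.
    wep = wep_min + t * (wep_max - wep_min) / T
    # Eq. (4): travelling distance rate shrinks over the iterations.
    tdr = 1 - (t ** (1.0 / p)) / (T ** (1.0 / p))
    n, d = U.shape
    for i in range(n):
        for j in range(d):
            r2, r3 = rng.random(), rng.random()
            r = rng.random() if r4 is None else r4   # r4 is chaotic in CMVO
            if r2 < wep:
                # Eq. (3): move around the best universe X^b.
                step = tdr * ((ub[j] - lb[j]) * r + lb[j])
                U[i, j] = best[j] + step if r3 < 0.5 else best[j] - step
    return U
```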

The computational complexity of the MVO algorithm depends on the number of iterations and universes as well as on the roulette wheel mechanism and the sorting. The sorting algorithm has a complexity of \(O(n \ {\mathrm{log}} \ n)\) in the best case and \(O(n^2)\) in the worst case, and the roulette wheel selection is of \(O(n)\) or \(O({\mathrm{log}} \ n)\) depending on the implementation. The following equation gives the overall computational complexity [30]:

$$\begin{aligned} O({\mathrm{MVO}}) = O\left( l \left( n^2 + n \times d \times {\mathrm{log}} \ n\right) \right) \end{aligned}$$
(6)

where n is the universes’ number, l is the maximum iterations’ number, and d is the objects’ number.

2.2 Chaotic maps

Chaotic systems have important properties such as ergodicity, intrinsic stochasticity, and irregular behavior, as well as sensitive dependence on the initial conditions [4, 33]. These properties have been translated into various equations, called "chaotic maps", which are applicable in computational applications such as optimization problems. Using these maps to update the random variables of optimization methods is called a chaotic optimization algorithm (COA) [21]. This change makes optimization methods inherit the strengths of chaos, such as ergodicity and non-repetition, so they can escape from local optima and attain a higher search speed than random search.

In this paper, we use one-dimensional, non-invertible maps to create a set of chaotic values that adjust the MVO parameters. Table 1 lists the five chaotic maps used in the experiments.

Table 1 Five different chaotic maps used in this study
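Since Table 1 is not reproduced in the text, the sketch below uses commonly cited forms and parameter values for the five maps (e.g. a = 4 for the logistic map, P = 0.4 for the piecewise map); these defaults and the initial value are assumptions and should be checked against the table.

```python
import math

# Commonly cited forms of the five maps; parameter values are assumptions.
def logistic(x, a=4.0):
    return a * x * (1.0 - x)

def tent(x):
    return x / 0.7 if x < 0.7 else 10.0 / 3.0 * (1.0 - x)

def singer(x, mu=1.07):
    return mu * (7.86 * x - 23.31 * x**2 + 28.75 * x**3 - 13.302875 * x**4)

def sinusoidal(x, a=2.3):
    return a * x**2 * math.sin(math.pi * x)

def piecewise(x, P=0.4):
    if x < P:
        return x / P
    if x < 0.5:
        return (x - P) / (0.5 - P)
    if x < 1.0 - P:
        return (1.0 - P - x) / (0.5 - P)
    return (1.0 - x) / P

def chaotic_sequence(fn, x0=0.6, length=100):
    # Iterate a map from an initial value (assumed 0.6) to get values in [0, 1].
    seq, x = [], x0
    for _ in range(length):
        x = fn(x)
        seq.append(x)
    return seq
```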

3 The proposed chaotic multi-verse optimizer-based feature selection

In this section, the proposed wrapper-mode feature selection algorithm is illustrated, in which chaos theory is combined with the standard MVO algorithm. An important feature of chaos theory is its sensitivity to initial values, which produces different system behaviors. In the standard MVO algorithm, Eq. (3) is the main equation for improving the inflation rate using wormholes, and the parameter \(r_4\) is an important parameter that affects the position update in the exploration phase. Therefore, tuning this parameter using chaotic maps plays an important role in improving the MVO mechanism to perform exploitation and avoid local optima. The proposed algorithm (called CMVO) adapts this parameter in each iteration of the optimization process. The proposed algorithm is illustrated in Fig. 1 and Algorithm 1.

Fig. 1 The proposed algorithm (CMVO)

CMVO starts by constructing the chaotic map (using one of the maps in Table 1) and generating a random population of size N and dimension d, in which each universe (solution) represents a combination of features. Each object in a universe is represented as a binary value using the following formula:

$$\begin{aligned} x^j_i={\left\{ \begin{array}{ll} 1 &\quad {\text{if}}\quad x^j_i > \sigma \\ 0 &\quad {\text{otherwise}} \end{array}\right. } \end{aligned}$$
(7)

where \(\sigma \in [0,1]\) is a random value. The fitness function takes into consideration the classification accuracy (a kNN classifier is used) and the number of selected features. The fitness function must maximize the classification accuracy and minimize the number of selected features; therefore, it is defined as:

$$\begin{aligned} {\text{Fitness}\, \text{function}} = \gamma \times \frac{N_C}{N} + \beta \times \left( 1 -\frac{d_s}{d}\right) \end{aligned}$$
(8)

where \(\gamma\) and \(\beta\) are weighting factors in [0, 1] that balance the minimization of the number of features and the maximization of the classification accuracy, \(d_s\) is the number of selected features, d is the total number of features, and \(N_C\) is the number of correctly classified instances. \(\frac{N_C}{N}\) is the classification accuracy of the kNN classifier; it is evaluated after splitting the dataset into training and testing sets using fivefold cross-validation, where the algorithm runs five times, and at each run the dataset is split into five folds. One fold is chosen to represent the testing set, and the remaining four folds represent the training set. The accuracy of each run is calculated, and the average accuracy over the five runs is then computed, which represents the final output.
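A minimal sketch of Eqs. (7)–(8) using scikit-learn is given below; the weights gamma = 0.99 and beta = 0.01, the choice of k = 5 neighbours for kNN, and drawing a single sigma per call are assumptions, since the text only constrains the weights to [0, 1] and leaves these details open. X and y are assumed to be NumPy arrays.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def binarize(universe, sigma=None):
    # Eq. (7): an object becomes 1 (feature selected) when it exceeds sigma.
    # Whether one sigma is drawn per universe or per object is not specified;
    # one value per call is assumed here.
    if sigma is None:
        sigma = np.random.rand()
    return np.asarray(universe) > sigma

def fitness(universe, X, y, gamma=0.99, beta=0.01):
    # Eq. (8): weighted sum of the fivefold-CV kNN accuracy and the fraction
    # of discarded features (gamma and beta are assumed values).
    mask = binarize(universe)
    if not mask.any():                       # guard against an empty subset
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask], y, cv=5).mean()
    return gamma * acc + beta * (1.0 - mask.sum() / X.shape[1])
```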

After computing the fitness function (inflation rate) for each solution, the solutions are sorted (using the Quicksort algorithm) based on their fitness values. Then the white/black hole model (lines 14–20 in Algorithm 1) is applied using the roulette wheel selection mechanism, and the wormhole model is computed (lines 21–32), where the value of \(r_4\) is taken from the chaotic map. The global best fitness value and its corresponding global best solution are then updated as in lines 10–11. These steps are repeated until the stopping condition is satisfied (maximum number of iterations).

Algorithm 1 The proposed CMVO-based feature selection algorithm (pseudocode)
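To tie the pieces together, a compact sketch of the Algorithm 1 loop is given below; it assumes the helper functions sketched earlier in this section (the chaotic maps, white_black_hole_exchange, wormhole_update, fitness, binarize) are in scope, and the clipping of universes to the [0, 1] search domain is an added safeguard not stated in the text.

```python
import numpy as np

def cmvo_feature_selection(X, y, n_universes=25, max_iter=100,
                           chaos_map=logistic, seed=0):
    # Compact sketch of Algorithm 1, reusing the earlier helper sketches.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    lb, ub = np.zeros(d), np.ones(d)
    U = rng.random((n_universes, d))           # random initial universes
    c = 0.6                                    # chaotic state (assumed seed value)
    best_fit, best_u = -np.inf, U[0].copy()
    for t in range(1, max_iter + 1):
        fit = np.array([fitness(u, X, y) for u in U])
        order = np.argsort(fit)[::-1]          # sort by inflation rate (maximized)
        U, fit = U[order], fit[order]
        if fit[0] > best_fit:                  # track the global best universe
            best_fit, best_u = fit[0], U[0].copy()
        U = white_black_hole_exchange(U, fit, rng)   # lines 14-20 of Algorithm 1
        c = chaos_map(c)                       # chaotic value replaces r_4
        U = wormhole_update(U, best_u, lb, ub, t, max_iter, r4=c, rng=rng)
        U = np.clip(U, lb, ub)                 # keep universes inside [0, 1]
    return binarize(best_u), best_fit
```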

4 Experiments and discussion

A comparative analysis has been carried out between the following algorithms: MVO based on the tent chaotic map, MVO based on the logistic chaotic map, MVO based on the Singer chaotic map, MVO based on the sinusoidal chaotic map, MVO based on the piecewise chaotic map, the standard MVO, PSO, and ABC. In the rest of this paper, when the name of a chaotic map is mentioned separately, it refers to the CMVO based on that map.

Each algorithm has been run 10 times with random positioning of the search agents. The parameter settings for all algorithms are as follows: the number of search agents is 25, the maximum number of iterations is 100, the problem dimension is d, and the search domain is [0, 1]. For PSO: inertia weight = 1, inertia weight damping ratio = 0.9, personal learning coefficient = 1.5, and global learning coefficient = 2.0. For ABC, the limit of trials is 5, and in the MVO and CMVO algorithms \({\mathrm{WEP}}_{\mathrm{min}}=0.2\) and \({\mathrm{WEP}}_{\mathrm{max}}=1\).

4.1 Datasets

The datasets are taken from the UCI data repository [34]. Five datasets are used to validate the performance of the proposed algorithm. Fivefold cross-validation is applied to split each dataset into a labeled training set (80% of the samples) and a testing set (20% of the samples). Table 2 summarizes these datasets in detail.

Table 2 The datasets used in this study

4.2 Performance metrics

To evaluate the performance of the algorithms, four classifiers have been tested, including Random Forest (RF), the J48 decision tree (J48), Kstar, and the logistic model tree (LMT).

Also, the performance of all algorithms has been evaluated using different performance measures, namely accuracy, precision, sensitivity, specificity, NPV, and F-measure.

4.2.1 Classification accuracy

The classification accuracy for the experiment is defined as

$$\begin{aligned} {\mathrm{Accuracy}}=\frac{{\mathrm{TP}}+ {\mathrm{TN}}}{{\mathrm{TP}} + {\mathrm{FP}} + {\mathrm{FN}} + {\mathrm{TN}}}\times 100 \end{aligned}$$
(9)

where TP, TN, FP, and FN represent the true positives, true negatives, false positives, and false negatives, respectively.

4.2.2 Sensitivity and specificity

Sensitivity measures the proportion of actual positives which are correctly identified (also called recall).

$$\begin{aligned} {\mathrm{Sensitivity}} =\frac{{\mathrm{TP}}}{{\mathrm{TP}} + {\mathrm{FN}}}\times 100 \% \end{aligned}$$
(10)

Specificity measures the proportion of negatives which are correctly identified.

$$\begin{aligned} {\mathrm{Specificity}}=\frac{{\mathrm{TN}}}{{\mathrm{FP}} + {\mathrm{TN}}}\times 100 \% \end{aligned}$$
(11)

4.2.3 Negative predictive value

The negative predictive value (NPV) is the proportion of subjects with a negative test result who are correctly classified.

$$\begin{aligned} {\mathrm{NPV}} =\frac{{\mathrm{TN}}}{{\mathrm{TN}} + {\mathrm{FN}}}\times 100 \% \end{aligned}$$
(12)

A high NPV for a given test means that when the test yields a negative result, it is most likely correct in its assessment.

4.2.4 F-measure

The F-measure (also called the F-score) is the harmonic mean of recall and precision. It is defined as:

$$\begin{aligned} F{\text{-}}{\mathrm{measure}} =\frac{2\times {\mathrm{precision}} \times {\mathrm{recall}}}{{\mathrm{precision}} + {\mathrm{recall}}} \end{aligned}$$
(13)
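For reference, all the measures of this subsection can be computed from the confusion-matrix counts as in the sketch below; precision is written with its standard definition TP/(TP + FP), which is listed among the measures above but not given as an equation, and the example counts are purely illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    # Eqs. (9)-(13): measures derived from the confusion-matrix counts.
    accuracy    = (tp + tn) / (tp + fp + fn + tn) * 100   # Eq. (9)
    sensitivity = tp / (tp + fn) * 100                    # recall, Eq. (10)
    specificity = tn / (fp + tn) * 100                    # Eq. (11)
    npv         = tn / (tn + fn) * 100                    # Eq. (12)
    precision   = tp / (tp + fp) * 100                    # standard definition
    f_measure   = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (13)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity,
            "NPV": npv, "F-measure": f_measure}

# Example with illustrative counts: 50 TP, 40 TN, 5 FP, 5 FN.
print(classification_metrics(50, 40, 5, 5))
```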

4.3 Results and discussion

The results of the CMVO algorithm based on the five different maps, compared to the other swarm algorithms, are given in Tables 3, 4, 5, 6, 7, 8, 9 and 10 and Figs. 2, 3, 4 and 5 ("All" in the tables and figures refers to classification based on all features). Table 3 lists the number of selected features for the different algorithms, and Tables 4, 5, 6, 7, 8 and 9 demonstrate the performance of the different classifiers for each dataset based on the selected features.

Table 3 The number of selected features using eight algorithms

Tables 4 and 5 as well as Figs. 2 and 3 show the average classification rate and the averages of the precision, sensitivity, specificity, F-measure, and NPV measures for the Wisconsin dataset using the swarm algorithms and the CMVO algorithm with different classifiers. (In the following, we refer to the measures precision, sensitivity, specificity, NPV, and F-measure as the PSSNF measures.)

The accuracy of the piecewise algorithm is the best (97%), followed by the Singer and logistic maps, which have the same accuracy as when all features are used (94%). The MVO and ABC algorithms (with the same accuracy of 91%) are better than the rest of the algorithms (sinusoidal 90%, PSO 90%, and tent 88%). Also, from Table 5 we can conclude that the highest performance is achieved by the sinusoidal map (94% in terms of the PSSNF measures in general), and the ABC algorithm is the second best with 93%. The remaining algorithms have nearly the same performance, except the tent algorithm, which is lower than all the others.

Table 4 The accuracy of all algorithms with different classifiers that represent the correct classification rate
Fig. 2 The average accuracy over all classifiers for each dataset using different feature selection algorithms

Fig. 3 The average of each PSSNF measure for the Wisconsin dataset

Table 5 The PSSNF measures of selected features of Wisconsin dataset between CMVO (based on chaotic maps) and the other three algorithms

The results for the Dermatology dataset are shown in Tables 4 and 6 as well as Figs. 2 and 4. From these results, we can observe that the sinusoidal and logistic algorithms yield a better accuracy of 99% (with nine and eleven features, respectively). Also, the ABC algorithm has higher accuracy than the piecewise, Singer, standard MVO, PSO, and tent algorithms (94, 94, 91, 90 and 83%, respectively). From Table 6, the ABC algorithm and the standard MVO have the best results (over all measures); however, there is only a small difference between them and the sinusoidal map (96%). The Singer map is better than all the remaining algorithms, namely the piecewise, logistic, tent and PSO algorithms (94, 92, 85 and 93%, respectively).

Table 6 The PSSNF measures of selected features of Dermatology dataset between CMVO (based on chaotic maps) and the other three algorithms
Fig. 4 The average of each PSSNF measure for the Dermatology dataset

Tables 4 and 7 as well as Figs. 2 and 5 illustrate the performance of the algorithms for the DNA dataset. The logistic and tent algorithms yield a better accuracy of 87% (with 95 and 88 features, respectively). Also, the PSO and MVO algorithms and the case where all features are selected give the same result (85%), and their performance is higher than that of the rest of the algorithms. From Table 7, the PSO algorithm has the best results (93% in terms of the PSSNF measures), followed by the piecewise map (91%). The tent map (89%) is higher than the sinusoidal and logistic maps (which have nearly the same performance, 88%), and ABC (87%) is better than MVO and Singer (86%). This table also shows that the best algorithm in terms of F-measure is PSO, followed by CMVO based on most of the chaotic maps, except the Singer map, which is lower than the ABC algorithm.

Table 7 The PSSNF measures of selected features of DNA dataset between CMVO (based on chaotic maps) and the other three algorithms
Fig. 5 The average of each PSSNF measure for the DNA dataset

Tables 4 and 8 as well as Figs. 2 and 6 give the results for the Ion dataset. These results establish the performance of the MVO algorithm, which has the best accuracy (98% with 15 features), followed by the sinusoidal map (97% with 17 features). The performance of the logistic map is better than the rest of the algorithms with an accuracy of 96%, and the remaining algorithms have close performance, except the ABC algorithm, which achieves 86%. From Table 8 and Fig. 6, we can reach the same conclusions as above, in which the best algorithms are MVO and the sinusoidal map. These algorithms have better performance in terms of F-measure and all the other measures (precision, sensitivity, specificity, and NPV).

Table 8 The PSSNF measures of selected features of Ion dataset between CMVO (based on chaotic maps) and the other three algorithms
Fig. 6 The average of each PSSNF measure for the Ion dataset

The results for the Sonar dataset are given in Tables 4 and 9 as well as Figs. 2 and 7. We can conclude from these results that the PSO algorithm has the highest performance over all algorithms in all measures, and that the piecewise and logistic maps are in second place with an accuracy of 85% and close values over the PSSNF measures. Also, MVO is better than the rest of the algorithms (tent, sinusoidal, Singer, and ABC with accuracies of 73, 75, 70, and 80%, respectively).

Table 9 The PSSNF measures of selected features of Sonar dataset between CMVO (based on chaotic maps) and the other three algorithms
Fig. 7 The average of each PSSNF measure for the Sonar dataset

Finally, to determine the best feature selection algorithm from all the previous results, we compute the mean accuracy over all datasets, as shown in Fig. 8. From this figure, we can conclude that the best algorithm is the logistic map (with an accuracy of 92%), followed by the piecewise map. The sinusoidal map, the standard MVO, and the PSO algorithm have close accuracy values of about 89%; however, this value is lower than when all features are used. Also, the piecewise and sinusoidal algorithms have the same accuracy of 87%, followed by the ABC and tent algorithms (with an accuracy of 85%), which are better than the Singer algorithm (83%).

Moreover, for the purpose of illustration, Fig. 9 presents boxplots of the average accuracies of all algorithms over all datasets. It is evident from Fig. 9 that the logistic algorithm is located at the upper side of the figure, which indicates higher accuracy scores than those of the other algorithms.

Fig. 8 The accuracy over all algorithms

Fig. 9 The boxplots of accuracy of all algorithms

4.4 ANOVA analysis

To further analyze the previous results, an analysis of variance (ANOVA) test is used. So far, we have examined the mean accuracies of the algorithms over ten runs, showing that some differences can be found among them. We used the ANOVA test with a post hoc LSD test to statistically compare all the algorithms based on their mean accuracies over all datasets. Here, the null hypothesis is that all algorithms are equivalent in terms of accuracy rate. The ANOVA test gives a statistical value called the p value; if this value is smaller than the significance level (\(\alpha =0.05\)), the algorithms are significantly different and we reject the null hypothesis. The LSD test is applied after the null hypothesis is rejected and is used to determine where the differences among the algorithms lie.
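As an illustration of this procedure, the sketch below runs a one-way ANOVA followed by LSD-style pairwise comparisons; the per-run accuracies are hypothetical placeholders (the real values are those reported in this section), and the LSD test is written as pairwise t-tests using the pooled within-group variance, which is its usual formulation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical per-run accuracies (10 runs each); not the experimental values.
runs = {"logistic": rng.normal(0.92, 0.02, 10),
        "tent":     rng.normal(0.85, 0.02, 10),
        "singer":   rng.normal(0.83, 0.02, 10)}

# One-way ANOVA: reject the null hypothesis of equal means if p < 0.05.
f_stat, p_value = stats.f_oneway(*runs.values())
print(f"ANOVA p value = {p_value:.4f}")

if p_value < 0.05:
    # LSD post hoc test: pairwise t-tests against the logistic group using
    # the pooled within-group variance (MSE) from the ANOVA.
    groups = list(runs.values())
    n = sum(len(g) for g in groups)
    k = len(groups)
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - k)
    base = runs["logistic"]
    for name, g in runs.items():
        if name == "logistic":
            continue
        t = (base.mean() - g.mean()) / np.sqrt(mse * (1 / len(base) + 1 / len(g)))
        p = 2 * stats.t.sf(abs(t), df=n - k)
        print(f"logistic vs {name}: p = {p:.4f}")
```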

The p value in our experiment was 0.028, which is less than \(\alpha\); therefore, we rejected the null hypothesis and used the LSD test in order to determine whether there exists a significant difference between the logistic map and the other algorithms. Table 10 shows the LSD result, and we can observe from this table that these results are consistent with Fig. 8. There is no significant difference between the accuracies of the logistic, sinusoidal, and piecewise maps, nor between them and classification using all features.

However, there is a significant difference between the logistic algorithm and four algorithms, namely tent, Singer, PSO, and ABC. We can therefore conclude that the logistic map yields a much better feature selection algorithm.

Table 10 Comparisons using LSD test between CMVO-Logistic and all other algorithms

5 Conclusions and future works

In this paper, a novel optimization algorithm based on chaos and the multi-verse optimizer (CMVO) is proposed, using five chaotic maps, for feature selection. The characteristics of chaotic systems, such as regularity and semi-stochastic behavior, are used to improve the performance of the MVO algorithm. The CMVO algorithm based on the five chaotic maps is tested on five benchmark datasets collected from the UCI repository, and its performance is compared with the standard MVO and two other swarm algorithms, namely PSO and ABC. The experimental results showed that tuning MVO with chaotic maps increases the classification rate while minimizing the number of selected features. From the results, we can conclude that the CMVO based on the logistic map is a better feature selection algorithm compared to all the other algorithms; also, in general, chaotic maps increase the performance of MVO; for example, CMVO based on the sinusoidal and piecewise maps is better than the ABC and tent algorithms. In the worst case, CMVO based on the Singer map (83%) performs nearly equal to the ABC algorithm. In future work, we will improve the MVO algorithm by applying more chaotic maps to different applications.