1 Introduction

Optimization appears in many fields, such as engineering and computer science; it consists in determining the best solution to a complex problem by exploring a search space (Abualigah 2019; Abualigah and Hanandeh 2015; Deng et al. 2017a). Accuracy is always expected from optimization methods; however, depending on the application, more (or less) accurate solutions may be required (Abualigah et al. 2018a, b; Zhao et al. 2019; Deng et al. 2019; Abualigah and Khader 2017; Deng et al. 2017b; Abualigah et al. 2018c). Optimization can be divided into multi-objective optimization (MO) and global optimization (GO). In MO, a set of solutions must be found for a group of objective functions; in GO, a single objective function is used and a single solution is found. This paper focuses on GO, for which a considerable number of methods inspired by different natural processes have been proposed, for example, genetic algorithms (GAs) (Goldberg 1989), which belong to the evolutionary algorithms (EAs), or particle swarm optimization (PSO) (Kennedy and Eberhart 1995), a representative of the swarm algorithms (SAs). EAs and SAs aim to generate intelligent approaches based on behaviors observed in nature; in general, a set of single agents is used, and these agents communicate with each other by exchanging information about the region of the search space where they are located. This process can be divided into exploration and exploitation, and a good balance between them is reflected in the efficiency of the optimization method. Most EAs and SAs are good alternatives for GO; however, only a small group of algorithms exploits human intelligence. Some examples are the human behavior-based optimization (HBBO) (Ahmadi 2017), which mimics the interaction of societies, and the brainstorm optimization (BSO) (Shi 2011), which is based on the process of brainstorming to generate new ideas. Both HBBO and BSO are swarm algorithms.

On the other hand, human beings are considered the most intelligent species on the planet; as humans, we can work in groups to solve any problem. To provide solutions, we can learn, experiment, explore, think and collaborate with other humans, and we also have creativity and the capability to innovate. In this context, brainstorming is a strategy for generating innovative solutions to a specific problem, in which a group of human experts from different areas interacts to find new ideas; a moderator helps to select the best ideas according to criteria that identify a good solution. The selected ideas are then taken as a basis to generate new ones. The BSO is a relatively new swarm optimization algorithm that possesses different operators for imitating the brainstorming process (Shi 2011). The most important steps are the convergence and divergence operators; they are used to find the best solution, and they also perform the exploitation and exploration of the search space. In the convergence step, the elements of the population are concentrated in a specific section of the search space, while in the divergence step, the population is distributed to explore different regions. However, the main advantage of BSO is the use of clustering in the iterative process. In this way, BSO combines data mining techniques and swarm intelligence to create an optimization algorithm. The clustering step is used to separate the population into groups that are then used to exploit prominent regions in the domain of the problem. The BSO has been used in economic dispatch (Jadhav et al. 2012) and in optimal power flow solution (Krishnanand et al. 2013); another interesting application, to neural networks, has been introduced in Cao et al. (2015).

BSO, like other swarm or evolutionary approaches, does not provide the best solution for every optimization problem, and its computational effort affects its performance; in this context, BSO is also open to being adapted and modified (Zhan et al. 2013). Some modifications concern the clustering step; for example, in Chen et al. (2015), the authors proposed affinity propagation clustering. Meanwhile, in Shi (2015), elitist rules are proposed to separate the BSO population into two groups depending on the fitness; this modification avoids the use of more complex clustering methods. Another typical issue in BSO is the initialization of new solutions along the iterative process. To address it, new alternatives have been proposed to improve BSO, for example, the dynamic adjustment of the internal parameters (Zhou et al. 2012). The initialization of new solutions using a batch mode has also been presented by Chen et al. (2014). Such approaches increase the performance of BSO, providing better results than the standard algorithm. However, according to the No-Free-Lunch (NFL) theorem, no optimization method is able to provide excellent results for every problem (Wolpert and Macready 1997). Moreover, the NFL theorem also implies that an optimization method cannot be enhanced without sacrificing some advantage.

Considering the above, the drawbacks of BSO can be summarized as a lack of exploration that depends directly on the internal configuration of the algorithm. Configuring the control parameters of BSO is not an easy task, and it also depends on the problem to be solved. Moreover, the exploitation is affected by the way in which the clusters are created. Such problems are the main motivation of this paper to propose an improved version of BSO. In this sense, three methods are used at different stages of the algorithm: (1) chaotic maps, (2) opposition-based learning and (3) the disruption operator. Chaotic maps (CM) are part of chaos theory, which studies erratic behavior in nonlinear systems. A CM can be seen as a particle traveling through a nonlinear dynamic system; such particles do not follow any regular path, and for that reason the chaotic distribution has a high level of randomness. In practice, optimization takes advantage of the features of CM, since different implementations provide evidence that the diversity of solutions and the convergence are improved by using them (Yang et al. 2007). On the other hand, opposition-based learning (OBL) is a machine learning rule proposed by Tizhoosh (2005) for improving the search capabilities of optimization algorithms. OBL takes the population of solutions and computes an opposite population; then, using the objective function values, the best elements of the two populations are selected. Such elements are used to create a new set of solutions that possess the best positions in the search space. OBL can be used in different sections of an algorithm; depending on the implementation, it can be applied at initialization or after a modification of the set of solutions. OBL has demonstrated its ability to improve several SAs (Cuevas et al. 2012; Abd ElAziz et al. 2017). In the same context, the disruption operator (DO) has been proposed to enhance the exploitation and exploration abilities of SAs and EAs. The DO comes from astrophysics and was introduced by Harwit (2006), where disruption in astronomy is the sudden inward fall of a group of particles under gravity. This phenomenon occurs when all other forces cannot provide enough pressure to counter gravity and maintain equilibrium (Harwit 2006). The use of the DO in optimization was first proposed to enhance the performance of the gravitational search algorithm (GSA) (Sarafrazi et al. 2011). It has since attracted the attention of researchers, and its use has been extended to different approaches such as the bare-bones particle swarm optimization (Liu et al. 2014).

The algorithm proposed in this paper is called the opposition chaotic BSO with disruption (OCBSOD). In this method, the initialization is performed using a chaotic map to compute the initial solutions; then, OBL generates the opposite positions in the search space. The best particles are selected and used in the iterative process. To update the positions of the elements of the population, the DO is used together with the BSO operators; then, OBL is applied to improve the exploration of the search domain. The aim of this paper can be summarized as the use of different strategies to enhance the performance of BSO. The OCBSOD has been tested over a set of optimization problems of different complexity. The experimental results and the comparison with other recent methods provide evidence of the effectiveness of the proposed approach in optimizing mathematical functions. Moreover, to verify the performance of the proposed hybrid algorithm on real problems, it has been used to solve the feature selection (FS) problem. FS is a method used to extract the most representative features from a large set of data. FS is a critical step; it helps to discard redundant and irrelevant features and to reduce the dimensionality of the dataset (Ewees et al. 2019; Labani et al. 2018; Tian et al. 2018a, b). The proposed OCBSOD for FS is applied to different datasets of varying complexity taken from the UCI repository (Frank and Asuncion 2010). The results obtained in FS with the OCBSOD are superior to those of similar approaches from the state of the art. The main contributions of this study can be summarized as:

  • Improve the performance of the brainstorm optimization algorithm by providing it with a suitable initial population and maintaining its diversity.

  • Apply chaotic maps and opposition-based learning to find the most suitable initial population for BSO.

  • Integrate the BSO with the disruption operator to enhance its diversity during the optimization process.

  • Evaluate the performance of the proposed OCBSOD in solving a set of global optimization problems.

  • Evaluate the ability of the proposed OCBSOD to enhance the classification of different UCI datasets by using it as a feature selection method.

The remainder of this paper is organized as follows: Sect. 2 introduces the concepts of chaotic maps, opposition-based learning, the disruption operator and brainstorm optimization. Section 3 presents the proposed OCBSOD, while Sect. 4 provides the experimental results and comparisons. Finally, Sect. 5 discusses some conclusions.

2 Background

2.1 Chaotic maps: basic concepts

Recently, the improvement of EAs and SAs by using chaos theory has attracted more attention, for example, in the multi-verse optimizer (Ewees et al. 2019) and brainstorm optimization (Yang and Shi 2015). According to the results of these approaches, an enhancement in the convergence rate and the diversity of these EAs and SAs can be observed. This results from the properties of chaos, such as ergodicity, which leads to traversing all local points and searching for solutions in the whole search domain (Yang and Shi 2015). A nonlinear system is considered chaotic if it has an infinite number of different periodic responses and exhibits sensitive dependence on the initial values. This sensitivity to initial values means that small differences in the initial values of a chaotic system produce large differences in its output. The third important property is stochasticity/randomness, which allows a chaotic system to avoid local optimal points. Based on these properties, optimization algorithms that employ chaotic maps potentially increase their search capabilities. In other words, using chaotic maps, an optimal solution can be found faster than using a standard probability distribution.

Chaotic maps are used to model chaos. A chaotic map (CM) is defined as a dynamical discrete-time function with continuous values that determines the relationship between the current value (\(ch_i\)) of the chaotic system and its following value \(ch_{i+1}\). The mathematical definition of these maps can be formulated as in the following equation:

$$\begin{aligned} ch_{i+1} =\beta (ch_i), \end{aligned}$$
(1)

where \(\beta \) represents the transformation mapping function. The Singer map, introduced by Aguirregabiria (2009), is considered one of the most popular CMs, and it can be defined by the following equation:

$$\begin{aligned} ch_{i+1} = \mu \left( 7.86\, ch_i - 23.31\, ch_i^2 + 28.75\, ch_i^3 - 13.301875\, ch_i^4\right) , \end{aligned}$$
(2)

where \(ch_{i}\in (0,1)\) is the previous chaotic number and \(\mu \in [0.9, 1.08]\) is a random number.
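As a minimal illustrative sketch (not the authors' implementation), the Singer iteration of Eq. (2) can be generated as follows; the initial value \(ch_0=0.7\) and \(\mu =1.07\) are assumed example values within the stated ranges:

```python
import numpy as np

def singer_map(n, ch=0.7, mu=1.07):
    """Generate n chaotic numbers in (0, 1) with the Singer map of Eq. (2).

    ch (initial value) and mu are illustrative choices within the ranges
    given in the text; the sequence is deterministic but erratic.
    """
    seq = np.empty(n)
    for i in range(n):
        ch = mu * (7.86 * ch - 23.31 * ch**2
                   + 28.75 * ch**3 - 13.301875 * ch**4)
        seq[i] = ch
    return seq

print(singer_map(5))  # five chaotic values, later reused as ch_i in Eq. (9)
```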

2.2 Opposition-based learning: basic concept

In Tizhoosh (2005), the basic concept of opposition-based learning (OBL) was introduced to enhance the performance of algorithms by considering a solution and its opposite at the same time. Classical EAs (and SAs) start by generating a random population containing candidate solutions for the tested problem, and they proceed by updating it toward the optimal solution. Therefore, if this initial population is not properly selected, the computational time may increase and the convergence rate may decrease; this can occur in the absence of prior information. To avoid this problem, a strategy can be adopted that also searches in the opposite direction of each new solution. The basic concept of the OBL strategy can be defined by assuming a real number x lies in the interval [l, u]; then, its opposite number (\({\overline{x}}\)) is computed as:

$$\begin{aligned} {\overline{x}}=u+l-x \end{aligned}$$
(3)

In higher dimensions, the generalization of \({\overline{x}}\) is formulated as:

$$\begin{aligned} \overline{\mathbf{x }}=\mathbf{u }+\mathbf{l }-\mathbf{x }\quad \text {or}\quad \overline{x_j}={u_j}+{l_j}-{x_j},\quad j=1,2,\ldots ,D, \end{aligned}$$
(4)

In Eq. (4), \(\mathbf{x }\) and \(\overline{\mathbf{x }}\) represent the solution vector and its opposite solution in D dimensions, respectively.

2.2.1 Opposition-based optimization

Opposition-based optimization is introduced by considering \(\mathbf{x }\in R^D\) as a solution of the given problem with fitness value \(f(\mathbf{x })\). Based on the definition of the opposite value, \(\overline{\mathbf{x }}\) is the opposite of \({\mathbf{x }}\) with fitness value \(f(\overline{\mathbf{x }})\). Then, \(\overline{\mathbf{x }}\) is selected when \(f(\overline{\mathbf{x }})\) is better than \(f(\mathbf{x })\); otherwise, \(\mathbf{x }\) is kept. In this way, the best solutions among the current solutions and their opposites are kept in the population (Cuevas et al. 2012).
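A minimal Python sketch of this pairwise selection, assuming minimization and scalar bounds (all names and the sphere test function are illustrative):

```python
import numpy as np

def opposition_select(X, l, u, fitness):
    """Keep the better of each solution and its opposite (Eqs. (3)-(4)).

    X is an (N, D) population, l and u the bounds, and fitness a callable
    to be minimized; all names here are illustrative.
    """
    X_opp = u + l - X                               # opposite population
    f = np.apply_along_axis(fitness, 1, X)
    f_opp = np.apply_along_axis(fitness, 1, X_opp)
    return np.where((f_opp < f)[:, None], X_opp, X)

# Usage: five solutions in [-10, 10]^3 evaluated on the sphere function
rng = np.random.default_rng(0)
X = rng.uniform(-10.0, 10.0, size=(5, 3))
X = opposition_select(X, -10.0, 10.0, lambda x: float(np.sum(x**2)))
```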

2.3 Disruption operator

The basic definition of the disruption operator \(D_{op}\) is given by Sarafrazi et al. (2011) as:

$$\begin{aligned} D_{op}=\left\{ \begin{array}{ll} Dis_{i,j}\times \Psi \left( -\frac{1}{2},\frac{1}{2}\right) &{}\quad \text {if}\,\, Dis_{i,\mathrm{best}} \ge 1 \\ 1+Dis_{i,\mathrm{best}}\times \Psi \left( -\frac{10^{-16}}{2},\frac{10^{-16}}{2}\right) &{}\quad \text {otherwise} \end{array} \right. \end{aligned}$$
(5)

where \(\Psi (a,b)\) represents the uniform random function used to generate a number in the interval [a, b]. \(Dis_{i,j}\) is the Euclidean distance between the ith and jth ideas (where j is the nearest neighbor of the ith idea), and \(Dis_{i,\mathrm{best}}\) is the distance between the ith idea and the best idea. From the definition of \(D_{op}\), \(D_{op}<1\) if the two ideas are close to each other, which indicates convergence of the ideas.
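The following sketch implements Eq. (5) for a whole population; the nearest neighbor j and the index of the best idea are computed explicitly, and all names are illustrative:

```python
import numpy as np

def disruption_factor(X, best_idx, rng=np.random.default_rng()):
    """Compute D_op of Eq. (5) for every idea in the population X.

    For idea i, j is its nearest neighbor and Dis_{i,best} its Euclidean
    distance to the best idea (row best_idx).
    """
    N = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise
    np.fill_diagonal(dists, np.inf)          # exclude self when picking j
    D_op = np.empty(N)
    for i in range(N):
        dis_ij = dists[i].min()              # distance to nearest neighbor j
        dis_best = np.linalg.norm(X[i] - X[best_idx])
        if dis_best >= 1:
            D_op[i] = dis_ij * rng.uniform(-0.5, 0.5)
        else:
            D_op[i] = 1 + dis_best * rng.uniform(-0.5e-16, 0.5e-16)
    return D_op
```

Note that for the best idea itself \(Dis_{i,\mathrm{best}}=0\), so the second branch applies and \(D_{op}\approx 1\), leaving it essentially unchanged.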

2.4 Brainstorm optimization

Brainstorm optimization (BSO) is a relatively new algorithm inspired by the process of creating new ideas. Brainstorming is commonly used in companies to increase the creativity of work groups. The ideas are formulated by the members of the groups and selected by a moderator according to predefined criteria. In the BSO context, an idea is a candidate solution drawn from the search space. The work groups are simulated using clustering, and it is possible to interchange ideas between the "groups." In general terms, the BSO begins by randomly building a set of candidate solutions. These solutions represent positions in the bounded search domain where the objective function is defined. Once the solutions are initialized and evaluated, m clusters are generated. The entire procedure is summarized in Algorithm 1, and the main steps are explained in this section.

2.4.1 Clustering

In this process, the elements of the initial population are separated into groups according to the similarities they possess. Throughout the iterative process, the clusters are maintained, and new solutions (ideas) are assigned to each group. In this context, the idea with the best objective function value replaces the center of its cluster. Different methods can be used in this phase; however, the k-means algorithm is used in the traditional BSO, as sketched below.
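A minimal sketch of this clustering phase, using scikit-learn's k-means as one possible implementation (the paper does not prescribe a particular library):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_ideas(X, fits, m=5):
    """Split the population X into m clusters and take each cluster's
    best idea (lowest fitness) as its center, as in the BSO clustering step."""
    labels = KMeans(n_clusters=m, n_init=10).fit_predict(X)
    centers = {c: int(np.where(labels == c)[0][np.argmin(fits[labels == c])])
               for c in range(m)}
    return labels, centers   # centers: cluster id -> index of its best idea
```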

Fig. 1 Flowchart of the proposed method

2.4.2 Generating new positions

This step of the BSO is applied to maintain the diversity of the ideas. A new idea is created using one or more elements of the population (or clusters). In the standard BSO, a probability \(p_{ot}\) is used to determine whether the new solution will be produced from one idea or from two ideas stored in the population. This step is crucial in the optimization process of BSO: when a new solution is generated from one cluster, the exploitation of prominent regions is enhanced, whereas when it is created using two or more clusters, the exploration is increased, because the new idea can be far from the elements used for its computation. In Eq. (6), the selection between using one or two clusters is performed according to the probability \(p_{ot}\).

$$\begin{aligned} x_{s}=\left\{ \begin{array}{ll} x_i &{}\quad \text {if}\,\, \rho _2<p_{ot}\\ r_1\times x_{i,1}+(1-r_2)\times x_{i,2}&{}\quad \hbox {for two clusters} \end{array}\right. \end{aligned}$$
(6)

In Eq. (6), \(x_{\mathrm{s}}\) is the selected idea from the population, \(x_i\) is the ith element of the population, and \(\rho _2, r_1\) and \(r_2\) are uniformly distributed random numbers. Once \(x_{\mathrm{s}}\) is chosen, the new position must be computed, and this task is performed using Eq. (7).

$$\begin{aligned} x_{\mathrm{New}}=x_{\mathrm{s}}+\zeta \times r_3, \end{aligned}$$
(7)

where \(x_{\mathrm{New}}\) is the newly computed element and \(r_3\in [0,1]\) represents a uniformly distributed random value. Meanwhile, \(\zeta \) is a variable that controls the speed of convergence; it is updated at each iteration, and its value is computed as follows:

$$\begin{aligned} \zeta =r_4\times \mathrm{logsig} \left( \frac{0.5 \times t_{\mathrm{max}}-t}{k}\right) , \end{aligned}$$
(8)

where \(t_{\mathrm{max}}\) represents the maximum number of iterations, t is the current iteration and logsig is a logarithmic sigmoid transfer function. This function improves the global search ability at the beginning of the run and the local search ability as the process approaches the end. k is a predefined parameter used to change the slope of the logsig function, and \(r_4\in [0,1]\) is a random number. The \(x_{\mathrm{New}}\) is then evaluated with the objective function.
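A sketch of Eqs. (6)–(8) might look as follows, assuming logsig is the logistic function \(1/(1+e^{-x})\); the values \(k=20\) and \(p_{ot}=0.5\) are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng()

def logsig(x):
    """Logarithmic sigmoid transfer function used in Eq. (8)."""
    return 1.0 / (1.0 + np.exp(-x))

def new_idea(x_i, x_i1, x_i2, t, t_max, k=20.0, p_ot=0.5):
    """Generate x_New from one idea or two ideas, following Eqs. (6)-(8).

    x_i is an idea from one cluster; x_i1 and x_i2 come from two clusters.
    k = 20 and p_ot = 0.5 are illustrative parameter choices.
    """
    if rng.random() < p_ot:                   # one-cluster branch of Eq. (6)
        x_s = x_i
    else:                                     # two-cluster combination
        r1, r2 = rng.random(), rng.random()
        x_s = r1 * x_i1 + (1 - r2) * x_i2
    zeta = rng.random() * logsig((0.5 * t_max - t) / k)   # Eq. (8)
    return x_s + zeta * rng.random(x_s.shape)             # Eq. (7), r3 in [0,1]
```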

Algorithm 1

3 The proposed OCBSOD method

The proposed method is called OCBSOD since it combines chaotic maps with OBL to improve the initial solutions of the BSO algorithm (Shi 2011); additionally, the disruption operator is applied to improve the updated solutions. The steps of the OCBSOD approach are described in detail in the following subsections (see Fig. 1). This section explains these steps as applied to the feature selection problem; it is important to mention that some steps (i.e., converting the solutions to binary and using the classifier) are omitted when solving mathematical optimization problems.

3.1 Initial stage

The initial solutions are very important for meta-heuristic algorithms because they affect the convergence rate and the quality of the final solutions. Based on the properties of chaotic maps (randomness and sensitivity to the initial conditions), they can be used to generate the initial population. Moreover, replacing solutions of the random initial population with their opposites can improve the initial solutions and accelerate the convergence rate. Therefore, the initial stage of the proposed method combines the chaotic system and the OBL strategy. The initial stage starts by applying the chaotic Singer map to produce a population X of size N as:

$$\begin{aligned} x_i=ch_i\times (u-l)+l,\quad x_i\in X, \end{aligned}$$
(9)

where \(ch_i\), u and l represent the chaotic map value generated by Eq. (2), the upper bound and the lower bound as defined in Eq. (4), respectively. The OBL is then applied to enhance the initial population X by computing the opposite of each idea as defined in Eq. (4). The next step, used only for the FS problem, is to convert each solution to binary as defined in Eq. (10):

$$\begin{aligned} x_{i}(t)=\left\{ \begin{array}{ll} 1 &{}\quad \text {if}\,\, \frac{1}{1+e^{-x_i(t)}}>\epsilon \\ 0 &{}\quad \text {otherwise} \end{array} \right. \end{aligned}$$
(10)

where \(\epsilon \in [0,1]\) is a random threshold and t is the current iteration. Entries of \(x_i\) equal to 1 represent relevant (selected) features, and entries equal to 0 represent irrelevant features. Then, each \(x_i(t)\) at the current iteration t is evaluated by computing the objective function defined in El Aziz and Hassanien (2018):

$$\begin{aligned} \mathrm{fit}\left( x_i(t)\right) =\xi {E}_{x_i(t)}+(1-\xi ) \left( 1-\frac{|x_i(t)|}{|C|}\right) , \end{aligned}$$
(11)

where \(E_{x_i(t)}\) represents the classification error of the K-NN classifier, and the second part of the equation represents the ratio of selected features: |C| is the total number of features and \(|x_i|\) is the number of selected features. The factor \(\xi \in [0,1]\) balances the two parts of Eq. (11). The solutions in the opposite population \({\bar{X}}\) are also assessed using the same objective function (Eq. (11)).

From X and \({\bar{X}}\), the best N solutions are chosen to form the current population. Equations (10)–(11) are used only for the FS problem and are removed when the proposed method is applied to global optimization. A sketch of this stage is given below.
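The following sketch combines Eqs. (2), (9), (4) and (10)–(11) for the initial stage; the fitness callable, the error-rate argument and \(\xi =0.99\) are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng()

def chaotic_numbers(n, ch=0.7, mu=1.07):
    """Singer map of Eq. (2); ch and mu are illustrative choices."""
    out = np.empty(n)
    for i in range(n):
        ch = mu * (7.86 * ch - 23.31 * ch**2
                   + 28.75 * ch**3 - 13.301875 * ch**4)
        out[i] = ch
    return out

def initial_population(N, D, l, u, fitness):
    """Chaotic initialization (Eq. (9)), opposition (Eq. (4)), then
    selection of the best N out of the 2N candidates."""
    X = chaotic_numbers(N * D).reshape(N, D) * (u - l) + l   # Eq. (9)
    X_all = np.vstack([X, u + l - X])                        # X and X_bar
    f = np.apply_along_axis(fitness, 1, X_all)
    return X_all[np.argsort(f)[:N]]                          # best N kept

def binarize(x):
    """Eq. (10): sigmoid transfer against a random threshold epsilon."""
    return (1.0 / (1.0 + np.exp(-x)) > rng.random(x.shape)).astype(int)

def fs_fitness(x_bin, error_rate, xi=0.99):
    """Eq. (11): weighted sum of the K-NN classification error and the
    feature term; xi = 0.99 is an illustrative weight."""
    return xi * error_rate + (1 - xi) * (1 - x_bin.sum() / x_bin.size)
```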

3.2 Updating stage

In the updating stage, the ideas are updated using the traditional BSO algorithm, as discussed in Sect. 2.4, followed by the disruption operator (Sarafrazi et al. 2011).

3.2.1 Updating using traditional BSO

At this step, the convergence operation is performed by clustering the solutions into m clusters using k-means; then, the divergence operation is performed by disrupting a cluster center and by creating new solutions. In the cluster-center disruption, a random cluster center is replaced with a newly generated random idea according to the probability \(p_r\). In the operation used to create new solutions, the BSO selects one cluster (if \(p_{ot}>\rho _2\)) or two clusters (if \(p_{ot}<\rho _2\)) to generate a new idea. When one cluster is selected, the cluster center is taken as \(x_{\mathrm{s}}\) if \(p_{oc}>\rho _3\); otherwise, a random idea from that cluster is selected as \(x_s\). In the contrary case, the BSO selects two clusters randomly; then, based on the probability \(p_{tc}>\rho _4\), the two cluster centers are combined into \(x_{\mathrm{s}}\), or otherwise two random ideas from those clusters are combined. The next step creates a new idea \(x_{\mathrm{new}}\) from \(x_{\mathrm{s}}\) using Eq. (7); its fitness value is compared with that of \(x_{\mathrm{s}}\), and the better of the two is preserved.

Table 1 Parameters of algorithm and their values
Table 2 Tested functions

3.2.2 Updating using disruption operator

After the solutions are updated using the traditional BSO steps, the DO is employed to improve the diversity (and, as a result, the convergence), where the disruption operator for any solution is defined as:

$$\begin{aligned} X=X\times D_{op}, \end{aligned}$$
(12)

where \(D_{op}\) is defined as in Eq. (5). Finally, the steps of the previous stages are executed again until the stopping conditions are met.
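A compact sketch of Eq. (12), reusing the disruption_factor function from the Sect. 2.3 sketch; clipping the scaled ideas back into the bounds [l, u] is an added assumption, not stated in the text:

```python
import numpy as np

def disruption_update(X, f, l, u):
    """Eq. (12): scale every idea in X by its disruption factor D_op.

    disruption_factor is the Eq. (5) sketch from Sect. 2.3, and f holds
    the fitness of each row of X (minimization assumed)."""
    D_op = disruption_factor(X, best_idx=int(np.argmin(f)))
    return np.clip(X * D_op[:, None], l, u)   # clipping is an assumption
```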

3.3 Complexity

The computational complexity of the OCBSOD (O(OCBSOD)) depends on the following items: (1) the size of the population (N), (2) the total number of iterations (\(t_{\mathrm{max}}\)), (3) the dimension of the given problem (D) and (4) the sorting algorithm (here, Quicksort (QS) is used). Since the complexity of QS (\(O_{QS}\)) is \(O(N\log N)\) in the best case and \(O(N^2)\) in the worst case, O(OCBSOD) is:

$$\begin{aligned} O(OCBSOD)=O((4\times N\times D+O_{QS})\times t_{\mathrm{max}}) \end{aligned}$$
(13)

Therefore,

$$\begin{aligned} O(OCBSOD)=\left\{ \begin{array}{ll} O\left( \left( N+\frac{N}{k}\right) +(2N\times D+N^2)\times t_{\mathrm{max}}\right) &{}\quad \text {in the worst case of Quicksort} \\ O\left( \left( N+\frac{N}{k}\right) +(2N\times D+N\log N)\times t_{\mathrm{max}}\right) &{}\quad \text {in the best case of Quicksort} \end{array} \right. \end{aligned}$$
(14)

where \(k\in (0,1]\) is the fraction of X used to compute its opposite \({\bar{X}}\).

4 Experiments and discussion

To assess the quality of the OCBSOD method, three experiment series are executed: (1) using it as a global optimization method to find the optimal values of 23 benchmark functions, (2) verifying the influence of the chaotic Singer map and (3) applying the proposed algorithm as an FS method. The experiments are implemented on "Windows 7 (64 bit)" running on a "Core 2 Duo CPU with 4 GB RAM", using "Matlab 2014b". All the methods are executed for 30 independent runs on each tested problem, and for the comparisons the maximum number of function evaluations is set to \(10^5\).

4.1 Experiment series 1: benchmark functions

In this experiment, a set of benchmark functions is used to evaluate the performance of the OCBSOD approach as an alternative global optimization method; it is compared with other algorithms whose performance has been established in the related literature, namely moth-flame optimization (MFO) (Mirjalili 2015), BSO, artificial bee colony (ABC) (Karaboga 2005), harmony search (HS) (Lee and Geem 2005), social-spider optimization (SSO) (El Aziz and Hassanien 2018), salp swarm algorithm (SSA) (Mirjalili et al. 2017) and multi-verse optimizer (MVO) (Mirjalili et al. 2016). For a fair comparison between the algorithms, the common parameters are fixed: the total number of solutions is \(N=30\) with dimension equal to 30, and the maximum number of iterations is 1000. The remaining parameter values for each algorithm are set as in Table 1, using the same values as in the original implementations.

Table 3 Comparison between the algorithm based on the average of fitness function values
Table 4 Standard deviation for all approaches
Table 5 Computational time

4.1.1 Test functions description

The 23 benchmark functions are defined in Table 2; seven of them are unimodal (\(\textit{F1}\) to \(\textit{F7}\)) and the rest are multimodal (\(\textit{F8}\) to F23). Unimodal functions have a single extreme point within the search domain, and they are applied to evaluate the convergence rate of the algorithms. In contrast, multimodal functions have more than one local extreme point.

Table 6 Results of Wilcoxon rank sum test for all algorithms along each function

4.1.2 Measures of performance

To evaluate the performance of the methods, the following measures are used (Suganthan et al. 2005); illustrative sketches of these measures are given after the list:

  1. Mean of fitness (\(\mu \)):

    $$\begin{aligned} \mu = \frac{1}{N_{\mathrm{run}}} \sum _{i=1}^{N_{\mathrm{run}}} \mathrm{fit}_i \end{aligned}$$
    (15)
  2. Standard deviation (STD):

    $$\begin{aligned} \sigma = \sqrt{\frac{1}{N_{\mathrm{run}}-1}\sum _{i=1}^{N_{\mathrm{run}}}{(\mathrm{fit}_i-\mu )^2}} \end{aligned}$$
    (16)

    where \(N_{\mathrm{run}}\) is the total number of runs and \(\mathrm{fit}_i\) is the fitness value obtained in the ith run on the tested function (\(F1, F2,\ldots ,F23\)).

  3. Success rate (SR), where NVTR is the number of times the algorithm successfully reaches the value-to-reach (VTR) for a test function:

    $$\begin{aligned} SR=\frac{NVTR}{\text {total number of trials}} \end{aligned}$$
    (17)
  4. Diversity: the diversity measure \(div_{\mathrm{Alg}}\) of the algorithm for a D-dimensional problem is defined as:

    $$\begin{aligned} div_{\mathrm{Alg}}=\frac{1}{N}\sum _{i=1}^N{\left( \sqrt{\sum _{j=1}^D(x_i^j-{\bar{x}}^j)^2}\right) }, \end{aligned}$$
    (18)

    where D is the dimension of the problem, \(x^j_i\) is the jth component of the ith solution, and \({\bar{x}}^j\) is the jth component of the average solution \({\bar{x}}\).
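Illustrative implementations of Eqs. (15)–(18); the VTR threshold of \(10^{-8}\) is an assumed example value, not one taken from the paper:

```python
import numpy as np

def run_statistics(fits, vtr=1e-8):
    """Eqs. (15)-(17) over the best fitness of each of the N_run runs;
    the VTR threshold 1e-8 is an illustrative choice."""
    mu = fits.mean()                                  # Eq. (15)
    sigma = fits.std(ddof=1)                          # Eq. (16), N_run - 1
    sr = np.count_nonzero(fits <= vtr) / fits.size    # Eq. (17)
    return mu, sigma, sr

def diversity(X):
    """Eq. (18): mean Euclidean distance of the solutions in X to the
    average solution x_bar."""
    return np.linalg.norm(X - X.mean(axis=0), axis=1).mean()
```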

4.1.3 Discussion

The comparison results for all the methods on the set of 23 functions are given in Tables 3, 4, 5 and 6 and Figs. 2, 3, 4 and 5. From Table 3, it can be seen that, in general, the OCBSOD method gives better results than the other methods on most functions. It achieves the first rank on 11 functions: six unimodal (F1–F5 and F7) and five multimodal (F8–F11 and F21). Meanwhile, the ABC is in the second rank with five functions (F6, F12–F14 and F20), followed by the BSO and HS algorithms, which obtain the best values only for F19 and F15, respectively. For functions F16–F18, all the algorithms reach the optimal value, and for the two functions F22 and F23, the proposed OCBSOD, SSO and ABC algorithms give better results than all the other algorithms.

Figures 2, 3, 4 and 5 depict the convergence curves of the compared methods on these functions. From these figures, it can be observed that the convergence curve of the proposed OCBSOD algorithm is better than those of the other algorithms for functions F1–F5 and F7–F11. However, for functions F6, F12, F13 and F14 the ABC is the best algorithm, for function F15 the HS algorithm performs best, and for F16, F17 and F18 the SSA is the best, with only a small difference between it and the other algorithms.

In addition, the standard deviation of each algorithm is presented in Table 4. It can be noticed that (1) the proposed algorithm has the smallest variation among all the algorithms on all functions, except that the SSO algorithm gives the best results on nine functions (F2, F6–F8, F10, F12–F15) and the MFO gives better results on F18, and (2) the traditional BSO, OCBSOD and SSO give the same standard deviation for function F17.

Table 5 shows the computational time and the success rate of each algorithm. In terms of computational time, the proposed OCBSOD algorithm takes the least time to achieve the best value for the functions, except for function F1, where the MFO is faster, and functions F19, F21 and F22, where the MVO obtains the shortest time. In addition, according to the success rate (SR) values in Table 5, the OCBSOD has the highest success rate on most functions, except F5–F7, F12–F14, F17 and F18. For those functions, no approach reached the value-to-reach (VTR), except that for function F12 the BSO reached it seven times.

Moreover, Fig. 6 shows the diversity of the compared algorithms on four selected functions. From this figure, we can observe that the diversity of the proposed OCBSOD algorithm is better than that of the other methods on the selected functions.

From the previous results, it can be concluded that the OCBSOD determines the global solutions of the problems effectively and efficiently, with suitable diversity and a fast convergence rate. This high performance is achieved due to the good initial population, which is built using the chaotic Singer map and then enhanced with OBL. In addition, the disruption operator allows the OCBSOD algorithm to maintain its diversity, which leads to good performance.

4.1.4 Statistical analysis

The Wilcoxon rank sum test (Wilcoxon 1945) is used in this section to provide further statistical analysis; here, all the algorithms are run until \(10^5\) function evaluations are reached. This nonparametric test checks whether there is a significant difference between the medians of the control group and another group. In this study, the null hypothesis is that there is no significant difference between the control group (OCBSOD) and the other groups (algorithms) at the 5% significance level. The results of the Wilcoxon rank sum test are presented in Table 6, where we can observe a significant difference between the OCBSOD and the other methods on most of the 23 functions. For example, the proposed algorithm is better than HS on all functions, while there is no significant difference between OCBSOD and the MVO and SSO on all functions except (F19 and F20) and (F15 and F20), respectively. Also, the results of OCBSOD are not significantly different from those of BSO, SSA, MFO and ABC on functions (F5, F12, F14 and F16–F18), (F16–F20), (F14 and F16–F20) and (F5, F16, F17 and F19), respectively.
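For reference, SciPy's ranksums performs this two-sided test; the fitness samples below are synthetic and serve only to illustrate the decision rule:

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)
fits_ocbsod = rng.normal(0.01, 0.005, 30)   # synthetic final fitness, 30 runs
fits_other = rng.normal(0.05, 0.010, 30)    # synthetic competitor results

stat, p = ranksums(fits_ocbsod, fits_other)
# Reject the null hypothesis of equal medians at the 5% level if p < 0.05
print("reject H0" if p < 0.05 else "fail to reject H0", f"(p = {p:.2e})")
```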

4.2 Experiment series 2: influence of chaotic map

In this section, the influence of removing the chaotic Singer map from the proposed algorithm and generating a random population instead is tested using the same benchmark functions. The comparison results between the proposed OCBSOD algorithm and its version without the chaotic map (called OBSOD) are illustrated in Table 7. We can observe that the OCBSOD achieves better results, according to the average of the fitness function, on nine functions (F4, F5, F8–F11, F14, F15 and F21), while the OBSOD version has the best values on functions F1–F3, F6, F7, F12, F13, F19 and F20. For the other five functions, both versions have nearly the same average value; however, the standard deviation of the OCBSOD is better than that of the OBSOD version on all the tested functions except F3, F4, F6, F13, F20, F22 and F23. Moreover, the time required by the proposed OCBSOD is smaller than that of the OBSOD by nearly 11.7457 s over all the tested functions, and the SR of the OCBSOD algorithm is the best.

Fig. 2 Convergence curves of F1–F6

Fig. 3 Convergence curves of F7–F12

Fig. 4 Convergence curves of the functions F13–F18

Fig. 5 Convergence curves of F19–F23

Fig. 6 Diversity for algorithms at F5, F6, F12, and F14

Table 7 Comparison results between the proposed OCBSOD and its version OBSOD

4.3 Experiment series 3: feature selection

In this experiment, the ability of the OCBSOD method to determine the optimal subset of features is assessed by comparing it with other feature selection methods.

4.3.1 Datasets description

In this experiment, eight UCI datasets are used in the comparison between the OCBSOD algorithm and the other algorithms. The description of the datasets is given in Table 8; these datasets have different properties, and the numbers of features and instances differ from one dataset to another.

For a fair comparison between the algorithms, the tenfold cross-validation (CV) method is used to split each dataset into testing and training sets. The dataset is split into ten groups; one of them is used as the testing set and the remaining nine as the training set, and this assignment is repeated ten times so that a different group is chosen as the testing set each time. The output is computed as the average of the ten accuracies, as sketched below.
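A sketch of this evaluation with scikit-learn, assuming a K-NN classifier with k = 5 neighbors (an illustrative choice; the paper's neighbor count is not restated here):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def cv_accuracy(X_data, y, selected):
    """Tenfold CV accuracy of K-NN on the features flagged by 'selected'
    (a boolean mask, e.g., produced by Eq. (10)); k = 5 is an assumption."""
    clf = KNeighborsClassifier(n_neighbors=5)
    scores = cross_val_score(clf, X_data[:, selected], y, cv=10)
    return scores.mean()   # average of the ten fold accuracies
```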

4.3.2 Performance metric

Several performance metrics are used to assess the performance of the OCBSOD method by evaluating the fitness function values, the size of the selected feature subset and the classification accuracy obtained with the selected features, as listed in Table 9, where \(|x_{\mathrm{best}}^i|\) represents the number of selected features at the ith run and D is the total number of features. TN, TP, FN and FP are the numbers of true negative, true positive, false negative and false positive samples, respectively. Here, the K-nearest neighbor (K-NN) method is used as the classifier.

4.3.3 Discussion

The comparison results between the methods on each dataset are presented in Tables 10, 11 and 12 and Figs. 7, 8, 9 and 11, which represent the performance of each algorithm according to different measures. For example, Table 10 gives the best, average and worst values of the fitness function of each algorithm on each dataset. It can be seen that, in general, the proposed OCBSOD has the best average and worst values; in terms of the best value, it is similar to the SSO algorithm but better than the other algorithms (see also Fig. 7). The performance of OCBSOD is better than that of BSO in all cases except the average for the Soybean dataset. The best values of these measures are shown in bold in the table.

In addition, Table 11 shows the average CPU time (s): the OCBSOD takes the smallest time over all datasets, nearly 33.758 s, whereas the SSO algorithm ranks second with 34.436 s, followed by the HS algorithm; the BSO and ABC need nearly the same time. Moreover, in terms of the selected-features ratio, the OCBSOD has the smallest ratio, nearly 0.390 over all datasets, followed by the SSO and BSO in the second and third ranks, respectively, as in Fig. 8b. Per dataset, the proposed OCBSOD gives the best results on the BreastCW, Soybean and WaveFN datasets, while BSO gives the smallest ratio of selected features on Spambase, Congress and Wine. Meanwhile, the HS algorithm is the best on the Hepatitis and IonoSp datasets, as in Fig. 8a.

Figures 9 and 10 show the average accuracy over 30 runs of each algorithm on each dataset and over all datasets, respectively. From these figures, it can be observed that, in general, the BSO has the lowest accuracy, the SSO and HS give nearly the same results, and the OCBSOD has the best accuracy. Since there are two classifiers and eight datasets, there are 16 cases; the OCBSOD algorithm achieves the first rank in seven cases, the SSO is second with three cases, followed by the ABC in the third rank. The HS and BSO each lead in only one case; for the Soybean dataset, the OCBSOD, SSO, ABC and HS give the same results.

Moreover, Fig. 11 shows the results of the two classifiers averaged over all datasets, from which we can conclude that the RF classifier gives better results than the K-NN classifier. Therefore, the RF is the better classifier to combine with the OCBSOD algorithm in this study.

Table 8 Description of UCI dataset
Table 9 Performance measure
Table 10 Results of the fitness function for each algorithm
Table 11 Average results of CPU time(s)

4.3.4 Statistical analysis

To statistically analyze the results of the algorithms on the FS problem, Wilcoxon's rank sum test is used, as in Table 12. Here, it is important to mention that the stopping criterion is set to \(10^5\) objective function evaluations. From this table, it can be noticed that there is a significant difference between the OCBSOD and the other methods, because the p value is less than 5%, so we reject the null hypothesis.

Table 12 Wilcoxon’s rank sum test results between the feature selection methods

From the previous results, it can be concluded that the OCBSOD algorithm determines the global solutions of the problems effectively and efficiently, with suitable diversity and a fast convergence rate. Moreover, the classification accuracy is increased by applying the OCBSOD to find the optimal subset of features. This high performance is achieved due to the good initial population, which is built using the chaotic Singer map and then enhanced with the OBL strategy, while the disruption operator allows the OCBSOD algorithm to maintain its diversity, which leads to good performance. However, the proposed method has some limitations that need to be addressed: the disruption operator is applied to update all the solutions, whereas it would be better to apply it only to a small part of them, selected by some criterion. In this way, the complexity of the OCBSOD could be reduced.

Fig. 7 Average, best and worst fitness function values

5 Conclusions and future works

This paper introduced a modified version of the BSO algorithm, called OCBSOD, to improve the exploration and exploitation abilities of BSO. The enhancement is performed by generating the initial population using the chaotic Singer map and then exploiting the properties of OBL to increase the quality of the initial population. Thereafter, the solutions of this population are updated using the steps of the BSO algorithm, followed by the disruption operator, which improves the diversity of the solutions and therefore increases the convergence rate. The performance of the OCBSOD algorithm is assessed in a set of three experiments. In the first experiment, the proposed OCBSOD is compared with seven other algorithms, namely BSO, SSO, SSA, MVO, MFO, HS and ABC, in finding the optimal solutions of benchmark functions; according to the results of this experiment, the OCBSOD gives better results than the other algorithms regarding all performance measures. Meanwhile, in experiment series 2, the influence of the chaotic Singer map is evaluated on the same benchmark functions by generating a random population instead; the results show that the chaotic Singer map affects the performance of the proposed algorithm. To further evaluate the performance of OCBSOD, in the third experiment, the OCBSOD is applied as a feature selection method to find the relevant features of eight datasets and improve the classification accuracy, where the OCBSOD is compared with other FS methods, namely BSO, SSO, ABC and HS. Based on the performance measures, the classification accuracy of OCBSOD is better than that of the other methods.

Fig. 8 Average results of selected feature ratio: a for each dataset, b over all datasets

Fig. 9 Classification accuracy of the algorithms on each dataset

Fig. 10 Classification accuracy of the algorithms over all datasets

Summarizing, the main contribution of this article is the combination of different operators to enhance the searching process of BSO. The inclusion of OBL and the DO permits increasing the diversity of the population, while the chaotic maps help in the exploitation phase. Another important contribution is the application of the proposed approach to feature selection; this implementation provides an alternative method for selecting the most relevant features from a dataset. The limitation of this work is that the use of these operators could increase the computational effort for specific problems; hence, it is necessary to test the approach in different domains to verify its performance.

Fig. 11 Average classification accuracy of the two classifiers over all algorithms

The future scope of this study is to use this algorithm in different applications, such as data mining and image processing, by considering it as (1) an image segmentation method, (2) a multi-objective optimization algorithm, (3) a solver for constrained optimization problems, (4) a tool for renewable energy problems and (5) a methodology applicable to other meta-heuristic algorithms to enhance their exploration and exploitation of the search domain.