1 Introduction

To search for the optimal solution of nonlinear, non-differentiable and non-separable complex problems, swarm intelligence algorithms were proposed to simulate the foraging behaviors and biological habits of animals, and they have received increasing attention in the last several decades (Revay and Zelinka 2019; Kaur and Kumar 2020; Mehta and Saxena 2020; Farrag et al. 2019; Kumar 2021, 2019; Dongoran et al. 2018; Zhang et al. 2018; Kanata et al. 2018; Daylamani-Zad et al. 2017). Classical swarm intelligence algorithms and their variants include Particle Swarm Optimization (PSO) (Kennedy and Eberhart 1995), Artificial Bee Colony (ABC) (Karaboga 2005), Ant Colony Optimization (ACO) (Dorigo et al. 1991), Cuckoo Search Algorithm and Hill Climbing (CSAHC) (Shehab et al. 2018), Cuckoo Search Algorithm by using Reinforcement Learning (CSARL) (Shehab et al. 2018), Cuckoo Search combined with Bat Algorithm (CSBA) (Shehab et al. 2019) and Opposition-based learning in Multi-Verse Optimizer (OMVO) (Shehab and Abualigah 2022). Moth Flame Optimization (MFO) is a novel swarm intelligence algorithm proposed by Mirjalili (2015) and inspired by the transverse orientation mechanism of moths in nature. MFO has been successfully applied in various fields of practical engineering, such as breast cancer detection (Ahmed et al. 2019; Sayed and Hassanien 2017), clustering for internet of things (Reddy and Babu 2019; Yang et al. 2017; Bharany et al. 2022), feature selection (Sayed and Hassanien 2017), productivity forecasting (Reddy and Babu 2019), power dispatch problems (Elsakaan et al. 2018; Mei et al. 2018; Trivedi et al. 2018; Anbarasan and Jayabarathi 2017) and image segmentation (Khairuzzaman and Chaudhury 2017; Jia et al. 2019; Abd El Aziz et al. 2017; Said et al. 2017). However, the search performance of MFO is strongly influenced by its control parameters, global exploration ability and local exploitation ability.
To further improve the search performance of MFO, researchers have mainly worked on three aspects: enhancing the global exploration and local exploitation ability, adjusting control parameters, and hybridizing MFO with other algorithms.

In terms of global exploration and local exploitation, Opposition-based Moth Flame Optimization (OMFO) was proposed by Apinantanakon et al. (2017), which introduces the opposite location of moths during the spiral search process. Li et al. (2020) proposed an Improved Moth Flame Optimization (IMFO) to improve the global optimization ability of the MFO algorithm by using a Levy flight mechanism and a dimension-by-dimension evaluation method. Zhao et al. (2018) proposed an Ameliorated Moth Flame Optimization (AMFO), which not only improves the solution precision of classic MFO, but also enhances its convergence speed and stability. Xu et al. (2019a) designed an enhanced moth flame optimizer with mutation strategies to avoid premature convergence, namely Gaussian mutation (GMFO), Cauchy mutation (CMFO) and Levy mutation (LMFO). The Gaussian mutation can improve the exploitation ability of the algorithm, and the Cauchy mutation helps the population search the major area and leave local optima readily. The Levy mutation can help the population escape from local optima because of its heavy-tailed distribution.

An optimized moth flame optimizer based on Gaussian mutation and cultural learning was proposed in Xu et al. (2018). Xu et al. (2019b) proposed an improved MFO algorithm based on chaotic local search and Gaussian mutation. Nadimi-Shahraki et al. (2021b) proposed the migrate-based moth flame optimization (M-MFO) algorithm, which uses a migrate operator to improve the position of unlucky moths. The migrate operator promotes the diversity of the population, so that the population can obtain better search performance. To provide a more efficient tool for optimization purposes, Shan et al. (2021) introduced a double adaptive weight mechanism into the MFO algorithm, termed WEMFO. WEMFO adaptively changes the search strategy in different search periods. To fully utilize the information of the flame population, an enhanced moth flame optimization with multiple flame guidance mechanism (EMFO) was proposed in Wang et al. (2022). In EMFO, the intersection information of multiple flames is used to guide the moth search, which enhances the global diversity of the moth population. Nadimi-Shahraki et al. (2021c) proposed a multi-trial vector-based moth flame optimization algorithm, named MTV-MFO. MTV-MFO uses three different search strategies to enhance the global search ability and to prevent the premature convergence of the original MFO during the optimization process. Ma et al. (2021) proposed an improved moth flame optimization algorithm for alleviating the premature convergence problem, in which an inertia weight with diversity feedback control is utilized to balance the global exploration ability and the local exploitation ability.

In terms of hybrid algorithms, Shehab et al. (2021) proposed a hybrid moth flame optimization algorithm with new selection schemes, in which hill climbing (HC) is hybridized with moth flame optimization (HC-MFO) to enhance the global exploration ability. Nadimi-Shahraki et al. (2022) proposed an effective hybridization of the whale optimization algorithm (WOA) and a modified moth flame optimization algorithm, named WMFO, to solve the optimal power flow problem. In WMFO, WOA and the modified MFO cooperate to effectively discover the promising areas and provide high-quality solutions. Sayed and Hassanien (2018) proposed a hybrid of MFO and Simulated Annealing (SA), and MFO combined with PSO was proposed in (Bhesdadiya et al. 2017; Anfal and Abdelhafid 2017; Jangir 2017). The Gravitational Search Algorithm was hybridized with MFO by Sarma et al. (2017). Intelligent facial emotion recognition using a hybrid of MFO and the Firefly Algorithm (FA) was proposed by Zhang et al. (2016).

Regarding control parameters, Emary et al. (2016) put forward a chaos-based automatic control method on exploration and exploitation rates against the manual parameter control of MFO. Wang et al. (2017) proposed a chaotic strategy to simultaneously perform parameter optimization and feature selection in MFO. To improve the exploitation of moth population, Guvenc et al. (2017) proposed the chaotic moth swarm algorithm, in which ten chaotic maps were incorporated into MFO algorithm for finding the best numbers of moths.

Although those variants of MFO have obtained better performance than classical MFO, they exhibit poor solution accuracy in solving multi-modal optimization problems. To alleviate this problem, an adaptive MFO algorithm with historical flame archive is presented in this paper, inspired by the individual-optimum guidance method in PSO.

The main contributions of this paper can be summarized as follows.

(1) An adaptive historical flame archive strategy is proposed to enhance the solution precision on multimodal problems by storing the better historical flame information.

(2) To accelerate the convergence speed and make full use of flame information, a top flame randomly matching mechanism is constructed by randomly selecting one of the top flames to guide the moth search.

(3) To systematically verify the superiority of the proposed MFO–HFA algorithm, 25 complex benchmark functions are utilized to estimate the overall performance of MFO–HFA. In addition, MFO–HFA is used to generate the rules of an intrusion detection system (IDS) on the NSL-KDD dataset.

The rest of this paper is organized as follows. Section 2 reviews the original MFO algorithm. The adaptive MFO algorithm with historical flame archive is described in detail in Sect. 3. Section 4 presents the simulation results on the 25 benchmark functions. MFO–HFA is used to generate the rules of an IDS on the NSL-KDD dataset in Sect. 5, followed by conclusions in Sect. 6.

2 Moth flame optimization

MFO is a novel population-based intelligence algorithm, inspired by the way moths navigate by moonlight in nature. The transverse orientation of moths is shown in Fig. 1. Flying at a fixed angle to the moon is an effective way for a moth to travel in a straight line over long distances at night. When a moth flies around an artificial light, however, it becomes trapped in a spiral path and gradually approaches the light. Figure 2 shows that moths gradually approach the flame during the spiral flight process, which is mapped to a spiral search that has shown promising performance in solving practical engineering optimization problems. The optimization function and the composition of MFO are described in detail as follows.

Fig. 1 Transverse orientation of moths

Fig. 2 Spiral flight of moths

2.1 Problem formulation

The problem optimized by MFO algorithm is formulated as follows.

$$ f\left( {X^{*} } \right) = \min f(X) $$
(1)

where \(X = \{ x_{1} ,x_{2} , \ldots ,x_{D} \}\) is a candidate solution of the objective problem, and D denotes the number of dimensions. \(f(X)\) is the fitness value of \(X\), which is to be minimized, and \(X^{*}\) is the global optimal solution.

2.2 Generating the initial population of moths

The moth population of MFO algorithm can be described as follows.

$$ M = \left[ {\begin{array}{*{20}c} {m_{11} } & {m_{12} } & \cdots & {m_{1D} } \\ {m_{21} } & {m_{22} } & \cdots & {m_{2D} } \\ \vdots & \vdots & \ddots & \vdots \\ {m_{N1} } & {m_{N2} } & \cdots & {m_{ND} } \\ \end{array} } \right] $$
(2)

where N is the number of moths.

The fitness values of all moths are listed in a matrix as follows.

$$ OM = \left[ {\begin{array}{*{20}c} {OM_{1} } \\ {OM_{2} } \\ {\begin{array}{*{20}c} \vdots \\ {OM_{N} } \\ \end{array} } \\ \end{array} } \right] $$
(3)

Another essential component in the MFO algorithm is flame. An array similar to the moth matrix is given as follows.

$$ F = \left[ {\begin{array}{*{20}c} {F_{11} } & {F_{12} } & \cdots & {F_{1D} } \\ {F_{21} } & {F_{22} } & \cdots & {F_{2D} } \\ \vdots & \vdots & \ddots & \vdots \\ {F_{N1} } & {F_{N2} } & \cdots & {F_{ND} } \\ \end{array} } \right] $$
(4)

The fitness values of the flames are likewise stored in a matrix, represented as follows.

$$ OF = \left[ {\begin{array}{*{20}c} {OF_{1} } \\ {OF_{2} } \\ {\begin{array}{*{20}c} \vdots \\ {OF_{N} } \\ \end{array} } \\ \end{array} } \right] $$
(5)

2.3 Operators of MFO

MFO has three main operators, described in detail as follows.

$$ MFO = (I,P,T) $$
(6)

where \({\text{I}}\) is the initialization function that generates a group of uniformly random solutions (moths) in the optimization space together with their fitness values (\(I:\emptyset \to \{ M,OM\}\)). \(P\) is the spiral search function, the main operator \((P:M \to M)\), which moves the moths around the flames to search for the optimal solution. \({\text{T}}\) checks whether the termination criterion of the optimization process is satisfied \((T:M \to \{ true, false\} )\).

The following equation represents \({\text{I}}\) operator, which is used to generate initial moth population.

$$ M_{i,j} = rand*\left( {ub_{j} - lb_{j} } \right) + lb_{j} $$
(7)

where \(M_{i,j}\) denotes the \(j\)th dimension of the \(i\)th moth, and \(ub_{j}\) and \(lb_{j}\) represent the upper and lower bounds of the \(j\)th dimension, respectively. \(rand\) is a random number in the range [0,1]. The moths fly in the search space by using the transverse orientation mechanism. Three conditions should be satisfied when utilizing a spiral search. Firstly, the position of the moth should be the starting point of the spiral search. Secondly, the position of the flame should be the ending point of the spiral search. Finally, the scope of the spiral search should stay within the search space.
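As a minimal sketch of the initialization operator in Eq. (7), the snippet below draws every coordinate uniformly between its bounds and evaluates the resulting moths. The function names `init_moths` and `sphere`, the bounds and the population size are illustrative assumptions and do not come from the paper.

```python
import random

def init_moths(n_moths, lb, ub, rng=random):
    """Eq. (7): M[i][j] = rand * (ub[j] - lb[j]) + lb[j], one draw per entry."""
    return [
        [rng.random() * (ub[j] - lb[j]) + lb[j] for j in range(len(lb))]
        for _ in range(n_moths)
    ]

def sphere(x):
    # Hypothetical objective, used only to fill the fitness vector OM.
    return sum(v * v for v in x)

rng = random.Random(0)                       # fixed seed for reproducibility
lb, ub = [-5.0, 0.0, 10.0], [5.0, 1.0, 20.0]
M = init_moths(4, lb, ub, rng)               # moth matrix of Eq. (2), N = 4, D = 3
OM = [sphere(m) for m in M]                  # fitness vector of Eq. (3)
```

Each row of `M` is one moth, so `M` and `OM` play the roles of the matrices in Eqs. (2) and (3).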

Thus, the logarithmic spiral search of the MFO algorithm can be represented as follows.

$$ moth_{i,t + 1} = D_{ij} *e^{b\beta } *\cos \left( {2\pi \beta } \right) + F_{j} $$
(8)
$$ D_{ij} = |F_{j} - moth_{i,t} | $$
(9)

where \(F_{j}\) is the \(j\)th flame, and \(moth_{i,t}\) denotes the \(i\)th moth at generation \(t\). \(D_{ij}\) stands for the distance between the \(j\)th flame and the \(i\)th moth, and \(b\) is the spiral constant that defines the shape of the logarithmic spiral. In addition, β is a random number in the interval [\(r\), 1], where \(r\) decreases linearly from − 1 to − 2 over the course of the iterations.

As can be seen from Eqs. (8) and (9), the next position of a moth is determined by its corresponding flame, and it is not necessarily located in the space between them. Therefore, both the exploration ability and the exploitation capacity of the population can be guaranteed.
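The spiral update of Eqs. (8) and (9) can be sketched as follows. The sketch assumes that β is drawn independently for every dimension and that the spiral constant defaults to b = 1; neither choice is stated in the text, and the helper names `spiral_step` and `r_at` are hypothetical.

```python
import math
import random

def r_at(gen, gen_max):
    """Lower limit r of the beta interval, decreasing linearly from -1 to -2."""
    return -1.0 - gen / gen_max

def spiral_step(moth, flame, r, b=1.0, rng=random):
    """One logarithmic-spiral move of a moth around its flame."""
    new_pos = []
    for m_j, f_j in zip(moth, flame):
        beta = rng.uniform(r, 1.0)          # beta in [r, 1] (per dimension: assumption)
        dist = abs(f_j - m_j)               # Eq. (9)
        new_pos.append(dist * math.exp(b * beta) * math.cos(2 * math.pi * beta) + f_j)  # Eq. (8)
    return new_pos

rng = random.Random(1)
moth, flame = [4.0, -3.0], [1.0, 1.0]
new_moth = spiral_step(moth, flame, r_at(10, 100), rng=rng)
```

Because |cos| is at most 1 and β is at most 1, every coordinate of the new position lies within `dist * e**b` of the flame, so the move can land beyond the segment joining moth and flame, as the text notes.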

2.4 Updating the number of flames

The balance between local search ability and global exploration ability is achieved by the original time-varying mechanism (i.e., the number of flames gradually decreases), which is defined as follows.

$$ Flame\_no = round(N - \left( {N - 1} \right)*gen/genMax) $$
(10)

where \(Flame\_no\) is the number of flames, \(gen\) denotes the index of the current iteration, and \(genMax\) is the maximal number of generations. \(round\) stands for the rounding function.
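A short worked example of Eq. (10) is given below, with N = 100 moths and a hypothetical budget of genMax = 1000 iterations. The sketch rounds halves upward, which is how MATLAB's `round` behaves for positive values; Python's built-in `round` uses banker's rounding and would give 50 instead of 51 at the midpoint.

```python
import math

def flame_count(n, gen, gen_max):
    """Eq. (10) with half-up rounding (valid because the argument is positive)."""
    return math.floor(n - (n - 1) * gen / gen_max + 0.5)

for gen in (0, 500, 1000):
    print(gen, flame_count(100, gen, 1000))
# 0 100
# 500 51
# 1000 1
```

The count therefore shrinks from N flames at the start to a single flame at the last iteration.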

2.5 Flame matching mechanism

How to choose the flame for each moth is a crucial problem that decides the performance of the algorithm. In the MFO algorithm, the flames are sorted based on their fitness values after the matrix of flames is updated in each iteration. Then, the following flame selection rule is applied to improve the convergence ability of the population.

$$ MF_{i} = \left\{ {\begin{array}{*{20}l} {F_{i} ,\quad if\; i \le Flame\_no} \\ {F_{Flame\_no} ,\quad otherwise} \\ \end{array} } \right. $$
(11)

where \(MF_{i}\) is the flame of the sorted flame matrix around which the \(i\)th moth flies.
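The matching rule of Eq. (11) can be illustrated with a small example. Indices are 1-based as in the paper, and the helper name `matched_flame_index` is hypothetical.

```python
def matched_flame_index(i, flame_no):
    """Eq. (11): moth i follows flame i while i <= Flame_no, else the last flame."""
    return i if i <= flame_no else flame_no

n_moths, flame_no = 6, 3
assignment = [matched_flame_index(i, flame_no) for i in range(1, n_moths + 1)]
print(assignment)  # [1, 2, 3, 3, 3, 3]
```

With six moths and three remaining flames, half of the population already gathers around the third-ranked flame.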

2.6 Process of spiral search of MFO

To sum up, the search process of MFO can be described as follows. Moths are randomly generated in the search space, and the fitness value of each moth is calculated. The top positions found so far are viewed as flames and added to the flame population. Subsequently, the control parameter \(Flame\_no\) and the decrease factor \(r\) are updated according to the time-varying mechanism. The positions of the moths are then renewed by the spiral search function to find better solutions. The above process is repeated until the termination criteria are met.
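The loop described above can be put together in a compact sketch. Several details are assumptions made only for illustration: the flames are taken as the best N positions among the previous flames and the current moths, out-of-range coordinates are clamped to the bounds, β is drawn per dimension, and b = 1. The function name `mfo` and the sphere objective are hypothetical.

```python
import math
import random

def mfo(f, lb, ub, n_moths=20, gen_max=100, b=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(lb)
    # Eq. (7): uniform random initial moths.
    moths = [[rng.random() * (ub[j] - lb[j]) + lb[j] for j in range(dim)]
             for _ in range(n_moths)]
    flames, flame_fit, history = [], [], []
    for gen in range(1, gen_max + 1):
        # Assumption: coordinates that left the search space are clamped.
        moths = [[min(max(x, lb[j]), ub[j]) for j, x in enumerate(m)] for m in moths]
        fit = [f(m) for m in moths]
        # Flames: best n_moths positions found so far (sorted by fitness).
        pool = sorted(zip(flame_fit + fit, flames + moths), key=lambda p: p[0])
        flame_fit = [p[0] for p in pool[:n_moths]]
        flames = [list(p[1]) for p in pool[:n_moths]]
        history.append(flame_fit[0])
        flame_no = math.floor(n_moths - (n_moths - 1) * gen / gen_max + 0.5)  # Eq. (10)
        r = -1.0 - gen / gen_max             # decreases linearly towards -2
        for i in range(n_moths):
            k = i if i < flame_no else flame_no - 1      # Eq. (11), 0-based
            for j in range(dim):
                beta = rng.uniform(r, 1.0)
                dist = abs(flames[k][j] - moths[i][j])   # Eq. (9)
                moths[i][j] = (dist * math.exp(b * beta)
                               * math.cos(2 * math.pi * beta) + flames[k][j])  # Eq. (8)
    return flames[0], flame_fit[0], history

def sphere(x):
    return sum(v * v for v in x)

best, best_fit, history = mfo(sphere, [-5.0, -5.0], [5.0, 5.0])
```

Since the flame pool is elitist, the recorded best fitness in `history` never increases from one iteration to the next.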

3 Adaptive moth flame optimization with historical flame archive strategy

The main advantage of the original MFO algorithm is that its spiral search mechanism is simple and efficient for optimizing some practical problems. However, for some complex problems, especially multi-modal and high-dimensional problems, it may converge prematurely and the obtained solution precision will be poor. The flame population in the MFO algorithm increases the risk of premature convergence. Besides, the classical flame matching mechanism in MFO is inefficient and does not make use of the information of the top flames. To solve the above problems, this paper proposes two mechanisms (an adaptive historical flame archive strategy and a top flame randomly matching mechanism), described in detail as follows.

3.1 Adaptive historical flame archive strategy

The flame population, which is composed of the best flames found so far, is crucial to guarantee the search ability of MFO. However, in the later stage of the evolutionary search, the diversity of the flame population is very poor, so that the moth population easily falls into local optima. In order to enhance the diversity of the flame population, an adaptive historical flame archive strategy is designed to prevent premature convergence of the moth population. This flame archive is described as follows.

$$ {\text{F}} = \left[ {\begin{array}{*{20}c} {pF_{1} } \\ {\begin{array}{*{20}c} {pF_{2} } \\ \vdots \\ {pF_{N} } \\ \end{array} } \\ \end{array} } \right] $$
(12)

where \(pF_{i}\) denotes the personal historical optimal solution of the \(i\)th moth.

The updating strategy of historical flame archive can be described by Fig. 3, and it can be formulated as follows.

$$ pF_{i} = \left\{ {\begin{array}{*{20}l} {M_{i} ,\quad if\; OM_{i} < OpF_{i} } \\ {pF_{i} ,\quad otherwise} \\ \end{array} } \right. $$
(13)

where \(OpF_{i}\) is the fitness value of the historical optimal solution of \(i\)th moth.
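The archive update of Eq. (13) amounts to a per-moth "keep the better one" rule, sketched below for a minimization problem. The function name `update_archive` and the fitness values are hypothetical.

```python
def update_archive(moths, moth_fit, archive, archive_fit):
    """Eq. (13): pF_i is replaced by M_i only if OM_i < OpF_i."""
    for i, (m, fm) in enumerate(zip(moths, moth_fit)):
        if fm < archive_fit[i]:
            archive[i] = list(m)        # store a copy of the improved position
            archive_fit[i] = fm
    return archive, archive_fit

archive = [[0.0], [1.0], [2.0]]
archive_fit = [3.0, 0.5, 2.0]           # entry 2 currently holds the best solution
moths = [[0.4], [1.7], [2.2]]
moth_fit = [1.0, 4.0, 2.5]              # only moth 1 improved on its own history

update_archive(moths, moth_fit, archive, archive_fit)
print(archive_fit)  # [1.0, 0.5, 2.0]
```

Only the first entry changes, and the best entry (fitness 0.5) stays in the archive even though its moth moved to a worse position.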

Fig. 3 Updating mechanism of historical flame archive

The advantage of the historical flame archive is illustrated in Fig. 4. Labels 1, 2 and 3 in Fig. 4 denote the positions of the flames with index 1, 2 and 3, respectively, and labels 1′, 2′ and 3′ denote the positions of the moths with index 1, 2 and 3 after an update. By the definition of the MFO algorithm, flames are the best positions found so far. Therefore, the new flames will be the positions labeled 3′, 3 and 1′, which makes the population lose the information of the global optimal solution and become trapped in local optima. The adaptive historical flame archive strategy described above ensures that the information of the global optimum is maintained during the search, because the \(i\)th flame is replaced by the \(i\)th moth only when the fitness value of the \(i\)th moth is better than that of the \(i\)th flame. In the above example, the positions 2, 3 and 1′ are taken as the new flames, so that position 2, which carries the information of the global optimum, is kept.

Fig. 4 Schematic diagram of updating flame (the peak value is the worst fitness value of the function, and the depression value represents the optimal fitness value)

3.2 Top flame randomly matching mechanism

The flame matching mechanism in the original MFO is inefficient and does not make use of the information of the top flames, because in the middle and later stages of the search many moths are searching around the \(Flame\_no\)th flame, which does not provide the best direction information (i.e., the \(Flame\_no\)th flame is not the best individual of the flame population). To solve this problem, each moth randomly chooses one of the top q% flames for searching the solution space, which is defined as follows.

$$ SF = sort(F) $$
(14)
$$ F_{i} = SF_{rand*q\% *N} $$
(15)

where \(sort(F)\) means that the flames of the flame population are ranked in ascending order of their fitness values, and \(F_{i}\) denotes the flame assigned to the \(i\)th moth. \(rand\) is a random number in the range [0, 1], and q is a control parameter that determines the number of top flames. N stands for the size of the flame population.
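Equations (14) and (15) can be sketched as below. Two points are assumptions: q is treated as a fraction (e.g., 0.2, matching the value used in Sect. 4.2), and the real-valued index rand·q·N is turned into an integer by drawing uniformly among the first ceil(q·N) sorted flames, since the text does not state the rounding rule. The name `pick_top_flame` is hypothetical.

```python
import math
import random

def pick_top_flame(flames, flame_fit, q, rng=random):
    """Return one flame chosen uniformly among the best ceil(q * N) flames."""
    order = sorted(range(len(flames)), key=lambda k: flame_fit[k])   # Eq. (14)
    # Small tolerance guards against float noise such as 0.3 * 10 > 3.
    n_top = max(1, math.ceil(q * len(flames) - 1e-9))
    return flames[order[rng.randrange(n_top)]]                        # Eq. (15)

rng = random.Random(42)
flames = [[float(k)] for k in range(10)]
flame_fit = [5.0, 1.0, 7.0, 0.5, 3.0, 9.0, 2.0, 8.0, 6.0, 4.0]
guides = [pick_top_flame(flames, flame_fit, 0.2, rng) for _ in range(5)]
```

With N = 10 and q = 0.2, every guide is one of the two best flames (here `[3.0]` and `[1.0]`), so all moths are steered by high-quality flames while still being spread over more than one target.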

3.3 Flowchart and pseudo-code of MFO–HFA

Algorithm 1 Pseudo-code of MFO–HFA

The pseudo-code of MFO–HFA is summarized in Algorithm 1, where the modifications made by MFO–HFA are shown in bold.

The complete flowchart of the MFO–HFA algorithm is given in Fig. 5, where the modifications made by MFO–HFA are shown in bold for clarity.

Fig. 5 Flowchart of MFO–HFA

4 Benchmark function optimization problems

The numerical benchmark functions of CEC 2005 (Suganthan et al. 2005) are used to test the performance of MFO–HFA compared with the classic MFO algorithm, other variants of MFO and some state-of-the-art optimization algorithms (i.e., DE, adaptive CoDE (ACoDE) (Wang et al. 2011), PSO and comprehensive learning particle swarm optimizer (CLPSO) (Liang et al. 2006)). For a fair comparison, all simulations are carried out in the same physical environment with MATLAB 2018b, and each algorithm is independently run 25 times with \(D\)*10,000 function evaluations (FES) to reduce statistical errors. Wilcoxon’s rank sum test at a 5% significance level is used to assess the statistical significance of the results.

4.1 Benchmark functions

The 25 benchmark functions proposed in the CEC2005 (Suganthan et al. 2005) special session on real-parameter optimization are used to test the performance of the proposed MFO–HFA. F1–F5 of CEC 2005 are continuous unimodal functions, while F6–F14 are multimodal and have a significant number of local minima. Besides, F15–F25 are hybrid composition functions.

The dimension of the problems (i.e., decision variables) is set to 30 for all the 25 functions. In this experiment, the mean value and standard deviation of the function error value (\(f\left( {gbest} \right) - f(X^{*} )\)) are recorded for testing the performance of each algorithm, where \(gbest\) is the best solution found by the algorithm in a run and \(X^{*}\) is the theoretical global optimum of the benchmark functions.
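The recorded statistics can be illustrated with a short sketch. The run results and the optimum value below are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the text does not say which is reported.

```python
import statistics

def error_stats(best_values, f_star):
    """Mean and sample standard deviation of f(gbest) - f(X*) over the runs."""
    errors = [fb - f_star for fb in best_values]
    return statistics.mean(errors), statistics.stdev(errors)

# Hypothetical best fitness values from three runs, with optimum f(X*) = -450.
mean_err, std_err = error_stats([-449.0, -448.0, -447.0], -450.0)
print(mean_err, std_err)  # 2.0 1.0
```

In the experiments this computation would run over the 25 independent runs of each algorithm on each function.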

4.2 Parameter settings of comparative algorithms

MFO–HFA is compared with five other variants of the MFO algorithm, i.e., AMFO (Zhao et al. 2018), GMFO (Xu et al. 2019a), CMFO (Xu et al. 2019a), LMFO (Xu et al. 2019a) and OMFO (Apinantanakon and Sunat 2017). Besides, classic MFO (Mirjalili 2015), PSO (Kennedy and Eberhart 1995), ACoDE (Wang et al. 2011), CLPSO (Liang et al. 2006) and DE (Storn and Price 1997) are used as comparison algorithms to evaluate the effect of the MFO–HFA algorithm. The parameters of the MFO–HFA algorithm are set as follows. The size of the moth population is 100, and the size of the historical flame archive is equal to the size of the population. The parameter q is set to 0.2, according to the sensitivity analysis of the parameter q in Sect. 4.5. The population size of the other algorithms is also 100, and the other parameters of the comparison algorithms are the same as in their original papers.

4.3 Experimental results

4.3.1 Comparisons on solution accuracy

The results of solution accuracy are shown in Tables 1 and 2 in terms of the mean and the standard deviation of the solutions, which are obtained by each algorithm with 25 independent runs and 300,000 fitness evaluations on the 25 benchmark functions.

Table 1 Results of solution accuracy obtained by six compared algorithms
Table 2 Results of solution accuracy obtained by six compared algorithms

In each row of Tables 1 and 2, the average values over 25 independent runs are listed in the first line, and the standard deviations are given in the second line. The P value and H value of the nonparametric statistical test with a significance level α = 0.05 are presented in the third and fourth lines. The symbol ‘ǂ’ is appended to the mean value yielded by an algorithm that is significantly worse than the MFO–HFA algorithm. If MFO–HFA is worse than another algorithm, a ‘ξ’ is appended to the mean value of the corresponding algorithm. The symbol ‘ ~ ’ indicates that there is no significant difference between MFO–HFA and the compared algorithm. In the last row of each table, a summary of the total number of ‘ǂ’, ‘ξ’ and ‘ ~ ’ is presented. In addition, the best results are shown in bold.

It can be seen from Table 1 that MFO–HFA obtains the best performance on 17 test functions, and is poor on 8 functions (F5–F8, F12, F14, F15 and F23). AMFO yields the best results on function F8, and GMFO obtains the best results on 3 functions (F6, F14 and F24). CMFO gains the best results on 4 functions (F5, F7, F15 and F24), and LMFO obtains the best results on 3 functions (F5, F8 and F24). OMFO achieves the best performance on 4 functions (F8, F12, F23 and F24).

It can be clearly observed from Table 2 that the original MFO algorithm has the best performance on 6 functions (F5, F7, F8, F12, F24 and F25), compared with MFO–HFA, PSO, CLPSO, DE and ACoDE. Furthermore, the CLPSO algorithm obtains the best performance on five functions, i.e., F9, F15, F21, F23 and F24. The state-of-the-art DE algorithm outperforms the other algorithms on 9 benchmark functions, which are 3 unimodal functions (F1, F2 and F4), 1 multimodal function (F6) and 5 hybrid composition functions (F18–F24). ACoDE gains the best results on 3 functions (F3, F21 and F24). MFO–HFA obtains the best performance on 7 benchmark functions, which are 3 multimodal functions (F10, F11 and F13) and 4 hybrid composition functions (F16, F17, F21 and F24).

Compared with MFO, PSO and CLPSO, MFO–HFA has achieved an overwhelming advantage. Meanwhile, the performance of MFO–HFA is similar to that of DE on the 25 benchmark functions. Based on the above analysis, MFO–HFA significantly improves the solution accuracy and exploration ability of MFO. The main reason is that the adaptive historical flame archive strategy gives MFO–HFA a stronger ability to jump out of local optima compared with PSO, CLPSO, DE and ACoDE.

4.3.2 The comparison results of convergence speed

In Figs. 6 and 7, the vertical axis is the natural logarithm of the mean value over 25 independent runs, and the horizontal axis is the sampling point, where the 31 sampling points are taken at FES = 1000 and at every FES with mod (FES, 10,000) = 0.
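One reading of this sampling rule, assuming the budget of D*10,000 = 300,000 evaluations used for the 30-dimensional benchmark functions, reproduces the stated count of 31 points:

```python
max_fes = 30 * 10_000                      # D * 10,000 with D = 30
samples = [1000] + list(range(10_000, max_fes + 1, 10_000))
print(len(samples), samples[:3], samples[-1])  # 31 [1000, 10000, 20000] 300000
```

That is, one early sample at FES = 1000 followed by 30 samples at multiples of 10,000.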

Fig. 6 Convergence performance of the six compared algorithms (i.e., MFO–HFA, CMFO, GMFO, LMFO, AMFO and OMFO) on 25 functions

Fig. 7 Convergence performance of the six compared algorithms (i.e., MFO–HFA, MFO, ACoDE, DE, CLPSO and PSO) on 25 functions

It can be clearly seen from Fig. 6 that MFO–HFA obtains better convergence speed and solution accuracy than the other five variants of MFO on four unimodal functions (F1–F4), four multimodal functions (F9, F10, F11 and F13) and seven hybrid composition functions (F16–F20, F22 and F24). This indicates that the proposed methods improve the convergence speed of the original MFO and that MFO–HFA has strong convergence ability. In addition, although MFO–HFA converges more slowly than the other five algorithms on some functions in the early stage of the evolutionary search, it has strong exploration ability and achieves better solution accuracy. This indicates that the historical flame archive strategy can improve the exploration capacity of MFO and help the moth population escape local optima. MFO–HFA does not have the best convergence speed on six functions (F5, F7, F8, F12, F15 and F23), which may be because its local exploitation ability is slightly weak.

It can be seen from Fig. 7 that MFO–HFA gains the highest convergence speed on four functions (F10, F11, F16 and F17), and MFO obtains the highest convergence speed on three functions (F5, F8 and F12). DE achieves the highest convergence speed on eight benchmark functions (F1, F2, F4, F6 and F21–F24). Meanwhile, PSO has the best convergence speed only on test function F14, and CLPSO obtains the best convergence speed on two functions (F9 and F15). The above analysis shows that MFO–HFA has promising convergence speed in solving some complex problems.

4.4 Component analysis of MFO–HFA

To verify the effectiveness of each component of MFO–HFA, MFO with only the adaptive historical flame archive strategy is named MFO-A, and MFO with only the top flame randomly matching mechanism is named MFO-T. MFO, MFO-A, MFO-T and MFO–HFA are then used to optimize the 25 benchmark functions of CEC2005. The average rankings of the Friedman test for these algorithms are shown in Table 3. The Friedman test (Das et al. 2011) is a non-parametric statistical test for comparing more than two algorithms and is used here to show the differences among the compared algorithms. All compared algorithms are ranked according to their average performance on each test function. It is worth noting that the KEEL software (Dukic and Dobrosavljevic 1990) is used to calculate the rankings of the compared algorithms for all problems.
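The average rankings underlying the Friedman test can be sketched as follows: on every function the algorithms are ranked by mean error (rank 1 is best, ties share the average rank), and the ranks are then averaged over the functions. The sketch is a plain re-implementation for illustration and not the KEEL procedure itself; the function name `average_ranks` and the error values are hypothetical.

```python
def average_ranks(results):
    """results[f][a]: mean error of algorithm a on function f (lower is better)."""
    n_alg = len(results[0])
    totals = [0.0] * n_alg
    for row in results:
        order = sorted(range(n_alg), key=lambda a: row[a])
        ranks = [0.0] * n_alg
        pos = 0
        while pos < n_alg:
            end = pos
            # Extend the group while the next algorithm has the same error.
            while end + 1 < n_alg and row[order[end + 1]] == row[order[pos]]:
                end += 1
            avg = (pos + end) / 2 + 1        # mean of 1-based ranks pos+1 .. end+1
            for k in range(pos, end + 1):
                ranks[order[k]] = avg
            pos = end + 1
        totals = [t + r for t, r in zip(totals, ranks)]
    return [t / len(results) for t in totals]

# Two functions, three algorithms; the second row contains a tie.
print(average_ranks([[1.0, 2.0, 3.0], [2.0, 2.0, 5.0]]))  # [1.25, 1.75, 3.0]
```

A lower average rank means better overall performance across the benchmark set.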

Table 3 Average rankings obtained by four algorithms on 25 benchmark functions of CEC2005

Table 3 clearly shows that MFO–HFA ranks first, with an average ranking of 2.18. MFO-A ranks second, with an average ranking of 2.24. MFO-A is better than MFO, which indicates that the adaptive historical flame archive strategy is effective. In addition, MFO-T is also better than MFO and ranks third, which shows that the top flame randomly matching mechanism can fully use the information of the top flames to improve the search ability of the population. Finally, MFO–HFA is better than both MFO-A and MFO-T, which demonstrates that the adaptive historical flame archive strategy and the top flame randomly matching mechanism are effectively integrated into MFO.

4.5 Sensitivity of the parameter q

The parameter q is a threshold to determine whether the flame is top flame. The choice of parameter q will influence the diversity of top flames, and then affect the performance of algorithms. To find out a good choice of the parameter q, MFO–HFA with different parameter q = {0.1, 0.2, …, 1} is used to optimize 25 benchmark problems of CEC2005. The other parameter settings of the algorithm are same as the settings described earlier.

Table 4 summarizes the average rankings of the Friedman test. It clearly shows that q = 0.2 gives the best ranking and acts as a turning point. In the range [0.1, 0.2], the larger the q value, the better the ranking of the algorithm. In the range [0.2, 1.0], the ranking of the algorithm almost always gets worse as q increases. Therefore, we suggest q = 0.2 for MFO–HFA.

Table 4 Average rankings obtained by the MFO–HFA with different q value on 25 benchmark functions of CEC2005

4.6 Comparison results of time complexity

The average time cost of one iteration over the 25 functions is compared for all algorithms in the bar plot of Fig. 8. Figure 8 clearly shows that the mean CPU times of MFO–HFA and MFO are similar. Additionally, MFO–HFA requires less CPU time than PSO, ACoDE, CLPSO and DE, which indicates that the time complexity of MFO–HFA is acceptable.

Fig. 8 Mean CPU time of compared algorithms on 25 benchmark functions of CEC2005

5 Rule-based network intrusion detection problem

A Network Intrusion Detection System (NIDS) is designed to identify and prevent the misuse of computer networks. Most current IDSs are rule-based, and their performance significantly depends on sets of pre-defined rules that are provided by experts or automatically created by the system. Therefore, the update of rules is critical to a rule-based IDS. However, the update of rules is a nonlinear, non-differentiable and non-separable complex problem. To verify the performance of MFO–HFA on real-world engineering optimization problems, MFO–HFA is utilized to optimize the rule updating of network intrusion detection. In this section, the NSL-KDD dataset (Meena and Choudhary 2017; NSL-Kdd dataset) is used to train and test the individuals of MFO–HFA and the compared intelligence algorithms.

5.1 Rule-based network intrusion detection method

Intelligent algorithm-based network intrusion detection was first proposed by Chittur (2001) and is a classical rule-based intrusion detection method. There are two crucial points when using intelligent algorithms to solve the problem of rule updating.

Firstly, the rule provided by an intelligent algorithm is based on data analysis for the network intrusion detection problem. Each rule holds one randomized parameter per data attribute, and the products of these parameters with the attribute values are summed into a weight value that determines whether a given record is an attack or not. The determinacy \(C_{i}\) of whether record \(R\) is classified as an attack by rule \(X_{i}\) is described as follows.

$$ C_{i} \left( {X_{i} } \right) = \mathop \sum \limits_{j = 1}^{n} (R_{j} \times X_{i,j} ) $$
(16)

where \({\text{X}}_{\text{i,j}}\) denotes the random parameter for attribute \({\text{R}}_{\text{j}}\), and n represents the number of attributes. Furthermore, an arbitrary threshold value is established, and any record whose determinacy value exceeds this threshold is regarded as a malicious attack.
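Equation (16) and the threshold test can be sketched in a few lines. The threshold of 1 matches the setting reported in Sect. 5.3; the record and rule values, as well as the function names, are hypothetical.

```python
def determinacy(record, rule):
    """Eq. (16): weighted sum of the attribute values R_j with parameters X_ij."""
    return sum(r_j * x_j for r_j, x_j in zip(record, rule))

def is_attack(record, rule, threshold=1.0):
    """A record is flagged when its determinacy exceeds the threshold."""
    return determinacy(record, rule) > threshold

record = [1.0, 0.0, 2.0]          # three digitized attributes (illustrative)
rule = [0.5, -1.0, 0.4]           # one candidate rule X_i
print(is_attack(record, rule))    # True: determinacy is about 1.3
```

In the experiments each rule would have one parameter for each of the 41 attributes of a connection record.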

Secondly, the fitness of an individual depends on how many attacks are correctly detected and how many normal records are wrongly viewed as malicious attacks. The false positives are expressed as a positive ratio of the total normal data, while the correct detections are expressed as a negative ratio of the total attacks. In this experiment, the fitness function F of a specific individual \({\text{X}}_{\text{i}}\) is given as follows.

$$ F\left( {X_{i} } \right) = \frac{\beta }{B} - \frac{\alpha }{A} $$
(17)

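where α is the number of correctly detected attacks, and A stands for the total number of attacks. Besides, β is the number of false positives, and B denotes the total number of normal records. The fitness value lies in the closed interval [−1, 1]. Under Eq. (17), −1 is reached when every attack is detected with no false positives (α = A, β = 0), and 1 is reached when no attack is detected and every normal record is flagged, so smaller fitness values indicate better rules.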
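The fitness of Eq. (17) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `fitness`, the boolean label encoding (True for an attack) and the toy one-attribute records are hypothetical, and a record is flagged when its determinacy from Eq. (16) exceeds the threshold.

```python
from typing import Sequence


def fitness(rule: Sequence[float],
            records: Sequence[Sequence[float]],
            labels: Sequence[bool],
            threshold: float = 1.0) -> float:
    """Eq. (17): F = beta / B - alpha / A for one rule on labelled records."""
    total_attacks = sum(1 for y in labels if y)      # A
    total_normal = len(labels) - total_attacks       # B
    if total_attacks == 0 or total_normal == 0:
        raise ValueError("need at least one attack and one normal record")
    alpha = 0  # correctly detected attacks
    beta = 0   # normal records flagged as attacks (false positives)
    for record, attack in zip(records, labels):
        flagged = sum(r * x for r, x in zip(record, rule)) > threshold
        if flagged and attack:
            alpha += 1
        elif flagged:
            beta += 1
    return beta / total_normal - alpha / total_attacks


# Toy data with a single attribute per record (illustrative only).
records = [[2.0], [0.5], [3.0], [0.0]]
labels = [True, True, False, False]
print(fitness([1.0], records, labels))  # 1/2 - 1/2 = 0.0
```

For instance, a rule that detects 8 of 10 attacks while raising 5 false alarms among 100 normal records gives F = 0.05 − 0.8 = −0.75.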

5.2 NSL-KDD dataset

The KDD dataset was generated on a simulated U.S. Air Force local-area network set up at Lincoln Labs, which was operated like a standard Air Force network except for planned and recorded attacks. In this paper, an improved KDD dataset (NSL-KDD) is used to test the performance of MFO–HFA. Although the NSL-KDD dataset may not be a perfect representative of practical networks, given the lack of public datasets for network-based IDSs, it can still serve as an effective benchmark for comparing different network intrusion detection methods. The NSL-KDD training set does not include the redundant records of the classic KDD99 dataset, so classifiers will not be biased toward frequent records. Meanwhile, there are no duplicate records in the test sets, and the number of records selected from each difficulty level group is inversely proportional to the percentage of records in the original KDD99 dataset.

The NSL-KDD dataset is split into two parts, i.e., a training set and a test set. The training set consists of 125,973 network connections, and the test set consists of 22,544 network connections. Each network connection record is composed of 41 attributes and a label attribute. In addition, every string-type attribute of the original NSL-KDD records is digitized. For example, the numerical value of the 2nd attribute, "protocol_type", is 0 when this attribute is "TCP" and 1 when it is "ICMP".
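The digitization step can be illustrated with a short Python sketch. Only the two codes stated above (TCP as 0, ICMP as 1) come from the text; the helper names, the truncated three-field record, and the rule that any other string receives the next unused integer are assumptions made for illustration.

```python
from typing import Dict, List

# Codes stated in the text for the 2nd attribute (protocol type).
PROTOCOL_CODES: Dict[str, int] = {"tcp": 0, "icmp": 1}


def encode_categorical(value: str, codes: Dict[str, int]) -> int:
    """Return the integer code of a string value.

    Assumption: a value without a known code gets the next unused integer.
    """
    key = value.lower()
    if key not in codes:
        codes[key] = max(codes.values(), default=-1) + 1
    return codes[key]


def digitize_record(fields: List[str],
                    categorical: Dict[int, Dict[str, int]]) -> List[float]:
    """Convert raw string fields to numbers; `categorical` maps a
    zero-based column index to the code table of that column."""
    out: List[float] = []
    for idx, field in enumerate(fields):
        if idx in categorical:
            out.append(float(encode_categorical(field, categorical[idx])))
        else:
            out.append(float(field))
    return out


# Hypothetical, truncated record: only 3 of the 41 attributes are shown.
tables = {1: dict(PROTOCOL_CODES)}  # the 2nd attribute has zero-based index 1
print(digitize_record(["0", "TCP", "215"], tables))   # [0.0, 0.0, 215.0]
print(digitize_record(["0", "ICMP", "215"], tables))  # [0.0, 1.0, 215.0]
```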

5.3 Parameter settings of compared algorithms

The parameters of the seven intelligent algorithms (i.e., MFO–HFA, MFO (Shehab et al. 2018), AMFO (Zhao et al. 2018), GMFO (Xu et al. 2019a), CMFO (Xu et al. 2019a), LMFO (Xu et al. 2019a) and OMFO (Apinantanakon and Sunat 2017)) are set as follows. The moth population size is 100, and the dimension D of individuals is 41. In addition, the other parameters of the comparison algorithms are the same as in their original papers, and the threshold value is set to 1. The initial upper boundary of every dimension is 1, and the initial lower boundary is −1. It is worth noting that the decision space is unbounded, and the initial boundary is used only to initialize the population. Each algorithm is run independently 25 times with 410,000 function evaluations.
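A minimal sketch of the population initialization described above is given below. Uniform sampling within the initial boundary is an assumption, since the text only states that the boundary is used for initialization; the function name `init_population` and the `seed` argument are likewise illustrative.

```python
import random
from typing import List, Optional


def init_population(pop_size: int = 100, dim: int = 41,
                    lower: float = -1.0, upper: float = 1.0,
                    seed: Optional[int] = None) -> List[List[float]]:
    """Draw pop_size rules of dimension dim inside the initial boundary.

    Assumption: each parameter is sampled uniformly from [lower, upper].
    The boundary applies only here; later updates are not clipped.
    """
    rng = random.Random(seed)
    return [[rng.uniform(lower, upper) for _ in range(dim)]
            for _ in range(pop_size)]


population = init_population(seed=0)
print(len(population), len(population[0]))  # 100 41
```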

5.4 The results of experiment simulation

The standard metrics of precision, recall, F1-Score and accuracy are used to evaluate the performance of MFO–HFA on the NSL-KDD dataset. These metrics are defined in terms of TP (True Positive), FP (False Positive), TN (True Negative) and FN (False Negative). TP denotes the number of records that are classified as attacks and are actually attacks. FP is the number of records that are classified as attacks but are actually normal. TN is the number of records that are classified as normal and are actually normal. FN is the number of records that are classified as normal but are actually attacks. Precision is defined as the proportion of positive identifications that are actually correct.

$${\text{Precision}}=\frac{\text{TP}}{{\text{TP}}+{\text{FP}}}$$
(18)

Recall denotes the TP rate, i.e., the proportion of actual attacks that are correctly identified.

$${\text{Recall}}=\frac{\text{TP}}{{\text{TP}}+{\text{FN}}}$$
(19)

F1-Score represents the harmonic mean of precision and recall.

$${\text{F1-Score}}=2\times \frac{{\text{Precision}}\times {\text{Recall}}}{{\text{Precision}}+{\text{Recall}}}$$
(20)

Accuracy is defined as the ratio of correct predictions to the total number of records.

$${\text{Accuracy}}=\frac{{\text{TP}}+{\text{TN}}}{{\text{TP}}+{\text{TN}}+{\text{FP}}+{\text{FN}}}$$
(21)
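Equations (18) to (21) can be computed directly from the four confusion counts, as in the Python sketch below. The function name `classification_metrics` and the convention of returning 0 when a denominator is zero are assumptions; the counts in the example are made up for illustration and are not results from Table 5.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Precision, recall, F1-Score and accuracy from confusion counts.

    Assumption: a ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0            # Eq. (18)
    recall = tp / (tp + fn) if (tp + fn) else 0.0               # Eq. (19)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0       # Eq. (20)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0              # Eq. (21)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up counts: 40 detected attacks, 10 false alarms,
# 40 correctly passed normal records, 10 missed attacks.
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
print(metrics)
```

With these counts, precision and recall are both 40/50 = 0.8, so the F1-Score is also 0.8, and accuracy is 80/100 = 0.8.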

The evaluation of all algorithms on the NSL-KDD dataset is listed in Table 5. Table 5 shows that MFO–HFA reaches 96.47% accuracy and obtains the highest Precision, Recall and F1-Score among the seven algorithms, which indicates that MFO–HFA exhibits promising performance on the rule updating of network intrusion detection.

Table 5 Evaluation of all algorithms on NSL-KDD dataset

In Fig. 9, the vertical axis is the mean fitness value over 10 independent runs, and the horizontal axis is the sampling point, where 42 sampling points were taken at FES = 1000 and wherever mod(FES, 10,000) = 0. It can be seen from Fig. 9 that MFO–HFA obtains the highest convergence speed and the best solution accuracy at the first sampling point. Based on the above analysis, it can be concluded that MFO–HFA is effective for rule-based network intrusion detection and has great potential in solving practical engineering problems.
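The count of 42 sampling points can be checked with a short Python sketch. It assumes the sampling rule means one point at FES = 1000 plus one point at every multiple of 10,000 up to the budget of 410,000 function evaluations; the function name `sampling_points` is illustrative.

```python
from typing import List


def sampling_points(max_fes: int = 410_000, first: int = 1_000,
                    step: int = 10_000) -> List[int]:
    """FES values at which the mean fitness is recorded (assumed rule)."""
    return [first] + list(range(step, max_fes + 1, step))


points = sampling_points()
print(len(points))             # 42 = 1 + 410,000 / 10,000
print(points[:3], points[-1])  # [1000, 10000, 20000] 410000
```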

Fig. 9 Convergence performance of the seven algorithms on network intrusion detection

6 Conclusions

This paper proposes a variant of the MFO algorithm named MFO with historical flame archive (MFO–HFA). An adaptive historical flame archive strategy and a top flame randomly matching mechanism are integrated into MFO. In MFO–HFA, a new flame matrix (the historical flame archive) replaces the flame matrix of the original MFO. The adaptive historical flame archive strategy updates the flames adaptively so that the population keeps the information of historical optimal solutions. In addition, the top flame randomly matching mechanism is utilized to accelerate convergence and make full use of the flame information. The performance of MFO–HFA is compared with that of other MFO variants, the original MFO and other state-of-the-art swarm intelligence algorithms on the CEC 2005 benchmark functions and an intrusion detection problem. Although, according to the no free lunch theorem, no algorithm is optimal for every problem, MFO–HFA obtains promising performance in solving complex shifted and rotated multi-modal problems and real-world intrusion detection optimization problems.