1 Introduction

Optimization is the process of finding a more appropriate set of parameters for a given problem in order to obtain a more desirable outcome. It is required in many problems arising in diverse fields such as science, social studies, economics and engineering [1]. Among all kinds of optimization problems, continuous optimization problems account for a large proportion [2, 3]. In recent decades, the use of meta-heuristic algorithms to deal with such problems has attracted increasing attention and has become an important branch of optimization methodology.

The imperialist competitive algorithm (ICA) is a population-based meta-heuristic algorithm proposed by Atashpaz-Gargari and Lucas in 2007, inspired by the historical phenomenon of imperialism and colonialism [4]. Owing to its competitiveness with other meta-heuristics in terms of convergence rate and global search ability, the ICA has received considerable interest in the short period since its introduction and has been successfully applied to a wide range of optimization tasks from various fields, such as mechanical engineering [5–8], electrical engineering [9–12], industrial engineering [13–18], civil engineering [19, 20], petroleum engineering [21–23] and computer engineering [24–30]. However, like other population-based meta-heuristic algorithms, it often suffers from premature convergence and becomes trapped in local optima, especially when the problem is complicated, high-dimensional or multimodal [31–34].

Several improved approaches have been proposed to enhance the algorithm's performance. Bahrami et al. [31] used chaotic maps instead of the original uniform distribution to adjust the angle of the colonies' movement toward the imperialist, helping the algorithm escape from local optima. Abdechiri et al. [32] proposed an adaptive ICA in which a probability density function is introduced to dynamically adapt this angle, balancing the exploration and exploitation abilities of the algorithm. Arish et al. [35] proposed a fuzzy version in which, during the assimilation step, colonies are moved toward the resultant vector of all imperialists with the aid of a fuzzy membership function. Kaveh and Talatahari [20, 36] presented an improved approach with two newly defined movement steps and used the redefined algorithm to optimize the design of skeletal structures and to solve other engineering optimization problems. As further work, Talatahari et al. [34] replaced the random variables in the assimilation equation, which guides the colonies' movements, with chaotic variables; they proposed three replacement schemes and investigated seven different chaotic maps for generating the chaotic variables.

All the improved approaches mentioned above enhance the performance of the original algorithm by changing the moving mode of the colonies. They require rather complicated modifications, which makes them more difficult to implement and apply. In this article, we propose a novel improved method that is simple in structure and very easy to implement. The main idea is to enhance the algorithm's performance by changing the moving mode of the imperialists, namely by applying mutation operators to them. Three different mutation operators, the Gaussian mutation, the Cauchy mutation and the Lévy mutation, are investigated in detail through numerical experiments.

The rest of this article is structured as follows. Section 2 gives a brief introduction to the basic ICA. Section 3 sets out the proposed approach in detail. Section 4 presents the numerical experiments, results, related discussion and comparisons with other algorithms. Section 5 concludes this work and outlines key directions for future work.

2 Basic imperialist competitive algorithm

The ICA simulates colonial competition in human society. Its implementation can be divided into the following steps [4].

2.1 Initialize the empires

Similar to other population-based meta-heuristic algorithms, the ICA begins with a randomly generated population containing N initial solutions. In the ICA, each individual is called a 'country.' For an \(N_{\text{var}}\)-dimensional optimization problem, a country is a \(1 \times N_{\text{var}}\) array whose elements are randomly generated within the allowable range of the corresponding parameters:

$${\text{country}} = \left[ {p_{1} ,p_{2} ,p_{3} , \ldots ,p_{{N_{\text{var}} }} } \right]$$

Then, the cost of each country is calculated by the cost function:

$${\text{cost}} = f({\text{country}}) = f(p_{1} ,p_{2} ,p_{3} , \ldots ,p_{{N_{\text{var}} }} )$$

Then, all the initial countries are divided into two classes: the best countries are designated 'imperialists,' and the remaining countries are designated 'colonies.' After that, the cost of every imperialist is normalized by Eq. (1) [4], where \(C_{n}\) and \(c_{n}\) stand for the normalized cost and the cost of the nth imperialist, respectively. The normalized power of each imperialist is then calculated by Eq. (2) [4], where \(p_{n}\) stands for the normalized power of the nth imperialist and \(N_{\text{imp}}\) stands for the total number of imperialists.

$$C_{n} = c_{n} - \mathop {\hbox{max} }\limits_{i} \left\{ {c_{i} } \right\}$$
(1)
$$p_{n} = \left| {\frac{{C_{n} }}{{\sum\nolimits_{i = 1}^{{N_{\text{imp}} }} {C_{i} } }}} \right|$$
(2)

The ICA uses the concept of an 'empire' in the subsequent steps. Here, an empire consists of one imperialist and some colonies, so the initial solutions form several empires. Every initial colony is assigned to one and only one imperialist. The number of colonies an imperialist obtains is proportional to its power and is calculated by Eq. (3) [4], where \(N.C._{n}\) and \(N_{\text{col}}\) stand for the number of colonies assigned to the nth imperialist and the total number of colonies, respectively. For each imperialist, \(N.C._{n}\) colonies are randomly selected from the initial population and assigned to it.

$$N.C._{n} = {\text{round}}\left\{ {p_{n} \cdot N_{\text{col}} } \right\}$$
(3)
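To make the initialization step concrete, the following Python sketch (an illustrative translation; the authors' implementation was in MATLAB, and all function and variable names here are our own) builds the initial empires from Eqs. (1)–(3).

```python
import numpy as np

def initialize_empires(cost_fn, n_countries, n_imp, low, high, rng=np.random.default_rng()):
    """Create the initial population and distribute colonies among imperialists (Eqs. 1-3)."""
    countries = rng.uniform(low, high, size=(n_countries, len(low)))
    costs = np.apply_along_axis(cost_fn, 1, countries)

    order = np.argsort(costs)                        # best (lowest-cost) countries first
    imp_idx, col_idx = order[:n_imp], order[n_imp:]

    C = costs[imp_idx] - costs[imp_idx].max()        # Eq. (1): normalized imperialist costs
    p = np.abs(C / C.sum())                          # Eq. (2): normalized power of each imperialist
    n_col = np.round(p * len(col_idx)).astype(int)   # Eq. (3): colonies per imperialist
    n_col[-1] = len(col_idx) - n_col[:-1].sum()      # keep the total consistent after rounding

    rng.shuffle(col_idx)                             # colonies are assigned at random
    splits = np.split(col_idx, np.cumsum(n_col)[:-1])
    return [{"imperialist": countries[i], "colonies": countries[s]}
            for i, s in zip(imp_idx, splits)]
```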

2.2 Assimilation, revolution and uniting

After the empires are initialized, within each empire the colonies are moved toward their imperialist. This process, called 'assimilation,' is illustrated in Fig. 1. At each step, a colony is moved x units toward its imperialist, and a random deviation (the θ in Fig. 1) is added to the movement direction to enhance the exploration ability of the algorithm. Both x and θ are defined as shown in Fig. 1, where d is the distance between the colony and its imperialist; β is a number greater than 1 that makes the colony approach the imperialist from both sides, and γ is a parameter whose value determines the size of the deviation added to the original direction. Usually, a value of about 2 for β and about π/4 rad for γ gives good performance [4].

Fig. 1 Assimilation process
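A minimal Python sketch of this step is given below for the two-dimensional case pictured in Fig. 1, assuming the definitions used in the original ICA paper, x ~ U(0, β·d) and θ ~ U(−γ, γ); it is an illustration of the mechanism rather than the authors' exact code.

```python
import numpy as np

def assimilate_2d(colony, imperialist, beta=2.0, gamma=np.pi / 4, rng=np.random.default_rng()):
    """Move one 2-D colony toward its imperialist with a random angular deviation (Fig. 1)."""
    direction = imperialist - colony
    d = np.linalg.norm(direction)              # distance d between colony and imperialist
    if d == 0.0:
        return colony.copy()

    x = rng.uniform(0.0, beta * d)             # step length, x ~ U(0, beta * d)
    theta = rng.uniform(-gamma, gamma)         # angular deviation, theta ~ U(-gamma, gamma)

    unit = direction / d
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return colony + x * (rot @ unit)           # move x units along the deviated direction
```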

Meanwhile, some colonies are randomly selected according to a preset rate and replaced with an equal number of newly generated random countries. This process is called 'revolution,' and the rate is called the 'revolution rate' [4]. The revolution process is similar to the mutation operator in the genetic algorithm and is used to enhance the algorithm's ability to escape from local optima and to avoid premature convergence.

During the processes mentioned above, if a colony becomes better than its imperialist, their roles are exchanged. Meanwhile, if two imperialists move to similar positions (the distance between them becomes smaller than a preset threshold), the two corresponding empires are united into one. The new empire takes over all the colonies of the two previous empires and takes one of the two previous imperialists as its imperialist [4].

2.3 Competition between empires

Competition between empires is the core of the ICA. In this stage, the total cost of every empire is first calculated by Eq. (4) [4] on the basis of the costs of its imperialist and colonies, and then normalized by Eq. (5) [4]. Here, \(T.C._{n}\) and \(N.T.C._{n}\) stand for the total cost and the normalized total cost of the nth empire, respectively, and ξ is a positive fraction whose value determines the weight of the colonies' costs in the total cost.

$$T.C._{n} = {\text{Cost}}({\text{imperialist}}_{n} ) + \xi \cdot {\text{mean}}\{ {\text{Cost}}({\text{colonies}}\;{\text{of}}\;{\text{empire}}_{n} )\}$$
(4)
$$N.T.C._{n} = T.C._{n} - \mathop {\hbox{max} }\limits_{i} \{ T.C._{i} \}$$
(5)

Then, the weakest colony of the weakest empire is picked out and becomes a temporarily independent country. The other empires contend for possession of this country; each imperialist may succeed, but only one finally does. The success probability of each empire is proportional to its power, as given by Eq. (6) [4].

$$P_{{P_{n} }} = \left| {\frac{{N.T.C._{n} }}{{\sum\nolimits_{i = 1}^{{N_{\text{imp}} }} {N.T.C._{i} } }}} \right|$$
(6)

The ICA realizes the competition process described above as follows. First, the success probabilities of the empires are collected into a vector P, as in Eq. (7) [4]. Then, a vector R of the same size as P, whose elements are randomly generated in the interval [0, 1], is created as in Eq. (8) [4]. Next, a vector D is obtained by subtracting R from P, as in Eq. (9) [4]. Finally, the empire with the largest entry of D obtains the mentioned colony. This method is similar to, but faster than, the conventional roulette wheel selection used in the genetic algorithm.

$$P = \left[ {p_{{P_{1} }} ,p_{{P_{2} }} ,p_{{P_{3} }} , \ldots ,p_{{P_{{N_{\text{imp}} }} }} } \right]$$
(7)
$$R = \left[ {r_{1} ,r_{2} ,r_{3} , \ldots ,r_{{N_{\text{imp}} }} } \right]\quad {\text{where}}\quad r_{i} \sim U(0,1)\quad {\text{and}}\quad 1 \le i \le N_{\text{imp}}$$
(8)
$$D = \left[ {D_{1} ,D_{2} ,D_{3} , \ldots ,D_{{N_{\text{imp}} }} } \right]\text{ } = P - R = \left[ {p_{{P_{1} }} - r_{1} ,p_{{P_{2} }} - r_{2} ,p_{{P_{3} }} - r_{3} , \ldots ,p_{{P_{{N_{\text{imp}} }} }} - r_{{N_{\text{imp}} }} } \right]$$
(9)
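The selection of the winning empire can be written compactly; the sketch below assumes the total costs computed by Eq. (4) are collected in a list, and follows Eqs. (5)–(9) directly.

```python
import numpy as np

def select_winning_empire(total_costs, rng=np.random.default_rng()):
    """Pick the empire that takes over the weakest colony, following Eqs. (5)-(9)."""
    T = np.asarray(total_costs, dtype=float)
    NTC = T - T.max()                       # Eq. (5): normalized total costs (all <= 0)
    P = np.abs(NTC / NTC.sum())             # Eq. (6): possession probabilities
    R = rng.uniform(0.0, 1.0, size=P.size)  # Eq. (8): uniform random vector
    D = P - R                               # Eq. (9)
    return int(np.argmax(D))                # the empire with the largest entry of D wins
```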

In each generation of the ICA, the assimilation, revolution and competition processes are carried out in sequence. As the iterations proceed, weak empires steadily lose their colonies and powerful empires obtain more and more colonies. An empire that loses all its colonies collapses. The ultimate result is that only one empire is left in the population, and its imperialist is the solution returned by the algorithm. Usually, a preset number of iterations is used as the termination condition. Figure 2 illustrates the entire process of the original ICA.

Fig. 2 Process of the original ICA

3 The improved ICA combined with mutation operators

3.1 Analysis of the basic ICA

At the early stage of the iterations, the colonies of every empire are dispersed over the whole search space and can move within a wide range during the assimilation process. Meanwhile, the competition between multiple empires gives the colonies a chance to be transferred from one empire to another. These two aspects ensure the diversity of the population. Moreover, although the imperialists are the best individuals in the current population, their quality is still poor at this stage, so a colony has a high probability of becoming better than its imperialist and replacing it. Once an imperialist is replaced, the movement directions of all the colonies it controls change, which further enhances the diversity of the population. However, as the algorithm proceeds, the colonies gradually move closer and closer to their imperialists, and accordingly their movement ranges become smaller and smaller. Meanwhile, the competition process reduces the number of empires. When only one empire is left, all the colonies can only move toward the same imperialist. Obviously, at this point the diversity of the population declines greatly, and if the remaining imperialist is located in a local optimal area, the algorithm does not have enough ability to jump out of it.

From the above discussion, we can conclude that maintaining the diversity of the population is essential for enhancing the global search ability of the algorithm and avoiding premature convergence. Applying mutation operators to individuals during the iterative process is a good choice, but how to implement the mutation operators is the key question. From the description of the mechanism of the ICA, we can see that the imperialists play a key role during the search process: they are the best solutions found, and they share their location information with the other solutions. Therefore, we prefer to apply the mutation operators to them (Fig. 3). In addition to directly increasing the diversity of the population, the mutation operators applied to the imperialists also give them a self-exploration ability. In the original algorithm, the imperialists are stationary unless they are replaced by better colonies. With the mutation operators, the imperialists can take the initiative to explore new, better positions, so the algorithm gets more opportunities to keep away from stagnation.

Fig. 3 Mutation operator brings the imperialist self-exploration ability

3.2 Various mutation operators

Many different mutation operators have been proposed and applied to various meta-heuristic algorithms, such as Gaussian mutation [37–40], Cauchy mutation [37, 38, 41], Lévy mutation [42–44], exponential mutation [45], t mutation [46], chaotic mutation [47] and mixed or hybrid mutation [48, 49]. Among them, the Gaussian, Cauchy and Lévy mutations are the most widely used. Therefore, these three mutation operators are investigated in detail in our work.

Though mutation operators take various forms, they share the same basic idea: forcing the algorithm to search in new regions by using randomly generated numbers to change the positions of the current candidate solutions. How the random numbers are generated is the main difference between mutation operators, and even the same kind of mutation operator can be implemented in various specific forms. In our study, the mutation operators are applied to the basic ICA according to Eq. (10).

$$X_{i}^{{j{\prime }}} = X_{i}^{j} \cdot (1 + k \cdot N_{\text{random}} )\quad {\text{where}}\quad j = 1,2,3, \ldots ,N_{\text{var}} ,$$
(10)

where \(X_{i}^{j}\) is the jth variable of the ith imperialist, k is an additional scale parameter, and \(N_{\text{random}}\) is a random number generated according to the selected mutation operator and drawn anew for each variable. As mentioned above, three different mutation operators are investigated in our work; in them, \(N_{\text{random}}\) is generated according to different kinds of random distributions.
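A vectorized reading of Eq. (10) for a single imperialist is sketched below; here `draw` is a placeholder for whichever random-number generator (Gaussian, Cauchy or Lévy, see Sect. 3.2.1–3.2.3) is selected, and the function name is ours.

```python
import numpy as np

def mutate_vector(X, k, draw, rng=np.random.default_rng()):
    """Eq. (10): perturb every variable of imperialist X with an operator-specific random number."""
    N_random = np.array([draw(rng) for _ in range(X.size)])  # one fresh draw per variable
    return X * (1.0 + k * N_random)
```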

3.2.1 Gaussian mutation

In the Gaussian mutation, the random numbers are generated based on the Gaussian distribution, whose one-dimensional probability density function can be given as Eq. (11) [50]

$$f_{{\mu ,\sigma^{2} }} (x) = \frac{1}{{\sigma \sqrt {2\pi } }}{\text{e}}^{{ - \frac{{(x - \mu )^{2} }}{{2\sigma^{2} }}}} ,$$
(11)

where μ is the mean and \(\sigma^{2}\) is the variance. For convenience, the Gaussian distribution is denoted N(μ, σ²). In our work, the standard Gaussian distribution with μ = 0 and σ = 1, denoted N(0, 1), is used to generate the random numbers.

3.2.2 Cauchy mutation

In the Cauchy mutation, the random numbers are generated based on the Cauchy distribution, whose one-dimensional probability density function can be given as Eq. (12) [50]

$$f_{t} (x) = \frac{1}{\pi }\frac{t}{{t^{2} + (x - x_{0} )^{2} }},$$
(12)

where \(x_{0}\) is a location parameter specifying the location of the peak of the distribution and t is a scale parameter specifying the half-width at half-maximum. For convenience, the Cauchy distribution is denoted C(t, x₀). In our work, the standard Cauchy distribution with t = 1 and \(x_{0} = 0\), denoted C(1, 0), is used to generate the random numbers.

3.2.3 Lévy mutation

In the Lévy mutation, the random numbers are generated based on the Lévy distribution, which in a sense generalizes the Gaussian and Cauchy distributions. Its probability density function can be given as Eq. (13) [38, 43, 44]

$$L_{\alpha ,\gamma } = \frac{1}{\pi }\int_{0}^{\infty } {{\text{e}}^{{ - \gamma q^{\alpha } }} } \cos (qy){\text{d}}q,$$
(13)

where γ is a scaling factor satisfying γ > 0, and α satisfies 0 < α < 2 and controls the shape of the distribution. In our work, we use the Lévy distribution with γ = 1 and α = 1.3, as recommended in the literature [51], and adopt the algorithm proposed by Mantegna [52] to generate the Lévy random numbers.
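The three generators could be realized as below. The Gaussian and Cauchy draws use standard NumPy routines; the Lévy draw follows our reading of Mantegna's algorithm [52] with α = 1.3, which approximates a Lévy-stable step as u/|v|^(1/α) with a suitably scaled Gaussian u and a standard Gaussian v.

```python
import numpy as np
from math import gamma, pi, sin

def gaussian_draw(rng):
    """Standard Gaussian N(0, 1)."""
    return rng.standard_normal()

def cauchy_draw(rng):
    """Standard Cauchy C(1, 0)."""
    return rng.standard_cauchy()

def levy_draw(rng, alpha=1.3):
    """Levy random number via Mantegna's algorithm [52]."""
    sigma_u = (gamma(1 + alpha) * sin(pi * alpha / 2)
               / (gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2))) ** (1 / alpha)
    u = rng.normal(0.0, sigma_u)       # Gaussian with variance sigma_u**2
    v = rng.standard_normal()          # standard Gaussian
    return u / abs(v) ** (1 / alpha)
```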

Figure 4 compares the probability density functions of the standard Gaussian distribution, the standard Cauchy distribution and the Lévy distribution with γ = 1 and α = 1.3. It can be observed that the three distributions have different characteristics: the standard Gaussian distribution tends to generate random numbers close to 0, the standard Cauchy distribution tends to generate random numbers farther from 0, and the Lévy distribution lies between the two. Therefore, the three mutation operators can be expected to produce different effects.

Fig. 4 Comparisons between the probability density functions of the standard Gaussian distribution, the standard Cauchy distribution and the Lévy distribution (γ = 1, α = 1.3)

As mentioned previously, in our work the mutation operators are applied to the imperialists. Specifically, in every iteration, the selected mutation operator is applied to every imperialist, dimension by dimension: the elements of the imperialist are updated one by one according to Eq. (10), and if a new element exceeds its domain, it is restricted to the search boundary. After the update, the newly obtained solution is evaluated immediately, and a greedy strategy decides whether to accept it: if the new solution is better than the previous one, the update is accepted; otherwise, it is abandoned.

Figure 5 shows the pseudo-code of the mutation procedure. In our method, this procedure is inserted between step 8 and step 9 in Fig. 2.

Fig. 5 Pseudo-code for applying the mutation operators
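The complete per-iteration procedure just described (mutate each imperialist element by element, restrict the result to the bounds, and accept the mutant only if it improves the cost) might look like the following sketch, which reuses the empire dictionaries and `draw` generators from the earlier snippets; all names are our own.

```python
import numpy as np

def apply_mutation(empires, cost_fn, k, draw, low, high, rng=np.random.default_rng()):
    """Mutate every imperialist by Eq. (10), clip to the bounds, and keep the mutant only if better."""
    for emp in empires:
        imp = emp["imperialist"]
        old_cost = cost_fn(imp)
        trial = np.array([x * (1.0 + k * draw(rng)) for x in imp])  # element-by-element update
        trial = np.clip(trial, low, high)                           # restrict to the search boundaries
        if cost_fn(trial) < old_cost:                               # greedy acceptance
            emp["imperialist"] = trial
    return empires
```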

4 Numerical experiments, results and discussion

The three different mutation operators produce three different improved algorithms. For convenience, in the following paragraphs, the basic ICA, the improved ICA with Gaussian mutation, the improved ICA with Cauchy mutation and the improved ICA with Lévy mutation will be abbreviated as BICA, IICA-G, IICA-C and IICA-L, respectively.

4.1 Benchmark functions

In order to demonstrate the improvement, the four algorithms are tested and compared on nine widely used minimization benchmark functions. The dimension of every function is set to 30 in our study. The names, formulas, variable ranges and theoretical optimal values of these benchmark functions are listed in Table 1. The functions have different characteristics: F1, F2, F4, F5 and F6 are unimodal; F5 is discontinuous; F7, F8 and F9 are multimodal, and their number of local minima increases exponentially with the problem dimension [37].

Table 1 Benchmark functions used in the experimental study

4.2 Experiments settings

All four algorithms are implemented in MATLAB R2012a on a PC with a Pentium 4 CPU at 2.9 GHz, 4 GB RAM and the Windows 7 Ultimate operating system. On every function, every algorithm is run for 30 independent trials to eliminate the influence of the randomness of the algorithms. The maximum number of fitness function evaluations (MAX_NFFEs) is used as the termination condition; every trial ends when the MAX_NFFEs is reached. The MAX_NFFEs is set to 150,000 for F1, F5, F6 and F8, 200,000 for F2 and F7, 500,000 for F3 and F4, and 300,000 for F9, following the relevant literature [37, 53].

For fairness, all the common parameters of the four algorithms are set to the same values:

  1. The number of initial countries is set to 100.

  2. The number of initial empires is set to 3.

  3. The revolution rate is initialized to 0.3 and declines exponentially with the number of iterations, as given by Eq. (14) (see the sketch after this list).

     $${\text{revolution}}\;{\text{rate}} = 0.3 \times 0.99^{{{\text{iteration}}\;{\text{number}}}}$$
     (14)

  4. The ξ in Eq. (4) is set to 0.02.
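A minimal sketch of these shared settings, including the exponentially decaying revolution rate of Eq. (14); the dictionary layout and names are our own.

```python
COMMON_SETTINGS = {
    "n_countries": 100,   # number of initial countries
    "n_empires": 3,       # number of initial empires
    "xi": 0.02,           # weight of colony costs in Eq. (4)
}

def revolution_rate(iteration):
    """Eq. (14): the revolution rate starts at 0.3 and decays exponentially with the iteration count."""
    return 0.3 * 0.99 ** iteration
```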

The number of initial countries and the number of initial empires were chosen through extensive experiments; these two parameters greatly influence the quality of the obtained results. If they are set too small, the diversity of the population cannot be guaranteed, so the algorithms are easily trapped in local optima and ultimately obtain poor results. On the other hand, if they are set too large, the fitness evaluations are exhausted very quickly and the algorithms again cannot obtain satisfactory results. The revolution rate and ξ are set according to the suggestions given by the original authors; these two parameters also influence the performance of the algorithms, but their effects are comparatively small.

The scale factor k in the mutation operators is set as follows:

  1. For the Gaussian mutation, the standard Gaussian distribution is used and the scale factor k is set to 0.5.

  2. For the Cauchy mutation, the standard Cauchy distribution is used and the scale factor k is set to 0.1.

  3. For the Lévy mutation, α is set to 1.3 and the scale factor k is set to 0.2.

The values of the scale factor k were also determined through experiments. On the whole, a larger value of k means that the position of an imperialist changes more after the mutation operator is applied. We tested the improved algorithms with different values of k on these functions and finally settled on the values listed above. The influence of the value of k is illustrated further below.

It should be pointed out that the parameters used here are effective but not the best for every specific function. In our preliminary experiments, we found that for some functions other parameter values give better results. Without loss of generality, fixed parameters are used in the tests on all functions; we have not made further effort to find the best parameter settings for every specific function, as that is beyond the scope of this article.

4.3 Influence of the scale factor

For meta-heuristic algorithms, parameter values greatly influence performance. The proposed approach introduces a new parameter, the scale factor k. To investigate the effect of this parameter on the performance of the proposed approach, we tested the proposed algorithms with different values of k; the results obtained on two benchmark functions, the Sphere function and the Griewank function, are given here. Because of the different characteristics of the three mutation operators, the IICA-G is tested with k = 0.25, k = 0.5 and k = 0.75, and the IICA-C and the IICA-L are tested with k = 0.1, k = 0.2 and k = 0.3. For each value of k, every algorithm is run 20 times on each function, and the results are presented in Tables 2, 3 and 4, respectively.

Table 2 Comparisons of the results of the IICA-G with different values of k on the Sphere function and the Griewank function
Table 3 Comparisons of the results of the IICA-C with different values of k on the Sphere function and the Griewank function
Table 4 Comparisons of the results of the IICA-L with different values of k on the Sphere function and the Griewank function

Tables 2, 3 and 4 show that the value of the scale factor k noticeably influences the obtained results. On the Sphere function, the results improve as the scale factor increases. On the Griewank function the situation is different: the largest values of k did not produce the best results. On the other functions, which are not listed here, the value of k also affected the quality of the results. Based on an overall consideration of the results obtained on these functions, we set the scale factor to 0.5 for the IICA-G, 0.1 for the IICA-C and 0.2 for the IICA-L, to ensure good overall performance of the proposed algorithms.

4.4 Results and discussion

The statistical results of the four algorithms are arranged in Table 5, where 'Min (best),' 'Mean,' 'Median,' 'Max (worst)' and 'Std' represent the best result, the average of all results, the median of all results, the worst result and the standard deviation of all results of the corresponding algorithm, respectively; 'Run time' denotes the average computation time over the 30 trials for every function, reflecting the computational complexity of the algorithms. The best results are highlighted in bold.

Table 5 Comparisons of the results obtained by the BICA, the IICA-G, the IICA-C and the IICA-L on the tested benchmark functions

From Table 5, it can be observed that all three mutation operators bring effective improvements to the basic algorithm. On every tested function, the results obtained by the IICA-G, the IICA-C and the IICA-L are closer to the theoretical optimal value than those obtained by the BICA. The three improved algorithms also show better stability and robustness: their results are more concentrated than those of the BICA. For instance, on F1, the results obtained by the BICA vary from 2.2725e−002 (the best result) to 2.0000e+004 (the worst result), a relatively large range, while the results obtained by the three improved algorithms are much more concentrated. A similar situation can also be observed on F2, F3, F5, F7 and F9.

There are some differences between the results obtained by the three improved algorithms. On F1, F2 and F4, the results obtained by the IICA-G are significantly better than those obtained by the IICA-C and the IICA-L; on F3, F5 and F6, the results obtained by the IICA-C are much better; on F7, all three improved algorithms reach the theoretical optimal value and the differences between their results are very slight, since they lie within the same order of magnitude; on F8, the results of the three improved algorithms are also of the same order of magnitude, with the IICA-C slightly better. Only on F9 is there no difference between the final results of the three algorithms: all of them obtained the theoretical optimal value in every run.

Meanwhile, although the three improved algorithms outperform the BICA on every function, the size of the improvement varies from function to function. On F1, F2, F3, F4, F5, F8 and F9, the improvements are significant, while on F6 and F7 they are less pronounced.

Furthermore, Table 5 also shows that the mutation operators increase the computation time while enhancing the performance of the BICA; however, the increases are slight and acceptable.

To further study the differences among the behaviors of the four algorithms, their convergence curves on every function are given in Fig. 6. Each convergence curve is obtained by averaging, over the 30 runs, the variation of the best obtained result with the NFFEs.

Fig. 6 Convergence curves of the four algorithms on every function

From Fig. 6, it can be observed that there are significant differences between the convergence curves of the four algorithms. Compared with the curves of the BICA, the curves of the three improved algorithms are steeper, indicating faster convergence. On F1–F6, the curves of the BICA flatten before the MAX_NFFEs is reached, indicating that the BICA has become trapped in a local optimum, while the curves of the three improved approaches decline steadily throughout the whole solving process. It can be expected that even if the MAX_NFFEs were increased, the BICA could not improve its solutions further, whereas the three improved approaches with mutation operators could. On F7 and F8, the curves of all four algorithms flatten before the MAX_NFFEs is reached; however, the curves of the three improved algorithms converge faster and reach more accurate solutions. On F9, the curve of the BICA flattens soon after the trials begin, while the curves of the three improved algorithms decline steadily; the curve of the IICA-G converges to the theoretical optimal value after about 50,000 NFFEs, and the curves of the IICA-C and the IICA-L converge to the theoretical optimal value after about 150,000 and 120,000 NFFEs, respectively.

Meanwhile, on every function except F5 and F6, the IICA-G shows the fastest convergence rate, the IICA-L comes second, and the IICA-C is the slowest. On F5 and F6, the situation is reversed: the IICA-C shows the fastest convergence rate, while the IICA-G shows the slowest.

As mentioned above, the improvements brought by the three mutation operators are significant on some functions but less so on others, and the three improved algorithms behave differently on different functions. This is caused by the different characteristics of the functions. Although meta-heuristic algorithms are general-purpose methods, a specific meta-heuristic algorithm cannot suit every benchmark function; in other words, the inherent characteristics of a problem greatly influence the performance of a given algorithm.

4.5 Comparison with other works

In this section, we compare the proposed approach with two other works. The first comparison algorithm is the imperialist competitive algorithm combined with chaos (CICA) proposed in the literature [34], used here as a representative of previous improvements of the basic ICA; the second is the real-coded genetic algorithm with random-transfer-vector-based mutation (RCGA-RTVM) proposed in the literature [54], used here as a representative of other meta-heuristic algorithms owing to its outstanding performance demonstrated in [54].

In the literature [34], the CICA is tested on four 10-dimensional benchmark functions. To compare the proposed IICAs with the CICA, we also test the IICAs on these four functions. The results given in [34] are taken directly into the comparison, and every proposed improved ICA is run for 30 independent trials. For fairness, we use the same number of iterations and the same search ranges as in [34]; the other parameters are the same as those used in Sect. 4.2. The results are arranged in Table 6, with the best results highlighted in bold.

Table 6 Comparisons of the results obtained by the CICA [34], the IICA-G, the IICA-C and the IICA-L on four different functions

From Table 6, it can be clearly observed that on the Griewank function and the Rosenbrock function, the CICA performs better than the three IICAs proposed in this work, yielding more accurate results. On the Rastrigin function and the Ackley function, however, the IICAs obtained better results: on the Rastrigin function, the three improved algorithms proposed in this work obtained the optimal value in every trial, and on the Ackley function, the three IICAs also obtained better results and showed better stability.

In the literature [54], the RCGA-RTVM is tested on four 30-dimensional functions and one 2-dimensional function. For the comparison, we also tested the IICAs on these five functions. Each IICA is run for 30 independent trials on each function, and the MAX_NFFEs of each trial, the search range and the number of initial solutions are the same as those used in [54]. The other parameters of the IICAs are the same as those presented in Sect. 4.2. The results of the comparison are arranged in Table 7, where the results of the RCGA-RTVM are taken directly from [54] and the best results are highlighted in bold.

Table 7 Comparisons of the results obtained by the RCGA-RTVM [54], the IICA-G, the IICA-C and the IICA-L on five different functions

From Table 7, it can be seen that on the Rosenbrock function and the Step function, the improved ICAs obtained better results, which are closer to the theoretical optimal value. On the Schwefel 1.2 function and the Schwefel 2.21 function, the RCGA-RTVM performs better than the three IICAs and obtains significantly more accurate results. On the Six-hump camel-back function, which is relatively simple, all four algorithms perform satisfactorily, with no significant difference between their results.

In summary, the proposed IICAs perform better than the CICA and the RCGA-RTVM on some functions and worse on others. Each algorithm has its own strengths and is better suited to some problems. In practice, when a meta-heuristic algorithm is used to deal with a problem, the problem is usually treated as a black box whose characteristics are unknown, so different algorithms are tried until a satisfactory solution is obtained. In such cases, the improved algorithms proposed in this work can be used as additional options.

4.6 An application of the proposed algorithms

In this section, we use a real-world optimization case to demonstrate the applicability of the proposed algorithms. The problem is a classical benchmark that aims to minimize the total weight of a simple gear box, shown in Fig. 7. It involves seven design variables: the face width b (\(x_{1}\)), the module of the teeth m (\(x_{2}\)), the number of teeth on the pinion z (\(x_{3}\)), the length of the first shaft between bearings \(l_{1}\) (\(x_{4}\)), the length of the second shaft between bearings \(l_{2}\) (\(x_{5}\)), the diameter of the first shaft \(d_{1}\) (\(x_{6}\)) and the diameter of the second shaft \(d_{2}\) (\(x_{7}\)). Moreover, the problem involves seven nonlinear and four linear constraints, which limit the bending stress of the gear teeth, the surface stress, the transverse deflections of shafts 1 and 2 due to the transmitted force, and the stresses in shafts 1 and 2. Because the third design variable (the number of teeth on the pinion) must be an integer, this is a mixed integer–continuous constrained problem. The mathematical formulation of the problem can be summarized as follows [55]:

Fig. 7 Structure of the gear box to be optimized

$$\begin{aligned} & f(x) = 0.7854x_{1} x_{2}^{2} (3.3333x_{3}^{2} + 14.9334x_{3} - 43.0934) \\ & \quad \quad - \,1.508x_{1} (x_{6}^{2} + x_{7}^{2} ) + 7.4777(x_{6}^{3} + x_{7}^{3} ) \\ & \quad \quad + \,0.7854(x_{4} x_{6}^{2} + x_{5} x_{7}^{2} ) \\ & {\text{Subject}}\;{\text{to:}} \\ & g_{1} (x) = \frac{27}{{x_{1} x_{2}^{2} x_{3} }} - 1\,\le\,0 \\ & g_{2} (x) = \frac{397.5}{{x_{1} x_{2}^{2} x_{3}^{2} }} - 1\,\le\,0 \\ & g_{3} (x) = \frac{{1.93x_{4}^{3} }}{{x_{2} x_{3} x_{6}^{4} }} - 1\,\le\,0 \\ & g_{4} (x) = \frac{{1.93x_{5}^{3} }}{{x_{2} x_{3} x_{7}^{4} }} - 1\,\le\,0 \\ & g_{5} (x) = \frac{{\sqrt {\left( {\frac{{745x_{4} }}{{x_{2} x_{3} }}} \right)^{2} + 16.9e6} }}{{110x_{6}^{3} }} - 1 \le 0 \\ & g_{6} (x) = \frac{{\sqrt {\left( {\frac{{745x_{5} }}{{x_{2} x_{3} }}} \right)^{2} + 157.5e6} }}{{85x_{7}^{3} }} - 1 \le 0 \\ & g_{7} (x) = \frac{{x_{2} x_{3} }}{40} - 1 \le 0 \\ & g_{8} (x) = \frac{{5x_{2} }}{{x_{1} }} - 1 \le 0 \\ & g_{9} (x) = \frac{{x_{1} }}{{12x_{2} }} - 1 \le 0 \\ & g_{10} (x) = \frac{{1.56x_{6} + 1.9}}{{x_{4} }} - 1 \le 0 \\ & g_{11} (x) = \frac{{1.56x_{7} + 1.9}}{{x_{5} }} - 1 \le 0 \\ & 2.6 \le x_{1} \le 3.6,\;0.7 \le x_{2} \le 0.8,\;17 \le x_{3} \le 28, \\ & 7.3 \le x_{4} \le 8.3,\;7.3 \le x_{5} \le 8.3,\;2.9 \le x_{6} \le 3.9, \\ & 5.0 \le x_{7} \le 5.5 \\ \end{aligned}$$

To handle the constraints of this problem, the penalty function method is adopted. This is a commonly used constraint-handling method that transforms a constrained problem into an unconstrained one with the aid of a user-defined penalty function. In this work, the penalty function is defined as Eq. (15),

$$F(x) = f(x) +\Delta \cdot \sum\limits_{i = 1}^{N} {\hbox{max} (0,g_{i} (x))^{2} } ,$$
(15)

where F(x) is the fitness value of a solution x in the transformed problem; f(x) is the original objective function value; N is the total number of constraints; and \(\Delta\) is the penalty factor, which is set to 1e20 in our work.
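Under the formulation given above, the penalized fitness of Eq. (15) can be sketched as follows; the constraint list mirrors g1–g11 from the problem statement and Δ = 1e20, with x given in the order (x1, ..., x7). The function names are our own.

```python
import numpy as np

def gearbox_objective(x):
    """Total weight of the gear box, f(x) from the formulation above."""
    x1, x2, x3, x4, x5, x6, x7 = x
    return (0.7854 * x1 * x2**2 * (3.3333 * x3**2 + 14.9334 * x3 - 43.0934)
            - 1.508 * x1 * (x6**2 + x7**2)
            + 7.4777 * (x6**3 + x7**3)
            + 0.7854 * (x4 * x6**2 + x5 * x7**2))

def gearbox_constraints(x):
    """Constraints g1-g11, each written so that g_i(x) <= 0 when satisfied."""
    x1, x2, x3, x4, x5, x6, x7 = x
    return np.array([
        27.0 / (x1 * x2**2 * x3) - 1,
        397.5 / (x1 * x2**2 * x3**2) - 1,
        1.93 * x4**3 / (x2 * x3 * x6**4) - 1,
        1.93 * x5**3 / (x2 * x3 * x7**4) - 1,
        np.sqrt((745 * x4 / (x2 * x3))**2 + 16.9e6) / (110 * x6**3) - 1,
        np.sqrt((745 * x5 / (x2 * x3))**2 + 157.5e6) / (85 * x7**3) - 1,
        x2 * x3 / 40 - 1,
        5 * x2 / x1 - 1,
        x1 / (12 * x2) - 1,
        (1.56 * x6 + 1.9) / x4 - 1,
        (1.56 * x7 + 1.9) / x5 - 1,
    ])

def penalized_fitness(x, delta=1e20):
    """Eq. (15): objective plus a quadratic penalty on every violated constraint."""
    g = gearbox_constraints(x)
    return gearbox_objective(x) + delta * np.sum(np.maximum(0.0, g) ** 2)
```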

The BICA, the IICA-G, the IICA-C and the IICA-L are tested on this problem and then compared with four other algorithms. Because the dimension of this problem is relatively low, the number of countries is set to 25, the number of imperialists is set to 2, the maximum number of fitness function evaluations is set to 20,000, and every algorithm is run for 10 independent trials. The results of the four algorithms are arranged in Table 8.

Table 8 Comparisons of the results obtained by the BICA, the IICA-G, the IICA-C and the IICA-L on the gear box design problem

As a classical benchmark case, this problem has been solved by many different algorithms. In our study, we compare the best solutions obtained by the three improved ICAs with those obtained by four existing algorithms: the society and civilization algorithm (SCA) [56], the (μ + λ) evolution strategy ((μ + λ)-ES) [57], the artificial bee colony algorithm (ABC) [55] and the cuckoo search algorithm (CS) [58]. The parameter and constraint values of the best solutions obtained by the different algorithms are arranged in Table 9.

Table 9 Comparisons of the parameter and constraint values of the best solutions obtained by the SCA [56], the (μ + λ)-ES [57], the ABC [55], the CS [58], the IICA-G, the IICA-C and the IICA-L for the gear box design problem

From Table 9, it can be observed that the results obtained by the three improved ICAs are better than those obtained by the other algorithms. Although all the best solutions listed in the table are feasible, since no constraint is violated, the objective function values of the solutions obtained by the IICA-G, the IICA-C and the IICA-L are the smallest. This case suggests that the proposed algorithms have good applicability.

5 Conclusion and future work

In this work, an improved method for enhancing the performance of the imperialist competitive algorithm is proposed by introducing mutation operators into the original algorithm. In the proposed approach, the mutation operators are applied to the imperialists to change their behavior and thereby obtain better performance. The approach is simple in structure and easy to implement. Based on the proposed method, three improved algorithms are obtained by using three different mutation operators, the Gaussian mutation, the Cauchy mutation and the Lévy mutation, and are investigated in detail through experiments. The experimental results suggest that they effectively improve the performance of the original algorithm: compared with the original ICA, the three improved algorithms have faster convergence rates, better global search ability and better stability, without dramatically increasing the computation time.

Meanwhile, the three improved algorithms are also compared with two other strong algorithms. The comparative results suggest that the proposed algorithms have their own advantages and obtain better results than the comparison algorithms on some functions. Therefore, the proposed algorithms can serve as an additional option in practical applications.

Finally, the three improved algorithms are tested on a real-world optimization case and compared with four existing algorithms. The results suggest that the proposed algorithms have good applicability: all three variants with different mutation operators obtain better solutions than the four comparison algorithms.

This work suggests that changing the behavior of the imperialists can enhance the performance of the original algorithm. Therefore, more attention should be paid to the behavior of the imperialists in future work. In addition, the proposed improved algorithms should be applied to more real-world optimization problems.