1 Introduction

Many real-world application problems in engineering, science and technology can be formulated as continuous optimization problems (CnOPs) [1,2,3,4,5]. Metaheuristics (e.g., simulated annealing (SA), evolutionary algorithms (EAs), differential evolution (DE), particle swarm optimization (PSO), ant colony optimization (ACO), estimation of distribution algorithms (EDAs), etc.) are a family of optimization techniques that have seen increasingly rapid development and have been applied to CnOPs over the past few years [6]. Although these approaches have shown excellent search abilities when applied to 30–100 dimensional problems, many of them suffer from the “curse of dimensionality”, which means that their performance deteriorates quickly as the dimensionality of the search space increases [6]. The complexity of a problem usually grows with its size, and the solution space grows exponentially with the problem size. Thus, more efficient search strategies are required to solve large scale CnOPs. Zhang et al. proposed a specially tailored EA based on a decision variable clustering method [7] and an approximate non-dominated sorting for evolutionary many-objective optimization [8]. Historically, scaling EAs to large problems has attracted much interest, in both theoretical and practical studies. The earliest practical approach might be the parallelization of an existing EA. Later, cooperative coevolution appeared as another promising method.

Ant colony optimization is inspired by the foraging behavior of ants and was first applied to solve discrete optimization problems [4, 9, 10]. Socha and Dorigo [5] extended ACO to continuous domains, calling the resulting algorithm ACO\(_\mathrm{R}\). The fundamental idea underlying ACO for continuous domains is the shift from using a discrete probability distribution to using a continuous one, that is, a probability density function (PDF). It uses a solution archive as a form of pheromone model for the derivation of a probability distribution over the search space. However, ACO\(_\mathrm{R}\) concentrates mainly on small-scale CnOPs, and for larger or multimodal CnOPs, the results obtained by ACO\(_\mathrm{R}\) are far from competitive with those obtained by other algorithms.

In this paper, to solve larger CnOPs efficiently, several variants of ant colony optimization (ACO\(_\mathrm{R}\)) with crossover operations are proposed, whose aim is to maintain a proper trade-off between diversification and intensification. It is also well known that hybridization of EAs with other techniques can greatly improve search efficiency [7]. In the proposed algorithms, an operation similar to the crossover in the GA is introduced to generate new PDFs in the promising space, which aims at balancing diversification and intensification. As a result, the pheromone information is enhanced in the promising space, and the global optima can be found efficiently. Additionally, the crossover operation helps the ant colony exploit the correlation information among the design variables.

The proposed algorithms are evaluated using 21 test functions whose dimensions range from 30 to 1000. We compare the results with other continuous optimization methods in the literature. The results show that the crossover operators effectively improve the performance of ACO\(_\mathrm{R}\) and that the proposed algorithms perform better than the compared algorithms.

2 ACO with crossover operators for continuous optimization

2.1 ACO\(_\mathrm{R}\)

One of the first attempts to apply an ant-related algorithm to CnOPs is continuous ACO (CACO) [11]. In CACO, the notion of the nest is introduced, but CACO does not perform an incremental construction of solutions, which is one of the main characteristics of the ACO metaheuristic. Another ant-related approach to CnOPs is the API algorithm [12], in which the ants perform their search independently, but starting from the same nest. A third ant-based approach to CnOPs is the continuous interacting ant colony (CIAC) [13]. Several other ant-inspired algorithms for CnOPs have also been proposed [14,15,16]. However, as explained in Socha and Dorigo [5], most of these algorithms use search mechanisms different from those used in the ACO metaheuristic [15]. The first algorithm that can be classified as an ACO algorithm for continuous domains is ACO\(_\mathrm{R}\) [5]. In ACO\(_\mathrm{R}\), the discrete probability distributions used in the solution construction by ACO algorithms for combinatorial optimization are replaced by PDFs. ACO\(_\mathrm{R}\) uses a solution archive for the derivation of these PDFs over the search space. Additionally, ACO\(_\mathrm{R}\) uses sums of weighted Gaussian functions to generate multimodal PDFs.

ACO\(_\mathrm{R}\) initializes the solution archive with k solutions that are generated uniformly at random. Each solution is a D-dimensional vector with real-valued components \(x_i \in \left[ {x_{\min } ,x_{\max } } \right] \), with \(i=1\), \(2,\ldots , D\).

In this paper, we assume that the optimization problems are unconstrained except possibly for bound constraints on the D real-valued variables \(x_{i}\). The k solutions of the archive are kept sorted according to their quality (from best to worst), and each solution \(S_{j}\) has an associated weight \(\omega _j \). This weight \(\omega _j \) is calculated using a Gaussian function as [15]:

$$\begin{aligned} \omega _j =\frac{1}{qk\sqrt{2\pi }}\, e^{-\frac{\left( {rank(j)-1} \right) ^{2}}{2\left( {qk} \right) ^{2}}} \end{aligned}$$
(1)

where rank(j) is the rank of solution \(S_{j}\) in the sorted archive, and q is a parameter of the algorithm. According to Eq. (1), the best solution receives the highest weight. The weights are used to choose probabilistically a guiding solution around which a new candidate solution is generated. The probability of choosing solution \(S_{j}\) as the guiding solution is given by Eq. (2) [17]:

$$\begin{aligned} p_j =\frac{\omega _j }{\sum _{r=1}^k {\omega _r } } \end{aligned}$$
(2)

Hence, the better the solution, the higher the probability of choosing it. Once a guiding solution \(S_{guide}\) is chosen, the algorithm samples the neighborhood of the i-th real-valued component \(s_{guide}^i \) of the guiding solution using a Gaussian PDF with mean \(\mu _{guide}^i =s_{guide}^i \) and standard deviation \(\sigma _{guide}^i \) equal to

$$\begin{aligned} \sigma _{guide}^i =\xi \sum _{r=1}^k {\frac{\left| {s_r^i -s_{guide}^i } \right| }{k-1}} \end{aligned}$$
(3)

which is the average distance between the value of the i-th component of \(S_{guide}\) and the values of the i-th components of the other solutions in the archive, multiplied by the parameter \(\xi \) [17]. The process of choosing a guiding solution and generating a candidate solution is repeated a total of \(N_{a}\) times (corresponding to the number of “ants”) per iteration. Before the next iteration, the algorithm updates the solution archive, keeping only the best k of the \(k+N_{a}\) solutions that are available after the solution construction process.
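To make the construction step concrete, the following is a minimal NumPy sketch of Eqs. (1)–(3), assuming the archive is stored as a (k, D) array sorted from best to worst; the function name, signature and the use of NumPy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def construct_solutions(archive, q, xi, n_ants, rng):
    """Sketch of one ACO_R construction step (Eqs. (1)-(3)).

    archive: (k, D) array of solutions, assumed sorted from best to worst.
    Returns an (n_ants, D) array of newly sampled candidate solutions.
    """
    k, D = archive.shape
    ranks = np.arange(1, k + 1)                      # rank(j); the best solution has rank 1
    # Eq. (1): Gaussian weights over the ranks
    w = np.exp(-(ranks - 1) ** 2 / (2 * (q * k) ** 2)) / (q * k * np.sqrt(2 * np.pi))
    p = w / w.sum()                                  # Eq. (2): selection probabilities

    samples = np.empty((n_ants, D))
    for a in range(n_ants):
        guide = rng.choice(k, p=p)                   # pick the guiding solution
        mu = archive[guide]                          # means are its components
        # Eq. (3): average per-component distance to the other archive members, scaled by xi
        sigma = xi * np.abs(archive - mu).sum(axis=0) / (k - 1)
        samples[a] = rng.normal(mu, sigma)           # sample each component independently
    return samples
```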

Fig. 1 Updating procedure of PDFs of the COACO\(_\mathrm{R}\)

2.2 The scheme of ACO with crossover operators

Establishing a balance between exploration and exploitation is important for a global search algorithm. Some crossover operators can establish an adequate balance between exploration and exploitation, and generate offspring in the exploration and exploitation zones in the correct proportion. For this reason, crossover is introduced into ACO. Figure 1 presents the outline of ACO\(_\mathrm{R}\) with crossover operators, COACO\(_\mathrm{R}\). As shown in Fig. 1, some of the PDFs are generated by the crossover operations, while the others come from the original ACO\(_\mathrm{R}\) procedure. In the proposed algorithm, the crossover operation from the genetic algorithm is employed to improve the search ability of the ant colony by enhancing the pheromone distribution in the promising search space. A parameter \(n_{co}\), the number of crossover operations, denotes the degree of crossover. As a result, the set of PDFs becomes more diverse and effective, and the ants can find good solutions efficiently with the improved PDF set.

As for the procedure of the COACO\(_\mathrm{R}\), the PDF set of the ant colony is initially filled with randomly generated Gaussian functions. The algorithm then iteratively updates the PDF set. Each iteration includes two phases: in the first phase, solutions are constructed according to the PDFs, and in the second phase, the PDF set is updated. In the proposed algorithm, the PDF set consists not only of the PDFs generated by the original ACO\(_\mathrm{R}\) but also of the PDFs generated by the crossover operation. First, m new solutions are built from the PDF set, each ant constructing one solution independently. Second, k PDF vectors are generated directly from the best k solutions. Meanwhile, 2\(n_{co}\) new PDF vectors are generated by applying the crossover operations to the newly generated PDFs. After both kinds of PDFs have been built, the PDF set is updated by adding the 2\(n_{co}\) new PDF vectors and removing the same number of worst PDF vectors. The outline of the proposed algorithm is shown in Fig. 2.

Fig. 2 Pseudo code of the COACO\(_\mathrm{R}\)
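Because the pseudo code of Fig. 2 is not reproduced in this text, the following is a minimal sketch of one possible reading of the loop described above. It reuses the construct_solutions sketch from Sect. 2.1 and assumes a generic crossover(p1, p2, rng) returning two children; all names, default parameter values and the exact bookkeeping of the PDF set (here simplified to "keep the best k of all available solutions") are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def coaco_r(f, lower, upper, crossover, k=50, n_ants=2, n_co=5,
            q=1e-4, xi=0.85, max_evals=100_000, seed=0):
    """Sketch of the COACO_R iteration: ACO_R sampling plus crossover-generated PDFs."""
    rng = np.random.default_rng(seed)
    D = len(lower)
    # the PDF set is represented by the solution archive (each row is one Gaussian kernel)
    archive = rng.uniform(lower, upper, size=(k, D))
    fitness = np.array([f(x) for x in archive])
    evals = k
    while evals < max_evals:
        order = np.argsort(fitness)                    # keep the archive sorted, best first
        archive, fitness = archive[order], fitness[order]

        # phase 1: each ant builds one solution from the current PDF set
        new_points = construct_solutions(archive, q, xi, n_ants, rng)

        # crossover: 2*n_co extra kernel centres built from randomly paired archive members
        for _ in range(n_co):
            i, j = rng.choice(k, size=2, replace=False)
            c1, c2 = crossover(archive[i], archive[j], rng)
            new_points = np.vstack([new_points, c1, c2])

        new_points = np.clip(new_points, lower, upper)
        new_fitness = np.array([f(x) for x in new_points])
        evals += len(new_points)

        # phase 2: update the PDF set, keeping the best k and discarding the worst
        archive = np.vstack([archive, new_points])
        fitness = np.concatenate([fitness, new_fitness])
        keep = np.argsort(fitness)[:k]
        archive, fitness = archive[keep], fitness[keep]
    return archive[0], fitness[0]
```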

3 Crossover operators

As mentioned above, the crossover operator plays an important role in improving the performance of ACO\(_\mathrm{R}\). Three crossover methods (BLX-\(\alpha \) [18], UNDX [19] and PNX [20]), which are commonly used in genetic algorithms, are employed in COACO\(_\mathrm{R}\). Based on the different crossover methods, the resulting COACO\(_\mathrm{R}\) variants are named ACO\(_{BLX}\), ACO\(_{UNDX}\), and ACO\(_{PNX}\), respectively.

3.1 BLX-\(\alpha \)

Blend crossover (BLX-\(\alpha )\) is a well-known crossover operator proposed by Eshelman and Schaffer [18]. For two parent solutions:

$$\begin{aligned}&{P} _1 =(P_{1,1} ,P_{1,2} ,\ldots ,P_{1,n} ),\\&{P} _2 =(P_{2,1} ,P_{2,2} ,\ldots ,P_{2,n} ),\\&\min _i =\min \{P_{1,i} ,P_{2,i} \},\\&\max _i =\max \{P_{1,i} ,P_{2,i} \}, i=1,2,\ldots ,n,\\&I=\max _i -\min _i . \end{aligned}$$

BLX-\(\alpha \) randomly picks each component value in the range \([\min _i -\alpha I, \max _i +\alpha I]\). Thus, if u is a random number between 0 and 1, an offspring \(\mathbf {C}=(C_1 ,C_2 ,\ldots ,C_n )\) is created as follows:

$$\begin{aligned} C_i =(1-\gamma )P_{1,i} +\gamma P_{2,i} \end{aligned}$$
(4)

where \(\gamma =(1+2\alpha )u-\alpha \). BLX-\(\alpha \) has an interesting property: the location of the child solution depends on the difference in parent solutions.
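A minimal sketch of BLX-\(\alpha \) as given by Eq. (4); here a separate u (and hence \(\gamma \)) is drawn for each component, which is one common convention, and the function name and NumPy usage are assumptions.

```python
import numpy as np

def blx_alpha(p1, p2, alpha=0.5, rng=None):
    """BLX-alpha (Eq. (4)): each child component lies in [min_i - alpha*I, max_i + alpha*I]."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(len(p1))                 # one uniform number per component
    gamma = (1 + 2 * alpha) * u - alpha     # gamma ranges over [-alpha, 1 + alpha]
    return (1 - gamma) * p1 + gamma * p2    # Eq. (4), applied component-wise
```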

3.2 UNDX

In the unimodal normal distribution crossover (UNDX) [19], multiple parents are used to create two or more offspring solutions around the center of mass of these parents. A small probability is assigned to solutions away from the center of mass. UNDX crossover is formulated as follows:

$$\begin{aligned} \begin{array}{l} c_1 =m+z_1 e_1 +\sum \limits _{k=2}^n {z_k e_k } , \\ c_2 =m-z_1 e_1 -\sum \limits _{k=2}^n {z_k e_k } , \\ \end{array} \end{aligned}$$
(5)
$$\begin{aligned}&m=(p_1 +p_2 )/2,\\&z_1 \sim N(0,\sigma _1^2 ), \quad z_k \sim N(0,\sigma _2^2 )\;(k=2,\ldots ,n),\\&\sigma _1 =\alpha d_1 , \quad \sigma _2 =\beta d_2 /\sqrt{n},\\&e_1 =(p_2 -p_1 )/|p_2 -p_1 |,\quad e_i \bot e_j \;(i,j=1,\ldots ,n,\;i\ne j) \end{aligned}$$

Here, n is the dimension of the variable vector; \(p_{1}\) and \(p_{2}\) are a pair of parents; \(d_{2}\) is the distance from \(p_{3}\) (a third parent selected uniformly at random from the mating pool) to the line connecting \(p_{1}\) and \(p_{2}\), and \(d_{1}={\vert }p_{1}-p_{2}{\vert }\). \(\alpha \) and \(\beta \) are parameters defined by the user. The recommended settings are \(\alpha =0.5\) and \(\beta = 0.35\), respectively.
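A minimal sketch of UNDX following Eq. (5). The orthonormal basis \(e_2, \ldots , e_n\) is not built explicitly; instead, an isotropic Gaussian vector is projected onto the subspace orthogonal to \(e_1\), which is statistically equivalent. The function name, NumPy usage and the assumption that the two primary parents do not coincide are illustrative, not taken from the original implementation.

```python
import numpy as np

def undx(p1, p2, p3, alpha=0.5, beta=0.35, rng=None):
    """UNDX sketch (Eq. (5)): two children around the midpoint of p1 and p2."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(p1)
    m = (p1 + p2) / 2.0                        # center of mass of the primary parents
    diff = p2 - p1
    d1 = np.linalg.norm(diff)
    e1 = diff / d1                             # primary search direction (assumes d1 > 0)
    # d2: distance of the third parent p3 from the line through p1 and p2
    v = p3 - p1
    d2 = np.linalg.norm(v - (v @ e1) * e1)
    sigma1 = alpha * d1
    sigma2 = beta * d2 / np.sqrt(n)
    z1 = rng.normal(0.0, sigma1)               # component along e1
    # isotropic Gaussian restricted to the subspace orthogonal to e1
    t = rng.normal(0.0, sigma2, size=n)
    t -= (t @ e1) * e1
    c1 = m + z1 * e1 + t
    c2 = m - z1 * e1 - t
    return c1, c2
```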

3.3 PNX

In this work, the parent-centric normal crossover (PNX) [20] is also used. This parent-centric crossover is self-adaptive in the sense that the spread of the possible offspring solutions depends on the distance between the parents, which decreases as the population converges. In addition, PNX is an isotropic operator, as it does not preferentially search along any particular direction. Another beneficial feature is that PNX has a non-zero probability of generating offspring over the whole search space. In PNX, the jth gene \(c_{j}\) of each offspring c is determined as follows. First, a single random number \(\omega \in \left[ {0,1} \right] \) is drawn; the form \(y_j^{(1)} \) is used if \(\omega <0.5\) and \(y_j^{(2)} \) if \(\omega \ge 0.5\). Once this choice is made, the same selected form is used for every component j. The forms are

$$\begin{aligned}&y_j^{(1)} =N\left( x_j^{(1)} ,\frac{\left| {x_j^{(2)} -x_j^{(1)} } \right| }{\eta }\right) \\&y_j^{(2)} =N\left( x_j^{(2)} ,\frac{\left| {x_j^{(2)} -x_j^{(1)} } \right| }{\eta }\right) \end{aligned}$$
$$\begin{aligned} c_j =\left\{ \begin{array}{ll} y_j^{(1)},&{}\quad \omega <0.5 \\ y_j^{(2)},&{}\quad \omega \ge 0.5 \\ \end{array} \right. \end{aligned}$$
(6)

where \(N(\mu ,\sigma )\) is a random number drawn from a Gaussian distribution with mean \(\mu \) and standard deviation \(\sigma \), \(x_j^{(i)} \) is the jth component of the ith parent, and \(\eta \) is a tunable parameter. The larger the value of \(\eta \), the more concentrated the search around the parents. In this paper, \(\eta \) is set to 2.
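A minimal sketch of PNX following Eq. (6) with \(\eta = 2\); the single coin flip \(\omega \) selects the centring parent for all components, as described above, and the function name and NumPy usage are illustrative assumptions.

```python
import numpy as np

def pnx(x1, x2, eta=2.0, rng=None):
    """PNX (Eq. (6)): the child is sampled around one parent with spread |x2 - x1| / eta."""
    rng = np.random.default_rng() if rng is None else rng
    spread = np.abs(x2 - x1) / eta              # per-component standard deviation
    centre = x1 if rng.random() < 0.5 else x2   # omega < 0.5 -> first parent, else second
    return rng.normal(centre, spread)
```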

4 Simulation

In this section, we evaluate the performance of the proposed algorithms. First, we examine their search abilities on 30-dimensional problems. Then, 500- and 1000-dimensional versions of each problem are considered to evaluate their performance on large scale global optimization.

4.1 Experimental setup

In order to verify the effectiveness of the proposed algorithms, we use the benchmark test functions (\(f_{1} \sim f_{21}\)) from the literature [3]. Among the 21 traditional benchmark functions, functions \(f_{1} \sim f_{10}\) are unimodal (there is some recent evidence that \(f_{4}\) is multimodal for D > 3, and the correlation between the variables of the Rosenbrock function \(f_{4}\) is very strong). Functions \(f_{11}\sim f_{21}\) are multimodal, with the number of local minima increasing exponentially with the problem dimension; the function \(f_{21}\) in particular is a strongly multi-apex function. Functions \(f_{1} \sim f_{21}\) are listed below (two representative functions and the penalty term u are also sketched in code after the list):

  1. Sphere function (\(f_1\))

    $$\begin{aligned}&f(x)=\sum _{i=1}^n {x_i^2 } , \;\, -5.12\le x_i \le 5.12, x^{*}=(0,0,\ldots ,0),\\&f(x^{*})=0. \end{aligned}$$
  2. Ellipsoid function (\(f_2\))

    $$\begin{aligned} f(x)= & {} \sum _{i=1}^n {\big (1000^{\frac{i-1}{n-1}}x_i \big )^{2}} , \quad -5.12\le x_i \le 5.12,\\ x^{*}= & {} (0,0,\ldots ,0), f(x^{*})=0. \end{aligned}$$
  3. k-Tablet function (\(f_3\))

    $$\begin{aligned} f(x)&=\sum _{i=1}^k {x_i^2 +\sum _{i=k+1}^n {(100x_i )^{2}}},\quad -5.12\le x_i \le 5.12,\\ x^{*}&=(0,0,\ldots ,0), f(x^{*})=0. \end{aligned}$$
  4. Rosenbrock function (\(f_4\))

    $$\begin{aligned} f(x)= & {} \sum _{i=2}^n {(100(x_1 -x_i^2 )^{2}+(1-x_i )^{2}}),\\&-2.048\le x_i \le 2.048,\\ x^{*}= & {} (1,1,\ldots ,1), f(x^{*})=0. \end{aligned}$$
  5. Schwefel problem 3 (\(f_5\))

    $$\begin{aligned}&\mathop {\min }\limits _x f(x)=\sum _{i=1}^n {|x_i |} +\prod _{i=1}^n {|x_i |} , \quad -10\le x_i \le 10,\\&x^{*}=(0,0,\ldots ,0)\,and\,f(x^{*})=0. \end{aligned}$$
  6. Schwefel problem 4 (\(f_6\))

    $$\begin{aligned}&\mathop {\min }\limits _x f(x)=\mathop {\max }\limits _i \{ |x_i |, 1\le i\le n\}, \\&\quad -100\le x_i \le 100,\\&x^{*}=(0,0,\ldots ,0)\;and\;f(x^{*})=0. \end{aligned}$$
  7. Axis parallel hyper-ellipsoid (\(f_7\))

    $$\begin{aligned} \mathop {\min }\limits _x f(x)&=\sum _{i=1}^n {ix_i^2 } , \quad -5.12\le x_i \le 5.12,\\ x^{*}&=(0,0,\ldots ,0)\;and\;f(x^{*})=0. \end{aligned}$$
  8. Zakharov's function (\(f_8\))

    $$\begin{aligned} \mathop {\min }\limits _x f(x)&=\sum _{i=1}^n {x_i^2 } +\left( \sum _{i=1}^n {\frac{i}{2}x_i }\right) ^{2}+\left( \sum _{i=1}^n {\frac{i}{2}x_i }\right) ^{4},\;\;\\&\quad -5.12\le x_i \le 5.12,\\&x^{*}=(0,0,\ldots ,0)\;and\;f(x^{*})=0. \end{aligned}$$
  9. Exponential problem (\(f_9\))

    $$\begin{aligned} \mathop {\min }\limits _x f(x)&=-\exp \left( -0.5\sum _{i=1}^n {x_i^2 }\right) , \quad -1\le x_i \le 1,\\ x^{*}&=(0,0,\ldots ,0)\;and\;f(x^{*})=-1. \end{aligned}$$
  10. Ellipsoidal function (\(f_{10}\))

    $$\begin{aligned} \mathop {\min }\limits _x f(x)&=\sum _{i=1}^n {(x_i -i)^{2}} , \quad -n\le x_i \le n,\\ x^{*}&=(1,2,\ldots ,n)\;and\;f(x^{*})=0. \end{aligned}$$
  11. Ackley's problem (\(f_{11}\))

    $$\begin{aligned} \begin{array}{l} \mathop {\min }\limits _x f(x)=-20\exp \left( -0.2\sqrt{\frac{1}{n}\sum \limits _{i=1}^n {x_i^2 } }\right) \\ -\exp \left( \frac{1}{n}\sum \limits _{i=1}^n {\cos (2\pi x_i )}\right) +20+e,\quad -30\le x_i \le 30,\\ x^{*}=(0,0,\ldots ,0)\;and\;f(x^{*})=0. \\ \end{array} \end{aligned}$$
  12. Cosine mixture problem (\(f_{12}\))

    $$\begin{aligned} \mathop {\min }\limits _x f(x)&=\sum _{i=1}^n {x_i^2 } -0.1\sum _{i=1}^n {\cos (5\pi x_i )} , \quad -1\le x_i \le 1,\\ x^{*}&=(0,0,\ldots ,0)\;and\;f(x^{*})=-0.1n \end{aligned}$$
  13. Griewank problem (\(f_{13}\))

    $$\begin{aligned} \mathop {\min }\limits _x f(x)&=1+\frac{1}{4000}\sum _{i=1}^n {x_i^2 } -\prod _{i=1}^n {\cos \Big (\frac{x_i }{\sqrt{i}}\Big )} ,\; \\&\quad -\,600\le x_i \le 600,\\ x^{*}&=(0,0,\ldots ,0)\;and\;f(x^{*})=0. \end{aligned}$$
  14. Levy and Montalvo problem 1 (\(f_{14}\))

    $$\begin{aligned} \begin{array}{ll} &{}\mathop {\min }\limits _x f(x)=\frac{\pi }{n}\Big (10\sin ^{2}(\pi y_1 )\\ &{}\qquad \qquad \qquad +\sum \limits _{i=1}^{n-1} {(y_i -1)^{2}} [1+10\sin ^{2}(\pi y_{i+1} )]\\ &{}\qquad \qquad \qquad +\,(y_n -1)^{2}\Big ),\\ &{}where\;y_i =1+\frac{1}{4}(x_i +1),\;-10\le x_i \le 10,\\ &{}x^{*}=(-1,-1,\ldots ,-1)\;and\;f(x^{*})=0. \\ \end{array} \end{aligned}$$
  15. Levy and Montalvo problem 2 (\(f_{15}\))

    $$\begin{aligned} \begin{array}{ll} &{}\mathop {\min }\limits _x f(x)=0.1\Big (\sin ^{2}(3\pi x_1 )+ \sum \limits _{i=1}^{n-1} {(x_i -1)^{2}}\\ &{}\qquad \qquad \qquad \;\times [1+\sin ^{2}(3\pi x_{i+1} )]+(x_n -1)^{2}\\ &{}\qquad \qquad \qquad \;\times [1+\sin ^{2}(2\pi x_n )]\Big ), \quad -10\le x_i \le 10, \\ &{}x^{*}=(1,1,\ldots ,1)\;and\;f(x^{*})=0. \\ \end{array} \end{aligned}$$
  16. Schwefel problem (\(f_{16}\))

    $$\begin{aligned} \mathop {\min }\limits _x f(x)&=418.9829*n-\sum _{i=1}^n {x_i \sin (\sqrt{|x_i |})} ,\\&\quad -\,500\le x_i \le 500,\; \\ x^{*}&=(420.97,\ldots , 420.97)\;and\;f(x^{*})=0. \end{aligned}$$
  17. Generalized penalized function 1 (\(f_{17}\))

    $$\begin{aligned} \begin{array}{ll} &{}\mathop {\min }\limits _x f(x)=\frac{\pi }{n}\Big (10\sin ^{2}(\pi y_1 )+\sum \limits _{i=1}^{n-1} {(y_i -1)^{2}}\\ &{}\qquad \qquad \qquad \;\times [1+10\sin ^{2}(\pi y_{i+1} )] +\,(y_n -1)^{2}\Big )\\ &{}\qquad \qquad \qquad \;+\,\sum \limits _{i=1}^n {u(x_i{,}10{,}100{,}4)},\quad -10\le x_i \le 10, \\ &{}where\;y_i =1+\frac{1}{4}(x_i +1),\\ &{}x^{*}=(-1,-1,\ldots ,-1)\;and\;f(x^{*})=0. \\ \end{array} \end{aligned}$$
  18. Generalized penalized function 2 (\(f_{18}\))

    $$\begin{aligned} \begin{array}{ll} &{}\mathop {\min }\limits _x f(x)=0.1 \Big (\sin ^{2}(3\pi x_1 )+ \sum \limits _{i=1}^{n-1} {(x_i -1)^{2}}\\ &{}\qquad \qquad \qquad \;\times [1+\sin ^{2}(3\pi x_{i+1} )] +(x_n -1)^{2}\\ &{}\qquad \qquad \qquad \;\times [1+\sin ^{2}(2\pi x_n )]\Big )+\,\sum \limits _{i=1}^n {u(x_i{,}10{,}100{,}4)},\\ &{} -5\le x_i \le 5, x^{*}=(1,1,\ldots ,1)\;and\;f(x^{*})=0. \\ \end{array} \end{aligned}$$

    In problems 17 and 18, the penalty function u is given by the following expression:

    $$\begin{aligned} u(x,a,k,m)=\left\{ \begin{array}{ll} k(x-a)^{m}&{}\quad if\;x>a, \\ k(-x-a)^{m}&{}\quad if\;x<-a, \\ 0 &{}\quad otherwise. \\ \end{array} \right. \end{aligned}$$
  19. Bohachevsky function (\(f_{19}\))

    $$\begin{aligned} f(x)&=\sum _{i=1}^{n-1} (x_i^2 +2x_{i+1}^2 -0.3\cos (3\pi x_i )\\&\quad -\,0.4\cos (4\pi x_{i+1})+0.7), -5.12\le x_i \le 5.12,\\ x^{*}&=(0,0,\ldots ,0), f(x^{*})=0. \end{aligned}$$
  20. Schaffer function (\(f_{20}\))

    $$\begin{aligned} f(x)&=\sum _{i=1}^{n-1} {\Big [(x_i^2 +x_{i+1}^2 )^{0.25}\times (\sin ^{2}(50(x_i^2 +x_{i+1}^2 )^{0.1})} \\&\qquad +\,1.0)\Big ],\\&\quad -\,100\le x_i \le 100, x^{*}\\&=(0,0,\ldots ,0), f(x^{*})=0. \end{aligned}$$
  21. Rastrigin function (\(f_{21}\))

    $$\begin{aligned} f(x)&=10n+\sum _{i=1}^n {(x_i ^{2}-10\cos (2\pi x_i ))} , \;\; \\&\quad -\,5.12\le x_i \le 5.12, x^{*}=(0,0,\ldots ,0),\\&\quad f(x^{*})=0. \end{aligned}$$
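As noted before the list, the following are illustrative NumPy versions of two representative benchmark functions (\(f_1\) and \(f_{21}\)) and of the penalty term u used in \(f_{17}\) and \(f_{18}\); they follow the standard definitions above and are not code from the original paper.

```python
import numpy as np

def sphere(x):                       # f_1
    return np.sum(x ** 2)

def rastrigin(x):                    # f_21
    return 10 * len(x) + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x))

def u(x, a, k, m):                   # penalty term of f_17 and f_18, applied component-wise
    return np.where(x > a, k * (x - a) ** m,
                    np.where(x < -a, k * (-x - a) ** m, 0.0))
```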
Fig. 3 The convergence process on the sphere function \(f_{1}\), D \(=\) 30

Fig. 4 The convergence process on the Rosenbrock function \(f_{4}\), D \(=\) 30

In general, the comparison of algorithms for CnOPs is usually based on the following criterion: the number of function evaluations (FEs) needed to achieve a certain solution quality [9, 10]. For all the 21 test functions, we performed 25 independent runs. The stopping criteria are as follows: \({\vert } f(s)-f(s*){\vert }<\)10\(^{-7}\) (s* is the global optimal solution), or the maximum number of function evaluations (MaxFEs), set to 10,000*D, is reached. This means that if the error accuracy does not reach 10\(^{-7}\) within 10,000*D function evaluations, the run is considered unsuccessful.

4.2 Parameter setting

It is difficult to find the best parameter combination for all the problems because of the multimodality and nonlinearity of the different objective functions. As a result, the robustness of the parameters is very important, and it is a challenge to suggest common fixed values for them. We carried out extensive experiments with the proposed COACO\(_\mathrm{R}\) algorithms to analyze the parameters.

Fig. 5 The convergence process on the Ackley function \(f_{11}\), D \(=\) 30

Fig. 6 The convergence process on the Schaffer function \(f_{20}\), D \(=\) 30

The parameters shared by the three proposed COACO\(_\mathrm{R}\) variants are as follows: the number of ants m, the size of the PDF set k, the number of crossover operations \( n_{co}\), and the pheromone evaporation rate \(\xi \). For the sake of fairness, m, k, \(n_{co}\) and \(\xi \) are set to the same values for the three COACO\(_\mathrm{R}\) variants and the typical ACO\(_\mathrm{R}\); they are given in Table 1. The parameter settings of the crossover operators are described in Sect. 3.

4.3 Performance evaluation on 30-dimensional problems

In this subsection, we first present the performance evaluation on 30-dimensional problems. To investigate the performance of the proposed COACO\(_\mathrm{R}\), the convergence properties of the three COACO\(_\mathrm{R}\) variants on some typical functions (\(f_{1}\), \(f_{4}\), \(f_{11}\), and \(f_{20}\)) are analyzed, in comparison with the original ACO\(_\mathrm{R}\). The parameters of ACO\(_\mathrm{R}\) are set the same as for the proposed COACO\(_\mathrm{R}\) except for \(n_{co}\) (\(n_{co}\) = 0 in ACO\(_\mathrm{R}\)), because there is no crossover operation in ACO\(_\mathrm{R}\). The typical functions include a basic function, a non-separable function with correlation among the design variables, and multimodal functions with a large number of local minima.

Table 1 Parameter settings
Table 2 Comparison with other algorithms on 21 benchmark functions, D \(=\) 30

Sphere function \(f_{1}\) is the basic function used to evaluate the algorithms. For the non-separable function we choose the Rosenbrock function \(f_{4}\) [21]. In the Rosenbrock function, the variables are correlated and adjacent variables interact, and the global minimum is inside a long, narrow, parabolic-shaped flat valley. As shown in Figs. 3 and 4, the proposed algorithms find the global optimum successfully and faster than ACO\(_\mathrm{R}\) for these two functions.

For the multimodal functions, the Ackley function \(f_{11}\) [22] and the Schaffer function \(f_{20}\) [23] are chosen. The Ackley function has an exponential term that covers its surface with numerous local minima. The Schaffer function is made up of a large number of local minima whose values increase with the distance to the global minimum. The number of local minima is so huge that an ant is easily trapped in them. From Figs. 5 and 6, we can see that for the Ackley function and the Schaffer function, the three proposed algorithms also reach the required accuracy faster than ACO\(_\mathrm{R}\). The Rastrigin function (\(f_{21}\)) is another well-known multimodal function made up of a large number of local minima. As shown in Table 2, the proposed algorithms find the global optimum very easily, while ACO\(_\mathrm{R}\) cannot find the global optimum of the Rastrigin function.

In order to obtain an overview of the performance of the proposed algorithms, more extensive experiments are carried out. CMA-ES [22], CGA\(_\mathrm{R}\) [3] and differential evolution [24, 25] are compared with the proposed COACO\(_\mathrm{R}\) on the 21 benchmark functions (\(f_{1}\sim f_{21}\)).

CMA-ES is a state-of-the-art algorithm for continuous optimization. The CGA\(_\mathrm{R}\) algorithm is a new framework called CGA with an FPDD-LX crossover operator. DE is another state-of-the-art algorithm that is useful for real-world applications, and we select the classical DE approach called DE/rand/1 to compare with the proposed algorithms. The parameter settings and results of the other algorithms are taken from the literature [3, 25]. We performed 25 independent runs using the stopping criterion \({\vert }f(x)-f(x*){\vert }<\)10\(^{-7}\). The mean numbers of FEs needed to achieve the fixed accuracy level 10\(^{-7}\) are recorded in Table 2 for the above algorithms.

As shown in Table 2, ACO\(_{BLX}\), ACO\(_{UNDX}\) and ACO\(_{PNX}\) are all able to achieve the fixed accuracy level 10\(^{-7}\) for the 21 problems (D \(=\) 30). In this case, ACO\(_{BLX}\), ACO\(_{UNDX}\) and CMA-ES obtain 8, 5, and 8 best results among the 21 test problems, respectively. ACO\(_{BLX}\), ACO\(_{UNDX}\) and ACO\(_{PNX}\) are clearly superior to CMA-ES on the multimodal problems. From Table 2, we also see that ACO\(_\mathrm{R}\) is inferior to the three proposed COACO\(_\mathrm{R}\) variants on some problems (\(D = 30\)).

Table 3 Simulation results, D \(=\) 500
Table 4 Simulation results, D \(=\) 1000

4.4 Performance evaluation on 500- and 1000-dimensional problems

Nowadays, high-dimensional optimization problems are an interesting field of research. Although approaches such as CMA-ES and DE have shown excellent search abilities when applied to 30–100 dimensional problems, the performance of many of them deteriorates quickly as the dimensionality of the search space increases. CMA-ES uses several operations with a complexity of O(\(n^{3}\)), where n is the dimension, and although there are versions that try to reduce this cost, the problem has not actually been resolved [24]. This behavior means that CMA-ES does not scale well to large scale optimization problems. To further challenge and evaluate the proposed algorithms, we apply them to 500- and 1000-dimensional problems.

Tables 3 and 4 record the accuracy level (\(f(x)-f(x\)*)) reached when a run terminates according to the stopping criteria: \({\vert } f(x)-f(x*){\vert }<\)10\(^{-7}\), or the number of function evaluations (FEs) exceeds 10,000*D. The mean FEs needed in each run to achieve the fixed accuracy level (see the “Accuracy” column in Tables 3, 4) are also indicated in Tables 3 and 4. As shown in Tables 3 and 4, ACO\(_{UNDX}\) has excellent performance on large scale optimization problems. With ACO\(_{UNDX}\), fourteen of the 500-dimensional problems reach the accuracy level 10\(^{-7}\) and one reaches 10\(^{-6}\). As for the 1000-dimensional problems, ACO\(_{UNDX}\) reaches the accuracy level 10\(^{-7}\) on 15 problems. The other two algorithms, ACO\(_{BLX}\) and ACO\(_{PNX}\), are also superior to ACO\(_\mathrm{R}\) for large scale optimization problems. With regard to the mean FEs needed in each run, ACO\(_{UNDX}\) also requires clearly fewer than the other algorithms.

5 Conclusion

In this paper, we evaluated the performance of ACO algorithms with different crossover operations for CnOPs. A large number of simulations on 21 benchmark problems were carried out. The simulation results showed that ACO\(_\mathrm{R}\) with crossover operations clearly outperforms the typical ACO\(_\mathrm{R}\) on problems ranging from 30 to 1000 dimensions. We can therefore conclude that crossover operators can effectively improve the global exploration ability of ACO\(_\mathrm{R}\). ACO\(_{BLX}\) has the best performance on 30-dimensional problems and is superior to CMA-ES on the multimodal problems. ACO\(_{UNDX}\) has the best performance on large scale optimization problems. The proposal is a very competitive optimization approach for large scale problems, both in results and in processing time. In future work, we will focus on cooperative coevolution between ACO\(_\mathrm{R}\) and EAs.