
1 Introduction

The general (single-objective) global optimization problem is defined as follows:

$$\begin{aligned} \text {Minimize} \quad&f(\textit{\textbf{x}}) \\ \text {Subject to} \quad&lb_j \le x_j \le ub_j,\quad j=1,2,\ldots ,D \end{aligned}$$
(1)

where \(lb_j\) and \(ub_j\) are the lower bound and the upper bound of the decision variables \(\textit{\textbf{x}}\), respectively, and \(f\) is the objective function. The feasible solution space is defined as \(\varOmega = \{\textit{\textbf{x}} \in \mathbb {R}^D : lb_j \le x_j \le ub_j, \; j=1,\ldots ,D\}\). When \(D \ge 1000\), this is called large scale global optimization (LSGO) [11]. Because of the limitations of mathematical programming techniques (particularly when dealing with highly nonlinear objective functions [2, 17]), the use of metaheuristics (particularly evolutionary algorithms) has become relatively popular [5]. Differential Evolution (DE) [1, 14] is a metaheuristic designed for continuous search spaces that has been very successful in solving a variety of complex optimization problems. However, as happens with other metaheuristics, the performance of DE quickly deteriorates as the number of decision variables of the problem increases (the so-called "curse of dimensionality") [3]. Additionally, the properties and conditions of the fitness landscape may change (e.g., going from unimodal to multimodal) [2, 3].

Many current approaches for large-scale global optimization are based on cooperative coevolution (CC), but several non-CC approaches have also been proposed in recent years [2, 4, 5, 11, 12, 17]. CC consists in decomposing the original large scale problem into a set of smaller subproblems which are easier to solve (i.e., it is a divide-and-conquer approach) [2, 12, 17]. Non-CC approaches, on the other hand, try to solve the large scale problem as a whole. Most of them are hybrid schemes that combine several metaheuristics which complement each other in order to overcome their individual limitations. In fact, several researchers [2, 5, 11] have combined DE with non-population-based local search methods to boost its overall performance. In this paper, we propose a new non-CC approach which is based on the so-called Success-History Based Parameter Adaptation for Differential Evolution (SHADE) algorithm [15].
Our proposed algorithm consists of three stages: (1) initialization, (2) global search and (3) local search. During the initialization stage, a gradient-free non-population-based local search method is applied to one of the best individuals generated, in order to make an early enhancement. Afterwards, the global and local search stages are iteratively repeated one after another. Our proposed approach maintains two populations that collaborate with each other: the first population adopts a search scheme specialized in exploration (thus carrying out the global search stage), while the second one adopts a search engine specialized in exploitation (carrying out the local search stage). The first population evolves according to the SHADE algorithm, and the second one according to a newly developed SHADE variant which we have named eSHADE\(_{\text {ls}}\). The communication between these two populations is done via a simple migration protocol (applied when switching from the global to the local search stage, or vice versa). Our proposal is a hybrid, non-CC algorithm, since it combines an evolutionary algorithm with a local search method (used just once, during the initialization stage) and solves the problem as a whole (decomposition never happens). The performance of our proposed algorithm, referred to as Global and Local search using SHADE (GL-SHADE), is evaluated on the CEC'2013 large-scale global optimization (LSGO) test suite.

The remainder of this paper is organized as follows. In Sect. 2, we provide the background required to understand the rest of the paper. Section 3 describes our proposal in detail. Our experimental design and results are provided in Sect. 4. Finally, our conclusions and some paths for future research are provided in Sect. 5.

2 Background

2.1 Differential Evolution

Differential Evolution (DE) was originally proposed by Storn and Price in 1995 [14, 18]. DE is a stochastic population-based evolutionary algorithm (EA) that has been shown to perform well in a variety of complex (including some real-world) optimization problems [1, 14]. DE has three control parameters: NP (population size), F (mutation scale factor) and Cr (crossover rate) [3]. It is well known that the performance of DE is very sensitive to these parameters [1]. Like other EAs, DE starts with an initialization phase [18]. After that, it adopts three main operators: mutation, recombination and selection.

[Algorithm 1: the classical DE variant, DE/rand/1/bin]

The initialization phase (see Algorithm 1, line 1) consists in randomly scattering NP guesses (points) over the search space as follows: \(x_{i,j} = lb_j + (ub_j - lb_j)*rnd(0,1)\), where \(x_{i,j}\) represents the \(j^{th}\) gene of the \(i^{th}\) individual [18].
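The initialization formula can be sketched as follows (a minimal Python illustration; the function name and the list-based encoding are ours):

```python
import random

def initialize_population(NP, lb, ub):
    """Scatter NP random guesses uniformly over the box [lb, ub]^D.

    Implements x[i][j] = lb[j] + (ub[j] - lb[j]) * rnd(0, 1) as in the text.
    """
    D = len(lb)
    return [[lb[j] + (ub[j] - lb[j]) * random.random() for j in range(D)]
            for _ in range(NP)]
```

For instance, `initialize_population(50, [-100.0]*1000, [100.0]*1000)` would produce an initial population for a typical LSGO benchmark function.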

The mutation and recombination operators are applied to generate a new trial vector (\(\varvec{u_i}\)). During mutation, DE creates a new candidate solution called the donor vector (\(\varvec{v_i}\)). The specific mutation and recombination operators depend on the DE variant in use. The notation \(DE/\alpha /\beta /\gamma \) is adopted to indicate the particular DE variant [1]: \(\alpha \) specifies the base vector, \(\beta \) is the number of difference vectors used, and \(\gamma \) denotes the type of recombination [3]. Algorithm 1 presents the classical DE variant, called DE/rand/1/bin. In this scheme, for the mutation operator, three mutually distinct individuals (\(\varvec{x_{r1}}\), \(\varvec{x_{r2}}\) and \(\varvec{x_{r3}}\)) are randomly selected, and \(\varvec{x_{r1}}\) is perturbed using the scaled difference between \(\varvec{x_{r2}}\) and \(\varvec{x_{r3}}\) in order to obtain \(\varvec{v_i}\) (lines 5–6). For the recombination operator, genes are inherited from \(\varvec{v_i}\) and from the target vector (\(\varvec{x_i}\)) following a binomial (bin) distribution in order to obtain \(\varvec{u_i}\) (lines 7–13), where at least one gene must be inherited from \(\varvec{v_i}\) (see line 9). The last step is the selection operator (line 14), where the fittest between \(\varvec{u_i}\) and \(\varvec{x_i}\) is chosen to represent \(\varvec{x_i}\) in the next generation. Several DE variants exist (see [1, 14]), and the choice among them influences the search behavior of DE in different ways. Since there are only two types of recombination (exponential (exp) and binomial (bin)), the mutation strategy is really what best identifies the behavior of a particular DE scheme [1].
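One generation of DE/rand/1/bin, as described above, can be sketched as follows (an illustrative Python implementation of the loop in Algorithm 1, not the exact pseudocode; all names are ours):

```python
import random

def de_rand_1_bin_step(pop, fitness, f, F=0.5, Cr=0.9):
    """One generation of classical DE/rand/1/bin (minimization).

    pop     : list of individuals (lists of floats)
    fitness : list of objective values, fitness[i] = f(pop[i])
    f       : objective function to minimize
    """
    NP, D = len(pop), len(pop[0])
    for i in range(NP):
        # mutation: three mutually distinct indices, all different from i
        r1, r2, r3 = random.sample([k for k in range(NP) if k != i], 3)
        v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
        # binomial recombination: at least one gene (j == jrand) comes from v
        jrand = random.randrange(D)
        u = [v[j] if (random.random() <= Cr or j == jrand) else pop[i][j]
             for j in range(D)]
        # selection: the fittest of trial u and target x_i survives
        fu = f(u)
        if fu <= fitness[i]:
            pop[i], fitness[i] = u, fu
    return pop, fitness
```

Since selection is elitist per individual, the best objective value in the population can never worsen across generations.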

2.2 An Enhanced Differential Evolution Algorithm Based on Multiple Mutation Strategies

The Enhanced Differential Evolution Algorithm Based on Multiple Mutation Strategies (abbreviated as EDE by its authors) [18] was proposed as an enhancement of the DE/best/1/bin scheme, aiming to overcome that scheme's tendency towards premature convergence. The core idea of EDE is to take advantage of the direction guidance provided by the best individual in the DE/best/1/bin scheme, while avoiding getting trapped in a local optimum. In the EDE algorithm, an opposition-based learning initialization scheme is combined with a mutation strategy composed of two DE variants (DE/current/1/bin and DE/pbest/1/bin), aiming to speed up convergence and to prevent DE from clustering around the global best individual. EDE also incorporates a perturbation scheme to further avoid premature convergence. Algorithm 2 shows the way in which EDE works.

[Algorithm 2: EDE]

A remarkable feature of EDE is that it first uses a very explorative evolutionary strategy but, as the number of function evaluations increases, it switches to a much less explorative scheme (EDE sets \(r_{min} = 0.1\) and \(r_{max} = 1\)). Another remarkable feature of EDE is the coupled perturbation method that it applies between generations. In fact, if no mutation strategy switching is incorporated and only line 11 is used (i.e., the mutation operation where \(M = 1\)), the procedure turns into a pure population-based local search method.

2.3 Success-History Based Parameter Adaptation for Differential Evolution

The Success-History Based Parameter Adaptation for Differential Evolution (SHADE) [15] is a self-adaptive version of the DE variant called DE/current-to-pbest/1/bin. The self-adaptive mechanism is applied to both F and Cr; therefore, NP is its only control parameter (for further information about the self-adaptive mechanism of SHADE, interested readers are referred to [15]). SHADE also incorporates an external archive (A), built from the parents defeated throughout the generations; such individuals are used by the mutation operator to promote greater diversity. The adopted mutation strategy is the following:

$$\begin{aligned} \varvec{v_i} = \varvec{x_i} + F_i*(\varvec{x_{pbest}} - \varvec{x_i}) + F_i*(\varvec{x_{r_2}} - \varvec{x_{r_3}}) \end{aligned}$$
(2)

where \(F_i\) is regenerated for every \(\varvec{x} \in P\) (the regeneration procedure can be consulted in [15]). \(\varvec{x_{r_2}}\) and \(\varvec{x_{r_3}}\) are randomly chosen vectors from P and \(P \cup A\), respectively, and \(\varvec{x_{pbest}}\) is taken randomly from the \(p \cdot 100\%\) fittest individuals in the population. The parameter p is regenerated for every \(\varvec{x} \in P\) as follows:

$$\begin{aligned} p_i = rand(\frac{2}{NP},p_{max}) \end{aligned}$$
(3)

where \(p_{max} = 0.2\). The mutation strategy described in Eq. (2) is explorative, as the base vector is practically the target vector. Another important feature is that information about the fittest individuals is taken into account. Finally, binomial recombination is used, but Cr is regenerated in a way analogous to the F regeneration procedure.
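Equations (2) and (3) can be sketched together as follows (an illustrative Python version of SHADE's mutation for a single target vector; function and variable names are ours, and minor details such as the minimum size of the pbest pool may differ from [15]):

```python
import random

def current_to_pbest_1(pop, fvals, archive, i, F_i, p_max=0.2):
    """Sketch of SHADE's current-to-pbest/1 mutation (Eqs. 2-3) for x_i.

    pop     : current population P (lists of floats)
    fvals   : objective values (lower is fitter)
    archive : external archive A of defeated parents
    """
    NP, D = len(pop), len(pop[0])
    # Eq. (3): p_i is drawn anew for every target vector
    p_i = random.uniform(2.0 / NP, p_max)
    # x_pbest: random pick among the fittest ceil-like fraction of P
    order = sorted(range(NP), key=lambda k: fvals[k])
    pbest = pop[random.choice(order[:max(2, int(p_i * NP))])]
    # x_r2 taken from P, x_r3 taken from P union A
    r2 = random.choice([k for k in range(NP) if k != i])
    xr3 = random.choice(pop + archive)
    xi = pop[i]
    # Eq. (2): v_i = x_i + F_i*(x_pbest - x_i) + F_i*(x_r2 - x_r3)
    return [xi[j] + F_i * (pbest[j] - xi[j]) + F_i * (pop[r2][j] - xr3[j])
            for j in range(D)]
```

Note that with \(F_i = 0\) the donor reduces to the target vector itself, which makes explicit why this strategy searches around \(\varvec{x_i}\).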

3 Our Proposal

Our proposed approach is called Global and Local Search using SHADE (GL-SHADE), and its main procedure is illustrated in Algorithm 4. GL-SHADE consists of three main components (lines 1–3):

  • MTS-LS1 (which stands for Multiple Trajectory Search - Local Search 1) is a gradient-free non-population-based local search method (for a detailed explanation of this method, interested readers should refer to [16]). This method is used at the beginning of the search (line 9) and just once in the whole procedure (the idea is to boost the search early on).

  • SHADE is used for population 1 (line 1). It integrates a global search scheme which takes into account information about the top individuals in the population (this property is remarkable since all enhancements done by other methods to the fittest individual can be used to guide the search) and presents a robust self-adaptive mechanism. Due to the aforementioned features, this scheme is adopted to handle the global search of our proposed approach.

  • eSHADE\(_{\mathbf {ls}}\) is used for population 2 (line 2). This is the variant called SHADE/pbest/1/exp coupled with the EDE perturbation method (see Algorithm 2, lines 16–27). This variant uses the following mutation strategy:

    $$\begin{aligned} \varvec{v_i} = \varvec{x_{pbest}} + F_i*(\varvec{x_{r_2}} - \varvec{x_{r_3}}) \end{aligned}$$
    (4)

    where \(p_{max}\) is set to 0.1 (see Eq. (3)). The strategy described in Eq. (4) is one of the mutation operators that EDE incorporates for scattering new trial vectors close to several top individuals, not just near the fittest one. As can be seen, this strategy is very different from that of Eq. (2), and our proposed approach allows them to complement each other. Additionally, exponential recombination is adopted instead of binomial recombination. The main procedure of eSHADE\(_{\text {ls}}\), which was designed to handle the local search, is described in Algorithm 3.

    [Algorithm 3: eSHADE\(_{\text {ls}}\)]
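The exponential recombination adopted by eSHADE\(_{\text {ls}}\) copies a contiguous (wrapping) run of genes from the donor vector, in contrast to the independent gene-by-gene decisions of binomial recombination. A minimal sketch (the helper name is ours):

```python
import random

def exponential_crossover(target, donor, Cr):
    """Exponential (exp) recombination: copy a contiguous block of genes
    from the donor into a copy of the target.

    Starting from a random position, genes are taken from the donor while
    consecutive rnd(0,1) draws stay below Cr (at least one gene is always
    copied), wrapping around the end of the chromosome.
    """
    D = len(target)
    u = list(target)
    start = random.randrange(D)   # random start position
    L = 0                         # length of the copied block
    while True:
        u[(start + L) % D] = donor[(start + L) % D]
        L += 1
        if L >= D or random.random() >= Cr:
            break
    return u
```

With Cr close to 1 nearly the whole donor is copied, while with Cr close to 0 only one gene changes, which keeps trial vectors near the (p)best base vector of Eq. (4).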

The first task in Algorithm 4 is an initialization stage (lines 1–9). During this stage, GL-SHADE's components are defined (lines 1–3), the corresponding populations are generated (line 5), the fittest individual from population 1 is set as the best solution found so far (line 7), and a local search procedure (in this case, the MTS-LS1 method) is applied to that best solution (line 9). When defining the components, it is necessary to provide the global stopping criterion (maximum number of evaluations, \(max_{FEs}\)) since, in some cases, the overall search process must stop even when the number of evaluations requested for a component (\(G_{FEs}\) or \(L_{FEs}\)) has not yet been consumed. For example, MTS-LS1's execution (line 9) stops when \(counter_{FEs} \ge L_{FEs}\) or \(current_{FEs} \ge max_{FEs}\); both \(counter_{FEs}\) and \(current_{FEs}\) are updated as the component's execution proceeds.

The second task is to perform the global and local search stages (lines 10–16). During this stage, population 1 receives the new best individual (line 11), which is placed at the position of the old best individual, and then the global search scheme is executed (line 12) while \(counter_{FEs} \le G_{FEs}\) and \(current_{FEs} \le max_{FEs}\). After finishing, population 1 migrates its best individual to population 2 (lines 13–14), where the incoming individual is randomly placed, and then the local search scheme is executed (line 15) while \(counter_{FEs} \le L_{FEs}\) and \(current_{FEs} \le max_{FEs}\). Finally, population 2 migrates its best individual back to population 1 (line 16) and the procedure is repeated until the maximum number of function evaluations (\(max_{FEs}\)) is reached. It is worth noting that, after applying the mutation or perturbation procedures, a variable may fall outside its allowable boundaries. If this happens, we apply the same normalization procedure adopted in [15].

[Algorithm 4: GL-SHADE]
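The overall control flow of Algorithm 4 can be sketched as follows (a high-level Python illustration in which each component is abstracted as a callable consuming part of a shared evaluation budget; all names and signatures are our own simplification):

```python
def gl_shade_loop(max_fes, G_fes, L_fes, init, mts_ls1, shade_run, eshade_run):
    """High-level sketch of GL-SHADE's main loop (Algorithm 4).

    init       : () -> (pop1, pop2, best)          -- lines 1-7
    mts_ls1    : (best, budget) -> (best, used)    -- line 9, one-shot
    shade_run  : (pop1, best, budget) -> (best, used)   -- global stage
    eshade_run : (pop2, best, budget) -> (best, used)   -- local stage
    Each component returns its best individual and the evaluations it used.
    """
    pop1, pop2, best = init()                          # initialization stage
    best, used = mts_ls1(best, min(L_fes, max_fes))    # early enhancement
    current = used
    while current < max_fes:
        # global search stage: pop1 receives the best-so-far, then evolves
        best, used = shade_run(pop1, best, min(G_fes, max_fes - current))
        current += used
        if current >= max_fes:
            break
        # local search stage: pop2 receives pop1's best, then evolves
        best, used = eshade_run(pop2, best, min(L_fes, max_fes - current))
        current += used
    return best
```

The migration protocol of the paper is implicit here: the `best` value handed from one callable to the next plays the role of the migrated individual.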

4 Experimental Results

In order to assess the performance of our proposed GL-SHADE, we adopted the test suite used at the large-scale global optimization competition held at the 2013 IEEE Congress on Evolutionary Computation (CEC’2013) [6], but adopting the experimental conditions and guidelines of the LSGO competition held at the 2019 IEEE Congress on Evolutionary Computation (CEC’2019) [7].

The previously indicated benchmark consists of 15 test problems, all with 1000 decision variables except for f13 and f14, which are overlapping functions with \(D=905\) [2, 11]. In general, these test problems can be categorized as shown in Table 1.

For each test problem, 25 independent executions were carried out. Each run stops upon reaching a pre-defined maximum number of objective function evaluations (\(max_{FEs} = 3 \times 10^{6}\)). Additionally, we also report the results obtained after performing 120,000 and 600,000 objective function evaluations. The parameter values adopted in our experiments are shown in Table 2, and our results are summarized in Table 3 (we report the best, worst, median, mean and standard deviation calculated over the 25 runs performed for each test problem).

Table 1. Features of the test problems from the CEC’2013 benchmark
Table 2. Parameter values used in our experiments

 

Table 3. Summary of the results obtained by GL-SHADE in the CEC’2013 test problems

Finally, we also provide the convergence curves for f2, f7, f11, f12, f13 and f14 (see Fig. 1). In order to reduce the running time, GL-SHADE and the benchmark set were implemented using C++ and CUDA. All experiments were performed on an Intel(R) Core(TM) i7-3930K CPU @ 3.20 GHz with 8 GB of RAM (running the Ubuntu 18.04 operating system), and a GeForce GTX 680 GPU with CUDA 10.2.

Fig. 1.
figure 1

Convergence curves with logarithmic scale for some CEC’2013 benchmark problems

4.1 Comparison with Its Components

In order to investigate whether GL-SHADE is able to outperform its individual components, we adopted the Wilcoxon signed-rank test with \(N = 25\) and \(\alpha = 0.05\), i.e., a sample size of 25 executions and a significance level of 5%. Results are summarized in Table 4, where we report the mean together with its statistical significance. The notation "b/e/w" shown in the last row means that a component is significantly better in "b" test problems, that there was no statistically significant difference in "e" test problems, and that it was significantly worse in "w" test problems, with respect to GL-SHADE. These results indicate that our proposed approach presents a better overall performance than any of its components considered separately. This is particularly true for the overlapping test problems and for the partially separable test problems with a separable subcomponent (i.e., f4 to f7).
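The statistical test used above can be illustrated as follows (a self-contained sketch of the Wilcoxon signed-rank statistic with the normal approximation, which is adequate for \(N = 25\); in practice a statistics library would be used, and details such as tie/zero handling follow one common convention):

```python
import math

def wilcoxon_signed_rank(a, b):
    """Wilcoxon signed-rank test statistic, normal approximation.

    a, b : paired samples (e.g., 25 final objective values per algorithm).
    Returns (W, z): W is the smaller of the positive/negative rank sums;
    |z| > 1.96 indicates a significant difference at alpha = 0.05.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    n = len(diffs)
    # rank the absolute differences, averaging ranks over ties
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # average of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    W = min(w_plus, w_minus)
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (W - mu) / sigma
    return W, z
```

When one algorithm wins every paired run, W is 0 and |z| is maximal for the given sample size.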

Table 4. Statistical validation (GL-SHADE is the control algorithm).

4.2 Comparison Between GL-SHADE and SHADEILS

SHADE with iterative local search (SHADEILS) [11] is one of the best hybrid algorithms currently available for large-scale global optimization, according to [9, 10]. SHADEILS uses SHADE to handle the global search, and it incorporates MTS-LS1 and a gradient-based method to undertake the local search. Additionally, it integrates a restart mechanism which is launched when stagnation is detected. In order to compare SHADEILS with our proposal, we again adopted the test suite used at the large-scale global optimization competition held at the 2013 IEEE Congress on Evolutionary Computation (CEC'2013) [6], and we applied the Wilcoxon signed-rank test with \(N = 25\) and \(\alpha = 0.05\). The comparison of results is summarized in Table 5, where results that are better in a statistically significant way are shown in boldface. Additionally, we show the convergence curves for both SHADEILS and GL-SHADE in Fig. 1. SHADEILS is significantly better than our proposed approach in 4 out of 15 test problems, while GL-SHADE is significantly better than SHADEILS in 9 out of 15 test problems. Our proposed approach is particularly better than SHADEILS in the overlapping test problems.

Table 5. Statistical comparison of results: SHADEILS vs GL-SHADE using the CEC’2013 benchmark problems (SHADEILS is the control algorithm), performing 3,000,000 objective function evaluations

4.3 Comparison with Respect to State-of-the-art Algorithms

Our proposed approach was also compared with respect to three state-of-the-art algorithms (besides SHADEILS), namely:

  • CC-RDG3 [12]: a CC-based algorithm. It was the winner of the 2019 LSGO competition [10, 13].

  • MOS [5]: a hybrid algorithm that was considered one of the best metaheuristics for LSGO for several years (from 2013 to 2018) [11, 13].

  • LSHADE-SPA [2]: a hybrid CC method. This approach, together with SHADEILS, was the first to outperform MOS [9].

For comparing our results, we adopted the 2019 LSGO competition criteria [10]. Thus, each algorithm is assigned a score per test problem using the Formula One Score (FOS), which is based on its rank (1st: 25 pts, 2nd: 18 pts, 3rd: 15 pts, 4th: 12 pts and 5th: 10 pts). The comparison of results (performing 3,000,000 objective function evaluations) of our proposed GL-SHADE with respect to other state-of-the-art algorithms is shown in Fig. 2. The results of the other algorithms were extracted from the LSGO competition database [7, 8, 13]. Based on the results summarized in Fig. 2, we obtained the ranking shown in Table 6.
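The FOS scoring described above can be sketched as follows (an illustrative Python version; the structure of the input data is our assumption):

```python
def fos_scores(results):
    """Formula One Score ranking over a set of test problems.

    results : dict mapping algorithm name -> list of mean errors, one per
              test problem (lower is better).
    Returns total FOS points per algorithm: the best algorithm on a problem
    earns 25 pts, the 2nd 18, the 3rd 15, the 4th 12 and the 5th 10.
    """
    points = [25, 18, 15, 12, 10]
    names = list(results)
    n_problems = len(next(iter(results.values())))
    totals = {name: 0 for name in names}
    for p in range(n_problems):
        ranking = sorted(names, key=lambda name: results[name][p])
        for place, name in enumerate(ranking[:len(points)]):
            totals[name] += points[place]
    return totals
```

Summing these per-problem points over the 15 benchmark functions yields the overall ranking reported in Table 6.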

Fig. 2.
figure 2

GL-SHADE vs state-of-the-art algorithms using the CEC’2013 benchmark problems adopting the FOS criterion

Table 6. Ranking according to the FOS criterion

5 Conclusions and Future Work

In this paper, we proposed a new DE-based algorithm (GL-SHADE) specifically designed to solve LSGO problems. Our proposed approach combines a global search engine with a local search mechanism. For the global search engine, it adopts SHADE. For the local search mechanism, we adopted a population-based self-adaptive algorithm (eSHADE-ls), which is indeed the main difference of our proposal with respect to other state-of-the-art algorithms for LSGO. SHADE and eSHADE-ls collaborate with each other during the evolutionary process, and their mutation strategies are very different and complementary: SHADE's strategy scatters a mutated vector around the target vector, while eSHADE-ls' strategy scatters it close to one of the top individuals in the population. Our proposed approach was able to outperform its components (when considered independently), as well as SHADEILS in most of the test problems adopted (we adopted the CEC'2013 LSGO test problems). Additionally, it was found to be very competitive with respect to four state-of-the-art algorithms and obtained the best rank (based on the FOS criterion).

As part of our future work, we are interested in experimenting with different hybrid schemes. For example, one possibility would be to have an automatic resource allocation mechanism that determines the maximum number of evaluations for which each component of our approach should be executed, with the aim of improving its performance. We are also interested in trying other (more elaborate) local search schemes that could further improve the performance of our proposed approach.