1 Introduction

In this paper, we consider the following singularly perturbed convection–diffusion problem with two small parameters

$$\begin{aligned} \left\{ \begin{array}{l} Lu=-\varepsilon u''(x)+\mu b(x)u'(x)+c(x)u(x)=f(x),\\ \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad x\in \varOmega =(0,1),\\ u(0)=A,~~~~u(1)=B, \end{array} \right. \end{aligned}$$
(1)

where \(0<\varepsilon \ll 1\) and \(0<\mu \ll 1\). The functions b(x), c(x) and f(x) are assumed to be sufficiently smooth and to satisfy

$$\begin{aligned} 0<b^*\le b(x), \;0<c^*\le c(x), \; x\in [0,1], \end{aligned}$$

where \(b^*\) and \(c^*\) are two positive constants. When \(\mu =0\) or \(\mu =1\), this problem reduces to a reaction–diffusion problem or a convection–diffusion problem, respectively. Such problems arise in transport phenomena in chemistry and biology (Bigge and Bohl 1985). The nature of the two-parameter problem was examined asymptotically by O’Malley (1967), who showed that the ratio of \(\mu \) to \(\varepsilon \) plays a significant role in the behavior of the solution. For this problem, boundary layers occur at both \(x=0\) and \(x=1\). Because of these layers, standard numerical methods applied on a uniform mesh fail to give a satisfactory numerical solution. Thus, much attention has been focused on non-uniform meshes adapted to singularly perturbed problems.

Recently, Gracia et al. (2006) used a second-order monotone numerical scheme combined with a piecewise-uniform Shishkin mesh to solve problem (1). Linß (2010) presented a streamline-diffusion finite element method (SDFEM) on a Shishkin mesh. Furthermore, Linß and Roos (2004) developed a first-order upwind difference scheme on a piecewise-uniform Shishkin mesh. Roos and Uzelac (2003) also proposed an SDFEM on a Shishkin mesh for problem (1). Herceg (2011) presented a finite difference scheme, discretized on a Bakhvalov-type mesh, for a class of linear singularly perturbed boundary value problems with two small parameters. Kadalbajoo and Yadaw (2008) solved problem (1) by using the cubic B-spline collocation method on a piecewise-uniform Shishkin mesh.

In summary, the above literature shows that the upwind finite difference scheme applied on a Shishkin mesh is very popular for solving the singularly perturbed convection–diffusion equation with two small parameters. This mesh contains two grid transition points \(\lambda _1\) and \(\lambda _2\), which are defined differently in different papers. In Miller et al. (1996), the authors defined \(\lambda _1\) and \(\lambda _2\) as

$$\begin{aligned} \lambda _1=\min \left( \frac{1}{4},\frac{\sigma _1}{\mu _1}\ln N\right) ,~~\lambda _2=\min \left( \frac{1}{4},\frac{\sigma _2}{\mu _2}\ln N\right) , \end{aligned}$$

where \(\sigma _1,\sigma _2\) are two positive constants, \(\mu _1\) and \(\mu _2\) are defined in Miller et al. (1996), and N, the discretization parameter, is a positive even integer. They then divided the intervals \([0,\lambda _1]\) and \([1-\lambda _2,1]\) into N/4 subintervals each, and \([\lambda _1,1-\lambda _2]\) into N/2 subintervals. In practical computation, the numerical results for problem (1) depend on the choice of the constants \(\sigma _1,\sigma _2\). To the best of our knowledge, there is no systematic method for calculating the grid parameters \(\sigma _1\) and \(\sigma _2\). Therefore, it is important to develop a clear method for obtaining the best Shishkin mesh parameters.

In recent years, various improved or hybrid intelligence algorithms have been designed to solve optimization problems (Ouyang and Yang 2016), such as PSO with a neighborhood operator (Suganthan 1999), distance-based locally informed PSO (Qu et al. 2013), a hybrid PSO algorithm (Ouyang et al. 2014), a hybrid genetic algorithm (Xu et al. 2014), hybrid chemical reaction optimization (Xu et al. 2015), parallel hybrid PSO (Ouyang et al. 2015), the heterogeneous CLPSO algorithm (Lynn and Suganthan 2015), a multi-population DE algorithm (Wu et al. 2016), a hybrid harmony search algorithm (Ouyang et al. 2016a), hybrid cultural algorithms (Ali et al. 2016a, b) and a hybrid invasive weed optimization algorithm (Ouyang et al. 2016b).

As is well known, the differential evolution (DE) algorithm (Storn and Price 1997) is a fast and simple method which performs well on a wide variety of problems. It is a population-based stochastic search technique which is inherently parallel. DE is a relatively new nonlinear search and optimization approach that is particularly well suited to complicated optimization problems. Owing to its simple structure, easy implementation and good computational efficiency, the DE algorithm has been successfully applied to many problems in, for example, mechanical engineering (Abderazek et al. 2015), signal processing (Liu and Lampinen 2005), pattern recognition (Das and Konar 2009) and parameter estimation (Gong and Cai 1976). Recently, some hybrid DE algorithms (Gong et al. 2011, 2015) have also been presented for global numerical optimization.

In view of the unique advantages of the differential evolution algorithm for parameter estimation, the main work of this paper is to use the jDE algorithm (Brest et al. 2006) to obtain the best mesh transition points. More specifically, we first use the B-spline collocation technique developed in Kadalbajoo and Yadaw (2008) to compute the numerical solution of problem (1). Then, we use the double-mesh principle (Matthews et al. 2002) to estimate the absolute errors. Finally, we transform the choice of the mesh parameters into a nonlinear unconstrained optimization problem and utilize the jDE algorithm to find two suitable mesh transition points and the corresponding numerical results for problem (1).

The remainder of this paper is organized as follows. Section 2 gives a brief introduction to the mesh selection strategy. Section 3 presents a detailed description of the B-spline collocation method. Section 4 introduces a differential evolution algorithm to optimize the Shishkin mesh parameters. Section 5 presents and discusses the numerical experimental results in detail. Finally, the paper concludes with Sect. 6.

2 Mesh selection strategy

First, we use a piecewise-uniform grid to divide the interval [0, 1] into three subintervals:

$$\begin{aligned} \varOmega _0=[0,\lambda _1],~~\varOmega _c=[\lambda _1,1- \lambda _2]~~\text {and}~~\varOmega _1=[1-\lambda _2,1], \end{aligned}$$

where the transition parameters are given by

$$\begin{aligned} \lambda _1=\min \left( \frac{1}{4},\delta _1\mu \ln N\right) ,~~\lambda _2=\min \left( \frac{1}{4},\delta _2\varepsilon \ln N\right) , \end{aligned}$$

where \(\delta _1\) and \(\delta _2\) are two positive parameters. Then, we place N/4, N/2 and N/4 mesh intervals in the three subregions \([0,\lambda _1]\), \([\lambda _1,1-\lambda _2]\) and \([1-\lambda _2,1]\), respectively. Finally, the mesh widths are as follows:

$$\begin{aligned} \tilde{h}=\left\{ \begin{array}{ll} \frac{4\lambda _1}{N}, &{}\quad \hbox {for the interval } [0,\lambda _1], \\ \frac{2(1-\lambda _1-\lambda _2)}{N}, &{}\quad \hbox {for the interval } [\lambda _1,1-\lambda _2], \\ \frac{4\lambda _2}{N}, &{}\quad \hbox {for the interval } [1-\lambda _2,1]. \end{array} \right. \end{aligned}$$
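For concreteness, the following is a minimal sketch of this mesh construction (ours, in Python rather than the authors' MATLAB implementation; it assumes N is divisible by 4):

```python
import numpy as np

def shishkin_mesh(delta1, delta2, eps, mu, N):
    """Piecewise-uniform Shishkin mesh on [0,1] with N subintervals.

    Transition points: lambda1 = min(1/4, delta1*mu*ln N),
                       lambda2 = min(1/4, delta2*eps*ln N).
    """
    lam1 = min(0.25, delta1 * mu * np.log(N))
    lam2 = min(0.25, delta2 * eps * np.log(N))
    # N/4, N/2 and N/4 subintervals in the three subregions
    left = np.linspace(0.0, lam1, N // 4 + 1)
    center = np.linspace(lam1, 1.0 - lam2, N // 2 + 1)
    right = np.linspace(1.0 - lam2, 1.0, N // 4 + 1)
    # drop the duplicated transition points when concatenating
    return np.concatenate([left, center[1:], right[1:]])

# example (illustrative parameter values): a mesh condensing near both layers
x = shishkin_mesh(delta1=2.0, delta2=2.0, eps=1e-8, mu=1e-10, N=32)
```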

3 B-spline collocation method

Let \(\overline{\varOmega }^N=\{x_0,x_1,x_2,\ldots ,x_N\}\) be the Shishkin mesh defined in Sect. 2. Then the cubic B-spline functions (Kadalbajoo and Yadaw 2008) are given as follows:

$$\begin{aligned} B_i(x)=\frac{1}{\tilde{h}^3}\left\{ \begin{array}{ll} (x-x_{i-2})^3, &{}\quad \hbox {if } x_{i-2}\le x\le x_{i-1}, \\ \tilde{h}^3+3\tilde{h}^2(x-x_{i-1})+3\tilde{h}(x-x_{i-1})^2 -3(x-x_{i-1})^3, &{} \quad \hbox {if } ~x_{i-1}\le x\le x_{i},\\ \tilde{h}^3+3\tilde{h}^2(x_{i+1}-x)+3\tilde{h}(x_{i+1}-x)^2 -3(x_{i+1}-x)^3, &{} \quad \hbox {if } ~x_{i}\le x\le x_{i+1},\\ (x_{i+2}-x)^3, &{} \quad \hbox {if } ~x_{i+1}\le x\le x_{i+2}, \\ 0, &{}\quad \hbox {otherwise,} \end{array} \right. \end{aligned}$$
(2)

where \(i=0,1,2,\ldots ,N\). For the functions in (2), we introduce four additional knots \(x_{-2}<x_{-1}<x_0\) and \(x_{N+2}>x_{N+1}>x_N\). Obviously, each of the functions \(B_i(x)\) is twice continuously differentiable on the entire real line. In addition, for each \(x_j\), \(j=0,1,\ldots ,N\), we have

$$\begin{aligned} B_i(x_j)=\left\{ \begin{array}{ll} 4, &{}\quad \hbox {if }~i=j, \\ 1, &{}\quad \hbox {if }~i-j=\pm 1,\\ 0, &{}\quad \hbox {if }~i-j=\pm 2. \end{array} \right. \end{aligned}$$
(3)

Similarly, we can show that

$$\begin{aligned} B'_i(x_j)=\left\{ \begin{array}{ll} 0, &{}\quad \hbox {if }~i=j, \\ \displaystyle {\pm \frac{3}{\tilde{h}}}, &{}\quad \hbox {if }~i-j=\pm 1, \\ 0, &{}\quad \hbox {if }~i-j=\pm 2 \end{array} \right. \end{aligned}$$
(4)

and

$$\begin{aligned} B''_i(x_j)=\left\{ \begin{array}{ll} \displaystyle {\frac{-12}{\tilde{h}^2}}, &{} \quad \hbox {if } ~i=j, \\ \displaystyle {\frac{6}{\tilde{h}^2}}, &{}\quad \hbox {if } ~i-j=\pm 1, \\ \displaystyle {0}, &{}\quad \hbox {if }~i-j=\pm 2. \end{array} \right. \end{aligned}$$
(5)
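The following short sketch (our illustration, for a node \(x_i\) with uniform spacing \(\tilde{h}\)) evaluates \(B_i\) piecewise from (2) and confirms the nodal values in (3):

```python
def cubic_bspline(x, xi, h):
    """Cubic B-spline centered at node xi with uniform spacing h, per (2)."""
    if xi - 2*h <= x <= xi - h:
        return (x - (xi - 2*h))**3 / h**3
    if xi - h <= x <= xi:
        t = x - (xi - h)
        return (h**3 + 3*h**2*t + 3*h*t**2 - 3*t**3) / h**3
    if xi <= x <= xi + h:
        t = (xi + h) - x
        return (h**3 + 3*h**2*t + 3*h*t**2 - 3*t**3) / h**3
    if xi + h <= x <= xi + 2*h:
        return ((xi + 2*h) - x)**3 / h**3
    return 0.0

# nodal values reproduce (3): 4 at x_i, 1 at x_i +/- h, 0 beyond
assert cubic_bspline(0.0, 0.0, 0.5) == 4.0
assert cubic_bspline(0.5, 0.0, 0.5) == 1.0
```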

It is well known that the dimension of the cubic B-spline space is \(N+3\). Similarly to (2), we first define two extra cubic B-spline functions \(B_{-1}\) and \(B_{N+1}\). Then, the cubic B-spline function space is given as follows

$$\begin{aligned} \varPhi _3(\overline{\varOmega })=\texttt {span}\{B_{-1},B_0,B_1, \ldots ,B_{N+1}\}. \end{aligned}$$

Thus, any cubic spline function \(S(x)\in \varPhi _3(\overline{\varOmega })\) can be written as

$$\begin{aligned} S(x)=\sum _{i=-1}^{N+1}a_iB_i(x), \end{aligned}$$
(6)

where \(a_i\) are unknown real coefficients.

Here, we use the function S(x) defined in (6) to approximate the exact solution of (1), yielding

$$\begin{aligned} LS(x_i)=f(x_i),~~~~0\le i \le N, \end{aligned}$$
(7)

and

$$\begin{aligned} S(x_0)=A,~~~~S(x_N)=B. \end{aligned}$$
(8)

By using the values of the B-spline functions \(B_i\) and of their derivatives at the mesh points of \(\overline{\varOmega }^N\), we obtain the following system of \(N+1\) linear equations in \(N+3\) unknowns

$$\begin{aligned}&(-6\varepsilon -3\mu b_i\tilde{h}+c_i\tilde{h}^2)a_{i-1}+(12\varepsilon +4c_i\tilde{h}^2)a_i\nonumber \\&\quad +(-6\varepsilon +3\mu b_i\tilde{h}+c_i\tilde{h}^2)a_{i+1}=f_i\tilde{h}^2, \end{aligned}$$
(9)

where \(0\le i \le N\).

From the boundary conditions, we have

$$\begin{aligned} a_{-1}+4a_0+a_1=A, \end{aligned}$$
(10)

and

$$\begin{aligned} a_{N-1}+4a_N+a_{N+1}=B. \end{aligned}$$
(11)

Next, eliminating \(a_{-1}\) from the first equation of (9) by using (10), we obtain

$$\begin{aligned}&(36\varepsilon +12\mu \tilde{h}b_0)a_0+6\mu b_0 \tilde{h}a_1\nonumber \\&\quad =\tilde{h}^2f_0-A(-6\varepsilon -3\mu b_0\tilde{h}+c_0\tilde{h}^2). \end{aligned}$$
(12)

Similarly, eliminating \(a_{N+1}\) from the last equation of (9) by using (11), we have

$$\begin{aligned}&(-6\mu \tilde{h}b_N)a_{N-1}+(36\varepsilon -12\mu b_N\tilde{h})a_N\nonumber \\&\quad =\tilde{h}^2f_N-B(-6\varepsilon +3\mu b_N\tilde{h}+c_N\tilde{h}^2). \end{aligned}$$
(13)

Finally, we obtain the following system of \(N+1\) linear equations in \(N+1\) unknowns

$$\begin{aligned} Tx_N=d_N, \end{aligned}$$
(14)

where \(x_N=(a_0,a_1,\ldots ,a_{N})^{\mathrm {T}}\) is the vector of unknown coefficients, with right-hand side

$$\begin{aligned} d_N&=(\tilde{h}^2f_0-A(-6\varepsilon -3\mu b_0\tilde{h}+c_0\tilde{h}^2),\tilde{h}^2f_1,\ldots ,\tilde{h}^2f_{N-1},\\&\qquad \tilde{h}^2f_N-B(-6\varepsilon +3\mu b_N\tilde{h}+c_N\tilde{h}^2))^{\mathrm {T}} \end{aligned}$$

and

$$\begin{aligned} T=\left[ \begin{array}{ccccccc} t_{0,0} &{} \quad t_{0,1} &{} \quad 0 &{} \quad 0 &{} \quad \cdots &{} \quad 0 &{} \quad 0 \\ t_1 &{} \quad t_2 &{} \quad t_3 &{} \quad 0 &{} \quad \cdots &{} \quad 0 &{} \quad 0 \\ 0 &{} \quad t_1 &{} \quad t_2 &{} \quad t_3 &{} \quad \cdots &{} \quad 0 &{} \quad 0 \\ \vdots &{} \quad \vdots &{} \quad \ddots &{} \quad \ddots &{} \quad \ddots &{} \quad \vdots &{} \quad \vdots \\ 0 &{} \quad 0 &{} \quad \cdots &{} \quad t_1 &{} \quad t_2 &{} \quad t_3 &{} \quad 0 \\ 0 &{} \quad 0 &{} \quad \cdots &{} \quad 0 &{} \quad t_1 &{} \quad t_2 &{} \quad t_3 \\ 0 &{} \quad 0 &{} \quad \cdots &{} \quad 0 &{} \quad 0 &{} \quad t_{N,N-1} &{} \quad t_{N,N} \end{array} \right] , \end{aligned}$$

where, for the interior rows \(i=1,2,\ldots ,N-1\),

$$\begin{aligned} t_{0,0}= & {} 36\varepsilon +12\mu \tilde{h}b_0,~~~~ t_{0,1}=6\mu b_0\tilde{h},\\ t_1= & {} -6\varepsilon -3\mu b_i\tilde{h}+c_i\tilde{h}^2,\\ t_2= & {} 12\varepsilon +4c_i\tilde{h}^2,~~~~ t_3=-6\varepsilon +3\mu b_i\tilde{h}+c_i\tilde{h}^2,\\ t_{N,N-1}= & {} -6\mu \tilde{h}b_N, ~~~~ t_{N,N}=36\varepsilon -12\mu b_N\tilde{h}. \end{aligned}$$

It is easy to see that the matrix T is strictly diagonally dominant. Thus, we can solve system (14) for \(a_0,a_1,\ldots ,a_N\). Furthermore, we obtain \(a_{-1}\) and \(a_{N+1}\) by substituting \(a_0,a_1,\ldots ,a_N\) into the boundary conditions (10) and (11). Hence, the collocation method using a basis of cubic B-spline functions applied to problem (1) has a unique solution.
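To make the assembly concrete, the sketch below (ours; it adopts the single mesh width \(\tilde{h}\) used in (9) and relies on SciPy's banded solver) forms and solves system (14), then recovers \(a_{-1}\) and \(a_{N+1}\) from (10)–(11):

```python
import numpy as np
from scipy.linalg import solve_banded

def solve_collocation(b, c, f, h, eps, mu, A, B, x):
    """Assemble the tridiagonal system (14), solve for a_0,...,a_N,
    then recover a_{-1} and a_{N+1} from (10)-(11)."""
    N = len(x) - 1
    bx, cx, fx = b(x), c(x), f(x)
    t1 = -6*eps - 3*mu*bx*h + cx*h**2      # sub-diagonal coefficient of each row
    t2 = 12*eps + 4*cx*h**2                # diagonal coefficient
    t3 = -6*eps + 3*mu*bx*h + cx*h**2      # super-diagonal coefficient
    d = fx * h**2
    # boundary rows (12) and (13) replace rows 0 and N
    t2[0], t3[0] = 36*eps + 12*mu*h*bx[0], 6*mu*bx[0]*h
    d[0] = fx[0]*h**2 - A*(-6*eps - 3*mu*bx[0]*h + cx[0]*h**2)
    t1[N], t2[N] = -6*mu*h*bx[N], 36*eps - 12*mu*bx[N]*h
    d[N] = fx[N]*h**2 - B*(-6*eps + 3*mu*bx[N]*h + cx[N]*h**2)
    # banded storage: row 0 = super-diagonal, row 1 = diagonal, row 2 = sub-diagonal
    ab = np.zeros((3, N + 1))
    ab[0, 1:] = t3[:-1]    # t3 of row i sits in column i+1
    ab[1, :] = t2
    ab[2, :-1] = t1[1:]    # t1 of row i sits in column i-1
    a = solve_banded((1, 1), ab, d)
    a_m1 = A - 4*a[0] - a[1]               # from (10)
    a_p1 = B - 4*a[N] - a[N-1]             # from (11)
    return a_m1, a, a_p1
```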

4 A differential evolution algorithm to optimize the Shishkin mesh parameters

4.1 The objective function

In general, the exact solution of problem (1) is not available, especially for nonlinear problems. Thus, to estimate the absolute errors of the numerical solution of problem (1), we use the double-mesh principle developed in Matthews et al. (2002). Obviously, for each \(\varepsilon \), \(\mu \) and N, the solution of (7) is a function of the two variables \(\delta _1\) and \(\delta _2\). So, we let \(\mathbf S _{\varepsilon ,\mu }^N(\delta _1,\delta _2)\) denote the solution of the approximate scheme on the original Shishkin mesh, and \(\mathbf S _{\varepsilon ,\mu }^{2N}(\delta _1,\delta _2)\) the solution on the mesh produced by uniformly bisecting the original mesh. Then the maximum point-wise error can be estimated by

$$\begin{aligned} E_{\varepsilon ,\mu }^N(\delta _1,\delta _2) =\Vert \mathbf S _{\varepsilon ,\mu }^N(\delta _1,\delta _2) -\mathbf S _{\varepsilon ,\mu }^{2N}(\delta _1,\delta _2)\Vert _{\infty }. \end{aligned}$$
(15)

In practical computation, one chooses the parameters \(\delta _1\) and \(\delta _2\) so as to make the value of \(E_{\varepsilon ,\mu }^N(\delta _1,\delta _2)\) as small as possible. Therefore, in this paper, we transform the calculation of the mesh parameters into the following nonlinear unconstrained optimization problem

$$\begin{aligned} \texttt {Fitness}=\min _{\delta _1,\delta _2}\Vert \mathbf S _{\varepsilon ,\mu }^N(\delta _1,\delta _2) -\mathbf S _{\varepsilon ,\mu }^{2N}(\delta _1,\delta _2) \Vert _{\infty }. \end{aligned}$$
(16)

Obviously, the objective function (16) is an implicit function of the variables \(\delta _1\) and \(\delta _2\) and is not differentiable, so traditional optimization methods are not suitable for it. In addition, if the objective function (16) has many local extrema, traditional optimization methods may fail to find the global optimum efficiently.
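A hedged sketch of evaluating the objective (16): here `solve_on_mesh` is a hypothetical wrapper around the B-spline collocation solve of Sect. 3 applied to problem (1) on a given mesh (problem data omitted for brevity), and `shishkin_mesh` is the mesh constructor sketched in Sect. 2. Uniform bisection keeps every coarse node at an even index of the fine mesh, which is where the two solutions are compared:

```python
import numpy as np

def bisect(x):
    """Uniformly bisect a mesh: insert the midpoint of every subinterval."""
    fine = np.empty(2 * len(x) - 1)
    fine[::2] = x
    fine[1::2] = 0.5 * (x[:-1] + x[1:])
    return fine

def fitness(params, N, eps, mu):
    """Double-mesh error (16) for given mesh parameters (delta1, delta2)."""
    delta1, delta2 = params
    x = shishkin_mesh(delta1, delta2, eps, mu, N)
    S_N = solve_on_mesh(x)            # nodal values on the original mesh
    S_2N = solve_on_mesh(bisect(x))   # nodal values on the bisected mesh
    return np.max(np.abs(S_N - S_2N[::2]))
```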

4.2 A brief review of differential evolution algorithm

The differential evolution (DE) algorithm presented by Storn and Price (1997) is an effective and practical intelligent optimization algorithm. It solves an optimization problem by evolving a population of D-dimensional parameter vectors, so-called individuals, which encode the candidate solutions, i.e., \(\mathbf x _{i,G}=(x_{1i,G},\ldots ,x_{Di,G})\), \(i=1,\ldots ,\hbox {NP}\), toward the global optimum. Here, NP is the number of individuals in the population and \(\mathbf x _{i,G}\) is the ith target vector at generation G. First, the initial population is chosen by uniformly randomizing individuals within the search space constrained by the minimum and maximum bounds \(\mathbf x _{min}\) and \(\mathbf x _{max}\). Then, the DE algorithm proceeds by three operations: mutation, crossover and selection.

4.2.1 Mutation

After initialization, for each target vector \(\mathbf x _{i,G}\), a mutant vector \(\mathbf v _{i,G+1}=(v_{1i,G+1}, v_{2i,G+1}, \ldots , v_{Di,G+1})\) is generated by

$$\begin{aligned} \mathbf v _{i,G+1}=\mathbf x _{r_1,G}+F(\mathbf x _{r_2,G}-\mathbf x _{r_3,G}),\;r_1\ne r_2\ne r_3\ne i,\nonumber \\ \end{aligned}$$
(17)

where \(r_1,r_2,r_3\in \{1,\ldots ,\hbox {NP}\}\) are mutually different random indices, and \(F\in [0,2]\) is a scaling constant that controls the amplification of the difference vector \((\mathbf x _{r_2,G}-\mathbf x _{r_3,G})\).

4.2.2 Crossover

A trial vector \(\mathbf u _{i,G+1}=(u_{1i,G+1},u_{2i,G+1},\ldots ,u_{Di,G+1})\) is obtained by the crossover operator, according to the following scheme

$$\begin{aligned} u_{ji,G+1}=\left\{ \begin{aligned} v_{ji,G+1},&\quad \text {if}\; r(j)\le \hbox {CR}\; \text {or}\; j=rn(i), \\ x_{ji,G},&\quad \text {if}\; r(j)>\hbox {CR}\; \text {and}\; j\ne rn(i), \end{aligned} \right. \end{aligned}$$
(18)

where \(j=1,2,\ldots ,D\), \(r(j)\in [0,1]\) is the jth evaluation of a uniform random number generator, \(\hbox {CR}\in [0,1]\) is a user-specified constant which controls the fraction of parameter values copied from the mutant vector, and \(rn(i)\in \{1,2,\ldots ,D\}\) is a randomly chosen index which ensures that \(\mathbf u _{i,G+1}\) gets at least one element from \(\mathbf v _{i,G+1}\).

4.2.3 Selection

A greedy selection scheme is given by

$$\begin{aligned} \mathbf x _{i,G+1}=\left\{ \begin{array}{ll} \mathbf u _{i,G+1},&{}\quad \text {if}\; f(\mathbf u _{i,G+1})<f(\mathbf x _{i,G}), \\ \mathbf x _{i,G},\;&{}\quad \text {otherwise}, \end{array} \right. \end{aligned}$$
(19)

In other words, if, and only if, the trial vector \(\mathbf u _{i,G+1}\) has a lower objective function value than the corresponding target vector \(\mathbf x _{i,G}\), the trial vector replaces the target vector and enters the population of the next generation. Otherwise, the old vector \(\mathbf x _{i,G}\) is retained in the population for the next generation.
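Putting (17)–(19) together, here is a minimal sketch (ours) of the classic DE/rand/1/bin loop for minimizing a given fitness function; the search bounds and the box-clipping of mutants are our assumptions, not part of the scheme above:

```python
import numpy as np

def de_minimize(fitness, lo, hi, NP=50, F=0.5, CR=0.1, G_max=50, rng=None):
    """Classic DE/rand/1/bin: mutation (17), crossover (18), selection (19)."""
    if rng is None:
        rng = np.random.default_rng()
    D = len(lo)
    pop = lo + rng.random((NP, D)) * (hi - lo)   # uniform random initialization
    fit = np.array([fitness(x) for x in pop])
    for _ in range(G_max):
        for i in range(NP):
            # mutation (17): three mutually distinct indices, all different from i
            r1, r2, r3 = rng.choice([k for k in range(NP) if k != i],
                                    size=3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
            # binomial crossover (18) with a guaranteed component j = rn(i)
            mask = rng.random(D) <= CR
            mask[rng.integers(D)] = True
            u = np.where(mask, v, pop[i])
            # greedy selection (19)
            fu = fitness(u)
            if fu < fit[i]:
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# e.g. (illustrative bounds for the two mesh parameters delta1, delta2):
# de_minimize(lambda p: fitness(p, N=32, eps=1e-8, mu=1e-10),
#             lo=np.array([0.1, 0.1]), hi=np.array([4.0, 4.0]))
```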

The DE algorithm can be used to solve multi-objective, non-differentiable problems, and so on. It is very efficient in a wide range of engineering applications such as neural networks (Piotrowski 2014), image processing (Ali et al. 2014), etc.

4.3 Previous work related to DE algorithm

The effectiveness of the standard DE algorithm in solving a complicated optimization problem depends highly on the chosen mutation strategy and its associated parameter values. In the past few years, many DE researchers have proposed techniques for choosing trial vector generation strategies and their associated control parameter settings. According to Storn and Price (1997), the DE algorithm is very sensitive to the choice of the parameters F and CR; the suggested choices are \(F\in [0.5,1]\), \(\hbox {CR}\in [0.8,1]\) and \(\hbox {NP}=10D\). Liu and Lampinen (2005) set the control parameters to \(F=0.9\) and \(\hbox {CR}=0.9\). Ali and Törn (2004) chose \(\hbox {CR}=0.5\) and used the following scheme to calculate F

$$\begin{aligned} F=\left\{ \begin{aligned}&\text {max}\left( l_\mathrm{min}, 1-\left| \frac{f_\mathrm{max}}{f_\mathrm{min}}\right| \right) ,\quad \text {if}\; \left| \frac{f_\mathrm{max}}{f_\mathrm{min}}\right| <1 \\&\text {max}\left( l_\mathrm{min}, 1-\left| \frac{f_\mathrm{min}}{f_\mathrm{max}}\right| \right) ,\quad \text {otherwise}, \end{aligned} \right. \end{aligned}$$
(20)

where \(f_{max}\) and \(f_{min}\) are the maximum and minimum objective function values over the current population, respectively. Gämperle et al. (2002) considered different parameter settings for DE on the Sphere, Rosenbrock and Rastrigin functions; based on their experimental results, they recommended a scaling factor \(F=0.6\), a crossover rate CR in [0.3, 0.9], and NP in [3D, 8D]. Rönkkönen et al. (2005) suggested using F values in [0.4, 0.95] and CR values in [0, 0.2]. Recently, several researchers (Zaharie 2003; Zaharie and Petcu 2003; Abbass 2002) have developed approaches to control the parameters F and CR. Very recently, more and more researchers have paid attention to self-adaptive DE algorithms, see, e.g., Qin and Suganthan (2005), Omran et al. (2005) and Rahnamayan et al. (2008).

Table 1 Numerical results calculated by using different algorithms with \(\varepsilon =10^{-8}\), \(\mu =10^{-10}\), \(N=32\)
Table 2 Numerical results calculated by using different algorithms with \(\varepsilon =10^{-8}\), \(\mu =10^{-10}\), \(N=64\)
Table 3 Numerical results calculated by using different algorithms with \(\varepsilon =10^{-8}\), \(\mu =10^{-10}\), \(N=128\)
Table 4 Numerical results calculated by using different algorithms with \(\varepsilon =10^{-10}\), \(\mu =10^{-12}\), \(N=32\)
Table 5 Numerical results calculated by using different algorithms with \(\varepsilon =10^{-10}\), \(\mu =10^{-12}\), \(N=64\)
Table 6 Numerical results calculated by using different algorithms with \(\varepsilon =10^{-10}\), \(\mu =10^{-12}\), \(N=128\)
Table 7 Numerical results calculated by using different algorithms with \(\varepsilon =10^{-12}\), \(\mu =10^{-14}\), \(N=32\)
Table 8 Numerical results calculated by using different algorithms with \(\varepsilon =10^{-12}\), \(\mu =10^{-14}\), \(N=64\)
Table 9 Numerical results calculated by using different algorithms with \(\varepsilon =10^{-12}\), \(\mu =10^{-14}\), \(N=128\)

4.4 Self-adapting differential evolution (jDE) algorithm

Based on the above literature review, the effectiveness of the conventional DE algorithm in solving a numerical optimization problem depends on the selected mutation strategy and its associated parameter values. Therefore, choosing suitable control parameter values for the conventional DE algorithm is a very important task. In Brest et al. (2006), by introducing two new control parameters \(\tau _1\) and \(\tau _2\) to adjust the values of F and CR, Brest et al. proposed a self-adapting differential evolution (jDE) algorithm. The new parameters \(F_{i,G+1}\) and \(\hbox {CR}_{i,G+1}\) are calculated by

$$\begin{aligned} F_{i,G+1}&=\left\{ \begin{aligned}&F_l+\hbox {rand}_1*F_u, \quad \text {if}\; \hbox {rand}_2<\tau _1, \\&F_{i,G},\qquad \qquad \qquad \quad \text {otherwise}, \end{aligned} \right. \end{aligned}$$
(21)
$$\begin{aligned} \hbox {CR}_{i,G+1}&=\left\{ \begin{aligned}&\hbox {rand}_3,\quad \text {if}\; \hbox {rand}_4<\tau _2, \\&\hbox {CR}_{i,G},\;\; \text {otherwise}, \end{aligned} \right. \end{aligned}$$
(22)

where \(\hbox {rand}_j\in [0,1]\), \(j=1,2,3,4\), are uniform random values, and \(F_l=0.1\) and \(F_u=0.9\) control the range of the new F. Obviously, from Eqs. (21)–(22), the new F takes a value from [0.1, 1.0] in a random manner and the new CR takes a value from [0, 1].

In this paper, in order to solve the nonlinear optimization problem (16), we use the technique presented in (21)–(22) to obtain the control parameters F and CR. The population size NP is kept fixed during the run.
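A sketch of the self-adaptation (21)–(22): each individual carries its own pair \((F_i, \hbox {CR}_i)\), regenerated before mutation with probabilities \(\tau _1=\tau _2=0.1\) (the values used in Brest et al. 2006); this update would replace the fixed F and CR inside the DE loop sketched earlier:

```python
import numpy as np

def jde_update(F_i, CR_i, rng, tau1=0.1, tau2=0.1, F_l=0.1, F_u=0.9):
    """Per-individual parameter update (21)-(22)."""
    if rng.random() < tau1:
        F_i = F_l + rng.random() * F_u      # new F in [0.1, 1.0]
    if rng.random() < tau2:
        CR_i = rng.random()                 # new CR in [0, 1]
    return F_i, CR_i
```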

5 Numerical experiments

In this section, the following numerical example is given to illustrate the effectiveness of the presented method

$$\begin{aligned}&-\varepsilon u''(x)+\mu u'(x)+u(x)=\cos \pi x,~~~~x\in (0,1), \end{aligned}$$
(23)
$$\begin{aligned}&u(0)=0,~~~~u(1)=0. \end{aligned}$$
(24)

For two given parameters \(\delta _1\) and \(\delta _2\), let \(\texttt {S}^N\) and \(\texttt {S}^{2N}\) be the numerical solutions calculated on N and 2N mesh intervals, respectively. Then the following error estimate is defined:

$$\begin{aligned} E_{\varepsilon ,\mu }^{N}=\Vert \texttt {S}^N-\texttt {S}^{2N}\Vert _{\infty }. \end{aligned}$$

To solve problem (23)–(24), we first use the jDE algorithm to optimize problem (16) and obtain the optimal parameters \(\delta _1\) and \(\delta _2\) together with the corresponding numerical solution. To facilitate the experiments, we implemented the algorithms as an M-file in MATLAB R2012a on a PC with a 32-bit Windows 7 operating system, 4 GB of RAM and a 3.10 GHz Core(TM) i5 processor.

In the experiment, in order to illustrate the advantages of the jDE algorithm for solving the above optimization problem (16), we also calculate the numerical results obtained by the (original) DE algorithm, particle swarm optimization (PSO) (Kennedy and Eberhart 1995), comprehensive learning particle swarm optimization (CLPSO) (Liang et al. 2006), the real-coded genetic algorithm (rcGA) (Ono and Kobayashi 1997), the covariance matrix adaptation evolution strategy (CMA-ES) (Hansen et al. 2003), the self-adaptive DE (SaDE) algorithm (Qin et al. 2009), JADE (Zhang and Sanderson 2009) and the self-adapting differential evolution (jDE) algorithm (Brest et al. 2006).

Throughout this paper, the parameter settings of each stochastic algorithm are as follows:

(1) The maximum number of generations is 50 and the population size is \(\hbox {NP}=50\).

(2) For the PSO and CLPSO algorithms, the two accelerating factors are both set to 0.5, and the inertia weight factor is set to 0.45.

(3) For the (original) DE, SaDE and JADE algorithms, the scaling factor is \(F=0.5\) and the crossover probability is \(\hbox {CR}=0.1\).

In our experiment, for different values of \(\varepsilon \), \(\mu \) and N, the numerical results of 30 independent runs are summarized in Tables 1, 2, 3, 4, 5, 6, 7, 8 and 9. The maximum and minimum values of \(E_{\varepsilon ,\mu }^{N}\) and the corresponding parameters \(\delta _1\), \(\delta _2\) are also listed there. Meanwhile, in order to compare the robustness of each algorithm, the average values of \(E_{\varepsilon ,\mu }^{N}\) and the variances are also given. Comparing the maximum, minimum and mean values in Tables 1–9 shows that the computing precision of jDE is slightly higher than that of the other seven algorithms (PSO, CLPSO, rcGA, CMA-ES, DE, SaDE, JADE). In addition, comparing the variances shows that jDE is slightly more stable than the other seven algorithms. The experimental results thus show that the jDE algorithm has certain advantages in computing precision and stability over the other seven algorithms.

The comparison of the statistical data shows that the jDE algorithm gives better results than the rcGA, PSO, CLPSO and CMA-ES algorithms. Furthermore, the jDE algorithm performs better than the original DE, while the original DE algorithm does not always perform better than PSO. Thus, to further illustrate the advantages of the jDE algorithm, the convergence behavior of the jDE, DE, SaDE, JADE and PSO algorithms is plotted in Figs. 1, 2 and 3 for \(\varepsilon =10^{-8}, \mu =10^{-10}\), \(\varepsilon =10^{-10}, \mu =10^{-12}\) and \(\varepsilon =10^{-12}, \mu =10^{-14}\), respectively, with \(N=32\). From these figures, one can easily see that the jDE algorithm outperforms PSO, DE, SaDE and JADE: it converges faster than the other four algorithms.

Fig. 1 Fitness of different algorithms versus generation with \(\varepsilon =10^{-8}\), \(\mu =10^{-10}\) and \(N=32\)

Fig. 2 Fitness of different algorithms versus generation with \(\varepsilon =10^{-10}\), \(\mu =10^{-12}\) and \(N=32\)

Fig. 3 Fitness of different algorithms versus generation with \(\varepsilon =10^{-12}\), \(\mu =10^{-14}\) and \(N=32\)

6 Conclusions

In most cases, the Shishkin mesh is the mesh of choice for solving singularly perturbed problems. However, almost all authors select the Shishkin mesh transition parameters arbitrarily. Thus, this work can be viewed as a preliminary step toward a systematic numerical method for obtaining the Shishkin mesh transition points. Specifically, we transform the selection of the Shishkin mesh transition parameters into a nonlinear unconstrained optimization problem, which is solved by the self-adapting differential evolution (jDE) algorithm. The experimental results show that the jDE algorithm has certain advantages in computing precision, stability and convergence speed over the state-of-the-art algorithms. It is noted that the method presented in this paper can be extended to other types of singularly perturbed problems.