1 Introduction

Particle swarm optimization (PSO) was introduced by Kennedy and Eberhart in 1995 for solving optimization problems (Kennedy and Eberhart 1995). It is now a computationally efficient and widely used stochastic optimization algorithm. Since the initial introduction of PSO, various improved PSO algorithms have been proposed to enhance its performance (Ding et al. 2014; Lim and Isa 2014; Wang et al. 2014; Yu and Zhang 2014; Zou et al. 2014; Shi and Eberhart 1999; Nickabadi et al. 2011; Feng et al. 2007; Shen et al. 2010; Adewumi and Arasomwan 2016; Chuang et al. 2011; Gandomi et al. 2013; Arasomwan and Adewumi 2014; Malik et al. 2007; Ali and Kaelo 2008), and the algorithm has been applied in various areas (Behnamian and Fatemi 2010; Afshar 2012; Victoire and Jeyakumar 2005; Franken and Engelbrecht 2005). Compared with the research on improving and applying PSO, its theoretical analysis is relatively sparse and far from complete. The theoretical research on PSO can be roughly classified into three categories, namely (a) research on particle trajectories; (b) research on parameter selection and stability; and (c) research on convergence within the framework of stochastic optimization algorithms.

Research on particle trajectories: The first theoretical analysis of a simplified model of PSO was given by Ozcan and Mohan (1998, 1999), who analyzed the particle trajectories of this simplified model in one-dimensional and multidimensional spaces under the assumption that the random variables, the best position of each individual particle, and the swarm's best position are constant. Clerc and Kennedy (2002) investigated the characteristics and convergence of a particle trajectory based on the model presented in Ozcan and Mohan (1999). van den Bergh (2002) analyzed the particle trajectory for the model of Ozcan and Mohan (1999) with inertia weight. Li et al. (2006) made a thorough study of the stability of particle trajectories in PSO based on difference equations and the Z transform.

Research on parameter selection and stability: This line of work is mainly based on dynamic system stability theory. Trelea (2003) analyzed the convergence of PSO and gave a set of parameters that balances the exploration and exploitation abilities of PSO. Fernandez-Martinez and Garcia-Gonzalo investigated the stability of discrete and continuous PSO (Fernandez-Martinez et al. 2011). Kadirkamanathan et al. (2006) described the best particle as a nonlinear feedback system when PSO stagnates and derived a sufficient condition for stability based on Lyapunov theory. Emara and Fattah (2004) proposed a continuous model of PSO and proved the stability of the model based on Lyapunov theory. Rapaic and Kanovic (2009) presented a formal convergence analysis of PSO with time-varying parameters.

Research on convergence within the framework of stochastic optimization algorithms: The above analyses did not consider the randomness of the PSO algorithm, so their results deviate more or less from the real particle swarm system (Jiang et al. 2007). Therefore, several theoretical analyses of PSO have been carried out in a random setting. For the standard particle swarm optimization algorithm, Jiang et al. (2007) analyzed the conditions for convergence of the variance and expectation of the particle position based on stochastic process theory and proposed parameter selection guidelines derived from this convergence analysis. van den Bergh and Engelbrecht (2006) proved, using the convergence criterion proposed by Solis and Wets, that the standard particle swarm optimization algorithm is neither locally nor globally convergent, and they proposed an improved PSO algorithm to ensure local convergence. Chen and Chen (2011) introduced a statistical interpretation of PSO and gave theoretical results on the convergence time by using this statistical model. Jin et al. (2007) presented a sufficient condition for mean-square stability of the system based on stochastic process theory.

For a deterministic optimization algorithm, convergence means that the iterative sequence generated by the algorithm converges to the optimal solution. Similarly, the convergence of a stochastic optimization algorithm should mean that the iterative sequence generated by the algorithm converges to the optimal solution with probability 1. The aforementioned theoretical analyses are mainly based on dynamic system theory, difference equation theory, and stochastic process theory and do not use Markov chain theory to study the convergence of PSO in probability. Based on Markov chain theory, Pan et al. (2013) studied the convergence in probability of the standard PSO algorithm. However, the transition probability of a particle was obtained by an intuitive geometric argument without considering the probability density of the underlying distribution, so the correctness of the result is questionable due to the lack of a rigorous mathematical proof. It is therefore necessary to prove the convergence in probability of the standard PSO algorithm by rigorous mathematical methods and to propose an improved PSO algorithm that converges with probability 1.

In this paper, we derive the transition probability of a particle in the standard PSO algorithm by rigorous mathematical methods. Based on this result, we prove that the standard PSO algorithm converges with probability 1 under certain conditions. We then propose a new improved particle swarm optimization algorithm that not only converges with probability 1 but also improves the convergence rate and the solution quality.

This paper is organized as follows. In Sect. 2, the standard PSO algorithm is described in detail. Section 3 derives the density function of the updated particle position. In Sect. 4, the convergence analysis is carried out. The proposed algorithm and numerical simulations are given in Sect. 5. Finally, Sect. 6 gives conclusions.

2 The standard PSO algorithm

In the standard PSO algorithm, each solution in the search region \(S=\{x\in R^{D}|l_i \le x_i \le u_i ,\ l_i ,u_i \in R, \ i=1,2,\ldots ,D\}\) is modeled as a particle that moves in S according to its velocity. Each particle has a position variable and a velocity variable. Suppose that a swarm consists of \(N_p \) particles; the position and velocity of the ith particle of the swarm are denoted by the D-dimensional vectors \(X_i =(x_{i1} ,x_{i2} ,\ldots ,x_{iD} )^\mathrm{T}\in S \) and \(V_i =(v_{i1} ,v_{i2} ,\ldots ,v_{iD} )^\mathrm{T}\in V=\{(v_1 ,v_2 ,\ldots ,v_D )|v_{\min } \le v_i \le v_{\max } \}\), \(i=1,2,\ldots ,N_p \), respectively. The previous best position of the ith particle is represented by \(P_i =(p_{i1} ,p_{i2} ,\ldots , p_{iD} )^\mathrm{T}\). The previous best position of the swarm is denoted by \(P_g =(p_{g1} ,p_{g2} ,\ldots ,p_{gD} )^\mathrm{T}\). At iteration k, the d th dimension of particle i's velocity and position are updated as follows:

$$\begin{aligned} v_{id}^{k+1}= & {} \omega _k v_{id}^k +c_1 r_1 \left( p_{id}^k -x_{id}^k \right) +c_2 r_2 \left( p_{gd}^k -x_{id}^k \right) ,\nonumber \\ \end{aligned}$$
(1)
$$\begin{aligned} x_{id}^{k+1}= & {} x_{id}^k +v_{id}^{k+1} , \end{aligned}$$
(2)

where \(c_1 \) and \(c_2 \) are two acceleration factors, \(r_1 \) and \(r_2 \) are two random numbers uniformly distributed in the range \(\left[ {0,1} \right] \), and \(\omega _k \) is the inertia weight, which can be defined as follows:

$$\begin{aligned} \omega _k =(\omega _{\text {start}} -\omega _{\text {end}} )\left( {\frac{K_{\max } -k}{K_{\max } }} \right) +\omega _{\text {end}} , \end{aligned}$$
(3)

where \(\omega _{\text {start}} \) and \(\omega _{\text {end}} \) are the initial and final values of inertia weight, k is the current iteration number, \(K_{\max } \) is the maximum iteration number.

According to (1) and (2), the position update equation can also be described as

$$\begin{aligned} x_{id}^{k+1} =x_{id}^k +\omega _k v_{id}^k +c_1 r_1 \left( p_{id}^k -x_{id}^k \right) +c_2 r_2\left( p_{gd}^k -x_{id}^k \right) . \end{aligned}$$
(4)
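To make the update rule concrete, the following minimal Python sketch implements one iteration of Eqs. (1)–(4), including the linearly decreasing inertia weight of Eq. (3). The array shapes, the per-dimension sampling of \(r_1\) and \(r_2\), and all variable names are illustrative assumptions rather than part of the original description; updating the personal and global best positions and handling particles that leave S or V are omitted because the paper treats them separately.

```python
import numpy as np

def pso_step(X, V, P, P_g, k, K_max,
             c1=2.0, c2=2.0, w_start=0.9, w_end=0.4):
    """One iteration of the standard PSO update, Eqs. (1)-(4).

    X, V : (Np, D) arrays of particle positions and velocities
    P    : (Np, D) array of personal best positions p_i
    P_g  : (D,) array, best position of the swarm p_g
    """
    Np, D = X.shape
    # Eq. (3): linearly decreasing inertia weight
    w_k = (w_start - w_end) * (K_max - k) / K_max + w_end
    # r1, r2 ~ U[0, 1]; drawn per particle and per dimension, which matches
    # the per-dimension treatment used in the analysis of Sect. 4
    r1 = np.random.rand(Np, D)
    r2 = np.random.rand(Np, D)
    # Eq. (1): velocity update
    V_new = w_k * V + c1 * r1 * (P - X) + c2 * r2 * (P_g - X)
    # Eq. (2) / Eq. (4): position update
    X_new = X + V_new
    return X_new, V_new
```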

3 Probability density function

Let \(\xi \) and \(\eta \) be two independent random variables uniformly distributed in [0, 1]. We discuss the probability density function of the random variable \(\zeta =a\xi +b\eta +c\), where a, b, and c are real numbers. Lemma 1 is a standard result from probability theory.

Lemma 1

Assume that \((\xi ,\eta )\) is a two-dimensional continuous random variable whose joint density function is p(x, y). If the functions \(\left\{ {{\begin{array}{l} {u=g_1 (x,y)} \\ {v=g_2 (x,y)} \\ \end{array} }} \right. \) have continuous partial derivatives and possess unique inverse functions \(\left\{ {{\begin{array}{l} {x=x(u,v)} \\ {y=y(u,v),} \\ \end{array} }} \right. \) and we let \(\left\{ {{\begin{array}{l} {U=g_1 (\xi ,\eta )} \\ {V=g_2 (\xi ,\eta ),} \\ \end{array} }} \right. \) then the joint density function of (U, V) is given by

$$\begin{aligned} p(u,v)=p(x(u,v),y(u,v))|J| \end{aligned}$$
(5)

where \(J=\frac{\partial (x,y)}{\partial (u,v)}\) is the Jacobian determinant.

Lemma 2

Assume that \(\xi \) and \(\eta \) are two mutually independent random variables uniformly distributed in [0, 1]. Then the probability density function of the random variable \(\zeta =a\xi +b\eta +c\,(a\ne 0,\ b\ne 0,\ a\ne b)\) is

$$\begin{aligned} p_\zeta (u)=\left\{ {\begin{array}{ll} \frac{u-\alpha }{\left| a \right| \left| b \right| }, &{}\quad \hbox {if }\,\alpha \le u\le \beta \\ {} \frac{\min \left\{ {\left| a \right| ,\left| b \right| } \right\} }{\left| a \right| \left| b \right| }, &{}\quad \hbox {if }\beta \le u\le \gamma \\ {} \frac{\tau -u}{\left| a \right| \left| b \right| }, &{}\quad \hbox {if }\gamma \le u\le \tau \\ 0, &{}\quad \hbox {otherwise} \\ \end{array}} \right. \end{aligned}$$
(6)

where \(\alpha =\min \{0,a\}+\min \{0,b\}+c\), \(\beta =\min \big \{ {\min \left\{ {0,a} \right\} +\max \left\{ {0,b} \right\} ,\max \left\{ {0,a} \right\} +\min \left\{ {0,b} \right\} } \big \}+c\), \(\gamma = \max \left\{ {\min \left\{ {0,a} \right\} +\max \left\{ {0,b} \right\} ,\max \left\{ {0,a} \right\} +\min \left\{ {0,b} \right\} } \right\} +c,\) and \(\tau =\max \{0,a\}+\max \{0,b\}+c\).

Proof

Since \(\xi \) and \(\eta \) are random variables uniformly distributed in [0, 1], the density functions of \(\xi \) and \(\eta \) are \(p_\xi (x)=\left\{ {\begin{array}{ll} 1,&{}\quad \hbox { if }\,\,0\le x\le 1 \\ 0,&{}\quad \hbox { otherwise } \\ \end{array}} \right. \) and \(p_\eta (y)=\left\{ {\begin{array}{ll} 1,&{}\quad \hbox { if }\,\,0\le y\le 1 \\ 0,&{}\quad \hbox { otherwise } \\ \end{array}} \right. \), respectively. Because \(\xi \) and \(\eta \) are mutually independent, the joint density function of \((\xi ,\eta )\) is \(p(x,y)=p_\xi (x)\cdot p_\eta (y)\).

Let \(\left\{ {\begin{array}{l} \zeta =a\xi +b\eta +c \\ \varsigma =a\xi \\ \end{array}} \right. \), then inverse functions of functions \(\left\{ {\begin{array}{l} u=ax+by+c \\ v=ax \\ \end{array}} \right. \) are

$$\begin{aligned} \left\{ {\begin{array}{l} x=\frac{v}{a} \\ y=\frac{u-v-c}{b} \\ \end{array}} \right. . \end{aligned}$$
(7)

Jacobian determinant of Eq. (7) is \(J=-\frac{1}{ab}\). By Lemma 1, we know that the joint density function of \((\zeta ,\varsigma )\) is

$$\begin{aligned} p(u,v)= & {} p\left( \frac{v}{a},\frac{u-v-c}{b}\right) \cdot \left| J \right| \nonumber \\= & {} p_\xi \left( \frac{v}{a}\right) p_\eta \left( \frac{u-v-c}{b}\right) \cdot \frac{1}{|ab|}. \end{aligned}$$
(8)

Thus, the density function of \(\zeta =a\xi +b\eta +c\) is

$$\begin{aligned} p_\zeta (u)=\int _{-\infty }^{+\infty } {p_\xi \left( \frac{v}{a}\right) p_\eta \left( \frac{u-v-c}{b}\right) \cdot \frac{1}{|ab|}} \hbox {d}v. \end{aligned}$$
(9)

When \(a,b>0\), \(p_\xi (\frac{v}{a})\cdot p_\eta (\frac{u-v-c}{b})=1\) if and only if \(0\le v\le a\), \(u-b-c\le v\le u-c\), and \(c\le u\le a+b+c\). Hence, if \(c\le u\le a+b+c\), then

$$\begin{aligned} p_\zeta (u)= & {} \int _{\max \{0,u-b-c\}}^{\min \{a,u-c\}} {\frac{1}{ab}}\hbox {d}v\\= & {} \frac{\min \{a,u-c\}-\max \{0,u-b-c\}}{ab}\\= & {} \frac{a+b-|u-a-c|-|u-b-c|}{2ab}. \end{aligned}$$

If \(u<c\) or \(u>a+b+c\), then

$$\begin{aligned} p_\zeta (u)=0. \end{aligned}$$

This implies that

$$\begin{aligned} p_\zeta (u){=}\left\{ {\begin{array}{ll} \frac{a+b-\left| {u-(a+c)} \right| -\left| {u-(b+c)} \right| }{2ab},&{}\quad \hbox { if }\,\,c\le u\le a{+}b{+}c \\ 0, &{}\quad \hbox {otherwise} \\ \end{array}} \right. . \end{aligned}$$
(10)

From Eq. (10), it follows that when \(a>b>0\),

$$\begin{aligned} p_\zeta (u)=\left\{ {\begin{array}{ll} \frac{u-c}{\left| a \right| \left| b \right| },&{}\quad \hbox {if }\,\,c\le u\le b+c \\ {} \frac{b}{\left| a \right| \left| b \right| },&{}\quad \hbox {if }\,\,b+c<u\le a+c \\ {} \frac{a+b+c-u}{\left| a \right| \left| b \right| },&{}\quad \hbox {if }\,\,a+c<u\le a+b+c \\ {} 0, &{}\quad \hbox {otherwise} \\ \end{array}} \right. , \end{aligned}$$
(11)

and when \(b>a>0\),

$$\begin{aligned} p_\zeta (u)=\left\{ {\begin{array}{ll} \frac{u-c}{\left| a \right| \left| b \right| }, &{}\quad \hbox { if }\,\,c\le u\le a+c \\ {} \frac{a}{\left| a \right| \left| b \right| }, &{}\quad \hbox { if }\,\,a+c<u\le b+c \\ {} \frac{a+b+c-u}{\left| a \right| \left| b \right| },&{} \quad \hbox { if }\,\,b+c<u\le a+b+c \\ {} 0, &{}\quad \hbox {otherwise} \\ \end{array}} \right. . \end{aligned}$$
(12)

Similarly, when \(a>0,b<0\) and \(a>-b\), we can obtain

$$\begin{aligned} p_\zeta (u)=\left\{ \begin{array}{ll} \frac{u-(b+c)}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,b+c\le u\le c \\ {} \frac{-b}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,c\le u\le a+b+c \\ {} \frac{a+c-u}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,a+b+c\le u\le a+c \\ {} 0,&{}\quad \hbox {otherwise} \\ \end{array} \right. \end{aligned}$$
(13)

When \(a>0,b<0\) and \(a<-b\), we have

$$\begin{aligned} p_\zeta (u)=\left\{ {{\begin{array}{ll} \frac{u-(b+c)}{\left| a \right| \left| b \right| }, &{}\quad \hbox { if }\,\,b+c\le u\le a+b+c \\ {} \frac{a}{\left| a \right| \left| b \right| }, &{}\quad \hbox { if }\,\,a+b+c\le u\le c \\ {} \frac{a+c-u}{\left| a \right| \left| b \right| }, &{}\quad \hbox { if }\,\,c\le u\le a+c \\ {} 0, &{}\quad \hbox {otherwise} \\ \end{array} }} \right. . \end{aligned}$$
(14)

When \(a<0,b>0\) and \(b>-a\), we have

$$\begin{aligned} p_\zeta (u)=\left\{ {{\begin{array}{ll} \frac{u-(a+c)}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,a+c\le u\le c \\ {} \frac{-a}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,c\le u\le a+b+c \\ {} \frac{b+c-u}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,a+b+c\le u\le b+c \\ {} 0,&{}\quad \hbox {otherwise} \\ \end{array} }} \right. \end{aligned}$$
(15)

When \(a<0,b>0\) and \(b<-a\), we get

$$\begin{aligned} p_\zeta (u)=\left\{ {{\begin{array}{ll} \frac{u-(a+c)}{\left| a \right| \left| b \right| }, &{}\quad \hbox { if }\,\,a+c\le u\le a+b+c \\ {} \frac{b}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,a+b+c\le u\le c\\ {} \frac{b+c-u}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,c\le u\le b+c \\ {} 0,&{}\quad \hbox { otherwise } \\ \end{array} }} \right. \end{aligned}$$
(16)

When \(a<0,b<0\) and \(a<b\), we get

$$\begin{aligned} p_\zeta (u)=\left\{ \begin{array}{ll} \frac{u-(a+b+c)}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,a+b+c\le u\le a+c \\ {} \frac{-b}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,a+c\le u\le b+c \\ {} \frac{c-u}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,b+c\le u\le c \\ {} 0,&{}\quad \hbox {otherwise} \\ \end{array} \right. \end{aligned}$$
(17)

When \(a<0,b<0\) and \(a>b\), we obtain

$$\begin{aligned} p_\zeta (u)=\left\{ {{\begin{array}{ll} \frac{u-(a+b+c)}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,a+b+c\le u\le b+c \\ {} \frac{-a}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,b+c\le u\le a+c \\ {} \frac{c-u}{\left| a \right| \left| b \right| },&{}\quad \hbox { if }\,\,a+c\le u\le c \\ {} 0,&{}\quad \hbox {otherwise} \\ \end{array} }} \right. \end{aligned}$$
(18)

Let \(\alpha =\min \{0,a\}+\min \{0,b\}+c\), \(\beta =\min \big \{ {\min \left\{ {0,a} \right\} +\max \left\{ {0,b} \right\} ,\max \left\{ {0,a} \right\} +\min \left\{ {0,b} \right\} } \big \}+c\), \(\gamma =\max \left\{ {\min \left\{ {0,a} \right\} +\max \left\{ {0,b} \right\} ,\max \left\{ {0,a} \right\} +\min \left\{ {0,b} \right\} } \right\} +c,\) and \(\tau =\max \{0,a\}+\max \{0,b\}+c\). Then, the unified form of Eqs. (11)–(18) can be written as follows:

$$\begin{aligned} p_\zeta (u)=\left\{ {\begin{array}{ll} \frac{u-\alpha }{\left| a \right| \left| b \right| },&{}\quad \text {if}\,\,\,\alpha \le u\le \beta \\ {} \frac{\min \left\{ {\left| a \right| ,\left| b \right| } \right\} }{\left| a \right| \left| b \right| },&{}\quad \text {if}\,\beta \le u\le \gamma \\ {} \frac{\tau -u}{\left| a \right| \left| b \right| },&{}\quad \text {if}\,\,\,\gamma \le u\le \tau \\ {} 0,&{}\quad \hbox {otherwise} \\ \end{array}} \right. . \end{aligned}$$

This completes the proof of the lemma.

If \(a=b\ne 0\), then \(\beta =\gamma \). By Lemma 2, the density function of \(\zeta =a\xi +b\eta +c\) is

$$\begin{aligned} p_\zeta (u)=\left\{ {\begin{array}{ll} \frac{u-\alpha }{\left| a \right| \left| b \right| },&{}\quad \text {if }\,\,\alpha \le u\le \beta \\ {} \frac{\tau -u}{\left| a \right| \left| b \right| },&{}\quad \text {if }\,\,\beta \le u\le \tau \\ {} 0,&{}\quad \hbox {otherwise} \\ \end{array}} \right. . \end{aligned}$$
(19)

If \(a=0\) and \(b\ne 0\), then \(\zeta =b\eta +c\) is a function of a one-dimensional random variable, and its density function is easily obtained as

$$\begin{aligned} p_\zeta (u)=\left\{ {{\begin{array}{ll} \frac{1}{\left| b \right| },&{}\quad \text {if }\alpha \le u\le \tau \\ {} 0,&{}\quad \hbox {otherwise} \\ \end{array} }} \right. . \end{aligned}$$
(20)

Similarly, if \(a\ne 0\) and \(b=0\), we obtain the density function of \(\zeta =a\xi +c\) as

$$\begin{aligned} p_\zeta (u)=\left\{ {{\begin{array}{ll} \frac{1}{\left| a \right| },&{}\quad \text {if}\,\alpha \le u\le \tau \\ {} 0,&{}\quad \hbox {otherwise} \\ \end{array} }} \right. . \end{aligned}$$
(21)

\(\square \)
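The trapezoidal density in Eq. (6) is easy to verify numerically. The short Python sketch below compares a Monte Carlo histogram of \(\zeta =a\xi +b\eta +c\) with Eq. (6) for one arbitrarily chosen set of coefficients; the coefficient values, sample size, and bin count are illustrative assumptions.

```python
import numpy as np

def density_zeta(u, a, b, c):
    """Density of zeta = a*xi + b*eta + c from Eq. (6), for a != 0, b != 0, a != b."""
    alpha = min(0, a) + min(0, b) + c
    beta = min(min(0, a) + max(0, b), max(0, a) + min(0, b)) + c
    gamma = max(min(0, a) + max(0, b), max(0, a) + min(0, b)) + c
    tau = max(0, a) + max(0, b) + c
    ab = abs(a) * abs(b)
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    rise = (u >= alpha) & (u <= beta)   # linearly increasing part
    flat = (u > beta) & (u <= gamma)    # constant part
    fall = (u > gamma) & (u <= tau)     # linearly decreasing part
    out[rise] = (u[rise] - alpha) / ab
    out[flat] = min(abs(a), abs(b)) / ab
    out[fall] = (tau - u[fall]) / ab
    return out

# Monte Carlo check with arbitrary (illustrative) coefficients
a, b, c = 1.7, -0.6, 0.3
rng = np.random.default_rng(0)
samples = a * rng.random(10**6) + b * rng.random(10**6) + c
hist, edges = np.histogram(samples, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max |histogram - Eq. (6)| =", np.abs(hist - density_zeta(centers, a, b, c)).max())
```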

4 Convergence analysis

4.1 Definitions and notation

Definition 1

Let f(x) be a real-valued function and \(x^{*}\in S\). If

$$\begin{aligned} f(x^{{*}})\le f(x),\quad \forall x\in S, \end{aligned}$$

then \(x^{{*}}\) is said to be a global optimal solution on S.

Definition 2

\(\forall \varepsilon >0\), let

$$\begin{aligned} B_\varepsilon =\left\{ {x\in S\left| {\,|f(x)-f(x^{{*}})|<\varepsilon } \right. } \right\} , \end{aligned}$$

then \(B_\varepsilon \) is called the set of \(\varepsilon \)-optimal solutions, and any \(x\in B_\varepsilon \) is said to be an \(\varepsilon \)-optimal solution.

By Eqs. (1) and (2), the state of the ith particle in the PSO swarm is determined by the vector \(\varphi _i =(X_i ,V_i ,p_i ,p_g )^\mathrm{T}\in \varGamma \), where \(\varGamma =S\times V\times S\times S\subset R^{4D}\).

Definition 3

Let

$$\begin{aligned} \varPhi _{Np} =\left\{ \varPhi \vert \varPhi =(\varphi _1, \varphi _2, \ldots , \varphi _{Np}) ,\varphi _i \in \varGamma , \, i=1,2,\ldots ,Np \right\} , \end{aligned}$$

then \(\varPhi _{Np} \) is said to be the state space, where \(\varPhi \in \varPhi _{Np} \) is called a state.

Definition 4

Let

$$\begin{aligned} \varOmega =\left\{ {\omega |\omega =(\varPhi _1 ,\varPhi _2 ,\varPhi _3 ,\ldots ),\varPhi _k \in \varPhi _{Np} ,\forall k} \right\} , \end{aligned}$$

then \(\varOmega \) is said to be the sample space, and \(\omega \in \varOmega \) represents a sequence of states. Here \(\varPhi _k =(\varphi _1^k ,\varphi _2^k ,\ldots ,\varphi _{Np}^k )\) and \(\varphi _i^k =(X_i^k ,V_i^k ,p_i^k ,p_g^k )\), \(i=1,2,\ldots ,Np\).

We define the function,

$$\begin{aligned} \chi _B (\varPhi )=\sum _{i=1}^{Np} {(1_B (X_i )+1_B (p_i )+1_B (p_g ))},\quad \forall \varPhi \in \varPhi _{Np} \end{aligned}$$
(22)

where \(1_B (x)\) is the indicator function of B, i.e., \(1_B (x)=1\) if \(x\in B\) and \(1_B (x)=0\) otherwise.

We define the transition probability,

$$\begin{aligned} p\{\chi _B \big (\varPhi _{k+1} \big )\ne 0\}= & {} p\{\chi _B \big (\varPhi _{k+1} \big )\ne 0|\varPhi _k =\varPhi \},\nonumber \\&\forall \varPhi \in \varPhi _{Np} \end{aligned}$$
(23)

which is the conditional probability.

In this paper, we assume that f(x) is a continuous function.

By the assumption, we know that there exists a positive real number \(\delta \) such that

$$\begin{aligned} \mathrm{E}_\delta =\{(x_1 ,x_2 ,\ldots ,x_D )\,|\,|x_i -x_i^*|<\delta ,\ i=1,2,\ldots ,D\}\subset B_\varepsilon ,\nonumber \\ \end{aligned}$$
(24)

where \(x_i^*\) is the ith component of \(x^{*}\). Notice that if \(x_i^*=l_i \) or \(x_i^*=u_i \), then the inequality \(|x_i -x_i^*|<\delta \) is replaced by \(0\le x_i -x_i^*<\delta \) or \(-\delta <x_i -x_i^*\le 0\), respectively. For convenience, we only consider the case of Eq. (24) in this paper.

By Eq. (24), we know that \(\mu (B_\varepsilon )>0\) where \(\mu \) is the Lebesgue measure on \(R^{D}\).

4.2 Convergence proof

In the standard PSO algorithm, the state \(\varPhi _{k+1} \) of the particle swarm at iteration \(k+1\) depends only on the state \(\varPhi _k \) at iteration k. Therefore, the stochastic state sequence \(\{\varPhi _k \}\) generated by the standard PSO algorithm is a Markov chain.

Lemma 3

\(\forall \varPhi \in \varPhi _{Np} \), \(\chi _{B_\varepsilon } (\varPhi )\ne 0\) if and only if \(p_g \in B_\varepsilon \).

Proof

Assume \(p_g \notin B_\varepsilon \); then \(1_{B_\varepsilon } (p_g )=0\). Since \(p_g \) is the best position of the swarm, \(f(X_i )\ge f(p_g )\) and \(f(p_i )\ge f(p_g )\) for all i. Thus \(X_i \notin B_\varepsilon \) and \(p_i \notin B_\varepsilon \), which implies \(1_{B_\varepsilon } (X_i )=0\) and \(1_{B_\varepsilon } (p_i )=0\), \(\forall i\). From (22), we have \(\chi _{B_\varepsilon } (\varPhi )=0\), which contradicts \(\chi _{B_\varepsilon } (\varPhi )\ne 0\).

Conversely, if \(p_g \in B_\varepsilon \), then \(1_{B_\varepsilon } (p_g )=1\). By (22), \(\chi _{B_\varepsilon } (\varPhi )\ne 0\). This completes the proof of the lemma. \(\square \)

Lemma 4

\(\forall (\varPhi _1 ,\varPhi _2 ,\ldots )\in \varOmega \), if \(\chi _{B_\varepsilon } (\varPhi _k )\ne 0\), then \(\chi _{B_\varepsilon } \big (\varPhi _{k+1} \big )\ne 0\).

Proof

Since \(\chi _{B_\varepsilon } (\varPhi _k )\ne 0\), by Lemma 3 we have \(p_g^k \in B_\varepsilon \). From the standard PSO algorithm, \(f(p_g^{k+1} )\le f\big (p_g^k \big )\), which implies \(p_g^{k+1} \in B_\varepsilon \). From Lemma 3, we get \(\chi _{B_\varepsilon } \big (\varPhi _{k+1} \big )\ne 0\). This completes the proof of the lemma.

Let \(a_{id}^k =c_1 (p_{id}^k -x_{id}^k )\), \(b_{id}^k =c_2 (p_{gd}^k -x_{id}^k )\), and \(c_{id}^k =x_{id}^k +\omega _k v_{id}^k \); then Eq. (4) can be written as

$$\begin{aligned} x_{id}^{k+1} =a_{id}^k r_1 +b_{id}^k r_2 +c_{id}^k \end{aligned}$$
(25)

which is a random variable.

If a, b, and c in Lemma 2 are replaced by \(a_{id}^k \), \(b_{id}^k \), and \(c_{id}^k \), respectively, then \(\alpha \), \(\beta \), \(\gamma \), and \(\tau \) are replaced by \(\alpha _{id}^k \), \(\beta _{id}^k \), \(\gamma _{id}^k \), and \(\tau _{id}^k \), accordingly. \(\square \)

Lemma 5

\(\forall \varPhi \in \varPhi _{Np} \), if \(\varPhi _k =\varPhi \) and \(x^{{*}}=(x_1^{*} ,x_2^{*} ,\ldots ,x_D^{*})\in M^{k}=\bigcup \nolimits _{i=1}^{Np} {M_i^k } \), \(\forall k\ge 1\), then there exists a constant \(\rho >0\) such that

$$\begin{aligned} p\big (\chi _{B_\varepsilon } \big (\varPhi _{k+1} \big )\ne 0\big )\ge \rho , \end{aligned}$$
(26)

where \(M_i^k =[\alpha _{i1}^k ,\tau _{i1}^k ]\times [\alpha _{i2}^k ,\tau _{i2}^k ]\times \cdots \times [\alpha _{iD}^k ,\tau _{iD}^k ]\), \(i=1,2,\ldots ,Np\), \(k=1,2,3,\ldots \) is a hypercube and \(\varPhi _{k+1} \) is generated by the standard PSO algorithm.

Proof

Since \(x^{{*}}=(x_1^{*} ,x_2^{*} ,\ldots ,x_D^{*} )\in M^{k}\), there exists at least one \(l\in \{1,2,\ldots ,Np\}\) such that \(x^{{*}}\in M_l^k \). This implies \(x_d^{*} \in [ {\alpha _{ld}^k ,\tau _{ld}^k } ]\), \(d=1,2,\ldots ,D\).

When \(a_{ld}^k \ne 0\), \(b_{ld}^k \ne 0\), and \(a_{ld}^k \ne b_{ld}^k \), we consider the transition probability \(p\left( {x_d^{*} -\delta<x_{ld}^{k+1} } \right. \left. {<x_d^*+\delta } \right) \) in the following two cases.

Case 1 Suppose that \(x_d^{*} -\delta \ge \alpha _{ld}^k \) or \(x_d^{*} +\delta \le \tau _{ld}^k \).

Since \(\alpha _{ld}^k \le x_d^{*} \le \tau _{ld}^k \), we have \(\left| \left[ {\alpha _{ld}^k ,\tau _{ld}^k } \right] \cap \right. \left. \left[ {x_d^{*} {-}\delta ,x_d^{*} {+}\delta } \right] \right| {\ge } \delta \), where |I| denotes the length of the interval I. Then at least one of \(\left| \left[ {\alpha _{ld}^k ,\beta _{ld}^k } \right] \right. \left. \cap \left[ {x_d^{*} -\delta ,x_d^{*} +\delta } \right] \right| \), \(\left| {\left[ {\beta _{ld}^k ,\gamma _{ld}^k } \right] \cap \left[ {x_d^{*} -\delta ,x_d^{*} +\delta } \right] } \right| \), and \(\left| {\left[ {\gamma _{ld}^k ,\tau _{ld}^k } \right] \cap \left[ {x_d^{*} -\delta ,x_d^{*} +\delta } \right] } \right| \) is greater than or equal to \(\frac{\delta }{3}\). If \(\left| {\left[ {\alpha _{ld}^k ,\beta _{ld}^k } \right] \cap \left[ {x_d^{*} -\delta ,x_d^{*} +\delta } \right] } \right| \ge \frac{\delta }{3}\), then \(\left| {\left[ {\max \{\alpha _{ld}^k ,x_d^*-\delta \},\min \{\beta _{ld}^k ,x_d^*+\delta \}} \right] } \right| \ge \frac{\delta }{3}\). By Lemma 2, we have

$$\begin{aligned}&p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) =\int _{x_d^{*} -\delta }^{x_d^{*} +\delta } {p_{x_{ld}^{k+1} } (u)} \hbox {d}u\\&\quad \ge \int _{\max \left[ {\alpha _{ld}^k ,x_d^{*} -\delta } \right] }^{\min \left[ {\beta _{ld}^k ,x_d^{*} +\delta } \right] } {\frac{u-\alpha _{ld}^k }{\left| {a_{ld}^k } \right| \left| {b_{ld}^k } \right| }} \hbox {d}u. \end{aligned}$$

Let \(\theta _1 =\max \left[ {\alpha _{ld}^k ,x_d^{*} -\delta } \right] \), \(\theta _2 =\min \left[ {\beta _{ld}^k ,x_d^{*} +\delta } \right] \), then \(\theta _2 -\theta _1 \ge \frac{\delta }{3}\), \(\theta _1 -\alpha _{ld}^k \ge 0\). Thus, we have

$$\begin{aligned}&p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) \\&\quad \ge \frac{1}{2\left| {a_{ld}^k } \right| \left| {b_{ld}^k } \right| }\left[ {\left( {\theta _2 -\alpha _{ld}^k } \right) ^{2}-\left( {\theta _1 -\alpha _{ld}^k } \right) ^{2}} \right] \\&\quad =\frac{1}{2\left| {a_{ld}^k } \right| \left| {b_{ld}^k } \right| }\left( {\theta _2 +\theta _1 -2\alpha _{ld}^k } \right) \left( {\theta _2 -\theta _1 } \right) \\&\quad \ge \frac{1}{2\left| {a_{ld}^k } \right| \left| {b_{ld}^k } \right| }\left( {\frac{\delta }{3}+2\theta _1 -2\alpha _{ld}^k } \right) \frac{\delta }{3}\\&\quad \ge \frac{1}{2\left| {a_{ld}^k } \right| \left| {b_{ld}^k } \right| }\left( {\frac{\delta }{3}} \right) ^{2}. \end{aligned}$$

Let \(m=\mathop {\max }\limits _{1\le i\le D} \{|l_i |,|u_i |\}>0\); then \(|a_{ld}^k |\le 2c_1 m\) and \(|b_{ld}^k |\le 2c_2 m\). Hence, one has

$$\begin{aligned} p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) \ge \frac{1}{8c_1 c_2 m^{2}}\cdot \left( {\frac{\delta }{3}} \right) ^{2}. \end{aligned}$$
(27)

Similarly, if \(\left| {\left[ {\gamma _{ld}^k ,\tau _{ld}^k } \right] \cap \left[ {x_d^{*} -\delta ,x_d^{*} +\delta } \right] } \right| \ge \frac{\delta }{3}\), we can obtain

$$\begin{aligned}&p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) \nonumber \\&\quad \ge \int _{\max \left[ {\gamma _{ld}^k ,x_d^{*} -\delta } \right] }^{\min \left[ {\tau _{ld}^k ,x_d^{*} +\delta } \right] } {\frac{\tau _{ld}^k -u}{\left| {a_{ld}^k } \right| \left| {b_{ld}^k } \right| }} \hbox {d}u\ge \frac{1}{8c_1 c_2 m^{2}}\cdot \left( {\frac{\delta }{3}} \right) ^{2}. \end{aligned}$$
(28)

If \(\left| {\left[ {\beta _{ld}^k ,\gamma _{ld}^k } \right] \cap \left[ {x_d^{*} -\delta ,x_d^{*} +\delta } \right] } \right| \ge \frac{\delta }{3}\), by Lemma 2, we have

$$\begin{aligned}&p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) \nonumber \\&\quad \ge \int _{\max \left[ {\beta _{ld}^k ,x_d^{*} -\delta } \right] }^{\min \left[ {\gamma _{ld}^k ,x_d^{*} +\delta } \right] } {\frac{\min \left\{ {\left| {a_{ld}^k } \right| ,\left| {b_{ld}^k } \right| } \right\} }{\left| {a_{ld}^k } \right| \left| {b_{ld}^k } \right| }} \hbox {d}u\ge \frac{\min \{c_1 ,c_2 \}}{4c_1 c_2 m}\cdot \frac{\delta }{3}.\nonumber \\ \end{aligned}$$
(29)

Case 2 Suppose that \(x_d^{*} -\delta <\alpha _{ld}^k \) and \(x_d^{*} +\delta >\tau _{ld}^k \).

By Lemma 2, we have

$$\begin{aligned} p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) =\int _{\alpha _{ld}^k }^{\tau _{ld}^k } {p_{x_{ld}^{k+1} } (u)} \hbox {d}u=1. \end{aligned}$$
(30)

Similar to the above proof, when \(a_{ld}^k \ne 0,b_{ld}^k \ne 0\) and \(a_{ld}^k =b_{ld}^k \), by Eq. (19), we can get

$$\begin{aligned} p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) \ge \frac{1}{8c_1 c_2 m^{2}}\cdot \left( \frac{\delta }{2}\right) ^{2}. \end{aligned}$$
(31)

When \(a_{ld}^k \ne 0,b_{ld}^k =0\), by Eq. (21), we can obtain

$$\begin{aligned} p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) \ge \frac{\delta }{\left| {a_{ld}^k } \right| }\ge \frac{\delta }{2c_1 m}. \end{aligned}$$
(32)

When \(a_{ld}^k =0,b_{ld}^k \ne 0\), by Eq. (20), we have

$$\begin{aligned} p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) \ge \frac{\delta }{\left| {b_{ld}^k } \right| }\ge \frac{\delta }{2c_2 m}. \end{aligned}$$
(33)

When \(a_{ld}^k =0\) and \(b_{ld}^k =0\), we have \(\alpha _{ld}^k =\tau _{ld}^k =c_{ld}^k \). Since \(x_d^{*} \in [ {\alpha _{ld}^k ,\tau _{ld}^k } ]\), we have \(x_{ld}^{k+1} =c_{ld}^k =x_d^*\), which is not a random variable. In this case, we define

$$\begin{aligned} p\left( {x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta } \right) =1. \end{aligned}$$
(34)

Let \(\rho _1 =\min \Big \{ \frac{1}{8c_1 c_2 m^{2}}\left( {\frac{\delta }{3}} \right) ^{2},\frac{\min \{c_1 ,c_2 \}}{4c_1 c_2 m}\frac{\delta }{3},\frac{1}{8c_1 c_2 m^{2}}\left( {\frac{\delta }{2}} \right) ^{2},\frac{\delta }{2c_1 m},\frac{\delta }{2c_2 m},1 \Big \}>0\), from Eqs. (27)–(34), we can get

$$\begin{aligned} p(x_d^{*} -\delta<x_{ld}^{k+1} <x_d^{*} +\delta )\ge \rho _1 >0 \end{aligned}$$
(35)

Because the components of \(x_l^{k+1} \) are mutually independent, by (35) we obtain

$$\begin{aligned} p(x_l^{k+1} \in B_\varepsilon )\ge & {} p(x_l^{k+1} \in E_\delta ) \\= & {} \prod _{d=1}^D p \left( {x_d^{*} {-}\delta<x_{ld}^{k+1} <x_d^{*} {+}\delta } \right) {\ge } \rho _1^D >0. \end{aligned}$$

Thus

$$\begin{aligned} p\big (\chi _{B_\varepsilon } \big (\varPhi _{k+1} \big )\ne 0\big )\ge p(x_l^{k+1} \in B_\varepsilon )\ge \rho _1^D . \end{aligned}$$

Let \(\rho =\rho _1^D >0\); then we have

$$\begin{aligned} p\big (\chi _{B_\varepsilon } \big (\varPhi _{k+1} \big )\ne 0\big )\ge \rho >0. \end{aligned}$$

This completes the proof of the lemma. \(\square \)

Theorem 1

If there exists \(k_0 \) such that \(\chi _{B_\varepsilon } (\varPhi _{k_0 } )\ne 0\), then the sequence \(\left\{ {p_g^k } \right\} \), \(\forall k\ge k_0 \), generated by the standard PSO algorithm consists of \(\varepsilon \)-optimal solutions, i.e.,

$$\begin{aligned} p_g^k \in B_\varepsilon ,\forall k\ge k_0. \end{aligned}$$

Proof

Since \(\chi _{B_\varepsilon } (\varPhi _{k_0 } )\ne 0\), from Lemma 4 one has \(\chi _{B_\varepsilon } (\varPhi _k )\ne 0\), \(\forall k\ge k_0 \). By Lemma 3, we have \(p_g^k \in B_\varepsilon \). This completes the proof of the theorem. \(\square \)

Theorem 2

Suppose that \((\varPhi _1 ,\varPhi _2 ,\varPhi _3 ,\ldots )\in \varOmega \) is a sequence generated by the standard PSO algorithm. If there exists a subsequence \(\{\varPhi _{k_j } \}\) such that the condition of Lemma 5 is satisfied, then the sequence \(\{p_g^k \}\) converges to an \(\varepsilon \)-optimal solution with probability one, i.e.,

$$\begin{aligned} \mathop {\lim }\limits _{k\rightarrow \infty } p\{p_g^k \in B_\varepsilon \}=1. \end{aligned}$$

Proof

By Lemma 4 and the Markovian property of the process, one has

$$\begin{aligned}&p\{\chi _{B_\varepsilon } (\varPhi _{k_{j+1} } )=0\} \nonumber \\&\quad =p\{\chi _{B_\varepsilon } (\varPhi _1 )=0,\chi _{B_\varepsilon } (\varPhi _2 )=0,\ldots ,\chi _{B_\varepsilon } (\varPhi _{k_{j+1} } )=0\} \nonumber \\&\quad =p\{\chi _{B_\varepsilon } (\varPhi _1 )=0\}\cdot \prod _{t=1}^{k_{j+1} -1} {p\{\chi _{B_\varepsilon } (\varPhi _{t+1} )=0\}} \nonumber \\&\quad \le \prod _{t=1}^{k_{j+1} -1} (p\{\chi _{B_\varepsilon } (\varPhi _{t+1} )=0|\chi _{B_\varepsilon } (\varPhi _t )=0\}\nonumber \\&\qquad +p\{\chi _{B_\varepsilon } (\varPhi _{t+1} )=0|\chi _{B_\varepsilon } (\varPhi _t )\ne 0\}) \nonumber \\&\quad =\prod _{t=1}^{k_{j+1} -1} {p\{\chi _{B_\varepsilon } (\varPhi _{t+1} )=0|\chi _{B_\varepsilon } (\varPhi _t )=0\}} \nonumber \\&\quad \le \prod _{t=1}^{j} {p\{\chi _{B_\varepsilon } (\varPhi _{k_{t+1} } )=0|\chi _{B_\varepsilon } (\varPhi _{k_{t+1} -1} )=0\}} . \end{aligned}$$
(36)

The time-homogeneity of the Markov chain ensures that

$$\begin{aligned}&p\{\chi _{B_\varepsilon } (\varPhi _{k_{j+1} } )=0|\chi _{B_\varepsilon } (\varPhi _{k_{j+1} -1} )=0\}\nonumber \\&\quad =p\{\chi _{B_\varepsilon } (\varPhi _{k_2 } )=0|\chi _{B_\varepsilon } (\varPhi _{k_2 -1} )=0\},\forall j. \end{aligned}$$
(37)

In order to calculate \(p\{\chi _{B_\varepsilon } (\varPhi _{k_2 } )=0|\chi _{B_\varepsilon } (\varPhi _{k_2 -1} )=0\}\), we define

$$\begin{aligned} \Delta =\{\varPhi |\chi _{B_\varepsilon } (\varPhi )=0\}\subset R^{4\cdot D\cdot Np}. \end{aligned}$$

For any \(\varPhi \in \Delta \), from Lemma 5 we get

$$\begin{aligned}&p\{\chi _{B_\varepsilon } (\varPhi _{k_2 } )=0|\chi _{B_\varepsilon } (\varPhi _{k_2 -1} )=0\}=p\{\chi _{B_\varepsilon } (\varPhi _{k_2 } )=0|\varPhi _{k_2 -1} =\varPhi \} \nonumber \\&\quad =1-p\{\chi _{B_\varepsilon } (\varPhi _{k_2 } )\ne 0|\varPhi _{k_2 -1} =\varPhi \} \nonumber \\&\quad \le 1-\rho . \end{aligned}$$
(38)

From Eqs. (36)–(38), the probability in Eq. (36) becomes

$$\begin{aligned} p\{\chi _{B_\varepsilon } (\varPhi _{k_{j+1} } )=0\}\le (1-\rho )^{j}. \end{aligned}$$

This implies that

$$\begin{aligned}&p\{\chi _{B_\varepsilon } (\varPhi _{k_{j+1} } )\ne 0\}=1-p\{\chi _{B_\varepsilon } (\varPhi _{k_{j+1} } )=0\}\\&\quad \ge 1-(1-\rho )^{j} \end{aligned}$$

By Lemma 3, one has

$$\begin{aligned} p\{p_g^{k_{j+1} } \in B_\varepsilon \}=p\left\{ \chi _{B_\varepsilon } (\varPhi _{k_{j+1} } )\ne 0\right\} \ge 1-(1-\rho )^{j}. \end{aligned}$$

Therefore, we have

$$\begin{aligned} \mathop {\lim }\limits _{j\rightarrow \infty } p\left\{ p_g^{k_{j+1} } \in B_\varepsilon \right\} \ge 1-\mathop {\lim }\limits _{j\rightarrow \infty } (1-\rho )^{j}=1. \end{aligned}$$

Thus

$$\begin{aligned} \mathop {\lim }\limits _{j\rightarrow \infty } p\left\{ p_g^{k_{j+1} } \in B_\varepsilon \right\} =1. \end{aligned}$$

From the standard PSO algorithm, \(\{f\big (p_g^k \big )\}\) is a non-increasing sequence, so \(p_g^k \in B_\varepsilon \) for all \(k\ge k_j \) whenever \(p_g^{k_j } \in B_\varepsilon \). Hence, one has

$$\begin{aligned} \mathop {\lim }\limits _{k\rightarrow \infty } p\{p_g^k \in B_\varepsilon \}=1. \end{aligned}$$

This completes the proof of the theorem. \(\square \)

Theorem 3

Suppose that \((\varPhi _1 ,\varPhi _2 ,\varPhi _3 ,\ldots )\in \varOmega \) is a sequence generated by the standard PSO algorithm. If \(x^{{*}}=(x_1^{*} ,x_2^{*} ,\ldots ,x_D^{*} )\) does not belong to \(cl\left( {\bigcup \nolimits _{k=1}^\infty {\bigcup \nolimits _{i=1}^{Np} {M_i^k } } } \right) \), the closure of the set \(\bigcup \nolimits _{k=1}^\infty {\bigcup \nolimits _{i=1}^{Np} {M_i^k } } \), then there exists \(\varepsilon >0\) such that

$$\begin{aligned} p\{p_g^k \in B_\varepsilon \}=0 \qquad \hbox { for }\quad \forall k\ge 1. \end{aligned}$$
(39)

Proof

Since \(cl\left( {\bigcup \nolimits _{k=1}^\infty {\bigcup \nolimits _{i=1}^{Np} {M_i^k } } } \right) \) is a closed set and the objective function f is continuous, f attains its minimum \(f_{\min } \) on this set. Because \(x^{{*}}=(x_1^{*} ,x_2^{*} ,\ldots ,x_D^{*} )\notin cl\left( {\bigcup \nolimits _{k=1}^\infty {\bigcup \nolimits _{i=1}^{Np} {M_i^k } } } \right) \), we know that \(f_{\min } -f(x^{*})>0\). Let \(\varepsilon =f_{\min } -f(x^{*})\); then \(B_\varepsilon \bigcap {cl\left( {\bigcup \nolimits _{k=1}^\infty {\bigcup \nolimits _{i=1}^{Np} {M_i^k } } } \right) } \) is an empty set. This implies that \(B_\varepsilon \bigcap {M_i^k } \) is also an empty set for \(\forall i\in \{1,2,\ldots ,Np\}\) and \(\forall k\ge 1\).

Because the random variables \(x_{i1}^k ,x_{i2}^k ,\ldots ,x_{iD}^k \) are independent, the joint density function of the D-dimensional random variable \(x_i^k =(x_{i1}^k ,x_{i2}^k ,\ldots ,x_{iD}^k )^\mathrm{T}\) is

$$\begin{aligned}&p_{x_i^k } (u_1 ,u_2 ,\ldots ,u_D )=p_{x_{i1}^k } (u_1 )\cdot p_{x_{i2}^k } (u_2 )\cdot \ldots \cdot p_{x_{iD}^k } (u_D )\nonumber \\&\quad \hbox { for } \quad \forall i\in \{1,2,\ldots ,Np\} \hbox { and } k\ge 1. \end{aligned}$$
(40)

If \((u_1 ,u_2 ,\ldots ,u_D )\notin M_i^k \), then there exists at least one \(d\in \{1,2,\ldots ,D\}\) such that \(u_d \notin [\alpha _{id}^k ,\tau _{id}^k ]\), and hence \(p_{x_{id}^k } (u_d )=0\) by Eqs. (6), (19), (20), and (21). From Eq. (40), we know that if \((u_1 ,u_2 ,\ldots ,u_D )^\mathrm{T}\notin M_i^k \), then \(p_{x_i^k } (u_1 ,u_2 ,\ldots ,u_D )=0\) for \(\forall i\in \{1,2,\ldots ,Np\}\) and \(k\ge 1\).

Since \(B_\varepsilon \bigcap {M_i^k } \) is an empty set, one has

$$\begin{aligned} p\{x_i^k \in B_\varepsilon \}= & {} \int _{B_\varepsilon } {p_{x_i^k } (u_1 ,u_2 ,\ldots ,u_D )\hbox {d}u_1 \hbox {d}u_2 \ldots \hbox {d}u_D } =0 \nonumber \\&\hbox { for } \quad \forall i\in \{1,2,\ldots ,Np\} \hbox { and } k\ge 1. \end{aligned}$$
(41)

By standard PSO algorithm, we know that \(p_g^k \) is an element of the set \(\bigcup \nolimits _{t=1}^k {\bigcup \nolimits _{i=1}^{Np} {\{x_i^t \}} } \) for \(\forall k\ge 1\). Therefore, we have from Eq. (41) that

$$\begin{aligned} p\{p_g^k \in B_\varepsilon \}=0\quad \hbox { for } \quad \forall k\ge 1. \end{aligned}$$

The proof is completed. \(\square \)

We know from Theorem 2 that the standard PSO algorithm converges to an \(\varepsilon \)-optimal solution with probability one for any \(\varepsilon \) under certain conditions. However, Theorem 3 shows that the standard PSO algorithm reaches an \(\varepsilon \)-optimal solution with probability zero for some \(\varepsilon \). To overcome this drawback, in the next section we give a new improved PSO algorithm that converges to an \(\varepsilon \)-optimal solution with probability one for any \(\varepsilon \).

5 Improved PSO algorithm and numerical simulations

5.1 Improved PSO algorithm

It is well known that an effective way of enhancing the performance of a PSO algorithm is to balance its exploration and exploitation abilities: exploitation should be emphasized for the best particles (best fitness values) and exploration for the worst particles (worst fitness values). Moreover, a Gaussian perturbation is introduced to ensure that the proposed PSO algorithm converges to an \(\varepsilon \)-optimal solution with probability one for any \(\varepsilon \). Based on this idea, we propose an improved PSO algorithm (called IPSO hereafter) according to the following strategy.

At iteration k, the ith particle’s best position \(P_i^k =(p_{i1}^k ,p_{i2}^k ,\ldots ,p_{iD}^k )\) is updated as follows:

$$\begin{aligned} \bar{{p}}_{id}^k {=}\left\{ \begin{array}{ll} \mu p_{id}^k (u_d -l_d )\rho _1 N(0,\sigma ^{2})\\ \qquad +(1-\mu )(p_{id}^k+\rho _2 N(0,\sigma ^{2})),&{} \hbox { if rand}<\hbox {CR} \\ p_{id}^k,&{}\hbox {otherwise} \\ \end{array}\right. . \end{aligned}$$
(42)

The first part of Eq. (42) is an exploration operator, which improves the exploration ability; the second part of Eq. (42) is an exploitation operator, which improves the exploitation ability. Here, \(\rho _1 \) and \(\rho _2 \) are two positive real numbers, rand is a random number between 0 and 1, and \(\mu \) is a weighting factor defined as follows:

$$\begin{aligned} \mu =\frac{f(P_i^k )-f\big (P_g^k \big )}{f_{\max } -f\big (P_g^k \big )}, \end{aligned}$$
(43)

where \(f_{\max } =\mathop {\max }\limits _{1\le i\le Np} \{f(P_i^k )\}\), and \(N(0,\sigma ^{2})\) denotes a Gaussian random variable with a time-varying standard deviation given by

$$\begin{aligned} \sigma =\sigma _{\max } -\left( \sigma _{\max } -\sigma _{\min } \right) \frac{k}{K_{\max } }, \end{aligned}$$
(44)

where \(\sigma _{\max } \) and \(\sigma _{\min } \) are the initial and final values of the standard deviation, k is the current iteration number, \(K_{\max } \) is the maximum iteration number, and CR is a threshold that changes according to the following equation:

$$\begin{aligned} \hbox {CR}=\left\{ {\begin{array}{ll} \frac{\alpha (f_{\text {ave}} -f(P_i^k ))}{f_{\text {ave}} -f\big (P_g^k \big )},\quad \hbox { if }\,\,f(P_i^k )\le f_{{ ave}} \\ \frac{\alpha (f(P_i^k )-f_{\text {ave}} )}{f_{\max } -f_{{ ave}} },\quad \hbox { if }\,\,f(P_i^k )>f_{{ ave}} \\ \end{array}} \right. , \end{aligned}$$
(45)

where \(\alpha \) is a constant between 0 and 1, and \(f_{{ ave}} \) is defined as follows:

$$\begin{aligned} f_{\text {ave}} =\frac{1}{Np}\sum _{j=1}^{Np} {f(P_j^k )} , \end{aligned}$$
(46)

By Eqs. (42) and (45), the best and the worst particles are updated with a greater probability: the best particle exploits around itself, while the worst particle explores the search space thoroughly. If \(\bar{{p}}_{id}^k \) is less than \(l_d \) or greater than \(u_d \), then we set \(\bar{{p}}_{id}^k =p_{id}^k \). The new best position of particle i at iteration k is updated as follows:

$$\begin{aligned} P_i^k (new)=\left\{ {\begin{array}{ll} \bar{{P}}_i^k,&{}\quad \hbox {if }\,\,\quad f(\bar{{P}}_i^k )<f(P_i^k ) \\ P_i^k,&{}\quad \hbox {if}\,\, \quad f(\bar{{P}}_i^k )\ge f(P_i^k ) \\ \end{array}} \right. . \end{aligned}$$
(47)

which will replace \(P_i^k \). The pseudo-code of the IPSO algorithm is shown in Fig. 1.

Fig. 1
figure 1

Pseudo-code for the IPSO algorithm
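As a complement to the pseudo-code in Fig. 1, the Python sketch below implements the best-position update of Eqs. (42)–(47) for one iteration. Eq. (42) is transcribed as printed above, the random-number generator and all array names are assumptions, and degenerate denominators in Eqs. (43) and (45) are guarded in a way the paper does not specify; this is an illustrative sketch, not a definitive implementation.

```python
import numpy as np

def ipso_update_pbest(P, f_P, P_g, f_Pg, k, K_max, lb, ub, f,
                      rho1=20.0, rho2=0.5, sigma_max=1.0, sigma_min=0.2,
                      alpha=0.5, rng=None):
    """IPSO best-position update, Eqs. (42)-(47), at iteration k.

    P    : (Np, D) array of personal best positions P_i^k
    f_P  : (Np,) array of fitness values f(P_i^k)
    P_g  : (D,) global best position, f_Pg its fitness value
    lb, ub : (D,) lower and upper bounds l_d, u_d of the search region
    f    : objective function taking a (D,) array and returning a float
    """
    rng = np.random.default_rng() if rng is None else rng
    Np, D = P.shape
    sigma = sigma_max - (sigma_max - sigma_min) * k / K_max      # Eq. (44)
    f_max, f_ave = f_P.max(), f_P.mean()                          # f_max and Eq. (46)
    P_new, f_new = P.copy(), f_P.copy()
    for i in range(Np):
        # Eq. (43): weighting factor (guarded when all fitness values coincide)
        mu = (f_P[i] - f_Pg) / (f_max - f_Pg) if f_max > f_Pg else 0.0
        # Eq. (45): adaptive threshold CR (guarded against a zero denominator)
        if f_P[i] <= f_ave:
            CR = alpha * (f_ave - f_P[i]) / (f_ave - f_Pg) if f_ave > f_Pg else alpha
        else:
            CR = alpha * (f_P[i] - f_ave) / (f_max - f_ave)
        # Eq. (42), transcribed as printed, applied per dimension where rand < CR
        explore = mu * P[i] * (ub - lb) * rho1 * rng.normal(0.0, sigma, D)
        exploit = (1.0 - mu) * (P[i] + rho2 * rng.normal(0.0, sigma, D))
        p_bar = np.where(rng.random(D) < CR, explore + exploit, P[i])
        # Components leaving [l_d, u_d] are reset to p_id^k, as stated in the text
        p_bar = np.where((p_bar < lb) | (p_bar > ub), P[i], p_bar)
        # Eq. (47): keep the candidate only if it improves the fitness
        f_bar = f(p_bar)
        if f_bar < f_P[i]:
            P_new[i], f_new[i] = p_bar, f_bar
    return P_new, f_new
```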

5.2 Convergence analysis of IPSO

Lemma 6

If \(\bar{{P}}_i^k \in B_\varepsilon \), then \(P_i^k (new)\in B_\varepsilon \) for any i and k.

Proof

If \(P_i^k (new)=P_i^k \), then we know that \(f(\bar{{P}}_i^k )\ge f(P_i^k )\) by Eq. (47). Since \(\bar{{P}}_i^k \in B_\varepsilon \), we have \(P_i^k \in B_\varepsilon \). Thus, we get \(P_i^k (new)\in B_\varepsilon \). If \(P_i^k (new)=\bar{{P}}_i^k \), then \(P_i^k (new)\in B_\varepsilon \) is obvious. \(\square \)

Lemma 7

\(\forall \varPhi \in \varPhi _{Np} \), if \(\varPhi _k =\varPhi \), then there exists a constant \(\rho >0\), such that

$$\begin{aligned} p\big (\chi _{B_\varepsilon } \big (\varPhi _{k+1} \big )\ne 0\big )\ge \rho , \end{aligned}$$
(48)

where \(\varPhi _{k+1} \) is generated by the IPSO algorithm.

Proof

By the IPSO algorithm, Eq. (22), and Lemma 6, we have

$$\begin{aligned} p\big (\chi _{B_\varepsilon } \big (\varPhi _{k+1} \big )\ne 0\big )\ge & {} p(P_g^{k+1} (new)\in B_\varepsilon ) \\ \ge p(\bar{{P}}_g^k \in B_\varepsilon )= & {} p((\bar{{p}}_{g1}^k ,\bar{{p}}_{g2}^k ,\ldots ,\bar{{p}}_{gD}^k )\in B_\varepsilon ). \end{aligned}$$

From Eqs. (43) and (45), we know that \(\mu =0\) and \(\hbox {CR}=\alpha \) when \(p_{id}^k =p_{gd}^k \). In other words, \(\bar{{p}}_{gd}^k =p_{gd}^k +\rho _2 N(0,\sigma ^{2})\), \(d=1,2,\ldots ,D\), is applied with probability \(\alpha \). Thus, \(\bar{{p}}_{gd}^k \) is a random variable, and its probability density function is

$$\begin{aligned} p_{\bar{{p}}_{gd}^k } (x_d )=\frac{1}{\sqrt{2\pi }\sigma \rho _2 }e^{-\frac{(x_d -p_{gd}^k )^{2}}{2\sigma ^{2}\rho _2^2 }}, \hbox { for }x_d \in (-\infty ,+\infty ). \end{aligned}$$

Since \(\bar{{p}}_{g1}^k ,\bar{{p}}_{g2}^k ,\ldots ,\bar{{p}}_{gD}^k \) are mutually independent random variables, the probability density function of the D-dimensional random variable \((\bar{{p}}_{g1}^k ,\bar{{p}}_{g2}^k ,\ldots ,\bar{{p}}_{gD}^k )\) is

$$\begin{aligned}&p(x_1 ,x_2 ,\ldots ,x_D )=\prod _{d=1}^D \frac{1}{\sqrt{2\pi }\sigma \rho _2 }e^{-\frac{(x_d -p_{gd}^k )^{2}}{2\sigma ^{2}\rho _2^2 }}\\&\quad =\left( {\frac{1}{\sqrt{2\pi }\sigma \rho _2 }} \right) ^{D} e^{-\sum \nolimits _{d=1}^D {\frac{\big (x_d -p_{gd}^k\big )^{2}}{2\sigma ^{2}\rho _2^2 }} }, \hbox { for } x_d \in (-\infty ,{+}\infty ). \end{aligned}$$

Thus,

$$\begin{aligned}&p((\bar{{p}}_{g1}^k ,\bar{{p}}_{g2}^k ,\ldots ,\bar{{p}}_{gD}^k )\in B_\varepsilon )\\&\quad =\alpha ^{D}\mathop {\int _{B_\varepsilon } {\left( {\frac{1}{\sqrt{2\pi }\sigma \rho _2 }} \right) ^{D}e^{-\sum \limits _{d=1}^D {\frac{(x_d -p_{gd}^k )^{2}}{2\sigma ^{2}\rho _2^2 }} }\hbox {d}x_1 \hbox {d}x_2 \ldots \hbox {d}x_D } }\\&\quad \ge \left( {\frac{\alpha }{\sqrt{2\pi }\sigma _{\max } \rho _2 }} \right) ^{D}e^{-D\frac{2m^{2}}{\sigma _{\min }^{2}\rho _2^2 }}\mu (B_\varepsilon ). \end{aligned}$$

Let \(\rho =\left( {\frac{\alpha }{\sqrt{2\pi }\sigma _{\max } \rho _2 }} \right) ^{D}e^{-D\frac{2m^{2}}{\sigma _{\min }^{2}\rho _2^2 }}\mu (B_\varepsilon )>0\), we have

$$\begin{aligned} p\big (\chi _{B_\varepsilon } \big (\varPhi _{k+1} \big )\ne 0\big )=p\left( \chi _{B_\varepsilon } \big (\varPhi _{k+1} \big )\ne 0 \vert \varPhi _k {=}\varPhi \right) \ge \rho {>}0. \end{aligned}$$

This completes the proof of the lemma. \(\square \)

Theorem 4

Suppose that \((\varPhi _1 ,\varPhi _2 ,\varPhi _3 ,\ldots )\in \varOmega \) is a sequence generated by the IPSO algorithm. Then the sequence \(\{p_g^k \}\) converges to an \(\varepsilon \)-optimal solution with probability one, i.e.,

$$\begin{aligned} \mathop {\lim }\limits _{k\rightarrow \infty } p\{p_g^k \in B_\varepsilon \}=1. \end{aligned}$$

Proof

See the proof of Theorem 2. \(\square \)

5.3 Numerical simulations

5.3.1 Benchmark functions

To evaluate the performance of the proposed IPSO algorithm, we take the 13 standard benchmark functions provided in Yao et al. (1999). These functions are given in Tables 1 and 2, where D represents the dimension of the function, S is the search region, and \(f_\mathrm{{opt}} \) is the minimum value of the function. The functions in Table 1 are unimodal, and those in Table 2 are multimodal. The multimodal functions have a large number of local minima, so an algorithm must be capable of finding the optimal solution without being trapped in local optima.

Table 1 Unimodal test functions
Table 2 Multimodal test functions

5.3.2 Comparison with other algorithms

The IPSO algorithm is applied to the 13 standard benchmark functions, and the results are compared with LDIWPSO (Shi and Eberhart 1999), AIWPSO (Nickabadi et al. 2011), CDIWPSO (Feng et al. 2007), DAPSO (Shen et al. 2010), and SSRDIWPSO (Adewumi and Arasomwan 2016). In our experiments, the same set of parameters is assigned to all algorithms: the initial and final values of the inertia weight are 0.9 and 0.4, the acceleration factors are \(c_1 =2\) and \(c_2 =2\), the search region of the particle velocities is set to S, the search region of the particle positions, the population size is set to 30, and the maximum number of iterations \(K_{\max } \) is 3000. In IPSO, the other parameters are set as follows: \(\rho _1 =20\), \(\rho _2 =0.5\), \(\sigma _{\max } =1\), \(\sigma _{\min } =0.2\), and \(\alpha =0.5\). In all algorithms, if the positions or velocities of the particles leave their search regions, they are reset by a random method. Two stopping conditions are adopted for all algorithms: the first is that the maximum number of iterations is reached, and the second is that \(|f(p_g )-f_\mathrm{{opt}} |<\varepsilon \) holds or the number of iterations exceeds the maximum, where \(\varepsilon \) is a small tolerance. The first stopping condition is used to evaluate solution accuracy, and the second is used to evaluate the runtimes of the algorithms. All algorithms are implemented in MATLAB 7.0 and run on a PC with 2.00 GB RAM, a 2.10 GHz CPU, and the Windows 7 operating system. For each problem, the experiment is repeated 30 times.
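For clarity, the two stopping rules described above can be written as a small helper; the function name and arguments are assumptions made only for illustration (the experiments themselves were run in MATLAB).

```python
def should_stop(k, K_max, f_pg, f_opt, eps=None):
    """Stopping test used in the experiments.

    Mode 1 (eps is None): run until the iteration budget K_max is exhausted,
    which is used to evaluate solution accuracy.
    Mode 2 (eps given): stop as soon as |f(p_g) - f_opt| < eps or the budget
    is exhausted, which is used to evaluate runtimes and iteration counts.
    """
    if eps is None:
        return k >= K_max
    return abs(f_pg - f_opt) < eps or k >= K_max
```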

Table 3 Experimental results of benchmark functions in Table 1 with \(D=30\) and \(K_{\max } =3000\)
Table 4 Experimental results of benchmark functions in Table 2 with \(D=30\) and \(K_{\max } =3000\)

Using the first stopping condition, Tables 3, 4, 5, 6, 7, and 8 show the results with respect to the best of the best solutions, the worst of the best solutions, the average of the best solutions, the standard deviation of the best solutions, and the average runtimes of the algorithms for the unimodal and multimodal functions with dimensions 30, 100, and 1000, respectively. In Table 7, "f" means that the run failed.

Table 5 Experimental results of benchmark functions in Table  1 with \(D=100\) and \(K_{\max } =3000\)

In Table 3, we see that all six algorithms find good solutions on functions \(F_1 \) and \(F_2 \), with AIWPSO and SSRDIWPSO outperforming IPSO. For functions \(F_3 \) and \(F_4 \), LDIWPSO, AIWPSO, CDIWPSO, DAPSO, and SSRDIWPSO have a very poor ability to explore and exploit the search space, whereas IPSO obtains better solutions. This is a clear indication that IPSO has better global search ability and escapes local optima more easily than the other five algorithms. For function \(F_5 \), none of the six algorithms finds the optimal solution, but the average of the best solutions obtained by IPSO is better than that of the other algorithms, and IPSO is more robust, as indicated by the smaller standard deviation. For function \(F_6 \), the six algorithms obtain the same results. For function \(F_7 \), IPSO is superior to the other five algorithms.

Table 6 Experimental results of benchmark functions in Table  2 with \(D=100\) and \(K_{\max } =3000\)
Table 7 Experimental results of benchmark functions in Table  1 with \(D=1000\) and \(K_{\max } =3000\)
Table 8 Experimental results of benchmark functions in Table 2 with \(D=1000\) and \(K_{\max } =3000\)
Fig. 2
figure 2

Performance comparison for functions a \(F_2 \), b \(F_3 \), c \(F_4 \), and d \(F_6 \) with \(D=30\) and \(K_{\max } =3000\)

Table 4 shows that IPSO has a strong ability to explore and exploit the search space except for function \(F_{13} \). For functions \(F_8 \), \(F_9 \), and \(F_{11} \), IPSO explores and exploits much better than the other five algorithms and finds better solutions. For function \(F_{10} \), IPSO is worse than CDIWPSO but still obtains good solutions and is superior to the other four algorithms. For function \(F_{12} \), IPSO is worse than AIWPSO, CDIWPSO, and SSRDIWPSO in terms of the best of the best solutions, but it is much better than the other algorithms for the worst and the average of the best solutions and is more robust, as indicated by the smaller standard deviation. For function \(F_{13} \), AIWPSO, CDIWPSO, and SSRDIWPSO are superior to IPSO for the best of the best solutions, but all algorithms perform nearly the same for the average of the best solutions.

Table 5 shows the strong ability of our method to search for optimal solutions. Except for function \(F_5 \), IPSO performs significantly better than the other five algorithms in exploring and exploiting the search space, and it finds the exact optimal solution for function \(F_6 \). For function \(F_5 \), IPSO does not find the optimal solution, but it is still better than the other five algorithms.

Fig. 3
figure 3

Performance comparison for functions a \(F_8 \), b \(F_{10} \), c \(F_{11} \) and d \(F_{12} \) with \(D=30\) and \(K_{\max } =3000\)

Fig. 4
figure 4

Performance comparison for functions a \(F_5 \), b \(F_7 \), c \(F_9 \) and d \(F_{11} \) with \(D = 100\) and \(K_{\max } =3000\)

As Table 6 illustrates, LDIWPSO, AIWPSO, CDIWPSO, DAPSO, and SSRDIWPSO show poor exploration and exploitation on the functions in Table 2, whereas IPSO explores and exploits the search space well and finds good solutions on functions \(F_9 \), \(F_{10} \), \(F_{11} \), and \(F_{12} \). For functions \(F_8 \) and \(F_{13} \), IPSO also performs poorly in finding good solutions, but it is still better than the other five methods.

Tables 7 and 8 show the comparative results on the high-dimensional benchmark functions of Tables 1 and 2, with dimension 1000. From Tables 7 and 8, we can see that IPSO outperforms the other five algorithms on all functions. LDIWPSO, AIWPSO, CDIWPSO, DAPSO, and SSRDIWPSO have a very poor ability to explore and exploit the search space and cannot find good solutions for any function, while IPSO finds good solutions on functions \(F_1 \), \(F_3 \), \(F_4 \), \(F_6 \), \(F_7 \), \(F_{10} \), and \(F_{11} \). In particular, for function \(F_6 \) it finds the exact global optimal solution. IPSO is able to find acceptable solutions on functions \(F_9 \) and \(F_{12} \). For functions \(F_5 \), \(F_8 \), and \(F_{13} \), IPSO cannot find good solutions, but it is significantly better than the other algorithms. For function \(F_2 \), none of the methods is able to obtain numerical results. According to the obtained average best-so-far solutions, IPSO is much better than the other five methods on high-dimensional search problems.

From Tables 3–8, we can see that the average runtimes of LDIWPSO, AIWPSO, CDIWPSO, DAPSO, and SSRDIWPSO are nearly the same for all benchmark functions, while the average runtimes of IPSO are longer than those of the other five algorithms for all benchmark functions. The mutation operator introduced into the standard particle swarm algorithm helps PSO explore and exploit the search space, but it also considerably increases the computational time.

Figure 2 shows the progress of the average best-so-far solution of LDIWPSO, AIWPSO, CDIWPSO, DAPSO, SSRDIWPSO, and IPSO over 30 runs for functions \(F_2 \), \(F_3 \), \(F_4 \), and \(F_6 \) with dimension 30. In Fig. 2, the convergence speed of IPSO is faster than that of the other five algorithms except for function \(F_2 \), and IPSO achieves better-quality results for functions \(F_2 \), \(F_3 \), \(F_4 \), and \(F_6 \). In particular, the convergence speed of IPSO is very rapid for functions \(F_3 \), \(F_4 \), and \(F_6 \).

The progress of the average best-so-far solution of LDIWPSO, AIWPSO, CDIWPSO, DAPSO, SSRDIWPSO, and IPSO over 30 runs for functions \(F_8 \), \(F_{10} \), \(F_{11} \), and \(F_{12} \) with dimension 30 is shown in Fig. 3. From Fig. 3, compared with the other five algorithms, the convergence speed of IPSO is faster except for function \(F_{10} \). In particular, IPSO achieves better-quality results on functions \(F_{11} \) and \(F_{12} \). For function \(F_{10} \), IPSO is better than LDIWPSO, AIWPSO, DAPSO, and SSRDIWPSO, but worse than CDIWPSO.

A performance comparison of LDIWPSO, AIWPSO, CDIWPSO, DAPSO, SSRDIWPSO, and IPSO over 30 runs for the average best-so-far solution of functions \(F_5 \), \(F_7 \), \(F_9 \), and \(F_{11} \) with dimension 100 is given in Fig. 4. In Fig. 4, compared with the other five algorithms, the convergence speed of IPSO is faster on functions \(F_5 \), \(F_7 \), \(F_9 \), and \(F_{11} \), and IPSO can converge quickly to the optimal solutions on functions \(F_7 \), \(F_9 \), and \(F_{11} \).

Fig. 5
figure 5

Performance comparison for functions a \(F_1 \), b \(F_7 \), c \(F_8 \) and d \(F_{13} \) with \(D = 1000\) and \(K_{\max } =3000\)

Figure 5 shows the performance comparison of LDIWPSO, AIWPSO, CDIWPSO, DAPSO, SSRDIWPSO, and IPSO over 30 runs for the average best-so-far solution of functions \(F_1 \), \(F_7 \), \(F_8 \), and \(F_{13} \) with dimension 1000. In Fig. 5, the convergence speed of IPSO is faster than that of the other five algorithms for functions \(F_1 \), \(F_7 \), \(F_8 \), and \(F_{13} \), and IPSO can converge quickly to the optimal solutions on functions \(F_1 \) and \(F_7 \).

Now, we evaluate the runtimes of these algorithms under the condition that a given convergence accuracy must be satisfied. Using the second stopping condition, Tables 9, 10, 11, 12, 13, and 14, respectively, show the average iteration numbers and average runtimes of the algorithms for the unimodal and multimodal functions with dimensions 30, 100, and 1000 over 30 runs. In these experiments, the population size is set to 30, the maximum number of iterations is 3000, and the convergence accuracy \(\varepsilon \) is \(10^{-5}\), \(10^{-3}\), and \(10^{-1}\) for the 30-, 100-, and 1000-dimensional benchmark functions, respectively.

From the average runtimes reported in Tables 9 and 10, IPSO is better than the other five methods for functions \(F_3 \), \(F_6 \), \(F_9 \), \(F_{11} \), and \(F_{12} \). For function \(F_1 \), IPSO is slightly better than LDIWPSO and DAPSO, but worse than AIWPSO, CDIWPSO, and SSRDIWPSO. For function \(F_{10} \), IPSO is better than all algorithms except SSRDIWPSO. For functions \(F_2 \), \(F_4 \), \(F_5 \), \(F_7 \), \(F_8 \), and \(F_{13} \), IPSO is worse than the other five algorithms. From the average iteration numbers given in Tables 9 and 10, before the maximum number of iterations is reached, LDIWPSO, AIWPSO, CDIWPSO, DAPSO, and SSRDIWPSO are able to obtain solutions of the given precision on seven functions, while IPSO obtains solutions of the given precision on ten functions; in particular, IPSO converges quickly to solutions of the given precision on functions \(F_6 \) and \(F_{11} \).

Table 9 Experimental results of benchmark functions in Table  1 with \(D=30\), \(\varepsilon =10^{-5}\), and \(K_{\max } =3000\)
Table 10 Experimental results of benchmark functions in Table  2 with \(D=30\), \(\varepsilon =10^{-5}\), and \(K_{\max } =3000\)
Table 11 Experimental results of benchmark functions in Table 1 with \(D=100\), \(\varepsilon =10^{-3}\), and \(K_{\max } =3000\)
Table 12 Experimental results of benchmark functions in Table 2 with \(D=100\), \(\varepsilon =10^{-3}\), and \(K_{\max } =3000\)
Table 13 Experimental results of benchmark functions in Table 1 with \(D=1000\), \(\varepsilon =10^{-1}\), and \(K_{\max } =3000\)
Table 14 Experimental results of benchmark functions in Table  2 with \(D=1000\), \(\varepsilon =10^{-1}\), and \(K_{\max } =3000\)

From Tables 11 and 12, the average runtimes obtained by LDIWPSO, AIWPSO, CDIWPSO, DAPSO, and SSRDIWPSO are nearly the same for all the benchmark functions in Tables 1 and 2, and the results obtained by IPSO are superior to those of the other five methods on functions \(F_1 \), \(F_3 \), \(F_6 \), \(F_7 \), \(F_{10} \), and \(F_{11} \). In addition, the average iteration numbers of IPSO are smaller than those of the other five algorithms on eight functions. In particular, IPSO converges quickly to solutions of the given precision on functions \(F_1 \), \(F_3 \), \(F_6 \), \(F_7 \), and \(F_{11} \) before the maximum number of iterations is reached, while the average iteration numbers of the other five algorithms almost reach the maximum for all functions.

From Tables 13 and 14, the average runtimes obtained by LDIWPSO, AIWPSO, CDIWPSO, DAPSO, and SSRDIWPSO are nearly the same for all benchmark functions, and the results obtained by IPSO are better than those of the other five methods on eight functions. In terms of average iteration numbers, IPSO is better than the other five methods on nine functions. In particular, IPSO converges quickly to solutions of the given precision on functions \(F_1 \), \(F_3 \), \(F_4 \), \(F_6 \), \(F_7 \), \(F_{10} \), and \(F_{11} \) before the maximum number of iterations is reached.

6 Conclusion

The convergence of the standard PSO algorithm has been studied by several scholars based on dynamic system theory, difference equation theory, and stochastic process theory, but studies of its convergence in probability are relatively few. The overall goal of this paper is to analyze the convergence of the standard PSO algorithm through probability theory. We prove that the standard PSO algorithm converges with probability 1 under certain conditions. This result helps to understand the mechanism of the standard PSO algorithm and to design more powerful PSO algorithms. We then propose a new improved PSO algorithm that converges with probability 1. The numerical results demonstrate that the proposed algorithm significantly improves the performance of PSO in terms of efficiency. In the future, it would be interesting to study the convergence in probability of other stochastic optimization algorithms.