1 Introduction

In this paper we modify the Particle Swarm Optimization (PSO) algorithm to enhance its performance for the following optimization problem:

$$\begin{aligned}&\displaystyle \hbox {optimize } f\left( {\vec {x}} \right) ,\quad \vec {x} = (x_{1}, \ldots, x_{d})\end{aligned}$$
(1a)
$$\begin{aligned}&\displaystyle \hbox {subject to } \vec {l} \le \vec {x} \le \vec {u}\end{aligned}$$
(1b)
$$\begin{aligned}&\displaystyle \hbox {where } \vec {l} = (l_{1}, \ldots, l_{d}),\quad \vec {u} = (u_{1}, \ldots, u_{d}) \end{aligned}$$
(1c)

In the rest of the paper, without loss of generality, we consider minimization problems only.

PSO is a stochastic population-based optimization method that was developed by Kennedy and Eberhart (1995). The algorithm has been successfully applied to many problems such as function optimization, artificial neural network training, pattern classification, and fuzzy system control (Engelbrecht 2005; Poli 2008), to name a few. Due to its ease of implementation and fast convergence to acceptable solutions, PSO has received much attention in recent years (Poli 2008). However, there are still many open issues in this algorithm, e.g. selection of the topology of the swarm, population sizing, and rotation variance (Engelbrecht 2005; Wilke 2005; Poli et al. 2007). Also, several studies have been conducted to analyze the behavior of particles with different coefficients (Shi and Eberhart 1998a, b; Zhang et al. 2005; Jiang et al. 2007) and different topologies (Mendes et al. 2004; Clerc 2006). Moreover, some researchers have analyzed the role of random matrices in the velocity updating rule (Secrest and Lamont 2003; Krohling 2004; Wilke 2005; Clerc 2006; Wilke et al. 2007a, b). This study presents a more in-depth analysis of these random matrices.

Four research contributions reported in this paper are:

  1. identification of several problems caused by multiplying the personal and social influence vectors by two random matrices,

  2. presentation of a new approach based on Euclidean rotation matrices with random directions (generated by a normal distribution) for updating the velocity of particles to address these problems,

  3. detailed investigation of the parameters of the normal distribution used in the Euclidean rotation matrices, and

  4. experimentation with an adaptive approach that identifies values of the parameter of the normal distribution during the run. This adaptive approach is independent of the algorithm and of the number of dimensions of the problem at hand.

The proposed Euclidean rotation is added to seven PSO variants and these extended algorithms are compared with their original versions on 18 standard optimization functions. Results of these experiments confirm that the proposed Euclidean rotation enhances the ability of the PSO methods to find better solutions.

The rest of the paper is organized as follows. After a brief review of the basic components of PSO and some well-studied variants of PSO in Sect. 2, the effect of random matrices on the trajectory of particles is investigated in Sect. 3. It is shown that the random matrices presented in the velocity updating rule may have an uncontrollable impact on both the length and direction of the particles’ movement. In Sect. 4 a new approach for updating the velocity is presented and the ability of this approach to address the identified problems is tested. This approach introduces an additional parameter \(\sigma \) that is responsible for controlling the direction of particles’ movement. The role of this parameter is investigated in Sect. 5. Then the proposed method is incorporated into several PSO variants. These new versions of PSOs are compared with their standard instances. The comparison shows that the proposed approach enhances the performance of all tested PSO variants in the majority of cases. The last section concludes the paper by summing up the advantages and disadvantages of the proposed method.

2 Background

In this section, we present a brief background of PSO. In Sect. 2.1, the original PSO model and its early modifications are reviewed. In Sect. 2.2, several PSO variants that are used in our comparative study are described. These variants are selected because they are either frequently used in literature or recently proposed. In Sect. 2.3, several existing studies on the use of random matrices in PSO are reviewed.

2.1 Basics of PSO

Particle Swarm Optimization (PSO) was introduced by Kennedy and Eberhart (1995). The method processes a population (referred to as swarm) of \(m\ge 2\) particles; each particle is defined by three \(d\)-dimensional vectors:

  • Position (\(\vec {x}_{t}^{i}\))—the position of the \(i\)th particle in the \(t\)th iteration that is used to evaluate the particle’s quality.

  • Velocity (\(\vec {v}_{t}^{i}\))—the direction and magnitude of movement of the \(i\)th particle in the \(t\)th iteration.

  • Personal best (\(\vec {p}_{t}^{i}\))—the best positionFootnote 1 that the \(i\)th particle has visited in its lifetime (up to the \(t\)th iteration). This vector serves as a memory for keeping knowledge of quality solutions (Eberhart and Kennedy 1995).

In the standard form of PSO (Kennedy and Eberhart 1995), the velocity vector of a particle \(i\) is updated according to three other vectors: the personal best position of the \(i\)th particle (\(\vec {p}_{t}^{i}\)), the best position found over the whole swarm (known as \(\vec {g}_{t}\)), and the current position of the particle (\(\vec {x}_{t}^{i}\)). The formulation for updating the velocity and position of the particles in the original PSO is given in Kennedy and Eberhart (1995) as:

$$\begin{aligned} \vec {V}_{{t + 1}}^{i}&= \vec {V}_{t}^{i} + \varphi _{1} R_{{1t}} \mathop {\underbrace{{(\vec {p}_{t}^{i} - \vec {x}_{t}^{i} )}}}\limits _{Personal~influence~(PI)} + \varphi _{2} R_{{2t}} \mathop {\underbrace{{(\vec {g}_{t} - \vec {x}_{t}^{i} )}}}\limits _{Social~influence~(SI)}\end{aligned}$$
(2a)
$$\begin{aligned} \vec {x}_{{t + 1}}^{i}&= \vec {x}_{t}^{i} + \vec {V}_{{t + 1}}^{i} \end{aligned}$$
(2b)

In this equation, \(\varphi _{1}\) and \(\varphi _{2}\) are two real numbers known as acceleration coefficients,Footnote 2 and \(\vec {p}_{t}^{i}\) and \(\vec {g}_{t}\) are the personal best (of particle \(i\)) and the global best vector (the best personal best in the swarm), respectively, until iteration \(t\) (this model of PSO is usually known as the global best model because of the usage of the global best vector). Also, the role of the \(PI=\vec {p}_{t}^{i} - \vec {x}_{t}^{i}\) (Personal Influence) and \(SI=\vec {g}_{t} - \vec {x}_{t}^{i}\) (Social Influence) vectors is to attract the particles to move toward known quality solutions, i.e. the personal and global best. Moreover, \(R_{1t}\) and \(R_{2t}\) are two \(d\times d\) diagonal matrices (Clerc 2006; Oca et al. 2009), whose elements are random numbers distributed uniformly (\(\sim U(0,1)\))Footnote 3 in [0, 1]. These matrices can be interpreted as mappings (any such mapping is called Rndm hereafter), which are applied in order to exploit new solutions around the global and personal best vectors and to keep the particles from moving exactly toward them; consequently, they diversify the particles for more effective searches. Also, these matrices are responsible for the direction diversity in the swarm (Wilke et al. 2007a, b). Note that the matrices \(R_{1t}\) and \(R_{2t}\) are generated at each iteration for each particle independently. In Eberhart and Kennedy (1995), another model was introduced in which \(\vec {g}_{t}^{i}\) was determined for each particle according to a sub-set of the particles rather than all particles in the swarm (this is called the local best model).
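For concreteness, the global best update of Eqs. 2a–2b can be sketched in a few lines of Python. This is our own minimal illustration, not the authors' implementation; since \(R_{1t}\) and \(R_{2t}\) are diagonal, multiplying by them reduces to element-wise multiplication by \(U(0,1)\) vectors, and the default coefficients here are assumptions for the example.

```python
import numpy as np

def pso_step(v, x, p, g, phi1=2.0, phi2=2.0, rng=None):
    """One velocity/position update of the original (global best) PSO.

    v, x : current velocity and position of one particle (d-vectors)
    p, g : personal best of this particle and global best of the swarm
    The diagonal random matrices R1t, R2t reduce to element-wise
    multiplication by U(0,1) vectors r1, r2.
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.uniform(0.0, 1.0, size=len(x))  # diagonal of R1t
    r2 = rng.uniform(0.0, 1.0, size=len(x))  # diagonal of R2t
    v_new = v + phi1 * r1 * (p - x) + phi2 * r2 * (g - x)
    return v_new, x + v_new                  # new velocity drives the move
```

Note how the diagonal structure makes the random scaling of each dimension independent, which is precisely the property analyzed in Sect. 3.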

Shi and Eberhart (1998a, b) introduced a new coefficient \(\omega \) (inertia weight) in Eq. 2a to control the influence of the previous velocity value (this component, \(\omega \vec {V}_{t}^{i}\), is called inertia) as follows:

$$\begin{aligned} \vec {V}_{{t + 1}}^{i} = \omega \vec {V}_{t}^{i} + \varphi _{1} R_{{1t}} \mathop {\underbrace{{(\vec {p}_{t}^{i} - \vec {x}_{t}^{i} )}}}\limits _{Personal influence}+ \varphi _{2} R_{{2t}} \mathop {\underbrace{(\vec {g}_{t} - \vec {x}_{t}^{i})}}\limits _{Social influence} \end{aligned}$$
(3)

where the coefficient \(\omega \) controls the influence of the previous velocity on the movement (this variant is called InertiaPSO throughout the paper). One of the issues in InertiaPSO (as well as in the original PSO) is that the velocity vector may grow to infinity if the values of \(\omega \), \(\phi _{1}\), and \(\phi _{2}\) are not set correctly. There are many strategies to prevent the velocity from growing to infinity (Helwig and Wanka 2007). One popular and simple strategy is to restrict the velocity to the range \([-\vec {V}_{max},\vec {V}_{max}]\), where \(\vec {V}_{max}\) is the maximum allowed velocity. Also, in many implementations of PSO, if a particle has left the search space, its objective value is not considered for updating the personal and global best vectors. The growth of velocity was studied by Clerc and Kennedy (2002), Trelea (2003), Van den Bergh and Engelbrecht (2006), and Poli (2009) through theoretical analysis of the dynamics of particles. It was found that the convergence behavior of PSO depends on the values of its coefficients (\(\omega \), \(\phi _{1}\), and \(\phi _{2}\)). In fact, if these values are set correctly (within a specific range), then particles converge to a fixed point (shown to be the weighted average of \(\vec {p}_{t}^{i}\) and \(\vec {g}_{t}\)) in the search space (Trelea 2003). Note that this fixed point is not guaranteed to be a quality point (Van den Bergh and Engelbrecht 2010).
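A sketch of the InertiaPSO update of Eq. 3, including the simple \(V_{max}\) clamping strategy described above, could look as follows (our illustration; the default values \(\omega =0.729\), \(\phi _{1}=\phi _{2}=1.49\) are the common choices mentioned later in this section, not a prescription):

```python
import numpy as np

def inertia_pso_step(v, x, p, g, w=0.729, phi1=1.49, phi2=1.49,
                     v_max=None, rng=None):
    """InertiaPSO update (Eq. 3) with the optional V_max clamping strategy.

    If v_max is given, each velocity component is restricted to
    [-v_max, v_max] to keep the velocity from growing without bound.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = len(x)
    v_new = (w * v
             + phi1 * rng.uniform(size=d) * (p - x)
             + phi2 * rng.uniform(size=d) * (g - x))
    if v_max is not None:
        v_new = np.clip(v_new, -v_max, v_max)  # simple anti-explosion strategy
    return v_new, x + v_new
```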

In the remaining part of this section, some discussion of PSO variants (Sect. 2.2), together with an analysis of the random matrices (\(R_{1t}\) and \(R_{2t}\)) in PSO (Sect. 2.3), is provided.

2.2 PSO variants

There are numerous PSO variants based on different velocity/position updating rules (Liang et al. 2006; Ghosh et al. 2010), different topologies (Mendes et al. 2004; Pso 2006; Xinchao 2010), different parameter values (Clerc and Kennedy 2002; Ratnaweera et al. 2004), different hybridizations (Huang et al. 2010; Wang et al. 2011), and population sizing (Chen and Zhao 2009; Hsieh et al. 2009); all of these have improved the searching ability of the standard PSO. For example, an algorithm called Constriction coefficient PSO (CoPSO) was proposed (Clerc and Kennedy 2002), which defined a new form of the velocity updating rule:Footnote 4

$$\begin{aligned} \vec {V}_{{t + 1}}^{i} = \chi \left( {\vec {V}_{t}^{i} + c_{1} R_{{1t}} \left( {\vec {p}_{t}^{i} - \vec {x}_{t}^{i} } \right) + c_{2} R_{{2t}} \left( {\vec {g}_{t} - \vec {x}_{t}^{i} } \right) } \right) ~ \end{aligned}$$
(4)

The parameter \(\chi \) was called the constriction factor. The authors demonstrated that tuning the values of \(\chi \), \(c_{1}\), and \(c_{2}\) can prevent the swarm from explosionFootnote 5 and can lead to a better balance between exploration and exploitation within the search space. They proved that the velocity does not grow to infinity if the parameters (\(\chi \), \(c_{1}\), and \(c_{2}\)) satisfy the following equation:

$$\begin{aligned} \chi =\frac{2}{\left| {2-c-\sqrt{c^{2}-4c}} \right| } \end{aligned}$$
(5)

where \(c=c_1 +c_2 >4\).
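Eq. 5 is straightforward to compute; a small helper (ours, for illustration) shows that the commonly used setting \(c_1=c_2=2.05\) yields \(\chi \approx 0.7298\), the constriction value often quoted alongside the inertia weight 0.729:

```python
import math

def constriction_factor(c1, c2):
    """Clerc-Kennedy constriction coefficient chi of Eq. 5.

    Valid only when c = c1 + c2 > 4, as required by the derivation.
    """
    c = c1 + c2
    if c <= 4.0:
        raise ValueError("Eq. 5 requires c = c1 + c2 > 4")
    return 2.0 / abs(2.0 - c - math.sqrt(c * c - 4.0 * c))
```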

Many researchers applied time-based (iteration-based) adaptation approaches to the parameters of PSO. For example, a Decreasing-IW PSO was proposed (Shi and Eberhart 1998a, b) where the value of inertia decreased with the iteration number of the algorithm. In contrast, another approach, named Increasing-IW PSO, was proposed (Zheng et al. 2003); this approach used the same equation with the maximum and minimum of inertia interchanged. In another time-varying approach, called Stochastic-IW PSO (Eberhart and Shi 2001), the value of the inertia weight was considered as a uniform random number in the interval [0.5, 1). This interval for the random inertia weight was selected because the expected value of a uniform random number in this interval (0.75) is in good agreement with the value 0.729 that is used frequently in PSO variants for the inertia weight.

In Van den Bergh and Engelbrecht (2006) the convergence of CoPSOFootnote 6 was investigated and it was shown that, under some conditions, the global best position gets stuck and does not move (this phenomenon is called stagnation). In order to address this issue, the velocity updating rule for the global best was revised. This method was called Guaranteed Convergence PSO (GCPSO). The authors proposed the use of local search around the position of the global best particle and set the velocity vector for this particle accordingly. The method was applied to several test benchmarks and the results showed significant improvement compared with CoPSO.

In this paper, the PSO variant introduced in Pso (2006) is considered the standard one and is called StdPSO2006 from now on. This version differs from the version proposed in Shi and Eberhart (1998a, b) only in terms of its topology updating: the global best vector is defined for each particle according to a restricted number of particles (stored in a list called \(link_i\) for each particle \(i\)) rather than all particles in the swarm (this vector is called the local best vector for particle \(i\)). The length of \(link_i\) is equal for all \(i\) and is a parameter (denoted by \(K\)) set by the user. At every iteration, if a better solution was not found by the swarm at the previous iteration, then \(link_i\) is updated for all particles. To update \(link_i\) for particle \(i\), \(K\) other particles are chosen randomly and stored in \(link_i\). The best personal best among the particles in the set \(\left\{ {link_i \mathop \cup i} \right\} \) is selected as the local best vector for particle \(i\). The set \(link_i\) is updated only if no better solution was found; otherwise, it remains the same. Note that the set \(link_i\) is selected randomly for each particle independently; thus, these sets (over all particles) might overlap in one iteration while being disjoint in another.
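The StdPSO2006 topology update described above can be sketched as follows. This is our own reading of the scheme, not the reference code: whether the \(K\) random informants may contain duplicates is an implementation detail, and this sketch simply samples \(K\) indices uniformly with replacement.

```python
import random

def update_links(m, K, rng=None):
    """Redraw the informant list link_i for each of m particles.

    Each particle i is informed by K particles chosen uniformly at random
    (duplicates allowed in this sketch), redrawn only when the swarm failed
    to improve in the previous iteration.
    """
    rng = rng or random.Random()
    return [[rng.randrange(m) for _ in range(K)] for _ in range(m)]

def local_best(i, links, pbest_fitness):
    """Index of the best personal best in the set link_i ∪ {i} (minimization)."""
    candidates = set(links[i]) | {i}
    return min(candidates, key=lambda j: pbest_fitness[j])
```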

In 2011 an adaptive approach, called the Adaptive Inertia Weight PSO (AIWPSO), was proposed (Nickabadi et al. 2011) where the value of inertia weight was adaptively changed during the search process. The authors proposed a linear relation between the inertia weight in the current iteration and the number of particles improved in the previous iteration. Their results showed a considerable improvement in solution qualities in comparison to other adaptive/time-varying strategies for controlling the inertia.

There are many other variants of PSO (Tu and Lu 2004; Poli et al. 2007) that have emerged over the past 15 years and it would be difficult to discuss all of them in detail. However, in order to have a comprehensive test environment, we confined our tests to nine PSO variants. The variants have been selected in this paper because:

  • they have been used frequently in other papers as baselines for comparisons, or

  • they have studied the role of the random matrices in the velocity updating rule.

2.3 Investigation of random matrices

A study of the particles’ movement was presented through visualization (Secrest and Lamont 2003). The authors claimed that in PSO (Eq. 3), \(\phi _{1}R_{1t}PI\) is a vector in the direction of PI and that \(\phi _{1}R_{1t}\) only influences the length of PI (and the same holds for SI). Consequently, the position of each particle at the (\(t+1\))th iteration is somewhere (because of the stochastic nature of the velocity caused by Rndm) within a parallelogram defined by the current position, PI, and SI in the \(t\)th iteration. However, it is unclear whether the authors used random values or random matrices for the velocity updating rule, because the use of random values would result in the same value for all dimensions, so the direction diversity would be ignored.

Two issues related to the use of random matrices were detected in Clerc (2006). The first issue was that the summation of uniform distributions used in the velocity updating rule for Rndm mappings was no longer uniform. The second issue was that when personal best (\(\vec {p})\) or global best (\(\vec {g})\) share the same coordinate value with the position vector (\(\vec {x})\) for particle \(i\), PI (or SI) loses that dimension. In this case, the search process becomes a random line search in that dimension. In order to address these problems, several alternative distributions for \(R_{1t}\) and \(R_{2t}\) were proposed and compared (Clerc 2006).

A comparison between CoPSO and linear PSO (i.e. CoPSO where \(R_{1t}\) and \(R_{2t}\) are random numbers rather than random matrices) was conducted (Wilke et al. 2007a, b). This comparison showed that the trajectory of the movement of particles in the linear PSO turns into a line search at the very beginning of the search process. The authors claimed that, because of the directional diversity caused by random matrices (instead of random values in the linear PSO), CoPSO does not suffer from this problem. However, it was shown that, under some circumstances, the direction diversity might be lost in CoPSO as well (Van den Bergh and Engelbrecht 2010).

The performance of CoPSO was also investigated in cases where the search space was rotated (Wilke 2005). The authors found that CoPSO is sensitive to rotating the search space, i.e. CoPSO is a rotation variant algorithm. In order to implement a CoPSO model that is rotation invariant, they used random rotation matrices (rather than random diagonal matrices) in the velocity updating rule. They used an approximation idea (an exponential map) for generating the rotation matrices to reduce the computational expenses arising from matrix multiplication. In the exponential map, a rotation matrix \(M\) is generated by:

$$\begin{aligned} M=I+\mathop \sum \limits _{i=1}^{max_i } \frac{1}{i!}\left( {\frac{\alpha \pi }{180}\left( {A-A^{T}} \right) } \right) ^{i} \end{aligned}$$
(6)

where \(A\) is a \(d \times d\) matrix with elements generated randomly in the interval [\(-\)0.5, 0.5], \(\alpha \) is a scalar representing the rotation angle, and \(I\) is the identity matrix. The generated matrix \(M\) is a random rotation matrix with the angle \(\alpha \). It is clear in Eq. 6 that the matrix \(M\) becomes a more accurate estimation of a rotation matrix with the angle \(\alpha \) when \(max_i\) is larger. In Wilke et al. (2007a, b) the value of \(max_i\) was set to 1, limiting the approximation to one term of the summation only. Thus, the approximation error of generating the random rotation matrix using the exponential map method grows with the rotation angle (\(\alpha \)). Also, the impact of different values of \(\alpha \) was not discussed and the value of this parameter was set to 3. Note that for larger values of \(max_i\), the time complexity of generating rotation matrices grows rapidly. The time complexity of generating the approximated rotation matrix (with \(max_i=1\)) and multiplying it by a vector is in \(O(d^{2})\), i.e. generating the matrix (\(O(d^{2})\)) plus the vector-matrix multiplication (\(O(d^{2})\)). The rotation matrix was incorporated into CoPSO (the random diagonal matrices in CoPSO were replaced by the random rotation matrices) to test if it could improve the performance of the algorithm when the search space is rotated. The method (hereafter, we use the acronym WPSO for Wilke’s PSO) was applied to five standard optimization benchmark functions and its results were compared to those of the standard PSO (Wilke 2005). These results showed that the performance of WPSO was not influenced by rotating the search space.
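The exponential map of Eq. 6 is easy to implement; the sketch below (ours, for illustration) makes the truncation error visible. Since \(A-A^{T}\) is skew-symmetric, the full series \(\exp \big (\frac{\alpha \pi }{180}(A-A^{T})\big )\) is exactly orthogonal, whereas truncating at \(max_i=1\) leaves only \(I+S\), which is orthogonal only approximately.

```python
import numpy as np

def exp_map_rotation(d, alpha_deg, max_i=1, rng=None):
    """Approximate random rotation matrix via the exponential map of Eq. 6.

    A has entries uniform in [-0.5, 0.5]; S = (alpha*pi/180)(A - A^T) is
    skew-symmetric. Summing more Taylor terms (larger max_i) approaches the
    exact rotation exp(S), at a higher computational cost.
    """
    rng = np.random.default_rng() if rng is None else rng
    A = rng.uniform(-0.5, 0.5, (d, d))
    S = (np.pi * alpha_deg / 180.0) * (A - A.T)
    M = np.eye(d)
    term = np.eye(d)
    for i in range(1, max_i + 1):
        term = term @ S / i      # term now holds S**i / i!
        M = M + term
    return M
```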

3 Effects of random matrices on vectors

Consider a vector \(\vec {v}=(x_0,y_0)'\) in 2-dimensional Cartesian coordinatesFootnote 7 and consider a 2-dimensional mapping Rndm called \(R\) (\(R=\left[ {{\begin{array}{cc} {r_{11} }&{} 0 \\ 0&{} {r_{22} } \\ \end{array} }} \right] \), \(r_{11}, r_{22} \sim U(0,1)\)). Also, assume that \(\vec {v}^{\prime }\) is the result of applying the mapping \(R\) to \(\vec {v}\), i.e. \(\vec {v}^{\prime }=R\vec {v}\). To investigate the effects of applying \(R\) to an arbitrary vector, we generated 5,000 random matrices \(R\) and applied them to a vector \(\vec {v}\). Figure 1 shows the results of applying \(R\).

Fig. 1
figure 1

The results of applying 5,000 random matrices \(R\) to vector \(\vec {v}\). The grey area shows the area that the original vector \(\vec {v}\) might be mapped to

All possible locations for the mapped vector \(\vec {v}^{\prime }\) have been represented by the grey area in Fig. 1. Clearly, the mapping \(R\) displays some properties when it is applied to an arbitrary vector \(\vec {v}\), namely:

Property 1: \(R\) does not preserve the length of \(\vec {v}\)

Property 2: Applying \(R\) does not preserve the phase (direction in \(d\) dimensions) of \(\vec {v}\)

Property 3: As \(\vec {v}\) becomes closer to one axis, the probability of \(\theta \) and \(\theta ^{{\prime }}\) (the angle between \(\vec {v}\) and \(x\) axis and \(\vec {v}^{\prime }\) and \(x\) axis, respectively) becoming closer to each other increases rapidly; for a vector \(\vec {v}\) that is exactly on one axis of the coordinates, no rotation takes place for any \(R\) Footnote 8

Property 4: Applying \(R\) produces a vector \(\vec {v}^{\prime }\) in the same quadrant (orthant in multidimensional space) in which \(\vec {v}\) lies (as the random values are positive)

Let us analyze the behavior of \(\Vert \vec {v}^{\prime }\Vert \) and \(\theta -\theta ^{\prime }\) to visualize the effects of applying \(R\) to \(\vec {v}\). To illustrate the listed properties, an arbitrary two-dimensional vector \(\vec {v}=(-1,0.7)\), chosen randomly only for experimental purposes, is mapped 5,000 times by randomly generated matrices \(R\). The histograms of \(\Vert \vec {v}^{\prime }\Vert /\Vert \vec {v}\Vert \), \(\theta -\theta ^{\prime }\), and \(\theta ^{\prime }\) are shown in Fig. 2a–c, respectively. Clearly, the length of \(\vec {v}^{\prime }\) (Fig. 2a) is less than or equal to the length of \(\vec {v}\) (Property 1). Considering \(\frac{\Vert \vec {v}^{\prime }\Vert }{\Vert \vec {v}\Vert }\) as a random variable, its probability distribution changes with the vector \(\vec {v}\). As an example, \(P(\frac{\Vert \vec {v}^{\prime }\Vert }{\Vert \vec {v}\Vert }<0.2)\approx 0.2\) for the vector \(\vec {v}=(-1,0.007)\), while this value for the vector \(\vec {v}=(-1,0.7)\) is around 0.06. Clearly, the scaling of the length of the vector \(\vec {v}\) by applying \(R\) is vector-dependent. Although the probability distribution for each element of \(\vec {v}^{\prime }\) is uniform, there is no guarantee that the length of \(\vec {v}^{\prime }\) has a uniform distribution. Thus, it is not possible to assume that applying \(R\) has a uniform effect on the length of an arbitrary vector in the \(d\)-dimensional case; this assumption (having a uniform effect on a vector) is correct only in one-dimensional space. Our simulations showed that this distribution is close to uniform for vectors that are close to the axes of the coordinates, while it is not uniform otherwise.
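The experiment above is easy to reproduce. The following script (ours, not the authors' code) draws 5,000 random diagonal mappings and measures their effect on the length and direction of a 2-D vector, confirming Properties 1 and 3 empirically:

```python
import numpy as np

def mapping_stats(v, n=5000, seed=0):
    """Empirical effect of n random diagonal mappings R = diag(U(0,1)) on a 2-D v.

    Returns (length ratios ||Rv||/||v||, angle changes |theta - theta'| in degrees).
    """
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    R = rng.uniform(0.0, 1.0, (n, 2))   # the two diagonal entries of each R
    mapped = R * v                      # element-wise = diagonal matrix product
    ratios = np.linalg.norm(mapped, axis=1) / np.linalg.norm(v)
    theta = np.degrees(np.arctan2(v[1], v[0]))
    theta_p = np.degrees(np.arctan2(mapped[:, 1], mapped[:, 0]))
    return ratios, np.abs(theta - theta_p)
```

For a vector far from the axes, such as \((-1, 0.7)\), the direction changes are spread widely, while for a near-axis vector such as \((-1, 0.007)\) almost all mapped vectors keep nearly the same direction.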

Fig. 2
figure 2

The effect of 5,000 random mappings \(R\) on a the length of \(\vec {v}\), b the changes of the direction of \(\vec {v}\), c the direction of \(\vec {v}^{\prime }\)

Applying \(R\) changes the direction (Property 2) of the vector \(\vec {v}\) (Fig. 2b). The distribution of this change is not the same for all vectors and might be quite different for different vectors. Indeed, \(P(\left| {\theta -\theta ^{\prime }} \right| <\gamma )\) (the probability of rotating the vector \(\vec {v}\) by less than \(\gamma \) degrees when it is mapped by an \(R\)) depends on the vector \(\vec {v}\) and is not controllable.

For example, the value of \(P\left( {\left| {\theta -\theta ^{\prime }} \right| <10^{\circ }} \right) \approx 0.31\) when \(\vec {v}=(-1,0.7)\), while this value is around 0.14 when \(\vec {v}\) is \(\left( {-1,0.07} \right) \). Also, as the vector \(\vec {v}\) becomes closer to the axes of the coordinates, the probability \(P(\left| {\theta -\theta ^{\prime }} \right| \le \gamma )\) increases rapidly. For example, for \(\vec {v}=(-1,0.7)\), the probability \(P(\left| {\theta -\theta ^{\prime }} \right| \le 5^{\circ })\) is around 0.16, while this probability for \(\vec {v}=(-1,0.007)\) is around 0.997 (Property 3). This increase in probability continues until \(\vec {v}\) is exactly on one of the axes. In this case, \(P\left( {\left| {\theta -\theta ^{\prime }} \right| \le \varepsilon } \right) =1\) for any non-negative \(\varepsilon \). This means no rotation can take place and these random mappings operate as random scaling factors. Figure 2c shows that the angle between the horizontal axis and every mapped vector \(\vec {v}^{\prime }\) is in the interval \([90^{\circ },180^{\circ }]\). This shows that all vectors \(\vec {v}^{\prime }\) are placed in the second quadrant, that is, the quadrant in which the vector \(\vec {v}\) is found (Property 4).

Now let us look at the general case of PSO. In the velocity updating rule (Eq. 3), the vectors PI and SI are mapped by two Rndm mappings, \(R_{1t}\) and \(R_{2t}\). Figure 3 shows possible results of such a mapping in a particular instance (with \(\phi _1 =\phi _2 =1.49\)).

Fig. 3
figure 3

The effect of Rndm on the PI and SI in PSO. The grey areas are the areas to which the personal and global best vectors might be mapped using the Rndm

In Fig. 3, the greyed rectangles represent the areas in which the mapped versions of PI and SI may be found. Clearly, by applying Rndm to PI and SI, the resulting vectors may be dramatically different from the original ones. So, “the knowledge of quality solutions” (Eberhart and Kennedy 1995) is radically affected and this may cause undesirable behaviors in particles’ movement, especially in terms of exploitation of the area around the personal and global bests.

To summarize, we have identified three main problems which are present in the velocity updating rule of PSO when Rndm is used.

Problem 1: Uncontrollable changes of the length and direction of PI and SI (Properties 1 and 2) which results in inefficient exploitation for better solutions in the area around personal and global best.Footnote 9

Problem 2: When PI and SI are close to an axis (assume that the position of the particle is in the center of the coordinates), the direction freedom offered by the random mappings Rndm is very small and limited, hence the exploitation would not be very effective (Property 3).Footnote 10

Problem 3: By applying any Rndm to PI and SI, there is no chance for them to be mapped to other quadrants (Property 4). In fact, the random matrices may confine the exploitation of personal and global bests to just one quadrant which may result in trapping the particles in that quadrant.

To illustrate the issue that Problem 2 may cause, consider the problem of finding the minimum of the function \(f\left( {x,y} \right) =x^{2}+y^{2}\) (the optimum solution is at \((0, 0)\) with \(f(0, 0)=0\)). We experimented with StdPSO2006 using four particles in order to find the optimum solution under the following initialization assumptions:

  1. The second dimension of the position of each particle was initialized as a uniform random value in the interval [3.99, 4.01], while the values for the first dimension were randomly generated within the defined function’s boundaries.

  2. The second dimension of the velocity vector for all particles was initialized close to 0 (a random value in the interval [\(-0.01, 0.01\)]).Footnote 11

Our simulation showed that, in this situation, in 97 % of 2,000 runs (each run with four particles and 1,000 function evaluations), the best solution found by the algorithm was around the point \((0, 4)\). This shows that the value for the second dimension could not be changed by the algorithm, which confirms the second identified problem. In fact, in this situation the Rndm cannot maintain the direction diversity of the movement of particles to guide them to explore the search space.

To illustrate the problem that might be caused by Problem 3, assume that we want to find the minimum of the following 2-dimensional function:

$$\begin{aligned} f\left( {x,y} \right) =\frac{(x+1)(y+1)}{((x+1)^{2}+1)((y+1)^{2}+1)},\quad x,y\in \left[ {-5,5} \right] \end{aligned}$$
(7)

For this function, there is an asymptote with the value of zero when both \(x\) and \(y\) increase to positive infinity. However, this is actually a local optimum, while the best point, which minimizes \(f(x, y)\), is \((x,y)=(-2,0)\) with \(f\left( {-2,0} \right) =-0.25\). Consider that all particles are initialized with positive values for their positions and velocities. For the reasons given in the previous paragraph, i.e., the random matrices confine PI and SI to one quadrant, there is a small chance for the particles to find the optimal point and they can easily be trapped in the local optimum. Our simulation showed that, by initializing the positions and velocities of four particles as positive numbers and using StdPSO2006 for optimization over 25 iterations, the process converges to the local optimum in 75 of 100 runs.
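The stated optimum of Eq. 7 is easy to verify numerically; the short check below (ours) confirms that \(f(-2,0)=-0.25\) and that the function flattens toward the zero asymptote for large positive \(x\) and \(y\):

```python
def f(x, y):
    """The deceptive 2-D test function of Eq. 7."""
    return ((x + 1.0) * (y + 1.0)) / (((x + 1.0) ** 2 + 1.0) * ((y + 1.0) ** 2 + 1.0))
```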

To sum up, it is apparent that the mapping Rndm significantly influences the search process. In the following section, an approach is proposed to overcome the abovementioned problems. As reported in Van den Bergh and Engelbrecht (2010), meaningful investigations of the exploration/exploitation abilities of PSO are not possible without considering all parts of the velocity updating equation. However, for simplicity (see also Spears et al. 2010), the inertia component can be excluded from the analysis of the behavior of particles when one of the three main problems (discussed earlier in this section) is studied. In fact, according to Eq. 3, there is no coefficient incorporated into \(\omega \vec {v}_{t}\) for controlling/changing its direction (Problem 1). Thus, once the value of one of the dimensions of the inertia becomes close to zero, it cannot affect the bearing of movement in that dimension in the next iteration. If SI and PI also take small values in that dimension (Problem 2), no further changes in the bearing of movement in following iterations can take place (see the earlier example for Problem 2). Also, as was exemplified, by initializing the velocity vector parallel to SI and PI, the inertia component cannot play an effective role during the following iterations (as was shown in the example for Problem 3).

4 Proposed approach

In order to address all issues identified in the previous section, a new form of the velocity updating rule is proposed; it uses a rotation mapping rather than a random mapping for PI and SI (similar to the WPSO approach, see Wilke 2005, but with some modifications). The new formula (based on the concept of Euclidean rotation) is defined for the velocity updating rule:

$$\begin{aligned} \vec {V}_{{t + 1}}^{i} = \omega \vec {V}_{t}^{i} + \varphi _{1} r_{{1t}} Rotm_{t} \left( {\sigma _{1} } \right) \left( {\vec {p}_{t}^{i} - \vec {x}_{t}^{i} } \right) + \varphi _{2} r_{{2t}} Rotm_{t} \left( {\sigma _{2} } \right) \left( {\vec {g}_{t}^{i} - \vec {x}_{t}^{i} } \right) \end{aligned}$$
(8)

where Rotm \(_{t}\)(\(\sigma _{1})\) and Rotm \(_{t}\)(\(\sigma _{2})\) are two rotation matrices (replacing the random matrices \(R_{1t}\) and \(R_{2t}\) that were used in earlier PSO variants) and \(r_{1t}\) and \(r_{2t}\) are two real random numbers, uniformly distributed in [0, 1], which are used to control the magnitude of movement.

The concept of rotation matrices is used to address the identified issues with Rndms. A rotation matrix is a \(d\times d\) orthogonal matrix whose determinant is equal to 1. In this paper we use the Euclidean rotation matrix defined as follows (Duffin and Barrett 1994):

$$\begin{aligned} Rot_{a,b} \left( \alpha \right) =\left\{ {r_{i,j} \hbox {|}{\begin{array}{l} {r_{a,a} =r_{b,b} =\cos \left( \alpha \right) } \\ {r_{a,b} =-\sin \left( \alpha \right) } \\ {r_{b,a} =\sin \left( \alpha \right) } \\ {r_{j,j} =1,\quad j\ne a, j\ne b} \\ {r_{i,j} =0,\quad elsewhere} \end{array} }} \right\} \end{aligned}$$
(9)

As an example, in the two-dimensional case this matrix becomes \(\left[ {{\begin{array}{ll} {\cos (\alpha )}&{} {-\sin (\alpha )} \\ {\sin (\alpha )}&{} {\cos (\alpha )} \\ \end{array} }} \right] \) (cf. Eq. 9). Note that the Euclidean rotation matrix preserves the length of the vectors during rotation and only changes their direction. The random components \(r_{1t}\) and \(r_{2t}\) are responsible for the changes of the lengths of the vectors in Eq. 8.

\(Rot_{a,b}(\alpha )\) rotates a vector in the \(x_{a}x_{b}\) plane by angle \(\alpha \) about the origin while preserving its length (Chow 2000). To rotate a vector in all possible planes, all corresponding rotation matrices are multiplied by each other. We define \(Rot\left( {\mu ,\sigma } \right) \) as follows:

$$\begin{aligned} Rot\left( {\mu ,\sigma } \right) =\mathop \prod \limits _{1\le i<j\le d} Rot_{i,j} \left( {\alpha _{i,j} } \right) \end{aligned}$$
(10)

where Rot(\(\mu \), \(\sigma )\) is a rotation matrix which rotates a vector in all planes \(x_{i}x_{j}\) about the origin by angles \(\alpha _{i,j}\) that are normally distributed random numbers with mean \(\mu \) and standard deviation \(\sigma \), i.e. \(\{\alpha _{ij} ,\forall i,j\}\sim \mathcal{N}(\mu ,\sigma )\). In this paper we set \(\mu =0\); with this setting, the direction changes caused by rotation are centered around the direction of the original vector. We also use the following notation: \(\hbox {Rotm}(\sigma )=\hbox {Rot}(0, \sigma )\). By multiplying a vector \(\vec {v}\) by this rotation matrix, the vector is rotated about the origin while its length remains unchanged.
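A direct construction of Rotm(\(\sigma \)) from Eqs. 9 and 10 can be sketched as follows (a minimal Python sketch; the function names are ours, and angles are assumed to be given in degrees):

```python
import numpy as np

def plane_rotation(d, a, b, alpha):
    """Euclidean rotation matrix Rot_{a,b}(alpha) in the x_a x_b plane (Eq. 9)."""
    R = np.eye(d)
    R[a, a] = R[b, b] = np.cos(alpha)
    R[a, b] = -np.sin(alpha)
    R[b, a] = np.sin(alpha)
    return R

def rotm(d, sigma_deg, rng):
    """Rotm(sigma) = Rot(0, sigma): product of rotations in all d(d-1)/2
    planes (Eq. 10), with angles drawn from N(0, sigma)."""
    R = np.eye(d)
    for i in range(d - 1):
        for j in range(i + 1, d):
            alpha = np.deg2rad(rng.normal(0.0, sigma_deg))
            R = R @ plane_rotation(d, i, j, alpha)
    return R

# The result is orthogonal with determinant 1 and preserves vector lengths.
R = rotm(4, 4.0, np.random.default_rng(0))
v = np.array([-1.0, 0.7, 0.3, 2.0])
assert np.allclose(R @ R.T, np.eye(4))
assert np.isclose(np.linalg.det(R), 1.0)
assert np.isclose(np.linalg.norm(R @ v), np.linalg.norm(v))
```

The assertions at the end check the two defining properties stated above: orthogonality with unit determinant, and length preservation.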

Note that Eq. 10 imposes some computational overhead on the velocity calculations (see Eq. 11) arising from matrix multiplication. This overhead depends on the number of matrices (in our case, \(d(d-1)/2\), which is the number of possible planes in a \(d\)-dimensional space). In fact, the time complexity of generating Rotm(\(\sigma )\) is \(O(d^{5})\). However, the implementation proposed in this paper reduces the time complexity to \(O(d^{2})\), which is the same as generating a \(d\) by \(d\) matrix (which is used in the standard PSO), while the calculations remain accurate (see “Appendix III” for details on this implementation).

Because of this computational overhead, one can consider selecting only one random plane in each iteration for each particle and conducting the rotation only in that plane rather than in all possible planes (this variant is called RotmP throughout the paper). This method is faster than Rotm in terms of computational complexity; in fact, the complexity of selecting one of the planes and rotating in that plane is \(O(d)\). We compare these two approaches (Rotm and RotmP) in the experimental results section.
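Both variants can be realized without ever forming the full \(d\times d\) matrix, since each plane rotation touches only two components of the vector. The sketch below is ours (not necessarily the Appendix III implementation) and applies Rotm to a vector in \(O(d^{2})\) and RotmP in \(O(d)\):

```python
import numpy as np

def rotm_apply(v, sigma_deg, rng):
    """Apply a fresh Rotm(sigma) to v plane by plane: each rotation changes
    only two components, so the total cost is O(d^2) over d(d-1)/2 planes."""
    v = np.array(v, dtype=float)
    d = v.size
    for i in range(d - 1):
        for j in range(i + 1, d):
            a = np.deg2rad(rng.normal(0.0, sigma_deg))
            c, s = np.cos(a), np.sin(a)
            # rows i and j of Rot_{i,j}(a) applied to v
            v[i], v[j] = c * v[i] - s * v[j], s * v[i] + c * v[j]
    return v

def rotmp_apply(v, sigma_deg, rng):
    """RotmP: rotate v in ONE randomly chosen plane only -- O(d) per call."""
    v = np.array(v, dtype=float)
    i, j = rng.choice(v.size, size=2, replace=False)
    a = np.deg2rad(rng.normal(0.0, sigma_deg))
    c, s = np.cos(a), np.sin(a)
    v[i], v[j] = c * v[i] - s * v[j], s * v[i] + c * v[j]
    return v

rng = np.random.default_rng(1)
v = np.array([-1.0, 0.7, 0.3, 2.0])
# Both variants preserve the vector's length:
assert np.isclose(np.linalg.norm(rotm_apply(v, 4.0, rng)), np.linalg.norm(v))
assert np.isclose(np.linalg.norm(rotmp_apply(v, 4.0, rng)), np.linalg.norm(v))
```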

The proposed approach differs from WPSO in several aspects. First, WPSO uses an approximation method for generating rotation matrices, which causes inaccuracy in rotating the vectors, whereas the proposed approach can accurately generate the rotation matrices in the same order of complexity. Second, the exponential map method used in WPSO for generating rotation matrices does not preserve the length of movement (due to the usage of approximation) when rotating the vectors, whereas the proposed method keeps the length unchanged. Third, in WPSO, the error of the approximation of the rotation matrices grows with the angle of rotation, whereas the proposed method is accurate for any value of rotation angles (this has been pointed out as a drawback of the exponential map method for generating rotation matrices in Wilke et al. (2007a, b)).

As mentioned earlier, the lengths of the vectors mapped by Rotm(\(\sigma )\) are not changed. Also, the rotation of a vector is independent of the orientation of the coordinate axes. Based on the properties of the normal distribution, 99.7 % of the changes in a vector’s angle lie in the interval [\(-3\sigma , 3\sigma \)] in each plane. Indeed, \(P({\vert }\theta -\theta '{\vert } <\gamma )\), the probability of rotating the vector by less than \(\gamma \) degrees, is the same for all vectors, and it is both calculable and controllable through \(\sigma \).
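In the two-dimensional case there is a single plane, so the change of angle is itself distributed as \(\mathcal{N}(0,\sigma )\) and this probability follows directly from the normal CDF. A small sketch (for \(d>2\) the per-plane rotations compound, so this is only the single-plane value):

```python
import math

def p_rotation_within(gamma_deg, sigma_deg):
    """P(|theta - theta'| < gamma) for a single-plane rotation: the
    probability that a N(0, sigma) angle falls inside (-gamma, gamma)."""
    return math.erf(gamma_deg / (sigma_deg * math.sqrt(2.0)))

# With sigma = 4, the vector is rotated by more than 10 degrees only rarely:
p = p_rotation_within(10.0, 4.0)
assert 0.985 < p < 0.990
```

For \(\sigma =4\) this gives roughly 98.8 %, of the same order as the empirical figures reported for Fig. 4 below.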

To exemplify these claims, consider the vector \(\vec {v}=(-1,0.7)\) from Sect. 3.

Figure 4a shows vector \(\vec {v}\); all other vectors (those in the grey area) are the results of applying \(Rotm\left( 4 \right) \) to \(\vec {v}\) in 5,000 observations (compare Figs. 1 and 4a). Figure 4b shows the histogram of \(\theta -\theta '\), where \(\theta \) and \(\theta '\) are the angles between the \(x\) axis and \(\vec {v}\) and between the \(x\) axis and the rotated vector, respectively. The figure indicates that the direction has changed by between \(-15^{\circ }\) and \(15^{\circ }\). Moreover, the rotation matrices changed the angles by more than 10 degrees in just 0.8 % of observations (\(P({\vert }\theta - \theta '{\vert }<10)\) is 99.2 %). Our experiments showed that this value is the same for the vector \(\vec {v}=(-1,0.007)\), which was not the case when the Rndm mapping was used (see Fig. 2). Figure 4c shows that, by applying the rotation matrix to a vector that is close to one of the coordinate axes, e.g. \(\vec {v}=(-1,0.007)\), the quadrant may change (the angle of the mapped vector can exceed \(180^{\circ })\). Figure 4d shows the histogram of \(\theta '\) when 5,000 rotation matrices Rotm(4) were applied to the vector \(\vec {v}=(-1,0.7)\).

Fig. 4
figure 4

The effects of 5,000 generated Rotm(4) on a vector \(\vec {v}=(1,0.7)\), b the changes of direction of \(\vec {v}=(1,0.7)\), c direction of \({v}''\), and d direction of \(\vec {v}=(1,0.007)\); \(\vec {v}=(1,0.7)\) is a random sample

To summarize, the rotation matrix approach preserves the length of the vector \(\vec {v}\) while rotating \(\vec {v}\) according to a predefined rotation angle. This angle can be a random variable, normally distributed with mean 0 and standard deviation \(\sigma \) (by setting \(\mu =0\), the mapped vectors are distributed around the original vector following a Gaussian distribution).

The application of the mapping Rotm(\(\sigma )\) has the following properties:

Property 1: The lengths of PI and SI are not changed by Rotm(\(\sigma )\); they are controlled by \(\phi _{1}r_{1t}\) and \(\phi _{2}r_{2t}\), respectively.

Property 2: The directions of PI and SI are controlled by \(\sigma \).

Property 3: The rotation of a vector is independent of the orientation of the coordinate axes.

Property 4: Rotm(\(\sigma )\) can rotate PI and SI into a new quadrant.

In the velocity updating rule given by Eq. 8, the direction of each part is changed in all planes according to \(\sigma \), while the length of each part is controlled only by \(\phi \) and \(r\). Figure 5 shows an instance where Rotm(4.6) is used in the PSO updating rules (4.6 has been chosen for example purposes only).

Fig. 5
figure 5

The effect of Rotm(4.6) on the personal and global bests vectors in PSO. The grey areas are the areas to which the personal and global best vectors might be mapped to using this rotation

In Fig. 5, the shadowed areas are the possible areas to which PI and SI might be mapped after applying Rotm(\(\sigma )\) (with \(\phi _{1}=\phi _{2}=1.49\) and \(\sigma _{1}=\sigma _{2}= 4.6\)). Also, the shading of these areas indicates the probability of mapping the original vector into a given region: the darker the area, the higher the probability that the mapped vector falls within it.

In the rest of the paper we use a simpler version of Eq. 8, where \(\sigma _{1}=\sigma _{2}=\sigma \):

$$\begin{aligned} \vec {V}_{{t + 1}}^{i} = \omega \vec {V}_{t}^{i} + Rotm_{t} \left( \sigma \right) \left( {\varphi _{1} r_{{1t}} \left( {\vec {p}_{t}^{i} - \vec {x}_{t}^{i} } \right) + \varphi _{2} r_{{2t}} \left( {\vec {g}_{t} - \vec {x}_{t}^{i} } \right) } \right) \end{aligned}$$
(11)
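The update in Eq. 11 can be sketched as follows (a minimal Python sketch; \(\omega =0.72\) and \(\varphi _{1}=\varphi _{2}=1.49\) are typical values from the literature, \(\sigma =4.6\) is taken from the Fig. 5 example, and the helper names are ours):

```python
import numpy as np

def rotate_all_planes(v, sigma_deg, rng):
    """Apply a fresh Rotm(sigma) to v, plane by plane (angles ~ N(0, sigma))."""
    v = np.array(v, dtype=float)
    d = v.size
    for i in range(d - 1):
        for j in range(i + 1, d):
            a = np.deg2rad(rng.normal(0.0, sigma_deg))
            c, s = np.cos(a), np.sin(a)
            v[i], v[j] = c * v[i] - s * v[j], s * v[i] + c * v[j]
    return v

def velocity_update(v, x, p_best, g_best, omega, phi1, phi2, sigma_deg, rng):
    """Eq. 11: the combined personal and social influence is rotated by one
    shared Rotm(sigma); the inertia component is left unrotated."""
    r1, r2 = rng.random(), rng.random()
    influence = phi1 * r1 * (p_best - x) + phi2 * r2 * (g_best - x)
    return omega * v + rotate_all_planes(influence, sigma_deg, rng)

rng = np.random.default_rng(3)
x, p_best, g_best = rng.random(5), rng.random(5), rng.random(5)
v_new = velocity_update(np.zeros(5), x, p_best, g_best, 0.72, 1.49, 1.49, 4.6, rng)
assert v_new.shape == (5,) and np.all(np.isfinite(v_new))
```

Note that, following Eq. 11, a single random rotation is shared between the personal and social components, rather than one per component as in Eq. 8.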

Note that this approach can be used in the velocity updating rule of other types of PSO as well, by simply replacing any Rndm with Rotm. Hereafter, any type of PSO (PSO*) that uses the approach of rotation matrices is called PSO*-Rotm, e.g., when the proposed approach is used in CoPSO we call the method CoPSO-Rotm.

To show that this approach can effectively address Problem 2 of the velocity updating rule identified in the previous section, we applied StdPSO2006-Rotm (with \(\sigma =20\), which was determined experimentally) to the function \(f\left( {x,y} \right) =x^{2}+y^{2}\). Also, we initialized all four particles through the same procedure described in the previous section. In this situation, the algorithm found the optimal solution in 99 % of 2,000 runs, which shows that the algorithm offers better direction diversity than StdPSO2006.

Now, let us return to the optimization problem given by Eq. 7. StdPSO2006-Rotm (with \(\sigma =20\)) was trapped in local optima in only 23 out of 100 runs (as opposed to 75 for the original version of StdPSO2006). Also, with \(\sigma = 28\), the algorithm failed in only 10 out of 100 runs (see Sect. 5.3 for details on setting the parameter \(\sigma )\).

Thus rotation matrices address all problems related to the velocity updating rule identified in the previous section; further, the introduction of rotation matrices makes PSO rotation invariant, as already reported in Wilke (2005).

In the following section, the new velocity updating rule based on Rotm(\(\sigma )\) rather than Rndm is tested on several well-studied PSO variants and the results are reported. These experiments show that the proposed approach improves the overall ability of the PSO to explore better solutions in the search space. The only parameter that has been added to the algorithm is \(\sigma \), and it is investigated in more detail. For this parameter, a sensitivity analysis is performed to set its value for different algorithms. Also, a simple adaptive method is proposed to make the algorithm less problem-dependent.

5 Experimental results

In the following four subsections we describe the experimental setup, discuss the benchmark functions selected for experimentation, provide a sensitivity study of the parameter \(\sigma \), and report on experimental results.

5.1 Experimental setup

All variants were implemented under the MatlabR2011a environment. Due to the stochastic nature of the PSO-based methods, the reported results for each algorithm are the averages over 100 runs. We analyze the effect of the parameter \(\sigma \) on the performance of different PSO-based methods. After that, in order to show the efficiency of the proposed rotation matrix approach, three sets of comparisons are considered.

  • In the first set of comparisons, the proposed approach (including Rotm and RotmP) is incorporated into CoPSO (i.e. CoPSO-Rotm and CoPSO-RotmP) and compared with two other methods: CoPSO-Rndm (Clerc and Kennedy 2002) and WPSO (Wilke 2005). The purpose of this test is to find which type of random matrix (Euclidean rotation in all planes, Euclidean rotation in one plane, uniform, or exponential map rotation) is more effective. Also, to have a fair comparison among these methods, in this test Rotm and RotmP are incorporated into CoPSO because WPSO is an extension of CoPSO. Moreover, the parameter \(\sigma \) for Rotm and RotmP is fixed to a single value, as this parameter is fixed in WPSO as well.

  • We continue the comparisons by incorporating Rotm into several well-known PSO algorithms, namely CoPSO, StdPSO2006, Decreasing-IW, Increasing-IW, Stochastic-IW, AIWPSO, and GCPSO (Shi and Eberhart 1998a, b; Eberhart and Shi 2001; Clerc and Kennedy 2002; Zheng et al. 2003; Pso 2006; Van den Bergh and Engelbrecht 2010; Nickabadi et al. 2011). They are then compared with their corresponding original versions. The purpose of this test is to show the effect of the usage of rotation matrices on well-known PSO algorithms. In this test, all parameters are kept the same as the original parameters for each type while the proposed approach is incorporated into these methods. These PSO algorithms were selected for comparison purposes because they provide quality results and they have been frequently used by other researchers in their experiments.

  • In the last set of comparisons, the proposed adaptive approach is tested when it is used to set the value of \(\sigma \) in StdPSO2006-Rotm and AIWPSO-Rotm. The aim of this test is to evaluate the proposed adaptive approach for the variable \(\sigma \).

These tests are conducted on several standard optimization functions that are introduced in Sect. 5.2. A performance metric called ABSQ (Average of Best Solution Quality) is reported for each test. ABSQ is the average of the objective values that an algorithm achieves in a predefined number of function evaluations (FE) over 100 runs. The maximum allowed number of FEs is fixed (10\(^{4}\) in all experiments, unless specified otherwise for a test), the algorithm is run, and the best found solution in this time frame is recorded as BSQ. This procedure is repeated 100 times and the average is reported as ABSQ. The parameters used for each method are listed in Table 1.

Table 1 The parameters used for the selected methods for comparison

Note that the same parameters were used in both versions of each method (PSO* and PSO*-Rotm); the number of particles for all methods is set to 20 in all tests. Also, velocity clamping was used to limit the value of each dimension of the velocity vector in all experiments (the value of \(V_{max} \) for each dimension \(i\) was set to \(\frac{\max \left( {\left| {l_i } \right| ,\left| {u_i } \right| } \right) }{2})\). Moreover, the objective value of a particle is not updated if the particle is in the infeasible region (out of the boundaries); this bound handling technique is known as the Infinity bound handling technique (Helwig et al. 2013). Note that although Infinity bound handling together with velocity clamping is frequently used in the literature, it is not the best possible choice for handling particles which are out of the boundaries; see Helwig et al. (2013) for a detailed analysis.
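These two mechanisms can be sketched as follows (ours; a minimal illustration of per-dimension velocity clamping and Infinity bound handling for minimization):

```python
import numpy as np

def clamp_velocity(v, l, u):
    """Per-dimension clamp with V_max,i = max(|l_i|, |u_i|) / 2."""
    vmax = np.maximum(np.abs(l), np.abs(u)) / 2.0
    return np.clip(v, -vmax, vmax)

def evaluate_with_infinity_bounds(f, x, l, u):
    """Infinity bound handling: a particle outside [l, u] keeps its previous
    objective value; returning +inf means 'never an improvement' when
    minimizing."""
    if np.any(x < l) or np.any(x > u):
        return np.inf
    return f(x)

l, u = np.full(3, -10.0), np.full(3, 10.0)
v = np.array([12.0, -3.0, 7.0])
assert np.array_equal(clamp_velocity(v, l, u), np.array([5.0, -3.0, 5.0]))

sphere = lambda x: float(np.sum(x ** 2))
assert evaluate_with_infinity_bounds(sphere, np.array([11.0, 0.0, 0.0]), l, u) == np.inf
assert evaluate_with_infinity_bounds(sphere, np.array([1.0, 0.0, 0.0]), l, u) == 1.0
```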

In some tests, the percentage of improvement (impairment) has been reported. This percentage is calculated by the following formula:

$$\begin{aligned} \frac{z^{*}-z}{z^{*}}*100 \end{aligned}$$
(12)

where \(z^{*}\) and \(z\) are the averages of ABSQ over a set of functions for the two algorithms under comparison, respectively, and \(z^{*}>z\).

5.2 Test functions

In order to compare the proposed approach with other methods, a set of benchmark optimization problems was used. This benchmark includes 9 well-studied functions selected from Yao et al. (1999), Tu and Lu (2004), and Suganthan et al. (2005), together with their rotated versions. The formulas for all functions can be found in “Appendix I”. Because some of these test functions are solvable by \(d\) one-dimensional searches, Salomon’s method (Salomon 1995) is applied to rotate them. This procedure is realized by generating an orthogonal matrix \(M\) that is multiplied by the input variables \(x\) to obtain the rotated variables \(y= M * x\). The new variables \(y\) are used to determine the objective value for particles. This type of rotation increases the complexity of the functions so that they are no longer solvable by \(d\) one-dimensional searches.
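One standard way to realize such a rotation is to draw a random orthogonal matrix, e.g. via the QR decomposition of a Gaussian matrix (the paper does not prescribe a particular construction, so this sketch is an assumption):

```python
import numpy as np

def random_orthogonal(d, rng):
    """Random orthogonal matrix via QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    # fix column signs so the distribution is uniform over orthogonal matrices
    return Q * np.sign(np.diag(R))

def rotated(f, M):
    """Rotated benchmark: evaluate f at y = M x."""
    return lambda x: f(M @ x)

rng = np.random.default_rng(4)
M = random_orthogonal(5, rng)
assert np.allclose(M @ M.T, np.eye(5))

# The Sphere function is rotation-invariant, so its rotated version agrees:
sphere = lambda x: float(np.sum(x ** 2))
x = rng.random(5)
assert np.isclose(rotated(sphere, M)(x), sphere(x))
```

Separable functions such as Rastrigin, by contrast, change character under this mapping, which is exactly why the rotated versions are harder for coordinate-wise search.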

Table 2 shows the bounds for all functions. In addition, the location for the best solution and the best objective value for all functions are reported in this table. \(f_{1-9}\) are the basis functions while \(f_{10-18}\) are their rotated versions (expressions for these functions can be found in “Appendix I”).

Table 2 Functions used for tests

The optimal solutions of most of these functions lie at the center of the coordinate system. To test the performance of the algorithms when the optimal solution is not at the center, we shift each function by a vector \(o\). This shift takes place only for the last set of tests in this paper. The vector \(o\) was generated randomly for each function separately, in such a way that the optimal solution remains within the search range; it was generated once per function and kept fixed for all tests.

5.3 Parameter setting

The standard deviation of the rotation angle in the proposed rotation matrix (Rotm) is the only parameter that has been added to the PSO algorithms. In this section, a method for finding the best value for this parameter is discussed.

The setting of parameters is a challenging issue whenever algorithms are compared. Parameters either need to be tuned for each algorithm, or the use of identical parameters needs to be justified. Because seven PSO variants are considered for extension by the proposed rotation matrices, the value of \(\sigma \) should be investigated for each of these algorithms independently. Also, several parameters, such as the population size, remain the same across algorithms in all tests. Furthermore, since the focus of the paper is not the comparison of algorithms but showing that different algorithms can benefit from the new rotation matrix, the individual parameter settings (apart from \(\sigma )\) for each algorithm matter less.

Also, in order to have a valid setting procedure, two standard functions, a unimodal (Sphere) and a multimodal (Griewank) one, were selected for this purpose. To set the parameter \(\sigma \) for each algorithm, we incorporated the proposed rotation matrix into the algorithm, fixed the number of dimensions, and applied the method to both functions (Griewank and Sphere) to find the best performance for various values of \(\sigma \).

Our investigations showed that the best values of \(\sigma \) lie in the interval [0, 7] for all of these methods. Hence, we varied the value of \(\sigma \) in this interval with a step size of 0.01 (701 steps overall). In each run, one of the algorithms was applied to each of the two functions 50 times (for a fixed number of dimensions, \(701\times 50\times 2\) tests were done). This process was repeated for 10, 20, 30, 40, 50, and 60 dimensional problems. In total, for each algorithm, \(701\times 50\times 2\times 6\) tests were run to find the best value of \(\sigma \). The results of the setting procedures for all methods are shown in Fig. 6. Note that because the best values for both functions were very close, the average of the objective values is considered to be the most likely determiner of the best value of \(\sigma \). Figure 6 presents this average of the best values of \(\sigma \) for seven PSO variants. It is apparent that the best value differs between methods and between numbers of dimensions. Hereafter, the values of \(\sigma \) used in all tests are the ones presented in Fig. 6.

Fig. 6
figure 6

Setting the parameter \(\sigma \) for seven various types of PSO

One reason why the best \(\sigma \) decreases as the number of dimensions grows may be related to the overall angle changes in higher dimensions. By increasing the number of dimensions, the number of planes in the space is increased. As the rotation of the velocity vector is applied to each plane independently (see Eq. 8), the overall rotation applied to this vector grows with the number of dimensions. Thus, a smaller rotation per plane is needed as the number of dimensions increases. Also, Fig. 6 indicates that the best values of \(\sigma \) differ between algorithms, especially for AIWPSO. Although the reason behind this phenomenon is not clear, our experiments show that, in each algorithm, the best value of \(\sigma \) is correlated with the improvement rate of the particles (ir), defined as the number of improved particles divided by the population size. In order to find the relation between \(\sigma \) and ir, we recorded the average of ir over 10,000 function evaluations for each algorithm (six PSO variants) when applied to \(f_{1-9}\) (Table 2) over 100 runs for each function. Figure 7 shows the results.

Fig. 7
figure 7

The average improvement rate (ir) of six PSOs when they are applied to 9 standard benchmark functions in Table 2. a 10 dimensional cases, b 30 dimensional cases


A simple comparison between the improvement rate ir of the methods (Fig. 7) and the corresponding \(\sigma \) (Fig. 6) indicates a positive correlation between these variables. For example, the algorithm AIWPSO, which has the largest \(\sigma \) in Fig. 6, has also the largest value for ir on average (Fig. 7).

Based on the correlation between \(\sigma \) and ir and \(d\), a simple adaptive approach (Eiben et al. 1999) is proposed for \(\sigma \) to make the algorithm independent of the problem and the number of dimensions. According to Figs. 6 and 7, we can assume that the value of \(\sigma \) depends on the number of dimensions \(d\) and the current improvement rate (\(ir_{t}\)). Therefore, we can write \(\sigma _{t}= h(ir_{t}, d)\), where \(h\) is a function that relates the value of \(\sigma \) at iteration \(t\) to the number of dimensions (\(d)\) as well as to ir at iteration \(t\).

As reported in Fig. 6, appropriate values for \(\sigma \) become smaller as the number of dimensions grows. Hence, it is safe to assume that there is a negative correlation between these variables. Also, the appropriate value of \(\sigma \) becomes bigger when the used algorithms have a higher average value for ir (Fig. 7). Obviously, function \(h\) can be defined in many ways to reflect the discovered correlations. In this paper, we defined \(h\) as follows:

$$\begin{aligned} h\left( {ir_t,d} \right) =\frac{\alpha *ir_t}{\sqrt{d}}+\beta \end{aligned}$$
(13)

In this formula, \(\alpha \) and \(\beta \) are two constants, which bound the value of \(\sigma \) to the interval \([\beta ,\frac{\alpha }{\sqrt{d}}+\beta ]\). This equation takes the correlations between \(\sigma \) and \(d\), as well as \(ir_{t}\), into account. The value of \(\beta \) prevents \(\sigma \) from becoming 0. Also, the square root of \(d\) in the denominator slows the rate at which \(\sigma \) decreases as \(d\) grows (as was observed in Fig. 6).

Our experiments showed that \(\alpha =30\) and \(\beta =0.01\) work well for our cases. This adaptive approach is added to some PSO variants and its results are reported.
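The adaptive rule of Eq. 13 with these constants can be sketched as:

```python
import math

def adaptive_sigma(ir_t, d, alpha=30.0, beta=0.01):
    """Eq. 13: sigma_t = alpha * ir_t / sqrt(d) + beta; since ir_t lies in
    [0, 1], sigma is bounded to [beta, alpha / sqrt(d) + beta]."""
    return alpha * ir_t / math.sqrt(d) + beta

# No particle improved -> sigma sits at its floor beta:
assert adaptive_sigma(0.0, 10) == 0.01
# All particles improved in a 10-dimensional problem -> sigma at its ceiling:
assert abs(adaptive_sigma(1.0, 10) - (30.0 / math.sqrt(10) + 0.01)) < 1e-12
```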

5.4 Results analysis

The algorithms are compared in terms of their ABSQ. To provide an accurate comparison among these algorithms, Wilcoxon’s test (Wilcoxon 1945) with a 5 % level of significance is conducted. The \(p\) value of the test is used to show the significance of the differences between results in each case. To make a fair comparison, all parameters/coefficients for all algorithms are the ones taken from their original versions (see Table 1), regardless of the random matrices they use.

In the first test, the proposed rotation matrices (both Rotm and RotmP) are compared with two other methods, CoPSO-Rndm and WPSO. Table 3 shows the results of applying these methods to eighteen 10-dimensional test functions. In order to have a fair comparison among these methods, Rotm and RotmP are incorporated into CoPSO and their results are compared with WPSO and CoPSO-Rndm (recall that WPSO is an extension of CoPSO). Moreover, the value of \(\sigma \) is kept constant (\(\sigma =2\)) as WPSO used a constant value for \(\sigma \). Also, the number of FEs was set to 200,000. Each algorithm has been compared with the others based on the Wilcoxon’s test. The result of an algorithm that is significantly better than the result of another algorithm is indicated by the superscript letters R, P, N, or W for CoPSO-Rotm, CoPSO-RotmP, CoPSO-Rndm, and WPSO, respectively.

Table 3 Results from comparison between CoPSO-Rotm (the CoPSO that uses the rotation matrices), CoPSO-RotmP, WPSO, CoPSO-Rndm

The performance of CoPSO-Rotm is significantly better than that of CoPSO-RotmP on 10 functions, while it is significantly worse on 8 functions. Thus, it seems slightly beneficial to conduct the rotation in all planes (Rotm) rather than in only one randomly chosen plane (RotmP). Hence, from here on, we experiment only with the Rotm approach to test whether it can improve the performance of the other PSO variants. Also, CoPSO-Rotm is significantly better than WPSO on 16 functions out of 18. This shows that the proposed Rotm offers better performance than the exponential map approach proposed in Wilke et al. (2007a, b). In addition, CoPSO-Rotm performs better than CoPSO-Rndm on 14 of the 18 functions. Thus, the proposed Rotm is more effective than the other tested methods for generating random rotation matrices for PSO variants. Note that, because the parameters of WPSO were taken from the literature and were not tuned for the benchmark set used here, this comparison between different algorithms has limited validity. To make the comparison as fair as possible, we set all shared parameters of both methods (rotation angle, acceleration coefficients, and inertia weight) to equal values, and the base methods are identical.

Several variants of PSO (Shi and Eberhart 1998a, b; Eberhart and Shi 2001; Clerc and Kennedy 2002; van den Bergh and Engelbrecht 2002; Zheng et al. 2003; Pso 2006; Nickabadi et al. 2011) were tested when they used random matrices (Rndm) or the proposed rotation matrices (Rotm(\(\sigma )\) with \(\sigma \) taken from Fig. 6) in their velocity updating rule (Eq. 11). Note that in each test the only change incorporated into each method was the replacement of the Rndm with Rotm while all other parameters (acceleration coefficients, inertia weight, population size, etc) have been left untouched.

Table 4 shows the results of applying StdPSO2006 to 18 benchmark problems (introduced in Table 2) in 10, 30, and 60 dimensions. The best results reported in Table 4 have been bolded. Those which were significantly better based on the Wilcoxon’s test (\(p <0.05\)) have also been marked by an asterisk. The StdPSO2006-Rotm performs significantly better in 14 out of 18 10-dimensional problems. Moreover, the algorithms perform almost the same for one problem (\(f_{9})\) and, for the 3 remaining problems (\(f_{4}\), \(f_{13}\), and \(f_{18})\), StdPSO2006 performs better. On the other hand, the performance of StdPSO2006-Rotm is significantly better than StdPSO2006 in all functions of 30 and 60 dimensions. This shows that the proposed rotation matrices are more effective for the problems with a larger number of dimensions.

Table 4 Results of applying StdPSO2006 and StdPSO2006-Rotm to 18 benchmarks with a different number of dimensions

Additionally, the average of solutions (average row in Table 4) is quite different for the rotated and non-rotated functions when the StdPSO2006 is used (36, 20 and 48 % differences for 10, 30, and 60 dimensional problems). However, these values are very close to each other when the StdPSO2006-Rotm is used (9, 0.6, and 3.7 % for 10, 30, and 60 dimensional problems, respectively). In fact, rotating the functions has a small effect on the performance of the StdPSO2006-Rotm whereas this effect is considerable when the StdPSO2006 is used. Finally, the averages of solutions for the StdPSO2006-Rotm over 10, 30, and 60 dimensional non-rotated (rotated) functions are 58 % (28 %), 97 % (96 %), and 99 % (99 %) better than that of the StdPSO2006, respectively. The optimization curves for some of these functions using StdPSO2006 and StdPSO2006-Rotm are available in “Appendix II”.

Table 5 reports the results of CoPSO when it uses random matrices versus rotation matrices in its velocity updating rule. It is clear that CoPSO-Rotm performs significantly better than CoPSO on 8 out of 18 functions in the 10 dimensional space. Also, for four functions (\(f_{2}, f_{7}, f_{11},\) and \(f_{18})\), the rotation matrices have no significant effect on CoPSO and both algorithms perform almost identically. CoPSO-Rotm performs worse than CoPSO in terms of ABSQ on six functions. For the 30 dimensional problems, CoPSO-Rotm is better on 13 functions and worse on only three of them.

Finally, in 60 dimensional problems, the effect of the proposed rotation matrix is more substantial and it makes CoPSO-Rotm significantly better than CoPSO in 16 functions and almost the same as CoPSO for the two remaining functions. In addition, the table demonstrates that the performance of CoPSO-Rotm is not greatly affected when the functions are rotated (4.6, 0.6, and 1.2 % for 10, 30, and 60 dimension problems, respectively). However, this is not the case for CoPSO and the performance of the algorithm is significantly affected when the functions are rotated (25.4, 29.2, and 43.8 % worse for 10, 30, and 60 dimensional problems). Although the average of solutions found by CoPSO-Rotm for all 10 dimensional problems was 3 % worse than that of CoPSO, it was 90 % better than that of CoPSO in the 30 dimensional problems and 85 % better for the 60 dimensional problems, thereby showing that the proposed random rotation has improved the overall performance of CoPSO. The convergence curves for some of these functions using CoPSO and CoPSO-Rotm are available in “Appendix II”.

Table 6 shows the results of applying the proposed random rotations (Rotm) to AIWPSO. It is clear that the proposed rotation matrix has improved the performance of the method on 16 functions in the 10 dimensional problems, while the performance was reduced in only one case. Also, for the 30 and 60 dimensional functions, the performance of the algorithm is significantly better on 13 functions. In AIWPSO-Rotm, the average of solutions is less affected when the function’s space is rotated (43 % affected for 10, 30, and 60 dimensional problems on average, in comparison to 75 % for AIWPSO). The average improvement over all the 10, 30, and 60 dimensional problems was 29.4, 95.5, and 75 %, respectively. The convergence curves for some of these functions using AIWPSO and AIWPSO-Rotm are available in “Appendix II”.

Table 6 Results of applying AIWPSO and AIWPSO-Rotm to 18 benchmarks with different number of dimensions

In the Decreasing-IW method, the proposed rotation matrix improved the performance of the algorithm by 17, 89.5, and 99 % for the 10, 30, and 60 dimensional problems, respectively (Table 7). In the Increasing-IW (Stochastic-IW) method, although the proposed rotation causes an 11 % (7.7 %) impairment for the 10 dimensional problems, it offers 92.8 % (90.6 %) and 98.8 % (92 %) improvement for the 30 and 60 dimensional problems, respectively. In GCPSO, the percentages of improvement for the 10, 30, and 60 dimensional problems were 14, 10, and 36 %, respectively.

Table 7 Results of comparing three algorithms decreasing-IW, increasing-IW, stochastic-IW, and GCPSO when they use random or rotation matrices

Results indicate that the proposed Euclidean rotation can improve the performance of different PSO methods. Figure 8 shows the convergence curve of several PSO variants when they use Rndm or Rotm to optimize the function \(f_{1}\) in 10D and 60D.

Figure 8 shows that if Rndm is used, the improvement of the objective value slows down towards the end of the optimization, which is not the case when Rotm is used in the same algorithm for the tested cases. It appears that, with Rndm, the swarm does not retain sufficient diversity towards the end of the optimization to exploit better solutions. This was expected: towards the end of the optimization, the particles are closer to each other and the velocity vectors become smaller, so the probability of PI and SI being close to one of the axes is higher, which, based on the second issue identified in Sect. 3, results in ineffective exploitation and impairs the performance of the algorithm. When Rotm is used, this issue does not arise and the particles remain diverse enough to find better solutions.

In the last test, the proposed adaptive approach for \(\sigma \) as well as the ability of the algorithm to deal with problems with shifted optima (i.e., the optimal point is not at the center of the coordinate system) is examined. In these test functions, the locations of the optimal solutions have been shifted by a vector \(o\) (see Sect. 5.1). Two PSO variants (StdPSO 2006 and AIWPSO) were run on these benchmarks using Rndm (their original version), Rotm with a static value of \(\sigma \), or Rotm with the adaptive approach. Each algorithm was applied to each test case 100 times, with 20 particles and 10,000 FEs per run. All listed problems are in 10-dimensional space. Table 8 shows the results.

Fig. 8
figure 8

Convergence curves for several PSO-based methods using Rotm or Rndm, applied to a Rosenbrock (\(f_{1}\)) in 10D, b Rosenbrock (\(f_{1}\)) in 60D. The convergence curves for some other functions can be found in “Appendix II”

Table 8 Comparative results between StdPSO2006-Rotm and AIWPSO-Rotm, when they use the adaptive approach or static value for \(\sigma \)

The results presented in Table 8 indicate that the proposed adaptive approach improved the performance of StdPSO2006-Rotm (compared to the static approach) in 13 of the 18 functions (in 7 of them the improvement was significant). For AIWPSO-Rotm, this improvement occurred in 10 functions (in four of them the improvement was significant). The results in the table also indicate that the proposed Rotm approach improved the ability of both methods to deal with problems whose global optimum is not located at the center of the coordinate system. In fact, StdPSO2006-Rotm with static \(\sigma \) performed significantly better than its original version in 14 of the 18 listed functions when the optimal solution is shifted away from the center of the coordinate system. Similarly, AIWPSO-Rotm performed significantly better (on average) than its original version in 16 of the 18 functions.

6 Conclusion and future work

The velocity updating rule of PSO (through its use of randomly generated diagonal matrices) can drastically change the direction and length of the generated vectors. This change differs from vector to vector and, although it can be calculated, it cannot be controlled. Additional side-effects of multiplying the personal and global best vectors by two random diagonal matrices were identified in this paper: delay in convergence and attraction towards inappropriate directions, dissimilar direction changes for different vectors, and limitation of the particles’ movement in the search space. To overcome these problems, the use of Euclidean rotation matrices with a random direction (generated by a normal distribution), rather than random diagonal matrices, was proposed. According to the presented results, the proposed approach addresses these problems effectively. Moreover, as a Euclidean rotation preserves the length of a vector and affects only its direction, it becomes easier to control the effect of the applied randomness on the direction of the vectors.

The parameters of the normal distribution for the rotation direction (i.e. its mean and variance \(\sigma \)) were analyzed in detail through a set of experiments. Both a tuning procedure and an adaptive approach (Eiben et al. 1999) were considered for setting the value of \(\sigma \). In the empirical analysis presented in this paper, the new approach was incorporated into seven variants of PSO, and their performance was evaluated on 18 standard benchmark functions. Experiments showed that the proposed approach significantly improved the performance of these PSO variants. Although designing a rotation-invariant PSO (one of the open challenges of the algorithm; Hansen et al. 2008) was not the aim of this study, the experiments suggest that the proposed approach helps make the algorithm rotation invariant, as theoretically investigated in Wilke (2005).
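A random rotation matrix of the kind described above can be sketched as a product of plane (Givens) rotations, one per coordinate plane, each with a small angle drawn from a normal distribution. This is a minimal illustration under our own assumptions, not the authors' exact implementation: the function name, the plane ordering, and \(\sigma = 0.1\) are choices made here for the example.

```python
import numpy as np

def random_rotation(d, sigma, rng):
    """Euclidean rotation matrix with a random direction: the product of
    Givens rotations over all d*(d-1)/2 coordinate planes, each plane
    rotated by a random angle drawn from N(0, sigma)."""
    R = np.eye(d)
    for i in range(d - 1):
        for j in range(i + 1, d):
            theta = rng.normal(0.0, sigma)
            c, s = np.cos(theta), np.sin(theta)
            G = np.eye(d)
            G[i, i] = c
            G[j, j] = c
            G[i, j] = -s
            G[j, i] = s
            R = G @ R
    return R

rng = np.random.default_rng(0)
R = random_rotation(4, 0.1, rng)
v = rng.normal(size=4)

# A rotation matrix is orthogonal, so vector length is preserved exactly --
# unlike a random diagonal matrix, which rescales each component.
print(np.allclose(np.linalg.norm(R @ v), np.linalg.norm(v)))  # True
```

Length preservation is the property emphasized above: the randomness affects only the direction of the personal and global best vectors, which is what makes its effect controllable through \(\sigma \).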
The presented approach (the use of random rotation matrices) is just one of many possible ways of dealing with the identified problems. For example, one possibility would be to perform all optimization calculations in hyper-spherical coordinates (rather than Cartesian ones), which would make the rotation calculations much easier. Also, the proposed rotation approach rotates the vectors in all possible planes, which is computationally expensive. An alternative approach was presented in this paper (called RotmP), in which only one of the planes is selected randomly and the rotation is applied to that plane alone. Although the results showed that rotation in all possible planes is slightly beneficial, there is still a tradeoff between computational expense and solution quality. Thus, it is worthwhile to investigate RotmP in more detail and to apply it to large-scale optimization problems. According to the experiments presented in this paper, one of the PSO variants (AIWPSO) appears to benefit more than the others from the use of rotation matrices. The reason behind this phenomenon is not clear; it might be related to the interactions of different components or parameters of the algorithm. Hence, it is worthwhile to investigate this effect on different methods and to explain why some methods are more compatible with rotation matrices than others. Moreover, it would be of great interest to evaluate the proposed approach in dynamic environments. We plan to apply the proposed velocity update equation to a set of dynamic test functions in a systematic way to understand the merits of the proposed approach in such environments.
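The single-plane alternative (RotmP) mentioned above can be sketched as follows. This is a hedged interpretation rather than the paper's exact code: the function name and the details of the plane selection are our assumptions; the key point is that rotating in one randomly chosen plane touches only two components of the vector, which is far cheaper than building and applying a full rotation matrix.

```python
import numpy as np

def rotmp_apply(v, sigma, rng):
    """Rotate vector v in one randomly selected coordinate plane (i, j)
    by an angle drawn from N(0, sigma). Only two components change, so
    the cost per vector is O(1) arithmetic instead of the O(d^2) cost
    of multiplying by a full d-by-d rotation matrix."""
    d = len(v)
    i, j = rng.choice(d, size=2, replace=False)
    theta = rng.normal(0.0, sigma)
    c, s = np.cos(theta), np.sin(theta)
    out = v.copy()
    out[i] = c * v[i] - s * v[j]
    out[j] = s * v[i] + c * v[j]
    return out

rng = np.random.default_rng(1)
v = rng.normal(size=10)
w = rotmp_apply(v, 0.3, rng)

# A plane rotation is still a rotation: the length is preserved.
print(np.allclose(np.linalg.norm(w), np.linalg.norm(v)))  # True
```

This cost difference is the computational side of the tradeoff noted above between RotmP and rotation in all possible planes.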