1 Introduction

There are plenty of challenging real-world applications in the sciences where optimization is naturally involved, and sophisticated minimization techniques are necessary in order to allocate resources. In particular, these tough scientific problems often involve a remarkably large computational cost, in terms of both computing time and machine resources.

Up to 15–20 years ago, the main interest of theoreticians in optimization was, to a great extent, in methods based on the use of derivatives. This was basically due to the following three strong reasons:

  • in several cases derivatives are available when solving computational problems. In particular, they are always ‘analytically’ available if the nonlinear functions involved are known in closed form (see for instance the work by Griewank in 2000 [14]), and they can be exactly computed (not simply approximated) at reasonable cost in small- and medium-scale problems [12, 22];

  • strong theoretical results have been developed, both in terms of convergence and computational performance, for optimization methods where derivatives (say of first/second order) are available;

  • the availability of machine resources at cheaper cost has allowed the solution of problems where derivatives can be suitably approximated by finite differences, using either coarse or fine techniques.

On the other hand, engineering design offers a huge number of real-world problems where scientists are continuously asked to apply robust methods, using the most recent theoretical advances. In particular, design problems often include functions which are not differentiable or where the use of derivatives is possibly discouraged. The following issues motivate the latter statement and give more precise guidelines for analyzing and improving optimization procedures not involving derivatives.

  • For large scale problems, computing derivatives by finite differences might be prohibitively costly, and also Automatic Differentiation [14] might be of difficult application. Furthermore, the computation of derivatives by finite differences proves to be increasingly inaccurate and costly as the scale of the problem increases.

  • Most of the codes for complex design problems are parameter dependent, and the parameters need to be ‘properly’ assessed. Their correct choice in practice implies that the overall performance of the code needs to be optimized with respect to those parameters. Thus, an implicit optimization problem with respect to these parameters requires a solution, and the derivatives of the functions involved are surely unavailable, since the output of a code is in general nondifferentiable.

  • Most of the design problems need solution procedures where expensive simulations are performed. Typically, simulations are affected by noise, systematic errors arise and stochastic parameters are used, so that derivatives are essentially unavailable, or their use may completely destroy the robustness of the procedures.

The issues above motivate the use of efficient and effective derivative-free methods, in order to solve a wide range of challenging problems.

In this chapter we focus on a modification of the PSO algorithm (originally proposed by Kennedy and Eberhart in 1995 [18]), for the solution of the unconstrained global optimization problem

$$\mathop {\hbox{min} }\limits_{{x \in {\rm I\!R}^{\text{n}} }} f(x),\qquad f:{\rm I\!R}^{\text{n}} \to {\rm I\!R}.$$
(1)

Here f(x) is assumed to be a continuous, nonlinear, non-convex and computationally expensive function. Observe that in (1) we aim at using a modified PSO algorithm in order to detect a global minimum of f(x), i.e. a point \(x^{*} \in {\rm I\!R}^{\text{n}}\) such that \(f(x^{*}) \le f(x)\), for any \(x \in {\rm I\!R}^{\text{n}}\).

The reason why we focus on a modified PSO in order to tackle (1) is that when the function f(x) is computationally costly, exact methods may be definitely too expensive to solve (1). Moreover, some exact methods may be unable to provide even a satisfactory approximation of a solution. In the latter cases the use of heuristic approaches may be fruitful, in particular when the computational resources and/or the time allowed for the computation are severely limited, and asymptotically convergent procedures are unaffordable. On the basis of the latter observations, PSO has proved to be both effective and efficient on several practical applications [8, 23, 24], so that it is often the heuristic of choice.

Recalling the above considerations, in the framework of derivative-free optimization, we think that combining heuristic procedures and exact methods could be advantageous, provided that:

  1. the overall hybridized scheme is efficient, i.e. it is possibly not too expensive. A legitimate expectation is that the overall computational cost of the combined scheme is in-between the cost of the (uncombined) PSO and the cost of the exact method;

  2. the results provided by the combined procedure are endowed with some theoretical properties, which are guaranteed by an effective combination of PSO and the exact method. Typical theoretical properties characterize both the convergence of sequences of points, and the stationarity of the limit points of the sequences generated by the hybridized scheme.

Thus, we focus here on some modifications of PSO, where converging subsequences of iterates are generated. As a consequence, in the next section we provide clear conditions under which the trajectories of PSO particles can be controlled. On the other hand, the modifications we propose for PSO guarantee that the generated sequences of iterates have subsequences converging to stationary points of the objective function (see also [15, 16, 29, 30]). In particular, since the literature offers theoretical results for several exact derivative-free methods [7, 19], we decided to combine PSO with a line search-based derivative-free algorithm, which is to our knowledge still an unexplored issue, apart from the analysis by Campana et al. in 2009 [1]. We also consider here a numerical experience on a simplified method proposed by Campana et al. in 2009 [1], where the choice of the search directions is particularly ‘intuitive’, and preserves some relevant theoretical results.

Observe that the aim of this chapter is to provide robust methods with a twofold purpose. First, we would like to exploit the capability of PSO to provide a satisfactory approximation to a global solution within a few iterations. Then, by combining PSO with an exact method, we want to force the convergence of subsequences of points toward a stationary point, which satisfies first order optimality conditions for f(x). This chapter is specifically concerned both with reporting some theoretical results and with performing a valid numerical experience, to validate our theoretical conclusions.

As regards the symbols we adopt in this chapter, subscripts are used to identify the particles in a PSO scheme, whilst the superscripts indicate the iteration. By I we denote the identity matrix, and \(\| \cdot \|\) represents the Euclidean norm of a vector/matrix. Finally, B(c, r) is the ball with center c and radius r > 0, i.e. \(B(c,r) = \{ y \in {\rm I\!R}^{\text{n}} :\left\| y - c \right\| \le r\}\). All the other symbols follow a very standard notation.

In Sects. 2 and 3 we discuss some issues related to the stability of the PSO iteration. Then, Sects. 4 and 5 introduce both the theory and the motivations for our modification of the PSO iteration. In Sect. 6 we describe our proposal and we carry out the related convergence analysis. Section 7 is then devoted to reporting a numerical experience on a test case and real problems from ship design. Finally, Sect. 8 contains some conclusions.

2 Stable and Unstable Trajectories for PSO

The strategy of PSO for solving (1) is that of generating the P sequences \(\{ x_j^k \}\), j = 1,…,P, of iterates in \({\rm I\!R}^{\text{n}}\), each associated with the j-th particle of the swarm. At any iteration k, the particles share information on the point \(p_{g}^{k}\) satisfying the condition

$$f(p_{g}^{k} ) \le\;f(x_{j}^{h} ),\qquad \forall h \le k,\;\;\forall j \in \{ 1, \ldots ,P\} .$$

For our purposes we preliminarily refer to the following PSO iteration, for any k ≥ 0:

$$\begin{aligned} v_{j}^{k + 1} & = \chi_{j} \left[ {w_{j}^{k} v_{j}^{k} + c_{j} r_{j} \otimes (p_{j}^{k} - x_{j}^{k} ) + c_{g} r_{g} \otimes (p_{g}^{k} - x_{j}^{k} )} \right], \\ x_{j}^{k + 1} & = x_{j}^{k} + v_{j}^{k + 1} , \\ \end{aligned}$$
(2)

where j = 1,…,P represents the j-th particle (i.e. the j-th sequence of iterates), P is finite, while \(v_{j}^{k}\) and \(x_{j}^{k}\) are real n-vectors, which respectively represent the speed (i.e. the search direction) and the position of the j-th particle at step k. The real bounded coefficients c j and c g are typically given at the outset of iteration k = 0, and are possibly not modified unless stagnation arises. On the other hand, with \(r_{j} \otimes (p_{j}^{k} - x_{j}^{k} )\) (similarly with \(r_{g} \otimes (p_{g}^{k} - x_{j}^{k} )\)) we indicate that every entry of the vector \((p_{j}^{k} - x_{j}^{k} )\) is multiplied by a different value of r j , which is a random parameter drawn from the uniform distribution on [0, 1]. Finally, for a given k ≥ 0, the real n-vectors \(\{ p_{j}^{k} \}\) satisfy the conditions

$$\begin{array}{*{20}l} {f(p_{j}^{k} ) \le\;f(x_{j}^{\ell } ),\qquad \forall \ell \le k,\,\,p_{j}^{k} \in \{ x_{j}^{\ell } \} ,} \hfill \\ \end{array}$$
(3)

moreover, χ j (the constriction coefficient) and \(w_{j}^{k}\) (the inertia) are positive bounded coefficients. In words, the vector \(p_{j}^{k}\) represents the ‘best position’ in the j-th subsequence up to iteration k, while \(p_{g}^{k}\) is the ‘best position’ among all the vectors \(\{ p_{1}^{k} , \ldots ,p_{P}^{k} \}\). A keynote issue in PSO is that an effective choice of the coefficients χ, w k, c j and c g is often problem dependent, whereas several standard settings for them have been proposed in the literature (see [25]). Notwithstanding the latter fact, more precise rules for assessing the coefficients in (2) are still sought, in particular with the aim of avoiding stagnation of the PSO iteration.
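To fix ideas, the following minimal sketch of ours implements one synchronous update of the whole swarm according to (2). The coefficient values are only placeholders borrowed from the settings later adopted in Sect. 7; note, however, that here the random vectors r j , r g are kept, whereas Sect. 7 adopts a deterministic variant where the products \(cr\), \(\bar{c}\bar{r}\) are fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(x, v, p, p_g, chi=0.721, w=1.0, c_j=1.655, c_g=1.655):
    """One synchronous PSO iteration (2) for the whole swarm.

    x, v : (P, n) arrays with the positions and speeds of the P particles;
    p    : (P, n) array with the personal best positions p_j^k;
    p_g  : (n,) array with the global best position p_g^k.
    """
    P, n = x.shape
    r_j = rng.uniform(size=(P, n))    # componentwise random weights, as in (2)
    r_g = rng.uniform(size=(P, n))
    v_new = chi * (w * v
                   + c_j * r_j * (p - x)       # attraction to personal bests
                   + c_g * r_g * (p_g - x))    # attraction to the global best
    return x + v_new, v_new
```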

In order to possibly generalize the recurrence (2), we can assume that the speed \(v_{j}^{k + 1}\) depends on all the \(P\) vectors \((p_{h}^{k} - x_{j}^{k} )\) (see also [21]), h = 1,…,P, and not only on the pair of vectors \((p_{j}^{k} - x_{j}^{k} )\), \((p_{g}^{k} - x_{j}^{k} )\). The resulting new iteration represents the so-called Fully Informed PSO (FIPSO). The latter generalization is possibly inessential for our purposes, so that hereafter we limit our analysis to the more standard iteration (2).

Observe that in order to give rules which ensure that PSO trajectories satisfy suitable conditions, we need to impose some restrictions on the coefficients in (2). In particular, after reviewing the literature we remark that the following (not mutually exclusive) conditions can be reasonably expected to hold for the particle trajectories in PSO:

  (1) the sequence \(\{ x_{j}^{k} \}\) converges to \(x_{j}^{ * }\), for any j = 1,…,P, with \(x_{1}^{ * } = \cdots = x_{P}^{ * }\);

  (2) the sequence \(\{ x_{j}^{k} \}\) converges to \(x_{j}^{ * }\), for any j = 1,…,P, but possibly \(x_{j}^{ * } \ne x_{\ell }^{ * }\), with \(1 \le\;j \neq \ell \le P\);

  (3) the sequence \(\{ x_{j}^{k} \}\) is not diverging for any j = 1,…,P, i.e. \(\limsup_{k \rightarrow \infty}\|x_j^k \| < + \infty\), for any j = 1,…,P.

We highlight that different bounds can be imposed on the coefficients χ, w k, c j and c g in (2), in order to ensure that any of the three conditions (1)–(3) is fulfilled. It is also not difficult to realize that (1) implies (2) (but not vice versa) and (2) implies (3) (but not vice versa). Thus, the conditions on the coefficients of PSO ensuring (3) are expected to be both weak enough and sufficiently general to allow a wide exploration of the search space. For the latter reason, in this chapter we prefer to study and analyze case (3), while the interested reader can refer to the PSO literature (see also Sect. 4) for the analysis of cases (1)–(2). Now, with reference to (2), let us preliminarily consider the following assumption, in order to simplify our notation.

Assumption 1

We assume in (2) that \(\chi_j = \chi > 0\), \(c_j = c > 0\) and \(r_j = r > 0\), for any j = 1,…,P. Moreover, \(c_{g} = \bar{c} > 0\), \(r_{g} = \bar{r} > 0\) and \(w_{j}^{k} = w\), for any j = 1,…,P and any k ≥ 0.

Then (see also [2]), using Assumption 1 the iteration (2) is equivalent to the discrete stationary (time-invariant) system

$$X_{j} (k + 1) = \left( {\begin{array}{*{20}c} {\chi wI} & { - \chi (cr + \bar{c}\bar{r})I} \\ \ \\ {\chi wI} & {\left[ {1 - \chi (cr + \bar{c}\bar{r})} \right]I} \\ \end{array} } \right)X_{j} (k) + \left( {\begin{array}{*{20}c} {\chi (crp_{j}^{k} + \bar{c}\bar{r}p_{g}^{k} )} \\ \ \\{\chi (crp_{j}^{k} + \bar{c}\bar{r}p_{g}^{k} )} \\ \end{array} } \right),$$
(4)

where

$$X_{j} (k) = \left( {\begin{array}{*{20}c} {v_{j}^{k} } \\ {x_{j}^{k} } \\ \end{array} } \right) \in {{\rm I\!R}}^{{2{\text{n}}}} ,\qquad {k} \ge 0.$$
(5)

For a given j, the vectors {X j (k)} identify a sequence of points in \({{\rm I\!R}}^{2\text{n}}\) and indeed represent the trajectory of the j-th particle in the state space \({{\rm I\!R}}^{{2{\text{n}}}}\). By definition, since X j (k) represents a state vector, it can be split into the so-called free response \(X_{j\fancyscript{L}} (k)\) and the forced response \(X_{j\fancyscript{F}} (k)\) (see also [27]), such that

$$X_{j} (k) = X_{j\fancyscript{L}} (k) + X_{j\fancyscript{F}} (k),$$
(6)

being

$$X_{j\fancyscript{L}} (k) = \varPhi_{j} (k)X_{j} (0),\qquad X_{j\fancyscript{F}} (k) = \sum\limits_{\tau = 0}^{k - 1} H_{j}(k - \tau )U_{j} (\tau ),$$
(7)

and (with a little computation)

$$\varPhi_{j} (k) = \left( {\begin{array}{*{20}c} {\chi wI} & { -\chi (cr + \bar{c}\bar{r})I} \\ \ \\{\chi wI} & {\left[ {1 - \chi(cr + \bar{c}\bar{r})} \right]I} \\ \end{array} } \right)^{k} ,$$
(8)
$$H_{j} (k - \tau ) = \left( {\begin{array}{*{20}c} {\chi wI} & { - \chi (cr + \bar{c}\bar{r})I} \\ \ \\ {\chi wI} & {\left[ {1 - \chi (cr + \bar{c}\bar{r})} \right]I} \\ \end{array} } \right)^{k - \tau - 1} ,$$
(9)
$$U_{j} (\tau ) = \left( {\begin{array}{*{20}c} {\chi (crp_{j}^{\tau } + \bar{c}\bar{r}p_{g}^{\tau } )} \\ \ \\ {\chi (crp_{j}^{\tau } + \bar{c}\bar{r}p_{g}^{\tau } )} \\ \end{array} } \right).$$
(10)

We highlight the important fact that the free response \(X_{j\fancyscript{L}} (k)\) in (6)–(7) only depends on the initial point X j (0), and is not affected by changes of the vectors \(p_{j}^{\tau }\), \(p_{g}^{\tau }\), τ ≥ 0. The latter observation is of great interest for assessing rules for the parameters χ, w, c, \(r\), \(\bar{c}\) and \(\bar{r}\), as the next section shows.
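As a small numerical illustration of (6)–(11) (entirely ours), the sketch below builds the state matrix of (4) for n = 1, using the shorthand a = χw and ω = χ(cr + \(\bar{c}\bar{r}\)) introduced in (12) below, and monitors the norm of the free response; the coefficient values are those adopted later in Sect. 7.

```python
import numpy as np

# State matrix of (4) for n = 1 (the identity blocks reduce to scalars),
# with a = chi*w and omega = chi*(c*r + cbar*rbar); values from Sect. 7.
a, omega = 0.721 * 1.0, 0.721 * (1.655 + 1.655)
A = np.array([[a, -omega],
              [a, 1.0 - omega]])

def free_response(X0, k):
    # X_{jL}(k) = Phi_j(k) X_j(0), with Phi_j(k) = A**k as in (7)-(8)
    return np.linalg.matrix_power(A, k) @ X0

X0 = np.array([0.0, 5.0])   # initial speed and position of one particle
for k in (1, 10, 50, 100):
    print(k, np.linalg.norm(free_response(X0, k)))
# Under condition (14) of Lemma 1 below, the printed norms vanish, as in (11).
```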

3 Issues on Assessing Parameters in PSO

As described in the previous section, the fruitful choice of the parameters in PSO is to a large extent guided by two issues: the efficiency of the overall scheme and the necessity of guaranteeing at least non-diverging trajectories of the particles. As regards the first issue, the literature on PSO provides several suggestions which have proved to be effective in most cases (see for instance [4, 26]). Conversely, some more recent papers, concerned with studying the stability of the particle trajectories, have detailed some restrictions on PSO parameters in order to dynamically control the trajectory of the particles, and make it more accurately predictable. On this guideline, the papers by Kadirkamanathan et al. in 2006 [17] and Gazi in 2012 [12] are advisable references, since they contain clear indications on the latter issue.

Here we want to propose a unified approach for parameter assessment in PSO, using the reformulation (4)–(5). In practice (see also [27]), as long as Assumption 1 holds, our perspective is that of using classic analysis for discrete linear systems in order to deduce bounds on the PSO parameters. On the other hand, we want to carry out our analysis as rigorously as possible, so that we will separately develop formal conditions for the following three purposes:

  (a) define necessary conditions (possibly not sufficient) on PSO parameters, so that the particle trajectories are not diverging, i.e. the quantities \({\parallel }X_{j} (k){\parallel }\) are bounded, for any j = 1,…,P and any k ≥ 0;

  (b) ensure no stagnation (see [13]) for a modified PSO scheme;

  (c) possibly introduce simple modifications to PSO, so that the resulting scheme is globally convergent to stationary points, i.e. for any choice of the initial positions \(x_{1}^{0} , \ldots ,x_{P}^{0}\) of the particles, the scheme generates sequences of points like {y k}, such that

    $$\mathop {\lim \,{\rm inf}}\limits_{k \to \infty } \ \left\| \nabla f(y^{k} ) \right\| = 0.$$

Observe that the latter condition essentially guarantees that possibly ‘true’ minima for f(x) are eventually reached, even in case stagnation arises. We strongly remark that though we used the symbol \({\nabla }f(x)\), the latter relation will be proved without using any information on \({\nabla }f(x)\), so that a completely derivative-free method will be developed.

In the current section we consider the issue (a), while in the next section issues (b) and (c) will be analyzed in detail.

As is well known, for a discrete linear system like (4)–(5) (see for instance [27]), if the j-th trajectory {X j (k)} in (6) is non-diverging, then

$$\mathop {\lim }\limits_{k \to \infty } X_{j} (k) = \mathop {\lim }\limits_{k \to \infty } X_{j\fancyscript{F}} (k),\qquad j = 1, \ldots ,P.$$

In other words, the free response \(X_{j\fancyscript{L}} (k)\) can be bounded away from zero only for finitely many values of the index k, and

$$\mathop {\lim }\limits_{k \to \infty } X_{j\fancyscript{L}} (k) = 0.$$
(11)

This introduces a possible rule to address bounds for the PSO parameters, since relation (11) imposes some restrictions on the eigenvalues of the matrix \(\varPhi_{j} (k)\) in (8). Observing that Φ j (k) = Φ j (1)k, for any k ≥ 0, it suffices to study the 2n eigenvalues of the unsymmetric matrix \(\varPhi_{j} (1)\). In particular, by setting for the sake of simplicity in (8)

$$a = \chi w,\qquad \omega = \chi (cr + \bar{c}\bar{r}),$$
(12)

after some computation we can prove that the matrix \(\varPhi_{j} (1)\) has the two distinct eigenvalues λ j1 and λ j2 given by

$$\begin{aligned} \lambda_{j1} & = \frac{{1 - \omega + a - \left[ {(1 - \omega + a)^{2} - 4a} \right]^{1/2} }}{2}, \\ \lambda_{j2} & = \frac{{1 - \omega + a + \left[ {(1 - \omega + a)^{2} - 4a} \right]^{1/2} }}{2}, \\ \end{aligned}$$
(13)

each of them with algebraic multiplicity n (note that λ j1 and λ j2 are real or complex conjugate, depending on the sign of the discriminant \((1 - \omega + a)^2 - 4a\)). Thus, a necessary (but in general not sufficient) condition for the j-th trajectory {X j (k)} to be non-diverging and satisfy \(\mathop {\lim }\nolimits_{k \to \infty } X_{j\fancyscript{L}} (k) = 0\) is given in the following lemma, which yields the required conditions on the coefficients of the PSO iteration.

Lemma 1

Consider the PSO iteration (2). Let Assumption 1 hold. Let the eigenvalues λ j1 and λ j2 in (13) satisfy the conditions

$$|\lambda_{j1} | < 1, \qquad |\lambda_{j2} | < 1,$$
(14)

for any j = 1,…,P. Then, the sequence \(\{ X_{j\fancyscript{L}} (k)\}\) satisfies \(\mathop {\lim }\nolimits_{k \to \infty } X_{{j\fancyscript{L}}} (k) = 0\), and condition (14) is a necessary condition for the trajectory {X j (k)} to be non-diverging.

Note that most of the typical settings for PSO parameters proposed in the literature (see e.g. [4, 5, 26]) satisfy condition (14). Moreover, Lemma 1 does not guarantee that the sequence {X j (k)}, for a given j, is converging, and indeed it possibly does not even admit limit points. This means that condition (14) only provides a result for item (a), but is definitely inadequate to treat items (b) and (c). This also implies that, for instance, if (14) holds, then possibly the trajectory \(\{ X_{j} (k)\}\) is not converging, or some of its subsequences converge to a point \(x_{j}^{ * }\) which is not a minimum and does not satisfy the property

$$f(x_{j}^{ * } ) \le\;f(x),\qquad \forall x \in B(x_{j}^{ * } ,\varepsilon ),\,\,\varepsilon > 0.$$
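As a quick check of Lemma 1 (an illustration of ours), the next sketch evaluates the eigenvalues (13) for the deterministic coefficient values adopted in Sect. 7; with these values the discriminant is negative, the eigenvalues are complex conjugate with modulus \(\sqrt{a}\), and condition (14) is satisfied.

```python
import numpy as np

# Deterministic coefficients from Sect. 7: chi = 0.721, w = 1,
# c*r = cbar*rbar = 1.655 (the products are fixed, no randomness).
chi, w, cr, cbr = 0.721, 1.0, 1.655, 1.655
a, omega = chi * w, chi * (cr + cbr)          # shorthand (12)

disc = (1 - omega + a) ** 2 - 4 * a           # discriminant in (13)
lam1 = (1 - omega + a - np.emath.sqrt(disc)) / 2
lam2 = (1 - omega + a + np.emath.sqrt(disc)) / 2
print(abs(lam1), abs(lam2))                   # ~0.849 = sqrt(a) < 1: (14) holds
```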

In the next sections we focus on a rigorous analysis of this last issue, i.e. trajectories possibly converging to points which are not even stationary. That is, under mild assumptions we propose a modified PSO scheme such that, if the function f(x) is continuously differentiable, the sequence \(\{ x_{1}^{1} , \ldots ,x_{P}^{1} , \ldots ,x_{1}^{k} , \ldots ,x_{P}^{k} \}\) admits stationary limit points for f(x), so that

$$\mathop {\lim \,{\rm inf}}\limits_{k \to \infty } \ \left\| {{\nabla }f(x_{j}^{k} )} \right\| = 0\qquad {\text{or}}\qquad \mathop {\lim }\limits_{k \to \infty } \left\| {{\nabla }f(x_{j}^{k} )} \right\| = 0.$$
(15)

As the reader can expect, there is theoretical and computational evidence that condition (15) may be fulfilled (in several real applications) at the expense of a reasonably larger computational cost, with respect to the standard PSO iteration (2).

4 PSO and Stationarity

In Sect. 1 we have detailed some motivations to explain why, in the last two decades, design optimization and simulation-based optimization have required effective and robust derivative-free optimization methods. Also exploiting the recent advances in parallel computing, the last two decades have in particular seen the rise of a remarkably effective class of optimization methods, endowed with a complete convergence analysis and competitive performance: namely, direct search methods. The latter class (see [19]) counts several optimization methods, which do not use derivatives but basically rely on “the ranks of a countable set of function values” [19], i.e. on comparing the objective function values at specific points of the search space.

Among direct search methods we focus here on a subclass of iterative techniques, usually referred to in the literature as Generating Set Search (GSS). In the latter class, the main idea is that of decreasing the objective function at each iteration, on a cone in \({{\rm I\!R}}^{\text{n}}\) generated by suitable search directions. Pattern search methods are in the GSS class, and have the distinguishing feature of enforcing, at each iteration, a simple decrease of the objective function. Line search-based derivative-free methods are also iterative schemes in the GSS class; however, they impose at each iteration a so-called sufficient reduction of f(x). We want to show that global convergence properties of a modified PSO scheme may be obtained by properly combining PSO with a line search-based derivative-free method, so that convergence to stationary points can be forced at a reasonable cost (see also items (b) and (c) of Sect. 3). On this guideline, there are plenty of examples where evolutionary strategies are combined with GSS schemes and yield globally convergent algorithms (see for instance [15, 31, 32]). In particular, in the last reference PSO is hybridized within a pattern search framework, and a resulting method converging to stationary points is given.

Observe that in the literature on derivative-free methods we can also find PSO-based approaches combined with a trust-region framework (see [31, 32]), which again provide methods globally convergent to stationary points.

In this section we consider the solution of problem (1), and we focus on a modified PSO scheme, combined with a line search-based derivative-free algorithm. We study in particular the nature of the limit points of the sequences \(\{ x_{j}^{k} \}\), j = 1,…,P, when Assumption 1 holds. However, we think that, to give a better insight into our analysis, the following very preliminary results (see also [7]) can help the reader grasp the importance of the GSS class, in order to ensure convergence to stationary points.

Definition 1

Given the set of vectors D = {d 1,…,d m } of \({{\rm I\!R}}^{\text{n}}\), we say that D is a Positively Spanning Set (PSS) if for any vector \(u \in {{\rm I\!R}}^{\text{n}}\) we have

$$u = \sum\limits_{i = 1}^{m} \alpha_{i} d_{i} ,\quad \alpha_{i} \ge 0,$$

i.e. any vector u of \({{\rm I\!R}}^{\text{n}}\) can be expressed as the weighted sum of the vectors in D, using nonnegative weights.

Thus, a PSS substantially provides a set of vectors which positively span the space \({{\rm I\!R}}^{\text{n}}\). It can be easily proved that if D is a PSS of \({{\rm I\!R}}^{\text{n}}\), then its cardinality must be at least n + 1. It is very easy to define PSSs; simple examples of them in \({{\rm I\!R}}^{2}\) are given in Fig. 1, where m = 4 (top and bottom) and m = 3 (middle).

Fig. 1

Examples of PSSs in \({\rm I\!R}^{2}\). The subscript ‘⊕’ in the uppermost PSS means that the vectors in the set are the coordinate unit vectors ± e i , i = 1,…,n

In addition, there is the following nice property of PSSs that we are going to exploit in our proposal. If the point \(x \in {{\rm I\!R}}^{\text{n}}\) is not stationary for f in (1) (i.e. \({{\nabla} }f(x) \ne 0\)), given the PSS D in \({{\rm I\!R}}^{\text{n}}\), there exists at least one vector, say \(\hat{d} \in D\), such that \({\nabla }f(x)^{T} \hat{d} < 0\), meaning that the direction \(\hat{d}\) is of descent for f(x) at x. The latter fact ensures that if the current point is not stationary, and a PSS is available, roughly speaking there is at least one direction of descent for f(x) in the PSS.

A so-called cosine measure cm(D) can be associated with the PSS D of \({{\rm I\!R}}^{\text{n}}\), defined as follows.

Definition 2

Given the PSS D = {d 1,…,d m } in \({{\rm I\!R}}^{\text{n}}\), we define the cosine measure cm(D) of D as

$$cm(D) = \mathop {\hbox{min} }\limits_{{v \in {{\rm I\!R}}^{\text{n}} \backslash \{ 0\} }} \mathop {\hbox{max} }\limits_{{d_{i} \in D}} \left\{ {\frac{{v^{T} d_{i} }}{{{\parallel }v{\parallel \parallel }d_{i} {\parallel }}}} \right\},$$

where cm(D) > 0 always holds.

By Definition 2 the quantity cm(D) clearly represents a measure of the least orthogonal projection of any vector in \({{\rm I\!R}}^{\text{n}}\) on the vectors in D. As a consequence, if cm(D) → 0 then D might not be a ‘good’ PSS, and consequently it might be difficult to find a descent direction for f in D. The following result clarifies the importance of introducing PSSs in derivative-free optimization, in order to characterize stationary points, without using any information on the gradient of f in (1).
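The cosine measure can also be estimated numerically; the sketch below (ours, for illustration only) approximates cm(D) from above by sampling the outer minimization over v in Definition 2, and recovers the known value \(1/\sqrt{n}\) for the coordinate PSS D⊕ of Fig. 1 with n = 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def cm_estimate(D, samples=200_000):
    """Monte Carlo estimate (an upper bound) of cm(D) in Definition 2.

    D is an (m, n) array whose rows are the PSS directions; replacing the
    minimization over all v by a minimization over randomly sampled unit
    vectors can only overestimate the exact minimum."""
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    V = rng.normal(size=(samples, D.shape[1]))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    return (V @ Dn.T).max(axis=1).min()

# Coordinate PSS of R^2, i.e. D = {e1, -e1, e2, -e2}: cm(D) = 1/sqrt(2)
D = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(cm_estimate(D))   # ~0.7071
```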

Theorem 1

Let D = {d 1,…,d m } be a PSS. Suppose the function f(x) in (1) is continuously differentiable in \({{\rm I\!R}}^{\text{n}}\) and the gradient \({\nabla }f(x)\) satisfies the Lipschitz condition

$$\left\| \nabla f(y) - {\nabla }f(x) \right\| \le \nu {\parallel }y - x{\parallel },\qquad \forall y \in \mathop B\limits^{ \circ } (x,\alpha \delta ),\quad \delta = \mathop {\hbox{max} }\limits_{1 \le i \le m} {\parallel }d_{i} {\parallel },$$

for some ν > 0 and α > 0. If \(f(x) \le f(x + \alpha d_{i})\), i = 1,…,m, then the following bound holds for the gradient of f(x)

$${\| \nabla }f(x){\| } \le \frac{\nu }{2}\frac{1}{cm(D)}\alpha \delta .$$
(16)

Proof

Since D is a PSS, there exists at least one vector in the set \(\{ d_{i} \}\), say \(\hat{d} \in D\), such that

$$cm(D) \le \frac{{ - \nabla f(x)^{T} \hat{d}}}{{{\parallel }\nabla f(x){\parallel \parallel }\hat{d}{\parallel }}},$$

hence, recalling that α is positive, we have equivalently

$$cm(D){\parallel }\nabla f(x){\parallel \parallel }\hat{d}{\parallel }\alpha \le - \nabla f(x)^{T} (\alpha \hat{d}).$$
(17)

On the other hand, the hypothesis of continuous differentiability of f(x) allows us to apply the Mean Value Theorem in integral form, so that for any d ∊ D

$$f(x + \alpha d) = f(x) + \int\limits_{0}^{1} {\nabla }f[x + t\alpha d]^{T} (\alpha d)dt,$$

or equivalently

$$0 \le f(x + \alpha d) - f(x) = \int\limits_{0}^{1} {\nabla }f[x + t\alpha d]^{T} (\alpha d)dt.$$
(18)

Combining (17) and (18) we obtain for the direction \(\hat{d}\)

$$\begin{aligned} cm(D){\| \nabla }f(x){\parallel \| }\hat{d}{\parallel }\alpha \,& \le - {\nabla }f(x)^{T} (\alpha \hat{d}) + \int\limits_{0}^{1} {\nabla }f[x + t\alpha \hat{d}]^{T} (\alpha \hat{d})dt \\ & \le \int\limits_{0}^{1} \left| {{\nabla }f[x + t\alpha \hat{d}]^{T} (\alpha \hat{d}) - {\nabla }f(x)^{T} (\alpha \hat{d})} \right|dt \\ & \le \int\limits_{0}^{1} \left\| {{\nabla }f[x + t\alpha \hat{d}] - {\nabla }f(x)} \right\|{\parallel }\alpha \hat{d}{\parallel }dt \\ & \le \nu \int\limits_{0}^{1} t{\parallel }\alpha \hat{d}{\parallel }^{2} dt \le \frac{\nu }{2}\alpha^{2} {\parallel }\hat{d}{\parallel }^{2} , \\ \end{aligned}$$

which immediately yields (16).

Loosely speaking, in order to suggest to the reader the importance of Theorem 1, it can be rephrased in the following simple way. If the PSS D is available and the value f(x) cannot be decreased along any of the directions in D, for arbitrarily small stepsizes α, then the iterate x is a stationary point. Indeed, in the latter case the bound (16) holds for any such α, so that letting α → 0 we obtain \({\| \nabla }f(x) \| = 0\). Also note that if the PSS D is poor (i.e. cm(D) is small), then the bound (16) on the gradient is accordingly poor.
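As a concrete instance (our own example): for the coordinate PSS D⊕ of Fig. 1 it is known that \(cm(D_{\oplus}) = 1/\sqrt{n}\), and δ = 1 since all its directions are unit vectors, so that (16) reads

$$\| \nabla f(x) \| \le \frac{\nu }{2}\,\sqrt{n}\,\alpha ,$$

which shows explicitly how the bound degrades as the dimension n grows.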

Since in our proposal we combine PSO with a line search-based derivative-free method which relies on the use of PSSs, we will be able to characterize stationarity conditions for a modified PSO scheme, without resorting to any information on the gradient \({\nabla }f(x)\). In the next section we describe some basic properties of the line search-based derivative-free method we couple with PSO.

5 Preliminaries on the Line Search-Based Method Adopted

We consider in this section the proposal by Lucidi and Sciandrone in 2002 [20], which includes some line search-based derivative-free methods. Since it is our intention to reduce the complexity of our proposal as much as possible, the next result represents a simplified version of the material in [20].

Proposition 1

Let \(f:{{\rm I\!R}}^{\text{n}} \to {{\rm I\!R}}\), with f continuously differentiable in \({{\rm I\!R}}^{\text{n}}\) . Suppose that the points in the sequence {x k } are bounded. Suppose the directions \(d_{1}^{k} , \ldots ,d_{m}^{k}\) are bounded and form a positively spanning set of \({\rm I\!R}^{\text{n}}\). Then, the following stationarity condition holds

$$\mathop {\lim }\limits_{k \to \infty } {\| \nabla }f(x^{k} ){\| } = 0\qquad {\text{if and only if}}\qquad \mathop {\lim }\limits_{k \to \infty } \sum\limits_{j = 1}^{m} \hbox{min} \left\{ {0,{\nabla }f(x^{k} )^{T} d_{j}^{k} } \right\} = 0.$$
(19)

Let us consider the sequence {x k} in (19) and Theorem 1. By Proposition 1, necessary and sufficient conditions of stationarity for the sequence {x k} can be verified by simply exploiting, at any iterate x k, the function f(x) (through its directional derivatives \({\nabla }f(x^{k} )^{T} d_{j}^{k}\)) along the search directions \(d_{1}^{k} , \ldots ,d_{m}^{k}\). Table 1 details a line search-based derivative-free method for unconstrained minimization, which uses the results of Proposition 1. In Lucidi and Sciandrone [20] a complete convergence analysis was developed for the Algorithm ls-df, and the following conclusion was proved (see e.g. Proposition 5.1 in [20]).

Table 1 The line search-based derivative-free algorithm ls-df (see also Lucidi and Sciandrone 2002 [20])

Proposition 2

Suppose the directions \(d_{1}^{k} , \ldots ,d_{m}^{k}\) satisfy the assumptions of Proposition 1. Consider the sequence {x k} generated by the Algorithm ls-df and let the level set \(\fancyscript{L}_{0} = \{ x \in {{\rm I\!R}}^{\text{n}} :f(x) \le\;f(x^{0} )\}\) be compact. Then we have

$$\mathop {\liminf}\limits_{k \to \infty } \ {\| \nabla }f(x^{k} ){\| } = 0.$$
(20)

Note that the condition (20) is weaker than (19), and in principle is met only asymptotically by the Algorithm ls-df. Of course, recalling also the results in Theorem 1, a practical stopping condition of Algorithm ls-df could be obtained by monitoring the steplength \(\bar{\alpha }^{k}\) at Steps 2 and 3. The algorithm can stop when \(\bar{\alpha }^{k}\) becomes sufficiently small. Also observe that at Step 4 the point x k+1 might be computed for instance by any heuristic procedure; nevertheless, we can set in any case x k+1 ≡ y k, since convergence analysis does not require f(x k+1) < f(y k).
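For concreteness, the following is a minimal sketch of ours of one iteration of a line search of this type, based only on the description given in this section (the sufficient-decrease test \(f(x^k + \alpha^k d_j^k) \le f(x^k) - \gamma(\alpha^k)^2\) and the contraction/expansion of the steplength); the precise Steps 1–4 are those of Table 1 and of [20], and the parameter values are those later used in Sect. 7.

```python
import numpy as np

def ls_df_step(f, x, alpha, D, gamma=1e-3, theta=0.5):
    """One derivative-free line search iteration over the PSS D.

    Accept the first direction d in D giving the sufficient decrease
    f(x + alpha*d) <= f(x) - gamma*alpha**2, tentatively expanding the
    steplength while the decrease persists (procedure Line search());
    if no direction succeeds, contract alpha by theta."""
    fx = f(x)
    for d in D:
        if f(x + alpha * d) <= fx - gamma * alpha ** 2:
            while f(x + (alpha / theta) * d) <= fx - gamma * (alpha / theta) ** 2:
                alpha /= theta
            return x + alpha * d, alpha
    return x, theta * alpha
```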

As a final consideration, note that relation (20) may be considerably strengthened by choosing a different (and computationally more expensive) strategy at Step 2 of the Algorithm ls-df. Indeed, instead of requiring at Step 2 that just one direction is of sufficient decrease for the objective function (i.e. \(f(x^k + \alpha^k d_j^k) \le f(x^k) - \gamma(\alpha^k)^2\) for at least one index \(j \in \{ 1, \ldots ,m\}\)), we can exploit f(x) along all the directions \(\{ d_{1}^{k} , \ldots ,d_{m}^{k} \}\) in the PSS. The resulting algorithm is Algorithm ls-df + in Table 2. We recall that, for the sake of simplicity and in order to keep the computational cost as low as possible, we will couple PSO only with Algorithm ls-df. In the following proposition we summarize the convergence properties of Algorithm ls-df +: we remark that they are definitely stronger than the result in Proposition 2.

Table 2 The line search-based derivative-free algorithm ls-df + in Lucidi and Sciandrone (2002) [20]

Proposition 3

Suppose the directions \(d_{1}^{k} , \ldots ,d_{m}^{k}\) satisfy the assumptions of Proposition 1. Consider the sequence {x k} generated by the Algorithm ls-df + and let the level set \(\fancyscript{L}_{0} = \{x \in {\rm I\!R}^{\text{n}} :f(x) \le\;f(x^{0})\}\) be compact. Then we have

$$\mathop {\lim }\limits_{k \to \infty } {\| \nabla }f(x^{k} ){\| } = 0.$$
(21)

It is evident that in Algorithm ls-df + the stronger convergence result is obtained at the expense of a larger computational cost in Step 2. In addition, the procedure Line search () is aimed at determining a possible expansion of the steplength \(\alpha_{j}^{k}\).
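Along the same lines, a sketch of the corresponding all-directions variant can be obtained by polling every direction of the PSS and retaining the best sufficiently decreasing point; this is only our reading of the difference at Step 2 described above (in the actual Algorithm ls-df + of Table 2 each direction carries its own steplength \(\alpha_j^k\)).

```python
def ls_df_plus_step(f, x, alpha, D, gamma=1e-3, theta=0.5):
    """Poll *all* directions of the PSS D (cf. Algorithm ls-df+):
    among the points achieving sufficient decrease, keep the best one."""
    fx = f(x)
    best_x, best_f = None, fx
    for d in D:
        step = alpha
        if f(x + step * d) <= fx - gamma * step ** 2:
            while f(x + (step / theta) * d) <= fx - gamma * (step / theta) ** 2:
                step /= theta                     # steplength expansion
            if f(x + step * d) < best_f:
                best_x, best_f = x + step * d, f(x + step * d)
    if best_x is not None:
        return best_x, alpha
    return x, theta * alpha                       # no success: contract
```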

6 A Hybrid Algorithm

In this section we propose a hybrid algorithm, obtained by coupling the PSO scheme described in Sect. 2 with the algorithm in Table 1. We remind the reader that the resulting method should be endowed with both the (local) convergence properties of Algorithm ls-df and the (global) strategies of exploration of PSO (see also Sect. 1).

Of course, it is fairly obvious that simply alternating a finite sequence of steps of PSO and a finite sequence of steps of Algorithm ls-df would provide a method satisfying a property similar to Proposition 2. However, the latter strategy might be a blind sequential application of two different algorithms, which does not exploit their peculiarities. On the contrary, we prefer to consider a scheme which at once both exploits (local strategy) and explores (global strategy) the objective function. Thus, we consider here a PSO-based method which attempts to detect a global minimum of the objective function, while retaining the asymptotic convergence properties of Algorithm ls-df.

Our proposal (namely Algorithm ls-df_pso) is summarized in Table 3 and its convergence properties to stationary points are summarized in the next proposition (see also [20]).

Table 3 The line search-based derivative-free algorithm ls-df_pso

Proposition 4

Suppose the directions \(d_1^k, \ldots, d_m^k\) satisfy the assumptions of Proposition 1. Consider the sequence {x k } generated by the Algorithm ls-df_pso and let the level set \(\fancyscript{L}_{0} = \{ x \in {\rm I\!R}^{\text{n}} :f(x) \le\;f(x^{0} )\}\) be compact. Then we have

$$\mathop {\lim \,{\rm inf}}\limits_{k \to \infty } \ {\| \nabla }f(x^{k} ){\| } = 0.$$
(22)

Proof

Observe that the Algorithm ls-df and the Algorithm ls-df_pso differ only at Step 1 and Step 4. Indeed, Step 1 and Step 4 of Algorithm ls-df_pso are simply obtained from the corresponding steps of Algorithm ls-df, observing that the iterates y k (Step 1) and x k+1 (Step 4) are computed by using PSO. Therefore, convergence properties of the sequence generated by the Algorithm ls-df_pso are basically inherited from Proposition 2.
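To fix ideas, the following schematic sketch of ours combines the two building blocks introduced above (pso_step of Sect. 2 and ls_df_step of Sect. 5); the names and the stopping test are ours, not those of Table 3, and only serve to illustrate how PSO provides the trial points at Steps 1 and 4, while the line search over the PSS safeguards stationarity.

```python
def update_bests(f, x, p, p_g):
    # Update the personal bests p_j^k and the global best p_g^k, as in (3)
    for j in range(x.shape[0]):
        if f(x[j]) < f(p[j]):
            p[j] = x[j]
        if f(p[j]) < f(p_g):
            p_g = p[j].copy()
    return p, p_g

def ls_df_pso(f, x, v, D, alpha=0.25, max_iter=1000):
    """Schematic hybrid loop: PSO exploration + PSS line search safeguard."""
    p, p_g = x.copy(), x[0].copy()
    p, p_g = update_bests(f, x, p, p_g)
    y = p_g.copy()
    for k in range(max_iter):
        x, v = pso_step(x, v, p, p_g)          # Steps 1/4: PSO trial points
        p, p_g = update_bests(f, x, p, p_g)
        if f(p_g) < f(y):
            y = p_g.copy()
        y, alpha = ls_df_step(f, y, alpha, D)  # Steps 2-3: line search on y^k
        if alpha < 1e-8:                       # practical stop (see Sect. 5)
            break
    return y
```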

We conclude this section by highlighting that, similarly to Algorithm ls-df_pso, instead of using Algorithm ls-df we can couple PSO with the Algorithm ls-df +. The resulting hybrid scheme would be quite similar to Algorithm ls-df_pso, but more expensive than the algorithm in Table 3. Moreover, it would also be endowed with convergence properties similar to those reported in Proposition 3, so that condition (22), which holds for Algorithm ls-df_pso, would be reinforced into a condition like (21).

7 Numerical Results

Numerical results are presented for the Rosenbrock function and for two real-world optimization problems in ship hydrodynamics. Specifically, a four-parameter shape optimization of a catamaran is investigated for (a) resistance reduction in calm water at fixed speed, and (b) expected resistance reduction in waves, taking into account stochastic sea state and speed. For all optimization problems, the deterministic implementation of PSO presented by Serani et al. in 2014 [28] is used for the extension to ls-df_pso. A number of particles equal to 4n is used, with initialization over the variable domain by Hammersley sequence sampling, and PSO coefficients given by Clerc in 2006 [6], i.e., χ = 0.721, w = 1, \(cr=\bar{c}\bar{r}=1.655\) (see the work by Serani et al. in 2014 [28] for details). The ls-df_pso parameters are set as \(h^k = 1\), \(\gamma = 10^{-3}\), θ = 0.5, and \(\alpha^k\) equal to 0.25 of the design variable range.

Specifically, the minimization in \({\rm I\!R}^{2}\) of the Rosenbrock function,

$$f(x,y) = (a - x)^{2} + b(y - x^{2} )^{2}$$
(23)

with n = 2, a = 1, b = 100, −20 ≤ x ≤ 20 and \(- 20 \le y \le 20\), is used as an explanatory test problem. Figure 2a shows the convergence of the ls-df_pso algorithm, compared to the standard PSO. Black squares indicate ls-df_pso iterations where the line search procedure LS is used to improve the optimum location. Figure 2b shows a comparison of the algorithms’ convergence in a close-up of the variable domain. The global-optimum location history is depicted, along with the true minimum, which is located at x = 1, y = 1. The beneficial effects of using PSO with LS are evident, providing a faster and more effective convergence to the optimum, along with the identification of the proper region of the global optimum.
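For reference, a minimal sketch (ours) of the test function (23) and of a 2-D Hammersley initialization over the variable box, mirroring the setup described above:

```python
import numpy as np

def rosenbrock(z, a=1.0, b=100.0):
    # Test function (23); the global minimum is f(a, a**2) = 0
    x, y = z
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def hammersley_2d(P, lb, ub):
    """P Hammersley points (i/P, radical inverse of i in base 2),
    mapped onto the square [lb, ub]^2."""
    def phi2(i):
        f, r = 0.5, 0.0
        while i:
            r += f * (i & 1)
            i >>= 1
            f *= 0.5
        return r
    u = np.array([[i / P, phi2(i)] for i in range(P)])
    return lb + (ub - lb) * u

x0 = hammersley_2d(P=4 * 2, lb=-20.0, ub=20.0)   # 4n particles, n = 2
print([round(rosenbrock(xi), 1) for xi in x0])
```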

Fig. 2

Rosenbrock function minimization: convergence of the objective function (a) and global-optimum location history (b)

The shape optimization of the Delft catamaran is shown as an example of an industrial design problem. The Delft catamaran is a concept ship used for experimental and numerical benchmarks (see, e.g., the numerical studies presented by Diez et al. in 2013 [11]). Figure 3 shows the Delft catamaran model during towing tank experiments at CNR-INSEAN, along with a detail of the original hull shape. The design optimization problems are taken from the work by Chen et al. in 2014 [3] and Diez et al. in 2013 [9], respectively, and solved by means of stochastic radial-basis function interpolation (details may be found in the work by Volpi et al. in 2014 [33]) of high-fidelity unsteady Reynolds-averaged Navier-Stokes (URANS) simulations. For the first problem, the design objective is the minimization of the resistance in calm water at a Froude number equal to 0.5 (see [3] for details). For the second problem, the design objective is the reduction of the expected value of the mean resistance in head waves, taking into account the stochastic sea state in the North Pacific ocean and variable speed (see [9] for details). For both problems, four design variables control global shape modifications, based on the Karhunen-Loève expansion of the shape modification vector [10].

Fig. 3

Delft catamaran: towing tank experiments at CNR-INSEAN (a) and detail of the original geometry (b)

Figure 4a shows the convergence of the minimization procedure for the first problem, comparing PSO with ls-df_pso. LS is required only a few times, as indicated by the black squares, and the minima provided by the two algorithms are extremely close, as shown in Figs. 4b and 5. Nevertheless, it may be noted that, at a very reduced additional cost, ls-df_pso provides a solution certified with stationarity properties.

Fig. 4

Minimization of calm-water resistance for the Delft catamaran: convergence of the objective function (a) and global-optimum design variables (b)

Fig. 5

Minimization of calm-water resistance for the Delft catamaran: optimal shape design by PSO (a) and ls-df_pso (b)

Figure 6a shows the convergence of the minimization procedure for the second problem, comparing PSO with ls-df_pso. LS is required and applied a larger number of times than in the previous problem, and is essential to identify the global optimum, as shown in Fig. 6b. As a result, the optimal shape designs provided by PSO and ls-df_pso are noticeably different (within the variation range of the current application), as shown in Fig. 7. Additionally, it may be noted that the solution given by ls-df_pso is also endowed with stationarity properties.

Fig. 6

Minimization of expected value of mean resistance in head waves for the Delft catamaran: convergence of the objective function (a) and global-optimum design variables (b)

Fig. 7

Minimization of expected value of mean resistance in head waves for the Delft catamaran: optimal shape design by PSO (a) and ls-df_pso (b)

8 Conclusions

In this chapter we have detailed some globally convergent modifications of the PSO iteration (2), for the solution of the unconstrained global optimization problem (1). Under mild assumptions, Proposition 4 proved that at least a subsequence of the iterates generated by our modified PSO, namely Algorithm ls-df_pso, converges to a stationary point, which is possibly a minimum point. We recall that using the standard PSO iteration, by no means can we guarantee convergence towards stationary points, unless we consider trivial cases of no practical interest. Thus, our result reinforces the theoretical properties of modified PSO schemes. To the best of our knowledge, our result is also among the first attempts to couple PSO with line search-based derivative-free schemes (see also [30, 31] for extensions to trust-region derivative-free approaches), where a modified PSO scheme is proved to satisfy conditions like (20) or (21).

On the basis of our experience, which seems confirmed by the results reported here, we are persuaded that a fruitful coupling of PSO with an iterative, globally convergent derivative-free method should yield a compromise between the fast progress of PSO (global search) in the early iterations, and the capability to exploit (local search) the objective function.

We have also reported numerical experiences on a significant test function and on two ship design problems, which confirm that ls-df_pso is more effective than PSO (and to a great extent equally efficient). Indeed, ls-df_pso is able to achieve better solutions/designs, and provides stationarity properties at the associated optimal points.