1 Introduction

The derivative-free Nelder–Mead algorithm (NM) was introduced in 1965 for unconstrained optimization problems in \({\mathbb {R}}^n\) [38]. Although the algorithm is very popular, it fails to solve some optimization problems [17, 42]. In 1998, McKinnon [32] proposed a smooth strictly convex function in \({\mathbb {R}}^2\) on which the sequence of trial points produced by the NM algorithm fails to approach the unique minimizer.

Different variants using sufficient decrease in the objective function value were proposed [26, 37, 45] to guarantee convergence to a critical point in the smooth case.

Price et al. [40] propose a frame-based NM method [15] requiring sufficient decrease. Gao and Han [20] study the behavior of the NM algorithm on a convex objective function, and propose an efficient variant for larger problems with adaptive simplex parameters.

A stochastic version [11] is presented for the nonsmooth case. Brea [9] presents an NM-inspired algorithm for nonlinear mixed problems. Bűrmen et al. [10] adapt NM so that the trial points are generated on an underlying mesh, i.e., a discretization of the space of variables. The present paper pushes further in that direction, but for inequality constrained optimization without derivatives. Specifically, the NM algorithm is inserted in the search step of the mesh adaptive direct search (Mads) algorithm [2, 3]. The resulting algorithm, named Mads-NM, benefits from the same hierarchical convergence analysis for nonsmooth constrained optimization as Mads.

The paper is structured as follows. Section 2 presents the original NM algorithm for unconstrained optimization. Section 3 gives a high-level description of the Mads algorithm, focusing only on the elements necessary for the present work. Next, Sect. 4 proposes a way to insert a NM search step within Mads. Section 5 reports numerical experiments. Intensive tests are conducted on a collection of 87 test problems to tune the new algorithmic parameters. The resulting algorithm is then applied, without any further parameter tuning, to three constrained blackbox engineering test problems. Our numerical experiments suggest that inserting a NM search step into the Mads algorithm significantly improves its performance.

2 The NM unconstrained optimization algorithm

There are many ways to present the NM algorithm for unconstrained optimization

$$\begin{aligned} \min _{x \in {\mathbb {R}}^n}&f(x) \end{aligned}$$

where \(f: {\mathbb {R}}^n \mapsto {\mathbb {R}}\) is the objective function. We present it in a way that might seem overly complicated, but it simplifies the presentation of the inequality-constrained case. The following definitions are introduced to order trial points.

Definition 2.1

A point \(x \in {\mathbb {R}}^n\) is said to dominate \(y \in {\mathbb {R}}^n\), denoted \(x \prec y\), if \(f(x) < f(y)\).

When two trial points share the same objective function value, i.e., \(f(x) = f(y)\), instead of ordering them arbitrarily we resort to the following deterministic function, which returns the older of the two points.

Definition 2.2

A point \(x \in {\mathbb {R}}^n\) is said to be older than \(y \in {\mathbb {R}}^n\), if x was generated by the algorithm before y. The function \( \mathsf{Older} : {\mathbb {R}}^n \times {\mathbb {R}}^n \mapsto {\mathbb {R}}^n\)

$$\begin{aligned} \mathsf{Older}(x,y) = \left\{ \begin{array}{cl} x &\quad \text{if } x \text{ was generated by the algorithm before } y,\\ y &\quad \text{otherwise} \\ \end{array} \right. \end{aligned}$$

returns the older of the two points.

Together with the definition of dominance, the function \(\mathsf{Older}\) allows us to introduce the following rule to compare two trial points.

Definition 2.3

The function \( \mathsf{Best} : {\mathbb {R}}^n \times {\mathbb {R}}^n \mapsto {\mathbb {R}}^n\)

$$\begin{aligned} \mathsf{Best}(x,y) = \left\{ \begin{array}{cl} x &\quad \text{if } x \prec y, \\ y &\quad \text{if } y \prec x, \\ \mathsf{Older}(x,y) &\quad \text{if } f(x) = f(y) \\ \end{array} \right. \end{aligned}$$

returns the best of two points x and y in \({\mathbb {R}}^n\). A point \(x \in {\mathbb {R}}^n\) is said to be better than \(y \in {\mathbb {R}}^n\), if \(x = \mathsf{Best}(x,y)\).

The function \(\mathsf{Best}\) is clearly transitive, commutative and \(\mathsf{Best}(x,x)\) is well-defined as it returns x.

The NM algorithm can now be presented using the above definitions. Each iteration of the NM algorithm starts with a set \({\mathbb {Y}}= \{y^0, y^1, \ldots , y^n\}\) of \(n+1\) affinely independent points in \({\mathbb {R}}^n\) which defines a simplex. The simplex points are ordered so that \(y^{i-1}\) is better than \(y^i\) for every \(i=1,2,\ldots ,n\). The objective of each NM iteration is to replace the worst simplex point \(y^n\) by a better one. In order to achieve this, NM uses some of the following points and parameters:

$$\begin{aligned} \begin{array}{ll} x^c = \frac{1}{n}\sum _{i=0}^{n-1} y^i &\quad \text{the centroid of the } n \text{ best points;} \\ x^r = x^c +(x^c - y^n) &\quad \text{the reflection of the worst point with respect to the centroid;}\\ x^e = x^c+\delta ^e(x^c - y^n) &\quad \text{where } \delta ^{e} \in \, ]1, \infty [ \text{ is the expansion parameter;} \\ x^{oc} = x^c+\delta ^{oc}(x^c - y^n) &\quad \text{where } \delta ^{oc} \in \, ]0, 1[ \text{ is the outside contraction parameter;} \\ x^{ic} = x^c+\delta ^{ic}(x^c - y^n) &\quad \text{where } \delta ^{ic} \in \, ]-1, 0[ \text{ is the inside contraction parameter;} \\ \gamma \in \, ]0, 1[ &\quad \text{is the shrink parameter.}\\ \end{array} \end{aligned}$$

At each iteration, NM either replaces the last point \(y^n\) of the ordered simplex by one of \(x^r, x^e, x^{oc}\) or \(x^{ic}\), or it shrinks the simplex by replacing every vertex except the best one \(y^0\), so that the simplex becomes

$$\begin{aligned} {\mathbb {Y}}\leftarrow \{y^0,\ y^0+\gamma (y^1-y^0),\ y^0+\gamma (y^2-y^0),\ \ldots ,\ y^0+\gamma (y^n-y^0)\}. \end{aligned}$$
(1)

The most widely used values of the parameters are

$$\begin{aligned} \delta ^e=2, \quad \delta ^{ic}=-\frac{1}{2}, \quad \delta ^{oc}=\frac{1}{2} \quad \text{ and } \quad \gamma = \frac{1}{2}. \end{aligned}$$
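For illustration, the trial points of an iteration and the shrink operation (1) can be sketched in Python as follows, with the ordered simplex stored as a list of numpy vectors (best vertex first) and the standard parameter values above; the function names are ours and do not correspond to any NOMAD routine.

import numpy as np

def nm_trial_points(Y, delta_e=2.0, delta_oc=0.5, delta_ic=-0.5):
    """Centroid, reflection, expansion and contraction points for an
    ordered simplex Y = [y0, ..., yn] (best to worst)."""
    Y = [np.asarray(y, dtype=float) for y in Y]
    yn = Y[-1]
    xc = sum(Y[:-1]) / (len(Y) - 1)               # centroid of the n best points
    d = xc - yn
    return {
        "centroid":            xc,                # x^c (never evaluated)
        "reflection":          xc + d,            # x^r
        "expansion":           xc + delta_e * d,  # x^e
        "outside_contraction": xc + delta_oc * d, # x^oc
        "inside_contraction":  xc + delta_ic * d, # x^ic
    }

def shrink(Y, gamma=0.5):
    """Shrink step (1): keep y0 and pull every other vertex toward it."""
    y0 = np.asarray(Y[0], dtype=float)
    return [y0] + [y0 + gamma * (np.asarray(y, dtype=float) - y0) for y in Y[1:]]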

The comparisons made to determine the new simplex are relative to the following zones.

Definition 2.4

(Zones for unconstrained optimization) Let \({\mathbb {Y}}= \{ y^0, y^1, \ldots , y^n\}\) be an ordered simplex in \({\mathbb {R}}^n\). The trial point \(x\in {\mathbb {R}}^n\)

  • Belongs to the inside contraction zone if \(y^n\prec x\);

  • Else, it belongs to the expansion zone if \(x \prec y^0\);

  • Else, it belongs to the reflection zone if x dominates at least 2 points of \({\mathbb {Y}}\);

  • Else, it belongs to the outside contraction zone and x dominates 0 or 1 point of \({\mathbb {Y}}\).

The NM algorithm is compactly written in Algorithm 1.

[Algorithm 1]

Figure 1 provides a graphical interpretation of the mechanism of the NM algorithm. The vertical axis represents the objective function f, and the six circles represent the objective function values \(f(y^0)< f(y^1)< \cdots < f(y^5) = f(y^n)\). Replacing the vertex \(y^n\) by \(x^{oc}\) can only occur when \(x^r\) belongs to the outside contraction zone depicted in the figure, replacing \(y^n\) by \(x^{ic}\) can only occur when \(x^r\) belongs to the inside contraction zone, etc.

[Fig. 1: Zones relative to a 6-point simplex]

3 The Mads constrained optimization algorithm

We consider the Mads algorithm with the progressive barrier (PB) [3] to handle inequality constraints and with dynamic scaling [8] to handle the varying magnitudes of the variables. The target class of optimization problems is

$$\begin{aligned} \min _{x \in X \subset {\mathbb {R}}^n}&f(x) \\ \text{ s.t. }&c(x) \le 0, \end{aligned}$$

where \(f: X \mapsto {\mathbb {R}}\cup \{\infty \}\) and \(c: X \mapsto ({\mathbb {R}}\cup \{\infty \})^m\) are functions with \(c = (c_1, c_2, \dots , c_m)^\top \), and X is some subset of \({\mathbb {R}}^n\). Notice that the functions are allowed to return the value \(\infty \), which is a convenient technique to represent evaluation failures. The entire feasible region is denoted by the set \(\Omega = \{ x \in X : c(x) \le 0 \}\). The Mads algorithm targets problems on which no assumptions on the smoothness of f or c are made [5].

The PB uses the constraint violation function [18, 19]

$$\begin{aligned} h(x) \ := \ \left\{ \begin{array}{ll} \displaystyle \sum _{j \in J} \left( \max \{c_j(x), 0\} \right) ^2 &\quad \text{if } x \in X,\\ \infty &\quad \text{otherwise,} \end{array}\right. \end{aligned}$$

where \(J = \{1,2,\ldots ,m\}\) is the set of indices of the constraints. The constraint violation function value h(x) is nonnegative, and equals zero if and only if the point x belongs to \(\Omega \).
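A minimal sketch of this constraint violation function, assuming the blackbox returns the vector of constraint values \(c(x)\) and that membership in X is checked separately, is given below.

import numpy as np

def h(x, constraints, in_X=lambda x: True):
    """Progressive barrier constraint violation: squared l2-norm of the
    violations max{c_j(x), 0}, and infinity outside X. The arguments
    `constraints` and `in_X` are problem-specific placeholders."""
    if not in_X(x):
        return float("inf")
    c = np.asarray(constraints(x), dtype=float)   # c_1(x), ..., c_m(x)
    return float(np.sum(np.maximum(c, 0.0) ** 2))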

Mads is a search-poll direct search iterative algorithm in which each iteration consists of two main steps. The search step can use various strategies (including the use of surrogate functions and heuristics) to explore the space of variables. The poll step follows stricter rules and performs a local exploration of the space of variables in a region delimited by the poll size vector \(\Delta ^k\in {\mathbb {R}}^n_+\). In practice, a well-conceived search step can allow the algorithm to escape locally optimal solutions (e.g., [1, 7]), and the poll step ensures practical and theoretical convergence [2].

Both these steps generate their candidate points on a discretization of the space of variables called the mesh. The fineness of the discretization is controlled by the mesh size vector \(\delta ^k \in {\mathbb {R}}^n_+\). Formally, at iteration k, the mesh is defined as follows:

$$\begin{aligned} M^k = V^k + \left\{ \mathop {\mathrm {diag}}(\delta ^k) z : z \in {\mathbb {Z}}^n\right\} . \end{aligned}$$
(2)

where the cache \(V^k\) is the set of points that were visited by the algorithm by the start of iteration k. In other words, a point x belongs to \(V^k\) if and only if both f(x) and h(x) were evaluated by the start of iteration k. The first set \(V^0\) is the set of one or more initial points supplied by the user.

The mesh and poll size vectors are updated at the end of each iteration. The entries of both vectors are reduced when the iteration fails to improve the incumbent solution, and otherwise, they are either increased or remain at the same value. Algorithm 2 provides a high-level description of Mads; the reader is invited to consult [8] for a thorough description.

[Algorithm 2]

The fundamental convergence result [3] of the Mads algorithm states that if the entire sequence of trial points belongs to a bounded set, and if the set of so-called refining directions is sufficiently rich, then there exists an accumulation point \(x^*\) such that the generalized directional derivative \(f^\circ (x^*; d)\) of Clarke [13] is nonnegative in every hypertangent [24] direction d to the domain \(\Omega \) at \(x^*\) provided that \(x^*\) is feasible. A similar result (involving h rather than f) holds in situations where the iterates never approach the feasible region: if the accumulation point \(x^*\) is infeasible, then the generalized directional derivative \(h^\circ (x^*; d)\) is nonnegative in every hypertangent direction d to the set X at \(x^*\).

These convergence results hold regardless of the search step of Algorithm 2, as long as the set \(S^k\) is finite and belongs to the mesh \(M^k\). In the present work, we make sure that these conditions are satisfied, which allows the convergence analysis to be unaltered.

4 The Mads-NM constrained optimization algorithm

We now present the NM-inspired Mads search step. Section 4.1 extends the definitions of the dominance relation \(\prec \) and of the function \(\mathsf{Best}\) to the constrained case. Section 4.2 focuses on the main Mads elements that need to be adjusted in order to implement the NM search feature. Section 4.3 describes the algorithmic specifications of the NM search step.

The design of this new algorithm involves a total of three parameters, whose default values are tuned in Sect. 5. These parameters are denoted by \(\pi = ( \pi _\mathsf{radius}, \pi _\mathsf{svd}, \pi _\mathsf{eval})\). They refer to a sampling radius, a threshold on singular values and a threshold on the number of function evaluations, respectively.

4.1 Ordering trial points

The NM algorithm for unconstrained optimization hinges on the notion of ordering vertices from best to worst, and at each iteration, the algorithm attempts to replace the worst one. Ordering the points in a simplex is easy in the absence of constraints, as only the objective function values need to be compared. Ordering gets more technical in constrained optimization. In order to define a transitive relation between the trial points, the definition of dominance as well as the function returning the best of two points are adapted. The definitions rely on both the objective and constraint violation functions f and h.

Definition 4.1

A point \(x \in {\mathbb {R}}^n\) is said to dominate \(y \in {\mathbb {R}}^n\), denoted \(x \prec y\), if

  • Both points are feasible and \(f(x) < f(y)\); or

  • Both points are infeasible and \(f(x) \le f(y)\) and \(h(x) \le h(y)\) with at least one strict inequality.

The above definition is coherent with the terminology used with the progressive barrier [3], in which feasible and infeasible points are treated differently.

In the situation where two trial points share the same function values, i.e., \(f(x) = f(y)\) and \(h(x)=h(y)\), instead of ordering them arbitrarily we resort once again to the deterministic function \(\mathsf{Older}\) from Definition 2.2, which returns the older of the two points. Together with the definition of dominance, the function \(\mathsf{Older}\) allows us to introduce the following rule to compare two trial points x and y.

Definition 4.2

The function \(\mathsf{Best} : {\mathbb {R}}^n \times {\mathbb {R}}^n \mapsto {\mathbb {R}}^n\)

$$\begin{aligned} \mathsf{Best}(x,y) = \left\{ \begin{array}{cl} x &\quad \text{if } x \prec y \text{ or if } h(x)< h(y), \\ y &\quad \text{if } y \prec x \text{ or if } h(y) < h(x), \\ \mathsf{Older}(x,y) &\quad \text{if } f(x) = f(y) \text{ and } h(x) = h(y) \\ \end{array} \right. \end{aligned}$$

returns the best of two points x and y in \({\mathbb {R}}^n\). A point \(x \in {\mathbb {R}}^n\) is said to be better than \(y \in {\mathbb {R}}^n\), if \(x = \mathsf{Best}(x,y)\).

Observe that with this definition, the comparison of a feasible point x with an infeasible point y always yields \(x = \mathsf{Best}(x,y)\). The comparison of two infeasible points where neither dominates the other returns the one with the smaller constraint violation value h.
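The constrained ordering can be sketched by extending the earlier Point record with a constraint violation value, where \(h = 0\) identifies feasible points; again, the field and function names are illustrative only and not part of NOMAD.

from dataclasses import dataclass

@dataclass
class CPoint:
    x: tuple
    f: float      # objective function value
    h: float      # constraint violation value; 0 means feasible
    age: int      # generation order (0 = oldest)

def dominates(x: CPoint, y: CPoint) -> bool:
    """Definition 4.1: dominance within the feasible or infeasible group."""
    if x.h == 0 and y.h == 0:                                  # both feasible
        return x.f < y.f
    if x.h > 0 and y.h > 0:                                    # both infeasible
        return x.f <= y.f and x.h <= y.h and (x.f < y.f or x.h < y.h)
    return False                                               # mixed pair: no dominance

def older(x: CPoint, y: CPoint) -> CPoint:
    return x if x.age <= y.age else y

def best(x: CPoint, y: CPoint) -> CPoint:
    """Definition 4.2: dominance or a smaller h wins; ties broken by age."""
    if dominates(x, y) or x.h < y.h:
        return x
    if dominates(y, x) or y.h < x.h:
        return y
    return older(x, y)        # f(x) == f(y) and h(x) == h(y)

As observed above, a feasible point always beats an infeasible one here, since its constraint violation value is strictly smaller.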

Proposition 4.3

The function \(\mathsf Best\) is transitive over a set of trial points.

Proof

Consider three trial points x, y and z in \({\mathbb {R}}^n\) such that \(x = \mathsf{Best} (x,y)\) and \(y = \mathsf{Best} (y,z)\). We need to show that \(x = \mathsf{Best} (x, z)\). The analysis is divided into four cases.

  • First, if x and z are both feasible, then y is also feasible (otherwise the feasible point z would be better than the infeasible point y, contradicting \(y = \mathsf{Best} (y,z)\)), and therefore \(x = \mathsf{Best} (x, z)\) by the transitivity of the unconstrained comparison.

  • Second, if x and z are both infeasible, then y is also infeasible (otherwise the feasible point y would be better than the infeasible point x, contradicting \(x = \mathsf{Best} (x,y)\)), and it follows that \(0 < h(x) \le h(y) \le h(z)\). Therefore, the only way to have \(z = \mathsf{Best} (x, z)\) would be that

    $$\begin{aligned} f(x) = f(y) = f(z), \quad h(x) = h(y) = h(z) \quad \text{ and } z = \mathsf{Older} (x, z). \end{aligned}$$

    But if this were the case, then \(x = \mathsf{Best} (x,y)\) and \(y = \mathsf{Best} (y,z)\) would imply \(x = \mathsf{Older} (x, y)\) and \(y = \mathsf{Older} (y, z)\), hence \(x = \mathsf{Older} (x, z)\), contradicting \(z = \mathsf{Older} (x, z)\).

  • Third, if x is feasible and z is infeasible, then the observation preceding the statement of the proposition ensures that \(x = \mathsf{Best}(x,z)\).

  • Finally, the case where x is infeasible and z is feasible is impossible, because if it were the case, then y would not be feasible since \(x = \mathsf{Best}(x,y)\) and y would not be infeasible since \(y = \mathsf{Best}(y,z)\). \(\square \)

4.2 Construction of a simplex on a mesh

Section 4.3 defines a search step of Mads inspired by the NM algorithm. This NM search step uses elements present in the quadratic model search step proposed in [14]. In that paper, quadratic models of the objective and of each of the constraint functions are built and used to identify candidate trial points. But recall that an important requirement of the Mads algorithm is that every candidate point belongs to the mesh \(M^k\). The following notation is adopted. Let x be some point in \({\mathbb {R}}^n\). By appending the subscript \(\oplus \), we denote by \(x_\oplus \in M^k\) the mesh point closest to x (ties are broken by rounding up), as detailed in [6]. We will say that \(x_\oplus \) is the point x rounded to the mesh. The quadratic models of [14] are constructed by considering all previously visited trial points within regions centred around incumbent solutions and of size related to \(\Delta ^k\). A similar region is considered in the present work.
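A minimal sketch of such a rounding, written relative to a single reference point \(v \in V^k\) (e.g., the incumbent \(x^k\)) and breaking ties by rounding up, is given below; the true nearest point of \(M^k\) may come from a different translate of the lattice, so this is only an approximation of the rule detailed in [6].

import numpy as np

def round_to_mesh(x, v, delta):
    """Round x to the mesh {v + diag(delta) z : z integer} of Equation (2),
    anchored at the reference point v, with .5 cases rounded up."""
    x, v, delta = (np.asarray(a, dtype=float) for a in (x, v, delta))
    z = (x - v) / delta                  # coordinates of x in mesh units
    z_rounded = np.floor(z + 0.5)        # nearest integer, halves rounded up
    return v + delta * z_rounded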

With the progressive barrier algorithm [3], there are up to two incumbent solutions called the primary and secondary poll centers. One of them is the best feasible solution found so far by the algorithm, and the other one is the infeasible solution with the smallest value of h. For constructing the simplex, only the primary poll center is considered. Given the incumbent solution \(x^k\), the poll size vector \(\Delta ^k\) and the cache \(V^k\), define the set

$$\begin{aligned} {\mathbb {T}}_{\pi _\mathsf{radius}} \ = \ \left\{ x \in V^k : | x_i - x^k_i| \le \pi _\mathsf{radius} \Delta ^k_i, \ i\in \{1,2,\ldots , n \}\right\} , \end{aligned}$$

where \(\pi _\mathsf{radius} \ge 1 \) is a fixed parameter called the sampling radius (it is the first of the three new algorithmic parameters). As in Algorithm 1, the points of \({\mathbb {T}}_{\pi _\mathsf{radius}}\) are ordered so that \(x^{i-1} = \mathsf{Best}(x^{i-1}, x^i)\) for \(i =1,2,\ldots , p\), where \(p+1\) denotes the number of points in \({\mathbb {T}}_{\pi _\mathsf{radius}}\).

Algorithm 3 uses the set \({\mathbb {T}}_{\pi _\mathsf{radius}}\) to attempt the iterative construction of a simplex \({\mathbb {Y}}\). It also uses \(\pi _\mathsf{svd}\), the second of the three new algorithmic parameters. The algorithm considers points of \({\mathbb {T}}_{\pi _\mathsf{radius}}\) in sequence, initializes \({\mathbb {Y}}\) so that it contains only \(x^0\) and then goes through the following steps before adding a point \(x^i\) to \({\mathbb {Y}}\). First, the matrix A formed by the columns of \(\{ y-x^0 : y \in {\mathbb {Y}}, y \ne x^0\}\) together with \(\{x^i - x^0\}\) is constructed. Second, in order to ensure that the final simplex is far from being degenerate relative to the poll size vector, it is required that each singular value of the matrix \(\mathop {\mathrm {diag}}(\Delta ^k)^{-1}A\) exceeds a given threshold \(\pi _\mathsf{svd}\).

[Algorithm 3]
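The acceptance test at the heart of Algorithm 3 can be sketched as follows, assuming the candidate points of \({\mathbb {T}}_{\pi _\mathsf{radius}}\) are already ordered by \(\mathsf{Best}\); this is our reading of the step, not the NOMAD implementation.

import numpy as np

def build_simplex(T, Delta, pi_svd):
    """Iteratively build an ordered simplex Y from the ordered candidate
    list T (best point first), accepting a point only if every singular
    value of diag(Delta)^{-1} A stays above the threshold pi_svd."""
    n = len(T[0])
    x0 = np.asarray(T[0], dtype=float)
    Y = [x0]
    scale = 1.0 / np.asarray(Delta, dtype=float)
    for xi in T[1:]:
        if len(Y) == n + 1:
            break
        cols = [y - x0 for y in Y[1:]] + [np.asarray(xi, dtype=float) - x0]
        A = np.column_stack(cols)
        s = np.linalg.svd(scale[:, None] * A, compute_uv=False)
        if s.min() > pi_svd:
            Y.append(np.asarray(xi, dtype=float))
    return Y if len(Y) == n + 1 else None   # None: no simplex could be built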

4.3 The NM-search step

We now introduce and describe the NM-search step to insert into Algorithm 2. Given a sampling radius \(\pi _\mathsf{radius} \ge 1\) and a threshold parameter \(\pi _\mathsf{svd} > 0\), apply Algorithm 3 to attempt to build an ordered simplex \({\mathbb {Y}}= \{ y^0, y^1, \ldots , y^n\} \subset {\mathbb {R}}^n\) to initiate the search. If this attempt results in \({\mathbb {Y}}= \emptyset \), then abort the NM-search step.

The Mads-NM algorithm will iterate and attempt to replace \(y^n\) in \({\mathbb {Y}}\) by generating the centroid, the reflection, the outside contraction and/or the inside contraction as in the standard NM method. However, in order to satisfy the Mads requirement that every trial point belongs to the current mesh, the following notation rounds these new tentative points to the mesh:

  • \(x^r_\oplus \): the reflection \(x^r\) rounded to the mesh;

  • \(x^e_\oplus \): the expansion \(x^e\) rounded to the mesh;

  • \(x^{oc}_\oplus \): the outside contraction \(x^{oc}\) rounded to the mesh;

  • \(x^{ic}_\oplus \): the inside contraction \(x^{ic}\) rounded to the mesh.

Notice that the centroid \(x^c\) does not need to be rounded to the mesh, because the NM algorithm never evaluates function values at the centroid.

In the unconstrained NM algorithm, Fig. 1 illustrates the four zones used to compare the trial points. An endpoint of each zone is delimited by either the best simplex vertex \(y^0\) or by the worst one \(y^n\). Now, in the presence of constraints, we redefine these zones so that they reflect the contribution of both the objective function f and the constraint violation function h. In order to determine at which of these trial points the function values will be evaluated, we introduce the following subsets of the simplex \({\mathbb {Y}}\):

$$\begin{aligned} {\mathbb {Y}}^{0}&= \{ y \in {\mathbb {Y}}\; : \; \not \exists \, x \in {\mathbb {Y}} \text{ with } x \prec y\}, \nonumber \\ {\mathbb {Y}}^{n}&= \{ y \in {\mathbb {Y}}\; : \; \not \exists \, x \in {\mathbb {Y}} \text{ with } y \prec x\}. \end{aligned}$$
(3)

The set \({\mathbb {Y}}^{0}\) contains all vertices that are not dominated by any other vertex. The set \({\mathbb {Y}}^{n}\) contains all vertices that do not dominate any other vertex. Neither of these sets can be empty, and they are not necessarily disjoint. The same logic could have been applied to NM for unconstrained optimization: the point \(y^0\) was undominated and \(y^n\) was the simplex point that did not dominate any other vertex.

The following is the counterpart of Definition 2.4, but for constrained optimization. It uses the new definition of dominance (Definition 4.1).

Definition 4.4

(Zones for constrained optimization) Let \({\mathbb {Y}}= \{ y^0, y^1, \ldots , y^n\}\) be an ordered simplex in \({\mathbb {R}}^n\), and let \({\mathbb {Y}}^{0}\) and \({\mathbb {Y}}^{n}\) be the subsets of \({\mathbb {Y}}\) from Equation (3). The trial point \(x\in {\mathbb {R}}^n\)

  • Belongs to the inside contraction zone if \(y^i\prec x\) for some \(y^i \in {\mathbb {Y}}^n\) or if \(h(x) > h(y^n)\);

  • Else, it belongs to the expansion zone if \(x \prec y^i\) for some \(y^i \in {\mathbb {Y}}^0\);

  • Else, it belongs to the reflection zone if x dominates at least 2 points of \({\mathbb {Y}}\);

  • Else, it belongs to the outside contraction zone and x dominates 0 or 1 point of \({\mathbb {Y}}\).

Figure 2 is the counterpart of Fig. 1 in the constrained case. Instead of only depicting the objective function on a real half-line, it considers the half-plane where the abscissa represents the nonnegative constraint violation function h, and the ordinate is the objective function f. Figure 2 illustrates the sets \({\mathbb {Y}}^0\) and \({\mathbb {Y}}^n\) on a simplex with 16 elements. The 6 points represented by the symbol \(\odot \) are the undominated vertices, and the 5 points marked with the other symbol are the vertices that do not dominate any other one.

[Fig. 2: Zones relative to a 16-point simplex]

Algorithmic decisions are based on whether the reflection point \(x^r_\oplus \) dominates a point of \({\mathbb {Y}}^0\) (the expansion zone in Fig. 2), is dominated by a point of \({\mathbb {Y}}^n\) (the inside contraction zone), dominates 2 or more points (the reflection zone), or dominates 0 or 1 point (the outside contraction zone). Recall that Definition 4.1 never compares a feasible point with an infeasible one, and therefore the figure is separated into two parts. The left part of the figure represents the zones for feasible points, i.e., the points for which h equals zero. The rest of the figure concerns infeasible points.
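Under the same illustrative data structure as before (a CPoint with fields f and h, and the dominates function from the sketch in Sect. 4.1), the sets of Equation (3) and the zone test of Definition 4.4 can be sketched as follows; this is only our reading of the rules, not the NOMAD code.

def nondominated(Y):
    """Y^0 of Equation (3): vertices not dominated by any other vertex."""
    return [y for y in Y if not any(dominates(x, y) for x in Y)]

def nondominating(Y):
    """Y^n of Equation (3): vertices that dominate no other vertex."""
    return [y for y in Y if not any(dominates(y, x) for x in Y)]

def zone(x, Y):
    """Classify a trial point x relative to the ordered simplex Y
    (best vertex first, worst vertex y^n last), per Definition 4.4."""
    Y0, Yn, y_worst = nondominated(Y), nondominating(Y), Y[-1]
    if any(dominates(yi, x) for yi in Yn) or x.h > y_worst.h:
        return "inside contraction"
    if any(dominates(x, yi) for yi in Y0):
        return "expansion"
    n_dominated = sum(dominates(x, y) for y in Y)
    return "reflection" if n_dominated >= 2 else "outside contraction"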

The NM search step is compactly written as Algorithm 4.

[Algorithm 4]

There is a close parallel between this NM-search step and Algorithm 1, but there are three important differences. First, due to the operations that round to the mesh, it is possible that, after replacing the vertex \(y^n\), \({\mathbb {Y}}\) no longer forms a simplex. If this is the case, then the NM-search terminates. Second, it is possible that the point proposed to replace the vertex \(y^n\) is a mesh point t that was previously visited (\(t \in V^k\)). In order to prevent cycling, the NM-search is then terminated. Third, there is no need for the shrink parameter \(\gamma \) because the NM-search step terminates instead of shrinking the simplex. There are two reasons for not shrinking the simplex: (i) it could introduce up to n new trial points, which would significantly increase the cost of the search step; (ii) it would frequently generate points that are not on the current mesh.

When the NM-search step terminates, the Mads algorithm proceeds to the poll step as detailed in Algorithm 2. At the next Mads iteration, the NM-search step will be initiated from a new simplex determined by Algorithm 3 with the additional points generated during the poll step.

In addition to the parameters \(\pi _\mathsf{radius}\) and \(\pi _\mathsf{svd}\), we introduce a third algorithmic parameter called \(\pi _\mathsf{eval} \in {\mathbb {N}}\) that limits the number of function evaluations that each NM-search step can perform. In practice, this means that step 4 of the algorithm is invoked as soon as \(\pi _\mathsf{eval}\) function evaluations are performed. The counter is reset at each new NM-search step. For readability, this parameter is not presented in Algorithm 4. In the numerical experiments that follow in the next section, the regular NM parameters are set to the values \(\delta ^e=2\), \(\delta ^{ic}=-\frac{1}{2}\), \(\delta ^{oc}=\frac{1}{2}\), and the Mads algorithm with the NM search step of Algorithm 4 is denoted by Mads-NM.

5 Computational experiments

Computational experiments are conducted using the beta version 3.8.2 of the NOMAD [29] software package, freely available at www.gerad.ca/nomad. All tests use the Mads strategy with \(n+1\) poll directions [6], with or without the use of quadratic models. When both the quadratic model and NM searches are enabled, the former is performed first and the iteration terminates opportunistically in case of a success.

Data profiles [36] are presented below to assess whether the algorithms succeed in generating solution values close to the best objective function values. Identifying a successful run requires a convergence test. We denote by \(x_e\) the best feasible iterate obtained after e evaluations by one of the algorithms on one of the problems. The problem is said to be solved within the convergence tolerance \(\tau \) when

$$\begin{aligned} \bar{f}_\mathrm {fea}-f(x_e) \ge (1-\tau ) \left( \bar{f}_\mathrm {fea}-f^* \right) \end{aligned}$$

where \(f^*\) is the objective function value of the best feasible point obtained by all tested algorithms on all run instances of that problem. The value of \(\bar{f}_\mathrm {fea}\) is a common reference for a given problem, obtained by averaging the objective function values of the first feasible iterates over all run instances of that problem for all algorithms. If no feasible iterate is obtained, the convergence test fails.
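A sketch of this convergence test, with \(\bar{f}_\mathrm {fea}\), \(f^*\) and the current best feasible value passed in explicitly (the bookkeeping that produces them is omitted), is given below.

import math

def solved(f_xe, fbar_fea, f_star, tau):
    """Return True if the best feasible value f(x_e) has closed a fraction
    (1 - tau) of the gap between the reference fbar_fea and the best known
    value f_star; a run with no feasible iterate (f_xe is None) fails."""
    if f_xe is None or math.isinf(f_xe):
        return False
    return fbar_fea - f_xe >= (1.0 - tau) * (fbar_fea - f_star)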

Different instances are obtained by replicating algorithm runs with different pseudo-random generator seeds. In the present work, we consider that different initial points constitute different problems.

The horizontal axis of a data profile represents the number of evaluations for problems of fixed dimension, and groups of \(n+1\) evaluations when problems of different dimensions are involved. The vertical axis corresponds to the proportion of problems solved within a given tolerance \(\tau \). Each algorithm has its own curve, allowing a comparison of the algorithms' ability to converge to the best objective function value.

The methodology of the numerical experiments is as follows.

Tests on analytical problems are conducted in Sect. 5.1 to tune the algorithmic parameters. Section 5.2 compares the resulting algorithm to several well-known implementations of the NM algorithm on unconstrained test problems. Finally, the algorithm is tested in Sects. 5.3, 5.4 and 5.5 on three constrained blackbox engineering test problems.

5.1 Preliminary experiments to calibrate parameters

Numerical experiments on analytical test problems are conducted to set default values to the three algorithmic parameters: The sampling radius \(\pi _\mathsf{radius}\), the singular value threshold \(\pi _\mathsf{svd}\) and the evaluation budget \(\pi _\mathsf{eval}\) of each NM call. In all figures below, the three parameters are compactly written as the triplet \(\pi = ( \pi _\mathsf{radius}, \pi _\mathsf{svd}, \pi _\mathsf{eval})\).

Mads-NM without quadratic models is tested on 87 analytical problems from the optimization literature. The characteristics and sources of these problems are summarized in Table 1. The number of variables ranges from 2 to 20; 19 problems have constraints other than bound constraints.

Table 1 Description of the set of 87 analytical problems

A first set of runs was performed by only varying the singular value threshold \(\pi _\mathsf{svd} \in \{ 0.001, \, 0.005,\) \(0.01, \, 0.05, \, 0.1 \}\) while fixing \(\pi _\mathsf{eval} =20 n\) and \(\pi _\mathsf{radius} =4\). The parameter \(\pi _\mathsf{svd}\) has an important influence on the performance of Mads-NM, and the best performance is obtained with \(\pi _\mathsf{svd} = 0.01\). This value was fixed and the evaluation budget parameter was varied in a second set of runs: \(\pi _\mathsf{eval} \in \{ 5n, \, 10n, \, 20n,\) \(40n, \, 80n, \, 160n\}\). This parameter has a limited influence on the performance for values greater than 20n and \(\pi _\mathsf{eval} = 80n\) is the best value. In a third set of runs, the singular value threshold and evaluation budget parameters are fixed: \(\pi _\mathsf{svd} = 0.01\) and \(\pi _\mathsf{eval} = 80n\), and the sampling radius is varied: \(\pi _\mathsf{radius} \in \{ 1,\,2,\,4,\) \(8,\,16 \}\). Performances are similar for \(\pi _\mathsf{radius} \in \{8,16\}\). We have selected \(\pi _\mathsf{radius}=8\). Finally, varying separately \(\pi _\mathsf{svd}\) and \(\pi _\mathsf{eval}\) shows no improvement. These intensive computational tests were possible because the analytical formulations of the 87 test problems are available.

Figure 3a shows a representative sample of data profiles obtained for three distinct sets of parameters for Mads-NM. Similar plots were obtained with \(\tau =10^{-3}\) and \(\tau =10^{-7}\) but are not presented for readability. These experiments allow us to conclude that

$$\begin{aligned} \pi = (\pi _\mathsf{radius}, \pi _\mathsf{svd}, \pi _\mathsf{eval}) = (8,\, 0.01,\, 80n) \end{aligned}$$

is a satisfactory set of parameters for the problems at hand, and we use these parameters for all remaining tests.

[Fig. 3: Data profiles obtained with convergence tolerance \(\tau =10^{-5}\) on 10 replications of 87 analytical problems, with different NM settings (left) and with quadratic models (right). a Experiments with parameters \(\pi \); b comparison with and without models]

Figure 3b illustrates a second series of tests to evaluate the performance of Mads-NM with and without the use of quadratic models. The results demonstrate that Mads-NM outperforms Mads. Surprisingly, Mads-NM without quadratic models performs better than Mads with quadratic models once the number of function evaluations exceeds approximately \(350\times (n+1)\). The combined use of quadratic models and the NM search dominates all other algorithmic variants on these analytical problems.

5.2 Comparison with NM on unconstrained optimization

The Mads-NM algorithm can be launched on unconstrained optimization problems, and may be compared to Matlab's fminsearch, an implementation [28] of the NM algorithm; fminsearch Adapt (the same Matlab code as fminsearch, but with the expansion, contraction, and shrink parameters depending on the dimension n of the problem, as suggested in [20]); and gridNM, an implementation [10] of the grid-restrained NM algorithm. In order to make this comparison, we extract, from the previous 87 analytical problems, the ones without constraints other than bounds (\(m=0\) in Table 1). This yields a subset of 68 test problems.

With fminsearch, a single initial point is considered, from which an initial simplex is automatically created, and the algorithm does not depend on a random number generator. The same initial simplex selection as in fminsearch is used for the gridNM and fminsearch Adapt algorithms. For a fair comparison, both the Mads and Mads-NM algorithms are run only once on each test problem.

The data profiles in Fig. 4 illustrate that gridNM, fminsearch Adapt, and both Mads and Mads-NM without quadratic models outperform Matlab's fminsearch. Again, Mads-NM is the dominant algorithm.

[Fig. 4: Data profiles using Mads-NM, Mads, fminsearch, fminsearch Adapt, and gridNM with a convergence tolerance of \(\tau =10^{-5}\) on one replication of 68 test problems without constraints other than bounds]

5.3 The LOCKWOOD problems

The Mads and Mads-NM algorithms with quadratic models are tested to solve the basic version of a pump-and-treat groundwater remediation problem from the Montana Lockwood Solvent Groundwater Plume Site [31]. Each initial point defines a LOCKWOOD problem. Solving the problem consists in minimizing the operating costs subject to 2 constraints on the flux of two contaminant plumes obtained from the Bluebird simulator [16]. The problem has 6 design variables bounded in \([0;20000]^6\).

Figure 5 shows the optimization history when solving 20 LOCKWOOD problems from different initial points randomly selected in the hyper-rectangle \([0;20000]^6\). The graphs plot the evolution of the incumbent solution value versus the number of function evaluations. The right-hand-side plot zooms in on the lower objective function values. For 5 initial points, the Mads-NM algorithm fails to reach the feasible region, while Mads fails for 4 of them. However, the figure shows that Mads-NM finds more solutions with a lower objective function value than Mads. This observation is also clearly apparent in the data profiles of Fig. 6.

[Fig. 5: Optimization history on 20 LOCKWOOD problems (right plot is a zoom on low objective values)]

[Fig. 6: Data profiles obtained with convergence tolerance \(\tau \) on 20 LOCKWOOD problems. a \(\tau = 10^{-1}\); b \(\tau = 10^{-2}\)]

5.4 The MDO problems

The Mads and Mads-NM algorithms with quadratic models are tested to solve a simple multidisciplinary wing design optimization problem [44]. Each initial point defines a MDO problem. Solving the problem consists in maximizing the range of an aircraft subject to 10 constraints. The problem has 10 scaled design variables bounded in the hyper-rectangle \([0;100]^{10}\).

Figure 7a shows the optimization history when solving 20 MDO problems from different initial points using 1500 function evaluations or fewer. The initial points are integers randomly selected within the bounds. For 1 initial point, the Mads-NM algorithm fails to reach the feasible region, while Mads fails for 2 of them. The plot of the optimization history and the data profiles of Fig. 7 show that the Mads-NM and Mads algorithms behave similarly. For large numbers of evaluations, Mads-NM solves slightly more problems than Mads.

[Fig. 7: Mads and Mads-NM with quadratic models on 20 MDO problems. a Optimization history; b data profiles with \(\tau =10^{-2}\)]

In contrast with the previous subsections, the numerical experiments on the MDO problems do not reveal a dominant algorithm.

5.5 The STYRENE problems

The Mads and Mads-NM algorithms with quadratic models are tested to optimize a styrene production process [1]. The simulation of the chemical process relies on a series of interdependent calculation blocks using common numerical methods such as Runge–Kutta, Newton and fixed-point iterations, as well as chemistry-related solvers. Once the simulation of the process is successfully completed, the constraints and objective can be evaluated in a post-processing step. The objective is to maximize the net present value of the styrene production process subject to 9 constraints modelling industrial and environmental regulations. The simulation and post-processing are combined in a blackbox. If the simulation of the chemical process fails for an iterate, a blackbox evaluation failure is returned.

In this work, a STYRENE problem possesses 8 independent variables influencing the styrene production process. The variables considered during optimization are all scaled and bounded in \(X= [0;100]^8\). As a reference, we use the initial point \(x^\text {ref}\) provided in [1]. From this reference point, 20 STYRENE problems have been created by randomly generating integer initial points in a hyper-rectangle of radius 10 centred at \(x^\text {ref}\) (the points are projected onto X if necessary). Both the Mads and Mads-NM algorithms reach the feasible region from every initial point. The history plots of the Mads and Mads-NM runs using 1500 evaluations or fewer are presented in Fig. 8. Different local minimizers are reached by the algorithms, but the figure does not clearly indicate which algorithm is preferable.

[Fig. 8: Optimization history on 20 STYRENE problems]

Data profiles with convergence tolerances of \(\tau =10^{-2}\) and \(10^{-3}\) are presented in Fig. 9. The profiles with \(\tau = 10^{-3}\) show a dominance of Mads-NM over the Mads algorithm.

[Fig. 9: Data profiles obtained with two convergence tolerances on 20 STYRENE problems. a \(\tau =10^{-2}\); b \(\tau =10^{-3}\)]

6 Discussion

The paper introduces a way to extend the NM direct search algorithm so that it handles general inequality constraints. This is achieved by defining a NM search step within the Mads algorithm. The NM search points form simplices, ordered by a transitive relation involving both the objective and constraint violation function values. The simplices are reflected, expanded and contracted as in the original algorithm, except that the trial points are rounded to the current mesh. This last operation is necessary in order to inherit the rich hierarchical convergence analysis of Mads. Each NM search step is interrupted as soon as the vertices fail to form a simplex, or when a vertex is replaced by a previously visited point. The resulting algorithm is applicable to both unconstrained and inequality constrained blackbox optimization problems.

Numerical experiments on unconstrained optimization problems show that the resulting Mads-NM algorithm outperforms the Matlab implementations of NM (fminsearch and fminsearch Adapt), the gridNM algorithm, as well as the previous implementation of Mads with and without the use of quadratic models. Experiments on three constrained engineering problems suggest that the behaviour of Mads-NM and Mads is comparable during the first few hundred function evaluations. But as the number of function evaluations grows, the Mads-NM algorithm typically finds better feasible solutions than Mads. The difference in the percentage of problems solved in the data profiles is approximately \(5\%\) for the MDO problems (with \(\tau = 10^{-2}\)), \(25\%\) for the STYRENE problems (with \(\tau = 10^{-3}\)) and more than \(50\%\) for the LOCKWOOD problems (with \(\tau = 10^{-2}\)).

The beta version 3.8.2 of the NOMAD blackbox optimization software uses the NM search step as described in the present paper, and contains all test problems considered with their reference initial point.