1 Introduction

The proposed solver searches for a global minimum of the optimization problem

$$\begin{aligned} \left\{ \begin{array}{rc} \displaystyle \min _x &{} f(x)\\ \text {s.t.:} &{} l^c \le c(x) \le u^c, \\ &{} l^h \le h(x) \le u^h, \\ &{} l^x \le x \le u^x, \\ \end{array} \right. \end{aligned}$$
(1)

where \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) might be a black box, \(c: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^q\) are black-box constraint functions and \(h: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^l\) are white-box constraint functions, i.e. their analytical expressions as well as their derivatives are available. The vectors \(l^c\), \(l^h\), \(u^c\) and \(u^h\) are lower and upper bounds on the constraint values c(x) and h(x), while \(l^x\) and \(u^x\) are bounds on the x variables, with \(l^c \in ({\mathbb {R}} \cup -\infty )^q\), \(l^h \in ({\mathbb {R}} \cup -\infty )^l\), \(u^c \in ({\mathbb {R}} \cup \infty )^q\), \(u^h \in ({\mathbb {R}} \cup \infty )^l\), \(l^x \in ({\mathbb {R}} \cup -\infty )^n\) and \(u^x \in ({\mathbb {R}} \cup \infty )^n\). We also address the case where there are no white-box constraint functions h. Finally, we assume that the bound constraints are unrelaxable, i.e. feasibility with respect to them must be maintained throughout the iterations, while the other general constraints are relaxable. The definition of relaxable constraint employed in this paper follows the one proposed by Digabel and Wild (2015): a relaxable constraint is a constraint that does not need to be satisfied for the simulations to produce meaningful outputs for computing the objective and the constraints.

When at least one of the functions in (1) has a closed form (i.e. either the objective function is a white box or there is at least one white-box constraint function in the problem), it is said to be a grey-box problem. If no information about the functions is given at all, which means that the objective function is a black box and that there are no white-box constraints, the problem is known as a black-box problem. Both grey-box and black-box optimization belong to the field of derivative-free optimization (DFO) (Audet and Hare 2017; Conn et al. 2009), where the derivatives of the functions are not available. DFO problems are encountered in real-life applications in various fields such as engineering, medicine, science and artificial intelligence. The black boxes are often the result of an expensive simulation or a proprietary code, in which case automatic differentiation (Griewank 2003; Griewank and Walther 2008) is not applicable.

Many optimization methods have been developed for finding stationary points or local minima of (1) when both the objective and the constraint functions are black boxes (e.g., Amaioua et al. 2018; Audet et al. 2015; Bueno et al. 2013; Echebest et al. 2017; Lewis and Torczon 2002; Powell 1994; Sampaio and Toint 2015, 2016). However, a local minimum is sometimes not enough, and one must then search for a global minimum (Floudas et al. 1999; Floudas 2000). Only a few methods have been proposed to find a global minimum of black-box problems with nonlinear constraints (see, for instance, Boukouvala et al. 2017; Jones et al. 1998; Regis and Shoemaker 2005; Regis 2011, 2014; Regis and Shoemaker 2007). Moreover, many global optimization methods for constrained black-box problems proposed in the literature or used in industry are unavailable to the public and are not open source. We refer the reader to the survey papers by Rios and Sahinidis (2013) and by Larson et al. (2019) and to the textbooks by Conn et al. (2009) and by Audet and Hare (2017) for a comprehensive review of DFO algorithms for different types of problems.

In the case of constrained grey-box problems, especially those found in industrial applications, it is worthwhile to exploit the available information about the white boxes, because such problems are usually hard to solve (i.e. highly nonlinear, multimodal and with very expensive functions) and the available derivatives can be very helpful to drive the optimization solver towards local and global minima. Therefore, one would expect the solver to use any information given as input in order to attain the global minimum as fast as possible. Unfortunately, even fewer global optimization solvers exist for such problems today. A common approach of optimization researchers, engineers and practitioners is to consider all the functions as black boxes and to use a black-box optimization algorithm to solve the problem. Two of the few methods that exploit the available information are ARGONAUT (Boukouvala et al. 2017) and a trust-region two-phase algorithm proposed by Bajaj et al. (2018). In ARGONAUT, the black-box functions are replaced by surrogate models and a global optimization algorithm is used to solve the problem to global optimality in order to find a lower bound, while a local optimization algorithm is applied with different starting points to find upper bounds. The surrogate models are updated only after the resolution of the problem, by using the function values of the global minimum and local minima found by the solver within a clustering procedure that defines new sample points. After updating the models, the problem is solved again with the updated models, and this process is repeated until convergence is declared. In the two-phase algorithm described by Bajaj et al. (2018), radial basis functions (RBF) are used as surrogate models for the black-box functions. Moreover, as in DEFT-FUNNEL, the self-correcting geometry approach proposed by Scheinberg and Toint (2010) is applied to the management of the interpolation set. The algorithm of Bajaj et al. (2018) is composed of a feasibility phase, where the goal is to find a feasible point, and an optimization phase, where the feasible point found in the first phase is used as a starting point to find a global minimum.

The algorithm from Bajaj et al. (2018) is the one sharing the most elements in common with DEFT-FUNNEL. However, the two methods differ in that DEFT-FUNNEL combines a multistart strategy with a sequential quadratic optimization (SQO) algorithm in order to find a global minimum, while the algorithm from Bajaj et al. (2018) applies a global optimization solver. Furthermore, DEFT-FUNNEL employs polynomial models rather than RBF models, since the former are well suited for the SQO algorithm used in its local search. Despite the good performance of the algorithms proposed in Boukouvala et al. (2017) and Bajaj et al. (2018), neither is freely available or open source.

Contributions. This paper proposes a new global optimization solver for general constrained grey-box and black-box problems, written in Matlab (MATLAB 2015b), that exploits any white-box functions given as inputs and that employs surrogate models built from polynomial interpolation in a trust-region-based SQO algorithm. Unlike the ARGONAUT approach, the surrogate models are updated during the optimization process as soon as new information from the evaluation of the functions at the iterates becomes available. Furthermore, the proposed solver, named DEFT-FUNNEL, is open source and freely available at the GitHub repository http://github.com/phrsampaio/deft-funnel. It is based on the works by Sampaio and Toint (2015, 2016) and extends their original DFO algorithm to grey-box problems and to the search for a global minimum. As with its previous versions, it does not require feasible starting points. To our knowledge, DEFT-FUNNEL is the first open-source global optimization solver for general constrained grey-box and black-box problems that exploits the derivative information available from the white-box functions. It is also the first of the class of trust-funnel algorithms (Gould and Toint 2010) to be used in the search for global minima in both derivative-based and derivative-free optimization.

This paper also serves as the first release of the DEFT-FUNNEL code. Notice that some modifications and additions have been made to the local search SQO algorithm with respect to the one presented in Sampaio and Toint (2016). In particular, changes were made to the condition for the normal step calculation, to the criticality step and to the maintenance of the interpolation set, all of them described in due course. Furthermore, we have also added a second-order correction step.

The extension to global optimization is based on the multi-level single linkage (MLSL) method (Kan and Timmer 1987a, b), a well-known stochastic multistart strategy that combines random sampling, a clustering-based approach for the selection of the starting points and local searches in order to identify all local minima under specific conditions. In DEFT-FUNNEL, it is used for selecting the starting points of the local searches done with the trust-funnel SQO algorithm.

Organization. The outline of this paper is as follows. Section 2 introduces the MLSL method while in Sect. 3 the DEFT-FUNNEL solver is presented in detail. In Sect. 4, numerical results on a set of benchmark problems for global optimization are shown and the performance of DEFT-FUNNEL is compared with those of other state-of-the-art algorithms in a black-box setting. Moreover, numerical results on a set of grey-box problems are also analyzed. Finally, some conclusions about the proposed solver are drawn in Sect. 5.

Notation. Unless otherwise specified, the norm \(\Vert \cdot \Vert \) is the standard Euclidean norm. Given any vector \(x\in {\mathbb {R}}^n\), we denote its i-th component by \(\left[ x\right] _i\). We define \(\left[ x\right] ^+=\max (0,x)\), where the \(\max \) operation is done componentwise. We let \(\mathcal{B}(z; \varDelta )\) denote the closed Euclidean ball centered at z, with radius \(\varDelta > 0\). Given any set \(\mathcal{A}\), \(|\mathcal{A}|\) denotes the cardinality of \(\mathcal{A}\). By \(\mathcal{P}_n^d\), we mean the space of all polynomials of degree at most d in \({\mathbb {R}}^n\). The Greek symbol \(\pi \) is used for the mathematical constant, also referred to as Archimedes’ constant, as well as for defining the optimality measures of the normal and tangent subproblems, \(\pi _k^v\) and \(\pi _k^f\), respectively — both cases are explicitly stated to avoid confusion. Finally, given any subspace \(\mathcal{S}\), we denote its dimension by \(dim(\mathcal{S})\).

2 Multi-level single linkage

The MLSL method (Kan and Timmer 1987a, b) is a stochastic multistart strategy originally designed for bound-constrained global optimization problems of the form

$$\begin{aligned} \left\{ \begin{array}{rc} \displaystyle \min &{} f(x)\\ \text {s.t.:} &{} x\in \varOmega , \\ \end{array} \right. \end{aligned}$$
(2)

where \(\varOmega \subseteq {\mathbb {R}}^n\) is a convex, compact set containing global minima in its interior and is defined by lower and upper bounds. It was later extended to problems with general constraints by Sendín et al. (2009). As in most stochastic multistart methods, MLSL consists of a global phase, where random points are sampled from a probabilistic distribution, and a local phase, where selected points from the global phase are used as starting points for local searches. MLSL aims at avoiding unnecessary and costly local searches that culminate in the same local minimum. To achieve this goal, sample points are drawn from a uniform distribution in the global phase, and a local search procedure is then applied to each of them unless there is another sample point within a critical distance that has a smaller objective function value, or unless the current point is a previously detected local minimum. The method is fully described in Algorithm 2.1.

figure a

The ranking of the sample points at line 4 is optional and does not affect the convergence properties of the algorithm. It is mainly useful when \(\gamma <1\), which means that preference is given to the \(\gamma kN\) starting points having the lowest objective function values, as one might believe that they are more likely to be closer to a local minimum than those points with higher objective function values.

The critical distance \(r_k\) is defined as

$$\begin{aligned} r_k {\mathop {=}\limits ^\mathrm{def}}\pi ^{-1/2} \left( \varGamma \left( 1 + \frac{n}{2}\right) m(\varOmega ) \frac{\sigma \log kN}{kN} \right) ^{1/n}, \end{aligned}$$
(3)

for some \(\sigma >0\), where \(\pi \) here is Archimedes’ constant, \(\varGamma \) is the gamma function defined by \(\varGamma (x) = \int _0^\infty e^{-t}t^{x-1} \, dt\) and \(m(\varOmega )\) is the Lebesgue measure of the set \(\varOmega \). The details of this formula are elaborated on in Kan and Timmer (1987a, 1987b) and are beyond the scope of this paper. The method is centred on the idea of exploring the region of attraction of all local minima, which is formally defined below.
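The critical distance (3) and the MLSL selection rule are easy to sketch in code. The following is a minimal illustration, not the DEFT-FUNNEL implementation; the function names and the way points are stored are hypothetical:

```python
import math
import numpy as np

def critical_distance(k, N, n, measure_omega, sigma=4.0):
    """Critical distance r_k from Eq. (3); sigma > 4 yields Property 1."""
    return math.pi ** -0.5 * (
        math.gamma(1.0 + n / 2.0) * measure_omega
        * sigma * math.log(k * N) / (k * N)
    ) ** (1.0 / n)

def select_start_points(points, fvals, r_k):
    """MLSL rule: start a local search from a sample point only if no other
    sample point within distance r_k has a smaller objective value."""
    starts = []
    for i, (x, fx) in enumerate(zip(points, fvals)):
        dominated = any(
            j != i and fvals[j] < fx
            and np.linalg.norm(points[j] - x) <= r_k
            for j in range(len(points))
        )
        if not dominated:
            starts.append(i)
    return starts
```

Note that \(r_k \rightarrow 0\) as \(k \rightarrow \infty\), so the filter becomes less and less restrictive, which is what Property 2 relies on.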

Definition 1

Given a local search procedure \(\mathcal{P}\), we define a region of attraction \(\mathcal{R}(x^*)\) in \(\varOmega \) to be the set of all points in \(\varOmega \) starting from which \(\mathcal{P}\) will arrive at \(x^*\).

The ideal multistart method is the one that runs a local search only once at the region of attraction of every local minimum. However, two types of errors might occur in practice (Locatelli 1998):

  • Error 1. The same local minimum \(x^*\) has been found after applying local search to two or more points belonging to the same region of attraction of \(x^*\).

  • Error 2. The region of attraction of a local minimum \(x^*\) contains at least one sampled point, but local search has never been applied to points in this region.

In Kan and Timmer (1987a, 1987b), the authors demonstrate the following theoretical properties of MLSL that are directly linked to the errors above:

  • Property 1. [Theorem 8 in Kan and Timmer (1987a) and Theorem 1 in Kan and Timmer (1987b)] If \(\sigma >4\) in (3), then, even if the sampling continues forever, the total number of local searches ever started by MLSL is finite with probability 1.

  • Property 2. [Theorem 12 in Kan and Timmer (1987a) and Theorem 2 in Kan and Timmer (1987b)] If \(r_k\) tends to 0 with increasing k, then any local minimum \(x^*\) will be found within a finite number of iterations with probability 1.

Property 1 states that the number of possible occurrences of Error 1 is finite, while Property 2 says that Error 2 never happens. Due to its strong theoretical results and good practical performance, MLSL has become one of the most reliable and popular multistart methods.

Finally, we note that MLSL has already been applied to global black-box optimization problems with general constraints (Armstrong and Favorite 2016; Sendín et al. 2009). In particular, Armstrong and Favorite (2016) propose the method MLSL-MADS, which integrates MLSL with a mesh adaptive direct search (MADS) method (Audet and Dennis 2006) to find multiple local minima of an inverse transport problem involving black-box functions.

3 The DEFT-FUNNEL solver

First, the function \(z: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^{q+l}\) is defined as \(z(x) {\mathop {=}\limits ^\mathrm{def}}(c(x), h(x))\), i.e. it includes all the constraint functions of the original problem (1). Then, by defining \(f(x,s) {\mathop {=}\limits ^\mathrm{def}}f(x)\) and \(z(x,s) {\mathop {=}\limits ^\mathrm{def}}z(x) - s\), the problem (1) is rewritten as

$$\begin{aligned} \left\{ \begin{array}{rc} \displaystyle \min _{(x,s)} &{} f(x,s)\\ \text {s.t.:} &{} z(x,s) = 0, \\ &{} l^s \le s \le u^s, \\ &{} l^x \le x \le u^x, \\ \end{array} \right. \end{aligned}$$
(4)

where \(s\in {\mathbb {R}}^{q+l}\) are slack variables and \(l^s\in ({\mathbb {R}} \cup -\infty )^{q+l}\) and \(u^s\in ({\mathbb {R}} \cup \infty )^{q+l}\) are the lower and upper bounds of the modified problem, with \(l^s = \left[ l^c \; l^h\right] ^T\) and \(u^s = \left[ u^c \; u^h\right] ^T\). We highlight that the rewriting of the original problem (1) as (4) is done within the solver and that the user does not need to intervene.
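The reformulation (1) → (4) amounts to stacking the black-box and white-box constraints into z and introducing slacks. A minimal sketch, with hypothetical names (this is not the solver's internal routine):

```python
import numpy as np

def make_slack_problem(c, h, lc, uc, lh, uh):
    """Rewrite (1) as (4): stack z(x) = (c(x), h(x)), introduce slacks s
    with z(x) - s = 0 and l^s <= s <= u^s."""
    def z(x):
        return np.concatenate([np.atleast_1d(c(x)), np.atleast_1d(h(x))])
    ls = np.concatenate([lc, lh])   # l^s = [l^c; l^h]
    us = np.concatenate([uc, uh])   # u^s = [u^c; u^h]
    def residual(x, s):
        return z(x) - s             # equality constraint z(x, s) = 0
    return residual, ls, us
```

Any (x, s) with residual(x, s) = 0 and s within its bounds corresponds to a point x satisfying the general constraints of (1).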

DEFT-FUNNEL is composed of a global search and a local search that are combined to solve the problem (4). In the next two sections, we elaborate on each of these search steps.

3.1 Global search

As mentioned previously, the global search in DEFT-FUNNEL relies on the MLSL multistart strategy. However, unlike Algorithm 2.1, the global search in DEFT-FUNNEL makes use of a merit function \(\varPhi \) rather than the objective function f in order to decide which starting points are selected for the local search. Moreover, the sampling of N random points is done within the set \(\left\{ x \in {\mathbb {R}}^n \; | \; l^x \le x \le u^x\right\} \). If there exists an index i such that \(l_i^x\) is not defined by the user, the negative value \(-10\) is used by default during the sampling. Analogously, if there exists an index i such that \(u_i^x\) is not defined by the user, the positive value 10 is used by default.

The global search is implemented in the function deft_funnel_multistart, which is called by typing the following line in the Matlab command window:

figure b

The inputs and outputs of deft_funnel_multistart are detailed in Table 1.

Table 1 Inputs and outputs of the function deft_funnel_multistart

The merit function \(\varPhi \) employed in DEFT-FUNNEL is the well-known \(\ell _1\) penalty function, which is defined as follows:

$$\begin{aligned} \varPhi (x) {\mathop {=}\limits ^\mathrm{def}}f(x) + \lambda \displaystyle \sum _{i=1}^{m}\left( \left[ z_i(x) - \left[ u^s\right] _i\right] ^+ + \left[ \left[ l^s\right] _i - z_i(x)\right] ^+\right) , \end{aligned}$$
(5)

where \(m = q+l\) is the total number of constraints excluding bound constraints on x, \(\lambda \) is the penalty parameter and \(z_i(x) = (c_i(x), h_i(x))\), \(i=1,\ldots ,m\). One of the advantages of this penalty function over others is that it is exact, that is, for sufficiently large values of \(\lambda \), a local minimum of \(\varPhi \) subject only to the bound constraints on the x variables is also a local minimum of the original constrained problem (1). Note that, although \(\varPhi \) is nondifferentiable, it is only used in the global search for selecting the starting points for the local searches.
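As an illustration, (5) only penalizes the amount by which z(x) leaves its bounds; points satisfying the general constraints incur no penalty. A small sketch, with lam standing for \(\lambda\) (names hypothetical):

```python
import numpy as np

def merit_phi(x, f, z, ls, us, lam=100.0):
    """L1 penalty (5): f(x) plus lam times the total violation of the
    bounds l^s <= z(x) <= u^s (componentwise hinge terms)."""
    zx = z(x)
    viol = np.maximum(zx - us, 0.0) + np.maximum(ls - zx, 0.0)
    return f(x) + lam * viol.sum()
```

For a feasible x the penalty term vanishes and merit_phi reduces to f(x).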

The LocalSearch algorithm at line 7 is started by calling the function deft_funnel whose inputs and outputs are given in the next section.

3.2 Local search

Before describing in detail each component of the local search, we give a global view of its full algorithm in what follows. The algorithm is based on the one described in Sampaio and Toint (2016). It is a trust-region SQO method that makes use of a funnel upper bound \(v_k^{\max }\) on the infeasibility of the iterates in order to ensure convergence. At each iteration k, the iterate is defined by the point \((x_k,s_k)\), where \(x_k\in \mathcal{Y}_k\), with \(\mathcal{Y}_k\) being the interpolation set at iteration k. Every iterate satisfies the following bound constraints:

$$\begin{aligned} l^s\le & {} s_k \le u^s, \end{aligned}$$
(6)
$$\begin{aligned} l^x\le & {} x_k \le u^x. \end{aligned}$$
(7)

At each iteration, the local search algorithm checks if there are (nearly) active bounds at \(x_k\). If there is at least one active bound and if some conditions are satisfied, the local search algorithm calls itself recursively with a new subspace built from the active bound(s) as well as a new interpolation set that belongs to the new subspace. This step is done by the SubspaceMinimization subroutine, which is detailed in Sect. 3.2.2.

Next, the subroutine CriticalityStep checks if either the trust-region size is too small or the last search direction is too small. If neither holds, it checks if both feasibility and optimality have been achieved in the current subspace. If any of these criticality conditions is satisfied, the local search returns the current iterate \((x_k,s_k)\). The subroutine CriticalityStep is discussed in depth in Sect. 3.2.7.

If criticality has not been attained at \((x_k,s_k)\), a new step \(d_k {\mathop {=}\limits ^\mathrm{def}}(d^x_k,d^s_k)^T\) is computed. Each full step of the trust-funnel algorithm is decomposed as

$$\begin{aligned} d_k = \left( \begin{array}{c} d^x_k \\ d^s_k \end{array} \right) = \left( \begin{array}{c} n^x_k \\ n^s_k \end{array} \right) + \left( \begin{array}{c} t^x_k \\ t^s_k \end{array} \right) = n_k + t_k, \end{aligned}$$
(8)

where the normal step component \(n_k\) aims to improve feasibility and the tangent step component \(t_k\) reduces the objective function model without worsening the constraint violation up to first order. This is done by requiring the tangent step to lie in the null space of the approximated Jacobian of the constraints and by requiring the predicted improvement in the objective function obtained in the tangent step to not be negligible compared to the predicted change in f resulting from the normal step. The full composite step \(d_k\) is illustrated in Fig. 1. As it is explained in the next subsections, the computation of the composite step in the proposed algorithm does not involve the function z itself but rather its surrogate model.

Fig. 1

Illustration of the composite step \(d_k = n_k + t_k\). The normal step \(n_k\) attempts to improve feasibility by reducing the linearized constraint violation at \((x_k, s_k)\), whereas the tangent step aims at minimizing the objective function without deteriorating the gains in feasibility obtained through the normal step. Here, \(A(x_k,s_k)\) denotes the Jacobian of z(x, s) at \((x_k,s_k)\)

The step \(n_k\) is computed by the subroutine NormalStep, which considers a trust-region bound \(\varDelta _k^z\) whose update rule is based on the improvement on feasibility obtained at iteration k. The computation of the normal step is fully described in Sect. 3.2.3. The step \(t_k\) is computed by the subroutine TangentStep, which considers the trust-region bound \(\varDelta _k=\min [\varDelta ^f_k,\varDelta ^z_k]\), where \(\varDelta ^f_k\) is a trust-region bound that is updated according to the improvement on optimality obtained at iteration k. The calculation of the tangent step is described in detail in Sect. 3.2.4.
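A stripped-down sketch of the composite step (8) may help fix ideas. In the actual algorithm both components are computed from surrogate models and subject to the bounds (6)–(7); here we ignore the bounds, use a given Jacobian A and objective-model gradient g, and take a minimum-norm Gauss–Newton normal step followed by a projected steepest-descent tangent step (names hypothetical):

```python
import numpy as np

def composite_step(A, z_val, g, delta_z, delta_f):
    """Sketch of d_k = n_k + t_k: a minimum-norm normal step toward z = 0,
    then a tangent step in the null space of A, each truncated to its
    trust-region radius."""
    # Normal step: minimum-norm solution of A n = -z_val (feasibility).
    n = -np.linalg.pinv(A) @ z_val
    if np.linalg.norm(n) > delta_z:
        n *= delta_z / np.linalg.norm(n)
    # Tangent step: steepest descent on the objective model, projected
    # onto the null space of A so feasibility is kept to first order.
    P = np.eye(A.shape[1]) - np.linalg.pinv(A) @ A
    t = -P @ g
    delta = min(delta_f, delta_z)
    if np.linalg.norm(t) > delta:
        t *= delta / np.linalg.norm(t)
    return n + t, n, t
```

Since A t = 0 by construction, the tangent step does not undo the linearized feasibility gained by the normal step, which is the defining property of the decomposition.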

Depending on the contributions of the current iteration in terms of optimality and feasibility, the iteration is classified into one of three types: \(\mu \)-iteration, f-iteration and z-iteration. If \(d_k=0\), no contribution has been made to optimality or feasibility, and thus iteration k is said to be a \(\mu \)-iteration. If iteration k has mainly contributed to optimality, it is said to be an f-iteration. Otherwise, it is defined as a z-iteration.

After having computed a trial point \(x_k + d_k\), the algorithm proceeds by checking the iteration type and whether the iteration was successful in a sense to be defined in Sect. 3.2.6. The iterate and the trust regions are then updated correspondingly, while the interpolation set is updated by the subroutine UpdateInterpolationSet according to a self-correcting geometry scheme described in Sect. 3.3. The subroutine \(\underline{f-\hbox {iteration}}\) is responsible for updating the trust regions \(\varDelta ^f_k\) and \(\varDelta ^z_k\) if iteration k is an f-iteration, while the subroutine \(\underline{z-\hbox {iteration}}\) does the same for \(\varDelta ^z_k\) if iteration k is a z-iteration. The details of both subroutines are given in Sects. 3.2.6 and 3.3. Finally, if \(\mathcal{Y}_k\) has been modified, the surrogate models are updated to satisfy the interpolation conditions for the new set \(\mathcal{Y}_{k+1}\), implying that new function evaluations are carried out for the additional point obtained at iteration k.

The complete local search algorithm is presented below, while its subroutines are fully explained in the next sections.

figure c

The index k indicates the iteration number, which is incremented at the end of the iteration, at Step 8. However, whenever a recursive call takes place within SubspaceMinimization at Step 1, the iteration index k continues to be incremented within the new subspace. This means that the current value of k is passed as input to LocalSearch and that its final value is also returned by LocalSearch.

As explained in Sect. 3.2.7, the index i is incremented in CriticalityStep within an inner loop that aims at ensuring that the derivative information of the models is not too different from that of the original functions. Since i and k are incremented differently, they are not necessarily equal.

The local search is started inside the multistart strategy loop in the global search, but it can also be called directly by the user in order to use DEFT-FUNNEL without multistart. This is done by typing the following line at the Matlab command window:

figure d

The inputs and outputs of the function deft_funnel are detailed below in Table 2. Many other additional parameters can be set directly in the function deft_funnel_set_parameters. Those are related to the trust-region mechanism, to the interpolation set maintenance and to criticality step thresholds.

Table 2 Inputs and outputs of the function deft_funnel

Once the initial interpolation set has been built using one of the methods described in the next subsection, the algorithm calls the function deft_funnel_main. In fact, deft_funnel serves only as a wrapper for the main function of the local search, deft_funnel_main, performing all the data preprocessing and parameter setting needed in the initialization process. All the main steps, such as the subspace minimization step, the criticality step and the computation of the new directions, are handled by deft_funnel_main.

3.2.1 Building the surrogate models

The local search algorithm starts by building an initial interpolation set, either from a simplex or by drawing samples from a uniform distribution. The construction of the interpolation set is done in the function deft_funnel_build_initial_sample_set, which is called only once during a local search within deft_funnel. The choice between random sampling and simplex is made in deft_funnel when calling the function deft_funnel_set_parameters, which defines the majority of the parameters of the local search. If random sampling is chosen, the algorithm checks whether the resulting interpolation set is well poised and, if not, the set is updated using Algorithm 6.3 described in Chapter 6 of Conn et al. (2009), which is implemented in deft_funnel_repair_Y.

For the sake of simplicity, we assume henceforth that the objective function is also a black box. Let \(\mathcal{Y}_0=\{y^0,y^1,\ldots ,y^p\}\) be a poised set of sample points with an initial point \(x_0\in \mathcal{Y}_0\), where \(p+1\) denotes the cardinality of \(\mathcal{Y}_k\). As described in Sect. 3.3, p can increase over the iterations as the interpolation set is augmented with new trial points. The next step of our algorithm is to replace the objective function f(x) and the black-box constraint functions \(c(x)=(c_1(x),c_2(x),\ldots ,c_q(x))\) by surrogate models \(m^f(x)\) and \(m^c(x)=(m^{c_1}(x),m^{c_2}(x),\ldots ,m^{c_q}(x))\), respectively, built from the solution of the interpolation system

$$\begin{aligned} M(\phi , \mathcal{Y})\alpha _{\phi } = \varUpsilon (\mathcal{Y}), \end{aligned}$$
(9)

where

$$\begin{aligned} M(\phi ,\mathcal{Y}) = \begin{pmatrix} \phi _{0}(y^0) &{}\quad \phi _{1}(y^0) &{}\quad \cdots &{}\quad \phi _{b}(y^0) \\ \phi _{0}(y^1) &{}\quad \phi _{1}(y^1) &{}\quad \cdots &{}\quad \phi _{b}(y^1) \\ \vdots &{} \vdots &{}\quad \ddots &{} \vdots \\ \phi _{0}(y^p) &{}\quad \phi _{1}(y^p) &{}\quad \cdots &{}\quad \phi _{b}(y^p) \end{pmatrix}, \;\;\varUpsilon (\mathcal{Y}) = \begin{pmatrix} \varUpsilon (y^0) \\ \varUpsilon (y^1) \\ \vdots \\ \varUpsilon (y^p) \end{pmatrix}, \end{aligned}$$

where \(\phi = \{\phi _0, \dots , \phi _b\}\) is the basis of monomials in \(\mathcal{P}_n^d\) and \(\varUpsilon (x)\) is replaced by the objective function f(x), if building \(m^f(x)\), or some black-box constraint function \(c_j(x)\), if building \(m^{c_j}\), for \(j=1,\ldots ,q\).

If \(M(\phi ,\mathcal{Y})\) is square, (9) becomes an interpolation problem. If, in addition, \(M(\phi ,\mathcal{Y})\) is nonsingular for some basis \(\phi \in \mathcal{P}_n^d\), the set \(\mathcal{Y}\) is said to be poised for interpolation in \({\mathbb {R}}^n\). Poisedness is important since, given a function \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\) and a poised set \(\mathcal{Y}\), it follows that the interpolating polynomial m(x) exists and is unique (see Conn et al. 2009, Lemma 3.2). In the underdetermined case, \(p<b\), the set \(\mathcal{Y}\) is said to be poised if \(M(\phi ,\mathcal{Y})\) has full row rank. As for the regression case, the set \(\mathcal{Y}\) is said to be poised for polynomial least-squares regression in \({\mathbb {R}}^n\) if the corresponding matrix \(M(\phi ,\mathcal{Y})\) has full column rank for some basis \(\phi \in \mathcal{P}_n^d\).
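In the square, poised case, solving (9) reduces to a single linear solve. A minimal sketch using the linear monomial basis \(\phi = \{1, x_1, \ldots, x_n\}\) (hypothetical names; this is only an illustration of the interpolation conditions, not the solver's routine):

```python
import numpy as np

def monomial_basis_linear(y):
    """Linear monomial basis phi(y) = (1, [y]_1, ..., [y]_n)."""
    return np.concatenate([[1.0], y])

def build_linear_model(Y, fvals):
    """Solve the square interpolation system (9): rows of M are phi(y^i),
    and the model is m(x) = alpha^T phi(x) with m(y^i) = f(y^i).
    Y must contain n+1 affinely independent points."""
    M = np.array([monomial_basis_linear(y) for y in Y])
    alpha = np.linalg.solve(M, np.asarray(fvals))
    def m(x):
        return alpha @ monomial_basis_linear(np.asarray(x))
    return m, alpha
```

By construction m reproduces the function values at every point of Y, which is exactly the interpolation condition that (9) encodes.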

We consider underdetermined quadratic interpolation models that are fully linear and that are enhanced with curvature information along the optimization process. Since the linear system (9) is potentially underdetermined, the resulting interpolating polynomials may not be unique, and so we provide the user with four approaches to construct the models \(m^f(x)\) and \(m^{c_j}(x)\), which can be chosen by passing a number from 1 to 4 to the input ‘whichmodel’: 1 - subbasis selection approach; 2 - minimum \(\ell _2\)-norm models; 3 - minimum Frobenius norm models; and 4 - regression (recommended for noisy functions).

The subbasis selection approach consists of considering only \(p+1\) columns of the matrix \(M(\phi ,\mathcal{Y})\) in the linear system (9) in order to have a square matrix, which is equivalent to choosing a subbasis \({\hat{\phi }}\) of \(\phi \) with only \(p+1\) elements. This is done in DEFT-FUNNEL by selecting the first \(p+1\) columns. In the second approach, the minimum \(\ell _2\)-norm solution of the system (9) is computed. In the third approach, we minimize the Frobenius norm of the Hessian of the surrogate model m(x), since it plays an important role in the error bounds for quadratic models (see Conn et al. 2009, Theorem 5.4). This is done by solving the following optimization problem in \(\alpha _{\phi }=(\alpha _L,\alpha _Q)\), where \(\alpha _L\) and \(\phi _L\) are related to the linear components of the natural basis \(\phi \), while \(\alpha _Q\) and \(\phi _Q\) are related to the quadratic ones:

$$\begin{aligned} \displaystyle \min&{\scriptstyle \frac{1}{2}}\Vert \alpha _Q\Vert _2^2 \nonumber \\ \text {s.t.:}&M(\phi _L, \mathcal{Y})\alpha _L + M(\phi _Q, \mathcal{Y})\alpha _Q = \varUpsilon (\mathcal{Y}). \end{aligned}$$
(10)

Finally, regression models are built by obtaining the least-squares solution of the system (9). Further details about each approach can be found in Conn et al. (2009). In DEFT-FUNNEL, minimum \(\ell _2\)-norm models are the default option, since they outperformed the others in past numerical experiments (Sampaio and Toint 2015, 2016).
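For the underdetermined case, the minimum \(\ell _2\)-norm coefficients (the default option) can be obtained with the pseudoinverse, which returns the smallest-norm solution among all exact solutions of a consistent system. A sketch, assuming M is the (underdetermined, full-row-rank) matrix of (9):

```python
import numpy as np

def min_norm_model(M, fvals):
    """Minimum l2-norm solution of the underdetermined system (9):
    among all alpha with M @ alpha = fvals, the pseudoinverse yields
    the one of smallest Euclidean norm."""
    return np.linalg.pinv(M) @ np.asarray(fvals)
```

The regression option corresponds to the overdetermined counterpart, where the same least-squares machinery minimizes the residual instead of the coefficient norm.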

If \(|\mathcal{Y}_0|=n+1\) and the initial interpolation points are affinely independent, a linear model rather than an underdetermined quadratic model is built for each function. The reason is that, despite both having error bounds that are linear in \(\varDelta \) for the first derivatives, the error bound for the latter also includes the norm of the model Hessian, as stated in Lemma 2.2 in Zhang et al. (2010), which makes it worse than the former.

Whenever \(n+1 < |\mathcal{Y}_k| \le (n+1)(n+2)/2 = p^{\max }\), the algorithm builds underdetermined quadratic models based on the user's choice among the approaches described above. If regression models are considered instead, we set \(p^{\max } = (n+1)(n+2)\), which means that the sample set is allowed to have twice the number of sample points required for fully quadratic interpolation models. Notice that having more sample points than required for quadratic interpolation can also worsen the local quality of the interpolation models, as the sample set could contain points that are too far from the iterate, which is not ideal for models built for local approximation.

It is also possible to choose the initial degree of the models among fully linear, quadratic with a diagonal Hessian, and fully quadratic. This is done within deft_funnel by setting the input argument cur_degree in the call to deft_funnel_set_parameters to one of the following options: model_size.plin, model_size.pdiag or model_size.pquad.

The interpolation system (9) is solved using a QR factorization of the matrix \(M(\phi , \mathcal{Y})\) within the function deft_funnel_computeP, which is called by deft_funnel_build_models.

In order to evaluate the error of the surrogate models and their derivatives with respect to the original functions f and c, we make use of the measure of well poisedness of \(\mathcal{Y}\) given below. Notice, however, that this definition is generalized to the interpolation, minimum-norm and regression cases and that the construction of the Lagrange polynomials differs in each case.

Definition 2

Let \(\mathcal{Y}=\{y^0,y^1,\ldots ,y^p\}\) be a poised set and \(\mathcal{P}_n^d\) be a space of polynomials of degree less than or equal to d on \({\mathbb {R}}^n\). Let \(\varLambda > 0\) and \(\{\ell _0(x),\ell _1(x),\ldots ,\ell _p(x)\}\) be the set of Lagrange polynomials associated with \(\mathcal{Y}\). Then, the set \(\mathcal{Y}\) is said to be \(\varLambda \)-poised in \(\mathcal{B}\in {\mathbb {R}}^n\) for \(\mathcal{P}_n^d\) if and only if

$$\begin{aligned} \displaystyle \max _{0\le i \le p}\displaystyle \max _{x\in \mathcal{B}} |\ell _i(x)| \le \varLambda . \end{aligned}$$
(11)

As shown in Conn et al. (2009), the error bounds between at most fully quadratic models and the original functions, as well as those between their gradients, depend linearly on the constant \(\varLambda \): the smaller it is, the better the interpolation models approximate the original functions. We also note that the error bounds for underdetermined quadratic interpolation models are linear in the diameter of the smallest ball containing \(\mathcal{Y}\) for the first derivatives and quadratic for the function values.

The coefficients of each Lagrange polynomial are given by the solution of the linear system

$$\begin{aligned} M(\phi , {\hat{\mathcal{Y}}})\lambda _j = e_{j+1}, \quad j=0,\dots ,p, \end{aligned}$$
(12)

where \(e_{j+1}\) is the (\(j+1\))-th column of the identity matrix of order \(p+1\) and \({\hat{\mathcal{Y}}}\) is a shifted and scaled version of \(\mathcal{Y}\) that is contained in a ball of radius one centered at the origin given by

$$\begin{aligned} {\hat{\mathcal{Y}}} = \{0, {\hat{y}}^1, \ldots , {\hat{y}}^p\} = \{0, (y^1 - y^0)/\varDelta , \dots , (y^p - y^0)/\varDelta \} \subset \mathcal{B}(0;1), \end{aligned}$$
(13)

where

$$\begin{aligned} \varDelta = \varDelta (\mathcal{Y}) = \max _{1\le i \le p} \Vert y^i-y^0\Vert . \end{aligned}$$
(14)

By scaling \(\mathcal{Y}\), the condition number of \(M(\phi , {\hat{\mathcal{Y}}})\), where \(\phi \) is the basis of monomials, can be used for measuring the well poisedness of the set \(\mathcal{Y}\) (see Conn et al. 2009, Chapter 3, page 48). Since the \(\varLambda \)-poisedness depends on the region \(\mathcal{B}\) in which poisedness is considered, we also shift the scaled sample set.
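The following Python/NumPy sketch (with a hypothetical linear basis in the plane; the solver itself is in Matlab) illustrates the shift-and-scale of (13)–(14) and the use of the condition number of \(M(\phi , {\hat{\mathcal{Y}}})\) as a poisedness proxy:

```python
import numpy as np

def shift_and_scale(Y):
    """Shift Y so that y^0 is at the origin and scale by
    Delta = max ||y^i - y^0|| as in (13)-(14); the result lies in B(0; 1)."""
    Y = np.asarray(Y, dtype=float)
    shifted = Y - Y[0]
    delta = np.max(np.linalg.norm(shifted, axis=1))
    return shifted / delta, delta

def poisedness_proxy(Y, basis):
    """Condition number of M(phi, Y_hat) as a proxy for well poisedness."""
    Y_hat, _ = shift_and_scale(Y)
    M = np.array([basis(y) for y in Y_hat])
    return np.linalg.cond(M)

linear_basis = lambda y: np.concatenate(([1.0], y))   # phi for linear models

good = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]           # well spread
bad = [[0.0, 0.0], [1.0, 0.0], [1.0, 1e-6]]           # nearly affinely dependent
assert poisedness_proxy(good, linear_basis) < poisedness_proxy(bad, linear_basis)
```

The nearly affinely dependent set produces a nearly singular matrix and hence a condition number several orders of magnitude larger.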

When \(p<b\), the set of Lagrange polynomials associated with \(\mathcal{Y}\) in Definition 2 is obtained by the minimum \(\ell _2\)-norm solution of (12). These Lagrange polynomials are an extension of the standard Lagrange polynomials of the interpolation case. On the other hand, when \(p>b\), the set of regression Lagrange polynomials for \(\mathcal{Y}\) is given by the least-squares solution of (12), or equivalently,

$$\begin{aligned} \min _{\lambda _j} \Vert M(\phi , {\hat{\mathcal{Y}}})\lambda _j - e_{j+1}\Vert ^2, \quad j=0,\dots ,p. \end{aligned}$$
(15)
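In the square interpolation case, the Lagrange coefficients are simply the columns of \(M(\phi , {\hat{\mathcal{Y}}})^{-1}\), and \(\varLambda \) in (11) can then be estimated by sampling \(\mathcal{B}(0;1)\). A Python/NumPy sketch (a crude Monte-Carlo estimate with an illustrative linear basis and sample set, not the solver's own procedure):

```python
import numpy as np

def lagrange_coeffs(Y_hat, basis):
    """Coefficients of all Lagrange polynomials: the columns of M^{-1},
    since M(phi, Y_hat) lambda_j = e_{j+1} as in (12)."""
    M = np.array([basis(y) for y in Y_hat])
    return np.linalg.solve(M, np.eye(len(Y_hat)))

def lambda_estimate(Y_hat, basis, n_samples=2000, seed=0):
    """Monte-Carlo estimate of the Lambda-poisedness constant (11) on B(0;1)."""
    rng = np.random.default_rng(seed)
    L = lagrange_coeffs(Y_hat, basis)
    best = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0, size=len(Y_hat[0]))
        if np.linalg.norm(x) <= 1.0:
            best = max(best, np.max(np.abs(basis(x) @ L)))   # max_j |ell_j(x)|
    return best

linear_basis = lambda y: np.concatenate(([1.0], np.asarray(y, float)))
Y_hat = [np.zeros(2), np.array([1.0, 0.0]), np.array([0.0, 1.0])]

# ell_j(y^i) = delta_ij by construction, so the estimate is at least 1
lam = lambda_estimate(Y_hat, linear_basis)
assert lam >= 1.0
```

A maximization over \(\mathcal{B}\), as in Definition 2, replaces the sampling loop in a real implementation.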

Finally, since the constraint functions c(x) are replaced by surrogate models \(m^c(x)\) in the algorithm, we define the models \(m^z(x) {\mathop {=}\limits ^\mathrm{def}}(m^c(x),h(x))\) and \(m^z(x,s) {\mathop {=}\limits ^\mathrm{def}}m^z(x) - s\), which are those used for computing new directions.

3.2.2 Subspace minimization

In this subsection, we explain how the subspace minimization is employed in our algorithm. We define the subspace \(\mathcal{S}_k\) at iteration k as

$$\begin{aligned} \mathcal{S}_k {\mathop {=}\limits ^\mathrm{def}}\{x \in {\mathbb {R}}^n\;|\; \left[ x\right] _{i} = \left[ l^x\right] _{i} \;\text { for }\; i\in \mathcal{L}_k \;\text { and }\; \left[ x\right] _{i} = \left[ u^x\right] _{i} \;\text { for }\; i\in \mathcal{U}_k\}, \end{aligned}$$

where \(\mathcal{L}_k {\mathop {=}\limits ^\mathrm{def}}\{ i\;|\; \left[ x_{k}\right] _{i} - \left[ l^x\right] _{i} \le \epsilon _b\}\) and \(\mathcal{U}_k {\mathop {=}\limits ^\mathrm{def}}\{ i\;|\; \left[ u^x\right] _{i} - \left[ x_{k}\right] _{i} \le \epsilon _b\}\) define the index sets of (nearly) active variables at their bounds, for some small constant \(\epsilon _b >0\). If \(\mathcal{L}_k\ne \emptyset \) or \(\mathcal{U}_k\ne \emptyset \), a new well-poised interpolation set \(\mathcal{Z}_k\) is built by the function deft_funnel_choose_lin in the following way. First, it builds \(\mathcal{Z}_k\) by selecting all points in \(\mathcal{X}_k \cap \mathcal{S}_k\) that lie inside the current trust region \(\mathcal{B}=\mathcal{B}(x_k;\varDelta _k)\), where \(\mathcal{X}_k\) is the set of all points obtained up to iteration k. Then, in order to complete the set with a total of \(n_{\mathcal{S}}+1\) interpolation points to build a linear model in \(\mathcal{S}_k\), where \(n_{\mathcal{S}} = dim(\mathcal{S}_k) = n - |\mathcal{L}_k\cup \,\mathcal{U}_k|\), it proceeds by generating random points within \(\mathcal{B}\cap \mathcal{S}_k\) and including them in \(\mathcal{Z}_k\). In order to make \(\mathcal{Z}_k\) a \(\varLambda \)-poised set for a given \(\varLambda >1\), it applies Algorithm 6.2 in Conn et al. (2009) for improving well poisedness via Lagrange polynomials. In practice, each new interpolation point \(y^j\) whose Lagrange polynomial \(\ell _j(x)\) satisfies the condition \(\max _{x\in \mathcal{B}\cap \mathcal{S}_k}|\ell _j(x)| > \varLambda \) is optimally replaced by a point that maximizes the corresponding Lagrange polynomial in \(\mathcal{B}\cap \mathcal{S}_k\). After a new \(\varLambda \)-poised interpolation set \(\mathcal{Z}_k\) has been built, a recursive call is made in order to solve the problem in the new subspace \(\mathcal{S}_k\) with the new set \(\mathcal{Z}_k\).
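The detection of (nearly) active bounds that defines \(\mathcal{S}_k\) can be sketched as follows (Python/NumPy; the tolerance \(\epsilon _b\) and the iterate are illustrative):

```python
import numpy as np

def active_bound_sets(x, lx, ux, eps_b=1e-8):
    """Index sets L_k, U_k of (nearly) active lower/upper bounds at x."""
    x, lx, ux = map(np.asarray, (x, lx, ux))
    L = np.where(x - lx <= eps_b)[0]
    U = np.where(ux - x <= eps_b)[0]
    return L, U

# Hypothetical iterate: [x]_0 sits on its lower bound, [x]_2 on its upper bound
x = np.array([0.0, 0.5, 1.0])
lx = np.array([0.0, 0.0, 0.0])
ux = np.array([1.0, 1.0, 1.0])
L, U = active_bound_sets(x, lx, ux)
assert list(L) == [0] and list(U) == [2]

# dim(S_k) = n - |L_k union U_k|: one free dimension remains here
assert x.size - len(set(L) | set(U)) == 1
```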

If the algorithm converges in a subspace \(\mathcal{S}_k\) with an optimal solution \(({\tilde{x}},{\tilde{s}})\), it checks if the latter is also optimal for the full-space problem, in which case the algorithm stops. If not, the algorithm continues by attempting to compute a new direction in the full space. This procedure is illustrated in Fig. 2.

Fig. 2
figure 2

Subspace minimization procedure. The algorithm calls itself recursively in order to solve the problem in a new subspace. If convergence is attained, it goes back to check if the solution found is also optimal in the full space

The dimensionality reduction of the problem mitigates the chances of degeneration of the interpolation set when the sample points become too close to each other and often affinely dependent. Figure 3 gives an example of this scenario as the optimal solution is approached.

Fig. 3
figure 3

Illustration of a scenario where the interpolation set becomes degenerate as the optimal solution is approached. This example considers a two-dimensional problem with the bound constraint \(\left[ x\right] _2\ge 0\), which is active at the solution \(x^*\) and at the iterates close to it

In order to check criticality in the full-space problem, a full-space interpolation set of \(n+1\) points is built in an \(\epsilon \)-neighborhood around the point \(x^*_{\mathcal{S}}\), which is obtained by assembling the subspace solution \({\tilde{x}}\) and the \(|\mathcal{L}_k\,\cup \,\mathcal{U}_k|\) fixed components \(\left[ x_k\right] _i\), where \(i\in \mathcal{L}_k\,\cup \,\mathcal{U}_k\). The models \(m^f_k\) and \(m^c_k\) are then updated and the criticality step is entered.

The complete subspace minimization step is described in Algorithm 3.2 and is implemented in the function deft_funnel_subspace_min, which is called inside deft_funnel_main.

figure e

As a final remark on this section, notice that whenever a recursive call takes place in Step 3 of Algorithm 3.2, the iteration index k continues to be incremented within the new subspace, as already mentioned in Sect. 3.2. For this reason, the current value of k is retrieved in Step 4, after LocalSearch has returned a solution. In other words, if the iteration number equals \(k=iter\) for a certain iter before Step 3, it does not necessarily equal \(k=iter+1\) after Step 3.

3.2.3 The normal step

The normal step aims at reducing the constraint violation at \((x,s)\) as defined by

$$\begin{aligned} v(x,s) {\mathop {=}\limits ^\mathrm{def}}{\scriptstyle \frac{1}{2}}\Vert z(x,s) \Vert ^2. \end{aligned}$$
(16)

To ensure that the step \(n_k\) is normal to the approximately linearized constraint \(m^z(x_k,s_k) + J(x_k,s_k) n = 0\), where \(J(x,s){\mathop {=}\limits ^\mathrm{def}}(J(x)\;-I_m)\) is the Jacobian of \(m^z(x,s)\) with respect to \((x,s)\), the matrix \(I_m\) is the \(m\times m\) identity matrix and J(x) is the Jacobian of \(m^z(x)\) with respect to x, we require that

$$\begin{aligned} \Vert n_k \Vert _{\infty } \le \kappa _{n} \Vert m^z(x_k,s_k)\Vert , \end{aligned}$$
(17)

for some \(\kappa _{n} \ge 1\).

The computation of \(n_k\) is done by solving the constrained linear least-squares subproblem

$$\begin{aligned} \left\{ \begin{array}{rc} \displaystyle \min _{n=(n^x,n^s)} &{} {\scriptstyle \frac{1}{2}}\Vert m^z(x_k,s_k) + J(x_k,s_k) n\Vert ^2 \\ \text {s.t.:} &{} l^s \le s_k + n^s \le u^s, \\ &{} l^x \le x_k + n^x \le u^x,\\ &{} x_k + n^x \in \mathcal{S}_k, \\ &{} n \in \mathcal{N}_k, \end{array}\right. \end{aligned}$$
(18)

where

$$\begin{aligned} \mathcal{N}_k {\mathop {=}\limits ^\mathrm{def}}\{ n \in {\mathbb {R}}^{n+m} \mid \Vert n \Vert _{\infty } \le \min \left[ \,\varDelta ^z_k,\,\kappa _{n}\, \Vert m^z(x_k,s_k)\Vert \,\right] \, \}, \end{aligned}$$
(19)

for some trust-region radius \(\varDelta _k^z > 0\). Finally, a funnel bound \(v_k^{\max }\) is imposed on the constraint violation \(v_k {\mathop {=}\limits ^\mathrm{def}}v(x_k,s_k)\) for the acceptance of new iterates to ensure the convergence towards feasibility.

We notice that, although a linear approximation of the constraints is used for calculating the normal step, the second-order information of the quadratic interpolation model \(m^z(x,s)\) is still used in the SQO model employed in the tangent step problem, as shown next.

The subproblem (18) is solved within the function deft_funnel_normal_step, which makes use of an original active-set algorithm in which the unconstrained problem is solved at each iteration in the subspace defined by the currently active bounds, these being determined by a projected Cauchy step. Each subspace solution is then computed using a singular value decomposition (SVD) of the reduced matrix. This algorithm is implemented in the function deft_funnel_blls and is intended for small-scale bound-constrained linear least-squares problems.
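As a rough stand-in for the active-set method in deft_funnel_blls (which we do not reproduce here), the projected-gradient sketch below solves the same kind of box-constrained linear least-squares problem as (18) when all the bounds collapse into a single \(\ell _\infty \) box (Python/NumPy, illustrative data):

```python
import numpy as np

def box_lsq(J, m, radius, iters=1000):
    """Minimize ||m + J n||^2 subject to ||n||_inf <= radius via projected
    gradient; a simple stand-in for the active-set method in deft_funnel_blls."""
    n = np.zeros(J.shape[1])
    step = 1.0 / np.linalg.norm(J.T @ J, 2)   # 1/L, L = Lipschitz const. of grad
    for _ in range(iters):
        grad = J.T @ (m + J @ n)
        n = np.clip(n - step * grad, -radius, radius)
    return n

# Illustrative data for a tiny normal-step-like subproblem
J = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 1.0]])
m = np.array([1.0, -0.5])
radius = 0.4   # stands for min(Delta_k^z, kappa_n * ||m^z(x_k, s_k)||) in (19)
n_step = box_lsq(J, m, radius)

assert np.all(np.abs(n_step) <= radius + 1e-12)              # step stays in N_k
assert np.linalg.norm(m + J @ n_step) <= np.linalg.norm(m)   # violation not increased
```

Starting from \(n=0\), which is feasible, the projected-gradient iterates never increase the objective, so the model violation cannot grow.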

3.2.4 The tangent step

The tangent step is a direction that improves optimality, and it is computed by using an SQO model for the problem (4) after the normal step calculation. The quadratic model for the objective function is defined as

$$\begin{aligned} \psi _k((x_k,s_k) + d) {\mathop {=}\limits ^\mathrm{def}}m^f(x_k,s_k) + \langle g_k, d \rangle + {\scriptstyle \frac{1}{2}}\langle d, B_k d \rangle , \end{aligned}$$
(20)

where \(m^f(x_k,s_k) {\mathop {=}\limits ^\mathrm{def}}m^f(x_k)\), \(g_k {\mathop {=}\limits ^\mathrm{def}}\nabla _{(x,s)} m^f(x_k,s_k)\), and \(B_k\) is the approximate Hessian of the Lagrangian function

$$\begin{aligned} \mathcal{L}(x,s,\mu ,\xi ^s,\tau ^s,\xi ^x,\tau ^x)&= m^f(x,s) + \langle \mu , m^z(x,s) \rangle + \langle \tau ^s, s - u^s \rangle + \langle \xi ^s, l^s - s \rangle \\&\quad \; + \langle \tau ^x, x - u^x \rangle + \langle \xi ^x, l^x - x \rangle \end{aligned}$$

with respect to \((x,s)\), given by

$$\begin{aligned} B_k = \begin{pmatrix} H_k + \sum _{i=1}^m [{\hat{\mu }}_k]_i Z_{ik} &{} 0 \\ 0 &{} 0 \end{pmatrix}, \end{aligned}$$
(21)

where \(\xi ^s\) and \(\tau ^s\) are the Lagrange multipliers associated to the lower and upper bounds, respectively, on the slack variables s, and \(\xi ^x\) and \(\tau ^x\) are the Lagrange multipliers associated to the lower and upper bounds on the x variables. In (21), \(H_k = \nabla ^2_{xx} m^f(x_k,s_k)\), \(Z_{ik} = \nabla ^2_{xx} m^z_{ik}(x_k,s_k)\) and the vector \({\hat{\mu }}_k\) may be viewed as a local approximation of the Lagrange multipliers with respect to the equality constraints \(m^z(x,s) = 0\).
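Assembling \(B_k\) from (21) is a simple block operation; the sketch below (Python/NumPy, hypothetical data with n = 2 and m = 1) makes the zero slack blocks explicit:

```python
import numpy as np

def sqo_hessian(H, Z, mu_hat, m):
    """Assemble B_k as in (21): H_k plus the mu-weighted constraint-model
    Hessians in the x-block, and zero blocks for the slack variables s."""
    n = H.shape[0]
    Hxx = H + sum(mu_i * Z_i for mu_i, Z_i in zip(mu_hat, Z))
    B = np.zeros((n + m, n + m))
    B[:n, :n] = Hxx
    return B

# Hypothetical small example: n = 2 variables, m = 1 constraint model
H = np.array([[2.0, 0.0], [0.0, 4.0]])          # Hessian of m^f
Z = [np.array([[1.0, 0.0], [0.0, 0.0]])]        # Hessian of the single m^z_i
B = sqo_hessian(H, Z, mu_hat=[0.5], m=1)

assert B.shape == (3, 3)
assert B[0, 0] == 2.5                           # 2 + 0.5 * 1
assert np.all(B[:, 2] == 0.0)                   # slack rows/columns are zero
```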

By substituting (8) into (20), we obtain

$$\begin{aligned} \psi _k((x_k,s_k) + n_k + t) = \psi _k((x_k,s_k) + n_k) + \langle g_k^{N}, t \rangle + {\scriptstyle \frac{1}{2}}\langle t, B_k t \rangle , \end{aligned}$$
(22)

where

$$\begin{aligned} g_k^{N} {\mathop {=}\limits ^\mathrm{def}}g_k + B_k\, n_k. \end{aligned}$$
(23)

Since (22) is a local approximation for the function \(f((x_k,s_k)+n_k+t)\), a trust region with radius \(\varDelta _k^f\) is used for the complete step \(d=n_k+t\):

$$\begin{aligned} \mathcal{T}_k {\mathop {=}\limits ^\mathrm{def}}\{ d \in {\mathbb {R}}^{n+m} \mid \Vert d \Vert _{\infty } \le \varDelta ^f_k \}. \end{aligned}$$
(24)

Moreover, given that the normal step was also calculated using local models, it makes sense to remain in the intersection of both trust regions, which implies that

$$\begin{aligned} d_k \in \mathcal{R}_k {\mathop {=}\limits ^\mathrm{def}}\mathcal{N}_k \cap \mathcal{T}_k {\mathop {=}\limits ^\mathrm{def}}\{ d \in {\mathbb {R}}^{n+m} \mid \Vert d \Vert _{\infty } \le \varDelta _k \}, \end{aligned}$$
(25)

where \(\varDelta _k = \min [ \varDelta _k^z, \varDelta _k^f]\).

In order to make sure that there is still enough space left for the tangent step within \(\mathcal{R}_k\), we first check if the following constraint on the normal step is satisfied:

$$\begin{aligned} \Vert n_k\Vert _{\infty } \le \kappa _{\mathcal{R}} \varDelta _k, \end{aligned}$$
(26)

for some \(\kappa _{\mathcal{R}} \in (0,1)\). If (26) holds, the tangent step is calculated by solving the following subproblem

$$\begin{aligned} \left\{ \begin{array}{rc} \displaystyle \min _{t=(t^x,t^s)} &{} \langle g_k^{N}, t \rangle + {\scriptstyle \frac{1}{2}}\langle t, B_k t \rangle \\ \text {s.t.:} &{} J(x_k,s_k) t = 0, \\ &{} l^s \le s_k + n^s_k + t^s \le u^s, \\ &{} l^x \le x_k + n^x_k + t^x \le u^x, \\ &{} x_k + n^x_k + t^x \in \mathcal{S}_k, \\ &{} n_k+t \in \mathcal{R}_k, \\ \end{array} \right. \end{aligned}$$
(27)

where we require that the new iterate \(x_k+d_k^x\) belongs to the subspace \(\mathcal{S}_k\) and satisfies the bound constraints (6) and (7). In the Matlab code, the tangent step is calculated by the function deft_funnel_tangent_step, which in turn calls either the Matlab solver linprog or our implementation of the nonmonotone spectral projected gradient method (Birgin et al. 2000) in deft_funnel_spg to solve the subproblem (27). The choice between the two solvers is based on whether \(\Vert B_k\Vert \le \epsilon \), for a small \(\epsilon >0\), in which case we assume that the problem is linear and therefore linprog is used. Finally, as in the original trust-funnel method of Gould and Toint (2010), inexact tangent steps are allowed.

The f-criticality measure is defined as

$$\begin{aligned} \pi _k^f{\mathop {=}\limits ^\mathrm{def}}-\langle g_k^N, r_k \rangle , \end{aligned}$$
(28)

where \(r_k\) is the projected Cauchy direction obtained by solving the linear optimization problem

$$\begin{aligned} \left\{ \begin{array}{rc} \displaystyle \min _{r=(r^x,r^s)} &{} \langle g_k^N, r \rangle \\ \text {s.t.:} &{} J(x_k,s_k) r = 0, \\ &{} l^s \le s_k + n^s_k + r^s \le u^s, \\ &{} l^x \le x_k + n^x_k + r^x \le u^x, \\ &{} x_k + n^x_k + r^x \in \mathcal{S}_k, \\ &{} \Vert r\Vert _{\infty } \le 1. \end{array}\right. \end{aligned}$$
(29)

By definition, \(\pi _k^f\) measures how much decrease could be obtained locally along the projection of the negative of the approximate gradient \(g_k^N\) onto the nullspace of \(J(x_k,s_k)\) intersected with the region delimited by the bound constraints on \(x_k + n^x_k + r^x\) and \(s_k + n^s_k + r^s\) in (29). This measure is computed in deft_funnel_compute_optimality, which uses linprog to solve the subproblem (29).
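For intuition about \(\pi _k^f\), consider the simplified case of (29) without the nullspace constraint \(J(x_k,s_k) r = 0\) (a deliberate simplification; the solver hands the full linear program to linprog). The problem then separates by component and the projected Cauchy direction has a closed form (Python/NumPy sketch, illustrative data):

```python
import numpy as np

def f_criticality_box(gN, lo, hi):
    """pi_k^f = -<gN, r>, where r minimizes <gN, r> over the box
    max(lo, -1) <= r <= min(hi, 1); a component-wise sketch of (29)
    without the nullspace constraint J r = 0."""
    lo = np.maximum(lo, -1.0)
    hi = np.minimum(hi, 1.0)
    r = np.where(gN > 0, lo, hi)   # move against each gradient component
    return -gN @ r, r

gN = np.array([1.0, -2.0, 0.5])
pi_f, r = f_criticality_box(gN, np.full(3, -1.0), np.full(3, 1.0))
assert pi_f == 3.5 and list(r) == [-1.0, 1.0, -1.0]
```

A zero measure signals that no local decrease is available within the bounds, which is exactly when \(t_k\) is set to zero.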

A new local estimate of the Lagrange multipliers \((\mu _k,\xi ^s_k,\tau ^s_k,\xi ^x_k,\tau ^x_k)\) is computed by solving the following problem:

$$\begin{aligned} \left\{ \begin{array}{rc} \displaystyle \min _{(\mu ,{\hat{\xi }}^s,{\hat{\tau }}^s,{\hat{\xi }}^x,{\hat{\tau }}^x)} &{} {\scriptstyle \frac{1}{2}}\Vert \mathcal{M}_k(\mu ,{\hat{\xi }}^s,{\hat{\tau }}^s,{\hat{\xi }}^x,{\hat{\tau }}^x) \Vert ^2\\ \text {s.t.:} &{} {\hat{\xi }}^s,{\hat{\tau }}^s,{\hat{\xi }}^x,{\hat{\tau }}^x \ge 0, \end{array} \right. \end{aligned}$$
(30)

where

$$\begin{aligned} \mathcal{M}_k(\mu ,{\hat{\xi }}^s,{\hat{\tau }}^s,{\hat{\xi }}^x,{\hat{\tau }}^x)&{\mathop {=}\limits ^\mathrm{def}}\left( \begin{array}{c} g_k^{N} \\ 0 \end{array} \right) + \left( \begin{array}{c} J(x_k)^T \\ -I_m \end{array} \right) \mu + \left( \begin{array}{c} 0 \\ I_{\tau }^s \end{array} \right) {\hat{\tau }}^s + \left( \begin{array}{c} 0 \\ -I_{\xi }^s \end{array} \right) {\hat{\xi }}^s \\&\quad + \left( \begin{array}{c} I_{\tau }^x \\ 0 \end{array} \right) {\hat{\tau }}^x + \left( \begin{array}{c} -I_{\xi }^x \\ 0 \end{array} \right) {\hat{\xi }}^x, \end{aligned}$$

the matrices \(I_{\xi }^s\) and \(I_{\tau }^s\) are obtained from \(I_m\) by removing the columns whose indices are not associated to any active (lower and upper, respectively) bound at \(s_k + n^s_k\), the matrices \(I_{\xi }^x\) and \(I_{\tau }^x\) are obtained from the \(n\times n\) identity matrix by removing the columns whose indices are not associated to any active (lower and upper, respectively) bound at \(x_k + n^x_k\), and the Lagrange multipliers \(({\hat{\xi }}^s,{\hat{\tau }}^s,{\hat{\xi }}^x,{\hat{\tau }}^x)\) are those in \((\xi ^s,\tau ^s,\xi ^x,\tau ^x)\) associated to active bounds at \(s_k + n^s_k\) and \(x_k + n^x_k\). All the other Lagrange multipliers are set to zero.

The subproblem (30) is also solved using the active-set algorithm implemented in the function deft_funnel_blls.

3.2.5 Which steps to compute and retain

The algorithm computes normal and tangent steps depending on the measures of feasibility and optimality at each iteration. Unlike in Sampaio and Toint (2015, 2016), where the computation of the normal steps depends on the measure of optimality, here the normal step is computed whenever the following condition holds

$$\begin{aligned} \Vert z(x_k,s_k)\Vert > \epsilon , \end{aligned}$$
(31)

for some small \(\epsilon >0\) (i.e. preference is always given to feasibility). This choice is based on the fact that, in many real-life problems with expensive functions and a small budget, one seeks to find a feasible solution as quickly as possible, and a feasible solution with a smaller objective function value than the current one is already enough. If (31) fails, we set \(n_k=0\).

We define a v-criticality measure that indicates how much decrease could be obtained locally along the projection of the negative gradient of the Gauss-Newton model of \(m^z\) at \((x_k,s_k)\) onto the region delimited by the bounds as

$$\begin{aligned} \pi _k^v {\mathop {=}\limits ^\mathrm{def}}-\langle J(x_k,s_k)^T z(x_k,s_k), b_k \rangle , \end{aligned}$$

where the projected Cauchy step \(b_k\) is given by the solution of

$$\begin{aligned} \left\{ \begin{array}{rc} \displaystyle \min _{b=(b^x,b^s)} &{} \langle J(x_k,s_k)^T z(x_k,s_k), b \rangle \\ \text {s.t.:} &{} l^s \le s_k + b^s \le u^s, \\ &{} l^x \le x_k + b^x \le u^x, \\ &{} x_k + b^x \in \mathcal{S}_k, \\ &{} \Vert b\Vert _{\infty } \le 1. \end{array}\right. \end{aligned}$$
(32)

We say that \((x_k,s_k)\) is an infeasible stationary point if \(z(x_k,s_k) \ne 0\) and \(\pi _k^v=0\), in which case the algorithm terminates.

The procedure for the calculation of the normal step is given in the algorithm below. In the code, it is implemented in the function deft_funnel_normal_step, which calls the algorithm in deft_funnel_blls in order to solve the normal step subproblem (18).

figure f

If the solution of (29) is \(r_k=0\), then by (28) we have \(\pi _k^f=0\), in which case we set \(t_k=0\). If the current iterate is farther from feasibility than from optimality, i.e., if, for a given monotonic bounding function \(\omega _t\), the condition

$$\begin{aligned} \pi _k^f > \omega _t(\Vert z(x_k,s_k)\Vert ) \end{aligned}$$
(33)

fails, then we skip the tangent step computation by setting \(t_k=0\).

After the computation of the tangent step, the usefulness of the latter is evaluated by checking if the conditions

$$\begin{aligned} \Vert t_k\Vert > \kappa _{\mathcal{Z}\mathcal{S}}\Vert n_k\Vert \end{aligned}$$
(34)

and

$$\begin{aligned} \delta _k^f {\mathop {=}\limits ^\mathrm{def}}\delta ^{f,t}_k + \delta ^{f,n}_k \ge \kappa _\delta \delta ^{f,t}_k, \end{aligned}$$
(35)

where

$$\begin{aligned} \delta ^{f,t}_k {\mathop {=}\limits ^\mathrm{def}}\psi _k((x_k,s_k)+n_k) -\psi _k((x_k,s_k)+n_k+t_k) \end{aligned}$$
(36)

and

$$\begin{aligned} \delta ^{f,n}_k {\mathop {=}\limits ^\mathrm{def}}\psi _k(x_k,s_k) -\psi _k((x_k,s_k)+n_k), \end{aligned}$$
(37)

are satisfied for some \(\kappa _{\mathcal{Z}\mathcal{S}} > 1\) and \(\kappa _\delta \in (0,1)\). The inequality (35) indicates that the predicted improvement in the objective function obtained in the tangent step is substantial compared to the predicted change in f resulting from the normal step. If (34) holds but (35) does not, the tangent step is not useful in the sense just discussed, and we choose to ignore it by resetting \(t_k=0\).
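The retention test (34)–(35) reduces to a few comparisons; a Python sketch with illustrative \(\kappa \) values (not DEFT-FUNNEL's defaults):

```python
def discard_tangent_step(t_norm, n_norm, delta_ft, delta_fn,
                         kappa_zs=2.0, kappa_delta=0.1):
    """Reset t_k = 0 when (34) holds but (35) fails: the tangent step is
    long relative to n_k, yet the total predicted decrease
    delta_f = delta_ft + delta_fn is not dominated by the tangent part.
    The kappa values are illustrative, not DEFT-FUNNEL's defaults."""
    cond_34 = t_norm > kappa_zs * n_norm
    cond_35 = delta_ft + delta_fn >= kappa_delta * delta_ft
    return cond_34 and not cond_35

# Tangent decrease dominates: the step is kept
assert not discard_tangent_step(t_norm=1.0, n_norm=0.1, delta_ft=1.0, delta_fn=0.5)
# The normal step undoes most of the tangent decrease: the step is discarded
assert discard_tangent_step(t_norm=1.0, n_norm=0.1, delta_ft=1.0, delta_fn=-0.95)
```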

The tangent step procedure is stated in Algorithm 3.4 and is implemented in the function deft_funnel_tangent_step.

figure g

3.2.6 Iteration types

As mentioned previously, each iteration is classified into one of three types depending on the contributions made in terms of optimality and feasibility, namely: \(\mu \)-iteration, f-iteration and z-iteration. This is done by checking if some conditions hold for the trial point defined as

$$\begin{aligned} (x_k^+,s_k^+) {\mathop {=}\limits ^\mathrm{def}}(x_k,s_k) + d_k. \end{aligned}$$
(38)

3.2.6.1 \(\mu \)-iteration

If \(d_k=0\), then \(n_k = 0\), that is, (26) holds. In this case, the Lagrange multiplier estimates are potentially the only new values that have been computed. For this reason, iteration k is said to be a \(\mu \)-iteration, with reference to the Lagrange multipliers \(\mu \) associated to the constraints \(m^z(x,s)=0\). Notice, however, that not only new \(\mu _k\) values have been computed, but also all the other Lagrange multipliers \((\xi ^s_k,\tau ^s_k,\xi ^x_k,\tau ^x_k)\).

In this case, we set \((x_{k+1},s_{k+1}) = (x_k,s_k)\), \(\varDelta _{k+1}^f = \varDelta _k^f\), \(\varDelta _{k+1}^z = \varDelta _k^z\), \(v_{k+1}^{\max } = v_k^{\max }\) and we use the new multipliers to build a new SQO model in (20).

Since null steps \(d_k=0\) might be due to the poor quality of the interpolation models, we check the \(\varLambda \)-poisedness in \(\mu \)-iterations and attempt to improve it whenever the following condition holds

$$\begin{aligned} \varLambda \, \varDelta (\mathcal{Y}_k)>\epsilon _{\mu }, \end{aligned}$$
(39)

where

$$\begin{aligned} \varDelta (\mathcal{Y}_k) {\mathop {=}\limits ^\mathrm{def}}\displaystyle \max _j \Vert y^{k,j}-x_k\Vert , \end{aligned}$$

\(\varLambda \) is estimated by solving the maximization problem (11) in Definition 2 and \(\epsilon _{\mu }>0\). The left-hand side of (39) gives an estimate of the error bound for the interpolation models. If (39) holds, we try to reduce its left-hand side by modifying the sample set \(\mathcal{Y}_k\). First, we choose a constant \(\xi \in (0,1)\) and replace all points \(y^{k,j}\in \mathcal{Y}_k\) such that

$$\begin{aligned} \Vert y^{k,j}-x_k\Vert >\xi \varDelta (\mathcal{Y}_k) \end{aligned}$$

by new points \(y^{k,j}_{*}\) that (approximately) maximize \(|\ell _{j}(x)|\) in \(\mathcal{B}(x_k;\xi \varDelta (\mathcal{Y}_k))\). Then we use Algorithm 6.3 described in Chapter 6 of Conn et al. (2009) with the smaller region \(\mathcal{B}\) to improve the \(\varLambda \)-poisedness of the new sample set. This procedure is implemented in the Matlab code by the function deft_funnel_repair_sample_set.

3.2.6.2 f-iteration

If iteration k has mainly contributed to optimality, it is called an f-iteration. Formally, this happens when \(t_k \ne 0\), (35) holds, and

$$\begin{aligned} v(x_k^+,s_k^+) \le v_k^{\max }. \end{aligned}$$
(40)

Convergence of the algorithm towards feasibility is ensured by condition (40), which limits the constraint violation with the funnel bound.

In this case, we set \((x_{k+1},s_{k+1}) = (x_k^+,s_k^+)\) if

$$\begin{aligned} \rho _k^f {\mathop {=}\limits ^\mathrm{def}}\frac{f(x_k,s_k) - f(x_k^+,s_k^+)}{\delta ^f_k} \ge \eta _1, \end{aligned}$$
(41)

for \(\eta _1\in (0,1)\), and \((x_{k+1},s_{k+1}) = (x_k,s_k)\), otherwise. Note that \(\delta _k^f>0\) (because of (36) and (35)) unless \((x_k,s_k)\) is first-order critical, and hence condition (41) is well defined. As for the value of the funnel bound in this case, we set \(v_{k+1}^{\max } = v_k^{\max }\).

Since our method can suffer from the Maratos effect (Maratos 1978), we also apply a second-order correction (see Section 15.6 in Nocedal and Wright 2006 for more details) to the normal step whenever the complete direction \(d_k\) is unsuccessful at improving optimality, i.e., whenever condition (41) fails. As the failure might be due to an inadequate local approximation of the constraint functions, this effect may be overcome with a second-order step \({\hat{n}}\), which is calculated in deft_funnel_sec_order_correction by solving the following subproblem

$$\begin{aligned} \left\{ \begin{array}{rc} \displaystyle \min _{{\hat{n}}=({\hat{n}}^x,{\hat{n}}^s)} &{} {\scriptstyle \frac{1}{2}}\Vert m^z(x_k^+,s_k^+) + J(x_k,s_k) {\hat{n}}\Vert ^2 \\ \text {s.t.:} &{} l^s \le s_k^+ + {\hat{n}}^s \le u^s, \\ &{} l^x \le x_k^+ + {\hat{n}}^x \le u^x, \\ &{} x_k^+ + {\hat{n}}^x \in \mathcal{S}_k, \\ &{} {\hat{n}} \in \hat{\mathcal{N}_k}, \end{array}\right. \end{aligned}$$
(42)

where

$$\begin{aligned} \hat{\mathcal{N}_k} {\mathop {=}\limits ^\mathrm{def}}\{ {\hat{n}} \in {\mathbb {R}}^{n+m} \mid \Vert {\hat{n}} \Vert _{\infty } \le \min \left[ \,\varDelta ^z_k,\,\kappa _{n}\, \Vert m^z(x_k^+,s_k^+)\Vert \,\right] \, \}. \end{aligned}$$
(43)

After the second-order step \({\hat{n}}\) has been computed, the conditions

$$\begin{aligned} \begin{array}{c} v((x_k^+,s_k^+) + {\hat{n}}) \le v_k^{\max } \end{array} \end{aligned}$$
(44)

and

$$\begin{aligned} \begin{array}{c} \rho _k^{fC} {\mathop {=}\limits ^\mathrm{def}}\displaystyle \frac{f(x_k,s_k) - f((x_k^+,s_k^+) + {\hat{n}})}{\delta ^f_k} \ge \eta _1 \end{array} \end{aligned}$$
(45)

still need to be satisfied for the f-iteration with the augmented step \((x_k^+,s_k^+) + {\hat{n}}\) to be successful. If (44) and (45) hold, we set \((x_{k+1},s_{k+1}) = (x_k^+,s_k^+) + {\hat{n}}\).

3.2.6.3 z-iteration

If iteration k is neither a \(\mu \)-iteration nor an f-iteration, then it is said to be a z-iteration. This means that the major contribution of iteration k is to improve feasibility, which happens when either \(t_k=0\) or (35) fails.

The trial point is accepted if the improvement in feasibility is comparable to its predicted value

$$\begin{aligned} \delta ^z_k {\mathop {=}\limits ^\mathrm{def}}{\scriptstyle \frac{1}{2}}\Vert m^z(x_k,s_k)\Vert ^2 - {\scriptstyle \frac{1}{2}}\Vert m^z(x_k,s_k) + J(x_k,s_k) d_k \Vert ^2, \end{aligned}$$

and the latter is itself comparable to its predicted decrease along the normal step, that is

$$\begin{aligned} n_k \ne 0,\; \text { } \delta _k^z \ge \kappa _{zn} \delta ^{z,n}_k \;\; \text{ and } \;\; \rho _k^z {\mathop {=}\limits ^\mathrm{def}}\frac{v(x_k,s_k) - v(x_k^+,s_k^+)}{\delta ^z_k} \ge \eta _1, \end{aligned}$$
(46)

for some \(\kappa _{zn}\in (0,1)\) and where

$$\begin{aligned} \delta ^{z,n}_k&{\mathop {=}\limits ^\mathrm{def}}&{\scriptstyle \frac{1}{2}}\Vert m^z(x_k,s_k)\Vert ^2 - {\scriptstyle \frac{1}{2}}\Vert m^z(x_k,s_k)+ J(x_k,s_k) n_k\Vert ^2. \end{aligned}$$
(47)

If (46) is not satisfied, the trial point \((x_k^+,s_k^+)\) is rejected.

Finally, the funnel bound is updated as follows

$$\begin{aligned} v_{k+1}^{\max } = \left\{ \begin{array}{ll} \max \left[ \kappa _{tx1} v_k^{\max }, v(x_k^+,s_k^+) + \kappa _{tx2}( v(x_k,s_k) - v(x_k^+,s_k^+)) \right] &{} \text{ if } (46) \text{ holds, }\\ v_k^{\max } &{} \text{ otherwise, } \end{array}\right. \end{aligned}$$
(48)

for some \(\kappa _{tx1} \in (0,1)\) and \(\kappa _{tx2}\in (0,1)\).
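The funnel update (48) can be sketched as follows (Python; the \(\kappa \) values are illustrative, not DEFT-FUNNEL's defaults):

```python
def update_funnel_bound(v_max, v_old, v_new, accepted,
                        kappa_tx1=0.9, kappa_tx2=0.5):
    """Funnel bound update (48) after a z-iteration: when the trial point is
    accepted (i.e. (46) holds), the bound shrinks but stays above the new
    violation; otherwise it is left unchanged."""
    if accepted:
        return max(kappa_tx1 * v_max,
                   v_new + kappa_tx2 * (v_old - v_new))
    return v_max

# Successful z-iteration: max(0.9 * 1.0, 0.2 + 0.5 * 0.6) = 0.9
assert update_funnel_bound(v_max=1.0, v_old=0.8, v_new=0.2, accepted=True) == 0.9
# Rejected trial point: the funnel bound is unchanged
assert update_funnel_bound(v_max=1.0, v_old=0.8, v_new=0.2, accepted=False) == 1.0
```

Because both terms in the max are strictly below \(v_k^{\max }\) whenever \(v(x_k^+,s_k^+) < v_k^{\max }\), the funnel can only tighten over successful z-iterations.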

3.2.7 Criticality step

Two different criticality steps are employed: one for subspaces \(\mathcal{S}_k\) with \(dim(\mathcal{S}_k)<n\) and one for the full space (\(dim(\mathcal{S}_k)=n\)). In the latter, convergence is declared whenever at least one of the following conditions is satisfied: (1) the trust-region radius \(\varDelta _k\) is too small, (2) the computed direction \(d_k\) is too small or (3) both feasibility and optimality have been achieved and the error between the true functions and the models is expected to be sufficiently small. As mentioned before, this error is directly linked to the \(\varLambda \)-poisedness measure given in Definition 2. In a subspace, we are less demanding and only ask that either \(\varDelta _k\) be very small or that both feasibility and optimality have been achieved, without checking the model error.

The complete criticality step in DEFT-FUNNEL is described in the next algorithm.

figure h

In order to check feasibility and optimality in the full space, the algorithm employs a dynamic threshold \(\epsilon _i\) that is expected to fall below the fixed accuracy threshold \(\epsilon \) whenever criticality is verified at the current iterate. The initial value of \(\epsilon _i\) is set in Step 0 of Algorithm 3.1, where \(i=0\). The implicit loop in Step 2.3 may be viewed as a model improvement step that aims to ensure that the derivative information of the surrogate models does not differ too much from that of the original functions. This inner iteration follows the same idea as the criticality test in Algorithm 2 of Scheinberg and Toint (2010).

Notice that the definition of \(\vartheta _i\) at Step 2.4 of Algorithm 3.5 serves to indicate the point at which well-poised models \(m^c\) and \(m^f\) are known and the current trust region is larger than the region at which these models are poised. This variable is then used in the update rule of the interpolation set, as detailed in the next section.

3.3 Maintenance of the interpolation set and trust-region updating strategy

The management of the geometry of the interpolation set is based on the self-correcting geometry scheme proposed in Scheinberg and Toint (2010), where unsuccessful trial points are used to improve the geometry of the interpolation set. The scheme depends on the criterion used to define successful iterations, which is passed to the algorithm through the parameter criterion and depends on the iteration type (\(\mu \), f or z). For f-iterations, the inequality \(\rho _k^f\ge \eta _1\) must be satisfied, whereas for z-iterations the conditions in (46) must hold.

In general, unsuccessful trial points replace other sample points in the interpolation set that maximize a combined criterion of distance and poisedness involving the trial point. However, this replacement is avoided when the iterate is a point \(\vartheta _i\) (defined in CriticalityStep, Algorithm 3.5) at which well-poised models \(m^c\) and \(m^f\) are already known. Finally, the proposed solver does not make use of “dummy” interpolation points resulting from projections onto the subspaces as in Sampaio and Toint (2016). The whole procedure is described in Algorithm 3.6. As before, the maximum cardinality of the interpolation set, \(p^{\max }\), is \((n+1)(n+2)/2\).

figure i

The trust-region update strategies associated with f- and z-iterations are now described. Following the idea proposed in Gratton et al. (2011), the trust-region radii are allowed to decrease even when the interpolation set has been changed by the replacement of a far point or a close point at unsuccessful iterations. However, the number of times the radii can be shrunk in this case is limited to \(\nu _f^{\max }\) and \(\nu _z^{\max }\), respectively, as a means to prevent the trust regions from becoming too small too quickly. If the interpolation set has not been updated, the algorithm understands that the lack of success is not due to the surrogate models but rather to the trust-region size, and thus it reduces the latter.
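As a rough sketch of this capped-shrink logic for an f-iteration (with illustrative parameter names and default values, not the solver's actual code):

```python
def update_tr_radius(rho, set_changed, Delta, nu, nu_max,
                     eta1=1e-4, gamma1=0.5, gamma2=2.0, Delta_max=1e3):
    """Illustrative f-iteration trust-region update: nu counts how many
    times the radius has been shrunk while the interpolation set was
    still being updated, capped at nu_max. Returns (new radius, new nu)."""
    if rho >= eta1:                      # successful iteration: expand
        return min(gamma2 * Delta, Delta_max), 0
    if set_changed:                      # geometry improved by the trial point
        if nu < nu_max:                  # shrinking still allowed
            return gamma1 * Delta, nu + 1
        return Delta, nu                 # cap reached: keep the radius
    return gamma1 * Delta, nu            # set unchanged: blame the radius
```

The z-iteration update follows the same pattern with its own counter and cap \(\nu _z^{\max }\).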

figure j

The corresponding operations for z-iterations are described below.

figure k

3.4 Parameter tuning and user goals

Like any other algorithm, DEFT-FUNNEL's performance can be affected by how its parameters are tuned. In this section, we discuss which parameters might have a major impact on its performance and how they should be tuned depending on the user's goals. We consider four main aspects concerning the solution of black-box problems: the budget, the priority given to feasibility, the type of the objective function and the priority given to global optimality. The possible scenarios, in decreasing order of difficulty, are:

  • Budget: very low, limited or unlimited.

  • Priority to feasibility: high or low.

  • Objective function: highly multimodal, multimodal or likely unimodal.

  • Priority given to global optimality: high or low.

Clearly, if the objective function is highly multimodal and the budget is limited, one should give priority to a multistart strategy with a high number of samples per iteration when global optimality is important. This can be done in two ways: the first is to reduce the budget for each local search through the optional input maxeval_ls, which equals maxeval*0.7 by default; for instance, one could set maxeval_ls = maxeval*0.4, so that each local search uses up to only 40% of the total budget. The other possibility is to increase the number N of random points generated in the global search (see Step 3 in Algorithm 2.1). In cases where attaining the global minimum is not required, a better approach is to give more budget to each local search so that the chances of reaching a local minimum are higher. If the objective function is not highly multimodal, one should look for a good compromise between spending the budget on each local search and on the sampling of the multistart strategy.
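The trade-off between the per-local-search budget and the multistart sampling can be pictured with the sketch below. The names local_solve and sample are hypothetical placeholders, not DEFT-FUNNEL's API; the default fraction 0.7 mirrors maxeval_ls = maxeval*0.7.

```python
def multistart(local_solve, sample, maxeval, ls_fraction=0.7, N=20):
    """Minimal multistart loop under the budget split described above.
    local_solve(x0, budget) -> (x_best, f_best, evals_used) is a local
    search capped at `budget` black-box calls; sample(N) -> N start points.
    """
    maxeval_ls = int(ls_fraction * maxeval)   # per-local-search cap (maxeval_ls)
    best, evals = (None, float("inf")), 0
    while evals < maxeval:
        for x0 in sample(N):                  # global search: N random points
            budget = min(maxeval_ls, maxeval - evals)
            if budget <= 0:                   # total budget exhausted
                return best
            x, fx, used = local_solve(x0, budget)
            evals += used
            if fx < best[1]:
                best = (x, fx)
    return best
```

Lowering ls_fraction or raising N shifts effort from local refinement toward global exploration, which is the tuning lever discussed above.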

Notice also that when the budget is very small and it is very hard to find a feasible solution, it may be a good idea to compute a tangent step only once feasibility has been achieved. This is because tangent steps may deteriorate the gains in feasibility obtained through normal steps; when this happens, more normal steps must be computed, which requires more function evaluations. Therefore, instead of spending the budget on both tangent and normal steps without any guarantee of feasibility in the end, it is a better strategy to compute only normal steps at first, so that the chances of obtaining a feasible solution are higher. For this purpose, the user should set the constant value \(\kappa _{\mathcal{R}}=0\) in (26). By doing so, the tangent step will be computed only if the normal step equals zero, which by (31) happens when the iterate is feasible. In the code, this can be done by setting kappa_r to zero in the function deft_funnel_set_parameters.
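The effect of this switch can be sketched as follows. The test used here (comparing the normal step's norm directly against kappa_r) is only a stand-in for the actual condition (26), and all function names are illustrative, not DEFT-FUNNEL's internals; the point is that kappa_r = 0 suppresses the tangent step whenever the normal step is nonzero.

```python
import numpy as np

def composite_step(normal_step_fn, tangent_step_fn, x, kappa_r):
    """Illustrative normal/tangent step decomposition with the kappa_r
    switch: with kappa_r = 0 the tangent step is computed only when the
    normal step vanishes, i.e. when the iterate is already feasible."""
    n_step = normal_step_fn(x)                # restores feasibility
    if np.linalg.norm(n_step) <= kappa_r:     # stand-in for condition (26)
        t_step = tangent_step_fn(x + n_step)  # reduces the objective
    else:
        t_step = np.zeros_like(n_step)        # skip the tangent step
    return n_step + t_step
```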

4 Numerical experiments

The numerical experiments are divided into two sections: the first focuses on the evaluation of the performance of DEFT-FUNNEL on black-box optimization problems, while the second analyzes the benefits of exploiting white-box functions in grey-box problems. In all experiments with DEFT-FUNNEL, minimum \(\ell _2\)-norm models were employed. This choice is based on past numerical experiments (Sampaio and Toint 2015, 2016) where minimum \(\ell _2\)-norm models performed better than the other approaches. The criticality step threshold was set to \(\epsilon =10^{-4}\) and the trust-region constants were defined as \(\eta _1 = 10^{-4}\), \(\eta _2 = 0.9\), \(\eta _3 = 0.6\), \(\gamma _1 =0.5\), \(\gamma _2 = 2.0\), \(\nu _f^{\max } = 20\times n\) and \(\nu _z^{\max } = 20 \times n\), where n is the number of variables.

DEFT-FUNNEL is compared with two popular algorithms for constrained black-box optimization: NOMAD (Le Digabel 2011), a state-of-the-art C++ implementation of the MADS method, and the Pattern Search (PS) algorithm from the Matlab Global Optimization Toolbox (MATLAB 2015b). Both solvers belong to the class of direct-search methods for constrained black-box optimization. Besides their popularity, the choice of these solvers is based on the fact that both codes are publicly available to any user, a criterion that reduces the number of options significantly. Notice also that this section is neither intended to be an extensive benchmark of black-box optimization solvers nor to show the superiority of one solver over another (which would require, among other things, careful hyperparameter tuning of the other solvers as well), but rather to present encouraging results from DEFT-FUNNEL on a set of meaningful and difficult tests based on real-world applications.

Since NOMAD and PS are local optimization algorithms, they have been coupled with the Latin Hypercube Sampling (LHS) method (McKay et al. 1979) in order to pursue global optimality, a very common strategy among practitioners in industry. In NOMAD, all constraints were treated as relaxable constraints by using the Progressive Barrier approach of the solver. Our experiments therefore cover two of the most popular approaches for solving black-box problems: surrogate-based methods and direct-search methods. Other DFO algorithms have already been compared against DEFT-FUNNEL in previous papers (see Sampaio and Toint 2015, 2016) on a much larger set of test problems mainly designed for local optimization.
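For reference, LHS starting points of the kind used to couple NOMAD and PS with a multistart loop can be generated with SciPy's quasi-Monte Carlo module; the dimension, bounds and sample count below are arbitrary illustrative values.

```python
import numpy as np
from scipy.stats import qmc

n = 4                               # problem dimension (illustrative)
lower = np.zeros(n)                 # l^x
upper = np.full(n, 10.0)            # u^x

# Draw 50 starting points in [0, 1]^n via Latin Hypercube Sampling,
# then rescale them to the box [l^x, u^x].
sampler = qmc.LatinHypercube(d=n, seed=0)
unit_points = sampler.random(n=50)
starts = qmc.scale(unit_points, lower, upper)
```

Each row of `starts` then serves as the initial point of one local run of the solver.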

4.1 Black-box optimization problems

The three methods are compared on a set of 14 well-known benchmark problems for constrained global optimization, including four industrial design problems, namely Welded Beam Design (WB4) (Deb 2000), Gas Transmission Compressor Design (GTCD4) (Beightler and Phillips 1976), Pressure Vessel Design (PVD4) (Coello and Montes 2002) and Speed Reducer Design (SR7) (Floudas and Pardalos 1990), as well as the Harley pooling problem (problem 5.2.2 from Floudas et al. 1999), originally a maximization problem that has been converted into a minimization one. Besides the test problems originating from industrial applications, we have collected problems with different characteristics (multimodal, nonlinear, separable and non-separable, with connected and disconnected feasible regions) to obtain a broader view of the performance of the algorithms in various kinds of scenarios. For instance, the Hesse problem (Hesse 1973) is the combination of 3 separable problems with 18 local minima and 1 global minimum, while the Gómez #3 problem, listed as the third problem in Gómez and Levy (1982), has many disconnected feasible regions and hence many local minima. The test problems G3-G11 are taken from the widely known benchmark list in Michalewicz and Schoenauer (1996). Table 3 gives the number of decision variables, the number of constraints and the best known feasible objective function value of each test problem.

Table 3 Problem name, number of decision variables, number of constraints (simple bounds not included) and the best known feasible objective function value of each test problem

Two types of black-box experiments were conducted in order to compare the algorithms. In the first type, a small budget of 100 black-box calls is given to each algorithm in order to evaluate how the algorithms perform on difficult problems with highly expensive functions. In such scenarios, many algorithms have difficulty obtaining even local minima, depending on the test problem. In the second type of experiments, we analyze their ability and speed in reaching global minima rather than local minima by allowing larger budgets that range from \(100\times n\) to \(400\times n\), where n is the number of variables. Every function is treated as a black box in both types of experiments. Finally, each algorithm is run 50 times on each test problem. For NOMAD and PS, each run starts from a different point obtained via LHS.

Only approximately feasible solutions of problem (1) are considered when comparing the best objective function values obtained by the algorithms, i.e. we require that each optimal solution \(x^*\) satisfy \(cv(x^*) \le 10^{-4}\), where

$$\begin{aligned} cv(x) {\mathop {=}\limits ^\mathrm{def}}\max \left[ \left[ z(x) - u^s\right] ^+, \left[ l^s - z(x)\right] ^+ \right] . \end{aligned}$$
(59)
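In code, the violation measure (59) amounts to taking the largest positive bound violation; the sketch below assumes that the constraint values z(x) have already been evaluated and stacked into a vector.

```python
import numpy as np

def cv(z, l_s, u_s):
    """Constraint violation measure from (59): the largest violation of
    z(x) against the bounds [l^s, u^s]; zero when z is feasible."""
    upper_viol = np.maximum(z - u_s, 0.0)  # [z(x) - u^s]^+
    lower_viol = np.maximum(l_s - z, 0.0)  # [l^s - z(x)]^+
    return float(np.max(np.concatenate([upper_viol, lower_viol])))
```

A point is then accepted as approximately feasible when cv(z) is at most \(10^{-4}\).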

In the next two subsections, we show the results for the two types of experiments in the black-box setting.

4.1.1 Budget-driven experiments

The results of the first type of experiments are shown in Table 4. In the second column, \(f_{\text {OPT}}\) denotes the objective function value of the global minimum of the problem when it is known or the best known objective function value otherwise. For each solver, we show the best, the average and the worst objective function values obtained in 50 runs on every test problem.

Table 4 Results for the first type of experiments on a budget of 100 black-box calls. For each solver, it shows the best, the average and the worst objective function values obtained in 50 runs. For each column and each problem, the method with the lowest objective function value is in bold

As can be seen in Table 4, DEFT-FUNNEL attained \(f_{\text {OPT}}\) in 10 out of 14 problems, while NOMAD did so in 5 problems and PS in only 4. Moreover, when considering the best value found by each solver, DEFT-FUNNEL was superior or equal to the best solver in all problems. In the average and worst cases, DEFT-FUNNEL also presented very good performance; in particular, its worst-case performance was inferior to all the others' in only 4 problems, while its average performance was superior to the others' in 9 problems. Although NOMAD did not reach \(f_{\text {OPT}}\) often, it presented the second-best average-case performance among all methods, while PS presented the worst. Finally, NOMAD and PS were unable to reach a feasible solution on the Harley pooling problem.

4.1.2 Experiments driven by global minima

This section evaluates the ability of each solver to find global minima rather than local minima. The results of the second type of experiments are presented individually for each test problem, which allows a better analysis of the evolution of each solver's performance as the number of allowed function evaluations grows. Each figure is thus associated with a single test problem and shows the average progress curve of each solver over 50 trials as a function of the budget.

In Fig. 4, we show the results on test problems Harley, WB4, GTCD4 and PVD4. DEFT-FUNNEL and NOMAD presented the best performance among the three methods, with NOMAD being superior in 2 out of 4 problems. Not only was PS inferior to the other two methods, but it also seemed unaffected by the number of black-box calls allowed, as can be seen in Fig. 4a–c. The solutions found by PS often had much larger objective function values than those obtained by DEFT-FUNNEL and NOMAD. Moreover, PS and NOMAD could not find a feasible solution for the Harley problem regardless of the number of black-box calls.

Fig. 4
figure 4

Mean best approximately feasible objective function value obtained by each solver on 50 trials with budgets of \(100\times n\), \(200\times n\), \(300\times n\) and \(400\times n\) black-box function evaluations

Figure 5 shows the results on test problems SR7, Hesse, Gómez #3 and G3. DEFT-FUNNEL and NOMAD had similar performances on the SR7 and Hesse problems, with little difference between the two solvers, while PS had the poorest performance. On the other hand, the performance of DEFT-FUNNEL on Gómez #3 and G3 was significantly better than that of the other methods; in particular, DEFT-FUNNEL attained \(f_{\text {OPT}}\) on G3 in practically all tests. Finally, PS's performance on SR7 shows that allowing more black-box calls was not helpful; the objective function value even increased.

Fig. 5
figure 5

Mean best approximately feasible objective function value obtained by each solver on 50 trials with budgets of \(100\times n\), \(200\times n\), \(300\times n\) and \(400\times n\) black-box function evaluations

Fig. 6
figure 6

Mean best approximately feasible objective function value obtained by each solver on 50 trials with budgets of \(100\times n\), \(200\times n\), \(300\times n\) and \(400\times n\) black-box function evaluations

Figure 6 shows the results on test problems G4, G6, G7, G8, G9 and G11. DEFT-FUNNEL was superior to all other methods, attaining \(f_{\text {OPT}}\) on 4 problems (G4, G6, G8, G11) independently of the number of evaluations allowed. NOMAD was the second best, followed by PS. DEFT-FUNNEL and NOMAD had similar performances on G7 and G9, where both reached \(f_{\text {OPT}}\) when the number of black-box calls was higher than \(200 \times n\). Besides failing to find a feasible solution on the Harley pooling problem, PS also shows a significant increase in its objective function value as the number of evaluations grows on test problems SR7 and G7. These three problems are the ones with the largest number of constraints, which could explain its poor performance.

Table 5 Problem name, number of decision variables, number of black-box constraints, number of white-box constraints, type of objective function—black box (BB) or white box (WB)—and the best known feasible objective function value of each test problem
Table 6 Experiments with grey-box problems with a budget of 100 black-box calls. For each solver, it shows the best, the average and the worst objective function values obtained in 50 runs. DEFT-FUNNEL GB denotes the results on the grey-box problems while DEFT-FUNNEL BB denotes the version of DEFT-FUNNEL with the WB functions treated as BB functions

4.2 Grey-box optimization problems

In this section, DEFT-FUNNEL is compared with NOMAD and PS on a set of artificial grey-box problems built by selecting a test problem and defining some of its functions as black boxes and the remaining ones as white boxes. Among the 5 grey-box problems considered in the experiments, 3 were used in the black-box experiments described in the previous subsection, where all functions were considered as black boxes, namely: Hesse, SR7 and GTCD4. The reuse of these test problems allows for evaluating the performance improvement of DEFT-FUNNEL in cases where some information is available and for comparing its performance with that obtained in the black-box setting. The remaining grey-box problems are problems 21 and 23 from the well-known Hock-Schittkowski collection (Hock and Schittkowski 1980), denoted here as HS21 and HS23, respectively. Both HS21 and HS23 have nonlinear objective functions; however, in HS21 the objective function is defined as white box, while in HS23 it is defined as black box. The only constraint in HS21 is defined as black box, while in HS23 the constraints are evenly divided between the two categories. We have therefore attempted to cover different possibilities for the types of functions in a grey-box setting. The derivatives of all functions defined as white boxes were given as input to DEFT-FUNNEL. The reader can find more details about the tested grey-box problems in Table 5.

In Table 6, the results obtained with a budget of 100 black-box calls are presented. In order to show the improvement in the DEFT-FUNNEL results when some functions are defined as white boxes, both the results obtained on the grey-box problems, denoted by DEFT-FUNNEL GB in Table 6, and those obtained on the corresponding black-box problems, denoted by DEFT-FUNNEL BB, are provided. The DEFT-FUNNEL BB results contain the same values already presented in Table 4.

It can be seen in Table 6 that DEFT-FUNNEL GB was the only solver to reach \(f_{\text {OPT}}\) in two problems, also having the best average-case and worst-case performances in general. When comparing the black-box and grey-box results obtained by DEFT-FUNNEL on problems WB4, GTCD4 and SR7, it is evident that the information available from the white-box functions helped improve its performance. Not only were the best objective function values on these problems improved, allowing the global minimum to be reached on SR7, but so were the average and worst ones. Since DEFT-FUNNEL had already attained the global minimum on Hesse in the black-box setting, the only expected improvement would be in the average and worst cases, which did not happen in the experiments. This is probably because Hesse is a multimodal problem with 18 local minima, being a combination of 3 separable problems with a single global minimum. Therefore, even when information about the functions is partially available, the problem remains very difficult to solve due to its nature.

5 Conclusions

This paper introduced DEFT-FUNNEL, a new global optimization solver for general constrained grey-box and black-box optimization problems. It combines a stochastic multistart strategy with a trust-funnel sequential quadratic optimization algorithm in which the black-box functions are replaced by surrogate models built from polynomial interpolation. The proposed solver is able to exploit available information from white-box functions rather than treating them as black boxes. Its code is open source and freely available at the GitHub repository http://github.com/phrsampaio/deft-funnel. Unlike many black-box optimization algorithms, it can handle both inequality and equality constraints, and it does not require feasible starting points. Moreover, the constraints are treated individually rather than grouped into a penalty function.

The numerical experiments have shown that DEFT-FUNNEL compares favourably with other state-of-the-art algorithms available to the optimization community on a collection of well-known benchmark problems. The reported numerical results indicate that the proposed approach is very promising for grey-box and black-box optimization. Future research directions include extensions to multiobjective optimization and mixed-integer nonlinear optimization, as well as a parallel version of the solver.