1 Introduction

In [1], Audet and Dennis introduced the Mesh Adaptive Direct Search (Mads) class of algorithms for constrained, black-box optimization problems, in which derivative information for the objective function is either unavailable or unreliable and the set of feasible points is determined by black-box constraint functions. In many such problems the objective function is not even smooth. Depending on the assumptions made about the objective function, however, Mads is shown to converge to both first-order [1] and second-order [2] stationary points in the Clarke sense [3], as well as first-order [4] stationary points in the Rockafellar sense [5].

Mads is a Search/Poll-based direct search method that was introduced as a means of extending the Gps [6] class of algorithms by allowing, in the limit, an infinite number of polling directions (see Definition 2.1) when exploring the space of variables. The first instance of Mads, called LtMads [1], relied on a random lower triangular matrix to generate the set of polling directions and on probabilistic arguments to obtain convergence results. This instance was later shown to have undesirably large angles between some of the poll directions at each iteration.

To eliminate these large angles and the need for probabilistic arguments, as well as to make modifications for improved performance, a second instance of Mads called OrthoMads [7] was introduced. This instance is deterministic and uses a maximal orthogonal basis at each iteration for the set of polling directions. OrthoMads has been shown to behave well in practice, but for larger n (for example, n≥20), it has been observed that the set of normalized directions taken over all iterations tends to cluster around the coordinate directions, so that a small set of directions is slightly favored over all other possible directions. This behavior was not apparent in [7], as that paper only looked at the distribution of the polling directions for n=2,3. In fact, we can observe in Fig. 2 that, as n increases, the clustering behavior becomes more pronounced and larger proportions of the directions cluster around the coordinate directions.

The purpose of this paper is to introduce a new instance of Mads, which we call QrMads, that eliminates the clustering of directions. To accomplish this goal we use an equal area partition of the unit sphere and QR decomposition to find an orthogonal set of directions at each iteration. The orthogonal set of directions is then projected onto a hypercube centered at the origin with integer side length and rounded to obtain a ‘nearly orthogonal’ integer set of directions. The set of normalized directions taken over all iterations of QrMads then exhibits a more uniform distribution, as seen in Fig. 2.

There are trade-offs to this approach. First, QrMads is no longer deterministic. Second, at each iteration the set of polling directions is no longer orthogonal, but is instead only ‘nearly orthogonal’, becoming increasingly close to orthogonal as the mesh size decreases. We believe that this trade-off is minor and that maintaining orthogonality is not a necessary feature, since in [7] it was only applied in an effort to achieve better directions than those generated by LtMads. Third, there is a higher computational cost associated with performing a QR decomposition to obtain a set of polling directions. For this reason, our target application is expensive objective functions, for which the QR decomposition will not significantly contribute to the overall solve time.

Section 2 reviews the Mads framework. Sections 3 and 4 present the method used by QrMads to construct a polling set of ‘nearly orthogonal’ directions on the current mesh. In Sect. 5, we outline the QrMads algorithm and show that it is a valid Mads instance that shares the theoretical convergence results of this class of algorithms. Finally, in Sect. 6 we give numerical results comparing QrMads with OrthoMads on a set of 60 smooth, 62 nonsmooth, and 28 constrained test problems taken from the optimization literature.

2 Mesh Adaptive Direct Search

The Mads class of algorithms was developed to solve black-box optimization problems of the form

$$\min_{x\in\varOmega} f(x), $$

where Ω is the set of feasible points. Each iteration of Mads is characterized by two steps, an optional Search and a local Poll, performed on a mesh whose fineness must converge to zero in the limit. At each iteration, the mesh M k is defined by

$$ M_k:= \bigl\{x+\varDelta_k^m Dz : x\in V_k,\ z\in\mathbb{N}^{n_D} \bigr\}, $$
(1)

where V k is the set of all evaluated points at the start of iteration k, \(\varDelta _{k}^{m}\) is the mesh size parameter at iteration k, and \(D\in\mathbb{R}^{n\times n_{D}}\) is a matrix with n D directions in \(\mathbb{R}^{n}\). The Poll step is executed if the Search step fails to find a lower objective function value than the current best solution. The Poll step evaluates mesh points adjacent to the current best solution in directions that form a positive spanning set for \(\mathbb{R}^{n}\). The Poll step can be characterized by the following definition taken from [1]:

Definition 2.1

At iteration k, the set of poll trial points for the Mads algorithm is defined to be:

$$P_k:= \bigl\{x_k+\varDelta _k^md : d\in D_k \bigr\}\subset M_k, $$

where D k is a positive spanning set such that 0∉D k and, for each d∈D k ,

  • d can be written as a non-negative integer combination of the directions of D,

  • the distance in ∞-norm from the poll center x k to a poll trial point is bounded above by the poll size parameter \(\varDelta _{k}^{p}\),

  • limits (as defined by Coope and Price [8]) of the normalized sets D k are positive spanning sets.
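To make Definition 2.1 concrete, the following toy sketch (ours, in Python rather than the MATLAB used for the tests in Sect. 6; all names are illustrative) builds the poll trial points for the simplest maximal positive basis:

```python
import numpy as np

# Toy illustration of Definition 2.1 (ours): poll trial points around the
# poll center x_k, using the coordinate positive basis D_k = [I_n  -I_n].
n = 3
x_k = np.zeros(n)                 # poll center
delta_m = 0.25                    # mesh size parameter at iteration k
D_k = np.hstack([np.eye(n, dtype=int), -np.eye(n, dtype=int)])
P_k = [x_k + delta_m * D_k[:, i] for i in range(2 * n)]
print(P_k[0])                     # -> [0.25 0.   0.  ]
```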

From [1] we find that the poll and mesh size parameters produced by a Mads instance satisfy

$$ \liminf_{k\rightarrow+\infty} \varDelta _k^p= \liminf_{k\rightarrow+\infty }\varDelta _k^m=0. $$
(2)

The OrthoMads variant of Mads uses the Halton sequence [9] and the scaled Householder transformation, \(H=\|q\|^{2}(I_{n}-2vv^{T})\) with \(q\in\mathbb{R}^{n}\) and v=q/∥q∥, to construct an orthogonal basis for \(\mathbb{R}^{n}\) at each iteration. It defines

$$ \varDelta _k^p=2^{-\ell_k}\quad \mbox{and}\quad \varDelta _k^m=\min \bigl\{1,4^{-\ell _k} \bigr\}, $$
(3)

where \(\ell_k\) is the mesh index at iteration k, with \(\ell_0=0\) at the initial iteration.

For high dimensions, the directions produced by OrthoMads tend to cluster around the coordinate directions. The primary reason for this seems to be that the Householder transformation causes a higher concentration of directions to occur near the axes. This is the case because if any of the coordinates of the vector v are small, then the corresponding column of the resulting matrix H will tend to have small coordinates off the diagonal and be nearly one on the diagonal. This behavior becomes more pronounced in high dimensions (where it occurs in all the columns), since the coordinates of most unit vectors will naturally be small.
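This clustering is easy to observe numerically. The following sketch (ours; not from [7]) samples random unit vectors, forms the scaled Householder matrix above, and measures how close its normalized columns are to coordinate directions:

```python
import numpy as np

# Sketch (ours): columns of H = ||q||^2 (I - 2 v v^T), v = q/||q||, compared
# with uniformly distributed unit vectors. A column close to a coordinate
# direction has maximum absolute entry close to 1.
rng = np.random.default_rng(0)
n, trials = 20, 500
cols = []
for _ in range(trials):
    q = rng.standard_normal(n)
    v = q / np.linalg.norm(q)
    H = np.linalg.norm(q) ** 2 * (np.eye(n) - 2 * np.outer(v, v))
    cols.append(H / np.linalg.norm(H, axis=0))   # normalize each column
cols = np.hstack(cols)
W = rng.standard_normal((n, 2000))
W /= np.linalg.norm(W, axis=0)                   # uniform reference sample
print(np.abs(cols).max(axis=0).mean())           # roughly 0.9 for n = 20
print(np.abs(W).max(axis=0).mean())              # roughly 0.5 for n = 20
```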

3 Equal Area Partition of the Unit Sphere

In [10] and [11], Leopardi introduced the recursive zonal equal area (EQ) sphere partitioning algorithm for partitioning the unit sphere \(\mathbb{S}^{n-1}\subset\mathbb{R}^{n}\) into regions of equal area and small diameter. An equal area partition P is defined to be a nonempty finite set of closed Lebesgue measurable subsets R of \(\mathbb{S}^{n-1}\), called regions, such that \(\bigcup_{R\in P} R=\mathbb{S}^{n-1}\) and any two regions may only share points on their boundaries. The regions have equal area in the sense of the Lebesgue measure \(\mathcal{L}^{n-1}\), with \(\mathcal {L}^{n-1}(R)=\mathcal{L}^{n-1}(\mathbb{S}^{n-1})/N\), where N is the number of regions in the partition and the boundary of each region has measure zero. The diameter of a region is defined by diam R:=sup{∥x−y∥:x,y∈R}.

In [12], Leopardi proved two useful results about this algorithm for n≥2 and any number of regions N≥1. First, he showed that the EQ partition algorithm produces an equal area partition of \(\mathbb {S}^{n-1}\). Second, he showed that the set of EQ partitions has diameter bound \(K\in\mathbb{R}_{+}\) in the following sense:

$$ \mbox{for any partition }P \mbox{ and for each } R\in P, \operatorname {diam}R\leq K N^{-1/(n-1)}. $$
(4)

As a consequence of the EQ partition algorithm, each region is defined to be a Cartesian product of spherical caps (the set of points of \(\mathbb{S}^{d}\), 1≤dn−1, whose spherical distance to a given point x is at most θ, for some value of θ) and intervals in spherical coordinates. The center of each region can then be defined to be the point that corresponds to the center of each spherical cap and/or interval. This leads to the following useful proposition which will be used to find a dense set of directions for the QrMads instance:

Proposition 3.1

Let \(\{P_{j}\}_{j=1}^{\infty}\) be a sequence of EQ partitions such that the number of regions N j →∞. Next, build a sequence of points \(\{c_{i}\}_{i=1}^{\infty}\) from the region centers by exhausting the centers of all the regions of a single partition, in any order, before taking centers from the next partition in the sequence. Then the resulting sequence, \(\{c_{i}\}_{i=1}^{\infty}\), will be dense in \(\mathbb{S}^{n-1}\).

Proof

Let \(w\in\mathbb{S}^{n-1}\) and ε>0 be arbitrary. Then for each partition P j there exists a region R j P j with center c j such that wR j . If w is on the boundary of two or more such regions, then we may choose any one. We can then apply (4) to get

$$\|c_j-w\|\leq \operatorname {diam}R_j\leq K N_j^{-1/(n-1)}<\varepsilon, $$

for large enough j. □

4 Constructing a Basis

At each iteration of the Mads algorithm a positive spanning set D k is required. In the previous section we discussed a method for finding a dense set of directions, but not a dense set of positive spanning sets. In this section, we discuss a method for taking a single direction and forming an orthogonal basis that includes the original direction. We can then take the negatives of all the directions in the basis to form a positive spanning set. Since one of the directions from the dense set is included in each positive spanning set, the normalized directions of \(\bigcup_{k=1}^{\infty}D_{k}\) will then be dense in the unit sphere.

To generate an orthogonal basis from a given partition center c k , we will employ the QR decomposition of an n×m matrix (n<m) A k of full rank. This matrix A k can be created by augmenting c k with any matrix of full rank (e.g. the identity or a rotation matrix). The QR decomposition then generates an orthogonal matrix Q k and an upper trapezoidal matrix R k such that Q k R k =A k . It is important to note that the first column of Q k will be a scalar multiple of c k : since R k is upper trapezoidal, its first column is equal to [a,0,…,0]T, for some constant \(a\in\mathbb{R}\). Thus the resulting matrix Q k will give an orthogonal basis with the first direction corresponding to c k .

To obtain a ‘nearly orthogonal’ integer basis, we will take each column vector q i of Q k , project it onto the hypercube centered at the origin with integer side length \(2\cdot2^{|\ell_{k}|+2\ell_{n}}\), and then round each component to the nearest integer as follows:

$$ d_i=\operatorname {round}\biggl(2^{|\ell_k|+2\ell_n} \frac{q_i}{\|q_i\|_{\infty }} \biggr), $$
(5)

where \(\operatorname {round}(\cdot)\) refers to the rounding operation applied to each component of the vector and \(\ell_k\) is as in (3). We then generate the new basis H k , with column vectors d i , and define \(D_{k}:=[H_{k}\ {-H_{k}}]\) to obtain a positive spanning set.
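The construction of this section can be summarized in a short sketch (ours, in Python; the function name is illustrative, and a unit-norm direction c and integers ℓ_k, ℓ_n are assumed):

```python
import numpy as np

def nearly_orthogonal_basis(c, l_k, l_n):
    # Augment c with a full-rank matrix (here the identity): the first
    # column of the QR factor Q is then a scalar multiple of c.
    n = len(c)
    Q, _ = np.linalg.qr(np.column_stack([c, np.eye(n)]))
    # Equation (5): project each column onto the hypercube of side length
    # 2 * 2^(|l_k| + 2 l_n) centered at the origin, then round.
    scale = 2.0 ** (abs(l_k) + 2 * l_n)
    H = np.rint(scale * Q / np.abs(Q).max(axis=0)).astype(int)
    return H, np.hstack([H, -H])    # H_k and the positive spanning set D_k
```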

We add the positive constant \(2\ell_{n}\) to the exponent in (5) in order to ensure that H k will be nonsingular. We do this because, for small values of \(2^{|\ell_{k}|}\), it is possible that the set of vectors \(\{d_{i}=\operatorname {round}(2^{|\ell_{k}|}\frac{q_{i}}{\|q_{i}\|_{\infty }} ): 1\leq i\leq n\}\) will not form a basis. To find the constant \(\ell_{n}\), we use the fact that the distance from a matrix A to the nearest singular matrix B (in Frobenius norm) is bounded below by the smallest singular value of A, that is, if B is singular then ∥B−A∥≥σ n (A). We let A be the matrix with column vectors \(2^{|\ell_{k}|+2\ell_{n}}q_{i}/\|q_{i}\|_{\infty}\), so that \(A=Q_{k}T\), where T is the diagonal matrix with entries \(T_{ii}=2^{|\ell_{k}|+2\ell_{n}}/\|q_{i}\|_{\infty}\). To find the singular values of A, we compute the eigenvalues of

$$A^{T}A=T^{T}Q_{k}^{T}Q_{k}T=T^{2}. $$

This matrix has eigenvalues \((2^{|\ell_{k}|+2\ell_{n}}/\|q_{i}\|_{\infty})^{2}\), 1≤i≤n, and thus the singular values of A are given by \(2^{|\ell_{k}|+2\ell_{n}}/\|q_{i}\|_{\infty}\), 1≤i≤n. Next, we note that \(2^{|\ell_{k}|+2\ell_{n}}/\|q_{i}\|_{\infty}\geq2^{|\ell_{k}|+2\ell_{n}}\), since \(\|q_{i}\|_{\infty}\leq1\), and so for A+E to be nonsingular it suffices to have

$$\|A+E-A\|=\|E\|<2^{|\ell_k|+2\ell_n}, $$

where E represents the perturbation matrix resulting from rounding. As a consequence of the projection and the rounding, the matrix E satisfies |e ij |≤1/2 for all but n choices of i and j, and e ij =0 for the remaining n choices (in each column, the component that achieves the ∞-norm is projected exactly to \(\pm2^{|\ell_{k}|+2\ell_{n}}\) and so incurs no rounding error). Then \(\|E\|\leq\frac{\sqrt {n^{2}-n}}{2}\) and so we require

$$\frac{\sqrt{n^2-n}}{2}<2^{|\ell_k|+2\ell_n}, $$

for all values of \(\ell_k\). Then letting \(\ell_k=0\), we obtain the following requirement for the constant \(\ell_n\), determined only by the dimension n:

$$ \sqrt{n^2-n}<2^{2\ell_n+1}. $$
(6)
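For example, the smallest admissible constant can be computed directly from (6); the following sketch (ours) does so:

```python
import math

def min_ell_n(n):
    # Smallest nonnegative integer l_n with sqrt(n^2 - n) < 2^(2 l_n + 1),
    # per (6); e.g. 0 for n = 2, 1 for n = 3..8, 2 for n = 9..32.
    l = 0
    while math.sqrt(n * n - n) >= 2 ** (2 * l + 1):
        l += 1
    return l
```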

The following proposition will be useful for showing our new version of Mads is a valid instance and for establishing convergence results.

Proposition 4.1

Let \(w\in\mathbb{S}^{n-1}\) and ε>0 be arbitrary, and let \(d=\operatorname {round}(\alpha w/\|w\|_{\infty})\). Then, for α>0 sufficiently large, ∥d/∥d∥−w∥<ε/2.

Proof

In Proposition 3.4 of [7] it is shown that there exists β>0 such that

$$\biggl\Vert \frac{q}{\|q\|}-w\biggr\Vert < \frac{\varepsilon}{2}, $$

when \(q=\operatorname {round}(\alpha w)\) and α>β. Then for any given \(w\in\mathbb{S}^{n-1}\) with \(\frac{\alpha}{\|w\|_{\infty}}>\beta\), we have

$$\biggl\Vert \frac{d}{\|d\|}-w\biggr\Vert < \frac{\varepsilon}{2}. $$

Now, since \(\|w\|_{\infty}\leq1\) for all \(w\in\mathbb{S}^{n-1}\), we can observe that \(\frac{\alpha}{\|w\|_{\infty}}\geq\alpha\). Thus for α>β, the result follows for all \(w\in\mathbb{S}^{n-1}\). □
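Proposition 4.1 is also easy to verify numerically; the following sketch (ours) shows the rounding error vanishing as α grows:

```python
import numpy as np

# Numerical illustration (ours) of Proposition 4.1.
rng = np.random.default_rng(1)
w = rng.standard_normal(8)
w /= np.linalg.norm(w)                      # w on the unit sphere
for alpha in [4.0, 64.0, 1024.0]:
    d = np.rint(alpha * w / np.abs(w).max())
    print(alpha, np.linalg.norm(d / np.linalg.norm(d) - w))   # decreasing
```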

Although QrMads uses a ‘nearly orthogonal’ basis in place of an orthogonal set of directions at each iteration, we believe that it has two potential advantages over OrthoMads. First, the distribution of directions taken over all iterations is more uniform when QrMads is used in place of OrthoMads (see Fig. 2). Second, the method described above forces the distance in ∞-norm from the poll center to a poll trial point to equal the poll size parameter \(\varDelta _{k}^{p}\) exactly, instead of only being bounded above by it, ensuring that the step size taken will not be unintentionally small.

5 The QrMads Instance of Mads

The QrMads algorithm differs from the LtMads and OrthoMads algorithms only in the construction of the poll directions D k and the poll set P k . For all three instances, the mesh M k is defined by the set of directions \(D:=[I_{n}\ {-I_{n}}]\). For QrMads, the choices of the mesh and poll size parameters are

$$ \varDelta _k^p=2^{-\ell_k}\quad \mbox{and}\quad \varDelta _k^m=\min\bigl\{4^{-\ell_k-\ell _n},4^{-\ell_n} \bigr\}, $$
(7)

with \(\ell_n\) defined as in (6), so that \(\varDelta _{k}^{p}/\varDelta _{k}^{m}=2^{|\ell_{k}|+2\ell_{n}}\) for all k, as in (5). The update rules for the mesh index \(\ell_k\) are given in Fig. 1, which describes the QrMads algorithm.
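The identity \(\varDelta _{k}^{p}/\varDelta _{k}^{m}=2^{|\ell_{k}|+2\ell_{n}}\) follows by checking the cases \(\ell_k\geq0\) and \(\ell_k<0\) in (7); a short sketch (ours) confirms it:

```python
def qrmads_sizes(l_k, l_n):
    # Mesh and poll size parameters per (7).
    delta_p = 2.0 ** (-l_k)
    delta_m = min(4.0 ** (-l_k - l_n), 4.0 ** (-l_n))
    return delta_p, delta_m

# The ratio equals 2^(|l_k| + 2 l_n) for every integer l_k (exact here,
# since all quantities are powers of two):
l_n = 1
for l_k in range(-6, 7):
    dp, dm = qrmads_sizes(l_k, l_n)
    assert dp / dm == 2.0 ** (abs(l_k) + 2 * l_n)
```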

Fig. 1
figure 1

The QrMads algorithm

This algorithm is essentially the same as the algorithm from [7], with t k and \(c_{t_{k}}\) in place of the Halton directions \(u_{t_{k}}\) and \(q_{t_{k},\ell_{k}}\), respectively. The integer t k and the update rules are chosen so that there will be a subsequence of unsuccessful iterations with \(\varDelta _{k}^{m}\rightarrow0\) and such that the directions used in the subsequence will correspond to the entire sequence of partition centers as in [7].

The poll directions D k are determined using a sequence of partition centers as described in Sect. 3, the mesh index \(\ell_k\), a sequence of n×n matrices of full rank, and the index t k . At each iteration, a center \(c_{t_{k}}\) is taken from the sequence of partition centers; it is then augmented with a random orthogonal matrix to form an n×(n+1) matrix, and QR decomposition is used to find an orthogonal basis matrix with the direction from the first column corresponding to \(c_{t_{k}}\), as described in Sect. 4. The columns of the resulting matrix Q k are then projected onto the hypercube centered at zero with side length \(2\cdot2^{|\ell _{k}|+2\ell_{n}}\) and rounded. The resulting matrix H k will then be ‘nearly orthogonal’ with the length of each column in ∞-norm equal to \(\varDelta _{k}^{p}/\varDelta _{k}^{m}\). This forces the trial point \(x_{k}+\varDelta _{k}^{m}D_{k}e_{i}\) to be exactly \(\varDelta _{k}^{p}\) from the poll center in ∞-norm. Finally, H k is completed to a maximal positive basis composed of 2n directions, \(D_{k}=[H_{k}\ {-H_{k}}]\).
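A single direction build, as just described (with the random orthogonal augmentation of this section in place of the identity used in Sect. 4), might look as follows. This is our sketch, with a random unit vector standing in for the partition center \(c_{t_k}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, l_k, l_n = 5, -3, 1
M, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
c = rng.standard_normal(n)
c /= np.linalg.norm(c)                             # stand-in for c_{t_k}
Q, _ = np.linalg.qr(np.column_stack([c, M]))
scale = 2.0 ** (abs(l_k) + 2 * l_n)                # = delta_p / delta_m = 32
H_k = np.rint(scale * Q / np.abs(Q).max(axis=0)).astype(int)
D_k = np.hstack([H_k, -H_k])
print(np.abs(D_k).max(axis=0))                     # each column: 32 in inf-norm
```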

The following lemma will be useful to show that QrMads has the same convergence properties as in [1].

Lemma 5.1

The set of normalized directions \(\{d_{i,\ell_{k}}/\|d_{i,\ell _{k}}\| \}_{i=1}^{\infty}\), with \(d_{i,\ell_{k}}= \operatorname {round}(2^{|\ell_{k}|+2\ell_{n}}c_{i}/\|c_{i}\|_{\infty} )\) as in (5), is dense on \(\mathbb{S}^{n-1}\).

Proof

Let \(w\in\mathbb{S}^{n-1}\) and ε>0 be arbitrary. From Proposition 3.1, there exist a subsequence \(\{c_{i_{j}}\}_{j=1}^{\infty}\) of region centers of the sequence of partitions and a constant N>0 such that \(\|w-c_{i_{j}}\|<\varepsilon/2\), for all i j >N. From Proposition 4.1, for large values of \(|\ell_k|\) we have \(\| c_{i_{j}}-\frac{d_{i_{j},\ell_{k}}}{\|d_{i_{j},\ell_{k}}\|}\| < \varepsilon/2\). Then by applying the triangle inequality we obtain

$$\biggl\Vert w-\frac{d_{i_j,\ell_k}}{\|d_{i_j,\ell_k}\|}\biggr\Vert \leq\|w-c_{i_j}\|+\biggl\Vert c_{i_j}- \frac{d_{i_j,\ell _k}}{\|d_{i_j,\ell_k}\|}\biggr\Vert <\frac{\varepsilon}{2}+ \frac {\varepsilon}{2}=\varepsilon, $$

for some i j >N and for some large \(|\ell_k|\). □

We are now ready for the main result:

Theorem 5.1

QrMads is a valid Mads instance with the convergence properties from [1].

Proof

We need to show that the poll directions for QrMads satisfy the following five properties from [1, 13]:

  • Each D k is a positive spanning set: This follows from the use of the constant \(\ell_n\) to ensure that the round procedure on Q k results in a basis.

  • Any direction D k e i (1≤i≤2n) can be written as a non-negative integer combination of the directions of D: This is true by construction since \(D=[I_{n}\ {-I_{n}}]\) and D k e i is an integer vector on the hypercube centered at the origin with side length \(2\cdot2^{|\ell_{k}|+2\ell_{n}}\).

  • The distance from the poll center x k to a poll trial point in ∞-norm is bounded above by \(\varDelta _{k}^{p}\): This is true by construction since we ensured \(\|D_{k}e_{i}\|_{\infty}= 2^{|\ell _{k}|+2\ell_{n}}\) and so \(\|\varDelta _{k}^{m}D_{k}e_{i}\|_{\infty}=\varDelta _{k}^{m}2^{|\ell _{k}|+2\ell_{n}}=\varDelta _{k}^{p}\).

  • The set of normalized directions used over all failed iterations is dense on the unit sphere: This follows from the strategy chosen for updating D k so that the entire sequence of partition centers corresponds to the sequence of failed iterates, by Lemma 5.1, and (2), which ensures the existence of large \(|\ell_k|\).

  • Limits (as defined by [8]) of convergent subsequences of the normalized sets \(\overline{D}_{k}= \{ \frac{d}{\|d\|}: d\in D_{k} \}\) are positive spanning sets: For this property we need to show \(\vert\det (\overline{H}_{k} )\vert\geq\tau\) for all k, for some constant τ>0. The result will then follow as in [13].

First, note that the set \(\mathcal{O}=\{A\in\mathbb{R}^{n\times n}: A\mbox{ is an orthogonal matrix}\}\) is a bounded set since \(\mathcal{O}\subset\{A\in\mathbb{R}^{n\times n}:\|A\|=\sqrt{n}\} \subset B(0,\sqrt{n}+\varepsilon)\), where ∥⋅∥ denotes the Frobenius norm on \(\mathbb{R}^{n\times n}\) and \(B(0,\sqrt{n}+\varepsilon)\) is the ball of radius \(\sqrt{n}+\varepsilon\) centered around the matrix of zeros. Furthermore, the set \(V_{\delta}=\{B:\|A-B\|\leq n \delta\ \mbox{for some}\ A\in\mathcal{O}\}\) is bounded since any such B satisfies \(\|B\|\leq\sqrt{n}+n\delta\), and so \(V_{\delta}\subset B(0,\sqrt {n}+n\delta+\varepsilon)\). Now, since the determinant function is C 1 on \(\mathbb{R}^{n\times n}\), it is Lipschitz on the bounded set V δ . That is, there exists a constant K>0 such that, for any B,C∈V δ ,

$$\bigl|\det(C)-\det(B)\bigr|\leq K\|C-B\|. $$

This then yields

$$\bigl|\det(B)\bigr|\geq\bigl|\det(C)\bigr|-K\|C-B\|. $$

So, in particular, if we take \(C\in\mathcal{O}\) (since \(\mathcal {O}\subset V_{\delta}\)) and B such that ∥C−B∥≤nδ, we get

$$\bigl|\det(B)\bigr|\geq\bigl|\det(C)\bigr|-K\|C-B\|\geq1-K n\delta. $$

Now, note that for any k, \(Q_{k}\in\mathcal{O}\) and \(\overline{H}_{k}\in V_{\delta_{k}}\) for some δ k >0, since \(\overline{H}_{k}=Q_{k}+E_{k}\) for some perturbation matrix E k resulting from rounding. Next, we choose δ small enough so that 1−Knδ>0. Then by Proposition 4.1, we can choose \(|\ell_k|\) large enough, say \(|\ell_k|>\ell_{\delta}\), so that \(\Vert \overline{H}_{k}-Q_{k}\Vert \leq n\delta\) and conclude

$$\bigl \vert \det (\overline{H}_k )\bigr \vert \geq1-Kn\delta,\quad \mbox{for all}\ k\ \mbox{such that}\ |\ell_k|>\ell_{\delta}. $$

Next, let H k have columns d i and note that by construction det(H k )≠0. Since H k has integer components, we know |det(H k )|≥1. Also, note \(\|d_{i}\|\leq\sqrt{n}2^{|\ell_{k}|+2\ell_{n}}\), 1≤i≤n, since each d i is on the hypercube with side length \(2\cdot2^{|\ell_{k}|+2\ell_{n}}\). We then get the following estimate:

$$\bigl \vert \det (\overline{H}_k )\bigr \vert =\frac{\vert \det (H_k )\vert }{\prod_{i=1}^n\|d_i\|}\geq\frac{1}{ (\sqrt{n}2^{|\ell_k|+2\ell_n} )^n}=n^{-n/2}2^{-n(|\ell_k|+2\ell_n)}. $$

Finally, we note that \(n^{-n/2}2^{-n(|\ell_{k}|+2\ell_{n})}\) is decreasing as \(|\ell_k|\) increases. Then \(\vert \det (\overline{H}_{k} )\vert \geq n^{-n/2}2^{-n(\ell_{\delta}+2\ell_{n})}\), if \(|\ell_k|\leq\ell_{\delta}\), where \(\ell_{\delta}\) is defined above. We then let

$$\tau=\min\bigl\{1-Kn\delta, n^{-n/2}2^{-n(\ell_{\delta}+2\ell_n)}\bigr\} $$

and conclude \(\vert\det (\overline{H}_{k} )\vert\geq\tau>0\) for all k. □
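The behavior established in this proof can also be checked numerically; in the following sketch (ours), \(\vert\det(\overline{H}_k)\vert\) stays bounded away from zero and approaches 1 as \(|\ell_k|\) grows:

```python
import numpy as np

rng = np.random.default_rng(2)
n, l_n = 6, 2
c = rng.standard_normal(n)
c /= np.linalg.norm(c)
Q, _ = np.linalg.qr(np.column_stack([c, np.eye(n)]))
for l_k in [0, 2, 4, 8]:
    scale = 2.0 ** (abs(l_k) + 2 * l_n)
    H = np.rint(scale * Q / np.abs(Q).max(axis=0))
    Hbar = H / np.linalg.norm(H, axis=0)           # normalized columns
    print(l_k, abs(np.linalg.det(Hbar)))           # tends to 1
```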

6 Numerical Tests

In this section, we compare QrMads and OrthoMads on 150 problems taken from the optimization literature. Each test was performed using our own MATLAB implementation. The test problems fall into three categories. The first category consists of 60 smooth test functions taken from [14]. The second category consists of 62 nonsmooth test functions taken from [15] and [16], where many of the problems in [16] are variable dimension generalizations of the problems from [15]. Lastly, 28 constrained test problems are also taken from [15] and [16], with the problems from [16] being constrained generalizations of some of the nonsmooth problems. Many of the constrained problems from [16] were chosen to correspond to unconstrained problems where OrthoMads performed better than QrMads.

The choices for \(\varDelta _{k}^{m}\) and \(\varDelta _{k}^{p}\) correspond to (3) and (7). For both instances the mesh index \(\ell_k\) is allowed to be negative, and the initial mesh size \(\varDelta _{0}^{m}\) is set to one. The stopping criteria are satisfied if either the number of function evaluations reaches 1000n or \(\varDelta _{k}^{p}\) drops below 10−10. For both implementations, an opportunistic strategy was used for the Poll step (the Poll stops as soon as a successful point has been found) and no Search step was performed. For both implementations, constraints are handled using the extreme barrier method (f(x)=+∞ if x∉Ω).

For each test, a partition of the unit sphere of size N was generated, with N satisfying the following:

$$N=\begin{cases} 10^6 & \mbox{if } n\leq6,\\ 10^n & \mbox{if } 6<n\leq15,\\ 10^{15} & \mbox{if } n>15. \end{cases} $$
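In code, this rule is simply (our sketch):

```python
def partition_size(n):
    # Partition size N used in the tests, per the rule above.
    if n <= 6:
        return 10 ** 6
    if n <= 15:
        return 10 ** n
    return 10 ** 15
```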

A random ordering of the centers of each region of the partition was then used as the sequence of directions for QrMads. This choice was made purely out of convenience as a means of testing the algorithm. There are many choices that can be made for generating a dense sequence of directions, including starting with a smaller partition size and exhausting that partition before moving on to the centers from a larger partition. The performance of the QrMads algorithm when using different sequences of partition centers is unknown at this time and is an area for future research. Due to the random ordering of the partition centers, 30 runs of QrMads were performed on each test function and the final function values were compared to the value obtained from the OrthoMads algorithm.

Figure 2 illustrates the distribution and the density of directions for three Mads instances: LtMads, OrthoMads, and QrMads. The three variants were run on the Rosenbrock function with dimensions n=4, 10, and 20. In each case there are roughly 500n directions shown, where each direction is projected onto the plane defined by the first two coordinates. The resulting plots are typical of what can be expected regardless of the choice of the two coordinates. The LtMads directions have a large proportion with zero coordinates, and the OrthoMads directions tend to cluster within specific regions, increasingly around the coordinate directions as the dimension grows. In each case, the QrMads directions exhibit a relatively uniform distribution.

Fig. 2
figure 2

First two coordinates of the normalized poll directions of three instances of the Mads algorithm on the n-dimensional Rosenbrock function with n=4,10,20

We also found that QrMads finds a final function value for the Rosenbrock function that is strictly better than the final value obtained by OrthoMads for 19 of the 30 runs when n=4 and for all 30 runs when n=10 or 20. Overall, the test results presented in this section show a similar pattern of QrMads outperforming OrthoMads as the dimension of the test functions increases.

Note 6.1

In Fig. 2, the QrMads directions (and to some extent the LtMads directions) become more restricted towards the center as the dimension increases. This behavior is a reflection of the concentration of measure phenomenon and the fact that the observable diameter of \(\mathbb{S}^{n-1}\) converges to zero as n→∞ (for more on this topic see [17]). Intuitively, one can imagine that as n increases, the contribution to the norm of a vector from the first two elements gets smaller.

The results of the tests are summarized in Tables 1, 2, 3, and 4. If we let \(f_{O}^{*}\) denote the final value obtained by OrthoMads, \(f_{QR}^{*}\) denote the final value obtained by QrMads, and f(x 0) denote the initial value of f, then we can define the three scores S 1, S 2, and S 3 as follows:

  • S 1:= the number of final values from 30 runs of QrMads for which \(f_{QR}^{*}<f_{O}^{*}\).

  • S 2:= the number of final values from 30 runs of QrMads for which \(f_{QR}^{*}\leq f_{O}^{*}+0.01(f(x_{0})-f_{O}^{*})\).

  • S 3:= the number of final values from 30 runs of QrMads for which \(f_{O}^{*}\leq f_{QR}^{*}+0.01(f(x_{0})-f_{QR}^{*})\).

In this way, we can measure the number of final values of f for the 30 runs of QrMads that are strictly better than the value obtained from OrthoMads (S 1), as well as the number of final values of f from QrMads that are within 1 % of the value obtained from OrthoMads (S 2) and vice versa (S 3).
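For a given problem, the three scores can be computed from the 30 final values as in the following sketch (ours; the array names are illustrative):

```python
import numpy as np

def scores(f0, f_orth, f_qr):
    # S1, S2, S3 as defined above: f0 = f(x_0), f_orth = OrthoMads final
    # value, f_qr = array of the 30 QrMads final values.
    f_qr = np.asarray(f_qr)
    s1 = int(np.sum(f_qr < f_orth))
    s2 = int(np.sum(f_qr <= f_orth + 0.01 * (f0 - f_orth)))
    s3 = int(np.sum(f_orth <= f_qr + 0.01 * (f0 - f_qr)))
    return s1, s2, s3
```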

Table 1 Results for smooth problems
Table 2 Results for nonsmooth problems
Table 3 Results for constrained problems
Table 4 Summary of results for all test problems

Table 1 shows the results from the tests performed on the set of 60 unconstrained smooth functions. QrMads received a score for S 1 of 15 or more on 38 of these test problems, meaning that QrMads performed better than OrthoMads on more than half the 30 runs for a majority of the test functions from this set. QrMads received a score for S 2 of 15 or more on all of the problems from this set, indicating that QrMads finds a final value within a 1 % tolerance of OrthoMads on nearly all of the problems from this set. Finally, OrthoMads received a score for S 3 of 15 or more on 48 out of 60 of the test problems from this set. Of the 12 scores for S 3 that are below 15, there are 11 zeros and a four, indicating that QrMads showed a significant improvement over OrthoMads on 12 of the test problems from this set.

Table 2 shows similar results for the set of 62 unconstrained nonsmooth functions tested. QrMads received a score for S 1 of 15 or more on 30 of these test problems, meaning that QrMads performed better than OrthoMads on more than half the 30 runs for nearly half of the test functions in this set. QrMads received a score for S 2 of 15 or more on all but ten of the test functions from this set, indicating that QrMads finds a final value within a 1 % tolerance of OrthoMads on a large majority of these problems. Finally, OrthoMads received a score for S 3 of 15 or more on 42 out of 62 of the test problems from this set. Of the 20 scores for S 3 that are below 15, there are 15 zeros, a one, and a three, indicating that QrMads showed a significant improvement on these test problems.

Table 3 shows the results for the set of constrained test problems. QrMads received a score for S 1 of 15 or more on 17 out of 28 of the test problems from this set, meaning that QrMads performed better than OrthoMads on a majority of these problems. For S 2, QrMads received a score of 15 or more on all but five of the constrained test problems, again showing that QrMads finds a final value within a 1 % tolerance of OrthoMads on a large majority of these problems. OrthoMads received a score for S 3 of 15 or more on 15 out of 28 of the test problems from this set. Of the 13 scores for S 3 that are below 15, there are ten zeros, a two, and a seven, indicating that QrMads showed significant improvement on these test problems. It is of interest to note that many of the constrained problems in this set were chosen to be generalizations of nonsmooth problems on which QrMads did not perform as well as OrthoMads (based on the S 1 score), yet QrMads performed better than OrthoMads on most of these problems when the dimension exceeded 20.

Finally, Table 4 summarizes the results for all the test problems broken into categories of increasing dimension. The only categories where QrMads does not show a clear improvement over OrthoMads are the cases where the dimension n is below 20. As the dimension of the problems increases, however, we can see that QrMads outperforms OrthoMads on a consistent basis. We believe that this reflects the decreasing quality in the distribution of the poll directions for OrthoMads as n increases and the relatively uniform distribution of poll directions for the QrMads implementation for all dimensions, as illustrated in Fig. 2.

7 Conclusions

Just as OrthoMads was developed to generate a more uniform set of polling directions than what is produced by LtMads, this paper introduces QrMads as a means of further improving on this idea when compared to OrthoMads. Although OrthoMads produces a relatively uniform distribution of poll directions for low dimensional problems, in high dimensions this is no longer the case. It is the use of the Householder transformation for generating an orthogonal set of directions that causes a significant number of directions to cluster toward the coordinate directions, when we consider the set of directions taken over all iterations. Although the authors of [7] believed orthogonality to be important for eliminating large angles between poll directions, they overlooked a subtle mathematical implication of using Householder transformations that prevents a uniform distribution and is masked in low dimensions. QrMads seeks to remedy this by using QR decomposition and an equal area partition of the unit sphere to generate a more uniformly distributed set of directions that are ‘nearly orthogonal’ at each iteration. The trade-off is that QrMads is not deterministic. Consequently, an area of further study is that of constructing a deterministic version of this Mads instance.

In some real-world applications, the coordinate directions often have important physical meaning, and thus may be better choices than those chosen randomly. In this case, OrthoMads may perform better. On the other hand, we believe that having a uniform distribution of poll directions is beneficial for problems with higher dimension and when polling near constraint boundaries because the more uniform set of directions should increase the chance of finding a direction close to a tangent cone generator and thus help the algorithm to avoid getting bogged down near a constraint boundary. The results from the problems tested in this paper support this conjecture since QrMads was shown to outperform OrthoMads on a large majority of the higher dimensional problems (n≥20) and a large majority of the constrained problems.