1 Introduction

Many real-world optimization problems are stated as global optimization problems since the functions describing these applications are often multiextremal, non-differentiable, and hard to evaluate even at one point (see, for example, [17, 21, 22, 31, 34, 35, 47, 52]). In this paper, we focus our attention on the continuous global optimization problem

$$\begin{aligned} \min \{F(y): \ y\in S=[a,b]\}, \end{aligned}$$
(1.1)

where S is a hyperinterval in \(\mathbf{R}^N\) and the objective function F(y) can be multiextremal, non-differentiable, and given as a “black-box”, i.e., no information regarding its analytical representation or any other data describing its structure is available. However, it is supposed that F(y) satisfies the Lipschitz condition

$$\begin{aligned} |F(y')-F(y'')| \le L \Vert y'-y''\Vert , y', y''\in S, \end{aligned}$$
(1.2)

with an unknown Lipschitz constant L, \(0<L<\infty ,\) where \(\Vert \cdot \Vert \) denotes the Euclidean norm. This statement is encountered very often in practice, and numerous methods for dealing with the problem (1.1), (1.2) exist in the literature (see, e.g., [1, 3, 4, 5, 13, 17, 21, 23, 32, 33, 34, 41, 47, 48, 51, 52]).

In this paper, we consider the applied problem (1.1), (1.2) by using one of the most abstract mathematical objects: space-filling curves, introduced by Peano in 1890 and independently by Hilbert in 1891 (even though we use Hilbert’s version of the curves, the traditional terminology for this kind of objects is “Peano curves” due to the priority of Peano). The curves under consideration emerge as the limit objects generated by an iterative process. They are fractals constructed using the principle of self-similarity. It is possible to prove that the curves fill the hypercube \(S \subset \mathbf{R}^N\), i.e., they pass through every point of S (this fact gave rise to the term “space-filling curves”). It is known that the curves can be used to reduce the dimension of the global optimization problem (1.1), (1.2) and to move from a multivariate problem to a univariate one (see studies in this direction in [2, 38, 44, 45, 46, 47]).

More precisely, it can be shown (see [2, 45, 47]) that, by using space-filling curves, the multi-dimensional global minimization problem (1.1), (1.2) can be turned into a one-dimensional problem and that finding the global minimum of the Lipschitz function \(F(y), y \in S \subset \mathbf{R}^N,\) is equivalent to determining the global minimum of the one-dimensional function f(x) over the interval [0, 1], i.e.,

$$\begin{aligned} f(x)=F(p(x)), \quad x\in [0,1], \end{aligned}$$
(1.3)

where p(x) is the Peano curve. Moreover, the Hölder condition

$$\begin{aligned} |f(x')-f(x'')| \le H |x'-x''|^{1/N}, \quad x',x'' \in [0,1], \end{aligned}$$
(1.4)

holds (see [47]) for the function f(x) with the constant

$$\begin{aligned} H=2L\sqrt{N+3}, \end{aligned}$$
(1.5)

where L is the Lipschitz constant of the original multi-dimensional function F(y) from (1.1), (1.2). In Fig. 1-right, the reduced one-dimensional function corresponding to the two-dimensional test function from Fig. 1-left is shown. Clearly, a numerical approximation of the Peano curve is used in computations for the reduction. Thus, one can try to attack the problem (1.1), (1.2) by proposing algorithms for minimizing the Hölderian function (1.3), (1.4) in one dimension.
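To make the reduction (1.3) concrete, the following minimal sketch (not the implementation used in this paper) evaluates the reduced function \(f(x)=F(p_M(x))\) for \(N=2\) by means of the classical Hilbert cell-indexing recursion; the function names and the mapping of a cell to its centre are illustrative assumptions only.

```python
def d2xy(order, d):
    """Map a distance d along the Hilbert curve of the given order to integer
    cell coordinates (i, j) on a 2^order x 2^order grid (classical recursion)."""
    i = j = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        ri = 1 & (t // 2)
        rj = 1 & (t ^ ri)
        if rj == 0:                      # rotate/flip the quadrant if needed
            if ri == 1:
                i, j = s - 1 - i, s - 1 - j
            i, j = j, i
        i += s * ri
        j += s * rj
        t //= 4
        s *= 2
    return i, j


def reduced_function(F, x, order, a, b):
    """Evaluate f(x) = F(p_M(x)) over a 2D box [a, b] by using the centre of the
    Hilbert cell containing x as an approximation of the Peano curve point."""
    n = 1 << order
    d = min(int(x * n * n), n * n - 1)   # index of the cell along the curve
    i, j = d2xy(order, d)
    y1 = a[0] + (i + 0.5) * (b[0] - a[0]) / n
    y2 = a[1] + (j + 0.5) * (b[1] - a[1]) / n
    return F((y1, y2))


# Example: evaluate the reduced function of a simple Lipschitz test function
f_val = reduced_function(lambda y: (y[0] - 0.3) ** 2 + (y[1] + 0.2) ** 2,
                         x=0.37, order=5, a=(-1.0, -1.0), b=(1.0, 1.0))
```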

Fig. 1
figure 1

A two-dimensional function from [10] satisfying the Lipschitz condition together with an approximation of level 5 to the Peano curve (left) and the corresponding univariate Hölderian function (right). Dots show points on the curve where the objective function has been evaluated

It can be seen from the statement of the original problem (1.1), (1.2) that the only available information regarding the multi-dimensional function F(y) is that it satisfies the Lipschitz condition (1.2) with an unknown constant L. As a result, the way the Lipschitz information is used by an optimization algorithm becomes crucial for its performance, convergence, and speed. In the literature there exist several methods to estimate L (see [4, 11, 12, 15, 16, 17, 18, 42, 43, 44, 47, 50]), and it is known that an overestimate of L may slow down the search whereas an underestimate of the constant can lead to the loss of the global solution. Let us briefly describe the methods used to estimate L.

First, there exist algorithms that use, for the whole domain S, either the same a priori given estimate of L or an adaptive estimate of it recalculated during the search at each iteration (see, e.g., [4, 17, 18, 33, 34, 36, 43, 44, 47]). This approach does not take into account any local information about the behavior of the objective function over small subregions of the domain S, a drawback that can slow down the search significantly. A more advanced approach, proposed originally in [39, 40], adaptively approximates local Lipschitz constants \(\tilde{L}(D_j)\) in different subregions \(D_j\subset S\) of the search region S during the process of optimization. This procedure performs a local tuning on the behavior of the objective function, balancing global and local information obtained during the search (see also interesting hybridization ideas in [49, 50]). It has been shown in [20, 24, 39, 44, 47] that the local tuning techniques can lead to a significant acceleration of the global search. Another interesting approach, introduced in [19] in the popular method called DIRECT, uses several estimates of the Lipschitz constant L simultaneously at each iteration. This way of dealing with Lipschitz information has attracted wide interest among researchers (see, e.g., [6, 7, 8, 9, 19, 23, 29, 30, 31, 32, 33]) and is under scrutiny in this work, as well.

In this paper, we propose to use Peano curves and, instead of working with the Lipschitz information in many dimensions, to work with the Hölder information in one dimension, obtaining several estimates of the Hölder constant by means of the DIRECT methodology. It should be stressed that such a transposition of the approach is not trivial at all. In fact, in the literature (see [14, 24, 25, 27, 28, 44]) there exist several methods estimating global and local Hölder constants, whereas the usage of the DIRECT approach encounters a number of serious difficulties (see [26]) in the context of Hölder optimization. In Sect. 2, we describe a strategy that solves them and allows us to work with several estimates of the Hölder constant at each iteration. Then, a two-phase procedure intended to accelerate the search is presented in Sect. 3. A new algorithm using both techniques for solving the problem (1.1), (1.2) and its convergence properties are described in Sect. 4. Section 5 presents results of numerical experiments that compare the new method with its competitors on 1000 test functions randomly generated by the GKLS-generator from [10]. Finally, Sect. 6 contains a brief conclusion.

2 Two ways to represent Hölderian minorants

Due to the use of the Peano space-filling curves, the N-dimensional problem (1.1), (1.2) is turned into the one-dimensional problem (1.3), (1.4) with the one-dimensional objective function f(x) from (1.3) satisfying the Hölder condition (1.4) with a constant \(0< H < \infty \) over the interval [0, 1]. It follows from (1.4) that, for all \(x, z\in [0,1]\) we have

$$\begin{aligned} f(x) \ge f(z)-H |x-z|^{1/N}. \end{aligned}$$
(2.1)

This fact means that the function

$$\begin{aligned} G(x)=f(z)-H |x-z|^{1/N}, \end{aligned}$$

with \(z\in [0,1]\) fixed, is a minorant (or support function) for f(x) over [0, 1], i.e.

$$\begin{aligned} f(x) \ge G(x), x \in [0,1]. \end{aligned}$$

Analogously, if we consider subintervals \(d_i=[a_i,b_i]\), \(1 \le i \le k,\) belonging to [0, 1] we obtain that the following function

$$\begin{aligned} G^k(x) = g_i(x), \quad x\in [a_i,b_i], \quad 1 \le i \le k, \end{aligned}$$
(2.2)
$$\begin{aligned} g_i(x) = \left\{ \begin{array}{ll} g_i^-(x) = f(m_i) -H (m_i-x)^{1/N}, & x\in [a_i,m_i], \\ g_i^+(x) = f(m_i) -H (x-m_i)^{1/N}, & x\in [m_i, b_i], \end{array} \right. \end{aligned}$$
(2.3)
$$\begin{aligned} m_i = (a_i+b_i)/2 \end{aligned}$$
(2.4)

is a discontinuous nonlinear minorant for f(x) (see Fig. 2), and the values \(R_i, 1 \le i \le k,\) defined below are lower bounds for the function f(x) over the intervals \(d_i, 1 \le i \le k\). These values are called characteristics of the intervals and, if an overestimate \(H_1 \ge H\) of the Hölder constant H is given, can be calculated as follows

$$\begin{aligned} R_i= R_i(H_1) = \min _{x \in [a_i,b_i]} g_i(x) = f(m_i) -H_1 |(b_i-a_i)/2|^{1/N}. \end{aligned}$$
(2.5)
Fig. 2
figure 2

Hölder support functions
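The characteristic (2.5) of an interval reduces to a single expression. A minimal sketch (the function name and the sample values are illustrative):

```python
def characteristic(f_mid, a_i, b_i, H1, N):
    """Lower bound (2.5) of f over [a_i, b_i], obtained from the value f(m_i) at
    the midpoint and an overestimate H1 of the Hölder constant."""
    return f_mid - H1 * ((b_i - a_i) / 2.0) ** (1.0 / N)


# Example: an interval of width 1/9 in dimension N = 3, f(m_i) = 0.5, H1 = 2
R_i = characteristic(0.5, 4.0 / 9.0, 5.0 / 9.0, H1=2.0, N=3)
```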

As was mentioned in the introduction, the DIRECT algorithm (see [19]) uses at each iteration several estimates of the Lipschitz constant in order to select a suitable set of subintervals at whose central points the objective function is evaluated. This selection can be easily done thanks to a smart representation of the intervals in a two-dimensional diagram. This representation is the core of DIRECT and is possible because the Lipschitz information is used by this method to produce piecewise linear minorants. In order to use the same methodology in the framework of Hölderian optimization, it is necessary to find a suitable representation of the intervals, as well.

Let us try to do this following the idea of DIRECT and show that a simple transposition from the Lipschitz to the Hölder world does not work. Exactly as DIRECT does, we represent in a two-dimensional diagram each interval \( d_i=[a_i,b_i]\) by a point with coordinates \( (h_i,f(m_i)) \), where \(h_i=0.5(b_i-a_i) \) and \(m_i\) is from (2.4). In Fig. 3-left, five different intervals \(d_A, d_B, d_C, d_D\), and \(d_E\) are represented by the points A, B, C, D, and E, respectively. If we consider a fixed overestimate \(H_1\) of the Hölder constant, we can observe the corresponding nonlinear support functions (2.3) (shown as blue solid lines) related to these intervals. The characteristic \(R_A(H_1)\) of the interval represented by the dot A is obtained as the intersection of the curve (2.3) constructed at the point A with the vertical coordinate axis. It can be seen that the best (the lowest) characteristic is \(R_D(H_1)\) and the interval \(d_D\) would be subdivided at the next iteration if \(H_1\) is chosen as the estimate for H. However, the choice of \(R_D(H_1)\) is not easy since, as can be seen from Fig. 3-left, the curves constructed using the estimate \(H_1\) intersect one another in various ways.

In addition, recall that we do not know the real value of H and wish to try all possible estimates of H from zero to infinity. The auxiliary functions corresponding to the second estimate \(H_2\) are shown in Fig. 3-left by red dashed lines. They again produce many intersections among themselves and with the curves corresponding to \(H_1\). It becomes clear that, even with such a small number of intervals, the selection of the lowest characteristic for all possible estimates of H is complicated, and it is unclear how to select intervals when the estimates of the Hölder constant vary from 0 to infinity.

In order to overcome this difficulty and to give a more transparent procedure for the selection of the best characteristics, a different representation of the intervals is proposed. The idea consists in using the Hölder metric instead of the Euclidean one in the construction of the diagram. More precisely, a generic interval \(d_i=[a_i,b_i]\) belonging to a current partition \( \{ D^k\}\) at the kth iteration is represented by a dot \(P_i\) with the coordinates \((p_i, w_i)\) where

$$\begin{aligned} p_i=|(b_i-a_i)/2|^{1/N}, w_i=f(m_i), \end{aligned}$$
(2.6)

and \(m_i\) is from (2.4).
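With these coordinates, the lower bound (2.5) becomes a linear function of the new abscissa, which is exactly why the nonlinear curves of Fig. 3-left turn into straight lines:

$$\begin{aligned} R_i(H) = f(m_i) - H |(b_i-a_i)/2|^{1/N} = w_i - H\, p_i. \end{aligned}$$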

Fig. 3
figure 3

Representation of intervals in the Euclidean metric (left) and in the Hölderian metric (right)

In Fig. 3-right, the representation of the same five intervals considered in Fig. 3-left can be observed in the new metric. A great simplification can be clearly seen: for each fixed estimate of H there are no longer nonlinear curves intersecting one another. The obtained diagram is very similar to that used by the DIRECT method in the Lipschitzian case [19]. In Fig. 3-right, the characteristic \(R_A(H_1)\) of the interval represented by the point A is exactly the intersection of the line passing through A with slope \(H_1\) and the vertical coordinate axis. Notice that, as expected, the values on the vertical coordinate axis coincide with those of Fig. 3-left. The selection of intervals with the best characteristics corresponding to different estimates of H becomes much easier and is discussed in the following two sections.

3 Selection of intervals: two-phase approach

In this section, we describe in detail the interval selection procedure that will be used in the method to be introduced in Sect. 4. As was already said above, at each iteration k the method should select in a suitable way a promising set of subintervals in which it intends to intensify the search and execute new trials (a trial is an evaluation of f(x) at a point x, called a trial point). To accelerate the search, a two-phase technique that balances the global and local information collected during the work of the method is introduced.

Fig. 4
figure 4

The nondominated intervals \(d_A, d_B, d_C, d_E, d_G\), and \(d_I\) are represented by the dots A, B, C, E, G, and I

In order to describe the selection procedure let us discuss Fig. 4, which shows a possible scenario at a generic iteration k of the algorithm. The interval [0, 1] [recall that, since Peano curves are applied, the search is performed over the one-dimensional interval [0, 1] (see (1.3))] is subdivided into subintervals \(d_i=[a_i,b_i]\), \(i=1, \ldots , I(k)\), belonging to the current partition \(D^k\). Each interval is represented by a point in the two-dimensional diagram in Fig. 4, with coordinates given by (2.6), and is characterized, for each fixed value of H, by a lower bound given by \(R_i\) from (2.5). Points with the same abscissa represent intervals that have the same width. In Fig. 4, there are nine different groups of intervals corresponding to the points A, B, ..., I. At each iteration \(k \ge 1\) of the method each group of intervals receives a positive integer index \(l=l(k)\). The first group of large intervals (the column of the dot A in Fig. 4) gets the index \(l=1\), and the subsequent groups are identified progressively by the indices 2, 3, 4, etc. So, in Fig. 4 there are nine groups with indices 1, 2, ..., 9, the index 9 referring to the group of intervals with the minimal width (the column of the point I).

For any fixed value H of the Hölder constant, it is easy (see Fig. 3-right, where lower bounds for \(H=H_1\) and \(H=H_2\) are shown) to identify the interval corresponding to the minimal lower bound with respect to the other intervals in the current partition. By varying the value of H from 0 to infinity, the method should select the set of intervals each corresponding to the smallest lower bound from (2.5) for some particular estimate of the Hölder constant H. These intervals, called nondominated intervals, should be partitioned during the next iteration; it can be easily seen that they are located on the lower convex hull of the set of dots representing the intervals. In Fig. 4 the nondominated intervals are identified by points located at the bottom of each group with the same horizontal coordinate, that is, the points A, B, C, E, G, and I. In practice, these intervals can be determined by algorithms identifying the convex hull of the dots, for example, the algorithm called Jarvis march, or gift wrapping (see [37]). Notice that the points H, F, and D do not represent nondominated intervals even though they are the lowest in their groups. This happens because (see, e.g., the point F) the point G dominates F at smaller values of the Hölder constant H and the point E dominates F at higher values of H.
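A minimal sketch of this selection is given below; for brevity it uses Andrew's monotone-chain construction of the lower convex hull instead of the Jarvis march mentioned above, and the function names are illustrative.

```python
def lower_hull(dots):
    """Lower convex hull (Andrew's monotone chain) of the dots (p_i, w_i),
    returned from the smallest to the largest abscissa."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    hull = []
    for pt in sorted(dots):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], pt) <= 0:
            hull.pop()
        hull.append(pt)
    return hull


def nondominated(dots):
    """Dots minimising the characteristic w - H * p for some H in (0, inf):
    the part of the lower hull going to the right from its lowest point."""
    hull = lower_hull(dots)
    k = min(range(len(hull)), key=lambda i: hull[i][1])   # lowest dot on the hull
    return hull[k:]


# Example with five dots (p_i, w_i): three of them are nondominated
best = nondominated([(0.1, 0.9), (0.2, 0.4), (0.3, 0.7), (0.4, 0.5), (0.5, 0.8)])
```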

The two phases (which can alternate several times during the work of the method) are the following: the investigation of large unexplored intervals in order to find attraction regions of local minimizers better than the current best found solution (global phase) and a local improvement of the current best found solution (local phase). In order to explain their functioning, let us recall that all the intervals on the diagram (see Fig. 4) are ordered along the horizontal axis in increasing order of width, from smaller to larger intervals. Thus, well explored zones of the search region corresponding to attraction regions of already visited local minima are located on the left-hand part of the diagram (small intervals), whereas unexplored zones of the domain are represented on the right-hand part of the diagram (large intervals). If during the work of the global phase a solution better than the current one has been obtained, then the method switches to the local phase in order to improve the new best record. After several improving steps the method switches back to the global phase, and the search for new promising minima continues until a stopping rule is satisfied.

During the global phase the new algorithm mainly explores large intervals; thus, it identifies the set of nondominated intervals not among all groups of intervals but only among the groups with indices from 1 to a calculated “middle index” r. This index represents a separator between the groups of large intervals and small ones. The global phase is performed until a function value improving the current minimal value by at least \(1\%\) is obtained. When this happens, the method switches to the local phase, in the course of which the obtained new solution is improved locally. In the case when the algorithm does not switch to the local phase during more than a fixed number IglobMax of iterations (i.e., an improvement of the current minimum has still not been found by exploring large intervals), it performs one “security” iteration in which it determines the nondominated intervals considering all groups of intervals present in the diagram.

Thus, during each iteration of the global phase the algorithm identifies a set of nondominated intervals. The subdivision of each of these intervals is performed only if a significant improvement of the function values with respect to the current minimal value \(f_{min}(k)\) is expected, i.e., once an interval \(d_t\in \{ D^k\} \) becomes nondominated, it can be subdivided only if the following condition is satisfied

$$\begin{aligned} R_t(\tilde{H}) \le f_{min}(k) -\xi , \end{aligned}$$
(3.1)

where the lower bound \(R_t=R_t(\tilde{H})\) is computed as in (2.5) with \(\tilde{H}\) being the estimate of the Hölder constant for which the interval \(d_t\) is nondominated, and the parameter \(\xi \) prevents the algorithm from subdividing already well-explored small subintervals.

During the local phase, which improves the newly found best solution, the algorithm always explores three intervals: the interval containing the current best point (the best interval) and the intervals located to the right and to the left of it. This phase finishes when the width of at least one of these intervals is less than a given accuracy. After the end of the local phase the algorithm switches back to the global phase and tries to find better solutions that can be located far away from the current best point. Notice that during the local phase a security iteration is carried out after performing a fixed number IlocMax of iterations without switching to the global phase. This is done in order to avoid concentrating efforts for too long at local minima that are not global solutions. As before, at the security iteration nondominated intervals among all groups of intervals present in the diagram are taken into consideration.
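A possible way to organize the switching between the two phases is sketched below; it is only an illustration of the bookkeeping described in this section (the function name, the counter handling, and the form of the \(1\%\) test are assumptions, not the authors' code).

```python
def next_phase(phase, f_min, f_prec, Lcount, Gcount,
               IlocMax, IglobMax, best_widths, delta_prime):
    """Return the phase of the next iteration, the updated counters, and a flag
    telling whether a 'security' iteration (nondominated intervals taken among
    all groups) must be performed."""
    security = False
    if phase == "glob":
        if f_min <= f_prec - 0.01 * abs(f_prec):   # record improved by >= 1%
            return "loc", 0, Gcount, security      # switch to the local phase
        Gcount += 1
        if Gcount >= IglobMax:                     # too long without improvement
            security, Gcount = True, 0
        return "glob", Lcount, Gcount, security
    else:                                          # phase == "loc"
        if min(best_widths) < delta_prime:         # local refinement finished
            return "glob", Lcount, 0, security     # back to the global phase
        Lcount += 1
        if Lcount >= IlocMax:                      # avoid over-refining a local minimum
            security, Lcount = True, 0
        return "loc", Lcount, Gcount, security
```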

Once the selection phase (local or global) has been concluded, the chosen intervals are subdivided in order to produce new trial points by the following partition strategy. At a generic iteration k, let \(S_k\) be the set of the intervals to be partitioned and \(d_t=[a_t,b_t]\) be an element of \(S_k\) represented by the corresponding point in the diagram in Fig. 4. Each interval \(d_t\) of the set \(S_k\) is subdivided into three equal parts

$$\begin{aligned}{}[a_t,b_t]=[a_t,u_t]\cup [u_t, v_t] \cup [v_t, b_t], \end{aligned}$$
(3.2)

of the length \((b_t-a_t)/3\), with

$$\begin{aligned} u_t=a_t+(b_t-a_t)/3, \ \ v_t=b_t-(b_t-a_t)/3. \end{aligned}$$
(3.3)

The three new generated intervals are added to the current partition \(\{ D^k\} \) and to the diagram in Fig. 4 and the interval \([a_t,b_t]\) is deleted from both. Finally, two new trials, \(f(c_1)\) and \(f(c_2)\), are executed at the central points of the new intervals \( [a_t,u_t]\) and \([v_t,b_t]\), where

$$\begin{aligned} c_1=(a_t+u_t)/2, c_2=(v_t+b_t)/2. \end{aligned}$$
(3.4)

Notice that the midpoint of the third interval \([u_t,v_t]\) is also the midpoint of the initial interval \([a_t,b_t]\) and, therefore, the function f(x) has already been evaluated at this point at previous iterations.

We conclude this section by recalling that the objective function f(x) is obtained by applying the Peano curve, which is theoretically introduced as a limit object, a fractal constructed using the principle of self-similarity. In practice, computable approximations of the Peano curve are used. Let us denote them by \(p_M(x)\), where M is the level of approximation of the curve (see the approximation with \(M=5\) in Fig. 1-left). The choice of the level M of the curve is essential to obtain a good performance of the method: in fact, a level that is too low can be insufficient to fill the domain in an appropriate way, so creating a risk of losing the optimal solution. On the other hand, when the value of M increases, the function in one dimension becomes more oscillating, especially if the dimension N of the original problem (1.1) grows (see [28] for a detailed discussion). As the dimension N increases, the width of the intervals selected for partitioning can become very small (recall that we work in [0, 1] and the Hölder metric is used) and can even approach the computer precision. For these reasons, an additional check of the interval width is required before subdivision. Namely, the interval \(d_t=[a_t,b_t]\) is partitioned only if the following condition is satisfied

$$\begin{aligned} b_t-a_t > \delta , \end{aligned}$$
(3.5)

where \(\delta \) is a parameter of the method.
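A minimal sketch of the partition step (3.2)–(3.4), combined with the width check (3.5), could look as follows (the names are illustrative and the bookkeeping of the diagram of dots is omitted):

```python
def subdivide(f, a_t, b_t, delta):
    """Split [a_t, b_t] into three equal subintervals (3.2)-(3.3) and perform the
    two new trials (3.4); the midpoint of the central third was already evaluated
    when the parent interval was created."""
    if not (b_t - a_t > delta):                    # width check (3.5)
        return None
    u_t = a_t + (b_t - a_t) / 3.0
    v_t = b_t - (b_t - a_t) / 3.0
    c1, c2 = (a_t + u_t) / 2.0, (v_t + b_t) / 2.0
    trials = [(c1, f(c1)), (c2, f(c2))]            # two new evaluations of f
    return [(a_t, u_t), (u_t, v_t), (v_t, b_t)], trials
```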

4 The GOSH algorithm

In this section, a new algorithm called GOSH (Global Optimization algorithm working with a Set of estimates of the Hölder constant) is presented.

To describe the algorithm formally, we need to specify some notation. Suppose that at an iteration \(k \ge 1\) a partition \(\{D^k\}\) of \(D=[0,1]\) has been obtained. Suppose also that each interval \(d_i \in \{D^k\}\) is represented by a dot in the two-dimensional diagram from Fig. 4 and each group of intervals with the same width is numbered by the same index: this index is a positive integer that varies between imax(k) (the index identifying the column of the largest intervals) and imin(k) (the index of the column of the smallest intervals). The following notations are also adopted:

  • \(f_{min}(k)\) is the best function value (the “record” value) at the iteration k, and \(x_{min}(k)\) is the corresponding coordinate.

  • \(d_{min}(k)\) is the interval containing the point \(x_{min}(k)\).

  • \(f_{prec}(k)\) is the old best record. It serves to memorize the record \(f_{min}(k)\) at the start of the current phase (local or global).

  • Lcount and Gcount are counters of iterations performed during the local and global phases, respectively.

  • IlocMax and IglobMax are maximal allowed numbers of iterations that can be executed during the local and global phases, respectively, before making the general security iteration (in which the nondominated intervals are selected from the entire search domain).

  • phase is a flag specifying the current phase. It is equal to “loc” and “glob” in the local and global phases, respectively.

  • \(p_M(x)\) is the M-approximation of the Peano curve.

  • \(S^k\) is the set of intervals, \(S^k\subset D^k\), that will be subdivided and the corresponding set \(J^k\) is the set of their indices.

  • jloc is a flag indicating whether the set \(S^k\) is empty: \(jloc=0\) if it is empty and \(jloc=1\) otherwise.

We are ready now to describe the algorithm.

Algorithm GOSH

figure a

Different stopping criteria can be used in the GOSH algorithm introduced above. One of them will be described in the next section presenting numerical experiments.

Let us make some comments on the introduced method. Step 1 is the phase of selection of the intervals that, as was said above, can be either global or local. Suppose that at a generic iteration k of the algorithm the situation is that shown in Fig. 4, with 9 different groups of intervals, and assume that the interval \(d_{min}(k)\) containing the current minimum point \(x_{min}(k)\) belongs to the group of intervals identified by the index 7 (i.e., exactly the point G). If \(phase=loc\), then 3 intervals will be selected: \(d_{min}(k)\), which corresponds to the point G in the diagram of Fig. 4, and the intervals located to the right and to the left of it in [0, 1], respectively. Notice that the latter two intervals, namely \(dr_{min}(k)\) and \(dl_{min}(k)\), can belong to two different groups of intervals in the diagram and not necessarily to the group with the index 7. In contrast, if \(phase=glob\), then the separator index \(r=\lfloor \frac{7+1}{2}\rfloor =4\) is calculated and the nondominated intervals are searched for only among the groups of intervals with indices from 1 to 4. In this example, the intervals represented by the points A, B, and C in the diagram of Fig. 4 will be selected and split into three parts. The dots A, B, and C will disappear from the diagram and there will be three new points in the column of B, three in the column of C, and three in that of D.

If in the local phase it happens that \(Lcount=IlocMax\) (or, analogously, in the global phase \(Gcount=IglobMax\)), then the nondominated intervals among all groups of intervals are retrieved. Thus, in the diagram of Fig. 4 the intervals represented by the points A, B, C, E, G, and I will be split. The three intervals obtained from the interval \(d_I\) will be represented by three points in the newly created column with the index 10. Notice that only intervals satisfying condition (3.5) are selected for further subdivision. It should also be emphasized that in Step 3, when \(phase=loc\), the local exploration continues until the width of at least one of the 3 selected intervals is smaller than a fixed \(\delta ' \ge \delta \), with \(\delta \) from (3.5).
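The group restriction used in the global phase of this example can be sketched as follows, assuming (as the worked example above suggests) that r is computed from the index \(l_{min}\) of the group containing the current best interval; the names are illustrative.

```python
import math

def middle_index(l_min):
    """'Middle index' r separating the groups of large intervals (indices 1..r)
    from the small ones; e.g. l_min = 7 gives r = 4, as in the example above."""
    return math.floor((l_min + 1) / 2)


def global_phase_candidates(dots, group_of, l_min):
    """Dots among which the nondominated intervals are searched during a regular
    (non-security) iteration of the global phase: only the groups of larger
    intervals, with indices from 1 to r."""
    r = middle_index(l_min)
    return [d for d, g in zip(dots, group_of) if g <= r]
```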

Let us now consider the convergence properties of the GOSH algorithm. The first result discusses the connection between the original multi-dimensional problem and the reduced univariate one. To obtain the latter problem and to go to the interval [0, 1], an approximation \(p_M(x)\) of the Peano curve of a fixed level M is applied, and in the course of the algorithm a lower bound \(U^*_M\) of the multi-dimensional function F(y) is calculated along the curve. In order to return to the original problem (1.1), (1.2) in N dimensions, it is important to understand how a lower bound for F(y) over the entire domain [a, b] in \(\mathbf{R}^N\) can be obtained from \(U^*_M\). The following theorem gives the answer to this question.

Theorem 4.1

Let \(U^*_M\) be a lower bound along the space-filling curve \(p_M(x)\) for a multi-dimensional function F(y), \(y\in [a,b]\subset R^N\), satisfying the Lipschitz condition (1.2) with the constant L, i.e.,

$$\begin{aligned} U^*_M \le F(p_M(x)), \quad x\in [0,1]. \end{aligned}$$

Then the value

$$\begin{aligned} U^*=U^*_M - 2^{-(M+1)}L\sqrt{N} \end{aligned}$$

is a lower bound for F(y) over the entire region [a, b].

Proof

See [28] or the recent monograph [44] for the proof of this result. \(\square \)
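As an illustration with hypothetical values, for \(M=10\), \(N=2\), and \(L=4\) the correction term equals \(2^{-11}\cdot 4\sqrt{2}\approx 2.8\times 10^{-3}\), so the lower bound computed along the curve is degraded only marginally when extended to the whole region [a, b].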

Theorem 4.1 is important because it links the multi-dimensional problem (1.1), (1.2) to the one-dimensional problem (1.3), (1.4), so we can concentrate our attention on the convergence properties in the one-dimensional interval [0, 1]. Let us suppose that the maximal number of generated trial points tends to infinity, and prove that the infinite sequence of trial points generated by the GOSH converges to any point of the one-dimensional search domain. This kind of convergence is called everywhere dense convergence.

Theorem 4.2

If \(\delta =0\) in (3.5), then for any point \(x \in [0,1]\) and any \(\eta >0\) there exists an iteration number \(k(\eta )\ge 1\) and a trial point \(x^{i(k)}\), \(k>k(\eta )\), such that \( |x-x^{i(k)}|< \eta \).

Proof

In the selection Step 2 of the algorithm the two phases, local and global, are alternated. In the local phase of GOSH an interval is subdivided only if its width is greater than a fixed \(\delta '>0\), with \(\delta '\) from Step 3 of GOSH. When the width of the selected interval becomes less than \(\delta '\), the algorithm switches to the global phase. Since it is assumed that \(\delta =0\) in (3.5), and since the one-dimensional search region has a finite length and \(\delta '\) is a positive finite number, there exists a finite iteration number \(j=j(\delta ')\) such that, for all iterations greater than j, only the global phase will be used during the work of the GOSH.

In the global phase the algorithm GOSH always selects for partitioning at least one interval \(d_t\) from the group of the largest intervals (in Fig. 4 the group with index 1). In fact, there always exists a sufficiently large estimate \(H_\infty \) of the Hölder constant H such that the interval \(d_t\) is nondominated with respect to \(H_\infty \), and condition (3.5) is satisfied. Therefore, at each iteration, the intervals with the largest width will be partitioned into three subintervals of length equal to a third of the length of the subdivided interval. Notice that each group of intervals contains only a finite number of intervals since the search interval is finite and all its subintervals have a finite length. Thus, after a sufficiently large number of iterations \(k>k(\eta )\), all the intervals of the group with the maximal width will be partitioned. Such a procedure will be repeated with a new group of the largest intervals (the group with index 2 in Fig. 4) and so on, until the largest intervals of the current partition have a length smaller than \(\eta \). As a result, in the neighborhood of radius \(\eta \) of any point in [0, 1] there will exist at least one trial point generated by the GOSH. \(\square \)

5 Numerical experiments

In this section, results of numerical experiments are presented. The new algorithm GOSH has been compared with the original DIRECT method [7] and its locally-biased modification LBDirect proposed in [8, 9]. In order to show the usefulness of the two-phase approach, GOSH has also been compared with its simplified version (called CORE hereinafter) that does not apply the local phase at all, using only the global phase.

Ten different classes of functions generated by the GKLS-generator, a free software package downloadable from http://wwwinfo.deis.unical.it/~yaro/GKLS.html and described in [10], have been used in the experiments. This generator constructs classes of multi-dimensional and multiextremal test functions with known global and local minima: each function is obtained from a paraboloid systematically distorted by polynomials. Each class contains 100 test functions with the same number of local minima. In order to generate a specific class, only five parameters should be defined by the user (see Table 1), and it is possible to generate harder or simpler test classes very easily. For example, a more difficult test class can be obtained either by decreasing the radius \(r^*\) of the attraction region of the global minimizer or by increasing the distance d from the paraboloid vertex to the global minimizer. Table 1 gives a complete description of the 10 classes used in the experiments, for a total of 1000 test functions, in dimensions \(N=2,3,4,5,\) and 6. For each dimension two different classes, a simple one and a hard one, have been generated. The number of local minima m was taken equal to 10 and the global minimum \(f^*\) was fixed to \(-1\) for all the classes. In Fig. 1-left, an example, the test function no. 4 belonging to the class 1, is shown.

Let us describe the stopping rules used in the experiments. First, the tested algorithms stopped their work when the maximal number of trials \(T_{max}\), equal to \(10^6\), was reached. Recall that the GKLS-generator produces problems with known minima. This makes it possible to use the vicinity of trial points to the global minimizer as a measure of success of the algorithms and to construct an appropriate stopping rule. Let us denote by \(y^*_i\) the global minimizer of the i-th function of a test class, \(1\le i \le 100\). Then, the following condition can be applied.

Table 1 Description of 10 classes of randomly generated test functions used in the numerical experiments

Stopping criterion. A method stops its work on the i-th function of a class when it generates a trial point falling into a ball \(B_i\) having a radius \(\rho \) and the center at the global minimizer of the i-th function, i.e.,

$$\begin{aligned} B_i = \{y \in R^N : \Vert y-y_i^*\Vert \le \rho \}, \quad 1\le i \le 100. \end{aligned}$$
(5.1)
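The check (5.1) is elementary; a one-line sketch with illustrative names:

```python
import math

def hit_ball(y, y_star, rho):
    """Stopping criterion (5.1): the trial point y falls into the ball of radius
    rho centred at the known global minimizer y_star."""
    return math.dist(y, y_star) <= rho
```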

In the experiments, the radius \(\rho \) in (5.1) was fixed equal to \(0.01\sqrt{N}\) for classes 1, 2, 3, 4, and 5, and to \(0.02\sqrt{N}\) for classes 6, 7, 8, 9, and 10. It should also be added that the parameter \(\xi \) in (3.1) was fixed as follows

$$\begin{aligned} \xi = 10^{-4} \cdot | f_{min}(k) |, \end{aligned}$$

where \(f_{min}(k)\) is the current best function value. This choice has been considered by many authors (see [8, 9]); in particular, it has been used in the DIRECT method [7] with the most robust results. For this reason, the same value was used in our experiments, as well. Notice that for the DIRECT and LBDirect methods it is recommended (see, e.g., [7]) to verify the stopping conditions after the end of each iteration; this rule has been used in our experiments since applying the rule (5.1) after every trial gives only an insignificant improvement.

The value of the parameter \(\delta \) in (3.5) was fixed equal to \(10^{-4}\) for classes 1 and 2, \(10^{-7}\) for classes 3 and 4, \(10^{-9}\) for the class 5, \(10^{-10}\) for classes 6 and 7, \(10^{-11}\) for classes 8 and 10, and \(10^{-12}\) for the class 9. The parameter \(\delta '\) in Step 3 of the algorithm GOSH was chosen equal to \(\delta \).

In the algorithms GOSH and CORE, an M-approximation of the Peano curve has been used. In particular, the level M of the curve must be chosen taking into account the constraint \(NM < K\), where N is the dimension of the problem and K is the number of digits in the mantissa, which depends on the computer used for the implementation (see [44] for more details). In our experiments we had \(K=52\); thus the value \(M=10\) has been used for classes 1–8 and \(M=8\) for classes 9 and 10.
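For instance, assuming (as the ordering of Table 1 suggests) that classes 9 and 10 are the six-dimensional ones, the constraint gives \(6M<52\), i.e., \(M\le 8\), which is consistent with the value \(M=8\) used for these classes.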

In the GOSH algorithm we must fix the parameters IglobMax and IlocMax, in Steps 1.1 and 1.2, that specify the maximal allowed number of iterations executed in the global and local phases, respectively, before making the general security iteration, in which the nondominated intervals in the entire domain are selected. Different choices of these parameters can affect the speed of the search towards the global solution. For this reason, a sensitivity analysis with 6 different values of the parameters IglobMax and IlocMax for each class has been executed. The obtained results are shown in Table 2. For each class the average and the maximal number of function evaluations over all the 100 functions are reported. The best results are shown in bold.

Table 2 Results of the sensitivity analysis

Table 3 shows results of experiments comparing the behavior of the GOSH method with the algorithms CORE, DIRECT, and LBDirect on the 10 classes of test functions. Taking into account the sensitivity analysis, the following values of the two parameters of GOSH have been chosen: \(IlocMax=5\) for classes 1, 5, 8, \(IlocMax=10\) for classes 4, 6, 7, and \(IlocMax=15\) for classes 2, 3, 9, 10; IglobMax was fixed equal to 5 for classes 1, 2, 3, 5, 8, 9, to 15 for the class 10, and to 20 for classes 4, 6, and 7. The values of these two parameters corresponding to the best results in the column “Max” of Table 2 have been chosen.

Table 3 illustrates results of experiments with all the 10 classes and the four methods. Notice that in the column “Average” the symbol \(``>''\) means that, after performing \(T_{max}\) trials, the global minimum has not been found for all functions of the class. The column “Max” reports the maximum number of function evaluations required to satisfy the stopping criterion for all the 100 functions of the class: the notation 1000000(i) means that, after evaluating 1000000 trials, the method was not able to find the global solution for \(``i''\) functions of the considered class. The best results are shown in bold.

Fig. 5
figure 5

Function No. 55, class 2. a 1541 trials generated by DIRECT and b 2281 by LBDirect. c 597 trials calculated by CORE and d 257 produced by the GOSH. Trial points chosen by the “local-phase” strategy are shown in red by the symbol “*”

Finally, in Fig. 5 the behavior of the four methods for the function no. 55 of the class 2 is shown. In the first row Fig. 5a shows 1541 trials generated by DIRECT to find the global minimum of the problem and (b) 2281 trials produced by the LBDirect. In the second row Fig. 5c shows 597 trial points calculated by the CORE and (d) 269 produced by the GOSH algorithm to solve the same problem. Trial points chosen by the “local-phase” strategy are shown in red.

Table 3 Results of numerical experiments on 1000 randomly generated test functions

6 A brief conclusion

The problem of global minimization of a multi-dimensional, non-differentiable, and multiextremal function satisfying the Lipschitz condition over a hyperinterval with an unknown Lipschitz constant has been considered in this paper. An approach based on reducing the dimension by using numerical approximations to space-filling curves in order to pass from the original Lipschitz multi-dimensional problem to a univariate one satisfying the Hölder condition has been used. It has been shown that it is possible to organize a simultaneous work with multiple estimates of the Hölder constant. Techniques of this kind were proposed for Lipschitz optimization in 1994 in [19] and for a long time created difficulties in the framework of Hölder global optimization. A geometric technique working with a number of possible Hölder constants chosen from a set of values varying from zero to infinity has been proposed, and an accelerating “two-phase” technique that performs a smart balancing of the local and global information has been introduced. Conditions ensuring convergence of the method GOSH to the global minimizers have been established. Extensive numerical experiments executed on 1000 test functions have shown a very promising performance of the proposed algorithm with respect to its direct competitors, in particular for hard problems. Thus, one of the most abstract mathematical objects, space-filling curves, has been used to develop a practical derivative-free global optimization algorithm that can be successfully used in numerical computations.