1 Introduction

Problems of global optimization of continuous multimodal functions are attacked using deterministic [1,2,3,4,5,6,7], stochastic [8,9,10,11,12], and heuristic approaches [13,14,15]. A promising research direction is the hybridization of the advantages of some of the successful methods [16, 17]. Although the proposed algorithms are thoroughly investigated, there remains an important issue that has received relatively little attention, namely the convergence rate problem. In the present paper we focus on this problem, continuing the research in [18,19,20,21], where the convergence rate was established for algorithms rooted in the approach based on statistical models of global optimization. The convergence rate of global optimization algorithms in a stochastic setting has been considered for certain Gaussian probabilities; see for example [19]. The general question of average-case tractability of multivariate global optimization is mentioned (Open Problem 87) in [22].

We consider the problem of approximating the global minimum \(f^*\) of a twice-continuously differentiable function f defined on the d-dimensional unit cube \(A=[0,1]^d\), for \(d\ge 2\). We assume that the global minimizer, denoted by \(x^*\), is unique and lies in the interior of the cube. While we assume differentiability, we are interested in algorithms that only evaluate the function and not derivatives. Our interest centers on how the number of function values required to obtain a given quality approximation grows with the dimension of the domain.

The algorithm presented here is similar to a one-dimensional algorithm presented in [20] and a two-dimensional algorithm based on Delaunay triangulations presented in [21]. Those papers presented algorithms with the property that for large enough N (depending on f), the residual error (difference between the smallest function value observed and the global minimum) after N function evaluations is at most

$$\begin{aligned} c_1 \exp (-\,c_2\sqrt{N}) \end{aligned}$$

for \(c_1,c_2\) depending on dimension (1 or 2) and f. While the algorithm presented in [21] was based on a Delaunay triangulation of the domain, the algorithm considered in this paper is based on a rectangular decomposition. Rectangular decompositions have been employed in previous studies of global optimization, for example in [4, 23,24,25]. A rectangular partition of the feasible region is also inherent in interval arithmetic-based global optimization algorithms, see e.g. [26]. One advantage of algorithms using a rectangular partition of the feasible region is their suitability for parallelization; however, the investigation of parallel versions of the algorithm proposed here is postponed to future research. For a discussion of the alternative simplicial partition of the feasible region in the development of global optimization algorithms we refer to [27].

The algorithm in [21] works for arbitrary domains, but the error bound, which required a certain quality triangulation, could only be proved for bivariate functions. In the present paper we are able to prove an asymptotic error bound for arbitrary dimension. Our main result is that eventually the error is at most

$$\begin{aligned} c_1(f,d) \exp \left( -\,c_2(f,d)\sqrt{N}\right) , \end{aligned}$$

where \(c_2(f,d)\) decreases exponentially in dimension d. Recall that we assume that f has continuous partial derivatives up to order 2, and that the global minimizer is unique and is in the interior of the domain.

In the worst-case setting any nonadaptive optimization algorithm has an error of order at least \(N^{-\,2/d}\). It is known that for a convex and symmetric function class such as \(C^2([0,1]^d)\) adaptive algorithms are essentially no more powerful than nonadaptive algorithms for linear problems such as function approximation with error measured using the \(L_\infty \) norm ([28]). Global optimization is as difficult as \(L_\infty \) approximation (see Theorem 18.24 in [22]). Therefore, \(N^{-\,2/d}\) is an optimal worst-case error bound for optimization on \(C^2([0,1]^d)\). To obtain an error of at most \(\epsilon \), we then require on the order of \((1/\epsilon )^{d/2}\) function values. For the algorithm proposed in this paper, we consider the asymptotic decay of the error, showing that for small enough \(\epsilon \) on the order of \(d^{d+3}\log (1/\epsilon )^2\) function values suffice. Because of the asymptotic setting, this does not contradict the worst-case results mentioned above.
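For a rough illustration of the two regimes (ignoring constant factors, and with the natural logarithm), take \(d=4\) and \(\epsilon =10^{-6}\): the worst-case count \((1/\epsilon )^{d/2}\) is of order \(10^{12}\), whereas \(d^{d+3}\log (1/\epsilon )^2=4^7\left( \log 10^6\right) ^2\approx 3\times 10^6\). For fixed d the asymptotic bound grows only polylogarithmically in \(1/\epsilon \), while for fixed \(\epsilon \) it still grows superexponentially in d.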

With the algorithm defined in this paper, the error converges to zero for a much broader class of functions than the one we consider. For example, the error converges to zero for any continuous function defined on \([0,1]^d\), though it could be difficult to characterize the rate of convergence for such a broad class.

At the present stage of investigation we have not focused on the details of efficient software development, and have tested a rather simple implementation (without sophisticated data structures) of the algorithm on well-known test functions. Our initial guess was that the Delaunay triangulation-based algorithm would be considerably more efficient than the algorithm proposed in this paper. This guess was based on the observation that hyper-rectangles have more volume far from the vertices than simplices. Also, in our Delaunay-based algorithm each step placed a single function evaluation in a simplex, while subdividing a hyper-rectangle takes a number of evaluations that grows exponentially with dimension. However, at least for the low-dimensional problems in our experiments, this guess was not confirmed. The results are presented in the last section of the paper, where the proposed algorithm is compared with the algorithm based on Delaunay triangulation [21].

2 The algorithm

The algorithm operates by decomposing \([0,1]^d\) into hyper-rectangles as follows. Given a current decomposition, choose one of the hyper-rectangles (according to the maximal value of a criterion to be defined below) and bisect it along the longest axis by evaluating the function at up to \(2^{d-1}\) midpoints of the longest hyper-rectangle edges (some of the midpoints may have been evaluated in previous decompositions).

The criterion for choosing a rectangle is rooted in the definition of the P-algorithm, described in detail in [29]. The P-algorithm chooses the point at which to compute the next objective function value according to the criterion of maximal improvement probability

$$\begin{aligned} \max _{x\in A}\mathrm{\mathbf P}\{\xi (x)\le y_{0N}-\varepsilon |\xi (x_i)=y_i,\,i=1,\ldots ,N\}, \end{aligned}$$

where \(\xi (x)\) is a Gaussian random field accepted as a statistical model of objective functions, the \(y_i\) are the values of the objective function computed at previous iterations, \(y_{0N}=\min _{i=1,\ldots ,N}y_i\), and \(\varepsilon >0\). The maximization of the improvement probability can be simplified because of the equality

$$\begin{aligned}&\arg \mathop {\max }\nolimits _{x\in A}\mathrm{\mathbf P}\{\xi (x)\le y_{0N}-\varepsilon |\xi (x_i)=y_i,\,i=1,\ldots ,N\} \nonumber \\&\quad =\arg \mathop {\max }\nolimits _{x\in A} \frac{y_{0N}-\varepsilon -\mu (x|(x_i,y_i),\,i=1,\ldots ,N)}{\sigma (x|(x_i,y_i),\,i=1,\ldots ,N)}, \end{aligned}$$
(1)

where \(\mu (x|(x_i,y_i),\,i=1,\ldots ,N)\) and \(\sigma ^2(x|(x_i,y_i),\,i=1,\ldots ,N)\) denote the conditional mean and variance, respectively. The criterion for the selection of a rectangle by the algorithm considered in the present paper is a modification of (1) in which the conditional mean and variance are replaced by the relevant parameters of the rectangle, and \(\varepsilon \) is replaced by a parameter \(g(\cdot )\) defined below; the form of its dependence on the iteration number is chosen with the aim of fast asymptotic convergence of the algorithm.

Suppose that the algorithm has evaluated f at N points. Let s(N) denote the number of hyper-rectangles in our partition. Each hyper-rectangle subdivision (iteration) requires up to \(2^{d-1}\) function evaluations; note that at the current iteration some function values may already be known, having been computed at previous subdivisions. Therefore, the following inequality holds:

$$\begin{aligned} s(N) \ge \frac{N}{2^{d-1}}. \end{aligned}$$
(2)

We next define some constants and functions that are used in the definition of the algorithm. Define

$$\begin{aligned} q\equiv \frac{3\cdot 2^{2/3} e^{-1}}{2\log (2)} \approx 1.27 ,\ \ \ \lambda \equiv (q d^2/4)^{d/2}, \end{aligned}$$

and set

$$\begin{aligned} g(x) \equiv \left( \lambda x \log (1/x)\right) ^{2/d}, \end{aligned}$$

for \(0<x\le 1/2\) and \(g(1)=q d^2/4\). Note that g is increasing and \(g(x)\downarrow 0\) as \(x\downarrow 0\).
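As a small illustration of these definitions, for \(d=2\) we have \(\lambda =(q\cdot 4/4)^{1}=q\) and \(g(x)=q\,x\log (1/x)\approx 1.27\,x\log (1/x)\) for \(0<x\le 1/2\).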

It will be convenient to index quantities to be defined by the iteration number of the algorithm, which we denote by n, instead of the number of function evaluations, N, which is generally larger. Later we will also describe the error bound in terms of the number of function evaluations N.

After n iterations of the algorithm, let \(M_n = \min _{1\le i\le n}f(x_i)\) and denote the error by \(\varDelta _n = M_n-f(x^*)\). Define the function \(L_n:[0,1]^d\rightarrow \mathbb {R}\) as follows. In the interior of hyper-rectangle R, \(L_n\) coincides with the multilinear interpolant of the function values at the vertices of R. On the boundaries of hyper-rectangles set \(L_n\equiv M_n\), so that \(M_n\) is equal to the global minimum of \(L_n\). Let \(v_n\) denote the smallest volume of a hyper-rectangle after n iterations.

Our algorithm will assign a numerical value to each hyper-rectangle in the subdivision. After n iterations of the algorithm, for each hyper-rectangle \(R_i\), \(1\le i\le {n}\), set

$$\begin{aligned} {\rho }^n_i\equiv \int _{R_i}\frac{ds}{\left( L_n(s)-M_n + g(v_n)\right) ^{d/2}}, \end{aligned}$$
(3)

and

$$\begin{aligned} \overline{\rho }^n_i\equiv \frac{\left| R_i\right| }{\left( \inf _{t\in int(R_i)}L_n(t)-M_n + g(v_n)\right) ^{d/2} }. \end{aligned}$$
(4)

The criteria (3) and (4) are analogues of the reciprocal of (1) with the opposite sign, where \(g(v_n)\) is a generalisation of \(\varepsilon \). Since the hyper-rectangle boundaries have measure zero, the way we defined \(L_n\) on the boundaries is immaterial for the values of the \(\rho ^n_i\). Note that the \(\inf \) in the denominator of (4) is equal to the minimum of f at the vertices of the hyper-rectangle.
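For illustration, criterion (3) can be approximated for a single hyper-rectangle by simple Monte Carlo sampling (a sketch with our own helper names; the two-dimensional implementation in Sect. 5 instead evaluates the inner integral analytically):

```cpp
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Illustrative Monte Carlo approximation of criterion (3) for one hyper-rectangle
// (our sketch; cf. the Rect-MC variant in Sect. 5). 'L' is the multilinear interpolant
// of the vertex values, lo/hi hold the bounds a_i, b_i, M is M_n and g_vn = g(v_n).
double rho_mc(const std::function<double(const std::vector<double>&)>& L,
              const std::vector<double>& lo, const std::vector<double>& hi,
              double M, double g_vn, int samples, unsigned seed = 0) {
    const int d = static_cast<int>(lo.size());
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double volume = 1.0;
    for (int i = 0; i < d; ++i) volume *= hi[i] - lo[i];
    std::vector<double> x(d);
    double sum = 0.0;
    for (int s = 0; s < samples; ++s) {
        for (int i = 0; i < d; ++i) x[i] = lo[i] + (hi[i] - lo[i]) * unif(rng);
        sum += 1.0 / std::pow(L(x) - M + g_vn, d / 2.0);   // integrand of (3)
    }
    return volume * sum / samples;   // |R_i| times the average of the integrand
}
```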

If we are about to subdivide the smallest hyper-rectangle, then

$$\begin{aligned} \rho _i^n \le {\overline{\rho }}^n_i \le \frac{\left| R_i\right| }{g(v_n)^{d/2}} = \frac{v_n}{\lambda v_n \log (1/v_n)} = \frac{1}{\lambda \log (1/v_n)} \le \frac{1}{\lambda \log (n)}, \end{aligned}$$
(5)

with the last inequality following from the fact that the smallest hyper-rectangle in the subdivision has volume \(v_n\le 1/n\).

A more formal description of the algorithm follows. In the description, N is the total number of function evaluations to make, i is the current number of hyper-rectangles, and j is the number of function evaluations that have been made. Let v, the volume of the smallest hyper-rectangle, take initial value 1 and let M, the minimum of the observed function values, take initial value the minimum of the function values at the vertices of \([0,1]^d\).

The algorithm comprises the following steps; a schematic sketch in C++ is given after the list.

  1. We start with the hyper-rectangle \([0,1]^d\) and with the function values at all \(2^d\) vertices. Set i, the number of hyper-rectangles, to 1 and set j, the number of function evaluations, to \(2^d\).

  2. For each hyper-rectangle \(R_k\) in the current collection \(\{R_1,R_2,\ldots ,R_i\}\), compute \({\rho }^{{i}}_k\), keeping track of the lowest index \(\gamma \) attaining the maximal value \({\rho }^i_\gamma \).

  3. From the hyper-rectangle \(R_\gamma \) with the maximal value \(\rho ^{i}_\gamma \), form two new hyper-rectangles as follows. Suppose that

    $$\begin{aligned} R_\gamma = [a_1,b_1]\times [a_2,b_2]\times \cdots \times [a_d,b_d], \end{aligned}$$

    and that m is the smallest index with \(b_m-a_m\ge b_i-a_i\) for \(1\le i\le d\). The two new hyper-rectangles are

    $$\begin{aligned} R^{\prime }_{\gamma } \equiv [a_1,b_1]\times \cdots \times [a_m,(a_m+b_m)/2]\times \cdots \times [a_d,b_d] \end{aligned}$$

    and

    $$\begin{aligned} R^{\prime \prime }_{\gamma } \equiv [a_1,b_1]\times \cdots \times [(a_m+b_m)/2, b_m]\times \cdots \times [a_d,b_d]. \end{aligned}$$

    Evaluate f at the (at most) \(2^{d-1}\) points

    $$\begin{aligned} \left( c_1,\ldots ,\frac{a_m+b_m}{2},\ldots ,c_d\right) ,\ \ \ c_i\in \{a_i,b_i\},\ 1\le i\le d, \ i\ne m, \end{aligned}$$

    which have not been evaluated previously.

    If \(\left| R_\gamma \right| =v\), then set \(v\leftarrow v/2\), and if the smallest of the new function values is less than the previously observed minimum M, then set M to that new smallest value. Update the number of hyper-rectangles \(i\leftarrow i+1\) and increment j by the number of new function evaluations.

  4. If \(j<{N}\), return to step 2.
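The following schematic C++ sketch summarises steps 1–4. It is our illustration, not the implementation used in Sect. 5: for brevity it ranks hyper-rectangles by the vertex-based surrogate criterion (4) instead of the integral (3), and it does not cache function values, so midpoints shared by neighbouring hyper-rectangles may be re-evaluated.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <functional>
#include <vector>

using Point = std::vector<double>;
using Objective = std::function<double(const Point&)>;

struct Rect {
    Point lo, hi;                 // bounds a_i, b_i, i = 0..d-1
    std::vector<double> fv;       // f at the 2^d vertices, indexed by bit mask
    double volume() const {
        double v = 1.0;
        for (std::size_t i = 0; i < lo.size(); ++i) v *= hi[i] - lo[i];
        return v;
    }
    double fmin() const { return *std::min_element(fv.begin(), fv.end()); }
};

double optimize(const Objective& f, int d, int max_evals) {
    const double q = 3.0 * std::pow(2.0, 2.0 / 3.0) / (2.0 * std::exp(1.0) * std::log(2.0));
    const double lambda = std::pow(q * d * d / 4.0, d / 2.0);
    auto g = [&](double x) {      // the threshold g defined above
        return x >= 1.0 ? q * d * d / 4.0
                        : std::pow(lambda * x * std::log(1.0 / x), 2.0 / d);
    };
    // Step 1: the unit cube with all 2^d vertices evaluated.
    Rect root{Point(d, 0.0), Point(d, 1.0), {}};
    for (std::uint32_t m = 0; m < (1u << d); ++m) {
        Point x(d);
        for (int i = 0; i < d; ++i) x[i] = (m >> i & 1u) ? 1.0 : 0.0;
        root.fv.push_back(f(x));
    }
    std::vector<Rect> rects{root};
    int evals = 1 << d;
    double M = root.fmin(), vmin = 1.0;   // record value M and smallest volume v
    while (evals < max_evals) {
        // Step 2: choose the hyper-rectangle maximising the (surrogate) criterion.
        std::size_t best = 0;
        double best_rho = -1.0;
        for (std::size_t k = 0; k < rects.size(); ++k) {
            double rho = rects[k].volume() /
                         std::pow(rects[k].fmin() - M + g(vmin), d / 2.0);
            if (rho > best_rho) { best_rho = rho; best = k; }
        }
        // Step 3: bisect along the longest edge (smallest such coordinate index).
        Rect R = rects[best];
        int split = 0;
        for (int i = 1; i < d; ++i)
            if (R.hi[i] - R.lo[i] > R.hi[split] - R.lo[split]) split = i;
        double mid = 0.5 * (R.lo[split] + R.hi[split]);
        Rect left = R, right = R;
        left.hi[split] = mid;
        right.lo[split] = mid;
        for (std::uint32_t m = 0; m < (1u << d); ++m) {
            if ((m >> split & 1u) == 0u) continue;      // new midpoints carry bit 'split'
            Point x(d);
            for (int i = 0; i < d; ++i) x[i] = (m >> i & 1u) ? R.hi[i] : R.lo[i];
            x[split] = mid;
            double y = f(x);
            ++evals;
            left.fv[m] = y;                             // upper face of the left child
            right.fv[m ^ (1u << split)] = y;            // lower face of the right child
            M = std::min(M, y);
        }
        if (R.volume() == vmin) vmin *= 0.5;            // the smallest box was split
        rects[best] = left;
        rects.push_back(right);
    }
    return M;   // best function value found (M_n)
}
```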

Typical subdivisions for the two-dimensional case are depicted in Fig. 1 in Sect. 5. In the graphs, each vertex of the mesh corresponds to a point where the function has been evaluated. The function has been evaluated at the four vertices of each hyper-rectangle, as well as possibly at other points on its perimeter. The function \(L_n\) agrees on the interior of a hyper-rectangle with the multilinear interpolation of the values at the four vertices, disregarding additional evaluation points that may lie on the perimeter. The aspect ratio of each hyper-rectangle is either 1 or 2.

Let \(\hat{F}\subset C^2\left( [0,1]^d\right) \) denote the set of functions that have a unique global minimizer \(x^*\in ]0,1[^d\). Our error bound will hold for functions in \(\hat{F}\). It is essential for our result that the function has only a finite number of isolated global minimizers; the uniqueness assumption is to simplify the presentation.

Denote the matrix of second-order partial derivatives at the minimizer by \(D^2f(x^*)\), assumed positive definite.

The error bound will depend on the regularity of the function as measured by a seminorm that we now define. For a compact set \(K{\subset [0,1]^d}\) and \(f\in C^2(K)\), define the seminorm

$$\begin{aligned} \left\| D^2f\right\| _{\infty , K} \equiv \sup _{x\in K} \sup _{\begin{array}{c} u_1,u_2\in \mathbb {R}^d\\ \left\| u_i\right\| =1 \end{array} } \left| D_{u_1}D_{u_2}f(x)\right| , \end{aligned}$$

where \(D_yf\) is the directional derivative of f in the direction y. This is a measure of the maximum size of the second derivative of f over K.
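For instance, for a quadratic function \(f(x)=\frac{1}{2}x^TH x+b^Tx+c\) with symmetric \(H\), we have \(D_{u_1}D_{u_2}f(x)=u_1^TH u_2\) for all \(x\), so that \(\left\| D^2f\right\| _{\infty , K}\) equals the spectral norm of \(H\) for any \(K\).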

Our main result on the convergence rate of the error for this algorithm follows. Let us note that for the convergence of the original version of the P-algorithm for continuous objective functions we refer to [29], where the improvement threshold \(\varepsilon \), analogous to \(g(\cdot )\) in (3), (4), can be any positive constant. Under such a weaker assumption, however, the convergence rate is not known.

Theorem 1

Suppose that \(f\in \hat{F}\). There is a number \(n_0(f)\) such that for \(n\ge n_0(f)\), the residual error after n iterations of the algorithm satisfies

$$\begin{aligned} \varDelta _n{(f)\equiv \min _{1\le i\le n}f(x_i)-f(x^*)} \le \frac{{4^{2/d}}}{8}\left\| D^2f\right\| _{\infty , [0,1]^d} (q \cdot d) \exp \left( -\,\sqrt{n} \beta (f,d)\right) , \end{aligned}$$

where

$$\begin{aligned} \beta (f,d) = \left( \frac{2\sqrt{\pi }\left( \det (D^2f(x^*))\right) ^{1/2}}{9e d^{(d+3)/2}(\pi e q)^{d/2}} \exp \left( -\,\frac{1}{2}-\frac{16}{q}\right) \right) ^{1/2}. \end{aligned}$$

The number \(n_0(f)\) after which the bound applies will be the maximum of several other \(n_i(f)\)’s that will be defined in the proof of the theorem. We note here that \(n_0(f)\) depends (exponentially) on the value of the seminorm \(\left\| D^2f\right\| _{\infty , [0,1]^d}\).

Note that the limiting error decreases when the determinant of the second derivative at the minimizer increases; a larger second derivative allows the search effort to concentrate more around the minimizer. The convergence rate in terms of the number of function evaluations n is quite fast, but the term \(\beta (f,d)\) decreases exponentially fast as d increases: as \(d\rightarrow \infty \), \(\beta (f,d) = O\left( d^{-(d+3)/2}\right) \). Therefore, this algorithm does not escape the curse of dimension.

We focus on the information complexity; that is, the cost is the number of function evaluations. The worst-case combinatorial cost of the algorithm grows quadratically with the number of iterations.

3 Interpolation error bounds

We will use results about interpolation from [30].

Lemma 1

Consider a hyper-rectangle \(R=\{x\in [0,1]^d: a_i\le x_i\le b_i, 1\le i\le d\}\), and let L denote the multilinear function that interpolates f at the vertices of the hyper-rectangle. We have the error bound

$$\begin{aligned} \left| f(x)-L(x)\right| \le \frac{1}{2}\sum _{i=1}^d(x_i-a_i)(b_i-x_i) \left\| D^2f\right\| _{\infty , R}\\ = \frac{1}{2}\frac{\sum _{i=1}^d(x_i-a_i)(b_i-x_i)}{\left| R\right| ^{2/d}} \left\| D^2f\right\| _{\infty , R}\left| R\right| ^{2/d}, \end{aligned}$$

which implies the bound

$$\begin{aligned}&\max _{x\in R}\left| f(x)-L(x)\right| \le \frac{1}{8}\sum _{i=1}^d(b_i-a_i)^2 \left\| D^2f\right\| _{\infty , R}\\&\quad = \frac{1}{8}\frac{\sum _{i=1}^d(b_i-a_i)^2}{\left| R\right| ^{2/d}} \left\| D^2f\right\| _{\infty , R}\left| R\right| ^{2/d}. \end{aligned}$$

Proof

See [30]. \(\square \)

The quantity

$$\begin{aligned} \frac{\sum _{i=1}^d(b_i-a_i)^2}{\left| R\right| ^{2/d}} = \frac{\sum _{i=1}^d(b_i-a_i)^2}{\left( \prod _{i=1}^d(b_i-a_i)\right) ^{2/d}} \end{aligned}$$

is a measure of how “bad” the hyper-rectangle is from the point of view of interpolation error; that is, how far it is from a cube. Since by construction of the algorithm the widths are within a factor of 2 of each other, there is some number \(1\le k\le d\) and a positive number w such that k of the dimensions have width w and \(d-k\) have width 2w, and

$$\begin{aligned} \frac{\sum _{i=1}^d(b_i-a_i)^2}{\left( \prod _{i=1}^d(b_i-a_i)\right) ^{2/d}} =\frac{k w^{2} + (d-k)(2w)^2}{(2^{d-k}w^d)^{2/d}} =\frac{(4d-3k)}{2^{2(d-k)/d}} =\left( d-\frac{3}{4}k\right) 2^{2k/d}. \end{aligned}$$

For \(k=d\) we get the value d and for \(k=1\) we get the value \((d-3/4)2^{2/d}\). Writing \(x=k/d\) and maximizing

$$\begin{aligned} d(1-3x/4)2^{2x} \end{aligned}$$

over \(x\in ]0,1[\) gives a maximum value of

$$\begin{aligned} \frac{3\cdot 2^{2/3} e^{-1}}{2\log (2)} d\equiv q\cdot d \approx 1.27 d. \end{aligned}$$

Therefore, the quality metric ranges from d up to about 1.27d, and the second bound in Lemma 1 implies that

$$\begin{aligned} \max _{x\in R}\left| f(x)-L(x)\right| \le \frac{1}{8} q\cdot d \left\| D^2f\right\| _{\infty , R} \left| R\right| ^{2/d}. \end{aligned}$$
(6)

Also note that

$$\begin{aligned} d\le \frac{\sum _{i=1}^d(b_i-a_i)^2}{\left( \prod _{i=1}^d(b_i-a_i)\right) ^{2/d}} \le qd. \end{aligned}$$
(7)
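For instance, in dimension \(d=2\) the partition contains only squares (\(k=2\)), for which this quantity equals 2, and rectangles of aspect ratio 2 (\(k=1\)), for which it equals \((2-3/4)\cdot 2=2.5\); both values lie in the interval \([d, qd]\approx [2, 2.53]\) of (7).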

In addition to \(\left\| D^2f\right\| _{\infty , [0,1]^d}\), we introduce a functional that is in a sense a measure of how difficult it is to obtain an \(\epsilon \)-approximation of the global minimum of f. For \(\epsilon >0\), define

$$\begin{aligned} {\mathcal I_f}(\epsilon )\equiv \int _{[0,1]^d} \frac{dx}{\left( f(x) - f^* +\epsilon \right) ^{d/2}}. \end{aligned}$$

The following lemma, proved in “Appendix A”, generalizes Lemma 3.2 of [21] to arbitrary dimension.

Lemma 2

$$\begin{aligned} \lim _{\epsilon \downarrow 0}\frac{{\mathcal I_f}(\epsilon )}{\log (1/\epsilon )} = \frac{d(2\pi )^{d/2}}{2\varGamma (1+d/2)} \cdot \left( \det (D^2f(x^*))\right) ^{-1/2} \equiv \alpha (f,d). \end{aligned}$$
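As an illustration of the constant, consider the model function \(f(x)=\Vert x-x^*\Vert ^2\) with \(x^*\) in the interior of the cube (a worked example, not part of the proof): then \(D^2f(x^*)=2I\), \(\det (D^2f(x^*))=2^d\), and \(\alpha (f,d)=d\pi ^{d/2}/\left( 2\varGamma (1+d/2)\right) \). In particular \(\alpha (f,2)=\pi \), which agrees with the direct computation of \({\mathcal I_f}(\epsilon )=\int _{[0,1]^2}\left( \Vert x-x^*\Vert ^2+\epsilon \right) ^{-1}dx\sim \pi \log (1/\epsilon )\) obtained by passing to polar coordinates around \(x^*\).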

4 Proof of Theorem 1

We begin with a rough outline of the steps of the proof of Theorem 1, which is based on a series of lemmas. The proofs of the lemmas appear in “Appendix B”.

Lemma 4 establishes that the hyper-rectangle that contains the global minimizer cannot be much larger than the smallest rectangle at that iteration; this fact allows us to bound the error in terms of \(v_n\), the volume of the smallest rectangle. Recalling that \(\rho ^n_i\) is the metric used to choose the next rectangle to split, Lemmas 5 and 6 establish that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\rho ^n_i \sim \frac{1}{\log (1/v_n)}, \end{aligned}$$

while Lemma 7 shows that

$$\begin{aligned} \sum _{i=1}^n\rho ^n_i \sim {\mathcal I}_f(g(v_n)). \end{aligned}$$

With this, Lemma 2 allows us to bound \(g(v_n)\), and thereby the error. All of the analysis relies on Lemma 3, which gives an upper bound for \(\rho ^n\).

We consider a fixed function \(f\in \hat{F}\). The subscript n will indicate the number of iterations of the algorithm (not the number of function evaluations). We assume that \(n\ge n_2(f)\), where \(n_2(f)\) is the first time after \(n_1(f)\) that the smallest rectangle is about to be split and

$$\begin{aligned} n_{{1}}(f) = \inf \left\{ n: n\ge \exp \left( \left( d\left\| D^2f\right\| _{\infty , [0,1]^d}\right) ^{d/2}\right) \right\} . \end{aligned}$$
(8)

(Later we will further restrict n.)

Lemma 3

For all \(n\ge n_2(f)\),

$$\begin{aligned} \rho ^n {\equiv \max \{\rho ^n_1,\rho ^n_2,\ldots ,\rho ^n_{n}\}}\le \frac{{4}}{\lambda \log (1/v_n)} \le \frac{{4}}{\lambda \log (n)}. \end{aligned}$$

Lemma 4

Let \(v_n^*\) denote the volume of the hyper-rectangle containing the minimizer \(x^*\). For \(n\ge n_2\),

$$\begin{aligned} v_n^* \le 4 v_n \end{aligned}$$
(9)

and

$$\begin{aligned} \varDelta _n \equiv M_n-f(x^*) \le 2 d^{-\,2} g(v_n). \end{aligned}$$
(10)

The following lemma gives a lower bound for \(\rho ^n\) that complements the upper bound provided by Lemma 3.

Lemma 5

For \(n\ge n_2(f)\),

$$\begin{aligned} \rho ^n \equiv \max _{i\le n}\rho ^n_i \ge \frac{\exp (-\,16/q)}{\lambda \log (1/v_n)}. \end{aligned}$$

Lemma 6

For \(n\ge n_3(f)\equiv 2n_2(f)\), we have the bounds

$$\begin{aligned} \frac{1}{{n}}\sum _{i=1}^{{n}} \rho ^n_i \le \rho ^n \le \frac{{4}}{\lambda \log (1/v_n)} \end{aligned}$$
(11)

and

$$\begin{aligned} \frac{1}{{n}}\sum _{i=1}^{ {n}}\rho ^n_i \ge \frac{1}{{2}}\exp \left( -\frac{1}{{2}}-\frac{16}{q}\right) \frac{1}{\lambda \log (1/v_n)}. \end{aligned}$$
(12)

The following lemma allows us to approximate the average of the \(\{\rho ^n_i\}\) in terms of \({\mathcal I}_f(\cdot )\), for which we have the approximation given by Lemma 2.

Lemma 7

For \(n\ge n_2(f)\),

$$\begin{aligned} {\frac{1}{2}} {\mathcal I_f}(g(v_n)) \le \sum _{i=1}^{{n}}\rho ^n_i \le {\frac{9}{4}}{\mathcal I_f}(g(v_n)). \end{aligned}$$
(13)

We are now ready to prove Theorem 1. From Lemma 6 we have the bounds

$$\begin{aligned} \frac{1}{{n}} \sum _{i=1}^{{n}}\rho ^n_i \le \frac{4}{\lambda \log (1/v_n)} \end{aligned}$$
(14)

and

$$\begin{aligned} \frac{1}{{n}}\sum _{i=1}^{{n}}\rho ^n_i \ge \frac{1}{{2}}\frac{\exp (-\,1/2-16/q)}{\lambda \log (1/v_n)}. \end{aligned}$$
(15)

Therefore

$$\begin{aligned} \frac{1}{8}\exp (-\,1/2-16/q) \frac{1/2}{{n}} {\mathcal I_f}(g(v_n)) \le \frac{1}{8}\exp (-\,1/2-16/q) \frac{1}{{n}} \sum _{i=1}^{{n}}\rho _i^n\\ \le \frac{1}{2}\exp (-\,1/2-16/q) \frac{1}{\lambda \log (1/v_n)} \le \frac{1}{{n}}\sum _{i=1}^{{n}}\rho _i^n \le \frac{{9/4}}{{n}} {\mathcal I_f}(g(v_n)), \end{aligned}$$

using the first inequality in (13), (14), (15), and the second inequality in (13), respectively. We focus on the inequalities

$$\begin{aligned}&\frac{1}{16}\exp (-\,{1}/2-16/q) \frac{1}{{n}} {\mathcal I_f}(g(v_n)) \le \frac{1}{4}\exp (-\,1/2-16/q) \frac{1}{\lambda \log (1/v_n)} \\&\quad \le \frac{{9/4}}{{n}} {\mathcal I_f}(g(v_n)). \end{aligned}$$

There is a number \(n_4(f)\) such that, by Lemma 2, \(n\ge n_4(f)\) implies that

$$\begin{aligned} \frac{1}{2}\log (1/g(v_n))\alpha (d) \le {\mathcal I_f}(g(v_n))\le 2\log (1/g(v_n))\alpha (d). \end{aligned}$$

Replace \({\mathcal I_f}(g(v_n))\) by \(\log (1/g(v_n))\) (times the appropriate factor) in the extreme terms of the above chain of inequalities to get

$$\begin{aligned}&\frac{1}{32}\exp (-\,{1}/{2}-16/q) \frac{1}{{n}} \log (1/g(v_n))\alpha (d)\\&\quad \le \frac{1}{4}\exp (-\,1/{2}-16/q) \frac{1}{\lambda \log (1/v_n)} \le \frac{{9/4}}{{n}} 2\log (1/g(v_n))\alpha (d). \end{aligned}$$

Now substitute the value

$$\begin{aligned} \log (1/g(v_n))= -\,\log (qd^2/4) +\frac{2}{d}\left( \log (1/v_n)-\log \log (1/v_n)\right) \end{aligned}$$

to obtain the bounds

$$\begin{aligned}&\frac{1}{32}\exp (-\,{1}/{2}-16/q) \frac{1}{{n}} \left( -\log (qd^2/4) +\frac{2}{d}\left( \log (1/v_n)-\log \log (1/v_n)\right) \right) \alpha (d)\\&\quad \le \frac{1}{4}\exp (-1/{2}-16/q) \frac{1}{\lambda \log (1/v_n)}\\&\quad \le \frac{{9/4}}{{n}} 2 \left( -\log (qd^2/4) +\frac{2}{d}\left( \log (1/v_n)-\log \log (1/v_n)\right) \right) \alpha (d). \end{aligned}$$

This implies that eventually, say for \(n\ge n_5(f)\),

$$\begin{aligned} \log (1/v_n) \ge \left( \frac{dn}{36\alpha (d)\lambda }\exp \left( -\frac{1}{2}-\frac{16}{q}\right) \right) ^{1/2}, \end{aligned}$$

which entails that

$$\begin{aligned} v_n \le \exp \left( - \left( \frac{dn}{36\alpha (d)\lambda }\exp \left( -\frac{1}{2}-\frac{16}{q}\right) \right) ^{1/2} \right) . \end{aligned}$$

Then

$$\begin{aligned} M_n-f^*\le & {} \frac{4^{2/d}}{8}\left\| D^2f\right\| _{\infty , [0,1]^d} (q \cdot d) v_n^{2/d}\\\le & {} \frac{4^{2/d}}{8}\left\| D^2f\right\| _{\infty ,[0,1]^d} (q \cdot d) \exp \left( - \left( \frac{dn}{36\alpha (d)\lambda }\exp \left( -\frac{1}{2}-\frac{16}{q}\right) \right) ^{1/2} \frac{2}{d} \right) . \end{aligned}$$

For \(n\ge n_0(f) \equiv \max \{ n_2(f), n_3(f), n_4(f), n_5(f)\}\),

$$\begin{aligned}&\varDelta _n \le \frac{4^{2/d}}{8}\left\| D^2f\right\| _{\infty , [0,1]^d} (q \cdot d) \exp \left( - \left( \frac{dn}{36\alpha (d)\lambda }\exp \left( -\frac{1}{2}-\frac{16}{q}\right) \right) ^{1/2} \frac{2}{d} \right) \nonumber \\&\quad = \frac{4^{2/d}}{8}\left\| D^2f\right\| _{\infty , [0,1]^d} (q \cdot d) \exp \left( - \sqrt{n}\left( \frac{d}{36\alpha (d)\lambda }\frac{4}{d^2} \exp \left( -\frac{1}{2}-\frac{16}{q}\right) \right) ^{1/2} \right) .\nonumber \\ \end{aligned}$$
(16)

Substituting the value of \(\alpha \) from Lemma 2, and using the bound (from Stirling’s approximation)

$$\begin{aligned} \varGamma (1+d/2) \ge \sqrt{2\pi }(d/2)^{(1+d)/2} e^{-(1+d/2)}, \end{aligned}$$

we obtain

$$\begin{aligned} \frac{d}{36\alpha (d)\lambda }\frac{4}{d^2}&= \frac{1}{9\alpha (d)\lambda d}= \frac{1}{9 \frac{d(2\pi )^{d/2}}{2\varGamma (1+d/2)} \cdot \left( \det (D^2f(x^*))\right) ^{-1/2} \lambda d}\\&= \frac{2\varGamma (1+d/2) \left( \det (D^2f(x^*))\right) ^{1/2} }{9 d(2\pi )^{d/2} (qd^2/4)^{d/2} d}\\&\ge \frac{2 \sqrt{2\pi }(d/2)^{(1+d)/2} e^{-\,(1+d/2)} \left( \det (D^2f(x^*))\right) ^{1/2} }{9 d(2\pi )^{d/2} (qd^2/4)^{d/2} d}\\&= \frac{2 \sqrt{\pi } \left( \det (D^2f(x^*))\right) ^{1/2} }{9 e d^{3/2} (e\pi q d)^{d/2} }. \end{aligned}$$

Combined with (16) this yields the bound

$$\begin{aligned} \varDelta _n\le & {} \frac{4^{2/d}}{8}\left\| D^2f\right\| _{\infty , [0,1]^d} (q \cdot d)\\&\quad \times \,\exp \left( - \sqrt{n}\left( \frac{2 \sqrt{\pi } \left( \det (D^2f(x^*))\right) ^{1/2} }{9 e d^{3/2} (e\pi q d)^{d/2} } \exp \left( -\,\frac{1}{2}-\frac{16}{q}\right) \right) ^{1/2} \right) \\= & {} \frac{4^{2/d}}{8}\left\| D^2f\right\| _{\infty , [0,1]^d} (q \cdot d) \exp \left( -\,\sqrt{n} \beta (f,d)\right) \end{aligned}$$

where

$$\begin{aligned} \beta (f,d) = \left( \frac{2\sqrt{\pi }\left( \det (D^2f(x^*))\right) ^{1/2}}{9 e d^{(d+3)/2}(\pi e q)^{d/2}} \exp \left( -\,\frac{1}{2}-\frac{16}{q}\right) \right) ^{1/2}. \end{aligned}$$

Finally, note that this is the expression for the error after n iterations of the algorithm, which corresponds to as many as \(2^{d-1}n\) function evaluations. After a total of N function evaluations, an upper bound on the error is given by

$$\begin{aligned} \frac{4^{2/d}}{8}\left\| D^2f\right\| _{\infty , [0,1]^d} (q \cdot d) \exp \left( -\,\sqrt{N} 2^{-(d-1)/2} \beta (f,d)\right) . \end{aligned}$$

This completes the proof of Theorem 1.

5 Numerical examples

The main goal of the research presented in this paper was the evaluation of the convergence rate of an algorithm that generalizes the previously investigated algorithms to the case of arbitrary d. However, we do not expect the upper bound to be tight for typical test functions, since it accounts for worst-case objective functions. Therefore, it seemed reasonable to include some results of preliminary numerical experiments to illustrate the relation between the asymptotic convergence rate and the practical performance. The theoretical overestimation of the actual error is commented on after the presentation of the experimental results.

A simple C++ implementation of the algorithm, applicable to problems of low dimensionality, was used. In the future we plan to implement a parallel version of the algorithm and to perform so-called competitive testing of the software according to the widely accepted methodology [13, 31, 32].

The algorithm consists of the following main parts: computation of the selection criterion (3), selection and subdivision of a hyper-rectangle, and storage of the newly produced hyper-rectangles. The computational burden of an iteration depends on the dimensionality d and on the iteration number. The dimensionality influences the computing time of the criterion (3). The amount of stored information grows with the iteration number, and so does the computer time needed for its management. Therefore, for problems of higher dimensionality, a parallel implementation of the algorithm is planned. Meanwhile, the numerical results below were obtained using the straightforward C++ implementation.

The computation of criterion (3) reduces to numerical integration over a hyper-rectangle. In the two-dimensional case the integral can be computed using a combination of analytic and numerical techniques. Let \(\mathbf p ^i=(p^i_0, p^i_1)\), \(i=0, \dots , 3,\) be the rectangle vertices, numbered so that \(p^0_0 = p^1_0 < p^2_0 = p^3_0\) and \(p^0_1=p^2_1 < p^1_1=p^3_1\). The corresponding objective function values are \(f_i = f(\mathbf p ^i)\), \(i=0, \dots , 3\). The bilinear interpolant of the objective function at an arbitrary point \(\mathbf x =(x_0, x_1)\), \(x_j \in [a_j, b_j], a_j=p^0_j, b_j=p^3_j, j =0, 1, \) can then be expressed as

$$\begin{aligned}&L(\mathbf x )=L(x_0, x_1)=A(x_1)x_0 + B(x_1),\nonumber \\&A(x_1)=\frac{\frac{x_1-a_1}{b_1-a_1}(f_0-f_1-f_2+f_3)+(f_2-f_0)}{b_0-a_0},\nonumber \\&B(x_1)=\frac{\frac{x_1-a_1}{b_1-a_1}(-f_0p^3_0 + f_1p^2_0 + f_2p^1_0 -f_3p^0_0) + (f_0p^3_0-f_2p^1_0)}{b_0-a_0}. \end{aligned}$$
(17)

Then

$$\begin{aligned}&\rho ^n_i=\int _{R_i}\frac{d\mathbf x }{L(\mathbf x ) - M_n + g(v_n)} = \int _{a_1}^{b_1}\int _{a_0}^{b_0} \frac{dx_0}{L(x_0, x_1) - M_n + g(v_n)} dx_1 \nonumber \\&\quad =\int _{a_1}^{b_1}\int _{a_0}^{b_0} \frac{dx_0}{A(x_1)x_0 + C(x_1)} dx_1 = \int _{a_1}^{b_1} u(x_1) dx_1, \end{aligned}$$
(18)

where

$$\begin{aligned} u(x_1)= & {} {\left\{ \begin{array}{ll} \frac{1}{A(x_1)}(\ln (A(x_1)b_0+C(x_1))-\ln (A(x_1)a_0+C(x_1))), &{}\quad \text {if}\; A(x_1)\ne 0,\\ \frac{1}{C(x_1)}(b_0-a_0),&{}\quad \text {if}\; A(x_1) = 0, \end{array}\right. }\nonumber \\ C(x_1)= & {} B(x_1)-M_n+g(v_n). \end{aligned}$$
(19)

Note that the function \(u(x_1)\) is continuous at the point \(\hat{x}_1=a_1+(b_1-a_1)\frac{f_0-f_2}{f_0-f_1-f_2+f_3}\) for which \(A(\hat{x}_1)=0\), provided this point falls in \([a_1, b_1]\).
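For illustration, the analytically evaluated inner integrand \(u(x_1)\) from (17)–(19) can be coded directly; the sketch below uses our own helper names and is not the exact implementation.

```cpp
#include <cmath>

// Sketch of the analytically evaluated inner integral u(x_1) from (17)-(19) for d = 2.
// f0..f3 are the function values at the rectangle vertices, numbered as in the text;
// a0,b0,a1,b1 are the rectangle bounds, M is M_n and g_vn = g(v_n).
struct InnerIntegrand {
    double f0, f1, f2, f3, a0, b0, a1, b1, M, g_vn;
    double operator()(double x1) const {
        double t = (x1 - a1) / (b1 - a1);
        double A = (t * (f0 - f1 - f2 + f3) + (f2 - f0)) / (b0 - a0);
        double B = (t * (-f0 * b0 + f1 * b0 + f2 * a0 - f3 * a0) + (f0 * b0 - f2 * a0)) / (b0 - a0);
        double C = B - M + g_vn;
        if (A != 0.0)   // a robust implementation would also guard against |A| near zero
            return (std::log(A * b0 + C) - std::log(A * a0 + C)) / A;
        return (b0 - a0) / C;
    }
};
// The criterion rho_i^n is then the one-dimensional integral of this functor over
// [a1, b1], computed e.g. with the GSL routine mentioned below.
```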

In the experiments for the two-dimensional case we compute \(\rho ^n_i\) as a one-dimensional integral using Eqs. (17)–(19). The one-dimensional integral is computed numerically using the GNU Scientific Library [33] routine gsl_integration_cquad, requiring a relative error smaller than \(10^{-7}\). We call the resulting implementation Rect-A. For comparison we also include integration over the rectangle \(R_i\) using the classical Monte Carlo method (GNU Scientific Library routine gsl_monte_plain_integrate) with \(M=10{,}000\) evaluations of the two-dimensional integrand (referred to as Rect-MC). Finally, we present a simplified version of computing \(\rho ^n_i\) for dimension \(d=2\) as

$$\begin{aligned} \rho ^n_i\approx \frac{|R_i|}{L(\mathbf x ) - M_n + g(v_n)},\quad x_j=(a_j+b_j)/2,\ j=0, \dots , d-1. \end{aligned}$$
(20)

We call this algorithm version Rect-1. In all implementations a database of function evaluations is employed, so that the objective function is never evaluated twice at the same location.

For comparison, the following two algorithms were used. A recent algorithm [21], based on the Delaunay triangulation of the optimization region and referred to as Simpl in this section, is also implemented in C++ and uses the 2D and 3D Delaunay triangulation code from the CGAL library [34]. The comparison with a non-adaptive algorithm is of interest especially because of the different convergence rates mentioned in the introduction. The non-adaptive algorithm is implemented using the so-called \(LP_{\tau }\) sequences [35], and is worst-case quasi-optimal since it generates the points at which objective function values are computed nearly uniformly in the feasible region. The results of that algorithm are referred to as Sobol.

We start by illustrating the partition of the feasible region produced by the algorithm Rect-A during minimization of the following objective functions:

  • The six-hump camel back function [36]:

    $$\begin{aligned} f(\mathbf x )&= \left( 4-2.1x_1^2 + x_1^4/3\right) x_1^2 + x_1x_2+ (-\,4+4x_2^2)x_2^2,\nonumber \\&\quad \mathbf x \in [-3, 3]\times [-\,2, 2]. \end{aligned}$$
    (21)
  • The Branin function [37]:

    $$\begin{aligned} f(\mathbf x )&= \left( x_2-\frac{5.1}{4\pi ^2}x_1^2+5\frac{x_1}{\pi }-6\right) ^2+ 10\left( 1- \frac{1}{8\pi }\right) \cos (x_1)+10,\nonumber \\&\quad \mathbf x \in [-\,5, 10]\times [0, 15]. \end{aligned}$$
    (22)

Figure 1 shows the partition after \(n=3000\) objective function evaluations. It can be seen that the whole space is roughly explored; however, the trial points are positioned more densely in the regions of potential global minima.

Fig. 1 Examples of the partition of the feasible region for the six-hump camel back (left) and Branin (right) test functions after \(n=3000\) function evaluations

Next, we present numerical experiments aimed at comparing the proposed algorithm with the algorithm of similar purpose that implements a partition based on the Delaunay triangulation of the feasible region [21]. The latter was tested using a set of 2-dimensional objective functions from [38] that have also been used in numerous other papers investigating Lipschitz optimization algorithms.

Table 1 presents the results of minimizing the test functions from [38], using the same numbering. The optimization error is presented (on a logarithmic scale) as \(\delta _n=-\log _{10}(M_n-f_{min})\) after \(n=1000\) and \(n=3000\) function evaluations for each of the algorithms Rect-A, Rect-1, Rect-MC, Simpl [21] and Sobol. The symbol \(\infty \) means that zero error was achieved within machine precision. The numbers in the left column are copied from the function definitions in [38], p. 468. The values for the algorithm of [21] are copied from Table 1 of that paper.

It can be seen that the various versions of the proposed algorithm perform comparably with the algorithm of [21]. Moreover, there appears to be no significant difference between the implementations Rect-A and Rect-MC, suggesting that the Monte Carlo technique could be used to approximate the integral in higher dimensions. As could be expected, the results of the non-adaptive Sobol algorithm are the worst. The minimization of function No. 13 was similarly problematic for all algorithms, as the global minimum of this function is hidden in a small basin behind a sharp spike.

Table 1 Testing results with functions from [38] defined over a 2-dimensional region

The minimization results for objective functions defined over 3-dimensional regions are shown in Table 2. The functions are also taken from [38]. Since an analytic expression for the integral is not available in the 3-dimensional case, we included only the versions Rect-1 and Rect-MC (\(M=100{,}000\) evaluations of the integrand per hyper-rectangle) of the proposed algorithm. The table shows that the proposed algorithm performs better than the other algorithms.

Table 2 Testing results with functions from [38] defined over a 3-dimensional region

The number of iterations needed to find an acceptable approximation of the minimum can depend on a coincidental match between properties of an algorithm and the objective function. To reduce this factor, the performance profiles below are evaluated using a sample of randomly generated problems in which the same objective function is minimized over a feasible region that is randomly (with uniform distribution) shifted by up to 10% of its original extent. The average and maximum errors over 100 random problems were recorded and plotted against the number of function evaluations n. A logarithmic scale is used for the graphs: \(\delta _n=-\,\log _{10}(M_n-f_{min})\), where \(M_n\) denotes the best function value found up to the \(n\mathrm{th}\) function evaluation and \(f_{min}\) is the actual global minimum value. Figure 2 presents the graphs for the (2-dimensional) six-hump camel back and Branin functions; the version Rect-A of the algorithm was used, and the corresponding partitions of the feasible region are shown in Fig. 1.

Fig. 2 The maximum and average error after n function evaluations of the algorithms Rect-A and Sobol on 100 perturbed cases of the 2-dimensional test problems: the six-hump camel back function (left) and the Branin function (right)

Fig. 3 The maximum and average error after n function evaluations of the algorithms Rect-1 and Sobol on 100 perturbed cases of the Rastrigin test problem (Eq. 23) for \(d=2, 3,4,5\)

To illustrate the influence of the dimension, similar experiments were performed with test problems of dimension up to 5. Let us note that contemporary publications on algorithms with a similar purpose, i.e. intended for expensive black-box problems, typically present experiments with objective functions of dimensionality up to five or six [16, 39]. The test problems were constructed by extending the two-dimensional Rastrigin function [15] to dimension d:

$$\begin{aligned} f(\mathbf x ) = \sum _{i=1}^{d}x_i^2-\cos (\pi x_1) - \cos (\pi x_2),\ \ \ \mathbf x \in [-\,2, 2]^d. \end{aligned}$$
(23)
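For reference, a direct encoding of (23) (an illustrative snippet of ours, not taken from the tested implementation) is:

```cpp
#include <cmath>
#include <vector>

// Illustrative encoding of the extended Rastrigin test function (23), x in [-2,2]^d, d >= 2.
double rastrigin_ext(const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    double s = 0.0;
    for (double xi : x) s += xi * xi;                      // sum of squares over all coordinates
    return s - std::cos(pi * x[0]) - std::cos(pi * x[1]);  // only the first two coordinates enter the cosines
}
```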

Such an extension keeps the number of local minima fixed at 5. The global minimum point is the d-dimensional vector with zero components. The results for \(d=2,3,4,5\) are presented in Fig. 3, and are also quantified in Table 3.

Now we can compare the upper bound given by Theorem 1 with the actual error estimated experimentally. Our analysis uses the inequality \(v_n\le 1/n\) (\(v_n\) is the volume of the smallest rectangle after n iterations), for example in the last inequalities in the proof of Lemma 4. If f is near its minimum over much of the domain, then \(v_n\approx 1/n\) (think of a nearly constant function with a tiny, locally quadratic valley in the center of the domain). For our test functions, \(v_n\) is much smaller than \(1/n\). We computed the characteristics of the test function (23), and arrived at a value of \(\beta = 2.05 \times 10^{-4}\) for dimension \(d=5\). For \(n=60{,}000\), this gives an upper bound of \(-\,0.02\) for \(\log _{10}\) of the error, while the corresponding number is less than \(-\,5\) in the last graph of Fig. 3 (the difference is even larger if we take into account the factor in front of the exponential term).

Table 3 The average/maximum number of objective function evaluations, performed by the algorithms Rect-1 and Sobol, before \(-\,\delta _n\) drops below t, for the 100 perturbed cases of the Rastrigin test problem (Eq. 23) in dimensions \(d=2,3,4,5\)