1 Introduction

One of the key issues faced by deterministic branch-and-bound algorithms for continuous global optimization [11] is the so-called cluster problem, where a large number of boxes may be visited by the algorithm in the vicinity of a global minimizer [7, 21, 29]. Du and Kearfott [7, 13] were the first to analyze this phenomenon in the context of interval branch-and-bound algorithms for unconstrained global optimization. They established that the accuracy with which the bounding scheme estimates the range of the objective function, as determined by the notion of convergence order (see Definition 7), dictates the extent of the cluster problem. Furthermore, they determined that, in the worst case, at least second-order convergence of the bounding scheme is required to mitigate ‘clustering’ [7]. Next, Neumaier [21] provided a similar analysis and concluded that even second-order convergence of the bounding scheme might, in the worst case, result in an exponential number of boxes in the vicinity of an unconstrained global minimizer. In addition, Neumaier claimed that a similar situation holds in a reduced manifold for the constrained case [21].

Recently, Wechsung et al. [29] provided a refined analysis of Neumaier’s argument for unconstrained global optimization which corroborated the previous analyses. In addition, they showed that the number of boxes visited in the vicinity of a global minimizer may scale differently depending on the convergence order prefactor. As a result, second-order convergent bounding schemes with small-enough prefactors may altogether eliminate the cluster problem, while second-order convergent bounding schemes with large-enough prefactors may result in an exponential number of boxes being visited. Also note the analysis by Wechsung [28, Section 2.3], which shows that first-order convergence of the bounding scheme may be sufficient to mitigate the cluster problem in unconstrained optimization when the minimizer is located at a point of nondifferentiability of the objective function.

As highlighted above, the convergence order of the bounding scheme plays a key role in the analysis of the cluster problem. This concept, which is based on the rate at which the notion of excess width from interval extensions [18] shrinks to zero, compares the rate of convergence of an estimated range of a function to its true range. Bompadre and Mitsos [3] developed the notions of Hausdorff and pointwise convergence rates of bounding schemes, and established sharp rules for the propagation of convergence orders of bounding schemes constructed using McCormick’s composition rules [17]. In addition, Bompadre and Mitsos [3] demonstrated second-order pointwise convergence of schemes of convex and concave envelopes of twice continuously differentiable functions, second-order pointwise convergence of schemes of \(\alpha \)BB relaxations [1], and provided a conservative estimate of the prefactor of \(\alpha \)BB relaxation schemes for the case of constant \(\alpha \). Scholz [25] demonstrated second-order convergence of centered forms (also see, for instance, the article by Krawczyk and Nickel [15]). Bompadre and coworkers [4] established sharp rules for the propagation of convergence orders of Taylor and McCormick-Taylor models. Najman and Mitsos [20] established sharp rules for the propagation of convergence orders of the multivariate McCormick relaxations developed in [19, 26]. Finally, Khan and coworkers [14] developed a continuously differentiable variant of McCormick relaxations [17, 19, 26], and established second-order pointwise convergence of schemes of the differentiable McCormick relaxations for twice continuously differentiable functions. The above literature not only helps develop bounding schemes for unconstrained optimization with the requisite convergence order, but also provides conservative estimates for the convergence order prefactor (see Definition 7). 
Also note the related definition for the rate of convergence of (lower) bounding schemes for geometric branch-and-bound methods provided by Schöbel and Scholz [23].

This work provides an analysis of the cluster problem for constrained global optimization. It is shown that clustering can occur both on feasible and infeasible regions in the neighborhood of a global minimizer. Akin to the case of unconstrained optimization, both the convergence order of a lower bounding scheme and its corresponding prefactor (see Definition 8) may be crucial towards tackling the cluster problem; however, in contrast to the case of unconstrained optimization, it is shown that first-order convergent lower bounding schemes with small-enough prefactors may eliminate the cluster problem under certain conditions. Additionally, conditions under which second-order convergence of the lower bounding scheme may be sufficient to mitigate clustering are developed.

This work assumes that boxes can be placed such that global minimizers are always in their relative interior, otherwise an exponential number of boxes can contain global minimizers. Techniques such as epsilon-inflation [16] or back-boxing [21, 27] can potentially be used to place boxes with global minimizers in their relative interior.

This article is organized as follows. Section 2 provides the problem formulation, describes the notions of convergence used in this work, and sets up the framework for analyzing the cluster problem in Sect. 3. Section 3.1 analyzes the cluster problem on the set of nearly-optimal feasible points in a neighborhood of a global minimizer and determines conditions under which first-order and second-order convergent bounding schemes may be sufficient to mitigate clustering in such neighborhoods. Section 3.2 analyzes the cluster problem on the set of nearly-feasible points in a neighborhood of a global minimizer that have a ‘good-enough’ objective function value, and develops conditions under which first-order and second-order convergent bounding schemes may be sufficient to mitigate clustering in such neighborhoods. Finally, Sect. 4 lists the conclusions of this work.

2 Problem formulation and background

Consider the problem

$$\begin{aligned} \underset{\mathbf {x}}{\min }&\,\, f(\mathbf {x}) \\ \text {s.t.}&\,\, \mathbf {g}(\mathbf {x}) \le \mathbf {0}, \nonumber \\&\quad \mathbf {h}(\mathbf {x}) = \mathbf {0}, \nonumber \\&\quad \mathbf {x}\in X, \end{aligned}$$
(P)

where \(X \subset \mathbb {R}^{n_x}\) is a nonempty open bounded convex set, the functions \(f : X \rightarrow \mathbb {R}\), \(\mathbf {g}: X \rightarrow \mathbb {R}^{m_I}\), and \(\mathbf {h}: X \rightarrow \mathbb {R}^{m_E}\) are continuous on X, and \(\mathbf {0}\) denotes a vector of zeros of appropriate dimension. The following assumptions are enforced throughout this work.

Assumption 1

The constraints define a nonempty compact set

$$\begin{aligned} \left\{ \mathbf {x}\in X : \mathbf {g}(\mathbf {x}) \le \mathbf {0}, \, \mathbf {h}(\mathbf {x}) = \mathbf {0} \right\} \subset X. \end{aligned}$$

Assumption 2

Let \(\mathbf {x}^* \in X\) be a global minimizer of Problem (P), and assume that the branch-and-bound algorithm has found the upper bound \({\textit{UBD}} = f(\mathbf {x}^*)\) sufficiently early. Let \(\varepsilon \) be the termination tolerance for the branch-and-bound algorithm, and suppose the algorithm fathoms node k when \({\textit{UBD}} - {\textit{LBD}}_k \le \varepsilon \), where \({\textit{LBD}}_k\) is the lower bound on node k.

When Assumption 1 is enforced, Problem (P) attains its optimal solution on X by virtue of the assumption that f is continuous on X. Note that the assumption that X is an open set is made purely for ease of exposition, particularly when differentiability assumptions on the functions in Problem (P) are made, and is not practically implementable in general. As a result, we implicitly assume throughout this work that finite bounds on the variables (which define an interval in the interior of X) are available for use in a branch-and-bound setting.

Assumption 2 essentially assumes that the convergence of the overall lower bound is the limiting factor for the convergence of the branch-and-bound algorithm. This is usually a reasonable assumption in the context of branch-and-bound algorithms for global optimization where most of the effort is typically spent in proving \(\varepsilon \)-optimality of feasible solutions found using (heuristic) local optimization-based techniques. The cluster problem analysis in this work is asymptotic in \(\varepsilon \) in general; we provide conservative estimates of the worst-case number of boxes visited by the branch-and-bound algorithm in nearly-optimal and nearly-feasible neighborhoods of global minimizers for some sufficiently small \(\varepsilon > 0\). The conservatism of the above estimates decreases as \(\varepsilon ~\rightarrow ~0\). The asymptotic nature of our analysis with respect to \(\varepsilon \) is not only a result of considering the local behavior of the objective function in the vicinity of a global minimizer (which is also a limitation of the analyses of the cluster problem in unconstrained optimization [7, 21, 28, 29]), but is also a consequence of considering the local behavior of the constraints (and, therefore, the feasible region) in the vicinity of a global minimizer. In practice, values of \(\varepsilon \) for which the analysis of the cluster problem provides a reasonable overestimate of the number of boxes visited can be much larger than the machine precision (on the order of \(10^{-1}\)). This is evidenced by the examples in Sect. 3. Also note that the fathoming criterion for the branch-and-bound algorithm in this work is different from the one considered by Wechsung et al. [29], who assume that node k is fathomed only when \({\textit{LBD}}_k > {\textit{UBD}}\); however, the worst-case estimates of the number of boxes visited by the branch-and-bound algorithm are not affected by this difference in our assumptions.

Throughout this work, we will use \(\mathbf {x}^*\) to denote a global minimizer of Problem (P), \(\mathbb {I}Z\) to denote the set of nonempty, closed and bounded interval subsets of \(Z \subset \mathbb {R}^n\), \(Z^{\text {C}}\) to denote the relative complement of a set \(Z \subset \mathbb {R}^n\) with respect to X, cl(Z) to denote the closure of a set \(Z \subset \mathbb {R}^n\), \({{||}\mathbf {z}{||}}\) to denote the Euclidean norm of \(\mathbf {z}\in \mathbb {R}^n\), \(\mathbb {R}_{-}\) to denote the nonpositive orthant, \(z_j\) to denote the \(j^{\text {th}}\) component of a vector \(\mathbf {z}\), \((z_1,z_2,\ldots ,z_n)\) to denote a vector \(\mathbf {z}\in \mathbb {R}^n\) with entries \(z_1,z_2,\ldots ,z_n \in \mathbb {R}\) (note that \((z_1,z_2)\) will be used to denote both an open interval in \(\mathbb {R}\) and a vector in \(\mathbb {R}^2\); the intended use will be clear from the context), \(\lceil {\cdot }\rceil \) to denote the ceiling function, \(\begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}\) to denote a vector-valued function with domain Y and codomain \(\mathbb {R}^{m + n}\) corresponding to vector-valued functions \(\mathbf {g}: Y \rightarrow \mathbb {R}^m\) and \(\mathbf {h}: Y \rightarrow \mathbb {R}^n\), \(\overline{\mathbf {f}}(Z)\) to denote the image of \(Z \subset Y\) under the function \(\mathbf {f}: Y \rightarrow \mathbb {R}^m\), \(f'(\mathbf {z};\mathbf {d})\) to denote the directional derivative of a function \(f: Z \subset \mathbb {R}^n \rightarrow \mathbb {R}\) at a point \(\mathbf {z}\in Z\) (with Z open) in a direction \(\mathbf {d}\in \mathbb {R}^n\), and ‘differentiability’ to refer to differentiability in the Fréchet sense. The following definitions are in order.

Definition 1

(Width of an interval) Let \(Z = [z^{\text {L}}_1,z^{\text {U}}_1] \times \cdots \times [z^{\text {L}}_n,z^{\text {U}}_n]\) be an element of \(\mathbb {I}\mathbb {R}^{n}\). The width of Z, denoted by w(Z), is given by

$$\begin{aligned} w(Z) {:}{=} \max _{i = 1,\ldots ,n} (z^{\text {U}}_i - z^{\text {L}}_i). \end{aligned}$$

Definition 2

(Distance between two sets) Let \(Y, Z \subset \mathbb {R}^n\). The distance between Y and Z, denoted by d(Y,Z), is defined as

$$\begin{aligned} d(Y,Z) {:}{=} \inf _{\mathbf {y}\in Y, \, \mathbf {z}\in Z} \,\, {||}\mathbf {y}-\mathbf {z}{||}. \end{aligned}$$

Note that the above definition of distance does not define a metric; however, it will prove useful in defining a measure of infeasibility for points in X for Problem (P).
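To make the infeasibility measure concrete: for a single point \(\mathbf {x}\), the distance of \((\mathbf {g}(\mathbf {x}), \mathbf {h}(\mathbf {x}))\) to \(\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\}\) has a closed form, since projecting onto the nonpositive orthant clips positive inequality values at zero, while equality residuals enter the norm in full. A minimal Python sketch (the function name is ours, not from the paper):

```python
import math

def infeasibility(g_vals, h_vals):
    """Distance from the point (g(x), h(x)) to R_-^{m_I} x {0}:
    positive inequality values are kept, negative ones are clipped
    at zero, and equality residuals enter in full."""
    sq = sum(max(0.0, gj) ** 2 for gj in g_vals)
    sq += sum(hj ** 2 for hj in h_vals)
    return math.sqrt(sq)

# A feasible point has measure 0; violations accumulate in the norm.
print(infeasibility([-1.0, 0.0], [0.0]))  # 0.0
print(infeasibility([0.3, -2.0], [0.4]))  # sqrt(0.09 + 0.16) = 0.5
```

This is exactly the expression that reappears in the partition of X and in Example 2 below.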

Definition 3

(Lipschitz continuous function) Let \(Z \subset \mathbb {R}^n\). A function \(f:Z \rightarrow \mathbb {R}\) is Lipschitz continuous with Lipschitz constant \(M \ge 0\) if \({|}f(\mathbf {z}_1) - f(\mathbf {z}_2){|} \le M {||}\mathbf {z}_1 - \mathbf {z}_2{||}\), \(\forall \mathbf {z}_1,\mathbf {z}_2 \in Z\).

Since the cluster problem analysis is asymptotic in \(\varepsilon \), we will need the following asymptotic notations.

Definition 4

(Big O and little o notations) Let \(Y \subset \mathbb {R}\), \(f:Y \rightarrow \mathbb {R}\), and \(g:Y \rightarrow \mathbb {R}\). We say that \(f(y) = O(g(y))\) as \(y \rightarrow \bar{y} \in Y\) if and only if there exist \(\delta , M > 0\) such that

$$\begin{aligned} {|}f(y){|} \le M{|}g(y){|}, \quad \forall y \in Y \text { with } {|}y-\bar{y}{|} < \delta . \end{aligned}$$

Similarly, we say that \(f(y) = o(g(y))\) as \(y \rightarrow \bar{y} \in Y\) if and only if for all \(M' > 0\) there exists \(\delta ' > 0\) such that

$$\begin{aligned} {|}f(y){|} \le M'{|}g(y){|}, \quad \forall y \in Y \text { with } {|}y-\bar{y}{|} < \delta '. \end{aligned}$$

Note that unless otherwise specified, we consider \(\bar{y} = 0\) in this work.

Definition 5

(Convex and concave relaxations) Given a convex set \(Z \subset \mathbb {R}^n\) and a function \(f:Z \rightarrow \mathbb {R}\), a convex function \(f^{cv }_Z: Z \rightarrow \mathbb {R}\) is called a convex relaxation of f on Z if \(f^{cv }_Z(\mathbf {z}) \le f(\mathbf {z})\), \(\forall \mathbf {z}\in Z\). Similarly, a concave function \(f^{cc }_Z: Z \rightarrow \mathbb {R}\) is called a concave relaxation of f on Z if \(f^{cc }_Z(\mathbf {z}) \ge f(\mathbf {z})\), \(\forall \mathbf {z}\in Z\).

The following definition introduces the notion of schemes of relaxations [3].

Definition 6

(Schemes of convex and concave relaxations) Let \(Y \subset \mathbb {R}^{n}\) be a nonempty convex set, and let \(f:Y \rightarrow \mathbb {R}\). Assume that for every \(Z \in \mathbb {I}Y\), we can construct functions \(f^{cv }_{Z}: Z \rightarrow \mathbb {R}\) and \(f^{cc }_{Z}: Z \rightarrow \mathbb {R}\) that are convex and concave relaxations, respectively, of f on Z. The sets of functions \((f^{cv }_{Z})_{Z \in \mathbb {I}Y}\) and \((f^{cc }_{Z})_{Z \in \mathbb {I}Y}\) define schemes of convex and concave relaxations, respectively, of f in Y, and the set of pairs of functions \((f^{cv }_{Z}, f^{cc }_{Z})_{Z \in \mathbb {I}Y}\) defines a scheme of relaxations of f in Y. The schemes of relaxations are called continuous when \(f^{cv }_{Z}\) and \(f^{cc }_{Z}\) are continuous on Z for each \(Z \in \mathbb {I}Y\).

The next definition presents a notion of convergence order of schemes of convex and concave relaxations [29] based on the notion of Hausdorff convergence order of a scheme of relaxations [3].

Definition 7

(Convergence orders of schemes of convex and concave relaxations) Let \(Y \subset \mathbb {R}^{n}\) be a nonempty bounded convex set, and \(f:Y \rightarrow \mathbb {R}\) be a continuous function. Let \((f^{cv }_{Z})_{Z \in \mathbb {I}Y}\) and \((f^{cc }_{Z})_{Z \in \mathbb {I}Y}\) respectively denote continuous schemes of convex and concave relaxations of f in Y.

The scheme of convex relaxations \((f^{cv }_{Z})_{Z \in \mathbb {I}Y}\) is said to have convergence of order \(\beta > 0\) at \(\mathbf {y}\in Y\) if there exists \(\tau ^{\text {cv}} \ge 0\) such that

$$\begin{aligned} \min _{\mathbf {z}\in Z} f(\mathbf {z}) - \min _{\mathbf {z}\in Z} f^{cv }_{Z}(\mathbf {z}) \le \tau ^{\text {cv}} {w(Z)}^{\beta }, \quad \forall Z \in \mathbb {I}Y\text { with } \mathbf {y}\in Z. \end{aligned}$$

Similarly, the scheme of concave relaxations \((f^{cc }_{Z})_{Z \in \mathbb {I}Y}\) is said to have convergence of order \(\beta > 0\) at \(\mathbf {y}\in Y\) if there exists \(\tau ^{\text {cc}} \ge 0\) such that

$$\begin{aligned} \max _{\mathbf {z}\in Z} f^{cc }_{Z}(\mathbf {z}) - \max _{\mathbf {z}\in Z} f(\mathbf {z}) \le \tau ^{\text {cc}} {w(Z)}^{\beta }, \quad \forall Z \in \mathbb {I}Y\text { with } \mathbf {y}\in Z. \end{aligned}$$

The schemes \((f^{cv }_{Z})_{Z \in \mathbb {I}Y}\) and \((f^{cc }_{Z})_{Z \in \mathbb {I}Y}\) are said to have convergence of order \(\beta > 0\) on Y if they have convergence of order (at least) \(\beta \) at each \(\mathbf {y}\in Y\), with the constants \(\tau ^{\text {cv}}\) and \(\tau ^{\text {cc}}\) independent of \(\mathbf {y}\).
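As a numerical illustration of Definition 7 (ours, not from the paper), consider an \(\alpha \)BB-style scheme of convex relaxations of \(\sin \) on intervals centered at its minimizer \(3\pi /2\): with \(\alpha = 1\) (valid since \(|\sin ''| \le 1\)), the gap between the two minima scales as \(w(Z)^2\) with prefactor 1/8, i.e., the scheme exhibits second-order convergence. The sketch below approximates both minima on a grid:

```python
import math

def alpha_bb_gap(w, n=2001):
    """Gap  min_Z f - min_Z f_cv  for f = sin on an interval Z of width w
    centered at the minimizer 3*pi/2, using the aBB-style relaxation
    f_cv(x) = sin(x) - (1/2)(x - a)(b - x).  Since -sin''(x) + 1 >= 0,
    f_cv is convex; minima are approximated on a uniform grid."""
    c = 1.5 * math.pi
    a, b = c - w / 2, c + w / 2
    xs = [a + w * i / (n - 1) for i in range(n)]
    f_min = min(math.sin(x) for x in xs)
    cv_min = min(math.sin(x) - 0.5 * (x - a) * (b - x) for x in xs)
    return f_min - cv_min

# Halving the width quarters the gap: convergence of order beta = 2
# with prefactor tau = 1/8 in the sense of Definition 7.
for w in (1.0, 0.5, 0.25):
    print(w, alpha_bb_gap(w) / w**2)  # ratio stays at 0.125
```

Here both minima land at the interval midpoint, so the gap equals \(w^2/8\) exactly; for intervals not centered at the minimizer the gap is smaller.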

The following definition seeks to extend the notion of convergence order of a bounding scheme [3, 4, 29] to constrained problems. Conditions under which specific lower bounding schemes are guaranteed to exhibit a certain convergence order will be presented in a future article.

Definition 8

(Convergence order of a lower bounding scheme) Consider Problem (P). For any \(Z \in \mathbb {I}X\), let \(\mathscr {F}(Z) = \left\{ \mathbf {x}\in Z : \mathbf {g}(\mathbf {x}) \le \mathbf {0}, \mathbf {h}(\mathbf {x}) = \mathbf {0} \right\} \) denote the feasible set of Problem (P) with \(\mathbf {x}\) restricted to Z.

Let \((f^{cv }_{Z})_{Z \in \mathbb {I}X}\) and \((\mathbf {g}^{cv }_{Z})_{Z \in \mathbb {I}X}\) denote continuous schemes of convex relaxations of f and \(\mathbf {g}\), respectively, in X, and let \((\mathbf {h}^{cv }_{Z}, \mathbf {h}^{cc }_{Z})_{Z \in \mathbb {I}X}\) denote a continuous scheme of relaxations of \(\mathbf {h}\) in X. For any \(Z \in \mathbb {I}X\), let \(\mathscr {F}^{cv }(Z) = \left\{ \mathbf {x}\in Z : \mathbf {g}^{cv }_{Z}(\mathbf {x}) \le \mathbf {0}, \mathbf {h}^{cv }_{Z}(\mathbf {x}) \le \mathbf {0}, \mathbf {h}^{cc }_{Z}(\mathbf {x}) \ge \mathbf {0} \right\} \) denote the feasible set of the convex relaxation-based lower bounding scheme. The convex relaxation-based lower bounding scheme is said to have convergence of order \(\beta > 0\) at

  1.

    a feasible point \(\mathbf {x}\in X\) if there exists \(\tau \ge 0\) such that for every \(Z \in \mathbb {I}X\) with \(\mathbf {x}\in Z\),

    $$\begin{aligned} \min _{\mathbf {z}\in \mathscr {F}(Z)}f(\mathbf {z}) - \min _{\mathbf {z}\in \mathscr {F}^{cv }(Z)} f^{cv }_{Z}(\mathbf {z}) \le \tau {w(Z)}^{\beta }. \end{aligned}$$
  2.

    an infeasible point \(\mathbf {x}\in X\) if there exists \(\bar{\tau } \ge 0\) such that for every \(Z \in \mathbb {I}X\) with \(\mathbf {x}\in Z\),

    $$\begin{aligned} d\left( \overline{\begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}} (Z) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) - d\left( \mathscr {I}_{C}(Z), \mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \le \bar{\tau } {w(Z)}^{\beta }, \end{aligned}$$

where \(\overline{\begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}} (Z)\) denotes the image of Z under the vector-valued function \(\begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}\), and \(\mathscr {I}_{C}(Z)\) is defined by

$$\begin{aligned} (\mathscr {I}_{C}(Z))_{Z \in \mathbb {I}X} := \left( \left\{ (\mathbf {v},\mathbf {w}) \in \mathbb {R}^{m_I} \times \mathbb {R}^{m_E} : \mathbf {v}= \mathbf {g}^{\text {cv}}_{Z}(\mathbf {z}), \mathbf {h}^{\text {cv}}_{Z}(\mathbf {z}) \le \mathbf {w}\le \mathbf {h}^{\text {cc}}_{Z}(\mathbf {z}) \text { for some } \mathbf {z}\in Z \right\} \right) _{Z \in \mathbb {I}X}. \end{aligned}$$

The scheme of lower bounding problems is said to have convergence of order \(\beta > 0\) on X if it has convergence of order (at least) \(\beta \) at each \(\mathbf {x}\in X\), with the constants \(\tau \) and \(\bar{\tau }\) independent of \(\mathbf {x}\).

Definition 8 is motivated by the requirements of a lower bounding scheme to fathom feasible and infeasible regions in a branch-and-bound procedure [11]. On nested sequences of intervals converging to a feasible point of Problem (P), we require that the corresponding sequences of lower bounds converge rapidly to the corresponding sequences of minimum objective values. On the other hand, on nested sequences of intervals converging to an infeasible point of Problem (P), we require that the corresponding sequences of lower bounding problems rapidly detect the (eventual) infeasibility of the corresponding sequences of intervals for Problem (P). The latter requirement is enforced by requiring that the measures of infeasibility of the corresponding lower bounding problems, as determined by the distance function d, converge rapidly to the measures of infeasibility of the corresponding restricted Problems (P). Note that some intervals that only contain infeasible points may also potentially be fathomed by value dominance if the lower bounds on those intervals obtained by solving the corresponding relaxation-based lower bounding problems are greater than or equal to \({\textit{UBD}} - \varepsilon \). This possibility is considered later in this section (see, for instance, Lemma 3) and in Sect. 3.2.

The following lemmas detail worst-case conditions under which nodes containing a global minimizer and nodes containing infeasible points are fathomed.

Lemma 1

(Fathoming nodes containing global minimizers) Let \(X^* \in \mathbb {I}X\), with \(\mathbf {x}^* \in X^*\), correspond to the domain of node \(k^*\) in the branch-and-bound tree. Suppose the convex relaxation-based lower bounding scheme has convergence of order \(\beta ^* > 0\) at \(\mathbf {x}^*\) with a prefactor \(\tau ^* > 0\) (see Definition 8). For node \(k^*\) to be fathomed, we require, in the worst case, that

$$\begin{aligned} w(X^*) \le \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}. \end{aligned}$$

Proof

The condition for node \(k^*\) to be fathomed by value dominance is \({\textit{UBD}} - {\textit{LBD}}_{k^*} = f(\mathbf {x}^*) - {\textit{LBD}}_{k^*} \le \varepsilon \). Since we are concerned about convergence at the feasible point \(\mathbf {x}^* \in X\), we have from Definition 8 that

$$\begin{aligned}&\min _{\mathbf {z}\in \mathscr {F}(X^*)}f(\mathbf {z}) - \min _{\mathbf {z}\in \mathscr {F}^{cv }(X^*)} f^{cv }_{X^*}(\mathbf {z}) \le \tau ^* {w(X^*)}^{\beta ^*} \\ \implies&{\textit{LBD}}_{k^*} = \min _{\mathbf {z}\in \mathscr {F}^{cv }(X^*)} f^{cv }_{X^*}(\mathbf {z}) \ge f(\mathbf {x}^*) - \tau ^* {w(X^*)}^{\beta ^*}. \end{aligned}$$

Therefore, in the worst case, node \(k^*\) is fathomed only when

$$\begin{aligned} {\textit{LBD}}_{k^*} \ge f(\mathbf {x}^*) - \tau ^* {w(X^*)}^{\beta ^*} \ge f(\mathbf {x}^*) - \varepsilon \iff w(X^*) \le \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}. \end{aligned}$$

\(\square \)
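Lemma 1 can be turned into a rough box-counting illustration (with hypothetical tolerances and prefactors of our choosing, not taken from the paper): covering a neighborhood of a minimizer with boxes of the fathoming width shows how a first-order scheme with a small prefactor can require fewer boxes than a second-order scheme with a large prefactor, consistent with the discussion in Sect. 1.

```python
import math

def fathom_width(eps, tau, beta):
    """Worst-case width below which a node containing the global
    minimizer can be fathomed (Lemma 1): w <= (eps / tau)^(1/beta)."""
    return (eps / tau) ** (1.0 / beta)

def boxes_to_cover(delta, eps, tau, beta, n):
    """Crude count of boxes of the fathoming width needed to cover an
    n-dimensional hypercube of edge length delta (illustrative only)."""
    w = fathom_width(eps, tau, beta)
    return math.ceil(delta / w) ** n

# Hypothetical numbers: eps = 1e-3, neighborhood edge 0.1, n = 2.
print(boxes_to_cover(0.1, 1e-3, 0.075, 1, 2))  # first order, small prefactor: 64
print(boxes_to_cover(0.1, 1e-3, 100.0, 2, 2))  # second order, large prefactor: 1024
```

The worst-case estimates in Sect. 3 refine this naive count by restricting attention to the regions that are genuinely hard to fathom.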

Lemma 2

(Fathoming infeasible nodes by infeasibility) Let \(X^I \in \mathbb {I}X\), with

$$\begin{aligned} X^I \subset \left\{ \mathbf {x}\in X : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) > \varepsilon ^f \right\} \end{aligned}$$

for some \(\varepsilon ^f > 0\), correspond to the domain of node \(k^I\) in the branch-and-bound tree. Suppose the convex relaxation-based lower bounding scheme has convergence of order \(\beta ^I > 0\) at each \(\mathbf {x}\in X^I\) with a prefactor \(\tau ^I > 0\) that is independent of \(\mathbf {x}\) (see Definition 8). For node \(k^I\) to be fathomed by infeasibility, we require, in the worst case, that

$$\begin{aligned} w(X^I) \le \left( \displaystyle \frac{\varepsilon ^f}{\tau ^I}\right) ^{\frac{1}{\beta ^I}}. \end{aligned}$$

Proof

For node \(k^I\) to be fathomed by infeasibility, we require that the convex relaxation-based lower bounding problem is infeasible on \(X^I\), i.e., \( d\left( \mathscr {I}_{C}(X^I), \mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) > 0. \) Since we are concerned about convergence at infeasible points, we have from Definition 8 that

$$\begin{aligned}&d\left( \overline{\begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}} (X^I) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) - d\left( \mathscr {I}_{C}(X^I), \mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \le \tau ^I {w(X^I)}^{\beta ^I} \\ \implies&d\left( \mathscr {I}_{C}(X^I), \mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \ge d\left( \overline{\begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}} (X^I) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) - \tau ^I {w(X^I)}^{\beta ^I}. \end{aligned}$$

Therefore, node \(k^I\) is fathomed, in the worst case, only when

$$\begin{aligned}&\qquad \qquad d\left( \mathscr {I}_{C}(X^I), \mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \ge d\left( \overline{\begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}} (X^I) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) - \tau ^I {w(X^I)}^{\beta ^I} > 0 \\&\iff \varepsilon ^f - \tau ^I {w(X^I)}^{\beta ^I} \ge 0 \\&\iff w(X^I) \le \left( \displaystyle \frac{\varepsilon ^f}{\tau ^I}\right) ^{\frac{1}{\beta ^I}}. \end{aligned}$$

\(\square \)

Lemma 3

(Fathoming infeasible nodes by value dominance) Let \(X^I \in \mathbb {I}X\), with

$$\begin{aligned} X^I \subset \left\{ \mathbf {x}\in X : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) > 0 \right\} , \end{aligned}$$

correspond to the domain of node \(k^I\) in the branch-and-bound tree. Suppose \(\forall \mathbf {x}\in X^I\), \(f(\mathbf {x}) \ge f(\mathbf {x}^*)\). Furthermore, suppose the scheme \((f^{cv }_{Z})_{Z \in \mathbb {I}X}\) has convergence of order \(\beta ^f > 0\) at each \(\mathbf {x}\in X^I\) with a prefactor \(\tau ^f > 0\) that is independent of \(\mathbf {x}\) (see Definition 7). If

$$\begin{aligned} w(X^I) \le \left( \displaystyle \frac{\varepsilon }{\tau ^f}\right) ^{\frac{1}{\beta ^f}}, \end{aligned}$$

then node \(k^I\) will be fathomed.

Proof

A sufficient condition for node \(k^I\) to be fathomed is

$$\begin{aligned} \min _{\mathbf {z}\in \mathscr {F}^{cv }(X^I)} f^{cv }_{X^I}(\mathbf {z}) \ge f(\mathbf {x}^*) - \varepsilon . \end{aligned}$$

Since \((f^{cv }_{Z})_{Z \in \mathbb {I}X}\) has convergence of order \(\beta ^f\), we have from Definition 7 that

$$\begin{aligned} \min _{\mathbf {z}\in X^I} f^{cv }_{X^I}(\mathbf {z})&\ge \min _{\mathbf {z}\in X^I} f(\mathbf {z}) - \tau ^f w(X^I)^{\beta ^f} \\&\ge \min _{\mathbf {z}\in X^I} f(\mathbf {z}) - \varepsilon \\&\ge f(\mathbf {x}^*) - \varepsilon , \end{aligned}$$

where Step 2 uses \(w(X^I) \le \left( \displaystyle \frac{\varepsilon }{\tau ^f}\right) ^{\frac{1}{\beta ^f}}\), and Step 3 uses \(f(\mathbf {x}) \ge f(\mathbf {x}^*)\), \(\forall \mathbf {x}\in X^I\). Therefore,

$$\begin{aligned} \underset{\mathbf {z}\in \mathscr {F}^{cv }(X^I)}{\min } f^{cv }_{X^I}(\mathbf {z}) \ge \underset{\mathbf {z}\in X^I}{\min } \,f^{cv }_{X^I}(\mathbf {z}) \ge f(\mathbf {x}^*) - \varepsilon . \end{aligned}$$

The desired result follows. \(\square \)

In what follows, we shall partition the set X into distinct regions with the aim of constructing regions that are either relatively easy to fathom (based on Lemmas 1 to 3) or relatively hard to fathom. Suppose the convex relaxation-based lower bounding scheme has convergence of order \(\beta ^* > 0\) on \(\mathscr {F}(X)\) with prefactor \(\tau ^* > 0\), and convergence of order \(\beta ^I > 0\) on \(\left( \mathscr {F}({X})\right) ^{\text {C}}\) with prefactor \(\tau ^I > 0\) (note that it is sufficient for the lower bounding scheme to have the requisite convergence orders on some neighborhood of the global minimizers of Problem (P) for our analysis to hold, as will become clear in Sect. 3). Furthermore, suppose the scheme \((f^{cv }_{Z})_{Z \in \mathbb {I}X}\) has convergence of order \(\beta ^f > 0\) on X with prefactor \(\tau ^f > 0\). Pick a feasibility tolerance \(\varepsilon ^f\) and an optimality tolerance \(\varepsilon ^o\) such that

$$\begin{aligned}&\left( \displaystyle \frac{\varepsilon ^f}{\tau ^I}\right) ^{\frac{1}{\beta ^I}} =\left( \displaystyle \frac{\varepsilon ^o}{\tau ^f}\right) ^{\frac{1}{\beta ^f}} =\left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}, \end{aligned}$$
(TOL)

and consider the following partition of X:

$$\begin{aligned} X_1&:= \left\{ \mathbf {x}\in X : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right)> \varepsilon ^f \right\} , \\ X_2&:= \left\{ \mathbf {x}\in X : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \in (0,\varepsilon ^f] \text { and } f(\mathbf {x}) - f(\mathbf {x}^*)> \varepsilon ^o \right\} , \\ X_3&:= \left\{ \mathbf {x}\in X : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \in (0,\varepsilon ^f] \text { and } f(\mathbf {x}) - f(\mathbf {x}^*) \le \varepsilon ^o \right\} , \\ X_4&:= \left\{ \mathbf {x}\in X : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) = 0 \text { and } f(\mathbf {x}) - f(\mathbf {x}^*) > \varepsilon \right\} , \, \text {and} \\ X_5&:= \left\{ \mathbf {x}\in X : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) = 0 \text { and } f(\mathbf {x}) - f(\mathbf {x}^*) \le \varepsilon \right\} . \end{aligned}$$

The set \(X_1\) corresponds to the set of infeasible points for Problem (P) with the measure of infeasibility greater than \(\varepsilon ^f\). The set \(X_2\) corresponds to the set of infeasible points for Problem (P) with the measure of infeasibility less than or equal to \(\varepsilon ^f\) and with the objective function value greater than \(f(\mathbf {x}^*) + \varepsilon ^o\), while the set \(X_3\) corresponds to the set of infeasible points for Problem (P) with the measure of infeasibility less than or equal to \(\varepsilon ^f\) and the objective function value less than or equal to \(f(\mathbf {x}^*) + \varepsilon ^o\). The set \(X_4\) corresponds to the set of feasible points for Problem (P) with objective value greater than \(f(\mathbf {x}^*) + \varepsilon \), while the set \(X_5\) corresponds to the set of feasible points for Problem (P) with objective value less than or equal to \(f(\mathbf {x}^*) + \varepsilon \). The sets \(X_1\) through \(X_5\) are illustrated in Fig. 1 for the three two-dimensional problems presented in Examples 1 to 3.

Fig. 1

Plots of the sets \(X_1\) through \(X_5\) for an unconstrained, an inequality-constrained, and an equality-constrained problem. The dashed lines define the sets X, and the filled-in triangles denote the unique global minimizers of the problems on X. All plots use \(\varepsilon = \varepsilon ^o = \varepsilon ^f = 0.1\) for illustration. a Example 1 (unconstrained), b Example 2 (inequality-constrained), c Example 3 (equality-constrained)

Intuitively, we expect that nodes with domains contained in the sets \(X_1\) and \(X_2\) can be fathomed relatively easily (by infeasibility and value dominance, respectively) compared to nodes with domains contained in the set \(X_3\). Similarly, we expect that nodes with domains contained in the set \(X_4\) can be fathomed relatively easily (by value dominance) compared to nodes with domains contained in the set \(X_5\). This intuition is formalized in Corollary 1. Consequently, the extent of clustering is dictated primarily by the number of boxes required to cover the regions \(X_3\) and \(X_5\). Section 3 provides conservative estimates of the number of boxes of certain widths that are required to cover \(X_3\) and \(X_5\) under suitable assumptions. As an aside, note that the condition specified by Equation (TOL) is used to roughly enforce that nodes with domains contained in the sets \(X_1\), \(X_2\), and \(X_4\) can, in the worst case, be fathomed using a similar level of effort.

Example 1

Let \(X = (0,1) \times (0,1)\), \(m_I = m_E = 0\), and \(f(\mathbf {x}) = x^4_1 + x^4_2 - x^2_1 - x^2_2\) with \(\mathbf {x}^* = \left( \frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right) \). We have:

$$\begin{aligned}&X_1 = X_2 = X_3 = \emptyset , \\&X_4 = \left\{ \mathbf {x}\in X : x^4_1 + x^4_2 - x^2_1 - x^2_2 > f(\mathbf {x}^*) + \varepsilon \right\} , \, \text {and} \\&X_5 = \left\{ \mathbf {x}\in X : x^4_1 + x^4_2 - x^2_1 - x^2_2 \le f(\mathbf {x}^*) + \varepsilon \right\} . \end{aligned}$$

The sets \(X_1\) through \(X_5\) are depicted in Fig. 1a for \(\varepsilon = 0.1\).
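As a quick numerical sanity check of this example (our own computation, not part of the text), the stationarity of f at \(\mathbf {x}^*\) and the value \(f(\mathbf {x}^*) = -1/2\) can be verified directly:

```python
import math

f = lambda x: x[0]**4 + x[1]**4 - x[0]**2 - x[1]**2
x_star = (1/math.sqrt(2), 1/math.sqrt(2))

# Each partial derivative 4*x_i^3 - 2*x_i = 2*x_i*(2*x_i^2 - 1) vanishes at 1/sqrt(2).
grad = [4*xi - 2*xi for xi in ()] or [4*xi**3 - 2*xi for xi in x_star]
print(f(x_star), grad)   # -0.5 (up to rounding) and a gradient of roughly (0, 0)
```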

Example 2

Let \(X = (2.2,2.5) \times (2.9,3.3)\), \(m_I = 3\), \(m_E = 0\), \(f(\mathbf {x}) = -x_1 - x_2\), \(g_1(\mathbf {x}) = x_2 - 2x^4_1 + 8x^3_1 - 8x^2_1 - 2\), \(g_2(\mathbf {x}) = x_2 - 4x^4_1 + 32x^3_1 - 88x^2_1 + 96x_1 - 36\), and \(g_3(\mathbf {x}) = 3 - x_2\) with \(\mathbf {x}^* \approx (2.33,3.18)\) (based on Example 4.10 in [8]). We have:

$$\begin{aligned} X_1&= \left\{ \mathbf {x}\in X : \displaystyle \sqrt{ \displaystyle \sum _{j=1}^{3}{\left( \max \{0, g_j(\mathbf {x}) \}\right) ^2}}> \varepsilon ^f \right\} , \\ X_2&= \left\{ \mathbf {x}\in X : \displaystyle \sqrt{ \displaystyle \sum _{j=1}^{3}{\left( \max \{0, g_j(\mathbf {x}) \}\right) ^2}} \in (0,\varepsilon ^f], \, -x_1 - x_2> f(\mathbf {x}^*) + \varepsilon ^o \right\} , \\ X_3&= \left\{ \mathbf {x}\in X : \displaystyle \sqrt{ \displaystyle \sum _{j=1}^{3}{\left( \max \{0, g_j(\mathbf {x}) \}\right) ^2}} \in (0,\varepsilon ^f], \, -x_1 - x_2 \le f(\mathbf {x}^*) + \varepsilon ^o \right\} , \\ X_4&= \left\{ \mathbf {x}\in X : \mathbf {g}(\mathbf {x}) \le \mathbf {0}, \, -x_1 - x_2 > f(\mathbf {x}^*) + \varepsilon \right\} , \, \text {and} \\ X_5&= \left\{ \mathbf {x}\in X : \mathbf {g}(\mathbf {x}) \le \mathbf {0}, \, -x_1 - x_2 \le f(\mathbf {x}^*) + \varepsilon \right\} . \end{aligned}$$

The sets \(X_1\) through \(X_5\) are depicted in Fig. 1b for \(\varepsilon = \varepsilon ^o = \varepsilon ^f = 0.1\).
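The infeasibility measure appearing in the definitions of \(X_1\) through \(X_3\) is easy to evaluate numerically. The sketch below (our own; the second sample point is chosen arbitrarily) confirms that the approximate minimizer is nearly feasible, while a point deeper in the box lands in \(X_1\) for \(\varepsilon ^f = 0.1\):

```python
import math

g = [lambda x: x[1] - 2*x[0]**4 + 8*x[0]**3 - 8*x[0]**2 - 2,
     lambda x: x[1] - 4*x[0]**4 + 32*x[0]**3 - 88*x[0]**2 + 96*x[0] - 36,
     lambda x: 3 - x[1]]

def infeas(x):
    # Euclidean distance of g(x) from the nonpositive orthant
    return math.sqrt(sum(max(0.0, gj(x))**2 for gj in g))

x_star = (2.33, 3.18)        # approximate minimizer quoted in the text
print(infeas(x_star))        # small: x* sits (nearly) on the boundary g1 = g2 = 0
print(infeas((2.45, 3.25)))  # exceeds eps_f = 0.1, so this point lies in X_1
```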

Example 3

Let \(X = (0.4,1.0) \times (0.5,2.0)\), \(m_I = 2\), \(m_E = 1\), \(f(\mathbf {x}) = -12x_1 - 7x_2 + x^2_2\), \(g_1(\mathbf {x}) = x_1 - 0.9\), \(g_2(\mathbf {x}) = 0.5 - x_1\), and \(h(\mathbf {x}) = x_2 + 2x^4_1 - 2\) with \(\mathbf {x}^* \approx (0.72,1.47)\) (based on Example 4.9 in [8]). We have:

$$\begin{aligned} X_1&= \left\{ \mathbf {x}\in X : \displaystyle \sqrt{ \displaystyle \sum _{j=1}^{2}{\left( \max \{0, g_j(\mathbf {x}) \}\right) ^2} + {|}h(\mathbf {x}){|}^2}> \varepsilon ^f \right\} , \\ X_2&= \left\{ \mathbf {x}\in X : \displaystyle \sqrt{ \displaystyle \sum _{j=1}^{2}{\left( \max \{0, g_j(\mathbf {x}) \}\right) ^2} + {|}h(\mathbf {x}){|}^2} \in (0,\varepsilon ^f], \, -12x_1 - 7x_2 + x^2_2> f(\mathbf {x}^*) + \varepsilon ^o \right\} , \\ X_3&= \left\{ \mathbf {x}\in X : \displaystyle \sqrt{ \displaystyle \sum _{j=1}^{2}{\left( \max \{0, g_j(\mathbf {x}) \}\right) ^2} + {|}h(\mathbf {x}){|}^2} \in (0,\varepsilon ^f], \, -12x_1 - 7x_2 + x^2_2 \le f(\mathbf {x}^*) + \varepsilon ^o \right\} , \\ X_4&= \left\{ \mathbf {x}\in X : \mathbf {g}(\mathbf {x}) \le \mathbf {0}, \, h(\mathbf {x}) = 0, \, -12x_1 - 7x_2 + x^2_2 > f(\mathbf {x}^*) + \varepsilon \right\} , \, \text {and} \\ X_5&= \left\{ \mathbf {x}\in X : \mathbf {g}(\mathbf {x}) \le \mathbf {0}, \, h(\mathbf {x}) = 0, \, -12x_1 - 7x_2 + x^2_2 \le f(\mathbf {x}^*) + \varepsilon \right\} . \end{aligned}$$

The sets \(X_1\) through \(X_5\) are depicted in Fig. 1c for \(\varepsilon = \varepsilon ^o = \varepsilon ^f = 0.1\).

The following corollary of Lemmas 1, 2, and 3, similar to Lemma 2 in [29], provides sufficient conditions under which nodes with domains contained in \(X_1\), \(X_2\), and \(X_4\) can be fathomed.

Corollary 1

(Fathoming nodes contained in \(X_1\), \(X_2\), and \(X_4\)) Let \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\).

  1.

    Suppose the convex relaxation-based lower bounding scheme has convergence of order \(\beta ^I > 0\) at each \(\mathbf {x}\in X_1\) with a prefactor \(\tau ^I > 0\) that is independent of \(\mathbf {x}\). Consider \(\bar{X}_1 \in \mathbb {I}X_1\) corresponding to the domain of node \(k_1\) in the branch-and-bound tree. If \(w(\bar{X}_1) \le \delta \), then node \(k_1\) will be fathomed by infeasibility.

  2.

    Suppose the scheme of convex relaxations \((f^{cv }_{Z})_{Z \in \mathbb {I}X}\) has convergence of order \(\beta ^f > 0\) at each \(\mathbf {x}\in X_2\) with a prefactor \(\tau ^f > 0\) that is independent of \(\mathbf {x}\). Consider \(\bar{X}_2 \in \mathbb {I}X_2\) corresponding to the domain of node \(k_2\) in the branch-and-bound tree. If \(w(\bar{X}_2) \le \delta \), then node \(k_2\) will be fathomed by value dominance.

  3.

    Suppose the convex relaxation-based lower bounding scheme has convergence of order \(\beta ^* > 0\) at each \(\mathbf {x}\in X_4\) with a prefactor \(\tau ^* > 0\) that is independent of \(\mathbf {x}\). Consider \(\bar{X}_4 \in \mathbb {I}X_4\) corresponding to the domain of node \(k_4\) in the branch-and-bound tree. If \(w(\bar{X}_4) \le \delta \), then node \(k_4\) will be fathomed by value dominance.

Corollary 1 implies that nodes with domains \(\bar{X}_1\), \(\bar{X}_2\), and \(\bar{X}_4\) such that \(\bar{X}_1 \in \mathbb {I}X_1\), \(\bar{X}_2 \in \mathbb {I}X_2\), and \(\bar{X}_4 \in \mathbb {I}X_4\) can be fathomed when or before their widths are \(\delta \) (in fact, nodes with domains in \(\mathbb {I}X_2\) and \(\mathbb {I}X_4\) can be fathomed when or before their widths are \(\left( \frac{\varepsilon ^o + \varepsilon }{\tau ^f}\right) ^{\frac{1}{\beta ^f}}\) and \(\left( \frac{2\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\), respectively). However, nodes \(\bar{X}_5 \in \mathbb {I}X_5\) may, in the worst case, need to be covered by boxes of width \(\delta \) before they are fathomed. Furthermore, nodes \(\bar{X}_3 \in \mathbb {I}X_3\) may need to be covered by a large number of boxes depending on the convergence properties of the lower bounding scheme on \(X_3\). The following example presents a case in which clustering may occur on \(X_3\) because the lower bounding scheme does not have a sufficiently-large convergence order at infeasible points.
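The width threshold \(\delta = (\varepsilon /\tau ^*)^{1/\beta ^*}\) in Corollary 1 is easy to tabulate. The sketch below (our own; the prefactor values are illustrative assumptions) shows that the fathoming width of a second-order scheme shrinks much more slowly than that of a first-order scheme as \(\varepsilon \rightarrow 0\), so far fewer boxes survive near the threshold:

```python
def fathoming_width(eps, tau, beta):
    """Width below which Corollary 1 guarantees fathoming: (eps/tau)**(1/beta)."""
    return (eps / tau) ** (1.0 / beta)

for eps in (1e-2, 1e-4, 1e-6):
    d1 = fathoming_width(eps, tau=1.0, beta=1.0)   # first-order scheme
    d2 = fathoming_width(eps, tau=1.0, beta=2.0)   # second-order scheme
    print(eps, d1, d2)  # the second-order width stays much larger as eps shrinks
```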

Example 4

Let \(X = (-2,2)\), \(m_I = 3\), and \(m_E = 0\) with \(f(x) = x\), \(g_1(x) = x^2\), \(g_2(x) = x-1\), and \(g_3(x) = -1-x\). We have \(x^* = 0\) (which is the only feasible point). For any \([x^{\text {L}}, x^{\text {U}}] =: Z \in \mathbb {I}X\), let

$$\begin{aligned} f^{cv }_Z(x)&= x, \\ g^{\text {cv}}_{1,Z}(x)&= \left\{ \begin{array}{ll} - (x^{\text {U}}-x^{\text {L}}), &{}\quad \hbox {if } 0 \in [x^{\text {L}}, x^{\text {U}}] \\ \min \left( \left( x^{\text {L}}\right) ^2, \left( x^{\text {U}}\right) ^2 \right) - (x^{\text {U}}-x^{\text {L}}), &{} \quad \hbox {otherwise} \end{array},\right. \\ g^{\text {cv}}_{2,Z}(x)&= x-1, \\ g^{\text {cv}}_{3,Z}(x)&= -1-x. \end{aligned}$$

We have \(\beta ^* = \beta ^I = 1\) and \(\beta ^f\) arbitrarily large, with prefactors \(\tau ^*\), \(\tau ^I\), and \(\tau ^f\), respectively, greater than zero.

Suppose \(\varepsilon , \varepsilon ^f \ll 1\). Pick \(\gamma > 0\) and \(\alpha \in (0,\gamma )\) such that \((\gamma + \alpha )^2 = \varepsilon ^f\). Let \(x^{\text {L}} := -\gamma -\alpha = -\sqrt{\varepsilon ^f}\) and \(x^{\text {U}} := -\gamma +\alpha < 0\). The width of Z is \(w(Z) = 2\alpha \). Note that \(g_2\) and \(g_3\) are feasible on Z; therefore, we need only be concerned with the feasibility of \(g_1\).

We have \(\overline{g}_1(Z) = [(\gamma - \alpha )^2, (\gamma +\alpha )^2]\) and \(d(\overline{\mathbf {g}}(Z),\mathbb {R}^{m_I}_{-}) = (\gamma - \alpha )^2\). This implies \(g_1\) is infeasible at each \(x \in Z\). Note that \(X_3 = [x^{\text {L}},0) \cup \) \(\left( 0, \min \{ \varepsilon ^o, \sqrt{\varepsilon ^f} \} \right] \) (which follows, in part, from each \(x \in [x^{\text {L}},0)\) being infeasible with \(f(x) \le f(x^*)\) and \(d(\{\mathbf {g}(x)\},\mathbb {R}^{m_I}_{-}) \le \varepsilon ^f\)).

We have \(\overline{g}^{\text {cv}}_{1,Z}(Z) = [(\gamma -\alpha )^2 - 2\alpha ,(\gamma -\alpha )^2 - 2\alpha ]\) and \(d(\overline{\mathbf {g}}^{\text {cv}}_{Z}(Z),\mathbb {R}^{m_I}_{-}) = \max \{0,(\gamma - \alpha )^2 - 2\alpha \}\). The optimal objective value of the lower bounding problem on Z is \(-\gamma -\alpha \) when \(d(\overline{\mathbf {g}}^{\text {cv}}_Z(Z),\mathbb {R}^{m_I}_{-}) = 0\), and is \(+\infty \) otherwise. Note that the lower bounding problem is infeasible on Z when \((\gamma -\alpha )^2 - 2\alpha > 0\), which can be achieved by choosing \(\alpha \) to be sufficiently-small (and increasing \(\gamma \) accordingly).

The maximum width of the interval Z for which it can be fathomed by infeasibility can be shown to be \(w(Z) = 2\alpha ^* := 2(1+\gamma ) - 2\displaystyle \sqrt{1+2\gamma } = O(\gamma ^2) = O(\varepsilon ^f)\) (note that \(\gamma \ll 1\) because \(\varepsilon ^f \ll 1\)). For \(\alpha > \alpha ^*\), the interval Z cannot be fathomed by infeasibility and the optimal objective value of the lower bounding problem on Z is \(-\gamma -\alpha = -\sqrt{\varepsilon ^f} = O(\sqrt{\varepsilon })\). Such an interval Z cannot be fathomed by value dominance either since \(\varepsilon \ll 1\).

Therefore, in the worst case, the interval Z can be fathomed only when \(w(Z) = O(\gamma ^2) = O(\varepsilon ^f)\), which causes clustering since \(w([x^{\text {L}},0)) = O(\sqrt{\varepsilon ^f})\) and \([x^{\text {L}},0) \subset X_3\).
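The scaling in Example 4 can be checked numerically. The short sketch below (our own computation, using \(\gamma \approx \sqrt{\varepsilon ^f}\), which is valid when \(\alpha \ll \gamma \)) estimates how many boxes of the maximum fathomable width \(2\alpha ^*\) are needed to cover \([x^{\text {L}},0)\):

```python
import math

def boxes_to_cover(eps_f):
    """Rough count of width-2*alpha_star boxes needed to cover [x_L, 0)."""
    g = math.sqrt(eps_f)                        # gamma ~ sqrt(eps_f) since alpha << gamma
    alpha_star = (1 + g) - math.sqrt(1 + 2*g)   # boundary of (gamma - alpha)^2 = 2*alpha
    return math.sqrt(eps_f) / (2 * alpha_star)  # width of [x_L, 0) over box width

for eps_f in (1e-2, 1e-4, 1e-6):
    print(eps_f, boxes_to_cover(eps_f))         # grows like 1/sqrt(eps_f)
```

As \(\varepsilon ^f\) is reduced by a factor of 100, the box count grows by roughly a factor of 10, consistent with the \(O(1/\sqrt{\varepsilon ^f})\) clustering estimate.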

3 Analysis of the cluster problem

In this section, conservative estimates for the number of boxes required to cover \(X_3\) and \(X_5\) are provided based on assumptions on Problem (P) (in particular, on its set of global minimizers), and characteristics of the branch-and-bound algorithm. First, some requisite definitions are provided [2].

Definition 9

(Neighborhood of a point) Let \(\mathbf {x}\in X \subset \mathbb {R}^{n_x}\). For any \(\alpha > 0\), \(p \in \mathbb {N}\), the set

$$\begin{aligned} \mathscr {N}^p_{\alpha }(\mathbf {x}) := \left\{ \mathbf {z}\in X : {||}\mathbf {z}- \mathbf {x}{||}_p < \alpha \right\} \end{aligned}$$

is called the \(\alpha \)-neighborhood of \(\mathbf {x}\) relative to X with respect to the p-norm.

Note that all norms on \(\mathbb {R}^{n_x}\) are equivalent.

Definition 10

(Strict local minimum) Let \(\mathscr {F}(X)\) denote the feasible set of Problem (P). A point \(\bar{\mathbf {x}} \in \mathscr {F}(X)\) is called a strict local minimum if \(\bar{\mathbf {x}}\) is a local minimum, and \(\exists \alpha > 0\) such that \(f(\mathbf {x}) > f(\bar{\mathbf {x}})\), \(\forall \mathbf {x}\in \mathscr {N}^2_{\alpha }(\bar{\mathbf {x}}) \cap \mathscr {F}(X)\) such that \(\mathbf {x}\ne \bar{\mathbf {x}}\).

Definition 11

(Nonisolated feasible point) A feasible point \(\mathbf {x}\in \mathscr {F}(X)\) is said to be nonisolated if \(\forall \alpha > 0\), \(\exists \mathbf {z}\in \mathscr {N}^2_{\alpha }(\mathbf {x}) \cap \mathscr {F}(X)\) such that \(\mathbf {z}\ne \mathbf {x}\).

Definition 12

(Set of active inequality constraints) Let \(\mathbf {x}\in \mathscr {F}(X)\) be a feasible point for Problem (P). The set of active inequality constraints at \(\mathbf {x}\), denoted by \(\mathscr {A}(\mathbf {x})\), is given by

$$\begin{aligned} \mathscr {A}(\mathbf {x}) := \left\{ j \in \{1,\ldots ,m_I\} : g_j(\mathbf {x}) = 0 \right\} . \end{aligned}$$

Definition 13

(Tangent and cone of tangents) Let \(\mathbf {x}\in \mathscr {F}(X) \subset \mathbb {R}^{n_x}\) be a feasible point for Problem (P). A vector \(\mathbf {d}\in \mathbb {R}^{n_x}\) is said to be a tangent of \(\mathscr {F}(X)\) at \(\mathbf {x}\) if there exists a sequence \(\{\lambda _k\} \rightarrow 0\) with \(\lambda _k > 0\), and a sequence \(\{\mathbf {x}_k\} \rightarrow \mathbf {x}\) with \(\mathbf {x}_k \in \mathscr {F}(X)\) such that

$$\begin{aligned} \mathbf {d}= \underset{k \rightarrow \infty }{\lim } \displaystyle \frac{\mathbf {x}_k - \mathbf {x}}{\lambda _k}. \end{aligned}$$

The set of all tangents of \(\mathscr {F}(X)\) at \(\mathbf {x}\), denoted by \(T(\mathbf {x})\), is called the tangent cone of \(\mathscr {F}(X)\) at \(\mathbf {x}\).

3.1 Estimates for the number of boxes required to cover \(X_5\)

This section assumes that Problem (P) has a finite number of global minimizers (which implies each global minimum is a strict local minimum), and \(\varepsilon \) is small enough that \(X_5\) is guaranteed to be contained in neighborhoods of global minimizers under additional assumptions. An estimate for the number of boxes of width \(\delta \) required to cover some neighborhood of a minimum \(\mathbf {x}^*\) that contains the subset of \(X_5\) around \(\mathbf {x}^*\) is provided under suitable assumptions. An estimate for the number of boxes required to cover \(X_5\) can be obtained by summing the above estimates over the set of global minimizers. Throughout this section, we assume that \(\mathbf {x}^*\) is a nonisolated feasible point; otherwise, \(\exists \alpha > 0\) such that \(\mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap X_5 = \{\mathbf {x}^*\}\), which can be covered using a single box.

We begin with a necessary condition for \(\mathbf {x}^*\) to be a local minimum.

Theorem 1

(First-order necessary optimality condition) Consider Problem (P), and suppose f is differentiable at \(\mathbf {x}^*\). Then

$$\begin{aligned} \left\{ \mathbf {d} : {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}< 0 \right\} \cap T(\mathbf {x}^*) = \emptyset . \end{aligned}$$

Proof

See Theorem 5.1.2 in [2]. \(\square \)

Lemma 4

Consider Problem (P). Suppose \(\mathbf {x}^*\) is nonisolated and f is differentiable at \(\mathbf {x}^*\). Then \(\forall \theta > 0\), \(\exists \alpha > 0\) such that

$$\begin{aligned} \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t> 0 \, : \, (\mathbf {x}^*+t\mathbf {d}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X) \right\} }{\inf }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}> \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \mathbf {d}\in T(\mathbf {x}^*) \right\} }{\min }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}- \theta . \end{aligned}$$

Proof

See “Proof of Lemma 4 in Appendix”. \(\square \)

The following result, inspired by Lemma 2.4 in [28], provides a conservative estimate of the subset of \(X_5\) around a nonisolated \(\mathbf {x}^*\) under the assumption that the objective function grows linearly on the feasible region in some neighborhood of \(\mathbf {x}^*\). The reader can compare the assumptions of Lemma 5 with what follows from Lemma 4 and the necessary optimality conditions in Theorem 1 (see Remark 1 for details).

Lemma 5

Consider Problem (P). Suppose \(\mathbf {x}^*\) is nonisolated, f is differentiable at \(\mathbf {x}^*\), and \(\exists \alpha > 0\) such that

$$\begin{aligned} L := \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t> 0 \, : \, (\mathbf {x}^*+t\mathbf {d}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X) \right\} }{\inf }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}> 0. \end{aligned}$$

Then, \(\exists \hat{\alpha } \in (0,\alpha ]\) such that the region \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be conservatively approximated by

$$\begin{aligned} \hat{X}_{5} = \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon \right\} . \end{aligned}$$

Proof

Let \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X)\) with \({||}\mathbf {d}{||}_1 = 1\) and \(t = {||}\mathbf {x}- \mathbf {x}^*{||}_1 > 0\). We have

$$\begin{aligned} f(\mathbf {x})&= f(\mathbf {x}^* + t\mathbf {d}) \\&= f(\mathbf {x}^*) + {\nabla f(\mathbf {x}^*)}^\mathrm{T} (\mathbf {x}- \mathbf {x}^*) + o({||}\mathbf {x}- \mathbf {x}^*{||}_1) \\&= f(\mathbf {x}^*) + t{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ o(t) \\&\ge f(\mathbf {x}^*) + Lt + o(t), \end{aligned}$$

where Step 2 follows from the differentiability of f at \(\mathbf {x}^*\). Consequently, there exists \(\hat{\alpha } \in (0,\alpha ]\) such that for all \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \mathscr {F}(X)\) with \({||}\mathbf {d}{||}_1 = 1\) and \(t \in [0,\hat{\alpha })\):

$$\begin{aligned} f(\mathbf {x}) \ge f(\mathbf {x}^*) + Lt + o(t) \ge f(\mathbf {x}^*) + \displaystyle \frac{L}{2} t. \end{aligned}$$

Therefore, \(\forall \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) we have \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \mathscr {F}(X)\) with \({||}\mathbf {d}{||}_1 = 1\) and \(t = {||}\mathbf {x}- \mathbf {x}^*{||}_1 < \hat{\alpha }\), and

$$\begin{aligned} \varepsilon \ge f(\mathbf {x}) - f(\mathbf {x}^*) \ge \displaystyle \frac{L}{2}t \implies Lt = L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon . \end{aligned}$$

\(\square \)

A conservative estimate of the number of boxes of width \(\delta \) required to cover \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be obtained by estimating the number of boxes of width \(\delta \) required to cover \(\hat{X}_5\) (see Theorem 2). The following remark is in order.

Remark 1

  1.

    Lemma 5 is not applicable when \(L = 0\). This can occur, for instance, when \(\mathbf {x}^*\) is an unconstrained minimum, in which case other techniques have to be employed to analyze the cluster problem [7, 21, 28, 29] under alternative assumptions. This is because when f is differentiable at an unconstrained minimizer \(\mathbf {x}^*\), it grows slower than linearly around \(\mathbf {x}^*\) as a result of the first-order necessary optimality condition \(\nabla f(\mathbf {x}^*) = \mathbf {0}\) (note that if f is twice-differentiable at \(\mathbf {x}^*\) and \(\nabla ^2 f(\mathbf {x}^*)\) is positive definite, then f grows quadratically around \(\mathbf {x}^*\)). The assumptions of Lemma 5 may be satisfied for a constrained problem, however, because they only require that the objective function grow linearly in the set of directions that lead to feasible points in some neighborhood of \(\mathbf {x}^*\). An example of \(L = 0\) when \(\mathbf {x}^*\) is not an unconstrained minimum is: \(X = (-2,2)\), \(m_I = 2\), \(m_E = 0\), \(f(x) = x^3\), \(g_1(x) = x-1\), and \(g_2(x) = -x\) with \(x^* = 0\). In this example, the objective function only grows cubically around \(x^*\) in the direction from \(x^*\) that leads to feasible points. From Lemma 4, we have that a sufficient condition for the key assumption of Lemma 5 to be satisfied is \(\underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \mathbf {d}\in T(\mathbf {x}^*) \right\} }{\min } \,{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}> 0\). It is not hard to show that this condition is also necessary when f is differentiable at \(\mathbf {x}^*\). Proposition 2 shows that the assumptions of Lemma 5 will not be satisfied when Problem (P) does not contain any active inequality constraints and the minimizer corresponds to a KKT point for Problem (P).

  2.

    \(\hat{\alpha }\) depends on the local behavior of f around \(\mathbf {x}^*\), but is independent of \(\varepsilon \) since it is determined by the subset of \(\mathscr {N}^{1}_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X)\) on which the affine function \(f(\mathbf {x}^*) + \frac{L}{2}t\) underestimates \(f(\mathbf {x})\). Consequently, for sufficiently small \(\varepsilon \), \(\hat{X}_{5} = \left\{ \mathbf {x}\in X : L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon \right\} \) since \(\left\{ \mathbf {x}\in X : L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon \right\} \) will then be a subset of \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*)\). Note that the factor ‘2’ in the denominator of ‘\({\frac{L}{2}t}\)’ is arbitrarily chosen; any factor \(> 1\) can instead be chosen with a corresponding \(\hat{\alpha }\). Furthermore, \(\mathbf {x}^*\) is necessarily the unique global minimizer of Problem (P) on \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*)\) since \(L > 0\).

  3.

    If, in addition to the assumptions of Lemma 5, f is assumed to be convex on \(\mathscr {N}^1_{\alpha }(\mathbf {x}^*)\), then we can choose \(\hat{\alpha } = \alpha \). Additionally, \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be conservatively approximated by \(\left\{ \mathbf {x}\in X : L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le \varepsilon \right\} \) when \(\varepsilon \) is small enough.

  4.

    The estimate \(\hat{X}_5\) becomes less conservative as \(\varepsilon \) is decreased since the higher order term \(o(t) \rightarrow 0\) as \(\varepsilon \rightarrow 0\). Simply put, this is because the affine approximation \(f(\mathbf {x}^*) + Lt\) provides a better description of f as \(\varepsilon \rightarrow 0\).

In fact, under the assumptions of Lemma 5, a less conservative estimate of \(X_5\) can be obtained by accounting for the fact that not all points \(\mathbf {x}\in \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon \right\} \) satisfy \({\nabla f(\mathbf {x}^*)}^\mathrm{T} (\mathbf {x}- \mathbf {x}^*) \ge L{||}\mathbf {x}- \mathbf {x}^*{||}_1\).

Proposition 1

Consider Problem (P), and suppose the assumptions of Lemma 5 are satisfied. Then, \(\exists \hat{\alpha } \in (0,\alpha ]\) such that the region \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be conservatively approximated by

$$\begin{aligned} \hat{X}_{5} = \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon , \, L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le {\nabla f(\mathbf {x}^*)}^\mathrm{T} (\mathbf {x}- \mathbf {x}^*) \right\} . \end{aligned}$$

Proof

The desired result follows from Lemma 5 and the fact that

$$\begin{aligned} {\nabla f(\mathbf {x}^*)}^\mathrm{T} (\mathbf {x}- \mathbf {x}^*) \ge L{||}\mathbf {x}- \mathbf {x}^*{||}_1, \quad \forall \mathbf {x}\in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X), \end{aligned}$$

from the assumptions of Lemma 5. \(\square \)

As an illustration of the application of Lemma 5, let us reconsider Example 2. Recall that \(X = (2.2,2.5) \times (2.9,3.3)\), \(m_I = 3\), \(m_E = 0\), \(f(\mathbf {x}) = -x_1 - x_2\), \(g_1(\mathbf {x}) = x_2 - 2x^4_1 + 8x^3_1 - 8x^2_1 - 2\), \(g_2(\mathbf {x}) = x_2 - 4x^4_1 + 32x^3_1 - 88x^2_1 + 96x_1 - 36\), and \(g_3(\mathbf {x}) = 3 - x_2\) with \(\mathbf {x}^* \approx (2.33,3.18)\). Let \(\varepsilon \le 0.07\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x}\in X : \mathbf {g}(\mathbf {x}) \le \mathbf {0} \right\} \), \(\nabla f(\mathbf {x}^*) = (-1,-1)\), \(\alpha = +\infty \), \(L \approx 0.649\), and \(X_5 = \left\{ \mathbf {x}\in X : \mathbf {g}(\mathbf {x}) \le \mathbf {0}, -x_1 - x_2 \le f(\mathbf {x}^*) + \varepsilon \right\} \). Choose \(\hat{\alpha } = +\infty \) in Lemma 5. From Lemma 5 and Remark 1, we have \(\hat{X}_{5} = \left\{ \mathbf {x} : 0.649{||}\mathbf {x}-\mathbf {x}^*{||}_1 \le \varepsilon \right\} \) (since f is convex).

Figure 2a plots \(X_5\) and \(\hat{X}_5\) for \(\varepsilon = 0.07\), and Fig. 2b shows the improvement in the estimate when Proposition 1 is used, in which case we obtain \(\hat{X}_{5} = \{\mathbf {x}: 0.649{||}\mathbf {x}-\mathbf {x}^*{||}_1 \le \varepsilon , \, 0.649{||}\mathbf {x}-\mathbf {x}^*{||}_1 \le -(x_1 - x^*_1) - (x_2 - x^*_2)\}\). Note that an even better estimate of \(X_5\) may be obtained by using knowledge of the local feasible set \(\mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X)\). However, other than in some special cases (see Lemma 6), we shall stick with the estimate \(\hat{X}_5\) from Lemma 5 since we are mainly concerned with the dependence of the extent of clustering on the convergence rate of the lower bounding scheme.

Fig. 2

Plots of \(X_5\) (solid regions) and \(\hat{X}_5\) (the areas between the dotted lines) for Example 2 for \(\varepsilon = 0.07\) (note that we do not use \(\varepsilon = 0.1\) as in Fig. 1b because the corresponding \(\hat{X}_5\) are not contained in X). The dashed lines define the set X, the filled-in triangles correspond to the minimizer \(\mathbf {x}^*\), and the dash-dotted lines represent the axes translated to \(\mathbf {x}^*\). a \(X_5\) and estimate \(\hat{X}_5\) from Lemma 5, b \(X_5\) and estimate \(\hat{X}_5\) from Proposition 1

Before we provide an estimate of the number of boxes of width \(\delta \) required to cover \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\), we provide a few more examples that satisfy the assumptions of Lemma 5 and present an approach that could help determine if its assumptions are satisfied. Example 5 illustrates another inequality-constrained case which satisfies the assumptions of Lemma 5. Note that the minimizer \(\mathbf {x}^*\) does not satisfy the KKT conditions in this case.

Example 5

Let \(\varepsilon \le 1\), \(X = (-2,2)\), \(m_I = 3\), and \(m_E = 0\) with \(f(x) = -x\), \(g_1(x) = x^3\), \(g_2(x) = x-1\), \(g_3(x) = -1-x\), and \(x^* = 0\). We have \(\mathscr {F}(X) = [-1,0]\), \(\nabla f(x^*) = -1\), \(\alpha = +\infty \), \(L = 1\), and \(X_5 = [-\varepsilon ,0]\). Choose \(\hat{\alpha } = +\infty \) in Lemma 5. From Lemma 5 and Remark 1, we have \(\hat{X}_{5} = [-\varepsilon , +\varepsilon ]\) (since f is convex).
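The containment \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5 \subseteq \hat{X}_5\) can be verified on a grid for this example (an illustrative check of our own; here \(L = 1\) and, by convexity, \(\hat{X}_5 = [-\varepsilon ,\varepsilon ]\)):

```python
# Grid check that X_5 is contained in X_hat_5 for Example 5:
# f(x) = -x, feasible set [-1, 0], x* = 0, L = 1.
eps = 0.5
in_X5     = lambda x: -1 <= x <= 0 and -x <= eps   # feasible and f(x) <= f(x*) + eps
in_X5_hat = lambda x: abs(x) <= eps                # L*||x - x*||_1 <= eps with L = 1

xs = [i / 1000 - 2 for i in range(4001)]           # grid on X = (-2, 2)
assert all(in_X5_hat(x) for x in xs if in_X5(x))
print("containment verified on the grid")
```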

The reader may conjecture, based on Example 5 and other low-dimensional examples, that every nonisolated minimizer \(\mathbf {x}^*\) which does not satisfy the KKT conditions automatically satisfies the main assumption of Lemma 5. Example 6, inspired by [10, Section 4.1], however, illustrates a case in which the assumptions of Lemma 5 are not satisfied even though \(\mathbf {x}^*\) does not satisfy the KKT conditions.

Example 6

Let \(X = (-2,2)^3\), \(m_I = 5\), and \(m_E = 0\) with \(f(\mathbf {x}) = x_1 + x^2_3\), \(g_1(\mathbf {x}) = x_1 - 1\), \(g_2(\mathbf {x}) = x_2 - x_1\), \(g_3(\mathbf {x}) = x^2_2\), \(g_4(\mathbf {x}) = -x_3\), \(g_5(\mathbf {x}) = x_3 - 1\), and \(\mathbf {x}^* = (0,0,0)\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x}\in [0,1]^3 : x_2 = 0 \right\} \), \(\nabla f(\mathbf {x}^*) = (1,0,0)\), and \(L = 0\) for every \(\alpha > 0\) since \((0,0,1) \in T(\mathbf {x}^*)\) and \({\nabla f(\mathbf {x}^*)}^\mathrm{T} (0,0,1) = 0\).
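The claim \(L = 0\) can be seen numerically (our own sketch): feasible points near \(\mathbf {x}^*\) have the form \((a,0,c)\) with \(a, c \ge 0\), and the corresponding normalized directional derivative \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}\) can be made arbitrarily small by shrinking a with c fixed:

```python
# grad f(x*) = (1, 0, 0) for f(x) = x1 + x3^2 at the origin, so for the
# feasible direction toward (a, 0, c) the directional term is a / (a + c).
def directional_derivative(a, c):
    t = a + c                    # 1-norm of (a, 0, c), used to normalize
    return a / t                 # grad_f(x*)^T d with d = (a, 0, c) / t

vals = [directional_derivative(10.0**(-k), 0.1) for k in range(1, 8)]
print(vals)                      # decreases toward 0, so no positive L exists
```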

The next result provides conditions under which the assumptions of Lemma 5 will not be satisfied. In particular, it is shown that the assumptions of Lemma 5 will not be satisfied if Problem (P) is purely equality-constrained and all the functions in Problem (P) are differentiable at a nonisolated \(\mathbf {x}^*\).

Proposition 2

Consider Problem (P) with \(m_E \ge 1\). Suppose \(\mathbf {x}^*\) is nonisolated, f is differentiable at \(\mathbf {x}^*\), functions \(h_k\), \(k = 1,\ldots ,m_E\), are differentiable at \(\mathbf {x}^*\), and \(\mathscr {A}(\mathbf {x}^*) = \emptyset \). Furthermore, suppose there exist multipliers \(\varvec{\lambda }^* \in \mathbb {R}^{m_E}\) corresponding to the equality constraints such that \((\mathbf {x}^*,\mathbf {0},\varvec{\lambda }^*)\) is a KKT point. Then

$$\begin{aligned} \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \mathbf {d}\in T(\mathbf {x}^*) \right\} }{\min }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0. \end{aligned}$$

Proof

See “Proof of Proposition 2 in Appendix”. \(\square \)

Note that the above result can naturally be extended to accommodate weakly active inequality constraints (see [2, Section 4.4]). The ensuing examples illustrate that the assumptions of Lemma 5 may be satisfied when individual assumptions of Proposition 2 do not hold.

Example 7

Let \(\varepsilon \le 0.5\), \(X = (-2,2) \times (-2,2)\), \(m_I = 1\), and \(m_E = 1\) with \(f(\mathbf {x}) = x_1 + 10x^2_2\), \(g(\mathbf {x}) = x_1 - 1\), \(h(\mathbf {x}) = x_1 - {|}x_2{|}\), and \(\mathbf {x}^* = (0,0)\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x}\in X : x_1 = {|}x_2{|}, x_1 \le 1 \right\} \), \(\nabla f(\mathbf {x}^*) = (1,0)\), \(\alpha = +\infty \), \(L~=~0.5\), and \(X_5 = \left\{ \mathbf {x}\in [0,\varepsilon ] \times [-\varepsilon ,\varepsilon ] : x_1 = {|}x_2{|}, x_1 + 10 x^2_2 \le \varepsilon \right\} \). Choose \(\hat{\alpha } = +\infty \) in Lemma 5. From Lemma 5 and Remark 1, we have \(\hat{X}_{5} = \left\{ \mathbf {x}\in X : {||}\mathbf {x}{||}_1 \le 2\varepsilon \right\} \) (since f is convex).

Example 8

Let \(\varepsilon \le 0.5\), \(X = (-2,2) \times (-2,2)\), \(m_I = 4\), and \(m_E = 1\) with \(f(\mathbf {x}) = x_1 + x_2\), \(g_1(\mathbf {x}) = -x_1\), \(g_2(\mathbf {x}) = -x_2\), \(g_3(\mathbf {x}) = x_1 - 1\), \(g_4(\mathbf {x}) = x_2 - 1\), \(h(\mathbf {x}) = x_2 - x^3_1\), and \(\mathbf {x}^* = (0,0)\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x}\in [0,1]^2 : x_2 = x^3_1 \right\} \), \(\nabla f(\mathbf {x}^*) = (1,1)\), \(\alpha = +\infty \), \(L = 1\), and \(X_5 = \left\{ \mathbf {x}\in [0,\varepsilon ] \times [0,\varepsilon ] : x_2 = x^3_1, x_1 + x_2 \le \varepsilon \right\} \). Choose \(\hat{\alpha } = +\infty \) in Lemma 5. From Lemma 5 and Remark 1, we have \(\hat{X}_{5} = \left\{ \mathbf {x}\in X : {||}\mathbf {x}{||}_1 \le \varepsilon \right\} \) (since f is convex).

Fig. 3

Plots of \(X_5\) (solid curves) and \(\hat{X}_5\) (left figure: area between the dotted lines, right figure: curve depicted by the circles) for Example 8 for \(\varepsilon = 0.5\). The filled-in triangles correspond to the minimizer \(\mathbf {x}^*\), and the dash-dotted lines represent the axes translated to \(\mathbf {x}^*\). a \(X_5\) and estimate \(\hat{X}_5\) from Lemma 5, b \(X_5\) and estimate \(\hat{X}_5\) from Lemma 6

Figure 3a plots \(X_5\) and \(\hat{X}_5\) for Example 8 for \(\varepsilon = 0.5\). It is seen that the estimate \(\hat{X}_5\) does not capture the ‘one-dimensional nature’ of \(X_5\) (which is a consequence of the equality constraint in Example 8). This issue is addressed in Lemma 6. Note that \(X_5\) for Example 7 also resides in a reduced-dimensional manifold, but Lemma 6 does not apply in this case since h is not differentiable at \(\mathbf {x}^*\) (the discussion after Lemma 6 proposes a modification of the assumptions of Lemma 6 that addresses this issue).
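The 'one-dimensional nature' of \(X_5\) in Example 8 can be quantified by counting grid boxes (an illustrative computation of our own; the box width \(\delta = 0.02\) is chosen arbitrarily). The full-dimensional estimate \(\hat{X}_5\) requires on the order of \((\varepsilon /\delta )^2\) boxes, whereas the curve \(X_5\) requires only on the order of \(\varepsilon /\delta \):

```python
import itertools

eps, delta = 0.5, 0.02

def box_of(x1, x2):
    # Index of the width-delta grid box containing the point (x1, x2)
    return (int(x1 // delta), int(x2 // delta))

# Boxes meeting X_hat_5 = {||x||_1 <= eps}, a full-dimensional diamond,
# marked by dense sampling:
hat_boxes = set()
step = eps / 200
rng = [k * step for k in range(-200, 201)]
for x1, x2 in itertools.product(rng, rng):
    if abs(x1) + abs(x2) <= eps:
        hat_boxes.add(box_of(x1, x2))

# Boxes meeting X_5 = {x2 = x1^3, x1 + x2 <= eps, x >= 0}, a curve:
curve_boxes = set()
for k in range(100001):
    x1 = k * eps / 100000
    if x1 + x1**3 <= eps:
        curve_boxes.add(box_of(x1, x1**3))

print(len(hat_boxes), len(curve_boxes))   # area-like versus curve-like growth
```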

While Lemma 5 provides a conservative estimate of \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) under suitable assumptions, verifying the satisfaction of its assumptions is not straightforward. The following proposition provides a conservative approach for determining whether the assumptions of Lemma 5 are satisfied.

Proposition 3

Let \(L(\alpha )\) denote the constant L in Lemma 5 for a given \(\alpha > 0\). When the active constraints are differentiable at \(\mathbf {x}^*\), a lower bound on \(L_0 := \underset{\alpha \rightarrow 0^{+}}{\lim }{L(\alpha )}\) can be obtained by solving

$$\begin{aligned} \underset{\mathbf {d}}{\min }&\,\, {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}\\ \quad \quad \mathrm{s.t.}&\,\, {||}\mathbf {d}{||}_1 = 1, \\ \quad \quad \quad&\,\, \mathbf {d}\in \mathscr {L}(\mathbf {x}^*), \end{aligned}$$

where \(\mathscr {L}(\mathbf {x}^*) := \big \{\mathbf {d}\in \mathbb {R}^{n_x}: {\nabla g_j(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}\le 0, \forall j \in \mathscr {A}(\mathbf {x}^*), {\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0, \forall k \in \{1,\ldots ,m_E\}\big \}\) denotes the linearized cone at \(\mathbf {x}^*\). If \(\mathbf {x}^*\) corresponds to a KKT point, the above formulation provides the exact value of \(L_0\).
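For Example 2, the formulation in Proposition 3 can be evaluated by brute force over the unit sphere of the 1-norm (a sketch of our own; the gradients are evaluated at the approximate minimizer quoted in the text, where \(g_1\) and \(g_2\) are nearly active):

```python
import math

x1 = 2.33                       # first coordinate of the approximate minimizer
grad_f  = (-1.0, -1.0)
grad_g1 = (-8*x1**3 + 24*x1**2 - 16*x1, 1.0)
grad_g2 = (-16*x1**3 + 96*x1**2 - 176*x1 + 96, 1.0)

best = math.inf
# On the unit 1-norm sphere in R^2, d = (t, +/-(1 - |t|)).
for k in range(-10000, 10001):
    t = k / 10000
    for d2 in (1 - abs(t), -(1 - abs(t))):
        d = (t, d2)
        in_cone = (grad_g1[0]*d[0] + grad_g1[1]*d[1] <= 0 and
                   grad_g2[0]*d[0] + grad_g2[1]*d[1] <= 0)
        if in_cone:
            best = min(best, grad_f[0]*d[0] + grad_f[1]*d[1])
print(best)   # close to 0.649, the constant L quoted for Example 2
```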

So far in this section, we have established conditions under which a conservative estimate of the subset of \(X_5\) around a minimizer \(\mathbf {x}^*\) can be obtained, presented examples for which the above conditions hold, and isolated a class of problems for which the above conditions are not satisfied. The following theorem follows from Corollary 2.1 in [28], the proof of which is rederived in “Appendix” for completeness. It provides a conservative estimate of the number of boxes of width \(\delta \) required to cover \(\hat{X}_5\) from Lemma 5. Therefore, from Lemma 1 and the result below, we can get an upper bound on the worst-case number of boxes required to cover \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) and estimate the extent of the cluster problem on that region (recall from Remark 1 that the subset of \(X_5\) around \(\mathbf {x}^*\) will be contained in \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*)\) for sufficiently small \(\varepsilon \)).

Theorem 2

Suppose the assumptions of Lemma 5 hold. Let \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\), \(r = \displaystyle \frac{2\varepsilon }{L}\).

  1.

    If \(\delta \ge 2r\), let \(N = 1\).

  2.

    If \(\displaystyle \frac{2r}{m-1} > \delta \ge \displaystyle \frac{2r}{m}\) for some \(m \in \mathbb {N}\) with \(m \le n_x\) and \(2 \le m \le 5\), then let

    $$\begin{aligned} N = \displaystyle \sum _{i=0}^{m-1}{2^i \left( {\begin{array}{c}n_x\\ i\end{array}}\right) } + 2n_x{\lceil }\displaystyle \frac{m-3}{3}{\rceil }. \end{aligned}$$
  3.

    Otherwise, let

    $$\begin{aligned} N = {\lceil }2 \left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( 1 - \frac{1}{\beta ^*} \right) } L^{-1}{\rceil }^{n_x-1}&\left( {\lceil }2 \left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( 1 - \frac{1}{\beta ^*} \right) } L^{-1}{\rceil } \right. \\&\left. +\, 2n_x {\lceil }\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( 1 - \frac{1}{\beta ^*} \right) } L^{-1}{\rceil } \right) . \end{aligned}$$

Then, N is an upper bound on the number of boxes of width \(\delta \) required to cover \(\hat{X}_{5}\).

Proof

See “Proof of Theorem 2 in Appendix”. \(\square \)
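
The case analysis of Theorem 2 is easy to mechanize. The following sketch (the function name and all parameter values are placeholders for illustration) computes N for given \(\varepsilon \), \(\tau ^*\), \(\beta ^*\), L, and \(n_x\):

```python
import math

def boxes_upper_bound(eps, tau, beta, L, nx):
    """Illustrative implementation of the three cases of Theorem 2."""
    delta = (eps / tau) ** (1.0 / beta)        # box width delta
    r = 2.0 * eps / L
    if delta >= 2.0 * r:                       # case 1
        return 1
    for m in range(2, min(nx, 5) + 1):         # case 2
        if 2.0 * r / (m - 1) > delta >= 2.0 * r / m:
            return (sum(2**i * math.comb(nx, i) for i in range(m))
                    + 2 * nx * math.ceil((m - 3) / 3))
    c = tau ** (1.0 / beta) * eps ** (1.0 - 1.0 / beta) / L
    a, b = math.ceil(2.0 * c), math.ceil(c)    # case 3
    return a ** (nx - 1) * (a + 2 * nx * b)

# With beta = 1 (first-order convergence), N does not depend on eps:
print(boxes_upper_bound(1e-3, 1.0, 1.0, 1.0, 3),
      boxes_upper_bound(1e-9, 1.0, 1.0, 1.0, 3))  # -> 32 32
```

With \(\beta ^* = 1\) the factor \(\varepsilon ^{1 - \frac{1}{\beta ^*}}\) equals one, so the returned N is the same for every \(\varepsilon \), reflecting the discussion in Remark 2 below.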

Remark 2

Under the assumptions of Lemma 5, the dependence of N on \(\varepsilon \) disappears when the lower bounding scheme has first-order convergence on \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \mathscr {F}(X)\), i.e., \(\beta ^* = 1\). Therefore, the cluster problem on \(X_5\) may be eliminated even using first-order convergent lower bounding schemes with sufficiently small prefactors. This is in contrast to unconstrained global optimization where at least second-order convergent lower bounding schemes are required to eliminate the cluster problem (see Remark 1 for an intuitive explanation for this qualitative difference in behavior). Note that the dependence of N on the prefactor \(\tau ^*\) can be detailed in a manner similar to Table 1 in [29].

The above scaling has also been empirically observed by Goldsztejn et al. [9], who reason that it “\(\ldots \) removes the tangency between the feasible set and the objective level set, and therefore should prevent the cluster effect.”

The next result refines the analysis of Lemma 5 when Problem (P) contains equality constraints that can locally be eliminated using the implicit function theorem [22].

Lemma 6

Consider Problem (P) with \(1 \le m_E < n_x\). Suppose \(\mathbf {x}^*\) is nonisolated, f is differentiable at \(\mathbf {x}^*\), and \(\exists \alpha > 0\) such that \(\mathbf {h}\) is continuously differentiable on \(\mathscr {N}^1_{\alpha }(\mathbf {x}^*)\) and

$$\begin{aligned} L := \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t> 0 \, : \, (\mathbf {x}^*+t\mathbf {d}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X) \right\} }{\inf }{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}> 0. \end{aligned}$$

Furthermore, suppose the variables \(\mathbf {x}\) can be reordered and partitioned into dependent variables \(\mathbf {z}\in \mathbb {R}^{m_E}\) and independent variables \(\mathbf {p}\in \mathbb {R}^{n_x - m_E}\), with \(\mathbf {x}\equiv (\mathbf {z},\mathbf {p})\), such that \(\nabla _{\mathbf {z}} \mathbf {h}((\mathbf {z},\mathbf {p}))\) is nonsingular on \(\mathscr {N}^1_{\alpha }((\mathbf {z}^*,\mathbf {p}^*))\), where \(\mathbf {x}^* \equiv (\mathbf {z}^*,\mathbf {p}^*)\). Then, \(\exists \alpha _{\mathbf {p}}, \alpha _{\mathbf {z}} \in (0,\alpha ]\), a continuously differentiable function \(\varvec{\phi }:\mathscr {N}^1_{\alpha _{\mathbf {p}}}(\mathbf {p}^*) \rightarrow \mathscr {N}^1_{\alpha _{\mathbf {z}}}(\mathbf {z}^*)\), and \(\hat{\alpha } \in (0, \alpha _{\mathbf {p}})\) such that the region \(\left( \mathscr {N}^1_{\alpha _{\mathbf {z}}}(\mathbf {z}^*) \times \mathscr {N}^1_{\hat{\alpha }}(\mathbf {p}^*)\right) \cap X_5\) can be conservatively approximated by

$$\begin{aligned} \hat{X}_{5} = \left\{ (\mathbf {z},\mathbf {p}) \in \mathscr {N}^1_{\alpha _{\mathbf {z}}}(\mathbf {z}^*) \times \mathscr {N}^1_{\hat{\alpha }}(\mathbf {p}^*) : \mathbf {z}= \varvec{\phi }(\mathbf {p}), \, L{||}\mathbf {p}- \mathbf {p}^*{||}_1 \le 2\varepsilon \right\} . \end{aligned}$$

Proof

The result follows from the proof of Lemma 5 and the implicit function theorem [22, Chapter 9]. \(\square \)

Lemma 6 effectively states that, under suitable conditions, the subset of \(X_5\) around \(\mathbf {x}^*\) resides in a reduced-dimensional manifold. Figure 3b compares the estimate \(\hat{X}_5\) obtained from Lemma 6 (when we assume precise knowledge of the implicit function) with the one obtained from Lemma 5 for Example 8. We distinguish between \(\alpha _{\mathbf {p}}\) and \(\hat{\alpha }\) so that \(\varvec{\phi }\) is continuously differentiable on cl\(\left( \mathscr {N}^1_{\hat{\alpha }}(\mathbf {p}^*) \right) \); this fact will be used shortly. Note that the assumptions that \(\mathbf {h}\) is continuously differentiable on \(\mathscr {N}^1_{\alpha }(\mathbf {x}^*)\) and \(\nabla _{\mathbf {z}} \mathbf {h}((\mathbf {z},\mathbf {p}))\) is nonsingular on \(\mathscr {N}^1_{\alpha }((\mathbf {z}^*,\mathbf {p}^*))\) can be relaxed based on a nonsmooth variant of the implicit function theorem [6, Chapter 7] (which can be used to derive a less conservative estimate of \(X_5\) for Example 7, for instance).

The following corollary of Theorem 2 refines the estimate of the number of boxes of width \(\delta \) required to cover \(\hat{X}_5\) under the assumptions of Lemma 6. It provides an upper bound on the number of boxes of width \(\delta \) required to cover \(X_5\) that scales as \(O\left( \varepsilon ^{(n_x-m_E) \left( 1-\frac{1}{\beta ^*} \right) } \right) \) in contrast to the scaling \(O\left( \varepsilon ^{n_x \left( 1-\frac{1}{\beta ^*} \right) } \right) \) from Theorem 2.

Corollary 2

Suppose the assumptions of Lemma 6 hold. Let \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\), \(r = \displaystyle \frac{2\varepsilon }{L}\). Define

$$\begin{aligned} M_k&:= \left( \underset{\mathbf {p}\in cl \left( \mathscr {N}^1_{\hat{\alpha }}(\mathbf {p}^*) \right) }{\max } {||}\nabla \phi _k(\mathbf {p}){||}\right) \sqrt{n_x - m_E}, \quad \forall k \in \{1,\ldots ,m_E\}, \\ K&:= \left\{ k \in \{1,\ldots ,m_E\} : M_k > 1 \right\} . \end{aligned}$$
  1.

    If \(\delta \ge 2r\), let \(N = \displaystyle \prod _{k \in K}{M_k}\).

  2.

    If \(\displaystyle \frac{2r}{m-1} > \delta \ge \displaystyle \frac{2r}{m}\) for some \(m \in \mathbb {N}\) with \(m \le n_x - m_E\) and \(2 \le m \le 5\), then let

    $$\begin{aligned} N = \left( \displaystyle \sum _{i=0}^{m-1}{2^i \left( {\begin{array}{c}n_x - m_E\\ i\end{array}}\right) } + 2\left( n_x - m_E\right) {\lceil }\displaystyle \frac{m-3}{3}{\rceil } \right) \displaystyle \prod _{k \in K}{M_k}. \end{aligned}$$
  3.

    Otherwise, let

    $$\begin{aligned} N= & {} {\lceil }2\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( 1 - \frac{1}{\beta ^*} \right) } L^{-1}{\rceil }^{n_x - m_E -1} \Bigg ( {\lceil }2\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( 1 - \frac{1}{\beta ^*} \right) } L^{-1}{\rceil } + \nonumber \\&2\left( n_x - m_E\right) {\lceil }\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( 1 - \frac{1}{\beta ^*} \right) } L^{-1}{\rceil } \Bigg ) \displaystyle \prod _{k \in K}{M_k}. \end{aligned}$$

Then, N is an upper bound on the number of boxes of width \(\delta \) required to cover \(\hat{X}_{5}\).

Proof

Theorem 2 can be used to obtain an overestimate of the number of boxes of width \(\delta \) required to cover the projection of \(\hat{X}_5\), as defined by Lemma 6, on \(\mathbf {p}\), i.e., \(\left\{ \mathbf {p}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {p}^*) : L{||}\mathbf {p}- \mathbf {p}^*{||}_1 \le 2\varepsilon \right\} \), by replacing \(n_x\) with \(n_x - m_E\) in the expressions for N. This count can then be extended to a conservative estimate of the number of boxes of width \(\delta \) required to cover \(\hat{X}_5\) as follows.

Note that \(\phi _k\) is Lipschitz continuous on cl\(\left( \mathscr {N}^1_{\hat{\alpha }}(\mathbf {p}^*) \right) \) with Lipschitz constant \(\frac{M_k}{\sqrt{n_x - m_E}}\), \(\forall k \in \{1,\ldots ,m_E\}\). Consider any box B of width \(\delta \) that is used to cover the projection of \(\hat{X}_5\) on \(\mathbf {p}\). We have

$$\begin{aligned} w\left( \overline{\phi _k}\left( B \cap \text {cl}\left( \mathscr {N}^1_{\hat{\alpha }}(\mathbf {p}^*)\right) \right) \right) \le M_k \delta , \quad \forall k \in \{1,\ldots ,m_E\}, \end{aligned}$$

from the Lipschitz continuity of \(\phi _k\). Therefore, we can replace the box B with \(\displaystyle \prod _{k \in K}{M_k}\) boxes of the same width, translated appropriately, to cover the region

$$\begin{aligned} \left\{ (\mathbf {z},\mathbf {p}) \in \mathscr {N}^1_{\alpha _{\mathbf {z}}}(\mathbf {z}^*) \times \left( B \cap \mathscr {N}^1_{\hat{\alpha }}(\mathbf {p}^*) \right) : L{||}\mathbf {p}- \mathbf {p}^*{||}_1 \le 2\varepsilon , \, \mathbf {z}= \varvec{\phi }(\mathbf {p}) \right\} . \end{aligned}$$

Since \(\bigcup _{B}\left\{ B \cap \mathscr {N}^1_{\hat{\alpha }}(\mathbf {p}^*) \right\} \) covers the projection of \(\hat{X}_5\) on \(\mathbf {p}\), the desired result follows by multiplying the estimate obtained from Theorem 2 (with \(n_x\) replaced by \(n_x - m_E\)) by \(\displaystyle \prod _{k \in K}{M_k}\). \(\square \)

The next result provides a natural extension of Lemma 5 to the case when the objective function is not differentiable at the minimizer \(\mathbf {x}^*\) [28]. Note that a similar result was derived for the case of unconstrained optimization in [28, Section 2.3] under alternative assumptions.

Lemma 7

Consider Problem (P). Suppose \(\mathbf {x}^*\) is nonisolated, f is locally Lipschitz continuous on X and directionally differentiable at \(\mathbf {x}^*\), and \(\exists \alpha > 0\) such that

$$\begin{aligned} L := \underset{\left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t> 0 \, : \, (\mathbf {x}^*+t\mathbf {d}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X) \right\} }{\inf } f'(\mathbf {x}^*;\mathbf {d}) > 0. \end{aligned}$$

Then, \(\exists \hat{\alpha } \in (0,\alpha ]\) such that the region \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be conservatively approximated by

$$\begin{aligned} \hat{X}_{5} = \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : L{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon \right\} . \end{aligned}$$

Proof

The proof is relegated to “Proof of Lemma 7 in Appendix” since it is similar to the proof of Lemma 5. \(\square \)

Remark 3

Theorem 2 can be extended to the case when the assumption that the function f is differentiable at \(\mathbf {x}^*\) is relaxed by using Lemmas 1 and 7 and Corollary 2.1 in [28] (also see Theorem 2). As in the differentiable case, the dependence of N on \(\varepsilon \) disappears when the lower bounding scheme has first-order convergence on \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \mathscr {F}(X)\), i.e., \(\beta ^* = 1\). Lemma 6 and Corollary 2 can likewise be extended to the case when f is not differentiable at \(\mathbf {x}^*\) under suitable assumptions.

Thus far, we have established conditions under which first-order convergence of the lower bounding scheme at feasible points is sufficient to mitigate the cluster problem on \(X_5\). In the remainder of this section, we will present conditions under which second-order convergence of the lower bounding scheme is sufficient to mitigate clustering on \(X_5\). The first result in this regard provides a conservative estimate of the subset of \(X_5\) around a nonisolated \(\mathbf {x}^*\) under the assumption that the objective function grows quadratically (or faster) on the feasible region in some neighborhood of \(\mathbf {x}^*\).

Lemma 8

Consider Problem (P), and suppose f is twice-differentiable at \(\mathbf {x}^*\). Suppose \(\exists \alpha> 0, \gamma > 0\) such that

$$\begin{aligned} {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2}{\mathbf {d}}^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \mathbf {d}\ge \gamma {\mathbf {d}}^\mathrm{T} \mathbf {d}, \quad \forall \mathbf {d}\in \left\{ \mathbf {d} : (\mathbf {x}^*+\mathbf {d}) \in \mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X) \right\} . \end{aligned}$$

Then \(\exists \hat{\alpha } \in (0,\alpha ]\) such that the region \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be conservatively approximated by

$$\begin{aligned} \hat{X}_5 = \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon \right\} . \end{aligned}$$

Furthermore, \(\mathbf {x}^*\) is the unique global minimizer for Problem (P) on \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*)\).

Proof

Let \(\mathbf {x}= \mathbf {x}^* + \mathbf {d}\in \mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X)\). We have

$$\begin{aligned} f(\mathbf {x})&= f(\mathbf {x}^*+\mathbf {d}) \\&= f(\mathbf {x}^*) + {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2}{\mathbf {d}}^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \mathbf {d}+ o({||}\mathbf {d}{||}^2) \\&\ge f(\mathbf {x}^*) + \gamma {\mathbf {d}}^\mathrm{T} \mathbf {d}+ o({||}\mathbf {d}{||}^2). \end{aligned}$$

Consequently, there exists \(\hat{\alpha } \in (0,\alpha ]\) such that for all \(\mathbf {x}= \mathbf {x}^* + \mathbf {d}\in \mathscr {F}(X)\) with \({||}\mathbf {d}{||} \in [0,\hat{\alpha })\):

$$\begin{aligned}&f(\mathbf {x}) \ge f(\mathbf {x}^*) + \gamma {\mathbf {d}}^\mathrm{T} \mathbf {d}+ o({||}\mathbf {d}{||}^2) \ge f(\mathbf {x}^*) + \displaystyle \frac{\gamma }{2} {\mathbf {d}}^\mathrm{T} \mathbf {d}. \end{aligned}$$
(1)

Therefore, \(\forall \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) we have \(\mathbf {x}= \mathbf {x}^* + \mathbf {d}\in \mathscr {F}(X)\) with \({||}\mathbf {d}{||} < \hat{\alpha }\), and

$$\begin{aligned} \varepsilon \ge f(\mathbf {x}) - f(\mathbf {x}^*) \ge \displaystyle \frac{\gamma }{2} {\mathbf {d}}^\mathrm{T} \mathbf {d}\implies \gamma {||}\mathbf {d}{||}^2 = \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon . \end{aligned}$$

The conclusion that \(\mathbf {x}^*\) is the unique global minimizer for Problem (P) on \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*)\) follows from Eq. (1). \(\square \)

Remark 4

  1.

    Lemma 8 is not applicable when no \(\alpha > 0\) and \(\gamma > 0\) satisfying its growth condition exist; consider, for example, \(X = (-2,2) \times (-2,2)\), \(m_I = 2\), \(m_E = 0\), \(f(\mathbf {x}) = x_2\), \(g_1(\mathbf {x}) = x^4_1-x_2\), \(g_2(\mathbf {x}) = x_2 - 1\), and \(\mathbf {x}^* = (0,0)\). In this case, for any \(\alpha > 0\), there exist directions from \(\mathbf {x}^*\) to feasible points along which f grows slower than quadratically near \(\mathbf {x}^*\).

  2.

    For the case of unconstrained global optimization, the assumption of Lemma 8 reduces to the assumption that \(\nabla ^2 f(\mathbf {x}^*)\) is positive definite, and \(\gamma \) can be taken to be equal to half the smallest eigenvalue of \(\nabla ^2 f(\mathbf {x}^*)\) (see Theorem 1 in [29]). When the minimum is constrained, \(\gamma \) may potentially be estimated as follows. The first possibility is to directly estimate \(\gamma \) using a quadratic underestimator of f on \(\mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X)\). If such an underestimator cannot be constructed easily, \(\gamma \) may still be estimated relatively easily when additional assumptions are satisfied.

    Suppose \((\mathbf {x}^*,\varvec{\mu }^*,\varvec{\lambda }^*)\) is a KKT point, where \(\varvec{\mu }^*\) and \(\varvec{\lambda }^*\) correspond to Lagrange multipliers for \(\mathbf {g}\) and \(\mathbf {h}\), respectively, at \(\mathbf {x}^*\). Consider the restricted Lagrangian \(L(\mathbf {x};\varvec{\mu }^*,\varvec{\lambda }^*)\), and suppose its Hessian \(\nabla ^2 L(\mathbf {x};\varvec{\mu }^*,\varvec{\lambda }^*)\) is positive definite for all \(\mathbf {x}\in \text {cl}(\mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X))\) (cf. [2, Section 4.4]). Then \(\gamma \) may be estimated from the eigenvalues of \(\nabla ^2 L(\mathbf {x};\varvec{\mu }^*,\varvec{\lambda }^*)\) on \(\text {cl}(\mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X))\). This is a consequence of the fact that \(f(\mathbf {x}) \ge L(\mathbf {x};\varvec{\mu }^*,\varvec{\lambda }^*)\), \(\forall \mathbf {x}\in \mathscr {F}(X)\), by weak duality, \(f(\mathbf {x}^*) = L(\mathbf {x}^*;\varvec{\mu }^*,\varvec{\lambda }^*)\), and the stationarity condition \(\nabla _{\mathbf {x}} L(\mathbf {x}^*;\varvec{\mu }^*,\varvec{\lambda }^*) = \mathbf {0}\). Otherwise, if \((\mathbf {x}^*,\varvec{\mu }^*,\varvec{\lambda }^*)\) is a KKT point and some convex combination of f and \(L(\cdot ;\varvec{\mu }^*,\varvec{\lambda }^*)\) grows quadratically or faster on \(\mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X)\), then \(\gamma \) can be estimated using one of its quadratic underestimators on \(\mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X)\).

  3.

    The key assumption of Lemma 8, namely that f grows quadratically or faster on the feasible region in some neighborhood of \(\mathbf {x}^*\), is a relaxation of the key assumption of Lemma 5, namely that f grows linearly on the feasible region in some neighborhood of \(\mathbf {x}^*\). While Theorem 2 showed that first-order convergence of the lower bounding scheme at feasible points may be sufficient to mitigate clustering on \(X_5\) under the assumptions of Lemma 5, Theorem 3, which will be presented shortly, shows that second-order convergence of the lower bounding scheme at feasible points may be sufficient to mitigate clustering on \(X_5\) under the assumptions of Lemma 8. Consequently, the assumptions of Lemmas 5 and 8 can be viewed as part of a hierarchy of conditions under which a given convergence order of the lower bounding scheme at feasible points suffices to mitigate clustering on \(X_5\): for instance, the condition under which third-order convergence suffices amounts to the third-order Taylor expansion of f growing faster than cubically on the feasible region in some neighborhood of \(\mathbf {x}^*\), and so on.

  4.

    Along the line of discussion in Remark 1, \(\hat{\alpha }\) depends on the local behavior of f around \(\mathbf {x}^*\), but is independent of \(\varepsilon \). Consequently, for sufficiently small \(\varepsilon \) we can conservatively approximate the set \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) by \(\left\{ \mathbf {x}\in X : \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon \right\} \). Additionally, if the objective function f is either an affine or a quadratic function of \(\mathbf {x}\), then its second-order Taylor expansion around \(\mathbf {x}^*\) equals f itself and we can choose \(\hat{\alpha } = \alpha \). Furthermore, \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be conservatively approximated by the set \(\hat{X}_5 = \left\{ \mathbf {x}\in X : \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le \varepsilon \right\} \).

  5.

    Similar to Proposition 1, a less conservative estimate of \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be obtained as

    $$\begin{aligned} \hat{X}_5= & {} \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon , \,{\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \mathbf {x}- \mathbf {x}^* \right) \right. \\&\qquad \qquad \left. + \displaystyle \frac{1}{2}{\left( \mathbf {x}- \mathbf {x}^* \right) }^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \left( \mathbf {x}- \mathbf {x}^* \right) \ge \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \right\} . \end{aligned}$$

As an illustration of the application of Lemma 8, let us reconsider Example 3. Recall that \(X = (0.4,1.0) \times (0.5,2.0)\), \(m_I = 2\), \(m_E = 1\), \(f(\mathbf {x}) = -12x_1 - 7x_2 + x^2_2\), \(g_1(\mathbf {x}) = x_1 - 0.9\), \(g_2(\mathbf {x}) = 0.5 - x_1\), and \(h(\mathbf {x}) = x_2 + 2x^4_1 - 2\) with \(\mathbf {x}^* \approx (0.72,1.47)\). Let \(\varepsilon \le 0.1\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x}\in X : \mathbf {g}(\mathbf {x}) \le \mathbf {0}, h(\mathbf {x}) = 0 \right\} \). Choose \(\alpha = 0.1\), \(\gamma = 2\), and \(\hat{\alpha } = 0.1\) in Lemma 8. We have \(X_5 = \left\{ \mathbf {x} : x_2 = 2 - 2x^4_1, \, -12x_1 - 7x_2 + x^2_2 \le f(\mathbf {x}^*) + \varepsilon \right\} \). From Lemma 8 and Remark 4, we have \(\hat{X}_{5} = \left\{ \mathbf {x}\in \mathscr {N}^2_{0.1}(\mathbf {x}^*) : {||}\mathbf {x}-\mathbf {x}^*{||}^2 \le 0.5\varepsilon \right\} \) (since f is quadratic). Note that an even better estimate of \(X_5\) may be obtained using Lemma 9 by accounting for the fact that \(X_5\) resides in a reduced-dimensional manifold.
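
As a quick numerical sanity check of the minimizer quoted above (an illustrative sketch, not part of the analysis), one can eliminate \(x_2\) via the equality constraint h and minimize the reduced objective by brute-force grid search over the interval for \(x_1\) permitted by \(g_1\) and \(g_2\):

```python
import numpy as np

# Example 3 data: f(x) = -12*x1 - 7*x2 + x2^2, h(x) = x2 + 2*x1^4 - 2 = 0,
# and g1, g2 restrict x1 to [0.5, 0.9].
x1 = np.linspace(0.5, 0.9, 400001)
x2 = 2.0 - 2.0 * x1**4                   # feasible manifold h = 0
f = -12.0 * x1 - 7.0 * x2 + x2**2
i = int(np.argmin(f))
print(round(x1[i], 2), round(x2[i], 2))  # -> 0.72 1.47, matching x*
```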

The following examples illustrate two additional cases for which the assumptions of Lemma 8 hold.

Example 9

Let \(\varepsilon \le 0.5\), \(X = (-2,2) \times (-2,2)\), \(m_I = 2\), and \(m_E = 0\) with \(f(\mathbf {x}) = x_2\), \(g_1(\mathbf {x}) = x^2_1 - x_2\), \(g_2(\mathbf {x}) = x_2 - 1\), and \(\mathbf {x}^* = (0,0)\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x} : x_2 \ge x^2_1, \, x_2 \le 1 \right\} \). Choose \(\alpha = 1\), \(\gamma = 0.5\), and \(\hat{\alpha } = 1\). From Remark 4, we have \(X_5 = \left\{ \mathbf {x}\in [-\sqrt{\varepsilon },+\sqrt{\varepsilon }] \times [0,\varepsilon ] : x_2 \ge x^2_1 \right\} \subset \left\{ \mathbf {x} : {||}\mathbf {x}{||}^2 \le 2\varepsilon \right\} = \hat{X}_5\).
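
The growth condition of Lemma 8 can be checked by sampling for Example 9, where \(\nabla f(\mathbf {x}^*) = (0,1)\) and \(\nabla ^2 f(\mathbf {x}^*) = \mathbf {0}\), so the left-hand side of the condition reduces to \(d_2\). The sketch below (illustrative only) verifies \(d_2 \ge \gamma \, {\mathbf {d}}^\mathrm{T} \mathbf {d}\) with \(\gamma = 0.5\) on sampled feasible directions:

```python
import numpy as np

rng = np.random.default_rng(1)
d = rng.uniform(-1.0, 1.0, size=(200000, 2))
# feasible directions: x* + d in N^2_1(x*) ∩ F(X), i.e. ||d|| < 1, d2 >= d1^2
mask = (np.linalg.norm(d, axis=1) < 1.0) & (d[:, 1] >= d[:, 0]**2)
lhs = d[mask, 1]                              # grad_f.d + 0.5*d'Hd = d2
rhs = 0.5 * np.sum(d[mask]**2, axis=1)        # gamma * d'd with gamma = 0.5
assert np.all(lhs >= rhs)
print("growth condition holds on", int(mask.sum()), "sampled directions")
```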

Example 10

Let \(\varepsilon \le 0.5\), \(X = (-2,2) \times (-2,2)\), \(m_I = 3\), and \(m_E = 0\) with \(f(\mathbf {x}) = 2x^2_1 + x_2\), \(g_1(\mathbf {x}) = -x^2_1 - x_2\), \(g_2(\mathbf {x}) = -x_1\), \(g_3(\mathbf {x}) = x^2_1 + x^2_2 - 1\), and \(\mathbf {x}^* = (0,0)\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x} : x_2 \ge -x^2_1, \, x_1 \ge 0, \, x^2_1 + x^2_2 \le 1 \right\} \) with \(\alpha = 1\), \(\gamma = 0.5\), \(\hat{\alpha } = 1\), and \(X_5 = \left\{ \mathbf {x} : x_2 + 2x^2_1 \le \varepsilon , x_2 \ge -x^2_1, x_1 \ge 0 \right\} \subset \left\{ \mathbf {x} : {||}\mathbf {x}{||}^2 \le 2\varepsilon \right\} = \hat{X}_5\) (see Remark 4).
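
For Example 10 the containment \(X_5 \subset \hat{X}_5\) can likewise be checked by sampling (illustrative sketch; we use \(\varepsilon = 0.1\) rather than 0.5 so that the check is not implied trivially by the constraint \(g_3\)):

```python
import numpy as np

eps = 0.1
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(200000, 2))
x1, x2 = x[:, 0], x[:, 1]
feas = (x2 >= -x1**2) & (x1 >= 0.0) & (x1**2 + x2**2 <= 1.0)  # F(X)
in_X5 = feas & (2.0 * x1**2 + x2 <= eps)       # f(x) - f(x*) <= eps
assert np.all(x1[in_X5]**2 + x2[in_X5]**2 <= 2.0 * eps)        # X5 ⊂ X5_hat
print(int(in_X5.sum()), "sampled points of X_5 lie in the ball of Lemma 8")
```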

The overconservatism of the estimate \(\hat{X}_5\) in the above two examples (with regards to its dependence on \(\varepsilon \)) is primarily due to the fact that the linear growth of the objective function in the direction of its gradient is not taken into account. This observation is formalized and taken advantage of in Lemma 10 to obtain a less conservative estimate. Figure 4 plots \(X_5\) and \(\hat{X}_5\), obtained using different estimation techniques, for \(\varepsilon = 0.5\) and \(\varepsilon = 0.1\) in Example 10. The benefit of using the estimate in Remark 4 over that of Lemma 8 is seen from Fig. 4a, b, and the benefit of using the estimate from Lemma 10 (using \(\rho _1 = 3\), \(\rho _2 = 1.5\)) over that of Lemma 8 is seen from Fig. 4a, c. It can be observed from Fig. 4c that the constraint \(-\rho _1\varepsilon \le {\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \mathbf {x}- \mathbf {x}^* \right) \) in Lemma 10 is not active on the region \(\left\{ \mathbf {x} : \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le \varepsilon \right\} \) for \(\varepsilon = 0.5\). To illustrate the benefit of this constraint in Lemma 10, we consider \(\varepsilon = 0.1\). Fig. 4d, e demonstrate the advantages of using the estimates in Remark 4 and Lemma 10, respectively, over the estimate in Lemma 8, and Fig. 4f combines the benefits of the estimates from Lemma 10 and Remark 4 by using the estimate

$$\begin{aligned} \hat{X}_5 = \Big \{\mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \, : \,&\gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon , \, -\rho _1\varepsilon \le {\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \mathbf {x}- \mathbf {x}^* \right) \le \rho _2\varepsilon , \\&{\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \mathbf {x}- \mathbf {x}^* \right) + \displaystyle \frac{1}{2}{\left( \mathbf {x}- \mathbf {x}^* \right) }^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \left( \mathbf {x}- \mathbf {x}^* \right) \ge \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \Big \}. \end{aligned}$$
Fig. 4

Plots of \(X_5\) (solid regions) and \(\hat{X}_5\) (area between the dotted lines) for Example 10. The filled-in triangles correspond to the minimizer \(\mathbf {x}^*\), and the dash-dotted lines represent the axes translated to \(\mathbf {x}^*\). a \(X_5\) and estimate \(\hat{X}_5\) from Lemma 8 for \(\varepsilon = 0.5\); b \(X_5\) and estimate \(\hat{X}_5\) from Remark 4 for \(\varepsilon = 0.5\); c \(X_5\) and estimate \(\hat{X}_5\) from Lemma 10 for \(\varepsilon = 0.5\); d \(X_5\) and estimate \(\hat{X}_5\) from Remark 4 for \(\varepsilon = 0.1\); e \(X_5\) and estimate \(\hat{X}_5\) from Lemma 10 for \(\varepsilon = 0.1\); f \(X_5\) and estimate \(\hat{X}_5\) from Lemma 10 and Remark 4 for \(\varepsilon = 0.1\)

The following theorem follows from Lemma 3 in [29], and provides a conservative estimate of the number of boxes of width \(\delta \) required to cover the estimate \(\hat{X}_5\) from Lemma 8. Consequently, from Lemma 1 and the theorem below, we can get a conservative estimate of the number of boxes required to cover \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) and estimate the extent of the cluster problem on that region.

Theorem 3

Consider Problem (P), and suppose the assumptions of Lemma 8 hold. Let \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\) and \(r = \displaystyle \sqrt{\displaystyle \frac{2\varepsilon }{\gamma }}\).

  1.

    If \(\delta \ge 2r\), let \(N = 1\).

  2.

    If \(\displaystyle \frac{2r}{\sqrt{m-1}} > \delta \ge \displaystyle \frac{2r}{\sqrt{m}}\) for some \(m \in \mathbb {N}\) with \(m \le n_x\) and \(2 \le m \le 18\), then let

    $$\begin{aligned} N = \displaystyle \sum _{i=0}^{m-1}{2^i \left( {\begin{array}{c}n_x\\ i\end{array}}\right) } + 2n_x{\lceil }\displaystyle \frac{m-9}{9}{\rceil }. \end{aligned}$$
  3.

    Otherwise, let

    $$\begin{aligned} N= & {} {\lceil }2\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( \frac{1}{2} - \frac{1}{\beta ^*} \right) } \gamma ^{-\frac{1}{2}}{\rceil }^{n_x-1} \left( {\lceil }2\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( \frac{1}{2} - \frac{1}{\beta ^*} \right) } \gamma ^{-\frac{1}{2}}{\rceil }\right. \left. + 2n_x {\lceil }(\sqrt{2}-1)\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( \frac{1}{2} - \frac{1}{\beta ^*} \right) } \gamma ^{-\frac{1}{2}}{\rceil } \right) . \end{aligned}$$

Then, N is an upper bound on the number of boxes of width \(\delta \) required to cover \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\).

Proof

From Lemma 8, we have that the set \(\hat{X}_5 = \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon \right\} \) provides a conservative estimate of \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\). The desired result follows from Lemma 3 in [29]. \(\square \)
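
Analogously to Theorem 2, the case analysis of Theorem 3 can be mechanized; the sketch below (the function name and all parameter values are illustrative placeholders) shows that with \(\beta ^* = 2\) the bound N no longer depends on \(\varepsilon \):

```python
import math

def boxes_upper_bound_2nd(eps, tau, beta, gamma, nx):
    """Illustrative implementation of the three cases of Theorem 3."""
    delta = (eps / tau) ** (1.0 / beta)
    r = math.sqrt(2.0 * eps / gamma)
    if delta >= 2.0 * r:                               # case 1
        return 1
    for m in range(2, min(nx, 18) + 1):                # case 2
        if 2.0 * r / math.sqrt(m - 1) > delta >= 2.0 * r / math.sqrt(m):
            return (sum(2**i * math.comb(nx, i) for i in range(m))
                    + 2 * nx * math.ceil((m - 9) / 9))
    c = tau ** (1.0 / beta) * eps ** (0.5 - 1.0 / beta) / math.sqrt(gamma)
    a = math.ceil(2.0 * c)                             # case 3
    return a ** (nx - 1) * (a + 2 * nx * math.ceil((math.sqrt(2) - 1.0) * c))

# Second-order convergence (beta = 2): N is independent of eps:
print(boxes_upper_bound_2nd(1e-3, 1.0, 2.0, 1.0, 3),
      boxes_upper_bound_2nd(1e-9, 1.0, 2.0, 1.0, 3))  # -> 32 32
```

This \(\varepsilon \)-independence at \(\beta ^* = 2\) is exactly the behavior discussed in Remark 5 below.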

For the case of unconstrained global optimization, Theorem 3 effectively reduces to Theorem 1 in [29] with \(\gamma \) equal to half the smallest eigenvalue of \(\nabla ^2 f(\mathbf {x}^*)\) (note that there is a ‘factor of two difference’ from the analysis in [29] because we consider an appropriate \(\hat{\alpha } \in (0, \alpha ]\)).

Remark 5

Under the assumptions of Theorem 3, the dependence of N on \(\varepsilon \) disappears when the lower bounding scheme has second-order convergence on \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \mathscr {F}(X)\). This is similar to the case of unconstrained global optimization where at least second-order convergent lower bounding schemes are required to eliminate the cluster problem.

Finally, we present two sets of additional assumptions over those of Lemma 8 under which less conservative estimates of the cluster problem on \(X_5\) can be obtained. The first result in this regard, similar to Lemma 6, refines the analysis of Lemma 8 when Problem (P) contains equality constraints that can locally be eliminated using the implicit function theorem [22].

Lemma 9

Consider Problem (P) with \(1 \le m_E < n_x\). Suppose f is twice-differentiable at \(\mathbf {x}^*\), and \(\exists \alpha > 0\), \(\gamma > 0\) such that \(\mathbf {h}\) is continuously differentiable on \(\mathscr {N}^2_{\alpha }(\mathbf {x}^*)\) and

$$\begin{aligned} {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2}{\mathbf {d}}^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \mathbf {d}\ge \gamma {\mathbf {d}}^\mathrm{T} \mathbf {d}, \quad \forall \mathbf {d}\in \left\{ \mathbf {d} : (\mathbf {x}^*+\mathbf {d}) \in \mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \mathscr {F}(X) \right\} . \end{aligned}$$

Furthermore, suppose the variables \(\mathbf {x}\) can be reordered and partitioned into dependent variables \(\mathbf {z}\in \mathbb {R}^{m_E}\) and independent variables \(\mathbf {p}\in \mathbb {R}^{n_x - m_E}\), with \(\mathbf {x}\equiv (\mathbf {z},\mathbf {p})\), such that \(\nabla _{\mathbf {z}} \mathbf {h}((\mathbf {z},\mathbf {p}))\) is nonsingular on \(\mathscr {N}^2_{\alpha }((\mathbf {z}^*,\mathbf {p}^*))\), where \(\mathbf {x}^* \equiv (\mathbf {z}^*,\mathbf {p}^*)\). Then, \(\exists \alpha _{\mathbf {p}}, \alpha _{\mathbf {z}} \in (0,\alpha ]\), a continuously differentiable function \(\varvec{\phi }:\mathscr {N}^2_{\alpha _{\mathbf {p}}}(\mathbf {p}^*) \rightarrow \mathscr {N}^2_{\alpha _{\mathbf {z}}}(\mathbf {z}^*)\), and \(\hat{\alpha } \in (0, \alpha _{\mathbf {p}})\) such that the region \(\left( \mathscr {N}^2_{\alpha _{\mathbf {z}}}(\mathbf {z}^*) \times \mathscr {N}^2_{\hat{\alpha }}(\mathbf {p}^*)\right) \cap X_5\) can be conservatively approximated by

$$\begin{aligned} \hat{X}_{5} = \left\{ (\mathbf {z},\mathbf {p}) \in \mathscr {N}^2_{\alpha _{\mathbf {z}}}(\mathbf {z}^*) \times \mathscr {N}^2_{\hat{\alpha }}(\mathbf {p}^*) : \mathbf {z}= \varvec{\phi }(\mathbf {p}), \, \gamma {||}\mathbf {p}- \mathbf {p}^*{||}^2 \le 2\varepsilon \right\} . \end{aligned}$$

Proof

The result follows from the proof of Lemma 8 and the implicit function theorem [22, Chapter 9]. \(\square \)

Lemma 9 can be used to obtain a less conservative estimate of the number of boxes of width \(\delta \) required to cover \(\hat{X}_5\) as shown in the following corollary of Theorem 3. It provides an upper bound on the number of boxes of width \(\delta \) required to cover \(X_5\) that scales as \(O\left( \varepsilon ^{(n_x-m_E) \left( \frac{1}{2}-\frac{1}{\beta ^*} \right) } \right) \) in contrast to the scaling \(O\left( \varepsilon ^{n_x \left( \frac{1}{2}-\frac{1}{\beta ^*} \right) } \right) \) from Theorem 3.

Corollary 3

Suppose the assumptions of Lemma 9 hold. Let \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\) and \(r = \displaystyle \sqrt{\displaystyle \frac{2\varepsilon }{\gamma }}\). Define

$$\begin{aligned} M_k&:= \left( \underset{\mathbf {p}\in cl \left( \mathscr {N}^2_{\hat{\alpha }}(\mathbf {p}^*) \right) }{\max } {||}\nabla \phi _k(\mathbf {p}){||}\right) \sqrt{n_x - m_E}, \quad \forall k \in \{1,\ldots ,m_E\}, \\ K&:= \left\{ k \in \{1,\ldots ,m_E\} : M_k > 1 \right\} . \end{aligned}$$
  1.

    If \(\delta \ge 2r\), let \(N = \displaystyle \prod _{k \in K}{M_k}\).

  2.

    If \(\displaystyle \frac{2r}{\sqrt{m-1}} > \delta \ge \displaystyle \frac{2r}{\sqrt{m}}\) for some \(m \in \mathbb {N}\) with \(m \le n_x - m_E\) and \(2 \le m \le 18\), then let

    $$\begin{aligned} N = \left( \displaystyle \sum _{i=0}^{m-1}{2^i \left( {\begin{array}{c}n_x - m_E\\ i\end{array}}\right) } + 2\left( n_x - m_E \right) {\lceil }\displaystyle \frac{m-9}{9}{\rceil } \right) \displaystyle \prod _{k \in K}{M_k}. \end{aligned}$$
  3. Otherwise, let

    $$\begin{aligned} N =&{\lceil }2\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( \frac{1}{2} - \frac{1}{\beta ^*} \right) } \gamma ^{-\frac{1}{2}}{\rceil }^{n_x - m_E - 1} \Bigg ( {\lceil }2\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( \frac{1}{2} - \frac{1}{\beta ^*} \right) } \gamma ^{-\frac{1}{2}}{\rceil } + \Bigg . \\&\Bigg . \quad \quad \quad 2\left( n_x - m_E \right) {\lceil }(\sqrt{2}-1)\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( \frac{1}{2} - \frac{1}{\beta ^*} \right) } \gamma ^{-\frac{1}{2}}{\rceil } \Bigg ) \displaystyle \prod _{k \in K}{M_k}. \end{aligned}$$

Then, N is an upper bound on the number of boxes of width \(\delta \) required to cover \(\hat{X}_{5}\).

Proof

The proof is similar to the proof of Corollary 2, and is therefore omitted. \(\square \)
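The three-way case analysis of Corollary 3 is mechanical and can be transcribed directly. The sketch below treats all problem constants (\(\tau ^*\), \(\beta ^*\), \(\gamma \), and the \(M_k\)) as assumed sample inputs, not values fixed by the text.

```python
import math

def boxes_cor3(eps, tau, beta, gamma, n_x, m_E, M):
    """Upper bound N from Corollary 3; M lists the constants M_k, and only
    those with M_k > 1 contribute to the product over K."""
    delta = (eps / tau) ** (1.0 / beta)
    r = math.sqrt(2.0 * eps / gamma)
    prod_K = math.prod(Mk for Mk in M if Mk > 1.0)
    n = n_x - m_E                               # number of independent variables
    if delta >= 2.0 * r:                        # case 1
        return prod_K
    for m in range(2, min(n, 18) + 1):          # case 2
        if 2.0 * r / math.sqrt(m - 1) > delta >= 2.0 * r / math.sqrt(m):
            return (sum(2**i * math.comb(n, i) for i in range(m))
                    + 2 * n * math.ceil((m - 9) / 9)) * prod_K
    # case 3
    c = 2.0 * tau ** (1.0 / beta) * eps ** (0.5 - 1.0 / beta) / math.sqrt(gamma)
    return (math.ceil(c) ** (n - 1)
            * (math.ceil(c) + 2 * n * math.ceil((math.sqrt(2.0) - 1.0) * c / 2.0))
            * prod_K)
```

For example, a large \(\gamma \) (sharp growth of the lower bound) forces case 1, where the count reduces to the product over K alone.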

The next result refines the analysis of Lemma 8 further, in part by accounting for the fact that f grows linearly around \(\mathbf {x}^*\) in the direction of its gradient.

Lemma 10

Consider Problem (P), and suppose the assumptions of Lemma 8 hold. Then \(\exists \hat{\alpha } \in (0,\alpha ]\) and constants \(\rho _1, \rho _2 \ge 0\) such that the region \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_5\) can be conservatively approximated by

$$\begin{aligned} \hat{X}_5 = \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : \gamma {||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon , \, -\rho _1\varepsilon \le {\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \mathbf {x}- \mathbf {x}^* \right) \le \rho _2\varepsilon \right\} . \end{aligned}$$

Proof

See “Proof of Lemma 10 in Appendix”. \(\square \)

The previous lemma can be used to obtain a less conservative estimate of the number of boxes of width \(\delta \) required to cover \(\hat{X}_5\) when \(\varepsilon \) is sufficiently small and the convergence order \(\beta ^* > 1\). This is presented in the following corollary of Theorem 3, whose upper bound scales as \(O\left( \varepsilon ^{(n_x-1) \left( \frac{1}{2}-\frac{1}{\beta ^*} \right) } \right) \), in contrast to the scaling \(O\left( \varepsilon ^{n_x \left( \frac{1}{2}-\frac{1}{\beta ^*} \right) } \right) \) from Theorem 3.

Corollary 4

Suppose the assumptions of Lemma 10 hold. Let \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\) and \(r = \displaystyle \sqrt{\displaystyle \frac{2\varepsilon }{\gamma }}\). Suppose \(\beta ^* > 1\), \(\varepsilon \) is sufficiently small that \((\rho _1 + \rho _2) \varepsilon \ll \delta \), and \(\nabla f(\mathbf {x}^*) \ne \mathbf {0}\).

  1. If \(\delta \ge 2r\), let \(N = 1\).

  2. If \(\displaystyle \frac{2r}{\sqrt{m-1}} > \delta \ge \displaystyle \frac{2r}{\sqrt{m}}\) for some \(m \in \mathbb {N}\) with \(m \le n_x - 1\) and \(2 \le m \le 18\), then let

    $$\begin{aligned} N = \displaystyle \sum _{i=0}^{m-1}{2^i \left( {\begin{array}{c}n_x - 1\\ i\end{array}}\right) } + 2\left( n_x - 1 \right) {\lceil }\displaystyle \frac{m-9}{9}{\rceil }. \end{aligned}$$
  3. Otherwise, let

    $$\begin{aligned} N =&{\lceil }2\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( \frac{1}{2} - \frac{1}{\beta ^*} \right) } \gamma ^{-\frac{1}{2}}{\rceil }^{n_x - 2} \Bigg ( {\lceil }2\left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( \frac{1}{2} - \frac{1}{\beta ^*} \right) } \gamma ^{-\frac{1}{2}}{\rceil } \\&+2\left( n_x \!- \!1 \right) {\lceil }(\sqrt{2}\!-\!1) \left( \tau ^* \right) ^{\frac{1}{\beta ^*}} \varepsilon ^{\left( \frac{1}{2} \!-\! \frac{1}{\beta ^*} \right) } \gamma ^{-\frac{1}{2}}{\rceil } \Bigg ). \end{aligned}$$

Then, N is an upper bound on the number of boxes of width \(\delta \) required to cover \(\hat{X}_{5}\).

Proof

We have from Lemma 10 that \(\hat{X}_5\) is conservatively estimated by a sphere of radius \(O(\displaystyle \sqrt{\varepsilon })\) truncated by the hyperplanes \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \mathbf {x}- \mathbf {x}^* \right) \le \rho _2\varepsilon \) and \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \left( \mathbf {x}- \mathbf {x}^* \right) \ge -\rho _1\varepsilon \). Therefore, when \(\varepsilon \) is chosen to be small enough that \((\rho _1 + \rho _2)\varepsilon \ll \delta \), the desired result follows from Theorem 3 and the fact that any covering of the projection of \(\hat{X}_5\) onto the subspace perpendicular to \(\nabla f(\mathbf {x}^*)\) with boxes of width \(\delta \) can be directly extended to cover \(\hat{X}_5\) without using additional boxes. \(\square \)

Note that Corollary 4 can also be extended to the case when \(0 < \beta ^* \le 1\), in which case the estimate N may additionally depend on the values of \(\rho _1\) and \(\rho _2\).
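The geometry behind Corollary 4 (a ball truncated to a thin slab normal to \(\nabla f(\mathbf {x}^*)\) needs roughly one dimension fewer boxes) can be checked with a small grid count. All widths and radii below are assumed illustrative values, with the gradient direction taken along the first coordinate axis.

```python
import itertools
import math

def count_covering_boxes(delta, r, slab=None, dim=3):
    """Count axis-aligned boxes of width delta, on a grid centered at the
    origin, that intersect the closed ball of radius r; optionally the ball
    is first truncated to the slab |x_1| <= slab (the gradient direction)."""
    k = math.ceil(r / delta)
    count = 0
    for idx in itertools.product(range(-k - 1, k + 1), repeat=dim):
        center = [(i + 0.5) * delta for i in idx]
        # closest point of the box to the origin, coordinate-wise
        nearest = [max(abs(c) - delta / 2.0, 0.0) for c in center]
        if math.hypot(*nearest) > r:
            continue                                   # box misses the ball
        if slab is not None and abs(center[0]) - delta / 2.0 > slab:
            continue                                   # box misses the slab
        count += 1
    return count

full = count_covering_boxes(0.25, 1.0, dim=2)
cut = count_covering_boxes(0.25, 1.0, slab=0.01, dim=2)
print(full, cut)   # → 68 20
```

With slab width much smaller than \(\delta \), the surviving boxes form a single layer normal to the gradient direction, mirroring the reduction of the exponent from \(n_x\) to \(n_x - 1\).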

3.2 Estimates for the number of boxes required to cover \(X_3 \backslash B_{\delta }\)

This section assumes that Problem (P) has a finite number of global minimizers and that \(\varepsilon \) is small enough that \(X_3\) is guaranteed to be contained in neighborhoods of constrained global minimizers under additional assumptions. An estimate is provided, under suitable assumptions, for the number of boxes of certain widths required to cover a neighborhood of a constrained global minimizer \(\mathbf {x}^*\) that contains the subset of \(X_3\) around \(\mathbf {x}^*\). An estimate for the number of boxes required to cover \(X_3\) can then be obtained by summing these estimates over the set of constrained global minimizers. Throughout this section, we assume that \(\mathbf {x}^*\) is a constrained global minimizer; otherwise \(\exists \alpha > 0\) such that \(\mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap X_3 = \emptyset \). Furthermore, we assume that \(\mathbf {x}^*\) is at the center of a single box \(B_{\delta }\) of width \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\) placed while covering \(\hat{X}_5\) (see Remark 6 for the reason for this assumption).

The first result in this section provides a conservative estimate of the subset of \(X_3\) around a constrained minimizer \(\mathbf {x}^*\) under the following assumption: the infeasible region in some neighborhood of \(\mathbf {x}^*\) can be split into two subregions such that the objective function grows linearly in the first subregion and the measure of infeasibility grows linearly in the second subregion.

Lemma 11

Consider Problem (P). Suppose \(\mathbf {x}^*\) is a constrained minimizer, and the functions f, \(g_j\), \(\forall j \in \mathscr {A}(\mathbf {x}^*)\), and \(h_k\), \(\forall k \in \{1,\ldots ,m_E\}\), are locally Lipschitz continuous on X and directionally differentiable at \(\mathbf {x}^*\). Furthermore, suppose \(\exists \alpha > 0\) and a set \(\mathscr {D}_0\) such that

$$\begin{aligned} L_f= & {} \underset{\mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I}{\inf } f'(\mathbf {x}^*;\mathbf {d})> 0,\\ L_I= & {} \underset{\mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0}{\inf } \max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max } g^{\prime }_j(\mathbf {x}^*;\mathbf {d}), \underset{k \in \{1,\ldots ,m_E\}}{\max }\left|h^{\prime }_k(\mathbf {x}^*;\mathbf {d})\right|\right\} > 0, \end{aligned}$$

where \(\mathscr {D}_I\) is defined as

$$\begin{aligned} \mathscr {D}_I = \left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t > 0 \, : \, (\mathbf {x}^*+t\mathbf {d}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} \right\} . \end{aligned}$$

Then, \(\exists \hat{\alpha } \in (0,\alpha ]\) such that the region

$$\begin{aligned} X^1_3 := \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I, t > 0 \right\} \end{aligned}$$

can be conservatively approximated as

$$\begin{aligned} \hat{X}^1_{3} = \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : L_f{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon ^o \right\} , \end{aligned}$$

and the region

$$\begin{aligned} X^2_3 := \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, t > 0 \right\} \end{aligned}$$

can be conservatively approximated as

$$\begin{aligned} \hat{X}^2_{3} = \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : L_I{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon ^f \right\} . \end{aligned}$$

Furthermore, suppose \(\mathbf {x}^*\) is at the center of a box, \(B_{\delta }\), of width \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\) placed while covering \(\hat{X}_5\). Then, the region

$$\begin{aligned} X^2_3 \backslash B_{\delta } = \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, t > 0 \right\} \backslash B_{\delta } \end{aligned}$$

is conservatively characterized by

$$\begin{aligned} \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \in \left( \displaystyle \frac{L_I}{4}\delta , \varepsilon ^f \right] \right\} \end{aligned}$$

whenever \(L_I\delta < 4\varepsilon ^f\).

Proof

Let \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}}\) with \({||}\mathbf {d}{||}_1 = 1\), \(\mathbf {d}\in \mathscr {D}_0\), and \(t = {||}\mathbf {x}- \mathbf {x}^*{||}_1 > 0\). We have (see Theorem 3.1.2 in [24])

$$\begin{aligned} f(\mathbf {x})&= f(\mathbf {x}^* + t\mathbf {d}) \\&= f(\mathbf {x}^*) + f'(\mathbf {x}^*;(\mathbf {x}- \mathbf {x}^*)) + o({||}\mathbf {x}- \mathbf {x}^*{||}_1) \\&= f(\mathbf {x}^*) + tf'(\mathbf {x}^*;\mathbf {d}) + o(t) \\&\ge f(\mathbf {x}^*) + L_f t + o(t). \end{aligned}$$

Consequently, there exists \(\hat{\alpha }_0 \in (0,\alpha ]\) such that for all \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \left( \mathscr {F}({X})\right) ^{\text {C}}\) with \({||}\mathbf {d}{||}_1 = 1\), \(\mathbf {d}\in \mathscr {D}_0\) and \(t \in [0,\hat{\alpha }_0)\):

$$\begin{aligned} f(\mathbf {x}) \ge f(\mathbf {x}^*) + L_f t + o(t) \ge f(\mathbf {x}^*) + \displaystyle \frac{L_f}{2} t. \end{aligned}$$

Next, consider \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}}\) with \({||}\mathbf {d}{||}_1 = 1\), \(\mathbf {d}\not \in \mathscr {D}_0\), and \(t = {||}\mathbf {x}- \mathbf {x}^*{||}_1 > 0\). We have

$$\begin{aligned}&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ g_j(\mathbf {x}) \right\} , \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|h_k(\mathbf {x}) \right|\right\} \right\} \\ =&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ g_j(\mathbf {x}^* + t\mathbf {d}) \right\} , \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|h_k(\mathbf {x}^* + t\mathbf {d}) \right|\right\} \right\} \\ =&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ t g^{\prime }_j(\mathbf {x}^*;\mathbf {d}) + o(t) \right\} , \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|t h^{\prime }_k(\mathbf {x}^*;\mathbf {d}) + o(t) \right|\right\} \right\} . \end{aligned}$$

Consequently, there exists \(\hat{\alpha }_1 \in (0,\alpha ]\) such that for all \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \left( \mathscr {F}({X})\right) ^{\text {C}}\) with \({||}\mathbf {d}{||}_1 = 1\), \(\mathbf {d}\not \in \mathscr {D}_0\) and \(t \in [0,\hat{\alpha }_1)\):

$$\begin{aligned} d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \ge&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ g_j(\mathbf {x}) \right\} , \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|h_k(\mathbf {x}) \right|\right\} \right\} \\ =&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ t g^{\prime }_j(\mathbf {x}^*;\mathbf {d}) + o(t) \right\} , \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|t h^{\prime }_k(\mathbf {x}^*;\mathbf {d}) + o(t) \right|\right\} \right\} \\ \ge&\displaystyle \frac{L_I}{2} t, \end{aligned}$$

where the first inequality follows from the fact that \({||}\mathbf {z}{||} \ge {||}\mathbf {z}{||}_{\infty }\), \(\forall \mathbf {z}\in \mathbb {R}^{m_I} \times \mathbb {R}^{m_E}\).

Set \(\hat{\alpha } = \min \left\{ \hat{\alpha }_0, \hat{\alpha }_1 \right\} \). Then

$$\begin{aligned} \forall \mathbf {x}\in X^1_3 := \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I, t > 0 \right\} , \end{aligned}$$

we have \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \left( \mathscr {F}({X})\right) ^{\text {C}}\) with \({||}\mathbf {d}{||}_1 = 1\), \(\mathbf {d}\in \mathscr {D}_0\) and \(t = {||}\mathbf {x}- \mathbf {x}^*{||}_1 < \hat{\alpha }\), and

$$\begin{aligned} \varepsilon ^o \ge f(\mathbf {x}) - f(\mathbf {x}^*) \ge \displaystyle \frac{L_f}{2}t \implies L_f t = L_f{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon ^o. \end{aligned}$$

Furthermore,

$$\begin{aligned} \forall \mathbf {x}\in X^2_3 := \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, t > 0 \right\} , \end{aligned}$$

we have \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in \left( \mathscr {F}({X})\right) ^{\text {C}}\) with \({||}\mathbf {d}{||}_1 = 1\), \(\mathbf {d}\not \in \mathscr {D}_0\) and \(t = {||}\mathbf {x}- \mathbf {x}^*{||}_1 < \hat{\alpha }\), and

$$\begin{aligned} \varepsilon ^f \ge d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \ge \displaystyle \frac{L_I}{2}t \implies L_I t = L_I{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon ^f. \end{aligned}$$

Finally, for every \(\mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \{\mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}}:\mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, t > 0\}\) with \(t \le \displaystyle \frac{\delta }{2}\), we have \(\mathbf {x}\in B_{\delta }\). Consequently, for each

$$\begin{aligned} \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, t > 0 \right\} \backslash B_{\delta }, \end{aligned}$$

we have \(t > \displaystyle \frac{\delta }{2}\) and therefore,

$$\begin{aligned} d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \ge \displaystyle \frac{L_I}{2}t > \displaystyle \frac{L_I}{4}\delta . \end{aligned}$$

The desired result follows when \(L_I \delta < 4\varepsilon ^f\); otherwise, if \(L_I \delta \ge 4\varepsilon ^f\), then

$$\begin{aligned} \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, t > 0 \right\} \subset B_{\delta }. \end{aligned}$$

\(\square \)

A conservative estimate of the number of boxes of certain widths required to cover \(\left( \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3\right) \backslash B_{\delta }\) can be obtained by estimating the number of boxes of certain widths required to cover \(\hat{X}^1_3\) and \(\hat{X}^2_3 \backslash B_{\delta }\) (see Theorem 4). The following remark is in order.

Remark 6

  1. Lemma 11 does not hold when \(\not \exists \alpha > 0\), \(\mathscr {D}_0\) such that both \(L_f\) and \(L_I\) are positive. Example 4 illustrates a case in which no valid partition of \(\mathscr {D}_I\) exists (since \([x^{L},0)\), which is a subset of \(X_3\), corresponds to \(d = -1\), which has an empty intersection with every valid choice of \(\mathscr {D}_0\), and \(\nabla g_1(x^*) = 0\)). Note that \(\mathscr {D}_0\) may be chosen to be \(\emptyset \), but it cannot be chosen to be \(\mathscr {D}_I\) when the objective function is differentiable at \(\mathbf {x}^*\). This is because when \(\nabla f(\mathbf {x}^*) \ne \mathbf {0}\), the direction \(-\nabla f(\mathbf {x}^*)\) leads to infeasible points around \(\mathbf {x}^*\). One potential choice of \(\mathscr {D}_0\) is

    $$\begin{aligned} \mathscr {D}_0 = \Bigg \{\mathbf {d}\, : \,&{||}\mathbf {d}{||}_1 = 1, \, \exists t > 0 \, : \, (\mathbf {x}^*+t\mathbf {d}) \in \mathscr {N}^1_{\alpha }(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}}, \\&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ g^{\prime }_j(\mathbf {x}^*;\mathbf {d}) \right\} , \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|h^{\prime }_k(\mathbf {x}^*;\mathbf {d}) \right|\right\} \right\} \le \theta \Bigg \} \end{aligned}$$

    for some choice of \(\theta > 0\), so long as \(\underset{\mathbf {d}\in \mathscr {D}_0}{\inf } f'(\mathbf {x}^*;\mathbf {d}) > 0\). Proposition 4 shows that the assumptions of Lemma 11 will not be satisfied when Problem (P) does not contain any active inequality constraints and the minimizer corresponds to a KKT point for Problem (P).

  2. The inequality \(L_I \delta < 4 \varepsilon ^f\) is equivalent to

    $$\begin{aligned} L_I \delta = L_I \left( \displaystyle \frac{\varepsilon ^f}{\tau ^{I}}\right) ^{\frac{1}{\beta ^I}} < 4\varepsilon ^f. \end{aligned}$$

    Since \(\varepsilon ^f\) can be taken to be sufficiently small, the above inequality holds only when

    $$\begin{aligned} (\varepsilon ^f)^{\frac{1}{\beta ^I}} \le \varepsilon ^f \iff \beta ^I \le 1, \end{aligned}$$

    i.e., if \(\beta ^I > 1\), we can choose \(\varepsilon ^f\) small enough that \(L_I \delta \ge 4\varepsilon ^f\). Note that if \(L_I \delta \ge 4\varepsilon ^f\), the region

    $$\begin{aligned} \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, t > 0 \right\} \end{aligned}$$

    has already been covered while covering \(\hat{X}_5\) since

    $$\begin{aligned} \displaystyle \frac{L_I \delta }{4} \ge \varepsilon ^f \ge \displaystyle \frac{L_I}{2}t \implies t \le \displaystyle \frac{\delta }{2}, \end{aligned}$$

    which implies \(\mathbf {x}= \mathbf {x}^* + t\mathbf {d}\in B_{\delta }\). The motivation for excluding the region \(B_{\delta }\) from \(X_3\) is as follows. Lemma 2 shows that if the measure of infeasibility, as determined by the distance function d, is strictly greater than \(\varepsilon ^f\) at each point in the domain of a node, the node can be fathomed by a box of width \(\delta \). However, if \(\mathbf {x}^*\) is a constrained minimizer, we will have points in \(X_3\) which are arbitrarily close to \(\mathbf {x}^*\) and have a measure of infeasibility that is arbitrarily close to 0. Such points will then have to be fathomed by boxes of width much smaller than \(\delta \) (and arbitrarily close to 0). To avoid this issue, such points are assumed to be eliminated when \(X_5\) is covered by boxes of width \(\delta \).

  3. \(\hat{\alpha }\) depends on the local behavior of f, \(g_j\), \(\forall j \in \mathscr {A}(\mathbf {x}^*)\), and \(h_k\), \(\forall k \in \{1,\ldots ,m_E\}\), around \(\mathbf {x}^*\), but is independent of \(\varepsilon \). Consequently, for sufficiently small \(\varepsilon \) we have \(\hat{X}^1_{3} = \left\{ \mathbf {x}\in X : L_f{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon ^o \right\} \) and \(\hat{X}^2_{3} = \left\{ \mathbf {x}\in X : L_I{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon ^f \right\} \). Additionally, if f and \(g_j\), \(\forall j \in \mathscr {A}(\mathbf {x}^*)\), are convex on \(\mathscr {N}^1_{\alpha }(\mathbf {x}^*)\) and \(h_k\), \(\forall k \in \{1,\ldots ,m_E\}\), are affine on \(\mathscr {N}^1_{\alpha }(\mathbf {x}^*)\), we can choose \(\hat{\alpha } = \alpha \). Furthermore,

    $$\begin{aligned} X^1_3 := \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I, t > 0 \right\} \end{aligned}$$

    can be conservatively approximated as \(\hat{X}^1_{3} = \left\{ \mathbf {x}\in X : L_f{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le \varepsilon ^o \right\} \),

    $$\begin{aligned} X^2_3 := \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, t > 0 \right\} \end{aligned}$$

    can be conservatively approximated as \(\hat{X}^2_{3} = \left\{ \mathbf {x}\in X : L_I{||}\mathbf {x}- \mathbf {x}^*{||}_1 \le \varepsilon ^f \right\} \), and the region

    $$\begin{aligned} \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + t\mathbf {d}) \in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, t > 0 \right\} \backslash B_{\delta } \end{aligned}$$

    is conservatively characterized by

    $$\begin{aligned} \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \in \left( \displaystyle \frac{L_I}{2}\delta , \varepsilon ^f \right] \right\} \end{aligned}$$

    whenever \(L_I\delta < 2\varepsilon ^f\).

  4. Similar to Proposition 1, the following less conservative estimates of \(X^1_3\) and \(X^2_3\) can be obtained:

    $$\begin{aligned} \hat{X}^1_3 =&\big \{\mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \, : \, L_f {||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon ^o, \, f'(\mathbf {x}^*; \mathbf {x}- \mathbf {x}^*) \ge L_f {||}\mathbf {x}- \mathbf {x}^*{||}_1 \big \}, \\ \hat{X}^2_3 =&\bigg \{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \, : \, L_I {||}\mathbf {x}- \mathbf {x}^*{||}_1 \le 2\varepsilon ^f, \\&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max } g^{\prime }_j(\mathbf {x}^*; \mathbf {x}- \mathbf {x}^*), \underset{k \in \{1,\ldots ,m_E\}}{\max }\left|h^{\prime }_k(\mathbf {x}^*; \mathbf {x}- \mathbf {x}^*)\right|\right\} \ge L_I {||}\mathbf {x}- \mathbf {x}^*{||}_1 \bigg \}. \end{aligned}$$
Fig. 5

Illustration of the sets \(\mathscr {D}_0\) and \(\mathscr {D}_I \backslash \mathscr {D}_0\), the sets \(X^1_3\) and \(X^2_3\), and their estimates \(\hat{X}^1_3\) and \(\hat{X}^2_3\) for Example 2. The dashed lines represent the set X, and the filled-in triangles represent the minimizer \(\mathbf {x}^*\). (Top Plot) The solid region represents the feasible region and the solid vectors represent the gradients of the objective and the constraints. The set of directions between the dot-dashed lines (the part in which the feasible region resides) defines the set \(\mathscr {D}_0\), and the remaining directions define the set \(\mathscr {D}_I \backslash \mathscr {D}_0\). The dotted line represents the direction in \(\mathscr {D}_I \backslash \mathscr {D}_0\) in which both constraints grow equally quickly in a first-order sense. (Other Plots) The solid regions represent the set \(X^1_3\) or \(X^2_3\), the area between the dotted lines represents the estimate \(\hat{X}^1_3\) or \(\hat{X}^2_3\), and the dash-dotted lines represent the axes translated to \(\mathbf {x}^*\). All plots use \(\varepsilon ^o = 0.03\) and \(\varepsilon ^f = 0.05\). a Illustration of the sets \(\mathscr {D}_0\) and \(\mathscr {D}_I \backslash \mathscr {D}_0\), b \(X^1_3\) and estimate \(\hat{X}^1_3\) from Lemma 11, c \(X^2_3\) and estimate \(\hat{X}^2_3\) from Lemma 11, d \(X^1_3\) and estimate \(\hat{X}^1_3\) from Remark 6, e \(X^2_3\) and estimate \(\hat{X}^2_3\) from Remark 6

As an illustration of the application of Lemma 11, let us reconsider Example 2. Recall that \(X = (2.2,2.5) \times (2.9,3.3)\), \(m_I = 3\), \(m_E = 0\), \(f(\mathbf {x}) = -x_1 - x_2\), \(g_1(\mathbf {x}) = x_2 - 2x^4_1 + 8x^3_1 - 8x^2_1 - 2\), \(g_2(\mathbf {x}) = x_2 - 4x^4_1 + 32x^3_1 - 88x^2_1 + 96x_1 - 36\), and \(g_3(\mathbf {x}) = 3 - x_2\) with \(\mathbf {x}^* \approx (2.33,3.18)\). Let \(\varepsilon ^o \le 0.03\) and \(\varepsilon ^f \le 0.05\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x}\in X : \mathbf {g}(\mathbf {x}) \le \mathbf {0} \right\} \), \(\nabla f(\mathbf {x}^*) = (-1,-1)\), \(\nabla g_1(\mathbf {x}^*) \approx (-8.164,1)\), and \(\nabla g_2(\mathbf {x}^*) \approx (4.700,1)\). Choose \(\alpha = +\infty \). \(\mathscr {D}_I = \left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t > 0 \, : \, (\mathbf {x}^* + t\mathbf {d}) \in \left( \mathscr {F}({X})\right) ^{\text {C}} \right\} \). Choose \(\mathscr {D}_0 = \left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}\ge 0.298 \right\} \) and \(\hat{\alpha } = +\infty \) in Lemma 11. From Lemma 11 and Remark 6, we have \(L_f = 0.298\) and \(L_I = 1\) with the estimates \(\hat{X}^1_{3} = \left\{ \mathbf {x} : 0.298{||}\mathbf {x}-\mathbf {x}^*{||}_1 \le \varepsilon ^o \right\} \) (since f is convex), and \(\hat{X}^2_{3} = \left\{ \mathbf {x} : {||}\mathbf {x}-\mathbf {x}^*{||}_1 \le 2\varepsilon ^f \right\} \). Figure 5 illustrates the set \(\mathscr {D}_0\), and plots the sets \(X^1_3\) and \(X^2_3\) along with their estimates \(\hat{X}^1_3\) and \(\hat{X}^2_3\) for \(\varepsilon ^o = 0.03\) and \(\varepsilon ^f = 0.05\).
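The gradients quoted above can be checked by finite differences; since \(\mathbf {x}^* \approx (2.33,3.18)\) is rounded, the numerical values at the rounded point differ slightly from the quoted \((-8.164,1)\) and \((4.700,1)\).

```python
def grad_fd(fun, x, h=1e-6):
    """Central finite-difference gradient (a numerical check only)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((fun(xp) - fun(xm)) / (2.0 * h))
    return g

g1 = lambda x: x[1] - 2*x[0]**4 + 8*x[0]**3 - 8*x[0]**2 - 2
g2 = lambda x: x[1] - 4*x[0]**4 + 32*x[0]**3 - 88*x[0]**2 + 96*x[0] - 36
x_star = [2.33, 3.18]          # the rounded minimizer quoted in the text
print(grad_fd(g1, x_star))     # close to (-8.164, 1); about (-8.181, 1) at the rounded point
print(grad_fd(g2, x_star))     # close to (4.700, 1); about (4.705, 1) at the rounded point
```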

The next result provides conditions under which the assumptions of Lemma 11 will not be satisfied. In particular, it is shown that the assumptions of Lemma 11 will not be satisfied if Problem (P) is purely equality-constrained and all the functions in Problem (P) are differentiable at a nonisolated constrained minimizer \(\mathbf {x}^*\).

Proposition 4

Consider Problem (P) with \(m_E \ge 1\). Suppose \(\mathbf {x}^*\) is a nonisolated constrained minimizer, f is differentiable at \(\mathbf {x}^*\), functions \(h_k\), \(k = 1,\ldots ,m_E\), are differentiable at \(\mathbf {x}^*\), and \(\mathscr {A}(\mathbf {x}^*) = \emptyset \). Furthermore, suppose there exist multipliers \(\varvec{\lambda }^* \in \mathbb {R}^{m_E}\) corresponding to the equality constraints such that \((\mathbf {x}^*,\mathbf {0},\varvec{\lambda }^*)\) is a KKT point. Then \(\not \exists \alpha > 0\), \(\mathscr {D}_0\) such that the assumptions of Lemma 11 are satisfied.

Proof

See “Proof of Proposition 4 in Appendix”. \(\square \)

The above result can be extended to the case when there exist active inequality constraints if all such constraints are strongly active at \(\mathbf {x}^*\) (see [2, Section 4.4]) and there exists \(\mathbf {d}\in T(\mathbf {x}^*)\) such that \({\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}= 0\).

Next, we revisit two equality-constrained examples from Sect. 3.1 for which the assumptions of Lemma 11 hold, and which each violate at least one assumption of Proposition 4. Consider Example 7, and recall that \(X = (-2,2) \times (-2,2)\), \(m_I = 1\), and \(m_E = 1\) with \(f(\mathbf {x}) = x_1 + 10x^2_2\), \(g_1(\mathbf {x}) = x_1 - 1\), \(h(\mathbf {x}) = x_1 - {|}x_2{|}\), and \(\mathbf {x}^* = (0,0)\). Let \(\varepsilon ^o, \varepsilon ^f \le 0.25\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x}\in X : x_1 = {|}x_2{|}, x_1 \le 1 \right\} \), \(\nabla f(\mathbf {x}^*) = (1,0)\), and \(h^{\prime }(\mathbf {x}^*;\mathbf {d}) = d_1 - {|}d_2{|}\). Choose \(\alpha = +\infty \). We have \(\mathscr {D}_I = \left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t > 0 \, : \, (\mathbf {x}^* + t\mathbf {d}) \in \left( \mathscr {F}({X})\right) ^{\text {C}} \right\} \). Choose \(\mathscr {D}_0 = \left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}\ge 0.25 \right\} \) and \(\hat{\alpha } = +\infty \) in Lemma 11. From Lemma 11 and Remark 6, we have \(L_f = 0.25\) and \(L_I = 0.5\) with the estimates \(\hat{X}^1_{3} = \left\{ \mathbf {x} : 0.25{||}\mathbf {x}-\mathbf {x}^*{||}_1 \le \varepsilon ^o \right\} \) (since f is convex), and \(\hat{X}^2_{3} = \left\{ \mathbf {x} : 0.5{||}\mathbf {x}-\mathbf {x}^*{||}_1 \le 2\varepsilon ^f \right\} \).
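The constants \(L_f = 0.25\) and \(L_I = 0.5\) for Example 7 can be confirmed by a brute-force sweep over directions with \({||}\mathbf {d}{||}_1 = 1\); the sketch below ignores the membership condition \(\mathbf {d}\in \mathscr {D}_I\), which does not change the infima for this example.

```python
# f(x) = x1 + 10*x2**2, so grad f(x*) = (1, 0) and f'(x*; d) = d1;
# h(x) = x1 - |x2|, so h'(x*; d) = d1 - |d2|.
n = 200000
Lf, LI = float("inf"), float("inf")
for i in range(n + 1):
    d1 = -1.0 + 2.0 * i / n
    d2_abs = 1.0 - abs(d1)            # |d2| is forced by ||d||_1 = 1
    if d1 >= 0.25:                    # directions in the chosen D_0
        Lf = min(Lf, d1)
    else:                             # directions outside D_0
        LI = min(LI, abs(d1 - d2_abs))
print(Lf, LI)   # → 0.25 and a value just above 0.5
```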

Consider Example 8, and recall that \(X = (-2,2) \times (-2,2)\), \(m_I = 4\), and \(m_E = 1\) with \(f(\mathbf {x}) = x_1 + x_2\), \(g_1(\mathbf {x}) = -x_1\), \(g_2(\mathbf {x}) = -x_2\), \(g_3(\mathbf {x}) = x_1 - 1\), \(g_4(\mathbf {x}) = x_2 - 1\), \(h(\mathbf {x}) = x_2 - x^3_1\), and \(\mathbf {x}^* = (0,0)\). Let \(\varepsilon ^o, \varepsilon ^f \le \frac{1}{3}\). \(\mathscr {F}(X) = \left\{ \mathbf {x}\in [0,1]^2 : x_2 = x^3_1 \right\} \), \(\nabla f(\mathbf {x}^*) = (1,1)\), \(\nabla g_1(\mathbf {x}^*) = (-1,0)\), \(\nabla g_2(\mathbf {x}^*) = (0,-1)\), and \(\nabla h(\mathbf {x}^*) = (0,1)\). Choose \(\alpha = +\infty \). \(\mathscr {D}_I = \left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, \exists t > 0 \, : \, (\mathbf {x}^* + t\mathbf {d}) \in \left( \mathscr {F}({X})\right) ^{\text {C}} \right\} \). Choose \(\mathscr {D}_0 = \left\{ \mathbf {d} : {||}\mathbf {d}{||}_1 = 1, \, {\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}\ge \frac{1}{3} \right\} \) and \(\hat{\alpha } = +\infty \) in Lemma 11. From Lemma 11 and Remark 6, we have \(L_f = \frac{1}{3}\) and \(L_I = \frac{1}{3}\) with the estimates \(\hat{X}^1_{3} = \left\{ \mathbf {x} : {||}\mathbf {x}-\mathbf {x}^*{||}_1 \le 3\varepsilon ^o \right\} \) (since f is convex), and \(\hat{X}^2_{3} = \left\{ \mathbf {x} : {||}\mathbf {x}-\mathbf {x}^*{||}_1 \le 3\varepsilon ^f \right\} \) (since \(g_1\) and \(g_2\) are convex).
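An analogous sweep confirms \(L_I = \frac{1}{3}\) for Example 8, using the directional derivatives at \(\mathbf {x}^* = (0,0)\): \(g^{\prime }_1(\mathbf {x}^*;\mathbf {d}) = -d_1\), \(g^{\prime }_2(\mathbf {x}^*;\mathbf {d}) = -d_2\), and \(h^{\prime }(\mathbf {x}^*;\mathbf {d}) = d_2\). As before, the \(\mathscr {D}_I\) membership condition is ignored in this sketch.

```python
# grad f(x*) = (1, 1); sweep directions with ||d||_1 = 1 outside D_0.
best = float("inf")
n = 200000
for i in range(n + 1):
    d1 = -1.0 + 2.0 * i / n
    for d2 in (1.0 - abs(d1), abs(d1) - 1.0):    # the two signs with ||d||_1 = 1
        if d1 + d2 >= 1.0 / 3.0:                 # these directions lie in D_0
            continue
        # g1'(x*; d) = -d1, g2'(x*; d) = -d2, h'(x*; d) = d2
        best = min(best, max(-d1, -d2, abs(d2)))
print(best)   # → a value just above 1/3
```

The infimum is approached along \(d_1 \rightarrow \frac{2}{3}\), \(d_2 \rightarrow -\frac{1}{3}\), the boundary of the chosen \(\mathscr {D}_0\).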

The next example illustrates a simple one-dimensional case which satisfies the assumptions of Lemma 11 with \(\mathscr {D}_0 = \emptyset \).

Example 11

Let \(\varepsilon ^f \le 0.5\), \(X = (-2,2)\), \(m_I = 2\), and \(m_E = 0\) with \(f(x) = x^3\), \(g_1(x) = x-1\), \(g_2(x) = -x\), and \(x^* = 0\). We have \(\mathscr {F}(X) = [0,1]\), \(\nabla f(x^*) = 0\), \(\nabla g_2(x^*) = -1\), and \(X_3 = [-\varepsilon ^f,0)\). Choose \(\alpha = +\infty \). We have \(\mathscr {D}_I = \{-1\}\). Choose \(\mathscr {D}_0 = \emptyset \) and \(\hat{\alpha } = +\infty \) in Lemma 11. From Lemma 11 and Remark 6, we have \(L_I = 1\) and \(\hat{X}^2_{3} = [-\varepsilon ^f, +\varepsilon ^f]\) (since \(g_2\) is convex).
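For Example 11 the measure of infeasibility can be evaluated in closed form, which makes the estimate \(\hat{X}^2_3\) easy to verify numerically (a sketch; the sample points are arbitrary).

```python
def infeas(x):
    """Distance of (g1(x), g2(x)) = (x - 1, -x) to R^2_- for Example 11;
    at most one constraint is violated at a time, so the max suffices."""
    return max(0.0, x - 1.0, -x)

eps_f = 0.5
samples = [-0.5 + 0.001 * k for k in range(500)]     # sample X_3 = [-eps_f, 0)
assert all(0.0 < infeas(x) <= eps_f for x in samples)
assert all(abs(x - 0.0) <= eps_f for x in samples)   # inside the estimate of Lemma 11
print("X_3 lies inside {x : |x - x*| <= eps_f}")
```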

The following result follows from Corollary 2.1 in [28] (also see the proof of Theorem 2). It provides a conservative estimate of the number of boxes of certain widths required to cover \(\hat{X}^1_3\) and \(\hat{X}^2_3 \backslash B_{\delta }\) from Lemma 11. Therefore, from Lemmas 2 and 3 and the result below, we obtain an upper bound on the worst-case number of boxes required to cover \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3\) and an estimate of the extent of the cluster problem on that region.

Theorem 4

Suppose the assumptions of Lemma 11 hold. Let \(\delta = \delta _f = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}} = \left( \displaystyle \frac{\varepsilon ^o}{\tau ^f}\right) ^{\frac{1}{\beta ^f}} = \left( \displaystyle \frac{\varepsilon ^f}{\tau ^I}\right) ^{\frac{1}{\beta ^I}}\), \(\delta _I = \left( \displaystyle \frac{L_I\delta }{4\tau ^I}\right) ^{\frac{1}{\beta ^I}} = \left( \displaystyle \frac{L_I}{4\tau ^I}\right) ^{\frac{1}{\beta ^I}} \left( \displaystyle \frac{\varepsilon ^f}{\tau ^I}\right) ^{\frac{1}{(\beta ^I)^2}}\), \(r_I = \displaystyle \frac{2\varepsilon ^f}{L_I}\), \(r_f = \displaystyle \frac{2\varepsilon ^o}{L_f}\).

  1.

    If \(\delta _I \ge 2r_I\), let \(N_I = 1\).

  2.

    If \(\displaystyle \frac{2r_I}{\bar{m}_I-1} > \delta _I \ge \displaystyle \frac{2r_I}{\bar{m}_I}\) for some \(\bar{m}_I \in \mathbb {N}\) with \(\bar{m}_I \le n_x\) and \(2 \le \bar{m}_I \le 5\), then let

    $$\begin{aligned} N_I = \displaystyle \sum _{i=0}^{\bar{m}_I-1}{2^i \left( {\begin{array}{c}n_x\\ i\end{array}}\right) } + 2n_x{\lceil }\displaystyle \frac{\bar{m}_I-3}{3}{\rceil }. \end{aligned}$$
  3.

    Otherwise, let

    $$\begin{aligned} N_I = {\lceil }2B_I(\varepsilon ^f;\beta ^I,L_I,\tau ^I){\rceil }^{n_x-1} \left( {\lceil }2B_I(\varepsilon ^f;\beta ^I,L_I,\tau ^I){\rceil } + 2n_x {\lceil }B_I(\varepsilon ^f;\beta ^I,L_I,\tau ^I){\rceil } \right) , \end{aligned}$$

    where

    $$\begin{aligned} B_I(\varepsilon ^f;\beta ^I,L_I,\tau ^I) := 4^{\frac{1}{\beta ^I}} \left( \tau ^I \right) ^{\left( \frac{1}{\beta ^I}+\frac{1}{\left( \beta ^I\right) ^2}\right) } \left( \varepsilon ^f\right) ^{\left( 1 - \frac{1}{\left( \beta ^I\right) ^2}\right) } L^{-\left( 1 + \frac{1}{\beta ^I}\right) }_I. \end{aligned}$$
  4.

    If \(\delta _f \ge 2r_f\), let \(N_f = 1\).

  5.

    If \(\displaystyle \frac{2r_f}{m_f-1} > \delta _f \ge \displaystyle \frac{2r_f}{m_f}\) for some \(m_f \in \mathbb {N}\) with \(m_f \le n_x\) and \(2 \le m_f \le 5\), then let

    $$\begin{aligned} N_f = \displaystyle \sum _{i=0}^{m_f-1}{2^i \left( {\begin{array}{c}n_x\\ i\end{array}}\right) } + 2n_x{\lceil }\displaystyle \frac{m_f-3}{3}{\rceil }. \end{aligned}$$
  6.

    Otherwise, let

    $$\begin{aligned} N_f= & {} {\lceil }2\left( \tau ^f \right) ^{\frac{1}{\beta ^f}} \left( \varepsilon ^o\right) ^{\left( 1 - \frac{1}{\beta ^f} \right) } L^{-1}_f{\rceil }^{n_x-1} \left( {\lceil }2\left( \tau ^f \right) ^{\frac{1}{\beta ^f}} \left( \varepsilon ^o\right) ^{\left( 1 - \frac{1}{\beta ^f} \right) } L^{-1}_f{\rceil } + 2n_x {\lceil }\left( \tau ^f \right) ^{\frac{1}{\beta ^f}} \left( \varepsilon ^o\right) ^{\left( 1 - \frac{1}{\beta ^f} \right) } L^{-1}_f{\rceil } \right) . \end{aligned}$$

Then, \(N_I\) is an upper bound on the number of boxes of width \(\delta _I\) required to cover \(\hat{X}^2_{3} \backslash B_{\delta }\), and \(N_f\) is an upper bound on the number of boxes of width \(\delta _f\) required to cover \(\hat{X}^1_{3}\).

Proof

The result on \(N_f\) follows from Lemmas 3 and 11 and Corollary 2.1 in [28] (also see the proof of Theorem 2). To deduce the result on \(N_I\), note that we cover \(\hat{X}^2_{3} \backslash B_{\delta }\) with boxes of width \(\delta _I = \left( \displaystyle \frac{L_I\delta }{4\tau ^I}\right) ^{\frac{1}{\beta ^I}}\) since, from Lemma 11, we have

$$\begin{aligned} \hat{X}^2_{3} \backslash B_{\delta } \subset \left\{ \mathbf {x}\in \mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \in \left( \displaystyle \frac{L_I}{4}\delta , \varepsilon ^f \right] \right\} \end{aligned}$$

and, from Lemma 2, we have that a box \(B_{\delta _I}\) of width \(\delta _I\) with each \(\mathbf {x}\in B_{\delta _I}\) satisfying \(d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) > \displaystyle \frac{L_I}{4}\delta \) can be fathomed by infeasibility. The desired result then follows from Corollary 2.1 in [28]. \(\square \)
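The case analysis of Theorem 4 is mechanical, which makes it easy to tabulate \(N_I\) for concrete parameter values. The following sketch is our illustrative Python transcription (the function name `N_I_bound` and the sample inputs are not from the paper); analogous code covers \(N_f\):

```python
import math

def N_I_bound(eps_f, beta_I, L_I, tau_I, n_x):
    """Evaluate the worst-case box count N_I from Theorem 4 (illustrative)."""
    delta = (eps_f / tau_I) ** (1.0 / beta_I)
    delta_I = (L_I * delta / (4.0 * tau_I)) ** (1.0 / beta_I)
    r_I = 2.0 * eps_f / L_I
    if delta_I >= 2.0 * r_I:                        # case 1: one box suffices
        return 1
    for m in range(2, min(n_x, 5) + 1):             # case 2
        if 2.0 * r_I / (m - 1) > delta_I >= 2.0 * r_I / m:
            return (sum(2 ** i * math.comb(n_x, i) for i in range(m))
                    + 2 * n_x * math.ceil((m - 3) / 3))
    # case 3: note that 2 * B_I equals the ratio r_I / delta_I
    B = (4.0 ** (1.0 / beta_I)
         * tau_I ** (1.0 / beta_I + 1.0 / beta_I ** 2)
         * eps_f ** (1.0 - 1.0 / beta_I ** 2)
         * L_I ** -(1.0 + 1.0 / beta_I))
    return (math.ceil(2 * B) ** (n_x - 1)
            * (math.ceil(2 * B) + 2 * n_x * math.ceil(B)))

# first-order scheme (beta_I = 1): the bound is independent of eps_f
assert N_I_bound(0.01, 1, 1.0, 1.0, 2) == N_I_bound(1e-6, 1, 1.0, 1.0, 2) == 192
# second-order scheme (beta_I = 2): a single box suffices for these inputs
assert N_I_bound(1e-6, 2, 1.0, 1.0, 2) == 1
```

The two assertions illustrate, for one choice of prefactors, the behavior discussed in Remark 7: under the linear-growth assumptions of Lemma 11, first-order convergence already removes the dependence of \(N_I\) on \(\varepsilon ^f\).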

Remark 7

Under the assumptions of Lemma 11, the dependence of \(N_I\) on \(\varepsilon ^f\) disappears when the lower bounding scheme has first-order convergence on \(\mathscr {N}^1_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}}\), i.e., \(\beta ^I = 1\), and the dependence of \(N_f\) on \(\varepsilon ^o\) disappears when the scheme \((f^{cv }_{Z})_{Z \in \mathbb {I}X}\) has first-order convergence on X, i.e., \(\beta ^f = 1\). Therefore, the cluster problem on \(X_3\) can be eliminated even using first-order convergent schemes with sufficiently small prefactors. Note that the dependence of \(N_f\) and \(N_I\) on the prefactors \(\tau ^f\) and \(\tau ^I\), respectively, can be detailed in a manner similar to Table 1 in [29].

The following results illustrate one set of assumptions under which second-order convergence of the lower bounding scheme at infeasible points is sufficient to eliminate the cluster problem on \(X_3 \backslash B_{\delta }\). First, we provide a conservative estimate of the subset of \(X_3\) around a constrained minimizer \(\mathbf {x}^*\) under the following assumption: the infeasible region in some neighborhood of \(\mathbf {x}^*\) can be split into two subregions such that the objective function grows quadratically (or faster) in the first subregion and the measure of infeasibility grows quadratically (or faster) in the second subregion. Note that better estimates of \(X_3\) may be derived either under the (stronger) assumption that the objective function grows linearly in the directions \(\mathscr {D}_0 \cap \mathscr {D}_I\), or under the (stronger) assumption that the measure of infeasibility grows linearly in the directions \(\mathscr {D}_I \backslash \mathscr {D}_0\).

Lemma 12

Consider Problem (P). Suppose \(\mathbf {x}^*\) is a constrained minimizer, functions f, \(g_j\), \(\forall j \in \mathscr {A}(\mathbf {x}^*)\), and \(h_k\), \(\forall k \in \{1,\ldots ,m_E\}\), are twice-differentiable at \(\mathbf {x}^*\), and \(\exists \alpha> 0, \gamma _1> 0, \gamma _2 > 0\) and a set \(\mathscr {D}_0\) such that

$$\begin{aligned}&{\nabla f(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2} {\mathbf {d}}^\mathrm{T} \nabla ^2 f(\mathbf {x}^*) \mathbf {d}\ge \gamma _1 {\mathbf {d}}^\mathrm{T} \mathbf {d}, \quad \forall \mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I,\\&\quad \max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max } \left\{ {\nabla g_j(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2}{\mathbf {d}}^\mathrm{T} \nabla ^2 g_j(\mathbf {x}^*) \mathbf {d}\right\} , \right. \\&\left. \quad \underset{k \in \{1,\ldots ,m_E\}}{\max } \left\{ \left|{\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2}{\mathbf {d}}^\mathrm{T} \nabla ^2 h_k(\mathbf {x}^*) \mathbf {d}\right|\right\} \right\} \ge \gamma _2 {\mathbf {d}}^\mathrm{T} \mathbf {d}, \quad \forall \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0, \end{aligned}$$

where \(\mathscr {D}_I\) is defined as

$$\begin{aligned} \mathscr {D}_I = \left\{ \mathbf {d} : (\mathbf {x}^*+\mathbf {d}) \in \mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} \right\} . \end{aligned}$$

Then, \(\exists \hat{\alpha } \in (0,\alpha ]\) such that the region

$$\begin{aligned} X^1_3 := \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I \right\} \end{aligned}$$

can be conservatively approximated as

$$\begin{aligned} \hat{X}^1_{3} = \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : \gamma _1{||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon ^o \right\} , \end{aligned}$$

and the region

$$\begin{aligned} X^2_3 := \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0 \right\} \end{aligned}$$

can be conservatively approximated as

$$\begin{aligned} \hat{X}^2_{3} = \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : \gamma _2{||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon ^f \right\} . \end{aligned}$$

Furthermore, suppose \(\mathbf {x}^*\) is at the center of a box, \(B_{\delta }\), of width \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}}\) placed while covering \(\hat{X}_5\). Then, the region

$$\begin{aligned} X^2_3 \backslash B_{\delta } = \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0 \right\} \backslash B_{\delta } \end{aligned}$$

is conservatively characterized by

$$\begin{aligned} \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \in \left( \displaystyle \frac{\gamma _2}{8}\delta ^2, \varepsilon ^f \right] \right\} , \end{aligned}$$

whenever \(\gamma _2\delta ^2 < 8\varepsilon ^f\).

Proof

From Lemma 8, we have the existence of \(\hat{\alpha }_0 > 0\) such that

$$\begin{aligned} \mathscr {N}^2_{\hat{\alpha }_0}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }_0}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I \right\} \end{aligned}$$

can be conservatively approximated as \(\left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }_0}(\mathbf {x}^*) : \gamma _1{||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon ^o \right\} \).

Consider \(\mathbf {x}= \mathbf {x}^* + \mathbf {d}\in \mathscr {N}^2_{\alpha }(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}}\) with \(\mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0\). We have

$$\begin{aligned}&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ g_j(\mathbf {x}) \right\} , \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|h_k(\mathbf {x}) \right|\right\} \right\} \\ =&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ g_j(\mathbf {x}^* + \mathbf {d}) \right\} , \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|h_k(\mathbf {x}^* + \mathbf {d}) \right|\right\} \right\} \\ =&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ {\nabla g_j(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2}{\mathbf {d}}^\mathrm{T} \nabla ^2 g_j(\mathbf {x}^*) \mathbf {d}+ o({||}\mathbf {d}{||}^2) \right\} , \right. \\&\quad \quad \left. \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|{\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2}{\mathbf {d}}^\mathrm{T} \nabla ^2 h_k(\mathbf {x}^*) \mathbf {d}+ o({||}\mathbf {d}{||}^2) \right|\right\} \right\} . \end{aligned}$$

Consequently, there exists \(\hat{\alpha }_1 \in (0,\alpha ]\) such that for all \(\mathbf {x}= \mathbf {x}^* + \mathbf {d}\in \left( \mathscr {F}({X})\right) ^{\text {C}}\) with \({||}\mathbf {d}{||} \in [0,\hat{\alpha }_1)\), \(\mathbf {d}\not \in \mathscr {D}_0\):

$$\begin{aligned} d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \ge&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ g_j(\mathbf {x}) \right\} , \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|h_k(\mathbf {x}) \right|\right\} \right\} \\ =&\max \left\{ \underset{j \in \mathscr {A}(\mathbf {x}^*)}{\max }\left\{ {\nabla g_j(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2}{\mathbf {d}}^\mathrm{T} \nabla ^2 g_j(\mathbf {x}^*) \mathbf {d}+ o({||}\mathbf {d}{||}^2) \right\} , \right. \\&\quad \quad \left. \underset{k \in \{1,\ldots ,m_E\}}{\max }\left\{ \left|{\nabla h_k(\mathbf {x}^*)}^\mathrm{T} \mathbf {d}+ \displaystyle \frac{1}{2}{\mathbf {d}}^\mathrm{T} \nabla ^2 h_k(\mathbf {x}^*) \mathbf {d}+ o({||}\mathbf {d}{||}^2) \right|\right\} \right\} \\ \ge&\displaystyle \frac{\gamma _2}{2} {||}\mathbf {d}{||}^2, \end{aligned}$$

where Step 1 follows from the fact that \({||}\mathbf {z}{||} \ge {||}\mathbf {z}{||}_{\infty }\), \(\forall \mathbf {z}\in \mathbb {R}^{m_I} \times \mathbb {R}^{m_E}\).

Choose \(\hat{\alpha } = \min \left\{ \hat{\alpha }_0, \hat{\alpha }_1 \right\} \). The region

$$\begin{aligned} X^1_3 := \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I \right\} \end{aligned}$$

can be conservatively approximated as

$$\begin{aligned} \hat{X}^1_{3}&= \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : \gamma _1{||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon ^o \right\} , \end{aligned}$$

and

$$\begin{aligned} \forall \mathbf {x}\in X^2_3 := \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0 \right\} , \end{aligned}$$

we have \(\mathbf {x}= \mathbf {x}^* + \mathbf {d}\in \left( \mathscr {F}({X})\right) ^{\text {C}}\) with \(\mathbf {d}\not \in \mathscr {D}_0\), \({||}\mathbf {d}{||} < \hat{\alpha }\), and

$$\begin{aligned} \varepsilon ^f \ge d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \ge \displaystyle \frac{\gamma _2}{2}{||}\mathbf {x}- \mathbf {x}^*{||}^2 \implies \gamma _2{||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon ^f. \end{aligned}$$

Finally, for every \(\mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0 \right\} \) with \({||}\mathbf {d}{||} \le \displaystyle \frac{\delta }{2}\), we have \(\mathbf {x}\in B_{\delta }\). Consequently, for each

$$\begin{aligned} \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0 \right\} \backslash B_{\delta }, \end{aligned}$$

we have \({||}\mathbf {d}{||} > \displaystyle \frac{\delta }{2}\) and therefore,

$$\begin{aligned} d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) > \displaystyle \frac{\gamma _2}{8}\delta ^2. \end{aligned}$$

The desired result follows when \(\gamma _2 \delta ^2 < 8\varepsilon ^f\); otherwise, if \(\gamma _2 \delta ^2 \ge 8\varepsilon ^f\), then

$$\begin{aligned} \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0 \right\} \subset B_{\delta }. \end{aligned}$$

\(\square \)

A conservative estimate of the number of boxes of certain widths required to cover \(\left( \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3\right) \backslash B_{\delta }\) can be obtained by estimating the number of boxes of certain widths required to cover \(\hat{X}^1_3\) and \(\hat{X}^2_3 \backslash B_{\delta }\) (see Theorem 5). The following remark is in order.

Remark 8

  1.

    Lemma 12 does not apply when no such \(\alpha \), \(\gamma _1\), \(\gamma _2 > 0\), and \(\mathscr {D}_0\) exist; consider, for example, \(X = (0,2) \times (0,2)\), \(m_I = 0\), \(m_E = 2\), \(f(\mathbf {x}) = -x_1\), \(h_1(\mathbf {x}) = x_2 - (1-x_1)^3\), \(h_2(\mathbf {x}) = -x_2 - (1-x_1)^3\), and \(\mathbf {x}^* = (1,0)\) (see [2, Example 4.3.5]). Note that \(\mathscr {D}_0\) may be chosen to be \(\emptyset \), but it cannot be chosen to be \(\mathscr {D}_I\) (see Remark 6 for an explanation).

  2.

    The inequality \(\gamma _2 \delta ^2 < 8\varepsilon ^f\) is equivalent to

    $$\begin{aligned} \gamma _2 \delta ^2 = \gamma _2 \left( \displaystyle \frac{\varepsilon ^f}{\tau ^I}\right) ^{\frac{2}{\beta ^I}} < 8\varepsilon ^f. \end{aligned}$$

    Since \(\varepsilon ^f\) can be taken to be sufficiently small, the above inequality holds only when

    $$\begin{aligned} (\varepsilon ^f)^{\frac{2}{\beta ^I}} \le \varepsilon ^f \iff \beta ^I \le 2, \end{aligned}$$

    i.e., if \(\beta ^I > 2\), we can choose \(\varepsilon ^f\) to be small enough so that \(\gamma _2 \delta ^2 \ge 8\varepsilon ^f\). Note that if \(\gamma _2 \delta ^2 \ge 8\varepsilon ^f\), the region

    $$\begin{aligned} \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0 \right\} \end{aligned}$$

    has already been covered while covering \(\hat{X}_5\) since

    $$\begin{aligned} \displaystyle \frac{\gamma _2 \delta ^2}{8} \ge \varepsilon ^f \ge \displaystyle \frac{\gamma _2 {||}\mathbf {d}{||}^2}{2} \implies {||}\mathbf {d}{||} \le \displaystyle \frac{\delta }{2}, \end{aligned}$$

    which implies \(\mathbf {x}= \mathbf {x}^* + \mathbf {d}\in B_{\delta }\).

  4.

    \(\hat{\alpha }\) depends on the local behavior of f, \(g_j\), \(\forall j \in \mathscr {A}(\mathbf {x}^*)\), and \(h_k\), \(\forall k \in \{1,\ldots ,m_E\}\), around \(\mathbf {x}^*\), but is independent of \(\varepsilon \). Consequently, for sufficiently small \(\varepsilon \) we have \(\hat{X}^1_{3} = \left\{ \mathbf {x}\in X : \gamma _1{||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon ^o \right\} \) and \(\hat{X}^2_{3} = \left\{ \mathbf {x}\in X : \gamma _2{||}\mathbf {x}- \mathbf {x}^*{||}^2 \le 2\varepsilon ^f \right\} \). Additionally, if the objective function and the active constraints are all either affine or quadratic functions of \(\mathbf {x}\), then their second-order Taylor expansions around \(\mathbf {x}^*\) are exact and we can choose \(\hat{\alpha } = \alpha \). Furthermore,

    $$\begin{aligned} X^1_3 := \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_0 \cap \mathscr {D}_I \right\} \end{aligned}$$

    can be conservatively approximated as \(\hat{X}^1_{3} = \left\{ \mathbf {x}\in X : \gamma _1{||}\mathbf {x}- \mathbf {x}^*{||}^2 \le \varepsilon ^o \right\} \), the region

    $$\begin{aligned} X^2_3 := \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0 \right\} \end{aligned}$$

    can be conservatively approximated as \(\hat{X}^2_{3} = \left\{ \mathbf {x}\in X : \gamma _2{||}\mathbf {x}- \mathbf {x}^*{||}^2 \le \varepsilon ^f \right\} \), and the region

    $$\begin{aligned} \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3 \cap \left\{ \mathbf {x}= (\mathbf {x}^* + \mathbf {d}) \in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}} : \mathbf {d}\in \mathscr {D}_I \backslash \mathscr {D}_0 \right\} \backslash B_{\delta } \end{aligned}$$

    is conservatively characterized by

    $$\begin{aligned} \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \in \left( \displaystyle \frac{\gamma _2}{4}\delta ^2, \varepsilon ^f \right] \right\} \end{aligned}$$

    whenever \(\gamma _2 \delta ^2 < 4\varepsilon ^f\).

  5.

    Similar to Proposition 1, the following less conservative estimates of \(X^1_3\) and \(X^2_3\) can be obtained:

To illustrate the application of Lemma 12, let us reconsider Example 4 with \(\varepsilon ^o, \varepsilon ^f \le 1\). Recall that \(X = (-2,2)\), \(m_I = 3\), \(m_E = 0\), \(f(x) = x\), \(g_1(x) = x^2\), \(g_2(x) = x-1\), and \(g_3(x) = -1-x\) with \(x^* = 0\). We have \(\mathscr {F}(X) = \{ 0 \}\) and \(X_3 = [-\sqrt{\varepsilon ^f}, 0) \cup \left( 0, \min \{ \varepsilon ^o, \sqrt{\varepsilon ^f} \} \right] \). Choose \(\alpha = 1\). We have \(\mathscr {D}_I = (-1,1) \backslash \{0\}\). Choose \(\mathscr {D}_0 = \left\{ d \in \mathscr {D}_I : d > 0 \right\} \), \(\gamma _1 = 1\), \(\gamma _2 = 1\), and \(\hat{\alpha } = 1\) in Lemma 12. From Lemma 12 and Remark 8, we have \(\hat{X}^1_{3} = \left\{ x : x^2 \le \varepsilon ^o \right\} \) and \(\hat{X}^2_{3} = \left\{ x : x^2 \le \varepsilon ^f \right\} \) (since f is linear and \(g_1\) is quadratic). In fact, for this example, we can get a better estimate of \(X^1_3\) by taking into account the fact that f grows linearly on \(\mathscr {D}_0 \cap \mathscr {D}_I\).

Next, we revisit two examples from Sect. 3.1 for which the assumptions of Lemma 12 hold. First, consider Example 9 with \(\varepsilon ^o \le 0.6\), \(\varepsilon ^f \le 0.5\), and recall that \(X = (-2,2) \times (-2,2)\), \(m_I = 2\), and \(m_E = 0\) with \(f(\mathbf {x}) = x_2\), \(g_1(\mathbf {x}) = x^2_1 - x_2\), \(g_2(\mathbf {x}) = x_2 - 1\), and \(\mathbf {x}^* = (0,0)\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x} : x_2 \ge x^2_1, \, x_2 \le 1 \right\} \) and \(X_3 = \left\{ \mathbf {x} : x^2_1 - \varepsilon ^f \le x_2 < x^2_1, x_2 \le \varepsilon ^o \right\} \). Choose \(\alpha = 1\). We have \(\mathscr {D}_I = \left\{ \mathbf {d}\in \mathscr {N}^2_1(\mathbf {0}) : d_2 < d^2_1 \right\} \). Choose \(\mathscr {D}_0 = \left\{ \mathbf {d}\in \mathscr {D}_I : d_2 \ge 0.5 d^2_1 \right\} \), \(\gamma _1 = 0.3\), \(\gamma _2 = 0.25\) and \(\hat{\alpha } = 1\) in Lemma 12. From Lemma 12 and Remark 8, we have \(\hat{X}^1_{3} = \left\{ \mathbf {x}\in \mathscr {N}^2_1(\mathbf {x}^*) : 0.3{||}\mathbf {x}{||}^2 \le \varepsilon ^o \right\} \) and \(\hat{X}^2_{3} = \left\{ \mathbf {x}\in \mathscr {N}^2_1(\mathbf {x}^*) : {||}\mathbf {x}{||}^2 \le 4\varepsilon ^f \right\} \) (since f is linear and \(g_1\) is quadratic).
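These containments can also be verified numerically. The grid check below is our own sanity check (not from the original text), with illustrative tolerances chosen inside the stated ranges; it confirms that every sampled point of \(X_3\) within \(\mathscr {N}^2_1(\mathbf {x}^*)\) for Example 9 lands in the appropriate estimate:

```python
eps_o, eps_f = 0.1, 0.05   # within the stated ranges eps_o <= 0.6, eps_f <= 0.5

def in_X3(x1, x2):
    # infeasible (x2 < x1^2), nearly feasible (x1^2 - x2 <= eps_f),
    # and nearly optimal (f = x2 <= eps_o) points of Example 9
    return x1 ** 2 - eps_f <= x2 < x1 ** 2 and x2 <= eps_o

steps = [i / 200.0 for i in range(-199, 200)]   # grid on (-1, 1)
for x1 in steps:
    for x2 in steps:
        r2 = x1 ** 2 + x2 ** 2
        if not in_X3(x1, x2) or r2 >= 1.0:
            continue
        if x2 >= 0.5 * x1 ** 2:         # direction in D_0 ∩ D_I
            assert 0.3 * r2 <= eps_o    # x lies in the estimate of X^1_3
        else:                           # direction in D_I \ D_0
            assert r2 <= 4.0 * eps_f    # x lies in the estimate of X^2_3
```

Points with \(d_2 \ge 0.5 d^2_1\) are handled by the objective-growth estimate and the rest by the infeasibility-growth estimate, mirroring the split between \(\mathscr {D}_0 \cap \mathscr {D}_I\) and \(\mathscr {D}_I \backslash \mathscr {D}_0\).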

Finally, consider Example 10 with \(\varepsilon ^o, \varepsilon ^f \le 0.1\), and recall that \(X = (-2,2) \times (-2,2)\), \(m_I = 3\), and \(m_E = 0\) with \(f(\mathbf {x}) = 2x^2_1 + x_2\), \(g_1(\mathbf {x}) = -x^2_1 - x_2\), \(g_2(\mathbf {x}) = -x_1\), \(g_3(\mathbf {x}) = x^2_1 + x^2_2 - 1\), and \(\mathbf {x}^* = (0,0)\). We have \(\mathscr {F}(X) = \left\{ \mathbf {x} : x_2 \ge -x^2_1, \, x_1 \ge 0, \, x^2_1 + x^2_2 \le 1 \right\} \) and

$$\begin{aligned} X_3 = \left\{ \mathbf {x}\in X : \sqrt{ \left( \max \{0, -x^2_1 -x_2 \}\right) ^2 + \left( \max \{0, -x_1 \}\right) ^2 + \left( \max \{0, x^2_1 + x^2_2 - 1 \}\right) ^2 } \in (0,\varepsilon ^f], \; 2x^2_1 + x_2 \le \varepsilon ^o \right\} . \end{aligned}$$

Choose \(\alpha = \frac{2}{3}\). We have \(\mathscr {D}_I = \left\{ \mathbf {d}\in \mathscr {N}^2_{\frac{2}{3}}(\mathbf {0}) : (\mathbf {x}^* + \mathbf {d}) \in \left( \mathscr {F}({X})\right) ^{\text {C}} \right\} \). Choose \(\mathscr {D}_0 = \left\{ \mathbf {d}\in \mathscr {D}_I : d_2 \ge -1.5 d^2_1 \right\} \), \(\gamma _1 = 0.25\), \(\gamma _2 = 0.25\), and \(\hat{\alpha } = \frac{2}{3}\) in Lemma 12. From Lemma 12 and Remark 8, we have \(\hat{X}^1_{3} = \left\{ \mathbf {x} : {||}\mathbf {x}{||}^2 \le 4\varepsilon ^o \right\} \) and \(\hat{X}^2_{3} = \left\{ \mathbf {x} : {||}\mathbf {x}{||}^2 \le 4\varepsilon ^f \right\} \) (since f and \(g_1\) are quadratic and \(g_2\) is linear). Figure 6 plots the sets \(X^1_3\) and \(X^2_3\) along with their estimates \(\hat{X}^1_3\) and \(\hat{X}^2_3\) for \(\varepsilon ^o = \varepsilon ^f = 0.1\). The benefit of using the estimates in Remark 8 over that of Lemma 12 is seen from Fig. 6.
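The containments visualized in Fig. 6 can likewise be spot-checked on a grid. The sketch below is our own check (not part of the original text); it samples points of \(X_3\) for Example 10 inside \(\mathscr {N}^2_{2/3}(\mathbf {x}^*)\) and verifies membership in the estimates \(\hat{X}^1_3\) and \(\hat{X}^2_3\) from Remark 8:

```python
import math

eps_o = eps_f = 0.1   # as in Fig. 6

def violation(x1, x2):
    # Euclidean distance of (g1, g2, g3)(x) to the nonpositive orthant
    return math.sqrt(max(0.0, -x1 ** 2 - x2) ** 2
                     + max(0.0, -x1) ** 2
                     + max(0.0, x1 ** 2 + x2 ** 2 - 1.0) ** 2)

steps = [i / 150.0 for i in range(-99, 100)]    # grid inside N^2_{2/3}(x*)
for x1 in steps:
    for x2 in steps:
        r2 = x1 ** 2 + x2 ** 2
        v = violation(x1, x2)
        if r2 >= (2.0 / 3.0) ** 2 or not (0.0 < v <= eps_f) \
                or 2.0 * x1 ** 2 + x2 > eps_o:
            continue                            # not in the region of interest
        if x2 >= -1.5 * x1 ** 2:                # direction in D_0 ∩ D_I
            assert r2 <= 4.0 * eps_o            # estimate of X^1_3
        else:                                   # direction in D_I \ D_0
            assert r2 <= 4.0 * eps_f            # estimate of X^2_3
```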

Fig. 6

Plots of \(X^1_3\) and \(X^2_3\) (solid regions) and their estimates \(\hat{X}^1_3\) and \(\hat{X}^2_3\) (area between the dotted lines) for Example 10. The filled-in triangles correspond to the minimizer \(\mathbf {x}^*\), and the dash-dotted lines represent the axes translated to \(\mathbf {x}^*\). All plots use \(\varepsilon ^o = \varepsilon ^f = 0.1\). a \(X^1_3\) and estimate \(\hat{X}^1_3\) from Lemma 12, b \(X^1_3\) and estimate \(\hat{X}^1_3\) from Remark 8, c \(X^2_3\) and estimate \(\hat{X}^2_3\) from Lemma 12, d \(X^2_3\) and estimate \(\hat{X}^2_3\) from Remark 8

The following result follows from Lemma 3 in [29]. It provides a conservative estimate of the number of boxes of certain widths required to cover \(\hat{X}^1_3\) and \(\hat{X}^2_3 \backslash B_{\delta }\) from Lemma 12. Therefore, from Lemmas 2 and 3 and the result below, we obtain an upper bound on the worst-case number of boxes required to cover \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap X_3\) and an estimate of the extent of the cluster problem on that region.

Theorem 5

Suppose the assumptions of Lemma 12 hold. Let \(\delta = \left( \displaystyle \frac{\varepsilon }{\tau ^*}\right) ^{\frac{1}{\beta ^*}} = \delta _f = \left( \displaystyle \frac{\varepsilon ^o}{\tau ^f}\right) ^{\frac{1}{\beta ^f}} = \left( \displaystyle \frac{\varepsilon ^f}{\tau ^I}\right) ^{\frac{1}{\beta ^I}}\), \(\delta _I = \left( \displaystyle \frac{\gamma _2\delta ^2}{8\tau ^I}\right) ^{\frac{1}{\beta ^I}} = \left( \displaystyle \frac{\gamma _2}{8\tau ^I}\right) ^{\frac{1}{\beta ^I}} \left( \displaystyle \frac{\varepsilon ^f}{\tau ^I}\right) ^{\frac{2}{(\beta ^I)^2}}\), \(r_I = \displaystyle \sqrt{\displaystyle \frac{2\varepsilon ^f}{\gamma _2}}\), \(r_f = \displaystyle \sqrt{\displaystyle \frac{2\varepsilon ^o}{\gamma _1}}\).

  1.

    If \(\delta _I \ge 2r_I\), let \(N_I = 1\).

  2.

    If \(\displaystyle \frac{2r_I}{\sqrt{\bar{m}_I-1}} > \delta _I \ge \displaystyle \frac{2r_I}{\sqrt{\bar{m}_I}}\) for some \(\bar{m}_I \in \mathbb {N}\) with \(\bar{m}_I \le n_x\) and \(2 \le \bar{m}_I \le 18\), then let

    $$\begin{aligned} N_I = \displaystyle \sum _{i=0}^{\bar{m}_I-1}{2^i \left( {\begin{array}{c}n_x\\ i\end{array}}\right) } + 2n_x{\lceil }\displaystyle \frac{\bar{m}_I-9}{9}{\rceil }. \end{aligned}$$
  3.

    Otherwise, let

    $$\begin{aligned} N_I= & {} {\lceil }2B_I(\varepsilon ^f;\beta ^I,\gamma _2,\tau ^I){\rceil }^{n_x-1}\left( {\lceil }2B_I(\varepsilon ^f;\beta ^I,\gamma _2,\tau ^I){\rceil } + 2n_x {\lceil }(\sqrt{2}-1) B_I(\varepsilon ^f;\beta ^I,\gamma _2,\tau ^I){\rceil } \right) , \end{aligned}$$

    where

    $$\begin{aligned} B_I(\varepsilon ^f;\beta ^I,\gamma _2,\tau ^I) := 8^{\frac{1}{\beta ^I}} \left( \tau ^I \right) ^{\left( \frac{1}{\beta ^I}+\frac{2}{\left( \beta ^I\right) ^2}\right) } \left( \varepsilon ^f\right) ^{\left( \frac{1}{2} - \frac{2}{\left( \beta ^I\right) ^2}\right) } \gamma ^{-\left( \frac{1}{2} + \frac{1}{\beta ^I}\right) }_2. \end{aligned}$$
  4.

    If \(\delta _f \ge 2r_f\), let \(N_f = 1\).

  5.

    If \(\displaystyle \frac{2r_f}{\sqrt{m_f-1}} > \delta _f \ge \displaystyle \frac{2r_f}{\sqrt{m_f}}\) for some \(m_f \in \mathbb {N}\) with \(m_f \le n_x\) and \(2 \le m_f \le 18\), then let

    $$\begin{aligned} N_f = \displaystyle \sum _{i=0}^{m_f-1}{2^i \left( {\begin{array}{c}n_x\\ i\end{array}}\right) } + 2n_x{\lceil }\displaystyle \frac{m_f-9}{9}{\rceil }. \end{aligned}$$
  6.

    Otherwise, let

    $$\begin{aligned} N_f =&{\lceil }2\left( \tau ^f \right) ^{\frac{1}{\beta ^f}} \left( \varepsilon ^o\right) ^{\left( \frac{1}{2} - \frac{1}{\beta ^f} \right) } \gamma ^{-\frac{1}{2}}_1{\rceil }^{n_x-1} \left( {\lceil }2\left( \tau ^f \right) ^{\frac{1}{\beta ^f}} \left( \varepsilon ^o\right) ^{\left( \frac{1}{2} - \frac{1}{\beta ^f} \right) } \gamma ^{-\frac{1}{2}}_1{\rceil } \right. \\&\left. +\, 2n_x {\lceil }(\sqrt{2}-1)\left( \tau ^f \right) ^{\frac{1}{\beta ^f}} \left( \varepsilon ^o\right) ^{\left( \frac{1}{2} - \frac{1}{\beta ^f} \right) } \gamma ^{-\frac{1}{2}}_1{\rceil } \right) . \end{aligned}$$

Then, \(N_I\) is an upper bound on the number of boxes of width \(\delta _I\) required to cover \(\hat{X}^2_{3} \backslash B_{\delta }\), and \(N_f\) is an upper bound on the number of boxes of width \(\delta _f\) required to cover \(\hat{X}^1_{3}\).

Proof

The result on \(N_f\) follows from Lemmas 3 and 12, and Lemma 3 in [29]. To deduce the result on \(N_I\), note that we cover \(\hat{X}^2_{3} \backslash B_{\delta }\) with boxes of width \(\delta _I = \left( \displaystyle \frac{\gamma _2\delta ^2}{8\tau ^I}\right) ^{\frac{1}{\beta ^I}}\) since, from Lemma 12, we have

$$\begin{aligned} \hat{X}^2_{3} \backslash B_{\delta } \subset \left\{ \mathbf {x}\in \mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) : d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) \in \left( \displaystyle \frac{\gamma _2}{8}\delta ^2, \varepsilon ^f \right] \right\} \end{aligned}$$

and, from Lemma 2, we have that a box \(B_{\delta _I}\) of width \(\delta _I\) with each \(\mathbf {x}\in B_{\delta _I}\) satisfying \(d\left( \begin{bmatrix} \mathbf {g}\\ \mathbf {h}\end{bmatrix}(\mathbf {x}) ,\mathbb {R}^{m_I}_{-} \times \{\mathbf {0}\} \right) > \displaystyle \frac{\gamma _2}{8}\delta ^2\) can be fathomed by infeasibility. The desired result then follows from Lemma 3 in [29]. \(\square \)

Remark 9

  1.

    Under the assumptions of Lemma 12, the dependence of \(N_I\) on \(\varepsilon ^f\) disappears when the lower bounding scheme has second-order convergence on \(\mathscr {N}^2_{\hat{\alpha }}(\mathbf {x}^*) \cap \left( \mathscr {F}({X})\right) ^{\text {C}}\), i.e., \(\beta ^I = 2\), and the dependence of \(N_f\) on \(\varepsilon ^o\) disappears when the scheme \((f^{cv }_{Z})_{Z \in \mathbb {I}X}\) has second-order convergence on X, i.e., \(\beta ^f = 2\). Therefore, the cluster problem on \(X_3\) can be eliminated using second-order convergent schemes with sufficiently small prefactors.

  2.

    The dependence of \(N_I\) on \(\varepsilon ^f\) for \(\beta ^I = 1\), i.e., \(N_I \propto \left( \varepsilon ^f\right) ^{-1.5n_x}\), scales worse than the corresponding dependence of N on \(\varepsilon \) for \(\beta ^* = 1\) when second-order convergence on \(X_5\) is required to mitigate clustering, i.e., \(N \propto \varepsilon ^{-0.5n_x}\) (see Theorem 3). Note, however, that this worse scaling may be an artifact of the conservative requirement that all of \(\hat{X}^2_3 \backslash B_{\delta }\) has to be covered using boxes of size \(\delta _I\) instead of simply requiring that the subset of \(\hat{X}^2_3\) that is not fathomed by value dominance (the rest of \(\hat{X}^2_3\), including \(B_{\delta }\), would have already been accounted for while covering \(\hat{X}_5\) and \(\hat{X}^1_3\)) be covered using boxes of appropriate size.

  3.

    Similar to Lemma 10, less conservative estimates (with respect to the dependence on \(\varepsilon ^o\) and \(\varepsilon ^f\)) may be obtained for \(X^1_3\) and \(X^2_3\) by taking into account the fact that the objective function and the measure of infeasibility grow linearly in certain directions.

Remark 10

The main assumptions of Lemmas 5 and 11, which assume that the objective function and the measure of infeasibility grow linearly on certain regions in some neighborhood of \(\mathbf {x}^*\), are similar to the linear growth condition in [12], and the main assumptions of Lemmas 8 and 12, which assume that the objective function and the measure of infeasibility grow quadratically on certain regions in some neighborhood of \(\mathbf {x}^*\), are similar to the quadratic growth condition in [5, 12]. Furthermore, the assumptions of Lemmas 5, 8, 11, and 12 may be weakened based on the linear and quadratic growth conditions in [5, 12] to account for cases in which \(\mathbf {x}^*\) is not a strict local minimum.

4 Conclusion

This work provides an analysis of the cluster problem for constrained problems. The analysis indicates that the number of boxes required to cover regions close to a global minimizer scales differently depending on the convergence order, and the corresponding prefactor, of the lower bounding scheme on nearly-optimal and nearly-feasible regions in the vicinity of the global minimizer.

It is shown that lower bounding schemes with first-order convergence may eliminate the cluster problem at a constrained minimizer if: i. the objective function grows linearly in directions leading to feasible points in some neighborhood of the minimizer, ii. either the objective function or a measure of constraint violation grows linearly in directions leading to infeasible points in some neighborhood of the minimizer, and iii. the corresponding convergence order prefactors are sufficiently small. This is possible because nodes containing nearly-optimal and nearly-feasible points may be fathomed relatively easily, by value dominance or by infeasibility, even using first-order convergent lower bounding schemes, when the objective function or the measure of constraint violation grows linearly in directions around the minimizer. The above result is in contrast to the case of unconstrained minimization, where at least second-order convergence is required to eliminate the cluster problem at a point of differentiability of the objective function. When the objective function is twice-differentiable at an unconstrained minimizer, this is a consequence of the fact that the objective function grows quadratically or slower around the minimizer.

It is also shown that at least second-order convergence is required to mitigate the cluster problem at a nonisolated constrained minimizer that satisfies certain regularity conditions when the problem is purely equality-constrained. Conditions under which second-order convergence of the lower bounding scheme is sufficient to mitigate clustering are also presented. This analysis reduces to previous analyses for unconstrained problems under suitable assumptions.