1 Introduction

In this paper, we consider the following nonsmooth, nonconvex optimization problem

$$ \left\{ \begin{aligned} \underset{x\in X}{\min} & f(x) \\ \text{s.t.} & g(x,t)\le 0, t \in T, \end{aligned}\right. $$
(1)

where \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is locally Lipschitz but potentially nonsmooth and nonconvex, and \(g:\mathbb {R}^{n}\times T\rightarrow {\mathbb {R}}\) is twice continuously differentiable on X × T. In addition, T is a fixed, nonempty compact subset of \(\mathbb {R}^{p}\) and X is a convex compact subset of \(\mathbb {R}^{n}\). This problem is called the semi-infinite programming (SIP) problem. For simplicity of notation, we consider the semi-infinite problem with one semi-infinite constraint and a one-dimensional index set T = [a,b], that is, p = 1. However, the algorithm proposed in this paper may also be useful for more general cases.
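To fix ideas, the following small sketch encodes a toy instance of (1); the particular f, g, box X, and interval T are our own illustration and are not taken from the cited references.

```python
import numpy as np

# Toy instance of problem (1); hypothetical data for illustration only.
a, b = 0.0, 1.0                       # index set T = [0, 1], i.e., p = 1
X_lb = np.array([-2.0, -2.0])         # box X = [-2, 2]^2 (convex and compact)
X_ub = np.array([ 2.0,  2.0])

def f(x):
    # locally Lipschitz, nonsmooth, nonconvex objective
    return max(x[0] ** 2, x[1]) - 0.5 * abs(x[0] - x[1])

def g(x, t):
    # twice continuously differentiable semi-infinite constraint g(x, t) <= 0
    return x[0] + t * x[1] - 1.0 + 0.1 * np.sin(np.pi * t) * x[0] ** 2

def G_sampled(x, grid=np.linspace(a, b, 201)):
    # sampled approximation of G(x) = max_{t in T} g(x, t); illustrative only,
    # the method below replaces it by the certified overestimate of Section 2.3
    return max(g(x, t) for t in grid)
```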

SIP has recently drawn a lot of attention due to its applications in many different fields, including portfolio optimization, robot trajectory planning, the vibrating membrane problem, minimal-cost control of air pollution, and optimization over probability measures (see [11, 15, 22]).

There exists a wide range of numerical methods for SIP problems. Traditional methods for SIP, such as discretization, exchange, and reduction-based methods [14, 21, 32, 39, 44], solve a sequence of finite nonlinear programming subproblems, that is, subproblems with finitely many constraints chosen from the original constraints. In particular, in the reduction-based methods, the finite subset of constraints is generated by implicitly finding all the local maxima t(x) of g(x,⋅) on T for each x ∈ X. Based on this reduction principle, a number of nonsmooth approaches have been presented [10, 24, 31]. These traditional methods suffer from the major drawback that they generally do not provide a feasible point in a finite number of iterations.

A breakthrough for SIP was achieved by Bhattacharjee et al. [4], who replaced the semi-infinite constraint by an overestimation obtained via interval extensions and presented a convergent upper bounding approach to SIP. To our knowledge, it is the first algorithm with feasible iterates for SIP. Based on this upper bounding approach, a number of algorithms for finding the global solution of SIP problems have been presented [5, 8, 27, 29, 30]. Under the assumptions that the functions f and g(⋅,t) are continuously differentiable on X for each t ∈ T, g(x,⋅) is continuous on T for each x ∈ X, there exist SIP Slater points arbitrarily close to all global minima, and the resulting finite nonlinear programs (NLPs) are solved to local optimality, the method using interval extensions with feasible iterates generates an 𝜖-optimal estimate of the global solution of the SIP on finite termination [5]. The main drawback of this method is the rapid increase in the size of the lower and upper bounding problems as nodes of increasing depth are visited by the branch-and-bound procedure. Other methods to generate the global solution of SIP problems were proposed by Mitsos and coworkers [8, 27, 30]. The upper bounding problems of these methods are not generated by interval extensions, but rather by a restriction of the constraints' right-hand side by a negative value (< −𝜖g, for some 𝜖g > 0). For a fixed point x, these methods use three subproblems: the lower bounding problem, the upper bounding problem, and the lower level problem. Under the assumptions that f is continuous on X, g is continuous on X × T, there exists an 𝜖f-optimal SIP Slater point, and the resulting finite NLPs are solved to global optimality, the algorithms terminate finitely with a feasible point and solve the SIP approximately to global optimality. We point out that the authors of [8, 27, 29, 30] require global solution of all subproblems, which is computationally expensive.

In contrast to these methods, the adaptive convexification method discussed in [9] does not focus on the global solution of the SIP problem, but on global solutions of the lower level problem. The authors construct convex relaxations of the lower level problem using the α BB method [1, 2] of global optimization, replace the convex lower level problems with their Karush-Kuhn-Tucker (KKT) conditions, and solve the resulting mathematical programs with complementarity constraints. Under the assumptions that \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) and \(g:\mathbb {R}^{n}\times \mathbb {R}^{p}\rightarrow \mathbb {R}^{m}\) are twice continuously differentiable, the algorithm generates a feasible point and reaches an 𝜖-stationary point of the original SIP problem on finite termination. This method requires slightly stronger assumptions on the constraints, but can give tighter bounds than interval extensions. Based on the method discussed in [9], a number of convexification algorithms have been presented [36, 38, 41]. We point out that the method discussed in [41] relies on constructing concave relaxations of the lower level problem and solving the resulting approximate problems with finitely many constraints.

Bundle methods are among the most efficient methods for solving nonsmooth optimization problems. See [18, 20, 25, 35] for a more detailed discussion and comparison of some bundle methods for nonsmooth convex constrained optimization. When the objective function is nonconvex, the recent results [12, 13] make assumptions on the structure of the objective function and propose a redistributed bundle method for unconstrained nonsmooth, nonconvex optimization problems. Based on this redistributed method, many nonconvex bundle approaches have been suggested. The authors of [24, 43] propose proximal bundle methods for constrained nonsmooth, nonconvex optimization problems, where the objective and constraint functions are lower-C2. Lv et al. [23] propose an inexact proximal bundle method for constrained nonsmooth, nonconvex optimization problems, where the objective and constraint functions are lower-C1. Hoseini and Nobakhtian [16, 17] propose bundle methods for constrained nonsmooth, nonconvex optimization problems, where the objective and constraint functions are regular. The methods presented in [23, 24, 31] can be used to solve the SIP problem, but the calculation of the lower level problem in these methods is costly since the lower level problem needs to be solved to a given fixed accuracy at each point. Moreover, on finite termination, these methods are not guaranteed to furnish a feasible point.

In this paper, our aim is to design a feasible algorithm for nonsmooth, nonconvex SIP that combines some of the ideas of the proximal bundle methods and of the convexification methods. The main idea is to generate upper bounding problems for the SIP by replacing the lower level problem with its concave relaxations using the α BB method, and to solve the upper bounding problems by a new feasible proximal bundle method. The relaxed lower level problem is a finite nonsmooth problem. Under certain assumptions given in Theorem 6 below, it can be shown that every accumulation point of the iterative sequence is an 𝜖-stationary point of the original SIP problem. Under slightly stronger assumptions given in Theorem 7 below, every accumulation point of the iterative sequence is a local solution of the original SIP problem. These assumptions appeared in the nonsmooth, nonconvex bundle methods cited above. Compared to the related works [23, 24, 31], our algorithm starts with a coarse subset of T, which saves computational time on constraint evaluations. Moreover, we adopt an aggregation technique to control the size of the subproblems flexibly. In particular, our algorithm obtains a feasible point in finitely many iterations. Compared to [9, 36, 38, 41], the assumptions made on the structure of the objective function in this paper are weaker. We point out that the subproblems of our algorithm are solved by an existing QP solver and do not depend on the number of constraints of the upper bounding problem, whereas the subproblems of the algorithms from [9, 36, 38, 41] are solved by existing NLP solvers and do depend on the number of constraints of the upper bounding problem. Thus, our algorithm is naturally significantly less time-consuming when the number of subintervals/nodes increases. The results of the numerical experiments also show the good performance of the new method.

The remainder of this paper is organized as follows. The main concepts used throughout the paper are described in the next section, where we also state the methods to construct upper bounding functions of the constraints. Section 3 presents our feasible proximal bundle algorithm with convexification. Convergence of the proximal bundle method with convexification is addressed in Section 4. Section 5 presents some numerical results and Section 6 gives conclusions.

Our notation is fairly standard. The Euclidean inner product of two vectors \(x,y\in \mathbb {R}^{n}\) is denoted by 〈x,y〉 := xTy, and the associated norm by ∥⋅∥. The positive-part function is denoted by \(x^{+} := \max \limits \{x, 0\}\). For a set \(X\subset \mathbb {R}^{n}\), conv X denotes its convex hull. The closed ball with center \(x\in \mathbb {R}^{n}\) and radius r > 0 is denoted by B(x;r), that is, \({B}(x; r):=\{y\in \mathbb {R}^{n} | \|y-x\|\leq r\}\).

2 Preliminaries

In this section, we first recall some concepts and results related to nonsmooth analysis. Next, we give the necessary condition of the optimization measure. Then, we introduce the convexification method to construct the concave relaxation of the lower level problem.

2.1 Background

Definition 1

(Lipschitz continuity) A function \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is Lipschitz continuous on a set \(X\subset \mathbb {R}^{n}\) if there exists L > 0 such that

$$ |f(y)-f(z)|\leq L\|y-z\| \quad \text{for all}\ y,z\in X. $$

Then, L is called a Lipschitz constant for f on X.

Definition 2

The directional derivative of f at x in the direction \(d\in \mathbb {R}^{n}\) is defined by

$$ f^{\prime}(x;d):=\underset{h\downarrow 0}{\lim}\frac{f(x+hd)-f(x)}{h}. $$

Let \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) be a locally Lipschitz continuous function at a point \(x\in \mathbb {R}^{n}\). Then, the generalized directional derivative (Clarke) of f at x in the direction of \(d\in \mathbb {R}^{n}\) is defined by

$$ f^{\circ}(x;d):=\underset{\begin{array}{cc}y\rightarrow x \\ h\downarrow 0 \end{array}}{\limsup} \frac{f(y+hd)-f(y)}{h}. $$

Definition 3

A function \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is called regular at \(x\in \mathbb {R}^{n}\) if it is locally Lipschitz continuous at x and for all \(d\in \mathbb {R}^{n}\) the classical directional derivative \(f^{\prime }(x;d)\) exists and we have \(f^{\prime }(x;d)=f^{\circ }(x;d)\).

Definition 4

A function \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is called semismooth at \(x\in \mathbb {R}^{n}\) if f is Lipschitz on a ball about x and, for each \(d\in \mathbb {R}^{n}\) and for any sequences \(\{h_{k}\}\subset \mathbb {R}^{+}\), \(\{\theta _{k}\}\subset \mathbb {R}^{n}\), and \(\{\xi _{k}\}\subset \mathbb {R}^{n}\) such that \(\{h_{k}\}\downarrow 0\), \(\{\theta _{k}/ h_{k}\}\rightarrow 0\in \mathbb {R}^{n}\), and \(\xi _{k}\in \partial f(x+h_{k}d+\theta _{k})\), the sequence {〈ξk,d〉} has exactly one accumulation point.

Definition 5

A locally Lipschitz function \(f:O\rightarrow \mathbb {R}\), where O is an open subset of \(\mathbb {R}^{n}\), is called lower-C1 (lower-C2) on O if, on some neighborhood V of each x0 ∈ O, there is a representation \(f(x)=\max \limits _{\omega \in {{\varOmega }}}f_{\omega }(x)\) in which the functions fω are (twice) continuously differentiable on V and the index set Ω is a compact space such that fω(x) and ∇fω(x) (∇2fω(x)) depend continuously not just on x ∈ V but jointly on (ω,x) ∈Ω× V.

Definition 6

Let \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) be a locally Lipschitz continuous function at a point \(x\in \mathbb {R}^{n}\). Then, the subdifferential of f at x is defined by

$$ \partial f(x):=\{\xi\in\mathbb{R}^{n} \ | \ f^{\circ}(x;d)\geq\xi^{T}d \ \text{for all}\ d\in\mathbb{R}^{n}\}. $$

Each element ξf(x) is called a subgradient of f at x. If \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a convex function, then the subdifferential of f at \(x\in \mathbb {R}^{n}\) is defined by

$$ \partial f(x):=\{ \xi \ | \ f(y)\geq f(x)+\langle\xi,y-x\rangle \ \text{for all}\ y\in\mathbb{R}^{n} \}. $$
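As a simple illustration (our own example), consider the convex function \(f(x)=|x|\) on \(\mathbb {R}\). Then \(f^{\circ }(0;d)=f^{\prime }(0;d)=|d|\), so \(\partial f(0)=\{\xi \in \mathbb {R} \ | \ |d|\geq \xi d \ \text{for all}\ d\in \mathbb {R}\}=[-1,1]\), while \(\partial f(x)=\{\text {sign}(x)\}\) for x ≠ 0.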

Theorem 1

[3, Theorem 3.3] Let \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) be a locally Lipschitz continuous function at \(x\in \mathbb {R}^{n}\) with a Lipschitz constant L. Then, the subdifferential f(x) is a nonempty, convex, and compact set such that

$$ \partial f(x)\subseteq {{B}(0;L)}. $$

Theorem 2

[3, Theorem 3.23] Let \(f_{i}:\mathbb {R}^{n}\rightarrow \mathbb {R}\) be locally Lipschitz continuous at x for all i = 1,...,m. Then, the function

$$ f(x)=\max\{f_{i}(x) | i=1,...,m\} $$

is locally Lipschitz continuous at x and

$$ \partial f(x)\subseteq\text{conv} \{\partial f_{i}(x) | i\in\mathcal{I}(x)\}, $$
(2)

where

$$ \mathcal{I}(x):=\{i\in\{1,...,m\} | f_{i}(x)=f(x)\}. $$

In addition, if fi is regular at x for all i = 1,...,m, then f is also regular at x and equality holds in (2).

Now, we introduce the indicator function IX of a set \(X\subset \mathbb {R}^{n}\) as follows:

$$ I_{X}(x)=\left\{ \begin{aligned} 0 \ \ \ & \ \text{if}\ x\in X, \\ \infty\ \ & \ \text{if}\ x\notin X. \end{aligned}\right. $$

The indicator function IX of a set \(X\subset \mathbb {R}^{n}\) is convex if and only if X is convex. If the point x belongs to X, then

$$ \partial I_{X}(x)=N_{X}(x):=\{v\in\mathbb{R}^{n} | \langle v,y-x\rangle\leq0,\ \text{for all}\ y\in X\}. $$
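For instance (our own illustration), if \(X={\prod }_{i=1}^{n}[l_{i},u_{i}]\) is a box with li < ui and x ∈ X, then NX(x) consists of the vectors \(v\in \mathbb {R}^{n}\) with vi ≤ 0 if xi = li, vi ≥ 0 if xi = ui, and vi = 0 if li < xi < ui.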

From [26, Proposition 4] and [3, Theorem 3.13], we have the following:

Theorem 3

If \(f:X\rightarrow \mathbb {R}\) is continuously differentiable then f is semismooth and regular on X.

From [37, Proposition 2.4] and [7, Theorem 2], we have the following:

Proposition 1

Let O be an open set containing X. The following statements are equivalent:

1.:

f is semismooth and regular on O.

2.:

For all x ∈ O and all 𝜖 > 0, there exists ρ > 0 such that for all y ∈ B(x;ρ) and ξ ∈ ∂f(y)

$$ f(z)-f(y)\geq\langle\xi ,z-y\rangle-\epsilon\|z-y\|,\qquad z\in B(x;\rho). $$
3.:

f is lower-C1 on O.

Now we return to our basic problem (1), and give the following assumption.

Assumption 1

The Slater constraint qualification holds for problem (1), i.e., there exists \(\hat {x}\in \mathbb {R}^{n}\) such that \(g(\hat {x},t)<0\) for all tT.

Assumption 1 was used in [41]. In this paper, we always assume that the Slater constraint qualification holds.

2.2 Necessary conditions

To solve problem (1), we introduce the improvement function

$$ H(y,x):=\max\left\{f(y)-f(x) ,G(y)\right\},\ \ y\in\mathbb{R}^{n}, $$
(3)

where

$$ G(y)=\underset{t\in T}{\max} g(y,t). $$

Now, the subdifferential of G at a point x is defined by [6]

$$ \partial G(x)=\text{conv}\left\{\nabla_{x} g(x,t) | t\in A(x)\right\}, $$
(4)

where A(x) = {t ∈ T | G(x) = g(x,t)}.
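For illustration, the short sketch below evaluates the improvement function (3) and returns one subgradient of H(⋅,x) at y chosen via the max rule of Theorem 2 and formula (4); the grid approximation of G and all names are our own and serve only as an example.

```python
import numpy as np

def improvement_function(y, x, f, subgrad_f, g, grad_x_g, t_grid):
    # Sampled evaluation of H(y, x) = max{f(y) - f(x), G(y)} from (3), together
    # with one subgradient of H(., x) at y picked by the max rule (Theorem 2);
    # the grid approximation of G(y) = max_t g(y, t) is illustrative only.
    gvals = np.array([g(y, t) for t in t_grid])
    i_max = int(np.argmax(gvals))
    G_y = float(gvals[i_max])
    val_f = f(y) - f(x)
    if val_f >= G_y:                              # objective branch active
        return max(val_f, G_y), subgrad_f(y)
    return G_y, grad_x_g(y, t_grid[i_max])        # constraint branch, cf. (4)

# tiny toy oracles (hypothetical, for illustration only)
f = lambda x: max(x[0] ** 2, x[1])
subgrad_f = lambda x: np.array([2.0 * x[0], 0.0]) if x[0] ** 2 >= x[1] else np.array([0.0, 1.0])
g = lambda x, t: x[0] + t * x[1] - 1.0
grad_x_g = lambda x, t: np.array([1.0, t])

H_val, xi = improvement_function(np.array([0.3, 0.4]), np.array([0.2, 0.1]),
                                 f, subgrad_f, g, grad_x_g, np.linspace(0.0, 1.0, 101))
```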

Definition 7

[42](EMFCQ) A feasible point x is said to satisfy the extended Mangasarian-Fromovitz constraint qualification for SIP problem (1) if there exists a feasible direction d of X at x such that

$$ \nabla_{x} g(x,t)^{T} d<0,\ \text{for all}\ \ t\in T_{act}(x), $$
(5)

where Tact(x) = {t ∈ T | g(x,t) = 0}.

Theorem 4

Let \(x^{*}\) be a local minimizer of (1). Then the following statements hold:

(a):

\(0 \in \partial H(x^{*},x^{*}) + \partial I_{X}(x^{*})\).

(b):

There exist nonnegative multipliers λi, i = 0,...,n + 1, and indices \(t_{i}\in A(x^{*})\), i = 1,...,n + 1, such that \({{\sum }_{i=0}^{n+1}} \lambda _{i}=1\), \(\lambda _{i}g(x^{*},t_{i})=0\), i = 1,...,n + 1, and

$$ 0 \in\lambda_{0}\partial f(x^{*})+\sum\limits_{i=1}^{n+1}\lambda_{i}\nabla_{x} g(x^{*},t_{i})+\partial I_{X}(x^{*}). $$
(c):

If EMFCQ holds at \(x^{*}\) for problem (1), then there exist nonnegative multipliers λi and indices \(t_{i}\in T_{act}(x^{*})\), i = 1,...,n + 1, such that

$$ 0 \in \partial f(x^{*})+\sum\limits_{i=1}^{n+1}\lambda_{i}\nabla_{x} g(x^{*},t_{i})+\partial I_{X}(x^{*}). $$

Proof

  (a)

    Let F be the feasible region of the problem (1), namely, F = {x ∈ X | G(x) ≤ 0}. Since \(x^{*}\in \mathbb {R}^{n}\) is a local optimum of the problem (1), we have \(G(x^{*})\leq 0\) and there exists a neighborhood U of \(x^{*}\) such that \(f(x^{*})\leq f(x)\) for all \(x\in U\cap F\). Then, for all \(x\in U\cap F\subseteq X\)

    $$ H(x,x^{*})=\max\{f(x)-f(x^{*}) , G(x)\}\geq f(x)-f(x^{*})\geq 0=H(x^{*},x^{*}). $$

    This means that \(x^{*}\) is a local minimum of \(H(\cdot ,x^{*})\) on X, which implies \(0 \in \partial H(x^{*},x^{*}) + \partial I_{X}(x^{*})\).

  (b)

    If \(G(x^{*})<0\), we have \(\partial H(x^{*},x^{*})\subseteq \partial f(x^{*})\). Then, the assertion of the theorem is proved by choosing λ0 = 1 and λi = 0 for i = 1,...,n + 1. If \(G(x^{*})=0\), we have \(\partial H(x^{*},x^{*})\subseteq \text {conv}\{\partial f(x^{*})\cup \partial G(x^{*})\}\), where \(\partial G(x^{*}) = \text {conv}\{\nabla _{x} g(x^{*},t_{i}),\ t_{i}\in A(x^{*})\}\). Then, due to the definition of the convex hull and Carathéodory's theorem, there exist λi ≥ 0, \(i=0,1,\dots ,n+1\), such that

    $$ 0 \in\lambda_{0}\partial f(x^{*})+\sum\limits_{i=1}^{n+1}\lambda_{i}\nabla_{x} g(x^{*},t_{i})+\partial I_{X}(x^{*}). $$
  (c)

    Suppose that λ0 = 0; then we deduce that \(G(x^{*})=0\) and \(A(x^{*})=T_{act}(x^{*})\). Moreover, \(-{\sum }_{i=1}^{n+1}\lambda _{i}\nabla _{x} g(x^{*},t_{i})\in \partial I_{X}(x^{*})\). For any feasible direction d of X at \(x^{*}\), we have

    $$ \left\langle-\sum\limits_{i=1}^{n+1}\lambda_{i}\nabla_{x} g(x^{*},t_{i}),d\right\rangle\leq 0, \ \ t_{i}\in T_{act}(x^{*}), $$

    which contradicts (5). Thus, λ0 is strictly positive, and item (c) holds by setting λi := λi/λ0. □

Definition 8

\(x^{*}\) is called an 𝜖-KKT point of (1) if there exist nonnegative multipliers λi, i = 1,...,n + 1, such that

$$ 0\in \partial f(x^{*})+\sum\limits_{i=1}^{n+1}\lambda_{i}\nabla_{x} g(x^{*},t_{i})+\partial I_{X}(x^{*}), t_{i}\in T_{act}^{\epsilon}(x^{*}), $$

where \(T_{act}^{\epsilon }(x^{*})=\{t\in T | g(x^{*},t)\in [-\epsilon ,0]\}\).

2.3 The concave relaxation of the lower level problem

In this subsection, we consider the following lower level problem

$$ Q(x):\qquad\qquad \underset{t\in T}{\max} g(x,t).\qquad\qquad\qquad\qquad $$

We note that the lower level problem is easy to solve for those x ∈ X at which g(x,⋅) is convex on T: by Lemma 1 below, its maximum is then attained at an endpoint of T.

For \(N\in \mathbb {N}\), let a = t0 < t1 < ... < tN = b define a subdivision E of T:

$$ E:=\{t_{u} | u\in U^{\prime}=\{0,1,...,N\}\}. $$

Then, we have

$$ \underset{t\in T}{\max} g(x,t)= \underset{u\in U}{\max} \underset{t\in T_{u}}{\max} g(x,t), $$

where U = {1,2,...,N} and Tu = [tu− 1,tu]. Tu is called a subinterval of T and satisfies

$$ T=\underset{u\in U}{\bigcup} T_{u}. $$

The length of the subinterval Tu is defined by |Tu| = |tutu− 1| and the length of the subdivision E is defined by \(|E|=\max \limits _{u\in U}|T_{u}|\).

Following the approach of [41], for all uU, we generate the upper bound function gu of g on Tu,

$$ g_{u}(x,t)=g(x,t)+\frac{\alpha_{u}}{2}(t-\frac{t_{u-1}+t_{u}}{2})^{2},~~\forall~~x\in X. $$

In this case, the function gu is twice continuously differentiable with respect to t and the second-order derivative with respect to t of gu is

$$ {\nabla^{2}_{t}} g_{u}(x,t)={\nabla^{2}_{t}}{g}(x,t)+\alpha_{u}. $$

It follows that gu(x,t) is convex with respect to t on Tu if the parameter αu satisfies

$$ \alpha_{u}\geq\underset{(x,t)\in X\times T_{u}}{\max} \left\{-{\nabla_{t}^{2}} g(x,t)\right\}. $$

On the other hand, we require gu to be an upper bound function of g, so αu must also be nonnegative. Thus,

$$ \alpha_{u}\geq\max\left\{0 ,\underset{(x,t)\in X\times T_{u}}{\max} \left\{-{\nabla_{t}^{2}} g(x,t)\right\}\right\},~~\forall~~ u\in U. $$
(6)
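As a possible way to obtain such a parameter in practice, the sketch below estimates the right-hand side of (6) by sampling \(-{\nabla _{t}^{2}} g\); this is only a heuristic for illustration (the function names and data are our own), since a certified bound would require, e.g., interval arithmetic as in the α BB literature [1, 2].

```python
import numpy as np

def estimate_alpha(d2g_dt2, X_samples, t_lo, t_hi, n_t=50):
    # Heuristic estimate of alpha_u in (6):
    #   alpha_u >= max{0, max_{(x,t) in X x T_u} -d^2 g / dt^2 (x, t)}.
    # Sampling explores only finitely many points, so the returned value is an
    # estimate rather than a guaranteed bound (a certified bound needs interval tools).
    ts = np.linspace(t_lo, t_hi, n_t)
    worst = max(-d2g_dt2(x, t) for x in X_samples for t in ts)
    return max(0.0, worst)

# example with g(x, t) = x[0] + x[1] * sin(3 t), so d^2 g / dt^2 = -9 x[1] sin(3 t)
d2g = lambda x, t: -9.0 * x[1] * np.sin(3.0 * t)
X_samples = [np.array([a1, a2]) for a1 in (-2.0, 2.0) for a2 in (-2.0, 2.0)]
alpha_u = estimate_alpha(d2g, X_samples, 0.0, 1.0)
```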

Lemma 1

[41] Let f be a convex function and C = convS, where S is an arbitrary set of points. Then

$$ \sup\{f(x) | x\in C\}=\sup\{f(x) | x\in S\}, $$

where the first supremum is attained only when the second supremum is attained.

We know from Lemma 1 that, for every u ∈ U, the maximum of gu(x,⋅) over Tu must be attained on the boundary of Tu, that is,

$$ \begin{array}{@{}rcl@{}} \underset{t\in T_{u}}{\max} g_{u}(x,t)&=&\underset{t\in \{t_{u-1},t_{u}\}}{\max} g_{u}(x,t)=\max\left\{g(x,t_{u-1}),g(x,t_{u})\right\}\\ &&+\frac{\alpha_{u}}{8}(t_{u}-t_{u-1})^{2}, ~~\forall~~x\in X. \end{array} $$

Then, we have

$$ \underset{t\in \{t_{u-1},t_{u}\}}{\max} g(x,t)\leq \underset{t\in T_{u}}{\max} g(x,t)\leq\underset{t\in \{t_{u-1},t_{u}\}}{\max} g_{u}(x,t),~~\forall~~x\in X. $$

The upper bound function G(⋅,α,E) of G on X is defined by

$$ G(x,\alpha,E)=\underset{u\in U}{\max} \underset{t\in T_{u}}{\max} g_{u}(x,t). $$

It can be rewritten as

$$ G(x,\alpha,E)=\underset{t_{u}\in E}{\max} \left\{g(x,t_{u})+h(t_{u},\alpha_{u})\right\}, $$
(7)

where

$$ h(t_{u},\alpha_{u})=\left\{ \begin{aligned} &\frac{\alpha_{u+1}}{8}({t_{u+1}-t_{u}})^{2},&\text{if}\ \ u=0,\qquad\ \\ &\max\left\{\frac{\alpha_{u}}{8}({t_{u} -t_{u-1}})^{2}\ ,\frac{\alpha_{u+1}}{8}({t_{u+1}-t_{u}})^{2}\right\},& \text{if}\ \ 0<u<N, \\ &\frac{\alpha_{u}}{8}({t_{u} -t_{u-1}})^{2},&\text{if}\ \ u=N,\qquad \end{aligned} \right. $$
(8)

where the parameter αu satisfies (6). We define

$$ \bar{A}(x,\alpha,E)=\{t_{u}\in E | G(x,\alpha,E)=g(x,t_{u})+h(t_{u},\alpha_{u})\}, $$
(9)

and

$$ D(E,\alpha)=\underset{u\in U}{\max} \frac{\alpha_{u}}{8}({t_{u} -t_{u-1}})^{2}. $$
(10)

Then, we can obtain that

$$ G(x)\leq G(x,\alpha,E)\leq G(x)+D(E,\alpha). $$
(11)
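For illustration, the following sketch evaluates the overestimate G(x,α,E) of (7)-(8), the gap D(E,α) of (10), and the active node set (9) for a given subdivision; the helper name and the sample data are our own.

```python
import numpy as np

def upper_bound_G(x, g, E, alpha, tol=1e-12):
    # G(x, alpha, E) of (7)-(8), D(E, alpha) of (10), and A_bar(x, alpha, E) of (9).
    # E = [t_0, ..., t_N] is the subdivision, alpha = [alpha_1, ..., alpha_N].
    E, alpha = np.asarray(E, float), np.asarray(alpha, float)
    N = len(E) - 1
    bump = alpha * (E[1:] - E[:-1]) ** 2 / 8.0       # alpha_u / 8 * |T_u|^2
    h = np.empty(N + 1)                               # h(t_u, alpha_u) from (8)
    h[0], h[N] = bump[0], bump[N - 1]
    if N > 1:
        h[1:N] = np.maximum(bump[:N - 1], bump[1:N])
    vals = np.array([g(x, t) for t in E]) + h
    G_val = float(vals.max())
    active = [float(E[u]) for u in range(N + 1) if vals[u] >= G_val - tol]   # (9)
    return G_val, float(bump.max()), active           # value, D(E, alpha), A_bar

# sample data: g(x, t) = x[0] + x[1] * sin(3 t) on T = [0, 1];
# alpha = 18 satisfies (6) when X = [-2, 2]^2
g = lambda x, t: x[0] + x[1] * np.sin(3.0 * t)
E = np.linspace(0.0, 1.0, 6)
alpha = np.full(len(E) - 1, 18.0)
G_val, D, A_bar = upper_bound_G(np.array([0.2, 0.5]), g, E, alpha)
```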

Now the upper bounding problem of the original problem is defined by

$$ \left\{\begin{aligned} \underset{x\in X}{\min} & f(x) \\ \text{s.t.} & G(x,\alpha,E)\le 0. \end{aligned}\right. $$
(12)

Using the improvement function mentioned in the preceding section, we define \(H(\cdot ,x,\alpha ,E):\mathbb {R}^{n}\rightarrow \mathbb {R}\) as follows:

$$ H(y,x,\alpha,E):=\max\left\{ f(y)-f(x) , G(y,\alpha,E) \right\}, $$

where α and E vary along the iterations.

Lemma 2

[41] If EMFCQ holds for all feasible points of the original problem (1), then the MFCQ holds for all feasible points of the approximation problem (12) when |E| is sufficiently small.

Theorem 5

Suppose that \(f: X\rightarrow \mathbb {R}\) and \(g: X\times T\rightarrow \mathbb {R}\) are locally Lipschitz continuous. Then the following statements hold:

1.:

If \(x^{*}\) is a local minimizer of (12), then \(0 \in \partial H(x^{*},x^{*},\alpha ,E) + \partial I_{X}(x^{*})\).

2.:

If \(0 \in \partial H(x^{*},x^{*},\alpha ,E) + \partial I_{X}(x^{*})\) and \(G(x^{*},\alpha ,E)\leq 0\), then there exist nonnegative multipliers λi, i = 0,...,n + 1, and indices \(t_{i}\in \bar {A}(x^{*},\alpha ,E), i=1,...,n+1\), such that \({{\sum }_{i=0}^{n+1}} \lambda _{i}=1\), \(\lambda _{i}(g(x^{*},t_{i})+h(t_{i},\alpha _{i}))=0\), i = 1,...,n + 1, and

$$ 0 \in\lambda_{0}\partial f(x^{*})+\sum\limits_{i=1}^{n+1}\lambda_{i}\nabla_{x} g(x^{*},t_{i})+\partial I_{X}(x^{*}). $$

If, in addition, MFCQ holds at \(x^{*}\) for problem (12), then there exist nonnegative multipliers λi and indices \(t_{i}\in \bar {T}_{act}(x^{*},\alpha ,E), i=1,...,n+1\), such that

$$ 0 \in \partial f(x^{*})+\sum\limits_{i=1}^{n+1}\lambda_{i}\nabla_{x} g(x^{*},t_{i})+\partial I_{X}(x^{*}), $$

where \(\bar {T}_{act}(x^{*},\alpha ,E)=\{t_{u}\in E | g(x^{*},t_{u})+h(t_{u},\alpha _{u})=0\}.\) If, in addition, D(E,α) ≤ 𝜖, then \(x^{*}\) is an 𝜖-KKT point of problem (1).

Proof

The proof of this theorem is analogous to that of Theorem 4; we only need to show that \(x^{*}\) is an 𝜖-KKT point of the original problem if D(E,α) ≤ 𝜖. It suffices to prove that if \(t_{i}\in \bar {T}_{act}(x^{*},\alpha ,E)\), i = 1,...,n + 1, then \(t_{i}\in T_{act}^{\epsilon }(x^{*})\), i = 1,...,n + 1, provided that the condition D(E,α) ≤ 𝜖 holds. For each \(t_{i}\in \bar {T}_{act}(x^{*},\alpha ,E)\), we have

$$ g(x^{*},t_{i})+\frac{\alpha_{\bar{i}}}{8}|T_{\bar{i}}|^{2}=0, $$

where

$$ \bar{i}=\left\{ \begin{aligned} i & \text{if} \frac{\alpha_{i}}{8}|T_{i}|^{2}\geq \frac{\alpha_{i+1}}{8}|T_{i+1}|^{2}, \\ i+1 & \text{otherwise}, \end{aligned} \right. $$

where, by convention, \(|T_{0}| = 0\) and \(|T_{N+1}| = 0\). It implies that \(g(x^{*},t_{i})\leq 0\) and

$$ g(x^{*},t_{i})=-\frac{\alpha_{\bar{i}}}{8}|T_{\bar{i}}|^{2}\geq -D(E,\alpha) \geq -\epsilon. $$

It follows that \(t_{i}\in T_{act}^{\epsilon }(x^{*})\), i = 1,...,n + 1, if the condition D(E,α) ≤ 𝜖 holds. □

3 The numerical approach

Let k be the current iteration index and xk be the stability center (or serious iterate) at iteration k, where iterations generate a sequence of trial points \(\{y^{j}\}_{j\in J_{k}^{ora}}\), with \(J_{k}^{ora}\subseteq \{0,1,...,k\}\). In particular, \(x^{k}\in \{y^{j}\}_{j\in J_{k}^{ora}}\). Let \(E^{k}=\{t_{u} | u\in U^{\prime }_{k}=\{0,1,...,N_{k}\}\}\) be the current subdivision of T and αk be the corresponding convexification parameter.

Following the redistributed proximal approach [12, 13], we generate the augmented functions of f and G(⋅,αk,Ek) defined by

$$ \begin{array}{@{}rcl@{}} f^{k}(\cdot):= &\ f_{ }(\cdot) +\frac{{\eta_{1}^{k}}}{2}\|\cdot-x^{k}\|^{2}, \\ g^{k}(\cdot):= &\ G(\cdot,\alpha^{k},E^{k})+\frac{{\eta_{2}^{k}}}{2}\|\cdot-x^{k}\|^{2}. \end{array} $$

The idea is to utilize the augmented functions in the model construction. We can first construct a convex piecewise linear approximation defined by

$$ \begin{array}{@{}rcl@{}} \hat{H}_{k}(y) &:=& \underset{j\in J_{k}^{ora}}{\max} \left\{ f^{k}(y^{j})+ \left\langle\xi_{f}^{j} +{\eta_{1}^{k}}(y^{j}-x^{k}),y-y^{j}\right\rangle-f(x^{k}),\right. \\ &&\left. g^{k}(y^{j})+ \left\langle\xi_{g}^{k,j}+{\eta_{2}^{k}}(y^{j}-x^{k}),y-y^{j}\right\rangle\right\}, \end{array} $$

where \({\xi ^{j}_{f}}\in \partial f(y^{j})\) and \(\xi ^{k,j}_{g}\in \partial G(y^{j},\alpha ^{k},E^{k})\). It can be rewritten in an equivalent form

$$ \hat{H}_{k}(y) = G^{+}(x^{k},\alpha^{k},E^{k})+\underset{j\in J_{k}^{ora}}{\max} \left\{ -c_{k,j}^{f}+\langle s_{f}^{k,j},y-x^{k}\rangle, -c_{k,j}^{g}+\langle s_{g}^{k,j},y-x^{k}\rangle\right\}. $$
(13)

where

$$ \begin{aligned} & s_{f}^{k,j}:={\xi_{f}^{j}} \ +{\eta_{1}^{k}} {{{\varDelta}}^{j}_{k}} \ \ \text{and}\ \ c_{k,j}^{f} := e_{k,j}^{f}+{\eta_{1}^{k}} {b^{j}_{k}}+G^{+}(x^{k},\alpha^{k},E^{k}), \\ & s_{g}^{k,j}:=\xi_{g}^{k,j}+{\eta_{2}^{k}} {{{\varDelta}}^{j}_{k}} \ \ \text{and}\ \ c_{k,j}^{g} := e_{k,j}^{g}+{\eta_{2}^{k}} {b^{j}_{k}}+G^{+}(x^{k},\alpha^{k},E^{k}) - G(x^{k},\alpha^{k},E^{k}), \end{aligned} $$
(14)

with

$$ \begin{aligned} & e_{k,j}^{f}:=f(x^{k})-f(y^{j})-{\langle\xi_{f}^{j}},x^{k}-y^{j}\rangle, \\ & e_{k,j}^{g}:=G(x^{k},\alpha^{k},E^{k})-G(y^{j},\alpha^{k},E^{k})-\langle\xi_{g}^{k,j},x^{k}-y^{j}\rangle,\\ & {b_{k}^{j}}:=\frac{1}{2}\|y^{j}-x^{k}\|^{2},\ \ {{{\varDelta}}_{k}^{j}}:=y^{j}-x^{k}. \end{aligned} $$
(15)

Any choice of the parameters \({\eta _{1}^{k}}\), \({\eta _{2}^{k}}\) that keeps \(c_{k,j}^{f},c_{k,j}^{g}\) in (13) nonnegative is acceptable. In our method, we take

$$ {\eta_{1}^{k}}= \max\left\{\underset{\begin{array}{cc}j\in J_{k}^{ora} \\ y^{j}\neq x^{k} \end{array}}{\max}\left\{- \frac{\ e_{k,j}^{f}\ }{{b_{k}^{j}}}\right\}, 0\right\}+\gamma,\ \ {\eta_{2}^{k}}= \max\left\{\underset{\begin{array}{cc}j\in J_{k}^{ora} \\ y^{j}\neq x^{k} \end{array}}{\max}\left\{- \frac{\ e_{k,j}^{g}\ }{{b_{k}^{j}}}\right\}, 0\right\}+\gamma, $$
(16)

where γ is a small positive parameter. In particular, since \(x^{k}\in \{y^{j}\}_{j\in J_{k}^{ora}}\), we obtain that

$$ \hat{H}_{k}(x^{k})=G^{+}(x^{k},\alpha^{k},E^{k})=H(x^{k},x^{k},\alpha^{k},E^{k}). $$
(17)
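For illustration, the sketch below assembles the linearization errors (15), the parameters (16), and the cut coefficients (14) from a bundle of oracle information; the data layout (a list of dictionaries) and all names are our own.

```python
import numpy as np

GAMMA = 1e-3   # the small positive parameter gamma of (16)

def cut_data(bundle, xk, f_xk, G_plus_xk, G_xk):
    # Each bundle entry is a dict with keys
    #   'y', 'f', 'xi_f' : trial point y^j, f(y^j), a subgradient of f at y^j,
    #   'G', 'xi_g'      : G(y^j, alpha^k, E^k) and a subgradient of G(., alpha^k, E^k).
    e_f, e_g, b, Delta = [], [], [], []
    for cut in bundle:                                     # linearization errors (15)
        d = xk - cut['y']
        e_f.append(f_xk - cut['f'] - cut['xi_f'] @ d)
        e_g.append(G_xk - cut['G'] - cut['xi_g'] @ d)
        b.append(0.5 * float(np.dot(cut['y'] - xk, cut['y'] - xk)))
        Delta.append(cut['y'] - xk)
    # parameters (16); entries with y^j = x^k (i.e., b_j = 0) are skipped
    eta1 = max([-ef / bj for ef, bj in zip(e_f, b) if bj > 0] + [0.0]) + GAMMA
    eta2 = max([-eg / bj for eg, bj in zip(e_g, b) if bj > 0] + [0.0]) + GAMMA
    # cut coefficients (14)
    s_f = [cut['xi_f'] + eta1 * D for cut, D in zip(bundle, Delta)]
    s_g = [cut['xi_g'] + eta2 * D for cut, D in zip(bundle, Delta)]
    c_f = [ef + eta1 * bj + G_plus_xk for ef, bj in zip(e_f, b)]
    c_g = [eg + eta2 * bj + G_plus_xk - G_xk for eg, bj in zip(e_g, b)]
    return s_f, c_f, s_g, c_g, eta1, eta2
```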

Then, we seek the new point yk+ 1 as the solution of

$$ \left\{\begin{aligned} \min & \hat{H}_{k}(y)+\frac{1}{2}\mu_{k}\|y-x^{k}\|^{2} \\ \text{s.t.} & y\in X, \end{aligned}\right. $$
(18)

where μk > 0 is a proximal parameter.

The following lemma characterizes the solution of (18).

Lemma 3

Let yk+ 1 be the unique solution to (18) and assume μk > 0. Then, we have that

$$ y^{k+1}=x^{k}-\frac{1}{\mu_{k}}(S^{k+1}+v^{k+1}),\ \ \text{where}\ S^{k+1}\in\partial \hat{H}_{k}(y^{k+1}),\ v^{k+1}\in \partial I_{X}(y^{k+1}). $$
(19)

Lemma 3 implies that

$$ \langle v^{k+1},y^{k+1}-x^{k}\rangle{\geq}0. $$
(20)

The linearization error of \(\hat {H}_{k}(\cdot )\) is defined by

$$ C^{k+1}:=\hat{H}_{k}(x^{k})-\hat{H}_{k}(y^{k+1})-\langle S^{k+1},x^{k}-y^{k+1}\rangle\geq 0. $$
(21)

Since the piecewise linear model \(\hat {H}_{k}\) is convex, we obtain the following affine lower approximation of \(\hat {H}_{k}\):

$$ \hat{H}_{k}(y)\geq\hat{H}_{k}(y^{k+1})+\langle S^{k+1},y-y^{k+1}\rangle=G^{+}(x^{k},\alpha^{k},E^{k})+\langle S^{k+1},y-x^{k}\rangle-C^{k+1}. $$

Hence, the cutting plane model can be redefined in the form

$$ \begin{array}{lll} \hat{H}_{k}(y) &= & G^{+}(x^{k},\alpha^{k},E^{k})+\max\left\{\underset{j\in J_{k}^{agg}}{\max}\left\{-C^{j}+\langle S^{j},y-x^{k}\rangle\right\},\right.\\ && \left.\underset{j\in J_{k}^{ora}}{\max}\left\{ -c_{k,j}^{f}+\langle s_{f}^{k,j},y-x^{k}\rangle, -c_{k,j}^{g}+\langle s_{g}^{k,j},y-x^{k}\rangle\right\}\right\}, \end{array} $$
(22)

where \(J_{k}^{ora}\subseteq \{0,1,...,k\}\) and \(J_{k}^{agg}\subseteq \{1,...,k\}\). Then, (18) can be stated as a quadratic programming problem in \(\mathbb {R}^{1}\times X\):

$$ \text{compute }(r,y) \\ := \left\{ \begin{aligned} \arg \min & r+\frac{\mu_{k}}{2}\|y-x^{k}\|^{2} \\ \text{s.t.}\ \ & G^{+}(x^{k},\alpha^{k},E^{k})-c_{k,j}^{f}+\langle s_{f}^{k,j},y-x^{k}\rangle\leq r, j\in J^{ora}_{k},\\ & G^{+}(x^{k},\alpha^{k},E^{k})-c_{k,j}^{g}+\langle s_{g}^{k,j},y-x^{k}\rangle\leq r, j\in J^{ora}_{k},\\ & G^{+}(x^{k},\alpha^{k},E^{k})- C^{j} +\langle S^{j} ,y-x^{k}\rangle\leq r, j\in J^{agg}_{k}. \end{aligned}\right. $$
(23)

From the KKT conditions for (23), one easily obtains a representation for Sk+ 1 and Ck+ 1 (defined by (19) and (21)):

$$ \begin{aligned} S^{k+1} & =\underset{j\in J_{k}^{ora}}{\sum} \left( \lambda_{1,k+1}^{j} s_{f}^{k,j}+\lambda_{2,k+1}^{j} s_{g}^{k,j}\right)+\underset{j\in J_{k}^{agg}}{\sum} \lambda_{3,k+1}^{j} {S^{j}}, \\ C^{k+1} & =\underset{j\in J_{k}^{ora}}{\sum} \left( \lambda_{1,k+1}^{j} c_{k,j}^{f}+\lambda_{2,k+1}^{j} c_{k,j}^{g}\right)+\underset{j\in J_{k}^{agg}}{\sum} \lambda_{3,k+1}^{j} {C^{j}}, \end{aligned} $$
(24)

and (λ1,k+ 1,λ2,k+ 1,λ3,k+ 1) is a solution to

$$ \left\{ \begin{aligned} \min & \frac{1}{2}\left\|\underset{j\in J_{k}^{ora}}{\sum} \left( \lambda_{1,k+1}^{j} s_{f}^{k,j}+\lambda_{2,k+1}^{j} s_{g}^{k,j}\right)+\underset{j\in J_{k}^{agg}}{\sum}\lambda_{3,k+1}^{j} S^{j} \right\|^{2} \\ & + {\mu_{k}}\left( \underset{j\in J_{k}^{ora}}{\sum} \left( \lambda_{1,k+1}^{j} c_{k,j}^{f}+\lambda_{2,k+1}^{j} c_{k,j}^{g}\right)+\underset{j\in {J_{k}^{agg}}}{\sum} \lambda_{3,k+1}^{j} C^{j}\right)\\ \text{s.t.} & \ \ \underset{j\in J_{k}^{ora}}{\sum}\left( \lambda_{1,k+1}^{j} +\lambda_{2,k+1}^{j} \right)+\underset{j\in J_{k}^{agg}}{\sum}\lambda_{3,k+1}^{j} =1,\\ & \lambda_{1,k+1}^{j},\ \lambda_{2,k+1}^{j}\geq 0 \text{for all} j\in J^{ora}_{k} \text{and} \lambda_{3,k+1}^{j}\geq 0 \text{for all} j\in J^{agg}_{k}. \end{aligned}\right. $$
(25)
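For illustration, a minimal sketch of how the primal subproblem (23) can be solved when X is a box is given below. It assumes the cvxpy modelling package (not used in the paper) and collects the f-cuts, g-cuts, and aggregate cuts in a single list, so the names and data layout are our own; the dual values returned for the cut constraints correspond, up to the solver's sign conventions, to the multipliers λ in (24)-(25).

```python
import cvxpy as cp
import numpy as np

def solve_qp_subproblem(xk, mu, G_plus, cuts, X_lb, X_ub):
    # Primal QP (23) over a box X = [X_lb, X_ub]; `cuts` is a list of pairs
    # (c_j, s_j), each encoding a constraint  G_plus - c_j + <s_j, y - x^k> <= r.
    n = xk.size
    y, r = cp.Variable(n), cp.Variable()
    cut_cons = [G_plus - c + s @ (y - xk) <= r for (c, s) in cuts]
    box_cons = [y >= X_lb, y <= X_ub]
    prob = cp.Problem(cp.Minimize(r + 0.5 * mu * cp.sum_squares(y - xk)),
                      cut_cons + box_cons)
    prob.solve()
    lambdas = [float(c.dual_value) for c in cut_cons]   # multipliers, cf. (24)-(25)
    return np.asarray(y.value), float(r.value), lambdas

# usage with two illustrative cuts around x^k = (0, 0)
xk = np.zeros(2)
cuts = [(0.0, np.array([1.0, 0.0])), (0.1, np.array([0.0, -1.0]))]
y_new, r_new, lam = solve_qp_subproblem(xk, mu=1.0, G_plus=0.0, cuts=cuts,
                                        X_lb=-np.ones(2), X_ub=np.ones(2))
```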

To measure the quality of the candidate, the nominal decrease δk+ 1 is defined by:

$$ \delta_{k+1}:=C^{k+1}+\frac{1}{\mu_{k}}\|S^{k+1}+v^{k+1}\|^{2}\geq0. $$
(26)

According to (20), (21), and Lemma 3, we can obtain that

$$ \begin{array}{lll} \delta_{k+1}&=&H(x^{k},x^{k},\alpha^{k},E^{k})-\hat{H}_{k}(y^{k+1})-\langle v^{k+1},y^{k+1}-x^{k}\rangle\\ &\leq& H(x^{k},x^{k},\alpha^{k},E^{k})-\hat{H}_{k}(y^{k+1}). \end{array} $$
(27)

We now present the criterion that determines whether a new stability center is generated:

$$ \max\left\{ f(y^{k+1})-f(x^{k}) ,G(y^{k+1},\alpha^{k},E^{k}) \right\}-G^{+}(x^{k},\alpha^{k},E^{k})\leq -m \delta_{k+1}, $$
(28)

where m is a positive descent parameter. If (28) holds, the trial point yk+ 1 brings a sufficient descent for the objective or constraint function. Then, the stability center will be moved to yk+ 1, i.e., xk+ 1 := yk+ 1. This case is called a serious step (or descent step). Otherwise, the trial point yk+ 1 is used to enrich the model and the stability center is not changed. This case is called a null step.
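A minimal sketch of this serious/null decision is given below; the function name and the numbers in the usage line are our own placeholders.

```python
def descent_test(f_y, f_xk, G_y_new, G_plus_xk, delta, m):
    # Criterion (28): accept y^{k+1} as the new stability center when it
    # decreases the improvement function by at least m * delta_{k+1}.
    return max(f_y - f_xk, G_y_new) - G_plus_xk <= -m * delta

# usage: serious step if True (set x^{k+1} = y^{k+1}), otherwise null step
serious = descent_test(f_y=1.20, f_xk=1.30, G_y_new=-0.05,
                       G_plus_xk=0.0, delta=0.5, m=0.1)
```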

In the following, we introduce a subdivision strategy that refines the current subdivision and constructs a refined approximation of the lower level problem.

Refinement procedure

  1.

    For a given subset Tu = [tu− 1,tu], let αu be the corresponding convexification parameter.

  2.

    Define

    $$ E^{\alpha BB}(T_{{u}})=\left\{t_{{u},i} | t_{{u},i}=t_{{u}-1}+\frac{i}{2s+1}(t_{{u}}-t_{{u}-1}),i=1,...,2s, s\in\mathbb{N}\right\}. $$
    (29)
  3.

    Put Tu,i = [tu,i− 1,tu,i], i = 1,..., 2s + 1, where tu,0 = tu− 1 and tu,2s+ 1 = tu.

  4.

    Determine the corresponding convexification parameters αu,i, i = 1,..., 2s + 1, such that \(\alpha _{u,i}\leq \alpha _{u}\) and

    $$ \alpha_{{u},i}\geq\max\left\{0,\underset{(x,t)\in X\times T_{{u},i}}{\max} \left\{-{\nabla_{t}^{2}} g(x,t)\right\}\right\}. $$
    (30)

Remark 1

Note that the computation of αu,i involves a global optimization problem. However, we can use an upper bound for the right-hand side in (30); in particular, we can choose αu,i = αu. Therefore, the computation of the globally optimal value of the lower level problem \(\max \limits _{t\in T}\{g(x,t)\}\) requires more computational effort than the computation of αu,i.
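For illustration, the refinement of one subinterval, using the safe choice \(\alpha _{u,i}=\alpha _{u}\) from Remark 1, can be sketched as follows (the function name is our own):

```python
import numpy as np

def refine(t_lo, t_hi, alpha_u, s=1):
    # Refinement of T_u = [t_lo, t_hi]: the 2s interior nodes of (29) and,
    # following Remark 1, alpha_{u,i} = alpha_u for the 2s+1 new subintervals,
    # which satisfies (30) since alpha_u already bounds -d^2g/dt^2 on X x T_u.
    i = np.arange(1, 2 * s + 1)
    new_nodes = t_lo + i / (2 * s + 1) * (t_hi - t_lo)    # E^{alphaBB}(T_u)
    new_alphas = np.full(2 * s + 1, float(alpha_u))
    return new_nodes, new_alphas

nodes, alphas = refine(0.0, 0.2, alpha_u=18.0, s=1)       # splits T_u into 3 pieces
```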

Now, we show that our refinement procedure brings a sufficient descent for the new constraint function.

Lemma 4

Assume that Tu,i = [tu,i− 1,tu,i] and αu,i, i = 1,..., 2s + 1, are obtained by applying the refinement procedure to the subset Tu = [tu− 1,tu]. Then, we have

$$ g(x,t)\leq g_{1,i}(x,t)\leq g_{2}(x,t) \quad \text{for all}\ t\in[t_{u,i-1},t_{u,i}],\ i=1,...,2s+1, $$

where

$$ \begin{array}{@{}rcl@{}} g_{1,i}(x,t)&= & g(x,t)+\frac{\alpha_{u,i}}{2}\left( t-\frac{t_{u,i-1}+t_{u,i}}{2}\right)^{2},\\ g_{2}(x,t)&= & g(x,t)+\frac{\alpha_{u}}{2}\left( t-\frac{t_{u-1}+t_{u}}{2}\right)^{2}. \end{array} $$

Proof

The first inequality can be obtained directly from the definitions of g1,i, i = 1,..., 2s + 1. We set \(\psi _{i}(t)=\frac {\alpha _{u,i}}{2}(t-\frac {t_{u,i-1}+t_{u,i}}{2})^{2}-\frac {\alpha _{u}}{2}(t-\frac {t_{u-1}+t_{u}}{2})^{2}\). Then, we only need to prove that ψi(t) ≤ 0 for any i = 1,..., 2s + 1. We note that \(|t-\frac {t_{u,i-1}+t_{u,i}}{2}|\leq \frac {t_{u}-t_{u-1}}{2(2s+1)}\leq \frac {t_{u-1}+t_{u}}{2}-t\) for all t ∈ [tu,i− 1,tu,i], i = 1,...,s. Then, for any i ≤ s, we have

$$ \begin{array}{@{}rcl@{}} \nabla_{t} \psi_{i}(t)& = &\ \alpha_{u,i}\left( t-\frac{t_{u,i-1}+t_{u,i}}{2}\right)-\alpha_{u}\left( t-\frac{t_{u-1}+t_{u}}{2}\right)\\ &= &\ \alpha_{u,i}\left( t-\frac{t_{u,i-1}+t_{u,i}}{2}\right)+\alpha_{u}\left( \frac{t_{u-1}+t_{u}}{2}-t\right) \\ &\geq &\ 0. \end{array} $$

Therefore, we conclude that ψi(t) is nondecreasing on [tu,i− 1,tu,i], i = 1,...,s. It implies that ψi(t) ≤ ψi(tu,i) for all t ∈ [tu,i− 1,tu,i], i = 1,...,s. For any i ≤ s, we have

$$ \begin{array}{@{}rcl@{}} \psi_{i}(t_{u,i}) &= &\ \frac{\alpha_{u,i}}{2}\left( \frac{t_{u,i}-t_{u,i-1}}{2}\right)^{2}-\frac{\alpha_{u}}{2}\left( t_{u,i}-\frac{t_{u-1}+t_{u}}{2}\right)^{2} \\ &\leq &\ \frac{\alpha_{u,i}}{2}\left( \left( \frac{t_{u,i}-t_{u,i-1}}{2}\right)^{2}-\left( t_{u,i}-\frac{t_{u-1}+t_{u}}{2}\right)^{2}\right)\\ &= &\ \frac{\alpha_{u,i}}{2}\left( \left( \frac{t_{u,i}-t_{u,i-1}}{2}\right)^{2}-\left( \frac{t_{u}-t_{u-1}}{2(2s+1)}\right)^{2}\right) \\ &\leq &\ 0. \end{array} $$

Then, we have ψi(t) ≤ 0 for any t ∈ [tu,i− 1,tu,i], i = 1,...,s. Similarly, we can prove that ψi(t) ≤ 0 for any t ∈ [tu,i− 1,tu,i], i = s + 2,..., 2s + 1. For i = s + 1, we have

$$ \begin{array}{@{}rcl@{}} \psi_{i}(t) &= & \frac{\alpha_{u,i}}{2}\left( t-\frac{t_{u,i-1}+t_{u,i}}{2}\right)^{2}-\frac{\alpha_{u}}{2}\left( t-\frac{t_{u-1}+t_{u}}{2}\right)^{2} \\ &= & \left( \frac{\alpha_{u,i}}{2}-\frac{\alpha_{u}}{2}\right)\left( t-\frac{t_{u-1}+t_{u}}{2}\right)^{2}\\ &\leq & 0. \end{array} $$

The proof is complete. □

Now, we present the algorithm in full detail.

Algorithm 1

(Feasible proximal bundle method with convexification for SIP problems) Data. m ∈ (0, 1), tol1 ≥ 0, tol2 > 0, tol3 ≥ 0, \(\mu _{\max \limits }>\mu _{\min \limits }>0\), \(\mu _{0}\in [\mu _{\min \limits },\mu _{\max \limits }]\), \({\eta _{1}^{0}}\geq 0\), \({\eta _{2}^{0}}\geq 0\), γ > 0, s > 0, \(i_{n\max \limits }>0\), \(\tilde {X}\subset \mathbb {R}\). Set k = 0,N0 = 2s + 1, and

$$ E^{0}=\left\{t_{u} | t_{u}=a+\frac{u}{N_{0}}(b-a), u=0,1,...,N_{0}\right\}\subset T. $$

Determine the parameter α0 with (6) on E0.

Step 0 (Initialization)

Solve the problem

$$ \left\{ \begin{aligned} \underset{(x,z)\in X\times\tilde{X}}{\min} \qquad & z \\ \text{s.t.} \quad\qquad & G(x,\alpha^{0},E^{0})-z\leq 0. \end{aligned} \right. $$
(31)

and from the solution \(\tilde {x}\) compute \(G(\tilde {x},\alpha ^{0},E^{0})\). Set \(y^{0}=\tilde {x}\), \(G(y^{0},\alpha ^{0},E^{0})=G(\tilde {x},\alpha ^{0},E^{0})\). If z > 0, set iz = 1, \(J_{1}^{ora}=\{0\}\), and go to Step 7. Otherwise, compute \(f(y^{0}),\ {\xi _{f}^{0}}\in \partial f(y^{0}),\ \xi _{g}^{0,0}\in \partial G(y^{0},\alpha ^{0},E^{0})\) and set x0 = y0, f(x0) = f(y0), G(x0,α0,E0) = G(y0,α0,E0). Define the associated \(e_{0,0}^{f}, e_{0,0}^{g}, {b^{0}_{0}}, {{{\varDelta }}_{0}^{0}}\) by (15). Set iz = 0, ix = 0, in = 0, \(J_{0}^{ora}=\{0\}\), \(J_{0}^{agg}=\emptyset \) and go to Step 1.

Step 1 (Quadratic programming subproblem)

Given the model \(\hat {H}_{k}\) defined by (14) and (22), compute Sk+ 1, Ck+ 1, yk+ 1, (λ1,k+ 1,λ2,k+ 1,λ3,k+ 1) by solving the subproblem (23) and its dual (25).

Step 2 (Stopping test)

Compute δk+ 1 by (26). If δk+ 1tol1 and D(Ek,αk) ≤ tol2, STOP.

Step 3 (Bundle management)

Set

$$ \begin{array}{@{}rcl@{}} & J_{k+1}^{ora}\subseteq J_{k}^{ora}\cup\{k+1\}\ \text{and} \ J_{k+1}^{ora}\supseteq\{ k+1 \}, \\ & J_{k+1}^{agg}\subseteq J_{k}^{agg}\cup\{k+1\}\ \text{and} \ J_{k+1}^{agg}\supseteq\{ k+1 \}. \end{array} $$

Step 4 (Descent test)

Compute

$$ f(y^{k+1}),\ G(y^{k+1},\alpha^{k},E^{k}),\ \xi_{f}^{k+1}\in\partial f(y^{k+1}),\ \xi_{g}^{k,{k+1}}\in\partial G(y^{k+1},\alpha^{k},E^{k}). $$

Define the associated \(e_{k,{k+1}}^{f}\), \(e_{k,{k+1}}^{g}\), \(b_{k}^{k+1}\), and \({{\varDelta }}_{k}^{k+1}\) by (15). If the descent condition (28) holds, then set in = 0 and go to Step 6. Otherwise, set in = in + 1 and go to Step 5.

Step 5 (Null step update)

Choose \(\mu _{k+1}\in [\mu _{k},\mu _{\max \limits }]\) and set

$$ \begin{array}{@{}rcl@{}} \left( x^{k+1},f(x^{k+1})\right) &=&\left( x^{k},f(x^{k})\right),\\ \left( e_{k+1,j}^{f},b_{k+1}^{j},{{\varDelta}}_{k+1}^{j}\right) &=&\left( e_{k,j}^{f},{b_{k}^{j}},{{{\varDelta}}_{k}^{j}}\right),\ \text{for\ all\ }j\in J_{k+1}^{ora}. \end{array} $$

If \(i_{n}>i_{n\max \limits }\) and D(Ek,αk) > tol2, then go to Step 7. Otherwise, set

$$ \begin{array}{@{}rcl@{}} &\left( E^{k+1},\alpha^{k+1},G(x^{k+1},\alpha^{k+1},E^{k+1})\right) = \left( E^{k},\alpha^{k},G(x^{k},\alpha^{k},E^{k})\right),\\ &\left( \xi^{k+1,j}_{g},e_{k+1,j}^{g}\right) =\left( \xi^{k,j}_{g},e_{k,j}^{g}\right),\ \text{for\ all\ }j\in J_{k+1}^{ora}. \end{array} $$

and go to Step 8.

Step 6 (Serious step update)

Choose \(\mu _{k+1}\in [\mu _{\min \limits },\mu _{\max \limits }]\) and set

$$ i_{x}=k+1,\ x^{k+1}=y^{k+1},\ f(x^{k+1})=f(y^{k+1}),\ J_{k+1}^{agg}=\emptyset. $$

For all \(j\in J_{k+1}^{ora}\), we set

$$ \begin{array}{@{}rcl@{}} e_{k+1,j}^{f} & =&\ e_{k,j}^{f}+f(x^{k+1})-f(x^{k})-{\langle\xi_{f}^{j}}, x^{k+1}-x^{k}\rangle, \\ b_{k+1}^{j} & =&\ {b_{k}^{j}}+\|x^{k+1}-x^{k}\|^{2}/2-{\langle{{\varDelta}}_{k}^{j}}, x^{k+1}-x^{k}\rangle,\\ {{\varDelta}}_{k+1}^{j} & =&\ {{{\varDelta}}_{k}^{j}}-x^{k+1}+x^{k}. \end{array} $$

If D(Ek,αk) > tol2, go to Step 7. Otherwise, set

$$ E^{k+1}=E^{k},\ \alpha^{k+1}=\alpha^{k},\ G(x^{k+1},\alpha^{k+1},E^{k+1})=G(y^{k+1},\alpha^{k},E^{k}). $$

For all \(j\in J_{k+1}^{ora}\), we set \(\xi ^{k+1,j}_{g}=\xi ^{k,j}_{g}\) and

$$ e_{k+1,j}^{g} = e_{k,j}^{g}+G(x^{k+1},\alpha^{k+1},E^{k+1})-G(x^{k},\alpha^{k},E^{k})-\langle\xi_{g}^{k,j}, x^{k+1}-x^{k}\rangle. $$

Then, go to Step 8.

Step 7 (Subdivision update)

Select the index set Wk as follows:

$$ \begin{array}{@{}rcl@{}} W_{k}\supseteq\left\{u | G(y^{j},\alpha^{k},E^{k})\right.&=&g(y^{j},t_{u})+h(t_{u},\alpha_{u}) \text{and} h(t_{u},\alpha_{u})>tol_{2},\\ &&\left.\text{for all} j\in J_{k+1}^{ora},\ t_{u}\in E^{k}\right\}, \end{array} $$

and

$$ W_{k}\subseteq\left\{u | h(t_{u},\alpha_{u})>tol_{2}, \text{for all}\ t_{u}\in E^{k}\right\}. $$

Take the Refinement procedure on subsets \(T_{\bar {u}}\), uWk, where \(\bar {u}\) is defined by

$$ \bar{u}=\left\{ \begin{aligned} u & \text{if} \frac{\alpha_{u}}{8}|T_{u}|^{2}\geq \frac{\alpha_{u+1}}{8}|T_{u+1}|^{2}, \\ u+1 & \text{otherwise}, \end{aligned} \right. $$

where, by convention, \(|T_{0}| = 0\) and \(|T_{N_{k}+1}|=0\). Put

$$ \begin{array}{@{}rcl@{}} E^{k+1}&=&\ \{ t_{u} | u=0,...,N_{k+1}\}:=E^{k}\bigcup\left\{E^{\alpha BB}(T_{\bar{u}}) | u\in W_{k}\right\}, \\ \alpha^{k+1}&=&\ \{\alpha_{u} | u=1,...,N_{k+1}\}:=\alpha^{k}\bigcup\\ && \left\{\alpha_{\bar{u},i}: u\in W_{k}, i=1,...,2s+1\right\} \backslash \left\{\alpha_{\bar{u}}: u\in W_{k}\right\}. \end{array} $$

If iz = 1, set E0 = Ek+ 1, α0 = αk+ 1, k = 0 and go to Step 0. Otherwise, for all \(j\in J_{k+1}^{ora}\), compute \(G(y^{j},\alpha ^{k+1},E^{k+1}),\ \xi _{g}^{k+1,j}\in \partial G(y^{j},\alpha ^{k+1},E^{k+1})\) and define the associated \(e_{k+1,j}^{g}\) by (15). Then, go to Step 8.

Step 8 (Parameter update)

Compute the parameters \(\eta _{1}^{k+1}\) and \(\eta _{2}^{k+1}\) by (16). Set k = k + 1 and go to Step 1.
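To summarize the control flow of Algorithm 1, a structural sketch of its main loop is given below. It is only an outline under simplifying assumptions: the helper callables (oracle, solve_qp, refine_all) are our own names, the bundle is reset at every serious step, and the bookkeeping of Steps 3 and 8 as well as the recomputation of constraint linearizations after a refinement are omitted; it should not be read as a full implementation of Algorithm 1.

```python
def bundle_sip_sketch(oracle, solve_qp, refine_all, x0, E0, alpha0,
                      mu=1.0, m=0.1, tol1=1e-6, tol2=1e-4, max_iter=200):
    # oracle(y, E, alpha)  -> (f(y), xi_f, G(y, alpha, E), xi_g)
    # solve_qp(...)        -> candidate y^{k+1}, nominal decrease delta, gap D(E, alpha)
    # refine_all(E, alpha) -> refined subdivision and parameters (Step 7)
    xk, E, alpha = x0, E0, alpha0
    bundle = [oracle(xk, E, alpha)]                              # initial cut (Step 0)
    for _ in range(max_iter):
        y_new, delta, D = solve_qp(bundle, xk, mu, E, alpha)     # Step 1
        if delta <= tol1 and D <= tol2:                          # Step 2: stopping test
            return xk
        cut = oracle(y_new, E, alpha)                            # Step 4: new oracle data
        f_y, G_y = cut[0], cut[2]
        f_x, G_plus_x = bundle[0][0], max(bundle[0][2], 0.0)
        if max(f_y - f_x, G_y) - G_plus_x <= -m * delta:         # descent test (28)
            xk, bundle = y_new, [cut]                            # Step 6: serious step
        else:
            bundle.append(cut)                                   # Step 5: null step
        if D > tol2:
            E, alpha = refine_all(E, alpha)                      # Step 7: subdivision update
    return xk
```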

Remark 2

Under Assumption 1 and the bound (11), there exist E, α, and a point x (for instance, the Slater point \(\hat {x}\) from Assumption 1) such that

$$G(x,\alpha,E)\leq 0 \quad \text{whenever}\ D(E,\alpha)\ \text{is sufficiently small}.$$

Therefore, Algorithm 1 cannot pass infinitely many times through step 0, i.e., Algorithm 1 generates a feasible initial point x0 such that G(x0,α0,E0) ≤ z ≤ 0 at step 0. According to (28), if the descent test is satisfied, then it holds that G(xk+ 1,αk,Ek) ≤ G+(xk,αk,Ek) − mδk+ 1G+(xk,αk,Ek). By Lemma 4, we have G(xk+ 1,αk+ 1,Ek+ 1) ≤ G+(xk,αk,Ek). If xk is infeasible, then we can obtain that G(x0,α0,E0) > 0, which contradicts the fact that G(x0,α0,E0) ≤ z ≤ 0. Therefore, if a serious step is declared, then the following conditions hold:

$$ f(x^{k+1})-f(x^{k})\leq -m\delta_{k+1}\ \ \text{and}\ \ G(x^{k+1},\alpha^{k+1},E^{k+1})\leq0. $$

Lemma 5

Algorithm 1 cannot pass infinitely many times through step 7, i.e., there exist \(k_{E}, \bar {E}, \bar {\alpha }\) such that

$$ E^{k}=\bar{E},\ \alpha^{k}=\bar{\alpha},\ D(\bar{E},\bar{\alpha})\leq tol_{2}\ \text{for all}\ k\geq k_{E}. $$

4 Convergence

Firstly, we define the index set of serious step as follows:

$$ \mathcal{K}_{s}:=\{k \ | \ y^{k}\ \text{is a serious step, i.e.,}\ x^{k}=y^{k}\}. $$

We start with the following lemma.

Lemma 6

Assume that f is bounded below on X and Algorithm 1 generates an infinite number of serious steps. Then \(C^{k}\rightarrow 0\) and \(S^{k}+v^{k}\rightarrow 0\) as \(k\rightarrow \infty \) in \(\mathcal {K}_{s}\).

Proof

We first show that

$$ \underset{k\in\mathcal{K}_{s}}{\sum}\delta_{k}<+\infty. $$
(32)

Making use of (28) and Remark 2, we conclude that, for all \(k\in \mathcal {K}_{s}\),

$$ m \delta_{k}\leq f(x^{k-1})-f(x^{k}). $$

Thus, the sequence {f(xk)} is nonincreasing and bounded below. Let \(\bar {f}=\inf \{f(x) | x\in X\}\). Therefore, we obtain that

$$ \underset{k\in\mathcal{K}_{s}}{\sum}\delta_{k}\leq\frac{1}{m}\left( f(x^{0})-\bar{f}\right)<+\infty. $$

Now, we obtain that \(\{\delta _{k}\}_{k\in \mathcal {K}_{s}}\rightarrow 0\). Making use of (26), we conclude that

$${ C^{k}\leq\delta_{k}\ \ \text{and}\ \ \frac{1}{\mu_{k-1}}\|S^{k}+v^{k}\|^{2}\leq\delta_{k}.} $$

It immediately follows that \(C^{k}\rightarrow 0\) and \(S^{k}+v^{k}\rightarrow 0\) as \(k\rightarrow \infty \) in \(\mathcal {K}_{s}\) (a consequence of \(\mu _{k-1}\in [\mu _{\min \limits },\mu _{\max \limits }]\)). □

Let Ψk(yk+ 1) denote the optimal value of the objective function of subproblem (18), i.e., \({{\varPsi }}_{k}(y^{k+1})=\hat {H}_{k}(y^{k+1})+\frac {\mu _{k}}{2}\|y^{k+1}-x^{k}\|^{2}\).

Lemma 7

Assume that Algorithm 1 takes a finite number of serious steps and \(\bar {k}\) is the last serious iteration, i.e., for all \(k\geq \bar {k}\) we have \(x^{k}=x^{\bar {k}}\). Then we can obtain that

$$ H(x^{k},x^{k},\alpha^{k},E^{k})\geq{{\varPsi}}_{k}(y^{k+1})\geq {{\varPsi}}_{k-1}(y^{k})+\frac{\mu_{k-1}}{2}\|{y^{k+1}}-y^{k}\|^{2},\ \text{for\ all\ } k>\max\{\bar{k},k_{E}\}. $$

Proof

In what follows, we consider \(k>\max \limits \{\bar {k},k_{E}\}\). Using Lemma 3 and the convexity of \(\hat {H}_{k}\), we obtain that

$$ \begin{aligned} H(x^{k},x^{k},\alpha^{k},E^{k})= &\ \hat{H}_{k}(x^{k})\\ \geq &\ \hat{H}_{k}(y^{k+1})+\langle S^{k+1},x^{k}-y^{k+1}\rangle\\ \geq &\ \hat{H}_{k}(y^{k+1})+\langle S^{k+1},x^{k}-y^{k+1}\rangle+\langle v^{k+1}, x^{k}-y^{k+1}\rangle\\ \geq &\ \hat{H}_{k}(y^{k+1})+\frac{\mu_{k}}{2}\|y^{k+1}-x^{k}\|^{2}\\ = &\ {{\varPsi}}_{k}(y^{k+1}), \end{aligned} $$

where the first equality is by (17), the second inequality is due to (20). Since \(k\in J_{k}^{agg}\), (21) and (22) imply that

$$ \begin{array}{@{}rcl@{}} \hat{H}_{k}(y^{k+1}) &\geq\ G^{+}(x^{k},\alpha^{k},E^{k})-C^{k}+\langle S^{k},y^{k+1}-x^{k} \rangle \\ & =\ \hat{H}_{k-1}(y^{k})+\langle S^{k},y^{k+1}-y^{k} \rangle. \end{array} $$

Moreover,

$$ \begin{aligned} {{\varPsi}}_{k}(y^{k+1}) = &\ \hat{H}_{k}(y^{k+1})+\frac{\mu_{k}}{2}\|y^{k+1}-x^{k}\|^{2}\\ \geq &\ \hat{H}_{k-1}(y^{k})+\langle S^{k},y^{k+1}-y^{k} \rangle+\frac{\mu_{k-1}}{2}\|y^{k+1}-x^{k}\|^{2}\\ \geq &\ \hat{H}_{k-1}(y^{k})+\langle S^{k},y^{k+1}-y^{k} \rangle+\langle v^{k}, y^{k+1}-y^{k}\rangle+\frac{\mu_{k-1}}{2}\|y^{k+1}-x^{k}\|^{2}\\ \geq &\ \hat{H}_{k-1}(y^{k})-\mu_{k-1}\langle y^{k}-x^{k},y^{k+1}-y^{k}\rangle+\frac{\mu_{k-1}}{2}\|y^{k}-x^{k}+{y^{k+1}}-y^{k}\|^{2} \\ = &\ \hat{H}_{k-1}(y^{k})+\frac{\mu_{k-1}}{2}\|y^{k}-x^{k}\|^{2}+\frac{\mu_{k-1}}{2}\|{y^{k+1}}-y^{k}\|^{2}\\ = &\ {{\varPsi}}_{k-1}(y^{k})+\frac{\mu_{k-1}}{2}\|{y^{k+1}}-y^{k}\|^{2}, \end{aligned} $$

where the second inequality is by the fact that 〈vk,yk+ 1yk〉≤ 0. □

Lemma 8

Assume that Algorithm 1 takes a finite number of serious steps. Suppose that \(\{{\eta ^{k}_{1}}\}\) and \(\{{\eta _{2}^{k}}\}\) are bounded above. Then, \(C^{k}\rightarrow 0\) and \(S^{k}+v^{k}\rightarrow 0\) as \(k\rightarrow \infty \).

Proof

Let \(\bar {k}\) be the last serious iteration. In what follows, we consider \(k>\max \{\bar {k},k_{E}\}\). By the definition (22) of \(\hat {H}_{k}\) and the fact that \(k\in J_{k}^{ora}\), we obtain that

$$ \hat{H}_{k}(y^{k+1})\geq G^{+}(x^{k},\alpha^{k},E^{k})+\max\left\{ -c_{k,k}^{f}+\langle s_{f}^{k,k},y^{k+1}-x^{k}\rangle, -c_{k,k}^{g}+\langle s_{g}^{k,k},y^{k+1}-x^{k}\rangle\right\}. $$

It can be rewritten as

$$ \begin{array}{lll} \hat{H}_{k}(y^{k+1})&\geq& \max \left\{ f^{k}(y^{k})+ \left\langle\xi_{f}^{k} \ +{\eta_{1}^{k}}(y^{k}-x^{k}),y^{k+1}-y^{k}\right\rangle-f(x^{k}),\right. \\ && \left.g^{k}(y^{k})+ \left\langle\xi^{k,k}_{g}+{\eta_{2}^{k}}(y^{k}-x^{k}),y^{k+1}-y^{k}\right\rangle\right\} \\ &\geq& \max \left\{ \left\langle\xi_{f}^{k} \ +{\eta_{1}^{k}}(y^{k}-x^{k}),y^{k+1}-y^{k}\right\rangle+ f(y^{k})-f(x^{k}),\right. \\ && \left.\left\langle\xi^{k,k}_{g}+{\eta_{2}^{k}}(y^{k}-x^{k}),y^{k+1}-y^{k}\right\rangle+ G(y^{k},\alpha^{k},E^{k})\right\}. \end{array} $$

Since \(\{y^{k}\}\subset X\), the functions f and g are Lipschitz continuous on X and X × T, respectively, and X is compact, Theorem 1 implies that \(\{{\xi _{f}^{k}}\}\) and \(\{\xi ^{k,k}_{g}\}\) are bounded. Therefore, by the boundedness of \(\{{\eta _{1}^{k}}\}\) and \(\{{\eta _{2}^{k}}\}\), we get that \(\{{\xi _{f}^{k}}+{\eta _{1}^{k}}(y^{k}-x^{k})\}\) and \(\{\xi ^{k,k}_{g}+{\eta _{2}^{k}}(y^{k}-x^{k})\}\) are bounded. We set

$$ L^{\prime}=\max\left\{\|{\xi_{f}^{k}}+{\eta_{1}^{k}}(y^{k}-x^{k})\|,\|\xi^{k,k}_{g}+{\eta_{2}^{k}}(y^{k}-x^{k})\|\right\}. $$

Then, we can obtain

$$ \hat{H}_{k}(y^{k+1})\geq \max\left\{f(y^{k})-f(x^{k}) ,G(y^{k},\alpha^{k},E^{k})\right\}-L^{\prime}\|y^{k+1}-y^{k}\|. $$

According to (27) and (28), we can obtain that

$$\begin{aligned} \delta_{k+1}\leq &\ H(x^{k},x^{k},\alpha^{k},E^{k})-\hat{H}_{k}(y^{k+1}) \\ \leq &\ G^{+}(x^{k},\alpha^{k},E^{k})-\max\left\{f(y^{k})-f(x^{k}) ,G(y^{k},\alpha^{k},E^{k})\right\}+L^{\prime}\|y^{k+1}-y^{k}\|\\ \leq &\ m \delta_{k}+L^{\prime}\|y^{k+1}-y^{k}\|, \end{aligned} $$

where the third inequality holds because the descent test (28) fails for \(k>\bar {k}\). Combining this relation with (26) yields

$$ 0\leq \delta_{k+1}\leq \ m \delta_{k}+L^{\prime}\|y^{k+1}-y^{k}\|. $$
(33)

According to Lemma 7, we infer that

$$ \{y^{k+1}-y^{k}\}\rightarrow 0, \text{as} k\rightarrow\infty, $$

which implies \(\lim _{k\rightarrow \infty }\delta _{k}=0\) [33, Lemma 3, p. 45]. Making use of (26), we conclude that

$${ C^{k}\leq\delta_{k} \text{and} \frac{1}{\mu_{k-1}}\|S^{k}+v^{k}\|^{2}\leq\delta_{k}.} $$

It immediately follows that \(C^{k}\rightarrow 0\) and \(S^{k}+v^{k}\rightarrow 0\) as \(k\rightarrow \infty \). □

Let

$$\eta_{\min}^{1,j}=\max\{0,-\frac{e_{k,j}^{f}}{{b_{k}^{j}}}\}+\gamma,\ \eta_{\min}^{2,j}=\max\{0,-\frac{e_{k,j}^{g}}{{b_{k}^{j}}}\}+\gamma,\ \forall \ \ j\in\{0,...,k\},$$
$$\eta_{\max}^{i}=\max_{j\in\{0,...,k\}}{\eta_{i}^{j}},\ i\in\{1,2\},\ {J^{C}_{k}}=\{j\in J^{ora}_{k} | \lambda_{1,k+1}^{j} = \lambda_{2,k+1}^{j}=0\}.$$

Lemma 9

Let \(x^{k}\) (\(k\geq k_{E}\)) be the current stability center. Suppose that \(\{{\eta ^{k}_{1}}\}\) and \(\{{\eta _{2}^{k}}\}\) are bounded above. Then there exist \(\hat {\lambda }_{1,k+1},\hat {\lambda }_{2,k+1}\in \mathbb {R}^{k+1}\) and \(\hat {J}_{k}\) with

$$ \left\{ \begin{aligned} & \hat{J}_{k}\subseteq\{0,1,...,k\}, \\ & \hat{\lambda}_{1,k+1}^{j},\hat{\lambda}_{2,k+1}^{j}\geq 0,\ \hat{\lambda}_{1,k+1}^{j}+\hat{\lambda}_{2,k+1}^{j}>0,\ \ \text{for\ all}\ j\in \hat{J}_{k},\\ & \underset{j\in \hat{J}_{k}}{\sum} (\hat{\lambda}_{1,k+1}^{j}+\hat{\lambda}_{2,k+1}^{j})=1, \end{aligned}\right. $$
(34)

such that

$$ S^{k+1} = \underset{j\in \hat{J}_{k}}{\sum} \left( \hat{\lambda}_{1,k+1}^{j}\left( {\xi_{f}^{j}}+\hat{\eta}_{1}^{j} {{{\varDelta}}_{k}^{j}}\right)+\hat{\lambda}_{2,k+1}^{j}\left( \xi_{g}^{k,j}+\hat{\eta}_{2}^{j} {{{\varDelta}}_{k}^{j}}\right)\right),\qquad\ \ $$
(35)

and

$$ \begin{aligned} C^{k+1} =& \underset{j\in \hat{J}_{k}}{\sum} \left( \hat{\lambda}_{1,k+1}^{j}\left( e_{k,j}^{f}+\hat{\eta}_{1}^{j} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})\right)\right. \\ &+ \left.\hat{\lambda}_{2,k+1}^{j}\left( e_{k,j}^{g}+\hat{\eta}_{2}^{j} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})-G(x^{k},\bar{\alpha},\bar{E})\right)\right), \end{aligned} $$
(36)

where \(\hat {\eta }_{i}^{j}\in [\eta _{\min \limits }^{i,j},\eta _{\max \limits }^{i}],~i\in \{1,2\}\).

Proof

In what follows, we consider \(k\geq k_{E}\). If \(k\in \mathcal {K}_{s}\) or \(k = k_{E}\), then we have \(J^{agg}_{k}=\emptyset \) and

$$ \begin{array}{@{}rcl@{}} S^{k+1} &=& \underset{j\in J_{k}^{ora}}{\sum} \left( \lambda_{1,k+1}^{j} \left( {\xi_{f}^{j}} +{\eta_{1}^{k}} {{{\varDelta}}^{j}_{k}}\right)+\lambda_{2,k+1}^{j}\left( \xi_{g}^{k,j}+{\eta_{2}^{k}}{{{\varDelta}}^{j}_{k}}\right)\right),\\ C^{k+1} &=& \underset{j\in J_{k}^{ora}}{\sum} \left( {\lambda}_{1,k+1}^{j}\left( e_{k,j}^{f}+{\eta}_{1}^{k} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})\right)\right. \\ &&+ \left.{\lambda}_{2,k+1}^{j}\left( e_{k,j}^{g}+{\eta}_{2}^{k} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})-G(x^{k},\bar{\alpha},\bar{E})\right)\right), \end{array} $$

where \(\lambda _{1,k+1}^{j}, \lambda _{2,k+1}^{j}\geq 0\) for all \(j\in J^{ora}_{k}\) and \({\sum }_{j\in J_{k}^{ora}}(\lambda _{1,k+1}^{j} +\lambda _{2,k+1}^{j})=1\). Then, the result holds by setting \(\hat {J}_{k}=J_{k}^{ora}\setminus {J^{C}_{k}}\), \(\hat {\lambda }_{1,k+1}^{j}=\lambda _{1,k+1}^{j}\), \(\hat {\lambda }_{2,k+1}^{j}=\lambda _{2,k+1}^{j}\), \(\hat {\eta }_{1}^{j}={\eta _{1}^{k}}\) and \(\hat {\eta }_{2}^{j}={\eta _{2}^{k}}\).

If \(k\notin \mathcal {K}_{s}\) and k > kE, then there exists \(\tilde {k}\in \{k_{E}+1,...,k-1\}\) such that \(x^{k}=x^{\tilde {k}}\) and \(J^{agg}_{\tilde {k}}=\emptyset \). It holds that \({{\varDelta }}^{j}_{\tilde {k}}={{{\varDelta }}^{j}_{k}}\), \(b^{j}_{\tilde {k}}={b^{j}_{k}}\), \(\xi _{g}^{\tilde {k},j}=\xi _{g}^{k,j}\) for all \(j\in J_{k}^{ora}\) and \(G(x^{\tilde {k}},\bar {\alpha },\bar {E})=G(x^{k},\bar {\alpha },\bar {E})\). Hence,

$$ \begin{array}{@{}rcl@{}} S^{\tilde{k}+1} &=&\underset{j\in J_{\tilde{k}}^{ora}}{\sum}\left( \lambda_{1,\tilde{k}+1}^{j} \left( {\xi_{f}^{j}} +\eta_{1}^{\tilde{k}} {{{\varDelta}}^{j}_{k}}\right)+\lambda_{2,\tilde{k}+1}^{j}\left( \xi_{g}^{{k},j}+\eta_{2}^{\tilde{k}}{{{\varDelta}}^{j}_{k}}\right)\right),\\ C^{\tilde{k}+1} &=&\underset{j\in J_{\tilde{k}}^{ora}}{\sum}\left( \lambda_{1,\tilde{k}+1}^{j} \left( e_{k,j}^{f}+{\eta}_{1}^{\tilde{k}} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})\right)+\right.\\ &&\left.\lambda_{2,\tilde{k}+1}^{j}\left( e_{k,j}^{g}+{\eta}_{2}^{\tilde{k}} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})-G(x^{k},\bar{\alpha},\bar{E})\right)\right), \end{array} $$

where \({\sum }_{j\in J_{\tilde {k}}^{ora}}\left (\lambda _{1,\tilde {k}+1}^{j} +\lambda _{2,\tilde {k}+1}^{j} \right )=1\). Since \(J^{agg}_{\tilde {k}}=\emptyset \), we have \(J^{agg}_{\tilde {k}+1}=\{\tilde {k}+1\}\). Hence,

$$ \begin{array}{@{}rcl@{}} S^{\tilde{k}+2} &= &\underset{j\in J_{\tilde{k}+1}^{ora}}{\sum}\left( \lambda_{1,\tilde{k}+2}^{j} s_{f}^{\tilde{k}+1,j} +\lambda_{2,\tilde{k}+2}^{j} s_{g}^{\tilde{k}+1,j}\right) +\underset{j\in J_{\tilde{k}+1}^{agg}}{\sum}\lambda_{3,\tilde{k}+2}^{j}\cdot S^{j}\\ &=&\underset{j\in J_{\tilde{k}+1}^{ora}}{\sum}\left( \lambda_{1,\tilde{k}+2}^{j} \left( {\xi_{f}^{j}} +\eta_{1}^{\tilde{k}+1} {{{\varDelta}}^{j}_{k}}\right)+\lambda_{2,\tilde{k}+2}^{j}\left( \xi_{g}^{k,j}+\eta_{2}^{\tilde{k}+1}{{{\varDelta}}^{j}_{k}}\right)\right) \\ &&+\underset{j\in J_{\tilde{k}}^{ora}}{\sum}\left( \lambda_{1,\tilde{k}+1}^{j} \lambda_{3,\tilde{k}+2}^{\tilde{k}+1} \left( {\xi_{f}^{j}} +\eta_{1}^{\tilde{k}} {{{\varDelta}}^{j}_{k}}\right)+\lambda_{2,\tilde{k}+1}^{j}\lambda_{3,\tilde{k}+2}^{\tilde{k}+1}\left( \xi_{g}^{\tilde{k},j}+\eta_{2}^{\tilde{k}}{{{\varDelta}}^{j}_{k}}\right)\right)\\ \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} C^{\tilde{k}+2} &=& \underset{j\in J_{\tilde{k}+1}^{ora}}{\sum} \left( \lambda_{1,\tilde{k}+2}^{j} c^{f}_{\tilde{k}+1,j} +\lambda_{2,\tilde{k}+2}^{j} c^{g}_{\tilde{k}+1,j}\right) +\underset{j\in J_{\tilde{k}+1}^{agg}}{\sum}\lambda_{3,\tilde{k}+2}^{j}\cdot C^{j}\\ &=&\underset{j\in J_{\tilde{k}+1}^{ora}}{\sum}\left( \lambda_{1,\tilde{k}+2}^{j} \left( e_{k,j}^{f}+{\eta}_{1}^{\tilde{k}+1} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})\right)\right. \\ &&+\left.\lambda_{2,\tilde{k}+2}^{j} \left( e_{k,j}^{g}+{\eta}_{2}^{\tilde{k}+1} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})-G(x^{k},\bar{\alpha},\bar{E})\right)\right) \\ &&+\underset{j\in J_{\tilde{k}}^{ora}}{\sum}\left( \lambda_{1,\tilde{k}+1}^{j} \lambda_{3,\tilde{k}+2}^{\tilde{k}+1} \left( e_{k,j}^{f}+{\eta}_{1}^{\tilde{k}} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})\right)\right.\\ &&+\left.\lambda_{2,\tilde{k}+1}^{j}\lambda_{3,\tilde{k}+2}^{\tilde{k}+1} \left( e_{k,j}^{g}+{\eta}_{2}^{\tilde{k}} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})-G(x^{k},\bar{\alpha},\bar{E})\right)\right).\\ \end{array} $$

It is easily seen that

$$ \underset{j\in J_{\tilde{k}+1}^{ora}}{\sum}\left( \lambda_{1,\tilde{k}+2}^{j} +\lambda_{2,\tilde{k}+2}^{j} \right)+\underset{j\in J_{\tilde{k}}^{ora}}{\sum}\left( \lambda_{1,\tilde{k}+1}^{j} \lambda_{3,\tilde{k}+2}^{\tilde{k}+1} +\lambda_{2,\tilde{k}+1}^{j} \lambda_{3,\tilde{k}+2}^{\tilde{k}+1} \right)=1. $$

We set

$$ \hat{J}_{\tilde{k}+1}= \left\{ \begin{aligned} & {J}^{ora}_{\tilde{k}+1}\setminus {J}^{C}_{\tilde{k}+1}, & \text{if}\ \ \lambda_{3,\tilde{k}+2}^{\tilde{k}+1}=0, \\ & ({J}^{ora}_{\tilde{k}+1}\setminus {J}^{C}_{\tilde{k}+1}) \cup\ ({J}^{ora}_{\tilde{k}}\setminus {J}^{C}_{\tilde{k}}), & \text{if}\ \ \lambda_{3,\tilde{k}+2}^{\tilde{k}+1}\neq0. \end{aligned} \right. $$

For any \(j\in \hat {J}_{\tilde {k}+1}\) and i ∈{1, 2}, we set

$$ \left\{ \begin{aligned} & \hat{\lambda}_{i,\tilde{k}+2}^{j}=\lambda_{i,\tilde{k}+1}^{j} \lambda_{3,\tilde{k}+2}^{\tilde{k}+1}\ & \text{and} \quad \hat{\eta}_{i}^{j}=\eta_{i}^{\tilde{k}}, \quad & \quad \text{if}\ \ j\in {J}^{ora}_{\tilde{k}}\setminus {J}^{ora}_{\tilde{k}+1}, \\ & \hat{\lambda}_{i,\tilde{k}+2}^{j}=\lambda_{i,\tilde{k}+2}^{j} & \text{and} \quad \hat{\eta}_{i}^{j}=\eta_{i}^{\tilde{k}+1}, & \quad \text{if}\ \ j\in {J}^{ora}_{\tilde{k}+1}\setminus {J}^{ora}_{\tilde{k}}, \\ & \hat{\lambda}_{i,\tilde{k}+2}^{j}=\lambda_{i,\tilde{k}+2}^{j}+\lambda_{i,\tilde{k}+1}^{j} \lambda_{3,\tilde{k}+2}^{\tilde{k}+1}\ & \text{and} \quad \hat{\eta}_{i}^{j}={\ddot{\eta}}_{i}^{j}, \quad & \quad \text{if}\ \ j\in {J}^{ora}_{\tilde{k}}\cap {J}^{ora}_{\tilde{k}+1}, \end{aligned} \right. $$

where

$$ \ddot{\eta}_{i}^{j}= \left\{ \begin{aligned} & \frac{\eta_{i}^{\tilde{k}}+\eta_{i}^{\tilde{k}+1}}{2}, & \quad \text{if}\ \ \hat{\lambda}_{i,\tilde{k}+2}^{j}=0, \\ & \frac{\lambda_{i,\tilde{k}+2}^{j} \eta_{i}^{\tilde{k}+1} +\lambda_{i,\tilde{k}+1}^{j} \lambda_{3,\tilde{k}+2}^{\tilde{k}+1}\eta_{i}^{\tilde{k}}}{\hat{\lambda}_{i,\tilde{k}+2}^{j}}, & \quad \text{if}\ \ \hat{\lambda}_{i,\tilde{k}+2}^{j}>0. \end{aligned} \right. $$

Combining these, we have

$$\eta_{\min}^{i,j} \leq \hat{\eta}_{i}^{j} \leq \eta_{\max}^{i},\ \forall \ i=\{1,2\},\ j\in\hat{J}_{\tilde{k}+1},$$

and

$$ \begin{array}{@{}rcl@{}} S^{\tilde{k}+2} &= & {\sum}_{j\in \hat{J}_{\tilde{k}+1}}\left( \hat{\lambda}_{1,\tilde{k}+2}^{j} \left( {\xi_{f}^{j}} +\hat{\eta}_{1}^{j} {{{\varDelta}}^{j}_{k}}\right)+\hat{\lambda}_{2,\tilde{k}+2}^{j}\left( \xi_{g}^{\tilde{k}+1,j}+\hat{\eta}_{2}^{j}{{{\varDelta}}^{j}_{k}}\right)\right), \\ C^{\tilde{k}+2} &= &{\sum}_{j\in \hat{J}_{\tilde{k}+1}}\left( \hat{\lambda}_{1,\tilde{k}+2}^{j} \left( e_{k,j}^{f}+{\hat{\eta}}_{1}^{j} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})\right) \right.\\ &&+\left.\hat{\lambda}_{2,\tilde{k}+2}^{j} \left( e_{k,j}^{g}+{\hat{\eta}}_{2}^{j} {b_{k}^{j}}+G^{+}(x^{k},\bar{\alpha},\bar{E})-G(x^{k},\bar{\alpha},\bar{E})\right)\right). \end{array} $$

By the same argument, we can obtain that there exist \(\hat {\lambda }_{1,k+1},\hat {\lambda }_{2,k+1}\in \mathbb {R}^{k+1}\) and \(\hat {J}_{k}\) satisfying (34) such that (35) and (36) hold. □

From Theorems 2 and 3, we obtain that G(⋅,αk,Ek) is regular on X. The authors of [17] state that the following lemma holds. We give the proof since our cutting plane model and bundle update differ from those in [17]; in addition, the proof in [17] does not consider the aggregate information of the cutting plane model.

Lemma 10

Assume that f is regular on X. Suppose that \(\{{\eta ^{k}_{1}}\}\) and \(\{{\eta _{2}^{k}}\}\) are bounded above. Let \(x^{k}\) (\(k\geq k_{E}\)) be the current stability center and \(\bar {\eta }=\max \limits _{j\in \{0,1,...,k\}}\{{\eta _{1}^{j}},{\eta _{2}^{j}}\}\). The upper envelope model is defined as

$$ \begin{array}{@{}rcl@{}} {{\varPhi}}(y,x^{k},\alpha^{k},E^{k})&=& G^{+}(x^{k},\alpha^{k},E^{k})\\ &&+\underset{i=1,2}{\sup}\left\{m^{i}_{y^{+},\xi_{i},\eta} |\ y^{+}\in {B}(x^{k};R),\ \xi_{1}\in\partial f(y^{+}),\ \xi_{2}\in\partial G(y^{+},\alpha^{k},E^{k}),\ \eta\in[\gamma,\bar{\eta}]\right\}, \end{array} $$

where

$$ m^{i}_{y^{+},\xi_{i},\eta}:={-\frac{\gamma}{2}}\|y^{+}-x^{k}\|^{2}+\langle\xi_{i}+\eta(y^{+}-x^{k}),y-x^{k}\rangle, i=1,2. $$
(37)

Here, B(xk;R) is a fixed ball large enough to contain all possible trial steps during iteration k. Then, the following statements hold.

  1.

    Φ(⋅,xk,αk,Ek) is a convex function and \(\hat {H}_{k}(\cdot )\leq {{\varPhi }}(\cdot ,x^{k},\alpha ^{k},E^{k})\).

  2.

    Φ(xk,xk,αk,Ek) = G+(xk,αk,Ek).

  3.

    \(\partial {{\varPhi }}(x^{k},x^{k},\alpha ^{k},E^{k})\subseteq \partial H(x^{k},x^{k},\alpha ^{k},E^{k})\).

Proof

We only show that \(\hat {H}_{k}(\cdot )\leq {{\varPhi }}(\cdot ,x^{k},\alpha ^{k},E^{k})\); for the rest of the proof, we refer the reader to [17].

Using (14) together with (15), for \(j\in J_{k}^{ora}\) we obtain that

$$ \begin{array}{@{}rcl@{}} -c_{k,j}^{f}+\langle s_{f}^{k,j},y-x^{k}\rangle & \leq -\gamma {b_{k}^{j}}+{\langle\xi_{f}^{j}}+{\eta^{k}_{1}}(y^{j}-x^{k}),y-x^{k}\rangle=m^{1}_{y^{j},{\xi_{f}^{j}},{\eta^{k}_{1}}}, \\ -c_{k,j}^{g}+\langle s_{g}^{k,j},y-x^{k}\rangle & \leq -\gamma {b_{k}^{j}}+\langle\xi_{g}^{k,j}+{\eta^{k}_{2}}(y^{j}-x^{k}),y-x^{k}\rangle=m^{2}_{y^{j},\xi_{g}^{k,j},{\eta^{k}_{2}}}. \end{array} $$

On the other hand, from Lemma 9, we can obtain that for \(j\in J_{k}^{agg}\) there exist \(\hat {\lambda }_{1,j},\hat {\lambda }_{2,j}\in \mathbb {R}^{j}\), and \(\hat {J}_{j-1}\) satisfying (34) such that

$$ \begin{array}{@{}rcl@{}} S^{j} &= & \underset{l\in \hat{J}_{j-1}}{\sum} \left( \hat{\lambda}_{1,j}^{l}\left( {\xi_{f}^{l}}+\hat{\eta}_{1}^{l} {{{\varDelta}}_{k}^{l}}\right)+\hat{\lambda}_{2,j}^{l}\left( \xi_{g}^{k,l}+\hat{\eta}_{2}^{l} {{{\varDelta}}_{k}^{l}}\right)\right), \\ C^{j} &= & \underset{l\in \hat{J}_{j-1}}{\sum} \left( \hat{\lambda}_{1,j}^{l}\left( e_{k,l}^{f}+\hat{\eta}_{1}^{l} {b_{k}^{l}}+G^{+}(x^{k},\bar{\alpha},\bar{E})\right)\right.\\ &&+\left.\hat{\lambda}_{2,j}^{l}\left( e_{k,l}^{g}+\hat{\eta}_{2}^{l} {b_{k}^{l}}+G^{+}(x^{k},\bar{\alpha},\bar{E})-G(x^{k},\bar{\alpha},\bar{E})\right)\right), \end{array} $$

where \(\hat {\eta }_{i}^{l}\geq {\eta }_{\min \limits }^{i,l},\ i=1,2\). According to (16) and the definition of \({\eta }_{\min \limits }^{i,l}\), we can obtain that

$$ \begin{array}{@{}rcl@{}} e_{k,l}^{f}+\hat{\eta}_{1}^{l} {b_{k}^{l}} \geq \gamma {b_{k}^{l}}, ~~ e_{k,l}^{g}+\hat{\eta}_{2}^{l} {b_{k}^{l}} \geq \gamma {b_{k}^{l}}. \end{array} $$

Hence,

$$ \begin{array}{@{}rcl@{}} &&-C^{j}+\langle S^{j} ,y-x^{k}\rangle \\ &\leq & \underset{l\in \hat{J}_{j-1}}{\sum}\left( \hat{\lambda}_{1,j}^{l}\left( -\gamma {b_{k}^{l}}+\langle {\xi_{f}^{l}}+\hat{\eta}_{1}^{l} {{{\varDelta}}_{k}^{l}},y-x^{k}\rangle\right)\right.\\ &&+\left.\hat{\lambda}_{2,j}^{l}\left( -\gamma {b_{k}^{l}}+\langle \xi_{g}^{k,l}+\hat{\eta}_{2}^{l} {{{\varDelta}}_{k}^{l}},y-x^{k}\rangle\right)\right) \\ &\leq & \max\{-\gamma {b_{k}^{l}}+\langle {\xi_{f}^{l}}+\hat{\eta}_{1}^{l} {{{\varDelta}}_{k}^{l}},y-x^{k}\rangle, -\gamma {b_{k}^{l}}+\langle \xi_{g}^{k,l}+\hat{\eta}_{2}^{l} {{{\varDelta}}_{k}^{l}},y-x^{k}\rangle\} \\ &\leq & \underset{\eta\in[\gamma,\bar{\eta}]}{\sup} \{m^{1}_{y^{j},{\xi_{f}^{j}},\eta}, m^{2}_{y^{j},\xi_{g}^{k,j},\eta}\}. \end{array} $$

By the definition (22) of \(\hat {H}_{k}(\cdot )\), we can obtain that \(\hat {H}_{k}(\cdot )\leq {{\varPhi }}(\cdot ,x^{k},\alpha ^{k},E^{k})\). □

Theorem 6

Assume that f is bounded below and regular on X. Suppose that \(\{{\eta ^{k}_{1}}\}\) and \(\{{\eta _{2}^{k}}\}\) are bounded above. Consider the stopping parameter tol1 = 0 and suppose there is no termination. Then, one of the following two mutually exclusive situations occurs:

(i):

Algorithm 1 generates an infinite sequence of serious steps. Let \(\bar {x}\) be an accumulation point of {xk}; then \(0\in \partial H(\bar {x},\bar {x},\bar {\alpha },\bar {E})+\partial I_{X}(\bar {x})\).

(ii):

Algorithm 1 generates a finite sequence of serious steps. Let \(\bar {x}\) be the last stability center; then \(0\in \partial H(\bar {x},\bar {x},\bar {\alpha },\bar {E})+\partial I_{X}(\bar {x})\).

In addition, if MFCQ holds at \(\bar {x}\) for problem (12) with \(E=\bar {E}\), then \(\bar {x}\) is a tol2-KKT point of problem (1).

Proof

In what follows, we consider \(k\geq k_{E}\). From Theorem 1 and Lemma 9, the sequence {Sk} is bounded; thus, passing to a subsequence if necessary, we may assume that \(S^{k+1}\rightarrow \bar {S}\). Hence, \(v^{k+1}\rightarrow -\bar {S}\) as \(k\rightarrow \infty \). Using (21) together with Lemma 10, we obtain that

$$ \begin{array}{@{}rcl@{}} {{\varPhi}}(y,x^{k},\bar{\alpha},\bar{E})&\geq & \ \hat{H}_{k}(y)\\ &\geq & \ \hat{H}_{k}(y^{k+1})+\langle S^{k+1},y-y^{k+1}\rangle \\ &= & \ G^{+}(x^{k},\bar{\alpha},\bar{E})-C^{k+1}+\langle S^{k+1},y-x^{k}\rangle. \end{array} $$

By (19) and the convexity of \(I_{X}\), we have \(I_{X}(y)\geq I_{X}(y^{k+1})+\langle v^{k+1},y-y^{k+1}\rangle \). As a result, it holds that

$$ {{\varPhi}}(y,x^{k},\bar{\alpha},\bar{E})+I_{X}(y)\geq G^{+}(x^{k},\bar{\alpha},\bar{E})-C^{k+1}+\langle S^{k+1}+v^{k+1},y-x^{k}\rangle+\langle v^{k+1},x^{k}-y^{k+1}\rangle. $$
(38)
  (i)

    First, note that {xk} is in the compact set X, so it has an accumulation point, say for some infinite set \(K\subseteq \mathcal {K}_{s}\), \(x^{k}\rightarrow \bar {x}\) as \(K\ni k\rightarrow \infty \). For all k + 1 ∈ K, we have \(x^{k}-y^{k+1}=\frac {1}{\mu _{k}}(S^{k+1}+v^{k+1})\). By Lemma 6, \(C^{k+1}\rightarrow 0\), \(S^{k+1}+v^{k+1}\rightarrow 0\), \(x^{k}-y^{k+1}\rightarrow 0\), and \(x^{k}\rightarrow \bar {x}\) as \(K\ni k+1\rightarrow \infty \). Passing to the limit in (38) as \(K\ni k+1\rightarrow \infty \), we obtain that

    $$ {{\varPhi}}(y,\bar{x},\bar{\alpha},\bar{E})+I_{X}(y)\geq G^{+}(\bar{x},\bar{\alpha},\bar{E})+\langle 0,y-\bar{x}\rangle={{\varPhi}}(\bar{x},\bar{x},\bar{\alpha},\bar{E})+\langle 0,y-\bar{x}\rangle. $$

    Using the convexity of \({{\varPhi }}(\cdot ,\bar {x},\bar {\alpha },\bar {E})+I_{X}(\cdot )\), we have \(0\in \partial {{\varPhi }}(\bar {x},\bar {x},\bar {\alpha },\bar {E})+\partial I_{X}(\bar {x}).\) According to Lemma 10, we can conclude that \(0\in \partial H(\bar {x},\bar {x},\bar {\alpha },\bar {E})+\partial I_{X}(\bar {x}).\)

  (ii)

    Let \(\bar {k}\) be the last serious iteration, i.e., \(x^{k}=\bar {x}\) for all \(k\geq \bar {k}\). For \(k\geq \bar {k}\), we have that \(\bar {x}-y^{k+1}=x^{k}-y^{k+1}=\frac {1}{\mu _{k}}(S^{k+1}+v^{k+1})\). By Lemma 8, \(C^{k+1}\rightarrow 0\), \(S^{k+1}+v^{k+1}\rightarrow 0\), and \(y^{k+1}-x^{k}\rightarrow 0\) as \(k\rightarrow \infty \). Passing to the limit in (38) as \(k\rightarrow \infty \), we obtain that

    $$ {{\varPhi}}(y,\bar{x},\bar{\alpha},\bar{E})+I_{X}(y)\geq G^{+}(\bar{x},\bar{\alpha},\bar{E})+\langle 0,y-\bar{x}\rangle={{\varPhi}}(\bar{x},\bar{x},\bar{\alpha},\bar{E})+\langle 0,y-\bar{x}\rangle. $$

    By the same discussion, we deduce that \(0\in \partial H(\bar {x},\bar {x},\bar {\alpha },\bar {E})+\partial I_{X}(\bar {x})\).

    From Theorem 5 and Remark 2, we obtain that \(\bar {x}\) is a tol2-KKT point of the original problem (1). The proof is complete. □

Theorem 7

Assume that f is bounded below and regular on X. Suppose that \(\{{\eta ^{k}_{1}}\}\) and \(\{{\eta _{2}^{k}}\}\) are bounded above. Consider the stopping parameter tol1 = 0 and suppose that the algorithm loops forever. If f is semismooth, then for any accumulation point \(\bar {x}\) of the sequence {xk} and for each 𝜖 > 0 there exists ρ > 0 such that

$$ \max\left\{f(y)-f(\bar{x}) ,G(y,\bar{\alpha},\bar{E})\right\}\geq -\epsilon\|y-\bar{x}\|, \ {\text{for\ all} \ y\in X\cap B(\bar{x};\rho)}. $$

If, in addition, the set \(X_{\rho ,\epsilon ,tol_{2}}=\{y\in X\cap B(\bar {x};\rho ): G(y)<-(\rho \epsilon +tol_{2})\}\) is not empty, then

$$ f(y)\geq f(\bar{x})-\epsilon\|y-\bar{x}\|\ \text{for\ all}\ y\in X_{\rho,\epsilon,tol_{2}}. $$

Proof

Let \(\hat {\lambda }^{j}=\hat {\lambda }_{1,k+1}^{j}+\hat {\lambda }_{2,k+1}^{j}\); then, since \(0\leq \hat {\lambda }^{j}\leq 1\) for every \(j\in \hat {J}_{k}\), we obtain that

$$ C^{k+1}\geq \gamma\underset{j\in \hat{J}_{k}}{\sum} \hat{\lambda}^{j} {b_{k}^{j}} \geq \gamma \underset{j\in \hat{J}_{k}}{\sum} (\hat{\lambda}^{j})^{2} {b_{k}^{j}}=\frac{\gamma}{2}\underset{j\in \hat{J}_{k}}{\sum}\left( \hat{\lambda}^{j}\|y^{j}-x^{k}\|\right)^{2}. $$
(39)

Inequality (39), together with Lemmas 6 and 8, implies that \(\|y^{j}-x^{k}\|\rightarrow 0\) for all \(j\in \hat {J}_{k}\). Moreover, \(\|y^{j}-\bar {x}\|\leq \|y^{j}-x^{k}\|+\|x^{k}-\bar {x}\|\rightarrow 0\) for all \(j\in \hat {J}_{k}\). From Proposition 1 and Theorem 3, we obtain that for any fixed 𝜖 > 0 there exist ρ > 0, \(\hat {\lambda }_{1,k+1},\hat {\lambda }_{2,k+1}\in \mathbb {R}^{k+1}\), and \(\hat {J}_{k}\) satisfying (34) such that \(y^{j}\in B(\bar {x};\rho )\) for all \(j\in \hat {J}_{k}\) and

$$ \begin{aligned} & f(y)\geq \ f(y^{j})+{\langle\xi^{j}_{f}},y-y^{j}\rangle-\epsilon\|y-y^{j}\|,\ y\in X\cap B(\bar{x};\rho), \\ & G(y,\bar{\alpha},\bar{E})\geq \ G(y^{j},\bar{\alpha},\bar{E})+\langle\xi^{k,j}_{g},y-y^{j}\rangle-\epsilon\|y-y^{j}\|,\ y\in X\cap B(\bar{x};\rho). \end{aligned} $$

Using (35) together with (36), for \(j\in \hat {J}_{k}\) we see that

$$ \begin{array}{@{}rcl@{}} && \ \max\left\{f(y)-f(x^{k}) ,G(y,\bar{\alpha},\bar{E})\right\} \\ &\geq & \ \underset{j\in \hat{J}_{k}}{\sum}\left( \hat{\lambda}_{1,k+1}^{j}\left( f(y)-f(x^{k})\right)+\hat{\lambda}_{2,k+1}^{j} G(y,\bar{\alpha},\bar{E})\right)\\ &\geq & \ \underset{j\in \hat{J}_{k}}{\sum}\left( \hat{\lambda}_{1,k+1}^{j}\left( f(y^{j})+{\langle\xi^{j}_{f}},y-y^{j}\rangle-\epsilon\|y-y^{j}\|-f(x^{k})\right)\right.\\ &&+\left.\hat{\lambda}_{2,k+1}^{j}\left( G(y^{j},\bar{\alpha},\bar{E})+\langle\xi^{k,j}_{g},y-y^{j}\rangle-\epsilon\|y-y^{j}\|\right)\right)\\ &= & \underset{j\in \hat{J}_{k}}{\sum}\left( -\epsilon\hat{\lambda}^{j}\|y-y^{j}\|+\hat{\lambda}_{1,k+1}^{j}\left( -e^{f}_{k,j}+{\langle\xi^{j}_{f}},y-x^{k}\rangle\right)\right.\\ &&\left.+\hat{\lambda}_{2,k+1}^{j} \left( -e^{g}_{k,j}+\langle\xi^{k,j}_{g},y-x^{k}\rangle + G(x^{k},\bar{\alpha},\bar{E}) \right)\right)\\ &\geq & \ -C^{k+1} +G^{+}(x^{k},\bar{\alpha},\bar{E})+\langle S^{k+1},y-x^{k}\rangle-\left\langle \underset{j\in \hat{J}_{k}}{\sum}\hat{\eta}^{j}(y^{j}-x^{k}),y-x^{k}\right\rangle\\ &&-\epsilon\underset{j\in \hat{J}_{k}}{\sum}\hat{\lambda}^{j}\|y-y^{j}\|\\ &\geq & \ -C^{k+1}+\langle S^{k+1},y-x^{k}\rangle-\left\langle \underset{j\in \hat{J}_{k}}{\sum}\hat{\eta}^{j}(y^{j}-x^{k}),y-x^{k}\right\rangle\\ &&-\epsilon\underset{j\in \hat{J}_{k}}{\sum}\hat{\lambda}^{j}\left( \|y-x^{k}\|+\|y^{j}-x^{k}\|\right). \end{array} $$

From Theorem 1 and Lemma 9, the sequence {Sk} is bounded; thus, passing to a subsequence if necessary, we may assume that \(S^{k+1}\rightarrow \bar {S}\). By Lemmas 6 and 8, passing to the limit as \(k\rightarrow \infty \), we obtain that

$$ \max\left\{f(y)-f(\bar{x}) ,G(y,\bar{\alpha},\bar{E})\right\}\geq \langle \bar S,y-\bar x\rangle-\epsilon\|y-\bar{x}\|. $$

As already seen, \(S^{k+1}+v^{k+1}\rightarrow 0\) implies that \(-\bar S\in \partial I_{X}(\bar x)\); thus, \(\langle -\bar S,y-\bar x\rangle \leq 0\) for all \(y\in X\). Hence,

$$ \max\left\{f(y)-f(\bar{x}) ,G(y,\bar{\alpha},\bar{E})\right\}\geq -\epsilon\|y-\bar{x}\|. $$

By (11), we have \(G(y,\bar {\alpha },\bar {E})\leq G(y)+D(\bar {E},\bar {\alpha })\leq G(y)+tol_{2}\). For \(y\in X_{\rho ,\epsilon ,tol_{2}}=\{y\in X\cap B(\bar {x};\rho ): G(y)<-(\rho \epsilon +tol_{2})\}\) we have \(G(y)<-(\rho \epsilon +tol_{2})\) and \(\|y-\bar {x}\|\leq \rho \), so \(G(y,\bar {\alpha },\bar {E})<-\rho \epsilon \leq -\epsilon \|y-\bar {x}\|\). The maximum above is therefore attained by \(f(y)-f(\bar {x})\), which gives \(f(y)\geq f(\bar {x})-\epsilon \|y-\bar {x}\|\) for all \(y\in X_{\rho ,\epsilon ,tol_{2}}\). The proof is complete. □

5 Computational results

From Section 2.3, the SIP can be rewritten as a finitely constrained optimization problem when the lower level problem is concave. Hence, we consider both nonconvex nonsmooth constrained optimization problems and nonconvex nonsmooth SIP problems.

In this section, we report numerical results to verify the practical efficiency of our approach. We consider two classes of test problems, which are introduced in the Appendix.

Solvers. For comparison purposes, we use the following solvers:

  • PBM-inexact, the Proximal Bundle Method from [23].

  • ICPBM, the Infeasible Constrained Proximal Bundle Method from [17].

  • SolvOpt, a publicly available solver for local nonlinear optimization problems. The code is available at https://imsc.uni-graz.at/kuntsevich/solvopt/index.html.

  • ASA, the Adaptive Subdivision Algorithm from [41].

  • fseminf, the MATLAB Optimization Toolbox routine for finding a minimum of a semi-infinitely constrained multivariable nonlinear function.

Solvers ICPBM and SolvOpt are designed for nonsmooth nonconvex constrained problems; these methods are only applicable to Problems 1–8. Solvers ASA and fseminf are designed for semi-infinite programming problems; these methods are only applicable to Problems 9–23. All solvers were implemented in MATLAB R2018a (Windows 10, 64-bit). The experiments were performed on a PC with an Intel Core i5 processor at 2.4 GHz. We solve the quadratic subproblems using quadprog, which is available in the MATLAB Optimization Toolbox. For solving the NLP subproblems, ASA and Algorithm 1 use fmincon from the MATLAB Optimization Toolbox.

Parameters. The parameters of Algorithm 1 are set as follows: \(tol_{1}=10^{-6},\ tol_{2}=10^{-6},\ s=2,\ \gamma =2,\ i_{n\max \limits }=100.\) We choose the descent parameter as follows:

$$ m=\left\{\begin{aligned} & 0.01, \quad \text{if}\ \ n\leq 20 \\ & 0.25, \quad \text{if}\ \ 20<n\leq200 \\ & 0.55, \quad \text{if}\ \ n>200. \end{aligned}\right. $$

The proximal parameter is only updated at serious steps. We set μ0 = 5 and

$$ \mu_{k+1}:=\max\left( \mu_{\min},\min\left( \bar{\mu}_{k+1},\mu_{\max}\right)\right), $$

where

$$ \begin{array}{@{}rcl@{}} \bar{\mu}_{k+1}:=\frac{\|S^{k+1}-S^{k}\|^{2}}{\langle S^{k+1}-S^{k},y^{k+1}-y^{k}\rangle},\\ \mu_{\min}=1,\ \mu_{\max}=10^{5}.\\ \end{array} $$
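For concreteness, the following MATLAB-style sketch summarizes how the descent and proximal parameters above could be computed; μ0 = 5 is used as the starting value. The function names descent_parameter and proximal_parameter are ours, and the fallback to μmax when the curvature estimate ⟨S^{k+1}−S^{k}, y^{k+1}−y^{k}⟩ is not positive is an assumption, since the text does not specify this case.

    function m = descent_parameter(n)
    % Descent parameter m chosen according to the problem dimension n.
        if n <= 20
            m = 0.01;
        elseif n <= 200
            m = 0.25;
        else
            m = 0.55;
        end
    end

    function mu_next = proximal_parameter(S_new, S_old, y_new, y_old)
    % Safeguarded update of the proximal parameter, applied at serious steps only.
        mu_min = 1;  mu_max = 1e5;
        denom  = (S_new - S_old)' * (y_new - y_old);   % column vectors assumed
        if denom > 0
            mu_bar = norm(S_new - S_old)^2 / denom;
        else
            mu_bar = mu_max;   % assumed safeguard when the curvature estimate is not positive
        end
        mu_next = max(mu_min, min(mu_bar, mu_max));
    end

In this sketch, descent_parameter would be called once with the problem dimension, while proximal_parameter would be invoked only after a serious step, with S and y taken from two consecutive iterations.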

In ICPBM, the stopping tolerance tol has been set to \(10^{-6}\). For all other parameters of ICPBM, we have used the default values suggested by the code [17]. Likewise, for PBM-inexact and ASA we have used the default parameter values [23, 41]. In ASA, the maximum number of iterations is fixed at 10,000. In fseminf, the initial sampling interval is set as follows:

$$ \left\{ \begin{aligned} & a :\frac{b-a}{10^{3}}:b, \ \ & p=1, \\ & a_{i} :\frac{b_{i}-a_{i}}{10^{2}}:b_{i},i=1,2,\ & p=2. \end{aligned} \right. $$

Here, a : h : b denotes the uniform grid from a to b with step size h (MATLAB colon notation). The stopping criteria of fseminf are set as follows:

$$ TOLFun=10^{-6}, TOLCon=10^{-6}, TOLX=10^{-6}. $$

Results. We summarize the results of our numerical experiments in Tables 1 and 2, where we use the following notation:

  • P is the problem number,

  • NF is the number of function evaluations,

  • NT is the number of nodes in the final subdivision,

  • time is the CPU time in seconds,

  • f is the objective function value when the algorithm terminates,

  • g is the constraint function value when the algorithm terminates for constrained optimization problems,

  • G∗ measures the feasibility of x∗ when the algorithm terminates for SIP problems, i.e.,

    $$ G^{*}=\underset{t\in\bar{E}}{\max} g(x^{*},t), $$

    where x∗ is the solution obtained when the algorithm terminates (a short computational sketch follows this list) and

    $$ \bar{E}= \left\{ \begin{aligned} & a :\frac{b-a}{10^{6}}:b, & p=1, \\ & a_{i} :\frac{b_{i}-a_{i}}{10^{4}}:b_{i},i=1,2, & p=2. \end{aligned} \right. $$
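As an illustration of this feasibility check, the MATLAB-style sketch below evaluates g on the fine grid Ē for the one-dimensional case p = 1; the handle g(x,t), the variable x_star, and the function name feasibility_measure are ours and not part of the algorithm. For p = 2, the same idea applies to a two-dimensional grid (e.g., built with ndgrid).

    function G_star = feasibility_measure(g, x_star, a, b)
    % Evaluate the constraint g(x*, t) on the fine grid E_bar = a : (b-a)/10^6 : b
    % and report the largest value as the feasibility measure G*.
        E_bar  = a : (b - a)/1e6 : b;
        vals   = arrayfun(@(t) g(x_star, t), E_bar);
        G_star = max(vals);
    end

A value G∗ ≤ 0 indicates that the returned point satisfies the semi-infinite constraint on the grid Ē.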

The results presented in Table 1 show that the new solver Algorithm 1 is more reliable than the other solvers in finding an optimal solution to each problem. The solvers Algorithm 1 and PBM-inexact reach the same optimal value in most problems; only in a few cases (Problems 3 and 7 (n = 100)) does Algorithm 1 reach a lower objective value. Although less reliable than Algorithm 1, the solver PBM-inexact also performs quite well and yields feasible points in all problems. In some problems (Problem 7 (n = 20, 100, 200) and Problem 8 (n = 100, 200)), Algorithm 1 reaches a lower objective value than the solver ICPBM. The solver SolvOpt fails to find the optimal value in Problems 7 and 8 for n ≥ 20.

Figure 1 depicts the performance plot comparing the solvers Algorithm 1, PBM-inexact, ICPBM, and SolvOpt on Problems 1–8. It can be observed that Algorithm 1 performs much better than the other solvers in terms of CPU time. The solver SolvOpt is the fastest for n ≤ 10; however, it requires more computational effort than the other solvers on Problems 7 and 8 with n > 10.

Fig. 1 Performance plot for Algorithm 1, PBM-inexact, ICPBM, and SolvOpt

Overall, we conclude that the new solver Algorithm 1 is efficient for solving nonconvex, nonsmooth constrained optimization problems.

Table 1 Summary of numerical results with PBM-inexact, ICPBM, and SolvOpt
Table 2 Summary of numerical results with PBM-inexact, ASA, and fseminf

The results presented in Table 2 show that the solver Algorithm 1 is efficient in finding an optimal solution to each SIP problem. Both Algorithm 1 and PBM-inexact find a feasible point for each SIP problem. The solver ASA fails to find an approximately optimal solution in 40% of the cases, while fseminf fails in 80% of the cases. Table 2 also shows that Algorithm 1 requires far more nodes than ASA; nevertheless, Algorithm 1 is much faster than ASA. This is expected because the subproblems of ASA require more computational effort than those of Algorithm 1, and the subproblems of our method do not depend on the number of nodes in the subdivision.

Figure 2 depicts the performance plot comparing the solvers Algorithm 1, PBM-inexact, ASA, and fseminf on Problems 9–23. Algorithm 1 performs much better than the solvers PBM-inexact and ASA in terms of CPU time. The solver fseminf is the fastest in some cases; however, it fails to find the optimal value in most problems.

Fig. 2 Performance plot for Algorithm 1, PBM-inexact, ASA, and fseminf

Overall, we conclude that Algorithm 1 is efficient for solving nonconvex, nonsmooth SIP problems.

6 Conclusions

In this paper, we design a feasible proximal bundle method with convexification for SIP problems. We consider SIP problems that involve nonsmooth, nonconvex functions and present an upper bounding problem. The upper bounding problem is constructed from a concave relaxation of the lower level problem, which results in a restriction of the SIP. The relaxed lower level problem is a finite nonsmooth problem constructed with ideas from the αBB method of global optimization. We then introduce an improvement function for the upper bounding problem, construct a cutting plane model of the improvement function, reformulate the cutting plane model as a quadratic programming problem, and solve it. In contrast to the traditional bundle method, the cutting plane model involves new parameters E and α, which vary along the iterations. If \(D(E,\ \alpha )\rightarrow 0\), the upper bounding problems converge to the original SIP problem. We perform an initialization to generate a feasible starting point for our proximal bundle method. Under our refinement procedure and descent condition, each iterate is feasible. Reasonable convergence properties of our algorithm are obtained under mild assumptions. The presented numerical experiments confirm that Algorithm 1 is efficient for solving nonsmooth, nonconvex SIP problems. The feasibility of Algorithm 1 is demonstrated through both theoretical analysis and numerical experiments.