1 Introduction

The semi-infinite programming problem is an optimization problem with finite-dimensional decision variables x∈ℝ^n and an infinite number of constraints. In this paper, we are concerned with the following nonlinear semi-infinite programming problem (NSIP):

$$\everymath{\displaystyle}\begin{array}{@{}l@{\quad}l}(\mathrm{NSIP}){:} & \min_{x\in X} f(x)\\[2mm]&\mbox{subject\ to} \quad g(x,y)\leq0,\quad y\in Y,\end{array}$$

where X is a box constraint set in ℝ^n given by x^L ≤ x ≤ x^U. Both x^L and x^U are n-dimensional vectors with x^L < x^U; the vector inequalities are understood componentwise. The objective function f:ℝ^n→ℝ is twice differentiable on X; the set Y is a fixed nonempty compact subset of ℝ^m; and the constraint g is a twice differentiable function with respect to x and y. The more general case, in which an x-dependent index set Y(x) replaces the fixed index set Y, is called a generalized semi-infinite programming problem (GSIP). In this paper, we focus on the case m=1 for the sake of convenience. Moreover, without loss of generality, we let Y=[0,1] throughout this paper.

Optimality conditions and duality theorems for semi-infinite programming (SIP) problems have been studied since the 1960s. A complete survey of optimality conditions and duality theory for linear, convex, and smooth SIP problems is given in [22]. Many applied problems can be treated as SIP problems, for example in approximation theory, economics, and numerous engineering areas such as optimal control, robot trajectory planning, digital filter design, air pollution control, and production planning. We refer to [12] for an excellent review of SIP covering theory, algorithms, and applications. Furthermore, [19, 21] summarize the existing and typically used algorithms for solving SIP problems.

The main difficulty in solving SIP problems is the infinite number of constraints. In the last two decades, many algorithms have been proposed to solve SIP problems. Among these, the most popular is the discretization method (cf. [19, 23, 25]). Another common approach, the reduction-based method, reduces the problem locally to a finite-dimensional nonlinear programming problem (see, e.g., [11, 18, 24]). Of these two approaches, the discretization method becomes computationally expensive as the cardinality of the finite subset of Y grows. Moreover, although the reduction-based method, which utilizes an exact penalty function, yields global convergence, it usually requires stronger assumptions and is implemented in a rather simple form. In addition to the foregoing methods, the Karush-Kuhn-Tucker (KKT) system of the problem (NSIP) can be reformulated into a system of semismooth equations (cf. [15, 20, 26]); the authors then apply smoothing, semismooth Newton, or Newton-type methods to solve the KKT system. In recent studies [5, 6], a branch-and-bound technique is used to generate convergent sequences of upper and lower bounds on the SIP solution value. The authors focus on finding the global solution of SIP problems and present an algorithm with feasible iterates for SIP problems.

In addition, an important class of methods, the exchange methods, are proposed in [4, 14]. The exchange method, also called the cutting plane method, provides an outer approximation of the feasible set of SIP problems. At each iteration, this algorithm reduces the infinite index set Y to a finite subset of Y. However, the algorithm must solve the so-called lower level optimization problem at every iteration:

$$ \max g(\bar{x},y) \quad\mbox{subject\ to} \quad y\in Y,$$
(1)

for a given point \(\bar{x}\in\mathbb{R}^{n}\). Determining the global maximizer of problem (1) numerically may be difficult, especially when \(g(\bar{x},\cdot)\) is non-convex with respect to the variable y. In fact, a standard solver for (1) can only be expected to return a local maximizer, not the global one. For SIP problems in which both f(x) and g(x,⋅) are convex functions, Zhang et al. [28] propose a new exchange method. This method only requires finding points satisfying a certain computationally easy criterion instead of finding the global optimal solution of problem (1).

Recently, Floudas and Stein [8] proposed an adaptive convex relaxation of the lower level problem (1). The authors split Y into a finite number of subintervals and, on each subinterval, substituted a concave overestimating function for the constraint; this produces feasible iterates for the SIP problem. Although the adaptive convexification method makes it possible to find the global maximum of the lower level problem (1) efficiently, the computation may become costly as the number of subintervals increases. Therefore, in this paper we integrate the idea of the exchange method into the adaptive convexification method and propose a modified algorithm. We denote by \(\mathcal{F}\) the feasible set of the problem (NSIP):

$$\mathcal{F}=\{x\in X\ |\ g(x,y)\leq0, \mbox{\ for\ all\ } y\in Y\}.$$

Here, we provide the first order optimality condition for the problem (NSIP) from [22].

Theorem 1

Let \(\tilde{x}\in\mathcal{F}\) be a local minimizer of (NSIP) and let \(Y_{\mathrm{act}}(\tilde{x}):=\{y\in Y\,|\,g(\tilde{x},y)=0\}\) be nonempty. Then there exist \(\tilde{y}_{k}\in Y_{\mathrm{act}}(\tilde{x})\), for 1≤k≤n+1, and (λ_0,λ_1,…,λ_{n+1})∈ℝ×ℝ^{n+1} with λ_0,λ_1,…,λ_{n+1}≥0 such that

$$ \lambda_0\nabla f(\tilde{x})+\sum_{k=1}^{n+1}\lambda_{k}\nabla_x g(\tilde{x},\tilde{y}_{k})=0,$$
(2)
$$ \lambda_{k}\,g(\tilde{x},\tilde{y}_{k})=0,\quad k=1,2,\ldots,n+1,$$
(3)
$$ g(\tilde{x},y)\leq0,\quad\mbox{for\ all\ } y\in Y$$
(4)

hold. Furthermore, a point \(\tilde{x}\in\mathcal{F}\) is said to satisfy the extended Mangasarian-Fromovitz constraint qualification (EMFCQ) if there exists a direction d∈ℝ^n such that \(\nabla_{x}g(\tilde {x},y)^{\top}d<0\) for all \(y\in Y_{\mathrm{act}}(\tilde{x})\). If EMFCQ is valid at \(\tilde{x}\), we can set λ_0=1 in (2).

A point \(\tilde{x}\) that satisfies (2)–(4) is called a stationary point of the problem (NSIP). Throughout this paper, we assume that EMFCQ holds at every stationary point of the problem (NSIP). Under EMFCQ, (2)–(4) with λ_0=1 are called the KKT optimality conditions. By the result in [22], the set of multipliers {λ_k} is nonempty and bounded when EMFCQ is fulfilled.

We organize this paper as follows. In Sect. 2, we introduce one kind of convex relaxation method, the αBB method; the algorithm used to solve the generated subproblem is also described in this section. We propose a modified adaptive convexification algorithm in Sect. 3 and show its convergence properties in Sect. 4. Finally, in Sect. 5, we apply our algorithm to some examples and report on its computational efficiency.

2 Preliminaries

2.1 The αBB method

Existing global optimization algorithms can be categorized, based on their convergence properties, into deterministic and stochastic methods. One of the deterministic global optimization algorithms, the αBB (α-based Branch and Bound) method, offers a mathematical guarantee of convergence to a point arbitrarily close to the global minimum for twice-differentiable nonlinear programming problems. In the αBB method, a convex underestimator of a nonconvex function is constructed by decomposing it into a sum of nonconvex terms (cf. [1, 2, 7]). We mentioned in Sect. 1 that obtaining the global maximizer of the lower level optimization problem (1) may be difficult. Therefore, the concept of the αBB method is used to construct a concave overestimator for the nonconvex constraint g(x,y). We now divide Y into N subintervals with \(Y=\bigcup_{j=1}^{N}[\eta^{L}_{j},\eta^{U}_{j}]\), where \(\eta_{1}^{L}=0\), \(\eta_{N}^{U}=1\) and \(\eta^{L}_{j}<\eta^{U}_{j}=\eta^{L}_{j+1}\), for j=1,2,…,N−1. The single semi-infinite constraint, g(x,y)≤0, y∈Y, can then be regarded as N semi-infinite constraints under this subdivision,

$$g(x,y)\leq0, \quad y\in Y_j, \quad\mbox{for\ }j=1,2,\ldots,N,$$

where \(Y_{j}=[\eta^{L}_{j},\eta^{U}_{j}]\). Analogously to the convex underestimation in the αBB method, we construct a concave overestimator by adding a quadratic relaxation function ψ to a nonconcave function \(g:[\eta^{L}_{j},\eta^{U}_{j}]\rightarrow\mathbb{R}\) that is a C^2-function on an open neighborhood of \([\eta^{L}_{j},\eta^{U}_{j}]\). Set

$$\psi(y:\alpha_j,\eta^L_j,\eta^U_j)=\frac{\alpha_j}{2}(y-\eta^L_j)(\eta^U_j-y),$$

and define g j (x,y), for j=1,2,…,N, by

$$ g_j(x,y)=g(x,y)+\psi(y:\alpha_j,\eta^L_j,\eta^U_j),\quad \mbox{for\ }y\in Y_j.$$
(5)

Note that if α_j>0, the function g_j(x,y) is greater than g(x,y) except at the boundary points \(\eta^{L}_{j}\) and \(\eta^{U}_{j}\); in particular, g(x,y)≤g_j(x,y) for x∈X, y∈Y_j. To ensure that g_j(x,y) is a concave function with respect to y on the subinterval Y_j, α_j must satisfy the following condition:

$$\alpha_j>\displaystyle\max_{(x,y)\in X\times Y_j}\frac{\partial ^2}{\partial y^2} g(x,y),$$

and α j must be positive. Thus,

$$ \alpha_j>\max\biggl(0,\max_{(x,y)\in X\times Y_j}\frac{\partial^2}{\partial y^2} g(x,y)\biggr).$$
(6)

We call α_j satisfying (6) the convexification parameter corresponding to Y_j, and denote the vector of convexification parameters by α=(α_1,…,α_N). The maximal separation distance between g(x,y) and g_j(x,y) on X×Y_j is defined as the maximum of the difference of these functions; i.e., for any x∈X

$$ d_{\alpha\mathrm{BB}}(y;\alpha_j):=\max_{y\in Y_j}\bigl(g_j(x,y)-g(x,y)\bigr)=\frac{\alpha_j}{8}(\eta^U_j-\eta^L_j)^2.$$
(7)

Clearly, the distance will converge to zero as \((\eta^{U}_{j}-\eta^{L}_{j})\rightarrow0\).
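To make the construction concrete, the following short Python sketch builds the overestimators (5) and evaluates the separation distance (7) for an illustrative constraint. The sample function g, the box X, and the grid-based estimate of α_j are assumptions of this sketch, not taken from the paper; a rigorous implementation would bound ∂²g/∂y² over the whole box, for instance by interval arithmetic as in the αBB literature [1, 2, 7].

```python
import numpy as np

# Illustrative nonconvex constraint g(x, y); not a function from the paper.
def g(x, y):
    return x[0] * np.sin(5.0 * y) + x[1] * y - 0.3

def d2g_dy2(x, y):
    # Analytic second derivative of the sample g with respect to y.
    return -25.0 * x[0] * np.sin(5.0 * y)

def alpha_for(x_samples, yL, yU, n_grid=50, margin=1e-3):
    """Heuristic parameter satisfying (6): strictly larger than
    max(0, max of d^2 g / dy^2) over X x [yL, yU], estimated on a sample grid."""
    ys = np.linspace(yL, yU, n_grid)
    worst = max(d2g_dy2(x, y) for x in x_samples for y in ys)
    return max(0.0, worst) + margin

def g_over(x, y, a, yL, yU):
    """Concave overestimator (5): g_j(x, y) = g(x, y) + (a/2)(y - yL)(yU - y)."""
    return g(x, y) + 0.5 * a * (y - yL) * (yU - y)

# Split Y = [0, 1] into N pieces and report the separation distance (7) per piece.
N = 4
edges = np.linspace(0.0, 1.0, N + 1)
corners = [np.array([u, v]) for u in (-1.0, 1.0) for v in (-1.0, 1.0)]  # corners of X
for j in range(N):
    yL, yU = edges[j], edges[j + 1]
    a_j = alpha_for(corners, yL, yU)
    y_mid = 0.5 * (yL + yU)
    # At the midpoint the gap g_j - g equals the maximal separation distance (7).
    gap = g_over(corners[0], y_mid, a_j, yL, yU) - g(corners[0], y_mid)
    print(f"Y_{j+1}=[{yL:.2f},{yU:.2f}]  alpha={a_j:.3f}  "
          f"d_aBB={a_j / 8 * (yU - yL) ** 2:.2e}  gap={gap:.2e}")
```

The printed gap illustrates that the overestimation error is largest at the midpoint of each subinterval and shrinks quadratically with the subinterval length.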

Next, we define some notation. Let \(E=\{Y_{j}=[\eta^{L}_{j},\eta^{U}_{j}]|\,j=1,\ldots,N\}\) be the collection of subintervals and let α=(α_1,α_2,…,α_N) be the vector of corresponding convexification parameters on E, where α_j, defined in (6), is the convexification parameter corresponding to Y_j, for j=1,2,…,N. The approximate feasible set \(\mathcal{F}_{\alpha\mathrm {BB}}(E,\alpha)\) is then defined as

$$\mathcal{F}_{\alpha\mathrm{BB}}(E,\alpha)=\{x\in X|\,g_j(x,y)\leq 0,\mbox{\ for\ } y\in Y_j,\ Y_j\in E\},$$

where \(g_{j}(x,y)=g(x,y)+\frac{\alpha_{j}}{2}(y-\eta^{L}_{j})(\eta^{U}_{j}-y)\) is an approximate concave function on the subinterval Y j . Obviously, \(\mathcal{F}_{\alpha\mathrm{BB}}(E,\alpha)\) is a subset of \(\mathcal{F}\). Furthermore, we replace the single semi-infinite constraint g(x,y)≤0 by N semi-infinite constraints,

$$g_j(x,y)\leq0, \quad y\in Y_j, \quad\mbox{for\ }j=1,2,\ldots,N.$$

With the substitution of the single semi-infinite constraint, we obtain an approximate problem (NSIP αBB(E,α)) for problem (NSIP):

$$\everymath{\displaystyle}\begin{array}{@{}l@{\quad}l@{\quad}l}(\mathrm{NSIP}_{\alpha\mathrm{BB}}(E,\alpha)){:} &\min_{x\in X}f(x)\\[2mm]&\mbox{subject\ to}& g_j(x,y)\leq0, \quad y\in Y_j,\ j=1,2,\ldots,N.\\[1mm]& & Y_j\in E.\end{array}$$

The inequality (6) is strict, and thus the problem (NSIP αBB(E,α)) has a finite number of constraint functions g_j(x,⋅) that are strictly concave in y. Therefore, for any fixed point \(\bar{x}\in X\) and each concave constraint function \(g_{j}(\bar{x},y)\), j=1,2,…,N, the lower level problem

$$\max g_j(x,y) \quad\mbox{subject\ to} \quad y\in Y_j,$$

has a unique global optimal solution \(\bar{y}_{j}\), which can be obtained without difficulty; a minimal sketch is given below. In the next subsection, we present a method to solve the problem (NSIP αBB(E,α)).
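As an illustration, a minimal sketch of this concave one-dimensional maximization, using SciPy's bounded scalar minimizer on the negated function (the helper name is ours and purely illustrative), is:

```python
from scipy.optimize import minimize_scalar

def maximize_concave_piece(g_j, x_bar, yL, yU):
    """Global maximizer of the strictly concave function y -> g_j(x_bar, y) on [yL, yU];
    concavity guarantees that the bounded scalar solver finds the global maximum."""
    res = minimize_scalar(lambda y: -g_j(x_bar, y), bounds=(yL, yU), method='bounded')
    return res.x, -res.fun  # maximizer and maximal value g_j(x_bar, y*)
```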

2.2 The cutting plane method

In this subsection, we introduce the cutting plane method to solve the problem (NSIP αBB(E,α)). The cutting plane method is commonly used when solving SIP problems. Let J={1,2,…,N} be the index set of the collection of subintervals, where N is the number of subintervals in E. We present the cutting plane method to solve the problem (NSIP αBB(E,α)) in the following algorithm.

Algorithm 1

The cutting plane method for solving the problem (NSIP αBB(E,α))

Step 1::

For each Y j E, choose a finite set \(T_{j}^{1}=\{y_{j,1},\ldots,y_{j,k^{1}_{j}}\}\) such that \(T^{1}_{j}\subset Y_{j}\). Let \(T^{1}=\bigcup_{j\in J} T^{1}_{j}\) and p=1.

Step 2::

Solve the nonlinear programming problem (NSIP αBB(T p)) to obtain the optimal solution \(\tilde{x}^{p}\), where

$$\everymath{\displaystyle}\begin{array}{@{}l@{\quad}l}(\mathrm{NSIP}_{\alpha\mathrm{BB}}(T^p)){:}&\min_{x\in X} f(x)\\[2mm]&\mbox{subject\ to}\quad g_{j}(x,y_{j,i})\leq0,\quad \mbox{for\ } i=1,2,\ldots,k^p_j,\ j\in J.\end{array}$$
Step 3::

For each jJ, find the optimal solution \(\tilde {y}_{j}\) for the following problem:

$$\max g_j(\tilde{x}^p,y)\quad\mbox{subject\ to} \quad y\in Y_j.$$

If \(g_{j}(\tilde{x}^{p},\tilde{y}_{j})\leq0\) for all jJ, we stop the algorithm and output \(\tilde{x}^{p}\) as the optimal solution. Otherwise, define the set \(T^{p+1}_{j}\) by

$$T^{p+1}_j=\begin{cases}T^p_j\cup\{\tilde{y}_j\} & \text{if } g_j(\tilde{x}^p,\tilde {y}_j)>0;\\T^p_j & \text{if } g_j(\tilde{x}^p,\tilde{y}_j)\leq0.\end{cases}$$

Let \(k_{j}^{p+1}=|T^{p+1}_{j}|\) be the cardinality of the reference set on Y_j at the (p+1)-th iteration, for j∈J. Set p:=p+1, and go to Step 2.

Associated with each finite subset T^p at the p-th iteration, the problem (NSIP αBB(E,α)) is reduced to a finite-dimensional nonlinear programming problem. Therefore, we can solve it using a common NLP solver to obtain the optimal solution \(\tilde{x}^{p}\). We subsequently check the feasibility of each constraint \(g_{j}(\tilde {x}^{p},y)\) on Y_j, for Y_j∈E. Next, we add the worst violating point to \(T^{p}_{j}\) if an infeasible point exists on Y_j. The algorithm terminates when no infeasible point exists in any subinterval of E. The convergence proof for Algorithm 1 can be found in [19, 27]; the arguments there hold for every point of a fixed interval, and thus the convergence result for Algorithm 1 follows naturally. Therefore, we omit the proof here.
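A condensed Python sketch of Algorithm 1, reusing maximize_concave_piece from the sketch above and using SciPy's SLSQP routine as a stand-in for the black-box NLP solver of Step 2 (names and tolerances are illustrative, not the Matlab implementation used later), might read:

```python
import numpy as np
from scipy.optimize import minimize

def cutting_plane(f, g_pieces, Y_pieces, x0, bounds, tol=1e-6, max_iter=50):
    """Sketch of Algorithm 1 for (NSIP_aBB(E, alpha)): g_pieces[j](x, y) is the
    concave overestimator g_j on Y_pieces[j] = (yL_j, yU_j)."""
    T = [[yL, yU] for (yL, yU) in Y_pieces]        # Step 1: initial reference sets T_j^1
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Step 2: solve the finite NLP with constraints g_j(x, y) <= 0 for y in T_j.
        cons = [{'type': 'ineq', 'fun': lambda x, j=j, y=y: -g_pieces[j](x, y)}
                for j, Tj in enumerate(T) for y in Tj]
        x = minimize(f, x, method='SLSQP', bounds=bounds, constraints=cons).x
        # Step 3: maximize each concave piece over Y_j; add the worst violator to T_j.
        violated = False
        for j, (yL, yU) in enumerate(Y_pieces):
            y_star, val = maximize_concave_piece(g_pieces[j], x, yL, yU)
            if val > tol:
                T[j].append(y_star)
                violated = True
        if not violated:
            return x                               # all pieces feasible: terminate
    return x
```

Each outer pass adds at most one new reference point per subinterval, so the finite NLP grows slowly as long as only a few pieces are violated.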

3 Modified convexification algorithm

The main idea of our algorithm is to modify the adaptive convexification algorithm proposed by Floudas and Stein [8], who solved the problem (NSIP αBB(E,α)) and refined some of the subintervals to approach a stationary point of the problem (NSIP). Let a given predefined set E^0 be a collection of some subintervals of Y, defined by

$$E^0=\{Y^0_{k}|\,k=1,2,\ldots,m_0\},$$

where \(Y^{0}_{k}\) is a subinterval of Y with lower bound \(\eta^{L,0}_{k}\) and upper bound \(\eta^{U,0}_{k}\), for k=1,2,…,m_0, and m_0:=|E^0| denotes the cardinality of E^0. Note that E^0 collects only some of the subintervals of Y, not all of them. That is, \(\bigcup^{m_{0}}_{k=1}Y^{0}_{k}\subsetneqq Y\). The convexification parameter \(\alpha^{0}_{k}\) on \(Y^{0}_{k}\) is chosen according to (6):

$$\alpha^0_{k}>\max\biggl(0,\max_{(x,y)\in X\times Y^0_{k}}\frac {\partial^2}{\partial y^2} g(x,y)\biggr)$$

Let \(\alpha^{0}=(\alpha^{0}_{1},\ldots,\alpha^{0}_{m_{0}})\) be the corresponding convexification parameter on E 0. We substitute E 0 for E and propose a modified subproblem (NSIP αBB(E 0,α 0)):

$$\everymath{\displaystyle}\begin{array}{l@{\quad}l@{\quad}l}(\mathrm{NSIP}_{\alpha\mathrm{BB}}(E^0,\alpha^0)){:} &\min_{x\in X} f(x)\\[2mm]& \mbox{subject\ to} & g^0_{k}(x,y)\leq0,\quad y\in Y^0_{k},\mbox{\ for\ }k=1,2,\ldots,m_0,\\[1mm]& & Y^0_{k}\in E^0,\end{array}$$

where \(g^{0}_{k}(x,y)=g(x,y)+\frac{\alpha^{0}_{k}}{2}(y-\eta ^{L,0}_{k})(\eta^{U,0}_{k}-y)\). We solve the problem (NSIP αBB(E 0,α 0)) to obtain a stationary point x^0 and verify whether x^0 is a stationary point of the problem (NSIP). If it is not, we apply the refinement step to formulate a new subproblem, and repeat this process until a stationary point of the problem (NSIP) is obtained. Before discussing the refinement step, let \(y^{0}_{k}\) denote the optimal solution of the following lower level optimization problem

$$\max g^0_{k}(x^0,y)\quad\mbox{subject\ to} \quad y\in Y^0_{k},$$

and note that \(y^{0}_{k}\) exists and is unique because of the strict concavity of \(g^{0}_{k}(x^{0},\cdot)\) on \(Y^{0}_{k}\).

Recall the definition that the point \(y^{0}_{k}\in Y^{0}_{k}\) is called an active point at x 0 if \(g^{0}_{k}(x^{0},y^{0}_{k})=0\). Furthermore, we say that the subinterval \(Y^{0}_{k}\) is an active interval at x 0 if \(g^{0}_{k}(x^{0},y^{0}_{k})=0\) for \(y^{0}_{k}\in Y^{0}_{k}\). Define

$$ E_{\mathrm{act}}(x^0)=\{Y^0_{k}\in E^0|Y^0_{k}\mbox{\ is\ an\ active\ interval\ at\ } x^0\},$$
(8)

and

$$ Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^0)=\{y|g^0_{k}(x^0,y)=0,\mbox {\ for\ } y\in Y^0_{k}\mbox{\ and\ }Y^0_{k}\in E_{\mathrm{act}}(x^0)\}.$$
(9)

Clearly, E_act(x^0) is a subset of E^0, and \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{0})\) collects the active points in \(\bigcup _{k=1}^{m_{0}} Y_{k}^{0}\) at x^0 with respect to the αBB convexification. We can then conclude that the cardinality of E_act(x^0) coincides with that of \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{0})\) because \(g^{0}_{k}(x^{0},y)\) is strictly concave with respect to y on \(Y^{0}_{k}\). The purpose of the refinement step is to divide the active intervals in E_act(x^0). For the numerical implementation, we introduce the definition of an ε∪-active interval within a given tolerance. The definition of the ε∪-active interval and the refinement step are modified from the ideas in [8].

Definition 1

Given ε >0, \(Y^{0}_{k}\) is an ε -active interval if \(Y^{0}_{k}\) is an active interval and

$$ \min\{y^0_{k}-\eta^{L,0}_k,\,\eta^{U,0}_k-y^0_{k}\}>\varepsilon _{\cup}\cdot (\eta^{U,0}_k-\eta^{L,0}_k)$$
(10)

holds. Moreover, the active point \(y^{0}_{k}\in Y^{0}_{k}\) is an ε -active point if (10) holds.
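In code, the test of Definition 1 is a one-line predicate; the following sketch (with an illustrative name) checks whether an active point renders its interval ε∪-active:

```python
def is_eps_active(y_act, yL, yU, eps_union):
    """Definition 1: the active point y_act of [yL, yU] keeps a relative distance
    greater than eps_union from both endpoints, cf. (10)."""
    return min(y_act - yL, yU - y_act) > eps_union * (yU - yL)
```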

Here, we specify the refinement step for the ε∪-active intervals. The refinement step checks all the subintervals in E^0. When \(Y^{0}_{k}\) is an active interval, two cases arise. If \(Y^{0}_{k}\) is an ε∪-active interval, the refinement step divides \(Y^{0}_{k}\) into two subintervals, \(Y^{0}_{k,1}\) and \(Y^{0}_{k,2}\), and generates the set \(\tilde{E}^{0}=\{Y^{0}_{1},\ldots ,Y^{0}_{k,1},Y^{0}_{k,2},Y^{0}_{k+1},\ldots,Y^{0}_{m_{0}}\}\); we then examine the interval \(Y^{0}_{k+1}\) at the next iteration of the refinement step. Otherwise, by Definition 1, the active interval \(Y^{0}_{k}\) is not an ε∪-active interval when either the length of \(Y^{0}_{k,1}\) or that of \(Y^{0}_{k,2}\) would be too small; in this case we do not divide the active interval, since the active point is sufficiently close to one of the endpoints. In our algorithm, this definition helps us save computational time when solving the problem (NSIP). Moreover, by the property in [7], we have \(g^{0}_{k,i}(x,y)\leq g^{0}_{k}(x,y)\) on \(X\times Y^{0}_{k,i}\) and \(\alpha ^{0}_{k,i}\leq\alpha^{0}_{k}\) for i=1,2. Therefore, the new concave functions \(g^{0}_{k,i}(x,y)\) are closer to the constraint function g(x,y) on \(Y^{0}_{k}\). We summarize the refinement step in the following algorithm.

Algorithm 2

The refinement step for the ε -active intervals

Step 0:

Input \(E^{0}=\{Y^{0}_{1},Y^{0}_{2},\ldots,Y^{0}_{m_{0}}\}\), α 0, and \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{0})\), where m 0=|E 0|. Set k=1, \(\tilde{E}^{0}=E^{0}\), and \(\tilde{\alpha}^{0}=\alpha^{0}\).

Step 1:

If \(Y^{0}_{k}=[\eta^{L,0}_{k},\eta^{U,0}_{k}]\) is an ε -active interval and \(y^{0}_{k}\in Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{0})\) is the respective ε -active point on \(Y^{0}_{k}\), go to the Inner-Step 1 of inner loop; otherwise, go to Step 2.

Inner-Step 1::

Define \(Y^{0}_{k,1}=[\eta^{L,0}_{k},y^{0}_{k}]\) and \(Y^{0}_{k,2}=[y^{0}_{k},\eta^{U,0}_{k}]\). Find the corresponding convexification parameters on \(Y^{0}_{k,1}\) and \(Y^{0}_{k,2}\), respectively, \(\alpha^{0}_{k,1}\) and \(\alpha^{0}_{k,2}\), which satisfy

$$\alpha^0_{k,1}>\max\biggl(0,\max_{(x,y)\in X\times Y^0_{k,1}}\frac{\partial^2}{\partial y^2} g(x,y)\biggr)$$

and

$$\alpha^0_{k,2}>\max\biggl(0,\max_{(x,y)\in X\times Y^0_{k,2}}\frac{\partial^2}{\partial y^2} g(x,y)\biggr).$$

Moreover, define

$$g^0_{k,1}(x,y)=g(x,y)+\frac{\alpha^0_{k,1}}{2}(y-\eta^{L,0}_k)(y^0_{k}-y)\quad\mbox{and}\quad g^0_{k,2}(x,y)=g(x,y)+\frac{\alpha^0_{k,2}}{2}(y-y^0_{k})(\eta^{U,0}_k-y)$$

as the concave functions on \(Y^{0}_{k,1}\) and \(Y^{0}_{k,2}\), respectively.

Inner-Step 2::

Let \(\tilde{E}^{0}\) be the new collection of subintervals obtained by replacing \(Y^{0}_{k}\) with \(Y^{0}_{k,1}\) and \(Y^{0}_{k,2}\), i.e., \(\tilde{E}^{0}=\{E^{0}\setminus\{Y^{0}_{k}\}\}\cup\{Y^{0}_{k,1}\}\cup\{Y^{0}_{k,2}\}\), and replace the constraint

$$g^0_k(x,y)\leq0, \quad\mbox{for\ }y\in Y^0_{k},$$

by the two new constraints

$$g^0_{k,i}(x,y)\leq0, \quad\mbox{for\ all\ }y\in Y^0_{k,i},\ i=1,2.$$

Furthermore, let \(\tilde{\alpha}^{0}\) be the corresponding vector of convexification parameters on \(\tilde{E}^{0}\), obtained by replacing the entry \(\alpha ^{0}_{k}\) of α^0 with the two new entries \(\alpha^{0}_{k,1}\) and \(\alpha^{0}_{k,2}\). Go to Step 2.

Step 2:

Terminate the algorithm when k=m 0 and output \(\tilde {E}^{0}\), and \(\tilde{\alpha}^{0}\). Otherwise, let k=k+1, \(E^{0}=\tilde{E}^{0}\), and \(\alpha^{0}=\tilde {\alpha}^{0}\). Go to Step 1.
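The essential operation of Algorithm 2, splitting an ε∪-active interval at its active point and attaching convexification parameters satisfying (6) to both halves, can be sketched as follows; here alpha_of(yL, yU) stands for any routine returning such a parameter, e.g. a fixed-X-sample version of the heuristic estimate from Sect. 2.1 (the names are ours).

```python
def refine_interval(yL, yU, y_act, alpha_of):
    """Inner steps of Algorithm 2: split [yL, yU] at the eps_union-active point y_act
    and attach convexification parameters satisfying (6) to both halves."""
    return [(yL, y_act, alpha_of(yL, y_act)),
            (y_act, yU, alpha_of(y_act, yU))]
```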

Referring to Theorem 1, we can conclude that the stationary point x 0 for the problem (NSIP αBB(E 0,α 0)) satisfies

$$ \nabla f(x^0)+\sum_{k=1}^{m_0}\lambda^0_{k}\nabla_x g^0_{k}(x^0,y^0_{k})=0,$$
(11)
$$ \lambda^0_{k}\,g^0_{k}(x^0,y^0_{k})=0,\quad k=1,2,\ldots,m_0,$$
(12)
$$ g^0_{k}(x^0,y)\leq0,\quad\mbox{for\ all\ } y\in Y^0_{k},\ k=1,2,\ldots,m_0,$$
(13)

where \(\lambda_{1}^{0},\ldots,\lambda_{m_{0}}^{0}\) are the corresponding Lagrange multipliers. Let \(y^{0}=(y_{1}^{0},\ldots,\allowbreak y_{m_{0}}^{0})\) and \(\lambda^{0}=(\lambda _{1}^{0},\ldots,\lambda_{m_{0}}^{0})\). Thus, (x 0,y 0,λ 0) satisfies (11)–(13). Clearly, (2) also holds at (x 0,y 0,λ 0) since \(\nabla_{x} g^{0}_{k}(x,y)=\nabla_{x} g(x,y)\) for all \((x,y)\in X\times Y^{0}_{k}\). However, (x 0,y 0,λ 0) may not satisfy (3) because

$$\lambda^0_{k}g(x^0,y^0_{k})=-\frac{\lambda^0_{k}\alpha ^0_{k}}{2}(y^0_{k}-\eta ^{L,0}_k)(\eta^{U,0}_k-y^0_{k})$$

for k=1,2,…,m_0. Therefore, when \(\lambda^{0}_{k}>0\), \(\lambda^{0}_{k}g(x^{0},y^{0}_{k})=0\) happens only if either \(y^{0}_{k}-\eta^{L,0}_{k}=0\) or \(\eta ^{U,0}_{k}-y^{0}_{k}=0\). However, achieving this condition numerically is difficult. Hence, relaxing the complementarity slackness condition is crucial for a numerical algorithm. We make the following definition for a given tolerance ε>0.

Definition 2

The point \(x^{0}\in\mathcal{F}\) is called stationary with ε-complementarity for the problem (NSIP) if there exists \(y^{0}_{k}\) such that (11) and

$$ -\lambda^0_{k}\cdot\varepsilon\leq\lambda^0_{k}g(x^0,y^0_{k})\leq0.$$
(14)

are fulfilled simultaneously.

We emphasize that even though \(g(x,y)\leq g^{0}_{k}(x,y)\) for all \((x,y)\in X\times Y^{0}_{k}\), feasibility of a point x^0 for the problem (NSIP αBB(E 0,α 0)) does not imply \(x^{0}\in\mathcal{F}\). The reason is that feasibility on the subintervals of Y not considered in E^0 may fail at x^0. That is, let Y_c be an arbitrary unconsidered subinterval of Y; then Y_c∉E^0, and the concave function g_c(x^0,y) defined by (5) on Y_c may not be less than or equal to zero for all y. To develop an algorithm that finds a stationary point with ε-complementarity of the problem (NSIP), we check feasibility on the unconsidered subintervals and refine the ε∪-active intervals simultaneously, so that feasibility and the ε-complementarity slackness condition (14) eventually hold. Moreover, for a given small number γ>0, we say that the point x^0 is γ-infeasible if there exists an index c such that g_c(x^0,y)>γ for some y∈Y_c. Such an index c is called an index of γ-infeasibility. Furthermore, we call c_0 the index of maximal γ-infeasibility at x^0 if

$$\max_{y\in Y_{c_0}} g_{c_0}(x^0,y)\geq\max_{y\in Y_c}g_c(x^0,y),$$

for all indices of γ-infeasibility c. Therefore, we conclude our idea in the following algorithm.

Algorithm 3

The relaxed cutting plane method with convexification

Step 0::

Split Y into N subintervals Y_1,Y_2,…,Y_N such that \(Y=\bigcup_{j=1}^{N} Y_{j}\) and \(|Y_{j}|=\frac{1}{N}\). Let the set E:={Y_1,Y_2,…,Y_N} be the collection of these subintervals. Determine the convexification parameters α by (6) on E, and define the approximate concave functions g_j on Y_j by (5). Let \(\bar{\alpha}:=\max_{i}\{\alpha_{i}\}\).

Step 1::

Set ν=1. Choose arbitrary subintervals \(Y_{1}^{1},Y_{2}^{1},\ldots,Y_{m_{1}}^{1}\) in E with m_1<N. Let \(E^{1}=\{Y_{1}^{1},Y_{2}^{1},\ldots,Y_{m_{1}}^{1}\}\) and let \(\alpha^{1}=\{\alpha ^{1}_{1},\ldots,\alpha^{1}_{m_{1}}\}\) be the corresponding convexification parameters on E^1. Choose small tolerances ε,γ>0, and let \(\varepsilon _{\cup}:=2\varepsilon\cdot\min\{1,1/\bar{\alpha}\}\). Let \(E^{c}_{1}=E\setminus E^{1}\) be the collection of remaining subintervals.

Step 2::

Solve the problem (NSIP αBB(E ν,α ν)) using Algorithm 1 to obtain the stationary point x ν and the corresponding active point set \(Y_{\mathrm{act}}^{\alpha \mathrm{BB}}(x^{\nu})\).

Step 3::

Evaluate the values of

$$\max_{y\in Y_j}g_j(x^\nu,y),$$

for all \(Y_{j} \in E^{c}_{\nu}\). Denote by c_ν the index of maximal γ-infeasibility, and go to Step 4 if c_ν exists. If c_ν does not exist and, for every k, the inequalities \(-\lambda^{\nu}_{k}\cdot\varepsilon\leq\lambda^{\nu}_{k}g(x^{\nu},y^{\nu}_{k})\leq0\) hold, then STOP and output x^ν as an approximate stationary point of the problem (NSIP). Otherwise, go to Step 5.

Step 4::

If there exists at least one ε -active point \(y^{\nu}_{k}\in Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{\nu})\), apply Algorithm 2 to E ν, α ν and \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{\nu})\). Define \(E^{\nu+1}=\tilde{E}^{\nu}\cup\{Y_{c_{\nu}}\}\), \(\alpha^{\nu +1}=(\tilde{\alpha}^{\nu},\alpha_{c_{\nu}})\), and \(E^{c}_{\nu+1}=E^{c}_{\nu}\setminus\{Y_{c_{\nu}}\}\), where \(\tilde{E}^{\nu}\) and \(\tilde{\alpha}^{\nu}\) are generated by Algorithm 2. Otherwise, in the case of no ε -active point, define \(E^{\nu+1}=E^{\nu}\cup\{Y_{c_{\nu}}\}\), \(\alpha^{\nu +1}=(\alpha^{\nu},\alpha_{c_{\nu}})\), and \(E^{c}_{\nu+1}=E^{c}_{\nu }\setminus\{Y_{c_{\nu}}\}\). Set m ν+1=|E ν+1| and ν=ν+1 and go to Step 2.

Step 5::

Apply Algorithm 2 to E ν, α ν, and \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{\nu})\). Define \(E^{\nu+1}=\tilde{E}^{\nu}\), \(\alpha^{\nu+1}=\tilde{\alpha }^{\nu}\), and \(E^{c}_{\nu+1}=E^{c}_{\nu}\). Set m ν+1=|E ν+1|. Let ν=ν+1, and go to Step 2.

Note that E_act(x^ν) and \(Y_{\mathrm{act}}^{\alpha \mathrm{BB}}(x^{\nu})\) are defined analogously by replacing the index 0 with ν in (8) and (9). By construction, \(E^{c}_{\nu}\subset E\) for every ν. However, E^ν is no longer a subset of E once Algorithm 2 has been applied in a previous iteration. Further remarks on Algorithm 3 are listed as follows.

Remark 1

  1. (i)

    Notice that in Step 3 of Algorithm 3, we check the γ-feasibility on the subintervals in \(E^{c}_{\nu}\) and also determine whether x^ν is a stationary point with ε-complementarity of the problem (NSIP). If there exists a γ-infeasible subinterval \(Y_{c_{\nu}}\), we add a new concave constraint \(g_{c_{\nu}}(x,y)\) to the problem (NSIP αBB(E ν,α ν)) in Step 4. Consequently, Algorithm 3 outputs a stationary point with ε-complementarity and γ-feasibility.

  2. (ii)

    The γ-feasibility check on \(E^{c}_{\nu}\) is an easy task since the functions g_j(x^ν,⋅) are concave on Y_j for \(Y_{j}\in E^{c}_{\nu}\). Therefore, various numerical methods can find the value of \(\max_{y\in Y_{j}} g_{j}(x^{\nu},y)\) quickly.

  3. (iii)

    In Steps 4 and 5, we refine the ε∪-active intervals and substitute two new approximate concave constraints for the original concave constraint on each ε∪-active interval. The constraint functions on the rest of the subintervals in E^ν are maintained. Of course, an ε∪-active interval may not exist. If neither an ε∪-active interval nor c_ν exists, then, according to Lemma 2 in the next section, Algorithm 3 satisfies the stopping criteria.

  4. (iv)

    To avoid confusion, we rearrange the indices of the subintervals in E^{ν+1} before going to Step 2. For the sake of convenience, suppose that \(Y^{\nu}_{1}\) is the unique ε∪-active interval and that the index of maximal γ-infeasibility c_ν exists at the ν-th iterate. Then \(E^{\nu+1}=\tilde{E}^{\nu}\cup\{Y_{c_{\nu}}\}\) is defined in Step 4, where \(\tilde{E}^{\nu}=\{Y^{\nu}_{1,1},Y^{\nu}_{1,2},Y^{\nu}_{2},\ldots ,Y^{\nu}_{m_{\nu}}\}\). Therefore, \(E^{\nu+1}=\{Y^{\nu}_{1,1},Y^{\nu}_{1,2},Y^{\nu}_{2},\ldots ,Y^{\nu}_{m_{\nu}},Y_{c_{\nu}}\}\). We then rearrange the indices of the subintervals in E^{ν+1} as

    $$E^{\nu+1}=\{Y^{\nu+1}_1,Y^{\nu+1}_2,\ldots,Y^{\nu+1}_{m_{\nu+1}}\},$$

    by letting \(Y^{\nu+1}_{1}=Y^{\nu}_{1,1}\), \(Y^{\nu+1}_{2}=Y^{\nu}_{1,2},\ldots,Y^{\nu+1}_{m_{\nu+1}}=Y_{c_{\nu}}\). The indices of α^{ν+1} are rearranged similarly. Note that \(Y^{\nu+1}_{k}\) may not equal \(Y^{\nu}_{k}\) after the rearrangement.

Observe that Algorithm 3 has two stopping criteria. Generally, we anticipate that feasibility on \(E^{c}_{\nu}\) will hold after a few iterations; in other words, the number of remaining subintervals will not be zero when Algorithm 3 terminates. We illustrate this behavior with numerical examples in Sect. 5.
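Putting the pieces together, the following condensed Python sketch mirrors the outer loop of Algorithm 3, merging Steps 4 and 5 and reusing the helper sketches cutting_plane, maximize_concave_piece, is_eps_active, and refine_interval given earlier. Lagrange multipliers are not tracked in this sketch, so the ε-complementarity test (14) is replaced by the surrogate check g(x, y_k) ≥ −ε at the active points; all names, tolerances, and this simplification are our assumptions, not the paper's implementation.

```python
import numpy as np

def modified_convexification(f, g, alpha_of, x0, bounds, E_all, m1,
                             eps=1e-4, gamma=1e-6, act_tol=1e-8, max_outer=100):
    """Sketch of Algorithm 3.  E_all lists all subintervals (yL, yU) from Step 0;
    the first m1 of them form E^1 and the rest the remaining collection E^c.
    alpha_of(yL, yU) returns a convexification parameter satisfying (6)."""
    make_piece = lambda yL, yU, a: (lambda x, y: g(x, y) + 0.5 * a * (y - yL) * (yU - y))
    E = [(yL, yU, alpha_of(yL, yU)) for (yL, yU) in E_all[:m1]]   # E^1 and alpha^1
    Ec = list(E_all[m1:])                                         # remaining subintervals
    alpha_bar = max(a for (_, _, a) in E)
    eps_u = 2.0 * eps * min(1.0, 1.0 / alpha_bar)                 # epsilon_union of Step 1
    x = np.asarray(x0, dtype=float)

    for _ in range(max_outer):
        pieces = [make_piece(yL, yU, a) for (yL, yU, a) in E]
        x = cutting_plane(f, pieces, [(yL, yU) for (yL, yU, _) in E], x, bounds)

        # Active points of the current subproblem (Step 2).
        act = []
        for k, (yL, yU, a) in enumerate(E):
            y_k, val = maximize_concave_piece(pieces[k], x, yL, yU)
            if abs(val) <= act_tol:
                act.append((k, y_k))

        # Step 3: gamma-feasibility check over the remaining subintervals.
        worst_j, worst_val = None, gamma
        for j, (yL, yU) in enumerate(Ec):
            _, val = maximize_concave_piece(make_piece(yL, yU, alpha_of(yL, yU)), x, yL, yU)
            if val > worst_val:
                worst_j, worst_val = j, val

        if worst_j is None and all(g(x, y_k) >= -eps for (_, y_k) in act):
            return x, E, Ec                        # both stopping criteria satisfied

        # Steps 4/5: refine eps_u-active intervals; absorb the worst infeasible piece.
        new_E = []
        for k, (yL, yU, a) in enumerate(E):
            y_act = next((y for (kk, y) in act if kk == k), None)
            if y_act is not None and is_eps_active(y_act, yL, yU, eps_u):
                new_E.extend(refine_interval(yL, yU, y_act, alpha_of))
            else:
                new_E.append((yL, yU, a))
        if worst_j is not None:
            yL, yU = Ec.pop(worst_j)
            new_E.append((yL, yU, alpha_of(yL, yU)))
        E = new_E
    return x, E, Ec
```

In a usage example, the N equal-length pieces of Y from Step 0 would be passed as E_all, with the first m_1 of them forming the initial collection E^1.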

4 Convergence properties for algorithm

We demonstrate the finite termination and the error bound properties of Algorithm 3 in this section. Let L ν be defined as

$$ L_\nu:=\max\bigl\{ \|Y^\nu_{k}\|\ \big|\ Y^\nu_{k}\mbox{\ is\ the\ } \varepsilon _{\cup}\mbox{-active\ interval\ in\ } E_{\mathrm{act}}(x^\nu) \bigr\},$$
(15)

where \(\|Y^{\nu}_{k}\|:=\eta^{U,\nu}_{k}-\eta^{L,\nu}_{k}\) is the length of \(Y^{\nu}_{k}\). In the following lemma, we discuss the relationship between ∥x^{ν+1}−x^ν∥ and L_ν, the largest length of an ε∪-active interval; this lemma helps us analyze the convergence properties of Algorithm 3. We will use the fact that, by the mean value theorem, there is a point \(\bar{x}^{\nu}\) on the open segment connecting x^ν and x^{ν+1} such that

$$ g(x^{\nu+1},y)-g(x^{\nu},y)=\nabla_x g(\bar{x}^\nu,y)^\top(x^{\nu +1}-x^\nu),$$
(16)

for all yY. Additionally, we make the following assumption.

Assumption 1

There exists an index \(\bar{\nu}\) such that for each ε -active point \(y^{\nu}_{k}\in Y_{\mathrm{act}}^{\alpha\mathrm {BB}}(x^{\nu})\), there is a small positive number δ that satisfies the following assumptions for all \(\nu\geq\bar{\nu}\):

  1. (a)

    \(\|\nabla_{x} g(\bar{x}^{\nu},y)\|\geq\delta\) for all y, where \(\bar{x}^{\nu}\) is the point in (16), and

  2. (b)

    \(|\nabla_{x} g(\bar{x}^{\nu},y^{\nu}_{k})^{\top}(x^{\nu +1}-x^{\nu})|\geq\delta\|x^{\nu+1}-x^{\nu}\|\).

We note that Assumption 1 can be checked in each iteration of the algorithm. Clearly, Assumption 1(a) can be achieved easily. Assumption 1(b) states that the angle between \(\nabla_{x}g(\bar{x}^{\nu},y^{\nu}_{k})\) and x^{ν+1}−x^ν is not too close to being orthogonal. It is known that \(\nabla_{x} g(\bar{x}^{\nu},y^{\nu}_{k})\nabla_{x} g(\bar {x}^{\nu},y^{\nu}_{k})^{\top}\) is a positive semi-definite matrix. Therefore, if the smallest eigenvalue of \(\nabla_{x} g(\bar{x}^{\nu},y^{\nu}_{k})\nabla_{x} g(\bar{x}^{\nu},y^{\nu}_{k})^{\top}\) is greater than δ^2, then Assumption 1(b) naturally holds. Next, we present the following lemma, which is important in our convergence proof.

Lemma 1

Assume that, for each \(\nu\geq\bar{\nu}\), there is at least one ε∪-active interval and that at least one of the two subintervals generated from an ε∪-active interval is an active interval at the next iteration. Suppose further that Assumption 1 holds at the ε∪-active points. Then, for each \(\nu\geq\bar{\nu}\), there exists a bounded real number b_ν such that

$$\|x^{\nu+1}-x^\nu\|\leq b_{\nu}L_{\nu}^2.$$

Proof

Let \(Y^{\nu}_{k}\) be the ε -active interval that satisfies the assumption in this lemma and \(y^{\nu}_{k}\in Y_{\mathrm{act}}^{\alpha \mathrm{BB}}(x^{\nu})\) be the ε -active point. Hence, we have \(g^{\nu}_{k}(x^{\nu},y^{\nu}_{k})=0\) and

$$g(x^\nu,y^\nu_{k})=-\frac{\alpha^\nu_{k}}{2}(y^\nu_{k}-\eta ^{L,\nu}_k)(\eta^{U,\nu}_k-y^\nu_{k})<0.$$

As \(Y^{\nu}_{k}\) is the ε -active interval, \(Y^{\nu}_{k}\) will be divided into \(Y^{\nu}_{k,1}=[\eta^{L,\nu}_{k},y^{\nu}_{k}]\) and \(Y^{\nu}_{k,2}=[y^{\nu}_{k},\eta^{U,\nu}_{k}]\) in the next iteration. Without loss of generality, we assume that the subinterval \(Y^{\nu}_{k,1}=[\eta^{L,\nu}_{k},y^{\nu}_{k}]\) is the active interval in the next iteration, ν+1. Therefore, \(g^{\nu}_{k,1}(x^{\nu+1},y^{\nu+1}_{k,1})=0\), where \(y^{\nu +1}_{k,1}\) is the maximizer in the following optimization problem:

$$\max g^\nu_{k,1}(x^{\nu+1},y) \quad\mbox{subject\ to} \quad y\in Y^\nu_{k,1}.$$

The function \(g^{\nu}_{k,1}(x,y)\) is defined by

$$ g^\nu_{k,1}(x,y)=g(x,y)+\frac{\alpha^{\nu}_{k,1}}{2}(y^\nu_{k}-y)(y-\eta^{L,\nu}_k).$$
(17)

Clearly, according to (17),

$$g^\nu_{k,1}(x^{\nu+1},y^\nu_{k})=g(x^{\nu+1},y^\nu_{k})\leq g^\nu_{k,1}(x^{\nu+1},y^{\nu+1}_{k,1})=0.$$

Thus, we have

$$ g(x^{\nu+1},y^\nu_{k})-g(x^{\nu},y^\nu_{k})\leq\frac{\alpha^\nu_{k}}{2}(y^\nu_{k}-\eta^{L,\nu}_k)(\eta^{U,\nu}_k-y^\nu_{k})\leq\frac{\bar{\alpha}}{8}(\eta^{U,\nu}_k-\eta^{L,\nu}_k)^2,$$
(18)

since \(\alpha^{\nu}_{k}\leq\bar{\alpha}\) for all k. We know that both \(g^{\nu}_{k,1}(x^{\nu+1},y^{\nu+1}_{k,1})\) and \(\frac{\partial}{\partial y}g^{\nu}_{k,1}(x^{\nu+1},y^{\nu +1}_{k,1})\) are equal to zero; therefore,

$$ g(x^{\nu+1},y^\nu_{k})=g^\nu_{k,1}(x^{\nu+1},y^\nu_{k})=\frac{1}{2}\,\frac{\partial^2}{\partial y^2}g^\nu_{k,1}(x^{\nu+1},\bar{y}^{\nu+1}_{k,1})(y^\nu_{k}-y^{\nu+1}_{k,1})^2,$$
(19)

where \(y^{\nu+1}_{k,1}\leq\bar{y}^{\nu+1}_{k,1}\leq y^{\nu}_{k}\). Moreover, the value of \(\frac{\partial^{2}}{\partial y^{2}}g^{\nu}_{k,1}(x^{\nu+1},\bar{y}^{\nu +1}_{k,1})\) is bounded on \(X\times Y^{\nu}_{k,1}\) for all ν. Hence, there is a positive number b^* such that \(\frac{\partial ^{2}}{\partial y^{2}} g^{\nu}_{k,1}(x,y)\geq-b^{*}\) for all ν. Equation (19) then implies that \(g(x^{\nu+1},y^{\nu}_{k})-g(x^{\nu },y^{\nu}_{k})>-b^{*}(\eta^{U,\nu}_{k}-\eta^{L,\nu}_{k})^{2}\), since \(\eta ^{U,\nu}_{k}-\eta^{L,\nu}_{k}>y^{\nu}_{k}-y^{\nu+1}_{k,1}\). Combining (18) and (19) and applying the mean value theorem, we obtain

$$ -b^*(\eta^{U,\nu}_k-\eta^{L,\nu}_k)^2\leq\nabla_x g(\bar{x}^\nu ,y^\nu_{k})^\top (x^{\nu+1}-x^\nu)\leq\frac{\bar{\alpha}}{8}(\eta^{U,\nu}_k-\eta ^{L,\nu}_k)^2,$$
(20)

where \(\bar{x}^{\nu}\) is a point on the line segment between x^ν and x^{ν+1}. Consequently, from both (20) and Assumption 1(b), we have

$$\delta\|x^{\nu+1}-x^\nu\|\leq\bigl|\nabla_x g(\bar{x}^\nu,y^\nu_{k})^\top(x^{\nu+1}-x^\nu)\bigr|\leq\max\biggl(\frac{\bar{\alpha}}{8},\,b^*\biggr)(\eta^{U,\nu}_k-\eta^{L,\nu}_k)^2.$$

As this inequality holds for the ε∪-active subinterval, it implies

$$\delta^2\|x^{\nu+1}-x^\nu\|^2\leq\max\biggr(\frac{\bar{\alpha }^2}{64},\,(b^*)^2\biggr)L_\nu^4.$$

Hence, \(\|x^{\nu+1}-x^{\nu}\|\leq b_{\nu}L_{\nu}^{2}\) when setting \(b_{\nu}=\max(\frac{\bar{\alpha}}{8},\,b^{*})/\delta\). Here, b ν is bounded by a constant for all \(\nu\geq\bar{\nu}\). □

Lemma 1 shows that the value of ∥d^ν∥:=∥x^{ν+1}−x^ν∥ is controlled by the length of the ε∪-active intervals. We next present a lemma adapted directly from [8].

Lemma 2

If no γ-infeasible subinterval and no ε -active interval exist, Algorithm 3 terminates.

Proof

According to the result of Lemma 4.5 in [8], if there is no ε -active interval, Algorithm 3 generates a stationary point with ε-complementarity. Combined with the fact of having no γ-infeasible subinterval, Algorithm 3 terminates because of the satisfaction of these two stopping criteria. □

The next lemma shows that the maximal length of the ε∪-active intervals tends to 0.

Lemma 3

Let {L ν } be the sequence defined by (15), and suppose that Algorithm 3 does not terminate. Then,

$$\lim_{\nu\rightarrow\infty} L_\nu=0.$$

Proof

In Algorithm 3, we divide the interval Y into N subintervals Y_1,…,Y_N in Step 0 and choose m_1 subintervals to start the algorithm. Let I be the set of iteration indices of Algorithm 3, and let I_γ={ν^1,ν^2,…,ν^l}⊂I be the collection of iteration indices with γ-infeasibility, where l∈ℕ; that is, for i=1,2,…,l, \(c_{\nu^{i}}\) exists at the ν^i-th iteration. Since the number of unconsidered subintervals is bounded by N−m_1, we have 0≤l≤N−m_1. Moreover, by Lemma 2, for each ν∈I∖I_γ, at least two new subintervals, \(Y^{\nu}_{k,1}\) and \(Y^{\nu}_{k,2}\), are generated from \(Y^{\nu}_{k}\) for some k. Therefore, if \(Y^{\nu}_{k}\) is an ε∪-active interval, then the result

$$\max\{\|Y^\nu_{k,1}\|,\|Y^\nu_{k,2}\|\}\leq(1-\varepsilon_\cup)\|Y^\nu _{k}\|\leq(1-\varepsilon_\cup)\cdot\frac{1}{N}$$

follows from the definition of ε -active interval. Note that all the subintervals in E ν are at most divided once for each ν, and the length of ε -active interval is monotonically decreasing. Hence, given any integer p∈ℕ, there exists a corresponding sufficiently large iterate \(\tilde{\nu }\in I\setminus I_{\gamma}\) such that L ν is bounded by \((1-\varepsilon _{\cup})^{p}\cdot\frac{1}{N}\) for \(\nu>\tilde{\nu}\) because Algorithm 3 does not terminate. Thus, we obtain lim ν→∞ L ν =0 by letting p→∞. □

The result in Lemma 3 directly implies that ∥d^ν∥ approaches 0 as ν goes to infinity. The next theorem shows that Algorithm 3 reaches a stationary point of the problem (NSIP) with ε-complementarity and γ-feasibility after finitely many iterations.

Theorem 2

Suppose that for each iteration ν, the corresponding Lagrange multipliers are uniformly bounded by a constant M. Moreover, the assumptions in Lemma 1 hold. Let {x ν} be a sequence generated by Algorithm 3. Algorithm 3 will then generate a stationary point of the problem (NSIP) with ε-complementarity and γ-feasibility after finite iterations.

Proof

The KKT condition for the problem (\(\mathrm{NSIP}_{\alpha\mathrm {BB}}(E^{\nu_{s}},\alpha^{\nu_{s}})\)) can be reformulated as

(21)
(22)
(23)

where \(\mu^{\nu_{s}}(y)\) is a discrete measure defined on Y by

$$ \mu^{\nu_s}(y)=\begin{cases}\lambda^{\nu_s}_k & \text{if\ } y=y^{\nu_s}_k;\\0 & \text{elsewhere.}\end{cases}$$

As X and Y are compact sets, there exists a subsequence \(\{x^{\nu _{s}}\}\) converging to an accumulation point x^*, and \(\{\mu^{\nu_{s}}(y)\}\) converges weakly to μ^*(y) as s→∞. Moreover, by the continuity of ∇f(x) and ∇_x g(x,y), (21) is satisfied for all s. These facts directly imply that \(\|\nabla f(x^{*})+\int_{Y}\nabla_{x} g(x^{*},y)\,d\mu^{*}(y)\|=0\).

Next, we show the ε-complementarity condition. We claim that, for \(k=1,2,\ldots,m_{\nu_{s}}\), \(-\lambda^{\nu _{s}}_{k}\cdot\varepsilon \leq\lambda^{\nu_{s}}_{k} g(x^{\nu_{s}},y^{\nu_{s}}_{k})\leq0\) when s is sufficiently large. The case \(\lambda_{k}^{\nu_{s}}=0\) is trivial; therefore, we focus on \(\lambda_{k}^{\nu_{s}}>0\). Even though the number of subintervals \(m_{\nu_{s}}\) increases monotonically, in view of Carathéodory’s theorem the number of positive Lagrange multipliers is at most n+1, where n is the dimension of x. Without loss of generality, let the Lagrange multipliers \(\lambda^{\nu_{s}}_{1}, \lambda^{\nu_{s}}_{2},\ldots,\lambda ^{\nu_{s}}_{m'}\) be positive, where m′≤n+1. Hence, the corresponding subintervals \(Y^{\nu_{s}}_{k}\), for k=1,2,…,m′, are active intervals. We distinguish two cases. First, if the subinterval is an ε∪-active interval, then

$$\frac{1}{2}\|\alpha^{\nu_s}_k(y^{\nu_s}_k-\eta_k^{L,\nu_s})(\eta _k^{U,\nu _s}-y^{\nu_s}_k)\|\leq\frac{1}{8}\alpha^{\nu_s}_k\|\eta_k^{U,\nu _s}-\eta _k^{L,\nu_s}\|^2\leq\frac{1}{8}\bar{\alpha} L^2_{\nu_s},$$

where \(\bar{\alpha}\) is the constant defined in Step 0 of Algorithm 3. Since \(L_{\nu_{s}}\rightarrow0\), we can find an integer s_2 such that the inequality \(\frac{1}{8}\bar{\alpha} L^{2}_{\nu_{s}}\leq\varepsilon\) is valid for s>s_2. Otherwise, suppose that \(Y^{\nu_{s}}_{k}\) is not an ε∪-active interval. From the definition of \(\varepsilon_{\cup}=2\varepsilon\cdot\min\{1,1/\bar {\alpha}\}\), we obtain

$$\frac{1}{2}\|\alpha^{\nu_s}_k(y^{\nu_s}_k-\eta_k^{L,\nu_s})(\eta _k^{U,\nu _s}-y^{\nu_s}_k)\|\leq\varepsilon_{\cup}\bar{\alpha}\|\eta _k^{U,\nu_s}-\eta _k^{L,\nu_s}\|^2\leq\frac{\varepsilon}{N^2}<\varepsilon,$$

where 1/N is the length of the initial discretized subinterval in Algorithm 3. Moreover, \(g(x^{\nu_{s}},y_{k}^{\nu_{s}})=-(\alpha^{\nu_{s}}_{k}/2)(y^{\nu _{s}}_{k}-\eta_{k}^{L,\nu _{s}})(\eta_{k}^{U,\nu_{s}}-y^{\nu_{s}}_{k})\). Consequently, for all s>s 2, we can obtain \(-\lambda^{\nu_{s}}_{k}\cdot \varepsilon \leq\lambda^{\nu_{s}}_{k} g(x^{\nu_{s}},y^{\nu_{s}}_{k})\leq0\).

Finally, we show that g(x^*,y)≤0 for all y∈Y. Suppose that the collection of remaining subintervals \(E^{c}_{\nu_{s}}\) is nonempty for all s=1,2,…; otherwise, if \(E^{c}_{\nu_{s}}\) is empty for some s, the claim follows immediately. Let Y_r be a subinterval lying in \(E^{c}_{\nu_{s}}\) for all s. We denote

$$y^*_r\in\arg\max_{y\in Y_r} g_r(x^*,y) \quad\mbox{and} \quad y^{\nu_s}_r\in\arg\max_{y\in Y_r} g_r(x^{\nu_s},y),$$

where g_r(x,y) is the approximate concave function defined by (5) on Y_r. Assume, by contradiction, that \(g_{r}(x^{*},y^{*}_{r})\geq\kappa\) for some κ>0. Since g_r is continuously differentiable with respect to the variable x, there is an index s_κ∈ℕ such that \(g_{r}(x^{\nu_{s}},y_{r}^{*})\geq\kappa/2\) for all s≥s_κ. It follows that

$$g_r(x^{\nu_s},y_r^{\nu_s})\geq g_r(x^{\nu_s},y_r^*)\geq\kappa/2,\quad\mbox{for\ all\ } s\geq s_{\kappa}.$$

Since \(Y_{r}\in E^{c}_{\nu_{s}}\) for all s∈ℕ, there must exist an index of maximal γ-infeasibility \(c_{\nu_{s}}\) and the corresponding subinterval \(Y_{c_{\nu_{s}}}\in E^{c}_{\nu_{s}}\) such that

$$g_{c_{\nu_s}}(x^{\nu_s},y_{c_{\nu_s}})>g_r(x^{\nu_s},\,y_r^{\nu _s})\geq\kappa/2,$$

where \(y_{c_{\nu_{s}}}\in\arg\max_{y\in Y_{c_{\nu_{s}}}} g_{c_{\nu _{s}}}(x^{\nu_{s}},y)\) and \(g_{c_{\nu_{s}}}(x,y)\) is the concave function defined by (5) on \(Y_{c_{\nu_{s}}}\). Hence, \(g_{c_{\nu_{s}}}(x,y)\) becomes a new member of the constraints in the problem (\(\mathrm{NSIP}_{\alpha\mathrm {BB}}(E^{\nu_{s}+1},\alpha^{\nu_{s}+1})\)). This implies that

$$g_{c_{\nu_s}}(x^{\nu_s+1},y)\leq0 \quad\mbox{for\ }y\in Y_{c_{\nu_s}}$$

where \(x^{\nu_{s}+1}\) is the stationary point of the problem (\(\mathrm{NSIP}_{\alpha\mathrm{BB}}(E^{\nu_{s}+1},\alpha^{\nu_{s}+1})\)). Therefore, we have

$$g_{c_{\nu_s}}(x^{\nu_s},y_{c_{\nu_s}})- g_{c_{\nu_s}}(x^{\nu _s+1},y_{c_{\nu_s}})\geq\kappa/2.$$

Thus, \(\|\nabla_{x} g_{c_{\nu_{s}}}(\bar{x}^{\nu_{s}},y_{c_{\nu_{s}}})\|\|d^{\nu_{s}}\|\geq\kappa/2\) for some point \(\bar{x}^{\nu_{s}}\) on the segment between \(x^{\nu_{s}}\) and \(x^{\nu_{s}+1}\). From Lemma 1, we know that \(\|d^{\nu_{s}}\|\rightarrow0\), which contradicts this inequality. Consequently, we obtain

$$\lim_{s\rightarrow\infty}g_r(x^{\nu_s},y)=g_r(x^*,y)\leq0,$$

for all y∈Y_r. Therefore, given any γ>0, there is s_3∈ℕ such that \(g_{r}(x^{\nu_{s}},y)\leq\gamma\) for y∈Y_r and for all \(Y_{r}\in E_{\nu_{s}}^{c}\) when s>s_3. Let \(\bar{s}=\max\{s_{1},s_{2},s_{3}\}\). Then \(x^{\nu_{\bar{s}}}\) is a stationary point with ε-complementarity and γ-feasibility, which completes the proof. □

We choose ε to be a small positive number for the numerical implementation of Algorithm 3. In the end, we require the numerical solution to satisfy the ε-complementarity slackness condition, since the exact complementarity slackness condition (3) is difficult to satisfy. In Theorem 2, \(-\lambda^{\nu}_{k}\cdot\varepsilon\leq \lambda^{\nu}_{k}g(x^{\nu},y^{\nu}_{k})\leq0\) holds if the active point \(y^{\nu}_{k}\) is not an ε∪-active point. Hence, the notion of the ε∪-active interval helps us avoid dividing unnecessary active intervals. In conclusion, the ε∪-active interval plays an important role in the numerical experiments.

We prove that Algorithm 3 terminates in a finite number of iterations for any given tolerances ε>0 and γ>0. Let the point \(x^{\nu^{*}(\varepsilon,\gamma)}\) be generated by Algorithm 3 when the algorithm terminates with ε-complementarity and γ-feasibility. In the following theorem, we show that any accumulation point of \(x^{\nu^{*}(\varepsilon,\gamma)}\) as ε,γ→0 is the stationary point of the problem (NSIP).

Theorem 3

Let \(x^{\nu^{*}(\varepsilon,\gamma)}\) be the stationary point with ε-complementarity and γ-feasibility of the problem (NSIP) generated by Algorithm 3. Then,

  1. (a)

    For a fixed tolerance ε>0, every accumulation point \(x^{\nu^{*}(\varepsilon)}\) of \(x^{\nu^{*}(\varepsilon,\gamma)}\) as γ→0 is a stationary point with ε-complementarity of the problem (NSIP).

  2. (b)

    Every accumulation point \(x^{\nu^{*}}\) of \(x^{\nu ^{*}(\varepsilon )}\) as ε→0 is a stationary point of the problem (NSIP).

Proof

We first show (a). Since X is a compact subset of ℝ^n, there exists a point \(x^{\nu^{*}(\varepsilon)}\in X\) such that \(x^{\nu ^{*}(\varepsilon,\gamma)}\rightarrow x^{\nu^{*}(\varepsilon)}\) as γ→0 (along a subsequence). Furthermore, according to the proof of Theorem 2, the result in (a) follows immediately. Next, we show (b). By the result in (a), we obtain \(x^{\nu ^{*}(\varepsilon)}\in\mathcal{F}\). As \(\mathcal{F}\) is a compact subset of ℝ^n, there exists an accumulation point \(x^{\nu^{*}}\in\mathcal{F}\) of \(x^{\nu^{*}(\varepsilon)}\) as ε→0. Let \(y^{\nu^{*}(\varepsilon)}_{1},y^{\nu^{*}(\varepsilon)}_{2},\ldots ,y^{\nu^{*}(\varepsilon )}_{m''}\), m″≤n+1, be the points satisfying the KKT conditions (11)–(13) with positive Lagrange multipliers \(\lambda^{\nu^{*}(\varepsilon)}_{1},\lambda^{\nu^{*}(\varepsilon )}_{2},\ldots,\lambda ^{\nu^{*}(\varepsilon)}_{m''}\) at \(x^{\nu^{*}(\varepsilon)}\). Thus, we have

$$ \nabla f(x^{\nu^*(\varepsilon)})+\sum_{k=1}^{m''}\lambda^{\nu ^*(\varepsilon )}_{k}\nabla_x g(x^{\nu^*(\varepsilon)},y^{\nu^*(\varepsilon)}_{k})=0,$$
(24)

for all ε>0. Hence, (24) converges to (11) when ε→0. Moreover, define

$$\delta^{\nu^*(\varepsilon)}:=\max_{1\leq k\leq m''} \{-g(x^{\nu ^*(\varepsilon )},y^{\nu^*(\varepsilon)}_{k})\}.$$

Therefore, \(0\leq\delta^{\nu^{*}(\varepsilon)}\leq\varepsilon\) by the stopping criterion of Algorithm 3, and hence \(\lim_{\varepsilon\rightarrow0}\delta^{\nu^{*}(\varepsilon)}=0\). Letting ε→0 and using the continuity of the function g, we obtain result (b). □

Remark 2

Since ε∪→0 as ε→0, Lemma 3 may no longer hold, and the convergence argument of Theorem 2 then fails. Hence, we cannot prove part (b) of Theorem 3 simply by letting ε→0 and applying Theorem 2.

5 Numerical experience

In this section, we apply Algorithm 3 to some examples. We implement Algorithm 3 in Matlab 7.0 and use the Matlab routine fmincon, with default tolerances, as the black-box NLP solver in Step 2 of Algorithm 1. In particular, the tolerance for constraint violation is 1e−6. All experiments were run on a personal computer with an AMD(R) Phenom II(R) 3.0 GHz CPU and 2 GB RAM.

Example 1

Among all control design techniques, the Proportional-Integral-Derivative (PID) controller is the most widely used. The PID controller design problem is formulated as an SIP problem in [9]. More recently, the authors in [3] applied the semi-infinite programming technique to the optimal worst-case H_2 design of PID controllers for uncertain nonminimum-phase plants. The first example is taken from [9] and was also implemented in [13, 25]. The PID controller design problem minimizes the cost function given by

$$f(x)=\frac {x_2(122+17x_1+6x_3-5x_2+x_1x_3)+180x_3-36x_1+1224}{x_2(408+56x_1-50x_2+60x_3+10x_1x_3-2x_1^2)},$$

and subject to the constraint

$$g(x,y)=\Im(T(x,y))-3.33(\Re(T(x,y)))^2+1\leq0,\quad\mbox{for\ all\ } y\in Y,$$

where

with Y=[10^{-6},30] and \(j=\sqrt{-1}\). ℑ(a) and ℜ(a) denote the imaginary and real parts of a complex number a, respectively. To start our algorithm, we first have to determine the value of α. Generally, the estimation of α is not easy. Here, we evaluate the explicit form of the second derivative of the constraint function and then set α=3.4763. Since the calculation of α is time-consuming, we let \(\alpha^{\nu}_{k}=3.4763\) be a constant for all k and ν. We initialize our algorithm with the tolerances ε=1e−4 and γ=1e−6. For comparison, we implement different values of m_1, where m_1 denotes the number of subintervals chosen in Step 1 of Algorithm 3. We also create a modified version of Algorithm 3, denoted Algorithm 3.a, by setting m_1=N in Step 1 of Algorithm 3; that is, we select all the discretized subintervals initially and need not check the γ-feasibility in each iteration. We also use Algorithm 1 to solve the subproblems generated by Algorithm 3.a. Moreover, the Matlab subroutine for solving SIP problems, fseminf, is implemented for comparison with our results. The following table lists the numerical results for Example 1.
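As an aside, a generic sampling-based estimate of α in the spirit of (6) can be sketched as below. This is a heuristic finite-difference version with a placeholder constraint argument (the explicit transfer function T(x,y) from [9] and the value 3.4763 are not reproduced here), not the explicit-derivative calculation actually used in this example.

```python
import numpy as np

def estimate_alpha(g, x_box, Y, n_x=5, n_y=200, h=1e-4, margin=1e-3):
    """Heuristic estimate of the convexification parameter in (6): sample the
    second derivative of g with respect to y by central differences on a grid
    over X x Y.  x_box is a list of (low, high) pairs and Y = (yL, yU)."""
    grids = [np.linspace(lo, hi, n_x) for (lo, hi) in x_box]
    xs = np.array(np.meshgrid(*grids)).reshape(len(x_box), -1).T
    ys = np.linspace(Y[0], Y[1], n_y)
    worst = -np.inf
    for x in xs:
        for y in ys:
            d2 = (g(x, y + h) - 2.0 * g(x, y) + g(x, y - h)) / h ** 2
            worst = max(worst, d2)
    return max(0.0, worst) + margin   # keep the inequality in (6) strict
```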

In Table 1, opt-val denotes the computed optimal value, cpu-t the computation time, #iter the number of iterations taken by the two algorithms, and #re-int the number of remaining subintervals in Algorithm 3. Moreover, maxg denotes the maximal value of the constraint function g over Y at the terminal solution. In Table 1, the number of infeasible subintervals is small, and five infeasible subintervals are added in both cases. Since the number of constraints in Algorithm 3 is smaller than that in Algorithm 3.a, Algorithm 3 saves at least half of the computation time. Although the number of iterations of Algorithm 3 is greater than that of Algorithm 3.a, the total CPU time spent is less than that of Algorithm 3.a. In the case of N=100, the average CPU time of Algorithm 3 for each iteration, 0.5538 s, is much less than the average time of Algorithm 3.a, 0.1735 s. Therefore, selecting some of the subintervals instead of involving all of them makes the computation more efficient. We can also see in Fig. 1 that the refined subintervals concentrate in a neighborhood of the active point. Furthermore, the terminal solution of Algorithm 3 is feasible for this example. As g(x,y)≤g_j(x,y) and \(g_{j}(x,y)=g(x,y)+\frac{\alpha _{j}}{2}(y-\eta_{j}^{L})(\eta_{j}^{U}-y)\leq\gamma\) for all y∈Y_j and for all \(Y_{j}\in E^{c}_{\nu}\) at each iteration ν, the terminal solution generated by Algorithm 3 can be expected to be feasible.

Fig. 1

Constraint and concave approximation at the terminated step in Example 1

Table 1 Numerical results for Example 1

Example 2

Let Y=Y^P∪Y^S be the union of disjoint closed design frequency intervals of a filter, where Y^P=[0,y_p] and Y^S=[y_s,π] represent the passband and stopband, respectively, and 0<y_p<y_s<π are given frequencies. A desired prescription for the frequency response of the filter is given by

$$ D(y)=\begin{cases}e^{-jy\tau_0}, & \text{for\ } y\in Y^P;\\0 , & \text{for\ } y\in Y^S,\end{cases}$$
(25)

where τ_0 is the given constant group delay and \(j=\sqrt{-1}\). The Finite Impulse Response (FIR) filter design problem seeks the best approximation of a given magnitude response by that of an FIR filter. The most recent approach consists of solving the complex Chebyshev approximation problem [10, 17]

$$ \rho_{CA}^*:=\min_{x\in\mathbb{R}^n} \max_{y\in Y} W(y)\bigl||D(y)|-|H(x,y)| \bigr|$$
(26)

where x:=(x 1,…,x n )∈ℝn is the vector containing the coefficients of the unit impulse response, and

$$H(x,y):=\sum^n_{k=1}x_k e^{-jy(k-1)}.$$

Problem (26) is equivalent to the nonlinear SIP problem

$$\everymath{\displaystyle}\begin{array}{@{}l}\max_{(x,\delta)\in\mathbb{R}^{n+1}} \delta \\[2mm]\mbox{subject\ to} \quad W(y)\bigl| |D(y)|-|H(x,y)|\bigr|-\delta\leq0, \quad\mbox{for\ all\ } y\in Y.\end{array}$$

In [16, 17], the authors suggest some new algorithms for the solution of problem (26) in the case that problem (26) is a convex SIP problem. More general formulations of design approximation problems in the frequency domain are presented in [10]. The authors in [10] proposed several non-convex SIP models for the FIR filter design and solved several specific design problems. In Example 2, we implement one of the non-convex SIP models in [10].

Let n=91, Y^P=[0,π/5], and Y^S=[π/4,π]. The desired frequency response is given by τ_0=120 in (25). This example approximates |D(y)|=1 on Y^P by the composed magnitude response M_Q(y)M(x,y) of both filters and bounds M_Q(y)M(x,y) on Y^S by a constant δ_S:=3.16227e−4, where the functions are given by

$$\everymath{\displaystyle}\begin{array}{@{}l}M(x,y)=|H(x,y)|,\quad\mbox{and}\quad M_Q(y)=|Q(jy)|;\\[2mm]Q(y):=\frac{1}{1+1.3202(y/\omega_g)+0.86175(y/\omega _g)^2+0.2887(y/\omega_g)^3},\end{array}$$
(27)

where ω g :=π/7.02832 is a constant. From [10], we know that M Q (y) is positive for all yY. Hence, the error function |1−M Q (y)M(x,y)| on Y P can be rewritten as

$$M_Q(y)\biggl(\frac{1}{M_Q(y)}-M(x,y)\biggr),\quad\mbox{for\ }y\in Y^P,$$

and the other error function on Y S is

$$|0-M_Q(y)M(x,y)|=M_Q(y)M(x,y)\leq\delta_S,\quad\mbox{for\ }y\in Y^S.$$

Thus, this problem can be interpreted as the problem (26) with the weight function W(y)=M Q (y) and the desired magnitude response 1/M Q (y) in Y P. Due to the result in [10], the problem has a solution that can be computed by the following non-convex SIP problem:

$$\everymath{\displaystyle}\begin{array}{l@{\quad}l@{\quad}l}(\mathrm{SIP}_{\mathit{FIR}})&\max_{(x,\delta)\in X} \delta \\[2mm]&\mbox{subject\ to}&+1-M_Q(y)M(x,y)-\delta\leq 0, \quad\mbox{for\ } y\in Y^P,\\[1mm]& &\displaystyle-1+M_Q(y)M(x,y)-\delta\leq0, \quad\mbox{for\ }y\in Y^P,\\[1mm]& & M_Q(y)M(x,y)-\delta_S\leq0, \quad\mbox{for\ } y\in Y^S,\end{array}$$

where we restrict X=[−0.3,0.3]^{n+1}. We discretize Y^P into 20 subintervals and Y^S into 100 subintervals to implement the (SIP FIR ) problem. The parameters are set to ε=1e−4 and γ=1e−6. After a calculation similar to that in Example 1, we set α=1.901 to start the two algorithms. Table 2 shows our results, and Fig. 2 plots the magnitude M_Q(y)M(x,y) for y∈[0,π].
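For reference, a brief sketch (assuming standard NumPy; the function names are ours) of evaluating the two magnitudes entering the constraints of (SIP FIR ), namely M(x,y)=|H(x,y)| and M_Q(y)=|Q(jy)| from (27):

```python
import numpy as np

def M(x, y):
    """Magnitude of the FIR response H(x, y) = sum_{k=1}^{n} x_k e^{-j y (k-1)}."""
    k = np.arange(len(x))                      # exponents 0, 1, ..., n-1
    return np.abs(np.sum(np.asarray(x) * np.exp(-1j * y * k)))

def M_Q(y, omega_g=np.pi / 7.02832):
    """Magnitude of the reference response Q(jy) defined in (27)."""
    s = 1j * y / omega_g
    return np.abs(1.0 / (1.0 + 1.3202 * s + 0.86175 * s**2 + 0.2887 * s**3))

# The composed magnitude response plotted in Fig. 2 is M_Q(y) * M(x, y), y in [0, pi].
```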

Fig. 2

Magnitude in Example 2

Table 2 Numerical results for Example 2

The symbol in Table 2 denotes that Algorithm 3.a reaches the maximum iteration number we set; that is, Algorithm 3.a runs for 150 iterations and does not meet the stopping criteria. Note that fseminf failed on this example. We find that the NLP solver in Matlab is not efficient at each iteration, and the improvement in the optimal value is very slow. Hence, if the initial discretization is too coarse, this example requires more CPU time. Nevertheless, Algorithm 3 is clearly much better than Algorithm 3.a on the FIR filter design example.

6 Conclusion

In this paper, we propose a new algorithm and establish its convergence properties. We improve the algorithm proposed in [8] by combining it with the concept of the cutting plane method. The convexification of the constraint function g with respect to the variable y makes the computation of the lower level optimization problem easier. This method is advantageous for solving non-convex SIP problems, especially when the constraint function g is complicated. Based on the results in Sect. 5, Algorithm 3 performs more efficiently on Examples 1 and 2 than the other two methods.