Abstract
In this paper, we present an algorithm for solving nonlinear semi-infinite programming (NSIP) problems. To deal with the nonlinear constraint, Floudas and Stein (SIAM J. Optim. 18:1187–1208, 2007) suggest an adaptive convexification relaxation to approximate the nonlinear constraint function. The αBB method, widely used in global optimization, is applied to construct the convexification relaxation. We then combine the idea of the cutting plane method with the convexification relaxation to propose a new algorithm for solving NSIP problems. With given tolerances, our algorithm terminates in a finite number of iterations and obtains an approximate stationary point of the NSIP problem. In addition, some NSIP applications, such as the proportional-integral-derivative controller design problem and the nonlinear finite impulse response filter design problem, are solved with the method proposed in this paper. Based on our numerical experience, we demonstrate that our algorithm improves the computational speed for solving NSIP problems.
1 Introduction
The semi-infinite programming problem is an optimization problem with finite-dimensional decision variables x∈ℝn and an infinite number of constraints. We are concerned with the following nonlinear semi-infinite programming problem (NSIP) in this paper:

$$(\mathrm{NSIP}){:}\quad\min_{x\in X} f(x)\quad\mbox{subject\ to}\quad g(x,y)\leq0,\quad\mbox{for\ all\ }y\in Y,$$
where X is a box constraint set in ℝn with x L≤x≤x U. Both x L and x U are n-dimensional vectors with x L<x U; the vector inequalities are understood componentwise. The objective function f:ℝn→ℝ is twice differentiable on X; the set Y is a fixed nonempty compact subset of ℝm; and the constraint g is a twice differentiable function with respect to x and y. A more general case, where an x-dependent index set Y(x) substitutes for the fixed index set Y, is called a generalized semi-infinite programming problem (GSIP). In this paper, we focus on the case m=1 for the sake of convenience. Moreover, without loss of generality, we let Y=[0,1] throughout this paper.
The optimality conditions and duality theorems for semi-infinite programming (SIP) problems have been studied since the 1960s. A complete survey of the optimality conditions and duality theory for linear, convex, and smooth SIP problems is given in [22]. Many applied problems can be treated as SIP problems, such as approximation theory, economic problems, and numerous engineering problems, including optimal control, robot trajectory planning, digital filter design, air pollution control, and production planning. We refer to [12], an excellent review paper on SIP problems covering the theory, algorithms, and some applications. Furthermore, [19, 21] summarize the existing and typically used algorithms for solving SIP problems.
The main difficulty in solving SIP problems is the infinite number of constraints. In the last two decades, many algorithms have been proposed to solve SIP problems. Among these, the most popular is the discretization method (cf. [19, 23, 25]). Another common approach, the reduction-based method, reduces the problem locally to a finite-dimensional nonlinear programming problem (see, e.g., [11, 18, 24]). Of these two methods, the discretization method becomes computationally expensive as the cardinality of the finite subset of Y grows. Moreover, although the reduction-based method, which utilizes an exact penalty function, yields global convergence, it usually requires stronger assumptions and is implemented in a rather simple form. In addition to the foregoing methods, the Karush-Kuhn-Tucker (KKT) system of the problem (NSIP) can be reformulated into a system of semi-smooth equations (cf. [15, 20, 26]); the authors then apply the smoothing, semi-smooth Newton, or Newton-type method to solve the KKT system. In recent studies [5, 6], a branch-and-bound technique is used to generate convergent sequences of upper and lower bounds on the SIP solution value; the authors focus on finding the global solution of SIP problems and present an algorithm with feasible iterates.
In addition, one of the important methods, the exchange method, is proposed in [4, 14]. The exchange method, also called the cutting plane method, provides an outer approximation of the feasible set of SIP problems. At each iteration, this algorithm reduces the infinite index set Y to a finite subset of Y. However, the following problem, called the lower level optimization problem, must be evaluated at every iteration:

$$\max_{y\in Y}\ g(\bar{x},y)\eqno(1)$$
for a given point \(\bar{x}\in\mathbb{R}^{n}\). Determining the global maximizer of problem (1) numerically may be difficult, especially when \(g(\bar{x},\cdot)\) is non-convex with respect to the variable y. In fact, a standard solver for (1) can only be expected to find a local maximizer, not the global one. Nevertheless, to solve SIP problems where both f(x) and g(x,⋅) are convex functions, Zhang et al. [28] propose a new exchange method. This method only requires finding some points satisfying a certain computationally easy criterion instead of finding the global optimal solution of problem (1).
Recently, Floudas and Stein [8] proposed an adaptive convex relaxation of the lower level problem (1). The authors split Y into a finite number of subintervals and substituted an overestimating concave function, regarded as the approximate constraint, on each subinterval. They produced feasible iterates for the SIP problems. Although the adaptive convexification method provides a way to solve the lower level problem (1) to global optimality efficiently, we find that the computation may become costly as the number of subintervals increases. Therefore, we integrate the idea of the exchange method into the adaptive convexification method and propose a modified algorithm in this paper. We denote by \(\mathcal{F}\) the feasible set of the problem (NSIP):

$$\mathcal{F}:=\{x\in X\mid g(x,y)\leq0,\ \mbox{for\ all\ }y\in Y\}.$$
Here, we provide the first-order optimality condition for the problem (NSIP) from [22].
Theorem 1
Let \(\tilde{x}\in\mathcal{F}\) be a local minimizer of (NSIP) and \(Y_{\mathrm{act}}(\tilde{x}):=\{y\in Y|g(\tilde{x},y)=0\}\) be a nonempty set. Then there exist \(\tilde{y}_{k}\in Y_{\mathrm{act}}(\tilde{x})\), for 1≤k≤n+1, and (λ 0,λ 1,…,λ n+1)∈ℝ×ℝn+1 with λ 0,λ 1,…,λ n+1≥0 such that
hold. Furthermore, a point \(\tilde{x}\in\mathcal{F}\) is said to satisfy the extended Mangasarian-Fromovitz constraint qualification (EMFCQ) if there exists a direction d∈ℝn such that \(\nabla_{x}g(\tilde {x},y)^{\top}d<0\) for all \(y\in Y_{\mathrm{act}}(\tilde{x})\). If EMFCQ is valid at \(\tilde{x}\), we can set λ 0=1 in (2).
The point \(\tilde{x}\) satisfying (2)–(4) is called a stationary point of the problem (NSIP). Throughout this paper, we assume that EMFCQ holds at every stationary point of the problem (NSIP). Under EMFCQ, (2)–(4) with λ 0=1 are called the KKT optimality conditions. By the result in [22], the set {λ k } is nonempty and bounded when EMFCQ is fulfilled.
We organize this paper as follows. In Sect. 2, we introduce one kind of convex relaxation method: the αBB method. The algorithm used to solve the generated subproblem is also described in this section. We propose a modified adaptive convexification algorithm in Sect. 3 and show its convergence properties in Sect. 4. Finally, in Sect. 5, we apply our algorithm to some examples and demonstrate its computational efficiency.
2 Preliminaries
2.1 The αBB method
Based on their convergence properties, existing global optimization algorithms can be categorized into deterministic and stochastic methods. One of the deterministic global optimization algorithms, the αBB (α-based Branch and Bound) method, offers mathematical guarantees of convergence to a point arbitrarily close to the global minimum for twice-differentiable nonlinear programming problems. In the αBB method, a convex underestimator of a nonconvex function is constructed by decomposing the function into a sum of nonconvex terms (cf. [1, 2, 7]). We mention in Sect. 1 that obtaining the global maximizer of the lower level optimization problem (1) may be difficult. Therefore, the concept of the αBB method is used to construct a concave overestimator for the nonconvex constraint g(x,y). We now divide Y into N subintervals with \(Y=\bigcup_{j=1}^{N}[\eta^{L}_{j},\eta^{U}_{j}]\), where \(\eta_{1}^{L}=0\), \(\eta_{N}^{U}=1\) and \(\eta^{L}_{j}<\eta^{U}_{j}=\eta^{L}_{j+1}\), for j=1,2,…,N−1. The single semi-infinite constraint, g(x,y)≤0, y∈Y, can then be considered as the N semi-infinite constraints under the discretization

$$g(x,y)\leq0,\quad\mbox{for\ all\ }y\in Y_j,\ j=1,2,\ldots,N,$$
where \(Y_{j}=[\eta^{L}_{j},\eta^{U}_{j}]\). With the obvious modification, we construct a concave overestimator by adding a quadratic relaxation function ψ to a nonconcave function \(g:[\eta^{L}_{j},\eta^{U}_{j}]\rightarrow\mathbb{R}\), a C 2-function on an open neighborhood of \([\eta^{L}_{j},\eta^{U}_{j}]\). Set

$$\psi_j(y):=\frac{\alpha_j}{2}\bigl(y-\eta^L_j\bigr)\bigl(\eta^U_j-y\bigr),$$
and define g j (x,y), for j=1,2,…,N, by

$$g_j(x,y):=g(x,y)+\psi_j(y)=g(x,y)+\frac{\alpha_j}{2}\bigl(y-\eta^L_j\bigr)\bigl(\eta^U_j-y\bigr).\eqno(5)$$
Note that if α j >0, the function g j (x,y) is greater than g(x,y) except at the boundary points \(\eta^{L}_{j}\) and \(\eta^{U}_{j}\); i.e., g(x,y)≤g j (x,y) for x∈X, y∈Y j . To ensure that g j (x,y) is a concave function with respect to y on the subinterval Y j , α j must satisfy the following condition:

$$\alpha_j\geq\max_{(x,y)\in X\times Y_j}\frac{\partial^2}{\partial y^2} g(x,y),$$
and α j must be positive. Thus,

$$\alpha_j>\max\biggl(0,\max_{(x,y)\in X\times Y_j}\frac{\partial^2}{\partial y^2} g(x,y)\biggr).\eqno(6)$$
We call α j satisfying (6) the convexification parameter corresponding to Y j , and denote the vector of convexification parameters by α=(α 1,…,α N ). The maximal separation distance between g(x,y) and g j (x,y) on X×Y j is defined as the maximum of the difference of these functions; i.e., for any x∈X,

$$\max_{y\in Y_j}\bigl(g_j(x,y)-g(x,y)\bigr)=\max_{y\in Y_j}\frac{\alpha_j}{2}\bigl(y-\eta^L_j\bigr)\bigl(\eta^U_j-y\bigr)=\frac{\alpha_j}{8}\bigl(\eta^U_j-\eta^L_j\bigr)^2.$$
Clearly, the distance will converge to zero as \((\eta^{U}_{j}-\eta^{L}_{j})\rightarrow0\).
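To make the construction concrete, the following sketch (our own illustration, not code from [8]) builds g_j via (5) for a toy constraint g(x,y)=xy² and estimates α_j by grid-sampling ∂²g/∂y². The sampling is a shortcut that only approximates the rigorous bound in (6); a production αBB implementation would bound the second derivative by interval arithmetic as in [1, 2, 7].

```python
def alpha_for(g_yy, x_grid, y_grid, margin=1e-6):
    # convexification parameter (6): alpha_j must strictly exceed
    # max(0, max over X x Y_j of d^2 g / dy^2); grid sampling only
    # approximates this maximum, so this is an illustrative shortcut
    worst = max(g_yy(x, y) for x in x_grid for y in y_grid)
    return max(0.0, worst) + margin

def overestimator(g, alpha, lo, hi):
    # concave overestimator (5): g_j(x, y) = g(x, y) + (alpha/2)(y - lo)(hi - y)
    return lambda x, y: g(x, y) + 0.5 * alpha * (y - lo) * (hi - y)

# toy constraint g(x, y) = x * y^2 on X = [0, 1] with Y_j = [0, 0.5];
# its second derivative w.r.t. y is 2x
g = lambda x, y: x * y ** 2
g_yy = lambda x, y: 2.0 * x
lo, hi = 0.0, 0.5

x_grid = [i / 10 for i in range(11)]
y_grid = [lo + (hi - lo) * i / 10 for i in range(11)]
alpha = alpha_for(g_yy, x_grid, y_grid)
g_j = overestimator(g, alpha, lo, hi)

ys = [lo + (hi - lo) * i / 100 for i in range(101)]
assert all(g_j(1.0, y) >= g(1.0, y) for y in ys)      # overestimation on Y_j
gap = max(g_j(1.0, y) - g(1.0, y) for y in ys)
assert abs(gap - alpha * (hi - lo) ** 2 / 8) < 1e-9   # maximal separation
```

The final assertion checks the maximal separation α_j(η^U_j−η^L_j)²/8, attained at the midpoint of Y_j.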
Next, we define some notation. Let \(E=\{Y_{j}=[\eta^{L}_{j},\eta^{U}_{j}]|\,j=1,\ldots,N\}\) be the collection of subintervals and the vector α=(α 1,α 2,…,α N ) be the corresponding convexification parameters on E, where α j , defined in (6), is the j-th component of α, i.e., α j is the convexification parameter corresponding to Y j , for j=1,2,…,N. The approximate feasible set \(\mathcal{F}_{\alpha\mathrm {BB}}(E,\alpha)\) is then defined as

$$\mathcal{F}_{\alpha\mathrm{BB}}(E,\alpha):=\{x\in X\mid g_j(x,y)\leq0,\ \mbox{for\ all\ }y\in Y_j,\ j=1,2,\ldots,N\},$$
where \(g_{j}(x,y)=g(x,y)+\frac{\alpha_{j}}{2}(y-\eta^{L}_{j})(\eta^{U}_{j}-y)\) is an approximate concave function on the subinterval Y j . Obviously, \(\mathcal{F}_{\alpha\mathrm{BB}}(E,\alpha)\) is a subset of \(\mathcal{F}\). Furthermore, we replace the single semi-infinite constraint g(x,y)≤0 by the N semi-infinite constraints

$$g_j(x,y)\leq0,\quad\mbox{for\ all\ }y\in Y_j,\ j=1,2,\ldots,N.$$
With this substitution, we obtain an approximate problem (NSIP αBB(E,α)) for the problem (NSIP):

$$\everymath{\displaystyle}\begin{array}{@{}l@{\quad}l}(\mathrm{NSIP}_{\alpha\mathrm{BB}}(E,\alpha)){:}&\min_{x\in X} f(x)\\[2mm]&\mbox{subject\ to}\quad g_{j}(x,y)\leq0,\quad \mbox{for\ all\ } y\in Y_j,\ j=1,2,\ldots,N.\end{array}$$
Since the inequality (6) is strict, the problem (NSIP αBB(E,α)) has a finite number of constraint functions g j (x,⋅) that are strictly concave in y. Therefore, for any fixed point \(\bar{x}\in X\) and each concave constraint function \(g_{j}(\bar{x},\cdot)\), j=1,2,…,N, the lower level problem

$$\max_{y\in Y_j} g_j(\bar{x},y)$$
has a unique global optimal solution \(\bar{y}_{j}\), which can be obtained without difficulty. Next, we present a method to solve the problem (NSIP αBB(E,α)).
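Since each g_j(x̄,·) is strictly concave on Y_j, its unique maximizer can be bracketed by a simple one-dimensional search. The helper below is our own sketch of one such method (ternary search), not a procedure from the cited references:

```python
def argmax_concave(phi, lo, hi, tol=1e-10):
    # ternary search: for a strictly concave phi on [lo, hi], the bracket
    # containing the unique maximizer can be shrunk by thirds at each step
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            lo = m1      # the maximizer lies in [m1, hi]
        else:
            hi = m2      # the maximizer lies in [lo, m2]
    return 0.5 * (lo + hi)

# example: a concave function with maximizer y = 0.3 on [0, 1]
y_star = argmax_concave(lambda y: -(y - 0.3) ** 2, 0.0, 1.0)
assert abs(y_star - 0.3) < 1e-8
```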
2.2 The cutting plane method
In this subsection, we introduce the cutting plane method to solve the problem (NSIP αBB(E,α)). The cutting plane method is commonly used when solving SIP problems. Let J={1,2,…,N} be the index set of the collection of subintervals, where N is the number of subintervals in E. We present the cutting plane method to solve the problem (NSIP αBB(E,α)) in the following algorithm.
Algorithm 1
The cutting plane method for solving the problem (NSIP αBB(E,α))
- Step 1::
-
For each Y j ∈E, choose a finite set \(T_{j}^{1}=\{y_{j,1},\ldots,y_{j,k^{1}_{j}}\}\) such that \(T^{1}_{j}\subset Y_{j}\). Let \(T^{1}=\bigcup_{j\in J} T^{1}_{j}\) and p=1.
- Step 2::
-
Solve the nonlinear programming problem (NSIP αBB(T p)) to obtain the optimal solution \(\tilde{x}^{p}\), where
$$\everymath{\displaystyle}\begin{array}{@{}l@{\quad}l}(\mathrm{NSIP}_{\alpha\mathrm{BB}}(T^p)){:}&\min_{x\in X} f(x)\\[2mm]&\mbox{subject\ to}\quad g_{j}(x,y_{j,i})\leq0,\quad \mbox{for\ } i=1,2,\ldots,k^p_j,\ j\in J.\end{array}$$ - Step 3::
-
For each j∈J, find the optimal solution \(\tilde {y}_{j}\) for the following problem:
$$\max g_j(\tilde{x}^p,y)\quad\mbox{subject\ to} \quad y\in Y_j.$$If \(g_{j}(\tilde{x}^{p},\tilde{y}_{j})\leq0\) for all j∈J, we stop the algorithm and output \(\tilde{x}^{p}\) as the optimal solution. Otherwise, define the set \(T^{p+1}_{j}\) by
$$T^{p+1}_j=\begin{cases}T^p_j\cup\{\tilde{y}_j\} & \text{if } g_j(\tilde{x}^p,\tilde {y}_j)>0;\\T^p_j & \text{if } g_j(\tilde{x}^p,\tilde{y}_j)\leq0.\end{cases}$$Let \(k_{j}^{p+1}=|T^{p+1}_{j}|\) be the cardinality of the reference set on Y j at the (p+1)-th iteration, for j∈J. Set p:=p+1, and go to Step 2.
Associated with the finite subset T p at the p-th iteration, the problem (NSIP αBB(E,α)) reduces to a finite-dimensional nonlinear programming problem. Therefore, we can solve it using a common NLP solver to obtain the optimal solution \(\tilde{x}^{p}\). We subsequently check the feasibility of each constraint \(g_{j}(\tilde {x}^{p},y)\) on Y j , for Y j ∈E, and add the worst infeasible point to \(T^{p}_{j}\) if an infeasible point exists on Y j . The algorithm terminates when no infeasible point exists in any subinterval in E. The convergence proof for Algorithm 1 can be found in [19, 27]; the argument there holds for every point in a fixed interval, and thus the convergence result for Algorithm 1 follows naturally. Therefore, we omit the proof here.
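The loop of Algorithm 1 can be sketched as follows. The finite NLP solve of Step 2 and the concave lower-level maximization of Step 3 are passed in as callbacks; the toy instance is our own, with g(x,y)=2y−y²−x already concave in y, so both callbacks have closed forms:

```python
def cutting_plane(solve_nlp, lower_level, subintervals, T0, tol=1e-8, max_iter=50):
    # Algorithm 1 sketch: T[j] is the finite reference set on subinterval Y_j
    T = [list(t) for t in T0]
    for _ in range(max_iter):
        x = solve_nlp(T)                                   # Step 2: NLP over cuts
        worst = [lower_level(x, Y) for Y in subintervals]  # Step 3: (maximizer, value)
        if all(val <= tol for _, val in worst):
            return x, T                                    # feasible everywhere: stop
        for j, (y, val) in enumerate(worst):
            if val > tol:
                T[j].append(y)                             # add worst violated point
    raise RuntimeError("no convergence within max_iter")

# toy instance: min x subject to 2y - y^2 - x <= 0 on Y = [0, 1]; optimum x = 1
solve_nlp = lambda T: max(2 * y - y * y for y in T[0])  # x must dominate all cuts
lower_level = lambda x, Y: (1.0, 2 * 1.0 - 1.0 - x)     # max of 2y - y^2 - x is at y = 1
x_opt, _ = cutting_plane(solve_nlp, lower_level, [(0.0, 1.0)], [[0.0]])
assert abs(x_opt - 1.0) < 1e-12
```

On this instance the method adds the single cut y=1 in the first pass and stops at the optimum x=1 in the second.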
3 Modified convexification algorithm
The main idea of our algorithm is to modify the adaptive convexification algorithm proposed by Floudas and Stein [8], who solved the problem (NSIP αBB(E,α)) and refined some of the subintervals to approach a stationary point of the problem (NSIP). Let a given predefined set E 0 be a collection of some subintervals in Y defined by

$$E^0=\bigl\{Y^0_1,Y^0_2,\ldots,Y^0_{m_0}\bigr\},$$
where \(Y^{0}_{k}\) is a subinterval of Y with lower bound \(\eta^{L,0}_{k}\) and upper bound \(\eta^{U,0}_{k}\), for k=1,2,…,m 0, and m 0:=|E 0| denotes the cardinality of E 0. Note that E 0 collects only some of the subintervals of Y, not all of them; that is, \(\bigcup^{m_{0}}_{k=1}Y^{0}_{k}\subsetneqq Y\). The convexification parameter \(\alpha^{0}_{k}\) on \(Y^{0}_{k}\) is defined as in (6):

$$\alpha^0_k>\max\biggl(0,\max_{(x,y)\in X\times Y^0_k}\frac{\partial^2}{\partial y^2} g(x,y)\biggr).$$
Let \(\alpha^{0}=(\alpha^{0}_{1},\ldots,\alpha^{0}_{m_{0}})\) be the corresponding convexification parameter on E 0. We substitute E 0 for E and propose a modified subproblem (NSIP αBB(E 0,α 0)):

$$\everymath{\displaystyle}\begin{array}{@{}l@{\quad}l}(\mathrm{NSIP}_{\alpha\mathrm{BB}}(E^0,\alpha^0)){:}&\min_{x\in X} f(x)\\[2mm]&\mbox{subject\ to}\quad g^0_{k}(x,y)\leq0,\quad \mbox{for\ all\ } y\in Y^0_k,\ k=1,2,\ldots,m_0,\end{array}$$
where \(g^{0}_{k}(x,y)=g(x,y)+\frac{\alpha^{0}_{k}}{2}(y-\eta ^{L,0}_{k})(\eta^{U,0}_{k}-y)\). We solve the problem (NSIP αBB(E 0,α 0)) to obtain a stationary point x 0 and verify whether x 0 is a stationary point of the problem (NSIP). If it is not, we apply the refinement step to formulate a new subproblem, and repeat until x 0 is a stationary point of the problem (NSIP). Before discussing the refinement step, let \(y^{0}_{k}\) denote the optimal solution of the following lower level optimization problem:

$$\max_{y\in Y^0_k} g^0_k(x^0,y),$$
and \(y^{0}_{k}\) exists and is unique because of the strict concavity of \(g^{0}_{k}(x^{0},\cdot)\) on \(Y^{0}_{k}\).
Recall the definition that the point \(y^{0}_{k}\in Y^{0}_{k}\) is called an active point at x 0 if \(g^{0}_{k}(x^{0},y^{0}_{k})=0\). Furthermore, we say that the subinterval \(Y^{0}_{k}\) is an active interval at x 0 if \(g^{0}_{k}(x^{0},y^{0}_{k})=0\) for some \(y^{0}_{k}\in Y^{0}_{k}\). Define

$$E_{\mathrm{act}}(x^0):=\bigl\{Y^0_k\in E^0\mid Y^0_k\ \mbox{is\ an\ active\ interval\ at\ }x^0\bigr\}\eqno(8)$$
and

$$Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^0):=\bigl\{y^0_k\in Y^0_k\mid g^0_k(x^0,y^0_k)=0,\ k=1,2,\ldots,m_0\bigr\}.\eqno(9)$$
Clearly, E act(x 0) is a subset of E 0, and \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{0})\) collects the active points in \(\bigcup _{k=1}^{m_{0}} Y_{k}^{0}\) at x 0 with respect to the αBB convexification. We can then conclude that the cardinality of E act(x 0) coincides with that of \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{0})\) because \(g^{0}_{k}(x^{0},y)\) is strictly concave with respect to y on \(Y^{0}_{k}\). The purpose of the refinement step is to divide the active intervals in E act(x 0). For the numerical technique, we introduce the definition of an ε ∪-active interval within a given tolerance. The definition of the ε ∪-active interval and the refinement step are modified from the ideas in [8].
Definition 1
Given ε ∪>0, \(Y^{0}_{k}\) is an ε ∪-active interval if \(Y^{0}_{k}\) is an active interval and

$$\min\bigl\{y^0_k-\eta^{L,0}_k,\ \eta^{U,0}_k-y^0_k\bigr\}\geq\varepsilon_\cup\bigl(\eta^{U,0}_k-\eta^{L,0}_k\bigr)\eqno(10)$$
holds. Moreover, the active point \(y^{0}_{k}\in Y^{0}_{k}\) is an ε ∪-active point if (10) holds.
Here, we specify the refinement step for the ε ∪-active intervals. The refinement step checks all the subintervals in E 0. Two cases arise when \(Y^{0}_{k}\) is an active interval. If \(Y^{0}_{k}\) is an ε ∪-active interval, the refinement step divides \(Y^{0}_{k}\) into two subintervals, \(Y^{0}_{k,1}\) and \(Y^{0}_{k,2}\), and generates a set \(\tilde{E}^{0}=\{Y^{0}_{1},\ldots ,Y^{0}_{k,1},Y^{0}_{k,2},Y^{0}_{k+1},\ldots,Y^{0}_{m_{0}}\}\). We then check the interval \(Y^{0}_{k+1}\) at the next iteration of the refinement step. Otherwise, by Definition 1, the active interval \(Y^{0}_{k}\) is not an ε ∪-active interval if either the length of \(Y^{0}_{k,1}\) or that of \(Y^{0}_{k,2}\) would be too small. In this case, we do not divide the active interval, since the active point is sufficiently close to one of the endpoints. In our algorithm, this definition helps us save computational time when solving the problem (NSIP). Moreover, by the property in [7], we have \(g^{0}_{k,i}(x,y)\leq g^{0}_{k}(x,y)\) on \(X\times Y^{0}_{k,i}\) and \(\alpha ^{0}_{k,i}\leq\alpha^{0}_{k}\) for i=1,2. Therefore, the new concave functions \(g^{0}_{k,i}(x,y)\) are closer to the constraint function g(x,y) on \(Y^{0}_{k}\). We summarize the refinement step in the following algorithm.
Algorithm 2
The refinement step for the ε ∪-active intervals
- Step 0:
-
Input \(E^{0}=\{Y^{0}_{1},Y^{0}_{2},\ldots,Y^{0}_{m_{0}}\}\), α 0, and \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{0})\), where m 0=|E 0|. Set k=1, \(\tilde{E}^{0}=E^{0}\), and \(\tilde{\alpha}^{0}=\alpha^{0}\).
- Step 1:
-
If \(Y^{0}_{k}=[\eta^{L,0}_{k},\eta^{U,0}_{k}]\) is an ε ∪-active interval and \(y^{0}_{k}\in Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{0})\) is the respective ε ∪-active point on \(Y^{0}_{k}\), go to Inner-Step 1 of the inner loop; otherwise, go to Step 2.
- Inner-Step 1::
-
Define \(Y^{0}_{k,1}=[\eta^{L,0}_{k},y^{0}_{k}]\) and \(Y^{0}_{k,2}=[y^{0}_{k},\eta^{U,0}_{k}]\). Find the corresponding convexification parameters on \(Y^{0}_{k,1}\) and \(Y^{0}_{k,2}\), respectively, \(\alpha^{0}_{k,1}\) and \(\alpha^{0}_{k,2}\), which satisfy
$$\alpha^0_{k,1}>\max\biggl(0,\max_{(x,y)\in X\times Y^0_{k,1}}\frac{\partial^2}{\partial y^2} g(x,y)\biggr)$$and
$$\alpha^0_{k,2}>\max\biggl(0,\max_{(x,y)\in X\times Y^0_{k,2}}\frac{\partial^2}{\partial y^2} g(x,y)\biggr).$$Moreover, define

$$g^0_{k,1}(x,y):=g(x,y)+\frac{\alpha^0_{k,1}}{2}\bigl(y-\eta^{L,0}_k\bigr)\bigl(y^0_k-y\bigr)\quad\mbox{and}\quad g^0_{k,2}(x,y):=g(x,y)+\frac{\alpha^0_{k,2}}{2}\bigl(y-y^0_k\bigr)\bigl(\eta^{U,0}_k-y\bigr)$$
as the concave function on \(Y^{0}_{k,1}\) and \(Y^{0}_{k,2}\), respectively.
- Inner-Step 2::
-
Let \(\tilde{E}^{0}\) be the new collection of subintervals obtained by replacing \(Y^{0}_{k}\) with \(Y^{0}_{k,1}\) and \(Y^{0}_{k,2}\), i.e., \(\tilde{E}^{0}=\{E^{0}\setminus\{Y^{0}_{k}\}\}\cup\{Y^{0}_{k,1}\}\cup\{Y^{0}_{k,2}\}\), and replace the constraint
$$g^0_k(x,y)\leq0, \quad\mbox{for\ }y\in Y^0_{k},$$by the two new constraints
$$g^0_{k,i}(x,y)\leq0, \quad\mbox{for\ all\ }y\in Y^0_{k,i},\ i=1,2.$$Furthermore, let \(\tilde{\alpha}^{0}\) be the corresponding convexification parameter on \(\tilde{E}^{0}\), where \(\tilde{\alpha}^{0}\) is the vector obtained by replacing the entry \(\alpha ^{0}_{k}\) of α 0 with the two new entries \(\alpha^{0}_{k,1}\) and \(\alpha^{0}_{k,2}\). Go to Step 2.
- Step 2:
-
Terminate the algorithm when k=m 0 and output \(\tilde {E}^{0}\) and \(\tilde{\alpha}^{0}\). Otherwise, let k=k+1, \(E^{0}=\tilde{E}^{0}\), and \(\alpha^{0}=\tilde {\alpha}^{0}\), and go to Step 1.
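The inner loop of Algorithm 2 amounts to splitting an ε ∪-active interval at its active point and recomputing the convexification parameters on the two pieces. A minimal sketch follows (function names are ours; the grid-based α estimate again stands in for the rigorous bound required by (6)):

```python
def refine_at(interval, y_act, g_yy, x_grid, margin=1e-6):
    # split Y_k = [lo, hi] at the active point y_act into Y_k1 and Y_k2,
    # returning the pieces together with parameters alpha_k1, alpha_k2 that
    # exceed max(0, max d^2 g / dy^2) on X x piece (grid-sampled here)
    lo, hi = interval
    pieces = [(lo, y_act), (y_act, hi)]
    alphas = []
    for a, b in pieces:
        y_grid = [a + (b - a) * i / 10 for i in range(11)]
        worst = max(g_yy(x, y) for x in x_grid for y in y_grid)
        alphas.append(max(0.0, worst) + margin)
    return pieces, alphas

# toy constraint g(x, y) = x * y^2, so d^2 g / dy^2 = 2x is independent of y
pieces, alphas = refine_at((0.0, 1.0), 0.4, lambda x, y: 2.0 * x,
                           [i / 10 for i in range(11)])
assert pieces == [(0.0, 0.4), (0.4, 1.0)]
assert all(a > 2.0 for a in alphas)
```

Since the second-derivative maximum is taken over a smaller set, the exact parameters of the pieces never exceed the parent's, in line with α^0_{k,i}≤α^0_k; in this toy case they coincide with the parent's value because ∂²g/∂y² does not depend on y.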
Referring to Theorem 1, we can conclude that the stationary point x 0 for the problem (NSIP αBB(E 0,α 0)) satisfies
where \(\lambda_{1}^{0},\ldots,\lambda_{m_{0}}^{0}\) are the corresponding Lagrange multipliers. Let \(y^{0}=(y_{1}^{0},\ldots,\allowbreak y_{m_{0}}^{0})\) and \(\lambda^{0}=(\lambda _{1}^{0},\ldots,\lambda_{m_{0}}^{0})\). Thus, (x 0,y 0,λ 0) satisfies (11)–(13). Clearly, (2) also holds at (x 0,y 0,λ 0) since \(\nabla_{x} g^{0}_{k}(x,y)=\nabla_{x} g(x,y)\) for all \((x,y)\in X\times Y^{0}_{k}\). However, (x 0,y 0,λ 0) may not satisfy (3) because

$$\lambda^0_k\,g(x^0,y^0_k)=\lambda^0_k\Bigl(g^0_k(x^0,y^0_k)-\frac{\alpha^0_k}{2}\bigl(y^0_k-\eta^{L,0}_k\bigr)\bigl(\eta^{U,0}_k-y^0_k\bigr)\Bigr)=-\lambda^0_k\,\frac{\alpha^0_k}{2}\bigl(y^0_k-\eta^{L,0}_k\bigr)\bigl(\eta^{U,0}_k-y^0_k\bigr)$$
for k=1,2,…,m 0. Therefore, when \(\lambda^{0}_{k}>0\), \(\lambda^{0}_{k}g(x^{0},y^{0}_{k})=0\) holds only in the case of either \(y^{0}_{k}-\eta^{L,0}_{k}=0\) or \(\eta ^{U,0}_{k}-y^{0}_{k}=0\). However, achieving this condition is difficult. Hence, relaxing the complementarity slackness condition for the numerical algorithm is crucial, and we make the following definition with a given tolerance ε.
Definition 2
The point \(x^{0}\in\mathcal{F}\) is called stationary with ε-complementarity for the problem (NSIP) if there exists \(y^{0}_{k}\) such that (11) and

$$-\lambda^0_k\cdot\varepsilon\leq\lambda^0_k\,g(x^0,y^0_k)\leq0,\quad k=1,2,\ldots,m_0,\eqno(14)$$
are fulfilled simultaneously.
We emphasize that even though \(g(x,y)\leq g^{0}_{k}(x,y)\) for all \((x,y)\in X\times Y^{0}_{k}\), the feasibility of a point x 0 for the problem (NSIP αBB(E 0,α 0)) does not imply \(x^{0}\in\mathcal{F}\). The reason is that the feasibility on the unconsidered subintervals of Y may not hold at x 0. That is, let Y c be an arbitrary unconsidered subinterval of Y; then Y c ∉E 0, and the concave function g c (x 0,y) defined as in (5) on Y c may not be less than or equal to zero for all y. To develop an algorithm that finds a stationary point with ε-complementarity of the problem (NSIP), we check the feasibility on the unconsidered subintervals and refine the ε ∪-active intervals simultaneously to ensure that both the feasibility and the ε-complementarity slackness condition (14) hold in the end. Moreover, for a given small number γ>0, we say that the point x 0 is γ-infeasible if there exists an index c such that g c (x 0,y)>γ for some y∈Y c . Such an index c is called an index of γ-infeasibility. Furthermore, we denote by c 0 the index of maximal γ-infeasibility at x 0 if

$$\max_{y\in Y_{c_0}}g_{c_0}(x^0,y)\geq\max_{y\in Y_c}g_c(x^0,y)$$
for all indices of γ-infeasibility c. Therefore, we conclude our idea in the following algorithm.
Algorithm 3
The relaxed cutting plane method with convexification
- Step 0::
-
Split Y into N subintervals Y 1,Y 2,…,Y N such that \(Y=\bigcup_{j=1}^{N} Y_{j}\) and \(|Y_{j}|=\frac{1}{N}\). Let the set E:={Y 1,Y 2,…,Y N } be the collection of the subintervals. Determine the convexification parameter α with (6) on E, and define the approximate concave functions g j on Y j with (5). Let \(\bar{\alpha}:=\max_{i}\{\alpha_{i}\}\).
- Step 1::
-
Set ν=1. Choose arbitrary subintervals, \(Y_{1}^{1},Y_{2}^{1},\ldots,Y_{m_{1}}^{1}\), in E with m 1<N. Let \(E^{1}=\{Y_{1}^{1},Y_{2}^{1},\ldots,Y_{m_{1}}^{1}\}\) and let \(\alpha^{1}=\{\alpha ^{1}_{1},\ldots,\alpha^{1}_{m_{1}}\}\) be the corresponding convexification parameter on E 1. Choose small real numbers ε,γ>0, and let \(\varepsilon _{\cup}:=2\varepsilon\cdot\min\{1,1/\bar{\alpha}\}\). Let \(E^{c}_{1}=E\setminus E^{1}\) be the collection of remaining subintervals.
- Step 2::
-
Solve the problem (NSIP αBB(E ν,α ν)) using Algorithm 1 to obtain the stationary point x ν and the corresponding active point set \(Y_{\mathrm{act}}^{\alpha \mathrm{BB}}(x^{\nu})\).
- Step 3::
-
Evaluate the values of
$$\max_{y\in Y_j}g_j(x^\nu,y),$$for all \(Y_{j} \in E^{c}_{\nu}\). Denote by c ν the index of maximal γ-infeasibility, and go to Step 4 if c ν exists. If c ν does not exist and, for every k, the inequalities \(-\lambda^{\nu}_{k}\cdot\varepsilon\leq\lambda^{\nu}_{k}g(x^{\nu},y^{\nu}_{k})\leq0\) hold, then STOP and output x ν as an approximate stationary point of the problem (NSIP). Otherwise, go to Step 5.
- Step 4::
-
If there exists at least one ε ∪-active point \(y^{\nu}_{k}\in Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{\nu})\), apply Algorithm 2 to E ν, α ν and \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{\nu})\). Define \(E^{\nu+1}=\tilde{E}^{\nu}\cup\{Y_{c_{\nu}}\}\), \(\alpha^{\nu +1}=(\tilde{\alpha}^{\nu},\alpha_{c_{\nu}})\), and \(E^{c}_{\nu+1}=E^{c}_{\nu}\setminus\{Y_{c_{\nu}}\}\), where \(\tilde{E}^{\nu}\) and \(\tilde{\alpha}^{\nu}\) are generated by Algorithm 2. Otherwise, in the case of no ε ∪-active point, define \(E^{\nu+1}=E^{\nu}\cup\{Y_{c_{\nu}}\}\), \(\alpha^{\nu +1}=(\alpha^{\nu},\alpha_{c_{\nu}})\), and \(E^{c}_{\nu+1}=E^{c}_{\nu }\setminus\{Y_{c_{\nu}}\}\). Set m ν+1=|E ν+1| and ν=ν+1 and go to Step 2.
- Step 5::
-
Apply Algorithm 2 to E ν, α ν, and \(Y_{\mathrm{act}}^{\alpha\mathrm{BB}}(x^{\nu})\). Define \(E^{\nu+1}=\tilde{E}^{\nu}\), \(\alpha^{\nu+1}=\tilde{\alpha }^{\nu}\), and \(E^{c}_{\nu+1}=E^{c}_{\nu}\). Set m ν+1=|E ν+1|. Let ν=ν+1, and go to Step 2.
Note here that E act(x ν) and \(Y_{\mathrm{act}}^{\alpha \mathrm{BB}}(x^{\nu})\) are defined analogously by replacing the index 0 with ν in (8) and (9). Clearly, \(E^{c}_{\nu}\subset E\) for every ν. However, E ν is no longer a subset of E once Algorithm 2 has been applied in a previous iteration. More remarks on Algorithm 3 are listed as follows.
Remark 1
-
(i)
Notice that in Step 3 of Algorithm 3, we check the γ-feasibility on the subintervals in \(E^{c}_{\nu}\) and also determine whether x ν is a stationary point with ε-complementarity of the problem (NSIP). If there exists a γ-infeasible subinterval \(Y_{c_{\nu}}\), we add a new concave constraint \(g_{c_{\nu}}(x,y)\) to the problem (NSIP αBB(E ν,α ν)) in Step 4. Consequently, Algorithm 3 outputs a stationary point with ε-complementarity and γ-feasibility.
-
(ii)
The γ-feasibility check on \(E^{c}_{\nu}\) is an easy task since the functions g j (x,⋅) are concave on Y j for \(Y_{j}\in E^{c}_{\nu}\). Therefore, various numerical methods can find the value of \(\max_{y\in Y_j} g_j(x^\nu,y)\) quickly.
-
(iii)
In Steps 4 and 5, we refine the ε ∪-active intervals and substitute two new approximate concave constraints for the original concave constraint on each ε ∪-active interval. The constraint functions on the rest of the subintervals in E ν are maintained. Of course, an ε ∪-active interval may not exist. If neither an ε ∪-active interval nor c ν exists, then, according to Lemma 2 described in the next section, Algorithm 3 satisfies the stopping criteria.
-
(iv)
To avoid confusion, we rearrange the indices of the subintervals in E ν+1 before going to Step 2. For the sake of convenience, suppose that \(Y^{\nu}_{1}\) is the unique ε ∪-active interval and that the index of maximal γ-infeasibility c ν exists at the ν-th iterate. Then \(E^{\nu+1}=\tilde{E}^{\nu}\cup\{Y_{c_{\nu}}\}\) is defined in Step 4, where \(\tilde{E}^{\nu}=\{Y^{\nu}_{1,1},Y^{\nu}_{1,2},Y^{\nu}_{2},\ldots ,Y^{\nu}_{m_{\nu}}\}\). Therefore, \(E^{\nu+1}=\{Y^{\nu}_{1,1},Y^{\nu}_{1,2},Y^{\nu}_{2},\ldots ,Y^{\nu}_{m_{\nu}},Y_{c_{\nu}}\}\). We then rearrange the indices of the subintervals in E ν+1 as
$$E^{\nu+1}=\{Y^{\nu+1}_1,Y^{\nu+1}_2,\ldots,Y^{\nu+1}_{m_{\nu+1}}\},$$by letting \(Y^{\nu+1}_{1}=Y^{\nu}_{1,1}\), \(Y^{\nu+1}_{2}=Y^{\nu}_{1,2},\ldots,Y^{\nu+1}_{m_{\nu+1}}=Y_{c_{\nu}}\). The indices of α ν+1 are rearranged similarly. Note that \(Y^{\nu+1}_{k}\) may not equal \(Y^{\nu}_{k}\) after the rearrangement.
Observe that we have two stopping criteria in Algorithm 3. Generally, we anticipate that the feasibility on \(E^{c}_{\nu}\) will hold after a few iterations. In other words, the number of the remaining subintervals will not be zero when Algorithm 3 terminates. We verify this result by applying some numerical examples in Sect. 5.
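As noted in Remark 1(ii), the γ-feasibility check on \(E^{c}_{\nu}\) only requires maximizing concave functions over subintervals. Selecting the index of maximal γ-infeasibility in Step 3 can then be sketched as follows (the function names and the stubbed lower-level maxima are ours, for illustration only):

```python
def max_infeasibility_index(x, unconsidered, g_max, gamma):
    # return the index c of maximal gamma-infeasibility at x, i.e. the
    # unconsidered subinterval whose lower-level maximum exceeds gamma by
    # the most, or None when x is gamma-feasible on all of them
    worst_c, worst_val = None, gamma
    for c, Y in enumerate(unconsidered):
        val = g_max(x, Y)        # max_y g_c(x, y) on Y_c: cheap since g_c is concave
        if val > worst_val:
            worst_c, worst_val = c, val
    return worst_c

# stub lower-level maxima for three unconsidered subintervals
vals = {(0.0, 0.25): 0.05, (0.25, 0.5): 0.7, (0.5, 0.75): 0.3}
g_max = lambda x, Y: vals[Y]
assert max_infeasibility_index(None, list(vals), g_max, gamma=0.1) == 1
assert max_infeasibility_index(None, list(vals), g_max, gamma=1.0) is None
```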
4 Convergence properties for algorithm
We demonstrate the finite termination and the error bound properties of Algorithm 3 in this section. Let L ν be defined as

$$L_\nu:=\max\bigl\{\|Y^\nu_k\|\;\big|\;Y^\nu_k\ \mbox{is\ an\ }\varepsilon_\cup\mbox{-active\ interval\ at\ }x^\nu\bigr\},\eqno(15)$$
where \(\|Y^{\nu}_{k}\|:=\eta^{U,\nu}_{k}-\eta^{L,\nu}_{k}\) is the length of \(Y^{\nu}_{k}\). In the following lemma, we discuss the relationship between ∥x ν+1−x ν∥ and the largest length of the ε ∪-active intervals, L ν . This lemma will help us analyze the convergence property of Algorithm 3. We apply the mean value theorem: there is a point \(\bar{x}^{\nu}\) in the open segment connecting x ν and x ν+1 such that

$$g(x^{\nu+1},y)-g(x^\nu,y)=\nabla_x g(\bar{x}^\nu,y)^\top\bigl(x^{\nu+1}-x^\nu\bigr)\eqno(16)$$
for all y∈Y. Additionally, we make the following assumption.
Assumption 1
There exists an index \(\bar{\nu}\) such that for each ε ∪-active point \(y^{\nu}_{k}\in Y_{\mathrm{act}}^{\alpha\mathrm {BB}}(x^{\nu})\), there is a small positive number δ that satisfies the following assumptions for all \(\nu\geq\bar{\nu}\):
-
(a)
\(\|\nabla_{x} g(\bar{x}^{\nu},y)\|\geq\delta\) for all y, where \(\bar{x}^{\nu}\) is the point in (16), and
-
(b)
\(|\nabla_{x} g(\bar{x}^{\nu},y^{\nu}_{k})^{\top}(x^{\nu +1}-x^{\nu})|\geq\delta\|x^{\nu+1}-x^{\nu}\|\).
We note that Assumption 1 can be checked in each iteration of the algorithm. Clearly, Assumption 1(a) can be achieved easily. Assumption 1(b) states that the angle between \(\nabla_{x}g(\bar{x}^{\nu},y^{\nu}_{k})\) and x ν+1−x ν is not too close to orthogonal. It is known that \(\nabla_{x} g(\bar{x}^{\nu},y^{\nu}_{k})\nabla_{x} g(\bar {x}^{\nu},y^{\nu}_{k})^{\top}\) is a positive semi-definite matrix. Therefore, if the smallest eigenvalue of \(\nabla_{x} g(\bar{x}^{\nu},y^{\nu}_{k})\nabla_{x} g(\bar{x}^{\nu},y^{\nu}_{k})^{\top}\) is greater than δ 2, then Assumption 1(b) naturally holds. Next, we present the following lemma, which is important in our convergence proof.
Lemma 1
Assume that, for each \(\nu\geq\bar{\nu}\), there is at least one ε ∪-active interval and that at least one of the two subintervals generated from an ε ∪-active interval is an active interval in the next iteration. Suppose that Assumption 1 holds at the ε ∪-active points. Then, for each \(\nu\geq\bar{\nu}\), there exists a bounded real number b ν such that

$$\|x^{\nu+1}-x^\nu\|\leq b_\nu L_\nu^2.$$
Proof
Let \(Y^{\nu}_{k}\) be the ε ∪-active interval that satisfies the assumption of this lemma and \(y^{\nu}_{k}\in Y_{\mathrm{act}}^{\alpha \mathrm{BB}}(x^{\nu})\) be the ε ∪-active point. Hence, we have \(g^{\nu}_{k}(x^{\nu},y^{\nu}_{k})=0\) and

$$\frac{\partial}{\partial y}g^\nu_k(x^\nu,y^\nu_k)=0.\eqno(17)$$
As \(Y^{\nu}_{k}\) is the ε ∪-active interval, \(Y^{\nu}_{k}\) will be divided into \(Y^{\nu}_{k,1}=[\eta^{L,\nu}_{k},y^{\nu}_{k}]\) and \(Y^{\nu}_{k,2}=[y^{\nu}_{k},\eta^{U,\nu}_{k}]\) in the next iteration. Without loss of generality, we assume that the subinterval \(Y^{\nu}_{k,1}=[\eta^{L,\nu}_{k},y^{\nu}_{k}]\) is the active interval at the next iteration, ν+1. Therefore, \(g^{\nu}_{k,1}(x^{\nu+1},y^{\nu+1}_{k,1})=0\), where \(y^{\nu +1}_{k,1}\) is the maximizer of the following optimization problem:

$$\max_{y\in Y^\nu_{k,1}} g^\nu_{k,1}(x^{\nu+1},y).$$
The function \(g^{\nu}_{k,1}(x,y)\) is defined by

$$g^\nu_{k,1}(x,y):=g(x,y)+\frac{\alpha^\nu_{k,1}}{2}\bigl(y-\eta^{L,\nu}_k\bigr)\bigl(y^\nu_k-y\bigr).$$
Clearly, according to (17),
Thus, we have
since \(\alpha^{\nu}_{k}\leq\bar{\alpha}\) for all k. We know that both \(g^{\nu}_{k,1}(x^{\nu+1},y^{\nu+1}_{k,1})\) and \(\frac{\partial}{\partial y}g^{\nu}_{k,1}(x^{\nu+1},y^{\nu +1}_{k,1})\) are equal to zero; therefore,
where \(y^{\nu+1}_{k,1}\leq\bar{y}^{\nu+1}_{k,1}\leq y^{\nu}_{k}\). Note that the value of \(\frac{\partial^{2}}{\partial y^{2}}g^{\nu}_{k,1}(x^{\nu+1},\bar{y}^{\nu +1}_{k,1})\) is bounded on \(X\times Y^{\nu}_{k,1}\) for all ν. Hence, there is a positive number b ∗ such that \(\frac{\partial ^{2}}{\partial y^{2}} g^{\nu}_{k,1}(x,y)\geq-b^{*}\) for all ν. Equation (19) implies that \(g(x^{\nu+1},y^{\nu}_{k})-g(x^{\nu },y^{\nu}_{k})>-b^{*}(\eta^{U,\nu}_{k}-\eta^{L,\nu}_{k})^{2}\) since \(\eta ^{U,\nu}_{k}-\eta^{L,\nu}_{k}>y^{\nu}_{k}-y^{\nu+1}_{k,1}\). Combining (18) and (19) and applying the mean value theorem, we obtain
where \(\bar{x}^{\nu}\) is a point on the line segment between x ν and x ν+1. Consequently, from (20) and Assumption 1(b), we have
As the inequality holds for the ε ∪-active subinterval, this implies
Hence, \(\|x^{\nu+1}-x^{\nu}\|\leq b_{\nu}L_{\nu}^{2}\) when setting \(b_{\nu}=\max(\frac{\bar{\alpha}}{8},\,b^{*})/\delta\). Here, b ν is bounded by a constant for all \(\nu\geq\bar{\nu}\). □
Lemma 1 shows that the value of ∥d ν∥:=∥x ν+1−x ν∥ is controlled by the length of the ε ∪-active intervals. We next present a lemma modified directly from [8].
Lemma 2
If no γ-infeasible subinterval and no ε ∪-active interval exist, Algorithm 3 terminates.
Proof
According to the result of Lemma 4.5 in [8], if there is no ε ∪-active interval, then Algorithm 3 generates a stationary point with ε-complementarity. Combined with the absence of γ-infeasible subintervals, Algorithm 3 terminates because both stopping criteria are satisfied. □
The next lemma shows that the maximum length of the ε ∪-active intervals tends to 0.
Lemma 3
Let {L ν } be the sequence defined by (15), and suppose that Algorithm 3 does not terminate. Then,
Proof
In Algorithm 3, we divide the interval Y into N subintervals Y 1,…,Y N in Step 0, and choose m 1 subintervals to start our algorithm. Let I be the index set of iterations of Algorithm 3, and let I γ ={ν 1,ν 2,…,ν l}⊂I be the collection of iteration indices at which γ-infeasibility occurs, where l∈ℕ. That is, for i=1,2,…,l, \(c_{\nu^{i}}\) exists at the ν i-th iteration. Since the number of unconsidered subintervals is bounded by N−m 1, we have 0≤l≤N−m 1. Moreover, by Lemma 2, for each ν∈I∖I γ at least two new subintervals, \(Y^{\nu}_{k,1}\) and \(Y^{\nu}_{k,2}\), are generated from \(Y^{\nu}_{k}\) for some k. Therefore, if \(Y^{\nu}_{k}\) is an ε ∪-active interval, then the result
follows from the definition of the ε ∪-active interval. Note that every subinterval in E ν is divided at most once in each iteration ν, and the length of the ε ∪-active intervals is monotonically decreasing. Hence, for any integer p∈ℕ, there exists a sufficiently large iteration index \(\tilde{\nu }\in I\setminus I_{\gamma}\) such that L ν is bounded by \((1-\varepsilon _{\cup})^{p}\cdot\frac{1}{N}\) for \(\nu>\tilde{\nu}\), because Algorithm 3 does not terminate. Thus, letting p→∞, we obtain lim ν→∞ L ν =0. □
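The shrinking-length argument in the proof of Lemma 3 can be illustrated numerically. In the sketch below (our own, with assumed values for N and ε ∪), each division of an ε ∪-active interval shrinks its length by at least the factor (1−ε ∪), so the worst-case maximum length after p rounds is \((1-\varepsilon_{\cup})^{p}/N\), which tends to 0:

```python
# Our own numeric illustration of the length bound in the proof above,
# with assumed values N = 10 and eps_union = 0.2: if every division of an
# eps_union-active interval shrinks its length by at least the factor
# (1 - eps_union), the worst-case maximum length after p rounds is
# (1 - eps_union)**p / N, which tends to 0 as p grows.
N, eps_union = 10, 0.2
L = 1.0 / N                       # initial subinterval length 1/N
for p in range(1, 51):
    L *= (1.0 - eps_union)        # worst-case shrink per division
    assert L <= (1.0 - eps_union) ** p / N + 1e-15
assert L < 1e-4                   # far below the initial length 0.1
```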
The result in Lemma 3 directly implies that ∥d ν∥ approaches 0 as ν goes to infinity. The next theorem shows that Algorithm 3 converges to a stationary point of the problem (NSIP) with ε-complementarity and γ-feasibility after finite iterations.
Theorem 2
Suppose that for each iteration ν, the corresponding Lagrange multipliers are uniformly bounded by a constant M. Moreover, the assumptions in Lemma 1 hold. Let {x ν} be a sequence generated by Algorithm 3. Algorithm 3 will then generate a stationary point of the problem (NSIP) with ε-complementarity and γ-feasibility after finite iterations.
Proof
The KKT condition for the problem (\(\mathrm{NSIP}_{\alpha\mathrm {BB}}(E^{\nu_{s}},\alpha^{\nu_{s}})\)) can be reformulated as
where \(\mu^{\nu_{s}}(y)\) is a discrete measure defined on Y by
As X and Y are compact sets, there exists a subsequence \(\{x^{\nu _{s}}\}\) converging to the accumulation point x ∗, and \(\{\mu^{\nu_{s}}(y)\}\) converges to μ ∗(y) weakly, as s→∞. Moreover, by continuity of ∇f(x) and ∇ x g(x,y), (21) is satisfied for all s. These facts imply that ∥∇f(x ∗)+∫ Y ∇ x g(x ∗,y) dμ ∗∥=0 directly.
Next, we show that x ∗ is a stationary point with ε-complementarity. We claim that for \(k=1,2,\ldots,m_{\nu_{s}}\), \(-\lambda^{\nu _{s}}_{k}\cdot\varepsilon \leq\lambda^{\nu_{s}}_{k} g(x^{\nu_{s}},y^{\nu_{s}}_{k})\leq0\) when s is sufficiently large. The case \(\lambda_{k}^{\nu_{s}}=0\) is trivial; therefore, we focus our attention on \(\lambda_{k}^{\nu_{s}}>0\). Even though the number of subintervals \(m_{\nu_{s}}\) increases monotonically, in view of Carathéodory’s theorem the number of positive Lagrange multipliers is at most n+1, where n is the dimension of x. Thus, without loss of generality, let the Lagrange multipliers \(\lambda^{\nu_{s}}_{1}, \lambda^{\nu_{s}}_{2},\ldots,\lambda ^{\nu_{s}}_{m'}\) be positive, where m′≤n+1. Hence, the corresponding subintervals \(Y^{\nu_{s}}_{k}\), with respect to \(\lambda^{\nu_{s}}_{k}\) for k=1,2,…,m′, are active intervals. We now distinguish two cases. First, if the subinterval is an ε ∪-active interval, then
where \(\bar{\alpha}\) is a constant defined in Step 0 of Algorithm 3. Since \(L_{\nu_{s}}\rightarrow0\), we can find an integer s 2 such that the inequality \(\frac{1}{8}\bar{\alpha} L^{2}_{\nu_{s}}\leq\varepsilon\) holds for all s>s 2. Otherwise, if \(Y^{\nu_{s}}_{k}\) is not an ε ∪-active interval, then from the definition of \(\varepsilon_{\cup}=2\varepsilon\cdot\min\{1,1/\bar {\alpha}\}\) we obtain
where 1/N is the length of the initial discretized subinterval in Algorithm 3. Moreover, \(g(x^{\nu_{s}},y_{k}^{\nu_{s}})=-(\alpha^{\nu_{s}}_{k}/2)(y^{\nu _{s}}_{k}-\eta_{k}^{L,\nu _{s}})(\eta_{k}^{U,\nu_{s}}-y^{\nu_{s}}_{k})\). Consequently, for all s>s 2, we can obtain \(-\lambda^{\nu_{s}}_{k}\cdot \varepsilon \leq\lambda^{\nu_{s}}_{k} g(x^{\nu_{s}},y^{\nu_{s}}_{k})\leq0\).
Finally, we show that g(x ∗,y)≤0 holds for all y∈Y. If \(E^{c}_{\nu_{s}}\) is empty for some s, the claim follows immediately; thus, suppose that the collection of remaining subintervals \(E^{c}_{\nu_{s}}\) is nonempty for s=1,2,…. Let Y r be a subinterval in \(E^{c}_{\nu_{s}}\) for all s. We denote
where g r (x,y) is the approximate concave function with (5) on Y r . Assume, by contradiction, that \(g_{r}(x^{*},y^{*}_{r})\geq\kappa\) for some κ>0. Since g r is continuously differentiable with respect to the variable x, there is an index s κ ∈ℕ such that \(g_{r}(x^{\nu_{s}},y_{r}^{*})\geq\kappa/2\) for all s≥s κ . It follows that
Since \(Y_{r}\in E^{c}_{\nu_{s}}\) for all s∈ℕ, there must exist an index of maximal γ-infeasibility \(c_{\nu_{s}}\) and the corresponding subinterval \(Y_{c_{\nu_{s}}}\in E^{c}_{\nu_{s}}\) such that
where \(y_{c_{\nu_{s}}}\in\arg\max_{y\in Y_{c_{\nu_{s}}}} g_{c_{\nu _{s}}}(x^{\nu_{s}},y)\) and \(g_{c_{\nu_{s}}}(x,y)\) is the concave function with (5) on \(Y_{c_{\nu_{s}}}\). Hence, \(g_{c_{\nu_{s}}}(x,y)\) becomes a new member of the constraints in the problem (\(\mathrm{NSIP}_{\alpha\mathrm {BB}}(E^{\nu_{s}+1},\alpha^{\nu_{s}+1})\)). This implies that
where \(x^{\nu_{s}+1}\) is the stationary point of the problem (\(\mathrm{NSIP}_{\alpha\mathrm{BB}}(E^{\nu_{s}+1},\alpha^{\nu_{s}+1})\)). Therefore, we have
Thus, \(\|\nabla_{x} g_{c_{\nu_{s}}}(\bar{x}^{\nu_{s}},y_{c_{\nu_{s}}})\|\|d^{\nu_{s}}\|\geq\kappa/2\) for some point \(\bar{x}^{\nu_{s}}\) on the segment between \(x^{\nu_{s}}\) and \(x^{\nu_{s}+1}\). From Lemma 1, we know that \(\|d^{\nu_{s}}\|\rightarrow0\), which contradicts this inequality. Consequently, we obtain
for all y∈Y r . Therefore, given any γ>0, there is s 3∈ℕ such that \(g_{r}(x^{\nu_{s}},y)\leq\gamma\) for y∈Y r and for all \(Y_{r}\in E_{\nu_{s}}^{c}\) when s>s 3. Let \(\bar{s}=\max\{s_{1},s_{2},s_{3}\}\). Then \(x^{\nu_{\bar{s}}}\) is a stationary point with ε-complementarity and γ-feasibility, and the proof is complete. □
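The ε-complementarity condition established above can be checked mechanically at a candidate solution; the following is a small sketch of such a check (the helper name and sample numbers are ours, not from the paper):

```python
# Our own sketch of the epsilon-complementarity check used as a stopping
# criterion: each multiplier/constraint pair must satisfy
#   -lambda_k * eps <= lambda_k * g_k <= 0.
def eps_complementary(lambdas, g_values, eps):
    return all(-lam * eps <= lam * g <= 0.0
               for lam, g in zip(lambdas, g_values))

# Example: g values are tiny and nonpositive wherever lambda_k > 0.
assert eps_complementary([0.5, 0.0, 1.2], [-1e-5, -0.3, -5e-5], eps=1e-4)
# A violation: lambda_k * g_k is too negative relative to eps.
assert not eps_complementary([0.5], [-1e-3], eps=1e-4)
```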
For the numerical implementation of Algorithm 3, we choose ε ∪ to be a small positive number. Since the complementarity slackness condition (3) is difficult to satisfy exactly, we require the numerical solution to satisfy the ε-complementarity slackness condition instead. In Theorem 2, \(-\lambda^{\nu}_{k}\cdot\varepsilon\leq \lambda^{\nu}_{k}g(x^{\nu},y^{\nu}_{k})\leq0\) holds if the active point \(y^{\nu}_{k}\) is not an ε ∪-active point. Hence, the notion of the ε ∪-active interval helps us avoid dividing active intervals unnecessarily, and it plays an important role in the numerical experiments.
We have proved that Algorithm 3 terminates in a finite number of iterations for any given tolerances ε>0 and γ>0. Let the point \(x^{\nu^{*}(\varepsilon,\gamma)}\) be generated by Algorithm 3 when the algorithm terminates with ε-complementarity and γ-feasibility. In the following theorem, we show that any accumulation point of \(x^{\nu^{*}(\varepsilon,\gamma)}\) as ε,γ→0 is a stationary point of the problem (NSIP).
Theorem 3
Let \(x^{\nu^{*}(\varepsilon,\gamma)}\) be the stationary point with ε-complementarity and γ-feasibility of the problem (NSIP) generated by Algorithm 3. Then,
-
(a)
For a fixed tolerance ε>0, every accumulation point \(x^{\nu^{*}(\varepsilon)}\) of \(x^{\nu^{*}(\varepsilon,\gamma)}\) as γ→0 is a stationary point with ε-complementarity of the problem (NSIP).
-
(b)
Every accumulation point \(x^{\nu^{*}}\) of \(x^{\nu ^{*}(\varepsilon )}\) as ε→0 is a stationary point of the problem (NSIP).
Proof
We first show (a). Since X is a compact subset of ℝn, the sequence \(x^{\nu^{*}(\varepsilon,\gamma)}\) has an accumulation point \(x^{\nu^{*}(\varepsilon)}\in X\) as γ→0. Furthermore, according to the proof of Theorem 2, the result in (a) follows immediately. Next, we show (b). Following the result in (a), we obtain \(x^{\nu ^{*}(\varepsilon)}\in\mathcal{F}\). As \(\mathcal{F}\) is a compact subset of ℝn, there exists an accumulation point \(x^{\nu^{*}}\in\mathcal{F}\) of \(x^{\nu^{*}(\varepsilon)}\) as ε→0. Let \(y^{\nu^{*}(\varepsilon)}_{1},y^{\nu^{*}(\varepsilon)}_{2},\ldots ,y^{\nu^{*}(\varepsilon )}_{m''}\), m″≤n+1, be the points satisfying the KKT conditions (11)–(13) with positive Lagrange multipliers \(\lambda^{\nu^{*}(\varepsilon)}_{1},\lambda^{\nu^{*}(\varepsilon )}_{2},\ldots,\lambda ^{\nu^{*}(\varepsilon)}_{m''}\) at \(x^{\nu^{*}(\varepsilon)}\). Thus, we have
for all ε>0. Hence, (24) converges to (11) when ε→0. Moreover, define
Therefore, \(0\leq\delta^{\nu^{*}(\varepsilon)}\leq\varepsilon\) holds by the stopping criterion of Algorithm 3, and \(\lim_{\varepsilon\rightarrow0}\delta^{\nu^{*}(\varepsilon)}=0\). Letting ε→0 and using the continuity of the function g, we obtain the result in (b). □
Remark 2
Since ε ∪→0 as ε→0, Lemma 3 may not hold. This fact leads to the failure of the convergence proof in Theorem 2, so we cannot show the result (b) in Theorem 3 by letting ε→0 and applying Theorem 2.
5 Numerical experience
In this section, we apply Algorithm 3 to some examples. We implement Algorithm 3 in Matlab 7.0 and use the Matlab routine fmincon, with default tolerances, as the black-box NLP solver in Step 2 of Algorithm 1. In particular, the tolerance for constraint violation is 1e−6. All the experiments were run on a personal computer with an AMD Phenom II 3.0 GHz CPU and 2 GB RAM.
Example 1
Among all control design techniques, the Proportional-Integral-Derivative (PID) controller is the most widely used. The PID controller design problem is formulated as an SIP problem in [9]. More recently, the authors in [3] applied the semi-infinite programming technique to solve the optimal worst-case H 2 design of the PID controller for uncertain nonminimum-phase plants. The first example was taken from [9] and was implemented in [13, 25]. The PID controller design problem minimizes the cost function given by
subject to the constraint
where
with Y=[10−6,30], \(j=\sqrt{-1}\); ℑ(a) and ℜ(a) denote the imaginary and real parts of the complex number a, respectively. To start our algorithm, we have to determine the value of α first. In general, estimating α is not easy. Here, we evaluate the explicit form of the second derivative of the constraint function and then set α=3.4763. Since the calculation of α is time-consuming, we let \(\alpha^{\nu}_{k}=3.4763\) be constant for all k and ν. We initialize our algorithm with the tolerances ε=1e−4 and γ=1e−6. For comparison, we implement different values of m 1, where m 1 denotes the number of subintervals chosen in the initial step (Step 0) of Algorithm 3. We create a modified version of Algorithm 3, denoted Algorithm 3.a, by setting m 1=N in Step 0; that is, we select all the discretized subintervals in the initial step and hence need not check γ-feasibility in each iteration. We also use Algorithm 1 to solve the subproblems generated by Algorithm 3.a. Moreover, the Matlab subroutine for solving SIP problems, fseminf, is implemented for comparison with our results. The following table lists the numerical results for Example 1.
In Table 1, opt-val denotes the computed optimal value, cpu-t the computation time, #iter the number of iterations taken by the two algorithms, and #re-int the number of remaining subintervals in Algorithm 3. Moreover, maxg denotes the optimal value of the constraint function g at the terminal solution over Y. In Table 1, the number of infeasible subintervals is small, and we add five infeasible subintervals in both cases. Since the number of constraints in Algorithm 3 is smaller than that in Algorithm 3.a, Algorithm 3 saves at least half of the computation time. Although Algorithm 3 takes more iterations than Algorithm 3.a, its total CPU time is smaller. In the case of N=100, the average CPU time per iteration of Algorithm 3, 0.1735 s, is much less than that of Algorithm 3.a, 0.5538 s. Therefore, selecting some of the subintervals instead of involving all of them makes the computation more efficient. Figure 1 also shows that the subintervals we refine concentrate in the neighborhood of the active point. Furthermore, the terminal solution of Algorithm 3 is feasible for this example: since g(x,y)≤g j (x,y) and \(g_{j}(x,y)=g(x,y)+\frac{\alpha _{j}}{2}(y-\eta_{j}^{L})(\eta_{j}^{U}-y)\leq\gamma\) for all y∈Y j and all \(Y_{j}\in E^{c}_{\nu}\) at each iteration ν, the terminal solution generated by Algorithm 3 is feasible up to the tolerance γ.
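The feasibility argument at the end of the preceding paragraph rests on the overestimation property of the αBB relaxation: on each subinterval, g j lies above g and coincides with it at the endpoints. A minimal sketch (our own toy constraint, not the PID one) checks this numerically:

```python
# Our own sketch of the overestimation property used above: on a
# subinterval [eta_L, eta_U], the alphaBB overestimator
#   g_j(x, y) = g(x, y) + (alpha/2) * (y - eta_L) * (eta_U - y)
# satisfies g <= g_j on the subinterval, with equality at the endpoints,
# so g_j <= gamma there implies g <= gamma.
def overestimator(g, alpha, eta_L, eta_U):
    return lambda x, y: g(x, y) + 0.5 * alpha * (y - eta_L) * (eta_U - y)

g = lambda x, y: x * y - y * y          # toy constraint, NOT from the paper
gj = overestimator(g, alpha=2.0, eta_L=0.2, eta_U=0.6)

for k in range(101):
    y = 0.2 + 0.4 * k / 100
    assert gj(1.0, y) >= g(1.0, y) - 1e-12   # overestimation on Y_j

assert abs(gj(1.0, 0.2) - g(1.0, 0.2)) < 1e-12   # equality at eta_L
assert abs(gj(1.0, 0.6) - g(1.0, 0.6)) < 1e-12   # equality at eta_U
```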
Example 2
Let Y=Y P∪Y S be the union of the disjoint closed design frequency intervals of a filter, where Y P=[0,y p] and Y S=[y s,π] represent the passband and stopband, respectively, and 0<y p<y s<π are given frequencies. A desired prescription for the frequency response of the filter is given by
where τ 0 is the given constant group delay and \(j=\sqrt{-1}\). The Finite Impulse Response (FIR) filter design problem considers the best approximation of a given magnitude response by that of an FIR filter. The most recent approach consists of solving the complex Chebyshev approximation problem [10, 17]
where x:=(x 1,…,x n )⊤∈ℝn is the vector containing the coefficients of the unit impulse response, and
Problem (26) is equivalent to the nonlinear SIP problem
In [16, 17], the authors suggest some new algorithms for the solution of problem (26) in the case that problem (26) is a convex SIP problem. More general formulations of the design approximation problems in the frequency domain are presented in [10]. The authors in [10] proposed several non-convex SIP models for FIR filter design and solved several specific design problems. In Example 2, we implement one of the non-convex SIP models in [10].
Let n=91, Y P=[0,π/5], and Y S=[π/4,π]. The desired group delay in (25) is τ 0=120. This example approximates |D(y)|=1 by the composed magnitude response M Q (y)M(x,y) of both filters on Y P and bounds M Q (y)M(x,y) by the constant δ S :=3.16227e−4, where the functions are given by
where ω g :=π/7.02832 is a constant. From [10], we know that M Q (y) is positive for all y∈Y. Hence, the error function |1−M Q (y)M(x,y)| on Y P can be rewritten as
and the other error function on Y S is
Thus, this problem can be interpreted as the problem (26) with the weight function W(y)=M Q (y) and the desired magnitude response 1/M Q (y) in Y P. Due to the result in [10], the problem has a solution that can be computed by the following non-convex SIP problem:
where we restrict X=[−0.3,0.3]n+1. To implement the (SIP FIR ) problem, we discretize Y P into 20 subintervals and Y S into 100 subintervals. The parameters are set as ε=1e−4 and γ=1e−6. After the calculation, we set α=1.901 to start the two algorithms. Table 2 shows our results, and Fig. 2 represents the magnitude M Q (y)M(x,y) for y∈[0,π].
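Two computational ingredients of the setup above can be sketched in a few lines: the uniform discretization of Y P and Y S, and a grid estimate of an αBB parameter α. The sketch below uses our own helper names and a toy constraint (the paper instead evaluates the explicit second derivative of the actual constraint, obtaining α=1.901 here and α=3.4763 in Example 1):

```python
import math

# Our own sketch (names and the toy constraint are ours, not the paper's):
# (1) uniform discretization of the passband Y_P = [0, pi/5] into 20
#     subintervals and the stopband Y_S = [pi/4, pi] into 100;
# (2) a rough grid estimate of an alphaBB parameter as the maximum of
#     d^2 g / d y^2 over a box, via central finite differences.
def discretize(a, b, n):
    h = (b - a) / n
    return [(a + i * h, a + (i + 1) * h) for i in range(n)]

passband = discretize(0.0, math.pi / 5, 20)
stopband = discretize(math.pi / 4, math.pi, 100)

def g(x, y):                      # toy constraint, NOT the FIR constraint
    return math.cos(x * y)

def d2g_dy2(x, y, h=1e-4):        # central difference in y
    return (g(x, y + h) - 2.0 * g(x, y) + g(x, y - h)) / h ** 2

alpha = max(d2g_dy2(3.0 * i / 50, math.pi * k / 50)
            for i in range(51) for k in range(51))
```

In practice such a sampled estimate is only a heuristic lower bound on the true curvature and would need a safeguard; the explicit derivative bound used in the paper avoids this issue.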
The symbol ∗ denotes that Algorithm 3.a reached the maximum iteration number we set; that is, Algorithm 3.a ran for 150 iterations without meeting the stopping criteria. Note that fseminf failed on this example. In each iteration, the NLP solver in Matlab does not compute efficiently, and the improvement in the optimal value is very slow. Hence, if the number of initial subintervals is too small, running this example costs more CPU time. Nevertheless, Algorithm 3 clearly outperforms Algorithm 3.a on the FIR filter design example.
6 Conclusion
In this paper, we propose a new algorithm and establish its convergence properties. We improve the algorithm proposed in [8] by combining it with the concept of the cutting plane method. The convexification relaxation in the variable y for the constraint function g makes the computation of the lower-level optimization problem easier. This method has an advantage in solving non-convex SIP problems, especially when the constraint function g is complicated. Based on the numerical results in Sect. 5, Algorithm 3 performs more efficiently on Examples 1 and 2 than the other two methods.
References
Adjiman, C.S., Androulakis, I.P., Floudas, C.A.: A global optimization method, αBB, for general twice-differentiable constrained NLPs—I: Theoretical advances. Comput. Chem. Eng. 22, 1137–1158 (1998)
Adjiman, C.S., Androulakis, I.P., Floudas, C.A.: A global optimization method, αBB, for general twice-differentiable constrained NLPs—II: Implementation and computational results. Comput. Chem. Eng. 22, 1159–1179 (1998)
Bianco, C.G.L., Piazzi, A.: A hybrid algorithm for infinitely constrained optimization. Int. J. Syst. Sci. 32, 91–102 (2001)
Betrò, B.: An accelerated central cutting plane algorithm for linear semi-infinite programming. Math. Program. 101, 479–495 (2004)
Bhattacharjee, B., Green, W.H. Jr., Barton, P.I.: Interval methods for semi-infinite programs. Comput. Optim. Appl. 30, 63–93 (2005)
Bhattacharjee, B., Lemonidis, P., Green, W.H. Jr., Barton, P.I.: Global solution of semi-infinite programs. Math. Program. 103, 283–307 (2005)
Floudas, C.A.: Deterministic Global Optimization, Theory, Methods and Applications. Kluwer Academic, Dordrecht (2000)
Floudas, C.A., Stein, O.: The adaptive convexification algorithm: a feasible point method for semi-infinite programming. SIAM J. Optim. 18, 1187–1208 (2007)
Gonzaga, C., Polak, E., Trahan, R.: An improved algorithm for optimization problems with functional inequality constraints. IEEE Trans. Autom. Control AC25, 49–54 (1980)
Görner, S., Potchinkov, A., Reemtsen, R.: The direct solution of nonconvex nonlinear FIR filter design problems by a SIP method. Optim. Eng. 1, 123–154 (2000)
Gramlich, G., Hettich, R., Sachs, E.W.: Local convergence of SQP methods in semi-infinite programming. SIAM J. Optim. 5, 641–658 (1995)
Hettich, R., Kortanek, K.: Semi-infinite programming: theory, methods and applications. SIAM Rev. 35, 380–429 (1993)
Jennings, L.S., Teo, K.L.: A computational algorithm for functional inequality constrained optimization problems. Automatica 26, 371–375 (1990)
Kortanek, K., No, H.: A central cutting plane algorithm for convex semi-infinite programming problems. SIAM J. Optim. 3, 901–918 (1993)
Ni, Q., Ling, C., Qi, L., Teo, K.L.: A truncated projected Newton-type algorithm for large-scale semi-infinite programming. SIAM J. Optim. 16, 1137–1154 (2006)
Potchinkov, A., Reemtsen, R.: FIR filter design in the complex domain by a semi-infinite programming technique. I. The method. Arch. Elektron. Übertrag.tech. 48, 35–144 (1994). II. Examples: 200–209
Potchinkov, A., Reemtsen, R.: The simultaneous approximation of magnitude and phase by FIR digital filters. Part 1. A new approach. Int. J. Circuit Theory Appl. 25, 167–177 (1997). Part 2. Methods and examples: 179–197
Price, C.J., Coope, I.D.: Numerical experiments in semi-infinite programming. Comput. Optim. Appl. 6, 169–189 (1996)
Polak, E.: Optimization, Algorithms and Consistent Approximations. Springer, Berlin (1997)
Qi, L., Ling, C., Tong, X.J., Zhou, G.: A smoothing projected Newton-type algorithm for semi-infinite programming. Comput. Optim. Appl. 42, 1–30 (2009)
Reemtsen, R., Görner, S.: Numerical methods for semi-infinite programming: a survey. In: Reemtsen, R., Rückmann, J. (eds.) Semi-infinite Programming, pp. 195–275. Kluwer Academic, Boston (1998)
Shapiro, A.: Semi-infinite programming, duality, discretization and optimality condition. Optimization 58, 133–161 (2009)
Still, G.: Discretization in semi-infinite programming: the rate of convergence. Math. Program. 91, 53–69 (2001)
Tanaka, Y., Fukushima, M., Ibaraki, T.: A globally convergent SQP method for semi-infinite nonlinear optimization. J. Comput. Appl. Math. 23, 141–153 (1988)
Teo, K.L., Yang, X.Q., Jennings, L.S.: Computational discretization algorithms for functional inequality constrained optimization. Ann. Oper. Res. 28, 215–234 (2000)
Wu, S.Y., Li, D.H., Qi, L., Zhou, G.: An iterative method for solving KKT system of the semi-infinite programming. Optim. Methods Softw. 20, 629–643 (2005)
Zhang, L.P., Hayashi, S., Wu, S.Y.: On the finite termination of an exchange method for convex and nonlinear semi-infinite programming problems. Technique report (2010)
Zhang, L.P., Wu, S.Y., López, M.A.: A new exchange method for convex semi-infinite programming. SIAM J. Optim. 20, 2959–2977 (2010)
Acknowledgements
The authors would like to thank the associate editor and the two anonymous referees for their helpful comments and suggestions that have enabled us to improve the preliminary version of this paper.
Shiu, TJ., Wu, SY. Relaxed cutting plane method with convexification for solving nonlinear semi-infinite programming problems. Comput Optim Appl 53, 91–113 (2012). https://doi.org/10.1007/s10589-011-9452-9