Abstract
We propose an inexact proximal bundle method for constrained nonsmooth nonconvex optimization problems whose objective and constraint functions are known through oracles that provide inexact information. The errors in function and subgradient evaluations may be unknown, but are merely bounded. To handle the nonconvexity, we first adopt the redistributed idea, and deal with the additional difficulties introduced by the inexactness of the available information. We further examine a modified improvement function to address a series of difficulties caused by the constraint functions. The numerical results show the good performance of our inexact method on a large class of nonconvex optimization problems. The approach is also assessed on semi-infinite programming (SIP) problems, and some encouraging numerical experiences are reported.
1 Introduction
We consider the constrained nonconvex nonsmooth optimization problem
where \(f,\,c\): \(\mathbf R ^n\rightarrow \mathbf R \) are locally Lipschitz functions, and \(\mathcal {X}\) is a convex compact subset of \(\mathbf R ^n\). Further, we assume that, for fixed accuracy tolerances, approximate function values and subgradients of f and c can be obtained from an inexact oracle at each given point. The errors in the function values and subgradients may be unknown, but are bounded by universal constants.
The above framework (1.1) arises in many optimization problems from real-life applications. For some problems, such as \(H_\infty \)-control problems [2, 42], SIP problems [43] and stochastic programming problems, computing exact information is too expensive, or even out of reach, whereas evaluating some inexact information is still possible. Most methods based on exact information can only be used for solving simplified versions of these problems.
Nonsmooth optimization problems are, in general, difficult to solve, even when they are unconstrained. Bundle methods are currently recognized as the most robust and reliable for nonsmooth optimization; see e.g., [1, 21, 24, 31] for more detailed comments. In particular, bundle methods can be thought of as replacing the objective function f by a cutting-plane model to formulate a subproblem whose solution gives the next iterate and provides the new bundle elements.
Bundle methods for nonconvex problems using exact information were developed in [14, 15, 18, 26, 29, 36, 37, 40, 54, 55]; see these references for more detailed comments. Except for [18, 55], all of these methods ensure positive linearization errors by redefining them. That is, most previous methods handle nonconvexity by downshifting the so-called linearization errors whenever they are negative. The cutting-plane models in [18, 55] are special in the sense that they no longer model the objective function f itself but rather a certain local convexification of it. Our proximal bundle method here is along the lines of [18].
Bundle methods capable of handling inexact oracles have received much attention in the last few years; they have been discussed since 1985 [25]. Inexact evaluations of subgradients in subgradient methods date back to [5, 30, 41] in the convex setting, and to [47] for nonconvex optimization. Different from those earlier works, subsequent methods allow nonvanishing errors. Nonvanishing errors in both function and subgradient values in convex bundle methods were first considered in [48], and further studied in [31]. Besides, several variants have been developed: [20, 31] for general approaches, [9] for energy planning, [6, 10] for a stochastic optimization framework, [33] for a combinatorial context, [43] for SIP problems, and [7] for a detailed summary. As far as we know, the only two other works dealing with inexact information in bundle methods for nonconvex optimization are [19, 42].
For constrained problems, such as problem (1.1) considered here, more sophisticated methods need to come into play. One possibility is to solve an equivalent unconstrained problem with an exact penalty objective function; see [27, 28] for convex constrained problems, and [55] for the nonconvex case. These approaches, however, possess some shortcomings, which are typical whenever a penalty function is employed. Specifically, computing a suitable value of the penalty parameter is sometimes too expensive. Besides, if a large value of the penalty parameter is required to guarantee the exactness of a given penalty function, then numerical difficulties arise. Hence, the bundle methods in [35, 46] introduce a so-called improvement function
where \(\tau \) is generated from the previous iterates. It is one of the most effective tools for handling constrained optimization. In this paper, we change \(\tau \) into \(f(x^k)\) plus a weighted measure of its infeasibility. Thus there is a balance between the search for feasibility and the reduction of the objective function. The details will be discussed in Sect. 2.1.
The cutting-plane model in our method no longer approximates a penalty function of the objective as in [55], or the improvement function as in [24, 46]. It is given by the pointwise maximum of the piecewise-linear models of the local convexifications of the objective function f and the constraint function c. Hence, our approach cannot be viewed as an unconstrained proximal bundle method applied to the function \( h_{\tau }(x)\). Another feature of our algorithm is that it allows infeasible iterates. With respect to [24, 39], the advantage is that it is not necessary to compute a feasible point to start the algorithm. Also, since serious steps can be infeasible, monotonicity in f is not enforced. Infeasible bundle methods are rare and valuable.
In this paper, we propose an infeasible proximal bundle method with inexact information for solving (1.1). Above all, the sequence of iterates of our inexact bundle method asymptotically converges, under a mild assumption, to a point that is optimal up to a tolerance \(\epsilon \). To be specific, the limit values \(f^*\) and \(c^*\) of the sequences \(\{f^k\}\) and \(\{c^k\}\), respectively, satisfy
for all y in a neighbourhood. If the feasible set of (1.1) is empty, then \(c^*\le c(y)+\epsilon \) for all y.
This paper is organized as follows. In Sect. 2, we state basic conceptual comments, the inexact oracle and basic properties of the improvement function. The cutting-plane model and the algorithm itself are stated in Sect. 3, where some preliminary properties are also established. Asymptotic analysis is considered in Sect. 4, and convergence analysis is provided in Sect. 5. Numerical experiments are presented in Sect. 6.
2 Background, assumptions and notation
The description of this work starts with conceptual comments and results of variational analysis, then passes to technical details gradually. We assume that the objective function f and constraint function c are proper [44, p. 5], regular [44, Def 7.25], and locally Lipschitz with full domain.
2.1 Background and assumptions
In this subsection, we first recall concepts and results of variational analysis that will be of use in this paper. The closed ball in \(\mathbf R ^n\) with center \(x \in \mathbf R ^n\) and radius \(\rho > 0\) is denoted by \(B_{\rho }(x)\). We shall use \(\partial f(\bar{x})\) to denote the subdifferential of f at the point \(\bar{x}\). By the regularity of f, the subdifferential mapping is well defined and is given by
and each element of this subdifferential is called a subgradient. Besides, the Clarke directional derivative [3] at x in the direction \(d \in \mathbf R ^n\) is given as
and an alternative definition is
Given an open set \(\mathcal {O}\) containing \(\mathcal {X}\), a locally Lipschitz function \(f{:}\,\mathcal {O}\rightarrow \mathbf R \) is said to be lower-\(C^1\) if, on some neighborhood V of every \(\tilde{y} \in \mathcal {O}\), it admits a representation of the form [49]:
where the derivative \(D_{y}g\) is jointly continuous and the index set S is a compact set. Our nonconvex proximal bundle method is designed for lower-\(C^1\) functions, and we recall the following equivalent characterizations, given in [8, The 2, Cor 3] and [49, Prop 2.4]:
-
The locally Lipschitz function f is lower-\(C^1\) on \(\mathcal {O}\).
-
Its Clarke subdifferential \(\partial f\) is submonotone at every \(\tilde{y} \in \mathcal {O}\). That means for every \(\tilde{y}\in \mathcal {O}\) and for all \(\theta >0\) there exists \(\rho >0\) such that
$$\begin{aligned}&\left\langle q^1-q^2, z^1-z^2 \right\rangle \ge -\theta \Vert z^1-z^2\Vert ,\\&\quad \mathrm{for \ all} \ z^1, z^2\in B_{\rho }(\tilde{y}) \ \ \mathrm{and} \ \ q^1\in \partial f(z^1), q^2\in \partial f(z^2). \end{aligned}$$ -
For all \(\tilde{y}\in \mathcal {O}\) and for all \(\theta >0\) there exists \(\rho >0\) such that \(\forall \ y\in B_{\rho }(\tilde{y})\) and \(q\in \partial f(y)\)
$$\begin{aligned} f (y+u)-f (y) \ge \langle q,u \rangle -\theta \Vert u\Vert , \end{aligned}$$(2.3)whenever \(\Vert u\Vert <\rho \) and \(y+u \in B_{\rho }(\tilde{y})\).
-
For every \(\tilde{y}\in \mathcal {O}\) and for all \(\theta >0\) there exists \(\rho >0\) such that
$$\begin{aligned} f(ty^1+(1-t)y^2)\le tf(y^1)+(1-t)f(y^2)+\theta t(1-t)\Vert y^1-y^2\Vert , \end{aligned}$$for all \( t\in (0,1)\) and \(y^1, y^2 \in B_{\rho }(\tilde{y})\).
If the optimal value \(f_{\mathrm{min}}\) of the problem (1.1) is known in advance, the solution of (1.1) can be obtained by minimizing \(h_{\mathrm{min}}(y)=\max \Big \{ f(y)-f_{\mathrm{min}}, c(y)\Big \}.\) Notice that if \(x^*\) is a solution of the problem (1.1), then \(h_{\mathrm{min}}(x^*)=0\). Hence, Lemaréchal [35] first replaced \(f_{\mathrm{min}}\) by a lower bound \( \underline{f}^k\). The sequence \(\{\underline{f}^k\}\), however, is not available in general, so this choice is not implementable in proximal bundle methods. Some alternatives were proposed in [1, 23, 32, 35, 43, 46]. For instance, [43] replaced \( \underline{f}^k\) by the objective function value at the current stability center \(x^k\) as
However, none of them is directly suitable for nonconvex optimization with the inexact oracle considered in this paper. Based on [32], we replace \(\tau \) by \(f(x^k)\) plus a weighted measure of its infeasibility
with
where \(s_k>0\) and \(t_k>0\) are penalty parameters. The penalty parameters \(s_k\) and \(t_k\) are bounded, and are updated depending on a constant along the iterative process. Moreover, as shown below, \( h_{x^k}(\cdot )\) is nonnegative at each stability center, and is used as an optimality measure.
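To fix ideas, a minimal numerical sketch of this modified improvement function is given below, written to agree with the limiting form appearing later in (5.2); the function names and the particular values of \(s_k,\,t_k\) are illustrative assumptions, since the precise update rule (3.3) is not reproduced here.

```python
def improvement_function(f, c, y, xk, s_k, t_k):
    """Modified improvement function h_{x^k}(y): the classical target
    tau = f(x^k) is shifted by a weighted infeasibility measure, so the
    search balances feasibility against objective decrease."""
    viol = max(c(xk), 0.0)  # infeasibility measure at the stability center
    return max(f(y) - f(xk) - s_k * viol,
               c(y) - t_k * viol)


# Illustrative data: center xk = 2 is infeasible (c(xk) = 1 > 0).
def f(x):
    return x * x

def c(x):
    return x - 1.0

h_center = improvement_function(f, c, 2.0, 2.0, 0.5, 0.5)  # value at xk
h_other = improvement_function(f, c, 0.0, 2.0, 0.5, 0.5)   # value at y = 0
```

With \(t_k\le 1\) the value at the stability center is \((1-t_k)\max \{c(x^k),0\}\ge 0\), matching the nonnegativity used as the optimality measure.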
We introduce the following extended MFCQ (see [22, 51]) for this nonsmooth problem.
Assumption 1
(The constraint qualification) Suppose that \(x^*\) is a local solution of the problem (1.1). There exists a feasible direction d of \(\mathcal {F}:=\{ x\in \mathcal {X}{:}\,c(x)\le 0\}\) at \(x^*\) satisfying
We now give the following necessary optimality condition.
Lemma 2.1
Suppose \(x^*\) is a local solution of the problem (1.1), and there exists a constant \(\rho ^*\) such that \(f(y)\ge f(x^*)\) for all \(y\in B_{\rho ^*}(x^*)\bigcap \mathcal {X}\). Then the following statements hold:
-
(i)
For all \(y \in \mathcal {X}\), it holds that
$$\begin{aligned} \min \left\{ h_{x^*}(y){:}\, y \in \mathcal {X}\right\} = h_{x^*}(x^*)= 0, \ for \ all \ y \in B_{\rho ^*}(x^*)\cap \mathcal {X}. \end{aligned}$$(2.8) -
(ii)
\( 0 \in \partial h_{x^*}(x^*)\), and there exist \(\mu _0\) and \(\mu \) satisfying
$$\begin{aligned}&0\in \mu _0\partial f(x^*)+\mu \partial c (x^*)+\partial i_{\mathcal {X}}(x^*),\nonumber \\&\mu \ge 0, \ \mu +\mu _0=1, \ \mu c(x^*)=0, \ c(x^*)\le 0, \end{aligned}$$(2.9)where \(i_{\mathcal {X}}(x^*)\) denotes the indicator function of the set \(\mathcal {X}\).
-
(iii)
If, in addition, Assumption 1 holds, then \(x^*\) is a KKT point of the problem (1.1).
Proof
Since \(x^*\) is a local solution of the problem (1.1), we conclude that
Together with the relation (2.6), we can obtain
That means, for all \(y\in B_{\rho ^*}(x^*)\cap \mathcal {X}\),
which implies that the relation (2.8) holds. From here, the first item (i) follows.
It follows from [24, Lemma 2.15] that \( 0 \in \partial (h_{x^*}(x^*)+i_{\mathcal {X}}(x^*))\). Hence the point \(x^*\) is an FJ point of (1.1) and the conditions (2.9) hold. We now prove item (iii).
If \(\mu _0\) were zero in condition (2.9), we would obtain \(\mu =1\) and \(-g^*\in \partial i_{\mathcal {X}}(x^*)\) for some \(g^*\in \partial c(x^*)\). Since d is a feasible direction of \(\mathcal {F}\), we get \(\langle g^*, d\rangle \ge 0\). Thus we obtain
which is a contradiction with (2.7). Hence the constant \(\mu _0\) is strictly positive. Together with the conditions (2.9), we obtain that
since one may take \(\nu =\frac{\mu }{\mu _0}\). Hence, \(x^*\) is a KKT point of the problem (1.1).\(\square \)
2.2 Available information
The algorithm herein hinges on previously generated information. Suppose that \(y^j\) is the iterate at the jth step, and \(x^k\) denotes the kth stability center. Let \(j_k\) denote the iteration index giving the stability center \(x^k\), i.e., \(x^k:=y^{j_k}\). The stability center is regarded as the “best” known point up to the kth iteration.
We assume that at any \(y^j\), the oracle provides
For each \(y^j \in \mathcal {X}\), the oracle provides us with
At each stability center \(x^k\), for simplicity, we denote
Since the sign of the error \(\sigma _{j}\) is not specified, the exact function values may be either underestimated or overestimated by \(f^j\) and \(c^j\). Throughout this section we make the assumption that the error on each of these estimates is bounded, i.e., there exist corresponding bounds \(\bar{\sigma }\) and \(\bar{\varepsilon }\), such that
where both bounds \(\bar{\sigma }\) and \(\bar{\varepsilon }\) are generally unknown.
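For testing purposes, such an oracle can be simulated by perturbing exact values with bounded noise. The wrapper below is a hypothetical sketch (not part of the method itself): a single bound sigma_bar is used for both value and subgradient errors, and the problem is one-dimensional for simplicity.

```python
import random

def make_inexact_oracle(f, grad_f, sigma_bar):
    """Wrap an exact oracle so that the returned function value and
    subgradient carry unknown but bounded errors (|error| <= sigma_bar).
    The sign of the error is unspecified, so values may be over- or
    underestimated, exactly as assumed in (2.11)-(2.12)."""
    def oracle(y):
        fy = f(y) + random.uniform(-sigma_bar, sigma_bar)
        gy = grad_f(y) + random.uniform(-sigma_bar, sigma_bar)
        return fy, gy
    return oracle


# Example: inexact oracle for f(y) = |y| with error bound 0.1.
exact_f = abs
exact_g = lambda y: 1.0 if y >= 0 else -1.0
oracle = make_inexact_oracle(exact_f, exact_g, 0.1)
```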
3 Defining the inexact algorithm
3.1 The model
Based on the modified improvement function (2.5) and the inexact oracle (2.11), we define the kth inexact modified improvement function at the stability center \(x^k\) as follows:
where \(f^y:=f(y)-\sigma _{y},\,c^y:=c(y)-\sigma _{y}\), and
Both penalty parameters \(s_k\) and \(t_k\) are updated depending on a constant \(\varpi >0\), and satisfy
Since the penalty parameters \(s_{k}\) and \(t_k\) are bounded, they are well defined along the iterative process. Considering (3.1) and the oracle assumption, we obtain
The strategy (3.3) for updating \(s_{k}\) and \(t_{k}\) can ensure the nonnegativity \( h_{k}(x^k)\ge 0\).
It is worth noting that past information yields the cutting-plane model. In a classical method such as [46], if f and c are convex functions with an exact oracle, the cutting-plane model is given as
However, the possible negativity of the linearization errors and the non-monotonicity of the nonconvex function \(h_{x^k}(\cdot )\) introduce many more difficulties. Besides, the computation of the value \(h_{k}(y^i)\) and its subgradient \(g_h^i\) at each iteration is relatively expensive. Thus, formula (3.5) does not apply in our nonconvex setting, and we have to deal with these issues adequately along the iterative process.
The oracle output is collected along the iterative process to form the “Bundle” of information
where \(L_k^f\) and \(L_k^c\) are, respectively, index sets for the objective function and the constraint function at the kth iteration. We now define, respectively, the linearization errors for f and c at \(x^k\) as
Possible negativity of the linearization errors is linked more strongly to the nonconvexity of f and c than to the inexact oracle information. Some appropriate adjustments should therefore be introduced, such as adding the second-order term
where \(\mu _k\) is the so-called convexification parameter.
Having this information, the kth approximate piecewise-linear models for \(f,\,c\) are generated by
where
The challenge is therefore to select \(\mu _k\) sufficiently large that \( \hat{e}_{f_{j}}^k\ge 0\) and \( \hat{e}_{c_{j}}^k\ge 0\) for all j, but sufficiently small to remain manageable. Inspired by [17], we define the parameter as follows:
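A possible realization of this selection, in the spirit of (3.10) but not identical to it, picks the smallest \(\mu _k\) that makes every shifted error \(e_j + \frac{\mu _k}{2}\Vert y^j-x^k\Vert ^2\) nonnegative, plus a small safety margin gamma; the margin and function names are our own assumptions.

```python
def convexification_parameter(errors, dists, gamma=1e-8):
    """Choose mu_k just large enough that each linearization error e_j,
    shifted by (mu_k/2)*||y^j - x^k||^2, becomes nonnegative; gamma is a
    small assumed safeguard keeping the shifted errors strictly positive."""
    mu = 0.0
    for e, d in zip(errors, dists):
        if e < 0.0 and d > 0.0:
            mu = max(mu, -2.0 * e / d**2)
    return mu + gamma

def shifted_errors(errors, dists, mu):
    """Downshifted (redistributed) linearization errors."""
    return [e + 0.5 * mu * d**2 for e, d in zip(errors, dists)]


# Illustrative bundle: two of the three raw errors are negative.
errs = [0.5, -0.3, -1.0]
dists = [1.0, 0.5, 2.0]
mu_k = convexification_parameter(errs, dists)
shifted = shifted_errors(errs, dists, mu_k)
```

Here the binding element is \(e=-0.3\) at distance 0.5, giving \(\mu _k \approx 2.4\); all shifted errors are then nonnegative while \(\mu _k\) stays as small as possible.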
Different from (3.5), having piecewise-linear models (3.8), the kth inexact improvement function is modelled by the following model function:
In this way, the model function maintains a strong relationship with the original functions f and c.
3.2 Inexact proximal bundle method
Let \(\Vert \cdot \Vert \) be the Euclidean norm, \(\Vert \cdot \Vert _{k}\) the primal metric and \(|\cdot |_{k}\) the dual metric. The iterate \(y^{k+1}\) is nothing but the proximal point of the model function (3.11), with prox-center (stability center) \(x^k\) and a variable prox-metric depending on the matrices \(M_k\). More specifically, the positive definite matrix \(M_k\) of order n has the form
where \(\eta _{k}>0\) is a parameter, \(A_k\) is a symmetric \(n\times n\) matrix, and I is the identity matrix of order n. Accordingly, we introduce the corresponding primal and dual metrics for each \(z\in \mathbf R ^n\) as follows:
Let \(\lambda _{\mathrm{max}}\) and \(\lambda _{\mathrm{min}}\) denote, respectively, the maximum eigenvalue and the minimum eigenvalue, then
As a result, by the Euclidean norm, it holds that
Given a positive definite matrix \(M_k\), the next iterative point \(y^{k+1}\) is generated by solving the following quadratic programming (QP) problem
If \(y^{k+1}\) satisfies the descent condition, then \(x^{k+1}=y^{k+1}\) is declared a serious point (a new stability center). Otherwise, the algorithm declares a null step, i.e., \(x^{k+1}=x^{k}\), and only the piecewise-linear model is updated. We rewrite (3.15) as a QP\((\mathcal {B}_k)\) with an extra scalar variable r as follows
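To illustrate the structure of this subproblem, the sketch below solves a simplified instance with \(M_k=\eta I\) and \(\mathcal {X}=\mathbf R ^n\) through the dual over the unit simplex, recovering the step \(d = y^{k+1}-x^k\) as an aggregate of the bundle subgradients. This is only a didactic substitute for a full QP solver (such as quadprog used in Sect. 6); all names and the projected-gradient scheme are our own assumptions.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def solve_bundle_qp(b, G, eta, iters=500):
    """Solve min_d max_j (b[j] + G[j] @ d) + (eta/2)*||d||^2 via its dual
       max_{lam in simplex} b @ lam - (1/(2*eta))*||G.T @ lam||^2,
    using projected gradient ascent; the primal step is then
    d = -(1/eta) * G.T @ lam, mirroring the aggregate-subgradient relation."""
    m = len(b)
    lam = np.full(m, 1.0 / m)
    Q = G @ G.T / eta
    L = np.linalg.norm(Q, 2) + 1e-12  # Lipschitz constant of the dual gradient
    for _ in range(iters):
        grad = b - Q @ lam
        lam = project_simplex(lam + grad / L)
    d = -(G.T @ lam) / eta
    return d, lam


# Two opposing planes max{d, -d} + d^2/2: the prox step is d = 0.
d1, lam1 = solve_bundle_qp(np.array([0.0, 0.0]), np.array([[1.0], [-1.0]]), 1.0)
# A single plane 1 + 2*d_1 with eta = 2: the prox step is d = (-1, 0).
d2, lam2 = solve_bundle_qp(np.array([1.0]), np.array([[2.0, 0.0]]), 2.0)
```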
More precisely, let \((\lambda ^k, \gamma ^k) \in \mathbf R _{+}^{|L^f_k|}\times \mathbf R _{+}^{|L^c_k|}\) denote the optimal solution of the dual problem of (3.16), considering the primal and dual optimal variables, then the following relations are easily recognized to hold:
where
After solving the problem (3.16), we obtain the aggregate linearization
which is an affine function and satisfies \(\psi _{k}(y^{k+1})=\Psi _{k}(y^{k+1})\) and \(G^k =\nabla \psi _{k}(y)\). Clearly, since \(G^k\in \partial \Psi _{k}(y^{k+1})\),
Another ingredient of the bundle method is the aggregate error, defined by
In our setting, the noise introduced by the nonconvexity and the inexact oracle is judged “too large” when the function value at the stability center is below the minimum model value. To detect inordinate noise, we check the following noise measurement quantity:
To measure progress towards the solution of (1.1), we define the predicted decrease
By using the relations (3.12), (3.17) and (3.21), we can obtain
Only when the measurement (3.22) does not hold, i.e., the noise is acceptable, do we examine the new iterate \(y^{k+1}\) by checking the descent condition
to decide whether \(y^{k+1}\) is good enough to become a new stability center. Otherwise, a null step is declared, and the stability center is kept fixed. The descent condition (3.25) measures progress towards the solution of (1.1) in two ways. If \(\hat{c}^k\le 0\), it reduces the objective value without losing feasibility by checking the first condition in (3.25). Otherwise, when \(\hat{c}^k> 0\), the emphasis is put on reducing infeasibility by checking the second condition in (3.25).
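The two branches of this test can be sketched as follows. Since only the qualitative form of (3.25) is described here, the Armijo-like parameter kappa and the concrete inequalities below are illustrative assumptions, not the paper's exact condition.

```python
def descent_test(f_new, c_new, f_center, c_center, delta_k, kappa=0.1):
    """Serious/null step decision, sketched after the two branches of the
    descent condition: if the stability center is (approximately) feasible,
    require a fraction of the predicted decrease delta_k in the objective
    without losing feasibility; otherwise require the same fraction of
    decrease in the infeasibility measure. Returns True for a serious step."""
    if c_center <= 0.0:
        return (f_new <= f_center - kappa * delta_k) and (c_new <= 0.0)
    return c_new <= c_center - kappa * delta_k
```

For instance, with a feasible center (f = 1.0, c = -0.1) and predicted decrease 0.5, a candidate with f = 0.9 and c = -0.05 passes, while f = 0.99 does not; with an infeasible center (c = 0.4), a candidate reducing c to 0.3 passes regardless of its objective value.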
Lemma 3.1
Suppose \((r_{k}, y^{k+1})\) is the optimal solution of \(QP(\mathcal {B}_k)\). The aggregate linearization error \(V_k\) satisfies the following relation:
Proof
The Lagrange function of (3.16) can be rewritten as
which implies that
On the other hand,
implies that
By noting (3.1), we can obtain that
Together with the functional relations (3.19) and (3.21), we obtain that
It follows from (3.10) for selecting \(\mu _{k}\) that the relation (3.26) holds. From here, the required result follows. \(\square \)
Our method is based on the ideas of [31], and extends them to nonconvex constrained optimization problems. We now state our bundle method for solving problem (1.1).
We now suppose that \(\{\mu _{k}\}\) is bounded. Boundedness of the sequence \(\{\mu _{k}\}\) has been proved in [17, 55] for lower-\(C^2\) functions with an exact oracle. In theory, inexactness may result in an unbounded \(\mu _k\) in our setting. The numerical experiments in Sect. 6 show that \(\{\mu _{k}\}\) remains bounded under various kinds of perturbations, and that our inexact method performs satisfactorily.
4 Asymptotic analysis
We now analyze the different cases that can arise when Algorithm 3.1 loops forever. As usual in the convergence analysis of bundle methods, we consider the following two possible cases:
-
either there are infinitely many serious steps, or
-
there is a finite number of serious steps, followed by infinitely many null steps.
We start with the case of infinitely many serious steps.
4.1 Infinite serious steps
Theorem 4.1
Suppose that Algorithm 3.1 generates an infinite sequence \(\{x^k\}\) of serious steps. Let \(\mathcal {L}_s\) denote the set gathering indices of serious steps. Then \(\delta _k \rightarrow 0\) and \(V_k \rightarrow 0\) as \( \mathcal {L}_s\ni k\rightarrow \infty \).
-
In addition, if the series \(\sum \nolimits _{k\in \mathcal {L}_s}\frac{1}{\lambda _{\mathrm{max}}(A_k)+\eta _{k}}\) is divergent, then \(\liminf \nolimits _{k\rightarrow \infty } \Vert G^k+\alpha ^k\Vert =0\). Besides, there exist some subsequence \(\mathcal {K}_s \subseteq \mathcal {L}_s\) and an accumulation point \(\hat{x}\) such that \(x^k\rightarrow \hat{x}\) and \(G^k\rightarrow \hat{G}\) as \(\mathcal {K}_s\ni k\rightarrow \infty \).
-
If, instead of the above weaker condition, the sequence \(\{{\lambda _{\mathrm{max}}(A_k)+\eta _{k}}\}\) is bounded above, then \(\lim \nolimits _{k\rightarrow \infty }\Vert G^k+\alpha ^k\Vert =0\), and the same assertions hold for all accumulation points of the sequence \(\{x^k\}\).
Proof
We examine the following two possibilities. In the first case, if there exists some \(\tilde{k}\) such that \(\hat{c}^{\tilde{k}}\le 0 \), then together with the satisfaction of the first condition in (3.25), this means that
and
Together with the basic assumptions on f, we can obtain that the sequence \(\{\hat{f}^k\}_{\mathcal {L}_s}\) is decreasing and bounded. Hence there exists some \(\hat{f}\) such that the sequence \(\{ \hat{f}^k\}_{\mathcal {L}_s }\) converges to \(\hat{f}\) satisfying \(\hat{f}\le \hat{f}^k\) for all \(k\ge \tilde{k}\) in \(\mathcal {L}_s\). Summing up the relations (4.1), we obtain that
which implies \(\delta _k\rightarrow 0\) as \(\mathcal {L}_s \ni k\rightarrow \infty \).
On the other hand, if \(\hat{c}^{k}>0\) for all \(k\in \mathcal {L}_s\), then (3.25) implies that
By the same argument, the sequence \(\{\hat{c}^k\}\) is decreasing and bounded below. Then it converges to a value \(\hat{c}\) satisfying \(\hat{c} \le \hat{c}^k\) for all \(k\ge \tilde{k}\) in \(\mathcal {L}_s\). As a result, summing up the above relation over all \(\mathcal {L}_s\), we see that
which means \(\lim \nolimits _{k\in \mathcal {L}_s} \delta _k=0\). Together with (3.14) and (3.24) we see that
and, hence \( V_k\rightarrow 0\) as \(\mathcal {L}_s\ni k\rightarrow \infty \).
By noting the relation (3.24), we obtain that
Since all the quantities are nonnegative, and together with (3.14), it holds that
If the weaker condition holds, i.e., \(\sum \nolimits _{k\in \mathcal {L}_s}\frac{1}{\lambda _{\mathrm{max}}(A_k)+\eta _{k}}\) is divergent, then we obtain \(\liminf \nolimits _{k\rightarrow \infty } \Vert G^k+\alpha ^k\Vert =0\). Passing to a subsequence \(\mathcal {K}_s\), if necessary, we may suppose \(\{x^k\}\rightarrow \hat{x}\) and \(G^k\rightarrow \hat{G}\) as \(\mathcal {K}_s \ni k\rightarrow \infty \). The first item is now proven. Furthermore, if the sequence \(\{\lambda _{\mathrm{max}}(A_k)+\eta _{k}\}\) is bounded above, we obtain that
which implies that the same assertions hold for all accumulation points of \(\{x^k\}\). This proves all the results. \(\square \)
4.2 Infinitely many consecutive null steps
The other case refers to finitely many serious steps, which implies that Algorithm 3.1 makes infinitely many consecutive null steps. Let \(\hat{k}\) denote the last serious index, and \(\hat{x}:=x^{\hat{k}}\) the last stability center. By adjusting the bundle-management strategy, the model is ensured to satisfy the following conditions whenever k and \(k+1\) are two consecutive null indices:
and
Moreover, let \(r_{k}\) denote the optimal objective value of (3.15), i.e.,
We first prove the following preliminary result regarding relations between consecutive terms of the sequence \(\{r_{k}\}\).
Lemma 4.1
Suppose that Algorithm 3.1 takes a finite number of serious steps. The optimal values \(r_{k+1}\) and \(r_{k}\) satisfy
Then, the following relations hold:
Proof
By noting the definition of \(\psi _{k}\) in (3.19), it is easy to see that
where the inequality has used \(\alpha ^k \in \partial i_{\mathcal {X}}(y^{k+1})\), and the last equality follows from (4.5). Hence we can obtain that
By evaluating at \(y=\hat{x}\), it follows that
where the last inequality has used (3.20), i.e., \(\psi _{k}(y)\le \Psi _{k}(y)\). Hence the sequence \(\{r_k\}\) is bounded from above. From (4.4), (4.8), and by noting the fact \(M_{k+1}\succeq M_{k}\), we can obtain that
By evaluating at \(y=y^{k+2}\), we obtain that
which implies that the relation (4.6) holds. Since the sequence \(\{r_k\}\) is monotone increasing and bounded from above, we also obtain that (4.7) holds. \(\square \)
Theorem 4.2
Suppose that Algorithm 3.1 generates a finite sequence of serious steps, and that the sequence \(\{\mu _k\}\) is bounded. Let \(\mathcal {L}_n\) denote the set gathering indices of iterations larger than \(\hat{k}\). Then \(\delta _k \rightarrow 0\) and \(V_k \rightarrow 0\) as \(\mathcal {L}_n \ni k\rightarrow \infty \).
-
If, in addition, \(\liminf \nolimits _{k\rightarrow \infty } \frac{1}{\eta _{k}} >0\), then there exists a subsequence \(\mathcal {K}_n\subseteq \mathcal {L}_n\), such that \(\Vert G^k+\alpha ^k\Vert \rightarrow 0,\,y^k \rightarrow \hat{x}\) and \( G^k\rightarrow \bar{G}\) as \(\mathcal {K}_n\ni k\rightarrow \infty \).
-
If, instead of the above weaker condition, there exists \(\eta _{\mathrm{max}}>0\) such that \(\eta _{k}\le \eta _{\mathrm{max}}\), then \(\Vert G^k+\alpha ^k\Vert \rightarrow 0\) and \(y^k \rightarrow \hat{x}\) as \(k\rightarrow \infty \).
Proof
For convenience, let \(\hat{x}=x^{k},\,\hat{f}=\hat{f}^{k},\,\hat{c}=\hat{c}^{k},\,\hat{A}=A_{\hat{k}},\,\hat{\theta }_1=\theta _1^k,\,\hat{\theta }_2=\theta _2^k\), and \(\hat{h}=h_{k}(x^k)\) for all \(k\ge \hat{k}\). By the definition of the inexact improvement function, we obtain that
If \(\hat{c}>0\), this yields
where the second inequality follows from (3.25), and the last equality uses the relation (3.4).
On the other hand, if \(\hat{c} \le 0\), it follows from (3.4) that \(\hat{h}=0\). By combining with (3.25), we get
By adding \(\delta _k\) to both terms, we can obtain that
where the last inequality has used the relation (3.23), that is
Following (3.9) and (4.3), we can observe that
By the same argument, we obtain that
Besides, from (3.11), (4.12) and (4.13), we get
By noting the sequence \(\{y^k\}\subset \mathcal {X}\), there exists a positive constant \(N>0\) such that \(\Vert y^k-\hat{x}\Vert \le N\) for all k. Therefore,
where the inequality has used the fact that \(g_c^{k+1}\) and \(g_f^{k+1}\) are bounded (recall that f and c are locally Lipschitzian). By combining with (4.11), we obtain that
where the last equality comes from the relation (4.5). From the relation (3.14) and the condition \(M_{k+1}\succ M_k\), we get
where the last inequality follows from \(G^k \in conv\{g_f^{j},\ g_c^{j}, \ j\in L_k \}\) and \(\hat{\eta }:=\eta _{\hat{k}}\le \eta _{k}\) for all \(k\ge \hat{k}\). Passing to the limit as \(k\rightarrow \infty \), and using (4.7) in Lemma 4.1, we obtain that \(\lim \nolimits _{k\rightarrow \infty } \delta _k=0\). Together with (3.14) and (3.24) we see that
Since \(\liminf \nolimits _{k\rightarrow \infty } \frac{1}{\eta _{k}} >0\), we obtain that \(\lim \nolimits _{\mathcal {K}\ni k\rightarrow \infty } V_k=0\) and \(\lim \nolimits _{\mathcal {K} \ni k\rightarrow \infty } \Vert G^k+\alpha ^k\Vert =0\) for some subsequence \(\mathcal {K}\). Furthermore, it follows from (3.17) and \(M_{k+1}\succeq M_{k}\) that \(\lim \nolimits _{\mathcal {K} \ni k\rightarrow \infty } y^{k}=\hat{x}\). If \(\eta _{k}\le \eta _{\mathrm{max}}\), it is obvious that \(\Vert G^k+\alpha ^k\Vert \rightarrow 0\) and \(y^k \rightarrow \hat{x}\) as \(k\rightarrow \infty \). This completes the proof. \(\square \)
5 Convergence results
One of the purposes of conditions (3.3) is to ensure satisfaction of the relations stated in the following lemma. Let \(x^*\) denote an accumulation point of \(\{x^k\}\), and let \((f^*,c^*)\) and \((s_*,t_*)\) be the limit points of \(\{(f^k,c^k)\}\) and \(\{(s_k,t_k)\}\), respectively. We can define the constant \( \varpi \) in (3.3) satisfying \(\varpi >\frac{1}{c^{*}}(\max \nolimits _{y\in \mathcal {X} } \{f(y)-c(y)\}-f^{*})\), which implies that the penalty parameters \(s_{k},\,t_k\) satisfy
Theorem 5.1
Suppose that Algorithm 3.1 solves (1.1) satisfying (2.11), (2.12) with penalty parameters \(s_k\) and \(t_k\) satisfying (3.3).
If the algorithm loops forever, then for each \(\theta ^*>0\), there exists a positive constant \(\rho ^*\) such that
-
$$\begin{aligned} (1-t^*)\max \{c^*,0\} \le \max \Big \{ f(y)-f^*-s_*\max \{c^*,0\}, c(y)-t_*\max \{c^*,0\} \Big \}+\epsilon ,\nonumber \\ \end{aligned}$$(5.2)
for all \(y \in B_{\rho ^*}(x^*) \bigcap \mathcal {X}\), where \(\epsilon :=(\theta ^*+\bar{\varepsilon })\rho ^*+\bar{\sigma }\).
-
If \(c^*>0\) then
$$\begin{aligned} c^* \le c(y)+\epsilon \ \ \mathrm{for\ all} \ y. \end{aligned}$$ -
If \(c^*<0\) and \(X_{\epsilon }:=\{y\in X:c(y)< -\epsilon \}\) is nonempty, it holds that
$$\begin{aligned} f^*\le f(y)+\epsilon \ \ \mathrm{for \ all } \ \ y\in X_{\epsilon }. \end{aligned}$$
Proof
If Algorithm 3.1 loops forever, then since the set \(\mathcal {X}\) is compact and \(x^k \in \mathcal {X}\), there exists an index set \(\mathcal {L}'\) such that \(\{x^k\}_{k\in \mathcal {L}'}\rightarrow x^*\); recall that when \(\mathcal {L}'=\mathcal {L}_{s}\), eventually \(x^*=\lim \nolimits _{k\rightarrow \infty }y^k\). Let \(q_{{f}}^j\in \partial f(x^j)\); by noting (3.7) and (3.9), we obtain that
where the last equality follows from (2.11). By noting the definition of lower-\(\mathcal {C}^1\), then for all \(y^j\) and \(\theta ^j>0\), there exists \(\rho ^j >0\) such that
which combined with the relation (5.3), implies that
Together with satisfaction of (3.9c), i.e., \(g_{{f}}^j= \hat{g}_{f_j}^k-\mu _k(y^j-x^k)\), it holds that
where the last inequality follows from \(p_j^k\ge 0\). The above relation eventually implies that
By the same argument, it holds that
It follows from (5.5) and (5.6) that
Besides, by noting (3.27) and (3.23) it holds that
which implies that
By noting Theorem 4.1, Theorem 4.2 and Lemma 5.1, and passing to the limit in above relation as \(k\rightarrow \infty \), we obtain that
As already seen, \(G^k+\alpha ^k\rightarrow G^*+\alpha ^*=0\) and \( \alpha ^* \in \partial i_{\mathcal {X}}(x^*)\), which implies that \(-G^*\in \partial i_{\mathcal {X}}(x^*)\), and thus \(\langle -G^*, y-x^*\rangle \le 0\). From here, the result in the first item is proven.
To show the second item, consider the case \(c^*>0\). Recalling the relation (5.1) yields
which implies that
Together with the relation (5.2), we obtain that
This proves the second item.
Finally, to see item (iii), consider the case \(c^*<0\). According to (5.2), it holds that
which means \(f^*\le f(y)+\epsilon \) for \(y\in X_{\epsilon }\). This completes the proof. \(\square \)
From the third statement of lower-\(C^1\) in (2.3), the constants \(\theta ^*\) and \(\rho ^*\) for the point \(x^*\) can be taken small enough. Thus Theorem 5.1 states that, as long as \(\theta ^*\) and \(\rho ^*\) are chosen properly, the constant \(\epsilon \) meets the required precision accordingly. Besides, Theorem 5.1 also indicates that, if the parameter \(s_k\) is updated quickly (e.g., \(s_{k+1}=2 s_{k}\) if \(\hat{c}^k>0\), and \(s_{k+1}=s_{k}\) otherwise), Algorithm 3.1 will eventually find a point that is infeasible for problem (1.1) by at most the accuracy \(\epsilon \). Meanwhile, by noting the relation \(c(y)>c^*-\epsilon \), the feasible set of (1.1) may be empty or very small. Otherwise, the set \(X_{\epsilon }\) is nonempty as long as the accuracy \(\epsilon \) is small enough, and an approximate local solution of (1.1) will be detected.
Our inexact bundle method can also be applied to lower-\(C^2\) functions. The function f is lower-\(C^2\) on an open set \(\mathcal {M}\) if it is finite-valued on \(\mathcal {M}\) and, for any point \(\bar{x} \in \mathcal {M}\), there exists a threshold \(R_\mathrm{th}>0\) such that \(f+\frac{r}{2}|\cdot -\bar{x}|^2\) is convex on an open neighborhood \(\mathcal {M}'\) of \(\bar{x}\) for all \(r>R_\mathrm{th}\). It has been shown that lower-\(C^2\) functions are locally Lipschitz continuous.
Prox-regularity is essential when working with proximal points in a nonconvex setting. Specifically, if f is a prox-regular locally Lipschitz function, then it is a lower-\(C^2\) function. Every finite convex function is lower-\(C^2\) [44, Theorem 10.31]. There are other classes of lower-\(C^2\) functions as well, e.g., \(C^2\) functions, strongly amenable functions, and, whenever f is l.s.c., proper, and prox-bounded, the negatives of the Moreau envelopes \(-e_\lambda f\).
By applying [38, Prop. 10.54] and [55, Lemma 4.2], there exist two thresholds \(\mu _{f},\,\mu _{c}\) such that, for all \(\mu \ge \rho ^{id} :=\max \{\mu _{f}, \mu _{c}\}\) and any given point \(y\in \mathcal {L}_0\) (a nonempty compact set), the functions \(f(\cdot )+\frac{\mu }{2}|\cdot -y|^2\) and \(c(\cdot )+\frac{\mu }{2}|\cdot -y|^2\) are convex on \(\mathcal {L}_0\). The positive threshold \(\rho ^{id}\) can be updated along iterations by using data in the bundle generated by Algorithm 3.1. Moreover, by [55, Lemma 4.2] the parameter \(\mu _{k}\) eventually remains unchanged; that is, there exist \(\hat{k}>0\) and \(\bar{\mu }>0\) such that \( \mu _{k}=\bar{\mu }\) for all \(k>\hat{k}\). Results analogous to Theorem 5.1 can then be proved in the same way.
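To illustrate the convexification idea behind these thresholds, the following Python sketch uses a hypothetical one-dimensional nonconvex function (not one of the paper's test problems) and verifies numerically, via discrete second differences, that \(f+\frac{\mu }{2}|\cdot -y|^2\) becomes convex once \(\mu \) offsets the curvature of the concave piece:

```python
import numpy as np

# Hypothetical 1-D example: pointwise max of a concave quadratic and an
# affine function, hence nonconvex but lower-C^2.
def f(x):
    return np.maximum(-x**2 + 1.0, 2.0 * x - 1.0)

def convex_on_grid(h, xs):
    """Check nonnegativity of discrete second differences of h on grid xs."""
    v = h(xs)
    return bool(np.all(v[:-2] - 2.0 * v[1:-1] + v[2:] >= -1e-10))

xs = np.linspace(-3.0, 3.0, 2001)
y, mu = 0.0, 2.0  # mu = 2 offsets the -x^2 curvature (threshold R_th = 2)
print(convex_on_grid(f, xs))                                       # False
print(convex_on_grid(lambda x: f(x) + 0.5 * mu * (x - y)**2, xs))  # True
```

Any \(\mu \) above the threshold works equally well, which is why the threshold only needs to be estimated, not computed exactly.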
6 Numerical experiments
To assess the practical performance of our inexact proximal bundle method, we coded Algorithm 3.1 in MATLAB and ran it on a PC with a 1.80 GHz CPU. The quadratic programming subproblems are solved with quadprog.m, which is available in the Optimization Toolbox.
6.1 Parameters for the proximal bundle method
We set the minimum and maximum positive thresholds to \(\eta _{\mathrm{min}}=10^{-5}\) and \(\eta _{\mathrm{max}}=10^{9}\), respectively. Once the number of active elements in the bundle \(\mathcal {G}_l\) exceeds \(num(\mathcal {G}_{\mathrm{max}})=50\), the bundle is compressed. The prox-parameter is initialized to \(\eta _0=1\).
We first compute \( \tilde{\eta }_{k}\) by using the reverse quasi-Newton scalar
As in [37], the prox-parameter \(\eta _{{k+1}}\) is updated at each serious step as follows.
where
If the kth iteration declares a null step, the prox-parameter \(\eta _{k}\) remains constant. This strategy satisfies the conditions on the parameters in Theorems 4.1 and 4.2. If necessary, \( \eta _{{k+1}}\) is projected so that \(\eta _{{k+1}}\in [\eta _{\mathrm{min}}, \eta _{\mathrm{max}}]\).
Let \(\iota =2\), and update the convexification parameter \(\mu _{k}\) by using the relation (3.10), i.e.,
which is similar to the strategy in [18]. For the exact oracle (i.e., \(\bar{\sigma }=0\) and \(\bar{\varepsilon }=0\)), it is proven in [18, Lemma 3] and [55, Lemma 4.2] that the sequence \(\{\mu _{k}\}\) is bounded. In our setting, however, the boundedness of \(\{\mu _{k}\}\) would be difficult to prove without additional coercivity assumptions on the behavior of the errors. It is worth noting that the penalty parameters \(s_{k}\) and \(t_{k}\) are updated in a way that depends on a constant \(\varpi \) defined in (5.1). Under the basic assumptions on \(f,\,c\) and \(\mathcal {X}\), the values of both functions on the set \(\mathcal {X}\) are bounded. Thus the constant \(\varpi \) is bounded, and as a result \(s_{k}\) and \(t_{k}\) are well defined in this work.
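The displayed update formulas are not reproduced above, so the following Python sketch shows only the general scheme: a standard secant (reverse quasi-Newton) curvature estimate is used as an assumed stand-in for the scalar \(\tilde{\eta }_k\), combined with the safeguards \(\eta _{\mathrm{min}},\,\eta _{\mathrm{max}}\) and the keep-on-null-step rule stated in the text:

```python
import numpy as np

ETA_MIN, ETA_MAX = 1e-5, 1e9  # safeguards from Section 6.1

def update_prox_parameter(eta_k, x_prev, x_new, g_prev, g_new, serious_step):
    """Illustrative prox-parameter update. A secant (reverse quasi-Newton)
    curvature estimate stands in for the paper's scalar eta_tilde_k (the
    exact formula is not reproduced here). At null steps eta is unchanged,
    as the text prescribes; the result is projected onto [ETA_MIN, ETA_MAX]."""
    if not serious_step:
        return eta_k
    dx, dg = x_new - x_prev, g_new - g_prev
    denom = float(dx @ dx)
    if denom <= 1e-16:
        eta_tilde = eta_k          # degenerate step: keep the old value
    else:
        eta_tilde = float(dg @ dx) / denom   # secant curvature estimate
    return min(max(eta_tilde, ETA_MIN), ETA_MAX)
```

For a quadratic \(f(x)=\frac{a}{2}|x|^2\) the secant scalar recovers the curvature a exactly, which motivates this family of updates.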
6.2 Examples for nonconvex optimization problems
In this subsection, we first introduce the nonconvex test problems. We use a family of polynomial functions developed in [11]; see also [12, 18]. For each \(i=1, 2,\ldots , n\), the function \(h_i{:}\,\mathbf R ^n \rightarrow \mathbf R \) is defined by
where M is a fixed constant. There are five classes of test functions defined by \(h_i\) in [12] as objective functions
It has been proved in [12, 18] that these functions are nonconvex, globally lower-\(C^1\), bounded on the compact set \(\mathcal {X}\), and level coercive. We obtain that \(0=\min \nolimits _{x} f_{i}\) and \(\{0\} \subseteq \arg \min \nolimits _{x} f_{i}\) for \(i = 1, 2, 3, 4\). Thus we can define the compact set \(\mathcal {X}:=B_{15}(0)\).
For constraint functions, we consider the pointwise maximum of a finite collection of quadratic functions developed in [17], i.e.,
where the \(A_i \) are \(n \times n\) matrices, \(B_i \in \mathbf R ^{ n} \), and \(C_i \in \mathbf R \) for \(i = 1, 2,\dots , n\). All the coefficients \(A_i,\,B_i\), and \(C_i\) are uniformly distributed in \([-5, 5]\) and generated randomly in MATLAB. Functions of the form (6.4) have several important practical advantages. Firstly, they are always lower-\(\mathcal {C}^2\) (semiconvex), prox-bounded, and prox-regular, but may be nonconvex since the \(A_i\) need not be positive semidefinite. Secondly, the sequence of convexification parameters \(\{\mu _{k}\}\) for \(c(\cdot )\) is fixed as
As a result, a sufficiently large convexification parameter \(\mu \) for \(c(\cdot )\) can be estimated in advance. Thirdly, many different examples are easily obtained simply by randomly generating \(A_i,\,B_i\) and \(C_i\) and choosing values for n and N. Lastly, inexact information on the constraint functions is easily obtained through the oracles (2.11) and (2.12).
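For illustration, the following Python sketch (an analogue of the MATLAB setup, with hypothetical dimensions n = N = 5 and a fixed seed) builds such a max-of-quadratics constraint, returns its exact value together with one subgradient, and estimates a safe convexification threshold from the eigenvalues of the \(A_i\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 5  # hypothetical sizes for this sketch

# Coefficients uniformly distributed in [-5, 5], as in the text; the A_i
# are symmetrized but not required to be positive semidefinite.
A = [(M + M.T) / 2 for M in (rng.uniform(-5, 5, (n, n)) for _ in range(N))]
B = [rng.uniform(-5, 5, n) for _ in range(N)]
C = [rng.uniform(-5, 5) for _ in range(N)]

def c_oracle(x):
    """Exact value and one Clarke subgradient of the pointwise maximum
    c(x) = max_i  x' A_i x + B_i' x + C_i."""
    vals = np.array([x @ A[i] @ x + B[i] @ x + C[i] for i in range(N)])
    i_star = int(np.argmax(vals))            # an active piece
    grad = 2.0 * A[i_star] @ x + B[i_star]   # gradient of that piece
    return vals[i_star], grad

# c + (mu/2)|.|^2 is convex once mu offsets the most negative curvature:
# mu >= max_i ( -2 * lambda_min(A_i) ).
mu_safe = max(0.0, max(-2.0 * np.linalg.eigvalsh(Ai)[0] for Ai in A))
```

This eigenvalue bound is precisely what makes these constraints convenient: the convexification parameter for \(c(\cdot )\) can be fixed before the run.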
We considered the following test problems
for \(i=1,\ldots ,4\) and \(n=5,\ldots ,15,20\). For each test, the optimal point is \(x^*=(0,\ldots ,0)\in \mathbf R ^n\), and hence each \(f_i\) is bounded below by 0. To check the precision, we report the objective function value at the last serious point. The remaining parameters are the same for all of these tests and are listed below:
-
stopping tolerance \(tol_{stop}=1.0\mathrm{e}-06\), initial starting point \(x_0=(1,\ldots ,1)\);
-
prox-parameter \(\eta _0 =5\), convexification parameter \(\mu _0=5\);
-
Armijo-like parameter \(m= 0.55\), positive constant \(\iota =2\);
-
penalty parameters \(s_0=15\) and \(t_0=0\);
-
initial variable prox-metric \(A_0=I\) (\(n\times n\) identity matrix).
6.3 Comparison with penalty proximal bundle method
In this subsection, we examine the performance of Algorithm 3.1 with inexact oracles. At each evaluation, we use inexact function values and subgradients \(f(y^j)-\sigma _{j},\,g_{f}^j= \tilde{g}_{f}^j+\varepsilon _{j}\) and \(c(y^j)-\sigma _{j},\,g_{c}^j= \tilde{g}_{c}^j+\varepsilon _{j}\), where \(\tilde{g}_{f}^j \in \partial f (y^j)\) and \(\tilde{g}_{c}^j \in \partial c(y^j)\). To provide a brief comparison with other research, we compare our results with those obtained by the penalty proximal bundle method (PPBM) of [55].
Our results for deterministic tests are summarized in the following tables, in which n denotes the problem dimension. The following notation is used:
Our numerical results for general nonconvex examples are reported in Tables 1, 2, 3, 4, 5, 6, 7 and 8, which show a reasonable performance of Algorithm 3.1.
We analyze the results in more detail as follows:
-
For the objective function \(f_{1}\), we tested 12 nonconvex examples with \(n=5,\ldots ,15,20\). We use constant noise \(\sigma _{j} = 0.01\) and \(\varepsilon _{j}=0.01\,*\,\texttt {ones}(n,1)\) for all j. Tables 1 and 2 show that the values of \(f_{final}\) obtained by Algorithm 3.1 are two orders of magnitude smaller than those of PPBM in most cases. Thus, Algorithm 3.1 succeeds in obtaining reasonably high accuracy, while spending less time in all cases. Besides, the percentage of serious steps (Ki) among the total iterations (Ni) is above 90% in many cases, which means that most iterations are reliable and Algorithm 3.1 is effective.
-
To introduce errors in the available information, we use random noises \(\sigma _j=0.1\,*\,\texttt {random}(\) ‘Normal’, 0, 0.1) and \(\varepsilon _j=0.1\,*\,\texttt {random}(\) ‘Normal’, 0, 0.1, n, 1), which generate random numbers from the normal distribution with mean 0 and standard deviation 0.1; the scalars n and 1 are the row and column dimensions. Tables 3 and 4 show the results of Algorithm 3.1 compared with PPBM for the objective function \(f_{2}\) with \(n=5,\ldots ,15,20\). We observe that Algorithm 3.1 needs only about half the CPU time of PPBM; in particular, for \(n=20\), the running time of our method is just one-third that of PPBM. We also see that Algorithm 3.1 always produces better (lower) objective values than PPBM for all examples.
-
For the objective function \(f_3\), we also tested 12 nonconvex examples with \(n=5,\ldots ,15,20\) from randomly generated initial points. For these nonconvex examples, we introduce random noises \(\sigma _j=0.1\,*\,\texttt {unifrnd}(0,1)\) and \(\varepsilon _j=0.1\,*\,\texttt {unifrnd}(0,1,n,1)\). The MATLAB code \(\texttt {unifrnd}(0,1,n,1)\) returns an n-dimensional column vector whose elements are drawn from the uniform distribution on the interval [0, 1]. From Tables 5 and 6, we observe that Algorithm 3.1 always produces better (lower) objective values than PPBM, while spending less time in most cases. Thus, Algorithm 3.1 performs better than PPBM for the objective function \(f_{3}\).
-
For the objective function \(f_4\), we also tested 12 nonconvex examples with \(n = 5,\ldots ,15,20\) from the fixed initial point \(x^0=\texttt {ones}(n,1)\). For these nonconvex examples, we introduce random noises \(\sigma _j= 0.1\,*\,\texttt {normrnd}(0,0.2)\) and \(\varepsilon _j= 0.1\,*\,\texttt {normrnd}(0,0.2,n,1)\). The MATLAB code \(\texttt {normrnd}(0,0.2,n,1)\) generates random numbers from the normal distribution with mean 0 and standard deviation 0.2, where the scalars n and 1 are the row and column dimensions. From Tables 7 and 8, our method succeeds in obtaining better objective values while requiring less time than PPBM. In most cases, the number of serious steps (Ki) is much larger than the number of null steps, which indicates that our inexact method is highly efficient.
6.4 Impact of noise on solution accuracy
To analyse the errors in the available information, we test five different types of inexact oracle:
-
\(\texttt {NN}\) (No noise): \(\sigma _{j}=\bar{\sigma }= 0\) and \(\varepsilon _j=\bar{\varepsilon }= 0\), for all j,
-
\(\texttt {CN}\) (Constant noise): \(\bar{\sigma }= \sigma _j = 0.01\) and \(\varepsilon _j=\bar{\varepsilon }=0.01\), for all j,
-
\(\texttt {VN}\) (Vanishing noise): \(\bar{\sigma }= 0.01,\,\sigma _j=\min \{0.01, \frac{\Vert y^j\Vert }{100}\},\,\varepsilon _j=\min \{0.01,\frac{\Vert y^j\Vert }{100}\}\), for all j,
-
\(\texttt {CGN}\) (Constant Gradient noise): \(\bar{\sigma }=\sigma _j=0\) and \(\varepsilon _j=\bar{\varepsilon }=0.05\), for all j,
-
\(\texttt {VGN}\) (Vanishing Gradient noise): \(\bar{\sigma }=\sigma _j=0\) and \(\varepsilon _j=\min \{0.01,\frac{\Vert y^j\Vert }{100}\}\), for all j.
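The five regimes above amount to a thin wrapper around an exact oracle. The following Python sketch (an illustrative analogue of the MATLAB setup; the function names are hypothetical) implements them, returning \(f(y^j)-\sigma _j\) and \(g^j+\varepsilon _j\) as in Section 6.3:

```python
import numpy as np

def noisy_oracle(f_exact, g_exact, noise="CN"):
    """Wrap an exact oracle into one of the five inexactness regimes of
    Section 6.4, returning (f(y) - sigma_j, g(y) + eps_j)."""
    def oracle(y):
        y = np.asarray(y, dtype=float)
        if noise == "NN":                      # no noise
            sig, eps = 0.0, 0.0
        elif noise == "CN":                    # constant noise
            sig, eps = 0.01, 0.01
        elif noise == "VN":                    # vanishing noise
            sig = eps = min(0.01, np.linalg.norm(y) / 100.0)
        elif noise == "CGN":                   # constant gradient noise
            sig, eps = 0.0, 0.05
        elif noise == "VGN":                   # vanishing gradient noise
            sig, eps = 0.0, min(0.01, np.linalg.norm(y) / 100.0)
        else:
            raise ValueError(noise)
        return f_exact(y) - sig, g_exact(y) + eps * np.ones_like(y)
    return oracle
```

Only the wrapper changes between the regimes, so the same bundle code can be run against all five.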
Since the optimal value of all the test problems is zero, we can use the formula Precision\(=|\log _{10}(f_{final})|\) to check the performance of the various methods. Figures 1 and 2 report the average performance of Algorithm 3.1 when noise is introduced (the results are averaged over 10 runs). To better reveal the influence of noise, we also report the results of NN (no noise). Figure 1 reports the results for constant noise (CN and CGN) with \(\texttt {tol}=10^{-6}\). We observe that Algorithm 3.1 achieves acceptable accuracy for noise of type CN. The results for CGN are better than those for CN, although still noticeably worse than when exact calculations are available. Figure 2 reports the precision of Algorithm 3.1 for vanishing noise (VN and VGN). Vanishing noise reduces the attainable accuracy for \(\texttt {tol}= 10^{-6}\), but the results are generally better than in the constant noise cases.
6.5 Comparison with FSQP-GS and SLQP-GS
In this subsection, we compare the practical effectiveness of Algorithm 3.1 with a sequential quadratic programming algorithm (SLQP-GS) [4] and a feasible SQP-GS algorithm (FSQP-GS) [52]. We test the following two examples, taken from [4, 45], respectively. In all numerical experiments of this subsection, we choose the same initial points as FSQP-GS in [52].
To analyse the errors in the available information, we consider four different types of inexact oracle:
-
Constant noise (CN): \(\sigma _{j} = 0.01\) and \(\varepsilon _{j}=0.01*\texttt {ones}(n,1)\) for all j;
-
\(\sigma _j=0.1*\texttt {random}\) (‘Normal’, 0, 0.1) and \(\varepsilon _j=0.1*\texttt {random}\) (‘Normal’,0, 0.1, n, 1) for all j;
-
\(\sigma _j=0.1*\texttt {unifrnd}(0,1)\) and \(\varepsilon _j=0.1*\texttt {unifrnd}(0,1,n,1)\) for all j;
-
\(\sigma _j= 0.1*\texttt {normrnd}(0,0.2)\) and \(\varepsilon _j= 0.1*\texttt {normrnd}(0,0.2,n,1)\) for all j.
Example 1
Constrained nonsmooth Rosenbrock problem [4]:
The solution of this problem is \(x^*=(\frac{\sqrt{2}}{2},\frac{1}{2})\) at which both the objective function and the constraint function are nondifferentiable. In this numerical experiment, we set the initial points \((0.066661, -0.350366)^\mathrm{T}\) and \((0.433746,-1.447530)^\mathrm{T}\) respectively.
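Since the displayed problem is not reproduced above, the following Python sketch writes it in the form given in [4] (this formulation, with weight 8 in the objective, is an assumption taken from that reference) and evaluates it at \(x^*\):

```python
import numpy as np

# Constrained nonsmooth Rosenbrock problem, in the form assumed from [4]:
#   minimize   8|x1^2 - x2| + (1 - x1)^2
#   subject to max(sqrt(2)*x1, 2*x2) <= 1
def f(x):
    return 8.0 * abs(x[0]**2 - x[1]) + (1.0 - x[0])**2

def c(x):
    return max(np.sqrt(2.0) * x[0], 2.0 * x[1]) - 1.0

x_star = np.array([np.sqrt(2.0) / 2.0, 0.5])
# At x*: x1^2 = x2 (the absolute value kinks) and both constraint pieces
# are active (sqrt(2)*x1 = 2*x2 = 1), so f and c are nondifferentiable there.
print(f(x_star), c(x_star))   # ~0.0858, ~0.0
```

The double kink at \(x^*\) is what makes this example a stress test for nonsmooth constrained solvers.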
Example 2
Rosen–Suzuki problem [45]:
where \(f_1(x)=x^2_1+x^2_2+2x^2_3+x^2_4-5x_1-5x_2-21x_3+7x_4,\,f_2(x)=f_1(x)+10c_1(x),\,f_3(x)=f_1(x)+10c_2(x),\,f_4(x)=f_1(x)+10c_3(x),\,c_1(x) = x^2_1+x^2_2+x^2_3+x^2_4+x_1-x_2+x_3-x_4-8,\,c_2(x)=x^2_1+2x^2_2+x^2_3+2x^2_4-x_1-x_4-10,\,c_3(x)=x^2_1+x^2_2+x^2_3+2x_1-x_2-x_4-5\). In our numerical experiment, we set the initial points \((1, 1, 1, 1)^\mathrm{T}\) and \((0, 0, 0, 0)^\mathrm{T}\) respectively.
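The data above fully determine the problem; the minimax formulation (minimize \(\max \{f_1,\ldots ,f_4\}\) subject to \(c_i(x)\le 0\)) is assumed here, since the display is not reproduced. A Python sketch evaluating it at the classical optimal point \((0,1,2,-1)\) of the Rosen–Suzuki problem:

```python
# Rosen-Suzuki data exactly as given in the text.
def f1(x):
    x1, x2, x3, x4 = x
    return x1**2 + x2**2 + 2*x3**2 + x4**2 - 5*x1 - 5*x2 - 21*x3 + 7*x4

def c1(x):
    x1, x2, x3, x4 = x
    return x1**2 + x2**2 + x3**2 + x4**2 + x1 - x2 + x3 - x4 - 8

def c2(x):
    x1, x2, x3, x4 = x
    return x1**2 + 2*x2**2 + x3**2 + 2*x4**2 - x1 - x4 - 10

def c3(x):
    x1, x2, x3, x4 = x
    return x1**2 + x2**2 + x3**2 + 2*x1 - x2 - x4 - 5

def F(x):  # nonsmooth objective: max of the four smooth pieces
    return max(f1(x), f1(x) + 10*c1(x), f1(x) + 10*c2(x), f1(x) + 10*c3(x))

x_star = (0.0, 1.0, 2.0, -1.0)   # classical optimal point, value -44
print(F(x_star), c1(x_star), c2(x_star), c3(x_star))  # -44.0 0.0 -1.0 0.0
```

At \(x^*\) the constraints \(c_1\) and \(c_3\) are active, so the maximum in F is attained by three pieces simultaneously and the objective is nondifferentiable there.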
From Tables 9 and 10, we observe that Algorithm 3.1 (CN) needs fewer iterations than FSQP-GS and SLQP-GS for all examples. Tables 9 and 10 also show that Algorithm 3.1 (CN) always obtains objective values similar to those of SLQP-GS and FSQP-GS, while spending less time in most cases. In general, Algorithm 3.1 (CN) performs slightly better than FSQP-GS and SLQP-GS for all examples. Besides, we also consider random noises, generated by the MATLAB codes \(\texttt {random},\,\texttt {unifrnd}\) and \(\texttt {normrnd}\). From Tables 9 and 10, we observe that Algorithm 3.1 (rand) and Algorithm 3.1 (unifrnd) are comparable to FSQP-GS and SLQP-GS in most cases, while Algorithm 3.1 (normrnd) requires more time than FSQP-GS and SLQP-GS in some cases.
6.6 Results for semi-infinite programming problems
For benchmarking purposes, we investigate the computational behaviour and convergence results for SIP problems. We compare the performance of Algorithm 3.1 with constant noise against the central cutting plane algorithm (CCPA) [34] and the SIP solver fseminf from the MATLAB Optimization Toolbox. The SIP problem has the following abstract structure
where \(f(x){:}\,\mathbf R ^n \rightarrow \mathbf R \) is locally Lipschitz and not necessarily differentiable; g: \(\mathbf R ^n \times V \rightarrow \mathbf R \) is twice continuously differentiable; and V is a nonempty compact subset of \(\mathbf R ^n\).
We first consider the so-called lower level problem
to obtain the global solution. The difficulty lies in the fact that \(-g(\bar{x},v_{glob})\) is the globally optimal value of (6.6), which might be hard to obtain numerically. In fact, most standard nonlinear programming solvers can only be expected to output a local minimizer \(v_{loc}\). Many works aim at constructing a sequence of convexifications of the lower level problem, using the techniques in [13, 50], so as to solve auxiliary problems with convex lower levels. In fact, the function c only needs to be evaluated inexactly with a given fixed accuracy \(\omega \). That is, we only need an approximate solution \(v(\bar{x})\) such that \(c(\bar{x},v(\bar{x}))\ge c(\bar{x})-\omega \). The iterative rules in [14, 19] can generate more and more accurate solutions to (6.6), until an \(\omega \)-optimal solution is found.
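One simple way to produce such an \(\omega \)-optimal inner solution (not the refinement rules of [14, 19], which are not reproduced here) is uniform gridding of a one-dimensional index set V under an assumed Lipschitz bound, sketched in Python:

```python
import numpy as np

def inner_max(g, xbar, V=(0.0, 1.0), omega=1e-3, L=10.0):
    """Approximately solve max_{v in V} g(xbar, v) to accuracy omega by
    uniform gridding, assuming L is a Lipschitz bound for g(xbar, .) on
    the interval V (a hypothetical bound; problem-specific in practice).
    A grid spacing of at most 2*omega/L guarantees the returned point
    v(xbar) satisfies g(xbar, v(xbar)) >= max_v g(xbar, v) - omega."""
    a, b = V
    m = int(np.ceil(L * (b - a) / (2.0 * omega))) + 1
    grid = np.linspace(a, b, m)
    vals = np.array([g(xbar, v) for v in grid])
    return grid[int(np.argmax(vals))]
```

Halving \(\omega \) doubles the grid size, so in practice adaptive refinement (as in [14, 19]) is preferred; the guarantee delivered is the same.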
The numerical results are listed in Table 11, where the following notation is used:
Example 3
Finding the Chebyshev approximation of a function; this problem has been tested by Floudas and Stein [13, Example 5.1]:
Example 4
The following SIP problem was tested by Kortanek and No [34] and stems from Tichatschke and Nebeling [53]:
Example 5
This problem is discussed by Goberna and López [16] and Zhang and Wu [56]:
Although the feasible set is not bounded, the objective function is level bounded on the feasible set.
Example 6
Consider the following problem, which was tested by Kortanek and No [34]:
Example 7
The following convex SIP problem has been tested by Kortanek and No [34]:
Observe that the problem satisfies the Slater CQ and the feasible set is bounded.
The numerical results for Examples 3–7 are reported in Table 11. One can observe that Algorithm 3.1 performs well and provides an optimal solution to each example. Besides, the inexact solution of subproblem (6.6) can be obtained effectively, which means the inexact oracle is reasonable. To obtain a similar optimal solution, the SIP solver fseminf and the CCPA took more time than our method. In our opinion, the performance of Algorithm 3.1 on Examples 3–7 is better than that of the CCPA and the SIP solver fseminf.
References
Ackooij, W., Sagastizábal, C.: Constrained bundle methods for upper inexact oracles with application to joint chance constrained energy problems. SIAM J. Optim. 24, 733–765 (2014)
Apkarian, P., Noll, D., Prot, O.: A proximity control algorithm to minimize nonsmooth and nonconvex semi-infinite maximum eigenvalue functions. J. Convex Anal. 16, 641–666 (2009)
Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization: Theory and Examples, 2nd edn. Springer, Berlin (2006)
Curtis, F.E., Overton, M.L.: A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization. SIAM J. Optim. 22, 474–500 (2012)
d’Antonio, G., Frangioni, A.: Convergence analysis of deflected conditional approximate subgradient methods. SIAM. J. Optim. 20, 357–386 (2009)
de Oliveira, W., Sagastizábal, C., Scheimberg, S.: Inexact bundle methods for two-stage stochastic programming. SIAM J. Optim. 21, 517–544 (2011)
de Oliveira, W., Sagastizábal, C., Lemaréchal, C.: Convex proximal bundle methods in depth: a unified analysis for inexact oracles. Math. Program. 148, 241–277 (2014)
Daniilidis, A., Georgiev, P.: Approximate convexity and submonotonicity. J. Math. Anal. Appl. 291, 292–301 (2004)
Emiel, G., Sagastizábal, C.: Incremental-like bundle methods with application to energy planning. Comput. Optim. Appl. 46, 305–332 (2010)
Fábián, C., Szöke, Z.: Solving two-stage stochastic programming problems with level decomposition. Comput. Manag. Sci. 4, 313–353 (2007)
Ferrier, C.: Bornes Duales de Problémes d’Optimisation Polynomiaux, Ph.D. thesis. Laboratoire Approximation et Optimisation, Université Paul Sabatier, Toulouse (1997)
Ferrier, C.: Computation of the distance to semi-algebraic sets. ESAIM Control Optim. Calc. Var. 5, 139–156 (2000)
Floudas, C.A., Stein, O.: The adaptive convexification algorithm: a feasible point method for semi-infinite programming. SIAM J. Optim. 18, 1187–1208 (2007)
Fuduli, A., Gaudioso, M., Giallombardo, G.: A DC piecewise affine model and a bundling technique in nonconvex nonsmooth optimization. Optim. Methods Softw. 19, 89–102 (2004)
Fuduli, A., Gaudioso, M., Giallombardo, G.: Minimizing nonconvex nonsmooth functions via cutting planes and proximity control. SIAM J. Optim. 14, 743–756 (2004)
Goberna, M.A., López, M.A.: Linear Semi-Infinite Optimization. Wiley, New York (1998)
Hare, W., Sagastizábal, C.: Computing proximal points of nonconvex functions. Math. Program. 116, 221–258 (2009)
Hare, W., Sagastizábal, C.: A redistributed proximal bundle method for nonconvex optimization. SIAM J. Optim. 20, 2442–2473 (2010)
Hare, W., Sagastizábal, C., Solodov, M.: A proximal bundle method for nonconvex functions with inexact oracles. Comput. Optim. Appl. 63, 1–28 (2016)
Hintermüller, M.: A proximal bundle method based on approximate subgradients. Comput. Optim. Appl. 20, 245–266 (2001)
Hiriart-Urruty, J.-B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms. II, Volume 306 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], Springer, Berlin (1993). Advanced theory and bundle methods
Jongen, H.T., Rückmann, J.-J., Stein, O.: Generalized semi-infinite optimization: a first order optimality condition and examples. Math. Program. 83, 145–158 (1998)
Karas, E., Ribeiro, A., Sagastizábal, C., Solodov, M.: A bundle-filter method for nonsmooth convex constrained optimization. Math. Program. 116, 297–320 (2009)
Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization, Lecture Notes in Mathematics. Springer, Berlin (1985)
Kiwiel, K.C.: An algorithm for nonsmooth convex minimization with errors. Math. Comput. 45, 171–180 (1985)
Kiwiel, K.C.: A linearization algorithm for nonsmooth minimization. Math. Oper. Res. 10, 185–194 (1985)
Kiwiel, K.C.: An exact penalty function algorithm for nonsmooth convex constrained minimization problems. IMA J. Numer. Anal. 5, 111–119 (1985)
Kiwiel, K.C.: Exact penalty functions in proximal bundle methods for constrained convex nondifferentiable minimization. Math. Program. 52, 285–302 (1991)
Kiwiel, K.C.: Restricted step and Levenberg–Marquardt techniques in proximal bundle methods for nonconvex nondifferentiable optimization. SIAM J. Optim. 6, 227–249 (1996)
Kiwiel, K.C.: Convergence of approximate and incremental subgradient methods for convex optimization. SIAM. J. Optim. 14, 807–840 (2004)
Kiwiel, K.C.: A proximal bundle method with approximate subgradient linearizations. SIAM. J. Optim. 16, 1007–1023 (2006)
Kiwiel, K.C.: A method of centers with approximate subgradient linearizations for nonsmooth convex optimization. SIAM J. Optim. 18, 1467–1489 (2008)
Kiwiel, K.C., Lemaréchal, C.: An inexact bundle variant suited to column generation. Math. Program. 118, 177–206 (2009)
Kortanek, K.O., No, H.: A central cutting plane algorithm for convex semi-infinite programming problems. SIAM J. Optim. 3, 901–918 (1993)
Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69, 111–147 (1995)
Lukšan, L., Vlček, J.: A bundle-Newton method for nonsmooth unconstrained minimization. Math. Program. 83, 373–391 (1998)
Lv, J., Pang, L.P., Wang, J.H.: Special backtracking proximal bundle method for nonconvex maximum eigenvalue optimization. Appl. Math. Comput. 265, 635–651 (2015)
Mifflin, R.: An algorithm for constrained optimization with semismooth functions. Math. Oper. Res. 2, 191–207 (1977)
Mifflin, R.: A modification and extension of Lemarechal’s algorithm for nonsmooth minimization. Math. Program. Stud. 17, 77–90 (1982)
Mifflin, R.: A quasi-second-order proximal bundle algorithm. Math. Program. 73, 51–72 (1996)
Nedić, A., Bertsekas, D.P.: The effect of deterministic noise in subgradient methods. Math. Program. 125, 75–99 (2010)
Noll, D.: Bundle method for non-convex minimization with inexact subgradients and function values. Comput. Anal. Math. 50, 555–592 (2013)
Pang, L.P., Lv, J.: Constrained incremental bundle method with partial inexact oracle for nonsmooth convex semi-infinite programming problems. Comput. Optim. Appl. 64, 433–465 (2016)
Rockafellar, R.T., Wets, J.J.-B.: Variational Analysis. Springer, Berlin (1998)
Rustem, B., Nguyen, Q.: An algorithm for the inequality-constrained discrete minimax problem. SIAM J. Optim. 8, 265–283 (1998)
Sagastizábal, C., Solodov, M.: An infeasible bundle method for nonsmooth convex constrained optimization without a penalty function or a filter. SIAM J. Optim. 16, 146–169 (2005)
Solodov, M.V., Zavriev, S.K.: Error stability properties of generalized gradient-type algorithms. J. Optim. Theory Appl. 98, 663–680 (1998)
Solodov, M.V.: On approximations with finite precision in bundle methods for nonsmooth optimization. J. Optim. Theory Appl. 119, 151–165 (2003)
Spingarn, J.E.: Submonotone subdifferentials of Lipschitz functions. Trans. Am. Math. Soc. 264, 77–89 (1981)
Stein, O.: Bi-Level Strategies in Semi-Infinite Programming. Kluwer, Boston (2003)
Stein, O.: On constraint qualifications in nonsmooth optimization. J. Optim. Theory. Appl. 121, 647–671 (2004)
Tang, C.M., Liu, S., Jian, J.B., Li, J.L.: A feasible SQP-GS algorithm for nonconvex, nonsmooth constrained optimization. Numer. Algor. 65, 1–22 (2014)
Tichatschke, R., Nebeling, V.: A cutting plane method for quadratic semi-infinite programming problems. Optimization. 19, 803–817 (1988)
Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable functions. In: Balinski, M.L., Wolfe, P. (eds.) Nondifferentiable Optimization. Math. Program. Stud., 3, pp. 145–173. North-Holland, Amsterdam (1975)
Yang, Y., Pang, L.P., Ma, X.F., Shen, J.: Constrained nonconvex nonsmooth optimization via proximal bundle method. J. Optim. Theory Appl. 163, 900–925 (2014)
Zhang, L.P., Wu, S.-Y., López, M.A.: A new exchange method for convex semi-infinite programming. SIAM J. Optim. 20, 2959–2977 (2010)
Acknowledgements
The authors wish to thank Editor-in-Chief Sergiy Butenko, the preceding managing editors and the anonymous reviewers for their helpful comments on the earlier version of this paper, which considerably improved both the presentation and the numerical experiments. We also gratefully acknowledge the support of the Huzhou science and technology plan on No. 2016GY03 and Natural Science Foundation of China Grant 11626051.
Lv, J., Pang, LP. & Meng, FY. A proximal bundle method for constrained nonsmooth nonconvex optimization with inexact information. J Glob Optim 70, 517–549 (2018). https://doi.org/10.1007/s10898-017-0565-2
Keywords
- Constrained optimization
- Nonconvex optimization
- Nonsmooth optimization
- Inexact oracle
- Proximal bundle method