Abstract
We introduce a new flexible inexact-restoration algorithm for constrained optimization problems. In inexact-restoration methods, each iteration has two phases. The first phase aims to improve feasibility and the second phase aims to minimize a suitable objective function. In the second phase, we also impose bounded deterioration of the feasibility obtained in the first phase. Here, we combine the basic ideas of the Fischer–Friedlander approach to inexact restoration with the use of approximations of the Lagrange multipliers. We present a new option for obtaining a range of search directions in the optimization phase, and we employ the sharp Lagrangian as merit function. Furthermore, we introduce a flexible way to handle sufficient decrease requirements and an efficient way to deal with the penalty parameter. Global convergence of the new inexact-restoration method to KKT points is proved under weak constraint qualifications.
1 Introduction
Practical methods for solving nonlinearly constrained optimization problems are iterative. Given an iterate, one tries to find a better approximation to the solution taking into account feasibility and optimality criteria. Several traditional nonlinear programming methods preserve feasibility of all the iterates, so that the improvement criterion relies only on objective function values [1–9]. These methods are very effective in many cases, but they tend to be slow in the presence of constraints with high curvature. Modern inexact-restoration (I-R) methods, on the other hand, admit infeasibility of iterates and employ inexact nonspecific restoration procedures, which may be adapted to different classes of problems. See [10–23].
The common features of the I-R methods mentioned above are the following:
- Given the current iterate, a sufficiently more feasible point (inexactly restored point) is computed by any arbitrary method. This is the Feasibility Phase of the I-R iteration.
- At the Optimality Phase of the I-R iteration, one minimizes (approximately) a suitable objective function (the Lagrangian or some approximation) for obtaining a trial point. The trial point is compared with the current iterate in terms of feasibility and optimality (using merit functions as in [10, 11, 19], or filters, as in [12, 18]). If the trial point is acceptable, we define it as the new iterate. Otherwise, we find a new trial point “closer” to the inexactly restored point. The fact that the new trial point should be “closer” to the inexactly restored point, instead of closer to the current iterate, is crucial in the philosophy of I-R methods, because ultimately the inexactly restored point is considered to be better than the current iterate.
The first I-R methods [10, 11] motivated the introduction of a sequential optimality condition called AGP (approximate gradient projection) [24]. For several years, it was thought that AGP was equivalent to the approximate KKT condition (AKKT) implicitly used to define stopping criteria in nonlinear programming algorithms. Surprisingly, it was recently shown [25] that AGP is stronger than AKKT; moreover, one of the AGP variations, called linear AGP (L-AGP), is stronger than AGP. These theoretical facts support one of the decisions taken in our I-R method: When linear constraints are present in the problem, feasibility with respect to these constraints should be satisfied at all the iterations.
Fischer and Friedlander [19] introduced a new line search by means of which an I-R method exhibits interesting global convergence properties. The resulting approach turns out to be simpler than more classical approaches based on trust regions or filters. The main convergence result of [19] states that under proper assumptions, the search direction, used at the optimality phase, tends to zero. This result complements the suggestion of Gomes-Ruggiero, Martínez and Santos [17] who claim that by means of the spectral gradient choice [26–30] of the first optimality-phase trial point, I-R turns out to be the natural generalization to nonconvex and constrained optimization of the spectral projected gradient (SPG) method [31–33], which is currently used for large-scale convex and constrained minimization.
In [17], spectral gradient tangent directions and the trust-region globalization approach of [10] were invoked for proving convergence. With the aim of getting closer to quadratic convergence [13], we use here a search direction that comes from solving a strictly convex Newton-related quadratic minimization problem. The resulting method is the natural generalization of the variable metric method [34] for nonlinear programming in the same sense that the method [17] is a generalization of SPG.
In this work, we introduce an improved line search I-R algorithm based on the Fischer–Friedlander approach. The proposed method is more flexible than the approaches in [10, 11, 19], when dealing with sufficient decrease. Several typical I-R algorithmic requirements are relaxed and, therefore, larger steps are more likely to be accepted by the algorithm. We also employ approximations of Lagrange multipliers in the merit function in order to add relevant information when solving the problem. An optimization procedure for general constrained optimization problems is introduced.
This paper is organized as follows. In Sect. 2, we introduce the new I-R method and we prove that it is well defined. In Sect. 3, we state the assumptions that will allow us to prove convergence starting from arbitrary initial points, and we prove feasibility and optimality of the limit points. The plausibility of the assumptions used to prove convergence is discussed in Sect. 4. Conclusions are given in Sect. 5.
2 Basic Method
In this section, we consider the constrained optimization problem in the form:
where \(\Omega \subset \mathbb {R}^n\) is a polytope, \(f: \mathbb {R}^n \rightarrow \mathbb {R}\), \(h: \mathbb {R}^n \rightarrow \mathbb {R}^m\), and both \(f\) and \(h\) are smooth. Every smooth, finite-dimensional constrained optimization problem may be written in the form (1), using slack variables if necessary.
For all \(x \in \Omega \) and \( \lambda \in \mathbb {R}^m\), we define the Lagrangian \(L(x, \lambda )\) by
Given a penalty parameter \( \theta \in [0, 1]\), we consider, for all \(x \in \Omega , \lambda \in \mathbb {R}^m\), the following merit function that consists of a convex combination of the Lagrangian function and the infeasibility measure:
Note that, when \(\theta =1\), the merit function coincides with the Lagrangian function and, when \(\theta =0\), it coincides with the infeasibility measure. In these cases, we are neglecting either the feasibility or the objective function at the point \(x\). In our method, \(\theta \) is always smaller than one, and we will prove that it is also bounded away from zero. In this paper, the symbol \(\Vert \cdot \Vert \) denotes the Euclidean norm, although it can often be replaced by an arbitrary norm.
The merit function (2) is essentially the sharp Lagrangian defined in [35] and employed in [36–38], in the context of nonsmooth optimization, and in [11] in the I-R framework.
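In this notation, the problem, the Lagrangian, and the merit function take the following standard forms; this is a sketch consistent with the description above, and the symbol \(\Phi\) for the merit function is our own, introduced here only for reference:

```latex
\min_{x \in \Omega} \; f(x) \quad \text{subject to} \quad h(x) = 0,
\qquad
L(x,\lambda) := f(x) + \lambda^{T} h(x),
```
```latex
\Phi(x,\lambda,\theta) := \theta\, L(x,\lambda) + (1-\theta)\, \Vert h(x)\Vert ,
\qquad \theta \in [0,1].
```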
Algorithm 2.1 below is a flexible form of the I-R method with the line search procedure of [19]. One improvement here with respect to [19] is that we are able to use Lagrange multipliers in the algorithm (thus employing (2)) preserving convergence. Moreover, sufficient descent requirements are relaxed so that acceptance of “Pure Newtonian” steps will be more frequent.
Algorithm 2.1
Flexible inexact-restoration with sharp Lagrangian.
Step 0 Initialization
As initial approximation we choose, arbitrarily, \(x^0 \in \Omega \) and \(\lambda ^{0} \in \mathbb {R}^m\). We initialize \(\theta _{-1} \in \; ]0, 1[\) and \(k \leftarrow 0\).
Step 1 Restoration step
If \(\Vert h(x^k)\Vert =0\), define \(y^{k+1} := x^k\). Otherwise, try to compute \(y^{k+1} \in \Omega \) such that
If this is not possible, we declare Step 1 unsuccessful and the algorithm stops.
Step 2 Estimation of Lagrange Multipliers
Compute \(\lambda ^{k+1} \in \mathbb {R}^m\). (Practical effective ways for this choice will be given later.)
Step 3 Descent direction
Compute \(d^k \in \mathbb {R}^n\) such that \(y^{k+1} + d^k \in \Omega \),
and, for some \(c_k > 0\) and all small enough \(t > 0\):
Step 4 Globalization
Step 4.1 Sufficient decrease parameters
If \(\Vert h(x^k)\Vert = \Vert h\big (y^{k+1}\big )\Vert = 0\), take \(r_k \in \; ]0,1[\). Else, define \(r_k \ne 0\) such that
Set
Step 4.2 Penalty parameter computation
Compute \(\theta _k\) as the supremum of the values of \(\theta \in [0, \theta _{k-1}]\) such that
Step 4.3 Line search
If \(\Vert d^k\Vert =0\) we define \(t_k := 1\).
Otherwise, by means of some backtracking procedure, compute \(t_k \in [0, 1]\) such that
and
Moreover, the backtracking procedure should be such that either \(t_k = 1\) or there exists \(\bar{t}_k \in [t_k, 10 t_k]\) such that
or
Step 4.4 Iteration update
Set
update \(k \leftarrow k+1\) and go to Step 1.
We have defined Algorithm 2.1 without an explicit stopping criterion, which, of course, is necessary in practical implementations. This means that, in principle, the algorithm generates an infinite sequence whose theoretical properties are proved in Sects. 2 and 3. In Sect. 5, we define suitable criteria for stopping the execution of the algorithm.
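To make the structure of an iteration concrete, the following is a minimal sketch on a toy problem. All concrete choices below (Gauss–Newton restoration, a projected-gradient direction, \(\lambda \equiv 0\), a fixed penalty parameter, plain halving backtracking against the merit function) are simplifying assumptions for illustration, not the recommended ingredients of Algorithm 2.1:

```python
import numpy as np

# Toy problem: minimize f(x) = x1 + x2 subject to h(x) = x1^2 + x2^2 - 1 = 0,
# with Omega = R^2. The restoration, direction, multiplier, and penalty
# choices here are illustrative simplifications of Algorithm 2.1.

def f(x): return x[0] + x[1]
def h(x): return np.array([x[0] ** 2 + x[1] ** 2 - 1.0])
def grad_f(x): return np.array([1.0, 1.0])
def jac_h(x): return np.array([[2.0 * x[0], 2.0 * x[1]]])  # rows: grad h_i^T

def merit(x, lam, theta):
    """Sharp Lagrangian: theta * L(x, lam) + (1 - theta) * ||h(x)||."""
    return theta * (f(x) + lam @ h(x)) + (1.0 - theta) * np.linalg.norm(h(x))

def ir_step(x, lam, theta):
    # Feasibility phase (Step 1): Gauss-Newton iterations on h(y) = 0.
    y = x.copy()
    for _ in range(10):
        J = jac_h(y)
        y = y - J.T @ np.linalg.solve(J @ J.T, h(y))
    # Optimality phase (Step 3): steepest descent for the Lagrangian,
    # projected on the tangent space {d : grad h(y)^T d = 0}.
    J = jac_h(y)
    g = grad_f(y) + J.T @ lam
    d = -(g - J.T @ np.linalg.solve(J @ J.T, J @ g))
    # Line search (Step 4.3, simplified): accept t when the merit value at the
    # trial point y + t d improves on its value at the current iterate x.
    t = 1.0
    while merit(y + t * d, lam, theta) > merit(x, lam, theta) and t > 1e-12:
        t *= 0.5
    return y + t * d

x = np.array([2.0, 0.5])
lam = np.zeros(1)
for _ in range(30):
    x = ir_step(x, lam, theta=0.5)
# x approaches the solution (-1/sqrt(2), -1/sqrt(2)).
```

On this example, the trial point is always compared with the current iterate \(x^k\), not with the restored point, and the full step is accepted at every iteration; in the actual algorithm, \(\theta_k\), \(\lambda^{k+1}\), and the acceptance tests follow Steps 2 and 4.1–4.3.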
Algorithm 2.1 is a simple inexact-restoration algorithm, where even the condition of \(d^k\) being in the tangent subspace is relaxed. However, it is interesting to show that this simplified method is well defined in the sense that an iteration can be completed if the inexactly restored point \(y^{k+1}\) can be computed at every iteration. In the following theorem, we do not make use of Lipschitz conditions at all, neither on the objective function, nor on the constraints.
Theorem 2.1
For all \(x^k \in \Omega \), if the point \(y^{k+1}\) at Step 1 of Algorithm 2.1 is successfully computed, then the iterate \(x^{k+1}\) is well defined.
Proof
By the hypothesis, we have that Step 1 is well defined. Steps 2 and 3 can be completed observing that \(\lambda ^{k+1}=0 \in \mathbb {R}^m\) and \(d^k=0 \in \mathbb {R}^n\) are always acceptable. Although these options will not be our choice, this ensures that both steps are well defined. We will show now that the other computations required in the iteration are possible, whenever \(\Vert h\big (y^{k+1}\big )\Vert < \Vert h(x^k)\Vert \) or \(y^{k+1} = x^k\).
Computation of \(\theta _k\): If \(y^{k+1} = x^k\), we have that \(\Vert h\big (y^{k+1}\big )\Vert = \Vert h(x^k)\Vert = 0\) and, thus, \(L\big (x^k,\lambda \big )=L\big (y^{k+1},\mu \big )\) for all \(\lambda ,\mu \in \mathbb {R}^m\). This means that (8) is obvious for all \(\theta \) and so \(\theta _{k}=\theta _{k-1}\).
If \(y^{k+1} \ne x^k\), by (3), we have that \(\Vert h(y^{k+1})\Vert < \Vert h(x^k)\Vert \) and, since \(0\le s_k <1\),
Therefore, for small enough \(\theta \), we have that
This implies that, for small enough \(\theta > 0\), (8) takes place. By direct calculations, we note that the parameter \(\theta _k\) can be computed in the following way: If (8) is verified with \(\theta = \theta _{k-1}\), then take \(\theta _k = \theta _{k-1}\). Otherwise, we compute
Computation of \(t_k\): We only need to consider the case in which \(\Vert d^k\Vert \ne 0\).
By (4), we have that (9) necessarily holds for small enough \(t\). Let us prove now that (10) also holds for small \(t\).
Assume first that \(y^{k+1} = x^k\), then, \(\Vert h(y^{k+1})\Vert = \Vert h(x^k)\Vert = 0\) and \(L(x^k,\lambda )=f(x^k)\) for all \(\lambda \in \mathbb {R}^m\). Thus
and, by (5),
Now, by (4), for small enough \(t\), we have that
and
By (15), (16), (17), and (18), for small enough \(t\), we have that
So we conclude that (10) holds for small enough \(t\) when \(y^{k+1} = x^k\).
Now, let us show that \(t_k\) is well defined in the case that \(\Vert h(y^{k+1})\Vert < \Vert h(x^k)\Vert \). In this case, by (7), we have that
By continuity,
for small enough \(t\). This ensures that \(t_k\) can be computed in a finite time by means of a backtracking procedure. This completes the proof.
Observe that (10) is easier to fulfill if \(r_k\) is chosen close to \(1\) in (6). This means that the chances of accepting large steps \(t_k\) increase if we choose \(r_k \approx 1\). Moreover, the gap in (19) becomes larger as \(s_k\) gets closer to zero, which also favors the acceptance of large steps. On the other hand, \(\theta _k\), the weight of the Lagrangian in the merit function, becomes larger as \(s_k\) gets closer to \(r_k\). Since the direction \(d^k\) is a descent direction for the Lagrangian, it is natural that large values of \(\theta _k\) increase the chances of accepting large steps. We believe that the best choice of \(s_k\) strongly depends on the nonlinearity of the constraints in (1). For simplicity, in our implementation we take the largest gap in (19) by setting \(s_k=0\) for all \(k\). This flexible way of handling sufficient decrease allows the algorithm to take larger steps than those in [19], where \(r_k\) and \(s_k\) are equal and fixed for every \(k\).
Theorem 2.1 showed that, for well-definedness, the following conditions are essential: (a) simple descent of \(\Vert h(y^{k+1})\Vert \) with respect to \(\Vert h(x^k)\Vert \); (b) the direction \(d^k\) should be quasi-tangent (5); and (c) \(d^k\) must be a descent direction for the Lagrangian \(L\).
3 Assumptions and Global Convergence
The global convergence proof for Algorithm 2.1 requires a problem assumption (3.1) and several algorithmic assumptions (3.2–3.6).
Assumption 3.1
The problem should be such that Lipschitz conditions hold on the gradients of both \(f\) and \(h\). Namely, there exists \( \eta > 0\) such that, for all \(x, y \in \Omega \),
The algorithmic assumptions involve conditions that should be satisfied by the different steps of the algorithm in order to guarantee global convergence. For the introduction of these assumptions, we will denote the Euclidean projection of \(x\) on a non-empty, closed, and convex set \(\Omega \) by \(P_\Omega (x)\) and \(\{x^k\}, \{y^{k}\}, \{\lambda ^{k}\}\) will be the sequences generated by Algorithm 2.1. Note that these sequences have infinitely many terms, except in the case when the restoration step fails and (3) cannot be obtained for some \(k\).
Assumption 3.2
For all \(k \in \mathbb {N}\), Step 1 of the Algorithm is successful and there exists \(r \in [0, 1]\) and \(\beta > 0\) such that
and
Conditions (21) and (22) involve the restoration step. Condition (21) states that a sufficient uniform feasibility improvement should be obtained by the restoration procedure. In most I-R implementations, \(r\) is an algorithmic parameter, and the algorithm stops declaring “restoration failure” if the restoration procedure fails to satisfy (21) after reasonable computer time. Here we adopt the practical point of view that even conservative algorithmic parameters \(r\) (say \(r = 0.99\)) could be excessively strict at some iterations of the algorithm, which, however, could converge smoothly under simple descent requirements such as (3). On the other hand, the requirement (21) for some unknown value of \(r\) is usually satisfied under regularity assumptions on the constraints. Since \(y^{k+1}=x^k\) if \(\Vert h(x^k)\Vert =0\), and \(r_k\) is defined as in (6) if \(y^{k+1}\ne x^k\), condition (21) implies that
Condition (22) requires that the deterioration of the Lagrangian at \((y^{k+1}, \lambda ^{k+1})\) should be smaller than a multiple of the infeasibility \(\Vert h(x^k)\Vert \). If the Lagrange multipliers estimates are bounded, and under suitable Lipschitz assumptions, condition (22) is implied by \(\Vert y^{k+1} - x^k\Vert = O(\Vert h(x^k)\Vert )\). This means that the distance between \(x^k\) and the restored point \(y^{k+1}\) should be proportional to the infeasibility measure at \(x^k\). Providing suitable safeguarding parameters \(\beta \) is even harder than in the case of (21), because (22) is scale-dependent. In practical terms, Assumption 3.2 says that we believe that the restoration algorithm employed is reasonable enough so that sufficient improvement and bounded-distance requirements are automatically satisfied.
Assumption 3.3
For all \(k \in \mathbb {N}\), we choose \(d^k\) such that \(y^{k+1} + d^k \in \Omega \) and \(\nabla h(y^{k+1})^T d^k = 0\). Moreover, we assume that there exists \(\sigma > 0\) such that
and
for all \(k \in \mathbb {N}\), where \(P_k\) denotes the Euclidean projection on the polytope defined by \(y \in \Omega \) and \(\nabla h(y^{k+1})^T (y-y^{k+1}) = 0\). As we will see in Sect. 4, there is no loss of generality in the use of the same \(\sigma \) in (24) and (25).
In practice, the direction \(d^k\) will be obtained as the solution of the following problem:
where \(H_k\) is an approximation to the Hessian of the Lagrangian. Since \(\Omega \) is a polytope, we have that (26)–(27) is a quadratic programming problem. Note that solving (26)–(27) is equivalent to minimizing \(\frac{1}{2} d^T H_k d + \nabla f(y^{k+1})^T d\) subject to (27).
Note that the requirement (24) is not necessarily satisfied by the solution of (26)–(27), unless we impose some additional conditions on \(H_k\). We will show in Sect. 4 that Assumption 3.3 holds if the eigenvalues of \(Z_k^T H_k Z_k\) lie in a positive interval \([\sigma _\mathrm{min}, \sigma _\mathrm{max}]\), where the columns of \(Z_k\) form an orthonormal basis of the null-space of \(\nabla h(y^{k+1})^T\). In implementations, we will define \(H_k\) as the Hessian of the approximate Lagrangian, testing further the descent condition and switching to a safe positive definite matrix, if necessary.
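The eigenvalue test and the switch to a safe matrix can be sketched as follows; the SVD-based null-space computation, the identity fallback, and the tolerance values are illustrative assumptions:

```python
import numpy as np

def safeguarded_hessian(H, A, sigma_min=1e-4, sigma_max=1e4):
    """Check the reduced Hessian Z^T H Z, with Z an orthonormal basis of the
    null space of A (A plays the role of grad h(y^{k+1})^T); if its
    eigenvalues leave [sigma_min, sigma_max], switch to a safe matrix.
    The identity fallback and the tolerances are illustrative choices."""
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-12))
    Z = Vt[rank:].T                      # orthonormal null-space basis of A
    eigs = np.linalg.eigvalsh(Z.T @ H @ Z)
    if eigs.min() < sigma_min or eigs.max() > sigma_max:
        H = np.eye(H.shape[0])           # safe positive definite fallback
    return H, Z

A = np.array([[1.0, 1.0, 0.0]])          # one linearized constraint
H = np.diag([1.0, -2.0, 3.0])            # indefinite Hessian approximation
H_safe, Z = safeguarded_hessian(H, A)    # reduced Hessian indefinite: replaced
```

Since any two orthonormal bases of the null space differ by an orthogonal factor, the eigenvalues of \(Z_k^T H_k Z_k\) do not depend on the particular basis computed.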
Assumption 3.4
There exists \(\gamma > 0\) such that, for all \(k \in \mathbb {N}\),
Note that (28) states that the Lagrangian, whose quadratic approximation is minimized in (26)–(27), should decrease along the direction \(d^k\). The sufficient decrease condition, which depends on \(\gamma \), holds for small enough \(t_k\) under the choice (26)–(27) with safeguards on the eigenvalues of \(H_k\). As we saw in Theorem 2.1, condition (10) also holds for small enough \(t_k\). Therefore, Assumption 3.4 suggests a safeguarded backtracking procedure that aims to satisfy, simultaneously, (10) and (28).
Assumption 3.5
The Lagrange multiplier estimates \(\{\lambda ^{k}\}\) lie in a compact set.
By Assumption 3.1, we have that \(\nabla L(\cdot ,\lambda ): \Omega \rightarrow \mathbb {R}^n\) is Lipschitz continuous in \(\Omega \) for all fixed \(\lambda \in \mathbb {R}^m\). By Assumption 3.5, we can assume that the Lipschitz constant for \(\nabla L(\cdot ,\lambda ^{k})\) is the same for all \(k \in \mathbb {N}\). Without any loss of generality, we will also denote by \(\eta \) this Lipschitz constant, that is, for all \(x,y \in \Omega \) and \(k \in \mathbb {N}\),
Assumption 3.6
The Lagrange multiplier estimates \(\{\lambda ^{k}\}\) are such that
As we will see in Sect. 4, this assumption can be fulfilled, if we take \(\lambda ^{k+1}\) as the Lagrange multipliers associated with \(\nabla h(y^{k})\) in the KKT conditions of the problem (26)–(27).
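When the bound constraints of \(\Omega\) are inactive, the subproblem (26)–(27) reduces to an equality-constrained quadratic program, and the multiplier estimate comes out of the same KKT system. The following is a minimal sketch under that assumption, with illustrative problem data:

```python
import numpy as np

def tangent_qp(H, A, g):
    """Solve min 0.5 d^T H d + g^T d s.t. A d = 0 through its KKT system,
    where A stands for grad h(y)^T and g for the Lagrangian gradient.
    Returns the direction d and the multipliers mu of the linearized
    constraints, usable as the next Lagrange multiplier estimate."""
    m, n = A.shape
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, np.zeros(m)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]

H = np.eye(2)                   # positive definite on the tangent space
A = np.array([[1.0, 1.0]])      # Jacobian of one linearized constraint
g = np.array([1.0, -1.0])       # gradient of the approximate Lagrangian
d, mu = tangent_qp(H, A, g)     # d is tangent (A d = 0) and a descent direction
```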
In the next lemma, we will prove that the sequence of penalty parameters is bounded away from zero. This means that the objective function always has a significant weight in the merit criterion.
Lemma 3.1
Suppose that Assumption 3.2 holds. The sequence \(\{\theta _k\}\) is non-increasing and bounded away from zero.
Proof
By direct calculations, (8) is equivalent to
If \(\big [L\big (y^{k+1},\lambda ^{k+1}\big ) - L\big (x^k,\lambda ^{k}\big ) + \Vert h(x^k)\Vert - \Vert h\big (y^{k+1}\big )\Vert \big ] \le 0\), then (31) holds for all \(\theta \ge 0\), thus \(\theta _k=\theta _{k-1}\).
If \([L(y^{k+1},\lambda ^{k+1}) - L(x^k,\lambda ^{k}) + \Vert h(x^k)\Vert - \Vert h(y^{k+1})\Vert ] > 0\), then \(y^{k+1} \ne x^k\) and, consequently,
In this case, we have that
By the updating rule of the penalty parameter, we have that the sequence \(\{\theta _k\}\) is non-increasing. It remains to be proved that \(\{\theta _k\}\) is bounded away from zero. For this purpose, it is sufficient to show that \(\theta _{k}\) is greater than a fixed positive number when \(\theta _k \ne \theta _{k-1}\). In this case, we have
Thus, by Assumption 3.2, (7), and (23),
This implies that the sequence \(\{1 / {\theta }_{k}\}\) is bounded. Therefore, the sequence \(\{\theta _k\}\) is non-increasing and bounded away from zero. \(\square \)
The following lemma ensures that any limit point of the sequence generated by Algorithm 2.1 is feasible.
Lemma 3.2
Suppose that Assumptions 3.2 and 3.5 hold. Then, the sum \(\sum _{k=0}^\infty \Vert h(x^k)\Vert \) is convergent.
Proof
By condition (10) and Assumption 3.2, for all \(k \in \mathbb {N}\) one has that
Therefore, by (23),
Let us define \(\rho _k := (1 - \theta _k)/\theta _k\) for all \(k \in \{-1,0,1,2\ldots \}\). By Lemma 3.1, there exists \(\bar{\theta } > 0\) such that \(\theta _k \ge \bar{\theta }\) for all \(k \in \mathbb {N}\). This implies that \(\rho _k \le 1/\bar{\theta }-1\) for all \(k \in \mathbb {N}\). Since \(\{\rho _k\}\) is bounded and non-decreasing, it follows that
By compactness, the sequence \(\{\Vert h(x^k)\Vert \}\) is bounded. Therefore, by (35), there exists \(c > 0\) such that
Now, by (34),
Since \(0<\theta _{k}<1\), we have that \(\frac{(1-r)^2}{2} < \frac{(1-r)^2}{2 \theta _{k}}\), so
Therefore, for all \(k \in \mathbb {N}\),
Thus, for all \(k \in \mathbb {N}\), we have
Therefore, by (36),
Thus,
Since the functions \(L\) and \(h\) are continuous, by Assumption 3.5 and the compactness of \(\Omega \), we have that the sequences \(\{L(x^{k},\lambda ^{k})\}\) and \(\{\Vert h(x^k)\Vert \}\) are bounded. Since \(\{\rho _k\}\) is also bounded, it follows that the series \(\sum _{k=0}^\infty \Vert h(x^k)\Vert \) is convergent. \(\square \)
The following lemma shows that, if the problem is sufficiently smooth, then the direction on the tangent subspace ensures a uniformly bounded deterioration of the feasibility. The lemma also guarantees that the step size does not tend to zero. This means that Algorithm 2.1 will not produce excessively small steps.
Lemma 3.3
Suppose that Assumptions 3.1–3.3 and 3.5 hold. Define \(c_k\) as in (5) and \(t_k\) as in Step 4.3 of Algorithm 2.1. Then, there exist \(c>0\) and \(\bar{t}>0\) such that \(c_k>c\) and \(t_k \ge \bar{t}\) for all \(k \in \mathbb {N}\).
Proof
Given any continuously differentiable function \(F: \mathbb {R}^n \rightarrow \mathbb {R}^p\), by the Fundamental Theorem of Calculus, we have that
Applying (37) with \(F(y+td)= h(y^{k+1}+td^k)\), by Assumptions 3.1 and 3.2, the triangle inequality, and the Cauchy-Schwarz inequality, we have that, for \(c=\frac{\eta }{2}\),
Thus (5) holds for \(c_k=\frac{\eta }{2}\), independently of \(k\) and for all \(t\ge 0\).
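The computation behind this bound can be sketched as follows (our reconstruction of the omitted display), using the fundamental theorem of calculus, the tangency condition \(\nabla h(y^{k+1})^T d^k = 0\), and the Lipschitz constant \(\eta\) of Assumption 3.1:

```latex
\begin{aligned}
\Vert h(y^{k+1}+t d^k)\Vert
 &\le \Vert h(y^{k+1})\Vert
   + t \int_0^1 \big\Vert \nabla h\big(y^{k+1}+s\,t\,d^k\big)
   - \nabla h\big(y^{k+1}\big)\big\Vert \, \Vert d^k\Vert \, \mathrm{d}s \\
 &\le \Vert h(y^{k+1})\Vert + \eta\, t^2 \Vert d^k\Vert ^2 \int_0^1 s \, \mathrm{d}s
  = \Vert h(y^{k+1})\Vert + \tfrac{\eta }{2}\, t^2 \Vert d^k\Vert ^2 ,
\end{aligned}
```

which is consistent with the choice \(c_k=\frac{\eta }{2}\), independently of \(k\).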
Now, applying (37) with \(F(y+td)= L\big (y^{k+1}+td^k,\lambda ^{k}\big )\), by Assumptions 3.1, 3.2, 3.5, and (29),
Therefore, if \(\gamma _1 \le \frac{\sigma }{2}\) and \(\tau =\frac{\sigma }{2 \eta }\), we have that
for all \(t \le \tau \).
If \(\Vert d^k\Vert \ne 0\), condition (39) ensures that (9) holds whenever \(t_k \le \tau \). Let us prove that we can also satisfy (10) for any \(t_k\) in a specific interval. In fact, we will prove that any \(t_k \le \min \left\{ \frac{\bar{\theta }\gamma _1}{c(1-\bar{\theta })},\tau \right\} \), where \(\bar{\theta }\) is defined in Lemma 3.2, is acceptable in Step 4.3 of Algorithm 2.1. Since \(r_k \ge 0\), by (8), (19), (38), (39), and Lemma 3.1, we have that
Thus, due to the backtracking procedure, we have that \(t_{k} \ge \bar{t} := \min \left\{ \frac{\bar{\theta }\gamma _1}{10 c(1-\bar{\theta })}, \frac{\tau }{10}, 1 \right\} \) for all \(k \in \mathbb {N}\). \(\square \)
Lemma 3.4
Suppose that Assumptions 3.1–3.5 hold. Then \(\lim _{k \rightarrow \infty } \Vert d^k\Vert =0\).
Proof
By Lemma 3.3, there exists \(\bar{t}>0\) such that \(t_k \ge \bar{t}\) for all \(k \in \mathbb {N}\). By (22) and (28),
By Lemma 3.2 there exists \(c\) such that \(\sum _{k=0}^\infty \Vert h(x^k)\Vert =c\). Thus
Since \(f\) and \(h\) are bounded below on \(\Omega \), and \(\{\lambda ^{k+1}\}\) remains in a compact set, we have that the series \(\sum _{k=0}^\infty \Vert d^k\Vert ^2\) is convergent, and thus, \(\{\Vert d^k\Vert \}\) converges to zero. \(\square \)
The next lemma and its corollary are used to prove the optimality of every limit point of the sequence generated by Algorithm 2.1.
Lemma 3.5
Consider \(P_k\) as in Assumption 3.3. Given \(y^{k+1} \in \Omega \), then
for all \(\lambda \), \(\mu \in \mathbb {R}^m\).
Proof
Defining \(S_k := \{d \; : \; \nabla h(y^{k+1})^Td=0\}\), we have that \(\nabla h(y^{k+1})(\lambda -\mu ) \in S_k^\perp \). Since \(\nabla L\big (y^{k+1},\lambda \big )=\nabla L(y^{k+1},\mu )+ \nabla h\big (y^{k+1}\big )\big (\lambda -\mu \big )\), the result follows from the Projection Theorem (see [39], Proposition B.11, item (b), p 704). \(\square \)
Corollary 3.1
Let \(x^*\) be a feasible point of (1) such that
for some \(\{y^{k+1}\} \subset \Omega \) and \(\{\lambda ^{k+1}\} \subset \mathbb {R}^m\). Then \(x^*\) satisfies the L-AGP optimality condition [25].
Proof
By hypothesis, \(x^*\) is feasible. Moreover, by Lemma 3.5,
Thus, \(x^*\) satisfies the L-AGP optimality condition. \(\square \)
The next lemma will be used to prove the convergence of the Lagrange multiplier estimates \(\{\lambda ^{k}\}\).
Lemma 3.6
Let \(x^*\) be a feasible point such that
for some \(\{y^{k}\} \subset \Omega \) and \(\{\lambda ^{k+1}\} \subset \mathbb {R}^m\). Then, if \(x^*\) satisfies the Mangasarian–Fromovitz constraint qualification, we have that the KKT conditions hold in \(x^*\). Moreover, all the limit points of \(\{\lambda ^{k+1}\}\) are Lagrange multipliers associated with the equality constraints in \(x^*\).
Proof
Let \(z^{k}:=P_\Omega (y^k-\nabla L(y^k,\lambda ^{k+1}))\); hence \(\lim _{k \rightarrow \infty } (z^k-y^k) =0\) and \(z^k\) is the solution of the problem:
Write \(\Omega =\{x\in \mathbb {R}^n : Ax=b,\; x\ge 0\}\). Then, for all \(k = 0, 1, 2,\ldots \), there exist \(\alpha ^k\) and \(\beta ^k\) such that
and
Defining \(M_k:=\max \{\Vert \lambda ^{k+1}\Vert _\infty ,\Vert \alpha ^k\Vert _\infty ,\Vert \beta ^k\Vert _\infty \}\), we have that \(\{M_k\}\) is bounded. Otherwise, we could divide (41) by \(M_k\) and take the limit on a suitable subsequence to obtain a contradiction with the Mangasarian-Fromovitz condition.
Since \(\{M_k\}\) is bounded, if \(\lambda ^*\) is a limit point of \(\{\lambda ^{k+1}\}\), there exists \(K \subseteq \mathbb {N}\) such that \(\lim _{k \in K} \lambda ^{k+1}=\lambda ^*\), \(\lim _{k \in K} \alpha ^{k}=\alpha ^*\), and \(\lim _{k \in K} \beta ^{k}=\beta ^*\). Taking limits for \(k \in K\) in (40) and (41), we verify that \(\lambda ^*\) is a Lagrange multiplier associated with the constraints \(h(x)=0\) in \(x^*\). \(\square \)
The following theorem summarizes the convergence properties of Algorithm 2.1 under Assumptions 3.1–3.6.
Theorem 3.1
Suppose that Assumptions 3.1–3.6 hold. Then
1. For all \(k \in \mathbb {N}\), \(x^{k}\) is well defined.
2. There exists \(\bar{\theta }>0\) such that \(\theta _k \ge \bar{\theta }\) for all \(k \in \mathbb {N}\).
3. \(\lim _{k \rightarrow \infty } \Vert h(x^k)\Vert = \lim _{k \rightarrow \infty } \Vert h(y^{k})\Vert = 0\) and any cluster point of \(\{x^k\}\) or \(\{y^{k}\}\) is feasible.
4. \(\lim _{k \rightarrow \infty } d^k = 0\).
5. \(\lim _{k \rightarrow \infty } \Vert y^{k} - x^{k}\Vert = 0\).
6. The sequences \(\{x^k\}\) and \(\{y^{k}\}\) admit the same cluster points.
7. Limit points of \(\{x^k\}\) satisfy the L-AGP optimality condition.
8. If a limit point \(x^*\) satisfies the Constant Positive Generators (CPG) constraint qualification [40], then the KKT conditions hold at \(x^*\).
9. If a limit point \(x^*\) satisfies the Mangasarian–Fromovitz constraint qualification, then the sequence \(\{\lambda ^k\}\) admits a limit point \(\lambda ^*\), which is a Lagrange multiplier associated with \(\nabla h(x^*)\).
Proof
Well-definedness of \(x^{k}\) has been proved in Theorem 2.1. The existence of \(\bar{\theta }\) follows from Lemma 3.1.
By Lemma 3.2 and (3),
and thus, by the continuity of \(h\), any cluster point of \(\{x^k\}\) or \(\{y^{k}\}\) is feasible.
The fact that \(\lim _{k \rightarrow \infty } d^k=0\) follows from Lemma 3.4.
By item 4 and the triangle inequality, we have that
By item 5 and the triangle inequality,
Thus, the sequences \(\{x^k \}\) and \(\{y^{k} \}\) admit the same cluster points.
Let \(x^*\) be a limit point of \(\{x^k \}\), therefore, by item 6, there is \(K \subseteq \mathbb {N}\) such that \(\lim _{k \in K } y^{k+1}=x^*\). By Assumption 3.3 and item 4,
By item 3, we have that \(x^*\) is feasible. Thus we conclude, by Corollary 3.1, that \(x^*\) satisfies the L-AGP optimality condition.
Statement 8 is a consequence of item 7 and the fact that sequential optimality conditions such as L-AGP imply KKT under the CPG constraint qualification [25, 40].
Finally, suppose that \(\lim _{k \in K}y^{k}=x^*\) and the Mangasarian–Fromovitz constraint qualification holds in \(x^*\). Since the sequence \(\{\lambda ^{k+1}\}\) lies in a compact set, we have that this sequence admits at least one limit point. Then, by Lemma 3.6, we conclude that any limit point of \(\{\lambda ^{k+1}\}\) with \(k \in K\) is a Lagrange Multiplier associated with \(\nabla h(x^*)\). \(\square \)
Remark 3.1
The CPG constraint qualification is weaker than CRSC [40], (R)CPLD [41–43], (R)CRCQ [44], and Mangasarian–Fromovitz constraint qualification.
4 Plausibility of Assumptions
One of our main assumptions is that, given an infeasible point \(x^k\), a less infeasible point can always be computed in the Restoration phase. Clearly, this assumption is not always true. For example, if the feasible set is empty and \(x^k\) is a minimizer of the infeasibility, then it is not possible to succeed in the Restoration phase.
However, if a good algorithm is used in the Restoration phase, we expect that either the improvement in feasibility will be achieved or the problem is likely infeasible.
Lipschitz conditions on the gradients of \(f\) and \(h\), assumed in Assumption 3.1, are standard in smooth optimization. Convexity conditions on the problem (26)–(27) are also usual in quadratic programming analyses. After minimizing \(\frac{1}{2}d^T H_{k-1} d +\nabla f(y^{k})^T d\) subject to \(y^{k} + d \in \Omega ,\; \nabla h(y^{k})^T d = 0\), a natural way to estimate the Lagrange multipliers is taking \(\lambda ^{k+1}\) as the Lagrange multipliers associated with the constraints \(\nabla h(y^{k})^Td=0\). In this case, it is plausible to assume that \(\lambda ^{k+1}\) lies in a compact set if the Mangasarian–Fromovitz constraint qualification holds at \(y^{k}\). Thus, if the sequence \(\{\Vert H_k d^k\Vert \}\) is uniformly bounded, we have that Assumption 3.5 holds under the Mangasarian–Fromovitz constraint qualification on \(\Omega \). An easy alternative to guarantee that Assumption 3.5 holds is to use safeguards on the Lagrange multipliers in the implementation.
Assumption 3.2 asks for a sufficient improvement of feasibility with a bounded deterioration of the Lagrangian. This assumption can be satisfied under a typical Lipschitz hypothesis and regularity conditions on the feasible set. In Lemma 2 of [19], it is proved that, if the Mangasarian–Fromovitz constraint qualification holds at all points of \(\Omega \), then the feasible point closest to \(x^k\) fulfills the bounded deterioration for the objective function \(f\), and so any \(r_k>0\) is a possible choice for all \(k\). A practical restoration procedure is shown in Lemma 6.1 of [11], if the Linear Independence constraint qualification holds on \(\Omega \). This result asserts that, given \(r \in \; ]0,1[\), there is a neighborhood \(N\) of the feasible set such that, if \(x^k \in N\), then the linearization of the constraints at \(x^k\), \(T_k := \{ y \in \Omega : h(x^k)+\nabla h(x^k)^T(y-x^k)=0\}\), is non-empty and, denoting by \(y^{k+1}\) the point of \(T_k\) closest to \(x^k\), we have that \(\Vert h(y^{k+1})\Vert \le r \Vert h(x^k)\Vert \). Moreover, under Lipschitz conditions, bounded deterioration for the objective function holds. In both restoration procedures, if the Lagrange multipliers \(\lambda ^k\) remain in a compact set (Assumption 3.5), then the bounded deterioration also holds for the Lagrangian.
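The linearized restoration step just described can be sketched as follows, taking \(\Omega = \mathbb{R}^n\) for simplicity so that the point of \(T_k\) closest to \(x^k\) is given by a pseudo-inverse; the constraint used is an illustrative example:

```python
import numpy as np

def linearized_restoration(x, h, jac_h):
    """One restoration step: project x onto the linearized feasible set
    T = { y : h(x) + grad h(x)^T (y - x) = 0 }, taking Omega = R^n so that
    the closest point is the minimum-norm solution of J (y - x) = -h(x)."""
    J = jac_h(x)                           # m x n Jacobian of h at x
    return x + np.linalg.pinv(J) @ (-h(x))

# Illustrative constraint: a circle of radius 2.
h = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 4.0])
jac_h = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])

x = np.array([3.0, 0.0])
y = linearized_restoration(x, h, jac_h)
# One step reduces the infeasibility measure: ||h(y)|| < ||h(x)||.
```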
The plausibility of Assumption 3.3 is strictly linked with our proposal in the present paper of choosing \(d^k\) as the solution of the quadratic programming problem:
where \(H_k\) is symmetric, \(Z_k^T H_k Z_k\) is positive definite, and the columns of \(Z_k\) form an orthonormal basis of the null-space of \(\nabla h(y^{k+1})^T\). In addition, we assume that the eigenvalues of \(Z_k^T H_k Z_k\) lie in a positive interval \([\sigma _\mathrm{min}, \sigma _\mathrm{max}]\).
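The conditions on \(Z_k^T H_k Z_k\) can be checked directly once an orthonormal null-space basis is available, for instance from the SVD. A minimal sketch under these assumptions (helper names are illustrative):

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Orthonormal basis of null(A): the rows of Vt whose singular
    values are (numerically) zero, transposed into columns."""
    U, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

def reduced_hessian_eigs(H, grad_h):
    """Eigenvalues of Z^T H Z, where the columns of Z span the null space
    of grad_h^T (grad_h holds the constraint gradients as columns, so
    grad_h^T is the Jacobian of h)."""
    Z = null_space_basis(grad_h.T)
    return np.linalg.eigvalsh(Z.T @ H @ Z)

# Toy check in R^3 with one constraint gradient
H = np.diag([1.0, 2.0, 3.0])
grad_h = np.array([[1.0], [1.0], [1.0]])   # n x m, m = 1
eigs = reduced_hessian_eigs(H, grad_h)
```

If the smallest and largest values of `eigs` stay within a fixed positive interval \([\sigma _\mathrm{min}, \sigma _\mathrm{max}]\) along the iterations, the positive-definiteness requirement above is met; otherwise \(H_k\) can be modified (e.g. by adding a multiple of the identity).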
By selecting \(d^k\) in this way, since \(d=0\) is feasible to (42), we have that
Therefore,
In order to prove condition (25), let us define
Since \(d^k\) is the solution of the linearly constrained problem (42),
Changing variables (\(y=y^{k+1}+d\)) and using the fact that the projections are non-expansive, we have
So, defining \(\sigma := \min \left\{ \frac{\sigma _\mathrm{min}}{2}, \frac{1}{2+\sigma _\mathrm{max}} \right\} \), we have that Assumption 3.3 holds.
From Lemma 3.3, we can see that condition (39) holds for \(\gamma _1=\frac{\sigma _\mathrm{min}}{4}\) and sufficiently small \(t\). Thus, if \(\sigma _\mathrm{min}\) is known, we can ensure that Assumption 3.4 holds by testing the sufficient decrease condition at Step 5 of Algorithm 2.1. Moreover, even if \(\sigma _\mathrm{min}\) is not known, it is possible to perform a double backtracking, in \(t\) and \(\sigma \), to ensure that Assumption 3.4 holds [45]. However, this does not seem to be either efficient or necessary in practice, so we only test simple decrease at Step 5 of Algorithm 2.1.
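The double backtracking on \(t\) and \(\sigma \) can be sketched as follows. The merit value `phi`, the parameters, and the acceptance test \(\phi (y+td) \le \phi (y) - \frac{\sigma }{4}\, t\, \Vert d\Vert ^2\) are illustrative stand-ins for the conditions of Algorithm 2.1, not the paper's exact rules:

```python
import numpy as np

def double_backtracking(phi, y, d, sigma0=1.0, beta=0.5, t_min=1e-8):
    """Sketch: backtrack on the step t for a sufficient-decrease test with
    parameter sigma/4; if t shrinks below t_min without acceptance, the
    test was too demanding, so relax sigma and restart the backtracking."""
    phi_y = phi(y)
    nd2 = float(np.dot(d, d))
    sigma = sigma0
    while sigma > 1e-12:
        t = 1.0
        while t >= t_min:
            if phi(y + t * d) <= phi_y - (sigma / 4.0) * t * nd2:
                return t, sigma                 # accepted step and sigma
            t *= beta
        sigma *= beta                           # relax the decrease demand
    return t_min, sigma

# Toy merit function: phi(x) = 0.5 ||x||^2, descent direction d = -y
phi = lambda x: 0.5 * float(np.dot(x, x))
y = np.array([1.0, 1.0])
t, sigma = double_backtracking(phi, y, -y)
```

In the toy run the full step \(t=1\) is accepted at once; the outer loop on \(\sigma \) only activates when no step satisfies the current decrease demand, which is why the paper settles for testing simple decrease in practice.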
Finally, to prove the plausibility of Assumption 3.6, we consider that \(\lambda ^{k+1}\) is chosen as the vector of Lagrange multipliers associated with the constraints \(\nabla h(y^{k})^T d = 0\) in problem:
In this case, denoting \(D_k:=\{d \; : \; y^{k}+d \in \Omega \}\), we have that
Therefore,
Thus, considering that the sequence \(\{\Vert H_k \Vert \}\) is uniformly bounded and that Assumptions 3.1–3.5 hold, by Lemma 3.4, we have that
which means that Assumption 3.6 holds.
5 Conclusions
The Inexact-Restoration method introduced in this paper employs an improved version of the Fischer–Friedlander line search approach, combined with the use of Lagrange multiplier estimates both in the Optimization Phase subproblem and in the merit function. The new method improves on several theoretical aspects and is well suited to the many problems in which there is a natural way to restore feasibility.
Its application to general constrained optimization problems remains to be investigated. In the general case, some standard (perhaps underdetermined Newtonian) method should be defined for the restoration phase, and convergence should be proved taking into account the characteristics of that method.
Another possible approach for general problems would be to introduce objective function information into the restoration phase, aiming to provide better initial approximations for the second-phase subproblems. Much theoretical and practical research can be expected along these lines.
Future research will also include the application of alternative I-R approaches to multiobjective optimization.
References
Abadie, J., Carpentier, J.: Generalization of the Wolfe reduced-gradient method to the case of nonlinear constraints. In: Fletcher, R. (ed.) Optimization, pp. 37–47. Academic Press, New York (1968)
Lasdon, L.S.: Reduced gradient methods. In: Powell, M.J.D. (ed.) Nonlinear Optimization, pp. 235–242. Academic Press, New York (1982)
Miele, A., Huang, H.Y., Heideman, J.C.: Sequential gradient-restoration algorithm for the minimization of constrained functions, ordinary and conjugate gradient version. J. Optim. Theory Appl. 4, 213–246 (1969)
Miele, A., Levy, A.V., Cragg, E.E.: Modifications and extensions of the conjugate-gradient restoration algorithm for mathematical programming problems. J. Optim. Theory Appl. 7, 450–472 (1971)
Miele, A., Sims, E.M., Basapur, V.K.: Sequential gradient-restoration algorithm for mathematical programming problem with inequality constraints, Part 1, Theory. Aero-Astronautics Report No. 168, Rice University (1983).
Rom, M., Avriel, M.: Properties of the sequential gradient-restoration algorithm (SGRA), Part 1: introduction and comparison with related methods. J. Optim. Theory Appl. 62, 77–98 (1989)
Rom, M., Avriel, M.: Properties of the sequential gradient-restoration algorithm (SGRA), Part 2: convergence analysis. J. Optim. Theory Appl. 62, 99–126 (1989)
Rosen, J.B.: The gradient projection method for nonlinear programming, Part 1: linear constraints. SIAM J. Appl. Math. 8, 181–217 (1960)
Rosen, J.B.: The gradient projection method for nonlinear programming, Part 2: nonlinear constraints. SIAM J. Appl. Math. 9, 514–532 (1961)
Martínez, J.M., Pilotta, E.A.: Inexact restoration algorithms for constrained optimization. J. Optim. Theory Appl. 104, 135–163 (2000)
Martínez, J.M.: Inexact restoration method with Lagrangian tangent decrease and new merit function for nonlinear programming. J. Optim. Theory Appl. 111, 39–58 (2001)
Gonzaga, C.C., Karas, E.W., Vanti, M.: A globally convergent filter method for nonlinear programming. SIAM J. Optim. 14, 646–669 (2003)
Birgin, E.G., Martínez, J.M.: Local convergence of an Inexact-Restoration method and numerical experiments. J. Optim. Theory Appl. 127, 229–247 (2005)
Martínez, J.M., Pilotta, E.A.: Inexact restoration methods for nonlinear programming: advances and perspectives. In: Qi, L.Q., Teo, K., Yang, X.Q. (eds.) Optimization and Control with Applications, pp. 271–292. Springer, New York (2005)
Kaya, C.Y., Martínez, J.M.: Euler discretization and inexact restoration for optimal control. J. Optim. Theory Appl. 134, 191–206 (2007)
Andreani, R., Castro, S.L., Chela, J.L., Friedlander, A., Santos, S.A.: An inexact-restoration method for nonlinear bilevel programming problems. Comput. Optim. Appl. 43, 307–328 (2009)
Gomes-Ruggiero, M.A., Martínez, J.M., Santos, S.A.: Spectral projected gradient method with inexact restoration for minimization with nonconvex constraints. SIAM J. Sci. Comput. 31, 1628–1652 (2009)
Karas, E.W., Pilotta, E.A., Ribeiro, A.A.: Numerical comparison of merit function with filter criterion in inexact restoration algorithms using Hard-Spheres Problems. Comput. Optim. Appl. 44, 427–441 (2009)
Fischer, A., Friedlander, A.: A new line search inexact restoration approach for nonlinear programming. Comput. Optim. Appl. 46, 333–346 (2010)
Karas, E.W., Gonzaga, C.C., Ribeiro, A.A.: Local convergence of filter methods for equality constrained nonlinear programming. Optimization 59, 1153–1171 (2010)
Kaya, C.Y.: Inexact Restoration for Runge–Kutta discretization of optimal control problems. SIAM J. Numer. Anal. 48, 1492–1517 (2010)
Francisco, J.B., Martínez, J.M., Martínez, L., Pisnitchenko, F.I.: Inexact Restoration method for minimization problems arising in electronic structure calculations. Comput. Optim. Appl. 50, 555–590 (2011)
Bueno, L.F., Friedlander, A., Martínez, J.M., Sobral, F.: Inexact restoration method for derivative-free optimization with smooth constraints. SIAM J. Optim. 23, 1189–1213 (2013)
Martínez, J.M., Svaiter, B.F.: A practical optimality condition without constraint qualifications for nonlinear programming. J. Optim. Theory Appl. 118, 117–133 (2003)
Andreani, R., Haeser, G., Martínez, J.M.: On sequential optimality conditions for smooth constrained optimization. Optimization 60, 627–641 (2011)
Barzilai, J., Borwein, J.M.: Two point step size gradient method. IMA J. Numer. Anal. 8, 141–148 (1988)
Dai, Y.H., Fletcher, R.: On the asymptotic behaviour of some new gradient methods. Math. Prog. 103, 541–559 (2005)
Friedlander, A., Martínez, J.M., Molina, B., Raydan, M.: Gradient method with retards and generalizations. SIAM J. Numer. Anal. 36, 275–289 (1998)
Raydan, M.: On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 13, 321–326 (1993)
Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 7, 26–33 (1997)
Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10, 1196–1211 (2000)
Birgin, E.G., Martínez, J.M., Raydan, M.: Algorithm 813: SPG Software for convex-constrained optimization. ACM Trans. Math. Softw. 27, 340–349 (2001)
Birgin, E.G., Martínez, J.M., Raydan, M.: Inexact spectral projected gradient methods on convex sets. IMA J. Numer. Anal. 23, 539–559 (2003)
Andreani, R., Birgin, E.G., Martínez, J.M., Yuan, J.-Y.: Spectral projected gradient and variable metric methods for optimization with linear inequalities. IMA J. Numer. Anal. 25, 221–252 (2005)
Rockafellar, R.T.: Lagrange multipliers and optimality. SIAM Rev. 35, 183–238 (1993)
Burachik, R.S., Gasimov, R.N., Ismayilova, A.N., Kaya, C.Y.: On a modified subgradient algorithm for dual problems via sharp augmented Lagrangian. J. Global Optim. 34, 55–78 (2006)
Burachik, R.S., Iusem, A.N., Melo, J.G.: A primal dual modified subgradient algorithm with sharp Lagrangian. J. Global Optim. 46, 347–361 (2010)
Burachik, R.S., Kaya, C.Y., Mammadov, M.: An inexact modified subgradient algorithm for nonconvex optimization. Comput. Optim. Appl. 45, 1–24 (2010)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
Andreani, R., Haeser, G., Schuverdt, M.L., Silva, P.J.S.: Two new constraint qualifications and applications. SIAM J. Optim. 22, 1109–1135 (2012)
Andreani, R., Martínez, J.M., Schuverdt, M.L.: On the relation between the Constant Positive Linear Dependence condition and quasinormality constraint qualification. J. Optim. Theory Appl. 125, 473–485 (2005)
Andreani, R., Haeser, G., Schuverdt, M.L., Silva, P.J.S.: A relaxed constant positive linear dependence constraint qualification and applications. Math. Prog. 135, 255–273 (2012)
Qi, L., Wei, Z.: On the constant positive linear dependence condition and its application to SQP methods. SIAM J. Optim. 10, 963–981 (2000)
Minchenko, L., Stakhovski, S.: On relaxed constant rank regularity condition in mathematical programming. Optimization 60, 429–440 (2011)
Bueno, L.F.: Otimização com restrições LOVO, Restauração Inexata e o Equilíbrio Inverso de Nash. Ph.D. dissertation, Departamento de Matemática Aplicada, Universidade Estadual de Campinas (2011)
Acknowledgments
This work was supported by PRONEX-CNPq/FAPERJ (Grant E-26/171.164/ 2003 - APQ1), FAPESP (Grants 2010/19720-5 and 2013/05475-7), CEPID-Cemeai-Fapesp Industrial Mathematics 201307375-0, and CNPq. The authors would like to thank the three anonymous referees for their insightful comments and suggestions. We also thank Alene Alder-Rangel and William Stanton for reviewing the English grammar and style of this manuscript.
Bueno, L.F., Haeser, G. & Martínez, J.M. A Flexible Inexact-Restoration Method for Constrained Optimization. J Optim Theory Appl 165, 188–208 (2015). https://doi.org/10.1007/s10957-014-0572-0