Abstract
In this paper, we propose an active set modified Polak–Ribière–Polyak method for solving large-scale optimization problems with simple bounds on the variables. The active set is guessed by an identification technique at each iteration, and the recently developed modified Polak–Ribière–Polyak method is used to update the variables with indices outside of the active set. Under appropriate conditions, we show that the proposed method is globally convergent. Numerical experiments are presented using box constrained problems from the CUTEr test problem library.
1 Introduction
The bound constrained problem is very important in practical optimization, and many practical problems can be converted into this form. In addition, it often arises as a subproblem of augmented Lagrangian or penalty schemes for general constrained optimization [1]. In recent work on complementarity and variational inequalities, these problems are reduced to bound constrained minimization problems in an efficient way [2]. Thus, the development of numerical algorithms that solve the problem efficiently, especially when the dimension is large, is important in both theory and applications. The bound constrained optimization problem has received much attention in recent decades; we refer to the papers [1, 3] for a review of recent advances in this area.
Active set methods are widely used for solving the bound constrained optimization problem. Early active set methods [4] are quite efficient for small dimensional problems, but are unattractive for large-scale problems [5]. The main reason is that, typically, at each step of the algorithm at most one constraint can be added to or dropped from the active set. In the worst case, each of the possible 3^n active sets may be visited before the optimal one is discovered [1]. Recently, there has been growing interest in the design of active set methods that are capable of making rapid changes to incorrect active set predictions [1, 6–11].
In [7], Facchinei, Júdice, and Soares proposed an efficient identification technique of the active set at the solution and developed an active set Newton’s algorithm for large-scale nonlinear programming with box constraints. The reported numerical results in [7] showed that this identification technique works well. Recently, based on the identification technique, Xiao and Hu [12] proposed an active set subspace Barzilai–Borwein gradient method (SBB). Preliminary numerical results show that the identification technique works well and the SBB method is competitive with the well-known SPG2 method [13].
Conjugate gradient methods are very efficient for solving large-scale unconstrained optimization problems due to their simplicity and low storage requirements [14]. However, studies of conjugate gradient methods for nonquadratic bound constrained optimization are relatively rare. In this paper, using the active set identification technique of [7] and the recently developed modified Polak–Ribière–Polyak (PRP) method [15], we propose an active set modified PRP method for the bound constrained optimization problem. Specifically, at each iteration, the active variables and free variables are determined by the identification technique of [7]; we use the method in [7] to update the active variables and the modified PRP method to update the free variables. The use of the modified PRP method keeps the storage requirement low, so the method can be applied to large-scale problems. In addition, the method possesses the following advantages: all iterates are feasible and rapid changes in the active set are allowed. Under appropriate conditions, we show that the proposed method is globally convergent. Numerical experiments are also reported on bound constrained problems from the CUTEr [16] collection with more than 500 variables; the results show that the proposed method is competitive with some existing methods such as the SPG2 method [13] and the ASA method [3].
The remainder of the paper is organized as follows. We propose the algorithm in Sect. 2. In Sect. 3, we show that the proposed algorithm is globally convergent. In Sect. 4, we test the performance of the proposed algorithm and compare it with the SPG2 method and the ASA method.
2 Motivation and Properties
Throughout the paper, ∥·∥ denotes the Euclidean norm of a vector and P_Ω(x) denotes the projection of x onto the set Ω. If w=(w_1,w_2,…,w_n)^T is an n-dimensional vector and I⊂{1,2,…,n} is an index set, we denote by w_I the subvector with components w_i, i∈I, and write w_I≥0 to mean w_i≥0 for all i∈I.
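Since Ω is a box, the projection P_Ω is computed componentwise in closed form. As a minimal illustration (the function name is ours, not the paper's), a Python sketch:

```python
import numpy as np

def project_box(x, l, u):
    """Projection P_Omega(x) onto the box Omega = {x : l <= x <= u},
    computed componentwise as min(max(x_i, l_i), u_i)."""
    return np.minimum(np.maximum(x, l), u)
```

This componentwise form is what makes projected methods attractive for bound constraints: the projection costs only O(n) per evaluation.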
Consider the solution of the bound constrained nonlinear programming problem
where f is a real-valued, continuously differentiable function on an open set containing Ω, l and u are constant vectors, and the inequalities hold componentwise. Let \(\bar{x}\) be a stationary point of (1), and consider the associated active sets
Furthermore, let
be the set of free variables. With this notation, a vector \(\bar{x}\) is a stationary point of (1) if and only if it satisfies:
where g i (x) is the ith component of the gradient vector of f at x. Facchinei, Júdice, and Soares [7] gave the following approximations L(x), F(x) and U(x) to \(\bar{L}\), \(\bar{F}\), \(\bar{U}\), respectively,
where a_i(x) and b_i(x), i=1,…,n, are nonnegative, continuous, and bounded functions defined on Ω such that if x_i=l_i or x_i=u_i, then a_i(x)>0 or b_i(x)>0, respectively. The following theorem shows that L(x), F(x), and U(x) are indeed good estimates of \(\bar{L}\), \(\bar{F}\), and \(\bar{U}\). For the proof, see Theorem 3.1 in [7].
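To make the identification concrete, consider the common instance with constant multiplier functions a_i(x)≡a and b_i(x)≡b (the choice a=b=10⁻⁶ is used in the experiments of Sect. 4). The following Python sketch is our illustrative assumption of the estimates in the style of [7]; the function name and the exact thresholding form are not taken verbatim from the paper:

```python
import numpy as np

def estimate_active_sets(x, g, l, u, a=1e-6, b=1e-6):
    """Estimate the lower-active set L(x), free set F(x), and upper-active
    set U(x) in the style of Facchinei-Judice-Soares [7], using constant
    multiplier functions a_i(x) = a, b_i(x) = b.  A common form is
        L(x) = { i : x_i <= l_i + a * g_i(x) },
        U(x) = { i : x_i >= u_i + b * g_i(x) },
        F(x) = the remaining indices."""
    L = np.where(x <= l + a * g)[0]
    U = np.where(x >= u + b * g)[0]
    F = np.setdiff1d(np.arange(x.size), np.union1d(L, U))
    return L, F, U
```

Note how a variable sitting at its lower bound with a positive gradient component is classified as lower-active, in line with the stationarity conditions above.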
Theorem 2.1
For any x∈Ω, \(L(x)\cap U(x)=\emptyset\). Furthermore, if \(\bar{x}\) is a stationary point of problem (1) at which strict complementarity holds, then there exists a neighborhood \(N(\bar{x})\) of \(\bar{x}\) such that
Following the idea of [7], we are going to develop an active set modified PRP method as follows. We firstly define the search direction. Let x k∈Ω be the current point at iteration k. For simplicity, we let L k:=L(x k), U k:=U(x k), and F k:=F(x k). Define the direction \(d^{k}=(d^{k}_{L^{k}},d^{k}_{F^{k}},d^{k}_{U^{k}})^{T}\) by
and
In what follows, we are going to define \(d^{k}_{F^{k}}\). For this, we define the active set indices of f at x k as follows:
These active indices are further subdivided into the following subsets:
and
It is clear that a direction d is a feasible direction of f at x^k if and only if d_i≥0 for i∈A_1(x^k)∪A_2(x^k) and d_i≤0 for i∈A_3(x^k)∪A_4(x^k). Furthermore, from the definitions of L(x^k), U(x^k), and A_1(x^k)–A_4(x^k), we observe that the following fact holds:
By (3) and (4), we have \(d^{k}_{i} \geq 0, i\in A_{1}(x^{k})\) and \(d^{k}_{i} \leq 0, i\in A_{3}(x^{k})\). However, we cannot determine the relation between A_2(x^k) and U(x^k), nor that between A_4(x^k) and L(x^k). Let \(A^{*}_{2}(x^{k}):=A_{2}(x^{k}) \cap U(x^{k})\), \(A^{*}_{4}(x^{k}):=A_{4}(x^{k})\cap L(x^{k})\), and \(F^{*}(x^{k}):=F(x^{k})\setminus((A_{4}(x^{k})-A^{*}_{4}(x^{k})) \cup (A_{2}(x^{k})-A^{*}_{2}(x^{k})))\). It follows from (3) and (4) that \(d^{k}_{i} \geq 0, i\in A^{*}_{2}(x^{k})\) and \(d^{k}_{i} \leq 0, i\in A^{*}_{4}(x^{k})\). Thus, we let
and
where
and ϵ>0 is a given constant. From the above discussion, we see that d^k is a feasible direction of f at x^k. Moreover, it is not difficult to see from (5) and (6) that \(d_{F^{k}}^{k}\) satisfies
The following two theorems show that d^k is a descent direction of f at x^k whenever x^k is not a stationary point of (1).
Theorem 2.2
If x^k∈Ω, then d^k=0 if and only if x^k is a stationary point of (1).
Proof
Let d k=0. For i∈L k, by (3) and (4) we have
Since \(x_{i}^{k}=l_{i}\) and a_i(x^k)>0, it holds that g_i(x^k)≥0. For each i∈U^k, we have
This implies that g_i(x^k)≤0. From the above discussion, we have A_2(x^k)∪A_4(x^k)=∅, and thus F^{∗k}=F^k. For i∈F^k, applying the Cauchy–Schwarz inequality to (7), we have
Hence, g i (x k)=0 for each i∈F k. The above argument has shown that x k is a stationary point of f on Ω.
Conversely, suppose that x^k is a stationary point of f on Ω. Then we have A_2(x^k)∪A_4(x^k)=∅ and F^{∗k}=F^k. It follows from (2) that
By the definition of d k , we immediately have d k=0. The proof is complete. □
Theorem 2.3
Suppose that x^k is not a stationary point of (1). Then the direction d^k determined by (3)–(6) is a descent direction of f at x^k.
Proof
By an argument similar to the proof of Theorem 3.2 in [7], it is not difficult to show that for each i∈L^k∪U^k there exists a positive scalar γ_i such that
This together with (7) implies that
Since x^k is not a stationary point of (1), by Theorem 2.2 there must exist an index i such that at least one of the following two possibilities holds:
Therefore, the right-hand side of (9) must be negative. In other words, d k is a descent direction of f at x k. The proof is complete. □
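The free-variable direction above is built on the modified PRP formula of Zhang, Zhou, and Li [15]. A Python sketch of that three-term direction (the function name is ours) illustrates its key property, sufficient descent g^T d = −∥g∥², which holds independently of the line search:

```python
import numpy as np

def modified_prp_direction(g, g_prev, d_prev):
    """Modified PRP direction of Zhang, Zhou, and Li [15]:
        d = -g + beta * d_prev - theta * y,   y = g - g_prev,
        beta  = g.T @ y / ||g_prev||^2,
        theta = g.T @ d_prev / ||g_prev||^2.
    The theta-term cancels the beta-term in g.T @ d, so that
    g.T @ d = -||g||^2 holds for any g_prev and d_prev."""
    y = g - g_prev
    denom = np.dot(g_prev, g_prev)
    beta = np.dot(g, y) / denom
    theta = np.dot(g, d_prev) / denom
    return -g + beta * d_prev - theta * y
```

This built-in descent property is what condition (7) exploits for the components in F^{∗k}; the remaining free components at the bounds are handled by the truncation in (5)–(6) to keep d^k feasible.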
Based on the above discussion, we propose an active set PRP-type method (APRP) for (1) as follows.
Algorithm 2.1
(APRP)
- Step 0. Given an initial point x^0∈Ω and positive constants ρ, β, ϵ_1, and δ∈]0,1[. Set k:=0.
- Step 1. If ∥P_Ω(x^k−g^k)−x^k∥≤ϵ_1, then stop.
- Step 2. Compute the search direction d^k by (3)–(6).
- Step 3. Determine α^k:=max{βρ^j, j=0,1,…} satisfying
  $$ f\bigl(P_{\varOmega}\bigl(x^k+\alpha^kd^k\bigr)\bigr)\leq f\bigl(x^k\bigr)-\delta\bigl(\alpha^{k}\big\|d^k\big\|\bigr)^2. $$(10)
- Step 4. Let the next iterate be x^{k+1}:=x^k+α^k d^k.
- Step 5. Let k:=k+1 and go to Step 1.
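The backtracking search in Step 3 can be sketched as follows. This is a minimal Python illustration under our own naming, with the objective f and the box bounds passed in as placeholders; the parameter defaults mirror the constants β, ρ, δ of the algorithm:

```python
import numpy as np

def project_box(x, l, u):
    # Componentwise projection onto the box {x : l <= x <= u}.
    return np.minimum(np.maximum(x, l), u)

def backtracking(f, x, d, l, u, beta=1.0, rho=0.5, delta=1e-4, max_back=50):
    """Armijo-type backtracking for condition (10): find the largest
    alpha = beta * rho**j, j = 0, 1, ..., such that
        f(P(x + alpha*d)) <= f(x) - delta * (alpha * ||d||)**2."""
    fx = f(x)
    nd2 = np.dot(d, d)          # ||d||^2
    alpha = beta
    for _ in range(max_back):
        if f(project_box(x + alpha * d, l, u)) <= fx - delta * alpha**2 * nd2:
            return alpha
        alpha *= rho            # reduce the trial step
    return alpha
```

Because d^k is a feasible descent direction, Theorem 2.3 guarantees the loop terminates after finitely many reductions.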
Remark 2.1
Since the direction d^k is a feasible descent direction of f at x^k, there exists a positive constant β_k such that x^k+αd^k∈Ω whenever 0<α<β_k. This implies
Clearly, Theorem 2.3 implies that condition (10) holds after a finite number of reductions of the trial step size. Consequently, Algorithm 2.1 is well defined. In addition, it follows from (10) that the function value sequence {f(x^k)} is decreasing. We also have from (10) that
if f is bounded from below. In particular, we have
3 Convergence Analysis
In this section, we analyze the convergence of Algorithm 2.1. To this end, we make the following assumptions.
Assumption 3.1
-
(i)
The level set D:={x∈Ω:f(x)≤f(x 0)} is bounded.
-
(ii)
In some neighborhood N of D, f is continuously differentiable and its gradient is Lipschitz continuous, namely, there exists a constant L>0 such that
$$ \big\|g(x)-g(y)\big\|\leq L \|x-y\|,\quad\forall x, y \in N. $$(12)
Since {f(x^k)} is decreasing, it is clear that the sequence {x^k} generated by Algorithm 2.1 is contained in D. In addition, from Assumption 3.1, we get that there is a constant γ>0 such that
The following lemma shows that the sequence {d^k} generated by Algorithm 2.1 is also bounded.
Lemma 3.1
Let the sequence {x k} be generated by Algorithm 2.1. Then there exists a constant M>0 such that
Proof
First, by (8), there exists a constant \(\bar{\gamma}>\gamma\) such that
This together with the definition of d k, (12) and (13) implies that
By (11), there exists a constant r∈]0,1[ and an integer k 0 such that the following inequality holds for all k≥k 0,
Hence, we have for all k≥k 0
Letting \(M:=\max\{\|d^{1}\|,\|d^{2}\|,\ldots, \|d^{k_{0}}\|,2\frac{\bar{\gamma}}{1-r}+\|d^{k_{0}}\|\}\), we get (14). □
Lemma 3.2
Let the sequences {x^k} and {d^k} be generated by Algorithm 2.1. If there are subsequences \(\{x^{k}\}_{k\in K}\rightarrow \widehat{x}\) and \(\{d^{k}\}_{k\in K}\rightarrow 0\), then \(\widehat{x}\) is a stationary point of (1).
Proof
Since the number of distinct sets L k, U k and F k is finite, without loss of generality, we can assume that the following relations hold for all k∈K
Since \(d^{k}_{L}\rightarrow 0\) and \(d^{k}_{U} \rightarrow 0\), we obviously have
On the other hand, by (7), we have
Consequently, we obtain \(\|g_{F}(\widehat{x})\|=0\). The proof is complete. □
The following theorem shows that there exists an accumulation point of {x k} that is a stationary point of (1).
Theorem 3.1
Let {x k} be generated by Algorithm 2.1. Then we have
Proof
For the sake of contradiction, we suppose that (15) is not true. Then there exists a constant ϵ>0 such that
This together with (11) implies that lim_{k→∞} α^k=0. By the line search rule, when k is sufficiently large, the trial step size ρ^{−1}α^k does not satisfy (10). This means
By the mean-value theorem and (12), there exists a constant 0<θ k<1 such that
where L is the Lipschitz constant of g. Substituting the last inequality into (16), we get for all k sufficiently large
Since {d k} is bounded and lim k→∞ α k∥d k∥=0, the last inequality implies
This together with (9) implies that
Since {x k} is bounded, there exists an infinite index set K 1 such that \(\lim_{k\in K_{1}}x^{k}=x^{*}\). Hence, (17) together with (3)–(4) implies that x ∗ is a stationary point of (1). By Theorem 2.2, we have \(\lim_{k \in K_{1}}\!\|d^{k}\|=0\). This yields a contradiction. □
4 Numerical Experiments
In this section, we report numerical experiments testing the performance of the APRP method and comparing it with the SPG2 method [13, 17] and the ASA method [3]. The SPG2 code was obtained from Birgin's web page at http://www.ime.usp.br/~egbirgin/tango/, and the ASA code from Hager's web page at http://www.math.ufl.edu/~hager/. The SPG2 and APRP codes were written in Fortran 77, and the ASA code was written in C. All codes were run on a PC with a Pentium IV 3.0 GHz processor and 512 MB of RAM under a Linux operating system.
The ASA code and the SPG2 code were run with their default parameters. We implemented the APRP code with the following parameters: δ=10^{−4}, ρ=0.5, ϵ=10^{−6}, and
Moreover, we also tested the APRP method with different parameter functions a_i(x) and b_i(x) and found that a_i(x)=b_i(x)=10^{−6} is the best choice.
The test set consists of 52 box constrained problems from the CUTEr library [16] with dimensions between 500 and 15625. For each test problem and all tested methods, the termination condition is
A limit of 5000 iterations is also imposed. In running the numerical experiments, we checked whether different codes converged to different local minimizers; when comparing the codes, we restricted ourselves to test problems on which all codes converged to the same local minimizer. The detailed numerical results, including the CPU time in seconds and the numbers of iterations, function evaluations, and gradient evaluations for each tested method, are posted at the following Web site: http://bgchengwy.blog.163.com/.
First, we note that the SPG2 method fails to satisfy the termination condition on the problems LINVERSE (999), QRTQUAD (1200–5000), and JNLBRNGB (9375–15625), while the APRP method and the ASA method fail only on QRTQUAD (1200–5000). To gain more insight into the numerical results, we consider the statistics "iter," "fun," "grad," and "time," and list in Table 1 the number of problems on which each method achieves the best value of the number of iterations, function evaluations, gradient evaluations, and CPU time.
From the above table, we observe that the APRP method works efficiently, requiring fewer iterations, fewer function evaluations, fewer gradient evaluations, and less CPU time. To further show the efficiency of the APRP method, we adopt the performance profiles introduced by Dolan and Moré [18] to evaluate CPU time in Fig. 1. It is not difficult to see from Fig. 1 that the APRP method performs best among all the methods in terms of CPU time. In particular, both the APRP method and the ASA method solve about 96 % of the test problems, whereas the SPG2 method solves about 90 %. In summary, the results in Table 1 and Fig. 1 show that the APRP method is an efficient method for solving large-scale nonlinear box constrained problems.
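For completeness, the Dolan–Moré performance profile [18] used for Fig. 1 can be computed as in the following Python sketch (our own minimal implementation, not the code used in the experiments). For each solver s it reports ρ_s(τ), the fraction of problems solved within a factor τ of the best time over all solvers:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles [18].  T is an (n_problems,
    n_solvers) array of CPU times, with np.inf marking failures.
    Returns, for each tau in taus, the fraction of problems each
    solver solves within a factor tau of the best solver's time."""
    best = T.min(axis=1, keepdims=True)    # best time per problem
    ratios = T / best                      # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau)
                      for s in range(T.shape[1])]
                     for tau in taus])
```

Plotting ρ_s(τ) against τ gives curves like those in Fig. 1: the value at τ=1 is the fraction of problems on which a solver is fastest, and the height as τ grows shows robustness.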
5 Conclusions
In this paper, we have introduced an active set modified PRP method for large-scale optimization with simple bounds on the variables. Under appropriate conditions, we showed that the proposed method is globally convergent. The implementation of the algorithm uses an active set identification technique to estimate the active variables at each iteration, and determines the search direction for the free variables by the modified PRP conjugate gradient method. The active set identification strategy allows rapid changes in the active set from one iteration to the next, and the cost of computing the search direction for the active variables is low. We believe that our algorithm is practical and effective for the test problems.
References
Gould, N.I.M., Orban, D., Toint, P.L.: Numerical methods for large-scale nonlinear optimization. Acta Numer. 14, 299–361 (2005)
Burdakov, O.P., Martínez, J.M., Pilotta, E.A.: A limited-memory multipoint symmetric secant method for bound constrained optimization. Ann. Oper. Res. 117, 51–70 (2002)
Hager, W.W., Zhang, H.: A new active set algorithm for box constrained optimization. SIAM J. Optim. 17, 526–557 (2006)
Goldfarb, D.: Extension of Davidon's variable metric method to maximization under linear inequality and equality constraints. SIAM J. Appl. Math. 17, 739–764 (1969)
Dai, Y.H., Fletcher, R.: Projected Barzilai–Borwein methods for large-scale box-constrained quadratic programming. Numer. Math. 100, 21–47 (2005)
Birgin, E.G., Martínez, J.M.: Large-scale active-set box-constrained optimization method with spectral projected gradients. Comput. Optim. Appl. 23, 101–125 (2002)
Facchinei, F., Júdice, J., Soares, J.: An active set Newton algorithm for large-scale nonlinear programs with box constraints. SIAM J. Optim. 8, 158–186 (1998)
Facchinei, F., Lucidi, S., Palagi, L.: A truncated Newton algorithm for large-scale box constrained optimization. SIAM J. Optim. 12, 1100–1125 (2002)
Lin, C.J., Moré, J.J.: Newton’s method for large bound-constrained optimization problems. SIAM J. Optim. 9, 1100–1127 (1999)
Moré, J.J., Toraldo, G.: Algorithms for bound constrained quadratic programming problems. Numer. Math. 55, 377–400 (1989)
Ni, Q., Yuan, Y.X.: A subspace limited memory quasi-Newton algorithm for large-scale nonlinear bound constrained optimization. Math. Comput. 66, 1509–1520 (1997)
Xiao, Y.H., Hu, Q.G.: Subspace Barzilai–Borwein gradient method for large-scale bound constrained optimization. Appl. Math. Optim. 58, 275–290 (2008)
Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10, 1196–1211 (2000)
Hager, W.W., Zhang, H.: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2, 35–58 (2006)
Zhang, L., Zhou, W.J., Li, D.H.: A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 26, 629–640 (2006)
Bongartz, I., Conn, A.R., Gould, N.I.M., Toint, P.L.: CUTE: constrained and unconstrained testing environment. ACM Trans. Math. Softw. 21, 123–160 (1995)
Birgin, E.G., Martínez, J.M., Raydan, M.: Algorithm 813: SPG-software for convex-constrained optimization. ACM Trans. Math. Softw. 27, 340–349 (2001)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
Acknowledgements
We would like to thank the referees for their detailed comments and suggestions, which have led to a significant improvement of the paper. We also thank Professors E.G. Birgin, J.M. Martínez, and M. Raydan for their SPG2 code and Professors W.W. Hager and H. Zhang for their ASA code, used in the numerical comparison. W. Cheng's research was supported by the NSF of China (grants 11071087, 11101081) and by the Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (LYM10127). D. Li's research was supported by the major project of the Ministry of Education of China (grant 309023) and the NSF of China (grant 11071087).
Communicated by F. Giannessi.
Cheng, W., Li, D.: An Active Set Modified Polak–Ribière–Polyak Method for Large-Scale Nonlinear Bound Constrained Optimization. J. Optim. Theory Appl. 155, 1084–1094 (2012). https://doi.org/10.1007/s10957-012-0091-9