1 Introduction

The bound constrained problem plays an important role in practical optimization. Many practical problems can be converted into this form. In addition, it often arises as a subproblem of augmented Lagrangian or penalty schemes for general constrained optimization [1]. In recent work on complementarity problems and variational inequalities, these problems are efficiently reduced to bound constrained minimization problems [2]. Thus, the development of numerical algorithms that solve it efficiently, especially when the dimension is large, is important in both theory and applications. The bound constrained optimization problem has received much attention in recent decades. We refer to the papers [1, 3] for a review of recent advances in this area.

Active set methods are widely used for solving the bound constrained optimization problem. Early active set methods [4] are quite efficient for small dimensional problems, but are unattractive for large-scale problems [5]. The main reason is that, typically, at each step of the algorithm at most one constraint can be added to or dropped from the active set. In the worst case, each of the \(3^n\) possible active sets may be visited before the optimal one is discovered [1]. Recently, there has been growing interest in the design of active set methods that are capable of making rapid changes to incorrect predictions [1, 6–11].

In [7], Facchinei, Júdice, and Soares proposed an efficient identification technique of the active set at the solution and developed an active set Newton’s algorithm for large-scale nonlinear programming with box constraints. The reported numerical results in [7] showed that this identification technique works well. Recently, based on the identification technique, Xiao and Hu [12] proposed an active set subspace Barzilai–Borwein gradient method (SBB). Preliminary numerical results show that the identification technique works well and the SBB method is competitive with the well-known SPG2 method [13].

Conjugate gradient methods are very efficient for solving large-scale unconstrained optimization problems due to their simplicity and low storage requirements [14]. However, studies of conjugate gradient methods for nonquadratic bound constrained optimization are relatively rare. In this paper, by using the active set identification technique in [7] and the recently developed modified Polak–Ribière–Polyak (PRP) method [15], we propose an active set modified PRP method for the bound constrained optimization problem. Specifically, at each iteration the active variables and free variables are identified by the technique of [7]; we use the method in [7] to update the active variables and the modified PRP method to update the free variables. The use of the modified PRP method keeps the storage requirements low, so the method can be applied to large-scale problems. In addition, the method possesses the following advantages: all iterates are feasible and rapid changes in the active set are allowed. Under appropriate conditions, we show that the proposed method is globally convergent. Numerical experiments on the bound constrained problems from the CUTEr [16] collection with more than 500 variables show that the proposed method is competitive with some existing methods such as the SPG2 method [13] and the ASA method [3].

The remainder of the paper is organized as follows. We propose the algorithm in Sect. 2. In Sect. 3, we show that the proposed algorithm is globally convergent. In Sect. 4, we test the performance of the proposed algorithm and compare it with the SPG2 method and the ASA method.

2 Motivation and Properties

Throughout the paper, \(\|\cdot\|\) denotes the Euclidean norm of vectors. Let \(P_{\varOmega}(x)\) denote the projection of \(x\) onto the set \(\varOmega\). If \(w=(w_1,w_2,\ldots,w_n)^T\) is an \(n\)-dimensional vector and \(I\subset\{1,2,\ldots,n\}\) is an index set, we denote by \(w_I\) the subvector with components \(w_i\), \(i\in I\), and write \(w_I\geq 0\) to mean that \(w_i\geq 0\) for all \(i\in I\).
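For the box-shaped feasible sets considered in this paper, \(P_{\varOmega}\) has a simple closed form: it clips each component of \(x\) to its bounds. A minimal sketch in Python (the function name is ours):

```python
import numpy as np

def project_box(x, l, u):
    """Projection P_Omega onto the box {x : l <= x <= u} (componentwise clip)."""
    return np.minimum(np.maximum(x, l), u)
```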

Consider the solution of the bound constrained nonlinear programming problem

$$ \min f(x), \quad x\in \varOmega:=\bigl\{x\in {\mathcal{R}}^n: l\leq x\leq u\bigr\}, $$
(1)

where f is a real-valued, continuously differentiable function on an open set containing Ω, l and u are constant vectors, and the inequalities are understood componentwise. Let \(\bar{x}\) be a stationary point of (1), and consider the associated active sets

$$\bar{L}:=\{i:\bar{x}_i=l_i\}, \quad\quad \bar{U}:=\{i:\bar{x}_i=u_i\}. $$

Furthermore, let

$$\bar{F}:=\{1,\ldots,n\}\setminus(\bar{L}\cup\bar{U}) $$

be the set of the free variables. By using this notation, a vector \(\bar{x}\) is said to be a stationary point for (1) iff it satisfies:

$$ \left \{\begin{array}{ll} i\in \bar{L} & \Rightarrow g_i(\bar{x}) \geq 0 \\ i \in \bar{F} & \Rightarrow g_i(\bar{x})= 0 \\ i \in \bar{U} & \Rightarrow g_i(\bar{x})\leq 0\\ \end{array} \right . $$
(2)

where \(g_i(x)\) is the \(i\)th component of the gradient vector of f at x. Facchinei, Júdice, and Soares [7] gave the following approximations L(x), F(x), and U(x) to \(\bar{L}\), \(\bar{F}\), and \(\bar{U}\), respectively:

$$L(x):=\bigl\{i: x_i\leq l_i+a_i(x)g_i(x)\bigr\},\quad\quad U(x):=\bigl\{i: x_i\geq u_i+b_i(x)g_i(x)\bigr\},\quad\quad F(x):=\{1,\ldots,n\}\setminus\bigl(L(x)\cup U(x)\bigr), $$

where \(a_i(x)\) and \(b_i(x)\), \(i=1,\ldots,n\), are nonnegative, continuous, and bounded functions defined on Ω such that, if \(x_i=l_i\) or \(x_i=u_i\), then \(a_i(x)>0\) or \(b_i(x)>0\), respectively. The following theorem shows that L(x), F(x), and U(x) are indeed good estimates of \(\bar{L}\), \(\bar{F}\), and \(\bar{U}\). For the proof, see Theorem 3.1 in [7].
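As an illustration, the identification step can be coded in a few lines; the sketch below assumes the estimates take the form displayed above and uses the constant choice \(a_i(x)=b_i(x)=10^{-6}\) adopted in the experiments of Sect. 4 (function and variable names are ours):

```python
import numpy as np

def estimate_active_sets(x, g, l, u, a=1e-6, b=1e-6):
    """Estimates L(x), U(x), F(x); a and b play the role of a_i(x), b_i(x).

    Returns boolean masks over the indices 1, ..., n.
    """
    L_mask = x <= l + a * g        # estimated lower-active set L(x)
    U_mask = x >= u + b * g        # estimated upper-active set U(x)
    F_mask = ~(L_mask | U_mask)    # estimated free set F(x)
    return L_mask, U_mask, F_mask
```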

Theorem 2.1

For any \(x\in\varOmega\), \(L(x)\cap U(x)=\emptyset\). Furthermore, if \(\bar{x}\) is a stationary point of problem (1) at which strict complementarity holds, then there exists a neighborhood \(N(\bar{x})\) of \(\bar{x}\) such that

$$L(x)=\bar{L}, \quad\quad F(x)=\bar{F}, \quad\quad U(x)=\bar{U}, \quad \forall \ x\in N(\bar{x}). $$

Following the idea of [7], we now develop an active set modified PRP method. We first define the search direction. Let \(x^k\in\varOmega\) be the current point at iteration k. For simplicity, we write \(L^k:=L(x^k)\), \(U^k:=U(x^k)\), and \(F^k:=F(x^k)\). Define the direction \(d^{k}=(d^{k}_{L^{k}},d^{k}_{F^{k}},d^{k}_{U^{k}})^{T}\) by

$$ d_{i}^k:=l_i-x_{i}^k, \quad i\in L^k $$
(3)

and

$$ d_{i}^k:=u_i-x_{i}^k, \quad i\in U^k. $$
(4)

In what follows, we define \(d^{k}_{F^{k}}\). To this end, we define the set of active indices of f at \(x^k\) as follows:

$$A\bigl(x^k\bigr):=\bigl\{ i:x^k_{i}=l_i \ \mbox{or}\ x^k_{i}=u_i\bigr\}. $$

The active indices are further subdivided into the following four subsets:

$$A_1\bigl(x^k\bigr):=\bigl\{i:x^k_{i}=l_i, g_i\bigl(x^k\bigr)\geq 0\bigr\}, \quad\quad A_2 \bigl(x^k\bigr):=\bigl\{i:x^k_{i}=l_i, g_i\bigl(x^k\bigr) <0\bigr\} $$

and

$$A_3\bigl(x^k\bigr):=\bigl\{i:x^k_{i}=u_i, g_i\bigl(x^k\bigr) \leq 0\bigr\}, \quad\quad A_4\bigl(x^k\bigr):=\bigl\{i:x^k_{i}=u_i, g_i\bigl(x^k\bigr) > 0\bigr\}. $$
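In code, these four subsets are obtained directly from the sign pattern of the gradient on the active indices; a small sketch (the function name is ours):

```python
import numpy as np

def classify_active(x, g, l, u):
    """Split the active indices A(x) into A_1, A_2, A_3, A_4 (boolean masks)."""
    at_l, at_u = np.isclose(x, l), np.isclose(x, u)
    A1 = at_l & (g >= 0)   # at lower bound, gradient nonnegative
    A2 = at_l & (g < 0)    # at lower bound, gradient negative
    A3 = at_u & (g <= 0)   # at upper bound, gradient nonpositive
    A4 = at_u & (g > 0)    # at upper bound, gradient positive
    return A1, A2, A3, A4
```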

It is clear that a direction d is a feasible direction of f at \(x^k\) if and only if \(d_i\geq 0\) for \(i\in A_1(x^k)\cup A_2(x^k)\) and \(d_i\leq 0\) for \(i\in A_3(x^k)\cup A_4(x^k)\). Furthermore, from the definitions of \(L(x^k)\), \(U(x^k)\), and \(A_1(x^k)\)–\(A_4(x^k)\), we can observe that the following fact holds:

$$A_1\bigl(x^k\bigr)\subseteq L\bigl(x^k\bigr),\quad\quad A_3\bigl(x^k\bigr)\subseteq U\bigl(x^k\bigr),\quad\quad A_2\bigl(x^k\bigr)\cap L\bigl(x^k\bigr)=\emptyset,\quad\quad A_4\bigl(x^k\bigr)\cap U\bigl(x^k\bigr)=\emptyset. $$

By (3) and (4), we have \(d^{k}_{i} \geq 0\) for \(i\in A_{1}(x^{k})\) and \(d^{k}_{i} \leq 0\) for \(i\in A_{3}(x^{k})\). However, we cannot determine the relation between \(A_2(x^k)\) and \(U(x^k)\), nor the relation between \(A_4(x^k)\) and \(L(x^k)\). Let \(A^{*}_{2}(x^{k}):=A_{2}(x^{k}) \cap U(x^{k})\), \(A^{*}_{4}(x^{k}):=A_{4}(x^{k})\cap L(x^{k})\), and \(F^{*}(x^{k}):=F(x^{k})\setminus((A_{4}(x^{k})-A^{*}_{4}(x^{k})) \cup (A_{2}(x^{k})-A^{*}_{2}(x^{k})))\). It follows from (3) and (4) that \(d^{k}_{i} \geq 0\) for \(i\in A^{*}_{2}(x^{k})\) and \(d^{k}_{i} \leq 0\) for \(i\in A^{*}_{4}(x^{k})\). Thus, we let

$$ d^k_{i}:=-g_{i}^k, \quad i\in \bigl(A_4\bigl(x^k\bigr)-A^{*}_4 \bigl(x^k\bigr)\bigr) \cup \bigl(A_2\bigl(x^k \bigr)-A^{*}_2\bigl(x^k\bigr)\bigr) $$
(5)

and

$$ d^{k}_{F^{*k}}:=\left \{\begin{array}{l@{\quad}l} -g^{k}_{F^{*k}}, & \mbox{if\ } k=0 \ \mbox{or} \ \big\|g^{k-1}_{F^{*k}}\big\|\leq \epsilon, \\[4pt] -g^{k}_{F^{*k}}+\beta_{k}^{\mathrm{PRP}}d^{k-1}_{F^{*k}}-\eta_ky_{F^{*k}}^{k-1}, & \mbox{if } k>0 \ \mbox{and} \ \big\|g^{k-1}_{F^{*k}}\big\| > \epsilon, \end{array} \right . $$
(6)

where

$$\beta_{k}^{\mathrm{PRP}}:=\frac{(g^{k}_{F^{*k}})^{T}y^{k-1}_{F^{*k}}}{\|g^{k-1}_{F^{*k}}\|^{2}},\quad\quad \eta_{k}:=\frac{(g^{k}_{F^{*k}})^{T}d^{k-1}_{F^{*k}}}{\|g^{k-1}_{F^{*k}}\|^{2}},\quad\quad y^{k-1}_{F^{*k}}:=g^{k}_{F^{*k}}-g^{k-1}_{F^{*k}}, $$

and ϵ>0 is a given constant. From the above discussion, we can see that \(d^k\) is a feasible direction of f at \(x^k\). Moreover, it is not difficult to see from (5) and (6) that \(d_{F^{k}}^{k}\) satisfies

$$ \bigl(g_{F^k}^k\bigr)^Td_{F^k}^k=- \big\|g_{F^k}^k\big\|^2. $$
(7)
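To summarize the construction (3)–(6), the following sketch assembles \(d^k\) from the pieces above; it assumes the identification estimates and the three-term PRP quantities take the forms given earlier, and all function and variable names are ours:

```python
import numpy as np

def search_direction(x, g, l, u, prev=None, eps=1e-6):
    """Assemble the direction d^k from (3)-(6).

    prev is None at k = 0; otherwise a dict holding the previous full
    gradient prev["g"] and previous full direction prev["d"].
    """
    L_mask, U_mask, F_mask = estimate_active_sets(x, g, l, u)
    d = np.zeros_like(x)
    d[L_mask] = l[L_mask] - x[L_mask]                 # (3)
    d[U_mask] = u[U_mask] - x[U_mask]                 # (4)

    # free indices that sit on a bound with a "wrong sign" gradient: (5)
    A2_out = F_mask & np.isclose(x, l) & (g < 0)      # A_2 minus A_2^*
    A4_out = F_mask & np.isclose(x, u) & (g > 0)      # A_4 minus A_4^*
    d[A2_out | A4_out] = -g[A2_out | A4_out]

    Fstar = F_mask & ~(A2_out | A4_out)               # F^*(x^k)
    gF = g[Fstar]
    if prev is None or np.linalg.norm(prev["g"][Fstar]) <= eps:
        d[Fstar] = -gF                                # first branch of (6)
    else:
        g_old, d_old = prev["g"][Fstar], prev["d"][Fstar]
        y = gF - g_old
        denom = np.dot(g_old, g_old)                  # ||g^{k-1}_{F*}||^2
        beta = np.dot(gF, y) / denom                  # beta_k^PRP
        eta = np.dot(gF, d_old) / denom               # eta_k
        d[Fstar] = -gF + beta * d_old - eta * y       # second branch of (6)
    return d
```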

The following two theorems show that \(d^k\) is a descent direction of f at \(x^k\) if \(x^k\) is not a stationary point of (1).

Theorem 2.2

If \(x^k\in\varOmega\), then \(d^k=0\) if and only if \(x^k\) is a stationary point of (1).

Proof

Let \(d^k=0\). For \(i\in L^k\), by (3) and the definition of \(L^k\), we have

$$0=d^{k}_i=l_i-x_{i}^k \geq -a_i\bigl(x^k\bigr) g_i \bigl(x^k\bigr). $$

Since \(x_{i}^{k}=l_{i}\) and \(a_i(x^k)>0\), it holds that \(g_i(x^k)\geq 0\). For each \(i\in U^k\), we have

$$0=d^{k}_i=u_i-x_{i}^k \leq -b_{i}\bigl(x^k\bigr)g_i \bigl(x^k\bigr). $$

This implies that \(g_i(x^k)\leq 0\). From the above discussion, we have \(A_2(x^k)\cup A_4(x^k)=\emptyset\). Thus, \(F^{*k}=F^k\). For \(i\in F^k\), applying the Cauchy–Schwarz inequality to (7), we have

$$\big\|g_{F^k}^k \big\| \leq \big\|d_{F^k}^k \big\|. $$

Hence, \(g_i(x^k)=0\) for each \(i\in F^k\). The above argument shows that \(x^k\) is a stationary point of f on Ω.

Conversely, suppose that \(x^k\) is a stationary point of f on Ω. Then we have \(A_2(x^k)\cup A_4(x^k)=\emptyset\) and \(F^{*k}=F^k\). It follows from (2) that

$$L^k=\{i:x^k_i=l_i\}, \quad\quad F^k=\{i:l_i<x^k_i<u_i\}, \quad\quad U^k=\{i:x^k_i=u_i\}. $$

By the definition of \(d^k\), we immediately have \(d^k=0\). The proof is complete. □

Theorem 2.3

Suppose that \(x^k\) is not a stationary point of (1). Then the direction \(d^k\) determined by (3)–(6) is a descent direction of f at \(x^k\).

Proof

Following an argument similar to the proof of Theorem 3.2 in [7], it is not difficult to show that, for each \(i\in L^k\cup U^k\), there exists a positive scalar \(\gamma_i\) such that

$$ g^{k}_i d^{k}_i \leq -\gamma_i \bigl(d_{i}^k\bigr)^2. $$
(8)

This together with (7) implies that

$$ \bigl(g^k\bigr)^Td^k\leq - \big\|g_{F^k}^k\big\|^2-\sum_{i \in L^k \cup U^k}\gamma_i \bigl(d_{i}^k \bigr)^2. $$
(9)

Since \(x^k\) is not a stationary point of (1), by Theorem 2.2, there must exist an index i such that at least one of the following two possibilities holds: (i) \(i\in F^k\) and \(g_i(x^k)\neq 0\); (ii) \(i\in L^k\cup U^k\) and \(d_i^k\neq 0\).

Therefore, the right-hand side of (9) must be negative. In other words, d k is a descent direction of f at x k. The proof is complete. □

Based on the above discussion, we propose an active set PRP-type method (APRP) for (1) as follows.

Algorithm 2.1

(APRP)

Step 0.:

Given an initial point \(x^0\in\varOmega\) and positive constants ρ, β, \(\epsilon_1\), and δ with ρ, δ∈ ]0,1[. Set k:=0.

Step 1.:

If \(\|P_{\varOmega}(x^k-g^k)-x^k\|\leq \epsilon_1\), then stop.

Step 2.:

Compute d k by (3)–(6).

Step 3.:

Determine \(\alpha^k:=\max\{\beta\rho^j,\ j=0,1,\ldots\}\) satisfying

$$ f\bigl(P_{\varOmega}\bigl(x^k+\alpha^kd^k \bigr)\bigr)\leq f\bigl(x^k\bigr)-\delta\bigl(\alpha^{k} \big\|d^k\big\|\bigr)^2. $$
(10)
Step 4.:

Let the next iterate be \(x^{k+1}:=x^k+\alpha^k d^k\).

Step 5.:

Let k:=k+1 and go to Step 1.
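For concreteness, Algorithm 2.1 can be sketched as follows, reusing the helper functions from the earlier sketches (all names are ours; in Step 4 the sketch applies the projection, which by Remark 2.1 coincides with \(x^k+\alpha^k d^k\) for sufficiently small stepsizes):

```python
import numpy as np

def aprp(f, grad, x0, l, u, delta=1e-4, rho=0.5, beta0=1.0,
         eps1=1e-6, max_iter=5000):
    """A sketch of Algorithm 2.1 (APRP)."""
    x = np.minimum(np.maximum(x0, l), u)   # start from a feasible point
    prev = None
    for k in range(max_iter):
        g = grad(x)
        # Step 1: stop when the projected-gradient residual is small
        if np.linalg.norm(project_box(x - g, l, u) - x) <= eps1:
            break
        # Step 2: search direction from (3)-(6)
        d = search_direction(x, g, l, u, prev)
        # Step 3: backtracking line search on condition (10)
        alpha, fx, nd2 = beta0, f(x), np.dot(d, d)
        while f(project_box(x + alpha * d, l, u)) > fx - delta * alpha ** 2 * nd2:
            alpha *= rho
        # Step 4: next iterate (projected; see the remark in the lead-in)
        x = project_box(x + alpha * d, l, u)
        prev = {"g": g, "d": d}
    return x
```

In an actual implementation, the trial step β would be chosen adaptively at each iteration, as described in Sect. 4.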

Remark 2.1

Since the direction \(d^k\) is a feasible descent direction of f at \(x^k\), there exists a positive constant \(\beta^k\) such that \(x^k+\alpha d^k\in\varOmega\) whenever \(0<\alpha<\beta^k\). This implies

$$f\bigl(P_{\varOmega}\bigl(x^k+\alpha^{k}d^k \bigr)\bigr)=f\bigl(x^k+\alpha^{k}d^k\bigr) \quad \mbox{for} \ 0< \alpha^{k} <\beta^k. $$

Clearly, Theorem 2.3 implies that condition (10) holds after a finite number of reductions of β. Consequently, Algorithm 2.1 is well defined. In addition, it follows from (10) that the sequence of function values \(\{f(x^k)\}\) is decreasing. We also have from (10) that

$$\sum_{k=0}^{\infty}\bigl(\alpha^{k} \bigr)^2\big\|d^k\big\|^2 <\infty, $$

if f is bounded from below. In particular, we have

$$ \lim_{k \rightarrow \infty}\alpha^k\big\|d^k\big\|=0. $$
(11)
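Indeed, (10) together with the identity in Remark 2.1 gives \(f(x^{k+1})\leq f(x^k)-\delta(\alpha^{k})^2\|d^k\|^2\); summing over \(k=0,\ldots,K\) and telescoping,

$$\delta\sum_{k=0}^{K}\bigl(\alpha^{k}\bigr)^2\big\|d^k\big\|^2 \leq f\bigl(x^0\bigr)-f\bigl(x^{K+1}\bigr)\leq f\bigl(x^0\bigr)-\inf_{x\in \varOmega}f(x), $$

and letting K→∞ yields the stated bound whenever f is bounded from below on Ω.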

3 Convergence Analysis

In this section, we analyze the convergence of Algorithm 2.1. To this end, we make the following assumptions.

Assumption 3.1

(i) The level set \(D:=\{x\in\varOmega: f(x)\leq f(x^0)\}\) is bounded.

(ii) In some neighborhood N of D, f is continuously differentiable and its gradient is Lipschitz continuous, namely, there exists a constant L>0 such that

    $$ \big\|g(x)-g(y)\big\|\leq L \|x-y\|,\quad\forall x, y \in N. $$
    (12)

Since \(\{f(x^k)\}\) is decreasing, it is clear that the sequence \(\{x^k\}\) generated by Algorithm 2.1 is contained in D. In addition, from Assumption 3.1, we deduce that there is a constant γ>0 such that

$$ \big\|g(x)\big\| \leq \gamma,\quad \forall x \in D. $$
(13)

The following lemma shows that the sequence \(\{d^k\}\) generated by Algorithm 2.1 is also bounded.

Lemma 3.1

Let the sequence {x k} be generated by Algorithm 2.1. Then there exists a constant M>0 such that

$$ \big\|d^k\big\| \leq M,\quad \forall k. $$
(14)

Proof

First, by (8), there exists a constant \(\bar{\gamma}>\gamma\) such that

$$\big\|d^k_{L^k \cup U^k}\big\| \leq \bar{\gamma}. $$

This together with the definition of \(d^k\), (12), and (13) implies that

$$\big\|d^k\big\| \leq \big\|d^k_{L^k\cup U^k}\big\|+\big\|d^k_{F^k}\big\| \leq 2\bar{\gamma}+\frac{2L\bar{\gamma}}{\epsilon^2}\alpha^{k-1}\big\|d^{k-1}\big\|\,\big\|d^{k-1}\big\|. $$

By (11), there exist a constant r∈]0,1[ and an integer \(k_0\) such that the following inequality holds for all \(k>k_0\):

$$\frac{2L\bar{\gamma}}{\epsilon^2}\alpha^{k-1}\big\|d^{k-1}\big\| \leq r. $$

Hence, we have for all \(k>k_0\)

$$\big\|d^k\big\| \leq 2\bar{\gamma}+r\big\|d^{k-1}\big\| \leq 2\bar{\gamma}\bigl(1+r+\cdots+r^{k-k_0-1}\bigr)+r^{k-k_0}\big\|d^{k_0}\big\| \leq \frac{2\bar{\gamma}}{1-r}+\big\|d^{k_0}\big\|. $$

Letting \(M:=\max\{\|d^{0}\|,\|d^{1}\|,\ldots, \|d^{k_{0}}\|,\frac{2\bar{\gamma}}{1-r}+\|d^{k_{0}}\|\}\), we get (14). □

Lemma 3.2

Let the sequences \(\{x^k\}\) and \(\{d^k\}\) be generated by Algorithm 2.1. If there are subsequences \(\{x^{k}\}_{k\in K}\rightarrow \widehat{x}\) and \(\{d^{k}\}_{k\in K}\rightarrow 0\), then \(\widehat{x}\) is a stationary point of (1).

Proof

Since the number of distinct sets \(L^k\), \(U^k\), and \(F^k\) is finite, without loss of generality, we can assume that the following relations hold for all \(k\in K\):

$$L^k=L,\quad\quad F^k=F,\quad\quad U^k=U. $$

Since \(d^{k}_{L}\rightarrow 0\) and \(d^{k}_{U} \rightarrow 0\), we obviously have

$$\widehat{x}_L=l_{L},\quad g_{L}(\widehat{x}) \geq 0,\quad\mbox{and}\quad \widehat{x}_U=u_{U},\quad g_{U}(\widehat{x}) \leq 0. $$

On the other hand, by (7), we have

$$\big\|g^k_{F}\big\|\leq \big\|d^k_{F}\big\|. $$

Consequently, we obtain \(\|g_{F}(\widehat{x})\|=0\), and hence \(\widehat{x}\) is a stationary point of (1). The proof is complete. □

The following theorem shows that there exists an accumulation point of \(\{x^k\}\) that is a stationary point of (1).

Theorem 3.1

Let {x k} be generated by Algorithm 2.1. Then we have

$$ \liminf_{k \rightarrow \infty} \big\|d^k\big\|=0. $$
(15)

Proof

For the sake of contradiction, we suppose that (15) is not true. Then there exists a constant ϵ>0 such that

$$\big\| d^k \big\| \geq \epsilon , \quad \forall k. $$

This together with (11) implies that \(\lim_{k\rightarrow\infty}\alpha^k=0\). By the line search rule, the stepsize \(\rho^{-1}\alpha^{k}\) does not satisfy (10) when k is sufficiently large. This means

$$ f\bigl(x^k+\rho^{-1} \alpha^{k} d^k\bigr)-f\bigl(x^k\bigr)>-\delta \rho^{-2} \bigl(\alpha^{k}\bigr)^2\big\|d^k\big\|^2. $$
(16)

By the mean-value theorem and (12), there exists a constant \(0<\theta^k<1\) such that

$$f\bigl(x^k+\rho^{-1}\alpha^{k}d^k\bigr)-f\bigl(x^k\bigr)=\rho^{-1}\alpha^{k}g\bigl(x^k+\theta^k\rho^{-1}\alpha^{k}d^k\bigr)^Td^k \leq \rho^{-1}\alpha^{k}\bigl(g^k\bigr)^Td^k+L\rho^{-2}\bigl(\alpha^{k}\bigr)^2\big\|d^k\big\|^2, $$

where L is the Lipschitz constant of g. Substituting the last inequality into (16), we get for all k sufficiently large

$$-\bigl(g^k\bigr)^Td^k < (L+\delta)\rho^{-1}\alpha^{k}\big\|d^k\big\|^2. $$

Since \(\{d^k\}\) is bounded and \(\lim_{k\rightarrow\infty}\alpha^k\|d^k\|=0\), the last inequality implies

$$\lim_{k \rightarrow \infty}-\bigl(g^{k}\bigr)^Td^k=0. $$

This together with (9) implies that

$$ \lim_{k \rightarrow \infty} \big\|g^{k}_{F^k} \big\|^2=0\quad \mbox{and}\quad\lim_{k\rightarrow \infty} \sum _{i \in L^k \cup U^k} \gamma_i \bigl(d_{i}^k \bigr)^2=0. $$
(17)

Since \(\{x^k\}\) is bounded, there exists an infinite index set \(K_1\) such that \(\lim_{k\in K_{1}}x^{k}=x^{*}\). Hence, (17) together with (3)–(4) implies that \(x^*\) is a stationary point of (1). By Theorem 2.2, we have \(\lim_{k \in K_{1}}\|d^{k}\|=0\). This yields a contradiction. □

4 Numerical Experiments

In this section, we report some numerical experiments to test the performance of the APRP method and compare it with the SPG2 method [13, 17] and the ASA method [3]. The SPG2 code was obtained from Birgin's web page at http://www.ime.usp.br/~egbirgin/tango/, and the ASA code from Hager's web page at http://www.math.ufl.edu/~hager/. The SPG2 and APRP codes were written in Fortran 77 and the ASA code in C. All codes were run on a PC with a Pentium IV 3.0 GHz processor and 512 MB of RAM under a Linux operating system.

The ASA and SPG2 codes were run with their default parameters. We implemented the APRP code with the following parameters: \(\delta=10^{-4}\), \(\rho=0.5\), \(\epsilon=10^{-6}\), and

$$\beta:=\left \{ \begin{array}{l@{\quad}l} \min\bigl\{1.0,1.01\frac{g^T_{k-1}d_{k-1}}{g_k^Td_k}\bigr\}, &\mbox{if} \ k>0,\\[4pt] \frac{1}{\|g(x_0)\|_{\infty}}, & \mbox{if}\ k=0. \end{array} \right . $$
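A small sketch of this trial-step rule (the function name is ours):

```python
import numpy as np

def trial_step(k, g, d, g_prev=None, d_prev=None, g0=None):
    """Initial trial stepsize beta used in the backtracking search of Step 3."""
    if k == 0:
        return 1.0 / np.linalg.norm(g0, np.inf)
    # both inner products are negative for descent directions, so the ratio > 0
    return min(1.0, 1.01 * np.dot(g_prev, d_prev) / np.dot(g, d))
```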

Moreover, we also tested the APRP method with different choices of the parameters \(a_i(x)\) and \(b_i(x)\) and found that \(a_i(x)=b_i(x)=10^{-6}\) is the best choice.

The test set consists of 52 box constrained problems from the CUTEr library [16] with dimensions between 500 and 15625. For each test problem and all tested methods, the termination condition is

$$\big\|P_{\varOmega}\bigl(x^k- g\bigl(x^k\bigr) \bigr)-x^k\big\|_{\infty}\leq 10^{-6}. $$

A limit of 5000 iterations is also imposed. In running the numerical experiments, we checked whether different codes converged to different local minimizers; when comparing the codes, we restricted ourselves to the test problems in which all codes converged to the same local minimizer. The detailed numerical results, including the CPU time in seconds and the numbers of iterations, function evaluations, and gradient evaluations for each of the tested methods, are posted at the following web site: http://bgchengwy.blog.163.com/.

First, we note that the SPG2 method fails to satisfy the termination condition for the problems LINVERSE (999), QRTQUAD (1200–5000), and JNLBRNGB (9375–15625), while the APRP method and the ASA method fail to satisfy it only for the problems QRTQUAD (1200–5000). To gain more insight into the numerical results, we consider the statistics "iter," "fun," "grad," and "time," and list in Table 1 the number of problems for which each method achieves the best value of the number of iterations, function evaluations, gradient evaluations, and CPU time.

Table 1 Statistical data for the 52 problems

From the above table, we observe that the APRP method works efficiently: it requires fewer iterations, function evaluations, and gradient evaluations, and less CPU time. To further illustrate the efficiency of the APRP method, we adopt the performance profiles introduced by Dolan and Moré [18] to evaluate the CPU time in Fig. 1. It is not difficult to see from Fig. 1 that the APRP method performs best among all the methods with respect to CPU time. In particular, both the APRP method and the ASA method solve about 96 % of the test problems, whereas the SPG2 method solves about 90 % of the test problems. In summary, the results in Table 1 and Fig. 1 show that the APRP method provides an efficient method for solving large-scale nonlinear box constrained problems.
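For reference, such profiles can be computed from a matrix of CPU times (problems × solvers) in a few lines, following Dolan and Moré [18]; the sketch below and its names are ours, and failures are encoded as infinite times:

```python
import numpy as np

def performance_profile(T, taus):
    """T[p, s]: CPU time of solver s on problem p (np.inf if it failed).

    Returns rho[s, j], the fraction of problems that solver s solves within
    a factor taus[j] of the best solver on each problem (assumes every
    problem is solved by at least one solver).
    """
    n_prob, n_solv = T.shape
    ratios = T / T.min(axis=1, keepdims=True)   # performance ratios r_{p,s}
    rho = np.empty((n_solv, len(taus)))
    for j, tau in enumerate(taus):
        rho[:, j] = np.sum(ratios <= tau, axis=0) / n_prob
    return rho
```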

Fig. 1 Performance profiles based on CPU time

5 Conclusions

In this paper, we have introduced an active set modified PRP method for large-scale optimization with simple bounds on the variables. Under appropriate conditions, we have shown that the proposed method is globally convergent. The algorithm uses an active set identification technique to estimate the active variables at each iteration, and determines the search direction for the free variables by the modified PRP conjugate gradient method. The active set identification strategy allows rapid changes in the active set from one iteration to the next, and the cost of computing the search direction for the active variables is low. We believe that our algorithm is practical and effective for the test problems.