Abstract
In this paper, based on the three-term conjugate gradient method and the hybrid technique, we propose a hybrid three-term conjugate gradient projection method by incorporating the adaptive line search for solving large-scale nonlinear monotone equations with convex constraints. The search direction generated by the proposed method is close to the one yielded by the memoryless BFGS method, and has the sufficient descent property and the trust region property independent of line search technique. Under some mild conditions, we establish the global convergence of the proposed method. Our numerical experiments show the effectiveness and robustness of the proposed method in comparison with two existing algorithms in the literature. Moreover, we show applicability and encouraging efficiency of the proposed method by extending it to solve sparse signal restoration and image de-blurring problems.
1 Introduction
In this paper, we consider the numerical solution of constrained nonlinear monotone equations of the form:
\(F(x)=0,\quad x\in \mathcal {C}, \qquad (1)\)
where \(\mathcal {C}\subset \mathbb {R}^{n}\) is a nonempty closed convex set, and F is a continuous and monotone mapping from \(\mathbb {R}^{n}\) to itself. The monotonicity of the mapping F means that:
\((F(x)-F(y))^{\top }(x-y)\geq 0,\quad \forall x,y\in \mathbb {R}^{n}. \qquad (2)\)
We call problem (1) the unconstrained nonlinear monotone equations if \(\mathcal {C}=\mathbb {R}^{n}\).
Problem (1) arises in many important modern applications in applied mathematics, economics, engineering, and elsewhere. Moreover, some mathematical problems can be converted into the form of (1), such as certain nonlinear monotone complementarity problems, compressed sensing problems, and optimal power flow control problems in electricity networks (see [1,2,3,4] for references). The wide applicability of nonlinear monotone equations has motivated many researchers to study efficient methods for solving (1). Up to now, numerous iterative methods have been developed to solve (1). Among these methods, we focus only on the derivative-free projection-based methods (DFPMs for short); see, for example, [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]. These methods are particularly suitable for solving large-scale constrained nonlinear monotone equations due to their simplicity and low storage requirements.
Inspired by the projection technique [20] and the spectral gradient method [21], Yu et al. [5] first extended the spectral gradient projection method in [22] from unconstrained nonlinear monotone equations to constrained nonlinear monotone equations. The numerical results reported in [5] show that the proposed method works quite well even for large-scale problems. Since then, it has attracted much attention, and many DFPMs have sprung up, such as the spectral gradient-type projection method [6, 7], the conjugate gradient-type projection method [8,9,10,11,12,13,14,15], and the spectral conjugate gradient-type projection method [16,17,18,19], to mention just a few. Recently, the spectral gradient-type projection method has been further developed. Abubakar et al. [23] proposed a positive spectral gradient projection method whose search direction is a convex combination of two different positive spectral coefficients multiplied with the residual vector. Awwal et al. [24] presented a two-step spectral gradient projection method. Mohammad and Abubakar [25] further designed an adaptive spectral gradient projection method. The numerical results in [23,24,25] show that these methods work well and compare favorably with similar methods for solving (1).
In [26], Wang et al. proposed a three-term conjugate gradient projection method based on the HS method [27] for solving (1). Subsequently, motivated by [26], Gao and He [28] proposed another three-term conjugate gradient projection method by choosing a part of the LS method in [29] as the conjugate parameter. Recently, Gao et al. [30] developed an adaptive three-term conjugate gradient projection method, in which the search direction is close to the direction of the self-scaling memoryless BFGS method proposed by Oren and Luenberger [31] in the sense of the minimum Frobenius norm. And they successfully applied this algorithm to solve the compressed sensing problems in [30]. Quite recently, Awwal et al. [32] proposed a Perry-type derivative-free method based on the BFGS quasi-Newton search direction. The reported numerical results in [32] show that the proposed method is effective and promising for solving (1) and signal recovery problem. On the other hand, based on the ideas of hybrid conjugate gradient method in [33], Sun and Liu [34] developed a hybrid conjugate gradient projection method for solving (1), in which the search direction satisfies the sufficient descent property and the trust region property independent of line search technique. Some large-scale numerical results in [34] show that the proposed method is quite competitive for some fixed initial points.
Recently, based on the memoryless BFGS method proposed by Nocedal [35] and Shanno [36], Li [37,38,39] developed a family of three-term conjugate gradient methods, in which the search directions are close to that of the memoryless BFGS method. A large number of numerical comparisons indicate that the proposed methods are quite efficient for solving unconstrained optimization problems.
To the best of our knowledge, no existing conjugate gradient method in the literature combines the three-term conjugate gradient method, whose search direction is close to that of the memoryless BFGS method, with the ideas of hybrid methods. Hence, it is interesting to design a hybrid three-term conjugate gradient projection method for solving problem (1) such that its search direction is close to that of the memoryless BFGS method and possesses the sufficient descent property and the trust region property independent of the line search technique.
In this paper, inspired by the hybrid conjugate gradient projection method in [34] and the three-term conjugate gradient methods in [37,38,39], we propose a hybrid three-term conjugate gradient projection method for solving (1). The rest of the paper is organized as follows. In the next section, we describe in detail the motivation of this paper and propose the algorithm. In Section 3, we establish the global convergence under some mild conditions. In Section 4, we perform numerical experiments to illustrate the performance of the proposed method and apply it to solve compressed sensing problems in Section 5. Some conclusions are summarized in the last section.
Notation
Throughout this paper, the symbols ∥⋅∥ and ∥⋅∥1 denote the Euclidean norm and ℓ1-norm on \(\mathbb {R}^{n}\), respectively. For convenience, we abbreviate F(xk) to Fk, and similar abbreviations are also used. For a nonempty closed convex set \(\mathcal {C}\subset \mathbb {R}^{n}\), \(P_{\mathcal {C}}[\cdot ]\) denotes the projection mapping from \(\mathbb {R}^{n}\) onto the set \(\mathcal {C}\), namely, \(P_{\mathcal {C}}[x]=\arg \min \limits \{\|y-x\|~|~y\in \mathcal {C}\}\).
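When \(\mathcal {C}\) is simple, the projection \(P_{\mathcal {C}}[\cdot ]\) has a closed form; for instance, for the nonnegative orthant and for box constraints (both of which appear in the test problems later) it acts componentwise. A minimal numpy sketch, ours and for illustration only:

```python
import numpy as np

def project_nonneg(x):
    """Projection onto the nonnegative orthant C = R^n_+ :
    each component is clipped at zero."""
    return np.maximum(x, 0.0)

def project_box(x, lo, hi):
    """Projection onto a box C = {x : lo <= x <= hi}, applied componentwise."""
    return np.clip(x, lo, hi)

x = np.array([-1.5, 0.3, 2.0])
print(project_nonneg(x))          # [0.  0.3 2. ]
print(project_box(x, -1.0, 1.0))  # [-1.   0.3  1. ]
```

For a general closed convex \(\mathcal {C}\) (e.g., the polyhedral set of Problem 4.3 below), the projection instead requires solving a small quadratic program.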
2 Motivation and algorithm
In this section, we are devoted to developing a hybrid three-term conjugate gradient projection method for solving problem (1). To this end, we first briefly review the three-term conjugate gradient methods for solving unconstrained optimization problems proposed in [37,38,39] and the hybrid conjugate gradient projection method for solving problem (1) in [34].
As is well known, for solving large-scale unconstrained optimization problems:
\(\min \limits _{x\in \mathbb {R}^{n}}f(x),\)
where \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuously differentiable function, one of the most effective methods is the three-term conjugate gradient method, which dates back to Beale [40] and was later developed by Nazareth [41], Zhang et al. [42], among others. These methods are of the form:
\(x_{k+1}=x_{k}+\alpha _{k}d_{k},\quad k=0,1,2,\dots ,\)
where αk > 0 is the steplength determined by some line search and dk is the search direction generated by:
\(d_{0}=-F_{0},\qquad d_{k}=-F_{k}+\beta _{k}d_{k-1}+\theta _{k}y_{k-1},\quad k\geq 1, \qquad (3)\)
where Fk := ∇f(xk), βk and 𝜃k are scalar parameters, and yk− 1 := Fk − Fk− 1. Different choices for the parameters βk and 𝜃k correspond to different three-term conjugate gradient methods. Clearly, the three-term conjugate gradient method reduces to the classical conjugate gradient method when 𝜃k ≡ 0 in (3). There are at least four well-known formulas for βk, such as:
We refer the readers to the monograph [43] and the comprehensive survey [44] for more details.
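To make the generic scheme concrete, the three-term direction update \(d_{k}=-F_{k}+\beta _{k}d_{k-1}+\theta _{k}y_{k-1}\) can be sketched as below; this is a hypothetical helper of ours (not code from the paper), with the scalars βk and 𝜃k supplied by whichever particular method is in use:

```python
import numpy as np

def three_term_direction(Fk, d_prev, y_prev, beta, theta):
    """Three-term CG direction: d_k = -F_k + beta*d_{k-1} + theta*y_{k-1}.
    Setting theta = 0 recovers a classical two-term CG direction."""
    return -Fk + beta * d_prev + theta * y_prev

Fk = np.array([1.0, -2.0])
d_prev = np.array([0.5, 0.5])
y_prev = np.array([1.0, 0.0])
print(three_term_direction(Fk, d_prev, y_prev, beta=0.2, theta=0.1))  # [-0.8  2.1]
```

Different choices of beta and theta then reproduce the various methods surveyed above.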
Based on the memoryless BFGS method proposed by Nocedal [35] and Shanno [36], in which the search direction can be written as:
where sk− 1 := xk − xk− 1, Li [37] proposed a three-term HS conjugate gradient method, in which:
and \(0\leq t_{k}\leq \bar {t}<1\). In practical computation, the parameter \(\bar {t}\) is set to 0.3 and then tk is calculated by:
where \(1-\frac {y_{k-1}^{\top }s_{k-1}}{\|y_{k-1}\|^{2}}\) is the solution of the univariate minimization problem:
It is not hard to see that the search direction defined in (3)–(5) is close to that of the memoryless BFGS method in the sense of the above univariate minimization problem. Subsequently, based on the fact that \(d_{k-1}^{\top }y_{k-1}=\|F_{k-1}\|^{2}\) holds when the objective function f is a strongly convex quadratic and the exact line search is used, Li [38] proposed a three-term PRP conjugate gradient method with the search direction of the form (3) being generated by replacing \(d_{k-1}^{\top }y_{k-1}\) with ∥Fk− 1∥2 in (4), i.e.:
and \(0\leq t_{k}\leq \bar {t}<1\). Recently, based on the comprehensive analysis of [37, 38], Li [39] proposed a family of three-term conjugate gradient methods for solving unconstrained optimization problems. Interestingly, a remarkable feature of these methods is that the search directions defined by [37,38,39] always satisfy the sufficient descent property:
On the other hand, based on the ideas of hybrid conjugate gradient method in [33], Sun and Liu [34] developed a hybrid conjugate gradient method and then extended it to solve problem (1). In [34], the search direction is determined by the following scheme:
where:
with κ > 1 being a constant. It is easy to show that the search direction defined in the above two relations satisfies the sufficient descent property and the trust region property independent of line search technique; see Lemma 3.1 and Lemma 3.2 of [34], respectively. Notice that the parameter \(\beta _{k}^{\text {HCG}}\) can be viewed as a hybrid of \(\beta _{k}^{\text {HS}}\) and \(\beta _{k}^{\text {DY}}\).
Inspired by the three-term conjugate gradient methods and the hybrid conjugate gradient projection method introduced above, we propose a hybrid three-term conjugate gradient method by incorporating the projection technique to solve nonlinear monotone equations with convex constraints. The search direction dk for the proposed method is given by:
where:
with \(0\leq t_{k}\leq \bar {t}<1\) and μ > 0. Clearly, due to the definition of wk in (8), the parameter \(\beta _{k}^{\text {HTP}}\) is a hybrid of \(\beta _{k}^{\text {THS}}\) in (4) and \(\beta _{k}^{\text {TPRP}}\) in (6). Similarly, the parameter \(\tilde {\theta }_{k}\) is a hybrid of 𝜃k in (4) and \(\hat {\theta }_{k}\) in (6). Thus, the search direction defined in (7) is close to that of the memoryless BFGS method when \(t_{k}=1-\frac {y_{k-1}^{\top }s_{k-1}}{\|y_{k-1}\|^{2}}\). Notice that the first term of the maximum function in the first relation of (8) is chosen so that the search direction has the trust region property; see Lemma 2 below. It is easy to see that wk > 0 due to the presence of the term ∥Fk− 1∥2 in (8). Therefore, the parameters \(\beta _{k}^{\text {HTP}}\) and \(\tilde {\theta }_{k}\) in (8) are both well defined.
We now give the steps of our algorithm which we call the hybrid three-term conjugate gradient projection (HTTCGP) method.
Remark 1
(i) The adaptive line search (9) was proposed in our recent work [45]. From the definition of P[λ, ν][⋅], we have λ ≤ P[λ, ν][u] ≤ ν for any \(u\in \mathbb {R}\). The parameter ν is introduced to prevent the right-hand side of (9) from becoming too large, especially when xk is far from the solution set of (1) and ∥F(zk)∥ is large; this can effectively reduce the computational cost of the line search. The parameter λ is chosen to guard against the worst possible case, i.e., F(zk) = 0 with \(z_{k}\notin \mathcal {C}\). Indeed, from step 2 of Algorithm 1, we have ∥dk∥ > 0. Combining this with the fact that P[λ, ν][∥F(xk + ςρidk)∥] ≥ λ > 0, we know that the right-hand side of (9) is positive for any finite nonnegative integer i. Thus, we see immediately from Lemma 3 below that F(zk)≠ 0 holds.
(ii) The adaptive line search (9) is a derivative-free line search and contains two well-known line search procedures from [22, 46] which have been used in many works. In fact, if λ = ν = 1, then P[λ, ν][∥F(zk)∥] ≡ 1; thus, (9) reduces to the line search used in [22]. If λ = 0 and ν is sufficiently large, say ν ≥∥F0∥, then P[λ, ν][∥F(zk)∥] ≡∥F(zk)∥; therefore, (9) is equivalent to the line search used in [46]. Notice that when λ = 0 and ν = 1, (9) is equivalent to the line search recently proposed by Sun and Tian in [47]. Quite recently, Awwal et al. [24] proposed another adaptive line search, namely \(\alpha _{k}=\varsigma \rho ^{i_{k}}\) with ik being the smallest nonnegative integer i such that:
Clearly, if c = 1, (12) can be viewed as a special case of (9), provided that λ = 0 and ν is sufficiently large. On the other hand, if c is sufficiently large, then (12) behaves like the adaptive line search (9) with λ = ν = 1.
(iii) Notice that γ ∈ (0,2) in (10) is a relaxation factor whose suitable value may speed up convergence for the projection and contraction methods as stated in [48]. In practical computation, one can take a larger relaxation factor γ ∈ [1,2).
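The overall structure of methods in this class can be sketched as follows. This is a simplified, generic derivative-free projection skeleton (not Algorithm 1 itself: the HTTCGP direction (7)–(8) and the truncation P[λ, ν] are omitted, a steepest-descent-like placeholder direction and a plain backtracking line search are used instead, and all names are ours):

```python
import numpy as np

def dfpm(F, project, x0, sigma=0.01, rho=0.5, gamma=1.6, tol=1e-8, max_iter=500):
    """Simplified derivative-free projection method sketch:
    find z_k = x_k + alpha_k d_k by backtracking, then project x_k minus a
    multiple of F(z_k) back onto C (relaxation factor gamma in (0, 2))."""
    x = project(np.asarray(x0, dtype=float))
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:
            break
        d = -Fx  # placeholder direction; the actual method uses its CG-type d_k
        alpha = 1.0
        # backtracking: accept when -F(z)^T d >= sigma * alpha * ||d||^2
        while -(F(x + alpha * d) @ d) < sigma * alpha * (d @ d) and alpha > 1e-10:
            alpha *= rho
        z = x + alpha * d
        Fz = F(z)
        if np.linalg.norm(Fz) <= tol:
            x = z
            break
        xi = (Fz @ (x - z)) / (Fz @ Fz)  # hyperplane step length
        x = project(x - gamma * xi * Fz)
    return x

# e.g., f_i(x) = exp(x_i) - 1 on C = R^n_+ has the unique solution x* = 0
x_star = dfpm(lambda v: np.exp(v) - 1.0, lambda v: np.maximum(v, 0.0), np.ones(3))
```

With this monotone mapping (Problem 4.1 below) the sketch drives ∥F(x)∥ to zero within a few iterations; the projection and hyperplane steps are what the convergence analysis of Section 3 exploits.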
3 Convergence property
In this section, we analyze the global convergence of the proposed method. To this end, we need the following standard assumptions used in most DFPMs.
(A1) The solution set of problem (1), denoted by \(\mathcal {S}\), is nonempty.
(A2) The mapping F is Lipschitz continuous on \(\mathbb {R}^{n}\), i.e., there is a constant L > 0 such that:
\(\|F(x)-F(y)\|\leq L\|x-y\|,\quad \forall x,y\in \mathbb {R}^{n}. \qquad (13)\)
We now start our convergence analysis by giving a well-known nonexpansive property of the projection operator [49].
Lemma 1
Let \(\mathcal {C}\) be a nonempty closed convex subset of \(\mathbb {R}^{n}\). Then:
\(\|P_{\mathcal {C}}[x]-P_{\mathcal {C}}[y]\|\leq \|x-y\|,\quad \forall x,y\in \mathbb {R}^{n}.\)
The next lemma shows that the search direction defined in (7) has some nice properties independent of line search technique.
Lemma 2
For all k ≥ 0, the search direction sequence {dk} generated by the HTTCGP always satisfies the sufficient descent property:
and the trust region property:
with \(0\leq t_{k}\leq \bar {t}<1\) and μ > 0.
Proof
Since d0 = −F0, we have \(F_{0}^{\top }d_{0}=-\|F_{0}\|^{2}\), which satisfies (14) for k = 0. From the definition of dk in (7), we conclude that for all k ≥ 1,
which implies that (14) holds.
Again, from d0 = −F0, we obtain ∥d0∥ = ∥F0∥, which satisfies (15) for k = 0. We next prove that the relation (15) holds for all k ≥ 1. Note from step 2 of Algorithm 1 that dk≠ 0 for all k. If yk− 1 = 0, then the desired result immediately follows from the fact that dk = −Fk due to the definition of dk in (7). Therefore, for the remainder of the proof, we assume that yk− 1≠ 0 for all k ≥ 1. From (14) and the Cauchy-Schwarz inequality, we know that the first inequality in (15) holds. To proceed, we note from (8) that:
and
Combining the above two relations with the definition of dk in (7) yields:
This completes the proof. □
Remark 2
The first inequality in (15) implies that we find a solution of problem (1) when dk = 0. This is the reason why we choose dk = 0 as the stopping criterion in step 2 of Algorithm 1.
The following lemma not only shows that the adaptive line search (9) is well-defined but also provides a uniform lower bound of the sequence {αk}.
Lemma 3
(i) Let {xk} and {dk} be the sequences generated by the HTTCGP. If the mapping F is continuous on \(\mathbb {R}^{n}\), then for all k ≥ 0, there exists a nonnegative integer ik satisfying (9).
(ii) Suppose that assumption (A2) holds, then the steplength αk yielded by (9) satisfies:
Proof
(i) We proceed by contradiction and assume that there exists an integer k0 ≥ 0 such that the adaptive line search (9) does not hold for any nonnegative integer i. Hence, for all i ≥ 0, we have:
where the second inequality follows from the fact that \(P_{\left [\lambda ,\nu \right ]}[\cdot ]\leq \nu \). Taking the limit in the above inequality, using the continuity of the mapping F and rearranging terms in the resulting relation, we obtain:
which contradicts (14) since \(F(x_{k_{0}})^{\top }d_{k_{0}}\leq -\left (1-\frac {(1+\bar {t})^{2}}{4}\right )\|F(x_{k_{0}})\|^{2}<0\). Thus, the desired result holds.
(ii) We now prove part (ii). Clearly, if αk = ς, then (16) holds. Otherwise, \(\widehat {\alpha }_{k}:=\rho ^{-1}\alpha _{k}\) does not satisfy the adaptive line search (9), namely:
where \(\widehat {z}_{k}=x_{k}+\widehat {\alpha }_{k}d_{k}\). This together with (13) and (14) yields:
Therefore:
where the second inequality follows from the second inequality in (15). This yields the desired inequality (16). □
The following result is adapted from [20, Lemma 2.1] to show that for any \(x^{\ast }\in \mathcal {S}\), the whole sequence {∥xk − x∗∥} generated by the HTTCGP is convergent.
Lemma 4
Suppose that the assumptions (A1)-(A2) hold. Let {xk} and {zk} be the sequences generated by the HTTCGP. Then for any \(x^{\ast }\in \mathcal {S}\) the sequence {∥xk − x∗∥} is convergent, therefore the sequence {xk} is bounded. Furthermore, it holds that:
Proof
We start our proof by noticing from the definition of zk and (9) that:
From (2) and \(x^{\ast }\in \mathcal {S}\), we obtain that:
Combining the above two relations with (10), Lemma 1 and (11), we conclude that:
which implies 0 ≤∥xk+ 1 − x∗∥≤∥xk − x∗∥ since 0 < γ < 2 by the choice of γ in the HTTCGP. That is, the sequence {∥xk − x∗∥} is non-increasing and bounded below, and is therefore convergent. Hence, the boundedness of {xk} follows immediately.
We next derive an upper bound of ∥F(zk)∥. To proceed, notice from the boundedness of {xk} and the continuity of F that there exists a constant M > 0 such that:
Hence, we see from the Cauchy-Schwarz inequality, (2) and (18) that:
which shows that {zk} is bounded. Therefore, it again follows from the continuity of F that there exists a constant \(\widehat {M}>0\) such that:
This together with (19) implies:
which shows \(0=\lim \limits _{k\rightarrow \infty }\|x_{k}-z_{k}\|=\lim \limits _{k\rightarrow \infty }\alpha _{k}\|d_{k}\|\) since zk = xk + αkdk and 0 < γ < 2 from the HTTCGP. This completes the proof. □
Remark 3
Notice that the proof of the above lemma relies heavily on the projection technique and the line search technique, independent of the specific structure of the search direction. It should be mentioned that although the whole sequence {∥xk − x∗∥} is convergent by the above lemma, this does not guarantee that {xk} converges to \(x^{\ast }\in \mathcal {S}\); indeed, {xk} need not even be convergent.
We now combine these results to prove the global convergence of the HTTCGP under some mild conditions.
Theorem 1
Suppose that the assumptions (A1)-(A2) hold. Then the whole sequence {xk} generated by the HTTCGP converges to a solution of problem (1).
Proof
We start our proof by noticing from (16) and (17) that \(0\leq \alpha \|d_{k}\|\leq \alpha _{k}\|d_{k}\|\rightarrow 0\). This shows \(\lim _{k\rightarrow \infty }\|d_{k}\|=0\). Combining this with (15), we have:
which implies \(\lim _{k\rightarrow \infty }\|F_{k}\|=0\). Since the sequence \(\{x_{k}\}\subset \mathcal {C}\) (by \(x_{0}\in \mathcal {C}\) and (10)) is bounded by Lemma 4, it has at least one cluster point. Suppose that \(\bar {x}\) is a cluster point of \(\{x_{k}\}\subset \mathcal {C}\) and let \(\mathcal {K}\subset \{0,1,2,\dots \}\) be an infinite index set such that:
where the inclusion follows from the closedness of \(\mathcal {C}\). Thus, it follows from these and the continuity of F that:
which implies that \(\bar {x}\) is a solution of problem (1), i.e., \(\bar {x}\in \mathcal {S}\). Therefore, setting \(x^{\ast }:=\bar {x}\) in Lemma 4, we know that the sequence \(\{\|x_{k}-\bar {x}\|\}\) is convergent and then:
Consequently, we conclude that the whole sequence {xk} converges to \(\bar {x}\in \mathcal {S}\). This completes the proof. □
4 Numerical experiments
In this section, we perform numerical experiments to illustrate the effectiveness and robustness of the proposed method (HTTCGP), in comparison with two existing algorithms, Algorithm 2.1 in [30] (denoted by ATTCGP) and Algorithm 3.1 in [34] (denoted by HCGP), in terms of the total number of iterations (Itr), the number of function evaluations (NF), and the CPU time in seconds (Tcpu) in the same computing environment. All codes are written in MATLAB R2007b and run on a 64-bit Dell PC with an Intel(R) Core(TM) i7-9700 CPU (3.00 GHz), 16.00 GB RAM, and the Windows 10.0 operating system.
The parameters in the HTTCGP are chosen as follows: \(\rho =0.5,\ \varsigma =1,\ \sigma =0.01,\ \lambda =0.001,\ \nu =0.8,\ \mu =0.2,\ \gamma =1.6,\ \bar {t}=0.3\) with tk being calculated by:
The parameters of ATTCGP are taken from Section 5 of [30], i.e., \(\rho =0.4,\ \varsigma =1,\ \sigma =0.0001,\ r=0.1,\ \mu _{1}=0.01,\ \mu _{2}=0.1,\ t_{k}={t_{k}^{b}}\). The parameters of HCGP are taken from Section 4 of [34], i.e., ρ = 0.5, ς = 1, σ = 0.0001, μ = 1.4, γ = 1.5. For each algorithm and each test problem, we stop the program when one of the following three conditions is satisfied:
Although Lemma 3 shows that the adaptive line search (9) terminates finitely, we impose a restriction on the steplength computation: if a trial steplength ςρi with ςρi ≤ 10− 10 still fails to satisfy (9), then we simply set αk = ςρi. We now give the following popular test problems, in which the mapping F is taken as:
Problem 4.1
[50] Set \(f_{i}(x)=e^{x_{i}}-1\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\mathbb {R}_{+}^{n}\). Clearly, this problem has a unique solution \(x^{\ast }=(0,0,\dots ,0)^{\mathrm {T}}\).
Problem 4.2
[7] Set
and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).
Problem 4.3
[5] Set \(f_{i}(x)=x_{i}-\sin \limits |x_{i}-1|\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\{x\in \mathbb {R}^{n}|{\sum }_{i=1}^{n}x_{i}\leq n,\ x_{i}\geq -1,\ i=1,2,\dots ,n\}\). Clearly, Problem 4.3 is nonsmooth at \(x=(1,\dots ,1)^{\mathrm {T}}\).
Problem 4.4
[28] Set
and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).
Problem 4.5
[28] Set
and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).
Problem 4.6
[28] Set \(f_{i}(x)=(e^{x_{i}})^{2}+3{\sin \limits } x_{i} \cdot {\cos \limits } x_{i}-1\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).
Problem 4.7
This problem is Problem 2 in [51] with additional \(\mathcal {C}=\mathbb {R}_{+}^{n}\), i.e., \(f_{i}(x)=2{x_{i}}-{\sin \limits } x_{i}\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).
Problem 4.8
This problem is Problem 10 in [52] with additional \(\mathcal {C}=\mathbb {R}_{+}^{n}\), i.e., \(f_{i}(x)=\ln (x_{i}+1)-\frac {x_{i}}{n}\), for \(i=1,2,\dots ,n\) and \(\mathcal {C}=\mathbb {R}_{+}^{n}\).
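For concreteness, several of the mappings above are one-liners; a numpy sketch of Problems 4.1 and 4.7 (function names are ours, for illustration only):

```python
import numpy as np

def F_41(x):
    """Problem 4.1: f_i(x) = exp(x_i) - 1, with C = R^n_+ and solution x* = 0."""
    return np.exp(x) - 1.0

def F_47(x):
    """Problem 4.7: f_i(x) = 2*x_i - sin(x_i), with C = R^n_+."""
    return 2.0 * x - np.sin(x)

# both mappings are monotone: (F(x) - F(y))^T (x - y) >= 0
x, y = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print((F_41(x) - F_41(y)) @ (x - y) >= 0)  # True
```

Since both mappings act componentwise, their scale can be increased to the dimensions used in the tables without any structural change.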
For Problem 4.3, we use the quadratic program solver quadprog.m from the MATLAB optimization toolbox to compute a new iterate xk+ 1 in step 4 of HTTCGP. For the above test problems, we use the following different initial points: \(x_{1}=(1,1,\dots ,1)^{\mathrm {T}},\ x_{2}=(\frac {1}{3},\frac {1}{3^{2}},\dots ,\frac {1}{3^{n}})^{\mathrm {T}},\ x_{3}=(\frac {1}{2},\frac {1}{2^{2}},\dots ,\frac {1}{2^{n}})^{\mathrm {T}},\ x_{4}=(0,\frac {1}{n},\dots ,\frac {n-1}{n})^{\mathrm {T}},\ x_{5}=(1,\frac {1}{2},\dots ,\frac {1}{n})^{\mathrm {T}},\ x_{6}=(\frac {1}{n},\frac {2}{n},\dots ,1)^{\mathrm {T}},\ x_{7}=(1-\frac {1}{n},1-\frac {2}{n},\dots ,0)^{\mathrm {T}}\). The numerical results are listed in Tables 1, 2, 3, 4, 5, 6, 7, and 8, where “Init” means the initial point, “n” is the problem dimension multiplied by \(10^{4}\), and “∥F∗∥” stands for the final value of ∥F(xk)∥ when the program is stopped.
In order to compare the performance of these methods clearly, we adopt the performance profiles introduced by Dolan and Moré [53] based on Tcpu, NF, and Itr, respectively. From Tables 1–8 and Figs. 1, 2, and 3, we conclude that, for these fixed initial points, the proposed method is in general the most effective and robust in terms of Tcpu, NF, and Itr. The reason may be twofold: first, the search direction defined by (7) inherits the effectiveness and robustness of the memoryless BFGS method; second, the hybrid technique may effectively improve the numerical performance of the three-term conjugate gradient method.
To further explore the influence of initial points on the algorithm, we test each problem by using initial points randomly generated from the interval (0,1) with a gradually increasing number of dimensions. For each given dimension, we run the same test 20 times to obtain the average CPU time in seconds (\(\overline {\text {Tcpu}}\)), the average number of function evaluations (\(\overline {\text {NF}}\)), the average number of iterations (\(\overline {\text {Itr}}\)), and the average value of ∥F∗∥ (\(\overline {\|F^{\ast }\|}\)). The numerical results are listed in Table 9, where “P” stands for the test problem used and “n” is still the problem dimension multiplied by \(10^{4}\). We also plot the performance profiles of \(\overline {\text {Tcpu}}\), \(\overline {\text {NF}}\), and \(\overline {\text {Itr}}\), respectively. It is not hard to see from Table 9 and Figs. 4, 5, and 6 that (i) all the methods solve all the test problems successfully; (ii) our method outperforms the other two methods in about 68%, 72%, and 70% of the experiments in terms of \(\overline {\text {Tcpu}}\), \(\overline {\text {NF}}\), and \(\overline {\text {Itr}}\), respectively. In most of the experiments, our method needs fewer function evaluations than the other two methods to obtain an approximate solution. The reason may be that it benefits from the adaptive line search technique proposed by Yin et al. [45]. In a word, the proposed method is more efficient and competitive than the other two methods on the given test problems.
5 Applications in compressed sensing
In this section, we apply our method to solve compressed sensing problems and compare it with two existing solvers: CGD [9] and ATTCGP [30].
5.1 Problem description
In compressed sensing problems, the task is to recover sparse solutions (signals) of underdetermined linear systems Ax = b, where \(A\in \mathbb {R}^{m\times n}\) (m ≪ n) is a sampling matrix and \(b\in \mathbb {R}^{m}\) is the observed signal. To this end, a popular approach is to solve the ℓ1-norm regularized optimization problem:
\(\min \limits _{x\in \mathbb {R}^{n}}\ f(x):=\frac {1}{2}\|Ax-b\|^{2}+\tau \|x\|_{1}, \qquad (20)\)
where τ > 0 is a constant balancing data fitting and sparsity. Notice that the objective function f(x) in (20) is nonsmooth and convex. Interestingly, following Figueiredo, Nowak, and Wright [54], one can reformulate (20) as a convex quadratic program. Here we provide the concrete process for completeness. By introducing the auxiliary variables \(u\in \mathbb {R}^{n}\) and \(v\in \mathbb {R}^{n}\), we can split x as:
\(x=u-v,\)
where \(u_{i}=\max \limits \{x_{i},0\}\) and \(v_{i}=\max \limits \{-x_{i},0\}\) for all \(i=1,2,\dots ,n\). Then, \(\|x\|_{1}=e_{n}^{\top }u+e_{n}^{\top }v\), where \(e_{n}=(1,1,\dots ,1)^{\top }\in \mathbb {R}^{n}\). Mathematically, the ℓ1-norm regularized optimization problem (20) can be equivalently expressed as:
However, due to the special structure of the objective function above as analyzed in [54], we know that the above optimization problem is equivalent to:
Then, by expanding the quadratic term ∥A(u − v) − b∥2 in (21) and rearranging terms in the resulting relation, (21) can be written as the following simple bound-constrained quadratic program:
where:
\(z=\begin{pmatrix}u\\v\end{pmatrix},\quad H=\begin{pmatrix}A^{\top }A & -A^{\top }A\\ -A^{\top }A & A^{\top }A\end{pmatrix},\quad c=\tau e_{2n}+\begin{pmatrix}-A^{\top }b\\ A^{\top }b\end{pmatrix}.\)
It is easy to show that H is positive semi-definite, so (22) is a convex quadratic program. Hence, it follows from the first-order optimality conditions of (22) that z is a minimizer of (22) if and only if z is a solution of the following unconstrained equations:
\(F(z):=\min \limits \{z,\ Hz+c\}=0,\)
where the “min” is interpreted as componentwise minimum. We then have from [55, Lemma 3] and [2, Lemma 2.2] that \(F: \mathbb {R}^{2n}\rightarrow \mathbb {R}^{2n}\) is Lipschitz continuous and monotone. Thus, we can apply the proposed method to solve the unconstrained equations above.
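A small numpy sketch of the reformulated mapping, assuming the standard splitting z = (u; v) with \(H=\begin{pmatrix}A^{\top }A&-A^{\top }A\\-A^{\top }A&A^{\top }A\end{pmatrix}\) and \(c=\tau e_{2n}+(-A^{\top }b;\ A^{\top }b)\) as in [54] (the helper name is ours):

```python
import numpy as np

def make_F(A, b, tau):
    """Build F(z) = min(z, Hz + c) (componentwise min) for the bound-constrained
    QP reformulation of the l1-regularized least-squares problem."""
    AtA = A.T @ A
    Atb = A.T @ b
    n = A.shape[1]
    H = np.block([[AtA, -AtA], [-AtA, AtA]])
    c = tau * np.ones(2 * n) + np.concatenate([-Atb, Atb])
    return lambda z: np.minimum(z, H @ z + c)

# tiny sanity check: for A = I, the minimizer is soft-thresholding of b
A, b, tau = np.eye(2), np.array([1.0, 0.0]), 0.5
F = make_F(A, b, tau)
z_star = np.array([0.5, 0.0, 0.0, 0.0])  # u = max(b - tau, 0), v = 0
print(F(z_star))  # [0. 0. 0. 0.]
```

Solving F(z) = 0 over \(\mathbb {R}^{2n}\) with the proposed method then recovers the sparse signal as x = u − v.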
5.2 Numerical results
In this subsection, we perform two types of numerical experiments to verify the effectiveness and efficiency of the proposed method.
In our first experiment, the main goal is to restore a one-dimensional sparse signal of length n from m observations, where m ≪ n. For fairness, the parameters for our method HTTCGP are the same as those in Section 4 except for μ = 2; the parameters of CGD are taken from Section 5.2 of [9]; the parameters of ATTCGP are chosen from Section 6.2 of [30] with \(t_{k}={t_{k}^{b}}\). As usual, the quality of recovery is measured by the mean squared error (MSE):
\(\text {MSE}=\frac {1}{n}\|\bar {x}-x\|^{2},\)
where \(\bar {x}\) is the restored signal and x is the original sparse one. By definition, a lower MSE value reflects a better quality of the restored signal; therefore, the lower the final MSE value, the better the method. All the algorithms start their iterations with x0 = A⊤b, and all use the stopping criterion:
where f(x) is defined in (20). The parameter τ defined in (20) for three algorithms is obtained by the same continuation technique used in [54].
In our experiments, let r denote the number of nonzero entries of x. For a given triple (m, n, r), we use the following MATLAB codes to generate the test data:
From Fig. 7, we observe that the original signal is almost exactly restored from the disturbed one by the three algorithms mentioned above, while our algorithm outperforms the other two in terms of MSE, Itr, and Tcpu. To further demonstrate the performance of our algorithm, we set (m, n, r) = (256i,1024i,32i) with i = 1 : 0.5 : 8. For each i, we generate 20 random instances as described above. The detailed numerical results are reported in Table 10, where we report the average number of iterations (\(\overline {\text {Itr}}\)), the average CPU time in seconds (\(\overline {\text {Tcpu}}\)), and the average MSE (\(\overline {\text {MSE}}\)) over the 20 instances. We see from Table 10 that HTTCGP clearly outperforms the other two algorithms in terms of \(\overline {\text {Itr}}\), \(\overline {\text {Tcpu}}\), and \(\overline {\text {MSE}}\). Moreover, ATTCGP is always faster than CGD, and its solution quality in terms of \(\overline {\text {MSE}}\) is also better. In summary, the reported numerical results indicate that the proposed method is promising for recovering sparse signals in compressed sensing.
In our second experiment, we concentrate on the image de-blurring problem and apply the proposed method to restore the original image from its blurred version. We refer the readers to the monograph [56] for the background of the digital image restoration problem. In the literature, the quality of restoration is usually measured by the peak signal-to-noise ratio (PSNR):
\(\text {PSNR}=10\log _{10}\frac {V^{2}}{\text {MSE}},\)
and the structural similarity (SSIM) index, which measures the similarity between the original image and the restored one; see [57] for more details. Here V is the maximum absolute value of the reconstruction. The MATLAB implementation of the SSIM index can be obtained at http://www.cns.nyu.edu/~lcv/ssim/. The larger the PSNR and SSIM are, the better the corresponding method is. The initialization and stopping criterion of all three algorithms are exactly the same as in the first experiment above. The parameters for HTTCGP are chosen as in the first experiment. For CGD and ATTCGP, we empirically determine appropriate values for their parameters. The tuned parameters are as follows: ρ = 0.1 for CGD and ς = 0.8 for ATTCGP; the other parameters are taken from their original papers. Here, the matrix A defined in (20) is a partial discrete Walsh-Hadamard transform (DWT) matrix. We refer the readers to [2] and references therein for a discussion of the computational cost associated with the matrix A.
The tested images are Chart.tiff (256 × 256), Barbara.png (512 × 512), Lena.png (512 × 512), and Man.bmp (1024 × 1024). See Fig. 8 for the original, blurred, and restored images obtained by the three algorithms. We report the corresponding results in Table 11, from which we see that our method is attractive and efficient in terms of solution quality. Although the proposed method requires more iterations, its CPU time is usually less than that of CGD. In a word, all the numerical results show that our method provides an effective approach for solving compressed sensing problems.
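The partial DWT sensing matrix used in this experiment can be formed explicitly as sketched below (in practice fast transforms avoid materializing A, as discussed in [2]; the helper names are ours, the construction assumes n is a power of 2, and the row selection is uniformly at random):

```python
import numpy as np

def walsh_hadamard(n):
    """Dense n x n Walsh-Hadamard matrix (n a power of 2), built by
    Kronecker products and normalized so its rows are orthonormal."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H / np.sqrt(n)

def partial_dwt_matrix(m, n, seed=0):
    """m x n sensing matrix: m distinct, randomly chosen rows of the
    normalized Walsh-Hadamard matrix (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(n, size=m, replace=False)
    return walsh_hadamard(n)[rows, :]

A = partial_dwt_matrix(4, 16)
# Rows of A are orthonormal, so A @ A.T equals the 4 x 4 identity.
```

Because the selected rows are orthonormal, \(AA^{T}=I_{m}\), a property commonly exploited in compressed-sensing solvers.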
6 Conclusions
In this paper, we propose a hybrid three-term conjugate gradient projection method for solving large-scale constrained nonlinear monotone equations, in which the search direction is close to that of the memoryless BFGS method in the sense of a univariate minimization problem and possesses the sufficient descent property and the trust region property independently of the line search technique. With the help of the adaptive line search technique, the global convergence of the proposed method is established under some mild conditions. Extensive numerical experiments on constrained nonlinear monotone equations indicate that our method not only inherits the good properties of the three-term and hybrid conjugate gradient methods but also benefits from the adaptive line search technique, thereby greatly improving its computational efficiency compared to the two existing algorithms. Moreover, we apply the proposed method to sparse signal restoration and image de-blurring problems and compare it with two existing methods; the numerical results show that our method is practical and promising.
References
Zhao, Y.B., Li, D.: Monotonicity of fixed point and normal mapping associated with variational inequality and its applications. SIAM J. Optim. 11(4), 962–973 (2001)
Xiao, Y.H., Wang, Q.Y., Hu, Q.J.: Non-smooth equations based method for ℓ1-norm problems with applications to compressed sensing. Nonlinear Anal. 74 (11), 3570–3577 (2011)
Dirkse, S.P., Ferris, M.C.: MCPLIB: A collection of nonlinear mixed complementarity problems. Optim. Methods Softw. 5(4), 319–345 (1995)
Wood, A.J., Wollenberg, B.F.: Power Generation, Operation, and Control. Wiley, New York (1996)
Yu, Z.S., Lin, J., Sun, J., Xiao, Y.H., Liu, L.Y., Li, Z.H.: Spectral gradient projection method for monotone nonlinear equations with convex constraints. Appl. Numer. Math. 59(10), 2416–2423 (2009)
Liu, J., Duan, Y.R.: Two spectral gradient projection methods for constrained equations and their linear convergence rate. J. Inequal. Appl. 2015(1), 1–13 (2015)
Yu, G.H., Niu, S.Z., Ma, J.H.: Multivariate spectral gradient projection method for nonlinear monotone equations with convex constraints. J. Ind. Manag. Optim. 9(1), 117–129 (2013)
Zhang, L.: A modified PRP projection method for nonlinear equations with convex constraints. Int. J. Pure Appl. Math. 79(1), 87–96 (2012)
Xiao, Y.H., Zhu, H.: A conjugate gradient method to solve convex constrained monotone equations with applications in compressive sensing. J. Math. Anal. Appl. 405(1), 310–319 (2013)
Liu, S.Y., Huang, Y.Y., Jiao, H.W.: Sufficient descent conjugate gradient methods for solving convex constrained nonlinear monotone equations. Abstr. Appl. Anal. 2014, 1–12 (2014)
Sun, M., Liu, J.: A modified Hestenes-Stiefel projection method for constrained nonlinear equations and its linear convergence rate. J. Appl. Math. Comput. 49(1-2), 145–156 (2015)
Ding, Y.Y., Xiao, Y.H., Li, J.W.: A class of conjugate gradient methods for convex constrained monotone equations. Optimization 66 (12), 2309–2328 (2017)
Liu, J.K., Feng, Y.M.: A derivative-free iterative method for nonlinear monotone equations with convex constraints. Numer. Algorithms 82, 245–262 (2019)
Abubakar, A.B., Kumam, P.: A descent Dai-Liao conjugate gradient method for nonlinear equations. Numer. Algorithms 81(1), 197–210 (2019)
Ibrahim, A.H., Kumam, P., Abubakar, A.B., Jirakitpuwapat, W., Abubakar, J.: A hybrid conjugate gradient algorithm for constrained monotone equations with application in compressive sensing. Heliyon 6(3), e03466 (2020)
Wang, S., Guan, H.B.: A scaled conjugate gradient method for solving monotone nonlinear equations with convex constraints. J. Appl. Math. 2013(1), 1–7 (2013)
Liu, J.K., Li, S.J.: Multivariate spectral DY-type projection method for convex constrained nonlinear monotone equations. J. Ind. Manag. Optim. 13 (1), 283–295 (2017)
Abubakar, A.B., Kumam, P., Mohammad, H., Awwal, A.M., Sitthithakerngkiet, K.: A modified Fletcher-Reeves conjugate gradient method for monotone nonlinear equations with some applications. Mathematics 7 (8), 745 (2019)
Abubakar, A.B., Kumam, P., Mohammad, H., Awwal, A.M.: An efficient conjugate gradient method for convex constrained monotone nonlinear equations with applications. Mathematics 7(9), 767 (2019)
Solodov, M.V., Svaiter, B.F.: A globally convergent inexact Newton method for systems of monotone equations. In: Fukushima, M., Qi, L. (eds.) Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, pp. 355–369. Kluwer Academic (1998)
Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)
Zhang, L., Zhou, W.J.: Spectral gradient projection method for solving nonlinear monotone equations. J. Comput. Appl. Math. 196, 478–484 (2006)
Abubakar, A.B., Kumam, P., Mohammad, H.: A note on the spectral gradient projection method for nonlinear monotone equations with applications. Comput. Appl. Math. 39, 129 (2020)
Awwal, A.M., Wang, L., Kumam, P., Mohammad, H.: A two-step spectral gradient projection method for system of nonlinear monotone equations and image deblurring problems. Symmetry 12(6), 874 (2020)
Mohammad, H., Abubakar, A.B.: A descent derivative-free algorithm for nonlinear monotone equations with convex constraints. RAIRO-oper Res. 54(2), 489–505 (2020)
Wang, X.Y., Li, S.J., Kou, X.P.: A self-adaptive three-term conjugate gradient method for monotone nonlinear equations with convex constraints. Calcolo 53(2), 133–145 (2016)
Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49, 409–436 (1952)
Gao, P.T., He, C.J.: An efficient three-term conjugate gradient method for nonlinear monotone equations with convex constraints. Calcolo 55(4), 53 (2018)
Liu, Y., Storey, C.: Efficient generalized conjugate gradient algorithms, part 1: Theory. J. Optim. Theory Appl. 69, 177–182 (1991)
Gao, P.T., He, C.J., Liu, Y.: An adaptive family of projection methods for constrained monotone nonlinear equations with applications. Appl. Math. Comput. 359, 1–16 (2019)
Oren, S.S., Luenberger, D.G.: Self-scaling variable metric (SSVM) algorithms, Part I: Criteria and sufficient conditions for scaling a class of algorithms. Manag. Sci. 20(5), 845–862 (1974)
Awwal, A.M., Kumam, P., Mohammad, H., Watthayu, W., Abubakar, A.B.: A Perry-type derivative-free algorithm for solving nonlinear system of equations and minimizing ℓ1 regularized problem. Optimization. (2020). https://doi.org/10.1080/02331934.2020.1808647
Jian, J.B., Han, L., Jiang, X.Z.: A hybrid conjugate gradient method with descent property for unconstrained optimization. Appl. Math. Model. 39(3-4), 1281–1290 (2015)
Sun, M., Liu, J.: New hybrid conjugate gradient projection method for the convex constrained equations. Calcolo 53, 399–411 (2016)
Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35(151), 773–782 (1980)
Shanno, D.F.: Conjugate gradient methods with inexact searches. Math. Oper. Res. 3, 244–256 (1978)
Li, M.: A modified Hestenes-Stiefel conjugate gradient method close to the memoryless BFGS quasi-Newton method. Optim. Methods Softw. 33(2), 336–353 (2018)
Li, M.: A three-term Polak-Ribière-Polyak conjugate gradient method close to the memoryless BFGS quasi-Newton method. J. Ind. Manag. Optim. 13(5), 1–16 (2017)
Li, M.: A family of three-term nonlinear conjugate gradient methods close to the memoryless BFGS method. Optim. Lett. 12, 1911–1927 (2018)
Beale, E.M.L.: A derivation of conjugate gradients. In: Lootsma, F.A. (ed.) Numerical Methods for Nonlinear Optimization. Academic Press, London (1972)
Nazareth, L.: A conjugate direction algorithm without line search. J. Optim. Theory Appl. 23(3), 373–387 (1977)
Zhang, L., Zhou, W.J., Li, D.H.: A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 26(4), 629–640 (2006)
Dai, Y.H., Yuan, Y.X.: Nonlinear Conjugate Gradient Methods. Shanghai Science and Technology Publisher, Shanghai (2000)
Hager, W.W., Zhang, H.C.: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2(1), 35–58 (2006)
Yin, J.H., Jian, J.B., Jiang, X.Z.: A spectral gradient projection algorithm for convex constrained nonsmooth equations based on an adaptive line search. Math. Numer. Sin. (Chinese) 42(4), 457–471 (2020)
Li, Q., Li, D.H.: A class of derivative-free methods for large-scale nonlinear monotone equations. IMA J. Numer. Anal. 31(4), 1625–1635 (2011)
Sun, M., Tian, M.Y.: A class of derivative-free CG projection methods for nonsmooth equations with an application to the LASSO problem. Bull. Iran. Math. Soc. 46, 183–205 (2020)
Cai, X.J., Gu, G.Y., He, B.S.: On the O(1/t) convergence rate of the projection and contraction methods for variational inequalities with Lipschitz continuous monotone operators. Comput. Optim. Appl. 57(2), 339–363 (2014)
Zarantonello, E.H.: Projections on Convex Sets in Hilbert Space and Spectral Theory. Academic Press, New York (1971)
Wang, C.W., Wang, Y.J., Xu, C.L.: A projection method for a system of nonlinear monotone equations with convex constraints. Math. Meth. Oper. Res. 66(1), 33–46 (2007)
Zhou, W.J., Li, D.H.: Limited memory BFGS method for nonlinear monotone equations. J. Comput. Math. 25, 89–96 (2007)
La Cruz, W., Raydan, M.: Nonmonotone spectral methods for large-scale nonlinear systems. Optim. Methods Softw. 18(5), 583–599 (2003)
Dolan, E.D., Moré, J. J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
Figueiredo, M.A.T., Nowak, R.D., Wright, S.J.: Gradient projection for sparse reconstruction, application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Sign. Proces. 1(4), 586–597 (2008)
Pang, J.S.: Inexact Newton methods for the nonlinear complementarity problem. Math. Program. 36(1), 54–71 (1986)
Gonzales, R.C., Woods, R.E.: Digital Image Processing, 3rd edn. Prentice Hall (2008)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Acknowledgments
The authors wish to thank the three anonymous referees for their very professional comments and quite useful suggestions, which greatly helped us to improve the original version of this paper.
Funding
This work was supported by the National Natural Science Foundation of China (11771383), the Natural Science Foundation of Guangxi Province (2018GXNSFAA281099), the Research Project of Guangxi University for Nationalities (2018KJQD02), and the Middle-aged and Young Teachers’ Basic Ability Promotion Project of Guangxi Province (2018KY0700) of China.
Yin, J., Jian, J., Jiang, X. et al. A hybrid three-term conjugate gradient projection method for constrained nonlinear monotone equations with applications. Numer Algor 88, 389–418 (2021). https://doi.org/10.1007/s11075-020-01043-z
Keywords
- Derivative-free projection method
- Three-term conjugate gradient method
- Convergence property
- Compressed sensing