1 Introduction

Let H be a real Hilbert space and C a nonempty, closed, and convex subset of H. Let f : H → ℝ be a convex and continuously Fréchet differentiable functional. Consider the following constrained convex minimization problem:

$$\begin{array}{@{}rcl@{}} \text{minimize} \{f(x) : x \in C\}. \end{array} $$
(1)

The gradient projection method (for short, GPM) generates a sequence {x_n} using the following recursive formula:

$$\begin{array}{@{}rcl@{}} x_{n+1} = P_{C} (x_{n}-\lambda \nabla f(x_{n}))\quad\forall n \geq 1, \end{array} $$
(2)

or more generally,

$$\begin{array}{@{}rcl@{}} x_{n+1} = P_{C} (x_{n}-\lambda_{n} \nabla f(x_{n}))\quad\forall n \geq 1, \end{array} $$
(3)

where in both (2) and (3), the initial guess is taken from C arbitrarily, and the parameters λ or λ_n are positive real numbers known as the stepsize.

The gradient projection (or projected-gradient) algorithm is a powerful tool for solving constrained convex optimization problems and has been extensively studied (see [5, 6, 9, 11, 15, 17–23] and the references therein). It has recently been applied to solve split feasibility problems, which find applications in image reconstruction and intensity-modulated radiation therapy (see [3, 4, 14, 25]).

The convergence of algorithms (2) and (3) depends on the behavior of the gradient ∇f. The gradient projection method (3) has been considered with several stepsize rules:

  • Constant stepsize, where for some λ > 0, we have λ_n = λ for all n.

  • Diminishing stepsize, where λ_n → 0 and \({\sum }_{n=1}^{\infty } \lambda _{n}=\infty \).

  • Polyak’s stepsize, \(\lambda _{n}=\frac {f(x_{n})-f^{\ast }}{\|\nabla f(x_{n})\|^{2}}\), where \(f^{\ast }\) is the optimal value of (1).

  • Modified Polyak’s stepsize, where \(\lambda _{n}=\frac {f(x_{n})-\hat {f_{n}}}{\|\nabla f(x_{n})\|^{2}}\) and \(\hat {f_{n}}=\min _{0\leq j \leq n}f(x_{j})-\delta \) for some scalar δ > 0.

The constant stepsize rule is suitable when we are interested in finding an approximate solution to problem (1). The diminishing stepsize rule is an off-line rule and is typically used with \(\lambda _{n}=\frac {c}{n+1}\) or \(\frac {c}{\sqrt {n+1}}\) for some c > 0. The constant and the diminishing stepsizes are also well suited for some distributed implementations of the method.
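To make the discussion concrete, here is a minimal Python sketch of iteration (3) under the constant and the diminishing stepsize rules; the quadratic objective, the ball constraint, and all numerical values are illustrative choices of ours rather than data from the references.

```python
import numpy as np

def project_ball(x, r=1.0):
    """Metric projection onto the closed ball C = {x : ||x|| <= r}."""
    nx = np.linalg.norm(x)
    return x if nx <= r else (r / nx) * x

def gradient_projection(grad, x0, stepsize, n_iters=200, r=1.0):
    """Iteration (3): x_{n+1} = P_C(x_n - lambda_n * grad(x_n)).

    `stepsize` maps the iteration index n to lambda_n, so the constant
    and the diminishing rules fit the same template.
    """
    x = x0
    for n in range(1, n_iters + 1):
        x = project_ball(x - stepsize(n) * grad(x), r)
    return x

# Illustrative problem: minimize f(x) = 0.5*||x - b||^2 over the unit ball,
# so grad f(x) = x - b and L = 1 (data chosen only for this sketch).
b = np.array([2.0, 0.0])
grad_f = lambda x: x - b

x_const = gradient_projection(grad_f, np.zeros(2), stepsize=lambda n: 1.0)            # constant rule
x_dimin = gradient_projection(grad_f, np.zeros(2), stepsize=lambda n: 1.0 / (n + 1))  # diminishing rule
print(x_const, x_dimin)  # both approach the minimizer (1, 0) on the boundary of C
```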

As a matter of fact, it is known (see [11]) that if ∇f is α-strongly monotone and L-Lipschitzian with constants α, L > 0, then the operator

$$\begin{array}{@{}rcl@{}} T := P_{C} (I -\lambda \nabla f ) \end{array} $$
(4)

is a contraction; hence, the sequence {x_n} defined by algorithm (2) converges in norm to the unique solution of the minimization problem (1). More generally, if the sequence {λ_n} is chosen to satisfy the property

$$\begin{array}{@{}rcl@{}} 0 < \liminf \lambda_{n}\leq \limsup\lambda_{n} <\frac{2\alpha}{L^{2}}, \end{array} $$

then the sequence {x_n} defined by algorithm (3) converges in norm to the unique minimizer of (1). However, if the gradient ∇f fails to be strongly monotone, the operator T defined by (4) need not be contractive; consequently, the sequence {x_n} generated by algorithm (3) may fail to converge strongly (see [22, Sect. 5]). If ∇f is Lipschitzian, then algorithms (2) and (3) can still converge in the weak topology under certain conditions.
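A short standard calculation makes the contraction claim explicit: for all x, y ∈ H,

$$\begin{array}{@{}rcl@{}} \|(I-\lambda \nabla f)x-(I-\lambda \nabla f)y\|^{2}&=&\|x-y\|^{2}-2\lambda \langle \nabla f(x)-\nabla f(y),x-y\rangle +\lambda^{2}\|\nabla f(x)-\nabla f(y)\|^{2}\\ &\leq& \left(1-2\lambda \alpha +\lambda^{2}L^{2}\right)\|x-y\|^{2}, \end{array} $$

so that I − λ∇f (and hence T = P_C(I − λ∇f), since P_C is nonexpansive) is a contraction whenever 1 − 2λα + λ²L² < 1, that is, whenever 0 < λ < 2α/L².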

Recently, Xu [22] gave an alternative, operator-oriented approach to algorithm (3), namely an averaged mapping approach, which he applied to the gradient projection algorithm (3) and to the relaxed gradient projection algorithm. Moreover, he constructed a counterexample showing that algorithm (2) need not converge in norm in an infinite-dimensional space, and he presented two modifications of the gradient projection algorithm which are shown to converge strongly. Further, he regularized the minimization problem (1) to devise an iterative scheme that generates a sequence converging in norm to the minimum-norm solution of (1) in the consistent case.

Very recently, motivated by the work of Xu [22], Ceng et al. [6] proposed implicit and explicit iterative schemes for finding an approximate minimizer of a constrained convex minimization problem and proved that the sequences generated by their schemes converge strongly to a solution of the constrained convex minimization problem. Such a solution is also a solution of a variational inequality defined over the set of fixed points of a nonexpansive mapping.

Motivated by the works of Xu [22] and Ceng et al. [6], we prove a strong convergence theorem for finding the approximate minimizer of a constrained convex minimization problem in a real Hilbert space. Furthermore, we give computational analysis of our result. Our result complements the results of Xu [22] and several other works in this direction.

2 Preliminaries

Definition 1

A mapping T : C → C is said to be nonexpansive if

$$\Vert Tx-Ty\Vert\leq\Vert x-y\Vert\quad\forall x,y\in C. $$

Construction of fixed points of nonexpansive mappings is an important subject in nonlinear mapping theory and its applications, in particular, in image recovery and signal processing (see, for example, [3, 4, 14, 25]). For the past 40 years or so, the approximation of fixed points of nonexpansive mappings and fixed points of some of their generalizations and approximation of zeros of accretive-type operators have been a flourishing area of research for many mathematicians. For example, the reader can consult the recent monographs of Berinde [2] and Chidume [7].

For any point u ∈ H, there exists a unique point P_C u ∈ C such that

$$\|u-P_{C} u\| \leq \|u-y\|\quad\forall y \in C. $$

P_C is called the metric projection of H onto C. We know that P_C is a nonexpansive mapping of H onto C. It is also known that P_C satisfies

$$\begin{array}{@{}rcl@{}} \langle x-y, P_{C} x-P_{C} y \rangle \geq \|P_{C} x-P_{C} y\|^{2} \end{array} $$

for all x, y ∈ H. Furthermore, P_C x is characterized by the properties P_C x ∈ C and

$$\begin{array}{@{}rcl@{}} \langle x-P_{C} x, P_{C} x-y \rangle \geq 0 \end{array} $$

for all y ∈ C.

Definition 2

A mapping T : H → H is said to be firmly nonexpansive if and only if 2T − I is nonexpansive, or equivalently,

$$\langle x-y,Tx-Ty\rangle \geq \|Tx-Ty\|^{2}\quad\forall x,y \in H. $$

Alternatively, T is firmly nonexpansive if and only if T can be expressed as

$$T=\frac{1}{2}(I+S), $$

where S : H → H is nonexpansive. Projections are firmly nonexpansive.

Definition 3

A mapping T : H → H is said to be an averaged mapping if and only if it can be written as the average of the identity mapping I and a nonexpansive mapping; that is,

$$ T=(1-\alpha)I+\alpha S, $$
(5)

where α ∈ (0, 1) and S : H → H is nonexpansive. More precisely, when (5) holds, we say that T is α-averaged. Thus, firmly nonexpansive mappings (in particular, projections) are \(\frac {1}{2}\)-averaged mappings.

Some properties of averaged mappings are gathered in the following proposition.

Proposition 1.

([4, 8]) Let S, T, V : H → H be given operators.

  1. (a)

    If T = (1 − α)S + αV for some α ∈ (0, 1) and if S is averaged and V is nonexpansive, then T is averaged.

  2. (b)

    T is firmly nonexpansive if and only if the complement I − T is firmly nonexpansive.

  3. (c)

    If T = (1 − α)S + αV for some α ∈ (0, 1) and if S is firmly nonexpansive and V is nonexpansive, then T is averaged.

  4. (d)

    The composite of finitely many averaged mappings is averaged. That is, if each of the mappings \(\{T_{i}\}_{i=1}^{N}\) is averaged, then so is the composite T_1 ⋯ T_N. In particular, if T_1 is α_1-averaged and T_2 is α_2-averaged, where α_1, α_2 ∈ (0, 1), then the composite T_1T_2 is α-averaged, where α = α_1 + α_2 − α_1α_2.

Definition 4

A nonlinear operator T whose domain D(T) ⊂ H and range R(T) ⊂ H is said to be

  1. (a)

    monotone if

    $$\langle x-y, Tx-Ty\rangle \geq 0\quad \forall x, y \in D(T), $$
  2. (b)

    β-strongly monotone if there exists β > 0 such that

    $$\langle x-y, Tx-Ty\rangle \geq \beta \|x-y\|^{2}\quad \forall x, y \in D(T), $$
  3. (c)

    ν-inverse strongly monotone (for short, ν-ism) if there exists ν > 0 such that

    $$\langle x-y, Tx-Ty\rangle \geq \nu \|Tx-Ty\|^{2}\quad \forall x, y \in D(T). $$

It can easily be seen that (i) if T is nonexpansive, then I − T is monotone; (ii) the projection mapping P_C is a 1-ism. Inverse strongly monotone (also referred to as co-coercive) operators have been widely used to solve practical problems in various fields, for instance, in traffic assignment problems; see, for example, [3, 10] and the references therein.
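For instance, claim (i) follows at once from the Cauchy–Schwarz inequality and the nonexpansiveness of T:

$$\langle x-y,(I-T)x-(I-T)y\rangle = \|x-y\|^{2}-\langle x-y,Tx-Ty\rangle \geq \|x-y\|^{2}-\|x-y\|\|Tx-Ty\|\geq 0. $$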

The following proposition gathers some results on the relationship between averaged mappings and inverse strongly monotone operators.

Proposition 2.

([4]) Let T : H → H be an operator.

  1. (a)

    T is nonexpansive if and only if the complement I − T is \(\frac {1}{2}\) -ism.

  2. (b)

    If T is ν-ism, then for γ > 0, γT is \(\frac {\nu }{\gamma }\) -ism.

  3. (c)

    T is averaged if and only if the complement I − T is ν-ism for some ν > 1/2. Indeed, for α ∈ (0, 1), T is α-averaged if and only if I−T is \(\frac {1}{2\alpha }\) -ism.

Since the Lipschitz continuity of the gradient ∇f implies that ∇f is inverse strongly monotone (ism), its complement can be an averaged mapping. Consequently, the GPM can be rewritten as the composite of a projection and an averaged mapping, which is again an averaged mapping. This shows that averaged mappings play an important role in the gradient projection method. Recall that a mapping T is nonexpansive if and only if it is Lipschitz with a Lipschitz constant not more than one, and that a mapping is an averaged mapping if and only if it can be expressed as a proper convex combination of the identity mapping and a nonexpansive mapping. An averaged mapping with a fixed point is asymptotically regular, and its Picard iterates converge weakly to a fixed point of the mapping from any starting point. This convergence property is quite helpful.
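As a quick numerical sanity check of this observation, the following Python sketch verifies on random points that the composite P_C(I − λ∇f) never increases distances when 0 < λ < 2/L; the quadratic objective, the ball constraint, and the stepsize are illustrative choices of ours, not data from the references.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative objective f(x) = 0.5 * x^T A x with A symmetric positive semidefinite,
# so grad f(x) = A x and L equals the largest eigenvalue of A.
A = rng.standard_normal((5, 5))
A = A.T @ A
L = np.linalg.eigvalsh(A).max()
lam = 1.0 / L                      # any 0 < lam < 2/L would do here

def project_ball(x, r=1.0):
    """Metric projection onto the ball C = {x : ||x|| <= r}."""
    nx = np.linalg.norm(x)
    return x if nx <= r else (r / nx) * x

def T(x):
    """The composite P_C(I - lam * grad f): averaged, hence nonexpansive."""
    return project_ball(x - lam * (A @ x))

# Empirically, ||T(x) - T(y)|| <= ||x - y|| on random pairs of points.
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    assert np.linalg.norm(T(x) - T(y)) <= np.linalg.norm(x - y) + 1e-12
print("nonexpansiveness check passed")
```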

In the sequel, we shall also make use of the following lemmas.

Lemma 1.

Let H be a real Hilbert space. Then the following inequality holds:

$$\Vert x+y\Vert^{2}\leq \Vert x\Vert^{2} + 2\langle y, x+y\rangle\quad \forall~x, y\in H. $$

Lemma 2.

Let H be a real Hilbert space. The following identity holds:

$$\|x+y\|^{2}=\|x\|^{2}+2\langle x,y\rangle+\|y\|^{2}\quad\forall x, y \in H. $$

Lemma 3.

(Reich [16]) Let K be a closed and convex subset of a reflexive real Banach space E with a uniformly Gâteaux differentiable norm. Assume that every weakly compact and convex subset of E has the fixed-point property for nonexpansive mappings. Let A : K → E be an accretive mapping which satisfies the range condition K ⊆ R(I + sA) for all s > 0, and let J_t := (I + tA)^{−1} denote the resolvent of A. Suppose that 0 ∈ R(A). Then for each x ∈ K, the strong limit \(\lim _{t \to \infty } J_{t} x\) exists and belongs to A^{−1}(0). If we denote \(\lim _{t \to \infty } J_{t} x\) by Qx, then Q : K → A^{−1}(0) is the unique sunny nonexpansive retraction of K onto A^{−1}(0).

We remark that in Hilbert spaces, a sunny nonexpansive retraction is a projection mapping.

Lemma 4.

(Moore and Nnoli [12]) Let \(\{\theta _{n}\}_{n=1}^{\infty }\) be a sequence of nonnegative real numbers satisfying the following relation:

$$\theta_{n+1}\leq\theta_{n}-\alpha_{n}{\Phi}(\theta_{n+1})+\sigma_{n},\quad n \geq 1, $$

where (i) 0 < α_n < 1; (ii) \(\sum _{n=1}^{\infty }\alpha _{n}=\infty \); (iii) Φ : [0, ∞) → [0, ∞) is a strictly increasing function with Φ(0) = 0. Suppose that σ_n = o(α_n) (where σ_n = o(α_n) if and only if \(\lim _{n \to \infty } \frac {\sigma _{n}}{\alpha _{n}}=0\)). Then θ_n → 0 as n → ∞.

We adopt the following notation: x_n → x means that {x_n} converges strongly to x.

3 Main Result

In this section, we modify the gradient projection method so as to obtain strong convergence; such a modification is presented below. We use a constant stepsize λ since we are interested in finding an approximate solution to problem (1).

Theorem 1.

Let C be a nonempty, closed, and convex subset of a real Hilbert space H. Suppose that the minimization problem (1) is consistent and let S denote its solution set. Assume that the gradient ∇f is L-Lipschitzian with constant L > 0. For a fixed u ∈ C, let the sequence {x_n} be generated iteratively by x_1 ∈ C,

$$\begin{array}{@{}rcl@{}} x_{n+1}=(1-\alpha_{n})x_{n}+\alpha_{n}P_{C}(x_{n}-\lambda\nabla f(x_{n}))-\alpha_{n}\gamma_{n}(x_{n}-u),\quad n\geq1, \end{array} $$
(6)

where \(0<\lambda <\frac {2}{L}\) and {α_n}, {γ_n} are sequences in (0, 1) satisfying the following conditions:

  1. (i)

    α_n(1 + γ_n) < 1,

  2. (ii)

    α_n = o(γ_n),

  3. (iii)

    \(\sum \nolimits _{n=1}^{\infty }\alpha _{n}\gamma _{n}=+\infty. \)

Then the sequence {x_n} converges strongly to a minimizer \(\hat {x}\) of (1), where \(\hat {x}\) is the projection of the point u onto the solution set S.

Proof

Observe that x^∗ ∈ C solves the minimization problem (1) if and only if x^∗ solves the fixed-point equation

$$x^{\ast}=P_{C}(I-\lambda\nabla f)x^{\ast}, $$

where λ > 0 is any fixed positive number. Since the gradient ∇f is L-Lipschitzian with constant L > 0, it is \(\frac {1}{L}\)-ism [1], which implies that λ∇f is \(\frac {1}{\lambda L}\)-ism. So, by Proposition 2(c), I − λ∇f is \(\frac {\lambda L}{2}\)-averaged. Now, since the projection P_C is \(\frac {1}{2}\)-averaged, we see from Proposition 1(d) that for \(0 < \lambda < \frac {2}{L}\), the composite P_C(I − λ∇f) is α-averaged with \(\alpha =\frac {1}{2}+\frac {\lambda L}{2}-\frac {1}{2}\cdot \frac {\lambda L}{2}=\frac {2 + \lambda L}{4}\). Therefore, we can write

$$\begin{array}{@{}rcl@{}} P_{C}(I-\lambda \nabla f ) = \frac{2 - \lambda L}{4}I+\frac{2 + \lambda L}{4}T=(1-\beta)I+\beta T, \end{array} $$

where T is nonexpansive and \(\beta = \frac {2 + \lambda L}{4} \in (0,1)\). Then, we can rewrite (6) as

$$\begin{array}{@{}rcl@{}} x_{n+1}=(1-\theta_{n})x_{n}+\theta_{n}Tx_{n}-\alpha_{n}\gamma_{n}(x_{n}-u), \end{array} $$
(7)

where θ_n = βα_n ∈ (0, 1) ∀n ≥ 1. For any x^∗ ∈ S, notice that Tx^∗ = x^∗. We first show that the sequence \(\{x_{n}\}_{n=1}^{\infty }\) is bounded. Observe that the nonexpansiveness of T is equivalent to

$$\begin{array}{@{}rcl@{}} \langle Tx-Ty,x-y\rangle\leq\|x-y\|^{2}-\frac{1}{2}\|(I-T)x-(I-T)y\|^{2}. \end{array} $$
(8)

Using (7), we have

$$\begin{array}{@{}rcl@{}} \|x_{n+1}-x^{\ast}\|&=&\|(1-\alpha_{n}\gamma_{n}-\theta_{n})(x_{n}-x^{\ast})+\theta_{n} (Tx_{n}-x^{\ast})-\alpha_{n}\gamma_{n}(x^{\ast}-u)\| \\ &\leq&\|(1-\alpha_{n}\gamma_{n}-\theta_{n})(x_{n}-x^{\ast})+\theta_{n} (Tx_{n}-x^{\ast})\|+ \alpha_{n}\gamma_{n}\|u-x^{\ast}\|. \end{array} $$
(9)

Then, using (8), we obtain

$$\begin{array}{@{}rcl@{}} &&\|(1-\alpha_{n}\gamma_{n}-\theta_{n})(x_{n}-x^{\ast})+\theta_{n} (Tx_{n}-x^{\ast})\|^{2} \\ &&\quad=(1-\alpha_{n}\gamma_{n}-\theta_{n})^{2}\|x_{n}-x^{\ast}\|^{2}+{\theta_{n}^{2}}\|Tx_{n}-x^{\ast}\|^{2} \\ &&\qquad+2(1-\alpha_{n}\gamma_{n}-\theta_{n})\theta_{n}\langle Tx_{n}-x^{\ast},x_{n}-x^{\ast}\rangle \\ &&\quad\leq(1-\alpha_{n}\gamma_{n}-\theta_{n})^{2}\|x_{n}-x^{\ast}\|^{2}+{\theta_{n}^{2}}\|x_{n}-x^{\ast}\|^{2} \\ &&\qquad+2(1-\alpha_{n}\gamma_{n}-\theta_{n})\theta_{n}\Big[\|x_{n}-x^{\ast}\|^{2}-\frac{1}{2}\|x_{n}-Tx_{n}\|^{2}\Big]\\ &&\quad=(1-\alpha_{n}\gamma_{n})^{2}\|x_{n}-x^{\ast}\|^{2}-(1-\alpha_{n}\gamma_{n}-\theta_{n})\theta_{n}\|x_{n}-Tx_{n}\|^{2}\\ &&\quad\leq(1-\alpha_{n}\gamma_{n})^{2}\|x_{n}-x^{\ast}\|^{2}, \end{array} $$

which implies that

$$\begin{array}{@{}rcl@{}} \|(1-\alpha_{n}\gamma_{n}-\theta_{n})(x_{n}-x^{\ast})+\theta_{n} (Tx_{n}-x^{\ast})\|\leq (1-\alpha_{n}\gamma_{n})\|x_{n}-x^{\ast}\|. \end{array} $$
(10)

It follows from (9) and (10) that

$$\begin{array}{@{}rcl@{}} \|x_{n+1}-x^{\ast}\|&\leq &(1-\alpha_{n}\gamma_{n})\|x_{n}-x^{\ast}\|+ \alpha_{n}\gamma_{n}\|u-x^{\ast}\| \\ &\leq &\max\{\|x_{n}-x^{\ast}\|,\|u-x^{\ast}\|\} \\ &\vdots& \\ &\leq &\max\{\|x_{1}-x^{\ast}\|,\|u-x^{\ast}\|\}. \end{array} $$

Thus, \(\{x_{n}\}_{n=1}^{\infty }\) is bounded and so is {Tx_n}.

Next, we show that the sequence \(\{x_{n}\}_{n=1}^{\infty }\) converges strongly to \(\hat {x}\). Let A := I − T. Then A is a bounded, continuous, and monotone mapping. Since \(\{x_{n}\}_{n=1}^{\infty }\) is bounded, we have that \(\{Ax_{n}\}_{n=1}^{\infty }\) is bounded. Furthermore, by Theorem 2 of [13], A satisfies the range condition. Observe that if for all γ > 0, we define

$$A_{\gamma}:C \to C\quad\text{by}\quad A_{\gamma} x=\gamma Ax\qquad\forall x \in C, $$

then we easily see that A γ satisfies the range condition and

$$S=F(T) = A^{-1}(0) = A_{\gamma}^{-1}(0) = F(J_{s}^{A_{\gamma}}) = \{x \in H:J_{s}^{A_{\gamma}}x=x\}, $$

where \(J_{s}^{A_{\gamma }}\) is the resolvent of the operator A_γ for γ > 0. Observe that

$$\|A_{\gamma} x_{n}\|=\gamma \|Ax_{n}\| \leq \gamma \sup\limits_{x \in B}\|Ax\|\quad\forall n \geq 1, $$

where B is any closed ball containing the sequence {x_n}. This implies that \(\lim _{\gamma \to 0} \|A_{\gamma } x_{n}\|=0\) for each n ≥ 1. Furthermore, we obtain from Lemma 3 that \(\lim _{s \to \infty } J_{s}^{A_{\gamma }}u=Q^{\gamma } u\), where Q^γ is the unique projection mapping from R(I + sA) onto \(A_{\gamma }^{-1}(0) = A^{-1}(0) = F(T)\). Thus, by the uniqueness of the projection mapping, we obtain that if Q is the projection mapping of R(I + sA) onto A^{−1}(0), then Q^γ = Q for all γ > 0. This implies that \(Q u =\lim _{s \to \infty } J_{s}^{A_{\gamma }}u\) for all γ > 0. Let \(\hat {x}:=Q u\). We show that if

$$\xi_{n}:=\max\{\langle u-\hat{x},x_{n}-\hat{x} \rangle,0\}\quad\forall n \geq 1, $$

then \(\lim _{n \to \infty } \xi _{n}=0\). We further observe that since \(J_{s}^{A_{\gamma }}=(I+sA_{\gamma })^{-1}\), we have \((I+sA_{\gamma })J_{s}^{A_{\gamma }}u=u\). This implies that \(A_{\gamma } J_{s}^{A_{\gamma }}u=\frac {1}{s}(u-J_{s}^{A_{\gamma }}u)\), and thus, since A_γ is monotone, we have that

$$\left\langle A_{\gamma} x_{n}-\frac{1}{s}(u-J_{s}^{A_{\gamma}}u), x_{n}-J_{s}^{A_{\gamma}}u\right\rangle \geq 0\quad\forall s>0,~\gamma>0. $$

This implies that for some positive constant M > 0,

$$\begin{array}{@{}rcl@{}} \langle u-J_{s}^{A_{\gamma}}u, x_{n}-J_{s}^{A_{\gamma}}u \rangle \leq s\langle A_{\gamma} x_{n}, x_{n}-J_{s}^{A_{\gamma}}u \rangle\leq \|A_{\gamma} x_{n}\| sM. \end{array} $$

Thus, \(\limsup _{\gamma \to 0}\langle u-J_{s}^{A_{\gamma }}u, x_{n}-J_{s}^{A_{\gamma }}u \rangle \leq 0~\forall n \geq 1\). Therefore, given 𝜖 > 0, there exists δ := δ(𝜖) > 0 such that for all γ ∈ (0, δ],

$$\langle u-J_{s}^{A_{\gamma}}u,~ x_{n}-J_{s}^{A_{\gamma}}u \rangle < \epsilon. $$

Moreover, we have (in particular, for γ = δ) that for some constant M_0 > 0,

$$\begin{array}{@{}rcl@{}} \langle u-\hat{x},~x_{n}-\hat{x}\rangle &=& \langle u-\hat{x}, (x_{n}-\hat{x})-(x_{n}-J_{s}^{A_{\delta}}u)\rangle +\langle u-J_{s}^{A_{\delta}}u,~ x_{n}-J_{s}^{A_{\delta}}u\rangle \\ &&+\langle J_{s}^{A_{\delta}}u-\hat{x}, x_{n}-J_{s}^{A_{\delta}}u\rangle\\ &<&\langle u-\hat{x},(x_{n}-\hat{x})-(x_{n}-J_{s}^{A_{\delta}}u)\rangle +\|J_{s}^{A_{\delta}}u-\hat{x}\| M_{0}+\epsilon. \end{array} $$
(11)

Observe that

$$\lim\limits_{s \to \infty}\langle u-\hat{x},~ (x_{n}-\hat{x})-(x_{n}-J_{s}^{A_{\delta}}u)\rangle =0. $$

Thus, as s → ∞, we obtain from (11) that \(\langle u-\hat {x},x_{n}-\hat {x}\rangle \leq \epsilon \), so

$$\begin{array}{@{}rcl@{}} \limsup\limits_{n \to \infty} \langle u-\hat{x},x_{n}-\hat{x}\rangle \leq \epsilon \end{array} $$
(12)

and since 𝜖 > 0 is arbitrary, (12) gives

$$\limsup\limits_{n \rightarrow \infty} \langle u-\hat{x},x_{n}-\hat{x}\rangle \leq 0 $$

from which we can deduce that \(\underset {n \to \infty }\lim \xi _{n}=0\). From (7), we have

$$\begin{array}{@{}rcl@{}} \|x_{n+1}-\hat{x}\|^{2}&=& \|x_{n}-\hat{x}-\alpha_{n}(\beta(x_{n}-Tx_{n})+\gamma_{n}(x_{n}-u))\|^{2}\\ &\leq&\|x_{n}-\hat{x}\|^{2}-2\alpha_{n}\langle \beta(x_{n}-Tx_{n})+\gamma_{n}(x_{n}-u),x_{n+1}-\hat{x} \rangle \\ &\leq&\|x_{n}-\hat{x}\|^{2}-2\alpha_{n}\gamma_{n}\|x_{n+1}-\hat{x}\|^{2}+2\alpha_{n}\gamma_{n}\langle (x_{n+1}-x_{n})+u-\hat{x},x_{n+1}-\hat{x}\rangle \\ & &-2\alpha_{n}\langle (x_{n}-x_{n+1})+\beta(I-T)x_{n+1}+x_{n+1}-\beta(I-T)x_{n+1}-x_{n}\\ &&+\beta(I-T)x_{n}, x_{n+1}-\hat{x} \rangle \\ &\leq&\|x_{n}-\hat{x}\|^{2}-2\alpha_{n}\gamma_{n}\|x_{n+1}-\hat{x}\|^{2}+({\alpha_{n}^{2}}\gamma_{n}+{\alpha_{n}^{2}})M^{\ast}+2\alpha_{n}\gamma_{n}\langle u-\hat{x},x_{n+1}-\hat{x}\rangle \\ &\leq&\|x_{n}-\hat{x}\|^{2}-2\alpha_{n}\gamma_{n}\|x_{n+1}-\hat{x}\|^{2}+{\alpha_{n}^{2}}(\gamma_{n}+1)M^{\ast}+2\alpha_{n}\gamma_{n}\xi_{n} \\ &=&\|x_{n}-\hat{x}\|^{2}-2\alpha_{n}\gamma_{n}\|x_{n+1}-\hat{x}\|^{2}+\delta_{n}, \end{array} $$

where \(\delta _{n}={\alpha _{n}^{2}}(\gamma _{n}+1)M^{\ast }+2\alpha _{n}\gamma _{n}\xi _{n}\) for some M^∗ > 0. Since α_n = o(γ_n) and ξ_n → 0, we have δ_n = o(α_nγ_n). Hence, by Lemma 4 (applied with α_n replaced by 2α_nγ_n and Φ(t) = t), we have that \(x_{n}\to \hat {x}\) as n → ∞. This completes the proof. □

4 Computational Analysis

In this section, we give a computational analysis of our iterative scheme.

Let H be a real Hilbert space, let C := {x ∈ H : ∥x∥ ≤ r} (i.e., a closed ball centered at the origin with radius r), and define f : C → ℝ by \(f(x) = \frac {1}{2}\|x\|^{2}\). Then f is convex and differentiable with ∇f(x) = x. Observe that ∇f is 1-Lipschitzian. Let us consider the problem

$$\begin{array}{@{}rcl@{}} \min\limits_{x \in C} f(x). \end{array} $$
(13)

Clearly, the optimal solution \(\hat {x}\) to the minimization problem (13) is \(\hat {x}=0\).

Now, by our Theorem 1, we can take \(\alpha _{n}=\frac {1}{(n+1)^{1/2}}, \gamma _{n}=\frac {1}{(n+1)^{1/3}},~n \geq 1\) and \(\lambda =\frac {1}{2}\). By the choice of our function f and λ, we see that \(P_{C}(x-\frac {1}{2}\nabla f(x)) = P_{C}(x-\frac {1}{2}x) = P_{C}(\frac {1}{2}x) = \frac {1}{2}x\). Furthermore, our iterative scheme (6) becomes

$$\begin{array}{@{}rcl@{}} x_{n+1}=\left(1-\frac{1}{(n+1)^{\frac{1}{2}}}\right)x_{n}+\frac{x_{n}}{2(n+1)^{\frac{1}{2}}}-\frac{1}{(n+1)^{\frac{5}{6}}}(x_{n}-u). \end{array} $$
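For completeness, the following short Python script runs this scheme; the radius r, the anchor point u, the starting point x_1, and the iteration count are arbitrary illustrative choices of ours, and the printed norm decreases (slowly, in line with the stepsizes above) toward the minimizer \(\hat {x}=0\).

```python
import numpy as np

r = 5.0                          # radius of the ball C (illustrative choice)
u = np.array([1.0, -2.0])        # fixed point u in C (illustrative choice)
x = np.array([3.0, 4.0])         # starting point x_1 in C (illustrative choice)

def project_ball(y, r):
    """Metric projection onto C = {x : ||x|| <= r}."""
    ny = np.linalg.norm(y)
    return y if ny <= r else (r / ny) * y

# Scheme (6) with f(x) = 0.5*||x||^2 (so grad f(x) = x), lambda = 1/2,
# alpha_n = (n+1)^(-1/2), gamma_n = (n+1)^(-1/3); note that
# P_C(x - 0.5*grad f(x)) = 0.5*x here, as computed above.
for n in range(1, 5001):
    alpha_n = (n + 1) ** (-0.5)
    gamma_n = (n + 1) ** (-1.0 / 3.0)
    x = (1 - alpha_n) * x + alpha_n * project_ball(x - 0.5 * x, r) - alpha_n * gamma_n * (x - u)

print(np.linalg.norm(x))  # decreases slowly toward 0, the minimizer of (13)
```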

5 Conclusions

Since the gradient projection method (GPM) fails, in general, to converge in norm in infinite-dimensional Hilbert spaces, we have provided in this paper a strongly convergent modification of the GPM.

We note that Theorem 4.1 and Theorem 4.2 of Xu [22] are weak convergence results, while our Theorem 1 is a strong convergence result. Thus, our Theorem 1 improves Theorem 4.1 and Theorem 4.2 of Xu [22]. Furthermore, our iterative scheme does not involve the “CQ” algorithm studied by Xu [22]. Also, our result does not require the additional projections that were required in Theorem 5.4 of Xu [22] and Theorem 3.3 of [24] to guarantee strong convergence. Our method of proof is different from the methods of proof of Xu [22], Yao et al. [23], Ceng et al. [6], Su and Xu [18], and others.