1 Introduction

Let \(\varGamma _{0}(H)\) denote the class of proper, convex, lower semicontinuous functions from a real Hilbert space H to \((-\infty , + \infty ].\) The non-smooth composite optimization problem (NSCOP for short) is defined by

$$\begin{aligned} \min _{x\in H} \big ( \psi (x) + \varphi (x) \big ), \end{aligned}$$
(1)

where \(\psi , \varphi \in \varGamma _{0}(H) \), \(\psi \) is differentiable, \(\varphi \) is not necessarily differentiable, and \(\nabla \psi \) is Lipschitz continuous on H. The NSCOP can be traced back to the classical work of Bauschke and Combettes [1] and arises naturally in linear inverse problems [2]. This class of optimization problems covers a large number of applications in applied mathematics, robotics, medical imaging, financial engineering, machine learning, signal processing and resource allocation, see e.g. [3,4,5,6,7,8,9]. Owing to the recent boom in automation and machine learning, efficient iterative algorithms have become indispensable in artificial intelligence; this, in turn, has increased the demand for fast, implementable algorithms whose components can be evaluated quickly and at low computational cost. One of the goals of the present paper is to develop a new and faster algorithm for solving the NSCOP.

A well-known and powerful tool for solving problem (1) is the proximal gradient algorithm. For non-smooth, constrained, large-scale, or distributed problems it plays a role analogous to that of Newton's method for unconstrained smooth optimization problems of modest size.

Assumption 1

[1] (Existence of solutions of the NSCOP)

Let \(\varOmega \) be the set of all solutions of (1) and set \(\varPhi := \psi + \varphi \). If \(\varPhi \) is coercive, that is,

$$\begin{aligned} \displaystyle \underset{\Vert x \Vert \rightarrow \infty }{\lim } \varPhi (x) = + \infty . \end{aligned}$$

then \(\varOmega = \mathrm{Arg\,min}\, \varPhi \ne \emptyset ,\) that is, \(\varPhi \) has a minimizer over H.

Assumption 2

[1] (Existence and uniqueness of proximal points)

Let \(\varphi \in \varGamma _{0}(H) .\) Then, for each \(x\in H,\) the function

$$\begin{aligned} \left( \frac{ \Vert y-x \Vert ^{2}}{2}+\varphi (y) \right) ~~\text {has exactly one minimizer over}~H. \end{aligned}$$

In view of this, the proximal operator of \(\varphi \) of order \(\lambda >0\) is defined by

$$\begin{aligned} \mathbf{prox} _{\lambda \varphi }(x):= \text {argmin}_{y\in H} \left\{ \frac{ \Vert y-x \Vert ^{2}}{2\lambda }+\varphi (y) \right\} ,\qquad x\in H.\end{aligned}$$
(2)
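For instance, when \(\varphi = \mu \Vert \cdot \Vert _{1}\) on \({\mathbb {R}}^{N}\) the proximal operator (2) reduces to componentwise soft-thresholding, while for \(\varphi = \frac{1}{2}\Vert \cdot \Vert ^{2}\) it is a simple scaling. A minimal NumPy sketch of these two special cases (the function names are ours and purely illustrative):

```python
import numpy as np

def prox_l1(x, lam):
    # proximal operator of ||.||_1 of order lam, cf. (2): soft-thresholding by lam
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_half_sq(x, lam):
    # proximal operator of (1/2)||.||^2 of order lam: shrinkage toward the origin
    return x / (1.0 + lam)

x = np.array([3.0, -0.2, 0.5, -4.0])
print(prox_l1(x, 1.0))       # soft-thresholds each entry by 1
print(prox_half_sq(x, 1.0))  # halves each entry
```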

The proximal gradient algorithm (PGA for short) splits the contributions of the functions \( \psi \) and \( \varphi \): the gradient descent step is determined by \(\psi \) [10, 11] and the proximal step is induced by \(\varphi \) [12,13,14]. The PGA is easy to implement and applicable to problems that involve large or high-dimensional datasets. Xu [15] showed that \(x^{*}\) is a solution of the NSCOP (1) if and only if \(x^{*}\) solves the fixed point equation

$$\begin{aligned} x^{*}=\mathbf{prox} _{\lambda \varphi }( x^{*}- \lambda \nabla \psi (x^{*})). \end{aligned}$$
(3)

He then established the weak convergence of PGA as follows:

$$\begin{aligned} x_{n+1}= \mathbf{prox} _{\lambda _{n} \varphi }( x_{n}- \lambda _{n} \nabla \psi (x_{n} ) ),\qquad \forall n \ge 0, \end{aligned}$$
(4)

where \(\lambda _{n} > 0\) is a step size. Recently, Guo and Cui [16] modified the PGA by using the viscosity method [17] as follows:

$$\begin{aligned} x_{n+1}= \alpha _{n} f(x_{n}) + (1-\alpha _{n})\mathbf{prox} _{\lambda _n \varphi }( x_{n}- \lambda _{n} \nabla \psi (x_{n} ) ) + e(x_{n}),\qquad \forall n \ge 0, \end{aligned}$$
(5)

where \(\{\alpha _n\}\) is a sequence in [0, 1], \(f: H \rightarrow H \) is a \(\rho \)-contraction with \(\rho \in (0,1)\), \(L > 0\) is a Lipschitz constant of \(\nabla \psi \) and \(e : H \rightarrow H \) represents a perturbation operator satisfying

$$\begin{aligned} \displaystyle \sum _{n=0}^\infty \Vert e(x_{n}) \Vert < + \infty . \end{aligned}$$

They proved that the sequence \(\{ x_n \}\) generated by (5) converges strongly to a point \(x^{*}\in \varOmega \), where \(x^{*}\) is the unique solution of the variational inequality problem:

$$\begin{aligned} \langle x^{*} - f(x^{*}) , x- x^{*} \rangle \ge 0, \qquad \forall x\in \varOmega , \end{aligned}$$
(6)

under the following conditions (a minimal sketch of iteration (5) under these conditions is given after the list):

  1. (C1)

    \(0< a= \inf _{n} \lambda _{n} \le \lambda _{n} < \frac{2}{L}\) and \( \displaystyle \sum _{n=0}^\infty \mid \lambda _{n+1}-\lambda _{n} \mid < \infty \);

  2. (C2)

    \( \{\alpha _{n} \} \subset (0, 1)\) with \( \displaystyle \lim _{n\rightarrow \infty }\alpha _n=0 \);

  3. (C3)

    \( \displaystyle \sum _{n=0}^\infty \alpha _{n} = \infty \) and \( \displaystyle \sum _{n=0}^\infty \mid \alpha _{n+1}-\alpha _{n} \mid < \infty \).
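To fix ideas, the following minimal NumPy sketch implements iteration (5) for the model problem \(\psi (x)=\frac{1}{2}\Vert Ax-b \Vert ^{2}\), \(\varphi (x)=\mu \Vert x \Vert _{1}\) treated in Sect. 3.1; the choices \(f(x)=\frac{x}{2}\), \(e\equiv 0\), \(\lambda _{n}\equiv 1/L\) and \(\alpha _{n}=1/(n+2)\) are illustrative assumptions consistent with (C1)–(C3), not prescriptions of [16].

```python
import numpy as np

def soft_threshold(x, t):
    # proximal operator of t*||.||_1 (componentwise soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def viscosity_pga(A, b, mu, n_iter=500):
    """Sketch of iteration (5) with f(x) = x/2 and e = 0 (illustrative choices)."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad psi
    lam = 1.0 / L                            # constant step size in (0, 2/L), cf. (C1)
    x = np.zeros(A.shape[1])
    for n in range(n_iter):
        alpha = 1.0 / (n + 2)                # (C2)-(C3): alpha_n -> 0, sum alpha_n = infinity
        grad = A.T @ (A @ x - b)             # grad psi(x) = A^T(Ax - b)
        fb = soft_threshold(x - lam * grad, lam * mu)   # prox_{lam*phi}(x - lam*grad psi(x))
        x = alpha * (x / 2.0) + (1.0 - alpha) * fb      # viscosity step, perturbation e = 0
    return x
```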

The superiorization methodology (SM for short) was introduced by Censor et al. [18] in an attempt to address the computational tractability of certain image recovery algorithms under limited computing resources. The SM is a heuristic approach with a focus on time consumption [19, 20]. Its basic operation requires an algorithm that generates a convergent sequence of feasible solutions and is bounded perturbation resilient; superiorization then uses perturbations proactively to obtain superior feasible solutions, in the sense that the values of the objective function decrease. The simple algorithm used to find a point with a lower objective function value is called the basic algorithm, and it is usually pre-checked for perturbation resilience. The SM may seem to be a vast oversimplification, yet it represents a dramatic shift in the approach to implementing iterative algorithms. The SM has been investigated and studied by many researchers (see e.g. [6, 21,22,23]). Very recently, Guo and Cui [16] showed that the sequence generated by (5) is bounded perturbation resilient and proved strong convergence to a point which is also a solution of the variational inequality problem (6).

As a technique to improve the rate of convergence in optimization, Polyak [24] pioneered the heavy ball method, derived from a second-order-in-time dynamical system, which is a two-step iterative method for solving smooth convex minimization problems (i.e. problems whose objective function is smooth and convex). Later on, Nesterov [25] developed this method further to improve the convergence rate as follows:

$$\begin{aligned} y_n= & {} x_n + \zeta _{n}(x_n - x_{n-1}), \end{aligned}$$
(7)
$$\begin{aligned} x_{n+1}= & {} y_n - \lambda _n \nabla f(y_n), \quad \forall n \ge 1, \end{aligned}$$
(8)

where \(\zeta _n \in [0,1)\) is an extrapolation factor, \(\lambda _n\) is a positive real sequence and \(\nabla f\) is the gradient of a smooth convex function f. The method is effective and converges faster since the vector \((x_{n} - x_{n-1}) \) acts as an impulse term and \(\zeta _{n}\) serves as a speed regulator of the momentum term \(\zeta _{n} ( x_{n} - x_{n-1} )\) in (7). For more about inertial algorithms, we refer to [26,27,28,29,30,31,32] and the references therein.
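A minimal sketch of the two-step scheme (7)–(8) for the smooth convex function \(f(x)=\frac{1}{2}\Vert Ax-b \Vert ^{2}\); the extrapolation factor \(\zeta _{n}=(n-1)/(n+2)\) and the constant step size are illustrative choices only.

```python
import numpy as np

def inertial_gradient(A, b, n_iter=200):
    """Nesterov-type two-step iteration (7)-(8) for f(x) = 0.5*||Ax - b||^2 (illustrative sketch)."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of grad f
    lam = 1.0 / L                                  # constant step size lambda_n
    x_prev = x = np.zeros(A.shape[1])
    for n in range(1, n_iter + 1):
        zeta = (n - 1.0) / (n + 2.0)               # extrapolation factor zeta_n in [0, 1)
        y = x + zeta * (x - x_prev)                # momentum step (7)
        grad = A.T @ (A @ y - b)                   # grad f(y)
        x_prev, x = x, y - lam * grad              # gradient step (8)
    return x
```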

Inspired and motivated by the contributions mentioned above, we study a new method for approximating solutions of the NSCOP (1) using a modified proximal gradient algorithm combined with an inertial technique, and we prove strong convergence of the sequence generated by the proposed algorithm under suitable conditions. Further, we verify that the proposed algorithm is bounded perturbation resilient and apply it to the image recovery problem. The effectiveness and implementation of the proposed algorithm are illustrated by comparing it with previously known algorithms through numerical experiments in signal recovery.

The rest of this paper is organized as follows: basic notations, definitions and lemmas are recalled in Sect. 2. Our main results are presented in Sect. 3, that is, the inertial modified proximal gradient algorithms involving perturbation and the superiorization method are presented, and strong convergence results are proved. Next, Sect. 4 contains an application and numerical experiments to support our study. Finally, the conclusion of this research is given.

2 Preliminaries

The following notations, definitions and lemmas will play a crucial role in the sequel. Let \(\{ x_{n} \}\) be a sequence in a Hilbert space H and \(x\in H.\)

  1. (1)

    \( x_{n} \rightarrow x\) means \( \{ x_{n} \} \) converges strongly to x

  2. (2)

    \(x_{n}\rightharpoonup x\) means \( \{ x_{n} \} \) converges weakly to x

  3. (3)

    A point z is said to be a weak cluster point of \(\{ x_{n} \}\) if there exists a subsequence \(\{ x_{n_j} \}\) which converges weakly to z. The set of all weak cluster points of \(\{ x_{n} \}\) is denoted by \(\omega _{W}(x_{n} ).\)

  4. (4)

    An operator \(T: H\rightarrow H\) is said to be \(\lambda \)-averaged if \(T= ( 1-\lambda )I + \lambda S\), where \(\lambda \in (0,1)\) and \(S: H \rightarrow H\) is nonexpansive;

  5. (5)

    \(A: H \rightarrow H\) is said to be \(\nu \)-inverse strongly monotone (\(\nu \)-ism) with \(\nu > 0,\) if

    $$\begin{aligned} \langle Ax-Ay, x-y \rangle \ge \nu \Vert Ax-Ay \Vert ^{2}\qquad \forall x,y\in H; \end{aligned}$$
  6. (6)

    T is nonexpansive if and only if the complement \(I-T\) is \(\frac{1}{2}\)-ism [33];

  7. (7)

    If T is \(\nu \)-ism, then for \(\gamma >0,\) \(\gamma T\) is \(\frac{\nu }{\gamma }\)-ism;

  8. (8)

    T is averaged if and only if the complement \(I-T\) is \(\nu \)-ism for some \(\nu > \frac{1}{2}.\) Indeed, T is \(\alpha \)-averaged if and only if \(I-T\) is \(\frac{1}{2\alpha }\) -ism, for \(\alpha \in (0,1);\)

  9. (9)

    The composite of finitely many averaged operators is averaged. That is, if for each \(i=1,..., N,\) the operator \(T_{i}\) is averaged, then so is the composite operator \(T_{1}...T_{N}.\) In particular, if \(T_{1}\) is \(\alpha _{1}\)-averaged and \(T_{2}\) is \(\alpha _{2}\)-averaged, where \( \alpha _{1}, \alpha _{2}\in (0,1)\), then the composite \(T_{1}T_{2}\) is \(\alpha \)-averaged, where \(\alpha = \alpha _{1}+ \alpha _{2}-\alpha _{1}\alpha _{2}\).

Lemma 1

The following results are well known.

  1. (i)

    \(\Vert x+y \Vert ^{2} = \Vert x \Vert ^{2} + 2 \langle x, y \rangle + \Vert y \Vert ^{2},~~\forall ~x,y\in H\);

  2. (ii)

    \(\Vert x+y \Vert ^{2} \le \Vert x \Vert ^{2} + 2 \langle y, x+ y \rangle ,~~\forall ~x,y\in H\).

Lemma 2

[15] Let \( \varphi \in \varGamma _{0}(H) \), \( \lambda >0 \) and \( \mu >0.\) Then

$$\begin{aligned} \mathbf{prox} _{\lambda \varphi }( x) = \mathbf{prox} _{\mu \varphi }\left( \frac{\mu }{\lambda } x + \left( 1- \frac{\mu }{\lambda } \right) \mathbf{prox} _{\lambda \varphi } x \right) . \end{aligned}$$
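This identity can be checked numerically; a small sketch for \(\varphi = \Vert \cdot \Vert _{1}\), whose proximal operator is soft-thresholding (an illustrative verification, not part of the proof):

```python
import numpy as np

def prox_l1(x, t):
    # proximal operator of ||.||_1 of order t (soft-thresholding by t)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
lam, mu = 0.7, 0.3

lhs = prox_l1(x, lam)
rhs = prox_l1((mu / lam) * x + (1.0 - mu / lam) * prox_l1(x, lam), mu)
assert np.allclose(lhs, rhs)   # the identity of Lemma 2 for phi = ||.||_1
```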

Lemma 3

[34] Let \( \varphi \in \varGamma _{0}(H) \) and \( \lambda >0 \). Then the proximal operator \(\mathbf{prox} _{\lambda \varphi }\) is \(\frac{1}{2}\)-averaged. In particular, it is nonexpansive:

$$\begin{aligned} \Vert \mathbf{prox} _{\lambda \varphi }( x) - \mathbf{prox} _{\lambda \varphi }(y) \Vert \le \Vert x-y \Vert ,\qquad \forall x, y \in H. \end{aligned}$$

Lemma 4

[1] Let \(T: H \rightarrow H\) be a nonexpansive mapping with \(\text {Fix}~(T) \ne \emptyset \). If a sequence \(\{ x_{n} \}\) in H converges weakly to x and \( \{ (I-T)x_{n} \} \) converges strongly to y, then \((I-T)x=y,\) where \(\text {Fix}~(T)\) is the fixed point set of T.

Lemma 5

[21] Assume that \(\{ a_{n} \}\) is a sequence of nonnegative real numbers satisfying

$$\begin{aligned} a_{n+1} \le (1-\gamma _{n}) a_{n} + \gamma _{n} \delta _{n} + \beta _{n} \end{aligned}$$

where \(\{ \gamma _{n}\},\) \(\{ \delta _{n}\},\) and \(\{ \beta _{n}\}\) satisfy the conditions:

  1. (i)

    \(\{ \gamma _{n}\}\subset [0, 1],\qquad \sum\nolimits _{n=0}^{\infty } \gamma _{n}= \infty ;\)

  2. (ii)

    \(\displaystyle \underset{n \rightarrow \infty }{\limsup }\,\, \delta _n \le 0;\)

  3. (iii)

    \(\beta _{n} \ge 0\) and \( \sum\nolimits _{n=0}^{\infty } \beta _{n} < \infty \). Then \(\displaystyle \underset{n \rightarrow \infty }{\lim } a_n =0.\)

3 Main results

In this section, using the proximal mapping, we prove a strong convergence theorem for finding a solution of the non-smooth composite optimization problem (1).

Assumption 3

  1. (i)

    \(f: H\rightarrow H \) is a \(\rho \)-contractive operator with \(\rho \in (0, 1)\);

  2. (ii)

    \(\psi : H\rightarrow (-\infty , \infty ] \) is proper, lower semicontinuous, convex and \(\nabla \psi \) is Lipschitz continuous with Lipschitz constant \(L >0\);

  3. (iii)

    \( \varphi : H\rightarrow (-\infty , \infty ] \) is proper, lower semicontinuous and convex;

  4. (iv)

    The set \(\varOmega \) of all solutions of problem (1) is nonempty.

Assumption 4

Suppose \(\{ \lambda _{n}\}\subset (0, \frac{2}{L}),\) \( \{\zeta _{n} \}\subset [0, 1)\) and \( \{\alpha _{n} \}\) is a sequence in (0, 1) satisfying the following conditions:

  1. (B1)

    \(0< a= \inf _{n} \lambda _{n} \le \lambda _{n} < \frac{2}{L}\) and \( \sum _{n=0}^\infty \mid \lambda _{n+1}-\lambda _{n} \mid < \infty \);

  2. (B2)

    \( \displaystyle \lim _{n\rightarrow \infty }\alpha _n=0 ,\) and \( \sum _{n=0}^\infty \alpha _{n} = \infty \)

  3. (B3)

    \( \displaystyle \underset{n \rightarrow \infty }{\lim } \frac{\zeta _{n}}{\alpha _{n}} \Vert x_{n}-x_{n-1} \Vert =0.\)

Algorithm 3.1 : Inertial viscosity proximal gradient algorithm with perturbation

Initialization: Take arbitrary \(x_{0}, x_{1}\in H,\) and set \(n=1.\) Choose sequences \(\{ \alpha _{n} \}\) and \(\{ \zeta _n \}\) such that the conditions in Assumption 4 hold.

Step 1: If \(\Vert x_{n}- x_{n-1}\Vert =0,\) STOP; \(x_{n}\) is a solution of the NSCOP (1). Otherwise, go to Step 2.

Step 2: Given the current iterates \(x_{n-1}\), \(x_{n}\) and the parameter \(\zeta _{n},\) compute \(\theta _{n}\) as

$$\begin{aligned} \displaystyle \theta _{n}=x_{n}+\zeta _{n} (x_{n}-x_{n-1} ) . \end{aligned}$$

Step 3: Compute the next iterate \( x_{n+1} \) as:

$$\begin{aligned} \displaystyle x_{n+1}=\alpha _n f( \theta _{n}) +(1-\alpha _n) \mathbf{prox} _{\lambda _{n} \varphi } \left( \theta _{n}- \lambda _{n} \nabla \psi (\theta _{n}) \right) + e(\theta _{n}), \quad n\ge 0, \end{aligned}$$

where \(e : H \rightarrow H\) is a perturbation operator satisfying the following condition

$$\begin{aligned} \displaystyle \sum _{n=0}^\infty \Vert e(\theta _{n}) \Vert < + \infty . \end{aligned}$$
(9)

Last step: Update \(n:=n+1\) and go to Step 1.
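For concreteness, a minimal NumPy sketch of Algorithm 3.1 for the LASSO instance of Sect. 3.1 (\(\psi (x)=\frac{1}{2}\Vert Ax-b \Vert _2^{2}\), \(\varphi (x)=\mu \Vert x \Vert _1\)); the contraction \(f(x)=\frac{x}{2}\), the perturbation \(e\equiv 0\) and the parameter choices mirror the experiments of Sect. 4 and are assumptions of this sketch rather than requirements of the algorithm.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def algorithm_3_1(A, b, mu, n_iter=500, tau=0.5):
    """Sketch of Algorithm 3.1 for psi(x) = 0.5*||Ax - b||^2, phi(x) = mu*||x||_1,
    with f(x) = x/2 and e = 0 (illustrative choices)."""
    L = np.linalg.norm(A, 2) ** 2
    lam = 0.3 / L                                    # lambda_n in (0, 2/L), cf. (B1)
    x_prev = x = np.zeros(A.shape[1])
    for n in range(1, n_iter + 1):
        diff = np.linalg.norm(x - x_prev)
        if n > 1 and diff == 0.0:                    # Step 1: stopping criterion
            break
        alpha = 1.0 / (n + 1)                        # (B2)
        # Remark 3: zeta_n = min(tau_n / ||x_n - x_{n-1}||, tau) with tau_n = 1/n^2, cf. (B3)
        zeta = min(1.0 / (n ** 2 * diff), tau) if diff > 0 else tau
        theta = x + zeta * (x - x_prev)              # Step 2: inertial extrapolation
        grad = A.T @ (A @ theta - b)                 # grad psi(theta)
        fb = soft_threshold(theta - lam * grad, lam * mu)
        x_prev, x = x, alpha * (theta / 2.0) + (1.0 - alpha) * fb   # Step 3 with e = 0
    return x
```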

If \(\zeta _{n}= 0\) in Algorithm 3.1, then we obtain the viscosity proximal gradient algorithm with perturbation (5) of Guo and Cui [16].

Definition 1

(Bounded Perturbation Resilience)

Let \(\{ x_{n} \}\) be a sequence generated by the algorithm

$$\begin{aligned} {\left\{ \begin{array}{ll} \text {select an arbitrary starting point}~x_0\in H, \\ x_{n+1} = Ax_{n},\qquad ~ n\ge 0, \end{array}\right. } \end{aligned}$$

where A is an algorithmic operator from H into H. Then A is said to be bounded perturbation resilient if

$$\begin{aligned} \displaystyle \underset{n \rightarrow \infty }{\lim } x_{n}= & {} \nu ~~\text {implies }~ \displaystyle \underset{n \rightarrow \infty }{\lim }y_{n} = \nu ,~~\text {for any sequence}~~\{ y_{n} \}~~\text {in}~H~\text {generated by } \\&\quad {\left\{ \begin{array}{ll} \text {select an arbitrary starting point}~y_0\in H, \\ y_{n+1} = A( y_{n} + \beta _{n} v_{n}), \qquad ~ n\ge 0, \end{array}\right. } \end{aligned}$$

for every bounded vector sequence \(\{ v_{n} \}\) and every scalar sequence \(\{ \beta _{n} \}\) such that \(\beta _{n} \ge 0\) for all \(n\ge 0\) and \( \sum \nolimits _{n=0}^{\infty } \beta _{n} < \infty .\) The operator A is called the basic algorithm.

Algorithm 3.2: Inertial viscosity proximal gradient algorithm with superiorization method

Initialization: Take arbitrary \(x_{0}, x_{1}\in H,\) and set \(n=1.\) Choose a bounded vector sequence \(\{ v_{n} \}\) and a scalar sequence \(\{ \beta _{n} \}\) as in Definition 1. Choose \(\{ \alpha _{n} \}\) and \(\{ \zeta _n \}\) such that the conditions in Assumption 4 hold.

Step 1: If \(\Vert x_{n}- x_{n-1} \Vert =0,\) STOP; \(x_{n}\) is a solution of the NSCOP (1). Otherwise, go to Step 2.

Step 2: Given the current iterates \(x_{n-1}\), \(x_{n}\) and the parameter \(\zeta _{n},\) compute \(\theta _{n}\) as

$$\begin{aligned} \displaystyle \theta _{n}=x_{n}+\zeta _{n} (x_{n}-x_{n-1} ) . \end{aligned}$$

Step 3: Compute the next iterate \( x_{n+1} \) as:

$$\begin{aligned} \displaystyle x_{n+1}=\alpha _n f( \theta _{n} + \beta _{n}v_{n}) +(1-\alpha _n) \mathbf{prox} _{\lambda _{n} \varphi } \left( \theta _{n}+ \beta _{n}v_{n}- \lambda _{n} \nabla \psi (\theta _{n} +\beta _{n}v_{n}) \right) , \quad n\ge 0. \end{aligned}$$

Last step: Update \(n:=n+1\) and go to Step 1.
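The superiorized scheme differs from Algorithm 3.1 only in Step 3, where the viscosity proximal-gradient update is evaluated at the perturbed point \(\theta _{n}+\beta _{n}v_{n}\). A minimal sketch of that step for the LASSO instance of Sect. 3.1, with the illustrative choices \(f(x)=\frac{x}{2}\) and \(v_{n}\) a normalized negative subgradient of \(\varphi \) (as in Sect. 4):

```python
import numpy as np

def superiorized_step(A, b, mu, theta, alpha, lam, beta):
    """One instance of Step 3 of Algorithm 3.2: the viscosity proximal-gradient update
    evaluated at the perturbed point theta_n + beta_n*v_n, with f(x) = x/2 and v_n a
    normalized negative subgradient of phi = mu*||.||_1 (illustrative choices)."""
    d = mu * np.sign(theta)                          # d_n in the subdifferential of phi at theta
    nd = np.linalg.norm(d)
    v = -d / nd if nd > 0 else np.zeros_like(theta)  # bounded direction v_n
    z = theta + beta * v                             # perturbed point theta_n + beta_n*v_n
    grad = A.T @ (A @ z - b)                         # grad psi(z) = A^T(Az - b)
    step = z - lam * grad
    fb = np.sign(step) * np.maximum(np.abs(step) - lam * mu, 0.0)   # prox_{lam*phi}
    return alpha * (z / 2.0) + (1.0 - alpha) * fb    # x_{n+1}
```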

Theorem 1

Assume that Assumptions 3 and 4 hold. Then the sequence \(\{x_n\}\) generated by Algorithm 3.1 is bounded.

Proof

Step 1: We prove that \( \mathbf{prox} _{\lambda _{n} \varphi } \left( I- \lambda _{n} \nabla \psi \right) \) is nonexpansive for each n. Since \(\nabla \psi \) is L-Lipschitz continuous, \(\nabla \psi \) is \(\frac{1}{L}\)-ism, which implies that \(\lambda _{n}\nabla \psi \) is \(\frac{1}{\lambda _{n}L}\)-ism, and hence, by item (8) of the preliminaries, the complement \(\left( I- \lambda _{n} \nabla \psi \right) \) is \(\frac{\lambda _{n}L}{2}\)-averaged since \(\lambda _{n}\in (0, \frac{2}{L} ).\) Moreover, by Lemma 3, \( \mathbf{prox} _{\lambda _{n} \varphi } \) is \(\frac{1}{2}\)-averaged. By item (9) of the preliminaries, the composition \( \mathbf{prox} _{\lambda _{n} \varphi } \left( I- \lambda _{n} \nabla \psi \right) \) is \(\frac{\lambda _{n}L + 2}{4}\)-averaged. Therefore the composition

$$\begin{aligned} \mathbf{prox} _{\lambda _{n} \varphi } \left( I- \lambda _{n} \nabla \psi \right) = \left( \frac{2-\lambda _{n} L}{4}\right) I +\left( \frac{2+\lambda _{n} L}{4} \right) T=(1-\alpha ) I +\alpha T~~\text { is nonexpansive, } \end{aligned}$$

for each n, where T is a nonexpansive operator and \(\alpha = \frac{2+\lambda _{n} L}{4}\in (0,1).\)

Step 2: We show that

$$\begin{aligned} \Vert \theta _{n}-x^{*} \Vert \le \Vert x_{n}-x^{*} \Vert + \alpha _{n} M_{1}. \end{aligned}$$
(10)

Let \(x^*\in \varOmega ,\) and \(T_{n}= \mathbf{prox} _{\lambda _{n} \varphi } \left( I- \lambda _{n} \nabla \psi \right) .\) From Algorithm 3.1, we compute

$$\begin{aligned} \begin{aligned} \Vert \theta _{n}-x^{*} \Vert&=\Vert x_{n} + \zeta _{n} (x_{n}-x_{n-1} ) - x^{*} \Vert \\&\le \Vert x_{n} - x^{*} \Vert + \zeta _{n} \Vert x_{n}-x_{n-1} \Vert \\&= \Vert x_{n} - x^{*} \Vert + \alpha _{n} \frac{\zeta _{n}}{\alpha _{n}} \Vert x_{n}-x_{n-1} \Vert . \end{aligned} \end{aligned}$$

By condition (B3) in Assumption 4, there exists a constant \(M_{1} \ge 0\) such that

$$\begin{aligned} \frac{\zeta _{n}}{\alpha _{n}} \Vert x_{n}-x_{n-1} \Vert \le M_{1}, \qquad \forall n\ge 0. \end{aligned}$$

Step 3: We show that

$$\begin{aligned} \Vert x_{n+1}-x^{*} \Vert \le \max \left\{ \Vert x_{0} - x^{*} \Vert , \frac{ M_{1}+ \Vert fx^{*}- x^{*} \Vert }{1-\rho } \right\} + \displaystyle \sum _{n=0}^{\infty } \Vert e(\theta _{n}) \Vert . \end{aligned}$$
(11)

From Algorithm 3.1, we compute

$$\begin{aligned} \begin{aligned} \Vert x_{n+1}-x^{*} \Vert&=\Vert \alpha _{n} f (\theta _{n}) + (1-\alpha _{n} ) T_{n} (\theta _{n}) + e(\theta _{n}) - x^{*} \Vert \\&\le \alpha _{n} \Vert f (\theta _{n}) - x^{*} \Vert + (1-\alpha _{n} ) \Vert T_{n} (\theta _{n}) - x^{*} \Vert + \Vert e(\theta _{n}) \Vert \\&\le \alpha _{n} \Vert f (\theta _{n}) - f (x^{*} ) \Vert + \alpha _{n}\Vert f (x^{*} )- x^{*} \Vert + (1-\alpha _{n} ) \Vert T_{n} (\theta _{n}) - x^{*} \Vert + \Vert e( \theta _{n}) \Vert \\&\le \rho \alpha _{n} \Vert \theta _{n} - x^{*} \Vert + \alpha _{n}\Vert f (x^{*} )- x^{*} \Vert + (1-\alpha _{n} ) \Vert \theta _{n} - x^{*} \Vert + \Vert e( \theta _{n}) \Vert \\&= ( 1- \alpha _{n}( 1- \rho ) ) \Vert \theta _{n}- x^{*} \Vert + \alpha _{n} ( 1- \rho ) \frac{\Vert f(x^{*})- x^{*} \Vert }{1-\rho } + \Vert e( \theta _{n}) \Vert \\&\le ( 1- \alpha _{n}( 1- \rho ) ) \left( \Vert x_{n}-x^{*} \Vert + \alpha _{n} M_{1} \right) + \alpha _{n} ( 1- \rho ) \frac{\Vert f(x^{*})- x^{*} \Vert }{1-\rho } + \Vert e( \theta _{n}) \Vert \\&\le ( 1- \alpha _{n}( 1- \rho ) ) \Vert x_{n}-x^{*} \Vert + \alpha _{n} ( 1- \rho ) \frac{ M_{1}+\Vert f(x^{*})- x^{*} \Vert }{1-\rho } + \Vert e( \theta _{n}) \Vert \\&\le \max \left\{ \Vert x_{n} - x^{*} \Vert , \frac{ M_{1}+ \Vert f(x^{*})- x^{*} \Vert }{1-\rho } \right\} + \Vert e( \theta _{n}) \Vert , \end{aligned} \end{aligned}$$

where we used the contraction property of f, the nonexpansiveness of \(T_{n}\) (together with \(T_{n}x^{*}=x^{*}\)) and the estimate (10).

By mathematical induction, we derive relation (11). Using condition (9), we deduce that \(\{ x_{n} \}\) is bounded. \(\square \)

Theorem 2

Assume that Assumptions 3 and 4 hold. Then the sequence \(\{x_n\}\) generated by Algorithm 3.1 converges strongly to a point \(x^{*}\in \varOmega \) satisfying

$$\begin{aligned} \langle x^{*} - f(x^{*}) , x- x^{*} \rangle \ge 0, \qquad \forall ~x\in \varOmega . \end{aligned}$$
(12)

Proof

Step 1: We show that

$$\begin{aligned} \Vert \theta _{n}- \theta _{n-1} \Vert \le \Vert x_{n} - x_{n-1} \Vert + (\alpha _{n} + \alpha _{n-1} )M_{1} . \end{aligned}$$
(13)

By using Algorithm 3.1, we get

$$\begin{aligned} \begin{aligned} \Vert \theta _{n}- \theta _{n-1} \Vert&= \Vert x_{n}+\zeta _{n} (x_{n}-x_{n-1} )- x_{n-1}-\zeta _{n-1} (x_{n-1}-x_{n-2} ) \Vert \\&= \Vert x_{n} - x_{n-1} +\zeta _{n} (x_{n}-x_{n-1} )- \zeta _{n-1} (x_{n-1}-x_{n-2} ) \Vert \\&\le \Vert x_{n} - x_{n-1} \Vert + \zeta _{n} \Vert x_{n}-x_{n-1} \Vert + \zeta _{n-1} \Vert x_{n-1}-x_{n-2} \Vert \\&\le \Vert x_{n} - x_{n-1} \Vert + \alpha _{n} M_{1} + \alpha _{n-1} M_{1} \\&= \Vert x_{n} - x_{n-1} \Vert + (\alpha _{n} + \alpha _{n-1} )M_{1} . \end{aligned} \end{aligned}$$

Step 2: We show that

$$\begin{aligned} \begin{aligned} \Vert T_{n} (\theta _{n-1})-T_{n-1} (\theta _{n-1}) \Vert \le \frac{ \mid \lambda _{n} - \lambda _{n-1} \mid }{a} \Vert \theta _{n-1}- T_{n-1} (\theta _{n-1}) \Vert , \qquad \text {where}~ a=\inf _{ n\in {\mathbb {N}}}\lambda _{n} > 0. \end{aligned} \end{aligned}$$
(14)

We compute as follows:

$$\begin{aligned} \begin{aligned}&\Vert T_{n} (\theta _{n-1})-T_{n-1} (\theta _{n-1}) \Vert \\&\quad = \Vert \mathbf{prox} _{\lambda _{n} \varphi } \left( \theta _{n-1}- \lambda _{n} \nabla \psi (\theta _{n-1}) \right) - \mathbf{prox} _{\lambda _{n-1} \varphi } \left( \theta _{n-1}- \lambda _{n-1} \nabla \psi (\theta _{n-1}) \right) \Vert \\&\quad = \Vert \mathbf{prox} _{\lambda _{n} \varphi } \left( \theta _{n-1}- \lambda _{n} \nabla \psi (\theta _{n-1}) \right) \\&\qquad - \mathbf{prox} _{\lambda _{n} \varphi } [ \frac{\lambda _{n}}{\lambda _{n-1}} (\theta _{n-1}- \lambda _{n-1} \nabla \psi (\theta _{n-1}) )\\&\qquad + (1-\frac{\lambda _{n}}{\lambda _{n-1}}) \mathbf{prox} _{\lambda _{n-1} \varphi } \left( \theta _{n-1}- \lambda _{n-1} \nabla \psi (\theta _{n-1}) \right) ] \Vert \\&\qquad \le \Vert ( \theta _{n-1}- \lambda _{n} \nabla \psi (\theta _{n-1}) ) - \frac{\lambda _{n}}{\lambda _{n-1}} (\theta _{n-1} - \lambda _{n-1} \nabla \psi (\theta _{n-1}))\\&\qquad - (1-\frac{\lambda _{n}}{\lambda _{n-1}})T_{n-1} (\theta _{n-1}) \Vert \\&\quad = \,\mid 1- \frac{\lambda _{n}}{\lambda _{n-1}} \mid \Vert \theta _{n-1}- T_{n-1} (\theta _{n-1}) \Vert \\&\quad \le \frac{ \mid \lambda _{n} - \lambda _{n-1} \mid }{a} \Vert \theta _{n-1}- T_{n-1} (\theta _{n-1}) \Vert , \end{aligned} \end{aligned}$$

where the second equality follows from Lemma 2 and the inequality from the nonexpansiveness of \(\mathbf{prox} _{\lambda _{n} \varphi }\) (Lemma 3).

Step 3: We show that

$$\begin{aligned} \displaystyle \underset{n \rightarrow \infty }{\lim } \Vert x_{n+1}-x_{n} \Vert =0. \end{aligned}$$
(15)

From Algorithm 3.1, we compute

$$\begin{aligned} \begin{aligned} \Vert x_{n+1}-x_{n} \Vert =&\Vert \alpha _{n} f (\theta _{n}) + (1-\alpha _{n} ) T_{n} (\theta _{n}) + e(\theta _{n}) - \alpha _{n-1} f (\theta _{n-1}) \\&- (1-\alpha _{n-1} ) T_{n-1} (\theta _{n-1}) - e(\theta _{n-1}) \Vert \\ \le&\Vert \alpha _{n} ( f (\theta _{n})- f (\theta _{n-1}) ) + ( \alpha _{n} f (\theta _{n-1}) - \alpha _{n-1} f (\theta _{n-1}) ) \Vert \\&+ \Vert (1-\alpha _{n} ) ( T_{n} (\theta _{n})- T_{n} (\theta _{n-1}) ) \\&+ (1-\alpha _{n} ) T_{n} (\theta _{n-1})- (1-\alpha _{n-1} ) T_{n-1} (\theta _{n-1}) \Vert \\&+ \Vert e(\theta _{n}) \Vert + \Vert e(\theta _{n-1}) \Vert . \end{aligned} \end{aligned}$$

Since f is \(\rho \)-contractive, we obtain

$$\begin{aligned} \begin{aligned} \Vert x_{n+1}-x_{n} \Vert&\le \rho \alpha _{n} \Vert \theta _{n}- \theta _{n-1} \Vert + \mid \alpha _{n}-\alpha _{n-1} \mid \Vert f (\theta _{n-1}) \Vert \\&\quad + (1-\alpha _{n} ) \Vert T_{n} (\theta _{n}) - T_{n} (\theta _{n-1}) \Vert + (1-\alpha _{n} ) \Vert T_{n} (\theta _{n-1})-T_{n-1} (\theta _{n-1}) \Vert \\&\quad + \mid \alpha _{n}-\alpha _{n-1} \mid \Vert T_{n-1} (\theta _{n-1}) \Vert +\Vert e(\theta _{n}) \Vert + \Vert e(\theta _{n-1}) \Vert \\&\le \rho \alpha _{n} \Vert \theta _{n}- \theta _{n-1} \Vert + \mid \alpha _{n}-\alpha _{n-1} \mid \Vert f (\theta _{n-1}) \Vert + (1-\alpha _{n} ) \Vert \theta _{n} - \theta _{n-1} \Vert \\&\quad + (1-\alpha _{n} ) \Vert T_{n} (\theta _{n-1})-T_{n-1} (\theta _{n-1}) \Vert + \mid \alpha _{n}-\alpha _{n-1} \mid \Vert T_{n-1} (\theta _{n-1}) \Vert \\&\quad + \Vert e(\theta _{n}) \Vert + \Vert e(\theta _{n-1}) \Vert \\&= ( 1- \alpha _{n}( 1- \rho ) ) \Vert \theta _{n}- \theta _{n-1} \Vert + \mid \alpha _{n}-\alpha _{n-1} \mid ( \Vert f (\theta _{n-1}) \Vert + \Vert T_{n-1} (\theta _{n-1}) \Vert ) \\&\quad + (1-\alpha _{n} ) \Vert T_{n} (\theta _{n-1})-T_{n-1} (\theta _{n-1}) \Vert + \Vert e(\theta _{n}) \Vert + \Vert e(\theta _{n-1}) \Vert . \end{aligned} \end{aligned}$$
(16)

Substituting (14) into (16), we obtain

$$\begin{aligned} \begin{aligned} \Vert x_{n+1}-x_{n} \Vert&\le ( 1- \alpha _{n}( 1- \rho ) ) \Vert \theta _{n}- \theta _{n-1} \Vert \\&\quad + \mid \alpha _{n}-\alpha _{n-1} \mid ( \Vert f (\theta _{n-1}) \Vert + \Vert T_{n-1} (\theta _{n-1}) \Vert ) \\&\quad + (1-\alpha _{n} ) \frac{ \mid \lambda _{n} - \lambda _{n-1} \mid \Vert \theta _{n-1}- T_{n-1} (\theta _{n-1}) \Vert }{a} + \Vert e(\theta _{n}) \Vert + \Vert e(\theta _{n-1}) \Vert \\&\le ( 1- \alpha _{n}( 1- \rho ) ) \Vert \theta _{n}- \theta _{n-1} \Vert + \frac{ \mid \lambda _{n} - \lambda _{n-1} \mid (\Vert \theta _{n-1}\Vert + \Vert T_{n-1} (\theta _{n-1}) \Vert ) }{a} \\&\quad - \alpha _{n} (1- \rho ) \frac{ \mid \lambda _{n} - \lambda _{n-1} \mid (\Vert \theta _{n-1}\Vert + \Vert T_{n-1} (\theta _{n-1}) \Vert ) }{a (1- \rho )} \\&\quad + \mid \alpha _{n}-\alpha _{n-1} \mid ( \Vert f (\theta _{n-1}) \Vert + \Vert T_{n-1} (\theta _{n-1}) \Vert ) + \Vert e(\theta _{n}) \Vert + \Vert e(\theta _{n-1}) \Vert . \end{aligned} \end{aligned}$$
(17)

Substituting (13) into (17), we obtain

$$\begin{aligned} \begin{aligned} \Vert x_{n+1}-x_{n} \Vert \le&( 1- \alpha _{n}( 1- \rho ) ) ( \Vert x_{n} - x_{n-1} \Vert + (\alpha _{n} + \alpha _{n-1} )M_{1} )\\&- \alpha _{n} (1- \rho )\frac{ \mid \lambda _{n} - \lambda _{n-1} \mid ( \Vert \theta _{n-1}\Vert + \Vert T_{n-1} (\theta _{n-1}) \Vert ) }{a (1- \rho )} \\&+ M_{2} ( \mid \alpha _{n}-\alpha _{n-1} \mid + \mid \lambda _{n} - \lambda _{n-1} \mid ) + \Vert e(\theta _{n}) \Vert + \Vert e(\theta _{n-1}) \Vert , \end{aligned} \end{aligned}$$

where \( M_{2}= \sup _{n\in {\mathbb {N}}} \left\{ \Vert f(\theta _{n-1} ) \Vert + \Vert T_{n-1}\theta _{n-1} \Vert , \frac{\Vert \theta _{n-1}\Vert + \Vert T_{n-1} (\theta _{n-1}) \Vert }{a } \right\} .\) This implies

$$\begin{aligned} \begin{aligned} \Vert x_{n+1}-x_{n} \Vert \le&( 1- \alpha _{n}( 1- \rho ) ) \Vert x_{n} - x_{n-1} \Vert \\&+ (-1) \alpha _{n} (1- \rho )\frac{ \mid \lambda _{n} - \lambda _{n-1} \mid ( \Vert \theta _{n-1}\Vert + \Vert T_{n-1} (\theta _{n-1}) \Vert ) }{a (1- \rho )} \\&+ M_{2} ( \mid \alpha _{n}-\alpha _{n-1} \mid + \mid \lambda _{n} - \lambda _{n-1} \mid ) + ( 1- \alpha _{n}( 1- \rho ) ) (\alpha _{n} + \alpha _{n-1} )M_{1} \\&+ \Vert e(\theta _{n}) \Vert + \Vert e(\theta _{n-1}) \Vert . \end{aligned} \end{aligned}$$
(18)

Taking \( \beta _{n} = M_{2} ( \mid \alpha _{n}-\alpha _{n-1} \mid + \mid \lambda _{n} - \lambda _{n-1} \mid ) + ( 1- \alpha _{n}( 1- \rho ) ) (\alpha _{n} + \alpha _{n-1} )M_{1} + \Vert e(\theta _{n}) \Vert + \Vert e(\theta _{n-1}) \Vert ,\) \( \gamma _{n}= \alpha _{n}( 1- \rho )\) and \( \delta _{n}= \frac{(-1) \mid \lambda _{n} - \lambda _{n-1} \mid ( \Vert \theta _{n-1}\Vert + \Vert T_{n-1} (\theta _{n-1}) \Vert ) }{a (1- \rho )} \) in equation (18), we have that

$$\begin{aligned} \Vert x_{n+1}-x_{n} \Vert \le ( 1- \gamma _{n}) \Vert x_{n} - x_{n-1} \Vert + \gamma _{n}\delta _{n} + \beta _{n}. \end{aligned}$$
(19)

Applying Lemma 5 on (19), we deduce that \(\displaystyle \underset{n \rightarrow \infty }{\lim } \Vert x_{n+1}-x_{n} \Vert =0.\)

Step 4: We show that

$$\begin{aligned} \displaystyle \underset{j \rightarrow \infty }{\lim } \Vert T_{n_{j}}( \theta _{n_{j}})-T(\theta _{n_{j}}) \Vert =0. \end{aligned}$$
(20)

We apply Lemmas 2 and 3 to obtain

$$\begin{aligned} \begin{aligned}&\Vert T_{n_{j}} \theta _{n_{j}}-T\theta _{n_{j}} \Vert \\&\quad = \Vert \mathbf{prox} _{\lambda _{n_{j}} \varphi } \left( \theta _{n_{j}}- \lambda _{n_{j}} \nabla \psi (\theta _{n_{j}}) \right) - \mathbf{prox} _{\lambda \varphi } \left( \theta _{n_{j}}- \lambda \nabla \psi (\theta _{n_{j}}) \right) \Vert \\&\quad = \Vert \mathbf{prox} _{\lambda \varphi } \left( \frac{\lambda }{\lambda _{n_{j}}}( \theta _{n_{j}}- \lambda _{n_{j}} \nabla \psi (\theta _{n_{j}}) ) + ( 1-\frac{\lambda }{\lambda _{n_{j}}} )\mathbf{prox} _{\lambda _{n_{j}} \varphi } \left( \theta _{n_{j}} - \lambda _{n_{j}} \nabla \psi (\theta _{n_{j}}) \right) \right) \\&\qquad - \mathbf{prox} _{\lambda \varphi } \left( \theta _{n_{j}}- \lambda \nabla \psi (\theta _{n_{j}}) \right) \Vert \\&\quad \le \Vert \frac{\lambda }{\lambda _{n_{j}}}( \theta _{n_{j}}- \lambda _{n_{j}} \nabla \psi (\theta _{n_{j}}) ) \\&\qquad + ( 1-\frac{\lambda }{\lambda _{n_{j}}} )\mathbf{prox} _{\lambda _{n_{j}} \varphi } \left( \theta _{n_{j}} - \lambda _{n_{j}} \nabla \psi (\theta _{n_{j}}) \right) - \left( \theta _{n_{j}}- \lambda \nabla \psi (\theta _{n_{j}}) \right) \Vert \\&\quad = \Vert ( \frac{\lambda }{\lambda _{n_{j}}}-1 ) \theta _{n_{j}} + ( 1-\frac{\lambda }{\lambda _{n_{j}}} ) \mathbf{prox} _{\lambda _{n_{j}} \varphi } \left( \theta _{n_{j}} - \lambda _{n_{j}} \nabla \psi (\theta _{n_{j}}) \right) \Vert \\&\quad = \mid 1- \frac{\lambda }{\lambda _{n_{j}}} \mid \Vert \theta _{n_{j}}- T_{n_{j}} \theta _{n_{j}} \Vert \\&\quad \le \mid 1- \frac{\lambda }{\lambda _{n_{j}}} \mid ( \Vert \theta _{n_{j}} \Vert + \Vert T_{n_{j}} \theta _{n_{j}} \Vert ). \end{aligned} \end{aligned}$$
(21)

Since \(\{\lambda _{n}\}\subset [a, \frac{2}{L})\) is bounded, we may assume, passing to a further subsequence if necessary, that \(\lambda _{n_{j}} \rightarrow \lambda \) for some \(\lambda \in [a, \frac{2}{L}]\) as \(j \rightarrow \infty \); letting \(j \rightarrow \infty \) in (21), we derive (20).

Step 5: We show that

$$\begin{aligned} \displaystyle \underset{j \rightarrow \infty }{\lim } \Vert x_{n_{j}}-Tx_{n_{j}} \Vert =0. \end{aligned}$$
(22)

We compute

$$\begin{aligned} \begin{aligned} \Vert x_{n_{j}}-Tx_{n_{j}} \Vert&= \Vert x_{n_{j}}- \mathbf{prox} _{\lambda \varphi } \left( x_{n_{j}}- \lambda \nabla \psi (x_{n_{j}}) \right) \Vert \\&\le \Vert x_{n_{j}}- x_{n_{j}+1} \Vert + \Vert x_{n_{j}+1}- T( \theta _{n_{j}}) \Vert + \Vert T( \theta _{n_{j}}) - T( x_{n_{j}}) \Vert \\&\le \Vert x_{n_{j}}- x_{n_{j}+1} \Vert + \Vert \alpha _{n_{j}} f( \theta _{n_{j}}) + (1-\alpha _{n_{j}}) T_{n_{j}} (\theta _{n_{j}}) + e( \theta _{n_{j}}) - T( \theta _{n_{j}}) \Vert + \Vert \theta _{n_{j}} - x_{n_{j}} \Vert \\&\le \Vert x_{n_{j}}- x_{n_{j}+1} \Vert + \alpha _{n_{j}}\Vert f( \theta _{n_{j}})-T( \theta _{n_{j}}) \Vert + (1-\alpha _{n_{j}}) \Vert T_{n_{j}} (\theta _{n_{j}})- T(\theta _{n_{j}}) \Vert \\&\quad + \Vert e(\theta _{n_{j}}) \Vert + \zeta _{n_{j}} \Vert x_{n_{j}}-x_{n_{j}-1} \Vert , \end{aligned} \end{aligned}$$
(23)

where we used the nonexpansiveness of T and \(\theta _{n_{j}}-x_{n_{j}}=\zeta _{n_{j}} (x_{n_{j}}-x_{n_{j}-1} ).\) Since \(\{ x_{n} \}\) is bounded, there exists a subsequence \(\{ x_{n_{j}} \}\) such that \(x_{n_{j}} \rightharpoonup z\) as \( j \rightarrow \infty ;\) that is, z is a weak cluster point of \(\{ x_{n} \}\). Taking the limit on both sides of (23) and using (9), (15), (20), condition (B3), the boundedness of \(\{ \theta _{n} \}\) and the hypothesis on \(\{ \alpha _{n}\},\) we obtain (22). Since \(x_{n_{j}} \rightharpoonup z\) as \( j \rightarrow \infty \) and \( (I-T)x_{n_{j}} \rightarrow 0 ,\) Lemma 4 yields \((I-T)z=0,\) which implies that \(z\in \varOmega .\) Hence \(\omega _{W}(x_{n} ) \subset \varOmega .\)

Step 6: We show that

$$\begin{aligned} \displaystyle \underset{n \rightarrow \infty }{\limsup } \langle fx^{*}-x^{*} , \theta _{n}-x^{*} \rangle \le 0. \end{aligned}$$
(24)

To see this, let \(\{ \theta _{n_{j}} \}\) be a subsequence of \(\{ \theta _{n} \}\) such that

$$\begin{aligned} \displaystyle \underset{n\rightarrow \infty }{\limsup } \langle fx^{*}-x^{*} , \theta _{n}-x^{*} \rangle = \underset{j \rightarrow \infty }{\lim } \langle fx^{*}-x^{*} , \theta _{n_{j}}-x^{*} \rangle . \end{aligned}$$

Since \(\{ \theta _{n_{j}} \}\) is bounded, it has a weakly convergent subsequence; without loss of generality, we may assume that \(\theta _{n_{j}} \rightharpoonup u.\) Since \(\Vert \theta _{n}-x_{n} \Vert = \zeta _{n}\Vert x_{n}-x_{n-1} \Vert \rightarrow 0,\) the weak cluster points of \(\{ \theta _{n} \}\) coincide with those of \(\{ x_{n} \},\) so \(u\in \varOmega \) by Step 5. Then, since \(x^{*}\) solves (12), we have

$$\begin{aligned} \langle fx^{*}-x^{*} , u-x^{*} \rangle \le 0 . \end{aligned}$$

Hence,

$$\begin{aligned} \displaystyle \underset{n\rightarrow \infty }{\limsup } \langle fx^{*}-x^{*} , \theta _{n}-x^{*} \rangle = \underset{j \rightarrow \infty }{\lim } \langle fx^{*}-x^{*} , \theta _{n_{j}}-x^{*} \rangle \le \langle fx^{*}-x^{*} , u-x^{*} \rangle \le 0 . \end{aligned}$$

Step 7: We show that

$$\begin{aligned} \displaystyle \underset{n \rightarrow \infty }{\lim } x_{n}= x^{*} . \end{aligned}$$
(25)

From Algorithm 3.1 and Lemma 1 (ii), we compute

$$\begin{aligned} \begin{aligned} \Vert x_{n+1}- x^{*} \Vert ^{2}&= \Vert \alpha _{n} f( \theta _{n}) + (1-\alpha _{n}) T_{n} \theta _{n} + e( \theta _{n})- x^{*} \Vert ^{2} \\&= \Vert \alpha _{n} ( f( \theta _{n})-x^{*} ) + (1-\alpha _{n})( T_{n} \theta _{n} - x^{*}) + e( \theta _{n}) \Vert ^{2}\\&\le \Vert \alpha _{n} ( f( \theta _{n})-x^{*} ) + (1-\alpha _{n})( T_{n} \theta _{n} - x^{*}) \Vert ^{2} + 2 \langle e( \theta _{n}), x_{n+1}-x^{*} \rangle \\&\le \Vert \alpha _{n} ( f( \theta _{n})-x^{*} ) + (1-\alpha _{n})( T_{n} \theta _{n} - x^{*}) \Vert ^{2} + 2 \Vert x_{n+1}-x^{*} \Vert \Vert e( \theta _{n}) \Vert \\&= \Vert \alpha _{n} ( f( \theta _{n})- f(x^{*}) ) + (1-\alpha _{n})( T_{n} \theta _{n} - x^{*}) + \alpha _{n} ( f( x^{*})-x^{*}) \Vert ^{2} \\&\quad + 2 \Vert x_{n+1}-x^{*} \Vert \Vert e( \theta _{n}) \Vert \\&\le \Vert \alpha _{n} ( f( \theta _{n})- f(x^{*}) ) + (1-\alpha _{n})( T_{n} \theta _{n} - x^{*}) \Vert ^{2} \\&\quad + 2\alpha _{n} \langle f(x^{*})-x^{*} , x_{n+1}- e( \theta _{n})-x^{*} \rangle \\&\quad + 2 \Vert x_{n+1}-x^{*} \Vert \Vert e( \theta _{n}) \Vert . \end{aligned} \end{aligned}$$

Applying the convexity of \(\Vert \cdot \Vert ^{2}\) together with the contraction property of f and the nonexpansiveness of \(T_{n}\), we get

$$\begin{aligned} \begin{aligned} \Vert x_{n+1}- x^{*} \Vert ^{2}&\le \alpha _{n} \Vert f( \theta _{n})- f(x^{*}) \Vert ^{2} + (1-\alpha _{n}) \Vert T_{n} \theta _{n} - x^{*} \Vert ^{2} \\&\quad + 2 \alpha _{n} \langle f(x^{*})-x^{*} , x_{n+1}- e( \theta _{n})-x^{*} \rangle + 2 \Vert x_{n+1}-x^{*} \Vert \Vert e( \theta _{n}) \Vert \\&\le \rho ^{2} \alpha _{n} \Vert \theta _{n}- x^{*} \Vert ^{2} + (1-\alpha _{n}) \Vert \theta _{n} - x^{*} \Vert ^{2} \\&\quad + 2 \alpha _{n} \langle f(x^{*})-x^{*} , x_{n+1}- e( \theta _{n})-x^{*} \rangle + 2 \Vert x_{n+1}-x^{*} \Vert \Vert e( \theta _{n}) \Vert \\&= ( 1- \alpha _{n}(1-\rho ^{2}) ) \Vert \theta _{n}- x^{*} \Vert ^{2} + 2 \alpha _{n} \langle f(x^{*})-x^{*} , x_{n+1}-x^{*} \rangle \\&\quad - 2 \alpha _{n} \langle f(x^{*})-x^{*} , e( \theta _{n}) \rangle + 2 \Vert x_{n+1}-x^{*} \Vert \Vert e( \theta _{n}) \Vert \\&\le ( 1- \alpha _{n}(1-\rho ^{2}) ) \Vert \theta _{n}- x^{*} \Vert ^{2} + 2 \alpha _{n} \langle f(x^{*})-x^{*} , x_{n+1}-x^{*} \rangle \\&\quad + 2 \left( \alpha _{n} \Vert f(x^{*})-x^{*} \Vert + \Vert x_{n+1}-x^{*} \Vert \right) \Vert e( \theta _{n}) \Vert . \end{aligned} \end{aligned}$$
(26)

Substituting the estimate \(\Vert \theta _{n}-x^{*} \Vert \le \Vert x_{n}-x^{*} \Vert + \zeta _{n} \Vert x_{n}-x_{n-1} \Vert \) (see the proof of (10)) into (26) yields

$$\begin{aligned} \begin{aligned} \Vert x_{n+1}- x^{*} \Vert ^{2}&\le ( 1- \alpha _{n}(1-\rho ^{2}) ) \left( \Vert x_{n}-x^{*} \Vert + \zeta _{n} \Vert x_{n}-x_{n-1} \Vert \right) ^{2} + 2 \alpha _{n} \langle f(x^{*})-x^{*} , x_{n+1}-x^{*} \rangle \\&\quad + 2 \left( \alpha _{n} \Vert f(x^{*})-x^{*} \Vert + \Vert x_{n+1}-x^{*} \Vert \right) \Vert e( \theta _{n}) \Vert \\&\le ( 1- \alpha _{n}(1-\rho ^{2}) ) \Vert x_{n}-x^{*} \Vert ^{2} + \alpha _{n} \frac{\zeta _{n}}{\alpha _{n}} \Vert x_{n}-x_{n-1} \Vert \left( 2\Vert x_{n}-x^{*} \Vert + \zeta _{n} \Vert x_{n}-x_{n-1} \Vert \right) \\&\quad + 2 \alpha _{n} \langle f(x^{*})-x^{*} , x_{n+1}-x^{*} \rangle + M_{ 3} \Vert e( \theta _{n}) \Vert , \end{aligned} \end{aligned}$$

where \(M_{ 3}= \sup _{n} \{ 2 \left( \alpha _{n} \Vert f(x^{*})-x^{*} \Vert + \Vert x_{n+1}-x^{*} \Vert \right) \} < \infty \). Setting \(\gamma _{n}= \alpha _{n}( 1-\rho ^{2}),\) \(\beta _{n}= M_{ 3}\Vert e(\theta _{n}) \Vert \) and \(\delta _{n}= \frac{1}{1-\rho ^{2}} \left[ \frac{\zeta _{n}}{\alpha _{n}} \Vert x_{n}-x_{n-1} \Vert \left( 2\Vert x_{n}-x^{*} \Vert + \zeta _{n} \Vert x_{n}-x_{n-1} \Vert \right) + 2\langle f(x^{*})-x^{*} , x_{n+1}-x^{*} \rangle \right] \), and noting that \(\limsup _{n\rightarrow \infty }\delta _{n} \le 0\) by (B3), (15), (24), the boundedness of \(\{x_{n}\}\) and the fact that \(\Vert x_{n+1}-\theta _{n+1} \Vert = \zeta _{n+1}\Vert x_{n+1}-x_{n} \Vert \rightarrow 0,\) we have

$$\begin{aligned} \Vert x_{n+1}- x^{*} \Vert ^{2} \le ( 1- \gamma _{n})\Vert x_{n}- x^{*} \Vert ^{2} + \gamma _{n}\delta _{n} + \beta _{n}. \end{aligned}$$
(27)

Hence by Lemma 5, we get \( \displaystyle \underset{n \rightarrow \infty }{\lim } \Vert x_{n}- x^{*} \Vert =0.\) This completes the proof. \(\square \)

Theorem 3

Under the same conditions as in Theorem 2, let \(\{ \beta _{n} \}\) and \(\{ v_{n} \}\) satisfy the assumptions in Definition 1. Then Algorithm 3.1 is bounded perturbation resilient; that is, the sequence generated by Algorithm 3.2 converges strongly to \(x^{*}\in \varOmega .\)

Proof

Algorithm 3.2 can be rewritten as

$$\begin{aligned} {\left\{ \begin{array}{ll} \displaystyle \theta _{n}=x_{n}+\zeta _{n} (x_{n}-x_{n-1} ), \\ \displaystyle x_{n+1}=\alpha _n f( \theta _{n} ) +(1-\alpha _n) \mathbf{prox} _{\lambda _{n} \varphi } \left( \theta _{n}- \lambda _{n} \nabla \psi (\theta _{n} ) \right) + e(\theta _{n}) , \quad n\ge 0, \end{array}\right. } \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} e( \theta _{n})&= \alpha _{n} \left( f( \theta _{n} + \beta _{n}v_{n})- f(\theta _{n} ) \right) \\&\quad + ( 1-\alpha _{n} ) ( \mathbf{prox} _{\lambda _{n} \varphi } \left( \theta _{n}+ \beta _{n}v_{n}- \lambda _{n} \nabla \psi (\theta _{n} +\beta _{n}v_{n}) \right) - \mathbf{prox} _{\lambda _{n} \varphi } \left( \theta _{n}- \lambda _{n} \nabla \psi (\theta _{n} ) \right) ). \end{aligned} \end{aligned}$$

We check that \( \sum\nolimits _{n=0}^\infty \Vert e(\theta _{n}) \Vert < + \infty .\) Applying Lemma 3, we get

$$\begin{aligned} \begin{aligned} \Vert e( \theta _{n}) \Vert&\le \alpha _{n} \Vert f( \theta _{n} + \beta _{n}v_{n})- f(\theta _{n} ) \Vert \\&\quad + ( 1-\alpha _{n} ) \Vert \mathbf{prox} _{\lambda _{n} \varphi } \left( \theta _{n}+ \beta _{n}v_{n}- \lambda _{n} \nabla \psi (\theta _{n} +\beta _{n}v_{n}) \right) \\&\quad - \mathbf{prox} _{\lambda _{n} \varphi } \left( \theta _{n}- \lambda _{n} \nabla \psi (\theta _{n} ) \right) \Vert \\&\le \alpha _{n}\rho \Vert ( \theta _{n} + \beta _{n}v_{n})- (\theta _{n} ) \Vert + ( 1-\alpha _{n} ) \Vert \left( \theta _{n}+ \beta _{n}v_{n}- \lambda _{n} \nabla \psi (\theta _{n} +\beta _{n}v_{n}) \right) \\&\quad - \left( \theta _{n}- \lambda _{n} \nabla \psi (\theta _{n} ) \right) \Vert \\&\le \alpha _{n} \Vert \beta _{n}v_{n} \Vert + ( 1-\alpha _{n} ) \Vert \beta _{n}v_{n}+ \lambda _{n} ( \nabla \psi (\theta _{n}) - \nabla \psi (\theta _{n} +\beta _{n}v_{n} ) ) \Vert \\&\le \alpha _{n} \Vert \beta _{n}v_{n} \Vert + ( 1-\alpha _{n} ) \Vert \beta _{n}v_{n} \Vert + ( 1-\alpha _{n} ) \lambda _{n}L \Vert \theta _{n} - (\theta _{n} +\beta _{n}v_{n} ) \Vert \\&= ( 1+ (1-\alpha _{n})\lambda _{n}L )\Vert \beta _{n}v_{n} \Vert . \end{aligned} \end{aligned}$$

This means that \( \displaystyle \sum _{n=0}^\infty \Vert e(\theta _{n}) \Vert < + \infty \). Therefore, the conclusion follows from Theorem 2. \(\square \)

Remark 1

Typically, the inertial scheme speeds up (boosts) the convergence of the iterative sequence towards the solution set. It provides a major advantage in terms of execution time and the number of iterations for large-scale problems.

Remark 2

If \(\zeta _{n}= 0\) in Algorithm 3.2, then we obtain the modified proximal gradient algorithm with superiorization method of Guo and Cui [16] as follows:

$$\begin{aligned} \displaystyle x_{n+1}=\alpha _n f( x_{n} + \beta _{n}v_{n}) +(1-\alpha _n) \mathbf{prox} _{\lambda _{n} \varphi } \left( x_{n}+ \beta _{n}v_{n}- \lambda _{n} \nabla \psi (x_{n} +\beta _{n}v_{n}) \right) , \quad n\ge 0, \end{aligned}$$
(28)

Remark 3

[28] The condition (B3) is easily satisfied in numerical computations because the value of \(||x_n-x_{n-1}||\) is known before choosing \(\zeta _n\). Indeed, the parameter \(\zeta _n\) can be chosen such that \(0 \le \zeta _n \le \bar{\zeta _n}\), where

$$\begin{aligned} \bar{\zeta _{n}}={\left\{ \begin{array}{ll} \min \lbrace \frac{\tau _{n} }{ \Vert x_{n}-x_{n-1} \Vert }, \tau \rbrace \quad \text {if } x_{n}\ne x_{n-1},\\ \tau , \quad \qquad \qquad \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \(\{\tau _n\}\) is a positive sequence such that \(\tau _n =o(\alpha _n)\) and \(\zeta _n \in [0,\tau ]\subset [0,1)\).
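A minimal sketch of this choice of the inertial parameter, with the illustrative values \(\tau =0.5\) and \(\tau _{n}=1/n^{2}\) used later in Sect. 4:

```python
import numpy as np

def zeta_bar(n, x_n, x_prev, tau=0.5):
    """Upper bound for zeta_n as in Remark 3, with tau_n = 1/n^2 (an o(alpha_n) choice
    when alpha_n = 1/(n+1), as in Sect. 4)."""
    diff = np.linalg.norm(x_n - x_prev)
    return tau if diff == 0.0 else min((1.0 / n ** 2) / diff, tau)
```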

3.1 Application to image recovery problems

In this section, we illustrate the relevance of our theorem with a linear inverse problem called LASSO (Least Absolute Shrinkage and Selection Operator) in compressed sensing (see [35,36,37,38]).

Let \(A: H \rightarrow H\) be a bounded linear operator on a real Hilbert space. The image recovery model can be formulated as:

$$\begin{aligned} Ax = b+ \omega \end{aligned}$$
(29)

where \(\omega \) is an unknown noise vector, x is an unknown image, and b is the measured (degraded) observation. Assuming that problem (29) is consistent, it can be recast as the regularized least-squares problem:

$$\begin{aligned} \min _{x\in H} \left\{ \frac{1}{2} \Vert Ax- b \Vert ^{2} + \mu \Vert x \Vert \right\} , \end{aligned}$$
(30)

where \(\mu > 0\) is a regularization parameter. We now apply Theorem 3 to approximate the solutions of problem (30). Setting \(\psi (x)= \frac{1}{2}\Vert Ax-b \Vert ^{2}\) and \(\varphi (x) = \mu \Vert x \Vert \), we see that both \(\varphi \) and \(\psi \) are convex, lower semicontinuous and proper functions with \(\nabla \psi (x)= A^{*}( Ax-b)\) for each \(x\in H,\) and \(\nabla \psi \) is Lipschitz continuous with constant \(L= \Vert A \Vert ^{2}.\) Indeed, the composite function \(\varPhi := \psi + \varphi \) is coercive and the subdifferential of the norm is

$$\begin{aligned} \partial \Vert . \Vert (x)= \left\{ \begin{array}{ll} \frac{x}{\Vert x \Vert },\qquad ~\,\,\,\,\,\,\,\,\,\, x\ne 0, \\ B(0,1), \qquad ~x=0. \end{array} \right. \end{aligned}$$

Moreover, the Lipschitz continuity of \(\nabla \psi \) follows from

$$\begin{aligned} \begin{aligned} \Vert \nabla \psi (x) - \nabla \psi (y) \Vert&= \Vert A^{*}( Ax-b)-A^{*}( Ay-b) \Vert \\&= \Vert A^{*}A (x-y) \Vert \\&\le \Vert A \Vert ^{2} \Vert x-y\Vert = L \Vert x-y\Vert . \end{aligned} \end{aligned}$$

The conclusion of Theorem 3 thus applies under the standard assumptions on the parameter sequences.
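In matrix form, the gradient and the Lipschitz constant used above can be set up directly; a minimal sketch with random placeholder data, including a numerical check of the bound \(\Vert \nabla \psi (x)-\nabla \psi (y) \Vert \le L \Vert x-y \Vert \):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 512))     # placeholder bounded linear operator
b = rng.standard_normal(256)            # placeholder degraded observation

def grad_psi(x):
    # gradient of psi(x) = 0.5*||Ax - b||^2, namely A^*(Ax - b)
    return A.T @ (A @ x - b)

L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of grad psi: ||A||^2

# numerical check of ||grad psi(x) - grad psi(y)|| <= L*||x - y||
x, y = rng.standard_normal(512), rng.standard_normal(512)
assert np.linalg.norm(grad_psi(x) - grad_psi(y)) <= L * np.linalg.norm(x - y) + 1e-8
```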

4 Numerical experiments

In this section, we present numerical results for the proposed algorithm and compare its performance with algorithm (28) of Guo and Cui [16] in signal recovery. For the sake of clarity, we reformulate problem (30) in \(H={\mathbb {R}}^N\) as follows:

$$\begin{aligned} \min _{x\in {\mathbb {R}}^N} \left\{ \frac{1}{2} \Vert Ax- b \Vert _{2}^{2} + \mu \Vert x \Vert _{1} \right\} , \end{aligned}$$
(31)

where \(A: {\mathbb {R}}^{N}\rightarrow {\mathbb {R}}^{ M}~(M< N)\) is a bounded linear operator (an \(M\times N\) matrix) and \(b\in {\mathbb {R}}^{M}\) is the observed or measured data contaminated by a noise \(\omega .\) The positive scalar \(\mu \) is called the regularization parameter and \(x\in {\mathbb {R}}^{N}\) is a sparse vector with m nonzero components to be recovered. We take \(\psi (x)= \frac{1}{2}\Vert Ax-b \Vert _2^{2}\) and \(\varphi (x) = \mu \Vert x \Vert _1 \), whose proximal mapping is given componentwise by

$$\begin{aligned} \mathbf{prox} _{\mu \Vert \cdot \Vert _1 }(x_n)= \left( \mathbf{prox} _{ \mu \mid \cdot \mid }(x_n^{1}), \ldots , \mathbf{prox} _{ \mu \mid \cdot \mid }(x_n^{N}) \right) ^{T}=( s_{n}^{1},\ldots , s_{n}^{N} )^{T}, \end{aligned}$$

where \(s_{n}^{k}= \mathbf{sgn} (x_{n}^{k} ) \cdot \max \{ | x_{n}^{k}|- \mu , 0 \}\) for \(k=1, 2, \ldots , N.\) It is worth noting that the subdifferential of \(\Vert \cdot \Vert _1\) at \(x_n\) is given componentwise by

$$\begin{aligned} \partial \mid x_{n}^{k} \mid = \left\{ \begin{array}{ll} 1,&{} \quad \text {if }\,\,\, x_n^{k} > 0, \\ \left[ -1,1\right] , &{} \quad \text {if }\,\,\, x_n^{k}=0, \\ -1,&{} \quad \text {if }\,\,\, x_n^{k} < 0. \end{array} \right. \end{aligned}$$

The bounded sequence \(\{ v_{n} \}\) is then defined by

$$\begin{aligned} v_{n}= \left\{ \begin{array}{ll} -\frac{d_{n}}{\Vert d_{n} \Vert },&{} \quad 0\ne d_{n} \in \partial \varphi (x_{n} ), \\ 0, &{} \quad 0 \in \partial \varphi (x_{n} ), \end{array} \right. \end{aligned}$$

and the summable positive real sequence \(\{\beta _n\}\) is taken as \(\beta _n=\frac{1}{5n}\). We set \(f(x)=\frac{x}{2}\) and the control parameters \(\alpha _{n} =\frac{1}{n+1}\), \(\lambda _{n} =\frac{0.3}{ \Vert A \Vert ^2}\), \(\mu =\frac{1}{L}\). Let \(\tau =0.5\) and \(\tau _n=\frac{1}{n^2}\) for each n, and define \(\zeta _n\) as in Remark 3.

In Experiment 1, the entries of the matrix \(A \in {\mathbb {R}}^{M\times N}\) are sampled independently from a normal distribution with zero mean and unit variance. The observation vector b is contaminated by Gaussian noise with signal-to-noise ratio \(\text {SNR}=40\). The sparse vector x is generated from a uniform distribution on the interval \([-2, 2]\) with m nonzero elements. The initial points \(x_0, x_1\) are picked randomly. The restoration accuracy is measured by the mean square error \(E_{n}=\frac{1}{N} \Vert x_{n}- x^{*} \Vert \), and the iterations are stopped once \(E_{n} < 10^{-4},\) where \(x^{*}\) is the estimated signal of x. The numerical results are reported as follows (a sketch of the data generation for Test 1 is given after the test settings):

Test 1: Set \(N=1024\), \(M=512\) and \(m=10\).

Test 2: Set \(N=512\), \(M=256\) and \(m=10\).
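A minimal sketch of the data generation and error measure described above for Test 1; the random seed, the SNR handling and the (commented-out) call to the Algorithm 3.1 sketch are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, m = 1024, 512, 10                          # Test 1: dimensions and sparsity

A = rng.standard_normal((M, N))                  # entries i.i.d. N(0, 1)
x_true = np.zeros(N)                             # sparse vector with m nonzero entries
support = rng.choice(N, m, replace=False)
x_true[support] = rng.uniform(-2.0, 2.0, m)

signal = A @ x_true                              # noisy observation at SNR = 40 dB
noise_power = np.mean(signal ** 2) / 10 ** (40 / 10)
b = signal + np.sqrt(noise_power) * rng.standard_normal(M)

def mse(x_n, x_star):
    # restoration accuracy E_n = (1/N)*||x_n - x_star||, compared against 1e-4
    return np.linalg.norm(x_n - x_star) / N

# x_rec = algorithm_3_1(A, b, mu=1.0 / np.linalg.norm(A, 2) ** 2)  # e.g. the sketch after Algorithm 3.1
# print(mse(x_rec, x_true) < 1e-4)
```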

Fig. 1: The numerical experiments of Test 1

Fig. 2: The numerical experiments of Test 2

Fig. 3: The objective function value versus number of iterations of Test 1 and Test 2, respectively

In Experiment 2, the entries of the matrix \(A \in {\mathbb {R}}^{M\times N}\) are sampled independently from a Gaussian distribution with zero mean and unit variance. The observation vector b is contaminated by Gaussian noise with signal-to-noise ratio \(\text {SNR}=50\). The sparse vector x is generated from a uniform distribution on the interval \([-2, 2]\) with m nonzero elements. The restoration accuracy is measured by the mean square error \(E_{n}=\frac{1}{N} \Vert x_{n}- x^{*} \Vert \), and the iterations are stopped once \(E_{n} < 10^{-4},\) where \(x^{*}\) is the estimated signal of x. The numerical results are reported as follows:

Test 3: Set \(N=1024\), \(M=512\) and \(m=35\).

Test 4: Set \(N=512\), \(M=256\) and \(m=35\).

Fig. 4: The numerical experiments of Test 3

Fig. 5: The numerical experiments of Test 4

Fig. 6: The objective function value versus number of iterations of Test 3 and Test 4, respectively

From Figs. 1, 2, 4 and 5, it is evident that our algorithm recovers the original signal faster than the Guo and Cui algorithm (28) in compressed sensing. Moreover, Figs. 3 and 6 show that the objective function values generated by Algorithm 3.2 decrease faster than those of the Guo and Cui algorithm (28) in all of the numerical tests.

5 Conclusions

In this article, we propose an inertial viscosity proximal gradient algorithm that is effective for handling non-smooth composite optimization problems with large datasets in real Hilbert spaces. We investigate its convergence and bounded perturbation resilience properties. Our results are demonstrated, and the performance of the proposed algorithm is compared with algorithm (28) of Guo and Cui [16] on the signal recovery problem. The advantage of the proposed algorithm over some previous works is the reduction in computational cost and CPU time. Numerical studies show that the inertial term usually improves the performance of the iterative sequence in terms of convergence in this context.