1 Introduction

1.1 The considered problem

Let \({{\mathcal {H}}} \) be a real Hilbert space endowed with inner product and induced norm denoted by \(\langle .,. \rangle \) and \(\Vert .\Vert \), respectively. Our goal is to propose and study a rapidly converging method for solving the monotone inclusion problem

$$\begin{aligned} \text{ find } {\bar{x}} \in {\mathcal {H}}\,\, \hbox {such that }0 \in A {\bar{x}} , \end{aligned}$$
(1.1)

under the following conditions:

$$\begin{aligned}&A : {\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\,\, \hbox {is maximally monotone on}\,\, {\mathcal {H}}, \end{aligned}$$
(1.2a)
$$\begin{aligned}&S:=A ^{-1}(0) \ne \emptyset . \end{aligned}$$
(1.2b)

This problem finds many important applications in scientific fields such as image processing, computer vision, machine learning, signal processing, optimization, equilibrium theory, economics, game theory, partial differential equations, statistics, and so on (see, e.g., [7, 9, 17, 31, 33, 42, 46, 47]). It includes, as special cases, variational inequalities and convex-concave saddle-point problems. In particular, we recall that (1.1)–(1.2) encompasses the non-smooth convex minimization problem

$$\begin{aligned} \min _{{\mathcal {H}}} g , \end{aligned}$$
(1.3)

where g verifies the following conditions:

$$\begin{aligned}&g: {\mathcal {H}}\rightarrow (-\infty , \infty ] \text{ is proper, convex and lower semi-continuous}, \end{aligned}$$
(1.4a)
$$\begin{aligned}&{\mathrm{argmin}} _{\mathcal {H}}\, g \ne \emptyset . \end{aligned}$$
(1.4b)

A typical method for computing zeroes of a maximally monotone operator \(A:{\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\) is the so-called proximal point algorithm, PPA for short (see Martinet [36], Rockafellar [44, 45]), which consists of the iteration

$$\begin{aligned} x_{n+1}= J_{\mu A } (x_{n}), \end{aligned}$$
(1.5)

where \(J_{\mu A}:=(I+\mu A)^{-1}\) denotes the proximal mapping of A (with index \(\mu \)), which is well-known to be single-valued and everywhere defined (see, e.g., [13, 24, 30] for more details). Many celebrated algorithms can be recast as specific cases of PPA. As examples we mention the augmented Lagrangian method [26, 41, 43], the alternating direction method of multipliers (ADMM for short, see [1, 12, 18, 22]), the split inexact Uzawa method [47], as well as operator splitting methods, including the Douglas-Rachford splitting method [19, 34] and its generalized variant in [18] (also inspired by the Peaceman-Rachford splitting method [40]). Thus any enhancement of PPA has a wide range of applications.
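To fix ideas, the following minimal Python sketch (ours, not code from the cited works) implements iteration (1.5), assuming a resolvent oracle `resolvent(mu, x)` that evaluates \(J_{\mu A}(x)\); the linear operator below is only an illustrative choice.

```python
import numpy as np

def ppa(resolvent, x0, mu=1.0, n_iter=200):
    """Proximal point algorithm (1.5): x_{n+1} = J_{mu A}(x_n)."""
    x = x0
    for _ in range(n_iter):
        x = resolvent(mu, x)
    return x

# Illustrative operator A x = M x (monotone, since M + M^T is positive
# semi-definite), with resolvent J_{mu A}(x) = (I + mu M)^{-1} x and unique zero 0.
M = np.array([[2.0, 1.0], [-1.0, 2.0]])
resolvent = lambda mu, x: np.linalg.solve(np.eye(2) + mu * M, x)
print(ppa(resolvent, np.array([1.0, -1.0])))  # close to [0, 0]
```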

It is well-known that PPA generates weakly convergent sequences \((x_n)\) with worst-case rates \(\Vert x_{n+1}-x_{n}\Vert = \mathcal{O}(n^{-\frac{1}{2}})\) and \(\Vert A_{\mu }(x_n)\Vert = \mathcal{O}(n^{-\frac{1}{2}})\) (see, e.g., [15, 18, 23]), where \(A_{\mu }= \mu ^{-1} (I-J_{\mu A})\) denotes the Yosida regularization of A. This latter operator enjoys numerous nice properties (see, for instance, [13, 14]). In particular, it satisfies \(A_{\mu }^{-1}(0)=S:=A^{-1} (0 )\), and the quantity \(\Vert A_{\mu }(x)\Vert \) (for \(x\in {\mathcal {H}}\)) can be used to measure how accurately x approximates a zero of A (see [18]).

The minimization problem (1.3)–(1.4), as a special instance of (1.1)–(1.2) when \(A=\partial g\) (where \(\partial g\) is the Fenchel sub-differential of g), can be solved by PPA, in which the resolvent operator of A reduces to

$$\begin{aligned} J_{\mu \partial g }(x)=\mathrm{prox}_{\mu g}(x):=\mathrm{argmin} _{ y \in {\mathcal {H}}} \left( g(y)+ (2\mu )^{-1}\Vert x-y\Vert ^2 \right) . \end{aligned}$$
(1.6)
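For instance, for \(g=\Vert \cdot \Vert _1\) the minimization in (1.6) is solved coordinate-wise by soft-thresholding; a minimal sketch (the helper name `prox_l1` is ours):

```python
import numpy as np

def prox_l1(x, mu):
    """prox_{mu g}(x) for g = ||.||_1, solving (1.6) coordinate-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

print(prox_l1(np.array([1.5, -0.3, 0.8]), mu=0.5))  # [ 1.  -0.   0.3]
```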

Recall that the corresponding PPA has been considerably enhanced by means of extrapolation processes based upon Nesterov’s and Güler’s acceleration techniques. The latter variants generate convergent sequences \((x_n)\) with worst-case convergence rates \(g(x_n)-\inf _{{\mathcal {H}}}g =o(n^{-2})\) (for the function values), \(\Vert x_{n+1}-x_{n}\Vert = o(n^{-1})\) (for the discrete velocity) and \(\Vert x_n^*\Vert =o(n^{-1})\) (for the sub-gradients \(x_n^* \in \partial g(x_n) \)), instead of the worst-case rates \(g(x_n)-\inf _{{\mathcal {H}}} g ={{\mathcal {O}}}(n^{-1})\), \(\Vert x_{n+1}-x_{n}\Vert ={{\mathcal {O}}}(n^{-\frac{1}{2}})\) and \(\Vert x_n^*\Vert ={{\mathcal {O}}}(n^{-\frac{1}{2}})\) established for PPA.

Our purpose here is to extend the above results to sequences \((x_n)\) given by some accelerated variant of PPA, dedicated to solving the more general problem (1.1)–(1.2), especially in terms of the quantities \(\Vert x_{n+1}-x_{n}\Vert \) and \(\Vert A_{\lambda }(x_n)\Vert \) (for some positive value \(\lambda \)).

Specifically, for solving (1.1), we introduce CRIPA (Corrected Relaxed Inertial Proximal Algorithm) which enters the following framework of sequences \(\{z_n, x_n\} \subset {\mathcal {H}}\) generated from starting points \(\{z_{-1},x_{-1},x_0\}\) by

$$\begin{aligned}&\ z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}-x_n), \end{aligned}$$
(1.7a)
$$\begin{aligned}&x_{n+1}=(1-w_n )z_n+ w_n J_{r_n A}(z_{n}) , \end{aligned}$$
(1.7b)

where \(\{ w_n, \theta _n, \gamma _n \} \subset (0,1)\) and \((r_n) \subset (0,\infty )\). These parameters will be specified later on.

The considered algorithm combines relaxation factors \((w_n)\), a momentum term “\(\theta _n(x_{n}-x_{n-1})\)” (based on Nesterov’s acceleration techniques) and a correction term “\(\gamma _n ( z_{n-1}-x_n)\)” (similar to that introduced by Kim [27] in some variant of PPA).

Compared to PPA, CRIPA keeps the computational cost of each iteration basically unchanged, while allowing us to extend, to the wide framework of maximal monotone inclusions, the interesting convergence properties obtained for the accelerated variants relative to convex minimization. More precisely, using conveniently chosen parameters \(\{r_n , w_n, \theta _n, \gamma _n\}\), we establish, among other results (see Theorem 1), the weak convergence of \((x_n)\) towards some zero of A, with the convergence rate \(\Vert x_{n+1}-x_{n}\Vert = o(n^{-1})\) for the discrete velocity. In addition, when using constant proximal indexes \(r_n\) (see Theorem 2), we obtain the accuracy of \((x_n)\) to an equilibrium with the worst-case rate \(\Vert A_{\lambda }(x_n)\Vert =o(n^{-1})\) (for some positive value \(\lambda \)). This latter estimate is considerably improved when using increasing proximal indexes (see Theorem 3). To the best of our knowledge, no such convergence results have been established so far for algorithmic solutions to (1.1)–(1.2).

1.2 A brief review of the state of the art.

1.2.1 Convex minimization and Güler’s acceleration processes.

Let us give some reminders concerning the useful and efficient methods proposed by Güler [25], based upon ideas of Nesterov [37] (also see [38, 39]), for minimizing a convex lower semi-continuous function g. As particular instances of these acceleration techniques (when using a constant proximal index \(\mu \)), we mention the following two algorithms:

The first consists of the sequences \(\{x_n,z_n\}\) given for \(n \ge 0\) by

$$\begin{aligned}&x_{n+1}= J_{\mu \partial g} (z_n), \end{aligned}$$
(1.8a)
$$\begin{aligned}&\hbox { }\ t_{n+1}=\frac{1}{2}\left( 1+ \sqrt{1+ 4t_n^2} \right) , \end{aligned}$$
(1.8b)
$$\begin{aligned}&z_{n+1}= x_{n+1} + \frac{t_n-1}{t_{n+1}} (x_{n+1} - x_n). \end{aligned}$$
(1.8c)

The second method consists of \(\{x_n,z_n\}\) given for \(n \ge 0\) by

$$\begin{aligned}&x_{n+1}= J_{\mu \partial g} (z_n), \end{aligned}$$
(1.9a)
$$\begin{aligned}&\hbox { }\ t_{n+1}=\frac{1}{2}\left( 1+ \sqrt{1+ 4t_n^2} \right) , \end{aligned}$$
(1.9b)
$$\begin{aligned}&z_{n+1}= x_{n+1} + \frac{t_n-1}{t_{n+1}} (x_{n+1} - x_n)+ \frac{t_n}{t_{n+1}} (x_{n+1} - z_n). \end{aligned}$$
(1.9c)

Note that both methods include a momentum term \(\frac{t_n-1}{t_{n+1}} (x_{n+1} - x_n)\) while the second one incorporates an additional correction term \(\frac{t_n}{t_{n+1}} (x_{n+1} - z_n)\).
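As an illustration, here is a Python sketch of the second method (1.9), assuming a prox oracle `prox(mu, z)` for \(J_{\mu \partial g}(z)\) and the usual initialization \(t_0=1\); dropping the correction term in the update of z recovers (1.8). This is only a sketch, not the authors' code.

```python
import numpy as np

def guler_second(prox, x0, mu=1.0, n_iter=200):
    """Sketch of Guler's second method (1.9); prox(mu, z) evaluates J_{mu dg}(z)."""
    x, z, t = x0, x0.copy(), 1.0
    for _ in range(n_iter):
        x_new = prox(mu, z)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))   # (1.9b)
        z = x_new + ((t - 1.0) / t_new) * (x_new - x) \
                  + (t / t_new) * (x_new - z)              # momentum + correction (1.9c)
        x, t = x_new, t_new
    return x
```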

Both methods (1.8) and (1.9) were shown to produce iterates \((x_n)\) that guarantee a worst-case rate \(g(x_n)-\inf _{{\mathcal {H}}} g= {{\mathcal {O}}}(n^{-2})\) for the function values. However, the convergence of the iterates has not been established.

This drawback was overcome by Chambolle-Dossal [16] (also see Attouch-Peypouquet [5]), through the variant of (1.8) given by

$$\begin{aligned} \begin{array}{l} z_{n} =x_n + \frac{n-1}{ n+\alpha -1} (x_{n}- x_{n-1}), \\ x_{n+1}= J_{\mu \partial g}( z_{n} ), \end{array} \end{aligned}$$
(1.10)

for some positive constant \(\alpha \). It was proved for \(\alpha >3\) (see [5]) that (1.10) generates (weakly) convergent sequences \((x_n)\) that minimize the function values with a complexity result of \(o(n^{-2})\), instead of the rates \({{\mathcal {O}}}(n^{-1})\) and \({{\mathcal {O}}}(n^{-2})\) obtained for (1.5) and Güler’s processes (1.8)–(1.9), respectively.
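A sketch of (1.10) under the same prox-oracle convention, with an illustrative \(\alpha >3\):

```python
def chambolle_dossal(prox, x0, mu=1.0, alpha=3.1, n_iter=200):
    """Sketch of the inertial proximal method (1.10) with alpha > 3."""
    x_prev = x = x0
    for n in range(1, n_iter + 1):
        z = x + ((n - 1.0) / (n + alpha - 1.0)) * (x - x_prev)  # momentum step
        x_prev, x = x, prox(mu, z)                              # proximal step
    return x
```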

Note that the iterates \(\{x_n,z_n\}\) generated by the second model of Güler (1.9) satisfy, for \(n \ge 1\),

$$\begin{aligned}&z_{n}= x_{n} + \frac{t_{n-1}-1}{t_{n}} (x_{n} - x_{n-1})+ \frac{t_{n-1}}{t_{n}} (x_{n} - z_{n-1}), \end{aligned}$$
(1.11)
$$\begin{aligned}&x_{n+1}= J_{\mu \partial g } (z_n). \end{aligned}$$
(1.12)

Thus the second algorithm (1.9), which is closely related to the optimized gradient methods discussed by Kim-Fessler [28, 29], uses a correction term different from the one involved in (1.7).

1.2.2 Monotone inclusions and acceleration processes.

Let us emphasize that, regarding the existing algorithmic solutions to (1.1)–(1.2) with an arbitrary monotone operator, there is no theoretical convergence result analogous to that obtained for (1.10). Many papers have been dedicated to accelerating PPA in its general form by means of relaxation and inertial techniques. It seems (to the best of our knowledge) that only empirical accelerations have been obtained, except for the recent works by Attouch-Peypouquet [5], Attouch-László [2] and Kim [27]:

An accelerated variant of PPA has been proposed and investigated by Attouch-Peypouquet [5] through the following RIPA (Regularized Inertial Proximal Algorithm)

$$\begin{aligned}&z_n=x_n+ \left( 1-\frac{\alpha }{n} \right) (x_{n}-x_{n-1}), \end{aligned}$$
(1.13)
$$\begin{aligned}&x_{n+1}=\frac{\lambda _n}{\lambda _n+s} z_n+ \frac{s}{\lambda _n+s} J_{(\lambda _n+s)A} (z_n), \end{aligned}$$
(1.14)

where \(\{s, \alpha , \lambda _n\}\) are positive parameters such that

$$\begin{aligned} \alpha>2 \quad \text{ and } \quad \lambda _n=(1+\epsilon )\frac{s}{\alpha ^2}n ^2 \ \text{ (for some } \epsilon >0). \end{aligned}$$
(1.15)

It was established (see [5, Theorem 3.6]) that (1.13)–(1.15) produces convergent sequences \((x_n)\) with worst-case rates

$$\begin{aligned} \Vert x_{n+1}-x_{n}\Vert = {{\mathcal {O}}}(n^{-1}) \text{ and } \Vert A_{\lambda _n+s}x_n \Vert = o(n^{-1}). \end{aligned}$$
(1.16)

This algorithm was applied by Attouch [1] to convex structured minimization problems with linear constraints and it gave rise to an inertial proximal ADMM algorithm.
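A sketch of RIPA (1.13)–(1.15), assuming a resolvent oracle `resolvent(r, z)` for \(J_{rA}(z)\); the constants are illustrative choices satisfying (1.15):

```python
def ripa(resolvent, x0, s=1.0, alpha=2.1, eps=0.1, n_iter=200):
    """Sketch of RIPA (1.13)-(1.14) with lambda_n from (1.15)."""
    x_prev = x = x0
    for n in range(1, n_iter + 1):
        lam = (1.0 + eps) * s * n * n / alpha**2          # (1.15)
        z = x + (1.0 - alpha / n) * (x - x_prev)          # (1.13)
        x_prev = x
        x = (lam / (lam + s)) * z \
            + (s / (lam + s)) * resolvent(lam + s, z)     # (1.14)
    return x
```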

Another numerical approach to (1.1)–(1.2) was addressed by Attouch-László [2] through the following PRINAM (Proximal Regularized Inertial Newton Algorithm)

$$\begin{aligned}&z_{n-1} =\left( 1 -\beta \left( \frac{1}{\lambda _n}- \frac{1}{\lambda _{n-1}}\right) \right) x_n + \left( \alpha _{n}- \frac{\beta }{\lambda _{n-1}} \right) {\dot{x}}_{n}\nonumber \\&\qquad \qquad +\ \beta \left( \frac{1}{\lambda _n} J_{\lambda _{n}A}(x_{n}) - \frac{1}{\lambda _{n-1}} J_{\lambda _{n-1}A}(x_{n-1})\right) , \end{aligned}$$
(1.17a)
$$\begin{aligned}&x_{n+1} = \frac{\lambda _{n+1}}{\lambda _{n+1}+s} z_{n-1} + \frac{s}{\lambda _{n+1}+s} J_{(\lambda _{n+1}+s)A}(z_{n-1} ), \end{aligned}$$
(1.17b)

where \({\dot{x}}_{n}=x_n - x_{n-1}\), while s, \(\beta \), \((\lambda _n)\) and \((\alpha _n)\) are positive parameters that satisfy

$$\begin{aligned} \alpha _n= \frac{r n + q-1}{r(n+1)+q} \text{ with } r>0 \text{ and } q \in (-\infty ,\infty ), \lambda _n= \lambda n^2 \text{ with } \lambda > \frac{(2\beta +s)^2 r^2}{s}. \end{aligned}$$
(1.18)

It was established (see [2, Theorem 1.1]) that (1.17a)–(1.17b) generates convergent sequences \((x_n)\) with worst-case rates

$$\begin{aligned} \Vert x_{n+1}-x_{n}\Vert = {{\mathcal {O}}}(n^{-1})\hbox { and }\Vert A_{\lambda _n}(x_n)\Vert = o(n^{-2}). \end{aligned}$$
(1.19)

This second method also uses a correction term "\(\beta \left( \frac{1}{\lambda _n} J_{\lambda _{n}A}(x_{n}) - \frac{1}{\lambda _{n-1}} J_{\lambda _{n-1}A}(x_{n-1})\right) \)", different from the one involved in (1.7).

Note that, for accelerating the proximal point algorithm, both of the previous methods require proximal indexes \((\lambda _n)\) that go to infinity as \(n \rightarrow \infty \).

Remark 1

This last observation is fundamental in view of decomposition techniques, which require using only constant or bounded proximal indexes; for instance, in forward-backward algorithms ([33]), or in operator splitting methods ([18, 19, 21]).

This is not the case for the accelerated proximal point method proposed by Kim [27], based on the performance estimation problem (PEP) approach of Drori-Teboulle [20], which reads, for initial iterates \(\{x_0, z_0, z_{-1}\} \subset {\mathcal {H}}\) and for \(n \ge 0\),

$$\begin{aligned}&x_{n+1}= J_{\mu A} (z_n), \end{aligned}$$
(1.20a)
$$\begin{aligned}&z_{n+1}= x_{n+1} + \frac{n}{n+2} (x_{n+1} - x_n)+ \frac{n}{n+2} ( z_{n-1}-x_{n}). \end{aligned}$$
(1.20b)

This yields the worst-case convergence rate \(\Vert x_{n}-z_{n-1}\Vert = {{\mathcal {O}}}(n^{-1})\) (see [27, Theorem 4.1]), or equivalently \(\Vert A_{\mu } (z_n)\Vert ={{\mathcal {O}}}(n^{-1})\), which entails (since \(A_{\mu }\) is Lipschitz continuous)

$$\begin{aligned} \Vert A_{\mu } (x_n)\Vert ={{\mathcal {O}}}(n^{-1}). \end{aligned}$$
(1.21)

However, no convergence of the iterates was established for (1.20).
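For completeness, a sketch of (1.20) under the same resolvent-oracle convention:

```python
def kim_accelerated_ppa(resolvent, x0, mu=1.0, n_iter=200):
    """Sketch of Kim's accelerated PPA (1.20); resolvent(mu, z) = J_{mu A}(z)."""
    x = z = z_prev = x0
    for n in range(n_iter):
        x_new = resolvent(mu, z)                          # (1.20a)
        z_new = x_new + (n / (n + 2.0)) * (x_new - x) \
                      + (n / (n + 2.0)) * (z_prev - x)    # (1.20b)
        x, z_prev, z = x_new, z, z_new
    return x
```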

1.3 CRIPA and an overview of the related results.

1.3.1 Introducing CRIPA.

Our numerical approach to (1.1)–(1.2) is more precisely given by sequences \(\{x_n,z_n\}\) generated by the following (corrected, relaxed and inertial) algorithm:

(CRIPA):

\(\rhd \) Step 1 (initialization): Let \(\{ z_{-1}, x_{-1}, x_0 \} \subset {\mathcal {H}}\), \(\lambda >0\).

\(\rhd \) Step 2 (main step): Given \(\{z_{n-1},x_{n-1}, x_n\} \subset {\mathcal {H}}\) (with \(n \ge 0\)), we compute the updates by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n \left( z_{n-1}- x_{n}\right) , \end{aligned}$$
(1.22a)
$$\begin{aligned}&x_{n+1}= \frac{1}{1+k_{n}} z_n + \frac{k_n}{1+k_{n}} J_{\lambda (1+k_n)A} (z_n) , \end{aligned}$$
(1.22b)

where \(k_n\), \(\theta _n\) and \(\gamma _n\) are non-negative parameters defined, for some constants \(\{a, c,a_1,a_2 \}\subset [0,\infty )\) and \(\{ b, {\bar{c}}\} \subset (0,\infty )\), as follows:

  • \((k_n)\) is given recursively, from \(k_0 >0\) and for \(n \ge 1\), by

    $$\begin{aligned} k_n=k_{n-1} \left( 1 + \frac{a}{ b n + c} \right) . \end{aligned}$$
    (1.23)
  • \((\theta _n)\) and \((\gamma _n)\) are given for \(n \ge 0\) by

    $$\begin{aligned} \theta _n= 1- \frac{a_1 }{b n + {\bar{c}}}, \quad \gamma _n= 1-\frac{a_2}{b n +{\bar{c}}}. \end{aligned}$$
    (1.24)

Let us stress that, for convenience, we can assume that \({\bar{c}}>a_i\) (for \(i=1,2\)), so as to ensure that \(\{\theta _n, \gamma _n\} \subset (0,1)\) (even though this is not essential for convergence).
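The bookkeeping of CRIPA can be summarized by the following Python sketch (ours, not a definitive implementation); `resolvent(r, z)` is an assumed oracle for \(J_{rA}(z)\), and the default constants are illustrative choices satisfying the conditions of Theorem 1 below (see (3.28)).

```python
import numpy as np

def cripa(resolvent, x0, lam=1.0, k0=1.0, a=1.0, b=1.0, c=2.0,
          a1=5.0, a2=2.5, c_bar=5.5, n_iter=500):
    """Sketch of CRIPA (1.22a)-(1.22b) with parameters (1.23)-(1.24).

    resolvent(r, z) is assumed to evaluate J_{r A}(z).  The defaults satisfy
    (3.28): a2 > 2b, a1 > b + a + a2, c > a1 - (a2 + a), c_bar = c + a2 + a.
    """
    x_prev = x = z_prev = x0
    k = k0
    for n in range(n_iter):
        theta = 1.0 - a1 / (b * n + c_bar)                    # (1.24)
        gamma = 1.0 - a2 / (b * n + c_bar)
        z = x + theta * (x - x_prev) + gamma * (z_prev - x)   # (1.22a)
        x_prev = x
        x = z / (1.0 + k) + (k / (1.0 + k)) * resolvent(lam * (1.0 + k), z)  # (1.22b)
        z_prev = z
        k *= 1.0 + a / (b * (n + 1) + c)                      # (1.23), gives k_{n+1}
    return x
```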

Particular attention will be paid to the special case of CRIPA when \(a=0\) (namely, when using constant relaxation factors), which will be considered through the following formulation:

(CRIPA-S):

\(\rhd \) Step 1 (initialization): Let \(\{ z_{-1}, x_{-1}, x_0 \} \subset {\mathcal {H}}\), \(\lambda >0\).

\(\rhd \) Step 2 (main step): Given \(\{z_{n-1},x_{n-1}, x_n\} \subset {\mathcal {H}}\) (with \(n \ge 0\)), we compute the updates by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}- x_{n}), \end{aligned}$$
(1.25a)
$$\begin{aligned}&x_{n+1}=\frac{1}{1+k_{0}}z_n +\frac{k_0}{1+k_{0}} J_{\lambda (1+k_0) A} (z_n). \end{aligned}$$
(1.25b)

1.3.2 An overview of the main results.

Under convenient assumptions on the constants \(\{a, b, c, a_1, a_2, {\bar{c}}\}\), we establish (among other results) the weak convergence of \(\{x_n\}\) (generated by CRIPA) towards equilibria, and we also set up the following estimates (see Theorems 1, 2 and 3):

$$\begin{aligned}&\Vert x_{n+1}-x_{n}\Vert =o( n^{-1}), \sum _n n \Vert x_{n+1}-x_{n}\Vert ^2 < \infty , \end{aligned}$$
(1.26a)
$$\begin{aligned}&\sum _{n } n^2 \Vert (x_{n+1}-x_{n})-(x_{n}-x_{n-1})\Vert ^2 < \infty . \end{aligned}$$
(1.26b)

Moreover, given an arbitrary nonnegative integer p, we exhibit a set of conditions on the parameters that ensures the following properties (see Theorem 3)

$$\begin{aligned}&\Vert A_{\lambda } (x_n) \Vert =o( n^{-(p+1)}), \sum _n n^{2p+1} \Vert A_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(1.27a)
$$\begin{aligned}&\text{ for } \text{ any } q\in A^{-1}(0), \sum _{n } n^p \langle A_{\lambda }(x_n), x_{n}-q \rangle <\infty . \end{aligned}$$
(1.27b)

In particular, when A is the sub-differential of a proper and convex lower semi-continuous function \(f:{\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\), we reach the following convergence rates of the values (see Theorem 4)

$$\begin{aligned} f_\lambda (x_{n}) -\min f = o( n^{-(p+1)}), \sum _n n^p (f_\lambda (x_{n}) -\min f) < \infty , \end{aligned}$$
(1.28)

where \(f_{\lambda }\) denotes the Moreau envelope of f.

Our process provides a significant acceleration of PPA, even when using a constant proximal index, besides generating convergent iterates. It also involves a correction term similar to that used by Kim [27]. It would be interesting to understand the role of the correction term used in the proposed method through a continuous counterpart, but this is beyond the scope of this work. Let us mention that there is a flourishing literature devoted to accelerated continuous counterparts of PPA with general operators (see, for instance, [2, 3, 5, 8, 11, 35]).

1.4 Organization of the paper.

In Sect. 2, we give some preliminaries on CRIPA. The method is reformulated in terms of Yosida approximations and a crucial estimate is proposed for a Lyapunov analysis. Section 3 is devoted to the convergence analysis of CRIPA. A main result (Theorem 1) is established in a general setting of operators and parameters. As consequences of our main theorem, we state two other results (Theorems 2 and 3) relative to the involved parameters. In Sect. 4, we specialize our main results to the setting of convex minimization, where two related results (Theorems 4 and 5) are established. Next, numerical experiments are performed in Sect. 5, and several technical results are established in the Appendix.

Remark 2

From now on, so as to simplify the presentation, we (often) use the following notation: given any sequence \((u_n)\), we denote \({\dot{u}}_n=u_n-u_{n-1}\). We also denote the integer part of any real number a by [a].

2 Preliminaries on CRIPA.

In this section, we provide a suitable reformulation of CRIPA in terms of Yosida approximations and we exhibit a Lyapunov sequence in connection with the algorithm. These two arguments will allow us to obtain separately a series of preliminary estimates (in Sect. 3) that will be combined so as to reach our statements in Theorem 1.

2.1 Formulation of CRIPA by means of Yosida approximations.

Let sequences \(\{x_n,z_n\}\) verify (1.22b) relative to some positive parameters \(\{ k_n, \lambda \}\) and some maximally monotone operator \(A:{\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\).

By definition of the Yosida approximation we have \(A_{\lambda }=\lambda ^{-1}(I-J_{\lambda A})\) as well as

$$\begin{aligned} (I + \lambda k_n A_{\lambda })^{-1}=I-\lambda k_n (A_{\lambda })_{\lambda k_n}. \end{aligned}$$
(2.1)

Hence, according to the resolvent equation (formulated as a semi-group property) \( (A_{\delta })_{\kappa }=A_{\delta +\kappa }\) (for any positive values \(\delta \) and \(\kappa \)), we infer that

$$\begin{aligned} (I + \lambda k_n A_{\lambda })^{-1}= I-\lambda k_n A_{\lambda (1+k_n)}. \end{aligned}$$
(2.2)

So, by \(x_{n+1}= z_n-\lambda k_n A_{\lambda (1+k_n)}(z_n)\) (from (1.22b)), we deduce that

$$\begin{aligned} x_{n+1}=(I + \lambda k_n A_{\lambda })^{-1}(z_n). \end{aligned}$$
(2.3)

This observation will be of crucial interest with regard to the methodology developed below.
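Identity (2.3) is also easy to verify numerically; the following sketch checks (2.2)–(2.3) against (1.22b) for the illustrative linear monotone operator \(Ax=Mx\):

```python
import numpy as np

# Numerical check of (2.2)-(2.3) for A x = M x (monotone: M + M^T >= 0).
M = np.array([[2.0, 1.0], [-1.0, 2.0]])
I2 = np.eye(2)
lam, k = 0.7, 1.3
z = np.array([0.4, -1.1])

J = lambda r: np.linalg.inv(I2 + r * M)        # resolvent J_{r A}
A_lam = (I2 - J(lam)) / lam                    # Yosida approximation A_lambda

lhs = np.linalg.solve(I2 + lam * k * A_lam, z)               # (2.3)
rhs = z / (1 + k) + (k / (1 + k)) * (J(lam * (1 + k)) @ z)   # (1.22b)
print(np.allclose(lhs, rhs))                                 # True
```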

As a key result in our analysis, we give the following lemma.

Lemma 1

The iterates \(\{x_n, z_n\}\) generated by CRIPA satisfy (for \(n \ge 0\))

$$\begin{aligned}&z_{n}- x_{n+1}= \lambda k_n A_{\lambda }(x_{n+1}), \end{aligned}$$
(2.4a)
$$\begin{aligned}&{\dot{x}}_{n+1}+ (z_{n}- x_{n+1}) = \theta _n{\dot{x}}_{n}+ \gamma _{n} (z_{n-1}- x_{n}) . \end{aligned}$$
(2.4b)

Proof

It is not difficult to see, for \(n \ge 0\), that (2.3) is equivalent to \(z_n=x_{n+1}+ \lambda k_n A_{\lambda }(x_{n+1})\), which yields (2.4a). Furthermore, by \(z_n= x_n + \theta _n{\dot{x}}_{n}+ \gamma _n (z_{n-1}- x_{n}) \) (from (1.22a)), we simply obtain \(z_n-x_{n+1}= - {\dot{x}}_{n+1}+ \theta _n{\dot{x}}_{n}+\gamma _n (z_{n-1}- x_{n})\), which entails (2.4b). \(\square \)

As a consequence of the previous arguments we make the following observation.

Remark 3

The iterates \(\{x_n, z_n\}\) given by CRIPA satisfy (for \(n \ge 1\))

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) +\gamma _n \lambda k_{n-1} A_{\lambda }(x_n) , \end{aligned}$$
(2.5a)
$$\begin{aligned}&x_{n+1}= J_{\lambda k_n A_{\lambda } } (z_n) . \end{aligned}$$
(2.5b)

This latter formulation highlights the fact that the Moreau-Yosida regularization reconciles the operator’s lack of co-coercivity with the acceleration scheme (see [5, 6]).

2.2 A general inequality for a Lyapunov analysis

We now provide a useful inequality dedicated to a Lyapunov analysis of the considered method. With the iterates \(\{x_n \}\) produced by CRIPA, we associate the sequence \(( {{\mathcal {E}}}_n(s,q))\) defined for \((s,q) \in (0, \infty ) \times S\) and for \(n \ge 1\) by

$$\begin{aligned} \begin{array}{l} {{\mathcal {E}}}_n(s,q)= \frac{1}{2} \Vert s (q-x_{n})- (bn +{\bar{c}}-a_1) {\dot{x}}_{n}\Vert ^2 \\ + \frac{1}{2} s(a_1-b -s) \Vert x_{n}-q\Vert ^2+ s \lambda k_{n-1} ( b(n-1)+{\bar{c}}) \langle A_{\lambda }(x_{n}),x_{n}-q \rangle . \end{array} \end{aligned}$$
(2.6)

The next result will be helpful with regard to our forthcoming analysis.

Lemma 2

Suppose that (1.2) holds and let \(\{x_n\} \subset {\mathcal {H}}\) be generated by CRIPA with sequences \((k_n)\), \((\theta _n)\) and \((\gamma _n)\) (given by (1.23) and (1.24)), along with constants \(\{\lambda , k_0\} \subset (0,\infty )\), \(\{a, c, a_1,a_2 \}\subset [0,\infty )\) and \(\{b, {\bar{c}}\} \subset (0,\infty )\) verifying

$$\begin{aligned} a_1 >b. \end{aligned}$$
(2.7)

Then, for \((s,q)\in (0,a_1-b] \times {\mathcal {H}}\) and for \(n \ge N_0\) (with some \(N_0\) large enough), we have

$$\begin{aligned} \begin{array}{l} \dot{{\mathcal {E}}}_{n+1}(s,q) + s \lambda k_{n-1} \left( a_2-b \right) \langle A_{\lambda } (x_{n}), x_n -q \rangle \\ \quad \qquad +\; \lambda k_{n-1} (bn+{\bar{c}}) \left( a \frac{bn+{\bar{c}}-s}{b n +c} + a_2 -s \right) \langle A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \quad \qquad +\; \lambda ^2 k_{n} (b n+{\bar{c}})(b n+{\bar{c}}-s) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n})\Vert ^2 \\ \quad \qquad +\; \frac{1}{2} (b n+{\bar{c}}) ^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \\ \quad \qquad +\; \frac{1}{2}\left( a_1-b -s \right) \big ( (2 n+1)b + 2{\bar{c}}- a_1 \big ) \Vert {\dot{x}}_{n+1}\Vert ^2 \le 0. \end{array} \end{aligned}$$
(2.8)

The proof of Lemma 2 is divided into two parts through the next subsections.

2.2.1 Proof of Lemma 2 - part 1.

An important equality of independent interest is proposed here relative to our method through the wider framework of sequences \(\{x_n, d_{n}\} \subset {\mathcal {H}}\) and parameters \(\{e, \nu _n, \theta _n\} \subset (0,\infty )\) verifying (for \(n \ge 0\))

$$\begin{aligned}&{\dot{x}}_{n+1}- \theta _n{\dot{x}}_{n}+ d_n =0, \end{aligned}$$
(2.9a)
$$\begin{aligned}&(e+ \nu _ {n+1}) \theta _n= \nu _n. \end{aligned}$$
(2.9b)

For this purpose, we associate with (2.9) the quantity \(G_n(s,q)\) given for \((s,q)\in [0,\infty )\times {\mathcal {H}}\) by

$$\begin{aligned} G_n(s,q) = \frac{1}{2} \Vert s (q-x_{n})- \nu _n {\dot{x}}_{n}\Vert ^2+ \frac{1}{2} s(e -s) \Vert x_{n}-q\Vert ^2. \end{aligned}$$
(2.10)

Basic properties regarding the sequence \(({G}_{n}(s,q))\) are established through the following proposition.

Proposition 1

Let \(\{x_n, d_n \} \subset {\mathcal {H}}\) and \(\{ \theta _n, \nu _n, e \} \subset (0,\infty )\) verify (2.9). Then for \((s,q) \in \left( 0, e \right] \times {\mathcal {H}}\) and for \(n \ge 0\) we have

$$\begin{aligned} \begin{array}{l} {\dot{G}}_{n+1}(s,q) + \frac{1}{2} (e+\nu _{n+1}) ^2 \Vert {\dot{x}}_{n+1}- \theta _n{\dot{x}}_{n}\Vert ^2 \\ \qquad \qquad +\; s (e+\nu _{n+1}) \langle d_{n}, x_{n+1}-q \rangle \\ \qquad \qquad +\; \left( e-s +\nu _{n+1} \right) (e+\nu _{n+1}) \langle d_{n} ,{\dot{x}}_{n+1}\rangle = - \frac{1}{2}\left( e -s \right) ( e+2 \nu _{n+1} ) \Vert {\dot{x}}_{n+1}\Vert ^2 . \end{array} \end{aligned}$$
(2.11)

The proof of Proposition 1 is given in Appendix 1.

2.2.2 Proof of Lemma 2 - part 2.

It can be checked (from (2.4)) that CRIPA enters the special case of algorithm (2.9) (for \(n \ge 1\)) when taking the particular parameters \(e=a_1-b\) (with \(a_1>b\)), \(\nu _{n}=b n +{\bar{c}}-a_1\), together with \(d_n\) as below

$$\begin{aligned} d_n= \lambda k_{n} A_{\lambda } (x_{n+1})- \gamma _n \lambda k_{n-1} A_{\lambda } (x_{n}). \end{aligned}$$
(2.12)

In this specific situation, we get

$$\begin{aligned} e+\nu _{n+1}= b n +{\bar{c}}\hbox { and } e+2 \nu _{n+1}= (2n+1)b +2{\bar{c}}-a_1. \end{aligned}$$
(2.13)

Hence, for \(n \ge N_0\), for some \(N_0\) large enough (so as to ensure that \(\nu _n\) is positive), and for \(s \in (0, a_1-b]\), by Proposition 1 we obtain

$$\begin{aligned} \begin{array}{l} {\dot{G}}_{n+1}(s,q) + \frac{1}{2} (b n+{\bar{c}})^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2+ \Gamma _n \\ = - \frac{1}{2}\left( a_1-b -s \right) ( (2 n+1)b + 2{\bar{c}}- a_1) \Vert {\dot{x}}_{n+1}\Vert ^2 , \end{array} \end{aligned}$$
(2.14)

where \(G_n(s,q)\) and \(\Gamma _{n}\) are given by

$$\begin{aligned}&G_n(s,q) = \frac{1}{2} \Vert s (q-x_{n})- \left( b n +{\bar{c}}-a_1 \right) {\dot{x}}_{n}\Vert ^2+ \frac{1}{2} s(a_1-b -s) \Vert x_{n}-q\Vert ^2, \end{aligned}$$
(2.15a)
$$\begin{aligned}&\Gamma _{n} = s (b n+{\bar{c}}) \langle d_n, x_{n+1}-q \rangle + (b n+{\bar{c}}) (b n+{\bar{c}}-s)\langle d_n ,{\dot{x}}_{n+1}\rangle . \end{aligned}$$
(2.15b)

In order to evaluate \(\Gamma _n\) and for simplification, we set

$$\begin{aligned} U_n:= \langle \lambda k_{n-1} A_{\lambda } (x_{n}), x_{n}-q \rangle . \end{aligned}$$

According to the formulation of \(d_n\) (given in (2.12)) we have

$$\begin{aligned} \begin{array}{l} \langle d_n, x_{n+1}-q \rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1}) -\gamma _n \lambda k_{n-1} A_{\lambda } (x_{n}), x_{n+1}-q \right\rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1}), x_{n+1}-q \right\rangle -\gamma _n \left\langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\right\rangle - \gamma _n \left\langle \lambda k_{n-1} A_{\lambda } (x_{n}), x_{n}-q \right\rangle \\ = U_{n+1} - \gamma _n U_{n}-\gamma _n \left\langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\right\rangle , \end{array} \end{aligned}$$

as well as

$$\begin{aligned} \begin{array}{l} \langle d_n, {\dot{x}}_{n+1}\rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1}) -\gamma _n \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\right\rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1})- \lambda k_{n-1} A_{\lambda } (x_{n}) + (1-\gamma _n) \lambda k_{n-1} A_{\lambda } (x_{n}) , {\dot{x}}_{n+1}\right\rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1})- \lambda k_{n-1} A_{\lambda } (x_{n}),{\dot{x}}_{n+1}\right\rangle + (1-\gamma _n) \left\langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\right\rangle \\ = \lambda k_{n} \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \qquad \qquad +\; \lambda \left( k_{n}-k_{n-1} \right) \langle A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle + (1-\gamma _n) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ = \lambda k_{n} \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \qquad \qquad +\; \left( \frac{k_{n}-k_{n-1}}{k_{n-1}} + (1-\gamma _n) \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle . \end{array} \end{aligned}$$

This in light of (2.15b) amounts to

$$\begin{aligned} \begin{array}{l} (b n+{\bar{c}})^{-1}\Gamma _{n} = s \langle d_n, x_{n+1}-q \rangle + (b n+{\bar{c}}-s)\langle d_n ,{\dot{x}}_{n+1}\rangle \\ \\ = s \bigg ( U_{n+1} - \gamma _n U_{n} -\gamma _n \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \bigg ) \\ \quad \qquad +\; (\lambda k_{n}) (b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1})- A_{\lambda } (x_{n}),{\dot{x}}_{n+1}\rangle \\ \quad \qquad +\; (b n+{\bar{c}}-s) \left( \frac{k_{n}-k_{n-1}}{k_{n-1}} + 1-\gamma _n \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \\ = s \left( U_{n+1} - \gamma _n U_{n} \right) \\ \quad \qquad +\; (\lambda k_{n}) (b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1})- A_{\lambda } (x_{n}),{\dot{x}}_{n+1}\rangle \\ \quad \qquad +\; \bigg ( (b n+{\bar{c}}-s) \frac{k_{n}-k_{n-1}}{k_{n-1}} + (b n+{\bar{c}}-s) \left( 1-\gamma _n \right) - s \gamma _n \bigg ) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle . \end{array} \end{aligned}$$

The latter equality can be rewritten as

$$\begin{aligned} \begin{array}{l} (b n+{\bar{c}})^{-1}\Gamma _{n} = s ( U_{n+1} -\gamma _n U_n ) \\ \quad +\; \lambda k_{n}(b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \quad +\; \left( (b n+{\bar{c}}-s) \frac{k_{n}-k_{n-1}}{k_{n-1}}+ (b n+{\bar{c}})(1-\gamma _n) -s \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle . \end{array} \end{aligned}$$
(2.16)

Moreover, we recall that \(\gamma _n= 1- \frac{a_2}{ b n + {\bar{c}}}\). Then, by an easy computation, we obtain the following two equalities

$$\begin{aligned}&\hbox { }\ \gamma _{n} (b n+{\bar{c}})= (b n+{\bar{c}})- a_2 = (b (n-1)+{\bar{c}})+ b - a_2, \\&\hbox { }\ (b n+{\bar{c}})(1-\gamma _n) -s= a_2 -s . \end{aligned}$$

It is also readily checked from (1.23) that \(k_n\) satisfies

$$\begin{aligned} \frac{ k_n - k_{n-1}}{ k_{n-1}}= \frac{a}{b n +c}. \end{aligned}$$
(2.17)

Therefore by the previous arguments we get

$$\begin{aligned} \begin{aligned} \Gamma _n = s ( b n&+{\bar{c}}) U_{n+1} - s (b (n-1)+{\bar{c}}) U_{n} + s \left( a_2-b \right) U_n \\&+ (b n+{\bar{c}}) \left( a \frac{b n+{\bar{c}}-s }{b n +c}+ a_2 -s \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\&+ \lambda k_{n} (b n+{\bar{c}})(b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle . \end{aligned} \end{aligned}$$
(2.18)

Consequently, by noticing that \({{\mathcal {E}}}_{n}(s,q)={ G}_{n}(s,q) + s ( b(n-1) +{\bar{c}}) U_{n}\) (in light of (2.6) and (2.15a)), and using (2.14) and (2.18), we deduce

$$\begin{aligned}&\dot{{\mathcal {E}}}_{n+1}(s,q) + s \left( a_2-b \right) U_n + (b n+{\bar{c}}) \left( a \frac{b n+{\bar{c}}-s}{b n +c}+ a_2 -s \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\&\qquad \qquad \qquad + \lambda k_{n} (b n+{\bar{c}})(b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\&\qquad \qquad \qquad + \frac{1}{2} (b n+{\bar{c}})^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \\&= - \frac{1}{2}\left( a_1-b -s \right) \big ( (2 n+1)b + 2{\bar{c}}- a_1\big ) \Vert {\dot{x}}_{n+1}\Vert ^2 . \end{aligned}$$

In addition, the well-known property of \(\lambda \)-co-coerciveness of \(A_{\lambda }\) implies that

$$\begin{aligned} \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \ge \lambda \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}) \Vert ^2. \end{aligned}$$
(2.19)

Thus, for \(n \ge N_0\), for some \(N_0\) large enough (which also ensures that \((b n+{\bar{c}}-(a_1-b))\) is positive), and for \(s \in (0, a_1-b]\), by the previous two inequalities we are led to

$$\begin{aligned} \dot{{\mathcal {E}}}_{n+1}(s,q)+ & {} s \left( a_2-b \right) U_n + (b n+{\bar{c}}) \left( a \frac{b n+{\bar{c}}-s}{b n +c}+ a_2 -s \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\+ & {} \lambda ^2 k_{n} (b n+{\bar{c}})(b n+{\bar{c}}-s) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}) \Vert ^2 \\+ & {} \frac{1}{2} (b n+{\bar{c}})^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \\+ & {} \frac{1}{2}\left( a_1-b -s \right) \big ( (2 n+1) b + 2{\bar{c}}- a_1 \big ) \Vert {\dot{x}}_{n+1}\Vert ^2 \le 0. \end{aligned}$$

This last inequality, recalling that \(U_n:= \langle \lambda k_{n-1} A_{\lambda } (x_{n}), x_{n}-q \rangle \), is nothing but (2.8). \(\square \)

3 CRIPA in the general case of monotone operators.

3.1 Main estimates.

A series of estimates are obtained here by means of a Lyapunov analysis (based upon Lemma 2) and using the reformulation of CRIPA (from Lemma 1). Our main results (in Theorem 1) will be derived as a combination of the previous series of estimates.

3.1.1 Estimates from the energy-like sequence.

The next result is obtained from Lyapunov properties of \(({{\mathcal {E}}}_n(s,q))\) for convenient choices of the involved parameters.

Lemma 3

Suppose that (1.2) holds and that \(\{x_n\} \subset {\mathcal {H}}\) is generated by CRIPA with sequences \((k_n)\), \((\theta _n)\) and \((\gamma _n)\) (given by (1.23) and (1.24)), along with constants \(\{\lambda , k_0\}\subset (0,\infty )\), \(\{ a, c, a_1,a_2 \}\subset [0,\infty )\) and \(\{ b, {\bar{c}}\}\subset (0,\infty )\) verifying

$$\begin{aligned} a_2 \ge b \,\hbox { and }\, a_1 > b + (a+ a_2). \end{aligned}$$
(3.1)

Assume, in addition, that c and \({\bar{c}}\) are chosen as follows:

$$\begin{aligned}&\hbox {if} \;\; a>0 \hbox {, then} \; \; c > a_1 - (a_2+a) , {\bar{c}}=c+(a_2+a); \end{aligned}$$
(3.2a)
$$\begin{aligned}&\hbox {if}\;\; a =0 \hbox {, then}\; \;{\bar{c}}> \max \{a_1,a_2\}. \end{aligned}$$
(3.2b)

Then, for any \(q\in S\), the sequence \(({{\mathcal {E}}}_n(a+a_2,q))_{n \ge N_1}\) (for some integer \(N_1\) large enough) is non-increasing and convergent. Moreover, the following estimates are reached:

$$\begin{aligned}&\hbox { }\ \sup _{n \ge N_1} \Vert x_n-q\Vert ^2 \le \frac{2{{\mathcal {E}}}_{N_1}(a+a_2,q) }{ (a+a_2)(a_1-b- (a+a_2))}, \end{aligned}$$
(3.3a)
$$\begin{aligned}&\hbox { }\ \sup _ {n \ge N_1} ( b(n-1)+{\bar{c}}) k_{n-1} \langle A_{\lambda }(x_n) , x_{n}-q \rangle \le \frac{{{\mathcal {E}}}_{N_1}(a+a_2,q)}{\lambda (a+a_2) }, \end{aligned}$$
(3.3b)
$$\begin{aligned}&\hbox { }\ \sup _n n \Vert {\dot{x}}_{n}\Vert < \infty , \end{aligned}$$
(3.3c)
$$\begin{aligned}&\hbox { }\ \sum _{n \ge N_1} (a_2-b) k_{n-1} \langle A_{\lambda }(x_n), x_{n}-q \rangle \le \frac{{{\mathcal {E}}}_{N_1}(a+a_2,q)}{\lambda (a+a_2)} , \end{aligned}$$
(3.3d)
$$\begin{aligned}&\hbox { }\ \sum _{n \ge N_1} k_n ( b n +{\bar{c}}) \big ( b n + {\bar{c}}- (a+a_2) \big ) \Vert A_{\lambda }(x_{n+1})- A_{\lambda }(x_{n}) \Vert ^2 \le \frac{{{\mathcal {E}}}_{N_1}(a+a_2,q)}{\lambda ^2}, \end{aligned}$$
(3.3e)
$$\begin{aligned}&\hbox { }\ \sum _{n \ge N_1} ( b n + {\bar{c}}) \Vert {\dot{x}}_{n+1}\Vert ^2 \le \frac{2 {{\mathcal {E}}}_{N_1}(a+a_2,q) }{a_1-b- (a+a_2)} , \end{aligned}$$
(3.3f)
$$\begin{aligned}&\hbox { }\ \sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2 < \infty . \end{aligned}$$
(3.3g)

Proof

Clearly, we have \(a+ a_2 \in (0, a_1-b)\) (by condition (3.1)). It can also be checked that condition (3.2) ensures that

$$\begin{aligned} \begin{array}{l} a \frac{b n+{\bar{c}}-( a+ a_2)}{b n +c}+ a_2 = a+ a_2. \end{array} \end{aligned}$$
(3.4)

Consequently, for \(q \in S\), using Lemma 2 with \(s=a+ a_2\), we get, for \(n \ge N_1\) (with \(N_1\) large enough),

$$\begin{aligned} \begin{array}{l} \dot{\mathcal{E}}_{n+1}(a+ a_2,q) + \lambda (a+ a_2) k_{n-1} \left( a_2-b \right) \langle A_{\lambda } (x_{n}), x_n -q \rangle \\ + \lambda ^2 k_{n} (b n+{\bar{c}})\big (b n+{\bar{c}}-(a+ a_2) \big ) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n})\Vert ^2 \\ + \frac{1}{2} (b n+{\bar{c}}) ^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \\ + \frac{1}{2}\big ( a_1-b -(a+ a_2) \big ) \big ( (2n +1)b + 2{\bar{c}}- a_1 \big ) \Vert {\dot{x}}_{n+1}\Vert ^2 \le 0, \end{array} \end{aligned}$$
(3.5)

together with

$$\begin{aligned}&b n+{\bar{c}}-(a+ a_2) >0, \end{aligned}$$
(3.6a)
$$\begin{aligned}&(2 n+1) b + 2{\bar{c}}- a_1 >0. \end{aligned}$$
(3.6b)

It follows immediately that the non-negative sequence \((\mathcal{E}_{n}(a+a_2,q))_{n \ge N_1}\) is non-increasing, since \(a_2-b\) is assumed to be non-negative and since \( a_1-b -(a+ a_2) \) is assumed to be positive (in light of condition (3.1)). Hence, \(({{\mathcal {E}}}_{n}(a+ a_2,q))_{n \ge N_1}\) is convergent and bounded. Moreover, from (2.6), we recall that

$$\begin{aligned} \begin{array}{l} {{\mathcal {E}}}_n(a+a_2,q)= \frac{1}{2} (a+ a_2) \big ( a_1-b -(a+a_2) \big ) \Vert x_{n}-q\Vert ^2 \\ + \lambda (a+ a_2) k_{n-1} ( b (n-1) +{\bar{c}}) \langle A_{\lambda }(x_{n}),x_{n}-q \rangle \\ + \frac{1}{2} \Vert (a+ a_2) (q-x_{n})- (b n+{\bar{c}}-a_1) {\dot{x}}_{n}\Vert ^2 . \end{array} \end{aligned}$$
(3.7)

Then, by the inequality \({{\mathcal {E}}}_{n}(a+ a_2,q)\le \mathcal{E}_{N_1}(a+ a_2,q)\) (for \(n \ge N_1\)), we get

$$\begin{aligned}&\hbox { }\ \frac{1}{2}(a+ a_2) \big (a_1-b -(a+ a_2) \big ) \Vert x_{n}-q\Vert ^2 \le {{\mathcal {E}}}_{N_1}(a+ a_2,q), \end{aligned}$$
(3.8)
$$\begin{aligned}&\hbox { }\ \lambda (a+ a_2) k_{n-1} ( b(n-1) +{\bar{c}}) \langle A_{\lambda }(x_n),x_{n}-q \rangle \le {{\mathcal {E}}}_{N_1}(a+ a_2,q), \end{aligned}$$
(3.9)
$$\begin{aligned}&(b n +{\bar{c}}-a_1) \Vert {\dot{x}}_{n}\Vert - (a+ a_2)\Vert q-x_{n}\Vert \le \sqrt{ 2 {{\mathcal {E}}}_{N_1}(a+ a_2,q)}. \end{aligned}$$
(3.10)

Estimates (3.3a), (3.3b) and (3.3c) are direct consequences of these last three inequalities. Furthermore, by summing (3.5) from \(n=N_1\) to \(n=N\) (for any given integer \(N \ge N_1\)) we obtain

$$\begin{aligned} \begin{array}{lll} {{\mathcal {E}}}_{N+1}(a+ a_2,q) \\ + \lambda (a+ a_2) \left( a_2-b \right) \sum _{n=N_1}^N k_{n-1} \langle A_{\lambda } (x_{n}), x_n -q \rangle \\ + \lambda ^2 \sum _{n=N_1}^N k_{n} (b n+{\bar{c}})\big ( b n+{\bar{c}}- (a+ a_2)\big ) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n})\Vert ^2 \\ + \frac{1}{2}\left( a_1-b -(a+ a_2) \right) \sum _{n=N_1}^N \big ( (2 n+1)b + 2{\bar{c}}- a_1 \big ) \Vert {\dot{x}}_{n+1}\Vert ^2 \\ + \frac{1}{2} \sum _{n=N_1}^N (b n+{\bar{c}}) ^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \le {{\mathcal {E}}}_{N_1}(a+ a_2,q), \end{array} \end{aligned}$$
(3.11)

which, in light of (3.1) and (3.6), entails that

$$\begin{aligned}&\hbox { }\ \lambda (a_2+a) \left( a_2-b \right) \sum _{n=N_1}^N k_{n-1} \langle A_{\lambda } (x_{n}), x_n -q \rangle \le {{\mathcal {E}}}_{N_1}(a_2+a,q), \end{aligned}$$
(3.12a)
$$\begin{aligned}&\hbox { }\ \lambda ^2 \sum _{n=N_1}^N k_{n} (b n+{\bar{c}})\big (b n+{\bar{c}}-(a+a_2)\big ) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n})\Vert ^2 \le {{\mathcal {E}}}_{N_1}(a_2+a,q) , \end{aligned}$$
(3.12b)
$$\begin{aligned}&\hbox { }\ \frac{1}{2}\left( a_1-b -(a+ a_2) \right) \sum _{n=N_1}^N \big ( (2 n+1) b + 2{\bar{c}}- a_1\big ) \Vert {\dot{x}}_{n+1}\Vert ^2\le {{\mathcal {E}}}_{N_1}(a_2+a,q), \end{aligned}$$
(3.12c)
$$\begin{aligned}&\frac{1}{2} \sum _{n=N_1}^N (b n+{\bar{c}}) ^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2\le {{\mathcal {E}}}_{N_1}(a_2+a,q). \end{aligned}$$
(3.12d)

This straightforwardly yields (3.3d), (3.3e) and (3.3f). The last estimate (3.3g) is simply deduced from (3.3f) and (3.12d) (in light of the definition of \(\theta _n\)). \(\square \)

3.1.2 Estimates from the reformulation of the method.

In this section we establish additional estimates regarding CRIPA, especially on \((A_{\lambda }(x_{n}))\), by combining the results of Lemma 3 with the formulation of the method given in (2.4b).

Lemma 4

Assume, in addition to the assumptions of Lemma 3, that \(a_2\) and b satisfy

$$\begin{aligned} a_2 > 2b. \end{aligned}$$
(3.13)

Then we have the following results:

$$\begin{aligned}&\sum _n n \Vert {\dot{x}}_{n}+ \lambda k_{n-1} A_{\lambda } (x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.14a)
$$\begin{aligned}&\Vert {\dot{x}}_{n}+ \lambda k_{n-1} A_{\lambda } (x_{n})\Vert =o(n^{-1}), \end{aligned}$$
(3.14b)
$$\begin{aligned}&\sum _n n k_{n-1}^2 \Vert A_{\lambda } (x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.14c)
$$\begin{aligned}&\sum _n n k_{n-1} | \langle A_{\lambda } (x_{n}) , {\dot{x}}_{n+1}\rangle | < \infty . \end{aligned}$$
(3.14d)

Proof

Let us prove (3.14a) and (3.14b). For \(n\ge 1\), according to Lemma 1 we have

$$\begin{aligned} {\dot{x}}_{n+1}+ \lambda k_{n} A_{\lambda } (x_{n+1})= \theta _n{\dot{x}}_{n}+ \gamma _n \lambda k_{n-1} A_{\lambda } (x_{n}), \end{aligned}$$
(3.15)

where \(\{ \theta _n, \gamma _n \} \subset (0,1)\) are given by (1.24) with \({\bar{c}}=c+a_2+a\) (under the assumptions of Lemma 3), namely

$$\begin{aligned} \theta _n= 1-a_1 (b n +{\bar{c}})^{-1}\hbox { and }\gamma _n= 1-a_2 (b n +{\bar{c}})^{-1}. \end{aligned}$$
(3.16)

Observe that (3.15) can be rewritten as

$$\begin{aligned} {\dot{x}}_{n+1}+ \lambda k_{n} A_{\lambda } (x_{n+1})= \gamma _n ({\dot{x}}_{n}+ \lambda k_{n-1} A_{\lambda } (x_{n}) ) + (\theta _n-\gamma _n ) {\dot{x}}_{n}, \end{aligned}$$
(3.17)

so, by setting

$$\begin{aligned} H_n = {\dot{x}}_{n}+ \lambda k_{n-1} A_{\lambda } (x_{n}), \end{aligned}$$
(3.18)

we can equivalently formulate (3.17) as

$$\begin{aligned} \begin{array}{l} H_{n+1} = \gamma _n H_{n} + ( \theta _n-\gamma _n) {\dot{x}}_{n}= \gamma _n H_{n} + (1-\gamma _n) \frac{\theta _n-\gamma _n}{1-\gamma _n} {\dot{x}}_{n}. \end{array} \end{aligned}$$
(3.19)

Then by convexity of the squared norm we infer that

$$\begin{aligned} \Vert H_{n+1} \Vert ^2&\le \gamma _n \Vert H_{n}\Vert ^2 + (1-\gamma _n) \left( \frac{\gamma _n- \theta _n}{1-\gamma _n} \right) ^2 \Vert {\dot{x}}_{n}\Vert ^2\nonumber \\&= \gamma _n \Vert H_{n}\Vert ^2 + \frac{\left( \gamma _n- \theta _n\right) ^2}{1-\gamma _n} \Vert {\dot{x}}_{n}\Vert ^2. \end{aligned}$$
(3.20)

Moreover, by (3.16) we readily have

$$\begin{aligned} \theta _n-\gamma _n= (a_2 -a_1) (b n +{\bar{c}})^{-1}\hbox { and }1-\gamma _n= a_2 (b n +{\bar{c}})^{-1}, \end{aligned}$$
(3.21)

which amounts to

$$\begin{aligned} \frac{\left( \gamma _n- \theta _n\right) ^2}{1-\gamma _n}= \frac{(a_2 -a_1)^2 (b n +{\bar{c}})^{-2}}{ a_2 (b n +{\bar{c}})^{-1}}=\frac{(a_2 -a_1)^2 }{ a_2 } (b n +{\bar{c}})^{-1}. \end{aligned}$$
(3.22)

Consequently, in light of (3.16), (3.20) and (3.22), we obtain

$$\begin{aligned} \Vert H_{n+1} \Vert ^2 \le \left( 1-a_2 (b n +{\bar{c}})^{-1} \right) \Vert H_{n}\Vert ^2 + a_2^{-1} (a_2 -a_1)^2 (b n +{\bar{c}})^{-1} \Vert {\dot{x}}_{n}\Vert ^2. \end{aligned}$$
(3.23)

Next, multiplying this last inequality by \((bn +{\bar{c}})^2\) gives us

$$\begin{aligned} (b n +{\bar{c}})^2\Vert H_{n+1} \Vert ^2 \le&(b n +{\bar{c}})\left( b n +{\bar{c}}-a_2 \right) \Vert H_{n}\Vert ^2 \nonumber \\&+ a_2^{-1} (a_2 -a_1)^2 (b n +{\bar{c}}) \Vert {\dot{x}}_{n}\Vert ^2, \end{aligned}$$
(3.24)

while we simply have

$$\begin{aligned} \begin{array}{l} (b n +{\bar{c}})\left( b n +{\bar{c}}-a_2 \right) -(b(n-1) +{\bar{c}})^2 \\ = (b n +{\bar{c}})^2 -(b(n-1) +{\bar{c}})^2 -a_2(b n +{\bar{c}}) \\ \le 2b (b n +{\bar{c}})-a_2(b n +{\bar{c}})= -( a_2-2b ) (b n +{\bar{c}}), \end{array} \end{aligned}$$

or equivalently

$$\begin{aligned} (b n +{\bar{c}})\left( b n +{\bar{c}}- a_2 \right) \le (b(n-1) +{\bar{c}})^2- ( a_2-2b )(b n +{\bar{c}}). \end{aligned}$$
(3.25)

It follows from the two estimates (3.24) and (3.25) that

$$\begin{aligned} \begin{array}{l} (b n +{\bar{c}})^2\Vert H_{n+1} \Vert ^2 - (b(n-1) +{\bar{c}})^2 \Vert H_{n} \Vert ^2 \\ + (b n +{\bar{c}})( a_2-2b ) \Vert H_{n}\Vert ^2 \le a_2^{-1} (a_2 -a_1)^2 (b n +{\bar{c}}) \Vert {\dot{x}}_{n}\Vert ^2. \end{array} \end{aligned}$$
(3.26)

Thus, recalling that \(a_2 -2b >0\) (from (3.13)) and that \(\sum _n n \Vert {\dot{x}}_{n}\Vert ^2 < \infty \) (according to Lemma 3), we deduce in a standard way that \(\sum _n n \Vert H_{n} \Vert ^2 < \infty \) (that is, (3.14a)) and that there exists \( l_1 \ge 0\) such that

$$\begin{aligned} \lim _{n \rightarrow +\infty } (b(n-1) +{\bar{c}})^2 \Vert H_{n} \Vert ^2=l_1. \end{aligned}$$

Notice that we clearly have \( \lim _{n \rightarrow \infty } (bn)^2 \Vert H_{n} \Vert ^2=l_1\) (since \(\frac{ (b(n-1) +{\bar{c}})^2}{(bn)^2} \rightarrow 1\) as \(n \rightarrow \infty \)). So, by \(\sum _n n \Vert H_{n} \Vert ^2 < \infty \) (in light of (3.14a)), and noticing that \(\sum _n n^{-1} =\infty \), we deduce that \(l_1=0\), which leads to (3.14b).

Let us prove (3.14c) and (3.14d). Clearly, according to the definition of \(H_n\), we simply have

$$\begin{aligned} n \Vert \lambda k_{n-1} A_{\lambda } (x_{n})\Vert ^2 \le 2 n \Vert {\dot{x}}_{n}+\lambda k_{n-1} A_{\lambda } (x_{n})\Vert ^2 + 2 n \Vert {\dot{x}}_{n}\Vert ^2= 2 n \Vert H_n\Vert ^2 + 2 n \Vert {\dot{x}}_{n}\Vert ^2, \end{aligned}$$

hence, by \(\sum _n n \Vert {\dot{x}}_{n}\Vert ^2< \infty \) (from (3.3f)) and \(\sum _n n \Vert H_n\Vert ^2< \infty \) (from (3.14a)), we immediately obtain (3.14c). In addition, Young’s inequality readily gives us

$$\begin{aligned} \begin{array}{l} n k_{n-1}|\langle A_{\lambda } (x_{n}) , {\dot{x}}_{n+1}\rangle | \le \frac{1}{2} n \Vert k_{n-1} A_{\lambda } (x_{n})\Vert ^2 + \frac{1}{2} n \Vert {\dot{x}}_{n+1}\Vert ^2. \end{array} \end{aligned}$$
(3.27)

The estimate (3.14d) is then obtained as an immediate consequence of (3.27), along with the results \(\sum _n n \Vert k_{n-1} A_{\lambda } (x_{n})\Vert ^2 < \infty \) (from (3.14c)) and \(\sum _n n \Vert {\dot{x}}_{n}\Vert ^2< \infty \) (from (3.3f)). \(\square \)

3.2 Asymptotic convergence and main results.

3.2.1 Convergence in the general case of parameters.

The following result establishes the convergence of CRIPA in a general setting of parameters.

Theorem 1

Let \(A : {\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\) be a maximally monotone operator such that \(S:=A^{-1}(0) \ne \emptyset \). Let \(\{\lambda , k_0 \}\) be positive constants and assume that \(\{z_n, x_n\} \subset {\mathcal {H}}\) are generated by CRIPA with \((k_n)\), \((\theta _n)\) and \((\gamma _n)\) (given by (1.23) and (1.24)), along with constants \(\{a, c, a_1,a_2\}\subset [0,\infty )\) and \(\{b, {\bar{c}}\}\subset (0,\infty )\) verifying

$$\begin{aligned}&a_2>2b ,\; a_1 > b + a+a_2 ; \end{aligned}$$
(3.28a)
$$\begin{aligned}&\hbox {if}\; \; a>0 \hbox {, then} \;c > a_1 - (a_2+a) , {\bar{c}}=c+a_2+a; \end{aligned}$$
(3.28b)
$$\begin{aligned}&\hbox {if} \;\; a =0 \hbox {, then} \;{\bar{c}}> \max \{a_1,a_2\}. \end{aligned}$$
(3.28c)

Then \((x_n)\) and \((z_n)\) converge weakly to some element of S and the following results are reached:

$$\begin{aligned}&\Vert {\dot{x}}_{n+1}\Vert =o( n^{-1}), \sum _n n \Vert {\dot{x}}_{n+1}\Vert ^2 < \infty , \end{aligned}$$
(3.29a)
$$\begin{aligned}&\Vert A_{\lambda } (x_n) \Vert =o( (nk_n)^{-1}), \sum _n n k_n^2 \Vert A_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(3.29b)
$$\begin{aligned}&\sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2< \infty , \sum _{n} k_n n^2 \Vert A_{\lambda }(x_{n+1})- A_{\lambda }(x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.29c)
$$\begin{aligned}&\text{ for } \text{ any } q\in S, \sum _{n } k_{n} \langle A_{\lambda }(x_n), x_{n}-q \rangle <\infty . \end{aligned}$$
(3.29d)

Theorem 1 will be proved in Appendix 3.

3.2.2 Convergence results in particular cases of parameters.

The estimates given in Theorem 1 depend on the parameter \((k_n)\); the next result highlights some specific properties of \((k_n)\) given by (1.23).

Proposition 2

Let \((k_n) \subset (0,\infty )\) be given by (1.23) with \(b >0\) and \(\{ a, c\} \subset [0,\infty )\). Suppose for some nonnegative integer p that \([\frac{a}{b}] \ge p \) (where \([\frac{a}{b}]\) denotes the integer part of \(\frac{a}{b}\)). Then there exist a positive constant C (depending on \(k_0\), \([\frac{a}{b}]\), \([\frac{c}{b}]\)) and a positive integer \(n_p\) (depending on \([\frac{a}{b}]\)) for which \(k_n\) satisfies

$$\begin{aligned} k_n \ge C n^p\hbox { for }n \ge n_p. \end{aligned}$$
(3.30)

The proof of Proposition 2 is given in Appendix 2.
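The growth rate (3.30) is also easy to observe numerically; in the illustrative run below (with \(a=2\), \(b=1\), hence \([\frac{a}{b}]=2\)), the ratio \(k_n/n^2\) stabilizes to a positive constant:

```python
# Illustrative check of Proposition 2 with a = 2, b = 1, c = 1, k_0 = 1:
# k_n = k_{n-1} (1 + a / (b n + c)) grows like C n^{a/b} = C n^2.
a, b, c, k = 2.0, 1.0, 1.0, 1.0
N = 100000
for n in range(1, N + 1):
    k *= 1.0 + a / (b * n + c)
print(k / N**2)  # approaches a positive constant C
```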

The above proposition allows us to give more precise estimates with respect to the involved parameters.

Specifically, the next two results are immediate consequences of Theorem 1 and Proposition 2.

The first theorem is related to the special case of CRIPA when \(a=0\).

Theorem 2

(Convergence of CRIPA with constant relaxation factors) Let \(A : {\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\) be maximally monotone, with \(S:=A^{-1}(0) \ne \emptyset \). Let \(\{\lambda , k_0 \}\) be positive constants and assume that \(\{z_n, x_n\} \subset {\mathcal {H}}\) are generated by CRIPA-S with parameters \((\theta _n)\) and \((\gamma _n)\) (given by (1.24)), along with constants \(\{b, {\bar{c}}, a_1, a_2 \}\subset (0,\infty )\) verifying

$$\begin{aligned}&a_2>2b ,\,\, a_1 > b + a_2 , \end{aligned}$$
(3.31a)
$$\begin{aligned}&\hbox { }\ {\bar{c}}> \max \{a_1,a_2 \}. \end{aligned}$$
(3.31b)

Then \((x_n)\) and \((z_n)\) converge weakly to some element of S and the following results are reached:

$$\begin{aligned}&\Vert {\dot{x}}_{n+1}\Vert =o( n^{-1}), \sum _n n \Vert {\dot{x}}_{n+1}\Vert ^2< \infty , \end{aligned}$$
(3.32a)
$$\begin{aligned}&\Vert A_{\lambda } (x_n) \Vert =o( n^{-1}), \sum _n n \Vert A_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(3.32b)
$$\begin{aligned}&\sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2< \infty , \sum _{n} n^2 \Vert A_{\lambda }(x_{n+1})- A_{\lambda }(x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.32c)
$$\begin{aligned}&\text{ for } \text{ any } q\in S, \sum _{n } \langle A_{\lambda }(x_n), x_{n}-q \rangle <\infty . \end{aligned}$$
(3.32d)

The second theorem is related to the particular case of CRIPA when \(a>0\).

Theorem 3

(Convergence of CRIPA with varying relaxation factors) Let \(A : {\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\) be maximally monotone, with \(S:=A^{-1}(0) \ne \emptyset \), and let \(\{x_n, z_n \} \subset {\mathcal {H}}\) be generated by CRIPA with parameters \((\theta _n)\) and \((\gamma _n)\) (given by (1.24)). Suppose for some nonnegative integer p that \(\{a, b, c, a_1,a_2, {\bar{c}}\}\) are positive constants verifying

$$\begin{aligned}&a_2>2b ,\,\, [\frac{a}{b}] \ge p, \,\, a_1 > b + a+a_2 , \end{aligned}$$
(3.33a)
$$\begin{aligned}&c > a_1 - (a_2+a) ,\,\, {\bar{c}}=c+a_2+a. \end{aligned}$$
(3.33b)

Then \((x_n)\) and \((z_n)\) converge weakly to some element of S and the following results are reached:

$$\begin{aligned}&\Vert {\dot{x}}_{n+1}\Vert =o( n^{-1}), \sum _n n \Vert {\dot{x}}_{n+1}\Vert ^2 < \infty , \end{aligned}$$
(3.34a)
$$\begin{aligned}&\Vert A_{\lambda } (x_n) \Vert =o( n^{-(p+1)}), \sum _n n^{2p+1} \Vert A_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(3.34b)
$$\begin{aligned}&\sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2< \infty , \sum _{n} n^{p+2} \Vert A_{\lambda }(x_{n+1})- A_{\lambda }(x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.34c)
$$\begin{aligned}&\text{ for } \text{ any } q\in S, \sum _{n } n^p \langle A_{\lambda }(x_n), x_{n}-q \rangle <\infty . \end{aligned}$$
(3.34d)

4 CRIPA in the convex case.

In this section, by following the methodology used by Attouch-László [2], we present our main results relative to the minimization problem

$$\begin{aligned} \inf _{x \in {\mathcal {H}}}f(x), \end{aligned}$$
(4.1)

where \(f : {\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\) is a proper, convex and lower semi-continuous function such that \(\mathrm{argmin} f \ne \emptyset \).

Indeed, by Fermat’s rule we know that (4.1) is equivalent to the monotone inclusion problem

$$\begin{aligned} \text{ find } x \in {\mathcal {H}}\hbox { such that }0 \in \partial f (x). \end{aligned}$$
(4.2)

Moreover, in the special case when \(A = \partial f\), CRIPA reduces to the following algorithm.

(CRIPA-convex):

\(\rhd \) Step 1 (initialization): Let \(\{ z_{-1}, x_{-1}, x_0 \} \subset {\mathcal {H}}\).

\(\rhd \) Step 2 (main step): Given \(\{z_{n-1},x_{n-1}, x_n\} \subset {\mathcal {H}}\) (with \(n \ge 0\)), we compute the updates by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}- x_{n}), \end{aligned}$$
(4.3a)
$$\begin{aligned}&x_{n+1}=\frac{1}{1+k_n} z_n + \frac{k_n}{1+k_n} \mathrm{prox}_{ \lambda (1+k_n) f} (z_n), \end{aligned}$$
(4.3b)

where \((k_n)\), \((\theta _n)\) and \((\gamma _n)\) are given by (1.23) and (1.24).
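Since \(J_{r \partial f}=\mathrm{prox}_{rf}\), the update (4.3b) only requires a prox oracle; a minimal sketch of one step for \(f=\Vert \cdot \Vert _1\) (the helper names are ours):

```python
import numpy as np

def prox_l1(z, r):
    """prox_{r f}(z) for f = ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - r, 0.0)

def cripa_convex_step(z, k, lam):
    """One update (4.3b): J_{r df} = prox_{r f} with r = lam (1 + k)."""
    return z / (1.0 + k) + (k / (1.0 + k)) * prox_l1(z, lam * (1.0 + k))

print(cripa_convex_step(np.array([2.0, -0.4]), k=1.0, lam=0.5))  # [1.5, -0.2]
```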

As the special case of the latter algorithm when \(a=0\), we also consider the following method:

(CRIPA-S-convex):

\(\rhd \) Step 1 (initialization): Let \(\{ z_{-1}, x_{-1}, x_0 \} \subset {\mathcal {H}}\).

\(\rhd \) Step 2 (main step): Given \(\{z_{n-1},x_{n-1}, x_n\} \subset {\mathcal {H}}\) (with \(n \ge 0\)), we compute the updates by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}- x_{n}), \end{aligned}$$
(4.4a)
$$\begin{aligned}&x_{n+1}=\frac{1}{1+k_0} z_n + \frac{k_0}{1+k_0} \mathrm{prox}_{ \lambda (1+k_0) f} (z_n), \end{aligned}$$
(4.4b)

where \(k_0\) is a positive constant, and where \((\theta _n)\) and \((\gamma _n)\) are given by (1.24).

Remark 4

As a fundamental tool, we also recall that the Yosida approximation of \(\partial f\) is equal to the gradient of the Moreau envelope of f. Namely, for any \(\lambda > 0\), we have \((\partial f)_{\lambda } = \nabla f_{\lambda }\), where \(f_{\lambda } : {\mathcal {H}}\rightarrow \mathrm{I\!R}\) is a \(C^{1,1}\) function, which is defined for any \(x \in {\mathcal {H}}\) by:

$$\begin{aligned} \begin{array}{l} f_{\lambda }(x)= \inf _{\xi \in {\mathcal {H}}} \left\{ f(\xi )+ \frac{1}{2} \lambda ^{-1} \Vert x - \xi \Vert ^2 \right\} . \end{array} \end{aligned}$$
(4.5)

So, an alternative formulation of CRIPA-convex in terms of the Moreau envelope is given by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}- x_{n}), \end{aligned}$$
(4.6a)
$$\begin{aligned}&\hbox { }\ x_{n+1}=z_n - \lambda k_n \nabla f_{\lambda (1+k_n)} (z_n). \end{aligned}$$
(4.6b)
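In practice, both \(f_{\lambda }\) and \(\nabla f_{\lambda }=(\partial f)_{\lambda }\) are evaluated through the prox; a sketch for \(f=\Vert \cdot \Vert _1\) (helper names are ours):

```python
import numpy as np

def prox_l1(x, r):
    return np.sign(x) * np.maximum(np.abs(x) - r, 0.0)

def moreau_env_l1(x, lam):
    """f_lambda(x) from (4.5); the infimum is attained at xi = prox_{lam f}(x)."""
    p = prox_l1(x, lam)
    return np.sum(np.abs(p)) + np.sum((x - p) ** 2) / (2.0 * lam)

def grad_moreau_l1(x, lam):
    """nabla f_lambda(x) = (x - prox_{lam f}(x)) / lam."""
    return (x - prox_l1(x, lam)) / lam
```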

Before presenting our results regarding CRIPA-convex, we recall some properties of the Moreau envelope through the following lemma.

Lemma 5

Let \(f : {\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\) be a lower semi-continuous convex and proper function such that \(\mathrm{argmin} f \ne \emptyset \), and let \(q \in S:=\mathrm{argmin} f\). Then the following properties are obtained:

$$\begin{aligned}&0 \le f_\lambda (x_{n}) -\min f \le \langle \nabla f_{\lambda }(x_n), x_{n}-q \rangle , \end{aligned}$$
(4.7a)
$$\begin{aligned}&0 \le f (\mathrm{prox}_{\lambda f}(x_{n})) -\min f \le f_\lambda (x_{n}) -\min f , \end{aligned}$$
(4.7b)
$$\begin{aligned}&(2 \lambda )^{-1} \Vert x_{n}-\mathrm{prox}_{\lambda f}(x_{n})\Vert ^2 \le f_\lambda (x_{n}) -\min f . \end{aligned}$$
(4.7c)

Proof

Item (4.7a) is immediate from the gradient inequality. In addition, by definition of \(f_{\lambda }\) and the proximal mapping, we have

$$\begin{aligned} f_\lambda (x_{n}) -\min f= f (\mathrm{prox}_{\lambda f}(x_{n})) -\min f + (2 \lambda )^{-1} \Vert x_{n}-\mathrm{prox}_{\lambda f}(x_{n})\Vert ^2. \end{aligned}$$
(4.8)

This obviously implies items (4.7b) and (4.7c). \(\square \)

Now we are in a position to state the main result of this section.

Theorem 4

(Convergence of CRIPA-convex) Let \(f : {\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\) be a lower semi-continuous convex and proper function such that \(S:=\mathrm{argmin} f \ne \emptyset \), and let \(\{x_n, z_n \} \subset {\mathcal {H}}\) be generated by CRIPA-convex with parameters \((\theta _n)\) and \((\gamma _n)\) (given by (1.24)). Suppose for some nonnegative integer p that \(\{a, b, c, a_1,a_2,{\bar{c}}\}\) are positive constants verifying

$$\begin{aligned}&a_2>2b ,\;\quad [\frac{a}{b}] \ge p , \; a_1 > b + a+a_2 , \end{aligned}$$
(4.9a)
$$\begin{aligned}&c > a_1 - (a_2+a), \quad {\bar{c}}=c+(a_2+a). \end{aligned}$$
(4.9b)

Then the following properties are obtained:

$$\begin{aligned}&\Vert x_{n+1}-x_{n}\Vert =o( n^{-1}), \Vert \nabla f_{\lambda } (x_n) \Vert =o( n^{-(p+1)}) , \end{aligned}$$
(4.10a)
$$\begin{aligned}&\sum _n n \Vert x_{n+1}-x_{n}\Vert ^2< \infty , \sum _n n^{2p+1} \Vert \nabla f_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(4.10b)
$$\begin{aligned}&\hbox {for any } q\in S, \; \sum _{n } n^p \langle \nabla f_{\lambda }(x_n), x_{n}-q \rangle <\infty , \end{aligned}$$
(4.10c)
$$\begin{aligned}&\exists {\bar{x}}\in S \hbox { s.t. } (x_n,z_n) \rightharpoonup ({\bar{x}},{\bar{x}}) \hbox { weakly in } {\mathcal {H}}^2. \end{aligned}$$
(4.10d)

We also have the convergence rates:

$$\begin{aligned}&f_\lambda (x_{n}) -\min f = o( n^{-(p+1)}), \quad \sum _n n^p (f_\lambda (x_{n}) -\min f) < \infty , \end{aligned}$$
(4.11a)
$$\begin{aligned}&f (\mathrm{prox}_{\lambda f}(x_{n})) -\min f = o( n^{-(p+1)}), \quad \sum _n n^p \left( f (\mathrm{prox}_{\lambda f}(x_{n})) -\min f \right) < \infty , \end{aligned}$$
(4.11b)
$$\begin{aligned}&\Vert x_{n}-\mathrm{prox}_{\lambda f}(x_{n})\Vert = o( n^{-\frac{p+1}{2}}), \quad \sum _n n^p \Vert x_{n}-\mathrm{prox}_{\lambda f}(x_{n})\Vert ^2 < \infty . \end{aligned}$$
(4.11c)

Proof

The results in item (4.10) are direct consequences of Theorem 3. Then (4.7a), in light of \( \Vert \nabla f_{\lambda } (x_n) \Vert =o( n^{-(p+1)})\) (from (4.10a)) and the boundedness of \((x_n)\) (which follows from (4.10d)), yields the first result in item (4.11a). The second result in (4.11a) follows immediately from (4.7a) and (4.10c). In addition, item (4.11b) is readily deduced from (4.7b) and (4.11a), while (4.11c) is obtained from (4.7c) and (4.11a). \(\square \)

Theorem 5

(Convergence of CRIPA-S-convex) Let \(f : {\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\) be a proper, convex and lower semi-continuous function such that \(S:=\mathrm{argmin} f \ne \emptyset \). Let \(\{\lambda , k_0 \}\) be positive constants, and assume that \(\{x_n, z_n\} \subset {\mathcal {H}}\) are generated by CRIPA-S-convex with parameters \((\theta _n)\) and \((\gamma _n)\) (given by (1.24)), along with constants \(\{b, a_1, a_2, {\bar{c}}\}\subset (0,\infty )\) verifying

$$\begin{aligned}&a_2>2b, \quad a_1 > b + a_2 , \end{aligned}$$
(4.12a)
$$\begin{aligned}&{\bar{c}}> \max \{a_1,a_2 \}. \end{aligned}$$
(4.12b)

Then the following results hold:

$$\begin{aligned}&\Vert {\dot{x}}_{n+1}\Vert =o( n^{-1}), \sum _n n \Vert {\dot{x}}_{n+1}\Vert ^2 < \infty , \end{aligned}$$
(4.13a)
$$\begin{aligned}&\Vert \nabla f_{\lambda } (x_n) \Vert =o( n^{-1}), \sum _n n \Vert \nabla f_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(4.13b)
$$\begin{aligned}&\sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2< \infty , \sum _{n} n^2 \Vert \nabla f _{\lambda }(x_{n+1})- \nabla f _{\lambda }(x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(4.13c)
$$\begin{aligned}&\hbox {for any } q\in S, \; \sum _{n } \langle \nabla f_{\lambda }(x_n), x_{n}-q \rangle <\infty , \end{aligned}$$
(4.13d)
$$\begin{aligned}&\exists {\bar{x}}\in S \hbox { s.t. } (x_n,z_n) \rightharpoonup ({\bar{x}},{\bar{x}}) \hbox { weakly in } {\mathcal {H}}^2. \end{aligned}$$
(4.13e)

Proof

The items in (4.13) are direct consequences of Theorem 2. \(\square \)

5 Numerical experiments

In this section, we perform some numerical experiments to illustrate the behavior of CRIPA relative to several benchmark methods.

5.1 The maximally monotone case

As was done to illustrate the performance of PRINAM in [2], we consider the model example of the skew-symmetric and maximally monotone operator \(A: \mathrm{I\!R}^2 \rightarrow \mathrm{I\!R}^2\) defined for \((\xi , \eta ) \in \mathrm{I\!R}^2\) by \( A(\xi , \eta ) = (-\eta , \xi )\). It is well-known that A is not the sub-differential of a convex function. We also recall that A possesses a unique zero \(x^*=(0,0)\), and that A and its Yosida regularization \(A_{\lambda }\) can be identified respectively with the matrices

$$\begin{aligned} A= \left( \begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array} \right) , \quad A_{\lambda } = \left( \begin{array}{cc} \frac{\lambda }{\lambda ^2+1} & - \frac{1}{\lambda ^2+1} \\ \frac{1}{\lambda ^2+1} & \frac{\lambda }{\lambda ^2+1} \end{array} \right) . \end{aligned}$$
(5.1)
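The identification (5.1) can be checked with a few lines of NumPy, forming \(A_{\lambda } = \lambda ^{-1}\left( I - (I+\lambda A)^{-1}\right) \) directly from the resolvent and comparing it to the closed form (a sanity check, independent of the experiments reported below):

```python
import numpy as np

lam = 0.25
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # the skew-symmetric operator of Sect. 5.1

# Yosida regularization via the resolvent: A_lam = (I - (I + lam A)^{-1}) / lam.
I = np.eye(2)
A_lam = (I - np.linalg.inv(I + lam * A)) / lam

# Closed form from (5.1).
A_lam_closed = np.array([[lam, -1.0],
                         [1.0,  lam]]) / (lam ** 2 + 1.0)

assert np.allclose(A_lam, A_lam_closed)
```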

We approximate the zero of A by means of several algorithms: CRIPA, KAPPA (namely, Kim's accelerated proximal point algorithm given in (1.20)) and PRINAM (given in (1.17)–(1.18)). Figures 1 and 2 display the profiles of \(\Vert x_{n}-x^*\Vert \) for the sequences \((x_n)\) generated by these algorithms:

  • Figure 1 illustrates the behavior of the iterates \((x_n)\) generated by CRIPA (with constant relaxation factors and proximal index) and by KAPPA (which only uses constant indexes). The profile obtained for KAPPA with the proximal parameter \(\mu = 0.01\) is compared with those of CRIPA for several values of \(k_0\) and \(\lambda \) such that \(\lambda (k_0+1) = \mu \). The starting points used are \(x_0=x_{-1}=z_{-1}=(1,-1)\) for CRIPA, and \(x_0=z_0=z_{-1}=(1,-1)\) for KAPPA. We run each method until the stopping criterion \(\Vert x_n -x^*\Vert \le 10^{-7}\) holds. The performances of both algorithms are similar on this simple example. However, one can notice that KAPPA exhibits numerous oscillations, which do not occur for CRIPA.

  • Figure 2 illustrates the behavior of the iterates \((x_n)\) generated by CRIPA (using varying relaxation factors and unbounded proximal indexes) and by PRINAM (also using unbounded proximal indexes). The profile obtained for PRINAM (when using the same parameters as in the optimal simulation proposed in [2]) is compared with those of CRIPA (for several values of p). The starting points used are \(x_0=(-1,1)\) and \(x_1=(1,-1)\) for PRINAM, and \(x_0=z_{-1}=(-1,1)\) and \(x_1=(1,-1)\) for CRIPA. Here we use the stopping criterion \(\Vert x_n -x^*\Vert \le 10^{-5}\). PRINAM converges faster than CRIPA with \(p=1\); however, the convergence of the trajectories of CRIPA is considerably accelerated for \(p \ge 2\).

Fig. 1

CRIPA with \(b=1\), \(a_2=2.5b\), \(a_1=1.5(b+a_2)\), \({\bar{c}}=1.5\max \{a_1,a_2\}\), \(\lambda =0.001\) for (C1) and \(\lambda =0.005\) for (C2). For (C1) and (C2), \(k_0\) depends on \(\lambda \) through \(k_0=0.01\lambda ^{-1}-1\). KAPPA is considered with \(\mu =0.01\)

Fig. 2

CRIPA with \(\lambda =0.001\), \(k_0=0.01\), \(b=1\), \(a_2=3\), \(a=1.5(2^p-1)b\), \(a_1=1.5(b+a+a_2)\), \(c=1.5(a_1-a_2-a)\), \({\bar{c}}=c+a_2+a\). PRINAM is considered with \(s=0.1\), \(\beta =0.025\), \(r=0.1\), \(q=-0.1\), \(\lambda _1=1.01(2\beta +s)^2 r^2 s^{-1}\)

5.2 The convex case

Given a symmetric and positive definite matrix \(A: \mathrm{I\!R}^N \rightarrow \mathrm{I\!R}^N\), we consider the convex quadratic programming problem

$$\begin{aligned} \min _{x \in \mathrm{I\!R}^N} \left\{ f(x): = \frac{1}{2}\langle A x, x \rangle \right\} . \end{aligned}$$
(5.2)

It is clear that A possesses a unique zero \(x^*=0 \) and that (5.2) is equivalent to solving

$$\begin{aligned} 0 \in (\partial f) ({\bar{x}})= A{\bar{x}}. \end{aligned}$$
(5.3)
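For the quadratic objective in (5.2), the proximal mapping reduces to a linear solve, \(\mathrm{prox}_{\lambda f}(x)=(I+\lambda A)^{-1}x\), which is what a proximal method evaluates at each step here. A minimal sketch (with a hypothetical random seed, chosen for reproducibility only; the construction \(A=B^TB\) matches the experiment described below):

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, for illustration only
N = 100
B = rng.uniform(-1.0, 1.0, size=(N, N))  # entries b_ij in [-1, 1]
A = B.T @ B                              # symmetric positive definite (a.s.)

def prox_quadratic(x, lam):
    # prox_{lam f}(x) for f(x) = 0.5 <Ax, x> solves (I + lam A) p = x.
    return np.linalg.solve(np.eye(N) + lam * A, x)

x0 = np.ones(N)
print(np.linalg.norm(prox_quadratic(x0, 0.001)))  # one prox evaluation
```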

We approximate the solution to (5.3) by means of the following algorithms: CRIPA, AFB (given in (1.10)) and IGAHD (see [4]).

Remark 5

For the convenience of the reader, we recall that IGAHD was introduced in [4] for minimizing a smooth convex function \(f: {\mathcal {H}}\rightarrow \mathrm{I\!R}\) with L-Lipschitz continuous gradient. For some nonnegative values \(\{ s, \alpha , \beta \}\), this procedure is given by

$$\begin{aligned} \begin{array}{l} y_{n} =x_n + \left( 1-\frac{\alpha }{n} \right) (x_{n}- x_{n-1})-\beta \sqrt{s} \left( \nabla f(x_{n}) - \nabla f(x_{n-1}) \right) - \frac{\beta \sqrt{s}}{n}\nabla f(x_{n-1}) , \\ x_{n+1}= y_n - s \nabla f( y_{n} ). \end{array} \end{aligned}$$
(5.4)

Convergence of the function values at the rate \(o(n^{-2})\), as well as fast convergence to zero of the gradients (that is, \(\sum _n n^2 \Vert \nabla f(x_{n}) \Vert ^2 < \infty \)), were established under the conditions

$$\begin{aligned} \alpha \ge 3, \quad 0 \le \beta < 2\sqrt{s} \quad \hbox {and} \quad s \le L^{-1}. \end{aligned}$$
(5.5)
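A direct transcription of (5.4) into Python might read as follows (a sketch under the stated conditions (5.5), assuming f is supplied through its gradient):

```python
import numpy as np

def igahd(grad_f, x0, s, alpha, beta, n_iters=5000):
    """Sketch of IGAHD as written in (5.4).

    The parameters should satisfy (5.5): alpha >= 3, 0 <= beta < 2*sqrt(s)
    and s <= 1/L, where L is the Lipschitz constant of grad_f.
    """
    x_prev, x = x0.copy(), x0.copy()
    g_prev = grad_f(x_prev)
    for n in range(1, n_iters + 1):
        g = grad_f(x)
        # Extrapolation with Hessian-driven damping (first line of (5.4)).
        y = (x + (1.0 - alpha / n) * (x - x_prev)
             - beta * np.sqrt(s) * (g - g_prev)
             - (beta * np.sqrt(s) / n) * g_prev)
        # Gradient step (second line of (5.4)).
        x_prev, x, g_prev = x, y - s * grad_f(y), g
    return x
```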

For our numerical simulation we take \(N=100\) and \(A= B^TB\), where \(B=(b_{i,j})_{1\le i,j \le N}\) is a random invertible matrix with entries \(b_{i,j} \in [-1,1]\).

Figures 3, 4 and 5 display the profiles of \(\Vert x_{n}-x^*\Vert \) for the iterates \((x_n)\) generated by these algorithms. The starting points used are \(x_1=x_0=(1,1,\ldots ,1)\) for IGAHD and for AFB, while we similarly choose \(x_1=x_0=z_{-1} =(1,1,\ldots ,1)\) for CRIPA. Here we use the stopping criterion \(\Vert x_n -x^*\Vert \le 10^{-5}\):

  • In order to compare CRIPA with AFB and IGAHD, we first give some insight into the influence of the parameter \(\alpha \) on the trajectories generated by the latter two algorithms. Figures 3 and 4 feature the profiles obtained for AFB and IGAHD (for several values of \(\alpha \)).

Fig. 3

AFB with \(\lambda =0.001\) and several values of \(\alpha \)

Fig. 4

IGAHD with \(\lambda =0.001\), \(\beta =0.9\cdot 2 \sqrt{\lambda }\) and several values of \(\alpha \)

  • In Fig. 5, some profiles obtained for IGAHD and AFB are compared with those of CRIPA (for several values of p).

Fig. 5

CRIPA with the same parameters as in Fig. 2. IGAHD and AFB are considered with \(\alpha =10\) and the other parameters unchanged from Figs. 3 and 4