1 Introduction

1.1 The considered problem

Let \({{\mathcal {H}}} \) be a real Hilbert space endowed with inner product and induced norm denoted by \(\langle .,. \rangle \) and \(\Vert .\Vert \), respectively. Our goal is to propose and study a rapidly converging method for solving the monotone inclusion problem

$$\begin{aligned} \text{ find } {\bar{x}} \in {\mathcal {H}}\,\, \hbox {such that }0 \in A {\bar{x}} , \end{aligned}$$
(1.1)

under the following conditions:

$$\begin{aligned}&A : {\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\,\, \hbox {is maximally monotone on}\,\, {\mathcal {H}}, \end{aligned}$$
(1.2a)
$$\begin{aligned}&S:=A ^{-1}(0) \ne \emptyset . \end{aligned}$$
(1.2b)

This problem finds many important applications in scientific fields such as image processing, computer vision, machine learning, signal processing, optimization, equilibrium theory, economics, game theory, partial differential equations, statistics, and so on (see, e.g., [7, 9, 17, 31, 33, 42, 46, 47]). It includes, as special cases, variational inequalities and convex-concave saddle-point problems. In particular, we recall that (1.1)–(1.2) encompasses the non-smooth convex minimization problem

$$\begin{aligned} \min _{{\mathcal {H}}} g , \end{aligned}$$
(1.3)

where g verifies the following conditions:

$$\begin{aligned}&g: {\mathcal {H}}\rightarrow (-\infty , \infty ] \text{ is proper, convex and lower semi-continuous}, \end{aligned}$$
(1.4a)
$$\begin{aligned}&{\mathrm{argmin}} _{\mathcal {H}}\, g \ne \emptyset . \end{aligned}$$
(1.4b)

A typical method for computing zeroes of a maximally monotone operator \(A:{\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\) is the so-called proximal point algorithm, PPA for short (see Martinet [36], Rockafellar [44, 45]), which consists of the iteration

$$\begin{aligned} x_{n+1}= J_{\mu A } (x_{n}), \end{aligned}$$
(1.5)

where \(J_{\mu A}:=(I+\mu A)^{-1}\) denotes the proximal mapping of A (with index \(\mu \)), which is well-known to be single-valued and everywhere defined (see, e.g., [13, 24, 30] for more details). Many celebrated algorithms can be recast as specific cases of PPA. As examples we mention the augmented Lagrangian method [26, 41, 43], the alternating direction method of multipliers (ADMM for short, see [1, 12, 18, 22]), the split inexact Uzawa method [47], as well as operator splitting methods, including the Douglas-Rachford splitting method [19, 34] and its generalized variant in [18] (also inspired by the Peaceman-Rachford splitting method [40]). Thus any enhancement of PPA has a wide range of applications.
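To fix ideas, the following minimal Python sketch (ours, not code from the cited works) implements iteration (1.5), assuming a resolvent oracle `resolvent(mu, x)` that evaluates \(J_{\mu A}(x)\); the linear operator below is only an illustrative choice.

```python
import numpy as np

def ppa(resolvent, x0, mu=1.0, n_iter=200):
    """Proximal point algorithm (1.5): x_{n+1} = J_{mu A}(x_n)."""
    x = x0
    for _ in range(n_iter):
        x = resolvent(mu, x)
    return x

# Illustrative operator A x = M x (monotone, since M + M^T is positive
# semi-definite), with resolvent J_{mu A}(x) = (I + mu M)^{-1} x and unique zero 0.
M = np.array([[2.0, 1.0], [-1.0, 2.0]])
resolvent = lambda mu, x: np.linalg.solve(np.eye(2) + mu * M, x)
print(ppa(resolvent, np.array([1.0, -1.0])))  # close to [0, 0]
```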

It is well-known that PPA generates weakly convergent sequences \((x_n)\) with worst-case rates \(\Vert x_{n+1}-x_{n}\Vert = \mathcal{O}(n^{-\frac{1}{2}})\) and \(\Vert A_{\mu }(x_n)\Vert = \mathcal{O}(n^{-\frac{1}{2}})\) (see, e.g., [15, 18, 23]), where \(A_{\mu }= \mu ^{-1} (I-J_{\mu A})\) denotes the Yosida regularization of A. This latter operator enjoys numerous nice properties (see, for instance, [13, 14]). In particular, it satisfies \(A_{\mu }^{-1}(0)=S:=A^{-1} (0 )\), and the quantity \(\Vert A_{\mu }(x)\Vert \) (for \(x\in {\mathcal {H}}\)) can be used to measure how accurately x approximates a zero of A (see [18]).

The minimization problem (1.3)–(1.4), as a special instance of (1.1)–(1.2) when \(A=\partial g\) (where \(\partial g\) is the Fenchel sub-differential of g), can be solved by PPA, in which the resolvent operator of A reduces to

$$\begin{aligned} J_{\mu \partial g }(x)=\mathrm{prox}_{\mu g}(x):=\mathrm{argmin} _{ y \in {\mathcal {H}}} \left( g(y)+ (2\mu )^{-1}\Vert x-y\Vert ^2 \right) . \end{aligned}$$
(1.6)
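For instance, for \(g=\Vert \cdot \Vert _1\) the minimization in (1.6) is solved coordinate-wise by soft-thresholding; a minimal sketch (the helper name `prox_l1` is ours):

```python
import numpy as np

def prox_l1(x, mu):
    """prox_{mu g}(x) for g = ||.||_1, solving (1.6) coordinate-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

print(prox_l1(np.array([1.5, -0.3, 0.8]), mu=0.5))  # [ 1.  -0.   0.3]
```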

Recall that the corresponding PPA has been considerably enhanced by means of extrapolation processes based upon Nesterov’s and Güler’s acceleration techniques. The latter variants generate convergent sequences \((x_n)\) with worst-case convergence rates \(g(x_n)-\inf _{{\mathcal {H}}}g =o(n^{-2})\) (for the function values), \(\Vert x_{n+1}-x_{n}\Vert = o(n^{-1})\) (for the discrete velocity) and \(\Vert x_n^*\Vert =o(n^{-1})\) (for the sub-gradients \(x_n^* \in \partial g(x_n) \)), instead of the worst-case rates \(g(x_n)-\inf _{{\mathcal {H}}} g ={{\mathcal {O}}}(n^{-1})\), \(\Vert x_{n+1}-x_{n}\Vert ={{\mathcal {O}}}(n^{-\frac{1}{2}})\) and \(\Vert x_n^*\Vert ={{\mathcal {O}}}(n^{-\frac{1}{2}})\) established for PPA.

Our purpose here is to extend the above results to sequences \((x_n)\) given by some accelerated variant of PPA, dedicated to solving the more general problem (1.1)–(1.2), especially in terms of the quantities \(\Vert x_{n+1}-x_{n}\Vert \) and \(\Vert A_{\lambda }(x_n)\Vert \) (for some positive value \(\lambda \)).

Specifically, for solving (1.1), we introduce CRIPA (Corrected Relaxed Inertial Proximal Algorithm) which enters the following framework of sequences \(\{z_n, x_n\} \subset {\mathcal {H}}\) generated from starting points \(\{z_{-1},x_{-1},x_0\}\) by

$$\begin{aligned}&\ z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}-x_n), \end{aligned}$$
(1.7a)
$$\begin{aligned}&x_{n+1}=(1-w_n )z_n+ w_n J_{r_n A}(z_{n}) , \end{aligned}$$
(1.7b)

where \(\{ w_n, \theta _n, \gamma _n \} \subset (0,1)\) and \((r_n) \subset (0,\infty )\). These parameters will be specified later on.

The considered algorithm combines relaxation factors \((w_n)\), a momentum term “\(\theta _n(x_{n}-x_{n-1})\)” (based on Nesterov’s acceleration techniques) and a correction term “\(\gamma _n ( z_{n-1}-x_n)\)” (similar to that introduced by Kim [27] in some variant of PPA).

Compared to PPA, CRIPA keeps the computational cost of each iteration basically unchanged, while allowing us to extend, to the wide framework of maximal monotone inclusions, the interesting convergence properties obtained for the accelerated variants relative to convex minimization. More precisely, using conveniently chosen parameters \(\{r_n , w_n, \theta _n, \gamma _n\}\), we establish, among other results (see Theorem 1), the weak convergence of \((x_n)\) towards some zero of A, with the convergence rate \(\Vert x_{n+1}-x_{n}\Vert = o(n^{-1})\) for the discrete velocity. In addition, when using constant proximal indexes \(r_n\) (see Theorem 2), we obtain the accuracy of \((x_n)\) to an equilibrium with the worst-case rate \(\Vert A_{\lambda }(x_n)\Vert =o(n^{-1})\) (for some positive value \(\lambda \)). This latter estimate is considerably improved when using increasing proximal indexes (see Theorem 3). To the best of our knowledge, no such convergence results have been established so far for algorithmic solutions to (1.1)–(1.2).

1.2 A brief review of the state of the art.

1.2.1 Convex minimization and Güler’s acceleration processes.

Let us give some reminders concerning the useful and efficient methods proposed by Güler [25], based upon ideas of Nesterov [37] (also see [38, 39]), for minimizing a convex lower semi-continuous function g. As particular instances of these acceleration techniques (when using a constant proximal index \(\mu \)), we mention the following two algorithms:

The first consists of the sequences \(\{x_n,z_n\}\) given for \(n \ge 0\) by

$$\begin{aligned}&x_{n+1}= J_{\mu \partial g} (z_n), \end{aligned}$$
(1.8a)
$$\begin{aligned}&\hbox { }\ t_{n+1}=\frac{1}{2}\left( 1+ \sqrt{1+ 4t_n^2} \right) , \end{aligned}$$
(1.8b)
$$\begin{aligned}&z_{n+1}= x_{n+1} + \frac{t_n-1}{t_{n+1}} (x_{n+1} - x_n). \end{aligned}$$
(1.8c)

The second method consists of \(\{x_n,z_n\}\) given for \(n \ge 0\) by

$$\begin{aligned}&x_{n+1}= J_{\mu \partial g} (z_n), \end{aligned}$$
(1.9a)
$$\begin{aligned}&\hbox { }\ t_{n+1}=\frac{1}{2}\left( 1+ \sqrt{1+ 4t_n^2} \right) , \end{aligned}$$
(1.9b)
$$\begin{aligned}&z_{n+1}= x_{n+1} + \frac{t_n-1}{t_{n+1}} (x_{n+1} - x_n)+ \frac{t_n}{t_{n+1}} (x_{n+1} - z_n). \end{aligned}$$
(1.9c)

Note that both methods include a momentum term \(\frac{t_n-1}{t_{n+1}} (x_{n+1} - x_n)\) while the second one incorporates an additional correction term \(\frac{t_n}{t_{n+1}} (x_{n+1} - z_n)\).
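As an illustration, here is a Python sketch of the second method (1.9), assuming a prox oracle `prox(mu, z)` for \(J_{\mu \partial g}(z)\) and the usual initialization \(t_0=1\); dropping the correction term in the update of z recovers (1.8). This is only a sketch, not the authors' code.

```python
import numpy as np

def guler_second(prox, x0, mu=1.0, n_iter=200):
    """Sketch of Guler's second method (1.9); prox(mu, z) evaluates J_{mu dg}(z)."""
    x, z, t = x0, x0.copy(), 1.0
    for _ in range(n_iter):
        x_new = prox(mu, z)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))   # (1.9b)
        z = x_new + ((t - 1.0) / t_new) * (x_new - x) \
                  + (t / t_new) * (x_new - z)              # momentum + correction (1.9c)
        x, t = x_new, t_new
    return x
```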

Both methods (1.8) and (1.9) were shown to produce iterates \((x_n)\) that guarantee a worst-case rate \(g(x_n)-\inf _{{\mathcal {H}}} g= {{\mathcal {O}}}(n^{-2})\) for the function values. However, the convergence of the iterates has not been established.

This drawback was overcome by Chambolle-Dossal [16] (also see Attouch-Peypouquet [5]), through the variant of (1.8) given by

$$\begin{aligned} \begin{array}{l} z_{n} =x_n + \frac{n-1}{ n+\alpha -1} (x_{n}- x_{n-1}), \\ x_{n+1}= J_{\mu \partial g}( z_{n} ), \end{array} \end{aligned}$$
(1.10)

for some positive constant \(\alpha \). It was proved for \(\alpha >3\) (see [5]) that (1.10) generates (weakly) convergent sequences \((x_n)\) that minimize the function values with a complexity result of \(o(n^{-2})\), instead of the rates \({{\mathcal {O}}}(n^{-1})\) and \({{\mathcal {O}}}(n^{-2})\) obtained for (1.5) and Güler’s processes (1.8)–(1.9), respectively.
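A sketch of (1.10) under the same prox-oracle convention, with an illustrative \(\alpha >3\):

```python
def chambolle_dossal(prox, x0, mu=1.0, alpha=3.1, n_iter=200):
    """Sketch of the inertial proximal method (1.10) with alpha > 3."""
    x_prev = x = x0
    for n in range(1, n_iter + 1):
        z = x + ((n - 1.0) / (n + alpha - 1.0)) * (x - x_prev)  # momentum step
        x_prev, x = x, prox(mu, z)                              # proximal step
    return x
```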

Note that the iterates \(\{x_n,z_n\}\) generated by the second model of Güler (1.9) satisfy, for \(n \ge 1\),

$$\begin{aligned}&z_{n}= x_{n} + \frac{t_{n-1}-1}{t_{n}} (x_{n} - x_{n-1})+ \frac{t_{n-1}}{t_{n}} (x_{n} - z_{n-1}), \end{aligned}$$
(1.11)
$$\begin{aligned}&x_{n+1}= J_{\mu \partial g } (z_n). \end{aligned}$$
(1.12)

Thus the second algorithm (1.9), which is closely related to the optimized gradient methods discussed by Kim-Fessler [28, 29], uses a correction term different from the one involved in (1.7).

1.2.2 Monotone inclusions and acceleration processes.

Let us emphasize that, regarding the existing algorithmic solutions to (1.1)–(1.2) with an arbitrary monotone operator, there is no theoretical convergence result analogous to that obtained for (1.10). Many papers have been dedicated to accelerating PPA in its general form by means of relaxation and inertial techniques. It seems (to the best of our knowledge) that only empirical accelerations have been obtained, except for the recent works by Attouch-Peypouquet [5], Attouch-László [2] and Kim [27]:

An accelerated variant of PPA has been proposed and investigated by Attouch-Peypouquet [5] through the following RIPA (Regularized Inertial Proximal Algorithm)

$$\begin{aligned}&z_n=x_n+ \left( 1-\frac{\alpha }{n} \right) (x_{n}-x_{n-1}), \end{aligned}$$
(1.13)
$$\begin{aligned}&x_{n+1}=\frac{\lambda _n}{\lambda _n+s} z_n+ \frac{s}{\lambda _n+s} J_{(\lambda _n+s)A} (z_n), \end{aligned}$$
(1.14)

where \(\{s, \alpha , \lambda _n\}\) are positive parameters such that

$$\begin{aligned} \alpha>2 \quad \text{ and } \quad \lambda _n=(1+\epsilon )\frac{s}{\alpha ^2}n ^2 \ \text{ (for some } \epsilon >0). \end{aligned}$$
(1.15)

It was established (see [5, Theorem 3.6]) that (1.13)–(1.15) produces convergent sequences \((x_n)\) with worst-case rates

$$\begin{aligned} \Vert x_{n+1}-x_{n}\Vert = {{\mathcal {O}}}(n^{-1}) \text{ and } \Vert A_{\lambda _n+s}x_n \Vert = o(n^{-1}). \end{aligned}$$
(1.16)

This algorithm was applied by Attouch [1] to convex structured minimization problems with linear constraints and it gave rise to an inertial proximal ADMM algorithm.
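A sketch of RIPA (1.13)–(1.15), assuming a resolvent oracle `resolvent(r, z)` for \(J_{rA}(z)\); the constants are illustrative choices satisfying (1.15):

```python
def ripa(resolvent, x0, s=1.0, alpha=2.1, eps=0.1, n_iter=200):
    """Sketch of RIPA (1.13)-(1.14) with lambda_n from (1.15)."""
    x_prev = x = x0
    for n in range(1, n_iter + 1):
        lam = (1.0 + eps) * s * n * n / alpha**2          # (1.15)
        z = x + (1.0 - alpha / n) * (x - x_prev)          # (1.13)
        x_prev = x
        x = (lam / (lam + s)) * z \
            + (s / (lam + s)) * resolvent(lam + s, z)     # (1.14)
    return x
```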

Another numerical approach to (1.1)–(1.2) was addressed by Attouch-László [2] through the following PRINAM (Proximal Regularized Inertial Newton Algorithm)

$$\begin{aligned}&z_{n-1} =\left( 1 -\beta \left( \frac{1}{\lambda _n}- \frac{1}{\lambda _{n-1}}\right) \right) x_n + \left( \alpha _{n}- \frac{\beta }{\lambda _{n-1}} \right) {\dot{x}}_{n}\nonumber \\&\qquad \qquad +\ \beta \left( \frac{1}{\lambda _n} J_{\lambda _{n}A}(x_{n}) - \frac{1}{\lambda _{n-1}} J_{\lambda _{n-1}A}(x_{n-1})\right) , \end{aligned}$$
(1.17a)
$$\begin{aligned}&x_{n+1} = \frac{\lambda _{n+1}}{\lambda _{n+1}+s} z_{n-1} + \frac{s}{\lambda _{n+1}+s} J_{(\lambda _{n+1}+s)A}(z_{n-1} ), \end{aligned}$$
(1.17b)

where \({\dot{x}}_{n}=x_n - x_{n-1}\), while s, \(\beta \), \((\lambda _n)\) and \((\alpha _n)\) are positive parameters that satisfy

$$\begin{aligned} \alpha _n= \frac{r n + q-1}{r(n+1)+q} \text{ with } r>0 \text{ and } q \in (-\infty ,\infty ), \lambda _n= \lambda n^2 \text{ with } \lambda > \frac{(2\beta +s)^2 r^2}{s}. \end{aligned}$$
(1.18)

It was established (see [2, Theorem 1.1]) that (1.17a)–(1.17b) generates convergent sequences \((x_n)\) with worst-case rates

$$\begin{aligned} \Vert x_{n+1}-x_{n}\Vert = {{\mathcal {O}}}(n^{-1})\hbox { and }\Vert A_{\lambda _n}(x_n)\Vert = o(n^{-2}). \end{aligned}$$
(1.19)

This second method also uses a correction term "\(\beta \left( \frac{1}{\lambda _n} J_{\lambda _{n}A}(x_{n}) - \frac{1}{\lambda _{n-1}} J_{\lambda _{n-1}A}(x_{n-1})\right) \)", different from the one involved in (1.7).

Note that, for accelerating the proximal point algorithm, both of the previous methods require proximal indexes \((\lambda _n)\) that go to infinity as \(n \rightarrow \infty \).

Remark 1

This last observation is fundamental in view of decomposition techniques, which require using only constant or bounded proximal indexes; for instance, in forward-backward algorithms ([33]), or in operator splitting methods ([18, 19, 21]).

This is not the case for the accelerated proximal point method proposed by Kim [27], based on the performance estimation problem (PEP) approach of Drori-Teboulle [20], which reads, for initial iterates \(\{x_0, z_0, z_{-1}\} \subset {\mathcal {H}}\) and for \(n \ge 0\),

$$\begin{aligned}&x_{n+1}= J_{\mu A} (z_n), \end{aligned}$$
(1.20a)
$$\begin{aligned}&z_{n+1}= x_{n+1} + \frac{n}{n+2} (x_{n+1} - x_n)+ \frac{n}{n+2} ( z_{n-1}-x_{n}). \end{aligned}$$
(1.20b)

This yields the worst-case convergence rate \(\Vert x_{n}-z_{n-1}\Vert = {{\mathcal {O}}}(n^{-1})\) (see [27, Theorem 4.1]), or equivalently \(\Vert A_{\mu } (z_n)\Vert ={{\mathcal {O}}}(n^{-1})\), which entails (since \(A_{\mu }\) is Lipschitz continuous)

$$\begin{aligned} \Vert A_{\mu } (x_n)\Vert ={{\mathcal {O}}}(n^{-1}). \end{aligned}$$
(1.21)

However, no convergence of the iterates was established for (1.20).
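For completeness, a sketch of (1.20) under the same resolvent-oracle convention:

```python
def kim_accelerated_ppa(resolvent, x0, mu=1.0, n_iter=200):
    """Sketch of Kim's accelerated PPA (1.20); resolvent(mu, z) = J_{mu A}(z)."""
    x = z = z_prev = x0
    for n in range(n_iter):
        x_new = resolvent(mu, z)                          # (1.20a)
        z_new = x_new + (n / (n + 2.0)) * (x_new - x) \
                      + (n / (n + 2.0)) * (z_prev - x)    # (1.20b)
        x, z_prev, z = x_new, z, z_new
    return x
```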

1.3 CRIPA and an overview of the related results.

1.3.1 Introducing CRIPA.

Our numerical approach to (1.1)–(1.2) is more precisely given by sequences \(\{x_n,z_n\}\) generated by the following (corrected, relaxed and inertial) algorithm:

(CRIPA):

\(\rhd \) Step 1 (initialization): Let \(\{ z_{-1}, x_{-1}, x_0 \} \subset {\mathcal {H}}\), \(\lambda >0\).

\(\rhd \) Step 2 (main step): Given \(\{z_{n-1},x_{n-1}, x_n\} \subset {\mathcal {H}}\) (with \(n \ge 0\)), we compute the updates by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n \left( z_{n-1}- x_{n}\right) , \end{aligned}$$
(1.22a)
$$\begin{aligned}&x_{n+1}= \frac{1}{1+k_{n}} z_n + \frac{k_n}{1+k_{n}} J_{\lambda (1+k_n)A} (z_n) , \end{aligned}$$
(1.22b)

where \(k_n\), \(\theta _n\) and \(\gamma _n\) are non-negative parameters defined, for some constants \(\{a, c,a_1,a_2 \}\subset [0,\infty )\) and \(\{ b, {\bar{c}}\} \subset (0,\infty )\), as follows:

  • \((k_n)\) is given recursively, from \(k_0 >0\) and for \(n \ge 1\), by

    $$\begin{aligned} k_n=k_{n-1} \left( 1 + \frac{a}{ b n + c} \right) . \end{aligned}$$
    (1.23)
  • \((\theta _n)\) and \((\gamma _n)\) are given for \(n \ge 0\) by

    $$\begin{aligned} \theta _n= 1- \frac{a_1 }{b n + {\bar{c}}}, \quad \gamma _n= 1-\frac{a_2}{b n +{\bar{c}}}. \end{aligned}$$
    (1.24)

Let us stress that, for convenience, we can assume that \({\bar{c}}>a_i\) (for \(i=1,2\)), so as to ensure that \(\{\theta _n, \gamma _n\} \subset (0,1)\) (even though this is not essential for convergence).
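The bookkeeping of CRIPA can be summarized by the following Python sketch (ours, not a definitive implementation); `resolvent(r, z)` is an assumed oracle for \(J_{rA}(z)\), and the default constants are illustrative choices satisfying the conditions of Theorem 1 below (see (3.28)).

```python
import numpy as np

def cripa(resolvent, x0, lam=1.0, k0=1.0, a=1.0, b=1.0, c=2.0,
          a1=5.0, a2=2.5, c_bar=5.5, n_iter=500):
    """Sketch of CRIPA (1.22a)-(1.22b) with parameters (1.23)-(1.24).

    resolvent(r, z) is assumed to evaluate J_{r A}(z).  The defaults satisfy
    (3.28): a2 > 2b, a1 > b + a + a2, c > a1 - (a2 + a), c_bar = c + a2 + a.
    """
    x_prev = x = z_prev = x0
    k = k0
    for n in range(n_iter):
        theta = 1.0 - a1 / (b * n + c_bar)                    # (1.24)
        gamma = 1.0 - a2 / (b * n + c_bar)
        z = x + theta * (x - x_prev) + gamma * (z_prev - x)   # (1.22a)
        x_prev = x
        x = z / (1.0 + k) + (k / (1.0 + k)) * resolvent(lam * (1.0 + k), z)  # (1.22b)
        z_prev = z
        k *= 1.0 + a / (b * (n + 1) + c)                      # (1.23), gives k_{n+1}
    return x
```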

Particular attention will be paid to the special case of CRIPA when \(a=0\) (namely, when using constant relaxation factors), which will be considered through the following formulation:

(CRIPA-S):

\(\rhd \) Step 1 (initialization): Let \(\{ z_{-1}, x_{-1}, x_0 \} \subset {\mathcal {H}}\), \(\lambda >0\).

\(\rhd \) Step 2 (main step): Given \(\{z_{n-1},x_{n-1}, x_n\} \subset {\mathcal {H}}\) (with \(n \ge 0\)), we compute the updates by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}- x_{n}), \end{aligned}$$
(1.25a)
$$\begin{aligned}&x_{n+1}=\frac{1}{1+k_{0}}z_n +\frac{k_0}{1+k_{0}} J_{\lambda (1+k_0) A} (z_n). \end{aligned}$$
(1.25b)

1.3.2 An overview of the main results.

Under convenient assumptions on the constants \(\{a, b, c, a_1, a_2, {\bar{c}}\}\), we establish (among other results) the weak convergence of \(\{x_n\}\) (generated by CRIPA) towards equilibria, and we also set up the following estimates (see Theorems 1, 2 and 3):

$$\begin{aligned}&\Vert x_{n+1}-x_{n}\Vert =o( n^{-1}), \sum _n n \Vert x_{n+1}-x_{n}\Vert ^2 < \infty , \end{aligned}$$
(1.26a)
$$\begin{aligned}&\sum _{n } n^2 \Vert (x_{n+1}-x_{n})-(x_{n}-x_{n-1})\Vert ^2 < \infty . \end{aligned}$$
(1.26b)

Moreover, given an arbitrary nonnegative integer p, we exhibit a set of conditions on the parameters that ensures the following properties (see Theorem 3)

$$\begin{aligned}&\Vert A_{\lambda } (x_n) \Vert =o( n^{-(p+1)}), \sum _n n^{2p+1} \Vert A_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(1.27a)
$$\begin{aligned}&\text{ for } \text{ any } q\in A^{-1}(0), \sum _{n } n^p \langle A_{\lambda }(x_n), x_{n}-q \rangle <\infty . \end{aligned}$$
(1.27b)

In particular, when A is the sub-differential of a proper and convex lower semi-continuous function \(f:{\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\), we reach the following convergence rates of the values (see Theorem 4)

$$\begin{aligned} f_\lambda (x_{n}) -\min f = o( n^{-(p+1)}), \sum _n n^p (f_\lambda (x_{n}) -\min f) < \infty , \end{aligned}$$
(1.28)

where \(f_{\lambda }\) denotes the Moreau envelope of f.

Our process provides a significant acceleration of PPA, even when using a constant proximal index, besides generating convergent iterates. It also involves a correction term similar to that used by Kim [27]. It would be interesting to understand the role of the correction term used in the proposed method through a continuous counterpart, but this is beyond the scope of this work. Let us mention that there is a flourishing literature devoted to accelerated continuous counterparts of PPA with general operators (see, for instance, [2, 3, 5, 8, 11, 35]).

1.4 Organization of the paper.

In Sect. 2, we give some preliminaries on CRIPA. The method is reformulated in terms of Yosida approximations and a crucial estimate is proposed for a Lyapunov analysis. Section 3 is devoted to the convergence analysis of CRIPA. A main result (Theorem 1) is established in a general setting of operators and parameters. As consequences of our main theorem, we state two other results (Theorems 2 and 3) relative to the involved parameters. In Sect. 4, we specialize our main results to the setting of convex minimization, where two related results (Theorems 4 and 5) are established. Next, numerical experiments are performed in Sect. 5, and several technical results are established in the Appendix.

Remark 2

From now on, so as to simplify the presentation, we (often) use the following notation: given any sequence \((u_n)\), we denote \({\dot{u}}_n=u_n-u_{n-1}\). We also denote the integer part of any real number a by [a].

2 Preliminaries on CRIPA.

In this section, we provide a suitable reformulation of CRIPA in terms of Yosida approximations and we exhibit a Lyapunov sequence in connection with the algorithm. These two arguments will allow us to obtain separately a series of preliminary estimates (in Sect. 3) that will be combined so as to reach our statements in Theorem 1.

2.1 Formulation of CRIPA by means of Yosida approximations.

Let sequences \(\{x_n,z_n\}\) verify (1.22b) relative to some positive parameters \(\{ k_n, \lambda \}\) and some maximally monotone operator \(A:{\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\).

By definition of the Yosida approximation we have \(A_{\lambda }=\lambda ^{-1}(I-J_{\lambda A})\) as well as

$$\begin{aligned} (I + \lambda k_n A_{\lambda })^{-1}=I-\lambda k_n (A_{\lambda })_{\lambda k_n}. \end{aligned}$$
(2.1)

Hence, according to the resolvent equation (formulated as a semi-group property) \( (A_{\delta })_{\kappa }=A_{\delta +\kappa }\) (for any positive values \(\delta \) and \(\kappa \)), we infer that

$$\begin{aligned} (I + \lambda k_n A_{\lambda })^{-1}= I-\lambda k_n A_{\lambda (1+k_n)}. \end{aligned}$$
(2.2)

So, by \(x_{n+1}= z_n-\lambda k_n A_{\lambda (1+k_n)}(z_n)\) (from (1.22b)), we deduce that

$$\begin{aligned} x_{n+1}=(I + \lambda k_n A_{\lambda })^{-1}(z_n). \end{aligned}$$
(2.3)

This observation will be of crucial interest with regard to the methodology developed below.
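Identity (2.3) is also easy to verify numerically; the following sketch checks (2.2)–(2.3) against (1.22b) for the illustrative linear monotone operator \(Ax=Mx\):

```python
import numpy as np

# Numerical check of (2.2)-(2.3) for A x = M x (monotone: M + M^T >= 0).
M = np.array([[2.0, 1.0], [-1.0, 2.0]])
I2 = np.eye(2)
lam, k = 0.7, 1.3
z = np.array([0.4, -1.1])

J = lambda r: np.linalg.inv(I2 + r * M)        # resolvent J_{r A}
A_lam = (I2 - J(lam)) / lam                    # Yosida approximation A_lambda

lhs = np.linalg.solve(I2 + lam * k * A_lam, z)               # (2.3)
rhs = z / (1 + k) + (k / (1 + k)) * (J(lam * (1 + k)) @ z)   # (1.22b)
print(np.allclose(lhs, rhs))                                 # True
```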

As a key result in our analysis, we give the following lemma.

Lemma 1

The iterates \(\{x_n, z_n\}\) generated by CRIPA satisfy (for \(n \ge 0\))

$$\begin{aligned}&z_{n}- x_{n+1}= \lambda k_n A_{\lambda }(x_{n+1}), \end{aligned}$$
(2.4a)
$$\begin{aligned}&{\dot{x}}_{n+1}+ (z_{n}- x_{n+1}) = \theta _n{\dot{x}}_{n}+ \gamma _{n} (z_{n-1}- x_{n}) . \end{aligned}$$
(2.4b)

Proof

It is not difficult to see, for \(n \ge 0\), that (2.3) is equivalent to \(z_n=x_{n+1}+ \lambda k_n A_{\lambda }(x_{n+1})\), which yields (2.4a). Furthermore, by \(z_n= x_n + \theta _n{\dot{x}}_{n}+ \gamma _n (z_{n-1}- x_{n}) \) (from (1.22a)), we simply obtain \(z_n-x_{n+1}= - {\dot{x}}_{n+1}+ \theta _n{\dot{x}}_{n}+\gamma _n (z_{n-1}- x_{n})\), which entails (2.4b). \(\square \)

As a consequence of the previous arguments we make the following observation.

Remark 3

The iterates \(\{x_n, z_n\}\) given by CRIPA satisfy (for \(n \ge 1\))

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) +\gamma _n \lambda k_{n-1} A_{\lambda }(x_n) , \end{aligned}$$
(2.5a)
$$\begin{aligned}&x_{n+1}= J_{\lambda k_n A_{\lambda } } (z_n) . \end{aligned}$$
(2.5b)

This latter formulation highlights the fact that the Moreau-Yosida regularization reconciles the operator’s lack of co-coercivity with the acceleration scheme (see [5, 6]).

2.2 A general inequality for a Lyapunov analysis

We now provide a useful inequality dedicated to a Lyapunov analysis of the considered method. With the iterates \(\{x_n \}\) produced by CRIPA, we associate the sequence \(( {{\mathcal {E}}}_n(s,q))\) defined for \((s,q) \in (0, \infty ) \times S\) and for \(n \ge 1\) by

$$\begin{aligned} \begin{array}{l} {{\mathcal {E}}}_n(s,q)= \frac{1}{2} \Vert s (q-x_{n})- (bn +{\bar{c}}-a_1) {\dot{x}}_{n}\Vert ^2 \\ + \frac{1}{2} s(a_1-b -s) \Vert x_{n}-q\Vert ^2+ s \lambda k_{n-1} ( b(n-1)+{\bar{c}}) \langle A_{\lambda }(x_{n}),x_{n}-q \rangle . \end{array} \end{aligned}$$
(2.6)

The next result will be helpful with regard to our forthcoming analysis.

Lemma 2

Suppose that (1.2) holds and let \(\{x_n\} \subset {\mathcal {H}}\) be generated by CRIPA with sequences \((k_n)\), \((\theta _n)\) and \((\gamma _n)\) (given by (1.23) and (1.24)), along with constants \(\{\lambda , k_0\} \subset (0,\infty )\), \(\{a, c, a_1,a_2 \}\subset [0,\infty )\) and \(\{b, {\bar{c}}\} \subset (0,\infty )\) verifying

$$\begin{aligned} a_1 >b. \end{aligned}$$
(2.7)

Then, for \((s,q)\in (0,a_1-b] \times {\mathcal {H}}\) and for \(n \ge N_0\) (with some \(N_0\) large enough), we have

$$\begin{aligned} \begin{array}{l} \dot{{\mathcal {E}}}_{n+1}(s,q) + s \lambda k_{n-1} \left( a_2-b \right) \langle A_{\lambda } (x_{n}), x_n -q \rangle \\ \quad \qquad +\; \lambda k_{n-1} (bn+{\bar{c}}) \left( a \frac{bn+{\bar{c}}-s}{b n +c} + a_2 -s \right) \langle A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \quad \qquad +\; \lambda ^2 k_{n} (b n+{\bar{c}})(b n+{\bar{c}}-s) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n})\Vert ^2 \\ \quad \qquad +\; \frac{1}{2} (b n+{\bar{c}}) ^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \\ \quad \qquad +\; \frac{1}{2}\left( a_1-b -s \right) \big ( (2 n+1)b + 2{\bar{c}}- a_1 \big ) \Vert {\dot{x}}_{n+1}\Vert ^2 \le 0. \end{array} \end{aligned}$$
(2.8)

The proof of Lemma 2 is divided into two parts through the next subsections.

2.2.1 Proof of Lemma 2 - part 1.

An important equality of independent interest is proposed here relative to our method through the wider framework of sequences \(\{x_n, d_{n}\} \subset {\mathcal {H}}\) and parameters \(\{e, \nu _n, \theta _n\} \subset (0,\infty )\) verifying (for \(n \ge 0\))

$$\begin{aligned}&{\dot{x}}_{n+1}- \theta _n{\dot{x}}_{n}+ d_n =0, \end{aligned}$$
(2.9a)
$$\begin{aligned}&(e+ \nu _ {n+1}) \theta _n= \nu _n. \end{aligned}$$
(2.9b)

For this purpose, we associate with (2.9) the quantity \(G_n(s,q)\) given for \((s,q)\in [0,\infty )\times {\mathcal {H}}\) by

$$\begin{aligned} G_n(s,q) = \frac{1}{2} \Vert s (q-x_{n})- \nu _n {\dot{x}}_{n}\Vert ^2+ \frac{1}{2} s(e -s) \Vert x_{n}-q\Vert ^2. \end{aligned}$$
(2.10)

Basic properties regarding the sequence \(({G}_{n}(s,q))\) are established through the following proposition.

Proposition 1

Let \(\{x_n, d_n \} \subset {\mathcal {H}}\) and \(\{ \theta _n, \nu _n, e \} \subset (0,\infty )\) verify (2.9). Then for \((s,q) \in \left( 0, e \right] \times {\mathcal {H}}\) and for \(n \ge 0\) we have

$$\begin{aligned} \begin{array}{l} {\dot{G}}_{n+1}(s,q) + \frac{1}{2} (e+\nu _{n+1}) ^2 \Vert {\dot{x}}_{n+1}- \theta _n{\dot{x}}_{n}\Vert ^2 \\ \qquad \qquad +\; s (e+\nu _{n+1}) \langle d_{n}, x_{n+1}-q \rangle \\ \qquad \qquad +\; \left( e-s +\nu _{n+1} \right) (e+\nu _{n+1}) \langle d_{n} ,{\dot{x}}_{n+1}\rangle = - \frac{1}{2}\left( e -s \right) ( e+2 \nu _{n+1} ) \Vert {\dot{x}}_{n+1}\Vert ^2 . \end{array} \end{aligned}$$
(2.11)

The proof of Proposition 1 is given in Appendix 1.

2.2.2 Proof of Lemma 2 - part 2.

It can be checked (from (2.4)) that CRIPA enters the special case of algorithm (2.9) (for \(n \ge 1\)) when taking the particular parameters \(e=a_1-b\) (with \(a_1>b\)), \(\nu _{n}=b n +{\bar{c}}-a_1\), together with \(d_n\) as below

$$\begin{aligned} d_n= \lambda k_{n} A_{\lambda } (x_{n+1})- \gamma _n \lambda k_{n-1} A_{\lambda } (x_{n}). \end{aligned}$$
(2.12)

In this specific situation, we get

$$\begin{aligned} e+\nu _{n+1}= b n +{\bar{c}}\hbox { and } e+2 \nu _{n+1}= (2n+1)b +2{\bar{c}}-a_1. \end{aligned}$$
(2.13)

Hence, for \(n \ge N_0\), for some \(N_0\) large enough (so as to ensure that \(\nu _n\) is positive), and for \(s \in (0, a_1-b]\), by Proposition 1 we obtain

$$\begin{aligned} \begin{array}{l} {\dot{G}}_{n+1}(s,q) + \frac{1}{2} (b n+{\bar{c}})^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2+ \Gamma _n \\ = - \frac{1}{2}\left( a_1-b -s \right) ( (2 n+1)b + 2{\bar{c}}- a_1) \Vert {\dot{x}}_{n+1}\Vert ^2 , \end{array} \end{aligned}$$
(2.14)

where \(G_n(s,q)\) and \(\Gamma _{n}\) are given by

$$\begin{aligned}&G_n(s,q) = \frac{1}{2} \Vert s (q-x_{n})- \left( b n +{\bar{c}}-a_1 \right) {\dot{x}}_{n}\Vert ^2+ \frac{1}{2} s(a_1-b -s) \Vert x_{n}-q\Vert ^2, \end{aligned}$$
(2.15a)
$$\begin{aligned}&\Gamma _{n} = s (b n+{\bar{c}}) \langle d_n, x_{n+1}-q \rangle + (b n+{\bar{c}}) (b n+{\bar{c}}-s)\langle d_n ,{\dot{x}}_{n+1}\rangle . \end{aligned}$$
(2.15b)

In order to evaluate \(\Gamma _n\) and for simplification, we set

$$\begin{aligned} U_n:= \langle \lambda k_{n-1} A_{\lambda } (x_{n}), x_{n}-q \rangle . \end{aligned}$$

According to the formulation of \(d_n\) (given in (2.12)) we have

$$\begin{aligned} \begin{array}{l} \langle d_n, x_{n+1}-q \rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1}) -\gamma _n \lambda k_{n-1} A_{\lambda } (x_{n}), x_{n+1}-q \right\rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1}), x_{n+1}-q \right\rangle -\gamma _n \left\langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\right\rangle - \gamma _n \left\langle \lambda k_{n-1} A_{\lambda } (x_{n}), x_{n}-q \right\rangle \\ = U_{n+1} - \gamma _n U_{n}-\gamma _n \left\langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\right\rangle , \end{array} \end{aligned}$$

as well as

$$\begin{aligned} \begin{array}{l} \langle d_n, {\dot{x}}_{n+1}\rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1}) -\gamma _n \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\right\rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1})- \lambda k_{n-1} A_{\lambda } (x_{n}) + (1-\gamma _n) \lambda k_{n-1} A_{\lambda } (x_{n}) , {\dot{x}}_{n+1}\right\rangle \\ = \left\langle \lambda k_{n} A_{\lambda } (x_{n+1})- \lambda k_{n-1} A_{\lambda } (x_{n}),{\dot{x}}_{n+1}\right\rangle + (1-\gamma _n) \left\langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\right\rangle \\ = \lambda k_{n} \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \qquad \qquad +\; \lambda \left( k_{n}-k_{n-1} \right) \langle A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle + (1-\gamma _n) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ = \lambda k_{n} \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \qquad \qquad +\; \left( \frac{k_{n}-k_{n-1}}{k_{n-1}} + (1-\gamma _n) \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle . \end{array} \end{aligned}$$

This in light of (2.15b) amounts to

$$\begin{aligned} \begin{array}{l} (b n+{\bar{c}})^{-1}\Gamma _{n} = s \langle d_n, x_{n+1}-q \rangle + (b n+{\bar{c}}-s)\langle d_n ,{\dot{x}}_{n+1}\rangle \\ \\ = s \bigg ( U_{n+1} - \gamma _n U_{n} -\gamma _n \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \bigg ) \\ \quad \qquad +\; (\lambda k_{n}) (b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1})- A_{\lambda } (x_{n}),{\dot{x}}_{n+1}\rangle \\ \quad \qquad +\; (b n+{\bar{c}}-s) \left( \frac{k_{n}-k_{n-1}}{k_{n-1}} + 1-\gamma _n \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \\ = s \left( U_{n+1} - \gamma _n U_{n} \right) \\ \quad \qquad +\; (\lambda k_{n}) (b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1})- A_{\lambda } (x_{n}),{\dot{x}}_{n+1}\rangle \\ \quad \qquad +\; \bigg ( (b n+{\bar{c}}-s) \frac{k_{n}-k_{n-1}}{k_{n-1}} + (b n+{\bar{c}}-s) \left( 1-\gamma _n \right) - s \gamma _n \bigg ) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle . \end{array} \end{aligned}$$

The latter equality can be rewritten as

$$\begin{aligned} \begin{array}{l} (b n+{\bar{c}})^{-1}\Gamma _{n} = s ( U_{n+1} -\gamma _n U_n ) \\ \quad +\; \lambda k_{n}(b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\ \quad +\; \left( (b n+{\bar{c}}-s) \frac{k_{n}-k_{n-1}}{k_{n-1}}+ (b n+{\bar{c}})(1-\gamma _n) -s \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle . \end{array} \end{aligned}$$
(2.16)

Moreover, we recall that \(\gamma _n= 1- \frac{a_2}{ b n + {\bar{c}}}\). Then, by an easy computation, we obtain the following two equalities

$$\begin{aligned}&\hbox { }\ \gamma _{n} (b n+{\bar{c}})= (b n+{\bar{c}})- a_2 = (b (n-1)+{\bar{c}})+ b - a_2, \\&\hbox { }\ (b n+{\bar{c}})(1-\gamma _n) -s= a_2 -s . \end{aligned}$$

It is also readily checked from (1.23) that \(k_n\) satisfies

$$\begin{aligned} \frac{ k_n - k_{n-1}}{ k_{n-1}}= \frac{a}{b n +c}. \end{aligned}$$
(2.17)

Therefore by the previous arguments we get

$$\begin{aligned} \begin{aligned} \Gamma _n = s ( b n&+{\bar{c}}) U_{n+1} - s (b (n-1)+{\bar{c}}) U_{n} + s \left( a_2-b \right) U_n \\&+ (b n+{\bar{c}}) \left( a \frac{b n+{\bar{c}}-s }{b n +c}+ a_2 -s \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\&+ \lambda k_{n} (b n+{\bar{c}})(b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle . \end{aligned} \end{aligned}$$
(2.18)

Consequently, by noticing that \({{\mathcal {E}}}_{n}(s,q)={ G}_{n}(s,q) + s ( b(n-1) +{\bar{c}}) U_{n}\) (in light of (2.6) and (2.15a)), and using (2.14) and (2.18), we deduce

$$\begin{aligned}&\dot{{\mathcal {E}}}_{n+1}(s,q) + s \left( a_2-b \right) U_n + (b n+{\bar{c}}) \left( a \frac{b n+{\bar{c}}-s}{b n +c}+ a_2 -s \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\&\qquad \qquad \qquad + \lambda k_{n} (b n+{\bar{c}})(b n+{\bar{c}}-s) \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\&\qquad \qquad \qquad + \frac{1}{2} (b n+{\bar{c}})^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \\&= - \frac{1}{2}\left( a_1-b -s \right) \big ( (2 n+1)b + 2{\bar{c}}- a_1\big ) \Vert {\dot{x}}_{n+1}\Vert ^2 . \end{aligned}$$

In addition, the well-known property of \(\lambda \)-co-coerciveness of \(A_{\lambda }\) implies that

$$\begin{aligned} \langle A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \ge \lambda \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}) \Vert ^2. \end{aligned}$$
(2.19)

Thus, for \(n \ge N_0\), for some \(N_0\) large enough (which also ensures that \((b n+{\bar{c}}-(a_1-b))\) is positive), and for \(s \in (0, a_1-b]\), by the previous two inequalities we are led to

$$\begin{aligned} \dot{{\mathcal {E}}}_{n+1}(s,q)+ & {} s \left( a_2-b \right) U_n + (b n+{\bar{c}}) \left( a \frac{b n+{\bar{c}}-s}{b n +c}+ a_2 -s \right) \langle \lambda k_{n-1} A_{\lambda } (x_{n}), {\dot{x}}_{n+1}\rangle \\+ & {} \lambda ^2 k_{n} (b n+{\bar{c}})(b n+{\bar{c}}-s) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n}) \Vert ^2 \\+ & {} \frac{1}{2} (b n+{\bar{c}})^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \\+ & {} \frac{1}{2}\left( a_1-b -s \right) \big ( (2 n+1) b + 2{\bar{c}}- a_1 \big ) \Vert {\dot{x}}_{n+1}\Vert ^2 \le 0. \end{aligned}$$

This last inequality, recalling that \(U_n:= \langle \lambda k_{n-1} A_{\lambda } (x_{n}), x_{n}-q \rangle \), is nothing but (2.8). \(\square \)

3 CRIPA in the general case of monotone operators.

3.1 Main estimates.

A series of estimates are obtained here by means of a Lyapunov analysis (based upon Lemma 2) and using the reformulation of CRIPA (from Lemma 1). Our main results (in Theorem 1) will be derived as a combination of the previous series of estimates.

3.1.1 Estimates from the energy-like sequence.

The next result is obtained from Lyapunov properties of \(({{\mathcal {E}}}_n(s,q))\) for convenient choices of the involved parameters.

Lemma 3

Suppose that (1.2) holds and that \(\{x_n\} \subset {\mathcal {H}}\) is generated by CRIPA with sequences \((k_n)\), \((\theta _n)\) and \((\gamma _n)\) (given by (1.23) and (1.24)), along with constants \(\{\lambda , k_0\}\subset (0,\infty )\), \(\{ a, c, a_1,a_2 \}\subset [0,\infty )\) and \(\{ b, {\bar{c}}\}\subset (0,\infty )\) verifying

$$\begin{aligned} a_2 \ge b \,\hbox { and }\, a_1 > b + (a+ a_2). \end{aligned}$$
(3.1)

Assume, in addition, that c and \({\bar{c}}\) are chosen as follows:

$$\begin{aligned}&\hbox {if} \;\; a>0 \hbox {, then} \; \; c > a_1 - (a_2+a) , {\bar{c}}=c+(a_2+a); \end{aligned}$$
(3.2a)
$$\begin{aligned}&\hbox {if}\;\; a =0 \hbox {, then}\; \;{\bar{c}}> \max \{a_1,a_2\}. \end{aligned}$$
(3.2b)

Then, for any \(q\in S\), the sequence \(({{\mathcal {E}}}_n(a+a_2,q))_{n \ge N_1}\) (for some integer \(N_1\) large enough) is non-increasing and convergent. Moreover, the following estimates are reached:

$$\begin{aligned}&\hbox { }\ \sup _{n \ge N_1} \Vert x_n-q\Vert ^2 \le \frac{2{{\mathcal {E}}}_{N_1}(a+a_2,q) }{ (a+a_2)(a_1-b- (a+a_2))}, \end{aligned}$$
(3.3a)
$$\begin{aligned}&\hbox { }\ \sup _ {n \ge N_1} ( b(n-1)+{\bar{c}}) k_{n-1} \langle A_{\lambda }(x_n) , x_{n}-q \rangle \le \frac{{{\mathcal {E}}}_{N_1}(a+a_2,q)}{\lambda (a+a_2) }, \end{aligned}$$
(3.3b)
$$\begin{aligned}&\hbox { }\ \sup _n n \Vert {\dot{x}}_{n}\Vert < \infty , \end{aligned}$$
(3.3c)
$$\begin{aligned}&\hbox { }\ \sum _{n \ge N_1} (a_2-b) k_{n-1} \langle A_{\lambda }(x_n), x_{n}-q \rangle \le \frac{{{\mathcal {E}}}_{N_1}(a+a_2,q)}{\lambda (a+a_2)} , \end{aligned}$$
(3.3d)
$$\begin{aligned}&\hbox { }\ \sum _{n \ge N_1} k_n ( b n +{\bar{c}}) \big ( b n + {\bar{c}}- (a+a_2) \big ) \Vert A_{\lambda }(x_{n+1})- A_{\lambda }(x_{n}) \Vert ^2 \le \frac{{{\mathcal {E}}}_{N_1}(a+a_2,q)}{\lambda ^2}, \end{aligned}$$
(3.3e)
$$\begin{aligned}&\hbox { }\ \sum _{n \ge N_1} ( b n + {\bar{c}}) \Vert {\dot{x}}_{n+1}\Vert ^2 \le \frac{2 {{\mathcal {E}}}_{N_1}(a+a_2,q) }{a_1-b- (a+a_2)} , \end{aligned}$$
(3.3f)
$$\begin{aligned}&\hbox { }\ \sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2 < \infty . \end{aligned}$$
(3.3g)

Proof

Clearly, we have \(a+ a_2 \in (0, a_1-b)\) (by condition (3.1)). It can also be checked that condition (3.2) ensures that

$$\begin{aligned} \begin{array}{l} a \frac{b n+{\bar{c}}-( a+ a_2)}{b n +c}+ a_2 = a+ a_2. \end{array} \end{aligned}$$
(3.4)

Consequently, for \(q \in S\), using Lemma 2 with \(s=a+ a_2\), we get, for \(n \ge N_1\) (with \(N_1\) large enough),

$$\begin{aligned} \begin{array}{l} \dot{\mathcal{E}}_{n+1}(a+ a_2,q) + \lambda (a+ a_2) k_{n-1} \left( a_2-b \right) \langle A_{\lambda } (x_{n}), x_n -q \rangle \\ + \lambda ^2 k_{n} (b n+{\bar{c}})\big (b n+{\bar{c}}-(a+ a_2) \big ) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n})\Vert ^2 \\ + \frac{1}{2} (b n+{\bar{c}}) ^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \\ + \frac{1}{2}\big ( a_1-b -(a+ a_2) \big ) \big ( (2n +1)b + 2{\bar{c}}- a_1 \big ) \Vert {\dot{x}}_{n+1}\Vert ^2 \le 0, \end{array} \end{aligned}$$
(3.5)

together with

$$\begin{aligned}&b n+{\bar{c}}-(a+ a_2) >0, \end{aligned}$$
(3.6a)
$$\begin{aligned}&(2 n+1) b + 2{\bar{c}}- a_1 >0. \end{aligned}$$
(3.6b)

It follows immediately that the non-negative sequence \((\mathcal{E}_{n}(a+a_2,q))_{n \ge N_1}\) is non-increasing, since \(a_2-b\) is assumed to be non-negative and since \( a_1-b -(a+ a_2) \) is assumed to be positive (in light of condition (3.1)). Hence, \(({{\mathcal {E}}}_{n}(a+ a_2,q))_{n \ge N_1}\) is convergent and bounded. Moreover, from (2.6), we recall that

$$\begin{aligned} \begin{array}{l} {{\mathcal {E}}}_n(a+a_2,q)= \frac{1}{2} (a+ a_2) \big ( a_1-b -(a+a_2) \big ) \Vert x_{n}-q\Vert ^2 \\ + \lambda (a+ a_2) k_{n-1} ( b (n-1) +{\bar{c}}) \langle A_{\lambda }(x_{n}),x_{n}-q \rangle \\ + \frac{1}{2} \Vert (a+ a_2) (q-x_{n})- (b n+{\bar{c}}-a_1) {\dot{x}}_{n}\Vert ^2 . \end{array} \end{aligned}$$
(3.7)

Then, by the inequality \({{\mathcal {E}}}_{n}(a+ a_2,q)\le \mathcal{E}_{N_1}(a+ a_2,q)\) (for \(n \ge N_1\)), we get

$$\begin{aligned}&\hbox { }\ \frac{1}{2}(a+ a_2) \big (a_1-b -(a+ a_2) \big ) \Vert x_{n}-q\Vert ^2 \le {{\mathcal {E}}}_{N_1}(a+ a_2,q), \end{aligned}$$
(3.8)
$$\begin{aligned}&\hbox { }\ \lambda (a+ a_2) k_{n-1} ( b(n-1) +{\bar{c}}) \langle A_{\lambda }(x_n),x_{n}-q \rangle \le {{\mathcal {E}}}_{N_1}(a+ a_2,q), \end{aligned}$$
(3.9)
$$\begin{aligned}&(b n +{\bar{c}}-a_1) \Vert {\dot{x}}_{n}\Vert - (a+ a_2)\Vert q-x_{n}\Vert \le \sqrt{ 2 {{\mathcal {E}}}_{N_1}(a+ a_2,q)}. \end{aligned}$$
(3.10)

Estimates (3.3a), (3.3b) and (3.3c) are direct consequences of these last three inequalities. Furthermore, by summing (3.5) from \(n=N_1\) to \(n=N\) (for any given integer \(N \ge N_1\)) we obtain

$$\begin{aligned} \begin{array}{lll} {{\mathcal {E}}}_{N+1}(a+ a_2,q) \\ + \lambda (a+ a_2) \left( a_2-b \right) \sum _{n=N_1}^N k_{n-1} \langle A_{\lambda } (x_{n}), x_n -q \rangle \\ + \lambda ^2 \sum _{n=N_1}^N k_{n} (b n+{\bar{c}})\big ( b n+{\bar{c}}- (a+ a_2)\big ) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n})\Vert ^2 \\ + \frac{1}{2}\left( a_1-b -(a+ a_2) \right) \sum _{n=N_1}^N \big ( (2 n+1)b + 2{\bar{c}}- a_1 \big ) \Vert {\dot{x}}_{n+1}\Vert ^2 \\ + \frac{1}{2} \sum _{n=N_1}^N (b n+{\bar{c}}) ^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2 \le {{\mathcal {E}}}_{N_1}(a+ a_2,q), \end{array} \end{aligned}$$
(3.11)

which, in light of (3.1) and (3.6), entails that

$$\begin{aligned}&\hbox { }\ \lambda (a_2+a) \left( a_2-b \right) \sum _{n=N_1}^N k_{n-1} \langle A_{\lambda } (x_{n}), x_n -q \rangle \le {{\mathcal {E}}}_{N_1}(a_2+a,q), \end{aligned}$$
(3.12a)
$$\begin{aligned}&\hbox { }\ \lambda ^2 \sum _{n=N_1}^N k_{n} (b n+{\bar{c}})\big (b n+{\bar{c}}-(a+a_2)\big ) \Vert A_{\lambda } (x_{n+1}) -A_{\lambda } (x_{n})\Vert ^2 \le {{\mathcal {E}}}_{N_1}(a_2+a,q) , \end{aligned}$$
(3.12b)
$$\begin{aligned}&\hbox { }\ \frac{1}{2}\left( a_1-b -(a+ a_2) \right) \sum _{n=N_1}^N \big ( (2 n+1) b + 2{\bar{c}}- a_1\big ) \Vert {\dot{x}}_{n+1}\Vert ^2\le {{\mathcal {E}}}_{N_1}(a_2+a,q), \end{aligned}$$
(3.12c)
$$\begin{aligned}&\frac{1}{2} \sum _{n=N_1}^N (b n+{\bar{c}}) ^2 \Vert {\dot{x}}_{n+1}-\theta _n{\dot{x}}_{n}\Vert ^2\le {{\mathcal {E}}}_{N_1}(a_2+a,q). \end{aligned}$$
(3.12d)

This straightforwardly yields (3.3d), (3.3e) and (3.3f). The last estimate (3.3g) is simply deduced from (3.3f) and (3.12d) (in light of the definition of \(\theta _n\)). \(\square \)

3.1.2 Estimates from the reformulation of the method.

In this section we establish additional estimates regarding CRIPA, especially on \((A_{\lambda }(x_{n}))\), by combining the results of Lemma 3 with the formulation of the method given in (2.4b).

Lemma 4

Assume, in addition to the assumptions of Lemma 3, that \(a_2\) and b satisfy

$$\begin{aligned} a_2 > 2b. \end{aligned}$$
(3.13)

Then we have the following results:

$$\begin{aligned}&\sum _n n \Vert {\dot{x}}_{n}+ \lambda k_{n-1} A_{\lambda } (x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.14a)
$$\begin{aligned}&\Vert {\dot{x}}_{n}+ \lambda k_{n-1} A_{\lambda } (x_{n})\Vert =o(n^{-1}), \end{aligned}$$
(3.14b)
$$\begin{aligned}&\sum _n n k_{n-1}^2 \Vert A_{\lambda } (x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.14c)
$$\begin{aligned}&\sum _n n k_{n-1} | \langle A_{\lambda } (x_{n}) , {\dot{x}}_{n+1}\rangle | < \infty . \end{aligned}$$
(3.14d)

Proof

Let us prove (3.14a) and (3.14b). For \(n\ge 1\), according to Lemma 1 we have

$$\begin{aligned} {\dot{x}}_{n+1}+ \lambda k_{n} A_{\lambda } (x_{n+1})= \theta _n{\dot{x}}_{n}+ \gamma _n \lambda k_{n-1} A_{\lambda } (x_{n}), \end{aligned}$$
(3.15)

where \(\{ \theta _n, \gamma _n \} \subset (0,1)\) are given by (1.24) with \({\bar{c}}=c+a_2+a\) (under the assumptions of Lemma 3), namely

$$\begin{aligned} \theta _n= 1-a_1 (b n +{\bar{c}})^{-1}\hbox { and }\gamma _n= 1-a_2 (b n +{\bar{c}})^{-1}. \end{aligned}$$
(3.16)

Observe that (3.15) can be rewritten as

$$\begin{aligned} {\dot{x}}_{n+1}+ \lambda k_{n} A_{\lambda } (x_{n+1})= \gamma _n ({\dot{x}}_{n}+ \lambda k_{n-1} A_{\lambda } (x_{n}) ) + (\theta _n-\gamma _n ) {\dot{x}}_{n}, \end{aligned}$$
(3.17)

so, by setting

$$\begin{aligned} H_n = {\dot{x}}_{n}+ \lambda k_{n-1} A_{\lambda } (x_{n}), \end{aligned}$$
(3.18)

we can equivalently formulate (3.17) as

$$\begin{aligned} \begin{array}{l} H_{n+1} = \gamma _n H_{n} + ( \theta _n-\gamma _n) {\dot{x}}_{n}= \gamma _n H_{n} + (1-\gamma _n) \frac{\theta _n-\gamma _n}{1-\gamma _n} {\dot{x}}_{n}. \end{array} \end{aligned}$$
(3.19)

Then by convexity of the squared norm we infer that

$$\begin{aligned} \Vert H_{n+1} \Vert ^2&\le \gamma _n \Vert H_{n}\Vert ^2 + (1-\gamma _n) \left( \frac{\gamma _n- \theta _n}{1-\gamma _n} \right) ^2 \Vert {\dot{x}}_{n}\Vert ^2\nonumber \\&= \gamma _n \Vert H_{n}\Vert ^2 + \frac{\left( \gamma _n- \theta _n\right) ^2}{1-\gamma _n} \Vert {\dot{x}}_{n}\Vert ^2. \end{aligned}$$
(3.20)

Moreover, by (3.16) we readily have

$$\begin{aligned} \theta _n-\gamma _n= (a_2 -a_1) (b n +{\bar{c}})^{-1}\hbox { and }1-\gamma _n= a_2 (b n +{\bar{c}})^{-1}, \end{aligned}$$
(3.21)

which amounts to

$$\begin{aligned} \frac{\left( \gamma _n- \theta _n\right) ^2}{1-\gamma _n}= \frac{(a_2 -a_1)^2 (b n +{\bar{c}})^{-2}}{ a_2 (b n +{\bar{c}})^{-1}}=\frac{(a_2 -a_1)^2 }{ a_2 } (b n +{\bar{c}})^{-1}. \end{aligned}$$
(3.22)

Consequently, in light of (3.16), (3.20) and (3.22), we obtain

$$\begin{aligned} \Vert H_{n+1} \Vert ^2 \le \left( 1-a_2 (b n +{\bar{c}})^{-1} \right) \Vert H_{n}\Vert ^2 + a_2^{-1} (a_2 -a_1)^2 (b n +{\bar{c}})^{-1} \Vert {\dot{x}}_{n}\Vert ^2. \end{aligned}$$
(3.23)

Next, multiplying this last inequality by \((bn +{\bar{c}})^2\) gives us

$$\begin{aligned} (b n +{\bar{c}})^2\Vert H_{n+1} \Vert ^2 \le&(b n +{\bar{c}})\left( b n +{\bar{c}}-a_2 \right) \Vert H_{n}\Vert ^2 \nonumber \\&+ a_2^{-1} (a_2 -a_1)^2 (b n +{\bar{c}}) \Vert {\dot{x}}_{n}\Vert ^2, \end{aligned}$$
(3.24)

while we simply have

$$\begin{aligned} \begin{array}{l} (b n +{\bar{c}})\left( b n +{\bar{c}}-a_2 \right) -(b(n-1) +{\bar{c}})^2 \\ = (b n +{\bar{c}})^2 -(b(n-1) +{\bar{c}})^2 -a_2(b n +{\bar{c}}) \\ \le 2b (b n +{\bar{c}})-a_2(b n +{\bar{c}})= -( a_2-2b ) (b n +{\bar{c}}), \end{array} \end{aligned}$$

or equivalently

$$\begin{aligned} (b n +{\bar{c}})\left( b n +{\bar{c}}- a_2 \right) \le (b(n-1) +{\bar{c}})^2- ( a_2-2b )(b n +{\bar{c}}). \end{aligned}$$
(3.25)

It follows from the two estimates (3.24) and (3.25) that

$$\begin{aligned} \begin{array}{l} (b n +{\bar{c}})^2\Vert H_{n+1} \Vert ^2 - (b(n-1) +{\bar{c}})^2 \Vert H_{n} \Vert ^2 \\ + (b n +{\bar{c}})( a_2-2b ) \Vert H_{n}\Vert ^2 \le a_2^{-1} (a_2 -a_1)^2 (b n +{\bar{c}}) \Vert {\dot{x}}_{n}\Vert ^2. \end{array} \end{aligned}$$
(3.26)

Thus, recalling that \(a_2 -2b >0\) (from (3.13)) and that \(\sum _n n \Vert {\dot{x}}_{n}\Vert ^2 < \infty \) (according to Lemma 3), we deduce in a standard way that \(\sum _n n \Vert H_{n} \Vert ^2 < \infty \) (that is, (3.14a)) and that there exists \( l_1 \ge 0\) such that

$$\begin{aligned} \lim _{n \rightarrow +\infty } (b(n-1) +{\bar{c}})^2 \Vert H_{n} \Vert ^2=l_1. \end{aligned}$$

Notice that we clearly have \( \lim _{n \rightarrow \infty } (bn)^2 \Vert H_{n} \Vert ^2=l_1\) (since \(\frac{ (b(n-1) +{\bar{c}})^2}{(bn)^2} \rightarrow 1\) as \(n \rightarrow \infty \)). So, by \(\sum _n n \Vert H_{n} \Vert ^2 < \infty \) (in light of (3.14a)), and noticing that \(\sum _n n^{-1} =\infty \), we deduce that \(l_1=0\), which leads to (3.14b).

Let us prove (3.14c) and (3.14d). Clearly, according to the definition of \(H_n\), we simply have

$$\begin{aligned} n \Vert \lambda k_{n-1} A_{\lambda } (x_{n})\Vert ^2 \le 2 n \Vert {\dot{x}}_{n}+\lambda k_{n-1} A_{\lambda } (x_{n})\Vert ^2 + 2 n \Vert {\dot{x}}_{n}\Vert ^2= 2 n \Vert H_n\Vert ^2 + 2 n \Vert {\dot{x}}_{n}\Vert ^2, \end{aligned}$$

hence, by \(\sum _n n \Vert {\dot{x}}_{n}\Vert ^2< \infty \) (from (3.3f)) and \(\sum _n n \Vert H_n\Vert ^2< \infty \) (from (3.14a)), we immediately obtain (3.14c). In addition, Young’s inequality readily gives us

$$\begin{aligned} \begin{array}{l} n k_{n-1}|\langle A_{\lambda } (x_{n}) , {\dot{x}}_{n+1}\rangle | \le \frac{1}{2} n \Vert k_{n-1} A_{\lambda } (x_{n})\Vert ^2 + \frac{1}{2} n \Vert {\dot{x}}_{n+1}\Vert ^2. \end{array} \end{aligned}$$
(3.27)

The estimate (3.14d) is then obtained as an immediate consequence of (3.27), along with the results \(\sum _n n \Vert k_{n-1} A_{\lambda } (x_{n})\Vert ^2 < \infty \) (from (3.14c)) and \(\sum _n n \Vert {\dot{x}}_{n}\Vert ^2< \infty \) (from (3.3f)). \(\square \)

3.2 Asymptotic convergence and main results.

3.2.1 Convergence in the general case of parameters.

The following result establishes the convergence of CRIPA in a general setting of parameters.

Theorem 1

Let \(A : {\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\) be a maximally monotone operator such that \(S:=A^{-1}(0) \ne \emptyset \). Let \(\{\lambda , k_0 \}\) be positive constants and assume that \(\{z_n, x_n\} \subset {\mathcal {H}}\) are generated by CRIPA with \((k_n)\), \((\theta _n)\) and \((\gamma _n)\) (given by (1.23) and (1.24)), along with constants \(\{a, c, a_1,a_2\}\subset [0,\infty )\) and \(\{b, {\bar{c}}\}\subset (0,\infty )\) verifying

$$\begin{aligned}&a_2>2b ,\; a_1 > b + a+a_2 ; \end{aligned}$$
(3.28a)
$$\begin{aligned}&\hbox {if}\; \; a>0 \hbox {, then} \;c > a_1 - (a_2+a) , {\bar{c}}=c+a_2+a; \end{aligned}$$
(3.28b)
$$\begin{aligned}&\hbox {if} \;\; a =0 \hbox {, then} \;{\bar{c}}> \max \{a_1,a_2\}. \end{aligned}$$
(3.28c)

Then \((x_n)\) and \((z_n)\) converge weakly to some element of S and the following results are reached:

$$\begin{aligned}&\Vert {\dot{x}}_{n+1}\Vert =o( n^{-1}), \sum _n n \Vert {\dot{x}}_{n+1}\Vert ^2 < \infty , \end{aligned}$$
(3.29a)
$$\begin{aligned}&\Vert A_{\lambda } (x_n) \Vert =o( (nk_n)^{-1}), \sum _n n k_n^2 \Vert A_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(3.29b)
$$\begin{aligned}&\sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2< \infty , \sum _{n} k_n n^2 \Vert A_{\lambda }(x_{n+1})- A_{\lambda }(x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.29c)
$$\begin{aligned}&\text{ for } \text{ any } q\in S, \sum _{n } k_{n} \langle A_{\lambda }(x_n), x_{n}-q \rangle <\infty . \end{aligned}$$
(3.29d)

Theorem 1 will be proved in Appendix 3.

3.2.2 Convergence results in particular cases of parameters.

The estimates given in Theorem 1 depend on the parameter \((k_n)\); the next result highlights some specific properties of \((k_n)\) given by (1.23).

Proposition 2

Let \((k_n) \subset (0,\infty )\) be given by (1.23) with \(b >0\) and \(\{ a, c\} \subset [0,\infty )\). Suppose for some nonnegative integer p that \([\frac{a}{b}] \ge p \) (where \([\frac{a}{b}]\) denotes the integer part of \(\frac{a}{b}\)). Then there exist a positive constant C (depending on \(k_0\), \([\frac{a}{b}]\), \([\frac{c}{b}]\)) and a positive integer \(n_p\) (depending on \([\frac{a}{b}]\)) for which \(k_n\) satisfies

$$\begin{aligned} k_n \ge C n^p\hbox { for }n \ge n_p. \end{aligned}$$
(3.30)

The proof of Proposition 2 is given in Appendix 2.
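The growth rate (3.30) is also easy to observe numerically; in the illustrative run below (with \(a=2\), \(b=1\), hence \([\frac{a}{b}]=2\)), the ratio \(k_n/n^2\) stabilizes to a positive constant:

```python
# Illustrative check of Proposition 2 with a = 2, b = 1, c = 1, k_0 = 1:
# k_n = k_{n-1} (1 + a / (b n + c)) grows like C n^{a/b} = C n^2.
a, b, c, k = 2.0, 1.0, 1.0, 1.0
N = 100000
for n in range(1, N + 1):
    k *= 1.0 + a / (b * n + c)
print(k / N**2)  # approaches a positive constant C
```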

The above proposition allows us to give more precise estimates with respect to the involved parameters.

Specifically, the next two results are immediate consequences of Theorem 1 and Proposition 2.

The first theorem is related to the special case of CRIPA when \(a=0\).

Theorem 2

(Convergence of CRIPA with constant relaxation factors) Let \(A : {\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\) be maximally monotone, with \(S:=A^{-1}(0) \ne \emptyset \). Let \(\{\lambda , k_0 \}\) be positive constants and assume that \(\{z_n, x_n\} \subset {\mathcal {H}}\) are generated by CRIPA-S with parameters \((\theta _n)\) and \((\gamma _n)\) (given by (1.24)), along with constants \(\{b, {\bar{c}}, a_1, a_2 \}\subset (0,\infty )\) verifying

$$\begin{aligned}&a_2>2b ,\,\, a_1 > b + a_2 , \end{aligned}$$
(3.31a)
$$\begin{aligned}&\hbox { }\ {\bar{c}}> \max \{a_1,a_2 \}. \end{aligned}$$
(3.31b)

Then \((x_n)\) and \((z_n)\) converge weakly to some element of S and the following results are reached:

$$\begin{aligned}&\Vert {\dot{x}}_{n+1}\Vert =o( n^{-1}), \sum _n n \Vert {\dot{x}}_{n+1}\Vert ^2< \infty , \end{aligned}$$
(3.32a)
$$\begin{aligned}&\Vert A_{\lambda } (x_n) \Vert =o( n^{-1}), \sum _n n \Vert A_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(3.32b)
$$\begin{aligned}&\sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2< \infty , \sum _{n} n^2 \Vert A_{\lambda }(x_{n+1})- A_{\lambda }(x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.32c)
$$\begin{aligned}&\text{ for } \text{ any } q\in S, \sum _{n } \langle A_{\lambda }(x_n), x_{n}-q \rangle <\infty . \end{aligned}$$
(3.32d)

The second theorem is related to the particular case of CRIPA when \(a>0\).

Theorem 3

(Convergence of CRIPA with varying relaxation factors) Let \(A : {\mathcal {H}}\rightarrow 2^{{\mathcal {H}}}\) be maximally monotone, with \(S:=A^{-1}(0) \ne \emptyset \), and let \(\{x_n, z_n \} \subset {\mathcal {H}}\) be generated by CRIPA with parameters \((\theta _n)\) and \((\gamma _n)\) (given by (1.24)). Suppose for some nonnegative integer p that \(\{a, b, c, a_1,a_2, {\bar{c}}\}\) are positive constants verifying

$$\begin{aligned}&a_2>2b ,\,\, [\frac{a}{b}] \ge p, \,\, a_1 > b + a+a_2 , \end{aligned}$$
(3.33a)
$$\begin{aligned}&c > a_1 - (a_2+a) ,\,\, {\bar{c}}=c+a_2+a. \end{aligned}$$
(3.33b)

Then \((x_n)\) and \((z_n)\) converge weakly to some element of S and the following results are reached:

$$\begin{aligned}&\Vert {\dot{x}}_{n+1}\Vert =o( n^{-1}), \sum _n n \Vert {\dot{x}}_{n+1}\Vert ^2 < \infty , \end{aligned}$$
(3.34a)
$$\begin{aligned}&\Vert A_{\lambda } (x_n) \Vert =o( n^{-(p+1)}), \sum _n n^{2p+1} \Vert A_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(3.34b)
$$\begin{aligned}&\sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2< \infty , \sum _{n} n^{p+2} \Vert A_{\lambda }(x_{n+1})- A_{\lambda }(x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(3.34c)
$$\begin{aligned}&\text{ for } \text{ any } q\in S, \sum _{n } n^p \langle A_{\lambda }(x_n), x_{n}-q \rangle <\infty . \end{aligned}$$
(3.34d)

4 CRIPA in the convex case.

In this section, by following the methodology used by Attouch-László [2], we present our main results relative to the minimization problem

$$\begin{aligned} \inf _{x \in {\mathcal {H}}}f(x), \end{aligned}$$
(4.1)

where \(f : {\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\) is a proper, convex and lower semi-continuous function such that \(\mathrm{argmin} f \ne \emptyset \).

Indeed, by Fermat’s rule we know that (4.1) is equivalent to the monotone inclusion problem

$$\begin{aligned} \text{ find } x \in {\mathcal {H}}\hbox { such that }0 \in \partial f (x). \end{aligned}$$
(4.2)

Moreover, in the special case when \(A = \partial f\), CRIPA reduces to the following algorithm.

(CRIPA-convex):

\(\rhd \) Step 1 (initialization): Let \(\{ z_{-1}, x_{-1}, x_0 \} \subset {\mathcal {H}}\).

\(\rhd \) Step 2 (main step): Given \(\{z_{n-1},x_{n-1}, x_n\} \subset {\mathcal {H}}\) (with \(n \ge 0\)), we compute the updates by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}- x_{n}), \end{aligned}$$
(4.3a)
$$\begin{aligned}&x_{n+1}=\frac{1}{1+k_n} z_n + \frac{k_n}{1+k_n} \mathrm{prox}_{ \lambda (1+k_n) f} (z_n), \end{aligned}$$
(4.3b)

where \((k_n)\), \((\theta _n)\) and \((\gamma _n)\) are given by (1.23) and (1.24).
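Since \(J_{r \partial f}=\mathrm{prox}_{rf}\), the update (4.3b) only requires a prox oracle; a minimal sketch of one step for \(f=\Vert \cdot \Vert _1\) (the helper names are ours):

```python
import numpy as np

def prox_l1(z, r):
    """prox_{r f}(z) for f = ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - r, 0.0)

def cripa_convex_step(z, k, lam):
    """One update (4.3b): J_{r df} = prox_{r f} with r = lam (1 + k)."""
    return z / (1.0 + k) + (k / (1.0 + k)) * prox_l1(z, lam * (1.0 + k))

print(cripa_convex_step(np.array([2.0, -0.4]), k=1.0, lam=0.5))  # [1.5, -0.2]
```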

As the special case of the latter algorithm when \(a=0\), we also consider the following method:

(CRIPA-S-convex):

\(\rhd \) Step 1 (initialization): Let \(\{ z_{-1}, x_{-1}, x_0 \} \subset {\mathcal {H}}\).

\(\rhd \) Step 2 (main step): Given \(\{z_{n-1},x_{n-1}, x_n\} \subset {\mathcal {H}}\) (with \(n \ge 0\)), we compute the updates by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}- x_{n}), \end{aligned}$$
(4.4a)
$$\begin{aligned}&x_{n+1}=\frac{1}{1+k_0} z_n + \frac{k_0}{1+k_0} \mathrm{prox}_{ \lambda (1+k_0) f} (z_n), \end{aligned}$$
(4.4b)

where \(k_0\) is a positive constant, and where \((\theta _n)\) and \((\gamma _n)\) are given by (1.24).

Remark 4

As a fundamental tool, we also recall that the Yosida approximation of \(\partial f\) is equal to the gradient of the Moreau envelope of f. Namely, for any \(\lambda > 0\), we have \((\partial f)_{\lambda } = \nabla f_{\lambda }\), where \(f_{\lambda } : {\mathcal {H}}\rightarrow \mathrm{I\!R}\) is a \(C^{1,1}\) function, which is defined for any \(x \in {\mathcal {H}}\) by:

$$\begin{aligned} \begin{array}{l} f_{\lambda }(x)= \inf _{\xi \in {\mathcal {H}}} \left\{ f(\xi )+ \frac{1}{2} \lambda ^{-1} \Vert x - \xi \Vert ^2 \right\} . \end{array} \end{aligned}$$
(4.5)

So, an alternative formulation of CRIPA-convex in terms of the Moreau envelope is given by

$$\begin{aligned}&z_n= x_n + \theta _n(x_{n}-x_{n-1}) + \gamma _n ( z_{n-1}- x_{n}), \end{aligned}$$
(4.6a)
$$\begin{aligned}&\hbox { }\ x_{n+1}=z_n - \lambda k_n \nabla f_{\lambda (1+k_n)} (z_n). \end{aligned}$$
(4.6b)
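In practice, both \(f_{\lambda }\) and \(\nabla f_{\lambda }=(\partial f)_{\lambda }\) are evaluated through the prox; a sketch for \(f=\Vert \cdot \Vert _1\) (helper names are ours):

```python
import numpy as np

def prox_l1(x, r):
    return np.sign(x) * np.maximum(np.abs(x) - r, 0.0)

def moreau_env_l1(x, lam):
    """f_lambda(x) from (4.5); the infimum is attained at xi = prox_{lam f}(x)."""
    p = prox_l1(x, lam)
    return np.sum(np.abs(p)) + np.sum((x - p) ** 2) / (2.0 * lam)

def grad_moreau_l1(x, lam):
    """nabla f_lambda(x) = (x - prox_{lam f}(x)) / lam."""
    return (x - prox_l1(x, lam)) / lam
```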

Before presenting our results regarding CRIPA-convex, we recall some properties of the Moreau envelope through the following lemma.

Lemma 5

Let \(f : {\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\) be a lower semi-continuous convex and proper function such that \(\mathrm{argmin} f \ne \emptyset \), and let \(q \in S:=\mathrm{argmin} f\). Then the following properties are obtained:

$$\begin{aligned}&0 \le f_\lambda (x_{n}) -\min f \le \langle \nabla f_{\lambda }(x_n), x_{n}-q \rangle , \end{aligned}$$
(4.7a)
$$\begin{aligned}&0 \le f (\mathrm{prox}_{\lambda f}(x_{n})) -\min f \le f_\lambda (x_{n}) -\min f , \end{aligned}$$
(4.7b)
$$\begin{aligned}&(2 \lambda )^{-1} \Vert x_{n}-\mathrm{prox}_{\lambda f}(x_{n})\Vert ^2 \le f_\lambda (x_{n}) -\min f . \end{aligned}$$
(4.7c)

Proof

Item (4.7a) is immediate from the gradient inequality. In addition, by definition of \(f_{\lambda }\) and the proximal mapping, we have

$$\begin{aligned} f_\lambda (x_{n}) -\min f= f (\mathrm{prox}_{\lambda f}(x_{n})) -\min f + (2 \lambda )^{-1} \Vert x_{n}-\mathrm{prox}_{\lambda f}(x_{n})\Vert ^2. \end{aligned}$$
(4.8)

This obviously implies items (4.7b) and (4.7c). \(\square \)

Now we are in a position to state the main result of this section.

Theorem 4

(Convergence of CRIPA-convex) Let \(f : {\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\) be a lower semi-continuous convex and proper function such that \(S:=\mathrm{argmin} f \ne \emptyset \), and let \(\{x_n, z_n \} \subset {\mathcal {H}}\) be generated by CRIPA-convex with parameters \((\theta _n)\) and \((\gamma _n)\) (given by (1.24)). Suppose for some nonnegative integer p that \(\{a, b, c, a_1,a_2,{\bar{c}}\}\) are positive constants verifying

$$\begin{aligned}&a_2>2b ,\;\quad [\frac{a}{b}] \ge p , \; a_1 > b + a+a_2 , \end{aligned}$$
(4.9a)
$$\begin{aligned}&c > a_1 - (a_2+a), \quad {\bar{c}}=c+(a_2+a). \end{aligned}$$
(4.9b)

Then the following properties are obtained:

$$\begin{aligned}&\Vert x_{n+1}-x_{n}\Vert =o( n^{-1}), \Vert \nabla f_{\lambda } (x_n) \Vert =o( n^{-(p+1)}) , \end{aligned}$$
(4.10a)
$$\begin{aligned}&\sum _n n \Vert x_{n+1}-x_{n}\Vert ^2< \infty , \sum _n n^{2p+1} \Vert \nabla f_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(4.10b)
$$\begin{aligned}&\hbox {for any } q\in S, \; \sum _{n } n^p \langle \nabla f_{\lambda }(x_n), x_{n}-q \rangle <\infty , \end{aligned}$$
(4.10c)
$$\begin{aligned}&\exists {\bar{x}}\in S \hbox { s.t. } (x_n,z_n) \rightharpoonup ({\bar{x}},{\bar{x}}) \hbox { weakly in } {\mathcal {H}}^2. \end{aligned}$$
(4.10d)

We also have the convergence rates:

$$\begin{aligned}&f_\lambda (x_{n}) -\min f = o( n^{-(p+1)}), \quad \sum _n n^p (f_\lambda (x_{n}) -\min f) < \infty , \end{aligned}$$
(4.11a)
$$\begin{aligned}&f (\mathrm{prox}_{\lambda f}(x_{n})) -\min f = o( n^{-(p+1)}), \quad \sum _n n^p \left( f (\mathrm{prox}_{\lambda f}(x_{n})) -\min f \right) < \infty , \end{aligned}$$
(4.11b)
$$\begin{aligned}&\Vert x_{n}-\mathrm{prox}_{\lambda f}(x_{n})\Vert = o( n^{-\frac{p+1}{2}}), \quad \sum _n n^p \Vert x_{n}-\mathrm{prox}_{\lambda f}(x_{n})\Vert ^2 < \infty . \end{aligned}$$
(4.11c)

Proof

The results in item (4.10) are direct consequences of Theorem 3. Then (4.7a), in light of \( \Vert \nabla f_{\lambda } (x_n) \Vert =o( n^{-(p+1)})\) (from (4.10a)) and the boundedness of \((x_n)\) (which follows from (4.10d)), yields the first result in item (4.11a). The second result in (4.11a) follows immediately from (4.7a) and (4.10c). In addition, item (4.11b) is readily deduced from (4.7b) and (4.11a), while (4.11c) is obtained from (4.7c) and (4.11a). \(\square \)

Theorem 5

(Convergence of CRIPA-S-convex) Let \(f : {\mathcal {H}}\rightarrow \mathrm{I\!R}\cup \{+\infty \}\) be a proper, convex and lower semi-continuous function such that \(S:=\mathrm{argmin} f \ne \emptyset \). Let \(\{\lambda , k_0 \}\) be positive constants, and assume that \(\{x_n, z_n\} \subset {\mathcal {H}}\) are generated by CRIPA-S-convex with parameters \((\theta _n)\) and \((\gamma _n)\) (given by (1.24)), along with constants \(\{b, a_1, a_2, {\bar{c}}\}\subset (0,\infty )\) verifying

$$\begin{aligned}&a_2>2b, \quad a_1 > b + a_2 , \end{aligned}$$
(4.12a)
$$\begin{aligned}&{\bar{c}}> \max \{a_1,a_2 \}. \end{aligned}$$
(4.12b)

Then the following results hold:

$$\begin{aligned}&\Vert {\dot{x}}_{n+1}\Vert =o( n^{-1}), \sum _n n \Vert {\dot{x}}_{n+1}\Vert ^2 < \infty , \end{aligned}$$
(4.13a)
$$\begin{aligned}&\Vert \nabla f_{\lambda } (x_n) \Vert =o( n^{-1}), \sum _n n \Vert \nabla f_{\lambda } (x_n)\Vert ^2 < \infty , \end{aligned}$$
(4.13b)
$$\begin{aligned}&\sum _{n } n^2 \Vert {\dot{x}}_{n+1}-{\dot{x}}_{n}\Vert ^2< \infty , \sum _{n} n^2 \Vert \nabla f _{\lambda }(x_{n+1})- \nabla f _{\lambda }(x_{n}) \Vert ^2 < \infty , \end{aligned}$$
(4.13c)
$$\begin{aligned}&\hbox {for any } q\in S, \; \sum _{n } \langle \nabla f_{\lambda }(x_n), x_{n}-q \rangle <\infty , \end{aligned}$$
(4.13d)
$$\begin{aligned}&\exists {\bar{x}}\in S \hbox { s.t. } (x_n,z_n) \rightharpoonup ({\bar{x}},{\bar{x}}) \hbox { weakly in } {\mathcal {H}}^2. \end{aligned}$$
(4.13e)

Proof

The items in (4.13) are direct consequences of Theorem 2. \(\square \)

5 Numerical experiments

In this section, we perform some numerical experiments to illustrate the behavior of CRIPA relative to several benchmark methods.

5.1 The maximally monotone case

As was done to illustrate the performance of PRINAM in [2], we consider the model example of the skew-symmetric and maximally monotone operator \(A: \mathrm{I\!R}^2 \rightarrow \mathrm{I\!R}^2\) defined for \((\xi , \eta ) \in \mathrm{I\!R}^2\) by \( A(\xi , \eta ) = (-\eta , \xi )\). It is well-known that A is not the sub-differential of a convex function. We also recall that A possesses a unique zero \(x^*=(0,0)\), and that A and its Yosida regularization \(A_{\lambda }\) can be identified respectively with the matrices

$$\begin{aligned} A= \left( \begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array} \right) , \quad A_{\lambda } = \left( \begin{array}{cc} \frac{\lambda }{\lambda ^2+1} & - \frac{1}{\lambda ^2+1} \\ \frac{1}{\lambda ^2+1} & \frac{\lambda }{\lambda ^2+1} \end{array} \right) . \end{aligned}$$
(5.1)
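The identification (5.1) can be checked with a few lines of NumPy, forming \(A_{\lambda } = \lambda ^{-1}\left( I - (I+\lambda A)^{-1}\right) \) directly from the resolvent and comparing it to the closed form (a sanity check, independent of the experiments reported below):

```python
import numpy as np

lam = 0.25
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # the skew-symmetric operator of Sect. 5.1

# Yosida regularization via the resolvent: A_lam = (I - (I + lam A)^{-1}) / lam.
I = np.eye(2)
A_lam = (I - np.linalg.inv(I + lam * A)) / lam

# Closed form from (5.1).
A_lam_closed = np.array([[lam, -1.0],
                         [1.0,  lam]]) / (lam ** 2 + 1.0)

assert np.allclose(A_lam, A_lam_closed)
```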

We approximate the zero of A by means of several algorithms: CRIPA, KAPPA (namely, Kim's accelerated proximal point algorithm given in (1.20)) and PRINAM (given in (1.17)–(1.18)). Figures 1 and 2 display the profiles of \(\Vert x_{n}-x^*\Vert \) for the sequences \((x_n)\) generated by these algorithms:

  • Figure 1 illustrates the behavior of the iterates \((x_n)\) generated by CRIPA (with constant relaxation factors and proximal index) and by KAPPA (which only uses constant indexes). The profile obtained for KAPPA with the proximal parameter \(\mu = 0.01\) is compared with those of CRIPA for several values of \(k_0\) and \(\lambda \) such that \(\lambda (k_0+1) = \mu \). The starting points used are \(x_0=x_{-1}=z_{-1}=(1,-1)\) for CRIPA, and \(x_0=z_0=z_{-1}=(1,-1)\) for KAPPA. We run each method until the stopping criterion \(\Vert x_n -x^*\Vert \le 10^{-7}\) holds. The performances of both algorithms are similar on this simple example. However, one can notice that KAPPA exhibits numerous oscillations, which do not occur for CRIPA.

  • Figure 2 illustrates the behavior of the iterates \((x_n)\) generated by CRIPA (using varying relaxation factors and unbounded proximal indexes) and by PRINAM (also using unbounded proximal indexes). The profile obtained for PRINAM (when using the same parameters as in the optimal simulation proposed in [2]) is compared with those of CRIPA (for several values of p). The starting points used are \(x_0=(-1,1)\) and \(x_1=(1,-1)\) for PRINAM, and \(x_0=z_{-1}=(-1,1)\) and \(x_1=(1,-1)\) for CRIPA. Here we use the stopping criterion \(\Vert x_n -x^*\Vert \le 10^{-5}\). PRINAM converges faster than CRIPA with \(p=1\); however, the convergence of the trajectories of CRIPA is considerably accelerated for \(p \ge 2\).

Fig. 1

CRIPA with \(b=1\), \(a_2=2.5b\), \(a_1=1.5(b+a_2)\), \({\bar{c}}=1.5\max \{a_1,a_2\}\), \(\lambda =0.001\) for (C1) and \(\lambda =0.005\) for (C2). For (C1) and (C2), \(k_0\) depends on \(\lambda \) through \(k_0=0.01\lambda ^{-1}-1\). KAPPA is considered with \(\mu =0.01\)

Fig. 2

CRIPA with \(\lambda =0.001\), \(k_0=0.01\), \(b=1\), \(a_2=3\), \(a=1.5(2^p-1)b\), \(a_1=1.5(b+a+a_2)\), \(c=1.5(a_1-a_2-a)\), \({\bar{c}}=c+a_2+a\). PRINAM is considered with \(s=0.1\), \(\beta =0.025\), \(r=0.1\), \(q=-0.1\), \(\lambda _1=1.01(2\beta +s)^2 r^2 s^{-1}\)

5.2 The convex case

Given a symmetric and positive definite matrix \(A: \mathrm{I\!R}^N \rightarrow \mathrm{I\!R}^N\), we consider the convex quadratic programming problem

$$\begin{aligned} \min _{x \in \mathrm{I\!R}^N} \left\{ f(x): = \frac{1}{2}\langle A x, x \rangle \right\} . \end{aligned}$$
(5.2)

It is clear that A possesses a unique zero \(x^*=0 \) and that (5.2) is equivalent to solving

$$\begin{aligned} 0 \in (\partial f) ({\bar{x}})= A{\bar{x}}. \end{aligned}$$
(5.3)
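For the quadratic objective in (5.2), the proximal mapping reduces to a linear solve, \(\mathrm{prox}_{\lambda f}(x)=(I+\lambda A)^{-1}x\), which is what a proximal method evaluates at each step here. A minimal sketch (with a hypothetical random seed, chosen for reproducibility only; the construction \(A=B^TB\) matches the experiment described below):

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, for illustration only
N = 100
B = rng.uniform(-1.0, 1.0, size=(N, N))  # entries b_ij in [-1, 1]
A = B.T @ B                              # symmetric positive definite (a.s.)

def prox_quadratic(x, lam):
    # prox_{lam f}(x) for f(x) = 0.5 <Ax, x> solves (I + lam A) p = x.
    return np.linalg.solve(np.eye(N) + lam * A, x)

x0 = np.ones(N)
print(np.linalg.norm(prox_quadratic(x0, 0.001)))  # one prox evaluation
```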

We approximate the solution to (5.3) by means of the following algorithms: CRIPA, AFB (given in (1.10)) and IGAHD (see [4]).

Remark 5

For the convenience of the reader, we recall that IGAHD was introduced in [4] for minimizing a smooth convex function \(f: {\mathcal {H}}\rightarrow \mathrm{I\!R}\) with L-Lipschitz continuous gradient. For some nonnegative values \(\{ s, \alpha , \beta \}\), this procedure is given by

$$\begin{aligned} \begin{array}{l} y_{n} =x_n + \left( 1-\frac{\alpha }{n} \right) (x_{n}- x_{n-1})-\beta \sqrt{s} \left( \nabla f(x_{n}) - \nabla f(x_{n-1}) \right) - \frac{\beta \sqrt{s}}{n}\nabla f(x_{n-1}) , \\ x_{n+1}= y_n - s \nabla f( y_{n} ). \end{array} \end{aligned}$$
(5.4)

Convergence of the function values at the rate \(o(n^{-2})\), as well as fast convergence to zero of the gradients (that is, \(\sum _n n^2 \Vert \nabla f(x_{n}) \Vert ^2 < \infty \)), were established under the conditions

$$\begin{aligned} \alpha \ge 3, \quad 0 \le \beta < 2\sqrt{s} \quad \hbox {and} \quad s \le L^{-1}. \end{aligned}$$
(5.5)
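A direct transcription of (5.4) into Python might read as follows (a sketch under the stated conditions (5.5), assuming f is supplied through its gradient):

```python
import numpy as np

def igahd(grad_f, x0, s, alpha, beta, n_iters=5000):
    """Sketch of IGAHD as written in (5.4).

    The parameters should satisfy (5.5): alpha >= 3, 0 <= beta < 2*sqrt(s)
    and s <= 1/L, where L is the Lipschitz constant of grad_f.
    """
    x_prev, x = x0.copy(), x0.copy()
    g_prev = grad_f(x_prev)
    for n in range(1, n_iters + 1):
        g = grad_f(x)
        # Extrapolation with Hessian-driven damping (first line of (5.4)).
        y = (x + (1.0 - alpha / n) * (x - x_prev)
             - beta * np.sqrt(s) * (g - g_prev)
             - (beta * np.sqrt(s) / n) * g_prev)
        # Gradient step (second line of (5.4)).
        x_prev, x, g_prev = x, y - s * grad_f(y), g
    return x
```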

For our numerical simulation we take \(N=100\) and \(A= B^TB\), where \(B=(b_{i,j})_{1\le i,j \le N}\) is a random invertible matrix with entries \(b_{i,j} \in [-1,1]\).

Figures 3, 4 and 5 display the profiles of \(\Vert x_{n}-x^*\Vert \) for the iterates \((x_n)\) generated by these algorithms. The starting points used are \(x_1=x_0=(1,1,\ldots ,1)\) for IGAHD and for AFB, while we similarly choose \(x_1=x_0=z_{-1} =(1,1,\ldots ,1)\) for CRIPA. Here we use the stopping criterion \(\Vert x_n -x^*\Vert \le 10^{-5}\):

  • In order to compare CRIPA with AFB and IGAHD, we first give some insight into the influence of the parameter \(\alpha \) on the trajectories generated by the latter two algorithms. Figures 3 and 4 feature the profiles obtained for AFB and IGAHD (for several values of \(\alpha \)).

Fig. 3

AFB with \(\lambda =0.001\) and several values of \(\alpha \)

Fig. 4

IGAHD with \(\lambda =0.001\), \(\beta =0.9\cdot 2 \sqrt{\lambda }\) and several values of \(\alpha \)

  • In Fig. 5, some profiles obtained for IGAHD and AFB are compared with those of CRIPA (for several values of p).

Fig. 5

CRIPA with the same parameters as in Fig. 2. IGAHD and AFB are considered with \(\alpha =10\) and the other parameters unchanged from Figs. 3 and 4