1 Introduction

1.1 Problem Formulation and a Continuous Time Splitting Scheme with Vanishing Damping

Let \(\mathcal {H}\) be a real Hilbert space, \(A: \mathcal {H}\rightarrow 2^{\mathcal {H}}\) a maximally monotone operator and \(B: \mathcal {H}\rightarrow \mathcal {H}\) a \(\beta \)-cocoercive operator for some \(\beta > 0\) such that \({{\,\mathrm{zer}\,}}(A + B)\ne \emptyset \). Devising fast convergent continuous and discrete time dynamics for solving monotone inclusions of the type

$$\begin{aligned} \text{ find } \ x \in \mathcal {H} \ \text{ such } \text{ that } \ 0 \in (A + B)(x) \end{aligned}$$
(1)

is of great importance in many fields, including, but not limited to, optimization, equilibrium theory, economics and game theory, partial differential equations, and statistics. One of our main motivations comes from the fact that solving the convex optimization problem

$$\begin{aligned}\min _{x\in \mathcal {H}} f(x) + g(x),\end{aligned}$$

where \(f : \mathcal {H} \rightarrow \mathbb {R}\cup \{+\infty \}\) is proper, convex and lower semicontinuous and \(g : \mathcal {H} \rightarrow \mathbb {R}\) is convex and Fréchet differentiable with a Lipschitz continuous gradient, is equivalent to solving the monotone inclusion

$$\begin{aligned} 0 \in (\partial f + \nabla g)(x). \end{aligned}$$

We want to exploit the additive structure of (1) and approach A and B separately, in the spirit of the splitting paradigm.

For \(t \ge t_{0} > 0\), \(\alpha > 1, \xi \ge 0\), and functions \(\lambda , \gamma : [t_{0}, +\infty ) \rightarrow (0, +\infty )\), we will study the asymptotic behaviour of the trajectories of the second order differential equation

$$\begin{aligned} \text {(Split-DIN-AVD)}\quad \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right) + T_{\lambda (t), \gamma (t)}(x(t)) = 0, \end{aligned}$$
(2)

where, for \(\lambda , \gamma > 0\), the operator \(T_{\lambda , \gamma } : \mathcal {H} \rightarrow \mathcal {H}\) is given by

$$\begin{aligned} T_{\lambda , \gamma } = \frac{1}{\lambda }\Big [{{\,\mathrm{Id}\,}}- J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B)\Big ]. \end{aligned}$$

The sets of zeros of \(A+B\) and of \(T_{\lambda , \gamma }\) coincide for all \(\lambda , \gamma > 0\): indeed, \(T_{\lambda , \gamma }(x) = 0\) if and only if \(x = J_{\gamma A}(x - \gamma B(x))\), which, by the definition of the resolvent, amounts to \(x - \gamma B(x) \in x + \gamma A(x)\), that is, \(0 \in (A + B)(x)\). The nomenclature (Split-DIN-AVD) comes from the splitting feature of the continuous time scheme, as well as the link with the (DIN-AVD) system (Dynamic Inertial Newton with Asymptotic Vanishing Damping) developed by Attouch and László in [9], which we will emphasize later. We will discuss the existence and uniqueness of the trajectories generated by (Split-DIN-AVD), show their weak convergence to the set of zeros of \(A + B\) and the fast convergence of the velocities to zero, and provide convergence rates for \(T_{\lambda (t), \gamma (t)}(x(t))\) and \(\frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\) as \(t\rightarrow +\infty \).
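As a quick illustration of how \(T_{\lambda , \gamma }\) is evaluated in practice, the following minimal Python sketch (our own, not part of the original development; it assumes \(A = \partial \Vert \cdot \Vert _{1}\), whose resolvent is componentwise soft-thresholding, and \(B = \nabla g\) for \(g(x) = \frac{1}{2}\Vert Mx - b\Vert ^{2}\)) finds a zero of \(A + B\) by forward-backward iterations and confirms numerically that \(T_{\lambda , \gamma }\) vanishes there for several choices of \(\lambda \) and \(\gamma \):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
beta = 1.0 / np.linalg.norm(M.T @ M, 2)   # B = grad g is beta-cocoercive (Baillon-Haddad)

def B(x):                                 # B(x) = M^T (M x - b), the gradient of g
    return M.T @ (M @ x - b)

def J_A(x, gamma):                        # resolvent of gamma*A for A = d||.||_1: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def T(x, lam, gamma):                     # T_{lam,gamma} = (1/lam)[Id - J_{gamma A} o (Id - gamma B)]
    return (x - J_A(x - gamma * B(x), gamma)) / lam

# forward-backward iterations x+ = J_{gamma A}(x - gamma B x) converge to a zero of A + B
x, gamma0 = np.zeros(10), beta
for _ in range(20000):
    x = J_A(x - gamma0 * B(x), gamma0)

# that zero annihilates T_{lam,gamma} for every lam > 0 and gamma in (0, 2*beta)
for lam, gamma in [(0.5, 0.5 * beta), (1.0, beta), (3.0, 1.5 * beta)]:
    print(f"lam={lam:.1f}, gamma={gamma:.3f}:  ||T(x)|| = {np.linalg.norm(T(x, lam, gamma)):.2e}")
```

Note that the forward-backward iteration \(x^{+} = J_{\gamma A}(x - \gamma B(x))\) used above is precisely \(x^{+} = x - \lambda T_{\lambda , \gamma }(x)\) with \(\lambda = 1\).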

For the particular case \(B = 0\), we are left with the monotone inclusion problem

$$\begin{aligned} \text{ find } \ x \in \mathcal {H} \ \text{ such } \text{ that } \ 0 \in A(x),\end{aligned}$$

and the attached system

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}A_{\lambda (t), \gamma (t)}(x(t))\right) + A_{\lambda (t), \gamma (t)}(x(t)) = 0, \end{aligned}$$

where, for \(\lambda , \gamma > 0\), the operator \(A_{\lambda , \gamma } : \mathcal {H} \rightarrow \mathcal {H}\) can be seen as a generalized Moreau envelope of the operator A, i.e.,

$$\begin{aligned} A_{\lambda , \gamma } = \frac{1}{\lambda }\Big [{{\,\mathrm{Id}\,}}- J_{\gamma A}\Big ]. \end{aligned}$$

In particular, we will be able to set \(\gamma (t) = \lambda (t)\) for every \(t \ge t_0\). Since for \(\lambda > 0\), \(A_{\lambda , \lambda } = A_{\lambda }\), this allows us to recover the (DIN-AVD) system

$$\begin{aligned} \text {(DIN-AVD)} \quad \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}A_{\lambda (t)}(x(t))\right) + A_{\lambda (t)}(x(t)) = 0, \end{aligned}$$

addressed by Attouch and László in [9].

If \(A = 0\), and after properly redefining some parameters, we obtain the following system

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}\frac{1}{\eta (t)}Bx(t)\right) + \frac{1}{\eta (t)}Bx(t) = 0, \end{aligned}$$

with \(\eta : [t_{0}, +\infty ) \rightarrow (0, +\infty )\), which addresses the monotone equation

$$\begin{aligned} \text{ find } \ x \in \mathcal {H} \ \text{ such } \text{ that } \ B(x)=0.\end{aligned}$$

This dynamical system approaches the cocoercive operator B directly through a forward evaluation, which is more natural than having to resort to its Moreau envelope, as in (DIN-AVD).

1.2 Notation and Preliminaries

In this subsection, we will explain the notions which were mentioned in the previous subsection, and we will introduce some definitions and preliminary results that will be required later. Throughout the paper, we will be working in a real Hilbert space \(\mathcal {H}\) with inner product \(\langle \cdot , \cdot \rangle \) and corresponding norm \(\Vert \cdot \Vert = \sqrt{\langle \cdot , \cdot \rangle }\).

Let \(A : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) be a set-valued operator, that is, Ax is a subset of \(\mathcal {H}\) for every \(x\in \mathcal {H}\). The operator A is completely characterized by its graph \({{\,\mathrm{gra}\,}}A = \{(x, u) \in \mathcal {H}\times \mathcal {H} : u\in Ax\}\). The inverse of A is the operator \(A^{-1} : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) defined through the equivalence \(x\in A^{-1}u\) if and only if \(u\in Ax\). The set of zeros of A is the set \({{\,\mathrm{zer}\,}}A = \{x\in \mathcal {H} : 0 \in Ax\}\). For a subset \(C\subseteq \mathcal {H}\), we set \(A(C) = \cup _{x\in C}Ax\). The range of A is the set \({{\,\mathrm{ran}\,}}A = A(\mathcal {H})\).

A set-valued operator A is said to be monotone if \(\langle v - u, y - x\rangle \ge 0\) whenever \((x, u), (y, v)\in {{\,\mathrm{gra}\,}}A\), and maximally monotone if it is monotone and the following implication holds:

$$\begin{aligned} \widetilde{A} \,\,\text {is monotone}, \,\,{{\,\mathrm{gra}\,}}A \subseteq {{\,\mathrm{gra}\,}}\widetilde{A} \,\Longrightarrow \, A = \widetilde{A}. \end{aligned}$$

Let \(\lambda > 0\). The resolvent of index \(\lambda \) of A is the operator \(J_{\lambda A} : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) given by

$$\begin{aligned} J_{\lambda A} = ({{\,\mathrm{Id}\,}}+ \lambda A)^{-1}, \end{aligned}$$

and the Moreau envelope (or Yosida approximation or Yosida regularization) of index \(\lambda \) of A is the operator \(A_{\lambda } : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) given by

$$\begin{aligned} A_{\lambda } = \frac{1}{\lambda }({{\,\mathrm{Id}\,}}- J_{\lambda A}), \end{aligned}$$

where \({{\,\mathrm{Id}\,}}: \mathcal {H} \rightarrow \mathcal {H}\), defined by \({{\,\mathrm{Id}\,}}(x) = x\) for every \(x\in \mathcal {H}\), is the identity operator of \(\mathcal {H}\). For \(\lambda _{1}, \lambda _{2} > 0\), it holds \((A_{\lambda _{1}})_{\lambda _{2}} = A_{\lambda _{1} + \lambda _{2}}\).
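For instance, for \(A = \partial |\cdot |\) on \(\mathbb {R}\) one has \(J_{\lambda A}(x) = {{\,\mathrm{sign}\,}}(x)\max \{|x| - \lambda , 0\}\) and \(A_{\lambda }(x) = \frac{1}{\lambda }(x - J_{\lambda A}(x))\), and the identity \((A_{\lambda _{1}})_{\lambda _{2}} = A_{\lambda _{1} + \lambda _{2}}\) can be verified numerically. The Python sketch below is our own illustration (not from the original text); the resolvent of the single-valued Lipschitz operator \(A_{\lambda _{1}}\) is computed by scalar root-finding:

```python
import numpy as np
from scipy.optimize import brentq

def yosida(x, lam):                      # A_lam for A = d|.| on R: A_lam(x) = clip(x/lam, -1, 1)
    return float(np.clip(x / lam, -1.0, 1.0))

def resolvent_of_yosida(y, lam1, lam2):
    # J_{lam2 * A_{lam1}}(y): the unique root of the increasing map x -> x + lam2*A_{lam1}(x) - y
    g = lambda x: x + lam2 * yosida(x, lam1) - y
    return brentq(g, y - lam2 - 1.0, y + lam2 + 1.0)

lam1, lam2 = 0.7, 1.3
for y in np.linspace(-4.0, 4.0, 9):
    lhs = (y - resolvent_of_yosida(y, lam1, lam2)) / lam2   # (A_{lam1})_{lam2}(y)
    rhs = yosida(y, lam1 + lam2)                            # A_{lam1 + lam2}(y)
    print(f"y = {y:+.2f}:  {lhs:+.6f}  vs  {rhs:+.6f}")      # the two columns agree
```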

A single-valued operator \(B : \mathcal {H} \rightarrow \mathcal {H}\) is said to be \(\beta \)-cocoercive for some \(\beta >0\) if for every \(x, y\in \mathcal {H}\) we have

$$\begin{aligned} \beta \Vert Bx - By\Vert ^{2} \le \langle Bx - By, x - y\rangle . \end{aligned}$$

In this case, B is \(\frac{1}{\beta }\)-Lipschitz continuous, namely, for every \(x, y\in \mathcal {H}\) we have

$$\begin{aligned} \Vert Bx - By\Vert \le \frac{1}{\beta } \Vert x - y\Vert . \end{aligned}$$

We say B is nonexpansive if it is 1-Lipschitz continuous, and firmly nonexpansive if it is 1-cocoercive. For \(\alpha \in (0, 1)\), we say B is \(\alpha \)-averaged if there exists a nonexpansive operator \(R :\mathcal {H} \rightarrow \mathcal {H}\) such that

$$\begin{aligned} B = (1 - \alpha ){{\,\mathrm{Id}\,}}+ \alpha R. \end{aligned}$$

Let \(\lambda > 0\) and \(A : \mathcal {H} \rightarrow 2^{\mathcal {H}}\). According to Minty’s Theorem, A is maximally monotone if and only if \({{\,\mathrm{ran}\,}}({{\,\mathrm{Id}\,}}+ \lambda A) = \mathcal {H}\). In this case \(J_{\lambda A}\) is single-valued and firmly nonexpansive, \(A_{\lambda }\) is single-valued, \(\lambda \)-cocoercive, and for every \(x\in \mathcal {H}\) and every \(\lambda _{1}, \lambda _{2} > 0\) we have

$$\begin{aligned} \Vert J_{\lambda _{1}A}(x) - J_{\lambda _{2}A}(x)\Vert \le |\lambda _{1} - \lambda _{2}|\Vert A_{\lambda _{1}}(x)\Vert . \end{aligned}$$

Let \(B : \mathcal {H} \rightarrow \mathcal {H}\) be a single-valued operator. If B is \(\alpha \)-averaged for some \(\alpha \in (0, 1)\), then \({{\,\mathrm{Id}\,}}- B\) is \(\frac{1}{2\alpha }\)-cocoercive. If B is monotone and continuous, then it is maximally monotone.
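These facts lend themselves to quick numerical sanity checks. The following sketch (our own; it takes \(A = \partial \Vert \cdot \Vert _{1}\) so that \(J_{\lambda A}\) is componentwise soft-thresholding) tests the firm nonexpansiveness of \(J_{\lambda A}\), the \(\lambda \)-cocoercivity of \(A_{\lambda }\) and the resolvent inequality above on random data:

```python
import numpy as np

rng = np.random.default_rng(1)

def J(x, lam):                           # resolvent of lam*A for A = d||.||_1 (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def yosida(x, lam):                      # Yosida approximation A_lam = (Id - J_{lam A}) / lam
    return (x - J(x, lam)) / lam

for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lam1, lam2 = rng.uniform(0.1, 2.0, size=2)
    # firm nonexpansiveness of the resolvent: ||Jx - Jy||^2 <= <Jx - Jy, x - y>
    d = J(x, lam1) - J(y, lam1)
    assert d @ d <= d @ (x - y) + 1e-12
    # lam-cocoercivity of the Yosida approximation
    e = yosida(x, lam1) - yosida(y, lam1)
    assert lam1 * (e @ e) <= e @ (x - y) + 1e-12
    # resolvent comparison inequality
    assert (np.linalg.norm(J(x, lam1) - J(x, lam2))
            <= abs(lam1 - lam2) * np.linalg.norm(yosida(x, lam1)) + 1e-12)
print("all random checks passed")
```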

The following concepts and results show the strong interplay between the theory of monotone operators and convex analysis.

Let \(f : \mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) be a proper, convex and lower semicontinuous function. We denote the infimum of f over \(\mathcal {H}\) by \(\min _{\mathcal {H}}f\) and the set of global minimizers of f by \({{\,\mathrm{argmin}\,}}_{\mathcal {H}}f\). The subdifferential of f is the operator \(\partial f : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) defined, for every \(x\in \mathcal {H}\), by

$$\begin{aligned} \partial f(x) = \{x^{*}\in \mathcal {H} : \langle x^{*}, y - x\rangle + f(x) \le f(y) \,\,\forall y\in \mathcal {H}\}. \end{aligned}$$

The subdifferential operator of f is maximally monotone and \(\overline{x} \in {{\,\mathrm{zer}\,}}\partial f \) \(\Leftrightarrow \) \(\overline{x}\) is a global minimizer of f.

Let \(\lambda > 0\). The proximal operator of f of index \(\lambda \) is the operator \({{\,\mathrm{prox}\,}}_{\lambda f} : \mathcal {H} \rightarrow \mathcal {H}\) defined, for every \(x\in \mathcal {H}\), by

$$\begin{aligned} {{\,\mathrm{prox}\,}}_{\lambda f}(x) = J_{\lambda \partial f}(x)={{\,\mathrm{argmin}\,}}_{y\in \mathcal {H}}\left[ f(y) + \frac{1}{2 \lambda }\Vert x - y\Vert ^{2}\right] , \end{aligned}$$

in particular, being a resolvent, \({{\,\mathrm{prox}\,}}_{\lambda f}\) is firmly nonexpansive. The Moreau envelope of f of index \(\lambda \) is the function \(f_{\lambda } : \mathcal {H}\rightarrow \mathbb {R}\) given, for every \(x\in \mathcal {H}\), by

$$\begin{aligned} f_{\lambda }(x) = f\left( {{\,\mathrm{prox}\,}}_{\lambda f}(x)\right) + \frac{1}{2\lambda }\Vert x - {{\,\mathrm{prox}\,}}_{\lambda f}(x)\Vert ^{2}. \end{aligned}$$

The function \(f_{\lambda }\) is Fréchet differentiable and

$$\begin{aligned} \nabla f_{\lambda }(x) = \frac{1}{\lambda }\left( x - {{\,\mathrm{prox}\,}}_{\lambda f}(x)\right) = (\partial f)_{\lambda }(x) \quad \forall x \in \mathcal {H}. \end{aligned}$$

Finally, if \(f : \mathcal {H}\rightarrow \mathbb {R}\) has full domain and is Fréchet differentiable with \(\frac{1}{\beta }\)-Lipschitz continuous gradient, for \(\beta >0\), then, according to Baillon–Haddad’s Theorem, \(\nabla f\) is \(\beta \)-cocoercive.
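For example, for \(f = |\cdot |\) on \(\mathbb {R}\) the Moreau envelope \(f_{\lambda }\) is the Huber function, and the identity \(\nabla f_{\lambda } = (\partial f)_{\lambda }\) can be confirmed by finite differences. A minimal sketch (again our own illustration, not part of the original text):

```python
import numpy as np

lam = 0.5

def prox(x):                                   # prox_{lam f}(x) for f = |.|: soft-thresholding
    return np.sign(x) * max(abs(x) - lam, 0.0)

def moreau(x):                                 # f_lam(x) = |prox(x)| + (x - prox(x))^2 / (2 lam)
    p = prox(x)
    return abs(p) + (x - p) ** 2 / (2.0 * lam)

for x in np.linspace(-2.0, 2.0, 9):
    grad_fd = (moreau(x + 1e-6) - moreau(x - 1e-6)) / 2e-6   # finite-difference gradient of f_lam
    yosida = (x - prox(x)) / lam                             # (partial f)_lam (x)
    print(f"x = {x:+.2f}:  {grad_fd:+.6f}  vs  {yosida:+.6f}")  # columns agree (Huber gradient)
```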

1.3 A Brief History of Inertial Systems Attached to Optimization Problems and Monotone Inclusions

In recent years, there have been many advances in the study of continuous time inertial systems with vanishing damping attached to monotone inclusion problems. We briefly review them in the following paragraphs.

1.3.1 The Heavy Ball Method with Friction

Consider a convex and continuously differentiable function \(f : \mathcal {H}\rightarrow \mathbb {R}\) with at least one minimizer. The heavy ball with friction system

$$\begin{aligned} \text {(HBF)} \quad \ddot{x}(t) + \mu \dot{x}(t) + \nabla f(x(t)) = 0 \end{aligned}$$
(3)

was introduced by Álvarez in [2] as a suitable continuous time scheme to approach the minimization of the function f. This system can be seen as the equation of the horizontal position x(t) of an object that moves, under the force of gravity, along the graph of the function f, subject to a kinetic friction represented by the term \(\mu \dot{x}(t)\) (a nice derivation can be found in the work of Attouch-Goudou-Redont [8]). It is known that, if x is a solution of (HBF), then x(t) converges weakly to a minimizer of f and \(f(x(t)) - \min _{\mathcal {H}}f = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \).

In recent times, the question was raised whether the damping coefficient \(\mu \) could be chosen to be time-dependent. An important contribution was made by Su–Boyd–Candès (in [20]), who studied the case of an Asymptotic Vanishing Damping coefficient \(\mu (t) = \frac{\alpha }{t}\), namely,

$$\begin{aligned} \text {(AVD)} \quad \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \nabla f(x(t)) = 0, \end{aligned}$$
(4)

and proved that, when \(\alpha \ge 3\), the functional values converge with the rate \(f(x(t)) - \min _{\mathcal {H}}f = \mathcal {O}\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \). This second order system can be seen as a continuous counterpart of Nesterov's accelerated gradient method from [19]. Weak convergence of the trajectories generated by \(\text {(AVD)}\) when \(\alpha > 3\) has been shown by Attouch-Chbani-Peypouquet-Redont [6] and May [18], with the improved rate of convergence for the functional values \(f(x(t)) - \min _{\mathcal {H}}f = o\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \). For \(\alpha = 3\), the convergence of the trajectories remains an open question, except for the one-dimensional case (see [7]). In the subcritical case \(\alpha \le 3\), it has been shown by Apidopoulos-Aujol-Dossal [5] and Attouch-Chbani-Riahi [7] that the objective values converge at a rate \(\mathcal {O}(t^{-\frac{2\alpha }{3}})\) as \(t\rightarrow +\infty \).
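The \(\mathcal {O}\left( \frac{1}{t^{2}}\right) \) behaviour is easy to observe numerically. The sketch below (our own illustration; it takes \(f(x) = \frac{1}{2}x^{2}\), so that \(\min _{\mathcal {H}}f = 0\), and \(\alpha = 4\)) integrates \(\text {(AVD)}\) with an off-the-shelf Runge–Kutta solver and prints \(t^{2}(f(x(t)) - \min _{\mathcal {H}}f)\), which remains bounded, consistent with the rate:

```python
import numpy as np
from scipy.integrate import solve_ivp

alpha, t0 = 4.0, 1.0
f = lambda x: 0.5 * x**2                 # min f = 0, attained at x = 0
grad = lambda x: x

def avd(t, z):                           # z = (x, x'); (AVD): x'' + (alpha/t) x' + grad f(x) = 0
    x, v = z
    return [v, -(alpha / t) * v - grad(x)]

ts = np.geomspace(10.0, 1000.0, 5)
sol = solve_ivp(avd, (t0, ts[-1]), [1.0, 0.0], t_eval=ts, rtol=1e-10, atol=1e-12)
for t, x in zip(sol.t, sol.y[0]):
    print(f"t = {t:7.1f}   t^2 * (f(x(t)) - min f) = {t**2 * f(x):.3e}")
```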

1.3.2 Heavy Ball Dynamics and Cocoercive Operators

If \(f : \mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) is a proper, convex and lower semicontinuous function which is not necessarily differentiable, then we cannot make direct use of (3). However, since for \(\lambda > 0\) we have \({{\,\mathrm{argmin}\,}}f = {{\,\mathrm{argmin}\,}}f_{\lambda }\), we can replace f by its Moreau envelope \(f_{\lambda }\), and the system now becomes

$$\begin{aligned} \ddot{x}(t) + \mu \dot{x}(t) + \nabla f_{\lambda }(x(t)) = 0. \end{aligned}$$

In line with this idea, and in analogy with (3), Álvarez and Attouch [3] and Attouch and Maingé [11] studied the dynamics

$$\begin{aligned} \ddot{x}(t) + \mu \dot{x}(t) + B(x(t)) = 0, \end{aligned}$$
(5)

where \(B : \mathcal {H}\rightarrow \mathcal {H}\) is a \(\beta \)-cocoercive operator. They were able to prove that the solutions of this system converge weakly to elements of \({{\,\mathrm{zer}\,}}B\), provided that the cocoercivity parameter \(\beta \) and the damping coefficient \(\mu \) satisfy \(\beta \mu ^{2} > 1\). For a maximally monotone operator \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\), we know that its Moreau envelope \(A_{\lambda }\) is \(\lambda \)-cocoercive and thus, under the condition \(\lambda \mu ^{2} > 1\), the trajectories of

$$\begin{aligned} \ddot{x}(t) + \mu \dot{x}(t) + A_{\lambda }(x(t)) = 0 \end{aligned}$$

converge weakly to elements of \({{\,\mathrm{zer}\,}}A_{\lambda } = {{\,\mathrm{zer}\,}}A\).

Also related to (5), Boţ-Csetnek [16] considered the system

$$\begin{aligned} \ddot{x}(t) + \mu (t)\dot{x}(t) + \nu (t)Bx(t) = 0, \end{aligned}$$
(6)

where \(B : \mathcal {H}\rightarrow \mathcal {H}\) is again \(\beta \)-cocoercive. Under the assumption that \(\mu \) and \(\nu \) are locally absolutely continuous, \(\dot{\mu }(t) \le 0 \le \dot{\nu }(t)\) for almost every \(t\in [0, +\infty )\) and \(\inf _{t\ge 0} \frac{\mu ^{2}(t)}{\nu (t)} > \frac{1}{\beta }\), the authors were able to prove that the solutions to this system converge weakly to zeros of B.

In [12], Attouch and Peypouquet addressed the system

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + A_{\lambda (t)}(x(t)) = 0, \end{aligned}$$
(7)

where \(\alpha > 1\) and the time-dependent regularizing parameter \(\lambda (t)\) satisfies \(\lambda (t) \frac{\alpha ^{2}}{t^{2}} > 1\) for every \(t \ge t_0 >0\). As well as ensuring the weak convergence of the trajectories towards elements of \({{\,\mathrm{zer}\,}}A\), choosing the regularizing parameter in such a fashion allowed the authors to obtain fast convergence of the velocities and accelerations towards zero.

1.3.3 Inertial Dynamics with Hessian Damping

Let us return briefly to the \(\text {(AVD)}\) system (4). In addition to the viscous vanishing damping term \(\frac{\alpha }{t}\dot{x}(t)\), the following system with Hessian-driven damping was considered by Attouch-Peypouquet-Redont in [13]

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \nabla ^{2}f(x(t))\dot{x}(t) + \nabla f(x(t)) = 0, \end{aligned}$$

where \(\xi \ge 0\). While preserving the fast convergence properties of the Nesterov accelerated method, the Hessian-driven damping term reduces the oscillatory aspect of the trajectories. In [9], Attouch and László studied a version of (7) with an added Hessian-driven damping term:

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}A_{\lambda (t)}(x(t))\right) + A_{\lambda (t)}(x(t)) = 0. \end{aligned}$$

While preserving the convergence results of (7), the main benefit of the introduction of this damping term is the fast convergence rates that can be obtained for \(A_{\lambda (t)}(x(t))\) and \(\frac{d}{dt}A_{\lambda (t)}(x(t))\) as \(t\rightarrow +\infty \). The regularizing parameter \(\lambda (t)\) is again chosen to be time-dependent; in the general case, the authors take \(\lambda (t) = \lambda t^{2}\), and in [12] it is shown that taking \(\lambda (t)\) this way is critical. However, in the case where \(A = \partial f\) for a proper, convex and lower semicontinuous function f, it is also allowed to take \(\lambda (t) = \lambda t^{r}\) with \(r \ge 0\).

1.4 Layout of the Paper

In Sect. 2, we give the proof for the existence and uniqueness of strong global solutions to (Split-DIN-AVD) by means of a Cauchy–Lipschitz–Picard argument. In Sect. 3, we state the main theorem of this work, and we show the weak convergence of the solutions of (2) to elements of \({{\,\mathrm{zer}\,}}(A + B)\), as well as the fast convergence of the velocities and accelerations to zero. We also provide convergence rates for \(T_{\lambda (t), \gamma (t)}(x(t))\) and \(\frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\) as \(t\rightarrow +\infty \). We explore the particular cases \(A = 0\) and \(B = 0\), and show improvements with respect to previous works. In Sect. 4, we address the convex minimization case, namely, when \(A = \partial f\) and \(B = \nabla g\), where \(f : \mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) is a proper, convex and lower semicontinuous function and \(g : \mathcal {H}\rightarrow \mathbb {R}\) is a convex and Fréchet differentiable function with Lipschitz continuous gradient, and derive, in addition, a fast convergence rate for the function values. In Sect. 5, we illustrate the theoretical results by numerical experiments. In Sect. 6, we provide an algorithm that arises from a time discretization of (Split-DIN-AVD) and discuss its convergence properties.

2 Existence and Uniqueness of Trajectories

In this section, we show the existence and uniqueness of strong global solutions to (Split-DIN-AVD). For the sake of clarity, first we state the definition of a strong global solution.

Definition 2.1

We say that \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) is a strong global solution of (Split-DIN-AVD) with Cauchy data \(( x_{0}, u_{0}) \in \mathcal {H} \times \mathcal {H}\) if

  1. (i)

    \(x, \dot{x} : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) are locally absolutely continuous;

  2. (ii)

    \(\ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right) + T_{\lambda (t), \gamma (t)}(x(t)) = 0\) for almost every \(t\in [t_{0}, +\infty )\);

  3. (iii)

    \(x(t_{0}) = x_{0}\), \(\dot{x}(t_{0}) = u_{0}\).

A classic solution is simply a strong global solution which is \(\mathcal {C}^{2}\). At times, we will speak of a strong global solution or a classic global solution without explicitly mentioning the Cauchy data.

The following lemma will be used to prove the existence of strong global solutions of our system, and we will need it in the proof of the main theorem as well.

Lemma 2.2

Let \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\) be a maximally monotone operator and \(B : \mathcal {H} \rightarrow \mathcal {H}\) a \(\beta \)-cocoercive operator for some \(\beta > 0\). Then, the following statements hold:

  1. (i)

    For \(\lambda > 0\) and \(\gamma \in (0, 2\beta )\), \(T_{\lambda , \gamma }\) is a \(\lambda \frac{4\beta - \gamma }{4\beta }\)-cocoercive operator. In particular, this also implies that \(T_{\lambda , \gamma }\) is \(\frac{\lambda }{2}\)-cocoercive.

  2. (ii)

    Choose \(\lambda _{1}, \lambda _{2} > 0\), \(\gamma _{1}, \gamma _{2}\in (0, 2\beta )\) and \(x, y\in \mathcal {H}\). Then, for \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\) it holds

    $$\begin{aligned} \Vert \lambda _{1}T_{\lambda _{1}, \gamma _{1}}(x) - \lambda _{2}T_{\lambda _{2}, \gamma _{2}}(y)\Vert&\le \, 4\Vert x - y\Vert + \frac{4\beta |\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert B(x)\Vert \\&\quad +\frac{2|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert x - \overline{x}\Vert ,\\ \left\| T_{\lambda _{1}, \gamma _{1}}(x) - T_{\lambda _{2}, \gamma _{2}}(y)\right\|&\le \, \frac{1}{\lambda _{1}}\left[ 4\Vert x - y\Vert + 4\beta \frac{|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert Bx\Vert \right. \\ {}&\quad \left. + 2\frac{|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert x - \overline{x}\Vert \right] + 2\frac{|\lambda _{2} - \lambda _{1}|}{\lambda _{1}\lambda _{2}}\Vert y - \overline{x}\Vert . \end{aligned}$$
  3. (iii)

    If x is a classic global solution to (2) and \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\), then, for every \(t\ge t_{0}\), we have

    $$\begin{aligned} \left\| \frac{d}{dt}\left( \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right) \right\| \le 4\Vert \dot{x}(t)\Vert + 4\beta \frac{|\dot{\gamma }(t)|}{\gamma (t)}\Vert B(x(t))\Vert + 2\frac{|\dot{\gamma }(t)|}{\gamma (t)}\Vert x(t) - \overline{x}\Vert . \end{aligned}$$

Proof

  1. (i)

    From [14, Proposition 26.1(iv)(d)] we know that the operator \(J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B)\) is \(\alpha = \frac{2\beta }{4\beta - \gamma }\)-averaged. From [14, Proposition 4.39], we obtain that \({{\,\mathrm{Id}\,}}- J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B)\) is \(\frac{1}{2\alpha }\)-cocoercive, namely, it is \(\frac{4\beta - \gamma }{4\beta }\)-cocoercive. Since \(\gamma \in (0, 2\beta )\), we have \(\frac{4\beta - \gamma }{4\beta } > \frac{2\beta }{4\beta } = \frac{1}{2}\), which implies that \({{\,\mathrm{Id}\,}}- J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B)\) is \(\frac{1}{2}\)-cocoercive and thus

    $$\begin{aligned} T_{\lambda , \gamma } \,\,\,\text {is}\,\,\,\lambda \frac{4\beta - \gamma }{4\beta }\text {-cocoercive}\,\,\,\text {and}\,\,\,T_{\lambda , \gamma } \,\,\,\text {is}\,\,\,\frac{\lambda }{2}\text {-cocoercive}. \end{aligned}$$
  2. (ii)

    We have

    $$\begin{aligned}&\Vert \lambda _{1}T_{\lambda _{1}, \gamma _{1}}(x) - \lambda _{2}T_{\lambda _{2}, \gamma _{2}}(y)\Vert \le \Vert x - y\Vert + \Vert J_{\gamma _{1}A}(x - \gamma _{1}B(x)) - J_{\gamma _{2}A}(y - \gamma _{2}B(y))\Vert \\&\quad \le \Vert x - y\Vert + \Vert J_{\gamma _{1}A}(x - \gamma _{1}B(x)) - J_{\gamma _{2}A}(x - \gamma _{1}B(x))\Vert \\&\qquad + \Vert J_{\gamma _{2}A}(x - \gamma _{1}B(x)) - J_{\gamma _{2}A}(y - \gamma _{2}B(y))\Vert \\&\quad \le \ 2\Vert x - y\Vert + |\gamma _{1} - \gamma _{2}|\Vert A_{\gamma _{1}}(x - \gamma _{1}B(x))\Vert + \Vert \gamma _{1}B(x) - \gamma _{2}B(y)\Vert \\&\quad \le \ 2\Vert x - y\Vert + |\gamma _{1} - \gamma _{2}|\Vert A_{\gamma _{1}}(x - \gamma _{1}B(x))\Vert \\&\qquad + \Vert \gamma _{1}B(x) - \gamma _{2}B(x)\Vert + \Vert \gamma _{2}B(x) - \gamma _{2}B(y)\Vert \\&\quad = \ 2\Vert x - y\Vert + |\gamma _{1} - \gamma _{2}|\Vert A_{\gamma _{1}}(x - \gamma _{1}B(x))\Vert \\&\qquad + |\gamma _{1} - \gamma _{2}|\Vert B(x)\Vert + \gamma _{2}\Vert B(x) - B(y)\Vert . \end{aligned}$$

Now, notice that

$$\begin{aligned} A_{\gamma _{1}}(x - \gamma _{1}B(x))&= \frac{1}{\gamma _{1}}({{\,\mathrm{Id}\,}}- J_{\gamma _{1}A})(x - \gamma _{1}B(x)) = - B(x) + \frac{1}{\gamma _{1}}(x - J_{\gamma _{1}A}(x - \gamma _{1}B(x))) \\&= -B(x) + T_{\gamma _{1}, \gamma _{1}}(x), \end{aligned}$$

so using (i) and the fact that \(T_{\gamma _{1}, \gamma _{1}}(\overline{x}) = 0\), we obtain

$$\begin{aligned} \begin{aligned} \Vert A_{\gamma _{1}}(x - \gamma _{1}B(x))\Vert&= \Vert -B(x) + T_{\gamma _{1},\gamma _{1}}(x)\Vert \le \Vert B(x)\Vert + \Vert T_{\gamma _{1}, \gamma _{1}}(x) - T_{\gamma _{1}, \gamma _{1}}(\overline{x})\Vert \\&\le \Vert B(x)\Vert + \frac{2}{\gamma _{1}}\Vert x - \overline{x}\Vert . \end{aligned} \end{aligned}$$
(8)

Altogether, plugging (8) into our initial inequality yields

$$\begin{aligned}&\Vert \lambda _{1}T_{\lambda _{1}, \gamma _{1}}(x) - \lambda _{2}T_{\lambda _{2}, \gamma _{2}}(y)\Vert \le \, 2\Vert x - y\Vert + 2|\gamma _{1} - \gamma _{2}|\Vert B(x)\Vert + \frac{2|\gamma _{1} -\gamma _{2}|}{\gamma _{1}}\Vert x - \overline{x}\Vert \\&\qquad + \gamma _{2}\Vert B(x) -B(y)\Vert \\&\quad \le \, 2\Vert x - y\Vert + \frac{4\beta |\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert B(x)\Vert + \frac{2|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert x - \overline{x}\Vert + 2\beta \left( \frac{1}{\beta }\right) \Vert x - y\Vert . \end{aligned}$$

To show the second inequality, we use the previous one. We have

$$\begin{aligned}&\left\| T_{\lambda _{1}, \gamma _{1}}(x) - T_{\lambda _{2}, \gamma _{2}}(y)\right\| = \frac{1}{\lambda _{1}}\left\| \lambda _{1}T_{\lambda _{1}, \gamma _{1}}(x) - \lambda _{2}T_{\lambda _{2}, \gamma _{2}}(y) + (\lambda _{2} - \lambda _{1})T_{\lambda _{2}, \gamma _{2}}(y)\right\| \\&\quad \le \frac{1}{\lambda _{1}}\left[ 4\Vert x - y\Vert + 4\beta \frac{|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert Bx\Vert + 2\frac{|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert x - \overline{x}\Vert \right] + \frac{|\lambda _{2} - \lambda _{1}|}{\lambda _{1}}\left\| T_{\lambda _{2}, \gamma _{2}}(y)\right\| \\&\quad \le \frac{1}{\lambda _{1}}\left[ 4\Vert x - y\Vert + 4\beta \frac{|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert Bx\Vert + 2\frac{|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert x - \overline{x}\Vert \right] \\ {}&\qquad + 2\frac{|\lambda _{2} - \lambda _{1}|}{\lambda _{1}\lambda _{2}}\left\| y - \overline{x}\right\| , \end{aligned}$$

where the last line is a consequence of \(T_{\lambda _{2}, \gamma _{2}}\) being \(\frac{\lambda _{2}}{2}\)-cocoercive, and hence \(\frac{2}{\lambda _{2}}\)-Lipschitz continuous (see (i)).

(iii) For \(t, s \ge t_{0}\) set

$$\begin{aligned} x = x(t),\,\, y = x(s),\,\, \lambda _{1} = \lambda (t),\,\, \gamma _{1} = \gamma (t),\,\, \lambda _{2} = \lambda (s),\,\, \gamma _{2} = \gamma (s) \end{aligned}$$

and use (ii) to obtain, for all \(t, s\ge t_{0}\) with \(t\ne s\),

$$\begin{aligned}&\frac{\Vert \lambda (t)T_{\lambda (t), \gamma (t)}(x(t)) - \lambda (s)T_{\lambda (s), \gamma (s)}(x(s))\Vert }{|t - s|} \\&\quad \le 4\frac{\Vert x(t) - x(s)\Vert }{|t - s|} + \frac{4\beta }{\gamma (t)}\frac{|\gamma (t) - \gamma (s)|}{|t - s|}\Vert B(x(t))\Vert +\, \frac{2}{\gamma (t)}\frac{|\gamma (t) - \gamma (s)|}{|t - s|}\Vert x(t) - \overline{x}\Vert . \end{aligned}$$

Hence, by taking the limit as \(s\rightarrow t\) we get, for any \(t\ge t_{0}\),

$$\begin{aligned} \left\| \frac{d}{dt}\lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| \le 4\Vert \dot{x}(t)\Vert + 4\beta \frac{|\dot{\gamma }(t)|}{\gamma (t)}\Vert B(x(t))\Vert + 2\frac{|\dot{\gamma }(t)|}{\gamma (t)}\Vert x(t) - \overline{x}\Vert . \end{aligned}$$

\(\square \)
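Before moving on, let us note that Lemma 2.2(i) is easy to sanity-check numerically. The sketch below is our own illustration (with \(A = \partial \Vert \cdot \Vert _{1}\) and \(B(x) = Mx\) for M symmetric positive semidefinite, which is \(\beta \)-cocoercive for \(\beta = 1/\Vert M\Vert \)); it tests the cocoercivity inequality \(\langle T_{\lambda , \gamma }(x) - T_{\lambda , \gamma }(y), x - y\rangle \ge \lambda \frac{4\beta - \gamma }{4\beta }\Vert T_{\lambda , \gamma }(x) - T_{\lambda , \gamma }(y)\Vert ^{2}\) on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
Q = rng.standard_normal((6, 6))
M = Q.T @ Q                                   # symmetric positive semidefinite
beta = 1.0 / np.linalg.norm(M, 2)             # B(x) = M x is beta-cocoercive

def T(x, lam, gamma):                         # T_{lam,gamma} with A = d||.||_1 and B = M
    u = x - gamma * (M @ x)
    return (x - np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)) / lam

for _ in range(1000):
    lam = rng.uniform(0.1, 3.0)
    gamma = rng.uniform(0.05, 0.95) * 2 * beta          # gamma in (0, 2*beta)
    x, y = rng.standard_normal(6), rng.standard_normal(6)
    d = T(x, lam, gamma) - T(y, lam, gamma)
    assert d @ (x - y) >= lam * (4 * beta - gamma) / (4 * beta) * (d @ d) - 1e-10
print("cocoercivity inequality of Lemma 2.2(i) verified on all samples")
```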

The next theorem concerns the existence and uniqueness of strong global solutions to (Split-DIN-AVD).

Theorem 2.3

Assume that \(\lambda , \gamma : [t_{0}, +\infty )\rightarrow (0, +\infty )\) are Lebesgue measurable functions and that \(\inf _{t\ge t_{0}}\lambda (t) > 0\). Then, for any \((x_{0}, u_{0})\in \mathcal {H}\times \mathcal {H}\) there exists a unique strong global solution \(x : [t_{0}, +\infty )\rightarrow \mathcal {H}\) of the system (2) that satisfies \(x(t_{0}) = x_{0}\) and \(\dot{x}(t_{0}) = u_{0}\).

Proof

We will rely on [17, Proposition 6.2.1] and distinguish between the cases \(\xi > 0\) and \(\xi = 0\). For each case, we will check that the conditions of the aforementioned proposition are fulfilled. We will be working in the real Hilbert space \(\mathcal {H}\times \mathcal {H}\) endowed with the norm \(\Vert (x, y)\Vert = \Vert x\Vert + \Vert y\Vert \). Let \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\) be fixed.

The Case \(\xi > 0\). First, it can be easily checked (see also [4, 9, 13]) that for all \(t\ge t_{0}\) the following dynamical systems are equivalent

\(*\):

\(\displaystyle \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right) + T_{\lambda (t), \gamma (t)}(x(t)) = 0\).

\(*\):

\(\displaystyle {\left\{ \begin{array}{ll} \dot{x}(t) + \xi T_{\lambda (t), \gamma (t)}(x(t)) - \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) x(t) + \frac{1}{\xi }y(t) = 0, \\ \dot{y}(t) - \left( \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right) x(t) + \frac{1}{\xi }y(t) = 0. \end{array}\right. }\)
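Indeed, differentiating the first equation of the latter system with respect to time gives, for every \(t\ge t_{0}\),

$$\begin{aligned} \ddot{x}(t) + \xi \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t)) - \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) \dot{x}(t) - \frac{\alpha }{t^{2}}x(t) + \frac{1}{\xi }\dot{y}(t) = 0, \end{aligned}$$

while combining the two equations yields

$$\begin{aligned} \frac{1}{\xi }\dot{y}(t) = \frac{1}{\xi }\left( \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right) x(t) - \frac{1}{\xi ^{2}}y(t) = \frac{\alpha }{t^{2}}x(t) + \frac{1}{\xi }\dot{x}(t) + T_{\lambda (t), \gamma (t)}(x(t)), \end{aligned}$$

where the second equality uses \(\frac{1}{\xi }y(t) = -\dot{x}(t) - \xi T_{\lambda (t), \gamma (t)}(x(t)) + \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) x(t)\) from the first equation. Substituting this expression for \(\frac{1}{\xi }\dot{y}(t)\) into the differentiated equation, the terms involving x(t) cancel and (2) is recovered.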

In other words, (2) with Cauchy data \((x_{0}, u_{0}) = (x(t_{0}), \dot{x}(t_{0}))\) is equivalent to the first order system

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{z}(t) = F(t, z(t)), \\ z(t_{0}) = (x_{0}, y_{0}), \end{array}\right. } \end{aligned}$$

where \(z(t) = (x(t), y(t))\) and F is given, for every \(t\ge t_{0}\), by

$$\begin{aligned} F(t, (x, y)) = \left[ -\xi T_{\lambda (t), \gamma (t)}(x) + \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) x - \frac{1}{\xi }y, \, \left( \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right) x - \frac{1}{\xi }y\right] \end{aligned}$$

and the Cauchy data is \(x_{0} = x(t_{0})\), \(y_{0} = -\xi \left( u_{0} + \xi T_{\lambda (t_{0}), \gamma (t_{0})}(x_{0}) - \left( \frac{1}{\xi } - \frac{\alpha }{t_{0}}\right) x_{0}\right) \).

  1. (i)

    Let \(t\in [t_{0}, +\infty )\) be fixed. We need to verify the Lipschitz continuity of F in the z variable. Set \(z = (x, y)\) and \(w = (u, v)\). We have

    $$\begin{aligned} \Vert F(t, z) - F(t, w)\Vert =&\ \left\| -\xi \left( T_{\lambda (t), \gamma (t)}(x) - T_{\lambda (t), \gamma (t)}(u)\right) + \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) (x - u) - \frac{1}{\xi }(y - v)\right\| \\&+ \left\| \left( \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right) (x - u) - \frac{1}{\xi }(y - v)\right\| . \end{aligned}$$

    Set \(\underline{\lambda } := \inf _{t\ge t_{0}}\lambda (t) > 0\). According to Lemma 2.2(i), the term involving the operator \(T_{\lambda (t), \gamma (t)}\) satisfies

    $$\begin{aligned} \left\| T_{\lambda (t), \gamma (t)}(x) - T_{\lambda (t), \gamma (t)}(u)\right\| \le \frac{2}{\lambda (t)}\Vert x - u\Vert \le \frac{2}{\underline{\lambda }}\Vert x - u\Vert . \end{aligned}$$

    It follows that, if we take

    $$\begin{aligned} K(t) := \max \left\{ \frac{2\xi }{\underline{\lambda }} + \left| \frac{1}{\xi } - \frac{\alpha }{t}\right| + \left| \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right| , \, \frac{2}{\xi }\right\} \quad \forall t\ge t_{0}, \end{aligned}$$

    then we have \(K\in L_{\text {loc}}^{1}([t_{0}, +\infty ), \mathbb {R})\) and

    $$\begin{aligned} \Vert F(t, z) - F(t, w)\Vert \le K(t)\Vert z - w\Vert \quad \forall t\ge t_{0}. \end{aligned}$$
  2. (ii)

    Now, we claim that F fulfills a boundedness condition. For \(t\in [t_{0}, +\infty )\) and \(z = (x, y)\in \mathcal {H} \times \mathcal {H}\) we have

    $$\begin{aligned} \Vert F(t, z)\Vert = \left\| -\xi T_{\lambda (t), \gamma (t)}(x) + \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) x - \frac{1}{\xi }y\right\| + \left\| \left( \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right) x - \frac{1}{\xi }y\right\| . \end{aligned}$$

    By Lemma 2.2(i), we have, for every \(t\ge t_{0}\),

    $$\begin{aligned} \left\| T_{\lambda (t), \gamma (t)}(x)\right\| = \left\| T_{\lambda (t), \gamma (t)}(x) - T_{\lambda (t), \gamma (t)}(\overline{x})\right\| \le \frac{2}{\lambda (t)}\Vert x - \overline{x}\Vert . \end{aligned}$$

    Hence, if we take

    $$\begin{aligned} P(t) = \max \left\{ \frac{2\xi }{\lambda (t)} + \left| \frac{1}{\xi } - \frac{\alpha }{t}\right| + \left| \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right| , \, \frac{2\xi }{\lambda (t)}\Vert \overline{x}\Vert ,\, \frac{2}{\xi }\right\} \quad \forall t\ge t_{0}, \end{aligned}$$

    then we have \(P\in L_{\text {loc}}^{1}([t_{0}, +\infty ), \mathbb {R})\) and

    $$\begin{aligned} \Vert F(t, z)\Vert \le P(t)(1 + \Vert z\Vert ). \end{aligned}$$

We have checked that the conditions of [17, Proposition 6.2.1] hold. Therefore, there exists a unique locally absolutely continuous solution \(t\mapsto x(t)\) of (2) that satisfies \(x(t_{0}) = x_{0}\) and \(\dot{x}(t_{0}) = u_0\).

The Case \(\xi = 0\). Now, (2) is easily seen to be equivalent to

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{z}(t) = F(t, z(t)), \\ z(t_{0}) = (x_{0}, u_{0}), \end{array}\right. } \end{aligned}$$

where \(z(t) = (x(t), y(t))\) and F is given, for every \(t\ge t_{0}\), by

$$\begin{aligned} F(t, (x, y)) = \left[ y, \, -\frac{\alpha }{t}y - T_{\lambda (t), \gamma (t)}(x)\right] . \end{aligned}$$

Showing that F fulfills the required properties is straightforward. \(\square \)
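Beyond its role in the proof, the first order reformulation gives a direct way to compute trajectories of (Split-DIN-AVD) numerically. The following Python sketch is our own illustration under concrete choices (\(A = \partial \Vert \cdot \Vert _{1}\), \(B(x) = Mx\), \(\lambda (t) = \lambda t^{2}\), constant \(\gamma \), and scipy's adaptive Runge–Kutta integrator as a stand-in for a rigorous solver); it integrates the system in the variables \(z = (x, y)\) for \(\xi > 0\) and monitors \(\Vert T_{\lambda (t), \gamma }(x(t))\Vert \):

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(3)
Q = rng.standard_normal((5, 5))
M = Q.T @ Q                                    # B(x) = M x is beta-cocoercive
beta = 1.0 / np.linalg.norm(M, 2)
alpha, xi, lam_c, gamma, t0 = 2.0, 1.0, 3.0, beta, 1.0   # lam_c > 2/(alpha - 1)^2 = 2

def T(x, t):                                   # T_{lam(t), gamma} with lam(t) = lam_c * t^2
    u = x - gamma * (M @ x)
    return (x - np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)) / (lam_c * t**2)

def F(t, z):                                   # the first order system from the proof (case xi > 0)
    x, y = z[:5], z[5:]
    dx = -xi * T(x, t) + (1.0/xi - alpha/t) * x - y / xi
    dy = (1.0/xi - alpha/t + alpha*xi/t**2) * x - y / xi
    return np.concatenate([dx, dy])

x0, u0 = rng.standard_normal(5), np.zeros(5)
y0 = -xi * (u0 + xi * T(x0, t0) - (1.0/xi - alpha/t0) * x0)  # Cauchy data as above
sol = solve_ivp(F, (t0, 200.0), np.concatenate([x0, y0]),
                t_eval=[10.0, 50.0, 200.0], rtol=1e-9, atol=1e-11)
for t, z in zip(sol.t, sol.y.T):
    print(f"t = {t:6.1f}   ||T(x(t))|| = {np.linalg.norm(T(z[:5], t)):.3e}")
```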

3 The Convergence Properties of the Trajectories

In this section, we will study the asymptotic behaviour of the trajectories of the system

$$\begin{aligned} \text {(Split-DIN-AVD)} \quad \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \frac{d}{dt}\left( T_{\lambda (t), \gamma (t)}(x(t))\right) + T_{\lambda (t), \gamma (t)}(x(t)) = 0, \end{aligned}$$

where

$$\begin{aligned} T_{\lambda , \gamma } = \frac{1}{\lambda }\big [{{\,\mathrm{Id}\,}}- J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B)\big ]. \end{aligned}$$

We will show weak convergence of the trajectories generated by (2) to elements of \({{\,\mathrm{zer}\,}}(A + B)\), as well as the fast convergence of the velocities and accelerations to zero. Additionally, we will provide convergence rates for \(T_{\lambda (t), \gamma (t)}(x(t))\) and \(\frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\) as \(t\rightarrow +\infty \). To avoid repetition of the statement “for almost every t”, in the following theorem we will assume we are working with a classic global solution of our system.

Theorem 3.1

Let \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\) be a maximally monotone operator and \(B : \mathcal {H}\rightarrow \mathcal {H}\) a \(\beta \)-cocoercive operator for some \(\beta > 0\) such that \({{\,\mathrm{zer}\,}}(A + B)\ne \emptyset \). Assume that \(\alpha > 1\), \(\xi \ge 0\), \(\lambda (t) = \lambda t^{2}\) for \(\lambda > \frac{2}{(\alpha - 1)^{2}}\) and all \(t\ge t_{0}\), and that \(\gamma : [t_{0}, +\infty )\rightarrow (0, 2\beta )\) is a differentiable function that satisfies \(\frac{|\dot{\gamma }(t)|}{\gamma (t)} = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \). Then, for a solution \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) to (Split-DIN-AVD), the following statements hold:

  1. (i)

    x is bounded.

  2. (ii)

    We have the estimates

    $$\begin{aligned}&\int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty , \\&\int _{t_{0}}^{+\infty }\frac{\gamma ^{2}(t)}{t}\left\| A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right\| ^{2}dt < +\infty . \end{aligned}$$
  3. (iii)

    We have the convergence rates

    $$\begin{aligned}&\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) ,\\&\left\| A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right\| = o\left( \frac{1}{\gamma (t)}\right) , \\&\left\| \frac{d}{dt} \left( A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right) \right\| = \mathcal {O}\left( \frac{1}{t\gamma (t)}\right) + o\left( \frac{t^{2}\left| \frac{d}{dt}\frac{\gamma (t)}{\lambda (t)}\right| }{\gamma ^{2}(t)}\right) \end{aligned}$$

    as \(t\rightarrow +\infty \).

  4. (iv)

    If \(0< \inf _{t \ge t_{0}}\gamma (t) \le \sup _{t\ge t_{0}}\gamma (t) < 2\beta \), then x(t) converges weakly to an element of \({{\,\mathrm{zer}\,}}(A + B)\) as \(t \rightarrow +\infty \).

Proof

Integral Estimates and Rates. To develop the analysis, we will fix \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\) and make use of the Lyapunov function \(\mathcal {E} : [t_{0}, +\infty ) \rightarrow \mathbb {R}\) given by

$$\begin{aligned} \mathcal {E}(t) := \frac{1}{2}\left\| \frac{\alpha - 1}{2}(x(t) - \overline{x}) + t(\dot{x}(t) + \xi \,T_{\lambda (t), \gamma (t)}(x(t)))\right\| ^{2} + \frac{(\alpha - 1)^{2}}{8}\Vert x(t) - \overline{x}\Vert ^{2}. \end{aligned}$$
(9)

Differentiation of \(\mathcal {E}\) with respect to time yields, for every \(t\ge t_{0}\),

$$\begin{aligned} \begin{aligned} \dot{\mathcal {E}}(t) =&\ \left\langle \frac{\alpha - 1}{2}(x(t) - \overline{x}) + t\left( \dot{x}(t) + \xi \, T_{\lambda (t), \gamma (t)}(x(t))\right) \right. , \\&\quad \left. \frac{\alpha + 1}{2}\dot{x}(t) + \xi \, T_{\lambda (t), \gamma (t)}(x(t)) + t\left( \ddot{x}(t) + \xi \frac{d}{dt}\left( T_{\lambda (t), \gamma (t)}(x(t))\right) \right) \right\rangle \\&\quad + \frac{(\alpha - 1)^{2}}{4}\langle x(t) - \overline{x}, \dot{x}(t)\rangle . \end{aligned} \end{aligned}$$

After reduction and employing (2), we get, for every \(t\ge t_{0}\),

$$\begin{aligned} \begin{aligned} \dot{\mathcal {E}}(t)&= \frac{(\alpha - 1)(\xi - t)}{2}\left\langle x(t) - \overline{x}, T_{\lambda (t), \gamma (t)}(x(t))\right\rangle + \frac{(1 - \alpha )t}{2}\Vert \dot{x}(t)\Vert ^{2} \\&\quad + \left( -t^{2} + \frac{\xi (3 - \alpha )t}{2}\right) \left\langle T_{\lambda (t), \gamma (t)}(x(t)), \dot{x}(t)\right\rangle + \xi (\xi - t)t\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}. \end{aligned} \end{aligned}$$

Now, by Lemma 2.2(i), we know that \(T_{\lambda (t), \gamma (t)}\) is \(\frac{\lambda (t)}{2}\)-cocoercive for every \(t\ge t_{0}\). Using this for the first summand on the right hand side of the previous identity, and noting that \(\xi - t\le 0\) for \(t\ge t_{1} := \max \{\xi , t_{0}\}\), yields, for \(t\ge t_{1}\),

$$\begin{aligned} \begin{aligned} \dot{\mathcal {E}}(t) \le&\frac{(1 - \alpha )t}{2}\Vert \dot{x}(t)\Vert ^{2} + \left( -t^{2} + \frac{\xi (3 - \alpha )t}{2}\right) \left\langle T_{\lambda (t), \gamma (t)}(x(t)), \dot{x}(t)\right\rangle \\&+ \left( \frac{(\alpha - 1)(\xi - t)\lambda (t)}{4} + \xi (\xi - t)t\right) \left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}. \end{aligned} \end{aligned}$$
(10)

Now, since \(\lambda > \frac{2}{(\alpha - 1)^{2}}\), we can choose \(\epsilon > 0\) such that

$$\begin{aligned} 0< \epsilon< \alpha - 1 - \sqrt{\frac{2}{\lambda }} < \alpha - 1. \end{aligned}$$
(11)

From (10) we get, for every \(t\ge t_{1}\),

$$\begin{aligned} \begin{aligned}&\dot{\mathcal {E}}(t) \, +\, \frac{\epsilon }{2} t\Vert \dot{x}(t)\Vert ^{2} + \frac{\epsilon }{4} t\lambda (t)\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2} \\&\quad \le \left( \frac{1 - \alpha }{2} + \frac{\epsilon }{2}\right) t\Vert \dot{x}(t)\Vert ^{2} + \left( -t^{2} + \frac{\xi (3 - \alpha )t}{2}\right) \left\langle T_{\lambda (t), \gamma (t)}(x(t)), \dot{x}(t)\right\rangle \\&\qquad +\left( \left( \frac{(\alpha - 1)(\xi - t)}{2} + \frac{\epsilon }{2}t\right) \frac{\lambda (t)}{2} + \xi (\xi - t)t\right) \left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}. \end{aligned} \end{aligned}$$
(12)

By (11) and the definition of \(\lambda (t)\), we know that \(\frac{1 - \alpha }{2} + \frac{\epsilon }{2} < 0\), and

$$\begin{aligned} \left( \left( \frac{(\alpha - 1)(\xi - t)}{2} + \frac{\epsilon }{2}t\right) \frac{\lambda (t)}{2} + \xi (\xi - t)t\right) = \underbrace{\left( \frac{1 - \alpha }{2} + \frac{\epsilon }{2}\right) }_{< 0}\frac{\lambda }{2} t^{3} + \mathcal {O}(t^{2}), \end{aligned}$$

so we can find \(t_{2}\ge t_{1}\) such that for every \(t\ge t_{2}\) the previous expression becomes nonpositive. According to Lemma A.2, the right hand side of (12) is nonpositive whenever

$$\begin{aligned} R(t) := \left( -t^{2} + \frac{\xi (3 - \alpha )t}{2}\right) ^{2} - 4\left( \frac{1 - \alpha }{2} + \frac{\epsilon }{2}\right) t\left( \left( \frac{(\alpha - 1)(\xi - t)}{2} + \frac{\epsilon }{2}t\right) \frac{\lambda (t)}{2} + \xi (\xi - t)t\right) \le 0. \end{aligned}$$

This quantity can be rewritten as

$$\begin{aligned} R(t) = \left( 1 + 4\left( \frac{1 - \alpha }{2} + \frac{\epsilon }{2}\right) \left( \frac{\alpha - 1}{2} - \frac{\epsilon }{2}\right) \frac{\lambda }{2}\right) t^{4} + \mathcal {O}(t^{3}) \quad \text {as} \quad t\rightarrow +\infty . \end{aligned}$$

Since \(\epsilon < \alpha - 1 - \sqrt{\frac{2}{\lambda }}\), we have \(\frac{\lambda }{2} > \frac{1}{(\alpha - 1 - \epsilon )^{2}}\). Hence,

$$\begin{aligned} 1 + 4\left( \frac{1 - \alpha }{2} + \frac{\epsilon }{2}\right) \left( \frac{\alpha - 1}{2} - \frac{\epsilon }{2}\right) \frac{\lambda }{2} = 1 - (\alpha - 1 - \epsilon )^{2}\frac{\lambda }{2} < 0. \end{aligned}$$

This means we can find \(t_{3}\ge t_{2}\) such that for every \(t\ge t_{3}\) we have \(R(t) \le 0\), that is, for every \(t\ge t_{3}\) we have

$$\begin{aligned} \dot{\mathcal {E}}(t) + \frac{\epsilon }{2} t\Vert \dot{x}(t)\Vert ^{2} + \frac{\epsilon }{4} t\lambda (t)\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2} \le 0. \end{aligned}$$
(13)

Now, integrating (13) from \(t_{3}\) to t we obtain

$$\begin{aligned} \mathcal {E}(t) + \frac{\epsilon }{2}\int _{t_{3}}^{t}s\Vert \dot{x}(s)\Vert ^{2}ds + \frac{\epsilon }{4}\lambda \int _{t_{3}}^{t}s^{3}\left\| T_{\lambda (s), \gamma (s)}(x(s))\right\| ^{2}ds \le \mathcal {E}(t_{3}). \end{aligned}$$
(14)

From (14) and the form of \(\mathcal {E}\) we immediately obtain

$$\begin{aligned}&t\mapsto \Vert x(t) - \overline{x}\Vert \,\,\text {is bounded}, \end{aligned}$$
(15)
$$\begin{aligned}&\int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt < +\infty , \end{aligned}$$
(16)
$$\begin{aligned}&\int _{t_{0}}^{+\infty }t^{3}\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}dt < +\infty , \end{aligned}$$
(17)
$$\begin{aligned}&\sup _{t\ge t_{0}}\left\| \left( \frac{\alpha - 1}{2}\right) (x(t) - \overline{x}) + t\left( \dot{x}(t) + \xi \,T_{\lambda (t), \gamma (t)}(x(t))\right) \right\| < +\infty . \end{aligned}$$
(18)

From Lemma 2.2(i), we know that for every \(t\ge t_{0}\) the operator \(T_{\lambda (t), \gamma (t)}\) is \(\frac{2}{\lambda (t)}\)-Lipschitz continuous, which gives, for every \(t\ge t_{0}\),

$$\begin{aligned} \left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| = \left\| T_{\lambda (t), \gamma (t)}(x(t)) - T_{\lambda (t), \gamma (t)}(\overline{x})\right\| \le \frac{2}{\lambda (t)}\Vert x(t) - \overline{x}\Vert . \end{aligned}$$

Thus, from (15) and recalling that \(\lambda (t) = \lambda t^{2}\) we arrive at

$$\begin{aligned} \left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| = \mathcal {O}\left( \frac{1}{t^{2}}\right) \quad \text {as}\quad t\rightarrow +\infty . \end{aligned}$$
(19)

By combining (15), (18) and (19) we obtain \(\sup _{t\ge t_{0}}t\Vert \dot{x}(t)\Vert < +\infty \) and therefore

$$\begin{aligned} \Vert \dot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t}\right) \quad \text {as} \quad t\rightarrow +\infty . \end{aligned}$$
(20)

From Lemma 2.2(iii), (15), (20) and the fact that B is \(\frac{1}{\beta }\)-Lipschitz continuous we deduce that, as \(t\rightarrow +\infty \),

$$\begin{aligned} \left\| \frac{d}{dt}\lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| \le 4\Vert \dot{x}(t)\Vert + 4\beta \frac{|\dot{\gamma }(t)|}{\gamma (t)}\Vert B(x(t))\Vert + 2\frac{|\dot{\gamma }(t)|}{\gamma (t)}\Vert x(t) - \overline{x}\Vert = \mathcal {O}\left( \frac{1}{t}\right) . \end{aligned}$$
(21)

On the other hand, for every \(t\ge t_{0}\) we have

$$\begin{aligned} \left\| \frac{d}{dt}\lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| = \left\| \dot{\lambda }(t)T_{\lambda (t), \gamma (t)}(x(t)) + \lambda (t)\frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right\| , \end{aligned}$$
(22)

so by combining (19), (21), (22) and the fact that \(\dot{\lambda }(t) = 2\lambda t\) we arrive at

$$\begin{aligned}&\left\| \lambda (t)\frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right\| \le \underbrace{\left\| \frac{d}{dt}\lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| }_{\mathcal {O}\left( \frac{1}{t}\right) } + \dot{\lambda }(t)\underbrace{\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| }_{\mathcal {O}\left( \frac{1}{t^{2}}\right) } = \mathcal {O}\left( \frac{1}{t}\right) \\&\text {as}\quad t\rightarrow +\infty , \end{aligned}$$

which yields

$$\begin{aligned} \left\| \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right\| = \frac{1}{\lambda (t)}\mathcal {O}\left( \frac{1}{t}\right) = \mathcal {O}\left( \frac{1}{t^{3}}\right) \quad \text {as} \quad t\rightarrow +\infty . \end{aligned}$$
(23)

Let us now improve (19) and show that

$$\begin{aligned} \left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| = o\left( \frac{1}{t^{2}}\right) \quad \text {as} \quad t\rightarrow +\infty . \end{aligned}$$
(24)

According to (19) and (21) there exists a constant \(K > 0\) such that for every \(t\ge t_{0}\) it holds

$$\begin{aligned} \left| \frac{d}{dt}\left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{4}\right|&= 4\left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}\left| \left\langle \lambda (t)T_{\lambda (t), \gamma (t)}(x(t)), \frac{d}{dt}\lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\rangle \right| \\&\le 4 \left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}\left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| \left\| \frac{d}{dt}\lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| \\&\le \frac{4K}{t}\left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}. \end{aligned}$$

By (17), the right hand side belongs to \(L^{1}([t_{0}, +\infty ), \mathbb {R})\), so we get

$$\begin{aligned} \frac{d}{dt}\left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{4} \in L^{1}([t_{0}, +\infty ), \mathbb {R}), \end{aligned}$$

hence the limit

$$\begin{aligned} \lim _{t\rightarrow +\infty }\left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{4} \end{aligned}$$

exists. Obviously, this implies the existence of \(L:= \lim _{t\rightarrow +\infty }\left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}\). By using (17) again we come to

$$\begin{aligned} \int _{t_{0}}^{+\infty }\frac{1}{t}\left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}dt = \lambda ^{2}\int _{t_{0}}^{+\infty }t^{3}\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}dt < +\infty , \end{aligned}$$

and so we must have \(L = 0\), which gives

$$\begin{aligned} \left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| = o\left( \frac{1}{t^{2}}\right) \quad \text {as} \quad t\rightarrow +\infty . \end{aligned}$$
(25)

By combining (2), (19), (20) and (23) we obtain, as \(t\rightarrow +\infty \),

$$\begin{aligned} \Vert \ddot{x}(t)\Vert&= \left\| -\frac{\alpha }{t}\dot{x}(t) - \xi \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t)) - T_{\lambda (t), \gamma (t)}(x(t))\right\| \\&\le \frac{\alpha }{t}\underbrace{\Vert \dot{x}(t)\Vert }_{\mathcal {O}\left( \frac{1}{t}\right) } + \xi \underbrace{\left\| \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right\| }_{\mathcal {O}\left( \frac{1}{t^{3}}\right) } + \underbrace{\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| }_{\mathcal {O}\left( \frac{1}{t^{2}}\right) } = \mathcal {O}\left( \frac{1}{t^{2}}\right) . \end{aligned}$$

Moreover, by using the well-known inequality \(\Vert a + b + c\Vert ^{2} \le 3\Vert a\Vert ^{2} + 3\Vert b\Vert ^{2} + 3\Vert c\Vert ^{2}\), valid for every \(a, b, c\in \mathcal {H}\), we obtain for every \(t\ge t_{0}\)

$$\begin{aligned} t^{3}\Vert \ddot{x}(t)\Vert ^{2}&\le t^{3}\left\| - \frac{\alpha }{t}\dot{x}(t) - \xi \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t)) - T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2} \\&\le 3\alpha ^{2} t\Vert \dot{x}(t)\Vert ^{2} + 3\xi ^{2}t^{3}\left\| \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2} + 3t^{3}\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}. \end{aligned}$$

From (16), (23) and (17) it follows

$$\begin{aligned} \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt < +\infty . \end{aligned}$$
(26)

To see that \(\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \), we write, for every \(t\ge t_{0}\),

$$\begin{aligned} \frac{d}{dt} \left( t^{2}\Vert \dot{x}(t)\Vert ^{2} \right) = 2t\Vert \dot{x}(t)\Vert ^{2} + 2t^{2}\langle \dot{x}(t), \ddot{x}(t)\rangle \le 3t\Vert \dot{x}(t)\Vert ^{2} + t^{3}\Vert \ddot{x}(t)\Vert ^{2}. \end{aligned}$$

From (16) and (26) we deduce that the right hand side belongs to \(L^{1}([t_{0}, +\infty ), \mathbb {R})\); hence the positive part of \(\frac{d}{dt}\left( t^{2}\Vert \dot{x}(t)\Vert ^{2}\right) \) is integrable, from which we infer that the limit \(\lim _{t\rightarrow +\infty }t^{2}\Vert \dot{x}(t)\Vert ^{2}\) exists. Using (16) again, we get

$$\begin{aligned} \int _{t_{0}}^{+\infty }\frac{1}{t}\left( t^{2}\Vert \dot{x}(t)\Vert ^{2}\right) dt = \int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt < +\infty , \end{aligned}$$

from which we finally deduce \(\lim _{t\rightarrow +\infty }t^{2}\Vert \dot{x}(t)\Vert ^{2} = 0\), therefore

$$\begin{aligned} \Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) \quad \text {as} \quad t\rightarrow +\infty . \end{aligned}$$
(27)

Notice that we can write for every \(t\ge t_{0}\)

$$\begin{aligned} T_{\lambda (t), \gamma (t)}(x(t)) = \frac{1}{\lambda (t)}\Big [x(t) - J_{\gamma (t)A}\big (x(t) - \gamma (t)Bx(t)\big )\Big ] = \frac{\gamma (t)}{\lambda (t)}\left( A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right) . \end{aligned}$$

Hence, multiplying both sides of (25) by \(\frac{\lambda (t)}{\gamma (t)}\) and remembering the definition of \(\lambda (t)\) we obtain

$$\begin{aligned} \left\| A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right\| = o\left( \frac{1}{\gamma (t)}\right) \quad \text {as} \quad t\rightarrow +\infty . \end{aligned}$$
(28)

For every \(t\ge t_{0}\), we have

$$\begin{aligned} \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t)) =&\, \frac{d}{dt}\left( \frac{\gamma (t)}{\lambda (t)}\right) \left( A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right) \\&+ \frac{\gamma (t)}{\lambda (t)}\frac{d}{dt}\left( A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right) . \end{aligned}$$

Therefore, by using (23) and (28), and recalling that \(\lambda (t) = \lambda t^{2}\), we obtain

$$\begin{aligned} \left\| \frac{d}{dt}\left( A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right) \right\| = \mathcal {O}\left( \frac{1}{t\gamma (t)}\right) + o\left( \frac{t^{2}\left| \frac{d}{dt}\frac{\gamma (t)}{\lambda (t)}\right| }{\gamma ^{2}(t)}\right) \quad \text {as} \quad t\rightarrow +\infty . \end{aligned}$$

The fact that \(\Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \) comes from (2), (27), (23) and (24).

Weak Convergence of the Trajectories. Let \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\). We will work with the energy function \(h :[t_{0}, +\infty )\rightarrow \mathbb {R}\) given by

$$\begin{aligned} h(t) := \frac{1}{2}\Vert x(t) - \overline{x}\Vert ^{2}. \end{aligned}$$

For every \(t\ge t_{0}\), we have

$$\begin{aligned} \dot{h}(t) = \langle x(t) - \overline{x}, \dot{x}(t)\rangle , \quad \ddot{h}(t) = \langle x(t) - \overline{x}, \ddot{x}(t)\rangle + \Vert \dot{x}(t)\Vert ^{2}. \end{aligned}$$
(29)

Combining (2) and (29) gives us, for every \(t\ge t_{0}\),

$$\begin{aligned} \ddot{h}(t) + \frac{\alpha }{t}\dot{h}(t) + \left\langle T_{\lambda (t), \gamma (t)}(x(t)), x(t) - \overline{x}\right\rangle = \Vert \dot{x}(t)\Vert ^{2} + \left\langle -\xi \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t)), x(t) - \overline{x}\right\rangle . \end{aligned}$$

By using the \(\frac{\lambda (t)}{2}\)-cocoercivity of \(T_{\lambda (t), \gamma (t)}\) on the left hand side, the Cauchy–Schwarz inequality on the right hand side and multiplying both sides by t, the previous identity entails, for every \(t\ge t_{0}\),

$$\begin{aligned} t\ddot{h}(t) + \alpha \dot{h}(t) + t \frac{\lambda (t)}{2}\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2} \le t\Vert \dot{x}(t)\Vert ^{2} + \xi t \left\| \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right\| \Vert x(t) - \overline{x}\Vert . \end{aligned}$$

Now, putting this together with (15), (16) and (23) results in

$$\begin{aligned} k(t) := t\Vert \dot{x}(t)\Vert ^{2} + \xi t \left\| \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right\| \Vert x(t) - \overline{x}\Vert \in L^{1}([t_{0}, +\infty ), \mathbb {R}). \end{aligned}$$

Now apply Lemma A.1 with \(\theta (t):= t \frac{\lambda (t)}{2}\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}\) for every \(t\ge t_{0}\) to deduce that the limit

$$\begin{aligned} \lim _{t\rightarrow +\infty }h(t) \end{aligned}$$

exists, which fulfills the first condition of Opial’s Lemma A.3.

Let us now move on to the second condition. Suppose \(\widehat{x}\) is a weak sequential cluster point of \(t\mapsto x(t)\), that is, there exists a sequence \((t_{n})_{n\in \mathbb {N}}\subseteq [t_{0}, +\infty )\) such that \(t_{n}\rightarrow +\infty \) and \(x_{n} := x(t_{n})\) converges weakly to \(\widehat{x}\) as \(n\rightarrow +\infty \). Define

$$\begin{aligned} U_{\gamma } := {{\,\mathrm{Id}\,}}- J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B). \end{aligned}$$

According to (25), we have \(U_{\gamma (t)}(x(t)) = \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\rightarrow 0\) as \(t\rightarrow +\infty \). By assumption (iv), there is some \(\delta > 0\) such that \(\gamma (t)\in [\delta , 2\beta - \delta ]\) for all \(t\ge t_{0}\), so we can extract a subsequence \((\gamma (t_{n_{k}}))_{k\in \mathbb {N}}\) such that \(\gamma (t_{n_{k}})\rightarrow \overline{\gamma }\in (0, 2\beta )\) as \(k\rightarrow +\infty \). We may then assume, without loss of generality, that \(\gamma _{n} := \gamma (t_{n})\rightarrow \overline{\gamma }\) as \(n\rightarrow +\infty \). We now have for every \(n \in \mathbb {N}\)

$$\begin{aligned} \Vert U_{\gamma _{n}}(x_{n}) - U_{\overline{\gamma }}(x_{n})\Vert =&\ \Vert J_{\gamma _{n} A}(x_{n} - \gamma _{n}B(x_{n})) - J_{\overline{\gamma }A}(x_{n} - \overline{\gamma }B(x_{n}))\Vert \\ \le&\ \Vert J_{\gamma _{n}A}(x_{n} - \gamma _{n}B(x_{n})) - J_{\gamma _{n}A}(x_{n} - \overline{\gamma }B(x_{n}))\Vert \\&+ \Vert J_{\gamma _{n}A}(x_{n} - \overline{\gamma }B(x_{n})) - J_{\overline{\gamma }A}(x_{n} - \overline{\gamma }B(x_{n}))\Vert \\ \le&\ |\overline{\gamma } - \gamma _{n}|\Vert B(x_{n})\Vert + |\overline{\gamma } - \gamma _{n}|\Vert A_{\overline{\gamma }}(x_{n} - \overline{\gamma }B(x_{n}))\Vert . \end{aligned}$$

Now, since every weakly convergent sequence is bounded and the operators B and \(A_{\overline{\gamma }}\) are Lipschitz continuous, we deduce that the right hand side of the previous inequality approaches zero as \(n\rightarrow +\infty \), and therefore

$$\begin{aligned} U_{\overline{\gamma }}(x_{n}) = U_{\gamma _{n}}(x_{n}) + \big (U_{\overline{\gamma }}(x_{n}) - U_{\gamma _{n}}(x_{n})\big ) \rightarrow 0 \end{aligned}$$

as \(n\rightarrow +\infty \). Now, from the proof of part (i) of Lemma 2.2, we know that \(U_{\overline{\gamma }}\) is \(\frac{4\beta - \overline{\gamma }}{4\beta }\)-cocoercive, thus monotone and Lipschitz continuous and therefore maximally monotone. Summarizing, we have

  1. 1.

    \(U_{\overline{\gamma }}\) is maximally monotone and thus its graph is closed in the weak\(\times \)strong topology of \(\mathcal {H} \times \mathcal {H}\) (see [14, Proposition 20.38(ii)]),

  2. 2.

    \(x_{n}\) converges weakly to \(\widehat{x}\) and \(U_{\overline{\gamma }}(x_{n})\rightarrow 0\) as \(n\rightarrow +\infty \),

which allows us to conclude that \(U_{\overline{\gamma }}(\widehat{x}) = 0\) and, finally, that \(\widehat{x}\in {{\,\mathrm{zer}\,}}(A + B)\). Invoking Opial's Lemma, we conclude that x(t) converges weakly to an element of \({{\,\mathrm{zer}\,}}(A + B)\) as \(t\rightarrow +\infty \). \(\square \)
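The rates of Theorem 3.1 can also be observed numerically. The following minimal sketch is our own illustration; it continues the setting of the sketch after Theorem 2.3, now with \(\xi = 0\), so that (2) can be integrated directly as a first order system in \((x, \dot{x})\). The printed quantities \(t\Vert \dot{x}(t)\Vert \) and \(t^{2}\Vert T_{\lambda (t), \gamma }(x(t))\Vert \) both decay, consistent with item (iii):

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(4)
Q = rng.standard_normal((5, 5))
M = Q.T @ Q
beta = 1.0 / np.linalg.norm(M, 2)
alpha, lam_c, gamma, t0 = 3.0, 2.0, beta, 1.0    # lam_c > 2/(alpha - 1)^2 = 0.5

def T(x, t):                                     # T_{lam(t), gamma} with lam(t) = lam_c * t^2
    u = x - gamma * (M @ x)
    return (x - np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)) / (lam_c * t**2)

def rhs(t, z):                                   # (2) with xi = 0, written for z = (x, x')
    x, v = z[:5], z[5:]
    return np.concatenate([v, -(alpha / t) * v - T(x, t)])

z0 = np.concatenate([rng.standard_normal(5), np.zeros(5)])
ts = [10.0, 100.0, 1000.0]
sol = solve_ivp(rhs, (t0, ts[-1]), z0, t_eval=ts, rtol=1e-10, atol=1e-12)
for t, z in zip(sol.t, sol.y.T):
    x, v = z[:5], z[5:]
    print(f"t = {t:7.1f}   t*||x'(t)|| = {t * np.linalg.norm(v):.3e}"
          f"   t^2*||T(x(t))|| = {t**2 * np.linalg.norm(T(x, t)):.3e}")
```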

In the following subsections, we explore the particular cases \(B = 0\) and \(A = 0\), and we will show improvements with respect to previous results from the literature addressing continuous time approaches to monotone inclusions.

3.1 The Case \(B = 0\)

If we let \(B = 0\) in the (Split-DIN-AVD) system (2), then, attached to the monotone inclusion problem

$$\begin{aligned} \text{ find } \ x \in \mathcal {H} \ \text{ such } \text{ that } \ 0 \in A(x), \end{aligned}$$

we obtain the dynamics

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \frac{d}{dt}\left( A_{\lambda (t), \gamma (t)}(x(t))\right) + A_{\lambda (t), \gamma (t)}(x(t)) = 0, \end{aligned}$$
(30)

where

$$\begin{aligned} A_{\lambda , \gamma } = \frac{1}{\lambda }({{\,\mathrm{Id}\,}}- J_{\gamma A}). \end{aligned}$$

We can state the following theorem.

Theorem 3.2

Let \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\) be a maximally monotone operator such that \({{\,\mathrm{zer}\,}}A\ne \emptyset \). Assume that \(\alpha > 1\), \(\xi \ge 0\), \(\lambda (t) = \lambda t^{2}\) for \(\lambda > \frac{1}{(\alpha - 1)^{2}}\) and all \(t\ge t_{0}\), and that \(\gamma : [t_{0}, +\infty )\rightarrow (0, +\infty )\) is a differentiable function that satisfies \(\frac{|\dot{\gamma }(t)|}{\gamma (t)} = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \). Then, for a solution \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) to (30), the following statements hold:

  1. (i)

    x is bounded.

  2. (ii)

    We have the estimates

    $$\begin{aligned}&\int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty ,\\&\int _{t_{0}}^{+\infty }\frac{\gamma ^{2}(t)}{t}\left\| A_{\gamma (t)}(x(t))\right\| ^{2}dt < +\infty . \end{aligned}$$
  3. (iii)

    We have the convergence rates

    $$\begin{aligned}&\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \ \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) , \\&\left\| A_{\gamma (t)}(x(t))\right\| = o\left( \frac{1}{\gamma (t)}\right) , \ \left\| \frac{d}{dt}A_{\gamma (t)}(x(t))\right\| = \mathcal {O}\left( \frac{1}{t\gamma (t)}\right) + o\left( \frac{t^{2}\left| \frac{d}{dt}\frac{\gamma (t)}{\lambda (t)}\right| }{\gamma ^{2}(t)}\right) \end{aligned}$$

    as \(t\rightarrow +\infty \).

  4. (iv)

    If \(0 < \inf _{t\ge t_{0}}\gamma (t)\), then x(t) converges weakly to an element of \({{\,\mathrm{zer}\,}}A\) as \(t \rightarrow +\infty \).

Proof

The proof proceeds in the exact same way as the proof of Theorem 3.1. However, a few comments are in order. First of all, we now have \(T_{\lambda , \gamma } = \frac{1}{\lambda }({{\,\mathrm{Id}\,}}- J_{\gamma A}) = A_{\lambda , \gamma }\). Since \(J_{\gamma A}\) is firmly nonexpansive, by [14, Proposition 4.4] so is \({{\,\mathrm{Id}\,}}- J_{\gamma A}\); in other words, \({{\,\mathrm{Id}\,}}- J_{\gamma A}\) is 1-cocoercive. Therefore \(A_{\lambda , \gamma } = \frac{1}{\lambda }({{\,\mathrm{Id}\,}}- J_{\gamma A})\) is \(\lambda \)-cocoercive, and the condition on \(\lambda \) becomes \(\lambda > \frac{1}{(\alpha - 1)^{2}}\).

The proof also changes when we verify the second condition of Opial’s Lemma, which yields the weak convergence of the trajectory \(t\mapsto x(t)\); the modification allows \(\gamma (t)\) to be unbounded. We do need, however, the assumption \(0 < \inf _{t\ge t_{0}}\gamma (t)\). Indeed, from \(\Vert A_{\lambda (t), \gamma (t)}(x(t))\Vert = o\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \), we obtain

$$\begin{aligned} y(t) := x(t) - J_{\gamma (t)A}x(t) = \lambda (t)A_{\lambda (t), \gamma (t)}(x(t)) \rightarrow 0 \end{aligned}$$

as \(t\rightarrow +\infty \). Using the definition of the resolvent, we come to

$$\begin{aligned} J_{\gamma (t)A}x(t) = x(t) - y(t) \Leftrightarrow y(t) \in \gamma (t)A(x(t) - y(t)) \Leftrightarrow \frac{1}{\gamma (t)}y(t) \in A(x(t) - y(t)) \end{aligned}$$

for all \(t\ge t_{0}\). If \((t_{n})_{n\in \mathbb {N}}\subseteq [t_{0}, +\infty )\) is such that \(t_{n}\rightarrow +\infty \) and \(x(t_{n})\) converges weakly to \(\widehat{x}\) as \(n\rightarrow +\infty \), then the previous inclusion, together with the assumption on \(\gamma \) gives

$$\begin{aligned} x(t_{n}) - y(t_{n}) \ \text{ converges } \text{ weakly } \text{ to } \ \widehat{x} \quad \text {and}\quad \frac{1}{\gamma (t_{n})}y(t_{n})\rightarrow 0 \quad \text {as}\quad n\rightarrow +\infty , \end{aligned}$$

and by the closedness of the graph of A in the weak\(\times \)strong topology of \(\mathcal {H} \times \mathcal {H}\), we deduce that \(\widehat{x}\in {{\,\mathrm{zer}\,}}A\). \(\square \)

Remark 3.3

The hypotheses required for \(\gamma \) are fulfilled at least by two families of functions. First, take \(r\ge 0\) and set \(\gamma (t) = e^{t^{-r}}\). Then, we have

$$\begin{aligned} \frac{\dot{\gamma }(t)}{\gamma (t)} = \frac{-r t^{-(r + 1)}e^{t^{-r}}}{e^{t^{-r}}} = -\frac{r}{t^{r + 1}} = \mathcal {O}\left( \frac{1}{t}\right) \quad \text {as} \quad t\rightarrow +\infty , \end{aligned}$$

and

$$\begin{aligned} \gamma (t) = e^{t^{-r}} \ge e^{0} = 1 \quad \forall t > 0. \end{aligned}$$

If \(\gamma \) is a polynomial of degree n for some \(n\in \mathbb {N}\), the conditions are also fulfilled. Assume \(\gamma (t) = a_{n}t^{n} + a_{n - 1}t^{n - 1} + \cdots + a_{0}\) for all \(t\ge t_{0}\), for some \(a_{i}\in \mathbb {R}\) for \(i\in \{0, \ldots , n\}\) and \(a_{n} > 0\). Then, we have

$$\begin{aligned} t\cdot \frac{\dot{\gamma }(t)}{\gamma (t)}&= t \cdot \frac{n a_{n}t^{n - 1} + (n - 1)a_{n - 1}t^{n - 2} + \cdots + a_{1}}{a_{n}t^{n} + a_{n - 1}t^{n - 1} + \cdots + a_{0}} \\&\rightarrow \frac{na_{n}}{a_{n}} = n \quad \text {as} \quad t\rightarrow +\infty , \end{aligned}$$

so \(\frac{\dot{\gamma }(t)}{\gamma (t)} = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \). Since we also have \(\gamma (t) \rightarrow +\infty \) as \(t\rightarrow +\infty \), the condition \(\inf _{t\ge t_{0}} \gamma (t) > 0\) is fulfilled for large enough \(t_{0}\).

In particular, we can choose \(\gamma (t) = \lambda (t) = \lambda t^{2}\), which fulfills \(\gamma (t) \ge \lambda t_{0}^{2} > 0\) for all \(t\ge t_{0}\) and any \(t_{0} > 0\). Since \(A_{\lambda , \lambda } = A_{\lambda }\) for \(\lambda > 0\), this choice of \(\gamma \) allows us to recover the (DIN-AVD) system studied by Attouch and László in [9]. Notice the way the convergence rates for \(A_{\gamma (t)}(x(t))\) and \(\frac{d}{dt}A_{\gamma (t)}(x(t))\) exhibited in part (iii) of Theorem 3.2 depend on \(\gamma (t)\): if we set \(\gamma (t) = t^{n}\) for every \(t\ge t_{0}\) and a natural number \(n > 2\), then (Split-DIN-AVD) performs better than (DIN-AVD) from this point of view, without increasing the complexity of the governing operator.
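For illustration, the boundedness of \(t|\dot{\gamma }(t)|/\gamma (t)\) for both families can also be verified numerically. The following Python snippet is a minimal sketch of such a check; the concrete choices \(r = 2\) and \(\gamma (t) = t^{3} + 2t + 5\) are ours and serve purely as examples.

```python
import numpy as np

# Sanity check (not part of the proofs): t * |gamma'(t)| / gamma(t) should
# remain bounded as t grows for both admissible families of gamma.
t = np.linspace(10.0, 1e6, 1000)

# Family 1: gamma(t) = exp(t^(-r)), here with r = 2.
r = 2.0
gamma1 = np.exp(t ** (-r))
dgamma1 = -r * t ** (-(r + 1)) * gamma1       # derivative computed by hand
ratio1 = t * np.abs(dgamma1) / gamma1         # equals r * t^(-r), tends to 0

# Family 2: a polynomial with positive leading coefficient, gamma(t) = t^3 + 2t + 5.
gamma2 = t ** 3 + 2.0 * t + 5.0
dgamma2 = 3.0 * t ** 2 + 2.0
ratio2 = t * dgamma2 / gamma2                 # tends to the degree n = 3

print(f"sup of t|dgamma|/gamma, exponential family: {ratio1.max():.3e}")
print(f"sup of t|dgamma|/gamma, polynomial family:  {ratio2.max():.3f}")
```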

3.2 The Case \(A = 0\)

Let us return to the (Split-DIN-AVD) dynamics (2). Set \(A = 0\), and for every \(t\ge t_{0}\) take \(\gamma (t) = \gamma \in (0, 2\beta )\) and \(\eta (t) = \eta t^{2}\) with \(\eta = \lambda /\gamma \). Then, associated to the problem

$$\begin{aligned} \text{ find } \ x \in \mathcal {H} \ \text{ such } \text{ that } \ B(x)=0, \end{aligned}$$

we obtain the system

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \frac{d}{dt}\left( \frac{1}{\eta (t)}Bx(t)\right) + \frac{1}{\eta (t)}Bx(t) = 0. \end{aligned}$$
(31)

The conditions \(\lambda > \frac{2}{(\alpha - 1)^{2}}\) and \(\gamma \in (0, 2\beta )\) imply

$$\begin{aligned} \eta = \frac{\lambda }{\gamma }> \frac{2}{\gamma (\alpha - 1)^{2}} > \frac{2}{2\beta (\alpha - 1)^{2}} = \frac{1}{\beta (\alpha - 1)^{2}}. \end{aligned}$$

With the previous observation, we are able to state the following theorem.

Theorem 3.4

Let \(B: \mathcal {H}\rightarrow \mathcal {H}\) be a \(\beta \)-cocoercive operator for some \(\beta > 0\) such that \({{\,\mathrm{zer}\,}}B\ne \emptyset \). Assume that \(\alpha > 1\), \(\xi \ge 0\) and \(\eta (t) = \eta t^{2}\) for \(\eta > \frac{1}{\beta (\alpha - 1)^{2}}\) and all \(t\ge t_{0}\). Let \(x : [t_{0}, +\infty )\rightarrow \mathcal {H}\) be a solution to (31). Then, the following hold:

  1. (i)

    x is bounded, and x(t) converges weakly to an element of \({{\,\mathrm{zer}\,}}B\) as \(t \rightarrow +\infty \).

  2. (ii)

    We have the estimates

$$\begin{aligned} \int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }\frac{1}{t}\left\| Bx(t)\right\| ^{2}dt < +\infty . \end{aligned}$$
  3. (iii)

    We have the convergence rates

    $$\begin{aligned} \Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \quad \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) \end{aligned}$$

    as well as the limit

    $$\begin{aligned} \Vert Bx(t)\Vert \rightarrow 0 \end{aligned}$$

    as \(t\rightarrow +\infty \).

Proof

Since \(\eta > \frac{1}{\beta (\alpha - 1)^{2}}\), we can find \(\epsilon \in (0, \beta )\) such that \(\eta > \frac{1}{(\beta - \epsilon )(\alpha - 1)^{2}}\), equivalently, \(2(\beta - \epsilon )\eta > \frac{2}{(\alpha - 1)^{2}}\). Since (31) is equivalent to (Split-DIN-AVD) with \(A = 0\) and parameters \(\lambda = 2(\beta - \epsilon )\eta > \frac{2}{(\alpha - 1)^{2}}\) and \(\gamma (t) \equiv 2(\beta - \epsilon ) \in (0, 2\beta )\), the conclusion follows from Theorem 3.1. \(\square \)

Remark 3.5

  1. (a)

    As we mentioned in the introduction, the dynamical system (31) provides a way of finding the zeros of a cocoercive operator directly through forward evaluations, instead of having to resort to its Moreau envelope when following the approach in [9].

  2. (b)

    The dynamics (31) bear some resemblance to the system (6) (see also [16]) with \(\mu (t) = \frac{\alpha }{t}\) and \(\nu (t) = \frac{1}{\eta (t)}\), with an additional Hessian-driven damping term. In our case, since \(\eta > \frac{1}{\beta (\alpha - 1)^{2}}\), the parameters satisfy

    $$\begin{aligned} \dot{\mu }(t) = -\frac{\alpha }{t^{2}} \le 0, \quad \frac{\mu ^{2}(t)}{\nu (t)} = \frac{\alpha ^{2}\eta t^{2}}{t^{2}} = \alpha ^2 \eta > \frac{1}{\beta } \quad \forall t\ge t_{0}. \end{aligned}$$

    However, we have

$$\begin{aligned} \dot{\nu }(t) = -\frac{2}{\eta t^{3}} \le 0 \quad \forall t\ge t_{0}, \end{aligned}$$

so one of the hypotheses needed for (6) is not fulfilled, which shows that the dynamical system (31) cannot be treated as a particular case of it; indeed, (6) does not allow for a vanishing damping. With our system, we moreover obtain convergence rates for \(\dot{x}(t)\) and \(\ddot{x}(t)\) as \(t\rightarrow +\infty \), which are not derived in [16].

4 Structured Convex Minimization

We can specialize the previous results to the case of convex minimization, and show additionally the convergence of functional values along the generated trajectories to the optimal objective value at a rate that will depend on the choice of \(\gamma \). Let \(f:\mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) be a proper, convex and lower semicontinuous function, and let \(g:\mathcal {H}\rightarrow \mathbb {R}\) be a convex and Fréchet differentiable function with \(L_{\nabla g}\)-Lipschitz continuous gradient. Assume that \({{\,\mathrm{argmin}\,}}_{\mathcal {H}}(f + g)\ne \emptyset \), and consider the minimization problem

$$\begin{aligned} \min _{x\in \mathcal {H}} f(x) + g(x). \end{aligned}$$
(32)

Fermat’s rule tells us that \(\overline{x}\) is a global minimum of \(f + g\) if and only if

$$\begin{aligned} 0\in \partial (f + g)(\overline{x}) = \partial f(\overline{x}) + \nabla g(\overline{x}). \end{aligned}$$

Therefore, solving (32) is equivalent to solving the monotone inclusion \(0\in (A + B)(x)\) addressed in the first section, with \(A = \partial f\) and \(B = \nabla g\). Moreover, recall that if \(\nabla g\) is \(L_{\nabla g}\)-Lipschitz continuous, then it is \(\frac{1}{L_{\nabla g}}\)-cocoercive (the Baillon–Haddad Theorem, see [14, Corollary 18.17]). Therefore, associated to the problem (32) we have the dynamics

$$\begin{aligned}&\ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \frac{d}{dt}\left( \frac{\gamma (t)}{\lambda (t)}\left( \nabla f_{\gamma (t)}(u(t)) + \nabla g(x(t))\right) \right) \nonumber \\ {}&\quad + \frac{\gamma (t)}{\lambda (t)}\left( \nabla f_{\gamma (t)}(u(t)) + \nabla g(x(t)) \right) = 0, \end{aligned}$$
(33)

where we have denoted \(u(t) = x(t) - \gamma (t)\nabla g(x(t))\) for all \(t\ge t_{0}\) for convenience.
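Numerically, the governing operator of (33) only requires a proximal oracle for f and the gradient of g, since \(\nabla f_{\gamma }(u) = \frac{1}{\gamma }\big (u - {{\,\mathrm{prox}\,}}_{\gamma f}(u)\big )\). The following Python sketch illustrates this for the choices \(f = \Vert \cdot \Vert _{1}\) and \(g = \frac{1}{2}\Vert \cdot \Vert ^{2}\) (ours, purely for illustration), and checks that the two ways of writing the operator coincide.

```python
import numpy as np

def prox_l1(u, gamma):
    """prox_{gamma f} for f = ||.||_1, i.e., soft-thresholding."""
    return np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)

def grad_g(x):
    return x  # gradient of g(x) = 0.5 ||x||^2

def T(x, lam, gamma):
    """T_{lam,gamma}(x) = (1/lam)[x - prox_{gamma f}(x - gamma grad g(x))]."""
    u = x - gamma * grad_g(x)
    return (x - prox_l1(u, gamma)) / lam

def T_via_envelope(x, lam, gamma):
    """The same operator written as (gamma/lam)(grad f_gamma(u) + grad g(x))."""
    u = x - gamma * grad_g(x)
    grad_f_env = (u - prox_l1(u, gamma)) / gamma  # gradient of the Moreau envelope
    return (gamma / lam) * (grad_f_env + grad_g(x))

x = np.array([1.5, -0.3, 0.0])
assert np.allclose(T(x, 2.0, 0.5), T_via_envelope(x, 2.0, 0.5))
```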

Theorem 4.1

Let \(f: \mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) be a proper, convex and lower semicontinuous function, and let \(g : \mathcal {H}\rightarrow \mathbb {R}\) be a convex and Fréchet differentiable function with an \(L_{\nabla g}\)-Lipschitz continuous gradient such that \({{\,\mathrm{argmin}\,}}_{\mathcal {H}}(f + g)\ne \emptyset \). Assume that \(\alpha > 1\), \(\xi \ge 0\), \(\lambda (t) = \lambda t^{2}\) for \(\lambda > \frac{2}{(\alpha - 1)^{2}}\) and all \(t\ge t_{0}\), and that \(\gamma : [t_{0}, +\infty )\rightarrow \left( 0, \frac{2}{L_{\nabla g}}\right) \) is a differentiable function that satisfies \(\frac{|\dot{\gamma }(t)|}{\gamma (t)} = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \). Then, for a solution \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) to (33), the following statements hold:

  1. (i)

    x is bounded.

  2. (ii)

    We have the estimates

    $$\begin{aligned}&\int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty , \\&\int _{t_{0}}^{+\infty } \frac{\gamma ^{2}(t)}{t}\left\| \nabla f_{\gamma (t)}\Big [x(t) - \gamma (t)\nabla g(x(t))\Big ] + \nabla g(x(t))\right\| ^{2}dt < +\infty . \end{aligned}$$
  3. (iii)

    We have the convergence rates

    $$\begin{aligned}&\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \ \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) , \\&\left\| \nabla f_{\gamma (t)}\Big [x(t) - \gamma (t)\nabla g(x(t))\Big ] + \nabla g(x(t))\right\| = o\left( \frac{1}{\gamma (t)}\right) , \\&\left\| \frac{d}{dt}\left( \nabla f_{\gamma (t)}\Big [x(t) - \gamma (t)\nabla g(x(t))\Big ] + \nabla g(x(t))\right) \right\| = \mathcal {O}\left( \frac{1}{t\gamma (t)}\right) + o\left( \frac{t^{2}\left| \frac{d}{dt}\frac{\gamma (t)}{\lambda (t)}\right| }{\gamma ^{2}(t)}\right) \end{aligned}$$

    as \(t\rightarrow +\infty \).

  4. (iv)

If \(0< \inf _{t \ge t_{0}}\gamma (t) \le \sup _{t\ge t_{0}}\gamma (t) < \frac{2}{L_{\nabla g}}\), then x(t) converges weakly to a minimizer of \(f + g\) as \(t \rightarrow +\infty \).

  5. (v)

    Additionally, if \(0 < \gamma (t) \le \frac{1}{L_{\nabla g}}\) for every \(t\ge t_{0}\) and we set \(u(t) := x(t) - \gamma (t) \nabla g(x(t))\), then

    $$\begin{aligned} f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) + g\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) - \min \nolimits _{\mathcal {H}}(f + g) = o\left( \frac{1}{\gamma (t)}\right) \end{aligned}$$

    as \(t\rightarrow +\infty \). Moreover, \(\left\| {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t)) - x(t)\right\| \rightarrow 0\) as \(t\rightarrow +\infty \).

Proof

Parts (i)–(iv) are a direct consequence of Theorem 3.1. For checking (v), first notice that for all \(t\ge t_{0}\) we have

$$\begin{aligned} T_{\lambda (t), \gamma (t)}(x(t))&= \frac{1}{\lambda (t)}\Big [{{\,\mathrm{Id}\,}}- J_{\gamma (t)\partial f}\circ ({{\,\mathrm{Id}\,}}- \gamma (t)\nabla g)\Big ](x(t)) \nonumber \\ {}&\quad = \frac{1}{\lambda (t)}\Big [x(t) - {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\Big ]. \end{aligned}$$
(34)

Now, let \(\overline{x}\in {{\,\mathrm{argmin}\,}}_{\mathcal {H}}(f+g)\). According to [15, Lemma 2.3], for every \(t\ge t_{0}\), we have the inequality

$$\begin{aligned}&f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) + g\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) - \min \nolimits _{\mathcal {H}}(f + g) \\&\quad \le f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) + g\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) - f(\overline{x}) - g(\overline{x}) \\&\quad \le -\frac{1}{2\gamma (t)}\left\| {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t)) - x(t)\right\| ^{2} + \frac{1}{\gamma (t)}\left\langle x(t) - \overline{x}, x(t) - {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right\rangle . \end{aligned}$$

After adding the squared-norm term to both sides and applying the Cauchy–Schwarz inequality, for every \(t\ge t_{0}\) we obtain

$$\begin{aligned}&\frac{1}{2\gamma (t)}\left\| {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t)) - x(t)\right\| ^{2} \\&\quad \le f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) + g\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) + \frac{1}{2\gamma (t)}\left\| {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t)) - x(t)\right\| ^{2} - \min \nolimits _{\mathcal {H}}(f + g) \\&\quad \le \left\langle \frac{1}{\gamma (t)}\Big (x(t) - {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\Big ), x(t) - \overline{x}\right\rangle \\&\quad \le \left\| \frac{1}{\gamma (t)}\Big (x(t) - {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\Big )\right\| \Vert x(t) - \overline{x}\Vert \\&\quad = \frac{\lambda (t)}{\gamma (t)}\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| \Vert x(t) - \overline{x}\Vert \\&\quad = o\left( \frac{1}{\gamma (t)}\right) \quad \text {as} \quad t\rightarrow +\infty , \end{aligned}$$

which follows as a consequence of x being bounded and \(\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| = o\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \). \(\square \)

Remark 4.2

It is also worth mentioning the system we obtain in the case where \(g \equiv 0\), since we also get some improved rates for the objective functional values when we compare (Split-DIN-AVD) to (DIN-AVD) [9]. In this case, we have the system

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \frac{d}{dt}\left( \frac{\gamma (t)}{\lambda (t)}\nabla f_{\gamma (t)}(x(t))\right) + \frac{\gamma (t)}{\lambda (t)}\nabla f_{\gamma (t)}(x(t)) = 0 \end{aligned}$$
(35)

attached to the convex optimization problem

$$\begin{aligned} \min _{x\in \mathcal {H}}f(x). \end{aligned}$$

If we assume \(\lambda > \frac{1}{(\alpha - 1)^{2}}\), allow \(\gamma : [t_{0}, +\infty ) \rightarrow (0, +\infty )\) to be unbounded from above and otherwise keep the hypotheses of Theorem 4.1, for a solution \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) to (35), the following statements hold:

  1. (i)

    x is bounded,

  2. (ii)

    We have the estimates

    $$\begin{aligned}&\int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty ,\\&\int _{t_{0}}^{+\infty }\frac{\gamma ^{2}(t)}{t}\left\| \nabla f_{\gamma (t)}(x(t))\right\| ^{2}dt < +\infty , \end{aligned}$$
  3. (iii)

    We have the convergence rates

    $$\begin{aligned}&\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \ \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) , \\&\left\| \nabla f_{\gamma (t)}(x(t))\right\| = o\left( \frac{1}{\gamma (t)}\right) , \ \left\| \frac{d}{dt}\nabla f_{\gamma (t)}(x(t))\right\| = \mathcal {O}\left( \frac{1}{t\gamma (t)}\right) + o\left( \frac{t^{2}\left| \frac{d}{dt}\frac{\gamma (t)}{\lambda (t)}\right| }{\gamma ^{2}(t)}\right) \end{aligned}$$

    as \(t\rightarrow +\infty \).

  4. (iv)

    If \(0 < \inf _{t\ge t_{0}}\gamma (t)\), then x(t) converges weakly to a minimizer of f as \(t \rightarrow +\infty \).

  5. (v)

    We also obtain the rate

    $$\begin{aligned} f_{\gamma (t)}(x(t)) - \min \nolimits _{\mathcal {H}}f = o\left( \frac{1}{\gamma (t)}\right) \quad \text {as} \quad t\rightarrow +\infty , \end{aligned}$$

    which entails

    $$\begin{aligned} f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(x(t))\right) - \min \nolimits _{\mathcal {H}}f = o\left( \frac{1}{\gamma (t)}\right) \quad \text {and} \quad \left\| {{\,\mathrm{prox}\,}}_{\gamma (t)f}(x(t)) - x(t)\right\| \rightarrow 0 \end{aligned}$$

    as \(t\rightarrow +\infty \).

Parts (i)–(iv) are a direct consequence of Theorem 3.2 for the case \(A = \partial f\). For showing part (v), first notice that for \(\lambda > 0\) and \(u\in \mathcal {H}\) we have, according to the definition of \(f_{\lambda }\) and \({{\,\mathrm{prox}\,}}_{\lambda f}\),

$$\begin{aligned} f_{\lambda }(u) = f\left( {{\,\mathrm{prox}\,}}_{\lambda f}(u)\right) + \frac{1}{2\lambda }\left\| {{\,\mathrm{prox}\,}}_{\lambda f}(u) - u\right\| ^{2} \le f(u). \end{aligned}$$

Let \(\overline{x} \in \mathcal {H}\) be a minimizer of f. We apply the gradient inequality to the convex function \(f_{\gamma (t)}\) and use \(f_{\gamma (t)}(\overline{x})\le f(\overline{x})\), from which we obtain, for every \(t\ge t_{0}\),

$$\begin{aligned} f_{\gamma (t)}(x(t)) - \min \nolimits _{\mathcal {H}}f&= f_{\gamma (t)}(x(t)) - f(\overline{x}) \le f_{\gamma (t)}(x(t)) - f_{\gamma (t)}(\overline{x}) \\&\le \left\langle \nabla f_{\gamma (t)}(x(t)), x(t) - \overline{x}\right\rangle \le \left\| \nabla f_{\gamma (t)}(x(t))\right\| \Vert x(t) - \overline{x}\Vert , \end{aligned}$$

where the last inequality follows from the Cauchy–Schwarz inequality. Since \(\left\| \nabla f_{\gamma (t)}(x(t))\right\| = o\left( \frac{1}{\gamma (t)}\right) \) as \(t\rightarrow +\infty \) and x is bounded, the previous inequality entails the first statement of (v). Again recalling the definition of the Moreau envelope of f, this finally gives

$$\begin{aligned}&f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(x(t))\right) + \frac{1}{2\gamma (t)}\left\| {{\,\mathrm{prox}\,}}_{\gamma (t)f}(x(t)) - x(t)\right\| ^{2} - \min \nolimits _{\mathcal {H}}f = f_{\gamma (t)}(x(t))\\&\quad - \min \nolimits _{\mathcal {H}}f = o\left( \frac{1}{\gamma (t)}\right) \end{aligned}$$

as \(t\rightarrow +\infty \), which implies the last two statements and concludes the proof.

As pointed out in Remark 3.3, we can choose \(\gamma (t) = \lambda t^{2}\) for every \(t\ge t_{0}\) and recover the (DIN-AVD) system for nonsmooth convex minimization problems studied in [9]. Moreover, we can also set \(\gamma (t) = t^{n}\) for a natural number \(n > 3\) and all \(t\ge t_{0}\). Now, not only are the convergence rates for \(\nabla f_{\gamma (t)}(x(t))\) and \(\frac{d}{dt}\nabla f_{\gamma (t)}(x(t))\) as \(t\rightarrow +\infty \) improved with respect to the system in [9], but (Split-DIN-AVD) also provides a better rate for the convergence of \(f_{\gamma (t)}(x(t))\) to \(\min _{\mathcal {H}}f\) as \(t\rightarrow +\infty \).

5 Numerical Experiments

In the following paragraphs, we describe some numerical experiments that illustrate various aspects of the theory.

5.1 Minimizing a Smooth and Convex Function

As an example of a continuous time scheme minimizing a convex and Fréchet differentiable function \(g : \mathcal {H} \rightarrow \mathbb {R}\) with \(L_{\nabla g}\)-Lipschitz continuous gradient via (Split-DIN-AVD), we consider the system

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \frac{d}{dt}\left( \frac{1}{\eta (t)}\nabla g(x(t))\right) + \frac{1}{\eta (t)}\nabla g(x(t)) = 0, \end{aligned}$$
(36)

where for \((x_{1}, x_{2})\in \mathbb {R}^{2}\) we set \(g(x_{1}, x_{2}) = \frac{1}{2}(x_{1}^{2} + 100x_{2}^{2})\) and therefore \(\nabla g(x_{1}, x_{2}) = (x_{1}, 100x_{2})\). A trajectory generated by (36) is a pair \(x(t) = (x_{1}(t), x_{2}(t))\). Figure 1 plots both components of the solution to (36) with initial Cauchy data \(x_{0} = (1, 1)\), \(u_{0} = (1, 1)\). Notice that the Lipschitz constant of \(\nabla g\) is \(L_{\nabla g} = 100\), which means that the cocoercivity modulus of \(\nabla g\) is \(\beta = \frac{1}{L_{\nabla g}} = \frac{1}{100}\). To fulfill \(\eta > \frac{1}{\beta (\alpha - 1)^{2}} = \frac{100}{(\alpha - 1)^{2}}\), we choose \(\alpha = 20\), \(\eta = 0.278\). Figure 1a corresponds to the case with no Hessian damping, that is, \(\xi = 0\). Figure 1b corresponds to a Hessian damping parameter \(\xi = 0.2\).
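The following Python sketch shows one way to reproduce this experiment (our own reimplementation, not code from the original source). Since g is quadratic with Hessian \(H = {{\,\mathrm{diag}\,}}(1, 100)\), the Hessian-driven term in (36) expands analytically and the system becomes a first order ODE; the starting time \(t_{0} = 1\) and the time horizon are our choices, and \(u_{0}\) is read as the initial velocity.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate (36) for g(x1, x2) = (x1^2 + 100 x2^2)/2. Using eta(t) = eta t^2:
# d/dt [ (1/(eta t^2)) grad g(x) ] = -(2/(eta t^3)) H x + (1/(eta t^2)) H x'.
alpha, eta, xi = 20.0, 0.278, 0.2   # take xi = 0 for the undamped run
H = np.diag([1.0, 100.0])
t0 = 1.0

def rhs(t, z):
    x, v = z[:2], z[2:]
    grad = H @ x
    acc = (-alpha / t * v
           - xi * (H @ v) / (eta * t**2)
           + 2.0 * xi * grad / (eta * t**3)
           - grad / (eta * t**2))
    return np.concatenate([v, acc])

z0 = np.array([1.0, 1.0, 1.0, 1.0])   # x0 = (1, 1) and u0 = x'(t0) = (1, 1)
sol = solve_ivp(rhs, (t0, 50.0), z0, rtol=1e-9, atol=1e-9)
print("x(50) ≈", sol.y[:2, -1])       # both components should decay towards 0
```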

Fig. 1 Trajectories of (Split-DIN-AVD) for \(B = \nabla g\)

Figure 2 depicts the fast convergence of the velocities to zero for the cases \(\xi = 0\) (Fig. 2a) and \(\xi = 0.2\) (Fig. 2b). In both figures, notice the effect of the damping parameter \(\xi > 0\), which attenuates the oscillations of the second component of the trajectories, as well as the oscillations present in the velocities.

Fig. 2 Fast convergence of the velocities

5.2 Minimizing a Nonsmooth and Convex Function

As an example of a continuous time scheme minimizing a proper, convex and lower semicontinuous function \(f : \mathcal {H} \rightarrow \mathbb {R}\cup \{+\infty \}\) via (Split-DIN-AVD), we consider the system

$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \frac{d}{dt}\left( \frac{\gamma (t)}{\lambda (t)}\nabla f_{\gamma (t)}(x(t))\right) + \frac{\gamma (t)}{\lambda (t)}\nabla f_{\gamma (t)}(x(t)) = 0. \end{aligned}$$
(37)

We will consider three options for f and plot for each of them the trajectories, the objective function values and the gradients of the Moreau envelopes as follows:

  • \(f(x) = \frac{1}{2}x^{2}\) (Figs. 3a and 4a),

  • \(f(x) = |x|\) (Figs. 3b and 4b),

  • \(f(x) = |x| + \frac{1}{2}x^{2}\) (Figs. 3c and 4c).

In order to fulfill \(\alpha > 1\) and \(\lambda > \frac{1}{(\alpha - 1)^{2}}\), we choose the parameters \(\alpha = 2\), \(\lambda = 1.1\), and we take \(\xi = 0\) and \(\gamma (t) = t^{8}\). We compare the results given by (DIN-AVD) (that is, when \(\gamma (t) = \lambda t^{2}\)) with the ones given by our system (Split-DIN-AVD). The choice of \(\xi \) does not seem to change the plots significantly for the examples we have chosen.
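For completeness, the proximal operators of the three test functions have simple closed forms, obtained from the respective optimality conditions; the following minimal Python sketch of these oracles is our own and serves only as an illustration.

```python
import numpy as np

def prox_quad(x, g):      # f(x) = x^2/2:  prox solves u + (u - x)/g = 0
    return x / (1.0 + g)

def prox_abs(x, g):       # f(x) = |x|:  soft-thresholding
    return np.sign(x) * max(abs(x) - g, 0.0)

def prox_abs_quad(x, g):  # f(x) = |x| + x^2/2:  soft-threshold, then shrink
    return prox_abs(x, g) / (1.0 + g)

def grad_envelope(prox, x, g):
    """Gradient of the Moreau envelope: grad f_g(x) = (x - prox_{g f}(x)) / g."""
    return (x - prox(x, g)) / g

# The governing term of (37) at one point, with lam(t) = lam t^2, gamma(t) = t^8:
lam, t, x = 1.1, 10.0, 0.7
print(grad_envelope(prox_abs, x, t**8) * t**8 / (lam * t**2))
```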

Fig. 3 Trajectories and objective function values in the case \(A = \partial f\)

Fig. 4 Gradients of the Moreau envelopes of f

Figure 3 depicts the trajectories x(t) of (37) and the function values \(f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(x(t))\right) \) for our choices of f as \(t \rightarrow +\infty \). Figure 4 portrays the fast convergence to zero of \(\Vert \nabla f_{\gamma (t)}(x(t))\Vert \) as \(t\rightarrow +\infty \). Notice the marked improvement over (DIN-AVD) for nonsmooth convex minimization in [9] when choosing \(\gamma (t) = t^{8}\), a result we already established theoretically. Polynomials of high degree appear to give the largest improvements in terms of rates.

5.3 An Example with Operator Splitting

Now we consider the monotone inclusion problem (1) for \(A(x_{1}, x_{2}) = (-x_{2}, x_{1})\) and \(B(x_{1}, x_{2}) = (x_{1}, x_{2})\) for every \((x_{1}, x_{2})\in \mathbb {R}^{2}\). For every \((x_{1}, x_{2}) \in \mathbb {R}^{2}\), an easy calculation gives

$$\begin{aligned} J_{\gamma A} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} =\begin{bmatrix} \frac{1}{1 + \gamma ^{2}} &{}\quad \frac{\gamma }{1 + \gamma ^{2}} \\ \frac{-\gamma }{1 + \gamma ^{2}} &{}\quad \frac{1}{1 + \gamma ^{2}} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}, \end{aligned}$$

and so

$$\begin{aligned}&({{\,\mathrm{Id}\,}}- J_{\gamma A}({{\,\mathrm{Id}\,}}- \gamma {{\,\mathrm{Id}\,}})) \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}\\&\quad = \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} - (1 - \gamma ) \begin{bmatrix} \frac{1}{1 + \gamma ^{2}} &{}\quad \frac{\gamma }{1 + \gamma ^{2}} \\ \frac{-\gamma }{1 + \gamma ^{2}} &{}\quad \frac{1}{1 + \gamma ^{2}} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} = \begin{bmatrix} \frac{\gamma ^{2} + \gamma }{1 + \gamma ^{2}} &{}\quad \frac{\gamma (\gamma - 1)}{1 + \gamma ^{2}} \\ \frac{\gamma (1 - \gamma )}{1 + \gamma ^{2}} &{}\quad \frac{\gamma ^{2} + \gamma }{1 + \gamma ^{2}} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}, \end{aligned}$$

and

$$\begin{aligned} T_{\lambda , \gamma } \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} = \begin{bmatrix} \frac{\gamma ^{2} + \gamma }{\lambda (1 + \gamma ^{2})} &{}\quad \frac{\gamma (\gamma - 1)}{\lambda (1 + \gamma ^{2})} \\ \frac{\gamma (1 - \gamma )}{\lambda (1 + \gamma ^{2})} &{}\quad \frac{\gamma ^{2} + \gamma }{\lambda (1 + \gamma ^{2})} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix}. \end{aligned}$$

(Split-DIN-AVD) now reads

$$\begin{aligned}&\begin{bmatrix} \ddot{x_{1}}(t) \\ \ddot{x_{2}}(t) \end{bmatrix} + \frac{\alpha }{t} \begin{bmatrix} \dot{x_{1}}(t) \\ \dot{x_{2}}(t) \end{bmatrix} + \, \xi \frac{d}{dt} \left( \begin{bmatrix} \frac{\gamma ^{2}(t) + \gamma (t)}{\lambda (t)(1 + \gamma ^{2}(t))} &{}\quad \frac{\gamma (t)(\gamma (t) - 1)}{\lambda (t)(1 + \gamma ^{2}(t))} \\ \frac{\gamma (t)(1 - \gamma (t))}{\lambda (t)(1 + \gamma ^{2}(t))} &{}\quad \frac{\gamma ^{2}(t) + \gamma (t)}{\lambda (t)(1 + \gamma ^{2}(t))} \end{bmatrix} \begin{bmatrix} x_{1}(t) \\ x_{2}(t) \end{bmatrix}\right) \\&\quad + \begin{bmatrix} \frac{\gamma ^{2}(t) + \gamma (t)}{\lambda (t)(1 + \gamma ^{2}(t))} &{}\quad \frac{\gamma (t)(\gamma (t) - 1)}{\lambda (t)(1 + \gamma ^{2}(t))} \\ \frac{\gamma (t)(1 - \gamma (t))}{\lambda (t)(1 + \gamma ^{2}(t))} &{}\quad \frac{\gamma ^{2}(t) + \gamma (t)}{\lambda (t)(1 + \gamma ^{2}(t))} \end{bmatrix} \begin{bmatrix} x_{1}(t) \\ x_{2}(t) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \end{aligned}$$

We choose the parameters \(\alpha = 7\), \(\lambda = 0.056\), \(\gamma (t) \equiv 1.5\), and the Cauchy data \(x_{0} = (1, 2)\) and \(u_{0} = (-1, -1)\). Figure 5a corresponds to the case \(\xi = 0\), and Fig. 5b depicts the trajectory when the Hessian damping parameter is \(\xi = 0.8\). Notice again how the presence of \(\xi \) seems to attenuate the oscillations in the trajectories, not only for optimization problems but also for monotone inclusions which cannot be reduced to them.
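As a cross-check of the closed form for \(T_{\lambda , \gamma }\) above, one can also build the operator directly from its definition and compare it with the explicit matrix; the short Python sketch below (ours, using the same parameter values) does exactly this.

```python
import numpy as np

lam, g = 0.056, 1.5
A = np.array([[0.0, -1.0], [1.0, 0.0]])   # A(x1, x2) = (-x2, x1)
I = np.eye(2)

J = np.linalg.inv(I + g * A)              # resolvent J_{gamma A}
T_def = (I - J @ (I - g * I)) / lam       # (1/lam)[Id - J_{gamma A}(Id - gamma B)], B = Id

T_closed = np.array([[g**2 + g, g * (g - 1)],
                     [g * (1 - g), g**2 + g]]) / (lam * (1 + g**2))
assert np.allclose(T_def, T_closed)       # the two constructions agree
```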

Fig. 5 Trajectories of (Split-DIN-AVD) for finding the zeros of \(A + B\)

6 A Numerical Algorithm

In the following we derive, via time discretization of (Split-DIN-AVD), a numerical algorithm for solving the monotone inclusion problem (1). We discretize (Split-DIN-AVD) with stepsize 1 and set, for an integer \(k \ge 1\), \(x_{k} := x(k)\), \(\lambda _{k} := \lambda (k)\), \(\gamma _{k} := \gamma (k)\). We make the approximations

$$\begin{aligned}&\ddot{x}(t) \approx x_{k + 1} - 2x_{k} + x_{k - 1}, \quad \frac{\alpha }{t}\dot{x}(t) \approx \frac{\alpha }{k}(x_{k} - x_{k - 1}), \\&\frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t)) \approx T_{\lambda _{k}, \gamma _{k}}(x_{k}) - T_{\lambda _{k - 1}, \gamma _{k - 1}}(x_{k - 1}), \quad T_{\lambda (t), \gamma (t)}(x(t)) \approx T_{\lambda _{k + 1}, \gamma _{k + 1}}(x_{k + 1}), \end{aligned}$$

so we get, for every \(k\ge 1\),

$$\begin{aligned} x_{k + 1} - 2x_{k} + x_{k - 1} + \frac{\alpha }{k}(x_{k} - x_{k - 1}) + \xi \left( T_{\lambda _{k}, \gamma _{k}}(x_{k}) - T_{\lambda _{k - 1}, \gamma _{k - 1}}(x_{k - 1})\right) + T_{\lambda _{k + 1}, \gamma _{k + 1}}(x_{k + 1}) = 0. \end{aligned}$$
(38)

After rearranging the terms of (38), for every \(k\ge 1\) we obtain

$$\begin{aligned} x_{k + 1} + T_{\lambda _{k + 1}, \gamma _{k + 1}}(x_{k + 1}) = x_{k} + \left( 1 - \frac{\alpha }{k}\right) (x_{k} - x_{k - 1}) - \xi \left( T_{\lambda _{k}, \gamma _{k}}(x_{k}) - T_{\lambda _{k - 1}, \gamma _{k - 1}}(x_{k - 1})\right) . \end{aligned}$$
(39)

In other words, after setting \(\alpha _{k} = 1 - \frac{\alpha }{k}\) and denoting the right-hand side of (39) by \(y_{k}\) for every \(k\ge 1\), we obtain the following iterative scheme

$$\begin{aligned} (\forall k \ge 1) \ \left\{ \begin{aligned}&y_{k} = x_{k} + \alpha _{k}(x_{k} - x_{k - 1}) - \xi \left( T_{\lambda _{k}, \gamma _{k}}(x_{k}) - T_{\lambda _{k - 1}, \gamma _{k - 1}}(x_{k - 1})\right) , \\&x_{k + 1} = \left( {{\,\mathrm{Id}\,}}+ T_{\lambda _{k + 1}, \gamma _{k + 1}}\right) ^{-1}(y_{k}). \end{aligned} \right. \end{aligned}$$
(40)

Observe that the second step in (40) is always well-defined. Indeed, for \(\lambda , \gamma > 0\), \(T_{\lambda , \gamma }\) is \(\frac{\lambda }{2}\)-cocoercive, hence monotone (see Lemma 2.2(i)). This also implies that \(T_{\lambda , \gamma }\) is \(\frac{2}{\lambda }\)-Lipschitz continuous, and a monotone and continuous operator is maximally monotone, according to [14, Corollary 20.28]. Hence, by Minty’s Theorem (see [14, Theorem 21.1]), we know that \({{\,\mathrm{Id}\,}}+ T_{\lambda , \gamma } : \mathcal {H}\rightarrow \mathcal {H}\) is surjective.
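To make (40) concrete, here is a minimal Python sketch (ours, not a reference implementation) applied to the example of Sect. 5.3. The inner resolvent \(\left( {{\,\mathrm{Id}\,}}+ T_{\lambda _{k + 1}, \gamma _{k + 1}}\right) ^{-1}(y_{k})\) is computed by the fixed-point iteration \(x \mapsto y_{k} - T_{\lambda _{k + 1}, \gamma _{k + 1}}(x)\), which is a contraction here since \(T_{\lambda , \gamma }\) is \(\frac{2}{\lambda }\)-Lipschitz and \(\lambda _{k + 1} = \lambda (k + 1)^{2} \ge 4\lambda \). The parameter values and the convention \(\lambda _{0} := \lambda _{1}\) are our own choices, made so that the hypotheses of Theorem 6.1 below hold.

```python
import numpy as np

def make_T(JA, B, lam, gamma):
    """T_{lam,gamma} = (1/lam)[Id - J_{gamma A}(Id - gamma B)]."""
    return lambda x: (x - JA(x - gamma * B(x), gamma)) / lam

def resolvent_of_T(T, y, iters=200):
    """Solve x + T(x) = y by the fixed-point iteration x <- y - T(x)."""
    x = y.copy()
    for _ in range(iters):
        x = y - T(x)
    return x

def split_din_avd(JA, B, x0, x1, alpha=3.0, xi=0.5, lam=2.0, gamma=1.0, n=500):
    # lam > (4 xi + 2)/(alpha - 1)^2 = 1 and gamma in (0, 2 beta) = (0, 2).
    xs = [x0, x1]
    lam_k = lambda k: lam * k**2
    for k in range(1, n):
        Tk = make_T(JA, B, lam_k(k), gamma)
        Tkm1 = make_T(JA, B, lam_k(max(k - 1, 1)), gamma)  # lam_0 := lam_1
        y = (xs[-1] + (1 - alpha / k) * (xs[-1] - xs[-2])
             - xi * (Tk(xs[-1]) - Tkm1(xs[-2])))
        xs.append(resolvent_of_T(make_T(JA, B, lam_k(k + 1), gamma), y))
    return xs[-1]

# Rotation plus identity, as in Sect. 5.3; zer(A + B) = {(0, 0)}.
JA = lambda u, g: np.linalg.solve(np.eye(2) + g * np.array([[0., -1.], [1., 0.]]), u)
B = lambda x: x
print(split_din_avd(JA, B, np.array([1., 2.]), np.array([1., 2.])))  # ≈ (0, 0)
```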

We are now in a position to state the main theorem concerning the previous algorithm.

Theorem 6.1

Let \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\) be a maximally monotone operator and \(B : \mathcal {H}\rightarrow \mathcal {H}\) a \(\beta \)-cocoercive operator for some \(\beta > 0\) such that \({{\,\mathrm{zer}\,}}(A + B)\ne \emptyset \). Choose arbitrary initial points \(x_{0}, x_{1}\in \mathcal {H}\). Let \(\alpha > 1\), \(\xi \ge 0\), and let \((\lambda _{k})_{k \ge 0}\) and \((\gamma _{k})_{k \ge 0}\) be sequences of positive numbers that fulfill

$$\begin{aligned}&\lambda _{k} = \lambda k^{2} \,\,\, \forall k\ge 1, \quad \text {with}\quad \lambda > \frac{4\xi + 2}{(\alpha - 1)^{2}},\\&0< \inf _{k \ge 0}\gamma _{k} \le \sup _{k \ge 0}\gamma _{k} < 2\beta \quad \text {and} \quad \frac{\gamma _{k} - \gamma _{k - 1}}{\gamma _{k}} = \mathcal {O}\left( \frac{1}{k}\right) \quad \text {as} \quad k\rightarrow +\infty . \end{aligned}$$

Now, consider the sequences \((y_{k})_{k\ge 1}\) and \((x_{k})_{k\ge 0}\) generated by algorithm (40). The following properties are satisfied:

  1. (i)

    We have the estimates

    $$\begin{aligned} \Vert x_{k + 1} - x_{k}\Vert = \mathcal {O}\left( \frac{1}{k}\right) \quad \text {and} \quad \left\| A_{\gamma _{k}}(x_{k} - \gamma _{k}Bx_{k}) + Bx_{k}\right\| = o\left( \frac{1}{\gamma _{k}}\right) \quad \text {as} \quad k\rightarrow +\infty . \end{aligned}$$
  2. (ii)

    The sequence \((x_{k})_{k\ge 0}\) converges weakly to an element of \({{\,\mathrm{zer}\,}}(A + B)\).

  3. (iii)

The sequence \((y_{k})_{k\ge 1}\) converges weakly to an element of \({{\,\mathrm{zer}\,}}(A + B)\). More precisely, we have \(\Vert x_{k} - y_{k}\Vert = \mathcal {O}\left( \frac{1}{k}\right) \) as \(k\rightarrow +\infty \).

The proof can be done by transposing the techniques used in the continuous time case to the discrete time case. Algorithm (40) can be seen as a splitting version of the (PRINAM) algorithm studied by Attouch and László in [10].

Remark 6.2

The second step in (40) can be quite complicated to compute. However, if \(B = 0\), we can resort to the fact that \((A_{\lambda _{1}})_{\lambda _{2}} = A_{\lambda _{1} + \lambda _{2}}\) for \(\lambda _{1}, \lambda _{2} > 0\). We now have, for \(\lambda , \gamma > 0\),

$$\begin{aligned} T_{\lambda , \gamma } = \frac{1}{\lambda }\Big [{{\,\mathrm{Id}\,}}- J_{\gamma A}\Big ] = \frac{\gamma }{\lambda }A_{\gamma }, \end{aligned}$$

which gives

$$\begin{aligned} \left( {{\,\mathrm{Id}\,}}+ T_{\lambda , \gamma }\right) ^{-1} = J_{\frac{\gamma }{\lambda }A_{\gamma }} = {{\,\mathrm{Id}\,}}- \frac{\gamma }{\lambda }\left( A_{\gamma }\right) _{\frac{\gamma }{\lambda }} = {{\,\mathrm{Id}\,}}- \frac{\gamma }{\lambda }A_{\gamma + \frac{\gamma }{\lambda }}. \end{aligned}$$

It is now possible to write (40) in terms of the resolvents of A. We have, for every \(k\ge 1\),

$$\begin{aligned} T_{\lambda _{k}, \gamma _{k}}(x_{k}) - T_{\lambda _{k - 1}, \gamma _{k - 1}}(x_{k - 1}) = \,&\frac{1}{\lambda _{k}}\Big [x_{k} - J_{\gamma _{k}A}(x_{k})\Big ] - \frac{1}{\lambda _{k - 1}}\Big [x_{k - 1} - J_{\gamma _{k - 1}A}(x_{k - 1})\Big ] \\ = \,&\left( \frac{1}{\lambda _{k}} - \frac{1}{\lambda _{k - 1}}\right) x_{k} + \frac{1}{\lambda _{k - 1}}(x_{k} - x_{k - 1}) \\&- \left( \frac{1}{\lambda _{k}}J_{\gamma _{k}A}(x_{k}) - \frac{1}{\lambda _{k - 1}}J_{\gamma _{k - 1}A}(x_{k - 1})\right) , \\ y_{k} - \frac{\gamma _{k + 1}}{\lambda _{k + 1}}A_{\gamma _{k + 1} + \frac{\gamma _{k + 1}}{\lambda _{k + 1}}}(y_{k}) = \,&y_{k} - \frac{\gamma _{k + 1}}{\lambda _{k + 1}}\cdot \frac{\lambda _{k + 1}}{\gamma _{k + 1}(\lambda _{k + 1} + 1)}\left[ y_{k} - J_{\left( \gamma _{k + 1} + \frac{\gamma _{k + 1}}{\lambda _{k + 1}}\right) A}(y_{k})\right] \\ = \,&\frac{\lambda _{k + 1}}{\lambda _{k + 1} + 1}y_{k} + \frac{1}{\lambda _{k + 1} + 1}J_{\left( \gamma _{k + 1} + \frac{\gamma _{k + 1}}{\lambda _{k + 1}}\right) A}(y_{k}). \end{aligned}$$

So now (40) becomes

$$\begin{aligned} (\forall k \ge 1) \ \left\{ \begin{aligned}&y_{k} = \left( 1 - \xi \left( \frac{1}{\lambda _{k}} - \frac{1}{\lambda _{k - 1}}\right) \right) x_{k} + \left( \alpha _{k} - \frac{\xi }{\lambda _{k - 1}}\right) (x_{k} - x_{k - 1}) \\&\qquad \quad + \xi \left( \frac{1}{\lambda _{k}}J_{\gamma _{k}A}(x_{k}) - \frac{1}{\lambda _{k - 1}}J_{\gamma _{k - 1}A}(x_{k - 1})\right) , \\&x_{k + 1} =\frac{\lambda _{k + 1}}{\lambda _{k + 1} + 1}y_{k} + \frac{1}{\lambda _{k + 1} + 1}J_{\left( \gamma _{k + 1} + \frac{\gamma _{k + 1}}{\lambda _{k + 1}}\right) A}(y_{k}). \end{aligned} \right. \end{aligned}$$
(41)
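The scheme (41) is thus fully explicit whenever the resolvents of A are available in closed form. Below is a brief Python sketch (our illustration) for \(A = \partial f\) with \(f = |\cdot |\) on the real line, where \(J_{\mu A}\) is soft-thresholding; the parameter values and the conventions \(\lambda _{0} := \lambda _{1}\), \(\gamma _{0} := \gamma _{1}\) are our own.

```python
import numpy as np

def J(u, mu):
    """Resolvent of A = the subdifferential of |.|: soft-thresholding."""
    return np.sign(u) * max(abs(u) - mu, 0.0)

alpha, xi, lam = 2.0, 0.5, 3.0        # lam > (2 xi + 1)/(alpha - 1)^2 = 2
lam_k = lambda k: lam * k**2
gam_k = lambda k: float(k**4)         # gamma_k = k^n with n = 4

x_prev, x = 1.0, 1.0                  # initial points x0 = x1 = 1
for k in range(1, 5000):
    lk, lkm1 = lam_k(k), lam_k(max(k - 1, 1))
    gk, gkm1 = gam_k(k), gam_k(max(k - 1, 1))
    a_k = 1.0 - alpha / k
    y = ((1.0 - xi * (1.0 / lk - 1.0 / lkm1)) * x
         + (a_k - xi / lkm1) * (x - x_prev)
         + xi * (J(x, gk) / lk - J(x_prev, gkm1) / lkm1))
    lk1, gk1 = lam_k(k + 1), gam_k(k + 1)
    sigma = gk1 + gk1 / lk1           # index of the resolvent in (41)
    x_prev, x = x, (lk1 * y + J(y, sigma)) / (lk1 + 1.0)

print(x)  # tends to 0, the unique element of zer(A)
```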

Now, if we assume \(0 < \inf _{k\ge 0}\gamma _{k}\) and \(\lambda > \frac{2\xi + 1}{(\alpha - 1)^{2}}\) and otherwise keep the hypotheses of Theorem 6.1, then for the sequences \((x_{k})_{k\ge 0}\) and \((y_{k})_{k\ge 1}\) generated by (41), the following statements hold:

  1. (i)

    We have the estimates

    $$\begin{aligned} \Vert x_{k + 1} - x_{k}\Vert = \mathcal {O}\left( \frac{1}{k}\right) \quad \text {and} \quad \left\| A_{\gamma _{k}}(x_{k})\right\| = o\left( \frac{1}{\gamma _{k}}\right) \quad \text {as} \quad k\rightarrow +\infty . \end{aligned}$$
  2. (ii)

    The sequence \((x_{k})_{k\ge 0}\) converges weakly to an element of \({{\,\mathrm{zer}\,}}A\).

  3. (iii)

The sequence \((y_{k})_{k\ge 1}\) converges weakly to an element of \({{\,\mathrm{zer}\,}}A\) as well. More precisely, we have \(\Vert x_{k} - y_{k}\Vert = \mathcal {O}\left( \frac{1}{k}\right) \) as \(k\rightarrow +\infty \).

Notice that the condition required for \((\gamma _{k})_{k\ge 0}\) is fulfilled, in particular, by \(\gamma _{k} = k^{n}\) for every \(k\ge 1\) and any natural number \(n \ge 1\). Thus, by choosing n large, we obtain a fast convergence rate for \(A_{\gamma _{k}}(x_{k})\) as \(k\rightarrow +\infty \).