Abstract
In the framework of a real Hilbert space, we address the problem of finding the zeros of the sum of a maximally monotone operator A and a cocoercive operator B. We study the asymptotic behaviour of the trajectories generated by a second order equation with vanishing damping, attached to this problem, and governed by a time-dependent forward–backward-type operator. This is a splitting system, as it only requires forward evaluations of B and backward evaluations of A. A proper tuning of the system parameters ensures the weak convergence of the trajectories to the set of zeros of \(A + B\), as well as fast convergence of the velocities towards zero. A particular case of our system allows us to derive fast convergence rates for the problem of minimizing the sum of a proper, convex and lower semicontinuous function and a smooth and convex function with Lipschitz continuous gradient. We illustrate the theoretical outcomes by numerical experiments.
1 Introduction
1.1 Problem Formulation and a Continuous Time Splitting Scheme with Vanishing Damping
Let \(\mathcal {H}\) be a real Hilbert space, \(A: \mathcal {H}\rightarrow 2^{\mathcal {H}}\) a maximally monotone operator and \(B: \mathcal {H}\rightarrow \mathcal {H}\) a \(\beta \)-cocoercive operator for some \(\beta > 0\) such that \({{\,\mathrm{zer}\,}}(A + B)\ne \emptyset \). Devising fast convergent continuous and discrete time dynamics for solving monotone inclusions of the type
$$\begin{aligned} \text {find}\,\,\overline{x}\in \mathcal {H}\,\,\text {such that}\,\,0\in A\overline{x} + B\overline{x} \qquad \text {(1)} \end{aligned}$$
is of great importance in many fields, including, but not limited to, optimization, equilibrium theory, economics and game theory, partial differential equations, and statistics. One of our main motivations comes from the fact that solving the convex optimization problem
$$\begin{aligned} \min _{x\in \mathcal {H}} f(x) + g(x), \end{aligned}$$
where \(f : \mathcal {H} \rightarrow \mathbb {R}\cup \{+\infty \}\) is proper, convex and lower semicontinuous and \(g : \mathcal {H} \rightarrow \mathbb {R}\) is convex and Fréchet differentiable with a Lipschitz continuous gradient, is equivalent to solving the monotone inclusion
$$\begin{aligned} \text {find}\,\,\overline{x}\in \mathcal {H}\,\,\text {such that}\,\,0\in \partial f(\overline{x}) + \nabla g(\overline{x}). \end{aligned}$$
We want to exploit the additive structure of (1) and approach A and B separately, in the spirit of the splitting paradigm.
For \(t \ge t_{0} > 0\), \(\alpha > 1, \xi \ge 0\), and functions \(\lambda , \gamma : [t_{0}, +\infty ) \rightarrow (0, +\infty )\), we will study the asymptotic behaviour of the trajectories of the second order differential equation
$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right) + T_{\lambda (t), \gamma (t)}(x(t)) = 0, \qquad \text {(Split-DIN-AVD)} \end{aligned}$$
where, for \(\lambda , \gamma > 0\), the operator \(T_{\lambda , \gamma } : \mathcal {H} \rightarrow \mathcal {H}\) is given by
$$\begin{aligned} T_{\lambda , \gamma } := \frac{1}{\lambda }\left( {{\,\mathrm{Id}\,}}- J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B)\right) . \end{aligned}$$
The sets of zeros of \(A+B\) and of \(T_{\lambda , \gamma }\), for \(\lambda , \gamma > 0\), coincide. The nomenclature (Split-DIN-AVD) reflects the splitting feature of the continuous time scheme, as well as its link with the (DIN-AVD) system (Dynamic Inertial Newton—Asymptotic Vanishing Damping) developed by Attouch and László in [9], which we will emphasize later. We will discuss the existence and uniqueness of the trajectories generated by (Split-DIN-AVD), show their weak convergence to the set of zeros of \(A + B\) as well as the fast convergence of the velocities to zero, and establish convergence rates for \(T_{\lambda (t), \gamma (t)}(x(t))\) and \(\frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\) as \(t\rightarrow +\infty \).
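To make the operator concrete, here is a small numerical sketch (ours, not part of the paper), taking \(\mathcal {H} = \mathbb {R}\), \(A = \partial |\cdot |\) (so that \(J_{\gamma A}\) is soft-thresholding) and \(B(x) = x - 1\), which is 1-cocoercive. The unique zero of \(A + B\) is \(\overline{x} = 0\), and the code checks that it is a zero of \(T_{\lambda , \gamma }\) for several parameter choices:

```python
import math

def soft(u, tau):
    """Soft-thresholding: the resolvent J_{tau A} of A = subdifferential of |.|."""
    return math.copysign(max(abs(u) - tau, 0.0), u)

def T(x, lam, gam):
    """T_{lam,gam}(x) = (x - J_{gam A}(x - gam * B(x))) / lam with B(x) = x - 1."""
    return (x - soft(x - gam * (x - 1.0), gam)) / lam

xbar = 0.0  # unique zero of A + B here: 0 lies in [-1, 1] + (0 - 1)
for lam, gam in [(0.5, 0.3), (2.0, 1.0), (7.0, 1.9)]:
    assert abs(T(xbar, lam, gam)) < 1e-12  # zer T_{lam,gam} = zer(A + B)
assert abs(T(1.0, 1.0, 1.0)) > 0.1         # a non-solution is not a zero of T
```

The assertions illustrate that the zeros of \(T_{\lambda , \gamma }\) do not depend on \(\lambda \) and \(\gamma \).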
For the particular case \(B = 0\), we are left with the monotone inclusion problem
$$\begin{aligned} \text {find}\,\,\overline{x}\in \mathcal {H}\,\,\text {such that}\,\,0\in A\overline{x}, \end{aligned}$$
and the attached system
$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}A_{\lambda (t), \gamma (t)}(x(t))\right) + A_{\lambda (t), \gamma (t)}(x(t)) = 0, \end{aligned}$$
where, for \(\lambda , \gamma > 0\), the operator \(A_{\lambda , \gamma } : \mathcal {H} \rightarrow \mathcal {H}\) can be seen as a generalized Moreau envelope of the operator A, i.e.,
$$\begin{aligned} A_{\lambda , \gamma } = \frac{1}{\lambda }\left( {{\,\mathrm{Id}\,}}- J_{\gamma A}\right) . \end{aligned}$$
In particular, we will be able to set \(\gamma (t) = \lambda (t)\) for every \(t \ge t_0\). Since for \(\lambda > 0\), \(A_{\lambda , \lambda } = A_{\lambda }\), this allows us to recover the (DIN-AVD) system
$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}A_{\lambda (t)}(x(t))\right) + A_{\lambda (t)}(x(t)) = 0 \qquad \text {(DIN-AVD)} \end{aligned}$$
addressed by Attouch and László in [9].
If \(A = 0\), and after properly redefining some parameters, we obtain the following system
$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}\big (\eta (t)B(x(t))\big )\right) + \eta (t)B(x(t)) = 0, \end{aligned}$$
with \(\eta : [t_{0}, +\infty ) \rightarrow (0, +\infty )\), which addresses the monotone equation
$$\begin{aligned} \text {find}\,\,\overline{x}\in \mathcal {H}\,\,\text {such that}\,\,B(\overline{x}) = 0. \end{aligned}$$
This dynamical system approaches the cocoercive operator B directly through forward evaluations, which is more natural than resorting to its Moreau envelope, as in (DIN-AVD).
1.2 Notation and Preliminaries
In this subsection, we will explain the notions which were mentioned in the previous subsection, and we will introduce some definitions and preliminary results that will be required later. Throughout the paper, we will be working in a real Hilbert space \(\mathcal {H}\) with inner product \(\langle \cdot , \cdot \rangle \) and corresponding norm \(\Vert \cdot \Vert = \sqrt{\langle \cdot , \cdot \rangle }\).
Let \(A : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) be a set-valued operator, that is, Ax is a subset of \(\mathcal {H}\) for every \(x\in \mathcal {H}\). The operator A is totally characterized by its graph \({{\,\mathrm{gra}\,}}A = \{(x, u) \in \mathcal {H}\times \mathcal {H} : u\in Ax\}\). The inverse of A is the operator \(A^{-1} : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) defined through the equivalence \(x\in A^{-1}u\) if and only if \(u\in Ax\). The set of zeros of A is the set \({{\,\mathrm{zer}\,}}A = \{x\in \mathcal {H} : 0 \in Ax\}\). For a subset \(C\subseteq \mathcal {H}\), we set \(A(C) = \cup _{x\in C}Ax\). The range of A is the set \({{\,\mathrm{ran}\,}}A = A(\mathcal {H})\).
A set-valued operator A is said to be monotone if \(\langle v - u, y - x\rangle \ge 0\) whenever \((x, u), (y, v)\in {{\,\mathrm{gra}\,}}A\), and maximally monotone if it is monotone and the following implication holds:
$$\begin{aligned} (x, u)\in \mathcal {H}\times \mathcal {H}\,\,\text {and}\,\,\langle v - u, y - x\rangle \ge 0\,\,\,\forall (y, v)\in {{\,\mathrm{gra}\,}}A \,\,\Longrightarrow \,\, (x, u)\in {{\,\mathrm{gra}\,}}A. \end{aligned}$$
Let \(\lambda > 0\). The resolvent of index \(\lambda \) of A is the operator \(J_{\lambda A} : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) given by
$$\begin{aligned} J_{\lambda A} = ({{\,\mathrm{Id}\,}}+ \lambda A)^{-1}, \end{aligned}$$
and the Moreau envelope (or Yosida approximation or Yosida regularization) of index \(\lambda \) of A is the operator \(A_{\lambda } : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) given by
$$\begin{aligned} A_{\lambda } = \frac{1}{\lambda }\left( {{\,\mathrm{Id}\,}}- J_{\lambda A}\right) , \end{aligned}$$
where \({{\,\mathrm{Id}\,}}: \mathcal {H} \rightarrow \mathcal {H}\), defined by \({{\,\mathrm{Id}\,}}(x) = x\) for every \(x\in \mathcal {H}\), is the identity operator of \(\mathcal {H}\). For \(\lambda _{1}, \lambda _{2} > 0\), it holds \((A_{\lambda _{1}})_{\lambda _{2}} = A_{\lambda _{1} + \lambda _{2}}\).
A single-valued operator \(B : \mathcal {H} \rightarrow \mathcal {H}\) is said to be \(\beta \)-cocoercive for some \(\beta >0\) if for every \(x, y\in \mathcal {H}\) we have
$$\begin{aligned} \langle B(x) - B(y), x - y\rangle \ge \beta \Vert B(x) - B(y)\Vert ^{2}. \end{aligned}$$
In this case, B is \(\frac{1}{\beta }\)-Lipschitz continuous, namely, for every \(x, y\in \mathcal {H}\) we have
$$\begin{aligned} \Vert B(x) - B(y)\Vert \le \frac{1}{\beta }\Vert x - y\Vert . \end{aligned}$$
We say B is nonexpansive if it is 1-Lipschitz continuous, and firmly nonexpansive if it is 1-cocoercive. For \(\alpha \in (0, 1)\), we say B is \(\alpha \)-averaged if there exists a nonexpansive operator \(R :\mathcal {H} \rightarrow \mathcal {H}\) such that
$$\begin{aligned} B = (1 - \alpha ){{\,\mathrm{Id}\,}}+ \alpha R. \end{aligned}$$
Let \(\lambda > 0\) and \(A : \mathcal {H} \rightarrow 2^{\mathcal {H}}\). According to Minty’s Theorem, A is maximally monotone if and only if \({{\,\mathrm{ran}\,}}({{\,\mathrm{Id}\,}}+ \lambda A) = \mathcal {H}\). In this case \(J_{\lambda A}\) is single-valued and firmly nonexpansive, \(A_{\lambda }\) is single-valued, \(\lambda \)-cocoercive, and for every \(x\in \mathcal {H}\) and every \(\lambda _{1}, \lambda _{2} > 0\) we have
$$\begin{aligned} J_{\lambda _{2}A}(x) = J_{\lambda _{1}A}\left( \frac{\lambda _{1}}{\lambda _{2}}x + \left( 1 - \frac{\lambda _{1}}{\lambda _{2}}\right) J_{\lambda _{2}A}(x)\right) . \end{aligned}$$
Let \(B : \mathcal {H} \rightarrow \mathcal {H}\) be a single-valued operator. If B is \(\alpha \)-averaged for some \(\alpha \in (0, 1)\), then \({{\,\mathrm{Id}\,}}- B\) is \(\frac{1}{2\alpha }\)-cocoercive. If B is monotone and continuous, then it is maximally monotone.
The following concepts and results show the strong interplay between the theory of monotone operators and convex analysis.
Let \(f : \mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) be a proper, convex and lower semicontinuous function. We denote the infimum of f over \(\mathcal {H}\) by \(\min _{\mathcal {H}}f\) and the set of global minimizers of f by \({{\,\mathrm{argmin}\,}}_{\mathcal {H}}f\). The subdifferential of f is the operator \(\partial f : \mathcal {H} \rightarrow 2^{\mathcal {H}}\) defined, for every \(x\in \mathcal {H}\), by
$$\begin{aligned} \partial f(x) = \{u\in \mathcal {H} : f(y) \ge f(x) + \langle u, y - x\rangle \,\,\,\forall y\in \mathcal {H}\}. \end{aligned}$$
The subdifferential operator of f is maximally monotone and \(\overline{x} \in {{\,\mathrm{zer}\,}}\partial f \) \(\Leftrightarrow \) \(\overline{x}\) is a global minimizer of f.
Let \(\lambda > 0\). The proximal operator of f of index \(\lambda \) is the operator \({{\,\mathrm{prox}\,}}_{\lambda f} : \mathcal {H} \rightarrow \mathcal {H}\) defined, for every \(x\in \mathcal {H}\), by
$$\begin{aligned} {{\,\mathrm{prox}\,}}_{\lambda f}(x) = \mathop {{{\,\mathrm{argmin}\,}}}_{y\in \mathcal {H}}\left\{ f(y) + \frac{1}{2\lambda }\Vert x - y\Vert ^{2}\right\} . \end{aligned}$$

It holds \({{\,\mathrm{prox}\,}}_{\lambda f} = J_{\lambda \partial f}\),
which also means that \({{\,\mathrm{prox}\,}}_{\lambda f}\) is firmly nonexpansive. The Moreau envelope of f of index \(\lambda \) is the function \(f_{\lambda } : \mathcal {H}\rightarrow \mathbb {R}\) given, for every \(x\in \mathcal {H}\), by
$$\begin{aligned} f_{\lambda }(x) = \min _{y\in \mathcal {H}}\left\{ f(y) + \frac{1}{2\lambda }\Vert x - y\Vert ^{2}\right\} . \end{aligned}$$
The function \(f_{\lambda }\) is Fréchet differentiable and
$$\begin{aligned} \nabla f_{\lambda } = (\partial f)_{\lambda } = \frac{1}{\lambda }\left( {{\,\mathrm{Id}\,}}- {{\,\mathrm{prox}\,}}_{\lambda f}\right) . \end{aligned}$$
Finally, if \(f : \mathcal {H}\rightarrow \mathbb {R}\) has full domain and is Fréchet differentiable with \(\frac{1}{\beta }\)-Lipschitz continuous gradient, for \(\beta >0\), then, according to Baillon–Haddad’s Theorem, \(\nabla f\) is \(\beta \)-cocoercive.
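As a sanity check of these formulas (our own illustration, not from the paper): for the particular choice \(f = |\cdot |\) on \(\mathbb {R}\), the proximal operator is soft-thresholding and the Moreau envelope is the Huber function; the gradient formula \(\nabla f_{\lambda } = \frac{1}{\lambda }({{\,\mathrm{Id}\,}}- {{\,\mathrm{prox}\,}}_{\lambda f})\) can be verified against a finite difference:

```python
import math

def prox_abs(x, lam):
    """prox_{lam f} for f = |.|, i.e. soft-thresholding."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def moreau_env(x, lam):
    """f_lam(x) = min_y { |y| + (1/(2 lam)) (x - y)^2 }; the min is attained at prox."""
    p = prox_abs(x, lam)
    return abs(p) + (x - p) ** 2 / (2 * lam)

lam, h = 0.7, 1e-6
for x in [-2.0, -0.3, 0.1, 1.5]:
    grad = (x - prox_abs(x, lam)) / lam  # (Id - prox)/lam, the claimed gradient
    fd = (moreau_env(x + h, lam) - moreau_env(x - h, lam)) / (2 * h)
    assert abs(grad - fd) < 1e-5
```

Note that \(f_{\lambda }\) is differentiable even though f is not, which is precisely the smoothing role of the Moreau envelope.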
1.3 A Brief History of Inertial Systems Attached to Optimization Problems and Monotone Inclusions
In recent years there have been many advances in the study of continuous time inertial systems with vanishing damping attached to monotone inclusion problems. We briefly review them in the following paragraphs.
1.3.1 The Heavy Ball Method with Friction
Consider a convex and continuously differentiable function \(f : \mathcal {H}\rightarrow \mathbb {R}\) with at least one minimizer. The heavy ball with friction system
$$\begin{aligned} \ddot{x}(t) + \mu \dot{x}(t) + \nabla f(x(t)) = 0 \qquad \text {(HBF)} \end{aligned}$$
was introduced by Álvarez in [2] as a suitable continuous time scheme to approach the minimization of the function f. This system can be seen as the equation of the horizontal position x(t) of an object that moves, under the force of gravity, along the graph of the function f, subject to a kinetic friction represented by the term \(\mu \dot{x}(t)\) (a nice derivation can be found in the work of Attouch–Goudou–Redont [8]). It is known that, if x is a solution of (HBF), then x converges weakly to a minimizer of f and \(f(x(t)) - \min _{\mathcal {H}}f = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \).
In recent times, the question was raised whether the damping coefficient \(\mu \) could be chosen to be time-dependent. An important contribution was made by Su–Boyd–Candès (in [20]), who studied the case of an Asymptotic Vanishing Damping coefficient \(\mu (t) = \frac{\alpha }{t}\), namely,
$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \nabla f(x(t)) = 0, \qquad \text {(AVD)} \end{aligned}$$
and proved that for \(\alpha \ge 3\) the functional values converge with the rate \(f(x(t)) - \min _{\mathcal {H}}f = \mathcal {O}\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \). This second order system can be seen as a continuous counterpart of Nesterov’s accelerated gradient method from [19]. Weak convergence of the trajectories generated by \(\text {(AVD)}\) when \(\alpha > 3\) has been shown by Attouch–Chbani–Peypouquet–Redont [6] and May [18], with the improved rate of convergence for the functional values \(f(x(t)) - \min _{\mathcal {H}}f = o\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \). For \(\alpha = 3\), the convergence of the trajectories remains an open question, except for the one dimensional case (see [7]). In the subcritical case \(\alpha \le 3\), it has been shown by Apidopoulos–Aujol–Dossal [5] and Attouch–Chbani–Riahi [7] that the objective values converge at a rate of \(\mathcal {O}(t^{-\frac{2\alpha }{3}})\) as \(t\rightarrow +\infty \).
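For intuition, the following sketch (ours; the parameter choices are illustrative) integrates \(\text {(AVD)}\) for the model function \(f(x) = \frac{1}{2}x^{2}\) with \(\alpha = 3.5\) by a classical RK4 scheme, and checks two qualitative features: the energy \(\frac{1}{2}\dot{x}^{2} + f(x)\) is dissipated by the vanishing damping, and the function values approach \(\min _{\mathcal {H}}f = 0\):

```python
import math

def avd_rhs(t, x, v, alpha):
    """(AVD): x'' + (alpha/t) x' + grad f(x) = 0 with f(x) = x^2 / 2."""
    return v, -(alpha / t) * v - x

def integrate(alpha, t0=1.0, tend=50.0, dt=1e-3, x0=1.0, v0=0.0):
    t, x, v = t0, x0, v0
    while t < tend:
        # classical RK4 step on the first order system (x, v)
        k1 = avd_rhs(t, x, v, alpha)
        k2 = avd_rhs(t + dt/2, x + dt/2*k1[0], v + dt/2*k1[1], alpha)
        k3 = avd_rhs(t + dt/2, x + dt/2*k2[0], v + dt/2*k2[1], alpha)
        k4 = avd_rhs(t + dt, x + dt*k3[0], v + dt*k3[1], alpha)
        x += dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        v += dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        t += dt
    return x, v

x, v = integrate(alpha=3.5)
energy0 = 0.5 * 0.0**2 + 0.5 * 1.0**2   # energy at t0 = 1 with x0 = 1, v0 = 0
energyT = 0.5 * v**2 + 0.5 * x**2
assert energyT < energy0   # the damping dissipates energy
assert 0.5 * x**2 < 1e-3   # f(x(T)) - min f is small at T = 50
```

The observed decay is consistent with (though of course no proof of) the \(\mathcal {O}\left( \frac{1}{t^{2}}\right) \) rate cited above.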
1.3.2 Heavy Ball Dynamics and Cocoercive Operators
If \(f : \mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) is a proper, convex and lower semicontinuous function which is not necessarily differentiable, then we cannot make direct use of (3). However, since for \(\lambda > 0\) we have \({{\,\mathrm{argmin}\,}}f = {{\,\mathrm{argmin}\,}}f_{\lambda }\), we can replace f by its Moreau envelope \(f_{\lambda }\), and the system now becomes
$$\begin{aligned} \ddot{x}(t) + \mu \dot{x}(t) + \nabla f_{\lambda }(x(t)) = 0. \end{aligned}$$
In line with this idea, and in analogy with (3), Álvarez and Attouch [3] and Attouch and Maingé [11] studied the dynamics
$$\begin{aligned} \ddot{x}(t) + \mu \dot{x}(t) + B(x(t)) = 0, \end{aligned}$$
where \(B : \mathcal {H}\rightarrow \mathcal {H}\) is a \(\beta \)-cocoercive operator. They were able to prove that the solutions of this system converge weakly to elements of \({{\,\mathrm{zer}\,}}B\) provided that the cocoercivity parameter \(\beta \) and the damping coefficient \(\mu \) satisfy \(\beta \mu ^{2} > 1\). For a maximally monotone operator \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\), we know that its Moreau envelope \(A_{\lambda }\) is \(\lambda \)-cocoercive and thus, under the condition \(\lambda \mu ^{2} > 1\), the trajectories of
$$\begin{aligned} \ddot{x}(t) + \mu \dot{x}(t) + A_{\lambda }(x(t)) = 0 \end{aligned}$$
converge weakly to elements of \({{\,\mathrm{zer}\,}}A_{\lambda } = {{\,\mathrm{zer}\,}}A\).
Also related to (5), Boţ-Csetnek [16] considered the system
$$\begin{aligned} \ddot{x}(t) + \mu (t)\dot{x}(t) + \nu (t)B(x(t)) = 0, \end{aligned}$$
where \(B : \mathcal {H}\rightarrow \mathcal {H}\) is again \(\beta \)-cocoercive. Under the assumption that \(\mu \) and \(\nu \) are locally absolutely continuous, \(\dot{\mu }(t) \le 0 \le \dot{\nu }(t)\) for almost every \(t\in [0, +\infty )\) and \(\inf _{t\ge 0} \frac{\mu ^{2}(t)}{\nu (t)} > \frac{1}{\beta }\), the authors were able to prove that the solutions to this system converge weakly to zeros of B.
In [12], Attouch and Peypouquet addressed the system
$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + A_{\lambda (t)}(x(t)) = 0, \end{aligned}$$
where \(\alpha > 1\) and the time-dependent regularizing parameter \(\lambda (t)\) satisfies \(\lambda (t) \frac{\alpha ^{2}}{t^{2}} > 1\) for every \(t \ge t_0 >0\). As well as ensuring the weak convergence of the trajectories towards elements of \({{\,\mathrm{zer}\,}}A\), choosing the regularizing parameter in such a fashion allowed the authors to obtain fast convergence of the velocities and accelerations towards zero.
1.3.3 Inertial Dynamics with Hessian Damping
Let us return briefly to the \(\text {(AVD)}\) system (4). In addition to the viscous vanishing damping term \(\frac{\alpha }{t}\dot{x}(t)\), the following system with Hessian-driven damping was considered by Attouch-Peypouquet-Redont in [13]
$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \nabla ^{2}f(x(t))\dot{x}(t) + \nabla f(x(t)) = 0, \end{aligned}$$
where \(\xi \ge 0\). While preserving the fast convergence properties of the Nesterov accelerated method, the Hessian-driven damping term reduces the oscillatory aspect of the trajectories. In [9], Attouch and László studied a version of (7) with an added Hessian-driven damping term:
$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}A_{\lambda (t)}(x(t))\right) + A_{\lambda (t)}(x(t)) = 0. \end{aligned}$$
While preserving the convergence results of (7), the main benefit of the introduction of this damping term is the fast convergence rates that can be obtained for \(A_{\lambda (t)}(x(t))\) and \(\frac{d}{dt}A_{\lambda (t)}(x(t))\) as \(t\rightarrow +\infty \). The regularizing parameter \(\lambda (t)\) is again chosen to be time-dependent; in the general case, the authors take \(\lambda (t) = \lambda t^{2}\), and in [12] it is shown that this choice of \(\lambda (t)\) is critical. However, in the case where \(A = \partial f\) for a proper, convex and lower semicontinuous function f, one may also take \(\lambda (t) = \lambda t^{r}\) with \(r \ge 0\).
1.4 Layout of the Paper
In Sect. 2, we give the proof for the existence and uniqueness of strong global solutions to (Split-DIN-AVD) by means of a Cauchy–Lipschitz–Picard argument. In Sect. 3 we state the main theorem of this work, and we show the weak convergence of the solutions of (2) to elements of \({{\,\mathrm{zer}\,}}(A + B)\), as well as the fast convergence of the velocities and accelerations to zero. We also provide convergence rates for \(T_{\lambda (t), \gamma (t)}(x(t))\) and \(\frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\) as \(t\rightarrow +\infty \). We explore the particular cases \(A = 0\) and \(B = 0\), and show improvements with respect to previous works. In Sect. 4, we address the convex minimization case, namely, when \(A = \partial f\) and \(B = \nabla g\), where \(f : \mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) is a proper, convex and lower semicontinuous function and \(g : \mathcal {H}\rightarrow \mathbb {R}\) is a convex and Fréchet differentiable function with Lipschitz continuous gradient, and derive, in addition, a fast convergence rate for the function values. In Sect. 5, we illustrate the theoretical results by numerical experiments. In Sect. 6, we provide an algorithm that arises from a time discretization of (Split-DIN-AVD) and discuss its convergence properties.
2 Existence and Uniqueness of Trajectories
In this section, we show the existence and uniqueness of strong global solutions to (Split-DIN-AVD). For the sake of clarity, first we state the definition of a strong global solution.
Definition 2.1
We say that \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) is a strong global solution of (Split-DIN-AVD) with Cauchy data \(( x_{0}, u_{0}) \in \mathcal {H} \times \mathcal {H}\) if
(i) \(x, \dot{x} : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) are locally absolutely continuous;
(ii) \(\ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right) + T_{\lambda (t), \gamma (t)}(x(t)) = 0\) for almost every \(t\in [t_{0}, +\infty )\);
(iii) \(x(t_{0}) = x_{0}\), \(\dot{x}(t_{0}) = u_{0}\).
A classic solution is just a strong global solution which is \(\mathcal {C}^{2}\). Sometimes we will mention the terms strong global solution or classic global solution without explicit mention of the Cauchy data.
The following lemma will be used to prove the existence of strong global solutions of our system, and we will need it in the proof of the main theorem as well.
Lemma 2.2
Let \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\) be a maximally monotone operator and \(B : \mathcal {H} \rightarrow \mathcal {H}\) a \(\beta \)-cocoercive operator for some \(\beta > 0\). Then, the following statements hold:
(i) For \(\lambda > 0\) and \(\gamma \in (0, 2\beta )\), \(T_{\lambda , \gamma }\) is a \(\lambda \frac{4\beta - \gamma }{4\beta }\)-cocoercive operator. In particular, this also implies that \(T_{\lambda , \gamma }\) is \(\frac{\lambda }{2}\)-cocoercive.
(ii) Choose \(\lambda _{1}, \lambda _{2} > 0\), \(\gamma _{1}, \gamma _{2}\in (0, 2\beta )\) and \(x, y\in \mathcal {H}\). Then, for \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\) it holds
$$\begin{aligned} \Vert \lambda _{1}T_{\lambda _{1}, \gamma _{1}}(x) - \lambda _{2}T_{\lambda _{2}, \gamma _{2}}(y)\Vert&\le \, 4\Vert x - y\Vert + \frac{4\beta |\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert B(x)\Vert \\&\quad +\frac{2|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert x - \overline{x}\Vert ,\\ \left\| T_{\lambda _{1}, \gamma _{1}}(x) - T_{\lambda _{2}, \gamma _{2}}(y)\right\|&\le \, \frac{1}{\lambda _{1}}\left[ 4\Vert x - y\Vert + 4\beta \frac{|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert Bx\Vert \right. \\ {}&\quad \left. + 2\frac{|\gamma _{1} - \gamma _{2}|}{\gamma _{1}}\Vert x - \overline{x}\Vert \right] + 2\frac{|\lambda _{2} - \lambda _{1}|}{\lambda _{1}\lambda _{2}}\Vert y - \overline{x}\Vert . \end{aligned}$$

(iii) If x is a classic global solution to (2) and \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\), then, for every \(t\ge t_{0}\), we have
$$\begin{aligned} \left\| \frac{d}{dt}\left( \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right) \right\| \le 4\Vert \dot{x}(t)\Vert + 4\beta \frac{|\dot{\gamma }(t)|}{\gamma (t)}\Vert B(x(t))\Vert + 2\frac{|\dot{\gamma }(t)|}{\gamma (t)}\Vert x(t) - \overline{x}\Vert . \end{aligned}$$
Proof
(i) From [14, Proposition 26.1(iv)(d)] we know that the operator \(J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B)\) is \(\alpha = \frac{2\beta }{4\beta - \gamma }\)-averaged. From [14, Proposition 4.39], we obtain that \({{\,\mathrm{Id}\,}}- J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B)\) is \(\frac{1}{2\alpha }\)-cocoercive, namely, it is \(\frac{4\beta - \gamma }{4\beta }\)-cocoercive. Since \(\gamma \in (0, 2\beta )\), we have \(\frac{4\beta - \gamma }{4\beta } > \frac{2\beta }{4\beta } = \frac{1}{2}\), which implies that \({{\,\mathrm{Id}\,}}- J_{\gamma A}\circ ({{\,\mathrm{Id}\,}}- \gamma B)\) is \(\frac{1}{2}\)-cocoercive and thus
$$\begin{aligned} T_{\lambda , \gamma } \,\,\,\text {is}\,\,\,\lambda \frac{4\beta - \gamma }{4\beta }\text {-cocoercive}\,\,\,\text {and}\,\,\,T_{\lambda , \gamma } \,\,\,\text {is}\,\,\,\frac{\lambda }{2}\text {-cocoercive}. \end{aligned}$$

(ii) We have
$$\begin{aligned}&\Vert \lambda _{1}T_{\lambda _{1}, \gamma _{1}}(x) - \lambda _{2}T_{\lambda _{2}, \gamma _{2}}(y)\Vert \le \Vert x - y\Vert + \Vert J_{\gamma _{1}A}(x - \gamma _{1}B(x)) - J_{\gamma _{2}A}(y - \gamma _{2}B(y))\Vert \\&\quad \le \Vert x - y\Vert + \Vert J_{\gamma _{1}A}(x - \gamma _{1}B(x)) - J_{\gamma _{2}A}(x - \gamma _{1}B(x))\Vert \\&\qquad + \Vert J_{\gamma _{2}A}(x - \gamma _{1}B(x)) - J_{\gamma _{2}A}(y - \gamma _{2}B(y))\Vert \\&\quad \le \ 2\Vert x - y\Vert + |\gamma _{1} - \gamma _{2}|\Vert A_{\gamma _{1}}(x - \gamma _{1}B(x))\Vert + \Vert \gamma _{1}B(x) - \gamma _{2}B(y)\Vert \\&\quad \le \ 2\Vert x - y\Vert + |\gamma _{1} - \gamma _{2}|\Vert A_{\gamma _{1}}(x - \gamma _{1}B(x))\Vert \\&\qquad + \Vert \gamma _{1}B(x) - \gamma _{2}B(x)\Vert + \Vert \gamma _{2}B(x) - \gamma _{2}B(y)\Vert \\&\quad = \ 2\Vert x - y\Vert + |\gamma _{1} - \gamma _{2}|\Vert A_{\gamma _{1}}(x - \gamma _{1}B(x))\Vert \\&\qquad + |\gamma _{1} - \gamma _{2}|\Vert B(x)\Vert + \gamma _{2}\Vert B(x) - B(y)\Vert . \end{aligned}$$
Now, notice that
so using (i) and the fact that \(T_{\lambda _{1}, \gamma _{1}}(\overline{x}) = 0\), we obtain
Altogether, plugging (8) into our initial inequality yields
To show the second inequality, we use the previous one. We have
where the last line is a consequence of \(T_{\lambda _{2}, \gamma _{2}}\) being \(\frac{\lambda _{2}}{2}\)-cocoercive, and hence \(\frac{2}{\lambda _{2}}\)-Lipschitz continuous (see (i)).
(iii) For \(t, s \ge t_{0}\) set
and use (ii) to obtain, for every \(t\ge t_{0}\),
Hence, by taking the limit as \(s\rightarrow t\) we get, for any \(t\ge t_{0}\),
\(\square \)
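Lemma 2.2(i) can also be probed numerically (a sanity check of ours, not a substitute for the proof) with the toy data \(\mathcal {H} = \mathbb {R}\), \(A = \partial |\cdot |\) and \(B(x) = x - 1\), so that \(\beta = 1\): for random points and parameters, the cocoercivity inequality \(\langle T_{\lambda , \gamma }(x) - T_{\lambda , \gamma }(y), x - y\rangle \ge \lambda \frac{4\beta - \gamma }{4\beta }\Vert T_{\lambda , \gamma }(x) - T_{\lambda , \gamma }(y)\Vert ^{2}\) should hold:

```python
import math, random

random.seed(0)
beta = 1.0  # B(x) = x - 1 is 1-cocoercive (it is the gradient of (x - 1)^2 / 2)

def soft(u, tau):
    return math.copysign(max(abs(u) - tau, 0.0), u)

def T(x, lam, gam):
    # T_{lam,gam} = (Id - J_{gam A} o (Id - gam B)) / lam with A = subdifferential of |.|
    return (x - soft(x - gam * (x - 1.0), gam)) / lam

for _ in range(1000):
    lam = random.uniform(0.1, 5.0)
    gam = random.uniform(0.05, 2.0 * beta - 0.05)  # gam in (0, 2 beta)
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    dT = T(x, lam, gam) - T(y, lam, gam)
    lhs = dT * (x - y)
    rhs = lam * (4.0 * beta - gam) / (4.0 * beta) * dT ** 2
    assert lhs >= rhs - 1e-6  # cocoercivity inequality of Lemma 2.2(i)
```

The small tolerance only accounts for floating point rounding; the inequality itself holds exactly by the lemma.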
The next theorem concerns the existence and uniqueness of strong global solutions to (Split-DIN-AVD).
Theorem 2.3
Assume that \(\lambda , \gamma : [t_{0}, +\infty )\rightarrow (0, +\infty )\) are Lebesgue measurable functions and that \(\inf _{t\ge t_{0}}\lambda (t) > 0\). Then, for any \((x_{0}, u_{0})\in \mathcal {H}\times \mathcal {H}\) there exists a unique strong global solution \(x : [t_{0}, +\infty )\rightarrow \mathcal {H}\) of the system (2) that satisfies \(x(t_{0}) = x_{0}\) and \(\dot{x}(t_{0}) = u_{0}\).
Proof
We will rely on [17, Proposition 6.2.1] and distinguish between the cases \(\xi > 0\) and \(\xi = 0\). For each case, we will check that the conditions of the aforementioned proposition are fulfilled. We will be working in the real Hilbert space \(\mathcal {H}\times \mathcal {H}\) endowed with the norm \(\Vert (x, y)\Vert = \Vert x\Vert + \Vert y\Vert \). Let \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\) be fixed.
The Case \(\xi > 0\). First, it can be easily checked (see also [4, 9, 13]) that for all \(t\ge t_{0}\) the following dynamical systems are equivalent
- \(\displaystyle \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right) + T_{\lambda (t), \gamma (t)}(x(t)) = 0\).
- \(\displaystyle {\left\{ \begin{array}{ll} \dot{x}(t) + \xi T_{\lambda (t), \gamma (t)}(x(t)) - \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) x(t) + \frac{1}{\xi }y(t) = 0, \\ \dot{y}(t) - \left( \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right) x(t) + \frac{1}{\xi }y(t) = 0. \end{array}\right. }\)
In other words, (2) with Cauchy data \((x_{0}, u_{0}) = (x(t_{0}), \dot{x}(t_{0}))\) is equivalent to the first order system
$$\begin{aligned} \dot{z}(t) = F(t, z(t)), \end{aligned}$$
where \(z(t) = (x(t), y(t))\), F is given, for every \(t\ge t_{0}\), by
$$\begin{aligned} F(t, (x, y)) = \left( -\xi T_{\lambda (t), \gamma (t)}(x) + \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) x - \frac{1}{\xi }y, \,\, \left( \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right) x - \frac{1}{\xi }y\right) , \end{aligned}$$
and the Cauchy data is \(x_{0} = x(t_{0})\), \(y_{0} = -\xi \left( u_{0} + \xi T_{\lambda (t_{0}), \gamma (t_{0})}(x_{0}) - \left( \frac{1}{\xi } - \frac{\alpha }{t_{0}}\right) x_{0}\right) \).
(i) Let \(t\in [t_{0}, +\infty )\) be fixed. We need to verify the Lipschitz continuity of F in the z variable. Set \(z = (x, y)\) and \(w = (u, v)\). We have
$$\begin{aligned} \Vert F(t, z) - F(t, w)\Vert = \,&\left\| -\xi \left( T_{\lambda (t), \gamma (t)}(x) - T_{\lambda (t), \gamma (t)}(u) + \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) \right. \right. \\ {}&\left. \left. (x - u) - \frac{1}{\xi }(y - v)\right) \right\| \\&+ \left\| \left( \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right) (x - u) - \frac{1}{\xi }(y - v)\right\| . \end{aligned}$$Set \(\underline{\lambda } := \inf _{t\ge t_{0}}\lambda (t) > 0\). According to Lemma 2.2(i), the term involving the operator \(T_{\lambda (t), \gamma (t)}\) satisfies
$$\begin{aligned} \left\| T_{\lambda (t), \gamma (t)}(x) - T_{\lambda (t), \gamma (t)}(u)\right\| \le \frac{2}{\lambda (t)}\Vert x - u\Vert \le \frac{2}{\underline{\lambda }}\Vert x - u\Vert . \end{aligned}$$It follows that, if we take
$$\begin{aligned} K(t) := \max \left\{ \frac{2\xi }{\underline{\lambda }} + \left| \frac{1}{\xi } - \frac{\alpha }{t}\right| + \left| \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right| , \, \frac{2}{\xi }\right\} \quad \forall t\ge t_{0}, \end{aligned}$$then we have \(K\in L_{\text {loc}}^{1}([t_{0}, +\infty ), \mathbb {R})\) and
$$\begin{aligned} \Vert F(t, z) - F(t, w)\Vert \le K(t)\Vert z - w\Vert \quad \forall t\ge t_{0}. \end{aligned}$$

(ii) Now, we claim that F fulfills a boundedness condition. For \(t\in [t_{0}, +\infty )\) and \(z = (x, y)\in \mathcal {H} \times \mathcal {H}\) we have
$$\begin{aligned} \Vert F(t, z)\Vert = \left\| -\xi T_{\lambda (t), \gamma (t)}(x) + \left( \frac{1}{\xi } - \frac{\alpha }{t}\right) x - \frac{1}{\xi }y\right\| + \left\| \left( \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right) x - \frac{1}{\xi }y\right\| . \end{aligned}$$By Lemma 2.2(i), we have, for every \(t\ge t_{0}\),
$$\begin{aligned} \left\| T_{\lambda (t), \gamma (t)}(x)\right\| = \left\| T_{\lambda (t), \gamma (t)}(x) - T_{\lambda (t), \gamma (t)}(\overline{x})\right\| \le \frac{2}{\lambda (t)}\Vert x - \overline{x}\Vert . \end{aligned}$$Hence, if we take
$$\begin{aligned} P(t) = \max \left\{ \frac{2\xi }{\lambda (t)} + \left| \frac{1}{\xi } - \frac{\alpha }{t}\right| + \left| \frac{1}{\xi } - \frac{\alpha }{t} + \frac{\alpha \xi }{t^{2}}\right| , \frac{2\xi }{\lambda (t)},\, \frac{2}{\xi }\right\} \quad \forall t\ge t_{0}, \end{aligned}$$then we have \(P\in L_{\text {loc}}^{1}([t_{0}, +\infty ), \mathbb {R})\) and
$$\begin{aligned} \Vert F(t, z)\Vert \le P(t)(1 + \Vert z\Vert ). \end{aligned}$$
We have checked that the conditions of [17, Proposition 6.2.1] hold. Therefore, there exists a unique locally absolutely continuous solution \(t\mapsto x(t)\) of (2) that satisfies \(x(t_{0}) = x_{0}\) and \(\dot{x}(t_{0}) = u_0\).
The Case \(\xi = 0\). Now, (2) is easily seen to be equivalent to
$$\begin{aligned} \dot{z}(t) = F(t, z(t)), \end{aligned}$$
where \(z(t) = (x(t), y(t))\) and F is given, for every \(t\ge t_{0}\), by
$$\begin{aligned} F(t, (x, y)) = \left( y, \, -\frac{\alpha }{t}y - T_{\lambda (t), \gamma (t)}(x)\right) . \end{aligned}$$
Showing that F fulfills the required properties is straightforward. \(\square \)
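To illustrate Theorem 2.3 (a sketch of ours, with illustrative parameters), one can integrate the first order reformulation from the case \(\xi > 0\) numerically. We take \(\mathcal {H} = \mathbb {R}\), \(A = \partial |\cdot |\), \(B(x) = x - 1\) (whose unique zero is \(\overline{x} = 0\)) and the constant choices \(\lambda (t) \equiv 1\), \(\gamma (t) \equiv \frac{1}{2}\), which satisfy \(\inf _{t\ge t_{0}}\lambda (t) > 0\); constant \(\lambda \) suffices for existence and uniqueness, whereas the rate results of Sect. 3 will require \(\lambda (t) = \lambda t^{2}\):

```python
import math

def soft(u, tau):
    return math.copysign(max(abs(u) - tau, 0.0), u)

def T_op(x):
    # T_{1, 1/2}(x) = x - J_{(1/2)A}(x - (1/2) B(x)), A = subdifferential of |.|, B(x) = x - 1
    return x - soft(x - 0.5 * (x - 1.0), 0.5)

xi, alpha, t0 = 1.0, 3.0, 1.0

def F(t, x, y):
    # right hand side of the first order reformulation (case xi > 0)
    dx = -xi * T_op(x) + (1.0 / xi - alpha / t) * x - y / xi
    dy = (1.0 / xi - alpha / t + alpha * xi / t ** 2) * x - y / xi
    return dx, dy

x, u0 = 0.5, 0.0                                              # Cauchy data (x0, u0)
y = -xi * (u0 + xi * T_op(x) - (1.0 / xi - alpha / t0) * x)   # induced y0
t, dt = t0, 1e-3
while t < 100.0:                                              # classical RK4 on z = (x, y)
    k1 = F(t, x, y)
    k2 = F(t + dt / 2, x + dt / 2 * k1[0], y + dt / 2 * k1[1])
    k3 = F(t + dt / 2, x + dt / 2 * k2[0], y + dt / 2 * k2[1])
    k4 = F(t + dt, x + dt * k3[0], y + dt * k3[1])
    x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    t += dt

assert abs(x) < 1e-3  # the trajectory approaches the unique zero xbar = 0
```

The final assertion reflects the (here scalar) convergence of x(t) towards \(\overline{x} = 0\), anticipating the results of the next section.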
3 The Convergence Properties of the Trajectories
In this section, we will study the asymptotic behaviour of the trajectories of the system
$$\begin{aligned} \ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \xi \left( \frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\right) + T_{\lambda (t), \gamma (t)}(x(t)) = 0, \end{aligned}$$
where
$$\begin{aligned} T_{\lambda (t), \gamma (t)} = \frac{1}{\lambda (t)}\left( {{\,\mathrm{Id}\,}}- J_{\gamma (t)A}\circ ({{\,\mathrm{Id}\,}}- \gamma (t)B)\right) \quad \text {for every } t\ge t_{0}. \end{aligned}$$
We will show weak convergence of the trajectories generated by (2) to elements of \({{\,\mathrm{zer}\,}}(A + B)\), as well as the fast convergence of the velocities and accelerations to zero. Additionally, we will provide convergence rates for \(T_{\lambda (t), \gamma (t)}(x(t))\) and \(\frac{d}{dt}T_{\lambda (t), \gamma (t)}(x(t))\) as \(t\rightarrow +\infty \). To avoid repetition of the statement “for almost every t”, in the following theorem we will assume we are working with a classic global solution of our system.
Theorem 3.1
Let \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\) be a maximally monotone operator and \(B : \mathcal {H}\rightarrow \mathcal {H}\) a \(\beta \)-cocoercive operator for some \(\beta > 0\) such that \({{\,\mathrm{zer}\,}}(A + B)\ne \emptyset \). Assume that \(\alpha > 1\), \(\xi \ge 0\), \(\lambda (t) = \lambda t^{2}\) for \(\lambda > \frac{2}{(\alpha - 1)^{2}}\) and all \(t\ge t_{0}\), and that \(\gamma : [t_{0}, +\infty )\rightarrow (0, 2\beta )\) is a differentiable function that satisfies \(\frac{\dot{\gamma }(t)}{\gamma (t)} = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \). Then, for a solution \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) to (Split-DIN-AVD), the following statements hold:
(i) x is bounded.
(ii) We have the estimates
$$\begin{aligned}&\int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty , \\&\int _{t_{0}}^{+\infty }\frac{\gamma ^{2}(t)}{t}\left\| A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right\| ^{2}dt < +\infty . \end{aligned}$$

(iii) We have the convergence rates
$$\begin{aligned}&\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) ,\\&\left\| A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right\| = o\left( \frac{1}{\gamma (t)}\right) , \\&\left\| \frac{d}{dt} \left( A_{\gamma (t)}\Big [x(t) - \gamma (t)Bx(t)\Big ] + Bx(t)\right) \right\| = \mathcal {O}\left( \frac{1}{t\gamma (t)}\right) + o\left( \frac{t^{2}\left| \frac{d}{dt}\frac{\gamma (t)}{\lambda (t)}\right| }{\gamma ^{2}(t)}\right) \end{aligned}$$as \(t\rightarrow +\infty \).
(iv) If \(0< \inf _{t \ge t_{0}}\gamma (t) \le \sup _{t\ge t_{0}}\gamma (t) < 2\beta \), then x(t) converges weakly to an element of \({{\,\mathrm{zer}\,}}(A + B)\) as \(t \rightarrow +\infty \).
Proof
Integral Estimates and Rates. To develop the analysis, we fix \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\) and make use of the Lyapunov function \(\mathcal {E} : [t_{0}, +\infty ) \rightarrow \mathbb {R}\cup \{+\infty \}\) given by
Differentiation of \(\mathcal {E}\) with respect to time yields, for every \(t\ge t_{0}\),
After reduction and employing (2), we get, for every \(t\ge t_{0}\),
Now, by Lemma 2.2(i), we know that \(T_{\lambda (t), \gamma (t)}\) is \(\frac{\lambda (t)}{2}\)-cocoercive for every \(t\ge t_{0}\). Using this on the first summand of the right hand side of the previous inequality yields, for \(t\ge t_{1} = \max \{\xi , t_{0}\}\),
Now, since \(\lambda > \frac{2}{(\alpha - 1)^{2}}\), we can choose \(\epsilon > 0\) such that
From (10) we get, for every \(t\ge t_{1}\),
By (11) and the definition of \(\lambda (t)\), we know that \(\frac{1 - \alpha }{2} + \frac{\epsilon }{2} < 0\), and
so we can find \(t_{2}\ge t_{1}\) such that for every \(t\ge t_{2}\) the previous expression becomes nonpositive. According to Lemma A.2, the right hand side of (12) is nonpositive whenever
This quantity can be rewritten as
Since \(\epsilon < \alpha - 1 - \sqrt{\frac{2}{\lambda }}\), we have \(\frac{\lambda }{2} > \frac{1}{(\alpha - 1 - \epsilon )^{2}}\). Hence,
This means we can find \(t_{3}\ge t_{2}\) such that for every \(t\ge t_{3}\) we have \(R(t) \le 0\), that is, for every \(t\ge t_{3}\) we have
Now, integrating (13) from \(t_{3}\) to t we obtain
From (13) and the form of \(\mathcal {E}\) we immediately obtain
From Lemma 2.2(i), we know that for every \(t\ge t_{0}\) the operator \(T_{\lambda (t), \gamma (t)}\) is \(\frac{2}{\lambda (t)}\)-Lipschitz continuous, which gives, for every \(t\ge t_{0}\),
Thus, from (15) and recalling that \(\lambda (t) = \lambda t^{2}\) we arrive at
By combining (15), (18) and (19) we obtain \(\sup _{t\ge t_{0}}t\Vert \dot{x}(t)\Vert < +\infty \) and therefore
From Lemma 2.2, (15), (20) and the fact that B is \(\frac{1}{\beta }\)-Lipschitz continuous we deduce that, as \(t\rightarrow +\infty \),
On the other hand, for every \(t\ge t_{0}\) we have
so by combining (19), (21), (22) and the fact that \(\dot{\lambda }(t) = 2\lambda t\) we arrive at
which yields
Let us now improve (19) and show that
According to (19) and (21) there exists a constant \(K > 0\) such that for every \(t\ge t_{0}\) it holds
By (17), the right hand side belongs to \(L^{1}([t_{0}, +\infty ), \mathbb {R})\), so we get
hence the limit
exists. Obviously, this implies the existence of \(L:= \lim _{t\rightarrow +\infty }\left\| \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\right\| ^{2}\). By using (17) again we come to
and so we must have \(L = 0\), which gives
By combining (2), (19), (20) and (23) we obtain, as \(t\rightarrow +\infty \),
Moreover, by using the well-known inequality \(\Vert a + b + c\Vert ^{2} \le 3\Vert a\Vert ^{2} + 3\Vert b\Vert ^{2} + 3\Vert c\Vert ^{2}\) for every \(a, b, c\in \mathcal {H}\), for every \(t\ge t_{0}\) it holds
From (16), (23) and (17) it follows
To see that \(\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \), we write, for every \(t\ge t_{0}\),
From (16) and (26) we deduce that the left hand side belongs to \(L^{1}([t_{0}, +\infty ), \mathbb {R})\), from which we infer that the limit \(\lim _{t\rightarrow +\infty }t^{2}\Vert \dot{x}(t)\Vert ^{2}\) exists. Using (16) again, we get
from which we finally deduce \(\lim _{t\rightarrow +\infty }t^{2}\Vert \dot{x}(t)\Vert ^{2} = 0\), therefore
Notice that we can write for every \(t\ge t_{0}\)
Hence, multiplying both sides of (25) by \(\frac{\lambda (t)}{\gamma (t)}\) and remembering the definition of \(\lambda (t)\) we obtain
For every \(t\ge t_{0}\), we have
Therefore, by using (23) and (28), and recalling that \(\lambda (t) = \lambda t^{2}\), we obtain
The fact that \(\Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \) comes from (2), (27), (23) and (24).
Weak Convergence of the Trajectories. Let \(\overline{x}\in {{\,\mathrm{zer}\,}}(A + B)\). We will work with the energy function \(h :[t_{0}, +\infty )\rightarrow \mathbb {R}\) given by
For every \(t\ge t_{0}\), we have
Combining (2) and (29) gives us, for every \(t\ge t_{0}\),
By using the \(\frac{\lambda (t)}{2}\)-cocoercivity of \(T_{\lambda (t), \gamma (t)}\) on the left hand side, the Cauchy–Schwarz inequality on the right hand side, and multiplying both sides by t, the previous inequality entails, for every \(t\ge t_{0}\),
Now, putting the previous estimates together results in
Now apply Lemma A.1 with \(\theta (t):= t \frac{\lambda (t)}{2}\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| \) for every \(t\ge t_{0}\) to deduce that the limit
exists, which fulfills the first condition of Opial’s Lemma A.3.
Let us now move on to the second condition. Suppose \(\widehat{x}\) is a weak sequential cluster point of \(t\mapsto x(t)\), that is, there exists a sequence \((t_{n})_{n\in \mathbb {N}}\subseteq [t_{0}, +\infty )\) such that \(t_{n}\rightarrow +\infty \) and \(x_{n} := x(t_{n})\) converges weakly to \(\widehat{x}\) as \(n\rightarrow +\infty \). Define
According to (25), we have \(U_{\gamma (t)}(x(t)) = \lambda (t)T_{\lambda (t), \gamma (t)}(x(t))\rightarrow 0\) as \(t\rightarrow +\infty \). Now, since \(\gamma (t)\in [\delta , 2\beta - \delta ]\) for all \(t\ge t_{0}\) for some \(\delta > 0\), we can extract a subsequence \((\gamma (t_{n_{k}}))_{k\in \mathbb {N}}\) such that \(\gamma (t_{n_{k}})\rightarrow \overline{\gamma }\in (0, 2\beta )\) as \(k\rightarrow +\infty \). We may then assume, without loss of generality, that \(\gamma _{n} := \gamma (t_{n})\rightarrow \overline{\gamma }\) as \(n\rightarrow +\infty \). We now have for every \(n \in \mathbb {N}\)
Since every weakly convergent sequence is bounded and the operators B and \(A_{\overline{\gamma }}\) are Lipschitz continuous, the right-hand side of the previous inequality approaches zero as \(n\rightarrow +\infty \), and we obtain
as \(n\rightarrow +\infty \). Now, from the proof of part (i) of Lemma 2.2, we know that \(U_{\overline{\gamma }}\) is \(\frac{4\beta - \overline{\gamma }}{4\beta }\)-cocoercive, thus monotone and Lipschitz continuous and therefore maximally monotone. Summarizing, we have
-
1.
\(U_{\overline{\gamma }}\) is maximally monotone and thus its graph is closed in the weak\(\times \)strong topology of \(\mathcal {H} \times \mathcal {H}\) (see [14, Proposition 20.38(ii)]),
-
2.
\(x_{n}\) converges weakly to \(\widehat{x}\) and \(U_{\overline{\gamma }}(x_{n})\rightarrow 0\) as \(n\rightarrow +\infty \),
which allows us to conclude that \(U_{\overline{\gamma }}(\widehat{x}) = 0\), and hence \(\widehat{x}\in {{\,\mathrm{zer}\,}}(A + B)\). Invoking Opial’s Lemma, we conclude that x(t) converges weakly to an element of \({{\,\mathrm{zer}\,}}(A + B)\) as \(t\rightarrow +\infty \). \(\square \)
In the following subsections, we explore the particular cases \(B = 0\) and \(A = 0\), and we will show improvements with respect to previous results from the literature addressing continuous time approaches to monotone inclusions.
3.1 The Case \(B = 0\)
If we let \(B = 0\) in the (Split-DIN-AVD) system (2), then, attached to the monotone inclusion problem
we obtain the dynamics
where
We can state the following theorem.
Theorem 3.2
Let \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\) be a maximally monotone operator such that \({{\,\mathrm{zer}\,}}A\ne \emptyset \). Assume that \(\alpha > 1\), \(\xi \ge 0\), \(\lambda (t) = \lambda t^{2}\) for \(\lambda > \frac{1}{(\alpha - 1)^{2}}\) and all \(t\ge t_{0}\), and that \(\gamma : [t_{0}, +\infty )\rightarrow (0, +\infty )\) is a differentiable function that satisfies \(\frac{|\dot{\gamma }(t)|}{\gamma (t)} = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \). Then, for a solution \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) to (30), the following statements hold:
-
(i)
x is bounded.
-
(ii)
We have the estimates
$$\begin{aligned}&\int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty ,\\&\int _{t_{0}}^{+\infty }\frac{\gamma ^{2}(t)}{t}\left\| A_{\gamma (t)}(x(t))\right\| ^{2}dt < +\infty . \end{aligned}$$ -
(iii)
We have the convergence rates
$$\begin{aligned}&\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \ \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) , \\&\left\| A_{\gamma (t)}(x(t))\right\| = o\left( \frac{1}{\gamma (t)}\right) , \ \left\| \frac{d}{dt}A_{\gamma (t)}(x(t))\right\| = \mathcal {O}\left( \frac{1}{t\gamma (t)}\right) + o\left( \frac{t^{2}\left| \frac{d}{dt}\frac{\gamma (t)}{\lambda (t)}\right| }{\gamma ^{2}(t)}\right) \end{aligned}$$as \(t\rightarrow +\infty \).
-
(iv)
If \(0 < \inf _{t\ge t_{0}}\gamma (t)\), then x(t) converges weakly to an element of \({{\,\mathrm{zer}\,}}A\) as \(t \rightarrow +\infty \).
Proof
The proof proceeds along the same lines as that of Theorem 3.1; however, a few comments are in order. First of all, we now have \(T_{\lambda , \gamma } = \frac{1}{\lambda }({{\,\mathrm{Id}\,}}- J_{\gamma A}) = A_{\lambda , \gamma }\). Since \(J_{\gamma A}\) is firmly nonexpansive, by [14, Proposition 4.4] so is \({{\,\mathrm{Id}\,}}- J_{\gamma A}\). In other words, \({{\,\mathrm{Id}\,}}- J_{\gamma A}\) is 1-cocoercive, and therefore \(A_{\lambda , \gamma } = \frac{1}{\lambda }({{\,\mathrm{Id}\,}}- J_{\gamma A})\) is \(\lambda \)-cocoercive, so the condition on \(\lambda \) now becomes \(\lambda > \frac{1}{(\alpha - 1)^{2}}\).
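The firm nonexpansiveness of \({{\,\mathrm{Id}\,}}- J_{\gamma A}\) can also be checked numerically. In the sketch below, the choice \(A = \partial \Vert \cdot \Vert _{1}\) is illustrative (not taken from the proof), so that \(J_{\gamma A}\) is soft-thresholding and \({{\,\mathrm{Id}\,}}- J_{\gamma A}\) is the projection onto a box:

```python
import numpy as np

# Illustrative check: with A = d||.||_1, J_{gamma A} is soft-thresholding and
# Q = Id - J_{gamma A} is the projection onto [-gamma, gamma]^n.  Firm
# nonexpansiveness means <Qx - Qy, x - y> >= ||Qx - Qy||^2.
rng = np.random.default_rng(0)
gamma = 0.8
Q = lambda x: x - np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    d = Q(x) - Q(y)
    ok &= np.dot(d, x - y) >= np.dot(d, d) - 1e-12   # small tolerance for rounding
print(ok)   # True
```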
The proof also changes when we verify the second condition of Opial’s Lemma to obtain weak convergence of the trajectories \(t\mapsto x(t)\); this is done in order to allow \(\gamma (t)\) to be unbounded. We do need, however, the assumption \(0 < \inf _{t\ge t_{0}}\gamma (t)\). Indeed, from \(\Vert A_{\lambda (t), \gamma (t)}(x(t))\Vert = o\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \), we obtain
as \(t\rightarrow +\infty \). Using the definition of the resolvent, we come to
for all \(t\ge t_{0}\). If \((t_{n})_{n\in \mathbb {N}}\subseteq [t_{0}, +\infty )\) is such that \(t_{n}\rightarrow +\infty \) and \(x(t_{n})\) converges weakly to \(\widehat{x}\) as \(n\rightarrow +\infty \), then the previous inclusion, together with the assumption on \(\gamma \) gives
and by the closedness of the graph of A in the weak\(\times \)strong topology of \(\mathcal {H} \times \mathcal {H}\), we deduce that \(\widehat{x}\in {{\,\mathrm{zer}\,}}A\). \(\square \)
Remark 3.3
The hypotheses required for \(\gamma \) are fulfilled at least by two families of functions. First, take \(r\ge 0\) and set \(\gamma (t) = e^{t^{-r}}\). Then, we have
and
If \(\gamma \) is a polynomial of degree n for some \(n\in \mathbb {N}\), the conditions are also fulfilled. Assume \(\gamma (t) = a_{n}t^{n} + a_{n - 1}t^{n - 1} + \cdots + a_{0}\) for all \(t\ge t_{0}\), for some \(a_{i}\in \mathbb {R}\) for \(i\in \{0, \ldots , n\}\) and \(a_{n} > 0\). Then, we have
so \(\frac{\dot{\gamma }(t)}{\gamma (t)} = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \). Since we also have \(\gamma (t) \rightarrow +\infty \) as \(t\rightarrow +\infty \), the condition \(\inf _{t\ge t_{0}} \gamma (t) > 0\) is fulfilled for large enough \(t_{0}\).
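The polynomial case can be verified numerically as well. A small sketch (the polynomial below is an arbitrary example, not taken from the paper): for a degree-n polynomial \(\gamma \) with positive leading coefficient, \(t\,\dot{\gamma }(t)/\gamma (t)\rightarrow n\), so \(|\dot{\gamma }(t)|/\gamma (t) = \mathcal {O}(1/t)\).

```python
import numpy as np

# gamma(t) = 2t^3 - t^2 + 0.5t + 3, an arbitrary degree-3 example.
gamma = np.poly1d([2.0, -1.0, 0.5, 3.0])
dgamma = gamma.deriv()

t = 1e6
ratio = t * dgamma(t) / gamma(t)
print(ratio)    # ~3, the degree of gamma
```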
In particular, we can choose \(\gamma (t) = \lambda (t) = \lambda t^{2}\), which fulfills \(\gamma (t) \ge \lambda t_{0}^{2} > 0\) for any \(t\ge t_{0}\) and any \(t_{0}\). Since \(A_{\lambda , \lambda } = A_{\lambda }\) for \(\lambda > 0\), this choice of \(\gamma \) allows us to recover the (DIN-AVD) system studied by Attouch and László in [9]. Notice the way the convergence rates for \(A_{\gamma (t)}(x(t))\) and \(\frac{d}{dt}A_{\gamma (t)}(x(t))\) exhibited in part (iii) of Theorem 3.2 depend on \(\gamma (t)\). If we set \(\gamma (t) = t^{n}\) for every \(t\ge t_{0}\) and any natural number \(n > 2\), then (Split-DIN-AVD) performs better in this respect than (DIN-AVD), without increasing the complexity of the governing operator.
3.2 The Case \(A = 0\)
Let us return to (Split-DIN-AVD) dynamics (2). Set \(A = 0\), and for every \(t\ge t_{0}\) take \(\gamma (t) = \gamma \in (0, 2\beta )\) and \(\eta (t) = \eta t^{2}\) with \(\eta = \lambda /\gamma \). Then, associated to the problem
we obtain the system
The conditions \(\lambda > \frac{2}{(\alpha - 1)^{2}}\) and \(\gamma \in (0, 2\beta )\) imply
With the previous observation, we are able to state the following theorem.
Theorem 3.4
Let \(B: \mathcal {H}\rightarrow \mathcal {H}\) be a \(\beta \)-cocoercive operator for some \(\beta > 0\) such that \({{\,\mathrm{zer}\,}}B\ne \emptyset \). Assume that \(\alpha > 1\), \(\xi \ge 0\) and \(\eta (t) = \eta t^{2}\) for \(\eta > \frac{1}{\beta (\alpha - 1)^{2}}\) and all \(t\ge t_{0}\). Let \(x : [t_{0}, +\infty )\rightarrow \mathcal {H}\) be a solution to (31). Then, the following hold:
-
(i)
x is bounded, and x(t) converges weakly to an element of \({{\,\mathrm{zer}\,}}B\) as \(t \rightarrow +\infty \).
-
(ii)
We have the estimates
$$\begin{aligned} \int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }\frac{1}{t}\left\| Bx(t)\right\| ^{2}dt < \infty . \end{aligned}$$ -
(iii)
We have the convergence rates
$$\begin{aligned} \Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \quad \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) \end{aligned}$$as well as the limit
$$\begin{aligned} \Vert Bx(t)\Vert \rightarrow 0 \end{aligned}$$as \(t\rightarrow +\infty \).
Proof
Since \(\eta > \frac{1}{\beta (\alpha - 1)^{2}}\), we can find \(\epsilon \in (0, \beta )\) such that \(\eta > \frac{1}{(\beta - \epsilon )(\alpha - 1)^{2}}\), or equivalently, \(2(\beta - \epsilon )\eta > \frac{2}{(\alpha - 1)^{2}}\). Since (31) is equivalent to (Split-DIN-AVD) with \(A = 0\) and parameters \(\lambda = 2(\beta - \epsilon )\eta > \frac{2}{(\alpha - 1)^{2}}\) and \(\gamma (t) \equiv 2(\beta - \epsilon ) \in (0, 2\beta )\), the conclusion follows from Theorem 3.1. \(\square \)
Remark 3.5
-
(a)
As we mentioned in the introduction, the dynamical system (31) provides a way of finding the zeros of a cocoercive operator directly through forward evaluations, instead of having to resort to its Moreau envelope when following the approach in [9].
-
(b)
The dynamics (31) bear some resemblance to the system (6) (see also [16]) with \(\mu (t) = \frac{\alpha }{t}\) and \(\nu (t) = \frac{1}{\eta (t)}\), with an additional Hessian-driven damping term. In our case, since \(\eta > \frac{1}{\beta (\alpha - 1)^{2}}\), the parameters satisfy
$$\begin{aligned} \dot{\mu }(t) = -\frac{\alpha }{t^{2}} \le 0, \quad \frac{\mu ^{2}(t)}{\nu (t)} = \frac{\alpha ^{2}\eta t^{2}}{t^{2}} = \alpha ^2 \eta > \frac{1}{\beta } \quad \forall t\ge t_{0}. \end{aligned}$$However, we have
$$\begin{aligned} \dot{\nu }(t) = -\frac{2}{\eta t^{3}} \le 0 \quad \forall t\ge t_{0}, \end{aligned}$$so one of the hypotheses needed in (6) is not fulfilled, which shows that the dynamical system (31) cannot be treated as a particular case of it; indeed, for (6) a vanishing damping is not allowed. With our system, we obtain convergence rates for \(\dot{x}(t)\) and \(\ddot{x}(t)\) as \(t\rightarrow +\infty \), which are not obtained in [16].
4 Structured Convex Minimization
We can specialize the previous results to the case of convex minimization, and show additionally the convergence of functional values along the generated trajectories to the optimal objective value at a rate that will depend on the choice of \(\gamma \). Let \(f:\mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) be a proper, convex and lower semicontinuous function, and let \(g:\mathcal {H}\rightarrow \mathbb {R}\) be a convex and Fréchet differentiable function with \(L_{\nabla g}\)-Lipschitz continuous gradient. Assume that \({{\,\mathrm{argmin}\,}}_{\mathcal {H}}(f + g)\ne \emptyset \), and consider the minimization problem
Fermat’s rule tells us that \(\overline{x}\) is a global minimum of \(f + g\) if and only if
Therefore, solving (32) is equivalent to solving the monotone inclusion \(0\in (A + B)(x)\) addressed in the first section, with \(A = \partial f\) and \(B = \nabla g\). Moreover, recall that if \(\nabla g\) is \(L_{\nabla g}\)-Lipschitz then it is \(\frac{1}{L_{\nabla g}}\)-cocoercive (Baillon–Haddad’s Theorem, see [14, Corollary 18.17]). Therefore, associated to the problem (32) we have the dynamics
where we have denoted \(u(t) = x(t) - \gamma (t)\nabla g(x(t))\) for all \(t\ge t_{0}\) for convenience.
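The governing forward–backward field in this setting is \(T_{\lambda , \gamma }(x) = \frac{1}{\lambda }\big (x - {{\,\mathrm{prox}\,}}_{\gamma f}(x - \gamma \nabla g(x))\big )\), whose zeros are exactly the minimizers of \(f + g\). A minimal numerical sketch, with the illustrative (assumed, not from the paper) choices \(f = \Vert \cdot \Vert _{1}\) and \(g = \frac{1}{2}\Vert \cdot - b\Vert ^{2}\):

```python
import numpy as np

# Sketch of the forward-backward operator with A = df, B = grad g, for the
# illustrative choices f(x) = ||x||_1 and g(x) = 0.5 ||x - b||^2.
def soft_threshold(u, tau):
    # prox of tau * ||.||_1
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def T(x, lam, gam, b):
    # T_{lam,gam}(x) = (1/lam) * (x - prox_{gam f}(x - gam * grad g(x)))
    grad_g = x - b
    return (x - soft_threshold(x - gam * grad_g, gam)) / lam

b = np.array([3.0, -0.2, 1.5])
x_star = soft_threshold(b, 1.0)   # minimizer of f + g for these choices
print(np.linalg.norm(T(x_star, lam=2.0, gam=1.0, b=b)))   # ~0: minimizers are zeros of T
```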
Theorem 4.1
Let \(f: \mathcal {H}\rightarrow \mathbb {R}\cup \{+\infty \}\) be a proper, convex and lower semicontinuous function, and let \(g : \mathcal {H}\rightarrow \mathbb {R}\) be a convex and Fréchet differentiable function with an \(L_{\nabla g}\)-Lipschitz continuous gradient such that \({{\,\mathrm{argmin}\,}}_{\mathcal {H}}(f + g)\ne \emptyset \). Assume that \(\alpha > 1\), \(\xi \ge 0\), \(\lambda (t) = \lambda t^{2}\) for \(\lambda > \frac{2}{(\alpha - 1)^{2}}\) and all \(t\ge t_{0}\), and that \(\gamma : [t_{0}, +\infty )\rightarrow \left( 0, \frac{2}{L_{\nabla g}}\right) \) is a differentiable function that satisfies \(\frac{|\dot{\gamma }(t)|}{\gamma (t)} = \mathcal {O}\left( \frac{1}{t}\right) \) as \(t\rightarrow +\infty \). Then, for a solution \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) to (33), the following statements hold:
-
(i)
x is bounded.
-
(ii)
We have the estimates
$$\begin{aligned}&\int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty , \\&\int _{t_{0}}^{+\infty } \frac{\gamma ^{2}(t)}{t}\left\| \nabla f_{\gamma (t)}\Big [x(t) - \gamma (t)\nabla g(x(t))\Big ] + \nabla g(x(t))\right\| ^{2}dt < +\infty . \end{aligned}$$ -
(iii)
We have the convergence rates
$$\begin{aligned}&\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \ \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) , \\&\left\| \nabla f_{\gamma (t)}\Big [x(t) - \gamma (t)\nabla g(x(t))\Big ] + \nabla g(x(t))\right\| = o\left( \frac{1}{\gamma (t)}\right) , \\&\left\| \frac{d}{dt}\left( \nabla f_{\gamma (t)}\Big [x(t) - \gamma (t)\nabla g(x(t))\Big ] + \nabla g(x(t))\right) \right\| = \mathcal {O}\left( \frac{1}{t\gamma (t)}\right) + o\left( \frac{t^{2}\left| \frac{d}{dt}\frac{\gamma (t)}{\lambda (t)}\right| }{\gamma ^{2}(t)}\right) \end{aligned}$$as \(t\rightarrow +\infty \).
-
(iv)
If \(0< \inf _{t \ge t_{0}}\gamma (t) \le \sup _{t\ge t_{0}}\gamma (t) < \frac{2}{L_{\nabla g}}\), then x(t) converges weakly to a minimizer of \(f + g\) as \(t \rightarrow +\infty \).
-
(v)
Additionally, if \(0 < \gamma (t) \le \frac{1}{L_{\nabla g}}\) for every \(t\ge t_{0}\) and we set \(u(t) := x(t) - \gamma (t) \nabla g(x(t))\), then
$$\begin{aligned} f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) + g\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t))\right) - \min \nolimits _{\mathcal {H}}(f + g) = o\left( \frac{1}{\gamma (t)}\right) \end{aligned}$$as \(t\rightarrow +\infty \). Moreover, \(\left\| {{\,\mathrm{prox}\,}}_{\gamma (t)f}(u(t)) - x(t)\right\| \rightarrow 0\) as \(t\rightarrow +\infty \).
Proof
Parts (i)–(iv) are a direct consequence of Theorem 3.1. For checking (v), first notice that for all \(t\ge t_{0}\) we have
Now, let \(\overline{x}\in {{\,\mathrm{argmin}\,}}_{\mathcal {H}}(f+g)\). According to [15, Lemma 2.3], for every \(t\ge t_{0}\), we have the inequality
After adding the squared norm term and using the Cauchy–Schwarz inequality, for every \(t\ge t_{0}\) we obtain
which follows as a consequence of x being bounded and \(\left\| T_{\lambda (t), \gamma (t)}(x(t))\right\| = o\left( \frac{1}{t^{2}}\right) \) as \(t\rightarrow +\infty \). \(\square \)
Remark 4.2
It is also worth mentioning the system we obtain in the case where \(g \equiv 0\), since we also get some improved rates for the objective functional values when we compare (Split-DIN-AVD) to (DIN-AVD) [9]. In this case, we have the system
attached to the convex optimization problem
If we assume \(\lambda > \frac{1}{(\alpha - 1)^{2}}\), allow \(\gamma : [t_{0}, +\infty ) \rightarrow (0, +\infty )\) to be unbounded from above and otherwise keep the hypotheses of Theorem 4.1, for a solution \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) to (35), the following statements hold:
-
(i)
x is bounded,
-
(ii)
We have the estimates
$$\begin{aligned}&\int _{t_{0}}^{+\infty }t\Vert \dot{x}(t)\Vert ^{2}dt< +\infty , \quad \int _{t_{0}}^{+\infty }t^{3}\Vert \ddot{x}(t)\Vert ^{2}dt< +\infty ,\\&\int _{t_{0}}^{+\infty }\frac{\gamma ^{2}(t)}{t}\left\| \nabla f_{\gamma (t)}(x(t))\right\| ^{2}dt < +\infty , \end{aligned}$$ -
(iii)
We have the convergence rates
$$\begin{aligned}&\Vert \dot{x}(t)\Vert = o\left( \frac{1}{t}\right) , \ \Vert \ddot{x}(t)\Vert = \mathcal {O}\left( \frac{1}{t^{2}}\right) , \\&\left\| \nabla f_{\gamma (t)}(x(t))\right\| = o\left( \frac{1}{\gamma (t)}\right) , \ \left\| \frac{d}{dt}\nabla f_{\gamma (t)}(x(t))\right\| = \mathcal {O}\left( \frac{1}{t\gamma (t)}\right) + o\left( \frac{t^{2}\left| \frac{d}{dt}\frac{\gamma (t)}{\lambda (t)}\right| }{\gamma ^{2}(t)}\right) \end{aligned}$$as \(t\rightarrow +\infty \).
-
(iv)
If \(0 < \inf _{t\ge t_{0}}\gamma (t)\), then x(t) converges weakly to a minimizer of f as \(t \rightarrow +\infty \).
-
(v)
We also obtain the rate
$$\begin{aligned} f_{\gamma (t)}(x(t)) - \min \nolimits _{\mathcal {H}}f = o\left( \frac{1}{\gamma (t)}\right) \quad \text {as} \quad t\rightarrow +\infty , \end{aligned}$$which entails
$$\begin{aligned} f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(x(t))\right) - \min \nolimits _{\mathcal {H}}f = o\left( \frac{1}{\gamma (t)}\right) \quad \text {and} \quad \left\| {{\,\mathrm{prox}\,}}_{\gamma (t)f}(x(t)) - x(t)\right\| \rightarrow 0 \end{aligned}$$as \(t\rightarrow +\infty \).
Parts (i)–(iv) are a direct consequence of Theorem 3.2 for the case \(A = \partial f\). For showing part (v), first notice that for \(\lambda > 0\) and \(u\in \mathcal {H}\) we have, according to the definition of \(f_{\lambda }\) and \({{\,\mathrm{prox}\,}}_{\lambda f}\),
Let \(\overline{x} \in \mathcal {H}\) be a minimizer of f. We apply the gradient inequality to \(f_{\gamma (t)}\), from which we obtain, for every \(t\ge t_{0}\)
where the last inequality follows from the Cauchy–Schwarz inequality. Since \(\left\| \nabla f_{\gamma (t)}(x(t))\right\| = o\left( \frac{1}{\gamma (t)}\right) \) as \(t\rightarrow +\infty \) and x is bounded, the previous inequality entails the first statement of (v). Again recalling the definition of the Moreau envelope of f, this finally gives
as \(t\rightarrow +\infty \), which implies the last two statements and concludes the proof.
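The envelope–prox relation used above, \(f_{\lambda }(u) = f\left( {{\,\mathrm{prox}\,}}_{\lambda f}(u)\right) + \frac{1}{2\lambda }\Vert u - {{\,\mathrm{prox}\,}}_{\lambda f}(u)\Vert ^{2}\), can be sanity-checked numerically. For the illustrative choice \(f = |\cdot |\) (an assumption of this sketch), the Moreau envelope is the Huber function:

```python
import numpy as np

# Check f_l(u) = f(prox_{lf}(u)) + (1/(2l)) |u - prox_{lf}(u)|^2 against the
# closed-form Huber envelope of f = |.|:
#   f_l(u) = u^2/(2l) if |u| <= l, and |u| - l/2 otherwise.
def prox_abs(u, l):
    return np.sign(u) * max(abs(u) - l, 0.0)

def envelope_via_prox(u, l):
    p = prox_abs(u, l)
    return abs(p) + (u - p) ** 2 / (2.0 * l)

def huber(u, l):
    return u * u / (2.0 * l) if abs(u) <= l else abs(u) - l / 2.0

l = 1.5
errs = [abs(envelope_via_prox(u, l) - huber(u, l)) for u in np.linspace(-4, 4, 33)]
print(max(errs))   # ~0
```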
As pointed out in Remark 3.3, we can choose \(\gamma (t) = \lambda t^{2}\) for every \(t\ge t_{0}\) and recover the (DIN-AVD) system for nonsmooth convex minimization problems studied in [9]. Moreover, we can also set \(\gamma (t) = t^{n}\) for a natural number \(n > 3\) and all \(t\ge t_{0}\). Now, not only are the convergence rates for \(\nabla f_{\gamma (t)}(x(t))\) and \(\frac{d}{dt}\nabla f_{\gamma (t)}(x(t))\) as \(t\rightarrow +\infty \) improved with respect to the system in [9], but (Split-DIN-AVD) also provides a better rate for the convergence of \(f_{\gamma (t)}(x(t))\) to \(\min _{\mathcal {H}}f\) as \(t\rightarrow +\infty \).
5 Numerical Experiments
In the following paragraphs we present numerical experiments that illustrate several aspects of the theory.
5.1 Minimizing a Smooth and Convex Function
As an example of a continuous time scheme minimizing a convex and Fréchet differentiable function \(g : \mathcal {H} \rightarrow \mathbb {R}\) with \(L_{\nabla g}\)-Lipschitz continuous gradient via (Split-DIN-AVD), we consider the system
where for \((x_{1}, x_{2})\in \mathbb {R}^{2}\) we set \(g(x_{1}, x_{2}) = \frac{1}{2}(x_{1}^{2} + 100x_{2}^{2})\) and therefore \(\nabla g(x_{1}, x_{2}) = (x_{1}, 100x_{2})\). A trajectory generated by (36) is a pair \(x(t) = (x_{1}(t), x_{2}(t))\). Figure 1 plots both components of the solution to (36) with initial Cauchy data \(x_{0} = (1, 1)\), \(u_{0} = (1, 1)\). Notice that the Lipschitz constant of \(\nabla g\) is \(L_{\nabla g} = 100\), which means that the cocoercivity modulus of \(\nabla g\) is \(\beta = \frac{1}{L_{\nabla g}} = \frac{1}{100}\). To fulfill \(\eta > \frac{1}{\beta (\alpha - 1)^{2}} = \frac{100}{(\alpha - 1)^{2}}\), we choose \(\alpha = 20\), \(\eta = 0.278\). Figure 1a corresponds to the case with no Hessian damping, that is, \(\xi = 0\). Figure 1b corresponds to a Hessian damping parameter \(\xi = 0.2\).
Figure 2 depicts the fast convergence of the velocities to zero for the cases \(\xi = 0\) (Fig. 2a) and \(\xi = 0.2\) (Fig. 2b). In both figures, notice the effect of the damping parameter \(\xi > 0\), which attenuates the oscillations of the second component of the trajectories, as well as the oscillations present in the velocities.
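A minimal integration sketch reproduces the qualitative behaviour of this experiment. It assumes (this is a reading of the \(A = 0\) specialization with \(\xi = 0\), not a quote of system (36)) that the dynamics take the form \(\ddot{x}(t) + \frac{\alpha }{t}\dot{x}(t) + \frac{1}{\eta t^{2}}\nabla g(x(t)) = 0\):

```python
import numpy as np

# Hedged sketch: assumed xi = 0 form  x'' + (alpha/t) x' + grad g(x)/(eta t^2) = 0,
# integrated with semi-implicit Euler from the paper's Cauchy data.
alpha, eta = 20.0, 0.278
grad_g = lambda x: np.array([x[0], 100.0 * x[1]])
g = lambda x: 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)

t, dt, T_end = 1.0, 1e-3, 50.0
x = np.array([1.0, 1.0])   # x_0
v = np.array([1.0, 1.0])   # u_0 (initial velocity)
while t < T_end:
    # update velocity first (damping + driving term), then position
    v += dt * (-(alpha / t) * v - grad_g(x) / (eta * t ** 2))
    x += dt * v
    t += dt
print(g(x) < g(np.array([1.0, 1.0])))   # objective decreased along the trajectory
```

With these parameters the strongly curved second component is damped out quickly, while the first component decays slowly, matching the oscillation pattern visible in Fig. 1.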
5.2 Minimizing a Nonsmooth and Convex Function
As an example of a continuous time scheme minimizing a proper, convex and lower semicontinuous function \(f : \mathcal {H} \rightarrow \mathbb {R}\cup \{+\infty \}\) via (Split-DIN-AVD), we consider the system
We will consider three options for f and plot for each of them the trajectories, the objective function values and the gradients of the Moreau envelopes as follows:
In order to fulfill \(\alpha > 1\) and \(\lambda > \frac{1}{(\alpha - 1)^{2}}\), we choose the parameters \(\alpha = 2\), \(\lambda = 1.1\), and we take \(\xi = 0\) and \(\gamma (t) = t^{8}\). We compare the results given by (DIN-AVD) (that is, when \(\gamma (t) = \lambda t^{2}\)) and the ones given by our system (Split-DIN-AVD). The choice of \(\xi \) does not seem to change the plots in a significant way for the examples we have chosen.
Figure 3 depicts the trajectories x(t) of (37) and the function values \(f\left( {{\,\mathrm{prox}\,}}_{\gamma (t)f}(x(t))\right) \) for our choices of f as \(t \rightarrow +\infty \). Figure 4 portrays the fast convergence to zero of \(\Vert \nabla f_{\gamma (t)}(x(t))\Vert \) as \(t\rightarrow +\infty \). Notice the substantial improvement over (DIN-AVD) for nonsmooth convex minimization in [9] when choosing \(\gamma (t) = t^{8}\), in line with the theoretical results. High-degree polynomials appear to yield the largest improvements in terms of rates.
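The role of a rapidly growing \(\gamma (t)\) can be seen already in one dimension. For the illustrative (assumed) choice \(f = |\cdot |\), the Moreau envelope gradient is \(\nabla f_{\gamma }(x) = (x - {{\,\mathrm{prox}\,}}_{\gamma f}(x))/\gamma \), which for fixed x scales like \(1/\gamma \) once \(\gamma > |x|\):

```python
import numpy as np

# grad f_gamma(x) = (x - prox_{gamma f}(x)) / gamma for f = |.| (soft-thresholding),
# bounded by min(|x|, gamma)/gamma, hence O(1/gamma) at a fixed point x.
def grad_envelope(x, gamma):
    prox = np.sign(x) * max(abs(x) - gamma, 0.0)
    return (x - prox) / gamma

x = 0.7
for gamma in [1.0, 1e2, 1e4, 1e8]:
    print(gamma, abs(grad_envelope(x, gamma)))   # decays like |x| / gamma once gamma > |x|
```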
5.3 An Example with Operator Splitting
Now we consider the monotone inclusion problem (1) for \(A(x_{1}, x_{2}) = (-x_{2}, x_{1})\) and \(B(x_{1}, x_{2}) = (x_{1}, x_{2})\) for every \((x_{1}, x_{2})\in \mathbb {R}^{2}\). For every \((x_{1}, x_{2}) \in \mathbb {R}^{2}\), an easy calculation gives
and so
and
(Split-DIN-AVD) now reads
We choose the parameters \(\alpha = 7\), \(\lambda = 0.056\), \(\gamma (t) \equiv 1.5\), and the Cauchy data \(x_{0} = (1, 2)\) and \(u_{0} = (-1, -1)\). Figure 5a corresponds to the case \(\xi = 0\), and Fig. 5b depicts the trajectory when the Hessian damping parameter is \(\xi = 0.8\). Again, notice how the presence of \(\xi \) seems to attenuate the oscillations in the trajectories, not only for optimization problems but also for monotone inclusions which cannot be reduced to them.
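For this example the resolvent is available in closed form: \(({{\,\mathrm{Id}\,}}+ \gamma A)\) is the matrix \(\begin{pmatrix}1 & -\gamma \\ \gamma & 1\end{pmatrix}\), so \(J_{\gamma A} = \frac{1}{1+\gamma ^{2}}\begin{pmatrix}1 & \gamma \\ -\gamma & 1\end{pmatrix}\). As a sanity check independent of the continuous dynamics, the plain forward–backward iteration built from this resolvent converges to the unique zero (0, 0) of \(A + B\):

```python
import numpy as np

# A(x1,x2) = (-x2, x1), B = Id; J_{gamma A} = inv(Id + gamma A) in closed form.
gamma = 1.5
A = np.array([[0.0, -1.0], [1.0, 0.0]])
J = np.linalg.inv(np.eye(2) + gamma * A)   # resolvent J_{gamma A}
B = lambda x: x                            # B = Id is 1-cocoercive, gamma in (0, 2)

x = np.array([1.0, 2.0])
for _ in range(60):
    x = J @ (x - gamma * B(x))             # forward-backward step
print(np.linalg.norm(x))                   # ~0: iterates converge to the zero of A + B
```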
6 A Numerical Algorithm
In the following we will derive via time discretization of (Split-DIN-AVD) a numerical algorithm for solving the monotone inclusion problem (1). We perform a discretization of (Split-DIN-AVD) with stepsize 1 and set, for an integer \(k \ge 1\), \(x(k) := x_{k}\), \(\lambda (k) := \lambda _{k}\), \(\gamma (k) := \gamma _{k}\). We make the approximations
so we get, for every \(k\ge 1\),
After rearranging the terms of (38), for every \(k\ge 1\) we obtain
In other words, after setting \(\alpha _{k} = 1 - \frac{\alpha }{k}\) and denoting the right hand side of (39) by \(y_{k}\) for every \(k\ge 1\), we obtain the following iterative scheme
Observe that the second step in (40) is always well-defined. Indeed, for \(\lambda , \gamma > 0\), \(T_{\lambda , \gamma }\) is \(\frac{\lambda }{2}\)-cocoercive, hence monotone (see Lemma 2.2(i)). This also implies that \(T_{\lambda , \gamma }\) is \(\frac{2}{\lambda }\)-Lipschitz continuous, and a monotone and continuous operator is maximally monotone, according to [14, Corollary 20.28]. Hence, by Minty’s Theorem (see [14, Theorem 21.1]), we know that \({{\,\mathrm{Id}\,}}+ T_{\lambda , \gamma } : \mathcal {H}\rightarrow \mathcal {H}\) is surjective.
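Beyond well-definedness, when \(\frac{2}{\lambda } < 1\) the implicit step \(x = ({{\,\mathrm{Id}\,}}+ T_{\lambda , \gamma })^{-1}(y)\) can be computed by simple fixed-point iteration, since \(x\mapsto y - T_{\lambda , \gamma }(x)\) is then a contraction. A sketch with illustrative operators (the rotation A and B = Id of Sect. 5.3; these concrete choices are assumptions of the sketch):

```python
import numpy as np

# Solve (Id + T)(x) = y by Banach iteration, valid since T is (2/lam)-Lipschitz
# and 2/lam = 0.5 < 1 here.  A is the rotation operator, B = Id.
gamma, lam = 1.5, 4.0
J = np.linalg.inv(np.eye(2) + gamma * np.array([[0.0, -1.0], [1.0, 0.0]]))

def T(x):
    # T_{lam,gam}(x) = (1/lam)(x - J_{gam A}(x - gam B x)) with B = Id
    return (x - J @ (x - gamma * x)) / lam

y = np.array([2.0, -1.0])
x = y.copy()
for _ in range(200):
    x = y - T(x)                           # contraction mapping iteration
residual = np.linalg.norm(x + T(x) - y)    # verify (Id + T)(x) = y
print(residual)                            # ~0
```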
We are now in a position to state the main theorem concerning this algorithm.
Theorem 6.1
Let \(A : \mathcal {H}\rightarrow 2^{\mathcal {H}}\) be a maximally monotone operator and \(B : \mathcal {H}\rightarrow \mathcal {H}\) a \(\beta \)-cocoercive operator for some \(\beta > 0\) such that \({{\,\mathrm{zer}\,}}(A + B)\ne \emptyset \). Choose \(x_{0}, x_{1}\in \mathcal {H}\) any initial points. Let \(\alpha > 1\), \(\xi \ge 0\), and \((\lambda _{k})_{k \ge 0}\), \((\gamma _{k})_{k \ge 0}\) sequences of positive numbers that fulfill
Now, consider the sequences \((y_{k})_{k\ge 1}\) and \((x_{k})_{k\ge 0}\) generated by algorithm (40). The following properties are satisfied:
-
(i)
We have the estimates
$$\begin{aligned} \Vert x_{k + 1} - x_{k}\Vert = \mathcal {O}\left( \frac{1}{k}\right) \quad \text {and} \quad \left\| A_{\gamma _{k}}(x_{k} - \gamma _{k}Bx_{k}) + Bx_{k}\right\| = o\left( \frac{1}{\gamma _{k}}\right) \quad \text {as} \quad k\rightarrow +\infty . \end{aligned}$$ -
(ii)
The sequence \((x_{k})_{k\ge 0}\) converges weakly to an element of \({{\,\mathrm{zer}\,}}(A + B)\).
-
(iii)
The sequence \((y_{k})_{k\ge 1}\) converges weakly to an element of \({{\,\mathrm{zer}\,}}(A + B)\). Precisely, we have \(\Vert x_{k} - y_{k}\Vert = \mathcal {O}\left( \frac{1}{k}\right) \) as \(k\rightarrow +\infty \).
The proof can be done by transposing the techniques used in the continuous time case to the discrete time case. Algorithm (40) can be seen as a splitting version of the (PRINAM) algorithm studied by Attouch and László in [10].
Remark 6.2
The second step in (40) can be quite complicated to compute. However, if \(B = 0\), we can resort to the fact that \((A_{\lambda _{1}})_{\lambda _{2}} = A_{\lambda _{1} + \lambda _{2}}\) for \(\lambda _{1}, \lambda _{2} > 0\). We now have, for \(\lambda , \gamma > 0\),
which gives
It is now possible to write (40) in terms of the resolvents of A. We have, for every \(k\ge 1\),
So now (40) becomes
Now, if we assume \(0 < \inf _{k\ge 0}\gamma _{k}\) and \(\lambda > \frac{2\xi + 1}{(\alpha - 1)^{2}}\) and otherwise keep the hypotheses of Theorem 6.1, then for the sequences \((x_{k})_{k\ge 0}\) and \((y_{k})_{k\ge 1}\) generated by (41), the following statements hold:
-
(i)
We have the estimates
$$\begin{aligned} \Vert x_{k + 1} - x_{k}\Vert = \mathcal {O}\left( \frac{1}{k}\right) \quad \text {and} \quad \left\| A_{\gamma _{k}}(x_{k})\right\| = o\left( \frac{1}{\gamma _{k}}\right) \quad \text {as} \quad k\rightarrow +\infty . \end{aligned}$$ -
(ii)
The sequence \((x_{k})_{k\ge 0}\) converges weakly to an element of \({{\,\mathrm{zer}\,}}A\).
-
(iii)
The sequence \((y_{k})_{k\ge 1}\) converges weakly to an element of \({{\,\mathrm{zer}\,}}A\) as well. Precisely, we have \(\Vert x_{k} - y_{k}\Vert = \mathcal {O}\left( \frac{1}{k}\right) \) as \(k\rightarrow +\infty \).
Notice that the condition required for \((\gamma _{k})_{k\ge 0}\) is fulfilled in particular for \(\gamma _{k} = k^{n}\) for every \(k\ge 1\) and a natural number \(n \ge 1\). Thus, by choosing large n, we obtain a fast convergence rate for \(A_{\gamma _{k}}(x_{k})\) as \(k\rightarrow +\infty \).
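The identity \((A_{\lambda _{1}})_{\lambda _{2}} = A_{\lambda _{1} + \lambda _{2}}\) underlying (41) can be sanity-checked numerically in one dimension for the illustrative choice \(A = \partial |\cdot |\), for which \(A_{\lambda }(x) = (x - {{\,\mathrm{prox}\,}}_{\lambda |\cdot |}(x))/\lambda \):

```python
import numpy as np

# Check (A_{l1})_{l2} = A_{l1+l2} for A = d|.| in 1D.
def yosida(x, l):
    soft = np.sign(x) * max(abs(x) - l, 0.0)   # prox of l * |.|
    return (x - soft) / l

def yosida_of_yosida(x, l1, l2, iters=200):
    # (A_{l1})_{l2}(x) = (x - y)/l2, where y solves y + l2 * A_{l1}(y) = x;
    # the left-hand side is strictly increasing in y, so bisection applies.
    lo, hi = x - l2 - 1.0, x + l2 + 1.0        # valid bracket since |A_{l1}| <= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + l2 * yosida(mid, l1) < x:
            lo = mid
        else:
            hi = mid
    y = 0.5 * (lo + hi)
    return (x - y) / l2

l1, l2 = 0.7, 1.3
errs = [abs(yosida_of_yosida(x, l1, l2) - yosida(x, l1 + l2))
        for x in np.linspace(-5.0, 5.0, 41)]
print(max(errs))   # ~0, confirming the identity for this example
```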
References
Abbas, B., Attouch, H.: Dynamical systems and forward–backward algorithms associated with the sum of a convex subdifferential and a monotone cocoercive operator. Optimization 64(10), 2223–2252 (2015)
Álvarez, F.: On the minimizing property of a second-order dissipative system in Hilbert spaces. SIAM J. Control Optim. 38(4), 1102–1119 (2000)
Álvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9(1–2), 3–11 (2001)
Álvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics. J. Math. Pures Appl. 81(8), 747–779 (2002)
Apidopoulos, V., Aujol, J.-F., Dossal, C.: Convergence rate of inertial forward–backward algorithm beyond Nesterov’s rule. Math. Program. 180, 137–156 (2020)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 168, 123–175 (2018)
Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \(\alpha \le 3\). In: ESAIM: COCV, vol. 25, Article number 2 (2019)
Attouch, H., Goudou, X., Redont, P.: The heavy ball with friction method. The continuous dynamical system, global exploration of the local minima of a real-valued function by asymptotical analysis of a dissipative dynamical system. Commun. Contemp. Math. 2(1), 1–34 (2000)
Attouch, H., László, S.C.: Continuous Newton-like inertial dynamics for monotone inclusions. Set-Valued Var. Anal. 29, 555–581 (2021)
Attouch, H., László, S.C.: Newton-like inertial dynamics and proximal algorithms governed by maximally monotone operators. SIAM J. Optim. 30(4), 3252–3283 (2020)
Attouch, H., Maingé, P.E.: Asymptotic behavior of second order dissipative evolution equations combining potential with non-potential effects. ESAIM Control Optim. Calculus Var. 17(3), 836–857 (2011)
Attouch, H., Peypouquet, J.: Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators. Math. Program. 174(1–2), 391–432 (2019)
Attouch, H., Peypouquet, J., Redont, P.: Fast convex minimization via inertial dynamics with Hessian driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory, CMS Books in Mathematics, 2nd edn. Springer, Berlin (2017)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Boţ, R.I., Csetnek, E.R.: Second order forward–backward dynamical systems for monotone inclusion problems. SIAM J. Control Optim. 54, 1423–1443 (2016)
Haraux, A.: Systèmes Dynamiques Dissipatifs et Applications. Masson (1991)
May, R.: Asymptotic for a second order evolution equation with convex potential and vanishing damping term. Turk. J. Math. 41(3), 681–685 (2017)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(\cal{O}(1/k^{2})\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)
Su, W.J., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. Adv. Neural Inf. Process. Syst. 27, 2510–2518 (2014)
Funding
Open access funding provided by University of Vienna.
Research partially supported by the Doctoral Programme Vienna Graduate School on Computational Optimization (VGSCO) which is funded by FWF (Austrian Science Fund), project W1260-N35.
A Appendix
The following three auxiliary lemmas are used in the proof of Theorem 3.1. The proof of Lemma A.1 can be found in [12], the proof of Lemma A.2 is straightforward, and for the proof of Opial’s Lemma we refer the reader to [1, Lemma 1.10].
Lemma A.1
Let \(t_{0} > 0\), and let \(u: [t_{0}, +\infty )\rightarrow \mathbb {R}\) be a continuously differentiable function which is bounded from below. Given \(\alpha >1\), a nonnegative function \(\theta : [t_{0}, +\infty )\rightarrow \mathbb {R}\) and a nonnegative function \(k\in L^{1}([t_{0}, +\infty ), \mathbb {R})\), let us assume that
for almost every \(t \ge t_{0}\). Then, the positive part \([\dot{u}]_{+}\) of \(\dot{u}\) belongs to \(L^{1}([t_{0}, +\infty ), \mathbb {R})\) and \(\lim _{t\rightarrow +\infty }u(t)\) exists. Moreover, we have \(\int _{t_{0}}^{+\infty }\theta (t)dt < +\infty \).
Lemma A.2
Let \(A, B, C\in \mathbb {R}\) and let \(\mathcal {H}\) be a real Hilbert space. Then the inequality
$$A\Vert X\Vert ^{2} + B\Vert Y\Vert ^{2} + 2C\langle X, Y\rangle \le 0$$
holds for every \(X, Y\in \mathcal {H}\) if and only if \(A, B\le 0 \) and \(C^{2} - AB \le 0\).
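Lemma A.2 is the statement that the quadratic form \(A\Vert X\Vert ^{2} + B\Vert Y\Vert ^{2} + 2C\langle X, Y\rangle\) is nonpositive exactly when the symmetric matrix \(\begin{pmatrix} A & C \\ C & B \end{pmatrix}\) is negative semidefinite. The following is a minimal numerical sanity check of this equivalence (not from the paper; the function names `form_nonpositive` and `criterion` are ours), using random vectors in \(\mathbb {R}^{2}\) as a stand-in for \(\mathcal {H}\):

```python
import numpy as np

def form_nonpositive(A, B, C, trials=4000, dim=2, seed=0):
    """Empirically test whether A||X||^2 + B||Y||^2 + 2C<X,Y> <= 0
    for randomly sampled X, Y (finite-dimensional stand-in for H)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        X = rng.standard_normal(dim)
        Y = rng.standard_normal(dim)
        if A * (X @ X) + B * (Y @ Y) + 2 * C * (X @ Y) > 1e-9:
            return False  # found a violating pair (X, Y)
    return True

def criterion(A, B, C):
    """The algebraic condition of Lemma A.2."""
    return A <= 0 and B <= 0 and C**2 - A * B <= 0

# The sampler and the algebraic criterion agree on these test triples:
for A, B, C in [(-1, -2, 1), (-1, -2, 2), (1, -1, 0), (-1, -1, 1)]:
    assert form_nonpositive(A, B, C) == criterion(A, B, C)
```

For instance, \((A,B,C)=(-1,-2,1)\) satisfies the criterion and the form equals \(-\Vert X - Y\Vert^2 - \Vert Y\Vert^2 \le 0\), while \((A,B,C)=(-1,-2,2)\) violates \(C^2 - AB \le 0\) and the form is positive at \(X = 2Y\).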
Lemma A.3
(Opial’s Lemma) Let \(S\subseteq \mathcal {H}\) be a nonempty set and \(x : [t_{0}, +\infty ) \rightarrow \mathcal {H}\) a given map, where \(t_{0} > 0\). Assume that
- (i) for every \(x^{*} \in S\), \(\lim _{t\rightarrow +\infty }\Vert x(t) - x^{*}\Vert \) exists;
- (ii) every weak sequential cluster point of the map x belongs to S.
Then, there exists \(x_{\infty } \in S\) such that x(t) converges weakly to \(x_{\infty }\) as \(t\rightarrow +\infty \).
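As a simple finite-dimensional illustration (ours, not from the paper; in \(\mathbb {R}^{n}\) weak and strong convergence coincide), take \(S = \{a, b\}\) and the trajectory \(x(t) = a + e^{-t}v\): both distance functions in hypothesis (i) converge, the only cluster point is \(a \in S\), and the lemma's conclusion holds with \(x_{\infty } = a\):

```python
import numpy as np

# S = {a, b}; trajectory x(t) = a + exp(-t) * v converges to a.
a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
v = np.array([1.0, -2.0])

def x(t):
    return a + np.exp(-t) * v

ts = np.linspace(0.0, 40.0, 2001)
dist_a = np.array([np.linalg.norm(x(t) - a) for t in ts])  # -> 0
dist_b = np.array([np.linalg.norm(x(t) - b) for t in ts])  # -> ||a - b|| = 5

# (i)  both t -> ||x(t) - a|| and t -> ||x(t) - b|| have limits;
# (ii) the unique cluster point of x(t) is a, which belongs to S;
# hence x(t) converges to a point of S, namely x_infty = a.
x_infty = x(ts[-1])
```

Note that hypothesis (i) alone does not single out the limit; it is (ii) that rules out the trajectory drifting between distinct points of S.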
Boţ, R.I., Hulett, D.A. Second Order Splitting Dynamics with Vanishing Damping for Additively Structured Monotone Inclusions. J Dyn Diff Equat 36, 727–756 (2024). https://doi.org/10.1007/s10884-022-10160-3
Keywords
- Asymptotic stabilization
- Damped inertial dynamics
- Lyapunov analysis
- Vanishing viscosity
- Splitting system
- Monotone inclusions