Error estimates of the backward Euler–Maruyama method for multi-valued stochastic differential equations

Eisenmann, Monika; Kovács, Mihály; Kruse, Raphael; Larsson, Stig

doi:10.1007/s10543-021-00893-w

BIT Numerical Mathematics, Volume 62, pages 803–848 (2022)

Published: 14 September 2021 (open access)

Abstract

In this paper we derive error estimates of the backward Euler–Maruyama method applied to multi-valued stochastic differential equations. An important example of such an equation is a stochastic gradient flow whose associated potential is not continuously differentiable but assumed to be convex. We show that the backward Euler–Maruyama method is well-defined and convergent of order at least 1/4 with respect to the root-mean-square norm. Our error analysis relies on techniques for deterministic problems developed in Nochetto et al. (Commun Pure Appl Math 53(5):525–589, 2000). We verify that our setting applies to an overdamped Langevin equation with a discontinuous gradient and to a spatially semi-discrete approximation of the stochastic p-Laplace equation.


1 Introduction

In this paper, we investigate the numerical approximation of multi-valued stochastic differential equations (MSDE). An important example of such equations is provided by stochastic gradient flows with a convex potential. More precisely, let $T \in (0,\infty )$ and $(\varOmega ,{\mathcal {F}}, ({\mathcal {F}}_t)_{t \in [0,T]}, {\mathbf {P}})$ be a filtered probability space satisfying the usual conditions. By $W :[0,T] \times \varOmega \rightarrow {\mathbf {R}}^m$, $m \in {\mathbf {N}}$, we denote a standard $({\mathcal {F}}_t)_{t \in [0,T]}$-adapted Wiener process whose increments are independent of the filtration. As a motivating example, let us consider the numerical treatment of nonlinear, overdamped Langevin-type equations of the form

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\mathrm {d}X(t) = - \nabla \varPhi (X(t)) \,\mathrm {d}t + g_0 \,\mathrm {d}W(t), \quad t \in (0,T],\\ X(0) = X_0, \end{array}\right. } \end{aligned}$$

(1.1)

where $X_0 \in L^p(\varOmega ,{\mathcal {F}}_0,{\mathbf {P}};{\mathbf {R}}^d)$, $p \in [2,\infty )$, $g_0 \in {\mathbf {R}}^{d,m}$, and $\varPhi :{\mathbf {R}}^d \rightarrow {\mathbf {R}}$ are given. The space ${\mathbf {R}}^{d,m}$ consists of the real matrices of type $d \times m$. These equations have many important applications, for example, in Bayesian statistics and molecular dynamics. We refer to [10, 22, 23, 45, 50], and the references therein.

We recall that if the gradient $\nabla \varPhi $ grows superlinearly, then the classical forward Euler–Maruyama method is known to diverge in both the strong and the weak sense, see [18]. This problem can be circumvented by modified versions of the explicit Euler–Maruyama method based on techniques such as taming, truncating, stopping, projecting, or adaptive strategies, cf. [4, 6, 17, 19, 29, 49].

In this paper we take an alternative approach by considering the backward Euler–Maruyama method. Our main motivation for considering this method lies in its good stability properties, which allow its application to stiff problems arising, for instance, from the spatial semi-discretization of stochastic partial differential equations. Implicit methods have also been studied extensively in the context of stochastic differential equations with superlinearly growing coefficients, see, for example, [1, 15, 16, 30, 31].

The error analysis in the above mentioned papers on explicit and implicit methods typically requires a certain degree of smoothness of $\nabla \varPhi $ such as local Lipschitz continuity. The purpose of this paper is to derive error estimates of the backward Euler–Maruyama method for equations of the form (1.1), where the associated potential $\varPhi :{\mathbf {R}}^d \rightarrow {\mathbf {R}}$ is not necessarily continuously differentiable, but assumed to be convex.

For the formulation of the numerical scheme, let $N \in {\mathbf {N}}$ be the number of temporal steps, let $k = \frac{T}{N}$ be the step size, and let

$$\begin{aligned} \pi = \{0 = t_0< \cdots<t_n< \cdots < t_N = T\} \end{aligned}$$

(1.2)

be an equidistant partition of the interval [0, T], where $t_n = n k$ for $n \in \{0,\dots ,N\}$. The backward Euler–Maruyama method for the Langevin equation (1.1) is then given by the recursion

$$\begin{aligned} {\left\{ \begin{array}{ll} X^{n} = X^{n-1} - k \nabla \varPhi (X^{n}) + g_0 \varDelta W^n, \quad n \in \{1,\ldots ,N\},\\ X^0 = X_0, \end{array}\right. } \end{aligned}$$

(1.3)

where $\varDelta W^n = W(t_n) - W(t_{n-1})$.
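To illustrate the recursion (1.3), the following Python sketch implements it for $d = m = 1$ and the hypothetical smooth potential $\varPhi (x) = x^4/4$, whose gradient $\nabla \varPhi (x) = x^3$ grows superlinearly. Each step requires solving $x + k x^3 = y$ with $y = X^{n-1} + g_0 \varDelta W^n$. The sketch does this with Newton's method started at $x = y$. Because $x \mapsto x + k x^3$ is strictly increasing and is convex for $x \ge 0$ and concave for $x \le 0$, the iterates approach the unique real root monotonically from the side away from $0$. All names and parameter values are illustrative assumptions.

```python
import numpy as np


def solve_step(y, k, iters=60):
    """Solve x + k*x**3 = y for x by Newton's method (vectorised over y).

    x -> x + k*x**3 is strictly increasing, so the real root is unique.
    Starting at x = y, the iterates approach it monotonically from the side away from 0.
    """
    y = np.asarray(y, dtype=float)
    x = y.copy()
    for _ in range(iters):
        x = x - (x + k * x**3 - y) / (1.0 + 3.0 * k * x**2)
    return x


def backward_euler_maruyama(x0, g0, T, N, rng):
    """Recursion (1.3) for the hypothetical potential Phi(x) = x**4 / 4 (d = m = 1)."""
    k = T / N
    dW = rng.normal(0.0, np.sqrt(k), size=N)  # increments W(t_n) - W(t_{n-1})
    X = np.empty(N + 1)
    X[0] = x0
    for n in range(1, N + 1):
        # implicit step: X^n + k * grad Phi(X^n) = X^{n-1} + g0 * dW^n
        X[n] = solve_step(X[n - 1] + g0 * dW[n - 1], k)
    return X, dW


rng = np.random.default_rng(0)
X, dW = backward_euler_maruyama(x0=1.0, g0=0.5, T=1.0, N=100, rng=rng)
```

Only the solver `solve_step` depends on the potential. For another smooth convex $\varPhi $, it would be replaced by a routine that solves $x + k \nabla \varPhi (x) = y$.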

An example of a non-smooth potential is found by setting $d = m = 1$ and $\varPhi (x) = |x|^{p}$, $x \in {\mathbf {R}}$, for $p \in [1,2)$. Evidently, the gradient of $\varPhi $ is not locally Lipschitz continuous at $0 \in {\mathbf {R}}$ for $p \in (1,2)$. Moreover, if $p = 1$, then the gradient $\nabla \varPhi $ has a jump discontinuity of the form

$$\begin{aligned} \nabla \varPhi (x) = {\left\{ \begin{array}{ll} -1,&{} \text { if } x < 0,\\ c,&{} \text { if } x = 0,\\ 1,&{} \text { if } x > 0. \end{array}\right. } \end{aligned}$$

(1.4)

Here, the value $c \in {\mathbf {R}}$ at $x =0$ is not canonically determined. In each step of the backward Euler–Maruyama method (1.3), we have to solve a nonlinear equation of the form $x + k \nabla \varPhi (x) = y$. However, if $y \in (-k,k)$, then the only candidate for a solution is $x = 0$, since $|x + k \nabla \varPhi (x)| = |x| + k > k$ for every $x \ne 0$. But $x=0$ is a solution only if $kc = y$. Therefore, for any single-valued choice of c, the mapping ${\mathbf {R}}\ni x \mapsto x + k \nabla \varPhi (x) \in {\mathbf {R}}$ is not surjective: no $y \in (-k,k)$ with $y \ne kc$ lies in its range.

This problem can be bypassed by considering the multi-valued subdifferential $\partial \varPhi :{\mathbf {R}}^d \rightarrow 2^{{\mathbf {R}}^d}$ of a convex potential $\varPhi :{\mathbf {R}}^d \rightarrow {\mathbf {R}}$, which is given by

$$\begin{aligned} \partial \varPhi (x) = \big \{ v \in {\mathbf {R}}^d\, : \, \varPhi (x) + \langle v, y - x \rangle \le \varPhi (y) \, \text {for all } y \in {\mathbf {R}}^d \big \}. \end{aligned}$$

Recall that $\partial \varPhi (x) = \{\nabla \varPhi (x)\}$ if the gradient exists at $x \in {\mathbf {R}}^d$ in the classical sense. See [46, Section 23] for further details.

In the above example, one easily verifies that

$$\begin{aligned} \partial \varPhi (x) = {\left\{ \begin{array}{ll} \{-1\},&{} \text { if } x < 0,\\ {[}-1,1],&{} \text { if } x = 0,\\ \{ 1\},&{} \text { if } x > 0. \end{array}\right. } \end{aligned}$$

This allows us to solve the nonlinear inclusion $x + k \partial \varPhi (x) \ni y$: for every $y \in {\mathbf {R}}$ there exists $x \in {\mathbf {R}}$ satisfying it.
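For $\varPhi (x) = |x|$, the inclusion has a closed-form solution, the soft-thresholding map $x = \operatorname{sign}(y) \max (|y| - k, 0)$. The short Python sketch below computes it. It then checks the inclusion by verifying that the selection $(y - x)/k$ belongs to $\partial \varPhi (x)$ as given above. In particular, every $y \in (-k,k)$ is mapped to $x = 0$ with the selection $y/k \in [-1,1]$. The function names are hypothetical.

```python
import numpy as np


def resolvent_abs(y, k):
    """Solve x + k * dPhi(x) contains y for Phi(x) = |x| (soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - k, 0.0)


def in_subdifferential_abs(s, x, tol=1e-12):
    """Check s in dPhi(x) for Phi(x) = |x|, using the formula for dPhi above."""
    if x > 0:
        return abs(s - 1.0) <= tol
    if x < 0:
        return abs(s + 1.0) <= tol
    return -1.0 - tol <= s <= 1.0 + tol


k = 0.1
for y in np.linspace(-1.0, 1.0, 201):
    x = resolvent_abs(y, k)
    # the selection s = (y - x) / k realises the inclusion x + k*s = y
    assert in_subdifferential_abs((y - x) / k, x)
```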

For this reason we study the more general problem of the numerical approximation of multi-valued stochastic differential equations (MSDE) of the form

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\mathrm {d}X(t) + f(X(t)) \,\mathrm {d}t \ni b(X(t)) \,\mathrm {d}t + g(X(t)) \,\mathrm {d}W(t), \quad t \in (0,T],\\ X(0) = X_0. \end{array}\right. } \end{aligned}$$

(1.5)

Here, we assume that the mappings $b :{\mathbf {R}}^d \rightarrow {\mathbf {R}}^d$ and $g :{\mathbf {R}}^d \rightarrow {\mathbf {R}}^{d,m}$ are globally Lipschitz continuous. Moreover, the multi-valued drift coefficient function $f :{\mathbf {R}}^d \rightarrow 2^{{\mathbf {R}}^d}$ is assumed to be a maximal monotone operator, cf. Definition 2.1 below. We refer to Sect. 4 for a complete list of all imposed assumptions on the MSDE (1.5). Let us emphasize that the subdifferential of a proper, lower semi-continuous, and convex potential is an important example of a possibly multi-valued and maximal monotone mapping f, cf. [46, Corollary 31.5.2].

We use the backward Euler–Maruyama method for the approximation of the MSDE (1.5) on the partition $\pi $, which is given by the recursion

$$\begin{aligned} {\left\{ \begin{array}{ll} X^{n} \in X^{n-1} - k f(X^{n}) + k b(X^{n}) + g(X^{n-1}) \varDelta W^n, \quad n \in \{1,\ldots ,N\},\\ X^0 = X_0. \end{array}\right. } \end{aligned}$$

(1.6)

We discuss the well-posedness of this method (1.6) under our assumptions on f, b, and g in Sect. 5. In particular, it will turn out that both problems, (1.5) and (1.6), admit single-valued solutions $(X(t))_{t \in [0,T]}$ and $(X^n)_{n = 0}^N$, respectively.
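In (1.6), f and b are evaluated at the new value $X^n$, so each step amounts to solving an inclusion for $X^n$. The diffusion, by contrast, is evaluated explicitly at $X^{n-1}$. The Python sketch below illustrates this under purely illustrative assumptions that are not taken from the text: $d = m = 1$, $f = \partial |\cdot |$, $b(x) = \beta x$, and $g(x) = \sigma x$. With these choices, the step reads $(1 - k\beta ) X^n + k \partial |X^n| \ni X^{n-1} + \sigma X^{n-1} \varDelta W^n$. For $k\beta < 1$, this is solved by soft-thresholding followed by division by $1 - k\beta $.

```python
import numpy as np


def soft_threshold(y, k):
    """Unique solution z of z + k * d|z| contains y."""
    return np.sign(y) * np.maximum(np.abs(y) - k, 0.0)


def backward_em_msde(x0, beta, sigma, T, N, rng):
    """Scheme (1.6) for the hypothetical choice f = d|.|, b(x) = beta*x, g(x) = sigma*x."""
    k = T / N
    a = 1.0 - k * beta
    if a <= 0.0:
        raise ValueError("this sketch needs k * beta < 1")
    dW = rng.normal(0.0, np.sqrt(k), size=N)
    X = np.empty(N + 1)
    X[0] = x0
    for n in range(1, N + 1):
        y = X[n - 1] + sigma * X[n - 1] * dW[n - 1]  # explicit diffusion term
        # (1 - k*beta) X^n + k * d|X^n| contains y; d|.| is unchanged by scaling with a > 0
        X[n] = soft_threshold(y, k) / a
    return X, dW


X, dW = backward_em_msde(x0=1.0, beta=0.5, sigma=0.3, T=1.0, N=50,
                         rng=np.random.default_rng(1))
```

The solution is single-valued even though f is not. This is consistent with the statement that (1.6) admits single-valued solutions $(X^n)_{n=0}^N$.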

The main result of this paper, Theorem 6.4, then states that the backward Euler–Maruyama method is convergent of order at least 1/4 with respect to the norm in $L^2(\varOmega ;{\mathbf {R}}^d)$. For the error analysis we rely on techniques for deterministic problems developed in [38]. An important ingredient is the additional condition on f that there exists $\gamma \in (0,\infty )$ with

$$\begin{aligned} \langle f_v - f_z, z - w \rangle \le \gamma \langle f_v - f_w, v - w\rangle \end{aligned}$$

for all $v, w, z \in D(f) \subset {\mathbf {R}}^d$ and $f_v \in f(v)$, $f_w \in f(w)$, $f_z \in f(z)$. This assumption is easily verified for a subdifferential of a convex potential, cf. Lemma 3.2. As already noted in [38] for deterministic problems, this inequality allows us to avoid Gronwall-type arguments in the error analysis for terms involving the multi-valued mapping f.
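As a sanity check of this condition with $\gamma = 1$ for a genuinely multi-valued operator, the Python sketch below enumerates a finite sample of points $(x, s)$ in the graph of $f = \partial |\cdot |$ on ${\mathbf {R}}$. The sample includes several selections $s \in [-1,1]$ at $x = 0$, and the sketch tests the inequality for all triples. This is an illustration on a hypothetical grid, not a proof. The general case follows from the same $\sigma $-argument as in the proof of Lemma 3.2, with gradients replaced by selections.

```python
import itertools

# Finite sample of the graph of f = d|.| on R: pairs (x, s) with s in f(x).
points = [-1.0, -0.5, 0.5, 1.0]
graph = [(x, 1.0 if x > 0 else -1.0) for x in points]
graph += [(0.0, s) for s in (-1.0, -0.5, 0.0, 0.5, 1.0)]  # multi-valued at 0

gamma = 1.0
violations = []
for (v, fv), (w, fw), (z, fz) in itertools.product(graph, repeat=3):
    lhs = (fv - fz) * (z - w)        # <f_v - f_z, z - w>
    rhs = gamma * (fv - fw) * (v - w)  # gamma <f_v - f_w, v - w>
    if lhs > rhs + 1e-12:
        violations.append(((v, fv), (w, fw), (z, fz)))
```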

Before we give a more detailed outline of the content of this paper let us mention that multi-valued stochastic differential equations have been studied in the literature before. The existence of a uniquely determined solution to the MSDE (1.5) has been investigated, e.g., in [7, 21, 42]. We also refer to the more recent monograph [41] and the references therein. In [14, 52] related results have been derived for multi-valued stochastic evolution equations in infinite dimensions. The numerical analysis for MSDEs has also been considered in [3, 26, 43, 54, 56]. However, these papers differ from the present paper in terms of the considered numerical methods, the imposed conditions, or the obtained order of convergence.

Further, we mention that several authors have recently developed explicit numerical methods for SDEs with discontinuous drifts. For instance, we refer to [9, 24, 25, 33, 35, 36, 37]. These results often apply to more irregular drift coefficients, which lie beyond the framework of maximal monotone operators. However, the authors have to impose more restrictive conditions, such as global boundedness or piecewise Lipschitz continuity of the drift. These conditions are not required in our framework, which therefore allows for more general growth conditions. Moreover, none of these papers allows for a multi-valued drift coefficient.

The main motivation for this paper is to present a novel approach to the analysis of MSDEs. Since the numerical method itself is not new, we do not provide any numerical tests in this paper. Numerical experiments for implicit methods can be found, e.g., in [1, 3, 30].

This paper is organized as follows: in Sect. 2 we fix some notation and recall the relevant terminology for multi-valued mappings. In Sect. 3 we demonstrate how to apply the techniques from [38] to the simplified setting of the Langevin equation (1.1). In addition, we also show that if the gradient $\nabla \varPhi $ is more regular, say Hölder continuous with exponent $\alpha \in (0,1]$, then the order of convergence increases to $\frac{1 + \alpha }{4}$. Moreover, it turns out that the error constant does not grow exponentially with the final time T. This is an important insight if the backward Euler method is used within an unadjusted Langevin algorithm [45], which typically requires large time intervals. See Theorem 3.7 and Remark 3.8 below.

In Sect. 4 we turn to the more general multi-valued stochastic differential equation (1.5), introduce all assumptions imposed on the drift and diffusion coefficients, and collect some properties of the exact solution. In Sect. 5 we show that the backward Euler–Maruyama method (1.6) is well-posed under the assumptions of Sect. 4. In Sect. 6 we prove the convergence result mentioned above with respect to the root-mean-square norm. Finally, in Sect. 7 we verify that the setting of Sect. 4 applies to a Langevin equation with the discontinuous gradient (1.4). We also show how to apply our results to the spatial discretization of the stochastic p-Laplace equation, which indicates their applicability to the numerical analysis of stochastic partial differential equations. A complete analysis of the latter problem is, however, deferred to future work.

2 Preliminaries

In this section we collect some notation and background material. First, we recall some terminology for set-valued mappings and (maximal) monotone operators. For a more detailed introduction we refer, for instance, to [48, Abschn. 3.3] or [40, Chapter 6].

By ${\mathbf {R}}^d$, $d \in {\mathbf {N}}$, we denote the Euclidean space with the standard norm $|\cdot |$ and inner product $\langle \cdot , \cdot \rangle $. Let $M \subset {\mathbf {R}}^d$ be a set. A set-valued mapping $f :M \rightarrow 2^{{\mathbf {R}}^d}$ maps each $x \in M$ to an element of the power set $2^{{\mathbf {R}}^d}$, that is, $f(x) \subseteq {\mathbf {R}}^d$. The domain D(f) of f is given by

$$\begin{aligned} D(f) = \{ x \in M \, : \, f(x) \ne \emptyset \}. \end{aligned}$$

Definition 2.1

Let $M \subset {\mathbf {R}}^d$ be a non-empty set. A set-valued map $f :M \rightarrow 2^{{\mathbf {R}}^d}$ is called monotone if

$$\begin{aligned} \langle f_u - f_v,u - v\rangle _{} \ge 0 \end{aligned}$$

for all $u,v \in D(f)$, $f_u \in f(u)$, and $f_v \in f(v)$. Moreover, a set-valued mapping $f :M \rightarrow 2^{{\mathbf {R}}^d}$ is called maximal monotone if f is monotone and for all $x \in M$ and $y \in {\mathbf {R}}^d$ satisfying

$$\begin{aligned} \langle y - f_v,x - v\rangle _{} \ge 0 \quad \text { for all } v \in D(f), f_v \in f(v), \end{aligned}$$

it follows that $x \in D(f)$ and $y \in f(x)$.

Next, we recall a Burkholder–Davis–Gundy-type inequality. For a proof we refer to [28, Chapter 1, Theorem 7.1]. For its formulation we note that the Frobenius or Hilbert–Schmidt norm of a matrix $g \in {\mathbf {R}}^{d,m}$ is also denoted by |g|.

Lemma 2.2

Let $p \in [2,\infty )$ and $g \in L^p(\varOmega ; L^p(0,T;{\mathbf {R}}^{d,m}))$ be stochastically integrable. Then, for every $s,t \in [0,T]$ with $s<t$, the inequality

$$\begin{aligned} {\mathbf {E}}\left[ \left| \int _s^t g(\tau ) \,\mathrm {d}W(\tau ) \right| ^p \right]&\le \Big (\frac{p(p-1)}{2} \Big )^{\frac{p}{2}} (t-s)^{\frac{p-2}{2}} {\mathbf {E}}\left[ \int _s^t |g(\tau )|^p \,\mathrm {d}\tau \right] \end{aligned}$$

holds.
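For a constant integrand $g(\tau ) \equiv g_0$ with $d = m = 1$, the left-hand side is explicit. The integral $\int _s^t g_0 \,\mathrm {d}W = g_0 (W(t) - W(s))$ is Gaussian with variance $g_0^2 (t-s)$, so both sides scale like $|g_0|^p (t-s)^{p/2}$. The inequality therefore reduces to comparing the absolute moment ${\mathbf {E}}|Z|^p$ of a standard normal $Z$ with the constant $(p(p-1)/2)^{p/2}$. The Python sketch below evaluates both sides for this special case only, using ${\mathbf {E}}|Z|^p = 2^{p/2} \Gamma ((p+1)/2)/\sqrt{\pi }$. For $p = 2$, the two values coincide, which is the Itô isometry.

```python
import math


def gaussian_abs_moment(p):
    """E|Z|^p for a standard normal random variable Z."""
    return 2.0 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


def bdg_constant(p):
    """The constant (p(p-1)/2)^(p/2) from Lemma 2.2."""
    return (p * (p - 1) / 2) ** (p / 2)


for p in [2.0, 2.5, 3.0, 4.0, 6.0, 10.0]:
    ratio = gaussian_abs_moment(p) / bdg_constant(p)
    print(f"p = {p:4.1f}:  E|Z|^p / constant = {ratio:.4f}")
```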

Let us also recall a stochastic variant of the Gronwall inequality. A proof that can be modified to this setting can be found in [55]. Compare also with [51].

Lemma 2.3

Let $Z,M,\xi :[0,T] \times \varOmega \!\rightarrow \! {\mathbf {R}}$ be $({\mathcal {F}}_t)_{t\in [0,T]}$-adapted and $\mathbf {P}$-almost surely continuous stochastic processes. Moreover, M is a local $({\mathcal {F}}_t)_{t\in [0,T]}$-martingale with $M(0) = 0$. Suppose that Z and $\xi $ are nonnegative. In addition, let $\varphi :[0, T ] \rightarrow {\mathbf {R}}$ be integrable and nonnegative. If, for all $t \in [0, T ]$, we have

$$\begin{aligned} Z(t) \le \xi (t) + \int _{0}^{t} \varphi (s) Z(s) \,\mathrm {d}s + M(t), \quad {\mathbf {P}}\text {-almost surely,} \end{aligned}$$

then, for every $t \in [0,T]$, the inequality

$$\begin{aligned} {\mathbf {E}}\big [Z(t) \big ] \le \exp \Big (\int _{0}^{t} \varphi (s) \,\mathrm {d}s \Big ) {\mathbf {E}}\big [ \sup _{s\in [0,t]} \xi (s) \big ] \end{aligned}$$

holds.

Moreover, we often make use of generic constants. More precisely, by C we denote a finite and positive quantity that may vary from occurrence to occurrence but is always independent of numerical parameters such as the step size $k = \frac{T}{N}$ and the number of steps $N \in {\mathbf {N}}$.

3 Application to the Langevin equation with a convex potential

In order to illustrate our approach, we first consider a more regular stochastic differential equation with single-valued (Hölder) continuous drift term. More precisely, we consider the overdamped Langevin equation [23, Section 2.2]

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\mathrm {d}X(t) = - \nabla \varPhi (X(t))\,\mathrm {d}t + g_0 \,\mathrm {d}W(t), \quad t \in [0,T],\\ X(0) = X_0, \end{array}\right. } \end{aligned}$$

(3.1)

where $X_0 \in L^2(\varOmega , {\mathcal {F}}_0, {\mathbf {P}};{\mathbf {R}}^d)$, $g_0 \in {\mathbf {R}}^{d,m}$, and $W :[0,T] \times \varOmega \rightarrow {\mathbf {R}}^m$ is a standard ${\mathbf {R}}^m$-valued Wiener process.

In this section we impose the following additional assumption on the potential $\varPhi :{\mathbf {R}}^d \rightarrow {\mathbf {R}}$. It allows us to illustrate our approach in a simplified analytical setting which avoids the full technical details required for dealing with multi-valued mappings. The assumption will be dropped in later parts of the paper.

Assumption 3.1

Let $\varPhi :{\mathbf {R}}^d \rightarrow {\mathbf {R}}$ be a convex, nonnegative, and continuously differentiable function.

In the following, we denote by $f :{\mathbf {R}}^d \rightarrow {\mathbf {R}}^d$ the gradient of $\varPhi $, that is, $f(x) = \nabla \varPhi (x)$. It is well known that the convexity of $\varPhi $ implies the variational inequality

$$\begin{aligned} \langle f(v), w - v \rangle \le \varPhi (w) - \varPhi (v),\quad v, w \in {\mathbf {R}}^d, \end{aligned}$$

(3.2)

see, for example, [46, § 23].

In the following lemma we collect some properties of f which are direct consequences of Assumption 3.1. Both inequalities are well-known. The proof of (3.4) is taken from [38].

Lemma 3.2

Under Assumption 3.1 and with $f = \nabla \varPhi $, the inequalities

$$\begin{aligned} \langle f(v) - f(w),v-w\rangle _{} \ge 0 \end{aligned}$$

(3.3)

and

$$\begin{aligned} \langle f(v) - f(z),z-w\rangle _{} \le \langle f(v) - f(w),v-w\rangle _{} \end{aligned}$$

(3.4)

are fulfilled for all $v,w,z \in {\mathbf {R}}^d$.

Proof

The first inequality follows directly from (3.2) since

$$\begin{aligned} \langle f(v) - f(w),v-w\rangle _{}&= -\langle f(v),w-v\rangle _{} - \langle f(w),v-w\rangle _{} \\&\ge - \big ( \varPhi (w) - \varPhi (v) \big ) - \big ( \varPhi (v) - \varPhi (w) \big ) = 0 \end{aligned}$$

for all $v,w \in {\mathbf {R}}^d$. For the proof of the second inequality we start by rewriting its left-hand side. For arbitrary $v,w,z \in {\mathbf {R}}^d$ we rearrange the terms to obtain

$$\begin{aligned}&\langle f(v) - f(z),z-w\rangle _{}\\&\quad = \langle f(v),z\rangle _{} - \langle f(v),w\rangle _{} + \langle f(z),w-z\rangle _{} + \langle f(v),v\rangle _{} - \langle f(v),v\rangle _{}\\&\quad = \langle f(v),z-v\rangle _{} + \langle f(v),v-w\rangle _{} + \langle f(z),w-z\rangle _{}\\&\qquad + \langle f(w),v-w\rangle _{} - \langle f(w),v-w\rangle _{}\\&\quad = \langle f(v),z-v\rangle _{} + \langle f(v)-f(w),v-w\rangle _{} + \langle f(z),w-z\rangle _{} + \langle f(w),v-w\rangle _{}. \end{aligned}$$

Setting $\sigma (v,w) := \varPhi (w) - \varPhi (v) - \langle f(v),w-v\rangle _{}$ for all $v,w \in {\mathbf {R}}^d$, we see that

$$\begin{aligned} \langle f(v) - f(z),z-w\rangle _{}&= \langle f(v)-f(w),v-w\rangle _{} + \varPhi (z) - \varPhi (v) - \sigma (v,z) \\&\quad + \varPhi (w) - \varPhi (z) - \sigma (z,w) + \varPhi (v) - \varPhi (w) - \sigma (w,v)\\&= \langle f(v)-f(w),v-w\rangle _{} - \sigma (v,z) - \sigma (z,w) - \sigma (w,v). \end{aligned}$$

But (3.2) says that $\sigma (v,w) \ge 0$ for all $v,w \in {\mathbf {R}}^d$, which completes the proof. $\square $
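The identity at the heart of this proof, $\langle f(v) - f(z), z - w \rangle = \langle f(v) - f(w), v - w \rangle - \sigma (v,z) - \sigma (z,w) - \sigma (w,v)$, is easy to test numerically. The Python sketch below uses the hypothetical potential $\varPhi (x) = \sum _i |x_i|^{3/2}$ on ${\mathbf {R}}^5$. This potential satisfies Assumption 3.1, but its gradient is not locally Lipschitz at $0$. The sketch checks the identity at random points, together with $\sigma \ge 0$ and the inequalities (3.3) and (3.4).

```python
import numpy as np


def Phi(x):
    return np.sum(np.abs(x) ** 1.5)


def grad_Phi(x):
    return 1.5 * np.sign(x) * np.sqrt(np.abs(x))


def sigma(v, w):
    """Bregman-type remainder Phi(w) - Phi(v) - <f(v), w - v>, nonnegative by (3.2)."""
    return Phi(w) - Phi(v) - grad_Phi(v) @ (w - v)


rng = np.random.default_rng(42)
for _ in range(1000):
    v, w, z = rng.normal(size=(3, 5))
    lhs = (grad_Phi(v) - grad_Phi(z)) @ (z - w)
    mono = (grad_Phi(v) - grad_Phi(w)) @ (v - w)
    rest = sigma(v, z) + sigma(z, w) + sigma(w, v)
    assert np.isclose(lhs, mono - rest, atol=1e-9)               # identity in the proof
    assert min(sigma(v, z), sigma(z, w), sigma(w, v)) >= -1e-12  # (3.2)
    assert mono >= -1e-12                                        # (3.3)
    assert lhs <= mono + 1e-9                                    # (3.4)
```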

It follows from Assumption 3.1 and Lemma 3.2 that the drift $f = \nabla \varPhi $ of the stochastic differential equation (3.1) is continuous and monotone. Therefore, the stochastic differential equation (3.1) has a solution in the strong (probabilistic) sense, satisfying ${\mathbf {P}}$-a.s. for all $t \in [0,\infty )$

$$\begin{aligned} X(t) = X_0 - \int _0^t f(X(s))\,\mathrm {d}s + g_0 W(t). \end{aligned}$$

(3.5)

See [44, Thm. 3.1.1] for a proof and more details on this solution concept. Moreover, the solution is unique up to ${\mathbf {P}}$-indistinguishability and it is square-integrable with

$$\begin{aligned} \sup _{t \in [0,T]} {\mathbf {E}}\big [ |X(t)|^2 \big ] \le C \big (1 + {\mathbf {E}}\big [ |X_0|^2 \big ] \big ). \end{aligned}$$

Next, we turn to the numerical approximation of the solution of (3.1). Recall that for a single-valued drift the backward Euler–Maruyama method is given by the recursion

$$\begin{aligned} {\left\{ \begin{array}{ll} X^{n} = X^{n-1} - k f(X^{n}) + g_0 \varDelta W^n, \quad n \in \{1,\ldots ,N\},\\ X^0 = X_0, \end{array}\right. } \end{aligned}$$

(3.6)

where $\varDelta W^n = W(t_{n}) - W(t_{n-1})$, $t_n = nk$, and $k = \frac{T}{N}$.

The next lemma contains some a priori estimates for the backward Euler–Maruyama method (3.6).

Lemma 3.3

Let $g_0 \in {\mathbf {R}}^{d,m}$ be given and let Assumption 3.1 be satisfied. For an arbitrary step size $k = \frac{T}{N}$, $N \in {\mathbf {N}}$, let $(X^n)_{n \in \{0,\dots ,N\}}$ be a family of $({\mathcal {F}}_{t_n})_{n \in \{0,\ldots ,N\}}$-adapted random variables satisfying (3.6). If the initial value $X_0 \in L^2(\varOmega ,{\mathcal {F}}_0,{\mathbf {P}};{\mathbf {R}}^d)$, then

$$\begin{aligned} \max _{n \in \{1,\ldots ,N\}} {\mathbf {E}}[|X^n|^2] \le {\mathbf {E}}\big [ |X_0|^2 \big ] + 2 T \big ( \varPhi (0) + |g_0|^2 \big ) \end{aligned}$$

(3.7)

and

$$\begin{aligned} \sum _{n = 1}^N {\mathbf {E}}\big [ | X^n - X^{n-1}|^2 \big ] + 4 k \sum _{n = 1}^{N} {\mathbf {E}}\big [ \varPhi (X^n) \big ] \le 2{\mathbf {E}}\big [ |X_0|^2 \big ] + 4 T \big ( \varPhi (0) + |g_0|^2 \big ). \end{aligned}$$

(3.8)

Proof

First, we recall the identity

$$\begin{aligned} \langle X^n - X^{n-1},X^n\rangle _{} = \frac{1}{2} \big ( |X^n|^2 - |X^{n-1}|^2 + |X^n - X^{n-1}|^2 \big ). \end{aligned}$$

Using also (3.6), we then get

$$\begin{aligned} |X^n|^2 - |X^{n-1}|^2 + |X^n - X^{n-1}|^2&= 2 \langle X^n - X^{n-1}, X^n \rangle \\&= - 2 k \langle f(X^n),X^n\rangle _{} + 2 \langle g_0 \varDelta W^n, X^n \rangle , \end{aligned}$$

for every $n \in \{1,\ldots ,N\}$. Hence, an application of (3.2) yields

$$\begin{aligned} |X^n|^2 - |X^{n-1}|^2 + |X^n - X^{n-1}|^2&\le 2 k \big ( \varPhi (0) - \varPhi (X^n) \big ) + 2 \langle g_0 \varDelta W^n, X^n \rangle , \end{aligned}$$

for every $n \in \{1,\ldots ,N\}$. From applications of the Cauchy–Schwarz inequality and the weighted Young inequality we then obtain

$$\begin{aligned}&|X^n|^2 - |X^{n-1}|^2 + |X^n - X^{n-1}|^2 + 2 k \varPhi (X^n) \\&\quad \le 2 k \varPhi (0) + 2 \langle g_0 \varDelta W^n, X^n - X^{n-1} \rangle + 2 \langle g_0 \varDelta W^n, X^{n-1} \rangle \\&\quad \le 2 k \varPhi (0) + 2 \big | g_0 \varDelta W^n \big |^2 + \frac{1}{2} | X^n - X^{n-1}|^2 + 2 \langle g_0 \varDelta W^n, X^{n-1} \rangle , \end{aligned}$$

for every $n \in \{1,\ldots ,N\}$.

The third term on the right-hand side is absorbed into the third term on the left-hand side. Replacing n by j and summing over $j \in \{1,\ldots ,n\}$ then yields

$$\begin{aligned}&|X^n|^2 + \frac{1}{2} \sum _{ j = 1}^n | X^{j} - X^{j-1} |^2 + 2 k \sum _{j = 1}^n \varPhi (X^j)\\&\quad \le |X^0|^2 + 2 t_n \varPhi (0) + 2 \sum _{j = 1}^n \big | g_0 \varDelta W^j \big |^2 + 2 \sum _{j = 1}^n \big \langle g_0 \varDelta W^j, X^{j-1} \big \rangle . \end{aligned}$$

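By induction over $n \in \{1,\ldots ,N\}$, this inequality shows that each $X^n$ is square-integrable, since $X_0 \in L^2(\varOmega ,{\mathcal {F}}_0,{\mathbf {P}};{\mathbf {R}}^d)$. Because $X^{j-1}$ is ${\mathcal {F}}_{t_{j-1}}$-measurable and square-integrable, while $\varDelta W^j$ is independent of ${\mathcal {F}}_{t_{j-1}}$ and has mean zero, the expectation of the last sum vanishes. Moreover, an application of the Itô isometry, ${\mathbf {E}}[ |g_0 \varDelta W^j|^2 ] = k |g_0|^2$, then gives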

$$\begin{aligned}&{\mathbf {E}}\big [ |X^n|^2 \big ] + \frac{1}{2} \sum _{ j = 1}^n {\mathbf {E}}\big [ | X^{j} - X^{j-1} |^2 \big ] + 2 k \sum _{j = 1}^n {\mathbf {E}}\big [ \varPhi (X^j) \big ]\\&\quad \le {\mathbf {E}}\big [ |X^0|^2 \big ] + 2 t_n \varPhi (0) + 2 \sum _{j = 1}^n {\mathbf {E}}\big [ \big | g_0 \varDelta W^j \big |^2 \big ]\\&\quad = {\mathbf {E}}\big [ |X_0|^2 \big ] + 2 t_n \big ( \varPhi (0) + | g_0 |^2\big ). \end{aligned}$$

Since this is true for any $n \in \{1,\ldots ,N\}$ the assertion follows. $\square $
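The proof yields a pathwise inequality (the displayed one before expectations are taken), which can be checked exactly on simulated paths. The bounds (3.7) and (3.8) themselves can be compared with Monte Carlo estimates. The Python sketch below does both for the hypothetical scalar example $\varPhi (x) = x^4/4$ (so $\varPhi (0) = 0$), $X_0 = 1$, and $g_0 = 1$. The implicit step is solved by Newton's method. The sample size and all parameters are arbitrary choices.

```python
import numpy as np


def Phi(x):
    """Hypothetical potential Phi(x) = x**4 / 4 with Phi(0) = 0 (d = m = 1)."""
    return x**4 / 4


T, N, M = 1.0, 50, 5000            # final time, number of steps, number of paths
k, x0, g0 = T / N, 1.0, 1.0

rng = np.random.default_rng(0)
dW = rng.normal(0.0, np.sqrt(k), size=(M, N))
X = np.empty((M, N + 1))
X[:, 0] = x0
for n in range(1, N + 1):
    y = X[:, n - 1] + g0 * dW[:, n - 1]
    x = y.copy()
    for _ in range(60):            # Newton for x + k*x**3 = y, started at x = y
        x -= (x + k * x**3 - y) / (1.0 + 3.0 * k * x**2)
    X[:, n] = x

# Pathwise inequality from the proof (before taking expectations), for every n.
dX2 = np.cumsum((X[:, 1:] - X[:, :-1]) ** 2, axis=1)
lhs = X[:, 1:] ** 2 + 0.5 * dX2 + 2 * k * np.cumsum(Phi(X[:, 1:]), axis=1)
t_n = k * np.arange(1, N + 1)
rhs = (x0**2 + 2 * t_n * Phi(0.0)
       + 2 * np.cumsum((g0 * dW) ** 2, axis=1)
       + 2 * np.cumsum(g0 * dW * X[:, :-1], axis=1))
pathwise_ok = bool(np.all(lhs <= rhs + 1e-10))

# Monte Carlo estimates of the left-hand sides of (3.7) and (3.8) and their bounds.
est_37 = np.max(np.mean(X[:, 1:] ** 2, axis=0))
est_38 = np.mean(dX2[:, -1] + 4 * k * np.sum(Phi(X[:, 1:]), axis=1))
bound_37 = x0**2 + 2 * T * (Phi(0.0) + g0**2)
bound_38 = 2 * x0**2 + 4 * T * (Phi(0.0) + g0**2)
```

The pathwise check holds for every path up to rounding. The comparisons with (3.7) and (3.8), in contrast, are subject to Monte Carlo error, although the bounds are far from sharp in this example.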

As the next theorem shows, Assumption 3.1 is also sufficient to ensure the well-posedness of the backward Euler–Maruyama method. The result follows directly from the fact that f is continuous and monotone due to (3.3). For a proof we refer, for instance, to [4, Sect. 4], [39, Chap. 6.4], and [53, Theorem C.2]. The assertion also follows from the more general result in Theorem 5.3 below.

Theorem 3.4

Let $X_0 \in L^2(\varOmega ,{\mathcal {F}}_0,{\mathbf {P}};{\mathbf {R}}^d)$ and $g_0 \in {\mathbf {R}}^{d,m}$ be given and let Assumption 3.1 be satisfied. Then, for every equidistant step size $k = \frac{T}{N}$, $N \in {\mathbf {N}}$, there exists a unique family of square-integrable and $({\mathcal {F}}_{t_n})_{n \in \{0,\ldots ,N\}}$-adapted random variables $(X^n)_{n\in \{0,\dots ,N\}}$ satisfying (3.6).

We now turn to an error estimate with respect to the $L^2(\varOmega ;{\mathbf {R}}^d)$-norm. Since we do not impose any (local) Lipschitz condition on the drift f, classical approaches based on discrete Gronwall-type inequalities are not applicable. Instead we rely on an error representation formula, which was introduced for deterministic problems in [38].

For its formulation, we introduce some additional notation: For a given equidistant partition $\pi = \{0=t_0< t_1< \cdots < t_N = T\} \subset [0,T]$ with step size $k = \frac{T}{N}$, we denote by ${\mathcal {X}}:[0,T] \times \varOmega \rightarrow {\mathbf {R}}^d$ the piecewise linear interpolant of the sequence $(X^n)_{n \in \{0,\ldots ,N\}}$ generated by the backward Euler method (3.6). It is defined by ${\mathcal {X}}(0) = X^0$ and for all $t \in (t_{n-1}, t_n]$, $n \in \{1,\ldots ,N\}$, by

$$\begin{aligned} {\mathcal {X}}(t) = \frac{t - t_{n-1}}{k} X^n + \frac{t_n - t}{k} X^{n-1}. \end{aligned}$$

(3.9)

In addition, we introduce the processes ${\overline{{\mathcal {X}}}}, {\underline{{\mathcal {X}}}} :[0,T] \times \varOmega \rightarrow {\mathbf {R}}^d$, which are piecewise constant interpolants of $(X^n)_{n \in \{0,\ldots ,N\}}$ and defined by ${\overline{{\mathcal {X}}}}(0)= {\underline{{\mathcal {X}}}}(0) = X^0$ and for all $t \in (t_{n-1}, t_n]$, $n \in \{1,\ldots ,N\}$, by

$$\begin{aligned} {\overline{{\mathcal {X}}}}(t) = X^n \quad \text {and} \quad {\underline{{\mathcal {X}}}}(t) = X^{n-1}. \end{aligned}$$

(3.10)

Analogously, we define the piecewise linear interpolated process ${\mathcal {W}} :[0,T] \times \varOmega \rightarrow {\mathbf {R}}^m$ by ${\mathcal {W}}(0) = 0$ and

$$\begin{aligned} {\mathcal {W}}(t) = \frac{t - t_{n-1}}{k} W(t_{n}) + \frac{t_n - t}{k} W(t_{n-1}) = W(t_{n-1}) + \frac{t - t_{n-1}}{k} \varDelta W^n, \end{aligned}$$

(3.11)

for all $t \in (t_{n-1},t_n]$, $n \in \{1,\ldots ,N\}$.
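The interpolants (3.9) and (3.10) follow the convention $t \in (t_{n-1}, t_n]$, with special values at $t = 0$. A minimal Python sketch for scalar nodal values follows; the function name is hypothetical. Applying the same linear interpolation to the nodal values $W(t_n)$ gives ${\mathcal {W}}$ in (3.11).

```python
import numpy as np


def interpolants(t, grid, nodal):
    """Evaluate (3.9) and (3.10) at times t for scalar nodal values X^0, ..., X^N.

    Uses the convention t in (t_{n-1}, t_n] and returns
    (linear, upper, lower) = (cal X(t), overline X(t), underline X(t)).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k = grid[1] - grid[0]
    # searchsorted(..., side="left") gives the n with t_{n-1} < t <= t_n
    n = np.clip(np.searchsorted(grid, t, side="left"), 1, len(grid) - 1)
    weight = (t - grid[n - 1]) / k                 # (t - t_{n-1}) / k
    linear = weight * nodal[n] + (1.0 - weight) * nodal[n - 1]
    upper = np.where(t == grid[0], nodal[0], nodal[n])  # overline X(0) = X^0
    lower = nodal[n - 1]
    return linear, upper, lower


grid = np.linspace(0.0, 1.0, 5)                    # T = 1, N = 4, k = 0.25
nodal = np.array([1.0, 0.5, 0.8, 0.2, 0.4])
linear, upper, lower = interpolants(0.125, grid, nodal)  # midpoint of (t_0, t_1]
```

At the midpoint of $(t_0, t_1]$, the linear interpolant is the average of $X^0$ and $X^1$, while the piecewise constant interpolants return $X^1$ and $X^0$, respectively.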

We are now prepared to state our first preparatory result. The underlying idea was introduced in [38], where it is used to derive a posteriori error estimates for the backward Euler method. In fact, in the absence of noise, only the first term on the right-hand side of (3.12) is non-zero. In [38] this term is used as an a posteriori error estimator, since it is explicitly computable by quantities generated by the numerical method.

Lemma 3.5

Let $X_0 \in L^2(\varOmega ,{\mathcal {F}}_0,{\mathbf {P}};{\mathbf {R}}^d)$ as well as $g_0 \in {\mathbf {R}}^{d,m}$ be given and let Assumption 3.1 be satisfied. Let $k = \frac{T}{N}$, $N \in {\mathbf {N}}$, be an arbitrary equidistant step size and let $t_n = nk$, $n \in \{0,\dots , N\}$. Then, for every $n \in \{1,\ldots ,N\}$ the estimate

$$\begin{aligned} \begin{aligned} {\mathbf {E}}\big [ | X(t_n) - X^n|^2 \big ]&\le k \sum _{i = 1}^n {\mathbf {E}}\big [ \langle f(X^i) - f(X^{i-1}),X^i - X^{i-1}\rangle _{}\big ]\\&\quad + 2 \int _0^{t_n} {\mathbf {E}}\big [ \big \langle f( {\overline{{\mathcal {X}}}}(t)) - f(X(t)), g_0 \big ( {\mathcal {W}}(t) - W(t)\big ) \big \rangle \big ] \,\mathrm {d}t \end{aligned} \end{aligned}$$

(3.12)

holds, where $(X(t))_{t \in [0,T]}$ and $(X^n)_{n \in \{0,\ldots ,N\}}$ are the solutions of (3.1) and (3.6), respectively.

Proof

From (3.6) we directly deduce that for every $n \in \{1,\ldots ,N\}$

$$\begin{aligned} X^n = X_0 - k \sum _{i = 1}^n f(X^{i}) + g_0 W(t_n). \end{aligned}$$

Then, one easily verifies for all $t \in (t_{n-1},t_n]$, $n \in \{1,\ldots ,N\}$, that

$$\begin{aligned} {\mathcal {X}}(t) = X_0 - \int _0^t f({\overline{{\mathcal {X}}}}(s)) \,\mathrm {d}s + g_0 {\mathcal {W}}(t). \end{aligned}$$

Hence, due to (3.5), the error process $E := X - {\mathcal {X}}$ can be written as

$$\begin{aligned} E(t) = \int _0^t \big (f({\overline{{\mathcal {X}}}}(s)) - f(X(s))\big ) \,\mathrm {d}s + g_0 \big (W(t) - {\mathcal {W}}(t) \big ) =: E_1(t) + E_2(t) \end{aligned}$$

(3.13)

for all $t \in [0,T]$. Here, we have $E_2(t_n)= 0$, since ${\mathcal {W}}$ is an interpolant of W. Hence, for all $n \in \{0,\ldots ,N\}$,

$$\begin{aligned} |E(t_n)|^2 = | E_1(t_n)|^2. \end{aligned}$$

(3.14)

To estimate the norm of $E_1(t_n)$, we first note that $E_1$ has absolutely continuous sample paths with $E_1(0)=0$. Hence,

$$\begin{aligned} \frac{1}{2} \frac{\mathrm {d}}{\,\mathrm {d}t} |E_1(t)|^2 = \langle \dot{E}_1(t),E_1(t)\rangle _{} \end{aligned}$$

holds for almost all $t \in [0,T]$. Therefore, by integration with respect to t, we get

$$\begin{aligned} \frac{1}{2} |E_1(t_n)|^2= & {} \int _0^{t_n} \langle {\dot{E}}_1(t),E_1(t)\rangle _{}\,\mathrm {d}t \nonumber \\= & {} \int _0^{t_n} \langle {\dot{E}}_1(t),E(t)\rangle _{}\,\mathrm {d}t - \int _0^{t_n} \langle {\dot{E}}_1(t),E_2(t)\rangle _{}\,\mathrm {d}t. \end{aligned}$$

(3.15)

Next, we write

$$\begin{aligned} {\mathcal {X}}(t) = \frac{t - t_{n-1}}{k} {\overline{{\mathcal {X}}}}(t) + \frac{t_n - t}{k} {\underline{{\mathcal {X}}}}(t), \quad t \in (t_{n-1}, t_n], \end{aligned}$$

and use (3.3) and (3.4) to obtain, for almost every $t \in (t_{n-1}, t_n]$, that

$$\begin{aligned} \langle {\dot{E}}_1(t),E(t)\rangle _{}&= \langle f({\overline{{\mathcal {X}}}}(t)) - f(X(t)),X(t) - {\mathcal {X}}(t)\rangle _{}\\&= \frac{t - t_{n-1}}{k} \langle f({\overline{{\mathcal {X}}}}(t)) - f(X(t)),X(t) - {\overline{{\mathcal {X}}}}(t)\rangle _{}\\&\quad + \frac{t_n -t}{k} \langle f({\overline{{\mathcal {X}}}}(t)) - f(X(t)),X(t) - {\underline{{\mathcal {X}}}}(t)\rangle _{}\\&\le \frac{t_n -t}{k} \langle f({\overline{{\mathcal {X}}}}(t)) - f({\underline{{\mathcal {X}}}}(t)),{\overline{{\mathcal {X}}}}(t) - {\underline{{\mathcal {X}}}}(t)\rangle _{}\\&= \frac{t_n -t}{k} \langle f(X^n) - f(X^{n-1}),X^n - X^{n-1}\rangle _{}. \end{aligned}$$

Furthermore, the expectation of the second integral on the right-hand side of (3.15) equals

$$\begin{aligned} {\mathbf {E}}\Big [ \int _0^{t_n} \langle {\dot{E}}_1(t),E_2(t)\rangle _{}\,\mathrm {d}t \Big ]&= \int _0^{t_n} {\mathbf {E}}\big [ \langle f({\overline{{\mathcal {X}}}}(t)) - f(X(t)),g_0(W(t) - {\mathcal {W}}(t))\rangle _{} \big ] \,\mathrm {d}t. \end{aligned}$$

Therefore,

$$\begin{aligned} {\mathbf {E}}\big [ |E_1(t_n)|^2 \big ]&= 2 \int _0^{t_n} {\mathbf {E}}[ \langle {\dot{E}}_1(t),E(t)\rangle _{} ] \,\mathrm {d}t - 2 \int _0^{t_n} {\mathbf {E}}[ \langle {\dot{E}}_1(t),E_2(t)\rangle _{} ] \,\mathrm {d}t\\&\le 2 \sum _{i = 1}^n \int _{t_{i-1}}^{t_i} \frac{t_i -t}{k} \,\mathrm {d}t \, {\mathbf {E}}\big [ \langle f(X^i) - f(X^{i-1}),X^i - X^{i-1}\rangle _{} \big ]\\&\quad + 2 \int _0^{t_n} {\mathbf {E}}\big [ \langle f({\overline{{\mathcal {X}}}}(t)) - f(X(t)),g_0({\mathcal {W}}(t) - W(t))\rangle _{} \big ] \,\mathrm {d}t. \end{aligned}$$

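Since $\int _{t_{i-1}}^{t_i} (t_i - t) \,\mathrm {d}t = \frac{1}{2} k^2$, the weight in front of each expectation in the first sum equals $\frac{2}{k} \cdot \frac{1}{2} k^2 = k$, and the assertion follows. $\square $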

The next lemma concerns the difference between the Wiener process W and its piecewise linear interpolant ${\mathcal {W}}$.

Lemma 3.6

For every $g_0 \in {\mathbf {R}}^{d,m}$ and every step size $k = \frac{T}{N}$, $N \in {\mathbf {N}}$, the equality

$$\begin{aligned} \Big ( \int _{0}^{T} {\mathbf {E}}[ |g_0 ( W(t) - {\mathcal {W}}(t)) |^2] \,\mathrm {d}t \Big )^{\frac{1}{2}} = \frac{1}{\sqrt{6}} T^{\frac{1}{2}} | g_0 | k^{\frac{1}{2}} \end{aligned}$$

(3.16)

holds.

Proof

From the definition (3.11) of ${\mathcal {W}}$ it follows that

$$\begin{aligned}&\int _{0}^{T} {\mathbf {E}}[ |g_0 ( W(t) - {\mathcal {W}}(t)) |^2] \,\mathrm {d}t\\&\quad = \sum _{n = 1}^N \int _{t_{n-1}}^{t_n} {\mathbf {E}}\Big [ \Big |g_0 \Big ( W(t) - W(t_{n-1}) - \frac{t-t_{n-1}}{k} \varDelta W^n \Big ) \Big |^2 \Big ] \,\mathrm {d}t \\&\quad = \sum _{n =1}^N \int _{t_{n-1}}^{t_n} {\mathbf {E}}\Big [ \Big | \frac{t_n - t}{k} g_0 ( W(t) - W(t_{n-1})) \\&\qquad \qquad \qquad \quad - \frac{t-t_{n-1}}{k} g_0 (W(t_n) - W(t)) \Big |^2 \Big ] \,\mathrm {d}t\\&\quad = \sum _{n =1}^N \int _{t_{n-1}}^{t_n} {\mathbf {E}}\Big [ \Big | \frac{t_n - t}{k} g_0 ( W(t) - W(t_{n-1})) \Big |^2\\&\qquad \qquad \qquad \quad + \Big | \frac{t-t_{n-1}}{k} g_0 (W(t_n) - W(t))\Big |^2 \Big ] \,\mathrm {d}t\\&\quad = \frac{1}{k^2}\sum _{n =1}^N \Big (\int _{t_{n-1}}^{t_n} |g_0|^2 (t_n - t)^2 (t- t_{n-1}) \,\mathrm {d}t\\&\qquad \qquad \qquad + \int _{t_{n-1}}^{t_n} |g_0|^2 (t - t_{n-1})^2 (t_n- t) \,\mathrm {d}t\Big ), \end{aligned}$$

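where we used that the two increments of the Wiener process are independent and centered for every $t \in (t_{n-1},t_n]$, $n \in \{1,\ldots , N\}$, so that the mixed term vanishes, and we also applied Itô’s isometry. Both integrals in the last line are equal by symmetry, and $\int _{t_{n-1}}^{t_n} (t_n - t)^2 (t- t_{n-1}) \,\mathrm {d}t = \frac{1}{12} k^4$. Hence each summand equals $\frac{1}{6} |g_0|^2 k^2$, and summing over the $N = \frac{T}{k}$ subintervals gives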

$$\begin{aligned} \int _{0}^{T} {\mathbf {E}}[ |g_0 ( W(t) - {\mathcal {W}}(t)) |^2] \,\mathrm {d}t&= \frac{1}{6} T |g_0|^2 k, \end{aligned}$$

and the proof is complete. $\square $
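Lemma 3.6 can be checked numerically. The sketch below (our own, not taken from the paper) samples W exactly on a fine grid, forms the piecewise linear interpolant through the coarse nodes $t_n = nk$ as in (3.11), and compares a Monte Carlo estimate of the squared left-hand side of (3.16) with $\frac{1}{6} T |g_0|^2 k$. We assume $|g_0|$ is the Frobenius norm, which is what Itô’s isometry produces for a matrix $g_0$. The left Riemann sum in time introduces a small downward bias of relative size $1/\texttt{substeps}^2$.

```python
import numpy as np


def lemma36_rhs(T, N, g0):
    """Squared right-hand side of (3.16): T |g0|^2 k / 6, |g0| = Frobenius norm."""
    k = T / N
    return T * np.sum(np.asarray(g0, dtype=float) ** 2) * k / 6.0


def lemma36_monte_carlo(T, N, g0, n_paths=1000, substeps=64, seed=0):
    """Monte Carlo estimate of int_0^T E|g0 (W(t) - W_lin(t))|^2 dt.

    W is sampled exactly at `substeps` equidistant points per coarse interval;
    W_lin interpolates W linearly between the coarse nodes t_n = n k.
    The time integral is approximated by a left Riemann sum.
    """
    g0 = np.asarray(g0, dtype=float)               # shape (d, m)
    m = g0.shape[1]
    k = T / N
    h = k / substeps                                # fine step size
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(h), size=(n_paths, N, substeps, m))
    # W(t_{n-1} + j h) - W(t_{n-1}) for j = 0, ..., substeps - 1 (starts at 0)
    local = np.cumsum(dW, axis=2) - dW
    incr = dW.sum(axis=2, keepdims=True)            # coarse increments Delta W^n
    frac = (np.arange(substeps) * h / k)[None, None, :, None]
    bridge = local - frac * incr                    # W(t) - W_lin(t)
    err = np.einsum("dm,pnjm->pnjd", g0, bridge)    # g0 (W - W_lin)
    per_path = h * np.sum(err ** 2, axis=(1, 2, 3))
    return per_path.mean()


g0 = [[1.0, 0.5], [0.0, 2.0]]
print(lemma36_monte_carlo(1.0, 8, g0), lemma36_rhs(1.0, 8, g0))
```

On each coarse interval, $W - {\mathcal {W}}$ is a Brownian bridge, which is why the estimate does not depend on how W behaves outside that interval.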

The error estimates in Lemmas 3.5 and 3.6 allow us to determine the order of convergence of the backward Euler–Maruyama method without relying on discrete Gronwall-type inequalities. The following theorem imposes the additional assumption that the drift f is Hölder continuous. We include the parameter value $\alpha = 0$, which simply means that f is continuous and globally bounded. The case of less regular f is treated in Sect. 6.

Observe that we recover the standard rate $\frac{1}{2}$ if $\alpha = 1$, that is, if the drift f is assumed to be globally Lipschitz continuous. Compare also with the standard literature, for example, [20, Chap. 12] or [32, Sect. 1.3].

For processes $X :[0,T] \times \varOmega \rightarrow {\mathbf {R}}^d$ and exponents $\alpha \in [0,1]$, we define the family of Hölder semi-norms by

$$\begin{aligned} | X |_{C^{\alpha }([0,T];L^2(\varOmega ;{\mathbf {R}}^d))} =\sup _{{\mathop {t\ne s}\limits ^{t,s\in [0,T]}}} \frac{\Vert X(t)-X(s)\Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}}{|t-s|^{\alpha }} \end{aligned}$$

and the corresponding Hölder spaces

$$\begin{aligned} C^{\alpha }([0,T];L^2(\varOmega ;{\mathbf {R}}^d))= \big \{ X \in C([0,T];L^2(\varOmega ;{\mathbf {R}}^d)) : | X |_{C^{\alpha }([0,T];L^2(\varOmega ;{\mathbf {R}}^d))} < \infty \big \}. \end{aligned}$$

Theorem 3.7

Let $X_0 \in L^2(\varOmega ,{\mathcal {F}}_0,{\mathbf {P}};{\mathbf {R}}^d)$ as well as $g_0 \in {\mathbf {R}}^{d,m}$ be given, let Assumption 3.1 be fulfilled and let $f = \nabla \varPhi $ be Hölder continuous with exponent $\alpha \in [0,1]$, i.e., there exists $L_f \in ( 0,\infty )$ such that

$$\begin{aligned} | f(x) - f(y) | \le L_f |x-y|^\alpha , \quad \text { for all } x,y \in {\mathbf {R}}^d. \end{aligned}$$

Then there exists $C \in (0,\infty )$ such that for every step size $k = \frac{T}{N}$, $N \in {\mathbf {N}}$, the estimate

$$\begin{aligned} \max _{n \in \{0,\ldots ,N\}} \Vert X(t_n) - X^n \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)} \le C k^{\frac{1 + \alpha }{4}} \end{aligned}$$

holds, where $(X(t))_{t \in [0,T]}$ and $(X^n)_{n \in \{0,\ldots ,N\}}$ are the solutions to (3.1) and (3.6), respectively.

Proof

Since f is assumed to be $\alpha $-Hölder continuous it follows that

$$\begin{aligned} |f(x)| \le \max (L_f,|f(0)|)( 1 + |x|^\alpha ), \quad \text {for all }x \in {\mathbf {R}}^d. \end{aligned}$$

In particular, f grows at most linearly. Therefore, as stated in [28, Chap. 2, Thm 4.3], the solution $(X(t))_{t\in [0,T]}$ of (3.1) fulfills $X \in C^{\frac{1}{2}}([0,T];L^2(\varOmega ;{\mathbf {R}}^d))$.

We will use Lemma 3.5 to prove the error bound. To this end, we first show that

$$\begin{aligned} \begin{aligned}&k \sum _{i=1}^{N} {\mathbf {E}}\big [ \langle f(X^i) - f(X^{i-1}),X^i - X^{i-1}\rangle _{}\big ]\\&\quad \le L_f T^{\frac{1-\alpha }{2}} \Big ( 2 {\mathbf {E}}\big [ |X_0|^2 \big ] + 4 T \big ( \varPhi (0) + |g_0|^2 \big ) \Big )^{\frac{1+\alpha }{2}} k^{\frac{1+\alpha }{2}}. \end{aligned}\nonumber \\ \end{aligned}$$

(3.17)

Indeed, we make use of the Hölder continuity of f directly and obtain

$$\begin{aligned}&k \sum _{i=1}^{N} {\mathbf {E}}\big [ \langle f(X^i) - f(X^{i-1}),X^i - X^{i-1}\rangle _{}\big ]\\&\quad \le \sum _{i=1}^{N} k {\mathbf {E}}\big [ |f(X^i) - f(X^{i-1})| |X^i - X^{i-1}|\big ]\\&\quad \le L_f \sum _{i=1}^{N} k^{\frac{1}{q}} k^{\frac{1}{p}} {\mathbf {E}}\big [ |X^i - X^{i-1}|^{1 + \alpha } \big ]\\&\quad \le L_f \Big ( \sum _{i = 1}^N k \Big )^{\frac{1}{q}} \Big ( k \sum _{i = 1}^N {\mathbf {E}}\big [ |X^i - X^{i-1}|^2 \big ] \Big )^{\frac{1}{p}}, \end{aligned}$$

where we also used Hölder’s inequality with $p = \frac{2}{1 + \alpha } \in [1,2]$ and $\frac{1}{p} + \frac{1}{q} = 1$ as well as Jensen’s inequality. Due to the a priori estimate (3.8) the sum $\sum _{i=1}^{N} {\mathbf {E}}\big [ |X^i - X^{i-1}|^2 \big ]$ is bounded independently of the step size k. Hence, we arrive at (3.17).

Therefore, it remains to estimate the second error term in Lemma 3.5:

$$\begin{aligned}&\int _0^{t_n} {\mathbf {E}}\big [ \big \langle f({\overline{{\mathcal {X}}}}(t)) - f(X(t)), g_0 ({\mathcal {W}}(t) - W(t) ) \big \rangle \big ] \,\mathrm {d}t\nonumber \\&\quad = \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} {\mathbf {E}}\big [ \big \langle f(X^j) - f(X(t)), g_0 ({\mathcal {W}}(t) - W(t) ) \big \rangle \big ] \,\mathrm {d}t, \end{aligned}$$

(3.18)

where we inserted the definition of ${\overline{{\mathcal {X}}}}$ from (3.10). Moreover, from (3.11) we get

$$\begin{aligned} g_0 ({\mathcal {W}}(t) - W(t) ) = \frac{t - t_{j-1}}{k} g_0 \varDelta W^j - g_0 ( W(t) - W(t_{j-1}) ) \end{aligned}$$

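for $t \in (t_{j-1},t_j]$. Hence, the random variable in the second slot of the inner product on the right-hand side of (3.18) is centered and independent of ${\mathcal {F}}_{t_{j-1}}$. In particular, its inner product with the ${\mathcal {F}}_{t_{j-1}}$-measurable and square-integrable random variable $f(X^{j-1}) - f(X(t_{j-1}))$ has zero expectation. Subtracting this term in the first slot, we may write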

$$\begin{aligned}&\sum _{j = 1}^n \int _{t_{j-1}}^{t_j} {\mathbf {E}}\big [ \big \langle f(X^j) - f(X(t)), g_0 ({\mathcal {W}}(t) - W(t) ) \big \rangle \big ] \,\mathrm {d}t\\&\quad = \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} {\mathbf {E}}\big [ \big \langle f(X^j) - f(X^{j-1}),g_0 ({\mathcal {W}}(t) - W(t) ) \big \rangle \big ] \,\mathrm {d}t\\&\qquad + \sum _{j = 1}^n \int _{t_{j-1}}^{t_j} {\mathbf {E}}\big [ \big \langle f(X(t_{j-1})) - f(X(t)), g_0 ({\mathcal {W}}(t) - W(t) ) \big \rangle \big ] \,\mathrm {d}t =: T_1 + T_2. \end{aligned}$$

To estimate $T_1$ we first recall the definitions of ${\underline{{\mathcal {X}}}}$ and ${\overline{{\mathcal {X}}}}$ from (3.10). Then we apply the Cauchy–Schwarz inequality and obtain

$$\begin{aligned} T_1&= \int _0^{t_n} {\mathbf {E}}\big [ \langle f({\overline{{\mathcal {X}}}}(t)) - f({\underline{{\mathcal {X}}}}(t)), g_0 ({\mathcal {W}}(t) - W(t) ) \rangle \big ] \,\mathrm {d}t\\&\le \Big ( \int _0^{t_n} {\mathbf {E}}\big [ | f({\overline{{\mathcal {X}}}}(t)) - f({\underline{{\mathcal {X}}}}(t)) |^2 \big ] \,\mathrm {d}t \Big )^{\frac{1}{2}} \Big ( \int _0^{t_n} {\mathbf {E}}\big [ | g_0 ({\mathcal {W}}(t) - W(t) ) |^2 \big ] \,\mathrm {d}t \Big )^{\frac{1}{2}}. \end{aligned}$$

From the Hölder continuity of f we then deduce that

$$\begin{aligned} \int _0^{t_n} {\mathbf {E}}\big [ | f({\overline{{\mathcal {X}}}}(t)) - f({\underline{{\mathcal {X}}}}(t)) |^2 \big ] \,\mathrm {d}t&\le L_f^2 k \sum _{i = 1}^N {\mathbf {E}}\big [ | X^{i} - X^{i-1}|^{2 \alpha } \big ]\\&\le L_f^2 T^{\frac{1}{q}} \Big (k \sum _{i =1}^N {\mathbf {E}}\big [ | X^{i} - X^{i-1}|^{2} \big ] \Big )^{\alpha }, \end{aligned}$$

where the last inequality is in fact an equality if $\alpha = 1$, $\frac{1}{q} = 0$ or if $\alpha = 0$, $\frac{1}{q}=1$. Otherwise the inequality follows from Hölder’s inequality with $p = \frac{1}{\alpha } \in (1,\infty )$ and $\frac{1}{p} + \frac{1}{q} = 1$, followed by an application of Jensen’s inequality. Furthermore, Lemma 3.6 states that

$$\begin{aligned} \Big ( \int _0^T {\mathbf {E}}\big [ | g_0 ({\mathcal {W}}(t) - W(t) ) |^2 \big ] \,\mathrm {d}t \Big )^{\frac{1}{2}} = \frac{1}{\sqrt{6}} T^{\frac{1}{2}} |g_0| k^{\frac{1}{2}}. \end{aligned}$$

(3.19)

Therefore, together with (3.8) we arrive at the estimate

$$\begin{aligned} T_1 \le \frac{1}{\sqrt{6}} L_f T^{\frac{2 - \alpha }{2}} |g_0| \Big ( 2 {\mathbf {E}}\big [ |X_0|^2 \big ] + 4 T \big ( \varPhi (0) + |g_0|^2 \big ) \Big )^{\frac{\alpha }{2}} k^{\frac{1 + \alpha }{2}} \end{aligned}$$

for all $n \in \{1,\ldots ,N\}$.

The estimate of $T_2$ follows similarly by additionally making use of the Hölder continuity of the exact solution. To be more precise, we have that

$$\begin{aligned}&\sum _{i =1}^n \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | f(X(t_{i-1})) - f(X(t)) |^2 \big ] \,\mathrm {d}t\\&\quad \le L_f^2 \sum _{i = 1}^N \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | X(t_{i-1}) - X(t) |^{2 \alpha } \big ] \,\mathrm {d}t\\&\quad \le L_f^2 \sum _{i = 1}^N \int _{t_{i-1}}^{t_i} \big ( {\mathbf {E}}\big [ | X(t_{i-1}) - X(t) |^{2} \big ] \big )^{\alpha } \,\mathrm {d}t\\&\quad \le L_f^2 T \Vert X \Vert _{C^{\frac{1}{2}}([0,T];L^2(\varOmega ;{\mathbf {R}}^{d}))}^{2\alpha } k^{\alpha }. \end{aligned}$$

Together with the Cauchy–Schwarz inequality and (3.19), we therefore obtain

$$\begin{aligned} T_2 \le \frac{1}{\sqrt{6}} L_f T |g_0| \Vert X \Vert _{C^{\frac{1}{2}}([0,T];L^2(\varOmega ;{\mathbf {R}}^{d}))}^\alpha k^{\frac{1 + \alpha }{2}}. \end{aligned}$$

Inserting the estimates for $T_1$, $T_2$, and (3.17) into Lemma 3.5 completes the proof. $\square $
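Both applications of Hölder’s inequality in this proof, in (3.17) with exponent $r = \frac{1+\alpha }{2}$ and in the bound for $T_1$ with $r = \alpha $, come down (after Jensen’s inequality) to the elementary inequality $k \sum _{i=1}^N x_i^r \le T^{1-r} \big ( k \sum _{i=1}^N x_i \big )^r$ for $x_i \ge 0$, $r \in [0,1]$ and $T = Nk$. The short sketch below, with a function name of our own, evaluates both sides on random data standing in for ${\mathbf {E}}[|X^i - X^{i-1}|^2]$.

```python
import numpy as np


def discrete_holder(x, k, r):
    """Return (lhs, rhs) of  k * sum_i x_i**r <= T**(1 - r) * (k * sum_i x_i)**r.

    Here x_i >= 0, r in [0, 1] and T = N k with N = len(x). For 0 < r < 1 this is
    Hölder's inequality with p = 1/r applied to k**(1-r) * (k x_i)**r; for r = 0
    and r = 1 both sides coincide (numpy evaluates 0.0**0.0 as 1.0).
    """
    x = np.asarray(x, dtype=float)
    T = k * x.size
    lhs = k * np.sum(x ** r)
    rhs = T ** (1.0 - r) * (k * np.sum(x)) ** r
    return lhs, rhs


rng = np.random.default_rng(0)
alpha, N, T = 0.5, 50, 2.0
x = rng.exponential(size=N)          # stand-in for E|X^i - X^{i-1}|^2
for r in ((1 + alpha) / 2, alpha):   # exponents used in (3.17) and for T_1
    lhs, rhs = discrete_holder(x, T / N, r)
    print(f"r = {r:.2f}: {lhs:.4f} <= {rhs:.4f}")
```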

Remark 3.8

The precise form of the constant C appearing in Theorem 3.7 is, after taking squares,

$$\begin{aligned} C^2&= L_f T^{\frac{1-\alpha }{2}} C_0^{\frac{1+\alpha }{2}} + \frac{2}{\sqrt{6}} L_f T^{\frac{2 - \alpha }{2}} |g_0| \big ( C_0^{\frac{\alpha }{2}} + T^{\frac{\alpha }{2}} \Vert X \Vert _{C^{\frac{1}{2}}([0,T];L^2(\varOmega ;{\mathbf {R}}^{d}))}^\alpha \big ) \end{aligned}$$

with $C_0 = 2 {\mathbf {E}}[ |X_0|^2 ] + 4 T ( \varPhi (0) + |g_0|^2 )$.

Observe that, since we avoid the use of Gronwall-type inequalities, the error constant does not grow exponentially with time T. This indicates that the backward Euler–Maruyama method is particularly suited for long-time simulations as is often required in Markov-chain Monte Carlo methods, for example, in the unadjusted Langevin algorithm [45].
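Theorem 3.7 can be illustrated numerically. The sketch below uses a hypothetical one-dimensional example of our own: $\varPhi (x) = \frac{2}{3}|x|^{3/2}$, so that $f(x) = \mathrm{sign}(x)|x|^{1/2}$ is Hölder continuous with $\alpha = \frac{1}{2}$ (we assume it fits Assumption 3.1, which is not reproduced here). We also assume that scheme (3.6) reads $X^n + k f(X^n) = X^{n-1} + g_0 \varDelta W^n$, that is, (5.3) with $b = 0$ and $g \equiv g_0$. The implicit step is solved in closed form, and the error is measured against a fine-grid reference computed with the same scheme and the same Brownian path, so it is only an empirical proxy for the true error.

```python
import numpy as np


def resolvent(y, k):
    """Solve x + k*sign(x)*sqrt(|x|) = y for x, elementwise.

    With u = sqrt(|x|) the equation becomes u**2 + k*u - |y| = 0; its nonnegative
    root is written as 2|y| / (k + sqrt(k**2 + 4|y|)) to avoid cancellation.
    """
    a = np.abs(y)
    u = 2.0 * a / (k + np.sqrt(k * k + 4.0 * a))
    return np.sign(y) * u * u


def backward_em(x0, g0, dW, k):
    """X^n + k f(X^n) = X^{n-1} + g0 dW^n with f(x) = sign(x) sqrt(|x|).

    dW has shape (paths, N); the result has shape (paths, N + 1) and contains X^0.
    """
    paths, N = dW.shape
    X = np.empty((paths, N + 1))
    X[:, 0] = x0
    for n in range(1, N + 1):
        X[:, n] = resolvent(X[:, n - 1] + g0 * dW[:, n - 1], k)
    return X


def strong_errors(T=1.0, x0=1.0, g0=0.5, coarse=(4, 8, 16, 32), n_ref=1024,
                  paths=2000, seed=1):
    """max_n of the root-mean-square error at the coarse nodes, per coarse N.

    The reference solution uses the same scheme on the fine grid (n_ref steps)
    driven by the same Brownian increments, summed up for the coarse grids.
    """
    rng = np.random.default_rng(seed)
    dW_ref = rng.normal(0.0, np.sqrt(T / n_ref), size=(paths, n_ref))
    X_ref = backward_em(x0, g0, dW_ref, T / n_ref)
    errors = []
    for N in coarse:
        r = n_ref // N                               # fine steps per coarse step
        dW = dW_ref.reshape(paths, N, r).sum(axis=2)
        X = backward_em(x0, g0, dW, T / N)
        rms = np.sqrt(np.mean((X - X_ref[:, ::r]) ** 2, axis=0))
        errors.append(rms.max())
    return np.array(errors)


errs = strong_errors()
steps = 1.0 / np.array([4, 8, 16, 32])
observed_order = np.polyfit(np.log(steps), np.log(errs), 1)[0]
print(errs, observed_order)
```

The fitted slope is an observed order of convergence; by Theorem 3.7 it should not fall noticeably below $\frac{1+\alpha }{4} = \frac{3}{8}$, up to Monte Carlo and reference-solution effects.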

4 Properties of the exact solution in the multi-valued case

In this section, we turn our attention to the multi-valued stochastic differential equation (MSDE) in (1.5). We give a complete account of the assumptions imposed on the coefficient functions. In addition, we collect some results on the existence and uniqueness of a strong solution to the MSDE. We also include useful results on higher moment bounds of the exact solution.

Assumption 4.1

The set-valued mapping $f :{\mathbf {R}}^d \rightarrow 2^{{\mathbf {R}}^d}$ is maximal monotone with ${\text {int}} D(f) \ne \emptyset $. Moreover, there exist constants $\beta , \lambda \in [0,\infty )$, $\mu \in (0,\infty )$, and $p \in [1,\infty )$ such that

$$\begin{aligned} \langle f_v,v\rangle _{} \ge \mu |v|^p - \lambda \quad \text {and}\quad |f_v| \le \beta (1 + |v|^{p-1}) \end{aligned}$$

for every $v \in D(f)$ and $f_v \in f(v)$.

Assumption 4.2

The function $b :{\mathbf {R}}^d \rightarrow {\mathbf {R}}^d$ is Lipschitz continuous; i.e., there exists a constant $L_b \in [0,\infty )$ such that

$$\begin{aligned} |b(v) - b(w)| \le L_b |v-w| \end{aligned}$$

for all $v,w \in {\mathbf {R}}^d$.

Assumption 4.3

The function $g :{\mathbf {R}}^d \rightarrow {\mathbf {R}}^{d,m}$ is Lipschitz continuous; i.e., there exists a constant $L_g \in [0,\infty )$ such that

$$\begin{aligned} |g(v) - g(w)| \le L_g |v-w| \end{aligned}$$

for all $v,w \in {\mathbf {R}}^d$.

Assumption 4.4

The initial value $X_0$ is an ${\mathcal {F}}_0$-measurable and D(f)-valued random variable. Furthermore,

$$\begin{aligned} {\mathbf {E}}[ |X_0|^{\max (2p - 2,2)} ] < \infty , \end{aligned}$$

where the value of p is the same as in Assumption 4.1.

Observe that Assumptions 4.2 and 4.3 directly imply that b and g grow at most linearly. More precisely, after possibly increasing the values of $L_b$ and $L_g$, we obtain the bounds

$$\begin{aligned} |b(v)| \le L_b (1 + |v|),\quad |g(v)| \le L_g (1 + |v|), \end{aligned}$$

(4.1)

for all $v \in {\mathbf {R}}^d$.

Remark 4.5

Without loss of generality, we will assume that $0 \in D(f)$. Otherwise, since the graph of f is not empty, we take $v_0 \in D(f)$ and $f_{v_0} \in f(v_0)$ and replace f, b, and g by suitably shifted mappings, for instance, ${\tilde{f}}(v) := f(v + v_0)$. Then $0 \in D({\tilde{f}})$ holds. Compare further with [48, Sect. 3.3.3].

Next, we introduce the notion of a solution of (1.5), which we use for the remainder of this paper.

Definition 4.6

A tuple $(X,\eta )$ is called a solution of the multi-valued stochastic differential equation (1.5), if the following conditions hold.

(i)
The mapping $X :[0,T] \times \varOmega \rightarrow {\mathbf {R}}^d$ is an $({\mathcal {F}}_t)_{t \in [0,T]}$-adapted, $\mathbf {P}$-almost surely continuous stochastic process such that $X(t) \in \overline{D(f)}$ for all $t \in (0,T]$ with probability one.
(ii)
The mapping $\eta :[0,T] \times \varOmega \rightarrow {\mathbf {R}}^d$ is an $({\mathcal {F}}_t)_{t \in [0,T]}$-adapted stochastic process such that
$$\begin{aligned} \int _0^T |\eta (t)| \,\mathrm {d}t < \infty , \quad {\mathbf {P}}\text {-almost surely.} \end{aligned}$$
(iii)
The equality
$$\begin{aligned} X(t) + \int _{0}^{t} \eta (s) \,\mathrm {d}s = X_0 + \int _{0}^{t} b(X(s)) \,\mathrm {d}s + \int _{0}^{t} g(X(s)) \,\mathrm {d}W(s) \end{aligned}$$
(4.2)
holds for all $t \in [0,T]$ and ${\mathbf {P}}$-almost surely.
(iv)
For ${\mathbf {P}}$-almost all $\omega \in \varOmega $ and almost all $t \in [0,T]$, it holds that $\eta (t,\omega ) \in f(X(t,\omega ))$; since f is maximal monotone, this means that for every $y \in D(f)$ and $f_y \in f(y)$ the inequality
$$\begin{aligned} \langle \eta (t) - f_{y},X(t) - y\rangle _{} \ge 0 \end{aligned}$$
is satisfied for almost every $t \in [0,T]$ and ${\mathbf {P}}$-almost surely, cf. Definition 2.1.

This notion of a solution has been considered in, for example, [7, 21, 42, 52], where the existence of a unique solution is also shown. Note that in [7] the condition on $\eta $ is slightly milder than in [21, 42, 52]; our concept of solution corresponds to the latter sources. For the multi-valued equation it becomes necessary to consider a tuple $(X,\eta )$ as a solution. The process X plays the usual role of the solution of the equation. Since f(X) is now a subset of ${\mathbf {R}}^d$, we select an element $\eta $ of this set such that the inclusion (1.5) becomes an equality when f(X) is replaced by $\eta $.

Due to their importance for the error analysis we next prove certain moment estimates.

Theorem 4.7

Let Assumptions 4.1 to 4.4 be satisfied with $p \in [1,\infty )$. Then there exists a unique solution $(X,\eta )$ of (1.5) in the sense of Definition 4.6. There is a constant $C\in (0,\infty )$ such that

$$\begin{aligned} \sup _{t \in [0,T]} {\mathbf {E}}\big [ |X(t)|^2 \big ] + {\mathbf {E}}\Big [ \int _0^T |X(s)|^p \,\mathrm {d}s \Big ] \le C. \end{aligned}$$

Furthermore, if $p \in (1,\infty )$ and $\frac{1}{p}+\frac{1}{q} = 1$, then

$$\begin{aligned} {\mathbf {E}}\Big [ \int _0^T |\eta (s)|^q \,\mathrm {d}s \Big ] \le C. \end{aligned}$$

Proof

Existence and uniqueness are shown, for instance, in [21]. For

$$\begin{aligned} X(t) = X_0 + \int _{0}^{t} (b(X(s)) - \eta (s)) \,\mathrm {d}s + \int _{0}^{t} g(X(s)) \,\mathrm {d}W(s) \end{aligned}$$

the equality

$$\begin{aligned} |X(t)|^2&= |X_0|^2 + \int _{0}^{t} \big (2 \langle b(X(s)),X(s)\rangle _{} - 2 \langle \eta (s),X(s)\rangle _{} + |g(X(s))|^2\big ) \,\mathrm {d}s\\&\quad + \int _{0}^{t} 2\langle X(s),g(X(s))\,\mathrm {d}W(s)\rangle _{}, \end{aligned}$$

holds by an application of Itô’s formula (see [12, Chap. 4.7, Theorem 7.1]). From the coercivity assumption on f we obtain that

$$\begin{aligned} \langle f_{X(s)},X(s)\rangle _{} \ge \mu |X(s)|^p - \lambda \end{aligned}$$

for every $f_{X(s)} \in f(X(s))$ and almost every $s \in [0,T]$. The fact that $\eta (s) \in f(X(s))$ for almost every $s \in [0,T]$ then implies that

$$\begin{aligned} \int _{0}^{t} \langle \eta (s),X(s)\rangle _{} \,\mathrm {d}s \ge \mu \int _{0}^{t} |X(s)|^p \,\mathrm {d}s - \lambda t. \end{aligned}$$

Since b and g satisfy the linear growth bound (4.1), we have

$$\begin{aligned} \int _{0}^{t} \langle b(X(s)),X(s)\rangle _{}\,\mathrm {d}s \le 2 L_b \int _{0}^{t} \big (1 +|X(s)|^2\big ) \,\mathrm {d}s \end{aligned}$$

as well as

$$\begin{aligned} \int _{0}^{t} |g(X(s))|^2 \,\mathrm {d}s \le 2 L_g^2 \int _{0}^{t} \big ( 1+ |X(s)|^2\big ) \,\mathrm {d}s. \end{aligned}$$

Thus, we get

$$\begin{aligned} |X(t)|^2 + 2 \mu \int _{0}^{t} |X(s)|^p \,\mathrm {d}s&\le |X_0|^2 + \big (4 L_b + 2 L_g^2 \big ) \int _{0}^{t} \big (1 + |X(s)|^2 \big ) \,\mathrm {d}s\\&\quad + 2 \lambda t + \int _{0}^{t} 2\langle X(s),g(X(s)) \,\mathrm {d}W(s)\rangle _{}. \end{aligned}$$

We introduce

$$\begin{aligned} Z(t)&:= |X(t)|^2 + 2 \mu \int _{0}^{t} |X(s)|^p \,\mathrm {d}s, \quad M(t) := \int _{0}^{t} 2\langle X(s),g(X(s)) \,\mathrm {d}W(s)\rangle _{}, \\ \xi (t)&:= |X_0|^2 + 2 ( \lambda + 2 L_b + L_g^2 ) t, \quad \varphi (t): = 4 L_b + 2 L_g^2. \end{aligned}$$

Then Z, M, and $\xi $ are $({\mathcal {F}}_t)_{t\in [0,T]}$-adapted and $\mathbf {P}$-almost surely continuous stochastic processes. Furthermore, M is a local $({\mathcal {F}}_t)_{t\in [0,T]}$-martingale satisfying $M(0) = 0$. Thus, an application of Lemma 2.3 yields, for every $t \in [0,T]$, that

$$\begin{aligned} {\mathbf {E}}\big [Z(t) \big ]&\le \exp \Big (\int _{0}^{t} \varphi (s) \,\mathrm {d}s \Big ) {\mathbf {E}}\big [ \sup _{s\in [0,t]} \xi (s) \big ]\\&= \exp \big ( (4 L_b + 2 L_g^2) t \big ) \big ( {\mathbf {E}}\big [|X_0|^2\big ] + 2 ( \lambda + 2 L_b + L_g^2 ) t \big ). \end{aligned}$$

Inserting the definition of Z then proves the first estimate.

Furthermore, if Assumption 4.1 holds with $p \in (1,\infty )$, then we have, for every $x \in D(f)$ and $f_{x} \in f(x)$, that

$$\begin{aligned} |f_{x} | \le \beta (1 + |x|^{p-1}). \end{aligned}$$

Let $q = \frac{p}{p-1}$, so that $(p-1)q = p$. Therefore, Minkowski’s inequality yields

$$\begin{aligned} \Big (\int _{0}^{T} {\mathbf {E}}\big [ |\eta (s)|^q \big ] \,\mathrm {d}s\Big )^{\frac{1}{q}} \le T^{\frac{1}{q}} \beta + \beta \Big (\int _{0}^{T} {\mathbf {E}}\big [ |X(s)|^p \big ] \,\mathrm {d}s\Big )^{\frac{1}{q}} \le C \end{aligned}$$

since $\eta (s) \in f(X(s))$ for almost every $s \in [0,T]$. $\square $

Remark 4.8

Let us mention that, for instance, in [41, Chapter 4] and the references therein, a weaker notion of a solution to (1.5) is found. More precisely, if $(X,\eta )$ is a solution in the sense of Definition 4.6, then (X, H) is a solution in the sense of [41, Chapter 4] with the definition

$$\begin{aligned} H(t) := \int _0^t \eta (s) \,\mathrm {d}s, \quad t \in [0,T]. \end{aligned}$$

In particular, the process H is a continuous, progressively measurable process with bounded total variation and $H(0) = 0$ almost surely. The stronger condition of absolute continuity of the process H, which is required in Definition 4.6, is essential in the proof of Theorem 6.4 below. This explains why we work with the stronger notion of a solution in Definition 4.6.

5 Well-posedness of the backward Euler method in the multi-valued case

In this section, we show that the backward Euler–Maruyama method (1.6) for the MSDE (1.5) is well-posed under the same assumptions as in the previous section.

Lemma 5.1

Let Assumptions 4.1 and 4.2 be satisfied. Furthermore, let $w \in {\mathbf {R}}^d$ and $k \in (0,T]$ be given with $L_b k \in [0,1)$. Then there exist uniquely determined $x_0 \in D(f)$ and $\eta _{x_0} \in f(x_0)$, which satisfy the nonlinear equation

$$\begin{aligned} x_0 + k \eta _{x_0} - k b(x_0) = w. \end{aligned}$$

(5.1)

Proof

We first show that there exists a unique $x_0\in D(f)$ such that

$$\begin{aligned} x_0 + k f(x_0) - k b(x_0) = (\mathrm {id}+ k f - k b)(x_0)\ni w. \end{aligned}$$

(5.2)

To this end, notice that for all $x,y \in {\mathbf {R}}^d$, the inequalities

$$\begin{aligned} \langle (\mathrm {id}-k b)x - (\mathrm {id}-k b)y,x-y\rangle _{} \ge |x-y|^2 - k L_b |x-y|^2 \ge 0 \end{aligned}$$

hold due to the step size bound. In addition, it follows from (4.1) that

$$\begin{aligned} \frac{\langle (\mathrm {id}-k b)x,x\rangle _{}}{|x|} = \frac{|x|^2 - k \langle b(x),x\rangle _{}}{|x|} \ge (1 - k L_b) |x| - k L_b \end{aligned}$$

for all $x \in {\mathbf {R}}^d \setminus \{0\}$. Hence, the mapping $(\mathrm {id}+ k f - k b)$ is the sum of the maximal monotone operator kf and the mapping $(\mathrm {id}- k b)$, which is single-valued, Lipschitz continuous, monotone, and coercive.

Thus, we can apply [2, Theorem 2.1] and obtain the existence of $x_0 \in D(f)$ such that (5.2) holds. Furthermore, there necessarily exists a corresponding unique element $\eta _{x_0} \in f(x_0)$ with

$$\begin{aligned} \eta _{x_0} = \frac{1}{k} (w - x_0) + b(x_0). \end{aligned}$$

It remains to prove the uniqueness of $x_0$, which directly implies the uniqueness of $\eta _{x_0}$. Assume that there exist $x_1 \in D(f)$ and $\eta _{x_1} \in f(x_1)$ as well as $x_2 \in D(f)$ and $\eta _{x_2} \in f(x_2)$ such that

$$\begin{aligned} x_1 + k \eta _{x_1} - k b(x_1) = w, \quad x_2 + k \eta _{x_2} - k b(x_2) = w. \end{aligned}$$

By considering the difference of these equations tested with $x_1- x_2$, we obtain

$$\begin{aligned} 0&= \langle x_1 - x_2,x_1 - x_2\rangle _{} + k \langle \eta _{x_1} - \eta _{x_2},x_1 - x_2\rangle _{} - k \langle b(x_1) - b(x_2) ,x_1 - x_2\rangle _{} \\&\ge |x_1 - x_2|^2 - k L_b |x_1 - x_2|^2 \ge 0. \end{aligned}$$

Since $1 - k L_b > 0$ we must have $x_1 = x_2$ and the proof is complete. $\square $
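To illustrate Lemma 5.1, consider a hypothetical example of our own: $f = \partial \Vert \cdot \Vert _1$, the subdifferential of the $\ell ^1$-norm, which acts componentwise as the sign function and equals $[-1,1]$ at zero components, together with the linear drift $b(x) = c x$, so that $L_b = |c|$. One can check that this f satisfies Assumption 4.1 with $p = 1$. Then (5.1) decouples into the scalar equations $(1 - kc) x_i + k \eta _i = w_i$, whose unique solution is a soft thresholding of w followed by a rescaling, and $\eta _{x_0}$ is recovered from the formula $\eta _{x_0} = \frac{1}{k}(w - x_0) + b(x_0)$ used in the proof.

```python
import numpy as np


def solve_step_l1(w, k, c):
    """Solve x + k*eta - k*c*x = w with eta in the subdifferential of ||x||_1.

    Hypothetical data: f = subdifferential of the l1-norm (componentwise sign,
    the whole interval [-1, 1] at zero components) and b(x) = c*x, L_b = |c|.
    """
    if abs(c) * k >= 1.0:
        raise ValueError("step size condition L_b k < 1 is violated")
    w = np.asarray(w, dtype=float)
    soft = np.sign(w) * np.maximum(np.abs(w) - k, 0.0)   # soft thresholding
    x = soft / (1.0 - k * c)
    eta = (w - x) / k + c * x     # eta_{x_0} = (w - x_0)/k + b(x_0), cf. the proof
    return x, eta


def in_subdifferential(x, eta, tol=1e-10):
    """Check eta_i = sign(x_i) where x_i != 0 and |eta_i| <= 1 where x_i = 0."""
    nz = x != 0.0
    return bool(np.all(np.abs(eta[nz] - np.sign(x[nz])) <= tol)
                and np.all(np.abs(eta[~nz]) <= 1.0 + tol))


w = np.array([2.0, -0.05, 0.0, -1.5])
x, eta = solve_step_l1(w, k=0.1, c=0.5)
print(x, eta, in_subdifferential(x, eta))
```

Components with $|w_i| \le k$ are mapped to zero, and there $\eta _i = w_i / k$ is a genuine selection from the multi-valued part $[-1,1]$.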

For later use, we note that the solution operator of (5.1) is Lipschitz continuous.

Lemma 5.2

Let Assumptions 4.1 and 4.2 be satisfied. For $k \in (0,T]$ with $L_b k \in [0,1)$ let $S_k :{\mathbf {R}}^d \rightarrow D(f)$ be the solution operator that maps $w \in {\mathbf {R}}^d$ to the unique solution $x_0 \in D(f)$ of (5.1). Then $S_k$ is globally Lipschitz continuous with

$$\begin{aligned} | S_k(w_1) - S_k(w_2)| \le \frac{1}{1 - k L_b} |w_1 - w_2| \quad \text {for all } w_1, w_2 \in {\mathbf {R}}^d. \end{aligned}$$

Proof

Let $w_1, w_2 \in {\mathbf {R}}^d$ and $k \in (0,T]$ with $L_b k \in [0,1)$ be given. Let $x_i = S_k(w_i) \in D(f)$ and $\eta _{x_i} \in f(x_i)$, $i\in \{1,2\}$, denote the unique solutions of the equations

$$\begin{aligned} x_1 + k \eta _{x_1} -k b(x_1) = w_1,\quad x_2 + k \eta _{x_2} -k b(x_2) = w_2. \end{aligned}$$

By considering the difference of these equations, tested with $x_1 - x_2$, we obtain

$$\begin{aligned}&|x_1 - x_2|^2 + k \langle \eta _{x_1} - \eta _{x_2},x_1 - x_2\rangle _{} - k \langle b(x_1) - b(x_2 ),x_1 - x_2\rangle _{} \\&\quad = \langle w_1 - w_2,x_1 - x_2\rangle _{}. \end{aligned}$$

By using the Cauchy–Schwarz inequality for the right-hand side as well as the monotonicity and the Lipschitz continuity for the left-hand side, we get

$$\begin{aligned} (1 - k L_b) |x_1 - x_2|^2 \le |w_1 - w_2| |x_1 - x_2|. \end{aligned}$$

Reinserting $x_i = S_k(w_i)$ then shows that

$$\begin{aligned} |S_k(w_1) - S_k(w_2)| = |x_1 - x_2| \le \frac{1}{1 - k L_b} |w_1 - w_2| \end{aligned}$$

as claimed. $\square $
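For hypothetical data of the same kind ($f = \partial \Vert \cdot \Vert _1$ and $b(x) = c x$ with $L_b = |c|$, an example of our own), the solution operator $S_k$ is soft thresholding at level k followed by division by $1 - kc$. The sketch below samples random pairs and checks the Lipschitz bound of Lemma 5.2. For $c > 0$, pairs whose components all exceed k in absolute value with equal signs attain the ratio $\frac{1}{1 - k L_b}$, so the constant in the lemma cannot be improved in general.

```python
import numpy as np


def S_k(w, k, c):
    """Solution operator of (5.1) for the hypothetical data f = subdifferential
    of the l1-norm and b(x) = c*x: soft thresholding at level k, then scaling."""
    return np.sign(w) * np.maximum(np.abs(w) - k, 0.0) / (1.0 - k * c)


def max_lipschitz_ratio(k, c, d=3, pairs=10_000, seed=0):
    """Largest observed |S_k(w1) - S_k(w2)| / |w1 - w2| over random pairs."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(pairs, d))
    w2 = rng.normal(size=(pairs, d))
    num = np.linalg.norm(S_k(w1, k, c) - S_k(w2, k, c), axis=1)
    den = np.linalg.norm(w1 - w2, axis=1)
    return np.max(num / den)


k, c = 0.2, 1.5                        # L_b k = 0.3 < 1
print(max_lipschitz_ratio(k, c), 1.0 / (1.0 - k * abs(c)))
```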

Theorem 5.3

Let Assumptions 4.1 to 4.4 be satisfied. Then for every step size $k = \frac{T}{N}$, $N \in {\mathbf {N}}$, with $L_b k \in [0,1)$ there exist uniquely determined families of square-integrable, ${\mathbf {R}}^d$-valued and $({\mathcal {F}}_{t_n})_{n \in \{1,\ldots ,N\}}$-adapted random variables $(X^n)_{n\in \{1,\dots ,N\}}$ and $(\eta ^n)_{n\in \{1,\dots ,N\}}$ such that $X^n \in D(f)$, $\eta ^n \in f(X^n)$ for every $n \in \{1,\dots ,N\}$ and

$$\begin{aligned} X^n + k \eta ^n = X^{n-1} + k b(X^n) + g(X^{n-1})\varDelta W^n \end{aligned}$$

(5.3)

for every $n \in \{1,\dots ,N\}$, ${\mathbf {P}}$-almost surely with $X^0 = X_0$ and $\eta ^0 \in f(X_0)$.

Proof

We prove the existence of $(X^n)_{n\in \{0,\dots ,N\}}$ and $(\eta ^n)_{n\in \{0,\dots ,N\}}$ by induction over $n \in \{0,\ldots ,N\}$. From the assumptions on $X_0$ and f it is clear that $X^0=X_0$ and $\eta ^0 \in f(X_0)$ are ${\mathcal {F}}_{t_0}$-adapted and square-integrable. In particular, it follows from Assumptions 4.1 and 4.4 that

$$\begin{aligned} {\mathbf {E}}\big [ |\eta ^0 |^2 \big ] \le \beta ^2 {\mathbf {E}}\big [ (1 + |X_0|^{p-1})^2 \big ] \le 2 \beta ^2 \big ( 1 + {\mathbf {E}}[ |X_0|^{2p-2} ] \big ). \end{aligned}$$

Next, we assume that $(X^j)_{j \in \{0,\ldots ,n-1\}}$ and $(\eta ^j)_{j \in \{0,\ldots ,n-1\}}$ are adapted to $({\mathcal {F}}_{t_{j}})_{j \in \{0,\ldots ,n-1\}}$, square-integrable and satisfy (5.3) for all $j \in \{1,\ldots ,n-1\}$. By Lemma 5.1 there exist uniquely determined $X^n(\omega ) \in D(f)$ and $\eta ^n(\omega ) \in f(X^n(\omega ))$ for every $\omega \in \varOmega $ such that

$$\begin{aligned} X^n(\omega ) + k \eta ^n(\omega ) = X^{n-1}(\omega ) + k b(X^n(\omega )) + g(X^{n-1}(\omega ))\varDelta W^n(\omega ). \end{aligned}$$

By Lemma 5.2, the solution operator $S_{k} :{\mathbf {R}}^d \rightarrow D(f)$ that maps $X^{n-1}(\omega ) + g(X^{n-1}(\omega ))\varDelta W^n(\omega )$ to $X^n(\omega ) \in D(f)$ is Lipschitz continuous. Since $S_{k}$ is continuous and of linear growth, and since $X^{n-1} + g(X^{n-1}) \varDelta W^n$ is ${\mathcal {F}}_{t_n}$-measurable and square-integrable (note that ${\mathbf {E}}[ |g(X^{n-1}) \varDelta W^n|^2 ] = k {\mathbf {E}}[ |g(X^{n-1})|^2 ]$ by the independence of $\varDelta W^n$ and ${\mathcal {F}}_{t_{n-1}}$), it follows that $X^n$ is an ${\mathcal {F}}_{t_n}$-measurable and square-integrable random variable. To be more precise, we have the bound

$$\begin{aligned} \big \Vert X^n \big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}&=\big \Vert S_{k}(X^{n-1} + g(X^{n-1}) \varDelta W^n) \big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}\\&\le | S_{k}(0) | + \frac{1}{1 - k L_b} \big \Vert X^{n-1} + g(X^{n-1}) \varDelta W^n \big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}. \end{aligned}$$

This implies, in particular, that

$$\begin{aligned} \eta ^n = - \frac{1}{k}\big (X^n - X^{n-1}\big ) + b(X^n) + g(X^{n-1})\frac{\varDelta W^n}{k} \quad \text {a.s. in } \varOmega \end{aligned}$$

is also an ${\mathcal {F}}_{t_n}$-measurable and square-integrable random variable, as $X^n$, $X^{n-1}$, and $g(X^{n-1})\varDelta W^n$ have these properties and b is Lipschitz continuous. This completes the induction and hence the proof of the theorem. $\square $
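The construction in this proof translates directly into an algorithm: compute the explicit part $X^{n-1} + g(X^{n-1})\varDelta W^n$, apply the solution operator $S_k$ of Lemma 5.2, and recover $\eta ^n$ from the formula above. The sketch below does this for $d = m = 1$ with hypothetical data of our own choosing: $f = \partial |\cdot |$ (the sign function, equal to $[-1,1]$ at zero), $b(x) = c x$ and $g(x) = \sigma (1 + \sin x)$, which are Lipschitz continuous with $L_b = |c|$ and $L_g = \sigma $. For this f, $S_k$ is soft thresholding at level k divided by $1 - kc$.

```python
import numpy as np


def multivalued_backward_em(x0, T, N, c=-1.0, sigma=0.5, seed=0):
    """Scheme (5.3) for d = m = 1 with hypothetical data
    f = subdifferential of |x|, b(x) = c*x, g(x) = sigma*(1 + sin(x)).

    Returns X (length N+1, with X[0] = x0), eta (eta[n-1] = eta^n) and dW.
    """
    k = T / N
    if abs(c) * k >= 1.0:
        raise ValueError("step size condition L_b k < 1 is violated")

    def g(x):
        return sigma * (1.0 + np.sin(x))

    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(k), size=N)
    X = np.empty(N + 1)
    eta = np.empty(N)
    X[0] = x0
    for n in range(1, N + 1):
        w = X[n - 1] + g(X[n - 1]) * dW[n - 1]                  # explicit part
        X[n] = np.sign(w) * max(abs(w) - k, 0.0) / (1.0 - k * c)  # X^n = S_k(w)
        # eta^n from the formula in the proof of Theorem 5.3
        eta[n - 1] = (-(X[n] - X[n - 1]) / k + c * X[n]
                      + g(X[n - 1]) * dW[n - 1] / k)
    return X, eta, dW


X, eta, dW = multivalued_backward_em(0.3, T=1.0, N=100)
```

Whenever $X^n = 0$, the recovered $\eta ^n$ lies in $[-1,1]$ rather than being forced to $\pm 1$, which is exactly the selection required by $\eta ^n \in f(X^n)$.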

Next we state an a priori estimate for the sequence of random variables satisfying recursion (1.6).

Lemma 5.4

Let Assumptions 4.1 to 4.4 be satisfied. For a step size $k = \frac{T}{N}$, $N \in {\mathbf {N}}$, with $5 L_b k \in [0,1)$, let $(X^n)_{n \in \{0,\dots ,N\}}$ and $(\eta ^n)_{n\in \{0,\dots ,N\}}$ be two families of $({\mathcal {F}}_{t_n})_{n \in \{0,\ldots ,N\}}$-adapted random variables as stated in Theorem 5.3. Then there exists $K_X \in (0,\infty )$ independent of the step size $k = \frac{T}{N}$ such that

$$\begin{aligned} \begin{aligned}&\max _{n \in \{1,\ldots ,N\}} {\mathbf {E}}\big [ |X^n|^2 \big ] + \frac{1}{2} \sum _{ j = 1}^N {\mathbf {E}}\big [ | X^{j} - X^{j-1} |^2 \big ] + 2 \mu k \sum _{ j = 1}^N {\mathbf {E}}\big [|X^j|^p\big ] \le K_X. \end{aligned} \end{aligned}$$

(5.4)

If, in addition, $p \in (1,\infty )$, then there exists $K_\eta \in (0,\infty )$ independent of the step size $k = \frac{T}{N}$ such that

$$\begin{aligned} k \sum _{ j = 1}^N {\mathbf {E}}\big [|\eta ^j|^q\big ] \le K_{\eta }, \end{aligned}$$

(5.5)

where $q \in (1,\infty )$ is given by $\frac{1}{p} + \frac{1}{q} = 1$.

Remark 5.5

If $p = 1$ in Assumption 4.1, then f and, hence, $(\eta ^n)_{n \in \{1,\ldots ,N\}}$ are bounded. In particular, (5.5) holds for any $q \in (1,\infty )$ and for any step size $k = \frac{T}{N}$ with $L_b k \in [0,1)$.

Proof of Lemma 5.4

First, we recall the identity

$$\begin{aligned} \langle X^n - X^{n-1},X^n\rangle _{} = \frac{1}{2} \big ( |X^n|^2 - |X^{n-1}|^2 + |X^n - X^{n-1}|^2 \big ). \end{aligned}$$

As $\eta ^n \in f(X^n)$, using Assumptions 4.1 and 4.2, it follows that

$$\begin{aligned}&\frac{1}{2} \big ( |X^n|^2 - |X^{n-1}|^2 + |X^n - X^{n-1}|^2 \big ) + k \mu |X^n|^p \\&\quad \le \langle X^n - X^{n-1}, X^n \rangle + k \langle \eta ^n,X^n\rangle _{} + k \lambda \\&\quad = k \langle b(X^n),X^n\rangle _{} + \langle g(X^{n-1}) \varDelta W^n, X^n \rangle + k \lambda \\&\quad \le k L_b (1 + |X^n|)|X^n| + \langle g(X^{n-1}) \varDelta W^n, X^n \rangle + k \lambda , \end{aligned}$$

where we also applied (4.1). Hence,

$$\begin{aligned}&\frac{1}{2} \big ( |X^n|^2 - |X^{n-1}|^2 + |X^n - X^{n-1}|^2 \big ) + k \mu |X^n|^p \\&\quad \le k (\lambda + L_b) + \frac{5}{4} k L_b |X^n|^2 + \langle g(X^{n-1}) \varDelta W^n, X^n - X^{n-1} \rangle \\&\qquad + \langle g(X^{n-1}) \varDelta W^n, X^{n-1} \rangle \\&\quad \le k (\lambda + L_b) + \frac{5}{4} k L_b |X^n|^2 + \big | g(X^{n-1}) \varDelta W^n \big |^2 + \frac{1}{4} | X^n - X^{n-1}|^2\\&\qquad + \langle g(X^{n-1}) \varDelta W^n, X^{n-1} \rangle , \end{aligned}$$

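for every $n \in \{1,\ldots ,N\}$, where we also applied the Cauchy–Schwarz and weighted Young inequalities in the form $|X^n| \le 1 + \frac{1}{4} |X^n|^2$ and $\langle a,b\rangle _{} \le |a|^2 + \frac{1}{4} |b|^2$ for $a, b \in {\mathbf {R}}^d$. After absorbing the term $\frac{1}{4} |X^n - X^{n-1}|^2$ into the left-hand side (a kick-back argument) and multiplying by 2, we sum from 1 to $n \in \{1,\ldots ,N\}$ to obtain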

$$\begin{aligned}&|X^n|^2 + \frac{1}{2} \sum _{ j = 1}^n | X^{j} - X^{j-1} |^2 + 2 k \mu \sum _{j = 1}^n |X^j|^p \\&\quad \le |X^0|^2 + 2 (\lambda + L_b) T + \frac{5}{2} k L_b \sum _{j = 1}^n |X^j|^2 + 2 \sum _{j = 1}^n \big | g(X^{j-1}) \varDelta W^j \big |^2\\&\qquad + 2 \sum _{j = 1}^n \big \langle g(X^{j-1}) \varDelta W^j, X^{j-1} \big \rangle . \end{aligned}$$

After taking expectations, the last term on the right-hand side vanishes. Then, applications of Itô’s isometry and (4.1) give

$$\begin{aligned}&{\mathbf {E}}\big [ |X^n|^2 \big ] + \frac{1}{2} \sum _{ j = 1}^n {\mathbf {E}}\big [ | X^{j} - X^{j-1} |^2 \big ] + 2 k \mu \sum _{j = 1}^n {\mathbf {E}}\big [|X^j|^p\big ] \\&\quad \le {\mathbf {E}}\big [ |X^0|^2 \big ] + 2 (\lambda + L_b) T + \frac{5}{2} k L_b \sum _{j = 1}^n {\mathbf {E}}\big [ |X^j|^2\big ] + 2 \sum _{j = 1}^n {\mathbf {E}}\big [ \big | g(X^{j-1}) \varDelta W^j \big |^2 \big ] \\&\quad \le {\mathbf {E}}\big [ |X^0|^2 \big ] + 2 (\lambda + L_b) T + \frac{5}{2} k L_b \sum _{j = 1}^n {\mathbf {E}}\big [ |X^j|^2\big ] + 2 k \sum _{j = 1}^n {\mathbf {E}}\big [ | g(X^{j-1})|^2 \big ]\\&\quad \le (1 + 4 k L_g^2 ) {\mathbf {E}}\big [ |X^0|^2 \big ] + 2 (\lambda + L_b + 2 L_g^2) T + k \Big ( \frac{5}{2} L_b + 4 L_g^2\Big ) \sum _{j = 1}^{n-1} {\mathbf {E}}\big [ |X^j|^2\big ]\\&\qquad + \frac{5}{2} k L_b {\mathbf {E}}\big [ |X^n|^2\big ]. \end{aligned}$$

Since the step size bound $5 L_b k \in [0,1)$ ensures that

$$\begin{aligned} 1 - \frac{5}{2} k L_b > \frac{1}{2}, \end{aligned}$$

the discrete Gronwall inequality (see, for example, [8]) is applicable and completes the proof of (5.4). Finally, it follows from the polynomial growth bound on f, Minkowski's inequality, and the identity $(p-1)q = p$ that

$$\begin{aligned} \Big (k \sum _{j = 1}^N {\mathbf {E}}\big [|\eta ^j|^q\big ]\Big )^{\frac{1}{q}}&\le \Big (k \sum _{j = 1}^N {\mathbf {E}}\big [\beta ^q (1 + |X^j|^{p-1})^q\big ]\Big )^{\frac{1}{q}}\\&\le \beta T^{\frac{1}{q}} + \beta \Big (k \sum _{j = 1}^N {\mathbf {E}}\big [ |X^j|^p \big ] \Big )^{\frac{1}{q}}, \end{aligned}$$

and an application of (5.4) then yields (5.5). $\square $
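Two elementary facts drive the first display of this proof: the identity $\langle a - b, a \rangle = \frac{1}{2}(|a|^2 - |b|^2 + |a-b|^2)$, applied with $a = X^n$ and $b = X^{n-1}$, and the weighted Young inequality $\langle u, v \rangle \le |u|^2 + \frac{1}{4}|v|^2$, applied with $u = g(X^{n-1}) \varDelta W^n$ and $v = X^n - X^{n-1}$. As a quick sanity check (a sketch, not part of the argument), the following Python snippet verifies both on random vectors; the dimension, sample size, and seed are arbitrary illustrative choices.

```python
import random

def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Componentwise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def check_energy_tools(d=3, trials=1000, seed=0):
    """Check <a - b, a> = (|a|^2 - |b|^2 + |a - b|^2) / 2 and the weighted Young
    inequality <u, a - b> <= |u|^2 + |a - b|^2 / 4 on random vectors.
    Returns the largest identity defect and whether Young's inequality held."""
    rng = random.Random(seed)
    max_defect, young_ok = 0.0, True
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stands in for X^n
        b = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stands in for X^{n-1}
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stands in for g(X^{n-1}) dW^n
        diff = sub(a, b)
        lhs = dot(diff, a)
        rhs = 0.5 * (dot(a, a) - dot(b, b) + dot(diff, diff))
        max_defect = max(max_defect, abs(lhs - rhs))
        if dot(u, diff) > dot(u, u) + 0.25 * dot(diff, diff) + 1e-12:
            young_ok = False
    return max_defect, young_ok

defect, ok = check_energy_tools()
print(defect < 1e-12, ok)
```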

6 Error estimates in the multi-valued case

In this section we derive an error estimate for the backward Euler–Maruyama method given by (1.6) for the MSDE (1.5).

To prove convergence of the scheme (1.6), we first fix some notation. Throughout this section, we assume that the equidistant step size $k = \frac{T}{N}$ is small enough so that the a priori estimates in Lemma 5.4 hold. Furthermore, as in (3.9) and (3.10), we denote the piecewise linear interpolants of the discrete values by ${\mathcal {X}}(0) = X^0$, ${\mathcal {H}}(0) = \eta ^0$ for $\eta ^0 \in f(X^0)$ and

$$\begin{aligned} {\mathcal {X}}(t) := \frac{t-t_{n-1}}{k} X^n + \frac{t_n-t}{k} X^{n-1},&{\mathcal {H}}(t) := \frac{t-t_{n-1}}{k} \eta ^n + \frac{t_n-t}{k} \eta ^{n-1} \end{aligned}$$

for all $t \in (t_{n-1},t_n]$ and $n \in \{1,\ldots ,N\}$. Similarly, we define the piecewise constant interpolants by ${\overline{{\mathcal {X}}}}(0)={\underline{{\mathcal {X}}}}(0)=X^0$, ${\overline{{\mathcal {H}}}}(0) = {\underline{{\mathcal {H}}}}(0) = \eta ^0$, and

$$\begin{aligned}&{\overline{{\mathcal {X}}}}(t) = X^n \quad \text {and} \quad {\underline{{\mathcal {X}}}}(t) = X^{n-1}, \text { as well as}\\&{\overline{{\mathcal {H}}}}(t) = \eta ^n \quad \text {and} \quad {\underline{{\mathcal {H}}}}(t) = \eta ^{n-1}, \end{aligned}$$

for all $t \in (t_{n-1},t_n]$ and $n \in \{1,\ldots ,N\}$. Moreover, we introduce the stochastic processes $G :[0,T] \times \varOmega \rightarrow {\mathbf {R}}^d$ and ${\mathcal {G}}:[0,T] \times \varOmega \rightarrow {\mathbf {R}}^d$ defined by

$$\begin{aligned} G(t) = \int _{0}^{t} g(X(s)) \,\mathrm {d}W(s), \quad \text { for all } t \in [0,T], \end{aligned}$$

(6.1)

as well as by ${\mathcal {G}}(0) = 0$ and, for all $n \in \{1,\ldots ,N\}$ and $t \in (t_{n-1},t_n]$,

$$\begin{aligned} \begin{aligned} {\mathcal {G}}(t)&= \frac{t - t_{n-1}}{k}g(X^{n-1}) \varDelta W^n + \sum _{i=1}^{n-1} g(X^{i-1}) \varDelta W^i\\&= \frac{t - t_{n-1}}{k} g(X^{n-1}) \varDelta W^n + \int _{0}^{t_{n-1}} g({\underline{{\mathcal {X}}}}(s)) \,\mathrm {d}W(s). \end{aligned}\nonumber \\ \end{aligned}$$

(6.2)

In view of (1.6) and the definition of ${\mathcal {G}}$ for $t \in (t_{n-1},t_n]$, $n \in \{1,\dots ,N\}$, we obtain the representation

$$\begin{aligned} \begin{aligned} {\mathcal {X}}(t)&= X^{n-1} + \frac{t - t_{n-1}}{k} \big (X^n - X^{n-1}\big )\\&= \Big ( X^0 + k \sum _{i=1}^{n-1} \big (b(X^i) - \eta ^i\big ) + \sum _{i=1}^{n-1} g(X^{i-1}) \varDelta W^i \Big ) \\&\quad + \frac{t - t_{n-1}}{k} \Big (k b(X^n) - k \eta ^n + g(X^{n-1}) \varDelta W^n\Big )\\&= X_0 + \int _{0}^{t} \big (b({\overline{{\mathcal {X}}}}(s)) - {\overline{{\mathcal {H}}}}(s)\big ) \,\mathrm {d}s + {\mathcal {G}}(t). \end{aligned}\nonumber \\ \end{aligned}$$

(6.3)
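To make the scheme (1.6) and the interpolant in (6.3) concrete, here is a minimal one-dimensional Python sketch. It assumes, purely for illustration, $f = \partial |\cdot |$ (the subdifferential of the absolute value, revisited in Sect. 7.1), a linear drift $b(x) = -a x$ with $a \ge 0$, and $g(x) = \sigma (1 + \frac{1}{2}\sin x)$; these coefficients and all function names are hypothetical and not taken from the paper. For this f the implicit step has a closed form: with $y = X^{n-1} + g(X^{n-1}) \varDelta W^n$, the equation $(1 + k a) X^n + k \eta ^n = y$, $\eta ^n \in f(X^n)$, is solved by $X^n = S_k(y)/(1 + k a)$ with the soft-thresholding map $S_k(y) = \mathrm{sign}(y) \max (|y| - k, 0)$, and then $\eta ^n = (y - (1 + k a) X^n)/k$.

```python
import math
import random

def soft_threshold(y, tau):
    """Resolvent (I + tau * d|.|)^{-1}: returns sign(y) * max(|y| - tau, 0)."""
    return math.copysign(max(abs(y) - tau, 0.0), y)

def drift_b(x, a):
    """Hypothetical Lipschitz drift b(x) = -a * x."""
    return -a * x

def diffusion_g(x, sigma):
    """Hypothetical Lipschitz diffusion g(x) = sigma * (1 + 0.5 * sin(x))."""
    return sigma * (1.0 + 0.5 * math.sin(x))

def backward_euler_maruyama(x0, T, N, a=1.0, sigma=0.5, seed=0):
    """Scheme X^n - X^{n-1} = k (b(X^n) - eta^n) + g(X^{n-1}) dW^n, eta^n in f(X^n),
    with f the subdifferential of |x|.
    Returns grid times, states [X^0, ..., X^N], selections [eta^1, ..., eta^N]
    and increments [dW^1, ..., dW^N]."""
    rng = random.Random(seed)
    k = T / N
    xs, etas, dws = [x0], [], []
    for _ in range(N):
        x_prev = xs[-1]
        dw = rng.gauss(0.0, math.sqrt(k))          # dW^n ~ N(0, k)
        y = x_prev + diffusion_g(x_prev, sigma) * dw
        # Solve (1 + k a) X^n + k eta^n = y with eta^n in f(X^n) exactly.
        x_new = soft_threshold(y, k) / (1.0 + k * a)
        eta = (y - (1.0 + k * a) * x_new) / k
        xs.append(x_new)
        etas.append(eta)
        dws.append(dw)
    times = [n * k for n in range(N + 1)]
    return times, xs, etas, dws

def interpolant(times, xs, t):
    """Piecewise linear interpolant of (X^n) at t, using the cell t in (t_{n-1}, t_n]."""
    k = times[1] - times[0]
    n = min(max(math.ceil(t / k), 1), len(xs) - 1)
    return (t - times[n - 1]) / k * xs[n] + (times[n] - t) / k * xs[n - 1]

times, xs, etas, dws = backward_euler_maruyama(x0=1.0, T=1.0, N=100)
print(xs[-1], interpolant(times, xs, 0.505))
```

Solving the implicit step in closed form keeps the sketch exact; for a general maximal monotone f one would instead need a nonlinear solver or a known resolvent.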

We begin the derivation of our error estimate by considering the difference between the stochastic integral G and its approximation ${\mathcal {G}}$.

Lemma 6.1

Let Assumptions 4.1 to 4.4 be satisfied. Then there exists $K_G \in (0,\infty )$ such that, for every equidistant step size $k = \frac{T}{N}$, $N \in {\mathbf {N}}$ with $5 L_b k \in [0,1)$ and every $t \in [0,T]$, we have

$$\begin{aligned} \begin{aligned} \big \Vert G(t) - {\mathcal {G}}(t) \big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}^2 \le K_G k + 2L_g^2 \int _0^t {\mathbf {E}}\big [ | X(s) - {\mathcal {X}}(s) |^2 \big ] \,\mathrm {d}s. \end{aligned} \end{aligned}$$

(6.4)

In addition, for every $\rho \in [2,\infty )$, there exists $K_\rho \in (0,\infty )$ such that, for every $n \in \{1,\ldots ,N\}$ and $t \in (t_{n-1},t_n]$, the following estimates hold:

$$\begin{aligned} \Big ( \int _{t_{n-1}}^{t} {\mathbf {E}}\big [ | G(t) - G(s) |^\rho \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho }}&\le K_\rho k^{\frac{1}{2}} \Big ( \int _{t_{n-1}}^{t} \big (1 + {\mathbf {E}}\big [ | X(s) |^\rho \big ] \big ) \,\mathrm {d}s \Big )^{\frac{1}{\rho }} \end{aligned}$$

(6.5)

and

$$\begin{aligned} \sup _{s \in [t_{n-1},t]} \Vert {\mathcal {G}}(t) - {\mathcal {G}}(s) \Vert _{L^\rho (\varOmega ;{\mathbf {R}}^d)}^{\rho }&\le K_\rho k^{\frac{\rho }{2}} \big ( 1 + \Vert X^{n-1}\Vert _{L^\rho (\varOmega ;{\mathbf {R}}^d)}^{\rho }\big ). \end{aligned}$$

(6.6)

Proof

Recall the definitions of G and ${\mathcal {G}}$ from (6.1) and (6.2). Let $n \in \{1,\ldots ,N\}$ and $t \in (t_{n-1},t_n]$. Adding and subtracting the term $\int _{t_{n-1}}^t g({\underline{{\mathcal {X}}}}(s)) \,\mathrm {d}W(s)$ and applying the triangle inequality, we arrive at

$$\begin{aligned}&\big \Vert G(t) - {\mathcal {G}}(t) \big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)} \le \Big \Vert \int _0^t \big ( g(X(s)) - g({\underline{{\mathcal {X}}}}(s)) \big ) \,\mathrm {d}W(s) \Big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}\\ {}&\qquad + \Big \Vert \int _{t_{n-1}}^t g({\underline{{\mathcal {X}}}}(s)) \,\mathrm {d}W(s) - \frac{t - t_{n-1} }{k} g(X^{n-1}) \varDelta W^n \Big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)} \\ {}&\quad = \Big ( \int _0^t {\mathbf {E}}\big [ | g(X(s)) - g({\underline{{\mathcal {X}}}}(s)) |^2 \big ] \,\mathrm {d}s \Big )^{\frac{1}{2}}\\ {}&\qquad + \Big \Vert g(X^{n-1}) \Big ( \frac{t_n - t}{k} \big ( W(t) - W(t_{n-1}) \big )\\ {}&\qquad \qquad - \frac{t - t_{n-1} }{k} \big (W(t_n) - W(t) \big )\Big ) \Big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}\end{aligned}$$

by an application of Itô’s isometry. Furthermore, due to the Lipschitz continuity of g we obtain

$$\begin{aligned}&\Big ( \int _0^t {\mathbf {E}}\big [ | g(X(s)) - g({\underline{{\mathcal {X}}}}(s)) |^2 \big ] \,\mathrm {d}s \Big )^{\frac{1}{2}}\\&\quad \le L_g \Big ( \int _0^t {\mathbf {E}}\big [ | X(s) - {\mathcal {X}}(s) |^2 \big ] \,\mathrm {d}s \Big )^{\frac{1}{2}} + L_g \Big ( \int _0^t {\mathbf {E}}\big [ | {\mathcal {X}}(s) - {\underline{{\mathcal {X}}}}(s) |^2 \big ] \,\mathrm {d}s \Big )^{\frac{1}{2}}\\&\quad \le L_g \Big ( \int _0^t {\mathbf {E}}\big [ |X(s) - {\mathcal {X}}(s)|^2 \big ] \,\mathrm {d}s \Big )^{\frac{1}{2}} + L_g \Big ( \frac{1}{3} k \sum _{i = 1}^n {\mathbf {E}}\big [ |X^i - X^{i-1}|^2 \big ] \Big )^{\frac{1}{2}}, \end{aligned}$$

where the last step follows from the identity

$$\begin{aligned} {\mathcal {X}}(s) - {\underline{{\mathcal {X}}}}(s) = \frac{s-t_{i-1}}{k} X^i + \frac{t_i-s}{k} X^{i-1} - X^{i-1} = \frac{s-t_{i-1}}{k} \big (X^i - X^{i-1}\big ) \end{aligned}$$

which holds for every $s \in (t_{i-1},t_i]$, $i \in \{1,\ldots ,N\}$. Finally, for every $t \in (t_{n-1},t_n]$, the same arguments as in the proof of Lemma 3.6 together with (4.1) yield

$$\begin{aligned}&\Big \Vert g(X^{n-1}) \Big ( \frac{t_n - t}{k} \big ( W(t) - W(t_{n-1}) \big ) - \frac{t - t_{n-1} }{k} \big (W(t_n) - W(t) \big )\Big ) \Big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}^2\\&\quad = \frac{1}{k^2} \big ( (t_n - t)^2 (t - t_{n-1}) + (t - t_{n-1})^2 (t_n - t)\big ) \big \Vert g(X^{n-1}) \big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}^2\\&\quad = \frac{1}{k} (t_n - t) (t - t_{n-1}) \big \Vert g(X^{n-1}) \big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)}^2\\&\quad \le \frac{1}{4} L_g^2 k \big (1+ \Vert X^{n-1} \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)} \big )^2. \end{aligned}$$

Together with the a priori bounds from Lemma 5.4 this shows (6.4).
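The two elementary computations behind (6.4) can be checked in exact rational arithmetic: $\int _{t_{i-1}}^{t_i} \big (\frac{s-t_{i-1}}{k}\big )^2 \,\mathrm {d}s = \frac{k}{3}$, and the variance identity $\frac{1}{k^2}\big ((t_n-t)^2(t-t_{n-1}) + (t-t_{n-1})^2(t_n-t)\big ) = \frac{1}{k}(t_n-t)(t-t_{n-1}) \le \frac{k}{4}$, with equality in the last bound at the midpoint of the cell. A small Python sketch (the step size and sample points are arbitrary; Simpson's rule is used only because it integrates these low-degree polynomials exactly):

```python
from fractions import Fraction

def simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))

def ramp_integral(k):
    """int_{t_{i-1}}^{t_i} ((s - t_{i-1}) / k)^2 ds, shifted so that t_{i-1} = 0."""
    return simpson(lambda s: (s / k) ** 2, Fraction(0), k)

def bridge_variance_factor(k, theta):
    """Left- and right-hand sides of the variance identity at t = t_{n-1} + theta * k."""
    a = theta * k      # t - t_{n-1}
    b = k - a          # t_n - t
    lhs = (b ** 2 * a + a ** 2 * b) / k ** 2
    rhs = a * b / k
    return lhs, rhs

k = Fraction(1, 7)
print(ramp_integral(k) == k / 3)
print(all(l == r and r <= k / 4
          for l, r in (bridge_variance_factor(k, Fraction(j, 10)) for j in range(11))))
```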

It remains to prove the estimates (6.5) and (6.6). For (6.5) we first apply the Burkholder–Davis–Gundy-type inequality from Lemma 2.2 with constant $C_\rho $ and obtain for every $n \in \{1,\ldots ,N\}$ and $t \in (t_{n-1},t_n]$ that

$$\begin{aligned} \int _{t_{n-1}}^{t} {\mathbf {E}}\big [ | G(t) - G(s) |^{\rho } \big ] \,\mathrm {d}s&\le C_\rho ^{\rho } \int _{t_{n-1}}^{t} (t - s)^{\frac{\rho - 2}{2}} \int _s^{t} {\mathbf {E}}\big [ |g(X(\tau ))|^\rho \big ] \,\mathrm {d}\tau \,\mathrm {d}s\\&\le \frac{2^{\rho } }{\rho } C_{\rho }^{\rho } L_g^\rho k^{\frac{\rho }{2}} \int _{t_{n-1}}^{t} \big ( 1 + {\mathbf {E}}\big [ | X(\tau )|^\rho \big ] \big ) \,\mathrm {d}\tau , \end{aligned}$$

where we also made use of the linear growth bound (4.1) in the last step. This proves (6.5). The bound in (6.6) can be shown by analogous arguments. $\square $
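For the reader's convenience, the constant in the last display can be traced as follows. Using $|g(x)|^\rho \le L_g^\rho (1+|x|)^\rho \le 2^{\rho -1} L_g^\rho (1 + |x|^\rho )$, bounding the inner integral $\int _s^t$ by $\int _{t_{n-1}}^t$, and evaluating the remaining outer integral exactly, one obtains

$$\begin{aligned} \int _{t_{n-1}}^{t} (t - s)^{\frac{\rho - 2}{2}} \,\mathrm {d}s = \frac{2}{\rho } (t - t_{n-1})^{\frac{\rho }{2}} \le \frac{2}{\rho } k^{\frac{\rho }{2}}, \qquad C_\rho ^\rho \cdot 2^{\rho -1} L_g^\rho \cdot \frac{2}{\rho } k^{\frac{\rho }{2}} = \frac{2^{\rho }}{\rho } C_\rho ^\rho L_g^\rho k^{\frac{\rho }{2}}. \end{aligned}$$

Taking $\rho $-th roots shows that (6.5) holds, for instance, with $K_\rho = (2^\rho /\rho )^{1/\rho } C_\rho L_g$; this is one admissible choice of the constant in (6.5).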

The next lemma generalizes an important estimate from the proof of Theorem 3.7 to the multi-valued setting. In particular, we refer to Lemma 3.5 and (3.17).

Lemma 6.2

Let Assumptions 4.1 to 4.4 be satisfied. For every step size $k = \frac{T}{N}$, $N \in {\mathbf {N}}$, with $5 L_b k \in [0,1)$, let $(X^n)_{n\in \{0,\dots ,N\}}$ and $(\eta ^n)_{n\in \{0,\dots ,N\}}$ be the families of random variables from Theorem 5.3. Then there exists $K_{\delta \eta } \in (0,\infty )$ independent of the step size k such that

$$\begin{aligned} 0\le&k \sum _{i=1}^{N} {\mathbf {E}}\big [ \langle \eta ^i - \eta ^{i-1},X^i - X^{i-1}\rangle _{}\big ] \le K_{\delta \eta } k^{\frac{1}{2}}. \end{aligned}$$

Proof

The nonnegativity follows immediately from the monotonicity of f. To prove the second inequality, we insert the scheme (5.3) and obtain

$$\begin{aligned}&k \sum _{i=1}^{N} {\mathbf {E}}\big [ \langle \eta ^i - \eta ^{i-1},X^i - X^{i-1}\rangle _{}\big ] \nonumber \\&\quad = k \sum _{i=1}^{N} {\mathbf {E}}\big [ \langle \eta ^i - \eta ^{i-1},k (b(X^i) - \eta ^i) + g(X^{i-1}) \varDelta W^i\rangle _{}\big ] \nonumber \\&\quad = - k^2 \sum _{i=1}^{N} {\mathbf {E}}\big [ \langle \eta ^i - \eta ^{i-1},\eta ^i\rangle _{}\big ] \end{aligned}$$

(6.7)

$$\begin{aligned}&\qquad + k \sum _{i=1}^{N} {\mathbf {E}}\big [ \langle \eta ^i - \eta ^{i-1} ,kb(X^i) + g(X^{i-1}) \varDelta W^i\rangle _{}\big ]. \end{aligned}$$

(6.8)

For (6.7) we obtain

$$\begin{aligned} - k^2 \sum _{i=1}^{N} {\mathbf {E}}\big [ \langle \eta ^i - \eta ^{i-1},\eta ^i\rangle _{}\big ]&=- \frac{k^2}{2} \sum _{i=1}^{N} {\mathbf {E}}\big [ |\eta ^i |^2 - |\eta ^{i-1}|^2 + |\eta ^i - \eta ^{i-1} |^2\big ] \\&\le - \frac{k^2}{2} \big ({\mathbf {E}}\big [ |\eta ^N |^2\big ] - {\mathbf {E}}\big [ |\eta ^{0}|^2\big ]\big ) \le \frac{k^2}{2} {\mathbf {E}}\big [ |\eta ^0|^2\big ] \end{aligned}$$

because of the telescoping structure. Furthermore, it follows from Assumptions 4.1 and 4.4 that

$$\begin{aligned} \big ({\mathbf {E}}\big [ |\eta ^0|^2\big ]\big )^{\frac{1}{2}} \le \beta \big ( 1 + \big ({\mathbf {E}}\big [ |X_0|^{2p-2} \big ]\big )^{\frac{1}{2}} \big ) < \infty . \end{aligned}$$

For the term in (6.8) we apply Hölder’s inequality with $\rho = \max (2,p)$ and $\frac{1}{\rho } + \frac{1}{\rho '} = 1$ to obtain

$$\begin{aligned}&k \sum _{i=1}^{N} {\mathbf {E}}[\langle \eta ^i - \eta ^{i-1} ,kb(X^i) + g(X^{i-1}) \varDelta W^i\rangle _{}]\\&\quad \le k \sum _{i=1}^{N} \big ({\mathbf {E}}[ |\eta ^i - \eta ^{i-1} |^{\rho '} ]\big )^{\frac{1}{\rho '}} \big ({\mathbf {E}}[ |k b(X^i) + g(X^{i-1}) \varDelta W^i|^{\rho }]\big )^{\frac{1}{\rho }} \\&\quad \le \Big (k \sum _{i=1}^{N} {\mathbf {E}}[ |\eta ^i - \eta ^{i-1}|^{\rho '} ]\Big )^{\frac{1}{\rho '}} \Big (k \sum _{i=1}^{N} {\mathbf {E}}[ |k b(X^i) + g(X^{i-1}) \varDelta W^i|^{\rho }]\Big )^{\frac{1}{\rho }}. \end{aligned}$$

Then, from applications of the triangle inequality and Lemma 5.4, we get

$$\begin{aligned}&\Big (k \sum _{i=1}^{N} {\mathbf {E}}[ |\eta ^i - \eta ^{i-1}|^{\rho '} ]\Big )^{\frac{1}{\rho '}} \\&\quad \le \Big (k \sum _{i=1}^{N} {\mathbf {E}}[ |\eta ^i |^{\rho '} ]\Big )^{\frac{1}{\rho '}} + \Big (k \sum _{i=1}^{N} {\mathbf {E}}[ |\eta ^{i-1}|^{\rho '} ]\Big )^{\frac{1}{\rho '}} \\&\quad \le K_\eta ^{\frac{1}{\rho '}} + \big ( K_\eta + {\mathbf {E}}[ |\eta ^0|^{\rho '} ] \big )^{\frac{1}{\rho '}} \le 2 K_\eta ^{\frac{1}{\rho '}} + \big ( {\mathbf {E}}[ |\eta ^0|^{\rho '} ] \big )^{\frac{1}{\rho '}}. \end{aligned}$$

We apply the polynomial growth bound satisfied by f and see that, for $p \in [2,\infty )$,

$$\begin{aligned} \big ( {\mathbf {E}}[ |\eta ^0|^{\rho '} ]\big )^{\frac{1}{\rho '}} =\big ( {\mathbf {E}}[ |\eta ^0|^{q} ]\big )^{\frac{1}{q}} \le \beta ( 1 + \Vert X_0 \Vert _{L^p(\varOmega ;{\mathbf {R}}^d)}^{p-1}) \end{aligned}$$

is fulfilled, while for $p \in [1,2)$ we have

$$\begin{aligned} \big ( {\mathbf {E}}[ |\eta ^0|^{\rho '} ]\big )^{\frac{1}{\rho '}}&= \big ({\mathbf {E}}[ |\eta ^0|^2 ] \big )^{\frac{1}{2}}\\&\le \beta \big ( 1 + \big ({\mathbf {E}}[ |X_0|^{2p-2} ]\big )^{\frac{1}{2}} \big ) = \beta \big ( 1 + \Vert X_0 \Vert _{L^{2p-2}(\varOmega ;{\mathbf {R}}^d)}^{p-1} \big ). \end{aligned}$$

In both cases the appearing terms are finite because of Assumption 4.4. Moreover, a further application of the triangle inequality yields

$$\begin{aligned}&\left( k \sum _{i=1}^{N} {\mathbf {E}}[ |k b(X^i) + g(X^{i-1}) \varDelta W^i|^{\rho }]\right) ^{\frac{1}{\rho }}\\&\quad \le \left( k \sum _{i=1}^{N} {\mathbf {E}}[ |k b(X^i)|^{\rho }]\right) ^{\frac{1}{\rho }} + \left( k \sum _{i=1}^{N} {\mathbf {E}}[ | g(X^{i-1}) \varDelta W^i|^{\rho }]\right) ^{\frac{1}{\rho }}. \end{aligned}$$

Due to the linear growth bound (4.1) on b and the a priori bound (5.4), it then follows that

$$\begin{aligned} \left( k \sum _{i=1}^{N} {\mathbf {E}}[ |k b(X^i)|^{\rho }]\right) ^{\frac{1}{\rho }}&\le L_b k \left( k \sum _{i=1}^{N} {\mathbf {E}}\big [ \big (1 + | X^i| \big )^{\rho } \big ]\right) ^{\frac{1}{\rho }}\\&\le L_b k \left( T^{\frac{1}{\rho }} + \big ( \max \big (\frac{1}{2 \mu },T\big ) K_X \big )^{\frac{1}{\rho }} \right) . \end{aligned}$$

By an application of Lemma 2.2 with constant $C_{\rho }$, we obtain

$$\begin{aligned} {\mathbf {E}}[ |g(X^{i-1}) \varDelta W^i|^{\rho }] = {\mathbf {E}}\Big [ \Big |\int _{t_{i-1}}^{t_i} g(X^{i-1}) \,\mathrm {d}W(s)\Big |^{\rho } \Big ] \le C_{\rho }^{\rho } k^{\frac{\rho }{2}} {\mathbf {E}}\big [ |g(X^{i-1})|^{\rho } \big ]. \end{aligned}$$

Together with the linear growth bound (4.1) on g this shows that

$$\begin{aligned} \left( k \sum _{i=1}^{N} {\mathbf {E}}[ |g(X^{i-1}) \varDelta W^i|^{\rho }]\right) ^{\frac{1}{\rho }}&\le C_{\rho } k^{\frac{1}{2}} \left( k \sum _{i=1}^{N} {\mathbf {E}}\big [ |g(X^{i-1})|^{\rho } \big ] \right) ^{\frac{1}{\rho }}\\&\le C_{\rho } L_g k^{\frac{1}{2}} \left( T^{\frac{1}{\rho }} + \big ( \max \big (\frac{1}{2 \mu },T\big ) K_X \big )^{\frac{1}{\rho }} \right) . \end{aligned}$$

Putting the estimates together proves the desired bound. $\square $
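The nonnegativity in Lemma 6.2 in fact holds pathwise and term by term, since each summand $\langle \eta ^i - \eta ^{i-1}, X^i - X^{i-1} \rangle $ is nonnegative by the monotonicity of f. The Python sketch below records these terms along one simulated path in a hypothetical toy setting ($d = 1$, $f = \partial |\cdot |$, $b = 0$, constant $g = \sigma $), where the implicit step is solved exactly by soft thresholding. Averaging $k \sum _i$ of the terms over many seeds would give a Monte Carlo estimate of the left-hand side of the lemma; the sketch makes no claim about the observed dependence on k.

```python
import math
import random

def soft_threshold(y, tau):
    """Resolvent (I + tau * d|.|)^{-1} of the subdifferential of |x|."""
    return math.copysign(max(abs(y) - tau, 0.0), y)

def increment_terms(x0=0.3, T=1.0, N=200, sigma=1.0, seed=0):
    """Terms (eta^i - eta^{i-1}) * (X^i - X^{i-1}), i = 1..N, along one path of the
    toy scheme X^i - X^{i-1} = -k eta^i + sigma dW^i with eta^i in d|X^i|."""
    rng = random.Random(seed)
    k = T / N
    x = x0
    eta = math.copysign(1.0, x0) if x0 != 0 else 0.0   # eta^0 in f(X^0)
    terms = []
    for _ in range(N):
        y = x + sigma * rng.gauss(0.0, math.sqrt(k))    # explicit part of the step
        x_new = soft_threshold(y, k)                     # implicit part, solved exactly
        eta_new = (y - x_new) / k                        # selection eta^i in f(X^i)
        terms.append((eta_new - eta) * (x_new - x))
        x, eta = x_new, eta_new
    return k, terms

k, terms = increment_terms()
print(min(terms) >= -1e-10, k * sum(terms))
```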

We are now prepared to state and prove the main result of this section. While the main ingredients of the proof are still the techniques introduced in [38, Sect. 4] for deterministic problems, the proof is somewhat more technical than that of Theorem 3.7. In particular, due to the presence of Lipschitz perturbations in the general problem (1.5), it is no longer possible to avoid an application of Gronwall's lemma. Moreover, as in [38, Sect. 4], we impose the following additional assumption on the multi-valued mapping f.

Assumption 6.3

There exists $\gamma \in (0,\infty )$ such that, for every $v,w, z \in D(f)$, $f_v \in f(v)$, $f_w \in f(w)$, and $f_z \in f(z)$,

$$\begin{aligned} \langle f_v - f_z,z-w\rangle _{} \le \gamma \langle f_v - f_w,v-w\rangle _{}. \end{aligned}$$

In Lemma 3.2, we already proved that, if f is the subdifferential of a convex potential, then Assumption 6.3 is satisfied with $\gamma = 1$. For a further example, we refer to Sect. 7.
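For $f = \partial |\cdot |$, the subdifferential of the convex potential in Sect. 7.1, Assumption 6.3 with $\gamma = 1$ reduces to cyclic monotonicity: the difference of the two sides equals $f_v(v - z) + f_w(w - v) + f_z(z - w)$, which the subgradient inequality bounds below by $(|v| - |z|) + (|w| - |v|) + (|z| - |w|) = 0$. A brute-force Python check on random triples (a sketch; the grid and sampling are arbitrary, and zero is included on purpose so that the multi-valued point is hit):

```python
import random

def subgradient_abs(x, rng):
    """Some element of the subdifferential of |.| at x (random in [-1, 1] at x = 0)."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return rng.uniform(-1.0, 1.0)

def min_gap(gamma=1.0, trials=20000, seed=0):
    """Minimum of gamma * <f_v - f_w, v - w> - <f_v - f_z, z - w> over random triples.
    Assumption 6.3 with this gamma holds on the samples iff the result is >= 0."""
    rng = random.Random(seed)
    grid = [j / 4 for j in range(-8, 9)]   # includes 0, where f is multi-valued
    worst = float("inf")
    for _ in range(trials):
        v, w, z = rng.choice(grid), rng.choice(grid), rng.choice(grid)
        fv, fw, fz = subgradient_abs(v, rng), subgradient_abs(w, rng), subgradient_abs(z, rng)
        gap = gamma * (fv - fw) * (v - w) - (fv - fz) * (z - w)
        worst = min(worst, gap)
    return worst

print(min_gap() >= -1e-12)
```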

Theorem 6.4

Let Assumptions 4.1–4.4 and Assumption 6.3 be satisfied. Let the step size $k = \frac{T}{N}$, $N \in {\mathbf {N}}$, be such that $8 L_b k \in [0,1)$. Then there exists a constant $C \in (0,\infty )$ independent of k such that

$$\begin{aligned} \max _{t \in [0,T]} \Vert X(t) - {\mathcal {X}}(t) \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)} \le C k^{\frac{1}{4}}. \end{aligned}$$

Remark 6.5

The strong convergence rate of 1/4 might not be optimal in the case of a piecewise Lipschitz drift coefficient. Under this additional assumption, it is proved in [35] that the forward Euler–Maruyama scheme has a strong convergence rate of 1/2. In [34], a further scheme with strong convergence order 3/4 is introduced, and, as proved in [33], the rate 3/4 is a sharp lower error bound. In contrast, our setting allows for a superlinearly growing and multi-valued drift coefficient at the cost of a lower convergence rate.

Proof of Theorem 6.4

Let us first introduce some additional notation. We will denote the error between the exact solution X to (1.5) and the numerical approximation ${\mathcal {X}}$ defined in (6.3) by $E(t):= X(t) - {\mathcal {X}}(t)$, $t \in [0,T]$. Furthermore, it will be convenient to split the error into two parts

$$\begin{aligned} E(t) = E_1(t) + E_2(t), \quad t \in [0,T], \end{aligned}$$

where

$$\begin{aligned} E_1(t)&:= \int _{0}^{t} \big ( {\overline{{\mathcal {H}}}}(s) - \eta (s) \big ) \,\mathrm {d}s + \int _{0}^{t} \big ( b(X(s)) - b({\overline{{\mathcal {X}}}}(s)) \big ) \,\mathrm {d}s, \end{aligned}$$

(6.9)

$$\begin{aligned} E_2(t)&:= G(t) - {\mathcal {G}}(t) \end{aligned}$$

(6.10)

${\mathbf {P}}$-almost surely for every $t \in (0,T]$. We expand the square of the norm of E as

$$\begin{aligned} |E(t)|^2 = |E_1(t)|^2 + 2 \langle E_1(t),E_2(t)\rangle _{} + |E_2(t)|^2, \quad t \in [0,T]. \end{aligned}$$

(6.11)

In order to estimate the terms on the right-hand side of (6.11), we first observe from (6.9) that $E_1$ has absolutely continuous sample paths with $E_1(0)=0$. Hence, $\frac{1}{2} \frac{\mathrm {d}}{\mathrm {d}t}|E_1(t)|^2 = \langle {\dot{E}}_1(t), E_1(t) \rangle $ for almost every $t \in [0,T]$. Therefore, after integrating from 0 to $t \in (0,T]$, we get

$$\begin{aligned} \frac{1}{2} |E_1(t)|^2= & {} \int _{0}^{t} \langle {\dot{E}}_1(s),E_1(s)\rangle _{} \,\mathrm {d}s\nonumber \\= & {} \int _{0}^{t} \langle {\dot{E}}_1(s),E(s)\rangle _{} \,\mathrm {d}s - \int _{0}^{t} \langle {\dot{E}}_1(s),E_2(s)\rangle _{} \,\mathrm {d}s. \end{aligned}$$

(6.12)

Furthermore, we also have that

$$\begin{aligned} \langle E_1(t),E_2(t)\rangle _{} = \Big \langle \int _0^t {\dot{E}}_1(s) \,\mathrm {d}s, E_2(t) \Big \rangle = \int _0^t \langle {\dot{E}}_1(s),E_2(t)\rangle _{} \,\mathrm {d}s. \end{aligned}$$

(6.13)

Thus, after combining (6.12) and (6.13) we obtain

$$\begin{aligned} \begin{aligned}&\frac{1}{2} |E_1(t)|^2 + \langle E_1(t),E_2(t)\rangle _{}\\&\quad = \int _0^t \langle {\dot{E}}_1(s),E(s)\rangle _{} \,\mathrm {d}s + \int _0^t \langle {\dot{E}}_1(s),E_2(t)-E_2(s)\rangle _{} \,\mathrm {d}s. \end{aligned}\nonumber \\ \end{aligned}$$

(6.14)

For the first integral on the right-hand side of (6.14) we insert the derivative of $E_1$ and the definition of the error process E. This yields, for almost every $s \in (0, T]$,

$$\begin{aligned} \langle {\dot{E}}_1(s),E(s)\rangle _{} = \langle {\overline{{\mathcal {H}}}}(s) - \eta (s),X(s) - {\mathcal {X}}(s)\rangle _{} + \langle b(X(s)) - b({\overline{{\mathcal {X}}}}(s)),X(s) - {\mathcal {X}}(s)\rangle _{}. \end{aligned}$$

After recalling the definition of ${\mathcal {X}}$ we use Assumptions 4.1 and 6.3. Then, for almost every $s \in (t_{n-1},t_n]$ and all $n \in \{1,\ldots ,N\}$, we get

$$\begin{aligned}&\langle {\overline{{\mathcal {H}}}}(s) - \eta (s),X(s) - {\mathcal {X}}(s)\rangle _{}\\&\quad = \frac{t_n - s}{k}\langle \eta ^n - \eta (s) ,X(s) - X^{n-1}\rangle _{} + \frac{s - t_{n-1}}{k} \langle \eta ^n - \eta (s) ,X(s) - X^n\rangle _{}\\&\quad \le \gamma \frac{t_n - s}{k} \langle \eta ^n - \eta ^{n-1} ,X^n - X^{n-1}\rangle _{} - \frac{s - t_{n-1}}{k} \langle \eta (s)- \eta ^n,X(s) - X^n\rangle _{}\\&\quad \le \gamma \frac{t_n - s}{k} \langle \eta ^n - \eta ^{n-1} ,X^n - X^{n-1}\rangle _{}, \end{aligned}$$

where, in the last step, we dropped the second term of the penultimate line, which is non-positive due to the monotonicity of f (cf. Definition 2.1). Moreover, due to the Lipschitz continuity of b, it follows for almost every $s \in (0,T]$ that

$$\begin{aligned}&\langle b(X(s)) - b({\overline{{\mathcal {X}}}}(s)),X(s) - {\mathcal {X}}(s)\rangle _{}\\&\quad = \langle b(X(s)) - b({\mathcal {X}}(s)),X(s) - {\mathcal {X}}(s)\rangle _{} + \langle b({\mathcal {X}}(s)) - b({\overline{{\mathcal {X}}}}(s)),X(s) - {\mathcal {X}}(s)\rangle _{}\\&\quad \le L_b |E(s)|^2 + L_b |{\mathcal {X}}(s) - {\overline{{\mathcal {X}}}}(s)| |E(s)| \le \frac{3}{2} L_b |E(s)|^2 + \frac{L_b}{2} |{\mathcal {X}}(s) - {\overline{{\mathcal {X}}}}(s)|^2, \end{aligned}$$

where we also made use of Young’s inequality. In addition, for every $n \in \{1,\ldots ,N\}$ and $s \in (t_{n-1},t_n]$, we have that

$$\begin{aligned} {\mathcal {X}}(s) - {\overline{{\mathcal {X}}}}(s) = \frac{s-t_{n-1}}{k} X^n + \frac{t_n-s}{k} X^{n-1} - X^n = - \frac{t_n-s}{k} \big (X^n - X^{n-1}\big ). \end{aligned}$$

Therefore,

$$\begin{aligned} \langle b(X(s)) - b({\overline{{\mathcal {X}}}}(s)),X(s) - {\mathcal {X}}(s)\rangle _{} \le \frac{3}{2} L_b |E(s)|^2 + \frac{L_b(t_n-s)^2}{2 k^2} |X^n - X^{n-1}|^2. \end{aligned}$$

Altogether, for every $t \in (t_{n-1}, t_n]$ and $n \in \{1,\ldots ,N\}$, we have shown that

$$\begin{aligned} \int _{t_{n-1}}^t \langle {\dot{E}}_1(s),E(s)\rangle _{} \,\mathrm {d}s&\le \frac{\gamma }{2} k \langle \eta ^n - \eta ^{n-1} ,X^n - X^{n-1}\rangle _{}\\&\quad + \frac{3}{2} L_b \int _{t_{n-1}}^{t} | E(s)|^2 \,\mathrm {d}s + \frac{L_b}{6} k |X^n - X^{n-1}|^2, \end{aligned}$$

where we also inserted that $\int _{t_{n-1}}^{t} (t_n - s) \,\mathrm {d}s \le \int _{t_{n-1}}^{t_n} (t_n - s) \,\mathrm {d}s = \frac{1}{2} k^2$ as well as $\int _{t_{n-1}}^{t} (t_n - s)^2 \,\mathrm {d}s \le \frac{1}{3} k^3$. It follows that, for every $n \in \{1,\ldots ,N\}$ and $t \in (t_{n-1},t_n]$,

$$\begin{aligned}&\int _0^t\langle {\dot{E}}_1(s),E(s)\rangle _{} \,\mathrm {d}s = \sum _{i = 1}^{n-1} \int _{t_{i-1}}^{t_i} \langle {\dot{E}}_1(s),E(s)\rangle _{} \,\mathrm {d}s + \int _{t_{n-1}}^t \langle {\dot{E}}_1(s),E(s)\rangle _{} \,\mathrm {d}s \\&\quad \le \frac{\gamma }{2} k \sum _{i = 1}^n \langle \eta ^i - \eta ^{i-1} ,X^i - X^{i-1}\rangle _{} + \frac{L_b}{6} k \sum _{i = 1}^n |X^i - X^{i-1}|^2\\&\qquad + \frac{3}{2} L_b \int _0^t |E(s)|^2 \,\mathrm {d}s. \end{aligned}$$

Hence, together with Lemmas 5.4 and 6.2 this shows that

$$\begin{aligned} \int _0^t {\mathbf {E}}\big [\langle {\dot{E}}_1(s),E(s)\rangle _{}\big ] \,\mathrm {d}s&\le \frac{\gamma }{2} K_{\delta \eta } k^{\frac{1}{2}} + \frac{L_b}{3} K_X k + \frac{3}{2} L_b \int _0^t {\mathbf {E}}\big [|E(s)|^2\big ] \,\mathrm {d}s. \end{aligned}$$

(6.15)
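The coefficients $\frac{\gamma }{2} k$ and $\frac{L_b}{6} k$ come from integrating the weights $\frac{\gamma }{k}(t_n - s)$ and $\frac{L_b}{2k^2}(t_n - s)^2$ over $(t_{n-1}, t]$; bounding these integrals by their values at $t = t_n$ is legitimate because the quantities they multiply are nonnegative (the first one by the monotonicity of f). An exact-arithmetic Python sketch of this bookkeeping, with arbitrary sample values of k, $\gamma $ and $L_b$:

```python
from fractions import Fraction

def simpson(f, a, b):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))

def local_weights(k, theta, gamma, L_b):
    """Weights on (t_{n-1}, t], t = t_{n-1} + theta * k, shifted so that t_{n-1} = 0:
    (gamma / k) * int (t_n - s) ds and (L_b / (2 k^2)) * int (t_n - s)^2 ds."""
    t = theta * k
    w_mono = gamma / k * simpson(lambda s: k - s, Fraction(0), t)
    w_incr = L_b / (2 * k ** 2) * simpson(lambda s: (k - s) ** 2, Fraction(0), t)
    return w_mono, w_incr

k, gamma, L_b = Fraction(1, 5), Fraction(1), Fraction(3, 2)
print(local_weights(k, Fraction(1), gamma, L_b) == (gamma * k / 2, L_b * k / 6))
```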

Next, we give an estimate for the second integral on the right-hand side of (6.14). For every $n \in \{1,\ldots ,N\}$ and $t \in (t_{n-1},t_n]$ we decompose the integral as follows

$$\begin{aligned} \int _{0}^{t} \langle {\dot{E}}_1(s),E_2(t) - E_2(s)\rangle _{} \,\mathrm {d}s= & {} \sum _{i = 1}^{n-1} \int _{t_{i-1}}^{t_i} \langle {\dot{E}}_1(s),E_2(t) - E_2(s)\rangle _{} \,\mathrm {d}s\nonumber \\&\quad + \int _{t_{n-1}}^t \langle {\dot{E}}_1(s),E_2(t) - E_2(s)\rangle _{} \,\mathrm {d}s. \end{aligned}$$

(6.16)

For every $i \in \{1,\ldots ,n-1\}$ we then add and subtract $E_2(t_i)$ in the second slot of the inner product in the first term on the right-hand side of (6.16). This gives

$$\begin{aligned} \int _{t_{i-1}}^{t_i} \langle {\dot{E}}_1(s),E_2(t) - E_2(s)\rangle _{} \,\mathrm {d}s&= \int _{t_{i-1}}^{t_i} \langle {\dot{E}}_1(s),E_2(t) - E_2(t_{i})\rangle _{} \,\mathrm {d}s\\&\quad + \int _{t_{i-1}}^{t_i} \langle {\dot{E}}_1(s),E_2(t_{i}) - E_2(s)\rangle _{} \,\mathrm {d}s. \end{aligned}$$

After inserting the definition of $E_2$ from (6.10), the first integral is equal to

$$\begin{aligned}&\int _{t_{i-1}}^{t_i} \langle {\dot{E}}_1(s),E_2(t) - E_2(t_{i})\rangle _{} \,\mathrm {d}s = \Big \langle \int _{t_{i-1}}^{t_i} {\dot{E}}_1(s) \,\mathrm {d}s, E_2(t) - E_2(t_{i}) \Big \rangle \\&\quad = \langle E_1(t_i) - E_1(t_{i-1}),E_2(t) - E_2(t_{i})\rangle _{}\\&\quad = \langle E_1(t_i) - E_1(t_{i-1}),G(t) - {\mathcal {G}}(t) - ( G(t_{i}) - {\mathcal {G}}(t_i)) \rangle _{} \\&\quad = \Big \langle E_1(t_i) - E_1(t_{i-1}), \int _{t_i}^{t} g(X(s)) \,\mathrm {d}W(s) \Big \rangle \\&\qquad - \Big \langle E_1(t_i) - E_1(t_{i-1}), \int _{t_i}^{t_{n-1}} g({\underline{{\mathcal {X}}}}(s)) \,\mathrm {d}W(s) + \frac{t - t_{n-1}}{k} g(X^{n-1}) \varDelta W^n \Big \rangle \end{aligned}$$

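for all $i, n \in \{1,\ldots ,N\}$, $i < n$, and $t \in (t_{n-1},t_n]$. Since $E_1(t_i) - E_1(t_{i-1}) = E(t_i) - E(t_{i-1}) - (E_2(t_i) - E_2(t_{i-1}))$ is square-integrable and ${\mathcal {F}}_{t_i}$-measurable, and since the stochastic integrals and the increment $g(X^{n-1}) \varDelta W^n$ in the last line have vanishing conditional expectation given ${\mathcal {F}}_{t_i}$ (recall that $t_i \le t_{n-1}$), it follows that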

$$\begin{aligned} {\mathbf {E}}\Big [ \int _{t_{i-1}}^{t_i} \langle {\dot{E}}_1(s),E_2(t) - E_2(t_{i})\rangle _{} \,\mathrm {d}s \Big ] = 0 \end{aligned}$$

for all $n \in \{1,\ldots ,N\}$, $t \in (t_{n-1},t_n]$, and $t_i < t$. Hence, after taking expectations in (6.16), we arrive at

$$\begin{aligned}&{\mathbf {E}}\Big [ \int _{0}^{t} \langle {\dot{E}}_1(s),E_2(t) - E_2(s)\rangle _{} \,\mathrm {d}s \Big ]\\&\quad = \sum _{i = 1}^{n-1} {\mathbf {E}}\Big [ \int _{t_{i-1}}^{t_i} \langle {\dot{E}}_1(s),E_2(t_i) - E_2(s)\rangle _{} \,\mathrm {d}s \Big ]\\&\qquad + {\mathbf {E}}\Big [ \int _{t_{n-1}}^t \langle {\dot{E}}_1(s),E_2(t) - E_2(s)\rangle _{} \,\mathrm {d}s \Big ]\\&\quad \le \sum _{i = 1}^{n} {\mathbf {E}}\Big [ \int _{t_{i-1}}^{t_i} |{\dot{E}}_1(s)| |E_2(t_i) - E_2(s)| \,\mathrm {d}s \Big ]. \end{aligned}$$

Inserting the definitions (6.9) and (6.10) of $E_1$ and $E_2$ and applying Hölder’s inequality with $\rho = \max (2,p)$ and $\frac{1}{\rho }+\frac{1}{\rho '}=1$, we get

$$\begin{aligned}&{\mathbf {E}}\Big [ \int _{0}^{t} \langle {\dot{E}}_1(s),E_2(t) - E_2(s)\rangle _{} \,\mathrm {d}s \Big ]\\&\quad \le \sum _{i = 1}^{n} \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ \big (|\eta ^i - \eta (s)| + |b(X(s)) - b(X^i)| \big )\\&\qquad \times \big (| G(t_i) - G(s)| + |{\mathcal {G}}(t_i) - {\mathcal {G}}(s)| \big ) \big ] \,\mathrm {d}s\\&\quad \le \sum _{i = 1}^{n} \Big ( \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ |\eta ^i - \eta (s)|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}} \Big ( \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | G(t_i) - G(s)|^\rho \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho }}\\&\qquad + \sum _{i = 1}^{n} \Big ( \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ |\eta ^i - \eta (s)|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}} \Big ( \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | {\mathcal {G}}(t_i) - {\mathcal {G}}(s)|^\rho \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho }}\\&\qquad + \sum _{i = 1}^{n} \Big ( \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ |b(X(s)) - b(X^i)|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}} \Big ( \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | G(t_i) - G(s)|^\rho \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho }}\\&\qquad + \sum _{i = 1}^{n} \Big ( \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ |b(X(s)) - b(X^i)|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}} \Big ( \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | {\mathcal {G}}(t_i) - {\mathcal {G}}(s)|^\rho \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho }}\\&\quad =: \varGamma _1 + \varGamma _2 + \varGamma _3 + \varGamma _4. \end{aligned}$$

In the following, we will estimate $\varGamma _1$, $\varGamma _2$, $\varGamma _3$, and $\varGamma _4$ separately. For $\varGamma _1$ we obtain after an application of Hölder’s inequality for sums that

$$\begin{aligned} \varGamma _1&\le \Big ( \sum _{i = 1}^{n} \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ |\eta ^i - \eta (s)|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}} \Big ( \sum _{i = 1}^{n} \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | G(t_i) - G(s)|^\rho \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho }}\\&\le \Big (\Big ( k \sum _{i = 1}^{n} {\mathbf {E}}\big [ |\eta ^i|^{\rho '} \big ] \Big )^{\frac{1}{\rho '}} + \Big ( \int _{0}^{t_n} {\mathbf {E}}\big [ |\eta (s)|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}}\Big )\\&\quad \times \Big ( \sum _{i = 1}^{n} \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | G(t_i) - G(s)|^\rho \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho }}. \end{aligned}$$

If $p \in [2,\infty )$, then $\rho = p$ and $\rho ' = q$. In this case, all integrals that appear are finite due to the bounds in Theorem 4.7 and Lemma 5.4. Moreover, if $p \in (1,2)$, then $\rho = \rho ' = 2 < q$. Then it follows from further applications of Hölder’s inequality and Jensen’s inequality that

$$\begin{aligned} k \sum _{i = 1}^{n} {\mathbf {E}}\big [ |\eta ^i|^{2} \big ] \le T^{\frac{q-2}{q}} \Big ( k \sum _{i = 1}^{n} {\mathbf {E}}\big [ |\eta ^i|^{q} \big ] \Big )^{\frac{2}{q}} \end{aligned}$$

as well as

$$\begin{aligned} \int _{0}^{t_n} {\mathbf {E}}\big [ |\eta (s)|^{2} \big ] \,\mathrm {d}s \le T^{\frac{q-2}{q}} \Big ( \int _{0}^{t_n} {\mathbf {E}}\big [ |\eta (s)|^{q} \big ] \,\mathrm {d}s \Big )^{\frac{2}{q}}. \end{aligned}$$

Hence, we arrive at the same conclusion. If $p = 1$ then the processes $(\eta (t))_{t \in [0,T]}$ and $(\eta ^n)_{n \in \{1,\ldots ,N\}}$ are globally bounded due to the bound on f in Assumption 4.1. Using Lemma 6.1 we see that

$$\begin{aligned} \left( \sum _{i = 1}^{n} \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | G(t_i) - G(s)|^\rho \big ] \,\mathrm {d}s \right) ^{\frac{1}{\rho }} \le K_{\rho } k^{\frac{1}{2}} \left( \int _0^{t_n} \big ( 1 + {\mathbf {E}}\big [ | X(s)|^\rho \big ] \big ) \,\mathrm {d}s \right) ^{\frac{1}{\rho }}. \end{aligned}$$

Altogether, this yields

$$\begin{aligned} \varGamma _1 \le C_{\varGamma _1} k^{\frac{1}{2}} \end{aligned}$$

for a suitable constant $C_{\varGamma _1} \in (0,\infty )$, which is independent of k. To estimate $\varGamma _2$, we argue as for $\varGamma _1$ and obtain

$$\begin{aligned} \varGamma _2&\le \left( \left( k \sum _{i = 1}^{n} {\mathbf {E}}\big [ |\eta ^i|^{\rho '} \big ] \right) ^{\frac{1}{\rho '}} + \Big ( \int _{0}^{t_n} {\mathbf {E}}\big [ |\eta (s)|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}}\right) \\&\quad \times \left( \sum _{i = 1}^{n} \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | {\mathcal {G}}(t_i) - {\mathcal {G}}(s)|^\rho \big ] \,\mathrm {d}s \right) ^{\frac{1}{\rho }}. \end{aligned}$$

The first factor is bounded as we saw in the case for $\varGamma _1$. Furthermore, using Lemma 6.1, we have that

$$\begin{aligned} \left( \sum _{i = 1}^{n} \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ | {\mathcal {G}}(t_i) - {\mathcal {G}}(s)|^\rho \big ] \,\mathrm {d}s \right) ^{\frac{1}{\rho }} \le K_{\rho } k^{\frac{1}{2}} \left( k \sum _{i =1}^{n} \big ( 1 + {\mathbf {E}}\big [ | X^{i-1}|^\rho \big ] \big ) \right) ^{\frac{1}{\rho }}. \end{aligned}$$

Due to the a priori bound (5.4), there exists a constant $C_{\varGamma _2} \in (0,\infty )$, which does not depend on k, such that

$$\begin{aligned} \varGamma _2 \le C_{\varGamma _2} k^{\frac{1}{2}}. \end{aligned}$$
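In the estimate of $\varGamma _1$ for $p \in (1,2)$, the proof passes from q-th to second moments. On $[0, t_n]$, Jensen's inequality and Hölder's inequality with exponents $q/2$ and $q/(q-2)$ give the prefactor $t_n^{(q-2)/q} \le T^{(q-2)/q}$. The following Python sketch checks the discrete version $k \sum _i f_i^2 \le t_n^{(q-2)/q} \big ( k \sum _i f_i^q \big )^{2/q}$ on arbitrary nonnegative sample data; equality for constant data confirms the exponent.

```python
import random

def holder_sides(q, k, values):
    """For nonnegative values f_1..f_n on a grid of width k (total length t_n = n * k),
    return k * sum f_i^2 and t_n^((q - 2) / q) * (k * sum f_i^q)^(2 / q)."""
    t_n = k * len(values)
    lhs = k * sum(v ** 2 for v in values)
    rhs = t_n ** ((q - 2) / q) * (k * sum(v ** q for v in values)) ** (2 / q)
    return lhs, rhs

rng = random.Random(0)
data = [rng.uniform(0.0, 5.0) for _ in range(50)]
lhs, rhs = holder_sides(q=3.5, k=0.02, values=data)
print(lhs <= rhs)
```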

The estimates for $\varGamma _3$ and $\varGamma _4$ follow analogously; the only new term that appears is of the form

$$\begin{aligned}&\Big ( \sum _{i = 1}^{n} \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ |b(X(s)) - b(X^i)|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}}\\&\quad \le L_b \Big ( \sum _{i = 1}^{n} \int _{t_{i-1}}^{t_i} {\mathbf {E}}\big [ |X(s) - X^i|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}}\\&\quad \le L_b \Big ( \int _{0}^{t_n} {\mathbf {E}}\big [ |X(s)|^{\rho '} \big ] \,\mathrm {d}s \Big )^{\frac{1}{\rho '}} + L_b \Big ( k \sum _{i = 1}^{n} {\mathbf {E}}\big [|X^i|^{\rho '} \big ] \Big )^{\frac{1}{\rho '}}, \end{aligned}$$

which is bounded due to Theorem 4.7 and the a priori bound (5.4). Therefore, there exist constants $C_{\varGamma _3}, C_{\varGamma _4} \in (0,\infty )$ such that

$$\begin{aligned} \varGamma _3 \le C_{\varGamma _3} k^{\frac{1}{2}} \quad \text {and} \quad \varGamma _4 \le C_{\varGamma _4} k^{\frac{1}{2}}. \end{aligned}$$

Hence, we obtain

$$\begin{aligned} {\mathbf {E}}\Big [ \int _{0}^{t} \langle {\dot{E}}_1(s),E_2(t) - E_2(s)\rangle _{} \,\mathrm {d}s \Big ] \le (C_{\varGamma _1} + C_{\varGamma _2} + C_{\varGamma _3} + C_{\varGamma _4}) k^{\frac{1}{2}} =: C_{\varGamma } k^{\frac{1}{2}} . \end{aligned}$$

(6.17)

After taking expectations in (6.11) and inserting (6.14), (6.15), (6.17) as well as (6.4) from Lemma 6.1, we obtain for every $t \in (0,T]$ that

$$\begin{aligned}&{\mathbf {E}}\big [ |E(t)|^2 \big ] \\&\quad \le \gamma K_{\delta \eta } k^{\frac{1}{2}} + \frac{2 L_b}{3} K_X k + 2 C_\varGamma k^{\frac{1}{2}} + K_G k + \big ( 3 L_b + 2 L_g^2 \big ) \int _0^t {\mathbf {E}}\big [ |E(s)|^2 \big ] \,\mathrm {d}s. \end{aligned}$$

The assertion then follows from an application of Gronwall’s lemma, see, for example, [11, Appendix B]. $\square $
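For the reader's convenience, the final Gronwall step can be spelled out. Assuming, for simplicity, that $k \le 1$ (so that $k \le k^{\frac{1}{2}}$) and abbreviating $L := 3 L_b + 2 L_g^2$, the preceding estimate gives for every $t \in (0,T]$

$$\begin{aligned} {\mathbf {E}}\big [ |E(t)|^2 \big ] \le K k^{\frac{1}{2}} + L \int _0^t {\mathbf {E}}\big [ |E(s)|^2 \big ] \,\mathrm {d}s, \quad K := \gamma K_{\delta \eta } + \frac{2 L_b}{3} K_X + 2 C_\varGamma + K_G . \end{aligned}$$

Gronwall’s lemma then yields

$$\begin{aligned} \sup _{t \in (0,T]} {\mathbf {E}}\big [ |E(t)|^2 \big ] \le K \mathrm {e}^{L T} k^{\frac{1}{2}}, \quad \text {hence} \quad \sup _{t \in (0,T]} \Vert E(t) \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)} \le \big ( K \mathrm {e}^{L T} \big )^{\frac{1}{2}} k^{\frac{1}{4}}. \end{aligned}$$

Taking the square root is what turns the $k^{\frac{1}{2}}$ consistency terms into the order 1/4 of the root-mean-square error.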

Remark 6.6

Up to this point, we have only proved convergence of X, not of $\eta $. However, since $X^n$ solves the scheme, we also obtain that

$$\begin{aligned} k \eta ^n = - (X^n - X^{n-1}) + k b(X^n) + g(X^{n-1}) \varDelta W^n \quad \text { a.s. in } \varOmega . \end{aligned}$$

Analogously, we can write for the exact solution $\eta $ that

$$\begin{aligned} \int _{0}^{t} \eta (s) \,\mathrm {d}s = - X(t) + X_0 + \int _0^t b(X(s)) \,\mathrm {d}s + \int _{0}^{t} g(X(s)) \,\mathrm {d}W(s). \end{aligned}$$

Therefore, the convergence of ${\mathcal {X}}$ to X and the Lipschitz continuity of b and g also yield the estimate

$$\begin{aligned} \Big \Vert \int _{0}^{t_n} \eta (s) \,\mathrm {d}s - k\sum _{j = 1}^n \eta ^j \Big \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)} \le C k^{\frac{1}{4}} \end{aligned}$$

for every $n \in \{1,\ldots ,N\}$.
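As an illustration of this remark, the following Python sketch recovers the discrete values $\eta ^n$ from a computed trajectory by rearranging the scheme, and forms the sums $k\sum _{j=1}^n \eta ^j$ that approximate $\int _0^{t_n} \eta (s)\,\mathrm {d}s$. It assumes the scalar case $d = m = 1$; the function names `recover_eta` and `integrated_eta` are hypothetical and not taken from the paper.

```python
import numpy as np

def recover_eta(X, dW, k, b, g):
    """Rearranged scheme: k*eta^n = -(X^n - X^{n-1}) + k*b(X^n) + g(X^{n-1})*dW^n.

    X  : array of length N+1 holding X^0, ..., X^N (scalar case d = m = 1, assumed)
    dW : array of length N holding the Wiener increments dW^1, ..., dW^N
    b, g : vectorized coefficient functions acting elementwise on arrays
    """
    X = np.asarray(X, dtype=float)
    dW = np.asarray(dW, dtype=float)
    X_new, X_old = X[1:], X[:-1]
    return (-(X_new - X_old) + k * b(X_new) + g(X_old) * dW) / k

def integrated_eta(eta, k):
    """Discrete counterpart k * sum_{j=1}^n eta^j of int_0^{t_n} eta(s) ds, n = 1, ..., N."""
    return k * np.cumsum(eta)
```

For instance, with $f$ the subdifferential of $|x|$ from Section 7.1, $b \equiv 0$, $g \equiv 1$ and $k = 0.1$, the step from $X^0 = 1$ with $\varDelta W^1 = 0.05$ gives $X^1 = 0.95$, and the recovered value is $\eta ^1 = 1 \in f(0.95)$.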

7 Examples

7.1 Discontinuous drift coefficient

In this example we show that Assumption 4.1 includes overdamped Langevin-type equations with a possibly discontinuous drift f. We consider the convex, nonnegative, yet not continuously differentiable function $\varPhi (x) := |x|$, $x \in {\mathbf {R}}$, which has a multi-valued subdifferential $f :{\mathbf {R}}\rightarrow 2^{{\mathbf {R}}}$ defined by

$$\begin{aligned} f(x) := {\left\{ \begin{array}{ll} \{1\},&{} \text { if } x > 0,\\ {[}-1,1],&{} \text { if } x = 0,\\ \{-1\},&{} \text { if } x < 0. \end{array}\right. } \end{aligned}$$

This mapping fulfills Assumption 4.1 for $p=1$. To be more precise, f is a monotone mapping and its graph admits no proper monotone extension, i.e., f is maximal monotone. In fact, the subdifferential of any proper, lower semi-continuous, and convex function is a maximal monotone mapping by a well-known theorem of Rockafellar, cf. [46, Cor. 31.5.2] or [48, Satz 3.23].

Furthermore, we note that $f_x x = \text {sgn} (x) x = |x|$ as well as $|f_x | \le 1$ for every $x \in {\mathbf {R}}$ and $f_x \in f(x)$. This shows that f fulfills all the conditions of Assumption 4.1. It remains to verify Assumption 6.3. Since f is the subdifferential of $\varPhi $, the variational inequality (3.2) still holds in the sense that

$$\begin{aligned} f_x(y-x) \le \varPhi (y) - \varPhi (x) \end{aligned}$$

for all $x,y \in {\mathbf {R}}$ and $f_x \in f(x)$. Following the same steps as in the proof of Lemma 3.2, but replacing f(v), f(w), and f(z) by arbitrary elements $f_v \in f(v)$, $f_w \in f(w)$, and $f_z \in f(z)$, respectively, one verifies that Assumption 6.3 is fulfilled. Therefore, the backward Euler–Maruyama method (1.6) is well-defined and yields an approximation of the exact solution X of

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\mathrm {d}X(t) + f(X(t)) \,\mathrm {d}t \ni b(X(t)) \,\mathrm {d}t + g(X(t)) \,\mathrm {d}W(t), \quad t \in (0,T],\\ X(0) = X_0, \end{array}\right. } \end{aligned}$$

where $b :{\mathbf {R}}\rightarrow {\mathbf {R}}$ and $g :{\mathbf {R}}\rightarrow {\mathbf {R}}^{1,m}$ are Lipschitz continuous and $X_0 \in L^2(\varOmega )$. To be more precise, the piecewise linear interpolant ${\mathcal {X}}$ of the values $(X^n)_{n\in \{0,\dots ,N\}}$ defined in (6.3) fulfills

$$\begin{aligned} \max _{t \in [0,T]} \Vert X(t) - {\mathcal {X}}(t) \Vert _{L^2(\varOmega )} \le C k^{\frac{1}{4}} \end{aligned}$$

for some $C \in (0,\infty )$ that does not depend on the step size $k = \frac{T}{N}$. Let us mention, however, that the strong order of convergence 1/4 is not necessarily optimal in this particular example. We refer the reader to [9] for a corresponding result on the forward Euler–Maruyama method.
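A minimal Python sketch of the backward Euler–Maruyama step for this example follows, assuming $m = 1$ and a step size with $k L_b < 1$. Since $k\eta ^n = -(X^n - X^{n-1}) + k b(X^n) + g(X^{n-1}) \varDelta W^n$ with $\eta ^n \in f(X^n)$, each step reads $X^n + k f(X^n) \ni X^{n-1} + k b(X^n) + g(X^{n-1}) \varDelta W^n$. The resolvent $(I + k f)^{-1}$ of the subdifferential of $|x|$ is the soft-thresholding map $y \mapsto \text {sgn}(y)\max (|y| - k, 0)$, so the only remaining nonlinearity is the Lipschitz part b, which is treated here by a fixed-point iteration. This treatment of b and all function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def soft_threshold(y, tau):
    """Resolvent (I + tau*f)^{-1} for f = subdifferential of |x| (prox of tau*|x|)."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def backward_em_abs(x0, b, g, T, N, rng, tol=1e-12, maxit=200):
    """Backward Euler-Maruyama for dX + f(X) dt ∋ b(X) dt + g(X) dW, f = d|x|, m = 1.

    Each step solves X^n = soft_threshold(X^{n-1} + k*b(X^n) + g(X^{n-1})*dW^n, k)
    by fixed-point iteration. Because soft-thresholding is nonexpansive, the
    iteration is a contraction when k*L_b < 1 (assumed by the caller).
    Returns an array of shape (N+1, P) for P independent paths.
    """
    k = T / N
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    path = [x.copy()]
    for _ in range(N):
        dW = rng.normal(0.0, np.sqrt(k), size=x.shape)
        y = x + g(x) * dW                      # explicit diffusion term g(X^{n-1}) dW^n
        z = soft_threshold(y + k * b(x), k)    # initial guess uses b(X^{n-1})
        for _ in range(maxit):
            z_new = soft_threshold(y + k * b(z), k)
            done = np.max(np.abs(z_new - z)) < tol
            z = z_new
            if done:
                break
        x = z
        path.append(x.copy())
    return np.array(path)
```

For example, `backward_em_abs(np.full(1000, 1.0), lambda x: -x, lambda x: 0.5 + 0.0 * x, T=1.0, N=100, rng=np.random.default_rng(0))` simulates 1000 paths with the hypothetical choices $b(x) = -x$ and $g \equiv 0.5$; here $k L_b = 0.01 < 1$.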

7.2 Stochastic p-Laplace equation

As a second example, we consider the discretization of the stochastic p-Laplace equation. A similar setting is studied in [5]. For a more detailed introduction to this class of problems, we refer the reader to this work and the references therein.

For $p \in [2,\infty )$ and $T \in (0,\infty )$, the stochastic p-Laplace equation is given by

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\mathrm {d}u(t,\xi ) - \nabla \cdot \big (|\nabla u(t,\xi )|^{p-2} \nabla u(t,\xi )\big ) \,\mathrm {d}t\\ \quad = \varPsi (u(t,\xi )) \,\mathrm {d}W(t), &{}\text {for all } (t,\xi ) \in (0,T)\times {\mathcal {D}},\\ u(t,\xi ) = 0, &{}\text {for all } (t,\xi ) \in (0,T) \times \partial {\mathcal {D}},\\ u(0,\xi ) = u_0(\xi ), &{}\text {for all } \xi \in {\mathcal {D}}, \end{array}\right. } \end{aligned}$$

(7.1)

where ${\mathcal {D}}\subset {\mathbf {R}}^n$, $n \in {\mathbf {N}}$, is a bounded Lipschitz domain. By $W :[0,T] \times \varOmega \rightarrow {\mathbf {R}}^m$, $m \in {\mathbf {N}}$, we denote a standard $({\mathcal {F}}_t)_{t \ge 0}$-adapted Wiener process. We also assume that the initial value $u_0 :{\mathcal {D}}\times \varOmega \rightarrow {\mathbf {R}}$ fulfills

$$\begin{aligned} {\mathbf {E}}\big [ \Vert u_0\Vert _{L^2({\mathcal {D}})}^2\big ] = {\mathbf {E}}\Big [ \int _{\mathcal {D}}|u_0|^2 \,\mathrm {d}\xi \Big ] < \infty . \end{aligned}$$

(7.2)

Furthermore, let $\varPsi :{\mathbf {R}}\rightarrow {\mathcal {L}}_2({\mathbf {R}}^m;{\mathbf {R}})$ be a Lipschitz continuous mapping, where ${\mathcal {L}}_2({\mathbf {R}}^m;{\mathbf {R}})$ denotes the space of Hilbert–Schmidt operators from ${\mathbf {R}}^m$ to ${\mathbf {R}}$. Note that the Nemytskii operator ${\tilde{\varPsi }} :L^2({\mathcal {D}}) \rightarrow {\mathcal {L}}_2({\mathbf {R}}^m;L^2({\mathcal {D}}))$, given by $[{\tilde{\varPsi }}(u)](\xi ) = \varPsi (u(\xi ))$ for $u \in L^2({\mathcal {D}})$ and $\xi \in {\mathcal {D}}$, is also Lipschitz continuous; it is used in the weak formulation below.

Let $W^{1,p}_0({\mathcal {D}})$ be the Sobolev space of weakly differentiable and p-fold integrable functions on ${\mathcal {D}}$ with vanishing trace on the boundary $\partial {\mathcal {D}}$, see [47, Section 1.2.3] or [40, Section 4.5] for a precise definition. The dual space of $W_0^{1,p}({\mathcal {D}})$ is denoted by $W^{-1,p}({\mathcal {D}})$ in the following. Then, the stochastic p-Laplace equation (7.1) has a unique variational solution $(u(t))_{t \in [0,T]}$ which is progressively measurable and an element of $L^2(\varOmega ; C([0,T]; L^2({\mathcal {D}}))) \cap L^p(\varOmega ; L^p(0,T;W_0^{1,p}({\mathcal {D}})))$. For further details we refer to [27, Example 4.1.9, Theorem 4.2.4].

For a spatial discretization of (7.1), we use a family of finite element spaces $(V_h)_{h>0}$ such that $V_h \subset W_0^{1,p}({\mathcal {D}})$ for every $h >0$. Here, h is interpreted as a spatial refinement parameter. In the following, we consider a fixed parameter value $h >0$ and denote by $d \in {\mathbf {N}}$ the dimension of the space $V_h$.

The spatially semi-discrete problem consists of finding a progressively measurable $L^2(\varOmega ; C([0,T]; L^2({\mathcal {D}}))) \cap L^p(\varOmega ; L^p(0,T;V_h))$-valued stochastic process $(u_h(t))_{t \in [0,T]}$ such that

$$\begin{aligned} \begin{aligned} \int _{{\mathcal {D}}} u_h(t) v_h \,\mathrm {d}\xi + \int _{{\mathcal {D}}} \int _{0}^{t} |\nabla u_h(s)|^{p-2} \nabla u_h(s) \cdot \nabla v_h \,\mathrm {d}s \,\mathrm {d}\xi \\ = \int _{{\mathcal {D}}} P_h u_0 v_h \,\mathrm {d}\xi + \int _{{\mathcal {D}}} \int _{0}^{t} P_h {\tilde{\varPsi }}(u_h(s)) \,\mathrm {d}W(s) v_h \,\mathrm {d}\xi \end{aligned}\nonumber \\ \end{aligned}$$

(7.3)

for every $v_h \in V_h$ and $t\in [0,T]$. Here, $P_h :L^2({\mathcal {D}}) \rightarrow V_h$ denotes the $L^2({\mathcal {D}})$-orthogonal projection onto $V_h$.

In order to apply our results from the previous sections, we rewrite (7.3) as a problem in ${\mathbf {R}}^d$. To this end, we consider a one-to-one relation between $V_h$ and ${\mathbf {R}}^d$ given by

$$\begin{aligned} v_x = \sum _{i=1}^{d} x_i \varphi _i \in V_h \quad \text {for } x = [x_1,\ldots ,x_d]^{\top } \in {\mathbf {R}}^d \end{aligned}$$

(7.4)

for a basis $\{\varphi _1,\dots ,\varphi _d\}$ of $V_h$. Through (7.4) we induce additional norms on ${\mathbf {R}}^d$ which are given by

$$\begin{aligned} \Vert x\Vert _1 := \Vert v_x\Vert _{W_0^{1,p}({\mathcal {D}})}, \quad \Vert x\Vert _0 := \Vert v_x\Vert _{L^2({\mathcal {D}})}, \quad \Vert x\Vert _{-1} := \Vert v_x\Vert _{W^{-1,p}({\mathcal {D}})}, \end{aligned}$$

for every $x \in {\mathbf {R}}^d$. Observe that the norm $\Vert \cdot \Vert _0$ is also induced by the inner product

$$\begin{aligned} \langle x,y\rangle _{0} := \langle v_x,v_y\rangle _{L^2({\mathcal {D}})} = \langle M_h x,y\rangle _{}, \quad \text {with } M_h = (\langle \varphi _i,\varphi _j\rangle _{L^2({\mathcal {D}})})_{i,j \in \{1,\dots ,d\}}, \end{aligned}$$

where the mass matrix $M_h$ is symmetric and positive definite. Since all norms on ${\mathbf {R}}^d$ are equivalent, for each $i \in \{-1,0,1\}$ there exist $c_i, C_i \in (0,\infty )$ such that

$$\begin{aligned} c_i \Vert x\Vert _i \le |x| \le C_i \Vert x\Vert _i \end{aligned}$$

for all $x \in {\mathbf {R}}^d$.
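To make the norm equivalence concrete, the following Python sketch assumes the simplest setting, which the paper does not fix: ${\mathcal {D}} = (0,1)$ (so $n = 1$), a uniform mesh with d interior nodes and mesh width $h = 1/(d+1)$, and $V_h$ spanned by continuous piecewise linear hat functions $\varphi _i$. Then $M_h = \frac{h}{6}\,\mathrm {tridiag}(1,4,1)$, and since $\Vert x\Vert _0^2 = \langle M_h x,x\rangle $, one may take $c_0 = \lambda _{\max }(M_h)^{-1/2}$ and $C_0 = \lambda _{\min }(M_h)^{-1/2}$. The function names are hypothetical.

```python
import numpy as np

def p1_mass_matrix(d):
    """Mass matrix M_h = (<phi_i, phi_j>_{L^2}) of hat functions on (0, 1)
    with d interior nodes and uniform mesh width h = 1/(d+1) (assumed setting)."""
    h = 1.0 / (d + 1)
    return (h / 6.0) * (4.0 * np.eye(d) + np.eye(d, k=1) + np.eye(d, k=-1))

def l2_norm_constants(M):
    """Constants with c0*||x||_0 <= |x| <= C0*||x||_0, where ||x||_0^2 = <M x, x>."""
    lam = np.linalg.eigvalsh(M)        # eigenvalues in ascending order
    return 1.0 / np.sqrt(lam[-1]), 1.0 / np.sqrt(lam[0])
```

In this setting the eigenvalues of $M_h$ lie in $(h/3, h)$, so $C_0 < (3/h)^{1/2}$ and $C_0$ behaves like $h^{-1/2}$ on fine meshes. This illustrates why the constants in this section may depend on the dimension d of $V_h$.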

The p-Laplace operator in the spatially semi-discrete problem (7.3) is represented by the operator $A_h :V_h \rightarrow V_h$, implicitly defined by

$$\begin{aligned} \langle A_h(v_h),w_h\rangle _{L^2({\mathcal {D}})} = \int _{{\mathcal {D}}} |\nabla v_h|^{p-2} \nabla v_h \cdot \nabla w_h \,\mathrm {d}\xi \end{aligned}$$

for all $v_h, w_h \in V_h$. By the same arguments as in [27, Example 4.1.9] one can easily verify that $A_h$ fulfills

$$\begin{aligned}&\langle A_h (v_h) - A_h (w_h),v_h - w_h\rangle _{L^2({\mathcal {D}})} \ge 0,\\&\langle A_h (v_h),v_h\rangle _{L^2({\mathcal {D}})} = \Vert v_h\Vert _{W_0^{1,p}({\mathcal {D}})}^p, \quad \Vert A_h (v_h)\Vert _{W^{-1,p}({\mathcal {D}})} \le \Vert v_h\Vert _{W_0^{1,p}({\mathcal {D}})}^{p-1} \end{aligned}$$

for all $v_h, w_h \in V_h$. Then, for $x, y \in {\mathbf {R}}^d$ and associated $v_x, v_y \in V_h$, we introduce mappings ${\tilde{f}} :{\mathbf {R}}^d \rightarrow {\mathbf {R}}^d$ and ${\tilde{g}} :{\mathbf {R}}^d \rightarrow {\mathbf {R}}^{d,m}$ implicitly by

$$\begin{aligned} \sum _{i=1}^{d} [{\tilde{f}}(x)]_i \varphi _i = A_h (v_x), \quad \sum _{i=1}^{d} [{\tilde{g}}(x)z ]_i \varphi _i = P_h {\tilde{\varPsi }}(v_x) z,\quad \sum _{i=1}^{d} [X_0]_i \varphi _i = P_h u_0 \end{aligned}$$

for $z \in {\mathbf {R}}^m$ and use these functions to define $f(x) := M_h {\tilde{f}}(x)$ as well as $g(x) := M_h^{\frac{1}{2}}{\tilde{g}}(x)$ for every $x \in {\mathbf {R}}^d$. As we assumed that $v_x \mapsto {\tilde{\varPsi }} (v_x )$ is Lipschitz continuous, there exists $L_g \in (0,\infty )$ such that

$$\begin{aligned} |g(x) - g(y) |^2&= \sum _{j=1}^{m} | M_h^{\frac{1}{2}}{\tilde{g}}(x)e_j - M_h^{\frac{1}{2}}{\tilde{g}}(y)e_j|^2\\&= \sum _{j=1}^{m} \Vert P_h {\tilde{\varPsi }}(v_x)e_j - P_h {\tilde{\varPsi }}(v_y)e_j \Vert _{L^2({\mathcal {D}})}^2\\&= \Vert P_h {\tilde{\varPsi }}(v_x) - P_h {\tilde{\varPsi }}(v_y) \Vert _{{\mathcal {L}}_2({\mathbf {R}}^m;L^2({\mathcal {D}}))}^2\\&\le L_g^2 \Vert v_x - v_y \Vert _{L^2({\mathcal {D}})}^2 \le \frac{L_g^2}{c_0^2} | x - y |^2 \end{aligned}$$

for $x,y \in {\mathbf {R}}^d$ and $v_x,v_y \in V_h$ fulfilling (7.4) as well as an orthonormal basis $\{e_j\}_{j \in \{1,\dots ,m\}}$ of ${\mathbf {R}}^m$. Thus, g fulfills Assumption 4.3. Due to the integrability condition (7.2) on $u_0$, it follows that $X_0$ fulfills Assumption 4.4.
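The first two equalities in the Lipschitz estimate for g rest on the identity $|M_h^{\frac{1}{2}} z|^2 = \langle M_h z,z\rangle = \Vert v_z\Vert _{L^2({\mathcal {D}})}^2$. The Python sketch below computes $M_h^{\frac{1}{2}}$ by an eigendecomposition and assembles $g(x) = M_h^{\frac{1}{2}}{\tilde{g}}(x)$ for piecewise linear elements on a uniform mesh of ${\mathcal {D}} = (0,1)$. As a simplification that is not part of the paper, it takes $m = 1$, a hypothetical $\varPsi (u) = \sigma \sin (u)$, and replaces $P_h {\tilde{\varPsi }}(v_x)$ by the nodal interpolant of $\varPsi (v_x)$, so that ${\tilde{g}}(x)$ has entries $\sigma \sin (x_i)$.

```python
import numpy as np

def p1_mass_matrix(d):
    """Mass matrix of hat functions on (0, 1), d interior nodes, h = 1/(d+1) (assumed)."""
    h = 1.0 / (d + 1)
    return (h / 6.0) * (4.0 * np.eye(d) + np.eye(d, k=1) + np.eye(d, k=-1))

def sqrtm_spd(M):
    """Symmetric square root Q diag(sqrt(lam)) Q^T of a symmetric positive definite matrix."""
    lam, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(lam)) @ Q.T

def g_tilde(x, sigma=0.5):
    """HYPOTHETICAL coefficient: nodal interpolant of Psi(v_x), Psi(u) = sigma*sin(u), m = 1."""
    return (sigma * np.sin(x))[:, None]          # shape (d, m) = (d, 1)

def g(x, M_half, sigma=0.5):
    """g(x) = M_h^{1/2} g_tilde(x), a d x m matrix."""
    return M_half @ g_tilde(x, sigma)
```

In this simplified setting, $|g(x) - g(y)| \le \sigma \lambda _{\max }(M_h)^{1/2} |x - y| = \frac{\sigma }{c_0}|x - y|$ with $c_0 = \lambda _{\max }(M_h)^{-1/2}$, which mirrors the constant $L_g / c_0$ above.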

Moreover, f is monotone, coercive, and bounded, since

$$\begin{aligned} \langle f(x) - f(y),x - y\rangle _{}&= \langle {\tilde{f}}(x) - {\tilde{f}}(y),x - y\rangle _{0}\\&=\sum _{i=1}^{d} \sum _{j=1}^{d} \big ([{\tilde{f}}(x)]_i - [{\tilde{f}}(y)]_i\big ) \big (x_j - y_j\big ) \langle \varphi _i ,\varphi _j\rangle _{L^2({\mathcal {D}})}\\&= \langle A_h (v_x) - A_h (v_y),v_x - v_y\rangle _{L^2({\mathcal {D}})} \ge 0 \end{aligned}$$

as well as

$$\begin{aligned} \langle f(x),x\rangle _{}&=\sum _{i=1}^{d} \sum _{j=1}^{d} [{\tilde{f}}(x)]_i x_j \langle \varphi _i ,\varphi _j\rangle _{L^2({\mathcal {D}})}\\&= \langle A_h (v_x),v_x\rangle _{L^2({\mathcal {D}})} = \Vert x\Vert _1^p \ge C_1^{-p} |x|^p \end{aligned}$$

and

$$\begin{aligned} |f(x) |&\le \Vert M_h\Vert _{{\mathcal {L}}({\mathbf {R}}^d)} |M_h^{-1} f(x) | \le C_{-1} \Vert M_h\Vert _{{\mathcal {L}}({\mathbf {R}}^d)} \Vert {\tilde{f}}(x) \Vert _{-1} \\&= C_{-1} \Vert M_h\Vert _{{\mathcal {L}}({\mathbf {R}}^d)} \Vert A_h (v_x) \Vert _{W^{-1,p}({\mathcal {D}})} \le C_{-1} \Vert M_h\Vert _{{\mathcal {L}}({\mathbf {R}}^d)} \Vert v_x\Vert _{W_0^{1,p}({\mathcal {D}})}^{p-1} \\&= C_{-1} \Vert M_h\Vert _{{\mathcal {L}}({\mathbf {R}}^d)} \Vert x \Vert _1^{p-1} \le \frac{C_{-1}}{c_1^{p-1}} \Vert M_h\Vert _{{\mathcal {L}}({\mathbf {R}}^d)} |x|^{p-1} \end{aligned}$$

for all $x,y \in {\mathbf {R}}^d$ and $v_x,v_y \in V_h$ fulfilling (7.4). Here, $\Vert \cdot \Vert _{{\mathcal {L}}({\mathbf {R}}^d)}$ denotes the matrix norm on ${\mathbf {R}}^{d,d}$ induced by $|\cdot |$. Therefore, Assumption 4.1 is satisfied. To prove that f fulfills Assumption 6.3, we note that the mapping $\varPhi :V_h \rightarrow [0,\infty )$ given by

$$\begin{aligned} \varPhi (v_h) = \frac{1}{p} \int _{{\mathcal {D}}} |\nabla v_h|^p \,\mathrm {d}\xi , \quad v_h \in V_h, \end{aligned}$$

is a potential of $A_h$, cf. [47, Example 4.23]. Since $\varPhi $ is convex, it follows that

$$\begin{aligned} \varPhi (v_h) \ge \varPhi (w_h) + \langle A_h(w_h),v_h - w_h\rangle _{L^2({\mathcal {D}})}, \quad \text { for all } v_h, w_h \in V_h, \end{aligned}$$

where we used [13, Kapitel III, Lemma 4.10]. In the same way as in Lemma 3.2, we obtain that

$$\begin{aligned} \langle A_h(v_x) - A_h(v_y),v_y - v_z\rangle _{L^2({\mathcal {D}})} \le \langle A_h(v_x) - A_h(v_z),v_x - v_z\rangle _{L^2({\mathcal {D}})} \end{aligned}$$

for all $v_z,v_x,v_y \in V_h$. Applying the definition of f, we then get

$$\begin{aligned} \langle f(x) - f(y),y - z\rangle _{}&= \langle A_h(v_x) - A_h(v_y),v_y - v_z\rangle _{L^2({\mathcal {D}})}\\&\le \langle A_h(v_x) - A_h(v_z),v_x - v_z\rangle _{L^2({\mathcal {D}})} = \langle f(x) - f(z),x - z\rangle _{} \end{aligned}$$

for $x,y,z \in {\mathbf {R}}^d$ and $v_x,v_y,v_z \in V_h$ fulfilling (7.4). This shows that f also fulfills Assumption 6.3.
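These properties of f can be checked numerically in a simple special case. The Python sketch below assumes ${\mathcal {D}} = (0,1)$, hat functions on a uniform mesh, and $\Vert v\Vert _{W_0^{1,p}({\mathcal {D}})} = \Vert \nabla v\Vert _{L^p({\mathcal {D}})}$, the norm for which $\langle A_h(v_h),v_h\rangle _{L^2({\mathcal {D}})} = \Vert v_h\Vert _{W_0^{1,p}({\mathcal {D}})}^p$ holds. Since $f(x) = M_h {\tilde{f}}(x)$, the j-th entry of $f(x)$ equals $\langle A_h(v_x),\varphi _j\rangle _{L^2({\mathcal {D}})} = \int _{{\mathcal {D}}} |v_x'|^{p-2} v_x' \varphi _j' \,\mathrm {d}\xi $, which can be assembled elementwise without inverting $M_h$. With $\psi (s) = |s|^{p-2}s$ and $s_{j-1}, s_j$ the slopes of $v_x$ on the two elements adjacent to node j, this gives $f(x)_j = \psi (s_{j-1}) - \psi (s_j)$. The function names are hypothetical.

```python
import numpy as np

def slopes(x, h):
    """Slopes of v_x on the d+1 elements; zero boundary values are appended (D = (0, 1) assumed)."""
    u = np.concatenate(([0.0], x, [0.0]))
    return np.diff(u) / h

def f_plap(x, h, p):
    """f(x)_j = <A_h(v_x), phi_j>_{L^2} = psi(s_{j-1}) - psi(s_j), psi(s) = |s|^{p-2} s."""
    s = slopes(x, h)
    q = np.abs(s) ** (p - 2) * s
    return q[:-1] - q[1:]

def phi_pot(x, h, p):
    """Potential Phi(v_x) = (1/p) * int |v_x'|^p, exact for piecewise linear v_x."""
    return h / p * np.sum(np.abs(slopes(x, h)) ** p)
```

A direct computation shows that `f_plap` is the Euclidean gradient of `phi_pot`, which is the discrete counterpart of $\varPhi $ being a potential of $A_h$; monotonicity, the identity $\langle f(x),x\rangle = \Vert x\Vert _1^p$, and the three-point inequality above can then be tested on random vectors.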

Consequently, the results of the previous sections are applicable. More precisely, the backward Euler–Maruyama method (1.6) has a unique solution $(X^n)_{n\in \{0,\dots ,N\}}$ (cf. Theorem 5.3). Theorem 6.4 then states that the piecewise linear interpolant ${\mathcal {X}}$ of the values $(X^n)_{n\in \{0,\dots ,N\}}$ defined in (6.3) fulfills

$$\begin{aligned} \max _{t \in [0,T]} \Vert X(t) - {\mathcal {X}}(t) \Vert _{L^2(\varOmega ;{\mathbf {R}}^d)} \le C k^{\frac{1}{4}} \end{aligned}$$

for some $C \in (0,\infty )$ that does not depend on the step size k, where X is the solution to the single-valued stochastic differential equation

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\mathrm {d}X(t) + f(X(t)) \,\mathrm {d}t = g(X(t)) \,\mathrm {d}W(t), \quad t \in (0,T],\\ X(0) = X_0. \end{array}\right. } \end{aligned}$$
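A minimal Python sketch of the resulting fully discrete scheme follows. It again assumes ${\mathcal {D}} = (0,1)$ and piecewise linear elements on a uniform mesh, and it takes the diffusion coefficient g as a caller-supplied function returning a $d \times m$ matrix. Each step $X^n + k f(X^n) = X^{n-1} + g(X^{n-1}) \varDelta W^n$ characterizes $X^n$ as the unique minimizer of the strongly convex function $x \mapsto \frac{1}{2}|x - y|^2 + k \varPhi (v_x)$ with $y = X^{n-1} + g(X^{n-1}) \varDelta W^n$. The nonlinear equation is solved here by a damped Newton iteration with Armijo backtracking. This solver is our choice for illustration; the paper does not prescribe one, and all function names are hypothetical.

```python
import numpy as np

def _slopes(x, h):
    u = np.concatenate(([0.0], x, [0.0]))      # homogeneous Dirichlet values at xi = 0, 1
    return np.diff(u) / h

def f_plap(x, h, p):
    """f(x)_j = psi(s_{j-1}) - psi(s_j) with psi(s) = |s|^{p-2} s (1D P1, assumed)."""
    s = _slopes(x, h)
    q = np.abs(s) ** (p - 2) * s
    return q[:-1] - q[1:]

def jac_plap(x, h, p):
    """Tridiagonal Jacobian of f_plap; w_e = psi'(s_e)/h with psi'(s) = (p-1)|s|^{p-2}."""
    w = (p - 1) * np.abs(_slopes(x, h)) ** (p - 2) / h
    return np.diag(w[:-1] + w[1:]) - np.diag(w[1:-1], 1) - np.diag(w[1:-1], -1)

def step_objective(x, y, k, h, p):
    """Strongly convex function minimized by X^n: 0.5|x - y|^2 + k * Phi(v_x)."""
    return 0.5 * np.sum((x - y) ** 2) + k * h / p * np.sum(np.abs(_slopes(x, h)) ** p)

def implicit_step(y, k, h, p, tol=1e-10, maxit=50):
    """Solve x + k f(x) = y by Newton's method with Armijo backtracking."""
    x = y.copy()
    for _ in range(maxit):
        r = x + k * f_plap(x, h, p) - y         # gradient of step_objective
        if np.linalg.norm(r) < tol:
            break
        dx = np.linalg.solve(np.eye(x.size) + k * jac_plap(x, h, p), -r)
        t, G0 = 1.0, step_objective(x, y, k, h, p)
        while step_objective(x + t * dx, y, k, h, p) > G0 + 1e-4 * t * (r @ dx) and t > 1e-12:
            t *= 0.5
        x = x + t * dx
    return x

def backward_em_plap(x0, g, k, N, h, p, rng):
    """X^n + k f(X^n) = X^{n-1} + g(X^{n-1}) dW^n, where g(x) returns a d x m matrix."""
    X = [np.asarray(x0, dtype=float)]
    for _ in range(N):
        G = g(X[-1])
        dW = rng.normal(0.0, np.sqrt(k), size=G.shape[1])
        X.append(implicit_step(X[-1] + G @ dW, k, h, p))
    return np.array(X)
```

Because $I + k J_f(x)$ is symmetric positive definite (the Jacobian $J_f$ of f is positive semi-definite), every Newton direction is a descent direction for the step objective. For $p = 2$ the step reduces to one linear solve with $I + k K_h$, where $K_h$ is the usual stiffness matrix.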

Observe that our proof does not yet rule out that the constant C above depends on the dimension d of the finite element space $V_h$. Hence, this is not yet a complete analysis of a full discretization of the stochastic partial differential equation (7.1); a more detailed analysis is the subject of future work. We refer to [5] for a related result in this direction.

Let us emphasize that, unlike the results in [5], we do not have to impose any temporal regularity assumption on the exact solution of (7.1) or on the solution of the semi-discrete problem (7.3). Since such regularity conditions are often difficult to verify for quasi-linear stochastic partial differential equations, we are confident that our approach could lead to interesting new insights in the numerical analysis of such infinite-dimensional problems.

References

[1] Andersson, A., Kruse, R.: Mean-square convergence of the BDF2-Maruyama and backward Euler schemes for SDE satisfying a global monotonicity condition. BIT Numer. Math. 57(1), 21–53 (2017)
[2] Barbu, V.: Nonlinear Differential Equations of Monotone Types in Banach Spaces. Springer Monographs in Mathematics. Springer, New York (2010)
[3] Bernardin, F.: Multivalued stochastic differential equations: convergence of a numerical scheme. Set-Valued Anal. 11(4), 393–415 (2003)
[4] Beyn, W.-J., Isaak, E., Kruse, R.: Stochastic C-stability and B-consistency of explicit and implicit Euler-type schemes. J. Sci. Comput. 67(3), 955–987 (2016)
[5] Breit, D., Hofmanová, M.: Space-time approximation of stochastic $p$-Laplace systems (2019). ArXiv Preprint, arXiv:1904.03134
[6] Brosse, N., Durmus, A., Moulines, É., Sabanis, S.: The tamed unadjusted Langevin algorithm. Stoch. Process. Appl. 129(10), 3638–3663 (2018)
[7] Cépa, E.: Équations différentielles stochastiques multivoques. In: Séminaire de Probabilités, XXIX, Volume 1613 of Lecture Notes in Math., pp. 86–107. Springer, Berlin (1995)
[8] Clark, D.S.: Short proof of a discrete Gronwall inequality. Discrete Appl. Math. 16(3), 279–281 (1987)
[9] Dareiotis, K., Gerencsér, M.: On the regularisation of the noise for the Euler–Maruyama scheme with irregular drift. Electron. J. Probab. 25, Paper No. 82, 18 (2020)
[10] Durmus, A., Moulines, É.: High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli 25(4A), 2854–2882 (2019)
[11] Evans, L.C.: Partial Differential Equations. Graduate Studies in Mathematics, vol. 19. American Mathematical Society, Providence (1998)
[12] Friedman, A.: Stochastic Differential Equations and Applications, vol. 1. Probability and Mathematical Statistics, vol. 28. Academic Press, New York (1975)
[13] Gajewski, H., Gröger, K., Zacharias, K.: Nichtlineare Operatorgleichungen und Operatordifferentialgleichungen. Mathematische Lehrbücher und Monographien, II. Abteilung, Mathematische Monographien, Band 38. Akademie-Verlag, Berlin (1974)
[14] Gess, B., Tölle, J.M.: Multi-valued, singular stochastic evolution inclusions. J. Math. Pures Appl. 101(6), 789–827 (2014)
[15] Higham, D.J., Mao, X., Stuart, A.M.: Strong convergence of Euler-type methods for nonlinear stochastic differential equations. SIAM J. Numer. Anal. 40(3), 1041–1063 (2002)
[16] Hu, Y.: Semi-implicit Euler–Maruyama scheme for stiff stochastic equations. In: Stochastic Analysis and Related Topics, V (Silivri, 1994), Volume 38 of Progr. Probab., pp. 183–202. Birkhäuser, Boston (1996)
[17] Hutzenthaler, M., Jentzen, A.: Numerical approximations of stochastic differential equations with non-globally Lipschitz continuous coefficients. Mem. Am. Math. Soc. 236(1112), v+99 (2015)
[18] Hutzenthaler, M., Jentzen, A., Kloeden, P.E.: Strong and weak divergence in finite time of Euler’s method for stochastic differential equations with non-globally Lipschitz continuous coefficients. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 467(2130), 1563–1576 (2011)
[19] Kelly, C., Lord, G.J.: Adaptive time-stepping strategies for nonlinear stochastic systems. IMA J. Numer. Anal. 38(3), 1523–1549 (2018)
[20] Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Applications of Mathematics (New York), vol. 23. Springer, Berlin (1992)
[21] Krée, P.: Diffusion equation for multivalued stochastic differential equations. J. Funct. Anal. 49(1), 73–90 (1982)
[22] Leimkuhler, B., Matthews, C., Stoltz, G.: The computation of averages from equilibrium and nonequilibrium Langevin molecular dynamics. IMA J. Numer. Anal. 36(1), 13–79 (2016)
[23] Lelièvre, T., Rousset, M., Stoltz, G.: Free Energy Computations: A Mathematical Perspective. Imperial College Press, London (2010)
[24] Leobacher, G., Szölgyenyi, M.: A strong order 1/2 method for multidimensional SDEs with discontinuous drift. Ann. Appl. Probab. 27(4), 2383–2418 (2017)
[25] Leobacher, G., Szölgyenyi, M.: Convergence of the Euler–Maruyama method for multidimensional SDEs with discontinuous drift and degenerate diffusion coefficient. Numer. Math. 138(1), 219–239 (2018)
[26] Lepingle, D., Nguyen, T.T.: Approximating and simulating multivalued stochastic differential equations. Monte Carlo Methods Appl. 10(2), 129–152 (2004)
[27] Liu, W., Röckner, M.: Stochastic Partial Differential Equations: An Introduction. Universitext. Springer, Cham (2015)
[28] Mao, X.: Stochastic Differential Equations and Applications, 2nd edn. Horwood Publishing Limited, Chichester (2008)
[29] Mao, X.: Convergence rates of the truncated Euler-Maruyama method for stochastic differential equations. J. Comput. Appl. Math. 296, 362–375 (2016)
[30] Mao, X., Szpruch, L.: Strong convergence and stability of implicit numerical methods for stochastic differential equations with non-globally Lipschitz continuous coefficients. J. Comput. Appl. Math. 238, 14–28 (2013)
[31] Mao, X., Szpruch, L.: Strong convergence rates for backward Euler–Maruyama method for non-linear dissipative-type stochastic differential equations with super-linear diffusion coefficients. Stochastics 85(1), 144–171 (2013)
[32] Milstein, G.N., Tretyakov, M.V.: Stochastic Numerics for Mathematical Physics. Scientific Computation. Springer, Berlin (2004)
[33] Müller-Gronbach, T., Yaroslavtseva, L.: On the performance of the Euler–Maruyama scheme for SDEs with discontinuous drift coefficient. Ann. Inst. Henri Poincaré Probab. Stat. 56(2), 1162–1178 (2020)
[34] Müller-Gronbach, T., Yaroslavtseva, L.: A strong order 3/4 method for SDEs with discontinuous drift coefficient. IMA J. Numer. Anal. (2020). https://doi.org/10.1093/imanum/draa078
[35] Neuenkirch, A., Szölgyenyi, M., Szpruch, L.: An adaptive Euler–Maruyama scheme for stochastic differential equations with discontinuous drift and its convergence analysis. SIAM J. Numer. Anal. 57(1), 378–403 (2019)
[36] Ngo, H.-L., Taguchi, D.: On the Euler–Maruyama approximation for one-dimensional stochastic differential equations with irregular coefficients. IMA J. Numer. Anal. 37(4), 1864–1883 (2017)
[37] Ngo, H.-L., Taguchi, D.: Approximation for non-smooth functionals of stochastic differential equations with irregular drift. J. Math. Anal. Appl. 457(1), 361–388 (2018)
[38] Nochetto, R.H., Savaré, G., Verdi, C.: A posteriori error estimates for variable time-step discretizations of nonlinear evolution equations. Commun. Pure Appl. Math. 53(5), 525–589 (2000)
[39] Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables, Volume 30 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2000) (Reprint of the 1970 original)
[40] Papageorgiou, N.S., Winkert, P.: Applied Nonlinear Functional Analysis: An Introduction. De Gruyter Graduate. De Gruyter, Berlin (2018)
[41] Pardoux, E., Răşcanu, A.: Stochastic Differential Equations, Backward SDEs, Partial Differential Equations. Stochastic Modelling and Applied Probability, vol. 69. Springer, Cham (2014)
[42] Pettersson, R.: Yosida approximations for multivalued stochastic differential equations. Stoch. Stoch. Rep. 52(1–2), 107–120 (1995)
[43] Pettersson, R.: Projection scheme for stochastic differential equations with convex constraints. Stoch. Process. Appl. 88(1), 125–134 (2000)
[44] Prévôt, C., Röckner, M.: A Concise Course on Stochastic Partial Differential Equations. Lecture Notes in Mathematics, vol. 1905. Springer, Berlin (2007)
[45] Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2(4), 341–363 (1996)
[46] Rockafellar, R.T.: Convex Analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton (1997) (Reprint of the 1970 original, Princeton Paperbacks)
[47] Roubíček, T.: Nonlinear Partial Differential Equations with Applications, Volume 153 of International Series of Numerical Mathematics, 2nd edn. Birkhäuser, Basel (2013)
[48] Růžička, M.: Nichtlineare Funktionalanalysis: eine Einführung. Springer, Berlin (2004)
[49] Sabanis, S.: Euler approximations with varying coefficients: the case of superlinearly growing diffusion coefficients. Ann. Appl. Probab. 26(4), 2083–2105 (2016)
[50] Sabanis, S., Zhang, Y.: Higher order Langevin Monte Carlo algorithm. Electron. J. Stat. 13(2), 3805–3850 (2019)
[51] Scheutzow, M.: A stochastic Gronwall lemma. Infin. Dimens. Anal. Quantum Probab. Relat. Top. 16(2), Paper No. 1350019, 4 (2013)
[52] Stephan, M.: Yosida approximations for multivalued stochastic differential equations on Banach spaces via a Gelfand triple. Ph.D. thesis, Bielefeld University (2012)
[53] Stuart, A.M., Humphries, A.R.: Dynamical Systems and Numerical Analysis. Cambridge Monographs on Applied and Computational Mathematics, vol. 2. Cambridge University Press, Cambridge (1996)
[54] Wu, J., Zhang, H.: Penalization schemes for multi-valued stochastic differential equations. Stat. Probab. Lett. 83(2), 481–492 (2013)
[55] Xie, L., Zhang, X.: Ergodicity of stochastic differential equations with jumps and singular coefficients. Ann. Inst. Henri Poincaré Probab. Stat. 56(1), 175–229 (2020)
[56] Zhang, H.: Strong convergence rate for multivalued stochastic differential equations via stochastic theta method. Stochastics 90(5), 762–781 (2018)


Acknowledgements

ME would like to thank the Berlin Mathematical School for the financial support. RK also gratefully acknowledges financial support by the German Research Foundation (DFG) through the research unit FOR 2402 – Rough paths, stochastic partial differential equations and related topics – at TU Berlin. MK and SL were supported by Vetenskapsrådet (VR) through Grant No. 2017-04274.

Funding

Open access funding provided by Lund University.

Author information

Authors and Affiliations

Centre for Mathematical Sciences, Lund University, 221 00, Lund, Sweden
Monika Eisenmann
Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, P.O. Box 278, Budapest, Hungary
Mihály Kovács
Institut für Mathematik, Martin-Luther-Universität Halle-Wittenberg, 06099, Halle (Saale), Germany
Raphael Kruse
Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, 412 96, Gothenburg, Sweden
Stig Larsson


Corresponding author

Correspondence to Monika Eisenmann.

Additional information

Communicated by David Cohen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Eisenmann, M., Kovács, M., Kruse, R. et al. Error estimates of the backward Euler–Maruyama method for multi-valued stochastic differential equations. Bit Numer Math 62, 803–848 (2022). https://doi.org/10.1007/s10543-021-00893-w


Received: 01 April 2020
Accepted: 20 August 2021
Published: 14 September 2021
Issue Date: September 2022
DOI: https://doi.org/10.1007/s10543-021-00893-w
