1 Introduction

In this paper we study stochastic model reduction for a system of nonlinear evolution equations in infinite-dimensional Hilbert spaces which is general enough to cover well-established systems of equations used in climate modeling. The main advantage of such a procedure is the lower complexity of the reduced equations, since complexity is still one of the major issues when predicting the evolution of systems over time spans which are typical for climate rather than meteorology.

Following [9, 17], we assume that the climate variables of the system, i.e., those more relevant to climate prediction, evolve on longer time scales than the unresolved variables, which can be modeled stochastically and have a typical time scale much shorter than the climate variables. To be able to close the equation for the climate variables, the task is to understand the effects of unresolved variables when stretching time to climate time. In what follows, we also refer to climate variables as resolved variables.

Climate modeling typically starts with equations containing quadratic nonlinearities which can describe many features of oceanic and atmospheric dynamics on meteorological time scales—see [18, 25]. In abstract mathematical terms, such equations take the form

$$\begin{aligned} \frac{\mathrm{d}Z_t}{\mathrm{d}t} = f_t + A Z_t + B(Z_t,Z_t), \end{aligned}$$
(1)

where \(A:H \rightarrow H\) is a linear operator, \(B:H\times H \rightarrow H\) is a bilinear operator, and f is an external forcing term. Here, the variable Z taking values in H is supposed to be a complex mix of climate and unresolved variables, and hence, the space H has to be ‘big enough’ to ‘host’ variables of that type. We therefore choose H to be a separable infinite-dimensional Hilbert space.

Now, there is a variety of procedures to identify climate variables in practice which we will not discuss in this paper. Rather, we assume that climate variables have been identified spanning a Hilbert subspace \(H_d\subset H\), and we further assume that the orthogonal complement \(H_\infty \), \(H = H_d \oplus H_\infty \), gives the space of unresolved variables. When projecting Z onto \(H_d\), \(H_\infty \) via the projection maps \(\pi _d\), \(\pi _\infty \), Eq. (1) gives rise to two equations

$$\begin{aligned} \frac{\mathrm{d}X_t}{\mathrm{d}t} = f^1_t + {\tilde{A}}^1_1 X_t + A^1_2 Y_t + {\tilde{B}}^1_{11}(X_t,X_t) + B^1_{12}(X_t,Y_t) + B^1_{22}(Y_t,Y_t) \end{aligned}$$
(2)

and

$$\begin{aligned} \frac{\mathrm{d}Y_t}{\mathrm{d}t} = f^2_t + A^2_1 X_t + A^2_2 Y_t + B^2_{11}(X_t,X_t) + B^2_{12}(X_t,Y_t) + B^2_{22}(Y_t,Y_t) \end{aligned}$$
(3)

for the collection of climate variables \(X=\pi _d(Z)\) and unresolved variables \(Y=\pi _\infty (Z)\), respectively.

The next step, called stochastic climate modeling, consists in replacing the complicated nonlinear self-interaction term in (3) by a linear random term. Such a replacement can be justified by the assumption that quickly varying fluctuations of small-scale unresolved variables are more or less indistinguishable from the combined effect of a large number of weakly coupled factors, usually leading to Gaussian driving forces via the central limit theorem. But such effects would only become visible at climate time and not at the meteorological time used in (2) and (3), so we are looking to replace \(B^2_{22}(Y_{\varepsilon ^{-1}t},Y_{\varepsilon ^{-1}t})\) by a linear random term, stretching meteorological time to \(\varepsilon ^{-1}t\), using a small parameter \(\varepsilon \ll 1\).

In this work, following [17, 22], we suppose that

$$\begin{aligned} B^2_{22}(Y_{\varepsilon ^{-1}t},Y_{\varepsilon ^{-1}t}) \hbox { is replaced by } - \mu \varepsilon ^{-1} Y_{\varepsilon ^{-1}t} + \sigma {\dot{W}}_t, \end{aligned}$$

where \(\mu ,\sigma \) are positive constants, and \({\dot{W}}\) is Gaussian noise, white in time and colored in space. This way, the parameter \(\varepsilon \) is used both to scale time and to adjust for the size of the involved variables when scaling time.

Another assumption made in [17] is that climate variables at climate time have small forcing and self-interaction, and hence, we also suppose that

$$\begin{aligned} f^1_{\varepsilon ^{-1}t} + {\tilde{A}}^1_1 X_{\varepsilon ^{-1}t} + {\tilde{B}}^1_{11}(X_{\varepsilon ^{-1}t},X_{\varepsilon ^{-1}t}) \hbox { is replaced by } \varepsilon F^1_t + \varepsilon A^1_1 X_{\varepsilon ^{-1}t} + \varepsilon B^1_{11}(X_{\varepsilon ^{-1}t},X_{\varepsilon ^{-1}t}), \end{aligned}$$

avoiding so-called fast forcing and fast waves.

All in all, when introducing the notation \(X^\varepsilon _t = X_{\varepsilon ^{-1}t}\) for climate variables at climate time, and \(Y^\varepsilon _t = \varepsilon ^{-1} Y_{\varepsilon ^{-1}t}\) for the effect of unresolved variables at climate time, Eqs. (2) and (3) translate into

$$\begin{aligned} \frac{\mathrm{d}X^\varepsilon _t}{\mathrm{d}t}&= F^1_t + A^1_1 X^\varepsilon _t + A^1_2 Y^\varepsilon _t + B^1_{11}(X^\varepsilon _t,X^\varepsilon _t) + B^1_{12}(X^\varepsilon _t,Y^\varepsilon _t) + \varepsilon B^1_{22}(Y^\varepsilon _t,Y^\varepsilon _t), \end{aligned}$$
(4)
$$\begin{aligned} \frac{\mathrm{d}Y^\varepsilon _t}{\mathrm{d}t}&= \varepsilon ^{-2} f^2_{\varepsilon ^{-1} t} + \varepsilon ^{-2} A^2_1 X^\varepsilon _t + \varepsilon ^{-1}A^2_2 Y^\varepsilon _t + \varepsilon ^{-2} B^2_{11}(X^\varepsilon _t,X^\varepsilon _t) \nonumber \\&\quad +\varepsilon ^{-1} B^2_{12}(X^\varepsilon _t,Y^\varepsilon _t) - \varepsilon ^{-2}Y^\varepsilon _t +\varepsilon ^{-2} {\dot{W}}_t , \end{aligned}$$
(5)

where we have set \(\mu =\sigma =1\) for the sake of simplicity.
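The bookkeeping behind this time change is elementary: by the chain rule, \(\frac{\mathrm{d}X^\varepsilon _t}{\mathrm{d}t} = \varepsilon ^{-1} \dot{X}_{\varepsilon ^{-1}t}\), and inserting the modified right-hand side of (2) together with \(Y_{\varepsilon ^{-1}t} = \varepsilon Y^\varepsilon _t\) and the bilinearity of the \(B\)-terms yields

$$\begin{aligned} \frac{\mathrm{d}X^\varepsilon _t}{\mathrm{d}t} = \varepsilon ^{-1} \left( \varepsilon F^1_t + \varepsilon A^1_1 X^\varepsilon _t + \varepsilon A^1_2 Y^\varepsilon _t + \varepsilon B^1_{11}(X^\varepsilon _t,X^\varepsilon _t) + \varepsilon B^1_{12}(X^\varepsilon _t,Y^\varepsilon _t) + \varepsilon ^2 B^1_{22}(Y^\varepsilon _t,Y^\varepsilon _t) \right) , \end{aligned}$$

which is (4); Eq. (5) follows in the same way from (3), using \(\frac{\mathrm{d}Y^\varepsilon _t}{\mathrm{d}t} = \varepsilon ^{-2} \dot{Y}_{\varepsilon ^{-1}t}\) and the replacement of \(B^2_{22}\) described above.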

The hope is now that, when \(\varepsilon \) tends to zero, climate variables at climate time can be approximated by a stochastic process \({\bar{X}}\) which solves a closed stochastic equation whose new coefficients do not depend on unresolved variables anymore. Of course, these new coefficients will be functions of the coefficients of Eqs. (4) and (5), and the process of finding these new coefficients is called stochastic model reduction.

Stochastic model reduction of finite-dimensional systems similar to (4), (5) was extensively discussed in [17]. However, one of the key steps, i.e., proving the convergence \(X^\varepsilon \rightarrow {\bar{X}},\,\varepsilon \downarrow 0\), was kept rather short. Indeed, the authors first sketch a perturbation method based on a theorem of T.G. Kurtz [16], which is their general method, and then briefly describe a so-called direct averaging method for special cases based on limits of solutions to stochastic differential equations. In particular, the latter method lacks a certain amount of rigor because the convergence of the involved stochastic processes is not shown, and this gap has not been closed in follow-up papers—see [6, 7, 13] for example.

In this paper we not only close this gap, but also develop a new method of proof.

We first identify the limit process \({\bar{X}}\), and then study the convergence \(X^\varepsilon \rightarrow {\bar{X}}\) as \(\varepsilon \downarrow 0\), where \(X^\varepsilon \) solves a general evolution equation of type

$$\begin{aligned} \frac{\mathrm{d}X^\varepsilon _t}{\mathrm{d}t} = F(t,X^\varepsilon _t) + \sigma (t,X^\varepsilon _t) Y^\varepsilon _t +\varepsilon \beta (Y^\varepsilon _t,Y^\varepsilon _t), \end{aligned}$$
(6)

where \(Y^\varepsilon \) is a decoupled infinite-dimensional Ornstein–Uhlenbeck process satisfying

$$\begin{aligned} \frac{\mathrm{d}Y^\varepsilon _t}{\mathrm{d}t} = -\varepsilon ^{-2}Y^\varepsilon _t +\varepsilon ^{-2} {\dot{W}}_t. \end{aligned}$$
(7)

Since Eq. (6) is more general than (4), once stochastic model reduction is established for the system (6), (7) with decoupled unresolved variables, it also follows for an interesting subclass of systems of type (4), (5) with coupled unresolved variables—basically those systems for which \(B^2_{12}=0\), see Theorem 5.3. Part (ii) of this theorem deals with the case of linear scattering, that is \(B^1_{22} =0\), and in this case we are able to show ‘strong’ convergence in probability:

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \mathbb {P}\left\{ \sup _{t \le T }\Vert X^\varepsilon _t - {\bar{X}}_t\Vert _{H_d}> \delta \right\} = 0, \quad \forall \delta >0, \end{aligned}$$
(8)

on a given climate time interval [0, T]. When the quadratic interaction term \(B^1_{22}\) is non-trivial, we can only show convergence in law, as stated in Theorem 5.3(i). We refer to Remark 4.3(ii) for an argument which suggests that one cannot expect much more than a weak-type convergence in the general case. This insight of course sheds new light on the results given in [17] and follow-up papers.

At this point it should be mentioned that throughout this paper we assume that \(H_d\) is finite-dimensional which seems to be a natural choice when it comes to climate modeling. However, our arguments are general and can be adapted to infinite-dimensional subspaces, see [5].

In the case of the more abstract system (6), (7), the process \(Y^\varepsilon \) will eventually behave like white noise, as \(\varepsilon \downarrow 0\). This limiting behavior is fundamental for finding the limit of Eq. (6) because it opens the door to arguments similar to those of Wong and Zakai in [26]. Of course, Wong and Zakai formulated their results in a finite-dimensional setting. There have been earlier attempts to prove similar results in infinite dimensions; we refer to [2, 23, 24], for example. However, we would like to emphasize that these earlier attempts dealt with piecewise linear approximations of noise rather than an infinite-dimensional Ornstein–Uhlenbeck process. Note that it is typical for Wong–Zakai results that stochastic integral terms of limiting equations are interpreted in the sense of Stratonovich.
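As an illustration of this mechanism (not used in any proof), the following minimal one-dimensional sketch simulates (6), (7) with \(d=1\), a single noise mode with \(q=1\), \(F=0\), \(\beta =0\), and a toy coefficient \(\sigma \) of our own choosing, and compares \(X^\varepsilon \) with the Itô form of the Stratonovich limit:

```python
# Minimal 1-D sketch (toy example, ours): simulate
#   dX^eps/dt = sigma(X^eps) Y^eps_t,
# with Y^eps the fast OU process of (7), and compare with the
# Stratonovich limit dXbar = sigma(Xbar) o dW, whose Ito form is
#   dXbar = (1/2) sigma'(Xbar) sigma(Xbar) dt + sigma(Xbar) dW.
import numpy as np

rng = np.random.default_rng(0)
T, n = 1.0, 200_000
dt = T / n
eps = 0.05                              # keep dt << eps**2 for a stable Euler step

sigma = lambda x: 2.0 + np.sin(x)       # smooth toy coefficient (assumption, ours)
dsigma = lambda x: np.cos(x)

dW = rng.normal(0.0, np.sqrt(dt), n)    # one shared Brownian path
x_eps, y, x_bar = 1.0, 0.0, 1.0         # Y^eps_0 = 0 instead of a stationary start, for simplicity
for k in range(n):
    x_eps += sigma(x_eps) * y * dt                     # Eq. (6) with F = 0, beta = 0
    y += -y * dt / eps**2 + dW[k] / eps**2             # Eq. (7), Euler step
    x_bar += 0.5 * dsigma(x_bar) * sigma(x_bar) * dt + sigma(x_bar) * dW[k]

print(f"X^eps_T = {x_eps:.3f}, Xbar_T = {x_bar:.3f}")  # close for small eps
```

Since \(\beta =0\) here, Theorem 2.2(ii) below predicts pathwise closeness along the shared Brownian path, which is indeed what one observes as \(\varepsilon \) decreases (with dt refined accordingly).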

Finally, it is worth comparing our results with those in the literature concerning averaging principles, see, for instance, [8, Sect. 7.9], [20, 21] and references therein. Roughly speaking, in those results the unresolved variables satisfy the equation \(\mathrm{d}Y^\varepsilon _t = - \varepsilon ^{-2} Y^\varepsilon _t \mathrm{d}t + \varepsilon ^{-1} \mathrm{d}W_t\), with a weaker noise intensity compared to ours, and therefore the resolved variables only undergo a change of drift in the limit \(\varepsilon \downarrow 0\). By contrast, in our setting a diffusion term also appears in the limit, see (13) below.

The paper is structured as follows.

In Sect. 2, we formulate our main results on the convergence of solutions to (6), (7). First, the limiting equation for \({\bar{X}}\) is identified, and then conditions for weak convergence \(X^\varepsilon \rightarrow {\bar{X}}\) are stated in Theorem 2.2(i). However, when (6) is a simpler equation, i.e., \(\beta =0\), even the stronger convergence (8) can be shown under the same conditions—see Theorem 2.2(ii).

In Sect. 3, we give the proof of Theorem 2.2(ii). The proof relies on preliminary localization and discretization arguments which allow us to consider, instead of (8), its discrete version

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \mathbb {P}\left\{ \sup _{k}\Vert X^\varepsilon _{t_k} - {\bar{X}}_{t_k}\Vert _{H_d}> \delta \right\} = 0, \quad \forall \delta >0, \end{aligned}$$

for only finitely many \(t_k \in [0,T]\).

In Sect. 4, we give the proof of Theorem 2.2(i) which, at the beginning, requires a careful analysis of the quadratic term \(\beta (Y^\varepsilon _t,Y^\varepsilon _t)\), but otherwise is an adaptation of the proof given in the previous section.

In Sect. 5, we eventually use the results of Sect. 2 to prove Theorem 5.3 under quite natural conditions, thus making the connection to our main applications in climate modeling.

2 Notation and main result

Let \(H_d\), \(H_\infty \) be real separable Hilbert spaces. Assume that \(H_d\) is finite-dimensional, \(\dim H_d = d\), with given orthonormal basis \({\mathbf {e}}_1,\dots ,{\mathbf {e}}_d\), and that \(H_\infty \) is infinite-dimensional with given orthonormal basis \({\mathbf {f}}_1,{\mathbf {f}}_2,\dots \)

Given two Banach spaces \(U\), \(V\), let \({\mathcal {L}}(U,V)\) denote the Banach space of continuous linear operators mapping U to V, endowed with the operator norm.

For each \(\varepsilon >0\), consider the pair of stochastic processes \((X^\varepsilon ,Y^\varepsilon )\), taking values in \(H_d \times H_\infty \), where \(X^\varepsilon \) satisfies (6) over a fixed finite time interval [0, T], and \(Y^\varepsilon \) is given by

$$\begin{aligned} Y^\varepsilon _t = \int _{-\infty }^t \varepsilon ^{-2}e^{-\varepsilon ^{-2}(t-s)} \mathrm{d}W_s,\quad t\ge 0, \end{aligned}$$

where W is a Wiener process in \(H_\infty \), with real-valued time parameter and self-adjoint trace class covariance operator \(Q \in {\mathcal {L}}(H_\infty ,H_\infty )\).
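To fix ideas, each coordinate \(\langle Y^\varepsilon _t , {\mathbf {f}}_m \rangle _{H_\infty }\) is a scalar stationary Ornstein–Uhlenbeck process with variance \(\frac{\varepsilon ^{-2}}{2} q_m\) (computed in Remark 2.1(iii) below), and can be sampled exactly on a time grid; a minimal sketch, with toy parameters of our own choosing:

```python
# Hedged sketch: exact simulation of one coordinate <Y^eps_t, f_m>, an OU
# process with drift coefficient -eps^-2 and noise intensity eps^-2 sqrt(q_m).
import numpy as np

rng = np.random.default_rng(0)
eps, q_m = 0.1, 0.5                     # toy parameters (ours)
T, n = 1.0, 10_000
dt = T / n

a = np.exp(-dt / eps**2)                # exact one-step decay factor
var_stat = q_m / (2 * eps**2)           # stationary variance eps^-2 q_m / 2
y = np.empty(n + 1)
y[0] = rng.normal(0.0, np.sqrt(var_stat))    # start in the stationary law
for k in range(n):
    y[k + 1] = a * y[k] + rng.normal(0.0, np.sqrt(var_stat * (1 - a**2)))

print(y.var(), var_stat)                # sample variance is close to eps^-2 q_m / 2
```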

Remark 2.1

  1. (i)

    A Wiener process with real-valued time parameter can be obtained in the following way: given two independent Wiener processes \((W^+_t)_{t\ge 0}\) and \((W^-_t)_{t\ge 0}\) defined on filtered probability spaces \((\Omega ^+,({\mathcal {F}}^+_t),\mathbb {P}^+)\) and \((\Omega ^-,({\mathcal {F}}^-_t),\mathbb {P}^-)\), respectively, set \(W_t = W^+_t\), for \(t \ge 0\), and \(W_t = W^-_{-t}\), for \(t<0\).

  2. (ii)

    Using such a representation of W, we can also write

    $$\begin{aligned} Y^\varepsilon _t = - \int _0^\infty \varepsilon ^{-2}e^{-\varepsilon ^{-2}(t+s)} \mathrm{d}W^-_s + \int _0^t \varepsilon ^{-2}e^{-\varepsilon ^{-2}(t-s)} \mathrm{d}W^+_s, \quad t\ge 0, \end{aligned}$$

    which clearly is a stationary Ornstein–Uhlenbeck process on \((\Omega ,{\mathcal {F}}^-_\infty \otimes {\mathcal {F}}^+_\infty ,\mathbb {P})\), where \(\Omega =\Omega ^- \times \Omega ^+\) and \(\mathbb {P}=\mathbb {P}^- \otimes \mathbb {P}^+\), see [3]. Furthermore, setting up the stochastic basis for our processes \((X^\varepsilon ,Y^\varepsilon )\), let \((\Omega ,{\mathcal {F}},\mathbb {P})\) be the completion of \((\Omega ,{\mathcal {F}}^-_\infty \otimes {\mathcal {F}}^+_\infty ,\mathbb {P})\), and \(({\mathcal {F}}_t)_{t \ge 0}\) be the augmentation of the filtration \(({\mathcal {F}}^-_\infty \otimes {\mathcal {F}}^+_t)_{t \ge 0}\). Note that this filtration satisfies the usual conditions.

  3. (iii)

    Since Q is trace class, both W and \(Y^\varepsilon \) take values in \(H_\infty \). Without loss of generality, we can assume that Q is diagonal with respect to the chosen basis \(\{{\mathbf {f}}_m\}_{m \in \mathbb {N}}\) of \(H_\infty \), that the eigenvalues of Q form a sequence \(\{q_m\}_{m \in \mathbb {N}}\) satisfying \(\sum _{m} q_m < \infty \), and that \(\mathbb {E}\left[ \langle W_t, {\mathbf {f}}_m \rangle _{H_\infty } ^2\right] = |t| q_m\), for every \(t \in \mathbb {R}\) and \(m \in \mathbb {N}\). Moreover, since

    $$\begin{aligned} \langle Y^\varepsilon _t, {\mathbf {f}}_m \rangle _{H_\infty } = \int _{-\infty }^t \varepsilon ^{-2}e^{-\varepsilon ^{-2}(t-s)} d\langle W_s, {\mathbf {f}}_m \rangle _{H_\infty } \end{aligned}$$

    we also have, by the Itô isometry, \(\mathbb {E}\left[ \langle Y^\varepsilon _t, {\mathbf {f}}_m \rangle _{H_\infty } ^2\right] = q_m \int _{-\infty }^t \varepsilon ^{-4} e^{-2\varepsilon ^{-2}(t-s)}\, \mathrm{d}s = \frac{\varepsilon ^{-2}}{2} q_m\) for every \(t \ge 0\) and \(m \in \mathbb {N}\).

  4. (iv)

    Let Z be an \(\varepsilon \)-independent stationary Ornstein–Uhlenbeck process solving \(\mathrm{d}Z_t = -Z_t \mathrm{d}t + \mathrm{d}W_t\), which is explicitly given by the formula

    $$\begin{aligned} Z_t = \int _{-\infty }^t e^{-(t-s)} \mathrm{d}W_s,\quad t\ge 0. \end{aligned}$$
    (9)

    Due to the self-similarity of W, it is easy to check that the process \((Y^\varepsilon _t)_{t \ge 0}\) equals in law the process \((\varepsilon ^{-1} Z_{t \varepsilon ^{-2}})_{t\ge 0}\) (see the one-line check below), thus making it more transparent why we expect the process \(Y^\varepsilon \) to behave like white noise as \(\varepsilon \downarrow 0\), see, for instance, [1].
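    Indeed, substituting \(u = \varepsilon ^{-2} s\) in (9) and using that \({\tilde{W}}_s = \varepsilon W_{\varepsilon ^{-2}s}\), \(s \in \mathbb {R}\), is again a Q-Wiener process, we find

    $$\begin{aligned} \varepsilon ^{-1} Z_{t \varepsilon ^{-2}} = \varepsilon ^{-1} \int _{-\infty }^{t \varepsilon ^{-2}} e^{-(t \varepsilon ^{-2}-u)} \mathrm{d}W_u = \int _{-\infty }^{t} \varepsilon ^{-2} e^{-\varepsilon ^{-2}(t-s)} \mathrm{d}{\tilde{W}}_s, \end{aligned}$$

    and the right-hand side has the same law as \(Y^\varepsilon _t\).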

Adopting the useful notation \(W^\varepsilon _t = \int _0^t Y^\varepsilon _s \mathrm{d}s\), we can write (6) in integral form as

$$\begin{aligned} X^\varepsilon _t = x_0 + \int _0^t F(s,X^\varepsilon _s) \mathrm{d}s + \int _0^t \sigma (s,X^\varepsilon _s) \mathrm{d}W^\varepsilon _s + \int _0^t \varepsilon \beta (Y^\varepsilon _s,Y^\varepsilon _s) \mathrm{d}s, \quad t\in [0,T], \end{aligned}$$
(10)

where \(x_0 \in H_d\) is a deterministic initial condition, and \(F:[0,T] \times H_d \rightarrow H_d\), \(\sigma :[0,T] \times H_d \rightarrow {\mathcal {L}}(H_\infty ,H_d)\), \(\beta :H_{\infty } \times H_{\infty } \rightarrow H_d\). We make the following assumptions on these coefficients:

(A1):

\(F \in C([0,T] \times H_d , H_d)\), and \(F(t,\cdot ) \in {Lip}_{loc}(H_d,H_d)\), uniformly in \(t \in [0,T]\);

(A2):

\(\sigma \in C^{1,\gamma }([0,T] \times H_d,{\mathcal {L}}(H_\infty ,H_d))\), the space of \(C^1\) functions with \(\gamma \)-Hölder derivative, for some \(\gamma \in (0,1)\), and its spatial differential \(D\sigma (t,\cdot ) \in {Lip}_{loc}(H_d,{\mathcal {L}}(H_d,{\mathcal {L}}(H_\infty ,H_d)))\), uniformly in \(t \in [0,T]\);

(A3):

\(\beta :H_{\infty } \times H_{\infty } \rightarrow H_d\) is a continuous bilinear map.

Of course, by standard theory (see [3], for example), Eq. (10) admits a unique local strong solution, for each \(\varepsilon >0\).

Next, we introduce the limiting equation for the desired limit \({\bar{X}}\) of the processes \(X^\varepsilon \), as \(\varepsilon \downarrow 0\). First, define the so-called Stratonovich correction term \(C:[0,T] \times H_d \rightarrow H_d\) by

$$\begin{aligned} C^i(s,{x}) = \langle C(s,{x}) , {\mathbf {e}}_i \rangle _{H_d}= \frac{1}{2} \sum _{m \in \mathbb {N}} q_m \sum _{j=1}^d D_j \sigma ^{i,m}(s,{x}) \sigma ^{j,m}(s,{x}), \quad i=1,\dots ,d, \end{aligned}$$
(11)

where

$$\begin{aligned} \sigma ^{i,m}(s,x) = \langle \sigma (s,x){\mathbf {f}}_m , {\mathbf {e}}_i \rangle _{H_d}, \quad i=1,\dots ,d,\;m\in \mathbb {N}, \end{aligned}$$

is matrix notation for the linear map \(\sigma (s,x)\in {\mathcal {L}}(H_\infty ,H_d)\) with respect to our chosen basis vectors; second, let

$$\begin{aligned} b^i_{\ell ,m} = \langle \beta ({\mathbf {f}}_\ell ,{\mathbf {f}}_m) , {\mathbf {e}}_i \rangle _{H_d}\, \sqrt{\frac{q_\ell q_m}{2}}, \quad i=1,\dots ,d,\;\ell ,m\in \mathbb {N}. \end{aligned}$$
(12)

Then, our limiting equation reads

$$\begin{aligned} {\bar{X}}_t =\,&x_0 + \int _0^t \left( F(s,{\bar{X}}_s) + C(s,{\bar{X}}_s) \right) \mathrm{d}s \nonumber \\&+ \int _0^t \sigma (s,{\bar{X}}_s ) \mathrm{d}{W}_s + \sum _{\ell , m \in \mathbb {N}} b_{\ell ,m} {\bar{W}}^{\ell ,m}_t, \quad t\in [0,T], \end{aligned}$$
(13)

where W is the same Wiener process used to define \(Y^\varepsilon \) in Remark 2.1, while \(\{{\bar{W}}^{\ell ,m}\}_{\ell ,m \in \mathbb {N}}\) is a family of independent one-dimensional standard Wiener processes, which are also independent of W.
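To illustrate how the correction (11) and the coefficients (12) can be assembled in practice, here is a hedged numerical sketch with a finite truncation M of the basis \(\{{\mathbf {f}}_m\}\) and toy coefficients of our own choosing (the paper itself works with the full series):

```python
# Hedged sketch: assemble C(s,x) of (11) and b^i_{l,m} of (12) for a finite
# mode truncation; sigma_mat and all coefficients below are toy choices.
import numpy as np

rng = np.random.default_rng(0)
d, M = 2, 8                                  # dim H_d and number of retained modes
q = 1.0 / (1.0 + np.arange(M)) ** 2          # summable eigenvalues q_m of Q

def sigma_mat(s, x):
    # toy sigma^{i,m}(s,x): a d x M matrix, smooth in (s,x)
    return (1.0 + s) * np.outer(np.sin(x), 1.0 / (1.0 + np.arange(M)))

def correction(s, x, h=1e-6):
    # C^i(s,x) = (1/2) sum_m q_m sum_j D_j sigma^{i,m} sigma^{j,m}, Eq. (11);
    # the spatial derivative D_j is approximated by central differences.
    S, C = sigma_mat(s, x), np.zeros(d)
    for j in range(d):
        e = np.zeros(d); e[j] = 1.0
        DjS = (sigma_mat(s, x + h * e) - sigma_mat(s, x - h * e)) / (2 * h)
        C += 0.5 * (DjS * q) @ S[j]          # contract over m against sigma^{j,m}
    return C

B = rng.normal(size=(d, M, M))               # toy coefficients <beta(f_l, f_m), e_i>
b = B * np.sqrt(np.outer(q, q) / 2.0)        # b^i_{l,m} of (12)

print(correction(0.3, np.array([0.1, -0.4])), b.shape)
```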

Like (10), Eq. (13) admits a unique local strong solution. However, in view of the interpretation of our results with respect to climate modeling, it is natural to further assume that

(A4):

both Eqs. (10) and (13) admit global solutions on [0, T].

Another assumption specific to climate modeling, which has been advocated in [17], is the zero-mean property of \(\beta (Y^\varepsilon _s,Y^\varepsilon _s)\), \(s \ge 0\). Since all \(Y^\varepsilon \) are stationary under \(\mathbb {P}\), see Remark 2.1(ii), this assumption translates into

$$\begin{aligned} \mathbb {E}\left[ \langle \beta (Y^\varepsilon _s,Y^\varepsilon _s) , {\mathbf {e}}_i \rangle _{H_d} \right]&= \sum _{\ell ,m \in \mathbb {N}} \langle \beta ({\mathbf {f}}_\ell ,{\mathbf {f}}_m) , {\mathbf {e}}_i \rangle _{H_d}\, \mathbb {E}\left[ Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,m}_s \right] = \sum _{\ell \in \mathbb {N}} \langle \beta ({\mathbf {f}}_\ell ,{\mathbf {f}}_\ell ) , {\mathbf {e}}_i \rangle _{H_d}\, \frac{\varepsilon ^{-2}}{2} q_\ell \,=\,0, \end{aligned}$$

where \(Y_s^{\varepsilon ,\ell }\) is short notation for the coordinates \(\langle Y_s^{\varepsilon } , {\mathbf {f}}_\ell \rangle _{H_\infty }\), \(\ell =1,2,\dots ,\,s\in [0,T]\), and we have used that \(\mathbb {E}\left[ Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,m}_s \right] = \delta _{\ell ,m}\,\frac{\varepsilon ^{-2}}{2} q_\ell \), the coordinates being uncorrelated by the diagonality of Q. As a consequence, we also impose the zero-mean condition

(A5):

\(\sum _{\ell \in \mathbb {N}} \langle \beta ({\mathbf {f}}_\ell ,{\mathbf {f}}_\ell ) , {\mathbf {e}}_i \rangle _{H_d}\, q_\ell \,=0\), for all \(i=1,\dots ,d\),

which is usually true for equations from fluid dynamics and can in general be understood as a renormalization procedure for the quadratic term.

The following theorem is the main result of this paper.

Theorem 2.2

  1. (i)

    Assume (A1)–(A5). Then, \(X^\varepsilon \) converges to \({\bar{X}}\), in law, as \(\varepsilon \downarrow 0\).

  2. (ii)

    However, if (A1)–(A4) hold and (A5) is satisfied trivially via \(\beta =0\), then the stronger convergence (8) holds true.

In what follows, to keep notation light in proofs, when no confusion may occur, the norms in both spaces \(H_d\) and \(H_\infty \) will be denoted by \(|\cdot |\), and their scalar products by \(\langle \cdot ,\cdot \rangle \). The symbol \(\lesssim \) means inequality up to a multiplicative constant, possibly depending on the parameters of our equations, but not on \(\varepsilon \).

3 Strong convergence

In this section we give the proof of Theorem 2.2(ii), which is divided into several steps.

First, by localization, we argue that we can restrict ourselves to \(|X^\varepsilon _t|\), \(|{\bar{X}}_t| \le R\), for some large R, which effectively leads to Lipschitz continuity of the coefficients of (10).

Second, we discretize the problem, which allows us to reduce the proof of Theorem 2.2(ii) to its discrete version:

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \mathbb {P}\left\{ \sup _{k}|X^\varepsilon _{t_k} - {\bar{X}}_{t_k}|> \delta \right\} = 0, \quad \forall \delta >0, \end{aligned}$$

for only finitely many \(t_k \in [0,T]\). Here, we choose \(t_k = k\Delta \), where \(\Delta =\Delta _\varepsilon \) is a positive parameter whose \(\varepsilon \)-dependence has to be carefully chosen in the proof—see Remark 3.9.

Third, we prove the above discretized version.

3.1 Localization

Fix \(\varepsilon >0,\,\delta \in (0,1)\), and define

$$\begin{aligned} \tau ^\varepsilon _R = \inf \{t\ge 0: |X^\varepsilon _t| \ge R+1\} \wedge \inf \{t\ge 0: |{\bar{X}}_t| \ge R\}, \quad \hbox { for}\ R>0, \end{aligned}$$

so that

$$\begin{aligned} \mathbb {P}\left\{ \sup _{t \le T }|X^\varepsilon _t - {\bar{X}}_t|> \delta \right\}&= \mathbb {P}\left\{ \sup _{t \le T }|X^\varepsilon _t - {\bar{X}}_t|> \delta , \, \sup _{t \le T } |{\bar{X}}_t| \ge R \right\} \nonumber \\&\quad + \mathbb {P}\left\{ \sup _{t \le T }|X^\varepsilon _t - {\bar{X}}_t|> \delta , \, \sup _{t \le T } |{\bar{X}}_t|< R \right\} \nonumber \\&= \mathbb {P}\left\{ \sup _{t \le T }|X^\varepsilon _t - {\bar{X}}_t|> \delta , \, \sup _{t \le T } |{\bar{X}}_t| \ge R \right\} \nonumber \\&\quad + \mathbb {P}\left\{ \sup _{t \le T \wedge \tau ^\varepsilon _R }|X^\varepsilon _t - {\bar{X}}_t|> \delta , \, \sup _{t \le T } |{\bar{X}}_t| < R \right\} \nonumber \\&\le \mathbb {P}\left\{ \sup _{t \le T } |{\bar{X}}_t| \ge R \right\} + \mathbb {P}\left\{ \sup _{t \le T \wedge \tau ^\varepsilon _R }|X^\varepsilon _t - {\bar{X}}_t| > \delta \right\} . \end{aligned}$$
(14)

Therefore, since (A4) implies

$$\begin{aligned} \mathbb {P}\left\{ \sup _{t \le T } |{\bar{X}}_t| \ge R \right\} \rightarrow 0, \hbox { as } R \uparrow \infty , \end{aligned}$$

to prove (8), it is sufficient to show the convergence of the second summand on the right-hand side of (14), when \(\varepsilon \downarrow 0\), for fixed \(\delta \in (0,1),\,R>0\). Furthermore, by Markov's inequality,

$$\begin{aligned} \mathbb {P}\left\{ \sup _{t \le T\wedge \tau ^\varepsilon _R }|X^\varepsilon _t - {\bar{X}}_t| > \delta \right\} \le \delta ^{-p}\, \mathbb {E}\left[ \sup _{t \le T\wedge \tau ^\varepsilon _R }|X^\varepsilon _t - {\bar{X}}_t|^p\right] , \end{aligned}$$
(15)

for every \(p>0\), \(\delta \in (0,1)\), and hence it is enough to show convergence of the right-hand side above. To keep notation light, we will write \(\tau ^\varepsilon \) instead of \(\tau ^\varepsilon _R\), as \(R>0\) will be fixed in what follows.

3.2 Discretization

Fix \(\varepsilon >0\). We show that the expectation on the right-hand side of (15) can be replaced by an expectation of the same quantity, but with the supremum taken over a finite number (diverging to \(\infty \), as \(\varepsilon \downarrow 0\)) of times \(t_k\), see Corollary 3.7 below.

To start with, we have the following useful a priori estimate.

Lemma 3.1

For any \(p>1\), the Ornstein–Uhlenbeck process \(Y^\varepsilon \) satisfies

$$\begin{aligned} \mathbb {E}\left[ \sup _{t \le T}\left| Y^\varepsilon _t \right| ^p \right] \lesssim \varepsilon ^{-p} \log ^{p/2}\big (1+\varepsilon ^{-2}\big ). \end{aligned}$$

Proof

First, using the decomposition \(Y^\varepsilon _t = Y^\varepsilon _0 + \left( Y^\varepsilon _t - Y^\varepsilon _0 \right) \), Gaussian estimates on \(Y^\varepsilon _0\) and [15, Theorem 2.2], the result is true in one dimension.

In the infinite-dimensional case, by Hölder's inequality, we can suppose \(p>2\). Therefore, since Q is trace class with eigenvalues satisfying \(\sum _{m\in \mathbb {N}}q_m<\infty \), setting \(\alpha = (p-2)/p\), we obtain that

$$\begin{aligned} \mathbb {E}\left[ \sup _{t \le T}\bigg | Y^\varepsilon _t \bigg |^p \right]&= \mathbb {E}\left[ \sup _{t \le T} \left( \sum _{m \in \mathbb {N}, q_m>0} q_m^{\alpha } q_m^{-\alpha } \big | Y^{\varepsilon ,m}_t \big |^2 \right) ^{p/2} \right] \\&\lesssim \left( \sum _{m \in \mathbb {N}, q_m>0} q_m^{-\alpha p/2} \mathbb {E}\Big [ \sup _{t \le T}\Big | Y^{\varepsilon ,m}_t \Big |^p \Big ] \right) \left( \sum _{m \in \mathbb {N}} q_m^{\alpha p/(p-2)} \right) ^{(p-2)/2}\\&\lesssim \varepsilon ^{-p} \log ^{p/2}\big (1+\varepsilon ^{-2}\big ), \end{aligned}$$

having used the one-dimensional result for the coordinates \(Y^{\varepsilon ,m}_t = \langle Y^\varepsilon _t , {\mathbf {f}}_m \rangle ,\,m=1,2,\dots \) \(\square \)

Remark 3.2

In view of Remark 2.1(iv), the previous result could also be obtained from the analogous result for (9) and parabolic scaling. Indeed, it would be sufficient to prove \( \mathbb {E}\left[ \sup _{t \le T} |Z_t|^p\right] \lesssim \log ^{p/2}(1+T) \) for every \(p>1\).
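A quick numerical sanity check of this rate (a sketch with our own parameters, using the exact transition of one coordinate of Z with \(q_m = 1\)):

```python
# Hedged check: E[sup_{t<=T} |Z_t|] for one coordinate of the stationary OU
# process (9) (variance 1/2) should grow like sqrt(log(1+T)), cf. Remark 3.2.
import numpy as np

rng = np.random.default_rng(2)

def mean_sup(T, dt=0.01, n_paths=400):
    a = np.exp(-dt)                          # exact one-step decay
    s = np.sqrt((1 - a**2) / 2)              # exact transition std, stationary Var = 1/2
    z = rng.normal(0.0, np.sqrt(0.5), n_paths)
    m = np.abs(z)
    for _ in range(int(T / dt)):
        z = a * z + s * rng.normal(size=n_paths)
        m = np.maximum(m, np.abs(z))
    return m.mean()

for T in (10, 100, 1000):
    print(T, mean_sup(T) / np.sqrt(np.log(1 + T)))   # ratio stays bounded
```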

Now, we introduce the discretization of the time interval [0, T]. Let \(\Delta >0\), and let \([T/\Delta ]\) be the largest integer less than or equal to \(T/\Delta \). In what follows, \(\Delta \) will also depend on \(\varepsilon \), in a way to be determined later. Also, to make it easier to bound terms by powers of \(\varepsilon \) or \(\Delta \), without loss of generality, we will always assume that both \(\varepsilon \) and \(\Delta \) are less than one.

The next two lemmas control the excursion of \(X^\varepsilon \) between adjacent nodes in terms of the ratio \(\Delta /\varepsilon \).

Lemma 3.3

For any \(p>1\), and any deterministic time \(\tau >0\),

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,1,\dots ,[T/\Delta ] \\ t \le \tau ,\, t+k\Delta \le T\wedge \tau ^\varepsilon \\ \end{array}} |{X}^\varepsilon _{t+k\Delta } - {X}^\varepsilon _{k\Delta }|^p \right] \lesssim \left( \frac{\tau }{\varepsilon }\right) ^p \log ^{p/2}(1+\varepsilon ^{-2}). \end{aligned}$$

Proof

Since \(\beta =0\), by (10), the increment \({X}^\varepsilon _{t+k\Delta } - {X}^\varepsilon _{k\Delta }\) can be written as

$$\begin{aligned} {X}^\varepsilon _{t+k\Delta } - {X}^\varepsilon _{k\Delta } =&\int _{k\Delta }^{t+k\Delta } F(s,X^\varepsilon _s) \mathrm{d}s + \int _{k\Delta }^{t+k\Delta } \sigma (s,X^\varepsilon _s)\mathrm{d}W^\varepsilon _s, \quad \hbox {for } t+k\Delta \le T\wedge \tau ^\varepsilon . \end{aligned}$$

Therefore, using (A1), (A2), boundedness of \(X^\varepsilon \) on \([0,\tau ^\varepsilon ]\), and Lemma 3.1, we obtain that

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,1,\dots ,[T/\Delta ] \\ t \le \tau ,\, t+k\Delta \le T\wedge \tau ^\varepsilon \end{array}} |{X}^\varepsilon _{t+k\Delta } - {X}^\varepsilon _{k\Delta }|^p \right]&\lesssim \tau ^p \left( 1 + \mathbb {E}\left[ \sup _{t \le T\wedge \tau ^\varepsilon }\bigg | Y^\varepsilon _t \bigg |^p \right] \right) \\&\lesssim \left( \frac{\tau }{\varepsilon }\right) ^p \log ^{p/2}(1+\varepsilon ^{-2}), \end{aligned}$$

where \(W^\varepsilon _t = \int _0^t Y^\varepsilon _s \mathrm{d}s\) was defined in Sect. 2. \(\square \)

Lemma 3.4

For any \(p>1\), and any fixed \(k \in \{0,1,\dots ,[T/\Delta ]\}\) such that \(k\Delta \le T\),

$$\begin{aligned} \mathbb {E}\left[ |{X}^\varepsilon _{(k+1)\Delta \wedge \tau ^\varepsilon } - {X}^\varepsilon _{k\Delta \wedge \tau ^\varepsilon }|^p \right] \lesssim \Delta ^{p/2} + \varepsilon ^{p} \log ^{p/2}(1+\varepsilon ^{-2}) + \left( \frac{\Delta }{\varepsilon }\right) ^{2p} \log ^{p}(1+\varepsilon ^{-2}). \end{aligned}$$

Proof

It suffices to bound every single term on the right-hand side of the equation

$$\begin{aligned} X^\varepsilon _{(k+1)\Delta \wedge \tau ^\varepsilon } - X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon } =&\int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } F(s,X^\varepsilon _s) \mathrm{d}s \\&+ \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \left( \sigma (s,X^\varepsilon _s) - \sigma (k\Delta \wedge \tau ^\varepsilon ,X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon }) \right) \mathrm{d}W^\varepsilon _s \\&+ \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \sigma (k\Delta \wedge \tau ^\varepsilon ,X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon }) \mathrm{d}W^\varepsilon _s. \end{aligned}$$

First, by (A1) and boundedness of \(X^\varepsilon \) on \([0,\tau ^\varepsilon ]\), we have that

$$\begin{aligned} \mathbb {E}\left[ \bigg | \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } F(s,X^\varepsilon _s) \mathrm{d}s \bigg |^p \right]&\lesssim \Delta ^p. \end{aligned}$$

Second, using Hölder’s inequality with \(q'>1/p\) and Lemma 3.1,

$$\begin{aligned} \mathbb {E}&\left[ \bigg | \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \left( \sigma (s,X^\varepsilon _s) - \sigma (k\Delta \wedge \tau ^\varepsilon ,X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon }) \right) \mathrm{d}W^\varepsilon _s \bigg |^p \right] \\&\lesssim \mathbb {E}\left[ \sup _{t \le T}\bigg | Y^\varepsilon _t \bigg |^p \bigg | \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \bigg | \sigma (s,X^\varepsilon _s) - \sigma (k\Delta \wedge \tau ^\varepsilon ,X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon }) \bigg | \mathrm{d}s \bigg |^p \right] \\&\lesssim \varepsilon ^{-p} \log ^{p/2}(1+\varepsilon ^{-2}) \left( \mathbb {E}\left[ \bigg | \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \bigg | \sigma (s,X^\varepsilon _s) - \sigma (k\Delta \wedge \tau ^\varepsilon ,X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon })\bigg | \mathrm{d}s \bigg |^{pq'} \right] \right) ^{1/q'} . \end{aligned}$$

Since \(pq'>1\) by assumption, we can estimate the integral above using Hölder’s inequality with exponents \(pq'\) and \(pq' /(pq'-1)\), (A2) and Lemma 3.3 to obtain

$$\begin{aligned}&\left( \mathbb {E}\left[ \bigg | \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \bigg | \sigma (s,X^\varepsilon _s) - \sigma (k\Delta \wedge \tau ^\varepsilon ,X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon }) \bigg | \mathrm{d}s \bigg |^{pq'} \right] \right) ^{1/q'} \\&\quad \lesssim \left( \mathbb {E}\left[ \bigg | \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \mathrm{d}s \, \bigg |^{pq'-1} \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \bigg | \sigma (s,X^\varepsilon _s) - \sigma (k\Delta \wedge \tau ^\varepsilon ,X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon }) \bigg |^{pq'} \mathrm{d}s \right] \right) ^{1/q'} \\&\quad \lesssim \Delta ^{p-1/q'} \, \left( \mathbb {E}\left[ \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \left( \bigg | X^\varepsilon _s - X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon } \bigg |^{pq'} + (s-k\Delta )^{pq'} \right) \mathrm{d}s \right] \right) ^{1/q'} \\&\quad \lesssim \Delta ^{p-1/q'} \left( \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \mathbb {E}\left[ \bigg | X^\varepsilon _s - X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon } \bigg |^{pq'} + (s-k\Delta )^{pq'} \right] \mathrm{d}s \right) ^{1/q'} \\&\quad \lesssim \Delta ^{p-1/q'} \left( \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } (s-k\Delta )^{pq'} \left( \varepsilon ^{-pq'} \log ^{pq'/2}(1+\varepsilon ^{-2})+ 1\right) \mathrm{d}s \right) ^{1/q'} \\&\quad \lesssim \varepsilon ^{-p} \log ^{p/2}(1+\varepsilon ^{-2}) \Delta ^{p-1/q'} \left( \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } (s-k\Delta )^{pq'} \mathrm{d}s \right) ^{1/q'} \lesssim \varepsilon ^{-p} \log ^{p/2}(1+\varepsilon ^{-2}) \Delta ^{2p}. \end{aligned}$$

Finally,

$$\begin{aligned}&\mathbb {E}\left[ \bigg | \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \sigma (k\Delta \wedge \tau ^\varepsilon ,X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon }) \mathrm{d}W^\varepsilon _s \bigg |^p \right] \lesssim \mathbb {E}\left[ \bigg | W^\varepsilon _{(k+1)\Delta \wedge \tau ^\varepsilon } - W^\varepsilon _{k\Delta \wedge \tau ^\varepsilon } \bigg |^p \right] \\&\quad \lesssim \Delta ^{p/2} + \varepsilon ^{p} \log ^{p/2}(1+\varepsilon ^{-2}), \end{aligned}$$

because, for every \(t_2 > t_1 \ge 0\), exchanging the order of integration (stochastic Fubini theorem),

$$\begin{aligned} W^\varepsilon _{t_2} - W^\varepsilon _{t_1}=&\int _{t_1}^{t_2} \left( \int _{-\infty }^s \varepsilon ^{-2}e^{-\varepsilon ^{-2}(s-r)}\mathrm{d}W_r\right) \mathrm{d}s \nonumber \\ =&W_{t_2} - W_{t_1} - \int _{-\infty }^{t_2} e^{-\varepsilon ^{-2}({t_2}-r)}\mathrm{d}W_r + \int _{-\infty }^{t_1} e^{-\varepsilon ^{-2}({t_1}-r)}\mathrm{d}W_r. \end{aligned}$$
(16)

\(\square \)

The next lemma controls the excursion of the limiting process \({\bar{X}}\) between adjacent nodes.

Lemma 3.5

For any \(p>1\), any deterministic time \(\tau \in (0,1)\), and any fixed \(k \in \{0,1,\dots ,[T/\Delta ]\}\),

$$\begin{aligned} \mathbb {E}\left[ \sup _{t \le \tau ,\,t+k\Delta \le T\wedge \tau ^\varepsilon }|{\bar{X}}_{t+k\Delta } - {\bar{X}}_{k\Delta }|^p \right] \lesssim \tau ^{\frac{p}{2}}. \end{aligned}$$

Proof

Since \(\beta =0\), by (13), the increment \({\bar{X}}_{t+k\Delta } - {\bar{X}}_{k\Delta }\) can be written as

$$\begin{aligned} {\bar{X}}_{t+k\Delta } - {\bar{X}}_{k\Delta }&= \int _{k\Delta }^{t+k\Delta } \left( F(s,{\bar{X}}_s) + C(s,{\bar{X}}_s)\right) \mathrm{d}s\\&\quad + \int _{k\Delta }^{t+k\Delta } \sigma (s,{\bar{X}}_s ) \mathrm{d}W_s, \quad \hbox {for } t+k\Delta \le T\wedge \tau ^\varepsilon . \end{aligned}$$

Therefore, using (A1), (A2), boundedness of \({\bar{X}}\) on \([0,\tau ^\varepsilon ]\), and the Burkholder–Davis–Gundy inequality, we obtain that

$$\begin{aligned}&\mathbb {E}\left[ \sup _{t \le \tau ,\,t+k\Delta \le T\wedge \tau ^\varepsilon }|{\bar{X}}_{t+k\Delta } - {\bar{X}}_{k\Delta }|^p \right] \lesssim \tau ^p\\&\quad + \mathbb {E}\left[ \sup _{t \le \tau ,\,t+k\Delta \le T\wedge \tau ^\varepsilon } \bigg | \int _{k\Delta }^{t+k\Delta } \sigma (s,{\bar{X}}_s ) \mathrm{d}W_s \bigg |^p \right] \lesssim \tau ^p + \tau ^{\frac{p}{2}}, \end{aligned}$$

which proves the lemma since \(\tau <1\). \(\square \)

Corollary 3.6

For any \(p>1\),

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,1,\dots ,[T/\Delta ] \\ t \le \Delta ,\, t+k\Delta \le T\wedge \tau ^\varepsilon \end{array}} |{\bar{X}}_{t+k\Delta } - {\bar{X}}_{k\Delta }|^p \right] \lesssim \Delta ^{\frac{p}{2}-1}. \end{aligned}$$

Proof

The claim easily follows from Lemma 3.5 with \(\tau =\Delta \), and the inequality

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,1,\dots ,[T/\Delta ] \\ t \le \Delta ,\, t+k\Delta \le T\wedge \tau ^\varepsilon \end{array}} |{\bar{X}}_{t+k\Delta } - {\bar{X}}_{k\Delta }|^p \right]&\lesssim \sum _{k=0}^{[T/\Delta ]} \mathbb {E}\left[ \sup _{t \le \Delta ,\, t+k\Delta \le T\wedge \tau ^\varepsilon } |{\bar{X}}_{t+k\Delta } - {\bar{X}}_{k\Delta }|^p \right] \\&\lesssim \sum _{k=0}^{[T/\Delta ]} \Delta ^{p/2} \lesssim \Delta ^{\frac{p}{2}-1}. \end{aligned}$$

\(\square \)

Corollary 3.7

Let \(\Delta =\Delta _\varepsilon >0\) depend on \(\varepsilon \) such that \(\Delta /\varepsilon \rightarrow 0\), as \(\varepsilon \downarrow 0\). Then,

$$\begin{aligned} \mathbb {E}\left[ \sup _{t \le T\wedge \tau ^\varepsilon }|X^\varepsilon _t - {\bar{X}}_t|^2\right] \lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,1,\dots ,[T/\Delta ] \\ k\Delta \le \tau ^\varepsilon \end{array}} |X^\varepsilon _{k\Delta } - {\bar{X}}_{k\Delta }|^2\right] + o(1). \end{aligned}$$

Proof

First, by Hölder’s inequality with \(q>1\) and Lemma 3.6, we have that

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,1,\dots ,[T/\Delta ] \\ t \le \Delta ,\, t+k\Delta \le T\wedge \tau ^\varepsilon \end{array}} |{\bar{X}}_{t+k\Delta } - {\bar{X}}_{k\Delta }|^2 \right]&\lesssim \left( \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,1,\dots ,[T/\Delta ] \\ t \le \Delta ,\, t+k\Delta \le T\wedge \tau ^\varepsilon \end{array}} |{\bar{X}}_{t+k\Delta } - {\bar{X}}_{k\Delta }|^{2q} \right] \right) ^{1/q} \\&\lesssim \Delta ^{1-1/q} \rightarrow 0 \hbox { as } \varepsilon \downarrow 0, \end{aligned}$$

since we have taken \(q>1\). Thus, the proof can easily be completed by combining the above with Lemma 3.3 applied with \(\tau =\Delta \) (recall that \(\Delta /\varepsilon \rightarrow 0\)), while taking into account

$$\begin{aligned} |X^\varepsilon _t - {\bar{X}}_t|^2 \,\lesssim \, |X^\varepsilon _t - X^\varepsilon _{[t/\Delta ]\Delta }|^2 +|X^\varepsilon _{[t/\Delta ]\Delta }-{\bar{X}}_{[t/\Delta ]\Delta }|^2 +|{\bar{X}}_{[t/\Delta ]\Delta }-{\bar{X}}_t|^2, \end{aligned}$$

where \([t/\Delta ]\) is again our notation for the floor of \(t/\Delta \). \(\square \)

3.3 Proof of the discretized version

We now discuss our strategy to prove part (ii) of Theorem 2.2. Recall that we want

$$\begin{aligned} \mathbb {P}\left\{ \sup _{t \le T }|X^\varepsilon _t - {\bar{X}}_t| > \delta \right\} \rightarrow 0, \end{aligned}$$

for every fixed \(\delta >0\), as \(\varepsilon \downarrow 0\). As we have seen, by (14), (15) and Corollary 3.7, it suffices to prove

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,\dots ,[T/\Delta ] \\ k\Delta \le \tau ^\varepsilon \end{array}} \bigg | X^\varepsilon _{k\Delta }-{\bar{X}}_{k\Delta }\bigg |^2\right] \rightarrow 0, \quad \varepsilon \downarrow 0, \end{aligned}$$
(17)

for some \(\Delta = \Delta _\varepsilon = o(\varepsilon )\). The proof is inspired by [11, Sect. VI.7].

Hereafter, \(\partial \sigma \) denotes the derivative of \(\sigma \) with respect to its first variable, and \(D \sigma \) denotes the derivative of \(\sigma \) with respect to its second variable. To start with, by (10) without the \(\beta \)-term, (A2), and (16), we have that

$$\begin{aligned} X^\varepsilon _{(k+1)\Delta } = \,&X^\varepsilon _{k\Delta } + \int _{k\Delta }^{(k+1)\Delta } F(s,X^\varepsilon _s) \mathrm{d}s + \int _{k\Delta }^{(k+1)\Delta } \sigma (s,X^\varepsilon _s) \mathrm{d}W^\varepsilon _s \nonumber \\ = \,&X^\varepsilon _{k\Delta } + \int _{k\Delta }^{(k+1)\Delta } \left( F(s,X^\varepsilon _s)- F(k\Delta ,X^\varepsilon _{k\Delta }) \right) \mathrm{d}s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } F(k\Delta ,X^\varepsilon _{k\Delta }) \mathrm{d}s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } \left( \sigma (s,X^\varepsilon _s) -\sigma (k\Delta ,X^\varepsilon _{k\Delta }) \right) \mathrm{d}W^\varepsilon _s + \int _{k\Delta }^{(k+1)\Delta } \sigma (k\Delta ,X^\varepsilon _{k\Delta }) \mathrm{d}W^\varepsilon _s \nonumber \\ = \,&X^\varepsilon _{k\Delta } + \int _{k\Delta }^{(k+1)\Delta } \left( F(s,X^\varepsilon _s)- F(k\Delta ,X^\varepsilon _{k\Delta }) \right) \mathrm{d}s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } F(k\Delta ,X^\varepsilon _{k\Delta }) \mathrm{d}s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \left( \partial \sigma (r,X^\varepsilon _r) + D\sigma (r,X^\varepsilon _r)F(r,X^\varepsilon _r) \right) \mathrm{d}r \right) \mathrm{d}W^\varepsilon _s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \left( D\sigma (r,X^\varepsilon _r)\sigma (r,X^\varepsilon _r) - D\sigma (k\Delta ,X^\varepsilon _{k\Delta })\sigma (k\Delta ,X^\varepsilon _{k\Delta })\right) \mathrm{d}W^\varepsilon _r \right) \mathrm{d}W^\varepsilon _s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \left( D\sigma (k\Delta ,X^\varepsilon _{k\Delta })\sigma (k\Delta ,X^\varepsilon _{k\Delta }) - D\sigma (k\Delta ,{\bar{X}}_{k\Delta })\sigma (k\Delta ,{\bar{X}}_{k\Delta }) \right) \mathrm{d}W^\varepsilon _r \right) \mathrm{d}W^\varepsilon _s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s D\sigma (k\Delta ,{\bar{X}}_{k\Delta })\sigma (k\Delta ,{\bar{X}}_{k\Delta }) \mathrm{d}W^\varepsilon _r \right) \mathrm{d}W^\varepsilon _s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } \sigma (k\Delta ,X^\varepsilon _{k\Delta }) \mathrm{d}W_s \nonumber \\&+ \sigma (k\Delta ,X^\varepsilon _{k\Delta }) \varepsilon ^2 \left( Y^\varepsilon _{k\Delta } - Y^\varepsilon _{(k+1)\Delta } \right) \nonumber \\ = \,&X^\varepsilon _{k\Delta } + I^k_1 + I^k_2 + I^k_3 + I^k_4 + I^k_5 + I^k_6 + I^k_7 + I^k_8, \end{aligned}$$
(18)

for any \(k=0,\dots ,[T/\Delta ]\) such that \((k+1)\Delta \le T\).

Similarly, using (13) instead of (10), the process \({\bar{X}}\) satisfies

$$\begin{aligned} {\bar{X}}_{(k+1)\Delta } = \,&{\bar{X}}_{k\Delta } + \int _{k\Delta }^{(k+1)\Delta } \left( F(s,{\bar{X}}_s)- F(k\Delta ,{\bar{X}}_{k\Delta })\right) \mathrm{d}s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } F(k\Delta ,{\bar{X}}_{k\Delta }) \mathrm{d}s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } \left( C(s,{\bar{X}}_s) - C(k\Delta ,{\bar{X}}_{k\Delta })\right) \mathrm{d}s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } C(k\Delta ,{\bar{X}}_{k\Delta }) \mathrm{d}s \nonumber \\&+ \int _{k\Delta }^{(k+1)\Delta } \left( \sigma (s,{\bar{X}}_s) - \sigma (k\Delta ,{\bar{X}}_{k\Delta }) \right) \mathrm{d}W_s + \int _{k\Delta }^{(k+1)\Delta } \sigma (k\Delta ,{\bar{X}}_{k\Delta }) \mathrm{d}W_s \nonumber \\ = \,&{\bar{X}}_{k\Delta } + J^k_1 + J^k_2 + J^k_3 + J^k_4 + J^k_5 + J^k_6. \end{aligned}$$
(19)

With an application of Gronwall's lemma in mind, it turns out to be useful to group the contributions of the right-hand sides of (18), (19) as follows:

$$\begin{aligned} X^\varepsilon _{h\Delta } - {\bar{X}}_{h\Delta } =&\sum _{k=0}^{h-1} \left( I^k_2 - J^k_2 \right) + \sum _{k=0}^{h-1} \left( I^k_6 - J^k_4 \right) + \sum _{k=0}^{h-1} \left( I^k_7 - J^k_6 \right) + \sum _{k=0}^{h-1} I^k_5 \nonumber \\&+ \sum _{k=0}^{h-1} \left( I^k_1 + I^k_3 + I^k_4 + I^k_8 - J^k_1 - J^k_3 - J^k_5 \right) , \end{aligned}$$
(20)

for any \(h=1,\dots ,[T/\Delta ]\), which splits the difference \(X^\varepsilon _{h\Delta } - {\bar{X}}_{h\Delta }\) into five sums.

We first prove that the 2nd and the 5th sum can be neglected when proving (17). The summands of the 5th sum are discussed in Lemma 3.8 below. The contribution of the 2nd sum, though, is more delicate and requires a martingale argument similar to that of [11, Theorem VI.7.1].

The remaining sums will be controlled in terms of the difference \(X^\varepsilon - {\bar{X}}\) itself, which allows them to be estimated via Gronwall’s lemma.

Of course, under assumption (A1), the function F is uniformly continuous when restricted to \([0,T] \times B_R(0)\), where \(B_R(0)\) is the closed ball of radius R in \(H_d\). In what follows, we will denote by \(\omega _F:[0,T] \rightarrow [0,\infty )\) the (local) modulus of continuity of \(F(\cdot ,x)\):

$$\begin{aligned} \big | F(t,x) - F(s,x) \big | \le \omega _F(|t-s|), \quad \hbox { for every } t,s \in [0,T], \hbox { and } x \in B_R(0). \end{aligned}$$

Obviously, the function \(\omega _F\) vanishes at zero, and without loss of generality, it can be chosen to be both non-decreasing and continuous.

Denote by \(\omega _\sigma \) the corresponding modulus of continuity of the derivative \(D\sigma (\cdot ,x)\), and let \(\omega _{F,\sigma } = \omega _F + \omega _\sigma \). Recall that, under assumption (A2), one can take \(\omega _\sigma (t) = C t^{\gamma }\) for some positive constant C and \(\gamma \in (0,1)\).

Lemma 3.8

For any \(p>1\):

$$\begin{aligned}&\mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} I^k_1 \bigg |^p + \bigg | \sum _{k=0}^{h-1} I^k_3 \bigg |^p\right] \lesssim \left( \frac{\Delta }{\varepsilon } \right) ^p \log ^{p/2}(1+\varepsilon ^{-2}) + \omega _F(\Delta )^p;\\&\quad \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} I^k_4 \bigg |^p\right] \lesssim \left( \frac{\Delta ^2}{\varepsilon ^3} \right) ^p \log ^{3p/2}(1+\varepsilon ^{-2}) \\&\quad + \left( \frac{\Delta }{\varepsilon ^2}\right) ^p \log ^{p}(1+\varepsilon ^{-2})\,\omega _\sigma (\Delta )^p;\\&\quad \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} I^k_8 \bigg |^p\right] \lesssim \left( \frac{\varepsilon ^2}{\Delta }\right) ^{p/2} \log ^{p/2}(1+\varepsilon ^{-2}) +\left( \frac{\varepsilon ^2}{\Delta }\right) ^{p} \log ^{p}(1+\varepsilon ^{-2})\\&\quad + \left( \frac{\Delta }{\varepsilon }\right) ^{p} \log ^{p}(1+\varepsilon ^{-2});\\&\quad \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} J^k_1 \bigg |^p + \bigg | \sum _{k=0}^{h-1} J^k_3 \bigg |^p + \bigg | \sum _{k=0}^{h-1} J^k_5 \bigg |^p\right] \lesssim \Delta ^{p/2} + \omega _{F,\sigma }(\Delta )^p. \end{aligned}$$

Proof

Throughout this proof, we will frequently make use of (A1), (A2) without explicit mention.

For \(\sum I^k_1\), by Hölder’s inequality and Lemma 3.3,

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} I^k_1 \bigg |^p\right]&\lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \left( \bigg |X^\varepsilon _s - X^\varepsilon _{k\Delta } \bigg |\right. \right. \\&\left. \left. \quad + \omega _F(s-k\Delta ) \right) \mathrm{d}s \bigg |^p\right] \\&\lesssim \sum _{k=0}^{[T/\Delta ]-1} \int _{k\Delta }^{(k+1)\Delta } \mathbb {E}\left[ \bigg |X^\varepsilon _{s\wedge \tau ^\varepsilon } - X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon } \bigg |^p +\omega _F(\Delta )^p \right] \mathrm{d}s \\&\lesssim \left( \frac{\Delta }{\varepsilon } \right) ^p \log ^{p/2}(1+\varepsilon ^{-2})+ \omega _F(\Delta )^p. \end{aligned}$$

For \(\sum I^k_3\), by Hölder’s inequality and Lemma 3.1,

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} I^k_3 \bigg |^p\right]&\lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg |\, \sup _{t \le T}\bigg | Y^\varepsilon _t \bigg | \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \left( s-k\Delta \right) \mathrm{d}s \bigg |^p\right] \\&\lesssim \mathbb {E}\left[ \sup _{t \le T}\bigg | Y^\varepsilon _t \bigg |^p \sum _{k=0}^{[T/\Delta ]-1} \int _{k\Delta }^{(k+1)\Delta } \bigg | s-k\Delta \bigg |^p \mathrm{d}s \right] \\&\lesssim \left( \frac{\Delta }{\varepsilon } \right) ^p \log ^{p/2}(1+\varepsilon ^{-2}). \end{aligned}$$

For \(\sum I^k_4\), by Hölder’s inequality, Lemmas 3.1 and 3.3,

$$\begin{aligned} \mathbb {E}&\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} I^k_4 \bigg |^p\right] \\&\lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg |\, \sup _{t \le T}\bigg | Y^\varepsilon _t \bigg |^2 \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \left( \bigg | X^\varepsilon _r - X^\varepsilon _{k\Delta } \bigg | + \omega _\sigma (r-k\Delta ) \right) \mathrm{d}r \right) \mathrm{d}s \bigg |^p\right] \\&\lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \sup _{t \le T}\bigg | Y^\varepsilon _t \bigg |^{2p} \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \bigg | \int _{k\Delta }^s \left( \bigg | X^\varepsilon _r - X^\varepsilon _{k\Delta } \bigg | + \omega _\sigma (r-k\Delta ) \right) \mathrm{d}r \bigg |^p \mathrm{d}s \right] \\&\lesssim \varepsilon ^{-2p} \log ^{p}(1+\varepsilon ^{-2})\left( \sum _{k=0}^{[T/\Delta ]-1} \int _{k\Delta }^{(k+1)\Delta } (s-k\Delta )^{pq'-1}\right. \\&\left. \qquad \int _{k\Delta }^s \left( \mathbb {E}\left[ \bigg | X^\varepsilon _{r\wedge \tau ^\varepsilon } - X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon } \bigg |^{pq'} + \omega _\sigma (\Delta )^{pq'} \right] \mathrm{d}r \right) \mathrm{d}s \right) ^{1/q'}\\&\lesssim \varepsilon ^{-3p}\log ^{3p/2}(1+\varepsilon ^{-2}) \left( \sum _{k=0}^{[T/\Delta ]-1}\int _{k\Delta }^{(k+1)\Delta } (s-k\Delta )^{2pq'} \mathrm{d}s \right) ^{1/q'}\\&\quad + \left( \frac{\Delta }{\varepsilon ^2}\right) ^p \log ^{p}(1+\varepsilon ^{-2}) \,\omega _\sigma (\Delta )^p \\&\lesssim \left( \frac{\Delta ^2}{\varepsilon ^3} \right) ^p \log ^{3p/2}(1+\varepsilon ^{-2})\\&\quad + \left( \frac{\Delta }{\varepsilon ^2}\right) ^p \log ^{p}(1+\varepsilon ^{-2}) \, \omega _\sigma (\Delta )^p. \end{aligned}$$

We now consider \(\sum I^k_8\). Here, the idea is to convert \(Y^\varepsilon \)-increments into \(X^\varepsilon \)-increments via summation by parts, since \(X^\varepsilon \)-increments are easier to control. This way, applying Lemmas 3.1 and 3.4,

$$\begin{aligned}&\mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta {\le } \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} I^k_8 \bigg |^p\right] {\lesssim } \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} \sigma (k\Delta ,X^\varepsilon _{k\Delta })\varepsilon ^2 \left( Y^\varepsilon _{k\Delta } {-} Y^\varepsilon _{(k+1)\Delta } \right) \bigg |^p\right] \\&\quad \lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=1}^{h} \left( \sigma (k\Delta ,X^\varepsilon _{k\Delta }) - \sigma ((k-1)\Delta ,X^\varepsilon _{(k-1)\Delta }) \right) \varepsilon ^2 Y^\varepsilon _{k\Delta }\bigg |^p\right] \\&\quad \lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \sup _{t \le T}\bigg | \varepsilon ^2 Y^\varepsilon _t \bigg |^{p} \bigg | \sum _{k=1}^{h} \left( \bigg | X^\varepsilon _{k\Delta } - X^\varepsilon _{(k-1)\Delta }\bigg | + \Delta \right) \bigg |^p\right] \\&\quad \lesssim \mathbb {E}\left[ \sup _{t \le T}\bigg | \varepsilon ^2 Y^\varepsilon _t \bigg |^{pq} \right] ^{1/q} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=1}^{h} \left( \bigg | X^\varepsilon _{k\Delta } - X^\varepsilon _{(k-1)\Delta }\bigg | + \Delta \right) \bigg |^{pq'} \right] ^{1/q'} \\&\quad \lesssim \varepsilon ^p \log ^{p/2}(1+\varepsilon ^{-2}) \Delta ^{1/q'-p} \left( \sum _{k=1}^{[T/\Delta ]} \mathbb {E}\left[ \bigg | X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon } - X^\varepsilon _{(k-1)\Delta \wedge \tau ^\varepsilon }\bigg |^{pq'} + \Delta ^{pq'} \right] \right) ^{1/q'} \\&\quad \lesssim \varepsilon ^p \log ^{p/2}(1+\varepsilon ^{-2}) \Delta ^{-p} \left( \Delta ^{pq'/2} + \varepsilon ^{pq'} \log ^{pq'/2}(1+\varepsilon ^{-2}) + \left( \frac{\Delta }{\varepsilon }\right) ^{2pq'}\right. \\&\left. \qquad \log ^{pq'}(1+\varepsilon ^{-2})\right) ^{1/q'} \\&\quad \lesssim \left( \frac{\varepsilon ^2}{\Delta }\right) ^{p/2} \log ^{p/2}(1+\varepsilon ^{-2}) +\left( \frac{\varepsilon ^2}{\Delta }\right) ^{p} \log ^{p}(1+\varepsilon ^{-2})+ \left( \frac{\Delta }{\varepsilon }\right) ^{p} \log ^{p}(1+\varepsilon ^{-2}). \end{aligned}$$

In a similar way, for \(\sum J^k_1\) and \(\sum J^k_3\), now applying Lemma 3.5,

$$\begin{aligned}&\mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} J^k_1 \bigg |^p + \bigg | \sum _{k=0}^{h-1} J^k_3 \bigg |^p\right] \\&\quad \lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \left( \bigg |{\bar{X}}_s - {\bar{X}}_{k\Delta } \bigg | +\omega _{F,\sigma }(s-k\Delta ) \right) \mathrm{d}s \bigg |^p\right] \\&\quad \lesssim \sum _{k=0}^{[T/\Delta ]-1} \int _{k\Delta }^{(k+1)\Delta } \mathbb {E}\left[ \bigg |{\bar{X}}_{s\wedge \tau ^\varepsilon } - {\bar{X}}_{k\Delta \wedge \tau ^\varepsilon } \bigg |^p +\omega _{F,\sigma }(\Delta )^p \right] \mathrm{d}s\\&\quad \lesssim \Delta ^{p/2} +\omega _{F,\sigma }(\Delta )^p . \end{aligned}$$

For the last sum \(\sum J^k_5\), by Burkholder–Davis–Gundy’s inequality and Lemma 3.5,

$$\begin{aligned}&\mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} J^k_5 \bigg |^p\right] \lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \left( \sigma (s,{\bar{X}}_s) - \sigma (k\Delta ,{\bar{X}}_{k\Delta }) \right) \mathrm{d}W_s \bigg |^p\right] \\&\quad \lesssim \mathbb {E}\left[ \bigg | \sum _{k=0}^{[T/\Delta ]-1} \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \bigg | \sigma (s,{\bar{X}}_s) - \sigma (k\Delta ,{\bar{X}}_{k\Delta }) \bigg |^2 \mathrm{d}s \bigg |^{p/2}\right] \\&\quad \lesssim \mathbb {E}\left[ \bigg | \sum _{k=0}^{[T/\Delta ]-1} \int _{k\Delta \wedge \tau ^\varepsilon }^{(k+1)\Delta \wedge \tau ^\varepsilon } \bigg | \sigma (s,{\bar{X}}_s) - \sigma (k\Delta ,{\bar{X}}_{k\Delta }) \bigg |^2 \mathrm{d}s \bigg |^{p}\,\right] ^{1/2} \\&\quad \lesssim \left( \sum _{k=0}^{[T/\Delta ]-1} \int _{k\Delta }^{(k+1)\Delta } \mathbb {E}\left[ \bigg |{\bar{X}}_{s\wedge \tau ^\varepsilon } - {\bar{X}}_{k\Delta \wedge \tau ^\varepsilon } \bigg |^{2p} +(s-k\Delta )^{2p} \right] \mathrm{d}s \right) ^{1/2} \lesssim \;\Delta ^{p/2}. \end{aligned}$$

\(\square \)

Remark 3.9

The estimates given in Lemma 3.8 motivate the following choice of how \(\Delta =\Delta _\varepsilon \) should behave when \(\varepsilon \) goes to zero:

$$\begin{aligned} \frac{\Delta ^2}{\varepsilon ^3} \log ^{3/2}(1+\varepsilon ^{-2}) \rightarrow 0,\quad \frac{\Delta }{\varepsilon ^2} \log (1+\varepsilon ^{-2}) \,\omega _\sigma (\Delta ) \rightarrow 0,\quad \frac{\varepsilon ^2}{\Delta } \log ^{1/2}(1+\varepsilon ^{-2}) \rightarrow 0. \end{aligned}$$

Such a choice is always possible. Indeed, under assumption (A2), one can take \(\omega _\sigma (t) = C t^{\gamma }\) for some positive constant C, where, after possibly decreasing the Hölder exponent, we may assume \(\gamma \in (0,2/3)\); the choice \(\Delta _\varepsilon = \varepsilon ^{\frac{2}{1+\gamma /2}}\) then satisfies all the requirements above. We will maintain this choice of \(\Delta \) in the remainder of the paper.
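To check this, write \(\Delta _\varepsilon = \varepsilon ^{\theta }\) with \(\theta = \frac{2}{1+\gamma /2} \in (3/2,2)\). Up to logarithmic factors, the three expressions above are then of order \(\varepsilon ^{2\theta -3}\), \(\varepsilon ^{\theta (1+\gamma )-2}\) and \(\varepsilon ^{2-\theta }\), respectively, and all three exponents are positive: the first because \(\gamma <2/3\), the second because \(\theta (1+\gamma ) - 2 = \frac{\gamma }{1+\gamma /2}>0\), and the third because \(\theta <2\). Moreover, \(\theta >1\) guarantees \(\Delta _\varepsilon /\varepsilon \rightarrow 0\), as required in Corollary 3.7.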

We now discuss the 2nd sum on the right-hand side of (20), that is

$$\begin{aligned}&\sum _{k=0}^{h-1} \left( \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s D\sigma (k\Delta ,{\bar{X}}_{k\Delta })\sigma (k\Delta ,{\bar{X}}_{k\Delta }) \mathrm{d}W^\varepsilon _r \right) \mathrm{d}W^\varepsilon _s - \int _{k\Delta }^{(k+1)\Delta } C(k\Delta ,{\bar{X}}_{k\Delta }) \mathrm{d}s \right) , \end{aligned}$$

the i-th component of which, when plugging in (11), reads

$$\begin{aligned} \sum _{k=0}^{h-1} \sum _{\ell ,m \in \mathbb {N}} \sum _{j=1,\dots ,d}D_j \sigma ^{i,m}(k\Delta ,{\bar{X}}_{k\Delta })\sigma ^{j,\ell }(k\Delta ,{\bar{X}}_{k\Delta }) \left( c_{\ell ,m}^k(\Delta ,\varepsilon ) -\delta _{\ell ,m}\frac{q_m}{2} \Delta \right) , \end{aligned}$$

where \(c_{\ell ,m}^k(\Delta ,\varepsilon )\) is given by

$$\begin{aligned} c_{\ell ,m}^k(\Delta ,\varepsilon ) = \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \mathrm{d}W^{\varepsilon ,\ell }_r \right) \mathrm{d}W^{\varepsilon ,m}_s. \end{aligned}$$

Taking the conditional expectation of \(c_{\ell ,m}^k(\Delta ,\varepsilon )\) with respect to \({\mathcal {F}}_{k\Delta }\) yields

$$\begin{aligned} \mathbb {E}\left[ c_{\ell ,m}^k(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right]&= \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \mathbb {E}\left[ Y^{\varepsilon ,\ell }_r Y^{\varepsilon ,m}_s \mid {\mathcal {F}}_{k\Delta }\right] \mathrm{d}r \right) \mathrm{d}s \\&= Y^{\varepsilon ,\ell }_{k\Delta } Y^{\varepsilon ,m}_{k\Delta } \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s e^{-\varepsilon ^{-2}(r+s-2k\Delta )} \mathrm{d}r \right) \mathrm{d}s \\&+ \delta _{\ell ,m} \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s q_\ell \frac{\varepsilon ^{-2}}{2} \left( e^{-\varepsilon ^{-2}(s-r)}- e^{-\varepsilon ^{-2}(r+s-2k\Delta )} \right) \mathrm{d}r \right) \mathrm{d}s, \end{aligned}$$

where the following representation of \(Y^\varepsilon \),

$$\begin{aligned} Y^{\varepsilon ,m}_{s} = Y^{\varepsilon ,m}_{k\Delta } e^{-\varepsilon ^{-2}(s-k\Delta )} + \int _{k\Delta }^{s} e^{-\varepsilon ^{-2}(s-r)} \varepsilon ^{-2} \mathrm{d}W^m_r, \end{aligned}$$

has been used, and this conditional expectation can easily be calculated as

$$\begin{aligned} \mathbb {E}\left[ c_{\ell ,m}^k(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right]&= \frac{\varepsilon ^4}{2} Y^{\varepsilon ,\ell }_{k\Delta } Y^{\varepsilon ,m}_{k\Delta } \left( e^{-\varepsilon ^{-2}\Delta } -1 \right) ^2 \nonumber \\&\quad + \delta _{\ell ,m}\frac{q_m}{2} \left( \Delta + \varepsilon ^2 \left( - \frac{3}{2} + 2 e^{-\varepsilon ^{-2}\Delta } - \frac{1}{2} e^{-2\varepsilon ^{-2}\Delta }\right) \right) . \end{aligned}$$
(21)
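
For completeness: substituting \(u=s-k\Delta \) and \(v=r-k\Delta \), the two iterated integrals behind (21) are elementary,

$$\begin{aligned} \int _0^{\Delta } \left( \int _0^u e^{-\varepsilon ^{-2}(u+v)} \mathrm{d}v \right) \mathrm{d}u&= \frac{\varepsilon ^4}{2} \left( e^{-\varepsilon ^{-2}\Delta } -1 \right) ^2, \\ \int _0^{\Delta } \left( \int _0^u \frac{\varepsilon ^{-2}}{2} \left( e^{-\varepsilon ^{-2}(u-v)}- e^{-\varepsilon ^{-2}(u+v)} \right) \mathrm{d}v \right) \mathrm{d}u&= \frac{1}{2} \left( \Delta + \varepsilon ^2 \left( - \frac{3}{2} + 2 e^{-\varepsilon ^{-2}\Delta } - \frac{1}{2} e^{-2\varepsilon ^{-2}\Delta }\right) \right) . \end{aligned}$$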

Now, since \(\sum _{j=1,\dots ,d} D_j \sigma ^{i,m}(k\Delta ,{\bar{X}}_{\tau ^\varepsilon \wedge (k\Delta )}) \sigma ^{j,\ell }(k\Delta ,{\bar{X}}_{\tau ^\varepsilon \wedge (k\Delta )})\) is \({\mathcal {F}}_{k\Delta }\)-measurable, for every \(\ell ,m \in \mathbb {N}\), \(i=1,\dots ,d\), each process \(M^i_h,\,h=1,\dots ,[T/\Delta ]\), given by

$$\begin{aligned} M^i_h= & {} \sum _{k=0}^{h-1} \sum _{\ell ,m \in \mathbb {N}} \sum _{j=1,\dots ,d} D_j \sigma ^{i,m}(k\Delta ,{\bar{X}}_{\tau ^\varepsilon \wedge (k\Delta )}) \sigma ^{j,\ell }(k\Delta ,{\bar{X}}_{\tau ^\varepsilon \wedge (k\Delta )}) \left( c_{\ell ,m}^k(\Delta ,\varepsilon )\right. \\&\left. -\mathbb {E}\left[ c_{\ell ,m}^k(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] \right) , \end{aligned}$$

is a discrete martingale with respect to the filtration \(({\mathcal {F}}_{h\Delta })_{h=1}^{[T/\Delta ]}\).
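
As a side check, (21) is also easy to confirm numerically. The following minimal Python sketch, an illustration only and not part of the argument (all parameter values are ad hoc choices of ours), simulates a single Ornstein–Uhlenbeck mode \(\ell =m\) with \(q_m=1\) from a fixed starting value, which plays the role of the conditioning on \({\mathcal {F}}_{k\Delta }\), and compares the Monte Carlo mean of \(c^k_{\ell ,m}(\Delta ,\varepsilon )\) with (21).

```python
import numpy as np

# Monte Carlo sanity check of (21) -- an illustration only, not part of the
# argument; the parameter values below are ad hoc choices.  We take k = 0,
# a single mode l = m with q_m = 1, and a fixed starting value y0 standing
# in for Y^{eps,m}_{k Delta}, i.e., for the conditioning on F_{k Delta}.
rng = np.random.default_rng(2)

eps, Delta, y0, n_paths = 0.2, 0.5, 3.0, 200_000
n_steps = 400
dt = Delta / n_steps

Y = np.full(n_paths, y0)     # OU mode: dY = -eps^-2 Y dt + eps^-2 dW
I = np.zeros(n_paths)        # inner integral: int_0^s Y_r dr
c = np.zeros(n_paths)        # c = int_0^Delta (int_0^s Y_r dr) Y_s ds
for _ in range(n_steps):
    c += I * Y * dt
    I += Y * dt
    Y += -Y * dt / eps**2 + rng.standard_normal(n_paths) * np.sqrt(dt) / eps**2

x = Delta / eps**2
formula = (eps**4 / 2) * y0**2 * (np.exp(-x) - 1) ** 2 \
    + 0.5 * (Delta + eps**2 * (-1.5 + 2 * np.exp(-x) - 0.5 * np.exp(-2 * x)))
print("Monte Carlo mean:", c.mean(), " formula (21):", formula)
```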

Lemma 3.10

For each \(i=1,\dots ,d\),

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | M^i_h \bigg |^2\right] \lesssim \left( \frac{\Delta }{\varepsilon }\right) ^2 \log (1+\varepsilon ^{-2}) + \Delta \log ^2(1+\varepsilon ^{-2}). \end{aligned}$$

Proof

Combining Doob's maximal inequality with the martingale property gives

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | M^i_h \bigg |^2\right]&\lesssim \mathbb {E}\left[ \bigg | M^i_{[T/\Delta ]} \bigg |^2\right] \\&\lesssim \sum _{k=0}^{[T /\Delta ]-1} \mathbb {E}\left[ \bigg | \sum _{\ell ,m \in \mathbb {N}} c_{\ell ,m}^{k}(\Delta ,\varepsilon ) -\mathbb {E}\left[ c_{\ell ,m}^{k}(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] \bigg |^2 \right] , \end{aligned}$$

where

$$\begin{aligned} \mathbb {E}\left[ \bigg | \sum _{\ell ,m \in \mathbb {N}} c_{\ell ,m}^{k}(\Delta ,\varepsilon ) -\mathbb {E}\left[ c_{\ell ,m}^{k}(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] \bigg |^2 \right]&\lesssim \mathbb {E}\left[ \bigg | \sum _{\ell ,m \in \mathbb {N}} c_{\ell ,m}^{k}(\Delta ,\varepsilon ) \bigg |^2 \right] , \end{aligned}$$

for each \(k=0,\dots ,[T/\Delta ]-1\), because the conditional expectation is an \(L^2\)-projection. Thus, by independence of \(Y^{\varepsilon ,\ell }\) and \(Y^{\varepsilon ,m}\), for every \(\ell \ne m\), together with Hölder's inequality and the moment bounds for \(Y^\varepsilon \) and \(W^\varepsilon \), we can estimate

$$\begin{aligned}&\mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | M^i_h \bigg |^2\right] \lesssim \sum _{k=0}^{[T/\Delta ]-1} \sum _{\ell ,m \in \mathbb {N}} \mathbb {E}\left[ \bigg | \int _{k\Delta }^{(k+1)\Delta } \left( W^{\varepsilon ,\ell }_s -W^{\varepsilon ,\ell }_{k\Delta } \right) \mathrm{d}W^{\varepsilon ,m}_s \bigg |^2 \right] \\&\quad \lesssim \sum _{k=0}^{[T/\Delta ]-1} \sum _{\ell ,m \in \mathbb {N}} \Delta \int _{k\Delta }^{(k+1)\Delta } \mathbb {E}\left[ \bigg | \left( W^{\varepsilon ,\ell }_s -W^{\varepsilon ,\ell }_{k\Delta } \right) Y^{\varepsilon ,m}_s \bigg |^2 \right] \mathrm{d}s\\&\quad \lesssim \sum _{k=0}^{[T/\Delta ]-1} \sum _{\ell ,m \in \mathbb {N}} \Delta \int _{k\Delta }^{(k+1)\Delta } \mathbb {E}\left[ \bigg | W^{\varepsilon ,\ell }_s - W^{\varepsilon ,\ell }_{k\Delta } \bigg |^{2q} \right] ^{1/q} \mathbb {E}\left[ \bigg | Y^{\varepsilon ,m}_s \bigg |^{2q'} \right] ^{1/q'} \mathrm{d}s\\&\quad \lesssim \sum _{k=0}^{[T/\Delta ]-1} \sum _{\ell ,m \in \mathbb {N}} q_\ell q_m \Delta \varepsilon ^{-2} \log (1+\varepsilon ^{-2})\int _{k\Delta }^{(k+1)\Delta } \left( \Delta + \varepsilon ^2 \log (1+\varepsilon ^{-2}) \right) \mathrm{d}s \\&\quad \lesssim \left( \frac{\Delta }{\varepsilon }\right) ^2 \log (1+\varepsilon ^{-2}) + \Delta \log ^2(1+\varepsilon ^{-2}). \end{aligned}$$

\(\square \)

To eventually cover the remainder of the second sum on the right-hand side of (20), after subtracting the martingale term \(M_h\), we introduce

$$\begin{aligned} N^i_h= & {} \sum _{k=0}^{h-1} \sum _{\ell ,m \in \mathbb {N}}\sum _{j=1,\dots ,d} D_j \sigma ^{i,m}(k\Delta ,{\bar{X}}_{k\Delta })\sigma ^{j,\ell }(k\Delta ,{\bar{X}}_{k\Delta }) \left( \mathbb {E}\left[ c_{\ell ,m}^k(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] - \delta _{\ell ,m}\frac{q_m}{2} \Delta \right) . \end{aligned}$$

Lemma 3.11

For each \(i=1,\dots ,d\),

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | N^i_h \bigg |^2\right] \lesssim \left( \frac{\varepsilon ^2}{\Delta }\right) ^2 \log ^2(1+\varepsilon ^{-2}). \end{aligned}$$

Proof

The proof is an easy consequence of (21). Indeed,

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | N^i_h \bigg |^2\right]&\lesssim \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} \sum _{\ell ,m \in \mathbb {N}} \bigg |\mathbb {E}\left[ c_{\ell ,m}^k(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] - \delta _{\ell ,m}\frac{q_m}{2} \Delta \bigg | \bigg |^2\right] \\&\lesssim \varepsilon ^4 \log ^2(1+\varepsilon ^{-2}) \Delta ^{-1} \sum _{k=0}^{[T/\Delta ]-1} \sum _{\ell ,m \in \mathbb {N}} q_\ell q_m\\&\lesssim \left( \frac{\varepsilon ^2}{\Delta }\right) ^2 \log ^2(1+\varepsilon ^{-2}). \end{aligned}$$

\(\square \)

All in all, Lemmas 3.10 and 3.11 together imply

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \sum _{k=0}^{h-1} \left( I^k_6 - J^k_4 \right) \bigg |^2\right]&= \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon \end{array}} \bigg | \left( M_h + N_h \right) \bigg |^2\right] \\&\lesssim \left( \frac{\Delta }{\varepsilon }\right) ^2 \log (1+\varepsilon ^{-2}) + \Delta \log ^2(1+\varepsilon ^{-2})+\left( \frac{\varepsilon ^2}{\Delta }\right) ^2 \log ^2(1+\varepsilon ^{-2}), \end{aligned}$$

showing that the second sum on the right-hand side of (20) can be neglected, like the fifth one, when \(\varepsilon \downarrow 0\), and \(\Delta =\Delta _\varepsilon \) behaves as described in Remark 3.9.

Recall that we wanted to control the remaining sums in terms of the difference \(X^\varepsilon - {\bar{X}}\) itself, which is obvious for the first and third sums on the right-hand side of (20). In the case of the fourth sum, however, applying almost the same martingale argument used for the second sum, each term \(I^k_5\) can formally be replaced by \(\int _{k\Delta }^{(k+1)\Delta } \left( C(k\Delta ,X^\varepsilon _{k\Delta }) - C(k\Delta ,{\bar{X}}_{k\Delta }) \right) \mathrm{d}s \), up to a sufficiently small \(\varepsilon \)-correction, eventually leading to the desired contraction argument in this case, too.

On the whole, we have justified that, if \(\Delta =\Delta _\varepsilon \) behaves as described in Remark 3.9, then

$$\begin{aligned}&\mathbb {E}\left[ \sup _{\begin{array}{c} k'=0,\dots ,h\\ k'\!\Delta \le \tau ^\varepsilon \end{array}} \bigg | X^\varepsilon _{k'\Delta } - {\bar{X}}_{k'\Delta } \bigg |^2 \right] \lesssim \;r(\Delta ,\varepsilon ) + \sum _{k=0}^{h-1} \Delta \mathbb {E}\left[ \sup _{\begin{array}{c} k'=0,\dots ,k\\ k'\!\Delta \le \tau ^\varepsilon \end{array}} \bigg | X^\varepsilon _{k'\Delta } - {\bar{X}}_{k'\Delta } \bigg |^2 \right] ,\\&\quad h=1,\dots ,[T/\Delta ], \end{aligned}$$

where \(r(\Delta ,\varepsilon ) \rightarrow 0,\,\varepsilon \downarrow 0\), finally proving (17), by Gronwall’s lemma.

The proof of Theorem 2.2(ii) is thus complete.

4 Weak convergence

In this section we prove part (i) of Theorem 2.2. The idea of the proof is similar to that of part (ii), except that now \(\beta \not =0\) is possible. It is the presence of this bilinear term which prevents us from proving convergence in probability; we only succeed in showing convergence in law (see Remark 4.3(ii)).

First, we prove weak convergence of the bilinear term.

Second, we prove convergence in law of \(X^\varepsilon ,\,\varepsilon \downarrow 0\), using bounds similar to those obtained in Sect. 3.

4.1 Weak convergence of the bilinear term

For any \(\varepsilon >0\), define the process \(U^\varepsilon \) by

$$\begin{aligned} U^\varepsilon _t = \int _0^t \varepsilon \beta (Y^\varepsilon _s,Y^\varepsilon _s) \mathrm{d}s, \quad t\in [0,T], \end{aligned}$$
(22)

where \(Y^\varepsilon \) is the stationary Ornstein–Uhlenbeck process introduced in Remark 2.1. By (A5), the process \(U^\varepsilon \) has zero mean, and, using (A3), its second moments,

$$\begin{aligned} \mathbb {E}\left[ \int _0^t \varepsilon \underbrace{ \langle \beta (Y^\varepsilon _s,Y^\varepsilon _s) , {\mathbf {e}}_i \rangle }_{\beta ^i(Y^\varepsilon _s,Y^\varepsilon _s)} \mathrm{d}s \int _0^t \varepsilon \underbrace{ \langle \beta (Y^\varepsilon _r,Y^\varepsilon _r) , {\mathbf {e}}_j \rangle }_{\beta ^j(Y^\varepsilon _r,Y^\varepsilon _r)} \mathrm{d}r\right] , \end{aligned}$$

can be calculated to be

$$\begin{aligned} \frac{1}{2} \sum _{\ell , m \in \mathbb {N}} \underbrace{ \langle \beta ({\mathbf {f}}_\ell ,{\mathbf {f}}_m) , {\mathbf {e}}_i \rangle }_{\beta ^i_{\ell ,m} } \underbrace{ \langle \beta ({\mathbf {f}}_\ell ,{\mathbf {f}}_m) , {\mathbf {e}}_j \rangle }_{\beta ^j_{\ell ,m} } q_\ell q_m \left( t + \frac{\varepsilon ^2}{2}\left( e^{-2\varepsilon ^{-2}t}-1\right) \right) , \end{aligned}$$

for \(i,j=1,\dots ,d\), and \(\ell ,m \in \mathbb {N}\).
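
Indeed, expanding the product and applying the Isserlis–Wick theorem to the stationary Gaussian process \(Y^\varepsilon \) (as in the proof of Proposition 4.2 below), the pairing \(\mathbb {E}[Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,m}_s]\, \mathbb {E}[Y^{\varepsilon ,\ell '}_r Y^{\varepsilon ,m'}_r]\) contributes a multiple of \(\sum _{\ell } \beta ^i_{\ell ,\ell } q_\ell \sum _{m} \beta ^j_{m,m} q_m\), which vanishes by (A5), while the two mixed pairings each produce the kernel \(e^{-2\varepsilon ^{-2}|s-r|}\), so that the stated formula follows from the elementary integral

$$\begin{aligned} \int _0^t \int _0^t e^{-2\varepsilon ^{-2}|s-r|} \,\mathrm{d}s \,\mathrm{d}r = \varepsilon ^2 \left( t + \frac{\varepsilon ^2}{2}\left( e^{-2\varepsilon ^{-2}t}-1\right) \right) , \end{aligned}$$

multiplied by the prefactor \(\varepsilon ^2 \cdot \frac{\varepsilon ^{-4}}{2} \sum _{\ell ,m \in \mathbb {N}} \beta ^i_{\ell ,m} \beta ^j_{\ell ,m}\, q_\ell q_m\).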

Recalling (12), using the above short notation, we also have that

$$\begin{aligned} b^i_{\ell ,m} = \beta ^i_{\ell ,m} \sqrt{\frac{q_\ell q_m}{2}}, \quad i=1,\dots ,d,\;\ell ,m\in \mathbb {N}. \end{aligned}$$

Next, since \(\mathrm{d}Y^{\varepsilon ,\ell }_t = -\varepsilon ^{-2}Y^{\varepsilon ,\ell }_t \mathrm{d}t + \varepsilon ^{-2} d\langle W_t , {\mathbf {f}}_\ell \rangle \), Itô’s formula implies

$$\begin{aligned} Y^{\varepsilon ,\ell }_t Y^{\varepsilon ,m}_t&= Y^{\varepsilon ,\ell }_0 Y^{\varepsilon ,m}_0 - 2\varepsilon ^{-2} \int _0^t Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,m}_s \mathrm{d}s +\varepsilon ^{-2} \int _0^t Y^{\varepsilon ,\ell }_s d\langle W_s , {\mathbf {f}}_m \rangle \\&\quad +\varepsilon ^{-2} \int _0^t Y^{\varepsilon ,m}_s d\langle W_s , {\mathbf {f}}_\ell \rangle + t\, \varepsilon ^{-4} q_\ell \delta _{\ell ,m}, \end{aligned}$$

for any \(\ell ,m\in \mathbb {N}\), and hence

$$\begin{aligned} U^{\varepsilon ,i}_t = \int _0^t \varepsilon \sum _{\ell ,m \in \mathbb {N}} \beta ^i_{\ell ,m} Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,m}_s \mathrm{d}s&= \;\varepsilon \int _0^t \sum _{\ell ,m \in \mathbb {N}} \beta ^i_{\ell ,m} Y^{\varepsilon ,\ell }_s d\langle W_s , {\mathbf {f}}_m \rangle \\&\quad - \frac{\varepsilon ^3}{2} \sum _{\ell ,m \in \mathbb {N}} \beta ^i_{\ell ,m} \left( Y^{\varepsilon ,\ell }_t Y^{\varepsilon ,m}_t - Y^{\varepsilon ,\ell }_0 Y^{\varepsilon ,m}_0\right) +\,\frac{\varepsilon ^{-1}}{2}\,t\sum _{\ell \in \mathbb {N}}\beta ^i_{\ell ,\ell }q_\ell \\&= \;M^{\varepsilon ,i}_t - \frac{1}{2} V^{\varepsilon ,i}_t \,+\,\frac{\varepsilon ^{-1}}{2}\,t\sum _{\ell \in \mathbb {N}}\beta ^i_{\ell ,\ell }q_\ell , \end{aligned}$$

where \(M^\varepsilon \) is a d-dimensional continuous local martingale, while the process \(V^\varepsilon \) satisfies

$$\begin{aligned} \mathbb {E}\left[ \sup _{t \le T} \bigg | V^{\varepsilon }_t \bigg |^p \right]&= \mathbb {E}\left[ \sup _{t \le T} \bigg | \varepsilon ^3 \left( \beta \left( Y^{\varepsilon }_t,Y^{\varepsilon }_t\right) - \beta \left( Y^{\varepsilon }_0,Y^{\varepsilon }_0\right) \right) \bigg |^p \right] \lesssim \varepsilon ^p \log ^p(1+\varepsilon ^{-2}),\quad \forall \,p>1, \end{aligned}$$
(23)

by combining (A3) and Lemma 3.1.

Remark 4.1

Using \(\sum _{\ell ,m \in \mathbb {N}} \beta ^i_{\ell ,m} q_\ell q_m < \infty \) for every \(i=1,\dots ,d\), it is possible to prove that \(M^\varepsilon \) is a square integrable martingale for every \(\varepsilon >0\). However, we will not need this in the following.

The above representation of \(U^\varepsilon \), though very simple, has been used fruitfully in a variety of situations, see for instance [19] or [10]. Observe that, by (A5), the Itô correction actually cancels out; otherwise it would contribute a term of order \(\varepsilon ^{-1}\). The process \(U^\varepsilon \) nevertheless has an interesting limit in law:

Proposition 4.2

The couple of processes \((U^\varepsilon ,W)\) converges in law, \(\varepsilon \downarrow 0\), to a pair of processes \((\eta ,\omega )\), where \(\eta \) is a d-dimensional Wiener process with covariance \((\sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m}b^j_{\ell ,m})_{i,j=1}^d\), and \(\omega \) is a Q-Wiener process, like W. Furthermore, \(\eta \) and \(\omega \) are independent.

Proof

To begin with, by (23), it is sufficient to prove the proposition for \((M^\varepsilon ,W)\) in place of \((U^\varepsilon ,W)\).

Since all components of the processes \(M^\varepsilon ,\,\varepsilon >0\), and of W are continuous local martingales, the distributional properties of the limit \((\eta ,\omega )\) would follow from [4, Chapter VII, Theorem 1.4], if

$$\begin{aligned} \mathbb {E}\left[ \left( \left[ M^{\varepsilon ,i} , M^{\varepsilon ,j} \right] _{t} - t \sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m}b^j_{\ell ,m} \right) ^2 \right] \rightarrow 0,\quad \varepsilon \downarrow 0, \end{aligned}$$
(24)

for each \(t \in [0,T]\), and \(i,j =1,\dots , d\), as well as

$$\begin{aligned} \mathbb {E}\left[ \left( \left[ M^{\varepsilon ,i} , \langle W , {\mathbf {f}}_m \rangle \right] _{t}\,\right) ^2 \right] \rightarrow 0,\quad \varepsilon \downarrow 0, \end{aligned}$$

for each \(t\in [0,T],\,i=1,\dots ,d\), and \(m\in \mathbb {N}\).

First, fix \(t \in [0,T]\), as well as \(i,j =1,\dots , d\). Then, the quadratic covariation \(\left[ M^{\varepsilon ,i} , M^{\varepsilon ,j} \right] _{t}\) is given by

$$\begin{aligned} \left[ M^{\varepsilon ,i} , M^{\varepsilon ,j} \right] _{t} = \varepsilon ^2 \int _0^t \sum _{m \in \mathbb {N}} \sum _{\ell ,\ell ' \in \mathbb {N}} \beta ^i_{\ell ,m} \beta ^j_{\ell ',m} q_m Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,\ell '}_s \mathrm{d}s, \end{aligned}$$

so that

$$\begin{aligned} \mathbb {E}&\left[ \left( \left[ M^{\varepsilon ,i} , M^{\varepsilon ,j} \right] _{t} - t \sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m}b^j_{\ell ,m} \right) ^2 \right] \\&\quad = \varepsilon ^4 \iint _0^t \sum _{m,{\underline{m}}\in \mathbb {N}} \sum _{\begin{array}{c} \ell ,\ell ' \in \mathbb {N}\\ \underline{\ell } ,\underline{\ell }' \in \mathbb {N} \end{array}} \beta ^i_{\ell ,m} \beta ^j_{\ell ',m} \beta ^i_{\underline{\ell },{\underline{m}}} \beta ^j_{\underline{\ell }',{\underline{m}}} q_m q_{{\underline{m}}} \mathbb {E}\left[ Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,\ell '}_s Y^{\varepsilon ,\underline{\ell }}_r Y^{\varepsilon ,\underline{\ell }'}_r \right] \mathrm{d}s \mathrm{d}r \\&\quad \quad -2 \varepsilon ^2 \int _0^t \sum _{m \in \mathbb {N}} \sum _{\ell ,\ell ' \in \mathbb {N}} \beta ^i_{\ell ,m} \beta ^j_{\ell ',m} q_m \mathbb {E}\left[ Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,\ell '}_s \right] \mathrm{d}s \left( t \sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m} b^j_{\ell ,m}\right) \\&\qquad + \left( t \sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m} b^j_{\ell ,m}\right) ^2. \end{aligned}$$

Now, using the easily verified identity \(\mathbb {E}\left[ Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,\ell '}_s \right] = \frac{\varepsilon ^{-2}}{2} q_\ell \delta _{\ell ,\ell '}\), it follows from the Isserlis–Wick theorem, see [14, Theorem 1.28], that

$$\begin{aligned} \mathbb {E}\left[ Y^{\varepsilon ,\ell }_s Y^{\varepsilon ,\ell '}_s Y^{\varepsilon ,\underline{\ell }}_r Y^{\varepsilon ,\underline{\ell }'}_r \right]&= \frac{\varepsilon ^{-4}}{4} \left( q_\ell q_{\underline{\ell }} \delta _{\ell ,\ell '} \delta _{\underline{\ell },\underline{\ell }'} + q_\ell q_{\ell '} e^{-2\varepsilon ^{-2}|s-r|} \left( \delta _{\ell ,\underline{\ell }} \delta _{\ell ',\underline{\ell }'} +\delta _{\ell ,\underline{\ell }'} \delta _{\ell ',\underline{\ell }} \right) \right) , \end{aligned}$$

which yields

$$\begin{aligned}&\mathbb {E}\left[ \left( \left[ M^{\varepsilon ,i} , M^{\varepsilon ,j} \right] _{t} - t \sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m}b^j_{\ell ,m} \right) ^2 \right] \\&\quad = \left( t \sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m} b^j_{\ell ,m} -t \sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m} b^j_{\ell ,m}\right) ^2 \\&\qquad + O(\varepsilon ^2) \lesssim \varepsilon ^2, \end{aligned}$$

proving (24).

Second, fix \(t\in [0,T]\), as well as \(i=1,\dots ,d,\,m\in \mathbb {N}\). Then,

$$\begin{aligned} \left[ M^{\varepsilon ,i} , \langle W , {\mathbf {f}}_m \rangle \right] _t \,=\, \int _0^{t}\beta ^i(\varepsilon Y^\varepsilon _s,Q{\mathbf {f}}_m)\,\mathrm{d}s, \end{aligned}$$

where, using Lemma 3.1,

$$\begin{aligned}&\mathbb {E}\left[ \bigg | \int _0^t \beta ^i(\varepsilon Y^\varepsilon _s,Q{\mathbf {f}}_m)\,\mathrm{d}s \bigg |^2 \right] = \mathbb {E}\left[ \bigg | \beta ^i\left( \varepsilon \int _0^t Y^\varepsilon _s \,\mathrm{d}s ,\, Q{\mathbf {f}}_m\right) \bigg |^2 \right] \\&\quad \lesssim \mathbb {E}\left[ \bigg | \underbrace{\varepsilon \int _0^t Y^\varepsilon _s \,\mathrm{d}s}_{=\, \varepsilon W_t-\varepsilon ^3(Y^\varepsilon _t-Y^\varepsilon _0)} \bigg |^2 \, q_m^2 \right] {\mathop {\longrightarrow }\limits ^{\varepsilon \downarrow 0}} \;0, \end{aligned}$$

finishing the proof of the proposition. \(\square \)
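
The proposition can also be illustrated numerically. The following minimal Python sketch, an illustration only (the toy dimensions, seed, and step size are our own choices), takes \(d=1\), two unresolved modes with \(q_1=q_2=1\), and \(\beta (y,y)=y^1y^2\), so that (A5) holds, and compares the empirical variance of \(U^\varepsilon _T\) with the limit variance \(\sum _{\ell ,m}(b^1_{\ell ,m})^2\,T = T/4\) predicted by Proposition 4.2.

```python
import numpy as np

# Illustration only (not part of the proof): empirical variance of U^eps_T
# versus the limit variance of Proposition 4.2.  Toy choice: d = 1, two OU
# modes with q_1 = q_2 = 1, and beta(y, y) = y^1 y^2, so the diagonal
# condition (A5) holds; then b_{1,2} = b_{2,1} = (1/2) sqrt(1/2), and the
# predicted limit variance of U^eps_T is sum b^2 * T = T/4.
rng = np.random.default_rng(0)

eps, T, n_paths = 0.05, 1.0, 2000
dt = 0.1 * eps**2                 # resolve the fast OU time scale eps^2
n_steps = int(T / dt)

Y = rng.standard_normal((n_paths, 2)) / (eps * np.sqrt(2))  # stationary start
U = np.zeros(n_paths)
for _ in range(n_steps):
    dW = rng.standard_normal((n_paths, 2)) * np.sqrt(dt)
    U += eps * Y[:, 0] * Y[:, 1] * dt            # U^eps_t = int eps beta(Y,Y) ds
    Y += -Y * dt / eps**2 + dW / eps**2          # dY = -eps^-2 Y dt + eps^-2 dW

print("empirical var:", U.var(), " predicted:", T / 4)
```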

Remark 4.3

  (i)

    Of course, a d-dimensional Wiener process with covariance \((\sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m}b^j_{\ell ,m})_{i,j=1}^d\) can always be represented by \(\sum _{\ell ,m \in \mathbb {N}} b_{\ell ,m} {\bar{W}}^{\ell ,m}\), where \(\{{\bar{W}}^{\ell ,m}\}_{\ell ,m \in \mathbb {N}}\) is a family of independent one-dimensional standard Wiener processes.

  (ii)

    We would like to stress that we do not expect a much stronger convergence of \(U^\varepsilon \), when \(\varepsilon \downarrow 0\), than the one stated in the above proposition. Indeed, it turns out that the sequence \(\{M^{\varepsilon }\}_{\varepsilon >0}\) is not even a Cauchy sequence in \(L^2(\Omega ;\mathbb {R}^d)\). To see this, for fixed \(0<\varepsilon <\underline{\varepsilon }\), and some \(1\le i\le d\), consider

    $$\begin{aligned} {\mathbb {E}} \left[ \sup _{t \le T} \bigg | M^{\varepsilon ,i}_t - M^{\underline{\varepsilon },i}_t \bigg |^2 \right]&= {\mathbb {E}} \left[ \sup _{t \le T} \bigg | \int _0^t \sum _{\ell ,m \in \mathbb {N}} \beta ^i_{\ell ,m} \left( \varepsilon Y^{\varepsilon ,\ell }_s - \underline{\varepsilon } Y^{\underline{\varepsilon },\ell }_s \right) d\langle W_s , {\mathbf {f}}_m \rangle \bigg |^2 \right] . \end{aligned}$$

    But, by Burkholder–Davis–Gundy's inequality, the above expectation can be bounded from below by

    $$\begin{aligned}&{\mathbb {E}} \left[ \int _0^T \sum _{m \in \mathbb {N}} \left( \sum _{\ell \in \mathbb {N}} \beta ^i_{\ell ,m} \left( \varepsilon Y^{\varepsilon ,\ell }_s - \underline{\varepsilon } Y^{\underline{\varepsilon },\ell }_s \right) \right) ^2 q_m \mathrm{d}s \right] \\&\quad = T \sum _{\ell ,m \in \mathbb {N}} (\beta ^i_{\ell ,m})^2 q_\ell q_m \left( 1 - \frac{2\varepsilon ^{-1}\underline{\varepsilon }^{-1}}{\varepsilon ^{-2}+\underline{\varepsilon }^{-2}} \right) , \end{aligned}$$

    where

    $$\begin{aligned} 1 - \frac{2\varepsilon ^{-1}\underline{\varepsilon }^{-1}}{\varepsilon ^{-2}+\underline{\varepsilon }^{-2}} = \frac{(\underline{\varepsilon }-\varepsilon )^2}{\varepsilon ^2+\underline{\varepsilon }^2} \,{\mathop {\longrightarrow }\limits ^{\varepsilon \rightarrow 0}}\, 1, \quad \hbox {for every fixed } \underline{\varepsilon }>0, \end{aligned}$$

    so that \(\{M^{\varepsilon ,i}\}_{\varepsilon >0}\) cannot be Cauchy in \(L^2(\Omega )\).

4.2 Weak convergence of solutions

We now prove \(X^\varepsilon \rightarrow {\bar{X}}\), in law, when \(\varepsilon \downarrow 0\).

First, for each \(\varepsilon >0\), let \({\hat{X}}^\varepsilon \) be the solution of

$$\begin{aligned} {\hat{X}}^\varepsilon _t = x_0 + \int _0^t \left( F(s,{\hat{X}}^\varepsilon _s) + C(s,{\hat{X}}^\varepsilon _s)\right) \mathrm{d}s + \int _0^t \sigma (s,{\hat{X}}^\varepsilon _s ) \mathrm{d}W_s + U^\varepsilon _t, \quad t\in [0,T], \end{aligned}$$
(25)

where \(U^\varepsilon \) is given by (22), and let \(\tau ^\varepsilon _R = \inf \{t\ge 0 : |X^\varepsilon _t|\ge R \} \wedge \inf \{t\ge 0 : |{\hat{X}}^\varepsilon _t|\ge R \} \).

Note that, under (A4), the coefficients \(F,C,\sigma ,\beta \) have properties which guarantee that each of the above equations admits a global solution on [0, T], too.

Next, taking into account \(\mathbb {E}\left[ \sup _{s \in [0,T]} \bigg | \varepsilon \beta (Y^\varepsilon _s,Y^\varepsilon _s) \bigg |^p \right] \lesssim \varepsilon ^{-p} \log ^p(1+\varepsilon ^{-2})\), together with \(|U^\varepsilon _{t+k\Delta } - U^\varepsilon _{k\Delta }| \le t\, \sup _{s \in [0,T]} |\varepsilon \beta (Y^\varepsilon _s,Y^\varepsilon _s)|\), we can estimate the increments of \(U^\varepsilon \) by

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,1,\dots ,[T/\Delta ] \\ t \le \tau ,\, t+k\Delta \le T\wedge \tau ^\varepsilon _R \end{array}} |U^\varepsilon _{t+k\Delta } - U^\varepsilon _{k\Delta }|^p \right] \lesssim \left( \frac{\tau }{\varepsilon }\right) ^p \log ^p(1+\varepsilon ^{-2}). \end{aligned}$$
(26)

As a consequence, it can easily be verified that the analogues of Lemmas 3.3 and 3.4 would still be valid for the process \(X^\varepsilon \), despite \(\beta \not =0\), on the one hand, and that the following versions

$$\begin{aligned}&\mathbb {E}\left[ \sup _{t \le \tau ,\,t+k\Delta \le T\wedge \tau ^\varepsilon _R}|{\hat{X}}^\varepsilon _{t+k\Delta } - {\hat{X}}^\varepsilon _{k\Delta }|^p \right] \lesssim \tau ^{\frac{p}{2}} + \left( \frac{\tau }{\varepsilon }\right) ^p \log ^p(1+\varepsilon ^{-2}),\\&\quad p>1,\,\tau \in (0,1),\,k\in \{0,1,\dots ,[T/\Delta ]\}, \end{aligned}$$

and

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} k=0,1,\dots ,[T/\Delta ] \\ t \le \Delta ,\, t+k\Delta \le T\wedge \tau ^\varepsilon _R \end{array}} |{\hat{X}}^\varepsilon _{t+k\Delta } - {\hat{X}}^\varepsilon _{k\Delta }|^p \right] \lesssim \Delta ^{\frac{p}{2}-1} + \frac{\Delta ^{p-1}}{\varepsilon ^p} \log ^p(1+\varepsilon ^{-2}), \quad p>1, \end{aligned}$$

of Lemmas 3.5 and 3.6, respectively, would hold true when replacing \({\bar{X}}\) by \({\hat{X}}^\varepsilon \), on the other. We point out that the proof of this claim differs from those in Sect. 3 only in the term \(U^\varepsilon \), which, however, can be controlled by (26).

Therefore, when expanding \(X^\varepsilon \) and \({\hat{X}}^\varepsilon \) as in (18) and (19), but including the \(\beta \)-term, and then arguing as in the proof of Theorem 2.2(ii) in Sect. 3, it would immediately follow that \(X^\varepsilon _{\cdot \wedge \tau ^\varepsilon _R} - {\hat{X}}^\varepsilon _{\cdot \wedge \tau ^\varepsilon _R} \rightarrow 0\), in probability, \(\varepsilon \downarrow 0\), for any \(R>0\), once the following lemma is also available.

Lemma 4.4

Assume that \(\Delta =\Delta _\varepsilon \) behaves as described in Remark 3.9. Then,

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ] \\ h\Delta \le \tau ^\varepsilon _R \end{array}} \bigg | \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s D\sigma (r,X^\varepsilon _r) \varepsilon \beta (Y^\varepsilon _r,Y^\varepsilon _r) \mathrm{d}r \right) \mathrm{d}W^\varepsilon _s \bigg |^2\right] {\rightarrow } 0,\quad \varepsilon \downarrow 0. \end{aligned}$$

Proof

To start with, write

$$\begin{aligned} \int _{k\Delta }^{(k+1)\Delta }&\left( \int _{k\Delta }^s D\sigma (r,X^\varepsilon _r) \varepsilon \beta (Y^\varepsilon _r,Y^\varepsilon _r) \mathrm{d}r \right) \mathrm{d}W^\varepsilon _s \\ =\,&\int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \left( D\sigma (r,X^\varepsilon _r) \varepsilon \beta (Y^\varepsilon _r,Y^\varepsilon _r) - D\sigma ({k\Delta },X^\varepsilon _{k\Delta }) \varepsilon \beta (Y^\varepsilon _r,Y^\varepsilon _r) \right) \mathrm{d}r \right) \mathrm{d}W^\varepsilon _s \\&+ \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s D\sigma ({k\Delta },X^\varepsilon _{k\Delta }) \varepsilon \beta (Y^\varepsilon _r,Y^\varepsilon _r) \mathrm{d}r \right) \mathrm{d}W^\varepsilon _s, \end{aligned}$$

which creates two summands, for any fixed \(0\le k\le [T/\Delta ]-1\).

We estimate the impact of each summand separately.

First, using \(|D\sigma (r,X^\varepsilon _r)-D\sigma ({k\Delta },X^\varepsilon _{k\Delta })| \lesssim |X^\varepsilon _r-X^\varepsilon _{k\Delta }| + \omega _\sigma (\Delta )\), we obtain that

$$\begin{aligned}&\mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ] \\ h\Delta \le \tau ^\varepsilon _R \end{array}} \bigg | \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \left( D\sigma (r,X^\varepsilon _r) \varepsilon \beta (Y^\varepsilon _r,Y^\varepsilon _r) - D\sigma ({k\Delta },X^\varepsilon _{k\Delta }) \varepsilon \beta (Y^\varepsilon _r,Y^\varepsilon _r) \right) \mathrm{d}r \right) \mathrm{d}W^\varepsilon _s \bigg |^2\right] \\&\quad \lesssim \varepsilon ^{-4} \log ^2(1+\varepsilon ^{-2}) \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ] \\ h\Delta \le \tau ^\varepsilon _R \end{array}} \bigg | \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \left( |X^\varepsilon _r-X^\varepsilon _{k\Delta }| + \omega _\sigma (\Delta ) \right) \mathrm{d}r \right) \mathrm{d}s \bigg |^2 \right] \\&\quad \lesssim \varepsilon ^{-4} \log ^2(1+\varepsilon ^{-2}) \mathbb {E}\left[ \sum _{k=0}^{[(T \wedge \tau ^\varepsilon _R)/\Delta ]-1} \int _{k\Delta }^{(k+1)\Delta } \bigg | \int _{k\Delta }^s \left( |X^\varepsilon _r-X^\varepsilon _{k\Delta }| + \omega _\sigma (\Delta ) \right) \mathrm{d}r \bigg |^2 \mathrm{d}s \right] \\&\quad \lesssim \varepsilon ^{-4} \log ^2(1+\varepsilon ^{-2}) \sum _{k=0}^{[T / \Delta ]-1} \int _{k\Delta }^{(k+1)\Delta } (s-k\Delta ) \int _{k\Delta }^s \left( \mathbb {E}\left[ |X^\varepsilon _{r\wedge \tau ^\varepsilon _R}-X^\varepsilon _{k\Delta \wedge \tau ^\varepsilon _R}|^2\right] + \omega _\sigma (\Delta )^2 \right) \mathrm{d}r \mathrm{d}s\\&\quad \lesssim \left( \frac{\Delta ^2}{\varepsilon ^3}\right) ^2 \log ^3(1+\varepsilon ^{-2})+ \left( \frac{\Delta }{\varepsilon ^2}\right) ^2 \log ^2(1+\varepsilon ^{-2})\,\omega _\sigma (\Delta )^2. \end{aligned}$$

Second, we approach

$$\begin{aligned} \sum _{k=0}^{h-1} \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s D\sigma ({k\Delta },X^\varepsilon _{k\Delta }) \varepsilon \beta (Y^\varepsilon _r,Y^\varepsilon _r) \mathrm{d}r \right) \mathrm{d}W^\varepsilon _s \end{aligned}$$
(27)

following the method used when discussing the second sum on the right-hand side of (20) in the proof of Theorem 2.2(ii), but now for triple moments of \(Y^\varepsilon \).

Indeed, define

$$\begin{aligned} c^k_{\ell ,m,n}(\Delta ,\varepsilon ) = \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s Y^{\varepsilon ,\ell }_r Y^{\varepsilon ,m}_r \mathrm{d}r \right) Y^{\varepsilon ,n}_s \mathrm{d}s, \end{aligned}$$

and take the conditional expectation with respect to \({\mathcal {F}}_{k\Delta }\), that is

$$\begin{aligned} \mathbb {E}\left[ c^k_{\ell ,m,n}(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] = \int _{k\Delta }^{(k+1)\Delta } \left( \int _{k\Delta }^s \mathbb {E}\left[ Y^{\varepsilon ,\ell }_r Y^{\varepsilon ,m}_r Y^{\varepsilon ,n}_s \mid {\mathcal {F}}_{k\Delta }\right] \mathrm{d}r \right) \mathrm{d}s. \end{aligned}$$

Since

$$\begin{aligned} \mathbb {E}\left[ Y^{\varepsilon ,\ell }_r Y^{\varepsilon ,m}_r Y^{\varepsilon ,n}_s \mid {\mathcal {F}}_{k\Delta }\right]&=\, Y^{\varepsilon ,\ell }_{k\Delta } Y^{\varepsilon ,m}_{k\Delta } Y^{\varepsilon ,n}_{k\Delta } e^{-\varepsilon ^{-2}(s+2r-3k\Delta )} \\&\quad + \left( Y^{\varepsilon ,\ell }_{k\Delta } \delta _{m,n} q_n+ Y^{\varepsilon ,m}_{k\Delta } \delta _{\ell ,n} q_n+ Y^{\varepsilon ,n}_{k\Delta } \delta _{\ell ,m} q_\ell \right) \\&\qquad \frac{\varepsilon ^{-2}}{2} \left( e^{-\varepsilon ^{-2}(s-k\Delta )} - e^{-\varepsilon ^{-2}(s+2r-3k\Delta )}\right) , \end{aligned}$$

we have that

$$\begin{aligned} \mathbb {E}\left[ c^k_{\ell ,m,n}(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] =\,&Y^{\varepsilon ,\ell }_{k\Delta } Y^{\varepsilon ,m}_{k\Delta } Y^{\varepsilon ,n}_{k\Delta } \frac{\varepsilon ^4}{2} \left( 1 - e^{-\varepsilon ^{-2} \Delta } - \frac{1}{3} + \frac{1}{3} e^{-3\varepsilon ^{-2}\Delta } \right) \\&+ \left( Y^{\varepsilon ,\ell }_{k\Delta } \delta _{m,n} q_n+ Y^{\varepsilon ,m}_{k\Delta } \delta _{\ell ,n} q_n+ Y^{\varepsilon ,n}_{k\Delta } \delta _{\ell ,m} q_\ell \right) \\&\quad \times \frac{\varepsilon ^{2}}{2} \left( \frac{\Delta }{\varepsilon ^2}e^{-\varepsilon ^{-2}\Delta } + \frac{1}{2} - \frac{1}{2} e^{-\varepsilon ^{-2}\Delta } + \frac{1}{6} - \frac{1}{6} e^{-3\varepsilon ^{-2}\Delta } \right) . \end{aligned}$$

Next, for each \(i=1,\dots ,d\), the process \(M^i_h,\,h=1,\dots ,[T/\Delta ]\), given by

$$\begin{aligned} M^i_h&= \sum _{k=0}^{h-1} \sum _{\ell ,m,n \in \mathbb {N}} \sum _{j =1,\dots ,d} D_j \sigma ^{i,n}(k\Delta ,X^\varepsilon _{\tau ^\varepsilon _R\wedge k\Delta }) \varepsilon \beta ^j_{\ell ,m}\left( c^k_{\ell ,m,n}(\Delta ,\varepsilon )-\mathbb {E}\left[ c^k_{\ell ,m,n}(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] \right) , \end{aligned}$$

is a martingale with respect to the filtration \(({\mathcal {F}}_{h\Delta })_{h=1}^{[T/\Delta ]}\), and arguing as in the proof of Lemma 3.10 yields

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ] \\ h\Delta \le \tau ^\varepsilon _R \end{array}} \bigg | M^i_h \bigg |^2\right] \lesssim \frac{\Delta ^3}{\varepsilon ^4} \log ^3(1+\varepsilon ^{-2}), \quad i=1,\dots ,d. \end{aligned}$$

So, it remains to prove that the remainder, after subtracting the martingale term \(M_h\) from (27), also vanishes, when \(\varepsilon \downarrow 0\). For \(i=1,\dots ,d\), the \(i\)-th coordinate of this remainder reads

$$\begin{aligned} N^i_h = \sum _{k=0}^{h-1} \sum _{\ell ,m,n \in \mathbb {N}} \sum _{j =1,\dots ,d} D_j \sigma ^{i,n}(k\Delta ,X^\varepsilon _{k\Delta }) \varepsilon \beta ^j_{\ell ,m} \mathbb {E}\left[ c^k_{\ell ,m,n}(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] , \end{aligned}$$

and one easily obtains the bound

$$\begin{aligned} \mathbb {E}\left[ \sup _{\begin{array}{c} h=1,\dots ,[T/\Delta ]\\ h\Delta \le \tau ^\varepsilon _R \end{array}} \bigg | N^i_h \bigg |^2 \right]&\lesssim \Delta ^{-1} \sum _{k=0}^{[T / \Delta ] - 1} \mathbb {E}\left[ \bigg | \varepsilon \mathbb {E}\left[ c^k_{\ell ,m,n}(\Delta ,\varepsilon ) \mid {\mathcal {F}}_{k\Delta }\right] \bigg |^2 \right] \\&\lesssim \left( \frac{\varepsilon ^2}{\Delta }\right) ^2 \log ^3(1+\varepsilon ^{-2}), \end{aligned}$$

finishing the proof of the lemma. \(\square \)

Corollary 4.5

For any \(R>0\), if \(\Delta =\Delta _\varepsilon \) behaves as described in Remark 3.9,

$$\begin{aligned} \mathbb {E}\left[ \sup _{t \le T\wedge \tau ^\varepsilon _R }|X^\varepsilon _t - {\hat{X}}^\varepsilon _t|^2\right] \rightarrow 0,\quad \varepsilon \downarrow 0, \end{aligned}$$

and hence \(X^\varepsilon _{\cdot \wedge \tau ^\varepsilon _R} - {\hat{X}}^\varepsilon _{\cdot \wedge \tau ^\varepsilon _R} \rightarrow 0\), in probability, \(\varepsilon \downarrow 0\), in particular.

The above corollary suggests that it is sufficient to show \({\hat{X}}_{\cdot \wedge \tau ^\varepsilon _R}^\varepsilon \rightarrow {\bar{X}}_{\cdot \wedge \tau ^\varepsilon _R}\), in law, when \(\varepsilon \downarrow 0\), subject to some procedure which allows letting \(R\) go to infinity afterwards. So, we first prove the weak convergence for fixed \(R\), and then discuss the limiting procedure for \(R\rightarrow \infty \).

Modify the coefficients \(F,\sigma \) outside the set \(\{(t,x): |x|<R\}\) in such a way that the new coefficients \(F_R,\,\sigma _R\), but also \(D\sigma _R\), are globally bounded, and that both functions \(F_R(t,\cdot )\) and \(D\sigma _R(t,\cdot )\) are globally Lipschitz, uniformly in \(t \in [0,T]\).

Of course, \({\hat{X}}^\varepsilon _{\cdot \wedge \tau ^\varepsilon _R}\) coincides with \({\hat{X}}^{\varepsilon ,R}_{\cdot \wedge \tau ^\varepsilon _R}\), where \({\hat{X}}^{\varepsilon ,R}\) denotes the solution to the equation obtained when replacing the coefficients of (25) by \(F_R,\,\sigma _R\), and the Stratonovich correction \(C_R\) associated with \(\sigma _R\). Also, let \({\bar{X}}^R\) denote the solution to the equation obtained when replacing the coefficients of (13) by \(F_R,\,\sigma _R,\,C_R\).

Proposition 4.6

Fix \(R>0\). Then, \({\hat{X}}^{\varepsilon ,R}\) converges to \({\bar{X}}^R\), in law, when \(\varepsilon \downarrow 0\).

Proof

Since

$$\begin{aligned} {\hat{X}}^{\varepsilon ,R}_t - U^\varepsilon _t = x_0 + \int _0^t \left( F_R(s,{\hat{X}}^{\varepsilon ,R}_s) + C_R(s,{\hat{X}}^{\varepsilon ,R}_s)\right) \mathrm{d}s + \int _0^t \sigma _R(s,{\hat{X}}^{\varepsilon ,R}_s ) \mathrm{d}W_s, \end{aligned}$$

by boundedness of the coefficients on the above right-hand side, we obtain that

$$\begin{aligned} \mathbb {E}\left[ \sup _{t\le T} |{\hat{X}}^{\varepsilon ,R}_t - U^\varepsilon _t| \right] \,\lesssim \, |x_0| + T + \mathbb {E}\left[ \sup _{t\le T}|\int _0^t \sigma _R(s,{\hat{X}}^{\varepsilon ,R}_s ) \mathrm{d}W_s| \right] , \end{aligned}$$

where Burkholder–Davis–Gundy’s inequality gives \(\mathbb {E}\left[ \sup _{t\le T}|\int _0^t \sigma _R(s,{\hat{X}}^{\varepsilon ,R}_s ) \mathrm{d}W_s|\right] \,\lesssim \,T^{1/2}\).

Similarly, \(\mathbb {E}\left[ |({\hat{X}}^{\varepsilon ,R}_{t_2} - U^\varepsilon _{t_2}) - ({\hat{X}}^{\varepsilon ,R}_{t_1} - U^\varepsilon _{t_1})|^p\right] \lesssim |t_2-t_1|^{p/2}\), for any \(|t_2-t_1|<1\), and any \(p>1\). Thus, by Kolmogorov–Chentsov’s theorem, for every \(\alpha \in (0,1)\), one can find \(\Delta \in (0,1)\) such that

$$\begin{aligned} \mathbb {P}\left\{ \sup _{t_1,t_2\in [0,T],\,|t_2-t_1|\le \Delta } \frac{ |({\hat{X}}^{\varepsilon ,R}_{t_2} - U^\varepsilon _{t_2}) - ({\hat{X}}^{\varepsilon ,R}_{t_1} - U^\varepsilon _{t_1})| }{|t_2-t_1|^\gamma } \le \mathrm{const} \right\} \ge 1-\alpha ,\quad \forall \,\varepsilon >0, \end{aligned}$$

where \(\mathrm{const}\) depends on \(\gamma \), but not on \(\varepsilon \), and \(\gamma \in (0,1/2)\) can be chosen freely.

We therefore have equi-boundedness and equi-continuity of \(\{{\hat{X}}^{\varepsilon ,R} - U^\varepsilon \}_{\varepsilon >0}\) with arbitrarily high probability, and hence the family \(\{{\hat{X}}^{\varepsilon ,R} - U^\varepsilon \}_{\varepsilon >0}\) is tight with respect to the uniform topology in \(C([0,T],\mathbb {R}^d)\), by the Arzelà–Ascoli theorem combined with Prokhorov's theorem. Moreover, \(\{U^\varepsilon \}_{\varepsilon > 0}\) is trivially tight by Proposition 4.2, so that writing \({\hat{X}}^{\varepsilon ,R}\) as the sum of \({\hat{X}}^{\varepsilon ,R} - U^\varepsilon \) and \(U^\varepsilon \) shows that \(\{{\hat{X}}^{\varepsilon ,R}\}_{\varepsilon > 0}\) is tight, too.

All in all, the family of triples \(\{\left( \right. {\hat{X}}^{\varepsilon ,R},U^\varepsilon ,{W} \left. \right) \}_{\varepsilon > 0}\) is tight.

Next, for \(\varepsilon >0\), let \(\mathbb {P}^{R,\varepsilon }\) be the pushforward measure \(\mathbb {P}\circ ({\hat{X}}^{\varepsilon ,R},U^\varepsilon , {W} )^{-1}\) on the space

$$\begin{aligned} {\tilde{\Omega }}=C([0,T],H_d)\times C([0,T],H_d) \times C([0,T],H_\infty ) \end{aligned}$$

equipped with the Borel-\(\sigma \)-algebra \({\mathcal {B}}\), and let \((\xi ,\eta ,\omega )\) denote the coordinate process on \({\tilde{\Omega }}\).

By tightness of \(\{({\hat{X}}^{\varepsilon ,R},U^\varepsilon ,{W})\}_{\varepsilon > 0}\), there exists a subsequence \((\varepsilon _n)_{n\in \mathbb {N}}\) such that \(\mathbb {P}^{R,\varepsilon _n}\) weakly converges to a probability measure \(\mathbb {P}^R\) on \(({\tilde{\Omega }},{\mathcal {B}})\), when \(n\uparrow \infty \).

Let \(\tilde{{\mathcal {F}}}\) be the \(\mathbb {P}^R\)-completion of \({\mathcal {B}}\), and let \((\tilde{{\mathcal {F}}}_t)_{t\in [0,T]}\) be the smallest filtration to which the process \((\xi ,\eta ,\omega )\) is adapted, on the one hand, and which satisfies the usual conditions with respect to \(\mathbb {P}^R\), on the other. Also, introduce \(\tilde{{\mathcal {F}}}^n,\,(\tilde{{\mathcal {F}}}_t^n)_{t\in [0,T]}\) in a similar way with respect to \(\mathbb {P}^{R,\varepsilon _n},\,n\in \mathbb {N}\).

Now, it easily follows from Proposition 4.2 that, on \(({\tilde{\Omega }},\tilde{{\mathcal {F}}},\mathbb {P}^R)\), the following distributional properties must hold for the pair of processes \((\eta ,\omega )\): \(\eta \) is a d-dimensional Wiener process with covariance \((\sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m}b^j_{\ell ,m})_{i,j=1}^d\), \(\omega \) is a Q-Wiener process, \(\eta \) and \(\omega \) are independent.

Introduce

$$\begin{aligned} M^{R}_t = \xi _t - x_0 - \int _0^t \left( F_R(s,\xi _s) + C_R(s,\xi _s)\right) \mathrm{d}s - \eta _t, \quad t\in [0,T], \end{aligned}$$
(28)

and observe that each component of both processes \(M^R\) and \(\omega \), but also

$$\begin{aligned} M^{R,i}_t M^{R,j}_t- \int _0^t \sum _{m \in \mathbb {N}} \sigma ^{i,m}_R(s,\xi _s) \sigma ^{j,m}_R(s,\xi _s) q_m \mathrm{d}s, \quad t\in [0,T],\quad i,j=1,\dots ,d, \\ M^{R,i}_t \omega ^m_t - \int _0^t \sigma ^{i,m}_R(s,\xi _s) q_m \mathrm{d}s, \quad t\in [0,T],\quad i=1,\dots ,d,\quad m \in {\mathbb {N}}, \\ \omega ^\ell _t \omega ^m_t - t \delta _{\ell ,m} q_m, \quad t\in [0,T],\quad \ell , m \in {\mathbb {N}}, \end{aligned}$$

are continuous local martingales with respect to \((\tilde{{\mathcal {F}}}_t^n)_{t\in [0,T]}\) on \(({\tilde{\Omega }},\tilde{{\mathcal {F}}}^n,\mathbb {P}^{R,\varepsilon _n})\), for any \(n\in \mathbb {N}\), and hence they are continuous local martingales with respect to \((\tilde{{\mathcal {F}}}_t)_{t\in [0,T]}\) on \(({\tilde{\Omega }},\tilde{{\mathcal {F}}},\mathbb {P}^{R})\), too, by [12, Chapter IX, Corollary 1.19].

Therefore, applying [3, Theorem 8.2] to the pair of processes \((M^R,\omega )\) yields

$$\begin{aligned} M^R_t = \int _0^t \sigma _R(s,\xi _s ) d{W}^R_s, \quad \omega _t = \int _0^t 1\, d{W}^R_s = {W}^R_t, \quad t\in [0,T], \end{aligned}$$

on \(({\tilde{\Omega }},\tilde{{\mathcal {F}}},\mathbb {P}^R)\), or an enlargement of this space, which we still denote by \(({\tilde{\Omega }},\tilde{{\mathcal {F}}},\mathbb {P}^R)\), where \(W^R\) is another Q-Wiener process which, by the above representation, even coincides \(\mathbb {P}^R\)-almost surely with \(\omega \), so that

$$\begin{aligned} M^R_t = \int _0^t \sigma _R(s,\xi _s ) d\omega _s,\quad t\in [0,T],\quad \mathbb {P}^R-\,\hbox {a.s.} \end{aligned}$$

Thus, Eq. (28) can be written as

$$\begin{aligned} \xi _t \,&=\, x_0 + \int _0^t \left( F_R(s,\xi _s) + C_R(s,\xi _s)\right) \mathrm{d}s + \int _0^t \sigma _R(s,\xi _s ) d\omega _s + \eta _t, \quad t\in [0,T],\mathbb {P}^R-\,\hbox {a.s.}, \end{aligned}$$

where \(\omega \) is a Q-Wiener process, while \(\eta \) is a d-dimensional Wiener process, independent of \(\omega \), and with covariance \((\sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m}b^j_{\ell ,m})_{i,j=1}^d\). Observe that the process \({\bar{X}}^R\) satisfies the same type of equation, as \(\sum _{\ell , m \in \mathbb {N}} b_{\ell ,m} {\bar{W}}^{\ell ,m}\) from (13) is a d-dimensional Wiener process with covariance \((\sum _{\ell ,m \in \mathbb {N}} b^i_{\ell ,m}b^j_{\ell ,m})_{i,j=1}^d\), too. But, since this type of equation admits a unique strong solution, the laws of \(\xi \) and \({\bar{X}}^R\) must be the same, proving \({\hat{X}}^{\varepsilon _n,R}\rightarrow {\bar{X}}^R\), in law, when \(n\uparrow \infty \). However, the same argument applies to any converging subsequence, and the limit will always be the same, finally proving \({\hat{X}}^{\varepsilon ,R}\rightarrow {\bar{X}}^R\), in law, when \(\varepsilon \downarrow 0\). \(\square \)

It remains to discuss how R can be taken to infinity.

Recall that \({\bar{X}}\) is the solution of (13), and it is not difficult to see that \({\bar{X}}^R\) converges to \({\bar{X}}\), in law, as \(R \rightarrow \infty \).

Now take a function \(\varphi _R \in C(C([0,T],\mathbb {R}^d),[0,1])\) such that \(\varphi _R(u)=0\), if \(\sup _{t \in [0,T]} |u_t| \le R-1\), and \(\varphi _R(u)=1\), if \(\sup _{t \in [0,T]} |u_t| \ge R\).
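
Such a function exists: for instance, one may take

$$\begin{aligned} \varphi _R(u) = \min \left\{ \max \left\{ \sup _{t \in [0,T]} |u_t| - (R-1),\, 0 \right\} ,\, 1 \right\} , \end{aligned}$$

which is continuous, because \(u \mapsto \sup _{t\in [0,T]}|u_t|\) is continuous on \(C([0,T],\mathbb {R}^d)\).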

Then,

$$\begin{aligned} \mathbb {P}\{ \tau ^\varepsilon _R < T\} \le \mathbb {P}\left\{ \sup _{t \in [0,T]} |{\hat{X}}^{\varepsilon ,R}_t| \ge R \right\} \le \mathbb {E}\left[ \varphi _R({\hat{X}}^{\varepsilon ,R})\right] , \end{aligned}$$

and because \({\hat{X}}^{\varepsilon ,R} \rightarrow {\bar{X}}^R\), in law, when \(\varepsilon \downarrow 0\), we deduce that

$$\begin{aligned}&\limsup _{\varepsilon \rightarrow 0} \mathbb {P}\{ \tau ^\varepsilon _R < T\} \le {\mathbb {E}} \left[ \varphi _R({\bar{X}}^R)\right] \le {\mathbb {P}} \left\{ \sup _{t \in [0,T]} |{\bar{X}}^R_t| \ge R-1 \right\} = \mathbb {P}\left\{ \sup _{t \in [0,T]} |{\bar{X}}_t| \ge R-1 \right\} , \end{aligned}$$

where the last probability converges to zero, when \(R\rightarrow \infty \), because \({\bar{X}}\) is a global solution.

As a consequence, for any \(\psi \in C_b(C([0,T],\mathbb {R}^d),\mathbb {R})\),

$$\begin{aligned} \bigg | \mathbb {E}\left[ \psi (X^\varepsilon )\right] - \mathbb {E}\left[ \psi ({\bar{X}})\right] \bigg | \le \,&\bigg | \mathbb {E}\left[ \psi (X^\varepsilon )\right] - \mathbb {E}\left[ \psi (X^\varepsilon _{\cdot \wedge \tau ^\varepsilon _R})\right] \bigg | + \bigg | \mathbb {E}\left[ \psi (X^\varepsilon _{\cdot \wedge \tau ^\varepsilon _R})\right] - \mathbb {E}\left[ \psi ({\hat{X}}^{\varepsilon ,R}_{\cdot \wedge \tau ^\varepsilon _R})\right] \bigg | \\&+ \bigg | \mathbb {E}\left[ \psi ({\hat{X}}^{\varepsilon ,R}_{\cdot \wedge \tau ^\varepsilon _R})\right] - \mathbb {E}\left[ \psi ({\hat{X}}^{\varepsilon ,R})\right] \bigg | + \bigg | \mathbb {E}\left[ \psi ({\hat{X}}^{\varepsilon ,R})\right] - {\mathbb {E}} \left[ \psi ({\bar{X}}^R)\right] \bigg | \\&+ \bigg | {\mathbb {E}} \left[ \psi ({\bar{X}}^R)\right] - \mathbb {E}\left[ \psi ({\bar{X}})\right] \bigg |. \end{aligned}$$

Here, when taking R large enough, we can make all the summands on the right-hand side, except for the second and fourth, arbitrarily small, uniformly in \(\varepsilon \), and, for fixed R, the remaining terms go to zero, when \(\varepsilon \downarrow 0\).

Thus, by a diagonal argument, the convergence in law of \(X^\varepsilon \rightarrow {\bar{X}},\,\varepsilon \downarrow 0\), follows, completing the proof of the theorem.

5 Application to climate models

We now apply Theorem 2.2 to perform stochastic model reduction for a subclass of the stochastic climate models given by (4), (5) in the introduction: we restrict ourselves to a simpler version of (5), omitting the fast forcing \(\varepsilon ^{-2}f^2_{\varepsilon ^{-1}t}\) and the term \(\varepsilon ^{-1} A^2_2 Y^\varepsilon _t\), on the one hand, but also neglecting the interaction \(B^2_{12}(X^\varepsilon _t,Y^\varepsilon _t)\), on the other. While the first two omitted terms are technically demanding but look doable from a wider perspective, which is beyond this paper, the term \(\varepsilon ^{-1}B^2_{12}(X^\varepsilon _t,Y^\varepsilon _t)\) involving the neglected interaction is notoriously hard and currently beyond our understanding.

For each \(\varepsilon >0\), let \((X^\varepsilon ,Y^\varepsilon )\) be a pair of processes satisfying

$$\begin{aligned} \frac{\mathrm{d}X^\varepsilon _t}{\mathrm{d}t}&= F^1_t + A^1_1 X^\varepsilon _t + A^1_2 Y^\varepsilon _t + B^1_{11}(X^\varepsilon _t,X^\varepsilon _t) + B^1_{12}(X^\varepsilon _t,Y^\varepsilon _t) + \varepsilon B^1_{22}(Y^\varepsilon _t,Y^\varepsilon _t), \end{aligned}$$
(29)
$$\begin{aligned} \frac{\mathrm{d}Y^\varepsilon _t}{\mathrm{d}t}&= \varepsilon ^{-2} A^2_1 X^\varepsilon _t +\varepsilon ^{-2} B^2_{11}(X^\varepsilon _t,X^\varepsilon _t) -\varepsilon ^{-2} Y^\varepsilon _t + \varepsilon ^{-2} {\dot{W}}_t, \end{aligned}$$
(30)

where \(A^1_1:H_d \rightarrow H_d\), \(A^1_2:H_\infty \rightarrow H_d\), \(A^2_1:H_d \rightarrow H_\infty \) are bounded linear operators, \(B^1_{11}:H_d \times H_d \rightarrow H_d\), \(B^1_{12}:H_d \times H_\infty \rightarrow H_d\), \(B^1_{22}:H_\infty \times H_\infty \rightarrow H_d\), \(B^2_{11}:H_d \times H_d \rightarrow H_\infty \) are continuous bilinear maps, and \(F^1:[0,T] \rightarrow H_d\) is a deterministic continuous external force. The stochastic basis and the Wiener process W are taken to be the same as in Remark 2.1.

In what follows, the above equations will always have initial conditions \((x_0,y_0)\), where \(x_0 \in H_d\) can be chosen arbitrarily, while \(y_0 = \int _{-\infty }^0 \varepsilon ^{-2} e^{\varepsilon ^{-2}s} \mathrm{d}W_s\) will be fixed to ensure pseudo-stationarity of the scaled unresolved variables. Note that fixing \(y_0\in H_\infty \) this way would not restrict the initial data of the reduced equations.
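
Indeed, with this choice, by Itô's isometry, each mode of \(y_0\) carries exactly the stationary variance of the unresolved variables,

$$\begin{aligned} \mathbb {E}\left[ \langle y_0 , {\mathbf {f}}_\ell \rangle ^2 \right] = \varepsilon ^{-4} q_\ell \int _{-\infty }^0 e^{2\varepsilon ^{-2}s} \,\mathrm{d}s = \frac{\varepsilon ^{-2}}{2}\, q_\ell , \quad \ell \in \mathbb {N}, \end{aligned}$$

in agreement with the second moments of the stationary Ornstein–Uhlenbeck process used throughout Sects. 3 and 4.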

In fluid dynamics settings like (1), it is customary to assume that A is self-adjoint, and that the full nonlinearity is skew-symmetric: \(\langle B(z',z),z \rangle _H = 0\), \(z,z' \in H\), see [18]. We therefore make the following assumptions on the projected coefficients:

(C1):

\(A^2_1 = (A^1_2)^*\);

(C2):

\(\langle B^1_{11}(x',x),x \rangle _{H_d} = 0\), for all \(x,x' \in H_d\);

(C3):

\(\langle B^1_{12}(x',y),x \rangle _{H_d} = - \langle B^2_{11}(x',x),y \rangle _{H_\infty } \), for all \(x,x' \in H_d\), \(y \in H_\infty \).

Also, without loss of generality, we can assume that \(B^1_{22}\) is symmetric in the sense of \(\langle B^1_{22}({\mathbf {f}}_\ell ,{\mathbf {f}}_m) , {\mathbf {e}}_i \rangle _{H_d} = \langle B^1_{22}({\mathbf {f}}_m,{\mathbf {f}}_\ell ) , {\mathbf {e}}_i \rangle _{H_d}\), for all \(i,\ell ,m\); and finally, we will need the analogue of (A5), that is

(C4):

\(\sum _{\ell \in \mathbb {N}} \langle B^1_{22}({\mathbf {f}}_\ell ,{\mathbf {f}}_\ell ) , {\mathbf {e}}_i \rangle _{H_d}\, q_\ell \,=0\), for all \(i=1,\dots ,d\).

Note that the latter condition is indeed satisfied for many fluid dynamics models; it usually holds independently of the structure of the noise, because \(\langle B^1_{22}({\mathbf {f}}_\ell ,{\mathbf {f}}_m) , {\mathbf {e}}_i \rangle _{H_d}\) vanishes on the diagonal \(\ell =m\), for all i.

Next, we bring Eqs. (29), (30) into a form which makes them comparable to (6), (7).

Using the definition of \(y_0\), we have the following mild formulation of (30),

$$\begin{aligned} Y_t^\varepsilon \,=\, {\tilde{Y}}^\varepsilon _t + \int _0^t \varepsilon ^{-2} e^{-\varepsilon ^{-2}(t-s)} \left( A^2_1 X^\varepsilon _s +B^2_{11}(X^\varepsilon _s,X^\varepsilon _s)\right) \mathrm{d}s, \quad t\in [0,T], \end{aligned}$$
(31)

where

$$\begin{aligned} {\tilde{Y}}^\varepsilon _t = \int _{-\infty }^t \varepsilon ^{-2}e^{-\varepsilon ^{-2}(t-s)} \mathrm{d}W_s, \quad t\in \mathbb {R}, \end{aligned}$$

is a stationary Ornstein–Uhlenbeck process. Plugging (31) into (29), \(X^\varepsilon \) alternatively satisfies

$$\begin{aligned} X^\varepsilon _t {=} \,&\,x_0 {+} \int _0^t \left( F^1_s + A^1_1 X^\varepsilon _s + B^1_{11}(X^\varepsilon _s,X^\varepsilon _s) \right) \mathrm{d}s + \int _0^t A^1_2 Z^\varepsilon _s \mathrm{d}s + \int _0^t B^1_{12}\left( X^\varepsilon _s,Z^\varepsilon _s\right) \mathrm{d}s\nonumber \\&+ \int _0^t A^1_2 {\tilde{Y}}^\varepsilon _s \mathrm{d}s + \int _0^t B^1_{12}(X^\varepsilon _s,{\tilde{Y}}^\varepsilon _s) \mathrm{d}s\nonumber \\&+\int _0^t \varepsilon B^1_{22}({\tilde{Y}}^\varepsilon _s,{\tilde{Y}}^\varepsilon _s) \mathrm{d}s +2\int _0^t \varepsilon B^1_{22}({\tilde{Y}}^\varepsilon _s,Z^\varepsilon _s ) \mathrm{d}s +\int _0^t \varepsilon B^1_{22}\left( Z^\varepsilon _s,Z^\varepsilon _s\right) \mathrm{d}s, t\in [0,T], \end{aligned}$$
(32)

when using the abbreviation

$$\begin{aligned} Z^\varepsilon _s = \int _0^s \varepsilon ^{-2} e^{-\varepsilon ^{-2}(s-r)} \left( A^2_1 X^\varepsilon _r +B^2_{11}(X^\varepsilon _r,X^\varepsilon _r)\right) \mathrm{d}r. \end{aligned}$$
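
Writing for brevity \(G_r = A^2_1 X^\varepsilon _r +B^2_{11}(X^\varepsilon _r,X^\varepsilon _r)\), a shorthand used only here, the process \(Z^\varepsilon \) is an exponential average of \(G\); since \(\int _0^s \varepsilon ^{-2} e^{-\varepsilon ^{-2}(s-r)} \mathrm{d}r = 1-e^{-\varepsilon ^{-2}s}\), one has the identity

$$\begin{aligned} Z^\varepsilon _s - G_s = \int _0^s \varepsilon ^{-2} e^{-\varepsilon ^{-2}(s-r)} \left( G_r - G_s \right) \mathrm{d}r - e^{-\varepsilon ^{-2}s}\, G_s, \end{aligned}$$

so that \(Z^\varepsilon _s - G_s\) is small as soon as \(G\) has controlled increments on time scales of order \(\varepsilon ^2\), and \(s\) is bounded away from zero.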

Since \(Z^\varepsilon _s\) is close to \(A^2_1 X^\varepsilon _s + B^2_{11}(X^\varepsilon _s,X^\varepsilon _s)\), for small \(\varepsilon \), and since both terms \(B^1_{22}({\tilde{Y}}^\varepsilon _s,Z^\varepsilon _s ),\, B^1_{22}\left( Z^\varepsilon _s,Z^\varepsilon _s\right) \) will be shown to vanish with \(\varepsilon \), too, the process \(X^\varepsilon \) should be close to \({\tilde{X}}^\varepsilon \) satisfying

$$\begin{aligned} {\tilde{X}}^\varepsilon _t = \,&\,x_0 {+} \int _0^t \left( F^1_s {+} A^1_1 {\tilde{X}}^\varepsilon _s + B^1_{11}({\tilde{X}}^\varepsilon _s,{\tilde{X}}^\varepsilon _s) \right) \mathrm{d}s + \int _0^t A^1_2 \left( A^2_1 {\tilde{X}}^\varepsilon _s +B^2_{11}({\tilde{X}}^\varepsilon _s,{\tilde{X}}^\varepsilon _s) \right) \mathrm{d}s \nonumber \\&+ \int _0^t B^1_{12}\left( {\tilde{X}}^\varepsilon _s, \left( A^2_1 {\tilde{X}}^\varepsilon _s +B^2_{11}({\tilde{X}}^\varepsilon _s,{\tilde{X}}^\varepsilon _s)\right) \right) \mathrm{d}s \nonumber \\&+ \int _0^t A^1_2 {\tilde{Y}}^\varepsilon _s \mathrm{d}s + \int _0^t B^1_{12}({\tilde{X}}^\varepsilon _s,{\tilde{Y}}^\varepsilon _s)\,\mathrm{d}s + \int _0^t \varepsilon B^1_{22}({\tilde{Y}}^\varepsilon _s,{\tilde{Y}}^\varepsilon _s)\,\mathrm{d}s, \quad t\in [0,T], \end{aligned}$$
(33)

which is an equation of type (6) with

$$\begin{aligned} F(t,x)\,&=\, \,F^1_t + A^1_1{x} + B^1_{11}(x,x) + A^1_2 \left( A^2_1{x} + B^2_{11}(x,x) \right) + B^1_{12}\left( x,\left( A^2_1{x}+ B^2_{11}(x,x)\right) \right) ,\\ \sigma (t,x)\,&=\, \,A^1_2 + B^1_{12}(x,\cdot )\,,\\ \beta \,&=\,\,B^1_{22}\,. \end{aligned}$$

Thus, in this setting, the analogue of (13) would read

$$\begin{aligned} {\bar{X}}_t = \,&x_0 + \int _0^t \left( F^1_s + A^1_1 {\bar{X}}_s + B^1_{11}({\bar{X}}_s,{\bar{X}}_s) \right) \mathrm{d}s + \int _0^t A^1_2 \left( A^2_1 {\bar{X}}_s + B^2_{11}({\bar{X}}_s,{\bar{X}}_s) \right) \mathrm{d}s \nonumber \\&+ \int _0^t B^1_{12}\left( {\bar{X}}_s, \left( A^2_1 {\bar{X}}_s + B^2_{11}({\bar{X}}_s,{\bar{X}}_s)\right) \right) \mathrm{d}s + \int _0^t C({\bar{X}}_s)\,\mathrm{d}s\nonumber \\&+ A^1_2 W_t + \int _0^t B^1_{12}({\bar{X}}_s,\mathrm{d}W_s) + \sum _{\ell , m \in \mathbb {N}} b_{\ell ,m} {\bar{W}}^{\ell ,m}_t, \quad t\in [0,T], \end{aligned}$$
(34)

where the Stratonovich correction term \(C: H_d \rightarrow H_d\) simplifies to

$$\begin{aligned} \langle C({x}) , {\mathbf {e}}_i \rangle _{H_d} = \frac{1}{2} \sum _{m \in \mathbb {N}} q_m \sum _{j=1}^d \langle B^1_{12}({\mathbf {e}}_j,{\mathbf {f}}_m), {\mathbf {e}}_i \rangle _{H_d} \langle B^1_{12}(x,{\mathbf {f}}_m), {\mathbf {e}}_j \rangle _{H_d}, \quad i=1,\dots ,d, \end{aligned}$$

and

$$\begin{aligned} b^i_{\ell ,m} = \langle B^1_{22}({\mathbf {f}}_\ell ,{\mathbf {f}}_m) , {\mathbf {e}}_i \rangle _{H_d}\, \sqrt{\frac{q_\ell q_m}{2}}, \quad i=1,\dots ,d,\;\ell ,m\in \mathbb {N}. \end{aligned}$$

Proposition 5.1

When assuming (C1)–(C3), Eq. (34) admits a unique global strong solution on [0, T].

Proof

First, regularity of coefficients guarantees the existence of a unique local strong solution. Second, by Itô’s formula,

$$\begin{aligned} \frac{1}{2}|{\bar{X}}_{t\wedge \tau }|^2 \,=\,&\,\frac{1}{2}|x_0|^2 + \int _0^{t\wedge \tau } \langle F^1_s + A^1_1 {\bar{X}}_s + B^1_{11}({\bar{X}}_s,{\bar{X}}_s) , {\bar{X}}_s \rangle \,\mathrm{d}s \\&+ \int _0^{t\wedge \tau } \langle A^1_2 \left( A^2_1 {\bar{X}}_s +B^2_{11}({\bar{X}}_s,{\bar{X}}_s) \right) , {\bar{X}}_s \rangle \,\mathrm{d}s\\&+ \int _0^{t\wedge \tau } \langle B^1_{12}\left( {\bar{X}}_s, \left( A^2_1 {\bar{X}}_s +B^2_{11}({\bar{X}}_s,{\bar{X}}_s)\right) \right) , {\bar{X}}_s \rangle \,\mathrm{d}s\,+ \int _0^{t\wedge \tau }\langle C({\bar{X}}_s),{\bar{X}}_s \rangle \,\mathrm{d}s\\&+ \int _0^{t\wedge \tau } \langle A^1_2 \mathrm{d}W_s , {\bar{X}}_s \rangle + \int _0^{t\wedge \tau } \langle B^1_{12}({\bar{X}}_s,\mathrm{d}W_s), {\bar{X}}_s \rangle + \sum _{\ell ,m\in \mathbb {N}} \int _0^{t\wedge \tau }\langle b_{\ell ,m},{\bar{X}}_s \rangle \,d{\bar{W}}^{\ell ,m}_s\\&+\frac{1}{2}\sum _{m\in \mathbb {N}} |A^1_2{\mathbf {f}}_m|^2 q_m (t\wedge \tau ) +\frac{1}{2}\sum _{m\in \mathbb {N}} \int _0^{t\wedge \tau }\!\! |B^1_{12}({\bar{X}}_s,{\mathbf {f}}_m)|^2 q_m\,\mathrm{d}s+\frac{1}{2}\sum _{\ell ,m\in \mathbb {N}} |b_{\ell ,m}|^2 (t\wedge \tau ), \end{aligned}$$

for any fixed \(t\in [0,T]\), and any stopping time \(\tau \) smaller than a possible explosion time.

Applying (C1)–(C3), we have the identities

$$\begin{aligned}&\langle B^1_{11}({\bar{X}}_s,{\bar{X}}_s) , {\bar{X}}_s \rangle _{H_d} = 0, \\&\quad \langle A^1_2 B^2_{11}({\bar{X}}_s,{\bar{X}}_s), {\bar{X}}_s \rangle _{H_d}= \langle B^2_{11}({\bar{X}}_s,{\bar{X}}_s),A^2_1 {\bar{X}}_s \rangle _{H_\infty }, \\&\quad \langle B^1_{12}({\bar{X}}_s,A^2_1 {\bar{X}}_s) , {\bar{X}}_s \rangle _{H_d} = -\langle B^2_{11}({\bar{X}}_s,{\bar{X}}_s) , A^2_1 {\bar{X}}_s \rangle _{H_\infty },\\&\quad \langle B^1_{12}({\bar{X}}_s,B^2_{11}({\bar{X}}_s,{\bar{X}}_s)) , {\bar{X}}_s \rangle _{H_d} = -\Vert B^2_{11}({\bar{X}}_s,{\bar{X}}_s)\Vert ^2_{H_\infty }, \end{aligned}$$

leading to

$$\begin{aligned} \mathbb {E}\left[ \sup _{t'\le t}|{\bar{X}}_{t'\wedge \tau }|^2 \right] \lesssim \left( 1 + \int _0^t \mathbb {E}\left[ \sup _{s'\le s}|{\bar{X}}_{s'\wedge \tau }|^2 \right] \mathrm{d}s\right) , \end{aligned}$$

again using the regularity of the coefficients combined with Burkholder–Davis–Gundy's inequality. Thus, by Gronwall's lemma, the local solution \({\bar{X}}\) has to be global on [0, T]. \(\square \)

Remark 5.2

In a very similar way, it can be shown that both Eqs. (32) and (33) admit unique global strong solutions on [0, T], too; we therefore omit the proofs. As a consequence, simply substituting the solution of (32) into (31), for each \(\varepsilon >0\), there is a unique pair of processes \((X^\varepsilon ,Y^\varepsilon )\) satisfying (29), (30) on [0, T].

Theorem 5.3

Assume (C1)–(C3), fix \(\varepsilon >0\), and let \((X^\varepsilon ,Y^\varepsilon )\) be the unique pair of processes satisfying (29), (30) on a given climate time interval [0, T].

  (i)

    If (C4) holds, then \(X^\varepsilon \) converges in law, \(\varepsilon \downarrow 0\), to the unique process \({\bar{X}}\) satisfying (34).

  (ii)

    If, however, (C4) holds because \(B^1_{22}=0\), then the stronger convergence (8) holds true.

Proof

Recall the process \({\tilde{X}}^\varepsilon \) satisfying (33), which is an equation of type (6) with coefficients \(F,\sigma ,\beta \) satisfying (A1)–(A3). Furthermore, by Proposition 5.1 and Remark 5.2, condition (A4) is satisfied, too, while (A5) and (C4) actually are the same condition.

All in all, Theorem 2.2 implies that both parts (i) and (ii) of Theorem 5.3 hold true when replacing \({X}^\varepsilon \) by \({\tilde{X}}^\varepsilon \).

Thus, it is sufficient to prove convergence in probability of \(X^\varepsilon - {\tilde{X}}^\varepsilon \) to zero, \(\varepsilon \downarrow 0\), uniformly on compact subsets of a localizing stochastic interval, which can easily be shown following the lines of the proof of Theorem 2.2.

Indeed, by localization and discretization arguments, one would first derive

$$\begin{aligned}&\mathbb {E}\left[ \sup _{\begin{array}{c} k'=0,\dots ,h\\ k'\!\Delta \le \tau _R^\varepsilon \end{array}} \bigg | X^\varepsilon _{k'\Delta } - {\tilde{X}}^\varepsilon _{k'\Delta } \bigg |^2 \right] \lesssim \;r(\Delta ,\varepsilon ) + \sum _{k=0}^{h-1} \Delta \mathbb {E}\left[ \sup _{\begin{array}{c} k'=0,\dots ,k\\ k'\!\Delta \le \tau _R^\varepsilon \end{array}} \bigg | X^\varepsilon _{k'\Delta } - {\tilde{X}}^\varepsilon _{k'\Delta } \bigg |^2 \right] ,\\&\quad h=1,\dots ,[T/\Delta ], \end{aligned}$$

where \(\tau ^\varepsilon _R = \inf \{t \ge 0: |X^\varepsilon _t| \ge R \} \wedge \inf \{t \ge 0: |{\tilde{X}}^\varepsilon _t| \ge R \}\), and \(r(\Delta ,\varepsilon ) \rightarrow 0,\,\varepsilon \downarrow 0\), for a suitable choice of \(\Delta =\Delta _\varepsilon \). Then, combining Gronwall’s lemma and Markov’s inequality, one would obtain

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \mathbb {P}\left\{ \sup _{t \le T \wedge \tau ^\varepsilon _R }\Vert X^\varepsilon _t - {\tilde{X}}^\varepsilon _t \Vert _{H_d}> \delta \right\} = 0,\quad \forall \,\delta >0, \end{aligned}$$

which yields the convergences stated in parts (i) and (ii) of Theorem 5.3 up to time \(\tau ^\varepsilon _R\). Since \({\bar{X}}\) is globally defined, both types of convergence can be extended to the whole interval [0, T], using arguments similar to those given in the proof of the corresponding parts of Theorem 2.2. \(\square \)
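
We close with a purely illustrative numerical sketch, not part of the analysis above: one resolved and one unresolved mode, \(F^1=0\), all bilinear terms equal to zero, \(A^1_1=-1\), and \(A^1_2=A^2_1=a\) in accordance with (C1); the coefficient values, seed, and step size below are ad hoc choices of ours. In this toy case, Theorem 5.3(ii) predicts that \(\sup _{t\le T}|X^\varepsilon _t - {\bar{X}}_t|\) vanishes in probability as \(\varepsilon \downarrow 0\), where \({\bar{X}}\) solves \(\mathrm{d}{\bar{X}}_t = (-1+a^2){\bar{X}}_t\, \mathrm{d}t + a\, \mathrm{d}W_t\), driven by the same Brownian motion.

```python
import numpy as np

# Purely illustrative sketch (not part of the paper): one resolved and one
# unresolved mode, F^1 = 0, all bilinear terms B = 0, A^1_1 = -1,
# A^1_2 = A^2_1 = a (condition (C1)), q = 1.  Theorem 5.3(ii) predicts
#   dXbar = (-1 + a^2) Xbar dt + a dW,
# with X^eps pathwise close to Xbar for small eps.
rng = np.random.default_rng(1)

a, x0, T, eps = 0.5, 1.0, 1.0, 0.05
dt = 0.1 * eps**2                      # resolve the fast time scale eps^2
n_steps = int(T / dt)

X = Xbar = x0
Y = rng.standard_normal() / (eps * np.sqrt(2))   # stationary initial datum y_0
sup_err = 0.0
for _ in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal()
    X, Y = (X + (-X + a * Y) * dt,                        # Eq. (29) with B = 0
            Y + (a * X - Y) * dt / eps**2 + dW / eps**2)  # Eq. (30)
    Xbar += (-1 + a**2) * Xbar * dt + a * dW              # reduced Eq. (34)
    sup_err = max(sup_err, abs(X - Xbar))

print("sup_t |X^eps_t - Xbar_t| =", sup_err)   # decreases as eps -> 0
```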