In this chapter we consider \(\mathbb {R}^{d}\)-valued discrete time processes of the same form as in Chap. 4, but instead of analyzing the large deviation behavior, we consider deviations closer to the LLN limit. Since this will require centering on the limit, we assume that the process model has the form

$$\begin{aligned} X_{i+1}^{n}\doteq X_{i}^{n}+\frac{1}{n}b(X_{i}^{n})+\frac{1}{n}\nu _{i} (X_{i}^{n}), \quad X_{0}^{n}=x_{0}, \end{aligned}$$
(5.1)

where \(\{\nu _{i}(\cdot )\}_{i\in \mathbb {N}_{0}}\) are iid random vector fields as in Chap. 4 but with zero mean, and \(b:\mathbb {R}^{d} \rightarrow \mathbb {R}^{d}\) is continuously differentiable.
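Although no numerical scheme is needed for the theory, the recursion (5.1) is easy to simulate, and it may help to keep a concrete instance in mind. The following minimal NumPy sketch is illustrative only: the Gaussian noise field, the drift, and all parameter values are hypothetical choices, not assumptions of this chapter.

```python
import numpy as np

def simulate_path(n, x0, b, A_sqrt, rng):
    """Simulate X_{i+1}^n = X_i^n + b(X_i^n)/n + nu_i(X_i^n)/n for i = 0,...,n-1,
    where nu_i(x) is realized as A_sqrt(x) @ (standard normal), hence zero mean."""
    d = len(x0)
    X = np.empty((n + 1, d))
    X[0] = x0
    for i in range(n):
        noise = A_sqrt(X[i]) @ rng.standard_normal(d)
        X[i + 1] = X[i] + (b(X[i]) + noise) / n
    return X

# Hypothetical model: d = 1, drift b(x) = -x, identity noise covariance.
rng = np.random.default_rng(0)
path = simulate_path(n=1000, x0=np.array([1.0]), b=lambda x: -x,
                     A_sqrt=lambda x: np.eye(1), rng=rng)
```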

As in Chap. 4, we consider the continuous time piecewise linear interpolations \(\{X^{n}(t)\}_{0\le t\le T}\) with \(X^{n}(i/n)=X_{i}^{n}\). Under moment conditions that are weaker than those of Chap. 4, there is a law of large numbers limit \(X^{0}\in \mathscr {C}([0,T]:\mathbb {R} ^{d})\). Closely related to \(X^{0}\) is the noiseless version of (5.1) obtained by setting \(\nu _{i}(\cdot )=0\), which is denoted by \(\{X_{i}^{n, 0}\}_{i\in \mathbb {N}_{0}}\) with piecewise linear interpolation \(\{X^{n, 0}(t)\}_{0\le t\le T}\). We introduce a scaling sequence \(\varkappa (n)\) that satisfies

$$\begin{aligned} \varkappa (n)\rightarrow 0\text { and }\varkappa (n)n\rightarrow \infty , \end{aligned}$$
(5.2)

and study the rescaled difference

$$ Y^{n}\doteq \sqrt{\varkappa (n)n}(X^{n}-X^{n, 0}). $$

Since b is Lipschitz continuous under Condition 5.1 (stated in Sect. 5.1), we have

$$ \left\| X^{0}-X^{n, 0}\right\| _{\infty }\doteq \sup _{t\in [0,T]}\left\| X^{0}(t)-X^{n, 0}(t)\right\| =O(1/n). $$

Thus

$$ \sqrt{\varkappa (n)n}\left\| X^{0}-X^{n, 0}\right\| _{\infty } =O(\sqrt{\varkappa (n)/n}), $$

and hence \(Y^{n}\) behaves the same asymptotically as \(\sqrt{\varkappa (n)n}(X^{n}-X^{0})\). It will be shown that under weaker conditions on the noise \(\nu _{i}(\cdot )\) than were used in Chap. 4, \(Y^{n}\) satisfies the large deviation principle on \(\mathscr {C}([0,T]:\mathbb {R}^{d})\) with a “Gaussian”-type rate function. As is customary for this type of scaling, we refer to this as moderate deviations.
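To make the scaling concrete, one admissible choice satisfying (5.2) is \(\varkappa (n)=n^{-1/2}\), for which \(\sqrt{\varkappa (n)n}=n^{1/4}\). Continuing the simulation sketch above (again with purely illustrative model choices), the rescaled difference \(Y^{n}\) at the grid points can be computed as follows.

```python
def rescaled_difference(n, x0, b, A_sqrt, rng, kappa=None):
    """Return Y_i^n = sqrt(kappa(n) n) (X_i^n - X_i^{n,0}) at the grid points i/n."""
    if kappa is None:
        kappa = n ** (-0.5)          # one choice satisfying (5.2)
    X = simulate_path(n, x0, b, A_sqrt, rng)
    X0 = np.empty_like(X)            # noiseless recursion: nu_i = 0
    X0[0] = x0
    for i in range(n):
        X0[i + 1] = X0[i] + b(X0[i]) / n
    return np.sqrt(kappa * n) * (X - X0)

Y = rescaled_difference(n=1000, x0=np.array([1.0]), b=lambda x: -x,
                        A_sqrt=lambda x: np.eye(1),
                        rng=np.random.default_rng(1))
```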

While one might expect the proof of the moderate deviations result to be similar to that of the corresponding large deviations result, there are important differences. For example, the tightness proof is significantly more complicated in the case of moderate deviations than for the case of large deviations. In Chap. 4 we were able to establish an a priori bound on certain relative entropy costs associated with any sequence of nearly minimizing controls. Given this bound on the relative entropy costs, the empirical measures of the controlled driving noises, as well as the controlled processes themselves, were tight. With the scaling used for moderate deviations, and even with the information that the analogous relative entropy costs decay like \(O(1/\varkappa (n)n)\), tightness of the empirical measures of the driving noise does not hold. Instead, one must consider the empirical measures of conditional means of the noises, and additional effort is required to show that the limits of these measures determine the limit of the controlled processes. This extra difficulty arises for moderate deviations (even with the vanishing relative entropy costs), because the noise is amplified by the factor \(\sqrt{\varkappa (n)n}\) in the definition of \(Y^{n}\).

A second way in which the proofs for large and moderate deviations differ is in their treatment of degenerate noise, i.e., problems in which the support of \(\nu _{i}(\cdot )\) is not all of \(\mathbb {R}^{d}\). As we saw in Chap. 4, this leads to significant difficulties in the proof of the large deviation lower bound, requiring a somewhat involved mollification argument. In contrast, the proof in the setting of moderate deviations, though more involved than the nondegenerate case, is much more straightforward.

As a potential application of these results we mention their usefulness in the design and analysis of Monte Carlo schemes for events whose probability is small but not very small. For such problems, the performance of standard Monte Carlo may not be adequate, especially if the quantity must be computed for many different parameter settings, as in, say, an optimization problem. Another instance is the situation in which the cost for even a single sample is very high, as for example in the case of stochastic partial differential equations. Then accelerated Monte Carlo may be of interest, and as is well known, such schemes (e.g., importance sampling and splitting) benefit from the use of information contained in the large deviation rate function as part of the algorithm design (e.g., [28, 76, 114, 116]). In a situation in which one considers events of small but not too small probability, one may find the moderate deviation approximation both adequate and relatively easy to apply, since moderate deviations lead to situations in which the objects needed to design an efficient scheme can be explicitly constructed in terms of solutions to the linear–quadratic regulator. These issues were first explored in [101]. Other moderate deviation analyses are presented in Chaps. 10 and 13, and an example of how moderate deviation approximations can be used to construct accelerated Monte Carlo schemes is given in Sect. 17.5.

1 Assumptions, Notation, and Theorem Statement

Let

$$ X_{i+1}^{n}\doteq X_{i}^{n}+\frac{1}{n}b(X_{i}^{n})+\frac{1}{n}\nu _{i} (X_{i}^{n}), \quad X_{0}^{n}=x_{0}, $$

where the \(\{\nu _{i}(\cdot )\}_{i\in \mathbb {N}_{0}}\) are zero-mean iid vector fields whose distribution is given by a stochastic kernel \(\theta (dy|x)\) on \(\mathbb {R}^{d}\) given \(\mathbb {R}^{d}\). For \(\alpha \in \mathbb {R}^{d}\), define

$$ H_{c}(x,\alpha )\doteq \log Ee^{\left\langle \alpha ,\nu _{i}(x)\right\rangle } . $$

The subscript c reflects the fact that this log moment generating function uses the centered distribution \(\theta (\cdot |x)\), rather than \(H(x,\alpha )=H_{c}(x,\alpha )+\left\langle \alpha , b(x)\right\rangle \) as in Chap. 4.
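For instance, in the illustrative special case (not an assumption of this chapter) in which \(\theta (\cdot |x)\) is a mean zero Gaussian distribution with covariance matrix \(A(x)\) (defined below), the log moment generating function is exactly quadratic:

$$ H_{c}(x,\alpha )=\frac{1}{2}\left\langle \alpha , A(x)\alpha \right\rangle . $$

In this case the bounds (5.8) below hold with \(K_{DA}=0\); the general case replaces this exact quadratic by a quadratic approximation near \(\alpha =0\). We use the following.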

Condition 5.1

(a) There exist \(\lambda >0\) and \(K_{\text {mgf}}<\infty \) such that

$$\begin{aligned} \sup _{x\in \mathbb {R}^{d}}\sup _{\left\| \alpha \right\| \le \lambda } H_{c}(x,\alpha )\le K_{\text {mgf}}\text {.} \end{aligned}$$
(5.3)

(b) The mapping \(x\mapsto \theta (dy|x)\) from \(\mathbb {R}^{d}\) to \(\mathscr {P}(\mathbb {R}^{d})\) is continuous with respect to the topology of weak convergence.

(c) b(x) is continuously differentiable, and the norms of both b(x) and its derivative are uniformly bounded by a constant \(K_{b}<\infty \).

Throughout this chapter we let \(\left\| \alpha \right\| _{A}^{2} \doteq \left\langle \alpha , A\alpha \right\rangle \) for \(\alpha \in \mathbb {R}^{d}\) and a symmetric nonnegative definite matrix A. Define

$$ A_{ij}(x)\doteq \int _{\mathbb {R}^{d}}y_{i}y_{j}\theta (dy|x), $$

and note that the weak continuity of \(\theta (dy|x)\) with respect to x and (5.3) ensure that \(A(x)\) is continuous in x and that its norm, \(\Vert A(x)\Vert \doteq \sup _{v:\Vert v\Vert =1}\Vert A(x)v\Vert \), is uniformly bounded by some constant \(K_{A}\). Note that

$$\begin{aligned} \frac{\partial H_{c}(x, 0)}{\partial \alpha _{i}}=\int _{\mathbb {R}^{d}} y_{i}\theta (dy|x)=0 \end{aligned}$$
(5.4)

and

$$\begin{aligned} \frac{\partial ^{2}H_{c}(x, 0)}{\partial \alpha _{i}\partial \alpha _{j}} =\int _{\mathbb {R}^{d}}y_{i}y_{j}\theta (dy|x)=A_{ij}(x) \end{aligned}$$
(5.5)

for all \(i, j\in \{1,\ldots , d\}\) and \(x\in \mathbb {R}^{d}\), and that A(x) is nonnegative definite and symmetric. It follows that for \(x\in \mathbb {R} ^{d}\),

$$ A(x)=Q(x)\varLambda (x)Q^{T}(x), $$

where Q(x) is an orthogonal matrix whose columns are the eigenvectors of A(x), and \(\varLambda (x)\) is the diagonal matrix consisting of the eigenvalues of A(x) in descending order. The map \(x\mapsto \varLambda (x)\) is continuous. In what follows, we define \(\varLambda ^{-1}(x)\) to be the diagonal matrix with diagonal entries each equal to the inverse of the corresponding eigenvalue for the positive eigenvalues, and equal to \(\infty \) for the zero eigenvalues. Then when we write

$$\begin{aligned} \left\| \alpha \right\| _{A^{-1}(x)}^{2}=\left\| \alpha \right\| _{Q(x)\varLambda ^{-1}(x)Q^{T}(x)}^{2}, \end{aligned}$$
(5.6)

we mean a value of \(\infty \) for \(\alpha \in \mathbb {R}^{d}\) not in the linear span of the eigenvectors corresponding to the positive eigenvalues, and the standard value for vectors \(\alpha \in \mathbb {R}^{d}\) in that linear span. (Note that even if the definition of \(A^{-1}(x)\) is ambiguous, in that for \(\alpha \) in the range of A(x) there may be more than one v such that \(A(x)v=\alpha \), the value of \(\left\| \alpha \right\| _{A^{-1}(x)}^{2}\) is not ambiguous. Indeed, since the eigenvectors can be assumed orthogonal, for all such v, \(\left\langle v,\alpha \right\rangle \) coincides with \(\left\langle \bar{v},\alpha \right\rangle \), where \(\bar{v}\) is the solution in the span of eigenvectors corresponding to positive eigenvalues.) Condition 5.1(a) implies there exist \(K_{DA}<\infty \) and \(\lambda _{DA} \in (0,\lambda ]\) such that

$$\begin{aligned} \sup _{x\in \mathbb {R}^{d}}\sup _{\left\| \alpha \right\| \le \lambda _{DA} }\max _{i,j, k}\left| \frac{\partial ^{3}H_{c}(x,\alpha )}{\partial \alpha _{i}\partial \alpha _{j}\partial \alpha _{k}}\right| \le \frac{K_{DA}}{d^{3}}, \end{aligned}$$
(5.7)

and consequently for all \(\left\| \alpha \right\| \le \lambda _{DA}\) and all \(x\in \mathbb {R}^{d}\),

$$\begin{aligned} \frac{1}{2}\left\| \alpha \right\| _{A(x)}^{2}-\left\| \alpha \right\| ^{3}K_{DA}\le H_{c}(x,\alpha )\le \frac{1}{2}\left\| \alpha \right\| _{A(x)}^{2}+\left\| \alpha \right\| ^{3}K_{DA}\text {.} \end{aligned}$$
(5.8)
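The convention (5.6) can be made concrete numerically. The following sketch computes \(\left\| \alpha \right\| _{A^{-1}}^{2}\) for a given symmetric nonnegative definite matrix by eigendecomposition, returning \(\infty \) whenever \(\alpha \) has a component outside the span of the eigenvectors with positive eigenvalues; the tolerance is an illustrative choice.

```python
import numpy as np

def pseudo_inverse_norm_sq(alpha, A, tol=1e-12):
    """||alpha||^2 in the sense of (5.6): finite on the span of eigenvectors
    with positive eigenvalues, +inf otherwise."""
    eigvals, Q = np.linalg.eigh(A)      # A = Q diag(eigvals) Q^T, Q orthogonal
    c = Q.T @ alpha                     # coordinates of alpha in the eigenbasis
    pos = eigvals > tol
    if np.any(np.abs(c[~pos]) > tol):   # component along a zero eigenvalue
        return np.inf
    return float(np.sum(c[pos] ** 2 / eigvals[pos]))

# Degenerate example: A has rank 1, so the norm is infinite off its range.
A = np.array([[1.0, 0.0], [0.0, 0.0]])
print(pseudo_inverse_norm_sq(np.array([1.0, 0.0]), A))  # 1.0
print(pseudo_inverse_norm_sq(np.array([0.0, 1.0]), A))  # inf
```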

Define the continuous time piecewise linear interpolation of \(X_{i}^{n}\) by

$$ X^{n}(t)\doteq X_{i}^{n}+[X_{i+1}^{n}-X_{i}^{n}](nt-i),\quad t\in [i/n, i/n+1/n]. $$

In addition, define

$$ X_{i+1}^{n, 0}=X_{i}^{n, 0}+\frac{1}{n}b(X_{i}^{n, 0}),\quad X_{0}^{n, 0}=x_{0}, $$

and let \(X^{n, 0}(t)\) be the analogously defined continuous time interpolation. Then \(X^{n, 0}\rightarrow X^{0}\) in \(\mathscr {C}([0,T]:\mathbb {R}^{d})\), where for \(t\in [0,T]\),

$$ X^{0}(t)=\int _{0}^{t}b(X^{0}(s))ds+x_{0}\text {.} $$

Since \(E\nu _{i}(x)=0\) for all \(x\in \mathbb {R}^{d}\), we know that \(X^{n}\rightarrow X^{0}\) in \(\mathscr {C}([0,T]:\mathbb {R}^{d})\) in probability.

In Chap. 4 we showed, under significantly stronger assumptions, that \(X^{n}(t)\) satisfies a large deviation principle on \(\mathscr {C} ([0,T]:\mathbb {R}^{d})\) with scaling sequence \(r(n)=1/n\). Letting \(I_{L}\) denote this rate function (with L for large deviation), it takes the form

$$\begin{aligned} I_{L}(\phi )&\doteq \inf \left[ \int _{0}^{T}L_{c}(\phi (s), u(s))ds:\phi \left( t\right) =x_{0}+\int _{0}^{t}b(\phi (s))ds\right. \\&\quad \qquad \quad \left. +\int _{0}^{t}u(s)ds,\;t\in [0,T]\right] , \end{aligned}$$

where

$$\begin{aligned} L_{c}(x,\beta )\doteq \sup _{\alpha \in \mathbb {R}^{d}}\left[ \left\langle \alpha ,\beta \right\rangle -H_{c}(x,\alpha )\right] \end{aligned}$$
(5.9)

is the Legendre transform of \(H_{c}(x,\alpha )\). We see that \(I_{L}\) coincides with the rate function of Chap. 4, because

$$ L(x,\beta )\doteq \sup _{\alpha \in \mathbb {R}^{d}}\left[ \left\langle \alpha ,\beta \right\rangle -H_{c}(x,\alpha )-\left\langle \alpha , b(x)\right\rangle \right] =L_{c}(x,\beta -b(x)), $$

so that \(L_{c}(\phi (s), u(s))=L_{c}(\phi (s),\dot{\phi }(s)-b(\phi (s)))=L(\phi (s),\dot{\phi }(s))\).
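In the illustrative Gaussian special case mentioned after the definition of \(H_{c}\), where \(H_{c}(x,\alpha )=\frac{1}{2}\left\langle \alpha , A(x)\alpha \right\rangle \) with \(A(x)\) positive definite, the supremum in (5.9) is attained at \(\alpha =A^{-1}(x)\beta \), and

$$ L_{c}(x,\beta )=\frac{1}{2}\left\| \beta \right\| _{A^{-1}(x)}^{2}, $$

which is exactly the quadratic cost that will appear in the moderate deviation rate function below.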

Assume that \(\varkappa (n)\) satisfies (5.2):

$$ \varkappa (n)\rightarrow 0\text { and }\varkappa (n)n\rightarrow \infty \text {.} $$

We define the rescaled difference

$$ Y^{n}(t)\doteq \sqrt{\varkappa (n)n}(X^{n}(t)-X^{n, 0}(t)). $$

Let Db(x) denote the matrix of partial derivatives \((Db(x))_{ij}=\partial b_{i} (x)/\partial x_{j}\), and let \(A^{1/2}(x)\) be the unique nonnegative definite square root of A(x).

Theorem 5.2

Assume Condition 5.1. Then \(\{Y^{n}\}_{n\in \mathbb {N}}\) satisfies the Laplace principle on \(\mathscr {C}([0,T]:\mathbb {R}^{d})\) with scaling sequence \(\varkappa (n)\) and rate function

$$\begin{aligned} I_{M}(\phi )&=\inf \left[ \frac{1}{2}\int _{0}^{T}\left\| u(t)\right\| ^{2}dt:\phi (t)=\int _{0}^{t}Db(X^{0}(s))\phi (s)ds\right. \\&\quad \qquad \quad \left. +\int _{0}^{t}A^{1/2}(X^{0}(s))u(s)ds,\;t\in [0,T]\right] . \end{aligned}$$

The function \(I_{M}\) is essentially what one would obtain by linearizing the dynamics around the law of large numbers limit \(X^{0}\) and making a quadratic approximation of the cost in \(I_{L}\). By our convention, proofs will be given for the case \(T=1\), with only notational modifications needed for the general case. An alternative form of the rate function that is consistent with expressions we use for continuous time models is

$$\begin{aligned} I_{M}(\phi )=\inf _{u\in U_{\phi }}\left[ \frac{1}{2}\int _{0}^{1}\left\| u(t)\right\| ^{2}dt\right] , \end{aligned}$$
(5.10)

where, for \(\phi \in \mathscr {AC}_{0}([0,1]:\mathbb {R}^{d})\), \(U_{\phi }\) is the subset of \(\mathscr {L}^{2}([0,1]:\mathbb {R}^{d})\) given by

$$\begin{aligned} U_{\phi }\doteq \left\{ u:\phi (\cdot )=\int _{0}^{\cdot }Db(X^{0}(s))\phi (s)ds+\int _{0}^{\cdot }A^{1/2} (X^{0}(s))u(s)ds\right\} , \end{aligned}$$
(5.11)

and \(U_{\phi }\) is the empty set otherwise. Since \(\{\phi :I_{M}(\phi )\le K\}\) is the image of the compact set \(\{u:\frac{1}{2}\int _{0}^{1}\left\| u(t)\right\| ^{2}dt\le K\}\) (with the weak topology on \(\mathscr {L}^{2}([0,1]:\mathbb {R} ^{d})\)) under a continuous mapping, \(I_{M}\) has compact level sets. To complete the proof of Theorem 5.2, we must show that for every bounded and continuous F,

$$\begin{aligned} \lim _{n\rightarrow \infty }-\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] =\inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ I_{M}(\phi )+F\left( \phi \right) \right] . \end{aligned}$$
(5.12)

The argument will follow the same layout as that used in Chaps. 3 and 4. In Sect. 5.2, a representation is derived for the exponential integral in (5.12), and in Sect. 5.3, tightness of empirical measures and identification of limits for these measures and controlled processes are carried out. It is here that the moderate deviation problem differs most from the corresponding large deviation problem, in that the definition of the empirical measures is not analogous to the definition used in Chap. 4. The Laplace principle upper and lower bounds that together imply (5.12) are proved in Sects. 5.4 and 5.5, respectively.
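Although it plays no role in the proofs, the variational form of \(I_{M}\) is straightforward to evaluate numerically, which is one reason moderate deviation approximations are convenient in applications. The sketch below (illustrative only; the model functions are hypothetical) maps a control u to its trajectory \(\phi \) via an Euler scheme for the ODE in (5.11) and returns the quadratic cost in (5.10).

```python
import numpy as np

def trajectory_and_cost(u, Db_X0, A_sqrt_X0, dt):
    """Euler scheme for phi' = Db(X0(t)) phi + A^{1/2}(X0(t)) u(t), phi(0) = 0,
    together with the cost (1/2) int ||u(t)||^2 dt; u has shape (N, d)."""
    N, d = u.shape
    phi = np.zeros((N + 1, d))
    for k in range(N):
        t = k * dt
        phi[k + 1] = phi[k] + dt * (Db_X0(t) @ phi[k] + A_sqrt_X0(t) @ u[k])
    cost = 0.5 * dt * np.sum(u ** 2)
    return phi, cost

# Hypothetical one-dimensional example: Db(X0(t)) = -1, A = 1, constant control.
N = 1000
u = np.ones((N, 1))
phi, cost = trajectory_and_cost(u, lambda t: -np.eye(1),
                                lambda t: np.eye(1), dt=1.0 / N)
```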

2 The Representation

As usual, the first step is to identify a useful representation for the Laplace functionals. Owing to the moderate deviation scaling, the construction of the controlled processes differs slightly from that of Chap. 4.

Construction 5.3

Suppose we are given a probability measure \(\mu ^{n}\in \mathscr {P}((\mathbb {R}^{d})^{n})\) and decompose it in terms of conditional distributions \([\mu ^{n}]_{i|0,\ldots , i-1}\) on the ith variable given variables 0 through \(i-1\):

$$\begin{aligned} \mu ^{n}(dv_{0}\times \cdots \times dv_{n-1})&=[\mu ^{n}]_{0}(dv_{0})[\mu ^{n}]_{1|0}(dv_{1}|v_{0})\\&\quad \times \cdots \times [\mu ^{n}]_{n-1|0,\ldots , n-2}(dv_{n-1} |v_{0},\ldots , v_{n-2}). \end{aligned}$$

Let \(\{\bar{\nu }_{i}^{n}\}_{i=0,\ldots , n-1}\) be random variables defined on a probability space \((\Omega ,\mathscr {F},\mathbb {P})\) and with joint distribution \(\mu ^{n}\). Thus conditioned on \(\bar{\mathscr {F}}_{i}^{n}\doteq \sigma (\bar{\nu }_{j}^{n}, j=0,\ldots , i-1)\), \(\bar{\nu }_{i}^{n}\) has distribution \(\bar{\mu }_{i}^{n}(dv_{i} )\doteq [\mu ^{n}]_{i|0,\ldots , i-1}(dv_{i}|\bar{\nu }_{0}^{n},\ldots ,\bar{\nu }_{i-1}^{n})\). The collection \(\{\bar{\mu }_{i}^{n}\}_{i=0,\ldots , n-1}\) will be called a control. Controlled processes \(\bar{X}^{n} ,\bar{Y}^{n}\) and measures \(\bar{M}^{n}\) are recursively constructed as follows. Let \(\bar{X}_{0}^{n}=x_{0}\), and for \(i=0,\ldots , n-1\), define \(\bar{X}_{i+1}^{n}\) recursively by

$$ \bar{X}_{i+1}^{n}=\bar{X}_{i}^{n}+\frac{1}{n}b(\bar{X}_{i}^{n})+\frac{1}{n}\bar{\nu }_{i}^{n}. $$

When \(\{\bar{X}_{i}^{n}\}_{i=1,\ldots , n}\) has been constructed, define

$$\begin{aligned} \bar{Y}_{i+1}^{n}=\bar{Y}_{i}^{n}+\sqrt{\frac{\varkappa (n)}{n}}\left( b(\bar{X}_{i}^{n})-b(X_{i}^{n, 0})\right) +\sqrt{\frac{\varkappa (n)}{n}} \bar{\nu }_{i}^{n},\quad \bar{Y}_{0}^{n}=0. \end{aligned}$$
(5.13)

Note that

$$\begin{aligned} \bar{X}_{i}^{n}-X_{i}^{n, 0}=\frac{1}{\sqrt{\varkappa (n)n}}\bar{Y}_{i} ^{n},\;i=0,1,\ldots , n. \end{aligned}$$
(5.14)

Let \(\bar{X}^{n}\) and \(\bar{Y}^{n}\) be, as in (4.2), the piecewise linear interpolations with \(\bar{X}^{n}(i/n)=\bar{X}_{i}^{n}\) and \(\bar{Y}^{n}(i/n)=\bar{Y}_{i}^{n} \). Define also the interpolated conditional mean (provided it exists)

$$ \bar{w}^{n}(t)\doteq \int _{\mathbb {R}^{d}}y\bar{\mu }_{i}^{n}(dy),\quad t\in [i/n, i/n+1/n), $$

the scaled conditional mean

$$ w^{n}(t)\doteq \sqrt{\varkappa (n)n}\bar{w}^{n} (t), $$

and random measures on \(\mathbb {R}^{d}\times [0,1]\) by

$$ \bar{M}^{n}(dw\times dt)\doteq \delta _{w^{n}(t)}(dw)dt=\delta _{\sqrt{\varkappa (n)n}\bar{w}^{n}(t)}(dw)dt. $$

Note that \(\bar{M}^{n}\) is the empirical measure of the scaled conditional means and not, in contrast to Chap. 4, of the \(\bar{\nu }_{i}^{n}\), scaled or otherwise. This additional “averaging” will be needed for tightness. We will refer to this construction when we are given \(\{\bar{\mu }_{i}^{n}\}_{i=0,\ldots , n-1}\) to identify associated \(\bar{X}^{n},\bar{Y}^{n}, w^{n}\) and \(\bar{M}^{n}\). By Theorem 4.5, for every bounded, continuous \(F:\mathscr {C} ([0,1]:\mathbb {R}^{d})\rightarrow \mathbb {R}\),

$$\begin{aligned}&-\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] \\&\quad =\inf _{\{\bar{\mu }_{i}^{n}\}}E\left[ {\displaystyle \sum \limits _{i=0}^{n-1}} \varkappa (n)R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))+F(\bar{Y}^{n})\right] .\nonumber \end{aligned}$$
(5.15)
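To see the scaling in (5.15) at work, consider the illustrative (not assumed) Gaussian case \(\theta (\cdot |x)=N(0, A(x))\) with \(A(x)\) positive definite, and the mean-shift control \(\bar{\mu }_{i}^{n}(\cdot )=N(\bar{u}(i/n)/\sqrt{\varkappa (n)n}, A(\bar{X}_{i}^{n}))\) for a fixed continuous \(\bar{u}\). A direct computation gives \(R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))=\frac{1}{2\varkappa (n)n}\left\| \bar{u}(i/n)\right\| _{A^{-1}(\bar{X}_{i}^{n})}^{2}\), so that

$$ \sum _{i=0}^{n-1}\varkappa (n)R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))=\frac{1}{n}\sum _{i=0}^{n-1}\frac{1}{2}\left\| \bar{u}(i/n)\right\| _{A^{-1}(\bar{X}_{i}^{n})}^{2}, $$

which, formally replacing \(\bar{X}_{i}^{n}\) by \(X^{0}(i/n)\) and passing to the Riemann integral, converges to \(\frac{1}{2}\int _{0}^{1}\left\| \bar{u}(t)\right\| _{A^{-1}(X^{0}(t))}^{2}dt\). This is consistent with the quadratic form of \(I_{M}\) and with the lower bound (5.24) proved in the next section.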

3 Tightness and Limits for Controlled Processes

3.1 Tightness and Uniform Integrability

When the moment-generating function is finite for all \(\alpha \), a variational characterization of its Legendre transform in terms of relative entropy is proved in Lemma 4.16. In this chapter we will need only the following inequality, which holds when the moment-generating function is finite in some neighborhood of the origin. Recall that \(L_{c}(x,\cdot )\) is the Legendre–Fenchel transform of \(H_{c}(x,\cdot )\).

Lemma 5.4

Assume Condition 5.1. Then for all \(x,\beta \in \mathbb {R}^{d}\),

$$ L_{c}\left( x,\beta \right) \le R(\left. \eta (\cdot )\right\| \theta (\cdot |x)) $$

for all \(\eta \in \mathscr {P}(\mathbb {R}^{d})\) satisfying \(\int _{\mathbb {R}^{d} }y\eta (dy)=\beta \).

Proof

Fix \(x,\beta \in \mathbb {R}^{d}\) and consider any \(\eta \in \mathscr {P}(\mathbb {R}^{d})\) that satisfies \(\int _{\mathbb {R}^{d}} y\eta (dy) =\beta \). If \(R(\left. \eta (\cdot ) \right\| \theta (\cdot |x))=\infty \), the lemma is automatically true, so we assume without loss of generality that \(R(\left. \eta (\cdot ) \right\| \theta (\cdot |x))<\infty \). From (5.3) we have

$$ \int _{\mathbb {R}^{d}}e^{{\lambda }\left\| y\right\| /{d^{1/2}} } \theta (dy|x) \le 2de^{K_{\text {mgf}}}<\infty \text {.} $$

Therefore, recalling the inequality \(ab\le e^{a}+\ell (b)\) for \(a, b\ge 0\), where \(\ell (b)\doteq b\log b-b+1\), we have

$$\begin{aligned}&\int _{\mathbb {R}^{d}}\frac{\lambda }{d^{1/2}}\left\| y\right\| \frac{ d\eta (\cdot )}{d\theta (\cdot |x)}(y)\theta (dy|x)\\&\quad \le \int _{\mathbb {R}^{d}}e^{{\lambda }\left\| y\right\| /{d^{1/2}} }\theta (dy|x)+\int _{\mathbb {R}^{d}}\ell \left( \frac{d\eta (\cdot )}{ d\theta (\cdot |x) }(y)\right) \theta (dy|x)\\&\quad \le 2de^{K_{\text {mgf}}}+R(\left. \eta (\cdot )\right\| \theta (\cdot |x)). \end{aligned}$$

Consequently, for all \(\alpha \in \mathbb {R}^{d}\),

$$\begin{aligned} \int _{\mathbb {R}^{d}}\left\| \alpha \right\| \left\| y\right\| \frac{d\eta (\cdot )}{d\theta (\cdot |x)}(y)\theta (dy|x)\le \frac{d^{1/2} \left\| \alpha \right\| }{\lambda }\left( 2de^{K_{\text {mgf}}}+R(\left. \eta (\cdot )\right\| \theta (\cdot |x))\right) <\infty \text {.} \end{aligned}$$
(5.16)

Define the bounded continuous function \(F_{K}:\mathbb {R}^{d}\times \mathbb {R}^{d}\rightarrow \mathbb {R}\) by

$$ F_{K}(y,\alpha )=\left\{ \begin{array} [c]{ll} \left\langle \alpha ,y\right\rangle \text { } &{} \text {if }\left| \left\langle \alpha ,y\right\rangle \right| \le K,\\ \frac{K\left\langle \alpha ,y\right\rangle }{\left| \left\langle \alpha , y\right\rangle \right| } &{} \text {otherwise.} \end{array} \right. $$

From (5.16) and the dominated convergence theorem, we have

$$ \lim _{K\rightarrow \infty }\int _{\mathbb {R}^{d}}F_{K}(y,\alpha )\eta (dy)=\left\langle \alpha ,\int _{\mathbb {R}^{d}}y\eta (dy)\right\rangle =\langle \alpha ,\beta \rangle \text {.} $$

An application of the dominated convergence theorem gives

$$ \lim _{K\rightarrow \infty }\int _{\{y:\left\langle \alpha , y\right\rangle<0\}}e^{F_{K}(y,\alpha )}\theta (dy|x)=\int _{\{y:\left\langle \alpha , y\right\rangle <0\}}e^{\left\langle \alpha , y\right\rangle }\theta (dy|x), $$

and the monotone convergence theorem gives

$$ \lim _{K\rightarrow \infty }\int _{\{y:\left\langle \alpha , y\right\rangle \ge 0\}}e^{F_{K}(y,\alpha )}\theta (dy|x)=\int _{\{y:\left\langle \alpha , y\right\rangle \ge 0\}}e^{\left\langle \alpha , y\right\rangle }\theta (dy|x). $$

Thus

$$ \lim _{K\rightarrow \infty }\log \left( \int _{\mathbb {R}^{d}}e^{F_{K}(y,\alpha )}\theta (dy|x)\right) =H_{c}(x,\alpha )\text {.} $$

By the Donsker–Varadhan variational formula (Proposition 2.2), for every \(K\in (0,\infty )\) and \(\alpha \in \mathbb {R}^{d}\),

$$ R(\left. \eta (\cdot )\right\| \theta (\cdot |x))\ge \int _{\mathbb {R}^{d} }F_{K}(y,\alpha )\eta (dy)-\log \left( \int _{\mathbb {R}^{d}}e^{F_{K}(y,\alpha )}\theta (dy|x)\right) . $$

Sending \(K\rightarrow \infty \) and taking the supremum over \(\alpha \in \mathbb {R} ^{d}\) yields

$$ R(\left. \eta (\cdot )\right\| \theta (\cdot |x))\ge \sup _{\alpha \in \mathbb {R}^{d}}\left[ \left\langle \alpha ,\beta \right\rangle -H_{c} (x,\alpha )\right] =L_{c}\left( x,\beta \right) , $$

which completes the proof of the lemma.    \(\square \)
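As a sanity check of Lemma 5.4 in the illustrative Gaussian case: for \(\eta =N(\beta , A)\) and \(\theta (\cdot |x)=N(0, A)\) with A nondegenerate, one has \(R(\eta \Vert \theta )=\frac{1}{2}\left\langle \beta , A^{-1}\beta \right\rangle =L_{c}(x,\beta )\), so the inequality holds with equality for a pure mean shift, while any other \(\eta \) with mean \(\beta \) costs at least as much. The following numerical sketch (with hypothetical parameter values) illustrates this.

```python
import numpy as np

def kl_gaussian(m1, S1, m0, S0):
    """Relative entropy R(N(m1,S1) || N(m0,S0)) for nondegenerate covariances."""
    d = len(m1)
    S0_inv = np.linalg.inv(S0)
    dm = m1 - m0
    return 0.5 * (np.trace(S0_inv @ S1) + dm @ S0_inv @ dm - d
                  + np.log(np.linalg.det(S0) / np.linalg.det(S1)))

A = np.eye(2)
beta = np.array([0.3, -0.4])
L_c = 0.5 * beta @ np.linalg.inv(A) @ beta               # Gaussian case of (5.9)
print(kl_gaussian(beta, A, np.zeros(2), A), L_c)          # equal: mean shift is optimal
print(kl_gaussian(beta, 2 * A, np.zeros(2), A) >= L_c)    # True: same mean, larger cost
```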

Theorem 5.5

Assume Condition 5.1 and

$$\begin{aligned} \sup _{n\in \mathbb {N}}\left[ \varkappa (n)nE\left[ \frac{1}{n} {\displaystyle \sum \limits _{i=0}^{n-1}} R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i} ^{n}))\right] \right] \le K_{E}<\infty . \end{aligned}$$
(5.17)

Let \(\{\bar{M}^{n}\}_{n\in \mathbb {N}}\), \(\{\bar{w}^{n}\}_{n\in \mathbb {N}}\), and \(\{w^{n}\}_{n\in \mathbb {N}}\) be defined as in Construction 5.3. Then

$$ \sup _{n\in \mathbb {N}}E\left[ \int _{0}^{1}\sqrt{\varkappa (n)n}\left\| \bar{w}^{n}(t)\right\| dt\right] =\sup _{n\in \mathbb {N}}E\left[ \int _{0}^{1}\left\| w^{n}(t)\right\| dt\right] <\infty . $$

In addition, \(\{\bar{M}^{n}\}\) is tight (as a sequence of random probability measures) and uniformly integrable in the sense that

$$\begin{aligned} \lim _{C\rightarrow \infty }\limsup _{n\rightarrow \infty }E\left[ \int _{\mathbb {R}^{d}\times [0,1]}\left\| w\right\| 1_{\left\{ \left\| w\right\| \ge C\right\} }\bar{M}^{n}(dw\times dt)\right] =0. \end{aligned}$$
(5.18)

Proof

We assume without loss of generality that \(\inf _{n\in \mathbb {N}} \{\sqrt{\varkappa (n)n}\}>0\). Let \(B\in (0,\infty )\) be such that \(B\le \lambda _{DA}\inf _{n\in \mathbb {N}}\{\sqrt{\varkappa (n)n}\}\), so that \(\lambda _{DA}\ge B/\sqrt{\varkappa (n)n}\) for all n. Recall \(L_{c}\) from (5.9), and let \(\bar{K}\doteq \lambda _{DA}K_{DA}+K_{A}/2\), where we recall that \(K_{A}\) is the bound on \(\Vert A(x)\Vert \) and \(K_{DA}\) was introduced in (5.7). Let \(e_{i}\) denote the standard unit vectors in \(\mathbb {R}^{d}\). Then for all \(i\in \{1,\ldots , d\}\) and each choice of ±,

$$\begin{aligned}&\varkappa (n)nL_{c}(x,\beta )\\&\quad =\sup _{\alpha \in \mathbb {R}^{d}}\left[ \sqrt{\varkappa (n)n} \left\langle \alpha ,\sqrt{\varkappa (n)n}\beta \right\rangle -\varkappa (n)nH_{c}(x,\alpha )\right] \\&\quad \ge \pm \sqrt{\varkappa (n)n}\left\langle \frac{B}{\sqrt{\varkappa (n)n} }e_{i},\sqrt{\varkappa (n)n}\beta \right\rangle -\varkappa (n)nH_{c}\left( x,\pm \frac{B}{\sqrt{\varkappa (n)n}}e_{i}\right) \\&\quad \ge \pm B\sqrt{\varkappa (n)n}\beta _{i}-\frac{1}{2}B^{2}\left\| A(x)\right\| -B^{2}\lambda _{DA}K_{DA}\\&\quad \ge \pm B\sqrt{\varkappa (n)n}\beta _{i}-B^{2}\bar{K}, \end{aligned}$$

where the first inequality follows from making a specific choice of \(\alpha \) and the second uses (5.8). If we multiply both sides by \(\left| \beta _{i}\right| \), sum on i, and then divide by \(\sum _{i=1}^{d}\left| \beta _{i}\right| \), we obtain

$$\begin{aligned} d\varkappa (n)nL_{c}(x,\beta )+dB^{2}\bar{K}\ge Bd\sqrt{\varkappa (n)n} \frac{\left\| \beta \right\| ^{2}}{\sum _{i=1}^{d}\left| \beta _{i}\right| }\ge B\sqrt{\varkappa (n)n}\left\| \beta \right\| . \end{aligned}$$
(5.19)

To slightly simplify the notation, we let \(s^{n}(t)\doteq \left\lfloor nt\right\rfloor /n\), where \(\left\lfloor a\right\rfloor \) is the integer part of a. Using the bound relating \(L_{c}\) and relative entropy from Lemma 5.4 together with (5.17), we obtain

$$\begin{aligned} d\left( \frac{K_{E}}{B}+B\bar{K}\right) \ge&\ \frac{d\varkappa (n)n}{B}E\left[ \int _{0}^{1}L_{c}\left( \bar{X}^{n}\left( s^{n}(t)\right) ,\bar{w}^{n}(t)\right) dt\right] +dB\bar{K}\nonumber \\ \ge&\ E\left[ \int _{0}^{1}\sqrt{\varkappa (n)n}\left\| \bar{w} ^{n}(t)\right\| dt\right] , \end{aligned}$$
(5.20)

which proves the first statement in the theorem. Since by Theorem 2.10 the mapping

$$ m\mapsto \int _{\mathbb {R}^{d}\times [0,1]}\left\| w\right\| m(dw\times dt) $$

defines a tightness function on \(\mathscr {P}\left( \mathbb {R}^{d} \times [0,1]\right) \), it follows from Lemma 2.9 and the first claim that \(\{\bar{M}^{n}\}_{n\in \mathbb {N}}\) is tight.

For the uniform integrability, let \(C\in (1,\infty )\) be arbitrary. We note that the estimates in (5.19) and (5.20) hold for any B and n such that \(B\le \lambda _{DA}\sqrt{\varkappa (n)n}\). Consider n large enough that

$$ \min \{\lambda _{DA}, 1\}\ge \frac{C}{\sqrt{\varkappa (n)n}}\text {.} $$

Then for such n, the estimates (5.19) and (5.20) hold with \(B=1\) and \(B=C\). Recalling \(\bar{K} \doteq \lambda _{DA}K_{DA}+K_{A}/2\) and applying (5.20) with \(B=1\), we have for such n,

$$ E\left[ \int _{0}^{1}\sqrt{\varkappa (n)n}\left\| \bar{w}^{n}(s)\right\| ds\right] \le K^{*}\doteq d\left( K_{E}+\frac{1}{2}K_{A}+\lambda _{DA}K_{DA}\right) , $$

and therefore

$$ E\left[ \int _{0}^{1}1_{\{\sqrt{\varkappa (n)n}\left\| \bar{w} ^{n}(s)\right\| >C^{2}\}}ds\right] \le \frac{K^{*}}{C^{2}}. $$

In the following bound, (5.19) with \(B=C\) is used to get the first inequality, and (5.20) with \(B=1\) together with the last display are used for the third:

$$\begin{aligned}&CE\left[ \int _{\mathbb {R}^{d}\times [0,1]}\left\| w\right\| 1_{\left\{ \left\| w\right\| \ge C^{2}\right\} }\bar{M}^{n}(dw\times dt)\right] \\&\quad \le E\left[ d\int _{0}^{1}1_{\{\left\| w^{n}(t)\right\|>C^{2}\}}\left( \varkappa (n)nL_{c}\left( \bar{X}^{n}\left( s^{n}(t)\right) ,\bar{w}^{n}(t)\right) +C^{2}\bar{K}\right) dt\right] \\&\quad \le d\varkappa (n)nE\left[ \int _{0}^{1}L_{c}\left( \bar{X} ^{n}\left( s^{n}(t)\right) ,\bar{w}^{n}(t)\right) dt\right] +C^{2}d\bar{K}E\left[ \int _{0}^{1}1_{\{\left\| w^{n}(t)\right\| >C^{2} \}}dt\right] \\&\quad \le K^{*}d\left( 1+\bar{K}\right) . \end{aligned}$$

This proves the claimed uniform integrability.    \(\square \)

3.2 Identification of Limits

The following theorem is a law of large numbers type result for the difference between the noises and their conditional means, and is the most complicated part of the analysis.

Theorem 5.6

Assume Condition 5.1 and (5.17). Consider the sequence \(\{\bar{\nu }_{i}^{n}\}\) of controlled noises and \(\{\bar{w}^{n}(i/n)\}\) of means of the controlled noises as in Construction 5.3. For \(i\in \{1,\ldots , n\}\) let

$$ W_{i}^{n}\doteq \frac{1}{n} {\displaystyle \sum \limits _{j=0}^{i-1}} \sqrt{\varkappa (n)n}\left( \bar{\nu }_{j}^{n}-\bar{w}^{n}\left( j/n\right) \right) \text {.} $$

Then for all \(\delta >0\),

$$ \lim _{n\rightarrow \infty }P\left\{ \max _{i\in \{1,\ldots , n\}}\left\| W_{i}^{n}\right\| \ge \delta \right\} =0\text {.} $$

Proof

According to (5.17),

$$ \frac{1}{n}\sum \limits _{i=0}^{n-1}E[R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))]\le \frac{K_{E}}{\varkappa (n)n}. $$

Because of this, the (random) Radon–Nikodym derivatives

$$ f_{i}^{n}(y)=\frac{d\bar{\mu }_{i}^{n}(\cdot )}{d\theta (\cdot |\bar{X}_{i}^{n} )}(y) $$

are well defined and can be selected in a measurable way [79, Theorem V.58]. We will control the magnitude of the noise when the Radon–Nikodym derivative is large by bounding

$$ \frac{1}{n}\sum \limits _{i=0}^{n-1}E[1_{\{f_{i}^{n}(\bar{\nu }_{i}^{n})\ge r\}}\left\| \bar{\nu }_{i}^{n}\right\| ] $$

for large \(r\in (0,\infty )\).

From the bound on the moment-generating function (5.3) [see (3.12)], we obtain

$$ \sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}e^{\frac{\lambda }{\sqrt{d} }\left\| y\right\| }\theta (dy|x)\le 2de^{K_{\text {mgf}}}\text {.} $$

Let \(\sigma =\min \{\lambda /(2\sqrt{d}), 1\}\) and recall \(\ell (b)\doteq b\log b-b+1\). Then

$$ \frac{1}{n}\sum \limits _{i=0}^{n-1}E\left[ 1_{\{f_{i}^{n}(\bar{\nu }_{i} ^{n})\ge r\}}\left\| \bar{\nu }_{i}^{n}\right\| \right] =\frac{1}{n}\sum \limits _{i=0}^{n-1}E\left[ \int _{\{y:f_{i}^{n}(y)\ge r\}}\left\| y\right\| f_{i}^{n}(y)\theta (dy|\bar{X}_{i}^{n})\right] , $$

and the bound \(ab\le e^{a}+\ell (b)\) for \(a, b\ge 0\) with \(a=\sigma \left\| y\right\| \) and \(b=f_{i}^{n}(y)\) gives that for all i,

$$\begin{aligned}&E\left[ \int _{\{y:f_{i}^{n}(y)\ge r\}}\left\| y\right\| f_{i} ^{n}(y)\theta (dy|\bar{X}_{i}^{n})\right] \\&\quad \le \frac{1}{\sigma }E\left[ \int _{\{y:f_{i}^{n}(y)\ge r\}} e^{\sigma \left\| y\right\| }\theta (dy|\bar{X}_{i}^{n})\right] +\frac{1}{\sigma }E\left[ \int _{\{y:f_{i}^{n}(y)\ge r\}}\ell (f_{i}^{n}(y))\theta (dy|\bar{X}_{i}^{n})\right] . \end{aligned}$$

Since \(\ell (b)\ge 0\) for all \(b\ge 0\), we have

$$\begin{aligned} E\left[ \int _{\left\{ y:f_{i}^{n}\left( y\right) \ge r\right\} } \ell \left( f_{i}^{n}\left( y\right) \right) \theta (dy|\bar{X}_{i} ^{n})\right]&\le E\left[ \int _{\mathbb {R}^{d}}\ell (f_{i}^{n} (y))\theta (dy|\bar{X}_{i}^{n})\right] \\&=E[R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X} _{i}^{n}))], \end{aligned}$$

and by Hölder’s inequality,

$$\begin{aligned}&E\left[ \int _{\{y:f_{i}^{n}(y)\ge r\}}e^{\sigma \left\| y\right\| }\theta (dy|\bar{X}_{i}^{n})\right] \\&\quad \le E\left[ \left( \int _{\mathbb {R}^{d}}1_{\{f_{i}^{n}(y)\ge r\}}\theta (dy|\bar{X}_{i}^{n})\right) ^{1/2}\left( \int _{\mathbb {R}^{d} }e^{2\sigma \left\| y\right\| }\theta (dy|\bar{X}_{i}^{n})\right) ^{1/2}\right] \\&\quad =E\left[ \theta (\{y:f_{i}^{n}(y)\ge r\}|\bar{X}_{i}^{n} )^{1/2}\right] \left( 2de^{K_{\text {mgf}}}\right) ^{1/2}. \end{aligned}$$

In addition, for all \(r>1\), Markov’s inequality gives

$$\begin{aligned} \theta (\{y:f_{i}^{n}(y)\ge r\}|\bar{X}_{i}^{n})&\le \frac{1}{r\log r}\int \log (f_{i}^{n}(y))f_{i}^{n}(y)\theta (dy|\bar{X}_{i}^{n}) \\&=\frac{R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))}{r\log r}\text {.} \end{aligned}$$

The last four displays give the bound

$$\begin{aligned}&\frac{1}{n}\sum \limits _{i=0}^{n-1}E\left[ \int _{\{f_{i}^{n}(y)\ge r\}}\left\| y\right\| f_{i}^{n}(y)\theta (dy|\bar{X}_{i}^{n})\right] \\&\quad \le \frac{1}{\sigma }\left( 2de^{K_{\text {mgf}}}\right) ^{1/2} \frac{1}{n}\sum \limits _{i=0}^{n-1}E\left[ \left( \frac{R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))}{r\log r}\right) ^{1/2}\right] \\&\quad \quad \quad +\frac{1}{\sigma }\frac{1}{n}\sum \limits _{i=0}^{n-1} E[R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i} ^{n}))]. \end{aligned}$$

Since by Jensen’s inequality,

$$\begin{aligned}&\frac{1}{n}\sum \limits _{i=0}^{n-1}E\left[ \left( \frac{R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))}{r\log r}\right) ^{1/2}\right] \\&\quad \le \left( \frac{1}{r\log r}\right) ^{1/2}\left( \frac{1}{n} \sum \limits _{i=0}^{n-1}E[R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))]\right) ^{1/2}, \end{aligned}$$

we obtain the overall bound

$$\begin{aligned}&\frac{1}{n}\sum \limits _{i=0}^{n-1}E\left[ 1_{\{f_{i}^{n}(\bar{\nu }_{i} ^{n})\ge r\}}\left\| \bar{\nu }_{i}^{n}\right\| \right] \nonumber \\&\quad \le \frac{1}{\sigma }\left( 2de^{K_{\text {mgf}}}\right) ^{1/2}\left( \frac{1}{r\log r}\right) ^{1/2}\left( \frac{1}{n}\sum \limits _{i=0} ^{n-1}E[R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X} _{i}^{n}))]\right) ^{1/2}\nonumber \\&\qquad +\frac{1}{\sigma }\frac{1}{n}\sum \limits _{i=0}^{n-1}E[R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))]\nonumber \\&\quad \le \frac{1}{\sigma }\frac{K_{E}^{1/2}}{\sqrt{\varkappa (n)n}}\left( 2de^{K_{\text {mgf}}}\right) ^{1/2}\left( \frac{1}{r\log r}\right) ^{1/2}+\frac{1}{\sigma }\frac{K_{E}}{\varkappa (n)n}\text {.} \end{aligned}$$
(5.21)

Using this result, we can complete the proof. Define

$$ \xi _{i}^{n, r}\doteq \left\{ \begin{array} [c]{cc} \bar{\nu }_{i}^{n} &{} \text {if }f_{i}^{n}(\bar{\nu }_{i}^{n})<r,\\ 0 &{} \text {otherwise.} \end{array} \right. $$

For all \(\delta >0\),

$$\begin{aligned}&P\left\{ \max _{k=0,..., n-1}\left\| \frac{1}{n}\sum \limits _{i=0} ^{k}\sqrt{\varkappa (n)n}\left( \bar{\nu }_{i}^{n}-\bar{w}^{n}\left( \frac{i}{n}\right) \right) \right\| \ge 3\delta \right\} \\&\quad \le P\left\{ \max _{k=0,..., n-1}\left\| \frac{1}{n}\sum \limits _{i=0}^{k}\sqrt{\varkappa (n)n}(\bar{\nu }_{i}^{n}-\xi _{i}^{n, r} )\right\| \ge \delta \right\} \\&\qquad +P\left\{ \max _{k=0,..., n-1}\left\| \frac{1}{n}\sum \limits _{i=0}^{k}\sqrt{\varkappa (n)n}\left( \xi _{i}^{n, r}-\int _{\{y:f_{i} ^{n}(y)<r\}}y\bar{\mu }_{i}^{n}(dy)\right) \right\| \ge \delta \right\} \\&\qquad +P\left\{ \max _{k=0,..., n-1}\left\| \frac{1}{n}\sum \limits _{i=0}^{k}\sqrt{\varkappa (n)n}\left( \bar{w}^{n}\left( \frac{i}{n}\right) -\int _{\{y:f_{i}^{n}(y)<r\}}y\bar{\mu }_{i}^{n}(dy)\right) \right\| \ge \delta \right\} \text {.} \end{aligned}$$

The first term satisfies

$$\begin{aligned}&P\left\{ \max _{k=0,..., n-1}\left\| \frac{1}{n}\sum \limits _{i=0} ^{k}\sqrt{\varkappa (n)n}(\bar{\nu }_{i}^{n}-\xi _{i}^{n, r})\right\| \ge \delta \right\} \\&\quad \le \frac{1}{\delta }\sqrt{\varkappa (n)n}\frac{1}{n}\sum \limits _{i=0} ^{n-1}E\left[ \left\| \bar{\nu }_{i}^{n}-\xi _{i}^{n, r}\right\| \right] \\&\quad =\frac{1}{\delta }\sqrt{\varkappa (n)n}\frac{1}{n}\sum \limits _{i=0} ^{n-1}E\left[ 1_{\{f_{i}^{n}(\bar{\nu }_{i}^{n})\ge r\}}\left\| \bar{\nu }_{i}^{n}\right\| \right] \text {.} \end{aligned}$$

The norm in the second term is a submartingale in k, and so by Doob’s submartingale inequality [see (D.1)],

$$\begin{aligned}&P\left\{ \max _{k=0,..., n-1}\left\| \frac{1}{n}\sum \limits _{i=0} ^{k}\sqrt{\varkappa (n)n}\left( \xi _{i}^{n, r}-\int _{\{y:f_{i}^{n}(y)<r\}} y\bar{\mu }_{i}^{n}(dy)\right) \right\| \ge \delta \right\} \\&\quad \le \frac{1}{\delta ^{2}}E\left[ \left\| \frac{1}{n}\sum \limits _{i=0}^{n-1}\sqrt{\varkappa (n)n}\left( \xi _{i}^{n, r}-\int _{\{y:f_{i}^{n}(y)<r\}}y\bar{\mu }_{i}^{n}(dy)\right) \right\| ^{2}\right] \\&\quad =\frac{1}{\delta ^{2}}\frac{\varkappa (n)}{n}\sum \limits _{i=0} ^{n-1}E\left[ \left\| \left( \xi _{i}^{n, r}-\int _{\{y:f_{i}^{n} (y)<r\}}y\bar{\mu }_{i}^{n}(dy)\right) \right\| ^{2}\right] \\&\quad \le \frac{1}{\delta ^{2}}\frac{\varkappa (n)}{n}\sum \limits _{i=0} ^{n-1}E\left[ \left\| \xi _{i}^{n, r}\right\| ^{2}\right] \\&\quad =\frac{1}{\delta ^{2}}\frac{\varkappa (n)}{n}\sum \limits _{i=0} ^{n-1}E\left[ \int _{\{y:f_{i}^{n}(y)<r\}}\left\| y\right\| ^{2} f_{i}^{n}(y)\theta (dy|\bar{X}_{i}^{n})\right] \\&\quad \le \frac{r}{\delta ^{2}}\frac{\varkappa (n)}{n}\sum \limits _{i=0} ^{n-1}E\left[ \int _{\mathbb {R}^{d}}\left\| y\right\| ^{2}\theta (dy|\bar{X}_{i}^{n})\right] \\&\quad \le \frac{r}{\delta ^{2}}\varkappa (n)K_{\mu , 2}, \end{aligned}$$

where

$$ K_{\mu , 2}=\sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\left\| y\right\| ^{2}\theta (dy|x)<\infty , $$

and the finiteness is due to (5.3). We can use Jensen’s inequality with the third term and get the same bound that was proved for the first:

$$\begin{aligned}&P\left\{ \max _{k=0,..., n-1}\left\| \frac{1}{n}\sum \limits _{i=0} ^{k}\sqrt{\varkappa (n)n}\left( \bar{w}^{n}\left( \frac{i}{n}\right) -\int _{\{y:f_{i}^{n}(y)<r\}}y\bar{\mu }_{i}^{n}(dy)\right) \right\| \ge \delta \right\} \\&\quad \le \frac{1}{\delta }\sqrt{\varkappa (n)n}\frac{1}{n}\sum \limits _{i=0} ^{n-1}E\left[ \left\| \left( \bar{w}^{n}\left( \frac{i}{n}\right) -\int _{\{y:f_{i}^{n}(y)<r\}}y\bar{\mu }_{i}^{n}(dy)\right) \right\| \right] \\&\quad =\frac{1}{\delta }\sqrt{\varkappa (n)n}\frac{1}{n}\sum \limits _{i=0} ^{n-1}E\left[ \left\| \int _{\{y:f_{i}^{n}(y)\ge r\}}y\bar{\mu }_{i} ^{n}(dy)\right\| \right] \\&\quad \le \frac{1}{\delta }\sqrt{\varkappa (n)n}\frac{1}{n}\sum \limits _{i=0} ^{n-1}E\left[ \int _{\{y:f_{i}^{n}(y)\ge r\}}\left\| y\right\| \bar{\mu }_{i}^{n}(dy)\right] \\&\quad =\frac{1}{\delta }\sqrt{\varkappa (n)n}\frac{1}{n}\sum \limits _{i=0} ^{n-1}E\left[ 1_{\{f_{i}^{n}(\bar{\nu }_{i}^{n})\ge r\}}\left\| \bar{\nu }_{i}^{n}\right\| \right] \text {.} \end{aligned}$$

Combining the bounds for these three terms with (5.21) gives

$$\begin{aligned}&P\left\{ \max _{k=0,..., n-1}\left\| \frac{1}{n}\sum \limits _{i=0} ^{k}\sqrt{\varkappa (n)n}\left( \bar{\nu }_{i}^{n}-\bar{w}^{n}\left( \frac{i}{n}\right) \right) \right\| \ge 3\delta \right\} \\&\quad \le \frac{2}{\delta }\sqrt{\varkappa (n)n}\frac{1}{n}\sum \limits _{i=0} ^{n-1}E\left[ 1_{\{f_{i}^{n}(\bar{\nu }_{i}^{n})\ge r\}}\left\| \bar{\nu }_{i}^{n}\right\| \right] +\frac{r}{\delta ^{2}}\varkappa (n)K_{\mu , 2}\\&\quad \le \frac{2}{\sigma \delta }K_{E}^{1/2}\left( 2{d}e^{K_{\text {mgf}} }\right) ^{1/2}\left( \frac{1}{r\log r}\right) ^{1/2}+\frac{2}{\sigma \delta }\frac{K_{E}}{\sqrt{\varkappa (n)n}}+\varkappa (n)\frac{r}{\delta ^{2} }K_{\mu , 2}. \end{aligned}$$

Sending \(n\rightarrow \infty \) and then \(r\rightarrow \infty \) (and using \(\varkappa (n)\rightarrow 0,\varkappa (n)n\rightarrow \infty \)) gives

$$ P\left\{ \max _{k=0,..., n-1}\left\| \frac{1}{n}\sum \limits _{i=0}^{k} \sqrt{\varkappa (n)n}\left( \bar{\nu }_{i}^{n}-\bar{w}^{n}\left( \frac{i}{n}\right) \right) \right\| \ge 3\delta \right\} \rightarrow 0 $$

as \(n\rightarrow \infty \), which completes the proof.    \(\square \)

The next result identifies the weak limits of controlled processes. We recall that for a probability measure \(\gamma \) on \(\mathbb {R}^{d}\times [0,1]\), the marginal distribution on the second coordinate is denoted by \([\gamma ]_{2}\), and the conditional distribution on the first coordinate given the second is given by \([\gamma ]_{1|2}\). Thus for Borel sets A and B,

$$ \gamma (\mathbb {R}^{d}\times B)=[\gamma ]_{2}(B)\text { and }\gamma (A\times B)=\int _{B}[\gamma ]_{1|2}(A|s)[\gamma ]_{2}(ds). $$

Theorem 5.7

Let \(\{\bar{\mu }_{i}^{n}\}_{i=0,\ldots , n-1}\) be a sequence of controls, and define the corresponding random variables as in Construction 5.3. Assume Condition 5.1 and that (5.17) holds for some \(K_{E}<\infty \). Then \(\{(\bar{M} ^{n},\bar{Y}^{n})\}_{n\in \mathbb {N}}\) is tight in \(\mathscr {P}(\mathbb {R} ^{d}\times [0,1])\times \mathscr {C}([0,1]:\mathbb {R}^{d})\). Consider a subsequence (keeping the index n for convenience) such that \(\{(\bar{M} ^{n},\bar{Y}^{n})\}\) converges weakly to \((\bar{M},\bar{Y})\). Then with probability 1, \([\bar{M}]_{2}(dt)\) is Lebesgue measure and

$$\begin{aligned} \bar{Y}(t)=\int _{0}^{t}Db(X^{0}(s))\bar{Y}(s)ds+\int _{0}^{t} w(s)ds, \end{aligned}$$
(5.22)

where

$$\begin{aligned} w(t)=\int _{\mathbb {R}^{d}}w[\bar{M}]_{1|2}(dw\left| t\right. ). \end{aligned}$$
(5.23)

In addition,

$$\begin{aligned} \liminf _{n\rightarrow \infty }\varkappa (n)nE\left[ \frac{1}{n}\sum \limits _{i=0}^{n-1}R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))\right] \ge E\left[ \int _{0}^{1}\frac{1}{2}\left\| w(s)\right\| _{A^{-1}(X^{0}(s))}^{2}ds\right] . \end{aligned}$$
(5.24)

The proof of this theorem is lengthy. After some preliminary discussion, several lemmas will be presented. After stating and proving the lemmas, we will return to complete the argument for Theorem 5.7.

It was shown in Theorem 5.5 that \(\{\bar{M}^{n} \}_{n\in \mathbb {N}}\) is tight. If \(\bar{M}\) is any weak limit of a subsequence of \(\{\bar{M}^{n}\}_{n\in \mathbb {N}}\), then since for all n the second marginal of \(\bar{M}^{n}(dw\times dt)\) is Lebesgue measure, it follows that \([\bar{M}]_{2}(dt)\) is Lebesgue measure with probability 1.

The ultimate goal is to show that \(\bar{Y}^{n}\rightarrow \bar{Y}\) weakly in \(\mathscr {C}([0,1]:\mathbb {R}^{d})\), where \(\bar{Y}(t)\) is given by (5.22) in terms of the weak limit \(\bar{M}\). To achieve this, we introduce the following processes, which serve as intermediate steps. Let \(\check{Y}_{0}^{n}=0\) and

$$ \check{Y}_{i+1}^{n}=\check{Y}_{i}^{n}+\sqrt{\frac{\varkappa (n)}{n}}\left( b\left( X_{i}^{n, 0}+\frac{1}{\sqrt{\varkappa (n)n}}\check{Y}_{i}^{n}\right) -b\left( X_{i}^{n, 0}\right) \right) +\sqrt{\frac{\varkappa (n)}{n}}\bar{w}^{n}\left( \frac{i}{n}\right) , $$

together with its continuous time linear interpolation defined for \(t\in [i/n, i/n+1/n]\) by

$$ \check{Y}^{n}(t)=(i+1-nt)\check{Y}_{i}^{n}+(nt-i)\check{Y}_{i+1}^{n}\text {.} $$

Also let

$$\begin{aligned} \hat{Y}^{n}(t)=\int _{0}^{t}Db\left( X^{0}(s)\right) \hat{Y}^{n} (s)ds+\int _{0}^{t}w^{n}(s)ds, \end{aligned}$$
(5.25)

where

$$ w^{n}(t)=\int _{\mathbb {R}^{d}}w[\bar{M}^{n}]_{1|2}(dw\left| t\right. ) $$

as in Construction 5.3. Then both \(\check{Y}^{n}\) and \(\hat{Y}^{n}\) are random variables taking values in \(\mathscr {C} ([0,1]:\mathbb {R}^{d})\). Note that \(\bar{Y}^{n}\) differs from \(\check{Y} ^{n}\), because \(\bar{Y}^{n}\) is driven by the actual noises and \(\check{Y}^{n}\) is driven by their conditional means. While the driving terms of \(\hat{Y}^{n}\) and \(\check{Y}^{n}\) are the same [recall that \(\sqrt{\varkappa (n)n}\bar{w}^{n}(t)=w^{n}(t)\)], they differ in that \(\check{Y}^{n}\) is still a linear interpolation of a discrete time process, whereas \(\hat{Y}^{n}\) satisfies an ODE. We will show that along any subsequence where \(\bar{M} ^{n}\rightarrow \bar{M}\) weakly,

$$ \bar{Y}^{n}-\check{Y}^{n}\rightarrow 0,\check{Y}^{n}-\hat{Y}^{n}\rightarrow 0,\text { and }\hat{Y}^{n}\rightarrow \bar{Y} $$

in \(\mathscr {C}([0,1]:\mathbb {R}^{d})\), all in distribution, where \(\bar{Y}\) is the unique solution of (5.22).

To show that \(\hat{Y}^{n}\rightarrow \bar{Y}\), we show that \(\{\hat{Y}^{n}\}\) is tight in \(\mathscr {C}([0,1]:\mathbb {R}^{d})\) and use the mapping defined by (5.25) from \(\int _{0}^{\cdot }w^{n}\) to \(\hat{Y}^{n}\). Recall that \(\sup _{x\in \mathbb {R}^{d}}\left\| Db(x)\right\| \le K_{b}\). The following lemma uses the uniform integrability of \(\{\bar{M}^{n}\}\) given in Theorem 5.5 to prove tightness of \(\{\hat{Y}^{n}\}\).

Lemma 5.8

Assume Condition 5.1 and (5.17). The sequence \(\{\hat{Y}^{n}\}\) defined in (5.25) in terms of the measures \(\{\bar{M}^{n}\}\) via Construction 5.3 is tight in \(\mathscr {C}([0,1]:\mathbb {R}^{d})\), as is \(\{\int _{0}^{\cdot }w^{n}ds\}\).

Proof

Tightness of \(\{\int _{0}^{\cdot }w^{n}ds\}\) is a consequence of the fact that for \(\varepsilon ,\delta , C\in (0,\infty )\),

$$ \limsup _{n\rightarrow \infty }P\left( \sup _{\left| s-t\right| \le \delta }\int _{s}^{t}\left\| w^{n}(r)\right\| dr>\varepsilon \right) \le \delta \frac{C}{\varepsilon }+\frac{1}{\varepsilon }T(C), $$

where

$$\begin{aligned} T(C)&\doteq \limsup _{n\rightarrow \infty }E\left[ \int _{0}^{1} 1_{\{\left\| w^{n}\left( t\right) \right\|>C\}}\left\| w^{n}(t)\right\| dt\right] \\&=\limsup _{n\rightarrow \infty }E\left[ \int _{\{\left\| w\right\| >C\}}\left\| w\right\| \bar{M}^{n}(dw\times dt)\right] , \end{aligned}$$

and the fact that by Theorem 5.5, \(T(C)\rightarrow 0\) as \(C\rightarrow \infty \). For tightness of \(\{\hat{Y}^{n}\}\) it suffices to check that the map from \(\mathscr {C}([0,1]:\mathbb {R}^{d})\) to itself that takes \(z\in \mathscr {C}([0,1]:\mathbb {R}^{d})\) to the unique solution of

$$ \phi (t)=\int _{0}^{t}Db(X^{0}(s))\phi (s)ds+z(t),\;t\in [0,1], $$

is continuous. This continuity follows directly from Gronwall’s inequality.    \(\square \)

We still need to show that \(\hat{Y}^{n}\) converges to \(\bar{Y}\). This also relies on the uniform integrability given by Theorem 5.5.

Lemma 5.9

Assume Condition 5.1 and (5.17). Let the sequence \(\{\hat{Y}^{n}\left( t\right) \}\) be defined by (5.25) and consider a weakly convergent subsequence \(\{(\hat{Y}^{n},\bar{M}^{n})\}\) with limit \((\hat{Y},\bar{M})\). Then w.p.1, \(\hat{Y}=\bar{Y}\), where \(\bar{Y}\) is defined by (5.22)–(5.23).

Proof

We can write

$$ \hat{Y}^{n}(t)=\int _{0}^{t}Db(X^{0}(s))\hat{Y}^{n}(s)ds+\int _{0}^{t} \int _{\mathbb {R}^{d}}w\bar{M}^{n}(dw\times ds). $$

We use the uniform integrability proved in Theorem 5.5 and the fact that \([\bar{M}]_{2}\) is Lebesgue measure w.p.1. The latter implies \(E\bar{M}(\mathbb {R}^{d}\times \{t\})=0\) for \(t\in [0,1]\). Sending \(n\rightarrow \infty \) and using the definition of w(s) in (5.23) gives

$$\begin{aligned} \hat{Y}(t)&=\int _{0}^{t}Db(X^{0}(s))\hat{Y}(s)ds+\int _{0}^{t} \int _{\mathbb {R}^{d}}w\bar{M}(dw\times ds)\\&=\int _{0}^{t}Db(X^{0}(s))\hat{Y}(s)ds+\int _{0}^{t}w(s)ds. \end{aligned}$$

By uniqueness of the solution, \(\hat{Y}=\bar{Y}\) follows.    \(\square \)

It remains to show that \(\bar{Y}^{n}-\check{Y}^{n}\rightarrow 0\) and \(\check{Y} ^{n}-\hat{Y}^{n}\rightarrow 0\). We begin with \(\bar{Y}^{n}-\check{Y} ^{n}\rightarrow 0\). Recall that the difference between \(\bar{Y}^{n}\) and \(\check{Y}^{n}\) is that the first is driven by the actual noises, and the second is driven by their conditional means. The argument uses the following discrete version of Gronwall’s inequality, a proof of which can be found in [83, p. 283].

Lemma 5.10

If \(\{z_{n}\}\), \(\{u_{n}\}\), and \(\{v_{n}\}\) are nonnegative sequences defined for \(n\in \mathbb {N}_{0}\) that satisfy, for every \(k\in \mathbb {N}_{0}\),

$$ z_{k}\le v_{k}+ {\displaystyle \sum \limits _{i=0}^{k-1}} u_{i}z_{i}, $$

then

$$ z_{k}\le v_{k}+ {\displaystyle \sum \limits _{i=0}^{k-1}} u_{i}v_{i}\exp \left\{ {\displaystyle \sum \limits _{j=i+1}^{k-1}} u_{j}\right\} \text {.} $$
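A quick numerical sanity check of the lemma (illustrative only, with randomly generated sequences) builds z to saturate the hypothesis and verifies the stated bound:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 50
u = rng.uniform(0, 0.1, size=K)
v = rng.uniform(0, 1.0, size=K)

# Build z saturating the hypothesis: z_k = v_k + sum_{i<k} u_i z_i.
z = np.empty(K)
for k in range(K):
    z[k] = v[k] + np.sum(u[:k] * z[:k])

# The bound of Lemma 5.10.
bound = np.array([v[k] + sum(u[i] * v[i] * np.exp(np.sum(u[i + 1:k]))
                             for i in range(k)) for k in range(K)])
assert np.all(z <= bound + 1e-9)
```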

Lemma 5.11

Assume Condition 5.1 and (5.17). Then \(\check{Y}^{n}-\bar{Y}^{n}\rightarrow 0\) in probability.

Proof

Recall from (5.13) and (5.14) that

$$ \bar{Y}_{k}^{n}= {\displaystyle \sum \limits _{i=0}^{k-1}} \sqrt{\frac{\varkappa (n)}{n}}\left( b\left( X_{i}^{n, 0}+\frac{1}{\sqrt{\varkappa (n)n}}\bar{Y}_{i}^{n}\right) -b\left( X_{i}^{n, 0}\right) \right) + {\displaystyle \sum \limits _{i=0}^{k-1}} \sqrt{\frac{\varkappa (n)}{n}}\bar{\nu }_{i}^{n} $$

and

$$ \check{Y}_{k}^{n}= {\displaystyle \sum \limits _{i=0}^{k-1}} \sqrt{\frac{\varkappa (n)}{n}}\left( b\left( X_{i}^{n, 0}+\frac{1}{\sqrt{\varkappa (n)n}}\check{Y}_{i}^{n}\right) -b\left( X_{i}^{n, 0}\right) \right) + {\displaystyle \sum \limits _{i=0}^{k-1}} \sqrt{\frac{\varkappa (n)}{n}}\bar{w}^{n}\left( \frac{i}{n}\right) , $$

so with \(W_{k}^{n}\) defined as in Theorem 5.6,

$$ \left\| \bar{Y}_{k}^{n}-\check{Y}_{k}^{n}\right\| \le \left\| W_{k}^{n}\right\| + {\displaystyle \sum \limits _{i=0}^{k-1}} \frac{K_{b}}{n}\left\| \bar{Y}_{i}^{n}-\check{Y}_{i}^{n}\right\| \text {.} $$

Using Lemma 5.10 gives, for \(k\le n\),

$$\begin{aligned} \left\| \bar{Y}_{k}^{n}-\check{Y}_{k}^{n}\right\|&\le \left\| W_{k}^{n}\right\| + {\displaystyle \sum \limits _{i=0}^{k-1}} \left\| W_{i}^{n}\right\| \frac{K_{b}}{n}\exp \left\{ \frac{K_{b}}{n}(k-i-1)\right\} \\&\le (1+K_{b}e^{K_{b}})\max _{i\in \{0,1,\ldots , k\}}\left\| W_{i} ^{n}\right\| . \end{aligned}$$

From Theorem 5.6, we have \(\max _{i\in \{1,\ldots , n\}}\left\| W_{i}^{n}\right\| \rightarrow 0\) in probability, and therefore

$$ \max _{i\in \{1,\ldots , n\}}\left\| \bar{Y}_{i}^{n}-\check{Y}_{i} ^{n}\right\| \rightarrow 0, $$

and hence \(\sup _{t\in [0,1]}\left\| \bar{Y}^{n}(t)-\check{Y} ^{n}(t)\right\| \rightarrow 0\) in probability.    \(\square \)

To complete the proof of the convergence we need to show that \(\check{Y}^{n} -\hat{Y}^{n}\rightarrow 0\). Recall that these two processes have the same driving terms but different drifts, in that \(\hat{Y}^{n}\) satisfies the ODE

$$ \hat{Y}^{n}(t)=\int _{0}^{t}Db(X^{0}(s))\hat{Y}^{n}(s)ds+\int _{0}^{t}w^{n}(s)ds, $$

while \(\check{Y}^{n}\) is the linear interpolation of the discrete time process defined by \(\check{Y}_{0}^{n}=0\) and

$$\begin{aligned} \check{Y}_{i+1}^{n}=\check{Y}_{i}^{n}+\sqrt{\frac{\varkappa (n)}{n}}\left( b\left( X_{i}^{n, 0}+\frac{1}{\sqrt{\varkappa (n)n}}\check{Y}_{i}^{n}\right) -b\left( X_{i}^{n, 0}\right) \right) +\frac{1}{n}w^{n}\left( \frac{i}{n}\right) . \end{aligned}$$

However, essentially the same arguments as those used in Lemma 5.8 to show tightness of \(\{\hat{Y}^{n}\}\) can be used to prove tightness of \(\{\check{Y}^{n}\}\), and it then follows as in Lemma 5.9 that any limit satisfies the same ODE (5.22) as the limit of \(\{\hat{Y}^{n}\}\); therefore \(\check{Y}^{n}-\hat{Y}^{n}\rightarrow 0\).

Combining \(\bar{Y}^{n}-\check{Y}^{n}\rightarrow 0\), \(\check{Y}^{n}-\hat{Y} ^{n}\rightarrow 0\), and \(\hat{Y}^{n}\rightarrow \bar{Y}\) demonstrates that along the subsequence where \(\bar{M}^{n}\rightarrow \bar{M}\) weakly, \(\bar{Y} ^{n}\rightarrow \bar{Y}\) in distribution, which implies that along this subsequence, \((\bar{M}^{n},\bar{Y}^{n})\rightarrow (\bar{M},\bar{Y})\) weakly. We have already shown that with probability 1, \([\bar{M}]_{2}(dt)\) is Lebesgue measure and

$$ \bar{Y}(t)=\int _{0}^{t}Db(X^{0}(s))\bar{Y}(s)ds+\int _{0}^{t}\int _{\mathbb {R}^{d}}w[\bar{M}]_{1|2}(dw\left| s\right. )ds, $$

so the proof of convergence (i.e., the first part of Theorem 5.7) is complete.

To finish the proof of Theorem 5.7, we must prove the bound (5.24). Recall the notation \(s^{n}(t)\doteq \left\lfloor nt\right\rfloor /n\), and note from (5.14) that the weak convergence of \(\bar{Y}^{n}\) implies

$$\begin{aligned} \sup _{t\in [0,1]}\left\| \bar{X}^{n}(s^{n}(t))-X^{0}(t)\right\| \rightarrow 0\text { in probability.} \end{aligned}$$
(5.26)

Now define random measures on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\times \left[ 0,1\right] \) by

$$ \gamma ^{n}\left( dx\times dw\times dt\right) =\delta _{\bar{X}^{n}\left( s^{n}(t)\right) }\left( dx\right) \bar{M}^{n}\left( dw\times dt\right) \text {.} $$

Note that the tightness of \(\left\{ \gamma ^{n}\right\} \) follows easily from (5.26) and the tightness of \(\left\{ \bar{M}^{n}\right\} \). Thus given any subsequence, we can choose a further subsequence (again we will retain n as the index for simplicity) along which \(\left\{ \gamma ^{n}\right\} \) converges weakly to some \(\mathscr {P}\left( \mathbb {R} ^{d}\times \mathbb {R}^{d}\times \left[ 0,1\right] \right) \)-valued random variable \(\gamma \) with

$$ [\gamma ]_{2,3}\left( dw\times dt\right) =\bar{M}\left( dw\times dt\right) \text {,} $$

where \([\gamma ]_{2,3}\) is the second and third marginal of \(\gamma \). If we establish (5.24) for this subsequence, it holds for the original one using a standard argument by contradiction. For \(\sigma >0\), let

$$ G_{\sigma }^{X^{0}} \doteq \left\{ \left( x,w, t\right) :\left\| x-X^{0}\left( t\right) \right\| \le \sigma \right\} $$

be, for each \(\sigma \), the closed set centered on \(X^{0}\left( t\right) \) in the x variable, and note that by (5.26) and weak convergence, for all \(\sigma >0\),

$$ 1=\limsup _{n\rightarrow \infty }E\left[ \gamma ^{n}\left( G_{\sigma }^{X^{0} }\right) \right] \le E\left[ \gamma \left( G_{\sigma }^{X^{0}}\right) \right] \text {.} $$

Thus

$$ E\left[ \gamma \left( \bigcap _{n\in \mathbb {N}}G_{1/n}^{X^{0}}\right) \right] =1, $$

so with probability 1, \(\gamma \) puts all its mass on \(\left\{ \left( x,w, t\right) :x=X^{0}\left( t\right) \right\} \). Therefore, with probability 1, for a.e. \(\left( w, t\right) \) under \([\gamma ]_{2,3}\left( dw\times dt\right) \),

$$ [\gamma ]_{1\left| 2,3\right. }\left( \left. dx\right| w, t\right) =\delta _{X^{0}\left( t\right) }\left( dx\right) \text {.} $$

Combined with the fact that the second marginal of \(\bar{M}\left( dw\times dt\right) \) is Lebesgue measure, this gives

$$\begin{aligned} \gamma \left( dx\times dw\times dt\right) =\delta _{X^{0}\left( t\right) }\left( dx\right) \bar{M}\left( \left. dw\right| t\right) dt\text {.} \end{aligned}$$
(5.27)

For \(\kappa \in (0,\infty )\), define

$$ \bar{L}_{\kappa }\left( x,\beta \right) \doteq \sup _{\alpha \in \mathbb {R}^{d} }\left[ \left\langle \alpha ,\beta \right\rangle -\frac{1}{2}\left\| \alpha \right\| _{A(x)}^{2}-\frac{1}{2\kappa }\left\| \alpha \right\| ^{2}\right] . $$
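Since the expression in brackets is strictly concave in \(\alpha \), completing the square shows that the supremum is attained at \(\alpha =(A(x)+\kappa ^{-1}I)^{-1}\beta \), so that

$$ \bar{L}_{\kappa }\left( x,\beta \right) =\frac{1}{2}\left\| \beta \right\| _{(A(x)+\kappa ^{-1}I)^{-1}}^{2}, $$

which is finite, continuous in \((x,\beta )\), and nondecreasing in \(\kappa \); these properties are used below.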

Using (5.8),

$$\begin{aligned}&\varkappa (n)nL_{c}\left( x,\frac{1}{\sqrt{\varkappa (n)n}}\beta \right) \nonumber \\&\quad =\sup _{\alpha \in \mathbb {R}^{d}}\left[ \sqrt{\varkappa (n)n} \langle \alpha ,\beta \rangle -\varkappa (n)nH_{c}(x,\alpha )\right] \nonumber \\&\quad \ge \sup _{\alpha \in \mathbb {R}^{d}}\left[ \sqrt{\varkappa (n)n} \langle \alpha ,\beta \rangle -\frac{\varkappa (n)n}{2}\Vert \alpha \Vert _{A(x)} ^{2}-\varkappa (n)nK_{DA}\Vert \alpha \Vert ^{3}\right] \nonumber \\&\quad \ge \sup _{\alpha \in \mathbb {R}^{d}}\left[ \langle \alpha ,\beta \rangle -\frac{1}{2}\Vert \alpha \Vert _{A(x)}^{2}-\frac{1}{2\kappa }\Vert \alpha \Vert ^{2}-\frac{K_{DA}}{\sqrt{\varkappa (n)n}}\Vert \alpha \Vert ^{3}\right] . \end{aligned}$$
(5.28)

Let \(K_{1}\) be an arbitrary compact subset of \(\mathbb {R}^{d}\). Since \(\Vert \alpha \Vert ^{2}\) is superlinear, there exists another compact subset \(K_{2}\) of \(\mathbb {R}^{d}\), depending only on \(\kappa \) and \(K_{1}\), such that whenever \(\beta \in K_{1}\) and \(x\in \mathbb {R}^{d}\),

$$\begin{aligned} \sup _{\alpha \in K_{2}}\left[ \left\langle \alpha ,\beta \right\rangle -\frac{1}{2}\left\| \alpha \right\| _{A(x)}^{2}-\frac{1}{2\kappa }\left\| \alpha \right\| ^{2}\right]&=\sup _{\alpha \in \mathbb {R}^{d}}\left[ \left\langle \alpha ,\beta \right\rangle -\frac{1}{2}\left\| \alpha \right\| _{A(x)}^{2}-\frac{1}{2\kappa }\left\| \alpha \right\| ^{2}\right] \\&=\bar{L}_{\kappa }(x,\beta ). \end{aligned}$$

Also, from (5.28),

$$\begin{aligned} \varLambda _{K_{2}}^{n}(x,\beta )&\doteq \sup _{\alpha \in K_{2}}\left[ \langle \alpha ,\beta \rangle -\frac{1}{2}\Vert \alpha \Vert _{A(x)}^{2}-\frac{1}{2\kappa }\Vert \alpha \Vert ^{2}-\frac{K_{DA} }{\sqrt{\varkappa (n)n}}\Vert \alpha \Vert ^{3}\right] \\&\le \varkappa (n)nL_{c}\left( x,\frac{1}{\sqrt{\varkappa (n)n}}\beta \right) . \end{aligned}$$

Note that as \(n\rightarrow \infty \),

$$\begin{aligned} \sup _{(x,\beta )\in \mathbb {R}^{d}\times K_{1}}|\varLambda _{K_{2}}^{n} (x,\beta )-\bar{L}_{\kappa }(x,\beta )|\rightarrow 0. \end{aligned}$$
(5.29)

By Lemma 5.4 and the definitions of \(\gamma ^{n}\) and \(\bar{M}^{n}\), we have

$$\begin{aligned}&\liminf _{n\rightarrow \infty }\varkappa (n)nE\left[ {\displaystyle \sum \limits _{i=0}^{n-1}}\frac{1}{n}R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))\right] \\&\quad \ge \liminf _{n\rightarrow \infty }E\left[ \int _{\mathbb {R}^{d} \times \mathbb {R}^{d}\times \left[ 0,1\right] }\varkappa (n)nL_{c}\left( x,\frac{1}{\sqrt{\varkappa (n)n}}w\right) \gamma ^{n}\left( dx\times dw\times dt\right) \right] \\&\quad \ge \liminf _{n\rightarrow \infty }E\left[ \int _{\mathbb {R}^{d}\times K_{1}\times \left[ 0,1\right] }\varLambda _{K_{2}}^{n}(x, w)\gamma ^{n}\left( dx\times dw\times dt\right) \right] . \end{aligned}$$

For fixed \(K_{1}\), since \(K_{2}\) is bounded, there is \(c\in (0,\infty )\) such that \(|\varLambda _{K_{2}}^{n}(x, w)|\le c(1+\left\| w\right\| )\) for all \(n\in \mathbb {N}\) and also \(|\bar{L}_{\kappa }(x, w)|\le c(1+\left\| w\right\| )\). Using these bounds and (5.18) to control contributions to the integrals from large values of \(\left\| w\right\| \), it follows from (5.29) that the last quantity in the previous display is the same as

$$ \liminf _{n\rightarrow \infty }E\left[ \int _{\mathbb {R}^{d}\times K_{1} \times \left[ 0,1\right] }\bar{L}_{\kappa }(x, w)\gamma ^{n}\left( dx\times dw\times dt\right) \right] . $$

The continuity of \((x,\beta )\mapsto \bar{L}_{\kappa }(x,\beta )\) and Fatou’s lemma thus give

$$\begin{aligned}&\liminf _{n\rightarrow \infty }\varkappa (n)nE\left[ {\displaystyle \sum \limits _{i=0}^{n-1}} \frac{1}{n}R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))\right] \\&\quad \ge E\left[ \int _{\mathbb {R}^{d}\times K_{1}\times \left[ 0,1\right] }\bar{L}_{\kappa }\left( x, w\right) \gamma \left( dx\times dw\times dt\right) \right] . \end{aligned}$$

Since \(\bar{L}_{\kappa }\ge 0\), by the monotone convergence theorem we can replace \(K_{1}\) by \(\mathbb {R}^{d}\) in the last display. Next note that as \(\kappa \rightarrow \infty \),

$$ \bar{L}_{\kappa }\left( x,\beta \right) \uparrow \frac{1}{2}\left\| \beta \right\| _{A^{-1}(x)}^{2} $$

for all \((x,\beta )\in \mathbb {R}^{2d}\). Finally, using the monotone convergence theorem, the decomposition (5.27), and Jensen’s inequality in that order shows that

$$\begin{aligned}&\liminf _{n\rightarrow \infty }\varkappa (n)nE\left[ {\displaystyle \sum \limits _{i=0}^{n-1}} \frac{1}{n}R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))\right] \\&\quad \ge \lim _{\kappa \rightarrow \infty }E\left[ \int _{\mathbb {R}^{d} \times \mathbb {R}^{d}\times \left[ 0,1\right] }\bar{L}_{\kappa }\left( x, w\right) \gamma \left( dx\times dw\times dt\right) \right] \\&\quad =E\left[ \int _{\mathbb {R}^{d}\times \mathbb {R}^{d}\times \left[ 0,1\right] }\frac{1}{2}\left\| w\right\| _{A^{-1}(x)}^{2}\gamma \left( dx\times dw\times dt\right) \right] \\&\quad =E\left[ \int _{0}^{1}\int _{\mathbb {R}^{d}}\frac{1}{2}\left\| w\right\| _{A^{-1}(X^{0}\left( t\right) )}^{2}\bar{M}\left( \left. dw\right| t\right) dt\right] \\&\quad \ge E\left[ \frac{1}{2}\int _{0}^{1}\left\| w(t)\right\| _{A^{-1}(X^{0}(t))}^{2}dt\right] , \end{aligned}$$

which is (5.24). This concludes the proof of Theorem 5.7.    \(\square \)

4 Laplace Upper Bound

In this section we prove the variational lower bound

$$\begin{aligned} \liminf _{n\rightarrow \infty }-\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] \ge \inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R} ^{d})}\left[ I_{M}(\phi )+F\left( \phi \right) \right] , \end{aligned}$$
(5.30)

which corresponds to the Laplace upper bound.

Suppose for each n that \(\{\bar{\mu }_{i}^{n}\}\) comes within 1/n of achieving the infimum in (5.15), so that

$$\begin{aligned}&\liminf _{n\rightarrow \infty }-\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] \nonumber \\&\quad \ge \liminf _{n\rightarrow \infty }E\left[ {\displaystyle \sum \limits _{i=0}^{n-1}} \varkappa (n)R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))+F(\bar{Y}^{n})\right] . \end{aligned}$$
(5.31)

We also have

$$ \sup _{n\in \mathbb {N}}\varkappa (n)nE\left[ {\displaystyle \sum \limits _{i=0}^{n-1}} \frac{1}{n}R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))\right] \le 2\left\| F\right\| _{\infty }+1. $$
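Indeed, since the controls come within 1/n of the infimum in (5.15) and \(|F|\le \left\| F\right\| _{\infty }\),

$$ E\left[ {\displaystyle \sum \limits _{i=0}^{n-1}}\varkappa (n)R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))\right] \le -\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] +\frac{1}{n}+\left\| F\right\| _{\infty }\le 2\left\| F\right\| _{\infty }+1. $$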

Consequently, (5.17) is satisfied with \(K_{E}=2\left\| F\right\| _{\infty }+1\), and by Theorem 5.7, every subsequence of \(\{(\bar{M}^{n},\bar{Y}^{n})\}\) has a further subsequence (we retain n as the index for convenience) along which \((\bar{M}^{n},\bar{Y}^{n})\) converges in distribution to a limit \((\bar{M},\bar{Y})\) related by (5.22)–(5.23), and along which (5.24) holds. Combining this with (5.31) gives

$$\begin{aligned}&\liminf _{n\rightarrow \infty }-\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] \\&\quad \ge \liminf _{n\rightarrow \infty }E\left[ {\displaystyle \sum \limits _{i=0}^{n-1}} \varkappa (n)R(\left. \bar{\mu }_{i}^{n}(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n}))+F(\bar{Y}^{n})\right] \\&\quad \ge E\left[ \int _{0}^{1}\frac{1}{2}\left\| w(s)\right\| _{A^{-1}(X^{0}(s))}^{2}ds+F(\bar{Y})\right] \text {.} \end{aligned}$$

Define \(\phi ^{u}\) for \(u\in \mathscr {L}^{2}([0,1]:\mathbb {R}^{d})\) by

$$\begin{aligned} \phi ^{u}\left( t\right) =\int _{0}^{t}Db(X^{0}(s))\phi ^{u}(s)ds+\int _{0} ^{t}A^{1/2}(X^{0}(s))u(s)ds\text {.} \end{aligned}$$
(5.32)
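Equation (5.32) is a linear ODE in \(\phi ^{u}\) driven by u, and for concrete examples it can be integrated by any standard scheme. The following is a minimal Euler sketch; the callables Db, sqrtA, and u (standing for \(t\mapsto Db(X^{0}(t))\), \(t\mapsto A^{1/2}(X^{0}(t))\), and the control) are hypothetical stand-ins rather than objects defined in the text.

```python
import numpy as np

def euler_phi(Db, sqrtA, u, d, N=1000, T=1.0):
    """Euler scheme for the linear ODE (5.32):
    phi'(t) = Db(t) @ phi(t) + sqrtA(t) @ u(t),  phi(0) = 0.
    """
    dt = T / N
    phi = np.zeros(d)
    path = [phi.copy()]
    for k in range(N):
        t = k * dt
        phi = phi + dt * (Db(t) @ phi + sqrtA(t) @ u(t))
        path.append(phi.copy())
    return np.array(path)  # path[k] approximates phi(k * dt)
```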

Recalling

$$ \bar{Y}(t)=\int _{0}^{t}Db(X^{0}(s))\bar{Y}(s)ds+\int _{0}^{t}w(s)ds, $$

it follows, using the expression for \(I_{M}\) in (5.10) and (5.11), that [for the first inequality below, note that when the integral cost is finite, \(w(s)\) lies in the range of \(A(X^{0}(s))\) for a.e. s, so that \(u(s)\doteq A^{-1/2}(X^{0}(s))w(s)\), with \(A^{-1/2}\) the generalized inverse from the decomposition above (5.6), satisfies \(\phi ^{u}=\bar{Y}\) and \(\int _{0}^{1}\left\| u(s)\right\| ^{2}ds=\int _{0}^{1}\left\| w(s)\right\| _{A^{-1}(X^{0}(s))}^{2}ds\)]

$$\begin{aligned}&E\left[ \int _{0}^{1}\frac{1}{2}\left\| w(s)\right\| _{A^{-1} (X^{0}(s))}^{2}ds+F(\bar{Y})\right] \\&\quad \ge \inf _{u\in \mathscr {L}^{2}([0,1]:\mathbb {R}^{d})}\left[ \int _{0}^{1}\frac{1}{2}\left\| u(s)\right\| ^{2}ds+F(\phi ^{u})\right] \\&\quad =\inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ I_{M} (\phi )+F\left( \phi \right) \right] , \end{aligned}$$

which is the lower bound (5.30).    \(\square \)

5 Laplace Lower Bound

In this section we prove the variational upper bound

$$\begin{aligned} \limsup _{n\rightarrow \infty }-\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] \le \inf _{\phi \in \mathscr {C} ([0,1]:\mathbb {R}^{d})}\left[ I_{M}(\phi )+F\left( \phi \right) \right] , \end{aligned}$$
(5.33)

which is the Laplace lower bound. Note that for \(u, v\in \mathscr {L} ^{2}([0,1]:\mathbb {R}^{d})\),

$$\begin{aligned} \phi ^{u}(t)-\phi ^{v}(t)&=\int _{0}^{t}Db(X^{0}(s))\left( \phi ^{u}(s)-\phi ^{v}(s)\right) ds \\&\quad \, +\int _{0}^{t}A^{1/2}(X^{0}(s))(u(s)-v(s))ds. \end{aligned}$$

Thus, by Gronwall’s inequality for the first inequality below, the Cauchy–Schwarz inequality for the second, and the bound \(\left\| A^{1/2}(x)\right\| \le K_{A}^{1/2}\) for the third,

$$\begin{aligned}&\sup _{t\in [0,1]}\left\| \phi ^{u}(t)-\phi ^{v}(t)\right\| \nonumber \\&\quad \le e^{K_{b}}\int _{0}^{1}\left\| A^{1/2}(X^{0} (s))(u(s)-v(s))\right\| ds\nonumber \\&\quad \le e^{K_{b}}\left( \int _{0}^{1}\left\| A^{1/2}(X^{0} (s))(u(s)-v(s))\right\| ^{2}ds\right) ^{1/2}\nonumber \\&\quad \le e^{K_{b}}K_{A}^{1/2}\left( \int _{0}^{1}\left\| u(s)-v(s)\right\| ^{2}ds\right) ^{1/2}. \end{aligned}$$
(5.34)

Since \(\mathscr {C}([0,1]:\mathbb {R}^{d})\) is dense in \(\mathscr {L} ^{2}([0,1]:\mathbb {R}^{d})\), and since by (5.34) and the continuity of F the map \(u\mapsto \frac{1}{2}\int _{0}^{1}\left\| u(s)\right\| ^{2}ds+F(\phi ^{u})\) is continuous on \(\mathscr {L}^{2}([0,1]:\mathbb {R}^{d})\), the proof of the Laplace lower bound is reduced to showing that for an arbitrary \(u\in \mathscr {C}([0,1]:\mathbb {R}^{d})\),

$$\begin{aligned} \limsup _{n\rightarrow \infty }-\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] \le \frac{1}{2}\int _{0}^{1}\left\| u(s)\right\| ^{2}ds+F\left( \phi ^{u}\right) . \end{aligned}$$
(5.35)

We fix \(u\in \mathscr {C}([0,1]:\mathbb {R}^{d})\) for the remainder of the proof.

Remark 5.12

The proof of the lower bound for the moderate deviation problem differs substantially from the corresponding proof of the large deviation problem, especially in regard to the treatment of degenerate noise. For the case of large deviations, this was handled in Chap. 4 using a mollification. In the moderate deviations setting a simpler argument is possible. This is largely due to the form of the rate function, which is the same as that of the small noise diffusion model of Sect. 3.2, but with time-dependent drift \(Db(X^{0}(t))\) and diffusion matrix \(A^{1/2}\left( X^{0}\left( t\right) \right) \). As just discussed, with this form one can find a nearly optimal trajectory for the limit variational problem of the form \(\phi ^{u}\), with u continuous rather than just measurable, which greatly facilitates the construction of nearly optimal controls for the prelimit in the proof of the lower bound. This is not possible for the general model of Chap. 4, since it is not useful to view \(X^{n}\) there as a continuous or nearly continuous mapping on an “exogenous” noise process. In this sense, the moderate deviation problem shares some of the simplifying features of the continuous time models discussed in Sect. 3.2 and at greater length in later chapters.

We now turn to the proof of (5.35) for the given \(u\in \mathscr {C} ([0,1]:\mathbb {R}^{d})\). The main difficulty related to the possible degeneracy of the noise is the following. Since at the prelimit, the controlled processes \(\bar{X}^{n}\) may be close to but not precisely equal to \(X^{0}\), the range of \(A(\bar{X}^{n}(i/n))\) can differ from that of \(A(X^{0}(i/n))\) (at least in the degenerate case). Because of this, the construction of a control that approximates \(A^{1/2}(X^{0}(i/n))u(i/n)\) with nearly optimal cost is not as straightforward as in the nondegenerate case [it is simple in that case due to the invertibility of \(A^{1/2}(\bar{X}^{n}(i/n))\)].

Recall the orthogonal decomposition of \(A^{-1}(x)\) discussed above (5.6). For \(\kappa \in (0,\infty )\), define

$$ A_{\kappa }^{-1}(x)=Q(x)\varLambda _{\kappa }^{-1}(x)Q^{T}(x), $$

where \(\varLambda _{\kappa }^{-1}(x)\) is the diagonal matrix such that \(\varLambda _{ii,\kappa }^{-1}(x)=\varLambda _{ii}^{-1}(x)\) when \(\varLambda _{ii} ^{-1}(x)\le \kappa ^{2}\) and \(\varLambda _{ii,\kappa }^{-1}(x)=\kappa ^{2}\) when \(\varLambda _{ii}^{-1}(x)>\kappa ^{2}\). Note that by [155, Theorem 6.2.37], \(A^{1/2}(x)\), \(A_{\kappa }^{-1}(x)\), and \(A_{\kappa }^{-1/2}(x)\) are continuous functions of A(x), and consequently they are also continuous functions of \(x\in \mathbb {R}^{d}\). In addition, define

$$ u_{\kappa }(s)=\left\{ \begin{array} [c]{cc} u(s) &{} \text {for }\left\| u(s)\right\| \le \kappa ,\\ \frac{\kappa u(s)}{\left\| u(s)\right\| } &{} \text {for }\left\| u(s)\right\| >\kappa . \end{array} \right. $$
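In matrix terms, \(A_{\kappa }^{-1}(x)\) caps the inverse eigenvalues of A(x) at \(\kappa ^{2}\) (the inverses of zero eigenvalues, read as \(+\infty \), are capped as well), while \(u_{\kappa }\) is a radial truncation of u. The following is a minimal NumPy sketch of the two operations; the function names are ours.

```python
import numpy as np

def A_kappa_inv_sqrt(A, kappa):
    """Truncated inverse square root A_kappa^{-1/2} = Q Lambda_kappa^{-1/2} Q^T.

    Flooring the eigenvalues at 1/kappa^2 before inverting is equivalent to
    capping the inverse eigenvalues at kappa^2, so the result is well defined
    even when A is degenerate (singular).
    """
    lam, Q = np.linalg.eigh(A)                        # A = Q diag(lam) Q^T
    lam_inv = 1.0 / np.maximum(lam, 1.0 / kappa**2)   # min(1/lam, kappa^2)
    return Q @ np.diag(np.sqrt(lam_inv)) @ Q.T

def u_clip(u, kappa):
    """Radial truncation of the control, so that ||u_kappa(s)|| <= kappa."""
    nrm = np.linalg.norm(u)
    return u if nrm <= kappa else kappa * u / nrm
```

Combined with the Euler sketch given after (5.32), integrating with sqrtA(t) replaced by the product \(A(X^{0}(t))A_{\kappa }^{-1/2}(X^{0}(t))\) and u replaced by \(u_{\kappa }\) produces the trajectory \(\phi ^{u,\kappa }\) of (5.36) below.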

Let \(\phi ^{u,\kappa }(t)=\phi ^{A^{1/2}(X^{0})A_{\kappa }^{-1/2}(X^{0})u_{\kappa }}(t)\), and note that \(\phi ^{u,\kappa }\) solves

$$\begin{aligned} \phi ^{u,\kappa }(t)&=\int _{0}^{t}Db(X^{0}(s))\phi ^{u,\kappa } (s)ds\nonumber \\&\quad +\int _{0}^{t}A(X^{0}(s))A_{\kappa }^{-1/2}(X^{0}(s))u_{\kappa }(s)ds\text {.} \end{aligned}$$
(5.36)

For n sufficiently large, since \(\left\| A_{\kappa }^{-1/2}(x)\right\| \le \kappa \) and \(\left\| u_{\kappa }(s)\right\| \le \kappa \),

$$ \max _{0\le i\le n-1}\frac{1}{\sqrt{\varkappa (n)n}}\left\| A_{\kappa }^{-1/2}\left( X^{0}\left( i/n\right) \right) u_{\kappa }\left( i/n\right) \right\| \le \frac{1}{\sqrt{\varkappa (n)n}}\kappa ^{2} \le \lambda _{DA}, $$

and we can define the sequence \(\{(\bar{X}^{n,\kappa },\bar{Y}^{n,\kappa } ,\bar{M}^{n,\kappa }, w^{n,\kappa })\}\) as in Construction 5.3 with

$$\begin{aligned} \bar{\mu }_{i}^{n,\kappa }(dy)&=\exp \left\{ \left\langle y,\frac{1}{\sqrt{\varkappa (n)n}}A_{\kappa }^{-1/2}\left( X^{0}\left( i/n\right) \right) u_{\kappa }\left( i/n\right) \right\rangle \right. \\&\quad \quad \quad \quad \left. -H_{c}\left( \bar{X}_{i}^{n,\kappa },\frac{1}{\sqrt{\varkappa (n)n}}A_{\kappa }^{-1/2}\left( X^{0}\left( i/n\right) \right) u_{\kappa }\left( i/n\right) \right) \right\} \theta (dy|\bar{X}_{i}^{n,\kappa }). \end{aligned}$$

We will use

$$ \int _{\mathbb {R}^{d}}y\exp \{\left\langle y,\alpha \right\rangle -H_{c} (x,\alpha )\}\theta (dy|x)=D_{\alpha }H_{c}(x,\alpha ) $$

[the standard identity for the mean of an exponentially tilted distribution, obtained by differentiating the log moment generating function \(H_{c}(x,\cdot )\) under the integral sign] and the formula

$$ D_{\alpha }H_{c}(x,\alpha )=D_{\alpha }H_{c}(x,\alpha )-D_{\alpha }H_{c} (x, 0)=\int _{0}^{1}\left( \frac{d}{ds}D_{\alpha }H_{c}(x, s\alpha )\right) ds, $$

where \(D_{\alpha }H_{c}(x, 0)=0\) follows from (5.4). Using (5.5) to approximate the second derivatives that appear on the right side of the last display, the bounds (5.7) imply that for \(\left\| \alpha \right\| \le \lambda _{DA}\),

$$\begin{aligned} \left\| \int _{\mathbb {R}^{d}}y\exp \{\left\langle y,\alpha \right\rangle -H_{c}(x,\alpha )\}\theta (dy|x)-A(x)\alpha \right\| \le K_{DA}\left\| \alpha \right\| ^{2}\text {.} \end{aligned}$$
(5.37)
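For completeness, here is how (5.37) can be obtained, assuming that (5.5) and (5.7) combine to give a bound of the form \(\Vert D_{\alpha }^{2}H_{c}(x,\alpha )-A(x)\Vert \le 2K_{DA}\left\| \alpha \right\| \) for \(\left\| \alpha \right\| \le \lambda _{DA}\), with \(A(x)=D_{\alpha }^{2}H_{c}(x, 0)\): by the previous display,

$$ \left\| D_{\alpha }H_{c}(x,\alpha )-A(x)\alpha \right\| =\left\| \int _{0}^{1}\left( D_{\alpha }^{2}H_{c}(x, s\alpha )-A(x)\right) \alpha \, ds\right\| \le \int _{0}^{1}2K_{DA}s\left\| \alpha \right\| ^{2}ds=K_{DA}\left\| \alpha \right\| ^{2}. $$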

The next result identifies the limit in probability of the controlled processes and an asymptotic bound for the relative entropies. Recall that \(u\in \mathscr {C}(\left[ 0,1\right] :\mathbb {R}^{d})\) has been fixed.

Theorem 5.13

Let \(\kappa \in (0,\infty )\) be given. Consider the controls \(\{\bar{\mu }_{i}^{n,\kappa }\}\) and random variables \(\{(\bar{X}^{n,\kappa },\bar{Y}^{n,\kappa },\bar{M}^{n,\kappa }, w^{n,\kappa })\}\) as in Construction 5.3 with \(\{\bar{\mu }_{i}^{n}\}\) replaced by \(\{\bar{\mu }_{i}^{n,\kappa }\}\), and define \(\phi ^{u,\kappa }\) by (5.36). Then

$$\begin{aligned} \bar{Y}^{n,\kappa }\rightarrow \phi ^{u,\kappa } \end{aligned}$$
(5.38)

in \(\mathscr {C}([0,1]:\mathbb {R}^{d})\) in probability, and

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\varkappa (n)nE\left[ \frac{1}{n} {\displaystyle \sum \limits _{i=0}^{n-1}} R\left( \left. \bar{\mu }_{i}^{n,\kappa }(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n,\kappa })\right) \right] \nonumber \\&\quad \quad \quad \le \frac{1}{2}\int _{0}^{1}\left\| A_{\kappa } ^{-1/2}(X^{0}(s))u_{\kappa }(s)\right\| _{A(X^{0}(s))}^{2} ds. \end{aligned}$$
(5.39)

Proof

From (5.8) and (5.37), for all n large enough that \(\kappa ^{2}/\sqrt{\varkappa (n)n}\le \lambda _{DA}\) and with \(s_{i} ^{n}\doteq i/n\),

$$\begin{aligned}&R\left( \left. \bar{\mu }_{i}^{n,\kappa }(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n,\kappa })\right) \\&\quad =\int _{\mathbb {R}^{d}}\left\langle y,\frac{1}{\sqrt{\varkappa (n)n} }A_{\kappa }^{-1/2}\left( X^{0}\left( s_{i}^{n}\right) \right) u_{\kappa }\left( s_{i}^{n}\right) \right\rangle \bar{\mu }_{i}^{n,\kappa }(dy)\\&\qquad -H_{c}\left( \bar{X}_{i}^{n,\kappa },\frac{1}{\sqrt{\varkappa (n)n} }A_{\kappa }^{-1/2}\left( X^{0}\left( s_{i}^{n}\right) \right) u_{\kappa }\left( s_{i}^{n}\right) \right) \\&\quad \le \frac{1}{{\varkappa (n)n}}\left\langle A\left( \bar{X} _{i}^{n,\kappa }\right) A_{\kappa }^{-1/2}\left( X^{0}\left( s_{i} ^{n}\right) \right) u_{\kappa }\left( s_{i}^{n}\right) , A_{\kappa }^{-1/2}\left( X^{0}\left( s_{i}^{n}\right) \right) u_{\kappa }\left( s_{i}^{n}\right) \right\rangle \\&\qquad -\frac{1}{2{\varkappa (n)n}}\left\langle A\left( \bar{X}_{i}^{n,\kappa }\right) A_{\kappa }^{-1/2}\left( X^{0}\left( s_{i} ^{n}\right) \right) u_{\kappa }\left( s_{i}^{n}\right) ,\right. \\&\qquad \qquad \qquad \left. \quad A_{\kappa } ^{-1/2}\left( X^{0}\left( s_{i}^{n}\right) \right) u_{\kappa }\left( s_{i}^{n}\right) \right\rangle +\frac{2}{(\varkappa (n)n)^{3/2}}K_{DA} \kappa ^{6}\\&\quad =\frac{1}{2\varkappa (n)n}\left\| A_{\kappa }^{-1/2}\left( X^{0}\left( s_{i}^{n}\right) \right) u_{\kappa }\left( s_{i}^{n}\right) \right\| _{A(\bar{X}_{i}^{n,\kappa })}^{2}+\frac{2}{(\varkappa (n)n)^{3/2} }K_{DA}\kappa ^{6}. \end{aligned}$$

Consequently,

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\varkappa (n)nE\left[ \frac{1}{n} {\displaystyle \sum \limits _{i=0}^{n-1}} R\left( \left. \bar{\mu }_{i}^{n,\kappa }(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n,\kappa })\right) \right] \\&\quad \le \limsup _{n\rightarrow \infty }\frac{1}{2}E\left[ \frac{1}{n} {\displaystyle \sum \limits _{i=0}^{n-1}} \left\| A_{\kappa }^{-1/2}\left( X^{0}\left( i/n\right) \right) u_{\kappa }\left( i/n\right) \right\| _{A(\bar{X}_{i}^{n,\kappa })} ^{2}\right] ,\nonumber \end{aligned}$$
(5.40)

where in fact, since \(\left\| A_{\kappa }^{-1/2}(X^{0}(s))u_{\kappa }(s)\right\| \le \kappa ^{2}\) and \(\left\| A(x)\right\| \le K_{A}\),

$$ \limsup _{n\rightarrow \infty }\frac{1}{2}E\left[ \frac{1}{n} {\displaystyle \sum \limits _{i=0}^{n-1}} \left\| A_{\kappa }^{-1/2}\left( X^{0}\left( i/n\right) \right) u_{\kappa }\left( i/n\right) \right\| _{A(\bar{X}_{i}^{n,\kappa })} ^{2}\right] \le \frac{1}{2}\kappa ^{4}K_{A}. $$

Therefore, (5.17) is satisfied by \(\{\bar{\mu }_{i}^{n,\kappa }\}\). Thus the conclusions of Theorem 5.7 hold with \(\bar{Y}^{n}\), \(\bar{M}^{n}\) replaced by \(\bar{Y}^{n,\kappa }\), \(\bar{M}^{n,\kappa } \). Choose a subsequence (keeping n as the index for convenience) along which \(\{(\bar{M}^{n,\kappa },\bar{Y}^{n,\kappa })\}\) converges weakly to some limit \((\bar{M}^{\kappa },\bar{Y}^{\kappa })\), where \([\bar{M}^{\kappa }]_{2}\) is Lebesgue measure and

$$ \bar{Y}^{\kappa }(t)=\int _{0}^{t}Db(X^{0}(s))\bar{Y}^{\kappa }(s)ds+\int _{0} ^{t}\int _{\mathbb {R}^{d}}w[\bar{M}^{\kappa }]_{1|2}(dw\left| s\right. )ds. $$

Then, since by construction \(\bar{X}^{n,\kappa }=X^{n,0}+\bar{Y}^{n,\kappa }/\sqrt{\varkappa (n)n}\), while \(\left\| X^{n,0}-X^{0}\right\| _{\infty }=O(1/n)\) and \(\varkappa (n)n\rightarrow \infty \), the convergence \(\bar{Y}^{n,\kappa }\rightarrow \bar{Y}^{\kappa }\) implies

$$ \sup _{t\in [0,1]}\left\| \bar{X}^{n,\kappa }(t)-X^{0}(t)\right\| \rightarrow 0 $$

in probability. Because of this and the continuity of \(A^{1/2}(x)\), we have (recall \(s^{n}(t)\doteq \left\lfloor nt\right\rfloor /n\))

$$ \sup _{t\in [0,1]}\left\| A^{1/2}(\bar{X}^{n,\kappa }(s^{n} (t)))-A^{1/2}(X^{0}(s^{n}(t)))\right\| \rightarrow 0 $$

in probability. Moreover, the continuity of \(t\mapsto A^{1/2}(X^{0} (t))A_{\kappa }^{-1/2}(X^{0}(t))u_{\kappa }(t)\) gives

$$\begin{aligned}&\sup _{t\in [0,1]}\left\| A^{1/2}(X^{0}(s^{n}(t)))A_{\kappa } ^{-1/2}(X^{0}(s^{n}(t)))u_{\kappa }(s^{n}(t))\right. \\&\qquad \qquad \left. -A^{1/2}(X^{0}(t))A_{\kappa }^{-1/2}(X^{0}(t))u_{\kappa }(t)\right\| \rightarrow 0\text {.} \end{aligned}$$

Combining these limits, and using the fact that \(A_{\kappa }^{-1/2} (X^{0}(t))u_{\kappa }(t)\) is uniformly bounded, shows that

$$\begin{aligned}&\sup _{t\in [0,1]}\left\| A^{1/2}(\bar{X}^{n,\kappa }(s^{n} (t)))A_{\kappa }^{-1/2}(X^{0}(s^{n}(t)))u_{\kappa }(s^{n}(t))\right. \\&\qquad \qquad \left. -A^{1/2}(X^{0}(t))A_{\kappa }^{-1/2}(X^{0}(t))u_{\kappa }(t)\right\| \rightarrow 0\nonumber \end{aligned}$$
(5.41)

in probability. This combined with the uniform bounds allows the use of the dominated convergence theorem to show that

$$\begin{aligned}&\limsup _{n\rightarrow \infty }E\left[ \frac{1}{2}\int _{0}^{1}\left\| A_{\kappa }^{-1/2}(X^{0}(s^{n}(t)))u_{\kappa }(s^{n}(t))\right\| _{A(\bar{X}^{n,\kappa }(s^{n}(t)))}^{2}dt\right] \\&\quad =\frac{1}{2}\int _{0}^{1}\left\| A_{\kappa }^{-1/2}(X^{0} (t))u_{\kappa }(t)\right\| _{A(X^{0}(t))}^{2}dt. \end{aligned}$$

Combining this with (5.40) demonstrates (5.39).

To prove (5.38), we will show that in fact,

$$ \bar{M}^{\kappa }(dw\times dt)=\delta _{A(X^{0}(t))A_{\kappa }^{-1/2} (X^{0}(t))u_{\kappa }(t)}(dw)dt. $$

For all \(\sigma >0\), let

$$ G_{\sigma }\doteq \left\{ (z, t)\in \mathbb {R}^{d}\times [0,1]:\left\| z-A(X^{0}(t))A_{\kappa }^{-1/2}(X^{0}(t))u_{\kappa }(t)\right\| \le \sigma \right\} , $$

and note that, since \(G_{\sigma }\) is closed, the mapping \(m\mapsto m(G_{\sigma })\) is bounded and upper semicontinuous with respect to weak convergence, and hence \(\limsup _{n\rightarrow \infty }E[\bar{M}^{n,\kappa }(G_{\sigma })]\le E[\bar{M}^{\kappa }(G_{\sigma })]\). Note also that

$$\begin{aligned}&E[\bar{M}^{n,\kappa }(G_{\sigma })]\\&\quad \ge P\left[ \sup _{t\in [0,1]}\left\| \sqrt{\varkappa (n)n}\int _{\mathbb {R}^{d}}y\bar{\mu }_{\left\lfloor nt\right\rfloor } ^{n,\kappa }(dy)-A(X^{0}(t))A_{\kappa }^{-1/2}(X^{0}(t))u_{\kappa }(t)\right\| \le \sigma \right] \text {.} \end{aligned}$$

However, by (5.37) we can choose n large enough to make

$$ \sup _{t\in [0,1]}\left\| \sqrt{\varkappa (n)n}\int _{\mathbb {R}^{d} }y\bar{\mu }_{\left\lfloor nt\right\rfloor }^{n,\kappa }(dy)-A\left( \bar{X}^{n,\kappa }\left( s^{n}(t)\right) \right) A_{\kappa }^{-1/2}\left( X^{0}\left( s^{n}(t)\right) \right) u_{\kappa }\left( s^{n}(t)\right) \right\| $$

arbitrarily small, and the proof that

$$\begin{aligned}&\sup _{t\in [0,1]}\left\| A(\bar{X}^{n,\kappa }(s^{n}(t)))A_{\kappa }^{-1/2}(X^{0}(s^{n}(t)))u_{\kappa }(s^{n}(t))\right. \\&\qquad \quad \left. -A(X^{0}(t))A_{\kappa }^{-1/2}(X^{0}(t))u_{\kappa }(t)\right\| \rightarrow 0 \end{aligned}$$

in probability is identical to the proof of (5.41). Hence \(\lim _{n\rightarrow \infty }E[\bar{M}^{n,\kappa }(G_{\sigma })]=1\), so that \(E[\bar{M}^{\kappa }(G_{\sigma })]=1\) for every \(\sigma >0\), and therefore \(E[\bar{M}^{\kappa }(\cap _{m\in \mathbb {N}} G_{1/m})]=1\). This implies that with probability 1,

$$ [\bar{M}^{\kappa }]_{1|2}(dw\left| t\right. )=\delta _{A(X^{0} (t))A_{\kappa }^{-1/2}(X^{0}(t))u_{\kappa }(t)}(dw) $$

for a.e. t. It follows that

$$ \bar{Y}^{\kappa }(t)=\int _{0}^{t}Db(X^{0}(s))\bar{Y}^{\kappa }(s)ds+\int _{0} ^{t}A(X^{0}(s))A_{\kappa }^{-1/2}(X^{0}(s))u_{\kappa }(s)ds, $$

and therefore \(\bar{Y}^{\kappa }=\phi ^{u,\kappa }\). Since the limit is deterministic and does not depend on the subsequence, \(\bar{Y}^{n,\kappa }\rightarrow \phi ^{u,\kappa }\) in distribution along the full sequence, and convergence in distribution to a constant implies convergence in probability. This gives (5.38) and completes the proof.    \(\square \)

The second theorem in this section allows us to approximate \(F(\phi ^{u})\) by \(F(\phi ^{u,\kappa })\) and \(\frac{1}{2}\int _{0}^{1}\left\| u(s)\right\| ^{2}ds\) by

$$ \frac{1}{2}\int _{0}^{1}\left\| A_{\kappa }^{-1/2}(X^{0}(s))u_{\kappa }(s)\right\| _{A(X^{0}(s))}^{2}ds. $$

Recall that \(u\in \mathscr {C}([0,1]:\mathbb {R}^{d})\) has been given.

Theorem 5.14

Define \(\phi ^{u}\) by (5.32) and \(\phi ^{u,\kappa }\) by (5.36). Then \(\phi ^{u,\kappa }\rightarrow \phi ^{u}\) in \(\mathscr {C}([0,1]:\mathbb {R}^{d})\) and

$$ \limsup _{\kappa \rightarrow \infty }\frac{1}{2}\int _{0}^{1}\left\| A_{\kappa }^{-1/2}(X^{0}(s))u_{\kappa }(s)\right\| _{A(X^{0}(s))}^{2}ds\le \frac{1}{2}\int _{0}^{1}\left\| u(s)\right\| ^{2}ds\text {.} $$

Proof

Note that, since \(\varLambda _{ii}(x)\varLambda _{ii,\kappa }^{-1}(x)=\min \{1,\varLambda _{ii}(x)\kappa ^{2}\}\le 1\) for each i and \(\left\| u_{\kappa }(s)\right\| \le \left\| u(s)\right\| \),

$$ \left\| A^{1/2}(X^{0}(s))A_{\kappa }^{-1/2}(X^{0}(s))u_{\kappa }(s)\right\| \le \left\| u(s)\right\| $$

for all \(s\in [0,1]\). Thus it is automatic that

$$ \limsup _{\kappa \rightarrow \infty }\frac{1}{2}\int _{0}^{1}\left\| A_{\kappa }^{-1/2}(X^{0}(s))u_{\kappa }(s)\right\| _{A(X^{0}(s))}^{2}ds\le \frac{1}{2}\int _{0}^{1}\left\| u(s)\right\| ^{2}ds\text {.} $$

Also, for \(x\in \mathbb {R}^{d}\),

$$ A(x)A_{\kappa }^{-1/2}(x)=Q(x)\varLambda (x)\varLambda _{\kappa }^{-1/2}(x)Q^{T} (x)\rightarrow Q(x)\varLambda ^{1/2}(x)Q^{T}(x)=A^{1/2}(x). $$

Since \(u_{\kappa }(s)\rightarrow u(s)\) for all \(s\in [0,1]\),

$$\begin{aligned} A(X^{0}(s))A_{\kappa }^{-1/2}(X^{0}(s))u_{\kappa }(s)\rightarrow A^{1/2} (X^{0}(s))u(s) \end{aligned}$$
(5.42)

pointwise. Since \(u\in \mathscr {L}^{2}([0,1]:\mathbb {R}^{d})\), by the dominated convergence theorem, (5.42) also holds in \(\mathscr {L}^{2} ([0,1]:\mathbb {R}^{d})\). Combining this with the second inequality in (5.34) shows that \(\phi ^{u,\kappa }\rightarrow \phi ^{u}\) in \(\mathscr {C}([0,1]:\mathbb {R}^{d})\).    \(\square \)
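As a quick numerical sanity check of the first display in the proof (not part of the argument), one can verify \(\Vert A^{1/2}A_{\kappa }^{-1/2}u_{\kappa }\Vert \le \Vert u\Vert \) pointwise on randomly generated degenerate matrices, reusing the sketch functions A_kappa_inv_sqrt and u_clip given after the definition of \(u_{\kappa }\):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 2))
A = B @ B.T                      # rank-2 (degenerate) 3x3 PSD matrix

lam, Q = np.linalg.eigh(A)
A_sqrt = Q @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ Q.T

for kappa in (0.5, 2.0, 10.0):
    for _ in range(100):
        u = 5.0 * rng.standard_normal(3)
        v = A_sqrt @ A_kappa_inv_sqrt(A, kappa) @ u_clip(u, kappa)
        assert np.linalg.norm(v) <= np.linalg.norm(u) + 1e-9
```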

Using (5.15) and the fact that any given control is suboptimal, we obtain

$$ -\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] \le E\left[ {\displaystyle \sum \limits _{i=0}^{n-1}} \varkappa (n)R\left( \left. \bar{\mu }_{i}^{n,\kappa }(\cdot )\right\| \theta (\cdot |\bar{X}_{i}^{n,\kappa })\right) +F(\bar{Y}^{n,\kappa })\right] . $$

Using Theorem 5.13, together with the boundedness and continuity of F and the dominated convergence theorem, this implies

$$\begin{aligned}&\limsup _{n\rightarrow \infty }-\varkappa (n)\log E\left[ e^{-\frac{1}{\varkappa (n)}F(Y^{n})}\right] \\&\quad \le \frac{1}{2}\int _{0}^{1}\left\| A_{\kappa }^{-1/2}(X^{0} (s))u_{\kappa }(s)\right\| _{A(X^{0}(s))}^{2}ds+F(\phi ^{u,\kappa })\text {.} \end{aligned}$$

Sending \(\kappa \rightarrow \infty \) and using Theorem 5.14 gives (5.35), which completes the proof of (5.33).    \(\square \)

6 Notes

Among the earliest papers to study moderate deviations are those by Rubin and Sethuraman [222], Ghosh [146], and Michel [200]. See the introduction of [41] for a more complete discussion of work in this area. While a number of settings have been considered, to the authors’ knowledge the first papers to consider moderate deviations for small noise processes around solutions to general nonlinear ODEs (rather than constant-velocity trajectories) are [100] for discrete time models, upon which this chapter is based, and [41] for continuous time processes. As noted in the introduction to the chapter, the proof of the moderate deviation principle presented here is neither uniformly harder nor easier than its large deviation counterpart, at least when one is using weak convergence methods. In particular, the Laplace upper bound is more difficult in the moderate deviations setting, owing to the difficulty of exploiting tightness in the convergence analysis. (The case of solutions to SDEs driven by Brownian motion, which is given as an example in Chap. 3, is in fact much easier, owing to the fact that the driving noise is already Gaussian.) The conditions assumed here are also not strictly weaker than those of Chap. 4, mainly in that additional smoothness is needed for the proper centering and rescaling. Moderate deviation principles will also appear in Chaps. 10 and 13.