In Chap. 3 we presented several examples of representations and how they could be used for large deviation analysis. A simplifying feature of all the examples of Chap. 3 is that the process models (e.g., empirical measure, solution to an SDE) could be thought of as a “nice” functional of a process that is “white” in the time variable, by which we mean independent in the setting of discrete time, and with independent increments in the setting of continuous time (see Sect. 3.5 for what is meant by a nice functional in the case of small noise SDEs).

In this chapter we study a model for which there is, in general, no convenient representation as a functional of white noise. Note that we do not claim that such a representation is impossible, but rather that it will not (in general) be useful, e.g., in proving law of large numbers limits. Because of this feature, a more complex representation and weak convergence analysis cannot be avoided. In particular, the “base” measure in the representation will be a Markov measure rather than a product measure, and the process model will be a general “small noise” Markov process. The model provides a substantial generalization of the random walk considered in Cramér’s theorem. It occurs frequently in stochastic systems theory, e.g., stochastic approximation and related recursive algorithms [18, 182, 193], where the rate function can be used to define a rate of convergence [102]. The model also arises as a discrete time approximation to various continuous time models, such as the small noise SDE in Sect. 3.2.1 of Chap. 3, and indeed provides an alternative approach to proving large deviation estimates for such models (though we much prefer the direct approach of Chap. 3).

1 Process Model

We begin with a description of the process model. Suppose that \(\theta (dy|x)\) is a stochastic kernel on \(\mathbb {R}^{d}\) given \(\mathbb {R}^{d}\). One can construct a probability space that supports iid random vector fields \(\left\{ v_{i}(x), i\in \mathbb {N}_{0}, x\in \mathbb {R}^{d}\right\} \) , with the property that for all \(x\in \mathbb {R}^{d}\), \(v_{i}(x)\) has distribution \(\theta (\cdot |x)\). To be precise, there exists a probability space \((\varOmega ,\mathscr {F}, P)\) such that for each \(i\in \mathbb {N}_{0}\), \(v_{i}\) is a measurable map from \(\mathbb {R}^{d}\times \varOmega \) to \(\mathbb {R}^{d}\); for \(k\in \mathbb {N}\) and distinct \(i_{1},\ldots , i_{k}\in \mathbb {N}_{0}\) and \(x_{i_{1}},\ldots , x_{i_{k}}\in \mathbb {R}^{d}\), the random vectors \(v_{i_{1}}(x_{i_{1}}),\ldots , v_{i_{k}}(x_{i_{k}})\) are mutually independent; and for each \(i\in \mathbb {N}_{0}\), \(v_{i}(x)\) has distribution \(\theta (\cdot |x)\). We then define for each \(n\in \mathbb {N}\) a Markov process \(\left\{ X_{i}^{n}\right\} _{i=0,\ldots , n}\) by setting

$$\begin{aligned} X_{i+1}^{n}=X_{i}^{n}+\frac{1}{n}v_{i}(X_{i}^{n}),{\quad }X_{0}^{n}=x_{0}. \end{aligned}$$
(4.1)

This discrete time process is interpolated into continuous time according to

$$\begin{aligned} X^{n}(t)=X_{i}^{n}+\left[ X_{i+1}^{n}-X_{i}^{n}\right] \left( nt-i\right) ,\quad t\in [i/n, i/n+1/n]. \end{aligned}$$
(4.2)

The goal of this chapter is to study a large deviation principle for the sequence \(\{X^{n}\}_{n\in \mathbb {N}}\) of \(\mathscr {C}([0,T]:\mathbb {R}^{d})\)-valued random variables.

Example 4.1

Suppose that for each \(x\in \mathbb {R}^{d}\), \(v_{i}(x)\) has a normal distribution with mean b(x) and covariance \(\sigma (x)\sigma ^{T}(x)\), where b and \(\sigma \) are continuous. Then \(X^{n}(t)\) is the Euler approximation with step size 1/n to the SDE (3.15) with drift coefficient b, diffusion coefficient \(\sigma \), and \(\varepsilon =1/n\).
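As a minimal illustration (not from the text: the one-dimensional choices \(b(x)=-x\), \(\sigma (x)=1\), and \(x_{0}=1\) are assumptions made only for this sketch), the recursion (4.1) and the interpolation (4.2) can be simulated as follows.

```python
import numpy as np

def simulate_chain(n, x0=1.0, b=lambda x: -x, sigma=lambda x: 1.0, rng=None):
    """Simulate X_0^n, ..., X_n^n from (4.1) with Gaussian theta(.|x)."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.empty(n + 1)
    X[0] = x0
    for i in range(n):
        v = b(X[i]) + sigma(X[i]) * rng.standard_normal()  # v_i(X_i^n) ~ N(b(x), sigma(x)^2)
        X[i + 1] = X[i] + v / n                            # small-noise step of size 1/n
    return X

def interpolate(X, t):
    """The piecewise linear interpolation (4.2): X^n(t) for t in [0, 1]."""
    n = len(X) - 1
    i = min(int(n * t), n - 1)
    return X[i] + (X[i + 1] - X[i]) * (n * t - i)

X = simulate_chain(n=1000)
print(interpolate(X, 0.5))  # near exp(-0.5), the law of large numbers limit of X^n(0.5)
```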

Example 4.2

For an example in the form of a stochastic approximation algorithm, take \(v_{i}(x)=-\nabla V(x)+w_{i}\), where the \(w_{i}\) are iid with \(Ew_{i}=0\) and V is a smooth function. In this case, 1/n is the “gain” of the algorithm [18, 182].
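A similarly minimal sketch of this example, under the illustrative assumptions \(V(x)=x^{2}/2\) (so \(\nabla V(x)=x\)) and standard Gaussian \(w_{i}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, x = 10_000, 5.0                   # x plays the role of X_i^n, started at x_0 = 5
for i in range(n):
    w = rng.standard_normal()        # iid noise with E[w_i] = 0
    x += (-x + w) / n                # v_i(x) = -grad V(x) + w_i, gain 1/n
print(x)  # near x_0 * exp(-1): over [0, 1] the iterate tracks the ODE dx/dt = -x
```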

Of course, to prove an LDP for \(\left\{ X^{n}\right\} _{n\in \mathbb {N}}\), additional assumptions must be made. For \(x\in \mathbb {R}^{d}\) and \(\alpha \in \mathbb {R}^{d}\), define

$$ H(x,\alpha )\doteq \log Ee^{\left\langle \alpha , v_{i}(x)\right\rangle }. $$
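For instance, in the Gaussian setting of Example 4.1, the normal moment-generating function gives (a worked special case)

$$ H(x,\alpha )=\left\langle \alpha , b(x)\right\rangle +\frac{1}{2}\left\langle \alpha ,\sigma (x)\sigma ^{T}(x)\alpha \right\rangle , $$

so part (a) of the following condition holds if b and \(\sigma \) are bounded.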

Condition 4.3

(a) For each \(\alpha \in \mathbb {R}^{d}\), \(\sup _{x\in \mathbb {R}^{d} }H(x,\alpha )<\infty \).

(b) The mapping \(x\mapsto \theta (\cdot |x)\) from \(\mathbb {R}^{d}\) to \(\mathscr {P}(\mathbb {R}^{d})\) is continuous in the topology of weak convergence.

The first condition is not needed for an LDP to hold. However, if \(H(x,\alpha )=\infty \) for some values of x and \(\alpha \), then \(v_{i}(x)\) has relatively heavy tails in certain directions. In that case, paths with jumps may be important from the perspective of large deviations, and the continuous-path setting used here is no longer appropriate. The second condition can also be weakened. However, this often leads to a qualitatively different form of the rate function, and process models that violate this condition are said to have “discontinuous statistics” [95, 98]. For an example of such a process, but in continuous time, see Chap. 13.

2 The Representation

The first issue to resolve is the formulation of a representation that reflects the natural structure of the process model. As noted at the beginning of the chapter, it is possible to represent \(\left\{ X^{n}\right\} \) in terms of iid random variables, e.g., in the form \(X_{i+1}^{n}=X_{i}^{n} +\frac{1}{n}g(X_{i}^{n}, U_{i})\), where g is measurable and the \(\left\{ U_{i}, i\in \mathbb {N}_{0}\right\} \) are iid random variables with uniform distribution on [0, 1]. Although this form would allow a representation in terms of an iid base measure, it would not be useful. This is because the map g is not in general continuous in x, and hence this formulation is poorly suited for even a law of large numbers analysis.

An alternative and more useful representation follows from the form (4.1) and the continuity of \(x\mapsto \theta (\cdot |x)\). Following our convention, we present only the representation needed to prove an LDP on \(\mathscr {C}([0,1]:\mathbb {R}^{d})\), but the analogous representation holds with [0, 1] replaced by any interval [0, T], \(T<\infty \). The line of argument used to prove the LDP will adapt the arguments used for Sanov’s theorem and Cramér’s theorem to this functional setting. However, obtaining “process-level” information requires a more complicated empirical measure than the one used for Sanov’s theorem. Define \(L^{n}\) by

$$\begin{aligned} L^{n}(A\times B)\doteq \int _{B}L^{n}(A|t)dt,\quad L^{n}(A|t)\doteq \delta _{v_{i}(X_{i}^{n})}(A)\text { if }t\in [i/n, i/n+1/n) \end{aligned}$$
(4.3)

for Borel sets \(A\subset \mathbb {R}^{d}\) and \(B\subset [0,1]\). This measure and controlled analogues to be introduced below record the joint empirical distribution of velocity and time. Owing to conflicting but standard usage, in this chapter L is used for both an empirical measure (as defined above) and the local rate function. The intended use should always be clear, since the former appears only as \(L^n\), and the latter as L.
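To fix ideas, the action of \(L^{n}\) on a function f(y, t) can be sketched in a few lines of Python (the velocities below are stand-ins, not draws from any particular \(\theta \)); integrating y over \(\mathbb {R}^{d}\times [0,t]\) recovers \(X^{n}(t)-x_{0}\) at the grid points, an identity used repeatedly below.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
v = rng.standard_normal(n)                     # stand-in velocities v_i(X_i^n)
X = np.concatenate([[0.0], np.cumsum(v) / n])  # the chain (4.1) with x_0 = 0

def integrate_Ln(f):
    """Integrate f(y, t) against L^n of (4.3).

    On [i/n, (i+1)/n) the kernel L^n(.|t) is the point mass at v_i, so the
    dy-integral is exact; the dt-integral below uses the left endpoint i/n,
    which is exact whenever f is constant in t on each such interval.
    """
    t = np.arange(n) / n
    return np.mean(f(v, t))

# Integrating y over R x [0, k/n] recovers X^n(k/n) - x_0.
k = 37
print(integrate_Ln(lambda y, t: y * (t < k / n)), X[k] - X[0])
```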

The following construction identifies quantities that will appear in the representation as well as others to be used in the convergence analysis. As first discussed in Sect. 3.1, we can consider \([\mu ^{n}]_{i|0,\ldots , i-1}(dv_{i}|\bar{v}_{0}^{n},\ldots ,\bar{v}_{i-1}^{n})\) to be simply a random measure on \(\mathbb {R}^{d}\) that is measurable with respect to \(\sigma (\bar{v}_{j}^{n}, j=0,\ldots , i-1)=\sigma (\bar{X}_{j}^{n}, j=1,\ldots , i)\), and this \(\omega \)-dependent measure is denoted by \(\bar{\mu }_{i}^{n}(dv_{i})\). Also as in Sect. 3.1, for notational convenience we assume that the original processes as well as controlled analogues are all defined on the same probability space. Note that the role of the “driving noises” played by \(X_{i}\) in Sect. 3.1 is here played by \(v_{i}(X_{i}^{n})\). The measure \(\mu ^{n}\) picks new distributions for these driving noises, as reflected by the notation. Another minor notational difference is that the noise index is from 0 to \(n-1\) rather than 1 to n.

Construction 4.4

Suppose we are given a probability measure \(\mu ^{n}\in \mathscr {P}((\mathbb {R}^{d})^{n})\) and decompose it in terms of conditional distributions \([\mu ^{n}]_{i|0,\ldots , i-1}\) on the ith variable given variables 0 through \(i-1\):

$$\begin{aligned} \mu ^{n}(dv_{0}\times \cdots \times dv_{n-1})&= [\mu ^{n}]_{0}(dv_{0})[\mu ^{n}]_{1|0}(dv_{1}|v_{0})\\&\quad \times \cdots \times [\mu ^{n}]_{n-1|0,\ldots , n-2}(dv_{n-1}|v_{0},\ldots , v_{n-2}). \end{aligned}$$

Let \(\{\bar{v}_{i}^{n}\}_{i=0,\ldots , n-1}\) be random variables defined on a probability space \((\varOmega ,\mathscr {F}, P)\) and with joint distribution \(\mu ^{n}\). Thus conditioned on \(\bar{\mathscr {F}}_{i}^{n}\doteq \sigma (\bar{v}_{j}^{n}, j=0,\ldots , i-1)\), \(\bar{v}_{i}^{n}\) has distribution \(\bar{\mu }_{i}^{n}(dv_{i})\doteq [\mu ^{n}]_{i|0,\ldots , i-1}(dv_{i}|\bar{v}_{0}^{n},\ldots ,\bar{v}_{i-1}^{n})\). The collection \(\{\bar{\mu }_{i}^{n}\}_{i=0,\ldots , n-1}\) will be called a control. Then controlled processes \(\bar{X}^{n}\) and measures \(\bar{L}^{n}\) are recursively constructed as follows. Let \(\bar{X}_{0}^{n}=x_{0}\), and for \(i=0,\ldots , n-1\) define \(\bar{X}_{i+1}^{n}\) recursively by

$$ \bar{X}_{i+1}^{n}=\bar{X}_{i}^{n}+\frac{1}{n}\bar{v}_{i}^{n}. $$

When \(\{\bar{X}_{i}^{n}\}_{i=1,\ldots , n}\) has been constructed, \(\bar{X}^{n}(t)\) is defined as in (4.2) by piecewise linear interpolation, and

$$ \bar{L}^{n}(A\times B)\doteq \int _{B}\bar{L}^{n}(A|t)dt,\quad \bar{L}^{n}(A|t)\doteq \delta _{\bar{v}_{i}^{n}}(A)\text { if }t\in [i/n, i/n+1/n). $$

We also define

$$ \bar{\mu }^{n}(A\times B)\doteq \int _{B}\bar{\mu }^{n}(A|t)dt,\quad \bar{\mu }^{n}(A|t)\doteq \bar{\mu }_{i}^{n}(A)\text { if }t\in [i/n, i/n+1/n) $$

and

$$ \lambda ^{n}(A\times B)\doteq \int _{B}\lambda ^{n}(A|t)dt,\quad \lambda ^{n}(A|t)\doteq \theta (A|\bar{X}_{i}^{n})\text { if }t\in [i/n, i/n+1/n). $$

The measures \(\bar{\mu }^{n}(dx\times dt)\) record the time dependence of the \(\bar{\mu }_{i}^{n}\). When taking limits, we will also want to keep track of the corresponding \(\theta (\cdot |\bar{X}_{i}^{n})\), since the two appear together in the relative entropy representation. This information is recorded in \(\lambda ^{n}\in \mathscr {P}(\mathbb {R}^{d}\times [0,1])\). Note also that, as remarked previously, \(\mathscr {\bar{F}}_{i}^{n}=\sigma (\bar{X}_{j}^{n}, j=1,\ldots , i)\).

Theorem 4.5

Let \(G:\mathscr {P}(\mathbb {R}^{d}\times [0,1])\rightarrow \mathbb {R}\) be bounded from below and measurable. Let \(L^{n}\) be defined as in (4.3), and given a control \(\{\bar{\mu }_{i}^{n}\}\), let \(\{\bar{X}_{i}^{n}\}\) and \(\{\bar{L}^{n}\}\) be defined as in Construction 4.4. Then

$$ -\frac{1}{n}\log Ee^{-nG(L^{n})}=\inf _{\ \{\bar{\mu }_{i}^{n}\}}E\left[ G(\bar{L}^{n})+\frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) \right] . $$

Proof

The representation follows directly from the high-level variational representation for exponential integrals [part (a) of Proposition 2.3] and the chain rule [Theorem 2.6], and the argument is almost the same as that used to derive the representation used to prove Sanov’s theorem [Proposition 3.1]. The only difference is that the base measure in that case was product measure, reflecting the iid noise structure. Here the base measure is

$$ \theta (dv_{0}|x_{0}^{n})\theta (dv_{1}|x_{1}^{n})\times \cdots \times \theta (dv_{n-1}|x_{n-1}^{n}), $$

where

$$ x_{i}^{n}=x_{0}+\frac{1}{n}\sum _{j=0}^{i-1}v_{j}. $$

One applies the chain rule exactly as was done in Proposition 3.1. The change in the base measure is reflected by a change in the measures appearing in the relative entropy cost, i.e., \(\theta (\cdot |\bar{X}_{i}^{n})\) rather than \(\theta (\cdot )\) as in the iid case.   \(\square \)

Note that the definition of \(\bar{L}^{n}\) allows us to write

$$ \bar{X}^{n}(t)=\int _{\mathbb {R}^{d}\times [0,t]}y\bar{L}^{n}(dy\times ds)+x_{0}. $$

Thus a special case of the representation in Theorem 4.5, obtained by taking G so that \(G(L^{n})=F(X^{n})\), holds for any bounded and measurable map \(F:\mathscr {C}([0,1]:\mathbb {R}^{d})\rightarrow \mathbb {R}\):

$$\begin{aligned} -\frac{1}{n}\log Ee^{-nF(X^{n})}=\inf _{\ \{\bar{\mu }_{i}^{n}\}}E\left[ F(\bar{X}^{n})+\frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) \right] . \end{aligned}$$
(4.4)

This representation will be used in the proof of the LDP for \(\{X^{n}\}\). As in passing from Sanov’s theorem to Cramér’s theorem, convergence of \(\bar{L}^{n}\) plus some uniform integrability will imply convergence of \(\bar{X}^{n}\).
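As a purely numerical illustration of (4.4) (not part of the proof), the following sketch estimates the left side by plain Monte Carlo and evaluates the cost on the right side for one fixed control in the Gaussian setting of Example 4.1; the drift \(b(x)=-x\), the functional F, and the constant mean shift are assumptions made only for this sketch. Since (4.4) takes an infimum over controls, any fixed control yields an upper bound.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 5000
b = lambda x: -x                  # illustrative drift (Example 4.1, sigma = 1)
F = lambda xT: min(xT * xT, 1.0)  # a bounded functional of the terminal point

def terminal(shift=0.0):
    """Terminal point X_n^n of (4.1); `shift` tilts each noise mean by a constant."""
    x = 0.0
    for i in range(n):
        x += (b(x) + shift + rng.standard_normal()) / n
    return x

# Left side of (4.4), estimated by plain Monte Carlo.
lhs = -np.log(np.mean([np.exp(-n * F(terminal())) for _ in range(reps)])) / n

# Right side of (4.4) for the fixed control mu_i = N(b(x) + shift, 1):
# its per-step relative entropy to theta(.|x) = N(b(x), 1) is shift^2 / 2.
shift = 0.5
rhs = np.mean([F(terminal(shift)) for _ in range(reps)]) + shift**2 / 2

print(lhs, rhs)  # lhs <= rhs, since (4.4) is an infimum over all controls
```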

Remark 4.6

Although the proof of the LDP requires only bounded F (and hence bounded G), we state Theorem 4.5 so as to allow its use in the analysis of importance sampling in Chap. 15, where unbounded functionals cannot be avoided.

3 Form of the Rate Function

Before going further, we pause to comment on the expected form of the rate function. We give a completely heuristic calculation, based on a time scale separation due to the 1/n scaling of the noise and the weak continuity of \(x\mapsto \theta (\cdot |x)\), which suggests the correct form of the rate function. Over an interval \([s, s+\delta ]\), with \(\delta >0\) small and \(1/\delta \in \mathbb {N}\), the noise terms in the definition of \(X^{n}(s+\delta )-X^{n}(s)\) are approximately iid with distribution \(\theta (\cdot |X^{n}(s))\). Therefore,

$$ \frac{X^{n}(s+\delta )-X^{n}(s)}{\delta }\approx \frac{1}{n\delta }\sum _{i=\left\lfloor ns\right\rfloor }^{\left\lfloor ns+n\delta \right\rfloor }v_{i}(X^{n}(s)), $$

and by Cramér’s theorem, the right-hand side satisfies an LDP with the rate function \(\delta L(X^{n}(s),\beta )\), where

$$\begin{aligned} L(x,\beta )=\inf \left[ R\left( \mu (\cdot )\left\| \theta (\cdot |x)\right. \right) :\int _{\mathbb {R}^{d}}y\mu (dy)=\beta \right] . \end{aligned}$$
(4.5)

Suppose that \(\sigma >0\) is small, and that in the following display, \(B(y,\sigma )\) denotes a (context-dependent) open ball of radius \(\sigma \). Using the Markov property to combine estimates over small intervals, for a smooth trajectory \(\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})\) that starts at \(x_{0}\), we have

$$\begin{aligned}&P\left\{ X^{n}\in B(\phi ,\sigma )\right\} \\&\quad \approx P\left\{ X^{n}(j\delta )\in B(\phi (j\delta ),\sigma )\text { all }1\le j\le 1/\delta \right\} \\&\quad \approx P\left\{ \frac{X^{n}(j\delta +\delta )-X^{n}(j\delta )}{\delta }\in B\left( \frac{\phi (j\delta +\delta )-\phi (j\delta )}{\delta },\frac{2\sigma }{\delta }\right) ,\text { }0\le j<\frac{1}{\delta }\right\} \\&\quad \approx \prod _{j=0}^{\frac{1}{\delta }-1}\exp \left\{ -n\delta L\left( \phi (j\delta ),\frac{\phi (j\delta +\delta )-\phi (j\delta )}{\delta }\right) \right\} \\&\quad \approx \exp \left\{ -n\int _{0}^{1}L(\phi (s),\dot{\phi }(s))ds\right\} . \end{aligned}$$

Therefore, one may expect the rate function \(I(\phi )=\int _{0}^{1}L(\phi (s),\dot{\phi }(s))ds\) for such \(\phi \). Owing to this interpretation, \(\beta \mapsto L(x,\beta )\) is often called a local rate function in this context.
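Continuing the Gaussian setting of Example 4.1 as a worked special case (with the added assumption that \(\sigma (x)\sigma ^{T}(x)\) is invertible), the infimum in (4.5) is attained by the normal distribution with mean \(\beta \) and covariance \(\sigma (x)\sigma ^{T}(x)\), and

$$ L(x,\beta )=\frac{1}{2}\left\langle \beta -b(x),\left[ \sigma (x)\sigma ^{T}(x)\right] ^{-1}\left( \beta -b(x)\right) \right\rangle , $$

so the heuristic recovers the familiar rate function of the small noise diffusion of Sect. 3.2.1.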

4 Statement of the LDP

We now turn to the rigorous analysis. As was the case in Chap. 3 with Sanov’s theorem and small noise diffusions, we first establish tightness, and then prove a result that links the limits of weakly converging controls and controlled processes. With these results in hand, the Laplace principle is proved by establishing upper and lower bounds. The conditions we assume and some of the arguments are close to those used in [97]. However, the perspective is somewhat different, with the main argument being a functional version of the one used to obtain Cramér’s theorem from Sanov’s theorem, and we also set the arguments up so they can easily be adapted to the problems of importance sampling considered later in the book.

We show that Condition 4.3 by itself suffices for the Laplace principle and large deviation upper bound. For the lower bound we need additional conditions. Two types of conditions will be used, and are formulated as Conditions 4.7 and 4.8 below. The Laplace principle lower bound under Conditions 4.3 and 4.7 will be proved in Sect. 4.7, and under Conditions 4.3 and 4.8 it will be proved in Sect. 4.8. The convex hull of the support of \(\mu \in \mathscr {P}(\mathbb {R}^{d})\) is the smallest closed and convex set \(A\subset \mathbb {R}^{d}\) such that \(\mu (A)=1\).

Condition 4.7

For each \(x\in \mathbb {R}^{d}\), the convex hull of the support of \(\theta (\cdot |x)\) is \(\mathbb {R}^{d}\).

Condition 4.8

For every compact \(K\subset \mathbb {R}^{d}\) and \(\varepsilon \in (0,1)\), there exist \(\eta =\eta (K,\varepsilon )\in (0,1)\) and \(m=m(K,\varepsilon )\in (0,\infty )\), such that whenever \(\xi ,\chi \in K\) satisfy \(\Vert \xi -\chi \Vert \le \eta \), we can find for each \(\gamma \in \mathbb {R}^{d}\) a \(\beta \in \mathbb {R}^{d}\) such that

$$ L(\xi ,\beta )-L(\chi ,\gamma )\le \varepsilon (1+L(\chi ,\gamma )),\;\;\Vert \beta -\gamma \Vert \le m(1+L(\chi ,\gamma ))\Vert \xi -\chi \Vert . $$

Condition 4.7 can be weakened to the requirement that the relative interior of the convex hull of the support of \(\theta (\cdot |x)\) be independent of x and contain 0 (see Sect. 6.3 of [97]). Condition 4.8 is important in that it allows for noise that can push the process in only a subset of all possible directions. For example, if the model of Example 4.1 corresponds to a degenerate diffusion, which means that \(\sigma (x)\sigma ^{T}(x)\) is only positive semidefinite, then Condition 4.7 is not valid, but under the assumption that b and \(\sigma \) are Lipschitz continuous, Condition 4.8 holds. Under similar Lipschitz-type assumptions, Condition 4.8 is satisfied for a broad range of models, and we refer the reader to Sect. 6.3 in [97] for additional illustrative examples.

Recall that \(\mathscr {AC}_{x_{0}}([0,T]:\mathbb {R}^{d})\) denotes the subset of \(\mathscr {C}([0,T]:\mathbb {R}^{d})\) consisting of all absolutely continuous functions satisfying \(\phi (0)=x_{0}\).

Theorem 4.9

Assume Condition 4.3 and define \(X^{n}\) by (4.2) and \(L:\mathbb {R}^{d}\times \mathbb {R}^{d}\rightarrow [0,\infty )\) by (4.5). Let

$$ I(\phi )=\int _{0}^{T}L(\phi (s),\dot{\phi }(s))ds \quad \text {if }\phi \in \mathscr {AC}_{x_{0}}([0,T]:\mathbb {R}^{d}), $$

and in all other cases set \(I(\phi )=\infty \). Then the following conclusions hold.

(a) I is a rate function and \(\left\{ X^{n}\right\} _{n\in \mathbb {N}}\) satisfies the Laplace principle upper bound with rate function I.

(b) Suppose that in addition, either Condition 4.7 or Condition 4.8 holds. Then \(\left\{ X^{n}\right\} _{n\in \mathbb {N}}\) satisfies the Laplace principle with rate function I.

Remark 4.10

In the proofs to follow, the initial condition is fixed at \(x_{0}\). However, the arguments apply with only notational changes if instead we consider a sequence of initial conditions \(\{x_{0}^{n}\}_{n\in \mathbb {N}}\) with \(x_{0}^{n}\rightarrow x_{0}\), and establish

$$\begin{aligned} \frac{1}{n}\log E_{x_{0}^{n}}e^{-nF(X^{n})}+\inf _{\phi :\phi (0)=x_{0}^{n}}\left[ F(\phi )+I(\phi )\right] \rightarrow 0.\end{aligned}$$
(4.6)

Using an elementary argument by contradiction, this implies that the Laplace and large deviation principles hold uniformly for initial conditions in compact sets, as defined in Chap. 1. To be specific, if the uniform Laplace principle is not valid, then there exist a compact set \(K\subset \mathbb {R}^{d}\), a \(\delta >0\), and for each \(n\in \mathbb {N}\) an initial condition \(x_{0}^{n}\in K\) such that

$$\begin{aligned} \left| \frac{1}{n}\log E_{x_{0}^{n}}e^{-nF(X^{n})}+\inf _{\phi :\phi (0)=x_{0}^{n}}\left[ F(\phi )+I(\phi )\right] \right| \ge \delta .\end{aligned}$$
(4.7)

However, since K is compact, there exist a subsequence \(x_{0}^{n_{k}}\) and \(x_{0}\in K\) such that \(x_{0}^{n_{k}}\rightarrow x_{0}\). Then (4.6) contradicts (4.7), and thus the uniform Laplace principle holds.

The rest of the chapter is organized as follows. In Sect. 4.5 we prove part (a) of Theorem 4.9. In preparation for the (two) proofs of the Laplace lower bound, Sect. 4.6 studies some basic properties of the function \(L(x,\beta )\). The last two sections of the chapter, Sects. 4.7 and 4.8, contain the proof of the lower bound under Condition 4.7 and Condition 4.8, respectively. Throughout the chapter we assume Condition 4.3, and to simplify notation, proofs are given for \(T=1\).

5 Laplace Upper Bound

We begin with preliminary results on tightness and uniform integrability of the controlled processes from Sect. 4.2.

5.1 Tightness and Uniform Integrability

Lemma 4.11

Assume Condition 4.3 and consider any sequence of controls \(\left\{ \bar{\mu }_{i}^{n}\right\} \) for which the relative entropy costs satisfy

$$ \sup _{n\in \mathbb {N}} E\left[ \frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) \right] \le K<\infty . $$

Let \(\{\bar{L}^{n}\}_{n\in \mathbb {N}}\), \(\{\bar{X}^{n}\}_{n\in \mathbb {N}}\), \(\{\bar{\mu }^{n}\}_{n\in \mathbb {N}}\), and \(\left\{ \lambda ^{n}\right\} _{n\in \mathbb {N}}\) be defined as in Construction 4.4. Then the empirical measures \(\{\bar{L}^{n}\}\) are tight and in fact uniformly integrable in the sense that

$$\begin{aligned} \lim _{M\rightarrow \infty }\limsup _{n\rightarrow \infty }E\left[ \int _{\mathbb {R}^{d}\times [0,1]}\left\| y\right\| 1_{\left\{ \left\| y\right\| \ge M\right\} }\bar{L}^{n}(dy\times dt)\right] =0. \end{aligned}$$
(4.8)

The measures \(\left\{ \bar{\mu }^{n}\right\} _{n\in \mathbb {N}}\) are also uniformly integrable in the sense of (4.8), and \(\{\bar{X}^{n}\}\), \(\{\bar{\mu }^{n}\}\), and \(\left\{ \lambda ^{n}\right\} \) are all tight.

Proof

Except for more complicated notation, the proof is almost the same as the analogous result needed for Cramér’s theorem. From the inequality (2.9), it follows that if \(\mu \in \mathscr {P}(\mathbb {R}^{d}\times [0,1])\) satisfies \(\mu \ll \lambda ^{n}\), then for all \(\sigma \in [1,\infty )\),

$$\begin{aligned}&\int _{\mathbb {R}^{d}\times [0,1]}\left\| y\right\| 1_{\left\{ \left\| y\right\| \ge M\right\} }\mu (dy\times dt)\\&\quad \le \int _{\mathbb {R}^{d}\times [0,1]}e^{\sigma \left\| y\right\| }1_{\left\{ \left\| y\right\| \ge M\right\} }\lambda ^{n}(dy\times dt)+\frac{1}{\sigma }R\left( \mu \left\| \lambda ^{n}\right. \right) . \end{aligned}$$

By a conditioning argument it follows that \(E\int fd\bar{L}^{n}=E\int fd\bar{\mu }^{n}\) for every bounded and measurable function f. Using the definitions of \(\bar{\mu }^{n}\) and \(\lambda ^{n}\) and the chain rule to get the first equality, we have

$$\begin{aligned} E\left[ R\left( \bar{\mu }^{n}\left\| \lambda ^{n}\right. \right) \right]&=E\left[ \int _{0}^{1}R\left( \bar{\mu }^{n}(\cdot |t)\left\| \lambda ^{n}(\cdot |t)\right. \right) dt\right] \\&=E\left[ \frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) \right] \nonumber \\&\le K.\nonumber \end{aligned}$$
(4.9)

Therefore,

$$\begin{aligned}&E\left[ \int _{\mathbb {R}^{d}\times [0,1]}\left\| y\right\| 1_{\left\{ \left\| y\right\| \ge M\right\} }\bar{L}^{n}(dy\times dt)\right] \nonumber \\&\quad =E\left[ \int _{\mathbb {R}^{d}\times [0,1]}\left\| y\right\| 1_{\left\{ \left\| y\right\| \ge M\right\} }\bar{\mu }^{n}(dy\times dt)\right] \nonumber \\&\quad \le \sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}e^{\sigma \left\| y\right\| }1_{\left\{ \left\| y\right\| \ge M\right\} }\theta (dy|x)+\frac{1}{\sigma }K. \end{aligned}$$
(4.10)

From part (a) of Condition 4.3 it follows that for \(\sigma \in \mathbb {R}\),

$$\begin{aligned} \sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}e^{2\sigma \left\| y\right\| }\theta (dy|x)<\infty \end{aligned}$$
(4.11)

(for details see the analogous claim in the proof of Lemma 3.9). Since

$$\begin{aligned} \int _{\mathbb {R}^{d}}e^{\sigma \left\| y\right\| }1_{\{\left\| y\right\| \ge M\}}\theta (dy|x)\le e^{-\sigma M}\int _{\mathbb {R}^{d}}e^{2\sigma \left\| y\right\| }\theta (dy|x), \end{aligned}$$
(4.12)

sending first \(n\rightarrow \infty \), then \(M\rightarrow \infty \), and finally \(\sigma \rightarrow \infty \) in (4.10), the limit (4.8) holds for both \(\{\bar{L}^{n}\}\) and \(\{\bar{\mu }^{n}\}\). Tightness of \(\{\bar{L}^{n}\}\) and \(\{\bar{\mu }^{n}\}\) follows directly, and the tightness of \(\left\{ \lambda ^{n}\right\} \) follows from part (a) of Condition 4.3.

To establish tightness of \(\{\bar{X}^{n}\}\) we use the fact that

$$\begin{aligned} \bar{X}^{n}(t)=\int _{\mathbb {R}^{d}\times [0,t]}y\bar{L}^{n}(dy\times ds)+x_{0}. \end{aligned}$$
(4.13)

Tightness will follow if given \(\varepsilon >0\) and \(\eta >0\), there is \(\delta >0\) such that

$$\begin{aligned} \limsup _{n\rightarrow \infty }P\left\{ w^{n}(\delta )\ge \varepsilon \right\} \le \eta , \end{aligned}$$
(4.14)

where \(w^{n}(\delta )\doteq \sup _{0\le s<t\le 1:t-s\le \delta }\left\| \bar{X}^{n}(t)-\bar{X}^{n}(s)\right\| \). Using (4.8), choose \(M<\infty \) such that

$$ \limsup _{n\rightarrow \infty }E\left[ \int _{\mathbb {R}^{d}\times [0,1]}\left\| y\right\| 1_{\left\{ \left\| y\right\| \ge M\right\} }\bar{L}^{n}(dy\times dt)\right] \le \frac{\varepsilon \eta }{2}. $$

Let \(\delta \doteq (\varepsilon /2M)\wedge 1\). Then since \(M\delta \le \varepsilon /2\), we have

$$ \sup _{0\le s<u\le 1:u-s\le \delta }\int _{\mathbb {R}^{d}\times [s, u]}\left\| y\right\| 1_{\left\{ \left\| y\right\| \le M\right\} }\bar{L}^{n}(dy\times dt)\le M\delta \le \frac{\varepsilon }{2}. $$

Hence

$$\begin{aligned} P\left\{ w^{n}(\delta )\ge \varepsilon \right\}&\le P\left\{ \int _{\mathbb {R}^{d}\times [0,1]}\left\| y\right\| 1_{\left\{ \left\| y\right\| \ge M\right\} }\bar{L}^{n}(dy\times dt)\ge \frac{\varepsilon }{2}\right\} \\&\le \frac{2}{\varepsilon }E\left[ \int _{\mathbb {R}^{d}\times [0,1]}\left\| y\right\| 1_{\left\{ \left\| y\right\| \ge M\right\} }\bar{L}^{n}(dy\times dt)\right] \\&\le \eta . \end{aligned}$$

This proves (4.14), and tightness of \(\{\bar{X}^{n}\}\) follows.   \(\square \)

5.2 Weak Convergence

Lemma 4.11 proved tightness of \(\{(\bar{L}^{n},\bar{\mu }^{n},\lambda ^{n},\bar{X}^{n})\}_{n\in \mathbb {N}}\). The following lemma characterizes the weak limits of this collection.

Lemma 4.12

Consider any sequence of controls \(\left\{ \bar{\mu }_{i}^{n}\right\} \) as in Construction 4.4 for which the relative entropy costs satisfy

$$ E\left[ \frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) \right] \le K<\infty . $$

Let \(\{(\bar{X}^{n},\bar{L}^{n},\bar{\mu }^{n})\}\) denote a weakly converging subsequence, which for notational convenience we again label by n, with limit \((\bar{X},\bar{L},\bar{\mu })\). Then w.p.1, \(\bar{L}=\bar{\mu }\), and \(\bar{\mu }(dy\times dt)\) can be decomposed as \(\bar{\mu }(dy|t)dt\), where \(\bar{\mu }(dy|t)\) is a stochastic kernel on \(\mathbb {R}^{d}\) given [0, 1], and w.p.1 for all \(t\in [0,1]\),

$$\begin{aligned} \bar{X}(t)=\int _{\mathbb {R}^{d}\times [0,t]}y\bar{\mu }(dy\times ds)+x_{0}=\int _{\mathbb {R}^{d}\times [0,t]}y\bar{\mu }(dy|s)ds+x_{0}. \end{aligned}$$
(4.15)

In addition, \(\lambda ^{n}\) converges weakly to a limit \(\lambda \) of the form

$$\begin{aligned} \lambda (A\times B)=\int _{B}\theta (A|\bar{X}(t))dt. \end{aligned}$$
(4.16)

Proof

Recall that \(\bar{\mu }_{i}^{n}\) picks the conditional distribution of \(\bar{v}_{i}^{n}\). Hence a minor modification of the martingale argument used to prove the analogous result needed for Sanov’s theorem (Lemma 3.5) can be used to show that \(\bar{L}=\bar{\mu }\) w.p.1. The changes are mainly notational, and are needed, since in the present setting the measures must record time information. For completeness we give the details.

Now, \(\mathbb {R}^{d}\times [0,1]\) is a Polish space, and on such a space there exists a countable separating class of bounded uniformly continuous functions (see Appendix A). Thus to verify \(\bar{L}=\bar{\mu }\) w.p.1, it suffices to show that for every bounded uniformly continuous f,

$$\begin{aligned} P\left\{ \int _{\mathbb {R}^{d}\times [0,1]}f(v, t)\bar{L}\left( dv\times dt\right) =\int _{\mathbb {R}^{d}\times [0,1]}f(v, t)\bar{\mu }\left( dv\times dt\right) \right\} =1. \end{aligned}$$
(4.17)

Define \(K\doteq \left\| f\right\| _{\infty }\) and \(\Delta _{i}^{n}\doteq f\left( \bar{v}_{i}^{n},i/n\right) -\int _{\mathbb {R}^{d}}f\left( v, i/n\right) \bar{\mu }_{i}^{n}\left( dv\right) \). For all \(\varepsilon >0\),

$$\begin{aligned}&P\left\{ \left| \frac{1}{n}\sum _{i=0}^{n-1}f\left( \bar{v}_{i}^{n}, i/n\right) -\frac{1}{n}\sum _{i=0}^{n-1}\int _{\mathbb {R}^{d}}f\left( v, i/n\right) \bar{\mu }_{i}^{n}\left( dv\right) \right| >\varepsilon \right\} \\&\quad \le \frac{1}{\varepsilon ^{2}}E\left[ \frac{1}{n^{2}}\sum _{i, j=0}^{n-1}\Delta _{i}^{n}\Delta _{j}^{n}\right] . \end{aligned}$$

Recall that \(\mathscr {\bar{F}}_{i}^{n}\doteq \sigma (\bar{v}_{j}^{n}, j=0,\ldots , i-1)\). By a standard argument, for \(i\ne j\), conditioning on \(\mathscr {\bar{F}}_{i\vee j}^{n}\) gives \(E[\Delta _{i}^{n}\Delta _{j}^{n}]=0\). Since \(|\Delta _{i}^{n}|\le 2K\),

$$ P\left\{ \left| \frac{1}{n}\sum _{i=0}^{n-1}f(\bar{v}_{i}^{n}, i/n)-\frac{1}{n}\sum _{i=0}^{n-1}\int _{\mathbb {R}^{d}}f(v, i/n)\bar{\mu }_{i}^{n}(dv)\right| >\varepsilon \right\} \le \frac{4K^{2}}{n\varepsilon ^{2}}. $$

Let \(\gamma (\delta )\) denote the modulus \(\sup _{v\in \mathbb {R}^{d}, 0\le s\le t\le 1:t-s\le \delta }\left\{ \left| f(v,t)-f(v, s)\right| \right\} \). Since f is uniformly continuous, \(\gamma (\delta )\downarrow 0\) as \(\delta \downarrow 0\), and the definition of \(\gamma (\delta )\) implies

$$\begin{aligned} \left| \frac{1}{n}\sum _{i=0}^{n-1}f(\bar{v}_{i}^{n}, i/n)-\int _{\mathbb {R}^{d}\times [0,1]}f(v, t)\bar{L}^{n}(dv\times dt)\right|&\le \gamma (1/n)\\ \left| \frac{1}{n}\sum _{i=0}^{n-1}\int _{\mathbb {R}^{d}}f(v, i/n)\bar{\mu }_{i}^{n}(dv)-\int _{\mathbb {R}^{d}\times [0,1]}f(v, t)\bar{\mu }^{n}(dv\times dt)\right|&\le \gamma (1/n). \end{aligned}$$

Letting first \(n\rightarrow \infty \) and then \(\varepsilon \rightarrow 0\), we obtain (4.17), which proves \(\bar{L}=\bar{\mu }\) w.p.1.

Note that both \(\bar{L}^{n}\) and \(\bar{\mu }^{n}\) have second marginals equal to Lebesgue measure. Since this property is inherited by the weak limits, \(\bar{L}(\mathbb {R}^{d}\times \{t\})=0\) w.p.1. This property and the uniform integrability allow us to pass to the limit in (4.13) and obtain

$$ \bar{X}(t)=\int _{\mathbb {R}^{d}\times [0,t]}y\bar{L}(dy\times ds)+x_{0}. $$

Now use that \(\bar{L}=\bar{\mu }\) w.p.1 to get the first part of (4.15). Since \(\bar{\mu }\) has Lebesgue measure as its second marginal, both the decomposition and the second part of (4.15) follow. Finally, the weak convergence of \(\lambda ^{n}\) and the form of the limit follow from the weak convergence of \(\bar{X}^{n}\) to \(\bar{X}\) and the assumption that \(x\mapsto \theta (\cdot |x)\) is continuous in the weak topology.   \(\square \)

5.3 Completion of the Laplace Upper Bound

The large deviation and Laplace principle upper bounds correspond to the variational lower bound. To prove such a lower bound, we again follow the line of argument used for Cramér’s theorem in Sect. 3.1.6. Fix a continuous and bounded \(F:\mathscr {C}([0,1]:\mathbb {R}^{d})\rightarrow \mathbb {R}\) and \(\varepsilon >0\). Using (4.4), let \(\left\{ \bar{\mu }_{i}^{n}\right\} _{i=0,\ldots , n-1}\) satisfy

$$ -\frac{1}{n}\log Ee^{-nF(X^{n})}+\varepsilon \ge E\left[ F(\bar{X}^{n})+\frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) \right] . $$

Then since F is bounded, we have \(\sup _{n}\frac{1}{n}E\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) <\infty \), and therefore by Lemma 4.11, it follows that \(\left\{ (\bar{L}^{n},\bar{X}^{n},\bar{\mu }^{n},\lambda ^{n})\right\} \) is tight. Consider any subsequence that converges to a weak limit \((\bar{L},\bar{X},\bar{\mu },\lambda )\), and denote the convergent subsequence by n. If the lower bound is demonstrated for this subsequence, then the standard argument by contradiction establishes the lower bound for the original sequence. Details of the following calculation are given after the display:

$$\begin{aligned}&\liminf _{n\rightarrow \infty }-\frac{1}{n}\log E\exp \{-nF(X^{n})\}+\varepsilon \nonumber \\&\quad \ge \liminf _{n\rightarrow \infty }E\left[ F(\bar{X}^{n})+\frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) \right] \nonumber \\&\quad =\liminf _{n\rightarrow \infty }E\left[ F(\bar{X}^{n})+R\left( \bar{\mu }^{n}(dy\times dt)\left\| \lambda ^{n}(dy\times dt)\right. \right) \right] \nonumber \\&\quad \ge E\left[ F\left( \bar{X}\right) +R\left( \bar{\mu }(dy\times dt)\left\| \lambda (dy\times dt)\right. \right) \right] \nonumber \\&\quad =E\left[ F\left( \bar{X}\right) +\int _{[0,1]}R\left( \bar{\mu }(\cdot |t)\left\| \theta (\cdot |\bar{X}(t))\right. \right) dt\right] \nonumber \\&\quad \ge E\left[ F(\bar{X})+\int _{[0,1]}L(\bar{X}(t),\dot{\bar{X}}(t))dt\right] \nonumber \\&\quad \ge \inf _{\phi }\left[ F(\phi )+\int _{[0,1]}L(\phi (t),\dot{\phi }(t))dt\right] . \end{aligned}$$
(4.18)

The first equality uses the rewriting of the relative entropy in (4.9); the next inequality is due to the weak convergence, the lower semicontinuity of \(R\left( \cdot \left\| \cdot \right. \right) \), continuity of F, and Fatou’s lemma; the next equality uses the decompositions \(\bar{\mu }(dy\times dt)=\bar{\mu }(dy|t)dt\) and \(\lambda (dy\times dt)=\theta (dy|\bar{X}(t))dt\) and the chain rule; the third inequality follows from (4.5) and (4.15); and the infimum in the last line is over all \(\phi \in \mathscr {AC}_{x_{0}}([0,1]:\mathbb {R}^{d})\). Since \(\varepsilon >0\) is arbitrary, we have proved the Laplace upper bound for \(\{X^{n}\}\):

$$ \limsup _{n\rightarrow \infty }\frac{1}{n}\log E\exp \{-nF(X^{n})\}\le -\inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\phi )+I(\phi )\right] . $$

   \(\square \)

5.4 I is a Rate Function

As first noted in Chap. 3, in the weak convergence approach, a deterministic version of the argument used to prove the Laplace upper bound will usually show that the proposed rate function is indeed a rate function, i.e., that it has compact level sets.

Theorem 4.13

Assume Condition 4.3, define \(L(x,\beta )\) by (4.5), and let I be the function defined in Theorem 4.9. Then I has compact level sets in \(\mathscr {C}([0,T]:\mathbb {R}^{d})\).

Proof

As usual, the proof is given for \(T=1\). Suppose \(\{\phi _{j}\}_{j\in \mathbb {N}}\) is given such that \(I(\phi _{j})\le K<\infty \) for all \(j\in \mathbb {N}\). Then we need to show that \(\{\phi _{j}\}\) is precompact, and that if \(\phi _{j}\rightarrow \phi \), then

$$ \liminf _{j\rightarrow \infty }I(\phi _{j})\ge I(\phi ). $$

Since \(I(\phi _{j})<\infty \), we know \(\phi _{j}\) is absolutely continuous. Define probability measures \(\mu ^{j}\) on \(\mathbb {R}^{d}\times [0,1]\) by

$$ \mu ^{j}(A\times B)=\int _{B}\delta _{\dot{\phi }_{j}(t)}(A)dt,\;A\in \mathscr {B}(\mathbb {R}^{d}), B\in \mathscr {B}([0,1]). $$

Note that

$$ \phi _{j}(t)=x_{0}+\int _{\mathbb {R}^{d}\times [0,t]}y\mu ^{j}(dy\times ds). $$

Using \(I(\phi _{j})\le K<\infty \), exactly the same argument as in Lemma 4.11 shows that \(\{\mu ^{j}\}_{j\in \mathbb {N}}\) is tight and uniformly integrable. By the usual subsequential argument, we can assume that \(\mu ^{j}\) converges along the full sequence, and a deterministic version of Lemma 4.12 shows that the limit \(\mu \) can be factored in the form \(\mu (dy\times dt)=\mu (dy|t)dt\), and that \(\phi _{j}\rightarrow \phi \), with \(\int _{\mathbb {R}^{d}}y\mu (dy|t)=\dot{\phi }(t)\). Thus \(\left\{ \phi _{j}\right\} \) is precompact. We now argue that \(I(\phi )\le K\). In Lemma 4.14 it will be shown that L is a lower semicontinuous function that is convex in the second variable. Using these properties, we obtain

$$\begin{aligned} K&\ge \liminf _{j\rightarrow \infty }I(\phi _{j})\\&=\liminf _{j\rightarrow \infty }\int _{\mathbb {R}^{d}\times [0,1]}L(\phi _{j}(t), y)\mu ^{j}(dy\times dt)\\&\ge \int _{\mathbb {R}^{d}\times [0,1]}L(\phi (t), y)\mu (dy\times dt)\\&=\int _{0}^{1}\int _{\mathbb {R}^{d}}L(\phi (t), y)\mu (dy|t)dt\\&\ge \int _{0}^{1}L(\phi (t),\dot{\phi }(t))dt=I(\phi ), \end{aligned}$$

where the second inequality is a consequence of Fatou’s lemma, the lower semicontinuity of L, and the convergence of \((\phi _{j},\mu ^{j})\) to \((\phi ,\mu )\), while the third inequality uses the convexity of L and Jensen’s inequality. Thus I has compact level sets, and hence is a rate function.   \(\square \)

6 Properties of \(L(x,\beta )\)

To prove a Laplace lower bound, we must take a trajectory \(\phi \) that nearly achieves the infimum in \(\inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\phi )+I(\phi )\right] \) and show how to construct a control that can be applied in the representation and gives asymptotically the same cost. For the continuous-time models in Chap. 3 this was not very difficult, in part because the implementation of the control was straightforward. For example, in the case of the diffusion model, the construction of a solution to (3.17) is possible when v is measurable in t and has appropriate integrability properties; in particular, piecewise continuity or some similar form of regularity is not required. The situation is different in discrete time. In general, \(I(\phi )<\infty \) implies only that \(\phi \) is absolutely continuous. As we will see, it is natural to define controls for the prelimit in terms of \(\dot{\phi }(t)\), where \(\phi \) is nearly minimizing. Since the derivative is well defined only up to a set of Lebesgue measure zero, this causes a number of problems. The solution is to show that one can always construct a “nice” nearly minimizing trajectory, e.g., one whose derivative is continuous from the left with right-hand limits. Such a construction requires some regularity properties of \(L(x,\beta )\), which we now present.

Recall that for \(x\in \mathbb {R}^{d}\), the Legendre–Fenchel transform of \(H(x,\cdot )\) is defined by

$$ H^{*}(x,\beta )=\sup _{\alpha \in \mathbb {R}^{d}}\left[ \left\langle \alpha ,\beta \right\rangle -H(x,\alpha )\right] ,\;\beta \in \mathbb {R}^{d}, $$

and \(L(x,\beta )\) is defined as in (4.5). As noted in Remark 3.10 and shown in Lemma 4.16, for each fixed x these are dual representations of the same function.

Lemma 4.14

Assume Condition 4.3. Then the following are valid.

(a) For each \(x\in \mathbb {R}^{d}\), \(\alpha \mapsto H(x,\alpha )\) is a finite convex function on \(\mathbb {R}^{d}\) that is differentiable for all \(\alpha \in \mathbb {R}^{d}\). Also, \((x,\alpha )\mapsto H(x,\alpha )\) is continuous on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\).

(b) For each \(x\in \mathbb {R}^{d}\), \(\beta \mapsto H^{*}(x,\beta )\) is a convex function on \(\mathbb {R}^{d}\). Furthermore, \((x,\beta )\mapsto H^{*}(x,\beta )\) is a nonnegative lower semicontinuous function on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\).

(c) The map \(\beta \mapsto H^{*}(x,\beta )\) is superlinear, uniformly in x, which means that

$$ \lim _{N\rightarrow \infty }\inf _{x\in \mathbb {R}^{d}}\inf _{\beta \in \mathbb {R}^{d}:\Vert \beta \Vert =N}\frac{H^{*}(x,\beta )}{\Vert \beta \Vert }=\infty . $$

(d) The map \((x,\beta )\mapsto L(x,\beta )\) is a lower semicontinuous function on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\), and for each \(x\in \mathbb {R}^{d}\), the map \(\beta \mapsto L(x,\beta )\) is convex.

Proof

(a) Part (a) of Condition 4.3 ensures that \(H(x,\alpha )\in (-\infty ,\infty )\) for all \((x,\alpha )\in \mathbb {R}^{d}\times \mathbb {R}^{d}\). The convexity of \(\alpha \mapsto H(x,\alpha )\) then follows from Hölder’s inequality: if \(\alpha _{1},\alpha _{2}\in \mathbb {R}^{d}\) and \(\rho \in [0,1]\), then

$$ \int _{\mathbb {R}^{d}}e^{\langle \rho \alpha _{1}+(1-\rho )\alpha _{2}, y\rangle }\theta (dy|x)\le \left( \int _{\mathbb {R}^{d}}e^{\langle \alpha _{1}, y\rangle }\theta (dy|x)\right) ^{\rho }\left( \int _{\mathbb {R}^{d}}e^{\langle \alpha _{2}, y\rangle }\theta (dy|x)\right) ^{1-\rho }. $$

Taking the logarithm of both sides demonstrates convexity. Under part (a) of Condition 4.3 one can easily construct an appropriate dominating function, and hence show that \(\alpha \mapsto H(x,\alpha )\) is differentiable, where for all \((x,\alpha )\in \mathbb {R}^{d}\times \mathbb {R}^{d}\), the gradient \(\nabla _{\alpha }H(x,\alpha )\) is given by

$$\begin{aligned} \nabla _{\alpha }H(x,\alpha )=\frac{\int _{\mathbb {R}^{d}}ye^{\langle \alpha ,y\rangle }\theta (dy|x)}{\int _{\mathbb {R}^{d}}e^{\langle \alpha , y\rangle }\theta (dy|x)}. \end{aligned}$$
(4.19)

To see the continuity of \((x,\alpha )\mapsto H(x,\alpha )\), let \((x_{n},\alpha _{n})\rightarrow (x,\alpha )\) in \(\mathbb {R}^{d}\times \mathbb {R}^{d}\). We write

$$\begin{aligned} e^{H(x_{n},\alpha _{n})}-e^{H(x,\alpha )}&=\int _{\mathbb {R}^{d}}\left[ e^{\langle \alpha _{n},y\rangle }-e^{\langle \alpha ,y\rangle }\right] \theta (dy|x_{n})\\&\quad +\left[ \int _{\mathbb {R}^{d}}e^{\langle \alpha ,y\rangle }\theta (dy|x_{n})-\int _{\mathbb {R}^{d}}e^{\langle \alpha , y\rangle }\theta (dy|x)\right] , \end{aligned}$$

and recall the bounds (4.11) and (4.12). The second term on the right converges to zero by the Feller property assumed in Condition 4.3(b) and the uniform integrability implied by Condition 4.3(a). For every \(M<\infty \), we have \(\sup _{\Vert y\Vert \le M}|\exp \langle \alpha _{n},y\rangle -\exp \langle \alpha , y\rangle |\rightarrow 0\) as \(n\rightarrow \infty \), and by part (a) of Condition 4.3,

$$ \sup _{n\in \mathbb {N}}\sup _{x\in \mathbb {R}^{d}}\int _{\{\Vert y\Vert \ge M\}}e^{\langle \alpha _{n}, y\rangle }\theta (dy|x)\rightarrow 0 $$

as \(M\rightarrow \infty \). Hence the first term on the right converges to zero, which completes the proof of the continuity of \((x,\alpha )\mapsto H(x,\alpha )\).

(b) By duality for Legendre–Fenchel transforms [217, Theorem 23.5],

$$ H(x,\alpha )=\sup _{\beta \in \mathbb {R}^{d}}[\langle \alpha ,\beta \rangle -H^{*}(x,\beta )],\;\alpha \in \mathbb {R}^{d}. $$

Taking \(\alpha =0\) in the last display shows that \(\inf _{\beta \in \mathbb {R}^{d}}H^{*}(x,\beta )=-H(x, 0)=0\), and thus \(H^{*}\) is nonnegative. For each \(x\in \mathbb {R}^{d}\) and \(\alpha \in \mathbb {R}^{d}\), the mapping \(\beta \mapsto \langle \alpha ,\beta \rangle -H(x,\alpha )\) is convex (in fact affine), and for each \(\alpha \in \mathbb {R}^{d}\), the mapping \((x,\beta )\mapsto \langle \alpha ,\beta \rangle -H(x,\alpha )\) is continuous. Recalling that the pointwise supremum of convex functions is convex and that the pointwise supremum of continuous functions is lower semicontinuous, we see that \(H^{*}(x,\cdot )\) is convex on \(\mathbb {R}^{d}\) and \(H^{*}\) is lower semicontinuous on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\).

(c) From part (a) of Condition 4.3, for every \(M<\infty \), we have

$$ C_{M}\doteq \sup _{x\in \mathbb {R}^{d}}\sup _{\alpha \in \mathbb {R}^{d}:\Vert \alpha \Vert =M}H(x,\alpha )<\infty . $$

Also, for every \(\beta \in \mathbb {R}^{d}\) and \(x\in \mathbb {R}^{d}\), the definition of \(H^{*}\) implies

$$ H^{*}(x,\beta )\ge \langle M\beta /\Vert \beta \Vert ,\beta \rangle -H(x, M\beta /\Vert \beta \Vert )\ge M\Vert \beta \Vert -C_{M}. $$

Thus

$$ \inf _{x\in \mathbb {R}^{d}}\inf _{\beta \in \mathbb {R}^{d}:\Vert \beta \Vert =N}\frac{H^{*}(x,\beta )}{\Vert \beta \Vert }\ge M-\frac{C_{M}}{N}. $$

The asserted superlinearity follows by sending first \(N\rightarrow \infty \) and then \(M\rightarrow \infty \) in the last display.

(d) The claimed properties of \(L(x,\beta )\) follow from its definition in (4.5) and the corresponding properties of \(R(\cdot \left\| \cdot \right. )\) [part (b) of Lemma 2.4]. We first consider convexity. Fix x, let \(\beta _{1},\beta _{2}\in \mathbb {R}^{d},\delta >0,\rho \in [0,1]\) be given, and suppose that \(\mu _{i}\) are within \(\delta \) of the infimum in (4.5) for \(\beta _{i}, i=1,2\). Then since the mean under \(\rho \mu _{1}+(1-\rho )\mu _{2}\) is \(\rho \beta _{1}+(1-\rho )\beta _{2}\), we have

$$\begin{aligned} L(x,\rho \beta _{1}+(1-\rho )\beta _{2})&\le R(\rho \mu _{1}(\cdot )+(1-\rho )\mu _{2}(\cdot )\left\| \theta (\cdot |x)\right. )\\&\le \rho R(\mu _{1}(\cdot )\left\| \theta (\cdot |x)\right. )+(1-\rho )R(\mu _{2}(\cdot )\left\| \theta (\cdot |x)\right. )\\&\le \rho L(x,\beta _{1})+(1-\rho )L(x,\beta _{2})+\delta . \end{aligned}$$

Convexity follows, since \(\delta >0\) is arbitrary. Next suppose that \(x_{j}\rightarrow x\) and \(\beta _{j}\rightarrow \beta \) as \(j\rightarrow \infty \). By an argument by contradiction based on subsequences, we can assume without loss that \(L(x_{j},\beta _{j})\) converges in \([0,\infty ]\), and we need to prove that \(L(x,\beta )\le \lim _{j\rightarrow \infty }L(x_{j},\beta _{j})\). If the limit is \(\infty \), there is nothing to prove, and hence we assume \(L(x_{j},\beta _{j})\le M<\infty \) for all j. Choose \(\mu _{j}\) that is within \(\delta >0\) of the infimum in the definition of \(L(x_{j},\beta _{j})\). Then by part (d) of Lemma 2.4, \(\{\mu _{j}\}\) is tight and uniformly integrable. Thus if \(\mu ^{*}\) is the limit of any convergent subsequence, then the mean of \(\mu ^{*}\) is \(\beta \), and so (along this subsequence)

$$ L(x,\beta )\le R(\mu ^{*}(\cdot )\left\| \theta (\cdot |x)\right. )\le \liminf _{j\rightarrow \infty }R(\mu _{j}(\cdot )\left\| \theta (\cdot |x_{j})\right. )\le \liminf _{j\rightarrow \infty }L(x_{j},\beta _{j})+\delta . $$

This establishes the lower semicontinuity.   \(\square \)

Remark 4.15

For the next result we will assume, in addition to Condition 4.3, that the support of \(\theta (\cdot |x)\) is all of \(\mathbb {R}^{d}\). Part (d) of the lemma below was mentioned in Remark 3.10, which noted that the rate function for Cramér’s theorem (which plays the role of the local rate function here) has two variational representations. These correspond here to \(H^{*}\) (as a supremum involving a moment-generating function) and L (as an infimum involving relative entropy). Although for our needs it suffices to prove this assuming Conditions 4.3 and 4.7, the functions \(H^{*}\) and L coincide whenever \(\sup _{x \in \mathbb {R}^{d}} H(x, \alpha )<\infty \) for \(\alpha \) in some open neighborhood of the origin. This will be proved in Lemma 5.4. Several statements in the lemma below hold assuming only Condition 4.3 (cf. [97, Lemma 6.2.3]). However, for simplicity we assume here that both Conditions 4.3 and 4.7 are satisfied.

Lemma 4.16

Assume Conditions 4.3 and 4.7. Then the following conclusions hold.

(a) \(H^{*}\) is finite on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\).

(b) For every \(x\in \mathbb {R}^{d}\), \(\alpha \mapsto H(x,\alpha )\) is strictly convex on \(\mathbb {R}^{d}\).

(c) For every \((x,\beta )\in \mathbb {R}^{d}\times \mathbb {R}^{d}\), there is a unique \(\alpha =\alpha (x,\beta )\in \mathbb {R}^{d}\) such that \(\nabla _{\alpha }H(x,\alpha (x,\beta ))=\beta \).

(d) \(H^{*}=L\).

(e) For every \((x,\beta )\in \mathbb {R}^{d}\times \mathbb {R}^{d}\), with \(\alpha (x,\beta )\) as in part (c),

$$ L(x,\beta )=\langle \alpha (x,\beta ),\beta \rangle -H(x,\alpha (x,\beta )). $$

(f) \((x,\beta )\mapsto L(x,\beta )\) is continuous on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\).

(g) There exists a measurable \(\alpha :\mathbb {R}^{d}\times \mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\) such that the stochastic kernel \(\gamma (dy|x,\beta )\) on \(\mathbb {R}^{d}\) given \(\mathbb {R}^{d}\times \mathbb {R}^{d}\) defined by

$$ \gamma (dy|x,\beta )\doteq e^{\left\langle \alpha (x,\beta ), y\right\rangle -H(x,\alpha (x,\beta ))}\theta (dy|x) $$

satisfies

$$\begin{aligned} R(\gamma (\cdot |x,\beta )\Vert \theta (\cdot |x))=L(x,\beta ) \text{ and } \int _{\mathbb {R}^{d}}y\gamma (dy|x,\beta )=\beta \text{ for } \text{ all } x\in \mathbb {R}^{d},\beta \in \mathbb {R}^{d}. \end{aligned}$$
(4.20)

Proof

Since x plays no role other than as a parameter in parts (a)–(e), it is dropped from the notation in the proofs of these parts.

(a) Let Y be a random variable with distribution \(\theta \), and let \(Y_{i}\) denote the ith component. We first claim that the map \(\alpha \mapsto H(\alpha )\) is superlinear. As stated in Remark 4.15, we assume that the support of \(\theta \) is all of \(\mathbb {R}^{d}\). Thus for each \(M<\infty \),

$$ \varTheta _{M}\doteq \min _{\mathscr {I}\subset \{1,\ldots , d\}}P\left\{ [\cap _{i\in \mathscr {I}}\{Y_{i}\ge M\}]\;{\textstyle \bigcap }\;[\cap _{i\in \mathscr {I}^{c}}\{Y_{i}\le -M\} ]\right\} >0. $$

Therefore, for every \(\alpha \in \mathbb {R}^{d}\),

$$ \frac{1}{\Vert \alpha \Vert +1}\log Ee^{\langle \alpha , Y\rangle }\ge \frac{1}{\Vert \alpha \Vert +1}\log \left[ \varTheta _{M}e^{M\sum _{i=1}^{d}|\alpha _{i}|}\right] =M\frac{\sum _{i=1}^{d}|\alpha _{i}|}{\Vert \alpha \Vert +1}+\frac{\log \varTheta _{M}}{\Vert \alpha \Vert +1}. $$

The superlinearity of H now follows by sending \(\Vert \alpha \Vert \rightarrow \infty \) and then \(M\rightarrow \infty \). The superlinearity implies that the Legendre–Fenchel transform \(H^{*}\) of H is finite everywhere, since for each \(\beta \in \mathbb {R}^{d}\), one can find a compact set \(K\subset \mathbb {R}^{d}\) such that

$$ H^{*}(\beta )=\sup _{\alpha \in \mathbb {R}^{d}}[\langle \alpha ,\beta \rangle -H(\alpha )]=\sup _{\alpha \in K}[\langle \alpha ,\beta \rangle -H(\alpha )]. $$

Since H is continuous, the last expression is finite, and thus (a) follows.

(b) As shown in Lemma 4.14, H is convex. Suppose that for some \(\alpha _{1},\alpha _{2}\in \mathbb {R}^{d}\), \(\alpha _{1}\ne \alpha _{2}\), and \(\rho \in (0,1)\), we have

$$ H(\rho \alpha _{1}+(1-\rho )\alpha _{2})=\rho H(\alpha _{1})+(1-\rho )H(\alpha _{2}). $$

Then the condition for equality in Hölder’s inequality requires that for \(\theta (dy)\) a.e. y,

$$ \frac{\exp \langle \alpha _{1}, y\rangle }{\int _{\mathbb {R}^{d}}\exp \langle \alpha _{1}, z\rangle \theta (dz)}=\frac{\exp \langle \alpha _{2}, y\rangle }{\int _{\mathbb {R}^{d}}\exp \langle \alpha _{2}, z\rangle \theta (dz)}, $$

which implies that

$$ \langle \alpha _{1}-\alpha _{2}, y\rangle =H(\alpha _{1})-H(\alpha _{2}) \text{ a.s. } \theta (dy). $$

In other words, \(\theta \) is supported on a hyperplane of dimension \(d-1\). But this contradicts the fact that the support of \(\theta (dy)\) is all of \(\mathbb {R}^{d}\), which proves the strict convexity of \(\alpha \mapsto H(\alpha )\).

(c) From Corollary 26.4.1 in [217] it now follows that the gradient \(\nabla _{\alpha }H(\alpha )\) is onto \(\mathbb {R}^{d}\). Thus given \(\beta \), there exists a vector \(\alpha (\beta )\) such that \(\nabla _{\alpha }H(\alpha (\beta ))=\beta \). We claim that \(\alpha (\beta )\) is unique. Suppose \(\alpha _{1}\ne \alpha _{2}\) are such that

$$\begin{aligned} \nabla _{\alpha }H(\alpha _{1})=\nabla _{\alpha }H(\alpha _{2})=\beta . \end{aligned}$$
(4.21)

Define \(\zeta :\mathbb {R}\rightarrow \mathbb {R}\) by \(\zeta (\lambda )\doteq H(\alpha _{1}+\lambda (\alpha _{2}-\alpha _{1}))\), \(\lambda \in \mathbb {R}\). From part (b), \(\zeta \) is strictly convex on \(\mathbb {R}\), and so

$$ \zeta ^{\prime }(0)=\langle \nabla _{\alpha }H(\alpha _{1}),\alpha _{2}-\alpha _{1}\rangle <\zeta ^{\prime }(1)=\langle \nabla _{\alpha }H(\alpha _{2}),\alpha _{2}-\alpha _{1}\rangle . $$

This contradicts (4.21), and thus there is only one \(\alpha \) that satisfies \(\nabla _{\alpha }H(\alpha )=\beta \).

(d, e) Setting \(\gamma (dy)=e^{\left\langle \alpha (\beta ), y\right\rangle }\theta (dy)/e^{H(\alpha (\beta ))}\), we have from (4.19),

$$\begin{aligned} \int _{\mathbb {R}^{d}}y\gamma (dy)=\frac{1}{e^{H(\alpha (\beta ))}}\int _{\mathbb {R}^{d}}ye^{\left\langle \alpha (\beta ), y\right\rangle }\theta (dy)=\nabla _{\alpha }H(\alpha (\beta ))=\beta . \end{aligned}$$
(4.22)

A direct calculation using the form of \(\gamma (dy)\), the definition of relative entropy, and the definition of L in (4.5) then gives

$$\begin{aligned} L(\beta )\le R\left( \gamma \left\| \theta \right. \right) =\left\langle \alpha (\beta ),\beta \right\rangle -H(\alpha (\beta ))\le H^{*}(\beta ). \end{aligned}$$
(4.23)

Using the definition of H and part (c) of Proposition 2.3, we have

$$ H(\alpha )=\sup _{\mu \in \mathscr {P}(\mathbb {R}^{d}):R\left( \mu \left\| \theta \right. \right) <\infty }\left[ \int _{\mathbb {R}^{d}}\left\langle \alpha , y\right\rangle \mu (dy)-R\left( \mu \left\| \theta \right. \right) \right] . $$

Therefore, for all \(\alpha \in \mathbb {R}^{d}\) and \(\mu \in \mathscr {P}(\mathbb {R}^{d})\),

$$ R\left( \mu \left\| \theta \right. \right) \ge \left\langle \alpha ,\int _{\mathbb {R}^{d}}y\mu (dy)\right\rangle -H(\alpha ), $$

and consequently

$$ L(\beta )\ge \left\langle \alpha ,\beta \right\rangle -H(\alpha ). $$

Since \(\alpha \in \mathbb {R}^{d}\) is arbitrary, we have \(L(\beta )\ge H^{*}(\beta )\). By (4.23), the reverse inequality holds, which shows that \(L(\beta )=H^{*}(\beta )\) and also proves part (e).

(f) For the last two parts of the lemma we include the x dependence. From Lemma 4.14, \((x,\alpha )\mapsto H(x,\alpha )\) is continuous. We now show that joint continuity of \(L(x,\beta )\) follows from this. If a sequence of differentiable convex functions \(g_{i}\) with Legendre transforms \(g_{i}^{*}\) converges pointwise to another differentiable convex function g with transform \(g^{*}\), and if \(\beta \) is any point such that \(g^{*}(\beta )<\infty \), then whenever \(\beta _{i}\rightarrow \beta \), we have \(g_{i}^{*}(\beta _{i})\rightarrow g^{*}(\beta )\) [97, Lemma C.8.1]. We apply this result with \(g_{i}(\alpha )=H(x_{i},\alpha )\) and \(g(\alpha )=H(x,\alpha )\) to conclude that if \(x_{i}\rightarrow x\) and \(\beta _{i}\rightarrow \beta \), then \(L(x_{i},\beta _{i})\rightarrow L(x,\beta )\).

(g) To see the measurability of \((x,\beta )\mapsto \alpha (x,\beta )\), note that for each x, the strict convexity of \(\alpha \mapsto H(x,\alpha )\) and the fact \(L(x,\beta )=H^{*}(x,\beta )<\infty \) imply that \(\beta \mapsto L(x,\beta )\) is differentiable for all \(\beta \in \mathbb {R}^{d}\) [217, Theorem 26.3]. The characterization of \(H(x,\alpha )\) as the Legendre–Fenchel transform of \(L(x,\beta )\) then gives \(\alpha (x,\beta )=\nabla _{\beta }L(x,\beta )\), from which measurability follows.

The second equality in (4.20) follows from (4.22), while the first equality follows on noting that the first inequality in (4.23) was shown to be an equality.    \(\square \)
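To make part (g) concrete, consider once more the Gaussian setting of Example 4.1 with \(\sigma (x)\sigma ^{T}(x)\) invertible (a worked special case). Solving \(\nabla _{\alpha }H(x,\alpha )=b(x)+\sigma (x)\sigma ^{T}(x)\alpha =\beta \) gives \(\alpha (x,\beta )=[\sigma (x)\sigma ^{T}(x)]^{-1}(\beta -b(x))\), and the tilted kernel is

$$ \gamma (\cdot |x,\beta )=N\left( \beta ,\sigma (x)\sigma ^{T}(x)\right) , $$

the original normal distribution with its mean moved from b(x) to \(\beta \). Both equalities in (4.20) can then be checked directly, with \(L(x,\beta )\) the quadratic form noted in Sect. 4.3.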

7 Laplace Lower Bound Under Condition 4.7

In this section we prove the Laplace principle lower bound under Conditions 4.3 and 4.7. For the proof of this lower bound we construct a nearly optimal trajectory \(\phi ^{*}\) for \(\inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\phi )+I(\phi )\right] \) that has a simple form. Based on \(\phi ^{*}\), a control is constructed for use in the representation, so that the running cost is close to \(I(\phi ^{*})\) and the associated controlled process converges to the nearly optimal trajectory \(\phi ^{*}\) as \(n\rightarrow \infty \).

Fix \(\varepsilon >0\). Then there is \(\zeta \in \mathscr {C}([0,1]:\mathbb {R}^{d})\) such that

$$\begin{aligned} \left[ F(\zeta )+I(\zeta )\right] \le \inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\phi )+I(\phi )\right] +\varepsilon . \end{aligned}$$
(4.24)

While \(\left\{ \zeta (t):0\le t\le 1\right\} \) is bounded by continuity, we also claim that without loss of generality, we can assume that

$$ \{\dot{\zeta }(t):0\le t\le 1\} \text{ is } \text{ bounded. } $$

This claim will be established in Sect. 4.7.3.

Recall from part (f) of Lemma 4.16 that L is continuous. Let \(M<\infty \) and \(K<\infty \) be such that

$$ \sup _{t\in [0,1]}\left\| \zeta (t)\right\| \vee \sup _{t\in [0,1]}\Vert \dot{\zeta }(t)\Vert \le M,\quad \sup _{\{(x,\beta ):\left\| x\right\| \le M+1,\left\| \beta \right\| \le M+1\}}L(x,\beta )\le K. $$

For \(\delta >0\), let \(\zeta ^{\delta }\) be the piecewise linear interpolation of \(\zeta \), with interpolation points \(t=k\delta \). Since \(\zeta \) is absolutely continuous, \(\dot{\zeta }^{\delta }(t)\) converges to \(\dot{\zeta }(t)\) for a.e. \(t\in [0,1]\). Also, since \(\sup _{t\in [0,1]}\Vert \zeta ^{\delta }(t)\Vert \le M\) and \(\sup _{t\in [0,1]}\Vert \dot{\zeta }^{\delta }(t)\Vert \le M\), the continuity of L and the dominated convergence theorem imply that there is \(\delta >0\) such that \(\left[ F(\zeta ^{\delta })+I(\zeta ^{\delta })\right] \le \left[ F(\zeta )+I(\zeta )\right] +\varepsilon \). We set \(\phi ^{*}=\zeta ^{\delta }\) for such a \(\delta \).

7.1 Construction of a Nearly Optimal Control

The construction of a control to apply in the representation is now straightforward. Let \(\gamma (dy|x,\beta )\) be as in part (g) of Lemma 4.16. Recall that \(\bar{\mu }_{i}^{n}\) is allowed to be any measurable function of \(\bar{X}_{j}^{n}, j=0,\ldots , i\). Define \(N^{n}\doteq \inf \{j:\Vert \bar{X}_{j}^{n}-\phi ^{*}(j/n)\Vert >1\}\wedge n\). Then we set

$$\begin{aligned} \bar{\mu }_{i}^{n}(\cdot )=\left\{ \begin{array} [c]{cc}\gamma (\cdot |\bar{X}_{i}^{n},\dot{\phi }^{*}(i/n)) &{} \text {if }i<N^{n},\\ \theta (\cdot |\bar{X}_{i}^{n}) &{} \text {if }i\ge N^{n}.\end{array} \right. \end{aligned}$$
(4.25)

The cost under this control satisfies

$$ E\left[ \frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) \right] =E\left[ \frac{1}{n}\sum _{i=0}^{N^{n}-1}L(\bar{X}_{i}^{n},\dot{\phi }^{*}(i/n))\right] \le K, $$

and therefore Lemma 4.11 applies. Since \(\tau ^{n}\doteq N^{n}/n\) takes values in a compact set, given any subsequence of \(\mathbb {N}\) we can find a further subsequence (again denoted by n) such that \((\bar{X}^{n},\bar{\mu }^{n},\tau ^{n})\) converges in distribution to a limit \((\bar{X},\bar{\mu },\tau )\). Also, it follows from the fact that the mean of \(\bar{\mu }_{i}^{n}\) is \(\dot{\phi }^{*}(i/n)\) for \(i<N^{n}\) that

$$\begin{aligned} x_{0}+\int _{\mathbb {R}^{d}\times [0,t\wedge \tau ^{n}]}y\bar{\mu }^{n}(dy\times ds)&=x_{0}+\frac{1}{n}\sum _{i=0}^{\lfloor nt\rfloor \wedge N^{n}-1}\dot{\phi }^{*}(i/n)+O(1/n)\\&=\phi ^{*}(t\wedge \tau ^{n})+O(1/n). \end{aligned}$$

Using the uniform integrability (4.8), we can send \(n\rightarrow \infty \) and obtain

$$ x_{0}+\int _{\mathbb {R}^{d}\times [0,t]}y\bar{\mu }(dy|s)ds=\phi ^{*}(t) $$

for all \(t\in [0,\tau ]\). It then follows from Lemma 4.12 [see (4.15)] that \(\bar{X}(t)=\phi ^{*}(t)\) for all \(t\in [0,\tau ]\), w.p.1. However, since \(\Vert \bar{X}^{n}(\tau ^{n})-\phi ^{*}(\tau ^{n})\Vert \) converges in distribution to \(\Vert \bar{X}(\tau )-\phi ^{*}(\tau )\Vert \), the definition of \(N^{n}\) implies that on the set \(\tau <1\), we have \(\left\| \bar{X}(\tau )-\phi ^{*}(\tau )\right\| \ge 1\). Thus \(P(\tau <1)=0\), and so \(\bar{X}(t)=\phi ^{*}(t)\) for \(t\in [0,1]\). We conclude that along the full sequence \(\mathbb {N}\), \(\bar{X}^{n}\) converges in distribution to \(\phi ^{*}\).
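Before completing the proof, we illustrate the construction with a minimal simulation sketch (an illustration only, under the assumptions stated in the comments). In the Gaussian setting of Example 4.1 with \(d=1\) and \(\theta (\cdot |x)=N(b(x), a(x))\), the minimizing kernel of part (g) of Lemma 4.16 is the mean-shifted Gaussian \(\gamma (\cdot |x,\beta )=N(\beta , a(x))\), with cost \(L(x,\beta )=(\beta -b(x))^{2}/2a(x)\). The drift b, variance a, and target \(\phi ^{*}\) in the code are hypothetical choices, not part of the formal development.

```python
import numpy as np

def b(x):            # drift of the noise distribution (assumption)
    return -x

def a(x):            # variance of the noise distribution (assumption)
    return 1.0

def phi_star(t):     # a piecewise linear target trajectory (assumption)
    return t

def dphi_star(t):    # its derivative
    return 1.0

def controlled_path(n, x0=0.0, seed=0):
    """Simulate bar X^n under the control (4.25); return path and running cost."""
    rng = np.random.default_rng(seed)
    x, cost, path = x0, 0.0, [x0]
    stopped = False                      # indicates i >= N^n
    for i in range(n):
        t = i / n
        if not stopped and abs(x - phi_star(t)) > 1.0:
            stopped = True               # revert to the original kernel theta
        if stopped:
            v = rng.normal(b(x), np.sqrt(a(x)))
        else:
            beta = dphi_star(t)          # mean of gamma(.|x, beta)
            v = rng.normal(beta, np.sqrt(a(x)))
            cost += (beta - b(x)) ** 2 / (2.0 * a(x)) / n  # running cost L/n
        x += v / n
        path.append(x)
    return np.array(path), cost

path, cost = controlled_path(n=2000)
print("terminal gap:", abs(path[-1] - phi_star(1.0)), "running cost:", cost)
```

For large n the simulated path stays near \(\phi ^{*}\) and the accumulated cost approximates \(\int _{0}^{1}L(\phi ^{*}(t),\dot{\phi }^{*}(t))dt\), in line with the convergence argument above.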

7.2 Completion of the Proof of the Laplace Lower Bound

We now put the pieces together to prove the Laplace lower bound. For the particular control \(\left\{ \bar{\mu }_{i}^{n}\right\} \) just constructed, we have

$$\begin{aligned}&\limsup _{n\rightarrow \infty }-\frac{1}{n}\log E\exp \{-nF(X^{n})\}\\&\quad \le \limsup _{n\rightarrow \infty }E\left[ F(\bar{X}^{n})+\frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(\cdot )\left\| \theta (\cdot |\bar{X}_{i}^{n})\right. \right) \right] \\&\quad =\limsup _{n\rightarrow \infty }E\left[ F(\bar{X}^{n})+\frac{1}{n}\sum _{i=0}^{N^{n}-1}L(\bar{X}_{i}^{n},\dot{\phi }^{*}(i/n))\right] \\&\quad =\left[ F(\phi ^{*})+\int _{0}^{1}L(\phi ^{*}(t),\dot{\phi }^{*}(t))dt\right] \\&\quad \le \inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\phi )+I(\phi )\right] +2\varepsilon . \end{aligned}$$

The first inequality follows since the representation involves an infimum over all controls, and the first equality is due to the definition of \(\bar{\mu }_{i}^{n}\) in (4.25). The second equality follows from the weak convergence \(\bar{X}^{n}\Rightarrow \phi ^{*}\) established at the end of Sect. 4.7.1 [using Lemma 4.12], the uniform bound on \(L(\bar{X}_{i}^{n},\dot{\phi }^{*}(i/n))\) for \(i\le N^{n}-1\), and the dominated convergence theorem. The last inequality uses the fact that \(\phi ^{*}\) as constructed is within \(2\varepsilon \) of the infimum [combine (4.24) with the choice of \(\delta \)]. Since \(\varepsilon >0\) is arbitrary, the Laplace lower bound

$$ \liminf _{n\rightarrow \infty }\frac{1}{n}\log E\exp \{-nF(X^{n})\}\ge -\inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\phi )+I(\phi )\right] $$

follows.   \(\square \)

7.3 Approximation by Bounded Velocity Paths

We now prove the claim that for \(\zeta \) satisfying (4.24), \(\dot{\zeta }\) can also be assumed bounded.

Lemma 4.17

Consider \(\zeta \in \mathscr {C}([0,1]:\mathbb {R}^{d})\) such that \(\left[ F(\zeta )+I(\zeta )\right] <\infty \). Then given \(\varepsilon >0\), there is \(\zeta ^{*}\) such that \(\{\dot{\zeta }^{*}(t):0\le t\le 1\}\) is bounded and

$$ \sup _{0\le t\le 1}\Vert \zeta (t)-\zeta ^{*}(t)\Vert \le \varepsilon ,\;\quad I(\zeta ^{*})\le I(\zeta )+\varepsilon . $$

Proof

Since F is bounded, \(I(\zeta )<\infty \). For \(\lambda \in (0,1)\) let \(D_{\lambda }\doteq \{t:\Vert \dot{\zeta }(t)\Vert \ge 1/\lambda \}\), and define a time rescaling \(S_{\lambda }:[0,1]\rightarrow [0,\infty )\) by \(S_{\lambda }(0)=0\) and

$$ \dot{S}_{\lambda }(t)=\left\{ \begin{array} [c]{cc}\Vert \dot{\zeta }(t)\Vert /(1-\lambda ), &{} t\in D_{\lambda },\\ 1 &{} \text { otherwise.}\end{array} \right. $$

Then \(S_{\lambda }(t)\) is continuous and strictly increasing. Let \(T_{\lambda }\) be the inverse of \(S_{\lambda }\), which means that \(T_{\lambda }\) satisfies \(T_{\lambda }(S_{\lambda }(t))=t\) for all \(t\in [0,1]\) and \(S_{\lambda }(T_{\lambda }(t))=t\) for all \(t\in [0,S_{\lambda }(1)]\supset [0,1]\). Also define \(\zeta _{\lambda }\) on \([0,S_{\lambda }(1)]\) by

$$ \zeta _{\lambda }(t)\doteq \zeta (T_{\lambda }(t)), $$

which is a “slowed” version of \(\zeta \). By the ordinary chain rule, \(\dot{\zeta }_{\lambda }(S_{\lambda }(t))=\dot{\zeta }(t)/\dot{S}_{\lambda }(t)\), and therefore \(\dot{\zeta }_{\lambda }\) is uniformly bounded on \([0,S_{\lambda }(1)]\): it has norm \(1-\lambda \) at points \(S_{\lambda }(t)\) with \(t\in D_{\lambda }\), and norm less than \(1/\lambda \) elsewhere. The \(\zeta ^{*}\) in the lemma will be the restriction of \(\zeta _{\lambda }\) to [0, 1] for a sufficiently small positive \(\lambda \).
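As a concrete illustration of the rescaling (an example only, not needed for the proof), take \(d=1\) and \(\dot{\zeta }(t)=c1_{[0,1/2]}(t)\) with \(c>1/\lambda \). Then \(D_{\lambda }=[0,1/2]\),

$$ S_{\lambda }(t)=\frac{ct}{1-\lambda }\text { for }t\in [0,1/2],\qquad \dot{\zeta }_{\lambda }(s)=1-\lambda \text { for }s\in \left[ 0,\frac{c}{2(1-\lambda )}\right] , $$

so \(\zeta _{\lambda }\) traverses the same segment as \(\zeta \), but at speed \(1-\lambda \) over a proportionally longer time interval.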

From part (c) of Lemma 4.14 and part (d) of Lemma 4.16, \(L(x,\beta )\) is uniformly superlinear in \(\beta \): \(L(x,\beta )/\Vert \beta \Vert \rightarrow \infty \) uniformly in x as \(\Vert \beta \Vert \rightarrow \infty \). This property and \(I(\zeta )<\infty \) imply \(\int _{0}^{1}\Vert \dot{\zeta }(t)\Vert dt<\infty \), and consequently

$$\begin{aligned} \lim _{\lambda \rightarrow 0}\int _{0}^{1}1_{D_{\lambda }}(t)\Vert \dot{\zeta }(t)\Vert dt=0. \end{aligned}$$
(4.26)

It follows that \(\lim _{\lambda \rightarrow 0}S_{\lambda }(s)=s\) uniformly for \(s\in [0,1]\). Since

$$ \sup _{t\in [0,1]}\Vert \zeta _{\lambda }(t)-\zeta (t)\Vert =\sup _{t\in [0,1]}\Vert \zeta (T_{\lambda }(t))-\zeta (t)\Vert =\sup _{s\in [0,T_{\lambda }(1)]}\Vert \zeta (s)-\zeta (S_{\lambda }(s))\Vert , $$

it follows that \(\sup _{t\in [0,1]}\Vert \zeta _{\lambda }(t)-\zeta (t)\Vert \rightarrow 0\) as \(\lambda \rightarrow 0\).

Thus we need only show that \(I(\zeta _{\lambda })\) is close to \(I(\zeta )\). Let

$$ \varGamma \doteq \sup _{t\in [0,1]}\sup _{\beta :\Vert \beta \Vert \le 1}L(\zeta (t),\beta )<\infty . $$

For \(t\in D_{\lambda }\), the nonnegativity of L implies

$$\begin{aligned} L\left( \zeta (t),\frac{\dot{\zeta }(t)}{\dot{S}_{\lambda }(t)}\right) \dot{S}_{\lambda }(t)-L(\zeta (t),\dot{\zeta }(t))&\le L\left( \zeta (t),\frac{(1-\lambda )\dot{\zeta }(t)}{\Vert \dot{\zeta }(t)\Vert }\right) \frac{\Vert \dot{\zeta }(t)\Vert }{1-\lambda }\\&\le \frac{\varGamma }{1-\lambda }\Vert \dot{\zeta }(t)\Vert , \end{aligned}$$

and therefore

$$\begin{aligned} I(\zeta _{\lambda })-I(\zeta )&\le \int _{0}^{S_{\lambda }(1)}L(\zeta _{\lambda }(t),\dot{\zeta }_{\lambda }(t))dt-\int _{0}^{1}L(\zeta (t),\dot{\zeta }(t))dt\\&=\int _{0}^{1}L(\zeta _{\lambda }(S_{\lambda }(t)),\dot{\zeta }_{\lambda }(S_{\lambda }(t)))\dot{S}_{\lambda }(t)dt-\int _{0}^{1}L(\zeta (t),\dot{\zeta }(t))dt\\&=\int _{0}^{1}L\left( \zeta (t),\frac{\dot{\zeta }(t)}{\dot{S}_{\lambda }(t)}\right) \dot{S}_{\lambda }(t)dt-\int _{0}^{1}L(\zeta (t),\dot{\zeta }(t))dt\\&\le \frac{\varGamma }{1-\lambda }\int _{0}^{1}1_{D_{\lambda }}(t)\Vert \dot{\zeta }(t)\Vert dt. \end{aligned}$$

From (4.26), the last expression converges to 0 as \(\lambda \rightarrow 0\). Thus \(\limsup _{\lambda \rightarrow 0}I(\zeta _{\lambda })\le I(\zeta )\) and \(\zeta _{\lambda }\rightarrow \zeta \) as \(\lambda \rightarrow 0\), and the claim is proved.   \(\square \)

With the proof that we can assume that \(\dot{\zeta }\) is bounded for \(\zeta \) appearing in (4.24), the proof of the Laplace lower bound under Condition 4.7 is complete.

8 Laplace Lower Bound Under Condition 4.8

In this section we prove the Laplace principle lower bound without assuming the support condition on \(\theta (\cdot |x)\). Instead, besides Condition 4.3, we use the Lipschitz-type assumption of Condition 4.8. The main difficulty in the proof for this case is that the construction of a piecewise linear nearly minimizing trajectory, which simplified the proof in Sect. 4.7, is not directly available here. The arguments of Sect. 4.7 relied on the finiteness and continuity of the function \((x,\beta )\mapsto L(x,\beta )\), properties that in general will not hold for the setting considered in this section.

In order to overcome this difficulty, we introduce a mollification that in a sense reduces the problem to the form studied in Sect. 4.7. The mollification introduces an error that needs to be carefully controlled. This is the main technical challenge in the proof. We will also make use of Lemma 1.10, which shows that the Laplace principle lower bound holds if and only if it holds for F that are Lipschitz continuous. Mollification techniques are often used in large deviation analysis, and are especially useful in proving lower bounds.

The section is organized as follows. We begin in Sect. 4.8.1 by introducing the mollification of the state dynamics. This takes the form of a small additive Gaussian perturbation, parametrized by \(\sigma >0\), to the noise sequence \(\{v_{i}(X_{i}^{n})\}\) in (4.1). We then estimate the asymptotics of \(-\frac{1}{n}\log E\exp \{-nF(X^{n})\}\) through an analogous expression in which \(X^{n}\) is replaced by the perturbed state process \(Z_{\sigma }^{n}\). Next, in Sect. 4.8.2 we give a variational upper bound for functionals of the perturbed process in terms of a convenient family of controls. The limits of cost functions in this representation are given in terms of a perturbation \(L_{\sigma }\) of the function L introduced in (4.5). Section 4.8.3 studies properties of \(L_{\sigma }\). In particular, we show that \(L_{\sigma }\) is a finite continuous function, is bounded above by L, and satisfies properties analogous to those assumed of L in Condition 4.8. Using these results, in Sect. 4.8.4 we construct a piecewise linear nearly optimal trajectory for \(\inf _{\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\phi )+I_{\sigma }(\phi )\right] \), where \(I_{\sigma }\) is the rate function associated with the local rate function \(L_{\sigma }\); this trajectory is then used to construct an asymptotically nearly optimal control sequence for the representation. Section 4.8.5 studies tightness and convergence properties of the associated controlled processes. Finally, Sect. 4.8.6 uses these convergence results and estimates from Sect. 4.8.1 to complete the proof of the Laplace lower bound. Throughout this section, F will be a real-valued bounded Lipschitz continuous function on \(\mathscr {C}([0,1]:\mathbb {R}^{d})\).

8.1 Mollification

For \(\sigma >0\), let \(\{w_{i,\sigma }\}_{i\in \mathbb {N}_{0}}\) be an iid sequence of Gaussian random vectors with mean 0 and covariance \(\sigma ^{2}I\) that is independent of the random vector fields \(\{v_{i}(\cdot )\}_{i\in \mathbb {N}_{0}}\). For \(n\in \mathbb {N}\) and \(\sigma >0\), consider along with the sequence \(\{X_{i}^{n}\}_{i=0,\ldots , n}\), the sequence \(\{U_{i,\sigma }^{n}\}_{i=0,\ldots , n}\) defined by

$$ U_{i+1,\sigma }^{n}\doteq U_{i,\sigma }^{n}+\frac{1}{n}w_{i,\sigma },\quad U_{0,\sigma }^{n}=0. $$

Define the piecewise linear process \(\{U_{\sigma }^{n}(t)\}_{t\in [0,1]}\) by

$$ U_{\sigma }^{n}(t)\doteq U_{i,\sigma }^{n}+\left[ U_{i+1,\sigma }^{n}-U_{i,\sigma }^{n}\right] \left( nt-i\right) ,\quad t\in [i/n, i/n+1/n]. $$

Let \(Z_{\sigma }^{n}=X^{n}+U_{\sigma }^{n}\), where \(X^{n}\) is as in (4.2). Note that \(Z_{\sigma }^{n}\) is the piecewise linear interpolation of the sequence \(\{Z_{i,\sigma }^{n}\}\), where \(Z_{i,\sigma }^{n}=X_{i}^{n}+U_{i,\sigma }^{n}\).
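Note also that summing the two recursions gives

$$ Z_{i+1,\sigma }^{n}=Z_{i,\sigma }^{n}+\frac{1}{n}\left( v_{i}(X_{i}^{n})+w_{i,\sigma }\right) , $$

so that conditioned on \(X_{i}^{n}=x\), the increment of \(\{Z_{i,\sigma }^{n}\}\) has the law of the convolution of \(\theta (\cdot |x)\) with \(N(0,\sigma ^{2}I)\). Since this law depends on \(X_{i}^{n}\) rather than on \(Z_{i,\sigma }^{n}\), \(\{Z_{i,\sigma }^{n}\}\) by itself need not be Markov, which is why the variational bound of Sect. 4.8.2 is formulated for the pair \((X_{i}^{n}, U_{i,\sigma }^{n})\).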

The following result shows that the quantity appearing in the Laplace lower bound for \(\{X^{n}\}\) can be estimated in terms of the corresponding quantity for \(\{Z_{\sigma }^{n}\}\). For \(\phi \in \mathscr {C}([0,1]:\mathbb {R}^{d})\), recall that \(\Vert \phi \Vert _{\infty }\doteq \sup _{0\le t\le 1}\Vert \phi (t)\Vert \).

Lemma 4.18

For every \(\sigma >0\),

$$ \limsup _{n\rightarrow \infty }-\frac{1}{n}\log Ee^{-nF(X^{n})}\le \limsup _{n\rightarrow \infty }-\frac{1}{n}\log Ee^{-nF(Z_{\sigma }^{n})}+\frac{M^{2}\sigma ^{2}}{2}, $$

where

$$ M\doteq \sup _{\phi ,\eta \in \mathscr {C}([0,1]:\mathbb {R}^{d}),\phi \ne \eta }\frac{|F(\phi )-F(\eta )|}{\Vert \phi -\eta \Vert _{\infty }}. $$

Proof

Let \(B=2\Vert F\Vert _{\infty }\). Then since

$$ F(Z_{\sigma }^{n})\ge F(X^{n})-(M\Vert U_{\sigma }^{n}\Vert _{\infty })\wedge B, $$

we see that

$$\begin{aligned} -\frac{1}{n}\log Ee^{-nF(X^{n})}\le -\frac{1}{n}\log Ee^{-nF(Z_{\sigma }^{n})}+\frac{1}{n}\log Ee^{n[(M\Vert U_{\sigma }^{n}\Vert _{\infty })\wedge B]}. \end{aligned}$$
(4.27)

We now estimate the second term on the right side of (4.27) using the Laplace principle upper bound (which was proved in Sect. 4.5) with \(\theta (\cdot |x)=\rho _{\sigma }(\cdot )\), where \(\rho _{\sigma }\) is the law of a d-dimensional normal random variable with mean 0 and covariance \(\sigma ^{2}I\). Let

$$ H_{\sigma }(\alpha )\doteq \log \int _{\mathbb {R}^{d}}\exp \langle \alpha , y\rangle \rho _{\sigma }(dy)=\frac{\sigma ^{2}}{2}\Vert \alpha \Vert ^{2},\;\alpha \in \mathbb {R}^{d}. $$

The Legendre–Fenchel transform of \(H_{\sigma }\) is given by

$$ L_{\sigma }(\beta )\doteq \sup _{\alpha \in \mathbb {R}^{d}}\left[ \langle \alpha ,\beta \rangle -H_{\sigma }(\alpha )\right] =\frac{1}{2\sigma ^{2}}\Vert \beta \Vert ^{2}. $$

Then \(\{U_{\sigma }^{n}\}_{n\in \mathbb {N}}\) satisfies the Laplace upper bound with rate function

$$ I_{0,\sigma }(\varphi )\doteq \frac{1}{2\sigma ^{2}}\int _{0}^{1}\Vert \dot{\varphi }(s)\Vert ^{2}ds $$

if \(\varphi \in \mathscr {C}([0,1]:\mathbb {R}^{d})\) is absolutely continuous and \(\varphi (0)=0\), and \(I_{0,\sigma }(\varphi )\doteq \infty \) otherwise. This upper bound yields

$$\begin{aligned} \limsup _{n\rightarrow \infty }&\frac{1}{n}\log Ee^{n[(M\Vert U_{\sigma }^{n}\Vert _{\infty })\wedge B]}\nonumber \\&\quad \le -\inf _{\varphi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ I_{0,\sigma }(\varphi )-(M\Vert \varphi \Vert _{\infty })\wedge B\right] \nonumber \\&\quad \le -\inf _{\varphi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ I_{0,\sigma }(\varphi )-M\Vert \varphi \Vert _{\infty }\right] . \end{aligned}$$
(4.28)

For all \(\varphi \in \mathscr {C}([0,1]:\mathbb {R}^{d})\) with \(I_{0,\sigma }(\varphi )<\infty \), the Cauchy–Schwarz inequality gives

$$ \Vert \varphi \Vert _{\infty }^{2}=\sup _{t\in [0,1]}\left\| \int _{0}^{t}\dot{\varphi }(s)ds\right\| ^{2}\le \int _{0}^{1}\Vert \dot{\varphi }(s)\Vert ^{2}ds, $$

and thus

$$\begin{aligned} \inf _{\varphi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ I_{0,\sigma }(\varphi )-M\Vert \varphi \Vert _{\infty }\right]&\ge \inf _{\varphi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ \frac{\Vert \varphi \Vert _{\infty }^{2}}{2\sigma ^{2}}-M\Vert \varphi \Vert _{\infty }\right] \\&=\inf _{r\ge 0}\left[ \frac{r^{2}}{2\sigma ^{2}}-Mr\right] \\&=-\frac{M^{2}\sigma ^{2}}{2}. \end{aligned}$$

The infimum in the last display is attained at \(r=M\sigma ^{2}\). The claim of the lemma now follows from the last display, (4.27), and (4.28).   \(\square \)

8.2 Variational Bound for the Mollified Process

In this section we present a variational bound for \(Ee^{-nF(Z_{\sigma }^{n})}\). The basic idea is to apply Theorem 4.5 with the Markov chain \(\{X_{i}^{n}\}_{i\in \mathbb {N}_{0}}\) replaced by the \(\mathbb {R}^{2d}\)-valued Markov chain \(\{(X_{i}^{n}, U_{i,\sigma }^{n})\}_{i\in \mathbb {N}_{0}}\). Let \(Y_{i,\sigma }^{n}\doteq (X_{i}^{n}, U_{i,\sigma }^{n})\). The following construction is analogous to Construction 4.4, but for the doubled set of noises appearing in the mollification. In addition, we will build in a restriction on the controls, namely that the conditional distribution used for the Gaussian noise \(w_{i}\) does not depend on the current \(v_{i}\).

Construction 4.19

Suppose we are given a probability measure \(\mu ^{n}\in \mathscr {P}((\mathbb {R}^{d}\times \mathbb {R}^{d})^{n})\), and decompose it into a collection of stochastic kernels. With a point in \((\mathbb {R}^{d}\times \mathbb {R}^{d})^{n}\) denoted by \((v_{0}, w_{0}, v_{1}, w_{1},\ldots ,v_{n-1}, w_{n-1})\), \([\mu ^{n}]_{i|0,\ldots , i-1}^{1}\) will denote the conditional distribution of \(v_{i}\) under \(\mu ^{n}\) given \((v_{j},w_{j}), j<i\), and \([\mu ^{n}]_{i|0,\ldots , i-1}^{2}\) will denote the conditional distribution of \(w_{i}\) under \(\mu ^{n}\) given \((v_{j},w_{j}), j<i\), and \(v_{i}\).

We now assume that \(\mu ^{n}\) also has the property that \([\mu ^{n}]_{i|0,\ldots , i-1}^{2}\) does not depend on \(v_{i}\), so that under \(\mu ^{n}\), \(v_{i}\) and \(w_{i}\) are conditionally independent given \((v_{j},w_{j}), j<i\). Let \(\{(\bar{v}_{i}^{n},\bar{w}_{i}^{n})\}_{i=0,\ldots , n-1}\) be random variables defined on a probability space \((\varOmega ,\mathscr {F}, P)\) and with joint distribution \(\mu ^{n}\). Let \(\bar{\mathscr {F}}_{i}^{n}\doteq \sigma ((\bar{v}_{j}^{n},\bar{w}_{j}^{n}), j=0,\ldots , i-1)\), and define

$$\begin{aligned} \bar{\mu }_{i}^{1,n}(dv_{i})&\doteq [\mu ^{n}]_{i|0,\ldots , i-1}^{1}(dv_{i}|\bar{v}_{0}^{n},\bar{w}_{0}^{n}, \ldots ,\bar{v}_{i-1}^{n}, \bar{w}_{i-1}^{n})\\ \bar{\mu }_{i}^{2,n}(dw_{i})&\doteq [\mu ^{n}]_{i|0,\ldots , i-1}^{2}(dw_{i}|\bar{v}_{0}^{n},\bar{w}_{0}^{n}, \ldots ,\bar{v}_{i-1}^{n}, \bar{w}_{i-1}^{n}), \end{aligned}$$

so that these controls pick the distributions of \(\bar{v}_{i}^{n}\) and \(\bar{w}_{i}^{n}\) conditioned on \(\bar{\mathscr {F}}_{i}^{n}\) . Controlled processes \(\bar{X}^{n}\) , \(\bar{U}_{\sigma }^{n}\) and measures \(\bar{L}^{n}\) are then recursively constructed as follows. Let \((\bar{X}_{0}^{n},\bar{U}_{0,\sigma }^{n})=(x_{0}, 0)\) and define

$$\begin{aligned} \bar{X}_{i+1}^{n}=\bar{X}_{i}^{n}+\frac{1}{n}\bar{v}_{i}^{n},\quad \bar{U}_{i+1,\sigma }^{n}=\bar{U}_{i,\sigma }^{n}+\frac{1}{n}\bar{w}_{i}^{n}. \end{aligned}$$
(4.29)

Note that \(\bar{\mathscr {F}}_{i}^{n}=\sigma ((\bar{X}_{j}^{n},\bar{U}_{j,\sigma }^{n}), j=1,\ldots , i)\) . When \(\{(\bar{X}_{i}^{n},\bar{U}_{i,\sigma }^{n})\}_{i=1,\ldots , n}\) has been constructed, \(\bar{X}^{n}(t)\) and \(\bar{U}_{\sigma }^{n}(t)\) are defined as in ( 4.2 ) as the piecewise linear interpolations, and we set \(\bar{Z}_{\sigma }^{n}(t)\doteq \bar{X}^{n}(t)+\bar{U}_{\sigma }^{n}(t)\) for \(t\in [0,1]\) . In addition, define

$$ \bar{L}^{n}(A\times B)\doteq \int _{B}\bar{L}^{n}(A|t)dt,\quad \bar{L}^{n}(A|t)=\delta _{(\bar{v}_{i}^{n},\bar{w}_{i}^{n})}(A)\text { if }t\in [i/n, i/n+1/n). $$

The following is the main result of this section. Owing to the restriction placed on the controls in Construction 4.19, we obtain only an inequality, but the inequality is in the right direction to establish a Laplace lower bound.

Proposition 4.20

Let \(F:\mathscr {C}([0,1]:\mathbb {R}^{d})\rightarrow \mathbb {R}\) be Lipschitz continuous. Given a control \(\{(\bar{\mu }_{i}^{1,n},\bar{\mu }_{i}^{2,n})\}_{i=0,\ldots , n-1}\), let \(\{\bar{X}_{i}^{n}\}\) and \(\{\bar{Z}_{\sigma }^{n}\}\) be defined as in Construction 4.19. Then for all \(n\in \mathbb {N}\) and \(\sigma >0\),

$$\begin{aligned}&-\frac{1}{n}\log Ee^{-nF(Z_{\sigma }^{n})}\nonumber \\&\quad \le \inf _{\{\bar{\mu }_{i}^{1,n},\bar{\mu }_{i}^{2,n}\}}E\left[ F(\bar{Z}_{\sigma }^{n})+\frac{1}{n}\sum _{i=0}^{n-1}\left[ R\left( \bar{\mu }_{i}^{1,n}(\cdot )\Vert \theta (\cdot |\bar{X}_{i}^{n})\right) +R\left( \bar{\mu }_{i}^{2,n}\Vert \rho _{\sigma }\right) \right] \right] . \end{aligned}$$

Proof

We apply Theorem 4.5 with d replaced by 2d, \(\{X_{i}^{n}\}\) replaced by \(\{(X_{i}^{n}, U_{i,\sigma }^{n})\}\), and \(G:\mathscr {P}(\mathbb {R}^{2d}\times [0,1])\rightarrow \mathbb {R}\) defined by

$$ G(\gamma )\doteq F(\varphi _{\gamma }),\;\gamma \in \mathscr {P}(\mathbb {R}^{2d}\times [0,1]), $$

where \(\varphi _{\gamma }\in \mathscr {C}([0,1]:\mathbb {R}^{d})\) is defined by

$$ \varphi _{\gamma }(t)\doteq \int _{\mathbb {R}^{2d}\times [0,t]}(y+z)\gamma (dy\times dz\times ds)+x_{0},\;t\in [0,1] $$

if \(\left\| y+z\right\| \) has finite integral under \(\gamma \), and \(\varphi _{\gamma }\) identically zero otherwise. Let \(Y_{i,\sigma }^{n}=(X_{i}^{n}, U_{i,\sigma }^{n})\) and let \(\bar{Y}_{i,\sigma }^{n}\) be its controlled analogue according to (4.29), with the appropriate replacements, and in particular \(\{\bar{v}_{i}^{n}\}\) replaced by \(\{(\bar{v}_{i}^{n},\bar{w}_{i}^{n})\}\). Let

$$ \bar{\mu }_{i}^{n}(A\times B)\doteq \bar{\mu }_{i}^{1,n}(A)\bar{\mu }_{i}^{2,n}(B),\;A, B\in \mathscr {B}(\mathbb {R}^{d}). $$

Since we have placed restrictions on the measures \(\mu ^{n}\) (equivalently on the controls \(\{(\bar{\mu }_{i}^{1,n},\bar{\mu }_{i}^{2,n})\}\)), Theorem 4.5 yields the inequality

$$\begin{aligned}&-\frac{1}{n}\log Ee^{-nF(Z_{\sigma }^{n})}\\&\quad \le \inf _{\{\bar{\mu }_{i}^{1,n},\bar{\mu }_{i}^{2,n}\}}E\left[ F(\varphi _{\bar{L}^{n}})+\frac{1}{n}\sum _{i=0}^{n-1}R\left( \bar{\mu }_{i}^{n}(dv_{i}\times dw_{i})\Vert \theta (dv_{i}|\bar{X}_{i}^{n})\rho _{\sigma }(dw_{i})\right) \right] .\nonumber \end{aligned}$$
(4.30)

Here we have used that the distribution of the original process on \((v_{i}, w_{i})\) depends on \(Y_{i,\sigma }^{n}\) only through \(X_{i}^{n}\). The chain rule implies

$$ R\left( \bar{\mu }_{i}^{n}(dv_{i}\times dw_{i})\Vert \theta (dv_{i}|\bar{X}_{i}^{n})\rho _{\sigma }(dw_{i})\right) =R\left( \bar{\mu }_{i}^{1,n}(\cdot )\Vert \theta (\cdot |\bar{X}_{i}^{n})\right) +R\left( \bar{\mu }_{i}^{2,n}\Vert \rho _{\sigma }\right) . $$

Finally, from the definition of \(\varphi _{\gamma }\) it follows that if the relative entropy cost is finite, then \(F(\varphi _{\bar{L}^{n}})=F(\bar{Z}_{\sigma }^{n})\) w.p.1. Inserting these into (4.30) completes the proof of the proposition.   \(\square \)

8.3 Perturbation of L and Its Properties

In order to characterize the limits of the relative entropy terms in (4.30), we use a perturbation of the function L introduced in (4.5). For \(\sigma >0\), let

$$ L_{\sigma }(x,\beta )\doteq \sup _{\alpha \in \mathbb {R}^{d}}\left[ \langle \alpha ,\beta \rangle -H(x,\alpha )-\frac{\sigma ^{2}}{2}\Vert \alpha \Vert ^{2}\right] . $$

Note that for each \(x\in \mathbb {R}^{d}\), \(\beta \mapsto L_{\sigma }(x,\beta )\) is the Legendre–Fenchel transform of

$$ H_{\sigma }(x,\alpha )\doteq \log \int _{\mathbb {R}^{d}}e^{\langle \alpha , y\rangle }\theta _{\sigma }(dy|x), $$

where \(\theta _{\sigma }(\cdot |x)\) is the distribution of \(v_{0}(x)+w_{0,\sigma }\).
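For instance, in the Gaussian setting of Example 4.1, where \(H(x,\alpha )=\langle \alpha , b(x)\rangle +\frac{1}{2}\alpha ^{T}A(x)\alpha \) with \(A(x)=\sigma (x)\sigma ^{T}(x)\) (here \(\sigma (x)\) denotes the diffusion matrix, not the mollification parameter), a direct computation gives

$$ H_{\sigma }(x,\alpha )=\langle \alpha , b(x)\rangle +\frac{1}{2}\alpha ^{T}\left( A(x)+\sigma ^{2}I\right) \alpha ,\qquad L_{\sigma }(x,\beta )=\frac{1}{2}(\beta -b(x))^{T}\left( A(x)+\sigma ^{2}I\right) ^{-1}(\beta -b(x)). $$

In particular, even when A(x) is degenerate, so that \(L(x,\cdot )\) is infinite off an affine subspace, \(L_{\sigma }(x,\cdot )\) is finite and continuous everywhere; this illustrates parts (b) and (c) of the lemma that follows.

The following lemma records some important properties of \(L_{\sigma }\).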

Lemma 4.21

Assume Conditions 4.3 and 4.8 and fix \(\sigma >0\). Then the following conclusions hold.

(a) For all \((x,\beta )\in \mathbb {R}^{d}\times \mathbb {R}^{d}\),

$$ L_{\sigma }(x,\beta )=\inf _{b\in \mathbb {R}^{d}}\left[ L(x,\beta -b)+\frac{\Vert b\Vert ^{2}}{2\sigma ^{2}}\right] . $$

(b) For all \((x,\beta )\in \mathbb {R}^{d}\times \mathbb {R}^{d}\), \(L_{\sigma }(x,\beta )\le L(x,\beta )\).

(c) \((x,\beta )\mapsto L_{\sigma }(x,\beta )\) is a finite nonnegative continuous function on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\).

(d) Condition 4.8 is satisfied with L replaced by \(L_{\sigma }\) uniformly in \(\sigma \) in the following sense: for every compact \(K\subset \mathbb {R}^{d}\) and \(\varepsilon \in (0,1)\) there exist \(\bar{\eta }=\bar{\eta }(K,\varepsilon )\in (0,1)\) and \(\bar{m}=\bar{m}(K,\varepsilon )\in (0,\infty )\) such that whenever \(\xi ,\chi \in K\) satisfy \(\Vert \xi -\chi \Vert \le \bar{\eta }\), for every \(\bar{\gamma }\in \mathbb {R}^{d}\) and \(\sigma >0\) we can find \(\bar{\beta }\in \mathbb {R}^{d}\) such that

$$\begin{aligned} L_{\sigma }(\xi ,\bar{\beta })-L_{\sigma }(\chi ,\bar{\gamma })\le \varepsilon (1+L_{\sigma }(\chi ,\bar{\gamma })),\;\;\Vert \bar{\beta }-\bar{\gamma }\Vert \le \bar{m}(1+L_{\sigma }(\chi ,\bar{\gamma }))\Vert \xi -\chi \Vert . \end{aligned}$$
(4.31)

(e) Given \(\zeta \in \mathscr {C}([0,1]:\mathbb {R}^{d})\) satisfying \(I(\zeta )<\infty \), there is a \(\zeta ^{*}\in \mathscr {C}([0,1]:\mathbb {R}^{d})\) that is piecewise linear with finitely many pieces such that \(\Vert \zeta ^{*}-\zeta \Vert _{\infty }<\sigma \) and

$$\begin{aligned} \int _{0}^{1}L_{\sigma }(\zeta ^{*}(t),\dot{\zeta }^{*}(t))dt\le \int _{0}^{1}L_{\sigma }(\zeta (t),\dot{\zeta }(t))dt+\sigma \le I(\zeta )+\sigma . \end{aligned}$$
(4.32)

Proof

(a) For each \(x\in \mathbb {R}^{d}\), \(\beta \mapsto L_{\sigma }(x,\beta )\) is the Legendre–Fenchel transform of the sum of the convex functions \(H(x,\cdot )\) and \(\bar{H}_{\sigma }(\cdot )\), where \(\bar{H}_{\sigma }(\alpha )=\sigma ^{2}\Vert \alpha \Vert ^{2}/2\), \(\alpha \in \mathbb {R}^{d}\). The Legendre transform of the first function is \(L(x,\cdot )\), and that of the second is \(\beta \mapsto \Vert \beta \Vert ^{2}/2\sigma ^{2}\). From Theorem 16.4 of [217] it follows that

$$ L_{\sigma }(x,\beta )=\inf \left[ L(x,\beta _{1})+\frac{\Vert \beta _{2}\Vert ^{2}}{2\sigma ^{2}}:\beta _{i}\in \mathbb {R}^{d}, i=1,2,\beta _{1}+\beta _{2}=\beta \right] , $$

as claimed.

(b) This is an immediate consequence of part (a).

(c) Note that Conditions 4.3 and 4.7 are satisfied with \(H(x,\alpha )\) replaced by \(H_{\sigma }(x,\alpha )=H(x,\alpha )+\sigma ^{2}\Vert \alpha \Vert ^{2}/2\) [consistent with the definition of \(H_{\sigma }\) above, since \(v_{0}(x)\) and \(w_{0,\sigma }\) are independent] and \(\theta (\cdot |x)\) replaced by \(\theta _{\sigma }(\cdot |x)\). In particular, the support condition of Condition 4.7 holds because \(\theta _{\sigma }(\cdot |x)\), being the convolution of \(\theta (\cdot |x)\) with a nondegenerate Gaussian distribution, has all of \(\mathbb {R}^{d}\) as its support. Part (c) now follows from parts (a), (d), and (f) of Lemma 4.16.

(d) Fix a compact \(K\subset \mathbb {R}^{d}\) and \(\varepsilon \in (0,1)\). From Condition 4.8, we can find \(\eta \in (0,1)\) and \(m\in (0,\infty )\) such that whenever \(\xi ,\chi \in K\) satisfy \(\Vert \xi -\chi \Vert \le \eta \), we can find for every \(\gamma \in \mathbb {R}^{d}\) a \(\beta \in \mathbb {R}^{d}\) such that

$$\begin{aligned} L(\xi ,\beta )-L(\chi ,\gamma )\le \frac{\varepsilon }{2}(1+L(\chi ,\gamma )),\;\;\Vert \beta -\gamma \Vert \le m(1+L(\chi ,\gamma ))\Vert \xi -\chi \Vert . \end{aligned}$$
(4.33)

We claim that the statement in (d) holds for \(\bar{\eta }=\eta \) and \(\bar{m}=2m\). To see this, suppose \(\xi ,\chi \in K\) satisfy \(\Vert \xi -\chi \Vert \le \eta \) and \(\bar{\gamma }\in \mathbb {R}^{d}\). Using part (a), we can find \(\bar{b}\in \mathbb {R}^{d}\) such that

$$\begin{aligned} L_{\sigma }(\chi ,\bar{\gamma })\ge L(\chi ,\bar{\gamma }-\bar{b})+\frac{\Vert \bar{b}\Vert ^{2}}{2\sigma ^{2}}-\frac{\varepsilon }{4}. \end{aligned}$$
(4.34)

Using Condition 4.8 and taking \(\gamma =\bar{\gamma }-\bar{b}\), we can find \(\beta \in \mathbb {R}^{d}\) such that (4.33) holds. Letting \(\bar{\beta }=\beta +\bar{b}\), we have

$$\begin{aligned} \Vert \bar{\beta }-\bar{\gamma }\Vert&=\Vert \beta -\gamma \Vert \\&\le m(1+L(\chi ,\bar{\gamma }-\bar{b}))\Vert \xi -\chi \Vert \\&\le m(1+L_{\sigma }(\chi ,\bar{\gamma })+\varepsilon /4)\Vert \xi -\chi \Vert \\&\le 2m(1+L_{\sigma }(\chi ,\bar{\gamma }))\Vert \xi -\chi \Vert \,, \end{aligned}$$

where the second inequality follows from (4.34). This proves the second inequality in (4.31). Also, from part (a),

$$\begin{aligned} L_{\sigma }(\xi ,\bar{\beta })-L_{\sigma }(\chi ,\bar{\gamma })&\le L(\xi ,\bar{\beta }-\bar{b})+\frac{\Vert \bar{b}\Vert ^{2}}{2\sigma ^{2}}-L(\chi ,\bar{\gamma }-\bar{b})-\frac{\Vert \bar{b}\Vert ^{2}}{2\sigma ^{2}}+\frac{\varepsilon }{4}\\&\le \frac{\varepsilon }{2}\left( 1+L(\chi ,\bar{\gamma }-\bar{b})+\frac{\Vert \bar{b}\Vert ^{2}}{2\sigma ^{2}}\right) +\frac{\varepsilon }{4}\\&\le \varepsilon \left( 1+L_{\sigma }(\chi ,\bar{\gamma })\right) , \end{aligned}$$

where the second inequality is from (4.33) and the third is from (4.34). This proves the first inequality in (4.31) and completes the proof of part (d).

(e) Recall that for all \(\sigma >0\), \(H_{\sigma }(\cdot ,\cdot )\) and \(\theta _{\sigma }\) satisfy Conditions 4.3 and 4.7. Fix \(\zeta \in \mathscr {C}([0,1]:\mathbb {R}^{d})\) satisfying \(I(\zeta )<\infty \). Note that \(I_{\sigma }(\zeta )\doteq \int _{0}^{1}L_{\sigma }(\zeta (t),\dot{\zeta }(t))dt\le I(\zeta )\). Then applying Lemma 4.17 to \(I_{\sigma }\), we can find \(\zeta _{1}^{*}\in \mathscr {C}([0,1]:\mathbb {R}^{d})\) such that \(\{\dot{\zeta }_{1}^{*}(t):0\le t\le 1\}\) is bounded, \(\Vert \zeta _{1}^{*}-\zeta \Vert _{\infty }<\frac{\sigma }{2}\), and (4.32) holds with \(\zeta ^{*}\) replaced by \(\zeta _{1}^{*}\) and \(\sigma \) replaced by \(\sigma /2\). The statement in part (e) now follows by taking \(\zeta ^{*}\) to be a piecewise linear approximation of \(\zeta _{1}^{*}\) and using the continuity of \((x,\beta )\mapsto L_{\sigma }(x,\beta )\).   \(\square \)
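As a quick numerical illustration of part (a) (a sketch only; the one-dimensional Gaussian kernel and all parameters are assumptions made for the example), both the supremum defining \(L_{\sigma }\) and the inf-convolution can be evaluated on a grid and compared with the closed form \((\beta -b(x))^{2}/2(a+\sigma ^{2})\) available when \(\theta (\cdot |x)=N(b(x), a)\):

```python
import numpy as np

# Sanity check of part (a) of Lemma 4.21 in d = 1 for theta(.|x) = N(b_x, a):
# then H(x, alpha) = alpha*b_x + a*alpha**2/2, L(x, beta) = (beta-b_x)**2/(2a),
# and both sides of the identity equal (beta - b_x)**2 / (2*(a + sigma**2)).

b_x, a, sigma, beta = 0.3, 0.5, 0.7, 1.2

alphas = np.linspace(-50.0, 50.0, 200001)
lhs = np.max(alphas * beta - (alphas * b_x + a * alphas**2 / 2)
             - sigma**2 * alphas**2 / 2)                   # sup over alpha

bs = np.linspace(-50.0, 50.0, 200001)
rhs = np.min((beta - bs - b_x)**2 / (2 * a)
             + bs**2 / (2 * sigma**2))                     # inf over b

closed_form = (beta - b_x)**2 / (2 * (a + sigma**2))
print(lhs, rhs, closed_form)   # the three values agree to grid accuracy
```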

8.4 A Nearly Optimal Trajectory and Associated Control Sequence

For \(\varepsilon >0\) let \(\zeta \in \mathscr {C}([0,1]:\mathbb {R}^{d})\) be such that

$$\begin{aligned} F(\zeta )+I(\zeta )\le \inf _{\varphi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\varphi )+I(\varphi )\right] +\varepsilon . \end{aligned}$$
(4.35)

Here F, as in Sect. 4.8.1, is a Lipschitz continuous function from \(\mathscr {C}([0,1]:\mathbb {R}^{d})\) to \(\mathbb {R}\), and I is the anticipated rate function for the original system, i.e., the one without mollification. From part (e) of Lemma 4.21, for each fixed \(\sigma \in (0,1)\), we can find a \(\zeta ^{*}\in \mathscr {C}([0,1]:\mathbb {R}^{d})\) that is piecewise linear with finitely many pieces such that \(\Vert \zeta ^{*}-\zeta \Vert _{\infty }\le \sigma \) and (4.32) holds, i.e., \(\int _{0}^{1}L_{\sigma }(\zeta ^{*}(t),\dot{\zeta }^{*}(t))dt\le I(\zeta )+\sigma \).

We now construct a control sequence to be applied to the mollified process for which the running cost is asymptotically close to the left side in (4.32) and the associated controlled process \(\bar{Z}_{\sigma }^{n}\) tracks the nearly optimal trajectory \(\zeta ^{*}\) closely as \(n\rightarrow \infty \). Let

$$ K\doteq \bigcup _{t\in [0,1]}\left\{ y\in \mathbb {R}^{d}:\Vert y-\zeta (t)\Vert \le 2\right\} . $$

Since \(\zeta \) is continuous, K is compact. We apply part (d) of Lemma 4.21 with K defined as in the last display and \(\varepsilon \) as in (4.35). Thus there exist \(\bar{\eta }\in (0,1)\) and \(\bar{m}\in (0,\infty )\) such that whenever \(\xi ,\chi \in K\) satisfy \(\Vert \xi -\chi \Vert \le \bar{\eta }\), we can find for every \(\bar{\gamma }\in \mathbb {R}^{d}\) and \(\sigma \in (0,1)\), a \(\bar{\beta }\in \mathbb {R}^{d}\) such that (4.31) holds. The following lemma says that the selection of \(\bar{\beta }\) can be done in a measurable way. Note that the choice of \(\bar{\eta }\) and \(\bar{m}\) depends on \(\varepsilon \) and K, but is independent of \(\sigma \).

Lemma 4.22

(a) Fix \(\chi ,\bar{\gamma }\in \mathbb {R}^{d}\) and \(\sigma >0\). Given \(\xi \in K(\chi )\doteq \{x\in K:\Vert x-\chi \Vert \le \bar{\eta }\}\), define \(\varGamma _{\xi }\) to be the set of all \(\bar{\beta }\in \mathbb {R}^{d}\) such that (4.31) holds. Then there is a measurable map \(B:K(\chi )\rightarrow \mathbb {R}^{d}\) such that \(B(\xi )\in \varGamma _{\xi }\) for all \(\xi \in K(\chi )\) and \(\chi \in K\).

(b) Let \(K_{\bar{M}}=\{\beta \in \mathbb {R}^{d}:\Vert \beta \Vert \le \bar{M}\}\), where

$$ \bar{M}=\sup _{s\in [0,1],\xi \in K}\left[ \bar{m}\Vert \xi -\zeta ^{*}(s)\Vert \left( 1+L_{\sigma }(\zeta ^{*}(s),\dot{\zeta }^{*}(s))\right) +\Vert \dot{\zeta }^{*}(s)\Vert \right] . $$

Given \((\xi ,\beta )\in K\times K_{\bar{M}}\), define \(\tilde{\varGamma }_{(\xi ,\beta )}\) to be the set of all \((\beta ^{1},\beta ^{2})\in \mathbb {R}^{d}\times \mathbb {R}^{d}\) such that

$$ L(\xi ,\beta ^{1})+\frac{1}{2\sigma ^{2}}\Vert \beta ^{2}\Vert ^{2}\le L_{\sigma }(\xi ,\beta )+\sigma ,\;\beta ^{1}+\beta ^{2}=\beta . $$

Then there are measurable maps \(B^{i}:K\times K_{\bar{M}}\rightarrow \mathbb {R}^{d}\), \(i=1,2\), such that \((B^{1}(\xi ,\beta ), B^{2}(\xi ,\beta ))\in \tilde{\varGamma }_{(\xi ,\beta )}\) for all \((\xi ,\beta )\in K\times K_{\bar{M}}\).

Proof

Corollary E.3 in the appendix is concerned with measurable selections. The proof of part (a) is immediate from this corollary and the continuity of \(L_{\sigma }(\cdot ,\cdot )\). The second part also follows from Corollary E.3 and the lower semicontinuity of L proved in part (b) of Lemma 4.14. Indeed, suppose \((\xi _{n},\beta _{n})\in K\times K_{\bar{M}}\) are such that \(\xi _{n}\rightarrow \xi \) and \(\beta _{n}\rightarrow \beta \), and let \((\beta _{n}^{1},\beta _{n}^{2})\in \tilde{\varGamma }_{(\xi _{n},\beta _{n})}\). Since \(K\times K_{\bar{M}}\) is compact and \(L_{\sigma }\) is continuous, \(\sup _{n\in \mathbb {N}}L_{\sigma }(\xi _{n},\beta _{n})<\infty \). Using the inequality

$$\begin{aligned} L(\xi _{n},\beta _{n}^{1})+\frac{1}{2\sigma ^{2}}\Vert \beta _{n}^{2}\Vert ^{2}\le L_{\sigma }(\xi _{n},\beta _{n})+\sigma \,,\end{aligned}$$
(4.36)

we see that \(\{\beta _{n}^{2}\}\) is bounded, and since \(\beta _{n}^{1}+\beta _{n}^{2}=\beta _{n}\), \(\{\beta _{n}^{1}\}\) is bounded as well. Suppose now that \(\beta _{n}^{1}\rightarrow \beta ^{1}\) and \(\beta _{n}^{2}\rightarrow \beta ^{2}\) along a subsequence. Clearly \(\beta ^{1}+\beta ^{2}=\beta \), and from the lower semicontinuity of L and continuity of \(L_{\sigma }\), (4.36) holds with \((\xi _{n},\beta _{n},\beta _{n}^{1},\beta _{n}^{2})\) replaced by \((\xi ,\beta ,\beta ^{1},\beta ^{2})\). Thus \((\beta ^{1},\beta ^{2})\in \tilde{\varGamma }_{(\xi ,\beta )}\). Hence the assumptions of Corollary E.3 are satisfied, and the result follows.   \(\square \)

As shown in Appendix B, it follows from part (g) of Lemma 4.16 that there are stochastic kernels \(\gamma ^{1}(dy|\xi ,\beta ^{1})\) on \(\mathbb {R}^{d}\) given \(\mathbb {R}^{d}\times \mathbb {R}^{d}\) and \(\gamma ^{2}(dy|\beta ^{2})\) on \(\mathbb {R}^{d}\) given \(\mathbb {R}^{d}\), such that for all \((\xi ,\beta ^{1})\in \mathbb {R}^{d}\times \mathbb {R}^{d}\) and \(\beta ^{2}\in \mathbb {R}^{d}\),

$$ R\left( \gamma ^{1}(\cdot |\xi ,\beta ^{1})\left\| \theta (\cdot |\xi )\right. \right) =L(\xi ,\beta ^{1})\quad \text {and}\quad \int _{\mathbb {R}^{d}}y\gamma ^{1}(dy|\xi ,\beta ^{1})=\beta ^{1}$$

and

$$ R\left( \gamma ^{2}(\cdot |\beta ^{2})\left\| \rho _{\sigma }(\cdot )\right. \right) =\frac{1}{2\sigma ^{2}}\Vert \beta ^{2}\Vert ^{2}\quad \text {and}\quad \int _{\mathbb {R}^{d}}y\gamma ^{2}(dy|\beta ^{2})=\beta ^{2}. $$
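For example, one admissible choice of the second kernel is the Gaussian shift \(\gamma ^{2}(\cdot |\beta ^{2})=N(\beta ^{2},\sigma ^{2}I)\): its mean is \(\beta ^{2}\), and a direct computation with Gaussian densities gives

$$ R\left( N(\beta ^{2},\sigma ^{2}I)\left\| N(0,\sigma ^{2}I)\right. \right) =\frac{1}{2\sigma ^{2}}\Vert \beta ^{2}\Vert ^{2}, $$

so both displayed requirements hold.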

Using these kernels, we recursively define a control sequence \(\{(\bar{\mu }_{i}^{1,n},\bar{\mu }_{i}^{2,n})\}\), controlled processes \(\{(\bar{X}_{i}^{n},\bar{U}_{i,\sigma }^{n})\}\), and a stopping time \(\kappa ^{n}\) as follows. We initialize with \((\bar{X}_{0}^{n},\bar{U}_{0,\sigma }^{n})=(x_{0}, 0)\), and set

$$ \kappa ^{n}\doteq \inf \left\{ i\in \mathbb {N}_{0}:\Vert \bar{X}_{i}^{n}-\zeta ^{*}(i/n)\Vert >\bar{\eta }\right\} \wedge n. $$

At each discrete time \(j=0,\ldots ,\kappa ^{n}-1\) we apply part (a) of Lemma 4.22 with \((\chi ,\bar{\gamma })=(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\). Noting that \(\bar{X}_{j}^{n}\in K(\zeta ^{*}(j/n))\), we define \(\bar{\beta }_{j}^{n}=B(\bar{X}_{j}^{n})\). Note that \(\bar{\beta }_{j}^{n}\in K_{\bar{M}}\). With \(B^{i}\) as in part (b) of Lemma 4.22, let \(\beta _{j}^{i, n}=B^{i}(\bar{X}_{j}^{n},\bar{\beta }_{j}^{n})\), \(i=1,2\). Define

$$ \bar{\mu }_{j}^{1,n}(\cdot )=1_{\{j<\kappa ^{n}\}}\gamma ^{1}(\cdot |\bar{X}_{j}^{n},\beta _{j}^{1,n})+1_{\{j\ge \kappa ^{n}\}}\theta (\cdot |\bar{X}_{j}^{n}), $$
$$ \bar{\mu }_{j}^{2,n}(\cdot )=1_{\{j<\kappa ^{n}\}}\gamma ^{2}(\cdot |\beta _{j}^{2,n})+1_{\{j\ge \kappa ^{n}\}}\rho _{\sigma }(\cdot ) $$

and define \(\bar{v}_{j}^{n},\bar{w}_{j,\sigma }^{n},\bar{X}_{j+1}^{n},\bar{U}_{j+1,\sigma }^{n},\bar{X}^{n},\bar{U}_{\sigma }^{n}\), and \(\bar{Z}_{\sigma }^{n}\) according to Construction 4.19. As in the previous proof of a Laplace lower bound, we revert to the original distributions when \(\bar{X}_{i}^{n}\) wanders farther than \(\bar{\eta }\) from \(\zeta ^{*}(i/n)\) to keep the relative entropy costs uniformly bounded. This will be needed when we study the convergence of the controlled processes.

Note that the choice of the control sequence ensures that for \(j\in \{0,1,\ldots ,\kappa ^{n}-1\}\),

$$\begin{aligned} R\left( \bar{\mu }_{j}^{1,n}(\cdot )\left\| \theta (\cdot |\bar{X}_{j}^{n})\right. \right) =L(\bar{X}_{j}^{n},\beta _{j}^{1,n}),\;\int _{\mathbb {R}^{d}}y\bar{\mu }_{j}^{1,n}(dy)=\beta _{j}^{1,n}\end{aligned}$$
(4.37)

and

$$\begin{aligned} R\left( \bar{\mu }_{j}^{2,n}\left\| \rho _{\sigma }\right. \right) =\frac{1}{2\sigma ^{2}}\Vert \beta _{j}^{2,n}\Vert ^{2},\;\int _{\mathbb {R}^{d}}y\bar{\mu }_{j}^{2,n}(dy)=\beta _{j}^{2,n}.\end{aligned}$$
(4.38)

Also for \(j\in \{\kappa ^{n},\ldots , n-1\}\), \(R(\bar{\mu }_{j}^{1,n}||\theta (\cdot |\bar{X}_{j}^{n}))=R(\bar{\mu }_{j}^{2,n}\left\| \rho _{\sigma }\right. )=0\). From the choice of \((\chi ,\bar{\gamma })=(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\) in the definition of \(\bar{\beta }_{j}^{n}\),

$$\begin{aligned}&L_{\sigma }(\bar{X}_{j}^{n},\bar{\beta }_{j}^{n})-L_{\sigma }(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\le \varepsilon \left[ 1+L_{\sigma }(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\right] ,\end{aligned}$$
(4.39)
$$\begin{aligned}&\Vert \bar{\beta }_{j}^{n}-\dot{\zeta }^{*}(j/n)\Vert \le \bar{m}\left[ 1+L_{\sigma }(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\right] \Vert \bar{X}_{j}^{n}-\zeta ^{*}(j/n)\Vert \end{aligned}$$
(4.40)

and

$$\begin{aligned} L(\bar{X}_{j}^{n},\beta _{j}^{1,n})+\frac{1}{2\sigma ^{2}}\Vert \beta _{j}^{2,n}\Vert ^{2}\le L_{\sigma }(\bar{X}_{j}^{n},\bar{\beta }_{j}^{n})+\sigma .\end{aligned}$$
(4.41)

It follows that

$$\begin{aligned}&E\left[ \frac{1}{n}\sum _{j=0}^{n-1}\left( R\left( \bar{\mu }_{j}^{1,n}(\cdot )\Vert \theta (\cdot |\bar{X}_{j}^{n})\right) +R\left( \bar{\mu }_{j}^{2,n}(\cdot )\Vert \rho _{\sigma }(\cdot )\right) \right) \right] \\&\quad =E\left[ \frac{1}{n}\sum _{j=0}^{\kappa ^{n}-1}\left( R\left( \bar{\mu }_{j}^{1,n}(\cdot )\Vert \theta (\cdot |\bar{X}_{j}^{n})\right) +R\left( \bar{\mu }_{j}^{2,n}(\cdot )\Vert \rho _{\sigma }(\cdot )\right) \right) \right] \\&\quad =E\left[ \frac{1}{n}\sum _{j=0}^{\kappa ^{n}-1}\left( L(\bar{X}_{j}^{n},\beta _{j}^{1,n})+\frac{1}{2\sigma ^{2}}\Vert \beta _{j}^{2,n}\Vert ^{2}\right) \right] \\&\quad \le E\left[ \frac{1}{n}\sum _{j=0}^{\kappa ^{n}-1}L_{\sigma }(\bar{X}_{j}^{n},\bar{\beta }_{j}^{n})\right] +\sigma \\&\quad \le \frac{1}{n}\sum _{j=0}^{n-1}\left[ L_{\sigma }(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))+\varepsilon \left[ 1+L_{\sigma }(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\right] \right] +\sigma , \end{aligned}$$

where the first equality follows on observing that for \(j\ge \kappa ^{n}\), the relative entropy terms in the first line are zero; the second equality follows from (4.37) and (4.38); the first inequality follows from (4.41); and the last inequality follows from (4.39). Taking limits as \(n\rightarrow \infty \) in the last display and using the continuity of \(L_{\sigma }\) and the fact that \(\zeta ^{*}\) is piecewise linear, it follows that

$$\begin{aligned}&\limsup _{n\rightarrow \infty }E\left[ \frac{1}{n}\sum _{j=0}^{n-1}\left( R\left( \bar{\mu }_{j}^{1,n}(\cdot )\Vert \theta (\cdot |\bar{X}_{j}^{n})\right) +R\left( \bar{\mu }_{j}^{2,n}(\cdot )\Vert \rho _{\sigma }(\cdot )\right) \right) \right] \nonumber \\&\quad \le (1+\varepsilon )\int _{0}^{1}L_{\sigma }(\zeta ^{*}(t),\dot{\zeta }^{*}(t))dt+(\sigma +\varepsilon )\nonumber \\&\quad \le (1+\varepsilon )I(\zeta )+2\sigma +\varepsilon (1+\sigma ),\end{aligned}$$
(4.42)

where the last inequality follows on recalling that \(\zeta ^{*}\) was chosen so that (4.32) is satisfied.

We recall that Lemma 4.18 gave a bound for \(Ee^{-nF(X^{n})}\) in terms of \(Ee^{-nF(Z_{\sigma }^{n})}\), and that Proposition 4.20 gave a variational bound for \(-\frac{1}{n}\log Ee^{-nF(Z_{\sigma }^{n})}\). If we combine these with the last display and (4.35), then

$$\begin{aligned} \limsup _{n\rightarrow \infty }-\frac{1}{n}\log Ee^{-nF(X^{n})}&\le F(\zeta )+I(\zeta )+2\sigma +\varepsilon (1+\sigma +I(\zeta ))\\&\quad +\limsup _{n\rightarrow \infty }E\left[ F(\bar{Z}_{\sigma }^{n})-F(\zeta )\right] +M^{2}\sigma ^{2}/2\\&\le \inf _{\varphi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\varphi )+I(\varphi )\right] +2\sigma +\varepsilon (2+\sigma +I(\zeta ))\\&\quad +\limsup _{n\rightarrow \infty }E\left[ F(\bar{Z}_{\sigma }^{n})-F(\zeta )\right] +M^{2}\sigma ^{2}/2, \end{aligned}$$

where M, as in the statement of Lemma 4.18, is the Lipschitz constant of F. Taking the limit as \(\sigma \rightarrow 0\) and then \(\varepsilon \rightarrow 0\) gives

$$\begin{aligned}&\limsup _{n\rightarrow \infty }-\frac{1}{n}\log Ee^{-nF(X^{n})}\\&\quad \le \inf _{\varphi \in \mathscr {C}([0,1]:\mathbb {R}^{d})}\left[ F(\varphi )+I(\varphi )\right] +\limsup _{\varepsilon \rightarrow 0}\limsup _{\sigma \rightarrow 0}\limsup _{n\rightarrow \infty }E\left[ F(\bar{Z}_{\sigma }^{n})-F(\zeta )\right] . \end{aligned}$$

Hence in order to complete the proof of the Laplace principle lower bound, it now suffices to argue that

$$\begin{aligned} \limsup _{\varepsilon \rightarrow 0}\limsup _{\sigma \rightarrow 0}\limsup _{n\rightarrow \infty }E\left[ F(\bar{Z}_{\sigma }^{n})-F(\zeta )\right] =0. \end{aligned}$$
(4.43)

To do this we must analyze the asymptotic properties of the controls and controlled processes.

8.5 Tightness and Convergence of Controlled Processes

To prove (4.43) we will need to establish tightness and characterize the limits of \(\{\bar{Z}_{\sigma }^{n}\}\). The main results that are needed have already been established in Lemmas 4.11 and 4.12. To apply these results, we first must identify correspondences between objects here [on \(\mathbb {R}^{d}\times \mathbb {R}^{d}\)] and those of the lemmas [on \(\mathbb {R}^{d}\)]. We recall the definitions

$$ \bar{L}^{n}(A\times C)\doteq \int _{C}\bar{L}^{n}(A|t)dt,\quad \bar{L}^{n}(A|t)\doteq \delta _{(\bar{v}_{i}^{n},\bar{w}_{i,\sigma }^{n})}(A)\text { if }t\in [i/n, i/n+1/n) $$

and \(\bar{\mu }_{i}^{n}(A\times B)\doteq \bar{\mu }_{i}^{1,n}(A)\bar{\mu }_{i}^{2,n}(B)\). Define random probability measures by

$$\begin{aligned} \bar{\mu }^{1,n}(A\times C)&\doteq \int _{C}\bar{\mu }^{1,n}(A|t)dt,\quad \bar{\mu }^{1,n}(A|t)\doteq \bar{\mu }_{i}^{1,n}(A)\text { if }t\in [i/n, i/n+1/n),\\ \bar{\mu }^{2,n}(B\times C)&\doteq \int _{C}\bar{\mu }^{2,n}(B|t)dt,\quad \bar{\mu }^{2,n}(B|t)\doteq \bar{\mu }_{i}^{2,n}(B)\text { if }t\in [i/n, i/n+1/n), \end{aligned}$$

and

$$\begin{aligned} \bar{\mu }^{n}(A\times B\times C)&\doteq \int _{C}\bar{\mu }^{1,n}(A|t)\bar{\mu }^{2,n}(B|t)dt,\\ \lambda ^{n}(A\times B\times C)&\doteq \int _{C}\lambda ^{n}(A\times B|t)dt,\quad \lambda ^{n}(A\times B|t)\doteq \theta (A|\bar{X}_{i}^{n})\rho _{\sigma }(B)\text { if }t\in [i/n, i/n+1/n). \end{aligned}$$

Here \(A, B\in \mathscr {B}(\mathbb {R}^{d})\) and \(C\in \mathscr {B}([0,1])\). Also, let \(\bar{\kappa }^{n}=\kappa ^{n}/n\).

Lemma 4.23

(a) The collection \(\{(\bar{\mu }^{n},\lambda ^{n},\bar{\mu }^{1,n},\bar{\mu }^{2,n},\bar{X}^{n},\bar{U}_{\sigma }^{n},\bar{\kappa }^{n})\}_{n\in \mathbb {N}}\) is a tight family of random variables with values in

$$ \mathscr {P}(\mathbb {R}^{d}\times \mathbb {R}^{d}\times [0,1])^{2}\times \mathscr {P}(\mathbb {R}^{d}\times [0,1])^{2}\times \mathscr {C}([0,1]:\mathbb {R}^{d})^{2}\times [0,1]. $$

(b) Suppose \((\bar{\mu }^{n},\lambda ^{n},\bar{\mu }^{1,n},\bar{\mu }^{2,n},\bar{X}^{n},\bar{U}_{\sigma }^{n},\bar{\kappa }^{n})\) converges along a subsequence in distribution to \((\bar{\mu },\lambda ,\bar{\mu }^{1},\bar{\mu }^{2},\bar{X},\bar{U}_{\sigma },\bar{\kappa })\). Then a.s., for every \(t\in [0,1]\),

$$ \bar{X}(t)=x_{0}+\int _{\mathbb {R}^{d}\times [0,t]}y\bar{\mu }^{1}(dy\times ds) \text{ and } \bar{U}_{\sigma }(t)=\int _{\mathbb {R}^{d}\times [0,t]}z\bar{\mu }^{2}(dz\times ds). $$

Proof

It follows from (4.42) that

$$ \sup _{n\in \mathbb {N}}E\left[ R\left( \bar{\mu }^{n}\Vert \lambda ^{n}\right) \right] <\infty . $$

Also, from part (a) of Condition 4.3, for every \(\alpha =(\alpha ^{1},\alpha ^{2})\in \mathbb {R}^{2d}\),

$$ \sup _{x\in \mathbb {R}^{d}}\log \int _{\mathbb {R}^{2d}}\exp \left( \langle \alpha ^{1}, y\rangle +\langle \alpha ^{2}, z\rangle \right) \theta (dy|x)\rho _{\sigma }(dz)<\infty . $$

We can therefore apply Lemma 4.11 with \(\theta (dy|x)\) replaced with \(\theta (dy|x)\rho _{\sigma }(dz)\) and \(\bar{\mu }_{i}^{n}\) now given by \(\bar{\mu }_{i}^{1,n}\times \bar{\mu }_{i}^{2,n}\). The lemma implies that the families \(\{\bar{\mu }^{n}\}\) and \(\{\bar{L}^{n}\}\) are tight and satisfy the uniform integrability property

$$\begin{aligned}&\lim _{M\rightarrow \infty }\limsup _{n\rightarrow \infty }E\left[ \int _{\mathbb {R}^{2d}\times [0,1]}\left\| (y, z)\right\| 1_{\left\{ \left\| (y, z)\right\| \ge M\right\} }\bar{\mu }^{n}(dy\times dz\times dt)\right] \nonumber \\&\quad =\lim _{M\rightarrow \infty }\limsup _{n\rightarrow \infty }E\left[ \int _{\mathbb {R}^{2d}\times [0,1]}\left\| (y, z)\right\| 1_{\left\{ \left\| (y, z)\right\| \ge M\right\} }\bar{L}^{n}(dy\times dz\times dt)\right] \nonumber \\&\quad =0. \end{aligned}$$
(4.44)

Tightness of \(\bar{X}^{n}\) and \(\bar{U}_{\sigma }^{n}\) follows from Lemma 4.11 with the identities

$$ (\bar{X}^{n}(t),\bar{U}_{\sigma }^{n}(t))=(x_{0}, 0)+\int _{\mathbb {R}^{2d}\times [0,t]}(y, z)\bar{L}^{n}(dy\times dz\times ds). $$

Finally, tightness of \(\{(\bar{\mu }^{1,n},\bar{\mu }^{2,n})\}\) is immediate from that of \(\{\bar{\mu }^{n}\}\), and the tightness of \(\{\bar{\kappa }^{n}\}\) holds trivially due to the compactness of [0, 1].

It follows from Lemma 4.12 that

$$ (\bar{X}(t),\bar{U}_{\sigma }(t))=(x_{0}, 0)+\int _{\mathbb {R}^{2d}\times [0,t]}(y, z)\bar{\mu }(dy\times dz\times ds). $$

We then use that w.p.1 \(\bar{\mu }^{1}(dy\times ds)=\bar{\mu }(dy\times \mathbb {R}^{d}\times ds)\) and \(\bar{\mu }^{2}(dz\times ds)=\bar{\mu } (\mathbb {R}^{d}\times dz\times ds)\) to get part (b).   \(\square \)

8.6 Completion of the Proof of the Laplace Lower Bound

We now return to the proof of (4.43). We will argue that

$$\begin{aligned} \limsup _{n\rightarrow \infty }E\left[ F(\bar{Z}_{\sigma }^{n})-F(\zeta )\right] \le h(\sigma ,\varepsilon ) ,\end{aligned}$$
(4.45)

where \(h:(0,\infty )\times (0,\infty )\rightarrow [0,\infty )\) satisfies \(\lim _{\sigma \rightarrow 0}h(\sigma ,\varepsilon )=0\) for all \(\varepsilon \in (0,1)\). For this, by a usual subsequential argument it is enough to argue that (4.45) holds along any subsequence as in part (b) of Lemma 4.23 with a function h that is independent of the choice of the subsequence. Using the Skorohod representation theorem [Appendix A, Theorem A.8], we can assume that the convergence in part (b) of Lemma 4.23 is a.s., and without loss we can also assume that it holds along the full sequence. Then for all \(t\in [0,1]\),

$$ \bar{Z}_{\sigma }(t)\doteq \lim _{n\rightarrow \infty }\bar{Z}_{\sigma }^{n}(t)=\lim _{n\rightarrow \infty }\bar{X}^{n}(t)+\lim _{n\rightarrow \infty }\bar{U}_{\sigma }^{n}(t)=\bar{X}(t)+\bar{U}_{\sigma }(t). $$

The following lemma estimates the difference between \(\bar{Z}_{\sigma }\) and \(\bar{X}\).

Lemma 4.24

Let \(m(\sigma ,\varepsilon )\doteq 2\sigma ^{2}\left( (1+\varepsilon )(2\Vert F\Vert _{\infty }+\sigma +\varepsilon )+\sigma +\varepsilon \right) \). For every \(\sigma >0\),

$$ E\Vert \bar{Z}_{\sigma }-\bar{X}\Vert _{\infty }^{2}=E\Vert \bar{U}_{\sigma }\Vert _{\infty }^{2}\le m(\sigma ,\varepsilon ). $$

Proof

We use the convergence of \(\bar{\mu }^{2,n}\) to \(\bar{\mu }^{2}\) and the uniform integrability stated in (4.44). The identities in part (b) of Lemma 4.23 and an application of Fatou’s lemma then imply

$$\begin{aligned} E\Vert \bar{U}_{\sigma }\Vert _{\infty }^{2}&\le \liminf _{n\rightarrow \infty }E\left\| \int _{\mathbb {R}^{d}\times [0,\cdot ]}z\bar{\mu }^{2,n}(dz\times ds)\right\| _{\infty }^{2}\\&\le \liminf _{n\rightarrow \infty }E\frac{1}{n}\sum _{j=0}^{n-1}\left\| \int _{\mathbb {R}^{d}}z\bar{\mu }_{j}^{2,n}(dz)\right\| ^{2}\\&=\liminf _{n\rightarrow \infty }E\frac{1}{n}\sum _{j=0}^{\kappa ^{n}-1}\left\| \beta _{j}^{2,n}\right\| ^{2}\\&\le 2\sigma ^{2}\liminf _{n\rightarrow \infty }\frac{1}{n}\sum _{j=0}^{n-1}\left[ L_{\sigma }(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\right. \\&\quad \quad \quad \quad \quad +\left. \varepsilon \left[ 1+L_{\sigma }(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\right] \right] +2\sigma ^{3}\,,\end{aligned}$$

where the second inequality follows from Jensen’s inequality and the third uses (4.41) and (4.39). Using the continuity of \(L_{\sigma }\) leads to

$$\begin{aligned} E\Vert \bar{U}_{\sigma }\Vert _{\infty }^{2}&\le 2\sigma ^{2}\left( (1+\varepsilon )\int _{0}^{1}L_{\sigma }(\zeta ^{*}(t),\dot{\zeta }^{*}(t))dt\right) +2\sigma ^{2}(\sigma +\varepsilon )\\&\le 2\sigma ^{2}\left( (1+\varepsilon )(I(\zeta )+\sigma )\right) +2\sigma ^{2}(\sigma +\varepsilon )\\&\le 2\sigma ^{2}\left( (1+\varepsilon )(2\Vert F\Vert _{\infty }+\sigma +\varepsilon )\right) +2\sigma ^{2}(\sigma +\varepsilon ), \end{aligned}$$

where the second inequality follows from (4.32) and the third from (4.35). This is the claim of the lemma.   \(\square \)

We recall that \(\beta _{j}^{1,n}+\beta _{j}^{2,n}=\bar{\beta }_{j}^{n}\) and that \(\bar{\beta }_{j}^{n}\) is chosen equal to \(B(\bar{X}_{j}^{n})\), which implies (4.31) with \((\xi ,\bar{\beta },\chi ,\bar{\gamma })=(\bar{X}_{j}^{n},\bar{\beta }_{j}^{n},\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\) for \(j\le \kappa ^{n}-1\). It follows from (4.37), (4.38), and (4.40) that for \(j=0,1,\ldots ,\kappa ^{n}-1\),

$$\begin{aligned}&\left\| \int _{\mathbb {R}^{d}}y\left[ \bar{\mu }_{j}^{1,n}(dy)+\bar{\mu }_{j}^{2,n}(dy)\right] -\dot{\zeta }^{*}(j/n)\right\| \\&\quad =\left\| \beta _{j}^{1,n}+\beta _{j}^{2,n}-\dot{\zeta }^{*}(j/n)\right\| \\&\quad =\left\| \bar{\beta }_{j}^{n}-\dot{\zeta }^{*}(j/n)\right\| \\&\quad \le \bar{m}\left( 1+L_{\sigma }(\zeta ^{*}(j/n),\dot{\zeta }^{*}(j/n))\right) \Vert \bar{X}_{j}^{n}-\zeta ^{*}(j/n)\Vert . \end{aligned}$$

From the uniform integrability property in (4.44) and the a.s. convergence of \((\bar{\mu }^{1,n},\bar{\mu }^{2,n},\bar{\kappa }^{n})\) to \((\bar{\mu }^{1},\bar{\mu }^{2},\bar{\kappa })\), we have for all \(t\in [0,1]\) that

$$ \int _{\mathbb {R}^{d}\times [0,t\wedge \bar{\kappa }^{n}]}y\left[ \bar{\mu }^{1,n}(dy\times ds)+\bar{\mu }^{2,n}(dy\times ds)\right] $$

converges in probability to

$$ \int _{\mathbb {R}^{d}\times [0,t\wedge \bar{\kappa }]}y\left[ \bar{\mu }^{1}(dy\times ds)+\bar{\mu }^{2}(dy\times ds)\right] . $$

Thus for every \(t\in [0,\bar{\kappa }]\),

$$\begin{aligned}&\left\| \int _{\mathbb {R}^{d}\times [0,t]}y\left[ \bar{\mu }^{1}(dy\times ds)+\bar{\mu }^{2}(dy\times ds)\right] -\int _{0}^{t}\dot{\zeta }^{*}(s)ds\right\| \\&\quad =\lim _{n\rightarrow \infty }\left\| \int _{\mathbb {R}^{d}\times [0,t]}y\left[ \bar{\mu }^{1,n}(dy\times ds)+\bar{\mu }^{2,n}(dy\times ds)\right] -\int _{0}^{t}\dot{\zeta }^{*}(s)ds\right\| \\&\quad \le \bar{m}\int _{0}^{t}\Vert \bar{X}(s)-\zeta ^{*}(s)\Vert \left( 1+L_{\sigma }(\zeta ^{*}(s),\dot{\zeta }^{*}(s))\right) ds. \end{aligned}$$

For \(s\in [0,1]\), let \(a(s)=1+L_{\sigma }(\zeta ^{*}(s),\dot{\zeta }^{*}(s))\) and \(b(s)=\Vert \bar{Z}_{\sigma }(s)-\zeta ^{*}(s)\Vert \). Then from part (b) of Lemma 4.23 and since \(\bar{Z}_{\sigma }=\bar{X}+\bar{U}_{\sigma }\), the last display implies that for \(t\in [0,\bar{\kappa }]\),

$$\begin{aligned} b(t)&=\left\| \int _{\mathbb {R}^{d}\times [0,t]}y\left[ \bar{\mu }^{1}(dy\times ds)+\bar{\mu }^{2}(dy\times ds)\right] -\int _{0}^{t}\dot{\zeta }^{*}(s)ds\right\| \\&\le \bar{m}\int _{0}^{t}\Vert \bar{X}(s)-\zeta ^{*}(s)\Vert a(s)ds\\&\le \bar{m}\int _{0}^{t}b(s)a(s)ds+\bar{m}\Vert \bar{X}-\bar{Z}_{\sigma }\Vert _{\infty }\int _{0}^{1}a(s)ds. \end{aligned}$$

Using that [again by (4.32) and (4.35)] \(\int _{0}^{1}a(s)ds\le 1+2\Vert F\Vert _{\infty }+\sigma +\varepsilon \), for all \(\varepsilon ,\sigma \in (0,1)\), Gronwall’s lemma [Lemma E.2] implies

$$\begin{aligned} \Vert \bar{Z}_{\sigma }(\cdot \wedge \bar{\kappa })-\zeta ^{*}(\cdot \wedge \bar{\kappa })\Vert _{\infty }&\le \bar{m}\Vert \bar{X}-\bar{Z}_{\sigma }\Vert _{\infty }\int _{0}^{1}a(s)ds\int _{0}^{1}e^{\bar{m}\int _{0}^{t}a(s)ds}dt\nonumber \\&\le \bar{m}(2\Vert F\Vert _{\infty }+3)\Vert \bar{X}-\bar{Z}_{\sigma }\Vert _{\infty }e^{\bar{m}(2\Vert F\Vert _{\infty }+3)}\nonumber \\&=\bar{m}_{1}\Vert \bar{X}-\bar{Z}_{\sigma }\Vert _{\infty }, \end{aligned}$$
(4.46)

where \(\bar{m}_{1}\doteq \bar{m}(2\Vert F\Vert _{\infty }+3)e^{\bar{m}(2\Vert F\Vert _{\infty }+3)}\). Finally, with M as in Lemma 4.18 equal to the Lipschitz constant of F, we obtain

$$\begin{aligned} \limsup _{n\rightarrow \infty }E\left[ F(\bar{Z}_{\sigma }^{n})-F(\zeta )\right]&\le \limsup _{n\rightarrow \infty }E\left[ M\Vert \bar{Z}_{\sigma }^{n}-\zeta \Vert _{\infty }\wedge 2\Vert F\Vert _{\infty }\right] \\&\le M\left[ \bar{m}_{1}E\Vert \bar{X}-\bar{Z}_{\sigma }\Vert _{\infty }+\sigma \right] +2\Vert F\Vert _{\infty }P(\bar{\kappa }<1)\\&\le M\left( \bar{m}_{1}(m(\sigma ,\varepsilon ))^{1/2}+\sigma \right) +2\Vert F\Vert _{\infty }P(\bar{\kappa }<1). \end{aligned}$$

For the second inequality we use the fact that \(\Vert \zeta -\zeta ^{*}\Vert _{\infty }\le \sigma \), and we partition according to whether \(\bar{\kappa }<1\), using (4.46) when this is not the case. The third inequality follows from Lemma 4.24.

The last quantity we need to control is \(P(\bar{\kappa }<1)\). Since the convergence in path space is with respect to the uniform topology, we have

$$ P(\bar{\kappa }<1)\le P\left( \left\| \bar{Z}_{\sigma }(\cdot \wedge \bar{\kappa })-\zeta ^{*}(\cdot \wedge \bar{\kappa })\right\| _{\infty }\ge \bar{\eta }/2\right) +P\left( \Vert \bar{U}_{\sigma }\Vert _{\infty }\ge \bar{\eta }/2\right) \le \frac{4}{\bar{\eta }^{2}}\left( \bar{m}_{1}^{2}+1\right) m(\sigma ,\varepsilon ), $$

where the first inequality follows from the definition of \(\kappa ^{n}\): on the set \(\{\bar{\kappa }<1\}\) we have \(\Vert \bar{X}(\bar{\kappa })-\zeta ^{*}(\bar{\kappa })\Vert \ge \bar{\eta }\), and since \(\bar{X}=\bar{Z}_{\sigma }-\bar{U}_{\sigma }\), at least one of the two events on the right side must occur. The second inequality uses Chebyshev's inequality together with (4.46) and Lemma 4.24. Thus (4.45) holds with

$$ h(\sigma ,\varepsilon )=M\left( \bar{m}_{1}(m(\sigma ,\varepsilon ))^{1/2}+\sigma \right) +2\Vert F\Vert _{\infty }\frac{4}{\bar{\eta }^{2}(\varepsilon )}\left( \bar{m}_{1}^{2}+1\right) m(\sigma ,\varepsilon ), $$

where we write \(\bar{\eta }=\bar{\eta }(\varepsilon )\) to emphasize its dependence on \(\varepsilon \) (recall from Lemma 4.21 that \(\bar{\eta }\) does not depend on \(\sigma \)). From the definition of \(m(\sigma ,\varepsilon )\) in Lemma 4.24 it follows that \(\lim _{\sigma \rightarrow 0}m(\sigma ,\varepsilon )=0\) for all \(\varepsilon \in (0,1)\). This proves that \(\lim _{\sigma \rightarrow 0}h(\sigma ,\varepsilon )=0\) for every \(\varepsilon \in (0,1)\), and hence (4.43) holds. With the limit (4.43) demonstrated, we have completed the proof of the Laplace lower bound under Condition 4.8.    \(\square \)

9 Notes

Among the very first papers to treat models of this type are those of Wentzell [245–248], Freidlin and Wentzell [140], and Azencott and Ruget [8]. The conditions we use are weaker than those of the cited references. The statement of assumptions and conclusions for this chapter parallels that of Chapter 6 of [97], though the proof is different and is directly analogous to the way Cramér’s theorem was obtained in Chap. 3. In particular, we first prove a process-level generalization of Sanov’s theorem for the driving noises that define the recursive stochastic model. Then, assuming appropriate integrability conditions on the distribution of these noises, one obtains Laplace asymptotics for the process of interest by viewing it as a mapping on this process-level empirical measure. This leads to a somewhat simpler proof, though in all approaches the mollification aspect of the proof under Conditions 4.3 and 4.8 is technical. We also note that the proof given here corrects a gap in a proof in [97], in that the constant M defined on page 205 of [97] depends on \(\sigma \), and therefore the claim in equation (6.65) of [97] is not really established.

As noted several times already, the analysis of the discrete time model of this chapter is in many ways more difficult than that of the corresponding continuous time models, largely because in continuous time the noise enters in an additive and affine manner.