1 Introduction

Return times play a fundamental role in the theory of dynamical systems. In the specific context of a one-sided shift space over a finite alphabet, there is a vast literature on the connection between return times and entropy. One of the landmark results in this direction is the following theorem, due to Wyner and Ziv [WZ89] and to Ornstein and Weiss [OW93]: if \({\mathbb {P}}\) is an ergodic probability measure, then for \({\mathbb {P}}\)-almost every sequence x, the time \(R_n(x)\) it takes for the first n letters to reappear down the sequence x (in the same order) grows exponentially with n at a rate equal to the Kolmogorov–Sinai entropy \(h({\mathbb {P}})\). In other words, return times, once properly rescaled, provide a sequence \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) of universal entropy estimators. Here, “universal” means that the definition of the random variable \(R_n\) itself makes no reference to the measure \({\mathbb {P}}\); computing \(R_n(x)\) requires no explicit information about the marginals of the measure \({\mathbb {P}}\) whose entropy is being estimated. Refinements of this behavior in the form of central limit theorems, laws of the iterated logarithm, large deviation principles (LDPs) and multifractal analysis have been an active area of research since the 1990s; see e.g. [Kon98, CGS99, FW01, Ols03, CU05, Joh06, CFM19, AACG23].

In the present paper, we will prove a full LDP for the sequence \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\), and give an expression of the rate function and pressure in terms of those of the LDP accompanying the celebrated Shannon–McMillan–Breiman (SMB) theorem. We will also carry out this analysis for a nonoverlapping notion of return times along a sequence (see the definition of \(V_n\) below) and for waiting times (see the definition of \(W_n\) below) involving a pair of sequences as in [WZ89, Shi93, MS95, Kon98, CDEJR23a]. The latter will be related to recent results on the large deviations of \((-\tfrac{1}{n} \ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\) with respect to \({\mathbb {P}}\), where \({\mathbb {Q}}_n\) is the n-th marginal of a second shift-invariant probability measure \({\mathbb {Q}}\). For each of these results, the assumptions on the measures involve the notions of decoupling of [CJPS19, CDEJR23a], in turn inspired by [LPS95, Pfi02]. Roughly speaking, these decoupling assumptions take the form of bounds on how strongly different parts of the sequences may depend on each other. However, unlike mixing conditions, they do not require distant symbols to be asymptotically independent. As we shall see, the decoupling assumptions cover many standard classes of probability measures, including irreducible Markov chains, equilibrium measures for sufficiently regular potentials, g-measures, invariant Gibbs states for summable interactions in statistical mechanics, as well as some less standard measures which may be far from Gibbsian in any sense.

To the best of the authors’ knowledge, the large deviations of the sequence \((\tfrac{1}{n} \!\ln R_n)_{\!n\in {\mathbb N}}\) were previously only understood locally (i.e. in a bounded interval containing the entropy \(h({\mathbb {P}})\)), and only for measures satisfying the Bowen–Gibbs property [CGS99, AACG23]. The improvements we provide are three-fold. First, we extend the LDP to the whole real line, and describe the possible lack of convexity of the rate function which prevents the Gärtner–Ellis method used in [AACG23] from producing a global result. Second, we go past the Bowen–Gibbs property by focussing on the class of decoupled measures for which the LDP accompanying the SMB theorem and its analogue for cross entropy can be derived from results recently established in [CJPS19]. Third, we extend this analysis to the related estimators \((\frac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) and \((\frac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) to be introduced below.

Our main results also provide a sharp version of the large-deviation upper bounds on \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) that were obtained in [JB13] for mixing measures on shift spaces, and in [CRS18] for more general dynamical systems.

Organization of the paper. In the remainder of Sect. 1, we discuss our setup and main results. Theorem 0, which collects some large-deviation properties of \((-\tfrac{1}{n} \ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\) with respect to \({\mathbb {P}}\), is viewed as a starting point. Our main new results, Theorems A, B and C, deal with the large deviations of \((\tfrac{1}{n} \ln W_n)_{n\in {\mathbb N}}\), \((\tfrac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) and \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) respectively. In Sect. 1.3, we outline the proof on the basis of a toy model consisting of a mixture of geometric random variables, which we connect to the widely studied problem of exponential approximations of hitting and return times.

In Sect. 2, we discuss a selection of applications and provide concrete formulae for the rate functions and pressures whenever possible. It seems that even in the simplest examples (Bernoulli and Markov measures), some of our results are actually new. We also compare our results to those of [CGS99, AACG23] in Sect. 2.3, and explain how Theorems A and C prove a conjecture stated in [AACG23].

In Sect. 3, we first establish some sharp (at the exponential scale) estimates on the waiting times \(W_n\) (Sect. 3.1), which we then extend to \(R_n\) and \(V_n\) in Sect. 3.2.

Section 4 starts with some reminders about (weak) LDPs, and a brief review of the notion of Ruelle–Lanford functions. We then identify the Ruelle–Lanford functions of \((\tfrac{1}{n} \ln W_n)_{n\in {\mathbb N}}\), \((\tfrac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) and \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) in order to obtain the corresponding weak LDPs.

In Sect. 5, we complete the proofs of Theorems A–C: the weak LDPs of the previous section are promoted to full LDPs, the Legendre–Fenchel duality relations between the rate functions and the associated pressures are established, and the (lack of) convexity of the rate function of \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) is characterized.

In Sect. 6, we gain some insight into the set where the rate functions vanish by combining our LDPs and the corresponding almost sure convergence results (law of large numbers) in the literature.

In order to make the paper accessible to readers who may not be familiar with the tools that we will require from both large-deviation theory and dynamical systems, we include several technical, yet rather standard results in the appendices. We provide in “Appendix A” some definitions regarding subshifts and measures of maximal entropy, which are useful for some of the examples in Sect. 2 as well as for characterizing the convexity of the rate function of \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\). In “Appendix B”, we present technical results about our decoupling assumptions, and in particular some sufficient conditions for them to hold. Finally, in “Appendix C”, we give tools to promote weak LDPs to full ones, and we prove Theorem 0.

Notational conventions. We adopt the convention that \({\mathbb N}= \{1,2,3,4,\dotsc \}\). Unless otherwise stated, measures are probability measures. We use the conventions \(\ln 0 = -\infty \) and \(0 \cdot (\pm \infty ) = 0\) so that, in particular, \(0 \cdot \ln 0 = 0\). Given a function \(f:{\mathbb R}\rightarrow {\mathbb R}\cup \{\infty \}\), we denote by \(f^*: {\mathbb R}\rightarrow {\mathbb R}\cup \{\infty \}\) the Legendre–Fenchel transform (convex conjugate) of f defined by \(f^*(\alpha ) = \sup _{s\in {\mathbb R}}(\alpha s - f(s))\). The ball of radius \(\epsilon >0\) around the point s in a metric space is denoted \(B(s,\epsilon )\).
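As a numerical illustration of the convention for \(f^*\), the following minimal Python sketch approximates the Legendre–Fenchel transform by restricting the supremum to a finite grid. The function name `legendre_fenchel` and the grid are illustrative choices rather than notation from the text, and the truncation only yields a lower bound on the true supremum in general.

```python
from typing import Callable, Sequence


def legendre_fenchel(f: Callable[[float], float],
                     grid: Sequence[float],
                     alpha: float) -> float:
    """Approximate f*(alpha) = sup_s (alpha*s - f(s)) by a maximum over a finite grid."""
    return max(alpha * s - f(s) for s in grid)


# Grid on [-10, 10] with step 0.001 (an arbitrary illustrative choice).
grid = [k / 1000.0 for k in range(-10000, 10001)]

# The function s -> s^2/2 is its own convex conjugate, so f*(1.5) = 1.125.
approx = legendre_fenchel(lambda s: 0.5 * s * s, grid, 1.5)
exact = 0.5 * 1.5 ** 2
```

In this example the maximizer \(s = 1.5\) lies on the grid, so the approximation agrees with the exact value up to rounding.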

1.1 Setup and hypotheses

We consider the one-sided shift space \(\Omega :={{\mathcal {A}}}^{{\mathbb N}} = \{x = (x_k)_{k\in {\mathbb N}}: x_k \in {{\mathcal {A}}}\text { for all } k \in {\mathbb N}\}\) for some finite alphabet \({{\mathcal {A}}}\). The shift map \(T: \Omega \rightarrow \Omega \) is defined by \((Tx)_{n}:= x_{n+1}\). As usual, \(\Omega \) is equipped with the product topology constructed from the discrete topology on \({{\mathcal {A}}}\), and \({{\mathcal {F}}}\) denotes the corresponding Borel \(\sigma \)-algebra.

We use common notations such as: \(x_k^n\) for the letters \(x_k, x_{k+1}, \dots , x_n\) of the sequence \(x\in \Omega \); [u] for the cylinder consisting of the sequences x with \(x_1^n=u\); \({\mathbb {P}}_n\) for the marginal of the probability measure \({\mathbb {P}}\) on \({{\mathcal {A}}}^n\), i.e. \({\mathbb {P}}_n(u) = {\mathbb {P}}([u])\) for all \(u\in {{\mathcal {A}}}^n\). We let \(\Omega _{\text {fin}}:= \bigcup _{n\in {\mathbb N}}{{\mathcal {A}}}^n\) be the set of words of finite length. The length of a word \(u\in {{\mathcal {A}}}^{n}\) is denoted by \(|u|=n\). The concatenation of \(u,v\in \Omega _{\text {fin}}\) is denoted by uv, and the concatenation of m copies of u is denoted by \(u^m\). We use \({{\mathcal {F}}}_n\) for the \(\sigma \)-algebra generated by the cylinder sets [u] with \(u\in {{\mathcal {A}}}^n\), and \({{\mathcal {F}}}_{{\text {fin}}}:= \bigcup _{n\in {\mathbb N}}{{\mathcal {F}}}_n \subset {{\mathcal {F}}}\) for the set of events involving only finitely many coordinates in \(\Omega \).

We denote by \({{\mathcal {P}}}_{\text {inv}}(\Omega )\) the set of shift-invariant Borel probability measures on \(\Omega \). For \({\mathbb {P}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\) and \(n\in {\mathbb N}\), the support of \({\mathbb {P}}_n\) is \({{\,\textrm{supp}\,}}{\mathbb {P}}_n:= \{u\in {{\mathcal {A}}}^n: {\mathbb {P}}_n(u)>0\}\). Moreover, \({{\,\textrm{supp}\,}}{\mathbb {P}}:= \{x\in \Omega : {\mathbb {P}}_n(x)>0\text { for all }n\in {\mathbb N}\}\) is the support of \({\mathbb {P}}\), which is a subshift of \(\Omega \). We denote by \(h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}})\) the topological entropy of \({{\,\textrm{supp}\,}}{\mathbb {P}}\), which satisfies \(h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}})\le \ln |{{\mathcal {A}}}|\); see “Appendix A”.

We now define the three sequences of interest, namely \((R_n)_{n\in {\mathbb N}}\), \((V_n)_{n\in {\mathbb N}}\) and \((W_n)_{n\in {\mathbb N}}\). The return times \(R_n:\Omega \rightarrow {\mathbb N}\) are defined as

$$\begin{aligned} R_n(x):= \inf \{k\in {\mathbb N}: T^k x \in [x_1^n] \}= \inf \{k\in {\mathbb N}: x_{k+1}^{k+n} = x_1^n \} \end{aligned}$$

for all \(x\in \Omega \) and \(n\in {\mathbb N}\), and their nonoverlapping counterparts \(V_n:\Omega \rightarrow {\mathbb N}\) are defined as

$$\begin{aligned} V_n(x):= \inf \{k\in {\mathbb N}: T^{k+n-1}x \in [x_1^n] \} = \inf \{k\in {\mathbb N}: x_{n+k}^{2n+k-1} = x_1^n \} \end{aligned}$$

instead. The waiting times \(W_n: \Omega \times \Omega \rightarrow {\mathbb N}\) are defined as

$$\begin{aligned} W_n(x,y):= \inf \{k\in {\mathbb N}: T^{k-1}y \in [x_1^n] \}= \inf \{k\in {\mathbb N}: y_{k}^{k+n-1} = x_1^n \} \end{aligned}$$

for all \((x,y)\in \Omega \times \Omega \) and \(n\in {\mathbb N}\). In the literature, the function \(W_n(x, \cdot \,)\) is often called the hitting time of the set \([x_1^n]\).
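For concreteness, the three definitions above can be evaluated on finite prefixes as in the following minimal Python sketch. The function names are illustrative choices, and since only finitely many letters are available, each function returns `None` when the infimum is not attained within the observed prefix.

```python
from typing import Optional, Sequence


def return_time(x: Sequence, n: int) -> Optional[int]:
    """R_n: least k >= 1 with x_{k+1}^{k+n} = x_1^n (0-based slice x[k:k+n])."""
    target = tuple(x[:n])
    for k in range(1, len(x) - n + 1):
        if tuple(x[k:k + n]) == target:
            return k
    return None


def nonoverlapping_return_time(x: Sequence, n: int) -> Optional[int]:
    """V_n: least k >= 1 with x_{n+k}^{2n+k-1} = x_1^n (the match starts after x_1^n)."""
    target = tuple(x[:n])
    for k in range(1, len(x) - 2 * n + 2):
        if tuple(x[n + k - 1:2 * n + k - 1]) == target:
            return k
    return None


def waiting_time(x: Sequence, y: Sequence, n: int) -> Optional[int]:
    """W_n: least k >= 1 with y_k^{k+n-1} = x_1^n."""
    target = tuple(x[:n])
    for k in range(1, len(y) - n + 2):
        if tuple(y[k - 1:k - 1 + n]) == target:
            return k
    return None
```

For the prefix `"abaabab"` and n = 2, the word `ab` first reappears at shift 3, so \(R_2 = 3\) and \(V_2 = R_2 - n + 1 = 2\). For `"abababa"` and n = 3, the word `aba` reappears with an overlap: \(R_3 = 2 < 3\), whereas \(V_3 = 2\) records the first occurrence starting after the initial block.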

We next briefly recall some results about almost sure convergence of the sequences \((\frac{1}{n} \ln R_n)_{n\in {\mathbb N}}\), \((\frac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) and \((\frac{1}{n} \ln W_n)_{n\in {\mathbb N}}\), which justify their role as (cross) entropy estimators. The LDPs we prove in the present paper complement these almost sure convergence results, without relying on them or implying them in general; see Sect. 6 for an extended discussion.

First, if \({\mathbb {P}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\), then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n} \ln R_n(x) = \lim _{n\rightarrow \infty }\frac{1}{n} \ln V_n(x) = h_{\mathbb {P}}(x), \end{aligned}$$
(1.1)

for \({\mathbb {P}}\)-almost every \(x\in \Omega \), where \(h_{\mathbb {P}}\) is the local entropy function defined by

$$\begin{aligned} h_{\mathbb {P}}(x):= \lim _{n\rightarrow \infty } -\frac{1}{n} \ln {\mathbb {P}}_n(x_1^n). \end{aligned}$$
(1.2)

By virtue of the SMB theorem, the last limit exists and satisfies \(h_{\mathbb {P}}(x) = h_{\mathbb {P}}(Tx)\) for \({\mathbb {P}}\)-almost every x, and its integral with respect to \({\mathbb {P}}\) is the Kolmogorov–Sinai entropy (also commonly called specific entropy or simply entropy), defined as

$$\begin{aligned} h({\mathbb {P}}):= \lim _{n\rightarrow \infty }-\frac{1}{n} \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u) \ln {\mathbb {P}}_n(u). \end{aligned}$$

When \({\mathbb {P}}\) is ergodic, we have \(h_{\mathbb {P}}(x) = h({\mathbb {P}})\) for \({\mathbb {P}}\)-almost every \(x\in \Omega \), and in this case (1.1) can be traced back to [WZ89, OW93]. The relations (1.1) are less standard when \({\mathbb {P}}\) is merely assumed to be shift invariant; see Sect. 6.1 for references.

Consider now a pair \(({\mathbb {P}},{\mathbb {Q}})\) of shift-invariant probability measures, possibly with \({\mathbb {P}}={\mathbb {Q}}\). As we will discuss in Sect. 6.1, under some assumptions more general than those of Theorem A below, it has recently been proved in [CDEJR23a] that

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n} \ln W_n(x,y) = h_{\mathbb {Q}}(x) \end{aligned}$$

for \(({\mathbb {P}}\otimes {\mathbb {Q}})\)-almost all pairs \((x,y)\in \Omega \times \Omega \), where \(h_{\mathbb {Q}}\) is defined as in (1.2) and integrates with respect to \({\mathbb {P}}\) to the specific cross entropy of \({\mathbb {P}}\) relative to \({\mathbb {Q}}\), i.e.

$$\begin{aligned} h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}}):= \lim _{n\rightarrow \infty } -\frac{1}{n} \sum _{u \in \mathcal{A}^n} {\mathbb {P}}_n(u) \ln {\mathbb {Q}}_n(u). \end{aligned}$$
(1.3)

If \({\mathbb {P}}\) is, in addition, assumed to be ergodic, then \(h_{\mathbb {Q}}(x) = h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\) for \({\mathbb {P}}\)-almost every \(x\in \Omega \). We stress that the \({\mathbb {P}}\)-almost sure existence of \(h_{\mathbb {Q}}\) and the existence of the limit in (1.3) do not follow from mere shift invariance when \({\mathbb {P}}\ne {\mathbb {Q}}\). Note that, with \(h_{\text {r}}({\mathbb {P}}|{\mathbb {Q}})\) the specific relative entropy, we have \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}}) = h({\mathbb {P}}) + h_{\text {r}}({\mathbb {P}}|{\mathbb {Q}})\) whenever both sides are well defined, and that \(h_{\text {c}}({\mathbb {P}}|{\mathbb {P}}) = h({\mathbb {P}})\) is always well defined. The numbers \(h_{\text {r}}({{\mathbb {P}}}|{{\mathbb {Q}}})\) and \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\) are both relevant in information theory (see e.g. [CT06, §5.4]) and are commonly used in classification tasks (see e.g. [GBC16, §3.13] or [Mur22, §5.1.6]).
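The relation between the three entropies can be checked by hand in the simplest case of product (Bernoulli) measures, for which the normalized sums defining \(h\), \(h_{\text {c}}\) and \(h_{\text {r}}\) do not depend on n and reduce to sums over single letters. The following Python sketch uses hypothetical one-letter marginals `p` and `q` chosen purely for illustration, with `q` positive wherever `p` is.

```python
import math


def entropy(p):
    """h(P) = -sum_a p_a ln p_a for a Bernoulli measure with one-letter marginal p."""
    return -sum(pa * math.log(pa) for pa in p if pa > 0)


def cross_entropy(p, q):
    """h_c(P|Q) = -sum_a p_a ln q_a (requires q_a > 0 whenever p_a > 0)."""
    return -sum(pa * math.log(qa) for pa, qa in zip(p, q) if pa > 0)


def relative_entropy(p, q):
    """h_r(P|Q) = sum_a p_a ln(p_a / q_a)."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)


p = [0.7, 0.3]  # hypothetical marginal of P
q = [0.4, 0.6]  # hypothetical marginal of Q

gap = cross_entropy(p, q) - (entropy(p) + relative_entropy(p, q))
```

Here `gap` vanishes up to rounding, and `cross_entropy(p, p)` coincides with `entropy(p)`.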

The pressure \(q_R\) associated with the sequence \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) is the function of \(\alpha \in {\mathbb R}\) defined by

$$\begin{aligned} q_R(\alpha ):= \lim _{n\rightarrow \infty }\frac{1}{n} \ln \int \textrm{e}^{\alpha \ln R_n(x)}\textrm{d}{\mathbb {P}}(x) \end{aligned}$$
(1.4)

when the limit exists. The function \(q_R\) is also referred to as the “rescaled cumulant-generating function” or “\(L^\alpha \)-spectrum” in the literature. Similarly, we define

$$\begin{aligned} q_V(\alpha ):= \lim _{n\rightarrow \infty }\frac{1}{n} \ln \int \textrm{e}^{\alpha \ln V_n(x)}\textrm{d}{\mathbb {P}}(x) \end{aligned}$$
(1.5)

and

$$\begin{aligned} q_W(\alpha ):= \lim _{n\rightarrow \infty }\frac{1}{n} \ln \int \int \textrm{e}^{\alpha \ln W_n(x,y)}\textrm{d}{\mathbb {P}}(x)\textrm{d}{\mathbb {Q}}(y) \end{aligned}$$
(1.6)

when the limits exist.

The above pressures will be expressed in terms of the pressure of the sequence \((-\frac{1}{n} \ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\), which we define as

$$\begin{aligned} \begin{aligned} q_{{\mathbb {Q}}}(\alpha )&:= \lim _{n\rightarrow \infty }\frac{1}{n} \ln \int \textrm{e}^{-\alpha \ln {\mathbb {Q}}_n(x_1^n)} \textrm{d}{\mathbb {P}}(x) \\&= \lim _{n\rightarrow \infty } \frac{1}{n} \ln \sum _{u\in {{\,\textrm{supp}\,}}{\mathbb {P}}_n} {\mathbb {Q}}_n(u)^{-\alpha }{\mathbb {P}}_n(u) \end{aligned} \end{aligned}$$
(1.7)

when the limit exists. It will be part of our assumptions below that \({\mathbb {P}}_n \ll {\mathbb {Q}}_n\) for every \(n\in {\mathbb N}\), i.e. that \({\mathbb {P}}_n\) is absolutely continuous with respect to \({\mathbb {Q}}_n\), so the integral and sum in (1.7) are well defined and finite for each n. When \({\mathbb {Q}}= {\mathbb {P}}\), the summand in the rightmost expression of (1.7) is \({\mathbb {P}}_n(u)^{1-\alpha }\); up to some sign and normalization convention, the pressure \(q_{{\mathbb {P}}}\) thus coincides with the Rényi entropy function of \({\mathbb {P}}\).
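To make (1.7) concrete, consider the special case of two product (Bernoulli) measures with one-letter marginals p and q. The sum over words then factorizes, so the expression under the limit equals \(\ln \sum _{a} p_a q_a^{-\alpha }\) for every n, the sum running over the letters with \(p_a>0\). The Python sketch below compares a brute-force evaluation of the finite-n expression with this closed form; the marginals and function names are hypothetical and serve only as an illustration.

```python
import itertools
import math


def pressure_finite_n(p, q, alpha, n):
    """(1/n) ln sum_{u in supp P_n} Q_n(u)^(-alpha) P_n(u), by enumerating all words u."""
    total = 0.0
    for u in itertools.product(range(len(p)), repeat=n):
        P_u = math.prod(p[a] for a in u)
        if P_u > 0:
            Q_u = math.prod(q[a] for a in u)
            total += Q_u ** (-alpha) * P_u
    return math.log(total) / n


def pressure_bernoulli(p, q, alpha):
    """Closed form ln sum_a p_a q_a^(-alpha), valid for product measures only."""
    return math.log(sum(pa * qa ** (-alpha) for pa, qa in zip(p, q) if pa > 0))


p = [0.7, 0.3]  # hypothetical marginal of P
q = [0.4, 0.6]  # hypothetical marginal of Q

finite = pressure_finite_n(p, q, 0.5, 5)
closed = pressure_bernoulli(p, q, 0.5)
```

With \({\mathbb {Q}}={\mathbb {P}}\) and \(\alpha = 1\), the closed form returns \(\ln 2\), the logarithm of the number of letters charged by p.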

A common route to establishing the LDP, which is followed in particular in [AACG23], is to first study the pressure in detail, and then derive the LDP using an adequate version of the Gärtner–Ellis theorem. Intrinsic to this approach are the differentiability of the pressure and the convexity of the rate function, both of which fail in our setup; see Sect. 2.3 for further discussion of the results in [AACG23]. Our approach goes in the opposite direction: first the LDP is established, and only then is some version of Varadhan’s lemma used to describe the pressure. This method allows us to consider significantly more general measures, and to obtain the full LDP with a possibly nonconvex rate function. This path was already followed in [CJPS19] in order to establish, in particular, the LDP for \((-\tfrac{1}{n} \ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\) with respect to \({\mathbb {P}}\) and the properties of \(q_{{\mathbb {Q}}}\).

Assumptions. In order to state the decoupling assumptions below, we require a sequence \((\tau _n)_{n\in {\mathbb N}}\) in \({\mathbb N}\cup \{0\}\) and a sequence \((C_n)_{n\in {\mathbb N}}\) in \([1,\infty )\), assumed to be fixed and to satisfy

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{\tau _n}{n} = \lim _{n\rightarrow \infty } \frac{\ln C_n}{n} = 0. \end{aligned}$$
(1.8)

We will freely write \(\tau _n = o(n)\) and \(C_n = \textrm{e}^{o(n)}\) or speak of an “o(n)-sequence” and an “\(\textrm{e}^{o(n)}\)-sequence” when referring to the conditions (1.8).

Definition 1.1

(UD, SLD, JSLD, admissible pair) Let \({\mathbb {P}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\). We say that \({\mathbb {P}}\) satisfies the upper decoupling assumption (UD) if for all \(n,m\in {\mathbb N}\), \(u\in {{\mathcal {A}}}^n\), \(v\in {{\mathcal {A}}}^m\) and \(\xi \in {{\mathcal {A}}}^{\tau _n}\),

$$\begin{aligned} {\mathbb {P}}_{n+\tau _n+m}\left( u\xi v\right) \le {C_n} {\mathbb {P}}_n(u){\mathbb {P}}_m(v). \end{aligned}$$
(1.9)

We say that \({\mathbb {P}}\) satisfies the selective lower decoupling assumption (SLD) if for all \(n,m\in {\mathbb N}\), \(u\in {{\mathcal {A}}}^n\) and \(v\in {{\mathcal {A}}}^m\), there exist \(0\le \ell \le \tau _n\) and \(\xi \in {{\mathcal {A}}}^\ell \) such that

$$\begin{aligned} {\mathbb {P}}_{n+\ell +m}\left( u\xi v\right) \ge C_n^{-1} {\mathbb {P}}_n(u){\mathbb {P}}_m(v). \end{aligned}$$
(1.10)

A pair of measures \(({\mathbb {P}}, {\mathbb {Q}})\) with \({\mathbb {P}}, {\mathbb {Q}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\) is said to satisfy the joint selective lower decoupling assumption (JSLD) if for all \(n,m\in {\mathbb N}\), \(u\in {{\mathcal {A}}}^n\) and \(v\in {{\mathcal {A}}}^m\), there exist \(0\le \ell \le \tau _n\) and \(\xi \in {{\mathcal {A}}}^\ell \) such that

$$\begin{aligned} {\mathbb {P}}_{n+\ell +m}\left( u\xi v\right) \ge C_n^{-1} {\mathbb {P}}_n(u){\mathbb {P}}_m(v) \quad \text{ and }\quad {\mathbb {Q}}_{n+\ell +m}\left( u\xi v\right) \ge C_n^{-1} {\mathbb {Q}}_n(u){\mathbb {Q}}_m(v).\nonumber \\ \end{aligned}$$
(1.11)

Finally, a pair of measures \(({\mathbb {P}}, {\mathbb {Q}})\) with \({\mathbb {P}}, {\mathbb {Q}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\) is said to be admissible if \({\mathbb {P}}_n \ll {\mathbb {Q}}_n\) for all \(n\in {\mathbb N}\), if \({\mathbb {P}}\) and \({\mathbb {Q}}\) satisfy UD, and if the pair \(({\mathbb {P}}, {\mathbb {Q}})\) satisfies JSLD.

Remark 1.2

If the pair \(({\mathbb {P}}, {\mathbb {Q}})\) satisfies JSLD, then obviously both \({\mathbb {P}}\) and \({\mathbb {Q}}\) satisfy SLD, but the converse need not hold, since in (1.11) both inequalities are required to hold for the same \(\xi \). If \(\tau _n =0\) for all n, then JSLD becomes equivalent to \({\mathbb {P}}\) and \({\mathbb {Q}}\) both satisfying SLD.

Remark 1.3

We can always increase the constants \(C_n\) and \(\tau _n\) (see Lemma B.1), so there is no loss of generality in taking the same sequences \((\tau _n)_{n\in {\mathbb N}}\) and \((C_n)_{n\in {\mathbb N}}\) for JSLD (resp. SLD) and UD, as well as for both measures \({\mathbb {P}}\) and \({\mathbb {Q}}\).
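As a sanity check of UD and SLD in a simple case, consider a stationary two-state Markov chain whose transition matrix has only positive entries. For such a chain, \({\mathbb {P}}_{n+m}(uv) = {\mathbb {P}}_n(u){\mathbb {P}}_m(v)\, T_{u_n v_1}/\pi _{v_1}\), where T is the transition matrix and \(\pi \) the stationary distribution, so both (1.9) and (1.10) hold with \(\tau _n = 0\) and a constant \(C_n = C\). The Python sketch below verifies this by brute force for short words; the transition probabilities and all names are hypothetical, and the positivity of T is an assumption stronger than irreducibility.

```python
import itertools

a, b = 0.3, 0.6                      # hypothetical transition probabilities
T = [[1 - a, a], [b, 1 - b]]         # transition matrix, all entries positive
pi = [b / (a + b), a / (a + b)]      # stationary distribution of T


def word_prob(u):
    """P_n(u) for the stationary Markov measure: pi(u_1) times the transition weights."""
    prob = pi[u[0]]
    for s, t in zip(u, u[1:]):
        prob *= T[s][t]
    return prob


# P(uv) = P(u) P(v) T[u_n][v_1] / pi[v_1], so the extreme ratios give the constant C.
ratios = [T[s][t] / pi[t] for s in range(2) for t in range(2)]
C = max(max(ratios), 1.0 / min(ratios))


def decoupling_holds(max_len):
    """Check (1.9) and (1.10) with tau_n = 0 (empty xi) for all words up to max_len."""
    for n, m in itertools.product(range(1, max_len + 1), repeat=2):
        for u in itertools.product(range(2), repeat=n):
            for v in itertools.product(range(2), repeat=m):
                joint = word_prob(u + v)
                split = word_prob(u) * word_prob(v)
                # Small relative slack guards against floating-point rounding.
                if joint > C * split * (1 + 1e-12) or joint < split / C * (1 - 1e-12):
                    return False
    return True
```

With these values, \(\pi = (2/3, 1/3)\) and \(C = 1.2\), and `decoupling_holds(4)` confirms both inequalities for all words of length at most 4.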

Next, the following numbers will play an important role:

$$\begin{aligned} \gamma _+:= \limsup _{n\rightarrow \infty }\frac{1}{n} \sup _{u\in {{\mathcal {A}}}^n}\ln {\mathbb {P}}_n(u) \quad \text{ and }\quad \ \gamma _-:= \liminf _{n\rightarrow \infty }\frac{1}{n} \inf _{u\in {{\,\text {supp}\,}}{\mathbb {P}}_n}\ln {\mathbb {P}}_n(u).\nonumber \\ \end{aligned}$$
(1.12)

One easily shows that

$$\begin{aligned} 0 \le -\gamma _+ \le h({\mathbb {P}}) \le h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}}) \le -\gamma _- \le \infty ; \end{aligned}$$
(1.13)

see “Appendix A” for a proof and a definition of the topological entropy \(h_{\text {top}}\). For some results, \(\gamma _+\) will be required to be well approximated by periodic sequences:

Definition 1.4

(PA) A measure \({\mathbb {P}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\) satisfies the periodic approximation assumption (PA) if for every \(\epsilon > 0\), there exist \(p \in {\mathbb N}\) and \(u\in {{\mathcal {A}}}^p\) such that

$$\begin{aligned} \liminf _{n\rightarrow \infty } \frac{1}{np} \ln {\mathbb {P}}_{np }\left( u^n\right) \ge \gamma _+ - \epsilon . \end{aligned}$$

Remark 1.5

If \({\mathbb {P}}\) satisfies SLD with \(\tau _n=0\), then automatically PA holds; a more general sufficient condition for PA to hold is given in Lemma B.6.

We discuss at the end of Sect. 4.2 possible ways to weaken PA.
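For a product (Bernoulli) measure, the quantities in (1.12) and (1.13) are explicit: the most and least likely words of length n are constant words, so \(\gamma _+ = \ln \max _a p_a\) and \(\gamma _- = \ln \min _a p_a\) (minimum over letters with \(p_a > 0\)), and PA holds with period 1 and u the most likely letter. The following Python sketch checks the chain of inequalities (1.13) and the periodic approximation in this case; the marginal `probs` is a hypothetical choice with full support.

```python
import math

probs = [0.5, 0.3, 0.2]  # hypothetical one-letter marginal with full support

gamma_plus = math.log(max(probs))
gamma_minus = math.log(min(pa for pa in probs if pa > 0))
h = -sum(pa * math.log(pa) for pa in probs if pa > 0)   # entropy h(P)
h_top = math.log(sum(1 for pa in probs if pa > 0))      # ln of the number of charged letters

# The chain 0 <= -gamma_+ <= h(P) <= h_top <= -gamma_- of (1.13).
chain = [0.0, -gamma_plus, h, h_top, -gamma_minus]


def periodic_rate(letter, n):
    """(1/n) ln P_n(u^n) for the one-letter word u = letter (period 1)."""
    return math.log(probs[letter] ** n) / n
```

Here `periodic_rate(0, n)` equals `gamma_plus` for every n, so the inequality in Definition 1.4 holds for every \(\epsilon > 0\) with the same word u.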

Starting point. Our analysis is built on top of the LDP for the sequence \((-\frac{1}{n} \ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\) viewed as a family of random variables on \((\Omega ,{\mathbb {P}})\). The following theorem summarizes the large-deviation properties of this sequence and essentially follows from the results of [CJPS19], where the corresponding weak LDP is proved. Taking this weak LDP for granted, the proof of the full theorem only requires some minor adjustments to the arguments in [CJPS19]; for completeness we provide the details in “Appendix C.2”. In some important applications, the conclusions of the theorem are well known and follow from standard methods; see Sects. 2.1–2.3 below. The terminology regarding (weak and full) LDPs, rate functions and exponential tightness is summarized in Sect. 4.

Theorem 0

(LDP for \(-\frac{1}{n} \ln {\mathbb {Q}}_n\)) If \(({\mathbb {P}}, {\mathbb {Q}})\) is an admissible pair, then the following hold:

  1. i.

    The sequence \((-\tfrac{1}{n} \ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\) on \((\Omega , {\mathbb {P}})\) satisfies the LDP with a convex rate function \(I_{{\mathbb {Q}}}: {\mathbb R}\rightarrow [0,\infty ]\) satisfying \(I_{{\mathbb {Q}}}(s) = \infty \) for all \(s<0\).

  2. ii.

    The limit defining \(q_{{\mathbb {Q}}}\) in (1.7) exists in \((-\infty , \infty ]\) for all \(\alpha \in {\mathbb R}\) and defines a nondecreasing, convex, lower semicontinuous function \(q_{{\mathbb {Q}}}:{\mathbb R}\rightarrow (-\infty ,\infty ]\) satisfying \(q_{{\mathbb {Q}}}(0)=0\). Moreover, the following Legendre–Fenchel duality relations hold:

    $$\begin{aligned} q_{{\mathbb {Q}}}= I_{{\mathbb {Q}}}^* \qquad \text {and}\qquad I_{{\mathbb {Q}}}= q_{{\mathbb {Q}}}^*. \end{aligned}$$
    (1.14)
  3. iii.

    Assume now that \({\mathbb {Q}}={\mathbb {P}}\). Then, in addition to the above, the following hold:

    1. a.

      The sequence \((-\frac{1}{n} \ln {\mathbb {P}}_n)_{n\in {\mathbb N}}\) on \((\Omega , {\mathbb {P}})\) is exponentially tight and \(I_{{\mathbb {P}}}\) is a good rate function satisfying

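      $$\begin{aligned} I_{{\mathbb {P}}}(s) {\left\{ \begin{array}{ll} \le s &{}\text {if } s\in [-\gamma _+, -\gamma _-],\\ = \infty &{}\text {otherwise,} \end{array}\right. } \end{aligned}$$
      (1.15)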

      where \([-\gamma _+, -\gamma _-]\) is understood to be \([-\gamma _+, \infty )\) if \(\gamma _- = -\infty \).

    2. b.

      The limit superior (resp. inferior) defining \(\gamma _+\) (resp. \(\gamma _-\)) is actually a limit.

    3. c.

      For every \(\alpha \le 1\),

      $$\begin{aligned} q_{{\mathbb {P}}}(\alpha )\le q_{{\mathbb {P}}}(1) = h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}}). \end{aligned}$$
      (1.16)
    4. d.

      Either \(\gamma _- > -\infty \) and \(q_{{\mathbb {P}}}(\alpha )<\infty \) for all \(\alpha \in {\mathbb R}\), or \(\gamma _- = -\infty \) and \(q_{{\mathbb {P}}}(\alpha ) = \infty \) for all \(\alpha > 1\).

Remark 1.6

As we shall see in Sect. 6, under the assumptions of Theorem 0, the limit defining \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\) exists in \([0, \infty ]\), and we have \(I_{{\mathbb {Q}}}(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}}))=0\) when \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})<\infty \) and \(\lim _{s\rightarrow \infty }I_{{\mathbb {Q}}}(s)=0\) when \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})=\infty \). In particular, \(I_{{\mathbb {P}}}(h({\mathbb {P}}))=0\) when \({\mathbb {Q}}= {\mathbb {P}}\). These conclusions will extend to the other rate functions to be introduced below; see again Sect. 6.

Remark 1.7

In the course of the proof, we shall establish and use the relation

$$\begin{aligned} I_{{\mathbb {P}}}(s) = s - \lim _{\epsilon \rightarrow 0} \limsup _{n\rightarrow \infty }\frac{1}{n} \ln \left| \left\{ u\in {{\mathcal {A}}}^n:-\frac{1}{n} \ln {\mathbb {P}}_n(u)\in B(s,\epsilon )\right\} \right| , \end{aligned}$$
(1.17)

which can be seen as a classical “energy-entropy competition”. Note that the logarithm is either nonnegative or \(-\infty \), which explains why we have either \(I_{{\mathbb {P}}}\le s\) or \(I_{{\mathbb {P}}}=\infty \) in (1.15).

Remark 1.8

A standard consequence of (1.14) and (1.15), which can also be derived from the definition of \(q_{{\mathbb {P}}}\) directly, is that if \({\mathbb {P}}\) satisfies UD and SLD, then

$$\begin{aligned} -\gamma _\mp = \lim _{\alpha \rightarrow \pm \infty }\frac{q_{{\mathbb {P}}}(\alpha )}{\alpha }. \end{aligned}$$
(1.18)

Readers who are unfamiliar with such properties of Legendre–Fenchel transforms may benefit from reading the introductions in Section 2.3 of [DZ09] and Chapter VI of [Ell06], or the in-depth exposition in [Roc70].

Remark 1.9

The assumption that the alphabet \({{\mathcal {A}}}\) is finite is important, as many of our estimates rely on the constant \(|{{\mathcal {A}}}|\) in a way which does not seem easy to circumvent. We do not expect the results to remain true on countably infinite alphabets without further assumptions on the measures at hand. This is a matter we would like to investigate in future research.

1.2 Main results

We shall establish the LDP and express the rate function and pressure for the sequence \((\tfrac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) in terms of \(I_{{\mathbb {Q}}}\) and \(q_{{\mathbb {Q}}}\). Similarly, the rate functions and pressures for \((\tfrac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) and \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) will be expressed in terms of \(I_{{\mathbb {P}}}\) and \(q_{{\mathbb {P}}}\).

Theorem A

(LDP for \(\frac{1}{n} \ln W_n\)) If \(({\mathbb {P}}, {\mathbb {Q}})\) is an admissible pair, then the following hold:

  1. i.

    The sequence \((\tfrac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) satisfies the LDP with respect to \({\mathbb {P}}\otimes {\mathbb {Q}}\) with the convex rate function \(I_W\) given by

    $$\begin{aligned} I_W(s):= {\left\{ \begin{array}{ll} \infty &{}\text {if }s<0,\\ \inf _{r\ge s}(r-s+I_{{\mathbb {Q}}}(r)) &{}\text {if }s\ge 0. \end{array}\right. } \end{aligned}$$
    (1.19)
  2. ii.

    For all \(\alpha \in {\mathbb R}\), the limit in (1.6) exists in \((-\infty , \infty ]\),

    $$\begin{aligned} q_W(\alpha ) = \max \{q_{{\mathbb {Q}}}(\alpha ), q_{{\mathbb {Q}}}(-1)\}, \end{aligned}$$
    (1.20)

    and the Legendre–Fenchel duality relations \(q_W= I_W^*\) and \(I_W= q_W^*\) hold.

  3. iii.

    If \({\mathbb {Q}}={\mathbb {P}}\), then \(I_W\) is a good rate function and \((\tfrac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) is an exponentially tight family of random variables.

Remark 1.10

The relations (1.19) and (1.20) can be written as

$$\begin{aligned} I_W(s)={\left\{ \begin{array}{ll} s_0-s + I_{{\mathbb {Q}}}(s_0)&{}\text {if } s< s_0,\\ I_{{\mathbb {Q}}}(s)&{}\text {if } s\ge s_0, \end{array}\right. } \quad \text {and}\quad q_W(\alpha ) = {\left\{ \begin{array}{ll} q_{{\mathbb {Q}}}(-1)&{}\text {if } \alpha < -1,\\ q_{{\mathbb {Q}}}(\alpha )&{}\text {if } \alpha \ge -1, \end{array}\right. }\nonumber \\ \end{aligned}$$
(1.21)

where \(s_0\) is any point such that \(-1\) belongs to the subdifferential of \(I_{{\mathbb {Q}}}\) at \(s_0\), or equivalently, such that \(s_0\) belongs to the subdifferential of \(q_{{\mathbb {Q}}}\) at \(-1\). In nontrivial cases, \(q_W\) is not differentiable at \(\alpha = -1\). The situation is depicted on the basis of an example in Fig. 1.

Fig. 1. The rate functions and pressures of Theorems 0 and A for some Bernoulli measures \({\mathbb {P}}\) and \({\mathbb {Q}}\); see Sect. 2.1. The rate functions are infinite wherever not drawn.
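The kink of \(q_W\) at \(\alpha = -1\) can be observed numerically for product (Bernoulli) measures, for which the sum in (1.7) factorizes and \(q_{{\mathbb {Q}}}(\alpha ) = \ln \sum _a p_a q_a^{-\alpha }\). In that case \(q_{{\mathbb {Q}}}\) is differentiable and \(s_0 = q_{{\mathbb {Q}}}'(-1)\). The Python sketch below uses hypothetical marginals (not those of Fig. 1) and compares one-sided difference quotients of \(q_W\) at \(-1\).

```python
import math

p = [0.7, 0.3]  # hypothetical marginal of P
q = [0.4, 0.6]  # hypothetical marginal of Q (positive wherever p is)


def q_Q(alpha):
    """Pressure of (-(1/n) ln Q_n) under P for product measures."""
    return math.log(sum(pa * qa ** (-alpha) for pa, qa in zip(p, q)))


def q_W(alpha):
    """Formula (1.20): q_W = max{q_Q(alpha), q_Q(-1)}."""
    return max(q_Q(alpha), q_Q(-1.0))


# s_0 = q_Q'(-1): average of -ln q_a under the weights p_a * q_a.
weights = [pa * qa for pa, qa in zip(p, q)]
s0 = sum(w * (-math.log(qa)) for w, qa in zip(weights, q)) / sum(weights)

eps = 1e-6
left_slope = (q_W(-1.0) - q_W(-1.0 - eps)) / eps    # q_W is constant left of -1
right_slope = (q_W(-1.0 + eps) - q_W(-1.0)) / eps   # approximately s_0 right of -1
```

The left slope is 0 while the right slope is close to \(s_0 > 0\), which is the lack of differentiability mentioned in Remark 1.10.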

The next two theorems involve only one measure \({\mathbb {P}}\). We remark that the pair \(({\mathbb {P}}, {\mathbb {P}})\) is admissible if and only if \({\mathbb {P}}\) satisfies both UD and SLD. As a consequence, the conclusions of Theorem 0 hold under the assumptions of Theorems B and C; in particular \(I_{{\mathbb {P}}}\) and \(q_{{\mathbb {P}}}\) are well defined, and the numbers \(\gamma _+\in (-\infty , 0]\) and \(\gamma _-\in [-\infty ,0]\) defined in (1.12) are actual limits.

Theorem B

(LDP for \(\frac{1}{n} \ln V_n\)) If \({\mathbb {P}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\) satisfies UD and SLD, then the following hold:

  1. i.

    The sequence \((\tfrac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) is exponentially tight and satisfies the LDP with respect to \({\mathbb {P}}\) with the good, convex rate function \(I_V\) given by

    $$\begin{aligned} I_V(s):= {\left\{ \begin{array}{ll} \infty &{}\text {if }s<0,\\ \inf _{r\ge s}(r-s+I_{{\mathbb {P}}}(r)) &{}\text {if }s\ge 0. \end{array}\right. } \end{aligned}$$
    (1.22)
  2. ii.

    For all \(\alpha \in {\mathbb R}\), the limit in (1.5) exists in \((-\infty , \infty ]\),

    $$\begin{aligned} q_V(\alpha ) = \max \{q_{{\mathbb {P}}}(\alpha ), q_{{\mathbb {P}}}(-1)\}, \end{aligned}$$
    (1.23)

    and the Legendre–Fenchel duality relations \(q_V= I_V^*\) and \(I_V= q_V^*\) hold. Moreover,

    $$\begin{aligned} I_V(0) = -q_{{\mathbb {P}}}(-1) \ge - \gamma _+. \end{aligned}$$
    (1.24)

Remark 1.11

The rate function \(I_V\) corresponds to \(I_W\) in the special case \({\mathbb {Q}}= {\mathbb {P}}\). While one can, of course, choose \({\mathbb {Q}}= {\mathbb {P}}\) in Theorem A (in this case it suffices that \({\mathbb {P}}\) satisfies UD and SLD), this special case is not equivalent to Theorem B. Indeed, \(W_n\) and \(V_n\) are still distinct in their definition and underlying probability space. It is known that the range of applicability of almost sure entropy estimation via \(W_n\) is strictly smaller than that via \(V_n\) (or \(R_n\)); see [OW93] and [Shi93, §4].

Theorem C

(LDP for \(\frac{1}{n} \ln R_n\)) If \({\mathbb {P}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\) satisfies UD, SLD and PA, then the following hold:

  1. i.

    The sequence \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) is exponentially tight and satisfies the LDP with respect to \({\mathbb {P}}\) with the good (possibly nonconvex) rate function \(I_R\) given by

    $$\begin{aligned} I_R(s):= {\left\{ \begin{array}{ll} \infty &{}\text {if }s<0,\\ -\gamma _+ &{}\text {if }s = 0,\\ \inf _{r\ge s}(r-s+I_{{\mathbb {P}}}(r)) &{}\text {if }s>0. \end{array}\right. } \end{aligned}$$
    (1.25)
  2. ii.

    For all \(\alpha \in {\mathbb R}\), the limit in (1.4) exists in \((-\infty , \infty ]\) and

    $$\begin{aligned} q_R(\alpha ) = \max \{q_{{\mathbb {P}}}(\alpha ), \gamma _+\}. \end{aligned}$$
    (1.26)

    Moreover, \(q_R= I_R^*\).

  3. iii.

    We have the following relations: \(I_R\) is convex \(\iff \) \(I_R= I_V\) \(\iff \) \(q_R= q_V\) \(\iff \) \(\gamma _+ = q_{{\mathbb {P}}}(-1)\) \(\iff \) \(I_{{\mathbb {P}}}(-\gamma _+) = 0\) \(\iff \) \(q_{{\mathbb {P}}}(\alpha ) = -\gamma _+\alpha \) for all \(\alpha \le 0\) \(\Longleftarrow \) \(\gamma _+= \gamma _-\) \(\iff \) \(I_{{\mathbb {P}}}(s) = \infty \) for all \(s \in {\mathbb R}\setminus \{-\gamma _+\}\) \(\iff \) \(h({\mathbb {P}}) = h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}})\). Moreover, if \(\tau _n=O(1)\) and \(C_n=O(1)\), then all the above properties are actually equivalent.

By their definition, the rate functions \(I_R\) and \(I_V\) may differ only at 0, and by (1.24) we have \(I_R(0) = -\gamma _+\le I_V(0)\); see Fig. 2 for an illustration. Part iii gives many technical conditions equivalent to \(I_R= I_V\), the most notable of which is the convexity of \(I_R\). The situation where \(I_R= I_V\) should be seen as quite degenerate; in the generic case, we have \(I_R(0)<I_V(0)\), as strikingly illustrated by the examples in Sects. 2.1 and 2.2. The generic inequality \(I_R(0)<I_V(0)\) is due to the fact that the definition of \(R_n\) allows for overlaps (characterized by \(R_n < n\)) which are excluded in \(V_n\). These overlaps, which make the LDP for \((\frac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) highly interesting, were widely studied in other contexts; see the discussion and references in Sect. 1.3. With the help of PA, we will show in Sect. 4.2 that \({\mathbb {P}}\{x:R_n(x) < n\}\) asymptotically decreases like \(\textrm{e}^{n\gamma _+}\), which explains the equality \(I_R(0)=-\gamma _+\).
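The decay of \({\mathbb {P}}\{x:R_n(x) < n\}\) can be explored by brute force in the simplest setting. The following Python sketch assumes a Bernoulli measure on two letters (with the one-letter marginal quoted for Fig. 2 in Sect. 2.1, for which \(\gamma _+ = \ln \max _a {\mathbb {P}}_1(a)\)) and enumerates all words of length \(2n-1\); the function names are hypothetical and the enumeration is only feasible for small n.

```python
import itertools
import math

# Assumed Bernoulli marginal on two letters (values quoted for Fig. 2, Sect. 2.1).
P1 = {0: 0.3, 1: 0.7}
gamma_plus = math.log(max(P1.values()))

def has_quick_return(w, n):
    """True if the word w = x_1 ... x_{2n-1} satisfies R_n(x) < n,
    i.e. x_{k+1}^{k+n} = x_1^n for some 1 <= k < n."""
    return any(w[k:k + n] == w[:n] for k in range(1, n))

def prob_quick_return(n):
    """Exact value of P{R_n < n}, by enumerating all words of length 2n - 1."""
    total = 0.0
    for w in itertools.product(P1, repeat=2 * n - 1):
        if has_quick_return(w, n):
            total += math.prod(P1[a] for a in w)
    return total
```

For instance, `math.log(prob_quick_return(7)) / 7` can be compared with `gamma_plus`; at such small n only rough agreement should be expected, since subexponential prefactors are still visible.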

Fig. 2

The rate functions and pressures of Theorems 0 and A–C for some Bernoulli measure \({\mathbb {P}}\); see Sect. 2.1. The rate functions are infinite wherever not drawn

Remark 1.12

If \(\gamma _+ = 0\), then \(I_{{\mathbb {P}}}(0) = 0\) by (1.15), and thus \(q_{{\mathbb {P}}}(\alpha ) = I_{{\mathbb {P}}}^*(\alpha ) = 0\) for all \(\alpha \le 0\). In this case, we easily see that \(I_V= I_R= I_{{\mathbb {P}}}\) and \(q_V= q_R= q_{{\mathbb {P}}}\).

Remark 1.13

The relation (1.26) can be written as

$$\begin{aligned} q_R(\alpha ) = {\left\{ \begin{array}{ll} \gamma _+&{}\text {if } \alpha < \alpha _*,\\ q_{{\mathbb {P}}}(\alpha )&{}\text {if } \alpha \ge \alpha _*, \end{array}\right. } \end{aligned}$$
(1.27)

where \(\alpha _*\le 0\) is such that \(q_{{\mathbb {P}}}(\alpha _*)=\gamma _+\). If \(\gamma _+ = 0\), then we can take any \(\alpha _*\le 0\) in view of Remark 1.12. If \(\gamma _+<0\), then \(q_{{\mathbb {P}}}(-1)\le \gamma _+ < 0 = q_{{\mathbb {P}}}(0)\) by (1.24), so the point \(\alpha _*\) is unique and satisfies \(\alpha _*\in [-1, 0)\), with \(\alpha _* = -1\) if and only if \(I_V= I_R\).
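As an illustration of Remark 1.13, the point \(\alpha _*\) can be located numerically. The sketch below assumes a Bernoulli measure, for which Sect. 2.1 gives \(q_{{\mathbb {P}}}(\alpha ) = \ln \sum _a {\mathbb {P}}_1(a)^{1-\alpha }\) and \(\gamma _+ = \ln \max _a {\mathbb {P}}_1(a)\), with the marginal quoted for Fig. 2; the bisection routine and its name are hypothetical choices, not part of the paper.

```python
import math

P1 = [0.3, 0.7]                          # values quoted for Fig. 2 in Sect. 2.1
gamma_plus = math.log(max(P1))

def q_P(alpha):
    """Bernoulli pressure, formula (2.1) with Q = P."""
    return math.log(sum(p ** (1.0 - alpha) for p in P1))

def alpha_star(tol=1e-12):
    """Bisection on [-1, 0] for the solution of q_P(alpha) = gamma_plus."""
    lo, hi = -1.0, 0.0                   # q_P(-1) <= gamma_plus < q_P(0) = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q_P(mid) < gamma_plus:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def q_R(alpha):
    """Formula (1.26): maximum of q_P(alpha) and gamma_plus."""
    return max(q_P(alpha), gamma_plus)
```

Bisection is used only because \(q_{{\mathbb {P}}}\) is nondecreasing; any root finder would do.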

1.3 Outline of the proof for a toy model

We present a natural toy model, consisting of a mixture of geometric distributions, which not only provides the correct rate functions for \(W_n\) and \(V_n\), but also illustrates the method of our proof. We then modify the toy model in order to take overlaps (corresponding to \(R_n < n\)) into account and guess the correct rate function for \(R_n\).

Approximations of waiting times and return times by geometric random variables (or exponential random variables in the scaling limit) have been widely studied, typically under mixing assumptions; see e.g. [GS97, CGS99, HSV99, AG01, HV10] and [AAG21] for a recent overview and more exhaustive references. As often emphasized in the literature, possible overlaps, which are tightly related to Poincaré recurrence times (see [AV08, AC15, AAG21]), play a crucial role in such approximations. We shall further comment on Poincaré recurrence times at the end of Sect. 4.2.

Our decoupling assumptions do not seem to imply any of the very sharp exponential approximations that are available in the literature. In comparison, the geometric approximations that we prove in Sect. 3.1 are quite loose in the sense that scaling factors and error terms may grow subexponentially. Yet, they suffice to establish the LDPs of interest.

Geometric approximation. We start with the following interpretation of \(W_n(x,y)\): first x is drawn at random according to the law \({\mathbb {P}}\) and y is drawn at random according to the law \({\mathbb {Q}}\), independently of x. Then, for each \(k\in {\mathbb N}\), we check whether \(y_{k}^{k+n-1} = x_1^n\) or not. Once \(x_1^n\) is given, shift invariance implies that \({\mathbb {Q}}\{y\in \Omega : y_{k}^{k+n-1} = x_1^n\} = {\mathbb {Q}}_n(x_1^n)\) for each \(k\in {\mathbb N}\), and we can view \(W_n\) as the time of the first “success” in a series of attempts (indexed by k). If the attempts were mutually independent, \(W_n\) would be a geometric random variable with random parameter \(p_n = {\mathbb {Q}}_n(x_1^n)\). Of course, these attempts are not independent: even if \({\mathbb {Q}}\) is a Bernoulli measure, the attempts k and \(k'\) are only independent for \(|k'-k| \ge n\). However, it turns out, due to our decoupling assumptions, that the asymptotic behavior of \(W_n\) at the scale that is relevant to our LDP is accurately captured by this simplified geometric model.

We now define the toy model properly. Let \({\widetilde{W}}_n\) be the random variable whose law \(\nu _n\) on \({\mathbb N}\) is given by

$$\begin{aligned} \nu _n(k):= \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u){\mathbb {Q}}_n(u)(1-{\mathbb {Q}}_n(u))^{k-1}, \end{aligned}$$
(1.28)

for every \(k\in {\mathbb N}\). This is a simple mixture of geometric distributions, motivated by the above discussion. The next proposition describes the large deviations of \({\widetilde{W}}_n\). Since it is introduced only for illustration purposes, and since the actual proof of the proposition relies on estimates which are similar to — but simpler than — those we provide for \(W_n\) in the main body of the paper, we limit ourselves to sketching the proof. The interested reader will easily be able to fill in the details.
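To make (1.28) concrete, the following Python sketch evaluates \(\nu _n\) when \({\mathbb {P}}\) and \({\mathbb {Q}}\) are assumed to be Bernoulli measures, so that \({\mathbb {P}}_n(u)\) and \({\mathbb {Q}}_n(u)\) are products of one-letter probabilities (the marginals are those quoted for Fig. 1 in Sect. 2.1). The helper names are hypothetical; the tail function uses the elementary identity \(\sum _{j>k} q(1-q)^{j-1} = (1-q)^k\).

```python
import itertools
import math

# One-letter marginals of the assumed Bernoulli measures (values quoted for Fig. 1).
P1 = {"0": 0.2, "1": 0.3, "2": 0.5}
Q1 = {"0": 0.6, "1": 0.3, "2": 0.1}

def word_prob(marginal, u):
    """Probability of the word u under the product (Bernoulli) measure."""
    return math.prod(marginal[a] for a in u)

def nu(n, k):
    """Toy-model law (1.28): mixture of geometric laws with parameter Q_n(u)."""
    total = 0.0
    for u in itertools.product(P1, repeat=n):
        p, q = word_prob(P1, u), word_prob(Q1, u)
        total += p * q * (1.0 - q) ** (k - 1)
    return total

def nu_tail(n, k):
    """Mass that nu_n gives to {k+1, k+2, ...}: sum over u of P_n(u) (1 - Q_n(u))^k."""
    return sum(word_prob(P1, u) * (1.0 - word_prob(Q1, u)) ** k
               for u in itertools.product(P1, repeat=n))
```

Summing `nu(n, k)` over \(k \le K\) and adding `nu_tail(n, K)` should give 1 up to rounding, which confirms that \(\nu _n\) is a probability distribution in this example.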

Proposition 1.14

If the pair \(({\mathbb {P}}, {\mathbb {Q}})\) is admissible, then the sequence \((\frac{1}{n} \ln {\widetilde{W}}_n)_{n\in {\mathbb N}}\) satisfies the LDP with the rate function \(I_W\) defined in (1.19).

Sketch of the proof

For each \(s\in M_n:=\{\frac{1}{n} \ln k: k\in {\mathbb N}\}\), the probability that \(\frac{1}{n} \ln {\widetilde{W}}_n\) equals s is given by

$$\begin{aligned} \nu _n(\textrm{e}^{ns}) = \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u){\mathbb {Q}}_n(u)(1-{\mathbb {Q}}_n(u))^{\textrm{e}^{ns}-1} = \int _{(0,\infty )} \textrm{e}^{-nr}(1-\textrm{e}^{-nr})^{\textrm{e}^{ns}-1} \textrm{d}\mu _n(r), \nonumber \\ \end{aligned}$$
(1.29)

where we denote by \(\mu _n\) the distribution of \(-\frac{1}{n} \ln {\mathbb {Q}}_n\) with respect to \({\mathbb {P}}\). Now using that \(\textrm{e}^{-nr}\) is very small for n large, a formal first order Taylor expansion yields

$$\begin{aligned} (1-\textrm{e}^{-nr})^{\textrm{e}^{ns}-1} \sim \exp (-(\textrm{e}^{ns}-1)\textrm{e}^{-nr}) \sim \exp (-\textrm{e}^{n(s-r)}). \end{aligned}$$
(1.30)

We thus have a cut-off phenomenon: the above vanishes superexponentially when \(r<s\), and is very close to 1 when \(r>s\). Formal substitution into (1.29) yields

$$\begin{aligned} \nu _n(\textrm{e}^{ns}) \sim \int _{(s,\infty )} \textrm{e}^{-nr} \textrm{d}\mu _n(r). \end{aligned}$$

We remark that this cut-off argument corresponds to retaining, in the sum in (1.28), only the \(u\in {{\mathcal {A}}}^n\) such that \({\mathbb {Q}}_n(u) \lesssim \textrm{e}^{-ns}\).
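The cut-off behind (1.30) is easy to observe numerically. The sketch below compares the exact factor with its approximation for arbitrarily chosen values of n, s and r; the function names are hypothetical, and `math.log1p` is used only to avoid cancellation when \(\textrm{e}^{-nr}\) is tiny.

```python
import math

def cutoff_factor(n, s, r):
    """Exact factor (1 - e^{-nr})^{e^{ns} - 1}, computed stably via log1p."""
    return math.exp((math.exp(n * s) - 1.0) * math.log1p(-math.exp(-n * r)))

def cutoff_approx(n, s, r):
    """Approximation exp(-e^{n(s-r)}) from (1.30)."""
    return math.exp(-math.exp(n * (s - r)))
```

With, say, `n = 30` and `s = 0.5`, evaluating both functions at `r = 0.3` and at `r = 0.7` shows the two regimes: a negligible value for \(r<s\) and a value close to 1 for \(r>s\).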

Now, for \(s>0\) and \(\epsilon \) small, the set \(B(s,\epsilon )\cap M_n\) contains approximately \(\textrm{e}^{n(s+\epsilon )}-\textrm{e}^{n(s-\epsilon )}\sim \textrm{e}^{ns}\) points, so

$$\begin{aligned} \nu _n\left\{ k\in {\mathbb N}: \frac{1}{n}\ln k \in B(s,\epsilon )\right\} \sim \textrm{e}^{ns}\nu _n(\textrm{e}^{ns}) \sim \int _{(s,\infty )} \textrm{e}^{n(s-r)} \textrm{d}\mu _n(r). \end{aligned}$$
(1.31)

Formally, Theorem 0 says that \(\textrm{d}\mu _n(r)\sim \textrm{e}^{-nI_{{\mathbb {Q}}}(r)}\textrm{d}r\), and a saddle-point approximation in (1.31) then yields

$$\begin{aligned} \frac{1}{n} \ln \nu _n\left\{ k\in {\mathbb N}: \frac{1}{n}\ln k \in B(s,\epsilon )\right\} \sim \sup _{r\ge s}(s-r-I_{{\mathbb {Q}}}(r)) = -I_W(s). \end{aligned}$$

The same conclusion applies to the case \(s=0\) without the need for any cut-off argument. This local formulation of the LDP (i.e. concerning only small balls) implies the weak LDP (see Sect. 4), which can in turn be promoted to a full LDP using the arguments of Sect. 5. \(\square \)

The fact that we consider \(W_n(x,y)\) for x and y that are mutually independent was an important ingredient of the above argument. When moving on to \(V_n\) and \(R_n\) (and replacing \({\mathbb {Q}}\) with \({\mathbb {P}}\)), we look for occurrences of \(x_1^n\), not in an independent sample y, but in x itself, which introduces more dependence. Conditioned on the event [u] for some \(u\in {{\mathcal {A}}}^n\), the random variable \(V_n\) corresponds to the first success time of a series of attempts, where the k-th attempt is successful if \(x_{n+k}^{2n+k-1} = u\). Contrary to the case of \(W_n\), the conditional success probability given [u] of the k-th attempt is not simply given by \({\mathbb {P}}_n(u)\) since the coordinates \(x_1^n\) and \(x_{n+k}^{2n+k-1}\) are not independent in general (Footnote 4). However, by our decoupling assumptions, and since the intervals [1, n] and \([n+k, 2n+k-1]\) do not overlap, this added dependence will not actually alter the asymptotics. The arguments of Proposition 1.14 then suggest that \(V_n\) obeys the LDP with the rate function \(I_V\) of (1.22).

Return time and overlaps. The picture for \(R_n\) is more complicated: while the dependence between \(x_{k+1}^{k+n}\) and \(x_1^n\) will not significantly alter our estimates for k very large, this dependence will play a major role when \(k< n\), due to overlap; see the examples in Sect. 2. We shall prove in Sect. 4.2 that, under the assumptions of Theorem C,

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n} \ln {\mathbb {P}}\{x:R_n(x) < n\} = \gamma _+. \end{aligned}$$
(1.32)

This suggests the following picture: with probability close to \(\textrm{e}^{n\gamma _+}\) we have “very quick return” (\(R_n < n\)), and with probability close to \(1-\textrm{e}^{n\gamma _+}\) we start a series of trials as in the case of \(V_n\). As before, let us pretend these trials are independent, and that their probability of success is exactly \({\mathbb {P}}_n(x_1^n)\). Noting that the asymptotics of \(\frac{1}{n} \ln R_n\) is not affected by adding to \(R_n\) quantities of order n, we are led to introduce a toy model \({\widetilde{R}}_n\) for \(R_n\) whose law \(\rho _n\) on \({\mathbb N}\) is given by

$$\begin{aligned} \rho _n(k):= \textrm{e}^{n\gamma _+}\delta _{k,1}+ (1-\textrm{e}^{n\gamma _+})\nu _n(k) \end{aligned}$$
(1.33)

for every \(k\in {\mathbb N}\), where \(\nu _n\) is as in (1.28) with \({\mathbb {Q}}={\mathbb {P}}\). A straightforward adaptation of Proposition 1.14, taking the first term in (1.33) into account at \(s=0\), and also using that \(\gamma _+ \ge \sup _{r\ge 0}(-r-I_{{\mathbb {P}}}(r))\) by Theorem 0.iii.a., shows that \({\widetilde{R}}_n\) satisfies the LDP with the rate function \(I_R\) of (1.25).
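The modified toy model (1.33) can likewise be evaluated in a simple case. The sketch below assumes a Bernoulli measure on two letters (marginal quoted for Fig. 2 in Sect. 2.1, so that \(\gamma _+ = \ln 0.7\)); words are grouped by their number of occurrences of the first letter to avoid enumerating \({{\mathcal {A}}}^n\). All names are hypothetical.

```python
import math

# Assumed one-letter marginal on a two-letter alphabet (values quoted for Fig. 2).
p0, p1 = 0.3, 0.7
gamma_plus = math.log(max(p0, p1))       # Bernoulli case, see Sect. 2.1

def nu(n, k):
    """(1.28) with Q = P; words are grouped by their number j of letters 0."""
    total = 0.0
    for j in range(n + 1):
        w = p0 ** j * p1 ** (n - j)      # P_n(u) for any word u with j zeros
        total += math.comb(n, j) * w * w * (1.0 - w) ** (k - 1)
    return total

def rho(n, k):
    """Toy-model law (1.33): atom at k = 1 plus the geometric mixture."""
    quick = math.exp(n * gamma_plus)
    return quick * (1 if k == 1 else 0) + (1.0 - quick) * nu(n, k)
```

In this example, `math.log(rho(n, 1)) / n` is already very close to `gamma_plus` for moderate n, which reflects the role of the first term of (1.33) at \(s=0\).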

Turning the sketch into a proof. We now briefly comment on how the above sketch will be made rigorous in the main body of the paper. In Sect. 3.1, we establish that for fixed x, the random variable \(W_n(x, \cdot \,)\) on \((\Omega ,{\mathbb {Q}})\) is indeed well approximated by a geometric random variable at the exponential scale. The above cut-off argument is made rigorous in Proposition 3.5. The corresponding estimates for \(R_n\) and \(V_n\) are presented in Sect. 3.2.

The saddle-point approximation mentioned above is made rigorous by the variation of Varadhan’s lemma provided in Lemma 4.1, which then leads to the weak LDP for the sequences of interest. As with the above toy model, we need to treat the cases \(s > 0\) (Sect. 4.1) and \(s = 0\) (Sect. 4.2) separately — the latter is particularly subtle for \(R_n\) and PA will be needed to establish (1.32). With the weak LDPs at hand, the main results are proved in Sect. 5.

2 Examples

Decoupling assumptions are satisfied by many important classes of examples, and have made it possible to simplify and unify the proofs of various large deviation principles that existed in many, sometimes rather technical, forms in the literature. The range of applicability of UD and SLD has already been discussed in [CJPS19, BCJP21, CDEJR23a, CDEJR23b].

We describe in this section some classes of examples that serve as illustrations of different features of our main results. Sections 2.1–2.5 each assume some level of familiarity with the specific examples on the reader’s part, and may be skipped entirely without affecting the reader’s ability to understand the proofs of the main theorems. We now briefly summarize the role of each of these examples.

  1. 1.

    In the Bernoulli (IID) case (Sect. 2.1), formulae for the different pressures and rate functions can be quickly derived, and are easy to understand. To the best of our knowledge, the global aspect (i.e. without any restriction to a strict subinterval of \({\mathbb R}\)) of the LDPs in Theorems A–C as well as the ability to consider distinct measures \({\mathbb {P}}\) and \({\mathbb {Q}}\) in Theorem A are new even in this most basic class of examples.

  2. 2.

    By going from Bernoulli measures to Markov measures (Sect. 2.2), we start seeing the benefit of inserting the words \(\xi \) in the formulation of SLD, as they allow us to deal with irreducible Markov chains whose transition probabilities are not all positive. Markov measures also make the role of periodicity in PA very clear.

  3. 3.

    Discussing our assumptions and results in the setup of equilibrium measures for potentials enjoying Bowen’s regularity condition (Sect. 2.3) allows us to compare our results to existing ones, most notably to those of [AACG23] where the pressures \(q_W\) and \(q_R\) were studied.

  4. 4.

    We then discuss two situations in which Bowen’s regularity condition can be lifted. While carrying distinct history and intuition, they both reveal the same two aspects of our decoupling assumptions. First, they make a case that allowing a certain amount of growth for the sequence \((C_n)_{n\in {\mathbb N}}\) in UD and SLD is beneficial. Second, they show that our assumptions apply in phase-transition situations, and in particular do not imply ergodicity. These generalizations are:

    1. i.

      equilibrium measures for absolutely summable interactions in statistical mechanics (Sect. 2.4.1);

    2. ii.

      equilibrium measures for g-functions, i.e. g-measures (Sect. 2.4.2).

    While our results do not seem to apply to the class of weak Gibbs measures in full generality (see Sect. 2.4), the measures discussed in Sects. 2.4.1 and 2.4.2 are weak Gibbs.

  5. 5.

    Finally, the so-called class of hidden Markov models (Sect. 2.5) shows that our assumptions apply to measures which are far from Gibbsian. Hidden Markov models also provide examples of pairs of measures \(({\mathbb {P}}, {\mathbb {Q}})\) where the distributions of \((\frac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) lack exponential tightness, showing that \(I_W\) (see Theorem A) need not be a good rate function when \({\mathbb {Q}}\ne {\mathbb {P}}\). To the authors’ knowledge, even the conclusions of Theorem 0 in this setup are new.

As abundantly discussed in [BJPP18, CJPS19, BCJP21, CDEJR23a, CDEJR23b], repeated quantum measurement processes give rise to a very rich class of measures satisfying our decoupling assumptions, yet displaying remarkable singularities — some being, once again, far from Gibbsian. For reasons of space, we do not repeat such a discussion in the present paper.

2.1 Bernoulli measures

Consider the simple case \({\mathbb {P}}= {\mathbb {P}}_1^{\otimes {\mathbb N}}\) and \({\mathbb {Q}}= {\mathbb {Q}}_1^{\otimes {\mathbb N}}\), where \({\mathbb {P}}_1\) and \({\mathbb {Q}}_1\) are measures on \({{\mathcal {A}}}\), with \({\mathbb {P}}_1 \ll {\mathbb {Q}}_1\). SLD, UD and PA obviously hold with \(C_n = 1\) and \(\tau _n = 0\) for all n. Note that, as a random variable on \((\Omega , {\mathbb {P}})\), the map \(x\mapsto -\tfrac{1}{n} \ln {\mathbb {Q}}_n(x_1^n) = \frac{1}{n}\sum _{i=1}^n (-\ln {\mathbb {Q}}_1(x_i))\) is simply the average of IID random variables supported on the finite set \(\{-\ln {\mathbb {Q}}_1(a): a\in {{\,\textrm{supp}\,}}{\mathbb {P}}_1\} \subset {\mathbb R}\). By independence, \(q_{{\mathbb {Q}}}(\alpha )\) is easily seen to coincide with the cumulant-generating function:

$$\begin{aligned} q_{{\mathbb {Q}}}(\alpha ) = \ln \int _{{{\mathcal {A}}}} \textrm{e}^{-\alpha \ln {\mathbb {Q}}_1 (a)}\textrm{d}{\mathbb {P}}_1(a)=\ln \sum _{a\in {{\,\textrm{supp}\,}}{\mathbb {P}}_1} {\mathbb {P}}_1(a) {\mathbb {Q}}_1 (a)^{-\alpha }. \end{aligned}$$
(2.1)

The LDP proved in Theorem 0 then follows from standard results. We mention two methods which lead to different expressions of the rate function \(I_{{\mathbb {Q}}}\).

Method 1. Since \(\ln {\mathbb {Q}}_n(x_1^n)\) is a sum of IID random variables, Cramér’s theorem [DZ09, §2.2.1] yields the stated LDP with a rate function given by the Legendre–Fenchel transform of the cumulant-generating function (2.1), i.e. with a rate function \(I_{{\mathbb {Q}}}= q_{{\mathbb {Q}}}^*\). This can be seen as a special case of the Gärtner–Ellis theorem [DZ09, §2.3], which applies here since by (2.1) the function \(q_{{\mathbb {Q}}}\) is differentiable (and actually real-analytic).
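Method 1 can be mimicked numerically. The following Python sketch evaluates the pressure (2.1) for the marginals quoted for Fig. 1 and approximates \(I_{{\mathbb {Q}}}= q_{{\mathbb {Q}}}^*\) by maximizing over a finite grid of \(\alpha \); the grid bounds and the function names are assumptions made for the illustration, and the grid truncation underestimates \(I_{{\mathbb {Q}}}\) near the endpoints of its domain.

```python
import math

# One-letter marginals quoted for Fig. 1 (Sect. 2.1).
P1 = [0.2, 0.3, 0.5]
Q1 = [0.6, 0.3, 0.1]

def q_Q(alpha):
    """Pressure (2.1): ln sum_a P1(a) Q1(a)^(-alpha)."""
    return math.log(sum(p * q ** (-alpha) for p, q in zip(P1, Q1)))

def I_Q(s, alpha_max=40.0, steps=8000):
    """Grid approximation of the Legendre-Fenchel transform sup_alpha (alpha*s - q_Q(alpha))."""
    grid = (-alpha_max + 2 * alpha_max * i / steps for i in range(steps + 1))
    return max(a * s - q_Q(a) for a in grid)

# Cross entropy -sum_a P1(a) ln Q1(a), the expected zero of the rate function.
cross_entropy = -sum(p * math.log(q) for p, q in zip(P1, Q1))
```

Evaluating `I_Q(cross_entropy)` should return a value that is zero up to rounding, while `I_Q` is strictly positive away from that point.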

Method 2. One can instead appeal to a combination of Sanov’s theorem and the contraction principle [DZ09, §2.1.1–2.1.2] to obtain the LDP with rate function

$$\begin{aligned} I_{{\mathbb {Q}}}(s) = \inf _{\mu \in L_s} H_{\textrm{r}}(\mu |{\mathbb {P}}_1), \end{aligned}$$
(2.2)

where \(H_{\textrm{r}}\) denotes the relative entropy and \(L_s\) is the set of probability measures \(\mu \ll {\mathbb {P}}_1\) (hence also satisfying \(\mu \ll {\mathbb {Q}}_1\)) on \({{\mathcal {A}}}\) subject to the constraint

$$\begin{aligned} -\sum _{a\in {{\mathcal {A}}}} \mu (a) \ln {\mathbb {Q}}_1(a) = s. \end{aligned}$$
(2.3)

When \(s \notin [-\ln \max _{a\in {{\mathcal {A}}}} {\mathbb {Q}}_1(a), -\ln \min _{a\in {{\,\textrm{supp}\,}}{\mathbb {Q}}_1} {\mathbb {Q}}_1(a)]\), the set \(L_s\) is empty and the infimum is set to \(\infty \) by convention. Note that \(I_{{\mathbb {Q}}}\) vanishes at the unique point \(s = -\sum _{a\in {{\mathcal {A}}}} {\mathbb {P}}_1(a) \ln {\mathbb {Q}}_1(a)= h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\), where the infimum in (2.2) is attained at \(\mu = {\mathbb {P}}_1\).
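The variational formula (2.2) can be probed with the standard exponential-tilting ansatz, which is an assumption of this sketch rather than a step taken from the text: for a tilt \(\alpha \), the measure \(\mu _\alpha (a) \propto {\mathbb {P}}_1(a){\mathbb {Q}}_1(a)^{-\alpha }\) satisfies (2.3) with some s, and its relative entropy equals \(\alpha s - q_{{\mathbb {Q}}}(\alpha )\). The code checks this identity and compares \(\mu _\alpha \) with two other elements of \(L_s\); the names and the value of the tilt are hypothetical.

```python
import math

P1 = [0.2, 0.3, 0.5]   # one-letter marginals quoted for Fig. 1
Q1 = [0.6, 0.3, 0.1]

def tilted(alpha):
    """Exponentially tilted measure mu_alpha(a) proportional to P1(a) Q1(a)^(-alpha)."""
    w = [p * q ** (-alpha) for p, q in zip(P1, Q1)]
    z = sum(w)
    return [x / z for x in w], math.log(z)   # (mu_alpha, q_Q(alpha))

def rel_entropy(mu):
    """Relative entropy H_r(mu | P1), with the convention 0 ln 0 = 0."""
    return sum(m * math.log(m / p) for m, p in zip(mu, P1) if m > 0)

def constraint(mu):
    """Left-hand side of (2.3)."""
    return -sum(m * math.log(q) for m, q in zip(mu, Q1))

alpha = 0.7                       # arbitrary tilt, chosen for illustration
mu, q_alpha = tilted(alpha)
s = constraint(mu)

# Direction that preserves both the total mass and the constraint (2.3).
logq = [math.log(q) for q in Q1]
d = [logq[2] - logq[1], logq[0] - logq[2], logq[1] - logq[0]]
perturbed = [[m + t * x for m, x in zip(mu, d)] for t in (-0.02, 0.02)]
```

On a three-letter alphabet the set \(L_s\) is a segment, so moving along `d` explores all of it; both perturbed measures should have larger relative entropy than `mu`.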

We now turn to Theorems B and C, and we discuss in particular the convexity of \(I_R\). The relevant quantities are easily expressed in terms of \({\mathbb {P}}_1\):

$$\begin{aligned} \gamma _+&= \ln \max _{a\in {{\,\textrm{supp}\,}}{\mathbb {P}}_1} {\mathbb {P}}_1(a),&h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}})&= \ln |{{\,\textrm{supp}\,}}{\mathbb {P}}_1|, \\ \gamma _-&= \ln \min _{a\in {{\,\textrm{supp}\,}}{\mathbb {P}}_1} {\mathbb {P}}_1(a),&q_{{\mathbb {P}}}(-1)&= \ln \sum _{a\in {{\,\textrm{supp}\,}}{\mathbb {P}}_1} {\mathbb {P}}_1(a)^2. \end{aligned}$$

In view of these expressions, \(\gamma _- \le q_{{\mathbb {P}}}(-1) \le \gamma _+\) with strict inequalities unless \({\mathbb {P}}_1\) is constant on its support. To see this, notice that \( \sum _{a\in {{\mathcal {A}}}} {\mathbb {P}}_1(a)^2\) is the expectation of the function \(a\mapsto {\mathbb {P}}_1(a)\) with respect to the measure \({\mathbb {P}}_1\) on \({{\mathcal {A}}}\). To discuss the convexity of \(I_R\) we distinguish two cases.

Singular case. If \({\mathbb {P}}_1\) is constant on its support, then \(\gamma _- = q_{{\mathbb {P}}}(-1) = \gamma _+\), so \(I_R\) is convex by Theorem C.iii. Moreover, we readily obtain \( h({\mathbb {P}}) = h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}}), \) so \({\mathbb {P}}\) is indeed the measure of maximal entropy on its support, in accordance with Theorem C.iii. Next, since \(-\frac{1}{n} \ln {\mathbb {P}}_n(x_1^n) = h({\mathbb {P}})\) almost surely, the rate function \(I_{{\mathbb {P}}}\) vanishes at \(h({\mathbb {P}})\) and is infinite everywhere else. Dual to this, \(q_{{\mathbb {P}}}(\alpha ) = h({\mathbb {P}})\alpha \) for all \(\alpha \in {\mathbb R}\). In the quite extreme case where \(|{{\,\textrm{supp}\,}}{\mathbb {P}}_1| = 1\), i.e. if \({\mathbb {P}}\) is a Dirac measure on an orbit of period 1, we find \(h({\mathbb {P}}) = 0\) and \(q_{{\mathbb {P}}}(\alpha ) = 0\) for all \(\alpha \in {\mathbb R}\).
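The dichotomy between a marginal that is constant on its support and one that is not can be checked directly from the closed-form expressions above. The following sketch computes them for a uniform marginal and for the marginal quoted for Fig. 2; the function and key names are hypothetical.

```python
import math

def bernoulli_data(P1):
    """Quantities of Sect. 2.1 for a Bernoulli measure with one-letter marginal P1."""
    support = [p for p in P1 if p > 0]
    return {
        "gamma_plus": math.log(max(support)),
        "gamma_minus": math.log(min(support)),
        "q_minus_one": math.log(sum(p * p for p in support)),   # q_P(-1)
        "h_top": math.log(len(support)),
        "h": -sum(p * math.log(p) for p in support),             # entropy h(P)
    }

uniform = bernoulli_data([0.5, 0.5])        # constant on its support
biased = bernoulli_data([0.3, 0.7])         # values quoted for Fig. 2
```

For `uniform` the three numbers `gamma_minus`, `q_minus_one` and `gamma_plus` coincide and `h` equals `h_top`; for `biased` the inequalities are strict.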

Generic case. If \({\mathbb {P}}_1\) is not constant on its support, then \(\gamma _-< q_{{\mathbb {P}}}(-1) < \gamma _+\), and thus the rate function \(I_R\) is nonconvex by Theorem C.iii. In the present setup, it is easy to understand why \(I_R(0) < I_V(0)\), as we now discuss. Let \(\epsilon \) be small and n be large. On the one hand, if \({\hat{a}}\in {{\mathcal {A}}}\) is such that \(\ln {\mathbb {P}}_1({\hat{a}}) = \gamma _+\), we find \(R_n(x) = 1\) for all \(x\in [{\hat{a}}^{n+1}]\), so

$$\begin{aligned} \begin{aligned} {\mathbb {P}}\{x: R_n(x) \le \textrm{e}^{\epsilon n}\}&\ge {\mathbb {P}}\left\{ x: R_n(x) = 1 \right\} \\&\ge {\mathbb {P}}_{n+1}({\hat{a}}^{n+1}) \\&= \exp \left( (n+1) \gamma _+\right) . \end{aligned} \end{aligned}$$
(2.4)

On the other hand, \(V_n(x) = k\) implies that \(x\in [u]\cap T^{-n-k+1}[u]\) for some \(u\in {{\mathcal {A}}}^n\), so

$$\begin{aligned} {\mathbb {P}}\{x : V_n(x) = k\}&\le \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}([u] \cap T^{-n-k+1}[u]) = \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u)^2 \\&= \Big (\sum _{a\in {{\mathcal {A}}}}{\mathbb {P}}_1(a)^2\Big )^n = \exp \left( n q_{{\mathbb {P}}}(-1)\right) , \end{aligned}$$

where we have used (2.1) with \({\mathbb {Q}}={\mathbb {P}}\). Thus,

$$\begin{aligned} {\mathbb {P}}\{x: V_n(x) \le \textrm{e}^{\epsilon n}\} \le \sum _{k=1}^{\lfloor \textrm{e}^{\epsilon n}\rfloor } {\mathbb {P}}\{x: V_n(x) = k\} \le \exp \left( n\epsilon + n q_{{\mathbb {P}}}(-1)\right) . \end{aligned}$$
(2.5)

Therefore, for \(\epsilon \) small enough, the right-hand side of (2.5) decays exponentially faster than the right-hand side of (2.4) since \(q_{{\mathbb {P}}}(-1)<\gamma _+\). The estimates (2.4) and (2.5), despite the rather crude inequalities in (2.4), turn out to be sharp at the exponential scale. Indeed, we find \(I_V(0) = -q_{{\mathbb {P}}}(-1)\) and \(I_R(0) = -\gamma _+\) in Theorems B and C.
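The values \(I_V(0) = -q_{{\mathbb {P}}}(-1)\) and \(I_R(0) = -\gamma _+\) can be cross-checked numerically for the marginal quoted for Fig. 2. The sketch below approximates \(I_{{\mathbb {P}}}= q_{{\mathbb {P}}}^*\) on a grid and then evaluates the infimum in (1.22) at \(s=0\); the grids and all names are assumptions of the illustration, so agreement is only expected up to discretization error.

```python
import math

p = [0.3, 0.7]   # one-letter marginal quoted for Fig. 2

def q_P(alpha):
    """Pressure (2.1) with Q = P: ln sum_a P1(a)^(1 - alpha)."""
    return math.log(sum(x ** (1.0 - alpha) for x in p))

ALPHAS = [-10.0 + 0.01 * i for i in range(2001)]
Q_VALUES = [q_P(a) for a in ALPHAS]

def I_P(r):
    """Grid approximation of the Legendre-Fenchel transform of q_P."""
    return max(a * r - q for a, q in zip(ALPHAS, Q_VALUES))

# I_P is infinite outside [-ln 0.7, -ln 0.3], so the infimum in (1.22) at s = 0
# only needs to be scanned over that interval.
lo, hi = -math.log(max(p)), -math.log(min(p))
R_GRID = [lo + (hi - lo) * i / 400 for i in range(401)]

I_V_at_0 = min(r + I_P(r) for r in R_GRID)   # inf over r >= 0 of r + I_P(r)
I_R_at_0 = -math.log(max(p))                 # -gamma_plus, from (1.25)
```

Comparing `I_V_at_0` with `-q_P(-1.0)` illustrates (1.24), and `I_R_at_0 < I_V_at_0` reflects the nonconvexity of \(I_R\) in the generic case.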

Remark 2.1

For Bernoulli measures, in the case \({\mathbb {Q}}={\mathbb {P}}\), \(W_n\) and \(V_n\) have the same law.

We provide three figures corresponding to Bernoulli measures, the first two of which were displayed in Sect. 1.2:

  • Fig. 1: \({{\mathcal {A}}}=\{\textsf {0}, \textsf {1}, \textsf {2}\}\) and \({\mathbb {P}}_1 = 0.2\,\delta _{\textsf {0}}+0.3\,\delta _{\textsf {1}}+0.5\,\delta _{\textsf {2}}\) and \({\mathbb {Q}}_1 = 0.6\,\delta _{\textsf {0}}+0.3\,\delta _{\textsf {1}}+0.1\,\delta _{\textsf {2}}\).

  • Fig. 2: \({{\mathcal {A}}}=\{\textsf {0}, \textsf {1}\}\) and \({\mathbb {P}}_1 = {\mathbb {Q}}_1 = 0.3\,\delta _{\textsf {0}}+0.7\,\delta _{\textsf {1}}\).

  • Fig. 3: \({{\mathcal {A}}}=\{\textsf {0}, \textsf {1}, \textsf {2}\}\) and \({\mathbb {P}}_1 = {\mathbb {Q}}_1 = 0.15\,\delta _{\textsf {0}}+0.15\,\delta _{\textsf {1}}+0.7\,\delta _{\textsf {2}}\). Notice that \(\ln {\mathbb {P}}_1(\textsf {0}) = \ln {\mathbb {P}}_1(\textsf {1}) = \gamma _-\). Such degeneracy implies that the second term in (1.17) takes the value \(\ln 2\) for \(s=-\gamma _-\). Indeed, when \(\epsilon \) is very small, the cardinality in (1.17) is essentially \(2^n\). As a consequence, \(I_{{\mathbb {P}}}(-\gamma _-) = -\gamma _--\ln 2\). A similar phenomenon occurs at \(-\gamma _+\) if several letters in \({{\mathcal {A}}}\) have maximum probability.

Fig. 3

Rate functions and pressures for a Bernoulli measure with degenerate least probable letter. The coordinates of the point A are \((-\gamma _-, -\gamma _--\ln 2)\). Rate functions are infinite wherever not drawn
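The value \(I_{{\mathbb {P}}}(-\gamma _-) = -\gamma _--\ln 2\) for the measure of Fig. 3 can be checked against the Legendre–Fenchel formula \(I_{{\mathbb {P}}}= q_{{\mathbb {P}}}^*\). In the sketch below, which uses hypothetical names and arbitrarily chosen values of \(\alpha \), the supremum at \(s=-\gamma _-\) is approached as \(\alpha \) grows.

```python
import math

P1 = [0.15, 0.15, 0.7]                      # values quoted for Fig. 3
gamma_minus = math.log(min(P1))

def legendre_term(alpha, s):
    """alpha * s - q_P(alpha), with q_P(alpha) = ln sum_a P1(a)^(1 - alpha)."""
    return alpha * s - math.log(sum(p ** (1.0 - alpha) for p in P1))

# At s = -gamma_minus the supremum over alpha is approached as alpha grows.
values = [legendre_term(a, -gamma_minus) for a in (2.0, 5.0, 50.0)]
predicted = -gamma_minus - math.log(2.0)
```

The entries of `values` increase towards `predicted`, which equals \(-\ln 0.3\), in line with the fact that the two least probable letters together carry mass 0.3.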

2.2 Irreducible Markov measures

Let \({\mathbb {P}}\) and \({\mathbb {Q}}\) be two stationary Markov measures on \({{\mathcal {A}}}\), with transition matrices \(P = [P_{i,j}]_{i,j\in {{\mathcal {A}}}}\) and \(Q = [Q_{i,j}]_{i,j\in {{\mathcal {A}}}}\) respectively. We assume that the matrices P and Q are irreducible in the sense that there exists \(M\in {\mathbb N}\) such that all entries in the matrices \(\sum _{i=1}^M P^i\) and \(\sum _{i=1}^M Q^i\) are positive. Then,

$$\begin{aligned} {\mathbb {P}}_n(x_1^n) = {\mathbb {P}}_1(x_1)\prod _{k=1}^{n-1}P_{x_k, x_{k+1}} \qquad \text {and}\qquad {\mathbb {Q}}_n(x_1^n) = {\mathbb {Q}}_1(x_1)\prod _{k=1}^{n-1}Q_{x_k, x_{k+1}}, \end{aligned}$$
(2.6)

where \({\mathbb {P}}_1\) and \({\mathbb {Q}}_1\) are the (unique and fully supported) invariant probability vectors for the matrices P and Q respectively. We assume, furthermore, that \(Q_{i,j} = 0\) implies \(P_{i,j} = 0\), which ensures that \({\mathbb {P}}_n \ll {\mathbb {Q}}_n\) for all \(n\in {\mathbb N}\). It follows from Lemma A.3 in [CJPS19] that JSLD and UD hold with \(C_n=O(1)\) and \(\tau _n=M-1\), so the pair \(({\mathbb {P}}, {\mathbb {Q}})\) is admissible. Similar arguments also yield PA. It is worth noting that if P is further assumed to be aperiodic, i.e. if there exists \(M\in {\mathbb N}\) such that all entries in the matrix \(P^M\) are positive, then we can fix \(\ell =\tau _n=M-1\) in (1.10), whereas if P is merely irreducible, then \(\ell \) must be allowed to depend on u and v. This illustrates the importance of the condition \(\ell \le \tau _n\) in SLD instead of \(\ell = \tau _n\); see the discussion in [CJPS19, §§2.5;A.1].

Our results thus apply to the setup described here. As in the previous example, the conclusions of Theorem 0 can be derived using classical methods, which yield natural expressions for \(I_{{\mathbb {Q}}}\) and \(q_{{\mathbb {Q}}}\), as we now show.

Method 1. In view of (2.6), the pressure is easily seen to be given by

$$\begin{aligned} q_{{\mathbb {Q}}}(\alpha ) = \ln {\text {spr}}(M(\alpha )), \end{aligned}$$

where the spectral radius is computed for the deformation of the stochastic matrix P defined by \(M_{i,j}(\alpha ):= P_{i,j}Q_{i,j}^{-\alpha }\). By the Perron–Frobenius theorem and analytic perturbation theory, \(\alpha \mapsto q_{{\mathbb {Q}}}(\alpha )\) is real-analytic, so the LDP for \((-\frac{1}{n}\ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\) with the rate function \(I_{{\mathbb {Q}}}= q_{{\mathbb {Q}}}^*\) follows from the Gärtner–Ellis theorem [DZ09, §2.3].
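The spectral-radius formula can be evaluated with a few lines of code. The sketch below assumes transition matrices with strictly positive entries (so that plain power iteration converges and no convention for \(0^{-\alpha }\) is needed); the matrices `P` and `Q` and the function names are hypothetical and chosen only for illustration.

```python
import math

def spectral_radius(M, iterations=500):
    """Perron eigenvalue of an entrywise positive square matrix, by power iteration."""
    n = len(M)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(w)               # since sum(v) == 1, this estimates the eigenvalue
        v = [x / lam for x in w]
    return lam

def q_Q(alpha, P, Q):
    """ln spr(M(alpha)) with M_ij(alpha) = P_ij * Q_ij^(-alpha)."""
    M = [[p * q ** (-alpha) for p, q in zip(row_p, row_q)]
         for row_p, row_q in zip(P, Q)]
    return math.log(spectral_radius(M))

# Hypothetical positive transition matrices, chosen only for illustration.
P = [[0.9, 0.1], [0.4, 0.6]]
Q = [[0.5, 0.5], [0.2, 0.8]]
```

As a consistency check, `q_Q(0.0, P, Q)` vanishes because \(M(0)=P\) is stochastic, and when all rows of each matrix coincide the result reduces to the Bernoulli formula (2.1).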

Method 2. From (2.6), we see that

$$\begin{aligned} \ln {\mathbb {Q}}_n(x_1^n) =\sum _{k = 1}^{n-1} \ln Q_{x_k, x_{k+1}} + O(1) \end{aligned}$$

for all \(x\in {{\,\textrm{supp}\,}}{\mathbb {P}}\subseteq {{\,\textrm{supp}\,}}{\mathbb {Q}}\), so the LDP for \((-\frac{1}{n}\ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\) reduces to that of a sequence of Birkhoff sums. By Sanov’s theorem applied to the pair empirical measures \({{\mathcal {E}}}_n(x):= \frac{1}{n} \sum _{k=1}^n \delta _{(x_k, x_{k+1})}\in {{\mathcal {P}}}({{\mathcal {A}}}^2)\) and the contraction principle, one derives the expression

$$\begin{aligned} I_{{\mathbb {Q}}}(s) = \inf _{\mu \in L_s^{(2)}} H_{\textrm{r}}(\mu |\mu _1 \otimes P), \end{aligned}$$
(2.7)

where \(L_s^{(2)}\) is the set of probability measures \(\mu \) on \({{\mathcal {A}}}\times {{\mathcal {A}}}\) such that \(\mu \ll \mu _1\otimes P\), \(\mu _1 = \mu _2\), and

$$\begin{aligned} \sum _{i,j\in {{\mathcal {A}}}}\mu (i,j)\ln Q_{i,j} = -s; \end{aligned}$$

see [DZ09, §3.1.3] (Footnote 5) or Lemma 4.49 in [DS89]. Here, \(\mu _1\) (resp. \(\mu _2\)) denotes the first (resp. second) marginal of \(\mu \), and \(\mu _1\otimes P\) denotes the measure on \({{\mathcal {A}}}\times {{\mathcal {A}}}\) defined by \((\mu _1\otimes P) (i,j) = \mu _1(i)P_{i,j}\). Note that \(I_{{\mathbb {Q}}}\) vanishes at the single point \(s = -\sum _{i,j\in {{\mathcal {A}}}}{\mathbb {P}}_2(i,j)\ln Q_{i,j} = h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\), where the infimum in (2.7) is attained at \(\mu = {\mathbb {P}}_2\).

We now discuss \(I_R\) and its convexity, with \({\mathbb {P}}={\mathbb {Q}}\) a Markov measure as above. It is well known that

$$\begin{aligned} \gamma _+ = \max _{p\le |{{\mathcal {A}}}|} \max _{u\in {{\mathcal {A}}}^p} \frac{1}{p} \ln \prod _{k=1}^{p} P_{u_k,u_{k+1}}, \end{aligned}$$
(2.8)

with cyclic identification \(u_{p+1} = u_1\); see e.g. Remark 1.ii in [Szp93] or [AACG23, §3.4]. The heuristic interpretation is the following: while in the generic IID case discussed in Sect. 2.1 the probability \({\mathbb {P}}\{x: R_n(x) \le \textrm{e}^{\epsilon n}\}\) was, at exponential scale, captured by the subset \(\{x: x_1^{n+1} = {\hat{a} \hat{a} \hat{a}}\cdots {\hat{a} \hat{a}}\}\) of the event \(\{x: R_n(x) = 1\}\) for n large and \(\epsilon \) small, now \({\mathbb {P}}\{x: R_n(x) \le \textrm{e}^{\epsilon n}\}\) is essentially accounted for by the subset \(\{x: x_1^{n+p} = u u u \cdots u u_1^{r}\}\) of the event \(\{x: R_n(x) = p\}\), for any p and u that saturate (2.8), and \(r = n - \lfloor \frac{n}{p} \rfloor p\). In other words, the key scenario for small return times consists of a periodic orbit repeating some optimal cycle.
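For small alphabets, the right-hand side of (2.8) can be computed by enumerating all cycles. The following sketch does so for two hypothetical transition matrices, one in which the optimal cycle is a self-loop and one in which it has period 2; cycles using a forbidden transition (zero probability) are skipped.

```python
import itertools
import math

def gamma_plus(P):
    """Right-hand side of (2.8): best average log-probability over cycles of length p <= |A|."""
    size = len(P)
    best = -math.inf
    for p in range(1, size + 1):
        for u in itertools.product(range(size), repeat=p):
            # Cyclic identification u_{p+1} = u_1.
            weights = [P[u[k]][u[(k + 1) % p]] for k in range(p)]
            if all(w > 0 for w in weights):
                best = max(best, sum(math.log(w) for w in weights) / p)
    return best

# Hypothetical transition matrices, chosen only for illustration.
P_loop = [[0.9, 0.1], [0.4, 0.6]]      # best cycle: the self-loop at state 0
P_flip = [[0.1, 0.9], [0.8, 0.2]]      # best cycle: 0 -> 1 -> 0
```

The enumeration visits \(\sum _{p\le |{{\mathcal {A}}}|} |{{\mathcal {A}}}|^p\) words, which is fine for illustration but would call for a minimum-mean-cycle algorithm on larger alphabets.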

By the last part of Theorem C, the function \(I_R\) is convex if and only if \({\mathbb {P}}\) is the measure of maximal entropy on \({{\,\textrm{supp}\,}}{\mathbb {P}}\), that is if and only if \({\mathbb {P}}\) is the Parry measure of \({{\,\textrm{supp}\,}}{\mathbb {P}}\), which is characterized by the Perron–Frobenius data of the adjacency matrix of the chain [Par64]; see also [CGS99, §2]. In particular, if \(P_{i,j}>0\) for all \(i,j\in {{\mathcal {A}}}\), then \(I_R\) is convex if and only if \({\mathbb {P}}= \pi ^{\otimes {\mathbb N}}\) with \(\pi \) uniform on \({{\mathcal {A}}}\).

Remark 2.2

One can show that irreducible multi-step Markov measures also satisfy the assumptions in the present paper. In fact, Markov measures and multi-step Markov measures are merely special cases of the equilibrium measures discussed in Sect. 2.3. For example, in the notation of Sect. 2.3, one takes \(\varphi (x) = \ln P_{x_1, x_2}\) in the case of Markov measures. An explicit computation of \(q_{{\mathbb {P}}}\) for a specific Markov chain is provided in [AACG23].

2.3 Equilibrium measures for Bowen potentials

As mentioned above, our assumptions cover the setups of [CGS99] and [AACG23], where the large-deviation results are local in the sense that there is some strict subinterval \(A \subset {\mathbb R}\) such that the large-deviation lower bound and upper bound (see (4.1) and (4.2) below) are only shown to be valid respectively for open sets O contained in A and closed sets \(\Gamma \) contained in A. The former work considers a single measure \({\mathbb {P}}\) that is the equilibrium measure for a Hölder-continuous potential on a topologically mixing Markov subshift; the latter work, for a potential of summable variations on the full shift.

In particular, Theorem A.i and Theorem C.i prove the following conjecture stated in [AACG23, §3.3]:

We believe that there exists a non-trivial rate function describing the large deviation asymptotic [on the whole real line] for both return and waiting times, but this has to be proven using another method.

Moreover, the results of [AACG23, §3.2] are recovered as special cases of Theorem A.ii and Theorem C.ii. In order to facilitate the translation, we compare notations in Table 1. Note also that what we refer to as pressure is called “\(L^q\)-spectrum” in [AACG23].

Table 1 A summary of the notational changes between our results and those of [AACG23]

We first consider the class of Bowen-regular potentials on the full shift, and then discuss the extension to some subshifts (including those of [CGS99]) in Remark 2.3. We recall that a potential \(\varphi \), i.e. a continuous function \(\varphi :\Omega \rightarrow {\mathbb R}\), is called Bowen regular if

$$\begin{aligned} \sup \left\{ \left| \sum _{i=0}^{n-1}\varphi (T^{i}x)- \sum _{i=0}^{n-1} \varphi (T^{i}y)\right| : x \in \Omega , n\in {\mathbb N}, y \in [x_1^n] \right\} < \infty , \end{aligned}$$
(2.9)

and that this class strictly contains the class of potentials with summable variations, which in turn strictly contains the class of Hölder-continuous potentials; we refer the reader to [Bow74] and [Wal01, §4] for a thorough discussion. We also recall that the topological pressure of any potential \(\varphi \) is given by

$$\begin{aligned} p_{\text {top}}(\varphi ):= \lim _{n\rightarrow \infty }\frac{1}{n} \ln \sum _{u \in {{\mathcal {A}}}^n}\textrm{e}^{\sup _{x \in [u]} \sum _{i=0}^{n-1} \varphi (T^{i}x)}. \end{aligned}$$
(2.10)

Suppose that \({\mathbb {P}}\) and \({\mathbb {Q}}\) are the (necessarily unique [Bow74]) equilibrium measures for the Bowen-regular potentials \(\varphi \) and \(\psi \) on \(\Omega \) in the sense that they belong to \({{\mathcal {P}}}_{\text {inv}}(\Omega )\) and satisfy

$$\begin{aligned} h({\mathbb {P}}) + \int \varphi \textrm{d}{\mathbb {P}}= p_{\text {top}}(\varphi ) \qquad \text {and}\qquad h({\mathbb {Q}}) + \int \psi \textrm{d}{\mathbb {Q}}= p_{\text {top}}(\psi ), \end{aligned}$$
(2.11)

respectively. The measure \({\mathbb {P}}\) then satisfies the Bowen–Gibbs property with respect to \(\varphi \), i.e. there exists a constant \(K\ge 1\) such that

$$\begin{aligned} K^{-1} \textrm{e}^{\sum _{i=0}^{n-1} \varphi (T^{i}x)-np_{\text {top}}(\varphi )}\le {\mathbb {P}}([x_1^n]) \le K \textrm{e}^{\sum _{i=0}^{n-1} \varphi (T^{i}x)-np_{\text {top}}(\varphi )}, \end{aligned}$$
(2.12)

for every \(x \in \Omega \) [Wal01, §4], and one then deduces that it satisfies UD and SLD with \(\tau _n = 0\) and \(C_n = K^3\). The same is true for \({\mathbb {Q}}\) with \(\psi \), so JSLD and PA follow from Remarks 1.2 and 1.5 respectively, and thus the pair \(({\mathbb {P}}, {\mathbb {Q}})\) is admissible. We note that (2.12) is central in the analysis in [AACG23, §2.1]. In order to simplify some formulae, we assume for the remainder of this subsection that

$$\begin{aligned} p_{\text {top}}(\varphi ) = p_{\text {top}}(\psi ) = 0, \end{aligned}$$
(2.13)

which results in no loss of generality since adding constants to \(\varphi \) and \(\psi \) does not alter the set of equilibrium measures.

In the setup of the present subsection, the conclusions of Theorem 0 are well known and can be obtained more directly as follows.

Method 1. The bounds (2.12) imply that

$$\begin{aligned} q_{{\mathbb {Q}}}(\alpha ) = p_{\text {top}}(\varphi -\alpha \psi ). \end{aligned}$$
(2.14)

In particular, we remark that \(q_{{\mathbb {P}}}(-1) = p_{\text {top}}(2\varphi )\); this quantity plays an important role in the formula for \(q_W\), and is equal to \(-I_W(0)\). In this setup, \(q_{{\mathbb {Q}}}\) is differentiable and, for all \(\alpha \in {\mathbb R}\),

$$\begin{aligned} q_{{\mathbb {Q}}}'(\alpha ) = -\int \psi \textrm{d}\mu _\alpha , \end{aligned}$$

where \(\mu _\alpha \) is the equilibrium measure for the Bowen-regular potential \(\varphi -\alpha \psi \); see e.g. Theorems 4.3.3 and 4.3.5 in [Kel98]. The LDP of Theorem 0 then follows from the Gärtner–Ellis theorem.
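As a sanity check of (2.14) and of the derivative formula, one may look at the simplest situation, which is an assumption made here purely for illustration: \({\mathbb {P}}\) and \({\mathbb {Q}}\) are product measures with fully supported one-letter marginals p and q, so that \(\varphi (x) = \ln p(x_1)\) and \(\psi (x) = \ln q(x_1)\). In that case the sum in (2.10) factorizes and \(p_{\text {top}}(\varphi -\alpha \psi ) = \ln \sum _{a} p(a) q(a)^{-\alpha }\), while \(\mu _\alpha \) is the product measure with marginal proportional to \(p(a)q(a)^{-\alpha }\). The Python sketch below (all names and numerical values are hypothetical) compares the brute-force sum in (2.10) with this closed form, and the derivative formula with a finite difference.

```python
import itertools
import math

# Hypothetical one-letter marginals of two product measures on {0, 1, 2}.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

def potential(alpha):
    """One-letter values of phi - alpha*psi, with phi = ln p and psi = ln q."""
    return [math.log(pa) - alpha * math.log(qa) for pa, qa in zip(p, q)]

def pressure_brute_force(f, n):
    """(1/n) ln of the sum in (2.10), for a potential f depending on x_1 only.

    For such a potential the supremum over the cylinder [u] is simply
    the sum of f(u_i) over the n letters of u.
    """
    total = sum(math.exp(sum(f[a] for a in u))
                for u in itertools.product(range(len(f)), repeat=n))
    return math.log(total) / n

def pressure_closed_form(alpha):
    """ln sum_a p(a) q(a)^(-alpha): the value of p_top(phi - alpha*psi) here."""
    return math.log(sum(math.exp(v) for v in potential(alpha)))

def derivative_formula(alpha):
    """-int psi d(mu_alpha), with mu_alpha the product measure whose marginal
    is proportional to p(a) q(a)^(-alpha)."""
    weights = [math.exp(v) for v in potential(alpha)]
    z = sum(weights)
    return -sum(w / z * math.log(qa) for w, qa in zip(weights, q))

if __name__ == "__main__":
    alpha, step = 0.3, 1e-6
    finite_diff = (pressure_closed_form(alpha + step)
                   - pressure_closed_form(alpha - step)) / (2 * step)
    print(pressure_brute_force(potential(alpha), 6), pressure_closed_form(alpha))
    print(finite_diff, derivative_formula(alpha))
```

In this toy case the value at \(\alpha = 0\) vanishes, in agreement with the normalization (2.13), and the derivative at \(\alpha = 0\) is \(-\sum _a p(a)\ln q(a)\).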

Method 2. The same LDP can be obtained by noticing that, in view of (2.12), the large deviations of \((-\frac{1}{n} \ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\) are the same as those of the ergodic averages of \(-\psi \) with respect to \({\mathbb {P}}\), which is a well-studied problem; see e.g. [You90, Kif90, Com09, PS17]. The rate function is then given, for all \(s\in {\mathbb R}\), by

$$\begin{aligned} I_{{\mathbb {Q}}}(s) = -\sup \left\{ \int \varphi \textrm{d}\eta +h(\eta ): \eta \in {{\mathcal {P}}}_{\text {inv}}, \int \psi \textrm{d}\eta = -s \right\} . \end{aligned}$$
(2.15)

The rate function \(I_{{\mathbb {Q}}}\) vanishes at a single point \(s = -\int \psi \textrm{d}{\mathbb {P}}= h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\); the supremum in (2.15) is then reached at \(\eta = {\mathbb {P}}\).

For all \(s\ge 0\), the relations (1.19) and (1.22) give

$$\begin{aligned} I_W(s)&= -s-\sup \left\{ \int ( \varphi +\psi )\textrm{d}\eta + h(\eta ): \eta \in {{\mathcal {P}}}_{\text {inv}},\int \psi \textrm{d}\eta \le -s \right\} ,\\ I_V(s)&= -s-\sup \left\{ 2 \int \varphi \textrm{d}\eta +h(\eta ): \eta \in {{\mathcal {P}}}_{\text {inv}}, \int \varphi \textrm{d}\eta \le -s \right\} . \end{aligned}$$

When \({\mathbb {Q}}={\mathbb {P}}\), we obtain \(I_W=I_V\) and, for all \(s\in {\mathbb R}\),

$$\begin{aligned} I_{{\mathbb {P}}}(s) = s-\sup \left\{ h(\eta ): \eta \in {{\mathcal {P}}}_{\text {inv}}, \int \varphi \textrm{d}\eta = -s \right\} . \end{aligned}$$

Next, by (1.25), \(I_R\) differs from \(I_V\) only at 0, with

$$\begin{aligned} I_R(0) = -\gamma _+ = - \sup _{\eta \in {{\mathcal {P}}}_{\text {inv}}}\int \varphi \textrm{d}\eta ; \end{aligned}$$

the second identity is proved in [AACG23, §4.2].

The very last assertion of Theorem C applies, and \(I_R\) is convex if and only if \({\mathbb {P}}\) is the measure of maximal entropy on \(\Omega \) (note that \({{\,\textrm{supp}\,}}{\mathbb {P}}= \Omega \) by (2.12)), i.e. if \({\mathbb {P}}\) is the uniform measure. Equivalently, in terms of potentials, \(I_R\) is convex if and only if \(\varphi \) is cohomologous to a constant [AACG23, §3.2].

The authors of [AACG23] first derive the expression (1.20) for \(q_W\) and the expression (1.23) for \(q_R\). Then, since under the assumptions at hand, the pressure \(q_W\) is differentiable on the domain where it is equal to \(q_{{\mathbb {P}}}\), a version of the Gärtner–Ellis theorem implies the LDP for \((\frac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) restricted to the interval \([q_{{\mathbb {P}}}'(-1), -\gamma _-]\), i.e. where \(I_{{\mathbb {P}}}\) and \(I_W\) coincide and are finite. In the same way, \((\frac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) is shown to obey the LDP restricted to the interval \([q_{{\mathbb {P}}}'(\alpha _*), -\gamma _-]\), i.e. where \(I_{{\mathbb {P}}}\) and the convex envelope of \(I_R\) coincide and are finite; see e.g. Fig. 2. By construction, the approach of [AACG23] cannot describe the LDP on the interval \((0, q_{{\mathbb {P}}}'(-1))\) (resp. \((0, q_{{\mathbb {P}}}'(\alpha _*))\)), and in particular cannot capture the nonconvexity of \(I_R\), since \(I_R\) is not the Legendre–Fenchel transform of \(q_R\) in general.

As mentioned in the introduction, our method is very different in spirit. In addition to the explicit singularities in (1.21) and (1.27), the pressures suffer from the fact that even \(q_{{\mathbb {Q}}}\) may fail to be differentiable under our decoupling assumptions. As a consequence, the Gärtner–Ellis theorem cannot be used to obtain the LDP, even limited to the above-mentioned intervals. We are able to circumvent these limitations by going in the opposite direction: we first establish the LDPs directly, using the Ruelle–Lanford method, and then we obtain the properties of the pressures as corollaries.

While [CGS99] does not rely on any version of the Gärtner–Ellis theorem, the LDP there is still restricted to a nonexplicit interval, which is contained in that of [AACG23]. More precisely, the LDP is restricted to an interval of values of s on which \(I_{{\mathbb {P}}}(s)\) is small enough so that some exponentially decaying error terms that arise in the proof decay faster than \(\textrm{e}^{-nI_{{\mathbb {P}}}(s)}\).

Remark 2.3

The above discussion and the results in [AACG23] are limited to the full shift \(\Omega \), while [CGS99] discusses Markov subshifts \(\Omega '\) that are topologically mixing. However, the above conclusions extend in a straightforward way to any subshift \(\Omega ' \subseteq \Omega \) satisfying the flexible specification property of Definition A.2 with \(\tau _n=O(1)\), and in particular to any transitive subshift of finite type; see Remark A.3. Indeed, in this context, one can show that if \({\mathbb {P}}\) and \({\mathbb {Q}}\) are equilibrium measures for two potentials \(\varphi \) and \(\psi \) on \(\Omega '\) satisfying Bowen’s condition, i.e. (2.9) with x and y restricted to \(\Omega '\) in the supremum, then the Bowen–Gibbs property (2.12) holds for both measures and for all \(x\in \Omega '\); see Remark 2.2 in [CT13] and [CT16, §6.5] (with \({{\mathcal {G}}}={{\mathcal {L}}}\) in the notation therein). This in turn implies JSLD and PA through Lemmas A.4, B.7 and B.8, once we have extended \({\mathbb {P}}\) and \({\mathbb {Q}}\) to measures on \(\Omega \) by setting \({\mathbb {P}}(\Omega {\setminus } \Omega ')={\mathbb {Q}}(\Omega {\setminus } \Omega ') = 0\). Thus, our results apply, and the above expressions for the rate functions and pressures remain valid. In particular, \(I_R\) is convex if and only if \({\mathbb {P}}\) is the measure of maximal entropy on \(\Omega '\).

We emphasize that the restriction to \(\tau _n = O(1)\) is important in two ways in the above argument: first, in order to apply Lemma A.4, and second, in order to derive (2.12), on which the hypotheses (A.3) and (A.4) of Lemmas B.7 and B.8 rely in the present setup.

2.4 Beyond Bowen potentials: statistical mechanics and g-measures

The analysis in Sect. 2.3 relies heavily on the Bowen–Gibbs property (2.12), which is obtained as a consequence of the (quite restrictive) Bowen condition (2.9) imposed on the potentials \(\varphi \) and \(\psi \).

For the coming discussion, we introduce a weaker version of (2.12). We say that \({\mathbb {P}}\) is weak Gibbs for the potential \(\varphi \) if there exists an \(\textrm{e}^{o(n)}\)-sequence \((K_n)_{n\in {\mathbb N}}\) such that for all \(x\in \Omega \),

$$\begin{aligned} K^{-1}_n \textrm{e}^{\sum _{i=0}^{n-1} \varphi (T^{i}x)-np_{\text {top}}(\varphi )}\le {\mathbb {P}}_n(x_1^n) \le K_n \textrm{e}^{\sum _{i=0}^{n-1} \varphi (T^{i}x)-np_{\text {top}}(\varphi )}. \end{aligned}$$
(2.16)

The notion of weak Gibbs measure was introduced in [Yur02], and in many interesting situations, equilibrium measures for non-Bowen potentials can still be shown to be weak Gibbs, see for example [PS20].

The conclusions of Theorem 0 remain valid if \({\mathbb {P}}\) and \({\mathbb {Q}}\) are assumed to be weak Gibbs for \(\varphi \) and \(\psi \) respectively; indeed, in view of (2.16), the LDP of Theorem 0 again boils down to the LDP for the ergodic averages of \(\psi \) with respect to \({\mathbb {P}}\), which is a well-studied problem; see e.g. [Com09, §5], [Var12, PS17], and [CJPS19, §A.3], as well as [EKW94] in the specific setup of Sect. 2.4.1 and [CO00] in the specific setup of Sect. 2.4.2 below.

However, the weak Gibbs condition does not seem to imply UD and SLD; see Remark B.10. To the best of the authors’ knowledge, the LDP for \((\tfrac{1}{n} \ln W_n)_{n\in {\mathbb N}}\), \((\tfrac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) and \((\tfrac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) as well as the validity of (1.20), (1.23) and (1.26) are open problems for weak Gibbs measures.

We discuss in Sects. 2.4.1 and 2.4.2 below two important classes of measures enjoying the weak Gibbs property, which are shown to satisfy our decoupling assumptions, using specific arguments distinct from those in Sect. 2.3. Although they have a large intersection, these two classes are distinct; see [FGM11, BEvELN18].

Contrary to the regularity conditions in Sect. 2.3, the setups of Sects. 2.4.1 and 2.4.2 allow for phase transitions: the equilibrium measures may fail to be unique and ergodic. Also, the pressure \(q_{{\mathbb {Q}}}\) may fail to be differentiable, so the conclusions of Theorem 0 cannot be obtained using the Gärtner–Ellis theorem anymore.

2.4.1 Absolutely summable interactions in statistical mechanics

An important situation where less regular potentials arise is the statistical mechanics of one-dimensional, translation-invariant systems; see e.g. [Rue04, Ch. 3–5] or [Sim93, Ch. II–III]. Indeed, if \(\Phi := \{\Phi _X\}_{X \Subset {\mathbb Z}}\) is a translation-invariant collection of functions \(\Phi _X: {{\mathcal {A}}}^X \rightarrow {\mathbb R}\) (interchangeably considered as functions on \({{\mathcal {A}}}^{{\mathbb Z}}\)), called interactions, satisfying the absolute summability condition

$$\begin{aligned} \sum _{\begin{array}{c} X \Subset {\mathbb Z}\\ 0 \in X \end{array}}\Vert \Phi _X\Vert _\infty < \infty , \end{aligned}$$
(2.17)

then there is a well-known correspondence between translation-invariant Gibbs states, defined either using the Dobrushin–Lanford–Ruelle equations or using convex combinations of weak limits of finite-volume Gibbs measures, and equilibrium measures (on \({{\mathcal {A}}}^{{\mathbb Z}}\)) for the “energy per site” potential

$$\begin{aligned} \varphi = \sum _{\begin{array}{c} X\Subset {\mathbb Z}\\ \min X =1 \end{array}} \Phi _X, \end{aligned}$$

which we see both as a function on \({{\mathcal {A}}}^{\mathbb Z}\) and on \(\Omega = {{\mathcal {A}}}^{\mathbb N}\).\(^{6}\)

Let now \({\mathbb {P}}\) (resp. \({\mathbb {Q}}\)) be the marginal on \(\Omega \) of some translation-invariant Gibbs state on \({{\mathcal {A}}}^{\mathbb Z}\) for the absolutely summable interaction \(\Phi \) (resp. \(\Psi \)). Equivalently, \({\mathbb {P}}\) (resp. \({\mathbb {Q}}\)) is an equilibrium measure for the potential \(\varphi \) (resp. \(\psi \)) on \(\Omega \).

Since \(\varphi \) and \(\psi \) may fail to satisfy Bowen’s regularity condition, we cannot use the Bowen–Gibbs property (2.12). However, the Dobrushin–Lanford–Ruelle equations allow one to prove that \({\mathbb {P}}\) and \({\mathbb {Q}}\) satisfy UD and SLD with \(\tau _n = 0\), but with a possibly unbounded\(^{7}\) sequence \((C_n)_{n\in {\mathbb N}}\); see Lemma 9.2 in [LPS95]. JSLD and PA then follow from Remarks 1.2 and 1.5. Our main results thus apply, and we now identify some of the quantities at play in physical terms.

The associated topological pressure \(p_{\text {top}}(\varphi )\) can be thought of as a free energy density: with

$$\begin{aligned} U_n:= \sum _{X \subseteq [1,n]} \Phi _X \end{aligned}$$

the Hamiltonian (up to a factor of minus the inverse temperature) corresponding to \(\Phi \) in the finite volume [1, n] with free boundary conditions, we have

$$\begin{aligned} U_n = \sum _{i=0}^{n-1} \varphi \circ T^i + o(n), \end{aligned}$$
(2.18)

so

$$\begin{aligned} F(\Phi ):= \lim _{n\rightarrow \infty } \frac{1}{n} \ln \sum _{u\in {{\mathcal {A}}}^n}\textrm{e}^{U_n(u)} = p_{\text {top}}(\varphi ). \end{aligned}$$

Using the Dobrushin–Lanford–Ruelle equations, one can show that

$$\begin{aligned} {\mathbb {P}}_n(u) = \textrm{e}^{U_n(u)-nF(\Phi ) + o(n)}, \end{aligned}$$
(2.19)

which, combined with (2.18), shows that \({\mathbb {P}}\) is weak Gibbs in the sense of (2.16). The same is true of \({\mathbb {Q}}\). To simplify the discussion, we shall assume going forward that

$$\begin{aligned} F(\Phi ) = F(\Psi ) = 0, \end{aligned}$$
(2.20)

which can be achieved by adding suitable constants to \(\Phi _{\{i\}}\) and \(\Psi _{\{i\}}\) for every \(i \in {\mathbb Z}\).

The expressions for \(q_{{\mathbb {Q}}}\) and the rate functions obtained in Sect. 2.3 remain valid, as replacing (2.12) with (2.16) does not affect these computations (note that (2.20) implies (2.13)). There, \(\int \varphi \textrm{d}\eta \) and \(\int \psi \textrm{d}\eta \) are simply understood as the specific energies of the state \(\eta \).

We then obtain, either by (2.14) or by direct computation using (2.19), the relations

$$\begin{aligned} q_{{\mathbb {Q}}}(\alpha ) = F(\Phi -\alpha \Psi ) \qquad \text {and}\qquad q_{{\mathbb {P}}}(\alpha ) = F((1-\alpha )\Phi ), \end{aligned}$$

valid for all \(\alpha \in {\mathbb R}\). We remark that \(q_{{\mathbb {P}}}(\alpha )\) is related to the free energy density at (minus the) inverse temperature \(1-\alpha \). Moreover, in this setup, \(\gamma _+\) takes the form of (minus) the asymptotic ground-state energy per unit volume:

$$\begin{aligned} \gamma _+ = \lim _{n\rightarrow \infty }\frac{1}{n} \sup _{u\in {{\mathcal {A}}}^n}U_n(u). \end{aligned}$$
(2.21)
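To make the free energy density \(F(\Phi )\) and the limit (2.21) concrete, the following Python sketch treats a nearest-neighbour example that is not taken from the text: spins in \(\{+1,-1\}\), with \(\Phi _{\{i\}}(x) = h x_i\), \(\Phi _{\{i,i+1\}}(x) = J x_i x_{i+1}\) and all other \(\Phi _X\) equal to zero, so that (2.17) holds trivially. The values of J and h are hypothetical, and the normalization (2.20) is not imposed. The sketch compares the brute-force value of \(\tfrac{1}{n}\ln \sum _u \textrm{e}^{U_n(u)}\) with the logarithm of the largest eigenvalue of the \(2\times 2\) transfer matrix, and evaluates the finite-n quantity appearing in (2.21).

```python
import itertools
import math

J, h = 0.8, 0.3      # hypothetical coupling and field (inverse temperature absorbed)
SPINS = (+1, -1)

def U(u):
    """Free-boundary energy U_n(u) = h * sum_i u_i + J * sum_i u_i u_{i+1}."""
    return h * sum(u) + J * sum(a * b for a, b in zip(u, u[1:]))

def free_energy_brute_force(n):
    """(1/n) ln sum_u exp(U_n(u)), enumerating all 2^n configurations."""
    z = sum(math.exp(U(u)) for u in itertools.product(SPINS, repeat=n))
    return math.log(z) / n

def free_energy_transfer_matrix():
    """ln of the largest eigenvalue of the matrix with entries exp(J*a*b + h*b)."""
    return math.log(math.exp(J) * math.cosh(h)
                    + math.sqrt(math.exp(2 * J) * math.sinh(h) ** 2
                                + math.exp(-2 * J)))

def max_energy_density(n):
    """(1/n) sup_u U_n(u), the finite-n quantity in (2.21)."""
    return max(U(u) for u in itertools.product(SPINS, repeat=n)) / n

if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, free_energy_brute_force(n), max_energy_density(n))
    print("transfer matrix:", free_energy_transfer_matrix())
```

For positive J and h the maximum of \(U_n\) is attained at the all-plus configuration, so the finite-n quantity in (2.21) equals \(h + \tfrac{n-1}{n}J\) in this example, and the brute-force free energy approaches the transfer-matrix value with an error of order \(1/n\).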

The summability condition (2.17) allows for phase transitions, i.e. the coexistence of several equilibrium measures, and thus the measures \({\mathbb {P}}\) and \({\mathbb {Q}}\) may fail to be ergodic, while still satisfying our decoupling assumptions. As a consequence, \(q_{{\mathbb {Q}}}\) is not differentiable in general.

Remark 2.4

Again, we have limited the above discussion to the full shift \(\Omega \), but the conclusions remain true on many subshifts. In particular, they hold if the subshift satisfies the flexible specification property of Definition A.2 with \(\tau _n=O(1)\). Indeed, although the technical details are slightly involved, one can then adapt the arguments of Lemma 9.2 in [LPS95, §9] in order to obtain (A.3) and (A.4), so that Lemmas B.7 and B.8, together with Lemma A.4, establish UD, SLD, JSLD and PA. In particular, this applies to any transitive subshift of finite type.

2.4.2 g-measures

A continuous function \(g: \Omega \rightarrow (0,1]\) is called a g-function on \(\Omega \) if

$$\begin{aligned} \sum _{y\in T^{-1}\{x\}} g(y) = 1 \end{aligned}$$

for all \(x \in \Omega \). In this case, the potential \(\varphi = \ln g\) has vanishing topological pressure (\(p_{\text {top}}(\varphi )=0\)), and any equilibrium measure (recall (2.11)) for the potential \(\varphi \) is called a g-measure on \(\Omega \); see e.g. [Wal75, PPW78, Wal05]. Let \({\mathbb {P}}\) be such an equilibrium measure. It is then well known that \({{\,\textrm{supp}\,}}{\mathbb {P}}= \Omega \) and that

$$\begin{aligned} g_n(x):= \frac{{\mathbb {P}}_n(x_1^n)}{{\mathbb {P}}_{n-1}(x_2^n)} \end{aligned}$$
(2.22)

defines a sequence of continuous functions that converges uniformly to g on \(\Omega \); see e.g. [PPW78, §4]. As shown in Lemma B.11, this uniform convergence has two important consequences. First, it implies that \({\mathbb {P}}\) satisfies the weak Gibbs condition. Second, it implies that there is an \(\textrm{e}^{o(n)}\)-sequence \((D_n)_{n\in {\mathbb N}}\) such that

$$\begin{aligned} D_n^{-1} {\mathbb {P}}_n(x_1^n){\mathbb {P}}_m(x_{n+1}^{n+m}) \le {\mathbb {P}}_{n+m}(x_1^{n+m}) \le D_n {\mathbb {P}}_n(x_1^n){\mathbb {P}}_m(x_{n+1}^{n+m}) \end{aligned}$$

for all \(x\in \Omega \), and all \(n,m\in {\mathbb N}\), which implies UD and SLD with \(\tau _n = 0\), but with a possibly unbounded sequence \((C_n)_{n\in {\mathbb N}}\). By Remark 1.5, PA holds as well. If \({\mathbb {Q}}\) is a second g-measure on \(\Omega \) (for a possibly different g-function \(g'\)), then JSLD holds as well by Remark 1.2. Our results thus apply, and once again, the expressions obtained in Sect. 2.3 remain valid thanks to the weak Gibbs property (2.16), with \(\varphi = \ln g\) and \(\psi = \ln g'\).
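A simple instance in which the objects of this subsection can be written down explicitly is that of a stationary Markov measure with positive transition matrix P and stationary vector \(\pi \). The ratio (2.22) then depends on the first two letters only, \(g_n(x) = \pi _{x_1}P_{x_1,x_2}/\pi _{x_2}\) for \(n\ge 2\), and this function sums to one over the preimages of any point. The Python sketch below checks these two elementary facts for a hypothetical \(2\times 2\) matrix; it is only an illustration and does not touch the general theory of g-measures.

```python
import itertools

# Hypothetical positive transition matrix on the alphabet {0, 1}.
P = [[0.9, 0.1],
     [0.4, 0.6]]
# Its stationary distribution, solved by hand for this matrix: pi P = pi.
pi = [0.8, 0.2]

def g(x1, x2):
    """Candidate g-function, depending on the first two letters only."""
    return pi[x1] * P[x1][x2] / pi[x2]

def marginal(u):
    """P_n(u) for the stationary Markov measure; the empty word has mass 1."""
    if not u:
        return 1.0
    prob = pi[u[0]]
    for a, b in zip(u, u[1:]):
        prob *= P[a][b]
    return prob

def g_n(u):
    """The ratio in (2.22) for a word u = x_1^n with n >= 2."""
    return marginal(u) / marginal(u[1:])

if __name__ == "__main__":
    # Normalization: sum of g over the preimages a x_1 x_2 ... of a point x.
    for b in (0, 1):
        print("sum over preimages of a point starting with", b, ":",
              sum(g(a, b) for a in (0, 1)))
    # The ratio (2.22) coincides with g for every word of length 4.
    for u in itertools.product((0, 1), repeat=4):
        print(u, g_n(u), g(u[0], u[1]))
```

In this example \(g_n\) is already equal to g for all \(n\ge 2\), so the uniform convergence is trivial.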

Remark 2.5

The above discussion can easily be generalized to transitive Markov subshifts \(\Omega '\subseteq \Omega \). However, developing a theory of g-measures on more general subshifts seems to be more delicate, and goes beyond the scope of the present paper. Still, without any reference to the general theory of g-measures, one obtains by combining Lemmas B.11, B.7 and B.8 that our decoupling assumptions hold if \(\Omega '\) satisfies the flexible specification property and the abundant periodic orbit property of Definition A.2, and if one assumes that the sequence \((g_n)_{n\in {\mathbb N}}\) defined by (2.22) converges uniformly to some continuous function \(g:\Omega '\rightarrow (0,1]\); see Lemma B.11 for a precise statement.

We conclude this subsection by noting two interesting references. First, Hulse constructed in [Hul06] an example of a g-measure that is not ergodic, showing once again that UD, SLD and PA do not imply ergodicity.\(^{8}\) Second, in the framework of g-measures, the large deviations of empirical entropies (which are also entropy estimators) were studied in [CG05].

2.5 Hidden Markov models and lack of exponential tightness

A hidden Markov measure \({\mathbb {P}}\) on \(\Omega \) with finite hidden alphabet \(\mathcal {A}^\text {H}\) is obtained from a shift-invariant Markov measure \({\mathbb {P}}^\text {H}\) on \(\Omega ^\text {H}:= (\mathcal {A}^\text {H})^{{\mathbb N}}\) and a surjective map \(f: \mathcal {A}^\text {H}\rightarrow {{\mathcal {A}}}\) by prescribing the marginals \({\mathbb {P}}_n:= {\mathbb {P}}^\text {H}_n \circ (f^{\otimes n})^{-1}\) for all \(n \in {\mathbb N}\). The name “hidden Markov model” refers to the pair \(({\mathbb {P}}^\text {H},{\mathbb {P}})\). There exist several different characterizations of hidden Markov measures; one which is particularly useful from the point of view of decoupling properties is the representation in terms of products of matrices discussed in Proposition 2.25 in [BCJP21]. The reader is also encouraged to consult Example 2.25 in [CJPS19]. Using that \(|\mathcal {A}^\text {H}|<\infty \), it is straightforward to show that, if \({\mathbb {P}}^\text {H}\) and \(\mathbb {Q}^\text {H}\) are irreducible, stationary Markov measures on \(\Omega ^\text {H}\) that satisfy \({\mathbb {P}}^\text {H}_n \ll \mathbb {Q}^\text {H}_n\) for all \(n\in {\mathbb N}\), then \({\mathbb {P}}\) and \({\mathbb {Q}}\) defined using the same function \(f\) satisfy UD, JSLD and PA, and Theorems 0 and A–C apply.

Obviously, this is a generalization of the setup of Sect. 2.2, but in a completely different spirit from that of Sect. 2.3: hidden Markov measures can be far from Gibbsian; see e.g. Theorem 2.10 in [BCJP21].\(^{9}\) The assumption \(|\mathcal {A}^\text {H}|<\infty \) ensures the existence of a constant \(c\ge 0\) such that

$$\begin{aligned} \inf _{x\in {{\,\textrm{supp}\,}}{\mathbb {P}}}{\mathbb {P}}_n(x_1^n)\ge \textrm{e}^{-cn} \qquad \text {and}\qquad \inf _{x\in {{\,\textrm{supp}\,}}{\mathbb {Q}}}{\mathbb {Q}}_n(x_1^n)\ge \textrm{e}^{-cn}, \end{aligned}$$
(2.23)

so \(0\le -\frac{1}{n} \ln {\mathbb {Q}}_n(x_1^n)\le c\) for \({\mathbb {P}}\)-almost every \(x\in \Omega \). These bounds imply exponential tightness and thus guarantee goodness of the rate function \(I_{{\mathbb {Q}}}\). In fact, \(I_{{\mathbb {Q}}}\) is infinite on \((c,\infty )\), and the same is then automatically true of \(I_W\), \(I_V\) and \(I_R\). By the same token, the bounds (2.23) imply that \(q_{{\mathbb {Q}}}(\alpha )\le c\alpha \) for all \(\alpha \ge 0\), so \(q_{{\mathbb {Q}}}\), \(q_W\), \(q_R\) and \(q_V\) are finite everywhere. The bounds (2.23) and these consequences are a common feature of all the examples in Sects. 2.1–2.4.

Dropping the assumption that \(\mathcal {A}^\text {H}\) is finite, the class of hidden Markov measures allows for examples where (2.23) fails, but one then needs to verify on a case-by-case basis whether UD, JSLD and PA are satisfied in order for our results to apply. Using \(\mathcal {A}^\text {H}= \{0\} \cup {\mathbb N}\) as in [CJPS19, §A.2], one easily constructs fully supported, admissible pairs \(({\mathbb {P}}, {\mathbb {Q}})\) on \({{\mathcal {A}}}= \{\textsf{a},\textsf{b}\}\), satisfying also PA, but violating either or both inequalities in (2.23). We limit ourselves to providing two sets of parameters following the notation in [CJPS19, §A.2]: Examples 2.6 and 2.7 are illustrated in Figs. 4 and 5, respectively.

Example 2.6

Define \({\mathbb {P}}\) using \(\gamma (n)=n\ln 2\), and \({\mathbb {Q}}\) using \({\hat{\gamma }}(n)=2n+0.15n^2\). Then, the measure \({\mathbb {P}}\) is uniform and \({\mathbb {Q}}(\textsf{b}^n) \sim \textrm{e}^{-{\hat{\gamma }}(n)}\). Note that the quadratic term in the definition of \(\hat{\gamma }(n)\) makes the second bound in (2.23) fail. Here,

$$\begin{aligned} q_{{\mathbb {Q}}}(\alpha )&\ge \liminf _{n\rightarrow \infty }\frac{1}{n} \ln ( {\mathbb {P}}_{n}(\textsf{b}^n){\mathbb {Q}}_n(\textsf{b}^n)^{-\alpha }) \\&= -\ln 2 + \alpha \liminf _{n\rightarrow \infty } \frac{{\hat{\gamma }}(n)}{n}, \end{aligned}$$

so \(q_{{\mathbb {Q}}}(\alpha )=\infty \) for all \(\alpha >0\). One can show that \(q_{{\mathbb {Q}}}'(0^-)<\infty \) and that \(I_{{\mathbb {Q}}}(s)=I_W(s) = 0\) for all \(s\ge q_{{\mathbb {Q}}}'(0^-)\). Thus, the sequence \((-\tfrac{1}{n} \ln {\mathbb {Q}}_n)_{n\in {\mathbb N}}\) is not exponentially tight with respect to \({\mathbb {P}}\), and the rate functions \(I_{{\mathbb {Q}}}\) and \(I_W\) are not good; see Fig. 4.

Fig. 4 Illustration of Example 2.6. In the right picture, \(q_{{\mathbb {Q}}}(\alpha )=q_W(\alpha )=\infty \) for all \(\alpha >0\)

Example 2.7

Define \({\mathbb {P}}={\mathbb {Q}}\) using \(\gamma (0)=\hat{\gamma }(0)=0\), \(\gamma (n) = {\hat{\gamma }}(n)= 0.6\,\textrm{e}^{n}\) for \(n\in {\mathbb N}\). Then, we have \({\mathbb {P}}(\textsf{b}^n) = {\mathbb {Q}}(\textsf{b}^n)\sim \textrm{e}^{-\gamma (n)}\) and both bounds in (2.23) fail. Here, \(q_{{\mathbb {P}}}\) is infinite for all \(\alpha > 1\), and \(q_{{\mathbb {P}}}'(1^-)=\infty \). Each sequence of interest is exponentially tight and has a rate function that is good, but remains finite on an unbounded interval; in other words \(\gamma _- = -\infty \).

Fig. 5 Illustration of Example 2.7. In the left picture, the dashed black asymptote is given by the equation \(y = x-q_{{\mathbb {P}}}(1)\), and in the right picture, \(q_{{\mathbb {P}}}(\alpha )=q_W(\alpha )=q_V(\alpha )=q_R(\alpha )=\infty \) for all \(\alpha >1\)

Remark 2.8

The possible lack of exponential tightness is specific to the case where \({\mathbb {P}}\ne {\mathbb {Q}}\); when \({\mathbb {P}}={\mathbb {Q}}\), our assumptions imply that all the random variables discussed in the paper are exponentially tight and that the corresponding rate functions are good. At the root of this fact is the exponential tightness of \((-\frac{1}{n}\ln {\mathbb {P}}_n)_{n\in {\mathbb N}}\) with respect to \({\mathbb {P}}\) (Theorem 0.iii). This exponential tightness will be derived as a consequence of (1.16), but it can also be seen directly: for any \(M\ge 0\),

$$\begin{aligned} {\mathbb {P}}\left\{ x: -\frac{1}{n} \ln {\mathbb {P}}_n(x_1^n) \ge M\right\} = {\mathbb {P}}\{x: {\mathbb {P}}_n(x_1^n) \le \textrm{e}^{-nM}\}\le |{{\mathcal {A}}}|^{n}\textrm{e}^{-Mn}. \end{aligned}$$
(2.24)
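The counting argument behind (2.24) is that at most \(|{{\mathcal {A}}}|^n\) words of length n can contribute, each with probability at most \(\textrm{e}^{-nM}\). The Python sketch below checks the resulting inequality by exhaustive enumeration, under the illustrative assumption that \({\mathbb {P}}\) is a product measure with a hypothetical one-letter marginal p on three letters; the function names are ours.

```python
import itertools
import math

# Hypothetical one-letter marginal of a product measure P on three letters.
p = [0.7, 0.2, 0.1]

def tail_probability(n, M):
    """Exact P{x : P_n(x_1^n) <= exp(-n M)}, enumerating all words of length n."""
    threshold = math.exp(-n * M)
    total = 0.0
    for u in itertools.product(range(len(p)), repeat=n):
        prob = math.prod(p[a] for a in u)
        if prob <= threshold:
            total += prob
    return total

def counting_bound(n, M):
    """Right-hand side of (2.24): |A|^n exp(-M n)."""
    return len(p) ** n * math.exp(-M * n)

if __name__ == "__main__":
    for M in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(M, tail_probability(8, M), counting_bound(8, M))
```

The bound is informative only for \(M > \ln |{{\mathcal {A}}}|\), where its right-hand side decays exponentially in n; for smaller M it exceeds 1.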

We will come back to this point in Remark 5.3.

Remark 2.9

This class of examples also allows for cases where the pressure \(q_{{\mathbb {Q}}}\) is not differentiable — and hence where the Gärtner–Ellis route cannot provide the conclusions of Theorem 0. Indeed, using the parameters \(\gamma (n) = {\hat{\gamma }}(n)=\frac{n}{10}+4\ln (1+\frac{n}{4})\) yields a situation where \(q_{{\mathbb {Q}}}\) is not differentiable at \(\alpha _0 \approx -0.1769\) and the rate functions are affine on an interval corresponding to the subdifferential of \(q_{{\mathbb {Q}}}\) at \(\alpha _0\).

3 Key Estimates

This section is devoted to technical estimates on the distribution of \(W_n\), \(R_n\) and \(V_n\) at large but finite \(n \in {\mathbb N}\). We start with estimates for \(W_n\) and then use those for our analysis of \(R_n\) and \(V_n\). In order to do so, we first need a convenient reformulation of our decoupling assumptions.

We show in Lemma B.2 that, at the cost of replacing \(C_n\) with \(C_n|{{\mathcal {A}}}|^{\tau _n}\) (which also satisfies (1.8)), UD implies that for every \(n\in {\mathbb N}\), \(A\in {{\mathcal {F}}}_n\) and \(B\in {{\mathcal {F}}}\),

$$\begin{aligned} {\mathbb {P}}\left( A \cap T^{-n - \tau _n}B\right) \le C_n {\mathbb {P}}(A){\mathbb {P}}(B). \end{aligned}$$
(3.1)

In the same way, we show in Lemma B.3 that at the cost of replacing \(C_n\) with \((\tau _n+1)C_n\), SLD implies that for every \(n\in {\mathbb N}\), \(A\in {{\mathcal {F}}}_n\) and \(B\in {{\mathcal {F}}}\),

$$\begin{aligned} \max _{0\le \ell \le \tau _n}{\mathbb {P}}\left( A \cap T^{-n -\ell }B\right) \ge C_n^{-1} {\mathbb {P}}(A){\mathbb {P}}(B). \end{aligned}$$
(3.2)

We shall freely use the form (3.1) of UD and (3.2) of SLD throughout the paper.

3.1 Waiting times

Because \(W_n: \Omega \times \Omega \rightarrow {\mathbb N}\) is \(({{\mathcal {F}}}_n \otimes {{\mathcal {F}}})\)-measurable, we will sometimes identify it with a function \({{\mathcal {A}}}^n \times \Omega \rightarrow {\mathbb N}\) denoted by the same symbol.

Lemma 3.1

Assume \({\mathbb {Q}}\) satisfies SLD. Then, for all \(m,n\in {\mathbb N}\) and \(u\in {{\mathcal {A}}}^n\),

$$\begin{aligned} {\mathbb {Q}}\{ y: W_n(u,y) = m \} \le {\mathbb {Q}}_n(u) (1 - C_n^{-1}{\mathbb {Q}}_n(u))^{\left\lfloor \frac{m-1}{n + \tau _n}\right\rfloor } \end{aligned}$$
(3.3)

and

$$\begin{aligned} {\mathbb {Q}}\{ y: W_n(u,y)\ge m\} \le (1 - C_n^{-1}{\mathbb {Q}}_n(u))^{\left\lfloor \frac{m-1}{n + \tau _n}\right\rfloor }. \end{aligned}$$
(3.4)

Proof

Let us fix n, m and u as in the statement, and let \(k:=\lfloor \tfrac{m-1}{n + \tau _n}\rfloor \). First, if \(k=0\), then (3.4) is trivial and (3.3) holds since \(\{ y: W_n(u,y) = m \}\subseteq T^{1-m}[u]\) and \({\mathbb {Q}}\) is shift invariant. Consider now the case \(k\in {\mathbb N}\). In view of SLD (recall (3.2)) and shift invariance, we may inductively pick k integers \(0\le \ell _1, \ell _2, \dots , \ell _k\le \tau _n\) such that the intersections recursively defined by

$$\begin{aligned} A_0:= [u] \qquad \text {and} \qquad A_j:= [u]^\textsf{c}\cap T^{-n - \ell _j}A_{j{-1}} \end{aligned}$$

for \(j = 1, \dotsc , k\) satisfy \({\mathbb {Q}}\left( [u] \cap T^{-n - \ell _j}A_{j-1}\right) \ge C_n^{-1} {\mathbb {Q}}_n(u){\mathbb {Q}}(A_{j-1})\), and thus also

$$\begin{aligned} {\mathbb {Q}}(A_j) = {\mathbb {Q}}(A_{j-1})-{\mathbb {Q}}([u]\cap T^{-n-\ell _j} A_{j-1}) \le {\mathbb {Q}}(A_{j-1})(1-C_n^{-1}{\mathbb {Q}}_n(u)). \end{aligned}$$
(3.5)

Iterating (3.5) starting from \(A_0\) yields

$$\begin{aligned} {\mathbb {Q}}(A_k) \le {\mathbb {Q}}_n(u)(1 - C_n^{-1} {\mathbb {Q}}_n(u))^{k}. \end{aligned}$$
(3.6)

Let \(M:= \{m- jn - \sum _{i=1}^j \ell _i\}_{j=1}^{k}\). By construction, \(kn + \sum _{i=1}^k\ell _i \le k({n + \tau _n})<m\), so \(M \subseteq [1, m-n] \subseteq [1,m-1]\). As a consequence,

$$\begin{aligned} \{ y : W_n(u,y) = m \}&= T^{1-m}[u] \cap \bigcap _{m'=1}^{m-1} T^{1-m'}[u]^\textsf{c}\\&\subseteq T^{1-m}[u] \cap \bigcap _{m'\in M} T^{1-m'}[u]^\textsf{c}\\&= T^{1-m+nk + \sum _{j=1}^{k}\ell _j } A_{k}. \end{aligned}$$

Thus, by shift invariance and (3.6), we readily obtain (3.3). The proof of (3.4) is exactly the same with \(A_0\) replaced by \(\Omega \). \(\square \)
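The bounds (3.3) and (3.4) can be tested exactly in the simplest case of a product measure \({\mathbb {Q}}\), for which (3.2) holds with equality for \(\tau _n = 0\) and \(C_n = 1\). The Python sketch below makes that assumption, with a hypothetical marginal on two letters, and computes the law of \(W_n(u,\cdot \,)\) by enumeration in exact rational arithmetic; the function names are ours.

```python
import itertools
from fractions import Fraction

# Hypothetical one-letter marginal of a product measure Q on {"a", "b"}.
q = {"a": Fraction(1, 3), "b": Fraction(2, 3)}

def word_prob(w):
    """Q_n(w) for the product measure."""
    prob = Fraction(1)
    for letter in w:
        prob *= q[letter]
    return prob

def waiting_time_law(u, m_max):
    """Exact Q{y : W_n(u, y) = m} for m = 1, ..., m_max.

    W_n(u, y) = m means that u occurs in y at position m and at no earlier
    position; this depends only on the first m_max + len(u) - 1 letters of y.
    """
    law = {m: Fraction(0) for m in range(1, m_max + 1)}
    for y in itertools.product(q, repeat=m_max + len(u) - 1):
        s = "".join(y)
        first = s.find(u)      # 0-based index of the first occurrence, or -1
        if first != -1:
            law[first + 1] += word_prob(s)
    return law

def bound_3_3(u, m):
    """Right-hand side of (3.3) with C_n = 1 and tau_n = 0."""
    qu = word_prob(u)
    return qu * (1 - qu) ** ((m - 1) // len(u))

def bound_3_4(u, m):
    """Right-hand side of (3.4) with C_n = 1 and tau_n = 0."""
    return (1 - word_prob(u)) ** ((m - 1) // len(u))

if __name__ == "__main__":
    u = "ab"
    law = waiting_time_law(u, 8)
    for m in range(1, 9):
        tail = 1 - sum(law[j] for j in range(1, m))   # Q{W_n(u, .) >= m}
        print(m, float(law[m]), float(bound_3_3(u, m)),
              float(tail), float(bound_3_4(u, m)))
```

For \(m = 1\) both bounds are attained, since \({\mathbb {Q}}\{y: W_n(u,y) = 1\} = {\mathbb {Q}}_n(u)\).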

Remark 3.2

The bounds (3.5) and (3.6) are inspired by [Kon98, §2], and were already adapted to selective decoupling conditions in [CDEJR23a]. It might be surprising that the upper bound (3.3) relies on the lower decoupling assumption SLD. In fact, even if \(\tau _n = 0\) for all n, using UD would yield \({\mathbb {Q}}(A_k)\le C_n^k{\mathbb {Q}}_n(u)(1 -{\mathbb {Q}}_n(u))^{k}\) instead of (3.6); the extra factor of \(C_n^k\) is too crude since we will be interested in the case where \(k \gg n\). The opposite will happen in Lemma 3.4, where a lower bound will be proved using UD; see (3.13).

If \({\mathbb {Q}}_n(u)=0\), then \({\mathbb {Q}}\{y:W_n(u,y)=m\} \le {\mathbb {Q}}(T^{1-m}[u])={\mathbb {Q}}_n(u) = 0\) for all \(m\in {\mathbb N}\), and thus \(W_n(u, \cdot \, )\) is almost surely infinite. Conversely, if \({\mathbb {Q}}\) satisfies SLD, the bound (3.4) ensures that \(W_n(u, \cdot \, )\) is almost surely finite whenever \({\mathbb {Q}}_n(u)>0\). This last observation is at the heart of the next lemma.

Lemma 3.3

Let \(n\in {\mathbb N}\), let \({\mathbb {Q}}\) satisfy SLD, and assume that \({\mathbb {P}}_n \ll {\mathbb {Q}}_n\). Then, the random variable \(W_n\) is \(({\mathbb {P}}\otimes {\mathbb {Q}})\)-almost surely finite.

Proof

This is a consequence of the decomposition

$$\begin{aligned} W_n(x,y) = \sum _{u\in {{\mathcal {A}}}^n}1_{[u]}(x)W_n(u,y), \end{aligned}$$

the bound (3.4), and the absolute continuity assumption. \(\square \)

We now turn to lower bounds on the distribution of \(W_n(u,\cdot \,)\). The following lemma will be useful when \((1-C_n{\mathbb {Q}}_n(u))^{\big \lceil \frac{m-1}{n + \tau _n}\big \rceil }\) is close to 1.

Lemma 3.4

Assume \({\mathbb {Q}}\) satisfies UD. Then, for all \(n,m\in {\mathbb N}\) and \(u\in {{\mathcal {A}}}^n\) such that \(C_n{\mathbb {Q}}_n(u)\le 1\),

$$\begin{aligned} \begin{aligned}&{\mathbb {Q}}\{ y: W_n(u,y) = m \} \\&\qquad \qquad \ge \frac{1}{n + \tau _n}{\mathbb {Q}}_n(u) \left( 1 - ({n + \tau _n}) \left( 1 - (1-C_n{\mathbb {Q}}_n(u))^{\big \lceil \frac{m-1}{n + \tau _n}\big \rceil }\right) \right) . \end{aligned} \end{aligned}$$
(3.7)

Proof

Let us fix n, m and u as in the statement. We first prove that it suffices to establish the bound

$$\begin{aligned} \begin{aligned}&{\mathbb {Q}}\{ y: k({n + \tau _n}) < W_n(u,y) \le (k+1)({n + \tau _n}) \} \\&\qquad \qquad \ge {\mathbb {Q}}_n(u) \left( 1 - ({n + \tau _n})\left( 1 - (1-C_n{\mathbb {Q}}_n(u))^k\right) \right) \end{aligned} \end{aligned}$$
(3.8)

for all \(k\in {\mathbb N}\cup \{0\}\). Since

$$\begin{aligned} \{y:W_n(u,y) = t + 1 \} \subseteq T^{-1}\{y:W_n(u,y) = t\} \end{aligned}$$
(3.9)

for all \(t\in {\mathbb N}\), the probability \({\mathbb {Q}}\{y:W_n(u,y) = t\}\) is nonincreasing in t. As a consequence, the left-hand side of (3.8) is bounded above by \(({n + \tau _n}){\mathbb {Q}}\{y: W_n(u,y) = k({n + \tau _n})+1\}\), so (3.8) implies that

$$\begin{aligned} \begin{aligned}&{\mathbb {Q}}\{ y: W_n(u,y) = 1+k({n + \tau _n}) \}\\&\qquad \qquad \ge \frac{1}{n + \tau _n}{\mathbb {Q}}_n(u) \left( 1 - ({n + \tau _n}) \left( 1 - (1-C_n{\mathbb {Q}}_n(u))^{k}\right) \right) . \end{aligned} \end{aligned}$$

By nonincreasingness, the same lower bound applies to \({\mathbb {Q}}\{ y: W_n(u,y) =m\}\) for all k such that \(1+k({n + \tau _n})\ge m\), and thus (3.8) indeed implies (3.7).
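
To make this last step explicit, one admissible choice is the integer k appearing in the exponent of (3.7), which satisfies \(1+k({n + \tau _n})\ge m\). The following display is a sketch of the resulting chain, in which the first inequality uses the monotonicity in t and the second uses the bound just derived from (3.8):

$$\begin{aligned} {\mathbb {Q}}\{ y: W_n(u,y) = m \}&\ge {\mathbb {Q}}\{ y: W_n(u,y) = 1+k({n + \tau _n}) \} \\&\ge \frac{1}{n + \tau _n}{\mathbb {Q}}_n(u) \left( 1 - ({n + \tau _n}) \left( 1 - (1-C_n{\mathbb {Q}}_n(u))^{k}\right) \right) , \qquad k:= \Big \lceil \frac{m-1}{n + \tau _n}\Big \rceil . \end{aligned}$$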

We now establish (3.8). For every \(y\in T^{1-(k+1)({n + \tau _n})}[u] =:A\) we have \(W_n(u,y)\le (k+1)({n + \tau _n})\), and for every \(y\in \bigcap _{r = 1}^{k({n + \tau _n})} T^{1-r}[u]^\textsf{c}\) we have \(W_n(u,y)> k({n + \tau _n})\). As a consequence,

$$\begin{aligned} {\mathbb {Q}}\{y: k({n + \tau _n}) < W_n(u,y) \le (k+1)({n + \tau _n})\} \ge {\mathbb {Q}}\left( A\cap \bigcap _{r = 1}^{k({n + \tau _n})} T^{1-r}[u]^\textsf{c}\right) . \nonumber \\ \end{aligned}$$
(3.10)

We now write \( \bigcap _{r = 1}^{k({n + \tau _n})} T^{1-r}[u]^\textsf{c}= \bigcap _{j=1}^{n + \tau _n}B_j \), where for \(1\le j \le {n + \tau _n}\),

$$\begin{aligned} B_j:= \bigcap _{d = 0}^{k-1} T^{1-d({n + \tau _n})-j}[u]^\textsf{c}. \end{aligned}$$

Notice that for each j, the events whose intersection defines \(B_j\) are separated by “gaps” of size \(\tau _n\), which will allow us to use UD below. By a union bound, (3.10) implies that

$$\begin{aligned} {\mathbb {Q}}\{y : k({n + \tau _n}) < W_n(u,y) \le (k+1)({n + \tau _n})\}&\ge {\mathbb {Q}}\left( A\cap \bigcap _{j=1}^{n + \tau _n}B_j \right) \\&={\mathbb {Q}}(A)-{\mathbb {Q}}\left( \bigcup _{j=1}^{n + \tau _n}(A\cap B_j^\textsf{c}) \right) \\&\ge {\mathbb {Q}}(A)-\sum _{j=1}^{n + \tau _n}{\mathbb {Q}}(A\cap B_j^\textsf{c}) \\&= {\mathbb {Q}}(A) -\sum _{j=1}^{n + \tau _n}({\mathbb {Q}}(A)-{\mathbb {Q}}(A\cap B_j)). \end{aligned}$$

By shift invariance, \({\mathbb {Q}}(A)={\mathbb {Q}}_n(u)\), and thus the proof of (3.8) will be complete once we have shown that

$$\begin{aligned} {\mathbb {Q}}(A\cap B_j) \ge {\mathbb {Q}}_n(u)(1-C_n{\mathbb {Q}}_n(u))^k. \end{aligned}$$
(3.11)

Fix \(1\le j \le {n + \tau _n}\). We have

$$\begin{aligned} A\cap B_j = T^{1-j}\left( T^{j-(k+1)({n + \tau _n})}[u]\cap B_1\right) = T^{1-j} F_k, \end{aligned}$$
(3.12)

with \(F_0, \dots , F_k\) inductively defined by

$$\begin{aligned} F_0:= T^{j-n-\tau _n}[u] \qquad \text {and} \qquad F_i:= [u]^\textsf{c}\cap T^{-n-\tau _n} F_{i-1} \end{aligned}$$

for \(i=1,\dotsc ,k\). By UD (recall (3.1)),

$$\begin{aligned} {\mathbb {Q}}(F_i) = {\mathbb {Q}}(F_{i-1})-{\mathbb {Q}}([u]\cap T^{-n-\tau _n} F_{i-1}) \ge {\mathbb {Q}}(F_{i-1})(1-C_n{\mathbb {Q}}_n(u)). \end{aligned}$$

Iterating this bound starting with \(F_0\) yields

$$\begin{aligned} {\mathbb {Q}}(F_k)\ge {\mathbb {Q}}_n(u)(1-C_n{\mathbb {Q}}_n(u))^k. \end{aligned}$$
(3.13)

Combining this with (3.12) and using shift invariance establishes (3.11), as claimed. \(\square \)

The next proposition makes precise the “cut-off” phenomenon sketched in (1.30) and uses the notation

$$\begin{aligned} U_{n}(s):= \left\{ u \in {{\mathcal {A}}}^n: {\mathbb {Q}}_n(u) \le {\textrm{e}^{-ns}} \right\} \end{aligned}$$
(3.14)

for \(s>0\).

Proposition 3.5

Suppose that \({\mathbb {Q}}\) satisfies UD and SLD, and let \({\mathbb {P}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\) satisfy \({\mathbb {P}}_n \ll {\mathbb {Q}}_n\) for all \(n\in {\mathbb N}\). Then, for all \(s>0\) and all \(0< \delta \le \epsilon < \tfrac{s}{2}\), we have, for all large enough n,

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): \tfrac{1}{n} \ln W_n(x,y) \in B(s,\epsilon )\} \\&\qquad \qquad \le \textrm{e}^{n(s + \epsilon )} \sum _{u \in U_{n}(s-\epsilon -\delta )}{\mathbb {Q}}_n(u){\mathbb {P}}_n(u) + \exp (-\textrm{e}^{\frac{n\delta }{2}}) \end{aligned} \end{aligned}$$
(3.15)

and

$$\begin{aligned} {\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): \tfrac{1}{n} \ln W_n(x,y) \in B(s,\epsilon )\}\ge \textrm{e}^{ n(s - \delta )}\sum _{u \in U_{n}(s+\delta )}{\mathbb {Q}}_n(u){\mathbb {P}}_n(u). \end{aligned}$$
(3.16)

Proof

The key idea in this proof is to show that, when \(\frac{1}{n} \ln m \in B(s,\epsilon )\), the \(\lfloor \tfrac{m-1}{n + \tau _n}\rfloor \)-th power of \((1 - C_n^{-1}{\mathbb {Q}}_n(u))\) in (3.3) vanishes superexponentially for all \(u \notin U_{n}(s-\epsilon -\delta )\) as \(n\rightarrow \infty \), and that the \(\lceil \tfrac{m-1}{n + \tau _n}\rceil \)-th power of \((1-C_n{\mathbb {Q}}_n(u))\) in (3.7) is very close to 1 for all \(u\in U_{n}(s+\delta )\) as \(n\rightarrow \infty \). We fix s, \(\epsilon \) and \(\delta \) as in the statement and first note that

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): \tfrac{1}{n} \ln W_n(x,y) \in B(s,\epsilon )\} \\&\qquad \qquad = \sum _{u\in {{\mathcal {A}}}^n}{\mathbb {P}}_n(u){\mathbb {Q}}\{y: \tfrac{1}{n} \ln W_n(u,y) \in B(s,\epsilon )\}. \end{aligned} \end{aligned}$$
(3.17)

Proof of (3.15). Notice that \(|\{m\in {\mathbb N}: \tfrac{1}{n} \ln m \in B(s,\epsilon )\}|\le \textrm{e}^{n(s+\epsilon )}\). Thus, using (3.17), the nonincreasingness of \(t\mapsto {\mathbb {Q}}\{y:W_n(u,y) = t\}\) (recall (3.9)), and then Lemma 3.1, we find

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): \tfrac{1}{n} \ln W_n(x,y) \in B(s,\epsilon )\} \\&\qquad \qquad \le \textrm{e}^{n(s+\epsilon )}\sum _{u\in {{\mathcal {A}}}^n}{\mathbb {P}}_n(u){\mathbb {Q}}\{y: W_n(u,y) = m_n\}\\&\qquad \qquad \le \textrm{e}^{n(s+\epsilon )}\sum _{u\in {{\mathcal {A}}}^n}{\mathbb {P}}_n(u){\mathbb {Q}}_n(u)(1 - C_n^{-1}{\mathbb {Q}}_n(u))^{k_n}, \end{aligned} \end{aligned}$$
(3.18)

where \(m_n:=\lceil \textrm{e}^{n(s-\epsilon )}\rceil \) and \(k_n:= \big \lfloor \frac{m_n-1}{n + \tau _n}\big \rfloor \). We now split the above sum into a sum over \( U_{n}(s-\epsilon -\delta )\) and a sum over \( U_{n}^\textsf{c}(s-\epsilon -\delta )\). Clearly,

$$\begin{aligned} \sum _{u\in U_{n}(s-\epsilon -\delta )}{\mathbb {P}}_n(u){\mathbb {Q}}_n(u)(1 - C_n^{-1}{\mathbb {Q}}_n(u))^{k_n}\le \sum _{u \in U_{n}(s-\epsilon -\delta )}{\mathbb {P}}_n(u){\mathbb {Q}}_n(u), \end{aligned}$$
(3.19)

while on the other hand,

$$\begin{aligned} \sum _{u\in U_{n}^\textsf{c}(s-\epsilon -\delta )}{\mathbb {P}}_n(u){\mathbb {Q}}_n(u)(1 - C_n^{-1}{\mathbb {Q}}_n(u))^{k_n} \le (1 - C_n^{-1}\textrm{e}^{-n(s-\epsilon -\delta )})^{k_n}. \end{aligned}$$
(3.20)

But using the inequality \((1-x)^{k_n} \le \textrm{e}^{-xk_n}\), we find

$$\begin{aligned} \textrm{e}^{n(s+\epsilon )}(1 - C_n^{-1}\textrm{e}^{-n(s-\epsilon -\delta )})^{k_n}&\le \exp \left( n(s+\epsilon )-C_n^{-1}\textrm{e}^{-n(s-\epsilon -\delta )}k_n\right) \\&\le \exp (-\textrm{e}^{\frac{n\delta }{2}}) \end{aligned}$$

for n large enough, thanks to the fact that \(k_n \ge \textrm{e}^{n(s-\epsilon ) - o(n)}\) and \(C_n = \textrm{e}^{o(n)}\) as \(n\rightarrow \infty \). Therefore, using the estimates (3.19) and (3.20) in (3.18) indeed yields (3.15).
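
The exponent comparison behind the last display can be sketched as follows. The intermediate rate \(\tfrac{3}{4}\delta \) is an arbitrary choice made here for illustration; any rate strictly between \(\tfrac{1}{2}\delta \) and \(\delta \) would serve. Both lines use only \(k_n \ge \textrm{e}^{n(s-\epsilon ) - o(n)}\) and \(C_n = \textrm{e}^{o(n)}\), and the second line holds for n large enough:

$$\begin{aligned} C_n^{-1}\textrm{e}^{-n(s-\epsilon -\delta )}k_n&\ge \textrm{e}^{-o(n)}\, \textrm{e}^{-n(s-\epsilon -\delta )}\, \textrm{e}^{n(s-\epsilon ) - o(n)} = \textrm{e}^{n\delta - o(n)}, \\ n(s+\epsilon )-C_n^{-1}\textrm{e}^{-n(s-\epsilon -\delta )}k_n&\le n(s+\epsilon ) - \textrm{e}^{\frac{3n\delta }{4}} \le -\textrm{e}^{\frac{n\delta }{2}}. \end{aligned}$$

The last inequality holds because \(\textrm{e}^{\frac{3n\delta }{4}} - \textrm{e}^{\frac{n\delta }{2}}\) grows exponentially in n, whereas \(n(s+\epsilon )\) grows only linearly.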

Proof of (3.16). This time, note that \(|\{m\in {\mathbb N}: \tfrac{1}{n} \ln m \in (s-\epsilon ,s] \}|\ge \tfrac{1}{2}\textrm{e}^{ns}\) for n large enough, and set \(m_n:=\lfloor \textrm{e}^{ns}\rfloor \) and \(k_n:= \big \lceil \frac{m_n-1}{n + \tau _n}\big \rceil \). Then, by (3.17) and the nonincreasingness of \(t\mapsto {\mathbb {Q}}\{y:W_n(u,y) = t\}\), we obtain

$$\begin{aligned}&{\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y) : \tfrac{1}{n} \ln W_n(x,y) \in B(s,\epsilon )\} \\&\qquad \qquad \ge {\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y) : \tfrac{1}{n} \ln W_n(x,y) \in (s-\epsilon ,s]\}\\&\qquad \qquad \ge \frac{ \textrm{e}^{ns}}{2} \sum _{u\in {{\mathcal {A}}}^n}{\mathbb {P}}_n(u){\mathbb {Q}}\{y: W_n(u,y) = m_n\}\\&\qquad \qquad \ge \frac{ \textrm{e}^{ns}}{2} \sum _{u\in U_{n}(s+\delta )}{\mathbb {P}}_n(u){\mathbb {Q}}\{y: W_n(u,y) = m_n\}. \end{aligned}$$

Then, for n large enough so that \(C_n \textrm{e}^{-n(s+\delta )}\le 1\), we can apply Lemma 3.4 to every \(u\in U_{n}(s+\delta )\), and we obtain that

$$\begin{aligned} \begin{aligned}&{\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): \tfrac{1}{n} \ln W_n(x,y) \in B(s,\epsilon )\} \\&\qquad \qquad \ge \frac{ \textrm{e}^{ns}}{2} \sum _{u\in U_{n}(s+\delta )}{\mathbb {P}}_n(u){\mathbb {Q}}_n(u) \left( ({n + \tau _n})^{-1} - \left( 1 - (1-C_n{\mathbb {Q}}_n(u))^{k_n}\right) \right) \\&\qquad \qquad \ge \frac{ \textrm{e}^{ns}}{2} \sum _{u\in U_{n}(s+\delta )}{\mathbb {P}}_n(u){\mathbb {Q}}_n(u) \left( ({n + \tau _n})^{-1} - \left( 1 - (1-C_n\textrm{e}^{-n(s+\delta )})^{k_n}\right) \right) . \end{aligned} \end{aligned}$$
(3.21)

Notice that by Bernoulli’s inequality, \((1-x)^{k_n}\ge 1-xk_n\) for all \(x\le 1\). As a consequence, for all n large enough,

$$\begin{aligned} (1-C_n\textrm{e}^{-n(s+\delta )})^{k_n} \ge 1-C_n\textrm{e}^{-n(s+\delta )}k_n \ge 1- \textrm{e}^{-\frac{n\delta }{2}}, \end{aligned}$$

where we have used that \(k_n \le \textrm{e}^{ns + o(n)}\) and \(C_n = \textrm{e}^{o(n)}\). Substitution into (3.21) yields

$$\begin{aligned}&{\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y) : \tfrac{1}{n} \ln W_n(x,y) \in B(s,\epsilon )\} \\&\qquad \qquad \ge \frac{\textrm{e}^{ns}}{2} \sum _{u\in U_{n}(s+\delta )}{\mathbb {P}}_n(u){\mathbb {Q}}_n(u) \left( ({n + \tau _n})^{-1} - \textrm{e}^{-\frac{n\delta }{2}}\right) , \end{aligned}$$

from which the lower bound (3.16) readily follows when n is large enough. \(\square \)
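
One way to verify this final step is sketched below. It assumes that \(\tau _n \le n\) for all large n, which holds whenever \(\tau _n/n\rightarrow 0\) (the property used in (4.14)); the numerical constants are chosen here for convenience only. For n large enough so that \(\textrm{e}^{-\frac{n\delta }{2}}\le \tfrac{1}{4n}\) and \(8n \le \textrm{e}^{n\delta }\),

$$\begin{aligned} \frac{\textrm{e}^{ns}}{2}\left( ({n + \tau _n})^{-1} - \textrm{e}^{-\frac{n\delta }{2}}\right) \ge \frac{\textrm{e}^{ns}}{2}\left( \frac{1}{2n} - \frac{1}{4n}\right) = \frac{\textrm{e}^{ns}}{8n} \ge \textrm{e}^{n(s-\delta )}. \end{aligned}$$

Since every term \({\mathbb {P}}_n(u){\mathbb {Q}}_n(u)\) is nonnegative, this factor can be bounded below uniformly in \(u\in U_{n}(s+\delta )\), which gives (3.16).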

3.2 From waiting times to return times

Lemma 3.6

Assume \({\mathbb {P}}\) satisfies UD. Then, for all \(s > 0\) and all \(0< \epsilon < s\), we have, for all large enough n,

$$\begin{aligned} {\mathbb {P}}\{x: \tfrac{1}{n} \ln R_n(x) \in B(s,\epsilon )\} \le C_n \sum _{u \in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u) {\mathbb {P}}\{ x: \tfrac{1}{n} \ln W_n(u,x) \in B(s,2\epsilon )\} \end{aligned}$$

and

$$\begin{aligned} {\mathbb {P}}\{x: \tfrac{1}{n} \ln V_n(x) \in B(s,\epsilon )\} \le C_n \sum _{u \in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u) {\mathbb {P}}\{ x: \tfrac{1}{n} \ln W_n(u,x) \in B(s,2\epsilon )\}. \end{aligned}$$

Proof

Fix \(s> \epsilon > 0\), fix \(n\in {\mathbb N}\) large enough so that

$$\begin{aligned} \textrm{e}^{n(s-\epsilon )} > {n + \tau _n} - 1 \qquad \text {and} \qquad \textrm{e}^{n(s-2\epsilon )} \le \textrm{e}^{n(s-\epsilon )} +1-n-\tau _n, \end{aligned}$$
(3.22)

and fix \(u \in {{\mathcal {A}}}^n\). Then, for all \(x\in [u]\) such that \(\frac{1}{n} \ln R_n(x) \in B(s,\epsilon )\), we have \(R_n(x) = W_n(u,Tx) \in (\textrm{e}^{n(s-\epsilon )},\textrm{e}^{n(s+\epsilon )})\). The first condition in (3.22) implies that \(W_n(u,T^{n + \tau _n}x) = W_n(u, T x)+1-n-\tau _n\) for all such x, so that, also using the second condition in (3.22),

$$\begin{aligned} W_n(u,T^{n + \tau _n}x)\in (\textrm{e}^{n(s-\epsilon )}+1-n-\tau _n,\textrm{e}^{n(s+\epsilon )}+1-n-\tau _n)\subset (\textrm{e}^{n(s-2\epsilon )},\textrm{e}^{n(s+2\epsilon )}). \end{aligned}$$

As a consequence, by UD,

$$\begin{aligned} {\mathbb {P}}([u] \cap \{x : \tfrac{1}{n} \ln {R}_n(x) \in B(s,\epsilon )\})&\le {\mathbb {P}}([u] \cap \{x : \tfrac{1}{n} \ln W_{n}(u,T^{n+\tau _n} x) \in B(s,2\epsilon )\}) \\&\le C_n{\mathbb {P}}_n(u) {\mathbb {P}}\{x : \tfrac{1}{n} \ln W_{n}(u,x) \in B(s,2\epsilon )\}. \end{aligned}$$

Taking the sum over \(u\in {{\mathcal {A}}}^n\) proves the desired bound for \(R_n\). The proof of the bound for \(V_n\) is almost identical: for all \(x\in [u]\) such that \(\frac{1}{n} \ln V_n(x) \in B(s,\epsilon )\), we have this time \(W_n(u,T^n x)\in (\textrm{e}^{n(s-\epsilon )},\textrm{e}^{n(s+\epsilon )})\), so that \(W_n(u,T^{n + \tau _n}x)\in (\textrm{e}^{n(s-\epsilon )}-\tau _n,\textrm{e}^{n(s+\epsilon )}-\tau _n)\subset (\textrm{e}^{n(s-2\epsilon )},\textrm{e}^{n(s+2\epsilon )})\). The remainder of the argument is unchanged. \(\square \)

Lemma 3.7

Assume \({\mathbb {P}}\) satisfies SLD. Then, for all \(s > 0\) and all \(0< \epsilon < s\), we have, for all large enough n,

$$\begin{aligned} {\mathbb {P}}\{x: \tfrac{1}{n} \ln R_n(x) \in B(s,\epsilon )\} \ge \frac{C_n^{-1}}{n + \tau _n} \sum _{u \in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u) {\mathbb {P}}\{ x: \tfrac{1}{n} \ln W_n(u,x) \in B(s,\tfrac{1}{2} \epsilon )\} \end{aligned}$$

and

$$\begin{aligned} {\mathbb {P}}\{x: \tfrac{1}{n} \ln V_n(x) \in B(s,\epsilon )\} \ge \frac{C_n^{-1}}{1+\tau _n} \sum _{u \in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u) {\mathbb {P}}\{ x: \tfrac{1}{n} \ln W_n(u,x) \in B(s,\tfrac{1}{2} \epsilon )\}. \end{aligned}$$

Proof

We first prove the statement concerning \(R_n\). Fix \(s> \epsilon > 0\), fix n large enough so that

$$\begin{aligned} \textrm{e}^{n(s + \frac{1}{2} \epsilon )} + {n + \tau _n}-1 < \textrm{e}^{n(s+\epsilon )}, \end{aligned}$$
(3.23)

and fix \(u\in {{\mathcal {A}}}^n\). By SLD, there exists \(0\le \ell \le \tau _n\) such that the set

$$\begin{aligned} A_u:= [u]\cap T^{-n-\ell } \{x:\tfrac{1}{n} \ln W_n(u,x)\in B(s,\tfrac{1}{2}\epsilon )\} \end{aligned}$$

satisfies

$$\begin{aligned} {\mathbb {P}}(A_u)\ge C_n^{-1}{\mathbb {P}}_n(u) {\mathbb {P}}\{x:\tfrac{1}{n} \ln W_n(u,x)\in B(s,\tfrac{1}{2}\epsilon )\}. \end{aligned}$$
(3.24)

We now claim that

$$\begin{aligned} A_u\subseteq \bigcup _{j=1}^{n+\tau _n} T^{1-j}\{x:\tfrac{1}{n} \ln R_n(x)\in B(s,\epsilon )\}. \end{aligned}$$
(3.25)

To prove (3.25), take an arbitrary \(x\in A_u\), and let \(m:=W_n(u,T^{n+\ell }x)\), which, by the definition of \(A_u\), satisfies \(\frac{1}{n} \ln m \in B(s,\tfrac{1}{2} \epsilon )\). Then, the set \(I:= \{i\in {\mathbb N}: x_i^{i+n-1} = u\}\) contains 1 and \(n+\ell +m\), but excludes \(\{n+\ell +1, n+\ell +2, \dotsc , n+\ell +m-1\}\). Hence, with \(j:= \max \{ i\in I: i \le n+\ell \}\) we find \(R_n(T^{j-1} x) = n + \ell +m -j \in [m, m+n+\tau _n-1]\), so by (3.23) we obtain (3.25).

By taking the union over \(u\in {{\mathcal {A}}}^n\) in the left-hand side of (3.25), and using shift invariance to bound the probability of the right-hand side, we obtain

$$\begin{aligned} \sum _{u\in {{\mathcal {A}}}^n}{\mathbb {P}}(A_u) = {\mathbb {P}}\left( \bigcup _{u\in {{\mathcal {A}}}^n} A_u\right) \le (n+\tau _n){\mathbb {P}}\{x:\tfrac{1}{n} \ln R_n(x)\in B(s,\epsilon )\}. \end{aligned}$$
(3.26)

In view of (3.24), we have completed the proof of the statement about \(R_n\).

To adapt the proof for \(V_n\), it suffices to replace the definition of j with \(j:= \max \{ i\in I: i \le 1+\ell \}\); then \(V_n(T^{j-1}x) = 1+\ell +m-j\in [m,m+\tau _n]\) and the same arguments apply, with the factor \((n+\tau _n)\) replaced by \((1+\tau _n)\) in (3.26). \(\square \)

Lemma 3.8

Assume \({\mathbb {P}}\) satisfies SLD. Then, for every \(n\in {\mathbb N}\), the random variables \(R_n\) and \(V_n\) are \({\mathbb {P}}\)-almost surely finite.

Proof

In view of (3.4), the random variable \(W_n(u,\cdot \,)\) is \({\mathbb {P}}\)-almost surely finite for every fixed \(u\in {{\mathcal {A}}}^n\) with \({\mathbb {P}}_n(u)>0\). By shift invariance, so is \(W_n(u,\cdot \,) \circ T^k\) for each k. In view of the expressions

$$\begin{aligned} R_n(x)&= \sum _{u\in {{\mathcal {A}}}^n}1_{[u]}(x)W_n(u, Tx), \\ V_n(x)&= \sum _{u\in {{\mathcal {A}}}^n}1_{[u]}(x)W_n(u, T^{n} x), \end{aligned}$$

the conclusion is immediate. \(\square \)

Remark 3.9

An alternative way of showing that \(R_n\) and \(V_n\) are almost surely finite is to use the Poincaré recurrence theorem, see e.g. Theorem 1.4 in [Wal82].

4 Weak LDPs and Ruelle–Lanford Functions

We now briefly recall some terminology from the theory of large deviations, limiting ourselves to sequences of real-valued random variables. See for example [DS89, Ell06, DZ09] for proper introductions to the field. Let \((Z_n)_{n\in {\mathbb N}}\) be a sequence of (almost surely finite) real-valued random variables on a probability space \((\Omega _*,{\mathbb {P}}_*)\). The cases of interest will be

  • \(\Omega _* = \Omega \times \Omega \), \({\mathbb {P}}_* = {\mathbb {P}}\otimes {\mathbb {Q}}\) and \(Z_n = \frac{1}{n} \ln W_n\) for Theorem A;

  • \(\Omega _* = \Omega \), \({\mathbb {P}}_* ={\mathbb {P}}\), with \(Z_n = -\frac{1}{n} \ln {\mathbb {Q}}_n\), \(Z_n = \frac{1}{n} \ln V_n\) and \(Z_n = \frac{1}{n} \ln R_n\) for, respectively, Theorems 0, B and C.

The sequence \((Z_n)_{n\in {\mathbb N}}\) is said to satisfy the large deviation principle (LDP) if there exists a lower semicontinuous function \(I: {\mathbb R}\rightarrow [0,\infty ]\) such that

$$\begin{aligned} -\inf _{s\in O} I(s)\le \liminf _{n\rightarrow \infty } \frac{1}{n}\ln {\mathbb {P}}_*\{ x: Z_n(x) \in O \} \end{aligned}$$
(4.1)

for every open set \(O\subseteq {\mathbb R}\) and

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n}\ln {\mathbb {P}}_*\{ x: Z_n(x) \in \Gamma \} \le -\inf _{{s} \in \Gamma } I({s}) \end{aligned}$$
(4.2)

for every closed set \(\Gamma \subseteq {\mathbb R}\). The bounds (4.1) and (4.2) are respectively called the large-deviation lower bound and the large-deviation upper bound, and the function I — which is unique when it exists — is called the rate function. Following standard terminology, we say that I is a good rate function if it properly diverges as \(s\rightarrow \pm \infty \). We also recall that the large-deviation upper bound applied to the set \({\mathbb R}\) implies that \(\inf _{s\in {\mathbb R}}I(s) = 0\). Ubiquitous in the theory of large deviations is the question of whether the rate function is convex and can be expressed as the Legendre–Fenchel transform of the corresponding pressure. A detailed analysis of these considerations for the random variables of interest is postponed to Sect. 5.

Our analysis will require additional vocabulary which is discussed e.g. in [DZ09, §1.2]. The sequence \((Z_n)_{n\in {\mathbb N}}\) is said to satisfy the weak large deviation principle if (4.1) holds for all open sets \(O\subseteq {\mathbb R}\), and (4.2) holds for all compact sets \(\Gamma \Subset {\mathbb R}\). We shall sometimes refer to the standard LDP as the full LDP when we need to emphasize the contrast to the weak LDP used as a stepping stone towards the full LDP. The following notion will play a role in doing so: the sequence \((Z_n)_{n\in {\mathbb N}}\) is said to be exponentially tight if, for every \(\beta \in {\mathbb R}\), there exists \(M > 0\) such that \({\mathbb {P}}_*\{ x: Z_n(x) \notin [-M,M]\} \le \textrm{e}^{-\beta n}\) for all n large enough. To be more precise, we will appeal to the two following facts for real-valued sequences. First, if the weak LDP holds and the sequence is exponentially tight, then the full LDP holds with a good rate function. Second, if the full LDP holds with a good rate function, then the sequence is exponentially tight. While all our LDPs are full, and while exponential tightness does play a role in our analysis, we emphasize that the sequence \((\tfrac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) need not be exponentially tight, as illustrated in Sect. 2.5.

As mentioned, we will first prove the weak LDP. We will do so using Ruelle–Lanford (RL) functions. We introduce the lower RL function \( \underline{I}: {\mathbb R}\rightarrow [0,\infty ]\), defined (see Footnote 10) by

$$\begin{aligned} \underline{I}(s) := -\lim _{\epsilon \rightarrow 0} \liminf _{n\rightarrow \infty } \frac{1}{n}\ln {\mathbb {P}}_*\{x : Z_n(x) \in B(s,\epsilon )\}, \end{aligned}$$

and the upper RL function \(\overline{I}\): \({\mathbb R}\rightarrow [0,\infty ]\) defined by

$$\begin{aligned} \overline{I}(s) := - \lim _{\epsilon \rightarrow 0} \limsup _{n\rightarrow \infty } \frac{1}{n}\ln {\mathbb {P}}_*\{x : Z_n(x) \in B(s,\epsilon )\}. \end{aligned}$$

It follows from their definition that \({\overline{I}}\) and \(\underline{I}\) are lower semicontinuous, and that \({\overline{I}} \le \underline{I}\). Moreover, the weak LDP holds if and only if we have the equality

$$\begin{aligned} \underline{I}(s) = \overline{I}(s) \end{aligned}$$
(4.3)

for every \(s\in {\mathbb R}\); see e.g. [DZ09, §4.1.2] or [CJPS19, §3.2]. The common value in (4.3) must then coincide with I(s).
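
As a sanity check on these definitions, consider a toy example that is not taken from the results above: a deterministic sequence \(Z_n \equiv c\) for some constant \(c\in {\mathbb R}\). Reading \(B(s,\epsilon )\) as the interval of radius \(\epsilon \) centered at s and using the convention \(\ln 0 = -\infty \), we have \({\mathbb {P}}_*\{x: Z_n(x)\in B(s,\epsilon )\} = 0\) as soon as \(\epsilon < |s-c|\), while this probability equals 1 for \(s=c\) and every \(\epsilon >0\). Hence

$$\begin{aligned} \underline{I}(s) = \overline{I}(s) = {\left\{ \begin{array}{ll} 0 &{} \text {if } s = c, \\ \infty &{} \text {if } s \ne c, \end{array}\right. } \end{aligned}$$

so (4.3) holds for every \(s\in {\mathbb R}\), and the common value is lower semicontinuous and vanishes exactly at c.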

The core of this section is devoted to proving the weak LDP for the sequences of interest via the validity of (4.3). To be more precise, for each sequence, both RL functions are shown to be equal to the proposed rate function, as detailed in Table 2. We consider separately the case \(s>0\) in Sect. 4.1 and the case \(s=0\) in Sect. 4.2, noting that the case \(s < 0\) is trivial by mere nonnegativity of the random variables under study. We denote by \({\underline{I}}_W\), \({\underline{I}}_V\) and \({\underline{I}}_R\) (resp. \({\overline{I}}_W\), \({\overline{I}}_V\) and \({\overline{I}}_R\)) the lower (resp. upper) RL functions associated with our sequences.

Table 2 Assumptions, proposed rate function and references to where the key equality (4.3) is established for the three weak LDPs to be proved

4.1 At positive values

Our first goal is to prove equality of the upper and lower Ruelle–Lanford functions at positive values of s. We start with the RL functions \({\underline{I}}_W\) and \({\overline{I}}_W\) of the sequence \((\tfrac{1}{n} \ln W_n)_{n\in {\mathbb N}}\). Most of the work to show that \({\underline{I}}_W(s)={\overline{I}}_W(s)\) when \(s>0\) was done in Sect. 3.1: in view of Proposition 3.5, it only remains to estimate the quantity

$$\begin{aligned} J_n(s):= \sum _{u\in U_n(s)} {\mathbb {Q}}_n(u){\mathbb {P}}_n(u) = \int _{[s,\infty )}\textrm{e}^{-rn} \textrm{d}\mu _n(r), \end{aligned}$$
(4.4)

where \(\mu _n\) is the distribution of \(-\frac{1}{n} \ln {\mathbb {Q}}_n\) with respect to \({\mathbb {P}}\), and where \(U_n(s)\) was defined in (3.14). The integral in (4.4) allows us to express the limiting behavior of \(J_n\) in terms of the rate function \(I_{{\mathbb {Q}}}\) of Theorem 0, as shown by the following straightforward variation of Varadhan’s lemma.

Lemma 4.1

Assume \(({\mathbb {P}}, {\mathbb {Q}})\) is admissible. Then, for all \(s>0\),

$$\begin{aligned} \sup _{r>s}(-r-I_{{\mathbb {Q}}}(r)) \le \liminf _{n\rightarrow \infty } \frac{1}{n} \ln J_n(s) \le \limsup _{n\rightarrow \infty }\frac{1}{n} \ln J_n(s) \le \sup _{r\ge s}(-r-I_{{\mathbb {Q}}}(r)). \end{aligned}$$

Proof

Let us fix \(s > 0\). For the lower bound, note that for every choice of \(r>s\) and \(0< \epsilon < r-s\),

$$\begin{aligned} \liminf _{n\rightarrow \infty }\frac{1}{n} \ln J_n(s)&\ge \liminf _{n\rightarrow \infty }\frac{1}{n}\ln \int _{B(r,\epsilon )}\textrm{e}^{-r'n}\textrm{d}\mu _n(r') \\&\ge -r-\epsilon + \liminf _{n\rightarrow \infty }\frac{1}{n} \ln \mu _n(B(r,\epsilon )) \\&\ge -r -\epsilon - I_{{\mathbb {Q}}}(r), \end{aligned}$$

where we have used the large-deviation lower bound of Theorem 0. For the upper bound, let \(\beta := \inf _{r\ge s}(r+I_{{\mathbb {Q}}}(r)) \in [s, \infty )\). Then,

$$\begin{aligned} J_n(s)&= \int _{[s, \beta ]}\textrm{e}^{-rn}\textrm{d}\mu _n(r) + \int _{(\beta , \infty )}\textrm{e}^{-rn}\textrm{d}\mu _n(r)\le \int _{[s, \beta ]}\textrm{e}^{-rn}\textrm{d}\mu _n(r) + \textrm{e}^{-\beta n}. \end{aligned}$$

As a consequence, it suffices to show that

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n}\ln \int _{[s, \beta ]}\textrm{e}^{-rn}\textrm{d}\mu _n(r) \le -\beta , \end{aligned}$$
(4.5)

which, since \([s,\beta ]\) is compact, follows from a standard covering argument, see e.g. Lemma 4.3.6 in [DZ09] or the proof of Proposition C.1.iv below. \(\square \)
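
For completeness, here is a sketch of one standard form of the covering argument; it is not necessarily the version given in the cited references, and it assumes that the large-deviation upper bound of Theorem 0 applies to the closed intervals below. The mesh \(\eta \), the points \(a_i\) and the sets \(\Gamma _i\) are notation introduced only for this sketch. Fix \(\eta >0\) and points \(s = a_0 \le a_1 \le \dots \le a_N = \beta \) with \(a_i - a_{i-1}\le \eta \), and set \(\Gamma _i:= [a_{i-1},a_i]\). Since \(\textrm{e}^{-rn}\le \textrm{e}^{-a_{i-1}n}\) on \(\Gamma _i\), and since \(a_{i-1}\ge r-\eta \) for all \(r\in \Gamma _i\),

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n}\ln \int _{[s, \beta ]}\textrm{e}^{-rn}\textrm{d}\mu _n(r)&\le \limsup _{n\rightarrow \infty } \frac{1}{n}\ln \sum _{i=1}^{N} \textrm{e}^{-a_{i-1}n}\mu _n(\Gamma _i) \\&\le \max _{1\le i\le N}\Big ( -a_{i-1} - \inf _{r\in \Gamma _i} I_{{\mathbb {Q}}}(r)\Big ) \\&\le \eta + \sup _{r\in [s,\beta ]}(-r-I_{{\mathbb {Q}}}(r)) \le \eta - \beta . \end{aligned}$$

The second inequality uses that the exponential growth rate of a finite sum is the largest of the rates of its terms. Letting \(\eta \rightarrow 0\) gives the required bound.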

Proposition 4.2

Assume the pair \(({\mathbb {P}}, {\mathbb {Q}})\) is admissible. Then, for all \(s>0\), the RL functions for \((\frac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) with respect to \({\mathbb {P}}\otimes {\mathbb {Q}}\) satisfy

$$\begin{aligned} {\underline{I}}_W(s) = {\overline{I}}_W(s) = -s+\inf _{r\ge s}(r+I_{{\mathbb {Q}}}(r)). \end{aligned}$$
(4.6)

Proof

Let \(s>0\). By Proposition 3.5 and Lemma 4.1, we find that for all \(0< \delta \le \epsilon < \tfrac{1}{2}\,s\),

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n} \ln {\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): \tfrac{1}{n} \ln W_n(x,y) \in B(s,\epsilon )\} \le s+\epsilon - \inf _{r\ge s-\epsilon -\delta }(r+I_{{\mathbb {Q}}}(r))\nonumber \\ \end{aligned}$$
(4.7)

and

$$\begin{aligned} \liminf _{n\rightarrow \infty } \frac{1}{n} \ln {\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): \tfrac{1}{n} \ln W_n(x,y) \in B(s,\epsilon )\} \ge s - \delta - \inf _{r> s+\delta }(r+I_{{\mathbb {Q}}}(r)).\nonumber \\ \end{aligned}$$
(4.8)

By taking the limit as \(\delta \rightarrow 0\) first and then \(\epsilon \rightarrow 0\) in (4.7), and since \(I_{{\mathbb {Q}}}\) is lower semicontinuous, we obtain that \(-{\overline{I}}_W(s)\le s- \inf _{r\ge s}(r+I_{{\mathbb {Q}}}(r))\). Taking the same limits in (4.8) yields \(-{\underline{I}}_W(s)\ge s - \inf _{r> s}(r+I_{{\mathbb {Q}}}(r))\). We then have

$$\begin{aligned} -s + \inf _{r\ge s}(r+I_{{\mathbb {Q}}}(r)) \le {\overline{I}}_W(s) \le {\underline{I}}_W(s) \le -s + \inf _{r> s}(r+I_{{\mathbb {Q}}}(r)). \end{aligned}$$
(4.9)

For all \(s'\in (0,s)\), the last inequality in (4.9) applied to \(s'\) yields

$$\begin{aligned} {\underline{I}}_W(s')\le -s'+ \inf _{r> s'}(r+I_{{\mathbb {Q}}}(r)) \le -s'+\inf _{r\ge s}(r+I_{{\mathbb {Q}}}(r)). \end{aligned}$$

Since \({\underline{I}}_W\) is lower semicontinuous (as an RL function), this in turn implies that

$$\begin{aligned} {\underline{I}}_W(s)\le \liminf _{s'\uparrow s} {\underline{I}}_W(s') \le -s +\inf _{r\ge s}(r+I_{{\mathbb {Q}}}(r)), \end{aligned}$$

so the first three quantities in (4.9) are actually equal, as desired. \(\square \)
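
The passage to the limit in (4.7) uses lower semicontinuity in the following way; this is a sketch, and the symbols f, g, L and \(\eta \) are introduced here only for illustration. Let \(f(r):= r+I_{{\mathbb {Q}}}(r)\) and \(g(\eta ):= \inf _{r\ge s-\eta }f(r)\) with \(\eta = \epsilon +\delta \). Then g is nonincreasing in \(\eta \) with \(g(\eta )\le g(0)\), so \(L:=\lim _{\eta \downarrow 0}g(\eta )\) exists and \(L\le g(0)\). If \(L=\infty \) there is nothing to prove. Otherwise, pick \(r_\eta \ge s-\eta \) with \(f(r_\eta )\le g(\eta )+\eta \). Since \(I_{{\mathbb {Q}}}\ge 0\), we have \(r_\eta \le f(r_\eta )\le L+\eta \), so \((r_\eta )\) is bounded and admits a subsequence converging to some \(r_*\ge s\). By lower semicontinuity of f,

$$\begin{aligned} g(0) \le f(r_*) \le \liminf _{\eta \downarrow 0} f(r_\eta ) \le \lim _{\eta \downarrow 0}\, (g(\eta )+\eta ) = L \le g(0), \end{aligned}$$

where the liminf is taken along the chosen subsequence. Hence \(\inf _{r\ge s-\epsilon -\delta }(r+I_{{\mathbb {Q}}}(r))\rightarrow \inf _{r\ge s}(r+I_{{\mathbb {Q}}}(r))\). For (4.8), no semicontinuity is needed: since \((s,\infty ) = \bigcup _{\delta >0}(s+\delta ,\infty )\), the quantity \(\inf _{r>s+\delta }f(r)\) decreases to \(\inf _{r>s}f(r)\) as \(\delta \downarrow 0\).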

Proposition 4.3

Assume \({\mathbb {P}}\) satisfies UD and SLD. Then, for all \(s>0\), the RL functions for \((\frac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) and \((\frac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) with respect to \({\mathbb {P}}\) satisfy

$$\begin{aligned} {\underline{I}}_R(s) = {\overline{I}}_R(s) = -s+\inf _{r\ge s}(r+I_{{\mathbb {P}}}(r)) \end{aligned}$$

and

$$\begin{aligned} {\underline{I}}_V(s) = {\overline{I}}_V(s) = -s+\inf _{r\ge s}(r+I_{{\mathbb {P}}}(r)). \end{aligned}$$

Proof

Let \(s>0\). First, we remark that Proposition 4.2 applies to the admissible pair \(({\mathbb {P}},{\mathbb {P}})\), so that \({\overline{I}}_W(s) = {\underline{I}}_W(s) = -s+\inf _{r\ge s}(r+I_{{\mathbb {P}}}(r))\). Then, Lemma 3.6 implies that \({\overline{I}}_R(s) \ge {\overline{I}}_W(s)\) and \({\overline{I}}_V(s) \ge {\overline{I}}_W(s)\); recall also (3.17). In the same way, Lemma 3.7 implies that \({\underline{I}}_R(s) \le {\underline{I}}_W(s)\) and \({\underline{I}}_V(s) \le {\underline{I}}_W(s)\). Since also \({\overline{I}}_R\le {\underline{I}}_R\) and \({\overline{I}}_V\le {\underline{I}}_V\), the proof is complete. \(\square \)

4.2 At the origin

In this subsection, we prove that, for each of the sequences \((\frac{1}{n} \ln W_n)_{n\in {\mathbb N}}\), \((\frac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) and \((\frac{1}{n} \ln R_n)_{n\in {\mathbb N}}\), the upper and lower RL functions match at \(s = 0\). We recall that the limit

$$\begin{aligned} q_{{\mathbb {Q}}}(-1) = \lim _{n\rightarrow \infty }\frac{1}{n} \ln \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u){\mathbb {Q}}_n(u) \end{aligned}$$
(4.10)

exists for any admissible pair \(({\mathbb {P}},{\mathbb {Q}})\) by Theorem 0.ii.

Proposition 4.4

If the pair \(({\mathbb {P}}, {\mathbb {Q}})\) is admissible, then

$$\begin{aligned} {\underline{I}}_W(0) ={\overline{I}}_W(0) = -q_{{\mathbb {Q}}}(-1) = \inf _{r\ge 0}(r+I_{{\mathbb {Q}}}(r)). \end{aligned}$$
(4.11)

Proof

For all \(\epsilon >0\) and \(n\in {\mathbb N}\),

$$\begin{aligned} {\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): \tfrac{1}{n} \ln W_n(x,y) \in B(0,\epsilon )\}&\ge {\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y):W_n(x,y) =1\} \\&=\sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u){\mathbb {Q}}_n(u). \end{aligned}$$

In view of (4.10) and the definition of \({\underline{I}}_W\), we have \({\underline{I}}_W(0) \le -q_{{\mathbb {Q}}}(-1)\). To obtain the opposite inequality for \({\overline{I}}_W(0)\), observe that

$$\begin{aligned} {\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): W_n(x,y) = k\} \le {\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y): x_1^n = y_{k}^{n+k-1}\} =\sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u){\mathbb {Q}}_n(u) \end{aligned}$$

for every \(n\in {\mathbb N}\) and \(k\in {\mathbb N}\). Therefore, a union bound gives, for every \(\epsilon > 0\),

$$\begin{aligned} {\mathbb {P}}\otimes {\mathbb {Q}}\{(x,y):W_n(x,y) \le \textrm{e}^{\epsilon n}\}\le \textrm{e}^{\epsilon n}\sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u){\mathbb {Q}}_n(u). \end{aligned}$$

By (4.10) and the definition of \({\overline{I}}_W\), we conclude that \({\overline{I}}_W(0)\ge -q_{{\mathbb {Q}}}(-1)\). Since also \({\underline{I}}_W(0) \ge {\overline{I}}_W(0)\) we have thus established the first two equalities in (4.11).

To complete the proof, it remains to observe that, since \(I_{{\mathbb {Q}}}(r)=\infty \) for all \(r<0\), and since \(q_{{\mathbb {Q}}}= I_{{\mathbb {Q}}}^*\) by Theorem 0.ii,

$$\begin{aligned} \inf _{r\ge 0}(r+I_{{\mathbb {Q}}}(r))= \inf _{r\in {\mathbb R}}(r+I_{{\mathbb {Q}}}(r)) = -\sup _{r\in {\mathbb R}}(-r-I_{{\mathbb {Q}}}(r)) = -q_{{\mathbb {Q}}}(-1), \end{aligned}$$
(4.12)

which establishes the last identity in (4.11). \(\square \)

Proposition 4.5

If \({\mathbb {P}}\) satisfies UD and SLD, then

$$\begin{aligned} {\underline{I}}_V(0) = {\overline{I}}_V(0) = - q_{{\mathbb {P}}}(-1) = \inf _{r\ge 0}(r+I_{{\mathbb {P}}}(r)). \end{aligned}$$
(4.13)

Proof

The stated assumptions allow us to apply Theorem 0 to the pair \(({\mathbb {P}},{\mathbb {P}})\), and in particular its consequences (4.10) and (4.12) with \({\mathbb {Q}}={\mathbb {P}}\). Since the third equality in (4.13) is a special case of (4.12), and since \({\overline{I}}_V(0)\le {\underline{I}}_V(0)\) by definition, it suffices to establish the inequalities \({\underline{I}}_V(0) \le -q_{{\mathbb {P}}}(-1)\) and \({\overline{I}}_V(0) \ge -q_{{\mathbb {P}}}(-1)\) in order to complete the proof. To this end, we let \(\epsilon > 0\) be arbitrary and restrict our attention to n large enough so that \(\min \{n, \textrm{e}^{\epsilon n}\}>\tau _n+1\).

For each \(u\in {{\mathcal {A}}}^n\), SLD implies that there is \(0\le \ell \le \tau _n\) such that \({\mathbb {P}}([u]\cap T^{-n-\ell }[u]) \ge C_n^{-1} {\mathbb {P}}_n(u)^2\). Since \([u]\cap \{x:V_n(x) \le \tau _n+1\} \supseteq [u]\cap T^{-n-\ell }[u]\), this in turn implies that

$$\begin{aligned} {\mathbb {P}}\{x:V_n(x) < \textrm{e}^{\epsilon n}\} \ge {\mathbb {P}}\{x:V_n(x) \le \tau _n+1\} \ge C_n^{-1} \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u)^2. \end{aligned}$$

Combining this with (4.10) for \({\mathbb {Q}}={\mathbb {P}}\) establishes the inequality \({\underline{I}}_V(0) \le -q_{{\mathbb {P}}}(-1)\).

On the other hand, for each \(k\in {\mathbb N}\),

$$\begin{aligned} \{x:V_n(x) = k\} \subseteq \bigcup _{u\in {{\mathcal {A}}}^{n}}[u]\cap T^{1-n-k}[u] \subseteq \bigcup _{v\in {{\mathcal {A}}}^{n-\tau _n}}[v]\cap T^{1-n-k}[v]. \end{aligned}$$

Assuming without loss of generality that the sequence \((\tau _n)_{n\in {\mathbb N}}\) is nondecreasing (so in particular \(\tau _{n-\tau _n} \le \tau _n\)), we obtain from UD that

$$\begin{aligned} {\mathbb {P}}\{x:V_n(x) = k\} \le C_{n-\tau _n} \sum _{v\in {{\mathcal {A}}}^{n-\tau _n}} {\mathbb {P}}_{n-\tau _n}(v)^2. \end{aligned}$$

Considering the union over \(k = 1,2,\dotsc , \lceil \textrm{e}^{\epsilon n}\rceil -1\), we further obtain

$$\begin{aligned} \begin{aligned}&\limsup _{n\rightarrow \infty }\frac{1}{n}\ln {\mathbb {P}}\{x:V_n(x) < \textrm{e}^{\epsilon n}\}\\&\quad \le \limsup _{n\rightarrow \infty }\frac{1}{n}\ln \left( \textrm{e}^{\epsilon n} C_{n-\tau _n} \sum _{v\in {{\mathcal {A}}}^{n-\tau _n}} {\mathbb {P}}_{n-\tau _n}(v)^2 \right) \\&\quad = \epsilon + \limsup _{n\rightarrow \infty }\frac{1}{n}\ln \sum _{u\in {{\mathcal {A}}}^{n}} {\mathbb {P}}_n(u)^2, \end{aligned} \end{aligned}$$
(4.14)

where we have used that \(\lim _{n\rightarrow \infty }\frac{n-\tau _n}{n} =1\). Combining this with (4.10) for \({\mathbb {Q}}={\mathbb {P}}\) establishes the inequality \({\overline{I}}_V(0) \ge -q_{{\mathbb {P}}}(-1)\), so the proof is complete. \(\square \)
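
As a sanity check on the collision sum \(\sum _{u}{\mathbb {P}}_n(u)^2\) that drives both bounds, the following minimal sketch treats a Bernoulli (i.i.d.) measure, for which one may take \(C_n = 1\) and \(\tau _n = 0\). It assumes, consistently with the inclusions used in the proof, that \(V_n(x)=1\) means \(x_{n+1}^{2n} = x_1^n\). The function names and the letter law `p` are illustrative and are not taken from the text.

```python
from itertools import product
from math import log, prod

def word_prob(word, p):
    """Probability of a finite word under the i.i.d. measure with letter law p."""
    return prod(p[a] for a in word)

def collision_sum(n, p):
    """Sum over u in A^n of P_n(u)^2, by direct enumeration."""
    return sum(word_prob(u, p) ** 2 for u in product(range(len(p)), repeat=n))

def prob_immediate_return(n, p):
    """P{x : x_{n+1}^{2n} = x_1^n}, by enumerating all words of length 2n."""
    total = 0.0
    for w in product(range(len(p)), repeat=2 * n):
        if w[:n] == w[n:]:
            total += word_prob(w, p)
    return total

if __name__ == "__main__":
    p = (0.7, 0.3)  # hypothetical letter probabilities
    for n in range(1, 6):
        # For an i.i.d. measure the rate (1/n) ln sum_u P_n(u)^2 does not depend on n.
        print(n, log(collision_sum(n, p)) / n, log(prob_immediate_return(n, p)) / n)
```

In this special case the two quantities coincide for every n, so the exponential rate of \(\{x: V_n(x)=1\}\) is already \(\ln \sum _a p_a^2\); the proposition shows that nothing is gained at the exponential scale by allowing \(V_n\) to range up to \(\textrm{e}^{\epsilon n}\) and then letting \(\epsilon \rightarrow 0\).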

Let us now turn to \({\underline{I}}_R(0)\) and \({\overline{I}}_R(0)\), whose comparison is significantly more involved. We start with a technical lemma.

Lemma 4.6

If \({\mathbb {P}}\) satisfies SLD and UD, then

$$\begin{aligned} \gamma _+ = \lim _{n\rightarrow \infty }\frac{1}{n} \sup _{u\in {{\mathcal {A}}}^n}\ln {\mathbb {P}}_n(u) \ge q_{{\mathbb {P}}}(-1). \end{aligned}$$
(4.15)

Proof

From Theorem 0.iii.b, we know that the limit superior in the definition (1.12) of \(\gamma _+\) is actually a limit, which is the first equality in (4.15). We now prove the inequality, noting that, by Theorem 0.ii, the limit defining \(q_{{\mathbb {P}}}(-1)\) exists. For each \(n\in {\mathbb N}\), we find \(\sup _{v\in {{\mathcal {A}}}^n}{\mathbb {P}}_n(v) = \sum _{u\in {{\mathcal {A}}}^n}{\mathbb {P}}_n(u)\sup _{v\in {{\mathcal {A}}}^n}{\mathbb {P}}_n(v) \ge \sum _{u\in {{\mathcal {A}}}^n}{\mathbb {P}}_n(u)^2\), and the claim follows from (4.10) with \({\mathbb {Q}}={\mathbb {P}}\). \(\square \)
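
For orientation, here is the inequality (4.15) in the simplest case. Suppose, as an illustrative assumption, that \({\mathbb {P}}\) is a Bernoulli measure with letter probabilities \((p_a)_{a\in {{\mathcal {A}}}}\), so that \({\mathbb {P}}_n(u) = \prod _{i=1}^n p_{u_i}\). Then both sides can be computed by hand:

$$\begin{aligned} \gamma _+ = \ln \max _{a\in {{\mathcal {A}}}} p_a, \qquad q_{{\mathbb {P}}}(-1) = \lim _{n\rightarrow \infty }\frac{1}{n}\ln \Bigl (\sum _{a\in {{\mathcal {A}}}} p_a^2\Bigr )^{n} = \ln \sum _{a\in {{\mathcal {A}}}} p_a^2 \le \ln \Bigl (\max _{a\in {{\mathcal {A}}}} p_a \sum _{a\in {{\mathcal {A}}}} p_a\Bigr ) = \gamma _+. \end{aligned}$$

For instance, with two letters of probabilities 0.7 and 0.3, one finds \(\gamma _+ = \ln 0.7 \approx -0.357\) and \(q_{{\mathbb {P}}}(-1) = \ln 0.58 \approx -0.545\), so the inequality is strict.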

The following proposition shows that \({\underline{I}}_R(0) = {\overline{I}}_R(0) = -\gamma _+\) under the assumptions of Theorem C. We give several additional inequalities in order to underline the role of PA. The proposition also shows that, while \({\overline{I}}_R(0)\) and \({\underline{I}}_R(0)\) are defined in terms of \(\{x: R_n(x) < \textrm{e}^{\epsilon n}\}\), the subset \(\{x: R_n(x) < n\}\) accounts for the full behavior of the probability at the exponential scale.

Proposition 4.7

The following hold:

  1. i.

    If \({\mathbb {P}}\) satisfies UD, then

    $$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n} \ln {\mathbb {P}}\{x: R_n(x) < n\} \le -{\overline{I}}_R(0) \le \gamma _+. \end{aligned}$$
    (4.16)
  2. ii.

    If \({\mathbb {P}}\) satisfies PA, then

    $$\begin{aligned} -{\underline{I}}_R(0) \ge \liminf _{n\rightarrow \infty } \frac{1}{n} \ln {\mathbb {P}}\{x: R_n(x) < n\} \ge \gamma _+. \end{aligned}$$
    (4.17)
  3. iii.

    If \({\mathbb {P}}\) satisfies UD and PA, then

    $$\begin{aligned} -{\underline{I}}_R(0) = -{\overline{I}}_R(0) = \lim _{n\rightarrow \infty } \frac{1}{n} \ln {\mathbb {P}}\{x: R_n(x) < n\} = \gamma _+. \end{aligned}$$

Proof

Given \(u\in {{\mathcal {A}}}^n\), we denote by \({\text {per}}(u)\) the period of u, i.e. the smallest \(p \in {\mathbb N}\) such that \(u_i = u_{i+p}\) for all \(1\le i \le n-p\). Since the condition is vacuously true for \(p=n\), we have the bound \({\text {per}}(u)\le n\). The key observation is that, for every \(1\le k< n\) and \(x\in \Omega \),

$$\begin{aligned} R_n(x)=k \qquad \iff \qquad {\text {per}}(x_1^{k+n})=k. \end{aligned}$$
(4.18)
  1. i.

    Since \(\{x:R_n(x)< n\} \subseteq \{x:R_n(x) < \textrm{e}^{\epsilon n}\}\) for all large enough n, the first inequality in (4.16) readily follows from the definition of \({\overline{I}}_R\). We now prove the second inequality. Let n be large enough so that \(n>\max \{1,\tau _n\}\) and let \(1\le k< n\). The map \(\{u\in {{\mathcal {A}}}^n: {\text {per}}(u)=k\}\rightarrow {{\mathcal {A}}}^k\) given by \(u\mapsto u_{n-k+1}^n\) is injective. By (4.18), we thus have

    $$\begin{aligned} {\mathbb {P}}\{x:R_n(x) = k\} = \sum _{u\in {{\mathcal {A}}}^n:{\text {per}}(u)=k}{\mathbb {P}}_{n+k}(u u_{n-k+1}^n) \le \sum _{v\in {{\mathcal {A}}}^k}\sup _{u\in {{\mathcal {A}}}^{n}}{\mathbb {P}}_{n+k}(uv). \end{aligned}$$

    As in the proof of Proposition 4.5, we assume without loss of generality that \((\tau _n)_{n\in {\mathbb N}}\) is nondecreasing. Then, UD yields

    $$\begin{aligned} {\mathbb {P}}_{n+k}(uv)\le {\mathbb {P}}([u_1^{n-\tau _n}]\cap T^{-n}[v])\le C_{n - \tau _n}{\mathbb {P}}_{n-\tau _n}(u_1^{n-\tau _n}){\mathbb {P}}_k(v), \end{aligned}$$

    and so

    $$\begin{aligned} {\mathbb {P}}\{x:R_n(x) = k\}&\le C_{n - \tau _n} \sum _{v\in {{\mathcal {A}}}^k}\sup _{u\in {{\mathcal {A}}}^{n}}{\mathbb {P}}_{n-\tau _n}(u_1^{n-\tau _n}){\mathbb {P}}_k(v)\\&= C_{n - \tau _n} \sup _{u\in {{\mathcal {A}}}^{n}}{\mathbb {P}}_{n-\tau _n}(u_1^{n-\tau _n}) \\&= C_{n - \tau _n} \sup _{u\in {{\mathcal {A}}}^{n-\tau _n}}{\mathbb {P}}_{n-\tau _n}(u). \end{aligned}$$

    Taking a union over \(1\le k <n\) gives

    $$\begin{aligned} {\mathbb {P}}\{x:R_n(x) <n\} \le (n-1)C_{n - \tau _n} \sup _{u\in {{\mathcal {A}}}^{n-\tau _n}}{\mathbb {P}}_{n-\tau _n}(u). \end{aligned}$$

    Comparing with the definition (1.12) of \(\gamma _+\), and using that \(\lim _{n\rightarrow \infty }\frac{n-\tau _n}{n} =1\), we deduce that

    $$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{1}{n} \ln {\mathbb {P}}\{x:R_n(x) < n\} \le \gamma _+. \end{aligned}$$
    (4.19)

    Now, observe that

    $$\begin{aligned} {\mathbb {P}}\{x:R_n(x) \in [n, \textrm{e}^{n\epsilon })\} \le {\mathbb {P}}\{x:V_n(x) \in [1, \textrm{e}^{n\epsilon }+1-n)\}\le {\mathbb {P}}\{x:V_n(x) < \textrm{e}^{\epsilon n}\}. \end{aligned}$$

    By (4.14), which only relies on UD, this implies

    $$\begin{aligned} \lim _{\epsilon \rightarrow 0}\limsup _{n\rightarrow \infty }\frac{1}{n} \ln {\mathbb {P}}\{x:R_n(x) \in [n, \textrm{e}^{n\epsilon })\} \le \gamma _+. \end{aligned}$$
    (4.20)

    By definition of \({\overline{I}}_R\), the inequalities (4.19) and (4.20) imply \(-{\overline{I}}_R(0) \le \gamma _+\), as desired.

  2. ii.

    As above, the first inequality in (4.17) is immediate by the definition of \({\underline{I}}_R\). We now establish the second one. Let \(p\in {\mathbb N}\) and \(u \in {{\mathcal {A}}}^p\) be arbitrary. By (4.18), for \(n > p\) and \(r_n = \lceil \frac{n}{p}\rceil +1\), we have \([u^{r_n}] \subseteq \{x:R_n(x) \le p \}\). Therefore,

    $$\begin{aligned} \begin{aligned} \liminf _{n\rightarrow \infty } \frac{1}{n} \ln {\mathbb {P}}\{x:R_n(x)<n\}&\ge \liminf _{n\rightarrow \infty }\frac{1}{n} \ln {\mathbb {P}}\{x:R_n(x) \le p \} \\&\ge \liminf _{n\rightarrow \infty }\frac{1}{n} \ln {\mathbb {P}}([u^{r_n}]) \\&= \liminf _{r\rightarrow \infty }\frac{1}{p r} \ln {\mathbb {P}}([u^{r}]). \end{aligned} \end{aligned}$$
    (4.21)

    Since the expression in the last line of (4.21) can be made arbitrarily close to \(\gamma _+\) by PA, the proof of Part ii is complete.

  3. iii.

    The conclusion is just the combination of Parts i and ii. \(\square \)
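
The key observation (4.18) is purely combinatorial, so it can be checked by brute force on short words. The sketch below does this over a binary alphabet; it assumes that \(R_n(x)\) is the smallest \(k\ge 1\) with \(x_{k+1}^{k+n} = x_1^n\), which is the convention consistent with (4.18). The helper names are hypothetical, and the check is an illustration rather than part of the proof.

```python
from itertools import product

def period(u):
    """Smallest p >= 1 with u[i] == u[i+p] for all 0 <= i < len(u) - p."""
    n = len(u)
    # p = n always qualifies (vacuous condition), so next() cannot fail.
    return next(p for p in range(1, n + 1)
                if all(u[i] == u[i + p] for i in range(n - p)))

def return_time(x, n):
    """Smallest k >= 1 with x[k:k+n] == x[:n] inside the finite word x, else None."""
    for k in range(1, len(x) - n + 1):
        if x[k:k + n] == x[:n]:
            return k
    return None

def check_key_observation(n, alphabet=(0, 1)):
    """Check (4.18) for every 1 <= k < n and every word of length n + k."""
    for k in range(1, n):
        for x in product(alphabet, repeat=n + k):
            if (return_time(x, n) == k) != (period(x) == k):
                return False
    return True

if __name__ == "__main__":
    print(all(check_key_observation(n) for n in range(2, 7)))
```

Only the first \(n+k\) letters of x matter for deciding whether \(R_n(x)=k\), which is why enumerating words of length \(n+k\) suffices.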

Proposition 4.7 is the only place where PA is ever used in our proofs. We do not know if PA can be lifted, nor are we aware of any example of a measure satisfying UD and SLD but not PA. We take the remainder of this subsection to briefly discuss what remains true if PA is dropped or weakened. In this discussion, we always assume that \({\mathbb {P}}\) satisfies UD and SLD.

First, if PA is dropped, we remark that Propositions 4.7.i and 4.3 still ensure that \({\overline{I}}_R(s) \ge I_R(s)\) for all \(s\in {\mathbb R}\), with \(I_R\) defined in (1.25). Thus, the weak large-deviation upper bound for \((\frac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) holds. By retracing the proofs, one easily concludes that also the full large-deviation upper bound holds with the rate function \(I_R\), and that \({\overline{q}}_R= {\overline{I}}_R^* \le I_R^*\), where \({\overline{q}}_R\) is defined as in (1.4) with a limit superior.

In Proposition 4.7, PA is only used to obtain the large-deviation lower bound at \(s=0\). More specifically, all we actually derive from PA is that

$$\begin{aligned} \liminf _{n\rightarrow \infty } \frac{1}{n} \ln {\mathbb {P}}\{x: R_n(x) < n\}\ge \gamma _+. \end{aligned}$$
(4.22)

So instead of PA, one could have taken (4.22) as an assumption, or any other condition implying it.

In fact, one could even obtain a full LDP for \((\frac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) without (4.22). Indeed, if one can show by some means that the limit \(D:= \lim _{n\rightarrow \infty } \frac{1}{n} \ln {\mathbb {P}}\{x: R_n(x) < n\}\) exists, then necessarily \(D\le \gamma _+\) by Proposition 4.7.i, and the proofs can easily be adapted to obtain the full LDP, by merely replacing \(-\gamma _+\) with \(\min \{-D,I_V(0)\}\) in the definition (1.25) of \(I_R\). We do not know under what conditions the limit defining D exists, and we have been unable to produce any counterexample.

We now return to means of establishing (4.22). Under PA, the proof of Proposition 4.7.ii actually shows that the probability of \(\{x: R_n(x) < n\}\) is asymptotically captured by periodic orbits of period much smaller than n. PA can be slightly relaxed so as to take into account words u that can be repeated many times, but not necessarily infinitely many times as PA requires:

Definition 4.8

(WPA). A measure \({\mathbb {P}}\in {{\mathcal {P}}}_{\text {inv}}(\Omega )\) satisfies the assumption of weak periodic approximation (WPA) if for every \(\epsilon >0\) there exists \(M\in {\mathbb N}\) such that for all \(m\ge M\) there is a word \(u\in \Omega _{\text {fin}}\) with \(|u|\le \epsilon m\), and such that

$$\begin{aligned} \frac{1}{m} \ln {\mathbb {P}}_{|u| \left\lfloor \frac{m}{|u|}\right\rfloor }\left( u^{\left\lfloor \frac{m}{|u|} \right\rfloor }\right) \ge \gamma _+ - \epsilon . \end{aligned}$$

Then, an easy modification of the proof of Proposition 4.7 shows that (4.22) still holds assuming WPA instead of PA, and thus so do the conclusions of Theorem C.

In order to conclude the discussion, we briefly comment on results from the literature about Poincaré recurrence times (see [AV08, AC15, AAG21] and references therein) which can be used to establish (4.22). The Poincaré recurrence times are defined by \(T_n(x) = \inf \{k \in {\mathbb N}: {\mathbb {P}}([x_1^n] \cap T^{-k}[x_1^n]) > 0\}\) (see Footnote 11). The asymptotic behavior of \(T_n\) is overall very different from that of \(R_n\); in particular \(T_n \le n+\tau _n\) almost surely by SLD. However, the following two relations hold \({\mathbb {P}}\)-almost surely: first, \(T_n \le R_n\), and second, \(T_n <n\) implies that \(R_{n-T_n}\le T_n\). It follows that if one can show that \(\lim _{\epsilon \rightarrow 0}\liminf _{n\rightarrow \infty }\tfrac{1}{n} \ln {\mathbb {P}}\{T_n < \epsilon n\} \ge \gamma _+\), then also \(\lim _{\epsilon \rightarrow 0}\liminf _{n\rightarrow \infty }\tfrac{1}{n} \ln {\mathbb {P}}\{x:R_n(x) < \epsilon n\} \ge \gamma _+\), and in particular (4.22) holds. Such a result is proved in [AV08] under an assumption called “Hypothesis 1”, which is very similar to WPA. The same bound is obtained in [AC15] under an assumption called “Assumption 1” which, in spirit, also plays a role similar to that of WPA.

5 Proof of the Main Results

At this stage, we have proved that if the pair \(({\mathbb {P}}, {\mathbb {Q}})\) is admissible, then \({\overline{I}}_W= {\underline{I}}_W= I_W\), with \(I_W\) defined by (1.19); see Propositions 4.2 and 4.4, and notice that all three functions are infinite on the negative real axis. This implies that the sequence \((\tfrac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) satisfies the weak LDP with respect to \({\mathbb {P}}\otimes {\mathbb {Q}}\), with the rate function \(I_W\); see the beginning of Sect. 4. In the same way, by Propositions 4.3 and 4.5, we have proved that, if \({\mathbb {P}}\) satisfies UD and SLD, then the sequence \((\tfrac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) satisfies the weak LDP with respect to \({\mathbb {P}}\), with the rate function \(I_V\) given in (1.22). Finally, combining Propositions 4.3 and 4.7, we have shown that for \({\mathbb {P}}\) satisfying UD, SLD and PA, the sequence \((\tfrac{1}{n} \ln R_n) _{n\in {\mathbb N}}\) satisfies the weak LDP with respect to \({\mathbb {P}}\), with the rate function \(I_R\) given in (1.25). See Table 2 for a summary.

Since upper and lower Ruelle–Lanford functions are always lower semicontinuous, we conclude that \(I_W\), \(I_V\) and \(I_R\) are lower semicontinuous. Alternatively, lower semicontinuity can be checked explicitly using the expressions (1.19), (1.22) and (1.25), together with the fact that the rate function \(I_{{\mathbb {Q}}}\) in Theorem 0 is lower semicontinuous.

In this section, we promote the weak LDPs to full ones and establish the claimed relations about the rate functions and accompanying pressures. This will conclude the proofs of Theorems A, B and C. The proofs of these theorems, and actually also that of Theorem 0, have many (rather standard) arguments in common, which we have extracted as Proposition C.1.

Lemma 5.1

Under the assumptions of Theorem A, the rate function \(I_W\) is convex. In particular, if \({\mathbb {P}}\) satisfies UD and SLD, then \(I_V\) is convex.

Proof

We claim that for every \(\lambda \in [0,1]\) and \(s_1, s_2\in {\mathbb R}\),

$$\begin{aligned} \lambda I_W(s_1)+(1-\lambda )I_W(s_2)\ge I_W(s) \end{aligned}$$

where \(s:= \lambda s_1 + (1-\lambda ) s_2\). The stated inequality is obvious if \(s_1<0\) or \(s_2<0\). For \(s_1, s_2\ge 0\),

$$\begin{aligned}&\lambda I_W(s_1) + (1-\lambda )I_W(s_2) \\&\qquad \qquad = -s + \inf _{r_1 \ge s_1, r_2 \ge s_2} (\lambda r_1 + (1-\lambda )r_2 + \lambda I_{{\mathbb {Q}}}(r_1)+(1-\lambda ) I_{{\mathbb {Q}}}(r_2))\\&\qquad \qquad \ge -s + \inf _{r_1 \ge s_1, r_2 \ge s_2} (\lambda r_1 + (1-\lambda )r_2 + I_{{\mathbb {Q}}}(\lambda r_1 + (1-\lambda )r_2))\\&\qquad \qquad = -s +\inf _{r \ge s} (r+I_{{\mathbb {Q}}}(r)) = I_W(s), \end{aligned}$$

where the inequality in the second line relies on the convexity of \(I_{{\mathbb {Q}}}\), by Theorem 0.i. The second part of the lemma follows from the identity \(I_V=I_W\) when \({\mathbb {P}}={\mathbb {Q}}\); recall Remark 1.11. \(\square \)

Lemma 5.2

Suppose that \({\mathbb {Q}}\) satisfies SLD. Then, for every \(\alpha > 0\), there exists an \(\textrm{e}^{o(n)}\)-sequence \((\kappa _{\alpha ,n})_{n\in {\mathbb N}}\) such that

$$\begin{aligned} \int W_n(u,y)^\alpha \textrm{d}{\mathbb {Q}}(y) \le \kappa _{\alpha ,n} {\mathbb {Q}}_n(u)^{-\alpha } \end{aligned}$$

for each \(u\in {{\,\textrm{supp}\,}}{\mathbb {Q}}_n\).

Proof

Fix \(n\in {\mathbb N}\), \(u\in {{\,\textrm{supp}\,}}{\mathbb {Q}}_n\), and let \(q:= 1 - C_n^{-1}{\mathbb {Q}}_n(u)\). Without loss of generality, we assume that \(C_n\ge C_1 > 1\), so that \(0<1-C_1^{-1}\le q < 1\). For all \(t\ge 0\), the bound (3.4) of Lemma 3.1 yields

$$\begin{aligned} {\mathbb {Q}}\{y : W_n(u,y)> t\}&= {\mathbb {Q}}\{y : W_n(u,y)\ge \lfloor t\rfloor +1\} \\&\le q^{\left\lfloor \frac{\lfloor t\rfloor }{n + \tau _n}\right\rfloor } \\&\le q^{ \frac{t}{n + \tau _n}-2} \\&\le (1-C_1^{-1})^{-2}q^{ \frac{t}{n + \tau _n}}. \end{aligned}$$

Then, by a standard consequence of Fubini’s theorem (see e.g. Theorem 8.16 in [Rud87]),

$$\begin{aligned} \int W_n(u,y)^\alpha \textrm{d}{\mathbb {Q}}(y)&= \alpha \int _0^\infty t^{\alpha -1} {\mathbb {Q}}\{y : W_n(u,y)> t\}\textrm{d}t\\&\le \frac{\alpha }{(1-C_1^{-1})^2} \int _0^\infty t^{\alpha -1} q^{\frac{t}{n + \tau _n}}\textrm{d}t\\&= \frac{\alpha }{(1-C_1^{-1})^2} \Gamma (\alpha )\left( \frac{1}{n + \tau _n}\right) ^{-\alpha }\left( -\ln q\right) ^{-\alpha }. \end{aligned}$$

Further using that \(-\!\ln (q)\! =\! -\ln (1-C_n^{\!-\!1}{\mathbb {Q}}_n(u))\!\ge \! C_n^{\!-\!1}{\mathbb {Q}}_n(u)\) completes the proof. \(\square \)
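
The closed form used in the last display is the standard Gamma-function identity \(\int _0^\infty t^{\alpha -1}\textrm{e}^{-ct}\textrm{d}t = \Gamma (\alpha )c^{-\alpha }\) with \(c = -\ln (q)/(n+\tau _n)\). The following sketch compares it with a crude midpoint-rule quadrature. The parameter `length` stands for \(n+\tau _n\), and the numerical values, the truncation point and the step count are arbitrary choices made for illustration.

```python
from math import exp, gamma, log

def tail_moment_integral(alpha, q, length, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of int_0^upper t^(alpha-1) * q^(t/length) dt."""
    h = upper / steps
    rate = -log(q) / length  # q^(t/length) = exp(-rate * t)
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (alpha - 1.0) * exp(-rate * t)
    return h * total

def closed_form(alpha, q, length):
    """Gamma(alpha) * (1/length)^(-alpha) * (-ln q)^(-alpha), as in the proof."""
    return gamma(alpha) * (1.0 / length) ** (-alpha) * (-log(q)) ** (-alpha)

if __name__ == "__main__":
    for alpha in (1.5, 2.0):
        print(alpha, tail_moment_integral(alpha, 0.5, 3.0), closed_form(alpha, 0.5, 3.0))
```

The truncation at `upper` is harmless for these parameters because the integrand decays exponentially; for q very close to 1 a larger cutoff would be needed.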

We are now in a position to prove the main results. With Proposition C.1 from “Appendix C” and the weak LDPs at hand, the proofs of Theorems A–C only consist in providing a few remaining estimates specific to each case.

5.1 Proof of Theorem A

Since \((\frac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) satisfies the weak LDP, the assumptions of Proposition C.1 are satisfied with \(Z_n:= \frac{1}{n} \ln W_n\) on \((\Omega _*, {\mathbb {P}}_*):= (\Omega \times \Omega , {\mathbb {P}}\otimes {\mathbb {Q}})\), with the convex (recall Lemma 5.1) rate function \(I:=I_W\). We now show that the conclusions of Theorem A follow from Proposition C.1.

We first prove that \(q_W\) exists and that \(q_W= I_W^*\). For this we define \({\underline{q}}_W\) and \({\overline{q}}_W\) as the limit inferior and limit superior corresponding to the definition (1.6) of \(q_W\). The bound \({\underline{q}}_W\ge I_W^*\) is provided by Proposition C.1.iii, and by Proposition C.1.v we have \(q_W(\alpha ) = I_W^*(\alpha )\) for all \(\alpha <0\). Since \(I_W\le I_{{\mathbb {Q}}}\) by definition, we have \(I_W^*(0) \ge I_{{\mathbb {Q}}}^*(0) = q_{{\mathbb {Q}}}(0)= 0 = q_W(0)\), where we have used Theorem 0.ii. Finally, for \(\alpha > 0\), Lemma 5.2 gives

$$\begin{aligned} \begin{aligned} \int \int W_n(x,y)^\alpha \textrm{d}{\mathbb {P}}(x)\textrm{d}{\mathbb {Q}}(y)&= \sum _{u\in {{\,\textrm{supp}\,}}{\mathbb {P}}_n}{\mathbb {P}}_n(u)\int W_n(u,y)^\alpha \textrm{d}{\mathbb {Q}}(y) \\&\le \kappa _{\alpha ,n} \sum _{u\in {{\,\textrm{supp}\,}}{\mathbb {P}}_n}{\mathbb {P}}_n(u) {\mathbb {Q}}_n(u)^{-\alpha }. \end{aligned} \end{aligned}$$
(5.1)

Note that we have used the absolute continuity granted by admissibility of the pair \(({\mathbb {P}},{\mathbb {Q}})\). We conclude that

$$\begin{aligned} {\overline{q}}_W(\alpha )\le q_{{\mathbb {Q}}}(\alpha ) = I_{{\mathbb {Q}}}^*(\alpha ) \le I_W^*(\alpha ), \end{aligned}$$

where from left to right, we have used (5.1), Theorem 0.ii, and the fact that \(I_W\le I_{{\mathbb {Q}}}\) by definition. We have thus proved that \(q_W= I_W^*\). By Proposition C.1.vii, the weak LDP then extends to a full one. This completes the proof of Theorem A.i.

For Part ii, we have already established that \(q_W= I_W^*\) (in particular \(q_W\) exists as a limit), and since \(I_W\) is convex and lower semicontinuous, this implies that also \(I_W= q_W^*\). To establish (1.20), note that

$$\begin{aligned} q_W(\alpha )=I_W^*(\alpha ) = \sup _{s\ge 0} (\alpha s - \inf _{r\ge s}(r-s+I_{{\mathbb {Q}}}(r))) = \sup _{r\ge s\ge 0} ((\alpha +1) s -r - I_{{\mathbb {Q}}}(r)). \end{aligned}$$

Since the quantity to optimize is linear in s, it suffices to consider the extremal points \(s\in \{0,r\}\), which yields, using again Theorem 0.ii,

$$\begin{aligned} q_W(\alpha )&= \sup _{r\ge 0} (\max \{0, (\alpha +1)r\} -r - I_{{\mathbb {Q}}}(r)) \\&= \max \left\{ \sup _{r\ge 0}(-r - I_{{\mathbb {Q}}}(r)), \sup _{r\ge 0}(\alpha r - I_{{\mathbb {Q}}}(r))\right\} \\&= \max \{q_{{\mathbb {Q}}}(\alpha ), q_{{\mathbb {Q}}}(-1)\}, \end{aligned}$$

so the proof of Theorem A.ii is complete.

In order to prove Theorem A.iii, we simply note that when \({\mathbb {Q}}={\mathbb {P}}\), then by (1.20) and (1.16), we have \(q_W(1) = q_{{\mathbb {P}}}(1) = h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}}) <\infty \), so Proposition C.1.vi applies. \(\square \)

5.2 Proof of Theorem B

The proof is almost the same as that of Theorem A, applying this time Proposition C.1 to \( Z_n:= \frac{1}{n} \ln V_n\) on \((\Omega _*, {\mathbb {P}}_*):= (\Omega , {\mathbb {P}})\), with the convex rate function \(I:=I_V\). We will only require a slightly more involved argument to derive the inequality \({\overline{q}}_V(\alpha )\le I_V^*(\alpha )\) when \(\alpha > 0\), where \({\overline{q}}_V\) is defined by taking the limit superior in the definition (1.5) of \(q_V\). For \(\alpha >0\), recalling that \(W_n(u, x)\le W_n(u, T^k x)+k\), we obtain

$$\begin{aligned} \int V_n^\alpha \textrm{d}{\mathbb {P}}&= \sum _{u\in {{\mathcal {A}}}^n} \int 1_{[u]}(x) W_n(u, T^{n}x)^\alpha \textrm{d}{\mathbb {P}}(x) \\&\le \sum _{u\in {{\mathcal {A}}}^n}\int 1_{[u]}(x) (W_n(u, T^{n+\tau _n}x)+\tau _n)^\alpha \textrm{d}{\mathbb {P}}(x). \end{aligned}$$

Then, by UD,

$$\begin{aligned}&\int 1_{[u]}(x) (W_n(u, T^{n+\tau _n}x)+\tau _n)^\alpha \textrm{d}{\mathbb {P}}(x) \\&\qquad \qquad = \sum _{j=1}^\infty {\mathbb {P}}([u]\cap T^{-n-\tau _n}\{x: W_n(u,x)=j\})(j+\tau _n)^\alpha \\&\qquad \qquad \le C_n \sum _{j=1}^\infty {\mathbb {P}}_n(u) {\mathbb {P}}\{x: W_n(u,x)=j\}(j+\tau _n)^\alpha \\&\qquad \qquad = C_n{\mathbb {P}}_n(u)\int (W_n(u,x)+\tau _n)^\alpha \textrm{d}{\mathbb {P}}(x), \end{aligned}$$

so

$$\begin{aligned} \int V_n^\alpha \textrm{d}{\mathbb {P}}&\le C_n \sum _{u\in {{\mathcal {A}}}^n}{\mathbb {P}}_n(u)\int (W_n(u,x)+\tau _n)^\alpha \textrm{d}{\mathbb {P}}(x) \\&\le C_n \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u)\int (W_n(u,x)(1+\tau _n))^\alpha \textrm{d}{\mathbb {P}}(x) \\&\le C_n (1+\tau _n)^\alpha \kappa _{\alpha ,n} \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u)^{1-\alpha }, \end{aligned}$$

where the last inequality was obtained using Lemma 5.2 with \({\mathbb {Q}}= {\mathbb {P}}\). By this and Theorem 0.ii, and since \(I_V\le I_{{\mathbb {P}}}\) by definition, we conclude that \({\overline{q}}_V(\alpha ) \le q_{{\mathbb {P}}}(\alpha ) = I_{{\mathbb {P}}}^*(\alpha )\le I_V^*(\alpha )\), as claimed.

The same arguments as in the proof of Theorem A then provide the full LDP together with the Legendre–Fenchel duality relations and (1.23), which is merely a special case of (1.20).

Next, the exponential tightness and goodness assertions in Theorem B.i follow, as in the proof of Theorem A.iii, from Proposition C.1.vi and the bound \(q_V(1) = q_{{\mathbb {P}}}(1)=h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}})<\infty \).

Finally, (1.24) is proved as follows: by Lemma 4.6 we have \(q_{{\mathbb {P}}}(-1)\le \gamma _+\) and by (1.23) and Legendre–Fenchel duality we have \(I_V(0) = \sup _{\alpha \in {\mathbb R}}(-q_V(\alpha )) = -q_{{\mathbb {P}}}(-1)\); this can also be obtained using (4.13). \(\square \)

5.3 Proof of Theorem C

Once more, we plan to apply Proposition C.1, this time with \(Z_n:= \frac{1}{n} \ln R_n\) on \((\Omega _*, {\mathbb {P}}_*):= (\Omega , {\mathbb {P}})\) and the rate function \(I:=I_R\), keeping in mind that the latter is not convex in general.

The proof that \(q_R= I_R^*\) follows the same argument as in Theorems A and B, and once again, only the proof of the inequality \({\overline{q}}_R(\alpha )\le I_R^*(\alpha )\) when \(\alpha > 0\) needs to be adapted, where \({\overline{q}}_R\) is again defined by taking the limit superior in the definition (1.4) of \(q_R\). We have, for \(\alpha >0\),

$$\begin{aligned} \int R_n^\alpha \textrm{d}{\mathbb {P}}&= \sum _{u\in {{\mathcal {A}}}^n} \int 1_{[u]}(x) W_n(u, Tx)^\alpha \textrm{d}{\mathbb {P}}(x)\\&\le \sum _{u\in {{\mathcal {A}}}^n} \int 1_{[u]}(x)(W_n(u, T^{n+\tau _n}x)+n+\tau _n-1)^\alpha \textrm{d}{\mathbb {P}}(x)\\&\le C_n (n+\tau _n)^\alpha \kappa _{\alpha ,n} \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u)^{1-\alpha }, \end{aligned}$$

where the steps giving the last inequality are exactly as in the proof of Theorem B with \(\tau _n\) replaced by \(\tau _n+n-1\). As previously, we conclude that \({\overline{q}}_R(\alpha ) \le q_{{\mathbb {P}}}(\alpha ) = I_{{\mathbb {P}}}^*(\alpha )\le I_R^*(\alpha )\), since \(I_R\le I_{{\mathbb {P}}}\). We thus have \(q_R= I_R^*\).

Unlike in the previous theorems, the rate function is not convex in general and we cannot invert the Legendre–Fenchel transform. For the same reason, we cannot apply Proposition C.1.vii to obtain the full LDP. Nevertheless, since \(q_R(1) = q_{{\mathbb {P}}}(1)=h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}})<\infty \), the full LDP, exponential tightness and goodness of \(I_R\) follow at once from Proposition C.1.vi. This completes the proof of Part i.

We now turn to the proof of Part ii. We have already seen that \(q_R= I_R^*\), so it remains to prove (1.26), which we do by singling out the pathological case where \(\gamma _-=0\).

Case 1: \(\gamma _- = 0\). In this case, \(\gamma _+ = 0\) as well by (1.13), so Theorem 0.iii yields that \(I_{{\mathbb {P}}}(0) = 0\) and \(I_{{\mathbb {P}}}(s) = \infty \) for \(s\ne 0\). It follows from its definition that \(I_R=I_{{\mathbb {P}}}\), and taking the Legendre–Fenchel transform gives \(q_R=q_{{\mathbb {P}}}= 0\). In particular, both sides of (1.26) vanish identically.

Case 2: \(\gamma _- < 0\). Since \(I_R(s) = I_V(s)\) for all \(s>0\) by definition, and since \(\lim _{s\downarrow 0}I_V(s) = I_V(0)\) when \(\gamma _- < 0\), we find

$$\begin{aligned} q_R(\alpha )&= \sup _{s\in {\mathbb R}}(\alpha s - I_R(s)) \\&= \max \left\{ -I_R(0), \sup _{s>0}(\alpha s - I_V(s))\right\} \\&= \max \{-I_R(0),I_V^*(\alpha )\}. \end{aligned}$$

Further using that \(-I_R(0)=\gamma _+\) by definition and that

$$\begin{aligned} I_V^*(\alpha )=q_V(\alpha ) = \max \{q_{{\mathbb {P}}}(\alpha ), q_{{\mathbb {P}}}(-1)\} \end{aligned}$$

by (1.23), we conclude that \(q_R(\alpha ) = \max \{\gamma _+,q_{{\mathbb {P}}}(\alpha ), q_{{\mathbb {P}}}(-1)\}\). In view of (1.24), the quantity \(q_{{\mathbb {P}}}(-1)\) can be omitted from the maximum, and (1.26) follows.

Part iii of Theorem C, about (non)convexity of \(I_R\), is proved in Sect. 5.4. \(\square \)
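
To make the pressure formulas concrete, the sketch below evaluates them for a Bernoulli measure. It assumes the normalization \(q_{{\mathbb {P}}}(\alpha ) = \lim _{n\rightarrow \infty }\frac{1}{n}\ln \sum _{u}{\mathbb {P}}_n(u)^{1-\alpha }\), which is the one consistent with the moment bounds used above and which reduces to \(\ln \sum _a p_a^{1-\alpha }\) in the i.i.d. case. The function names and the letter law are hypothetical.

```python
from math import log

def q_P(alpha, p):
    """Bernoulli pressure ln sum_a p_a^(1 - alpha), summed over the support of p."""
    return log(sum(pa ** (1.0 - alpha) for pa in p if pa > 0))

def gamma_plus(p):
    """gamma_+ for a Bernoulli measure: the log of the largest letter probability."""
    return log(max(p))

def q_V(alpha, p):
    """Formula (1.23): max{q_P(alpha), q_P(-1)}."""
    return max(q_P(alpha, p), q_P(-1.0, p))

def q_R(alpha, p):
    """Formula (1.26) as derived above: max{q_P(alpha), gamma_+}."""
    return max(q_P(alpha, p), gamma_plus(p))

if __name__ == "__main__":
    p = (0.7, 0.3)  # hypothetical letter probabilities
    for alpha in (-3.0, -1.0, 0.0, 1.0):
        print(alpha, q_P(alpha, p), q_V(alpha, p), q_R(alpha, p))
```

For a non-uniform letter law the two pressures differ on the flat part: \(q_V\) levels off at \(q_{{\mathbb {P}}}(-1)\) while \(q_R\) levels off at the strictly larger value \(\gamma _+\). For the uniform law the two plateaus coincide.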

Remark 5.3

In order to follow a common route for the proof of our three main theorems, our arguments to promote the weak LDP to a full one did not rely on exponential tightness, since the latter does not hold in general in the setup of Theorem A.i–ii. We have instead made extensive use of the properties of the pressures. We now briefly argue that in the context of Theorems A.iii, B and C, exponential tightness could have been proved more directly, and the full LDP could have thus been obtained without any reference to pressures.

In the setup of Theorem A.iii, that is in the case of \((\frac{1}{n} \ln W_n)_{n\in {\mathbb N}}\) with \({\mathbb {Q}}={\mathbb {P}}\), the idea is that multiplying (3.4) in the case \(m \sim \textrm{e}^{nM}\) by \({\mathbb {P}}_n(u)\) and summing over \(u\in {{\mathcal {A}}}^n\) yields terms that fall in either of two categories: those with \({\mathbb {P}}_n(u) \gg \textrm{e}^{-nM}\) decay superexponentially, while those with \({\mathbb {P}}_n(u) \lesssim \textrm{e}^{-nM}\) are controlled in view of the exponential tightness estimate (2.24). The argument relies on SLD only, and the details are easily written by following the proof of the estimate (3.15). Further assuming UD, exponential tightness can be extended to \((\frac{1}{n} \ln R_n)_{n\in {\mathbb N}}\) and \((\frac{1}{n} \ln V_n)_{n\in {\mathbb N}}\) by introducing a minor variation of Lemma 3.6 where balls are replaced by intervals of the form \((s, \infty )\) and \((s-\epsilon , \infty )\).

5.4 Nonconvexity in Theorem C

In this subsection, we prove the numerous relations stated in Theorem C.iii. The topological entropy \(h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}})\) as well as the notion of measure of maximal entropy (MME) on \({{\,\textrm{supp}\,}}{\mathbb {P}}\) are recalled in “Appendix A”.

Proposition 5.4

Let \({\mathbb {P}}\) satisfy the assumptions of Theorem C, and consider the following properties:

  1. (a)

    \(I_R\) is convex;

  2. (b)

    \(I_R=I_V\);

  3. (c)

    \(q_R=q_V\);

  4. (d)

    \(\gamma _+ = q_{{\mathbb {P}}}(-1)\);

  5. (e)

    \(I_{{\mathbb {P}}}(-\gamma _+) = 0\);

  6. (f)

    \(q_{{\mathbb {P}}}(\alpha ) = -\gamma _+\alpha \) for all \(\alpha \le 0\);

  7. (g)

    \(\gamma _+ = \gamma _- \);

  8. (h)

    \(I_{{\mathbb {P}}}(s) = \infty \) for all \(s \in {\mathbb R}{\setminus }\{-\gamma _+\}\);

  9. (i)

    \(h({\mathbb {P}}) = h_{\text {top}}({{\,\textrm{supp}\,}}{\mathbb {P}})\), i.e. \({\mathbb {P}}\) is a MME on \({{\,\textrm{supp}\,}}{\mathbb {P}}\).

Then, Properties (a)–(f) are equivalent, Properties (g)–(i) are equivalent, and Properties (g)–(i) imply Properties (a)–(f). If, in addition, \(\tau _n=O(1)\) and \(C_n=O(1)\), then Properties (a)–(i) are equivalent.

Proof

Let us summarize the information we have gathered on the rate functions. First, \(I_V(s)\) and \(I_R(s)\) coincide for \(s \ne 0\) by comparison of (1.22) and (1.25). By the same two equations, and since we have already established that \( -\gamma _+ \le - q_{{\mathbb {P}}}(-1)\) in Lemma 4.6 and that \(-q_{{\mathbb {P}}}(-1)=I_V(0)\) in Theorem B.ii, we obtain

$$\begin{aligned} 0\le I_R(0) = -\gamma _+ \le - q_{{\mathbb {P}}}(-1) = I_V(0) \le I_{{\mathbb {P}}}(0). \end{aligned}$$
(5.2)

We now prove that (a)–(f) are equivalent.

(a)\(\Rightarrow \)(b):

If \(\gamma _+ = 0\), Theorem 0.iii.a implies that \(I_{{\mathbb {P}}}(0) = 0\), so by (5.2) we obtain \(I_V(0)=I_R(0)\) and thus \(I_V= I_R\) (see Footnote 12). Assume now \(\gamma _+<0\). By Theorem 0.iii.a and the formulae (1.22) and (1.25), the functions \(I_V\) and \(I_R\) are finite on \([0,-\gamma _+]\), and equal on \((0,-\gamma _+]\). Now, if (a) holds, then both \(I_V\) and \(I_R\) are convex, and since they are also lower semicontinuous, their restriction to \([0,-\gamma _+]\) is continuous, so \(I_R(0) = I_V(0)\), and thus \(I_R= I_V\).

(a)\(\Leftarrow \)(b):

The function \(I_V\) is convex by Lemma 5.1.

(b)\(\Rightarrow \)(c):

By Theorems B.ii and C.ii, assuming (b) yields \(q_V= I_V^* = I_R^* = q_R\).

(c)\(\Rightarrow \)(d):

As mentioned in Remarks 1.10 and 1.13, for all \(\alpha \le -1\) we have \(q_V(\alpha ) = q_{{\mathbb {P}}}(-1)\) and \(q_R(\alpha ) = \gamma _+\), so by (c) we find \(q_{{\mathbb {P}}}(-1) = \gamma _+\).

(d)\(\Rightarrow \)(b):

Combining (d) and (5.2) gives again \(I_V(0)=I_R(0)\).

(b)\(\Rightarrow \)(e):

Assuming (b), the identity \(I_R(0)=I_V(0)\) reads \(-\gamma _+ = \inf _{s\ge 0}(s+I_{{\mathbb {P}}}(s))\) according to (1.22) and (1.25), so that for all \(\epsilon > 0\), there exists \(s_\epsilon \ge 0\) such that \(I_{{\mathbb {P}}}(s_\epsilon ) < -\gamma _+ - s_{\epsilon } + \epsilon \). Since also \(s_\epsilon \ge -\gamma _+\) by Theorem 0.iii.a, we have

$$\begin{aligned} 0\le I_{{\mathbb {P}}}(s_\epsilon ) < -\gamma _+ - s_{\epsilon } + \epsilon \le \epsilon . \end{aligned}$$

It follows that \(s_\epsilon \rightarrow -\gamma _+\) and \(I_{{\mathbb {P}}}(s_\epsilon ) \rightarrow 0\) as \(\epsilon \rightarrow 0\). By nonnegativity and lower semicontinuity of \(I_{{\mathbb {P}}}\), this allows us to conclude that \(I_{{\mathbb {P}}}(-\gamma _+) = 0\).

(e)\(\Rightarrow \)(f):

By Theorem 0.iii.a and (e), we have \(I_{{\mathbb {P}}}(-\gamma _+) = 0\) and \(I_{{\mathbb {P}}}(s) = \infty \) for \(s < -\gamma _+\). Since \(I_{{\mathbb {P}}}\) is nonnegative, for all \(\alpha \le 0\), the duality relations (1.14) imply that \(q_{{\mathbb {P}}}(\alpha ) = \sup _{s\ge -\gamma _+}(\alpha s - I_{{\mathbb {P}}}(s)) = -\alpha \gamma _+\), where the supremum is reached at \(s=-\gamma _+\).

(f)\(\Rightarrow \)(d):

The latter is a special case of the former with \(\alpha = -1\).

We now prove the equivalence of (g)–(i).

(g)\(\Leftrightarrow \)(h):

This is a consequence of Theorem 0.iii.a.

(g)\(\Leftrightarrow \)(i):

This is a consequence of Proposition A.5.i.

To show that (g)–(i) imply (a)–(f) we remark that if (h) holds, then we have (e) because \(\inf _{s\in {\mathbb R}}I_{{\mathbb {P}}}(s) = 0\). Finally, to obtain the last statement, we prove in Proposition 5.5 below that if \(\tau _n=O(1)\) and \(C_n=O(1)\), then (d) implies (h). \(\square \)

Proposition 5.5

Suppose that \({\mathbb {P}}\) satisfies UD and SLD with \(\tau _n=O(1)\) and \(C_n=O(1)\). If \(q_{{\mathbb {P}}}(-1) = \gamma _+\), then \(I_{{\mathbb {P}}}(s) = \infty \) for all \(s \in {\mathbb R}{\setminus }\{-\gamma _+\}\).
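To get a feeling for the hypothesis \(q_{{\mathbb {P}}}(-1) = \gamma _+\), one may look at the illustrative case (an assumption made here, not a case singled out in the text) where \({\mathbb {P}}\) is a product measure with single-letter probabilities \(p_a\). Then the sum \(\sum _u {\mathbb {P}}_n^2(u)\) factorizes, so \(q_{{\mathbb {P}}}(-1) = \ln \sum _a p_a^2\), while \(\gamma _+ = \ln \max _a p_a\). Since \(\sum _a p_a^2 \le \max _a p_a\) with equality only when all letters of positive probability are equally likely, the hypothesis singles out the uniform case. The sketch below evaluates both quantities; the function names and sample distributions are hypothetical.

```python
import math

def q_minus_one(p):
    """q_P(-1) = ln sum_a p_a^2 for a product measure with letter probabilities p."""
    return math.log(sum(pa * pa for pa in p))

def gamma_plus(p):
    """gamma_+ = ln max_a p_a for the same product measure."""
    return math.log(max(p))

# Hypothetical single-letter distributions.
uniform = [0.25, 0.25, 0.25, 0.25]
biased = [0.7, 0.2, 0.1]

# The gap gamma_+ - q_P(-1) vanishes in the uniform case and is positive otherwise.
gap_uniform = gamma_plus(uniform) - q_minus_one(uniform)
gap_biased = gamma_plus(biased) - q_minus_one(biased)
```

In the uniform case all words of length n have the same probability, which is consistent with a rate function that is infinite away from the single point \(-\gamma _+\).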

Proof

Seeking a contradiction, suppose that there is \(s>-\gamma _+\) such that \(I_{{\mathbb {P}}}(s) <\infty \) (recall that \(I_{{\mathbb {P}}}(s)=\infty \) for all \(s<-\gamma _+\) by Theorem 0.iii.a). The idea of the proof is that, if such s exists, then there are words whose probability decreases like \(\textrm{e}^{-sn}\ll \textrm{e}^{\gamma _+ n}\), and that these words appear often enough to make \(q_{{\mathbb {P}}}(-1)\) strictly smaller than \(\gamma _+\), which contradicts the assumption. Let \(\tau := \sup _{n\in {\mathbb N}}\tau _n < \infty \), \(C:=\sup _{n\in {\mathbb N}}C_n < \infty \), and \(\epsilon := s+\gamma _+>0\). For \(\ell , m\in {\mathbb N}\) and \(w\in {{\mathcal {A}}}^m\), the computations below will involve

$$\begin{aligned} R_1&:= \frac{\ln (1 - C{\mathbb {P}}_m(w))}{2(m+\tau )} + \frac{\ln 2}{\ell }+ \frac{\ln C}{2\ell } - \frac{\ln (1 - C{\mathbb {P}}_m(w))}{\ell }\end{aligned}$$

and

$$\begin{aligned} R_2&:= \frac{m}{2\ell }\left( -\gamma _+ + \frac{\ln {\mathbb {P}}_m(w)}{m} -\frac{2\tau \gamma _+}{m}+\frac{3 \ln C}{m}\right) . \end{aligned}$$

Let us show that m, \(\ell \), and w can be chosen so that \(R_1\) and \(R_2\) are negative, which will later lead to the contradiction we are seeking. Since \(I_{{\mathbb {P}}}(s) <\infty \), the LDP implies that for all m large enough we can choose a word \(w\in {{\mathcal {A}}}^m\) such that \({\mathbb {P}}_m(w)\le \textrm{e}^{-m(s-\frac{1}{2} \epsilon )}\), and thus \(\gamma _+- \frac{\ln {\mathbb {P}}_m(w)}{m} \ge \frac{\epsilon }{2}\). We fix m large enough and \(w\in {{\mathcal {A}}}^m\) so that not only the above holds, but also \(-\frac{2\tau \gamma _+}{m}+\frac{3 \ln C}{m} \le \frac{\epsilon }{4}\) and \(\textrm{e}^{-m(s-\frac{1}{2} \epsilon )} < C^{-1}\) (so in particular \(1-C{\mathbb {P}}_m(w)>0\)). Thus, we have

$$\begin{aligned} R_2 \le -\frac{m\epsilon }{8\ell }<0, \end{aligned}$$

and, since the first term in \(R_1\) is negative, taking \(\ell \) large enough yields

$$\begin{aligned} R_1 \le \frac{\ln (1 - C{\mathbb {P}}_m(w))}{4(m+\tau )}<0. \end{aligned}$$

We remark that the above condition on \(\ell \) implies in particular that \(\ell >2(m+\tau )\), which will be useful to have in mind below.

For the remainder of the proof, m, w and \(\ell \) are fixed as above, so that \(R_1\) and \(R_2\) are negative. Let \(k\in {\mathbb N}\) and consider the intervals \(L_j:= [1+j\ell , (j+1)\ell -\tau ]\) for \(j=0, 1, \dots , 2k-1\). These 2k intervals are separated by gaps of length \(\tau \). Let \(A_k\) be the set of points \(x\in \Omega \) for which at least k out of the 2k intervals contain no occurrence of w (an interval \([i_1, i_2]\) contains no occurrence of w if there is no \(i_1 \le i \le i_2-m+1\) such that \(x_i^{i+m-1}=w\)). Clearly,

$$\begin{aligned} \begin{aligned} \sum _{u\in {{\mathcal {A}}}^{2k\ell }} {\mathbb {P}}^2_{2k\ell }(u)&= \int _{A_k} {\mathbb {P}}_{2k\ell }([x_1^{2k\ell }]) \textrm{d}{\mathbb {P}}(x) +\int _{A_k^\textsf{c}} {\mathbb {P}}_{2k\ell }([x_1^{2k\ell }]) \textrm{d}{\mathbb {P}}(x) \\&\le {\mathbb {P}}(A_k)\sup _{x\in \Omega } {\mathbb {P}}_{2k\ell }(x_1^{2k\ell }) + \sup _{x\in A_{k}^\textsf{c}} {\mathbb {P}}_{2k\ell }(x_1^{2k\ell }). \end{aligned} \end{aligned}$$
(5.3)

We will show below that

$$\begin{aligned} \limsup _{k\rightarrow \infty } \frac{1}{2k\ell } \ln ({\mathbb {P}}(A_k)\sup _{x\in \Omega } {\mathbb {P}}_{2k\ell }(x_1^{2k\ell }))&\le \gamma _+ + R_1,\end{aligned}$$
(5.4)
$$\begin{aligned} \limsup _{k\rightarrow \infty } \frac{1}{2k\ell } \ln \sup _{x\in A_k^{\textsf{c}}}{\mathbb {P}}_{2k\ell }(x_1^{2k\ell })&\le \gamma _+ + R_2. \end{aligned}$$
(5.5)

Combining this with (5.3) yields, since the limit defining \(q_{{\mathbb {P}}}(-1)\) exists,

$$\begin{aligned} q_{{\mathbb {P}}}(-1)&= \lim _{n\rightarrow \infty }\frac{1}{n} \ln \sum _{u\in {{\mathcal {A}}}^{n}}{\mathbb {P}}^2_{n}(u) \\&= \limsup _{k\rightarrow \infty } \frac{1}{2k\ell }\ln \sum _{u\in {{\mathcal {A}}}^{2k\ell }} {\mathbb {P}}^2_{2k\ell }(u) \\&\le \max \left\{ \gamma _+ + R_2, \gamma _+ + R_1\right\} \\&< \gamma _+, \end{aligned}$$

which contradicts the assumption, and thus proves the claim.

Proof of (5.4). By Remark B.5, since \(\tau _n = O(1)\), we can assume without loss of generality that the sequence \((C_n)_{n\in {\mathbb N}}\) required for (3.1) and (3.2) to hold is also bounded by C.

The event “there is no occurrence of w in the interval \(L_j\)” can be written as \(T^{-j\ell }U\), where \(U:= \{x:W_m(w, x)>\ell +1-m-\tau \} \in {{\mathcal {F}}}_{\ell -\tau }\). By the definition of \(A_k\),

$$\begin{aligned} A_k\subseteq \bigcup _{\begin{array}{c} M\subseteq \{0, 1, \dots , 2k-1\}\\ |M|=k \end{array}}\bigcap _{j\in M} T^{-j\ell }U. \end{aligned}$$
(5.6)

Using the bound (3.4) of Lemma 3.1, we obtain

$$\begin{aligned} {\mathbb {P}}(U)&\le (1 - C{\mathbb {P}}_m(w))^{\lfloor \frac{\ell +1-m-\tau }{m+\tau }\rfloor } \le (1 - C{\mathbb {P}}_m(w))^{ \frac{\ell }{m+\tau }-2}. \end{aligned}$$

Since \(U\in {{\mathcal {F}}}_{\ell -\tau }\), we can apply UD recursively \(k-1\) times to the intersection in (5.6). This and a simple union bound (noting that the union in (5.6) contains \(\left( {\begin{array}{c}2k\\ k\end{array}}\right) \) terms) yields

$$\begin{aligned} {\mathbb {P}}(A_k)\le \left( {\begin{array}{c}2k\\ k\end{array}}\right) C^{k-1}(1 - C{\mathbb {P}}_m(w))^{ \frac{k \ell }{m+\tau }-2k}. \end{aligned}$$

Since \(\lim _{k\rightarrow \infty }\frac{1}{2k} \ln \left( {\begin{array}{c}2k\\ k\end{array}}\right) = \ln 2\), we have proved that \(\limsup _{k\rightarrow \infty } \frac{1}{2k\ell } \ln {\mathbb {P}}(A_k) \le R_1\). Combining this with the definition of \(\gamma _+\) completes the proof of (5.4).
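The limit used in the last step can be checked numerically. The following sketch evaluates \(\frac{1}{2k\ell }\) times the logarithm of the upper bound on \({\mathbb {P}}(A_k)\) for increasing k and compares it with \(R_1\). The numerical values of C, \({\mathbb {P}}_m(w)\) (called `p` below), m, \(\tau \) and \(\ell \) are hypothetical and chosen only so that \(1 - C{\mathbb {P}}_m(w) > 0\) and \(\ell \) is large enough for \(R_1\) to be negative; the function names are likewise illustrative.

```python
import math

def r1(C, p, m, tau, ell):
    """R_1 from the proof, with p standing for P_m(w)."""
    log_term = math.log(1.0 - C * p)
    return (log_term / (2 * (m + tau)) + math.log(2) / ell
            + math.log(C) / (2 * ell) - log_term / ell)

def log_bound_rate(k, C, p, m, tau, ell):
    """(1 / (2 k ell)) times the log of binom(2k,k) C^(k-1) (1-Cp)^(k ell/(m+tau) - 2k)."""
    # ln binom(2k, k) via log-gamma, to avoid huge integers.
    log_binom = math.lgamma(2 * k + 1) - 2 * math.lgamma(k + 1)
    log_bound = (log_binom + (k - 1) * math.log(C)
                 + (k * ell / (m + tau) - 2 * k) * math.log(1.0 - C * p))
    return log_bound / (2 * k * ell)

# Hypothetical sample values (not taken from the text).
C, p, m, tau, ell = 2.0, 0.01, 5, 1, 2000
target = r1(C, p, m, tau, ell)
rates = [log_bound_rate(k, C, p, m, tau, ell) for k in (10, 1000, 100000)]
```

With these sample values the rates increase towards `target`, and `target` lies below \(\ln (1 - C{\mathbb {P}}_m(w))/(4(m+\tau ))\), as required of \(\ell \) in the proof.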

Proof of (5.5). First, let \(a_n:= \sup _{u\in {{\mathcal {A}}}^n}\ln {\mathbb {P}}_n(u)\). Clearly, the sequence \((a_n)_{n\in {\mathbb N}}\) is nonincreasing, and by SLD, for all \(n,m\in {\mathbb N}\), there is \(0\le \ell \le \tau \) such that \(a_{n+m} \ge a_{n+m+\ell } \ge a_n + a_m -\ln C\). By applying Fekete’s lemma to the sequence \((\ln C-a_n)_{n\in {\mathbb N}}\), we obtain \(\gamma _+ = \sup _{n\in {\mathbb N}} \frac{a_n-\ln C}{n}\), so for every \(x\in \Omega \) and \(n\in {\mathbb N}\),

$$\begin{aligned} \ln {\mathbb {P}}_n(x_1^n) \le n\gamma _++\ln C. \end{aligned}$$
(5.7)

Let \(x\in A_{k}^{\textsf{c}}\). By construction, we can choose k occurrences of w in \(x_1^{2k\ell }\), which are pairwise separated by gaps of length at least \(\tau \) (possible additional occurrences of w in \(x_1^{2k\ell }\) do not play any role in the argument). In other words, we can write \(x_1^{2k\ell } = u_0 w u_1 w u_2 \dots u_{k-1} w u_k\) where \(|u_0|\ge 0\) and \(|u_i|\ge \tau \) for all \(1\le i\le k\). We then apply UD before and after each occurrence of w in \(x_1^{2k\ell }\) (see Footnote 13). More precisely, in order to apply UD, we discard a prefix of length \(\tau \) in each \(u_i\) with \(1\le i \le k\), and we discard a suffix of length \(\tau \) in each \(u_i\) with \(0\le i \le k-1\); the \(u_i\) whose length is insufficient for this to be done are discarded entirely. By using (5.7) to bound the probability of what remains of the \(u_i\) at the end of this process, we easily obtain the estimate

$$\begin{aligned} {\mathbb {P}}_{2k\ell }(x_1^{2k\ell })&\le {\mathbb {P}}_m(w)^k C^{3k+1}\textrm{e}^{\gamma _+(2k\ell - km - 2k\tau )}, \end{aligned}$$

where the \(C^{3k+1}\) comes from the fact that we have used UD at most 2k times and (5.7) at most \(k+1\) times, and where \(2k\ell - km - 2k\tau \) is the minimum cumulative length of the words to which we have applied (5.7). By the definition of \(R_2\), this establishes (5.5), and thus the proof is complete. \(\square \)
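For completeness, the passage from the last estimate to (5.5) can be spelled out as follows. Taking logarithms and dividing by \(2k\ell \) gives, for every \(x\in A_k^{\textsf{c}}\),

$$\begin{aligned} \frac{1}{2k\ell } \ln {\mathbb {P}}_{2k\ell }(x_1^{2k\ell })&\le \frac{\ln {\mathbb {P}}_m(w)}{2\ell } + \frac{(3k+1)\ln C}{2k\ell } + \gamma _+\left( 1 - \frac{m}{2\ell } - \frac{\tau }{\ell }\right) \\&= \gamma _+ + \frac{m}{2\ell }\left( -\gamma _+ + \frac{\ln {\mathbb {P}}_m(w)}{m} -\frac{2\tau \gamma _+}{m}+\frac{(3+\frac{1}{k}) \ln C}{m}\right) . \end{aligned}$$

The right-hand side does not depend on x, and it converges to \(\gamma _+ + R_2\) as \(k\rightarrow \infty \).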

6 Relation to Almost Sure Convergence Results

In this section we discuss the complementarity of our LDPs and existing almost sure convergence results in the literature.

6.1 A brief review of results on almost sure convergence

Throughout this subsection, we consider two measures \({\mathbb {P}},{\mathbb {Q}}\in {{\mathcal {P}}}_{\text {inv}}\), and we only assume that \({\mathbb {Q}}\) satisfies UD and SLD. Under these conditions, the limits

$$\begin{aligned} h_{\mathbb {P}}(x):= \lim _{n\rightarrow \infty } -\frac{1}{n} \ln {\mathbb {P}}_n(x_1^n) \end{aligned}$$
(6.1)

and

$$\begin{aligned} h_{\mathbb {Q}}(x):= \lim _{n\rightarrow \infty } -\frac{1}{n} \ln {\mathbb {Q}}_n(x_1^n) \end{aligned}$$
(6.2)

exist for \({\mathbb {P}}\)-almost all x. Moreover,

$$\begin{aligned} \int h_{\mathbb {P}}(x) \textrm{d}{\mathbb {P}}(x) = \lim _{n\rightarrow \infty } -\frac{1}{n} \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u) \ln {\mathbb {P}}_n(u)=: h({\mathbb {P}}) \end{aligned}$$
(6.3)

and

$$\begin{aligned} \int h_{\mathbb {Q}}(x) \textrm{d}{\mathbb {P}}(x) = \lim _{n\rightarrow \infty } -\frac{1}{n} \sum _{u\in {{\mathcal {A}}}^n} {\mathbb {P}}_n(u) \ln {\mathbb {Q}}_n(u) =: h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}}). \end{aligned}$$
(6.4)

The relations (6.1) and (6.3) are the contents of the well-known SMB theorem. While often stated assuming ergodicity, the general form of the SMB theorem, which only requires shift invariance, is obtained as a special case of the Brin–Katok formula [BK83]; see also Theorem 9.3.1 of [VO16] for a direct proof.

On the other hand, (6.2) and (6.4) are much less general, and the literature on them is comparatively sparse. Even the limit defining \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\) in (6.4) fails to exist in some cases; see e.g. [vEFS93, §A.5.2]. In our setup, the almost sure existence of the limit (6.2) and the validity of (6.4) are guaranteed by the assumption UD imposed on \({\mathbb {Q}}\). Indeed, as discussed in [CDEJR23a, §5], the assumption implies that the functions \((f_n)_{n\in {\mathbb N}}\) defined by \(f_n(x) = \ln {\mathbb {Q}}_n(x_1^n)\) satisfy the gapped almost subadditivity condition of the adaptation of Kingman’s theorem given in [Raq23, §2].

Still assuming only that \({\mathbb {Q}}\) satisfies UD and SLD (and that \({\mathbb {P}}\) is merely shift invariant), the random variables studied in our main theorems are known to satisfy the following almost sure identities, which, as mentioned in the introduction, justify their role as entropy estimators: for \({\mathbb {P}}\)-almost every x, we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n} \ln R_n(x)= h_{{\mathbb {P}}}(x) \end{aligned}$$
(6.5)

and

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n} \ln V_n(x) = h_{{\mathbb {P}}}(x), \end{aligned}$$
(6.6)

and, for \(({\mathbb {P}}\otimes {\mathbb {Q}})\)-almost every \((x,y)\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n} \ln W_n(x,y) = h_{{\mathbb {Q}}}(x). \end{aligned}$$
(6.7)

The convergences expressed in (6.5) and (6.6) were first proved under the assumption that \({\mathbb {P}}\) is ergodic, in which case \(h_{{\mathbb {P}}}(x) = h({\mathbb {P}})\) for \({\mathbb {P}}\)-almost every x; see [WZ89, OW93, Kon98]. However, combining Remark 1 of [Kon98, §2] with a generalization of Kac’s lemma, as found e.g. in [VO16, §1.2.2], shows that shift invariance suffices for (6.5). Since \(R_n(x) \le V_n(x) + n-1\) and \(V_n(x) = R_n(T^{m_n(x)-1}x) - (n-m_n(x))\) with \(m_n(x):= \max \{1\le i\le n: x_i^{i+n-1}=x_1^n\}\), it is straightforward to show that the bounds used in [Kon98, §2] to prove (6.5) via the Borel–Cantelli lemma also imply (6.6). The convergence expressed in (6.7) was first proved for Markov measures, and then under various strong mixing assumptions; see [WZ89, Shi93, MS95, Kon98]. However, it is known that (6.7) may fail for some mixing measures, even when \({\mathbb {P}}={\mathbb {Q}}\); see [Shi93]. In the present setup, the SLD assumption satisfied by \({\mathbb {Q}}\) and (6.2) have been shown to imply (6.7) in [CDEJR23a, §2].
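The two relations between \(R_n\) and \(V_n\) used above can be tested on finite words. The sketch below assumes the conventions \(R_n(x) = \inf \{k\ge 1: x_{k+1}^{k+n} = x_1^n\}\) and \(V_n(x) = \inf \{k\ge 1: x_{k+n}^{k+2n-1} = x_1^n\}\), which are consistent with the two relations but are not restated in this section; the function names are hypothetical, and `None` is returned when no return is observed within the finite word.

```python
from itertools import product

def occurs_at(x, i, n):
    """True if x_i^{i+n-1} equals x_1^n (positions are 1-based)."""
    return x[i - 1:i - 1 + n] == x[:n]

def return_time(x, n):
    """Assumed R_n(x): least k >= 1 with x_{k+1}^{k+n} = x_1^n, or None."""
    for k in range(1, len(x) - n + 1):
        if occurs_at(x, k + 1, n):
            return k
    return None

def nonoverlapping_return_time(x, n):
    """Assumed V_n(x): least k >= 1 with x_{k+n}^{k+2n-1} = x_1^n, or None."""
    for k in range(1, len(x) - 2 * n + 2):
        if occurs_at(x, k + n, n):
            return k
    return None

def last_overlapping_start(x, n):
    """m_n(x): largest 1 <= i <= n with x_i^{i+n-1} = x_1^n."""
    return max(i for i in range(1, n + 1) if occurs_at(x, i, n))

def relations_hold(x, n):
    """Check both relations on a finite word; vacuous if V_n is not observed."""
    v = nonoverlapping_return_time(x, n)
    if v is None:
        return True
    m = last_overlapping_start(x, n)
    r = return_time(x, n)
    r_shifted = return_time(x[m - 1:], n)  # R_n evaluated at T^{m_n(x)-1} x
    return r <= v + n - 1 and v == r_shifted - (n - m)

def exhaustive_check(length, n, alphabet="ab"):
    """Test every word of the given length over the alphabet."""
    return all(relations_hold("".join(w), n)
               for w in product(alphabet, repeat=length))

# Small example with an overlapping return: (R_2, V_2, m_2) for x = aaabaab.
x = "aaabaab"
summary = (return_time(x, 2), nonoverlapping_return_time(x, 2),
           last_overlapping_start(x, 2))
```

For the word `aaabaab` and n = 2, the first return overlaps the initial block, so \(R_2\) is strictly smaller than \(V_2\), while the shifted identity recovers \(V_2\) exactly.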

6.2 Zeroes of the rate functions

In this subsection, we discuss how the LDPs of Theorems 0 and A–C complement the above almost sure results. We assume throughout that the pair \(({\mathbb {P}}, {\mathbb {Q}})\) is admissible, but not necessarily that \({\mathbb {P}}\) satisfies PA. The central role is held by the set

$$\begin{aligned} J:= \{s: I_{{\mathbb {Q}}}(s)=0\} \subseteq [0,\infty ), \end{aligned}$$
(6.8)

which, by convexity of \(I_{{\mathbb {Q}}}\) (see Theorem 0.i), is an interval and coincides with the subdifferential of \(q_{{\mathbb {Q}}}\) at 0. Since \(I_{{\mathbb {Q}}}\) is lower semicontinuous, and since \(\inf _{s\in {\mathbb R}} I_{{\mathbb {Q}}}(s) = 0\), only the three following cases are possible: \(J=[a,b]\) for some \(0\le a\le b<\infty \), \(J=[a,\infty )\) for some \(a\ge 0\), and \(J = \emptyset \), which happens when \(I_{{\mathbb {Q}}}(s)>0\) for all \(s\in {\mathbb R}\) and \(\lim _{s\rightarrow \infty }I_{{\mathbb {Q}}}(s)=0\). When \({\mathbb {Q}}= {\mathbb {P}}\), we necessarily have \(J=[a,b]\) since \(I_{{\mathbb {P}}}\) is a good rate function (recall Theorem 0.iii).
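As a simple illustration of the identification of J with the subdifferential of the pressure at 0, consider the case \({\mathbb {Q}}={\mathbb {P}}\) with \({\mathbb {P}}\) a product measure (an assumption made here for concreteness). Assuming that \(q_{{\mathbb {P}}}(\alpha )\) is the limit of \(\frac{1}{n}\ln \sum _u {\mathbb {P}}_n(u)^{1-\alpha }\), which matches the expression used for \(q_{{\mathbb {P}}}(-1)\) in the proof of Proposition 5.5, one gets \(q_{{\mathbb {P}}}(\alpha ) = \ln \sum _a p_a^{1-\alpha }\). This function is differentiable, so J should reduce to the single point \(q_{{\mathbb {P}}}'(0) = -\sum _a p_a \ln p_a\). The sketch below compares a finite-difference slope with the entropy; all names and the sample distribution are hypothetical.

```python
import math

def pressure(p, alpha):
    """Assumed form q_P(alpha) = ln sum_a p_a^(1 - alpha) for a product measure."""
    return math.log(sum(pa ** (1.0 - alpha) for pa in p))

def entropy(p):
    """Shannon entropy -sum_a p_a ln p_a of the single-letter distribution."""
    return -sum(pa * math.log(pa) for pa in p)

def slope_at_zero(p, step=1e-6):
    """Central finite-difference approximation of the derivative of q_P at 0."""
    return (pressure(p, step) - pressure(p, -step)) / (2.0 * step)

p = [0.7, 0.2, 0.1]  # hypothetical letter probabilities
slope = slope_at_zero(p)
h = entropy(p)
```

Here `slope` and `h` agree up to discretization error, in line with J being the singleton containing the entropy in this example.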

Lemma 6.1

Let \(({\mathbb {P}}, {\mathbb {Q}})\) be admissible. Then, the rate function \(I_W\) of (1.19) vanishes precisely on J. Moreover, if \({\mathbb {Q}}={\mathbb {P}}\), the same is true of the rate functions \(I_V\) of (1.22) and \(I_R\) of (1.25).

Proof

The statement for \(I_W\) is an immediate consequence of (1.19). Appealing to Remark 1.11, the statement for \(I_V\) is immediate. In order to extend the result to \(I_R\), it remains to note that \(I_R(0) = 0\) if and only if \(\gamma _+ = 0\), which, by (1.15), happens if and only if \(I_{{\mathbb {P}}}(0) = 0\), i.e. if and only if \(0\in J\). \(\square \)

We briefly comment on the simple case where J is a singleton, which occurs in particular in Examples 2.1–2.3, and which corresponds to the property of exponential rates for entropy in the terminology of [Shi96, Chap. III], [JB13] and [CRS18]. In this case, we obtain from Lemma 6.1 that the rate functions of Theorems 0 and A–C all vanish at a single point, say a. In this situation, a standard argument shows that, almost surely, all limits in (6.2) and (6.5)–(6.7) exist and are equal to a; see e.g. [Ell06, §II.6] and the argument in the proof of Proposition 6.2.i below. In this simple case, our LDPs thus imply the almost sure convergence results reviewed in Sect. 6.1.

When J is an interval of positive length, our LDPs do not imply these almost sure convergence results anymore. Still, all displayed equations in Sect. 6.1 remain valid under our assumptions (see the references given there), and we show in the following two propositions that the limiting values are constrained by J.

Proposition 6.2

Let \(({\mathbb {P}}, {\mathbb {Q}})\) be an admissible pair. Then, the following hold:

  1. i.

    If J is nonempty, then \(h_{\mathbb {Q}}(x) \in J\) for \({\mathbb {P}}\)-almost all x and \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\in J\).

  2. ii.

    If J is empty, then \(h_{\mathbb {Q}}(x) =\infty \) for \({\mathbb {P}}\)-almost all x and \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}}) = \infty \).

  3. iii.

    If J is a singleton, then \(J=\{h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\}\), \(h_{\mathbb {Q}}(x) = h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\) for \({\mathbb {P}}\)-almost all x, and the convergence in (6.2) (resp. (6.7)) occurs in \(L^p({\mathbb {P}})\) (resp. \(L^p({\mathbb {P}}\otimes {\mathbb {Q}})\)) for all \(p\ge 1\).

Proof

We proceed with the proof, taking (6.4), (6.7) as well as the almost sure existence of the limit in (6.2) for granted.

  1. i.

    Suppose that the interval J is nonempty. Then, by convexity,

    $$\begin{aligned} \inf _{s: {\text {dist}}(s, J)\ge \delta }I_{{\mathbb {Q}}}(s) > 0 \end{aligned}$$

    for every \(\delta >0\). Therefore, by the large-deviation upper bound, the probability \({\mathbb {P}}\{x: {\text {dist}}(-\frac{1}{n} \ln {\mathbb {Q}}_n(x_1^n), J)\ge \delta \}\) decreases exponentially fast and a standard application of the Borel–Cantelli lemma, followed by taking the limit \(\delta \rightarrow 0\), implies that \(\inf J \le h_{\mathbb {Q}}(x) \le \sup J\) for \({\mathbb {P}}\)-almost every x. By (6.4), we obtain \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\in J\).

  2. ii.

    If \(J=\emptyset \), a similar argument applies to \({\mathbb {P}}\{x: -\frac{1}{n} \ln {\mathbb {Q}}_n(x_1^n) \le K\}\) with K arbitrarily large, forcing \(h_{\mathbb {Q}}(x)\) to properly diverge to \(\infty \) for \({\mathbb {P}}\)-almost every x. By (6.4), we conclude that also \(h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}}) = \infty \).

  3. iii.

    If J is a singleton, by Part i we conclude that \(J =\{h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\}\) and that \(h_{\mathbb {Q}}(x) = h_{\text {c}}({\mathbb {P}}|{\mathbb {Q}})\) for \({\mathbb {P}}\)-almost all x. We now claim that there exists \(\alpha > 0\) such that \(q_{{\mathbb {Q}}}(\alpha )<\infty \). Indeed, if this were not the case, we would have \(I_{{\mathbb {Q}}}(s) = q_{{\mathbb {Q}}}^*(s)=\sup _{\alpha \le 0}(\alpha s -q_{{\mathbb {Q}}}(\alpha ))\) for all \(s\in {\mathbb R}\), so \(I_{{\mathbb {Q}}}\) would be nonincreasing, which contradicts the assumption that J is a singleton. For such \(\alpha \), note that the function \(\phi : s \mapsto \exp (\alpha s^{1/p})\) grows superlinearly and that

    $$\begin{aligned} \limsup _{n\rightarrow \infty } \int \phi \left( |\tfrac{1}{n} \ln {\mathbb {Q}}_n|^p\right) \textrm{d}{\mathbb {P}}\le \exp \left( q_{{\mathbb {Q}}}(\alpha )\right) \end{aligned}$$
    (6.9)

    by Jensen’s inequality applied to the concave function \(t\mapsto \root n \of {t}\). Hence, de la Vallée Poussin’s criterion guarantees the uniform integrability required for the convergence in (6.2) to hold in \(L^p({\mathbb {P}})\). The exact same argument applies to (6.7): we just need to replace (6.9) with

    $$\begin{aligned} \limsup _{n\rightarrow \infty } \int \phi \left( (\tfrac{1}{n} \ln W_n)^p\right) \textrm{d}({\mathbb {P}}\otimes {\mathbb {Q}}) \le \exp \left( q_W(\alpha )\right) \end{aligned}$$

    and to recall that, by (1.20) and since \(\alpha >0\), we have \(q_W(\alpha )=q_{{\mathbb {Q}}}(\alpha )<\infty \).\(\square \)
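For the reader's convenience, the Jensen step behind (6.9) can be written out as follows, assuming (as the use of \(q_{{\mathbb {Q}}}\) in this proof suggests) that \(q_{{\mathbb {Q}}}(\alpha )\) is the limit of \(\frac{1}{n}\ln \int {\mathbb {Q}}_n(x_1^n)^{-\alpha }\textrm{d}{\mathbb {P}}(x)\). Since \(\phi (t^p) = \textrm{e}^{\alpha t}\) for \(t\ge 0\) and \(|\ln {\mathbb {Q}}_n| = -\ln {\mathbb {Q}}_n\), we have

$$\begin{aligned} \int \phi \left( |\tfrac{1}{n} \ln {\mathbb {Q}}_n|^p\right) \textrm{d}{\mathbb {P}}= \int {\mathbb {Q}}_n(x_1^n)^{-\alpha /n} \textrm{d}{\mathbb {P}}(x) \le \left( \int {\mathbb {Q}}_n(x_1^n)^{-\alpha } \textrm{d}{\mathbb {P}}(x)\right) ^{1/n}, \end{aligned}$$

where the inequality is Jensen's inequality for the concave map \(t\mapsto t^{1/n}\). Under the stated assumption, the limit superior of the right-hand side as \(n\rightarrow \infty \) equals \(\exp (q_{{\mathbb {Q}}}(\alpha ))\).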

In the next proposition, we consider the special case where \({\mathbb {Q}}={\mathbb {P}}\), which allows us to include the limits (6.5) and (6.6) in the discussion. Notice also that even when discussing (6.5), the measure \({\mathbb {P}}\) is not assumed to satisfy PA.

Proposition 6.3

Assume \({\mathbb {P}}\) satisfies UD and SLD, and let J be defined by (6.8) with \({\mathbb {Q}}={\mathbb {P}}\). Then, the following hold:

  1. i.

    \(J\subseteq [-\gamma _+,-\gamma _-]\).

  2. ii.

    \(h_{\mathbb {P}}(x) \in J\) for \({\mathbb {P}}\)-almost every x, and \(h({\mathbb {P}}) \in J\).

  3. iii.

    If J is a singleton, then \(J=\{h({\mathbb {P}})\}\) and \(h_{\mathbb {P}}(x) = h({\mathbb {P}})\) for \({\mathbb {P}}\)-almost every x. Moreover, for all \(p\ge 1\), the convergence in (6.1), (6.5) and (6.6) holds in \(L^p({\mathbb {P}})\), and that of (6.7) holds in \(L^p({\mathbb {P}}\otimes {\mathbb {P}})\).

Proof

Part i is immediate by (1.15) and Part ii is a special case of Proposition 6.2.i. All statements in Part iii except the one about (6.5) are either special cases of Proposition 6.2.iii or proved in exactly the same way (possibly using the simplification offered by (1.16)).

The statement about (6.5) requires additional considerations to lift PA. As discussed at the end of Sect. 4.2, the large-deviation upper bound of Theorem C still holds without assuming PA, and we have \({\overline{q}}_R\le I_R^*\), with \({\overline{q}}_R\) defined by taking the limit superior in (1.4). Then, (6.9) is simply replaced by

$$\begin{aligned} \limsup _{n\rightarrow \infty } \int \phi \left( (\tfrac{1}{n} \ln R_n)^p\right) \textrm{d}{\mathbb {P}}\le \exp \left( {\overline{q}}_R(\alpha )\right) . \end{aligned}$$

Since \({\overline{q}}_R(\alpha ) \le I_R^*(\alpha ) = q_{{\mathbb {P}}}(\alpha )\) for all \(\alpha \ge 0\) by (1.26), the same argument applies once more. \(\square \)

Remark 6.4

The \({\mathbb {P}}\)-essential image of \(h_{\mathbb {P}}(x)\) can be a strict subset of J.