1 Introduction

We study in this paper a notion of asymptotic equivalence of probability measures that generalizes the equivalence of the well-known microcanonical and canonical ensembles in the thermodynamic limit (see [1] and references therein). The basic problem that we consider can be defined in a general way as follows. Let \(M_n\) be a random variable defined with respect to two probability measures \(P_n\) and \(Q_n\) indexed by \(n\in {\mathbb {N}}\). Can we establish conditions on these measures such that

$$\begin{aligned} E_{P_n}[M_n]=E_{Q_n}[M_n], \end{aligned}$$
(1)

where \(E[\,\cdot \,]\) denotes the expectation? For a fixed \(n<\infty \), it is unlikely that such conditions exist beyond the obvious requirement that \(P_n=Q_n\) almost everywhere. In the limit \(n\rightarrow \infty \), however, it is possible for two different measures to concentrate on the same value so as to give the same expectation. The aim of this paper is to put “bounds” on the differences between \(P_n\) and \(Q_n\) that guarantee that this concentration, which is related to the law of large numbers, holds for a large class of random variables. Physically, this means that two probabilistic models of a given system can predict the same typical or macroscopic properties of that system even if the models are different.

The framework that we use to study this problem is the theory of large deviations [2,3,4]. We assume that the random variable \(M_n\) satisfies the large deviation principle (LDP) with respect to \(P_n\) and \(Q_n\) and define the set of concentration points of \(M_n\) relative to either measure as the set of global minima and zeros of their respective rate function. In many applications, this set reduces to a single value, which then represents the typical value of \(M_n\) (relative to \(P_n\) or \(Q_n\)) on which its expectation concentrates exponentially as \(n\rightarrow \infty \) (again relative to \(P_n\) or \(Q_n\)). In this context, the problem that we consider is: Under what conditions is the set of concentration points of \(M_n\) relative to \(P_n\) equal to the set of concentration points of \(M_n\) relative to \(Q_n\)? In other words, under what conditions are the typical values of \(M_n\) the same?

To answer these questions, we formulate in Sect. 2 some large deviation results related to the Radon–Nikodym derivative of \(P_n\) relative to \(Q_n\), which can be seen as a random variable with respect to either measure, and then use these results in Sect. 3 to prove essentially the following: If the Radon–Nikodym derivative is approximately equal to 1 almost everywhere, on the logarithmic scale defined by the LDP, then the two sets of concentration points of \(M_n\) obtained relative to \(P_n\) and \(Q_n\) are the same (see the main Theorem 3). This condition on the Radon–Nikodym derivative defines, as explained in Sect. 2, a general notion of asymptotic equivalence of measures from which we can summarize our main result as follows: If \(P_n\) and \(Q_n\) are asymptotically equivalent, then they are also equivalent at the level of typical values of \(M_n\).

This result is known to hold for specific conditional and exponentially-tilted measures, corresponding in statistical physics to the microcanonical and canonical ensembles, respectively [1]. The contribution of this paper is to extend this asymptotic equivalence to a larger class of probability measures, defining general probabilistic models and stochastic processes, under precise large deviation hypotheses stated below. This extension has its source in recent works applying classical ensemble theory to describe the paths of nonequilibrium processes (see, e.g., [5,6,7,8,9,10,11]) and relies on a special symmetry property, referred to as the fluctuation relation (see [12] for a review) that characterizes the fluctuations of physical quantities related to these processes. Another source is the study of random graphs, such as the Erdös–Rényi graph model and its variants, which become equivalent under some conditions in the infinite-volume limit [13,14,15,16,17].

A formal result of Mori [18] pointed recently to this general equivalence for quantum systems, based on bounds on the relative entropy. The approach followed here was developed independently and is completely different: it is based on the general language of probability measures and their Radon–Nikodym derivative, and so covers both “static” and “dynamic” processes. This is illustrated in Sect. 4 with many applications related to sequences of random variables, equilibrium particle systems, random graphs, in addition to Markov processes evolving in discrete and continuous time. For this last application, our results provide conditions under which two stochastic processes, representing, for example, two different models for an information source or a nonequilibrium process, cannot be distinguished at the level of ergodic averages or stationary states. We also revisit in that section the equivalence of the microcanonical and canonical ensembles to clearly explain how our results extend the equivalence of classical ensembles in statistical physics.

2 General Framework

2.1 Notations

We consider two probability measures \(P_n\) and \(Q_n\) on a space \(\Omega _n\), with \(n\in {\mathbb {N}}\), which define technically two sequences of probability spaces. Following the introduction, we also consider a random variable \(M_n:\Omega _n\rightarrow {\mathcal {M}}\), called a macrostate or observable, which is a function from the space \(\Omega _n\) to a Polish space \({\mathcal {M}}\), that is, a complete separable metric space [3].

We give examples in Sect. 4 of different measures and macrostates. To fix the ideas, it is useful to picture \(\Omega _n\) as the space of microscopic configurations of a system of n particles and \(P_n\) and \(Q_n\) as two probability distributions or statistical ensembles determining the likelihood of a configuration or microstate denoted by \(\omega =(\omega _1,\omega _2,\ldots ,\omega _n)\in \Omega _n\), where \(\omega _i\) is the state of the ith particle taking values in some set \(\Omega \) so that \(\Omega _n=\Omega ^n\). In this case, \(M_n\) could represent the total energy of the system, for example, or its magnetization if we consider a spin model. Alternatively, \(\omega _i\in \Omega \) could be the state of a stochastic process at time i, so that \(\omega =(\omega _1,\omega _2,\ldots ,\omega _n)\) is a path of the process from time 1 to time n and \(\Omega _n=\Omega ^n\) is the set of all such paths. The observable \(M_n\) in that case is a functional of the paths, which often takes the form of an additive or ergodic average

$$\begin{aligned} M_n = \frac{1}{n}\sum _{i=1}^n f(\omega _i), \end{aligned}$$
(2)

where f is some function of \(\Omega \), e.g., a real-valued function, in which case \({\mathcal {M}}\) is simply \({\mathbb {R}}\). The measures \(P_n\) and \(Q_n\) then represent two different models for the stochastic process inducing two distributions for \(M_n\).

To compare these two measures, we use the Radon–Nikodym derivative (RND) of \(P_n\) relative to \(Q_n\), denoted by

$$\begin{aligned} R_n = \frac{dP_n}{dQ_n}. \end{aligned}$$
(3)

This quantity establishes, as is well known, a bridge between expectations relative to \(P_n\) and \(Q_n\) as follows:

$$\begin{aligned} E_{P_n}[\,\cdot \,] = E_{Q_n}[R_n\,\cdot \,]. \end{aligned}$$
(4)

In particular,

$$\begin{aligned} P_n(B) = E_{P_n}[{\mathbf {1}}_B]= E_{Q_n}[R_n {\mathbf {1}}_B], \end{aligned}$$
(5)

where \({\mathbf {1}}_B\) is the indicator or characteristic function of the set B.

The RND, as a function \(R_n(\omega )\) of the elements \(\omega \in \Omega _n\), is a real random variable having different distributions in general relative to \(P_n\) and \(Q_n\). To discuss the properties of these distributions, we will make the simplifying assumption throughout this paper that \(P_n\) and \(Q_n\) have the same support on \(\Omega _n\), so that \(P_n\) is absolutely continuous with respect to \(Q_n\) and \(Q_n\) is absolutely continuous with respect to \(P_n\). In this case, \(R_n\) is finite and strictly positive almost surely on the support of \(P_n\) or \(Q_n\). The action \(W_n\), defined by

$$\begin{aligned} W_n = -\frac{1}{n}\log R_n, \end{aligned}$$
(6)

is then also a real and finite random variable on the support of \(P_n\) or \(Q_n\). Up to the factor \(-1/n\), \(W_n\) is just the log-likelihood ratio of \(P_n\) relative to \(Q_n\).

The reason for introducing the action is that, in many applications of interest, the RND behaves exponentially with n, so that its fluctuations are more conveniently studied by transforming it, as is common in large deviation theory, to a random variable whose distribution relative to \(P_n\) or \(Q_n\) concentrates in the limit \(n\rightarrow \infty \). The main insight needed for proving equivalence of measures is to analyze this concentration using large deviation theory.

2.2 Large Deviation Principles

The macrostate \(M_n\) and the action \(W_n\) are two random variables relative to \(P_n\) or \(Q_n\). The goal, following the introduction, is to compare the typical values of \(M_n\) obtained under each measure by analyzing, via the distribution of \(W_n\), the differences between these measures. The main hypothesis used to establish this comparison, which is the central hypothesis of this work, is that \(M_n\) and \(W_n\) jointly satisfy the large deviation principle, defined as follows.

Let \({\mathcal {Y}}\) be a Polish space, \(Y_n\) a sequence of random variables mapping \(\Omega _n\) into \({\mathcal {Y}}\), \(P_n\) a sequence of measures on \(\Omega _n\), and I a lower semi-continuous function that maps \({\mathcal {Y}}\) to \([0,\infty ]\) with compact level sets. For any subset \(A\subseteq {\mathcal {Y}}\), define

$$\begin{aligned} I(A) = \inf _{y\in A} I(y). \end{aligned}$$
(7)

We say that \(Y_n\) satisfies the large deviation principle (LDP) with respect to \(P_n\) with rate function I if

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n}\log P_n(Y_n\in C)\le -I(C) \end{aligned}$$
(8)

for any closed subset C of \({\mathcal {Y}}\) and

$$\begin{aligned} \liminf _{n\rightarrow \infty } \frac{1}{n}\log P_n(Y_n\in O)\ge -I(O) \end{aligned}$$
(9)

for any open subset O of \({\mathcal {Y}}\). The function I(y), which is called the rate function, is known to be unique and non-negative, \(I\ge 0\) [2,3,4]. Its domain is the set of values \(y\in {\mathcal {Y}}\) for which \(I(y)<\infty \).

The LDP translates in technical terms the fact that the distribution of \(Y_n\) decays exponentially in n, except on sets such that \(I=0\). In many applications, the two large deviation bounds above are found to be the same for “normal” sets A, such as closed intervals or compact balls, which leads to

$$\begin{aligned} \lim _{n\rightarrow \infty }-\frac{1}{n}\log P_n(Y_n \in A) =I(A). \end{aligned}$$
(10)

In the case where \({\mathcal {Y}}\) is a Euclidean space and \(Y_n\) has a density \(p_n(y)\) with respect to the Lebesgue measure, we can also write more simply

$$\begin{aligned} \lim _{n\rightarrow \infty } -\frac{1}{n}\log p_n(y) =I(y), \end{aligned}$$
(11)

which clearly shows that the leading behaviour of the density of \(Y_n\) is a decaying exponential in n, except where \(I(y)=0\), with corrections in the exponential that are smaller than linear in n. In the large deviation and information theory literature [2,3,4, 19], this exponential scaling or approximation is often taken to define a logarithmic equivalence expressed by

$$\begin{aligned} p_n(y) \asymp e^{-nI(y)} \end{aligned}$$
(12)

or

$$\begin{aligned} P_n(Y_n\in A)\asymp e^{-nI(A)}. \end{aligned}$$
(13)

In this sense, \(a_n\asymp b_n\) means that \(a_n\) and \(b_n\) are equal up to \(e^{o(n)}\) corrections in n or, more precisely,

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}\log \frac{a_n}{b_n} =0. \end{aligned}$$
(14)
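
As a concrete illustration of this logarithmic equivalence, the following short Python sketch (our own illustrative example, not taken from the references) computes the exact tail probability of the mean of n fair coin flips and compares \(-\frac{1}{n}\log P_n(Y_n\ge a)\) with the Cramér rate function \(I(a)=a\log (2a)+(1-a)\log (2(1-a))\); the two agree up to \(o(1)\) corrections, as expressed by (13).

```python
import math

def log_tail(n, a):
    """log P_n(Y_n >= a) for Y_n the mean of n fair coin flips (exact binomial sum)."""
    terms = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
             - n * math.log(2) for j in range(math.ceil(a * n), n + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def rate(a):
    """Cramer rate function of the fair-coin mean: I(a) = a log(2a) + (1-a) log(2(1-a))."""
    return a * math.log(2 * a) + (1 - a) * math.log(2 * (1 - a))

a = 0.6
for n in [100, 1000, 10000]:
    print(n, -log_tail(n, a) / n, rate(a))   # the first value slowly approaches I(a), as in (13)
```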

With these definitions, we express our main hypotheses as follows.

Hypotheses 1

  • The couple \((M_n,W_n)\) satisfies, as a random variable on the product space \({\mathcal {M}}\times {\mathbb {R}}\), the LDP relative to \(P_n\) with joint rate function \(K_P\);

  • \((M_n,W_n)\) satisfies the LDP relative to \(Q_n\) with joint rate function \(K_Q\);

  • \(K_P\) and \(K_Q\) have the same domain.

These hypotheses are satisfied in many applications. The first one means essentially that

$$\begin{aligned} p_n(M_n=m,W_n= w) \asymp e^{-n K_P(m,w)}, \end{aligned}$$
(15)

assuming formally that the joint probability density of \(M_n\) and \(W_n\) exists. A similar result holds for \(Q_n\) with the rate function \(K_Q\). In the absence of densities, the meaning of the LDP is as defined above with the upper and lower bounds. In all cases, our prior assumption that \(P_n\) and \(Q_n\) have the same support is reflected in the hypothesis that \(K_P\) and \(K_Q\) have the same domain.

In general, it is known that having the joint LDP for two random variables implies that each random variable also satisfies the LDP. This marginalization of the LDP can be derived from the definition of this principle or from the so-called contraction principle [3, Thm. 4.2.1], and leads to variational formulas for the marginal rate functions of \(M_n\) and \(W_n\).

Proposition 1

Under Hypotheses 1, \(M_n\) satisfies the LDP relative to \(P_n\) with marginal rate function

$$\begin{aligned} J_P(m)=\inf _{w\in {\mathbb {R}}} K_P(m,w) \end{aligned}$$
(16)

and the LDP relative to \(Q_n\) with marginal rate function

$$\begin{aligned} J_Q(m) =\inf _{w\in {\mathbb {R}}} K_Q(m,w). \end{aligned}$$
(17)

This rate function has the same domain as \(J_P\). Similarly, \(W_n\) satisfies the LDP relative to \(P_n\) and \(Q_n\) with rate functions

$$\begin{aligned} I_P(w) = \inf _{m\in {\mathcal {M}}} K_P(m,w) \end{aligned}$$
(18)

and

$$\begin{aligned} I_Q(w)=\inf _{m\in {\mathcal {M}}} K_Q(m,w), \end{aligned}$$
(19)

respectively, having the same domain.

These formulae can be justified easily in terms of densities by applying the LDP and the Laplace principle for approximating exponential integrals. Considering, for example, the marginalization of \(W_n\) in

$$\begin{aligned} p_n(M_n=m) = \int _{\mathbb {R}}p_n(m,w)dw\asymp \int _{\mathbb {R}}e^{-nK_P(m,w)}dw \end{aligned}$$
(20)

leads to

$$\begin{aligned} p_n(M_n= m) \asymp \exp \left( -n\inf _{w\in {\mathbb {R}}} K_P(m,w)\right) =\exp \left( -n J_P(m)\right) \end{aligned}$$
(21)

in the limit \(n\rightarrow \infty \).
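
The Laplace approximation behind (20)–(21) is easy to check numerically. The sketch below uses a simple model rate function \(K(m,w)\) (an arbitrary choice made only for illustration) and shows that \(-\frac{1}{n}\log \int e^{-nK(m,w)}dw\) approaches \(\inf _w K(m,w)\) as n grows.

```python
import numpy as np

def K(m, w):
    """A model joint rate function, chosen arbitrarily for this illustration."""
    return 0.5 * (w - m) ** 2 + 0.5 * m ** 2

w = np.linspace(-10.0, 10.0, 20001)
dw = w[1] - w[0]
m = 1.5
exact = np.min(K(m, w))                       # inf_w K(m, w), the claimed value of J_P(m)

for n in [10, 100, 1000]:
    # -(1/n) log of the integral of e^{-n K(m, w)} over w, via a stable log-sum-exp
    e = -n * K(m, w)
    log_integral = e.max() + np.log(np.sum(np.exp(e - e.max())) * dw)
    print(n, -log_integral / n, exact)        # the two columns agree up to O((log n)/n)
```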

We give next a rigorous proof for measures based on the contraction principle of large deviation theory [3], which is itself an application of the Laplace principle [4, 20].

Proof

The contraction principle states that, if \(Y_n\) satisfies the LDP with rate function I, then \(Z_n=f(Y_n)\) satisfies the LDP with rate function

$$\begin{aligned} J(z) = \inf _{y:f(y)=z} I(y) \end{aligned}$$
(22)

if the “contraction” function f is continuous [3, Thm. 4.2.1].

In the case of marginalizing, for example, from \((M_n,W_n)\) to \(M_n\), the contraction function is simply the projection \(f(m,w)=m\), which is continuous in the product topology of \({\mathcal {M}}\times {\mathbb {R}}\). Therefore,

$$\begin{aligned} J_P(m)=\inf _{w\in {\mathbb {R}}:f(m,w)=m}K_P(m,w)=\inf _{w\in {\mathbb {R}}} K_P(m,w). \end{aligned}$$
(23)

All other contractions follow in the same way. Moreover, the fact that the marginal rate functions have the same domain simply follows from our assumption that the joint rate functions have the same domain. \(\square \)

Many techniques can be used to derive the LDP for \(M_n\) and \(W_n\) and the corresponding rate function, though the derivation of LDPs is, as always, a difficult problem. In the case of equilibrium many-particle systems, one can use the contraction principle for observables \(M_n\) that admit a representation function, as described in Sect. 5.3.4 of [20], or contractions based on the so-called level 3 of large deviations [2], which are however very difficult to work with. For Markov processes, one can also use the contraction principle when considering observables that depend on both the state of the process and its jumps or increments [10]. In this case, the contraction is applied to the level 2.5 of large deviations, which involves explicit LDPs for the empirical measure and empirical current [21,22,23]. Finally, when \(M_n\) takes values in \({\mathbb {R}}^d\) one can use the Gärtner–Ellis Theorem, which is based on the following function:

$$\begin{aligned} \lambda _P(k,\eta ) =\lim _{n\rightarrow \infty }\frac{1}{n}\log E_{P_n} [e^{n\langle k, M_n\rangle +n\eta W_n}], \end{aligned}$$
(24)

called the scaled cumulant generating function. Here \(k\in {\mathbb {R}}^d\), \(\langle \cdot ,\cdot \rangle \) is the standard scalar product, and \(\eta \in {\mathbb {R}}\). Provided that this function exists in an open neighbourhood of the origin and is “steep” (see [2,3,4] for details), this theorem states that \((M_n,W_n)\) satisfies the LDP with rate function \(K_P\) given by the Legendre–Fenchel transform of \(\lambda _P\):

$$\begin{aligned} K_P(m,w) = \sup _{k\in {\mathbb {R}}^d,\eta \in {\mathbb {R}}}\{\langle k,m\rangle +\eta w-\lambda _P(k,\eta )\}. \end{aligned}$$
(25)

For independent and identically distributed random variables, \(\lambda _P\) reduces to a simple cumulant function, while for Markov processes it is given by the dominant eigenvalue of a matrix or linear operator [3, 24, 25].
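
To illustrate the Gärtner–Ellis mechanics in the simplest setting, the sketch below takes the known scaled cumulant generating function of the sample mean of iid exponential random variables, \(\lambda (k)=-\log (1-k)\) for \(k<1\), and computes its Legendre–Fenchel transform on a grid, recovering the known rate function \(I(m)=m-1-\log m\). The joint transform (25) works in the same way, with the supremum taken over \((k,\eta )\).

```python
import numpy as np

# Known SCGF of the sample mean of iid Exp(1) random variables: lambda(k) = -log(1 - k), k < 1
ks = np.linspace(-20.0, 0.999, 200001)
scgf = -np.log(1.0 - ks)

def rate_function(m):
    """Legendre-Fenchel transform I(m) = sup_k { k m - lambda(k) }, computed on the grid."""
    return np.max(ks * m - scgf)

for m in [0.5, 1.0, 2.0]:
    print(m, rate_function(m), m - 1.0 - np.log(m))   # numerical transform vs known rate function
```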

2.3 Typical Sets and Values

Since rate functions are non-negative, we have as a consequence of Prop. 1 that, if \((m^*,w^*)\) is a zero of \(K_P\), then \(m^*\) must be a zero of \(J_P(m)\) and \(w^*\) must be a zero of \(I_P(w)\). A similar result holds relative to \(Q_n\).

The zeros of rate functions will play an important role in what follows, so it is important to discuss their interpretation. To this end, let us consider the rate function \(J_P\) describing the large deviations of \(M_n\) relative to \(P_n\), and let \({\mathcal {E}}_P\) denote the set of zeros of \(J_P\), which also corresponds to the set of global minima of \(J_P\):

$$\begin{aligned} {\mathcal {E}}_P=\{m\in {\mathcal {M}}: J_P(m)=0\}. \end{aligned}$$
(26)

Because \(J_P\) has compact level sets, \({\mathcal {E}}_P\) is compact and non-empty.

In general, \({\mathcal {E}}_P\) represents the typical set on which the distribution of \(M_n\) concentrates in the limit \(n\rightarrow \infty \). To be more precise, it can be proved (see [26, Thm. 2.5]) that the sequence \(P_n(M_n\in \,\cdot \,)\) converges weakly to a probability measure \(\Pi \) on \({\mathcal {M}}\) such that \(\Pi ({\mathcal {E}}_P)=1\). This follows because the probability of any closed set that does not intersect \({\mathcal {E}}_P\) decays exponentially in n as a result of the LDP, so that \(P_n(M_n\in \,\cdot \,)\) must concentrate on \({\mathcal {E}}_P\) as \(n\rightarrow \infty \). For this reason, \({\mathcal {E}}_P\) is called the concentration set or the typical set of \(M_n\) relative to \(P_n\).

If \(J_P\) has a unique minimum and zero \( m^*\), then the sequence \(P_n(M_n\in \,\cdot \,)\) converges weakly to the delta measure \(\delta _{m^*}\) [26, Thm. 2.5], so that \(m^*\) is the unique concentration or typical value of \(M_n\). In this case, \(M_n\) satisfies a weak law of large numbers in the sense that

$$\begin{aligned} \lim _{n\rightarrow \infty } P_n(\Vert M_n-m^*\Vert > \epsilon )=0, \end{aligned}$$
(27)

where \(\epsilon \) is any positive real number and \(\Vert \cdot \Vert \) is a metric on \({\mathcal {M}}\). We then also say that \(M_n\rightarrow m^*\) in probability (relative here to \(P_n\)).

These notions of typical sets and values can be applied to any of the rate functions defined before. In applications, it is more common to find that a random variable satisfying the LDP has a unique typical value than an “extended” typical set, so we focus here mainly on the former type of concentration. In general, a random variable has a unique typical value if its rate function is strictly convex.

2.4 Fluctuation Relations

The different rate functions defined up to now are not independent, since probabilities obtained with \(P_n\) can be expressed, as shown in (5), as modified expectations with respect to \(Q_n\) that involve the RND. This leads, as shown next, to a simple relation between the rate functions involving the action, referred to in statistical physics as fluctuation relations [27].

Proposition 2

The joint rate functions \(K_P\) and \(K_Q\) of \((M_n,W_n)\) are related by

$$\begin{aligned} K_P(m,w) = w+K_Q(m,w) \end{aligned}$$
(28)

for all \(m\in {\mathcal {M}}\) and \(w\in {\mathbb {R}}\). Similarly, the marginal rate functions \(I_P\) and \(I_Q\) of \(W_n\) satisfy

$$\begin{aligned} I_P(w) = w+I_Q(w) \end{aligned}$$
(29)

for all \(w\in {\mathbb {R}}\).

This result is obvious if we assume again that densities exist. Then the Radon–Nikodym formula (5) simply becomes

$$\begin{aligned} p_n(m,w)=E_{Q_n}[e^{-nW_n}\delta (M_n-m)\delta (W_n-w)]=e^{-nw}q_n(m,w). \end{aligned}$$
(30)

The proof next translates this observation for measures using another important result of large deviation theory known as Varadhan’s lemma. We refer to [28, Thm. 1.3.4] or [3, Thm. 4.3.1] for the general formulation of this result.

Proof

The probability measure

$$\begin{aligned} P_n(M_n\in A,W_n\in B)= \int _A\int _B P_n(dm,dw) \end{aligned}$$
(31)

is equivalent, using the Radon–Nikodym formula (5), to

$$\begin{aligned} P_n(M_n\in A,W_n\in B) = \int _A\int _B e^{-nw} Q_n(dm,dw). \end{aligned}$$
(32)

This has the form of an exponential integral with \(X^n=(M_n,W_n)\) and \(h(x)=-w\) in the notations of Theorem 1.3.4 of [28]. The function h in our case is not bounded. However, since \(R_n\) is strictly positive on the support of \(P_n\) and since the compact level sets of \(K_P\) force \(\inf _{w\le -C}K_P(A,w)\rightarrow \infty \) as \(C\rightarrow \infty \), the large deviation upper bound

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n}\log P_n(M_n\in A,W_n\le -C)\le -\inf _{w\le -C}K_P(A,w) \end{aligned}$$
(33)

implies

$$\begin{aligned} \lim _{C\rightarrow \infty }\limsup _{n\rightarrow \infty } \frac{1}{n}\log P_n(M_n\in A,W_n\le -C)=-\infty \end{aligned}$$
(34)

for any measurable A. Therefore, the technical condition stated in [28, Thm. 1.3.4] is satisfied, leading to the main result

$$\begin{aligned} \lim _{n\rightarrow \infty } -\frac{1}{n}\log P_n(M_n\in A,W_n\in B)=\inf _{m\in A,w\in B} \{w+K_Q(m,w)\}. \end{aligned}$$
(35)

Since rate functions are unique [3, Lem. 4.1.4], the right-hand side must be the rate function of \((M_n,W_n)\) relative to \(P_n\), which proves (28).

The same reasoning applied to \(P_n(W_n\in A)\) yields (29). Alternatively, we can derive (29) more directly by applying the contraction principle to marginalize \(M_n\) from (28) following Prop. 1. \(\square \)

The relations (28) and (29) are interpreted in statistical physics as symmetries on rate functions that impose general constraints on the fluctuations of nonequilibrium processes (see [12] for a review). In this context, \(P_n\) refers to the probability measure of a stationary Markov process modelling a nonequilibrium process, \(Q_n\) is the probability measure of the time-reversed process, and \(W_n\) is then called the entropy production. We will come back to this example in Sect. 4.
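
For a concrete check of Proposition 2, consider two product Bernoulli measures: each \(X_i\in \{0,1\}\) has success probability a under \(P_n\) and b under \(Q_n\) (an illustrative choice of ours). The action then depends on a configuration only through its number of ones, and the finite-n identity \(P_n(W_n=w)=e^{-nw}Q_n(W_n=w)\), the discrete analogue of (30), can be verified exactly, which implies the symmetry (29) at the level of rate functions.

```python
import math

# Product Bernoulli measures on {0,1}^n: success probability a under P_n and b under Q_n
a, b, n = 0.3, 0.6, 50

def log_binom(n, s):
    return math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)

for s in range(0, n + 1, 10):
    # The action depends on a configuration only through its number s of ones
    w = (s * math.log(b / a) + (n - s) * math.log((1 - b) / (1 - a))) / n
    log_P = log_binom(n, s) + s * math.log(a) + (n - s) * math.log(1 - a)   # log P_n(W_n = w)
    log_Q = log_binom(n, s) + s * math.log(b) + (n - s) * math.log(1 - b)   # log Q_n(W_n = w)
    print(s, log_P - (log_Q - n * w))   # finite-n analogue of (30): all entries are ~ 0
```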

3 Concentration Equivalence

We are now ready to prove the equivalence of \(P_n\) and \(Q_n\) at the level of the typical sets of \(M_n\) defined, respectively, as

$$\begin{aligned} {\mathcal {E}}_P = \{m\in {\mathcal {M}}:J_P(m)=0\} \end{aligned}$$
(36)

and

$$\begin{aligned} {\mathcal {E}}_Q = \{m\in {\mathcal {M}}:J_Q(m)=0\}. \end{aligned}$$
(37)

Since \(J_P\) and \(J_Q\) have compact level sets, \({\mathcal {E}}_P\) and \({\mathcal {E}}_Q\) are non-empty and compact.

The basic idea for proving this equivalence is contained in the fluctuation symmetry (28), which shows that the rate functions \(K_P(m,w)\) and \(K_Q(m,w)\) can vanish at the same value m if they vanish at \(w=0\). To prove that \({\mathcal {E}}_P={\mathcal {E}}_Q\), we then need to make sure that \(w=0\) is the only value where these rate functions vanish, so that \(W_n\) has a unique typical value equal to 0 with respect to both \(P_n\) and \(Q_n\).

As a result, we assume from now on that the rate functions \(I_P\) and \(I_Q\) of \(W_n\) each have a unique zero, which is not necessarily equal to 0, and define the following. We say that \(P_n\) and \(Q_n\) are asymptotically equivalent if

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}\log \frac{dP_n}{dQ_n}=0 \end{aligned}$$
(38)

in probability with respect to \(P_n\) and \(Q_n\). Note that this definition is consistent with the symmetry (29), for if \(I_Q(0)=0\) then \(I_P(0)=0\), and vice versa.

The next theorem, which is the main result of this paper, shows that this notion of asymptotic equivalence of measures is sufficient for \({\mathcal {E}}_P\) to coincide with \({\mathcal {E}}_Q\).

Theorem 3

Assume that \(M_n\) and \(W_n\) satisfy the joint LDP stated in the Hypotheses 1 and that the rate functions \(I_P\) and \(I_Q\) of \(W_n\) each have a unique zero. If \(P_n\) and \(Q_n\) are asymptotically equivalent, then \({\mathcal {E}}_P={\mathcal {E}}_Q\).

Proof

The assumption that \(I_P\) and \(I_Q\) have unique zeros, coupled with the assumption that \(P_n\) is asymptotically equivalent to \(Q_n\), implies that \(I_P(0)=I_Q(0)=0\) and that \(w=0\) is the only zero of both rate functions.

The equality \(I_P(0)=0\) leads with (18) to

$$\begin{aligned} 0=I_P(0) = \inf _{m\in {\mathcal {M}}} K_P(m,0). \end{aligned}$$
(39)

Let A denote the set of minimizers of the infimum over m in (39). Then \(K_P(m^*,0)=0\) for any \(m^*\in A\) and, from (16), we obtain

$$\begin{aligned} J_P(m^*) = \inf _{w\in {\mathbb {R}}} K_P(m^*,w)=K_P(m^*,0)=0, \end{aligned}$$
(40)

which implies that \(m^*\in {\mathcal {E}}_P\).

By applying the symmetry (28), we also have \(K_Q(m^*,0)=0\) and so

$$\begin{aligned} J_Q(m^*)=\inf _{w\in {\mathbb {R}}} K_Q(m^*,w) = K_Q(m^*,0)=0, \end{aligned}$$
(41)

which implies that \(m^*\in {\mathcal {E}}_Q\).

This only shows that all \(m^*\in A\) are in \({\mathcal {E}}_P\) and in \({\mathcal {E}}_Q\) or, equivalently, that \(A\subset {\mathcal {E}}_P\) and \(A\subset {\mathcal {E}}_Q\). To prove that all \(m\in {\mathcal {E}}_P\) are in fact in A, assume that \(\bar{m}\in {\mathcal {E}}_P\) and that the infimum over w defining \(J_P({\bar{m}})\) in (16) is achieved at \(w=0\). Then

$$\begin{aligned} 0=J_P({\bar{m}}) = K_P({\bar{m}}, 0) \end{aligned}$$
(42)

so that \({\bar{m}}\in A\). On the other hand, if the infimum is achieved for \({\bar{w}}\ne 0\), then

$$\begin{aligned} I_P({\bar{w}}) = \inf _{m\in {\mathcal {M}}} K_P(m,{\bar{w}}) = K_P({\bar{m}},{\bar{w}})=0, \end{aligned}$$
(43)

which would contradict the fact that \(w=0\) is the only zero of \(I_P(w)\). Consequently, we have proved that \(A={\mathcal {E}}_P\) and so that \({\mathcal {E}}_P\subset {\mathcal {E}}_Q\).

To prove the equality of the two sets, we only have to use the same argument by starting with \(Q_n\) to show similarly that all \(m\in {\mathcal {E}}_Q\) are also in \({\mathcal {E}}_P\), so that \({\mathcal {E}}_Q\subset {\mathcal {E}}_P\). Consequently, \({\mathcal {E}}_P={\mathcal {E}}_Q\). \(\square \)

The result of Theorem 3 is natural considering that the notion of asymptotic equivalence and the LDP are based on the same logarithmic scale (\(\asymp \)), defined in (14), so that differences between measures that are neglected on that scale should not affect the LDP of \(M_n\). One has to be careful with this intuition, however, because it is known that sub-exponential differences between \(P_n\) and \(Q_n\) can lead to different rate functions [1]. What Theorem 3 shows is that such differences do not influence the concentration of \(M_n\), although they can influence the fluctuations of \(M_n\). In other words, if \(P_n\) and \(Q_n\) are asymptotically equivalent, then the rate functions \(J_P\) and \(J_Q\) for \(M_n\) are not necessarily equal, but have the same zeros.

The next result relates the notion of asymptotic equivalence, defined in (38) in terms of \(W_n\), to the relative entropy

$$\begin{aligned} D(P_n||Q_n)=\int dP_n \log \frac{dP_n}{dQ_n} =E_{P_n}\left[ \log \frac{dP_n}{dQ_n}\right] \end{aligned}$$
(44)

or Kullback–Leibler distance [19]. This result is potentially useful for determining whether \(P_n\) and \(Q_n\) are asymptotically equivalent without having to explicitly derive the rate function of \(W_n\).

Theorem 4

Assume the same hypotheses as in Theorem 3. If \(P_n\) and \(Q_n\) are asymptotically equivalent, then

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}D(P_n||Q_n) = \lim _{n\rightarrow \infty } \frac{1}{n}D(Q_n||P_n) = 0. \end{aligned}$$
(45)

Conversely, if the limits above hold, then \(P_n\) and \(Q_n\) are asymptotically equivalent.

Proof

The proof only relies on the law of large numbers for \(W_n\). If \(W_n\) satisfies the LDP relative to \(P_n\) and its rate function \(I_P\) has a unique zero \(w^*\), as assumed, then

$$\begin{aligned} \lim _{n\rightarrow \infty } E_{P_n}[W_n] =w^*. \end{aligned}$$
(46)

Therefore, if \(P_n\) and \(Q_n\) are asymptotically equivalent, then \(w^*=0\) and

$$\begin{aligned} \lim _{n\rightarrow \infty } E_{P_n}[W_n] =\lim _{n\rightarrow \infty }-\frac{1}{n}E_{P_n}\left[ \log \frac{dP_n}{dQ_n}\right] =\lim _{n\rightarrow \infty } -\frac{1}{n} D(P_n||Q_n) = 0. \end{aligned}$$
(47)

The same applies relative to \(Q_n\).

To prove the converse, note that the limit (46) holds relative to \(P_n\) with \(w^*\) the unique zero of \(I_P\), so the limiting mean of \(W_n\) must be that zero. Hence, if the first limit for the relative entropy shown in (45) holds, then \(w^*=0\) and so \(I_P(0)=0\). Since the same applies for \(I_Q\), we conclude that \(P_n\) and \(Q_n\) are asymptotically equivalent. \(\square \)

Remarks

  1. 1.

    The result of Theorem 3 was already known to hold, as mentioned in the introduction, for the specific probability measures that are the microcanonical and canonical ensembles of statistical physics (see Sect. 4). The notion of asymptotic equivalence of measures used here comes from that context.

  2. 2.

    The equivalence result is valid for any random variable \(M_n\) that satisfies the joint LDP with \(W_n\). This means concretely that, if two probabilistic models of some system are asymptotically equivalent, then they are indistinguishable at the level of their typical sets. They are equivalent models predicting the same typical properties.

  3. 3.

    The asymptotic equivalence of \(P_n\) and \(Q_n\) is a sufficient but not a necessary condition for \({\mathcal {E}}_P={\mathcal {E}}_Q\). In some cases (see Sect. 4), we can indeed have \({\mathcal {E}}_P={\mathcal {E}}_Q\) for specific random variables \(M_n\) even though \(P_n\) and \(Q_n\) are not asymptotically equivalent.

  4. 4.

    The notion of asymptotic equivalence is transitive: If \(P_n\) is asymptotically equivalent to \(Q_n\) and \(Q_n\) is asymptotically equivalent to \(F_n\), then \(P_n\) is asymptotically equivalent to \(F_n\). This can be checked directly from the definition of asymptotic equivalence.

  5. 5.

    When the limits (45) for the relative entropy hold, \(P_n\) and \(Q_n\) are said to have zero divergence rate [29] or to be equivalent in the specific relative entropy sense [1, 30,31,32]. For Markov processes, the action and relative entropy can be related to transition and waiting times [33].

  6. 6.

    We need not assume for proving Theorem 3 that the rate functions \(J_P\) and \(J_Q\) defining the typical sets \({\mathcal {E}}_P\) and \({\mathcal {E}}_Q\) have unique zeros. This assumption is only required for \(I_P\) and \(I_Q\) so as to have unique typical values for \(W_n\) relative to \(P_n\) and \(Q_n\) which, by the assumption of asymptotic equivalence, are equal to 0.

  7. 7.

    An open problem is to determine what happens when \(W_n\) has a typical value other than 0 or when \(w=0\) is only in the typical set of \(P_n\) or \(Q_n\) without being a real concentration value. The proof given here suggests that \({\mathcal {E}}_P\) and \({\mathcal {E}}_Q\) should have in this case some overlap without being equal, as is known to happen for the microcanonical and canonical ensembles when they are partially equivalent [26].

  8. 8.

    Another open problem is to generalize our results when \(P_n\) and \(Q_n\) do not have the same support. In this case, the symmetry relations expressed in Prop. 2 do not seem to hold on the whole domain of the rate functions involved, but only on their intersection. It is not clear then whether or not this is enough to have equivalence of typical sets, as there is no guarantee that the zeros of the rate functions relative to \(P_n\) are also zeros of the rate functions relative to \(Q_n\), which makes their comparison more complicated. The microcanonical and canonical ensembles, which do not have the same support, should serve as a starting point for understanding this problem.

4 Applications

We illustrate in this section the result of Theorem 3 using various examples of probability measures and stochastic processes. The examples are simple: they are presented to discuss certain aspects of that theorem and to give an idea of how it can be applied to measures that describe a wide range of “static” and “dynamic” probabilistic models.

4.1 Independent Random Variables

We first consider a sequence \(X_1,X_2,\ldots ,X_n\) of real random variables, assumed to be independent and identically distributed (iid) according to some density p, defining our model \(P_n\), or the density q, defining \(Q_n\). For example, we can choose p to be the density of a standard normal distribution \({\mathcal {N}}(0,1)\) and q the density of a Gaussian distribution \({\mathcal {N}}(\mu ,\sigma ^2)\) with mean \(\mu \) and variance \(\sigma ^2\). In this case, the RND is simply

$$\begin{aligned} R_n = \prod _{i=1}^n \frac{p(X_i)}{q(X_i)}=e^{-nW_n}, \end{aligned}$$
(48)

where

$$\begin{aligned} W_n = \frac{\mu }{\sigma ^2}M_n +\frac{(\sigma ^2-1)}{2\sigma ^2}C_n-\frac{\mu ^2}{2\sigma ^2}-\log \sigma \end{aligned}$$
(49)

with

$$\begin{aligned} M_n = \frac{1}{n}\sum _{i=1}^n X_i,\quad C_n = \frac{1}{n}\sum _{i=1}^n X_i^2. \end{aligned}$$
(50)

To determine whether \(P_n\) and \(Q_n\) are asymptotically equivalent, we need to find the rate functions \(I_P\) and \(I_Q\) of the action \(W_n\). This can be done easily with the Gärtner–Ellis theorem (see Sect. 2) or by contraction of the joint LDP of \(M_n\) and \(C_n\) above. From the form of \(W_n\), however, it is clear that \(P_n\) and \(Q_n\) are asymptotically equivalent if and only if \(\mu =0\) and \(\sigma =1\), that is, if and only if we trivially have \(p=q\). In this case, \(W_n=0\) with probability 1, so that \(I_P\) and \(I_Q\) are degenerate at \(w=0\).

For \(\mu =0\) and \(\sigma \ne 1\), \(M_n\rightarrow 0\) in probability relative to both \(P_n\) and \(Q_n\), although the two measures are not asymptotically equivalent. For this observable, we therefore have \({\mathcal {E}}_P={\mathcal {E}}_Q\), which shows that the condition of asymptotic equivalence is not a necessary condition for the equivalence of \({\mathcal {E}}_P\) and \({\mathcal {E}}_Q\), as noted in Remark 3. Note, however, that \({\mathcal {E}}_P\ne {\mathcal {E}}_Q\) if we take the observable to be \(C_n\), since \(C_n\rightarrow 1\) in probability relative to \(P_n\) while \(C_n\rightarrow \sigma ^2\) in probability relative to \(Q_n\), assuming again \(\mu =0\). This suggests that, if \(P_n\) and \(Q_n\) are not asymptotically equivalent, then there is at least one observable for which \({\mathcal {E}}_P\ne {\mathcal {E}}_Q\), a result that would be interesting to prove in general.
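
The following sketch simulates this Gaussian example to illustrate these statements: it evaluates \(M_n\), \(C_n\) and the action (49) on large samples drawn from \(P_n\) and \(Q_n\), and compares the observed values of \(W_n\) with the relative entropies \(-D(p\Vert q)\) and \(D(q\Vert p)\) on which \(W_n\) concentrates under the two measures (sample size and parameter values are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.5        # parameters of q; p is the standard normal density
n = 100000                  # one large sample is enough to see the concentration

def observables(x):
    """Return (M_n, C_n, W_n) for a sample x, with W_n given by (49)."""
    M, C = x.mean(), (x ** 2).mean()
    W = mu / sigma**2 * M + (sigma**2 - 1) / (2 * sigma**2) * C \
        - mu**2 / (2 * sigma**2) - np.log(sigma)
    return M, C, W

print("under P_n:", observables(rng.normal(0.0, 1.0, size=n)))   # M_n ~ 0,  C_n ~ 1,            W_n ~ -D(p||q)
print("under Q_n:", observables(rng.normal(mu, sigma, size=n)))  # M_n ~ mu, C_n ~ sigma^2+mu^2, W_n ~ D(q||p)
print("-D(p||q) =", -(np.log(sigma) + (1 + mu**2) / (2 * sigma**2) - 0.5),
      " D(q||p) =", -np.log(sigma) + (sigma**2 + mu**2 - 1) / 2)
```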

The asymptotic equivalence obtained for \(p=q\) applies in a more general way to any iid sequence satisfying the hypotheses of this work. This follows from Theorem 4 by noting that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}D(P_n||Q_n) = D(p||q) = \int dx\, p(x)\log \frac{p(x)}{q(x)} \end{aligned}$$
(51)

and that D(p||q) vanishes if and only if \(p(x)=q(x)\) almost everywhere [19].

To go beyond this trivial case of equivalence, we can consider sequences of random variables that are independent but not identically distributed. In particular, we can let all the random variables have the density p under \(Q_n\) except one, say \(X_1\), which has the density q, so that

$$\begin{aligned} R_n(x_1,x_2,\ldots , x_n) = \frac{p(x_1) p(x_2)\cdots p(x_n)}{q(x_1) p(x_2)\cdots p(x_n)}= \frac{p(x_1)}{q(x_1)} \end{aligned}$$
(52)

and thus

$$\begin{aligned} W_n = \frac{1}{n}\log \frac{q(X_1)}{p(X_1)}. \end{aligned}$$
(53)

In this case, \(W_n\rightarrow 0\) as \(n\rightarrow \infty \) relative to both \(P_n\) and \(Q_n\), provided that p and q do not scale with n and have the same support. Under these additional conditions, \(P_n\) and \(Q_n\) are then asymptotically equivalent. This can be generalized, as is clear from the form of \(R_n\) above, to cases where a number \(N<n\) of independent random variables have a different distribution q under \(Q_n\), so long as \(N/n\rightarrow 0\) as \(n\rightarrow \infty \).

4.2 Microcanonical and Canonical Ensembles

The microcanonical and canonical ensembles are the main probabilistic models used in statistical physics to study equilibrium systems. Both are defined by transforming a basic measure \(\mu _n\) on the space \(\Omega _n\) of configurations or microstates of a system of n particles for which the random variable \(M_n\) is interpreted as a macrostate. On the one hand, the microcanonical ensemble is the measure on \(\Omega _n\) obtained by conditioning \(\mu _n\) on \(M_n\in B\):

$$\begin{aligned} \mu _n(d\omega |M_n\in B) =\frac{\mu _n(d\omega ,M_n\in B)}{\mu _n (M_n\in B)} = \left\{ \begin{array}{lll} \mu _n (d\omega )/\mu _n(B) &{} &{} \text {if }M_n(\omega )\in B\\ 0 &{} &{} \text {otherwise,} \end{array} \right. \end{aligned}$$
(54)

where \(\omega \) is an element of \(\Omega _n\). Usually, \(M_n\in {\mathbb {R}}\) is the mean energy of the system and B is a very thin interval \([{\bar{m}}-\epsilon ,{\bar{m}}+\epsilon ]\), called the energy shell, located around a fixed value \({\bar{m}}\). Taking this to represent our model \(P_n\), we then have

$$\begin{aligned} P_n(d\omega ) =\mu _n(d\omega |M_n\in [{\bar{m}}-\epsilon ,{\bar{m}}+\epsilon ]). \end{aligned}$$
(55)

On the other hand, the canonical ensemble is the measure on \(\Omega _n\) that transforms \(\mu _n\) according to

$$\begin{aligned} Q_n(d\omega )=\frac{e^{nk M_n(\omega )}}{E_{\mu _n}[e^{nk M_n}]}\mu _n(d\omega ),\quad k\in {\mathbb {R}} \end{aligned}$$
(56)

provided that \(E_{\mu _n}[e^{nk M_n}]<\infty \). This measure is also called the exponential-tilting of \(\mu _n\) or the exponential family, and represents physically the distribution of a system of n particles with energy \(M_n\) in contact with a heat bath at inverse temperature \(\beta =-k\). Mathematically, it also represents a “softening” of the microcanonical measure in which the “hard” conditioning constraint \(M_n={\bar{m}}\) is replaced by the “soft” constraint \(E_{Q_n}[M_n]={\bar{m}}\) on the average of \(M_n\).

To prove the equivalence of these two measures, we need to assume that \(M_n\) satisfies the LDP with respect to \(\mu _n\) with rate function I. Assuming that I is convex at \({\bar{m}}\) and choosing \(k\in \partial I({\bar{m}})\), where \(\partial I\) denotes the sub-differential of I [34], it can be shown that \(W_n\rightarrow 0\) in probability relative to both \(P_n\) and \(Q_n\) [1]. The two measures or ensembles must then be equivalent at the level of typical sets of random variables that satisfy the LDP in both ensembles. The full proof of this result can be found in [1], so we do not repeat it here.
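
As a minimal numerical illustration of this equivalence, consider n iid symmetric spins \(\omega _i=\pm 1\) with \(\mu _n\) the uniform prior, \(M_n\) the mean magnetization (playing the role of the mean energy), and \(k=\text {artanh}({\bar{m}})\in \partial I({\bar{m}})\). All quantities can then be computed exactly from the binomial distribution. The sketch below evaluates the microcanonical average of \(W_n\) for a thin shell of width \(2\epsilon \) and shows that it is small, vanishing in the limit \(n\rightarrow \infty \) followed by \(\epsilon \rightarrow 0\), consistent with the concentration of \(W_n\) at 0 (the parameter values are illustrative).

```python
import math

def log_binom(n, j):
    return math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mean_action(n, mbar=0.4, eps=0.05):
    """Microcanonical average E_{P_n}[W_n] for n iid symmetric spins under the uniform prior,
    with the canonical tilting parameter k = artanh(mbar) in the sub-differential of I at mbar."""
    k = math.atanh(mbar)
    shell = [j for j in range(n + 1) if abs((2 * j - n) / n - mbar) <= eps]
    log_w = [log_binom(n, j) - n * math.log(2) for j in shell]   # log mu_n(M_n = (2j-n)/n)
    log_mu_B = logsumexp(log_w)                                  # log mu_n(M_n in the shell)
    log_Z = n * math.log(math.cosh(k))                           # log E_mu[e^{n k M_n}]
    # On the shell, W_n = (1/n) log mu_n(B) + k M_n - (1/n) log Z_n
    return sum(math.exp(lw - log_mu_B) *
               (log_mu_B / n + k * (2 * j - n) / n - log_Z / n)
               for j, lw in zip(shell, log_w))

for n in [100, 1000, 10000]:
    print(n, mean_action(n))   # small; vanishes as n -> infinity followed by eps -> 0
```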

Physically, nonequivalent ensembles arise when the interactions between particles in a macroscopic system are long-range, with mean-field interactions being an extreme case of long-range interactions. For examples of such systems, see [1, 26, 35]. When the interaction is short or finite range, the microcanonical and canonical ensembles are generally equivalent.

The equivalence of the microcanonical and canonical ensembles has also been investigated recently in the context of random graphs [15,16,17], sometimes with different notions of asymptotic equivalence [14]. What is found in general is that a microcanonical ensemble of random graphs in which a fixed number of constraints are considered is equivalent to a canonical ensemble of graphs in which these constraints are imposed on average with an exponential (canonical) tilting. One example is the Erdös–Rényi ensemble in which all graphs with N nodes and E links have the same probability, and E is such that the degree per node \(2E/N\) converges to a constant d as \(N\rightarrow \infty \) (sparse regime). In the limit where \(N\rightarrow \infty \), this microcanonical graph ensemble is known to be equivalent to the more common canonical ensemble in which each pair of the N vertices is linked independently with probability \(p=d/N\). However, if an extensive number of constraints proportional to the number of nodes is imposed, then the microcanonical and canonical ensembles can be nonequivalent. This is illustrated in [15] with random graphs in which the whole degree sequence is fixed.

4.3 Generalized Canonical Ensembles

The canonical ensemble is not the only probability measure that is asymptotically equivalent to the microcanonical ensemble. More generally, we can replace the canonical measure \(Q_n\) in (56) by

$$\begin{aligned} F_n(d\omega )=\frac{e^{nh(M_n(\omega ))}}{E_{\mu _n}[e^{nh(M_n)}]}\mu _n(d\omega ), \end{aligned}$$
(57)

where \(h:{\mathcal {M}}\rightarrow {\mathbb {R}}\) is a real function of \(M_n\) such that \(E_{\mu _n}[e^{nh(M_n)}]<\infty \). This defines in statistical physics a generalized canonical ensemble [36,37,38], which has the same support as the canonical ensemble measure (56) and which can be made equivalent to both the canonical ensemble and the microcanonical ensemble.

The asymptotic equivalence with the microcanonical ensemble is discussed in detail in [36]. To see how the generalized canonical ensemble can be equivalent with the standard canonical ensemble, let us assume as before that \(M_n\) satisfies the LDP relative to \(\mu _n\) with rate function I and define

$$\begin{aligned} \phi (h)=\lim _{n\rightarrow \infty }\frac{1}{n}\log E_{\mu _n}[e^{nh(M_n)}] \end{aligned}$$
(58)

and

$$\begin{aligned} \lambda (k)=\lim _{n\rightarrow \infty }\frac{1}{n}\log E_{\mu _n}[e^{nkM_n}], \end{aligned}$$
(59)

assuming that both are finite. By Varadhan’s lemma [28, Thm. 1.3.4], it is known that these two functions can be expressed in terms of the rate function I as

$$\begin{aligned} \phi (h)=\sup _{m\in {\mathcal {M}}}\{h(m)-I(m)\} \end{aligned}$$
(60)

and

$$\begin{aligned} \lambda (k) = \sup _{m\in {\mathcal {M}}}\{km -I(m)\}. \end{aligned}$$
(61)

Moreover, under the hypotheses of this lemma, it can be proved (see [2, Thm. 11.7.2] or [1, Thm. 14]) that \(M_n\) satisfies the LDP relative to \(F_n\) with rate function

$$\begin{aligned} I_F(m) = I(m)-h(m)+\phi (h) \end{aligned}$$
(62)

and the LDP relative to \(Q_n\) (the canonical ensemble) with rate function

$$\begin{aligned} I_Q(m) = I(m) -km+\lambda (k). \end{aligned}$$
(63)

Combining these results, we see that, if h and k are chosen such that \(I_F(m)\) and \(I_Q(m)\) have the same unique minimum and zero \({\bar{m}}\), then \(\phi (h)=h({\bar{m}})-I({\bar{m}})\) and \(\lambda (k)=k{\bar{m}}-I({\bar{m}})\). Consequently,

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}\log \frac{dF_n}{dQ_n} = h({\bar{m}})-k {\bar{m}}+\lambda (k)-\phi (h) =0 \end{aligned}$$
(64)

in probability relative to both \(F_n\) and \(Q_n\), which means that we have asymptotic equivalence. This follows here because the RND is a function of \(M_n\) only and both \(F_n\) and \(Q_n\) concentrate on the same value \({\bar{m}}\) of \(M_n\).
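
The cancellation in (64) can be checked numerically for the spin example used before (our illustrative choice): \(\mu _n\) is the uniform measure on n spins, \(M_n\) the mean magnetization with prior rate function \(I(m)=\frac{1+m}{2}\log (1+m)+\frac{1-m}{2}\log (1-m)\), the canonical tilting is \(km\), and the generalized tilting is \(h(m)=km+\frac{g}{2}(m-{\bar{m}})^2\) with \(g<1\), which leaves \({\bar{m}}=\tanh k\) as the unique common zero of \(I_F\) and \(I_Q\).

```python
import numpy as np

def I(m):
    """Rate function of the mean magnetization of n iid symmetric spins under mu_n."""
    return 0.5 * (1 + m) * np.log(1 + m) + 0.5 * (1 - m) * np.log(1 - m)

k = 0.5                        # canonical tilting parameter
mbar = np.tanh(k)              # common typical value of M_n under Q_n and F_n
g = 0.5                        # extra quadratic tilt; g < 1 keeps I - h strictly convex
h = lambda m: k * m + 0.5 * g * (m - mbar) ** 2

m = np.linspace(-0.999999, 0.999999, 200001)
phi = np.max(h(m) - I(m))               # phi(h) = sup_m { h(m) - I(m) }, eq. (60)
lam = np.log(np.cosh(k))                # lambda(k) = sup_m { k m - I(m) } = log cosh(k)
print(h(mbar) - k * mbar + lam - phi)   # the combination in (64), numerically ~ 0
```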

Specific examples of generalized ensembles related to long-range and mean-field interacting systems are discussed in [36,37,38,39]. The advantage of the generalized canonical ensemble is that it can describe the microcanonical properties of many-body systems whenever the canonical ensemble itself is not equivalent to the microcanonical ensemble. This happens generically when the entropy is nonconcave as a function of the energy in the thermodynamic limit. For more details on equivalent versus nonequivalent ensembles, we refer to [1, 40].

4.4 Markov Processes

We close the list of examples by briefly discussing Markov processes, beginning with the case of Markov chains.

Let \(X_1,X_2,\ldots , X_n\) be an ergodic Markov chain on a set \(\Omega \), assumed to be finite for simplicity, and consider two probability measures \(P_n\) and \(Q_n\) on the space \(\Omega _n=\Omega ^n\) defined by the (homogeneous) transition kernels \(p(x,y)\) and \(q(x,y)\), respectively. Starting with the same distribution \(\rho \) for \(X_1\), we thus write

$$\begin{aligned} P_n(x_1,x_2,\ldots , x_n) = \rho (x_1) p(x_1,x_2) \cdots p(x_{n-1},x_n) \end{aligned}$$
(65)

and

$$\begin{aligned} Q_n(x_1,x_2,\ldots , x_n) = \rho (x_1) q(x_1,x_2)\cdots q(x_{n-1},x_n), \end{aligned}$$
(66)

so that

$$\begin{aligned} W_n = \frac{1}{n}\sum _{i=1}^{n-1} \log \frac{q(x_i,x_{i+1})}{p(x_i,x_{i+1})}. \end{aligned}$$
(67)

The rate function of \(W_n\), if it exists, can be derived by contracting the LDP of the so-called pair empirical distribution of the Markov chain; see Sect. 4.3 of [20]. Alternatively, we can notice that the relative entropy rate of the two Markov chains is

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}D(P_n||Q_n) = \sum _{(x,y)\in \Omega ^2} \mu (x) p(x,y) \log \frac{p(x,y)}{q(x,y)} \end{aligned}$$
(68)

where \(\mu (x)\) is the invariant distribution of the Markov chain with transition kernel \(p(x,y)\) [19]. Since the relative entropy rate on the right-hand side vanishes if and only if \(p=q\) almost everywhere, we then obtain from Theorem 4, similarly to iid sequences, that \(P_n\) and \(Q_n\) are asymptotically equivalent if and only if they define the same Markov chain with the same transition kernel. This applies to homogeneous Markov chains. As in the case of iid sequences, there is more room for equivalence if we allow the transition kernels to be time-dependent or compare Markovian with non-Markovian processes.
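
For finite state spaces, the relative entropy rate (68) is straightforward to evaluate: compute the invariant distribution of the kernel p and sum over pairs of states. The sketch below does this for two illustrative \(3\times 3\) transition matrices (values chosen arbitrarily); the result is strictly positive, so the corresponding chains are not asymptotically equivalent.

```python
import numpy as np

# Two transition matrices on a three-state space (illustrative values only)
p = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.3, 0.3, 0.4]])
q = np.array([[0.4, 0.4, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])

# Invariant distribution of p: left eigenvector associated with the eigenvalue 1
evals, evecs = np.linalg.eig(p.T)
mu = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
mu /= mu.sum()

# Relative entropy rate (68): sum_{x,y} mu(x) p(x,y) log[p(x,y)/q(x,y)]
rate = np.sum(mu[:, None] * p * np.log(p / q))
print(rate)   # strictly positive here, so the two chains are not asymptotically equivalent
```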

Similar results can be formulated for Markov chains on uncountable and continuous spaces, provided that they have the LDPs required in Hypotheses 1. One can also consider continuous-time processes, such as pure diffusions, by replacing \(\Omega _n\) with the space \(\Omega _T\) of sample paths over the time interval [0, T], in which case \(P_n\) and \(Q_n\) are “path” measures similar to the Wiener measure, denoted by \(P_T\) and \(Q_T\), whose action

$$\begin{aligned} W_T = -\frac{1}{T}\log \frac{dP_T}{dQ_T} \end{aligned}$$
(69)

can be expressed in terms of stochastic integrals using Girsanov’s theorem [41]. The large deviation limit defining the equivalence of \(P_T\) and \(Q_T\) is then the long-time or ergodic limit \(T\rightarrow \infty \).
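
As a sketch of the continuous-time setting, the code below simulates an Ornstein–Uhlenbeck process with drift \(-\theta _P x\) (our model for \(P_T\)) and evaluates the Girsanov log-likelihood ratio against a second Ornstein–Uhlenbeck model with drift \(-\theta _Q x\) and the same unit noise. The discretized action \(W_T\) concentrates, for large T, near minus the relative entropy rate \(-(\theta _P-\theta _Q)^2/(4\theta _P)\), which is nonzero here, so the two path measures are not asymptotically equivalent (the Euler–Maruyama discretization and parameter values are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)
theta_p, theta_q = 1.0, 2.0          # drift coefficients of the two models (illustrative)
T, dt = 2000.0, 0.01
steps = int(T / dt)

# Simulate one long path under P_T (drift -theta_p * x, unit noise), Euler-Maruyama scheme
x = np.empty(steps + 1)
x[0] = 0.0
noise = rng.normal(scale=np.sqrt(dt), size=steps)
for i in range(steps):
    x[i + 1] = x[i] - theta_p * x[i] * dt + noise[i]

# Girsanov log-likelihood ratio for two drifts b_P, b_Q and unit diffusion coefficient:
# log dP_T/dQ_T = int (b_P - b_Q) dX - (1/2) int (b_P^2 - b_Q^2) dt
b_p, b_q = -theta_p * x[:-1], -theta_q * x[:-1]
log_rnd = np.sum((b_p - b_q) * np.diff(x)) - 0.5 * np.sum((b_p ** 2 - b_q ** 2) * dt)
W_T = -log_rnd / T

# Under P_T, W_T concentrates near -(1/2) E_P[(b_P - b_Q)^2] = -(theta_p - theta_q)^2 / (4 theta_p)
print(W_T, -(theta_p - theta_q) ** 2 / (4 * theta_p))
```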

Many examples of asymptotically equivalent stochastic processes related to nonequilibrium systems are treated in [10]. This work, together with [9], introduced the notion of asymptotic equivalence of processes in order to construct “modified” Markov processes that are equivalent, in terms of typical properties, to Markov processes conditioned on realizing certain large deviations. What is found in general is that the conditioned Markov processes are not Markovian, but do become asymptotically equivalent in the long-time limit to a homogeneous Markov process, given by a generalization of the Doob transform. For more information on this large deviation conditioning problem, and its connections with nonequilibrium versions of the microcanonical and canonical ensembles, we refer to [9,10,11].

To close this section, let us consider as an example an ergodic diffusion \(X_t\) with path measure \(P_T\), and let \(Q_T\) be the path measure of the same process reversed in time (in the sense of Haussmann and Pardoux [42]). If the process is reversible, that is, if it satisfies the detailed balance condition, then it is known that the action \(W_T\), which corresponds to the entropy production [27], depends only on the initial and final states:

$$\begin{aligned} W_T = -\frac{1}{T}\log \frac{p(X_0)}{p(X_T)}, \end{aligned}$$
(70)

where p is the stationary density of \(X_t\). Since this density does not scale with time, \(X_t\) and its time-reversal are asymptotically equivalent, and so equivalent at the level of typical values. This is expected physically, since the two processes are then statistically indistinguishable. On the other hand, if \(X_t\) is irreversible, then the entropy production is known to be strictly positive, which means that the process and its time-reversal are not asymptotically equivalent. In this case, the two processes behave differently in terms of path statistics and typical ergodic values.