1 Introduction

An important recent advance in nonequilibrium statistical physics was the discovery of various fluctuation relations, which are identities involving the statistics of a fluctuating entropy. In particular, the Gallavotti–Cohen–Evans–Morriss (GCEM) relation [25, 26, 29] imposes a peculiar symmetry on the rare events associated with this fluctuating entropy. The appropriate theory to describe such rare events is large deviation theory, which is a very fashionable subject in statistical physics [42, 49] and in modern probability [16–18, 23, 50], as evidenced by the Abel Prize awarded to S. R. S. Varadhan in 2007.

We recall that a time dependent measure \(\mu _{T}(dx)\) satisfies the large deviation principle if at large times it takes an exponentially decreasing form. This exponential decay is characterized by a lower semi-continuous positive function \(I(x)\), which is called the rate function. This function is such that for any set \(A\)

$$\begin{aligned} -\inf _{x\in A^{0}}I(x)\le \liminf _{T\rightarrow +\infty }\frac{1}{T}\ln \mu _{T}(A)\le \limsup _{T\rightarrow +\infty }\frac{1}{T}\ln \mu _{T}(A)\le -\inf _{x\in \overline{A}}I(x), \end{aligned}$$
(1)

where \(A^{0}\) is the interior of \(A\) and \(\overline{A}\) is the closure of \(A\). This can be stated less formally as

$$\begin{aligned} \mu _{T}(dx)\sim \exp \left( -TI(x)\right) dx. \end{aligned}$$
(2)
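As a concrete illustration of (1) and (2), the following Python sketch uses an example that is not taken from the text: the fraction of heads in \(T\) tosses of a coin with head probability \(p\), a discrete-time stand-in for \(\mu _{T}\). For this example the rate function is the relative entropy \(I(x)=x\ln (x/p)+(1-x)\ln ((1-x)/(1-p))\), and the code compares it with the exact decay rate \(-\frac{1}{T}\ln \mu _{T}([x,1])\) for \(x>p\). All function names are illustrative.

```python
import math

def rate_function(x, p):
    """Relative entropy of Bernoulli(x) with respect to Bernoulli(p)."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / p)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - p))
    return total

def log_tail_probability(T, x, p):
    """Exact ln P(S_T / T >= x) for S_T ~ Binomial(T, p), via log-sum-exp."""
    k_min = math.ceil(T * x)
    logs = [
        math.lgamma(T + 1) - math.lgamma(k + 1) - math.lgamma(T - k + 1)
        + k * math.log(p) + (T - k) * math.log(1 - p)
        for k in range(k_min, T + 1)
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

p, x = 0.5, 0.7   # illustrative values with x > p, so the event is rare
decay_rates = {T: -log_tail_probability(T, x, p) / T for T in (100, 1000, 10000)}
for T, rate in decay_rates.items():
    # The finite-T decay rate approaches I(x) from above as T grows.
    print(T, rate, rate_function(x, p))
```

The finite-\(T\) values stay above \(I(x)\) and approach it slowly, which reflects the sub-exponential prefactor hidden in the symbol \(\sim \) of (2).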

Historically, large deviation theory originated in the nineteenth century with pioneering works in statistical mechanics [8]. One of the most important contributions to large deviation theory was the general approach for Markov processes developed by Donsker and Varadhan [19–22]. In this series of papers, they identified three levels of large deviations:

  • Level 1, which is the study of fluctuations of additive observables with respect to the mean.

  • Level 2, related to fluctuations of the fraction of time spent in each state.

  • Level 3, concerning fluctuations on the statistics of infinite trajectories.

The ranking of these levels establishes a hierarchy in which a lower level can be deduced from a higher one by contraction. Donsker and Varadhan proved the large deviation principle for Markov processes at the level 3 by studying random probability measures on infinite trajectories. This queen large deviation result possesses an explicit rate function, which is the relative entropy density. Moreover, they proved the large deviation principle at the level 2 for the empirical density, defined as the fraction of time spent in each state up to time \(T\). Contrary to level 3, the rate function for level 2 admits a variational representation, which is in general not explicit. Hence, the explicit character of level 3 disappears after contracting to level 2. At discrete time a more detailed picture is available: it is possible to investigate the large deviation of the \(k\) symbol empirical measure and prove that the rate function can be obtained explicitly if \(k\ge 2\), thus filling the gap between levels 2 and 3.

However, in discrete time the extended process \((X_{t},X_{t+1},\ldots ,X_{t+k-1})\) is itself a Markov chain and therefore the intermediate level can be derived from the level 2. This magnification trick is no longer possible in continuous time. Until recently, no result existed in the literature to fill this level 2–3 gap for continuous time. The first study of this gap in the continuous time setting was by Kesidis and Walrand [32], for pure jump processes with two states. They obtained explicitly the rate function for the joint probability of the empirical density and the empirical flow, which counts the number of jumps between pairs of states up to time \(T\). This intermediate level was then called 2.5. This issue was later studied by De La Fortelle [15], who obtained a weak large deviation in the same context but for countable space.

Somewhat in parallel, in nonequilibrium statistical physics, it has been found that the empirical density at level 2 is not sufficient to study fluctuations of the entropy production and of currents. This also motivated the search for an intermediate level for pure jump and diffusion processes, by Maes and collaborators [39, 40], and by Chernyak et al. [9]. Finally, Bertini et al. [5] succeeded in proving rigorously the level 2.5 for pure jump processes in a countable space. Rigorous results for diffusion processes have been obtained in [36].

The purpose of our contribution is to present the level 2.5 of large deviations for continuous time processes and discuss its connection with fluctuation relations. Whereas the explicit rate functions for the level 2.5 of large deviations calculated here have been obtained in [5, 9, 15, 39, 40], our presentation unifies the proofs for pure jump and diffusion processes, and clearly compares the two different methods used to obtain these rate functions, namely, tilting and a spectral method. Moreover, some of the proofs presented here are completely original.

The organization of the paper is as follows. Section 2 sets the stage with the definition of Markov processes, which include jump and diffusion processes. In particular, in Sect. 2.1 we recall basic concepts of Markov processes like transition probability, generator, stationary and equilibrium states, and trajectorial measure. In Sect. 2.2, we introduce the empirical density, empirical flow, empirical current, and the action functional, which are the fluctuating observables studied in the paper. In Sect. 3 we obtain the finite time fluctuation relation, which results as a tautology from the definition of the action functional. Section 4 is the cornerstone of the paper and deals with the level 2.5 of large deviations. In Sect. 4.1 we use the tilting method to obtain the rate function characterizing the level 2.5. This proof is related to results from [39, 40], but the presentation given here is original. Section 4.2 contains the spectral method. In this case, for pure jump processes our proof is original. For diffusion processes the spectral method has been used in [9]; in comparison to this reference, we expurgate the field theoretical language by using the Girsanov lemma. Finally, in Sect. 5 we obtain a stationary fluctuation relation at the level 2.5 and, by contraction, the GCEM symmetry for the fluctuating entropy.

2 Models and Observables

2.1 Homogeneous Ergodic Markov Processes

We start with a brief overview of homogeneous Markov processes [13, 44, 46, 48], considering continuous time Markov processes \(X_{t}\) taking values in a state space \(\mathcal {E}\), which can be continuous, as for example \(\mathbb {R}^{d}\), or a countable space.

2.1.1 Elements of Ergodic Markov Processes

A time-homogeneous Markov process can be defined by a family of transition kernels \(P_{t}(x,dy)\), which give the conditional probability that \(X_{t+t'}\in [y,y+dy]\) given that \(X_{t'}= x\). This conditional probability satisfies the Chapman–Kolmogorov rule

$$\begin{aligned} \int _{\mathcal {E}}P_{s}(x,dy)P_{t}(y,dz)\,=\, P_{s+t}(x,dz), \end{aligned}$$
(3)

where the measure \(dy\) means the Lebesgue measure or the counting measure, depending on \(\mathcal {E}\). The semi-group associated with the transition kernel is defined by its action on a bounded measurable function \(f\) in \(\mathcal {E}\),

$$\begin{aligned} P_{t}[f](x)\equiv \int _{\mathcal {E}} P_{t}(x,dy)f(y). \end{aligned}$$
(4)

The infinitesimal generator \(L\), formally defined as \(P_t\equiv \exp \left( tL\right) \), leads to the forward and backward Kolmogorov equations,

$$\begin{aligned} \partial _{t}P_{t}=P_{t}\circ L\qquad {{\text {and}}}\qquad \partial _{t}P_{t}=L\circ P_{t}, \end{aligned}$$
(5)

respectively. The symbol \(\circ \) means composition of operators and the initial condition is \(P_{0}=\mathcal {I}\), where \(\mathcal {I}\) is the identity kernel. Conservative processes (without death or explosion), for which the normalization condition \(\int P_{t}(x,dy)=1\) holds, are often considered in Physics. The generator must then obey \(L[1]=0\), where \(1\) is the function which is equal to \(1\) on \(\mathcal {E}\).
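For a finite state space the objects above reduce to matrices, which allows a quick numerical sanity check. The following Python sketch assumes a hypothetical three-state process with illustrative rates `W` (not from the text), whose generator matrix has the rates off the diagonal and zero row sums. It approximates \(P_{t}=\exp (tL)\) by a truncated Taylor series combined with scaling and squaring (an ad hoc numerical choice), and checks \(L[1]=0\), the normalization \(\int P_{t}(x,dy)=1\), and the Chapman–Kolmogorov rule (3).

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, squarings=12, terms=12):
    """exp(A) via a Taylor series for A / 2**squarings, then repeated squaring."""
    n = len(A)
    scaled = [[a / 2 ** squarings for a in row] for row in A]
    result, term = identity(n), identity(n)
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def semigroup(L, t):
    """Transition kernel P_t = exp(tL) as a matrix P_t[x][y]."""
    return mat_exp([[t * entry for entry in row] for row in L])

# Hypothetical rates W[x][y] for the transition x -> y (illustrative values).
W = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 3.0],
     [2.0, 1.0, 0.0]]
n = len(W)
# Generator: rates off the diagonal, diagonal chosen so that each row sums to zero.
L = [[W[x][y] if x != y else -sum(W[x]) for y in range(n)] for x in range(n)]

P_s, P_t, P_st = semigroup(L, 0.3), semigroup(L, 0.5), semigroup(L, 0.8)
product = mat_mul(P_s, P_t)
chapman_kolmogorov_error = max(
    abs(product[x][z] - P_st[x][z]) for x in range(n) for z in range(n))

print([sum(row) for row in L])     # L[1] = 0: every row of L sums to zero
print([sum(row) for row in P_t])   # every row of P_t sums to one, up to rounding
print(chapman_kolmogorov_error)    # small, up to rounding
```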

The time evolution of the instantaneous one point measure \(\mu _{t}(dy)\!=\!\int _{\mathcal {E}}\mu _{0}(dx_{0})P_{t}(x_{0},dy)\) can be deduced from the Kolmogorov equation (5), leading to the Fokker–Planck equation \(\partial _{t}\mu _{t}=L^{\dagger }[\mu _{t}]\), where \(L^{\dagger }\) is the adjoint of \(L\) with respect to the Lebesgue or counting measure. Since we are considering ergodic Markov processes, there is a unique invariant probability measure \(\mu _{inv}\) satisfying

$$\begin{aligned} L^{\dagger }[\mu _{inv}]=0. \end{aligned}$$
(6)

The process is said to be in equilibrium w.r.t \(\mu _{inv}\) if the detailed balance relation is satisfied, i.e.,

$$\begin{aligned} \mu _{inv}(dx)P_{t}(x,dy)=\mu _{inv}(dy)P_t(y,dx). \end{aligned}$$
(7)

In the following it is assumed that the one point measure is smooth with respect to the Lebesgue measure, for example with the conditions of the Hörmander theorem [30, 41] for a diffusion process, leading to \(\mu _{t}(dx)\equiv \rho _{t}(x)dx.\) With \(\mu _{inv}(dx)\equiv \rho _{inv}(x)dx\), the detailed balance condition (7) can be written as (Footnote 1)

$$\begin{aligned} \rho _{inv}\circ L\circ \rho _{inv}^{-1}=L^{\dagger }. \end{aligned}$$
(8)

In addition to the characterization by the semi-group or the generator, a Markov process can be characterized by its trajectorial measure. The sample path of the process up to time \(T\) is the random function \(X_0^T\) : \(t\in \left[ 0,T\right] \rightarrow X_{t}\) . It is a random variable in the space of trajectories \(D\left( [0,T],\mathcal {E}\right) \). This trajectorial measure \(d\mathbb {P}_{L,\mu _{0},T}[x_0^T]\), where \(\mu _0\) is the initial measure, is roughly the probability that the trajectory \(X_0^T\) equals \(x_0^T\). The expectation of an arbitrary functional \(F\left[ X_0^T\right] \) of the trajectories is then written as,

$$\begin{aligned} \mathbb {E}_{L,\mu _{0}}\left[ F\right] \ =\ \int F\left[ x_0^T\right] \, d\mathbb {P}_{L,\mu _{0},T}\left[ x_0^T\right] . \end{aligned}$$
(9)

The finite time distributions are sufficient to characterize \(d\mathbb {P}_{L,\mu _{0},T}\). More precisely, Eq. (9) may be rewritten as

$$\begin{aligned} \mathbb {E}_{L,\mu _{0}}\left[ F\right] =&\int _{\mathcal {E}^{n+1}}F(x_{0},x_{1},\ldots ,x_{n-1},x_{n})\mu _{0}(dx_{0})\exp \left( t_{1}L\right) (x_{0},dx_{1})\\&\times \exp \left( \left( t_{2}-t_{1}\right) L\right) (x_{1},dx_{2})\cdots \exp \left( \left( T-t_{n-1}\right) L\right) (x_{n-1},dx_{n}),\nonumber \end{aligned}$$
(10)

for the cylindrical functional

$$\begin{aligned} F\left[ X\right] =F(X_{0},X_{t_{1}},X_{t_{2}},\ldots ,X_{t_{n-1}},X_{T}), \end{aligned}$$
(11)

with \(0\le t_{1}\le t_{2}\le \cdots \le t_{n-1}\le T\). In the following we consider the two most prominent classes of Markov processes: jump and diffusion processes.

2.1.2 Pure Jump Processes

A Markov process is called a pure jump process if after “arriving” into a state the system stays there for a random exponentially distributed time interval and then jumps to another state. The transition rates \(W(x,y)\) give the probability per unit of time for the transition \(x\rightarrow y\). Moreover, under regularity conditions (see [24, Chap. 8] for example), it is possible to prove that for pure jump processes the generator acting on a bounded measurable function \(h:\mathcal {E}\rightarrow \mathbb {R}\) is

$$\begin{aligned} L\left[ h\right] (x)=\int _{\mathcal {E}}W(x,y)\left( h(y)-h(x)\right) dy, \end{aligned}$$
(12)

for all \(x\in \mathcal {E}\). The detailed balance condition (7) with respect to the density \(\rho _{inv}\) takes the form

$$\begin{aligned} \rho _{inv}(x)W(x,y)=\rho _{inv}(y)W(y,x). \end{aligned}$$
(13)

A relevant quantity in this paper is the current associated with the density \(\rho _{t}\),

$$\begin{aligned} J_{\rho _{t}}(x,y)\equiv \rho _{t}(x)W(x,y)-W(y,x)\rho _{t}(y). \end{aligned}$$
(14)

From equation (6), the current associated with the invariant density is conserved,

$$\begin{aligned} \int dyJ_{\rho _{inv}}(x,y)=0. \end{aligned}$$
(15)
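The relations (6), (13), (14) and (15) can be illustrated numerically on a finite state space with the counting measure. The sketch below assumes a hypothetical three-state process with illustrative rates `W` (not from the text). It relaxes the equation \(\partial _{t}\rho _{t}=L^{\dagger }[\rho _{t}]\) to the invariant density by explicit Euler steps (the step size and number of steps are ad hoc choices), then evaluates the current (14) and tests both the conservation law (15) and the detailed balance condition (13).

```python
def invariant_density(W, dt=0.01, steps=20000):
    """Relax d(rho)/dt = L^dagger rho with explicit Euler steps until stationary."""
    n = len(W)
    rho = [1.0 / n] * n
    for _ in range(steps):
        gain_minus_loss = [
            sum(rho[y] * W[y][x] - rho[x] * W[x][y] for y in range(n))
            for x in range(n)
        ]
        rho = [r + dt * f for r, f in zip(rho, gain_minus_loss)]
    return rho

def current(rho, W):
    """Current (14): J(x, y) = rho(x) W(x, y) - W(y, x) rho(y)."""
    n = len(W)
    return [[rho[x] * W[x][y] - W[y][x] * rho[y] for y in range(n)] for x in range(n)]

# Hypothetical rates W[x][y] for the transition x -> y (illustrative values).
W = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 3.0],
     [2.0, 1.0, 0.0]]

rho_inv = invariant_density(W)
J = current(rho_inv, W)
conservation = [sum(J[x]) for x in range(len(W))]   # left-hand side of (15)
detailed_balance = all(abs(J[x][y]) < 1e-12 for x in range(3) for y in range(3))

print(rho_inv)
print(conservation)       # vanishes up to rounding
print(detailed_balance)   # False here: a stationary current circulates
```

For these rates the stationary current is conserved at every state but does not vanish, so the process is stationary without being in equilibrium.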

At the trajectory level it is possible to compare the trajectorial measures (9) of two processes with different transition rates, provided that they both have the same set of non-vanishing rates. To this end, we introduce the non-conservative Markovian generator (Footnote 2)

$$\begin{aligned} L_{V_{1},V_{2}}\left[ h\right] (x)\equiv \left( \int _{\mathcal {E}}W(x,y)\left[ \exp \left( V_{2}(x,y)\right) h(y)-h(x)\right] dy\right) +V_{1}(x)h(x), \end{aligned}$$
(16)

for all functions \(h\), \({\text {with}}\,V_{1}: \mathcal {E}\rightarrow \mathbb {R}\,{\text {and}}\, V_{2}:\mathcal {E}^{2}\rightarrow \mathbb {R}.\) We call this generator the twisted generator. From the Girsanov lemma [34, Proposition 2.6] and the Feynman–Kac relation [44, 46], it follows that \(d\mathbb {P}_{L_{V_{1},V_{2}},\mu _{0},T}\) is absolutely continuous w.r.t. \(d\mathbb {P}_{L,\mu _{0},T}\), and the explicit Radon–Nikodym derivative is given by

$$\begin{aligned} \frac{d\mathbb {P}_{L_{V_{1},V_{2}},\mu _{0},T}}{d\mathbb {P}_{L,\mu _{0},T}}\left[ x_0^T\right] =\exp \left( \sum _{0\le s\le T/ x_{s^-}\ne x_{s^+}}V_{2}(x_{s^{-}},x_{s^{+}})+\int _{0}^{T}dsV_{1}(x_{s})\right) , \end{aligned}$$
(17)

where \(x_{s^-}\equiv \lim _{\delta \rightarrow 0}x_{s-\delta }\) and \(x_{s^+}\equiv \lim _{\delta \rightarrow 0}x_{s+\delta }\). Hence, the sum \(\sum _{0\le s\le T/ x_{s^-}\ne x_{s^+}}\) is over all jumps in the trajectory \(x_0^T\). In particular, for two conservative jump processes, one with rates \(W\) and the other with rates \(W_{V_{2}}(x,y)= W(x,y)\exp \left( V_{2}(x,y)\right) \) relation (17) becomes

$$\begin{aligned} \frac{d\mathbb {P}_{L_{V_{2}},\mu _{0},T}}{d\mathbb {P}_{L,\mu _{0},T}}\left[ x\right] =\exp \left( \sum _{0\le s\le T/ x_{s^-}\ne x_{s^+}}V_{2}(x_{s^{-}},x_{s^{+}})-\int _{0}^{T}ds\left( W\exp \left( V_{2}\right) -W\right) \left[ 1\right] (x_{s})\right) , \end{aligned}$$
(18)

where \(L_{V_{2}}\) is the conservative generator obtained from (16) by setting

$$\begin{aligned} V_{1}=\left( W\right) \left[ 1\right] -\left( W\exp \left( V_{2}\right) \right) \left[ 1\right] = \int W(x,y)dy-\int W(x,y)\exp (V_2(x,y))dy. \end{aligned}$$
(19)

2.1.3 Diffusion Processes

A diffusion process \(\, X_{t}\,\) in a \(d\)-dimensional manifold is described by the stochastic differential equation

$$\begin{aligned} dX=A_{0}(X)dt+\sum _\alpha A_{\alpha }(X)\circ dW_{\alpha }(t), \end{aligned}$$
(20)

where the drift \(A_{0}\) and the diffusion coefficients \(A_{\alpha }\) are arbitrary smooth vector fields on \(\mathcal {E}\), \(W_{\alpha }\) are independent Wiener processes, and the range of \(\alpha \) is model dependent. The symbol \(\circ \) indicates that the Stratonovich convention is used. The explicit form of the generator related to (20) is

$$\begin{aligned} L=A_{0} \cdot \nabla +\sum _\alpha \frac{1}{2}\left( A_{\alpha }\cdot \nabla \right) ^{2}=\widehat{A_{0}} \cdot \nabla +\frac{1}{2}\nabla \cdot D \cdot \nabla , \end{aligned}$$
(21)

with the modified drift and covariance

$$\begin{aligned} \widehat{A_{0}}(x)=A_{0}(x)-\frac{1}{2}\sum _\alpha \left( \nabla \cdot A_{\alpha }\right) (x)A_{\alpha }(x)\qquad \text {and}\qquad D^{ij}(x)=\sum _\alpha A_{\alpha }^{i}(x)A_{\alpha }^{j}(x), \end{aligned}$$
(22)

respectively, where \(i=1,\ldots ,d\) and \(j=1,\ldots ,d\). It is assumed that \(D\) is strictly positive. The detailed balance relation (7) with respect to the invariant measure \(\mu _{inv}(dx)=\rho _{inv}(x)dx\) is then equivalent to \(\widehat{A_{0}}=\frac{D}{2}\nabla \left( \ln \rho _{inv}\right) \).
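The equality of the two forms of the generator in (21), with the modified drift and covariance (22), can be checked numerically in one dimension. The sketch below assumes a hypothetical model with drift \(A_{0}(x)=-x\), a single noise field \(A_{1}(x)=1+\frac{1}{2}\sin x\), and an arbitrary smooth test function \(h\); none of these choices comes from the text. Derivatives are approximated by central finite differences, so the two forms agree only up to discretization error.

```python
import math

def derivative(f, x, eps=1e-3):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def A0(x):   # hypothetical drift
    return -x

def A1(x):   # hypothetical single noise vector field
    return 1.0 + 0.5 * math.sin(x)

def h(x):    # arbitrary smooth test function
    return math.cos(2 * x) + x ** 2

def generator_stratonovich(x):
    """A0 h' + (1/2) (A1 d/dx)(A1 d/dx) h, the first form in (21)."""
    return A0(x) * derivative(h, x) + 0.5 * A1(x) * derivative(
        lambda y: A1(y) * derivative(h, y), x)

def generator_divergence(x):
    """A0_hat h' + (1/2) (D h')', the second form in (21), with (22)."""
    A0_hat = A0(x) - 0.5 * derivative(A1, x) * A1(x)
    return A0_hat * derivative(h, x) + 0.5 * derivative(
        lambda y: A1(y) ** 2 * derivative(h, y), x)

for x in (-1.0, 0.4, 2.0):
    print(x, generator_stratonovich(x), generator_divergence(x))
```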

A central quantity for diffusion processes is the hydrodynamic probability current [45]

$$\begin{aligned} J_{\rho _{t}}=\widehat{A}_{0}\rho _{t}-\frac{D}{2}(\nabla \rho _{t}). \end{aligned}$$
(23)

The conservation of the current associated with the invariant density then reads

$$\begin{aligned} \nabla \cdot J_{\rho _{inv}}=0. \end{aligned}$$
(24)

Similar to jump processes, the trajectorial measure of two diffusion processes can be compared with a generator corresponding to a non-conservative process, which in the present case is defined as

$$\begin{aligned} L'\equiv L+B_2\cdot \nabla +B_1, \end{aligned}$$
(25)

where \(B_2\) and \(B_1\) are an arbitrary vector field and an arbitrary scalar function, respectively. Combining the Cameron–Martin–Girsanov lemma [46, 48] and the Feynman–Kac relation [44, 46], it follows that

$$\begin{aligned} \frac{d\mathbb {P}_{L',\mu _{0},T}\left[ x\right] }{d\mathbb {P}_{L,\mu _{0},T}\left[ x\right] }=\exp (V_{T}\left[ x\right] ), \end{aligned}$$
(26)

where

$$\begin{aligned} V_{T}=&\int _{0}^{T}\left[ D^{-1}(x_{u})B_2(x_{u})\circ dx_{u}\right. \nonumber \\&\left. +\left( B_1(x_{u})-D^{-1}(x_{u})B_2(x_{u})\left( \widehat{A_{0}}+\frac{B_2}{2}\right) (x_{u})-\frac{1}{2}\left( \nabla \cdot B_2\right) (x_{u})\right) du\right] . \end{aligned}$$
(27)

Choosing

$$\begin{aligned} B_2=DV_{2}\qquad \text {and}\qquad B_1=V_{2} \cdot \left( \widehat{A_{0}}+\frac{DV_{2}}{2}\right) +\frac{1}{2}\nabla \cdot \left( DV_{2}\right) +V_{1}, \end{aligned}$$
(28)

we obtain

$$\begin{aligned} V_{T}=\int _{0}^{T}dt\left[ V_{1}(X_{t})+V_{2}(X_{t})\circ dX_{t}\right] . \end{aligned}$$
(29)

Equation (26) then becomes

$$\begin{aligned} \frac{d\mathbb {P}_{L_{V_{1},V_{2}},\mu _{0},T}}{d\mathbb {P}_{L,\mu _{0},T}}\left[ X\right] =\exp \left( \int _{0}^{T}dt\left[ V_{1}(X_{t})+V_{2}(X_{t})\circ dX_{t}\right] \right) , \end{aligned}$$
(30)

where the twisted generator reads

$$\begin{aligned} L_{V_{1},V_{2}}&= L'= L+DV_{2}\cdot \nabla +V_{2}\cdot \left( \widehat{A_{0}}+\frac{DV_{2}}{2}\right) +\frac{1}{2}\nabla \cdot \left( DV_{2}\right) +V_{1}\nonumber \\&=\widehat{A_{0}} \cdot \left( \nabla +V_{2}\right) +\left( \nabla +V_{2}\right) \circ \frac{D}{2}\circ \left( \nabla +V_{2}\right) +V_{1}. \end{aligned}$$
(31)

2.2 Empirical Observables and Ergodic Behavior

2.2.1 Empirical Density, Flow and Current

The set of functional observables that defines the level 2.5 of large deviations depends on the type of Markov process considered. For pure jump processes the set of observables consists of the empirical density \(\rho _{T}^{e}\) and the empirical flow \(C_{T}^{e}\). They are given by

$$\begin{aligned} \rho _{T}^{e}(x)= & {} \frac{1}{T}\int _{0}^{T}\delta \left( X_{t}-x\right) dt\qquad \text {and} \nonumber \\ C_{T}^{e}(x,y)= & {} \frac{1}{T}\sum _{0\le s\le T/ X_{s^-}\ne X_{s^+}}\delta \left( X_{s^{-}}-x\right) \delta \left( X_{s^{+}}-y\right) . \end{aligned}$$
(32)

The empirical density \(\rho _{T}^{e}(x)\) (Footnote 3) can be understood as the fraction of time spent in \(x\) over \(\left[ 0,T\right] \) and the empirical flow \(C_{T}^{e}(x,y)\) as the number of jumps from \(x\) to \(y\) (times \(1/T\)) during the trajectory. Another functional of central interest is the empirical current

$$\begin{aligned} J_{T}^{e}(x,y)=C_{T}^{e}(x,y)-C_{T}^{e}(y,x). \end{aligned}$$
(33)

Since we assume the system to be ergodic, the law of large numbers for the empirical density and flow becomes

$$\begin{aligned} \rho _{T}^{e}\rightarrow \rho _{inv}\qquad \text {and}\qquad C_{T}^{e}\rightarrow C_{\rho _{inv}}, \end{aligned}$$
(34)

where

$$\begin{aligned} C_{\rho _{inv}}(x,y)=\rho _{inv}(x)W(x,y). \end{aligned}$$
(35)

Moreover, the finite time Kirchhoff’s law [33] reads

$$\begin{aligned}&\int dyC_{T}^{e}(x,y)-\int dyC_{T}^{e}(y,x) \nonumber \\&\quad = \frac{1}{T}\sum _{0\le s\le T/ X_{s^-}\ne X_{s^+}}\delta \left( X_{s^{-}}-x\right) -\frac{1}{T}\sum _{0\le s\le T/ X_{s^-}\ne X_{s^+}}\delta \left( X_{s^{+}}-x\right) \nonumber \\&\quad = \frac{\delta \left( X_{0}-x\right) -\delta \left( X_{T}-x\right) }{T}=\text {O}(1/T). \end{aligned}$$
(36)

In the following we will show that the large deviation rate function of \(C_{T}^{e}\) is infinite for any untypical \(C\) not fulfilling

$$\begin{aligned} \int dyC(x,y)=\int dyC(y,x).\ \end{aligned}$$
(37)
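The empirical density and flow (32), the law of large numbers (34), and the finite time Kirchhoff’s law (36) can be illustrated on a simulated trajectory. The sketch below assumes a hypothetical three-state process with illustrative rates `W` and a standard exponential-waiting-time simulation; the function names, the time horizon, and the random seed are choices made here, not taken from the text. On a finite state space the delta functions in (32) and (36) become Kronecker deltas.

```python
import random

def simulate_jump_process(W, x0, T, rng):
    """Return the jump times and visited states of a pure jump process on {0..n-1}."""
    n = len(W)
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while True:
        escape_rate = sum(W[x])
        t += rng.expovariate(escape_rate)      # exponential holding time in x
        if t >= T:
            return times, states
        x = rng.choices(range(n), weights=W[x])[0]   # target chosen proportionally to W(x, y)
        times.append(t)
        states.append(x)

def empirical_density_and_flow(times, states, T, n):
    """Empirical density rho_T^e and empirical flow C_T^e of (32)."""
    rho = [0.0] * n
    C = [[0.0] * n for _ in range(n)]
    for k, x in enumerate(states):
        t_next = times[k + 1] if k + 1 < len(times) else T
        rho[x] += (t_next - times[k]) / T       # fraction of time spent in x
        if k + 1 < len(states):
            C[x][states[k + 1]] += 1.0 / T      # one jump x -> next state
    return rho, C

# Hypothetical rates W[x][y] for the transition x -> y (illustrative values).
W = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 3.0],
     [2.0, 1.0, 0.0]]
n, T = len(W), 2000.0
times, states = simulate_jump_process(W, 0, T, random.Random(0))
rho_e, C_e = empirical_density_and_flow(times, states, T, n)

# Both sides of the finite time Kirchhoff law (36), state by state.
kirchhoff_lhs = [sum(C_e[x]) - sum(C_e[y][x] for y in range(n)) for x in range(n)]
kirchhoff_rhs = [((states[0] == x) - (states[-1] == x)) / T for x in range(n)]
print(rho_e)
print(kirchhoff_lhs, kirchhoff_rhs)
```

The identity (36) holds exactly on every trajectory, whereas the convergence (34) is only approximate at finite \(T\).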

For diffusion processes, the set of observables is composed of the empirical density \(\rho _{T}^{e}\) and the empirical current \(j_{T}^{e}\), which read

$$\begin{aligned} \rho _{T}^{e}(x)=\frac{1}{T}\int _{0}^{T}\delta \left( X_{t}-x\right) dt\qquad \text {and}\qquad j_{T}^{e}(x)=\frac{1}{T}\int _{0}^{T}\delta \left( X_{t}-x\right) \circ dX_{t}. \end{aligned}$$
(38)

Roughly speaking, the empirical current (see [28] for a rigorous definition) is the sum of the displacements that the system makes if it is at \(x\). For diffusion processes, with the ergodic assumption the law of large numbers takes the form

$$\begin{aligned} \rho _{T}^{e}\rightarrow \rho _{inv}\qquad \text {and}\qquad j_{T}^{e}\rightarrow J_{\rho _{inv}}, \end{aligned}$$
(39)

where the current \(J_{\rho _{inv}}\) is defined in relation (23). From the definition (38), we obtain the pathwise constraint (Footnote 4)

$$\begin{aligned} \nabla \cdot j_{T}^{e}(x)=\frac{1}{T}\left( \delta \left( X_{0}-x\right) -\delta \left( X_{T}-x\right) \right) . \end{aligned}$$
(40)

Hence, analogously to (37) the large deviation rate function of \(j_{T}^{e}\) is infinite at any \(j\) not fulfilling

$$\begin{aligned} \nabla \cdot j=0. \end{aligned}$$
(41)

2.2.2 Action Functional and Fluctuating Entropy

For time-homogeneous processes, the action functional \(\mathbb {W}_{T}\) is obtained by comparing the trajectorial measure of \(X_{t}\) with the time-reversed trajectorial measure. At the level of trajectories, we introduce the path-wise time inversion (Footnote 5) \(R\) acting on the space of trajectories as

$$\begin{aligned} R\left[ X_0^T\right] _{t}\equiv \left[ X_0^T\right] _{T-t}, \end{aligned}$$
(42)

where \(\left[ X_0^T\right] _{t}\equiv X_t\).

The action functional is defined by the relation

$$\begin{aligned} \exp \left( -\mathbb {W}_{T}\right) \equiv \frac{R_{\star }\left( d\mathbb {P}_{L,\mu _{0}^{b},T}\right) }{d\mathbb {P}_{L,\mu _{0},T}}, \end{aligned}$$
(43)

where \(\mu _{0}^{b}\) is the arbitrary initial measure of the reversed trajectory and the push-forward measure can be loosely written as \(R_{\star }\left( d\mathbb {P}_{L,\mu _{0}^{b},T}\right) [x_0^T]= d\mathbb {P}_{L,\mu _{0}^{b},T}\left[ R\left[ x_0^T\right] \right] \). Due to the freedom in choosing \(\mu _{0}\) and \(\mu _{0}^{b}\), it is possible to identify the action functional \(\mathbb {W}_{T}\) with different quantities. It becomes the fluctuating total entropy production \(\mathbb {\sigma }_{T}\) for \(\mu _{0}^{b}(dx)=\mu _{T}(dx)\equiv \int dy\rho _{0}(y)P_{0}^{T}(y,x)dx\) and the fluctuating entropy increase of the external environment \(\mathbb {J}_{T}\) for \(\mu _{0}(dx)=\mu _{0}^{b}(dx)=dx\). The difference between \(\mathbb {\sigma }_{T}\) and \(\mathbb {J}_{T}\) is the boundary term \(\ln \left( \rho _{0}(x_{0})\right) -\ln \left( \rho _{T}(x_{T})\right) \), which is the variation of the entropy of the system. We note that names like total entropy production or entropy increase of the external environment become meaningful only if a Markov process is given a clear physical interpretation. In this case these functionals are related to key thermodynamic quantities [47].

For pure jump processes this action functional is [37, 38]

$$\begin{aligned} \mathbb {W}_{T}=\ln \left( \rho _{0}(X_{0})\right) -\ln \left( \rho _{0}^{b}(X_{T})\right) +\sum _{0\le s\le T/ X_{s^-}\ne X_{s^+}}\ln \left[ \frac{W(X_{s^{-}},X_{s^{+}})}{W(X_{s^{+}},X_{s^{-}})}\right] . \end{aligned}$$
(44)

For diffusion processes it reads [37]

$$\begin{aligned} \mathbb {W}_{T}=\ln \left( \rho _{0}(X_{0})\right) -\ln \left( \rho _{0}^{b}(X_{T})\right) +2\int _{0}^{T}dt\widehat{A_{0}}\left( X_{t}\right) \cdot D^{-1}\left( X_{t}\right) \circ dX_{t}. \end{aligned}$$
(45)

3 Transient Fluctuation Relation

Before obtaining the rate function at the level 2.5, let us briefly discuss the transient fluctuation relation. From the definition of the action functional (43) it follows that for all functionals \(F_{\left[ 0,T\right] }\),

$$\begin{aligned} \mathbb {E}_{\mu _{0}^{b},L}\left[ F_{\left[ 0,T\right] }\circ R\right] =\mathbb {E}_{\mu _{0},L}\left[ F_{\left[ 0,T\right] }\exp \left( -\mathbb {W}_{T}\right) \right] . \end{aligned}$$
(46)

The backward action functional is defined as

$$\begin{aligned} \exp \left( -\mathbb {W}_{T}^{b}\right) \equiv \frac{R_{\star }\left( d\mathbb {P}_{L,\mu _{0},T}\right) }{d\mathbb {P}_{L,\mu _{0}^{b},T}}. \end{aligned}$$
(47)

Comparing (43) and (47) we obtain the antisymmetric relation

$$\begin{aligned} \mathbb {W}_{T}^{b}=-\mathbb {W}_{T}\circ R. \end{aligned}$$
(48)

For the special case \(F_{\left[ 0,T\right] }=\delta (\mathbb {W}_{T}-W)\), with \(\delta \) denoting the indicator function, relation (46) becomes the generalized Crooks relation [10, 14, 37, 38, 47]

$$\begin{aligned} \mathbb {P}_{\mu _{0}^{b},L}(\mathbb {W}_{T}^{b}=-W)=\exp \left( -W\right) \mathbb {P}_{\mu _{0},L}(\mathbb {W}_{T}=W). \end{aligned}$$
(49)

From (46), we also deduce the Jarzynski equality [14, 31]

$$\begin{aligned} \mathbb {E}_{\mu _{0},L}\left[ \exp (-\mathbb {W}_{T})\right] =1. \end{aligned}$$
(50)

This relation implies two important results. First, (50) and Jensen’s inequality give the second law of thermodynamics \(\mathbb {E}_{\mu _{0},L}\left[ \mathbb {W}_{T}\right] \ge 0\). Second, (50) and the Markov inequality \(\mathbb {P}_{\mu _{0},L}\left( \exp \left( -\mathbb {W}_{T}\right) \ge \exp (L)\right) \le \frac{\mathbb {E}_{\mu _{0},L}\left[ \exp (-\mathbb {W}_{T})\right] }{\exp (L)}\) give an upper bound (Footnote 6) on the probability of “transient deviations” from the second law, i.e., \(\mathbb {P}_{\mu _{0},L}\left( \mathbb {W}_{T}\le -L\right) \le \exp \left( -L\right) .\)
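The equality (50) can be verified without sampling for a pure jump process on a finite state space. The sketch below is an illustration under stated assumptions: it uses the action functional (44), the twisted generator (16) with \(V_{1}=0\) and \(V_{2}(x,y)=\ln \left( W(y,x)/W(x,y)\right) \), and the Feynman–Kac representation behind (17), so that the expectation of \(\exp (-\mathbb {W}_{T})\) becomes a finite sum involving \(\exp (TL_{0,V_{2}})\). The rates, the densities `rho_0` and `rho_0_b`, the time horizon, and the Taylor-based matrix exponential are hypothetical choices.

```python
import math

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, squarings=12, terms=12):
    """exp(A) via a Taylor series for A / 2**squarings, then repeated squaring."""
    n = len(A)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scaled = [[a / 2 ** squarings for a in row] for row in A]
    result, term = eye, eye
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def twisted_generator(W, V1, V2):
    """Matrix of the twisted generator (16) on a finite state space."""
    n = len(W)
    M = [[0.0] * n for _ in range(n)]
    for x in range(n):
        M[x][x] = V1(x)
        for y in range(n):
            if y != x and W[x][y] > 0:
                M[x][y] = W[x][y] * math.exp(V2(x, y))
                M[x][x] -= W[x][y]
    return M

# Hypothetical rates and initial densities (illustrative values).
W = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 3.0],
     [2.0, 1.0, 0.0]]
rho_0 = [0.5, 0.3, 0.2]      # initial density of the forward process
rho_0_b = [0.1, 0.6, 0.3]    # initial density of the reversed trajectory
T, n = 1.7, len(W)

M = twisted_generator(W, lambda x: 0.0, lambda x, y: math.log(W[y][x] / W[x][y]))
kernel = mat_exp([[T * m for m in row] for row in M])

# E[exp(-W_T)]: average over X_0 ~ rho_0 of [rho_0_b(X_T) / rho_0(X_0)] times the jump weights.
jarzynski_average = sum(
    rho_0[x0] * (rho_0_b[xT] / rho_0[x0]) * kernel[x0][xT]
    for x0 in range(n) for xT in range(n))
print(jarzynski_average)
```

With this choice of \(V_{2}\) the twisted generator is the transpose of \(L\), which is why the sum collapses to the normalization of `rho_0_b` for any admissible pair of initial densities.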

4 Heuristic Proof for 2.5 Large Deviations

In this section we demonstrate that the joint fluctuation of empirical density and empirical flow for jump processes, and the joint fluctuation of empirical density and empirical current for diffusion processes admit a large deviation regime with an explicit rate function. For jump processes this rate function reads [40]

$$\begin{aligned}&I\left[ \rho ,C\right] \nonumber \\&\quad ={\left\{ \begin{array}{ll} \int dxdy\left( \begin{array}{ll} -C(x,y)+\rho (x)W(x,y)\\ +\,C(x,y)\ln \frac{C(x,y)}{\rho (x)W(x,y)}\end{array}\right) &{}\quad \text {if }\;\forall x\in \mathcal {E}:\int dyC(x,y)=\int dyC(y,x)\\ \infty &{}\quad \text {otherwise,}\end{array}\right. }\nonumber \\ \end{aligned}$$
(51)

while for diffusion processes it is [9, 39]

$$\begin{aligned} I\left[ \rho ,j\right] =\left\{ \begin{array}{ll} \frac{1}{2}\int dx(j-J_{\rho })(\rho D)^{-1}(j-J_{\rho })&{}\quad \text {if }\;\nabla \cdot j=0\\ \infty &{}\quad \text {otherwise.}\end{array}\right. \end{aligned}$$
(52)

Note that the constraints \(\int dyC(x,y)=\int dyC(y,x)\) and \(\nabla \cdot j=0\) come from (36) and (40), respectively. Formally, by contraction we can obtain the Donsker–Varadhan variational expression for the rate function for the level 2 of large deviations from the level 2.5 rate function. Explicitly, for pure jump processes \(I(\rho )=\min _{C}\left[ I(\rho ,C)\right] \), whereas for diffusion processes \(I(\rho )=\min _{j}\left[ I(\rho ,j)\right] \). These relations lead to

$$\begin{aligned} I\left[ \rho \right] =-\inf _{\left[ h\right] >0}\left[ \int dx\rho (x)h^{-1}(x)L\left[ h\right] (x)\right] , \end{aligned}$$
(53)

where the minimization is over strictly positive functions \(h\). A rigorous proof of this contraction for pure jump processes can be found in [6]. Similarly, a formal contraction implies that the action functional (44) (or (45) for diffusion processes) fulfills a large deviation principle. It is also possible to obtain the rate function related to the joint probability of the empirical density \(\rho _{T}^{e}(x)\) and the empirical current \(J_{T}^{e}(x,y)\) by contraction from (51) [40].
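The rate function (51) and the contraction to level 2 can be made concrete on a finite state space with the counting measure. The sketch below evaluates (51), using the convention \(C\ln C=0\) at \(C=0\), and performs the contraction \(I(\rho )=\min _{C}I(\rho ,C)\) for a hypothetical two-state process by a brute-force grid search. For two states the constraint (37) leaves a single free flow \(c=C(0,1)=C(1,0)\), and minimizing (51) over \(c\) by hand gives \(\left( \sqrt{\rho (0)W(0,1)}-\sqrt{\rho (1)W(1,0)}\right) ^{2}\); this closed form is a check derived here, not a formula quoted from the text. The rates, the density, and the grid are illustrative.

```python
import math

def rate_function_25(rho, C, W, tol=1e-12):
    """Level 2.5 rate function (51) on a finite state space (counting measure)."""
    n = len(rho)
    for x in range(n):
        outflow = sum(C[x][y] for y in range(n))
        inflow = sum(C[y][x] for y in range(n))
        if abs(outflow - inflow) > tol:
            return math.inf                 # constraint (37) violated
    total = 0.0
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            typical = rho[x] * W[x][y]      # typical flow for the density rho
            if typical == 0.0:
                if C[x][y] > 0.0:
                    return math.inf
                continue
            total += typical - C[x][y]
            if C[x][y] > 0.0:
                total += C[x][y] * math.log(C[x][y] / typical)
    return total

def contracted_two_state(rho, W, grid=20000, c_max=2.0):
    """Brute-force min over the single free flow c of a two-state process."""
    best = math.inf
    for k in range(1, grid + 1):
        c = c_max * k / grid
        best = min(best, rate_function_25(rho, [[0.0, c], [c, 0.0]], W))
    return best

# Hypothetical two-state rates and an untypical density (illustrative values).
W = [[0.0, 2.0], [1.0, 0.0]]
rho = [0.3, 0.7]
level_2_numeric = contracted_two_state(rho, W)
level_2_closed = (math.sqrt(rho[0] * W[0][1]) - math.sqrt(rho[1] * W[1][0])) ** 2
print(level_2_numeric, level_2_closed)

# At the typical point (rho_inv, rho_inv W) the rate function vanishes.
rho_inv = [1.0 / 3.0, 2.0 / 3.0]
C_inv = [[rho_inv[x] * W[x][y] for y in range(2)] for x in range(2)]
print(rate_function_25(rho_inv, C_inv, W))
```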

We present two methods to prove (51) and (52): tilting and a spectral method. The proof for jump processes using the spectral method is original. Proofs using tilting for pure jump processes can be found in [40] and for diffusion processes in [39]. Another proof for diffusion processes using the spectral method was obtained in [9]. The novelty in these cases is in our presentation, which highlights the generality of both methods. A third method, which is totally rigorous, for pure jump processes in a countable space related to the contraction of the rate function of the level 3 of large deviations has been recently obtained in [5]. Whereas the proof using the tilting method is more direct, in the spectral method a connection between the rate function and the maximum eigenvalue of a modified generator is established. This connection is often useful for numerical calculations of rate functions.

4.1 Tilting

We consider, for general stochastic processes \(X_{t}\) (Footnote 7), the joint large deviation of \(N\) observables \( \overrightarrow{\omega _{t}^{e}} \equiv \left\{ \omega _{t,1}^{e},\omega _{t,2}^{e},\ldots ,\omega _{t,N}^{e}\right\} \). The trajectorial measure is denoted by \(d\mathbb {P}_{\mu _{0},T}\) and \(\overrightarrow{\omega _{inv}}\equiv \left\{ \omega _{inv,1},\omega _{inv,2},\ldots ,\omega _{inv,N}\right\} \) represents the typical behavior of \( \overrightarrow{\omega _{t}^{e}} \), with typical behavior meaning almost sure convergence. If the following two conditions are satisfied, then the family of probability measures \(\left( \mathbb {P}_{\mu _{0},T}\circ \left\{ \overrightarrow{\omega _{t}^{e}} \right\} ^{-1}\right) _{t\ge 0}\), or equivalently \( \overrightarrow{\omega _{t}^{e}} \), satisfies a large deviation principle with rate function \(I\left( \overrightarrow{\omega } \right) \), where \( \overrightarrow{\omega } =\left\{ \omega _{1},\omega _{2},\ldots ,\omega _{N}\right\} \) is the desired untypical behavior.

  • Condition 1 There exists an ergodic tilted process \(X_{t}'\), with trajectorial measure \(d\mathbb {P}'_{\mu _{0},T}\), such that the typical behavior of \( \overrightarrow{\omega _{t}^{e}} \) under this process is \( \overrightarrow{\omega } \).

  • Condition 2 For this tilted process, there exists a function \(I\) defined by the asymptotic relation \(\frac{d\mathbb {P}_{\mu _{0},T}}{d\mathbb {P}'_{\mu _{0},T}}\left[ X\right] \sim \exp \left( -TI\left( \overrightarrow{\omega _{T}^{e}} \right) \right) \). This means that asymptotically the Radon–Nikodym derivative can be expressed in terms of the \(N\) observables \(\omega _{t,1}^{e},\omega _{t,2}^{e},\ldots ,\omega _{t,N}^{e}\).

Note that a larger \(N\) makes the fulfillment of the first condition harder, while the fulfillment of the second condition becomes easier. For a fixed process \(X_{t}\) and a fixed observable \( \overrightarrow{\omega _{t}^{e}} \), we postulate that the process \(X_t'\) exists.

Formal proof. From the second condition it follows that

$$\begin{aligned} \mathbb {P}_{\mu _{0},T}\left[ \overrightarrow{\omega _{T}^{e}} \simeq \overrightarrow{\omega }\right]&= \int d\mathbb {P}_{\mu _{0},T}\left[ X\right] \delta ( \overrightarrow{\omega _{T}^{e}} -\overrightarrow{\omega })\nonumber \\ {}&=\int d\mathbb {P}'_{\mu _{0},T}\left[ X\right] \frac{d\mathbb {P}_{\mu _{0},T}}{d\mathbb {P}'_{\mu _{0},T}}\left[ X\right] \delta ( \overrightarrow{\omega _{T}^{e}} -\overrightarrow{\omega })\nonumber \\&\sim \int d\mathbb {P}'_{\mu _{0},T}\left[ X\right] \exp \left( -TI\left( \overrightarrow{\omega _{T}^{e}} \right) \right) \delta ( \overrightarrow{\omega _{T}^{e}} -\overrightarrow{\omega })\nonumber \\&\sim \exp \left( -TI\left( \overrightarrow{\omega } \right) \right) \int d\mathbb {P}'_{\mu _{0},T}\left[ X\right] \delta ( \overrightarrow{\omega _{T}^{e}} -\overrightarrow{\omega }). \end{aligned}$$
(54)

Since the process \(X'_{t}\) is assumed to be ergodic, with the first condition, we obtain

$$\begin{aligned} \int d\mathbb {P}'_{\mu _{0},T}\left[ X\right] \delta ( \overrightarrow{\omega _{T}^{e}} -\overrightarrow{\omega })=\mathbb {P}'_{\mu _{0}}\left[ \overrightarrow{\omega _{T}^{e}} \simeq \overrightarrow{\omega }\right] \rightarrow 1, \end{aligned}$$
(55)

which, with (54), gives the required large deviation rate function \(I\). Rigorously, following the same procedure for \(\mathbb {P}_{\mu _{0}}\left[ \overrightarrow{\omega _{T}^{e}} \in B\left( \overrightarrow{\omega },\epsilon \right) \right] \), where \(B\left( \overrightarrow{\omega },\epsilon \right) \) is an open ball of radius \(\epsilon \), the lower bound in (1) is obtained [5]. We note that these two conditions are not enough for a rigorous proof, which requires both a lower and an upper bound [5, 36].

Examples:

  • If \(X_{t}\) is a Markov process and \( \overrightarrow{\omega _{t}^{e}} \equiv \) \(\left\{ \rho _{t}^{e}\right\} \), from the Girsanov relation (18) (or (26) for diffusion processes), we obtain that it is not possible to find a process fulfilling the second condition. The solution to find an explicit rate function is then to increase \(N\).

  • If \(X_{t}\) is a pure jump process and \( \overrightarrow{\omega _{t}^{e}} =\left\{ \rho _{t}^{e},C_{t}^{e}\right\} \), by choosing \(X'\) with the transition rates

    $$\begin{aligned} W'(x,y)=\frac{C(x,y)}{\rho (x)}, \end{aligned}$$
    (56)

    the ergodic behavior of \(X'_t\) becomes \(\rho '_{inv}=\rho \) and \(C_{\rho '_{inv}}=C\), which implies the fulfillment of condition 1. The process \(X'_t\) also obeys the conservation law (15), leading to the constraint on the marginal of \(C\) in the rate function (51). The Girsanov relation (18) with \(V_{2}(x,y)=\ln \left( \frac{C(x,y)}{\rho (x)W(x,y)}\right) \) becomes

    $$\begin{aligned} \frac{d\mathbb {P}_{L_{V_2},T}}{d\mathbb {P}_{L,T}}\left[ X\right]&= \exp \left[ \sum _{0\le s\le T/ X_{s^-}\ne X_{s^+}}\ln \left( \frac{C(X_{s^{-}},X_{s^{+}})}{\rho (X_{s^{-}})W(X_{s^{-}},X_{s^{+}})}\right) \right. \nonumber \\&\quad \left. -\int _{0}^{T}ds\int _{\mathcal {E}}dy\left( \frac{C(X_{s},y)}{\rho (X_{s})}-W(X_{s},y)\right) \right] \nonumber \\&= \exp \left[ T\int _{\mathcal {E}^{2}}dydx\left[ C_{T}^{e}(x,y)\ln \left( \frac{C(x,y)}{\rho (x)W(x,y)}\right) \right. \right. \nonumber \\&\quad \left. \left. -\rho _{T}^{e}(x)\left( \frac{C(x,y)}{\rho (x)}-W(x,y)\right) \right] \right] . \end{aligned}$$
    (57)

    Hence, condition 2 is exactly verified at finite time with the rate function \(I\) given by (51).

  • If \(X_{t}\) is a diffusion process and \( \overrightarrow{\omega _{t}^{e}} =\left\{ \rho _{t}^{e},j_{t}^{e}\right\} \), condition 1 is fulfilled by choosing \(X'_{t}\) with drift and diffusion coefficient

    $$\begin{aligned} A_{0}'=\frac{j+\frac{D}{2}\nabla \rho }{\rho }\qquad \text {and}\qquad A_{\alpha }'=A_{\alpha }. \end{aligned}$$
    (58)

    This can be shown with the ergodic law (38), which implies

    $$\begin{aligned} \rho '_{inv}=\rho \text { and }J_{\rho '_{inv}}=j, \end{aligned}$$
    (59)

where \(\rho '{}_{inv}\) is the invariant density of the process \(X'_{t}\). From the Girsanov relation (26), condition 2 is verified with \(I\) given by (52).

  • It is possible to apply the tilting method to find the rate function of more informative quantities, e.g., the \(m\)-words generalization of empirical flow associated with a pure jump process [12]. The method can also be used to obtain the rate function of the empirical density and flow of pure jump processes that are non-homogeneous and periodic in time [7].
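For a finite state space, the tilting construction of the pure jump example can be checked directly. The following is a minimal sketch under assumptions that are not in the text: a hypothetical three-state process with rates `W`, a candidate pair `rho`, `C` whose incoming and outgoing flows balance at every state, and integrals over \(\mathcal {E}\) replaced by sums. It builds the tilted rates (56), checks that `rho` is invariant for them with typical flow `C`, and evaluates the exponent of (57) per unit time with the empirical quantities replaced by `rho` and `C`.

```python
import math

# Hypothetical 3-state process: W[x][y] is the jump rate from x to y (diagonal unused).
W = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 3.0],
     [2.0, 1.0, 0.0]]

# Candidate density and flow, chosen so that the flow into each state equals the flow out.
rho = [0.5, 0.3, 0.2]
C = [[0.0, 0.4, 0.1],
     [0.1, 0.0, 0.4],
     [0.4, 0.1, 0.0]]
n = len(rho)


def tilted_rates(rho, C):
    """Rates W'(x, y) = C(x, y) / rho(x) of the tilted process, as in (56)."""
    return [[C[x][y] / rho[x] if x != y else 0.0 for y in range(n)] for x in range(n)]


def invariance_defect(rho, rates):
    """Largest imbalance between probability inflow and outflow under `rates`."""
    worst = 0.0
    for y in range(n):
        inflow = sum(rho[x] * rates[x][y] for x in range(n) if x != y)
        outflow = rho[y] * sum(rates[y][z] for z in range(n) if z != y)
        worst = max(worst, abs(inflow - outflow))
    return worst


def rate_function(rho, C, W):
    """Sum over x != y of C ln(C / (rho W)) - C + rho W (exponent of (57) per unit time)."""
    total = 0.0
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            reference = rho[x] * W[x][y]
            if C[x][y] > 0.0:
                total += C[x][y] * math.log(C[x][y] / reference)
            total += reference - C[x][y]
    return total


W_prime = tilted_rates(rho, C)
defect = invariance_defect(rho, W_prime)   # close to zero: rho is invariant for W'
flow_defect = max(abs(rho[x] * W_prime[x][y] - C[x][y])
                  for x in range(n) for y in range(n) if x != y)
I_value = rate_function(rho, C, W)         # cost of observing (rho, C) under W
print(defect, flow_defect, I_value)
```

In this sketch condition 1 corresponds to `defect` and `flow_defect` being zero up to rounding, while `I_value` is strictly positive because `C` differs from `rho[x] * W[x][y]`.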

4.2 Spectral Method

4.2.1 Generating Function

The scaled cumulant generating function associated with the vector \( \overrightarrow{\omega _{t}^{e}} \) is defined as

$$\begin{aligned} \varLambda \left[ V_{1},V_{2},\ldots ,V_{N}\right] =\lim _{T\rightarrow \infty }\frac{1}{T}\ln \left( \mathbb {E}_{\mu _{0},L}\left[ \exp \left( T\sum _{i=1}^{N}\left\langle \omega _{t,i}^{e},V_{i}\right\rangle \right) \right] \right) \end{aligned}$$
(60)

where \(V_{i}\) are objects having the same tensorial nature as \(\omega _{t,i}^{e}\) and \(\left\langle .,.\right\rangle \) denotes the associated canonical scalar product. Assuming that the Gärtner-Ellis theorem [16, 17] is still valid in this functional form (see footnote 8), then if \(\varLambda \) exists and is differentiable for all \(V_{i}\), the family of probability measures \(\left( \mathbb {P}_{\mu _{0},T}\circ \left\{ \overrightarrow{\omega _{t}^{e}}\right\} ^{-1}\right) _{t\ge 0}\) satisfies a large deviation principle with rate function

$$\begin{aligned} I\left[ \omega _{1},\omega _{2},\ldots ,\omega _{N}\right] =\sup _{\overrightarrow{V}}\left\{ \sum _{i=1}^{N}\left\langle \omega _{i},V_{i}\right\rangle -\varLambda \left[ V_{1},V_{2},\ldots ,V_{N}\right] \right\} . \end{aligned}$$
(61)

For pure jump processes, with \( \overrightarrow{\omega _{t}^{e}} = \left\{ \rho _{t}^{e},C_{t}^{e}\right\} \), the scaled cumulant generating function becomes

$$\begin{aligned}&\varLambda \left[ V_{1},V_{2}\right] \nonumber \\&\quad =\lim _{T\rightarrow \infty }\frac{1}{T}\ln \left( \mathbb {E}_{\mu _{0},L}\left[ \exp \left( \int _{0}^{T}dtV_{1}(X_{t})+\sum _{0\le s\le T/ X_{s^-}\ne X_{s^+}}V_{2}\left( X_{s^{-}},X_{s^{+}}\right) \right) \right] \right) . \end{aligned}$$
(62)

For diffusion processes, with \( \overrightarrow{\omega _{t}^{e}} =\left\{ \rho _{t}^{e},j_{t}^{e}\right\} \) we obtain

$$\begin{aligned} \varLambda \left[ V_{1},V_{2}\right] =\lim _{T\rightarrow \infty }\frac{1}{T}\ln \left( \mathbb {E}_{\mu _{0},L}\left[ \exp \left( \int _{0}^{T}dt\left[ V_{1}(X_{t})+V_{2}(X_{t})\circ dX_{t}\right] \right) \right] \right) . \end{aligned}$$
(63)

4.2.2 Twisted Process

Defining

$$\begin{aligned} A_{T}^{e}\equiv \frac{1}{T}\left( \int _{0}^{T}dtV_{1}(X_{t})+\sum _{0\le s\le T/ X_{s^-}\ne X_{s^+}}V_{2}\left( X_{s^{-}},X_{s^{+}}\right) \right) , \end{aligned}$$
(64)

relation (17), which is valid for pure jump processes, is equivalent to

$$\begin{aligned} \mathbb {E}_{L,\mu _{0}}\left[ \exp \left( TA_{T}^{e}\right) F\right] =\mathbb {E}_{L_{V_{1},V_{2}},\mu _{0}}\left[ F\right] , \end{aligned}$$
(65)

where \(F\) is a generic functional and \(L_{V_{1},V_{2}}\) is defined in (16) for pure jump processes. For diffusion processes

$$\begin{aligned} A_{T}^{e}\equiv \frac{1}{T}\left( \int _{0}^{T}dt\left[ V_{1}(X_{t})+V_{2}(X_{t})\circ dX_{t}\right] \right) , \end{aligned}$$
(66)

and relation (30) is equivalent to (65), with \(L_{V_{1},V_{2}}\) defined in (31).

The special functional \(F=\delta (X_{T}-y)\) gives the Feynman–Kac type relation

$$\begin{aligned} \mathbb {E}_{L,\mu _{0}}\left[ \exp \left( TA_{T}^{e}\right) \delta (X_{T}-y)\right]= & {} \int _{\mathcal {E}}\mu _{0}(dx_{0})\exp \left( TL_{V_{1},V_{2}}\right) (x_{0},y). \end{aligned}$$
(67)

We assume that the twisted operator \(L_{V_{1},V_{2}}\) is of Perron–Frobenius type, i.e., there exists a positive gapped principal eigenvalue with maximal real part \(\lambda \left[ V_{1},V_{2}\right] \) related to a unique positive right eigenvector \(r\left[ V_{1},V_{2}\right] \) and a unique positive left eigenvector \(l\left[ V_{1},V_{2}\right] \) (see footnote 9). Multiplicative factors are fixed by normalization as

$$\begin{aligned} \int _{\mathcal {E}}l\left[ V_{1},V_{2}\right] (x)dx=1\quad \text {and}\quad \int _{\mathcal {E}}l\left[ V_{1},V_{2}\right] (x)r\left[ V_{1},V_{2}\right] (x)dx=1. \end{aligned}$$
(68)

It is also assumed that the initial measure fulfills

$$\begin{aligned} \int _{\mathcal {E}}\mu _{0}(dx)r\left[ V_{1},V_{2}\right] (x)<\infty . \end{aligned}$$
(69)

With this principal eigenvalue and its associated eigenvectors, the semi-group generated by \(L_{V_{1},V_{2}}\) can be expanded as

$$\begin{aligned} \exp \left( TL_{V_{1},V_{2}}\right) (x,y)=\exp \left( T\lambda _{V_{1},V_{2}}\right) \left( r\left[ V_{1},V_{2}\right] (x)l\left[ V_{1},V_{2}\right] (y)+O\left( \exp \left( -T\varDelta _{V_{1},V_{2}}\right) \right) \right) , \end{aligned}$$
(70)

where \(\varDelta _{V_{1},V_{2}}\) is the spectral gap. Combining this last equation with the Feynman–Kac relation (67) we obtain

$$\begin{aligned} \mathbb {E}_{\mu _{0},L}\left[ \exp \left( TA_{T}^{e}\right) \delta (X_{T}-y)\right]= & {} \exp \left( T\lambda _{V_{1},V_{2}}\right) \int _{\mathcal {E}}\mu _{0}(dx_{0})\left( r\left[ V_{1},V_{2}\right] (x_{0})l\left[ V_{1},V_{2}\right] (y)\right. \nonumber \\&+\left. O\left( \exp \left( -T\varDelta _{V_{1},V_{2}}\right) \right) \right) . \end{aligned}$$
(71)

Therefore, the scaled cumulant generating function of \(A_{T}^{e}\) is

$$\begin{aligned} \varLambda \left[ V_{1},V_{2}\right] =\lambda \left[ V_{1},V_{2}\right] . \end{aligned}$$
(72)
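The identification (72) can be illustrated on a two-state jump process. The sketch below rests on assumptions that are not in the text: hypothetical rates `a` and `b`, the choice \(V_{1}=0\) and \(V_{2}=s\) on both jumps (so that \(TA_{T}^{e}\) is \(s\) times the number of jumps), and a twisted generator with off-diagonal entries \(W(x,y)\exp (V_{2}(x,y))\) and diagonal entries \(V_{1}(x)-\sum _{y}W(x,y)\), which is the form we take (16) to have. The principal eigenvalue is obtained by power iteration on a shifted matrix and compared with the closed form for a \(2\times 2\) matrix and with a finite-time evaluation of the expectation through the semi-group in (67), integrated here with a simple Euler scheme.

```python
import math


def tilted_generator(W, V1, V2):
    """Matrix of the twisted operator for a finite-state jump process (assumed form of (16))."""
    n = len(W)
    M = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y:
                M[x][y] = W[x][y] * math.exp(V2[x][y])
        M[x][x] = V1[x] - sum(W[x][y] for y in range(n) if y != x)
    return M


def matvec(M, v):
    return [sum(M[x][y] * v[y] for y in range(len(v))) for x in range(len(M))]


def principal_eigenvalue(M, iterations=500):
    """Power iteration on M + shift * I, which has non-negative entries."""
    n = len(M)
    shift = 1.0 + max(abs(M[x][x]) for x in range(n))
    v = [1.0 / n] * n
    growth = shift
    for _ in range(iterations):
        Mv = matvec(M, v)
        w = [Mv[x] + shift * v[x] for x in range(n)]
        growth = sum(w)              # v sums to one, so this estimates lambda + shift
        v = [wx / growth for wx in w]
    return growth - shift


def finite_time_scgf(M, mu0, T, steps):
    """(1/T) ln of mu0 . exp(T M) . 1, with exp(T M) . 1 obtained by explicit Euler steps."""
    n = len(M)
    dt = T / steps
    v = [1.0] * n
    for _ in range(steps):
        Mv = matvec(M, v)
        v = [v[x] + dt * Mv[x] for x in range(n)]
    return math.log(sum(mu0[x] * v[x] for x in range(n))) / T


a, b, s = 1.0, 2.0, 0.5              # hypothetical rates and tilting parameter
W = [[0.0, a], [b, 0.0]]
M = tilted_generator(W, [0.0, 0.0], [[0.0, s], [s, 0.0]])

lam = principal_eigenvalue(M)
lam_exact = 0.5 * (-(a + b) + math.sqrt((a - b) ** 2 + 4.0 * a * b * math.exp(2.0 * s)))
lam_T = finite_time_scgf(M, [0.5, 0.5], T=40.0, steps=40000)
lam_untilted = principal_eigenvalue(tilted_generator(W, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]))
print(lam, lam_exact, lam_T, lam_untilted)
```

The finite-time value `lam_T` differs from `lam` by a correction of order \(1/T\) coming from the prefactor in (71), plus the discretization error of the Euler scheme.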

We are now ready to prove that (61) allows us to obtain the explicit forms (51) and (52).

4.2.3 Level 2.5 for Jump Processes

Using (72), relation (61), with \( \overrightarrow{\omega _{t}^{e}} \equiv \) \(\left\{ \rho _{t}^{e},C_{t}^{e}\right\} \), becomes

$$\begin{aligned} I\left[ \rho ,C\right] =\sup _{V_{1},V_{2}}\left\{ \int _{\mathcal {E}}dx\rho (x)V_{1}(x)+\int \int _{\mathcal {E}^{2}}dxdyC(x,y)V_{2}(x,y)-\lambda \left[ V_{1},V_{2}\right] \right\} . \end{aligned}$$
(73)

The functions \(V_{1}^{\star }\) and \(V_{2}^{\star }\) extremizing the above expression are then obtained by solving the equations

$$\begin{aligned} \left. \frac{\delta \lambda \left[ V_{1},V_{2}\right] }{\delta V_{1}(x)}\right| _{V_{1}^{\star },V_{2}^{\star }}=\rho (x)\qquad \text {and}\qquad \left. \frac{\delta \lambda \left[ V_{1},V_{2}\right] }{\delta V_{2}(x,y)}\right| _{V_{1}^{\star },V_{2}^{\star }}=C(x,y). \end{aligned}$$
(74)

Furthermore, the normalization (68) and \(L_{V_{1},V_{2}}\left[ r\left[ V_{1},V_{2}\right] \right] (x)= \lambda \left[ V_{1},V_{2}\right] r\left[ V_{1},V_{2}\right] (x)\), lead to

$$\begin{aligned} \int _{\mathcal {E}}l\left[ V_{1},V_{2}\right] (x)L_{V_{1},V_{2}}\left[ r\left[ V_{1},V_{2}\right] \right] (x)dx=\lambda \left[ V_{1},V_{2}\right] . \end{aligned}$$
(75)

From (16), applying functional derivatives to (75) we obtain

$$\begin{aligned} {\left\{ \begin{array}{ll} l\left[ V_{1},V_{2}\right] (x)r\left[ V_{1},V_{2}\right] (x)=\frac{\delta \lambda \left[ V_{1},V_{2}\right] }{\delta V_{1}(x)}\\ l\left[ V_{1},V_{2}\right] (x)W(x,y)\left[ \exp \left( V_{2}(x,y)\right) \right] r\left[ V_{1},V_{2}\right] (y)=\frac{\delta \lambda \left[ V_{1},V_{2}\right] }{\delta V_{2}(x,y)},\end{array}\right. } \end{aligned}$$
(76)

which, with (74), leads to

$$\begin{aligned} {\left\{ \begin{array}{ll} l\left[ V_{1}^{\star },V_{2}^{\star }\right] (x)r\left[ V_{1}^{\star },V_{2}^{\star }\right] (x)=\rho (x)\\ l\left[ V_{1}^{\star },V_{2}^{\star }\right] (x)W(x,y)\left[ \exp \left( V_{2}^{\star }(x,y)\right) \right] r\left[ V_{1}^{\star },V_{2}^{\star }\right] (y)=C(x,y).\end{array}\right. } \end{aligned}$$
(77)

From the definitions of \(l\left[ V_{1},V_{2}\right] \) and \(r\left[ V_{1},V_{2}\right] \) as the left and right eigenvectors of \(L_{V_{1},V_{2}}\), the second equation in (77) implies

$$\begin{aligned} {\left\{ \begin{array}{ll} \int dxC(x,y)=\left( \lambda \left[ V_{1}^{\star },V_{2}^{\star }\right] +W\left[ 1\right] (y)-V_{1}^{\star }(y)\right) l\left[ V_{1}^{\star },V_{2}^{\star }\right] (y)r\left[ V_{1}^{\star },V_{2}^{\star }\right] (y)\\ \int dxC(y,x)=\left( \lambda \left[ V_{1}^{\star },V_{2}^{\star }\right] +W\left[ 1\right] (y)-V_{1}^{\star }(y)\right) l\left[ V_{1}^{\star },V_{2}^{\star }\right] (y)r\left[ V_{1}^{\star },V_{2}^{\star }\right] (y),\end{array}\right. } \end{aligned}$$
(78)

where the first (second) line is obtained with an integration in \(x\) (\(y\)). Hence, the constraint (37) is a necessary condition for the extremization and, moreover, using the first equation in (77) we obtain

$$\begin{aligned} \lambda \left[ V_{1}^{\star },V_{2}^{\star }\right] +W\left[ 1\right] (y)-V_{1}^{\star }(y)=\frac{\int dxC(x,y)}{\rho (y)}. \end{aligned}$$
(79)

Finally, from (73) we obtain the rate function (51) as follows,

$$\begin{aligned} I\left[ \rho ,C\right]&= \int \int _{\mathcal {E}^{2}}dxdyC(x,y)V_{2}^{\star }(x,y)-\int _{\mathcal {E}}dx\rho (x)\left( \lambda \left[ V_{1}^{\star },V_{2}^{\star }\right] -V_{1}^{\star }(x)\right) \nonumber \\&= \int \int _{\mathcal {E}^{2}}dxdyC(x,y)\ln \left[ \frac{C(x,y)}{l\left[ V_{1}^{\star },V_{2}^{\star }\right] (x)W(x,y)r\left[ V_{1}^{\star },V_{2}^{\star }\right] (y)}\right] \nonumber \\&\quad -\int _{\mathcal {E}}dx\rho (x)\left( \frac{\int dyC(y,x)}{\rho (x)}-W\left[ 1\right] (x)\right) \nonumber \\&= \int \int _{\mathcal {E}^{2}}dxdyC(x,y)\ln \left[ \frac{C(x,y)}{l\left[ V_{1}^{\star },V_{2}^{\star }\right] (x)r\left[ V_{1}^{\star },V_{2}^{\star }\right] (x)W(x,y)}\right] \nonumber \\&\quad +\int \int _{\mathcal {E}^{2}}dxdyC(x,y)\ln \left[ \frac{r\left[ V_{1}^{\star },V_{2}^{\star }\right] (x)}{r\left[ V_{1}^{\star },V_{2}^{\star }\right] (y)}\right] \nonumber \\&\quad - \int _{\mathcal {E}}dx\rho (x)\left( \frac{\int dyC(y,x)}{\rho (x)}-\int dyW(x,y)\right) \nonumber \\&= \int \int _{\mathcal {E}^{2}}dxdyC(x,y)\ln \left[ \frac{C(x,y)}{\rho (x)W(x,y)}\right] \nonumber \\&\quad -\int _{\mathcal {E}}dx\rho (x)\left( \frac{\int dyC(y,x)}{\rho (x)}-\int dyW(x,y)\right) \nonumber \\&\quad + \int _{\mathcal {E}}dx\ln \left[ r\left[ V_{1}^{\star },V_{2}^{\star }\right] (x)\right] \int _{\mathcal {E}}dy\left( C(x,y)-C(y,x)\right) . \end{aligned}$$
(80)

Passing from the first to the second line we used \(V_{2}^{\star }(x,y)=\ln \left[ \frac{C(x,y)}{l\left[ V_{1}^{\star },V_{2}^{\star }\right] (x)W(x,y)r\left[ V_{1}^{\star },V_{2}^{\star }\right] (y)}\right] \), which follows from (77), and Eq. (79). Moreover, in the last equality we used the first equation in (77) and the last term is zero due to the constraint (37), thus leading to expression (51) for the rate function.
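The extremization can be checked numerically for a finite state space. The sketch below uses assumptions that are not in the text: the same kind of hypothetical three-state example as one would use to test (51), sums in place of integrals, a twisted generator with off-diagonal entries \(W(x,y)\exp (V_{2}(x,y))\) and diagonal entries \(V_{1}(x)-W[1](x)\), and the particular extremizer obtained from (77) and (79) by fixing \(r=1\), \(l=\rho \) and \(\lambda =0\). It compares the value of the functional in (73) at this extremizer with the explicit sum obtained in (80), and evaluates the functional at randomly perturbed potentials.

```python
import math
import random

W = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 3.0],
     [2.0, 1.0, 0.0]]            # hypothetical jump rates
rho = [0.5, 0.3, 0.2]            # candidate density
C = [[0.0, 0.4, 0.1],
     [0.1, 0.0, 0.4],
     [0.4, 0.1, 0.0]]            # candidate flow with balanced marginals
n = len(rho)
pairs = [(x, y) for x in range(n) for y in range(n) if x != y]


def principal_eigenvalue(V1, V2, iterations=2000):
    """Principal eigenvalue of the twisted generator, by shifted power iteration."""
    M = [[W[x][y] * math.exp(V2[x][y]) if x != y else 0.0 for y in range(n)]
         for x in range(n)]
    for x in range(n):
        M[x][x] = V1[x] - sum(W[x][y] for y in range(n) if y != x)
    shift = 1.0 + max(abs(M[x][x]) for x in range(n))
    v = [1.0 / n] * n
    growth = shift
    for _ in range(iterations):
        w = [sum(M[x][y] * v[y] for y in range(n)) + shift * v[x] for x in range(n)]
        growth = sum(w)
        v = [wx / growth for wx in w]
    return growth - shift


def objective(V1, V2):
    """Functional inside the supremum of (73)."""
    linear = sum(rho[x] * V1[x] for x in range(n)) + sum(C[x][y] * V2[x][y] for x, y in pairs)
    return linear - principal_eigenvalue(V1, V2)


# Extremizer in the gauge r = 1: the twisted generator becomes the generator with rates C / rho.
V2_star = [[math.log(C[x][y] / (rho[x] * W[x][y])) if x != y else 0.0 for y in range(n)]
           for x in range(n)]
V1_star = [sum(W[x][y] - C[x][y] / rho[x] for y in range(n) if y != x) for x in range(n)]

I_explicit = sum(C[x][y] * math.log(C[x][y] / (rho[x] * W[x][y])) - C[x][y] + rho[x] * W[x][y]
                 for x, y in pairs)
best = objective(V1_star, V2_star)

rng = random.Random(0)
trial_values = []
for _ in range(20):
    V1 = [v + rng.uniform(-0.5, 0.5) for v in V1_star]
    V2 = [[V2_star[x][y] + rng.uniform(-0.5, 0.5) if x != y else 0.0 for y in range(n)]
          for x in range(n)]
    trial_values.append(objective(V1, V2))
print(best, I_explicit, max(trial_values))
```

Since the scaled cumulant generating function is convex, the functional in (73) is concave, so a stationary point is a global maximum; no perturbed value in `trial_values` should exceed `best` beyond rounding.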

4.2.4 Level 2.5 for Diffusion Processes

Using (72), for diffusion processes (61) becomes

$$\begin{aligned} I\left[ \rho ,j\right] =\sup _{V_{1},V_{2}}\left\{ \int _{\mathcal {E}}dx\left[ \rho (x)V_{1}(x)+j(x).V_{2}(x)\right] -\lambda \left[ V_{1},V_{2}\right] \right\} . \end{aligned}$$
(81)

The following three changes of variables lead to the final expression (52).

  • First, \(\left( V_{1},V_{2}\right) \rightarrow \left( V'_{1}=\ln \left( r\left[ V_{1},V_{2}\right] \right) ,V_{2}\right) \), leading to

    $$\begin{aligned} I\left[ \rho ,j\right] =\sup _{V'_{1},V_{2}}\left\{ \int _{\mathcal {E}}dx\left[ \rho (x)\left( -\exp \left( -V'_{1}(x)\right) L_{0,V_{2}}\left[ \exp \left( V'_{1}\right) \right] (x)\right) +j(x).V_{2}(x)\right] \right\} . \end{aligned}$$
    (82)

    This is proved in Appendix 1. Note that \(\ln \left( r\left[ V_{1},V_{2}\right] \right) \) is well defined because \(r\left[ V_{1},V_{2}\right] \) is positive (from the Perron–Frobenius theorem).

  • Second, \(\left( V'_{1},V_{2}\right) \rightarrow \left( V'_{1},V'_{2}=V_{2}+\nabla V'_{1}\right) \), leading to

    $$\begin{aligned} I\left[ \rho ,j\right]&= -\inf _{V'_{1}}\left( \int _{\mathcal {E}}dxj(x).\nabla V'_{1}\right) \nonumber \\&\quad -\inf _{V'_{2}}\left( \int _{\mathcal {E}}dx\left[ \left( V'_{2}-\left( \rho D\right) ^{-1}\left( j-J_{\rho }\right) \right) \frac{\rho D}{2}\left( V'_{2}-\left( \rho D\right) ^{-1}\left( j-J_{\rho }\right) \right) \right] \right) \nonumber \\&\quad +\int dx\left( j-J_{\rho }\right) \frac{\left( \rho D\right) ^{-1}}{2}\left( j-J_{\rho }\right) . \end{aligned}$$
    (83)

    This is proved in Appendix 2.

  • Third, \(\left( V'_{1},V'_{2}\right) \rightarrow \left( V'_{1},V''_{2}=V'_{2}-\left( \rho D\right) ^{-1}\left( j-J_{\rho }\right) \right) \), finally gives

    $$\begin{aligned} I\left[ \rho ,j\right]&=-\inf _{V'_{1}}\left( \int _{\mathcal {E}}dxj(x).\nabla V'_{1}\right) \nonumber \\&\quad -\inf _{V''_{2}}\left( \int _{\mathcal {E}}dxV''_{2}(x)\frac{\rho D}{2}(x)V''_{2}(x)\right) +\int dx\left( j-J_{\rho }\right) \frac{\left( \rho D\right) ^{-1}}{2}\left( j-J_{\rho }\right) . \end{aligned}$$
    (84)

    The first term vanishes if the constraint (41) is fulfilled and is \(+\infty \) otherwise (the infimum itself being \(-\infty \)), while the second term vanishes. This last equation gives the final form (52).
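The surviving quadratic term of (84) is easy to evaluate in one dimension. The sketch below is only an illustration under assumptions that are not in the text: a diffusion on the unit ring with constant diffusion coefficient `D`, a drift function `drift` playing the role of the drift entering \(J_{\rho }\), the form \(J_{\rho }=\text {drift}\,\rho -\frac{D}{2}\rho '\) suggested by (58) and (59), and a constant current `j`, which is divergence-free on the ring. For a uniform density and a constant drift `a` the integral reduces to \((j-a)^{2}/(2D)\), which the code uses as a check.

```python
import math


def integrate_periodic(f, n=2000):
    """Midpoint rule on the unit ring [0, 1)."""
    return sum(f((k + 0.5) / n) for k in range(n)) / n


def rate_function(rho, drho, drift, D, j):
    """(1/2) * integral of (j - J_rho)^2 / (rho * D), with J_rho = drift * rho - (D / 2) * rho'."""
    def integrand(x):
        J = drift(x) * rho(x) - 0.5 * D * drho(x)
        return 0.5 * (j - J) ** 2 / (rho(x) * D)
    return integrate_periodic(integrand)


D = 0.5          # hypothetical constant diffusion coefficient
a = 1.2          # hypothetical constant drift


def const_drift(x):
    return a


def uniform(x):
    return 1.0


def zero(x):
    return 0.0


def bumpy(x):
    # Non-uniform density that still integrates to one on the ring.
    return 1.0 + 0.5 * math.cos(2.0 * math.pi * x)


def dbumpy(x):
    # Derivative of bumpy.
    return -math.pi * math.sin(2.0 * math.pi * x)


I_typical = rate_function(uniform, zero, const_drift, D, j=a)     # typical density and current
I_shifted = rate_function(uniform, zero, const_drift, D, j=0.4)   # atypical current
I_closed = (0.4 - a) ** 2 / (2.0 * D)
I_bumpy = rate_function(bumpy, dbumpy, const_drift, D, j=a)       # atypical density
print(I_typical, I_shifted, I_closed, I_bumpy)
```

The value vanishes only when `j` coincides with \(J_{\rho }\) everywhere, which in this example requires the uniform density together with `j = a`.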

5 Stationary Fluctuation Relation at the Level 2.5

We now consider the fluctuating entropy \(\mathbb {J}_{T}\), which is obtained from the action functional (43) setting \(\mu _{0}(dx)=\mu _{0}^{b}(dx)=dx\). We define the function

$$\begin{aligned} \mathbb {J}_{T}/T=w(\rho _T^e,C_T^e)\qquad \text {and}\qquad \mathbb {J}_{T}/T=w(\rho _T^e,j_T^e), \end{aligned}$$
(85)

for pure jump and diffusion processes, respectively. From formulas (44) and (45), this function reads

$$\begin{aligned} w(\rho ,C)=\int dxdyC(x,y)\ln \left[ \frac{W(x,y)}{W(y,x)}\right] \quad \text {and}\quad w(\rho ,j)=2\int dx\widehat{A_{0}}\left( x\right) .D^{-1}\left( x\right) j(x). \end{aligned}$$
(86)

The choice \(F_{\left[ 0,T\right] }=\delta (\rho ^e_{T}-\rho ,C^e_{T}-C)\) for pure jump and \(F_{\left[ 0,T\right] }=\delta (\rho ^e_{T}-\rho ,j^e_{T}-j)\) for diffusion processes in (46) gives the finite time relation

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathbb {P}_{\mu _{0},L}(\rho _{T}^{e}=\rho ,C_{T}^{e}=C^{t})=\exp \left( -Tw(\rho ,C)\right) \mathbb {P}_{\mu _{0},L}(\rho _{T}^{e}=\rho ,C_{T}^{e}=C)\\ \mathbb {P}_{\mu _{0},L}(\rho _{T}^{e}=\rho ,j_{T}^{e}=-j)=\exp \left( -Tw(\rho ,j)\right) \mathbb {P}_{\mu _{0},L}(\rho _{T}^{e}=\rho ,j_{T}^{e}=j)\end{array}\right. }, \end{aligned}$$
(87)

where we used the general relations \(\rho _{T}^{e}\circ R=\rho _{T}^{e}\), \(j_{T}^{e}\circ R=-j_{T}^{e}\), and \(C_{T}^{e}\circ R=\left( C_{T}^{e}\right) ^{t}\), with the index \(t\) indicating transposition. With the rate function for the large deviations at the level 2.5 obtained in the last section, taking the large time asymptotics of both sides of the previous relation yields the stationary fluctuation relation at level 2.5

$$\begin{aligned} I(\rho ,C^{t})=w\left[ \rho ,C\right] +I(\rho ,C)\quad \text {and}\quad I(\rho ,-j)=w\left[ \rho ,j\right] +I(\rho ,j). \end{aligned}$$
(88)

From this relation, with the contraction \(I(w)=\min _{w(\rho ,C)=w}\left[ I(\rho ,C)\right] \) (or \(I(w)=\min _{w(\rho ,j)=w}\left[ I(\rho ,j)\right] \) for diffusion processes), we obtain the stationary fluctuation relation

$$\begin{aligned} I(-w)=I(w)+w. \end{aligned}$$
(89)

This symmetry of the rate function of \(\mathbb {J}_{T}\) is the GCEM symmetry. This relation can also be obtained from the transient fluctuation relation (49). We note that currents with such a symmetry in the rate function that are different from the fluctuating entropy \(\mathbb {J}_{T}\) have been found in [13]. Investigating the relation between these symmetric non-entropic currents and large deviations at the level 2.5 would be interesting.
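Both halves of the level 2.5 relation (88) can be verified numerically on small examples. The sketch below relies on assumptions that are not in the text: a hypothetical three-state jump process with a flow `C` whose marginals balance, the explicit jump rate function written as the sum of \(C\ln (C/(\rho W))-C+\rho W\), and a one-dimensional diffusion on the unit ring with constant `D`, drift `drift`, current \(J_{\rho }=\text {drift}\,\rho -\frac{D}{2}\rho '\) and quadratic rate function \(\frac{1}{2}\int (j-J_{\rho })^{2}/(\rho D)\). In both cases `w` is evaluated as in (86).

```python
import math

# Pure jump part (hypothetical rates, density and balanced flow).
W = [[0.0, 2.0, 1.0],
     [1.0, 0.0, 3.0],
     [2.0, 1.0, 0.0]]
rho = [0.5, 0.3, 0.2]
C = [[0.0, 0.4, 0.1],
     [0.1, 0.0, 0.4],
     [0.4, 0.1, 0.0]]
n = len(rho)
pairs = [(x, y) for x in range(n) for y in range(n) if x != y]


def level25_jump(rho, C):
    """Sum over x != y of C ln(C / (rho W)) - C + rho W."""
    return sum(C[x][y] * math.log(C[x][y] / (rho[x] * W[x][y])) - C[x][y] + rho[x] * W[x][y]
               for x, y in pairs)


def w_jump(C):
    """First expression in (86), with the integral replaced by a sum."""
    return sum(C[x][y] * math.log(W[x][y] / W[y][x]) for x, y in pairs)


C_t = [[C[y][x] for y in range(n)] for x in range(n)]   # transposed flow
lhs_jump = level25_jump(rho, C_t)
rhs_jump = w_jump(C) + level25_jump(rho, C)

# Diffusion part on the unit ring (hypothetical drift, density and constant current).
D = 0.5


def drift(x):
    return 1.0 + 0.3 * math.sin(2.0 * math.pi * x)


def density(x):
    return 1.0 + 0.5 * math.cos(2.0 * math.pi * x)


def d_density(x):
    return -math.pi * math.sin(2.0 * math.pi * x)


def integrate_periodic(f, m=2000):
    """Midpoint rule on [0, 1)."""
    return sum(f((k + 0.5) / m) for k in range(m)) / m


def level25_diffusion(j):
    def integrand(x):
        J = drift(x) * density(x) - 0.5 * D * d_density(x)
        return 0.5 * (j - J) ** 2 / (density(x) * D)
    return integrate_periodic(integrand)


def w_diffusion(j):
    """Second expression in (86) in one dimension."""
    return integrate_periodic(lambda x: 2.0 * drift(x) * j / D)


j = 0.7
lhs_diff = level25_diffusion(-j)
rhs_diff = w_diffusion(j) + level25_diffusion(j)
print(lhs_jump, rhs_jump, lhs_diff, rhs_diff)
```

In the jump case the identity uses the balance of the marginals of `C`, and in the diffusion case it uses the periodicity of \(\ln \rho \); if either assumption is dropped, the two sides no longer agree in general.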