1 Introduction

We start with a brief review of Onsager theory [1,2,3,4,5,6]. Let \(\varvec{X}=(X^i)_{i=1}^N\) be a complete set of unconstrained thermodynamic extensive variables of an isolated system. Onsager theory formulates the deterministic dynamics of the thermodynamic variables in relaxation processes to the equilibrium state. The time evolution is simply expressed as

$$\begin{aligned} \frac{\mathrm {d}X^i}{\mathrm {d}t}= \sum \limits _{j} L^{ij} \frac{\partial S(\varvec{X})}{\partial X^j}, \end{aligned}$$
(1)

where \(S(\varvec{X})\) is the thermodynamic entropy of the system, \(\partial S(\varvec{X})/\partial X^j\) corresponds to the thermodynamic force, and \(L^{ij}\) is called the Onsager coefficient. The important consequence of Onsager theory is the reciprocity

$$\begin{aligned} L^{ij}=L^{ji}. \end{aligned}$$
(2)

This nontrivial result was derived by studying fluctuations in equilibrium. Concretely, the fluctuation is assumed to be described by a Langevin equation,

$$\begin{aligned} \frac{\mathrm {d}X^i}{\mathrm {d}t}= \sum \limits _{j} L^{ij} \frac{\partial S(\varvec{X})}{\partial X^j} + \sum \limits _{j} l^{ij}\xi ^{j}, \end{aligned}$$
(3)

with the Gaussian white noise satisfying \(\left\langle \xi ^i(t) \xi ^{j}(t^{\prime })\right\rangle =\delta ^{ij}\delta (t-t^{\prime }).\) This assumption means that the most probable regression process for a given fluctuation is equivalent to the relaxation dynamics, which is referred to as the regression hypothesis. Furthermore, according to equilibrium statistical mechanics, the stationary probability density of \(\varvec{X}\) is

$$\begin{aligned} P_\mathrm{eq}(\varvec{X})=\frac{1}{Z}\exp [ S(\varvec{X})], \end{aligned}$$
(4)

where Z is the normalization constant. The time-reversibility of microscopic systems then provides the nontrivial relation

$$\begin{aligned} \sum \limits _k l^{ik}l^{jk}=2L^{ij}, \end{aligned}$$
(5)

which leads to (2). The relation (5) is referred to as the fluctuation–dissipation relation of the second kind.

It should be noted that (1) is a nonlinear equation for \(\varvec{X}\) because \(S(\varvec{X})\) is not necessarily a quadratic function. In the argument above, \(L^{ij}\) is assumed to be independent of \(\varvec{X}.\) Because dependence of \(L^{ij}\) on \(\varvec{X}\) is expected in general, it is natural to consider generalized forms of (1) and (3). One approach takes (3) with \(L^{ij}(\varvec{X})\) and \(l^{ij}(\varvec{X})\) as the starting equation, together with a specified multiplication rule for \(l^{ij}(\varvec{X})\) and \(\xi ^j.\) Once a stochastic system is defined, the stationary distribution for \(\varvec{X}\) is determined. We then find that the stationary distribution is, in general, not given by (4) for any multiplication rule for \(l^{ij}(\varvec{X})\) and \(\xi ^{j}.\) This means that there is no consistent description of (3)–(5) when dependence of \(L^{ij}\) on \(\varvec{X}\) is considered. The key point is that a generalization of (3) with \(L^{ij}(\varvec{X})\) is far from obvious.

We can now describe the dynamics of \(\varvec{X}\) on the basis of a Hamiltonian system consisting of atoms and molecules. Suppose that \(\varvec{X}\) is a complete set of slow variables for the system. Examples include a complete set of unconstrained extensive variables in thermodynamics. Then, by using a separation of time scales, we may study the time evolution of \(\varvec{X}\) from the microscopic mechanical description. The result is that (3) is replaced by

$$\begin{aligned} \frac{\mathrm{d}X^{i}}{\mathrm{d}t}= {\mathcal {J}}^{i}_{\mathrm{rev}} (\varvec{X}) +\sum \limits _{j} L^{ij}(\varvec{X})\frac{\partial S(\varvec{X})}{\partial X^{j}} +\sum \limits _j \frac{\partial L^{ij}(\varvec{X})}{\partial X^{j}} + \sum \limits _{j} l^{ij}(\varvec{X})\cdot \xi ^{j}, \end{aligned}$$
(6)

where the multiplication of \(l^{ij}(\varvec{X})\) and \(\xi ^{j}\) is interpreted as Itô-type, and \(\mathcal {J}^{i}_{\mathrm {rev}}\) is the so-called reversible term that does not contribute to changes in entropy.
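As an informal consistency check, the role of the third term in (6) can be illustrated in one dimension. The sketch below is not part of the derivation: it assumes \(\mathcal {J}_{\mathrm {rev}}=0,\) \(l=\sqrt{2L},\) and the hypothetical choices \(S(X)=-X^{4}/4\) and \(L(X)=1+X^{2}.\) It evaluates the stationary probability current of the corresponding Itô Fokker–Planck equation for \(P\propto \exp [S],\) with and without the \(\partial L/\partial X\) drift.

```python
import math

# Hypothetical one-dimensional entropy and Onsager coefficient (assumptions).
def S(x):
    return -x**4 / 4.0

def dS(x):
    return -x**3

def L(x):
    return 1.0 + x**2

def dL(x):
    return 2.0 * x

def current(x, with_dL_term, h=1e-5):
    """Stationary current J = drift * P - d(L * P)/dx for P = exp(S)."""
    P = lambda y: math.exp(S(y))
    drift = L(x) * dS(x) + (dL(x) if with_dL_term else 0.0)
    # Central finite difference of the product L * P.
    d_LP = (L(x + h) * P(x + h) - L(x - h) * P(x - h)) / (2.0 * h)
    return drift * P(x) - d_LP

points = [-1.0, -0.3, 0.4, 1.2]
with_term = [current(x, True) for x in points]
without_term = [current(x, False) for x in points]
```

In this sketch the current vanishes (up to finite-difference error) only when the \(\partial L/\partial X\) term is kept, which is consistent with \(\exp [S]\) being stationary for (6).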

There is a long history of studies on (6). In [7], Green derived the Fokker–Planck equation corresponding to (6) by combining phenomenological arguments with microscopic considerations. This was the genesis of (6). After this paper, many further studies were performed. For example, Green’s derivation was improved in [8] where fewer assumptions were used. More formal studies under the microscopic description re-derived the Fokker–Planck equation using a projection-operator method [9, 10] and a nonequilibrium statistical operator method [11]. In another direction, the Fokker–Planck equation corresponding to (6) was also derived from a general Fokker–Planck equation by imposing a detailed balance condition [12, 13]. Finally, the Langevin equation (6) was derived directly from Liouville’s equation using a nonlinear projection operator method [14]. Thus (6) was well established by 1975.

However, the result (6) is less well known nowadays. There may be two reasons. First, Graham attempted to develop a covariant description of nonlinear Onsager theory. Although this theory is complicated, many papers in this direction followed [15,16,17,18,19]. Unfortunately, we do not find a final answer in this direction; more importantly, we consider such a generalization to be unnecessary. Equation (6) is sufficiently general and universal. Second, when \(\varvec{X}\) is a set of extensive variables in thermodynamics, the third term is of higher order than the second term, as follows from the estimates \(S =O(\varOmega )\) and \(X^{i}= O(\varOmega )\) for the system size \(\varOmega .\) Therefore, if one uses the system size expansion [20, 21] in deriving the equation for slow variables, the third term does not appear for thermodynamically normal systems. Although this argument is correct, we emphasize that (6) can be a starting point for all systems, including small systems, once we identify a complete set of slow variables under equilibrium conditions. That is, in our opinion, (6) should be recognized as a fundamental equation for slow variables.

The main purpose of this paper is to re-derive (6) with particular emphasis on the separation of time scales and a universal asymptotic form of the probability density for time-averaged fluxes [22]. We first assume a complete set of slow variables. Let \(\tau _\mathrm{macro}\) be the shortest time scale of the slow variables and \(\tau _\mathrm{micro}\) be the longest time scale of the other dynamical variables. Then, from the separation of time scales \(\tau _\mathrm{micro} \ll \tau _\mathrm{macro},\) we can find \(\varDelta t\) such that \(\tau _\mathrm{micro} \ll \varDelta t \ll \tau _\mathrm{macro}.\) This \(\varDelta t \) plays two crucial roles in the derivation of the equation for slow variables. First, because \(\tau _\mathrm{micro} \ll \varDelta t,\) we can consider the central limit theorem for the time-averaged flux as a universal form of the asymptotic behavior of the transition probability of the slow variables during a time interval \(\varDelta t.\) The time reversibility in microscopic Hamiltonian systems provides a restriction on the transition probability. Second, because \(\varDelta t \ll \tau _\mathrm{macro},\) this universal form of the transition probability leads to the path integral form of a stochastic system. This stochastic system is nothing but (6). This concept is quite natural and general. Indeed, one can interpret the arguments of Onsager and Green through this concept. Nevertheless, as far as we know, there is no explicit presentation of the derivation of (6) with the universal asymptotic form of the probability density for time-averaged fluxes and the path integral formulation under \(\tau _\mathrm{micro} \ll \varDelta t \ll \tau _\mathrm{macro}.\) We thus expect that this paper will be instructive for understanding the universal form (6), and will also be useful for deriving the equation for slow variables even in systems out of equilibrium.

Here, we point out the difference between our approach and previous ones. Our final goal is to establish a firm connection between a Langevin equation and a microscopic mechanical system. The previous studies [9,10,11, 14] using a projection operator method or a nonequilibrium statistical operator method have the same motivation as ours. Their methods use some physical approximation (such as a Markovian approximation) just before obtaining a Langevin equation. The validity of the approximation depends on observation time scales and details of a system, and their formulation is based only on an identity, which is useful but far from a physical principle. Thus, their assumptions are outside the scope of the theories. We therefore aim to achieve our goal with physical principles. With this motivation, we use the central limit theorem with the separation of time scales for connecting a microscopic mechanical and mesoscopic stochastic description in a mathematically and physically clear way. This paper also differs from another type of derivation of the Langevin equation on the basis of arguments within stochastic processes [12, 13]. Their derivation is self-contained and elegant, but arguments relating to microscopic descriptions are outside the scope of their theory. As a technical remark, we note that they used the Kolmogorov forward and backward equations for restricting the form of the Langevin equation by imposing a detailed balance condition, while we directly use the transition probabilities. Although we do not completely achieve our aim, we believe that it is important to show the outline of our approach even without a rigorous proof. Our approach is not simply another derivation of known results, but provides a new direction for future studies.

The remainder of this paper is organized as follows. In Sect. 2, as preliminaries for the argument, we review a path integral formulation for a discrete-time Langevin equation. As described above, the central limit theorem for time-averaged fluxes is closely related to the path integral formulation of stochastic processes. The technical difficulty in the argument arises from its complicated expression, which may be entirely associated with the ill-defined nature of the multiplication of some quantities. To make the argument as clear as possible, we study a path integral form for discrete-time Langevin equations while keeping the time interval dt finite. We then find relations between different expressions of the path integral forms as first derived by Wissel [23]. This recovers each of the correct but apparently different expressions for the path integral in [24,25,26]. In Sect. 3, we consider as a special case a nonlinear Langevin equation for the momentum \(\varvec{P}\) of a Brownian particle of mass M in a homogeneous environment of temperature T

$$\begin{aligned} \frac{\mathrm {d}\varvec{P}}{\mathrm {d}t}&= {-}\frac{\gamma (\varvec{P})}{M}\varvec{P} +\sqrt{2 \gamma (\varvec{P}) T}\odot \varvec{\xi } \nonumber \\&= {-}\frac{\gamma (\varvec{P})}{M}\varvec{P} + T\nabla \gamma (\varvec{P}) + \sqrt{2 \gamma (\varvec{P}) T}\cdot \varvec{\xi }, \end{aligned}$$
(7)

where \(\odot \) denotes multiplication with the anti-Itô rule. The Itô and anti-Itô rules are explained in Sect. 2.1. Although (7) is well known, it has never been recognized as an example of (6) to the best of our knowledge. Indeed, we can derive (7) by using the central limit theorem with the separation of time scales. Then, in Sect. 4, we derive the general formula (6). Throughout this paper, the Boltzmann constant \(k_{\mathrm {B}}\) is set to unity.
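One possible way to read (7) as an instance of (6) is the following identification, given here only as an illustrative sketch. It assumes \(\mathcal {J}^{i}_{\mathrm {rev}}=0,\) takes the momentum as the only slow variable, and uses \(S(\varvec{P})=-\vert \varvec{P}\vert ^{2}/(2MT)\) up to an additive constant, \(L^{ab}(\varvec{P})=T\gamma (\varvec{P})\delta ^{ab},\) and \(l^{ab}(\varvec{P})=\sqrt{2T\gamma (\varvec{P})}\,\delta ^{ab},\) where the indices a and b label Cartesian components. Then

$$\begin{aligned} \sum \limits _{b} L^{ab}\frac{\partial S}{\partial P^{b}}&= -\frac{\gamma (\varvec{P})}{M}P^{a}, \nonumber \\ \sum \limits _{b} \frac{\partial L^{ab}}{\partial P^{b}}&= T\frac{\partial \gamma (\varvec{P})}{\partial P^{a}}, \nonumber \\ \sum \limits _{c} l^{ac}l^{bc}&= 2T\gamma (\varvec{P})\delta ^{ab} = 2L^{ab}, \end{aligned}$$

which reproduces, term by term, the Itô form in the second line of (7).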

2 Preliminaries: Path Integral Formulation of Discrete Time Stochastic Systems

2.1 Model and Path Integral Formulation

Let \(\varvec{x}=(x^{1},\ldots ,x^{N})\) be a collection of dynamical variables. We study the time evolution of \(\varvec{x}\) for a fixed time interval dt. We denote \(x^{i}(ndt)\) by \(x^{i}_{n},\) and we assume that \(dx^{i}_{n}\equiv x^{i}_{n+1}-x^{i}_{n}\) satisfies

$$\begin{aligned} d x^{i}_{n} = f^{i}\left( \varvec{x}_{n}\right) dt + \sum \limits _{j} g^{ij}\left( \varvec{x}_{n}\right) dB^{j}_{n}, \end{aligned}$$
(8)

where \(f^{i}\) and \(g^{ij}\) are smooth functions of \(\varvec{x},\) \(\varvec{B}(t)\) is a standard N-dimensional Wiener process [27], and \(dB^{i}_{n} \equiv B^{i}(ndt + dt) - B^{i}(ndt)\) is a Gaussian white noise with mean zero and covariance \(\langle dB^{i}_{n}dB^{j}_{m}\rangle =\delta ^{ij}\delta _{nm}dt.\) Because the short time interval dt can be considered to consist of shorter time intervals, we may use \(dB^{i}_{n}dB^{j}_{m}=\delta ^{ij}\delta _{nm}dt\) and ignore any o(dt) terms in the Taylor expansion, which is known as Itô’s lemma. Note that we obtain the Itô stochastic differential equation from (8) in the limit \(dt \rightarrow 0.\)
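The update rule (8) can be sketched in a few lines of Python. This is only a minimal illustration: the function name `step`, the example drift `f_example`, and the example noise amplitude `g_example` are hypothetical choices, not quantities from the text.

```python
import math
import random

def step(x, f, g, dt, rng):
    """Advance x_n -> x_{n+1} by one step of the discrete-time equation (8).

    x  : list of N floats (the state x_n)
    f  : callable, f(x) -> list of N drift components f^i(x_n)
    g  : callable, g(x) -> N x N nested list of amplitudes g^{ij}(x_n)
    dt : finite time step
    rng: random.Random instance supplying the Gaussian increments
    """
    n = len(x)
    # dB^j_n: independent Gaussians with mean zero and variance dt.
    dB = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    fx, gx = f(x), g(x)  # both evaluated at x_n, as in (8)
    return [x[i] + fx[i] * dt + sum(gx[i][j] * dB[j] for j in range(n))
            for i in range(n)]

# Hypothetical two-variable example: linear drift, state-dependent noise.
def f_example(x):
    return [-x[0], -2.0 * x[1]]

def g_example(x):
    a = math.sqrt(1.0 + x[0] ** 2)
    return [[a, 0.0], [0.0, 0.5]]

rng = random.Random(0)
x = [1.0, -1.0]
for _ in range(1000):
    x = step(x, f_example, g_example, 1e-3, rng)
```

Because \(f^{i}\) and \(g^{ij}\) are evaluated at \(\varvec{x}_{n},\) the sketch corresponds to the Itô-type discretization; no midpoint evaluation is involved.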

Denoting the probability density of finding \(\varvec{\chi }\) at time t by

$$\begin{aligned} P(\varvec{\chi },\,t)=\left\langle {\delta \left( \varvec{\chi }-\varvec{x}_{t}\right) }\right\rangle , \end{aligned}$$
(9)

and using Itô’s lemma and (8), we obtain the Fokker–Planck equation [27]

$$\begin{aligned} \frac{\partial }{\partial t}P(\varvec{\chi } ,\,t)&= {-} \sum \limits _{i} \frac{\partial }{\partial \chi ^{i}} \bigg [ f^{i}(\varvec{\chi }) P(\varvec{\chi } ,\,t)\bigg ] + \sum \limits _{i,j} \frac{\partial }{\partial \chi ^{i}}\frac{\partial }{\partial \chi ^{j}} \bigg [ G^{ij}(\varvec{\chi }) P(\varvec{\chi } ,\,t) \bigg ], \end{aligned}$$
(10)

in the limit \(dt\rightarrow 0\) with

$$\begin{aligned} G^{ij}(\varvec{x})&= \frac{1}{2}\sum \limits _{k} g^{ik}(\varvec{x}) g^{jk}(\varvec{x}). \end{aligned}$$
(11)

We introduce a parameter \(\alpha \) satisfying \(0\le \alpha \le 1,\) and define

$$\begin{aligned} \bar{x}_{n}^{i}&\equiv \alpha x_{n+1}^{i}+(1-\alpha ) x_{n}^{i}. \end{aligned}$$
(12)

The purpose of this section is to express the transition probability \(\mathcal {P}(\varvec{x}_{n+1}\vert \varvec{x}_{n})\) from \(\varvec{x}_{n}\) to \(\varvec{x}_{n+1}\) in terms of \(d\varvec{x}_{n}\) and \(\bar{\varvec{x}}_{n}.\) For any function \(A(\varvec{x})\) in the remainder of Sect. 2, we abbreviate \(A(\bar{\varvec{x}}_{n})\) and \(\partial A(\bar{\varvec{x}}_{n})/\partial \bar{x}^{i}_{n}\) to A and \(\partial ^{i}A,\) respectively. We present the expression for \(\mathcal {P}(\varvec{x}_{n+1}\vert \varvec{x}_{n})\) and derive it in the next subsection. The transition probability is

$$\begin{aligned} \mathcal {P}\left( \varvec{x}_{n+1}\vert \varvec{x}_{n}\right)&= \frac{1}{\sqrt{(4\pi dt)^{N} \det \mathsf {G}}} \exp \Bigg [ {-}\frac{dt}{4}\sum \limits _{i,j} \varDelta ^{i}_{n}\left( G^{-1}\right) ^{ij} \varDelta ^{j}_{n} \nonumber \\&\quad - \alpha \sum \limits _{i} \partial ^{i} f^{i} dt + \alpha ^{2} \sum \limits _{i,j} \partial ^{i} \partial ^{j} G^{ij} dt\Bigg ], \end{aligned}$$
(13)

with

$$\begin{aligned} \varDelta ^{i}_{n}&\equiv \frac{dx^{i}_{n}}{dt} - f^{i}\left( \bar{\varvec{x}}_{n}\right) + 2\alpha \sum \limits _{k} \partial ^{k} G^{ik}\left( \bar{\varvec{x}}_{n}\right) , \end{aligned}$$
(14)

where \((G^{-1})^{ij}\) is the ij component of the inverse of the matrix \(\mathsf {G}=(G^{ij}).\) In [23], this expression for the transition probability was derived from the Fokker–Planck equation (10). Note that (8) itself does not depend on \(\alpha .\) In the Appendix, we check the normalization condition for the transition probability (13) for \(N=1,\) and derive the Fokker–Planck equation (10) from the transition probability (13) in the limit \(dt\rightarrow 0.\)
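The normalization of (13) for \(N=1\) can also be probed numerically. The following sketch is not the check carried out in the Appendix: it assumes hypothetical test functions \(f(x)=-x\) and \(G(x)=1+0.3\sin x\) (bounded and strictly positive), a small but finite \(dt,\) and a simple trapezoidal rule over a window of a few standard deviations around \(x_{n}.\)

```python
import math

def transition_prob(x_next, x, alpha, dt, f, df, G, dG, d2G):
    """Transition probability (13) for N = 1, evaluated at x_bar from (12)."""
    x_bar = alpha * x_next + (1.0 - alpha) * x
    Delta = (x_next - x) / dt - f(x_bar) + 2.0 * alpha * dG(x_bar)  # (14)
    exponent = (-dt * Delta ** 2 / (4.0 * G(x_bar))
                - alpha * df(x_bar) * dt
                + alpha ** 2 * d2G(x_bar) * dt)
    return math.exp(exponent) / math.sqrt(4.0 * math.pi * dt * G(x_bar))

# Hypothetical smooth test functions (assumptions, not from the text).
f = lambda x: -x
df = lambda x: -1.0
G = lambda x: 1.0 + 0.3 * math.sin(x)
dG = lambda x: 0.3 * math.cos(x)
d2G = lambda x: -0.3 * math.sin(x)

def normalization(x, alpha, dt, n_points=4001, width=12.0):
    """Trapezoidal integral of (13) over x_{n+1} in a window around x."""
    sigma = math.sqrt(2.0 * G(x) * dt)
    a, b = x - width * sigma, x + width * sigma
    h = (b - a) / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        w = 0.5 if k in (0, n_points - 1) else 1.0
        total += w * transition_prob(a + k * h, x, alpha, dt,
                                     f, df, G, dG, d2G)
    return total * h

dt = 1e-3
norms = {alpha: normalization(0.5, alpha, dt) for alpha in (0.0, 0.5, 1.0)}
```

For \(\alpha =0\) the integrand is an exact Gaussian in \(x_{n+1}.\) For \(\alpha \ne 0\) we expect the integral to equal unity only up to corrections that vanish faster than dt, so the values in `norms` should be read with that caveat.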

Next, we consider the continuous-time limit of the transition probability. To avoid divergence of the prefactor \(1/\sqrt{(4\pi dt)^{N} \det \mathsf {G}}\) in the transition probability (13), we rewrite (13) as

$$\begin{aligned} \mathcal {P}\left( \varvec{x}_{n+1}\vert \varvec{x}_{n}\right) = \int \frac{\mathrm {d}^{N}\bar{\varvec{p}}_{n}}{(2\pi )^{N}}\exp \Big [ dt \mathcal {L}\left( \varvec{x}_{n+1},\,\bar{\varvec{p}}_{n}\vert \varvec{x}_{n}\right) \Big ], \end{aligned}$$
(15)

with

$$\begin{aligned} \mathcal {L}\left( \varvec{x}_{n+1},\,\bar{\varvec{p}}_{n}\vert \varvec{x}_{n}\right)&= {-} \sum \limits _{i,j} \bar{p}^{i}_{n} G^{ij}\left( \bar{\varvec{x}}_{n}\right) \bar{p}^{j}_{n} - i\sum \limits _{i}\bar{p}^{i}_{n} \varDelta ^{i}_{n} \nonumber \\&\quad - \alpha \sum \limits _{i} \partial ^{i} f^{i}\left( \bar{\varvec{x}}_{n}\right) + \alpha ^{2}\sum \limits _{i,j} \partial ^{i} \partial ^{j} G^{ij}\left( \bar{\varvec{x}}_{n}\right) , \end{aligned}$$
(16)

where \(\bar{\varvec{p}}_{n}=(\bar{p}^{1}_{n},\ldots ,\bar{p}^{N}_{n})\) is interpreted as the conjugate momentum of \(\bar{\varvec{x}}_{n}.\) Applying (15) at each time step and taking the limit \(dt\rightarrow 0,\) we can obtain the path integral for the Langevin equation. Note that the Stratonovich convention (\(\alpha =1/2\)) in (15) may be convenient for a perturbative analysis based on the path integral formulation because the Stratonovich convention preserves the chain rule of differential calculus.
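The equivalence of (15) and (13) can be seen directly for \(N=1\) from the standard Gaussian integral \(\int \mathrm {d}p\, \exp [-ap^{2}-ibp]=\sqrt{\pi /a}\exp [-b^{2}/(4a)]\) with \(a=G\,dt>0\) and \(b=\varDelta \,dt\) (a short worked step, with the indices and the argument \(\bar{x}_{n}\) suppressed):

$$\begin{aligned} \int \frac{\mathrm {d}\bar{p}}{2\pi } \exp \left[ -dt\, G \bar{p}^{2} - i\, dt\, \bar{p} \varDelta \right] = \frac{1}{\sqrt{4\pi dt\, G}} \exp \left[ -\frac{dt}{4}\frac{\varDelta ^{2}}{G}\right] . \end{aligned}$$

The remaining \(\alpha \)-dependent terms in (16) do not involve \(\bar{p}\) and pass through the integral unchanged.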

Finally, we compare (13) and (15) with some previous studies [24,25,26]. Instead of (8), we consider

$$\begin{aligned} d x^{i}_{n} = \tilde{F}^{i}\left( \tilde{\varvec{x}}_{n}\right) dt + \sum \limits _{j} g^{ij}\left( \tilde{\varvec{x}}_{n}\right) dB^{j}_{n}, \end{aligned}$$
(17)

with

$$\begin{aligned} \tilde{x}_{n}^{i} \equiv \tilde{\alpha } x_{n+1}^{i}+(1-\tilde{\alpha } ) x_{n}^{i}, \end{aligned}$$
(18)

where \(\tilde{F}^{i}\) is some smooth function, and \(\tilde{\alpha }\) is a parameter satisfying \(0\le \tilde{\alpha } \le 1.\) Here, \(\tilde{\alpha }=0,\,1/2,\) and 1 correspond to the Itô, Stratonovich, and anti-Itô convention, respectively, in the limit \(dt \rightarrow 0.\) Using Itô’s lemma, we can rewrite (17) as

$$\begin{aligned} d x^{i}_{n} = \tilde{F}^{i}\left( \varvec{x}_{n}\right) dt + \tilde{\alpha } \sum \limits _{j,k} g^{kj}\left( \varvec{x}_{n}\right) \frac{\partial g^{ij}(\varvec{x}_{n})}{\partial x^{k}_{n}}dt + \sum \limits _{j} g^{ij}\left( \varvec{x}_{n}\right) dB^{j}_{n}. \end{aligned}$$
(19)

Thus, we can obtain the transition probability \(\mathcal {P}(\varvec{x}_{n+1}\vert \varvec{x}_{n})\) for (17) by using (13) with

$$\begin{aligned} f^{i}=\tilde{F}^{i}+\tilde{\alpha }\sum \limits _{j,k}g^{kj}\partial ^{k}g^{ij}. \end{aligned}$$
(20)

Note that \(\tilde{\alpha }\) may be different from \(\alpha .\) When \(\alpha =\tilde{\alpha },\) (15) with (20) is equivalent to the results given in [24, 25]. When \((\alpha ,\,\tilde{\alpha })=(1/2,\,1),\) (13) with (20) is equivalent to the results given in [26].
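The drift conversion (20) is easy to evaluate numerically. The sketch below is a minimal illustration, assuming central finite differences for the derivatives of \(g^{ij};\) the helper name `ito_drift` and the friction function `gamma` are hypothetical. As a one-dimensional example modelled on (7), it converts the anti-Itô form (\(\tilde{\alpha }=1\)) with \(g=\sqrt{2\gamma (P)T}\) to the Itô form and compares the result with the drift \(-\gamma P/M+T\gamma ^{\prime }(P).\)

```python
import math

def ito_drift(F_tilde, g, alpha_tilde, x, h=1e-6):
    """Drift f^i of (20): F~^i + alpha~ * sum_{j,k} g^{kj} d g^{ij} / d x^k.

    Derivatives of g are approximated by central finite differences.
    """
    n = len(x)
    gx = g(x)
    drift = list(F_tilde(x))
    for k in range(n):
        xp = list(x)
        xp[k] += h
        xm = list(x)
        xm[k] -= h
        gp, gm = g(xp), g(xm)
        for i in range(n):
            for j in range(n):
                dg_ij_dk = (gp[i][j] - gm[i][j]) / (2.0 * h)
                drift[i] += alpha_tilde * gx[k][j] * dg_ij_dk
    return drift

# Hypothetical 1D illustration modelled on (7): g = sqrt(2 gamma(P) T).
T, M = 1.5, 2.0
gamma = lambda P: 1.0 + 0.5 * P ** 2
dgamma = lambda P: P
F_tilde = lambda x: [-gamma(x[0]) * x[0] / M]
g = lambda x: [[math.sqrt(2.0 * gamma(x[0]) * T)]]

P0 = 0.7
f_ito = ito_drift(F_tilde, g, 1.0, [P0])[0]       # anti-Ito -> Ito
expected = -gamma(P0) * P0 / M + T * dgamma(P0)   # second line of (7)
```

With \(\tilde{\alpha }=0\) the function returns \(\tilde{F}^{i}\) unchanged, as it should for the Itô convention.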

2.2 Derivation

We derive the transition probability (13) from (8) without using the Fokker–Planck equation (10). Using Itô’s lemma, we first rewrite (8) in terms of \(\bar{\varvec{x}}_{n}\) as

$$\begin{aligned} d x^{i}_{n} = F^{i} dt + \sum \limits _{j} g^{ij} dB^{j}_{n}, \end{aligned}$$
(21)

with

$$\begin{aligned} F^{i} = f^{i} - \alpha \sum \limits _{j,k} g^{kj} \partial ^{k} g^{ij}. \end{aligned}$$
(22)

Because \(dB^{i}_{n}\) is the Gaussian white noise with covariance \(\langle dB^{i}_{n}dB^{j}_{m}\rangle =\delta ^{ij}\delta _{nm}dt,\) the probability density of \(\{ dB^{i}_{n}\}\) is given by

$$\begin{aligned} P\left( \left\{ dB^{i}_{n}\right\} \right) = \frac{1}{(\sqrt{2\pi dt})^{N}} \exp \left[ {-}\frac{1}{2dt}\sum \limits _{i} \left( dB^{i}_{n}\right) ^{2}\right] . \end{aligned}$$
(23)

Using (8), \(\varvec{x}_{n+1}\) is uniquely determined by \(\varvec{x}_{n}\) and \(\{ dB^{i}_{n}\}.\) Thus, we have

$$\begin{aligned} \mathcal {P}\left( \varvec{x}_{n+1}\vert \varvec{x}_{n}\right) = P\left( \left\{ dB^{i}_{n}\right\} \right) \vert \det \mathcal {J} \vert , \end{aligned}$$
(24)

where \(\mathcal {J}=(J^{ij})\) is the Jacobian matrix defined by

$$\begin{aligned} J^{ij} \equiv \frac{\partial (dB^{i}_{n})}{\partial x^{j}_{n+1}}. \end{aligned}$$
(25)

We next calculate the determinant of the Jacobian matrix \(\mathcal {J}.\) Differentiating both sides of (21) with respect to \(x^{l}_{n+1},\) we obtain

$$\begin{aligned} \sum \limits _{j} g^{ij} J^{jl} = \mu ^{il}, \end{aligned}$$
(26)

where the matrix \(\mathcal {M}=(\mu ^{il})\) is given by

$$\begin{aligned} \mu ^{il} = \delta ^{il} - \alpha \partial ^{l}F^{i}dt - \alpha \sum \limits _{j} \partial ^{l}g^{ij}dB^{j}_{n}. \end{aligned}$$
(27)

Denoting the identity matrix of size N by \(I_{N},\) we define the matrix \(\widetilde{\mathcal {M}}=(\tilde{\mu }^{ij})\) by \(\widetilde{\mathcal {M}}\equiv I_{N}-\mathcal {M}.\) Using Itô’s lemma, the determinant of the matrix \(\mathcal {M}\) is

$$\begin{aligned} \det \mathcal {M}&= \det \,[ \exp ( \log \mathcal {M})] \nonumber \\&= \exp \,[ {{\mathrm{Tr}}}( \log \mathcal {M})] \nonumber \\&= \exp \left[ -{{\mathrm{Tr}}}\widetilde{\mathcal {M}}-\frac{1}{2}{{\mathrm{Tr}}}\widetilde{\mathcal {M}}^{2}\right] \nonumber \\&= \exp \Bigg [ {-} \alpha \sum \limits _{i} \partial ^{i} F^{i}dt - \alpha \sum \limits _{i,j} \partial ^{i}g^{ij}dB^{j}_{n} - \frac{\alpha ^{2}}{2} \sum \limits _{i,j,k} \partial ^{i}g^{jk}\partial ^{j}g^{ik} dt \Bigg ]. \end{aligned}$$
(28)
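The truncation used in the third line of (28), \(\det (I_{N}-\widetilde{\mathcal {M}})\simeq \exp [-\mathrm {Tr}\,\widetilde{\mathcal {M}}-\frac{1}{2}\mathrm {Tr}\,\widetilde{\mathcal {M}}^{2}],\) can be checked on a small example. The following sketch assumes a hypothetical \(2\times 2\) matrix with small entries, standing in for the \(O(\sqrt{dt})\) matrix \(\widetilde{\mathcal {M}},\) and compares the exact determinant with the first- and second-order exponents.

```python
import math

def det2(m):
    """Determinant of a 2 x 2 matrix given as a nested list."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical small matrix M~ (entries of the order of sqrt(dt)).
Mt = [[0.01, 0.02], [-0.015, 0.005]]
M = [[1.0 - Mt[0][0], -Mt[0][1]], [-Mt[1][0], 1.0 - Mt[1][1]]]

trace = Mt[0][0] + Mt[1][1]
# Tr(M~^2) = sum_{i,l} M~_{il} M~_{li}
trace_sq = Mt[0][0] ** 2 + 2.0 * Mt[0][1] * Mt[1][0] + Mt[1][1] ** 2

exact = det2(M)
first_order = math.exp(-trace)
second_order = math.exp(-trace - 0.5 * trace_sq)
```

Keeping the \(\mathrm {Tr}\,\widetilde{\mathcal {M}}^{2}\) term matters here because products of two noise increments contribute at order dt through Itô’s lemma.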

Because (26) leads to

$$\begin{aligned} \det \mathcal {G} \det \mathcal {J} = \det \mathcal {M}, \end{aligned}$$
(29)

with \(\mathcal {G}=(g^{ij}),\) we obtain

$$\begin{aligned} \det \mathcal {J}&= \frac{1}{\det \mathcal {G}} \exp \Bigg [ {-} \alpha \sum \limits _{i} \partial ^{i} F^{i}dt - \alpha \sum \limits _{i,j} \partial ^{i}g^{ij}dB^{j}_{n} - \frac{\alpha ^{2}}{2} \sum \limits _{i,j,k} \partial ^{i}g^{jk}\partial ^{j}g^{ik} dt \Bigg ] . \end{aligned}$$
(30)

Substituting (23) and (30) into (24), we obtain

$$\begin{aligned} \mathcal {P}\left( \varvec{x}_{n+1}\vert \varvec{x}_{n}\right)&= \frac{1}{(\sqrt{2\pi dt})^{N}\vert \det \mathcal {G}\vert } \exp \Bigg [ {-}\frac{1}{2dt}\sum \limits _{i}\bigg [ dB^{i}_{n}+\alpha \sum \limits _{k}\partial ^{k} g^{ki}dt\bigg ]^{2} \nonumber \\&\qquad - \alpha \sum \limits _{i} \partial ^{i} F^{i}dt + \frac{\alpha ^{2}}{2} \sum \limits _{i,j,k} \left[ \partial ^{i}g^{ik}\partial ^{j}g^{jk} - \partial ^{i}g^{jk}\partial ^{j}g^{ik}\right] dt\Bigg ] \nonumber \\&= \frac{1}{(\sqrt{2\pi dt})^{N}\vert \det \mathcal {G}\vert } \exp \Bigg [ {-}\frac{dt}{2}\sum \limits _{i} \bigg [ \sum \limits _{j} \left( g^{-1}\right) ^{ij} \nonumber \\&\quad \bigg ( \frac{dx^{j}_{n}}{dt}-F^{j} + \alpha \sum \limits _{k,l} g^{jl}\partial ^{k}g^{kl} \bigg ) \bigg ]^{2} - \alpha \sum \limits _{i} \partial ^{i} F^{i} dt \nonumber \\&\quad + \frac{\alpha ^{2}}{2} \sum \limits _{i,j,k} \left[ \partial ^{i}g^{ik}\partial ^{j}g^{jk} - \partial ^{i}g^{jk}\partial ^{j}g^{ik}\right] dt\Bigg ], \end{aligned}$$
(31)

where \((g^{-1})^{ij}\) is the ij component of the inverse of the matrix \(\mathcal {G}.\) Using (22), \(\det \mathsf {G} = (\det \mathcal {G})^{2}/2^{N},\) and

$$\begin{aligned} g^{ik}\partial ^{j} g^{jk} - g^{jk}\partial ^{j} g^{ik} = \partial ^{j} \left( g^{ik}g^{jk}\right) -2g^{jk}\partial ^{j}g^{ik}, \end{aligned}$$
(32)

we can rewrite (31) as (13).

3 Example: Brownian Motion

3.1 Setup

In this section, we study the motion of a single Brownian particle in a fluid (heat bath). We derive (7) under the assumption that the relaxation time of the momentum \(\tau _{\mathrm {macro}}\) is much larger than the time scales of the other degrees of freedom. We describe the system as a Hamiltonian system. The system consists of N bath particles of mass m and a Brownian particle of mass M in a cube of side length L. For simplicity, periodic boundary conditions are assumed. Let \((\varvec{r}_{i},\,\varvec{p}_{i})\) \((1\le i\le N)\) be the position and momentum of the ith bath particle, and \((\varvec{R},\,\varvec{P})\) be those of the Brownian particle. The collection of the positions and momenta of all particles is denoted by \(\varGamma = (\varvec{r}_{1},\,\varvec{p}_{1},\ldots ,\varvec{r}_{N},\,\varvec{p}_{N},\,\varvec{R},\,\varvec{P}),\) which represents the microscopic state of the system. For any state \(\varGamma ,\) we denote its time reversal by \(\varGamma ^{*},\) namely, the state obtained by reversing all the momenta, and denote the time reversal of \(\varvec{P}\) by \(\varvec{P}^{*}={-}\varvec{P}.\) For convenience, we denote the microscopic state excluding the momentum of the Brownian particle by \(\tilde{\varGamma } = (\varvec{r}_{1},\,\varvec{p}_{1},\ldots ,\varvec{r}_{N},\,\varvec{p}_{N},\,\varvec{R}).\)

The Hamiltonian of the system is given by

$$\begin{aligned} H(\varGamma ) = \tilde{H}(\tilde{\varGamma }) + \frac{\vert \varvec{P}\vert ^2}{2M}, \end{aligned}$$
(33)

with

$$\begin{aligned} \tilde{H}(\tilde{\varGamma }) = \sum \limits _{i=1}^{N} \Bigg [ \frac{\vert \varvec{p}_{i}\vert ^{2}}{2m} + \sum \limits _{j>i} \varPhi _\mathrm{int}\left( \left| \varvec{r}_{i}-\varvec{r}_{j}\right| \right) + \varPhi _\mathrm{B}\left( \left| \varvec{r}_{i}-\varvec{R}\right| \right) \Bigg ] , \end{aligned}$$
(34)

where \(\varPhi _\mathrm{int}\) is a short-range interaction potential between two bath particles, and \(\varPhi _\mathrm{B}\) is that between a bath particle and the Brownian particle. Then the Hamiltonian satisfies the time-reversal symmetry

$$\begin{aligned} H(\varGamma ^{*}) = H(\varGamma ). \end{aligned}$$
(35)

For the Hamiltonian equations with a given state \(\varGamma \) at \(t=0,\) \(\varGamma _{t}\) denotes the solution at time t. In this setup, energy is conserved, that is,

$$\begin{aligned} H\left( \varGamma _{t}\right) =H(\varGamma ), \end{aligned}$$
(36)

and Liouville’s theorem,

$$\begin{aligned} \left| \frac{\partial \varGamma _{t}}{\partial \varGamma }\right| = 1, \end{aligned}$$
(37)

holds. The total force acting on the Brownian particle is given by

$$\begin{aligned} \varvec{F}(\varGamma )&= {-}\frac{\partial H(\varGamma )}{\partial \varvec{R}} \nonumber \\&= {-} \sum \limits _{i=1}^{N} \frac{\partial \varPhi _\mathrm{B}(\vert \varvec{r}_{i}-\varvec{R}\vert )}{\partial \varvec{R}}. \end{aligned}$$
(38)

For convenience, we abbreviate \(A(\varGamma _{t})\) to \(A_{t}\) for any function A. The equation of motion for the Brownian particle is

$$\begin{aligned} \frac{\mathrm {d}\varvec{P}_{t}}{\mathrm {d}t}=\varvec{F}_{t}. \end{aligned}$$
(39)

We assume that the system in equilibrium is at temperature T. Suppose that the momentum of the Brownian particle is \(\varvec{P}_{\mathrm {i}}\) at an initial time. Then the other mechanical state \(\tilde{\varGamma }\) is sampled according to the probability density

$$\begin{aligned} \tilde{P}_{\mathrm {eq}}(\tilde{\varGamma }) = \exp \left[ {-}\frac{\tilde{H}(\tilde{\varGamma }) -\tilde{\varPsi }_\mathrm{eq}}{T}\right] , \end{aligned}$$
(40)

where \(\tilde{\varPsi }_\mathrm{eq}\) is the normalization constant. The Hamiltonian equation determines the value of \(\varvec{P}\) at time t. By taking the average over initial realizations of \(\tilde{\varGamma },\) we determine the probability density of \(\varvec{P}=\varvec{P}_{\mathrm {f}}\) at time t for a given \(\varvec{P}_{\mathrm {i}}\) at time 0 in the form

$$\begin{aligned} \mathcal {P}_{t}\left( \varvec{P}_{\mathrm {f}}\vert \varvec{P}_{\mathrm {i}}\right) \equiv \int \mathrm {d}\varGamma \tilde{P}_{\mathrm {eq}}(\tilde{\varGamma }) \delta \left( \varvec{P}-\varvec{P}_{\mathrm {i}}\right) \delta \left( \varvec{P}_{t}-\varvec{P}_{\mathrm {f}}\right) . \end{aligned}$$
(41)

It should be noted that

$$\begin{aligned} \int \mathrm {d}^{3} \varvec{P}_{\mathrm {f}} \mathcal {P}_{t}\left( \varvec{P}_{\mathrm {f}}\vert \varvec{P}_{\mathrm {i}}\right) = 1. \end{aligned}$$
(42)

When we describe the motion of the Brownian particle, the position \(\varvec{R}\) should be treated in the same manner as \(\varvec{P}\) because \(\mathrm {d}\varvec{R}/\mathrm {d}t=\varvec{P}/M.\) For simplicity, we consider space translational symmetric systems, so that we do not need to specify the position at the initial time. If one considers an external potential acting on the Brownian particle, then \(\mathcal {P}_{t}(\varvec{P}_{\mathrm {f}}\vert \varvec{P}_{\mathrm {i}})\) should be replaced by \(\mathcal {P}_{t}(\varvec{P}_{\mathrm {f}},\,\varvec{R}_{\mathrm {f}} \vert \varvec{P}_{\mathrm {i}},\,\varvec{R}_{\mathrm {i}}).\)

Here, the most important property of \(\mathcal {P}_{t}\) is

$$\begin{aligned} \mathcal {P}_{t}\left( \varvec{P}_{\mathrm {f}}\vert \varvec{P}_{\mathrm {i}}\right) P_{\mathrm {MB}}\left( \varvec{P}_{\mathrm {i}}\right) = \mathcal {P}_{t}\left( \varvec{P}_{\mathrm {i}}^{*}\vert \varvec{P}_{\mathrm {f}}^{*}\right) P_{\mathrm {MB}}\left( \varvec{P}_{\mathrm {f}}\right) , \end{aligned}$$
(43)

where we denote the Maxwell–Boltzmann distribution by

$$\begin{aligned} P_{\mathrm {MB}}(\varvec{P}) = (2\pi MT)^{-3/2}\exp \left[ {-}\frac{\vert \varvec{P}\vert ^{2}}{2MT}\right] . \end{aligned}$$
(44)

Property (43) is called the detailed balance condition. It follows from microscopic reversibility, \((\varGamma ^{*})_{-t}=(\varGamma _{t})^{*},\) together with (35)–(37).
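Microscopic reversibility can be illustrated on a toy model. The sketch below is far simpler than the system defined by (33) and (34): it assumes one bath particle and one Brownian particle in one dimension, a hypothetical harmonic coupling \(\varPhi _\mathrm{B}=k(r-R)^{2}/2,\) no periodic boundary, and a velocity-Verlet integrator, which is time-symmetric. It checks that evolving \((\varGamma _{t})^{*}\) forward by t returns \(\varGamma ^{*},\) which is equivalent to \((\varGamma ^{*})_{-t}=(\varGamma _{t})^{*}.\)

```python
def force(r, R, k):
    """Forces from the toy potential Phi_B = k (r - R)^2 / 2."""
    F_bath = -k * (r - R)
    return F_bath, -F_bath  # (on bath particle, on Brownian particle)

def evolve(state, t, dt, m=1.0, M=10.0, k=1.0):
    """Velocity-Verlet integration of state = (r, p, R, P) over time t."""
    r, p, R, P = state
    n_steps = int(round(t / dt))
    f_r, f_R = force(r, R, k)
    for _ in range(n_steps):
        p += 0.5 * dt * f_r      # half kick
        P += 0.5 * dt * f_R
        r += dt * p / m          # drift
        R += dt * P / M
        f_r, f_R = force(r, R, k)
        p += 0.5 * dt * f_r      # half kick
        P += 0.5 * dt * f_R
    return (r, p, R, P)

def reverse(state):
    """Time reversal Gamma -> Gamma^*: flip all momenta."""
    r, p, R, P = state
    return (r, -p, R, -P)

gamma0 = (0.3, 1.0, -0.2, 0.5)
gamma_t = evolve(gamma0, t=5.0, dt=1e-3)
# Evolving the reversed final state forward by t should return Gamma^*.
back = evolve(reverse(gamma_t), t=5.0, dt=1e-3)
error = max(abs(a - b) for a, b in zip(back, reverse(gamma0)))
```

The residual `error` reflects only floating-point round-off, since the integrator retraces its own steps after the momenta are flipped.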

3.2 Assumptions

Let \(\tau _{\mathrm {micro}}\) be the correlation time of the force acting on the Brownian particle, and \(\tau _{\mathrm {macro}}\) be the relaxation time of the momentum of the Brownian particle. We have the separation of time scales represented by \(\tau _{\mathrm {micro}}\ll \tau _{\mathrm {macro}}\) because we assume that the relaxation time of the momentum is much larger than the time scales of the other degrees of freedom.

We define the time-averaged total force acting on the Brownian particle as

$$\begin{aligned} \bar{\varvec{F}}(\varGamma ) \equiv \frac{1}{\varDelta t}\int _{0}^{\varDelta t} \mathrm {d}s \varvec{F}\left( \varGamma _{s}\right) , \end{aligned}$$
(45)

where \(\varDelta t\) is a finite time interval that satisfies \(\tau _{\mathrm {micro}}\ll \varDelta t \ll \tau _{\mathrm {macro}}.\) We define the conditional probability density of \(\bar{\varvec{F}}\) given \(\varvec{P}_{\mathrm {i}}\) by

$$\begin{aligned} \widetilde{\mathcal {P}}\left( \bar{\varvec{F}}\vert \varvec{P}_{\mathrm {i}}\right) \equiv \int \mathrm {d}\varGamma \tilde{P}_{\mathrm {eq}}(\tilde{\varGamma }) \delta \left( \varvec{P}-\varvec{P}_{\mathrm {i}}\right) \delta (\bar{\varvec{F}}(\varGamma )-\bar{\varvec{F}}). \end{aligned}$$
(46)

Because \(\varDelta t \gg \tau _{\mathrm {micro}},\) we may employ the central limit theorem. Together with the isotropy of the equilibrium state, this gives the following Gaussian form of \(\widetilde{\mathcal {P}}(\bar{\varvec{F}}\vert \varvec{P}_{\mathrm {i}})\):

$$\begin{aligned} \widetilde{\mathcal {P}}\left( \bar{\varvec{F}}\vert \varvec{P}_{\mathrm {i}}\right) = C \exp \left[ {-}\varDelta t \mathcal {I}\left( \bar{\varvec{F}}\vert \varvec{P}_{\mathrm {i}}\right) \right] , \end{aligned}$$
(47)

with

$$\begin{aligned} \mathcal {I}\left( \bar{\varvec{F}}\vert \varvec{P}_{\mathrm {i}}\right) = \frac{\vert \varvec{\bar{F}} - \varvec{\mathcal {F}} (\varvec{P}_{\mathrm {i}})\vert ^{2}}{4T\gamma (\varvec{P}_{\mathrm {i}})}, \end{aligned}$$
(48)

where C is the normalization constant given by

$$\begin{aligned} C = \left[ \frac{\varDelta t}{4\pi T \gamma (\varvec{P}_{\mathrm {i}})}\right] ^{3/2}, \end{aligned}$$
(49)

\(\varvec{\mathcal {F}} (\varvec{P}_{\mathrm {i}})\) is the most probable value of \(\bar{\varvec{F}},\) and \(2T\gamma (\varvec{P})\) is the dispersion of fluctuations of \(\bar{\varvec{F}}.\) The dispersion \(\gamma (\varvec{P})\) is assumed to be positive for any \(\varvec{P}.\) Because of the space-reflection symmetry, \(\varvec{\mathcal {F}}\) and \(\gamma \) satisfy

$$\begin{aligned} \varvec{\mathcal {F}}(\varvec{P}^{*})&= {-} \varvec{\mathcal {F}}(\varvec{P}),\end{aligned}$$
(50)
$$\begin{aligned} \gamma (\varvec{P}^{*})&= \gamma (\varvec{P}) , \end{aligned}$$
(51)

respectively. Assumption (47) with (48) is essential for our derivation of a nonlinear Langevin equation for the Brownian particle. Note that this assumption may be proved when collisions of bath particles with the Brownian particle can be regarded as independent.
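For orientation, the Gaussian form (47) with (48) and (49) implies the following first two moments of the time-averaged force (a short worked restatement, with a and b labelling Cartesian components):

$$\begin{aligned} \left\langle \bar{F}^{a}\right\rangle&= \mathcal {F}^{a}\left( \varvec{P}_{\mathrm {i}}\right) , \nonumber \\ \left\langle \left( \bar{F}^{a}-\mathcal {F}^{a}\right) \left( \bar{F}^{b}-\mathcal {F}^{b}\right) \right\rangle&= \frac{2T\gamma (\varvec{P}_{\mathrm {i}})}{\varDelta t}\,\delta ^{ab}. \end{aligned}$$

The variance of each component thus decreases as \(1/\varDelta t,\) as expected for an average over many correlation times, and the constant C in (49) is the product of three identical one-dimensional Gaussian normalization factors \([\varDelta t/(4\pi T\gamma )]^{1/2}.\)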

3.3 Derivation

Using the equation of motion (39), \(\bar{\varvec{F}}(\varGamma )\) defined in (45) can be rewritten as

$$\begin{aligned} \bar{\varvec{F}}(\varGamma ) = \frac{\varvec{P}_{\varDelta t}-\varvec{P}}{\varDelta t}. \end{aligned}$$
(52)

Thus, by changing variables from \(\bar{\varvec{F}}(\varGamma )\) to \(\varvec{P}_{\varDelta t}\) in (47) with (48), we obtain

$$\begin{aligned} \mathcal {P}_{\varDelta t}\left( \varvec{P}_{\mathrm {f}}\vert \varvec{P}_{\mathrm {i}}\right)&= \widetilde{\mathcal {P}}\left( \bar{\varvec{F}}\vert \varvec{P}_{\mathrm {i}}\right) / (\varDelta t)^{3} \nonumber \\&= \left[ 4\pi \varDelta t T \gamma \left( \varvec{P}_{\mathrm {i}}\right) \right] ^{-3/2} \exp \left[ {-}\frac{\varDelta t}{4T\gamma (\varvec{P}_{\mathrm {i}})} \left| \frac{\varvec{P}_{\mathrm {f}}-\varvec{P}_{\mathrm {i}}}{\varDelta t} - \varvec{\mathcal {F}}\left( \varvec{P}_{\mathrm {i}}\right) \right| ^{2}\right] . \end{aligned}$$
(53)
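
The prefactor in (53) can be checked in one line. Since \(\varvec{P}_{\mathrm {f}}=\varvec{P}_{\mathrm {i}}+\varDelta t\,\bar{\varvec{F}}\) by (52), the volume elements are related by \(\mathrm {d}^{3}\bar{\varvec{F}}=\mathrm {d}^{3}\varvec{P}_{\mathrm {f}}/(\varDelta t)^{3},\) and the normalization constant (49) becomes

$$\begin{aligned} \frac{C}{(\varDelta t)^{3}} = \left[ \frac{\varDelta t}{4\pi T \gamma (\varvec{P}_{\mathrm {i}})}\right] ^{3/2} (\varDelta t)^{-3} = \left[ 4\pi \varDelta t T \gamma (\varvec{P}_{\mathrm {i}})\right] ^{-3/2}, \end{aligned}$$

while in the exponent of (47) with (48), \(\bar{\varvec{F}}\) is simply replaced by \((\varvec{P}_{\mathrm {f}}-\varvec{P}_{\mathrm {i}})/\varDelta t.\)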

When we compare (53) with (13) for \(\alpha =0,\) we can describe the discrete time evolution of \(\varvec{P}\) as the discrete stochastic system (8). Now, taking the limit \(\varDelta t/\tau _{\mathrm {macro}}\rightarrow 0,\) we obtain the Langevin equation

$$\begin{aligned} \frac{\mathrm {d}\varvec{P}_{t}}{\mathrm {d}t} = \varvec{\mathcal {F}}\left( \varvec{P}_{t}\right) + \sqrt{2T\gamma \left( \varvec{P}_{t}\right) } \cdot \varvec{\xi }_{t} , \end{aligned}$$
(54)

where \(\varvec{\xi }_{t}\) is the zero-mean Gaussian white noise with covariance \(\langle \xi ^{a}_{t}\xi ^{b}_{s}\rangle = \delta ^{ab} \delta (t-s)\) and \(\cdot \) denotes multiplication with the Itô rule.

Next, we express \(\varvec{\mathcal {F}}\) in terms of \(\gamma \) from the detailed balance condition (43). Using (13) with \(\alpha =1/2,\) we can rewrite the transition probability (53) in terms of \(\varvec{P}_{\mathrm {m}}\equiv (\varvec{P}_{\mathrm {f}}+\varvec{P}_{\mathrm {i}})/2\) as

$$\begin{aligned} \mathcal {P}_{\varDelta t}\left( \varvec{P}_{\mathrm {f}}\vert \varvec{P}_{\mathrm {i}}\right)&= \left[ 4\pi \varDelta t T \gamma \left( \varvec{P}_{\mathrm {m}}\right) \right] ^{-3/2} \exp \Bigg [ {-}\frac{\varDelta t}{4T\gamma (\varvec{P}_{\mathrm {m}})} \left| \frac{\varvec{P}_{\mathrm {f}}-\varvec{P}_{\mathrm {i}}}{\varDelta t} - \widetilde{\varvec{\mathcal {F}}} \left( \varvec{P}_{\mathrm {m}}\right) \right| ^{2} \nonumber \\&\quad -\frac{\varDelta t}{2}\sum \limits _{a=x,y,z}\frac{\partial \mathcal {F}^{a}(\varvec{P}_{\mathrm {m}})}{\partial P^{a}_{\mathrm {m}}} +\frac{\varDelta t}{4}\sum \limits _{a=x,y,z} T\frac{\partial ^{2}\gamma (\varvec{P}_{\mathrm {m}})}{\partial P^{a}_{\mathrm {m}} \partial P^{a}_{\mathrm {m}}}\Bigg ], \end{aligned}$$
(55)

with

$$\begin{aligned} \widetilde{\mathcal {F}}^{a} \left( \varvec{P}_{\mathrm {m}}\right)&\equiv \mathcal {F}^{a}\left( \varvec{P}_{\mathrm {m}}\right) -T\frac{\partial \gamma (\varvec{P}_{\mathrm {m}})}{\partial P^{a}_{\mathrm {m}}} , \end{aligned}$$
(56)

where the superscript a represents the indices in Cartesian coordinates \((x,\,y,\,z),\) and we have ignored all \(o(\varDelta t)\) terms because these terms are irrelevant in the limit \(\varDelta t/\tau _{\mathrm {macro}}\rightarrow 0.\) Note that the Stratonovich convention (\(\alpha =1/2\)) is convenient when using (43) because it has the property that the forward and backward paths are evaluated at the same points [24]. Substituting (55) into (43) with (44), we obtain

$$\begin{aligned} \widetilde{\varvec{\mathcal {F}}} \left( \varvec{P}_{\mathrm {m}}\right)&= {-} \frac{\gamma (\varvec{P}_{\mathrm {m}})}{M}\varvec{P}_{\mathrm {m}}, \end{aligned}$$
(57)

which is called the fluctuation–dissipation relation of the second kind. From (54), (56), and (57), we have

$$\begin{aligned} \frac{\mathrm {d}P_{t}^{a}}{\mathrm {d}t}&= {-} \frac{\gamma (\varvec{P}_{t})}{M}P_{t}^{a} + T\frac{\partial \gamma (\varvec{P}_{t})}{\partial P^{a}_{t}} + \sqrt{2T\gamma \left( \varvec{P}_{t}\right) } \cdot \xi _{t}^{a} \nonumber \\&= {-} \frac{\gamma (\varvec{P}_{t})}{M}P_{t}^{a} + \sqrt{2T\gamma \left( \varvec{P}_{t}\right) } \odot \xi _{t}^{a}, \end{aligned}$$
(58)

which is equivalent to (7). From (58), \(\gamma (\varvec{P}_{t})\) can be interpreted as a nonlinear friction coefficient.
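
As an illustration of (58), the following minimal sketch integrates the Itô form (the first line of (58)) with the Euler-Maruyama scheme. It is not part of the derivation and rests on several assumptions made only for this example: a single Cartesian component, units in which the Boltzmann constant is 1, and a hypothetical friction \(\gamma (P)=\gamma _{0}(1+cP^{2}),\) chosen because it is positive and even in \(P\) as required by (51). All function and parameter names are illustrative. With the drift term \(T\partial \gamma /\partial P\) included, the time-averaged \(P^{2}\) is expected to approach \(MT,\) the variance of the Maxwell distribution.

```python
import math
import random


def gamma(p, gamma0=1.0, c=0.2):
    """Hypothetical nonlinear friction, positive and even in p (illustration only)."""
    return gamma0 * (1.0 + c * p * p)


def dgamma_dp(p, gamma0=1.0, c=0.2):
    """Derivative of the hypothetical friction above with respect to p."""
    return 2.0 * gamma0 * c * p


def momentum_variance(T=1.0, M=1.0, dt=0.01, n_steps=200_000, n_burn=10_000, seed=0):
    """Time-averaged P^2 from an Euler-Maruyama integration of the Ito form of (58)."""
    rng = random.Random(seed)
    p = 0.0
    acc = 0.0
    for step in range(n_burn + n_steps):
        # Ito drift: friction plus the noise-induced term T * dgamma/dP.
        drift = -gamma(p) * p / M + T * dgamma_dp(p)
        # Ito rule: the noise amplitude is evaluated at the pre-step momentum.
        noise = math.sqrt(2.0 * T * gamma(p) * dt) * rng.gauss(0.0, 1.0)
        p += drift * dt + noise
        if step >= n_burn:
            acc += p * p
    return acc / n_steps


if __name__ == "__main__":
    print(momentum_variance())  # compare with M * T = 1.0
```

Dropping the `T * dgamma_dp(p)` term from the drift while keeping the pre-step noise amplitude gives a different stationary distribution, which is one way to see why the convention of the multiplication matters for multiplicative noise.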

4 Generalization

4.1 Motivation

We consider fluctuations of a system in equilibrium. There is a special set of variables whose time scales are well separated from those of the other dynamical degrees of freedom. We refer to such a set as a complete set of slow variables and denote it by \(\varvec{X}=(X^1,\,X^2,\ldots , X^N).\) For the example in the previous section, \(\varvec{X}=(\varvec{R},\,\varvec{P}).\) As a different example, one may consider fluctuations in a thermodynamically isolated system separated into two regions by a freely movable diabatic wall. In this case, the unconstrained thermodynamic extensive variables, namely the energy and the volume in one region, are assumed to form a complete set of slow variables. Furthermore, hydrodynamic fluctuations, which are long-wavelength fluctuations of locally conserved quantities, in an equilibrium liquid are another example of a complete set of slow variables. For simplicity, we assume that the Hamiltonian of the microscopic mechanical system is symmetric with respect to the time-reversal operation. For such a system, the probability density of \(\varvec{X}\) is denoted by

$$\begin{aligned} P_{\mathrm {eq}}(\varvec{X}) = \frac{1}{Z}\exp [ S(\varvec{X})], \end{aligned}$$
(59)

where Z is the normalization constant, and \(S(\varvec{X}^{*})=S(\varvec{X})\) for the time reversal \(\varvec{X}^{*}\) of \(\varvec{X}.\) The physical interpretation of S depends on the system being studied. For example, \(S(\varvec{X})\) corresponds to entropy when thermodynamic fluctuations in an isolated system are considered. For other cases, \(S(\varvec{X})\) should be read from the form of the stationary distribution. Suppose that the system is in equilibrium. We then expect that the time evolution of \(\varvec{X}\) can be described by a Langevin equation. In this section, we derive the equation by generalizing the arguments in the previous section.

4.2 Basic Concept

On the basis of a microscopic mechanical description, we can define the conditional probability density of \(\varvec{X}=\varvec{X}_{\mathrm {f}}\) at time t, denoted by \(\mathcal {P}_{t}(\varvec{X}_{\mathrm {f}} \vert \varvec{X}_{\mathrm {i}}),\) provided that \(\varvec{X}=\varvec{X}_{\mathrm {i}}\) at time 0. There are two important properties of this probability density. First, following the central limit theorem, we assume a Gaussian form of the probability density for the time-averaged flux \((\varvec{X}_{\varDelta t}-\varvec{X} )/\varDelta t.\) The result (53) in the previous section then generalizes to

$$\begin{aligned} \mathcal {P}_{\varDelta t}\left( \varvec{X}_{\mathrm {f}}\vert \varvec{X}_{\mathrm {i}}\right)&= \frac{1}{\sqrt{(4\pi \varDelta t)^{N} \det \mathsf {L} (\varvec{X}_{\mathrm {i}})}} \exp \Bigg [ {-}\frac{\varDelta t}{4} \sum \limits _{i,j} \left( L^{-1}\right) ^{ij}\left( \varvec{X}_{\mathrm {i}}\right) \nonumber \\&\quad \times \bigg [ \frac{X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}}{\varDelta t}-\mathcal {J}^{i}\left( \varvec{X}_{\mathrm {i}}\right) \bigg ] \bigg [ \frac{X^{j}_{\mathrm {f}}-X^{j}_{\mathrm {i}}}{\varDelta t} -\mathcal {J}^{j}\left( \varvec{X}_{\mathrm {i}}\right) \bigg ] \Bigg ], \end{aligned}$$
(60)

where \(\mathcal {J}^{i}\) is the most probable value of the time averaged flux, and \(2L^{ij}\) is the dispersion matrix. We assume that the matrix \(\mathsf {L} = (L^{ij})\) is positive definite. This means that each \(X^{i}_{\mathrm {f}}\) is not uniquely determined by \(\varvec{X}_{\mathrm {i}}.\) If \(X^{i}_{\mathrm {f}}\) is uniquely determined by \(\varvec{X}_{\mathrm {i}},\) such as \(X^{i}_{\mathrm {f}}=X^{i}_{\mathrm {i}}+\mathcal {J}^{i}(\varvec{X}_{\mathrm {i}})\varDelta t,\) then we multiply the right-hand side of (60) by \(\delta (X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}-\mathcal {J}^{i}(\varvec{X}_{\mathrm {i}})\varDelta t)\) and consider the submatrix formed by deleting the ith row and ith column of \(\mathsf {L}.\) Second, from the reversibility of microscopic Hamiltonian systems, we can obtain

$$\begin{aligned} \mathcal {P}_{t}\left( \varvec{X}_{\mathrm {f}}\vert \varvec{X}_{\mathrm {i}}\right) P_{\mathrm {eq}}\left( \varvec{X}_{\mathrm {i}}\right) = \mathcal {P}_{t}\left( \varvec{X}_{\mathrm {i}}^{*}\vert \varvec{X}_{\mathrm {f}}^{*}\right) P_{\mathrm {eq}}\left( \varvec{X}_{\mathrm {f}}\right) . \end{aligned}$$
(61)

Then, by substituting (60) into (61), we obtain a possible form of \(\mathcal {J}^{i}(\varvec{X})\) and a symmetry property of \(\mathsf {L}.\) For convenience, we denote \(\varvec{X}^{*}\) by \(\varvec{\epsilon }\varvec{X}=(\epsilon ^{1}X^{1},\,\epsilon ^{2}X^{2},\ldots ,\epsilon ^{N}X^{N}),\) where \(\epsilon ^{i}={+}1\) or \({-}1\) according to whether \(X^{i}\) is even or odd under time reversal. We decompose \(\mathcal {J}^{i}\) into two parts,

$$\begin{aligned} \mathcal {J}^{i}(\varvec{X}) = \mathcal {J}^{i}_{\mathrm {rev}}(\varvec{X}) + \mathcal {J}^{i}_{\mathrm {irr}}(\varvec{X}), \end{aligned}$$
(62)

with

$$\begin{aligned} \mathcal {J}^{i}_{\mathrm {rev}}(\varvec{X})&\equiv \frac{\mathcal {J}^{i}(\varvec{X})-\epsilon ^{i}\mathcal {J}^{i}(\varvec{X}^{*})}{2},\end{aligned}$$
(63)
$$\begin{aligned} \mathcal {J}^{i}_{\mathrm {irr}}(\varvec{X})&\equiv \frac{\mathcal {J}^{i}(\varvec{X})+\epsilon ^{i}\mathcal {J}^{i}(\varvec{X}^{*})}{2}, \end{aligned}$$
(64)

which satisfy \(\mathcal {J}^{i}_{\mathrm {rev}}(\varvec{X}^{*}) = {-}\epsilon ^{i}\mathcal {J}^{i}_{\mathrm {rev}}(\varvec{X})\) and \(\mathcal {J}^{i}_{\mathrm {irr}}(\varvec{X}^{*}) = \epsilon ^{i}\mathcal {J}^{i}_{\mathrm {irr}}(\varvec{X}).\) We also define the matrix \(\mathsf {L}_{\mathrm {T}}=(L^{ij}_{\mathrm {T}})\) by

$$\begin{aligned} L^{ij}_{\mathrm {T}}(\varvec{X}) = \epsilon ^{i}\epsilon ^{j}L^{ij}(\varvec{X}^{*}). \end{aligned}$$
(65)

Note that \(\det \mathsf {L}_{\mathrm {T}}(\varvec{X})=\det \mathsf {L}(\varvec{X}^{*}).\)
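
The decomposition (62) with (63) and (64) is purely algebraic, so it can be illustrated with a few lines of code. The sketch below assumes a hypothetical two-variable example that is not taken from the text: \(\varvec{X}=(R,\,P)\) in one dimension with a harmonic force \(-kR\) and a constant friction, so that \(\epsilon =({+}1,\,{-}1).\) The function names and parameter values are illustrative only.

```python
def time_reverse(x, eps):
    """Return X* = (eps^1 X^1, ..., eps^N X^N)."""
    return [e * xi for e, xi in zip(eps, x)]


def split_flux(flux, x, eps):
    """Split J(X) into (J_rev, J_irr) following (63) and (64)."""
    j = flux(x)
    j_star = flux(time_reverse(x, eps))
    j_rev = [(ji - e * js) / 2.0 for ji, js, e in zip(j, j_star, eps)]
    j_irr = [(ji + e * js) / 2.0 for ji, js, e in zip(j, j_star, eps)]
    return j_rev, j_irr


def example_flux(x, M=2.0, k=3.0, gamma=0.5):
    """Hypothetical flux for X = (R, P): dR/dt = P/M, dP/dt = -k R - gamma P / M."""
    r, p = x
    return [p / M, -k * r - gamma * p / M]


if __name__ == "__main__":
    eps = [+1, -1]  # R is even and P is odd under time reversal
    j_rev, j_irr = split_flux(example_flux, [0.5, 2.0], eps)
    print(j_rev)  # [1.0, -1.5]: the Hamiltonian part (P/M, -k R)
    print(j_irr)  # [0.0, -0.5]: the friction part (0, -gamma P / M)
```

In this example the reversible part reproduces the Hamiltonian flow and the irreversible part is the friction force, which matches the physical reading of the two terms in (62).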

4.3 Result

Direct substitution of (60) into (61) would result in a complicated form, so we use a trick. Considering that (13) holds for any \(\alpha ,\) as in the previous section, we can rewrite (60) by changing \(\alpha \) from 0 to 1/2. The result is

$$\begin{aligned} \mathcal {P}_{\varDelta t}\left( \varvec{X}_{\mathrm {f}}\vert \varvec{X}_{\mathrm {i}}\right)&= \frac{1}{\sqrt{(4\pi \varDelta t)^{N} \det \mathsf {L} (\varvec{X}_{\mathrm {m}})}} \exp \Bigg [ {-}\frac{\varDelta t}{4} \sum \limits _{i,j} \left( L^{-1}\right) ^{ij}\left( \varvec{X}_{\mathrm {m}}\right) \nonumber \\&\quad \times \bigg [ \frac{X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}}{\varDelta t}-\widetilde{\mathcal {J}}^{i}\left( \varvec{X}_{\mathrm {m}}\right) \bigg ] \bigg [ \frac{X^{j}_{\mathrm {f}}-X^{j}_{\mathrm {i}}}{\varDelta t} -\widetilde{\mathcal {J}}^{j}\left( \varvec{X}_{\mathrm {m}}\right) \bigg ] \nonumber \\&\quad -\frac{\varDelta t}{2}\sum \limits _{i}\frac{\partial \mathcal {J}^{i}_{\mathrm {rev}}(\varvec{X}_{\mathrm {m}})}{\partial X^{i}_{\mathrm {m}}} -\frac{\varDelta t}{2}\sum \limits _{i}\frac{\partial \mathcal {J}^{i}_{\mathrm {irr}}(\varvec{X}_{\mathrm {m}})}{\partial X^{i}_{\mathrm {m}}} \nonumber \\&\quad +\frac{\varDelta t}{4}\sum \limits _{i,j} \frac{\partial ^{2}L^{ij} (\varvec{X}_{\mathrm {m}})}{\partial X^{i}_{\mathrm {m}} \partial X^{j}_{\mathrm {m}}}\Bigg ], \end{aligned}$$
(66)

with

$$\begin{aligned} \widetilde{\mathcal {J}}^{i} \left( \varvec{X}_{\mathrm {m}}\right)&\equiv \mathcal {J}^{i}_{\mathrm {rev}} \left( \varvec{X}_{\mathrm {m}}\right) + \mathcal {J}^{i}_{\mathrm {irr}} \left( \varvec{X}_{\mathrm {m}}\right) - \sum \limits _{j} \frac{\partial L^{ij} (\varvec{X}_{\mathrm {m}})}{\partial X^{j}_{\mathrm {m}}}. \end{aligned}$$
(67)

We also obtain

$$\begin{aligned} \mathcal {P}_{\varDelta t}\left( \varvec{X}_{\mathrm {i}}^{*}\vert \varvec{X}_{\mathrm {f}}^{*}\right)&= \frac{1}{\sqrt{(4\pi \varDelta t)^{N} \det \mathsf {L}_{\mathrm {T}} (\varvec{X}_{\mathrm {m}})}} \exp \Bigg [ {-}\frac{\varDelta t}{4} \sum \limits _{i,j} \left( L^{-1}_{\mathrm {T}}\right) ^{ij}\left( \varvec{X}_{\mathrm {m}}\right) \nonumber \\&\quad \times \bigg [ \frac{X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}}{\varDelta t}-\widetilde{\mathcal {J}}^{i}_{\mathrm {T}}\left( \varvec{X}_{\mathrm {m}}\right) \bigg ] \bigg [ \frac{X^{j}_{\mathrm {f}}-X^{j}_{\mathrm {i}}}{\varDelta t} -\widetilde{\mathcal {J}}^{j}_{\mathrm {T}}\left( \varvec{X}_{\mathrm {m}}\right) \bigg ] \nonumber \\&\quad +\frac{\varDelta t}{2}\sum \limits _{i}\frac{\partial \mathcal {J}^{i}_{\mathrm {rev}}(\varvec{X}_{\mathrm {m}})}{\partial X^{i}_{\mathrm {m}}} -\frac{\varDelta t}{2}\sum \limits _{i}\frac{\partial \mathcal {J}^{i}_{\mathrm {irr}}(\varvec{X}_{\mathrm {m}})}{\partial X^{i}_{\mathrm {m}}} \nonumber \\&\quad +\frac{\varDelta t}{4}\sum \limits _{i,j} \frac{\partial ^{2}L^{ij}_{\mathrm {T}} (\varvec{X}_{\mathrm {m}})}{\partial X^{i}_{\mathrm {m}} \partial X^{j}_{\mathrm {m}}}\Bigg ], \end{aligned}$$
(68)

with

$$\begin{aligned} \widetilde{\mathcal {J}}^{i}_{\mathrm {T}} \left( \varvec{X}_{\mathrm {m}}\right)&\equiv \mathcal {J}^{i}_{\mathrm {rev}} \left( \varvec{X}_{\mathrm {m}}\right) - \mathcal {J}^{i}_{\mathrm {irr}} \left( \varvec{X}_{\mathrm {m}}\right) + \sum \limits _{j} \frac{\partial L^{ij}_{\mathrm {T}} (\varvec{X}_{\mathrm {m}})}{\partial X^{j}_{\mathrm {m}}}. \end{aligned}$$
(69)

Substituting (66) and (68) into (61) with (59), we obtain

$$\begin{aligned}&\frac{\varDelta t}{4} \sum \limits _{i,j} \left( L^{-1}\right) ^{ij}\left( \varvec{X}_{\mathrm {m}}\right) \bigg [ \frac{X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}}{\varDelta t}-\widetilde{\mathcal {J}}^{i}\left( \varvec{X}_{\mathrm {m}}\right) \bigg ] \bigg [ \frac{X^{j}_{\mathrm {f}}-X^{j}_{\mathrm {i}}}{\varDelta t} -\widetilde{\mathcal {J}}^{j}\left( \varvec{X}_{\mathrm {m}}\right) \bigg ] \nonumber \\&\quad {-}\frac{\varDelta t}{4} \sum \limits _{i,j} \left( L^{-1}_{\mathrm {T}}\right) ^{ij}\left( \varvec{X}_{\mathrm {m}}\right) \bigg [ \frac{X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}}{\varDelta t}-\widetilde{\mathcal {J}}^{i}_{\mathrm {T}}\left( \varvec{X}_{\mathrm {m}}\right) \bigg ] \bigg [ \frac{X^{j}_{\mathrm {f}}-X^{j}_{\mathrm {i}}}{\varDelta t} -\widetilde{\mathcal {J}}^{j}_{\mathrm {T}}\left( \varvec{X}_{\mathrm {m}}\right) \bigg ] \nonumber \\&\quad + \varDelta t \sum \limits _{i}\frac{\partial \mathcal {J}^{i}_{\mathrm {rev}}(\varvec{X}_{\mathrm {m}})}{\partial X^{i}_{\mathrm {m}}} -\frac{\varDelta t}{4}\sum \limits _{i,j} \frac{\partial ^{2}}{\partial X^{i}_{\mathrm {m}} \partial X^{j}_{\mathrm {m}}} \left[ L^{ij} \left( \varvec{X}_{\mathrm {m}}\right) - L^{ij}_{\mathrm {T}} \left( \varvec{X}_{\mathrm {m}}\right) \right] \nonumber \\&\quad + \frac{1}{2}\log \frac{\det \mathsf {L} (\varvec{X}_{\mathrm {m}})}{\det \mathsf {L}_{\mathrm {T}} (\varvec{X}_{\mathrm {m}})} + \varDelta t \sum \limits _{i} \frac{X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}}{\varDelta t} \frac{\partial S(\varvec{X}_{\mathrm {m}})}{\partial X^{i}_{\mathrm {m}}} = 0, \end{aligned}$$
(70)

where we have used

$$\begin{aligned} S\left( \varvec{X}_{\mathrm {f}}\right) - S\left( \varvec{X}_{\mathrm {i}}\right) = \varDelta t \sum \limits _{i} \frac{X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}}{\varDelta t} \frac{\partial S(\varvec{X}_{\mathrm {m}})}{\partial X^{i}_{\mathrm {m}}}+ O\left( (\varDelta t)^{2}\left( \frac{X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}}{\varDelta t} \right) ^{2}\right) , \end{aligned}$$
(71)

and where the \(O((X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}} )^{2})\) terms in (70) are irrelevant in the limit \(\varDelta t/\tau _\mathrm{macro} \rightarrow 0.\) Note that \(\partial S(\varvec{X})/\partial X^{i}\) are called the thermodynamic forces. Because (70) holds for any \(X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}\) and \(X^{i}_{\mathrm {m}},\) comparing the quadratic terms in \(X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}\) in (70) yields

$$\begin{aligned} L^{ij}(\varvec{X}) = L^{ij}_{\mathrm {T}}(\varvec{X}). \end{aligned}$$
(72)

Comparing the first-order terms in \(X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}\) in (70) with (72), we also have

$$\begin{aligned} \mathcal {J}^{i}_{\mathrm {irr}} (\varvec{X})&= \sum \limits _{j} L^{ij}(\varvec{X}) \frac{\partial S(\varvec{X})}{\partial X^{j}}+ \sum \limits _{j} \frac{\partial L^{ij} (\varvec{X})}{\partial X^{j}}, \end{aligned}$$
(73)

which is called the fluctuation–dissipation relation of the second kind. Comparing the zero-order terms in \(X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}}\) in (70) with (72) and (73), we finally obtain

$$\begin{aligned} \sum \limits _{i}\frac{\partial }{\partial X^{i}}\left[ \mathcal {J}^{i}_{\mathrm {rev}} (\varvec{X}) P_{\mathrm {eq}}(\varvec{X})\right] =0. \end{aligned}$$
(74)
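
The order-by-order comparison leading to (73) and (74) can be written out as follows. This is a sketch using shorthand that does not appear in the text above: \(v^{i}\equiv (X^{i}_{\mathrm {f}}-X^{i}_{\mathrm {i}})/\varDelta t\) and \(A^{i}\equiv \mathcal {J}^{i}_{\mathrm {irr}}-\sum _{j}\partial L^{ij}/\partial X^{j},\) with all functions evaluated at \(\varvec{X}_{\mathrm {m}},\) and with \(\mathsf {L}\) taken to be symmetric, as appropriate for a dispersion matrix. Once (72) holds, the determinant term and the second-derivative term in (70) vanish, (67) and (69) read \(\widetilde{\mathcal {J}}^{i}=\mathcal {J}^{i}_{\mathrm {rev}}+A^{i}\) and \(\widetilde{\mathcal {J}}^{i}_{\mathrm {T}}=\mathcal {J}^{i}_{\mathrm {rev}}-A^{i},\) and (70) reduces to

$$\begin{aligned} {-}\varDelta t \sum \limits _{i,j} \left( L^{-1}\right) ^{ij} \left( v^{i}-\mathcal {J}^{i}_{\mathrm {rev}}\right) A^{j} + \varDelta t \sum \limits _{i}\frac{\partial \mathcal {J}^{i}_{\mathrm {rev}}}{\partial X^{i}_{\mathrm {m}}} + \varDelta t \sum \limits _{i} v^{i} \frac{\partial S}{\partial X^{i}_{\mathrm {m}}} = 0. \end{aligned}$$

The terms linear in \(v^{i}\) give \(\sum _{j}(L^{-1})^{ij}A^{j}=\partial S/\partial X^{i},\) that is, \(A^{i}=\sum _{j}L^{ij}\,\partial S/\partial X^{j},\) which is (73). Inserting this into the remaining \(v\)-independent terms gives

$$\begin{aligned} \sum \limits _{i}\left[ \mathcal {J}^{i}_{\mathrm {rev}} \frac{\partial S}{\partial X^{i}} + \frac{\partial \mathcal {J}^{i}_{\mathrm {rev}}}{\partial X^{i}}\right] = 0, \end{aligned}$$

which is (74) after multiplication by \(P_{\mathrm {eq}}(\varvec{X})=\exp [S(\varvec{X})]/Z.\)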

Now, we go back to (60). This is interpreted as the transition probability for the discrete time Langevin equation (8). By taking the limit \(\varDelta t/\tau _{\mathrm {macro}} \rightarrow 0,\) we obtain

$$\begin{aligned} \frac{\mathrm {d}X^{i}_{t}}{\mathrm {d}t} = \mathcal {J}^{i}_{\mathrm {rev}} \left( \varvec{X}_{t}\right) + \sum \limits _{j} L^{ij}\left( \varvec{X}_{t}\right) \frac{\partial S(\varvec{X}_{t})}{\partial X^{j}_{t}} + \sum \limits _{j} \frac{\partial L^{ij} (\varvec{X}_{t})}{\partial X^{j}_{t}} + \sum \limits _{j} l^{ij} \left( \varvec{X}_{t}\right) \cdot \xi ^{j}_{t} , \end{aligned}$$
(75)

where we have used (73), and \(l^{ij}\) satisfies

$$\begin{aligned} L^{ij}\left( \varvec{X}_{t}\right) = \frac{1}{2}\sum \limits _{k} l^{ik} \left( \varvec{X}_{t}\right) l^{jk} \left( \varvec{X}_{t}\right) . \end{aligned}$$
(76)
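
Relation (76) fixes \(l^{ij}\) only up to an orthogonal transformation acting on the noise index. As a minimal sketch, assuming \(\mathsf {L}\) is a symmetric positive definite matrix evaluated at a fixed \(\varvec{X},\) one admissible choice is \(\sqrt{2}\) times the Cholesky factor of \(\mathsf {L}.\) The helper names below are illustrative, and the numerical matrix in the usage example is hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular c with c c^T = a, for a symmetric positive definite matrix a."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                c[i][j] = math.sqrt(s)
            else:
                c[i][j] = s / c[j][j]
    return c


def noise_amplitude(L):
    """One choice of l with L^{ij} = (1/2) sum_k l^{ik} l^{jk}: l = sqrt(2) * chol(L)."""
    return [[math.sqrt(2.0) * cij for cij in row] for row in cholesky(L)]


def reconstruct(l):
    """Evaluate the right-hand side of (76) for a given l."""
    n = len(l)
    return [[0.5 * sum(l[i][k] * l[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    L = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical Onsager matrix at one point X
    l = noise_amplitude(L)
    print(reconstruct(l))  # should reproduce L up to rounding
```

The lower-triangular choice is convenient numerically, but any \(l^{ij}\) satisfying (76) gives the same transition probability (60), because only \(\mathsf {L}\) enters there.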

It should be noted that the third term on the right-hand side of (75) is not eliminated even if we replace \(l^{ij} (\varvec{X}_{t}) \cdot \xi ^{j}_{t}\) by \(l^{ij} (\varvec{X}_{t}) \odot \xi ^{j}_{t}\) for general multi-component systems. We emphasize that the elimination of the corresponding term in the result (58) of the previous section is rather accidental.
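
This remark can be checked with a short calculation. The sketch below assumes that \(\odot \) denotes the product in which the noise amplitude is evaluated at the end point of each time step (the case \(\alpha =1\) of the discretization parameter), which is the reading consistent with the two lines of (58). Under this assumption, rewriting \(\sum _{j} l^{ij}\odot \xi ^{j}\) in the Itô convention generates the drift \(\sum _{j,k} l^{jk}\,\partial l^{ik}/\partial X^{j},\) whereas (76) gives

$$\begin{aligned} \sum \limits _{j} \frac{\partial L^{ij}}{\partial X^{j}} = \frac{1}{2}\sum \limits _{j,k} \left[ l^{jk}\frac{\partial l^{ik}}{\partial X^{j}} + l^{ik}\frac{\partial l^{jk}}{\partial X^{j}}\right] . \end{aligned}$$

The two expressions differ by

$$\begin{aligned} \frac{1}{2}\sum \limits _{j,k} \left[ l^{ik}\frac{\partial l^{jk}}{\partial X^{j}} - l^{jk}\frac{\partial l^{ik}}{\partial X^{j}}\right] , \end{aligned}$$

which does not vanish in general. For an isotropic amplitude \(l^{ij}=g(\varvec{X})\delta ^{ij},\) as in (58) with \(g=\sqrt{2T\gamma },\) both terms in the bracket reduce to \(g\,\partial g/\partial X^{i}\) and the difference is zero.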

5 Concluding Remarks

In this paper, we have derived a universal form of nonlinear, multiplicative Langevin equations for slow variables in equilibrium systems. The result is essentially equivalent to that derived by Green in 1952. In contrast to previous studies, we first assume the separation of time scales. Then, by using the central limit theorem, we can formalize the asymptotic form of the probability density for the time-averaged fluxes, which determines the time evolution of the slow variables due to the time-reversal symmetry of fluctuation.

Here, we comment on the relation to large deviation theory. Our assumption (47) expresses a large deviation property of the conditional probability density, which is a more general assumption than the quadratic form (48) of the large deviation function. For instance, when one derives a stochastic time evolution equation with white Poisson noise from a microscopic mechanical system, the large deviation property should be valid. The validity condition for (48) depends on the details of the microscopic mechanical system. In equilibrium systems, a large deviation function of thermodynamic variables contains all information about the fluctuations of those variables and is expressed in terms of a corresponding thermodynamic function, which can be derived from a microscopic mechanical system by using equilibrium statistical mechanics. A central limit theorem can be derived from the large deviation function, and only gives a corresponding fluctuation–dissipation theorem. In this sense, large deviation is a more fundamental concept for analyzing fluctuations, although the physical aspects of the large deviation function for the time-averaged flux remain to be studied.

Before ending this paper, we summarize future problems related to our results. First, although discussions of physical phenomena are beyond the scope of this paper, it would be interesting to find a system in which the third term on the right-hand side of (75) plays an important role. We do not know of any such example explicitly, but there are many cases where transport coefficients depend strongly on thermodynamic variables. However, it should be noted that the contribution of this term is higher-order in macroscopic systems, as discussed in Sect. 1. We thus seek such systems among small systems or singular systems.

Second, we believe that the result and its derivation method may provide the final answer for formally describing slow variables in equilibrium systems. For example, in principle, fluctuating hydrodynamics in equilibrium systems can be studied explicitly using the same method. In this case, we consider long-wavelength fluctuations of locally conserved quantities, which are called hydrodynamic modes, to be slow variables. Although the formulation may be developed similarly to the ideas in this paper, there will be many technical difficulties in performing concrete calculations. We thus conjecture that a formal derivation of the Navier–Stokes equation from Hamiltonian particle systems [28] may be helpful for completing this problem. Related to the argument of fluctuating hydrodynamics, we recall the assumption in Sect. 3 that the relaxation time of the momentum is much larger than the time scales of the other degrees of freedom. There are cases where this assumption does not hold when the time scale of hydrodynamic modes is comparable with that of the momentum of a Brownian particle, as observed in recent experiments [29,30,31]. Deriving the Langevin equation describing the motion observed in these experiments is also a challenging task.

Third, an obvious problem for future study is formulating the stochastic evolution of slow variables in systems out of equilibrium. If the time scales are separated so that the slow variables are clearly defined, then the concept of large deviation can be used even in nonequilibrium cases. Of course, there are many nonequilibrium phenomena in which a complete set of slow variables is not identified. Although these are interesting phenomena, we do not have a systematic method for studying them. Putting such systems aside, we focus on systems where the slow variables are defined. If the Gaussian form of the large deviation function is effective, then the dynamics of the slow variables will be described by a Langevin equation. Even for this simple class, we do not have a general form, because the symmetry property (61) cannot be used. Rather, one may find that difficulties already appear in writing a deterministic equation for slow variables before considering stochastic systems. Nevertheless, our method should be applicable to Brownian motion in nonuniform temperatures, where the system satisfies the detailed balance condition, as studied in [32].

Finally, recent work explicitly derived deterministic order parameter equations near the order–disorder transition of the globally coupled XY model and the synchronization–desynchronization transition of the Kuramoto model [33]. The characteristic feature of the derivation method here is to use nonequilibrium identities such as the Jarzynski equality [34] and the Hatano–Sasa equality [35]. Although we need to assume a rather special type of probability distribution at the initial time, the calculation steps are substantially reduced. We thus expect that the unified framework of the approach in [33] and the theory developed in this paper may be the first step in the universal description of the stochastic evolution of slow variables out of equilibrium.