1 Introduction

In classical statistical mechanics one seeks to derive the Gibbs equilibrium distributions as critical points of the information entropy over a space of measures satisfying certain constraints. Gibbs himself showed via an elementary argument that such distributions minimize information entropy for given energy, see [7, Theorem 2, p. 130]. See also [13, p. 746] and (27) below.

Arguments using Lagrange multipliers to show that all critical points of entropy under constraints are Gibbs distributions are widespread. For examples see Huang’s classic text [11, p. 82] or the recent text [9, p. 24] and articles such as [12] or [24]. For the local equilibrium case see [27, p. 66]. These formal applications of Lagrange multiplier theory on unspecified function spaces do not address the differentiability of the entropy or of the constraints. They face the problem that strictly positive distribution functions do not tend to form open sets in function spaces and therefore it is not always clear how to differentiate.

The answer, of course, is to endow the relevant space of measures with a differentiable manifold structure. In order to apply Lagrange multiplier theory on Banach manifolds, as in [1] for example, we set the following minimum requirements for the manifold structure we seek:

  • The information entropy must be at least \(C^1\),

  • all constraints must also be at least \(C^1\), and

  • the manifold charts must map the measures we eventually work with into open sets.

We show here that a Banach manifold structure closely related to the one introduced in [22] and further developed in [3] satisfies these requirements. We can then apply standard theorems for constrained optimization on Banach manifolds as in [1] to conclude that all critical points are indeed Gibbs distributions. For other constructions of statistical manifolds see [16, 18, 19, 25, 26].

Here we shall insist on a manifold structure on spaces of finite measures, as in [3], and will impose normalization to probability measures as one of the constraints. Starting with probability measures would introduce an extra complication in the model space of functions (the functions would have to be “centered” as in [22, p. 1553]), making the derivative formulas more difficult to work with. It is also the custom in Physics to impose probability as a constraint.

Our main motivation is to characterize local equilibrium Gibbs ensembles, i.e. critical points of entropy under constraints that are functions on space domains rather than constants. We have found this to be a necessary step in the further development of ideas from Statistical Mechanics via Information Geometry, along the lines of [4] for example. Local equilibrium measures also play a prominent role both in Morrey’s seminal work on the derivation of hydrodynamic equations from microscopic dynamics [17], as well as in one of the most important advances on the same problem to date, [20]. For the role of local equilibrium ensembles in non-equilibrium statistical mechanics in general see [5, §3].

It turns out that the same manifold structure works for both equilibrium measures and local equilibrium measures: when the constraints are scalar the result is equilibrium Gibbs and scalar Lagrange multipliers (inverse temperature, chemical potential, etc.); when the constraints are functions the result is local equilibrium Gibbs and the multipliers are elements of the space dual to the function constraints (bounded functions). An argument completely analogous to the equilibrium case shows that the local equilibrium critical points are again minimizers. We find this to be one of the main features of the method here: changing the target of the constraints gives the correct multipliers while the manifold structure remains the same.

With these understood, the main results of this article are:

  • the realization that the space of Definition 1 satisfies our requirements and

  • Theorem 2, which characterizes local Gibbs equilibria in terms of entropy.

In Sect. 2 we show in a general setting how to define a manifold structure on finite measures on some arbitrary space so that both entropy and scalar constraints are continuously differentiable and we obtain the equilibrium Gibbs formulas for the critical points.

Section 3 contains the application of the general result to the equilibrium grand canonical Gibbs on the phase space of a physical system.

The case of the local equilibrium appears as Sect. 4.

As explained in [22, p. 1548], one is naturally led to model the space of finite positive measures on Orlicz spaces. Here, we shall always set

$$\begin{aligned} \varPhi (x) = e^{|x|} -1 \end{aligned}$$
(1)

and we shall use the Orlicz space (for Young function \(\varPhi \))

$$\begin{aligned} L^{\varPhi }(\mu ) := \left\{ f:\varOmega \rightarrow \mathcal {R} \ \text {measurable}: \int _{\varOmega } \varPhi (\alpha f(x)) \mu (dx) < \infty \ \text {for some } \alpha > 0 \right\} \end{aligned}$$
(2)

The norm

$$\begin{aligned} \Vert f\Vert _{L^{\varPhi }(\mu )} = \inf \left\{ k > 0: \int _{\varOmega } \varPhi \left( \frac{f(x)}{k} \right) \mu (dx) \le 1 \right\} , \end{aligned}$$
(3)

renders \(L^{\varPhi }(\mu )\) a Banach space [23, Chapter 3, Theorem 10]. We shall rely on the fact that \(L^\varPhi (\mu )\), for this \(\varPhi \), continuously embeds into \(L^{p}(\mu )\) for all \(p \ge 1\):

$$\begin{aligned} L^{\varPhi }(\mu ) \hookrightarrow \bigcap _{p \ge 1}L^{p}(\mu ), \end{aligned}$$
(4)

see [3, Equation 3.104], [22, Proposition 2.3].
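As a purely numerical illustration of (3) (not part of the formal development): on a finite discrete measure the Luxemburg norm can be computed by bisection, since the integral in (3) is decreasing in \(k\). The function name and setup below are ours:

```python
import math

def luxemburg_norm(values, weights, tol=1e-10):
    """Luxemburg norm (3) for Phi(x) = e^|x| - 1, for a function given by
    `values` on a finite discrete measure given by `weights`.
    Bisection on k, using that the integral in (3) is decreasing in k."""
    def integral(k):
        return sum(w * (math.exp(abs(v) / k) - 1.0)
                   for v, w in zip(values, weights))
    hi = 1.0
    while integral(hi) > 1.0:        # find an upper bracket for the infimum
        hi *= 2.0
    lo = hi / 2.0 if hi > 1.0 else tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if integral(mid) > 1.0 else (lo, mid)
    return hi

# For a constant function f = c on a probability measure, (3) reads
# e^{c/k} - 1 <= 1, so the norm is exactly c / log 2.
```

For a constant function on a probability measure the result can be checked in closed form, which makes the sketch easy to test against.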

To ensure that entropy is also continuously differentiable, we shall work on those components of the manifold consisting of measures that have the logarithm of their density (with respect to the fixed reference measure) in the corresponding Orlicz space. This should be compared to the discussion in [21, p. 4055] regarding the differentiability of entropy on the subspace of probability measures.

2 The general case for real-valued constraints

2.1 Manifold structure

For a fixed space \(\varOmega \), with \(\mu _0\) fixed positive (but not necessarily finite) measure, and for

$$\begin{aligned} \mathbf{C} :\varOmega \rightarrow \mathcal {R}^{n}; \quad \mathbf{C} (x) = \left( C_{1}(x), \dots , C_{n}(x) \right) \end{aligned}$$
(5)

a measurable vector-valued function on \(\varOmega \) (out of which the constraints will be defined), we are interested in a manifold structure (eventually rendering information entropy and constraints differentiable) on the set of finite measures with strictly positive densities

$$\begin{aligned} \mathcal {F} := \left\{ \mu : \mu \ll \mu _{0}, \ \frac{d\mu }{d\mu _0} >0, \ \int _{\varOmega } \left| C_{i} (x) \right| \mu (dx) < + \infty , \ i=0, 1,\ldots ,n \right\} . \end{aligned}$$
(6)

Note that here, and for the rest of this article, we set \(C_{0}(x) \equiv 1\) to include the finiteness of the measures. To construct the manifold structure we use (and adapt) the approach from [3, §3.3.3] for all of the finite measures \(|C_i|\mu \) simultaneously:

Definition 1

Define

$$\begin{aligned} X_{\mu } = \bigcap _{i=0}^{n} L^{\varPhi }(|C_{i}|\mu ), \end{aligned}$$
(7)

with norm

$$\begin{aligned} \Vert f\Vert _{X_\mu } = \max _{0 \le i \le n} \left( \Vert f\Vert _{L^{\varPhi }(|C_{i}|\mu )} \right) , \end{aligned}$$
(8)

using the notation of (3).

A few remarks regarding the intersection in (7) are in order. The intersection is taken in the space of measurable functions. Since each \(C_i\) is nonzero off a set of \(\mu _0\) measure zero, convergence in each \(L^{\varPhi }(|C_{i}|\mu )\) implies pointwise convergence along a subsequence, therefore a sequence that is Cauchy with respect to each of the norms has the same limit in each space. (This argument was suggested by a diligent referee.)

Alternatively, use the standard metric for convergence in measure (see for example [8, p. 63, Exercise 32]) and apply [2, p. 57] to show that \(X_\mu \) is Banach.

In any case, from now on we will be working under the following

Assumption

For all i and \(\mu _0\)-almost all x:

$$\begin{aligned} C_i(x) \ne 0 . \end{aligned}$$
(9)

It will be easy to check that this assumption is satisfied in the applications that follow.

Definition 2

Let \(\mu ,\mu ' \in \mathcal {F}\). Define \(\mu ' \le \mu \) if \(\mu ' = \psi \mu \) with \( \psi \in L^{p}(|C_{i}| \mu )\) for some \(p > 1\) and all \(i = 0,1, \ldots , n\). Write \(\mu ' \sim \mu \) when both \(\mu \le \mu '\) and \(\mu ' \le \mu \) hold.

Then \(\sim \) is an equivalence relation: this follows from the argument preceding Proposition 3.11 in [3, p. 178], repeated for each of the finite measures \(|C_i|\mu \). Note that once we have a \(\psi _i\) in \(L^{p_i}(|C_i| \mu )\) from this argument we can choose the smallest \(p_i\) to satisfy Definition 2.

\(X_\mu \) will be the model space for each equivalence class of this equivalence relation and the equivalence classes will be connected components with the same tangent space with respect to the manifold structure we are after. More precisely:

Proposition 1

If \(\mu ' \le \mu \) then the identity map continuously embeds \(X_{\mu }\) into \(X_{\mu '}\). In particular, \(\mu ' \sim \mu \) implies \(X_{\mu }= X_{\mu '}\) as Banach spaces, i.e. they consist of the same elements and the norms are equivalent.

Proof

Proposition 3.11 in [3, p. 178] shows the equality of each of the spaces in the intersection (7) of Definition 1. \(\square \)

Proposition 2

For \(\mu \) in \(\mathcal {F}\) and K the equivalence class of \(\mu \), the map

$$\begin{aligned} s_\mu : \mu ' = \psi \mu \mapsto \log \psi \end{aligned}$$
(10)

has open and convex image \(s_\mu (K)\) in \(X_\mu \).

Proof

That the image is in \(X_\mu \) follows directly from Definition 2 of the equivalence relation once the comments at the beginning of §3.3.3 of [3, p. 176] are repeated for each of the spaces in the intersection (7).

Given this, the proof of [3, Theorem 3.4] applies verbatim to show that the image is open and convex. \(\square \)

Remark 1

Both \(s_\mu \) and its inverse on \(s_\mu (K)\), given by \(s_{\mu }^{-1}(u) = e^u \mu \), are continuous with respect to the following modification of the standard e-convergence, cf. [3, Definition 3.13 and Proposition 3.13]:

Definition 3

For \(\{g_{k}\mu _0, k \in \mathcal {N}\}\) and \(g\mu _0\) in \(\mathcal {F}\) we say that \(g_{k}\mu _0\) is e-convergent to \(g\mu _0\) if for all \(i =0, 1, \ldots , n\) and all \(p\ge 1\) we have

$$\begin{aligned} \lim _{k \rightarrow \infty } \int _{\varOmega } \left| \frac{g_{k}}{g} - 1 \right| ^{p} |C_{i}| g \mu _{0} = 0 \quad \text {and} \quad \lim _{k \rightarrow \infty } \int _{\varOmega } \left| \frac{g}{g_{k}} - 1 \right| ^{p} |C_{i}| g \mu _{0} = 0. \end{aligned}$$
(11)

Changing from \(\mu \) to \(\mu ' = \psi \mu \) in the same equivalence class replaces \(s_\mu \) with a homeomorphism \(s_{\mu '}\) from the same K to \(X_{\mu '}\). The transition function

$$\begin{aligned} s_{\mu }\circ s_{\mu '}^{-1}:s_{\mu '}(K) \rightarrow s_{\mu }(K) \end{aligned}$$
(12)

between open sets in equivalent Banach spaces is

$$\begin{aligned} u \mapsto u + \log \psi , \end{aligned}$$
(13)

clearly a \(C^{\infty }\) diffeomorphism.

This provides the Banach manifold structure we shall use: \(\mathcal {F}\) is a disjoint union of open, connected components, the equivalence classes, and every component is modeled by any of the equivalent Banach spaces \(X_\mu \), for \(\mu \) in the corresponding equivalence class, cf. [3, Remark 3.13]. Via the chart \(s_\mu \) the tangent space \(T_\mu \mathcal {F}\) of \({\mathcal {F}}\) at \(\mu \) (defined as equivalence classes of curves) can be identified with \(\displaystyle X_\mu = \bigcap \nolimits _{i=0}^{n} L^{\varPhi }(|C_{i}|\mu )\), with the norm (8) or any of the equivalent norms \(\Vert f\Vert _{X_{\mu '}}\), \(\mu '\sim \mu \), via an isomorphism (the restriction on fibres of a bundle isomorphism \(\displaystyle \bigcup \nolimits _{\mu \in K} T_\mu \mathcal {F}\simeq s_\mu (K) \times X_\mu \) corresponding to \(s_\mu \), see [1, p. 155]).
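On a finite state space the charts (10), their inverses, and the transition formula (13) can be verified directly. In the following sketch (all names and numbers are our arbitrary choices) measures are represented by their densities with respect to a counting reference measure \(\mu _0\):

```python
import math

# Measures on a 3-point space, as densities w.r.t. a counting measure mu0.
phi = [0.5, 1.0, 2.0]                         # mu  = phi mu0
psi = [2.0, 0.5, 1.5]                         # mu' = psi mu, same class
mu = phi
mu_prime = [p * s for p, s in zip(phi, psi)]  # density of mu' w.r.t. mu0

def s(base, nu):
    """Chart (10): log-density of nu with respect to the base measure."""
    return [math.log(n / b) for n, b in zip(nu, base)]

def s_inv(base, u):
    """Inverse chart: u -> e^u * base."""
    return [b * math.exp(x) for b, x in zip(base, u)]

u = [0.3, -0.1, 0.2]
# Transition between the charts based at mu' and mu, applied to u:
transition = s(mu, s_inv(mu_prime, u))
# Formula (13): the transition is u + log(psi).
expected = [x + math.log(p) for x, p in zip(u, psi)]
```

The computed transition agrees componentwise with \(u + \log \psi \), confirming that the chart change is affine, hence \(C^\infty \).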

2.2 Entropy and constraints

Consider now the information entropy

$$\begin{aligned} \begin{array}{c} H:\mathcal {F} \rightarrow \mathcal {R} \cup \left\{ + \infty \right\} \\ \displaystyle H(\phi \mu _{0}) = \int _{\varOmega } \phi \log {\phi } \ \mu _{0}. \end{array} \end{aligned}$$
(14)

It turns out that if \(\mu '\sim \mu \) with \(\mu = \phi \mu _0\) and \(\mu ' = \tau \mu _0\) then \(\log \phi \in L^{\varPhi }(\mu )\) iff \(\log \tau \in L^{\varPhi }(\mu ')\) and \(H(\phi \mu _0)\) is finite iff \(H(\tau \mu _0)\) is, see [21, p. 4055, §4.3]. Recalling standard facts from [1, Corollary 2.4.10, Definition 3.2.5], to show that H is \(C^1\) it is enough to show that \(H\circ s_\mu ^{-1}\) is \(C^1\)-Gateaux on \(X_\mu \).

Proposition 3

Let \(\mu = \phi \mu _0\) be such that \(\log \phi \in L^{\varPhi }(\mu )\). Then H is a real-valued \(C^1\) function on K and

$$\begin{aligned} d_0(H \circ s_\mu ^{-1})[v] = \int (\log \phi (x) + 1) v(x) \phi (x) \mu _0(dx). \end{aligned}$$
(15)

Proof

Following [1, Corollary 2.4.20 and Definition 3.2.5] it is enough to check that \(H\circ s_\mu ^{-1}\) is \(C^1\)-Gateaux on \(s_\mu (K)\). For this, calculate

$$\begin{aligned} \begin{array}{rl} d_{u}(H\circ s_\mu ^{-1})[v] &{}\displaystyle = \frac{d}{dt}\bigg |_{t = 0} \left( H \circ s_\mu ^{-1}\right) (u + tv) \\ &{}\displaystyle = \int v e^{u}\left[ 1+ u + \log {\phi } \right] \mu (dx) \end{array} \end{aligned}$$
(16)

for \(u,v \in X_{\mu }\). Applying Hölder on the last two terms and using that \(\log \phi \in L^{\varPhi }(\mu )\) and (4) we get

$$\begin{aligned} \left| d_{u}(H\circ s_\mu ^{-1})[v]\right| \le \Vert u\Vert _{X_{\mu '}} \Vert v\Vert _{X_{\mu '}} + C \Vert v\Vert _{X_{\mu '}}, \end{aligned}$$
(17)

for \(\mu ' = e^u \mu \). Then use Proposition 1 to see that \(d_u H\) is bounded. The continuous dependence on u follows from (16) with the use of (11). \(\square \)
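The derivative formula (15) is easy to check numerically in the discrete case, where \(H\circ s_\mu ^{-1}\) along a direction v is an explicit function of t. The sketch below (weights and direction are arbitrary choices of ours, with a counting reference measure) compares (15) with a central finite difference:

```python
import math

p = [0.2, 0.3, 0.5]        # density phi w.r.t. a counting measure mu0
v = [1.0, -0.5, 0.25]      # a direction in the model space

def H_along(t):
    """H(e^{t v} phi mu0) = sum e^{t v} phi log(e^{t v} phi), cf. (14)."""
    return sum(math.exp(t * vi) * pi * (t * vi + math.log(pi))
               for pi, vi in zip(p, v))

# Formula (15): d_0 (H o s_mu^{-1})[v] = sum (log phi + 1) v phi.
exact = sum((math.log(pi) + 1.0) * vi * pi for pi, vi in zip(p, v))
h = 1e-6
numeric = (H_along(h) - H_along(-h)) / (2.0 * h)
```

The central difference agrees with (15) to within the expected \(O(h^2)\) discretization error.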

Definition 4

Define \(\mathcal {F}_e\) to be the union of all the components K of \(\mathcal {F}\) such that for \(\mu =\phi \mu _0\) in K the condition \(\log \phi \) in \(L^{\varPhi }(\mu )\) holds.

Define now, for \(C_i\) as in (5),

$$\begin{aligned} \begin{array}{c} G_i:\mathcal {F} \rightarrow \mathcal {R},\quad G_{i}(\mu ) = \displaystyle \int C_{i}(x) \mu (dx)\\ \mathbf{G} :\mathcal {F} \rightarrow \mathcal {R}^{n+1}, \mathbf{G} (\mu ) = (G_{0}(\mu ), \ldots , G_{n}(\mu )). \end{array} \end{aligned}$$
(18)

Proposition 4

\(\mathbf{G}\) is \(C^1\) on \(\mathcal {F}\) and

$$\begin{aligned} d_0(\mathbf{G} \circ s_\mu ^{-1})[v] = \int v(x) \mathbf{C}(x)\mu (dx). \end{aligned}$$
(19)

Proof

First calculate

$$\begin{aligned} d_{0}(G_{i} \circ s_\mu ^{-1})[v]= \frac{d}{dt}\bigg |_{t = 0} \int C_{i} e^{t v} \mu = \int C_{i} \ v \mu , \end{aligned}$$
(20)

and repeat the argument of Proposition 3 for the measure \(|C_i|\mu \) for each i. \(\square \)

Remark 2

The Propositions above provide \(C^1\) differentiability which is sufficient for applying Proposition 5 below. Higher differentiability is likely, but we shall not insist on it here.

Recall now that a closed subspace F of a Banach space E is “split” if there is a closed subspace H of E such that \(E = F \bigoplus H\), where \(F \bigoplus H\) is the direct sum of F and H (i.e. the product of F and H with the product topology and any of the equivalent norms that produce this topology).

Also recall that a map f between Banach manifolds M and N is a submersion if for each m in M, \(d_{m}f\) is surjective with split kernel.

Lemma 1

\(d_{\mu }{} \mathbf{G} : T_{\mu }\mathcal {F} \rightarrow \mathcal {R}^{n+1}\) is surjective for all \(\mu \) in \(\mathcal {F}\).

Proof

If not, there is i such that for all v in \(X_{\mu }\)

$$\begin{aligned} \int C_{i}(x) \ v(x)\mu (dx) = 0 . \end{aligned}$$
(21)

This implies \(C_{i} = 0\) \(\mu \)-a.e., since the characteristic functions of the sets \(\left\{ x: C_{i}(x) > 0 \right\} \) and \(\left\{ x: C_{i}(x) < 0 \right\} \) are both in \(X_\mu \); this contradicts our standing Assumption (9). \(\square \)

Lemma 2

\(T_{\mu }\mathcal {F} = ker\,d_{\mu }{} \mathbf{G} \oplus F\), with F, \(ker\, d_{\mu }{} \mathbf{G} \) closed subspaces.

Proof

Since \(d_{\mu }{} \mathbf{G} \) is continuous, it is standard that ker\(\,d_{\mu }{} \mathbf{G} \) is closed, cf. [8, Chapter 5, Exercise 15]. Take F to be the span of \(y_0, y_{1}, \ldots , y_{n} \in T_{\mu }\mathcal {F}\) such that \(d_{\mu }{} \mathbf{G} [y_{i}] = \mathbf{e} _{i}\) for all i, with \(\mathbf{e} _{i}\) the standard basis vectors in \(\mathcal {R}^{n+1}\). The statement follows from the fact that

$$\begin{aligned} P(x) = \sum _{i=0}^n d_{\mu } G_i[x] y_i \end{aligned}$$
(22)

is a projection with F as fixed points and the characterization of direct sums in terms of projections as in [1, Proposition 2.2.17]. \(\square \)

We are now ready to apply the following, see [1, Proposition 3.5.24]:

Proposition 5

Let M and P be Banach manifolds, let \(f:M \rightarrow \mathcal {R}\) be \(C^1\), and let \(g: M \rightarrow P\) be a \(C^{1}\) submersion. For \(N = g^{-1}(p_{0})\) for some \(p_0\) in P a point n in N is a critical point of \(f \big |_{N}\) if and only if there exists \(\lambda \) in \((T_{p_{0}}P)^*\), a Lagrange multiplier, such that \(d_{n}f = \lambda \circ d_{n}g\).

Recalling Definition 4, we then have

Theorem 1

Let \(\mathbf{{G}}:\mathcal {F} \rightarrow \mathcal {R}^{n+1}\) as in (18). Then \(\mu = \phi \mu _0\) is a critical point of the Gibbs entropy H as in (14) in \(\mathbf{{G}}^{-1}( \gamma ) \cap \mathcal {F}_e\), for some \(\gamma \) in the image of \(\mathbf{G}\) and \(\mathcal {F}_e\) as in Definition 4, if and only if there exist \(\lambda _i\)’s in \(\mathcal {R}\) such that

$$\begin{aligned} \phi = \frac{\displaystyle \exp \left( \sum _{i=1}^{n} \lambda _{i}C_i(x)\right) }{\displaystyle \int _{\varOmega }\exp \left( \sum _{i=1}^{n} \lambda _{i}C_i(x)\right) \mu _0(dx)}. \end{aligned}$$
(23)

Proof

Apply Proposition 5 to \(\mathbf{G}\) and H at \(\phi \mu _0\) using the formulas for the derivative from (19) and (15) respectively to get \( \varvec{\lambda } \in \mathcal {R}^{n+1}\) such that

$$\begin{aligned} \int _{\varOmega } \left[ \log {\phi }(x) + 1 \right] v(x)\ \mu (dx) = \varvec{\lambda } \cdot \int _{\varOmega } \mathbf{C} (x) \ v(x) \mu (dx) \end{aligned}$$
(24)

for all \(v \in X_{\mu }\). Equivalently,

$$\begin{aligned} \int _{\varOmega } \left( \log {\phi }(x) + 1 - \sum _{i=0}^{n} \lambda _{i} C_{i}(x) \right) v(x) \ \mu (dx) = 0 \end{aligned}$$
(25)

for all \(v \in X_{\mu }\). As in the proof of Lemma 1, this implies

$$\begin{aligned} \log \phi (x) = -1 +\lambda _0 + \sum _{i=1}^{n} \lambda _{i}\, C_{i}(x). \end{aligned}$$
(26)

Since \(C_0 \equiv 1\), exponentiating (26) and absorbing \(e^{\lambda _0 - 1}\) into the normalization gives the statement. \(\square \)

Recall here the standard argument from [13] to see that the Gibbs measure (23) is an entropy minimizer: for any other \(\mu = \tau \mu _0\) satisfying the same constraints

$$\begin{aligned} \begin{array}{rl} H(\tau \mu _0) &{}= \displaystyle \int (\tau \log \frac{\tau }{\phi } + \tau \log \phi ) \mu _0 \\ &{}\displaystyle \ge \int \tau \log \phi \mu _0\\ &{}= \displaystyle \int \tau \sum \lambda _i C_i \mu _0 - \log Z = H(\phi \mu _0), \end{array} \end{aligned}$$
(27)

for Z the denominator of (23).
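The inequality (27) can be observed numerically in a discrete toy model: among weight vectors with the same normalization and the same mean of a constraint function C, the Gibbs weights of the form (23) minimize \(\sum \phi \log \phi \). In this sketch the state space, multiplier, and perturbation are our arbitrary choices:

```python
import math

C = [0.0, 1.0, 2.0]                       # a scalar constraint function
lam = -0.7                                # a Lagrange multiplier (beta-like)
w = [math.exp(lam * c) for c in C]
Z = sum(w)
gibbs = [x / Z for x in w]                # critical point of the form (23)

def H(dist):
    """Discrete version of (14) w.r.t. a counting measure."""
    return sum(t * math.log(t) for t in dist)

# u spans the feasible directions: sum(u) = 0 and sum(C_i u_i) = 0,
# so gibbs + t*u satisfies the same two constraints for small t.
u = [1.0, -2.0, 1.0]
def feasible(t):
    return [g + t * ui for g, ui in zip(gibbs, u)]
```

Moving away from the Gibbs point in either feasible direction strictly increases H, in line with the relative-entropy argument (27).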

3 The grand canonical ensemble in statistical mechanics

3.1 The grand canonical ensembles as entropy minimizers

We now apply the previous section to derive the grand canonical Gibbs measure in Statistical Mechanics as a critical point of Gibbs entropy under scalar constraints, as explained in the Introduction.

Recall first that a classical system of N particles in an open bounded domain \(\varLambda \) of \(\mathcal {R}^3\) is described at any time by the positions \(q_1,\ldots , q_N\) of the particles and the corresponding momenta \(p_1,\ldots , p_N\), with the \(q_i\)’s in \(\varLambda \) and the \(p_i\)’s in \(\mathcal {R}^3\). In classical statistical mechanics, see [7], one examines ensembles of systems, with each system in the ensemble occurring according to some probability law. This means that one works with \(\varLambda ^N \times (\mathcal {R}^3)^N\), and if one does not fix the number of particles then the relevant space is \(\displaystyle \bigsqcup \nolimits _{N=1}^{\infty } \left( \varLambda \times \mathcal {R}^{3} \right) ^{N}\). At equilibrium the probability does not change in time. Prominent among equilibrium distributions are the canonical (for fixed N) and the grand canonical Gibbs distributions (when N is not fixed). We examine here the grand canonical ensemble; the argument for the canonical one follows by restricting to a fixed-N component.

Then let \(\varLambda \) be an open set of finite volume in \(\mathcal {R}^{3}\) and \(\mathcal {X}_{\varLambda }\) the corresponding phase space

$$\begin{aligned} \mathcal {X}_{\varLambda } = \bigsqcup _{N\ge 1} \left( \varLambda ^N \times \mathcal {R}^{3N}\right) = \left( \varLambda \times \mathcal {R}^3 \right) \sqcup \left( \varLambda ^2 \times \mathcal {R}^6 \right) \sqcup \ldots \end{aligned}$$
(28)

Each of the components of this disjoint union has the topology from \(\mathcal {R}^{6N}\). The underlying \(\sigma \)-algebra will be the one generated by \(\bigcup _{N\ge 1} {\mathcal {B}}(\varLambda ^N\times \mathcal {R}^{3N})\), where \({\mathcal {B}}(\varLambda ^N\times \mathcal {R}^{3N})\) is the Borel \(\sigma \)-algebra on \(\varLambda ^N\times \mathcal {R}^{3N}\).

The reference measure \(\mu _{0}\) on \(\mathcal {X}_{\varLambda }\) will be

$$\begin{aligned} \mu _0 = \sum _{N=1}^{\infty } \frac{dq_{1} \ldots dq_{N}}{N!}dp_{1} \ldots dp_{N}. \end{aligned}$$
(29)

Then for f measurable on \(\mathcal {X}_{\varLambda }\)

$$\begin{aligned} \int _ {\mathcal {X}_{\varLambda }} f \mu _0 = \sum _{N \ge 1} \int _{\varLambda ^N \times \mathcal {R}^{3N}}f(q_1,\ldots ,q_N,p_1,\ldots ,p_N) \ \frac{dq_{1} \ldots dq_{N}}{N!}dp_{1} \ldots dp_{N}. \end{aligned}$$
(30)

For more on \(\mathcal {X}_{\varLambda }\) see [15, p. 18].

To describe the constraints, start with U a stable interaction potential (measuring the potential energy of a particle configuration), i.e.

$$\begin{aligned} \begin{array}{rl} &{}U: \mathcal {X}_{\varLambda } \rightarrow \mathcal {R},\ \text {measurable and symmetric},\\ &{}(q_{1}, \ldots , q_{N}) \rightarrow U(q_{1}, \ldots , q_{N}),\ \text {for each } N, \\ &{}U(q_{1}, \ldots , q_{N}) \ge -NL, \end{array} \end{aligned}$$
(31)

for some \(L>0\) and for all \(N \in \mathcal {N}\), and define the total energy of a configuration \((\mathbf{q, p}) =(q_{1}, \ldots , q_{N}, p_{1}, \ldots , p_{N})\) as the sum of the potential and the kinetic energy

$$\begin{aligned} \begin{array}{rl} &{}E:\mathcal {X}_{\varLambda } \rightarrow \mathcal {R},\\ &{}E(q_{1}, \ldots , q_{N}, p_{1}, \ldots , p_{N}) =\displaystyle \frac{1}{2} \sum _{i=1}^{N} |p_{i}|^{2} + U(q_{1}, \ldots , q_{N}). \end{array} \end{aligned}$$
(32)

(We are implicitly setting all masses equal to 1.)

The total momentum and particle number are defined as

$$\begin{aligned} \mathbf {P}: \mathcal {X}_{\varLambda } \rightarrow \mathcal {R}^{3}, \ \mathbf {P}(q_{1}, \ldots , q_{N}, p_{1}, \ldots , p_{N}) = \sum _{i=1}^{N} p_{i} \end{aligned}$$
(33)

and

$$\begin{aligned} \mathbf {N}: \mathcal {X}_{\varLambda } \rightarrow \mathcal {N}, \ \mathbf {N}(q_{1}, \ldots , q_{N}, p_{1}, \ldots , p_{N}) = N, \end{aligned}$$
(34)

respectively. The constraints now are

$$\begin{aligned} \begin{array}{c} \mathcal {E}[\mu ]:=\displaystyle \int _{\mathcal {X}_{\varLambda }} E(\mathbf{q, p}) \mu (d\mathbf{q},d\mathbf{p}) = e_0, \\ \mathcal {P}[\mu ]:=\displaystyle \int _{\mathcal {X}_{\varLambda }} \mathbf {P} (\mathbf{q, p}) \mu (d\mathbf{q},d\mathbf{p}) = \mathbf {p}_0, \\ \widetilde{\mathbf {N}}[\mu ] := \displaystyle \int _{\mathcal {X}_{\varLambda }} \mathbf {N}(\mathbf{q, p}) \mu (d\mathbf{q},d\mathbf{p}) = n_0, \end{array} \end{aligned}$$
(35)

with derivatives

$$\begin{aligned} \begin{array}{c} d_{\mu }\mathcal {E}[v] = \displaystyle \int _{\mathcal {X}_{\varLambda }} E(\mathbf{q, p}) \ v(\mathbf{q, p}) \mu (d\mathbf{q},d\mathbf{p}),\\ d_{\mu }{\mathcal {P}}[v] = \displaystyle \int _{\mathcal {X}_{\varLambda }} \mathbf {P} (\mathbf{q, p}) \ v(\mathbf{q, p}) \mu (d\mathbf{q},d\mathbf{p}),\\ d_{\mu }{\widetilde{\mathbf {N}}}[v] = \displaystyle \int _{\mathcal {X}_{\varLambda }}\mathbf {N} (\mathbf{q, p}) \ v(\mathbf{q, p}) \mu (d\mathbf{q},d\mathbf{p}), \end{array} \end{aligned}$$
(36)

respectively. The Gibbs grand canonical distribution for parameters \(\beta \), \(\varvec{\lambda }\), \(\nu \) is defined as \(\phi _{\beta , \varvec{\lambda }, \nu } \mu _0\) with

$$\begin{aligned} \begin{array}{c} \phi _{\beta , \varvec{\lambda }, \nu }(\mathbf{q, p}) = \displaystyle \frac{\exp \left( {\displaystyle \beta E(\mathbf{q, p}) + \varvec{\lambda } \cdot \mathbf {P}(\mathbf{q, p}) + \nu \mathbf {N}(\mathbf{q, p})}\right) }{Z_{\beta , \varvec{\lambda }, \nu }},\\ Z_{\beta , \varvec{\lambda }, \nu } = \displaystyle \int _{\mathcal {X}_{\varLambda }} e^{\beta E + \varvec{\lambda } \cdot \mathbf {P} + \nu \mathbf {N}} \mu _0. \end{array} \end{aligned}$$
(37)

We now show that the requirement of Theorem 1 does not eliminate the Gibbs grand canonical distribution.

Proposition 6

For U stable potential as in (31) and \(\beta <0\), \(\log {\phi }_{\beta , \varvec{\lambda }, \nu }\) is in \(L^{\varPhi }(\phi _{\beta , \varvec{\lambda }, \nu }\mu _0)\).

Proof

As in [3, Proposition 3.9 of §3.3.2] it suffices to show \(\log {\phi }_{\beta , \varvec{\lambda }, \nu }\) is in \(L^{\cosh (t)-1}(\phi _{\beta , \varvec{\lambda }, \nu }\mu _0)\). We show there exists \(\varepsilon > 0\) such that

$$\begin{aligned} \int _{\mathcal {X}_{\varLambda }} \left( \cosh \left( \varepsilon \log \phi _{\beta , \varvec{\lambda }, \nu } \right) - 1 \right) \phi _{\beta , \varvec{\lambda }, \nu }\mu _0 < +\infty . \end{aligned}$$
(38)

For this, after disposing of constants, it is enough to show that there exists \(\varepsilon > 0\) such that

$$\begin{aligned} \int _{\mathcal {X}_{\varLambda }} \left( e^{(1+\varepsilon ) [\beta E + \varvec{\lambda } \cdot \mathbf {P} + \nu N] } + e^{(1 - \varepsilon ) [\beta E + \varvec{\lambda } \cdot \mathbf {P} + \nu N] } \right) \mu _0 < +\infty . \end{aligned}$$
(39)

For the second term in the integral complete the square to get

$$\begin{aligned} \int _{\mathcal {X}_{\varLambda }} e^{\displaystyle (1-\varepsilon ) \left[ \beta U + \sum _{i=1}^{N} \frac{\beta }{2} \left( p_{i} + \frac{\varvec{\lambda }}{\beta } \right) ^{2} - N\frac{\varvec{\lambda }^{2}}{2 \beta }+ \nu N\right] } \mu _0 \end{aligned}$$
(40)

and integrate with respect to \(dp_{1}\ldots dp_{N}\) (recall \(\beta < 0\)) to get, up to a constant, when \(\varepsilon <1\),

$$\begin{aligned} \sum _{N} \left( \frac{2\pi }{-(1-\varepsilon )\beta }\right) ^{3N/2} e^{-(1-\varepsilon ) N \varvec{\lambda }^{2}/2 \beta } e^{(1-\varepsilon )\nu N} \int _{\varLambda ^{N}} e^{(1-\varepsilon )\beta U} \frac{dq_{1} \ldots dq_{N}}{N!}. \end{aligned}$$
(41)

The stability condition gives

$$\begin{aligned} \begin{array}{rl} \displaystyle \int _{\varLambda ^{N}} e^{(1-\varepsilon )\beta U}\frac{dq_{1} \ldots dq_{N}}{N!} &{}\le \displaystyle \int _{\varLambda ^{N}} e^{-(1-\varepsilon )\beta L N}\frac{dq_{1} \ldots dq_{N}}{N!} \\ &{}= \displaystyle \frac{e^{-(1-\varepsilon )\beta L N} |\varLambda |^{N}}{N!} < +\infty , \end{array} \end{aligned}$$
(42)

where \(|\varLambda |\) denotes the Lebesgue measure of \(\varLambda \). Therefore (41) is finite for any \(\varepsilon <1\), since the factorial dominates the geometric growth in N. The argument repeats for the first term in (39), this time without restrictions on \(\varepsilon \). \(\square \)
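The convergence just used rests on the factorial: after the momentum integration and the stability bound, each term of the sum over N is at most \(a^N/N!\) for some constant a (depending on \(\beta , \varvec{\lambda }, \nu , \varepsilon \) and \(|\varLambda |\)), and \(\sum _{N\ge 1} a^N/N! = e^a - 1 < \infty \). A numerical sketch of this last point, with an arbitrary constant of our choosing:

```python
import math

# Each term of the sum over N is bounded by a^N / N!; the value of a
# below is an arbitrary stand-in for the physical constants.
a = 7.5
term, partial = 1.0, 0.0
for N in range(1, 120):           # truncation of the sum over N
    term *= a / N                 # term == a**N / factorial(N), overflow-safe
    partial += term
closed_form = math.exp(a) - 1.0   # sum over N >= 1 of a^N / N!
```

The running-product update avoids evaluating large powers and factorials separately, which would overflow in floating point long before the sum itself diverged.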

Apply Theorem 1 to get

Corollary 1

A probability measure \(\mu \) is a critical point for the entropy H amongst all probability measures in \(\mathcal {F}_e\) with fixed \(\mathcal {E}\), \(\mathcal {P}\), and \(\widetilde{\mathbf {N}}\) if and only if \(\mu \) is a grand canonical Gibbs measure \({\phi }_{\beta , \varvec{\lambda }, \nu }\mu _0\) for some \(\beta , \varvec{\lambda }, \nu \).

3.2 The range of the constraints

For applications, it is important to know the possible values that we can give the constraints (35). An answer to this was given in [6], without the use of any manifold structure, where an intricate step was to show that the set of all such values is open. Given what we have already developed here, we can show the openness of this set directly, in the general setting of the previous section.

Proposition 7

Let

$$\begin{aligned} \mathbf{G}:\mathcal {F} \rightarrow \mathcal {R}^{n+1}, \quad \mathbf{G}[\mu ] = \int \mathbf{C}(x) \mu (dx), \end{aligned}$$
(43)

for \(\mathbf{C}\) as in (5). Then \(\mathbf{G}\) has an open image.

Proof

We have already established the surjectivity of \(d_\mu \mathbf{G}\) under the assumptions of the Proposition in Lemma 1. The openness then follows from the local surjectivity theorem [1, Proposition 2.5.9]. \(\square \)

The uniqueness of \((\beta , \varvec{\lambda }, \nu )\) given \((e_0, \mathbf {p}_0, n_0)\) was also shown in [6, Proposition 1] using the Legendre transform of H (the log-partition function). In future work we shall address this point in the framework of information geometry.
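In a discrete analogue the interplay between openness of the range and uniqueness of the multipliers is transparent: the map from a single multiplier to the corresponding constraint value is strictly increasing (its derivative is a variance), so every value in the open interval between the extreme values of C is attained by exactly one multiplier, which bisection finds. In this sketch the states, target, and function names are ours:

```python
import math

C = [0.0, 1.0, 2.0, 3.0]                  # a scalar constraint on four states

def mean_C(lam):
    """Constraint value attained by the Gibbs weights with multiplier lam."""
    w = [math.exp(lam * c) for c in C]
    return sum(c * x for c, x in zip(C, w)) / sum(w)

def solve_multiplier(target, lo=-50.0, hi=50.0, tol=1e-12):
    """mean_C is strictly increasing in lam (its derivative is the
    variance of C under the Gibbs weights), so bisection applies."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mean_C(mid) < target else (lo, mid)
    return 0.5 * (lo + hi)
```

Targets outside the closed interval [min C, max C] are never attained, mirroring the openness of the range established above.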

4 The local equilibrium

The main application of what we have developed so far is that we can rigorously address the case of local equilibrium Gibbs measures, i.e. we can find critical points of the same entropy function subject to different constraints, obtaining measures on \(\mathcal {X}_{\varLambda }\) of the form

$$\begin{aligned} \frac{1}{Z} \exp \left\{ \sum _{i =1}^N \beta (q_i) E(q_1,\ldots ,q_N,p_1,\ldots ,p_N) + \sum _{i =1}^N \nu (q_i) + \sum _{i =1}^N \varvec{\lambda }(q_i) p_i \right\} \mu _0, \end{aligned}$$
(44)

for Z the partition function (normalizing constant).

To achieve this, we start from the same space of measures, with the difference that now the constraints are functions rather than constants, and the (unaltered) definition of \({\mathcal {F}}\) renders these functions integrable. The function constraints are defined as follows, where we always use the max norm on the product of Banach spaces: let

$$\begin{aligned} C(q_1,\ldots ,q_N,p_1,\ldots ,p_N) = (E,\mathbf {P},\mathbf {N})(q_1,\ldots ,q_N,p_1,\ldots ,p_N), \end{aligned}$$
(45)

for \(E,\mathbf {P},\mathbf {N}\) as in (32), (33), (34), and define now

$$\begin{aligned} \begin{array}{rl} &{}\mathbf{G}:\mathcal {F} \rightarrow \mathcal {R}^+ \times L^1(\varLambda ,dx)\times \cdots \times L^1(\varLambda ,dx),\\ &{}\mathbf{G}(\mu )(x) = \left( \displaystyle \int \mu (d\mathbf{q},d\mathbf{p}), \int \sum _{i=1}^{N} C(q_1,\ldots ,q_N,p_1,\ldots ,p_N) \delta _x(q_i) \mu (d\mathbf{q},d\mathbf{p}) \right) , \end{array} \end{aligned}$$
(46)

where we use the Dirac \(\delta _x(q_i)\) merely as a shorthand for replacing x for \(q_i\) and integrating with respect to all \(q_j\)’s except \(q_i\). The result is a function of x. Integrating with respect to the x variable restores the missing \(q_i\) integration and the definition of \(\mathcal {F}\) shows that the last five components of \(\mathbf{G}\) are indeed in \(L^1(\varLambda )\) with respect to dx.

Proposition 8

\(\mathbf{G}\) is \(C^1\) on \(\mathcal {F}\) with

$$\begin{aligned} \begin{array}{rl} &{}d_{0}(\mathbf{G} \circ s_\mu ^{-1}) :X_\mu \rightarrow \mathcal {R}^+ \times L^1(\varLambda ,dx)\times \cdots \times L^1(\varLambda ,dx)\\ &{}d_{0}(\mathbf{G} \circ s_\mu ^{-1})[v] = \left( \displaystyle \int _{X_{\varLambda }} v \mu (d\mathbf{q},d\mathbf{p}), \int _{X_{\varLambda }} \sum _{i=1}^{N} C \ v \ \delta _x(q_{i}) \mu (d\mathbf{q},d\mathbf{p}) \right) . \end{array} \end{aligned}$$
(47)

Proof

As in the proof of Proposition 4,

$$\begin{aligned} \begin{array}{rl} d_{0}(G_{j}\circ s_\mu ^{-1})[v] &{}=\displaystyle \frac{d}{dt}\bigg |_{t = 0} \int _{X_{\varLambda }} \displaystyle \sum _{i=1}^{N} C_{j} e^{t v} \delta (q_{i} - x) \mu \\ &{}= \displaystyle \int _{X_{\varLambda }} \sum _{i=1}^{N} C_{j} \ v \ \delta (q_{i} - x) \mu . \end{array} \end{aligned}$$
(48)

Then

$$\begin{aligned} \begin{array}{rl} \Vert d_{0}(G_{j}\circ s_\mu ^{-1})[v]\Vert _{L^{1}(dx)} &{}\le \displaystyle \int _{\varLambda } \int _{X_{\varLambda }} \sum _{i=1}^{N} |C_{j}| \ |v| \ \delta (q_{i} - x) \mu \ dx\\ &{}\le \displaystyle \sum _{N \ge 1} N \int |v| \ |C_{j}| \mu . \end{array} \end{aligned}$$
(49)

Use now (4) and apply Proposition 1 for \(\mu \sim \mathbf {N}\mu \) to conclude as in Proposition 3. \(\square \)

Define now \({\mathcal {R}}\) to be the set of regular values of \(\mathbf{G}\), i.e. values of \(\mathbf{G}\) with pre-image consisting of measures where the derivative of \(\mathbf{G}\) is surjective with split kernel.

Theorem 2

A measure \(\mu \) is a critical point for the entropy H as in (14) amongst all measures \(\mu ' = \phi \mu _0\) in \(\mathcal {F}_e\) which satisfy \(\mathbf{G} (\mu ') = (1, f_1, \ldots , f_5)\in \mathcal {R}\) if and only if \(\mu \) is a local equilibrium Gibbs probability measure as in (44) for some \(L^\infty (\varLambda , dx)\) functions \(\beta , \varvec{\lambda }, \nu \).

Proof

Let \(\mu = \phi \mu _0\) be a critical point. By Proposition 5, there exist

$$\begin{aligned} \begin{array}{rl} (\lambda _{0}, \beta (x), \varvec{\lambda }(x),\nu (x)) \in &{}\left( \mathcal {R} \times L^1(\varLambda ,dx)\times \cdots \times L^1(\varLambda ,dx) \right) ^{*} \\ &{}\ \ \ \ = \mathcal {R}\times L^\infty (\varLambda ,dx)\times \cdots \times L^\infty (\varLambda ,dx) \end{array} \end{aligned}$$
(50)

(for the sum norm on the last product) such that

$$\begin{aligned} \begin{array}{rl} \displaystyle \int _{X_{\varLambda }} \left( \log \phi +1 \right) v\ \phi \,\mu _0 = &{}\displaystyle \lambda _{0}\int _{X_{\varLambda }} v \mu (d\mathbf{q},d\mathbf{p})\\ &{}+ \displaystyle \int _{\varLambda } \beta (x) \int _{X_{\varLambda }}\sum _i E v \delta _x(q_{i}) \mu (d\mathbf{q},d\mathbf{p})\ dx\\ &{}+ \displaystyle \int _{\varLambda } \varvec{\lambda }(x) \int _{X_{\varLambda }}\sum _i \mathbf {P} v \delta _x(q_{i}) \mu (d\mathbf{q},d\mathbf{p})\ dx\\ &{}+ \int _{\varLambda } \nu (x) \int _{X_{\varLambda }}\sum _i \mathbf {N} v \delta _x(q_{i}) \mu (d\mathbf{q},d\mathbf{p})\ dx, \end{array} \end{aligned}$$
(51)

for all v. Substituting \(q_i\) for x, i.e. carrying out the dx integrations against the \(\delta _x(q_i)\)’s, we get

$$\begin{aligned} \int _{X_{\varLambda }} \left( \log \phi +1 - \lambda _{0} - \sum _{i=1}^{N} \beta (q_{i}) E - \sum _{i=1}^{N} \varvec{\lambda }(q_{i}) \mathbf {P} -\sum _{i=1}^{N} \nu (q_{i}) \mathbf {N} \right) v\ \mu = 0, \end{aligned}$$
(52)

for all v. As in the proof of Theorem 1, for \(\displaystyle \int \mu = 1\) this implies

$$\begin{aligned} \phi = \frac{1}{Z} \exp \left( \sum _{i=1}^{N} \beta (q_{i}) E +\sum _{i=1}^{N} \varvec{\lambda }(q_{i}) \mathbf {P} +\sum _{i=1}^{N} \nu (q_{i}) \mathbf {N} \right) . \end{aligned}$$
(53)

\(\square \)

An argument completely analogous to the equilibrium case shows that the critical points are again minimizers.

In the case of the local equilibrium it turns out to be substantially more difficult to check the conditions for a regular value, i.e. surjectivity and split kernel. As in corresponding finite dimensional cases, regular values are dense provided certain conditions are satisfied, see [1, Appendix E]. In future work we address the problem of characterizing the elements of \(\mathcal {R}\).