1 Introduction and Main Results

1.1 Hypersingular Riesz Gases

Let \(d \ge 1\) and let \(s\) be a real number with \(s > d\). We consider a system of N points in the Euclidean space \(\mathbb {R}^d\) with hypersingular Riesz pairwise interactions, in an external field V. The particles are assumed to live in a confinement set \(\Omega \subseteq \mathbb {R}^d\). The energy \(\mathcal {H}_N(\mathbf {X}_N)\) of the system in a given state \(\mathbf {X}_N= (x_1, \dots , x_N) \in (\mathbb {R}^d)^N\) is defined to be

$$\begin{aligned} \mathcal {H}_N(\mathbf {X}_N) := \sum _{1 \le i \ne j \le N} \frac{1}{|x_i-x_j|^s} + N^{s/d} \sum _{i=1}^N V(x_i). \end{aligned}$$
(1.1)

The external field V is a confining potential, growing at infinity, on which we shall make assumptions later. The term hypersingular corresponds to the fact that the singularity of the kernel \(|x-y|^{-s}\) is nonintegrable with respect to the Lebesgue measure on \(\mathbb {R}^d\).
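Concretely, (1.1) can be evaluated by brute force. The following sketch (in which the sample points and the quadratic external field are illustrative choices, not taken from the paper) computes \(\mathcal {H}_N\) for a small configuration:

```python
import itertools

def riesz_energy(points, s, d, V):
    """Hypersingular Riesz energy (1.1): pairwise terms over ordered
    pairs i != j, plus the external field weighted by N**(s/d)."""
    N = len(points)
    pair = sum(
        2.0 / dist(p, q) ** s  # factor 2: (1.1) sums over ordered pairs
        for p, q in itertools.combinations(points, 2)
    )
    field = N ** (s / d) * sum(V(x) for x in points)
    return pair + field

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
```

Note the \(O(N^2)\) cost of the double sum, and that the energy blows up as any two points collide, reflecting the nonintegrable singularity of the kernel.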

For any \(\beta > 0\), the canonical Gibbs measure associated with (1.1) at inverse temperature \(\beta \) and for particles living on \(\Omega \) is given by

$$\begin{aligned} \mathrm{d}\mathbb {P}_{N,\beta }(\mathbf {X}_N) = \frac{1}{Z_{N,\beta }} \exp \left( - \beta N^{-s/d} \mathcal {H}_N(\mathbf {X}_N)\right) \mathbf {1}_{\Omega ^N}(\mathbf {X}_N) \mathrm{d}\mathbf {X}_N, \end{aligned}$$
(1.2)

where \(\mathrm{d}\mathbf {X}_N\) is the Lebesgue measure on \((\mathbb {R}^d)^N\), \(\mathbf {1}_{\Omega ^N}(\mathbf {X}_N)\) is the indicator function of \(\Omega ^N\), and \(Z_{N,\beta }\) is the “partition function”; i.e., the normalizing factor

$$\begin{aligned} Z_{N,\beta }:= \int _{\Omega ^N} \exp \left( - \beta N^{-s/d} \mathcal {H}_N(\mathbf {X}_N)\right) \mathrm{d}\mathbf {X}_N. \end{aligned}$$
(1.3)

We will call the statistical physics system described by (1.1) and (1.2) a hypersingular Riesz gas.

For Riesz potentials in the case \(s>d\), ground state configurations (or Riesz energy minimizers) of N-particle systems (with or without the external field V) have been extensively studied in the large N limit; see [13, 14, 16] and the references therein. Furthermore, in the case of positive temperature, the statistical mechanics of Riesz gases has been investigated in [18], but for a different range of the parameter s, namely \(\max (d-2, 0) \le s < d\). In that paper, a large deviation principle was derived for the empirical process (which encodes the microscopic behavior of the particles at scale \(N^{-1/d}\), averaged in a certain way). The main goal of the present paper is to extend that work to the hypersingular case. By combining the approaches of the papers mentioned above, we obtain a large deviation principle describing macroscopic as well as microscopic properties of hypersingular Riesz gases.

Studying Riesz interactions for the whole range of s from 0 to infinity is of interest in approximation theory and coding theory, as it connects logarithmic interactions, Coulomb interactions, and (in the limit \(s \rightarrow \infty \)) packing problems; see [13, 25]. Investigating such systems at positive temperature is also a natural question in statistical mechanics, as it improves our understanding of the behavior of systems with long-range vs. short-range interactions (see, for instance, [4, 6, 9], where the interest of such questions is stressed, and [2] and [22, Section 4.2] for additional results). Analyzing the case \(s > d\) is also a first step toward the study of physically more relevant interactions, such as the Lennard–Jones potential.

The hypersingular Riesz case \(s>d\) and the integrable Riesz case \(s<d\) have important differences. For \(s < d\) (which can be thought of as long-range) and, more generally, for integrable interaction kernels g (which include regular interactions), the global, macroscopic behavior can be studied using classical potential theory. Namely, the empirical measure \(\frac{1}{N} \sum _{i=1}^N \delta _{x_i}\) is found to converge rapidly to some equilibrium measure determined uniquely by \(\Omega \) and V and obtained as the unique minimizer of the potential-theoretic functional

$$\begin{aligned} \iint _{\mathbb {R}^d \times \mathbb {R}^d} g(x-y) \mathrm{d}\mu (x) \mathrm{d}\mu (y) + \int _{\mathbb {R}^d} V \mathrm{d}\mu , \end{aligned}$$

which can be seen as a mean-field energy with a nonlocal term. We refer, e.g., to [26] or [27, Chap. 2] for a treatment of this question (among others).

In these integrable cases, if temperature is scaled in the same way as here, the macroscopic behavior is governed by the equilibrium measure and thus is independent of the temperature so that no knowledge of the microscopic distribution of points is necessary to determine the macroscopic distribution. At the next order in energy, which governs the microscopic distribution of the points, a dependency on \(\beta \) appears. As seen in [18], in the Coulomb and potential Riesz cases (it is important in the method that the interaction kernel be reducible to the kernel of a local operator, which is known only for these particular interactions), the microscopic distribution around a point is given by a problem in a whole space with a neutralizing background, fixing the local density as equal to that of the equilibrium measure at that point. The microscopic distribution is found to minimize the sum of a (renormalized) Riesz energy term and a relative entropy term. A crucial ingredient in the proof is a “screening” construction showing that energy can be computed additively over large disjoint microscopic boxes; i.e., interactions between configurations in different large microscopic boxes are negligible to this order.

The hypersingular case can be seen as more delicate than the integrable case due to the absence of an equilibrium measure. The limit of the empirical measure has to be identified differently. In the case of ground state configurations (minimizers), this was done in [16]. For positive temperature, in contrast with the above described integrable case, we shall show that the empirical limit measure is obtained as a by-product of the study at the microscopic scale and depends on \(\beta \) in quite an indirect way (see Theorem 1.3). The microscopic profiles minimize a full-space version of the problem, giving an energy that depends on the local density. The macroscopic distribution can then be found by a local density approximation, by minimizing the sum of its energy and that due to the confinement potential. Since the energy is easily seen to scale like \(N^{1+s/d}\), the choice of the temperature scaling \(\beta N^{-s/d}\) is made so that the energy and the entropy for the microscopic distributions carry the same weight of order N. Other choices of temperature scalings are possible, but would lead to degenerate versions of the situation we are examining, with either all the entropy terms asymptotically disappearing for small temperatures, or the effect of the energy altogether disappearing for large temperatures. Note that going to the microscopic scale in order to derive the behavior at the macroscopic scale was already the approach needed in [19] for the case of the “two-component plasma”, a system of two-dimensional particles of positive and negative charges interacting logarithmically, for which the equilibrium measure is not known a priori.

On the other hand, the hypersingular case is also easier in the sense that the interactions decay faster at infinity, implying that long-range interactions between large microscopic “boxes” are negligible and do not require any sophisticated screening procedures. Our proofs will make crucial use of this “self-screening” property.

To describe the system at the microscopic scale, we define a Riesz energy \(\overline{\mathbb {W}}_s\) (see Sect. 2.3.5) for infinite random point configurations that is the counterpart of the renormalized energy of [17, 18, 23] (defined for \(s < d\)). It is conjectured to be minimized by lattices for certain low dimensions, but this is a completely open problem with the exception of dimension 1 (see [1] and the discussion following (2.15)).

With any sequence of configurations \(\{\mathbf {X}_N\}_N\), we associate an “empirical process” whose limit (a random tagged point process) describes the point configurations \(\mathbf {X}_N\) at scale \(N^{-1/d}\). Our main result will be that there is a Large Deviation Principle (LDP) for the law of this empirical process with rate function equal to (a variant of) the energy \(\beta \overline{\mathbb {W}}_s\) plus the relative entropy of the empirical process with respect to the Poisson point process.

For minimizers of the Riesz energy \(\mathcal {H}_N\), we show that the limiting empirical processes must minimize \(\overline{\mathbb {W}}_s\), thus describing their microscopic structure.

The question of treating more general interactions than the Riesz ones remains widely open. The fact that the interaction has a precise homogeneity under rescaling is crucial for the hypersingular case treated here. On the other hand, in the integrable case, we do not know how to circumvent the need for expressing the energy via the potential generated by the points, i.e., the need for the Caffarelli–Silvestre representation of the interaction as the kernel of a local operator (achieved by adding a space dimension).

1.2 Assumptions and Notation

1.2.1 Assumptions

In the rest of the paper, we assume that \(\Omega \subset \mathbb {R}^d\) is closed with positive d-dimensional Lebesgue measure and that

$$\begin{aligned}&\partial \Omega \text { is }C^1, \end{aligned}$$
(1.4)
$$\begin{aligned}&V\text { is a continuous, non-negative real valued function on }\Omega . \end{aligned}$$
(1.5)

Furthermore, if \(\Omega \) is unbounded, we assume that

$$\begin{aligned}&\lim _{|x| \rightarrow \infty } V(x) = + \infty , \end{aligned}$$
(1.6)
$$\begin{aligned}&\exists M > 0 \text{ such that } \int \exp \left( - M V(x)\right) \mathrm{d}x < + \infty . \end{aligned}$$
(1.7)

The assumption (1.4) on the regularity of \(\partial \Omega \) is mostly technical, and we believe that it could be relaxed, e.g., to requiring that \(\partial \Omega \) be locally the graph of a Hölder function in \(\mathbb {R}^d\). However, it is unclear to us what the minimal assumption could be (e.g., is it enough to assume that \(\partial \Omega \) has zero measure?). An interesting direction would be to study the case where \(\Omega \) is a p-rectifiable set in \(\mathbb {R}^d\) with \(p < d\) (see, e.g., [3, 16]).

Assumption (1.5) is quite mild (in comparison, e.g., with the corresponding assumption in the \(s < d\) case, where one wants to ensure some regularity of the so-called equilibrium measure, which is essentially two orders lower than that for V), and we believe it to be sharp for our purposes. Assumption (1.6) is an additional confinement assumption, and (1.7) ensures that the partition function \(Z_{N,\beta }\), defined in (1.3), is finite (at least for N large enough). Indeed, the interaction energy is non-negative, hence for N large enough (1.7) ensures that the integral defining the partition function is convergent.

1.2.2 General Notation

We let \(\mathcal {X}\) be the space of point configurations in \(\mathbb {R}^d\) (see Sect. 2.1 for a precise definition). If X is some measurable space and \(x \in X\), we denote by \(\delta _x\) the Dirac mass at x.

1.2.3 Empirical Measure and Empirical Processes

Let \(\mathbf {X}_N= (x_1, \dots , x_N)\) in \(\Omega ^N\) be fixed.

  • We define the empirical measure \(\mathrm {emp}(\mathbf {X}_N)\) as

    $$\begin{aligned} \mathrm {emp}(\mathbf {X}_N) := \frac{1}{N} \sum _{i=1}^N \delta _{x_i}. \end{aligned}$$
    (1.8)

    It is a probability measure on \(\Omega \).

  • We define \(\mathbf {X}_N'\) as the finite configuration rescaled by a factor \(N^{1/d}\)

    $$\begin{aligned} \mathbf {X}_N' := \sum _{i=1}^N \delta _{N^{1/d} x_i}. \end{aligned}$$
    (1.9)

    It is a point configuration (an element of \(\mathcal {X}\)), which represents the N-tuple of particles \(\mathbf {X}_N\) seen at microscopic scale.

  • We define the tagged empirical process \(\overline{\mathrm {Emp}}_N(\mathbf {X}_N)\) as

    $$\begin{aligned} \overline{\mathrm {Emp}}_N(\mathbf {X}_N) := \int _{\Omega } \delta _{\left( x,\, \theta _{N^{1/d}x} \cdot \mathbf {X}_N' \right) } \mathrm{d}x, \end{aligned}$$
    (1.10)

    where \(\theta _x\) denotes the translation by \(- x\). It is a positive measure on \(\Omega \times \mathcal {X}\).

Let us now briefly explain the meaning of the last definition (1.10). For any \(x \in \Omega \), \(\theta _{N^{1/d}x} \cdot \mathbf {X}_N'\) is an element of \(\mathcal {X}\) that represents the N-tuple of particles \(\mathbf {X}_N\) centered at x and seen at microscopic scale (or, equivalently, seen at microscopic scale and then centered at \(N^{1/d} x\)). In particular, any information about this point configuration in a given ball (around the origin) translates into information about \(\mathbf {X}_N'\) around x. We may thus think of \(\theta _{N^{1/d}x} \cdot \mathbf {X}_N'\) as encoding the behavior of \(\mathbf {X}_N'\) around x.

The measure

$$\begin{aligned} \int _{\Omega } \delta _{\theta _{N^{1/d}x} \cdot \mathbf {X}_N'} \mathrm{d}x \end{aligned}$$
(1.11)

is a measure on \(\mathcal {X}\) that encodes the behavior of \(\mathbf {X}_N'\) around each point \(x \in \Omega \). We may think of it as the “averaged” microscopic behavior (although it is not, in general, a probability measure, and its mass can be infinite). The measure defined by (1.11) would correspond to what is called the “empirical field”.

The tagged empirical process \(\overline{\mathrm {Emp}}_N(\mathbf {X}_N)\) is a finer object, because for each \(x \in \Omega \) we keep track of the centering point x as well as of the microscopic information \(\theta _{N^{1/d}x} \cdot \mathbf {X}_N'\) around x. It yields a measure on \(\Omega \times \mathcal {X}\) whose first marginal is the Lebesgue measure on \(\Omega \) and whose second marginal is the (non-tagged) empirical process defined above in (1.11). Keeping track of this additional information allows one to test \(\overline{\mathrm {Emp}}_N(\mathbf {X}_N)\) against functions \(F(x, \mathcal {C}) \in C^0(\Omega \times \mathcal {X})\) which may be of the form

$$\begin{aligned} F(x, \mathcal {C}) = \chi (x) \tilde{F}(\mathcal {C}), \end{aligned}$$

where \(\chi \) is a smooth function localized in a small neighborhood of a given point of \(\Omega \), and \(\tilde{F}(\mathcal {C})\) is a continuous function on the space of point configurations. Using such test functions, we may thus study the microscopic behavior of the system averaged over a small domain of \(\Omega \), whereas the empirical process only allows one to study the microscopic behavior after averaging over the whole of \(\Omega \).
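In dimension \(d=1\) the objects (1.8)–(1.10) are straightforward to compute numerically. The sketch below (an illustration only; the Riemann discretization of the \(\mathrm{d}x\) integral is an added assumption) rescales a configuration as in (1.9) and pairs the tagged empirical process with a test function of the product form above:

```python
def rescale(points, N_scale):
    """X_N' of (1.9) in d = 1: dilate each point by N**(1/d) = N."""
    return [N_scale * x for x in points]

def pair_with_test_function(points, chi, F_tilde, grid):
    """Riemann-sum approximation of the pairing of Emp_N(X_N), cf. (1.10),
    with F(x, C) = chi(x) * F_tilde(C): for each tag x, recenter X_N'
    at N**(1/d) * x and evaluate F_tilde on the recentred configuration."""
    N = len(points)
    Xp = rescale(points, N)          # d = 1, so N**(1/d) = N
    dx = grid[1] - grid[0]
    total = 0.0
    for x in grid:
        centered = [p - N * x for p in Xp]   # theta_{N^{1/d} x} . X_N'
        total += chi(x) * F_tilde(centered) * dx
    return total
```

Here `F_tilde` could, e.g., count the points of the recentred configuration in a fixed ball, which probes the local behavior of \(\mathbf {X}_N'\) around the tag x.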

The use of empirical processes, or empirical fields, as natural quantities encoding the averaged microscopic behavior appears, e.g., in [11] for particles without interaction and in [12] in the interacting case.

1.2.4 Large Deviation Principle

Let us recall that a sequence \(\{\mu _N\}_N\) of probability measures on a metric space X is said to satisfy an LDP at speed \(r_N\) with rate function \(I : X \rightarrow [0, +\infty ]\) if the following holds for any Borel set \(A \subset X\):

$$\begin{aligned} - \inf _{\mathring{A}} I \le \liminf _{N \rightarrow \infty }\frac{1}{r_N} \log \mu _N(A) \le \limsup _{N \rightarrow \infty }\frac{1}{r_N} \log \mu _N(A) \le - \inf _{\bar{A}} I, \end{aligned}$$

where \(\mathring{A}\) (resp. \(\bar{A}\)) denotes the interior (resp. the closure) of A. The functional I is said to be a good rate function if it is lower semi-continuous and has compact sub-level sets. We refer to [10] and [28] for detailed treatments of the theory of large deviations and to [24] for an introduction to the applications of LDP’s in the statistical physics setting.

Roughly speaking, an LDP at speed \(r_N\) with rate function I expresses the following fact: the probability measures \(\mu _N\) concentrate around the points where I vanishes, and a point \(x \in X\) such that \(I(x) > 0\) is observed only with probability of order \(\exp (-r_N I(x))\).
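As an elementary example outside the scope of this paper: for the empirical mean of N independent fair coin flips, Cramér's theorem gives an LDP at speed N with rate function \(I(a) = a\log (2a) + (1-a)\log (2(1-a))\), and the exact binomial tail indeed exhibits this exponential decay:

```python
import math

def rate(a):
    """Cramer rate function for Bernoulli(1/2): the relative entropy
    of (a, 1-a) with respect to (1/2, 1/2)."""
    return a * math.log(2 * a) + (1 - a) * math.log(2 * (1 - a))

def tail_exponent(N, a):
    """-(1/N) log P(S_N / N >= a) for S_N ~ Binomial(N, 1/2), computed exactly."""
    p = sum(math.comb(N, k) for k in range(math.ceil(a * N), N + 1)) / 2 ** N
    return -math.log(p) / N
```

As N grows, `tail_exponent(N, a)` approaches `rate(a)`, with a correction of order \(\log N / N\) coming from the polynomial prefactor, illustrating that the LDP captures only the exponential scale.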

1.3 Main Results

1.3.1 Large Deviations of the Empirical Processes

We let \(\overline{\mathfrak {P}}_{N, \beta }\) be the push-forward of the Gibbs measure \(\mathbb {P}_{N,\beta }\) (defined in (1.2)) by the map \(\overline{\mathrm {Emp}}_N\) defined in (1.10). In other words, \(\overline{\mathfrak {P}}_{N, \beta }\) is the law of the random variable “tagged empirical process” when \(\mathbf {X}_N\) is distributed following \(\mathbb {P}_{N,\beta }\).

The following theorem, which is the main result of this paper, involves the functional \(\overline{\mathcal {F}}_{\beta }=\overline{\mathcal {F}}_{\beta ,s}\) defined in (2.26). It is a free energy functional of the type “\(\beta \) Energy + Entropy” (see Sects. 2.2, 2.3, and 2.4 for precise definitions). The theorem expresses the fact that the microscopic behavior of the system of particles is determined by the minimization of the functional \(\overline{\mathcal {F}}_{\beta }\) and that configurations \(\mathbf {X}_N\) whose empirical processes \(\overline{\mathrm {Emp}}_N(\mathbf {X}_N)\) are far from a minimizer of \(\overline{\mathcal {F}}_{\beta }\) have negligible probability, of order \(\exp (-N)\).

Theorem 1.1

For any \(\beta > 0\), the sequence \(\{\overline{\mathfrak {P}}_{N, \beta }\}_N\) satisfies a large deviation principle at speed N with good rate function \(\overline{\mathcal {F}}_{\beta }- \min \overline{\mathcal {F}}_{\beta }\).

Corollary 1.2

The first-order expansion of \(\log Z_{N,\beta }\) as \(N\rightarrow \infty \) is

$$\begin{aligned} \log Z_{N,\beta }= - N \min \overline{\mathcal {F}}_{\beta }+ o(N). \end{aligned}$$

1.3.2 Large Deviations of the Empirical Measure

As a by-product of our microscopic study, we derive a large deviation principle that governs the asymptotics of the empirical measure (which is a macroscopic quantity). Let us denote by \(\mathfrak {emp}_{N,\beta }\) the law of the random variable \(\mathrm {emp}(\mathbf {X}_N)\) when \(\mathbf {X}_N\) is distributed according to \(\mathbb {P}_{N,\beta }\). The rate function \(I_{\beta }=I_{\beta ,s}\), defined in Sect. 2.4 (see (2.28)), has the form

$$\begin{aligned} I_{\beta }(\rho )= \int _{\Omega } f_\beta (\rho (x)) \rho (x)\, \mathrm{d}x + \beta \int _{\Omega } V(x) \rho (x) \, \mathrm{d}x + \int _{\Omega } \rho (x) \log \rho (x)\, \mathrm{d}x \end{aligned}$$
(1.12)

and has the form of a local density approximation. The function \(f_\beta \) in this expression is determined by a minimization problem over the microscopic empirical processes.

Theorem 1.3

For any \(\beta > 0\), the sequence \(\{\mathfrak {emp}_{N,\beta }\}_{N}\) obeys a large deviation principle at speed N with good rate function \(I_{\beta }- \min I_{\beta }\). In particular, the empirical measure converges almost surely to the unique minimizer of \(I_{\beta }\).

The rate function \(I_{\beta }\) is quite complicated to study in general. However, thanks to the convexity of \(f_\beta \) and elementary properties of the standard entropy, we may characterize its minimizer in some particular cases (see Sect. 5 for the proof):

Proposition 1.4

Let \(\mu _{V, \beta }\) be the unique minimizer of \(I_{\beta }\).

  1.

    If \(V = 0\) and \(\Omega \) is bounded, then \(\mu _{V, \beta }\) is the uniform probability measure on \(\Omega \) for any \(\beta > 0\).

  2.

    If V is arbitrary and \(\Omega \) is bounded, \(\mu _{V, \beta }\) converges to the uniform probability measure on \(\Omega \) as \(\beta \rightarrow 0\).

  3.

    If V is arbitrary, \(\mu _{V, \beta }\) converges to \(\mu _{V,\infty }\) as \(\beta \rightarrow + \infty \), where \(\mu _{V, \infty }\) is the limit empirical measure for energy minimizers as defined in the paragraph below.

1.3.3 The Case of Minimizers

Our remaining results deal with energy minimizers (in statistical physics, this corresponds to setting \(\beta = + \infty \)). Let \(\{\mathbf {X}_N\}_N\) be a sequence of point configurations in \(\Omega \) such that for any \(N \ge 1\), \(\mathbf {X}_N\) has N points and minimizes \(\mathcal {H}_N\) on \(\Omega ^N\).

The macroscopic behavior is known from [16]: there is a unique minimizer \(\mu _{V, \infty }\) (the notation differs from [16]) of the functional

$$\begin{aligned} C_{s,d}\int _{\Omega } \rho (x)^{1+s/d}\, \mathrm{d}x+ \int _{\Omega } V(x) \rho (x)\, \mathrm{d}x \end{aligned}$$
(1.13)

among probability densities \(\rho \) over \(\Omega \) (\(C_{s,d}\) is a constant depending on s and d defined in (2.10)), and the empirical measure \(\mathrm {emp}(\mathbf {X}_N)\) converges to \(\mu _{V, \infty }\) as \(N \rightarrow \infty \). See (5.4) for an explicit formula for \(\mu _{V, \infty }\). Note that the formula (1.13) is what one obtains when formally letting \(\beta \rightarrow \infty \) in the definition of \(I_{\beta }\), and it resembles some of the terms arising in Thomas–Fermi theory (cf. [21] and [20]).
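A formal Lagrange-multiplier computation for (1.13) under the constraint \(\int \rho = 1\) suggests \(\rho (x) = \big ( (\lambda - V(x))_+ / (C_{s,d}(1+s/d)) \big )^{d/s}\), with \(\lambda \) fixed by the mass constraint. The sketch below (illustrative only: the quadratic V and the value taken for the constant \(C_{s,d}\) are assumptions) solves for \(\lambda \) by bisection in \(d=1\):

```python
def minimizer_density(V, C, s, d, grid):
    """Candidate minimizer of (1.13): rho = ((lam - V)_+ / (C*(1+s/d)))**(d/s),
    with the multiplier lam tuned so that rho has unit mass on the grid."""
    a = C * (1.0 + s / d)
    dx = grid[1] - grid[0]

    def rho(lam):
        return [(max(lam - V(x), 0.0) / a) ** (d / s) for x in grid]

    def mass(lam):
        return sum(rho(lam)) * dx

    lo = min(V(x) for x in grid)
    hi = lo + 1.0
    while mass(hi) < 1.0:          # expand until the bracket contains the root
        hi = lo + 2.0 * (hi - lo)
    for _ in range(60):            # bisection on the increasing function mass(lam)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mass(mid) < 1.0 else (lo, mid)
    return rho(hi)
```

With \(V(x)=x^2\), \(s=2\), \(d=1\), one recovers a semicircle-type profile \(\rho (x) \propto (\lambda - x^2)_+^{1/2}\), compactly supported even when \(\Omega \) is large.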

The notation for the next statement is given in Sects. 2.1 and 2.3. Let us simply say that \(\overline{\mathbb {W}}_s\) (resp. \(\mathcal {W}_s\)) is an energy functional defined for a random point configuration (resp. a point configuration), and that \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\) (resp. \(\mathcal {X}_{\mu _{V,\infty }(x)}\)) is some particular subset of random point configurations (resp. of point configurations in \(\mathbb {R}^d\)). The intensity measure of a random tagged point configuration is defined in Sect. 2.1.7.

Proposition 1.5

We have:

  1.

    \(\{\overline{\mathrm {Emp}}(\mathbf {X}_N)\}_N\) converges weakly (up to extraction of a subsequence) to some minimizer \(\overline{P}\) of \(\overline{\mathbb {W}}_s\) over \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\).

  2.

    The intensity measure of \(\overline{P}\) coincides with \(\mu _{V, \infty }\).

  3.

    For \(\overline{P}\)-almost every \((x, \mathcal {C})\), the point configuration \(\mathcal {C}\) minimizes \(\mathcal {W}_s(\mathcal {C})\) within the class \(\mathcal {X}_{\mu _{V,\infty }(x)}\).

The first point expresses the fact that the tagged empirical processes associated with minimizers converge to minimizers of the “infinite-volume” energy functional \(\overline{\mathbb {W}}_s\). The second point is a rephrasing of the global result cited above, to which the third point adds some microscopic information.

The problem of minimizing the energy functionals \(\overline{\mathbb {W}}_s\), \(\mathbb {W}_s\), or \(\mathcal {W}_s\) is hard in general. In dimension 1, however, it is not too difficult to show that the “crystallization conjecture” holds, namely that the microscopic structure of minimizers is ordered and converges to a lattice:

Proposition 1.6

Assume \(d=1\). The unique stationary minimizer of \(\mathbb {W}_s\) is the law of \(u + \mathbb {Z}\), where u is a uniform choice of the origin in [0, 1].

In dimension 2, it is expected that minimizers are given by the triangular (or Abrikosov) lattice; we refer to [1] for a recent review of such conjectures. In large dimensions, it is not expected that lattices are minimizers.

1.4 Outline of the Method

Our LDP result is phrased in terms of the empirical processes associated with point configurations, as in [18], and thus the objects we consider and the overall approach are quite similar to those of [18]. The analysis is, however, considerably simplified by the fact that, because the interaction is short-range and we are in the nonpotential case, we do not need to express the energy in terms of the “electric potential” generated by the point configuration. The definition of the limiting microscopic interaction energy \({\mathcal {W}}_s({\mathcal {C}})\) is thus significantly simpler than in [18]; it suffices to take, for \(\mathcal {C}\) an infinite configuration of points in the whole space,

$$\begin{aligned} {\mathcal {W}}_s({\mathcal {C}})= \liminf _{R\rightarrow \infty } \frac{1}{R^d} \sum _{p,q \in {\mathcal {C}}\cap K_R,\, p \ne q} \frac{1}{|p-q|^s}, \end{aligned}$$

where \(K_R\) is the cube of sidelength R centered at the origin. When considering this quantity, there is however no implicit knowledge of the average density of points, in contrast to the situation of [18]. This is then easily extended to an energy for point processes \(\overline{\mathbb {W}}_s\) by taking expectations.
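For the one-dimensional lattice \(\mathbb {Z}\), this quantity can be checked numerically: the Riesz s-energy of \(\mathbb {Z}\cap K_R\), normalized per unit volume, converges to \(2\zeta (s)\) as \(R \rightarrow \infty \), since each lattice point interacts with neighbours at distances \(1, 2, 3, \dots \) on both sides. The sketch below is an illustration, not a computation from the paper:

```python
def lattice_energy_per_volume(R, s):
    """Riesz s-energy of Z ∩ K_R (ordered pairs p != q), divided by R (d = 1)."""
    pts = list(range(-R // 2, R // 2 + 1))
    E = 0.0
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            E += 2.0 / (q - p) ** s    # each unordered pair counted twice
    return E / R

def zeta(s, terms=10**6):
    """Truncated Riemann zeta function sum_{k>=1} k**(-s)."""
    return sum(k ** -s for k in range(1, terms + 1))
```

The boundary points of \(K_R\) see only "half" of the lattice, which produces the \(O(1/R)\) error visible for moderate R.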

As in [18], the starting point of the LDP proof is a Sanov-type result stating that the logarithm of the volume of configurations whose empirical processes lie in a small ball around a given tagged point process \(\overline{P}\) can be expressed as \( (-N) \) times an entropy denoted \(\mathsf {ent}({\overline{P}}|\mathbf {\Pi })\). As we shall show, \(N^{-1-s/d}\mathcal {H}_N(\mathbf {X}_N)\approx \overline{\mathbb {W}}_s(\overline{P}) +\overline{\mathbb {V}}(\overline{P})\) for a sufficiently large set of configurations \(\mathbf {X}_N\) near \(\overline{P}\), where \(\overline{\mathbb {V}}(\overline{P})\) is a term corresponding to the external potential V. This will suffice to obtain the LDP, since the logarithm of the probability of the empirical field being close to \(\overline{P}\) is nearly N times

$$\begin{aligned} - \beta (\overline{\mathbb {W}}_s(\overline{P})+\overline{\mathbb {V}}(\overline{P})) - \mathsf {ent}({\overline{P}}|\mathbf {\Pi }), \end{aligned}$$

up to an additive constant. The entropy can be expressed in terms of \(\overline{P}^x\) (the process centered at x) as

$$\begin{aligned} \mathsf {ent}({\overline{P}}|\mathbf {\Pi })=\int (\mathsf {ent}(\overline{P}^x|\mathbf {\Pi })-1)\, \mathrm{d}x +1, \end{aligned}$$
(1.14)

where \(\mathsf {ent}(P |\mathbf {\Pi })\) is a “specific relative entropy” with respect to the Poisson point process \(\mathbf {\Pi }\). Assuming that \(\overline{P}^x\) has an intensity \(\rho (x)\), then the scaling properties of the energy \(\overline{\mathbb {W}}_s\) (the fact that the energy scales like \(\rho ^{1+s/d}\) where \(\rho \) is the density) and of the specific relative entropy \(\mathsf {ent}\) allow one to transform this into

$$\begin{aligned}&- \int _{\Omega }\left( \beta \rho ^{s/d} \mathbb {W}_s(\overline{P}^x) + \mathsf {ent}(\sigma _{\rho (x)}(\overline{P}^x)|\mathbf {\Pi }) + \beta V(x) \right) \rho (x) \, \mathrm{d}x \\&\quad - \int _{\Omega } \rho (x) \log \rho (x) \, \mathrm{d}x, \end{aligned}$$

which is the desired rate function. Minimizing over P’s of intensity \(\rho \) allows one to obtain the rate function \(I_\beta \) of (2.27).

To run through this argument, we encounter the same difficulties as in [18], i.e., the difficulty of replacing \(\mathcal {H}_N\) by \(\overline{\mathbb {W}}_s\), due to the fact that \(\mathcal {H}_N\) is not continuous for the topology on empirical processes that we are considering. The lack of continuity of the interaction near the origin is dealt with by a truncation and regularization argument, as in [18]. The lack of continuity due to the locality of the topology is handled thanks to the short-range nature of the Riesz interaction, by showing that large microscopic boxes effectively do not interact (the “self-screening” property alluded to before), via a shrinking procedure borrowed from [14]. We refer to Sect. 4 for more details.

2 General Definitions

All the hypercubes considered will have their sides parallel to some fixed choice of axes in \(\mathbb {R}^d\). For \(R > 0\) we let \(K_R\) be the hypercube of center 0 and sidelength R. If \(A \subset \mathbb {R}^d\) is a Borel set, we denote by |A| its Lebesgue measure, and if A is a finite set, we denote by |A| its cardinality.

2.1 (Random) (tagged) Point Configurations

2.1.1 Point Configurations

We refer to [8] for further details and proofs of the claims.

  • If \(A \subset \mathbb {R}^d\), we denote by \(\mathcal {X}(A)\) the set of locally finite point configurations in A or equivalently the set of non-negative, purely atomic Radon measures on A giving an integer mass to singletons. We abbreviate \(\mathcal {X}(\mathbb {R}^d)\) as \(\mathcal {X}\).

  • For \(\mathcal {C}\in \mathcal {X}\), we will often write \(\mathcal {C}\) for the Radon measure \(\sum _{p \in {\mathcal {C}}} \delta _p\).

  • The sets \(\mathcal {X}(A)\) are endowed with the topology induced by the weak convergence of Radon measures (also known as vague convergence). These topological spaces are Polish, and we fix a distance \(d_{\mathcal {X}}\) on \(\mathcal {X}\) which is compatible with the topology on \(\mathcal {X}\) (and whose restriction to \(\mathcal {X}(A)\) is also compatible with the topology on \(\mathcal {X}(A)\)).

  • For \(x \in \mathbb {R}^d\) and \(\mathcal {C}\in \mathcal {X}\), we denote by \(\theta _x \cdot \mathcal {C}\) “the configuration \(\mathcal {C}\) centered at x” (or “translated by \(-x\)”), namely

    $$\begin{aligned} \theta _x \cdot \mathcal {C}:= \sum _{p \in \mathcal {C}} \delta _{p - x}. \end{aligned}$$
    (2.1)

    We will use the same notation for the action of \(\mathbb {R}^d\) on Borel sets: if \(A \subset \mathbb {R}^d\), we denote by \(\theta _x \cdot A\) the translation of A by the vector \(-x\).

2.1.2 Tagged Point Configurations

  • When \(\Omega \subset \mathbb {R}^d\) is fixed, we define \(\overline{\mathcal {X}}:= \Omega \times \mathcal {X}\) as the set of “tagged” point configurations with tags in \(\Omega \).

  • We endow \(\overline{\mathcal {X}}\) with the product topology and a compatible distance \(d_{\overline{\mathcal {X}}}\).

Tagged objects will usually be denoted with bars (e.g., \(\overline{P}\), \(\overline{\mathbb {W}}\), ...).

2.1.3 Random Point Configurations

  • We denote by \(\mathcal {P}(\mathcal {X})\) the space of probability measures on \(\mathcal {X}\), i.e., the set of laws of random point configurations.

  • The set \(\mathcal {P}(\mathcal {X})\) is endowed with the topology of weak convergence of probability measures (with respect to the topology on \(\mathcal {X}\)), see [18, Remark 2.7].

  • We say that P in \(\mathcal {P}(\mathcal {X})\) is stationary (and we write \(P \in \mathcal {P}_{stat}(\mathcal {X})\)) if its law is invariant by the action of \(\mathbb {R}^d\) on \(\mathcal {X}\) as defined in (2.1).

2.1.4 Random Tagged Point Configurations

  • When \(\Omega \subset \mathbb {R}^d\) is fixed, we define \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\) as the space of measures \(\overline{P}\) on \(\overline{\mathcal {X}}\) such that

    1.

      The first marginal of \(\overline{P}\) is the Lebesgue measure on \(\Omega \).

    2.

      For almost every \(x \in \Omega \), the disintegration measure \(\overline{P}^x\) is an element of \(\mathcal {P}(\mathcal {X})\).

  • We say that \(\overline{P}\) in \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\) is stationary (and we write \(\overline{P}\in \overline{\mathcal {M}}_{stat}(\overline{\mathcal {X}})\)) if \(\overline{P}^x\) is in \(\mathcal {P}_{stat}(\mathcal {X})\) for almost every \(x \in \Omega \).

Let us emphasize that, in general, the elements of \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\) are not probability measures on \(\overline{\mathcal {X}}\) (indeed, their first marginal is the Lebesgue measure on \(\Omega \)).

2.1.5 Density of a Point Configuration

  • For \(\mathcal {C}\in \mathcal {X}\), we define \(\mathrm {Dens}(\mathcal {C})\) (the density of \(\mathcal {C}\)) as

    $$\begin{aligned} \mathrm {Dens}(\mathcal {C}):=\liminf _{R\rightarrow \infty } \frac{|\mathcal {C}\cap K_R|}{R^d}. \end{aligned}$$
    (2.2)
  • For \(m \in [0, +\infty ]\), we denote by \(\mathcal {X}_m\) the set of point configurations with density m.

  • For \(m \in (0, +\infty )\), the scaling map

    $$\begin{aligned} \sigma _m : \mathcal {C}\mapsto m ^{1/d} \mathcal {C}\end{aligned}$$
    (2.3)

    is a bijection of \(\mathcal {X}_m\) onto \(\mathcal {X}_1\), with inverse \(\sigma _{1/m}\).
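As a concrete illustration (our own, not from the paper), the following Python sketch estimates the density (2.2) of a scaled lattice \(\lambda \mathbb {Z}^d\) by counting points in \(K_R\), which we take here to be the cube \([-R/2, R/2)^d\); the ratio \(|\mathcal {C}\cap K_R|/R^d\) equals \(\lambda ^{-d}\), consistent with the fact that \(\sigma _m\) maps \(\mathcal {X}_m\) onto \(\mathcal {X}_1\).

```python
import itertools

def lattice_points_in_cube(R, d, scale=1.0):
    """Points of scale * Z^d lying in K_R = [-R/2, R/2)^d."""
    lo, hi = -R / (2 * scale), R / (2 * scale)
    axis = [i for i in range(int(lo) - 2, int(hi) + 2) if lo <= i < hi]
    return [tuple(scale * i for i in p) for p in itertools.product(axis, repeat=d)]

def density_estimate(R, d, scale=1.0):
    """Approximates Dens(C) = liminf |C ∩ K_R| / R^d for C = scale * Z^d."""
    return len(lattice_points_in_cube(R, d, scale)) / R ** d

# Z^2 has density 1, while (1/2) Z^2 has density 4; the scaling map sigma_4
# sends (1/2) Z^2 to 4^{1/2} * (1/2) Z^2 = Z^2, of density 1.
```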

2.1.6 Intensity of a Random Point Configuration

  • For \(P \in \mathcal {P}_{stat}(\mathcal {X})\), we define \(\mathrm {Intens}(P)\) (the intensity of P) as

    $$\begin{aligned} \mathrm {Intens}(P) := \mathbf {E}_{P} \left[ \mathrm {Dens}(\mathcal {C}) \right] . \end{aligned}$$
  • We denote by \(\mathcal {P}_{stat,m}(\mathcal {X})\) the set of laws of random point configurations \(P \in \mathcal {P}(\mathcal {X})\) that are stationary and such that \(\mathrm {Intens}(P) = m\). For \(P\in \mathcal {P}_{stat,m}(\mathcal {X})\), the stationarity assumption implies the formula

    $$\begin{aligned} \mathbf {E}_P \left[ \int _{\mathbb {R}^d} \varphi \, \mathrm{d}\mathcal {C}\right] = m \int _{\mathbb {R}^d} \varphi (x)\, \mathrm{d}x, \text { for any }\varphi \in C^0_c(\mathbb {R}^d). \end{aligned}$$
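As a toy check of this identity (our own illustration, not part of the paper), take P to be the stationarized lattice \(\mathbb {Z}+ U\) with U uniform on [0, 1), which is stationary with intensity \(m = 1\); averaging \(\int \varphi \, \mathrm{d}\mathcal {C}= \sum _{p \in \mathcal {C}} \varphi (p)\) over U recovers \(\int \varphi \):

```python
def phi(x):
    """Compactly supported test function with integral 1."""
    return max(0.0, 1.0 - abs(x))

def averaged_sum(n_grid=1000):
    """E_U[ sum_{k in Z} phi(k + U) ] for U uniform on [0, 1), midpoint rule."""
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) / n_grid
        total += sum(phi(k + u) for k in range(-3, 4))
    return total / n_grid

# averaged_sum() equals 1 = m * integral of phi, with intensity m = 1
```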

2.1.7 Intensity Measure of a Random Tagged Point Configuration

  • For \(\overline{P}\) in \(\overline{\mathcal {M}}_{stat}(\overline{\mathcal {X}})\), we define \(\overline{\mathrm {Intens}}(\overline{P})\) (the intensity measure of \(\overline{P}\)) as

    $$\begin{aligned} \overline{\mathrm {Intens}}(\overline{P})(x) = \mathrm {Intens}(\overline{P}^x), \end{aligned}$$

    which should, in general, be understood in the dual sense: for any \(f \in C_c(\mathbb {R}^d)\),

    $$\begin{aligned} \int f \mathrm{d}\overline{\mathrm {Intens}}(\overline{P}) := \int _{\Omega } f(x) \mathrm {Intens}(\overline{P}^x) \mathrm{d}x. \end{aligned}$$
  • We denote by \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\) the set of laws of random tagged point configurations \(\overline{P}\) in \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\) that are stationary and such that

    $$\begin{aligned} \int _{\Omega } \overline{\mathrm {Intens}}(\overline{P})(x)\, \mathrm{d}x = 1. \end{aligned}$$
  • If \(\overline{P}\) has intensity measure \(\rho \), we denote by \(\overline{\sigma }_{\rho }(\overline{P})\) the element of \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\) satisfying

    $$\begin{aligned} \left( \overline{\sigma }_{\rho }(\overline{P})\right) ^x = \sigma _{\rho (x)}\left( \overline{P}^x\right) , \text { for all }x \in \Omega , \end{aligned}$$
    (2.4)

    where \(\sigma \) is as in (2.3).

2.2 Specific Relative Entropy

  • Let P be in \(\mathcal {P}_{stat}(\mathcal {X})\). The specific relative entropy \(\mathsf {ent}[P|\mathbf {\Pi }]\) of \(P\) with respect to \(\mathbf {\Pi }\), the law of the Poisson point process of uniform intensity 1, is given by

    $$\begin{aligned} \mathsf {ent}[P|\mathbf {\Pi }] := \lim _{R \rightarrow \infty } \frac{1}{|K_R|} \mathrm {Ent}\left( P_{|K_R} | \mathbf {\Pi }_{|K_R} \right) , \end{aligned}$$
    (2.5)

    where \( P_{|K_R}\) denotes the process induced on (the point configurations in) \(K_R\), and \(\mathrm {Ent}( \cdot | \cdot )\) denotes the usual relative entropy (or Kullback–Leibler divergence) of two probability measures defined on the same probability space.

  • It is known (see, e.g., [24]) that the limit (2.5) exists as soon as P is stationary, and also that the functional \(P \mapsto \mathsf {ent}[P|\mathbf {\Pi }]\) is affine and lower semi-continuous with compact sub-level sets (i.e., it is a good rate function).

  • Let us observe that the empty point process has specific relative entropy 1 with respect to \(\mathbf {\Pi }\).

  • If P is in \(\mathcal {P}_{stat,m}(\mathcal {X})\), we have (see [18, Lemma 4.2])

    $$\begin{aligned} \mathsf {ent}[P |\mathbf {\Pi }]= \mathsf {ent}[\sigma _m(P) |\mathbf {\Pi }]m + m \log m +1-m, \end{aligned}$$
    (2.6)

    where \(\sigma _m(P)\) denotes the push-forward of P by (2.3).
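As a sanity check of (2.6) (a worked example we add here, not from the paper), let \(P = \mathbf {\Pi }_m\) be the Poisson point process of uniform intensity m, so that \(\sigma _m(\mathbf {\Pi }_m) = \mathbf {\Pi }\) and \(\mathsf {ent}[\mathbf {\Pi }|\mathbf {\Pi }] = 0\). On \(K_R\), the density of \(\mathbf {\Pi }_m\) with respect to \(\mathbf {\Pi }\) is \(e^{(1-m)|K_R|} m^{N}\), where N is the number of points, hence

```latex
\mathrm{Ent}\left( (\mathbf{\Pi}_m)_{|K_R} \,\middle|\, \mathbf{\Pi}_{|K_R} \right)
  = \mathbf{E}_{\mathbf{\Pi}_m}\left[ (1-m)|K_R| + N \log m \right]
  = |K_R| \left( 1 - m + m \log m \right),
```

so \(\mathsf {ent}[\mathbf {\Pi }_m|\mathbf {\Pi }] = m \log m + 1 - m\), which is exactly what (2.6) predicts; letting \(m \rightarrow 0\) also recovers the value 1 for the empty process.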

2.3 Riesz Energy of (random) (tagged) Point Configurations

2.3.1 Riesz Interaction

We will use the notation \(\mathrm {Int}\) (as “interaction”) in two slightly different ways:

  • If \(\mathcal {C}_1, \mathcal {C}_2\) are some fixed point configurations, we let \(\mathrm {Int}[\mathcal {C}_1, \mathcal {C}_2]\) be the Riesz interaction between \(\mathcal {C}_1\) and \(\mathcal {C}_2\):

    $$\begin{aligned} \mathrm {Int}[\mathcal {C}_1, \mathcal {C}_2] := \sum _{p \in \mathcal {C}_1, \, q \in \mathcal {C}_2, p \ne q} \frac{1}{|p-q|^s}. \end{aligned}$$
  • If \(\mathcal {C}\) is a fixed point configuration and A, B are two subsets of \(\mathbb {R}^d\), we let \(\mathrm {Int}[A, B](\mathcal {C})\) be the Riesz interaction between \(\mathcal {C}\cap A\) and \(\mathcal {C}\cap B\); i.e.,

    $$\begin{aligned} \mathrm {Int}[A,B](\mathcal {C}) := \mathrm {Int}[\mathcal {C}\cap A, \mathcal {C}\cap B] = \sum _{p \in \mathcal {C}\cap A, q \in \mathcal {C}\cap B, p \ne q} \frac{1}{|p-q|^s}. \end{aligned}$$
  • Finally, if \(\tau > 0\), we let \(\mathrm {Int}_{\tau }\) be the Riesz interaction restricted to pairs of points at distance at least \(\tau \); i.e.,

    $$\begin{aligned} \mathrm {Int}_{\tau }[\mathcal {C}_1, \mathcal {C}_2] := \sum _{p \in \mathcal {C}_1, q \in \mathcal {C}_2, |p-q| \ge \tau } \frac{1}{|p-q|^s}. \end{aligned}$$
    (2.7)
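For concreteness, here is a minimal Python sketch of these three notions (our own illustration, not from the paper; configurations are finite tuples of points, and the sets A, B are passed as indicator functions):

```python
import math

def riesz_int(C1, C2, s):
    """Int[C1, C2]: sum of |p - q|^{-s} over p in C1, q in C2, p != q."""
    return sum(math.dist(p, q) ** (-s) for p in C1 for q in C2 if p != q)

def riesz_int_sets(C, A, B, s):
    """Int[A, B](C): interaction between C ∩ A and C ∩ B."""
    return riesz_int([p for p in C if A(p)], [p for p in C if B(p)], s)

def riesz_int_trunc(C1, C2, s, tau):
    """Int_tau[C1, C2]: keep only pairs at distance at least tau (cf. (2.7))."""
    return sum(math.dist(p, q) ** (-s) for p in C1 for q in C2
               if p != q and math.dist(p, q) >= tau)
```

For a single finite configuration, riesz_int(C, C, s) is the s-energy \(E_s\) of (2.8) below.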

2.3.2 Riesz Energy of a Finite Point Configuration

  • Let \(\omega _N=(x_1,\ldots , x_N)\) be in \((\mathbb {R}^d)^N\). We define its Riesz s-energy as

    $$\begin{aligned} E_s(\omega _N):=\mathrm {Int}[\omega _N, \omega _N] = \sum _{1 \le i \ne j \le N} \frac{1}{|x_i-x_j|^s}. \end{aligned}$$
    (2.8)
  • For \(A \subset \mathbb {R}^d\), we consider the N-point minimal s-energy

    $$\begin{aligned} E_s(A, N) := \inf _{\omega _N \in A^N} E_s(\omega _N). \end{aligned}$$
    (2.9)
  • The asymptotic minimal energy \(C_{s,d}\) is defined as

    $$\begin{aligned} C_{s,d}:=\lim _{N\rightarrow \infty }\frac{E_s(K_1,N)}{N^{1+s/d}}. \end{aligned}$$
    (2.10)

    The limit in (2.10) exists as a positive real number (see [13, 14]).

  • By scaling properties of the s-energy, it follows that

    $$\begin{aligned} \lim _{N\rightarrow \infty }\frac{E_s(K_R,N)}{N^{1+s/d}}=C_{s,d}R^{-s}. \end{aligned}$$
    (2.11)
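The scaling behind (2.11) is simply \(E_s(R\,\omega _N) = R^{-s} E_s(\omega _N)\), so minimizing over \(K_R\) instead of \(K_1\) rescales the minimal energy by \(R^{-s}\). A quick numerical check of this homogeneity (our own sketch):

```python
import math

def riesz_energy(points, s):
    """E_s(omega_N): sum over ordered pairs i != j of |x_i - x_j|^{-s} (cf. (2.8))."""
    return sum(math.dist(p, q) ** (-s)
               for i, p in enumerate(points)
               for j, q in enumerate(points) if i != j)

omega = [(0.1, 0.2), (0.7, 0.3), (0.4, 0.9)]
R = 5.0
scaled = [(R * x, R * y) for (x, y) in omega]
# E_s(R * omega) = R^{-s} E_s(omega)
```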

2.3.3 Riesz Energy of Periodic Point Configurations

We first extend the definition of the Riesz energy to the case of periodic point configurations.

  • We say that \(\Lambda \subset \mathbb {R}^d\) is a d-dimensional Bravais lattice if \(\Lambda = U \mathbb {Z}^d\), for some nonsingular \(d\times d\) real matrix U. A fundamental domain for \(\Lambda \) is given by \(\mathbf {D}_{\Lambda }:= U [-\frac{1}{2}, \frac{1}{2})^d\), and the co-volume of \(\Lambda \) is \(|\Lambda | :=\text {vol}(\mathbf {D}_{\Lambda }) = |\det U|\).

  • If \(\mathcal {C}\) is a point configuration (finite or infinite) and \(\Lambda \) a lattice, we denote by \(\mathcal {C}+ \Lambda \) the configuration \(\{ p + \lambda \mid p \in \mathcal {C}, \lambda \in \Lambda \}\). We say that \(\mathcal {C}\) is \(\Lambda \)-periodic if \(\mathcal {C}+ \Lambda = \mathcal {C}\).

  • If \(\mathcal {C}\) is \(\Lambda \)-periodic, it is easy to see that \(\mathcal {C}= \left( \mathcal {C}\cap \mathbf {D}_{\Lambda }\right) + \Lambda \). The density of \(\mathcal {C}\) is thus given by

    $$\begin{aligned} \mathrm {Dens}(\mathcal {C}) = \frac{ |\mathcal {C}\cap \mathbf {D}_{\Lambda }|}{|\Lambda |}. \end{aligned}$$

Let \(\Lambda \) be a lattice and \(\omega _N = \{x_1, \dots , x_N\} \subset \mathbf {D}_{\Lambda }\).

  • We define, as in [15] for \(s > d\), the \(\Lambda \)-periodic s-energy of \(\omega _N\) as

    $$\begin{aligned} E_{s, \Lambda }(\omega _N):=\sum _{x \in \omega _N} \sum _{\begin{array}{c} y\in \omega _N+\Lambda \\ y\ne x \end{array}}\frac{1}{|x-y|^s}. \end{aligned}$$
    (2.12)
  • It follows (cf. [15]) that \(E_{s, \Lambda }(\omega _N)\) can be re-written as

    $$\begin{aligned} E_{s, \Lambda }(\omega _N) = N\zeta _{\Lambda }(s)+\sum _{x\ne y\in \omega _N}\zeta _{\Lambda }(s,x-y), \end{aligned}$$
    (2.13)

    where

    $$\begin{aligned} \zeta _{\Lambda }(s):=\sum _{0\ne v\in \Lambda } {|v|^{-s}} \end{aligned}$$

    denotes the Epstein zeta function and

    $$\begin{aligned} \zeta _{\Lambda }(s,x):=\sum _{v\in \Lambda } {|x+v|^{-s}} \end{aligned}$$

    denotes the Epstein–Hurwitz zeta function for the lattice \(\Lambda \).

  • Denoting the minimum \(\Lambda \)-periodic s-energy by

    $$\begin{aligned} \mathcal {E}_{s,\Lambda }(N):=\min _{\omega _N \in \mathbf {D}_{\Lambda }^N} E_{s, \Lambda }(\omega _N), \end{aligned}$$
    (2.14)

    it is shown in [15] that

    $$\begin{aligned} \lim _{N\rightarrow \infty }\frac{\mathcal {E}_{s,\Lambda }(N)}{N^{1+s/d}}= C_{s,d}|\Lambda |^{-s/d}, \end{aligned}$$
    (2.15)

    where \(C_{s,d}\) is as in (2.10).

The constant \(C_{s,d}\) for \(s>d\) appearing in (2.10) and (2.15) is known only in the case \(d=1\), where \(C_{s,1}=\zeta _{\mathbb {Z}}(s)=2\zeta (s)\) and \(\zeta (s)\) denotes the classical Riemann zeta function. For dimensions \(d=2, 4, 8\), and 24, it has been conjectured (cf. [5, 7] and references therein) that \(C_{s,d}\) for \(s>d\) is also given by an Epstein zeta function, specifically, that \(C_{s,d}=\zeta _{\Lambda _d}(s)\) for \(\Lambda _d\) denoting the equilateral triangular (or hexagonal) lattice, the \(D_4\) lattice, the \(E_8\) lattice, and the Leech lattice (all scaled to have co-volume 1) in the dimensions \(d=2, 4, 8,\) and 24, respectively.
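The one-dimensional value can be checked numerically: for N equally spaced points in the fundamental domain of \(\mathbb {Z}\), the \(\mathbb {Z}\)-periodic energy (2.12) equals \(2\zeta (s) N^{1+s}\) exactly, in agreement with (2.15) and \(C_{s,1} = 2\zeta (s)\). A short Python sketch (our own, with truncated lattice sums):

```python
def periodic_energy_1d(points, s, K):
    """Z-periodic Riesz energy (2.12), the lattice sum truncated to shifts |k| <= K."""
    E = 0.0
    for x in points:
        for y in points:
            for k in range(-K, K + 1):
                if y + k != x:
                    E += abs(x - (y + k)) ** (-s)
    return E

def zeta(s, K):
    """Truncated Riemann zeta function."""
    return sum(n ** (-s) for n in range(1, K + 1))

N, s = 8, 3.0
pts = [k / N for k in range(N)]  # N equally spaced points, spacing 1/N
ratio = periodic_energy_1d(pts, s, K=2000) / N ** (1 + s)
# ratio agrees (up to truncation error) with 2 * zeta(s) = C_{s,1}
```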

2.3.4 Riesz Energy of an Infinite Point Configuration

  • Let \(\mathcal {C}\) in \(\mathcal {X}\) be an (infinite) point configuration. We define its Riesz s-energy as

    $$\begin{aligned} \mathcal {W}_s( \mathcal {C}) := \liminf _{R\rightarrow \infty } \frac{1}{R^d} \sum _{p \ne q \in \mathcal {C}\cap K_R} \frac{1}{|p-q|^s} = \liminf _{R \rightarrow \infty } \frac{1}{R^d} \mathrm {Int}[K_R, K_R](\mathcal {C}). \end{aligned}$$
    (2.16)

    If \(\mathcal {C}=\emptyset \), we define \(\mathcal {W}_s(\mathcal {C})=0\). The s-energy is non-negative and can be \(+ \infty \).

  • We have, for any \(\mathcal {C}\) in \(\mathcal {X}\) and any \(m \in (0, + \infty ),\)

    $$\begin{aligned} \mathcal {W}_s(\sigma _m \mathcal {C})= m^{-(1+s/d)} \mathcal {W}_s(\mathcal {C}). \end{aligned}$$
    (2.17)

It is not difficult to verify (cf. [7, Lemma 9.1]) that if \(\Lambda \) is a lattice and \(\omega _N\) is an N-tuple of points in \(\mathbf {D}_{\Lambda }\), we have

$$\begin{aligned} \mathcal {W}_s(\omega _N +\Lambda ) = \frac{1}{|\Lambda |} E_{s, \Lambda }(\omega _N). \end{aligned}$$
(2.18)

In particular, we have (in view of (2.13))

$$\begin{aligned} \mathcal {W}_s(\Lambda )=|\Lambda |^{-1} \zeta _{\Lambda }(s). \end{aligned}$$
(2.19)
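Formula (2.19) can also be observed directly from the definition (2.16): for \(\mathcal {C}= \mathbb {Z}\) (so \(d = 1\) and \(|\Lambda | = 1\)), the window averages \(\frac{1}{R}\mathrm {Int}[K_R, K_R](\mathbb {Z})\) approach \(\zeta _{\mathbb {Z}}(s) = 2\zeta (s)\). A quick numerical illustration (our own sketch, taking \(K_R = [-R/2, R/2]\)):

```python
def window_energy_rate(R, s):
    """(1/R) * Int[K_R, K_R](Z) for the configuration Z and K_R = [-R/2, R/2]."""
    half = int(R // 2)
    pts = range(-half, half + 1)
    total = sum(abs(p - q) ** (-s) for p in pts for q in pts if p != q)
    return total / R

def zeta(s, K):
    """Truncated Riemann zeta function."""
    return sum(n ** (-s) for n in range(1, K + 1))

# window_energy_rate(R, 3.0) -> 2 * zeta(3) as R grows, matching (2.19)
```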

2.3.5 Riesz Energy for Laws of Random Point Configurations

  • Let P be in \(\mathcal {P}(\mathcal {X})\); we define its Riesz s-energy as

    $$\begin{aligned} \mathbb {W}_s(P):= \liminf _{R \rightarrow \infty } \frac{1}{R^d} \mathbf {E}_{P}\left[ \mathrm {Int}[K_R, K_R](\mathcal {C}) \right] . \end{aligned}$$
    (2.20)
  • Let \(\overline{P}\) be in \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\); we define its Riesz s-energy as

    $$\begin{aligned} \overline{\mathbb {W}}_s(\overline{P}) := \int _{\Omega } \mathbb {W}_s( \overline{P}^x)\, \mathrm{d}x. \end{aligned}$$
    (2.21)
  • Let \(\overline{P}\) be in \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\) with intensity measure \(\rho \). It follows from (2.17), (2.21), and definition (2.4) that

    $$\begin{aligned} \overline{\mathbb {W}}_s\left( \overline{P}\right) = \int _{\Omega } \rho (x)^{1+s/d}\, \mathbb {W}_s\left( \left( \overline{\sigma }_{\rho }(\overline{P})\right) ^x \right) \mathrm{d}x. \end{aligned}$$
    (2.22)

Let us emphasize that we define \(\mathbb {W}_s\) as in (2.20) and not by \(\mathbf {E}_{P}[\mathcal {W}_s]\). Fatou’s lemma easily implies that

$$\begin{aligned} \mathbf {E}_{P} [ \mathcal {W}_s] \le \mathbb {W}_s(P), \end{aligned}$$
(2.23)

and in fact, in the stationary case, we may show that equality holds (see Corollary 3.4).

2.3.6 Expression in Terms of the Two-Point Correlation Function

Let P be in \(\mathcal {P}(\mathcal {X})\), and let us assume that the two-point correlation function of P, denoted by \(\rho _{2, P}\), exists in some distributional sense. We may easily express the Riesz energy of P in terms of \(\rho _{2,P}\) as follows:

$$\begin{aligned} \mathbb {W}_s(P) = \liminf _{R \rightarrow \infty } \frac{1}{R^d} \int _{K_R \times K_R} \frac{1}{|x-y|^s} \rho _{2,P}(x,y) \mathrm{d}x \mathrm{d}y. \end{aligned}$$
(2.24)

If P is stationary, the expression can be simplified as

$$\begin{aligned} \mathbb {W}_s(P) = \liminf _{R \rightarrow \infty } \int _{[-R, R]^d} \frac{1}{|v|^s} \rho _{2,P}(v) \prod _{i=1}^d \left( 1 - \frac{|v_i|}{R} \right) \mathrm{d}v \, , \end{aligned}$$
(2.25)

where \(\rho _{2,P}(v) = \rho _{2,P}(0,v)\) (by stationarity, we abuse notation and view \(\rho _{2,P}\) as a function of one variable) and \(v = (v_1, \dots , v_d)\). Both (2.24) and (2.25) follow from the definitions and easy manipulations; proofs (in a slightly different context) can be found in [17]. Let us emphasize that the integral on the right-hand side of (2.24) is over two variables, whereas the one in (2.25) is a single integral, obtained by using stationarity and applying Fubini's theorem, which produces the weight \(\prod _{i=1}^d \left( 1 - \frac{|v_i|}{R} \right) \).

2.4 The Rate Functions

2.4.1 Definitions

  • For \(\overline{P}\) in \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\), we define

    $$\begin{aligned} \overline{\mathbb {V}}(\overline{P}) := \int V(x) \mathrm{d}\left( \overline{\mathrm {Intens}}(\overline{P})\right) (x). \end{aligned}$$

    This is the energy contribution of the potential V.

  • For \(\overline{P}\) in \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\), we define

    $$\begin{aligned} \overline{\mathcal {F}}_{\beta }(\overline{P}) := \beta \left( \overline{\mathbb {W}}_s(\overline{P}) + \overline{\mathbb {V}}(\overline{P}) \right) + \int _{\Omega } \left( \mathsf {ent}[\overline{P}^x| \mathbf {\Pi }] -1\right) \mathrm{d}x +1. \end{aligned}$$
    (2.26)

    It is a free energy functional, the sum of an energy term \( \overline{\mathbb {W}}_s(\overline{P}) + \overline{\mathbb {V}}(\overline{P})\) weighted by the inverse temperature \(\beta \) and an entropy term.

  • If \(\rho \) is a probability density, we define \(I_{\beta }(\rho )\) as

    $$\begin{aligned} I_{\beta }(\rho ):= \int _{\Omega } \inf _{P \in \mathcal {P}_{stat,\rho (x)}(\mathcal {X}) } \left( \beta \mathbb {W}_s(P) +\mathsf {ent}[P|\mathbf {\Pi }] -1\right) \mathrm{d}x + \beta \int _{\Omega } \rho (x) V(x)\, \mathrm{d}x +1, \end{aligned}$$
    (2.27)

    which can be written as

    $$\begin{aligned} I_{\beta }(\rho )= \int _{\Omega } \rho (x) \inf _{P \in \mathcal {P}_{stat,1}(\mathcal {X})} \left( \beta \rho (x)^{s/d} \mathbb {W}_s(P) +\mathsf {ent}[P|\mathbf {\Pi }] \right) \mathrm{d}x + \beta \int _{\Omega } \rho (x) V(x)\, \mathrm{d}x + \int _{\Omega } \rho (x) \log \rho (x) \, \mathrm{d}x. \end{aligned}$$
    (2.28)

    This last expression may seem more complicated, but note that the \(\inf \) inside the integral is taken over a fixed set, independent of \(\rho \). The rate function \(I_{\beta }\) is obtained in Sect. 4.5 as a contraction (in the language of large deviation theory, see, e.g., [24, Section 3.1]) of the functional \(\overline{\mathcal {F}}_{\beta }\), and (2.28) follows from (2.27) by the scaling properties of \(\mathbb {W}_s\) and \(\mathsf {ent}[ \cdot | \mathbf {\Pi }]\).
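Let us record how (2.28) follows from (2.27) (a derivation we spell out here; it combines the scaling relation (2.17) with the entropy scaling (2.6)). For \(P \in \mathcal {P}_{stat,\rho (x)}(\mathcal {X})\), set \(Q := \sigma _{\rho (x)}(P) \in \mathcal {P}_{stat,1}(\mathcal {X})\); then \(\mathbb {W}_s(P) = \rho (x)^{1+s/d}\, \mathbb {W}_s(Q)\) and \(\mathsf {ent}[P|\mathbf {\Pi }] = \rho (x)\, \mathsf {ent}[Q|\mathbf {\Pi }] + \rho (x) \log \rho (x) + 1 - \rho (x)\), so the integrand of (2.27) becomes

```latex
\beta \mathbb{W}_s(P) + \mathsf{ent}[P|\mathbf{\Pi}] - 1
  = \rho(x) \left( \beta \rho(x)^{s/d}\, \mathbb{W}_s(Q) + \mathsf{ent}[Q|\mathbf{\Pi}] \right)
    + \rho(x) \log \rho(x) - \rho(x).
```

Integrating over \(\Omega \) and using \(\int _{\Omega } \rho = 1\), the term \(-\int _{\Omega }\rho \) cancels the constant \(+1\) in (2.27), which yields (2.28).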

2.4.2 Properties

Proposition 2.1

For all \(\beta > 0\), the functionals \(\overline{\mathcal {F}}_{\beta }\) and \(I_{\beta }\) are good rate functions. Moreover, \(I_{\beta }\) is strictly convex.

Proof

It is proved in Proposition 3.3 that \(\overline{\mathbb {W}}_s\) is lower semi-continuous on \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\). As for \(\overline{\mathbb {V}}\), we may observe that, if \(\overline{P}\in \overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\),

$$\begin{aligned} \overline{\mathbb {V}}(\overline{P}) = \int _{\Omega \times \mathcal {X}} \left( V(x) |\mathcal {C}\cap K_1| \right) \mathrm{d}\overline{P}(x, \mathcal {C}), \end{aligned}$$

and that \((x, \mathcal {C}) \mapsto V(x) |\mathcal {C}\cap K_1|\) is lower semi-continuous on \(\overline{\mathcal {X}}\). Thus \(\overline{\mathbb {V}}\) is lower semi-continuous on \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\); moreover, it is known that \(\mathsf {ent}[\cdot | \mathbf {\Pi }]\) is lower semi-continuous (see Sect. 2.2). Thus \(\overline{\mathcal {F}}_{\beta }\) is lower semi-continuous. Since \(\overline{\mathbb {W}}_s\) and \(\overline{\mathbb {V}}\) are bounded below, the sub-level sets of \(\overline{\mathcal {F}}_{\beta }\) are included in those of \(\mathsf {ent}[\cdot | \mathbf {\Pi }]\), which are known to be compact (see again Sect. 2.2). Thus \(\overline{\mathcal {F}}_{\beta }\) is a good rate function.

The functional \(I_{\beta }\) is easily seen to be lower semi-continuous, and since \(\mathbb {W}_s\), \(\mathsf {ent}\) and V are bounded below, the sub-level sets of \(I_{\beta }\) are included in those of \(\int _{\Omega } \rho \log \rho \), which are known to be compact; thus \(I_{\beta }\) is a good rate function.

To prove that \(I_{\beta }\) is strictly convex in \(\rho \), it is enough to prove that the first term on the right-hand side of (2.28) is convex (the second one is clearly affine, and the last one is well known to be strictly convex). We may observe that the map

$$\begin{aligned} \rho \mapsto \beta \rho ^{1+s/d} \mathbb {W}_s(P) + \rho \,\mathsf {ent}[P|\mathbf {\Pi }] - \rho \end{aligned}$$

is convex for all P (because \(\mathbb {W}_s(P)\) is non-negative), and the infimum of a family of convex functions is also convex; thus

$$\begin{aligned} \rho \mapsto \inf _{P \in \mathcal {P}_{stat,1}(\mathcal {X})} \left( \beta \rho ^{1+s/d} \mathbb {W}_s(P) + \rho \, \mathsf {ent}[P|\mathbf {\Pi }] \right) \end{aligned}$$

is convex in \(\rho \), which concludes the proof. \(\square \)

3 Preliminaries on the Energy

3.1 General Properties

3.1.1 Minimal Energy of Infinite Point Configurations

In this section, we connect the minimization of \(\mathcal {W}_s\) (defined at the level of infinite point configurations) with the asymptotics of the N-point minimal energy as presented in Sect. 2.3.2. Let us recall that the class \(\mathcal {X}_m\) of point configurations with density m has been defined in Sect. 2.1.5.

Proposition 3.1

We have

$$\begin{aligned} \inf _{\mathcal {C}\in \mathcal {X}_1} \mathcal {W}_s(\mathcal {C}) = \min _{\mathcal {C}\in \mathcal {X}_1} \mathcal {W}_s(\mathcal {C}) = C_{s,d}, \end{aligned}$$
(3.1)

where \(C_{s,d}\) is as in (2.10). Moreover, for any d-dimensional Bravais lattice \(\Lambda \) of co-volume 1, there exists a minimizing sequence \(\{\mathcal {C}_N\}_N\) for \(\mathcal {W}_s\) over \(\mathcal {X}_1\) such that \(\mathcal {C}_N\) is \(N^{1/d}\Lambda \)-periodic for \(N \ge 1\).

Proof

Let \(\Lambda \) be a d-dimensional Bravais lattice of co-volume 1, and for each N let \(\omega _N\) be an N-point configuration in \(\mathbf {D}_{\Lambda }\) minimizing \(E_{s,\Lambda }\). We define

$$\begin{aligned} \mathcal {C}_N := N^{1/d} \left( \omega _N + \Lambda \right) . \end{aligned}$$

By construction, \(\mathcal {C}_N\) is an \(N^{1/d} \Lambda \)-periodic point configuration of density 1. Using the scaling property (2.17) and the identity (2.18), we have

$$\begin{aligned} \mathcal {W}_s(\mathcal {C}_N)= \frac{\mathcal {W}_s\left( \omega _N+\Lambda \right) }{N^{1+s/d}} = \frac{E_{s, \Lambda }(\omega _N)}{N^{1+s/d}}. \end{aligned}$$

On the other hand, we have by assumption \(E_{s, \Lambda }(\omega _N) = \mathcal {E}_{s,\Lambda }(N)\). Taking the limit \(N \rightarrow \infty \) yields, in light of (2.15), \(\lim _{N \rightarrow \infty } \mathcal {W}_s(\mathcal {C}_N) = C_{s,d}\). In particular, we have

$$\begin{aligned} \inf _{\mathcal {C}\in \mathcal {X}_1} \mathcal {W}_s(\mathcal {C}) \le C_{s,d}. \end{aligned}$$
(3.2)

To prove the converse inequality, let us consider an arbitrary \(\mathcal {C}\) in \(\mathcal {X}_1\). We have by definition (see (2.8) and (2.16)) and the scaling properties of \(E_s\),

$$\begin{aligned} \mathcal {W}_s(\mathcal {C}) =\liminf _{R \rightarrow \infty } \frac{E_s\left( \mathcal {C}\cap K_R \right) }{R^d} = \liminf _{R \rightarrow \infty }\frac{1}{R^{d+s}} E_s\left( \frac{1}{R}\mathcal {C}\cap K_1\right) , \end{aligned}$$

and, again by definition (see (2.9)),

$$\begin{aligned} E_s\left( \frac{1}{R}\mathcal {C}\cap K_1\right) \ge E_s\left( K_1, |\mathcal {C}\cap K_R| \right) . \end{aligned}$$

We thus obtain

$$\begin{aligned} \mathcal {W}_s(\mathcal {C}) \ge \liminf _{R \rightarrow \infty } \frac{E_s\left( K_1, |\mathcal {C}\cap K_R| \right) }{|\mathcal {C}\cap K_R|^{1+s/d}} \left( \frac{|\mathcal {C}\cap K_R|}{R^d}\right) ^{1+s/d}. \end{aligned}$$

Using the definition (2.10) of \(C_{s,d}\) we have

$$\begin{aligned} \liminf _{R \rightarrow \infty } \frac{E_s\left( K_1, |\mathcal {C}\cap K_R| \right) }{|\mathcal {C}\cap K_R|^{1+s/d}} \ge C_{s,d}, \end{aligned}$$

and by the definition of density, since \(\mathcal {C}\) is in \(\mathcal {X}_1\), we have

$$\begin{aligned} \liminf _{R \rightarrow \infty } \left( \frac{|\mathcal {C}\cap K_R|}{R^d}\right) ^{1+s/d} = 1. \end{aligned}$$

This yields \(\mathcal {W}_s(\mathcal {C}) \ge C_{s,d}\), and so (in view of (3.2))

$$\begin{aligned} \inf _{\mathcal {C}\in \mathcal {X}_1} \mathcal {W}_s(\mathcal {C})= C_{s,d}. \end{aligned}$$
(3.3)

It remains to prove that the infimum is achieved. Let us start with a sequence \(\{\omega _M\}_{M \ge 1}\) such that \(\omega _M\) is an \(M^d\)-point configuration in \(K_M\) satisfying

$$\begin{aligned} \lim _{M \rightarrow \infty } \frac{E_s(\omega _M)}{M^d} = C_{s,d}. \end{aligned}$$
(3.4)

Such a sequence of point configurations exists by definition of \(C_{s,d}\) as in (2.10), and by the scaling properties of \(E_s\). We define a configuration \(\mathcal {C}\) inductively as follows:

  • Let \(r_1 = c_1 = s_1 = 1\), and let us set \(\mathcal {C}\cap K_{r_1}\) to be \(\omega _1\).

  • Assume that \(r_N, s_N, c_N\) and \(\mathcal {C}\cap K_{r_N}\) have been defined. We let

    $$\begin{aligned} s_{N+1} = \lceil c_{N+1}r_N + (c_{N+1} r_N)^{\frac{1}{2}} \rceil , \end{aligned}$$
    (3.5)

    with \(c_{N+1} > 1\) to be chosen later. We also let \(r_{N+1}\) be a sufficiently large multiple of \(s_{N+1}\), to be chosen later. We tile \(K_{r_{N+1}}\) by hypercubes of sidelength \(s_{N+1}\), and we define \(\mathcal {C}\cap K_{r_{N+1}}\) as follows:

    • In the central hypercube of sidelength \(s_{N+1}\), we already have the points of \(\mathcal {C}\cap K_{r_N}\) (because \(r_N \le s_{N+1}\)), and we do not add any points. In particular, this ensures that each step of our construction is compatible with the previous ones.

    • In all the other hypercubes, we paste a copy of \(\omega _{c_{N+1} r_N}\) “centered” in the hypercube in such a way that

      $$\begin{aligned} \text {all the points are at distance }\ge (c_{N+1} r_N)^{\frac{1}{2}}\text { from the boundary}. \end{aligned}$$
      (3.6)

      This is always possible because \(\omega _{c_{N+1} r_N}\) lives, by definition, in a hypercube of sidelength \(c_{N+1} r_N\) and because we have chosen \(s_{N+1}\) as in (3.5).

    We claim that the number of points in \(K_{r_{N+1}}\) is always less than \(r_{N+1}^d\) (as can easily be checked by induction) and is bounded below by

    $$\begin{aligned} \left( \left( \frac{r_{N+1}}{s_{N+1}}\right) ^d -1 \right) (c_{N+1} r_N)^d. \end{aligned}$$

    Thus it is easy to see that if \(c_{N+1}\) is chosen large enough and if \(r_{N+1}\) is a large enough multiple of \(s_{N+1}\), then

    $$\begin{aligned} \text { the number of points in }K_{r_{N+1}}\text { is }r_{N+1}^d (1 - o_N(1)). \end{aligned}$$
    (3.7)

    Let us now give an upper bound on the interaction energy \(\mathrm {Int}[K_{r_{N+1}}, K_{r_{N+1}}](\mathcal {C})\). We recall that we have tiled \(K_{r_{N+1}}\) by hypercubes of sidelength \(s_{N+1}\).

    • Each hypercube has a self-interaction energy given by \(E_s(\omega _{c_{N+1} r_N})\), except the central one, whose self-interaction energy is bounded by \(O(r_N^d)\) (as can be seen by induction).

    • The interaction of a given hypercube with the union of all the others can be controlled because, by construction (see (3.6)), the configurations pasted in two disjoint hypercubes are far away from each other. We can compare it to

      $$\begin{aligned} \int _{r =(c_{N+1} r_N)^{\frac{1}{2}}}^{+ \infty } \frac{1}{r^s} s_{N+1}^{d} r^{d-1} \mathrm{d}r, \end{aligned}$$

      and an elementary computation shows that it is negligible with respect to \(s_{N+1}^d\) (because \(d < s\)).

    We thus have

    $$\begin{aligned} \mathrm {Int}[K_{r_{N+1}}, K_{r_{N+1}}](\mathcal {C}) \le \left( \left( \frac{r_{N+1}}{s_{N+1}}\right) ^d -1 \right) E_s(\omega _{c_{N+1} r_N}) + O(r_N^d) + \left( \frac{r_{N+1}}{s_{N+1}}\right) ^d o_N\left( s_{N+1}^d \right) . \end{aligned}$$

    We may now use (3.4) and get that

    $$\begin{aligned} \frac{1}{r_{N+1}^d} \mathrm {Int}[K_{r_{N+1}}, K_{r_{N+1}}](\mathcal {C}) \le C_{s,d}+ o_N(1). \end{aligned}$$
    (3.8)

Let \(\mathcal {C}\) be the point configuration constructed as above. Taking the limit as \(N \rightarrow \infty \) in (3.7) shows that \(\mathcal {C}\) is in \(\mathcal {X}_1\), and (3.8) implies that \(\mathcal {W}_s(\mathcal {C}) \le C_{s,d}\), which concludes the proof of (3.1). \(\square \)

3.1.2 Energy of Random Point Configurations

In the following lemma, we prove that for stationary P the \(\liminf \) defining \(\mathbb {W}_s(P)\) in (2.20) is actually a limit, and that the convergence is uniform on sub-level sets of \(\mathbb {W}_s\) (which will be useful for proving lower semi-continuity).

Lemma 3.2

Let P be in \(\mathcal {P}_{stat}(\mathcal {X})\). The following limit exists in \([0, +\infty ]\):

$$\begin{aligned} \mathbb {W}_s(P) := \lim _{R \rightarrow \infty } \frac{1}{R^d} \mathbf {E}_{P} \left[ \mathrm {Int}[K_R, K_R]\right] . \end{aligned}$$
(3.9)

Moreover, we have as \(R \rightarrow \infty \),

$$\begin{aligned} \left| \mathbb {W}_s(P) - \frac{1}{R^d} \mathbf {E}_{P} \left[ \mathrm {Int}[K_R, K_R] \right] \right| \le C \left( \mathbb {W}_s(P)^{\frac{2}{1+s/d}} + \mathbb {W}_s(P) \right) o_R(1), \end{aligned}$$
(3.10)

with \(o_R(1)\) depending only on s and d.

Proof

We begin by showing that the quantity

$$\begin{aligned} \frac{1}{n^d} \mathbf {E}_{P} \left[ \mathrm {Int}[K_{n}, K_n](\mathcal {C})\right] \end{aligned}$$

is nondecreasing for integer values of n.

For \(n \ge 1\), let \(\{\tilde{K}_v\}_{v \in \mathbb {Z}^d \cap K_n }\) be a tiling of \(K_n\) by unit hypercubes, indexed by the centers \(v \in \mathbb {Z}^d \cap K_n\) of the hypercubes, and let us split \(\mathrm {Int}[K_n, K_n]\) as

$$\begin{aligned} \mathrm {Int}[K_{n}, K_n] = \sum _{v , v'\in \mathbb {Z}^d \cap K_n} \mathrm {Int}[\tilde{K}_{v}, \tilde{K}_{v'}]. \end{aligned}$$

Using the stationarity assumption and writing \(v = (v_1, \dots , v_d)\) and \(|v|:=\max _i |v_i|\), we obtain

$$\begin{aligned} \mathbf {E}_{P} \left[ \sum _{v, v' \in \mathbb {Z}^d \cap K_n} \mathrm {Int}[\tilde{K}_{v}, \tilde{K}_{v'}] \right] = \sum _{v \in \mathbb {Z}^d \cap K_{2n}} \mathbf {E}_{P} \left[ \mathrm {Int}[\tilde{K}_{0}, \tilde{K}_{v}] \right] \prod _{i=1}^d (n - |v_i|). \end{aligned}$$

We thus get

$$\begin{aligned} \frac{1}{n^d} \mathbf {E}_{P} \left[ \mathrm {Int}[K_{n}, K_n] \right] = \sum _{v \in \mathbb {Z}^d \cap K_{2n}} \mathbf {E}_{P} \left[ \mathrm {Int}[\tilde{K}_{0}, \tilde{K}_{v}] \right] \prod _{i=1}^d \left( 1 - \frac{|v_i|}{n}\right) , \end{aligned}$$
(3.11)

and it is clear that this quantity is nondecreasing in n; in particular, the limit as \(n \rightarrow \infty \) exists in \([0, + \infty ]\). We may also observe that \(R \mapsto \mathrm {Int}[K_R, K_R]\) is nondecreasing in R. It is then easy to conclude that the limit in (3.9) exists in \([0, + \infty ]\).
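The counting step behind (3.11) is that, on the grid of hypercube centers, the number of ordered pairs \((v, v')\) with \(v' - v = w\) is exactly \(\prod _{i=1}^d (n - |w_i|)\). This identity can be checked directly; here is a small Python verification (our own, for \(d = 2\)):

```python
import itertools

def pair_sum_direct(n, d, f):
    """Sum of f(v' - v) over all ordered pairs of grid points v, v' in {0,...,n-1}^d."""
    grid = list(itertools.product(range(n), repeat=d))
    return sum(f(tuple(b - a for a, b in zip(v, vp))) for v in grid for vp in grid)

def pair_sum_weighted(n, d, f):
    """Same sum, grouped by the difference w with weight prod_i (n - |w_i|), as in (3.11)."""
    total = 0.0
    for w in itertools.product(range(-(n - 1), n), repeat=d):
        weight = 1
        for wi in w:
            weight *= n - abs(wi)
        total += f(w) * weight
    return total

# a generic nonnegative pair interaction, standing in for E_P[Int[K_0, K_w]]
f = lambda w: 1.0 / (1.0 + sum(wi * wi for wi in w))
```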

Let us now quantify the speed of convergence. First, we observe that for \(|v| \ge 2\), we have

$$\begin{aligned} \mathbf {E}_{P} \left[ \mathrm {Int}[\tilde{K}_{0}, \tilde{K}_{v}] \right] \le O\left( \frac{1}{(|v|-1)^s}\right) \mathbf {E}_{P}[ N_0 N_v], \end{aligned}$$

where \(N_0, N_v\) denote the numbers of points in \(\tilde{K}_0, \tilde{K}_v\). Indeed, the points of \(\tilde{K}_{0}\) and \(\tilde{K}_{v}\) are at distance at least \(|v|-1\) from each other (up to a multiplicative constant depending only on d).

On the other hand, Hölder’s inequality and the stationarity of P imply

$$\begin{aligned} \Vert N_0 N_v\Vert _{L^1(P)} \le \Vert N_0 \Vert _{L^{1+s/d}(P)} \Vert N_v \Vert _{L^{1+s/d}(P)} = \Vert N_0 \Vert ^2_{L^{1+s/d}(P)}, \end{aligned}$$

and thus we have \(\mathbf {E}_{P}[ N_0 N_v] \le \mathbf {E}_{P}[N_0^{1+s/d}]^{\frac{2}{1+s/d}}\). On the other hand, it is easy to check that for P stationary,

$$\begin{aligned} \mathbf {E}_{P} [N_0^{1+s/d}] \le C \mathbb {W}_s(P) \end{aligned}$$

for some constant C depending on ds. Indeed, the interaction energy in the hypercube \(\tilde{K}_0\) is bounded below by some constant times \(N_0^{1+s/d}\), and (3.11) shows that

$$\begin{aligned} \mathbb {W}_s(P) \ge \mathbf {E}_{P} \left[ \mathrm {Int}[\tilde{K}_{0}, \tilde{K}_0]\right] . \end{aligned}$$

We thus get

$$\begin{aligned}&\mathbb {W}_s(P) - \sum _{v \in \mathbb {Z}^d \cap K_{2n}} \mathbf {E}_{P} \left[ \mathrm {Int}[\tilde{K}_{0}, \tilde{K}_{v}] \right] \prod _{i=1}^d \left( 1 - \frac{|v_i|}{n}\right) \\&\quad \le C\, \mathbb {W}_s(P)^{\frac{2}{1+s/d}} \left( \sum _{v \in \mathbb {Z}^d \cap K_{2n}, |v| \ge 2} \frac{1}{(|v|-1)^s} \left( 1 - \prod _{i=1}^d \left( 1 - \frac{|v_i|}{n}\right) \right) + \sum _{|v| \ge 2n} \frac{1}{|v|^s}\right) \\&\qquad + \frac{1}{n} \sum _{|v| =1} \mathbf {E}_{P} \left[ \mathrm {Int}[\tilde{K}_{0}, \tilde{K}_{v}] \right] . \end{aligned}$$

It is not hard to see that the term in parentheses on the right-hand side goes to zero as \(n \rightarrow \infty \). On the other hand, we have

$$\begin{aligned} \sum _{|v| =1} \mathbf {E}_{P} \left[ \mathrm {Int}[\tilde{K}_{0}, \tilde{K}_{v}] \right] \le \mathbb {W}_s(P). \end{aligned}$$

Thus we obtain

$$\begin{aligned} \mathbb {W}_s(P) - \frac{1}{n^d} \mathbf {E}_{P} \left[ \mathrm {Int}[K_{n}, K_n] \right] \le \left( \mathbb {W}_s(P)^{\frac{2}{1+s/d}} + \mathbb {W}_s(P)\right) o_n(1), \end{aligned}$$

with a \(o_n(1)\) depending only on ds, and it is then not hard to get (3.10). \(\square \)

For any \(R > 0\), the quantity \(\mathrm {Int}[K_{R}, K_R]\) is continuous and bounded below on \(\mathcal {X}\); thus the map

$$\begin{aligned} P \mapsto \frac{1}{R^d} \mathbf {E}_{P} \left[ \mathrm {Int}[K_{R}, K_R] \right] \end{aligned}$$

is lower semi-continuous on \(\mathcal {P}(\mathcal {X})\). The second part of Lemma 3.2 shows that we may approximate \(\mathbb {W}_s(P)\) by \(\frac{1}{R^d} \mathbf {E}_{P} \left[ \mathrm {Int}[K_{R}, K_R] \right] \) up to an error \(o_R(1)\), uniformly on sub-level sets of \(\mathbb {W}_s\). The next proposition follows easily.

Proposition 3.3

  1. The functional \(\mathbb {W}_s\) is lower semi-continuous on \(\mathcal {P}_{stat,1}(\mathcal {X})\).

  2. The functional \(\overline{\mathbb {W}}_s\) is lower semi-continuous on \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\).

We may also prove the following equality (which settles a question raised in Sect. 2.3.5).

Corollary 3.4

Let P be in \(\mathcal {P}_{stat,1}(\mathcal {X})\). Then we have

$$\begin{aligned} \mathbb {W}_s(P) = \lim _{R \rightarrow \infty } \frac{1}{R^d} \mathbf {E}_{P} \left[ \mathrm {Int}[K_R, K_R](\mathcal {C}) \right] = \mathbf {E}_{P} \left[ \liminf _{R \rightarrow \infty } \frac{1}{R^d} \mathrm {Int}[K_R, K_R](\mathcal {C}) \right] . \end{aligned}$$

Proof

As was observed in (2.23), Fatou’s lemma implies that

$$\begin{aligned} \mathbf {E}_{P} \left[ \liminf _{R \rightarrow \infty } \frac{1}{R^d} \mathrm {Int}[K_R, K_R](\mathcal {C}) \right] \le \lim _{R \rightarrow \infty } \frac{1}{R^d} \mathbf {E}_{P} \left[ \mathrm {Int}[K_R, K_R](\mathcal {C}) \right] = \mathbb {W}_s(P) \end{aligned}$$

(the last equality is by definition). On the other hand, with the notation of the proof of Lemma 3.2, we have for any integer n and any \(\mathcal {C}\) in \(\mathcal {X}\),

$$\begin{aligned} \frac{1}{n^d} \mathrm {Int}[K_n, K_n](\mathcal {C}) = \frac{1}{n^d} \sum _{v, v' \in \mathbb {Z}^d \cap K_n} \mathrm {Int}[\tilde{K}_{v}, \tilde{K}_{v'}], \end{aligned}$$

and the right-hand side is dominated under P (as observed in the previous proof); thus the dominated convergence theorem applies. \(\square \)

3.2 Derivation of the Infinite-Volume Limit of the Energy

The following result is central in our analysis. It connects the asymptotics of the N-point interaction energy \(\{\mathcal {H}_N(\mathbf {X}_N)\}_N\) with the infinite-volume energy \(\overline{\mathbb {W}}_s(\overline{P})\) of an infinite-volume object: the limit point \(\overline{P}\) of the tagged empirical processes \(\{\overline{\mathrm {Emp}}_N(\mathbf {X}_N)\}_N\).

Proposition 3.5

For any \(N \ge 1\), let \(\mathbf {X}_N= (x_1, \dots , x_N)\) be in \(\Omega ^N\), let \(\mu _N\) be the empirical measure and \(\overline{P}_N\) be the tagged empirical process associated with \(\mathbf {X}_N\); i.e.,

$$\begin{aligned} \mu _N := \mathrm {emp}(\mathbf {X}_N), \quad \overline{P}_N := \overline{\mathrm {Emp}}_N(\mathbf {X}_N), \end{aligned}$$

as defined in (1.8) and (1.10). Let us assume that

$$\begin{aligned} \liminf _{N \rightarrow \infty } \frac{\mathcal {H}_N(\mathbf {X}_N)}{N^{1+s/d}} < + \infty . \end{aligned}$$

Then, up to extraction of a subsequence,

  • \(\{\mu _N\}_N\) converges weakly to some \(\mu \) in \(\mathcal {M}(\Omega )\),

  • \(\{\overline{P}_N\}_N\) converges weakly to some \(\overline{P}\) in \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\),

  • \(\mathrm {Intens}(\overline{P}) = \mu \).

Moreover, we have

$$\begin{aligned} \liminf _{N\rightarrow \infty } \frac{ \mathcal {H}_N(x_1, \dots , x_N)}{N^{1+s/d}} \ge \overline{\mathbb {W}}_s(\overline{P})+ \overline{\mathbb {V}}(\overline{P}). \end{aligned}$$
(3.12)

Proof

Up to extracting a subsequence, we may assume that \(\mathcal {H}_N(\mathbf {X}_N)= O(N^{1+s/d})\). First, by positivity of the Riesz interaction, we have for \(N \ge 1\),

$$\begin{aligned} \int _{\Omega } V \, \mathrm{d}\mu _N \le \frac{\mathcal {H}_N(\mathbf {X}_N)}{N^{1+s/d}}, \end{aligned}$$

and thus \(\int _{\Omega } V \, \mathrm{d}\mu _N\) is bounded. By (1.5) and (1.6) we know that V is bounded below and has compact sub-level sets. An easy application of Markov’s inequality shows that \(\{\mu _N\}_N\) is tight, and thus it converges (up to another extraction). It is not hard to check that \(\{\overline{P}_N\}_N\) converges (up to extraction) to some \(\overline{P}\) in \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\) (indeed, the average number of points per unit volume is constant, which implies tightness, see, e.g., [18, Lemma 4.1]) whose stationarity is clear (see again, e.g., [18]).

Let \(\bar{\rho }\) be the intensity measure of \(\overline{P}\) (in the sense of Sect. 2.1.7). We want to prove that \(\bar{\rho }= \mu \) (which will in particular imply that \(\overline{P}\) is in \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\)). It is a general fact that \(\bar{\rho }\le \mu \) (see, e.g., [19, Lemma 3.7]), but it could happen that a positive fraction of the points cluster together, resulting in the existence of a singular part in \(\mu \) that is missed by \(\bar{\rho }\) so that \(\bar{\rho }< \mu \). However, in the present case, we can easily bound the moment (under \(\overline{P}_N\)) of order \(1 + s/d\) of the number of points in a given hypercube \(K_R\). Indeed, let \(\{\tilde{K}_i\}_{i \in I}\) be a covering of \(\Omega \) by disjoint hypercubes of sidelength \(RN^{-1/d}\), and let \(n_i := N\mu _N\!\left( \tilde{K}_i\right) \) denote the number of points from \(\mathbf {X}_N\) in \(\tilde{K}_i\). We have, by positivity of the Riesz interaction,

$$\begin{aligned} \mathcal {H}_N(\mathbf {X}_N) \ge \sum _{i \in I} \mathrm {Int}[\tilde{K}_i, \tilde{K}_i] \ge C\sum _{i \in I} \frac{n_i^{1+s/d}N^{s/d}}{R^s} \end{aligned}$$

for some constant \(C>0\) (depending only on s and d) because the minimal interaction energy of n points in \(\tilde{K}_i\) is proportional to \(\frac{n^{1+s/d}N^{s/d}}{R^s}\) (see (2.10), (2.11)). Since \(\mathcal {H}_N(\mathbf {X}_N) = O(N^{1+s/d})\) by assumption, we get that \(\sum _{i \in I} n_i^{1+s/d} = O(N)\), with an implicit constant depending only on R. This implies that \(x \mapsto N\mu _N \left( B(x, RN^{-1/d}) \right) \) is uniformly (in N) locally integrable on \(\Omega \) for all \(R > 0\), and arguing as in [19, Lemma 3.7], we deduce that \(\bar{\rho }= \mu \).
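The \(n^{1+s/d}\) growth of the minimal energy invoked here can be illustrated numerically. The sketch below is a heuristic, not the bound (2.10)–(2.11) itself: it uses equally spaced points in \([0,1]\) (so \(d = 1\)) with the illustrative choice \(s = 2\), in which case doubling the number of points should multiply the energy by roughly \(2^{1+s/d} = 8\).

```python
# Heuristic illustration: the Riesz s-energy of n well-spread points
# in a fixed cube grows like n^{1+s/d}.  Here d = 1, s = 2, and the
# points are equally spaced in [0, 1] (a stand-in for true minimizers).
def riesz_energy(points, s=2.0):
    """Sum of |x_i - x_j|^{-s} over ordered pairs i != j."""
    n = len(points)
    return sum(abs(points[i] - points[j]) ** (-s)
               for i in range(n) for j in range(n) if i != j)

def equally_spaced(n):
    return [i / (n - 1) for i in range(n)]

e200 = riesz_energy(equally_spaced(200))
e400 = riesz_energy(equally_spaced(400))
print(e400 / e200)  # ratio close to 2^{1+s/d} = 8
```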

We now turn to proving (3.12). Using the positivity and scaling properties of the Riesz interaction and a Fubini-type argument, we may write, for any \(R > 0\),

$$\begin{aligned} \mathrm {Int}[\Omega , \Omega ](\mathbf {X}_N) \ge N^{1+s/d} \int _{\Omega \times \mathcal {X}} \frac{1}{R^d} \mathrm {Int}[K_R, K_R](\mathcal {C})~\mathrm{d}\overline{P}_N(x, \mathcal {C}). \end{aligned}$$

Of course we have, for any \(M > 0\),

$$\begin{aligned} \int _{\Omega \times \mathcal {X}} \frac{1}{R^d} \mathrm {Int}[K_R, K_R](\mathcal {C})~\mathrm{d}\overline{P}_N(x, \mathcal {C}) \ge \int _{\Omega \times \mathcal {X}} \frac{1}{R^d} \left( \mathrm {Int}[K_R, K_R](\mathcal {C}) \wedge M\right) \mathrm{d}\overline{P}_N(x, \mathcal {C}), \end{aligned}$$

and thus the weak convergence of \(\overline{P}_N\) to \(\overline{P}\) ensures that

$$\begin{aligned}&\int _{\Omega \times \mathcal {X}} \frac{1}{R^d} \mathrm {Int}[K_R, K_R](\mathcal {C})~\mathrm{d}\overline{P}_N(x, \mathcal {C}) \\&\quad \ge \int _{\Omega \times \mathcal {X}} \frac{1}{R^d} \left( \mathrm {Int}[K_R, K_R](\mathcal {C}) \wedge M\right) \mathrm{d}\overline{P}(x, \mathcal {C}) + o_N(1). \end{aligned}$$

Since this is true for all M, we obtain

$$\begin{aligned} \liminf _{N \rightarrow \infty } \frac{\mathrm {Int}[\Omega , \Omega ](\mathbf {X}_N)}{N^{1+s/d}} \ge \int _{\Omega \times \mathcal {X}} \frac{1}{R^d} \left( \mathrm {Int}[K_R, K_R](\mathcal {C}) \right) \mathrm{d}\overline{P}(x, \mathcal {C}). \end{aligned}$$

Sending R to \(+ \infty \) and using Proposition 3.1, we get

$$\begin{aligned} \liminf _{N \rightarrow \infty } \frac{\mathrm {Int}[\Omega , \Omega ](\mathbf {X}_N)}{N^{1+s/d}} \ge \liminf _{R \rightarrow \infty } \int _{\Omega \times \mathcal {X}} \frac{1}{R^d} \left( \mathrm {Int}[K_R, K_R](\mathcal {C}) \right) \mathrm{d}\overline{P}(x, \mathcal {C}) =: \overline{\mathbb {W}}_s(\overline{P}).\nonumber \\ \end{aligned}$$
(3.13)

On the other hand, the weak convergence of \(\mu _N\) to \(\mu \) and Assumption 1.5 ensure that

$$\begin{aligned} \liminf _{N \rightarrow \infty } \int _{\Omega } V\, \mathrm{d}\mu _N \ge \int _{\Omega } V\, \mathrm{d}\mu . \end{aligned}$$
(3.14)

Combining (3.13) and (3.14) gives (3.12). \(\square \)

Proposition 3.5 can be viewed as a \(\Gamma \)-\(\liminf \) result (in the language of \(\Gamma \)-convergence). We will prove later (e.g., in Proposition 4.5, which is in fact a much stronger statement) the corresponding \(\Gamma \)-\(\limsup \).

4 Proof of the Large Deviation Principles

As in [18], the main obstacle to proving Theorem 1.1 is to deal with the lack of upper semi-continuity of the interaction, namely that there is no upper bound of the type

$$\begin{aligned} \mathcal {H}_N(\mathbf {X}_N) \lesssim N^{1+s/d} \left( \overline{\mathbb {W}}_s(\overline{P}) + \overline{\mathbb {V}}(\overline{P}) \right) \end{aligned}$$

that holds in general under the mere condition that \(\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \approx \overline{P}\) (cf. (1.10) for the definition of the tagged empirical process). This poses a problem for proving the large deviation lower bound (in contrast, lower semi-continuity does hold, and the proof of the large deviation upper bound is quite simple). Let us briefly explain why.

Firstly, due to its singularity at 0, the interaction is not uniformly continuous with respect to the topology on point configurations. Indeed, a pair of points at distance \(\varepsilon \) yields an energy \(\varepsilon ^{-s}\), whereas a pair of points at distance \(2 \varepsilon \) has energy \((2\varepsilon )^{-s}\), and \(|\varepsilon ^{-s} - (2\varepsilon )^{-s}| = (1 - 2^{-s})\, \varepsilon ^{-s} \rightarrow \infty \) as \(\varepsilon \rightarrow 0\), although these two point configurations are very close in the topology on \(\mathcal {X}\).

Secondly, the energy is nonadditive: we have in general

$$\begin{aligned} \mathrm {Int}[\mathcal {C}_1 \cup \mathcal {C}_2, \mathcal {C}_1 \cup \mathcal {C}_2] \ne \mathrm {Int}[\mathcal {C}_1, \mathcal {C}_1] + \mathrm {Int}[\mathcal {C}_2, \mathcal {C}_2]. \end{aligned}$$

Yet the knowledge of \(\overline{\mathrm {Emp}}_N\) (through the fact that \(\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \in B(\overline{P}, \varepsilon )\)) yields only local information on \(\mathbf {X}_N\) and does not allow one to reconstruct \(\mathbf {X}_N\) globally. Roughly speaking, it is like partitioning \(\Omega \) into hypercubes and having a family of point configurations, each belonging to some hypercube, but without knowing the precise configuration-hypercube pairing. Since the energy is nonadditive (there are nontrivial hypercube-hypercube interactions in addition to the hypercubes’ self-interactions), we cannot (in general) deduce \(\mathcal {H}_N(\mathbf {X}_N)\) from the mere knowledge of the tagged empirical process.
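A toy computation (in \(d = 1\) with \(s = 2\) and arbitrary illustrative coordinates) makes the missing cross term explicit:

```python
# Toy illustration of nonadditivity (d = 1, s = 2): the energy of the
# union of two configurations exceeds the sum of their self-energies
# by the (strictly positive) cross-interaction term.
def riesz_energy(points, s=2.0):
    return sum(abs(p - q) ** (-s)
               for p in points for q in points if p != q)

c1 = [0.0, 0.1]    # configuration in a first block
c2 = [1.0, 1.1]    # configuration in a second, distant block
cross = riesz_energy(c1 + c2) - riesz_energy(c1) - riesz_energy(c2)
print(cross)  # strictly positive: the union energy is not additive
```

Since the kernel decays like \(|x|^{-s}\) with \(s > d\), this cross term becomes negligible once the blocks are far apart, which is what makes the short-range analysis tractable.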

In Sect. 4.3, the singularity problem is dealt with by using a regularization procedure similar to that of [18], while the nonadditivity is shown to be negligible due to the short-range nature of the Riesz potential for \(s > d\).

4.1 An LDP for the Reference Measure

Let \(\mathbf {Leb}_{\Omega ^N}\) be the Lebesgue measure on \(\Omega ^N\), and let \(\bar{\mathfrak {Q}}_N\) be the push-forward of \(\mathbf {Leb}_{\Omega ^N}\) by the “tagged empirical process” map \(\overline{\mathrm {Emp}}_N\) defined in (1.10). Let us recall that \(\Omega \) is not necessarily bounded; hence \(\mathbf {Leb}_{\Omega ^N}\) may have infinite mass, and thus there is no natural way of turning \(\bar{\mathfrak {Q}}_N\) into a probability measure.

Proposition 4.1

Let \(\overline{P}\) be in \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\). We have

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \liminf _{N \rightarrow \infty } \frac{1}{N} \log \bar{\mathfrak {Q}}_N\left( B(\overline{P}, \varepsilon ) \right)= & {} \lim _{\varepsilon \rightarrow 0} \limsup _{N \rightarrow \infty } \frac{1}{N} \log \bar{\mathfrak {Q}}_N\left( B(\overline{P}, \varepsilon ) \right) \nonumber \\= & {} - \int _{\Omega } \left( \mathsf {ent}[\overline{P}^x| \mathbf {\Pi }] -1\right) \mathrm{d}x - 1. \end{aligned}$$
(4.1)

We recall that \(\overline{P}^x\) is the disintegration measure of \(\overline{P}\) at the point x, or the “fiber at x” (which is a measure on \(\mathcal {X}\)) of \(\overline{P}\) (which is a measure on \(\Omega \times \mathcal {X}\)), see Sect. 2.1.4.

Proof

If \(\Omega \) is bounded, Proposition 4.1 follows from the analysis of [18, Section 7.2]; see in particular [18, Lemma 7.8]. The only difference is that the Lebesgue measure on \(\Omega \) used in [18] is normalized, which yields an additional factor of \(\log |\Omega |\) in the rate function. The proof extends readily to an unbounded \(\Omega \) because the topology of weak convergence on \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\) is defined with respect to test functions that are compactly supported on \(\Omega \). \(\square \)

4.2 An LDP Upper Bound

Proposition 4.2

Let \(\overline{P}\) be in \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\). We have

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \limsup _{N \rightarrow \infty } \frac{1}{N} \log \overline{\mathfrak {P}}_{N, \beta }( B(\overline{P}, \varepsilon )) \le - \overline{\mathcal {F}}_{\beta }(\overline{P}) + \limsup _{N \rightarrow \infty } \left( - \frac{\log Z_{N,\beta }}{N}\right) .\qquad \end{aligned}$$
(4.2)

Proof

Using the definition of \(\overline{\mathfrak {P}}_{N, \beta }\) as the push-forward of \(\mathbb {P}_{N,\beta }\) by \(\overline{\mathrm {Emp}}_N\), we may write

$$\begin{aligned} \overline{\mathfrak {P}}_{N, \beta }( B(\overline{P}, \varepsilon )) = \frac{1}{Z_{N,\beta }} \int _{\Omega ^N \cap \{\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \in B(\overline{P}, \varepsilon )\}} \exp \left( -\beta N^{-s/d} \mathcal {H}_N(\mathbf {X}_N)\right) \mathrm{d}\mathbf {X}_N. \end{aligned}$$

From Propositions 3.5 and 3.3, we know that for any sequence \(\mathbf {X}_N\) such that \(\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \in B(\overline{P}, \varepsilon )\), we have

$$\begin{aligned} \liminf _{N \rightarrow \infty } \frac{\mathcal {H}_N(\mathbf {X}_N)}{N^{1+s/d}} \ge \overline{\mathbb {W}}_s(\overline{P}) + \overline{\mathbb {V}}(\overline{P}) + o_{\varepsilon }(1). \end{aligned}$$

We may thus write

$$\begin{aligned}&\limsup _{N \rightarrow \infty } \frac{1}{N} \log \overline{\mathfrak {P}}_{N, \beta }( B(\overline{P}, \varepsilon )) \le - \beta \left( \overline{\mathbb {W}}_s(\overline{P}) + \overline{\mathbb {V}}(\overline{P}) \right) \\&\quad + \limsup _{N \rightarrow \infty } \frac{1}{N} \log \int _{\Omega ^N \cap \{\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \in B(\overline{P}, \varepsilon )\}} \mathrm{d}\mathbf {X}_N+ \limsup _{N \rightarrow \infty } \left( - \frac{\log Z_{N,\beta }}{N} \right) + o_{\varepsilon }(1). \end{aligned}$$

Using Proposition 4.1, we know that

$$\begin{aligned} \limsup _{N \rightarrow \infty } \frac{1}{N} \log \int _{\Omega ^N \cap \{\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \in B(\overline{P}, \varepsilon )\}} \mathrm{d}\mathbf {X}_N= - \int _{\Omega } \left( \mathsf {ent}[\overline{P}^x| \mathbf {\Pi }] -1\right) \mathrm{d}x - 1 + o_{\varepsilon }(1). \end{aligned}$$

We thus obtain, sending \(\varepsilon \rightarrow 0\),

$$\begin{aligned} \limsup _{N \rightarrow \infty } \frac{1}{N} \log \overline{\mathfrak {P}}_{N, \beta }( B(\overline{P}, \varepsilon ))\le & {} - \beta \left( \overline{\mathbb {W}}_s(\overline{P}) + \overline{\mathbb {V}}(\overline{P}) \right) - \int _{\Omega } \left( \mathsf {ent}[\overline{P}^x| \mathbf {\Pi }] -1\right) \mathrm{d}x - 1 \\&+ \limsup _{N \rightarrow \infty } \left( - \frac{\log Z_{N,\beta }}{N}\right) , \end{aligned}$$

which, in view of the definition of \(\overline{\mathcal {F}}_{\beta }\) as in (2.26), yields (4.2). \(\square \)

4.3 An LDP Lower Bound

The goal of the present section is to prove a matching LDP lower bound:

Proposition 4.3

Let \(\overline{P}\) be in \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\). We have

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \liminf _{N \rightarrow \infty } \frac{1}{N} \log \overline{\mathfrak {P}}_{N, \beta }( B(\overline{P}, \varepsilon )) \ge - \overline{\mathcal {F}}_{\beta }(\overline{P}) + \liminf _{N \rightarrow \infty } \left( - \frac{\log Z_{N,\beta }}{N}\right) . \end{aligned}$$
(4.3)

For \(N \ge 1\) and \(\delta > 0\), let us define the set \(T_{N, \delta }(\overline{P})\) as

$$\begin{aligned} T_{N, \delta }(\overline{P}) = \left\{ \mathbf {X}_N\mid \frac{\mathcal {H}_N(\mathbf {X}_N)}{N^{1+s/d}} \le \overline{\mathcal {F}}_{\beta }(\overline{P}) + \delta \right\} . \end{aligned}$$
(4.4)

We will rely on the following result:

Proposition 4.4

Let \(\overline{P}\) be in \(\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\). For all \(\varepsilon , \delta >0\), we have

$$\begin{aligned} \begin{aligned} \liminf _{N \rightarrow \infty } \frac{1}{N} \log \mathbf {Leb}_{\Omega ^N}&\left( \left\{ \overline{\mathrm {Emp}}_N\left( \mathbf {X}_N\right) \in B(\overline{P}, \varepsilon )\right\} \cap T_{N, \delta }(\overline{P}) \right) \\ {}&\ge - \int _{\Omega } \left( \mathsf {ent}[\overline{P}^x| \mathbf {\Pi }] -1\right) \mathrm {d}x - 1. \end{aligned} \end{aligned}$$
(4.5)

Proof

We may assume that \(\Omega \) is compact and that the intensity measure of \(\overline{P}\), denoted by \(\bar{\rho }\), is continuous, compactly supported, and bounded below. Indeed, we can always approximate \(\overline{P}\) by random point processes satisfying these additional assumptions. For any \(N \ge 1\), we let \(\bar{\rho }_N(x) := \bar{\rho }(x N^{-1/d})\) and we let \(\Omega _N := N^{1/d} \Omega \).

In fact, for simplicity we will assume that \(\Omega \) is some large hypercube. The argument below readily extends to the case where \(\Omega \) can be tiled by small hypercubes, and any \(C^1\) domain can be tiled by small hypercubes up to some “boundary parts” that are negligible for our concerns (a precise argument is given, e.g., in [18, Section 6]).

For \(R > 0\), we let \(\{ \tilde{K}_i \}_{i \in I}\) be a partition of \(\Omega _N\) by hypercubes of sidelength R. For \(R, M > 0\), we denote by \(\overline{P}_{R, M}\) the restriction to \(K_R\) of \(\overline{P}\), conditioned to the event

$$\begin{aligned} \left\{ \left| \mathcal {C}\cap K_R\right| \le MR^d\right\} . \end{aligned}$$
(4.6)

Step 1. Generating microstates.

For any \(\varepsilon > 0\), for any \(M, R > 0\), for any \(\nu > 0\), for any \(N \ge 1\), there exists a family \(\mathcal {A}= \mathcal {A}(\varepsilon , M, R, \nu , N)\) of point configurations \(\mathcal {C}\) such that:

1. \(\mathcal {C}= \sum _{i \in I} \mathcal {C}_i\), where \(\mathcal {C}_i\) is a point configuration in \(\tilde{K}_i\).

2. \(| \mathcal {C}| = N\).

3. The “discretized” empirical process is close to \(\overline{P}_{R, M}\):

    $$\begin{aligned} \overline{P}_d(\mathcal {C}) := \frac{1}{|I|} \sum _{i \in I} \delta _{(N^{-1/d} x_i, \,\theta _{x_i} \cdot \mathcal {C}_i)} \text { belongs to } B(\overline{P}_{R, M}, \nu ), \end{aligned}$$
    (4.7)

    where \(x_i\) denotes the center of \(\tilde{K}_i\).

4. The associated empirical process is close to \(\overline{P}\):

    $$\begin{aligned} \overline{P}_c(\mathcal {C}) := \int _{\Omega } \delta _{(x,\, \theta _{N^{1/d}x} \cdot \mathcal {C})} \, \mathrm{d}x \text { belongs to } B(\overline{P}, \varepsilon ). \end{aligned}$$
    (4.8)

    Note that \(\overline{P}_c(\mathcal {C}) =\overline{\mathrm {Emp}}_N( N^{-1/d}\mathcal {C}) \).

5. The volume of \(\mathcal {A}\) satisfies, for any \(\varepsilon > 0\),

    $$\begin{aligned} \liminf _{M \rightarrow \infty } \liminf _{R \rightarrow \infty } \frac{1}{R^d} \lim _{\nu \rightarrow 0} \lim _{N \rightarrow \infty } \frac{1}{|I|} \log \mathbf {Leb}_{\Omega _N^N} \left( \mathcal {A}\right) \ge - \int _{\Omega } \left( \mathsf {ent}[\overline{P}^x| \mathbf {\Pi }] -1\right) \mathrm{d}x - 1.\nonumber \\ \end{aligned}$$
    (4.9)

This is essentially [18, Lemma 6.3] with minor modifications (e.g., the Lebesgue measure in [18] is normalized, which yields an additional logarithmic factor in the formulas).

We will make the following assumption on \(\mathcal {A}\):

$$\begin{aligned} |\mathcal {C}_i| \le 2MR^d \text { for all } i \in I. \end{aligned}$$
(4.10)

Indeed, for fixed M, when \(\overline{P}_d\) is close to \(\overline{P}_{R,M}\) (for which (4.6) holds), the fraction of hypercubes on which (4.10) fails to hold as well as the ratio of excess points over the total number of points (namely N) are both small. We may then “redistribute” these excess points among the other hypercubes without affecting (4.8) and changing the energy estimates below only by a negligible quantity.

Step 2. First energy estimate.

For any \(R, M, \tau > 0\), the map defined by

$$\begin{aligned} \mathcal {X}(K_R) \ni \mathcal {C}\, \longmapsto \, \mathrm {Int}_{\tau }[\mathcal {C},\mathcal {C}] \wedge \frac{(2MR^d)^2}{\tau ^s} \end{aligned}$$

(where \(\mathrm {Int}_{\tau }\) is as in (2.7)) is continuous and bounded on \(\mathcal {X}(K_R)\) (this is precisely the reason for requiring that the number of points be bounded). We may thus write, in view of (4.6), (4.7), and (4.10),

$$\begin{aligned}&\int _{\Omega \times \mathcal {X}(K_R)} \mathrm {Int}_{\tau }\, \mathrm{d}\overline{P}_d= \int _{\Omega \times \mathcal {X}(K_R)} \mathrm {Int}_{\tau } \wedge \frac{(2MR^d)^2}{\tau ^s} \mathrm{d}\overline{P}_d\\&\quad = \int _{\Omega \times \mathcal {X}(K_R)} \mathrm {Int}_{\tau } \wedge \frac{(2MR^d)^2}{\tau ^s} \mathrm{d}\overline{P}_{R, M} + o_{\nu }(1) = \int _{\Omega \times \mathcal {X}(K_R)} \mathrm {Int}_{\tau }\, \mathrm{d}\overline{P}_{R, M} + o_{\nu }(1). \end{aligned}$$

Moreover, we have

$$\begin{aligned} \lim _{M \rightarrow \infty } \lim _{R \rightarrow \infty } \frac{1}{R^d} \int _{\Omega \times \mathcal {X}(K_R)} \mathrm {Int}_{\tau } \mathrm{d}\overline{P}_{R, M} = \overline{\mathbb {W}}_s(\overline{P}) + o_{\tau }(1); \end{aligned}$$

thus we see that, with (4.7),

$$\begin{aligned} \lim _{M \rightarrow \infty , R \rightarrow \infty } \lim _{\nu \rightarrow 0} \lim _{N \rightarrow \infty } \frac{1}{N} \sum _{i \in I} \mathrm {Int}_{\tau }[\mathcal {C}_i, \mathcal {C}_i] = \overline{\mathbb {W}}_s(\overline{P}) + o_{\tau }(1). \end{aligned}$$
(4.11)

Step 3. Regularization.

In order to deal with the short-scale interactions that are not captured in \(\mathrm {Int}_{\tau }\), we apply the regularization procedure of [18, Lemma 5.11]. Let us briefly present this procedure:

1. We partition \(\Omega _N\) by small hypercubes of sidelength \(6\tau \).

2. If one of these hypercubes \(\mathcal {K}\) contains more than one point, or if it contains a point and one of the adjacent hypercubes also contains a point, we replace the point configuration in \(\mathcal {K}\) by one with the same number of points, confined to the central smaller hypercube \(\mathcal {K}' \subset \mathcal {K}\) of sidelength \(3 \tau \) and lying on a lattice (whose spacing depends on the initial number of points in \(\mathcal {K}\)).

This allows us to control the difference \(\mathrm {Int}- \mathrm {Int}_{\tau }\) in terms of the number of points in the modified hypercubes.

In particular, we replace \(\mathcal {A}\) by a new family of point configurations, such that

$$\begin{aligned} \frac{1}{N} \sum _{i \in I} \left( \mathrm {Int}- \mathrm {Int}_{\tau } \right) [\mathcal {C}_i, \mathcal {C}_i] \le C \tau ^{-s-d}\mathbf {E}_{\overline{P}_d} \left[ \left( \left( \left| \mathcal {C}\cap K_{12\tau } \right| \right) ^{2+s/d} - 1 \right) _+ \right] .\qquad \end{aligned}$$
(4.12)

The right-hand side of (4.12) should be understood as follows: any group of points that were too close to each other (without any precise control) has been replaced by a group of points with the same cardinality but whose interaction energy is now comparable to that of a lattice. The energy of n points in a lattice of spacing \(\frac{\tau }{n^{1/d}}\) scales like \(n^{2+ s/d} \tau ^{-s}\) (bounding each of the \(n^2\) pairwise interactions by the s-th power of the inverse spacing), and taking the average over all small hypercubes amounts to computing \(\frac{1}{\tau ^d} \mathbf {E}_{\overline{P}_d}\).

As \(\nu \rightarrow 0\), we may then compare the right-hand side of (4.12) with the same quantity for \(\overline{P}\), namely

$$\begin{aligned} \tau ^{-s-d} \mathbf {E}_{\overline{P}} \left[ \left( \left( \left| \mathcal {C}\cap K_{12\tau } \right| \right) ^{2+s/d} - 1 \right) _+ \right] , \end{aligned}$$

which can be shown to be \(o_{\tau }(1)\) (following the argument of [18, Section 6.3.3]), because it is in turn of the same order as

$$\begin{aligned} \mathbf {E}_{\overline{P}} \left[ \left( \mathrm {Int}- \mathrm {Int}_{\tau } \right) [K_1, K_1] \right] , \end{aligned}$$

which goes to zero as \(\tau \rightarrow 0\) by dominated convergence.

We obtain

$$\begin{aligned} \lim _{\tau \rightarrow 0} \lim _{M, R \rightarrow \infty } \lim _{\nu \rightarrow 0} \frac{1}{N} \sum _{i \in I} \left( \mathrm {Int}- \mathrm {Int}_{\tau } \right) [\mathcal {C}_i, \mathcal {C}_i] =0, \end{aligned}$$
(4.13)

and combining (4.13) with (4.11), we get that

$$\begin{aligned} \lim _{\tau \rightarrow 0} \lim _{M \rightarrow \infty , R \rightarrow \infty } \lim _{\nu \rightarrow 0} \lim _{N \rightarrow \infty } \frac{1}{N} \sum _{i \in I} \mathrm {Int}[\mathcal {C}_i, \mathcal {C}_i] \le \overline{\mathbb {W}}_s(\overline{P}). \end{aligned}$$
(4.14)

Step 4. Shrinking the configurations. This procedure is borrowed from [14]. It rescales each configuration by a factor slightly less than 1, effectively shrinking it and creating an empty boundary layer around each cube. Points belonging to different cubes are then sufficiently well separated that the interactions between cubes are negligible; this is a much simpler approach to screening than the one needed in the long-range case.

For \(R > 0\), we let \(R':= R^{\sqrt{d/s}}\).

It is not true in general that \(\mathrm {Int}[\mathcal {C}, \mathcal {C}]\) can be split as the sum \(\sum _{i \in I} \mathrm {Int}[\mathcal {C}_i, \mathcal {C}_i]\). However, since the Riesz interaction decays fast at infinity, it is approximately true if the configurations \(\mathcal {C}_i\) are separated by a large enough distance. To ensure that, we “shrink” every configuration \(\mathcal {C}_i\) in \(\tilde{K}_i\); namely, we rescale them by a factor \(1 - \frac{R'}{R}\). This operation affects the discrete average (4.7) but not the empirical process; i.e., for any \(\varepsilon > 0\), if M and R are large enough and \(\nu \) small enough, we may still assume that (4.8) holds. The interaction energy in each hypercube \(\tilde{K}_i\) is multiplied by \(\left( 1- \frac{R'}{R}\right) ^{-s} = 1 + o_R(1)\), but the configurations in two distinct hypercubes are now separated by a distance at least \(R'\). Since (4.10) holds, an elementary computation implies that we have, for any i in I,

$$\begin{aligned} \mathrm {Int}[\mathcal {C}_i, \sum _{j \ne i} \mathcal {C}_j] = M^2 R^{d} \frac{R^d}{R'^s} O(1), \end{aligned}$$

with an O(1) depending only on d and s. We thus get

$$\begin{aligned} \mathrm {Int}[\mathcal {C}, \mathcal {C}] = \sum _{i \in I} \mathrm {Int}[\mathcal {C}_i, \mathcal {C}_i] + N M^2 \frac{R^d}{R'^s} O(1), \end{aligned}$$

but \(\frac{R^d}{R'^s} = o_R(1)\) by the choice of \(R'\) (and the fact that \(d < s\)), and thus (in view of (4.14) and the effect of the scaling on the energy)

$$\begin{aligned} \lim _{\tau \rightarrow 0} \lim _{M \rightarrow \infty , R \rightarrow \infty } \lim _{\nu \rightarrow 0} \lim _{N \rightarrow \infty } \frac{1}{N} \mathrm {Int}[\mathcal {C}, \mathcal {C}] \le \overline{\mathbb {W}}_s(\overline{P}). \end{aligned}$$
(4.15)
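For the reader's convenience, here is the elementary exponent count behind the choice \(R' = R^{\sqrt{d/s}}\) made above:

$$\begin{aligned} \frac{R^d}{R'^s} = \frac{R^d}{R^{s\sqrt{d/s}}} = R^{\,d - \sqrt{ds}} = o_R(1), \end{aligned}$$

since \(s > d\) implies \(\sqrt{ds} > d\). At the same time, \(\frac{R'}{R} = R^{\sqrt{d/s} - 1} = o_R(1)\), so the shrinking factor \(1 - \frac{R'}{R}\) indeed tends to 1 and the energy in each hypercube is changed only by a factor \(1 + o_R(1)\).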

We have thus constructed a large enough (see (4.9)) volume of point configurations in \(\Omega _N\) whose associated empirical processes converge to \(\overline{P}\) and such that

$$\begin{aligned} \frac{1}{N} \mathrm {Int}[\mathcal {C}, \mathcal {C}] \le \overline{\mathbb {W}}_s(\overline{P}) + o(1). \end{aligned}$$

We may view these configurations at the original scale by applying a homothety of factor \(N^{-1/d}\); this way we obtain point configurations \(\mathbf {X}_N\) in \(\Omega \) such that

$$\begin{aligned} \frac{1}{N^{1+s/d}} E_s(\mathbf {X}_N) \le \overline{\mathbb {W}}_s(\overline{P})+ o(1). \end{aligned}$$

It is not hard to see that the associated empirical measure \(\mu _N\) converges to the intensity measure of \(\overline{P}\), and since V is continuous, we also have

$$\begin{aligned} \frac{1}{N} \int _{\Omega } V \mathrm{d}\mu _N = \overline{\mathbb {V}}(\overline{P}) + o(1). \end{aligned}$$

This concludes the proof of Proposition 4.4. \(\square \)

We may now prove the LDP lower bound.

Proof of Proposition 4.3

Proposition 4.4 implies (4.3); indeed, we have

$$\begin{aligned}&\overline{\mathfrak {P}}_{N, \beta }( B(\overline{P}, \varepsilon )) = \frac{1}{Z_{N,\beta }} \int _{\Omega ^N \cap \{\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \in B(\overline{P}, \varepsilon )\}} \exp \left( -\beta N^{-s/d} \mathcal {H}_N(\mathbf {X}_N)\right) \mathrm{d}\mathbf {X}_N\\&\quad \ge \frac{1}{Z_{N,\beta }} \int _{\Omega ^N \cap \{\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \in B(\overline{P}, \varepsilon )\} \cap T_{N, \delta }(\overline{P})} \exp \left( -\beta N^{-s/d} \mathcal {H}_N(\mathbf {X}_N)\right) \mathrm{d}\mathbf {X}_N\\&\quad \ge \frac{1}{Z_{N,\beta }} \exp \left( -\beta N \left( \overline{\mathcal {F}}_{\beta }(\overline{P}) + \delta \right) \right) \int _{\Omega ^N \cap \{\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \in B(\overline{P}, \varepsilon )\} \cap T_{N, \delta }(\overline{P})} \mathrm{d}\mathbf {X}_N, \end{aligned}$$

and (4.5) allows us to bound below the last integral as

$$\begin{aligned}&\liminf _{\delta \rightarrow 0, \varepsilon \rightarrow 0, N \rightarrow \infty } \frac{1}{N} \log \int _{\Omega ^N \cap \{\overline{\mathrm {Emp}}_N(\mathbf {X}_N) \in B(\overline{P}, \varepsilon )\} \cap T_{N, \delta }(\overline{P})} \mathrm{d}\mathbf {X}_N\\&\quad \ge -\int _{\Omega } \left( \mathsf {ent}[\overline{P}^x| \mathbf {\Pi }] -1\right) \mathrm{d}x - 1. \end{aligned}$$

\(\square \)

4.4 Proof of Theorem 1.1 and Corollary 1.2

From Propositions 4.2 and 4.3, the proof of Theorem 1.1 is standard. Exponential tightness of \(\overline{\mathfrak {P}}_{N, \beta }\) comes for free (see, e.g., [18, Section 4.1]) because the average number of points is fixed, and we may thus improve the weak large deviation estimates (4.2) and (4.3) into the following: for any \(A \subset \overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\), we have

$$\begin{aligned}&- \inf _{\mathring{A}} \overline{\mathcal {F}}_{\beta }+ \liminf _{N \rightarrow \infty } \left( - \frac{\log Z_{N,\beta }}{N}\right) \\&\quad \le \liminf _{N \rightarrow \infty }\frac{1}{N} \log \overline{\mathfrak {P}}_{N, \beta }(A) \le \limsup _{N \rightarrow \infty }\frac{1}{N} \log \overline{\mathfrak {P}}_{N, \beta }(A) \\&\quad \le - \inf _{\overline{A}} \overline{\mathcal {F}}_{\beta }+ \limsup _{N \rightarrow \infty } \left( - \frac{\log Z_{N,\beta }}{N}\right) . \end{aligned}$$

We easily deduce that

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{\log Z_{N,\beta }}{N} = - \min _{\overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})} \overline{\mathcal {F}}_{\beta }, \end{aligned}$$

which proves Corollary 1.2, and that the LDP for \(\overline{\mathfrak {P}}_{N, \beta }\) holds as stated in Theorem 1.1.

4.5 Proof of Theorem 1.3

Proof

Theorem 1.3 follows from an application of the “contraction principle” (see, e.g., [24, Section 3.1]). Let us consider the map \(\overline{\mathcal {M}}(\overline{\mathcal {X}})\rightarrow \mathcal {M}(\Omega )\) defined by

$$\begin{aligned} \widetilde{\mathrm {Intens}}: \overline{P}\mapsto \int _{\Omega } \delta _x\, \mathbf {E}_{\overline{P}^x}\left[ \left| \mathcal {C}\cap K_1\right| \right] \mathrm{d}x. \end{aligned}$$

It is continuous on \(\overline{\mathcal {M}}_{stat}(\overline{\mathcal {X}})\) and coincides with \(\overline{\mathrm {Intens}}\). By the contraction principle, the law of \(\widetilde{\mathrm {Intens}}(\overline{\mathrm {Emp}}(\mathbf {X}_N))\) obeys a large deviation principle governed by

$$\begin{aligned} \rho \mapsto \inf _{\overline{\mathrm {Intens}}(\overline{P}) = \rho } \overline{\mathcal {F}}_{\beta }(\overline{P}), \end{aligned}$$

which is easily seen to be equal to \(I_{\beta }(\rho )\) as defined in (2.27).

For technical reasons (a boundary effect), it is not true in general that \(\widetilde{\mathrm {Intens}}(\overline{\mathrm {Emp}}(\mathbf {X}_N)) = \mathrm {emp}(\mathbf {X}_N)\); however, we have

$$\begin{aligned} \mathrm {dist}_{\mathcal {M}(\Omega )} \left( \widetilde{\mathrm {Intens}}(\overline{\mathrm {Emp}}(\mathbf {X}_N)), \mathrm {emp}(\mathbf {X}_N)\right) = o_N(1), \end{aligned}$$

uniformly for \(\mathbf {X}_N\in \Omega ^N\). In particular, the laws of \(\widetilde{\mathrm {Intens}}(\overline{\mathrm {Emp}}(\mathbf {X}_N))\) and of \(\mathrm {emp}(\mathbf {X}_N)\) are exponentially equivalent (in the language of large deviations); thus any LDP can be transferred from one to the other. This proves Theorem 1.3. \(\square \)

5 Additional Proofs: Propositions 1.4, 1.5, and 1.6

5.1 Limit of the Empirical Measure

From Theorem 1.3 and the fact that \(I_{\beta }\) is strictly convex, we deduce that \(\mathrm {emp}(\mathbf {X}_N)\) converges almost surely to the unique minimizer of \(I_{\beta }\).

Proof of Proposition 1.4

First, if \(V = 0\) and \(\Omega \) is bounded, \(I_{\beta }\) can be written as

$$\begin{aligned} I_{\beta }(\rho ) := {}&\int _{\Omega } \rho (x) \inf _{P \in \mathcal {P}_{stat,1}(\mathcal {X})} \left( \beta \rho (x)^{s/d} \mathbb {W}_s(P) +\mathsf {ent}[P|\mathbf {\Pi }] \right) \mathrm{d}x \\&+ \int _{\Omega } \rho (x) \log \rho (x) \, \mathrm{d}x. \end{aligned}$$

We claim that both terms on the right-hand side are minimized when \(\rho \) is the uniform probability measure on \(\Omega \) (without loss of generality, we may assume \(|\Omega | = 1\)). This property is well known for the entropy term \(\int _{\Omega } \rho \log \rho \), and we now prove it for the energy term. First, observe that

$$\begin{aligned} \alpha \mapsto \inf _{P \in \mathcal {P}_{stat,1}(\mathcal {X})} \left( \beta \alpha ^{1+s/d} \mathbb {W}_s(P) + \alpha \mathsf {ent}[P|\mathbf {\Pi }]\right) \end{aligned}$$

is convex in \(\alpha \) since it is the infimum over a family of convex functions (recall that \(\alpha \mapsto \alpha ^{1+s/d}\) is convex in \(\alpha \) and that \(\mathbb {W}_s\) is always positive). Since \(|\Omega | = 1\), we have, by Jensen’s inequality,

$$\begin{aligned}&\int _{\Omega } \inf _{P \in \mathcal {P}_{stat,1}(\mathcal {X})} \left( \beta \rho (x)^{1+s/d} \mathbb {W}_s(P) + \rho (x)\mathsf {ent}[P|\mathbf {\Pi }] \right) \mathrm{d}x \\&\quad \ge \inf _{P \in \mathcal {P}_{stat,1}(\mathcal {X})} \left( \beta \left( \int _{\Omega } \rho (x) \right) ^{1+s/d} \mathbb {W}_s(P) + \left( \int _{\Omega } \rho (x)\right) \mathsf {ent}[P|\mathbf {\Pi }] \right) , \end{aligned}$$

and since \(\int _{\Omega } \rho = 1\), we conclude that \(I_{\beta }\) is minimal for \(\rho \equiv 1\). Thus the empirical measure converges almost surely to the uniform probability measure on \(\Omega \), which proves the first point of Proposition 1.4.
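As a quick numerical illustration of this Jensen argument (not part of the proof; the perturbation \(1 + \varepsilon \cos (2\pi x)\) and the values of s and d are our choices), one can check on \(\Omega = [0,1]\) that the uniform density gives the smallest value of both \(\int _{\Omega } \rho ^{1+s/d}\) and \(\int _{\Omega } \rho \log \rho \):

```python
import math

# Illustration only (not part of the proof): on Omega = [0, 1] with
# |Omega| = 1, compare the two terms of I_beta for the uniform density
# rho = 1 against the perturbed density rho_eps(x) = 1 + eps*cos(2*pi*x).
# Since alpha -> alpha^(1+s/d) and alpha -> alpha*log(alpha) are convex,
# Jensen's inequality predicts that eps = 0 gives the smallest values.

def integrate(f, n=100000):
    """Midpoint rule on [0, 1]."""
    return sum(f((i + 0.5) / n) for i in range(n)) / n

s, d = 3.0, 1.0                 # any hypersingular choice s > d works here
p = 1.0 + s / d

def rho_eps(eps):
    return lambda x: 1.0 + eps * math.cos(2.0 * math.pi * x)

for eps in (0.0, 0.3, 0.9):
    rho = rho_eps(eps)
    mass = integrate(rho)                              # stays equal to 1
    energy = integrate(lambda x: rho(x) ** p)          # int rho^(1+s/d)
    entropy = integrate(lambda x: rho(x) * math.log(rho(x)))
    print(f"eps={eps:.1f}  mass={mass:.4f}  energy={energy:.4f}  entropy={entropy:.4f}")
```

Both columns are minimized at \(\varepsilon = 0\) (values 1 and 0, respectively), matching the claim that \(\rho \equiv 1\) minimizes \(I_{\beta }\) when \(V = 0\).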

Next, let us assume that V is arbitrary and \(\Omega \) bounded. It is not hard to see that for the minimizer \(\mu _{V, \beta }\) of \(I_{\beta }\), we have, as \(\beta \rightarrow 0\),

$$\begin{aligned} I_{\beta }(\mu _{V, \beta }) \ge I_{\beta }(\rho _{\mathrm {unif}}) + O(\beta ), \end{aligned}$$

where \(\rho _{\mathrm {unif}}\) is the uniform probability measure on \(\Omega \). Moreover, it is also true (as proved above) that the first term in the definition of \(I_{\beta }\) is minimal for \(\rho = \rho _{\mathrm {unif}}\). We thus get that, as \(\beta \rightarrow 0\),

$$\begin{aligned} \int _{\Omega } \mu _{V, \beta } \log \mu _{V, \beta } - \int _{\Omega } \rho _{\mathrm {unif}}\log \rho _{\mathrm {unif}}= O(\beta ); \end{aligned}$$

in other words, the relative entropy of \(\mu _{V, \beta }\) with respect to \(\rho _{\mathrm {unif}}\) converges to 0 as \(\beta \rightarrow 0\). The Csiszár–Kullback–Pinsker inequality bounds the square of the total variation distance between \(\mu _{V, \beta }\) and \(\rho _{\mathrm {unif}}\) by the relative entropy (up to a multiplicative constant), and thus \(\mu _{V, \beta }\) converges in total variation to the uniform probability measure on \(\Omega \) as \(\beta \rightarrow 0\). This proves the second point of Proposition 1.4.
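The inequality used here can be illustrated numerically (an illustration only; the discrete distributions below stand in for \(\mu _{V,\beta }\) and \(\rho _{\mathrm {unif}}\) and are our choice):

```python
import math

# Illustration only: the Csiszar-Kullback-Pinsker inequality
#   || p - q ||_TV^2  <=  (1/2) * KL(p || q),
# checked on discrete distributions.  If the relative entropy tends to 0,
# the total variation distance must tend to 0 as well.

def kl(p, q):
    """Relative entropy KL(p || q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tv(p, q):
    """Total variation distance."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

q = [0.25] * 4                             # the "uniform" reference measure
for eps in (0.2, 0.1, 0.05, 0.01):
    p = [0.25 + eps, 0.25 - eps, 0.25, 0.25]
    assert tv(p, q) ** 2 <= 0.5 * kl(p, q)
    print(f"eps={eps:.2f}  TV={tv(p, q):.4f}  (1/2)KL={0.5 * kl(p, q):.4f}")
```

As \(\varepsilon \) shrinks, both sides go to zero, with the squared total variation dominated by the relative entropy throughout.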

Finally, for V arbitrary, the problem of minimizing \(I_{\beta }\) is, as \(\beta \rightarrow \infty \), similar to minimizing

$$\begin{aligned} \beta \left( \int _{\Omega } \rho (x)^{1+s/d} \min \mathbb {W}_s\mathrm{d}x + \int _{\Omega } \rho (x) V(x) \mathrm{d}x \right) . \end{aligned}$$

Since \(\min \mathbb {W}_s= C_{s,d}\), we recover (up to a multiplicative constant \(\beta > 0\)) the minimization problem studied in [16], namely the problem of minimizing

$$\begin{aligned} C_{s,d}\int _{\Omega } \rho (x)^{1+s/d} \mathrm{d}x + \int _{\Omega } \rho (x) V(x) \mathrm{d}x, \end{aligned}$$

among probability densities, whose (unique) solution is given by \(\mu _{V, \infty }\).

In order to prove that \(\mu _{V, \beta }\) converges to \(\mu _{V, \infty }\) as \(\beta \rightarrow \infty \), we need to make that heuristic rigorous, which requires an adaptation of [17, Section 7.3, Step 2]. We claim that there exists a sequence \(\{P_k\}_{k \ge 1}\) in \(\mathcal {P}_{stat,1}(\mathcal {X})\) such that

$$\begin{aligned} \lim _{k \rightarrow \infty } \mathbb {W}_s(P_k) = C_{s,d}, \quad \forall k \ge 1, \mathsf {ent}[P_k|\Pi ] < + \infty . \end{aligned}$$
(5.1)

One could think of taking \(P_k = P\), where P is a minimizer of \(\mathbb {W}_s\) over \(\mathcal {P}_{stat,1}(\mathcal {X})\), but such a P might have infinite entropy (e.g., if P were the law of the stationary process associated with a lattice, as happens in dimension 1). We thus need to “expand” P, e.g., by letting all the points vibrate independently in small balls, as described in [17, Section 7.3, Step 2] for the one-dimensional lattice. We may then write, for any \(\beta > 0\) and \(k \ge 1\),

$$\begin{aligned} I_{\beta }(\mu _{V, \beta }) \le I_{\beta }(\mu _{V, \infty }) &\le \beta \left( \int _{\Omega } \mu _{V, \infty }(x)^{1+s/d} \mathbb {W}_s(P_k)\, \mathrm{d}x + \int _{\Omega } \mu _{V, \infty }(x) V(x) \, \mathrm{d}x \right) \\ &\quad +\, \mathsf {ent}[P_k|\Pi ] + \int _{\Omega } \mu _{V, \infty }(x) \log \mu _{V, \infty }(x)\, \mathrm{d}x \\ &\le \beta \left( C_{s,d}\int _{\Omega } \mu _{V, \infty }(x)^{1+s/d}\, \mathrm{d}x + \int _{\Omega } \mu _{V, \infty }(x) V(x) \, \mathrm{d}x \right) + \mathsf {ent}[P_k|\Pi ] + \beta o_k(1), \end{aligned}$$

where we have used (5.1) in the last inequality. Choosing \(\beta \) and k properly so that \(k \rightarrow \infty \) as \(\beta \rightarrow \infty \), while ensuring that the \(\beta o_k(1)\) term goes to zero, we obtain

$$\begin{aligned}&C_{s,d}\int _{\Omega } \mu _{V, \beta }(x)^{1+s/d}\, \mathrm{d}x + \int _{\Omega } \mu _{V, \beta }(x) V(x)\, \mathrm{d}x \le C_{s,d}\int _{\Omega } \mu _{V, \infty }(x)^{1+s/d}\, \mathrm{d}x \\&\quad + \int _{\Omega } \mu _{V, \infty }(x) V(x)\, \mathrm{d}x + o_{\beta \rightarrow \infty }(1). \end{aligned}$$

By convexity, this implies that \(\mu _{V, \beta }\) converges to \(\mu _{V, \infty }\) as \(\beta \rightarrow \infty \). \(\square \)

5.2 The Case of Minimizers

Proof of Proposition 1.5

Let \(\{\mathbf {X}_N\}_N\) be a sequence of N-point configurations such that for all \(N \ge 1\), \(\mathbf {X}_N\) minimizes \(\mathcal {H}_N\). From Proposition 3.5, we know that, up to extraction of a subsequence, \(\{\overline{\mathrm {Emp}}(\mathbf {X}_N)\}_N\) converges to some \(\overline{P}\in \overline{\mathcal {M}}_{stat,1}(\overline{\mathcal {X}})\) such that

$$\begin{aligned} \overline{\mathbb {W}}_s(\overline{P}) + \overline{\mathbb {V}}(\overline{P}) \le \liminf _{N \rightarrow \infty } \frac{\mathcal {H}_N(\mathbf {X}_N)}{N^{1+s/d}}, \end{aligned}$$
(5.2)

and we have, by (2.23), (3.1), and the scaling properties of \(\mathbb {W}_s\),

$$\begin{aligned} \overline{\mathbb {W}}_s(\overline{P}) + \overline{\mathbb {V}}(\overline{P}) \ge C_{s,d}\int _{\Omega } \rho (x)^{1+s/d} \mathrm{d}x + \int _{\Omega } V(x) \rho (x) \mathrm{d}x, \end{aligned}$$
(5.3)

where \(\rho = \overline{\mathrm {Intens}}(\overline{P})\). We also know that the empirical measure \(\mathrm {emp}(\mathbf {X}_N)\) converges to the intensity measure \(\rho = \overline{\mathrm {Intens}}(\overline{P})\).

On the other hand, from [16, Theorem 2.1], we know that \(\mathrm {emp}(\mathbf {X}_N)\) converges to some measure \(\mu _{V, \infty }\) that is defined as follows: define L to be the unique solution of

$$\begin{aligned} \int _{\Omega } \left[ \frac{L-V(x)}{C_{s,d}(1+s/d)}\right] _+^{d/s} \mathrm{d}x=1, \end{aligned}$$

and then let \(\mu _{V, \infty }\) be given by

$$\begin{aligned} \mu _{V, \infty }(x):= \left[ \frac{L-V(x)}{C_{s,d}(1+s/d)}\right] _+^{d/s} \qquad (x\in \Omega ). \end{aligned}$$
(5.4)
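To make (5.4) concrete, here is a hedged numerical sketch (the example \(d = 1\), \(s = 2\), \(\Omega = [0,1]\), \(V(x) = x\) is our choice) that finds the constant L by bisection, using the known one-dimensional value \(C_{s,1} = 2\zeta (s)\), so \(C_{2,1} = \pi ^2/3\):

```python
import math

# Hedged numerical sketch of (5.4) (the example is ours): take d = 1, s = 2,
# Omega = [0, 1], V(x) = x, and use the one-dimensional constant
# C_{s,1} = 2*zeta(s), i.e. C_{2,1} = pi^2 / 3.  We locate the constant L
# by bisection so that mu_{V,infty} has total mass 1.

d, s = 1.0, 2.0
C = math.pi ** 2 / 3.0             # 2 * zeta(2)
V = lambda x: x

def mass(L, n=20000):
    """Midpoint rule for int_Omega [ (L - V(x)) / (C (1 + s/d)) ]_+^{d/s} dx."""
    c = C * (1.0 + s / d)
    total = 0.0
    for i in range(n):
        x = (i + 0.5) / n
        total += max((L - V(x)) / c, 0.0) ** (d / s)
    return total / n

lo, hi = 0.0, 100.0                # mass(L) is increasing in L, so bisect
for _ in range(80):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mass(mid) < 1.0 else (lo, mid)
L = 0.5 * (lo + hi)

mu = lambda x: max((L - V(x)) / (C * (1.0 + s / d)), 0.0) ** (d / s)
print(f"L = {L:.4f}, total mass of mu_V_infty = {mass(L):.6f}")
```

The resulting density is decreasing on [0, 1], as expected: the external field pushes mass toward the region where V is small.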

It is proved in [16] that \(\mu _{V, \infty }\) minimizes the quantity

$$\begin{aligned} C_{s,d}\int _{\Omega } \rho (x)^{1+s/d}\, \mathrm{d}x + \int V(x) \rho (x)\, \mathrm{d}x, \end{aligned}$$
(5.5)

among all probability density functions \(\rho \) supported on \(\Omega \). It is also proved that

$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{\mathcal {H}_N(\mathbf {X}_N)}{N^{1+s/d}} = C_{s,d}\int _{\Omega } \mu _{V, \infty }(x)^{1+s/d}\, \mathrm{d}x + \int V(x) \mu _{V, \infty }(x)\, \mathrm{d}x. \end{aligned}$$
(5.6)

By uniqueness of the limit, we have \(\rho := \overline{\mathrm {Intens}}(\overline{P}) = \mu _{V, \infty }\). In view of (5.2), (5.3), (5.6), and by the fact that \(\mu _{V, \infty }\) minimizes (5.5), we get that

$$\begin{aligned} \overline{\mathbb {W}}_s(\overline{P}) + \overline{\mathbb {V}}(\overline{P}) = C_{s,d}\int _{\Omega } \mu _{V, \infty }(x)^{1+s/d} \mathrm{d}x + \int _{\Omega } V(x) \mu _{V, \infty }(x) \mathrm{d}x \end{aligned}$$

and that \(\overline{P}\) is in fact a minimizer of \(\overline{\mathbb {W}}_s+ \overline{\mathbb {V}}\). We must also have

$$\begin{aligned} \overline{\mathbb {W}}_s(\overline{P}) = C_{s,d}\int _{\Omega } \mu _{V, \infty }(x)^{1+s/d}\, \mathrm{d}x; \end{aligned}$$

hence (in view of (2.23)) we get

$$\begin{aligned} \mathcal {W}_s(\mathcal {C}) = C_{s,d}\mu _{V, \infty }(x)^{1+s/d} = \min _{\mathcal {X}_{\mu _{V, \infty }(x)}} \mathcal {W}_s, \text { for }\overline{P}-\text {a.e. }(x,\mathcal {C}), \end{aligned}$$

which concludes the proof. \(\square \)

5.3 The One-Dimensional Case

Proposition 1.6 is very similar to the first statement of [17, Theorem 3], and we sketch its proof here.

Proof of Proposition 1.6

First, we use the expression of \(\mathbb {W}_s\) in terms of the two-point correlation function, as presented in (2.25):

$$\begin{aligned} \mathbb {W}_s(P) = \liminf _{R \rightarrow \infty } \int _{[-R, R]^d} \frac{1}{|v|^s} \rho _{2,P}(v) \left( 1 - \frac{|v|}{R} \right) \mathrm{d}v. \end{aligned}$$

Then, we split \(\rho _{2, P}\) as the sum

$$\begin{aligned} \rho _{2, P} = \sum _{k=1}^{+\infty } \rho _{2,P}^{(k)}, \end{aligned}$$

where \(\rho _{2,P}^{(k)}\) is the correlation function of the k-th neighbor (which makes sense only in dimension 1). It is not hard to check that

$$\begin{aligned} \int \rho _{2,P}^{(k)}(x)\, \mathrm{d}x = 1 \text { and } \int x\, \rho _{2,P}^{(k)}(x)\, \mathrm{d}x = k \end{aligned}$$

(the last identity holds because P has intensity 1 and is stationary). Using the convexity of

$$\begin{aligned} v \mapsto \frac{1}{|v|^s} \left( 1 - \frac{|v|}{R} \right) , \end{aligned}$$

together with Jensen's inequality (recall that \(\rho _{2,P}^{(k)}\) is a probability density with mean k), we obtain that for any \(k \ge 1\), the following holds:

$$\begin{aligned}&\int \frac{1}{|v|^s} \left( 1 - \frac{|v|}{R} \right) \rho _{2,P}^{(k)} \mathrm{d}v \ge \int \frac{1}{|v|^s} \left( 1 - \frac{|v|}{R} \right) \delta _{k}(v) \mathrm{d}v \\&\quad = \int \frac{1}{|v|^s} \left( 1 - \frac{|v|}{R} \right) \rho _{2,P_{\mathbb {Z}}}^{(k)}(v) \mathrm{d}v, \end{aligned}$$

where \(P_{\mathbb {Z}}\) denotes the law of \(u + \mathbb {Z}\), with u uniform in [0, 1]. Summing over \(k \ge 1\) and letting \(R \rightarrow \infty \), we obtain

$$\begin{aligned} \mathbb {W}_s(P) \ge \mathbb {W}_s(P_{\mathbb {Z}}), \end{aligned}$$

which proves that \(\mathbb {W}_s\) is minimal at \(P_{\mathbb {Z}}\). \(\square \)
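The Jensen step underlying this proof can be checked numerically (an illustration only; the gap distribution below is an arbitrary choice of ours, and we only look at nearest neighbors, i.e., \(k = 1\)):

```python
import random

# Illustration only (the gap law is an ad hoc choice): for the convex
# kernel f(v) = 1/v^s, Jensen's inequality gives E[f(gap)] >= f(E[gap]),
# so a stationary process with unit-mean nearest-neighbor gaps has at
# least the per-neighbor energy of the lattice Z, whose gaps all equal 1.

random.seed(0)
s = 2.0
f = lambda v: 1.0 / v ** s

# nearest-neighbor gaps of a randomly perturbed lattice: mean 1, fluctuating
gaps = [1.0 + 0.5 * (random.random() - 0.5) for _ in range(200000)]
mean_gap = sum(gaps) / len(gaps)
mean_energy = sum(f(g) for g in gaps) / len(gaps)
print(f"E[gap] = {mean_gap:.4f}   E[1/gap^s] = {mean_energy:.4f}   "
      f"1/E[gap]^s = {f(mean_gap):.4f}")
```

The empirical energy per neighbor strictly exceeds that of the lattice, in line with the minimality of \(P_{\mathbb {Z}}\).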