1 Introduction

We consider the quadratic form

$$\begin{aligned} H(J, \eta ) = W \sum _{i,j} J_{i,j}\, \eta _{i} \eta _{j} \end{aligned}$$
(1)

where \(J \in \mathbb {R}^{N \times N}\) is a matrix of coefficients, \(\eta \in \mathcal {A}^N\) (with \(\mathcal {A}\) a finite subset of \(\mathbb {R}\)), \(\eta _{i}\) is the value of the i-th component of \(\eta \), and \(W \in \mathbb {R}\) is a suitable normalizing constant. This quadratic form plays an important role both in combinatorial optimization and statistical mechanics.

In statistical mechanics (1) is the Hamiltonian of several physical systems whose nature depends on the elements of \(\mathcal {A}\). For instance, if \(\mathcal {A} = \{-1, 1\}\) the Hamiltonian describes a system of (pairwise) interacting spins whereas if \(\mathcal {A} = \{0, 1\}\) it is generally used to describe a system of (pairwise) interacting particles. Spins or particles live on the vertices of some, possibly oriented, graph \(\mathcal {G} = \{V, E\}\), called the interaction graph, with \(|V| = N\) and J the weighted adjacency matrix of \(\mathcal {G}\). For each \((i,j) \in E\) the entry \(J_{i,j}\) represents the strength of the interaction between the entities (spins or particles) at vertices i and j of \(\mathcal {G}\). The microscopic state of the physical system is given by \(\eta \).

The physical system is described in terms of the probability of its microscopic states (Gibbs measure):

$$\begin{aligned} \mu (\eta ) = \frac{e^{-\beta H(J, \eta )}}{Z_{\beta }} \end{aligned}$$
(2)

where the parameter \(\beta \) is called the inverse temperature and \(Z_{\beta }\) is a normalizing constant called partition function.

In combinatorial optimization the problem of minimizing (or maximizing) (1) when \(\mathcal {A} = \{0, 1\}\) and \(W = 1\) is known under the names Quadratic Unconstrained Binary Optimization (QUBO) or Unconstrained Binary Quadratic Programming (UBQP). QUBO is NP-hard and, in general, no polynomial time algorithm is known to find a minimizer of H. Many problems in combinatorial optimization and data analysis can be mapped to QUBO in a straightforward manner (see [14] for a survey). Even constrained optimization problems can be reduced to QUBO by introducing quadratic penalties in the objective function.
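To fix ideas, the following Python sketch (ours, not code from the paper) builds a small random instance of (1) and finds its minimizer and maximizer by exhaustive enumeration; the normalization W is left as a parameter (\(W = 1\) for a plain QUBO instance, \(W = \frac{1}{\sqrt{N}}\) for the normalization used later in the paper).

import itertools
import numpy as np

def brute_force_extrema(J, W):
    # Exhaustively minimize and maximize H(J, eta) = W * sum_ij J_ij eta_i eta_j
    # over eta in {0, 1}^N.
    N = J.shape[0]
    best_min, best_max = (np.inf, None), (-np.inf, None)
    for bits in itertools.product((0, 1), repeat=N):
        eta = np.array(bits, dtype=float)
        h = W * (eta @ J @ eta)
        if h < best_min[0]:
            best_min = (h, eta)
        if h > best_max[0]:
            best_max = (h, eta)
    return best_min, best_max

rng = np.random.default_rng(0)
N = 12                                    # exhaustive search visits 2**N configurations
J = rng.standard_normal((N, N))           # i.i.d. standard normal couplings
(h_min, eta_min), (h_max, eta_max) = brute_force_extrema(J, W=1.0 / np.sqrt(N))
print("minimum per particle:", -h_min / N, "ones in minimizer:", eta_min.mean())
print("maximum per particle:",  h_max / N, "ones in maximizer:", eta_max.mean())

Exhaustive enumeration is of course feasible only for small N; the heuristic actually used for the simulations is described in Sect. 3.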

Minimizers (ground states) of forms like H in (1) are of relevance in the context of statistical mechanics as well. Indeed, the ground states are the configurations with the highest probability with respect to the probability measure (2). As the temperature of the physical system approaches zero (\(\beta \rightarrow \infty \)), the system eventually reaches its ground state.

If the signs of the entries of J are disordered, i.e. if it is not possible to find the global minimizer with a local criterion, finding the ground states of the system is non-trivial.

In the context of statistical mechanics, there is a vast literature concerning the properties of the ground states of H of the form (1). Here we recall the comprehensive texts by Parisi, Mezard, and Virasoro [16] (for a more physical perspective) and more recently by Talagrand [23, 24] and Panchenko [18]. The recent book [7] provides an up-to-date overview.

The system that has attracted the most effort from physicists, and that is described most extensively in the references above, is the so-called Sherrington–Kirkpatrick (SK) model, in which \(\eta \in \{-1, 1\}^{N}\), \(W = \frac{1}{\sqrt{N}}\) and the \(J_{i,j}\) are independent standard Gaussian random variables. More recently, still in the context of spin systems, the universality of the features of the SK model with respect to the distribution of the (independent) \(J_{i,j}\) has been proven; see [6, 8].

The \(\{0, 1\}\) counterpart of the Sherrington–Kirkpatrick model, whose Hamiltonian matches the objective function of a QUBO instance, has not received the same attention in the statistical mechanics literature. Important results have been obtained in [17, 19], where features of the minimizers and maximizers analogous to those of the SK model have been proven for the Hamiltonian (1) with \(J_{i,j}\) independent Gaussian random variables and a quite general choice of the set \(\mathcal {A}\), including the particle case \(\eta \in \{0, 1\}^{N}\). In particular, it has been proven that the free energy of the system is close to its expectation with probability one (concentration) and that its thermodynamic limit exists.

The same problem, in the canonical ensemble, i.e. with a fixed number of particles, has been discussed in [12]. There, for a general distribution of the \(J_{i,j}\), interesting features of the system have been found, most notably at small particle density.

Restricting to the case \(\eta \in \{0, 1\}^N\) with \(J_{i,j}\) independent Gaussian random variables, and with very different and more elementary techniques, [21] provides an almost sure lower bound for the minimum per particle of H together with a different proof of concentration.

In this paper we show, in the spirit of [6], that the results of [21] are rather robust with respect to the distribution of the \(J_{i,j}\)’s. In particular, we consider the case of independent \(J_{i,j}\) with \(\mathbb {E}\left[ J_{i,j}\right] = 0\). If the tails of the distributions of the \(J_{i,j}\) are not too fat, then, after proper normalization, the value of the minimum of H is close to its expectation with high probability and does not depend on the actual distribution of the \(J_{i,j}\). Further, with the help of numerical simulations we provide some insight into the structure of both the minimizer and the maximizer of H and show that this structure, too, is robust. Note that our results hold in the diluted case as well, that is, in the case where each \(J_{i,j}\) is zero with a certain probability. This probability need not be fixed: it is allowed to go to 1 as \(N \rightarrow \infty \) in a suitable way.

Rigorous results are presented in Sect. 2, whereas the numerical findings concerning the structure of the minimizer and the maximizer of H are detailed in Sect. 3. Proofs are given in Sect. 4.

Throughout the paper, we use the jargon of statistical mechanics. As a consequence, we use expressions like (particle) configuration when referring to \(\eta \), number of particles when referring to N, Hamiltonian for the quadratic form (1), and energy and energy per particle (of a configuration \(\eta \)) for, respectively, \(H(J, \eta )\) and \(\frac{H(J, \eta )}{N}\). Likewise, the minimizer of (1) is often referred to as the ground state of H and we use the expression thermodynamic limit to denote the limit as \(N \rightarrow \infty \).

Remark 1.1

Note that H denotes a family of random variables indexed by N. However, we do not write this dependence explicitly to lighten the notation.

2 Main results

We consider random instances of QUBO, that is we assume the matrix \(J = \{J_{i,j}\}_{1 \le i,j \le N}\) to be the realization of some multivariate random variable. Unless otherwise specified we will assume the \(J_{i,j}\)’s to be independent, identically distributed, and such that \(\mathbb {E}\left[ J_{i,j}\right] = 0\) and \({\text {Var}}\left( J_{i,j}\right) = \sigma ^2\). As for the normalizing constant W, we take it to be such that the random variable \(W \sum _{i,j} J_{i,j}\) has variance N.

Remark 2.1

Note that, in general, the distribution of the \(J_{i,j}\) is allowed to have atoms. In particular, we will be interested in random variables taking the value zero with probability \(1 - p(N) = 1 - N^{\delta - 2}\) for \(\delta \) in \(\left( 1, 2\right] \). In this way we can include in our analysis the diluted case, that is the case where the matrix J of the coefficients of the objective function is “sparse” with expected density \(\rho = p(N)\). Note that, in this way, the average degree of each vertex grows unbounded with N. For \(W \sum _{i,j} J_{i,j}\) to have variance N one should set \(W = \frac{1}{\sqrt{\rho N {\text {Var}}\left( J_{1,1}\right) }}\), where \({\text {Var}}\left( J_{1,1}\right) \) denotes the variance of the nonzero entries.
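For concreteness, here is a short Python sketch (our illustration of Remark 2.1, assuming standard normal nonzero entries) that samples a diluted coupling matrix together with the corresponding normalizing constant W.

import numpy as np

def diluted_couplings(N, delta, rng):
    # Remark 2.1: J_ij = 0 with probability 1 - p(N), standard normal otherwise,
    # with p(N) = N**(delta - 2); W is chosen so that W * sum_ij J_ij has variance N.
    rho = N ** (delta - 2.0)                  # expected density of J
    mask = rng.random((N, N)) < rho
    J = rng.standard_normal((N, N)) * mask
    W = 1.0 / np.sqrt(rho * N)
    return J, W

rng = np.random.default_rng(1)
N, delta = 2048, 1.2
J, W = diluted_couplings(N, delta, rng)
print("empirical density:", (J != 0).mean(), " expected:", N ** (delta - 2.0))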

In the remainder of the paper, we will use the following notation.

Let

$$\begin{aligned} \eta ^{\min } :={{\,\mathrm{arg\,min}\,}}_{\eta \in \{0, 1\}^N} H(J, \eta ) \quad \text { and } \quad \eta ^{\max } :={{\,\mathrm{arg\,max}\,}}_{\eta \in \{0, 1\}^N} H(J, \eta ), \end{aligned}$$

that is \(\eta ^{\min }\) and \(\eta ^{\max }\) are, respectively, the minimizer and the maximizer of H.

Further, let

$$\begin{aligned} \min _{\eta \in \{0, 1\}^N} H(J, \eta ) =: -m_{\min , N} \cdot N \quad \text { and } \quad \max _{\eta \in \{0, 1\}^N} H(J, \eta ) =: m_{\max , N} \cdot N. \end{aligned}$$

In words, \(m_{\min , N}\) and \(m_{\max , N}\) are, respectively, the minimum and the maximum per particle of H.

Moreover, setting \(|\eta | = \sum _{i = 1}^N \eta _{i}\) we call

$$\begin{aligned} \alpha _{\min , N} = \frac{ |\eta ^{\min }| }{N}\; \text { and }\; \alpha _{\max , N} = \frac{ |\eta ^{\max }| }{N} \end{aligned}$$

the proportion of ones in the minimizer and maximizer of H.

We remark that \(\eta ^{\min }\), \(\eta ^{\max }\), \(m_{\min , N}\), \(m_{\max , N}\), \(\alpha _{\min , N}\) and \(\alpha _{\max , N}\) are random variables that depend on the realization of J, but we do not write this dependence explicitly to lighten the notation.

We are interested in the behavior of \(m_{\min , N}\), \(m_{\max , N}\), \(\alpha _{\min , N}\) and \(\alpha _{\max , N}\) as \(N \rightarrow \infty \). When the \(J_{i,j}\) are Gaussian, the existence of the limits

  a) \(m_{\min } = \lim _{N \rightarrow \infty } m_{\min , N}\),

  b) \(m_{\max } = \lim _{N \rightarrow \infty } m_{\max , N}\),

  c) \(\alpha _{\min } = \lim _{N \rightarrow \infty } \alpha _{\min , N}\),

  d) \(\alpha _{\max } = \lim _{N \rightarrow \infty } \alpha _{\max , N}\)

follows from Theorems 1.1 and 1.2 in [17]. Reasonable numerical estimates for the quantities of interest are \(m_{\min } = m_{\max } \approx 0.42\) and \(\alpha _{\min } = \alpha _{\max } \approx 0.624\) (see Sect. 3 below).

We show that, as N gets larger, the fluctuations of both the minimum and maximum per particle of H around their expected values vanish provided some conditions on the tails of the \(J_{i,j}\) are satisfied. More precisely, we have the following

Theorem 2.2

Let the \(J_{i,j}\) be independent identically distributed random variables with \(\mathbb {E}\left[ J_{1,1}\right] = 0\) and \({\text {Var}}\left( J_{1,1}\right) = 1\). Then,

  (a)

    as \(N \rightarrow \infty \),

    $$\begin{aligned} m_{\min , N} - \mathbb {E}\left[ m_{\min , N}\right] \overset{{\text {P}}}{\longrightarrow } 0 \end{aligned}$$
    (3)
    $$\begin{aligned} m_{\max , N} - \mathbb {E}\left[ m_{\max , N}\right] \overset{{\text {P}}}{\longrightarrow } 0. \end{aligned}$$
    (4)

    where \(\overset{{\text {P}}}{\longrightarrow }\) denotes convergence in probability.

Further,

  (b)

    If \(\mathbb {E}\left[ \left| J_{1,1}\right| ^{3}\right] < \infty \), the convergence in (3) and (4) is almost sure.

  (c)

    Suppose the \(J_{i,j}\) are strictly subgaussian, that is,

    $$\begin{aligned} \mathbb {E}\left[ \exp (sJ_{1,1})\right] \le \exp \left( \frac{s^{2}}{2}\right) , \quad \forall s \in \mathbb {R}. \end{aligned}$$

    Then there exists a positive constant D such that, for all \(z > 0\)

    $$\begin{aligned} {\text {P}}\left( \left| m_{\min , N} - \mathbb {E}\left[ m_{\min , N}\right] \right|> z \right) \le e^{-DNz}\, \text { and }\, {\text {P}}\left( \left| m_{\max , N} - \mathbb {E}\left[ m_{\max , N}\right] \right| > z \right) \le e^{-DNz} \end{aligned}$$

    that is, the convergence is exponentially fast.

The proof is provided in Sect. 4.1.

Moreover, the expected values of the minimum and maximum per particle do not depend on the actual distribution of the random couplings, provided they have finite variance.

Theorem 2.3

Let \(J = \left\{ J_{i,j} \right\} _{1\le i,j\le N}\) and \(Y = \left\{ Y_{i,j} \right\} _{1\le i,j\le N}\) be two independent sequences of independent random variables such that, for every i, j, \(\mathbb {E}\left[ J_{i,j}\right] = \mathbb {E}\left[ Y_{i,j}\right] = 0\) and \(\mathbb {E}\left[ J_{i,j}^{2}\right] = \mathbb {E}\left[ Y_{i,j}^{2}\right] = 1\). Also, set \(\gamma := \max \left\{ \mathbb {E}\left[ \left| J_{i,j}\right| ^3\right] , \mathbb {E}\left[ \left| Y_{i,j}\right| ^3\right] ,\; 1 \le i,j \le N\right\} \); \(\gamma \) may be infinite. Then we have, as \(N \rightarrow \infty \),

$$\begin{aligned} \frac{1}{N} \left| \mathbb {E}\left[ \min _{\eta } H(J, \eta )\right] - \mathbb {E}\left[ \min _{\eta } H(Y, \eta )\right] \right| \rightarrow 0 \end{aligned}$$
(5)
$$\begin{aligned} \frac{1}{N} \left| \mathbb {E}\left[ \max _{\eta } H(J, \eta )\right] - \mathbb {E}\left[ \max _{\eta } H(Y, \eta )\right] \right| \rightarrow 0 \end{aligned}$$
(6)

If furthermore \(\gamma < \infty \),

$$\begin{aligned} \frac{1}{N} \left| \mathbb {E}\left[ \min _{\eta } H(J, \eta )\right] - \mathbb {E}\left[ \min _{\eta } H(Y, \eta )\right] \right| \le C N^{-1/6} \end{aligned}$$
(7)
$$\begin{aligned} \frac{1}{N} \left| \mathbb {E}\left[ \max _{\eta } H(J, \eta )\right] - \mathbb {E}\left[ \max _{\eta } H(Y, \eta )\right] \right| \le C N^{-1/6} \end{aligned}$$
(8)

where C is a constant depending only on \(\gamma \).

The proof is a consequence of an analogous result on the universality of the free energy, presented in Sect. 4.2.

As an immediate consequence of Theorem 2.3 and Theorem 2.2 (c), the minimum and maximum per particle satisfy the same almost sure bounds as in the Gaussian case (see [21]), irrespective of the details of the distribution of the \(J_{i,j}\). More precisely, let \(\nu ^{-}(m) = |\{\eta : H(J, \eta ) \le -m N\}|\) denote the number of configurations whose energy is less than or equal to \(-m N\) and, similarly, let \(\nu ^{+}(m) = |\{\eta : H(J, \eta ) \ge m N\}|\) be the number of configurations with energy at least mN. Then:

Corollary 2.4

Let the \(J_{i,j}\)’s be independent identically distributed strictly subgaussian random variables with \(\mathbb {E}\left[ J_{1,1}\right] = 0\) and \({\text {Var}}\left( J_{1,1}\right) = 1\) and let \(I(x) = - x \log (x) - (1-x) \log (1 - x)\). Then, for large values of N and for some constant C, \({\text {P}}\left( \nu ^{-}(m^{\star }) > 0\right) \le e^{-CN}\) and \({\text {P}}\left( \nu ^{+}(m^{\star }) > 0\right) \le e^{-CN}\), where \(m^{\star } \approx 0.562\) is the extremal value of m such that the function \(\alpha \mapsto I(\alpha ) - \frac{m^2}{2\alpha ^{2}(1 - \alpha )^{2}}\), for fixed m, has no zeros. This value is attained at \(\alpha = \alpha ^{\star } \approx 0.644\).

3 Conjectures and numerical results

In this section, we present some numerical results concerning the minimum and maximum per particle and the structure of the minimizer and the maximizer of QUBO instances with random coefficients. The findings presented in Sect. 3.1 highlight some interesting features concerning the connection between particles contributing to the minimizer of H (that is, components of \(\eta ^{\min }\) equal to 1) and particles contributing to the maximizer of H (that is, components of \(\eta ^{\max }\) equal to 1), and the probability of the events \(\{\eta ^{\min }_{i} = 1\}\) and \(\{\eta ^{\max }_{i} = 1\}\). Simulations have been carried out using Monte Carlo Probabilistic Cellular Automata (PCA) such as those introduced in [2, 3, 10, 21, 22]. The advantage of these algorithms lies in their inherently parallel nature, which allows parallel architectures to be exploited to their fullest while preserving a quality of the solution comparable to the one obtained with single spin flip MCMC algorithms, as outlined in [3, 13, 21]. Thanks to these algorithms we were able, in the diluted case, to effectively simulate systems with N up to 128000.

In particular, the algorithm used in the simulations works as follows. As a preliminary step, consider the symmetrized version \(\tilde{J}\) of J (that is, \(\tilde{J}=\frac{J + J^{T}}{2}\)) and note that the value of the Hamiltonian does not change if we replace J with \(\tilde{J}\). Define the pair Hamiltonian

$$\begin{aligned} H(\eta , \tau ) = \beta \sum _{i}h_{i}(\eta )\tau _{i} + q\sum _{i}\left[ \eta _{i}(1-\tau _{i}) + \tau _{i}(1-\eta _{i}) \right] \end{aligned}$$
(9)

with \(h_{i}(\eta ) =\frac{1}{\sqrt{ N }}\sum _{j}\tilde{J}_{i,j}\eta _{j}\) and \(\beta \) and q two positive parameters, and choose the transition matrix to be \(P_{\eta , \tau } = \frac{e^{ -H(\eta , \tau ) }}{\sum _{\tau }e^{ -H(\eta ,\tau ) }}\). Rewriting \(P_{\eta , \tau }\) as

$$\begin{aligned} P(\eta , \tau ) = \prod _{i} \frac{e^{ -\beta h_{i}(\eta )\tau _{i}-q[\eta _{i}(1-\tau _{i}) + \tau _{i}(1-\eta _{i})] }}{Z_{i}} \end{aligned}$$
(10)

we immediately see that, conditionally on \(\eta \), the value of each \(\tau _{i}\) can be sampled independently with the following probabilities:

$$\begin{aligned} P(\tau _{i}=1|\eta ) = \frac{e^{ -\beta h_{i} (\eta ) - q (1 - \eta _{i}) }}{Z_{i}} \; \text { and } \; P(\tau _{i}=0|\eta ) = \frac{e^{ -q \eta _{i} }}{Z_{i}} \end{aligned}$$
(11)

with \(Z_{i} = e^{ -\beta h_{i}(\eta ) - q(1- \eta _{i})} + e^{ - q\eta _{i} }\). Then, at least in principle, each component of the configuration could be updated on a dedicated core.
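A minimal Python sketch of one such parallel update (our own rendering of (11), not the implementation used for the simulations; the parameter values are purely illustrative):

import numpy as np

def pca_sweep(J_sym, eta, beta, q, rng):
    # Sample every tau_i independently from the conditional law (11), given eta.
    N = eta.shape[0]
    h = (J_sym @ eta) / np.sqrt(N)                 # local fields h_i(eta)
    w1 = np.exp(-beta * h - q * (1.0 - eta))       # weight of tau_i = 1
    w0 = np.exp(-q * eta)                          # weight of tau_i = 0
    return (rng.random(N) < w1 / (w1 + w0)).astype(float)

rng = np.random.default_rng(2)
N, beta, q = 1000, 10.0, 5.0
J = rng.standard_normal((N, N))
J_sym = (J + J.T) / 2.0                            # symmetrized couplings
eta = rng.integers(0, 2, N).astype(float)
for _ in range(200):                               # a few hundred parallel sweeps
    eta = pca_sweep(J_sym, eta, beta, q, rng)
H = (eta @ J_sym @ eta) / np.sqrt(N)
print("energy per particle:", H / N, "proportion of ones:", eta.mean())

Every call resamples all the components simultaneously, which is exactly the step that can be distributed across cores.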

Note that, by the symmetry of \(\tilde{J}\), we have that \(\frac{\sum _{\eta }e^{ -H(\eta ,\tau ) }}{\sum _{\eta ,\tau }e^{ -H(\eta ,\tau ) }}\) is the reversible measure of \(P_{\eta ,\tau }\). In addition, observe that \(H(\eta , \eta ) = \beta H(J, \eta )\). The intuitive rationale for why the heuristic algorithm is expected to work is the following. As q gets large, the weight of pairs \((\eta , \tau )\) with \(\eta \ne \tau \) in the stationary measure becomes smaller, and the stationary measure of the algorithm becomes closer to the Gibbs measure \(\frac{e^{ -\beta H(J, \eta ) }}{Z}\) with Hamiltonian H and inverse temperature \(\beta \). A comparison between this algorithm and the Metropolis one is provided in [21].

The computation of the local fields \(h_{i}\) proves to be the computationally expensive part of the algorithm. However, the vector of local fields \(\{ h_{i}(\eta ) \}_{i = 1, \dots , N}\) is, up to the factor \(\frac{1}{\sqrt{N}}\), the matrix–vector product \(\tilde{J} \cdot \eta \) and can be computed using highly optimized linear algebra libraries, exploiting a parallel computing device such as a GPU. Further, observe that k simulations can be carried out in parallel, possibly with different parameters \(\beta \) and q. Let \(\textbf{E}\) be the \(N \times k\) matrix whose columns contain the k configurations to be updated. In this case, the entire collection of vectors of fields \(\{ h_{i}(\eta ^{(m)} ) \}_{i = 1, \dots , N; m = 1, \dots , k}\) (encoded, again, as an \(N \times k\) matrix) is the matrix product \(\tilde{J} \cdot \textbf{E}\). This product too can be computed efficiently using highly optimized libraries and is, in general, substantially faster than computing k separate matrix–vector products.
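A sketch of this batched computation (again ours, with illustrative sizes):

import numpy as np

rng = np.random.default_rng(3)
N, k = 1000, 8
J = rng.standard_normal((N, N))
J_sym = (J + J.T) / 2.0
E = rng.integers(0, 2, (N, k)).astype(float)   # the k configurations to be updated
fields = (J_sym @ E) / np.sqrt(N)              # N x k matrix of local fields h_i(eta^(m))
# fields[:, m] can now be fed to the single-chain update of chain m,
# possibly with chain-specific parameters beta_m and q_m.
print(fields.shape)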

We remark that the values we found are heuristic and we have no guarantee that they coincide with the true minimum and maximum of H(J). However, as already discussed in more detail in [21], the exact algorithm introduced in [20] can be exploited to compute the exact minima and maxima of H(J) for values of N up to 150. The absolute values of the minimum and the maximum per particle of H(J) are steadily close to 0.42 already for values of N between 80 and 150. Moreover, for instances with sizes up to 150, we verified that our algorithm typically finds the same optimum as the exact algorithm. For the larger instances, we retained the minimum and maximum found in a set of runs of the algorithm with different pairs of values of \(\beta \) and q.

3.1 Minimum and maximum per particle

The standard normal case has been investigated extensively in [21]: already for relatively small values of N, both \(m_{\min , N}\) and \(m_{\max , N}\) oscillate around a value of about 0.42, whereas \(\alpha _{\min , N}\) and \(\alpha _{\max , N}\) very rapidly approach a value of about 0.624 (see Fig. 1). Theorem 2.3 states that both \(m_{\min , N}\) and \(m_{\max , N}\) are robust with respect to the distribution of the \(J_{i,j}\). Numerical simulations show that this robustness also concerns the proportion of ones in both the minimizer and the maximizer of H.

Figure 2 shows the behavior of \(m_{\min , N}, m_{\max , N}, \alpha _{\min , N}, \alpha _{\max , N}\) for \(J_{i,j}\) with (shifted) exponential distribution. Even though the exponential distribution is rather asymmetric and not subgaussian, already for relatively small values of N the average energy per particle and the proportion of ones in both the minimizer and the maximizer approach those of the standard normal case.

Fig. 1

Average values of \(m_{\min , N}\), \(m_{\max , N}\), \(\alpha _{\min , N}\) and \(\alpha _{\max , N}\) in the case of standard normally distributed \(J_{i,j}\)’s. Values of \(m_N\) and \(\alpha _N\) in the table are computed as averages of \(m_{\min , N}, m_{\max , N}\) and \(\alpha _{\min , N}, \alpha _{\max , N}\) respectively. The length of each branch of the error bars is equal to the standard error of the empirical average. Averages are computed over 10000 realizations of J for N up to 1024 and between 1000 and 5000 realizations of J for larger instances

Fig. 2

In this case each \(J_{i,j}\) is distributed as \(X - 1\) with X an exponential random variable with expected value 1. The exponential distribution is rather asymmetric and is not subgaussian. However, already for N of order “a few hundred” the values of the minimum and maximum per particle of H and the proportion of “ones” in the minimizer and the maximizer of H approach those of the normal case. The length of each branch of the error bars is equal to the standard error of the empirical average (only shown for the exponential distribution). Averages are computed over 10000 realizations of J. The curves for the normal distribution in this chart are the averages of the values \(m_{\min , N}\) and \(m_{\max , N}\) in the first panel and \(\alpha _{\min , N}\) and \(\alpha _{\max , N}\) in the second panel

Fig. 3

Comparison (top panel) of the average of the minimum and maximum per particle of H in the diluted case with the corresponding value in the standard normal case for several values of \(\delta \). In all these cases the average value of the minimum and maximum per particle appears to approach the same limit (about 0.42). As for the values of \(\alpha _{N}\) (bottom panel), these appear to be very close to the value of \(\alpha _{N}\) of the standard normal case. Averages, for \(\delta < 2\), are computed over 100 realizations of J

In our simulations, we also considered the diluted case, that is, we took \(J_{i,j}\) to be zero with probability \(1 - p_{\delta }(N)\), where \(p_{\delta }(N) = N^{\delta - 2}\), and a standard normal random variable otherwise. Findings concerning the behavior of \(m_{\min , N}, m_{\max , N}, \alpha _{\min , N}, \alpha _{\max , N}\) are presented in Figs. 3 and 4. The log-log plot of Fig. 5 suggests that \(m_N\) converges to \(\bar{m}\) as a power of N.

Fig. 4

Average of the minimum per particle and of \(\alpha _{\min , N}\) in the diluted case for \(\delta = 1.2\) and \(\delta = 1.1\). The length of each branch of the error bars is equal to the standard error of the empirical average. Averages are computed over 900 realizations of J for N up to 1024, 100 realizations for N between 2048 and 32000 and 36 realizations for larger values of N

Fig. 5

Log–log plot of the distance of the absolute value of the minimum per particle from the conjectured limiting value \(\bar{m}\) as a function of N in the diluted case for \(\delta = 1.2\) and \(\delta = 1.1\). For both values of \(\delta \) the lines appear to have a negative slope, suggesting that, in both cases, the absolute value of the minimum per particle of H eventually reaches the value \(\bar{m}\)

We highlight that as \(\delta \) becomes smaller, the density of J becomes very small. To give an idea, values of the density of J are given in Table 1. For instance, with \(N = 128000\) and \(\delta = 1.1\) the expected number of nonzero entries in each row of J is less than 4.

Table 1 Density of matrix J for several values of N and \(\delta \)
Table 2 Values of the maximum per particle for some benchmark instances

In the operations research literature, to test optimization algorithms, it is common to consider random instances of QUBO where the \(J_{i,j}\) have a uniform distribution (see, e.g., [1, 15, 25, 26]) and where the matrix J is, possibly, sparse. The best-known values of the maximum per particle for some benchmark QUBO instances with uniformly distributed \(J_{i,j}\) are reported in Table 2. It is apparent that, in these cases too, the values of the optima per particle agree with those of the standard normal case.

3.2 Structure of minimizer and maximizer

Observe that, for any realization of J, the minimizer and the maximizer of the Hamiltonian are unique with probability 1, since the distribution of the couplings is absolutely continuous. It is then possible to partition the indices \(1, 2, \ldots , N\) into four sets:

  • \(I_{1} = \{i: \eta ^{\min }_{i} = 1, \; \eta ^{\max }_{i} = 0\}\);

  • \(I_{2} = \{i: \eta ^{\min }_{i} = 1, \; \eta ^{\max }_{i} = 1\}\);

  • \(I_{3} = \{i: \eta ^{\min }_{i} = 0, \; \eta ^{\max }_{i} = 1\}\);

  • \(I_{4} = \{i: \eta ^{\min }_{i} = 0, \; \eta ^{\max }_{i} = 0\}\).

To refer properly to the cardinality of these sets we give the following

Definition 3.1

\(\alpha _{k,N}:= \frac{ |I_{k}| }{ N }, \, k = 1, 2, 3, 4\).

With an abuse of notation, we say that the i-th row (column) of J belongs to \(I_{k}\) if \(i \in I_{k}\). Note that \(\alpha _{1,N}\) can be interpreted as the proportion of ones appearing in the minimizer but not in the maximizer of H. Similar interpretations can be given for \(\alpha _{2,N}\), \(\alpha _{3,N}\) and \(\alpha _{4,N}\).

Remark 3.2

With the definition of \(\alpha _{k,N}\) given above it is immediate to see that \(\alpha _{\min ,N} = \alpha _{1,N} + \alpha _{2,N}\) and \(\alpha _{\max ,N} = \alpha _{2,N} + \alpha _{3,N}\).

Leveraging the definition of the \(I_{k}\), it is possible to partition J into 16 blocks

$$\begin{aligned} J[k,\ell ]:= \{J_{i,j}, \; i \in I_{k}, \, j \in I_{\ell }\} \end{aligned}$$

Clearly, \(J[k, \ell ]\) is a submatrix of J with \(N\alpha _{k,N}\) rows and \(N\alpha _{\ell ,N}\) columns.
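The partition and the quantities of Definition 3.1 are straightforward to compute once a pair \((\eta ^{\min }, \eta ^{\max })\) is available (for instance from the heuristic described above); the following Python helper, ours and purely illustrative, returns the proportions \(\alpha _{k,N}\) and the matrix of empirical block means.

import numpy as np

def block_statistics(J, eta_min, eta_max):
    # Partition the indices into I_1, ..., I_4 and compute alpha_k = |I_k| / N
    # together with the empirical mean of each block J[k, l].
    I = [np.where((eta_min == 1) & (eta_max == 0))[0],   # I_1
         np.where((eta_min == 1) & (eta_max == 1))[0],   # I_2
         np.where((eta_min == 0) & (eta_max == 1))[0],   # I_3
         np.where((eta_min == 0) & (eta_max == 0))[0]]   # I_4
    N = eta_min.shape[0]
    alpha = np.array([idx.size / N for idx in I])
    mu = np.full((4, 4), np.nan)
    for k in range(4):
        for l in range(4):
            if I[k].size and I[l].size:
                mu[k, l] = J[np.ix_(I[k], I[l])].mean()
    return alpha, mu

Averaging alpha and mu over many realizations of J produces estimates analogous to (12) and (13) below.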

Numerical simulations suggest that the average value and the variance of the entries in each block (subject to proper normalization), as well as the relative sizes of the blocks, converge to deterministic limits as \(N \rightarrow \infty \). These limits do not depend on the distribution of the \(J_{i,j}\)’s, as long as they have expected value zero and variance 1. More precisely, we make the following

Conjecture 3.3

Let \(J_{i,j}\) be independent identically distributed random variables with \(\mathbb {E}\left[ J_{1,1}\right] = 0\) and \({\text {Var}}\left( J_{1,1}\right) = 1\). Then there exist constants \(\alpha _{1}, \alpha _{2}, \alpha _{3}, \alpha _{4}\) such that

  1.

    \(\lim \limits _{N\rightarrow \infty } \alpha _{1,N} = \alpha _{1}\), \(\lim \limits _{N\rightarrow \infty } \alpha _{2,N} = \alpha _{2}\), \(\lim \limits _{N\rightarrow \infty } \alpha _{3,N} = \alpha _{3}\), \(\lim \limits _{N\rightarrow \infty } \alpha _{4,N} = \alpha _{4}\);

  2.

    \(\alpha _{1} = \alpha _{2} = \alpha _{3} = \frac{1}{2}\alpha _{\min } = \frac{1}{2}\alpha _{\max }\); \(\alpha _{4} = 1 - \frac{3}{2}\alpha _{\min }\)

  3.

    \({\text {Var}}\left( J[k, \ell ]_{i,j}\right) = 1\) for \(k, \ell = 1, \ldots , 4\)

Further, set \(\mu [k, \ell ] = \mathbb {E}\left[ J[k,\ell ]_{i,j}\right] \) and write the covariance of \(J[k,\ell ]_{i}, J[k, \ell ]_{j}\) as

$$\begin{aligned} {\text {Cov}}(J[k,\ell ]_{i}, J[k, \ell ]_{j}) = \frac{N^{2}\alpha _{k}\alpha _{\ell }}{2}(1 - \tilde{\sigma }[k,\ell ]) \end{aligned}$$


Computation of averages over 10000 realizations of J with standard normal distribution and \(N = 1024\) yielded the following estimates for \(\alpha _{k}\), \(\mu [k,\ell ]\) and \(\tilde{\sigma }[k,\ell ]\).

$$\begin{aligned}&\mathbf {\alpha } = \begin{bmatrix} 0.3105 \\ 0.31331 \\ 0.31054 \\ 0.06565 \\ \end{bmatrix} \end{aligned}$$
(12)
$$\begin{aligned}&\mu [k,\ell ] = \begin{bmatrix} -0.02161 & -0.05687 & 0.00006 & 0.04229 \\ -0.05687 & 0.00001 & 0.05684 & 0.00013 \\ -0.00005 & 0.05685 & 0.02161 & -0.04241 \\ 0.04228 & 0.00003 & -0.04227 & 0.00001 \\ \end{bmatrix} \end{aligned}$$
(13)
$$\begin{aligned}&\tilde{\sigma }[k,\ell ] = \begin{bmatrix} 0.92996 & 1.11006 & 0.74334 & 1.20509 \\ 1.06221 & 0.58133 & 1.08835 & 0.77939 \\ 0.73879 & 1.08343 & 0.9415 & 1.25195 \\ 1.21106 & 0.76959 & 1.2258 & 0.99425 \\ \end{bmatrix} \end{aligned}$$
(14)

These values suggest that the random couplings in blocks [2, 2], [3, 1], [4, 2], [1, 3], and [2, 4] are negatively correlated, the random couplings in blocks [1, 4], [3, 4], [4, 1], and [4, 3] are positively correlated, whereas the random couplings in the remaining blocks are roughly independent.

Very similar values can be obtained in the diluted case and for \(J_{i,j}\) with distributions other than the normal.

Note that it is always possible to relabel the indices \(1, 2, \ldots , N\) so that, in J, indices in \(I_{1}\) appear “first”, indices in \(I_{2}\) appear “second”, indices in \(I_{3}\) appear “third” and indices in \(I_{4}\) appear “last”. With this relabelling, a graphical representation of Conjecture 3.3 is provided in Fig. 6.

Fig. 6

Graphical representation of the matrix of the couplings. These matrices are obtained from J by rearranging the rows (columns) of J so that rows (columns) in \(I_{1}\) appear “first”, rows (columns) in \(I_{2}\) appear “second”, rows (columns) in \(I_{3}\) appear “third” and rows (columns) in \(I_{4}\) appear “last”. Pictures are obtained by averaging the values of the \(J_{i,j}\) over squares of size L. Negative (average) values of J are blue whereas positive values are red; darker colors correspond to larger absolute values

3.3 Probability a particle belongs to the minimizer/maximizer of H

The sets \(I_{1}, \ldots , I_{4}\) introduced above are random sets depending on the realization of J. The problem of finding \(\eta ^{\min }\) and \(\eta ^{\max }\) can be restated as the problem of determining the sets \(I_{\min } = I_{1} \cup I_{2}\) and \(I_{\max } = I_{2} \cup I_{3}\). Though, as already mentioned, this problem is NP-hard, we tried to assess, numerically, whether it is possible to determine the probability that a certain index i belongs to either \(I_{\min }\) or \(I_{\max }\).

To this aim, consider the matrix \(\tilde{J} = \frac{J + J^{T}}{2}\), that is, the symmetrized version of J, and the function \(\tilde{H}\) obtained from H by replacing J with \(\tilde{J}\). Note that \(H(J, \eta ) = \tilde{H}(\eta )\) for all \(\eta \) and, consequently, the values of \(\eta ^{\min }_{i}\) and \(\eta ^{\max }_{i}\) do not depend on whether the matrix J or \(\tilde{J}\) is considered.

Take the matrix \(\tilde{J}\) and define \(R_{i} = \sum _{j = 1}^{N} \tilde{J}_{i, j}\), that is, R is the vector of the row sums of \(\tilde{J}\). Let [i] be the index of the i-th smallest element of R. We say that the [i]-th row (column) of \(\tilde{J}\) belongs to the minimizer (respectively the maximizer) of H if \(\eta ^{\min }_{[i]} = 1\) (respectively \(\eta ^{\max }_{[i]} = 1\)).
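As an illustration (ours, not the paper's code), the ranking and the empirical probability that the [i]-th row belongs to the minimizer can be estimated as follows, given a collection of instances and of (approximate) minimizers, e.g. produced by the PCA heuristic.

import numpy as np

def rank_rows(J):
    # Return the row indices sorted by increasing row sum of the symmetrized matrix;
    # position i of the returned permutation holds the index [i].
    J_sym = (J + J.T) / 2.0
    R = J_sym.sum(axis=1)
    return np.argsort(R)

def minimizer_profile(J_samples, eta_min_samples):
    # Empirical estimate of P(eta^min_[i] = 1) as a function of the rank [i] / N.
    profiles = [eta_min[rank_rows(J)] for J, eta_min in zip(J_samples, eta_min_samples)]
    return np.mean(profiles, axis=0)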

Numerical simulations show that the probability that the [i]-th row of \(\tilde{J}\) belongs to the minimizer (respectively maximizer) of H is a decreasing (respectively increasing) deterministic function of \(\frac{[i]}{N}\). This function is expected not to depend on the actual distribution of J. Further, we expect a positive fraction of the smallest (largest) rows of \(\tilde{J}\) (where smallest/largest refers to the sum of the elements of each row of \(\tilde{J}\)) to belong to the minimizer (maximizer) of H with positive probability (see Fig. 7). More precisely, we have the following

Conjecture 3.4

As \(N \rightarrow \infty \), \({\text {P}}\left( \eta ^{\min }_{[i]} = 1 \right) = f_{\min }\left( \frac{[i]}{N}\right) + o(1)\) with \(f_{\min }\) a decreasing function. Similarly, as \(N \rightarrow \infty \), \({\text {P}}\left( \eta ^{\max }_{[i]} = 1 \right) = f_{\max }\left( \frac{[i]}{N}\right) + o(1)\), with \(f_{\max }\left( \frac{[i]}{N}\right) = f_{\min }\left( 1 - \frac{[i]}{N}\right) \). Moreover, there exists \(\lambda _0 > 0\) such that \(f_{\min } (\lambda ) = 1\) for all \(\lambda < \lambda _0\). Finally, if the tails of the \(J_{i,j}\) decay sufficiently fast, the functions \(f_{\min }\) and \(f_{\max }\) do not depend on the distribution of the \(J_{i,j}\).

Fig. 7

Probability the [i]-th row of \(\tilde{J}\) belongs to the minimizer (the increasing cloud of points) and to the maximizer (the decreasing cloud of points) of H for several values of N. Estimates are obtained as averages over 400 realizations of J for each value of N. The shape of the cloud appears to be independent of the size of the system

Finally, we observe that the random variables \(\mathbb {1}_{\{\eta ^{\min }_{[i]} = 1\}}\) and \(\mathbb {1}_{\{\eta ^{\max }_{[i]} = 1\}}\) appear to be positively correlated for all i. The qualitative behavior of the strength of the correlation between these two variables as a function of [i] is provided by the estimates of Fig. 8.

Fig. 8

Ratio between the probability that the [i]-th row belongs to both the minimizer and the maximizer of H and the product of the probabilities of the two events, for \(N = 4096\). Estimates are obtained as averages over 400 realizations of J

4 Proofs

We state first some general results that will be used below in the proofs of Theorem 2.2 and Theorem 2.3.

Let \(X \!=\! \{X_{1}, \ldots , X_{n}\}\) be independent random variables and let \(W\!=\!g\left( X_1, \ldots , X_i, \ldots , X_n\right) \) with g a measurable function. Further, let \(X^{\prime } = \{X_{1}^{\prime }, \ldots , X_{n}^{\prime }\}\) be an independent copy of \(X_{1}, \ldots , X_{n}\) and write

$$\begin{aligned} W_{i}=g\left( X_{1}, \ldots , X_{i}^{\prime }, \ldots , X_{n}\right) \end{aligned}$$
(15)

Define the random variable V by

$$\begin{aligned} V=\mathbb {E}\left[ \sum _{i=1}^n\left( W - W_{i}\right) ^{2} \mid {X}\right] \end{aligned}$$
(16)

which allows us to restate the Efron–Stein theorem (see [11]) as follows

Theorem 4.1

(Efron-Stein)

$$\begin{aligned} {\text {Var}}\left( W\right) \le \frac{1}{2} \mathbb {E}\left[ V\right] \end{aligned}$$
(17)

From [5], we can bound the moment generating function with the following

Theorem 4.2

For all \(\theta >0\) and \(\lambda \in (0,\frac{1}{\theta })\),

$$\begin{aligned} \mathbb {E}\left[ \exp (\lambda (W - \mathbb {E}\left[ W\right] ))\right] \le \exp \left( \frac{\lambda \theta }{1-\lambda \theta }\right) \mathbb {E}\left[ \exp \left( \frac{\lambda V}{\theta }\right) \right] . \end{aligned}$$
(18)

Moreover, a straightforward consequence of [4, Theorem 2] yields

Theorem 4.3

$$\begin{aligned} \mathbb {E}\left[ \left| W-\mathbb {E}\left[ W\right] \right| ^3\right] < \mathbb {E}\left[ \left| V\right| ^{\frac{3}{2}}\right] . \end{aligned}$$
(19)

To prove both Theorem 2.2 and Theorem 2.3 a key role is played by the free energy function \(F: \mathbb {R}^{N \times N} \rightarrow \mathbb {R}\) defined as

$$\begin{aligned} F_{\beta }(X) := \beta ^{-1} \log Z(X) \end{aligned}$$
(20)

with \(Z(X):= \sum _{\eta } e^{\beta H(X, \eta )}\), \(H(X, \eta ) = \frac{1}{\sqrt{N}}\sum _{i,j} X_{i,j} \eta _{i} \eta _{j}\), and \(X = \{X_{i,j}\}_{1 \le i, j \le N}\) a collection of independent random variables (the matrix of the couplings). We do not write the dependence of \(F_{\beta }\) and Z on N to make the notation lighter. Further, we set

$$\begin{aligned} p(X, \eta ) := \frac{e^{\beta H(X, \eta )}}{Z(X)}; \quad \langle \eta _i\eta _j \rangle _{X} = \sum _{\eta }\eta _{i}\eta _{j}\,p(X, \eta ) \end{aligned}$$
(21)

where \(p(X, \eta )\) is the Gibbs measure of \(\eta \) and \(\langle \cdot \rangle _{X}\) denotes the expectation with respect to the Gibbs measure when the matrix of couplings is X.

In the next lemma, we determine bounds on the derivatives of F with respect to one of the couplings.

Lemma 4.4

Let \(F: \mathbb {R}^{N \times N} \rightarrow \mathbb {R}\) be as in (20). Then

$$\begin{aligned} \left| \frac{\partial F_{\beta }(X)}{\partial X_{i,j}}\right| \le \frac{1}{\sqrt{N}}\, ; \quad \left| \frac{\partial ^{2} F_{\beta }(X)}{\partial X_{i,j}^{2}}\right| \le \frac{\beta }{4N}\, ; \quad \left| \frac{\partial ^{3} F_{\beta }(X)}{\partial X_{i,j}^{3}}\right| \le \frac{\beta ^{2}}{6 \sqrt{3} N^{3/2}} \end{aligned}$$
(22)

Proof

At first, observe that a straightforward computation gives

$$\begin{aligned} \frac{\partial H(X, \eta )}{\partial X_{i,j}} = \frac{1}{\sqrt{N}} \eta _{i} \eta _{j} \, ; \quad \frac{\partial Z}{\partial X_{i,j}} (X) = \frac{\beta }{\sqrt{N}}\sum _{\eta } \eta _{i} \eta _{j} e^{\beta H(X, \eta )} =\frac{\beta }{\sqrt{N}}\langle \eta _i \eta _j \rangle _{X} Z(X). \end{aligned}$$
(23)

Using (23), the bound on the first derivative of F with respect to \(X_{i,j}\) is readily obtained from the following

$$\begin{aligned} \frac{\partial F_{\beta }(X)}{\partial X_{i,j}} = \frac{1}{\beta Z(X)}\frac{\partial Z(X)}{\partial X_{i,j}} =\frac{1}{\sqrt{N}} \langle \eta _{i}\eta _{j} \rangle _{X} \end{aligned}$$
(24)

by observing that \(\langle \eta _{i}\eta _{j} \rangle _{X} \in [0, 1]\). To compute higher-order derivatives note that

$$\begin{aligned} \frac{\partial p(X, \eta )}{\partial X_{i,j}} =\frac{ \frac{\beta }{\sqrt{N}}\eta _{i}\eta _{j} e^{\beta H(X, \eta )} Z- \frac{\beta }{\sqrt{N}}\langle \eta _{i}\eta _{j} \rangle _{X} e^{\beta H(X, \eta )} Z}{ Z^2} =\frac{\beta }{\sqrt{N}}\left( \eta _{i}\eta _{j}-\langle \eta _{i}\eta _{j} \rangle _{X}\right) p(X, \eta ) \end{aligned}$$
(25)

which, in turn, yields,

$$\begin{aligned} \frac{\partial \langle \eta _{i}\eta _{j} \rangle _{X}}{\partial X_{i,j}}&= \sum _{\eta } \eta _{i}\eta _{j} \frac{\partial p}{\partial X_{i,j}} (X, \eta ) = \frac{\beta }{\sqrt{N}} \sum _{\eta } \eta _{i}\eta _{j} \left( \eta _{i}\eta _{j} -\langle \eta _{i}\eta _{j} \rangle _{X}\right) p(X, \eta ) \end{aligned}$$
(26)
$$\begin{aligned}&= \frac{\beta }{\sqrt{N}}\left( \langle \eta _{i}\eta _{j} \rangle _{X}- \langle \eta _{i}\eta _{j} \rangle _{X}^2\right) = \frac{\beta }{\sqrt{N}}\langle \eta _{i}\eta _{j} \rangle _{X}\left( 1- \langle \eta _{i}\eta _{j} \rangle _{X}\right) \end{aligned}$$
(27)

and

$$\begin{aligned} \frac{\partial ^2 \langle \eta _{i}\eta _{j} \rangle _{X}}{\partial X_{ij}^2} = \frac{\beta }{\sqrt{N}} \frac{\partial \left( \langle \eta _{i}\eta _{j} \rangle _{X} \left( 1 - \langle \eta _{i}\eta _{j} \rangle _{X}\right) \right) }{\partial X_{ij}} = \frac{\beta ^2}{N} \langle \eta _{i}\eta _{j} \rangle _{X}\left( 1 - \langle \eta _{i}\eta _{j} \rangle _{X}\right) \left( 1 - 2\langle \eta _{i}\eta _{j} \rangle _{X}\right) . \end{aligned}$$
(28)

Consequently,

$$\begin{aligned} \frac{\partial ^{2} F_{\beta }(X)}{\partial X_{i,j}^{2}} = \frac{\beta }{N}\langle \eta _{i}\eta _{j} \rangle _{X} \left( 1 - \langle \eta _{i}\eta _{j} \rangle _{X}\right) \quad \text { and } \end{aligned}$$
(29)
$$\begin{aligned} \frac{\partial ^{3} F_{\beta }(X)}{\partial X_{i,j}^{3}} = \frac{\beta ^2}{N^{3/2}}\langle \eta _{i}\eta _{j} \rangle _{X} (1 - \langle \eta _{i}\eta _{j} \rangle _{X}) \left( 1 - 2\langle \eta _{i}\eta _{j} \rangle _{X}\right) . \end{aligned}$$
(30)

The claim follows by observing that, in the interval [0, 1], \(x (1 - x) \le \frac{1}{4}\) and \(\left| x (1 - x) (1 - 2x)\right| \le \frac{1}{6\sqrt{3}}\). \(\square \)

4.1 Proof of Theorem 2.2

We give the proof for the convergence of \(m_{\max , N}\). The proof for \(m_{\min , N}\) is analogous.

In the rest of this section, we write \(\overline{H}(J):= \max _{\eta } H(J, \eta )\).

To prove the convergence in probability of \(m_{\max , N}\) to its expectation we use the following results which allow us to bound the variance of the free energy.

Assume the \(J_{i,j}\) are independent identically distributed random variables with expected value zero and variance 1. Call \(F = F_{\beta }(J) = \frac{1}{\beta } \log \sum _{\eta } e^{\beta H(J, \eta )}\) with \(H(J, \eta ) = \frac{1}{\sqrt{N}}\sum _{i,j} J_{i,j} \eta _{i} \eta _{j}\). The following theorem provides a bound on the variance of F.

Theorem 4.5

\({\text {Var}}\left( F\right) \le N\)

Proof

Let \(F_{i,j}\) be obtained from F by substituting \(J_{i,j}\) with \(J'_{i,j}\) (an independent copy of \(J_{i,j}\)). Then, by Lemma 4.4, \(\left| F - F_{i,j}\right| \le \frac{1}{\sqrt{N}} \left| J_{i,j} - J'_{i,j}\right| \). From Theorem 4.1 it follows

$$\begin{aligned} {\text {Var}}\left( F\right)&\le \frac{1}{2} \mathbb {E}\left[ \sum _{i,j}(F - F_{i,j})^{2}\right] = \frac{1}{2} \sum _{i,j} \mathbb {E}\left[ (F - F_{i,j})^{2}\right] = \frac{1}{2} N^{2} \mathbb {E}\left[ (F - F_{1,1})^{2}\right] \end{aligned}$$
(31)
$$\begin{aligned}&\le \frac{1}{2} N^{2} \mathbb {E}\left[ \left( \frac{1}{\sqrt{N}} \left| J_{i,j} - J'_{i,j}\right| \right) ^{2} \right] = N \mathbb {E}\left[ J_{1,1}^{2}\right] = N \end{aligned}$$
(32)

Note that this estimate is uniform in \(\beta \). \(\square \)

Since \(\overline{H}(J) = \lim _{\beta \rightarrow \infty } F_{\beta }(J)\), as an immediate consequence we get

Corollary 4.6

\({\text {Var}}\left( \overline{H}(J)\right) \le N\).

By Chebyshev’s inequality and the previous corollary we get

$$\begin{aligned} {\text {P}}\left( \left| \frac{\overline{H}(J)}{N} - \mathbb {E}\left[ \frac{\overline{H}(J)}{N} \right] \right| > \varepsilon \right) \le \frac{{\text {Var}}\left( \frac{\overline{H}(J)}{N} \right) }{\varepsilon ^{2}} = \frac{1}{N^{2}} \frac{{\text {Var}}\left( \overline{H}(J)\right) }{\varepsilon ^{2}} \le \frac{1}{N \varepsilon ^{2}} \end{aligned}$$
(33)

from which claim (a) follows.

To prove the other two claims, set \(V_{F} = \mathbb {E}\left[ \sum _{i,j} (F - F_{i,j})^{2} \mid J\right] \). We have

$$\begin{aligned} V_{F}&= \sum _{i,j} \mathbb {E}\left[ (F - F_{i,j})^{2} \mid J\right] \le \sum _{i,j} \frac{1}{N} \mathbb {E}\left[ (J_{i,j} - J'_{i,j})^{2} \mid J_{i,j}\right] \end{aligned}$$
(34)
$$\begin{aligned}&= \frac{1}{N} \sum _{i,j} \left( J_{i,j}^{2} + \mathbb {E}\left[ (J'_{i,j})^2\right] \right) = \frac{1}{N} \sum _{i,j} \left( J_{i,j}^{2} + 1 \right) \end{aligned}$$
(35)

Let us examine the case \(\mathbb {E}\left[ \left| J_{1,1}\right| ^{3}\right] < \infty \).

Leveraging Jensen’s inequality we have \(\left| \sum _1^n a_i\right| ^{\frac{3}{2}} \le \sqrt{n}\sum _{1}^{n} \left| a_i\right| ^{\frac{3}{2}}\). The latter inequality can be used to show that

$$\begin{aligned} \mathbb {E}\left[ V_{F}^{\frac{3}{2}}\right]&\le \mathbb {E}\left[ \left| \frac{1}{N}\sum _{i,j} \left( J_{i,j}^2 + 1\right) \right| ^{\frac{3}{2}}\right] \le \frac{1}{N^{\frac{3}{2}}} \mathbb {E}\left[ N \sum _{i,j} \left| J_{i,j}^2 + 1\right| ^{\frac{3}{2}}\right] \end{aligned}$$
(36)
$$\begin{aligned}&\le \frac{1}{\sqrt{N}} \mathbb {E}\left[ \sum _{i,j} \sqrt{2} (\left| J_{i,j}\right| ^3 + 1)\right] =\sqrt{2}N^{\frac{3}{2}} \mathbb {E}\left[ \left| J_{1,1}\right| ^3 + 1\right] . \end{aligned}$$
(37)

Then, by Theorem 4.3 and for some constant C,

$$\begin{aligned} \mathbb {E}\left[ \left| F - \mathbb {E}\left[ F\right] \right| ^3\right] \le CN^{\frac{3}{2}} \end{aligned}$$
(38)

which, in turn, implies

$$\begin{aligned} \mathbb {E}\left[ \left| \frac{F-\mathbb {E}\left[ F\right] }{N} \right| ^{3} \right] \le CN^{-\frac{3}{2}} \end{aligned}$$
(39)

Since (39) does not depend on \(\beta \), it holds for \(\overline{H}(J) = \lim _{\beta \rightarrow \infty } F_{\beta }(J)\) as well. Chebyshev’s inequality and the Borel–Cantelli lemma allow us to conclude the proof of claim (b).

Consider now the case of strictly subgaussian \(J_{i,j}\)’s. Recall that for a centered strictly subgaussian random variable X with variance \(\sigma ^{2}\),

$$\begin{aligned} \mathbb {E}\left[ \exp (sX)\right] \le \exp \left( \frac{\sigma ^{2}s^{2}}{2}\right) , \quad \forall s \in \mathbb {R}. \end{aligned}$$
(40)

Also, \(\forall r \in \mathbb {N}\), \(\mathbb {E}\left[ X^{2r}\right] \le 2^{r+1}\sigma ^{2r}r!\), yielding

$$\begin{aligned} \mathbb {E}\left[ e^{s\left( X^2-\sigma ^2\right) }\right]&=1+s \mathbb {E}\left[ X^2-\sigma ^2\right] +\sum _{r=2}^{\infty } \frac{s^r \mathbb {E}\left[ \left( X^2-\sigma ^2\right) ^r\right] }{r !} \le 1+\sum _{r=2}^{\infty } \frac{s^r \mathbb {E}\left[ X^{2 r}\right] }{r !} \end{aligned}$$
(41)
$$\begin{aligned}&\le 1+2\sum _{r=2}^{\infty } s^r 2^r \sigma ^{2 r} =1+\frac{8 s^2 \sigma ^4}{1-2 s \sigma ^2} \end{aligned}$$
(42)

where the inequality in (41) follows from the fact that, for a non-negative random variable (such as \(X^{2}\)), the central r-th moment is smaller than or equal to the r-th moment (recall that \(X^{2}\) has expected value \(\sigma ^{2}\)).

This result, together with (34), can be used to bound the moment-generating function of \(V_{F}\) as

$$\begin{aligned} \mathbb {E}\left[ \exp \left( t V_{F}\right) \right]&\le \mathbb {E}\left[ \exp \left( \frac{t}{N}\sum _{i,\,j} \left( J_{i,j}^2 - 1 + 2\right) \right) \right] = \left( e^{\frac{2t}{N}} \mathbb {E}\left[ \exp \left( \frac{t}{N}\left( J_{1,1}^2 - 1\right) \right) \right] \right) ^{N^2}\end{aligned}$$
(43)
$$\begin{aligned}&\le e^{\frac{2t}{N}} \left[ 1+\frac{1}{N}\frac{8 t^2}{N- 2t}\right] ^{N^2}\le e^{9t^2} \end{aligned}$$
(44)

for N large enough where the last inequality follows from the fact that \(\left[ 1+\frac{1}{N}\frac{8 t^2}{N- 2t}\right] ^{N^2}\) is a decreasing sequence with the same limit as \(\left[ 1 + \frac{8 t^2 }{N^2}\right] ^{N^2}\). Then, from Theorem 4.2 we get, for all \(\theta >0\) and \(\lambda \in (0, \frac{1}{\theta })\),

$$\begin{aligned} \mathbb {E}\left[ \exp (\lambda (F-\mathbb {E}\left[ F\right] ))\right] \le \exp \left( \frac{\lambda \theta }{1-\lambda \theta }\right) \mathbb {E}\left[ \exp \left( \frac{\lambda V_{F}}{\theta }\right) \right] \le \exp \left( \frac{\lambda \theta }{1-\lambda \theta }+\frac{9\lambda ^2}{\theta ^2}\right) \end{aligned}$$
(45)

and hence, by the exponential Markov inequality,

$$\begin{aligned} {\text {P}}\left( \left| F-\mathbb {E}[F]\right| > t \right) \le 2\exp \left( -\lambda t + \frac{\lambda \theta }{1-\lambda \theta }+\frac{9\lambda ^2}{\theta ^2}\right) \le e^{-at} \end{aligned}$$
(46)

for some \(a>0\) (by optimizing on \(\lambda , \theta \)) and for every \(t > 0\). Setting \(t = Nz\), with \(z > 0\) we get

$$\begin{aligned} {\text {P}}\left( \left| F-\mathbb {E}[F]\right| > Nz \right) \le e^{-aNz}. \end{aligned}$$
(47)

Since (47) does not depend on \(\beta \) either, it holds for \(\overline{H}(J) = \lim _{\beta \rightarrow \infty } F_{\beta }(J)\). Dividing by N we get claim (c). \(\square \)

4.2 Proof of Theorem 2.3

Using the bounds of Lemma 4.4 we can prove that the expectation of the free energy does not depend on the distribution of the couplings in the thermodynamic limit.

As for Theorem 2.3, let \(J = \left\{ J_{i,j} \right\} _{1\le i,j\le N}\) and \(Y = \left\{ Y_{i,j} \right\} _{1\le i,j\le N}\) be two independent sequences of independent random variables such that, for every i, j, \(\mathbb {E}\left[ J_{i,j}\right] = \mathbb {E}\left[ Y_{i,j}\right] = 0\) and \(\mathbb {E}\left[ J_{i,j}^{2}\right] = \mathbb {E}\left[ Y_{i,j}^{2}\right] = 1\).

Theorem 4.7

Consider \(F_{\beta }(J)\) as in (20). Then

$$\begin{aligned} \lim _{N \rightarrow \infty } \left| \mathbb {E}\left[ F_{\beta }(J)\right] - \mathbb {E}\left[ F_{\beta }(Y)\right] \right| = 0 \end{aligned}$$
(48)

Proof

The proof is a straightforward adaptation of Chatterjee’s extension of Lindeberg’s argument for the central limit theorem. See [8, 9] for a comprehensive treatment.

Enumerate the elements of the two sequences as \(J_{1}, \ldots , J_{n}\) and \(Y_{1}, \ldots , Y_{n}\), with \(n = N^{2}\), and, for \(0 \le k \le n\), define

$$\begin{aligned} \textbf{Z}_k:=\left( J_{1}, \ldots , J_{k-1}, J_k, Y_{k+1}, \ldots , Y_{n}\right) \end{aligned}$$
(49)
$$\begin{aligned} \textbf{W}_k:=\left( J_{1}, \ldots , J_{k-1}, 0, Y_{k+1}, \ldots , Y_{n}\right) \end{aligned}$$
(50)

Now consider a Taylor expansion of \(F_{\beta }\) around \(J_{k} = 0\) and write

$$\begin{aligned} F_{\beta }(\textbf{Z}_k) = F_{\beta }(\textbf{W}_k) + J_{k} \frac{\partial F_{\beta }(\textbf{W}_k)}{\partial J_{k}} + \frac{1}{2} J_{k}^2 \frac{\partial ^{2} F_{\beta }(\textbf{W}_k)}{\partial J_{k}^{2}} + R_{k} \end{aligned}$$
(51)
$$\begin{aligned} F_{\beta }(\textbf{Z}_{k-1}) = F_{\beta }(\textbf{W}_k) + Y_{k} \frac{\partial F_{\beta }(\textbf{W}_k)}{\partial J_{k}} + \frac{1}{2} Y_{k}^{2} \frac{\partial ^{2} F_{\beta }(\textbf{W}_k)}{\partial J_{k}^{2}} + S_{k}. \end{aligned}$$
(52)

The bounds from Lemma 4.4 then give

$$\begin{aligned} \left| R_k\right| \le \frac{\beta }{8N} J_k^2, \quad \left| S_k\right| \le \frac{\beta }{8N} Y_k^2 \qquad \text {and} \end{aligned}$$
(53)
$$\begin{aligned} \left| R_k\right| \le \frac{\beta ^{2}}{36 \sqrt{3} N^{3/2}} |J_k|^3, \quad \left| S_k\right| \le \frac{\beta ^{2}}{36 \sqrt{3} N^{3/2}} |Y_k|^3. \end{aligned}$$
(54)

Note that, for each k, the random variables \(J_{k}, Y_{k}\) and \(\textbf{W}_k\) are independent. Recalling that \(J_{k}\) and \(Y_{k}\) have both mean zero and variance 1, we get

$$\begin{aligned} \mathbb {E}\left[ F_{\beta }(\textbf{Z}_k)- F_{\beta }(\textbf{Z}_{k-1})\right] = \mathbb {E}\left[ R_{k}- S_{k}\right] \end{aligned}$$
(55)

which gives the estimate

$$\begin{aligned} \left| \mathbb {E}\left[ F_{\beta }(J)\right] - \mathbb {E}\left[ F_{\beta }(Y)\right] \right|&= \left| \sum _{k=1}^n \mathbb {E}\left[ F_{\beta }(\textbf{Z}_k) - F_{\beta }(\textbf{Z}_{k-1})\right] \right| \le \sum _{k=1}^n \left( \mathbb {E}\left[ \left| R_k\right| \right] + \mathbb {E}\left[ \left| S_k\right| \right] \right) \nonumber \\&\le \frac{\beta }{8N}\sum _{k=1}^n \left[ \mathbb {E}\left( J_k^2; \left| J_k\right|>K\right) + \mathbb {E}\left( Y_k^2;\left| Y_k\right| >K\right) \right] \nonumber \\&\quad + \frac{\beta ^{2}}{36 \sqrt{3} N^{3/2}}\sum _{k=1}^n \left[ \mathbb {E}\left( \left| J_k\right| ^3;\left| J_k\right| \le K\right) +\mathbb {E}\left( \left| Y_k\right| ^3;\left| Y_k \right| \le K\right) \right] \end{aligned}$$
(56)

where \(K > 0\) is an arbitrary truncation threshold. If we furthermore assume \(\gamma < \infty \), we only need the third order Taylor bounds on \(R_{k}\), \(S_{k}\), getting the more explicit bound

$$\begin{aligned} \left| \mathbb {E}\left[ F_{\beta }(J)\right] - \mathbb {E}\left[ F_{\beta }(Y)\right] \right| \le \frac{\beta ^2 \gamma \sqrt{N}}{18\sqrt{3}} \end{aligned}$$
(57)

\(\square \)

Finally, we can prove Theorem 2.3. We write the proof for the maximum of H. The proof for the minimum can be done in the same way by replacing \(\beta \) with \(-\beta \).

Recall that we set \(\overline{H}(J) = \max _{\eta } H(J, \eta )\).

We have

$$\begin{aligned} \overline{H}(J)&=\beta ^{-1} \log \left[ e^{\beta \overline{H}(J)}\right] \le \beta ^{-1} \log \left[ \sum _{\eta } e^{\beta H(J, \eta )}\right] \le \beta ^{-1} \log \left[ 2^N e^{\beta \overline{H}(J)}\right] \end{aligned}$$
(58)

getting the uniform bound

$$\begin{aligned} \left| \overline{H}(J) - F_{\beta }(J)\right| \le \beta ^{-1} N \log 2. \end{aligned}$$
(59)

From Theorem 4.7

$$\begin{aligned} \begin{aligned} \left| \mathbb {E}\left[ \overline{H}(J)\right] - \mathbb {E}\left[ \overline{H}(Y)\right] \right|&\le \left| \mathbb {E}\left[ \overline{H}(J) - F_{\beta }(J)\right] \right| + \left| \mathbb {E}\left[ F_{\beta }(J)\right] - \mathbb {E}\left[ F_{\beta }(Y)\right] \right| \\&\quad + \left| \mathbb {E}\left[ \overline{H}(Y) - F_{\beta }(Y)\right] \right| \\&\le \frac{2 N \log 2}{\beta } + \left| \mathbb {E}\left[ F_{\beta }(J)\right] - \mathbb {E}\left[ F_{\beta }(Y)\right] \right| \end{aligned} \end{aligned}$$
(60)

First assume \(\gamma < \infty \). From (57) and (60),

$$\begin{aligned} \left| \mathbb {E}\left[ \overline{H}(J)\right] - \mathbb {E}\left[ \overline{H}(Y)\right] \right|&\le \frac{2 N \log 2}{\beta } + \frac{\beta ^2\gamma }{18\sqrt{3}} \sqrt{N} \end{aligned}$$
(61)

By choosing \(\beta = N^{1/6}\) we get the second part of the statement.
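Indeed, with \(\beta = N^{1/6}\) both terms in (61) are of order \(N^{5/6}\):

$$\begin{aligned} \frac{2 N \log 2}{\beta } = 2\log 2 \, N^{5/6} \quad \text { and } \quad \frac{\beta ^2\gamma }{18\sqrt{3}} \sqrt{N} = \frac{\gamma }{18\sqrt{3}}\, N^{5/6}, \end{aligned}$$

so that, after dividing by N, the difference of the expectations is bounded by a constant depending only on \(\gamma \) times \(N^{-1/6}\), as in (8).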

We now consider \(\gamma = \infty \). Define \(g(K) := \max _{i, j}\left\{ \mathbb {E}\left( J_{i,j}^2;\left| J_{i,j}\right| >K\right) , \mathbb {E}\left( Y_{i,j}^2;\left| Y_{i,j}\right| >K\right) \right\} \). From the finite variance hypothesis, \(g(K) \rightarrow 0\) as \(K \rightarrow \infty \). In (56) we take \(\beta = g(N^{\frac{1}{4}})^{-\frac{1}{2}}\), so that \(\beta \rightarrow \infty \) as \(N \rightarrow \infty \). Also observe that \(\gamma =\infty \) implies \(Kg(K) \rightarrow \infty \) as \(K\rightarrow \infty \).

Estimate (56), with \(K=N^{\frac{1}{4}}\), now becomes

$$\begin{aligned} \frac{1}{N} \left| \mathbb {E}\left[ F_{\beta }(J)\right] - \mathbb {E}\left[ F_{\beta }{(Y)}\right] \right| \le C_1 g(N^{\frac{1}{4}})^{\frac{1}{2}}+ C_2 \left( N^{\frac{1}{4}}g(N^{\frac{1}{4}})\right) ^{-1} \end{aligned}$$
(62)

This vanishes as \(N \rightarrow \infty \); combined with (60) and the fact that \(\beta \rightarrow \infty \), it proves the first part of Theorem 2.3 as well. \(\square \)

5 Conclusions and open problems

The analysis carried out in this paper unveils several lines of investigation that we believe are rather interesting.

It would be useful to identify the minimal requirements on the distribution of the couplings for the minimum and maximum per particle to converge to their expected values exponentially fast.

One feature we find particularly intriguing is the relationship between the elements of J contributing to the minimum and those contributing to the maximum of H, discussed in Conjecture 3.3. We think that a rigorous understanding of the relative sizes of the blocks of J and of the correlation between the rows belonging to the minimizer and to the maximizer of H could significantly improve our understanding of the problem.

A better understanding would, in turn, provide useful tools to evaluate the heuristic algorithms used to tackle QUBO.

Finally, it would be interesting to study the problem when the \(J_{i,j}\) are not independent, especially in the case where the interaction graph is not the complete graph.