Abstract
Quadratic Unconstrained Binary Optimization (QUBO or UBQP) is concerned with maximizing/minimizing the quadratic form \(H(J, \eta ) = W \sum _{i,j} J_{i,j} \eta _{i} \eta _{j}\) with J a matrix of coefficients, \(\eta \in \{0, 1\}^N\) and W a normalizing constant. In the statistical mechanics literature, QUBO is a lattice gas counterpart to the (generalized) Sherrington–Kirkpatrick spin glass model. Finding the optima of H is an NP-hard problem. Several problems in combinatorial optimization and data analysis can be mapped to QUBO in a straightforward manner. In the combinatorial optimization literature, random instances of QUBO are often used to test the effectiveness of heuristic algorithms. Here we consider QUBO with random independent coefficients and show that if the \(J_{i,j}\)’s have zero mean and finite variance then, after proper normalization, the minimum and maximum per particle of H do not depend on the details of the distribution of the couplings and are concentrated around their expected values. Further, with the help of numerical simulations, we study the minimum and maximum of the objective function and provide some insight into the structure of the minimizer and the maximizer of H. We argue that this structure, too, is rather robust. Our findings also hold in the diluted case where each of the \(J_{i,j}\)’s is allowed to be zero with probability going to 1 as \(N \rightarrow \infty \) in a suitable way.
1 Introduction
We consider the quadratic form

$$\begin{aligned} H(J, \eta ) = W \sum _{i,j} J_{i,j} \eta _{i} \eta _{j} \end{aligned}$$(1)

where \(J \in \mathbb {R}^{N \times N}\) is a matrix of coefficients, \(\eta \in \mathcal {A}^N\) (with \(\mathcal {A}\) a finite subset of \(\mathbb {R}\)), \(\eta _{i}\) is the value of the i-th component of \(\eta \), and \(W \in \mathbb {R}\) is a suitable normalizing constant. This quadratic form plays an important role both in combinatorial optimization and statistical mechanics.
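For illustration, the following Python sketch evaluates H and finds its optima by exhaustive search on a tiny instance (our own toy example; brute force is viable only for small N, and the choices \(W = 1/\sqrt{N}\) and Gaussian couplings anticipate the setting studied below):

```python
import itertools
import numpy as np

def H(J, eta, W):
    """Quadratic form H(J, eta) = W * sum_{i,j} J[i,j] * eta[i] * eta[j]."""
    return W * (eta @ J @ eta)

def brute_force_optima(J, W):
    """Exhaustive search over all 2^N binary configurations (tiny N only)."""
    N = J.shape[0]
    configs = [np.array(c) for c in itertools.product([0, 1], repeat=N)]
    values = [H(J, eta, W) for eta in configs]
    i_min, i_max = int(np.argmin(values)), int(np.argmax(values))
    return configs[i_min], values[i_min], configs[i_max], values[i_max]

rng = np.random.default_rng(0)
N = 10
J = rng.standard_normal((N, N))
eta_min, h_min, eta_max, h_max = brute_force_optima(J, W=1 / np.sqrt(N))
```

Since the all-zero configuration has energy 0, the minimum is never positive and the maximum never negative.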
In statistical mechanics (1) is the Hamiltonian of several physical systems whose nature depends on the elements of \(\mathcal {A}\). For instance, if \(\mathcal {A} = \{-1, 1\}\) the Hamiltonian describes a system of (pairwise) interacting spins whereas if \(\mathcal {A} = \{0, 1\}\) it is generally used to describe a system of (pairwise) interacting particles. Spins or particles live on the vertices of some, possibly oriented, graph \(\mathcal {G} = \{V, E\}\), called the interaction graph, with \(|V| = N\) and J the weighted adjacency matrix of \(\mathcal {G}\). For each \((i,j) \in E\) the entry \(J_{i,j}\) represents the strength of the interaction between the entities (spins or particles) at vertices i and j of \(\mathcal {G}\). The microscopic state of the physical system is given by \(\eta \).
The physical system is described in terms of the probability of its microscopic states (Gibbs measure):

$$\begin{aligned} \pi _{\beta }(\eta ) = \frac{e^{-\beta H(J, \eta )}}{Z_{\beta }} \end{aligned}$$(2)

where the parameter \(\beta \) is called the inverse temperature and \(Z_{\beta }\) is a normalizing constant called partition function.
In combinatorial optimization the problem of minimizing (or maximizing) (1) when \(\mathcal {A} = \{0, 1\}\) and \(W = 1\) is known under the names Quadratic Unconstrained Binary Optimization (QUBO) or Unconstrained Binary Quadratic Programming (UBQP). QUBO is NP-hard and, in general, no polynomial time algorithm is known to find a minimizer of H. Many problems in combinatorial optimization and data analysis can be mapped to QUBO in a straightforward manner (see [14] for a survey). Even constrained optimization problems can be reduced to QUBO by introducing quadratic penalties in the objective function.
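As an example of the penalty reduction just mentioned, the following sketch (the function name is ours) embeds the cardinality constraint \(\sum _i \eta _i = k\) into the coefficient matrix, using the identity \(\eta _i^2 = \eta _i\) for binary variables:

```python
import numpy as np

def add_cardinality_penalty(Q, k, P):
    """Return (Q', offset) such that, for binary eta,
    eta @ Q' @ eta + offset = eta @ Q @ eta + P * (eta.sum() - k)**2.

    Expanding the penalty with eta_i**2 = eta_i gives
    P*(sum_i eta_i - k)**2 = P * sum_{i,j} eta_i eta_j
                             - 2*k*P * sum_i eta_i + P*k**2,
    so the constraint becomes a shift of the QUBO coefficients
    plus a constant offset."""
    N = Q.shape[0]
    Qp = Q + P * np.ones((N, N)) - 2.0 * k * P * np.eye(N)
    return Qp, P * k * k
```

For a penalty weight P larger than the energy gain obtainable by violating the constraint, every minimizer of the penalized form satisfies the constraint exactly.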
Minimizers (ground states) of forms like H in (1) are of relevance in the context of statistical mechanics as well. Indeed ground states are the ones with the highest probability with respect to the probability measure (2). As the temperature of the physical system approaches zero (\(\beta \rightarrow \infty \)), the system will eventually reach its ground state.
If the signs of the entries of J are disordered, i.e. if it is not possible to find the global minimizer with a local criterion, finding the ground states of the system is non-trivial.
In the context of statistical mechanics, there is a vast literature concerning the properties of the ground states of H of the form (1). Here we recall the comprehensive texts by Parisi, Mezard, and Virasoro [16] (for a more physical perspective) and more recently by Talagrand [23, 24] and Panchenko [18]. The recent book [7] provides an up-to-date overview.
The system that has attracted the most effort from physicists, and that is described most extensively in the references above, is the so-called Sherrington–Kirkpatrick (SK) model, in which \(\eta \in \{-1, 1\}^{N}\), \(W = \frac{1}{\sqrt{N}}\) and the \(J_{i,j}\) are independent standard Gaussian random variables. More recently, again in the context of spin systems, the universality of the features of the SK model with respect to the distribution of the (independent) \(J_{i,j}\) has been proven, see [6, 8].
The \(\{0, 1\}\) counterpart to the Sherrington–Kirkpatrick model, whose Hamiltonian matches the objective function of a QUBO instance, has not received the same attention in the statistical mechanics literature. Many important results have been achieved in [17, 19], in which features of the minimizers and maximizers analogous to those of the SK model have been proven for the Hamiltonian (1) with \(J_{i,j}\) independent Gaussian random variables and a quite general choice of the set \(\mathcal {A}\), including the particle case \(\eta \in \{0, 1\}^{N}\). In particular it has been proven that the free energy of the system is close to its expectation with probability one (concentration) and that its thermodynamic limit exists.
The same problem, in the canonical ensemble, i.e. with a fixed number of particles, has been discussed in [12]. There, for a general distribution of the \(J_{i,j}\), interesting features of the system have been found, above all at small particle density.
Restricting to the case \(\eta \in \{0, 1\}^N\) and \(J_{i,j}\) independent Gaussian random variables, [21] provides, with very different and more elementary techniques, an almost sure lower bound for the minimum per particle of H, together with a different proof of concentration.
In this paper we will show, in the spirit of [6], that the results of [21] are rather robust with respect to the distribution of the \(J_{i,j}\)’s. In particular, we consider the case of independent \(J_{i,j}\) with \(\mathbb {E}\left[ J_{i,j}\right] = 0\). If the tails of the distributions of the \(J_{i,j}\) are not too fat, then, after proper normalization, the value of the minimum of H is close to its expectation with high probability and does not depend on the actual distribution of the \(J_{i,j}\). Further, with the help of numerical simulations, we provide some insight into the structure of both the minimizer and the maximizer of H and show that this structure is also robust. Note that our results hold in the diluted case as well, that is, in the case where each \(J_{i,j}\) is zero with a certain probability. This probability need not be fixed, but is allowed to go to 1 as \(N \rightarrow \infty \) in a suitable way.
Rigorous results are presented in Sect. 2, whereas the numerical findings concerning the structure of the minimizer and the maximizer of H are detailed in Sect. 3. Proofs are given in Sect. 4.
Throughout the paper, we use the jargon of statistical mechanics. As a consequence, we use expressions like (particle) configuration when referring to \(\eta \), number of particles when referring to N, Hamiltonian for the quadratic form (1) and energy and energy per particle (of a configuration \(\eta \)) for, respectively, \(H(J, \eta )\) and \(\frac{H(J, \eta )}{N}\). Likewise, the minimizer of (1) is often referred to as the ground state of H, and we use the expression thermodynamic limit to denote the limit as \(N \rightarrow \infty \).
Remark 1.1
Note that H denotes a family of random variables indexed by N. However, we do not write this dependence explicitly to lighten the notation.
2 Main results
We consider random instances of QUBO, that is, we assume the matrix \(J = \{J_{i,j}\}_{1 \le i,j \le N}\) to be the realization of some multivariate random variable. Unless otherwise specified we will assume the \(J_{i,j}\)’s to be independent, identically distributed, and such that \(\mathbb {E}\left[ J_{i,j}\right] = 0\) and \({\text {Var}}\left( J_{i,j}\right) = \sigma ^2\). As for the value of the normalizing constant W, we take it to be such that the random variable \(W \sum _{i,j} J_{i,j}\) has variance N.
Remark 2.1
Note that, in general, the distribution of the \(J_{i,j}\) is allowed to have atoms. In particular, we will be interested in random variables taking the value zero with probability \(1 - p(N) = 1 - N^{\delta - 2}\) for \(\delta \) in \(\left( 1, 2\right] \). In this way we can include in our analysis the diluted case, that is the case where the matrix J of the coefficients of the objective function is “sparse” with expected density \(\rho = p(N)\). Note that, in this way, the average degree of each vertex grows unbounded with N. For \(W \sum _{i,j} J_{i,j}\) to have variance N one should set \(W = \frac{1}{\sqrt{\rho N {\text {Var}}\left( J_{1,1}\right) }}\).
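For concreteness, the diluted couplings and the corresponding normalization can be sampled as follows (a sketch in Python; the standard normal choice for the nonzero part matches the simulations of Sect. 3):

```python
import numpy as np

def diluted_couplings(N, delta, rng):
    """Sample a diluted coupling matrix: each J[i,j] is 0 with probability
    1 - p(N), with p(N) = N**(delta - 2), and standard normal otherwise."""
    p = N ** (delta - 2.0)                 # expected density rho = p(N)
    mask = rng.random((N, N)) < p
    J = np.where(mask, rng.standard_normal((N, N)), 0.0)
    # Normalization so that W * sum_{i,j} J[i,j] has variance N
    # (Var(J_11) = 1 for the nonzero, standard normal part):
    W = 1.0 / np.sqrt(p * N)
    return J, W
```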
In the remainder of the paper, we will use the following notation.
Let

$$\begin{aligned} \eta ^{\min } = \arg \min _{\eta \in \{0,1\}^{N}} H(J, \eta ) \quad \text {and}\quad \eta ^{\max } = \arg \max _{\eta \in \{0,1\}^{N}} H(J, \eta ), \end{aligned}$$

that is, \(\eta ^{\min }\) and \(\eta ^{\max }\) are, respectively, the minimizer and the maximizer of H.
Further, let

$$\begin{aligned} m_{\min , N} = \frac{H(J, \eta ^{\min })}{N} \quad \text {and}\quad m_{\max , N} = \frac{H(J, \eta ^{\max })}{N}. \end{aligned}$$

In words, \(m_{\min , N}\) and \(m_{\max , N}\) are, respectively, the minimum and the maximum per particle of H.
Moreover, setting \(|\eta | = \sum _{i = 1}^N \eta _{i}\) we call

$$\begin{aligned} \alpha _{\min , N} = \frac{|\eta ^{\min }|}{N} \quad \text {and}\quad \alpha _{\max , N} = \frac{|\eta ^{\max }|}{N} \end{aligned}$$

the proportion of ones in the minimizer and maximizer of H.
We remark that \(\eta ^{\min }\), \(\eta ^{\max }\), \(m_{\min , N}\), \(m_{\max , N}\), \(\alpha _{\min , N}\) and \(\alpha _{\max , N}\) are random variables that depend on the realization of J, but we do not write this dependence explicitly to lighten the notation.
We are interested in the behavior of \(m_{\min , N}\), \(m_{\max , N}\), \(\alpha _{\min , N}\) and \(\alpha _{\max , N}\) as \(N \rightarrow \infty \). When the \(J_{i,j}\) are Gaussian, the existence of the limits
-
a)
\(m_{\min } = \lim _{N \rightarrow \infty } m_{\min , N}\),
-
b)
\(m_{\max } = \lim _{N \rightarrow \infty } m_{\max , N}\),
-
c)
\(\alpha _{\min } = \lim _{N \rightarrow \infty } \alpha _{\min , N}\),
-
d)
\(\alpha _{\max } = \lim _{N \rightarrow \infty } \alpha _{\max , N}\)
follows from Theorems 1.1 and 1.2 in [17]. Reasonable numerical estimates for the quantities of interest are \(m_{\min } = m_{\max } \approx 0.42\) and \(\alpha _{\min } = \alpha _{\max } \approx 0.624\) (see Sect. 3 below).
We show that, as N gets larger, the fluctuations of both the minimum and maximum per particle of H around their expected values vanish provided some conditions on the tails of the \(J_{i,j}\) are satisfied. More precisely, we have the following
Theorem 2.2
Let the \(J_{i,j}\) be independent identically distributed random variables with \(\mathbb {E}\left[ J_{1,1}\right] = 0\) and \({\text {Var}}\left( J_{1,1}\right) = 1\). Then,
-
(a)
as \(N \rightarrow \infty \),
$$\begin{aligned} m_{\min , N} - \mathbb {E}\left[ m_{\min , N}\right] \overset{{\text {P}}}{\longrightarrow } 0 \end{aligned}$$(3)$$\begin{aligned} m_{\max , N} - \mathbb {E}\left[ m_{\max , N}\right] \overset{{\text {P}}}{\longrightarrow } 0. \end{aligned}$$(4)where \(\overset{{\text {P}}}{\longrightarrow }\) denotes convergence in probability.
Further,
-
(b)
If \(\mathbb {E}\left[ \left| J_{1,1}\right| ^{3}\right] < \infty \) the convergence in (3) and (4) is almost sure.
-
(c)
Suppose the \(J_{i,j}\) are strictly subgaussian, that is,
$$\begin{aligned} \mathbb {E}\left[ \exp (sJ_{1,1})\right] \le \exp \left( \frac{s^{2}}{2}\right) , \quad \forall s \in \mathbb {R}. \end{aligned}$$Then there exists a positive constant D such that, for all \(z > 0\)
$$\begin{aligned} {\text {P}}\left( \left| m_{\min , N} - \mathbb {E}\left[ m_{\min , N}\right] \right|> z \right) \le e^{-DNz}\, \text { and }\, {\text {P}}\left( \left| m_{\max , N} - \mathbb {E}\left[ m_{\max , N}\right] \right| > z \right) \le e^{-DNz} \end{aligned}$$that is, the convergence is exponentially fast.
The proof is provided in Sect. 4.1.
Moreover, the expected values of the minimum and the maximum per particle do not depend on the actual distribution of the random couplings, provided they have finite variance.
Theorem 2.3
Let \(J = \left\{ J_{i,j} \right\} _{1\le i,j\le N}\) and \(Y = \left\{ Y_{i,j} \right\} _{1\le i,j\le N}\) be two independent sequences of independent random variables such that, for every i, j, \(\mathbb {E}\left[ J_{i,j}\right] = \mathbb {E}\left[ Y_{i,j}\right] = 0\) and \(\mathbb {E}\left[ J_{i,j}^{2}\right] = \mathbb {E}\left[ Y_{i,j}^{2}\right] = 1\). Also, set \(\gamma := \max \left\{ \mathbb {E}\left[ \left| J_{i,j}\right| ^3\right] , \mathbb {E}\left[ \left| Y_{i,j}\right| ^3\right] , 1 \le i,j \le N\right\} \); \(\gamma \) may be infinite. Then we have, as \(N \rightarrow \infty \),
If furthermore \(\gamma < \infty \),
where C is a constant depending only on \(\gamma \).
The proof comes as a consequence of an analogous result on the universality of the free energy which is presented in Sect. 4.2.
As an immediate consequence of Theorem 2.3 and Theorem 2.2 (c), the minimum and maximum per particle obey the same almost sure lower bound as in the Gaussian case (see [21]), irrespective of the details of the distribution of the \(J_{i,j}\). More precisely, let \(\nu ^{-}(m) = |\{\eta : H(J, \eta ) \le -m N\}|\) denote the number of configurations whose energy is less than or equal to \(-m N\) and, similarly, let \(\nu ^{+}(m) = |\{\eta : H(J, \eta ) \ge m N\}|\) be the number of configurations with energy at least mN. Then:
Corollary 2.4
Let the \(J_{i,j}\)’s be independent identically distributed strictly subgaussian random variables with \(\mathbb {E}\left[ J_{1,1}\right] = 0\) and \({\text {Var}}\left( J_{1,1}\right) = 1\) and let \(I(x) = - x \log (x) - (1-x) \log (1 - x)\). Then, for large values of N and for some constant C, \({\text {P}}\left( \nu ^{-}(m^{\star }) > 0\right) \le e^{-CN}\) and \({\text {P}}\left( \nu ^{+}(m^{\star }) > 0\right) \le e^{-CN}\), where \(m^{\star } \approx 0.562\) is the extremal value of m such that the function \(I(\alpha ) - \frac{m^2}{2\alpha ^{2}(1 - \alpha )^{2}}\), for fixed m, has no zeros. This value is obtained for \(\alpha = \alpha ^{\star } \approx 0.644\).
3 Conjectures and numerical results
In this section, we present some numerical results concerning the minimum and maximum per particle and the structure of the minimizer and the maximizer of QUBO instances with random coefficients. The findings presented in Sect. 3.1 allow us to highlight some interesting features concerning the connection between particles contributing to the minimizer of H (that is, components of \(\eta ^{\min }\) equal to 1) and particles contributing to the maximizer of H (that is, components of \(\eta ^{\max }\) equal to 1) and the probability of the events \(\{\eta ^{\min }_{i} = 1\}\) and \(\{\eta ^{\max }_{i} = 1\}\). Simulations have been carried out using Monte Carlo Probabilistic Cellular Automata (PCA) like those introduced in [2, 3, 10, 21, 22]. The advantage of these algorithms lies in their inherently parallel nature, which allows the exploitation of parallel architectures at their fullest while preserving a solution quality comparable to that obtained with single spin flip MCMC algorithms, as outlined in [3, 13, 21]. Thanks to these algorithms we were able, in the diluted case, to simulate systems with N up to 128000 effectively.
In particular, the algorithm used in the simulations works as follows.
As a preliminary step, consider the symmetrized version \(\tilde{J}\) of J (that is, \(\tilde{J}=\frac{J + J^{T}}{2}\)) and note that the value of the Hamiltonian does not change if we replace J with \(\tilde{J}\). Let \(H(\eta , \tau ) = \beta \sum _{i}h_{i}(\eta )\tau _{i} + q\sum _{i}\left[ \eta _{i}(1-\tau _{i}) + \tau _{i}(1-\eta _{i}) \right] \) with \(h_{i}(\eta ) =\frac{1}{\sqrt{ N }}\sum _{j}\tilde{J}_{i,j}\eta _{j}\) and \(\beta \) and q two positive parameters, and choose the transition matrix to be \(P_{\eta , \tau } = \frac{e^{ -H(\eta , \tau ) }}{\sum _{\tau }e^{ -H(\eta ,\tau ) }}\). Rewriting \(P_{\eta , \tau }\) as
we immediately see that, conditionally on \(\eta \), the value of each \(\tau _{i}\) can be sampled independently with the following probabilities:

$$\begin{aligned} {\text {P}}\left( \tau _{i} = 1 \mid \eta \right) = \frac{e^{ -\beta h_{i} - q(1-\eta _{i}) }}{Z_{i}}, \qquad {\text {P}}\left( \tau _{i} = 0 \mid \eta \right) = \frac{e^{ -q\eta _{i} }}{Z_{i}}, \end{aligned}$$
with \(Z_{i} = e^{ -\beta h_{i} - q(1- \eta _{i})} + e^{ - q\eta _{i} }\). Then, at least in principle, each component of the configuration could be updated on a dedicated core.
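For concreteness, one full parallel sweep of this update can be sketched in Python as follows (our own minimal outline, not the GPU production code used for the simulations; the function name pca_sweep is ours):

```python
import numpy as np

def pca_sweep(J_tilde, eta, beta, q, rng):
    """One parallel PCA update: given the current configuration eta, all
    components of the new configuration tau are sampled independently with
      P(tau_i = 1 | eta) = exp(-beta*h_i - q*(1 - eta_i)) / Z_i
      P(tau_i = 0 | eta) = exp(-q*eta_i) / Z_i
    where h_i = (1/sqrt(N)) * sum_j J_tilde[i,j] * eta[j]."""
    N = len(eta)
    h = (J_tilde @ eta) / np.sqrt(N)        # local fields: one matrix-vector product
    w1 = np.exp(-beta * h - q * (1 - eta))  # weight of tau_i = 1
    w0 = np.exp(-q * eta)                   # weight of tau_i = 0
    return (rng.random(N) < w1 / (w0 + w1)).astype(int)
```

Iterating the sweep while tracking the lowest energy seen gives a simple heuristic minimizer.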
Note that, by the symmetry of \(\tilde{J}\), we have that \(\frac{\sum _{\eta }e^{ -H(\eta ,\tau ) }}{\sum _{\eta ,\tau }e^{ -H(\eta ,\tau ) }}\) is the reversible measure of \(P_{\eta ,\tau }\). In addition, observe that \(H(\eta , \eta ) = \beta H(\eta )\). The intuitive rationale for why the heuristic algorithm is expected to work is the following. As q gets large, the weight of pairs \((\eta , \tau )\) with \(\eta \ne \tau \) in the stationary measure becomes smaller, and the stationary measure of the algorithm becomes closer to the Gibbs measure \(\frac{e^{ -\beta H(\eta ) }}{Z}\) with Hamiltonian H and inverse temperature \(\beta \). A comparison between this algorithm and the Metropolis one is provided in [21].
The computation of the local fields \(h_{i}\) proves to be the computationally expensive part of the algorithm. However, the vector of local fields \(\{ h_{i}(\eta ) \}_{i = 1, \dots , N}\) is the matrix–vector product \(J \cdot \eta \) and can be computed using highly optimized linear algebra libraries and exploiting a parallel computing device such as a GPU. Further, observe that k simulations can be carried out in parallel, possibly with different parameters \(\beta \) and q. Let \(\textbf{E}\) be the \(N \times k\) matrix whose columns contain the k configurations to be updated. In this case, the entire collection of vectors of fields \(\{ h_{i}(\eta ^{(m)} ) \}_{i = 1, \dots , N; m = 1, \dots , k}\) (encoded, again, as an \(N \times k\) matrix) is the matrix–matrix product \(J \cdot \textbf{E}\). This product, too, can be computed effectively using highly optimized libraries and is, in general, substantially faster than computing k separate matrix–vector products.
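A sketch of this batched scheme, where the k replicas are the columns of \(\textbf{E}\) and each replica may carry its own parameters (the names pca_sweep_batch, betas and qs are ours):

```python
import numpy as np

def pca_sweep_batch(J_tilde, E, betas, qs, rng):
    """Update k configurations (columns of E) in one shot.

    The N x k matrix of local fields is computed as a single matrix
    product J_tilde @ E instead of k separate matrix-vector products;
    replica m uses its own parameters betas[m], qs[m] (broadcast over
    the columns)."""
    N, k = E.shape
    Hf = (J_tilde @ E) / np.sqrt(N)           # all local fields at once
    w1 = np.exp(-betas * Hf - qs * (1 - E))   # weight of tau_i = 1, per replica
    w0 = np.exp(-qs * E)                      # weight of tau_i = 0, per replica
    return (rng.random((N, k)) < w1 / (w0 + w1)).astype(int)
```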
We remark that the values we found are heuristic and we have no guarantee that they coincide with the true minimum and maximum of H(J). However, as already discussed in more detail in [21], the exact algorithm introduced in [20] can be exploited to compute the exact minima and maxima of H(J) for values of N up to 150. The absolute values of the minimum and the maximum per particle of H(J) are steadily close to 0.42 already for values of N between 80 and 150. Moreover, for instances with sizes up to 150, we verified that our algorithm typically finds the same optimum as the exact algorithm. For the larger instances, we retained the minimum and maximum found in a set of runs of the algorithm with different pairs of values of \(\beta \) and q.
3.1 Minimum and maximum per particle
The standard normal case has been investigated extensively in [21]: already for relatively small values of N, both \(m_{\min , N}\) and \(m_{\max , N}\) oscillate around a value of about 0.42, whereas \(\alpha _{\min , N}\) and \(\alpha _{\max , N}\) very rapidly approach a value of about 0.624 (see Fig. 1). Theorem 2.3 states that both \(m_{\min , N}\) and \(m_{\max , N}\) are robust with respect to the distribution of the \(J_{i,j}\). Numerical simulations show that this robustness extends to the proportion of ones in both the minimizer and the maximizer of H.
Figure 2 shows the behavior of \(m_{\min , N}, m_{\max , N}, \alpha _{\min , N}, \alpha _{\max , N}\) for \(J_{i,j}\) with (shifted) exponential distribution. Even though the exponential distribution is rather asymmetric and not subgaussian, already for relatively small values of N the average energy per particle and the proportion of ones in both the minimizer and the maximizer approach those of the standard normal case.
In our simulations, we also considered the diluted case, that is, we took \(J_{i,j}\) to be zero with probability \(1 - p_{\delta }(N)\), where \(p_{\delta }(N) = N^{\delta - 2}\), and a standard normal random variable otherwise. Findings concerning the behavior of \(m_{\min , N}, m_{\max , N}, \alpha _{\min , N}, \alpha _{\max , N}\) are presented in Figs. 3 and 4. The log-log plot of Fig. 5 suggests that \(m_N\) converges to \(\bar{m}\) as a power of N.
We highlight that as \(\delta \) becomes smaller, the density of J becomes very small. To give an idea, values of the density of J are given in Table 1. For instance, with \(N = 128000\) and \(\delta = 1.1\) the expected number of nonzero entries in each row of J is less than 4.
In the operations research literature, it is common to test optimization algorithms on random instances of QUBO where the \(J_{i,j}\) have a uniform distribution (see, e.g., [1, 15, 25, 26]) and where the matrix J is possibly sparse. The best-known maxima for some benchmark QUBO instances with uniformly distributed \(J_{i,j}\) are reported in Table 2. It is apparent that in these cases, too, the values of the optima per particle agree with those of the standard normal case.
3.2 Structure of minimizer and maximizer
Observe that, for any realization of J, the minimizer and the maximizer of the Hamiltonian are unique with probability 1, since the distribution of the couplings is absolutely continuous. It is then possible to partition the indices \(1, 2, \ldots , N\) into four sets:
-
\(I_{1} = \{i: \eta ^{\min }_{i} = 1, \; \eta ^{\max }_{i} = 0\}\);
-
\(I_{2} = \{i: \eta ^{\min }_{i} = 1, \; \eta ^{\max }_{i} = 1\}\);
-
\(I_{3} = \{i: \eta ^{\min }_{i} = 0, \; \eta ^{\max }_{i} = 1\}\);
-
\(I_{4} = \{i: \eta ^{\min }_{i} = 0, \; \eta ^{\max }_{i} = 0\}\).
To refer properly to the cardinality of these sets we give the following
Definition 3.1
\(\alpha _{k,N}:= \frac{ |I_{k}| }{ N }, \, k = 1, 2, 3, 4\).
With an abuse of notation, we say that the i-th row (column) of J belongs to \(I_{k}\) if \(i \in I_{k}\). Note that \(\alpha _{1,N}\) can be interpreted as the proportion of ones appearing in the minimizer but not in the maximizer of H. Similar interpretations hold for \(\alpha _{2,N}\), \(\alpha _{3,N}\) and \(\alpha _{4,N}\).
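Given \(\eta ^{\min }\) and \(\eta ^{\max }\), the partition and the proportions \(\alpha _{k,N}\) are straightforward to compute; a minimal sketch (the function name is ours):

```python
import numpy as np

def partition_indices(eta_min, eta_max):
    """Split indices 0..N-1 into I1..I4 according to the pair
    (eta_min[i], eta_max[i]) and return the proportions alpha_k = |I_k|/N."""
    eta_min, eta_max = np.asarray(eta_min), np.asarray(eta_max)
    N = len(eta_min)
    I1 = np.where((eta_min == 1) & (eta_max == 0))[0]
    I2 = np.where((eta_min == 1) & (eta_max == 1))[0]
    I3 = np.where((eta_min == 0) & (eta_max == 1))[0]
    I4 = np.where((eta_min == 0) & (eta_max == 0))[0]
    alphas = np.array([len(I) / N for I in (I1, I2, I3, I4)])
    return (I1, I2, I3, I4), alphas
```

By construction the proportions sum to 1, and \(\alpha _{\min ,N} = \alpha _{1,N} + \alpha _{2,N}\), \(\alpha _{\max ,N} = \alpha _{2,N} + \alpha _{3,N}\).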
Remark 3.2
With the definition of \(\alpha _{k,N}\) given above it is immediate to see that \(\alpha _{\min ,N} = \alpha _{1,N} + \alpha _{2,N}\) and \(\alpha _{\max ,N} = \alpha _{2,N} + \alpha _{3,N}\).
Leveraging on the definition of \(I_{k}\), it is possible to partition J into 16 blocks
Clearly, \(J[I_{k}, I_{\ell }]\) is a submatrix of J with \(N\alpha _{k,N}\) rows and \(N\alpha _{\ell ,N}\) columns.
Numerical simulations suggest that the average value and the variance of the entries in each block (subject to proper normalization), as well as the relative sizes of the blocks, converge to deterministic limits as \(N \rightarrow \infty \). These limits do not depend on the distribution of the \(J_{i,j}\)’s, as long as they have expected value zero and variance 1. More precisely, we make the following
Conjecture 3.3
Let \(J_{i,j}\) be independent identically distributed random variables with \(\mathbb {E}\left[ J_{1,1}\right] = 0\) and \({\text {Var}}\left( J_{1,1}\right) = 1\). Then there exist constants \(\alpha _{1}, \alpha _{2}, \alpha _{3}, \alpha _{4}\) such that
-
1.
\(\lim \limits _{N\rightarrow \infty } \alpha _{1,N} = \alpha _{1}\), \(\lim \limits _{N\rightarrow \infty } \alpha _{2,N} = \alpha _{2}\), \(\lim \limits _{N\rightarrow \infty } \alpha _{3,N} = \alpha _{3}\), \(\lim \limits _{N\rightarrow \infty } \alpha _{4,N} = \alpha _{4}\);
-
2.
\(\alpha _{1} = \alpha _{2} = \alpha _{3} = \frac{1}{2}\alpha _{\min } = \frac{1}{2}\alpha _{\max }\); \(\alpha _{4} = 1 - \frac{3}{2}\alpha _{\min }\)
-
3.
\({\text {Var}}\left( J[k, \ell ]_{i,j}\right) = 1\) for \(k, \ell = 1, \ldots , 4\)
Further, set \(\mu [k, \ell ] = \mathbb {E}\left[ J[k,\ell ]_{i,j}\right] \) and denote by \(\tilde{\sigma }[k,\ell ]\) the (suitably normalized) covariance of the entries of block \(J[k,\ell ]\).
Computation of averages over 10000 realizations of J with standard normal distribution and \(N = 1024\) yielded the following estimates for \(\alpha _{i}\), \(\mu [k,\ell ]\) and \(\tilde{\sigma }[k,\ell ]\).
These values suggest that the random couplings in blocks [2, 2], [3, 1], [4, 2], [1, 3], and [2, 4] are negatively correlated, the random couplings in blocks [1, 4], [3, 4], [4, 1], and [4, 3] are positively correlated, whereas the random couplings in the remaining blocks are roughly independent.
Very similar values can be obtained in the diluted case and for \(J_{i,j}\) with distributions other than the normal.
Note that it is always possible to relabel the indices \(1, 2, \ldots , N\) so that, in J, indices in \(I_{1}\) appear “first”, indices in \(I_{2}\) appear “second”, in \(I_{3}\) appear “third” and indices in \(I_{4}\) appear “last”. With this relabelling, a graphical representation of Conjecture 3.3 is provided in Fig. 6.
3.3 Probability a particle belongs to the minimizer/maximizer of H
The sets \(I_{1}, \ldots , I_{4}\) introduced above are random sets depending on the realization of J. The problem of finding \(\eta ^{\min }\) and \(\eta ^{\max }\) can be restated as the problem of determining the sets \(I_{\min } = I_{1} \cup I_{2}\) and \(I_{\max } = I_{2} \cup I_{3}\). Though, as already mentioned, this problem is NP-hard, we tried to assess numerically whether it is possible to determine the probability that a given index i belongs to either \(I_{\min }\) or \(I_{\max }\).
To this aim, consider the matrix \(\tilde{J} = \frac{J + J^{T}}{2}\), that is, the symmetrized version of J, and the function \(\tilde{H}\) obtained from H by replacing J with \(\tilde{J}\). Note that \(H(J, \eta ) = \tilde{H}(\eta )\) for all \(\eta \) and, consequently, the values of \(\eta ^{\min }_{i}\) and \(\eta ^{\max }_{i}\) do not depend on whether J or \(\tilde{J}\) is considered.
Take the matrix \(\tilde{J}\) and define \(R_{i} = \sum _{j = 1}^{N} \tilde{J}_{i, j}\), that is, R is the vector of the row sums of \(\tilde{J}\). Let [i] be the index of the i-th smallest element of R. We say that the [i]-th row (column) of \(\tilde{J}\) belongs to the minimizer (respectively the maximizer) of H if \(\eta ^{\min }_{[i]} = 1\) (respectively \(\eta ^{\max }_{[i]} = 1\)).
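The ranking just introduced can be computed as follows (a small sketch; the function name is ours, and ranks are returned already normalized by N):

```python
import numpy as np

def row_rank_positions(J):
    """Return, for each index i, the normalized rank [i]/N of row i when the
    rows of the symmetrized matrix are sorted by their sums, smallest first."""
    Jt = (J + J.T) / 2.0
    R = Jt.sum(axis=1)                   # vector of row sums of J_tilde
    order = np.argsort(R)                # order[r] = index of the (r+1)-th smallest row sum
    ranks = np.empty(len(R), dtype=float)
    ranks[order] = np.arange(1, len(R) + 1) / len(R)
    return ranks
```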
Numerical simulations show that the probability that the [i]-th row of \(\tilde{J}\) belongs to the minimizer (respectively maximizer) of H is a decreasing (respectively increasing) deterministic function of \(\frac{[i]}{N}\). This function is expected not to depend on the actual distribution of J. Further, we expect a positive fraction of the smallest (largest) rows of \(\tilde{J}\) (where smallest/largest refers to the row sums of \(\tilde{J}\)) to belong to the minimizer (maximizer) of H with positive probability (see Fig. 7). More precisely, we have the following
Conjecture 3.4
As \(N \rightarrow \infty \), \({\text {P}}\left( \eta ^{\min }_{[i]} = 1 \right) = f_{\min }\left( \frac{[i]}{N}\right) + o(1)\) with \(f_{\min }\) a decreasing function. Similarly, as \(N \rightarrow \infty \), \({\text {P}}\left( \eta ^{\max }_{[i]} = 1 \right) = f_{\max }\left( \frac{[i]}{N}\right) + o(1)\), with \(f_{\max }\left( \frac{[i]}{N}\right) = f_{\min }\left( 1 - \frac{[i]}{N}\right) \). Moreover, there exists \(\lambda _0 > 0\) such that \(f_{\min } (\lambda ) = 1\) for all \(\lambda < \lambda _0\). Finally, if the tails of the \(J_{i,j}\) decay sufficiently fast, the function \(f_{\min }\) does not depend on the distribution of the \(J_{i,j}\).
Finally, we observe that the random variables \(\mathbb {1}_{\{\eta ^{\min }_{[i]} = 1\}}\) and \(\mathbb {1}_{\{\eta ^{\max }_{[i]} = 1\}}\) appear to be positively correlated for all i. The qualitative behavior of the strength of the correlation between these two variables as a function of [i] is provided by the estimates of Fig. 8.
4 Proofs
We state first some general results that will be used below in the proofs of Theorem 2.2 and Theorem 2.3.
Let \(X = \{X_{1}, \ldots , X_{n}\}\) be independent random variables and let \(W = g\left( X_1, \ldots , X_i, \ldots , X_n\right) \) with g a measurable function. Further, let \(X^{\prime } = \{X_{1}^{\prime }, \ldots , X_{n}^{\prime }\}\) be an independent copy of \(X_{1}, \ldots , X_{n}\) and write
Define the random variable V by
which allows us to restate Efron–Stein’s theorem (see [11]) as follows
Theorem 4.1
(Efron-Stein)
From [5], we can bound the moment generating function with the following
Theorem 4.2
For all \(\theta >0\) and \(\lambda \in (0,\frac{1}{\theta })\),
Moreover, a straightforward consequence of [4, Theorem 2] yields
Theorem 4.3
To prove both Theorem 2.2 and Theorem 2.3 a key role is played by the free energy function \(F: \mathbb {R}^{N \times N} \rightarrow \mathbb {R}\) defined as

$$\begin{aligned} F(X) = \frac{1}{\beta } \log Z(X) \end{aligned}$$(20)

with \(Z(X):= \sum _{\eta } e^{\beta H(X, \eta )}\), \(H(X, \eta ) = \frac{1}{\sqrt{N}}\sum _{i,j} X_{i,j} \eta _{i} \eta _{j}\), and \(X = \{X_{i,j}\}_{1 \le i, j \le N}\) a collection of independent random variables (the matrix of the couplings). We do not write the dependence of \(F_{\beta }\) and Z on N to lighten the notation. Further, we set
where \(p(X, \eta )\) is the Gibbs measure of \(\eta \) and \(\langle \cdot \rangle _{X}\) denotes the expectation with respect to the Gibbs measure when the matrix of couplings is X.
In the next lemma, we determine bounds on the derivatives of F with respect to one of the couplings.
Lemma 4.4
Let \(F: \mathbb {R}^{N \times N} \rightarrow \mathbb {R}\) be as in (20). Then
Proof
At first, observe that a straightforward computation gives
Using (23), the bound on the first derivative of F with respect to \(X_{i,j}\) is readily obtained from the following
by observing that \(\langle \eta _{i}\eta _{j} \rangle _{X} \in [0, 1]\). To compute higher-order derivatives note that
which, in turn, yields,
and
Consequently,
The claim follows by observing that, in the interval [0, 1], \(x (1 - x) \le \frac{1}{4}\) and \(\left| x (1 - x) (1 - 2x)\right| \le \frac{1}{6\sqrt{3}}\). \(\square \)
4.1 Proof of Theorem 2.2
We give the proof for the convergence of \(m_{\max , N}\). The proof for \(m_{\min , N}\) is analogous.
In the rest of this section, we write \(\overline{H}(J):= \max _{\eta } H(J, \eta )\).
To prove the convergence in probability of \(m_{\max , N}\) to its expectation we use the following results which allow us to bound the variance of the free energy.
Assume the \(J_{i,j}\) are independent identically distributed random variables with expected value zero and variance 1. Let \(F = F_{\beta }(J) = \frac{1}{\beta } \log \sum _{\eta } e^{\frac{\beta }{\sqrt{N}} \sum _{i,j} J_{i,j} \eta _{i} \eta _{j}}\). The following theorem provides a bound on the variance of F.
Theorem 4.5
\({\text {Var}}\left( F\right) \le N\)
Proof
Let \(F_{i,j}\) be obtained from F by substituting \(J_{i,j}\) with \(J'_{i,j}\) (an independent copy of \(J_{i,j}\)). Then, by Lemma 4.4, \(\left| F - F_{i,j}\right| \le \frac{1}{\sqrt{N}} \left| J_{i,j} - J'_{i,j}\right| \). From Theorem 4.1 it follows
Note that this estimate is uniform in \(\beta \). \(\square \)
Since \(\overline{H}(J) = \lim _{\beta \rightarrow \infty } F_{\beta }(J)\), as an immediate consequence we get
Corollary 4.6
\({\text {Var}}\left( \overline{H}(J)\right) \le N\).
By Chebyshev’s inequality and the previous corollary we get
from which claim (a) follows.
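As a numerical sanity check of Corollary 4.6 (our own illustration, not part of the proof), the variance of \(\overline{H}(J)\) can be estimated by brute force for a small N, and it should stay below the bound N:

```python
import itertools
import numpy as np

def max_H(J):
    """Brute-force maximum of H(J, eta) = (1/sqrt(N)) * sum_ij J_ij eta_i eta_j."""
    N = J.shape[0]
    best = 0.0  # the all-zero configuration gives H = 0
    for c in itertools.product([0, 1], repeat=N):
        eta = np.array(c)
        best = max(best, float(eta @ J @ eta) / np.sqrt(N))
    return best

# Monte Carlo estimate of Var(max_H) over independent Gaussian instances
rng = np.random.default_rng(0)
N, samples = 8, 300
vals = [max_H(rng.standard_normal((N, N))) for _ in range(samples)]
var_hat = float(np.var(vals))
```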
To prove the other two claims, set \(V_{F} = \mathbb {E}\left[ \sum _{i,j} (F - F_{i,j})^{2} \mid J\right] \). We have
Let us examine the case \(\mathbb {E}\left[ \left| J_{1,1}\right| ^{3}\right] < \infty \).
Leveraging Jensen's inequality (applied to the convex function \(x \mapsto x^{3/2}\)), we have \(\left| \sum _1^n a_i\right| ^{\frac{3}{2}} \le \sqrt{n}\sum _{1}^{n} \left| a_i\right| ^{\frac{3}{2}}\). The latter inequality can be used to show that
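As a quick sanity check (ours), the inequality \(\left| \sum _1^n a_i\right| ^{\frac{3}{2}} \le \sqrt{n}\sum _{1}^{n} \left| a_i\right| ^{\frac{3}{2}}\) can be tested on random vectors:

```python
import random

random.seed(0)
# Check |sum a_i|^{3/2} <= sqrt(n) * sum |a_i|^{3/2} on random inputs.
for _ in range(1000):
    n = random.randint(1, 50)
    a = [random.uniform(-5, 5) for _ in range(n)]
    lhs = abs(sum(a)) ** 1.5
    rhs = (n ** 0.5) * sum(abs(x) ** 1.5 for x in a)
    assert lhs <= rhs + 1e-12
print("ok")
```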
Then, by Theorem 4.3 and for some constant C,
which, in turn, implies
Since (39) does not depend on \(\beta \), it holds for \(\overline{H}(J) = \lim _{\beta \rightarrow \infty } F_{\beta }(J)\) as well. Chebyshev's inequality and the Borel-Cantelli lemma allow us to conclude the proof of claim (b).
Consider now the case of strictly subgaussian \(J_{i,j}\)’s. Recall that for a centered strictly subgaussian random variable X with variance \(\sigma ^{2}\),
Also, \(\forall r \in \mathbb {N}\), \(\mathbb {E}\left[ X^{2r}\right] \le 2^{r+1}\sigma ^{2r}r!\), yielding
where the inequality in (41) follows from the fact that, for a nonnegative random variable (such as \(X^{2}\)), the central r-th moment is smaller than or equal to the r-th moment (recall that \(X^{2}\) has expected value \(\sigma ^{2}\)).
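For instance, in the Gaussian case one has \(\mathbb {E}\left[ X^{2r}\right] = \sigma ^{2r}(2r-1)!!\), and the stated moment bound can be checked directly (our sketch):

```python
from math import factorial

def double_factorial_odd(r):
    # (2r-1)!! = (2r)! / (2^r r!), the 2r-th moment of a standard Gaussian
    return factorial(2 * r) // (2 ** r * factorial(r))

# Verify E[X^{2r}] = (2r-1)!! <= 2^{r+1} r! for a standard Gaussian (sigma = 1).
for r in range(1, 30):
    assert double_factorial_odd(r) <= 2 ** (r + 1) * factorial(r)
print("ok")
```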
This result, together with (34), can be used to bound the moment-generating function of \(V_{F}\) as
for N large enough, where the last inequality follows from the fact that \(\left[ 1+\frac{1}{N}\frac{8 t^2}{N- 2t}\right] ^{N^2}\) is a decreasing sequence with the same limit, \(e^{8t^2}\), as \(\left[ 1 + \frac{8 t^2 }{N^2}\right] ^{N^2}\). Then, from Theorem 4.2 we get, for all \(\theta >0\) and \(\lambda \in (0, \frac{1}{\theta })\),
and hence, by the exponential Markov inequality,
for some \(a>0\) (obtained by optimizing over \(\lambda , \theta \)) and for every \(t > 0\). Setting \(t = Nz\), with \(z > 0\), we get
Since (47) also does not depend on \(\beta \), it holds for \(\overline{H}(J) = \lim _{\beta \rightarrow \infty } F_{\beta }(J)\) as well. Dividing by N we get claim (c). \(\square \)
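The monotonicity in N of the sequence appearing in the bound on the moment-generating function, and its common limit \(e^{8t^2}\) with \(\left[ 1 + \frac{8 t^2 }{N^2}\right] ^{N^2}\), can be verified numerically for a fixed t (our sketch):

```python
import math

t = 0.5  # any fixed t with N - 2t > 0 for the N's considered

def a(N):
    # the sequence [1 + (1/N) * 8t^2/(N - 2t)]^(N^2)
    return (1 + (1 / N) * (8 * t**2) / (N - 2 * t)) ** (N**2)

vals = [a(N) for N in (10, 100, 1000, 10000)]
# decreasing in N, approaching exp(8 t^2) from above
print(vals, math.exp(8 * t**2))
```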
4.2 Proof of Theorem 2.3
Using the bounds of Lemma 4.4 we can prove that the expectation of the free energy does not depend on the distribution of the couplings in the thermodynamic limit.
As for Theorem 2.3, let \(J = \left\{ J_{i,j} \right\} _{1\le i,j\le N}\) and \(Y = \left\{ Y_{i,j} \right\} _{1\le i,j\le N}\) be two independent sequences of independent random variables, such that for every i, j \(\mathbb {E}\left[ J_{i,j}\right] = \mathbb {E}\left[ Y_{i,j}\right] = 0\) and \(\mathbb {E}\left[ J_{i,j}^{2}\right] = \mathbb {E}\left[ Y_{i,j}^{2}\right] = 1\).
Theorem 4.7
Consider \(F_{\beta }(J)\) as in (20). Then
Proof
The proof is a straightforward adaptation of Chatterjee’s extension of Lindeberg’s argument for the central limit theorem. See [8, 9] for a comprehensive treatment.
Fix an arbitrary enumeration of the elements of the sequences and, for \(0 \le k \le n=N^{2}\), define
Now consider a Taylor expansion of \(F_{\beta }\) around \(J_{k} = 0\) and write
The bounds from Lemma 4.4 then give
Note that, for each k, the random variables \(J_{k}, Y_{k}\) and \(\textbf{W}_k\) are independent. Recalling that \(J_{k}\) and \(Y_{k}\) have both mean zero and variance 1, we get
which gives the estimate
If we furthermore assume \(\gamma < \infty \), we only need third-order Taylor bounds on \(R_{k}, S_{k}\), obtaining the more explicit bound
\(\square \)
Finally, we can prove Theorem 2.3. We write the proof for the maximum of H. The proof for the minimum can be done in the same way by replacing \(\beta \) with \(-\beta \).
Recall that we set \(\overline{H}(J) = \max _{\eta } H(J, \eta )\).
We have
getting the uniform bound
From Theorem 4.7
First assume \(\gamma < \infty \). From (57) and (60)
By choosing \(\beta =N^{1/6}\) we get the second part of the statement.
We now consider \(\gamma =\infty \). Define \(g(K):=\max _{i, j}\left( \mathbb {E}\left( J_{i,j}^2;\left| J_{i,j}\right| >K\right) , \mathbb {E}\left( Y_{i,j}^2;\left| Y_{i,j}\right| >K\right) \right) \). From the finite variance hypothesis, \(g(K) \rightarrow 0\) as \(K \rightarrow \infty \). In (56) we take \(\beta = g(N^{\frac{1}{4}})^{-\frac{1}{2}}\), so that \(\beta \rightarrow \infty \) as \(N \rightarrow \infty \). Observe also that \(\gamma =\infty \) implies \(Kg(K) \rightarrow \infty \) as \(K\rightarrow \infty \).
Estimate (56), with \(K=N^{\frac{1}{4}}\), now becomes
This bound vanishes as \(N \rightarrow \infty \), thus proving part 1 of Theorem 2.3 as well. \(\square \)
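As a numerical sanity check on the universality statement of Theorem 2.3 (our sketch, assuming the normalization \(W = N^{-3/2}\) for the maximum per particle), a brute-force computation at moderate N shows that Gaussian and Rademacher couplings yield nearly identical expected maxima:

```python
import numpy as np

# Brute-force illustration (ours): the expected maximum per particle of H is
# nearly the same for Gaussian and for Rademacher (+-1) couplings already at
# moderate N. We take m_max = max_eta eta^T J eta / N^{3/2}.
rng = np.random.default_rng(1)
N = 10
configs = np.array([[(s >> i) & 1 for i in range(N)] for s in range(2**N)],
                   dtype=float)

def m_max(J):
    values = np.einsum('ci,ij,cj->c', configs, J, configs)
    return values.max() / N**1.5

gauss = [m_max(rng.standard_normal((N, N))) for _ in range(300)]
rade = [m_max(rng.choice([-1.0, 1.0], size=(N, N))) for _ in range(300)]
print(np.mean(gauss), np.mean(rade))
```

The two empirical means should be close, with the residual gap shrinking as N grows, in agreement with the theorem.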
5 Conclusions and open problems
The analysis carried out in this paper unveils several lines of investigation that we believe are rather interesting.
It would be useful to identify the minimal requirements on the distribution of the couplings under which the minimum and maximum per particle converge to their expected values exponentially fast.
One feature we find particularly intriguing is the relationship between the elements of J contributing to the minimum and those contributing to the maximum of H discussed in Conjecture 3.3. We think that a rigorous understanding of the relative size of the blocks of J, and of the correlation between the rows of J associated with the minimizer and with the maximizer of H, could significantly improve our understanding of the problem.
A better understanding would, in turn, provide useful tools for evaluating the heuristic algorithms used to tackle QUBO.
Finally, it would be interesting to study the problem when the \(J_{i,j}\) are not independent, especially in the case where the interaction graph is not the complete graph.
References
Alidaee, B., Sloan, H., Wang, H.: Simple and fast novel diversification approach for the UBQP based on sequential improvement local search. Comput. Ind. Eng. 111, 164–175 (2017)
Apollonio, V., D’Autilia, R., Scoppola, B., Scoppola, E., Troiani, A.: Criticality of measures on 2-d Ising configurations: from square to hexagonal graphs. J. Stat. Phys. 177(5), 1009–1021 (2019)
Apollonio, V., D’Autilia, R., Scoppola, B., Scoppola, E., Troiani, A.: Shaken dynamics: an easy way to parallel Markov chain Monte Carlo. J. Stat. Phys. 189(3), 39 (2022)
Boucheron, S., Bousquet, O., Lugosi, G., Massart, P.: Moment inequalities for functions of independent random variables. Ann. Probab. 33(2), 514–560 (2005)
Boucheron, S., Lugosi, G., Massart, P.: Concentration inequalities using the entropy method. Ann. Probab. 31(3), 1583–1614 (2003)
Carmona, P., Hu, Y.: Universality in Sherrington-Kirkpatrick’s spin glass model. Ann. de l’Institut Henri Poincare (B) Probab. Stat. 42(2), 215–222 (2006)
Charbonneau, P., Marinari, E., Parisi, G., Ricci-Tersenghi, F., Sicuro, G., Zamponi, F., Mézard, M.: Spin Glass Theory and Far Beyond: Replica Symmetry Breaking After 40 Years. World Scientific (2023)
Chatterjee, S.: A simple invariance theorem. arXiv preprint math/0508213 (2005)
Chatterjee, S.: A generalization of the Lindeberg principle. Ann. Probab. 34(6), 2061–2076 (2006)
D’Autilia, R., Andrianaivo, L.N., Troiani, A.: Parallel simulation of two-dimensional Ising models using probabilistic cellular automata. J. Stat. Phys. 184, 1–22 (2021)
Efron, B., Stein, C.: The jackknife estimate of variance. Ann. Stat. 9(3), 586–596 (1981)
Erba, V., Krzakala, F., Ortiz, R.P., Zdeborová, L.: Statistical mechanics of the maximum-average submatrix problem. J. Stat. Mech: Theory Exp. 2024(1), 013403 (2024)
Fukushima-Kimura, B.H., Handa, S., Kamakura, K., Kamijima, Y., Kawamura, K., Sakai, A.: Mixing time and simulated annealing for the stochastic cellular automata. J. Stat. Phys. 190(4), 79 (2023)
Glover, F., Kochenberger, G., Yu, D.: Quantum bridge analytics I: a tutorial on formulating and using QUBO models. Ann. Oper. Res. 314, 141–183 (2019)
Liang, R.N., Anacleto, E.A.J., Meneses, C.N.: Data structures for speeding up tabu search when solving sparse quadratic unconstrained binary optimization problems. J. Heuristics 28(4), 433–479 (2022)
Mézard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company (1987)
Panchenko, D.: Free energy in the generalized Sherrington-Kirkpatrick mean field model. Rev. Math. Phys. 17(07), 793–857 (2005)
Panchenko, D.: The Sherrington-Kirkpatrick model: an overview. J. Stat. Phys. 149, 362–383 (2012)
Panchenko, D.: Free energy in the mixed \(p\)-spin models with vector spins. Ann. Probab. 46(2), 865–896 (2018)
Rendl, F., Rinaldi, G., Wiegele, A.: Solving max-cut to optimality by intersecting semidefinite and polyhedral relaxations. Math. Program. 121, 307–335 (2010)
Scoppola, B., Troiani, A.: Gaussian mean field lattice gas. J. Stat. Phys. 170(6), 1161–1176 (2018)
Scoppola, B., Troiani, A., Veglianti, M.: Shaken dynamics on the 3d cubic lattice. Electron. J. Probab. 27, 1–26 (2022)
Talagrand, M.: Mean field models for spin glasses: Volume I: Basic examples. Springer Science & Business Media (2010)
Talagrand, M.: Mean field models for spin glasses: Volume II: Advanced Replica-Symmetry and Low Temperature. Springer Science & Business Media (2011)
Waidyasooriya, H.M., Hariyama, M.: A GPU-based quantum annealing simulator for fully-connected Ising models utilizing spatial and temporal parallelism. IEEE Access 8, 67929–67939 (2020)
Wang, Y., Lü, Z., Glover, F., Hao, J.-K.: Probabilistic GRASP-tabu search algorithms for the UBQP problem. Comput. Oper. Res. 40(12), 3100–3107 (2013)
Acknowledgements
The authors are grateful to the Department of Mathematics and Physics of the University of Rome "Tre" which provided us with additional computing resources. AT thanks PRIN 2022 PNRR: "RETINA: REmote sensing daTa INversion with multivariate functional modeling for essential climAte variables characterization" (Project Number: P20229SH29, CUP: J53D23015950001) funded by the European Union under the Italian National Recovery and Resilience Plan (NRRP) of NextGenerationEU, under the Italian Ministry of University and Research (MUR). BS acknowledges the support of MUR Excellence Department Project MatMod@TOV awarded to the Department of Mathematics, University of Rome Tor Vergata, CUP E83C23000330006.
Funding
Open access funding provided by Università degli Studi di Perugia within the CRUI-CARE Agreement.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Isopi, M., Scoppola, B. & Troiani, A. On some features of quadratic unconstrained binary optimization with random coefficients. Boll Unione Mat Ital (2024). https://doi.org/10.1007/s40574-024-00433-8