Abstract
For a polynomial \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\), we define the partition function as the average of \(e^{\lambda f(x)}\) over all points \(x \in \{-1,1\}^{n}\), where \(\lambda \in \mathbb{C}\) is a parameter. We present a quasi-polynomial algorithm which, given such \(f\), \(\lambda\) and \(\epsilon > 0\), approximates the partition function within a relative error of \(\epsilon\) in \(N^{O(\ln n-\ln \epsilon)}\) time provided \(\vert \lambda \vert \leq (2L\sqrt{\deg f})^{-1}\), where \(L = L(f)\) is a parameter bounding the Lipschitz constant of \(f\) from above and \(N\) is the number of monomials in \(f\). As a corollary, we obtain a quasi-polynomial algorithm which, given such an \(f\) with coefficients \(\pm 1\) and such that every variable enters not more than four monomials, approximates the maximum of \(f\) on \(\{-1,1\}^{n}\) within a factor of \(O\left(\delta^{-1}\sqrt{\deg f}\right)\), provided the maximum is \(\delta N\) for some \(0 < \delta \leq 1\). If every variable enters not more than \(k\) monomials for some fixed \(k > 4\), we are able to establish a similar result when \(\delta \geq (k-1)/k\).
This research was partially supported by NSF Grant DMS 1361541.
1 Introduction and Main Results
1.1 Polynomials and Partition Functions
Let \(\{-1,1\}^{n}\) be the n-dimensional Boolean cube, that is, the set of all \(2^{n}\) vectors \(x = \left (\pm 1,\ldots,\pm 1\right )\), and let \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\) be a polynomial with complex coefficients. We assume that f is defined as a linear combination of square-free monomials:
$$\displaystyle{f(x)\ =\ \sum_{I\subset\{1,\ldots,n\}}\alpha_{I}x^{I}\quad\text{where}\quad x^{I}\ =\ \prod_{i\in I}x_{i}, \qquad(1)}$$
where we agree that \(x^{\emptyset} = 1\). As is known, the monomials \(x^{I}\) for \(I\subset\{1,\ldots,n\}\) constitute a basis of the vector space of functions \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\).
We introduce two parameters measuring the complexity of the polynomial f in (1). The degree of f is the largest degree of a monomial \(x^{I}\) appearing in (1) with a non-zero coefficient, that is, the maximum cardinality | I | such that \(\alpha_{I}\neq 0\):
$$\displaystyle{\deg f\ =\ \max\left\{\vert I\vert:\ \alpha_{I}\neq 0\right\}.}$$
We also introduce a parameter which controls the Lipschitz constant of f:
$$\displaystyle{L(f)\ =\ \max_{i=1,\ldots,n}\ \sum_{I:\ i\in I}\vert\alpha_{I}\vert.}$$
Indeed, if \(\mathop{\mathrm{dist}}\nolimits\) is the Hamming metric on the cube,
$$\displaystyle{\mathop{\mathrm{dist}}\nolimits(x,y)\ =\ \left\vert\left\{i:\ x_{i}\neq y_{i}\right\}\right\vert\quad\text{for}\quad x,y\in\{-1,1\}^{n},}$$
then
$$\displaystyle{\vert f(x)-f(y)\vert\ \leq\ 2L(f)\mathop{\mathrm{dist}}\nolimits(x,y)\quad\text{for all}\quad x,y\in\{-1,1\}^{n}.}$$
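The Lipschitz bound can be sanity-checked numerically on a small instance. The sketch below (a brute-force check in Python; the dictionary representation of f is ours) takes \(L(f)=\max_{i}\sum_{I:\,i\in I}\vert\alpha_{I}\vert\) — an assumption consistent with the role of \(L(f)\) described in the abstract — and verifies \(\vert f(x)-f(y)\vert\leq 2L(f)\mathop{\mathrm{dist}}(x,y)\) over the whole cube.

```python
from itertools import product

# A polynomial on {-1,1}^3, stored as {frozenset of indices: coefficient}.
f = {frozenset({0}): 1.0, frozenset({0, 1}): -2.0, frozenset({1, 2}): 0.5}
n = 3

def evaluate(poly, x):
    # x is a tuple of +/-1; the monomial x^I is the product of x_i over i in I.
    total = 0.0
    for I, a in poly.items():
        m = 1
        for i in I:
            m *= x[i]
        total += a * m
    return total

# L(f) = max_i sum_{I: i in I} |alpha_I|  (the parameter assumed here).
L = max(sum(abs(a) for I, a in f.items() if i in I) for i in range(n))

def dist(x, y):
    # Hamming distance on the cube.
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

# Verify |f(x) - f(y)| <= 2 L(f) dist(x, y) over all pairs of cube points.
cube = list(product([-1, 1], repeat=n))
assert all(abs(evaluate(f, x) - evaluate(f, y)) <= 2 * L * dist(x, y) + 1e-9
           for x in cube for y in cube)
```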
We consider { − 1, 1}n as a finite probability space endowed with the uniform measure.
For \(\lambda \in \mathbb{C}\) and a polynomial \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\), we introduce the partition function
$$\displaystyle{\mathbf{E}\,e^{\lambda f}\ =\ {1\over 2^{n}}\sum_{x\in\{-1,1\}^{n}}e^{\lambda f(x)}.}$$
Our first main result bounds from below the distance from the zeros of the partition function to the origin.
Theorem 1.1
Let \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\) be a polynomial and let \(\lambda \in \mathbb{C}\) be such that
$$\displaystyle{\vert\lambda\vert\ \leq\ {0.55\over L(f)\sqrt{\deg f}}.}$$
Then \(\mathbf{E}\,e^{\lambda f}\ \neq\ 0.\)
If, additionally, the constant term of f is 0 then
We prove Theorem 1.1 in Sect. 4. As a simple example, let \(f\left (x_{1},\ldots,x_{n}\right ) = x_{1} + \cdots + x_{n}\). Then
$$\displaystyle{\mathbf{E}\,e^{\lambda f}\ =\ \left(\cosh\lambda\right)^{n}.}$$
We have \(L(f) =\deg f = 1\) and Theorem 1.1 predicts that \(\mathbf{E}\,e^{\lambda f}\neq 0\) provided \(\vert\lambda\vert\leq 0.55\). Indeed, the root of \(\mathbf{E}\,e^{\lambda f}\) smallest in absolute value is \(\lambda =\pi i/2\), with \(\vert\lambda\vert =\pi/2\approx 1.57\). If we pick \(f(x_{1},\ldots,x_{n}) = ax_{1}+\cdots+ax_{n}\) for some real constant a > 0, then the root of \(\mathbf{E}\,e^{\lambda f}\) smallest in absolute value is \(\pi i/2a\), with \(\vert\lambda\vert\) inversely proportional to \(L(f)\), just as Theorem 1.1 predicts. It is not clear at the moment whether the dependence of the bound in Theorem 1.1 on \(\deg f\) is optimal.
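The identity \(\mathbf{E}\,e^{\lambda f}=\left(\cosh\lambda\right)^{n}\) for this example is easy to confirm numerically; the quick brute-force check below compares the average of \(e^{\lambda f}\) over the cube with \(\cosh^{n}\lambda\) for a small n.

```python
import math
from itertools import product

n = 4
lam = 0.3

# Average e^{lam * (x_1 + ... + x_n)} over all 2^n points of {-1,1}^n.
avg = sum(math.exp(lam * sum(x)) for x in product([-1, 1], repeat=n)) / 2 ** n

# The partition function factors over coordinates: E e^{lam f} = (cosh lam)^n.
assert abs(avg - math.cosh(lam) ** n) < 1e-12
```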
As we will see shortly, Theorem 1.1 implies that \(\mathbf{E}\,e^{\lambda f}\) can be efficiently computed if | λ | is strictly smaller than the bound in Theorem 1.1. When computing \(\mathbf{E}\,e^{\lambda f}\), we may assume that the constant term of f is 0, since
$$\displaystyle{\mathbf{E}\,e^{\lambda(f+c)}\ =\ e^{\lambda c}\,\mathbf{E}\,e^{\lambda f}\quad\text{for any constant}\quad c\in\mathbb{C},}$$
and hence adding a constant to f results in multiplying the partition function by a constant.
For a given f, we consider a univariate function
$$\displaystyle{\lambda\ \longmapsto\ \mathbf{E}\,e^{\lambda f}.}$$
As follows from Theorem 1.1, we can choose a branch of
$$\displaystyle{g(\lambda)\ =\ \ln\left(\mathbf{E}\,e^{\lambda f}\right)}$$
such that g(0) = 0. It follows that g(λ) is well-approximated by a low-degree Taylor polynomial at 0.
Theorem 1.2
Let \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\) be a polynomial with zero constant term and let
For a positive integer m ≤ 5n, let
be the degree m Taylor polynomial of g(λ) computed at λ = 0. Then for n ≥ 2
provided
In Sect. 3, we deduce Theorem 1.2 from Theorem 1.1.
As we discuss in Sect. 3.1, for a polynomial f given by (1), the value of \(T_{m}(f;\lambda)\) can be computed in \(nN^{O(m)}\) time, where N is the number of monomials in the representation (1). Theorem 1.2 implies that as long as \(\epsilon\gg e^{-n}\), by choosing \(m = O{\bigl (\ln n-\ln \epsilon \bigr )}\), we can compute the value of \(\mathbf{E}\,e^{\lambda f}\) within relative error ε in \(N^{O(\ln n-\ln\epsilon)}\) time, provided λ satisfies the inequality (2). For ε exponentially small in n, it is more efficient to evaluate \(\mathbf{E}\,e^{\lambda f}\) directly from the definition.
1.2 Relation to Prior Work
This paper is a continuation of a series of papers by the author [3, 4] and by the author and P. Soberón [5, 6] on algorithms to compute partition functions in combinatorics; see also [16]. The main idea of the method is that the logarithm of the partition function is well-approximated by a low-degree Taylor polynomial at temperatures above the phase transition (the role of the temperature is played by \(1/\lambda\)), while the phase transition is governed by the complex zeros of the partition function, cf. [15, 18].
The main work of the method consists of bounding the complex roots of the partition function, as in Theorem 1.1. While the general approach of this paper looks similar to the approach of [3–5] and [6] (martingale-type and fixed-point-type arguments), in each case bounding the complex roots requires some effort and new ideas. Once the roots are bounded, it is relatively straightforward to approximate the partition function as in Theorem 1.2.
Another approach to computing partition functions, also rooted in statistical physics, is the correlation decay approach; see [17] and [1]. While we did not pursue that approach, in our situation it could conceivably work as follows: given a polynomial \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{R}\) and a real λ > 0, we consider the Boolean cube as a finite probability space, where the probability of a point \(x\in\{-1,1\}^{n}\) is \(e^{\lambda f(x)}/\left(2^{n}\,\mathbf{E}\,e^{\lambda f}\right)\). This makes the coordinates \(x_{1},\ldots,x_{n}\) random variables. We consider a graph with vertices \(x_{1},\ldots,x_{n}\) and edges connecting two vertices \(x_{i}\) and \(x_{j}\) if there is a monomial of f containing both \(x_{i}\) and \(x_{j}\). This introduces a graph metric on the variables \(x_{1},\ldots,x_{n}\), and one could hope that if λ is sufficiently small, we have correlation decay: the random variable \(x_{i}\) is almost independent of the random variables sufficiently distant from \(x_{i}\) in the graph metric. This would allow us to efficiently approximate the probabilities \(\mathbf{P}\left(x_{i}=1\right)\) and \(\mathbf{P}\left(x_{i}=-1\right)\) and then recursively estimate \(\mathbf{E}\,e^{\lambda f}\).
While both approaches treat the phase transition as a natural threshold for computability, the concepts of phase transition in our method (complex zeros of the partition function) and in the correlation decay approach (non-uniqueness of Gibbs measures), though definitely related and even equivalent for some spin systems [8], are in general different.
Theorem 1.2 together with the algorithm of Sect. 3.1 below implies that to approximate \(\mathbf{E}\,e^{\lambda f}\) within a relative error of ε > 0, it suffices to compute the moments \(\mathbf{E}\,f^{k}\) for \(k = O\left (\ln \epsilon ^{-1}\right )\). This suggests some similarity with one of the results of [13], where (among other results) it is shown that the number of satisfying assignments of a DNF on n Boolean variables is uniquely determined by the numbers of satisfying assignments for all possible conjunctions of \(k \leq 1 +\log_{2} n\) clauses of the DNF (though this is a purely existential result with no algorithm attached). Each conjunction of the DNF can be represented as a polynomial
$$\displaystyle{\phi_{j}(x)\ =\ \prod_{i\in I_{j}^{+}}{1+x_{i}\over 2}\ \prod_{i\in I_{j}^{-}}{1-x_{i}\over 2},}$$
where \(I_{j}^{+}\), respectively \(I_{j}^{-}\), is the set of variables entering the j-th conjunction without, respectively with, negation, and we let
$$\displaystyle{f\ =\ \phi_{1}+\cdots+\phi_{N}.}$$
Then the number of points \(x\in\{-1,1\}^{n}\) such that f(x) > 0 is uniquely determined by the various expectations \(\mathbf{E}\,\phi _{j_{1}}\cdots \phi _{j_{k}}\) for \(k \leq 1 +\log_{2} n\). The probability that f(x) = 0 for a random point \(x\in\{-1,1\}^{n}\) sampled from the uniform distribution can be approximated by \(\mathbf{E}\,e^{-\lambda f}\) for a sufficiently large λ > 0. The expectations \(\mathbf{E}\,\phi _{j_{1}}\cdots \phi _{j_{k}}\) are precisely those that arise when we compute the moments \(\mathbf{E}\,f^{k}\). It is not clear at the moment whether the results of this paper can produce an efficient way to compute the number of satisfying assignments.
2 Applications to Optimization
2.1 Maximizing a Polynomial on the Boolean Cube
Let \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{R}\) be a polynomial with real coefficients defined by its monomial expansion (1). As is known, various computationally hard problems of discrete optimization, such as finding the maximum cardinality of an independent set in a graph, finding the minimum cardinality of a vertex cover in a hypergraph and solving the maximum constraint satisfaction problem, can be reduced to finding the maximum of f on the Boolean cube \(\{-1,1\}^{n}\); see, for example, [7].
The problem is straightforward if \(\deg f \leq 1\). If \(\deg f = 2\), it may already be quite hard even to solve approximately: given an undirected simple graph G = (V, E) with set V = {1, …, n} of vertices and set \(E \subset { V \choose 2}\) of edges, one can express the largest cardinality of an independent set (a set of vertices no two of which are connected by an edge of the graph) as the maximum of
$$\displaystyle{\sum_{i\in V}{1+x_{i}\over 2}\ -\ \sum_{\{i,j\}\in E}{\left(1+x_{i}\right)\left(1+x_{j}\right)\over 4}}$$
on the cube \(\{-1,1\}^{n}\). It is an NP-hard problem to approximate the size of the largest independent set in a given graph on n vertices within a factor of \(n^{1-\epsilon}\) for any \(0<\epsilon\leq 1\), fixed in advance [10, 19]. If \(\deg f = 2\) and f does not contain linear or constant terms, the problem reduces to the max cut problem in a weighted graph (with both positive and negative weights allowed on the edges), where there exists a polynomial time algorithm achieving an \(O(\ln n)\) approximation factor; see [14] for a survey.
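As an illustration of this reduction, the sketch below builds such a quadratic polynomial for a 4-cycle and checks by brute force that its maximum over \(\{-1,1\}^{4}\) equals the independence number. The specific encoding \(\sum_{i}(1+x_{i})/2-\sum_{\{i,j\}\in E}(1+x_{i})(1+x_{j})/4\) is a standard choice and an assumption of ours here: it counts \(\vert S\vert\) minus the number of edges inside \(S=\{i:\ x_{i}=1\}\), and its maximum is attained at an independent set.

```python
from itertools import product

# 4-cycle: vertices 0..3, independence number 2.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def f(x):
    # |S| minus the number of edges inside S, where S = {i : x_i = 1};
    # maximizing this over the cube yields the independence number.
    s = sum((1 + xi) / 2 for xi in x)
    s -= sum((1 + x[i]) * (1 + x[j]) / 4 for i, j in edges)
    return s

best = max(f(x) for x in product([-1, 1], repeat=n))
assert best == 2  # independence number of the 4-cycle
```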
If \(\deg f \geq 3\), no efficient algorithm appears to be known that would outperform choosing a random point \(x\in\{-1,1\}^{n}\). The maximum of a polynomial f with \(\deg f = 3\) and no constant, linear or quadratic terms can be approximated within an \(O{\bigl (\sqrt{n/\ln n}\bigr )}\) factor in polynomial time, see [14]. Finding the maximum of a general real polynomial (1) on the Boolean cube \(\{-1,1\}^{n}\) is equivalent to the problem of finding the maximum weight of a subset of a system of weighted linear equations over \(\mathbb{Z}_{2}\) that can be simultaneously satisfied [12]. Assuming that \(\deg f\) is fixed in advance, f contains N monomials and the constant term of f is 0, a polynomial time algorithm approximating the maximum of f within a factor of \(O(\sqrt{N})\) is constructed in [12]. More precisely, the algorithm from [12] constructs a point x such that f(x) is within a factor of \(O(\sqrt{N})\) of \(\sum_{I}\vert\alpha_{I}\vert\) for f defined by (1). If \(\deg f \geq 3\), it is unlikely that a polynomial time algorithm exists approximating the maximum of f within a factor of \(2^{(\ln N)^{1-\epsilon } }\) for any fixed 0 < ε ≤ 1 [12], see also [10].
Let us choose
as in Theorem 1.2. As is discussed in Sect. 3.2, by successive conditioning, we can compute in \(N^{O(\ln n-\ln\epsilon)}\) time a point \(y\in\{-1,1\}^{n}\) which satisfies
for any given 0 < ε ≤ 1.
How well does a point y satisfying (3) approximate the maximum value of f on the Boolean cube \(\{-1,1\}^{n}\)? We consider polynomials with coefficients − 1, 0 and 1, for which the problem of finding an \(x\in\{-1,1\}^{n}\) maximizing f(x) is equivalent to finding a vector in \(\mathbb{Z}_{2}^{n}\) satisfying the largest number of linear equations from a given list of linear equations over \(\mathbb{Z}_{2}\).
Theorem 2.1
Let
$$\displaystyle{f(x)\ =\ \sum_{I\in\mathcal{F}}\alpha_{I}x^{I}}$$
be a polynomial with zero constant term, where \(\mathcal{F}\) is a family of non-empty subsets of the set {1, …, n} and \(\alpha_{I}=\pm1\) for all \(I \in \mathcal{F}\) . Let
Suppose further that every variable x i enters at most four monomials x I for \(I \in \mathcal{F}\) . Then
Since \(\mathbf{E}\,f = 0\), the maximum of f is positive unless \(\mathcal{F} =\emptyset\) and f ≡ 0. It is not clear whether the restriction on the number of occurrences of variables in Theorem 2.1 is essential or an artifact of the proof. We can get a similar estimate for any number of occurrences provided the maximum of f is sufficiently close to \(\vert \mathcal{F}\vert\).
Theorem 2.2
Let
$$\displaystyle{f(x)\ =\ \sum_{I\in\mathcal{F}}\alpha_{I}x^{I}}$$
be a polynomial with zero constant term, where \(\mathcal{F}\) is a family of non-empty subsets of the set {1, …, n} and \(\alpha_{I}=\pm1\) for all \(I \in \mathcal{F}\) . Let k > 2 be an integer and suppose that every variable \(x_{i}\) enters at most k monomials \(x^{I}\) for \(I \in \mathcal{F}\) . If
then
We prove Theorems 2.1 and 2.2 in Sect. 5.
Let f be a polynomial as in Theorem 2.1 and suppose that, additionally, \(\vert I\vert\leq d\) for all \(I \in \mathcal{F}\), so that \(\deg f \leq d\). We have \(L(f)\leq 4\) and we choose
Let y ∈ {−1, 1}n be a point satisfying (3). Then
That is, if the maximum of f is at least \(\delta \vert \mathcal{F}\vert\) for some 0 < δ ≤ 1, we can approximate the maximum in quasi-polynomial time within a factor of \(O\left (\delta ^{-1}\sqrt{d}\right )\). Equivalently, if for some 0 < δ ≤ 0.5 there is a vector in \(\mathbb{Z}_{2}^{n}\) satisfying at least \((0.5+\delta )\vert \mathcal{F}\vert\) equations of a set \(\mathcal{F}\) of linear equations over \(\mathbb{Z}_{2}\), where each variable enters at most 4 equations, then in quasi-polynomial time we can compute a vector \(v \in \mathbb{Z}_{2}^{n}\) satisfying at least \((0.5 +\delta _{1})\vert \mathcal{F}\vert\) linear equations from the system, where \(\delta _{1} = \Omega (\delta ^{2}/\sqrt{d})\) and d is the largest number of variables per equation.
Similarly, we can approximate in quasi-polynomial time the maximum of f in Theorem 2.2 within a factor of \(O(k\sqrt{k})\) provided the maximum is sufficiently close to \(\vert \mathcal{F}\vert\), that is, at least \({k-1 \over k} \vert \mathcal{F}\vert\).
In Theorems 2.1 and 2.2, one can check in polynomial time whether the maximum of f is equal to \(\vert \mathcal{F}\vert\), as this reduces to testing the feasibility of a system of linear equations over \(\mathbb{Z}_{2}\). However, for any fixed 0 < δ < 1, testing whether the maximum is at least \(\delta \vert \mathcal{F}\vert\) is computationally hard, cf. [10].
Håstad [9] constructed a polynomial time algorithm that approximates the maximum of f within a factor of O(kd). In [2], see also [11], a polynomial algorithm is constructed that finds the maximum of f within a factor of \(e^{O(d)}\sqrt{k}\), provided f is an odd function. More precisely, the algorithm finds a point x such that f(x) is within a factor of \(e^{O(d)}\sqrt{k}\) from \(\vert \mathcal{F}\vert\).
3 Computing the Partition Function
3.1 Computing the Taylor Polynomial of \(g(\lambda ) =\ln \left (\mathbf{E}\,e^{\lambda f}\right )\)
First, we discuss how to compute the degree m Taylor polynomial \(T_{m}(f;\lambda)\) at λ = 0 of the function
$$\displaystyle{g(\lambda )\ =\ \ln \left (\mathbf{E}\,e^{\lambda f}\right ),}$$
see Theorem 1.2. Let us denote
$$\displaystyle{h(\lambda )\ =\ \mathbf{E}\,e^{\lambda f}.}$$
Then
$$\displaystyle{g(\lambda )\ =\ \ln h(\lambda )\quad\text{and hence}\quad g'(\lambda )\ =\ {h'(\lambda )\over h(\lambda )}.}$$
Therefore,
$$\displaystyle{h'(\lambda )\ =\ g'(\lambda )h(\lambda )\quad\text{and}\quad h^{(k)}(0)\ =\ \sum_{j=1}^{k}{k-1\choose j-1}g^{(j)}(0)h^{(k-j)}(0)\quad\text{for}\quad k\geq 1. \qquad(4)}$$
If we calculate the derivatives
$$\displaystyle{h^{(k)}(0)\quad\text{for}\quad k=1,\ldots,m, \qquad(5)}$$
then we can compute
$$\displaystyle{g^{(k)}(0)\quad\text{for}\quad k=1,\ldots,m}$$
by solving a non-singular triangular system of linear equations (4), which has h(0) = 1 on the diagonal. Hence our goal is to calculate the derivatives (5).
We observe that
$$\displaystyle{h^{(k)}(0)\ =\ \mathbf{E}\,f^{k}\quad\text{for}\quad k=1,\ldots,m.}$$
For a polynomial f defined by its monomial expansion (1) we have
$$\displaystyle{\mathbf{E}\,x^{I}\ =\ \begin{cases}1&\text{if}\ I=\emptyset,\\ 0&\text{otherwise,}\end{cases}}$$
so that \(\mathbf{E}\,f^{k}\) is the constant term of the monomial expansion of \(f^{k}\).
We can consecutively compute the monomial expansions of \(f, f^{2},\ldots,f^{m}\) by using the following multiplication rule for monomials on the Boolean cube \(\{-1,1\}^{n}\):
$$\displaystyle{x^{I}\cdot x^{J}\ =\ x^{I\Delta J},}$$
where \(I\Delta J\) is the symmetric difference of subsets I, J ⊂ {1, …, n}. It follows then that for a polynomial \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\) given by its monomial expansion (1) and a positive integer m, the Taylor polynomial \(T_{m}(f;\lambda)\) can be computed in \(nN^{O(m)}\) time, where N is the number of monomials in f.
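The procedure above is straightforward to prototype. In the sketch below (all names are ours), monomials are stored as frozen sets of indices, powers of f are computed with the symmetric-difference rule, the moments \(\mathbf{E}\,f^{k}\) are read off as constant terms, and the derivatives \(g^{(k)}(0)\) are recovered from those of \(h(\lambda)=\mathbf{E}\,e^{\lambda f}\) by our rendering of the triangular system, \(h^{(k)}(0)=\sum_{j=1}^{k}\binom{k-1}{j-1}g^{(j)}(0)h^{(k-j)}(0)\). The result is compared with the direct value \(\ln\mathbf{E}\,e^{\lambda f}\) on a small instance.

```python
import math
from itertools import product
from math import comb

def multiply(p, q):
    # Multiply two expansions on {-1,1}^n: x^I * x^J = x^(I symm-diff J).
    r = {}
    for I, a in p.items():
        for J, b in q.items():
            K = I ^ J
            r[K] = r.get(K, 0.0) + a * b
    return r

def taylor_of_log_partition(f, m):
    # Moments E f^k = constant term of the expansion of f^k, k = 1..m.
    power = {frozenset(): 1.0}
    h = [1.0]  # h[k] = h^{(k)}(0) = E f^k, with h[0] = E 1 = 1
    for _ in range(m):
        power = multiply(power, f)
        h.append(power.get(frozenset(), 0.0))
    # Triangular system: the coefficient of g^{(k)}(0) is h(0) = 1.
    g = [0.0]
    for k in range(1, m + 1):
        s = sum(comb(k - 1, j - 1) * g[j] * h[k - j] for j in range(1, k))
        g.append(h[k] - s)
    return lambda lam: sum(g[k] * lam ** k / math.factorial(k)
                           for k in range(1, m + 1))

# Small test: f = x_0 x_1 + x_1 x_2 on {-1,1}^3, zero constant term.
f = {frozenset({0, 1}): 1.0, frozenset({1, 2}): 1.0}
T = taylor_of_log_partition(f, m=12)
lam = 0.1
direct = math.log(sum(math.exp(lam * (x[0] * x[1] + x[1] * x[2]))
                      for x in product([-1, 1], repeat=3)) / 8)
assert abs(T(lam) - direct) < 1e-8
```

For this f the two monomial signs are independent, so \(\mathbf{E}\,e^{\lambda f}=\cosh^{2}\lambda\), which gives an exact closed form to check against.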
Our next goal is to deduce Theorem 1.2 from Theorem 1.1. The proof is based on the following lemma.
Lemma 3.1
Let \(p: \mathbb{C}\longrightarrow \mathbb{C}\) be a univariate polynomial and suppose that for some β > 0 we have
$$\displaystyle{p(z)\ \neq\ 0\quad\text{provided}\quad\vert z\vert\ \leq\ \beta.}$$
Let 0 < γ < β and for | z | ≤ γ, let us choose a continuous branch of
$$\displaystyle{g(z)\ =\ \ln p(z).}$$
Let
$$\displaystyle{T_{m}(z)\ =\ g(0)+\sum_{k=1}^{m}{z^{k}\over k!}\,{d^{k}\over dz^{k}}g(z)\Big\vert_{z=0}}$$
be the degree m Taylor polynomial of g(z) computed at z = 0. Then for
$$\displaystyle{\vert z\vert\ \leq\ \gamma}$$
we have
$$\displaystyle{\left\vert g(z)-T_{m}(z)\right\vert\ \leq\ {\left(\gamma/\beta\right)^{m+1}\over (m+1)\left(1-\gamma/\beta\right)}\,\deg p.}$$
Proof
Let \(n =\deg p\) and let \(\alpha_{1},\ldots,\alpha_{n}\) be the roots of p, so we may write
$$\displaystyle{p(z)\ =\ p(0)\prod_{i=1}^{n}\left(1-{z\over\alpha_{i}}\right)\quad\text{where}\quad\vert\alpha_{i}\vert\ >\ \beta\quad\text{for}\quad i=1,\ldots,n.}$$
Then
$$\displaystyle{g(z)\ =\ \ln p(0)+\sum_{i=1}^{n}\ln\left(1-{z\over\alpha_{i}}\right),}$$
where we choose the branch of the logarithm which is 0 when z = 0. Using the Taylor series expansion of the logarithm, we obtain
$$\displaystyle{g(z)\ =\ \ln p(0)-\sum_{i=1}^{n}\sum_{k=1}^{\infty}{1\over k}\left({z\over\alpha_{i}}\right)^{k},}$$
where the series converges absolutely since \(\vert z/\alpha_{i}\vert\leq\gamma/\beta<1\). Therefore,
$$\displaystyle{T_{m}(z)\ =\ \ln p(0)-\sum_{i=1}^{n}\sum_{k=1}^{m}{1\over k}\left({z\over\alpha_{i}}\right)^{k},}$$
where
$$\displaystyle{\left\vert g(z)-T_{m}(z)\right\vert\ \leq\ \sum_{i=1}^{n}\sum_{k=m+1}^{\infty}{1\over k}\left({\gamma\over\beta}\right)^{k}.}$$
It remains to notice that
$$\displaystyle{\sum_{k=m+1}^{\infty}{1\over k}\left({\gamma\over\beta}\right)^{k}\ \leq\ {1\over m+1}\left({\gamma\over\beta}\right)^{m+1}\sum_{j=0}^{\infty}\left({\gamma\over\beta}\right)^{j}\ =\ {\left(\gamma/\beta\right)^{m+1}\over(m+1)\left(1-\gamma/\beta\right)}.}$$
□
Next, we need a technical bound on the approximation of e z by its Taylor polynomial.
Lemma 3.2
Let ρ > 0 be a real number and let m ≥ 5ρ be an integer. Then
Proof
For all \(z \in \mathbb{C}\) such that | z | ≤ ρ, we have
Since m ≥ 5ρ, we obtain
and the proof follows. □
Proof of Theorem 1.2
Without loss of generality, we assume that L( f) = 1. Since the constant term of f is 0, for any \(x\in\{-1,1\}^{n}\) we have
$$\displaystyle{\vert f(x)\vert\ \leq\ \sum_{I\neq\emptyset}\vert\alpha_{I}\vert\ \leq\ \sum_{i=1}^{n}\sum_{I:\ i\in I}\vert\alpha_{I}\vert\ \leq\ nL(f)\ =\ n.}$$
Applying Lemma 3.2, we conclude that
provided | λ | ≤ 1. Let
be the degree 5n Taylor polynomial of the function λ ↦ E e λf at λ = 0. From (6) it follows that
From Theorem 1.1, we conclude that
and, moreover,
Applying Lemma 3.1 with
we conclude that for the Taylor polynomial of lnp(λ) at λ = 0,
we have
It remains to notice that the Taylor polynomials of degree m ≤ 5n of the functions
at λ = 0 coincide, since both are determined by the first m derivatives of respectively E e λf and p(λ) at λ = 0, cf. Sect. 3.1, and those derivatives coincide. The proof now follows by (7) and (8). □
3.2 Computing a Point y in the Cube with a Large Value of f( y)
We discuss how to compute a point \(y\in\{-1,1\}^{n}\) satisfying (3). We do it by successive conditioning, determining one coordinate of \(y = \left (\,y_{1},\ldots,y_{n}\right )\) at a time. Let \(F^{+}\) and \(F^{-}\) be the facets of the cube \(\{-1,1\}^{n}\) defined by the equations \(x_{n}=1\) and \(x_{n}=-1\) respectively, for \(x = \left (x_{1},\ldots,x_{n}\right )\), \(x\in\{-1,1\}^{n}\). Then \(F^{+}\) and \(F^{-}\) can be identified with the (n − 1)-dimensional cube \(\{-1,1\}^{n-1}\) and we have
$$\displaystyle{\mathbf{E}\,e^{\lambda f}\ =\ {1\over 2}\left(\mathbf{E}\,\left(e^{\lambda f}\vert F^{+}\right)+\mathbf{E}\,\left(e^{\lambda f}\vert F^{-}\right)\right).}$$
Moreover, for the restrictions \(f_{+}\) and \(f_{-}\) of f onto \(F^{+}\) and \(F^{-}\) respectively, considered as polynomials on \(\{-1,1\}^{n-1}\), we have
$$\displaystyle{\mathbf{E}\,\left(e^{\lambda f}\vert F^{+}\right)\ =\ \mathbf{E}\,e^{\lambda f_{+}}\quad\text{and}\quad\mathbf{E}\,\left(e^{\lambda f}\vert F^{-}\right)\ =\ \mathbf{E}\,e^{\lambda f_{-}}.}$$
Using the algorithm of Sect. 3.1 and Theorem 1.2, we compute \(\mathbf{E}\,\left (e^{\lambda f}\vert F^{+}\right )\) and \(\mathbf{E}\,\left (e^{\lambda f}\vert F^{-}\right )\) within a relative error of \(\epsilon/2n\) and choose the facet with the larger computed value: we let \(y_{n}=1\) if the value of \(\mathbf{E}\,\left (e^{\lambda f}\vert F^{+}\right )\) appears to be larger and \(y_{n}=-1\) if the value of \(\mathbf{E}\,\left (e^{\lambda f}\vert F^{-}\right )\) appears to be larger, and then proceed by conditioning on the value of \(y_{n-1}\). For polynomials with N monomials, the complexity of the algorithm is \(N^{O(\ln n)}\).
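For small n, the conditioning scheme can be simulated with exact conditional partition functions computed by brute force, standing in for the approximation algorithm of Sect. 3.1 (the example polynomial and all names below are ours; the order in which coordinates are fixed is a detail). With exact conditional values, \(e^{\lambda f(y)}\geq\mathbf{E}\,e^{\lambda f}\geq e^{\lambda\mathbf{E}\,f}=1\) by Jensen's inequality, so the point found satisfies f(y) ≥ 0 for a polynomial with zero constant term.

```python
import math
from itertools import product

def f(x):
    # Example polynomial with zero constant term on {-1,1}^4.
    return x[0] * x[1] + x[1] * x[2] - x[2] * x[3]

def cond_partition(fixed, n, lam):
    # Exact E(e^{lam f} | coordinates in `fixed` pinned), by brute force;
    # a stand-in for the approximation algorithm of Sect. 3.1.
    free = n - len(fixed)
    total = sum(math.exp(lam * f(tuple(fixed) + rest))
                for rest in product([-1, 1], repeat=free))
    return total / 2 ** free

def greedy_point(n, lam):
    # Fix one coordinate at a time, keeping the facet with the larger
    # conditional partition function.
    fixed = []
    for _ in range(n):
        plus = cond_partition(fixed + [1], n, lam)
        minus = cond_partition(fixed + [-1], n, lam)
        fixed.append(1 if plus >= minus else -1)
    return tuple(fixed)

n, lam = 4, 0.5
y = greedy_point(n, lam)
assert f(y) >= 0  # guaranteed since e^{lam f(y)} >= E e^{lam f} >= 1
```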
4 Proof of Theorem 1.1
To prove Theorem 1.1, we consider restrictions of the partition function onto faces of the cube.
4.1 Faces
A face \(F\subset\{-1,1\}^{n}\) consists of the points x where some of the coordinates of x are fixed at 1, some are fixed at − 1 and others are allowed to vary (a face is always non-empty). With a face F, we associate three subsets \(I^{+}(F), I^{-}(F), I(F)\subset\{1,\ldots,n\}\) as follows:
$$\displaystyle{\begin{array}{rl} I^{+}(F)\ =&\left\{i:\ x_{i}=1\ \text{for all}\ x\in F\right\},\\ I^{-}(F)\ =&\left\{i:\ x_{i}=-1\ \text{for all}\ x\in F\right\}\quad\text{and}\\ I(F)\ =&\{1,\ldots,n\}\setminus\left(I^{+}(F)\cup I^{-}(F)\right).\end{array}}$$
Consequently,
$$\displaystyle{\vert F\vert\ =\ 2^{\vert I(F)\vert}.}$$
In particular, if \(I^{+}(F)=I^{-}(F)=\emptyset\) and hence I(F) = {1, …, n}, we have \(F=\{-1,1\}^{n}\). We call the number
$$\displaystyle{\dim F\ =\ \vert I(F)\vert}$$
the dimension of F.
For a subset J ⊂ {1, …, n}, we denote by \(\{-1,1\}^{J}\) the set of all points
$$\displaystyle{x\ =\ \left(x_{j}:\ j\in J\right)\quad\text{where}\quad x_{j}=\pm1\quad\text{for all}\quad j\in J.}$$
Let \(F\subset\{-1,1\}^{n}\) be a face. For a subset J ⊂ I(F) and a point \(\epsilon\in\{-1,1\}^{J}\), \(\epsilon = \left (\epsilon _{j}:\ j \in J\right )\), we define
$$\displaystyle{F_{\epsilon}\ =\ \left\{x\in F:\ x_{j}=\epsilon_{j}\quad\text{for all}\quad j\in J\right\}.}$$
In words: \(F_{\epsilon}\) is obtained from F by fixing the coordinates from some set J ⊂ I(F) of free coordinates to 1 or to − 1. Hence \(F_{\epsilon}\) is also a face of \(\{-1,1\}^{n}\) and we think of \(F_{\epsilon}\subset F\) as a face of F. We can represent F as a disjoint union
$$\displaystyle{F\ =\ \bigcup_{\epsilon\in\{-1,1\}^{J}}F_{\epsilon}.}$$
4.2 The Space of Polynomials
Let us fix a positive integer d. We identify the set of all polynomials f as in (1) such that \(\deg f \leq d\) and the constant term of f is 0 with \(\mathbb{C}^{N}\), where
$$\displaystyle{N\ =\ \sum_{k=1}^{d}{n\choose k}.}$$
For δ > 0, we consider a closed convex set \(\mathcal{U}(\delta ) \subset \mathbb{C}^{N}\) consisting of the polynomials \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\) such that \(\deg f \leq d\) and L( f) ≤ δ. In other words, \(\mathcal{U}(\delta )\) consists of the polynomials
$$\displaystyle{f(x)\ =\ \sum_{1\leq\vert I\vert\leq d}\alpha_{I}x^{I}\quad\text{such that}\quad\sum_{I:\ i\in I}\vert\alpha_{I}\vert\ \leq\ \delta\quad\text{for}\quad i=1,\ldots,n.}$$
4.3 Restriction of the Partition Function onto a Face
Let \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\) be a polynomial and let \(F\subset\{-1,1\}^{n}\) be a face. We define
$$\displaystyle{\mathbf{E}\,\left(e^{\,f}\vert F\right)\ =\ {1\over\vert F\vert}\sum_{x\in F}e^{\,f(x)}. \qquad(9)}$$
We suppose that f is defined by its monomial expansion as in (1) and consider \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\) as a function of the coefficients α I . Using (9) we deduce
In what follows, we identify complex numbers with vectors in \(\mathbb{R}^{2} = \mathbb{C}\) and measure angles between non-zero complex numbers.
Lemma 4.1
Let 0 < τ ≤ 1 and δ > 0 be real numbers and let \(F\subset\{-1,1\}^{n}\) be a face. Suppose that for every \(f \in \mathcal{U}(\delta )\) we have \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\neq 0\) and, moreover, for any K ⊂ I(F) we have
$$\displaystyle{\left\vert\mathbf{E}\,\left(e^{\,f}\vert F\right)\right\vert\ \geq\ \left({\tau\over 2}\right)^{\vert K\vert}\sum_{\epsilon\in\{-1,1\}^{K}}\left\vert\mathbf{E}\,\left(e^{\,f}\vert F_{\epsilon}\right)\right\vert. \qquad(10)}$$
Given \(f \in \mathcal{U}(\delta )\) and a subset J ⊂ {1, …, n} such that | J | ≤ d, let \(\widehat{f} \in \mathcal{U}(\delta )\) be the polynomial obtained from f by changing the coefficient α J of the monomial x J in f to −α J and leaving all other coefficients intact. Then the angle between the two non-zero complex numbers \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\) and \(\mathbf{E}\,\left (e^{\widehat{f}}\vert F\right )\) does not exceed
Proof
Without loss of generality, we assume that α J ≠ 0.
We note that for any \(f \in \mathcal{U}(\delta )\), we have \(\widehat{f} \in \mathcal{U}(\delta )\). Since \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\neq 0\) for all \(f \in \mathcal{U}(\delta )\), we may consider a branch of \(\ln \mathbf{E}\,\left (e^{\,f}\vert F\right )\) for \(f \in \mathcal{U}(\delta )\).
Let us fix coefficients α I for I ≠ J in
and define a univariate function
obtained by replacing α J with α in (11).
We obtain
Let
Using (10) we conclude that
On the other hand,
Comparing (12), (13), and (14), we conclude that
Then
and the proof follows. □
Lemma 4.2
Let θ ≥ 0 and δ > 0 be real numbers such that θδ < π, let \(F \subseteq \{-1,1\}^{n}\) be a face such that \(\dim F < n\) and suppose that \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\neq 0\) for all \(f \in \mathcal{U}(\delta )\) . Assume that for any \(f \in \mathcal{U}(\delta )\) , for any J ⊂ {1, …, n} such that | J | ≤ d, and for the polynomial \(\widehat{f}\) obtained from f by changing the coefficient \(\alpha_{J}\) to \(-\alpha_{J}\) and leaving all other coefficients intact, the angle between the non-zero complex numbers \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\) and \(\mathbf{E}\,\left (e^{\widehat{f}}\vert F\right )\) does not exceed \(\theta\vert\alpha_{J}\vert\).
Suppose that \(\widehat{F} \subset \{-1,1\}^{n}\) is a face obtained from F by changing the sign of one of the coordinates in \(I^{+}(F)\cup I^{-}(F)\). Then \(G = F \cup \widehat{ F}\) is a face of \(\{-1,1\}^{n}\) and for
$$\displaystyle{\tau\ =\ \cos\left({\theta\delta\over 2}\right)}$$
we have
$$\displaystyle{\left\vert\mathbf{E}\,\left(e^{\,f}\vert G\right)\right\vert\ \geq\ {\tau\over 2}\left(\left\vert\mathbf{E}\,\left(e^{\,f}\vert F\right)\right\vert+\left\vert\mathbf{E}\,\left(e^{\,f}\vert\widehat{F}\right)\right\vert\right)}$$
for any \(f \in \mathcal{U}(\delta )\) .
Proof
Suppose that \(\widehat{F}\) is obtained from F by changing the sign of the i-th coordinate. Let \(\tilde{f}\) be a polynomial obtained from f by replacing the coefficients \(\alpha_{I}\) by \(-\alpha_{I}\) whenever i ∈ I and leaving all other coefficients intact. Then \(\tilde{f} \in \mathcal{U}(\delta )\) and the angle between \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\) and \(\mathbf{E}\,\left (e^{\tilde{f}}\vert F\right )\) does not exceed
$$\displaystyle{\theta\sum_{I:\ i\in I}\vert\alpha_{I}\vert\ \leq\ \theta\delta.}$$
On the other hand, \(\mathbf{E}\,\left (e^{\tilde{f}}\vert F\right ) = \mathbf{E}\,\left (e^{f}\vert \widehat{F}\right )\) and
$$\displaystyle{\mathbf{E}\,\left(e^{\,f}\vert G\right)\ =\ {1\over 2}\left(\mathbf{E}\,\left(e^{\,f}\vert F\right)+\mathbf{E}\,\left(e^{\,f}\vert\widehat{F}\right)\right).}$$
Thus \(\mathbf{E}\,\left (e^{\,f}\vert G\right )\) is the sum of two non-zero complex numbers, the angle between which does not exceed θδ < π. Interpreting the complex numbers as vectors in \(\mathbb{R}^{2} = \mathbb{C}\), we conclude that the length of the sum is at least as large as the length of the sum of the orthogonal projections of the vectors onto the bisector of the angle between them, and the proof follows. □
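The elementary geometric fact invoked at the end of the proof — for non-zero complex numbers a and b whose angle is at most \(\varphi<\pi\), one has \(\vert a+b\vert\geq\left(\vert a\vert+\vert b\vert\right)\cos(\varphi/2)\) by projecting onto the bisector — can be checked numerically:

```python
import cmath
import math
import random

random.seed(1)
for _ in range(1000):
    # Two non-zero complex numbers with angle between them at most phi < pi.
    phi = random.uniform(0.0, 3.0)
    base = random.uniform(0.0, 2 * math.pi)
    half = random.uniform(0.0, phi / 2)
    a = random.uniform(0.1, 5.0) * cmath.exp(1j * (base - half))
    b = random.uniform(0.1, 5.0) * cmath.exp(1j * (base + half))
    # Projections onto the bisector (direction `base`) give the lower bound.
    assert abs(a + b) >= (abs(a) + abs(b)) * math.cos(phi / 2) - 1e-9
```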
Proof of Theorem 1.1
Let us denote \(d =\deg f\) .
One can observe that the equation
has a solution θ ≥ 0 for all sufficiently small β > 0. Numerical computations show that one can choose
in which case
Let
We observe that
Let
In particular,
Next, we will use the inequality
One can obtain (15) as follows. Since tan(0) = 0 and the function tanα is convex for 0 ≤ α < π∕2, we have
Integrating, we obtain
and (15) follows.
Using (15), we obtain
We prove by induction on m = 0, 1, …, n the following three statements.
-
1.
Let F ⊂ {−1, 1}n be a face of dimension m. Then, for any \(f \in \mathcal{U}(\delta )\), we have \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\neq 0\).
-
2.
Let \(F\subset\{-1,1\}^{n}\) be a face of dimension m, let \(f \in \mathcal{U}(\delta )\) and let \(\widehat{f}\) be a polynomial obtained from f by changing one of the coefficients \(\alpha_{J}\) to \(-\alpha_{J}\) and leaving all other coefficients intact. Then the angle between the two non-zero complex numbers \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\) and \(\mathbf{E}\,\left (e^{\widehat{f}}\vert F\right )\) does not exceed \(\theta\vert\alpha_{J}\vert\).
-
3.
Let F ⊂ {−1, 1}n be a face of dimension m and let \(f \in \mathcal{U}(\delta )\). Assuming that m > 0 and hence I(F) ≠ ∅, let us choose any i ∈ I(F) and let F + and F − be the corresponding faces of F obtained by fixing x i = 1 and x i = −1 respectively. Then
$$\displaystyle{\left \vert \mathbf{E}\,\left (e^{\,f}\vert F\right )\right \vert \ \geq \ { \tau \over 2}\left (\left \vert \mathbf{E}\,\left (e^{\,f}\vert F^{+}\right )\right \vert + \left \vert \mathbf{E}\,\left (e^{\,f}\vert F^{-}\right )\right \vert \right ).}$$
If m = 0 then F consists of a single point \(x\in\{-1,1\}^{n}\), so
$$\displaystyle{\mathbf{E}\,\left(e^{\,f}\vert F\right)\ =\ e^{\,f(x)}\ \neq\ 0}$$
and statement 1 holds. Assuming that \(\widehat{f}\) is obtained from f by replacing the coefficient α J with −α J and leaving all other coefficients intact, we get
Since
the angle between \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\) and \(\mathbf{E}\,\left (e^{\widehat{f}}\vert F\right )\) does not exceed \(\theta\vert\alpha_{J}\vert\) and statement 2 follows. Statement 3 is vacuous for m = 0.
Suppose that statements 1 and 2 hold for faces of dimension m < n. Lemma 4.2 implies that if F is a face of dimension m + 1 and F + and F − are m-dimensional faces obtained by fixing x i for some i ∈ I(F) to x i = 1 and x i = −1 respectively, then
and statement 3 holds for (m + 1)-dimensional faces.
Statement 3 for (m + 1)-dimensional faces and statement 1 for m-dimensional faces imply statement 1 for (m + 1)-dimensional faces.
Finally, suppose that statements 1 and 3 hold for all faces of dimension at most m + 1. Let us pick a face \(F\subset\{-1,1\}^{n}\) of dimension m + 1, where 0 ≤ m < n. Applying the condition of statement 3 recursively to the faces of F, we get that for any K ⊂ I(F),
$$\displaystyle{\left\vert\mathbf{E}\,\left(e^{\,f}\vert F\right)\right\vert\ \geq\ \left({\tau\over 2}\right)^{\vert K\vert}\sum_{\epsilon\in\{-1,1\}^{K}}\left\vert\mathbf{E}\,\left(e^{\,f}\vert F_{\epsilon}\right)\right\vert.}$$
Then, by Lemma 4.1, the angle between two non-zero complex numbers \(\mathbf{E}\,\left (e^{\,f}\vert F\right )\) and \(\mathbf{E}\,\left (e^{\widehat{f}}\vert F\right )\) does not exceed
by (16), and statement 2 follows for faces of dimension m + 1.
This proves that statements 1–3 hold for faces F of all dimensions. Iterating statement 3, we obtain that for any \(f \in \mathcal{U}(\delta )\), we have
Since for any x ∈ {−1, 1}n and for any \(f \in \mathcal{U}(\delta )\), we have
we conclude that
The proof follows since if \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{C}\) is a polynomial with zero constant term and
then \(\lambda f \in \mathcal{U}(\delta )\). □
5 Proofs of Theorems 2.1 and 2.2
The proofs of Theorems 2.1 and 2.2 are based on the following lemma.
Lemma 5.1
Let
$$\displaystyle{f(x)\ =\ \sum_{I\in\mathcal{F}}\alpha_{I}x^{I}}$$
be a polynomial such that \(\alpha_{I}\geq 0\) for all \(I \in \mathcal{F}\) . Then
Proof
Since
we have
Since
and
expanding the product in (17) and taking the expectation, we get the desired inequality. □
Next, we prove a similar estimate for functions f that allow some monomials with negative coefficients.
Lemma 5.2
Let f(x) = g(x) − h(x) where
$$\displaystyle{g(x)\ =\ \sum_{I\in\mathcal{G}}x^{I}\quad\text{and}\quad h(x)\ =\ \sum_{I\in\mathcal{H}}x^{I}.}$$
Suppose that the constant terms of g and h are 0 and that every variable \(x_{i}\) enters not more than k monomials of f for some integer k > 0. Then
Proof
Since E f = 0, by Jensen’s inequality we have
and the estimate follows if \(\vert \mathcal{G}\vert \leq (k - 1)\vert \mathcal{H}\vert\). Hence we may assume that \(\vert \mathcal{G}\vert> (k - 1)\vert \mathcal{H}\vert\).
Given a function \(f:\{ -1,1\}^{n}\longrightarrow \mathbb{R}\) and a set J ⊂ {1, …, n} of indices, we define a function (conditional expectation) \(f_{J}:\{ -1,1\}^{n-\vert J\vert }\longrightarrow \mathbb{R}\) obtained by averaging over variables x j with j ∈ J:
In particular, f J = f if J = ∅ and f J = E f if J = {1, …, n}. We obtain the monomial expansion of f J by erasing all monomials of f that contain x j with j ∈ J. By Jensen’s inequality we have
Let us choose a set J of indices with \(\vert J\vert \leq \vert \mathcal{H}\vert\) such that every monomial in h(x) contains at least one variable x j with j ∈ J. Then every variable x j with j ∈ J is contained in at most (k − 1) monomials of g(x) and hence f J is a sum of at least \(\vert \mathcal{G}\vert - (k - 1)\vert \mathcal{H}\vert\) monomials.
From (18) and Lemma 5.1, we obtain
Using that
we conclude that
as desired. □
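The erasure description of the conditional expectation \(f_{J}\) used in this proof — averaging over the variables \(x_{j}\), j ∈ J simply deletes every monomial meeting J — can be verified directly on a small instance (the dictionary representation below is ours):

```python
from itertools import product

# f on {-1,1}^3 as {frozenset of indices: coefficient}.
f = {frozenset({0}): 1.0, frozenset({0, 1}): 2.0, frozenset({2}): -1.0}

def eval_mono(I, x):
    m = 1
    for i in I:
        m *= x[i]
    return m

def evaluate(poly, x):
    return sum(a * eval_mono(I, x) for I, a in poly.items())

J = {1}

def averaged(x_free):
    # Average f over x_1 = +/-1; x_free supplies (x_0, x_2).
    vals = [evaluate(f, (x_free[0], x1, x_free[1])) for x1 in (-1, 1)]
    return sum(vals) / 2

# Erasing every monomial that contains a variable from J ...
f_J = {I: a for I, a in f.items() if not (I & J)}
# ... gives exactly the averaged polynomial.
for x_free in product([-1, 1], repeat=2):
    x = (x_free[0], 0, x_free[1])  # dummy value for the erased variable
    direct = sum(a * eval_mono(I, x) for I, a in f_J.items())
    assert abs(averaged(x_free) - direct) < 1e-12
```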
Now we are ready to prove Theorem 2.2.
Proof of Theorem 2.2
Let \(x_{0}\in\{-1,1\}^{n}\), \(x_{0} = \left (\xi _{1},\ldots,\xi _{n}\right )\), be a maximum point of f, so that
$$\displaystyle{f(x_{0})\ =\ \max_{x\in\{-1,1\}^{n}}f(x).}$$
Let us define \(\tilde{f}:\{ -1,1\}^{n}\longrightarrow \mathbb{R}\) by
$$\displaystyle{\tilde{f}\left(x_{1},\ldots,x_{n}\right)\ =\ f\left(\xi_{1}x_{1},\ldots,\xi_{n}x_{n}\right).}$$
Then
and the maximum value of \(\tilde{f}\) on the cube { − 1, 1}n is attained at \(u = \left (1,\ldots,1\right )\). Hence without loss of generality, we may assume that the maximum value of f on the cube { − 1, 1}n is attained at u = (1, …, 1).
We write
$$\displaystyle{f\ =\ g-h\quad\text{where}\quad g(x)\ =\ \sum_{I\in\mathcal{G}}x^{I}\quad\text{and}\quad h(x)\ =\ \sum_{I\in\mathcal{H}}x^{I}}$$
for some disjoint sets \(\mathcal{G}\) and \(\mathcal{H}\) of indices. Moreover,
$$\displaystyle{\vert\mathcal{G}\vert+\vert\mathcal{H}\vert\ =\ \vert\mathcal{F}\vert.}$$
Since
we conclude that
By Lemma 5.2,
as desired. □
To prove Theorem 2.1, we need to handle negative terms with more care.
Lemma 5.3
Let f(x) = g(x) − h(x) where
$$\displaystyle{g(x)\ =\ \sum_{I\in\mathcal{G}}x^{I}}$$
and
$$\displaystyle{h(x)\ =\ \sum_{I\in\mathcal{H}}x^{I}.}$$
Suppose that the constant terms of g and h are 0 and that the supports \(I \in \mathcal{H}\) of the monomials in h(x) are pairwise disjoint. Then
Proof
By Jensen’s inequality we have
which proves the lemma in the case when \(\vert \mathcal{G}\vert = \vert \mathcal{H}\vert\). Hence we may assume that \(\vert \mathcal{G}\vert> \vert \mathcal{H}\vert\).
If \(\vert \mathcal{H}\vert = 0\) then, applying Lemma 5.1, we obtain
Using (19), we conclude that
which proves the lemma in the case when \(\vert \mathcal{H}\vert = 0\). Hence we may assume that \(\vert \mathcal{G}\vert> \vert \mathcal{H}\vert> 0\).
Since the supports \(I \in \mathcal{H}\) of monomials in h are pairwise disjoint, we have
Let us choose real p, q ≥ 1, to be specified later, such that
$$\displaystyle{{1\over p}+{1\over q}\ =\ 1.}$$
Applying the Hölder inequality, we get
and hence
Applying Lemma 5.1 and formula (20), we obtain
Since
we obtain
Applying (19), we obtain
Let us choose
Then
and the proof follows. □
Lemma 5.4
Let f(x) = g(x) − h(x) where
$$\displaystyle{g(x)\ =\ \sum_{I\in\mathcal{G}}x^{I}}$$
and
$$\displaystyle{h(x)\ =\ \sum_{I\in\mathcal{H}}x^{I}.}$$
Suppose that the constant terms of g and h are 0, that every variable \(x_{i}\) enters at most two monomials in h(x) and that if \(x_{i}\) enters exactly two monomials in h(x) then \(x_{i}\) enters at most two monomials in g(x). Then for 0 ≤ λ ≤ 1, we have
Proof
We proceed by induction on the number k of variables x i that enter exactly two monomials in h(x). If k = 0 then the result follows by Lemma 5.3.
Suppose that k > 0 and that \(x_{i}\) is a variable that enters exactly two monomials in h(x) and hence at most two monomials in g(x). As in the proof of Lemma 5.2, let \(f_{i}:\{-1,1\}^{n-1}\longrightarrow \mathbb{R}\) be the polynomial obtained from f by averaging with respect to \(x_{i}\). Then we have
and \(\mathcal{G}_{i}\), respectively \(\mathcal{H}_{i}\), is obtained from \(\mathcal{G}\), respectively \(\mathcal{H}\), by removing supports of monomials containing x i . In particular,
Applying the induction hypothesis to f i , we obtain
and the proof follows. □
Finally, we are ready to prove Theorem 2.1.
Proof of Theorem 2.1
As in the proof of Theorem 2.2, without loss of generality we may assume that the maximum of f is attained at u = (1, …, 1).
We write
$$\displaystyle{f\ =\ g-h\quad\text{where}\quad g(x)\ =\ \sum_{I\in\mathcal{G}}x^{I}\quad\text{and}\quad h(x)\ =\ \sum_{I\in\mathcal{H}}x^{I}}$$
for some disjoint sets \(\mathcal{G}\) and \(\mathcal{H}\) of indices. Moreover,
$$\displaystyle{\vert\mathcal{G}\vert+\vert\mathcal{H}\vert\ =\ \vert\mathcal{F}\vert.}$$
Since
we conclude that
For i = 1, …, n, let \(\mu_{i}^{+}\) be the number of monomials in g that contain the variable \(x_{i}\) and let \(\mu_{i}^{-}\) be the number of monomials in h that contain \(x_{i}\). Then
If for some i we have \(\mu_{i}^{+}<\mu_{i}^{-}\) then for the point \(u_{i}\) obtained from u by switching the sign of the i-th coordinate, we have
$$\displaystyle{f(u_{i})\ =\ f(u)+2\left(\mu_{i}^{-}-\mu_{i}^{+}\right)\ >\ f(u),}$$
contradicting that u is a maximum point of f. Therefore,
$$\displaystyle{\mu_{i}^{+}\ \geq\ \mu_{i}^{-}\quad\text{for}\quad i=1,\ldots,n}$$
and, in view of (22), we conclude that
By Lemma 5.4,
Using (21), we deduce that
which completes the proof. □
References
A. Bandyopadhyay, D. Gamarnik, Counting without sampling: asymptotics of the log-partition function for certain statistical physics models. Random Struct. Algorithm 33(4), 452–479 (2008)
B. Barak, A. Moitra, R. O’Donnell, P. Raghavendra, O. Regev, D. Steurer, L. Trevisan, A. Vijayaraghavan, D. Witmer, J. Wright, Beating the random assignment on constraint satisfaction problems of bounded degree, in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. LIPIcs. Leibniz International Proceedings in Informatics, vol. 40 (Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2015), pp. 110–123
A. Barvinok, Computing the partition function for cliques in a graph. Theor. Comput. 11, Article 13, 339–355 (2015)
A. Barvinok, Computing the permanent of (some) complex matrices. Found. Comput. Math. 16(2), 329–342 (2016)
A. Barvinok, P. Soberón, Computing the partition function for graph homomorphisms. Combinatorica (2016). doi:10.1007/s00493-016-3357-2
A. Barvinok, P. Soberón, Computing the partition function for graph homomorphisms with multiplicities. J. Comb. Theory Ser. A 137, 1–26 (2016)
E. Boros, P.L. Hammer, Pseudo-boolean optimization. Discret. Appl. Math. 123(1–3), 155–225 (2002). Workshop on Discrete Optimization (DO’99), Piscataway
R.L. Dobrushin, S.B. Shlosman, Completely analytical interactions: constructive description. J. Stat. Phys. 46(5–6), 983–1014 (1987)
J. Håstad, On bounded occurrence constraint satisfaction. Inf. Process. Lett. 74(1–2), 1–6 (2000)
J. Håstad, Some optimal inapproximability results. J. ACM 48(4), 798–859 (2001)
J. Håstad, Improved bounds for bounded occurrence constraint satisfaction, manuscript (2005). Available at https://www.nada.kth.se/~johanh/bounded2.pdf
J. Håstad, S. Venkatesh, On the advantage over a random assignment. Random Struct. Algorithm 25(2), 117–149 (2004)
J. Kahn, N. Linial, A. Samorodnitsky, Inclusion-exclusion: exact and approximate. Combinatorica 16(4), 465–477 (1996)
S. Khot, A. Naor, Grothendieck-type inequalities in combinatorial optimization. Commun. Pure Appl. Math. 65(7), 992–1035 (2012)
T.D. Lee, C.N. Yang, Statistical theory of equations of state and phase transitions. II. Lattice gas and Ising model. Phys. Rev. (2) 87, 410–419 (1952)
G. Regts, Zero-free regions of partition functions with applications to algorithms and graph limits. Combinatorica (2017). doi:10.1007/s00493-016-3506-7
D. Weitz, Counting independent sets up to the tree threshold, in Proceedings of the 38th Annual ACM Symposium on Theory of Computing, STOC’06 (ACM, New York, 2006), pp. 140–149
C.N. Yang, T.D. Lee, Statistical theory of equations of state and phase transitions. I. Theory of condensation. Phys. Rev. (2) 87, 404–409 (1952)
D. Zuckerman, Linear degree extractors and the inapproximability of max clique and chromatic number. Theor. Comput. 3, 103–128 (2007)
Acknowledgements
I am grateful to Johan Håstad for advice and references on optimizing a polynomial on the Boolean cube and to the anonymous referees for careful reading of the paper and useful suggestions.
© 2017 Springer International Publishing AG
Barvinok, A. (2017). Computing the Partition Function of a Polynomial on the Boolean Cube. In: Loebl, M., Nešetřil, J., Thomas, R. (eds) A Journey Through Discrete Mathematics. Springer, Cham. https://doi.org/10.1007/978-3-319-44479-6_7